Simulated Evolution in a Linguistic Model

In this paper we present a simple evolutionary model of children's language development, whose central nonlinearity is represented by noninvertible discrete dynamical systems. The underlying assumption of the model is that children learn from other children through their interactions. The concrete learning mechanism is based on imitation: children's languages evolve through attempts to imitate other children's utterances. Imitation has been used in evolutionary models before, for instance, in the model of the evolution of bird song by Kaneko and Suzuki. The model presented here is similar to Kaneko and Suzuki's model, the primary difference being the continuous nature of bird song, in contrast to the discrete nature of children's utterances.


1. INTRODUCTION
Recent work in a complex systems approach is highlighting the co-evolution of vocabulary and grammar in child language [1], and the interdependence of vocabulary and syntax in language learning modelled by simple recurrent networks in a connectionist approach [5].
Before proceeding we shall introduce some notation through the following definitions: a vocabulary is a fixed collection of words; an utterance is a sequence of words generated by a child; the language of a child is the set of utterances that can be generated by the child; and finally, an interaction is when two children talk together. It should be noted that in our simple model neither words nor utterances have any particular meaning associated with them (apart from an inherent ordering; see later). Improvements on the model would obviously have to address the issue of including some meaning, for example, through the inclusion of a grammar.
In order to keep the model as simple as possible we will make the following two basic assumptions about children's development of simple language. We stress that the assumptions are meant as suggestive and pedagogical rather than as a full scale model of the complex phenomena involved in real life language development. Firstly, it is assumed that children learn through interaction with other children, that is, the learning is an instance of unsupervised learning, in contrast to neural networks, which normally learn under supervision [17]. Secondly, the primary learning is through imitation of other children. Our model is similar to the bird song model by Kaneko and Suzuki [10].
The model is reduced from the complications of real life language use and learning to a few important qualitative features: basically, speech generation and learning through reproducing the speech of others heard in interaction. Current views of both first and second language learning emphasise the importance of social interaction in the development [13,14]. The model incorporates key aspects of second language learning in which both input to the learner and output produced by the learner are necessary [15]. The input to the child in the model arises through interaction, albeit of a very simplified form that downplays the importance of comprehensibility or understanding to learning [12,13].
The co-adaptive language development being modelled might be seen as either (a) first language development after about three years of age, such as might occur in nursery or pre-school classes, or (b) classroom foreign language learning through pair or group work. The major difference between these two contexts of language development, as far as the constituents of the model are concerned, lies in the size of the vocabularies brought to the learning task. In the first language context, the vocabularies of pre-school children are estimated at 3000 words or beyond [14]; in a typical foreign language context, this level of vocabulary would take at least four or five years of tuition to reach. The vocabulary used in the present model is very small relative to the first language figures, but might be considered to represent a subset of vocabulary for a particular group of children, perhaps referring to a relatively unfamiliar semantic field that is met only through formal schooling, such as weight and mass.
The modelling process will involve defining the development of the following three elements: a model of a child; a model of interaction between two children; and a model of the learning process.
Having defined these elements we can continue with exposing the model to simulated evolution. This will consist of letting a group of children interact, and letting (some of) the children learn from these interactions. After this we can then evaluate the results qualitatively as well as quantitatively. First, we shall simply observe characteristic features of the evolutionary processes, such as determining what groups of children perform well, and describing what stable (and unstable) evolutionary structures are present in the system; then we will evaluate the complexity of the languages involved using finite state machines as models of computation.
In the model, a child will be capable of listening to other children (though only one at a time), of thinking about what it has heard, or rather is currently hearing, and of generating speech. Thus, the child is merely a specific instance of a much more general situation: an entity capable of receiving information, processing information, and communicating information, i.e., the components characterising a computer.
The organisation of this paper is as follows. In Section 2 the mathematical model is derived and formulated. Section 3 introduces the fitness landscape and examples are given. Then in Section 4 we present the results of simulated evolution in the model by describing the evolved structures. In Section 5 we evaluate the complexity of the languages developed and interpret the results in this context. Finally, Section 6 contains a discussion of the presented results.

2. THE MODEL
Let us briefly go through the main model components, listening and speaking. The first component is the listening device. This determines how much we listen to ourselves and others. At one end we mainly ignore others and essentially have autonomous speech. This is referred to as low coupling. At the other end we mainly listen to others and ignore our own internal state; this is called high coupling. For the speech generation we need to know what we say in the absence of input, i.e., autonomous generation of speech. We shall pick the speech generation process as the model's primary nonlinearity, and most interpretations will be in terms of this parameter. The model of a single child will be characterised by three main parameters: one determining the speech process, a parameter $p \in [0, 1]$ (which will be referred to as the parameter); one determining how much the child listens to others during imitation, a coupling $\epsilon \in [0, 1]$; and finally, one specifying how the child starts to speak autonomously, an initial state $x_0 \in [0, 1]$. In the present model this is always chosen randomly (i.e., $x_0$ in Eq. (2) below is always chosen by a pseudo random number generator).
To simplify matters we will let each child have access to the exact same vocabulary consisting of, say, 20 words. As an example the vocabulary could consist of the following words: and, dog, not, cat, why, pat, sit, bat, let, say, has, him, her, dot, wet, hot, hat, run, all, win.
The only significance of the words is that they are ordered as we have just listed them. That means that we consider the word pairs (and, dog), (sit, bat), and (hot, hat) as being close, and similarly the word pair (and, win) is considered distant. The number of words in the vocabulary is a parameter in the model and the effect of changing it will be considered. Although all children have the same basic vocabulary they may not all be able to use all of the words; in particular, the words at the end of the list are not used as much as those in the middle (the explanation for this lies in the speech generation process outlined below).
Let us now explain the autonomous discrete speech generation (as in the case of a child being asked to say "something"). The process must generate a sequence of discrete elements, i.e., words taken from the vocabulary. Discrete state systems are then a natural choice, e.g., finite state machines, Turing machines and cellular automata.
However, we have chosen to use a continuous variable, namely the logistic map, and then apply coarse graining (or symbolic dynamics) to generate utterances, as we describe in the following. We start by generating a sequence of fixed length, say 20, $\{x_0, x_1, \ldots, x_{19}\}$ by the logistic equation

$$x_{n+1} = a x_n (1 - x_n), \quad n = 0, 1, \ldots, 18 \qquad (1)$$

where $a$ is determined by the parameter $p$ through the relation $a = a_0 + p/2$, where $a_0 = 3.5$, and $x_0$ is the initial condition. This sequence is converted to an integer sequence $\{I_0, I_1, \ldots, I_{19}\}$ by transforming each $x_i$ to an integer by $I_i$ = integer part of $(20 x_i)$, i.e., by a coarse graining of the orbit. Finally, the integer $I_i$ is transformed into a word by the vocabulary, i.e., $I_i = 0$ gives and, and $I_i = 6$ gives sit. As an example of the creation of an utterance consider the orbit {0.01, 0.16, 0.49, 0.71, 0.82}. This is turned into the integer sequence {0, 3, 9, 14, 16}, which in turn yields the utterance {and cat say wet hat}. Let us in passing note that the choice of the logistic map as the underlying dynamical system is rather arbitrary; any system capable of generating complex dynamics could presumably be used.
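The speech generation step can be illustrated with a minimal Python sketch, assuming the 20-word example vocabulary and $a_0 = 3.5$ from the text. The function names (`logistic_orbit`, `coarse_grain`, `to_utterance`, `generate_utterance`) are hypothetical, and the clamp for the boundary value x = 1 is an assumption of this sketch rather than part of the model description.

```python
import random

# The 20-word example vocabulary from the text, in its stated order.
VOCABULARY = ["and", "dog", "not", "cat", "why", "pat", "sit", "bat", "let", "say",
              "has", "him", "her", "dot", "wet", "hot", "hat", "run", "all", "win"]

A0 = 3.5  # a = A0 + p/2, so p in [0, 1] gives a in [3.5, 4]


def logistic_orbit(p, x0, length=20):
    """Iterate x_{n+1} = a x_n (1 - x_n) and return [x_0, ..., x_{length-1}]."""
    a = A0 + p / 2.0
    orbit = [x0]
    for _ in range(length - 1):
        x = orbit[-1]
        orbit.append(a * x * (1.0 - x))
    return orbit


def coarse_grain(orbit, n_words=len(VOCABULARY)):
    """Map each x to the integer part of n_words * x.

    Assumption of this sketch: x == 1.0 is clamped to the last word index.
    """
    return [min(int(n_words * x), n_words - 1) for x in orbit]


def to_utterance(indices, vocabulary=VOCABULARY):
    """Translate word indices into words."""
    return [vocabulary[i] for i in indices]


def generate_utterance(p, rng, length=20):
    """Autonomous utterance from a pseudo-random initial condition."""
    return to_utterance(coarse_grain(logistic_orbit(p, rng.random(), length)))


# The worked example from the text:
print(to_utterance(coarse_grain([0.01, 0.16, 0.49, 0.71, 0.82])))
# -> ['and', 'cat', 'say', 'wet', 'hat']
```

Passing the random number generator explicitly (for example `random.Random(0)`) keeps runs reproducible, which is convenient when comparing populations.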
Having defined the autonomous speech generation we can proceed to define the interaction between two children. The interaction is an imitation process. A child evaluates its language by attempting to imitate an utterance generated by another child and vice versa. This is carried out as follows. The imitated child talks autonomously (using Eq. (1)). The imitating child listens to the other child while simultaneously "talking" but not out loud; this is a transient phase. After a while (an utterance of fixed length) the imitating child stops listening and attempts to imitate the other child by autonomous speech, with the initial condition generated by the transient phase. The roles are reversed and the imitation process repeated. We then assign a score depending on their ability to imitate each other.
The mathematical details of the interaction are as follows. The child that is being imitated generates a sequence autonomously by the recurrence relation

$$y_{n+1} = a y_n (1 - y_n), \quad n = 0, 1, \ldots, 18 \qquad (2)$$

whereas the child attempting to imitate is coupled to the child being imitated. The non-autonomous recurrence relation used here is

$$x_{n+1} = a \big((1 - \epsilon) x_n + \epsilon y_n\big) \big(1 - ((1 - \epsilon) x_n + \epsilon y_n)\big), \quad n = 0, 1, \ldots, 8 \qquad (3)$$

For small values of the coupling parameter $\epsilon$ the result is very similar to that generated by the autonomous process described in Eq. (2). After the non-autonomous speech the imitating child uses

$$x_{n+1} = a x_n (1 - x_n), \quad n = 9, 10, 11, \ldots, 18 \qquad (4)$$

where $a = a_0 + p/2$ and $\epsilon$ is the coupling parameter, with different parameter and coupling values for the two children. For future reference, Figure 1 shows the bifurcation diagram for the nonlinear map in Eq. (2). Note that this is a truncated version of the usual logistic map for the chosen parameter range [0, 1]. Also note the periodic windows, e.g., the period three window near p = 0.68. More information on the behaviour of the logistic map can be found in, e.g., Guckenheimer and Holmes [9] and Collet and Eckmann [2].
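One imitation attempt, i.e., Eqs. (2) to (4), might be sketched as follows. The function name `imitate` and its argument names are hypothetical. The sketch also assumes that each child iterates with its own value of a (the imitated child in Eq. (2), the imitator in Eqs. (3) and (4)), which is how we read "different parameter and coupling values for the two children".

```python
A0 = 3.5  # a = A0 + p/2, as in the text


def logistic(a, x):
    """One step of the logistic map."""
    return a * x * (1.0 - x)


def imitate(p_imitated, p_imitator, eps, y0, x0, length=20, transient=9):
    """Return (y, x): the imitated orbit and the imitator's orbit.

    Assumption: each child uses its own a = A0 + p/2.
    """
    a_y = A0 + p_imitated / 2.0
    a_x = A0 + p_imitator / 2.0

    # Eq. (2): the imitated child speaks autonomously, n = 0, ..., 18.
    y = [y0]
    for _ in range(length - 1):
        y.append(logistic(a_y, y[-1]))

    # Eq. (3): transient phase, the imitator listens, n = 0, ..., 8.
    x = [x0]
    for n in range(transient):
        mixed = (1.0 - eps) * x[n] + eps * y[n]
        x.append(logistic(a_x, mixed))

    # Eq. (4): the imitator stops listening, n = 9, ..., 18.
    for n in range(transient, length - 1):
        x.append(logistic(a_x, x[n]))

    return y, x
```

Two limiting cases give a quick sanity check: with `eps = 0` the imitator ignores the other child entirely, and with `eps = 1` and identical parameters the imitator reproduces the imitated orbit exactly from the first step onwards.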
The learning process is clearly an important part of the model. If a child performs poorly relative to other children it will learn; if it performs well it doesn't learn from the interactions. We have chosen to let the speech generation process be the only dynamic quantity during evolution, meaning that the coupling remains fixed over time. This is done in order to make the results easier to interpret. To make sure that learning is gradual, the learning involves adding or subtracting a random number between 0 and 0.01 to the current value of the nonlinearity parameter. The only exception is that when p becomes larger than one we subtract one, and similarly when the parameter becomes negative we add one. This effectively places the nonlinearity parameter on the circle, i.e., $p \in S^1$.
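The learning rule amounts to a small random step on the circle. A minimal sketch follows; the function name `learn` is hypothetical, and choosing between adding and subtracting with equal probability is an assumption, since the text does not say how the sign is picked.

```python
import random


def learn(p, rng=random, max_step=0.01):
    """Perturb the nonlinearity parameter p and wrap it back onto the circle."""
    step = rng.uniform(0.0, max_step)
    if rng.random() < 0.5:  # assumption: add or subtract with equal probability
        step = -step
    # The modulo implements "subtract one if above one, add one if negative".
    return (p + step) % 1.0
```

For example, a child at p = 0.998 that takes a step of +0.005 ends up near p = 0.003, so the two ends of the parameter interval are neighbours.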
As mentioned above we assign a score to the performance of the children in the imitation game. This is done by measuring the quality of the imitation. To do this we define the difference between two utterances by the formula

$$\sum_{\text{words in utterance}} (\text{distance between words})^2 \qquad (5)$$

or more precisely

$$\sum_{k=10}^{19} \big(\text{integer part of } (20\,|x_k - y_k|)\big)^2 \qquad (6)$$

The better speaking child receives 10 points, the poorer 1 point; in case of a tie both receive 5.5 points. As described in Section 4, in the round robin tournament, dealing with simulated evolution, the scores are normalized such that the best overall gets a unit score.
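The scoring might be sketched as below. The names `utterance_difference` and `award_points` are hypothetical. The sketch assumes that each word distance in Eq. (6) is squared, in line with the verbal formula preceding it, and that the child with the smaller difference is the better imitator.

```python
def utterance_difference(x, y, n_words=20, start=10):
    """Sum over k = start, ..., len-1 of (integer part of n_words*|x_k - y_k|)^2.

    Assumption: the per-word distance is squared, as in the verbal formula.
    """
    return sum(int(n_words * abs(xk - yk)) ** 2
               for xk, yk in zip(x[start:], y[start:]))


def award_points(diff_a, diff_b, win=10.0, lose=1.0):
    """Points for children A and B; a lower difference means a better imitation."""
    if diff_a < diff_b:
        return win, lose
    if diff_b < diff_a:
        return lose, win
    tie = (win + lose) / 2.0  # 5.5 points each
    return tie, tie
```

Note that the tie value of 5.5 is simply the mean of the winning and losing scores, so every interaction distributes 11 points in total.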

3. THE FITNESS LANDSCAPE
To represent the performance of the children (or the individual speech generation processes) we can produce a so-called fitness landscape. The fitness landscape represents the performance or fitness of all individuals in a population relative to all other individuals. To determine the landscape we take a population of children and let all children interact with all other children in a round robin tournament. The total score of each child is then normalized with respect to the best score. The fitness landscape is plotted by showing the fitness versus the nonlinearity parameter. A general feature of the fitness landscape is that it is rugged, see, e.g., Kauffman [11]. We note that the fitness landscape depends on the population, and hence when evolution takes place (as we will include later on), the fitness landscape changes over time. In other words it may be considered the goal of each individual to move around in the changing fitness landscape attempting to optimize its performance.
To test the importance of the size of the vocabulary we have computed the fitness landscape for different vocabulary sizes, and a population containing 1000 individuals (small variations occur depending on the number of individuals, but the qualitative appearance is the same). Each child is assigned a randomized coupling between 0 and 0.1 which remains fixed.
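A round robin computation of the fitness landscape might look like the following sketch. The names `imitation_error` and `fitness_landscape` are hypothetical, and several details are assumptions: fresh random initial conditions are drawn for every imitation attempt, each child iterates with its own a, and word distances are squared. In pure Python a population of 1000 is slow, so the usage example uses a much smaller one.

```python
import random

A0 = 3.5  # a = A0 + p/2


def imitation_error(p_model, p_imitator, eps, rng,
                    length=20, transient=9, n_words=20):
    """Difference between the imitator's free-running words and the model's."""
    a_y = A0 + p_model / 2.0
    a_x = A0 + p_imitator / 2.0
    y, x = rng.random(), rng.random()  # random initial conditions
    err = 0
    for n in range(length - 1):
        # Coupled for n < transient, autonomous afterwards.
        drive = (1.0 - eps) * x + eps * y if n < transient else x
        x = a_x * drive * (1.0 - drive)
        y = a_y * y * (1.0 - y)
        if n >= transient:  # compares x_10..x_19 with y_10..y_19
            err += int(n_words * abs(x - y)) ** 2
    return err


def fitness_landscape(children, rng, n_words=20):
    """children: list of (p, eps). Returns scores normalized so the best is 1."""
    scores = [0.0] * len(children)
    for i in range(len(children)):
        for j in range(i + 1, len(children)):
            (p_i, e_i), (p_j, e_j) = children[i], children[j]
            err_i = imitation_error(p_j, p_i, e_i, rng, n_words=n_words)  # i imitates j
            err_j = imitation_error(p_i, p_j, e_j, rng, n_words=n_words)  # j imitates i
            if err_i < err_j:
                scores[i] += 10.0
                scores[j] += 1.0
            elif err_j < err_i:
                scores[i] += 1.0
                scores[j] += 10.0
            else:
                scores[i] += 5.5
                scores[j] += 5.5
    best = max(scores)
    return [s / best for s in scores]


rng = random.Random(0)
population = [(rng.random(), rng.uniform(0.0, 0.1)) for _ in range(50)]
fitness = fitness_landscape(population, rng)
```

Plotting `fitness` against the first component of each pair in `population` gives the landscape; changing `n_words` corresponds to changing the vocabulary size.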
In Figure 2 we have shown a fitness landscape for a vocabulary with only two words. Note the ruggedness of the landscape. Figure 3 shows a fitness landscape when the vocabulary contains twenty words. There are obviously many differences between the fitness landscapes shown in Figures 2 and 3, but also similarities are present, such as the ruggedness. More important similarities are the peaks in the landscape, for example, the largest peak is close to 0.7, located near the period-3 window (as is also indicated in the figures). Also note the smaller peaks near some of the period-4 and period-5 windows. These peaks are all clearly present in both fitness landscapes. Dramatically increasing the size of the vocabulary to one thousand words yields a fitness landscape as shown in Figure 4. The qualitative similarity with the previous landscapes is clear. Hence we conclude that the system dynamics does not critically depend on the size of the vocabulary. For simplicity we shall therefore fix the size of the vocabulary at twenty words until Section 5 where it will be reduced to two words.

4. SIMULATED EVOLUTION
In this section we will present the results of carrying out simulated evolution in the model. The recipe is very straightforward. We take a uniformly randomized population of children, i.e., they have random nonlinearity and coupling parameters. The children then interact with all other children in a round robin-like tournament. Through each interaction the children accumulate a score determining their fitness in the current population. A ranking is performed and the 10% poorest performers then learn, according to the simple rule described above. These steps are then repeated over and over. We shall now show some examples of simulated evolution in the model with a population of 100 children and discuss the results.
As we are only including evolution on the nonlinearity parameter, we can plot this parameter against time, i.e., for each, say, ten time units (or longer depending on the time scale) we show the parameter for each individual against time. In Figure 5 we have shown the evolution taking place on a time scale from 0 to 1000. The initial population is uniformly distributed as can be seen on the ordinate axis. Around time 50 we note that there is a depletion of individuals around the parameter 0.6 which lasts until time 200. The explanation for this can be seen in the fitness landscape shown in Figure 3. The average fitness of individuals near this parameter value is rather poor, and hence the individuals located here initially have learned and moved away from this region. Similar depletions are seen in many places in the figure, but are most notable in this region where they occur repeatedly. Another feature we can note is the clustering of individuals, particularly near 0.7. Figure 3 reveals that this is where the main peak is located. Another concentration is found around 0.8 at another peak. This structure also seems to persist on the time scale shown. A smaller clustering can be found near 0.95, but this is less stable, i.e., it can be seen how the individuals here disappear around time 600 only to return later. The reason for the latter being unstable is found in Figure 3 where we can see that there are individuals with poor performance in the region around 0.95, and hence the noise level (in the evolution) makes this evolutionary structure unstable. Increasing the time horizon to 10000 we obtain the result shown in Figure 6. Here the structure near 0.7 becomes even more pronounced and its stability emphasized. The two other previously mentioned regions are seen to reappear occasionally but are never stable for long. An interesting feature present in this figure but not in the previous one, is the clear trend of individuals below 0.7 moving towards lower values of the nonlinearity parameter. Again Figure 3 provides a clue. It can be seen that for decreasing values of the nonlinearity the average fitness increases slightly, but of course with many fluctuations. Increasing the time horizon once more to 100000 yields Figure 7. Here the structure around 0.7 is still stable, and we conclude that it is stable for all times. The two other regions of clustering of children can be seen to exist once in a while but not for long judged on this time scale. The downwards moving trend is now barely visible which is due to the fact that it now moves downwards with a slope ten times larger, i.e., almost vertically.
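The ranking and learning step of the recipe in this section can be sketched as follows. The function name `evolution_step` is hypothetical; it assumes that a list of fitness values from a round robin tournament is already available, and it draws the learning step uniformly from [-0.01, 0.01], which is one way of reading "adding or subtracting a random number between 0 and 0.01".

```python
import random


def evolution_step(ps, fitness, rng, fraction=0.10, max_step=0.01):
    """Let the poorest `fraction` of the population learn; the rest stay put.

    ps:      nonlinearity parameters, one per child
    fitness: tournament scores in the same order (higher is better)
    """
    n_learners = int(len(ps) * fraction)
    ranked = sorted(range(len(ps)), key=lambda i: fitness[i])  # worst first
    new_ps = list(ps)
    for i in ranked[:n_learners]:
        step = rng.uniform(-max_step, max_step)  # assumed symmetric step
        new_ps[i] = (ps[i] + step) % 1.0         # keep p on the circle
    return new_ps


# Usage with placeholder fitness values (not tournament results):
rng = random.Random(0)
ps = [rng.random() for _ in range(100)]
placeholder_fitness = [rng.random() for _ in range(100)]
ps = evolution_step(ps, placeholder_fitness, rng)
```

Repeating tournament and `evolution_step` and recording `ps` every few iterations gives the kind of parameter-against-time plot discussed above.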

5. COMPLEXITY OF LANGUAGES
In the previous section we saw that simulated evolution leads to clustering of children, with some clusterings being stable, some unstable. The clusterings were located around specific parameter ranges, which corresponded to the peaks in the uniformly randomized population's fitness landscape (for example, as shown in Fig. 3). These high performance peaks were located near large periodic windows for the underlying logistic map (see Fig. 1). Thus, simulated evolution does tend to produce a clustering around parameter values for the speech generation process that performs well in the imitation process. However, it does not answer any questions about what kind of languages are doing well. To investigate this we shall attempt to evaluate the languages used by the children in the model.
To evaluate the language of a child we need to decide what is meant by the complexity of a language. This is obviously not an easy question to answer satisfactorily, so we shall settle for an operational definition. We will define the complexity of a language in terms of how difficult it is to recognize the set of utterances generated by a particular child. In other words we wish to construct a machine that, given an utterance, is capable of determining whether it could have been uttered by that child.
The language of a child can be difficult to describe, and we will approach the problem with a straightforward operational definition. Many children in the model are capable of chaotic word composition and hence can generate infinitely many utterances. Thus to simplify we only consider utterances of fixed length; more specifically, we will only consider utterances of length ten. To determine the language we first have the child generate ten utterances, each of length one thousand. These utterances are then searched with a template of length ten for all possible occurrences of strings of length ten (out of the total possible number, which is $2^{10} = 1024$). The set of different utterances found is then defined as the language.
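The operational definition of a child's language can be sketched in Python for the two-word (binary) vocabulary. The names `binary_utterance`, `substrings` and `language` are hypothetical, and the random initial condition for each long utterance is an assumption carried over from the speech generation process.

```python
import random

A0 = 3.5  # a = A0 + p/2


def binary_utterance(p, x0, length=1000):
    """Coarse grain a logistic orbit with a two-word vocabulary (0 or 1)."""
    a = A0 + p / 2.0
    x, symbols = x0, []
    for _ in range(length):
        symbols.append(min(int(2 * x), 1))  # clamp x == 1.0 (assumption)
        x = a * x * (1.0 - x)
    return symbols


def substrings(symbols, window=10):
    """All distinct contiguous strings of the given length."""
    return {tuple(symbols[i:i + window])
            for i in range(len(symbols) - window + 1)}


def language(p, rng, n_utterances=10, length=1000, window=10):
    """Distinct length-`window` strings found in the child's long utterances."""
    found = set()
    for _ in range(n_utterances):
        found |= substrings(binary_utterance(p, rng.random(), length), window)
    return found


lang = language(0.3, random.Random(0))
print(len(lang), "of", 2 ** 10, "possible strings")
```

The size of the returned set is bounded by 1024, and the set itself is the input from which a recognizer for the language can be built.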
As we recall that the fitness landscape does not depend sensitively on the number of words in the vocabulary, we shall for simplicity only consider the case of a binary language.
A language recognizer is a special case of a finite state machine, see, e.g., Grimaldi [8] (more advanced machine models of the computation of the logistic map are described in Crutchfield and Young [4]). It can formally be described as a three-tuple $(S, I, f)$ where $S$ is the set of states of the finite state machine, $I$ is the input alphabet, in this case the binary alphabet $I = \{0, 1\}$, and $f: S \times I \to S$ is the function determining the next state as a function of the present state and the input. Figure 8 shows an example of a simple finite state machine that recognizes the language defined by the binary strings {000, 001, 100, 110}. In the figure it is clear that the two states 4 and 5 are identical since they give rise to the same next state on the same input, i.e., $f(4,0) = f(5,0)$ and $f(4,1) = f(5,1)$.
Collapsing these two states to one single state leads to a smaller finite state machine equivalent to the original.Continuing to collapse identical states is a minimization process that leads to the smallest finite state machine equivalent to the original one, which is easily constructed.We then define the complexity of a language as the number of states in this minimal finite state machine.
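The collapsing of identical states can be sketched in Python for the example language {000, 001, 100, 110}. All names (`build_recognizer`, `merge_identical_states`, `accepts`, and the state labels `ACC` and `ERR`) are hypothetical. States are labelled by the prefix read so far rather than by the numbers used in Figure 8, and the sketch follows the criterion stated above literally: two states are merged when they have the same next state on every input. Counting the acceptance and error states in the complexity is an assumption of this sketch.

```python
def build_recognizer(strings):
    """Tree-shaped recognizer for a set of equal-length binary strings.

    Returns {state: (next_on_0, next_on_1)}. 'ACC' and 'ERR' are absorbing
    acceptance and error states that point to themselves.
    """
    length = len(next(iter(strings)))
    prefixes = {s[:k] for s in strings for k in range(length)}  # '' is the start
    table = {"ACC": ("ACC", "ACC"), "ERR": ("ERR", "ERR")}
    for pre in prefixes:
        row = []
        for bit in "01":
            nxt = pre + bit
            if nxt in strings:
                row.append("ACC")
            elif nxt in prefixes:
                row.append(nxt)
            else:
                row.append("ERR")
        table[pre] = tuple(row)
    return table


def merge_identical_states(table):
    """Repeatedly collapse states whose transition rows are identical."""
    table = dict(table)
    while True:
        seen, rename = {}, {}
        # Visit ACC/ERR first so they survive as representatives.
        order = sorted(table, key=lambda s: (s not in ("ACC", "ERR"), len(s), s))
        for state in order:
            row = table[state]
            if row in seen:
                rename[state] = seen[row]
            else:
                seen[row] = state
        if not rename:
            return table
        table = {s: tuple(rename.get(t, t) for t in row)
                 for s, row in table.items() if s not in rename}


def accepts(table, string, start=""):
    """Run the machine on a string and report whether it ends in 'ACC'."""
    state = start
    for bit in string:
        state = table[state][int(bit)]
    return state == "ACC"


full = build_recognizer({"000", "001", "100", "110"})
minimal = merge_identical_states(full)
print(len(full), "->", len(minimal))  # 8 -> 5 (including 'ACC' and 'ERR')
```

For this example the first pass merges the two states reached after reading 10 and 11, and merges the state reached after 00 with the acceptance state; a second pass merges the state reached after 0 with the already merged state, after which no two rows coincide.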
To illustrate the minimization procedure we shall now minimize the finite state machine shown in Figure 8. As mentioned above states 4 and 5 are equivalent and hence can be collapsed. The same is true for state 3 and the acceptance state (note that we haven't included the arrows on the error and acceptance states: the arrows should point to the states themselves). Reducing the recognizer we obtain the finite state machine shown in Figure 9. Considering states 1 and 4 we can see that they map to the same states under the same input, and hence they can be collapsed into one single state. This yields the finite state machine depicted in Figure 10. We observe that this machine cannot be reduced any further, and hence it represents the minimal finite state machine capable of recognizing the language {000, 001, 100, 110}.

FIGURE 8 Sample finite state machine recognizing the language {000, 001, 100, 110}.
FIGURE 9 Partially reduced finite state machine recognizing the language {000, 001, 100, 110}.

In Figure 11 we have shown the computed complexity of the languages against the nonlinearity parameter for a total of 2049 children, uniformly distributed along the nonlinearity parameter axis. The most notable feature of the figure is the peaks, which are much more pronounced than the performance peaks in the fitness landscape. We recognize the largest peak as the period-3 window in the logistic map where the most dominant clustering took place during simulated evolution. Also we note the period-4 and period-5 windows that had visible clustering during evolution. In addition there are a few peaks to be seen in the left hand side of the figure, most prominently a significant peak associated with another period-5 window. The figure shows that the most complex languages are not created in the period-3 window but rather in the period-4 and period-5 windows.
However, as we saw these are not as important during simulated evolution. This is due to the narrow structure of the peaks. Remember that there are a number of inherent noise factors. First, initial conditions before imitation are always chosen at random, and secondly the learning process adds a random number to the individual's nonlinearity parameter when they learn. Especially the latter has the effect of destabilizing very narrow peaks. The most interesting information we gain from Figure 11 is that the system dynamics clearly selects individuals with languages that are complex in terms of recognition, although there is nothing in the system dynamics that refers to the complexity of the language (only the ability to imitate matters).

6. DISCUSSION
We have described a model of children's language development whose central element was the speech generation process. This process was based upon discretization of the orbits of noninvertible maps, here logistic maps. The model of language development was based on imitation, i.e., the children's skills were measured by their ability to imitate utterances of other children. The relative abilities were presented in terms of the speech generation parameter by computing the fitness landscape. The peaks in this landscape, which correspond to good language skills with respect to imitation, were seen to be located mainly in periodic windows for the logistic map, with only the largest ones being visible, the largest by far being the period-3 window.
The model was extended with a learning mechanism by which the poor performing children could learn from the better. This was used in simulating evolution in a model with many individuals interacting through imitation. The dynamic interactions and hence learning lead to clustering of children near parameter values where periodic windows are located.
We introduced the complexity of the language of a child through the number of states of a minimal finite state machine capable of recognizing the child's language. It was demonstrated that the imitation mechanism during evolution selected exactly those children with complex languages, in the above sense. This was despite the fact that the imitation process contains no information about the complexity of the language. We saw that even more complex languages existed but that these were destabilized in evolutionary terms by noise.
Evolutionary models such as the present represent a qualitative approach to analysing complex dynamical systems, in that they are constructed as analogous to the real life system, not as a direct representation of it [7]. The model of language learning that we have explored contains only a few features characteristic of real life language use and development. In evaluating the usefulness of computational modelling in the investigation of extremely complicated real life systems, such as language development, the extent to which the model mimics, or not, the behaviour of real life systems will give some indication of usefulness. If a simple model performs appropriately, it might be gradually developed by introducing further features.
The features that were selected to be included in the model, speech generation and learning through reproducing the speech of others, are central to language development in both first and foreign language contexts. Evolving the model led to the clustering around certain values, and to a preference for the children who succeed in learning to be also those who develop more complex languages, but not the most complex languages, which was due to the noise level as discussed above. The model produces this as an emergent outcome; it could not have been predicted from the components of the model.
How does the emergent behaviour of the model compare with real life language development? In some key ways, the model reflects what applied linguists and teachers would recognise as intuitively likely: reproducing the speech of others can lead to the development of a more complex language, as it has been defined here, which contains some sense of being original or unexpected. The finding that this process is more successful for the good speakers (those around the period three window) than for really excellent speakers, or those who are not so good, is initially surprising, but might reflect the need for alternative ways of learning for children of different proficiency levels. On the other hand, the less positive learning experience of those children with just slightly less complex languages than the good learners is a counter-intuitive outcome that needs further study.
The model we have presented is clearly extremely simplified and, of course, many aspects have been left out. There are obviously many ways in which one could develop the model. If we start with the speech generation mechanism it is important that the underlying model is capable of generating complex utterances, which was the main reason for our choice of the logistic map. It is possible to use other maps, for instance, higher dimensional maps that include more parameters, all or some of which could then be included in the learning process. Since the utterances we require are composed from a discrete set one could also consider the use of finite state machines or Turing machines, both of which could immediately deliver utterances without coarse graining. The richness of the dynamics of the logistic map would require large finite state machines and Turing machines if we were to mimic these.
Other models of interaction could be tried. For example, it would be better to include a dynamic coupling rather than the static one being used, in order to reflect how speakers accommodate to each other in the process of word composition [3]. It is possible to maintain reproduction as the main mechanism of learning but rather than the current use of unsupervised learning one could introduce supervised learning with a teacher or other helpful adult who could present more learnable utterances for reproduction. Likewise the model of learning, which is extremely simple in the present model, could be replaced by something more sophisticated, for instance, a learning mechanism that takes some account of the child's zone of proximal development [16], i.e., that at any given time, certain things are easier to learn than others [6]. It would also be interesting to include additional effects into the model, such as a grammar which would constrain the order of words in an utterance.

FIGURE 1 Bifurcation diagram for the reparameterized logistic map in Eq. (2).

FIGURE 4 Fitness landscape for a vocabulary with one thousand words.

FIGURE 5 Simulated evolution in a population of 100 children.

FIGURE 6 Simulated evolution in a population of 100 children.