
We use coevolutionary genetic algorithms to model the players' learning process in several Cournot models and evaluate them in terms of their convergence to the Nash Equilibrium. The “social-learning” versions of the two coevolutionary algorithms we introduce establish the Nash Equilibrium in those models, in contrast to the “individual-learning” versions, which do not imply convergence of the players' strategies to the Nash outcome. When players use “canonical coevolutionary genetic algorithms” as learning algorithms, the process of the game is an ergodic Markov chain. We find that in the “social” cases, states leading to NE play are highly frequent at the stationary distribution of the chain, whereas in the “individual-learning” case NE is not reached at all in our simulations; finally, we show that a large fraction of the games played are indeed at the Nash Equilibrium.

The “Cournot Game” models an oligopoly of two or more firms that simultaneously choose the quantities they supply to the market; these choices in turn determine both the market price and the equilibrium quantity in the market. Coevolutionary Genetic Algorithms have been used for studying Cournot games since Arifovic [

In Arifovic’s algorithms [

Vriend was the first to present a Coevolutionary genetic algorithm in which the equilibrium price and quantity on the market—but not the strategies of the individual players as we will see later—converge to the respective values of the Nash Equilibrium [

Finally, Alkemade et al. [

In all the above models, researchers assume symmetric cost functions (all players have identical cost functions), which implies that the Cournot games studied are symmetric. Additionally, Vriend [

“Consider a n-player Cournot Game. We assume that the inverse demand function P is strictly decreasing and log-concave; the cost function

This theorem is relevant when one investigates Cournot Game equilibrium using Genetic Algorithms, because a chromosome can have only a finite number of values and, therefore, it is the discrete version of the Cournot Game that is investigated, in principle. Of course, if the discretization of the strategy space is dense enough that the NE value of the continuous version of the Cournot Game is included in the chromosomes' admissible values, the NE of the continuous version and that of the discrete version under investigation coincide.
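The discretization point above can be made concrete with a short sketch. Everything specific here (the bit length, the quantity range, and the helper names `decode` and `ne_is_representable`) is our own illustrative assumption, not taken from the paper.

```python
def decode(bits: str, q_max: float) -> float:
    """Map a binary chromosome to a quantity on the grid [0, q_max]."""
    levels = 2 ** len(bits) - 1
    return int(bits, 2) / levels * q_max

def ne_is_representable(q_ne: float, n_bits: int, q_max: float,
                        tol: float = 1e-9) -> bool:
    """True if the continuous-game NE quantity q_ne coincides with one of
    the 2**n_bits grid points, i.e., the discrete and continuous NE match."""
    levels = 2 ** n_bits - 1
    k = round(q_ne / q_max * levels)  # nearest grid index
    return abs(k / levels * q_max - q_ne) <= tol
```

With 7-bit chromosomes on an assumed range [0, 127], for instance, the grid consists exactly of the integer quantities, so any integer NE quantity is representable while fractional ones are not.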

In all three models we investigate in this paper the assumptions of the above theorem hold, and hence there is a Nash Equilibrium in pure strategies. We investigate those models for the cases of

The first model we use is the linear model used in [

Finally, in the third model, we use a radical inverse demand function

We use two multipopulation (each player has its own population of chromosomes representing its alternative choices at any round) Coevolutionary genetic algorithms, Vriend's individual learning algorithm [

Vriend's individual learning algorithm is presented in pseudocode [

A set of strategies (chromosomes representing quantities) is randomly drawn for each player.

While Period is less than

If Period mod GArate

Each player selects one strategy. The realized profit is calculated (and the fitness of the corresponding chromosomes is defined, based on that profit).

As implied by the condition (If Period mod GArate
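The individual-learning loop described above can be sketched in Python. Everything specific below is our own assumption for the sketch, not the paper's specification: the linear inverse demand P = a − bQ with zero costs, the 7-bit encoding, and the names `garate`, `pmut`, `evolve`.

```python
import random

BITS, Q_MAX = 7, 127.0  # assumed chromosome length and quantity range

def decode(bits):
    """Map a binary chromosome to a quantity in [0, Q_MAX]."""
    return int(bits, 2) / (2 ** BITS - 1) * Q_MAX

def profit(q_i, q_total, a=256.0, b=1.0):
    """Assumed linear inverse demand with zero costs."""
    return max(a - b * q_total, 0.0) * q_i

def evolve(pop, fitness, pmut=0.01):
    """Fitness-proportional selection, one-point crossover, bit mutation."""
    total = sum(fitness)
    weights = [f / total for f in fitness] if total > 0 else None
    new = []
    while len(new) < len(pop):
        p1 = random.choices(pop, weights=weights)[0]
        p2 = random.choices(pop, weights=weights)[0]
        cut = random.randrange(1, BITS)
        child = p1[:cut] + p2[cut:]
        child = "".join(b if random.random() > pmut else str(1 - int(b))
                        for b in child)
        new.append(child)
    return new

def run(n_players=4, pop_size=20, periods=1000, garate=50):
    pops = [["".join(random.choice("01") for _ in range(BITS))
             for _ in range(pop_size)] for _ in range(n_players)]
    fits = [[0.0] * pop_size for _ in range(n_players)]
    for period in range(periods):
        # each player draws one strategy; realized profit sets its fitness
        idx = [random.randrange(pop_size) for _ in range(n_players)]
        qs = [decode(pops[i][idx[i]]) for i in range(n_players)]
        total_q = sum(qs)
        for i in range(n_players):
            fits[i][idx[i]] = profit(qs[i], total_q)
        if period % garate == 0:
            # individual learning: each player evolves only its OWN population
            for i in range(n_players):
                pops[i] = evolve(pops[i], fits[i])
    return pops
```

The defining feature of the individual-learning variant is visible in the last two lines of the loop: the GA operators act on each player's population separately.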

Coevolutionary programming is quite similar, except that the random match-ups between the chromosomes of the players' populations at a given generation end when every chromosome has participated in a game, and the populations are then updated; there is no parameter (GArate) defining the generations at which population updates take place. The algorithm, in pseudocode, is as follows [

Initialize the strategy population of each player.

Choose one strategy from the population of each player randomly, among the strategies that have not already been assigned profits. Input the strategy information to the tournament. The result of the tournament will decide profit and fitness values for these chosen strategies.

Repeat step

Apply the evolutionary operators (selection, crossover, mutation) to each player's population. Keep the best strategy of the current generation alive (elitism).

Repeat steps
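The distinguishing feature of coevolutionary programming, the match-up scheduling within one generation, can be sketched as follows. The function name and the shuffled-order approach are our own illustration of the without-replacement drawing the pseudocode describes.

```python
import random

def one_generation_matchups(pop_size, n_players):
    """Yield one match-up per round: a tuple of chromosome indices, one per
    player, drawn without replacement so that every chromosome of every
    population participates in exactly one game per generation."""
    orders = [random.sample(range(pop_size), pop_size)
              for _ in range(n_players)]
    for round_no in range(pop_size):
        yield tuple(orders[i][round_no] for i in range(n_players))
```

Once the generator is exhausted, every chromosome has a profit (and hence a fitness), and the evolutionary operators can be applied; this replaces Vriend's GArate condition.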

In our implementation, we do not use elitism. The reason is that by using only selection proportional to fitness, single (random) point crossover, and finally, mutation with fixed mutation rate for each chromosome bit throughout the simulation, we ensure that the algorithms can be classified as

In order to ensure convergence to Nash Equilibrium, we introduce the two “social” versions of the above algorithms. Vriend's multipopulation algorithm could be transformed to the following.

A set of strategies (chromosomes representing quantities) is randomly drawn for each player.

While Period is less than

If Period mod GArate

And social coevolutionary programming is defined as follows.

Initialize the strategy population of each player.

Choose one strategy of the population of each player randomly from among the strategies that have not already been assigned profits. Input the strategy information to the tournament. The result of the tournament will decide profit values for these chosen strategies.

Repeat step

Apply the evolutionary operators (selection, crossover, mutation) at the union of players' populations. Copy the chromosomes of the new generation to the corresponding player's population to form the new set of strategies.

Repeat steps

So the difference between the social and individual-learning variants is that the chromosomes are first copied into an aggregate population, and the new generation of chromosomes is formed from the chromosomes of this aggregate population. From an economic point of view, this means that the players take into account their opponents' choices when they update their sets of alternative strategies. So we have a social variant of learning, and since each player has its own population, the algorithms should be classified as “social multipopulation economic Genetic Algorithms” [
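The social update step just described can be sketched as a small function. The pooling is exactly what the pseudocode states; the redistribution rule (contiguous slices back to the players) is our own assumption, since the paper only says the offspring are copied back to the corresponding players' populations.

```python
def social_update(pops, evolve):
    """One 'social' generation: pool all players' chromosomes, apply the GA
    step to the pooled population, and copy the offspring back.

    pops   : list of per-player chromosome lists (all the same size)
    evolve : GA step (selection/crossover/mutation) acting on one list
    """
    pooled = [c for pop in pops for c in pop]   # aggregate population
    offspring = evolve(pooled)
    size = len(pops[0])
    # assumed redistribution: player i receives slice i*size .. (i+1)*size-1
    return [offspring[i * size:(i + 1) * size] for i in range(len(pops))]
```

Swapping this function in for the per-player `evolve` calls of an individual-learning loop is all that separates the two variants.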

In the single population social learning of Alkemade et al. [

Create a random initial population.

Repeat until all chromosomes have a fitness value assigned to them.

Each player selects randomly one chromosome, which determines the player's quantity for the current game.

The realized profit is calculated based on the total quantity and the price, and the fitness of the corresponding chromosome is defined, based on that profit.

Create the new generation using selection, crossover, and mutation.
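The fitness-assignment phase of this single-population scheme can be sketched as follows; the function name and the assumption that the population size is a multiple of the number of players are ours.

```python
import random

def assign_fitness(pop, n_players, payoff):
    """Play games until every chromosome of the shared population has a
    fitness value. Each round, every player draws one not-yet-played
    chromosome at random; `payoff` maps the chosen strategies to profits.
    Assumes len(pop) is a multiple of n_players."""
    fitness = [None] * len(pop)
    unplayed = list(range(len(pop)))
    random.shuffle(unplayed)
    while unplayed:
        chosen = [unplayed.pop() for _ in range(n_players)]
        profits = payoff([pop[i] for i in chosen])
        for slot, pr in zip(chosen, profits):
            fitness[slot] = pr
    return fitness
```

After this phase, one generation step (selection, crossover, mutation) is applied to the single shared population, exactly as in a canonical GA.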

It is not difficult to show that the stochastic process of each of the algorithms presented here forms a regular Markov chain [

Any fitness function that is defined on the profit of the chromosomes, whether proportional to profit, scaled, or ordered, has a value that depends solely on the chromosomes of the current population. And, since the transition probabilities of the underlying stochastic process depend only on the fitness and, additionally, the state of the chain is defined by the chromosomes of the current population, the transition probabilities from one state of the GA to another depend solely on the current state (see also [

Having a Markov chain implies that the usual performance measures—namely, mean value and variance—are not adequate for statistical inference, since the observed values in the course of the genetic algorithm are interdependent. In a regular Markov chain, however, one can estimate the limiting probabilities of the chain by estimating the components of the fixed frequency vector the chain converges to, by
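The frequency-based estimation of limiting probabilities can be illustrated on a toy chain (the two-state chain and its transition probabilities below are our own example, not from the paper): for a regular chain, the long-run fraction of time spent in each state converges to that state's stationary probability.

```python
import random
from collections import Counter

def empirical_distribution(transition, start, steps):
    """Estimate the limiting probabilities of a regular Markov chain by
    state-visit frequencies along one long sample path."""
    counts, state = Counter(), start
    for _ in range(steps):
        counts[state] += 1
        r, acc = random.random(), 0.0
        for nxt, p in transition[state].items():
            acc += p
            if r < acc:
                state = nxt
                break
    return {s: c / steps for s, c in counts.items()}

# Toy two-state chain whose stationary distribution is (2/3, 1/3):
P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
```

This is the same logic used, at a much larger state space, when the lumped-state frequencies of the GA runs are taken as estimators of the chain's limiting probabilities.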

Another solution could be the introduction of

The maximum value of

We use two variants of the three models in our simulations. One about

The

Note that the total number of iterations (number of games played) of Vriend's individual and social algorithms is

We run 300 independent simulations for each set of settings for all the algorithms, so that the test statistics and the expected time to reach the Nash Equilibrium (NE state, or first game with NE played) are estimated effectively.

Although the individual-learning versions of Vriend's and Coevolutionary programming algorithms led the estimated expected value of the average quantity (as given in (

Mean values of players' quantities in two runs of the individual-learning algorithms in the polynomial model for

Player | Vriend's algorithm | Coevol. programming |
---|---|---|
1 | 91.8309 | 77.6752 |
2 | 65.3700 | 97.8773 |
3 | 93.9287 | 93.9287 |
4 | 93.9933 | 93.9933 |

Lumped states frequencies in two runs of the individual-learning algorithms in the polynomial model for

VI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | .8725 | .0775 | .05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
CP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | .0025 | .1178 | .867 | .0127 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Mean Quantity in one execution of Vriend's individual learning algorithm in the polynomial model for

Players' quantities in one execution of Vriend's individual learning algorithm in the polynomial model for

Players' quantities in one execution of the individual-learning version of the coevolutionary programming algorithm in the polynomial model for

That significant difference between the mean values of players' quantities was observed in all simulations of the individual-learning algorithms, in all models, and in both

In the social-learning versions of the two algorithms, as well as the algorithm of Alkemade et al., both the hypotheses

The evolution of the individual players' quantities in a given simulation of Vriend's algorithm on the polynomial model (as in Figure

Players' quantities in one execution of the social-learning version of Vriend's algorithm in the polynomial model for

Notice that all the players' quantities have the same mean values (

Mean values of players' quantities in two runs of the social-learning algorithms in the polynomial model for

Player | Alkemade's | Social Vriend's | Social Coevol. | Individual Vriend's | Individual Coevol. |
---|---|---|---|---|---|
1 | 87.0320 | 86.9991 | 87.0062 | 93.7536 | 97.4890 |
2 | 87.0363 | 86.9905 | 87.0089 | 98.4055 | 74.9728 |
3 | 87.0347 | 86.9994 | 87.0103 | 89.4122 | 82.4704 |
4 | 87.0299 | 87.0046 | 86.9978 | 64.6146 | 90.4242 |

On the issue of establishing NE in some of the games played and reaching the Nash state (all chromosomes of every population equal the chromosome corresponding to the NE quantity), there are two alternative results. For one subset of the parameter sets, the social-learning algorithms managed to reach the NE state, and in a significant subset of the games played, all players used the NE strategy (these subsets are shown in Table

Parameter sets that yield NE. Holds true for all social-learning algorithms.

Models | Algorithm | pop | ||
---|---|---|---|---|
All 4-player models | Vriend | 20–40 | ||
 | Coevol | 20–40 | ||
 | Alkemade | 20–200 | ||
All 20-player models | Vriend | 20 | ||
 | Coevol | 20 | ||
 | Alkemade | 100–200 |

In the cases where the mutation probability was too large, the “Nash” chromosomes were altered significantly, and therefore the populations could not converge to the NE state (within the given iterations). On the other hand, when the mutation probability was low, the number of iterations was not enough for convergence. A larger population requires more generations to converge to the “NE state” as well. The estimators of the limiting probabilities of one representative parameter set for representative cases of the first- and second-parameter sets are given in Table

Lumped states frequencies in a run of a social-learning algorithm that could not reach NE and another that reached it. 20-player polynomial model, Vriend's algorithms,

No NE | 0 | 0 | .6448 | .3286 | .023 | .0036 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
NE | .261 | .4332 | .2543 | .0515 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Apparently, the Nash state

Markov and other statistics for NE.

Model | Algorithm | pop | | GenNE | RetTime | NEGames |
---|---|---|---|---|---|---|
4-Linear | Vriend | 30 | 10,000 | 3,749.12 | 3.83 | 5.54 |
4-Linear | Coevol | 40 | 10,000 | 2,601.73 | 6.97 | 73.82 |
4-Linear | Alkemade | 100 | 100,000 | 2,021.90 | 9.94 | 87.93 |
20-Linear | Vriend | 20 | 20,000 | 2,712.45 | 6.83 | 88.98 |
20-Linear | Coevol | 20 | 20,000 | 2,321.32 | 6.53 | 85.64 |
20-Linear | Alkemade | 100 | 100,000 | 1,823.69 | 21.06 | 87.97 |
4-poly | Vriend | 40 | 10,000 | 2,483.58 | 3.55 | 83.70 |
4-poly | Coevol | 40 | 10,000 | 2,067.72 | 8.77 | 60.45 |
20-poly | Vriend | 20 | 20,000 | 2,781.24 | 9.58 | 67.60 |
20-poly | Alkemade | 100 | 100,000 | 2,617.37 | 14.79 | 12.86 |
4-radic | Alkemade | 100 | 100,000 | 1,969.85 | 8.54 | 86.48 |
4-radic | Coevol | 40 | 10,000 | 2,917.92 | 5.83 | 73.69 |
20-radic | Vriend | 20 | 20,000 | 2,136.31 | 7.87 | 75.34 |
20-radic | Coevol | 20 | 20,000 | 2,045.81 | 7.07 | 79.58 |

We have seen that the original individual-learning versions of the multipopulation algorithms do not lead to convergence of the individual players' choices to the Nash Equilibrium quantity. On the contrary, the “socialized” versions introduced here accomplish that goal and, for a given set of parameters, make the Nash state very frequent, and games played at NE frequent as well, during the course of the simulations. The statistical tests employed showed that the expected quantities chosen by the players converge to the NE in the social-learning versions, while that convergence cannot be achieved in the individual-learning versions of the two algorithms. Therefore, it can be argued that the learning process is qualitatively better in the case of social learning. The ability of the players to take their opponents' strategies into consideration when they update their own, and to base their new choices on the totality of ideas that were used in the previous period (as in [

The stability properties of the algorithms are identified by the frequencies of the lumped states and the expected interarrival times estimated in the previous section (Table

Using these “social learning” algorithms as heuristics to discover unknown NE requires a way to distinguish the potential Nash Equilibrium chromosomes. When

By using the lumped-state measure introduced in this paper, a fruitful analysis of the evolution of the players' choices in Vriend's individual-learning algorithm and the coevolutionary programming algorithm has been achieved. Our results show that these algorithms are not expected to yield Nash equilibria: players' quantity choices do not converge to the quantities corresponding to the Nash Equilibrium, although the total quantity selected and the market price do converge to the values expected at an NE, in a stochastic or Lyapunov sense—that is, the strategies chosen fluctuated inside a region around the NE, while the expected values were equal (as shown by a series of statistical tests) to the desired value—as reported earlier [

Although the comparison between the “social learning” and the “individual learning” algorithms is evidently in favour of the former, at least in the models studied here, the comparison between the single-population algorithm of Alkemade et al. and the multipopulation “socialized” versions of the two individual-learning algorithms we have introduced has no clearly advantageous candidate. Perhaps one could argue that the multipopulation algorithms represent human learning in a better way, since human agents do have their own sets of beliefs and ideas, even if they are influenced by the ideas of others; so a population of strategies for each agent seems more accurate, and perhaps the multipopulation algorithms are more appropriate from an Agent-based Computational Economics perspective. On the other hand, a single-population algorithm is easier to implement, and sometimes faster, and thus a better candidate from an algorithmic-optimization perspective.

The effectiveness of the “social learning” algorithms allows one to treat them as heuristic algorithms for discovering an unknown Nash Equilibrium in symmetric games, provided that the parameters used are suitable and that the NE belongs to the feasible set of the chromosomes' values. If this is the case, the high frequency of the “Nash chromosome” in the populations—especially in the latest generations—of the algorithms, or the high frequency of the games played at NE, should leave no doubt about the correct value of the Nash Equilibrium quantity. Finally, the stability properties of the social-learning versions of the algorithms allow one to use them as modelling tools in a multiagent learning environment that leads to effective learning of the Nash strategy.
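One way to operationalize this heuristic reading-off of the NE is sketched below; the function, its name, and the modal-chromosome rule are our own formulation of the idea that the “Nash chromosome” dominates the late generations.

```python
from collections import Counter

def candidate_ne(history, last_k=10):
    """Read off a candidate NE chromosome from a finished run.

    history: list of generations, each a flat list of chromosomes
             (e.g., the union of all players' populations).
    Returns the modal chromosome over the last `last_k` generations
    together with its share of that pool; a share close to 1 suggests
    the populations have settled on the Nash chromosome."""
    pool = [c for gen in history[-last_k:] for c in gen]
    (chrom, count), = Counter(pool).most_common(1)
    return chrom, count / len(pool)
```

In practice one would decode the returned chromosome to a quantity and cross-check it, for example against the frequency of games actually played at that quantity.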

Paths for future research could include simulating these algorithms for different bit lengths of the chromosomes in the populations since, apparently, the use of more bits for chromosome encoding implies more feasible values for the chromosomes and, therefore, makes the inclusion of an unknown NE in these sets more probable. Another idea would be to use different models, especially models that do not have a unique NE. Finally, one could try to apply the algorithms introduced here to different game-theoretic problems.

Funding by the EU Commission through COMISEF MRTN-CT-2006-034270 is gratefully acknowledged. Mattheos Protopapas would also like to thank all the members of the COMISEF network for their helpful courses, presentations, and comments.