Evolving Ecological Social Dilemmas: A Spatial Individual-Based Model for the Evolution of Cooperation with a Minimal Number of Parameters

Cooperation, both intraspecific and interspecific, is a well-documented phenomenon in nature that is not well understood. Evolutionary game theory is a powerful tool to approach this problem. However, it has important limitations. First, very often it is not obvious which game is more appropriate to use. Second, in general, identical payoff matrices are assumed for all players, a situation that is highly unlikely in nature. Third, slight changes in these payoff values can dramatically alter the outcomes. Here, I use an evolutionary spatial model in which players do not have a universal payoff matrix, so no payoff parameters are required. Instead, each is equipped with random values for the payoffs, fulfilling the constraints that define the game(s). These payoff matrices evolve by natural selection. Two versions of this model are studied. First is a simpler one, with just one evolving payoff. Second is the “full” version, with all the four payoffs evolving. The fraction of cooperator agents converges in both versions to nonzero values. In the case of the full version, the initial heterogeneity disappears and the selected game is the “Stag Hunt.”


INTRODUCTION
Mutualism, symbiotic relations, and altruistic behavior are ubiquitous in nature [1,2]. How cooperative behavior evolved among self-interested individuals, both between and within species, is an important open question in behavioral and evolutionary ecology. Evolutionary game theory [3,4,5] is a powerful tool to analyze this issue. In particular, 2 × 2 games, that is, games in which two players each choose between two alternatives, to cooperate (C) or to defect (D), are useful to model very different individuals, from viruses [6] to humans [7,8]. Their application to diverse ecological issues is therefore quite common [9,10]. In these games, the payoff of a "player" depends on its own strategy and on that of its coplayer. If it plays C, it gets either the "reward" for mutual cooperation R or the "sucker's payoff" S, depending on whether its opponent plays C or D, respectively. A D move, in contrast, yields the "temptation" to defect T or the "punishment" for mutual defection P, depending on whether its opponent plays C or D, respectively. Of special interest are the games that involve a tradeoff or dilemma between cooperation and competition. This happens either when neither of the two pure strategies (C or D) is dominant (a dominant strategy provides a player a larger payoff than any other regardless of the strategy of its opponent) or when one is dominant but not optimal (there is at least one other combination of strategies in which both players can be better off). The paradigmatic example is the Prisoner's dilemma (PD), which corresponds to T > R > P > S. Clearly, it pays more to defect (T > R and P > S), but the dilemma is that if both play D, they get P, which is worse than the reward R they would have got if they had both played C.
The PD is connected with two other social dilemma games [11,12]. When the damage from mutual defection is increased so that it finally exceeds the damage suffered by being exploited, that is, T > R > S > P, the new game is called the chicken [13]. This game thus applies to situations in which mutual defection is the worst possible outcome for both players, as happens in most animal contests. On the other hand, when the reward surpasses the temptation, that is, R > T > P > S, the game becomes the Stag Hunt (SH) [14]. The name of the game derives from a metaphor invented by the French philosopher Jean-Jacques Rousseau: two hunters can either jointly hunt a stag or individually hunt a rabbit. Hunting stags is quite challenging and requires mutual cooperation. If either of them hunts a stag alone, the chance of success is minimal.
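The three rankings above can be turned into a small classifier. The following is a minimal sketch, not part of the original model; the function name `classify_game` and the returned labels are illustrative choices, and ties between payoffs are deliberately reported as no dilemma.

```python
def classify_game(T, R, P, S):
    """Return the social dilemma matching a strict payoff ranking, or None."""
    if T > R > P > S:
        return "PD"       # Prisoner's dilemma
    if T > R > S > P:
        return "chicken"  # mutual defection is the worst outcome
    if R > T > P > S:
        return "SH"       # Stag Hunt: reward exceeds temptation
    return None           # ties, or a ranking that is none of the three


# Illustrative payoff values (hypothetical, chosen only to show each ranking).
print(classify_game(T=5, R=3, P=1, S=0))  # PD
print(classify_game(T=5, R=3, P=0, S=1))  # chicken
print(classify_game(T=3, R=5, P=1, S=0))  # SH
```

Note that in all three games both R and T exceed both P and S, which is the social dilemma condition used later for the full version of the model.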
There are several animal behaviors that have been described as stag hunts, for example, the coordination of slime molds [15]. When individual amoebae of Dictyostelium discoideum are starving, they aggregate to form one large body. If they all act together, they can successfully reproduce; however, the success depends on the cooperation of many individuals. The hunting technique of orcas called "carousel feeding" [16] is another example of an SH. By acting cooperatively, orcas manage to isolate schools of fish, corral them to the surface, and stun the fish by hitting them with their tails.
It is indeed not easy to determine the ranking of the payoff values that explains the results of experiments or field observations. On the contrary, in the case of animals there is frequently controversy over whether the PD or the chicken is the appropriate game [17,18]. Likewise, many circumstances that have been described as a PD might also be interpreted as an SH, depending on how fitness is calculated [14]. Moreover, experimental studies indicate that the payoff matrix is not constant even for very simple individuals like viruses [19]. On the theoretical side, a problem facing the modeler is that, for a given game (a specific rank ordering of the four payoffs), changes in the payoff values that preserve this ranking often modify the results qualitatively [20]. The outcomes also vary considerably with the parameters that determine (i) characteristics of the agents, such as the size of their memory [21,22,23], their ability to distinguish cooperators from cheaters [24], and so forth; (ii) the kind of strategies available [25,26]; and (iii) the topology of the spatial structure, when territoriality is taken into account [20].
Here my goal is to minimize the dependence of crucial model predictions, like the evolution of cooperation, on the above parameters. Hence, I propose an evolutionary spatial game theoretical model with a minimal number of parameters. To avoid introducing quantities that parameterize the agents' memory or their complex strategies, I consider the simplest possible agents: players that act unconditionally against their neighbors. That is, at each generation or time step t of the game, there are those who always play C and those who always play D. The model has no payoff parameters as inputs. Instead, it starts with an initial heterogeneous spatial distribution of social dilemma payoff matrices. I assume that an individual's payoff matrix reflects its phenotype, so it evolves together with its strategy. As a result, the model gives rise, by natural selection itself, to "equilibrium" payoff matrices. Besides taking into account the heterogeneity of individuals in the real world, this makes it possible to overcome the twin difficulties of the empirical determination of payoff parameters and the high sensitivity of models to their values.
Following the principle of parsimony, I begin by studying a simpler version of this model, in which only the temptation T evolves and the other three payoffs are parameters kept fixed. Then I analyze the "full" version, in which all four payoffs are evolving variables (i.e., with no payoff parameters). This introduces remarkable and unexpected changes. First, the natural selection process yields a homogeneous or almost homogeneous distribution of payoff matrices. Second, these payoff matrices correspond, in the great majority of cases, to the SH game.

THE MODEL
The two-dimensional artificial world is divided into cells representing agents with only two strategies: to cooperate (C) or to defect (D). Square lattices of L × L cells, with L ranging from 100 to 500 and periodic boundary conditions, are used. At t = 0, random values in the interval [0,1] are assigned to the four payoffs of each cell, fulfilling the ranking of payoffs that defines the game. In the initial configuration for strategies, half of the cells, chosen at random, play C and the other half play D. Two types of neighborhoods are considered: the von Neumann neighborhood (with z = 4 neighbor cells) and the Moore neighborhood (with z = 8 neighbor cells). The score U of a given player is the sum of the payoffs it collects against its neighbors in these bimatrix games [5] (each player has its own payoff matrix). The dynamics is synchronous: all the agents update their states simultaneously at the end of each lattice sweep. In order to mimic natural selection, I use the simplest "Imitate the Best" (in the neighborhood) update rule [27]: each individual, after playing against its neighbors, adopts the strategy and payoff matrix of the most successful neighbor (the one that collected the highest score U in the neighborhood in this round). For each generation t, I compute the average fraction of cooperators $\langle c \rangle$ and the averages of the payoffs $\langle T \rangle$, $\langle R \rangle$, $\langle P \rangle$, and $\langle S \rangle$ until the steady state is reached (typically, this takes between 500 and 1000 generations). The symbols $\langle \cdot \rangle$ denote averages that are both spatial, over all the lattice cells, and over 500 runs, each starting from a different initial configuration (to ensure independence of the initial conditions).
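The update rule described above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions rather than the code used for the reported results: the function names are hypothetical, payoff matrices are stored as tuples `(T, R, P, S)`, and two details the text leaves open are fixed by assumption, namely that an agent keeps its own state unless some neighbor scored strictly higher, and that ties among neighbors are broken by list order.

```python
def neighbors(i, j, L, moore=False):
    """Periodic-boundary neighbors: z = 4 (von Neumann) or z = 8 (Moore)."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if moore:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [((i + di) % L, (j + dj) % L) for di, dj in offsets]


def payoff(own, other, matrix):
    """Payoff of a player, read from its OWN matrix (T, R, P, S)."""
    T, R, P, S = matrix
    if own == "C":
        return R if other == "C" else S
    return T if other == "C" else P


def generation(strategy, matrix, moore=False):
    """One synchronous sweep: everyone plays, then everyone imitates the best."""
    L = len(strategy)
    cells = [(i, j) for i in range(L) for j in range(L)]
    # Score U: sum of the payoffs collected against all z neighbors.
    score = {
        (i, j): sum(
            payoff(strategy[i][j], strategy[a][b], matrix[i][j])
            for a, b in neighbors(i, j, L, moore)
        )
        for i, j in cells
    }
    # New states are written to copies so that the update is simultaneous.
    new_strategy = [row[:] for row in strategy]
    new_matrix = [row[:] for row in matrix]
    for i, j in cells:
        # Assumption: the focal cell is listed first, so it wins ties.
        candidates = [(i, j)] + neighbors(i, j, L, moore)
        a, b = max(candidates, key=score.get)
        new_strategy[i][j] = strategy[a][b]
        new_matrix[i][j] = matrix[a][b]
    return new_strategy, new_matrix


def cooperator_fraction(strategy):
    """Spatial average of the fraction of C players."""
    flat = [s for row in strategy for s in row]
    return flat.count("C") / len(flat)


# A lone defector with T = 1 among cooperators (R = 1/2, S = P = 0), L = 5.
L = 5
strategy = [["C"] * L for _ in range(L)]
strategy[2][2] = "D"
matrix = [[(1.0, 0.5, 0.0, 0.0)] * L for _ in range(L)]
strategy, matrix = generation(strategy, matrix)
print(cooperator_fraction(strategy))  # 0.8
```

In this toy run the defector scores 4 while its four neighbors score 1.5, so they copy it and the cooperator fraction drops from 24/25 to 20/25. A dictionary of scores keyed by cell keeps the imitation step a one-line `max`, at the cost of speed; for lattices of the sizes quoted above, an array-based implementation would be preferable.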
Two versions of the model are studied. (1) Simplest version (one evolving payoff game): as a first step, let us consider the version in which only the temptation T varies in [0,1] and the other three payoffs are fixed: R = 1/2 and S = P = 0. Since P = S, when T > 0.5 (T < 0.5) the game lies on the frontier between the PD and the chicken (on the frontier of the SH). At t = 0, T is a uniform random variable. (2) Full version (four evolving payoffs game): next I concentrate on the case in which the four payoffs are evolving variables, always satisfying the social dilemma condition that R and T are both higher than P and S (this condition holds for all the L × L payoff matrices in the initial configuration and, of course, for the final payoff matrices, which are a subset of the initial set). At t = 0, the three dilemmas are equiprobable. This version also includes an additional refinement to avoid dependence on the number z of neighbor cells: at each lattice sweep, each agent chooses just one of its neighbors to play against (so a given agent plays at least once and at most z + 1 times per generation).
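The initial payoff assignment for the two versions can be sketched as follows. The text does not say how the ranking constraint is enforced; this sketch assumes that four independent uniform values are drawn and sorted, and that the three dilemmas are picked with equal probability. The names `RANKINGS`, `simplest_payoffs`, and `random_dilemma` are illustrative.

```python
import random

# Payoff names listed from largest to smallest for each dilemma.
RANKINGS = {
    "PD": ("T", "R", "P", "S"),
    "chicken": ("T", "R", "S", "P"),
    "SH": ("R", "T", "P", "S"),
}


def simplest_payoffs(rng):
    """Simplest version: only T is random; R = 1/2 and S = P = 0 are fixed."""
    return {"T": rng.random(), "R": 0.5, "P": 0.0, "S": 0.0}


def random_dilemma(rng):
    """Full version: pick a dilemma, then rank four uniform draws accordingly.

    Assumption: the ranking is imposed by sorting independent draws.
    """
    game = rng.choice(sorted(RANKINGS))
    values = sorted((rng.random() for _ in range(4)), reverse=True)
    return game, dict(zip(RANKINGS[game], values))


rng = random.Random(2024)  # fixed seed only to make the sketch reproducible
game, payoffs = random_dilemma(rng)
print(game, payoffs)
```

Because R and T always receive the two largest draws, every matrix generated this way satisfies the social dilemma condition by construction.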

One evolving payoff game
In the simplest version, the average temptation $\langle T \rangle$ and the average fraction of cooperators $\langle c \rangle$ evolve from $\langle T \rangle_0 = (0 + 1)/2 = 0.5$ and $\langle c \rangle_0 = 0.5$, respectively, to their steady-state values as shown in Figure 1. Notice the kind of "specular symmetry," with respect to a horizontal line, between the curves of $\langle T \rangle(t)$ and $\langle c \rangle(t)$. There is a short transient in which $\langle T \rangle$ ($\langle c \rangle$) grows (drops) very quickly, and then it decreases (increases) until it reaches its asymptotic value. The spatial patterns that emerge offer relevant information. For instance, Figure 2(a) represents a typical steady-state map showing clusters of C agents (white) on a "sea" of D agents (black). In Figure 2(b), all the agents that have a temptation T < 1/2 = R are marked in gray. Notice that they are a subset of the C agents. In other words, all the agents that were selected with low values of T are cooperators (the converse is not true: many C agents have values of $T \geq 1/2$). As a result of selection, the system evolves from an initial configuration with L × L different payoff matrices (one per lattice cell) to a situation in which far fewer matrices coexist: starting with 100 × 100 = 10 000 payoff matrices, one typically ends with around 500.
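By means of a mean-field approximation, which neglects all the spatial correlations, the average individual score $\langle U \rangle$ can be approximated by an expression, equation (1), that depends only on $\langle T \rangle$ and $\langle c \rangle$. Substituting in (1) the computed values of $\langle T \rangle$ and $\langle c \rangle$, one gets a value that slightly overestimates $\langle U \rangle$. For example, for z = 4, the mean-field estimate is approximately 0.47, while $\langle U \rangle \simeq 0.43$.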
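For orientation, one way to write such a mean-field estimate is sketched below. It is a reconstruction from the model definition, not a transcription of the source's equation (1), and its normalization may differ. It assumes that every one of the z neighbors of an agent is a cooperator with probability $\langle c \rangle$, independently of the agent's own strategy, and that each agent's temptation is replaced by the lattice average $\langle T \rangle$; the symbol $U_{\mathrm{MF}}$ is introduced here only for illustration.

```latex
% General mean-field score over z pairwise games
U_{\mathrm{MF}} \simeq z \left[ \langle c \rangle^{2} R
    + \langle c \rangle \left( 1 - \langle c \rangle \right) S
    + \left( 1 - \langle c \rangle \right) \langle c \rangle \langle T \rangle
    + \left( 1 - \langle c \rangle \right)^{2} P \right]

% Simplest version: R = 1/2 and S = P = 0
U_{\mathrm{MF}} \simeq z \, \langle c \rangle
    \left[ \tfrac{1}{2} \langle c \rangle
    + \left( 1 - \langle c \rangle \right) \langle T \rangle \right]
```

The four terms are the probabilities of the pairings CC, CD, DC, and DD multiplied by the payoff the focal agent collects in each. Neglecting the clustering of cooperators visible in the spatial maps is what makes an estimate of this kind approximate.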

Full version
In the case of the full version, it takes a little more algebra, but the average initial values of the four payoffs can also be computed analytically: $\langle T \rangle_0 = 11/15$, $\langle R \rangle_0 = 2/3$, $\langle P \rangle_0 = 1/3$, and $\langle S \rangle_0 = 4/15$. From these values they evolve to converge, on average, to the SH game. Figure 3 illustrates the evolution of the averages of the four payoffs. Their asymptotic values are $\langle R \rangle \simeq 0.96 > \langle T \rangle \simeq 0.85 > \langle P \rangle \simeq 0.59 > \langle S \rangle \simeq 0.36$. In this case, there is initially much more freedom in the choice of payoffs than in the simplest version. However, contrary to what one would expect, the effect of natural selection is more drastic: it eliminates all but a very few payoff matrices. The average payoff $\langle U \rangle$ can now be approximated by a second mean-field expression, equation (2). It turns out that the agreement between the estimate produced by (2) and $\langle U \rangle$ is better than for the simplest version: the estimate is approximately 0.758 and $\langle U \rangle \simeq 0.749$. This smaller difference can be attributed to the greater homogeneity of the steady state. This uniformity is also manifest in Figure 5, which shows the histogram of utilities per pairwise game recorded for 500 generations. The high peaks at 1 correspond to patches of cooperators, where R = 1 is a frequent payoff (these patches cover, on average, 2/3 of the lattice, and in many runs the whole lattice).
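The initial averages quoted above can be checked with exact arithmetic. This sketch assumes that each payoff matrix is built by sorting four independent uniform draws on [0,1] and assigning them according to the ranking of a dilemma chosen with probability 1/3; under that assumption the k-th largest draw has mean (5 - k)/5. The names `ORDER_MEANS`, `RANKINGS`, and `initial_averages` are illustrative.

```python
from fractions import Fraction

# Mean of the k-th largest of four uniform draws on [0, 1]: 4/5, 3/5, 2/5, 1/5.
ORDER_MEANS = [Fraction(5 - k, 5) for k in range(1, 5)]

# Payoff names listed from largest to smallest for each dilemma.
RANKINGS = {
    "PD": ("T", "R", "P", "S"),
    "chicken": ("T", "R", "S", "P"),
    "SH": ("R", "T", "P", "S"),
}


def initial_averages():
    """Average each payoff over the three equiprobable dilemmas."""
    totals = {name: Fraction(0) for name in "TRPS"}
    for ranking in RANKINGS.values():
        for name, mean in zip(ranking, ORDER_MEANS):
            totals[name] += mean
    return {name: total / len(RANKINGS) for name, total in totals.items()}


for name, value in initial_averages().items():
    print(name, value)
# T 11/15
# R 2/3
# P 1/3
# S 4/15
```

For example, T is the largest draw in the PD and the chicken and the second largest in the SH, so its initial average is (4/5 + 4/5 + 3/5)/3 = 11/15, in agreement with the value stated in the text.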

CONCLUSIONS
The simplest version of this evolving payoffs model produces the evolution of cooperation. Starting from $\langle T \rangle_0 = 0.5 = R$ and $\langle c \rangle_0 = 0.5$ (= R), it selects, on average, higher values of the temptation to cheat and, thus, lower values of the fraction of cooperators. The steady state consists of a rich structure of several "patches" of agents using the same payoff matrix.
The full version brings interesting changes. The fraction of cooperators is larger than for the simplest version (well above 0.5). This is correlated with a reward that, on average, is close to 1. A remarkable finding is that a homogeneous or almost homogeneous state of payoff matrices is selected. Even though the agents are unconditional players without memory, the surviving payoff matrix (or matrices) evolves by natural selection to an SH game. I considered only evolution by natural selection. This eliminates the initial heterogeneity of payoff matrices, which reflects asymmetries in the interactions between individuals. By taking mutations into account, these asymmetries can be recovered.
All the results are quite robust and do not depend on particular payoffs choices, nor on the lattice topology, and do not rely on specific characteristics of the agents.The dependence on the initial conditions is also mostly removed by taking averages.
Concerning the biological situations to which this model might apply, one can envisage different microorganism ecosystems in which the process of evolution can be directly observed and can lead to adjustments in the dose of cooperation/competition, for example, viruses [19], viral quasispecies [28], bacteria [29], and so forth. Furthermore, at a different level, the results of this model serve to reinforce the importance of the SH game, since it is obtained as the product of natural selection.

Figure 4 illustrates the drastic effect of selection in the full version for a typical run: starting with 100 × 100 = 10 000 payoff matrices, only three different ones emerge!

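Figure 3: Full version: the evolution of the averages of the four payoffs, $\langle R \rangle$ (Δ), $\langle T \rangle$ (o), $\langle P \rangle$ (∇), and $\langle S \rangle$ (+), and of $\langle c \rangle$ (filled line).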

Figure 4: Full version: typical histograms for steady-state payoffs. Only 3 different payoff matrices survive. Insets in the histograms for R and T are zooms of the right peak.

Figure 5: Histograms of utilities per pairwise game for 500 generations and L = 300 for the full version.