Punishment Effect of Prisoner Dilemma Game Based on a New Evolution Strategy Rule

We discuss the effect of punishment in the prisoner's dilemma game. We propose a new evolution strategy rule that reflects the external factors acting on both players in the evolution game. In general, when punishment exists, the D-D (defection-defection) structure (i.e., both players choose strategy D), which is the Nash equilibrium of the game, remains stable and prevents cooperation from emerging. Under the new evolution strategy rule, however, the simulations show that the D-D structure does not remain stable and that its fraction decreases during the game. In fact, punishment mainly affects the C-D (cooperation-defection) structure in the network. Once the fraction of the C-D structure reaches a certain level, punishment keeps the C-D structure stable and prevents it from transforming into the C-C (cooperation-cooperation) structure. Moreover, considering both the stability of the structures and the payoffs that individuals gain, we find that the strategy-change probability, which depends on the payoff, affects the result of the evolution game.


Introduction
Game theory is ubiquitous in nature and society, with examples ranging from the invasion of alien species to trade conflicts between two countries. However, how to resolve the contradiction between selfish individuals and social well-being, and how to maximize the benefit of the whole society, have puzzled scientists for decades. There are two classic models in game theory: the public goods game (PGG) and the prisoner's dilemma game (PDG). PGG can be used to study cooperation in games [3]. Wang et al. [5] studied the evolutionary dynamics of PGG in finite populations. Under these dynamics, players who contributed more could successfully resist invasion and invade others, which helps us understand cooperative contribution behaviors in the real world. Furthermore, Wang et al. [6] considered the effect of wealth distribution on PGG under collective risk to analyze cooperation among rich and poor individuals.
On the other hand, PDG has been used to study how to eliminate the dilemma between the individual and society [8]. Nowak and May [9] found that spatial structure could help cooperators resist invasion by defectors, which opened a new field, complex networks, for the study of game theory. In this setting, the vertices represent the individuals and the edges represent the interactions among the players. Tomassini et al. [11] used two kinds of complex network models, regular lattices and random graphs, to study the Hawk-Dove game and found that the fraction of cooperators in the network was related to the gain-to-cost ratio. Heterogeneity, one of the most important properties of complex networks, plays a very important role in the evolution game. Fu et al. [12] found that, in small-world networks, the underlying network topology could help enhance and sustain cooperative behaviors. Furthermore, Fu et al. [13] presented a punishment strategy with a high-heterogeneity property that could make the cooperators survive and wipe out the defectors. Perc and Szolnoki [15] found that the distribution of wealth and social status could promote cooperation in the evolution game. Ishibuchi and Namikawa [16] studied evolution strategies for the iterated PDG in which each player was located in a cell of a grid world. They found that the structure could help promote cooperation under random pairing. Roca et al. [18] discussed the effect of spatial structure on the evolution of cooperation. In addition to their results, they offered some new insights, such as the relation between the intensities of selection.
In this paper, we focus on the effect of the defection payoff on the evolution game. To simplify the PDG, most of the literature adopts the limit PDG model, in which the punishment is zero. Here, we take the punishment into account in the evolution game. The remainder of this paper is organized as follows. Section 2 gives the model and the strategy rule. Section 3 presents the simulation results and their explanations. Section 4 concludes the paper.

The Model and the Strategy Rule of the Evolution Game
There are only two strategies in the PDG: C (cooperation) and D (defection). Each of the two players receives a payoff according to the strategies selected. If one player chooses C and the other chooses D, the player choosing C gets the lowest payoff $S$, while the player choosing D gains the highest payoff $T$. If both players choose strategy C, each gains the payoff $R$; if both choose strategy D, each gains the payoff $P$, with $T > R > P > S$ and $2R > T + S$. We use a 2 × 2 matrix to denote the possible strategies and payoffs, where the rows give the player's own strategy (C, D) and the columns give the opponent's strategy (C, D):
$$\begin{pmatrix} R & S \\ T & P \end{pmatrix}. \tag{1}$$
To keep the analysis of the game concise, researchers usually take the limit payoff matrix, in which the element $P$ is set to zero, as introduced by Nowak and May [19]. Here, we use the normal payoff matrix:
$$\begin{pmatrix} 1 & 0 \\ b & P \end{pmatrix}. \tag{2}$$
Now we give the strategy rule for the evolution game on the lattice network: if an individual's payoff is larger than its neighbor's, it keeps its strategy; otherwise, it randomly imitates the strategy of one of the other individuals that interact with that neighbor. This evolution rule reflects the external effect on the individuals in the game, as illustrated in Figure 1. If individual E's payoff is larger than F's, E keeps its strategy unchanged. Otherwise, E randomly imitates the strategy of H, I, or G. H, I, and G can therefore be regarded as the environment of E, which does not interact with them directly. In the evolution game, the probability that the strategy changes between individuals $i$ and $j$ depends on the payoff difference [13]:
$$W = \frac{1}{1 + \exp[\beta(\pi_i - \pi_j)]}, \tag{3}$$
where $\pi_i$ and $\pi_j$ are the payoffs of individuals $i$ and $j$ and $\beta$ characterizes the noise that permits irrational choices.
In the iterated PDG, both players gradually come to choose strategy D, so (D, D) is the only Nash equilibrium of the PDG. Researchers usually set the punishment payoff to zero to simplify the evolution game. In the real world, however, the punishment payoff is not always zero. When the external effect becomes an important factor in the evolution game, the effect of punishment cannot be ignored. Moreover, punishment does not always affect the evolution game negatively.
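The strategy rule described above can be sketched in code. The following Python sketch is only an illustration under stated assumptions, not the implementation used for the simulations: it assumes payoffs of 1 for mutual cooperation, 0 for a cooperator facing a defector, `b` for a defector facing a cooperator, and `P` for mutual defection; a four-neighbor lattice with periodic boundaries; a randomly chosen comparison neighbor F; synchronous updating; and a Fermi-type switching probability whose noise parameter is called `beta` here. All function and parameter names are hypothetical. Because the rule only copies strategies that already exist on the lattice, the sketch simply evolves whatever strategy grid it is given.

```python
import math
import random

# Four-neighbour (von Neumann) neighbourhood; an assumption of this sketch.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def payoff(grid, x, y, b, P):
    """Accumulated payoff of site (x, y) against its four neighbours.

    Assumed payoffs: C vs C -> 1, C vs D -> 0, D vs C -> b, D vs D -> P.
    """
    n = len(grid)
    total = 0.0
    for dx, dy in NEIGHBOURS:
        other = grid[(x + dx) % n][(y + dy) % n]  # periodic boundary
        if grid[x][y] == "C":
            total += 1.0 if other == "C" else 0.0
        else:
            total += b if other == "C" else P
    return total


def switch_probability(own, other, beta):
    """Fermi-type probability of abandoning the current strategy (assumed form)."""
    return 1.0 / (1.0 + math.exp(beta * (own - other)))


def next_strategy(grid, x, y, b, P, beta, rng):
    """Strategy of individual E = (x, y) after one update."""
    n = len(grid)
    dx, dy = rng.choice(NEIGHBOURS)
    fx, fy = (x + dx) % n, (y + dy) % n  # randomly chosen neighbour F
    own, other = payoff(grid, x, y, b, P), payoff(grid, fx, fy, b, P)
    if own > other:
        return grid[x][y]  # E earns more than F: keep the strategy
    if rng.random() >= switch_probability(own, other, beta):
        return grid[x][y]  # the probabilistic switch did not fire
    # Imitate one of F's other neighbours (H, I, or G), never E itself.
    others = [((fx + ex) % n, (fy + ey) % n) for ex, ey in NEIGHBOURS]
    others = [site for site in others if site != (x, y)]
    cx, cy = rng.choice(others)
    return grid[cx][cy]


def step(grid, b, P, beta, rng):
    """One synchronous update of the whole lattice."""
    n = len(grid)
    return [[next_strategy(grid, x, y, b, P, beta, rng) for y in range(n)]
            for x in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Random initial grid chosen purely for illustration.
    grid = [[rng.choice("CD") for _ in range(10)] for _ in range(10)]
    for _ in range(5):
        grid = step(grid, b=1.35, P=0.3, beta=0.1, rng=rng)
    print(sum(row.count("C") for row in grid) / 100.0)
```

The key design point is that E never copies F itself; it copies one of F's other neighbors, which is how the sketch models the "environment" that E does not interact with directly.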

Main Result and Simulations
In this section, we present simulations on a lattice network of size $N = 50 \times 50$, with final time step $t = 400$ and with all individuals choosing strategy D in the initial network. We run 100 independent simulations and average their data for Figures 2-12. It is easy to see from the model that $b$ in the payoff matrix affects the result of the evolution game. Therefore, we discuss the two cases $1 < b < 1.5$ and $1.5 < b < 2$ separately.
First, we take $b = 1.35$, $P = 0.3$, and $\beta = 0.1$. For $b = 1.35$, under the evolution strategy rule the game evolves to an equilibrium state in which cooperators and defectors alternate on the lattice. When the punishment element $P$ exists, an individual will not choose strategy C: because the others choose strategy D, the one selecting strategy C gains no payoff, whereas two players who both select strategy D still receive payoffs. In the PDG, both players choosing strategy D is precisely the dilemma of the game. Generally, punishment helps the D-D structure remain stable. Figure 3 shows that, compared with $P = 0$, the number of cooperators in the network increases slowly at the beginning. As the game proceeds, the fraction of cooperators for $P = 0.3$ becomes larger than that for $P = 0$. Common sense, however, suggests that the result of the game for $P = 0$ should be better than that for $P = 0.3$. Why does this unusual situation happen? If one individual on the lattice selects strategy C, its neighbors get the maximum payoff. The individuals connected to these neighbors then have smaller payoffs than the neighbors and, according to the evolution rule, may select strategy C. In addition, the probability of changing strategy depends on the individual's payoff. Owing to the punishment element $P$, the probability of strategy D transforming into strategy C is larger according to formula (3), while the probability of strategy C changing into strategy D is correspondingly smaller. So the C-D structure formed for $P = 0.3$ is more stable than that for $P = 0$, which is why the percentage of cooperators for $P = 0.3$ exceeds that for $P = 0$ at the end of the game. We thus find that, under the evolution strategy rule, the D-D structure cannot stop cooperation from spreading in the network.
For a fixed $b$, different values of $P$ affect the evolution game. Here, let $b = 1.35$, $\beta = 0.1$, $P_1 = 0.3$, and $P_2 = 0.5$. As in the analysis above, Figure 3 shows that, for $P_2 = 0.5$, the fraction of cooperators increases more slowly and eventually reaches a higher level than for $P_1 = 0.3$. Therefore, punishment does not always have a negative effect. Under some particular evolution rules, punishment can help promote cooperation.
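The argument about a single cooperator can be checked with simple payoff arithmetic. The sketch below assumes a four-neighbor lattice and payoffs of 1 for mutual cooperation, 0 for a cooperator facing a defector, `b` for a defector facing a cooperator, and `P` for mutual defection; the function name and dictionary keys are hypothetical. It only tabulates the accumulated payoffs around one cooperator in an otherwise all-D lattice and does not simulate the dynamics.

```python
def lone_cooperator_payoffs(b, P, degree=4):
    """Accumulated payoffs around one cooperator in an all-D lattice."""
    return {
        # The cooperator meets only defectors and earns nothing.
        "cooperator": 0.0,
        # A defector next to the cooperator exploits it once and is
        # punished by its remaining (degree - 1) defecting neighbours.
        "adjacent_defector": b + (degree - 1) * P,
        # A defector with no cooperating neighbour earns P from each link.
        "distant_defector": degree * P,
    }


if __name__ == "__main__":
    for P in (0.0, 0.3, 0.5):
        print(P, lone_cooperator_payoffs(b=1.35, P=P))
```

Under these assumptions, a defector adjacent to the cooperator out-earns a more distant defector by b - P, so the distant defector loses the payoff comparison with its neighbor and may, through the rule, copy strategy C.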
Next, for $1.5 < b < 2$, the result of the evolution game differs from that for $1 < b < 1.5$. The cooperators can form triangle clusters to resist the defectors' invasion efficiently and can expand further in the network [21,22]. So the cooperators can break the equilibrium state and hold a clear advantage at the end of the game. Let $b = 1.55$, $\beta = 0.1$, $P_1 = 0.3$, and $P_2 = 0.5$. Figure 4 shows that, for $P_2 = 0.5$, the fraction of cooperators increases at first, and once the percentage of cooperators reaches a certain level, its growth slows down. Furthermore, at the end of the evolution game, the percentage of cooperators is smaller than that for $P_1 = 0.3$.
We illustrate this with Figures 5-7. Because of the evolution rule, if someone chooses strategy C, the individuals who are neighbors of the cooperator's neighbors may also select strategy C. Therefore, the D-D structure cannot remain stable: it transforms into the C-D structure and its fraction drops markedly as the game proceeds. We can also see in Figure 8 that, for higher $P$, the fraction of the D-D structure decreases faster and further. As the number of cooperators in the network grows, the C-D structure increases and, accordingly, the C-C structure also increases. Because the cooperators can form the triangle structure to resist the defectors' invasion and the clusters of cooperators can expand, the fraction of the C-C structure increases. Judging from the rate at which the cooperators increase in Figures 6 and 7, the C-D structure grows quickly. After reaching its peak, the C-D structure transforms quickly into the C-C structure and then decreases as the evolution game proceeds. The effect of the C-C clusters is thereby enhanced, which is why the fraction of the C-C structure increases as the game proceeds. Under the effect of punishment, the C-D structure decreases more slowly for $P_2 = 0.5$ than for $P_1 = 0.3$. For higher $P$, the C-D structure remains stable more easily. This means that the punishment $P$ holds back the clusters of cooperators and keeps the C-D structure stable. Therefore, the fraction of cooperators for $P_1 = 0.3$ is larger than that for $P_2 = 0.5$ at the end of the evolution game. In conclusion, under this particular evolution rule, punishment affects a specific strategy structure. The data on the change in the fraction of the C-D structure on the lattice network are given in Table 1.
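The fractions of the C-C, C-D, and D-D structures discussed here can be measured directly from a strategy grid. The sketch below is a minimal illustration that assumes each "structure" is a pair of nearest-neighbor players on a square lattice with periodic boundaries; the function name `pair_fractions` and the grid encoding ("C" and "D" characters) are hypothetical.

```python
def pair_fractions(grid):
    """Fractions of C-C, C-D, and D-D links on a periodic square lattice."""
    n = len(grid)
    counts = {"C-C": 0, "C-D": 0, "D-D": 0}
    for x in range(n):
        for y in range(n):
            # Look only right and down so that every link is counted once.
            for nx, ny in (((x + 1) % n, y), (x, (y + 1) % n)):
                first, second = sorted(grid[x][y] + grid[nx][ny])
                counts[first + "-" + second] += 1
    total = 2 * n * n  # two links per site on a periodic square lattice
    return {kind: count / total for kind, count in counts.items()}


if __name__ == "__main__":
    grid = [["D"] * 4 for _ in range(4)]
    grid[1][1] = "C"  # a single cooperator touches four C-D links
    print(pair_fractions(grid))
```

Recording these three fractions at every time step of a simulation would give curves of the kind compared across the two punishment values.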
From the above analysis, we find that a change in the payoff an individual gains affects the strategy-change probability. We therefore turn to the effect of this probability, which we control through the parameter $\beta$ in formula (3). Here, for $b = 1.55$ and $\beta = 0.0015$, let $P_1 = 0.3$ and $P_2 = 0.5$. When the parameter $\beta$ becomes small, the probability of changing strategy also becomes small. In Figure 8, for $P_1 = 0.3$, the fraction of cooperators increases first and is larger than that for $P_2 = 0.5$ at the end of the game. Figures 9-11 show that, for $P_1 = 0.3$, the C-D and C-C structures increase first and the D-D structure decreases first. Then the C-D structure for $P_2 = 0.5$ reaches a higher peak; after that, its fraction decreases less than for $P_1 = 0.3$, and the fraction of the C-C structure for $P_1 = 0.3$ is almost always larger than that for $P_2 = 0.5$. Comparing with Figures 4-7, we can see the difference. In Figure 8, the fraction of cooperators increases slowly for higher $P$, which is in contrast to the situation in Figure 4. Why does this difference arise? With a smaller probability, strategies change less often, which helps the D-D structure remain stable. Under the evolution strategy rule, however, the D-D structure cannot remain stable and transforms into the C-D structure. In other words, the smaller probability weakens the effect of the evolution rule to some degree. Figures 9-11 also show that punishment mainly affects the fraction of the C-D structure. For higher $P$, the C-D structure gains more and remains more stable. In Figure 12, we find that the larger $\beta$ is, the larger the fraction of cooperators is. Moreover, as $\beta$ increases, the fraction of cooperators in the network increases faster, which implies that the probability affects the result of the evolution game.
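How strongly the noise parameter changes the switching probability can be illustrated numerically. The sketch below assumes a Fermi-type form for the probability, 1 / (1 + exp(-beta * gap)), where `gap` is the payoff advantage of the player being imitated; this form, the name `beta`, and the sample payoff gaps are assumptions made for illustration and are not taken from the simulations reported here.

```python
import math


def switch_probability(payoff_gap, beta):
    """Assumed Fermi-type probability of copying a player who earns
    `payoff_gap` more than oneself."""
    return 1.0 / (1.0 + math.exp(-beta * payoff_gap))


if __name__ == "__main__":
    # Compare the two noise values used in this section.
    for beta in (0.1, 0.0015):
        row = [round(switch_probability(gap, beta), 4) for gap in (0.5, 1.0, 2.0)]
        print(beta, row)
```

For small values of beta times the gap, this expression is approximately 1/2 + beta * gap / 4, so under this assumed form a smaller beta pulls the probability of copying a better-earning player down toward 1/2 and makes it nearly independent of the payoffs.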

Conclusion
In this paper, we have discussed the effect of punishment on the evolution game on a lattice. We proposed an evolution strategy rule that reflects external factors. Under this rule, we find that punishment affects the evolution game. Punishment can help the number of cooperators increase at first, which is contrary to the common expectation that the D-D structure remains stable. In fact, the D-D structure cannot be stable under the evolution rule. Moreover, punishment affects the result of the evolution game through the C-D structure. For higher $P$, once the C-D structure reaches its peak, it remains more stable and decreases less. Besides the payoff that the players gain, we also find that the strategy-change probability influences the evolution game: the larger the probability is, the larger the fraction of cooperators becomes and the faster it increases.

Figure 7: The solid line represents the fraction of D-D structure in the network with $b = 1.55$ and $\beta = 0.1$ for $P_1 = 0.3$ and the dashed line represents that for $P_2 = 0.5$.

Figure 10: The solid line represents the fraction of C-D structure in the network with $b = 1.55$ and $\beta = 0.0015$ for $P_1 = 0.3$ and the dashed line represents that for $P_2 = 0.5$.

Figure 11: The solid line represents the fraction of D-D structure in the network with $b = 1.55$ and $\beta = 0.0015$ for $P_1 = 0.3$ and the dashed line represents that for $P_2 = 0.5$.

Table 1: The data about the change in the fraction of C-D structure in the network.