Agent-Based Modeling and Genetic Algorithm Simulation for the Climate Game Problem

The cooperative game of global temperature lacks automaticity and emotional jamming. To address this issue, an agent-based modelling method is developed based on Milinski’s noncooperative game experiments. In addition, a genetic algorithm is used to improve the investment strategy of each agent. Simulations are carried out by designing different coding schemes, mutation schemes, and fitness functions. It is demonstrated that the method can achieve maximum benefits under the premise of the agent noncooperative game by encouraging optimal individuals. The results provide a sound basis for developing tools and methods to support the simulation of climate game strategies that involve multiple stakeholders.


Introduction
Climate change is a global issue that must be addressed by taking into account various factors in society. Because it is a complex social game, a large number of participants must make an effort to prevent global climate variation. Global climate cooperation has been shown to be necessary in the climate game [1]. Comprehensive research on the climate game remains a challenging topic.
Global greenhouse gas (GHG) emissions have been growing greatly [2, 3]. Humankind faces a dramatic change of living conditions on the earth when the already rising global temperature passes a certain threshold [4-8]. To reduce the risk of dangerous climate change, the main GHG-emitting countries need to be brought into a "climate coalition," which delivers ambitious emission reductions at the least cost [9]. At the same time, a broad coalition faces a strong incentive to "free ride" [10]. Therefore, GHG emission is not a problem for a single country. In other words, no country can solve the global climate change problem by acting alone. States have to cooperate in order to address the threat of climate change.
Although there are a few game-theoretical works on climate change, in general social dilemma situations and the public goods game are perfectly fine models for the problem too, and these games have a very rich history as agent-based simulations. The most famous mathematical metaphor for a social dilemma is the prisoner's dilemma [11, 12]. Other well-studied models include public goods games [13, 14], which essentially represent a generalization of the pairwise prisoner's dilemma to interactions in groups of arbitrary size [15, 16]. Foremost, there is the issue of reward and punishment, which has recently been studied extensively in order to understand how these two basic social forces may avert the dilemma [17-20]. There are also works concerning the critical mass, conditional strategies, population density, heterogeneity, and interdependent networks in social dilemmas. These subjects have been studied extensively in the very recent past, and their implications for the resolution of social dilemmas such as the climate change dilemma as agent-based models are very significant [21-25]. Most recently, the shift to agent-based modeling has also been highlighted for human bargaining [26], which is clearly relevant to the climate change dilemma, as its players (countries, nations) have to agree on a certain policy that they will then carry out.
In essence, the global climate game is a process of competition between participants with different game strategies [27-31]. They must balance the relationship between economic development and environmental protection by finding the Nash equilibrium between these two interests. In other words, only when both the optimality condition and the Nash equilibrium are satisfied will climate cooperation be the most stable and efficient form of international cooperation.
Climate protection programs that appeal to a human sense of fairness, that is, each player contributes a "fair share" to the collective goal, are more likely to avoid irrational self-detrimental behavior [32]. Milinski et al. [33] proposed the collective-risk social dilemma as a framework for investigating the inherent problems of avoiding dangerous climate change, and performed simulations to study the game. In Milinski's experiment, students invest anonymously, with each student being informed of the cumulative investment sum after each round. Under such circumstances, the trade-off between personal benefits and group interests provides a basis for each student to form an investment scheme. Moreover, students can learn from a successful scheme to make better decisions in the next round. However, this strategy, in which a computer runs the dice program, is a kind of random investment strategy. A stochastic process is one whose behavior is nondeterministic, and each subsequent investment is determined by a random element. Therefore, it is more difficult to obtain the optimal investment strategy.
All those works can be linked to voluntary contribution games with the possibility of punishment. A complete treatment of the complex game process should combine several models, algorithms, and experimental applications in order to make it clear and systematic. This is also the next stage of this work.
This paper aims to develop an evolutionary model and simulation strategy to improve Milinski's investment strategy. An investment strategy using a genetic algorithm (GA) with agent-based modeling is established based on the noncooperative game experiments in [33]. First, agents are used to represent climate game players in order to develop the game model. Then, we use GA for investment strategy simulation. The GA investment strategy is specifically designed to support the study of the multiparticipant climate change game using computational modeling, simulation, programming, and running. GA is a population-based biomimetic evolutionary method for solving various complex decision-making problems [34-36]. This heuristic is routinely used to generate useful solutions for optimization and search
Mathematical Problems in Engineering

The meaning of p = 0.5 and p = 0.1 is that the probabilities of losing all their money are 50% and 10%, respectively, if €120 is not reached.
The meanings of other marks are explained as follows: (i) $W_k$ represents the investment amount of the kth agent; (iii) $M_m$ represents the total remaining saving in the account of the mth agent; (iv) r, R are the random variables; (v) p represents the punishment probability; (vi) i, j, m, and k are the loop variables.
In initialization, we set the population size to 20 and randomly generate 20 individuals as the initial population to represent 20 investment schemes. For later comparison, we save the 20 generated investment schemes into the corresponding records in the established database.

Random Investment
The random investment subroutine is shown in Figure 2, where each agent is provided €40 at the beginning. After that, the random subroutine runs to simulate the investment activities of the six agents for 10 rounds and then transfers the generated data to the punishment subprogram. Finally, the investment results and the incomes of each agent are stored in the database.
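The random investment step can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the function names `random_investment` and `summarize` are hypothetical, and the per-round choices of 0, 2, or 4 are taken from the gene values mentioned in the mutation description.

```python
import random

ENDOWMENT = 40        # euros given to each agent at the start
ROUNDS = 10           # investment rounds per game
N_AGENTS = 6          # agents in one group
CHOICES = (0, 2, 4)   # allowed investment per round (assumed from the gene values)


def random_investment(rng):
    """Return one random 10-round investment scheme per agent."""
    return [[rng.choice(CHOICES) for _ in range(ROUNDS)] for _ in range(N_AGENTS)]


def summarize(schemes):
    """Return per-agent invested totals, per-agent remaining savings, and the group sum."""
    invested = [sum(scheme) for scheme in schemes]
    remaining = [ENDOWMENT - w for w in invested]
    return invested, remaining, sum(invested)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
schemes = random_investment(rng)
invested, remaining, total = summarize(schemes)
print("cumulative investment sum:", total)
print("remaining savings before punishment:", remaining)
```

The remaining savings computed here are the values that would then be handed to the punishment step.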

GA Investment
By running the GA investment subroutine for each agent, a new investment scheme can be generated, from which the total amount of money can be obtained. Then, the personal benefit for each agent after experiencing the risk of losing all the remaining money can be calculated. The investment scheme for each agent, the total amount of money that each agent invested, and the cumulative investment sum among all these agents will be saved into the database. As such, there are 21 records of investment in the database.
Figure 3 shows the process of the simple GA investment subroutine. R is a randomly generated real number, and the condition R ≥ 0.9 implies that the mutation probability is 0.1. The process is explained as follows.
Step 1. Rank the fitness values of the investment schemes which are decoded as individuals in GA in descending order.
Step 2. If the random number R ≥ 0.9, the best individual would mutate using single-point mutation; otherwise, go to Step 3.
Step 3. If the random number R < 0.9, there is no mutation in the game.The best individual is selected as the new one directly.

Individual Representation
As discussed earlier in this paper, a database is established for each agent to save its investment record, which includes its investment quota and its remaining money in each round. In this way, the cumulative investment sum over the agents can be obtained. For instance, an investment record of 2220044220 for the kth agent during 10 rounds means that the agent invests €2 in each of the first three rounds, €0 in the fourth and fifth rounds, and so on. The total amount of money invested by the kth agent over the 10 rounds is therefore €18. If the cumulative investment sum among the agents during 10 rounds is €110, and the kth agent is chosen to be punished since the target sum €120 has not been achieved, the kth agent will lose its remaining €22. In this way, we put the data 2220044220, 22, and 110 in the database for the kth agent.
The integer coding method is used in GA, with each individual representing the investment quotas in 10 rounds for each agent. An individual can be 2220044220 for the kth agent in the example mentioned in the last paragraph.
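The integer coding can be illustrated with a short Python sketch that decodes the example record 2220044220 and reproduces the totals quoted above. The helper names `decode` and `encode` are hypothetical and only serve this illustration.

```python
ENDOWMENT = 40  # euros each agent starts with


def decode(record):
    """Turn a 10-digit record string into a list of per-round investments."""
    return [int(ch) for ch in record]


def encode(scheme):
    """Turn a list of per-round investments back into a record string."""
    return "".join(str(gene) for gene in scheme)


scheme = decode("2220044220")
invested = sum(scheme)            # total invested over the 10 rounds
remaining = ENDOWMENT - invested  # savings left before any punishment

print(scheme)     # [2, 2, 2, 0, 0, 4, 4, 2, 2, 0]
print(invested)   # 18
print(remaining)  # 22
```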

Fitness Function
Individuals in GA are evaluated via the fitness function. Since the goal is to maximize both personal benefits and the cumulative investment sum, the fitness function is designed as follows:

$$f_i = \alpha M_{ki} + (1 - \alpha) \sum_{k=1}^{6} W_k. \tag{3.1}$$

In the equation, k refers to the agent index, and i refers to the investment scheme index. $f_i$ denotes the fitness value of the ith scheme for the kth agent, $M_{ki}$ refers to the remaining saving of the kth agent via its ith scheme, that is, the profit left after punishment, which is applied with 90% probability if the target sum has not been reached, and $\sum_{k=1}^{6} W_k$ means the cumulative investment sum from all agents involved. The weighting coefficient α ∈ [0, 1] reflects the balance between individual benefits and the cumulative investment sum.
All the investment schemes are ranked in descending order in terms of fitness value. Consequently, the best investment scheme with the largest fitness value can be obtained, and it can be directly established as the designated scheme for the next game or established after an appropriate adjustment.
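The ranking step can be sketched as follows. The sketch assumes a linear weighting in which α multiplies the agent's remaining saving and (1 − α) multiplies the cumulative investment sum, which is how the surrounding text describes the balance; the record layout and the second example record are hypothetical.

```python
def fitness(remaining_saving, cumulative_sum, alpha):
    """Linear weighting of personal benefit and group investment (assumed form)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * remaining_saving + (1.0 - alpha) * cumulative_sum


def rank_schemes(records, alpha):
    """Sort (scheme, remaining_saving, cumulative_sum) records by descending fitness."""
    return sorted(records, key=lambda r: fitness(r[1], r[2], alpha), reverse=True)


# First record is the example from the text; the second is made up for contrast.
records = [
    ("2220044220", 22, 110),
    ("4444444444", 0, 130),
]

best_group = rank_schemes(records, alpha=0.0)[0]     # only the group sum counts
best_personal = rank_schemes(records, alpha=1.0)[0]  # only personal savings count
print(best_group[0], best_personal[0])
```

With α = 0 the scheme attached to the larger cumulative sum ranks first, and with α = 1 the scheme that leaves the agent more savings ranks first, which matches the role the text assigns to α.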

Mutation
In the GA investment subroutine, mutation is performed to adjust the best investment scheme with a probability of 0.1. Specifically, a real number R within [0, 1] is randomly generated. If R < 0.9, the best investment scheme is directly established as the designated scheme for the next game; otherwise, the mutation is performed.
A site-based mutation method is adopted, which can be described as follows: randomly select a gene from the individual and replace it with a randomly generated number 0, 2, or 4, so that a new scheme is obtained. The mutation probability can be 0.1 or another value between 0 and 1. Through the GA investment subroutine, an agent can get a recommended option as a reference scheme.
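The mutation step described above can be sketched as follows. This is a hedged illustration rather than the authors' code: `single_point_mutation` and `next_scheme` are hypothetical names, and the replacement gene is drawn uniformly from 0, 2, and 4, so it may occasionally equal the gene it replaces.

```python
import random

GENES = (0, 2, 4)  # allowed per-round investments


def single_point_mutation(scheme, rng):
    """Return a copy of the scheme with one randomly chosen gene redrawn."""
    mutated = list(scheme)
    site = rng.randrange(len(mutated))  # position of the gene to replace
    mutated[site] = rng.choice(GENES)   # may coincide with the old value
    return mutated


def next_scheme(best, rng, mutation_prob=0.1):
    """Keep the best scheme, or mutate it when R >= 1 - mutation_prob."""
    r = rng.random()  # R in [0, 1)
    if r >= 1.0 - mutation_prob:
        return single_point_mutation(best, rng)
    return list(best)


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
best = [2, 2, 2, 0, 0, 4, 4, 2, 2, 0]
print(next_scheme(best, rng))
```

Returning a copy rather than editing the scheme in place keeps the stored best record unchanged, so it can still be compared against the new candidate.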

Punishment
According to [28], the subroutine of random investment with punishment is shown in Figure 4. After 10 rounds, if the cumulative investment sum reaches or exceeds €120, each agent keeps the surplus in its account. Otherwise, the punishment subroutine is entered. The routine punishes all agents by throwing dice, so that each agent has a 90%, 50%, or 10% probability of losing its surplus. This case is discussed as follows.
In the punishment subroutine, for every agent, the computer produces a random number R, and if R is greater than p (p = 0.9, 0.5, or 0.1), the step $M_m = 0$ will be skipped, and the surplus money in the account will be saved. Otherwise, all surplus money in the account will be confiscated.
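A minimal Python sketch of the punishment rule is given below. It assumes that the €120 target applies to the group's cumulative investment sum, as in the worked example with a cumulative sum of 110; the function name `punish` and its parameters are hypothetical.

```python
import random


def punish(remaining, cumulative_sum, p, rng, target=120):
    """Apply the punishment rule to a list of per-agent remaining savings.

    If the target is reached, every agent keeps its savings. Otherwise each
    agent draws R and keeps its savings only when R > p.
    """
    if cumulative_sum >= target:
        return list(remaining)
    result = []
    for m in remaining:
        r = rng.random()                   # R in [0, 1)
        result.append(m if r > p else 0)   # R <= p: savings are confiscated
    return result


rng = random.Random(7)  # fixed seed only to make the sketch repeatable
savings = [22, 18, 30, 12, 26, 22]
print(punish(savings, cumulative_sum=110, p=0.9, rng=rng))
```

Each agent draws its own R, so under p = 0.5 or p = 0.1 some agents in the same group can keep their savings while others lose them.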

Fitness Calculation and Sorting
This stage involves calculating the fitness values of the 21 individuals in the database, ranking these individuals in terms of fitness value, and abandoning the one with the minimum fitness value, so that 20 records of investment remain in the database. The procedure then moves to the next round, until the given number of games (i.e., 100) is completed.
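The bookkeeping of this stage (add the new record, rank all 21, drop the worst) can be sketched as follows. The dictionary keys and the function name `update_population` are hypothetical, and the fitness is assumed to be the linear weighting of remaining saving and cumulative investment sum with coefficient α.

```python
def update_population(records, new_record, alpha, size=20):
    """Add one record, rank by descending fitness, and keep the best `size`."""
    def fitness(rec):
        # Assumed linear weighting of personal saving and group investment.
        return alpha * rec["remaining"] + (1.0 - alpha) * rec["cumulative"]

    pool = records + [new_record]          # 21 records; inputs are not modified
    pool.sort(key=fitness, reverse=True)   # best record first
    return pool[:size]                     # the record with minimum fitness is dropped


# Twenty made-up records with cumulative sums 100..119 and no remaining savings.
population = [{"remaining": 0, "cumulative": 100 + i} for i in range(20)]
candidate = {"remaining": 0, "cumulative": 130}

population = update_population(population, candidate, alpha=0.0)
print(len(population), population[0]["cumulative"], population[-1]["cumulative"])
```

Repeating this update once per game keeps the database at 20 records while gradually replacing weak schemes with fitter ones.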

Simulations and Results
The results of the simulation include three parts. First, we obtained variation curves of the total remaining savings in all six agents' accounts under different losing probabilities p and weighting coefficients α. Then we obtained variation curves of the cumulative investment sum under varying p and α. Finally, we identified the relationship between the variables and the parameters.

Parameter Set 1
We first carry out the experiments to study the total remaining savings and the cumulative investment sum with p = 0.9 and varying α. By implementing the experiment with different values of the parameters, variation curves of total remaining savings in all six agents' accounts are shown in Figures 5(a)-5(d), where p denotes the punishment probability if the target sum is not reached, and α refers to the weighting coefficient in the equation.
There are two observations from the above figures under the 90% treatment: (1) the total remaining savings for all the six agents in a group decrease in response to the growth of the weighting coefficient α; (2) the total remaining savings remain at €120 when α = 0.47. Consequently, the total remaining savings increase when coefficient α declines, showing a bias towards the group interests. On the other hand, personal benefits are more inclined to be accomplished, and the total remaining savings decrease when a higher value of α is used. The group interests and personal benefits are well balanced when α = 0.47, where the total remaining saving remains at €120 and the Nash equilibrium is achieved. Variation curves of the cumulative investment sum among all the six agents in a group under different values of α in the 90% treatment are illustrated in Figure 6.
Two observations can be obtained from Figure 6: (1) the cumulative investment sum among all the six agents in a group decreases in response to the growth of the weighting coefficient α; (2) the cumulative investment sum remains at €120 when α = 0.47. The results indicate that the fitness function designed in (3.1) inclines toward the group interests, and that the cumulative investment sum increases as α decreases. The results also show a bias towards personal benefits, and the cumulative investment sum decreases when a higher value of α is used. When α = 0.47, the group interests and personal benefits are both well taken care of, and the cumulative investment sum is about €120.
Although the experiment is in the 90% treatment (i.e., p = 0.9), the above conclusions are also applicable for cases in the 50% treatment (i.e., p = 0.5) and the 10% treatment (i.e., p = 0.1).

Parameter Set 2
We then carry out the experiments to study the total remaining savings and the cumulative investment sum with p = 0.5 and varying α. If the target sum €120 is not reached, an agent will risk losing all their remaining money with a probability of either 0.9, 0.5, or 0.1. Results on the total remaining saving among all the six agents in a group under different values of α in the 50% treatment (i.e., p = 0.5) are obtained through our experiment, as shown in Figures 7(a)-7(c). Those on the cumulative investment sum are illustrated in Figure 8.

Parameter Set 3
We further carry out the experiments to study the total remaining savings and the cumulative investment sum with p = 0.1 and varying α. Results on the total remaining savings and the cumulative investment sum among all the six agents in a group under different values of α in the 10% treatment (i.e., p = 0.1) are shown in Figures 9 and 10. It can be concluded that the cumulative investment sum goes down, and that the total remaining saving increases when loss probability increases. This indicates that a country needs to invest more for recovery if dangerous climate change occurs and causes great damage.

Conclusion
An agent-based evolutionary model and a GA-based solution strategy are proposed in this paper. Based on the principle of maximizing individual and collective interests, linear weighting is used for the GA fitness function, and a coding scheme and a mutation operator are designed for the GA evolutionary optimization strategy. The simulation experiments with groups of six agents show that the approach can achieve maximum benefits under the premise of the agent noncooperative game by encouraging optimal individuals. The results provide a solid basis for studying climate game strategy using multiagent modelling and simulation. This approach also has the potential to simulate experiments that involve a large amount of data.

Figure 1: The main program based on GA.

Figure 5: (a) The total remaining saving when p = 0.9 and α = 0. (b) The total remaining saving when p = 0.9 and α = 0.2. (c) The total remaining saving when p = 0.9 and α = 0.47. (d) The total remaining saving when p = 0.9 and α = 0.8.

Figure 6: The cumulative investment sum under different α.

Figure 7: (a) The total remaining saving in the 50% treatment when α = 0.2. (b) The total remaining saving in the 50% treatment when α = 0.47. (c) The total remaining saving in the 50% treatment when α = 0.8.

Figure 8: The cumulative investment sum in the 50% treatment.

Figure 9: (a) The total remaining saving in the 10% treatment when α = 0.2. (b) The total remaining saving in the 10% treatment when α = 0.47. (c) The total remaining saving in the 10% treatment when α = 0.8.

Figure 10: The cumulative investment sum in the 10% treatment.