Coevolution of Artificial Agents Using Evolutionary Computation in Bargaining Game

Analysis of the bargaining game using evolutionary computation is an essential issue in game theory. This paper investigates the interaction and coevolutionary process among heterogeneous artificial agents that use evolutionary computation (EC) in the bargaining game. In particular, we study how the agents' payoffs develop through their interaction and coevolution. We present three kinds of EC-based agents (EC-agents) participating in the bargaining game: genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE) agents. The agents' performance is compared under changing conditions. The simulation results show that the PSO-agent outperforms the other agents.


Introduction
Current research on the bargaining game is based on the established theoretical models of Ståhl [1] and Rubinstein [2]. Game theorists, economists, psychologists, and computer scientists have analyzed the underlying bargaining phenomenon, which can be applied to e-commerce [3], negotiation problems [4], and dispute resolution [5], to name a few. The game appears very simple, but its results are ambiguous and controversial.
Over the past few years, a considerable number of studies have modeled the bargaining game with artificial agents interacting within a homogeneous population. However, very few studies have examined interaction within a heterogeneous population. Matwin et al. designed a negotiation support system (NSS) that addresses multiple issues through populations of rules (classifiers) learned by means of a GA, thereby supporting a two-party bargaining game [6]. Page et al., using an evolution strategy, proposed a generalized adaptive dynamics framework that can handle games in which the payoff is not differentiable [7]. van Bragt and La Poutré formulated bargaining strategies as finite automata coevolved by a genetic algorithm, which learn to discriminate between different opponents without any information about the identity or preferences of their counterparts [8]. Takadama et al. suggested three learning bargaining models based on evolution strategy (ES), a learning classifier system (LCS), and reinforcement learning (RL), and evaluated heterogeneous-population interactions [9]. Zhong et al. showed that artificial agents using an RL strategy can evolve against fixed and rotating rules with better performance [10]. Cooper et al. further used the RL strategy to observe the relative speeds of learning by proponents and respondents [11]. Grosskopf studied the combined effect of RL and directional learning (DL) strategies, comparing the results of a one-shot bargaining game with one proponent and varying respondents, and showed that the strategies can coevolve [12].
The above studies have focused on validating artificial agent models and have compared the results of homogeneous-population interactions. Such studies are conservative, however, because real-world bargaining involves many kinds of agents whose behaviors have diverse propensities and tendencies, which a homogeneous population cannot represent.

Advances in Multimedia
In this paper, we study the interactions of agents in a heterogeneous population. We conducted experiments in which three kinds of evolutionary computation based agents play the bargaining game. From these experiments we identify the principal parameters and how much they affect the results of the bargaining game. We also analyze the action patterns of the artificial agents according to their strategies. In particular, bargaining games among EC-agents were conducted to observe their interaction and coevolution.
This paper is organized as follows: Section 2 briefly reviews the sequential bargaining game. Section 3 outlines the design of the artificial agents. Section 4 describes the coevolution model among EC-agents. The simulation results are presented in Section 5. Finally, Section 6 concludes the paper with some remarks.

Sequential Bargaining Game
The sequential bargaining game is a game in which two players divide a fixed sum. According to game theory, the bargaining game has infinitely many Nash equilibria, and in the subgame perfect equilibrium the last proponent offers the counterpart ε, the smallest nonzero quantity, and the respondent always accepts this minimal proposal because any ε is better than nothing. Experimental evidence, however, contradicts this prediction: proponents tend to offer their counterparts more than noncooperative game theory predicts, and respondents reject small offers. A respondent's rejection of a low offer can be seen as punishment. Page et al. reported that "some 60∼80% of proponents offer fractions between 0.4 and 0.5, and only 3% offer less than 0.2. They are well advised to do this-indeed, some 50% of respondents reject any split offering them less than one-third of the sum" [7, 13–15]. The discrepancy between game theory and experimental data seems to result from the notion of fairness and the absence of common knowledge of rationality [16–18]. Recently, extensive studies have analyzed the bargaining game using artificial agents [19–21].
A brief review follows. Before that, however, we define the following terms for clarity.
(i) Payoff: the reward an agent receives from the game.
(ii) Control parameter: a parameter of an EC-agent that can affect the agent's performance in the game.
(iii) Sequential game: a game composed of multiple rounds.

Artificial Agent Models
In this section, we relate the underlying bargaining game phenomenon to simulation models. The game begins by randomly assigning, with equal probability, the roles of proponent and respondent. At round $t$, the proponent chooses a proposal $p_t$, a real number between 0 and 10, which is the amount the proponent offers to pay the respondent. The respondent chooses a minimal acceptable demand $d_t$, also a real number between 0 and 10. If the proposal is at least the demand, that is, $p_t \ge d_t$, the proponent earns $10 - p_t$ and the respondent earns $p_t$. If the proposal is rejected, that is, $p_t < d_t$, the two players exchange roles and the game proceeds to round $t + 1$. Finally, if the players fail to reach a deal in the last round ($t = 5$ in our experiments), each player earns nothing.
We introduce three kinds of artificial agents that evolve strategies using a genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE). Each of these ECs starts from an arbitrarily initialized population of trial solutions that evolves toward better solutions by means of that EC's operators.
Figure 1 shows the representation of an EC-agent, which is variously referred to as a solution, strategy, vector, or position. In a bargaining game, it matters whether a player begins as the proponent or the respondent, so each strategy is composed of two vectors: the first encodes the strategy used when the EC-agent is the first proponent, and the second the strategy used when it is the first respondent. When the agent is the first proponent, the first row is used as its strategy; otherwise, the second row is used.
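The round-by-round rules above can be made concrete with a short Python sketch. It assumes, for illustration only, that each player enters a game with a list of five values, one per round, read as a proposal when the player is the proponent in that round and as a minimal demand when it is the respondent; the function name `bargain` is hypothetical.

```python
TOTAL = 10.0     # fixed sum to be divided
MAX_ROUNDS = 5   # the deal fails if there is no agreement by round 5


def bargain(a_values, b_values, total=TOTAL, max_rounds=MAX_ROUNDS):
    """Play one game in which player A is the first proponent.

    Assumption (hypothetical encoding): a_values[t] and b_values[t] are the
    players' values in round t, used as a proposal or a demand depending on
    the role held in that round. Returns (payoff_A, payoff_B).
    """
    for t in range(max_rounds):
        a_proposes = (t % 2 == 0)  # roles are exchanged after every rejection
        if a_proposes:
            proposal, demand = a_values[t], b_values[t]
        else:
            proposal, demand = b_values[t], a_values[t]
        if proposal >= demand:  # deal: respondent receives the proposal
            if a_proposes:
                return total - proposal, proposal
            return proposal, total - proposal
    return 0.0, 0.0  # no deal after the last round: both earn nothing


print(bargain([4.0] * 5, [2.0] * 5))              # (6.0, 4.0): accepted in round 1
print(bargain([1, 9, 1, 9, 1], [9, 1, 9, 1, 9]))  # (0.0, 0.0): never accepted
```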
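A minimal sketch of this two-row representation follows. Because Figure 1 is not reproduced in the text, it assumes that each row holds one real value in [0, 10] per bargaining round, giving a 2 × 5 array; `random_strategy` and `active_row` are hypothetical helper names.

```python
import numpy as np

N_ROUNDS = 5            # assumption: one gene per bargaining round
LOW, HIGH = 0.0, 10.0   # proposals and demands lie in [0, 10]


def random_strategy(rng):
    """Row 0: used when the agent starts as proponent; row 1: as respondent."""
    return rng.uniform(LOW, HIGH, size=(2, N_ROUNDS))


def active_row(strategy, is_first_proponent):
    """Select the row (hypothetical helper) that drives the agent in this game."""
    return strategy[0] if is_first_proponent else strategy[1]


rng = np.random.default_rng(0)
s = random_strategy(rng)
print(s.shape)  # (2, 5)
```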

GA-Agent Model.
Genetic algorithm (GA) is a search algorithm based on the mechanics of natural selection, that is, survival of the fittest [22]. GA operators consist of selection, crossover, and mutation. The fitness of each individual solution is measured by the payoff the GA-agent earns in the bargaining game.
In the GA-agent, we use tournament selection, arithmetic crossover, and mutation as GA operators. Tournament selection picks two solutions at random from the current population and keeps the better of the two as the winner [23]. In arithmetic crossover, each gene of the offspring is the average of the corresponding genes of the two parents. For mutation, we reinitialize genes, that is, a mutated gene receives a new random value, as at initialization. Figure 2 shows the evolution process of a GA-agent.
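The three GA operators described above might be sketched as follows. This is an illustrative sketch, not the authors' code: individuals are assumed to be flat lists of genes in [0, 10] (for example, the two strategy rows concatenated), the generational replacement in `next_generation` is an assumption, and the fitness in the usage lines is a placeholder rather than a bargaining payoff.

```python
import random

LOW, HIGH = 0.0, 10.0


def tournament_select(population, fitness, rng):
    """Binary tournament: draw two solutions at random, keep the fitter one."""
    i, j = rng.sample(range(len(population)), 2)
    return population[i] if fitness[i] >= fitness[j] else population[j]


def arithmetic_crossover(parent1, parent2):
    """Each offspring gene is the average of the two parents' genes."""
    return [(g1 + g2) / 2.0 for g1, g2 in zip(parent1, parent2)]


def reinit_mutation(genes, rate, rng):
    """Mutation by re-initialization: a mutated gene gets a fresh random value."""
    return [rng.uniform(LOW, HIGH) if rng.random() < rate else g for g in genes]


def next_generation(population, fitness, crossover_rate, mutation_rate, rng):
    """Assumption: the whole population is replaced by offspring each generation."""
    offspring = []
    for _ in range(len(population)):
        p1 = tournament_select(population, fitness, rng)
        p2 = tournament_select(population, fitness, rng)
        if rng.random() < crossover_rate:
            child = arithmetic_crossover(p1, p2)
        else:
            child = list(p1)
        offspring.append(reinit_mutation(child, mutation_rate, rng))
    return offspring


rng = random.Random(1)
population = [[rng.uniform(LOW, HIGH) for _ in range(10)] for _ in range(30)]
fitness = [sum(ind) for ind in population]  # placeholder fitness, illustration only
offspring = next_generation(population, fitness, 0.9, 0.05, rng)
```

The rates 0.9 and 0.05 in the usage lines are the values the experiments later identify as best for the GA-agent.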

PSO-Agent Model.
Particle swarm optimization (PSO) is a metaheuristic that optimizes a problem by iteratively improving candidate solutions, called particles, which move around the search space according to simple mathematical formulae for updating each particle's position and velocity [24]. Each particle's movement is influenced by its own best known position and is also guided toward the best known positions in the search space, which are updated as other particles find better positions.
The PSO algorithm is initialized by placing a population of individuals randomly in the search space, and it searches for an optimal solution by updating them over successive iterations. In each iteration, the velocity and position of each particle are updated according to its previous best position $p_{best,i,j}$ and the best position found by its neighbors $g_{best,i,j}$ as follows:
$$v_{i,j}(t+1) = w\,v_{i,j}(t) + c_1 r_1 \left(p_{best,i,j} - x_{i,j}(t)\right) + c_2 r_2 \left(g_{best,i,j} - x_{i,j}(t)\right),$$
$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1),$$
where $i$ is the index of particles in the swarm, $j$ is the index of positions in the particle, $t$ is the iteration number, $v_{i,j}(t)$ is the velocity of the $i$th particle, and $x_{i,j}(t)$ is its position. Note that $c_1$ and $c_2$ are positive acceleration constants, $r_1$ and $r_2$ are random numbers uniformly distributed between 0 and 1, and $w$ is the inertia weight.
In [25], it was shown that good convergence can be ensured by making the acceleration constants and the inertia weight depend on each other, a relation expressed through a single intermediate parameter. In the PSO-agent, we use the original version of PSO with this intermediate parameter. Figure 3 shows the evolution process of a PSO-agent.
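The velocity and position updates can be written as one vectorized NumPy step. This sketch assumes a global-best neighborhood (each particle's neighbors are the whole swarm), clamps velocities to $V_{max}$ = search space/5 = 2 as in the experiments, and clips positions to [0, 10]; the values of w, c1, and c2 in the usage lines are placeholders, not the values the paper derives from its intermediate parameter [25].

```python
import numpy as np

LOW, HIGH = 0.0, 10.0
V_MAX = (HIGH - LOW) / 5  # search space / 5 = 2, as in the experiments


def pso_step(x, v, p_best, g_best, w, c1, c2, rng):
    """One synchronous velocity/position update for the whole swarm.

    x, v, p_best: arrays of shape (n_particles, n_dims).
    g_best: shape (n_dims,); assumption: one global best shared by all particles.
    """
    r1 = rng.random(x.shape)  # uniform in [0, 1), drawn per particle and dimension
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    v_new = np.clip(v_new, -V_MAX, V_MAX)   # velocity clamping to V_max
    x_new = np.clip(x + v_new, LOW, HIGH)   # assumption: positions kept in [0, 10]
    return x_new, v_new


rng = np.random.default_rng(0)
x = rng.uniform(LOW, HIGH, size=(30, 10))
v = np.zeros_like(x)
p_best = x.copy()
g_best = x[0]  # hypothetical: pretend particle 0 is currently the best
x2, v2 = pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=rng)
```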

DE-Agent Model.
Differential evolution (DE) is a metaheuristic that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. In DE, an initial group of solution vectors is first generated randomly. These solution vectors are then updated through three processes: making a trial vector, crossover, and replacement. A trial vector combines existing vectors from the population as follows [26]:
$$v_{i,t+1} = x_{r_1,t} + F\,(x_{r_2,t} - x_{r_3,t}),$$
where $x_{r_1,t}$, $x_{r_2,t}$, and $x_{r_3,t}$ are randomly selected solutions in the current population and $F$ is a real positive coefficient. In replacement, if a candidate solution produced by crossover is better than the present solution, the present solution is replaced by the candidate.
In the DE-agent, we use a standard version of DE with uniform crossover. Figure 4 shows the evolution process of a DE-agent. A candidate vector $u_{i,t+1}$ is generated by uniform crossover between a randomly selected solution $x_{i,t}$ in the current population and the trial vector $v_{i,t+1}$:
$$u_{i,j,t+1} = \begin{cases} v_{i,j,t+1}, & \text{if } \mathrm{rand}() \le CR,\\ x_{i,j,t}, & \text{otherwise,} \end{cases}$$
where rand() is a random number between 0 and 1 and $CR$ is the probability of crossover.
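One DE generation (trial vector, uniform crossover with probability CR, and greedy replacement) might look like the sketch below. Assumptions not stated in the text: the three random solutions are distinct and differ from the solution being updated, trial vectors are clipped to [0, 10], and the score in the usage lines is a placeholder fitness; `de_generation` is a hypothetical name.

```python
import numpy as np

LOW, HIGH = 0.0, 10.0


def de_generation(pop, fitness_fn, F, CR, rng):
    """One DE generation: trial vector, uniform crossover, replacement.

    pop: array of shape (n, d). fitness_fn: higher is better (e.g., average payoff).
    """
    n, d = pop.shape
    fit = np.array([fitness_fn(ind) for ind in pop])
    new_pop = pop.copy()
    for i in range(n):
        # assumption: three distinct solutions, none equal to solution i
        others = [k for k in range(n) if k != i]
        r1, r2, r3 = rng.choice(others, size=3, replace=False)
        trial = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), LOW, HIGH)
        mask = rng.random(d) <= CR               # uniform crossover with probability CR
        candidate = np.where(mask, trial, pop[i])
        if fitness_fn(candidate) > fit[i]:       # replace only if the candidate is better
            new_pop[i] = candidate
    return new_pop


rng = np.random.default_rng(0)
pop = rng.uniform(LOW, HIGH, size=(30, 10))
score = lambda ind: -float(np.sum((ind - 5.0) ** 2))  # placeholder fitness: closeness to 5
new_pop = de_generation(pop, score, F=0.7, CR=0.5, rng=rng)
```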

Figure 5 (flowchart of the coevolution model): Set parameters, that is, initialize the populations of EC1-agents and EC2-agents and set current round = 1 and current iteration = 1. Evaluate EC1: each EC1-agent plays the bargaining game against every EC2-agent, once as first proponent and once as first respondent, and its fitness is the average over all games. Apply the EC1-agent operations. Evaluate EC2: each EC2-agent is evaluated in the same way as an EC1-agent.

Coevolution Model
The coevolution model between two EC-agents in the bargaining game is presented in Figure 5. After the solution groups of the two kinds of EC-agents are randomly generated, each group is evaluated and evolved step by step. When one group of solutions is evaluated, all solutions of the other group serve as counterparts in the bargaining game. Each player plays the bargaining game twice against each counterpart, once beginning as the proponent and once beginning as the respondent. The fitness value of a solution is then calculated by averaging its payoffs over all games. For example, when each population contains 30 solutions, two games are played against each counterpart (one beginning as the proponent, one beginning as the respondent), yielding 60 payoffs in total; their sum divided by 60 is the fitness of the solution.
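The fitness calculation above can be sketched in Python. It assumes 2 × 5 strategy arrays (row 0 used as first proponent, row 1 as first respondent) with one value per round, and the alternating-role rules of Section 3; `bargain` and `fitness` are hypothetical names.

```python
import numpy as np

TOTAL, MAX_ROUNDS = 10.0, 5


def bargain(a, b, total=TOTAL, max_rounds=MAX_ROUNDS):
    """A is first proponent; roles swap after each rejection. Returns (pay_A, pay_B)."""
    for t in range(max_rounds):
        a_proposes = t % 2 == 0
        proposal, demand = (a[t], b[t]) if a_proposes else (b[t], a[t])
        if proposal >= demand:
            return (total - proposal, proposal) if a_proposes else (proposal, total - proposal)
    return 0.0, 0.0


def fitness(strategy, opponents):
    """Average payoff of one 2 x 5 strategy over two games per opponent."""
    payoffs = []
    for opp in opponents:
        # game 1: our agent begins as proponent (row 0); opponent uses its respondent row (row 1)
        payoffs.append(bargain(strategy[0], opp[1])[0])
        # game 2: opponent begins as proponent (its row 0); our agent uses row 1
        payoffs.append(bargain(opp[0], strategy[1])[1])
    return sum(payoffs) / len(payoffs)  # e.g., 30 opponents -> 60 payoffs


strategy = np.array([[4.0] * 5, [2.0] * 5])
opponent = np.array([[3.0] * 5, [3.0] * 5])
print(fitness(strategy, [opponent]))  # (6.0 + 3.0) / 2 = 4.5
```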

Experimental Results
This section presents experimental results for the adaptive EC-agents. Each EC-agent has control parameters that affect its performance: in a GA-agent, the crossover and mutation probabilities; in a PSO-agent, the intermediate parameter and the maximum velocity; in a DE-agent, the coefficient of the trial-vector formula and the crossover probability. We examined how variations of these parameters affect the experimental results.
In order to observe the coevolution among EC-agents in a bargaining game, three experiments on GA-agent versus PSO-agent, GA-agent versus DE-agent, and PSO-agent versus DE-agent were conducted.

Experimental Environment.
In order to create an experimental environment, we set the simulation parameters as follows: (i) population size: 30; (ii) maximum iteration: 10,000; (iii) maximum round in bargaining game: 5; (iv) number of counterparts: 30 (entire population).

Experiments with a Single EC-Agent.
In this experiment, each EC-agent plays the bargaining game against a fixed group of counterpart solutions in order to determine its optimal control parameters.

GA-Agent.
The control parameters of the GA-agent are the crossover rate and the mutation rate. As shown in Figure 6, the GA-agent performed best in the bargaining game with a crossover rate of 0.9 and a mutation rate of 0.05.

PSO-Agent.
The control parameters of the PSO-agent are the intermediate parameter and the maximum velocity $V_{max}$. As shown in Figure 7, the PSO-agent performed best in the bargaining game with the intermediate parameter set to 0.9 and $V_{max}$ = search space/5. Here, the search space (SS) is 10; thus, $V_{max} = 2$.

DE-Agent.
The control parameters of the DE-agent are the coefficient of the trial-vector formula and the crossover rate CR. As shown in Figure 8, the DE-agent's performance in the bargaining game differs very little across values of these two parameters. We therefore adopt the commonly used values of 0.7 for the coefficient and 0.5 for CR.
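For reference, the simulation settings of the experimental environment and the control-parameter values selected in the single-agent experiments can be collected in one configuration object. The field names are hypothetical; the values are those stated in this section, and `games_per_evaluation` reflects the two games played against each of the 30 counterparts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # simulation parameters (field names are hypothetical)
    population_size: int = 30
    max_iterations: int = 10_000
    max_rounds: int = 5
    n_counterparts: int = 30        # the entire opposing population
    search_space: float = 10.0
    # control parameters selected in the single-agent experiments
    ga_crossover_rate: float = 0.9
    ga_mutation_rate: float = 0.05
    pso_intermediate: float = 0.9
    de_coefficient: float = 0.7
    de_crossover_rate: float = 0.5

    @property
    def pso_v_max(self) -> float:
        return self.search_space / 5  # V_max = SS / 5

    @property
    def games_per_evaluation(self) -> int:
        return 2 * self.n_counterparts  # one game starting in each role


cfg = ExperimentConfig()
print(cfg.pso_v_max, cfg.games_per_evaluation)  # 2.0 60
```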

GA-Agent versus PSO-Agent.
The result of the coevolutionary bargaining game between the GA-agent and the PSO-agent is shown in Figure 9. The GA-agent was set to the optimal configuration determined in Section 5.2.1 and the PSO-agent to that determined in Section 5.2.2. As Figure 9 shows, the PSO-agent outperforms the GA-agent in the coevolution-based bargaining game.

GA-Agent versus DE-Agent.
The result of the coevolutionary bargaining game between the GA-agent and the DE-agent is shown in Figure 10. The GA-agent was set to the optimal configuration determined in Section 5.2.1 and the …
Secondly, the coevolutionary process among the three kinds of EC-agents (GA-agent, PSO-agent, and DE-agent) was tested to observe which EC-agent performs best in the bargaining game. The simulation results show that the PSO-agent outperforms both the GA-agent and the DE-agent, and that the GA-agent outperforms the DE-agent in the coevolutionary bargaining game.
To understand why the PSO-agent performs best among the three kinds of EC-agents in the bargaining game, we observed the strategies of the EC-agents after the game was completed. Figure 12 compares the strategies of a GA-agent and a PSO-agent after completion of the game. When the PSO-agent is the proponent, it offers a small quantity to the opponent, but when it is the respondent, it demands a large quantity. In contrast, when the GA-agent is the proponent, it offers a large quantity to the opponent, but when it is the respondent, it demands a small quantity.
In the bargaining game between a PSO-agent and a DE-agent, the DE-agent's strategy is similar to that of the GA-agent in Figure 12. This indicates that the PSO-agent evolves toward a strategy of gaining as much as possible at the risk of gaining nothing if the transaction fails, while the GA-agent and the DE-agent evolve toward a strategy of completing the transaction regardless of the quantity.
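To see why the demanding strategy can pay off, consider a toy game with hypothetical strategy values (not taken from Figure 12). The sketch follows the alternating-offer rules of Section 3 with one value per round: a demanding agent (like the PSO-agent) offers 2 and demands 8, while a conceding agent (like the GA-agent and DE-agent) offers 7 and demands 1.

```python
TOTAL, MAX_ROUNDS = 10.0, 5


def bargain(a, b, total=TOTAL, max_rounds=MAX_ROUNDS):
    """A is first proponent; roles swap after each rejection. Returns (pay_A, pay_B)."""
    for t in range(max_rounds):
        a_proposes = t % 2 == 0
        proposal, demand = (a[t], b[t]) if a_proposes else (b[t], a[t])
        if proposal >= demand:
            return (total - proposal, proposal) if a_proposes else (proposal, total - proposal)
    return 0.0, 0.0


# Hypothetical rows: the first proponent proposes in rounds 0, 2, 4 and responds in rounds 1, 3.
demanding_first_proponent = [2, 8, 2, 8, 2]   # offers 2, demands 8
demanding_first_respondent = [8, 2, 8, 2, 8]  # demands 8, offers 2
conceding_first_proponent = [7, 1, 7, 1, 7]   # offers 7, demands 1
conceding_first_respondent = [1, 7, 1, 7, 1]  # demands 1, offers 7

print(bargain(demanding_first_proponent, conceding_first_respondent))  # (8.0, 2)
print(bargain(conceding_first_proponent, demanding_first_respondent))  # (2, 8.0)
print(bargain(demanding_first_proponent, demanding_first_respondent))  # (0.0, 0.0)
```

Against a conceding opponent, the demanding agent earns 8 whichever role it starts in, but two demanding agents never reach a deal and both earn nothing, which mirrors the risk described above.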

Conclusion
This paper studied the interaction and coevolutionary process among heterogeneous EC-agents in order to evaluate their performance in the bargaining game, to understand the action patterns of three kinds of EC-agents, and to identify the principal parameters that influence agent performance. The simulation results show that the control parameters of the GA-agent and the PSO-agent have more influence on performance than those of the DE-agent. Furthermore, the simulation results show that the PSO-agent outperforms both the GA-agent and the DE-agent in the coevolutionary bargaining game. We expect this analysis of the characteristics of artificial agents to help researchers who study game theory using artificial agents.

Figure 2: Evolution process of a GA-agent.

Figure 12: Comparison of strategies after completion of game.