Single- versus Multiobjective Optimization for Evolution of Neural Controllers in Ms. Pac-Man

This study focuses on the automatic generation of game artificial intelligence (AI) controllers for the Ms. Pac-Man agent using an artificial neural network (ANN) and multiobjective artificial evolution. The Pareto Archived Evolution Strategy (PAES) is used to generate a Pareto optimal set of ANNs that optimize the conflicting objectives of maximizing Ms. Pac-Man scores (screen-capture mode) and minimizing neural network complexity. The proposed algorithm is called the Pareto Archived Evolution Strategy Neural Network, or PAESNet. Three different architectures of PAESNet were investigated: PAESNet with a fixed number of hidden neurons (PAESNet F), PAESNet with a varied number of hidden neurons (PAESNet V), and PAESNet with multiobjective techniques (PAESNet M). Single- and multiobjective optimization are compared in both the training and testing processes. Overall, PAESNet F yielded better results in the training phase. However, PAESNet M reduces the runtime and complexity of the ANN by minimizing the number of hidden neurons needed in the hidden layer, and it also provides better generalization capability for controlling the game agent in a nondeterministic and dynamic environment.


Introduction
A number of optimization techniques have been introduced for solving Multi-Objective Problems (MOPs) [1]. An MOP has a set of conflicting objective functions, subject to certain constraints, which are to be minimized or maximized [2]. Among these techniques, Evolutionary Algorithms (EAs) are particularly suited to handling MOPs [3,4] because their population-based approach can find a set of trade-off solutions in a single simulation run, instead of requiring a series of separate runs as traditional optimization techniques do. Moreover, EAs have been successfully used to solve complex problems that involve discontinuities, multimodality, disjoint feasible spaces, and noisy function evaluations [5]. A large range of practical applications of Multi-Objective Evolutionary Algorithms (MOEAs) to real-life problems across many disciplines can be found in the reference texts by Deb [6] and Coello et al. [3]. Effective MOEAs include the Pareto Archived Evolution Strategy (PAES) [7], the Strength Pareto Evolutionary Algorithm 2 [8], the Nondominated Sorting Genetic Algorithm II [9], and Pareto-frontier Differential Evolution [10].
Generally, MOEAs are able to optimize several distinct objectives of differing dimensions at the same time. In other words, MOEAs can outperform single-objective EAs without having to combine the multiple objectives into a single weighted-sum objective. Weighted-sum methods are at a disadvantage because it is difficult to find a suitable way of combining different objectives into a single objective function, which makes them costly to apply [6]. Furthermore, each evolutionary run generates only a single solution; a second solution is generated only after the weights are changed, and this process must be repeated to obtain further solutions [11,12]. Another distinct advantage of MOEAs is their capability to generate a complete set of Pareto optimal solutions in a single run, which gives users a choice of solutions that trade off the different objectives.
On the other hand, the disadvantages of MOEAs [6,11,12] are that (i) as the number of objectives increases, the coverage of the Pareto front becomes sparser and can no longer provide a comprehensive set of solutions over multiple dimensions, (ii) it is difficult to maintain a well-spread, diverse set of solutions along the Pareto front, and (iii) fitness-sharing decision making is difficult in MOEAs that utilize multiple populations. These trade-offs raise an open research question that motivates the work in this paper: "Is a single-objective optimization technique better than multiobjective optimization in real-life problems?" Games are a commonly used platform for answering such research questions because they allow new and experimental approaches to be tested and compared on a challenging but well-defined problem [13][14][15][16][17]. In this research, Ms. Pac-Man has been chosen as the test-bed because it makes it easy to compare the performance of single-objective and multiobjective optimization techniques.
International Journal of Computer Games Technology
In this study, a feed-forward artificial neural network (FFNN) is evolved with PAES, a well-known and simple MOEA, so that computer-based players can learn to play the Ms. Pac-Man game optimally. There are two distinct objectives to be optimized: (i) maximize the Ms. Pac-Man game score and (ii) minimize the number of hidden neurons used in the FFNN architecture.
A comparative empirical experiment is conducted in order to verify the performance of the methods used.
(1) Single-objective optimization: the first experiment, PAESNet F, uses a fixed number of hidden neurons in the FFNN and only maximizes the Ms. Pac-Man game score.
(2) Single-objective optimization: the second experiment, PAESNet V, uses a variable number of hidden neurons in the FFNN and only maximizes the Ms. Pac-Man game score.
(3) Multiobjective optimization: the third experiment, PAESNet M, maximizes the game score and also minimizes the number of hidden neurons in the FFNN.
The main contribution of the proposed algorithms is to create computer-based agents that not only make intelligent decisions like human players in dynamic game environments, but whose underlying techniques can also benefit real-world problems, such as robotics and other complex systems.

Other Related Research
Most studies of the game of Ms. Pac-Man have focused only on hand-coded rule-based (RB) approaches or other specific methods [18][19][20][21]. Although these methods can achieve quite high scores, they have some limitations. Firstly, game domains contain highly complex solution spaces that require a large number of rules in order to represent all possible situations and the corresponding actions in the game environment. For instance, Szita and Lorincz [21] list 42 rules for a very basic hand-coded RB agent in the Ms. Pac-Man game, built from lists of action modules and observations, to control the behaviour of the agent. Secondly, the computation time required to explore the search space exhaustively is very expensive if the search strategies use large sets of rules. Thirdly, such methods generalize poorly across different game domains or platforms because they apply only to that particular game or genre of game.
The intention of this research is to create game controllers that are capable of general intelligent action without requiring any domain-dependent solution and that can become proficient in other games by changing only the input and output values of the ANN. Thus, the experimental results will be compared to an appropriate reference system created by Lucas [22]. Lucas used general methods in designing the game controller, evolving an ANN with an evolution strategy (ES) to play Ms. Pac-Man. The input of the network is a handcrafted feature vector that consists of the distance to each normal ghost, the distance to each edible ghost, the location of the current node, the distance to the nearest pill, the distance to the nearest power pill, and the distance to the nearest junction, whereas the calculated output is a score for every possible next location given the agent's current location. The ES is applied to evolve the ANN connection weights. The best evolved agent, obtained with a (10 + 10)-ES, had an average score of 4781 over 100 runs of the nondeterministic game.

Methods and Parameter Setting
This investigation has two modes of operation, training and testing, as shown in Figure 1. In the training mode, the FFNNs are trained using an evolution-based algorithm. The agents learn by playing many games in order to optimize the weights, biases, and number of hidden neurons in the FFNN architecture. After the training process, the optimized networks are tested for generalization.

Pareto Archived Evolution Strategy.
The (1 + 1)-PAES, a two-membered PAES, has been applied to simultaneously optimize network parameters and architecture for both single- and multiobjective optimization problems. The resulting algorithm is referred to as PAESNet. Figure 2 shows the flowchart of PAESNet and the fitness evaluation process. The strengths of PAES are as follows: (1) simple structure; (2) easy to implement; (3) (1 + 1)-PAES and (1 + λ)-PAES are based on a local search method that requires lower computational effort than population-based MOEAs; (4) a small number of parameters are needed.
As summarized in Figure 1, PAESNet F, PAESNet V, and PAESNet M are each trained with 500 evaluations over 10 experimental runs, and each system is then tested 10 times.

Single-Objective: PAESNet F and PAESNet V.
The default number of hidden neurons is set to 20. In the initialization phase, the ANN weights and biases are drawn from a uniform distribution over [−1, 1] and encoded into a chromosome that acts as the parent, and its fitness is evaluated. Subsequently, a polynomial mutation operator with a distribution index of 20.0 is used to create an offspring from the parent, and the fitness of the offspring is evaluated. After that, the fitnesses of the offspring and the parent are compared. If the offspring performs better than the parent, the offspring replaces the parent as the new parent for the next evaluation. Otherwise, the offspring is eliminated and a new mutated offspring is generated. If the parent and the offspring are incomparable, the offspring is compared with a set of previously nondominated individuals in the archive. The archiving process in PAESNet is described below.
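The single-objective loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it uses the basic (unbounded-form) polynomial mutation with clipping to [−1, 1], and the per-gene mutation rate of 1/n, the function names, and the toy fitness used in the test are assumptions that the source does not specify. In the study, the fitness would be the Ms. Pac-Man game score.

```python
import random

def polynomial_mutation(genes, eta=20.0, low=-1.0, high=1.0, rate=None, rng=random):
    """Basic polynomial mutation; eta is the distribution index (20.0 in the text).

    The per-gene mutation rate (default 1/len(genes)) is an assumption.
    """
    rate = 1.0 / len(genes) if rate is None else rate
    child = []
    for x in genes:
        if rng.random() < rate:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            # Scale the perturbation by the variable range and clip to the bounds.
            x = min(high, max(low, x + delta * (high - low)))
        child.append(x)
    return child

def one_plus_one(fitness, n_genes, evaluations=500, seed=0):
    """(1 + 1) loop: keep the offspring only if it scores higher than the parent."""
    rng = random.Random(seed)
    parent = [rng.uniform(-1.0, 1.0) for _ in range(n_genes)]
    parent_fit = fitness(parent)          # first evaluation
    for _ in range(evaluations - 1):      # remaining evaluations
        child = polynomial_mutation(parent, rng=rng)
        child_fit = fitness(child)
        if child_fit > parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit
```

With a distribution index as high as 20, most mutations are small perturbations around the parent, which suits the local-search character of (1 + 1)-PAES.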
Three situations can occur when the offspring is compared with the archive [7,24,25]. First, if the offspring is dominated by a member of the archive, the offspring is discarded and a new mutated offspring is created from the parent. Second, if the offspring dominates some members of the archive, the dominated members are removed from the archive. The offspring is then added to the archive and also becomes the parent of the next generation. Third, if the offspring and the archive members do not dominate each other, the archive is maintained according to its size. If the archive is not full, the offspring is copied directly into the archive. Otherwise, if the archive is full, a neighborhood density measure is used to ensure that a well-spread distribution is maintained in the archive. If the offspring increases the archive diversity, it replaces the archive member in the most crowded grid location so that the maximum archive size is maintained. Note that in this third situation, the offspring and the parent are both nondominated members of the archive. The neighborhood density measure is therefore also used to select the parent of the next generation from the two of them: if the offspring resides in a less crowded area than the parent, the offspring is selected.
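The three archive situations above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: solutions are represented only by their objective pair (score, hidden neurons), and the density measure is a fixed grid with hypothetical cell sizes of 1000 points and 5 neurons, whereas PAES itself uses an adaptive grid. The function names are illustrative.

```python
from collections import Counter

def dominates(a, b):
    """True if a dominates b, where a and b are (score, hidden_neurons).

    A higher score and fewer hidden neurons are better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def grid_cell(p):
    """Hypothetical fixed grid used as a stand-in for the PAES adaptive grid."""
    return (p[0] // 1000, p[1] // 5)

def update_archive(archive, child, max_size):
    """Return (new_archive, accepted) after offering child to the archive."""
    # Situation 1: the child is dominated by (or duplicates) an archive member.
    if child in archive or any(dominates(m, child) for m in archive):
        return archive, False
    # Situation 2: remove every member that the child dominates.
    kept = [m for m in archive if not dominates(child, m)]
    if len(kept) < max_size:
        return kept + [child], True
    # Situation 3 with a full archive: accept only if the child lies in a
    # less crowded cell than the most crowded one, and evict from that cell.
    counts = Counter(grid_cell(m) for m in kept)
    crowded, crowded_n = counts.most_common(1)[0]
    if counts[grid_cell(child)] >= crowded_n:
        return kept, False
    victim = next(m for m in kept if grid_cell(m) == crowded)
    kept.remove(victim)
    return kept + [child], True
```

The same cell counts could be used to pick the next parent between two mutually nondominated candidates, by preferring the one whose cell holds fewer archive members.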

Multiobjective: PAESNet with Multiobjective (PAESNet M).
The structure of this proposed algorithm is similar to that of the algorithms in Section 3.2, except for the architecture of the ANN. In this proposed algorithm, two objectives are involved. The first objective is to maximize the game score, while the second objective is to minimize the number of hidden neurons in the FFNN. The initial number of hidden neurons is set to 20.
The net input of a neuron is calculated as the weighted sum of the inputs from all neurons in the previous layer. Log-sigmoid (logsig) is used as the activation function in the hidden and output layers. Based on [27], logsig has been identified as a suitable activation function in an ANN for creating a neural-based Ms. Pac-Man agent.
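For reference, the net input and the log-sigmoid activation can be written in the usual textbook form. The symbols below ($x_i$ for the output of neuron $i$ in the previous layer, $w_{ij}$ for the connection weight, $b_j$ for the bias of neuron $j$, and $n$ for the number of neurons in the previous layer) are assumed notation, not taken from the source, and the bias term is included on the assumption that the evolved biases enter the net input additively.

```latex
\[
  \mathrm{net}_j = \sum_{i=1}^{n} w_{ij}\, x_i + b_j ,
  \qquad
  \operatorname{logsig}(\mathrm{net}_j) = \frac{1}{1 + e^{-\mathrm{net}_j}}
\]
```

Because logsig maps any real net input into the open interval (0, 1), the single output neuron always produces a bounded value.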

Experimental Setting.
The FFNN architecture of this model has a 5-20-1 structure, which consists of 5 inputs and 1 output together with one hidden layer of 20 neurons. This number of hidden neurons was suggested by Lucas [22]. The inputs of the network are Euclidean distances in the maze, obtained from the following information: (1) the closest distance from the agent to a pill; (2) the closest distance from the agent to a power pill; (3) the closest distance from the agent to a ghost; (4) the closest distance from the agent to an edible ghost; (5) the closest distance from the agent to a fruit.
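A forward pass through a 5-20-1 network with logsig activations can be sketched as follows. This is an illustrative sketch only: the coordinates, the normalization by an assumed 28 x 31 maze diagonal, and all function names are hypothetical, and the source does not say how the single output is turned into a move.

```python
import math
import random

def logsig(z):
    """Numerically stable log-sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def euclidean(a, b):
    """Euclidean distance between two (x, y) positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def n_params(n_in, n_hidden, n_out):
    """Number of weights and biases in a one-hidden-layer FFNN."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer and one output neuron, logsig in both layers."""
    hidden = [logsig(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

# Hypothetical game state: agent position and the nearest object of each type.
agent = (10, 10)
nearest = {"pill": (11, 10), "power_pill": (1, 3), "ghost": (14, 13),
           "edible_ghost": (20, 25), "fruit": (13, 17)}
max_dist = math.hypot(28, 31)  # assumed maze size, used only to scale inputs
inputs = [euclidean(agent, pos) / max_dist for pos in nearest.values()]

# Random weights and biases in [-1, 1], as in the initialization phase.
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
b_hidden = [rng.uniform(-1, 1) for _ in range(20)]
w_out = [rng.uniform(-1, 1) for _ in range(20)]
b_out = rng.uniform(-1, 1)
output = forward(inputs, w_hidden, b_hidden, w_out, b_out)
```

For the 5-20-1 structure, `n_params(5, 20, 1)` gives 141 values to evolve, so each hidden neuron removed shrinks the chromosome by 7 values (5 input weights, 1 bias, and 1 output weight).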

Experimental Results and Discussions
The results obtained from the analysis of training and testing performances can be compared in the tables and figures below.

Training Results.
Taking the mean number of hidden neurons reported in Table 1, we observed that PAESNet M reduces the number of hidden neurons from 20 to 8, which is a reduction of around 60%. This emphasizes the advantage of the MOEA approach in terms of the computational complexity of the FFNN.
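As a quick check of this figure, the relative reduction in hidden neurons can be worked out from the mean values reported for Table 1 (20 for PAESNet F, 9.7 for PAESNet V, and 8 for PAESNet M). The second line applies the same calculation to PAESNet V for comparison and is not a figure stated in the source.

```latex
\[
  \text{PAESNet M: } \frac{20 - 8}{20} \times 100\% = 60\%
\]
\[
  \text{PAESNet V: } \frac{20 - 9.7}{20} \times 100\% = 51.5\%
\]
```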
Additionally, the average training scores of all the proposed algorithms, PAESNet F (7161), PAESNet V (5935), and PAESNet M (5734), are higher than the average score reported by Lucas (4781) [22] for learning to play Ms. Pac-Man. This is further evidence that the proposed systems with PAES are able to automatically generate Ms. Pac-Man agents that display some intelligent playing behavior.
Table 2 lists the win rates (WR) for each comparison, where WR is the number of runs that an artificial controller wins divided by the total number of runs played, as shown in (2):

$$\mathrm{WR} = \frac{\text{runs won}}{\text{total runs played}} \times 100\% \qquad (2)$$

Firstly, for PAESNet F versus PAESNet V, WR = 90%: PAESNet F won 9 out of 10 runs against PAESNet V, losing only Run 1, and the result is the same for PAESNet F versus PAESNet M. Subsequently, for PAESNet V versus PAESNet M, WR = 60%: PAESNet V won 6 out of 10 runs against PAESNet M, losing Run 3, Run 6, Run 9, and Run 10. The results clearly show that PAESNet F outperformed the other two competing approaches in training. This result may be explained by the fact that PAESNet F is concerned with the single objective of maximizing the game score, while PAESNet M must find a set of trade-off solutions between the score and the number of hidden neurons. The acceptance of trade-off solutions depends on both convergence performance and diversity preservation along the Pareto optimal front. Because of these two criteria, multiobjective optimization is harder than single-objective optimization. Another possible explanation is that multiobjective optimization deals with two search spaces, the decision variable space and the objective space, whereas single-objective optimization involves only one search space (the decision variable space). This factor may influence the performance of PAESNet M.
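The win rate in (2) is simple to compute from paired per-run scores. The sketch below assumes that a run is won when the first controller's score is strictly higher than the second's, so ties count as not won; the source does not say how ties are handled, and the scores used in the test are placeholders rather than the paper's data.

```python
def win_rate(scores_a, scores_b):
    """Percentage of paired runs in which controller A outscores controller B."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both controllers need the same nonzero number of runs")
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    return 100.0 * wins / len(scores_a)
```

With 10 runs per comparison, every win rate is a multiple of 10%, which matches the values of 60% to 90% reported in the text.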
The global Pareto-frontier solutions obtained across all 10 runs of multiobjective optimization, with the goal of maximizing scores and minimizing hidden neurons, are illustrated in Figure 3. The global Pareto solutions are shown by the dotted line. As can be seen from the figure, PAESNet M significantly decreases the number of hidden neurons needed in the optimized networks, from 20 to between 3 and 7 nodes in the hidden layer, and the game scores achieved were 3610, 4180, 4940, 5990, and 7170.
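A global Pareto front of this kind can be obtained by pooling the nondominated solutions of all runs and filtering them once more. The sketch below shows one way to do this for (score, hidden neurons) pairs; the function name and the points used in the test are hypothetical and are not the values plotted in Figure 3.

```python
def pareto_front(points):
    """Return the nondominated (score, hidden_neurons) pairs, fewest neurons first.

    A point is dropped if some other point has a score at least as high
    and no more hidden neurons.
    """
    unique = set(points)
    front = [p for p in unique
             if not any(q != p and q[0] >= p[0] and q[1] <= p[1] for q in unique)]
    return sorted(front, key=lambda p: p[1])
```

Along a front sorted this way, the score rises as the number of hidden neurons rises, which is the trade-off that the decision maker chooses from.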

Testing Results.
After the training phase, the best evolved networks were used to test the generalization ability of the models, with the aim of scoring as high as possible. The selected numbers of neurons in the hidden layer of the optimum networks are 20, 13, and 7 for PAESNet F, PAESNet V, and PAESNet M, respectively, as shown in Table 3. Table 4 presents the testing results of the three proposed algorithms. Both the maximum and mean scores of PAESNet M (5360, 3249) were higher than those of PAESNet F (4360, 2830) and PAESNet V (4880, 3096). Based on the mean values, PAESNet M performed better than PAESNet F and PAESNet V.
Table 5 lists the win rates for each comparison. Firstly, for PAESNet F versus PAESNet V, WR = 70%: PAESNet V won 7 out of 10 runs against PAESNet F, losing Run 4, Run 5, and Run 6. Next, for PAESNet F versus PAESNet M, WR = 80%: PAESNet M won 8 out of 10 runs against PAESNet F, losing Run 2 and Run 10. Lastly, for PAESNet V versus PAESNet M, WR = 60%: PAESNet M won 6 out of 10 runs against PAESNet V, losing Run 2, Run 7, Run 8, and Run 10. From these data, we can see that PAESNet M won the majority of runs against each of the other two algorithms. PAESNet M successfully found an appropriate network architecture and parameters by maximizing the game score and minimizing the number of hidden neurons. Overall, the testing results show that FFNNs and PAES have strong potential for controlling game agents in the game world.

Conclusions
In this study, an FFNN is evolved with the PAES MOEA so that a computer player can automatically learn to play the game of Ms. Pac-Man optimally; the resulting algorithm is called PAESNet. Three forms of PAESNet (PAESNet F, PAESNet V, and PAESNet M) were introduced to solve single- and multiobjective optimization problems and were compared with each other in the training and testing processes. The Pareto optimal front resulting from each MOEA run provided a set of NNs that maximized the Ms. Pac-Man score and at the same time minimized the size of the controller. In the training process, PAESNet F outperformed PAESNet V and PAESNet M. However, in the testing process, PAESNet M outperformed the other two algorithms. One of the most significant findings to emerge from this study is that the generalization performance of the neural networks can improve significantly when the architecture and connection weights (including biases) are evolved simultaneously via an MOEA approach, as opposed to fixing the network architecture and optimizing only the score with a single-objective optimization approach.

Figure 1: The overview of the study.

Figure 3: The global Pareto-frontier solutions obtained across all 10 runs using multiobjective optimization.

Table 1 shows that the mean (score, number of hidden neurons) values of PAESNet F, PAESNet V, and PAESNet M were (7161, 20), (5935, 9.7), and (5734, 8), respectively. According to the mean scores, PAESNet F has the highest average score. However, the best scores are comparable across all three approaches (7430 for PAESNet F, 7190 for PAESNet V, and 7170 for PAESNet M).

Table 1: The training results. Score: best score; Neurons: number of hidden neurons.

Table 2: Win rates for training results across all 10 runs.

Table 3: The best evolved networks from proposed algorithms.

Table 4: The testing results.

Table 5: Win rates for testing results across all 10 runs.