AIN-Based Action Selection Mechanism for Soccer Robot Systems

Role and action selections are two major procedures of the game strategy for multiple robots playing the soccer game. In the role-select procedure, a formation is planned for the soccer team, and a role is assigned to each individual robot. In the action-select procedure, each robot executes an action provided by an action selection mechanism to fulfill its role. The role-select procedure has often been designed efficiently by using the geometry approach. However, developing the action-select procedure with the geometry approach becomes a very complex task. In this paper, a novel action-select algorithm for soccer robots is proposed by using the concepts of the artificial immune network (AIN). This AIN-based action-select algorithm provides an efficient and robust method for robot action selection. Meanwhile, a reinforcement learning mechanism is applied in the proposed algorithm to enhance the response of the adaptive immune system. Simulations and experiments are carried out to verify the proposed AIN-based algorithm, and the results show that it is efficient and applicable for mobile robots playing a soccer game.


Introduction
The objective of this research is to design a strategy planning system for multiple robots playing a soccer game. The proposed system is composed of two levels, namely, the role selection mechanism (RSM) and the action selection mechanism (ASM). The RSM assigns different roles to the robots so that they work together as a team and fulfill the game strategy. Once each robot is assigned a certain role, the ASM determines the appropriate action for each robot to accomplish its role. Each robot executes its own actions provided by the ASM, and the team of robots performs a formation task in the soccer game by collaboration.
In the literature, the RSM was often developed by the geometry approach based on decision-tree theory [1][2][3][4][5]. A decision tree has several nodes arranged in a hierarchical structure, as depicted in Figure 1 [5]. It chooses the most suitable role for each robot based on the instantaneous geometric situation on the soccer field, such as the absolute position of the ball and the relative position between the ball and the robots. The roles of the robots are divided into active robot and passive robot. At any moment in the game, only one robot can play as the active robot, in charge of offense and defense, while the others are passive robots that assist the active robot in carrying out the mission. From Figure 1, it is easy to see that the decision tree implements the decision in a simple, apparent, and multistage manner. Since each node of a decision tree uses only a simple splitting rule, the entire decision process can be implemented very fast and efficiently.
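As a rough illustration of the multistage decision described above, the following sketch assumes a field split into three zones along its length and takes the robot nearest to the ball as the active robot. The field length, zone boundaries, formation names, and function names are hypothetical and are not taken from [5].

```python
import math

FIELD_LENGTH = 150.0  # cm, hypothetical field length


def ball_zone(ball_x, n_zones=3):
    """First stage ("Where is the ball?"): zone index in 0..n_zones-1."""
    zone = int(ball_x / (FIELD_LENGTH / n_zones))
    return min(max(zone, 0), n_zones - 1)


def choose_formation(zone):
    """Second stage ("What formation to play?"): hypothetical 3-zone mapping."""
    return ("defense", "midfield", "offense")[zone]


def assign_roles(ball, robots):
    """Third stage: the robot nearest to the ball is active, the rest passive."""
    dists = [math.hypot(rx - ball[0], ry - ball[1]) for rx, ry in robots]
    active = dists.index(min(dists))
    return ["active" if i == active else "passive" for i in range(len(robots))]


zone = ball_zone(100.0)
formation = choose_formation(zone)
roles = assign_roles((100.0, 50.0), [(20.0, 50.0), (90.0, 60.0), (140.0, 10.0)])
```

Each stage uses a single comparison or lookup, which is what makes the decision tree fast to evaluate.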
The ASM has also been designed by using the geometry method, in which an action is assigned to each robot to accomplish the task based on the geometrical location of the ball or robot in the soccer field [3][4][5]. Tsou et al. [5] designed eight basic actions for a soccer robot based on the geometric thinking approach, including chase ball, dribble ball, shoot ball, sweep ball, goal keeping, blocking, active attack, and assist attack. The details of these actions are explained in Table 1. There are two major disadvantages in using the concept of geometry thinking for constructing an ASM. First, if the ball is located at the boundary of two zones, the geometry thinking method will fail to function. Second, too many actions must be considered in order to cover all possible conditions of all geometrical divisions. In this paper, an ASM based on the artificial immune network (AIN) is proposed to replace the geometry thinking method, thus avoiding its disadvantages. Meanwhile, the decision tree is still used to decide the role of each robot in this research.

Chase ball
When the robot is far away from the ball, this action is given to go after the ball.

Dribble ball
If the robot is close to the ball, this action is called to take control of the ball.

Shoot ball
If the robot is close to the ball and the goal, this action is used to shoot the ball.

Sweep ball
When the ball is in a corner or at the boundary, the robot uses this action to sweep the ball out.

Goal keeping
The robot playing as goalkeeper gets this action to prevent conceding a point.

Blocking
If an opponent and the ball are close to our goal, the closest robot moves between the opponent and the ball to block the way.

Active attack
If the ball is near the opponent's goal, the closest robot plays as an attacker.

Assist attack
The robot closest to the attacker gets ready to attack in case the attacker misses.
[Figure 1: decision tree [5]. Node labels: "Where is the ball?" (zone 1, zone 2, ..., zone k); "What formation to play?"]
This research has two major contributions. First, the complexity of designing the robot actions is reduced by using the novel AIN-based ASM compared to the methods based on geometry thinking [3][4][5]. When the concepts of AIN are applied to design the ASM instead of the geometry thinking approach, fewer robot actions are needed for playing the soccer game. Second, the geometry thinking method fails to function at certain geometrical locations of the ball in the soccer field, whereas the AIN-based ASM does not have this problem. Furthermore, a reinforcement learning mechanism is utilized to determine the priority order of antibodies at the initial stage of the soccer game, and the game strategy is then carried out according to the priority order. Therefore, a tactic-based decision system is formed for a soccer robot team.
In Section 2, the proposed AIN-based action selection mechanism is presented. The reinforcement learning mechanism is explained in Section 3. The problem of camera calibration is discussed in Section 4. Sections 5 and 6 present the simulation and experimental examples. Concluding remarks are given in the last section.

AIN-Based Action Selection Mechanism
2.1. Artificial Immune Network. The concepts of the artificial immune network proposed by Farmer et al. [6,7] are utilized in this research to design the action selection mechanism for the robots to accomplish the soccer game. In the human body, the biological immune system defends against the invasion of outer viruses or antigens by two successive response subsystems, the innate immune system and the adaptive immune system. The innate immune system is a primitive nonspecific recognition system which is able to generate a series of chemical reactions to detect the invasive viruses or antigens, and then transmit the identification of the antigen to the adaptive immune system. This is the perception competence of the biological immune system. The lymphocytes (B-cell receptors) in the adaptive immune system recognize an antigen and perform cell division, and then specialize themselves into plasma cells to duplicate a massive number of antibodies according to the transmitted identity of the antigens. Each kind of antibody aims to recognize a certain kind of antigen and is responsible for destroying the specific invasive antigen [8]. This is the reaction competence of the biological immune system.
By using the concepts of the artificial immune network, the perception competence of the biological immune system is represented by the affinity function, which describes the relation between an antibody and an antigen [6]. As expressed in (1), the affinity $m_i(k)$ between antibody $i$ and the antigen takes a nonzero value if antibody $i$ is combined with an antigen and is 0 otherwise, where $k$ is the time step.
Jerne [9] proposed the idiotypic network hypothesis, which states that an antibody can bind not only with antigens but also with other antibodies to form a network. Therefore, an artificial immune network is established by a massive number of antibodies against the invasive antigens [6,7]. These antibodies form an artificial immune network through the stimulation and suppression effects among them. The stimulation and suppression of antibody $i$ triggered by antibody $j$ are represented by the affinity $m_{ij}$, which is defined in (2) for the case in which antibody $i$ is triggered by antibody $j$.
In AIN, the reaction competence of the biological immune system, that is, the reaction of an antibody to antigens, is modeled by the concentration function. If $N$ antibodies form an AIN, the concentration $x_i$ of antibody $i$ is expressed by the first-order difference equation (3) [6], in which the first and second terms on the right-hand side represent the stimulation and suppression effects, respectively, and $k_i$ denotes the mortality of antibody $i$. Through stimulation and suppression among the antibodies, the antibody with the largest concentration is triggered.
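To make the roles of stimulation, suppression, antigen affinity, and mortality concrete, the sketch below iterates one commonly used discrete form of the idiotypic network dynamics. This form is an assumption made for illustration and is not a reproduction of (3); the step size `dt`, the averaging over N, and all names are hypothetical.

```python
def update_concentrations(x, m_antigen, m_stim, m_supp, mortality, dt=0.1):
    """One difference-equation step for N antibody concentrations.

    Assumed form (not the paper's equation (3)):
        x_i <- x_i + dt * (stim_i - supp_i + m_i - k_i) * x_i
    where stim_i and supp_i are affinity-weighted averages of the other
    concentrations, m_i is the antigen affinity, and k_i is the mortality.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        stim = sum(m_stim[i][j] * x[j] for j in range(n)) / n
        supp = sum(m_supp[i][j] * x[j] for j in range(n)) / n
        growth = stim - supp + m_antigen[i] - mortality[i]
        # Concentrations are kept non-negative.
        new_x.append(max(0.0, x[i] + dt * growth * x[i]))
    return new_x


zeros = [[0.0] * 3 for _ in range(3)]
x_next = update_concentrations(
    x=[1.0, 1.0, 1.0],
    m_antigen=[1.0, 0.0, 0.0],   # only antibody 0 matches a detected antigen
    m_stim=zeros,
    m_supp=zeros,
    mortality=[0.1, 0.1, 0.1],
)
```

In this toy run, the antibody matched to an antigen grows while the others decay through mortality, so after repeated steps it holds the largest concentration and would be the one triggered.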

2.2. Robot Action Selection Mechanism.
In this paper, the perception competence of the biological immune system is utilized to model the perception of a soccer robot system, while the reaction competence is employed to model the response of a robot system to the environmental change.
A coordinate system is located on the robot, and the soccer field surrounding the robot is divided into four quadrants, as shown in Figure 2. The perception competence of the robot system at each quadrant is modeled by a biological immune system which has the capability to detect three kinds of antigens. These antigens represent three different kinds of occupant at each quadrant: the ball, an opponent robot, and a vacancy. A vacancy means that there is neither ball nor opponent robot in the quadrant. As shown in Figure 2, there are twelve kinds of antigens to be detected for each robot (three kinds of occupant in each of the four quadrants). Therefore, the total number of antibodies is linearly proportional to the number of robots.
The AIN investigates each quadrant around the robot; if one kind of antigen is detected, the corresponding antibody is triggered according to the circumstance. At least one antigen in each quadrant around the robot is detected at any given time. For example, there may be two antigens, namely, the ball and an opponent robot, occupying one quadrant. A robot collects multiple antigens from the surrounding quadrants, and there may be more than one corresponding antibody. Therefore, the number of triggered antibodies depends on how many antigens are detected by a robot.
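The quadrant-wise antigen detection described above can be sketched as follows. The sketch assumes that the robot frame has its x-axis along the robot heading and that quadrants are numbered 1 to 4 counterclockwise; these conventions and all function names are hypothetical.

```python
import math


def quadrant(robot_pose, point):
    """Quadrant (1-4) of `point` in the frame attached to the robot."""
    rx, ry, heading = robot_pose
    dx, dy = point[0] - rx, point[1] - ry
    # Rotate the world-frame offset into the robot frame.
    xm = math.cos(heading) * dx + math.sin(heading) * dy
    ym = -math.sin(heading) * dx + math.cos(heading) * dy
    if xm >= 0 and ym >= 0:
        return 1
    if xm < 0 and ym >= 0:
        return 2
    if xm < 0 and ym < 0:
        return 3
    return 4


def detect_antigens(robot_pose, ball, opponents):
    """Return the set of antigens detected in each of the four quadrants."""
    antigens = {q: set() for q in (1, 2, 3, 4)}
    antigens[quadrant(robot_pose, ball)].add("ball")
    for opp in opponents:
        antigens[quadrant(robot_pose, opp)].add("opponent")
    for q in antigens:
        if not antigens[q]:
            # Neither ball nor opponent: the quadrant holds a vacancy.
            antigens[q].add("vacancy")
    return antigens


detected = detect_antigens((0.0, 0.0, 0.0), (1.0, 1.0), [(2.0, 3.0), (-1.0, -1.0)])
```

Every quadrant always reports at least one antigen, and a quadrant holding both the ball and an opponent reports two, matching the description in the text.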
The affinity $m_{ij}$ of AIN in (2) is utilized to represent the detected occupants at each quadrant around the robot. Similarly, the concentration $x_i$ in (3) is applied to model the reaction competence in a soccer robot system, and the robots decide the next action according to the antibody having the highest concentration. If more than one antibody has the highest value, a priority order is applied to the immune response antibodies.
The flow chart of an AIN behavior-based controller system in the soccer robot game is shown in Figure 3. It contains three portions: sensing and perception, artificial immune network, and reinforcement learning mechanism. The portion of sensing and perception is composed of environment detection and antigen determination. Its main purpose is for the robots to investigate the soccer field, which is divided into four quadrants, and then marshal the information to detect the antigens. The portion of artificial immune network covers triggering, stimulation, and suppression among antibodies, and the calculation of antibody concentration. Based on the environmental information obtained from antigenic detection, the robots determine which antibodies to activate. These antibodies influence their own concentration and change the affinity because of stimulation and suppression among themselves. Finally, the robot system chooses the antibody with the highest concentration to defend against the invasive antigens and, therefore, selects an appropriate action.
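The selection rule, highest concentration first and a priority order to break ties, can be sketched as below. The priority list used in the example is a placeholder chosen for illustration and is not the priority order defined in the paper; the function name and tolerance are likewise hypothetical.

```python
def choose_antibody(concentrations, priority, tol=1e-9):
    """Pick the antibody with the highest concentration.

    `concentrations` maps antibody name -> concentration.
    `priority` lists antibody names from highest to lowest priority and is
    consulted only when several antibodies tie for the highest value.
    """
    best = max(concentrations.values())
    tied = [name for name, c in concentrations.items() if best - c <= tol]
    return min(tied, key=priority.index)


# Hypothetical priority order, used only to break the tie below.
order = ["opponent", "ball", "vacancy"]
winner_tie = choose_antibody({"ball": 0.8, "opponent": 0.8, "vacancy": 0.2}, order)
winner_clear = choose_antibody({"ball": 0.9, "opponent": 0.8, "vacancy": 0.2}, order)
```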

Reinforcement Learning Mechanism
The reinforcement learning mechanism, borrowed from the machine learning area, is used to determine the priority order and meaning of the antibodies [10][11][12]. In Figure 3, the reinforcement learning mechanism, which has a system of reward and penalty, is utilized to enhance the speed of producing antibodies by affecting the calculation of the affinity. The reinforcement learning mechanism determines whether the reaction of the antibody with the highest concentration conforms to the priority order. If the reaction matches the priority order, a reward is offered to the antibody; otherwise, a penalty is given. The reward and penalty affect the concentration of the help T-cell. The concentration of the help T-cell, $T_h$, is defined following [12] in terms of $\eta$, the growing factor, and $np$, the number of times the penalty is offered. If there is no penalty, $np$ is decreased by 1. When the concentration $T_h$ reaches a preset threshold $\delta$, the help T-cell takes action and influences the affinity of the triggered antibody, and then helps the antibody to learn and memorize the history of robot action. In this case, the learning rate $\gamma$ is greater than zero; otherwise, it is set to zero.
The learning mechanism of the artificial immune network in this research has two phases: the immune response mode and the immune tolerant mode. In the immune response mode, the B-cells and help T-cells grow exponentially. In the early stage of the immune response, the antibody cannot recognize any antigen; therefore, the function of the help T-cell is designed to assist the recognition capability of the antibody. The antibody is trained to memorize the antigen in this phase. In the soccer robot case, the robot continuously learns different behavior modes in order to handle an unfamiliar environment. When $np$ is reduced to zero, the help T-cell constrains the growth of the B-cell, and the immune tolerant mode starts to function. In the immune tolerant mode, the antibody can recognize an antigen, and the robot has a steady mode and the ability to handle all kinds of environmental conditions it confronts. The calculation of the concentration of the help T-cell in (6) is then changed to a decaying form, in which $\lambda$ represents the decay factor. When the concentration $T_h$ no longer affects the affinity of the antibody, the antibody can fully recognize all kinds of antigen, and the learning of the immune system is completed. If any unexpected circumstance happens, some new antigens are not yet recognized by the system. Therefore, the learning mechanism goes back to the immune response mode and learns again.
The reward signal acts on the stimulation term of the triggered antibody's concentration in (3), while the suppression term remains unchanged. On the other hand, the penalty signal increases the concentration of the help T-cell and also enhances the suppression term of the triggered antibody's concentration in (3), while the stimulation term remains unchanged. The stimulative and suppressive affinities of antibody $i$ stimulated by antibody $j$ are redefined to include these reward and penalty effects.
Figure 4 depicts the concentration of a T-cell during a simulation process, while the affected concentration of one antibody is plotted in Figure 5. From the figures, we can see that the concentration of the antibody is stimulated or suppressed exponentially by the concentration of the T-cell if an unexpected circumstance happens. When the concentration of the antibody is in saturation, the concentration of the T-cell decays to zero according to (8).

Camera Calibration
The control system utilizes a global vision system to supervise the soccer robots. A procedure with decoupled nonlinear polynomials is proposed to calibrate the camera of the global vision system. The methods with coupled nonlinear polynomials used in the literature [13,14] involve computational difficulty. Instead, a second-degree polynomial, $R = a_0 + a_1 r + a_2 r^2$, is utilized in this paper to model the effect of the wide-angle lens, where $R$ is the undistorted radius from the pixel of interest to the center of the image; $r$ is the corresponding distorted radius by measurement; and $a_i$ are the intrinsic parameters of the camera to be determined. Two polynomials are employed to model the extrinsic parameters caused by the linear and rotational motion of the camera. They map the measured coordinates of a distorted pixel to the coordinates of the corresponding undistorted pixel, and their coefficients $c_i$ are the extrinsic parameters of the camera. The ground and the top of the robot are at different levels, as shown in Figure 6; therefore, the location of a robot at point B will be recognized incorrectly as the location at point A.
The correct location of the robot can be determined from $H$, $h$, and $L$, where $H$ and $h$ are the heights of the camera and the robot, respectively, and $L$ is the distance calculated by image processing. As one example, five robots are placed at five different locations in the soccer field, as shown in Figure 7. The true (undistorted) locations and the uncalibrated (distorted) locations are listed in the first and second rows of Table 2. The coefficients of the intrinsic parameters are calculated as $a_0 = 0.4004$, $a_1 = 0.4316$, $a_2 = 0.0001$, while the coefficients of the extrinsic parameters are $c_1 = 1.012$, $c_2 = 0.0492$, $c_3 = -0.0001$, $c_4 = -2.8311$, $c_5 = -0.0349$, $c_6 = 1.0153$, $c_7 = 0.0$, $c_8 = 5.149$; the equation for the different-level point is determined as $l = 0.94L$. We calculate the root mean squared error (RMSE) for the image recovered by using the three proposed procedures for camera calibration, namely, wide-angle, camera-motion, and different-level calibrations. The RMSE is computed between the undistorted coordinates $(x, y)$ and the distorted coordinates $(x_i, y_i)$. The results for wide-angle, camera-motion, and different-level calibration are listed in the third to fifth rows of Table 2, respectively. Table 2 shows that the combination of wide-angle, camera-motion, and different-level calibrations reduces the RMSE from 11.42 cm to 1.27 cm.
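The following sketch illustrates the wide-angle and different-level corrections together with an RMSE computation. It rests on three assumptions that are not confirmed by the text: the radial model is taken as $R = a_0 + a_1 r + a_2 r^2$, the height correction is taken from similar triangles as $l = L(H - h)/H$, and the RMSE is taken as the root of the mean squared Euclidean distance. The heights in the usage example are hypothetical values chosen to reproduce the reported ratio $l = 0.94L$; the extrinsic polynomials are not sketched because their form is not given.

```python
import math

A = (0.4004, 0.4316, 0.0001)  # a_0, a_1, a_2 as reported in the text


def undistort_radius(r, a=A):
    """Assumed second-degree radial model: R = a0 + a1*r + a2*r^2."""
    return a[0] + a[1] * r + a[2] * r * r


def ground_distance(L, H, h):
    """Assumed similar-triangle correction for a marker at height h.

    L is the distance computed from the image, H the camera height.
    """
    return L * (H - h) / H


def rmse(true_pts, measured_pts):
    """Root of the mean squared Euclidean distance between point lists."""
    sq = [(x - xi) ** 2 + (y - yi) ** 2
          for (x, y), (xi, yi) in zip(true_pts, measured_pts)]
    return math.sqrt(sum(sq) / len(sq))


R = undistort_radius(100.0)
l = ground_distance(100.0, H=250.0, h=15.0)  # hypothetical heights in cm
err = rmse([(0.0, 0.0), (0.0, 0.0)], [(3.0, 4.0), (0.0, 0.0)])
```

With these hypothetical heights, (H - h)/H = 0.94, which is consistent with the linear relation reported in the text, although the actual camera and robot heights are not stated.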

Simulation Results
In this section, an example of a 3-on-3 robot soccer game is demonstrated by using the FIRA simulator [15]. In the example, the decision tree is used to decide the role of each robot, and the AIN is employed to determine what action each robot should take. The roles of the robots are defined as striker, fullback, and goalkeeper. The striker and fullback are differentiated into an active robot and a passive robot, respectively, according to the relative position of the robots to the ball. The main purpose of the active robot is to chase and shoot the ball. If no opponent robot is trying to take over the ball or block the way, the action of the active robot is rewarded and it keeps chasing the ball. For the passive robot, the objective of the robot action is to assist the attack.
A command-generating algorithm is designed to create a point-to-point planar motion for the robots. The speeds of the right and left wheels of the soccer robot, $\omega_R$ and $\omega_L$, are calculated from $\dot{x}_m$, $\dot{y}_m$, and $\dot{\phi}$, the linear and angular velocities of the robot; $D$, the distance from the wheel to the center of the robot; and $\phi$, the rotation angle between the world frame $xy$ and the robot frame $x_m y_m$, as shown in Figure 8.
The opponent robots are located on the right half of the field, and our robots are located on the left half. During the soccer game, the decision tree assigns various roles to our robots, including the goalkeeper, fullback, and striker. Once the roles are assigned to the robots, the AIN-based ASM selects an action for each robot. As shown in Figure 10, the striker adopts the action of ball chasing and shooting, the fullback heads forward and assists the attack, and the goalkeeper retains the action of defending our goal.
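For orientation, the sketch below uses the standard differential-drive relation between body velocity and wheel speeds. It is an assumption about the form of the wheel-speed equation rather than a reproduction of it: the forward speed is taken as the projection of the world-frame velocity onto the robot heading, the wheel speeds are expressed as linear rim speeds, and the function name is hypothetical.

```python
import math


def wheel_speeds(x_dot, y_dot, phi_dot, phi, D):
    """Right and left wheel rim speeds of a two-wheel mobile robot.

    x_dot, y_dot: linear velocity in the world frame
    phi_dot:      angular velocity of the robot
    phi:          heading angle between world frame and robot frame
    D:            distance from each wheel to the robot center
    """
    # Forward speed along the robot heading (assumed no lateral slip).
    v = x_dot * math.cos(phi) + y_dot * math.sin(phi)
    omega_r = v + D * phi_dot
    omega_l = v - D * phi_dot
    return omega_r, omega_l


straight = wheel_speeds(1.0, 0.0, 0.0, 0.0, 0.05)
spin = wheel_speeds(0.0, 0.0, 2.0, 0.0, 0.05)
```

Driving straight gives equal wheel speeds, while turning in place gives equal and opposite speeds; dividing by the wheel radius would convert these rim speeds to angular wheel speeds.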

Experimental Results
The proposed AIN-based ASM is applied to the small-size robot soccer game, in which the global coordinates of the soccer robots are obtained by using an appropriate image processing method. Knowing the geometric locations of the ball and robots on the soccer field, the experimental test is carried out in three major steps. First, according to the circumstance in the soccer field, a team formation is chosen for the soccer robot system, and a role is selected for each individual robot by using the decision-tree RSM. Second, each robot executes an action provided by the AIN-based ASM to fulfill its role. Based on the concepts of AIN, only three actions are necessary for the robot soccer game: ball chasing, opponent blocking, and space chasing. Table 3 describes the functions of these actions and the situations in which they are used. Finally, the robot action is performed by using a point-to-point motion controller.
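The three actions and the conditions listed in Table 3 can be read as the decision rule sketched below. The function and argument names are hypothetical, and the handling of a tie between the ball and opponent concentrations (resolved here in favor of the ball) is an assumption.

```python
def select_action(ball_conc, opponent_conc, opp_dist_in, same_quadrant, opp_between):
    """Choose one of the three AIN-based actions.

    ball_conc, opponent_conc: concentrations of the ball and opponent
    opp_dist_in:   distance from robot to opponent, in inches
    same_quadrant: opponent and ball are in the same quadrant
    opp_between:   opponent is between the robot and the ball
    """
    if opponent_conc > ball_conc:
        return "opponent blocking"
    # Ball concentration is the highest: check the space-chasing conditions.
    if opp_dist_in <= 5.0 and same_quadrant and opp_between:
        return "space chasing"
    return "ball chasing"


a1 = select_action(0.9, 0.3, opp_dist_in=20.0, same_quadrant=False, opp_between=False)
a2 = select_action(0.2, 0.7, opp_dist_in=20.0, same_quadrant=False, opp_between=False)
a3 = select_action(0.9, 0.3, opp_dist_in=4.0, same_quadrant=True, opp_between=True)
```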
In the first example, the images shown in Figure 11 are the top views of robot continuous motion in the soccer field.The white arrow is placed on the top of the robot and indicates the motion direction of the robot.The red

Conclusion
In this paper, an action selection mechanism based on the concepts of the artificial immune network is proposed for a robot system playing soccer. The decision-tree method is applied to the upper level of the strategy planning system, which chooses a team formation and assigns an applicable role to a robot according to the location of the robot in the soccer field. After the role is selected, the lower level of the strategy planning system, the action selection mechanism, starts to work. Using the concept of immunology, the action selection mechanism is designed and composed of an artificial immune network and a reinforcement learning mechanism. The concept of antibody in AIN is utilized to model the occupants surrounding the robot, such as the ball, opponent robots, and a vacancy. The circumstance of each quadrant around the robot in the soccer field is analyzed, and the antibody or the occupant with the highest concentration is triggered, such that each of our robots can be appointed to a certain action. The proposed reinforcement learning mechanism ensures that each robot performs the right action by offering a reward; otherwise, a penalty is given. This helps the antibodies of the AIN-based ASM to learn and memorize the actions of the robots.
In the application to the multirobot soccer game, this research has implemented 1-on-1, 3-on-3, and 5-on-5 soccer games, simulated the AIN-based ASM by using the FIRA simulator, and tested the algorithm on a real soccer field. The results show that the AIN-based ASM achieves the desired performance. Two major contributions of the AIN-based ASM are as follows. First, the complexity of designing the robot actions is reduced compared to the methods based on geometry thinking [3][4][5]; as Tables 1 and 3 show, the number of required robot actions is reduced from eight to three. Second, the AIN-based ASM does not have the functionality problem that the geometry thinking method has at certain geometrical locations of the ball in the soccer field.

Ball chasing
When the concentration of the ball is the highest, this action is selected by the robot, and it chases the ball.

Opponent blocking
The robot moves to a point between the opponent and the ball to prevent the opponent from taking control of the ball. This action is taken when the concentration of the opponent is the highest.

Space chasing
This action is only used when the concentration of the ball is the highest and the following conditions hold: (a) the opponent is within 5 inches of the robot, (b) the opponent and the ball are in the same quadrant, and (c) the opponent is between the robot and the ball.

Figure 2: Scheme of antibodies for a robot.

Figure 4: The concentration of Help T-cell.

Figure 6: Point in different levels will be recognized incorrectly.

Figure 7: Five robots are placed at five different locations.

Figure 8: Top-view sketch of the two-wheel mobile robot.

Table 1: Actions of geometry thinking ASM [5].

Table 3: Actions of AIN-based ASM.