An Intelligent Mission Planning Model for the Air Strike Operations against Islands Based on Neural Network and Simulation

Mission planning of air strike operations is hard because it has to give instructions to a large number of units over a relatively long period of time in an uncertain environment. If some instruction parameters can be calculated by an intelligent agent, better strategies can be found more quickly. For a specific combat scenario of air strike operations against islands, an intelligent model is proposed to improve the performance and flexibility of mission planning. The proposed intelligent mission planning model builds on rule-based decision making and uses a fully connected recurrent neural network to calculate some of the decision parameters. The proposed model shows better results than rule-based decision making with randomized parameters, and it performs as well as experts on the test set of the specific combat scenario.


Introduction
Air strike forces composed of fighters, bombers, early warning aircraft, electronic countermeasure (ECM) aircraft, and unmanned aerial vehicles provide a versatile striking force capable of rapid and distant employment. These forces can quickly gain and sustain air superiority over regional aggressors, permitting rapid air attacks on air and surface targets. Fighter aircraft, operating from airfields or aircraft carriers, attack enemy fighters while providing security to exploit the air for intelligence, early warning and control, bombing, and other functions. Bombers provide an intercontinental capability to strike surface targets on short notice. Early warning aircraft are a primary source of information on enemy air and surface forces and installations. ECM aircraft radiate interfering signals toward an enemy's radar, blocking the receiver with highly concentrated energy signals and greatly reducing the enemy's information superiority.
If used properly, air strike forces would have an immediate impact on a conflict by suppressing enemy air defenses and inflicting massive damage on an enemy's infrastructure. But mission planning of air strike operations is among the most challenging parts of military operations research due to the following features. (1) There are many kinds of equipment involved in combat, and their functions are different. All forces involved in air strike operations need to form an organic whole and cooperate closely. How to cooperate with each other in combat is a problem that needs careful consideration. (2) The combat units can move freely at high speed in a wide and continuous space, which means the decision space is especially huge. (3) The decision space is huge, but the target space is small and discrete, so it is difficult to analyze the boundary effect and sensitivity of decision variables. The acquisition, processing, and utilization of information have a great impact on the combat effect, and the contribution rate of different equipment systems is different. (4) The combat process is full of game and confrontation. (5) The duration of an operation is generally long, and the forces, information, cognition, tasks, and coordination relations change dynamically.
In the past, air strike operation mission planning relied heavily on commanders with wisdom and experience. In recent years, researchers have been trying to use artificial intelligence for planning in order to reduce the burden of commanders and improve speed and efficiency. But artificial intelligence is mainly used for tactical decision making, such as target classification [1][2][3], behavior recognition [4], threat assessment and target assignment [5,6], fire support of the maneuver operation [7], and so on. For combat-level mission planning, there is little research on the application of machine learning because it is difficult to obtain training datasets.
However, in many computer games, from classical board games such as chess, checkers, backgammon, and Go to video games such as Dota 2 and StarCraft II, machine learning has made great achievements [8]. Computer programs with machine learning technology can play at the level of a human master and even at a human world champion level. The computer program AlphaGo defeated the human European Go champion by 5 games to 0 in the full-sized game of Go [9,10]. AlphaGo uses "value networks" to evaluate board positions and "policy networks" to select moves. These deep neural networks are trained by a combination of supervised learning from human expert games and reinforcement learning from games of self-play. The MuZero algorithm [11], by combining a tree-based search with a learned model, achieved superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. When evaluated on 57 different Atari games (the canonical video game environment for testing artificial intelligence techniques, in which model-based planning approaches have historically struggled), the MuZero algorithm achieved state-of-the-art performance. AlphaStar was evaluated in the full game of StarCraft II through a series of online games against human players. It was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players [12,13]. It uses data from both human and agent games within a diverse league of continually adapting strategies and counterstrategies, each represented by deep neural networks. Machine learning has also made great progress in fast-paced multiplayer video games such as Dota 2 [14,15]. The range of possible moves in Dota 2 is far greater than that in chess or Go, where each move has at most a few hundred options. In brain storm optimization, Ma et al. [16] proposed an orthogonal learning framework in which two orthogonal design (OD) engines (i.e., an exploration OD engine and an exploitation OD engine) are introduced to discover and utilize useful search experiences for performance improvements.
There are many variables of combat-level decision making in air strike operations, which are interrelated and influenced by each other. It is difficult to find the mapping relationship between decision variables and combat effectiveness by using the analytic optimization method.
In this case, the mapping relationship between decision variables and combat effectiveness has to be established by simulation. Using the air strike combat simulation system, different decision variables can be combined as scenario input, and the resulting combat effect can then be obtained through a simulation run. But given the explosion of the decision space, it is almost impossible to simulate all possible scenarios. Here, the combination of simulation technology and machine learning technology can help overcome the above difficulties.
It is difficult or even impossible to design a general mission planning model, so a more feasible approach is to design an intelligent mission planning model for a specific scenario. In this work, we focus on a specific scenario of air attack operations against islands and try to use an intelligent agent to make some of the decisions, so as to improve the efficiency and speed of mission planning. The rest of this paper is organized as follows. Section 2 presents the specific combat scenario. In Section 3, the overall architecture of the simulation and training environment is introduced. In Section 4, the details of the intelligent model based on neural network are proposed. Section 5 presents the experiment results. Finally, Section 6 gives the conclusion and the future work.

Scenario Description
The red force has one early warning aircraft, three kinds of unmanned reconnaissance aircraft, one ECM aircraft, sixteen bombers, twenty fighters, two anti-aircraft frigates, one anti-aircraft radar, and one military airfield to carry out early warning, reconnaissance, jamming, bombing, air combat, air defense, and other tasks in air strike operations. The blue force includes early warning aircraft, eight bombers, twelve fighters, two ground radars, an anti-aircraft frigate, three ground-to-air missiles (one on the northern island and two on the southern island), two command posts (one on each island), and a military airfield (southern island). Blue aircraft are deployed on the southern blue island. The distance between the two blue islands is 170 km, while the distance between the red island and either of the blue islands is 290 km, as shown in Figure 1. The anti-aircraft frigate can move freely on the sea near the blue islands.
The red force's purpose is to destroy the two command posts of the blue force. If both command posts are destroyed, the red force wins; if neither of the two command posts is destroyed, the blue force wins. The score is calculated according to the damage of combat units.
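The win condition above can be written down as a small decision function. This is only a sketch: the names `EngagementResult` and `outcome` are hypothetical, and the text does not say how an engagement with exactly one command post destroyed is decided, so that case is returned as "undetermined" here rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class EngagementResult:
    # Number of blue command posts destroyed at the end of a run (0, 1, or 2).
    command_posts_destroyed: int


def outcome(result: EngagementResult) -> str:
    """Return the winner according to the rule stated in the scenario."""
    if result.command_posts_destroyed == 2:
        return "red"   # both command posts destroyed
    if result.command_posts_destroyed == 0:
        return "blue"  # neither command post destroyed
    # Assumption: the source does not specify this case.
    return "undetermined"
```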

Simulation and Training Environment
The containerized simulation platform provides the basic computation service. The input to the containerized simulation system consists of combat-level decision instructions, as shown in Figure 2. The simulation model automatically makes action decisions according to the combat-level instructions.
We design an intelligent agent for the mission planning. The intelligent agent transmits its decisions, packaged as instructions, to the containerized simulation platform through the interactive interface. The simulation system generates the training data of the intelligent agent, including the scenarios, decisions, and engagement scores of the red and blue forces. The overall architecture of the simulation and training environment is shown in Figure 2.
The intelligent mission planning agent identifies the intent of the enemy force and makes combat decisions based on attribute information and the identified intent, as shown in Figure 3. Combat-level instructions include mission parameters such as the patrol area of the early warning aircraft, the attack position and formation relationship of bombers, the formation relationship and patrol airspace of fighters, the formation and interference position of the ECM aircraft, the responsible sector of ground-to-air missiles, and so on.
Action-level decisions are automatically generated by simulation models in the containerized simulation platform based on specific rules. Action-level decisions include the flight route to a specified position and tasks performed automatically. The automatic task of the early warning aircraft is to detect as the instructions indicate, the automatic task of the bomber is to find and attack the target within its scope, the automatic task of the fighter is to attack the enemy fighter found, the automatic task of the ground radar is to detect the air target, and the automatic task of the ground-to-air missile is to automatically strike the target that meets the shooting conditions.
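The data flow between the agent and the simulation platform can be sketched as follows. Everything named here (`TrainingSample`, `collect_samples`, the `plan` and `simulate` callables, and the dictionary encoding of scenarios and decisions) is a hypothetical stand-in; the actual interface of the containerized platform is not described in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TrainingSample:
    scenario: Dict[str, float]   # scenario description (encoding assumed)
    decisions: Dict[str, float]  # combat-level decisions sent to the simulator
    red_score: float             # engagement score of the red force
    blue_score: float            # engagement score of the blue force


def collect_samples(
    scenarios: List[Dict[str, float]],
    plan: Callable[[Dict[str, float]], Dict[str, float]],
    simulate: Callable[[Dict[str, float], Dict[str, float]], Tuple[float, float]],
) -> List[TrainingSample]:
    """Run each scenario once and record scenario, decisions, and scores."""
    samples = []
    for scenario in scenarios:
        decisions = plan(scenario)                 # agent or expert plan
        red, blue = simulate(scenario, decisions)  # one simulation run
        samples.append(TrainingSample(scenario, decisions, red, blue))
    return samples
```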

Network Structure.
The input information of the neural network includes the states of the blue command posts and the positions of the frigates, early warning aircraft, fighters, and ECM aircraft. The decision space for the red force is huge, so we selected the parameters with a larger impact on the combat results as the output information. The output of the neural network includes the attack sequence of blue targets, the number of escorting fighters for the early warning aircraft, the number of escorting fighters for the ECM aircraft, the patrol position of the early warning aircraft, and the jamming position of the ECM aircraft. The proposed neural network contains four hidden layers, which include 25, 15, 75, and 100 neurons, respectively, as shown in Figure 4. The attack sequence is the order in which the red bombers strike blue targets on the ground or at sea. The attack sequence is an output of the neural network, and it is also fed back as an input of the neural network to keep the decision relatively stable along the time axis.
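A minimal forward-pass sketch of such a network is given below in plain Python. Only the hidden layer sizes (25, 15, 75, 100) and the feedback of the attack sequence come from the text; the input and output dimensions, the tanh activation, the random initialization, and all function names are assumptions made for illustration.

```python
import math
import random

HIDDEN_SIZES = [25, 15, 75, 100]  # hidden layer widths stated in the text


def init_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    """Apply one fully connected layer to the input vector x."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def build_network(n_state, n_attack, n_other, seed=0):
    """Input = state + previous attack sequence; output = attack + other."""
    rng = random.Random(seed)
    sizes = [n_state + n_attack] + HIDDEN_SIZES + [n_attack + n_other]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def step(network, state, prev_attack, n_attack):
    """One decision step; the returned attack sequence is fed back next step."""
    x = list(state) + list(prev_attack)
    for layer in network[:-1]:
        x = dense(x, layer, math.tanh)          # activation is an assumption
    out = dense(x, network[-1], lambda v: v)    # linear output layer
    return out[:n_attack], out[n_attack:]
```

A typical use would keep the last attack sequence and pass it back in, for example `attack, other = step(net, state, attack, 3)` inside a loop over time steps.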

Learning Strategy.
Because the learning space is very large and each simulation run takes nearly 1 minute, we use supervised learning to train the intelligent agent. The supervised learning uses scenario data generated by experts. Experts adopt heuristic rules to formulate the mission planning scheme.
Firstly, experts analyzed the equipment performance of both forces. The killing distance of the blue ground-to-air missile is 100 km, and the ground-to-air missile is close to the command post. The bombing distance of the red bomber is 80 km, so it is difficult to bomb without the electronic suppression of the ECM aircraft. As a result, the first heuristic rule is that planned activities within the strike range of the ground-to-air missiles are protected by the ECM aircraft whenever possible. Therefore, with only one ECM aircraft, the red force's attacks on the two command posts of the blue force have to be carried out sequentially.
Secondly, the ECM aircraft plays a very important role, so strict escort is needed. If the ECM aircraft is destroyed, it is difficult for the bombers to enter the bombing zone. Therefore, the second heuristic rule is that the red force should send as many fighters as possible to escort the ECM aircraft. Further, the ECM aircraft should fly in formation with the bombers, so that the escort fighters can be used more efficiently.
Finally, after the application of such heuristic rules, the decision space of mission planning is greatly reduced. The bomber's firing position is within 80 km of the command post. The distance between the guard position of the escort fighter and the ECM aircraft should be adjusted as the situation changes. In air combat escort, fighters need to intercept blue aircraft as far away as possible. In the presence of surface-to-air defense, the escort aircraft need electronic interference. This requires the calculation of the interference effect of the ECM aircraft, so that the escort aircraft can escort within the effective interference area and avoid attack by air defense fire. The orientation of the escort aircraft relative to the ECM aircraft depends on the threat situation. The escort aircraft are placed in the direction of the threatening aircraft as much as possible.
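The range argument behind the first heuristic rule is simple arithmetic and can be checked as follows. The two distances come from the text; the assumption that the missile site is co-located with the command post (the text only says it is close), and the function names, are illustrative.

```python
SAM_KILL_RANGE_KM = 100.0  # killing distance of the blue ground-to-air missile
BOMBER_RANGE_KM = 80.0     # bombing distance of the red bomber


def exposure_depth_km(sam_range=SAM_KILL_RANGE_KM, bomber_range=BOMBER_RANGE_KM):
    """How far inside the missile envelope a bomber must fly to release.

    Assumes the missile site sits at the command post.
    """
    return max(0.0, sam_range - bomber_range)


def needs_ecm_cover(distance_to_sam_km, sam_range=SAM_KILL_RANGE_KM):
    """True if a planned position lies within the missile's strike range."""
    return distance_to_sam_km <= sam_range
```

Under these assumptions a bomber has to penetrate 20 km into the missile envelope before it can release, which is why any firing position within 80 km of the command post calls for ECM cover.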
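The rule of placing the escort in the direction of the threat can be sketched geometrically. The sketch below assumes flat two-dimensional coordinates in kilometres and a given standoff distance; both the function name `escort_position` and the `standoff_km` parameter are hypothetical, and the check that the position stays inside the effective interference area is left out.

```python
import math


def escort_position(ecm_xy, threat_xy, standoff_km):
    """Place the escort standoff_km from the ECM aircraft, toward the threat."""
    dx = threat_xy[0] - ecm_xy[0]
    dy = threat_xy[1] - ecm_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        # Threat directly on top of the ECM aircraft: no direction to favour.
        return ecm_xy
    return (ecm_xy[0] + standoff_km * dx / dist,
            ecm_xy[1] + standoff_km * dy / dist)
```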
According to the given combat scenario, the expert group designed a rule-based combat mission planning model. In this model, the ECM aircraft is closely protected, and the attack operations against the three targets are carried out with the ECM aircraft as the center. In the simulation process, if the ECM aircraft is unfortunately destroyed, the bombers attack the target in order, but the effect is very poor.
We redesigned the escort model of the fighter, and the spatial relationship between the fighter and the escort target can be flexibly adjusted according to the escort mission and threat situation. Therefore, the survival probability of the ECM aircraft is greatly increased.

Experimental Results
We made 9 scenarios for the deployment of the blue frigate and early warning aircraft. Then, we invited three experienced experts to develop plans for each scenario. This yields 27 expert-planned scenarios (9 scenarios × 3 experts) that can be used as the training data for supervised learning. We then randomly perturb the above scenario data to keep the plan from being trapped in a local optimum.
For each scenario, even if the parameters remain unchanged, the results are not exactly the same for each run, so multiple runs are required to estimate its average score. For example, under a certain parameter configuration, a scenario is run 46 times, and the scores of both forces are recorded as shown in Figure 5. As the number of runs increases, the average scores gradually stabilize, as shown in Figure 6. In general, the average score is stable after 10 runs. Therefore, 10 is taken as the standard number of runs for each scenario.
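The perturbation step can be illustrated as below. The text does not say which quantities are perturbed or by how much, so this sketch assumes that unit positions are shifted by a bounded uniform offset; the scenario encoding, `max_shift_km`, and the function names are hypothetical.

```python
import random


def perturb_scenario(scenario, max_shift_km, rng):
    """Return a copy of the scenario with each position shifted randomly."""
    return {
        name: (x + rng.uniform(-max_shift_km, max_shift_km),
               y + rng.uniform(-max_shift_km, max_shift_km))
        for name, (x, y) in scenario.items()
    }


def augment(expert_scenarios, copies, max_shift_km, seed=0):
    """Keep the expert scenarios and add perturbed copies of each one."""
    rng = random.Random(seed)  # fixed seed keeps the augmentation repeatable
    augmented = list(expert_scenarios)
    for scenario in expert_scenarios:
        for _ in range(copies):
            augmented.append(perturb_scenario(scenario, max_shift_km, rng))
    return augmented
```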
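One way to make the choice of run count reproducible is to track the running mean of the scores and stop once it stops moving. The stability criterion used here (the running mean changes by less than a tolerance for several consecutive runs) is an assumption; the paper reads the value off a plot, and the function names are hypothetical.

```python
def running_means(scores):
    """Mean of the first n scores, for n = 1 .. len(scores)."""
    means, total = [], 0.0
    for n, score in enumerate(scores, start=1):
        total += score
        means.append(total / n)
    return means


def runs_until_stable(scores, tolerance, window=3):
    """Number of runs after which the running mean has changed by less than
    `tolerance` for `window` consecutive runs, or None if that never happens."""
    means = running_means(scores)
    calm = 0
    for i in range(1, len(means)):
        if abs(means[i] - means[i - 1]) < tolerance:
            calm += 1
            if calm >= window:
                return i + 1  # i is zero-based, so i + 1 runs were used
        else:
            calm = 0
    return None
```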
After training the agent, the rule-based decision-making method is replaced by the intelligent agent in the simulation system. We generated 800 random blue force deployments to test the agent and compared it with the rule-based randomized decision-making method. The simulation results are shown in Table 1. The probability of the trained agent winning is as high as 70.125%, which is far higher than that of the rule-based randomized decision-making method. Then, we applied the agent to the 27 scenarios that were used to obtain expert experience. The total combat result of the intelligent agent is better than that of the experts, as shown in Table 2.
Discrete Dynamics in Nature and Society

Conclusion
Mission planning of air strike operations is hard because its decision space is huge and it is difficult or even impossible to design a general mission planning model. It is feasible, however, to design an intelligent agent for a specific scenario. In this work, we developed an intelligent mission planning model for a specific air strike scenario against islands, based on neural network and simulation, that performs at the level of experienced commanders. Compared with manual decision making, intelligent agent decision making has stronger adaptability and faster computing speed. Experiments show that the intelligent agent is as good as the experts in our dataset, and it is far better than the rule-based decision-making method with randomized parameters.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.