A Defense Strategy Selection Method Based on the Cyberspace Wargame Model

Network defenders always face the problem of how to use limited resources to make the most reasonable decision. The network attack-defense game model is an effective means to solve this problem. However, existing network attack-defense game models usually assume that defenders will no longer change their defense strategies after deploying them, whereas in an advanced network attack-defense confrontation, defenders usually redeploy defense strategies for different attack situations. Therefore, existing network attack-defense game models struggle to accurately describe the advanced network attack-defense process. To address this challenge, this paper proposes a defense strategy selection method based on the network attack-defense wargame model. We model the advanced network attack-defense confrontation process as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture, and we use the Monte Carlo tree search method to solve for the optimal defense strategy. Finally, a network example is used to illustrate the effectiveness of the model and method in selecting the optimal defense strategy.

(i) We propose a formal description method for optimal defense strategy selection, which formally defines the selection of optimal defense strategies for network security. (ii) We propose a network attack-defense wargame model, a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture. (iii) We propose a defense strategy selection method based on Monte Carlo tree search, using artificial intelligence methods to analyze attack-defense strategies. (iv) We design a simulation instance to illustrate the effectiveness of the model and algorithm in selecting the optimal defense strategy.
The rest of this paper is structured as follows. The second section discusses related work. The third section details the formal description of optimal defense strategy selection. The fourth section presents the network attack-defense wargame model. Building on this model, the fifth section uses Monte Carlo tree search to select the defense strategy. The sixth section gives an example to illustrate the effectiveness of the model and algorithm. The seventh section compares this work with related work. Finally, the eighth section summarizes the paper and proposes future work.

Related Work
Although some research results have been achieved on attack-defense models [2,3], strategy quantification and selection [4], and game theory [5][6][7], this field is still in its infancy and no systematic theoretical methods have been formed.
Lee et al. [9] first proposed a cost-sensitive model as the basis of response decisions in 2002, which determined whether to respond according to the attack cost and revenue. The decision-making idea was relatively simple, and the quantification of cost-revenue was relatively rough; however, its ideas and methods of cost-revenue quantification, classification, and attack classification can still be used for reference. Li et al. [10] established a noncooperative game model between an attacker and a sensor trust node and gave the optimal attack strategy by calculating the Nash equilibrium. Because of the complexity of the restriction conditions of Nash equilibrium, Serra et al. [11] used the Pareto optimization method to calculate the Nash equilibrium solution of the game. Esmalifalak et al. [12] took the attack and defense times as the basic strategy of both sides, established a complete-information two-person zero-sum game model, and verified it in the system. Wu et al. [13] used a reinforcement learning algorithm to solve for the Nash equilibrium and realize security situation analysis and prediction of an intelligent system. Liu et al. [14] investigated how to achieve such a cost-revenue trade-off optimally by proposing a two-player strategic game model between the attacker and the defender.
Then, a graph-based simulated annealing algorithm is proposed to derive the utility-maximizing strategy.
Wang et al. [15] analyzed the influence of the selection of cooperation strategy on the cooperation effect of sensor network nodes with the help of an evolutionary game. Na et al. [16] and Abass et al. [17] calculated the optimal evolutionarily stable strategy against DoS attacks and APT attacks, respectively, by using the replicator dynamic equation. Hayel and Zhu [18] established an evolutionary Poisson game model between malware and antivirus programs and analyzed the program opening strategy via the replicator dynamic equation. Note that the randomness of attack and defense means inevitably leads to state jumps of the game system.

Formal Description of Optimal Defense Strategy Selection
The complex network topology, coupled with the various network node states, is difficult to describe in real time. For example, in a network of n nodes, there are 2^n · 2^{n(n−1)/2} different situation combinations, where 2^n represents the authority states of the nodes and 2^{n(n−1)/2} represents the possible states of the network topology. Therefore, a formal description of the network attack-defense environment can greatly reduce the computational complexity.
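To make the combinatorics concrete, the count above can be computed directly (a small illustrative script; the function name is ours, not part of the model):

```python
# Number of distinct attack-defense situations for a network of n nodes:
# 2**n node-authority assignments times 2**(n*(n-1)//2) possible topologies.
def situation_count(n: int) -> int:
    return 2 ** n * 2 ** (n * (n - 1) // 2)

for n in (3, 5, 13):
    print(n, situation_count(n))
```

Even for n = 13 (the example network used later), the count is astronomically large, which motivates the compact vector and matrix description that follows.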
Definition 1 (network topology matrix). The network topology can be represented by a two-tuple G = (V, E), where V represents the set of nodes in the network and E represents the set of edges. An n-order square matrix A_[n×n] represents the topology, where a_ij = 1 if (v_i, v_j) ∈ E and a_ij = 0 otherwise.
Definition 3 (network attack reachable node vector). The network attack reachable node vector at time t is represented by R→(t) = (r_1, r_2, ..., r_n), where n is the number of nodes in the network. Each component is calculated as follows: r_i = 1 if node i is reachable in the next attack, and r_i = 0 if node i is unreachable in the next attack.
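The two definitions above can be sketched in code (a minimal illustration; the function names and the edge list are ours, not part of the model):

```python
# Sketch of the topology matrix A[n][n] and the attack-reachable node
# vector r(t) derived from the nodes the attacker currently controls.
def topology_matrix(n, edges):
    """A[i][j] = 1 iff an edge connects node i and node j."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

def reachable_vector(A, controlled):
    """r_i = 1 iff node i is reachable in the next attack step, i.e. adjacent
    to some attacker-controlled node and not already controlled."""
    n = len(A)
    return [
        1 if i not in controlled and any(A[c][i] for c in controlled) else 0
        for i in range(n)
    ]

A = topology_matrix(4, [(0, 1), (1, 2), (2, 3)])
print(reachable_vector(A, {0}))  # only node 1 is adjacent to controlled node 0
```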

Security and Communication Networks
Definition 4 (attack-defense posture vector). The attack-defense posture vector reflects the permission of each node in the network. In this paper, we assume that if the permission of a node does not belong to the attacker, it must belong to the defender. The vector is represented by S→(t) = (s_1, s_2, ..., s_n), where n is the number of nodes in the network and s_i = 1 if the permission of node i belongs to the defender, and s_i = −1 if the permission of node i belongs to the attacker.
Definition 5 (attack strategy node vector A→(t)). It represents the next attack node under the state S→(t). The attack strategy vector is expressed as A→(t) = (a_1, a_2, ..., a_n), where a_i = 1 if the permission of node i belongs to the defender after the game and a_i = −1 if it belongs to the attacker after the game. There is one and only one a_i ≠ 0 in A→(t), and this a_i marks node i as the node of the current round of the confrontation game.
From this definition, it follows that at time t, the nodes involved in the next attack strategy must belong to the reachable nodes of the network attack.
Definition 6 (target vector T→). After several strategy combinations the target is reached, and the target vector is expressed as T→ = (t_1, t_2, t_3, t_4, ..., t_target, ..., t_n), where t_target = −1 represents that the goal of the attacker is to obtain the permission of the node target.
According to the above description, under certain attack-defense resources and based on the attack-defense game rules, the attacker's seizure of the permission of the node target can be described as follows: given the network topology A_[n×n], an initial attack reachable node vector R→(t), and an initial attack-defense posture vector S→(t) = (1, 1, ..., 1), after several strategies A→(t), the target T→(t) is reached. The defense strategy selection problem faced by the defender can then be described as how the defender allocates defense resources against each attack strategy A→(t) of the attacker.

Network Attack-Defense Wargame Model
Wargame [19] is a kind of turn-based, role-playing strategy game. This paper models the high-level network attack-defense confrontation process as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture.

Model Hypothesis.
The hypotheses of the network attack-defense wargame model are as follows.
Hypothesis 1 (rational hypothesis). Assuming that the attacker and the defender are completely rational, the attacker will not launch unprofitable attacks and the defender will not defend at all costs.
Hypothesis 2 (cost assumptions). The goals of the attacker and the defender are to obtain and to protect the network equipment, respectively. The offensive and defensive costs of both parties can be quantified and measured during the game.
Hypothesis 3 (game hypothesis). Assume that the winner can replace the loser in the attack-defense game and obtain all the permissions of the node. If the attacker succeeds, he will not be discovered by the defender and will proceed to the next round of the attack-defense game. If the attacker fails, the defender can redeploy defense measures, which is very important and effective: the existing network attack-defense game models usually assume that the defender will not change the defense strategy after deploying it, but in a high-level network attack-defense confrontation, the defender usually redeploys defense strategies for different attack situations. Finally, if the game is a tie, the attacker is not discovered by the defender, and this node can be used as a springboard node for the next round of the game.
Hypothesis 4 (attack hypothesis). Assume that at the beginning of the attack, the permissions of all nodes in the network belong to the defender and no less than two network devices are exposed to the external network. If there were only one entry node, the defender would only need to protect that node, and there would be no game process.

Formal Description.
In the network attack-defense game, the attacker finally obtains the target node permissions by obtaining the node permissions of the defender along the attack path. The following presents the network attack-defense wargame model in the multigame state.
Resource(t) = (Resource_A(t), Resource_D(t)) represents the resources of the attacker and the defender. Resource_A(t) represents the resource value of the attacker at time t, and Resource_D(t) represents the resource of the defender at time t. When Resource_A(t) ≤ 0, the attacker no longer has enough resources to launch an attack, and the game is judged a failure for the attacker.
Resource_A(t)→ = (a_1, a_2, a_3, a_4, ..., a_n) represents the attacker's resource allocation vector. There is one and only one a_i ≠ 0 in (a_1, a_2, ..., a_n), which represents that the attacker attacks node i at time t.
Resource_D(t)→ is the defender resource allocation vector, where n is the number of nodes in the network; Resource_D(t)→ represents the distribution of the defender's force over each node at time t.
cost(t) represents the attack-defense cost function at time t in the network attack-defense game process, including the attacker cost cost_A(v_i, Vul_j, t), which represents the cost at time t for the attacker to attack the defender at node v_i by using vulnerability Vul_j, and the defender cost cost_D(v_i, t), which represents the defense force deployed by the defender on node v_i at time t.
revenue(t) represents the attack-defense revenue function at time t in the network attack-defense game process, including the attacker revenue revenue_A(t), which represents the revenue of the attacker attacking the defender at time t, and the defender revenue revenue_D(t), which represents the revenue of the defender defending against the attacker at time t.
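The quantities above (resources, allocation vectors, cost, and revenue) can be collected into one record. The following sketch is purely illustrative; the class and field names are ours, not part of the model:

```python
from dataclasses import dataclass

@dataclass
class WargameState:
    """Per-round bookkeeping for the quantities defined above (illustrative)."""
    resource_A: float      # attacker resource remaining at time t
    resource_D: float      # defender resource remaining at time t
    defense_alloc: list    # defender resource on each node (Resource_D(t) vector)
    revenue_A: float = 0.0 # cumulative attacker revenue
    revenue_D: float = 0.0 # cumulative defender revenue (zero-sum mirror)

    def attack_alloc(self, node: int, cost: float, n: int) -> list:
        """Attacker allocation vector: exactly one nonzero entry a_i = cost."""
        a = [0.0] * n
        a[node] = cost
        return a

s = WargameState(resource_A=10, resource_D=10, defense_alloc=[0.0] * 4)
print(s.attack_alloc(2, 3.0, 4))  # one nonzero entry, as the definition requires
```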

Cost-Revenue Quantification.
In the process of the network attack-defense game, the cost-revenue function needs to be quantified.
Cost_A(v_i, Vul_j, t) denotes the cost of the attacker attacking the defender at node v_i by using vulnerability Vul_j at time t. The attack cost in this model represents the value of the vulnerabilities used by the attacker, assuming that the attacker purchases vulnerabilities and exploit tools through the vulnerability market. A higher vulnerability value thus leads to a higher attack cost, which in turn implies greater difficulty and a higher cost of defense. However, once a vulnerability is found by defenders, it loses its value, as defenders can simply deploy firewall rules.
This paper uses the vulnerability price proposed in the VulDB library to evaluate the attack cost. VulDB is the number one vulnerability database worldwide, with more than 178,000 entries available. Its specialists have worked with the crowd-based community to document the latest vulnerabilities every day since 1970, providing technical details and additional threat intelligence such as current risk levels and exploit price forecasts. Their price estimations are calculated via mathematical algorithms developed by their specialists over the years through observing the exploit market and the exchange behavior of involved actors, which allows the prediction of generic prices by considering multiple technical aspects of the affected vulnerability [20].
This paper quantifies the attack cost into 9 different levels according to the price ranges proposed by VulDB; a higher level represents a higher attack cost as well as a higher vulnerability value. The price and level of the vulnerability are shown in Table 1.
(1) Revenue_A(t). It represents the revenue of the attacker at time t. In the attack-defense game, the attack revenue increases by k when the attacker gets the permission of a nontarget node, which is the path node revenue; when the attacker obtains the permission of the target node, the attack revenue increases by n, where n is the number of nodes in the network. In the attack process, the ultimate goal of the attacker is to obtain the permission of the target node, so the sum of the attack revenue obtained from the nodes on the attack path should be less than or equal to the attack revenue obtained from the target node; otherwise, the attacker could maximize his attack revenue merely by focusing on the attack process. In this paper, for the convenience of calculation, k = 1, so the per-round attack revenue is 1 if the attacker obtains the permission of a nontarget node, n if the attacker obtains the permission of the target node, and 0 if the attacker does not obtain the node permission.
Table 1: Vulnerability price and level correspondence table.
(2) Cost_D(v_i, t). The defender's cost represents the defense cost deployed by the defender on node v_i. There are many kinds of defender costs, such as manpower, equipment, and resources. In this paper, we do not consider the specific methods of defense but only pay attention to the allocation of limited defense resources over the network topology. Corresponding to the attack cost, the defense cost is also quantified into levels: the higher the level, the more defense resources are deployed on the node.
(3) Revenue_D(t). It represents the defense revenue of the defender at time t. Since attack and defense form a zero-sum game, revenue_D(t) = −revenue_A(t): −1 if the attacker obtains the permission of a nontarget node, −n if the attacker obtains the permission of the target node, and 0 if the attacker does not obtain the node permission.
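The piecewise revenue quantification above, with k = 1 and the zero-sum relation revenue_D(t) = −revenue_A(t), can be sketched as follows (function names are ours):

```python
# Per-round revenue quantification: 1 for a captured path node, n for the
# target node, 0 when no permission is obtained; defense revenue mirrors it.
def attacker_revenue(n_nodes, captured_node, target, success):
    if not success:
        return 0
    return n_nodes if captured_node == target else 1

def defender_revenue(n_nodes, captured_node, target, success):
    # zero-sum game: revenue_D(t) = -revenue_A(t)
    return -attacker_revenue(n_nodes, captured_node, target, success)

print(attacker_revenue(13, 12, 12, True))  # target captured: +n
print(defender_revenue(13, 4, 12, True))   # path node lost: -1
```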

Attack-Defense Strategy.
In the network attack-defense game process, the attack and defense strategies must be defined.
Definition 8 (attack-defense strategy set S^k = (S^k_A, S^k_D)). It represents the set of action strategies taken by the attacker or the defender in the game state k. S^k_A(a) = (r_1, r_2, r_3, r_4, ..., r_n) represents the strategy of attack resource allocation, where there is one and only one r_i ≠ 0 in (r_1, r_2, ..., r_n); it means that the attacker launches an attack at node i at this time, with cost_A = r_i. S^k_D(d) represents the strategy of defense resource allocation at each node in this state.
Definition 9 (Nash equilibrium [22]). The game model of NADWM is a zero-sum random game. In the game state k, the attack-defense strategy set can be expressed as S^k = (S^k_A, S^k_D). The mixed strategy (S^k_A(a*), S^k_D(d*)) is a Nash equilibrium if and only if this mixed strategy is the optimal response of both the attacker and the defender.
Because of the huge number of strategy choices, this paper uses the Monte Carlo tree search method, described in the fifth section, to solve for the Nash equilibrium. The final task of the Monte Carlo tree search is to find the Nash equilibrium. It searches through continuous selection, expansion, and simulation, and finally backpropagates whether the strategy on the explored path is appropriate: the utility of every action of the victorious party is increased, while that of the losing party is reduced, so that a balanced state can finally be reached.

State and State Transition Rules.
The construction of the network attack-defense game process consists of three steps: determining the initial state of the model, setting the rules of game state transition, and defining the termination state of the game.

The Initial State of the Model.
To start the game, the following conditions must be determined: (i) In the game process, there is only one attacker p_A and one defender p_D in the target network structure. (ii) In the initial stage, the permission of all nodes belongs to the defender, and the defender has no less than 2 network devices exposed to the outer network; if only one device were exposed, the optimal strategy of the defender would be to put all defense resources on this outer network node, and there would be no intelligent game. (iii) The defender distributes the defense costs over all nodes in the network.

Game State Transition Rules.
The game state transition rules satisfy the following two conditions: (i) After the game starts, the attacker can move randomly in any direction within the reachable nodes of the network topology, and each move consumes the attack cost cost_A. (ii) In the network attack-defense process, the attacker is highly targeted when attacking, while the defender's measures have wide universality: an exploit may only apply to a certain environment of a certain node in the network, whereas in most cases defensive tools apply to both computers and servers. Therefore, during each attack, only the attacker deducts the cost of the attack from its total resource. Then, the game round is decided according to the defense cost and attack cost. When cost_A > cost_D, the attacker wins: the node permissions belong to the attacker, the defender does not detect the attacker, and it is judged whether the node is the target node; if it is not, revenue_A = revenue_A + 1 and revenue_D = revenue_D − 1, and the attacker proceeds to the next attack. When cost_A = cost_D, the attacker and defender are tied; the node still belongs to the defender, but the attacker can reach the next node from this node. When cost_A < cost_D, the defender wins. The defender can redistribute the defense resources during each round of the game.
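The transition rules above can be sketched as a single adjudication function (an illustrative reading of the rules; the function name and the state dictionary layout are ours):

```python
# One round of the game: only the attacker pays its cost, and the outcome
# compares cost_A with the defense cost deployed on the attacked node.
def play_round(state, node, cost_A, cost_D, target, n_nodes):
    state["resource_A"] -= cost_A          # attacker always pays
    if cost_A > cost_D:                    # attacker wins the node undetected
        state["owner"][node] = "attacker"
        gain = n_nodes if node == target else 1
        state["revenue_A"] += gain
        state["revenue_D"] -= gain
        return "attacker_wins"
    if cost_A == cost_D:                   # tie: node kept, usable as springboard
        state["springboards"].add(node)
        return "tie"
    return "defender_wins"                 # defender may now redeploy resources

state = {"resource_A": 10, "owner": {}, "revenue_A": 0, "revenue_D": 0,
         "springboards": set()}
print(play_round(state, node=1, cost_A=4, cost_D=2, target=12, n_nodes=13))
print(state["resource_A"], state["revenue_A"])
```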

Game Termination State.
The game ends if any of the following conditions is met: (i) The attack resource is less than or equal to 0. (ii) The attacker has no new reachable nodes to choose from. (iii) The attacker arrives at the target node, and the revenue is updated as revenue_A = revenue_A + n and revenue_D = revenue_D − n.
In the attack-defense game process, the goal of the defender is to ensure the security of the target node and, as far as possible, the security of the other nodes in the network. In other words, the defender aims to reduce the attack revenue under the premise of a given amount of attack-defense resources.
The simulation operation process of NADWM is shown in Figure 1.

Algorithm Idea.
In the NADWM model, the large number of game states leads to a huge amount of computation, so it is difficult to select the optimal defense strategy with an enumeration method. This paper adopts the Monte Carlo tree search method, a heuristic search algorithm based on the tree data structure that remains effective in huge search spaces [23]. For different attack states, this scheme provides a relatively suitable defense strategy for the defender.

Algorithm Description.
Monte Carlo tree search has recently boosted the performance of computer Go programs; it is a tree search strategy that balances historical and future returns. The basic principle is to randomly select a maneuver strategy and then update the value of the originally selected strategy through the expected return. The algorithm performs a large number of repeated random simulations until the best strategy emerges. Specifically, MCTS is divided into 4 parts: Selection, Expansion, Simulation, and Backpropagation. It has been empirically proved that the performance of MCTS scales well with the number of simulations used to select an optimal move in computer Go. In addition, developing efficient parallel MCTS (PMCTS) algorithms is important for improving performance, because single-processor performance can no longer be expected to increase as it used to [24]. The PMCTS principle is shown in Figure 2.

Monte Carlo Tree Search Steps.
Monte Carlo tree search essentially maintains a tree in which each node corresponds to a specific situation state R. The edges of a node are composed of all game actions of both attack-defense parties in state R [25], as shown in Figure 3.
Step 1. Selection. Select the node to visit next.
Definition 10. N_(R,p) represents the number of times the node performs action p in state R.
Definition 11. Revenue_(R,p) represents the attack revenue of the node performing action p in state R.
Definition 12. Q_(R,p) represents the average attack revenue obtained by the node performing action p in state R and reflects the level of attack revenue that state R can provide. It is calculated as Q_(R,p) = Revenue_(R,p) / N_(R,p).
Definition 13. T_(R,p) represents the relative defense revenue. Because the attack-defense parties form a zero-sum game [19], the higher the attack revenue, the worse the defense strategy, and T_(R,p) is calculated as T_(R,p) = β − Q_(R,p), where β is a positive number that ensures T_(R,p) ≥ 0.
To select a node at the current position, the defender chooses, from all legal strategies that satisfy the rules of the game, the strategy p* = argmax_p (T_(R,p) + U_(R,p)), where T_(R,p) is the relative defense revenue and U_(R,p) is the upper limit of the confidence interval of T_(R,p), calculated as U_(R,p_j) = c · sqrt(Σ_i n_i) / (1 + n_j), where c is the priority weight stored in the strategy branch, n_j is the number of times action p_j has been executed, and Σ_i n_i is the total number of explorations of the policy so far. The parameter c can be chosen using expert knowledge in practice; the larger c is, the more attention is paid to nodes with relatively few visits [26].
It can be seen that U_(R,p) measures the degree of policy exploration as the degree of uncertainty of T_(R,p). Adding U_(R,p) mitigates the tendency of a purely greedy policy to fall into local minima [27].
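The selection score can be sketched as follows. This is a UCT-style reconstruction of Definitions 12-13 under stated assumptions: β is taken large enough that T ≥ 0, c is treated as an exploration constant, and all names are ours:

```python
import math

# Q is the average attack revenue, T = beta - Q is the relative defense
# revenue, and U is an exploration bonus favoring rarely tried actions.
def selection_score(revenue, n_j, total_n, beta=20.0, c=1.4):
    q = revenue / n_j if n_j else 0.0            # Q(R,p)
    t = beta - q                                 # T(R,p) >= 0 for beta large enough
    u = c * math.sqrt(total_n) / (1 + n_j)       # U(R,p)
    return t + u

# The defender picks the legal action maximizing T + U.
stats = {"p1": (30.0, 10), "p2": (5.0, 2), "p3": (0.0, 0)}  # (revenue, visits)
total = sum(n for _, n in stats.values())
best = max(stats, key=lambda p: selection_score(*stats[p], total))
print(best)  # the never-visited action gets the largest exploration bonus
```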
Step 2. Expansion. In order to parallelize the algorithm, we modify the expansion stage in the way proposed in ref. [28]. After selecting the attack strategy by S_(R,p), we expand the node into m random children rather than a single child. In this paper, we assume that the defense resources of the defender can be quantified as a nonnegative integer. Since the defender has at most resource + 1 ways of deploying the defense force, m is less than resource + 1.
Step 3. Simulation. Selection and expansion are part of the simulation process, and the simulation must follow the rules of the game. There are two end conditions for a simulation: either the simulation reaches a leaf node, or it reaches the end state of the strategy and cannot be expanded. In order to parallelize the algorithm, we also modify the simulation stage in the way proposed in ref. [28]: rather than simulating each child state only once, we simulate each child state k times. Here, k is a parameter that can be determined from m and N, where N is the number of nodes in the cluster; k is at least N/m. This ensures that every node in the cluster is occupied during the simulation stage.
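The relation between the cluster size N, the expansion width m, and the per-child simulation count k (k at least N/m) can be written down directly (an illustrative helper; the function name is ours):

```python
import math

# Smallest k with m * k >= N, so that all N cluster workers stay busy
# when each of the m expanded children is simulated k times.
def simulations_per_child(N: int, m: int) -> int:
    return max(1, math.ceil(N / m))

print(simulations_per_child(N=16, m=5))  # k = 4 keeps 16 workers occupied
```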
Step 4. Backpropagation. After the simulation process is completed, the parameters of all edges on the simulation path must be updated; this process reflects how the Monte Carlo tree search samples stronger strategic actions. For a single simulation, the update is N_(R,p) ← N_(R,p) + 1 and Revenue_(R,p) ← Revenue_(R,p) + Revenue*_(R,p).

Figure 2: The principle of the PMCTS algorithm.

The formulas above update two related variables: the number of visits of each edge is increased by 1, and the attack revenue is accumulated by Revenue*_(R,p), which represents the increase in the cumulative attack revenue during the expansion process; finally, the new T_(R,p) is calculated. Through the above steps, as the number of samples increases, the Monte Carlo tree grows and covers more states. The shape of the final tree is usually unbalanced: some states are searched very deeply and some only shallowly. This also reflects the advantage of Monte Carlo tree search: the most promising branches are fully searched to a great depth.
At this point, the Monte Carlo tree search sampling is complete, and the information of all edges has been updated. Based on the finally formed Monte Carlo tree, the defender can choose a defense strategy for the actual attack action in the current situation.
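The four stages can be combined into a compact, sequential MCTS skeleton. This is a generic sketch, not the parallel variant of ref. [28]; the game interface (`legal`, `step`, `rollout`) and all names are illustrative:

```python
import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.N, self.revenue = [], 0, 0.0

def mcts(root, legal, step, rollout, iters=200, c=1.4):
    for _ in range(iters):
        node = root
        # 1. Selection: descend by an exploration-adjusted score.
        while node.children:
            node = max(node.children, key=lambda ch: (
                (ch.revenue / ch.N if ch.N else 0.0)
                + c * math.sqrt(math.log(node.N + 1) / (ch.N + 1))))
        # 2. Expansion: add a child for each legal action.
        for a in legal(node.state):
            node.children.append(Node(step(node.state, a), parent=node))
        leaf = random.choice(node.children) if node.children else node
        # 3. Simulation: playout from the leaf, returning a revenue value.
        value = rollout(leaf.state)
        # 4. Backpropagation: N <- N + 1, revenue <- revenue + value on the path.
        while leaf is not None:
            leaf.N += 1
            leaf.revenue += value
            leaf = leaf.parent
    return max(root.children, key=lambda ch: ch.N) if root.children else root

# Toy usage: states are integers, actions add 1 or 2, revenue equals the state.
root = Node(0)
best = mcts(root, legal=lambda s: [] if s >= 2 else [1, 2],
            step=lambda s, a: s + a, rollout=lambda s: float(s), iters=50)
print(root.N)  # every iteration backpropagates once through the root
```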

Algorithm Analysis.
The algorithm complexity can be expressed simply as O(mkI/C), where m and k are as in Section 5.2, I is the number of iterations, and C is the number of cores available [28].

Experiment and Analysis
According to the network game rules, a simulation experiment is designed to test and analyze the effectiveness and applicability of the model and algorithm proposed in this paper.

Topological Structure Description.
This paper assumes a typical network topology as shown in Figure 4. In the beginning, the attacker penetrates the internal network from the host v_0, and the defender protects the network system V = {v_1, v_2, ..., v_12, v_13}, comprising 3 desktops, 4 laptops, 2 printers, and 4 servers. The network is divided into 4 different subnets, and the target of the attacker is v_13.

Formal Description of the Optimal Defense Strategy.
A formal description of the attack-defense strategy is carried out according to the method in Section 3. Initially, the attacker has the authority of node v_0 and launches an attack on the network system V; the final attack target is v_13. The network topology matrix A is expressed according to Definition 1. The formal expression of the attacker's target is T→ = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, −1), which means that the attacker aims to obtain the permission of the target v_13. The initial network attack reachable node vector is formally expressed as R→ = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), which means that the attacker can attack nodes v_1 and v_2 first.
In summary, based on the network shown in Figure 4, the optimal defense strategy selection problem can be described as follows: under certain attack-defense resources and based on the rules of the wargame model, given the network structure A_[13×13], the initial reachable node vector R→, and the initial attack-defense posture vector S→ = (1, 1, ..., 1), find the optimal defense strategy for each attack strategy.
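The formal objects of this example, as stated above, can be encoded directly. The edge set of Figure 4 is not reproduced here, so only the vectors given in the text appear (the helper name is ours):

```python
# Formal objects of the example: 13 nodes, target v13 (index 12),
# entry nodes v1 and v2, all permissions initially with the defender.
n = 13
T = [0] * n; T[12] = -1          # target vector: attacker wants v13's permission
R = [0] * n; R[0] = R[1] = 1     # initially reachable: v1 and v2
S = [1] * n                      # initial posture: all nodes held by defender

def attacker_wins(posture):
    """True once the target entry of the posture matches the target vector."""
    return posture[12] == T[12]

print(attacker_wins(S))  # False at the start of the game
```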

Description of Attack-Defense Resources and Attack-Defense Strategies.
For the convenience of calculation, resource_A = resource_D = 10 is set at the beginning of the game, which means that the resources of the attacker and the defender are both 10 at first.
In this simulation experiment, when the attacker obtains the authority of a node v at a certain cost, he obtains the highest root privilege on it.

Defense Strategy Selection.
In this example, we set two typical attack-defense states that defenders must pay particular attention to during the attack-defense game: the beginning of the game and the end of the game. The selection of the defense strategy is discussed for these states using the method proposed in this paper.

At the Beginning of the Game.
The following discusses the distribution of the defender's force at the beginning of the attack. The defender first allocates the defense resource for each node in the network. There are tens of thousands of defense resource deployment plans for the abovementioned network. Analyzing the topological diagram of the network structure shown in Figure 4, it can be seen that node v_9 is a necessary node from subnet C to subnet D, whose authority attribution plays a key role in the network attack-defense. Combined with the network topology, this paper selects four typical initial game states for analysis, one of which corresponds to the vector (1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1). Considering both the path revenue and the target node revenue of the attacker, each strategy is evaluated according to the relative defense revenue: the larger the relative defense revenue, the better the defense effect of the strategy. The four typical strategies are analyzed by relative defense revenue under different simulation times, as shown in Table 2, where rows indicate the number of iterations and columns indicate the different strategies.
As shown in Table 2, when the number of simulations increases, the relative defense revenue score of each defense strategy gradually stabilizes, and the optimal strategy can be distinguished. According to the simulation results, we analyze the four strategies in detail.

Strategy 1.
This strategy pays attention only to the network entry nodes.
In this strategy, we concentrate the defense resource on the two entrance nodes, each deploying 5 resources. During the first attack, the attacker has an initial attack resource of 10, so the probability of the attacker entering the intranet is 50%. Suppose the attacker first attacks node v_1; if the attack succeeds, the attacker can drive straight into the network and obtain the permission of node v_13 directly. At the same time, the gains on the attack path are relatively large, revenue_A ≥ 8, leaving the relative defense revenue at a medium level.

Strategy 2.
This strategy pays attention only to the key node of the network.
In this strategy, we concentrate all the defense resources on the key node v_9. During the attack, the attacker can successfully obtain the permissions of the other nodes, but because the attack resource satisfies resource_A ≤ cost_D, the attacker cannot get the permission of v_9 and thus cannot enter subnet D. This ensures the security of the target node v_13 but lets the attacker gain more along the way, with revenue_A = 8. It is not acceptable that this strategy protects the target but loses many node permissions along the path: if the security of the important data server is ensured yet network devices such as computers and printers are all implanted with Trojan horses or hit by file encryption, the company still cannot operate normally. So, in our opinion, this is not a good strategy. This conclusion is in line with our simulation result, which shows the effectiveness of the wargame model.

Strategy 3.
This strategy pays attention to the network entry nodes and the key node at the same time.
This strategy is a compromise between the above two strategies: during the attack-defense process, the defender pays attention to the entry nodes and the key node simultaneously.
That is to say, on the way to the target node, the defender balances the defense resource against the network structure characteristics, setting up two layers of safety protection. The data in Table 2 show that it is the most suitable defense deployment plan. This plan also conforms to common methods of network protection, which proves the practicability of the wargame model.

Strategy 4.
This strategy is to randomly select 10 nodes on the network to defend and divide the defense resources equally among them.
This strategy is random. When cost_A is greater than 1, the attacker can attack a node where the defender has deployed defenses without being discovered. However, since the defender's resources are scattered, the strategy cannot defend effectively when the cost of attack is high.
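The four strategies can be sketched as defense-resource allocations over the example network. The node names v1 to v13 and the budget of 10 follow the discussion above; treating v2 as the second entrance node is our assumption, since the text names only v1.

```python
import random

NODES = [f"v{i}" for i in range(1, 14)]  # v1..v13, as in the example network
ENTRY_NODES = ["v1", "v2"]               # v2 as the second entrance is an assumption
KEY_NODE = "v9"                          # only entry from subnet C to subnet D
DEFENSE_BUDGET = 10

def strategy_entries():
    """Strategy 1: concentrate the budget on the two entrance nodes, 5 each."""
    return {n: DEFENSE_BUDGET / len(ENTRY_NODES) for n in ENTRY_NODES}

def strategy_key_node():
    """Strategy 2: concentrate the whole budget on the key node v9."""
    return {KEY_NODE: DEFENSE_BUDGET}

def strategy_balanced():
    """Strategy 3: split the budget between the entrance nodes and v9."""
    half = DEFENSE_BUDGET / 2
    alloc = {n: half / len(ENTRY_NODES) for n in ENTRY_NODES}
    alloc[KEY_NODE] = DEFENSE_BUDGET - half
    return alloc

def strategy_random(seed=0):
    """Strategy 4: spread the budget evenly over 10 randomly chosen nodes."""
    rng = random.Random(seed)
    return {n: DEFENSE_BUDGET / 10 for n in rng.sample(NODES, 10)}
```

Every allocation spends the same total budget; the strategies differ only in how concentrated it is, which is exactly the axis along which the simulation distinguishes them.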

Near the End of the Game.
The following discusses the attack-defense game state and the distribution of attack-defense forces at the end of the game. At this time, resource_A = 5 and resource_D = 10.
The state finish means that the attacker has obtained node v8 in subnet C and will launch an attack on v9. Since the attacker has already spent 5 attack resources to reach node v8, he can use no more than 5 attack resources on v9. Importantly, node v9 is the only entry node from subnet C to subnet D. To ensure the safety of the target node v13, the defender's deployment at v9 is shown in Figure 5.
In this state, when the attacker uses cost_A = 2 to attack, the target node is effectively protected if the defense cost satisfies cost_D ∈ [2, 10]; when the attacker uses cost_A = 3, the target node is effectively protected if cost_D ∈ [3, 10]. As a result, when the defender discovers that the attacker is approaching the target node, the defense resources should be concentrated on the key node guarding the subnet entrance.
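The two intervals above share one pattern: the defense at v9 is effective exactly when it matches or exceeds the attack cost, up to the defender's budget. A one-line check makes this explicit (our reading of the intervals, not a formula stated in the paper):

```python
# End-game protection condition read off from the intervals above:
# with resource_D = 10, an attack of cost_A on v9 is repelled whenever
# cost_A <= cost_D <= resource_D.

def target_protected(cost_attack: int, cost_defense: int,
                     resource_defense: int = 10) -> bool:
    return cost_attack <= cost_defense <= resource_defense

print(target_protected(2, 2))  # True  (cost_D in [2, 10])
print(target_protected(3, 2))  # False (cost_D below the [3, 10] interval)
```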
Through the above analysis, it can be concluded that the defender should pay more attention to the entry nodes and key nodes at the beginning of the game, and focus on the key node guarding the last subnet entrance when the attacker approaches the target node. These strategies for the typical states also conform to common defense practice in network protection, which demonstrates the practicability of the model and algorithm in selecting the optimal defense strategy.
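The strategy rankings above come from the paper's Monte Carlo tree search over the wargame. A minimal, generic UCT skeleton conveys the idea; the `game` interface (moves, apply, is_terminal, reward) and the single-perspective defender reward are our simplifications, not the paper's exact algorithm:

```python
import math
import random

class Node:
    """One search-tree node: a game state plus visit statistics."""
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb1(self, c=1.4):
        """UCB1 score: exploitation term plus exploration bonus."""
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root_state, game, iterations=1000, rng=random):
    """Return the move with the most visits after `iterations` playouts.

    `game` must provide moves(state), apply(state, move),
    is_terminal(state), and reward(state) from the defender's viewpoint.
    """
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend via UCB1 until an unexpanded node.
        node = root
        while node.children and not game.is_terminal(node.state):
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: create all children, pick one at random.
        if not game.is_terminal(node.state):
            for m in game.moves(node.state):
                node.children.append(Node(game.apply(node.state, m), node, m))
            node = rng.choice(node.children)
        # 3. Simulation: random playout to a terminal state.
        state = node.state
        while not game.is_terminal(state):
            state = game.apply(state, rng.choice(game.moves(state)))
        reward = game.reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move
```

Plugged into the wargame, a state would encode node permissions and remaining resources on both sides, moves would be attack/defense resource deployments, and the reward would score the defender's final revenue.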

Related Work
This section compares our work with related work in terms of game models, defense strategies, the rules of the game, and so on.
Liu et al. [29] combined the state attack-defense map with a security vulnerability assessment system to calculate and model the utility matrix; the optimal attack-defense decision was made by computing the mixed-strategy Nash equilibrium. However, since each defense strategy corresponded one-to-one with an attack strategy, not all possible defense strategies were considered. Tosh et al. [30] proposed an evolutionary game model framework for cyber threat intelligence sharing, but the quantitative calculation of attack-defense strategy costs was overly subjective. Lin et al. [31] proposed a full-information dynamic active defense game model by converting the "virtual node" into a game tree. Although they gave an attack-defense game algorithm suited to both complete- and incomplete-information scenarios, the algorithm did not fully consider the attacker's intentions. Zhang et al. [32] proposed a heterogeneous-population evolutionary game model and a corresponding network security defense decision method, improving the accuracy of defense decisions. A comparison of the characteristics of this paper and the related research is shown in Table 3.
Compared with the related work, the defense strategy selection method based on the network attack-defense wargame model has the following characteristics: (i) we model high-level network attack-defense confrontation as a turn-based wargame in which both attackers and defenders can continuously adjust their strategies in response to the attack-defense posture; (ii) we add a stronger defense strategy, by which the defender can redistribute defense resources when it discovers that the target node is under attack; (iii) we formulate the wargame model as a multistate dynamic attack-defense game; (iv) we use the Monte Carlo tree search method to solve for the optimal defense strategy.

Conclusions and Future Work
This paper proposed a defense strategy selection method based on the network attack-defense wargame model. We modeled the high-level network attack-defense confrontation process as a turn-based wargame in which both attackers and defenders could continuously adjust their strategies in response to the attack-defense posture. Drawing on ideas from artificial intelligence, we used the Monte Carlo tree search method to solve for the optimal defense strategy in the game confrontation environment. Finally, a simulation model was designed to analyze the attack-defense process of the target network with rapid modeling and quantitative calculation.
Our future research will focus on the following three points. First, stronger network attack-defense strategies will be added to the attack-defense wargame model. Second, the impact of defense deployment on key stages of the game will be analyzed. Third, the applicability of the attack-defense game model will be improved by taking distributed, coordinated attack and defense into consideration.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.