Efficient Defense Decision-Making Approach for Multistep Attacks Based on the Attack Graph and Game Theory

In the multistep attack scenario, each rational attack-defense player tries to maximize his payoff, but uncertainty about the adversary prevents him from taking the most favorable actions. How to select the best strategy from the candidate strategies to maximize the defense payoff becomes the core issue. For this purpose, the paper designs a game theory model from the perspective of network survivability in combination with the attribute attack graph. The attack graph is created based on the network connectivity and known vulnerabilities using the MulVAL toolkit, which gives a full view of all the known vulnerabilities and their interdependence. Then, we use the attack graph to extract attack-defense actions, candidate attack-defense strategies, attack-defense payoffs, and network states, as well as other game modeling elements. Afterwards, the payoffs of attack-defense strategies are quantified by integrating attack-defense strength and network survivability, and these elements are input into the game model. Through repeated learning, deduction, and improvement, we can optimize the layout of defense strategies. Finally, an efficient strategy selection approach is designed based on the tradeoff between defense cost and benefit. The simulation of attack-defense confrontation in a small-scale LAN shows that the proposed approach is reliable and effective.


Introduction
With the expansion of network scale, the increase of complexity, and the continuous development of attack technology, it is impossible to absolutely prevent the network from being attacked. A large number of key network service nodes may come under attack, and the defender should provide enough network services to maintain the normal operation of the network by applying defense strategies. Therefore, the strategy selection of both the attacker and the defender revolves around the survivability of the network. For the defender, the survivability of the network is the key to analyzing the security and effectiveness of the defense strategy.
The purpose of the attack graph [1][2][3][4][5] is to analyze the attack-defense actions of the network through nodes and edges in the graph. The attribute attack graph regards a condition or attribute of the network as a node in the attack graph. When studying network security, it can accurately depict an event as a node in the network.
The attribute attack graph has become a main method for mitigating network security risks in recent years [6][7][8]. In this paper, we propose an optimal strategy selection approach for multistep attacks using the attack graph and game theory. In detail, the related attack-defense elements are extracted and fed into the game model for defense strategy deduction. We mainly focus on continuous decision-making in the process of dynamic attack-defense confrontation. As the invasion proceeds, the attacker masters more defense information and can find a better attack path. Accordingly, the defender can also adjust the related defense strategy based on attack path predictions. In contrast to other models, the proposed model guides the generation and optimization of the defense strategy during the attack-defense confrontation. The main contributions are as follows:
(1) The attack-defense model for defense decision-making using the dynamic game theory is constructed. In the multistep attack scenario, attack and defense have the characteristics of collaborative evolution.
The proposed model comprehensively considers the network environment and the security mechanism information of attack and defense, so it can accurately reflect the dynamic adversarial process in the multistep attack scenario. Previous decision-making models based on the finite state machine, cybernetics, expert systems, case reasoning, impact networks, etc. do not fully consider defense information and are only applicable to the analysis and statistics of simple attack-defense laws. Compared with them, the proposed dynamic attack-defense game model has better interpretability. It can depict the adversarial evolution of complex multistep attack-defense scenarios and continuously guide the optimization and arrangement of defense strategies during the dynamic game between the attacker and the defender. Hence, the proposed model enhances defense foresight and decision-making continuity.
(2) The improved strategy payoff calculation method is put forward. Existing methods only consider the direct security payoff, which affects the accuracy of strategy selection. In fact, defending against a multistep attack is often difficult to achieve with a single strategy; it requires a combination of various defense strategies to maximize the comprehensive defense payoff. Therefore, the accuracy of strategy payoff quantification is significant. This paper further considers the indirect payoffs brought by legal blame and counterattacks (see Section 4). For example, the defender can trace the attacks by collecting attack evidence, including the port scanning time, port number, source IP address, and destination IP address, so as to obtain the indirect payoff through attack deterrence. The indirect payoff can lead to an increase of the defense payoff and a decrease of the attack payoff. In addition, by adjusting the game payoff values, we can analyze the effect of defense strategy selection. Our model avoids the aggravation of attack-defense confrontation, enhances the ability of network security governance, and improves the flexibility and accuracy of defense decision-making.
(3) The optimal defense strategy selection approach for multistep attacks is designed. The complex attack-defense scenario is multistate and multistep. As the network attack penetrates further, the information gained by the attacker gradually increases. Based on the new information, the attacker can implement new attack strategies. Accordingly, the defender needs to adjust the defense strategies in different attack stages to improve the defense effect. To depict the interaction process of the attacker and defender, we employ the dynamic game theory to illustrate the decision interaction and behavior evolution of the two sides. By calculating the game equilibria in different game stages, we can calculate the optimal defense strategy arrangement at each moment. This enhances the pertinence and reliability of defense decision-making.
The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 designs the game model for network survivability. The general strategy payoff analysis is provided in Section 4. Section 5 performs specific attack strategy payoff analysis for multistep attacks, including the single-step attack payoff and the multistep attack payoff. Section 6 provides the analyses of defense payoff and best strategy selection. The experiments and analyses are demonstrated in Section 7. Finally, we conclude this paper in Section 8.

Related Works
In order to maintain the normal operation of the network, most network security managers need to take a series of defense measures to make the network survive. In recent years, game theory has gradually become a mainstream method to study network security defense decision-making. For example, Wang et al. [9] studied the survivability of the network in the process of attack-defense and quantitatively analyzed the security states of the network. Chen et al. and Wang et al. [2,10] built a dynamic network attack-defense game model to carry out network defense decision-making. Shen et al. [11] regarded the selection of the attack-defense strategy as a multistage game process and dynamically analyzed the impact of the selected network security strategy on the network system. Similar studies include the following. The repeated attack-defense game theory is used in the wireless network for resisting DDoS attacks [12]. The differential attack-defense game model is constructed in [13], and the calculation method of the saddle-point strategy and the optimal strategy selection algorithm are given. Tan et al. [14] quantified the benefits of both the attacker and the defender based on the bounded rational game model and studied the dynamics and evolution of both sides of network attack-defense. However, in the above research, the network state change is regarded as the driving force of the game, and the best defense strategy is not explored well.
When game theory is used to study network security, game information is a key problem. Some scholars use the complete information game model. For example, Lye and Wing [15] defined the complete information static game model with the recovery time needed after the network is invaded as the payoff. In [16], the theory of complete information dynamic game is used to convert the network attack graph into the network game tree to study active defense. In the network, there is information asymmetry between the attacker and the defender, and neither side can fully understand the other, which limits the application of the complete information game model. In order to solve the problem of incomplete game information, Lee et al. [17] established a static game model of incomplete information and analyzed the vulnerability risk. However, only one static game is used to predict the invasion behavior, that is, the attacker is assumed not to change the invasion strategy during the invasion. In reality, the attacker often has limited information collection ability and cannot fully understand the target network before the invasion, so he can only choose a locally high-payoff attack strategy based on the existing information. As the intrusion develops, the attacker may gain a further understanding of the target network, find a higher-payoff intrusion path, and then constantly adjust the attack strategy. Hence, the actual intrusion is composed of different stages, and the attacker's information about the target network differs across stages. The attacker in each stage will adjust the strategy to get more payoffs.
The dynamic game of incomplete information considers the information update and strategy adjustment of the attacker. For example, Liu et al. [18] established the attack-defense signal game model and designed an algorithm for selecting the optimal defense strategy. The attacker can adjust the attack strategy by receiving the defense signals released by the defender, but the approach is limited to a bounded set of defense signals received by the attacker. Considering that the lack of vulnerability information may cause errors in the prediction of the attack path, the defender needs to solve two key problems to accurately select the best strategy. The first problem is information update. The defender needs to predict the vulnerability information held by the attacker in different attack stages. The second problem is strategy adjustment. When the attacker obtains new information about the target network, he will adjust his strategy to get more payoffs. The defender needs to predict the strategy adjusted at each step by the attacker in order to get more defense payoff.
To this end, this paper proposes a method of dynamic game strategy analysis for network survivability based on the attribute attack graph. We quantify attack-defense action strength and provide suggestions for network security administrators to implement single-stage and multistage defense measures. Firstly, we use a matrix to depict the IP address, attack action, and attack path of network nodes. Secondly, we quantify the impact of the attack-defense strategy on the survivability of the network system. Thirdly, according to the different payoffs of candidate attack-defense strategies, we calculate the optimal defense strategy and provide a more understandable and reasonable defense decision for network security mitigation.

Construction of the Network Survivability Game Model
The process of network attack-defense is a multistage game process. In each stage, both the attacker and the defender select and execute attack-defense actions and get immediate returns. The cumulative sum of immediate returns over all stages is the total gain of each side in the whole process of confrontation. The maximization of total payoff is the goal of the game between the attack and defense sides. The game process is described in Figure 1(a). In each attack step, both attackers and defenders detect the current network state and select attack-defense actions according to the state and the adversary's former action.
The network system transfers from one state to another under the joint action of attack and defense. The steps of attack-defense interaction are as follows:
(1) Both the attacker and the defender first detect the current network state at time $t$
(2) The attacker and the defender implement their attack-defense strategies one after the other according to their expected strategy payoff functions
(3) Both sides calculate their real payoffs after performing the strategies
(4) The network system transfers to the next security state at time $t + 1$
(5) Steps (1)-(4) are repeated until attack and defense reach a balanced state at time $t + k$
The network state transition during the attack process is shown in Figure 1. The network state is denoted as $s_i = \langle host, privilege \rangle$, where host is the identity of a certain host in the network. $Privilege = \{none, user, root\}$ indicates that the attacker does not have any privilege, has normal user privilege, or has administrator privilege, respectively. $\tau = S \times S$ is the set of state transition relationships, which are determined by host information, vulnerability information, network topology, network connectivity, and the attack-defense mechanism. Because the attacker and defender do not cooperate and have conflicting goals, the confrontation leads to the transition of the network state. The attacker's goal is to gain more advanced network access. The defender's goal is to prevent illegal access.
The game model includes players, attack-defense strategies, attack-defense payoffs, and other security elements. On the basis of measuring the security states of network survivability, we add the network survivability measures, and the payoff values of attack-defense strategies on network security are quantified. The definition of the dynamic survivability game model is given below.
(1) $N = (N_A, N_D)$ represents the set of players in the attack-defense game; $N_A$ and $N_D$ represent the attacker and the defender, respectively.
(2) $S = (S_A, S_D)$ represents the attack and defense strategy matrices. Here $n$ denotes the number of nodes in the attack path, and the order of attack is $1 \to n$; $m$ denotes the maximum number of attack strategies when attacking the $n$ nodes, $k$ denotes the maximum number of defense strategies of the defender for the $n$ nodes, and $a_{ij}$ denotes the $j$-th attack strategy of the $i$-th node. Each matrix element is 0 or 1, which indicates whether the attacker or defender selects the strategy. To support the analysis of security strategies, a defense strategy vector and an attack strategy vector $\vec{a_i}$ are defined for each node.
(3) The network survivability metric $V$ is a quantitative value that measures whether a critical task can be accomplished or service continuity can be guaranteed when a network is compromised. $V \in [0, 1]$ indicates that the network system is safe and able to keep providing the necessary services. $V \in [-1, 0)$ indicates that the network system is at risk, with low survivability, and can hardly provide normal services. In order to quantify the impact of attack-defense behavior on the network system, this paper uses the attack-defense action strength in the Lincoln Laboratory Attack-Defense Behavior Database [19] to analyze the payoff of the attack-defense strategy on the system.
(4) $SI = (SI_A, SI_D)$ is the strategy strength. $SI_A$ and $SI_D$ represent the strengths of attack and defense strategies, respectively; $SI \in [0, 1]$.
(5) $g$ denotes the function of attack-defense strategy selection. When an attacker invades the system, there is an attack vector $\vec{a_i}$ corresponding to the attack path, and the attacker selects an element $a_{ij}$ in the vector. To maintain the security of the network system, defenders need to react to the attack action, which yields a set of defense strategies. When the attacker selects the attack strategy $a_{ij}$, the defender selects a related element of the defense strategy set to make $V \geq 0$; this relationship is expressed by the function $g$ in formula (3). The defense strategy set is indicated by the upper bound expression in the set. Formula (3) denotes that when an attacker selects an attack strategy $a_{ij}$, in order to maintain the security of the network system, the defense measures can only be selected in $\mathrm{Sup}\, S_D^i$, which represents the set of strategies for defending node $i$.
(6) $U = (U_A, U_D)$ denotes the corresponding payoff matrix of the strategies selected by the attacker and defender. $U_A$ and $U_D$ represent the attack and defense payoff matrices, respectively. Attack payoff $U_{a_{ij}}$ is the profit of attacking node $i$ when adopting defense strategy $j$. Defense payoff $U_{d_{ij}}$ is the profit of taking defense strategy $j$ to defend against attack strategy $i$.
(7) $f$ is the payoff function used to calculate the attack-defense payoff matrices. When the attacker takes the attack measure $j$ on node $i$ of the system, the defense measure of the defender is $\mathrm{Sup}\, S_D^i$, and $f$ yields the corresponding attack-defense payoffs.
The attribute attack graph is created based on the network connectivity and known vulnerabilities using the MulVAL toolkit [20], which gives a full view of all the known vulnerabilities and their interdependence. Then, we use the attack graph to extract attack-defense actions, candidate attack-defense strategies, attack-defense payoffs, network states, and other game modeling elements. Afterwards, we input the above elements into the game model. Through repeated learning, deduction, and improvement, the game model can output the optimal defense strategy.
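The selection function $g$ from item (5) can be illustrated with a minimal sketch. It assumes that strategy strengths are plain numbers in $[0, 1]$ and that a defense keeps $V \geq 0$ when its strength exceeds the strength of the chosen attack, which is the condition used later for the optimal defense strategy. The function name, the dictionary layout, and the sample strengths are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def candidate_defenses(attack_strength: float,
                       defense_strengths: Dict[str, float]) -> List[str]:
    """Return the defenses of one node that are strong enough to counter the attack.

    Assumption: a defense preserves survivability (V >= 0) when its strength
    is strictly greater than the strength of the selected attack strategy.
    """
    return sorted(name for name, strength in defense_strengths.items()
                  if strength > attack_strength)


# Hypothetical strengths of the defenses available on node i.
defenses_node_i = {"d_i1": 0.3, "d_i2": 0.6, "d_i3": 0.9}
print(candidate_defenses(0.5, defenses_node_i))
```

An empty result means that, under this assumption, no available defense can keep the node survivable against the chosen attack.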

Payoff Analyses of Attack-Defense
The quantitative calculation of the strategy payoffs of both the attacker and the defender is the basis of the subsequent game analysis. It directly affects the results of the strategy selection. Therefore, it is necessary to quantify the payoffs of the strategies of both sides accurately. Present quantitative methods are not comprehensive enough. On the basis of summarizing the previous work, this paper puts forward an improved quantitative index set of attack-defense strategy payoff shown in Figure 2, which explains in detail how to obtain the quantitative values of system cost and benefit.

Attack Strategy Cost.
Attack cost (AC) refers to the cost of using the attack strategy, which includes resource consumption and camouflage cost.
In the dynamic game scenario, if the attacker fails to achieve the goal, he will take measures to conceal his attack behavior so that the defender cannot accurately identify the attack. The cost of attack camouflage is the expense of concealing attack behaviors.
Definition 2. AOC (attack operation cost) is defined as the cost of system resources and attack skills consumed by an attacker to launch an attack. Based on CVSS evaluation, we select three parameters, vulnerability exploiting mode, attack complexity, and vulnerability availability, to evaluate the attack operation cost. $V_{Av}$, $V_{Ac}$, and $V_{Exp}$ are the assessment values of the vulnerability utilization mode, complexity, and availability, and $\omega_{Av}$, $\omega_{Ac}$, and $\omega_{Exp}$ are weights with $\omega_{Av} + \omega_{Ac} + \omega_{Exp} = 1$. $\rho$ is the cost attenuation factor of the attack operation, which means that the cost is reduced if the vulnerability is attacked again, and $i$ is the number of times that the vulnerability has been attacked.
$V_{Av}$, $V_{Ac}$, and $V_{Exp}$ are measured according to the corresponding CVSS items. The level and value of $V_{Av}$ and $V_{Ac}$ can be obtained from the NVD database. The level and value of $V_{Exp}$ are obtained by searching the public Bugtraq number of the vulnerability.
Definition 3. ACC (attack configuration cost) is the index that describes the cost of attack camouflage in the attack-defense interchange. The attacker often conceals his attack purpose and leads the defender to implement the wrong defense. In order to achieve this purpose, attackers often need to launch multiple types of attacks in parallel. Therefore, the camouflage cost is the sum of the attack operation costs (AOC) of the attack actions taken to camouflage attacks.
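The two cost terms above can be sketched in code. The exact AOC formula is not reproduced in the text, so the form used here is an assumption: a weighted sum of the three CVSS-based scores, attenuated by $\rho^i$ after $i$ earlier attacks on the same vulnerability. Only the constraint that the weights sum to 1 and the definition of ACC as a sum of AOC values come from Definitions 2 and 3. Function names and sample values are hypothetical.

```python
from typing import Sequence


def attack_operation_cost(v_av: float, v_ac: float, v_exp: float,
                          w_av: float, w_ac: float, w_exp: float,
                          rho: float, times_attacked: int) -> float:
    """AOC under an assumed form: rho**times_attacked * weighted score sum."""
    if abs(w_av + w_ac + w_exp - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    base = w_av * v_av + w_ac * v_ac + w_exp * v_exp
    # Assumption: each earlier attack on the vulnerability scales the cost by rho.
    return (rho ** times_attacked) * base


def attack_camouflage_cost(decoy_costs: Sequence[float]) -> float:
    """ACC as the sum of the AOC values of the camouflage actions (Definition 3)."""
    return sum(decoy_costs)


# Hypothetical scores and weights; they are not values from the paper.
first = attack_operation_cost(0.8, 0.6, 0.4, 0.5, 0.3, 0.2, rho=0.5, times_attacked=0)
repeat = attack_operation_cost(0.8, 0.6, 0.4, 0.5, 0.3, 0.2, rho=0.5, times_attacked=1)
print(first, repeat, attack_camouflage_cost([first, repeat]))
```

If the original formula combines the scores differently, only the body of `attack_operation_cost` needs to change; the ACC summation is independent of that choice.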

Attack Strategy Benefit
Definition 4. AB (attack benefit) indicates the benefits gained by the attacker in the attack. According to the benefit type, it can be divided into direct benefit and indirect benefit.
Definition 5. AL (attack lethality) indicates the inherent damage degree of a certain type of attack. The attack lethality is related to the attack cost: the higher the lethality is, the higher the attack cost is. Therefore, different types of attackers adopt atomic attack strategies of different lethality. For example, strong attackers tend to adopt highly lethal atomic attack strategies.
Definition 6. D cost (damage cost) indicates the loss of system resources caused to the defender by the attacks. The system loss can be quantified by criticality and security attribute damages. In this paper, the damage of security attributes is divided into integrity cost, confidentiality cost, and availability cost. The damage of security attributes has a certain bias toward the cost of each security attribute. Over the three aspects of information integrity, confidentiality, and availability cost, this bias is denoted as $(P_i, P_c, P_v)$, where $P_i + P_c + P_v = 1$. The value of the security attribute cost is evaluated on three levels, where $m$ is the number of attacked hosts.
Definition 7. ADR (attack direct benefit) indicates the benefits that an attacker can get directly from the defender by making a successful attack. ADR is generally smaller than D cost. The system loss cost (D cost) can be regarded as the direct benefit of the attacker.
Definition 8. AIR (attack indirect benefit) is defined relative to the immediate benefit after a successful attack. The indirect benefit of the attack refers to the social loss that the defender may suffer in a period of time after the successful attack, such as the loss of users and the decline of service quality, which needs to be calculated according to the environment and assessments.

Defense Strategy Cost.
According to the different ways that defense affects the system, defense cost (DC) can be divided into DDC (defense direct cost) and DIL (defense indirect loss). Compared with the traditional approaches, the index of defense indirect loss (DIL) is added, and the quantitative process of defense direct cost is refined. Defense direct cost (DDC) is the adverse effect on the information system that may be caused by the defense strategy adopted by the defender. DDC can be expressed by the sum of the operational cost, negative cost, and residual cost of the defense strategy and the indirect loss of defense (negative value). In the process of the attack-defense game, the defender also needs to collect the signals released by the attacker, which consumes a certain cost. Therefore, the cost of signal collection (SCC) is added to DDC to describe this cost.
Mathematical Problems in Engineering
Definition 11. R cost (rest cost) is the impact or loss of the residual attacks on the information system after the defense strategy is adopted. It can be expressed using the residual term $\varepsilon(a, d)$, where $\varepsilon(a, d)$ is the residual impact of attack $a$ on the information system when the defender adopts strategy $d$ and the attacker adopts strategy $a$.
Definition 12. DIL (defense indirect loss) is the social loss that the defender may suffer for a period of time after being attacked, such as the loss of users and the decline of service quality. Its value is the same as the attack indirect benefit AIR (refer to the calculation of the attack indirect benefit above).
Definition 13. SCC (signal collect cost) is defined to describe the cost of the defender monitoring the attacker's signal in the attack-defense game. The attacker's signal is mainly collected and processed by the IDS. Therefore, the cost of signal collection and monitoring is mainly measured by the amount of time and computer resources consumed by the IDS to collect, analyze, and process signals. The cost of signal collection can be quantified according to the amount of time and network resources.
(i) SL1: signal collection is only carried out at the beginning of the attack event, and analyzing and processing the attack signal takes up very few resources
(ii) SL2: signal collection is carried out at any time node of the attack event, and the signal is analyzed and processed during the whole event, which takes up more resources
(iii) SL3: over a period of time, it is necessary to monitor the signals of several attacks and to analyze and process the signal in each event, which takes up a lot of resources
According to the requirements of security threat assessment, specific values can be used to measure the cost of signal collection at different levels.

Defense Strategy Benefit.
DR (defense benefit) indicates the benefits gained by the defender after adopting the defense strategy. To the best of our knowledge, existing methods consider neither the rest D cost nor the benefits of the defender's counterattack, so the resulting metric deviates from the actual value. Our improved strategy measurement is as follows.
Definition 14. DDR (defense direct benefit) is the direct benefit obtained by the defender after adopting the defense strategy. It reflects that a defense strategy counters an attack and keeps the information system free from loss, so it is generally expressed by the cost of system loss, D cost.
Definition 15. CR (counterattack benefit) is the profit that the defender obtains by using the information left by the attacker to trace and counterattack the attacker. It is generally believed that the higher the cost of defense, the more attention the defenders attach to defense and the more rewards they will get from counterattack. The profit on counterattack can be classified and quantified according to the defense cost as follows:
(i) CL1: defenders do not pay attention to information system security, invest little in defense, and have a low defense cost.
(ii) CL2: defenders pay attention to information system security and invest moderately in defense.
(iii) CL3: defenders attach great importance to the security of the information system. They invest a lot in defense, and the cost of defense is high.
The security threat assessment scenario can use a specific benefit value to measure the relative benefit of each level. Therefore, we can get the payoff functions of attack-defense strategies as follows: attack strategy payoff = f(AR, AC), defense strategy payoff = f(DR, DC).
The strategy payoffs of both sides, and the sum of the profits of the attacker and defender in the attack-defense game, follow from these payoff functions. Suppose that, in a network game scenario, the attacker selects not to attack and the defender selects a defense strategy; the payoffs of the attacker and defender in this scenario, and their sum, are obtained in the same way. From the above, it is easy to derive that whether the attacker attacks or not, the sum of the game payoffs of the attacker and defender is a constant; that is to say, the attack-defense game of information security is a nonzero-sum game.
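As a rough illustration of how the cost and benefit indices defined in this section feed the payoff functions $f(AR, AC)$ and $f(DR, DC)$, the sketch below assumes that $f$ is simply benefit minus cost and that each benefit or cost is the sum of its components. The text does not give the exact form of $f$, so this is an assumption; the class names, field names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttackTerms:
    aoc: float  # attack operation cost
    acc: float  # attack camouflage cost
    adr: float  # attack direct benefit
    air: float  # attack indirect benefit


@dataclass
class DefenseTerms:
    ddc: float  # defense direct cost (taken to include the signal collection cost)
    dil: float  # defense indirect loss
    ddr: float  # defense direct benefit
    cr: float   # counterattack benefit


def attack_payoff(t: AttackTerms) -> float:
    """Assumed form of f(AR, AC): total attack benefit minus total attack cost."""
    return (t.adr + t.air) - (t.aoc + t.acc)


def defense_payoff(t: DefenseTerms) -> float:
    """Assumed form of f(DR, DC): total defense benefit minus total defense cost."""
    return (t.ddr + t.cr) - (t.ddc + t.dil)


# Hypothetical values for one attack-defense strategy pair.
attacker = AttackTerms(aoc=3.0, acc=1.0, adr=10.0, air=2.0)
defender = DefenseTerms(ddc=4.0, dil=2.0, ddr=10.0, cr=1.0)
print(attack_payoff(attacker), defense_payoff(defender))
```

Keeping the components as separate fields makes it easy to see how adding the indirect terms (AIR, DIL, CR) shifts the payoffs compared with a model that uses only the direct terms.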

Single-Step Attack Payoff.
First, according to the network topology and network vulnerabilities, we can derive the attack strategy matrix $S_A$ of size $n \times m$, which is composed of 0 and 1. For example, in $S_A^i$ only one element $(i, j)$ is 1 and the rest are 0, which indicates the single-step attack action of the attacker implementing attack strategy $j$ on node $i$.
In order to calculate the attack strategy payoff matrix more objectively, we use the Lincoln Laboratory Attack-Defense Behavior Database [19].
For a matrix $B$, the norm $\|B\|_p$ is obtained according to [21]. According to the functional relationship between the attack-defense payoff matrix and the attack payoff, the payoff of a single-step attack action is then obtained, where $f_A^i$ represents the cost of attack step $i$ and $f_A^1$ represents the cost of attack step $A_1$.
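The single-step strategy matrix described above can be sketched as a one-hot matrix. Since the norm-based payoff formula is not reproduced in the text, the sketch substitutes a plain element-wise lookup: the payoff is the sum of the element-wise product of the strategy matrix and the payoff matrix, which for a one-hot matrix picks exactly one entry. This substitution, the zero-based indexing, and all names and values are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def one_hot_strategy(n: int, m: int, node: int, strategy: int) -> Matrix:
    """Build an n x m 0/1 matrix with a single 1 at (node, strategy), zero-based."""
    matrix = [[0.0] * m for _ in range(n)]
    matrix[node][strategy] = 1.0
    return matrix


def selected_payoff(strategy_matrix: Matrix, payoff_matrix: Matrix) -> float:
    """Sum of the element-wise product; a one-hot matrix selects one payoff entry."""
    return sum(s * u
               for s_row, u_row in zip(strategy_matrix, payoff_matrix)
               for s, u in zip(s_row, u_row))


# Hypothetical attack payoff matrix U_A for n = 2 nodes and m = 3 strategies.
u_a = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0]]
s_a = one_hot_strategy(n=2, m=3, node=1, strategy=2)
print(selected_payoff(s_a, u_a))
```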
Next, a simple example is given to illustrate the attack cost of a single attack strategy. Assume that node $n$ in the network system has $i$ vulnerabilities and that each vulnerability has one attack strategy, which defines the attack strategy matrix $S_A^n$ and the attack payoff matrix for compromising the $i$ vulnerabilities. According to formula (18), we obtain the attacker's payoff for attacking node $n$, as well as the attack payoff when the attacker attacks vulnerability $l$ $(1 \leq l \leq i)$.

Multistep Attack Payoff.
First, we analyze the attack steps as follows. When an attacker invades a targeted network system, some attack actions may fail because the attacker lacks information about the system, and the attacker may repeat the same attack actions on the same attack targets. If the attacker attacks the same node repeatedly, then the more often the attack is repeated, the lower the attack cost becomes. Given the parameter $\lambda \in (0, 1)$, when the same attack action is executed on the same target $n$ times, the $n$-th attack cost is $\lambda^{n}$ times as much as the first one. Since a multistep attack can be divided into several single-step attack actions, the multistep attack analysis can be divided into multiple single-step attack action analyses, as in the following example.
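The attenuation rule for repeated attacks can be written as a small helper. The sketch reads the exponent literally as $\lambda^{n}$, which is an assumption about the original notation; if the first attempt is meant to be unattenuated, the exponent would be $n - 1$ instead. The function name and the numbers are hypothetical.

```python
def repeated_attack_cost(first_cost: float, lam: float, n: int) -> float:
    """Cost of the n-th execution of the same attack on the same target.

    Assumption: the n-th cost equals lam**n times the cost of the first attack.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in the open interval (0, 1)")
    if n < 1:
        raise ValueError("n must be at least 1")
    return first_cost * lam ** n


# With lam = 0.7, each repetition is cheaper than the one before.
costs = [repeated_attack_cost(10.0, 0.7, n) for n in (1, 2, 3)]
print(costs)
```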
When an attacker carries out a multistep attack with the $q$-step attack strategy $S_A^1, S_A^2, \ldots, S_A^q$, the attack cost of the $q$-th step is given by formula (23). Because of the repeated game between attack and defense, when calculating the attack payoff, we consider the function $f_A^q$ with the median coefficient as follows:
$$f_A^q = \upsilon_{11} u_{a_{11}} + \upsilon_{12} u_{a_{12}} + \cdots + \upsilon_{nm} u_{a_{nm}} \quad (24)$$
The numbers of attacks for the corresponding attack strategies are $\upsilon_{11}, \upsilon_{12}, \ldots, \upsilon_{nm} \in \{0, 1, 2, \ldots\}$.
Denote the attacker's attack strategy as $a_{ij}$. Within the defense strategy set $\mathrm{Sup}\, S_D^i$ of the defender, the strategy that has the least payoff while guaranteeing the survival of the network system is called the optimal defense strategy. By formula (25), we can obtain the optimal defense strategy $d_{il}$, where $d_{il} \in \mathrm{Sup}\, S_D^i$. In the defense strategy matrix $S_D^i$, only element $d_{il}$ is 1, and the rest are 0. A simple example is given to illustrate the calculation of the defense strategy payoff.
Assume that vulnerability $i$ of the attacked node $m$ is exploited and that there are $j$ kinds of defense strategy for vulnerability $i$, which define the defense strategy matrix $S_D^m$. According to the optimal defense decision-making principle, the defense strategy with the least payoff under the condition of network survivability is the optimal defense strategy. According to formula (3), the proposed defense strategy in the defense matrix is $S_D^m = (0, \ldots, d_{ml}, \ldots, 0)$, where $SI_{d_{ml}} > SI_{a_{ml}}$ and $u_{ml} > \min(\mathrm{Sup}\, U_D^m)$. The single-defense strategy payoff is obtained in the same way as the single-attack strategy payoff.
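The optimal defense decision-making principle stated above can be sketched as a filter followed by a minimum. The sketch assumes that survivability is preserved when the defense strength exceeds the attack strength, and it then returns the feasible defense with the least payoff, following the wording of the principle. The function name, the data layout, and the sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def optimal_defense(attack_strength: float,
                    defenses: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Pick the feasible defense with the least payoff.

    `defenses` maps a defense name to (strength, payoff). A defense is treated
    as feasible when its strength is strictly greater than the attack strength
    (assumed survivability condition). Returns None if no defense is feasible.
    """
    feasible = {name: payoff
                for name, (strength, payoff) in defenses.items()
                if strength > attack_strength}
    if not feasible:
        return None
    return min(feasible, key=feasible.get)


# Hypothetical defenses for node m: name -> (strength, payoff).
defenses_node_m = {"d_m1": (0.4, 2.0), "d_m2": (0.7, 5.0), "d_m3": (0.9, 3.0)}
print(optimal_defense(0.5, defenses_node_m))
```

In the corresponding one-hot defense matrix, the entry of the returned defense would be 1 and all other entries 0.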

Multistep Defense Payoff and Strategy Selection.
The defender selects multiple optimal defense strategies according to the multiple attack actions, which maximizes the survivability of the network system. The payoff analysis is the same as for the multistep attack. First, it is assumed that the attacker and the defender have the same business capability. When the defender implements the same defense strategy many times, the defense payoff decreases correspondingly, and the parameter is again $\lambda$. The payoff of the multistep defense strategy is obtained according to formulae (23) and (24). When $p = 1$, it reduces to formula (18), which calculates the payoff of the special single-step defense strategy. The numbers of times that the defense strategies are implemented are $\eta_{11}, \eta_{12}, \ldots, \eta_{nk} \in \{0, 1, 2, \ldots\}$ with $\exists\, \eta_{ij} \neq 0$. The defense strategy is selected according to the attacker's attack strategy so as to maximize the survivability, with $V \geq 0$.
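The count-weighted sum of formula (24), which the multistep defense payoff reuses with the counts $\eta_{ij}$, can be sketched as follows. The sketch implements only the weighted sum over strategy counts and payoff entries; it does not apply the attenuation parameter $\lambda$, because the text does not show how $\lambda$ enters the formula. Names and sample values are hypothetical.

```python
from typing import List


def count_weighted_payoff(counts: List[List[int]],
                          payoffs: List[List[float]]) -> float:
    """Sum of count[i][j] * payoff[i][j] over all nodes i and strategies j."""
    same_shape = len(counts) == len(payoffs) and all(
        len(c_row) == len(u_row) for c_row, u_row in zip(counts, payoffs))
    if not same_shape:
        raise ValueError("counts and payoffs must have the same shape")
    if any(c < 0 for row in counts for c in row):
        raise ValueError("counts must be non-negative integers")
    if not any(c != 0 for row in counts for c in row):
        raise ValueError("at least one strategy must be used")
    return sum(c * u
               for c_row, u_row in zip(counts, payoffs)
               for c, u in zip(c_row, u_row))


# Hypothetical example: strategy (1, 1) used once and strategy (2, 2) used twice.
eta = [[1, 0],
       [0, 2]]
u_d = [[3.0, 1.0],
       [2.0, 4.0]]
print(count_weighted_payoff(eta, u_d))
```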

Experiments.
In order to verify the impact of defense strategy selection on network system security, the network environment is constructed as shown in Figure 3, which includes three hosts and two servers, as well as two firewalls and two IDS.
In the experimental environment of Figure 3, the attacker attacks the hosts and servers in the system through the network. First, the attack-defense action strengths in Tables 1 and 2 are given. The attack-defense payoff is determined on the basis of the capabilities of attackers and defenders of the same levels. Different attack-defense strengths correspond to different attack-defense payoffs. In order to improve the decision-making accuracy, we divide the attack-defense payoff into three levels corresponding to attack-defense strengths as shown in Tables 1 and 2. The configuration information of the experimental environment is given in Table 3. Table 4 is obtained by querying CVE [22]. The U.S. National Vulnerability Database (NVD) [23] is used to get the specific vulnerability attack-defense action table.
According to attack action $a_5$, we can get $\lambda = 0.7$. According to formula (24), attack payoff is

Comparisons and Analysis.
The interpretability of the generated strategy is better than that of other approaches; namely, the proposed approach can clearly explain what a strategy is and how it is generated. In this paper, we first introduce the attack graph to enhance the visual expression of the attack-defense strategy. The nodes and edges in the graph represent the network states and the set of possible candidate attack and defense actions, respectively, so the graph intuitively explains the attack-defense strategies.
Second, game theory is employed to model the attacker and defender. Game theory is characterized by opposing objectives, non-cooperation, and strategic interdependence, all of which are in line with the basic characteristics of cyberattack-defense. By calculating the game equilibrium point, we can better understand the evolution process of strategy derivation, generation, and adjustment.
Compared with other methods [9,10,13,14] that use game theory or the attack graph alone, our method is more interpretable because it combines the attack graph with game theory. The studies in [9-14] are based on the state attack graph and consider the impact of the attack strategy and defense strategy on the security state of the system. Most of these studies give neither a quantitative method for the security state of the network system nor an analysis of the attack strategies against it. Decision-making is essentially the arrangement of different strategies, but in the process of network attack-defense, managers do not have unlimited opportunities for trial and error, and it is difficult to determine which arrangement of strategies yields the maximum payoff. Therefore, this paper studies the impact of the network attack-defense strategy set on the survivability of the network system based on the attribute attack graph. Table 5 gives the performance comparison.

Conclusions and Future Works
This paper studies the strategy selection with maximum payoff in the network attack-defense confrontation based on the attribute attack graph. Firstly, the attack-defense matrix is used to represent the attack-defense strategy and path. Secondly, Lincoln Lab's attack-defense action data are used to quantify attack-defense strength and network survivability. Thirdly, combined with the attack-defense strategy strength and payoff, network system security is studied against multistep attack threats in the small-scale network system. Fourthly, the interpretability and implementability of our strategy are improved by using visual attack-defense paths. Finally, according to the attack strategy matrix, the proposed approach is designed to predict the possible attack behaviors and targets in the next step of a multistep attack.
Future work will combine the approach with machine learning to analyze attack-defense strategies automatically and thereby implement strategies faster. In addition, improving the intelligence of the proposed decision-making is the next step of our research. We will try to introduce knowledge graphs, artificial intelligence, and other emerging technologies to enhance the ability of immediate decision-making response and continuous strategy optimization. Due to the sensitivity of attack-defense data, there are few open labeled datasets on the internet. The lack of labeled data restricts the use of supervised or semisupervised machine learning algorithms in the field of intelligent decision-making. In this case, the sample distribution can hardly cover the decision space, and the generalization and applicability of a trained model for strategy learning are weak. The key point is how to introduce small-sample learning, incremental learning, and reinforcement learning and make full use of the limited data so that the defender can continuously learn new knowledge from new samples during confrontation with the attacker. Through the intelligent game model, the defender can gain online decision-making capacity immediately and optimize the precision of strategy selection faster.
Data Availability
The data that support the findings of this study are not publicly available due to restrictions, as the data contain sensitive information about a real-world enterprise network. Access to the dataset is restricted by the original owner. The data can be made available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.