A Novel Dynamic Method in Distributed Network Attack-Defense Game

We analyze distributed network attack-defense game scenarios and find that attackers and defenders have different information acquisition abilities, since the defender owns the target system. Correspondingly, they show different degrees of initiative and reaction in the game. Based on this observation, we propose a novel dynamic game method for the distributed network attack-defense game. The method takes advantage of the defender's information superiority and the attackers' imitation behaviors, and it induces the attackers' evolutionary reaction process in the game to gain more defense payoff. Experiments show that our method achieves higher average defense payoffs than previous work.


Introduction
Modern organizations embed information and communication technologies (ICT) into their core processes to facilitate the collection, storage, processing, and exchange of data, thereby increasing operational efficiency, improving decision quality, and reducing costs [1]. As a result, distributed systems are becoming widely used. Despite their significant benefits, distributed systems also place processing tasks at risk because of their "distributed vulnerability." Traditional approaches to improving security generally consider only system vulnerabilities and attempt to defend against all attacks through system upgrading. Whether or not the assumed attacks actually occur, the defense resources must be expended. In a distributed system, such continuous upgrading results in an enormous waste of defense resources. In view of this, game theory has been applied to network security.
In traditional game theory, equilibrium is achieved through the players' analysis and reasoning, based on a common view of the game rules, the players' rationality, and the payoff matrix. Generally, the game players are interacting individuals. Even in a group-player, the members should be consubstantial, with the same rational characteristics, strategies, and payoffs. However, this strong rationality assumption of traditional game theory is receiving more and more criticism from game theory experts and economists [2].
In reality, there are a large number of game problems between an individual-player and a group-player. For example, in distributed network attack-defense game scenarios, the system officers, as defenders of the system, are consubstantial and can be regarded as an individual-player (we use the singular form to indicate an individual-player and the plural form to indicate a group-player). The defender has more information about the system, the game structure, and the payoff matrix; even when temporarily lacking knowledge, the defender has more resources to fill in the blanks. It is therefore easier for the defender to make rational decisions. On the other hand, attackers are regarded as a group-player because of their different information acquisition abilities and rational characteristics. In the game process, attackers behave in an incompletely rational way and tend to imitate high-payoff strategy behaviors. This process of imitation can be regarded as an evolutionary process. As the theory of learning states, equilibrium is the result of a long-term process in which players with incomplete rationality seek optimization [3]. In distributed network attack-defense game scenarios, the game players, especially the attackers as group-player, dynamically adjust their strategies based on the game situation and press on towards a dynamic equilibrium.
In this paper, we propose a dynamic method for distributed network attack-defense game scenarios. The method takes advantage of the defender's information superiority and the attackers' imitation behaviors and induces the attackers' evolutionary process to gain more defense payoff.
The contributions of this paper are as follows. First, we describe the distributed network attack-defense game as a one-many game, regarding the defender as an individual-player and the attackers as a group-player, which is more realistic. Moreover, we formulate the group-player's behaviors as an evolutionary process. Based on the above, we propose a dynamic game method to optimize the defense benefit.
The remainder of this paper is structured as follows. In Section 2, we discuss related work. In Section 3, we describe the problem and the distributed network attack-defense game scenarios. In Section 4, we discuss the group-player's behaviors in the game and model those behaviors as an imitation evolutionary process. In Section 5, we propose the dynamic game method, with a strategy sequence generation algorithm and a parameter analysis method. In Section 6, experiments are performed to verify the proposed method. Finally, in Section 7, we present our conclusions and recommendations for future work.

Related Work
Game theory is the study of mathematical models of conflict and cooperation between intelligent, rational decision-makers [4]. In 1928, von Neumann proved the minimax theorem, which formally declared the birth of game theory. Owing to its strength in understanding and modeling conflict, game theory has recently been applied to computer network security. Reference [5] proposes a model that reasons about friendly and hostile nodes in secure distributed computation within a game-theoretic framework. Reference [6] presents an incentive-based method for modeling the interactions between a DDoS attacker and the network administrator, together with a game-theoretic approach to infer attacker intent, objectives, and strategies (AIOS). References [7, 8] also focus on DDoS attack and defense mechanisms using game theory. Reference [9] models the interactions between an attacker and the administrator as a two-player stochastic game and computes the Nash equilibrium using a nonlinear program. However, all of these studies assume that both players in the game are consubstantial, or even individuals. Obviously, this assumption cannot cover all realistic situations. This paper extends the setting to a one-many game to be more realistic.
In the field of dynamic games, [10, 11] focus on the same scenarios as this paper. Reference [10] models the interaction of an attacker and the network administrator as a repeated game and finds the Nash equilibrium via simulation. Reference [11] models the interaction between the hacker and the defender as a two-player zero-sum game, explains how the min-max theorem for this game is formulated, and concludes by suggesting that linear algorithms would be appropriate for solving this problem. Reference [12] models the mission deployment problem as a repeated game and computes the Nash equilibrium using improved PSO. None of them considers the attackers' group behaviors. This paper takes advantage precisely of the attackers' group behaviors, and in this way the defender can gain more payoffs. More related work on applying game theory in network security can be found in [13].

Distributed Network Attack-Defense Game
Given the flexibility that software-based operation provides, it is unreasonable to expect that attackers will exhibit fixed behavior over time [14]. Instead, on the one hand, attackers dynamically change their strategies in response to changes in the configuration of the target system or in the defense strategy. On the other hand, relative to the defenders, attackers vary in their information acquisition abilities and rational characteristics.
We simplify attackers into two categories: senior attackers and junior attackers. A senior attacker has a greater ability to acquire game information than a junior attacker. As a result, senior attackers can react as soon as the game situation changes, while junior attackers generally follow the senior attackers' behavior because of their weaker information acquisition ability.
Different from attackers, defenders, as system officers, are consubstantial and have more information about the system, the game structure, and the payoff matrix. Even if they temporarily lack knowledge, they have more resources to fill in the blanks. It is therefore easier for the defenders to gain a whole view of the game situation.
Similar to the Stackelberg model [15], there are senior and junior players in the distributed network attack-defense game. Moreover, as stated above, the distributed network attack-defense game is a one-many game: attackers are group-players, containing a minority of senior players and a majority of junior players, while the defender is an individual and senior player.
In the distributed network attack-defense game, there are three game stages, classified according to the players' behaviors.
Stage 1. Attackers, as group-players, select different pure strategies randomly and form the proportion distribution over the various pure strategies. Generally, the first game stage does not last long; it is terminated by the defender's behavior.
Stage 2. The defender, as individual-player, behaves based on the proportion distribution of attack strategies. In our opinion, the defender can gain more payoffs by misleading and guiding the distribution structure of the attacker group, as described in Section 5.

Stage 3. Senior attackers react to the game situation, and junior attackers follow the senior attackers' behaviors to gain more payoffs. The junior attackers' behavioral pattern can be modeled by the imitation dynamics model, as described in Section 4.
Then the game alternates between the second and third stages indefinitely, except in the special situation that we discuss in Section 5.1.

Imitation Dynamics Model
As discussed above, the attacker group exhibits an imitation dynamics pattern in the distributed network attack-defense game. Different from the general imitation dynamics model, a minority of senior attackers can lead the imitation actions. In this section, we model the attacker imitation dynamics in the distributed network attack-defense game, considering the effect of senior attackers.
Stage 1. As attackers select pure strategies randomly, the proportion distribution over the pure strategies is uniform. Let the attacker's pure strategy space be AS_i (i = 1, ..., n) and let the number of attackers be N. The Proportion Vector (PV) of the attacker group is P(t) = (p_1(t), ..., p_n(t)), where p_i(t) is the proportion of the group choosing strategy AS_i at time t. In this stage, p_i(t) = 1/n, where n is the number of attack strategies. The proportion of senior attackers in the group is denoted by α, so a proportion p_i(t) · α of the attackers are senior attackers choosing AS_i. Similarly, the defender's pure strategy space is denoted by DS_j (j = 1, ..., m), and the game situation in which the attacker chooses AS_i and the defender chooses DS_j is denoted by s_ij, corresponding to the attacker's payoff u_a(s_ij) and the defender's payoff u_d(s_ij).
Stage 2. The defender behaves based on the proportion distribution of attack strategies. There are two cases to consider: the first defense behavior and the follow-up defense behaviors. Before the defender behaves for the first time, senior attackers choose attack strategies randomly, so the distribution of senior attackers is uniform, like that of the junior attackers. After the defender's first behavior, senior attackers always concentrate on the best response strategy no matter how the defense strategy changes, because of their quick reaction capability; the distribution of senior attackers is then concentrated.
Stage 3. Senior attackers react to the game situation immediately. Let the senior attackers' reaction vector at time t be R(t). In the first defense behavior case, the uniform distribution of the senior attackers concentrates on the best response attack strategy, say AS_k:

R(t) = (-α/n, ..., α - α/n, ..., -α/n), (1)

where the component α - α/n stands at position k. In the follow-up defense behavior case, suppose that the best response attack strategy changes from AS_k to AS_l. Then the concentrated distribution of senior attackers accordingly moves from AS_k to AS_l:

R(t) = (0, ..., +α, ..., -α, ..., 0), (2)

with +α at position l and -α at position k. Let P'(t) be the PV after the senior attackers' reaction at time t:

P'(t) = P(t) + R(t). (3)

Junior attackers imitate the senior attackers' behaviors to gain more payoffs, with imitation probability λ, so the distribution of junior attackers concentrates on the best response strategy gradually. Let the imitation vector be I(t). Similar to the senior case, in the first defense behavior case the uniform distribution of the junior attackers concentrates on the best response attack strategy AS_k:

I(t) = (-(1 - α)λ/n, ..., (n - 1)(1 - α)λ/n, ..., -(1 - α)λ/n), (4)

with the positive component at position k. In the follow-up defense behavior case, suppose that the best response attack strategy changes from AS_k to AS_l. Then the concentrated distribution of junior attackers accordingly moves from AS_k to AS_l:

I(t) = (0, ..., +p'_k(t) · λ, ..., -p'_k(t) · λ, ..., 0), (5)

with +p'_k(t) · λ at position l and -p'_k(t) · λ at position k.
Correspondingly, the Proportion Vector (PV) of the attacker group is updated as

P(t + 1) = P'(t) + I(t). (6)

The imitation probability λ is affected by the additional game information obtained by junior attackers beyond their own information acquisition ability. In this paper, we assume that this additional game information comes from two sources. One is game information revealed intentionally by the defender: the more game information is revealed, the higher λ can be, so λ reaches its maximum value of 1 if plenty of game information is revealed by the defender. The other is internal communication within the attacker group, which is a natural attribute of the group and cannot be controlled by external behaviors, so λ has a constant minimum value, denoted λ_0. As a result, we obtain

λ_0 ≤ λ ≤ 1. (7)

As mentioned above, the defender thus has partial control over the junior attackers' imitation rate through revealing game information purposefully. The game information revealing strategy is discussed in Section 5.2.1.
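The senior-reaction and junior-imitation updates of the follow-up case can be sketched in Python. The numbers below (three strategies, α = 0.2, λ = λ_0 = 0.1) are illustrative assumptions, not values prescribed by the model; only the update structure follows the text.

```python
import numpy as np

def imitation_step(pv, old, new, alpha, lam):
    """One follow-up round: senior reaction, then junior imitation.

    pv    -- Proportion Vector over the n attack strategies (sums to 1)
    old   -- index of the previous best response attack strategy
    new   -- index of the new best response attack strategy
    alpha -- proportion of senior attackers in the group
    lam   -- imitation probability of the juniors (lambda_0 <= lam <= 1)
    """
    pv = pv.copy()
    # Senior reaction R(t): the senior mass alpha moves to the new
    # best response at once.
    pv[old] -= alpha
    pv[new] += alpha
    # Junior imitation I(t): a fraction lam of the proportion still on
    # the old strategy imitates the seniors.
    moved = pv[old] * lam
    pv[old] -= moved
    pv[new] += moved
    return pv

# Stage 1: uniform distribution over n = 3 strategies.
n = 3
pv = np.full(n, 1.0 / n)
# Best response shifts from strategy 0 to strategy 2.
pv = imitation_step(pv, old=0, new=2, alpha=0.2, lam=0.1)
```

Repeating the junior step with lam = λ_0 reproduces the slow drift the defender exploits in Section 5, while setting lam = 1 empties the old strategy in a single round.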

Dynamic Game Method
We now present a dynamic game method for optimizing the defense benefit. The proposed method is a two-step procedure involving a defense strategy sequence generation algorithm (SSGA) (Section 5.1) and a parameter analysis method (Section 5.2) used to set the parameters of the dynamic game method.
Consider a simple game payoff matrix as in Table 1. Obviously, the defender wishes to keep the game situation at s_22 = (AS_2, DS_2), in which he gains the global best payoff of 9. However, this desire seems unrealizable, since if the defender chooses strategy DS_2, attackers will choose strategy AS_3 in response to gain more payoffs. What we propose is a novel dynamic game method that keeps the game situation at the global best situation as long as possible, so that the defender gains more payoffs.

Strategy Sequence Generation Algorithm.
The strategy sequence generation algorithm (SSGA) produces a strategy pair which the defender chooses in sequence, circularly, to keep the game situation at the global best situation. Two parameters are attached to the strategy pair; we discuss them in Section 5.2.
We first define some notions that are necessary in SSGA.
Vertex. A vertex is a best response game situation of the attackers. In Table 1, s_31 = (AS_3, DS_1), s_32 = (AS_3, DS_2), and s_23 = (AS_2, DS_3) are vertices. Obviously, there is one and only one vertex in each row (defense strategy) of the game payoff matrix, and, in the imitation process, the attacker group gathers at the vertex of the row. In this way, we can redefine a Nash equilibrium as a game situation which is both a vertex and a best response game situation of the defender.
Inducing Point. An inducing point is a game situation which is not a vertex but has a vertex in the same line (attack strategy). Among inducing points, two are selected by SSGA: the key inducing point, with the higher defense payoff, which the defender wishes to hold as long as possible, for example, s_22 = (AS_2, DS_2) in Table 1, and the assist inducing point, which is used to adjust the distribution of the attacker group and assists in keeping the game situation at the key inducing point. We have the following trivial result for the number of inducing points in a game payoff matrix:

N_IP = Σ_{i: v_i ≥ 1} (m - v_i), (8)

where v_i is the number of vertices in the i-th line of the game payoff matrix.
Ring. A ring is a triple ((DS_a, DS_b), s_p, s_q), where (DS_a, DS_b) is a defense strategy pair which is chosen in sequence, circularly; s_p is an inducing point of the ring lying in the same line as the vertex of DS_b; s_q is an inducing point lying in the same line as the vertex of DS_a. The defense payoffs of s_p and s_q decide which one is the key inducing point and hence the order of the strategy pair. In Table 1, ((DS_2, DS_3), s_22, s_33) is a ring of the corresponding payoff matrix.
Since every pair of defense strategies whose vertices lie in different lines forms a ring, the number of rings in a game payoff matrix can be computed as

N_ring = m(m - 1)/2 - Σ_i v_i(v_i - 1)/2. (9)

It is easy to prove that, unless all the vertices concentrate in one same line, there must exist at least one ring. A ring identification algorithm is given as Algorithm 1.
In lines 9-11 of Algorithm 1, we select the inducing point with the higher defense payoff, together with the corresponding vertex, as the ring result, in accordance with the original intention.
Based on the ring identification algorithm, a global ring selection algorithm (Algorithm 2) works out a globally best ring.
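As a concrete illustration of the vertex, inducing point, and ring notions, the following Python sketch enumerates rings by brute force. The payoff matrix is hypothetical, constructed only to mimic the structure described for Table 1 (vertices s_31, s_32, s_23 and defense payoff 9 at s_22); the enumeration captures the spirit of Algorithms 1 and 2 but is not a reproduction of them.

```python
import numpy as np

# Hypothetical payoff matrix: entry [i][j] holds the payoffs of situation
# s_ij, i.e. attack strategy AS_(i+1) against defense strategy DS_(j+1)
# (0-based indices).
atk = np.array([[2., 1., 3.],
                [1., 4., 6.],
                [5., 7., 2.]])   # attacker payoffs u_a(s_ij)
dfn = np.array([[3., 2., 1.],
                [4., 9., 2.],
                [1., 3., 5.]])   # defender payoffs u_d(s_ij)

def vertex_row(j):
    """Line (attack strategy) of the vertex of defense strategy DS_j:
    the attackers' best response against DS_j."""
    return int(np.argmax(atk[:, j]))

def find_ring():
    """Brute-force ring enumeration: keep the ring whose key inducing
    point has the highest defense payoff (the global ring selection)."""
    m = atk.shape[1]
    best = None
    for a in range(m):
        for b in range(m):
            if a == b or vertex_row(a) == vertex_row(b):
                continue          # vertices in the same line: no ring
            # Since the vertex lines differ, both candidates below are
            # automatically inducing points (non-vertex, vertex in line).
            key = (vertex_row(b), a)     # held while playing DS_a
            assist = (vertex_row(a), b)  # held while playing DS_b
            if best is None or dfn[key] > dfn[best[1]]:
                best = ((a, b), key, assist)
    return best

ring = find_ring()
# With this matrix the vertices are s_31, s_32, s_23 and the selected ring
# is ((DS_2, DS_3), s_22, s_33), matching the example in the text.
```

The double loop is quadratic in the number of defense strategies, which is fine for the small matrices considered here.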
The global ring selection algorithm works out a ring which is used in the following discussion. However, consider the special case in which the game has a Nash equilibrium whose defense payoff is larger than the key inducing point's defense payoff. In that case, it is easy to conclude that the Nash equilibrium is the better choice, so we do not consider this case in this paper.

Parameter Analysis.
Two parameters are attached to the strategy pair of the ring: Duration, which indicates how long each strategy in the strategy ring is held, and the leakage factor L, which indicates with which degree of information leakage each strategy is played. Since we have discussed the ring in the last section, in this section we discuss the parameters of the dynamic game scheme, mainly Duration and L.

Leakage Factor. The parameter leakage factor indicates to what degree the defender should reveal information during each strategy duration to induce the behavior of the attacker group. By the definition of the ring in Section 5.1, there are two strategies in a ring, corresponding to the two inducing points; therefore, the leakage factor also has two subparameters. A two-tuple L = (L_1, L_2) is a pair of percentages corresponding to the two strategies in the ring, that is, to the key inducing point and the assist inducing point. Note that the two percentages are independent of each other, since they are used during different strategy durations. L_i = 0 means that the defender does not reveal game information intentionally; conversely, L_i = 1 means that the defender reveals plenty of game information; intermediate values mean that the defender reveals game information partially.
As mentioned above, the imitation probability λ is affected by the revealed game information: the more game information is revealed, the higher λ will be, meaning a higher imitation speed of junior attackers. Obviously, we can simply suppose a positive correlation between λ and L. Since the functional relationship λ = f(L_i) is not the focus of this paper, we only make the following assumptions:

λ = f(L_i), with f monotonically increasing, f(0) = λ_0, and f(1) = 1. (10)

There are two cases to consider: the duration of DS_a (the key inducing point's strategy) and the duration of DS_b (the assist inducing point's strategy).
During the duration of DS_a, the gradual concentration of junior attackers moves the game situation from the key inducing point s_p toward the vertex of the same row. Based on our purpose, we wish to keep the game situation at the key inducing point as long as possible. As a result, the defender should set L_1 to 0 by revealing no game information intentionally, corresponding to the minimum imitation probability λ_0 and the lowest imitation speed of junior attackers.
During the duration of DS_b, the concentration of junior attackers moves the game situation from the assist inducing point s_q toward the vertex of the same row. Based on our purpose, the assist inducing point is used to adjust the distribution of the attacker group and to assist in keeping the game situation at the key inducing point, so we wish the distribution of junior attackers to become ready rapidly to be induced back to the key inducing point. As a result, the defender should set L_2 to 1 by revealing plenty of game information, corresponding to the maximum imitation probability 1 and the highest imitation speed of junior attackers.

Duration. The parameter Duration indicates how long each strategy in the strategy ring is held. Similar to L, the parameter Duration also has two subparameters: a two-tuple Duration = (T_1, T_2) is a pair of durations corresponding to the two strategies in the ring, that is, to the key inducing point and the assist inducing point.

Before analyzing the parameter Duration, a computational method for the average payoff of the game players should be given. Let U(t) be the average defender payoff at time t; U(t) can be deduced through the payoff variation. Suppose the best response attack strategy changes from AS_k to AS_l while the defender holds DS_j:

U(t) = U(t - 1) + p_k(t - 1) · λ · (u_d(s_lj) - u_d(s_kj)), (11)

where the second term is the payoff increment caused by the junior attackers' imitation behavior: a proportion p_k(t - 1) · λ of attackers changes strategy from AS_k to AS_l, producing the corresponding payoff increment. The component of the Proportion Vector (PV) for strategy AS_k varies as

p_k(t) = p_k(t - 1) · (1 - λ). (12)

Let the initial payoff be U(0) = u_d(s_kj), since the group starts concentrated on AS_k. The average payoff of the players at time t is then

U(t) = u_d(s_lj) - (u_d(s_lj) - u_d(s_kj)) · (1 - λ)^t. (13)

The sum of the player payoff over a duration T can then be deduced as

S(T) = Σ_{t=1}^{T} U(t) = T · u_d(s_lj) - (u_d(s_lj) - u_d(s_kj)) · (1 - λ)(1 - (1 - λ)^T)/λ. (14)

Suppose there is a dynamic game scheme (((DS_a, DS_b), s_p, s_q), (T_1, T_2), (L_1, L_2)) and let the cost of changing the defense strategy be C. During the duration of DS_b, the defender sets L_2 to 1 by revealing plenty of game information, corresponding to the maximum imitation probability 1 discussed above, which means that the attackers converge instantaneously. So we suppose that T_2 is 0 and that the defender gains no payoff during the duration of DS_b. Since each ring cycle involves two strategy changes, the average defender payoff over the duration of a ring is

Ū = (S(T_1) - 2C)/T_1, (15)

where S(T_1) is evaluated with λ = λ_0, because L_1 = 0 during the duration of DS_a. We can achieve the highest Ū by controlling the only variable T_1, solving the equation dŪ/dT_1 = 0.
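The duration trade-off can be explored numerically with a small sketch. The payoff values below are hypothetical (only λ_0 = 0.1, the switching cost C = 2, and the key-point defense payoff of 9 echo the example in Section 6); the point is that the average ring payoff Ū(T_1) first rises with the hold duration, while the switching cost dominates, and then falls as the attacker group drifts to the vertex.

```python
import numpy as np

def avg_ring_payoff(T, lam0=0.1, cost=2.0, u_key=9.0, u_vertex=3.0):
    """Average defender payoff per ring cycle with key-point duration T.

    Assumptions (hypothetical numbers, not Table 1):
      - after induction the whole group sits on the key inducing point,
        earning the defender u_key per step;
      - juniors drift to the vertex of the row at rate lam0, where the
        defender only earns u_vertex;
      - the assist duration is 0 (leakage factor 1 means instant
        convergence) and the defender pays `cost` for each of the two
        strategy switches in a ring cycle.
    """
    t = np.arange(1, T + 1)
    p = (1.0 - lam0) ** t                     # proportion still on the key point
    per_step = p * u_key + (1.0 - p) * u_vertex
    return (per_step.sum() - 2.0 * cost) / T

# Scan integer durations: the curve rises while the switching cost
# dominates and falls once the drift toward the vertex erodes the payoff.
durations = range(1, 21)
best_T = max(durations, key=avg_ring_payoff)
```

With these particular numbers the best integer duration is T_1 = 4, in the same neighborhood as the continuous optimum reported in Section 6, though that agreement depends entirely on the assumed payoffs.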

Numerical Example.
In this section, we provide a numerical test to illustrate the implementation of the proposed method. In the example, we suppose that the game payoff matrix, the imitation probability, and the cost of changing the defense strategy are given, since these are not the focus of this paper. Let λ_0 be 0.1 and let the game payoff matrix be as in Table 1.
The solution of the equation dŪ/dT_1 = 0 is T_1 ≈ 4.734. When T_1 < 4.734, the defender changes strategies frequently, which introduces many costs C = 2; so, as seen in Figure 1, Ū grows rapidly as T_1 increases. When T_1 > 4.734, the cost of changing the defense strategy is no longer the main factor affecting Ū, because of the low frequency of defense strategy replacement. In this case, attackers gradually shift their convergence from s_22 to s_32, in the course of which the defender's payoff decreases. That means a longer duration results in a lower Ū, as seen in the figure.

Effectiveness.
In this section, we verify the effectiveness of the proposed method. Reference [12] proposed a game strategy optimization approach for the mission deployment problem, in which a Nash way to choose game strategies is worked out using particle swarm optimization (PSO). Although [12] made different assumptions and addressed a different problem, the method itself is comparable. We compare the average defender payoffs achieved by the two methods, as shown in Figure 2.
As seen in Figure 2, the curve of the dynamic method shows a vibration waveform with decreasing amplitude. In every vibration cycle, the average payoff first increases with a declining slope, because the dynamic regulation of the attacker group reduces the growth rate of the defender payoff; then the average payoff decreases rapidly, caused by the cost of changing strategies. In the Nash way, the average payoff fluctuates strongly in the earlier stage because the averages are still unstable [12]; after about 30 rounds, the average payoff stabilizes at about 4.5. It is clear that the dynamic method achieves an obviously higher average payoff. The Nash way seeks an optimization approach by safeguarding a Nash equilibrium game situation, which is driven by minimizing the possible losses. In contrast, our dynamic method applies a different idea, seeking the global best payoff in the game, and therefore greatly improves the payoffs.

Conclusions
In this paper, we model the distributed network attack-defense game as a one-many game and formulate the group-player's imitation behaviors as an evolutionary process. Taking advantage of the defender's information superiority and the attackers' imitation behaviors, we propose a dynamic game method that helps the defender gain more payoffs by inducing the attackers' evolutionary process. The experiments demonstrate the effectiveness of the proposed method. In future research, we will apply the proposed method in other areas to verify its effectiveness, such as state estimation, dynamics control, resource allocation, and information management [16][17][18].
On the other hand, this paper is based on the assumption that the players in the game seek to increase their own payoffs and do not care about their opponents'. However, in reality, there are different types of attack. For example, there are cases where attackers seek to destroy the opponent's system; in this kind of attack, attackers concentrate more on decreasing the opponent's payoffs than on increasing their own. The attack type thus affects the defense mode. Our future work will address these problems.

Figure 1: The influence of T_1 on the average defender payoff Ū.
Figure 2: The average defender payoffs of the dynamic method and the Nash way.