A Strategy Optimization Approach for Mission Deployment in Distributed Systems

To increase operational efficiency, reduce delays, and/or maximize profit, most organizations split their missions into several tasks that are deployed in distributed systems. However, because of this distribution, a mission is vulnerable to many kinds of cyberattacks. In this paper, we propose a mission deployment scheme that optimizes mission payoff in the face of different attack strategies. Using this scheme, defenders can achieve “appropriate security” and force attackers to jointly safeguard the mission situation.


Introduction
Modern organizations embed information and communication technologies (ICT) into their core processes to facilitate the collection, storage, processing, and exchange of data, thereby increasing operational efficiency, improving decision quality, and reducing costs [1]. To this end, organizations tend to split their missions into smaller tasks that can be executed in a distributed system.
Despite the significant benefits of distributed systems, they also place the mission at risk because of "distributed vulnerability." Traditional approaches to improving security generally consider only system vulnerabilities and attempt to defend against all attacks through system upgrades. Defending resources must be committed whether or not the assumed attacks ever occur. In a distributed system, such continual upgrading wastes a large amount of defending resources. In response, the concept of "appropriate security" has been proposed to pursue a tradeoff between security risk and defending resources. The nature of attack and defense can be abstracted as a game between attack strategy and defense strategy. Whether a defense strategy is effective depends not only on the defender's own condition but also on the attack strategy it faces. Therefore, when we deploy mission tasks, it is critically important to take both system vulnerabilities and attack strategies into consideration.
In this paper, we improve mission assurance through suitable mission deployment and propose a novel defense scheme that deploys mission tasks so that the mission payoff is maximized, considering all possible attack strategies and exposed vulnerabilities. In other words, we find a balance between attack strategy and defense strategy that neither attacker nor defender has sufficient reason to break: a unilateral change of strategy by the attacker or the defender fails to increase that entity's own payoff.
The contributions of this paper are as follows. First, we formulate the attack-defense mission deployment problem as a game model. Second, we refine the game-theoretic formulation and, based on it, design a particle swarm optimization algorithm suited to finding the optimal mission deployment strategy. Finally, we show that, using the proposed method, the defender can force rational attackers to jointly safeguard the mission situation.
The remainder of this paper is structured as follows. In Section 2, we discuss related work. In Section 3, we describe the problem to be solved and present some preliminary definitions. In Section 4, we model the mission deployment problem using game theory. In Section 5, we design a particle swarm optimization (PSO) algorithm to calculate the Nash equilibrium. In Section 6, we report our experimental results and comparisons with other work. Finally, in Section 7, we present our conclusions and make recommendations for future work.

Related Work
The problem of mission assurance by deployment has not been sufficiently studied in the literature. Reference [2] presents a solution for deploying mission tasks in a distributed environment in a way that minimizes a mission's exposure to vulnerabilities by taking into account available information about vulnerabilities and dependencies. In this paper, we propose a novel model and method that address mission assurance by deployment using game theory.
The main related work falls into two categories.

Task Allocation.
The problem of task allocation in distributed systems has been widely studied. Task allocation is generally treated as an NP-hard problem, so most solutions are based on heuristics. Several heuristic algorithms were studied to pursue the optimal finishing time in [3,4]. Reference [5] studies the problem of task assignment with a graph-theoretic approach, taking two conflicting factors, interference cost and communication cost, into account. Other works integrate security requirements into the task assignment process. Reference [6] describes a task allocation method that introduces safety into distributed systems and attempts to maximize reliability. Reference [7] proposed a scheduling strategy, named SAREC, to satisfy security requirements while meeting the timing constraints imposed by applications. Reference [8] extended the research of [7] by addressing the limitations of its greedy approach and proposed a new security-aware real-time scheduling algorithm based on a genetic algorithm (SAREC-GA). Although security in task assignment has received attention for several years, studying it with game theory is still a new issue. One study focuses on a game-theoretic resource allocation algorithm that considers fairness among users and resource utilization. Moreover, the allocation problem has been studied in other areas [9][10][11][12], especially in the areas of nonlinear systems [13] and fuzzy systems [14].

Game.
Game theory is the study of mathematical models of conflict and cooperation between intelligent rational decision makers [15]. In 1928, von Neumann proved the basic principle of game theory, which formally marked the birth of the field. Because of its strength in understanding and modeling conflict, game theory has been studied in many fields. Hafezalkotob and Makui proposed a Nash game approach for supply chain competition considering two main sources of uncertainty: customers' purchasing behaviors and rival chains' strategies. Sekine et al. developed a scheme for fair allocation problems in a multicriteria environment based on data envelopment analysis (DEA) and game theory. Jia et al. proposed a distributed localization method based on game theory for road sensor networks. Game theory has recently been used in the field of computer network security. Reference [16] proposes a model to reason about friendly and hostile nodes in secure distributed computation using a game-theoretic framework. Reference [17] presents an incentive-based method to model the interactions between a DDoS attacker and the network administrator, and a game-theoretic approach to inferring attacker intent, objectives, and strategies (AIOS). References [18,19] also focus on DDoS attack and defense mechanisms using game theory. Reference [20] models the interactions between an attacker and the administrator as a two-player stochastic game and computes Nash equilibria using a nonlinear program. For more related work on applying game theory to network security, the reader is referred to [21].

Problem Formalization
In this section, we present some preliminary concepts and our assumptions on computing infrastructure and missions.
A distributed system consists of a collection of autonomous physical hosts, connected through a network and distribution middleware, which enables these hosts to coordinate their activities and to share the computing resources of the system. Modern organizations tend to use distributed systems to facilitate the collection, storage, processing, and exchange of data, thereby increasing operational efficiency, improving decision quality, and reducing costs. In other words, organizations take advantage of the distributed system to improve how well missions are completed. Several works have defined the notion of mission according to their own settings. Reference [22] views a mission as the task, together with the purpose, that clearly indicates the action to be taken and the reason therefore. Donley [23] analyzed the use of the term mission across multiple DoD documents and identified that, more commonly, a mission is "considered generally as integrating many activities around a common theme or purpose." To simplify our model, we view the mission as a set of tasks, known as the mission's subtask chain, which are logically related and contribute to a common objective. Because a mission is a long-term objective, we assume that the mission's subtask chain will be executed many times.
We simply define a mission $M$ as a set of tasks $T = \{t_1, t_2, t_3, \ldots, t_n\}$. Each task produces payoff to the mission. A mission is successful if all its tasks are correctly executed. In many cases, these tasks may be executed repeatedly. For each task $t \in T$, a set of procedure replicas can fulfill the task, $P_t = \{p_1, p_2, p_3, \ldots, p_k\}$. For example, both FTP and NFS can achieve the task of remote file access. In our framework, we define the set $P$ of procedures to be deployed as $P = \bigcup_{t \in T} P_t$. The physical hosts in the distributed system provide different kinds of procedures, so we can deploy the mission's subtasks onto procedures and execute the tasks separately. We assume that there are adequate hosts to provide every procedure $p \in P$.
The deployment is a function $d: T \to P$ which maps each task $t \in T$ to a procedure $p \in P$. The binary variable $x_{tp}$ denotes the truth value of $d(t) = p$; that is, $x_{tp} = 1$ if $d(t) = p$, 0 otherwise. Each procedure and its execution environment can also introduce vulnerabilities and failures. While the executed tasks produce positive payoff to the mission, attacks on these procedures and their execution environments produce negative payoff to the mission. The purpose of this paper is to find a deployment strategy that optimizes mission payoff in the face of attacks. Let $A = \{a_1, a_2, a_3, \ldots, a_m\}$ be the set of attacks. The influence of attacks on the mission is described in the next section.
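The definitions above can be made concrete with a small sketch. It assumes a toy mission with two tasks; the task names, the replica names, and the helper functions `all_deployments` and `indicator` are hypothetical and serve only to illustrate a deployment function and its binary indicator variable.

```python
from itertools import product

# Hypothetical tasks and the procedure replicas able to fulfil each one.
replicas = {
    "remote_file_access": ["ftp", "nfs"],
    "name_resolution": ["dns_a", "dns_b"],
}

def all_deployments(replicas):
    """Yield every deployment (task -> procedure) as a dict."""
    tasks = sorted(replicas)
    for choice in product(*(replicas[t] for t in tasks)):
        yield dict(zip(tasks, choice))

def indicator(deployment, task, procedure):
    """Binary variable: 1 if the deployment maps task to procedure, else 0."""
    return 1 if deployment.get(task) == procedure else 0

deployments = list(all_deployments(replicas))
```

Each element of `deployments` is one candidate defense strategy in the sense used later in the game model; with two replicas per task there are four of them.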

Game Model
Rational attackers weigh both attack cost and benefit before attacking and choose the attack that costs less and gains more, that is, the one with the higher payoff. An irrational attacker, by contrast, aggressively seeks higher benefit without regard to attack cost. This paper focuses on rational attackers and discusses the best defense strategies against them.
In the gaming process, both attackers and defenders attempt to maximize their payoff through optimal strategies. Based on the assumption that they are rational, we formalize their conflict as a game model and work out the attack intention and the best defense strategies.

Model Definition.
Game: a description of the strategic interaction between opposing or cooperating interests, where the constraints and payoffs for actions are taken into consideration [21]. A game model has three basic elements, $\mathrm{Game} = (N, S, U)$.
(1) $N = (N_1, N_2, \ldots, N_n)$ is the set of basic entities that join in the game and choose strategies. An entity can represent a person, a machine, or a group of persons within a game. In this paper, there are two entities, attacker and defender.
(2) $S = (S_1, S_2, \ldots, S_n)$ is the set of strategies that the entities can take during game play. $S_i$ is the set of strategies for entity $i$, and each entity has no fewer than 2 strategies; for example, $S_i = (s^i_1, s^i_2, \ldots, s^i_{m_i})$, where $m_i$ is the number of strategies for entity $i$. In this paper, the defender's strategies are different deployment strategies with different vulnerabilities exposed.
(3) $U = (U_1, U_2, \ldots, U_n)$ is the set of utility functions of the entities. A utility function is a function of all the entities' strategies and gives the payoff an entity can gain from the game. We give a quantification example of the utility function in the next section.

Utility Function.
The quantification of the utility function is mainly based on security evaluation and risk analysis. Many works have studied this area; for example, [24] proposes a cost model for intrusion detection systems (IDSs) based on cost quantification, cost classification, and attack classification. Since this topic is not the focus of this paper, we give only a simple quantification based on the factors of benefit, cost, and success probability for attack and defense separately.
Attack benefit (AB) presents the reward attackers can obtain by launching a successful attack. AB is related to attack strategy.
Attack cost (AC) presents the resources, professional knowledge, or exposure to legal sanction spent by the attacker to launch an attack. Note that the same cost has to be paid whether or not the attack is successful. To simplify the analysis, we can regard the statistically expected cost as AC. AC is related to attack strategy.
Attack success probability (ASP) presents the probability that the attack succeeds. This factor is affected by both attack strategy and defense strategy. If the attacker attempts to launch attack A when the defender happens to implement the corresponding defense strategy, so that the vulnerability exploited by attack A is not exposed, the ASP must be low, possibly 0. Otherwise, if no defense strategy happens to be aimed at attack A, the ASP may be high. Because ASP depends on both attack strategy and defense strategy, it is a matrix, as in Table 1.
Payoff of attacker (PoA) presents the payoff an attacker can gain from game play. Its utility function combines AB, AC, and ASP.
Defense benefit (DB) presents the reward defenders can obtain by launching an effective defense against an attack. In the mission environment, this reward can be the contribution made to the mission by the normal execution of its subtasks. DB is related to defense strategy, meaning deployment strategy.
Defense cost (DC) presents the resources and operational time spent by defenders to launch a defense. Note that the same cost has to be paid whether or not the defense is effective. To simplify the analysis, we can regard the statistically expected cost as DC. DC is related to defense strategy.
Defense effective probability (DEP) presents the probability that the defense is effective against an attack. An effective defense means that the exposed vulnerabilities cannot be exploited or are hard to exploit. This factor is affected by both attack strategy and defense strategy. The meaning of DEP is similar to that of ASP. DEP is also a matrix, as in Table 2.
Payoff of defender (PoD) presents the payoff a defender can gain from game play. Its utility function combines DB, DC, and DEP. Based on the utility functions, we can work out the payoff matrix. For convenience of discussion, we refer to a game scene $\mathrm{Game} = (N, S, U)$ in what follows.
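The utility formulas themselves are not reproduced in the text above. As a minimal sketch, one plausible form that is consistent with the stated factors is assumed here: the benefit is weighted by the success (or effectiveness) probability and the cost is always paid. Both the formulas and all numbers below are assumptions for illustration, not the paper's values.

```python
def attacker_payoff(ab, ac, asp):
    """PoA[i][j] = AB[i] * ASP[i][j] - AC[i]  (assumed form)."""
    return [[ab[i] * asp[i][j] - ac[i] for j in range(len(asp[i]))]
            for i in range(len(ab))]

def defender_payoff(db, dc, dep):
    """PoD[i][j] = DB[j] * DEP[i][j] - DC[j]  (assumed form)."""
    return [[db[j] * dep[i][j] - dc[j] for j in range(len(db))]
            for i in range(len(dep))]

# Hypothetical numbers: 2 attack strategies (rows) x 2 deployments (columns).
AB, AC = [100.0, 60.0], [20.0, 5.0]
ASP = [[0.5, 0.1], [0.2, 0.5]]
DB, DC = [80.0, 90.0], [10.0, 30.0]
DEP = [[0.5, 0.9], [0.8, 0.5]]  # taken as 1 - ASP here, itself an assumption

PoA = attacker_payoff(AB, AC, ASP)
PoD = defender_payoff(DB, DC, DEP)
```

Rows index attack strategies and columns index defense (deployment) strategies, matching the layout of the ASP and DEP matrices.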
In a Nash equilibrium, the entities choose strategies that are best responses to each other, so no entity has an incentive to deviate to an alternative strategy. The system is therefore in a kind of equilibrium state, with no force pushing it toward a different outcome. A Nash equilibrium can also be thought of as an equilibrium in beliefs: if each entity believes that the other entity will actually play a strategy that is part of a Nash equilibrium, then each is willing to play its own part of that equilibrium. Because the mission's subtask chain will be executed many times, we consider a mixed-strategy attack-defense game.
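In the mixed strategy game, let $x^i_j$ be the probability that entity $i$ plays its $j$th pure strategy, with the attacker as entity 1 and the defender as entity 2. The expected payoffs of the attacker and the defender are computed as follows:

$$E_1(x) = \sum_{j=1}^{m_1} \sum_{k=1}^{m_2} x^1_j \, x^2_k \, \mathrm{PoA}(s^1_j, s^2_k), \qquad E_2(x) = \sum_{j=1}^{m_1} \sum_{k=1}^{m_2} x^1_j \, x^2_k \, \mathrm{PoD}(s^1_j, s^2_k).$$

On the other hand, according to the implication of Nash equilibrium, in a Nash equilibrium situation, given the other entities' strategies, the pure strategies of each entity should gain the same payoff. As a result, in a Nash equilibrium situation, each entity $i$ satisfies

$$E_i(s^i_1, x^{-i}) = E_i(s^i_2, x^{-i}) = \cdots = E_i(s^i_{m_i}, x^{-i}),$$

where $E_i(s^i_j, x^{-i})$ is the expected payoff of entity $i$ when it plays the pure strategy $s^i_j$ against the mixed strategies $x^{-i}$ of the other entities.

Particle Swarm Optimization
In PSO, a swarm of particles represents potential solutions, and a swarm of random particles is initialized first. Each particle is associated with two vectors, velocity and position; that is, the velocity vector $v_i = [v_i^1, v_i^2, \ldots, v_i^D]$ and the position vector $x_i = [x_i^1, x_i^2, \ldots, x_i^D]$, where $D$ stands for the dimensions of the solution space and $i$ stands for the $i$th particle. Each particle is also associated with a fitness value determined by a unified fitness function. During the evolutionary process, the velocity and position of a particle are updated according to two current best particle positions. The first is called the local best, which is the historical best position of the particle itself. The second is called the global best, which is the best position in the neighborhood. The velocity and position of a particle are updated as follows:

$$v_i^d = \omega v_i^d + c_1 r_1 (\mathit{lBest}_i^d - x_i^d) + c_2 r_2 (\mathit{gBest}^d - x_i^d),$$
$$x_i^d = x_i^d + v_i^d,$$

where $\omega$ is the inertia weight, $c_1$ and $c_2$ are the acceleration coefficients, and $r_1$ and $r_2$ are two uniformly random numbers independently generated within [0, 1]. $\mathit{lBest}$ is the local best position and $\mathit{gBest}$ is the global best position.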
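The update rule for a single particle can be sketched as follows. This is the standard inertia-weight PSO update; the function name `pso_step` and the default acceleration coefficients `c1 = c2 = 2.0` are assumptions (a common choice), since the text does not state the values used.

```python
import random

def pso_step(x, v, l_best, g_best, w, c1=2.0, c2=2.0, rng=random):
    """One velocity-and-position update for a single particle.

    x, v    : current position and velocity (lists of equal length)
    l_best  : best position found so far by this particle
    g_best  : best position found so far in the neighborhood
    w       : inertia weight
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # uniform in [0, 1)
        vd = (w * v[d]
              + c1 * r1 * (l_best[d] - x[d])
              + c2 * r2 * (g_best[d] - x[d]))
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

A particle that already sits on both best positions keeps only its damped inertia term, which is why the inertia weight controls how long exploration continues.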

PSO Design for Game.
In this algorithm, we regard a game strategy situation as a particle position. In the game scene described in Section 4.2, a particle position has the form $x = (x^1, x^2) = ((x^1_1, x^1_2, x^1_3), (x^2_1, x^2_2, x^2_3, x^2_4, x^2_5, x^2_6))$, where the first vector is the attacker's mixed strategy and the second vector is the defender's mixed strategy. We use $x^i_j$ for the probability of the $j$th strategy of entity $i$; note that the position in this algorithm is a group of vectors, which differs from the description in Section 5.1. In a Nash equilibrium, every entity's strategy is a best response to the others, so the particle that represents the Nash equilibrium situation has the best fitness value. During the evolutionary process, particles search for the best position within the game space, update their positions according to the local best solution and the global best solution, and move gradually toward the Nash equilibrium situation.
To improve the convergence speed, as proposed by Shi and Eberhart [26], $\omega$ in this algorithm decreases linearly with the generations as follows:

$$\omega = \omega_{\max} - (\omega_{\max} - \omega_{\min}) \frac{g}{G},$$

where $g$ is the generation index representing the current number of evolutionary generations and $G$ is a predefined maximum number of generations.
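The linear schedule is small enough to state directly in code. The sketch below uses the bounds 0.9 and 0.4 reported in the experiments of this paper as defaults; the function name `inertia_weight` is illustrative.

```python
def inertia_weight(g, g_max, w_max=0.9, w_min=0.4):
    """Inertia weight that falls linearly from w_max (g = 0) to w_min (g = g_max)."""
    return w_max - (w_max - w_min) * g / g_max
```

Early generations therefore favor exploration with a large weight, and later generations favor refinement around the best positions found so far.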
Based on the Nash equilibrium definition and formula (8), we design the fitness function $f(x)$ so that it measures how far a strategy situation $x$ is from equilibrium. If and only if $x$ is a Nash equilibrium, the fitness function equals its minimum value 0. That means that, in a Nash equilibrium situation, for every entity in the game, the payoff is the same no matter which strategy the entity chooses. If $x$ is not a Nash equilibrium, then $f(x) > 0$.
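The exact fitness function is not reproduced here, so the following is only a sketch of one function with the stated property. It assumes a two-entity game given by payoff matrices `A` (row entity) and `B` (column entity) and sums, over both entities, the positive gain each pure strategy would obtain over the current mixed strategy. This regret-style sum is zero exactly at a Nash equilibrium and positive otherwise; it is a swapped-in formulation, not necessarily the one used in the paper.

```python
def expected_payoffs(A, B, p, q):
    """Expected payoffs of the row entity (A) and column entity (B)."""
    rows, cols = range(len(p)), range(len(q))
    ea = sum(p[i] * q[j] * A[i][j] for i in rows for j in cols)
    eb = sum(p[i] * q[j] * B[i][j] for i in rows for j in cols)
    return ea, eb

def fitness(A, B, p, q):
    """Sum of positive deviation gains; 0 if and only if (p, q) is a Nash equilibrium."""
    ea, eb = expected_payoffs(A, B, p, q)
    # Payoff of each pure strategy against the opponent's mixed strategy.
    row_pure = [sum(q[j] * A[i][j] for j in range(len(q))) for i in range(len(p))]
    col_pure = [sum(p[i] * B[i][j] for i in range(len(p))) for j in range(len(q))]
    return (sum(max(0.0, u - ea) for u in row_pure)
            + sum(max(0.0, u - eb) for u in col_pure))

# Matching pennies as a small check: the only equilibrium is (1/2, 1/2) for both.
A = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-1.0, 1.0], [1.0, -1.0]]
at_equilibrium = fitness(A, B, [0.5, 0.5], [0.5, 0.5])
off_equilibrium = fitness(A, B, [1.0, 0.0], [1.0, 0.0])
```

At the pure profile the column entity could gain 2 by switching, so the fitness is positive, whereas at the uniform mix no deviation helps either entity.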
In addition, to keep particles in the game space during the evolutionary process, we must control the particle position space. Since the vectors of a particle position represent mixed strategies, the elements of each vector must sum to 1 and must be nonnegative; namely, $\sum_{j=1}^{m_i} x^i_j = 1$, $x^i_j \ge 0$, $i = 1, 2, \ldots, n$, where $m_i$ is the number of entity $i$'s pure strategies. We also ensure that the initial particle velocity satisfies $\sum_{j=1}^{m_i} v^i_j = 0$, $i = 1, 2, \ldots, n$, so that, in every generation, the velocity components of each entity still sum to zero and the updated position components continue to sum to 1.
Another factor, the particle update step, needs to be controlled to keep particles in the game space. Particle positions are updated by $x^i_j = x^i_j + v^i_j$. To keep $x^i_j \ge 0$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m_i$, we add a step control factor to this update. The factor is computed as a minimum taken over the components $j = 1, 2, \ldots, m_i$ of every entity $i = 1, 2, \ldots, n$ (formula (16)). Based on this design, every particle in every generation is guaranteed to stay in the mixed strategy game space. In this way, many invalid particles are avoided and the performance of the algorithm is improved.
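The expression of the step control factor is not legible in the text above. A minimal sketch, under the assumption that the factor is the largest step in [0, 1] that keeps every probability nonnegative, could look like this; `step_factor` and `controlled_move` are hypothetical names.

```python
def step_factor(x, v):
    """Largest lam in [0, 1] with x[j] + lam * v[j] >= 0 for every j (assumed rule)."""
    lam = 1.0
    for xj, vj in zip(x, v):
        if vj < 0:
            # Moving by lam * vj reaches zero when lam == xj / -vj.
            lam = min(lam, xj / -vj)
    return lam

def controlled_move(x, v):
    """Move one entity's mixed strategy along v without leaving the simplex."""
    lam = step_factor(x, v)
    # max(0.0, ...) only guards against tiny negative rounding errors.
    return [max(0.0, xj + lam * vj) for xj, vj in zip(x, v)]

moved = controlled_move([0.2, 0.3, 0.5], [-0.4, 0.1, 0.3])
```

Because the velocity components sum to zero, scaling the whole velocity by one factor keeps the probabilities summing to 1, which a per-component clipping would not.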
Step 2. Compute the fitness values of the particles using function (12) and find the local best lBest and the global best gBest.
Step 4. Compute the step control factor for every entity and every particle, and update the particles' positions using function (13).
Step 5. Renew lBest and gBest. If the fitness of gBest is 0, then stop; otherwise, return to Step 3.
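The steps can be put together in a compact sketch for a two-entity game. Steps 1 and 3 do not appear in the text above, so random initialization on the simplex and the standard velocity update are assumed for them. The regret-style fitness, the step rule, the coefficients `c1 = c2 = 2.0`, and all function names are likewise assumptions rather than the paper's exact definitions.

```python
import random

def fitness(A, B, x):
    """Sum of positive deviation gains of both entities (assumed fitness)."""
    p, q = x
    row = [sum(q[j] * A[i][j] for j in range(len(q))) for i in range(len(p))]
    col = [sum(p[i] * B[i][j] for i in range(len(p))) for j in range(len(q))]
    ea = sum(pi * u for pi, u in zip(p, row))
    eb = sum(qj * u for qj, u in zip(q, col))
    return sum(max(0.0, u - ea) for u in row) + sum(max(0.0, u - eb) for u in col)

def random_simplex(m, rng):
    raw = [rng.random() + 1e-12 for _ in range(m)]
    s = sum(raw)
    return [r / s for r in raw]

def zero_sum_velocity(m, rng, scale=0.1):
    raw = [rng.uniform(-scale, scale) for _ in range(m)]
    mean = sum(raw) / m
    return [r - mean for r in raw]

def move(x, v):
    """Scaled step that keeps probabilities nonnegative, then renormalizes drift."""
    lam = 1.0
    for xj, vj in zip(x, v):
        if vj < 0:
            lam = min(lam, xj / -vj)
    new = [max(0.0, xj + lam * vj) for xj, vj in zip(x, v)]
    s = sum(new)
    return [n / s for n in new]

def pso_nash(A, B, particles=30, g_max=1000, eps=1e-5, seed=0,
             w_max=0.9, w_min=0.4, c1=2.0, c2=2.0):
    rng = random.Random(seed)
    sizes = (len(A), len(A[0]))
    # Step 1 (assumed): random positions on the simplex, zero-sum velocities.
    xs = [[random_simplex(m, rng) for m in sizes] for _ in range(particles)]
    vs = [[zero_sum_velocity(m, rng) for m in sizes] for _ in range(particles)]
    # Step 2: fitness values, local bests, global best.
    l_best = [[list(e) for e in x] for x in xs]
    l_fit = [fitness(A, B, x) for x in xs]
    b = min(range(particles), key=l_fit.__getitem__)
    g_best, g_fit = [list(e) for e in l_best[b]], l_fit[b]
    history = [g_fit]
    for g in range(g_max):
        if g_fit <= eps:
            break
        w = w_max - (w_max - w_min) * g / g_max
        for k in range(particles):
            for e in range(2):
                # Step 3 (assumed): one random pair per entity keeps sum(v) == 0.
                r1, r2 = rng.random(), rng.random()
                vs[k][e] = [w * v + c1 * r1 * (lb - x) + c2 * r2 * (gb - x)
                            for v, x, lb, gb in zip(vs[k][e], xs[k][e],
                                                    l_best[k][e], g_best[e])]
                # Step 4: step-controlled position update.
                xs[k][e] = move(xs[k][e], vs[k][e])
            # Step 5: renew local and global bests.
            f = fitness(A, B, xs[k])
            if f < l_fit[k]:
                l_best[k], l_fit[k] = [list(e) for e in xs[k]], f
                if f < g_fit:
                    g_best, g_fit = [list(e) for e in xs[k]], f
        history.append(g_fit)
    return g_best, g_fit, history
```

Drawing one pair of random numbers per entity, rather than per component, is a deliberate choice in this sketch: it keeps each entity's velocity components summing to zero, so positions stay on the simplex without projection. Calling `pso_nash` with a bimatrix game returns the best strategy pair found, its fitness, and the best fitness per generation.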

Performance Index.
We evaluate the convergence property of the algorithm using off-line performance, following the method proposed by De Jong [27].
Definition 4 (off-line performance). In generation $g$, we define the off-line performance $s(g)$ as follows:

$$s(g) = \frac{1}{g} \sum_{k=1}^{g} f^*(k),$$

where $f^*(k)$ is the best fitness in generation $k$.
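As a small sketch, off-line performance can be computed as a running average of the best fitness values. The averaging form is the usual reading of De Jong's measure and is assumed here; the function name is illustrative.

```python
def offline_performance(best_fitness_per_generation):
    """Running average of the best fitness values, one entry per generation."""
    out, total = [], 0.0
    for g, f in enumerate(best_fitness_per_generation, start=1):
        total += f
        out.append(total / g)
    return out

curve = offline_performance([4.0, 2.0, 0.0])
```

Because the measure averages over all generations so far, it decays smoothly even when the best fitness drops in steps, which makes convergence curves easier to compare.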

Convergence Property.
In this section, we use the Dekel-Scotchmer game [28] and its payoff matrix as an example. In the algorithm, the number of particles is set to 30. The maximum number of generations is set to 1000, with $\omega_{\max} = 0.9$ and $\omega_{\min} = 0.4$. Accuracy is set to $10^{-5}$. The experiment is performed in MATLAB R2013a. We perform the experiment 5 times and reach the Nash equilibrium at (1/3, 1/3, 1/3, 0; 1/3, 1/3, 1/3, 0) after an average of 268 generations. The results are given in Table 3.
We plot the off-line performance of the last experiment as the blue line in Figure 1.
Reference [29] presented a game genetic algorithm (GGA), which realizes the choice of strategies using a genetic algorithm (GA). We implement the genetic algorithm in MATLAB R2013a in the same environment as PSO. The comparison between the PSO method proposed in this paper and GA is shown in Figure 1. Our method achieves a higher convergence speed. That is because, in GA, chromosomes share information with each other and the entire population moves toward the optimal area in a uniform way, whereas in PSO only the lBest and gBest particles deliver information, so the population moves in a more targeted way and converges faster.

Effectiveness.
In this section, we evaluate the influence of the defending strategy on the payoff of the defender. Because mission deployment is assumed to be performed many times, it is nearly impossible in practice for either defender or attacker to stick to a single strategy. As a result, pure strategies are not considered in our discussion.
In our discussion, both attacker and defender have two ways to choose a strategy: the random way and the Nash way. The random way means that the entity, attacker or defender, randomly chooses a strategy from its strategy set every time. The Nash way means that the entity chooses each strategy with the same probability as in the Nash equilibrium, which is more rational than the random way. Therefore, there are four situations in our discussion: attack in the random way versus defend in the random way; attack in the random way versus defend in the Nash way; attack in the Nash way versus defend in the random way; and attack in the Nash way versus defend in the Nash way. We use the game scene and Nash equilibrium in Section 6.1 for reference, and the number of mission deployments and attacks is set to 1000. The average payoff of the defender is used as the evaluation indicator. The experiment is performed in MATLAB R2013a. The result of the experiment is shown in Figure 2.
As shown in Figure 2, the defender achieves the highest average payoff, about 78, after 400 rounds if both attacker and defender choose strategies in the Nash way. If the attacker is rational and chooses strategies in the Nash way while the defender chooses randomly, the average payoff of the defender is the lowest, about 50. If the attacker chooses strategies in the random way, the defender achieves a medium average payoff, and this payoff is a little higher when the defender is rational and chooses strategies in the Nash way. We can also see in Figure 2 that, in the early stage of the game, the average payoffs fluctuate so strongly that a curve with a low final average can lie above a curve with a high final average. That is because the averages are still unstable when they are computed over few rounds. We conclude that the defender achieves a higher payoff in the Nash way no matter which way the attacker chooses.
Similarly, the attacker achieves a higher payoff when choosing strategies in the Nash way.
We therefore conclude that both a rational attacker and a rational defender will choose strategies in the Nash way and jointly safeguard the Nash equilibrium.

Comparisons of Dimensions.
In this section, we evaluate the influence of the number of strategies on algorithm performance. In the algorithm, the number of particles is set to 30. The maximum number of generations is set to 1000, with $\omega_{\max} = 0.9$ and $\omega_{\min} = 0.4$. Accuracy is set to $10^{-5}$. The experiment is performed in MATLAB R2013a. We calculate the off-line performance for two dimensions, three dimensions, and four dimensions, where "dimension" is the number of strategies in the strategy set of the game. The comparison result is shown in Figure 3.
We can see that the average convergence speed decreases gradually as the number of strategy dimensions increases. In the comparison experiment, we fix the PSO parameters, so the major factor influencing the convergence speed is the fitness function. More dimensions mean a larger fitness function (function (12)) and a correspondingly higher computing cost.

Conclusions
In this paper, we have highlighted the fact that a mission deployed in a distributed system is vulnerable to many kinds of cyberattacks. In response, we propose a mission deployment strategy that uses game theory to optimize mission payoff in the face of different attack strategies, and we design a particle swarm optimization algorithm to calculate the Nash equilibrium. Experiments show that both a rational attacker and a rational defender will choose strategies in the Nash way and jointly safeguard the Nash equilibrium. However, in the Nash equilibrium, the payoff of the defender is only a better one, not the best one globally. Our future work will therefore aim to integrate the Nash equilibrium with the "social best," meaning that in the Nash equilibrium the payoff of the defender is the social best and the payoff of the attacker is the worst.

PSO Framework.
Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995 [25], inspired by the social behavior of bird flocking and fish schooling.

Figure 1: Off-line performance of convergence experiment.

Figure 2: Average payoffs of strategy choosing ways.

Table 3: Results of convergence experiment.