Path Hopping: An MTD Strategy for Long-Term Quantum-Safe Communication

Moving target defense (MTD) strategies have been widely studied for securing computer systems. We consider using MTD strategies to provide long-term cryptographic security for message transmission against an eavesdropping adversary who has access to a quantum computer. In such a setting, today's widely used cryptographic systems, including the Diffie-Hellman key agreement protocol and the RSA cryptosystem, will be insecure, and alternative solutions are needed. We will use a physical assumption, the existence of multiple communication paths between the sender and the receiver, as the basis of security, and propose a cryptographic system that uses this assumption and an MTD strategy to guarantee efficient long-term information-theoretic security even when only a single path is not eavesdropped. Following the approach of Maleki et al., we model the system using a Markov chain, derive its transition probabilities, propose two security measures, and prove results that show how to calculate these measures using transition probabilities. We define two types of attackers that we call risk-taking and risk-averse and compute our proposed measures for the two types of adversaries for a concrete MTD strategy. We will use numerical analysis to study tradeoffs between system parameters, discuss our results, and propose directions for future research.


Introduction
The cryptographic infrastructure of the Internet allows users from across the world to establish private, authenticated communication channels and interact securely. Shor's discovery of a quantum algorithm that can efficiently solve the integer factorization and discrete logarithm problems [1], the two mathematical problems that underlie the security of the most prominent public key algorithms such as RSA public key encryption and Diffie-Hellman key agreement, effectively brings down this infrastructure. The NSA's recent call for quantum-safe cryptography and the prediction that significant progress in the development of quantum computers can be expected within fifteen years [2,3] have created a flurry of activity in the research community, standardization bodies [4,5], and major industries [6][7][8]. The main approaches to quantum-safe cryptography use (i) quantum cryptographic models and algorithms, (ii) cryptographic algorithms that rely on computational assumptions for which no efficient quantum algorithm is known [9], and (iii) cryptographic systems that do not use any computational assumptions. This last approach results in information-theoretically secure systems and is the one followed in this paper.
A prominent and widely researched direction in information-theoretically secure communication is physical layer security, which bases security on assumptions about the physical environment [10]. In these systems the advantage of the sender and the receiver over the adversary is captured through the properties of the physical layer of communication. For example, in Wyner's model [11] Alice is connected to Bob and Eve through two noisy channels, and the assumption is that Eve's reception is noisier than Bob's. The extra noise in Eve's channel is the resource that can be used for securing communication against Eve, without the need for a shared key. A unique property of this approach compared to computationally secure systems is long-term security, which refers to the property that Eve's transcript of the communication cannot be used for offline attacks. This is because security is due to Eve's lack of information and not her limited computational power.
In this paper we assume there are multiple communication paths (n paths) between the sender and the receiver. A path is an abstraction of a channel and can have different realizations in practice. For example, in wireless communication, a path can correspond to a frequency that is used for transmission and reception by the sender and the receiver, and multiple paths are specified by a set of frequencies that will be used by the two. Or, in a sensor network, a path consists of a sequence of nodes in the network which are used to send messages from the sender to the receiver. A similar notion of path can be defined for communication over the Internet. If the adversary can eavesdrop on all the paths, secure communication without additional assumptions (e.g., quantum mechanical, computational, or other physical layer assumptions) is impossible. We assume that although the set of all paths (e.g., possible frequencies) is known to the attacker, they cannot eavesdrop on all the paths at the same time.
To provide cryptographic security against an attacker whose goal is to learn the sent message, the sender can use an (n, n)-secret sharing (see Section 2) to construct n shares of the message and send each share along a path. This is an inefficient solution with a communication rate of 1/n (i.e., for one bit of information, the sender must send n bits). If the number of paths that the adversary can simultaneously eavesdrop is bounded by t_a < n, the sender and receiver can select a random set of m paths and use an (m, m)-secret sharing. Note that m is not necessarily equal to t_a, but to keep the introductory discussion simple, let m = t_a; we will discuss the relation between m and t_a in Section 3. If the set of m paths is kept fixed, the attacker will discover them over time and will be able to learn the message. We propose "path hopping," where the sender and the receiver regularly change ("hop") one or more of the paths that have been chosen for communication. We also allow the attacker to change their selected paths. We model and analyze the dynamic behaviour of the system and show that it results in efficient cryptographic security using an MTD strategy.
1.1. Our Work. Alice wants to send a stream of data to Bob. The adversary is computationally unlimited (no computational assumption is made). Alice and Bob are connected by a set of n communication paths, up to m of which can be eavesdropped by the adversary at the same time. The adversary probes a path to determine if it carries data, and if it does, captures it (e.g., scanning a port on a server and, if open, later breaking into the path). The adversary is mobile [12] and can move in the network in the sense that, in each time step, it can release a captured path and capture a new one. Hence, it can eavesdrop on different sets of paths during different time periods. We assume the attacker can eavesdrop on up to m paths, but only holds those that carry data, and not the ones that are not in use between Alice and Bob. (One can consider a case where the attacker always eavesdrops on m paths; although the analysis approach would be similar, the actual calculations will be different.) Alice uses m (out of n) paths at each time step, and the attacker needs to know all the m paths (targets) to be able to determine the message.
Time is divided into fixed consecutive intervals, each referred to as a time step. In each time step, the defender (which includes Alice and Bob), the attacker, or neither of them takes an action (move). Using the MTD framework of [13], the combination of the attacker's and the defender's actions in each time step can be modelled as a Markov chain. We define the G_{u,v} Markov chain, where (exactly) u and v paths are hopped by the defender and the attacker, respectively, and derive the transition probabilities of the chain. (One can also consider a G_{≤u,≤v} Markov chain where, in each time step, the defender randomizes up to u paths and the attacker hops and probes up to v paths. We leave this for future work.) For our concrete analysis we focus on G_{1,1}, where the defender's and the attacker's actions involve a single path only.
In each time step, the system can be in one of the m + 1 states labeled 0, 1, ..., m, where state 0 is the starting state of the system, state m is the winning state for the attacker, and in state i, the attacker has captured i paths. We assume the defender leads in each time step and moves with a fixed probability α. The attacker also moves in the same interval, with a fixed probability β that is upper bounded by 1 − α (the model assumes that, in each time step, the attacker and the defender do not act simultaneously). Note that β can be chosen by the adversary, knowing the upper bound. We define a risk-taking and a risk-averse attacker, depending on their choice of β. A risk-taking adversary chooses the highest available attack probability, that is, β = 1 − α. A risk-averse attacker would like to stay undetected and so limits their action rate to a threshold that is determined by the intrusion detection system (IDS) of the defender. Thus, in the case of a risk-averse adversary, in each time step, there is a probability that no one moves. We model the system as a Markov chain and, in Section 3, derive the transition probabilities of the Markov chains associated with the risk-taking (case C1) and the risk-averse (case C2) attackers.
Security Measures. We use two security measures to evaluate the effectiveness of a path hopping strategy: (i) the expected number of times that the attacker reaches the winning state in N time steps, assuming that it starts from state 0, and (ii) the expected number of time steps to enter the winning state for the first time, assuming that the attacker starts from state 0. The two measures are denoted by E_N and E^(1)_win, respectively. These security measures capture the security requirements of different scenarios. E_N is appropriate for data streams for which sporadic access to different parts of the stream may be tolerated. For example, small excerpts of a large file are not expected to leak much information about the file. Theorem 3 shows that E_N is upper bounded by the product N ⋅ π(m), where π(m) is the mth component of the stationary probability distribution of the Markov chain. This suggests that π(m) can be used to represent E_N, with higher values corresponding to less security.
E^(1)_win is appropriate for highly sensitive data streams that must stay strictly inaccessible to the adversary, where the sender wants to ensure that the expected number of time steps to the first compromise is sufficiently high (possibly higher than the length of the stream). Theorem 4 shows that E^(1)_win can be calculated by solving a set of linear equations whose coefficients are derived from the transition probabilities of the Markov chain. We use E^(1)_win as a security measure with higher values corresponding to higher security.

Numerical Results. Deriving closed-form expressions for E_N and E^(1)_win is a challenging task. For G_{1,1}, we use numerical calculations to study the variations of the security measures for different values of the system parameters. Our results are given in Section 6. They show the following: (1) For fixed n and m, security increases (i.e., π(m) decreases and E^(1)_win increases) as α, the defender's probability of action, increases (Figures 3 and 4 for C1 and Figures 6 and 7 for C2).
(2) For fixed α, security can be maintained by increasing n, even when m = n − 1 (Figure 5), and in all cases the communication rate is 1/m.
Figure 5 also shows that, for given values of n and α, as m increases, security initially increases, then reaches a plateau, and then starts to decrease. This is because when m is small (relative to n), the target paths are hidden among many available paths and the chance of correctly guessing a path is small. However, when m is large (relative to n), the attacker's probability of a correct guess increases. Interestingly, this point of saturation increases as n, which represents the variability of the system, increases. This graph can be used to select the optimal value of m that provides maximum security while achieving the highest communication rate.
(3) Using numerical analysis, one can estimate the cost of being risk-averse in terms of the decrease in E_N or the increase in E^(1)_win. In Section 7 we show that an adversary who chooses not to use all their attacking power (although they can act with probability 1 − α, they choose β < 1 − α) will effectively reduce the expected number of times that they occupy the winning state (proportional to π(m)) and will have a higher E^(1)_win.
Attack Costs. Our model focuses on the defender's ability to provide security by making the physical environment dynamic and does not consider the associated costs. The attacker's and the defender's actions have payoffs. The attacker needs to spend resources to launch attacks and also bears the consequences of being detected. The defender must spend resources to implement the randomization strategy, which introduces side effects such as packet loss and communication delays that are a function of the rate of randomization (captured by the parameter α). The attacker's reward for their action is getting closer to the winning state, and the defender's reward is preventing the attacker from reaching the final state. In Section 7 we discuss these payoffs. We also use our numerical calculation results to quantify the cost of being risk-averse.
Randomness Requirements of the System.Our proposed system assumes that the sender and the receiver share the set of target paths that is used for communication in each time step.
In practice, if one can assume that the receiver receives on all paths at all times, then no shared randomness is required: the sender hops the paths, and the receiver receives the content on the target paths used in each time step. If the receiver has the same restriction as the sender on the number of target paths, that is, the receiver can only receive on m paths (e.g., because of cost or restrictions on the receiving equipment), then the sender and the receiver need shared randomness to hop the paths simultaneously. This can be realized in two ways: (i) using a preshared random string or (ii) employing a secure pseudorandom generator (PRG) to extend an initial shared random seed.
The adversary's view of the system in state i, in addition to the eavesdropped shares that are sent over the i captured target paths, includes the labels of those target paths. In case (i), the sequence of random numbers associated with the labels of the target paths will not reveal any information about future values of the sequence of target paths, and so future path labels will remain unpredictable. In case (ii), however, each observation (of a target path) will leak information about the seed of the PRG, and one needs to use a PRG with an appropriate security level (e.g., a quantum-safe PRG built from a secure block cipher). Note that the MTD system will retain its long-term security: although the recorded transcript of communication may reveal the seed of the PRG in an offline attack, it will not contain enough information about the communicated message, and so the message transmission retains long-term security.
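As an illustration of option (ii), the sketch below derives each time step's target-path set from a shared seed, using SHA-256 in counter mode as a stand-in for a quantum-safe PRG. The derivation and all names here are illustrative choices for this sketch, not a construction specified in this paper.

```python
import hashlib

def target_paths(seed: bytes, step: int, n: int, m: int) -> list[int]:
    """Derive the m target paths of a time step from the shared seed.
    SHA-256 in counter mode stands in for a quantum-safe PRG; the sender
    and receiver run the same derivation, so hopping needs no messages."""
    candidates = list(range(n))
    chosen = []
    ctr = 0
    while len(chosen) < m:
        block = seed + step.to_bytes(8, "big") + ctr.to_bytes(8, "big")
        idx = int.from_bytes(hashlib.sha256(block).digest(), "big") % len(candidates)
        chosen.append(candidates.pop(idx))  # sample paths without replacement
        ctr += 1
    return sorted(chosen)

seed = b"preshared-random-seed"
# Both ends compute identical target sets for every time step.
assert target_paths(seed, 7, n=20, m=4) == target_paths(seed, 7, n=20, m=4)
```

Because the derivation depends only on the seed and the step counter, both ends stay synchronized even over lossy paths.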

Related Work.
Breaking information into shares to provide confidentiality and reliability has been used in many cryptographic systems such as secret sharing [14] and information dispersal [15], in the information-theoretic as well as the computational setting [16]. These algorithms have been used in distributed storage systems [17] and are the building blocks of Secure Message Transmission [18] and network coding [19], which use multiple paths between the sender and receiver to provide security and reliability.
Uncoordinated Frequency Hopping (UFH) [20] is similar to our work. In UFH the sender and receiver send and receive on two independently chosen subsets of frequencies, and the eavesdropper uses a third subset of frequencies for eavesdropping. The authors show that, assuming a public key infrastructure, one can communicate securely and reliably in this setting. The work of [21] uses a similar abstract model to construct information-theoretic protocols for secure communication without requiring a public key infrastructure. The communication rate of this latter construction, however, is very low. Our approach is coordinated path hopping, where the sender and receiver share an initial secret key that can be established using the scheme of [21]. We leave the analysis of the secret key requirement of our system, and in particular efficient ways of generating new keys at the required hopping rate, for future work.
Introducing diversity and dynamic properties has been widely used in security systems. System properties that can be diversified and randomized include program instructions [22,23], operating system distributions [24], and whole systems [25]. A comprehensive study of various methods is given in [26]. Using game theory to analyze attackers' strategies in dynamic systems has been studied in [27,28].
Organization. Section 2 recalls the MTD Markov chain framework. Section 3 presents our path hopping model. Security analysis and measures for our model are introduced in Section 4. Sections 5 and 6 present our simulation results for the G_{1,1} game. Sections 7 and 8 cover utility discussions and our concluding remarks.

Preliminaries
We recall the basic MTD Markov chain framework that is used in our work and review the construction and properties of (m, m)-secret sharing schemes.


MTD Markov Framework

The framework is due to Maleki et al. [13]. The system is defined by the interaction between a defender and an attacker. The defender and the attacker each have a set of possible actions, denoted by D = {∅, d_1, d_2, ...} and A = {∅, a_1, a_2, ...}, respectively; in both sets, ∅ denotes no action. Time is divided into time steps. In each time step the system is in one of s + 1 possible states, labeled 0, 1, ..., s, and the defender and the attacker each get a turn to move; the state change probability is determined by their chosen actions and the results of those actions. A strategy of a player determines all actions taken by the player at all points of the game. Using a Markov model allows a player's strategy to depend only on the state that the system is in, independent of the history of how the system reached that state.

Definition 1. An M-MTD game is defined by an (s + 1) × (s + 1) transition matrix T which describes a Markov chain of state transitions that reflects both defender and attacker moves. Initially the game starts in state 0. At each time step the game transitions from its current state i to a new state j with probability T_{i,j}.

The state s is the winning state from the adversary's view (the defender losing the game). Initially the system is in state 0 (from both the attacker's and the defender's viewpoints). In each time step the defender takes an action according to a matrix T_D with probability α, the attacker takes an action according to a matrix T_A with probability β, and with probability 1 − α − β both remain without any action.

Definition 2. A (T_D, α, T_A, β)-MTD game is defined by (1) parameters α and β that satisfy 0 ≤ α + β < 1; the parameters represent the rates of the defender's and the attacker's play, respectively; (2) (s + 1) × (s + 1) transition matrices T_D and T_A; for i, j ∈ {0, 1, ..., s}, T_D(i, j) (or T_A(i, j)) represents the probability of transitioning from state i to state j when the defender (or the attacker) plays a move in state i.
Thus, in each time step a three-sided coin is tossed and, for each side, the corresponding action is realized; the resulting transition matrix is

T = α T_D + β T_A + (1 − α − β) I_{s+1}, (1)

where I_{s+1} is the (s + 1) × (s + 1) identity matrix.
A Markov chain T is irreducible if each state can be reached from any other state. A Markov chain is aperiodic if all states have period 1, where the period of state i is defined as gcd{k > 0 : Pr(X_k = i | X_0 = i) > 0}, and X_k is the random variable describing the state of the game after k steps. The two properties together guarantee the existence of a limiting stationary distribution π, where πT = π.
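For chains of the moderate size considered in this paper, the stationary distribution can be approximated numerically. The following sketch uses plain power iteration on an illustrative two-state chain; the chain and its numbers are examples for this sketch, not one of the path-hopping chains.

```python
def stationary(T, iters=10_000):
    """Approximate the stationary distribution pi, with pi T = pi, by
    repeatedly applying T to an initial distribution (power iteration);
    irreducibility and aperiodicity guarantee convergence."""
    s = len(T)
    pi = [1.0 / s] * s
    for _ in range(iters):
        pi = [sum(pi[i] * T[i][j] for i in range(s)) for j in range(s)]
    return pi

# A toy 2-state chain: stay with prob 0.9 in state 0, 0.5 in state 1.
T = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary(T)
# Solving pi T = pi by hand gives pi = (5/6, 1/6) for this chain.
assert abs(pi[0] - 5 / 6) < 1e-9 and abs(pi[1] - 1 / 6) < 1e-9
```

Power iteration suffices here because the chains in this paper have a small number of states (m + 1); any eigenvector or linear-algebra routine would work equally well.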
Secret Sharing. An (m, m)-secret sharing is a cryptographic primitive [14] that divides a secret s into m shares, each given to a party, satisfying two properties: (i) reconstructability, which means the shares of all parties can perfectly reconstruct the original secret, and (ii) perfect secrecy, which means that if a single share is missing, the secret remains perfectly uncertain. A secret sharing scheme provides two algorithms, for share generation and secret reconstruction. Let M = Z_q, where Z_q is the set of integers modulo q, denote the set of secrets, and assume that all secrets are equally likely (Pr(S = s) = 1/q for each s ∈ M). The share generation algorithm takes a message s ∈ M as input and generates m shares s_1, s_2, ..., s_m as follows. For i = 1, ..., m − 1, it randomly chooses an element r_i in Z_q and sets s_i = r_i; the last share is s_m = s − (r_1 + ⋅⋅⋅ + r_{m−1}) mod q. It is easy to see that the m shares recover the secret (by finding their sum modulo q), and even if m − 1 shares are known, the secret remains completely uncertain.
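The additive construction just described can be sketched in a few lines; the modulus Q below is an arbitrary public choice made for this illustration.

```python
import secrets

Q = 2**61 - 1  # an arbitrary public modulus; any q works for secrets in Z_q

def share(secret: int, m: int) -> list[int]:
    """Split `secret` in Z_Q into m additive shares: the first m - 1 shares
    are uniform in Z_Q and the last one makes all m shares sum to the secret."""
    r = [secrets.randbelow(Q) for _ in range(m - 1)]
    return r + [(secret - sum(r)) % Q]

def reconstruct(shares: list[int]) -> int:
    """All m shares together recover the secret as their sum modulo Q."""
    return sum(shares) % Q

msg = 123456789
assert reconstruct(share(msg, 5)) == msg
# Any m - 1 shares are jointly uniform, so with even one share missing the
# secret remains perfectly (information-theoretically) hidden.
```

This is exactly the property path hopping relies on: an eavesdropper who misses a single target path learns nothing about the message.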

The MTD Game of Path Hopping
We consider the setting described in Section 1.1: there is a message source that generates a stream of data that must be protected against an eavesdropper, and there are n communication paths that connect the sender to the receiver. To protect message transmission against an eavesdropper who can simultaneously eavesdrop on up to t_a paths (t_a < n), the sender does the following: (i) randomly chooses a subset of m < n available paths; (ii) uses an (m, m)-secret sharing to construct m shares of the message; and (iii) sends each share on one of the selected paths. The chosen paths are also called target paths. The receiver knows the paths that are used by the sender in each time step. If the adversary eavesdrops on only a subset of the target paths (and not all target paths), then, because of the perfect secrecy of the (m, m)-secret sharing scheme, the attacker remains completely uncertain about the message.
We assume that the attacker will not keep a path that is not carrying data. That is, because of the limitation on the number of paths that they can simultaneously eavesdrop, they prefer to release a path that is not used in the current time step and wait for the next time step to try again, noting that, due to their probabilistic strategy, there is a chance of trying the released path again in the next time step. To simplify our analysis, we first consider the case m = t_a. For the cases m > t_a or m < t_a, a similar analysis can be used; we omit the details because of space.
To protect against this adversary, in each time step the sender and receiver hop one or more of the target paths, noting that lacking access to even one of the target paths leaves the adversary completely uncertain. We will use the MTD game framework of Definition 2 and model the problem as a dynamic system (game) between two players: a system defender (or simply defender), which includes the sender and the receiver, and an attacker. The attacker wins the MTD game (in a given time step) if they find all m target paths.
3.1. G_{u,v} Games. In each time step, the defender can randomize a subset of u target paths. Similarly, the adversary can simultaneously probe v paths.
We first describe the Markov chain associated with the game, then derive the transition probabilities of G_{u,v}, and finally present a detailed analysis of G_{1,1}.

Markov Chain.
The sets of the defender's and the attacker's actions are D = {∅, d_1} and A = {∅, a_1}, respectively, where d_1 and a_1 are the defender's and the attacker's actions and ∅ is no action. Let S ⊂ [n] denote the set of current target paths and S_A denote the subset of target paths known to the adversary.
Defender's Move. The defender cannot determine with certainty whether a path is being eavesdropped. We thus consider a defender who, in all time steps, plays a memoryless strategy. That is, the defender plays (issues the move d_1) with probability α, irrespective of any learnt information about the attacker's state or their own history of actions. When the defender plays in state i, they choose a subset S_d of u of the current target paths S and replace the paths in S_d with a randomly selected subset of u of the (n − m) nontarget paths.
The chosen paths in S_d may belong to S_A (the attacker's known paths in state i) or be outside it.
Attacker's Move. The attacker is adaptive. In state i, the set S_A of target paths that is known to the adversary is of size i. The adversary randomly selects a subset of size v of the ([n] \ S_A) possible target paths, keeps the message-carrying paths, and releases the rest. For the adversary, all paths that are not among their i known target paths have the same probability of being a target path.
We assume that, in state i, as soon as the defender reallocates a target path that is in S_A, the attacker can detect the change (the path is no longer one of the m target paths). However, this will not affect the adversary's action in this state, simply because they know that those paths are no longer possible target paths.
No Move. The defender and the attacker are probabilistic, and in a given time step neither of them may issue a move.
If the attacker does not issue an action in a time step, they bear the risk of losing one of their known target paths during the next time step. This is because the defender plays a memoryless strategy and moves with probability α. This extra risk translates into a higher probability of not reaching the winning position of the game.
To reduce the probability of losing a target path while waiting, the attacker should act whenever possible and use the full available action rate of 1 − α. We refer to this attacker as a risk-taking attacker, as they focus on maximizing their winning chance. More frequent attacks, however, carry the risk of triggering an alarm in the defender's intrusion detection system (IDS), tightening security and reducing access to the system. Let θ be the threshold that is used by the defender's IDS to raise the threat level of the system. To avoid reduced access to the system, the attacker may prefer to keep their attack rate below θ. We refer to this attacker as a risk-averse attacker.
The defender plays memoryless with probability α, and so in each time step the attacker moves with probability

β = min{θ, 1 − α}. (2)

There is no move by either player in a time step with probability 1 − α − β. Thus the system transition matrix is

T = α T_D + β T_A + (1 − α − β) I_{m+1}. (3)

Equation (2) shows that, depending on the value of θ (the attack detection threshold of the defender), we have two cases:

C1: θ > 1 − α. In this case, from (2), we have β = 1 − α and

T = α T_D + (1 − α) T_A. (4)

C2: θ < 1 − α. In this case, from (2), we have β = θ and

T = α T_D + θ T_A + (1 − α − θ) I_{m+1}. (5)

We refer to C1 and C2 as the risk-taking and the risk-averse attacker, respectively.
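The case split above can be sketched directly: given per-move matrices T_D and T_A, the combined chain follows by mixing with β = min{θ, 1 − α}. The toy matrices below are illustrative placeholders, not matrices derived from the path-hopping game.

```python
def combined_matrix(TD, TA, alpha, theta):
    """Combine the per-move matrices into the system chain
    T = alpha*TD + beta*TA + (1 - alpha - beta)*I, beta = min(theta, 1 - alpha)."""
    beta = min(theta, 1 - alpha)  # attacker's effective action rate
    s = len(TD)
    I = [[float(i == j) for j in range(s)] for i in range(s)]
    return [[alpha * TD[i][j] + beta * TA[i][j] + (1 - alpha - beta) * I[i][j]
             for j in range(s)] for i in range(s)]

TD = [[1.0, 0.0], [1.0, 0.0]]  # toy matrices: defender resets to state 0,
TA = [[0.0, 1.0], [0.0, 1.0]]  # attacker jumps to state 1
# C1 (risk-taking): theta > 1 - alpha, so beta = 1 - alpha and idling vanishes.
T1 = combined_matrix(TD, TA, alpha=0.6, theta=0.9)
assert all(abs(sum(row) - 1) < 1e-12 for row in T1)
# C2 (risk-averse): theta < 1 - alpha, so beta = theta and the system idles
# with probability 1 - alpha - theta.
T2 = combined_matrix(TD, TA, alpha=0.6, theta=0.2)
assert abs(T2[0][0] - 0.8) < 1e-12  # 0.6*1 + 0.2*0 + 0.2*1
```

The same helper applies unchanged to the (m + 1)-state path-hopping matrices once T_D and T_A are built.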

Transition Probabilities of G_{u,v}.
In state i, the attacker knows i target paths in S_A. A state transition that starts from state i is, in general, the result of the combination of the defender's and the attacker's actions in the following time step. A defender's action reallocates target paths and (since the attacker only holds target paths) can change state i to a state j ≤ i. An attacker's action, however, can result in more target paths being captured and so change the state to some j ≥ i. The state stays at i if the defender's or the attacker's action leaves it unchanged, or if no move is made at all. In the following, we obtain the transition probabilities (starting from state i) for (i) the defender's move and (ii) the attacker's move, and combine them to obtain the transition probabilities of the chain. In the case of "no move" (which happens with probability 1 − α − β), the state of the game does not change.
Defender's Move in State i. The defender chooses a set S_d of u paths from the set S of current target paths and replaces them with a set S'_d of u paths chosen from the n − m candidate target paths [n] \ S.

Let S_{d,1} = S_d ∩ S_A be the intersection of S_d and the adversary's set of captured paths, and let |S_{d,1}| = k. We note that 0 ≤ k ≤ min{u, i}. Thus the state of the game after the defender's action will be i − k (because k target paths have been removed from S_A), and we have

P_D(i − k | i) = C(i, k) C(m − i, u − k) / C(m, u), (6)

where C(a, b) denotes the binomial coefficient. Note that, for k = 0, the state of the game does not change.

Attacker's Move in State i. The attacker selects a set of v paths from the n − i paths in [n] \ S_A, of which m − i are (unknown to the attacker) target paths. If ℓ of the selected paths are target paths, where 0 ≤ ℓ ≤ min{v, m − i}, the state of the game after the attacker's action will be i + ℓ, and we have

P_A(i + ℓ | i) = C(m − i, ℓ) C(n − m, v − ℓ) / C(n − i, v). (7)

For ℓ = 0, the state of the game does not change.
Transition Probability from State i to j. The transition probabilities from state i to j are calculated using (6) and (7), with max{0, i − u} ≤ j ≤ min{i + v, m}. We note that transitions with j < i occur only due to the defender's move, and transitions with j > i occur only due to the attacker's move. A transition is due to the defender's move, the attacker's move, or no move, with probabilities α, β, and 1 − α − β, respectively. Thus we have the following transition probabilities:

T(i, j) = α P_D(j | i), for j < i,
T(i, j) = β P_A(j | i), for j > i,
T(i, i) = α P_D(i | i) + β P_A(i | i) + (1 − α − β).

Here P_D(j | i) and P_A(j | i) are defined in (6) and (7), respectively.
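Under the sampling just described, both per-move distributions are hypergeometric. The following sketch computes them with math.comb and checks that each conditional distribution sums to 1; the parameter values are arbitrary examples.

```python
from math import comb

def p_defender(i, k, m, u):
    """Probability that the defender's u hopped paths include exactly k of
    the i paths known to the attacker (state i -> i - k): a hypergeometric
    draw of u paths from the m target paths in S."""
    return comb(i, k) * comb(m - i, u - k) / comb(m, u)

def p_attacker(i, l, n, m, v):
    """Probability that the attacker's v probes (drawn from the n - i paths
    outside S_A) hit exactly l of the m - i unknown target paths
    (state i -> i + l)."""
    return comb(m - i, l) * comb(n - m, v - l) / comb(n - i, v)

# Sanity check: each conditional distribution sums to 1.
n, m, u, v, i = 10, 4, 2, 2, 2
assert abs(sum(p_defender(i, k, m, u) for k in range(min(u, i) + 1)) - 1) < 1e-12
assert abs(sum(p_attacker(i, l, n, m, v) for l in range(min(v, m - i) + 1)) - 1) < 1e-12
```

With these two helpers, the full (m + 1) × (m + 1) transition matrix of G_{u,v} can be filled in directly from the three cases above.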
The above probabilities show that, in each time step, a state can change to up to u + v other states or stay the same.

System Parameters.
The Markov chain that models the system is determined by the parameters n, m, α, and β. In the following we define security measures for the system and prove theorems that relate these measures to the system parameters.

Security Analysis
We use two security measures related to the success criteria of the attack.

Expected Number of Compromises.
Consider the system over a period of N time steps, starting from state 0. Within these N time steps, the expected number of times that the system is in the compromised state, that is, the attacker is able to learn the message, is an important security measure. (Note that one can use coding strategies [15] to spread information over longer sequences, and so estimating the expected number of compromises provides the required parameter for encoding.)

Theorem 3. For an MTD game of path hopping with transition matrix T and stationary distribution π = (π(0), π(1), ..., π(m)), where m is the winning state, E_N, the expected number of times the adversary wins in the first N time steps, is less than or equal to N ⋅ π(m), assuming that the game starts with the π_0 = (1, 0, ..., 0) distribution.
Proof. The game starts at π_0 = (1, 0, ..., 0). Let E_N denote the expected number of times the attacker wins in the first N time steps.
We first assume the attacker's starting position is chosen according to the stationary distribution π = (π(0), ..., π(m)). Our goal is to find the expected number of times the attacker wins in N steps, starting with this distribution.
Let X_j, j = 1, ..., N, be an indicator variable that takes the value 1 if the attacker wins in time step j and zero otherwise. Note that, starting from the stationary distribution π, the distribution of the attacker's position in the next step is πT = π, and so each X_j has the identical distribution Pr(X_j = 1) = π(m).
The random variable X = ∑_{j=1}^{N} X_j is the number of times that the attacker wins in N time steps. Noting the linearity of the expectation function E(⋅), that is, E(X + Y) = E(X) + E(Y), we have E(X) = ∑_{j=1}^{N} E(X_j) = N ⋅ π(m). The adversary has zero chance of winning in the first m/v time steps if they start with the π_0 distribution, and so E_N ≤ N ⋅ π(m). The last step of the argument assumes that, starting from the initial distribution (1, 0, ..., 0), Pr(X_j = 1) increases monotonically in each step of the chain until it reaches π(m); in fact, the weaker assumption Pr(X_j = 1) ≤ π(m) for all j would suffice for this last step.
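For any concrete chain, the bound of Theorem 3 can be checked numerically by accumulating the probability of occupying the winning state over N steps. The three-state chain below is an illustrative example, not one of the path-hopping chains.

```python
def step(dist, T):
    """One step of the chain: multiply the row distribution by T."""
    s = len(T)
    return [sum(dist[i] * T[i][j] for i in range(s)) for j in range(s)]

def expected_wins(T, N):
    """E_N = sum over steps j = 1..N of Pr[winning state at step j],
    starting from the distribution (1, 0, ..., 0)."""
    dist, total = [1.0] + [0.0] * (len(T) - 1), 0.0
    for _ in range(N):
        dist = step(dist, T)
        total += dist[-1]
    return total

# Toy 3-state chain (last state = winning state).
T = [[0.7, 0.3, 0.0],
     [0.4, 0.3, 0.3],
     [0.5, 0.0, 0.5]]
pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(5000):  # power-iterate to the stationary distribution
    pi = step(pi, T)
N = 1000
assert expected_wins(T, N) <= N * pi[-1] + 1e-9  # Theorem 3's bound holds
```

The slack in the bound comes from the early steps, during which the mass has not yet flowed from state 0 into the winning state.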
In our numerical computations we use π(m) to represent this security measure.

Expected Number of Steps to the First Win.
Our second security metric is the expected number of steps to the first compromise. This is an important measure for the defender, to estimate the unbreakability of the system, and for the attacker, to estimate the work (in terms of the number of time steps, which can be translated into the attacker's cost) needed to break the system. This measure can be calculated by solving a set of linear equations.

Theorem 4. Consider an MTD game with transition matrix T. Let v_i denote the expected number of time steps to reach state m (the winning state) for the first time, if the game has started in state i. We have

(I − M) k = r,

where k(i) = v_i and r(i) = 1 for all i ∈ {0, ..., m − 1}, and M is the same matrix as T with the last (mth) row and column removed.
Proof. We consider an attacker that starts in state 0. Let E^(1)_win denote the expected number of time steps to compromise the system, and let v_i denote the expected number of time steps to reach state m (the winning state) for the first time, if the game starts in state i. In both games C1 and C2, starting from state 0, the next state will be state 0 or 1, and so we can write

v_0 = 1 + T(0, 0) v_0 + T(0, 1) v_1.

That is, the expected number of time steps to reach state m from state 0 is one (time step) more than the weighted average of the expected number of time steps to reach state m from state 0, if the next move was to state 0, and from state 1, if the next move was to state 1. We can write similar equations for all states except state m, for which v_m = 0. Let k and r be column vectors such that k(i) = v_i and r(i) = 1 for all i ∈ {0, ..., m − 1}. The set of equations can be written as

(I − M) k = r,

where M is T with the mth row and column removed. This linear equation can be solved for k for any given T, and the first element of k is our desired E^(1)_win security metric. In Section 6 we will present graphs of E^(1)_win for various game settings.
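The linear system of Theorem 4 can be solved with any standard method; the sketch below uses Gaussian elimination and checks the result on a chain whose answer is known in closed form (a state that enters the winning state with probability p per step has expected first-passage time 1/p).

```python
def first_passage(T):
    """Solve (I - M) k = r, r = (1, ..., 1), for the expected number of
    steps to first reach the last state, where M is T with its last row
    and column removed (Theorem 4)."""
    s = len(T) - 1
    A = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(s)] for i in range(s)]
    b = [1.0] * s
    for c in range(s):  # Gaussian elimination with partial pivoting
        p = max(range(c, s), key=lambda r: abs(A[r][c]))
        A[c], A[p], b[c], b[p] = A[p], A[c], b[p], b[c]
        for r in range(c + 1, s):
            f = A[r][c] / A[c][c]
            b[r] -= f * b[c]
            for j in range(c, s):
                A[r][j] -= f * A[c][j]
    k = [0.0] * s
    for r in range(s - 1, -1, -1):  # back substitution
        k[r] = (b[r] - sum(A[r][j] * k[j] for j in range(r + 1, s))) / A[r][r]
    return k  # k[0] is E^(1)_win from the starting state 0

# Closed-form check: win with probability p in every step => E = 1/p.
p = 0.25
T = [[1 - p, p], [0.0, 1.0]]
assert abs(first_passage(T)[0] - 1 / p) < 1e-9
```

For the (m + 1)-state path-hopping chains the same routine applies directly, since m is small.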

The 𝐺 1,1 Game
To better understand the relationship between the parameter values (n, m, θ, α), in the following we focus on G_{1,1}, in which the defender swaps one path and the attacker probes one path per move. That is, in state i, the defender moves by randomly choosing one of the m paths in S and swapping it with a randomly chosen path from the n − m paths in [n] \ S. If the random choice from S is one of the i paths in E_i, the set of paths captured by (known to) the adversary, the adversary loses one of their i captured paths and the state moves to state i − 1; this happens with probability i/m. Otherwise, if the defender's selected path is not in E_i, the system stays in the same state i, which has probability 1 − i/m.
The attacker will randomly choose one of the n − i paths in [n] \ E_i. The new path will be a new target path with probability (m − i)/(n − i), and so the system will move to state i + 1 with this probability. On the other hand, with probability 1 − (m − i)/(n − i), the selected path will not be a target path, and with this probability the system will remain in state i.
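Combining the defender's and the attacker's per-move probabilities with move rates gives the full chain. The following pure-Python sketch is our own illustration (parameter names are ours; theta and alpha are the defender's and attacker's move probabilities of case C2 below); it builds the (m + 1) × (m + 1) transition matrix of G_{1,1}:

```python
def transition_matrix(n, m, theta, alpha):
    """States 0..m count the target paths the attacker currently holds.
    Assumes alpha <= 1 - theta so every row is a probability vector."""
    P = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        down = theta * i / m                # defender swaps out a captured target path
        up = alpha * (m - i) / (n - i)      # attacker probes a fresh target path
        if i > 0:
            P[i][i - 1] = down
        if i < m:
            P[i][i + 1] = up
        P[i][i] = 1.0 - down - up           # no effective state change
    return P
```

Each row sums to one, and the resulting chain is the birth-death chain analysed in the remainder of this section.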

Case C1: Risk-Taking Adversary. The state transition probabilities are given by (4). Figure 1(a) shows the state transition probabilities due to the attacker's and the defender's actions; the transition probabilities on the upper part of the figure are due to the attacker's action. Figure 1(b) shows the combined transition probabilities.
It is easy to see that the Markov chain is irreducible and aperiodic. Using this matrix we can find the stationary probability distribution of the system, denoted by π = (π(0) ⋅ ⋅ ⋅ π(m)).

Risk-Averse Adversary: Case C2. The state transition matrix is given by (5). At each time step, (i) the defender moves (randomizes) with probability θ, (ii) the attacker moves with probability α, and (iii) no move happens with probability 1 − θ − α.
The adversary knows the state and moves with probability α. The state transition matrix P can be obtained from Figure 2(b) and is given by P = [p_{i,j}] = [Pr(j | i)]. Using this matrix we can find the stationary probabilities π = (π(0) ⋅ ⋅ ⋅ π(m)). Again, the Markov chain is irreducible and aperiodic, and a limiting stationary distribution always exists.
As π = πP, for the case of j = i + 1, and assuming that the corresponding equality holds for the case j = i, we obtain the cut (detailed-balance) equations

π(i) p_{i,i+1} = π(i + 1) p_{i+1,i}, for 0 ≤ i ≤ m − 1,

and hence

π(i + 1) = π(i) · p_{i,i+1}/p_{i+1,i} = π(i) · αm(m − i)/(θ(i + 1)(n − i)).

Recall that ∑_i π(i) = 1; thus π(0) ≤ 1. Combining the last two equalities will yield

π(m) = π(0) · ∏_{i=0}^{m−1} αm(m − i)/(θ(i + 1)(n − i)),

or equivalently

π(m) ≤ (αm/θ)^m · (n − m)!/n!.

Theorem 6. Let P be the (m + 1) × (m + 1) transition matrix of the Markov chain of the G_{1,1} game defined in (17), and let π = (π(0) ⋅ ⋅ ⋅ π(m)) be the stationary probability distribution of P. Then the following inequality holds:

π(m) ≤ (αm/θ)^m · (n − m)!/n!.

Proof. The Markov chains of G_{1,1} games satisfy Lemma 5. Therefore, using (17), we have

π(m) = π(0) · ∏_{i=0}^{m−1} p_{i,i+1}/p_{i+1,i} ≤ ∏_{i=0}^{m−1} αm(m − i)/(θ(i + 1)(n − i)) = (αm/θ)^m · (n − m)!/n!.

The first equality is due to Lemma 5, the inequality uses π(0) ≤ 1, and the last step follows from the transition probabilities of G_{1,1}. Note that for the case of a risk-taking adversary, α = 1 − θ.
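The detailed-balance recursion and the resulting bound are easy to verify numerically. The sketch below is our own pure-Python illustration (parameter names ours); it computes π from the recursion, confirms πP = π against the rebuilt chain, and checks π(m) against the product bound, which holds because π(0) ≤ 1.

```python
from math import prod

def stationary(n, m, theta, alpha):
    # Detailed balance: pi(i+1)/pi(i) = p_{i,i+1}/p_{i+1,i}
    w = [1.0]
    for i in range(m):
        up = alpha * (m - i) / (n - i)      # p_{i,i+1}
        down = theta * (i + 1) / m          # p_{i+1,i}
        w.append(w[-1] * up / down)
    s = sum(w)
    return [x / s for x in w]

def pi_m_bound(n, m, theta, alpha):
    # pi(m) = pi(0) * prod(...) <= prod(...), since pi(0) <= 1
    return prod(alpha * m * (m - i) / (theta * (i + 1) * (n - i)) for i in range(m))

def is_stationary(pi, n, m, theta, alpha, tol=1e-12):
    """Check pi P = pi by rebuilding the birth-death transition probabilities."""
    for j in range(m + 1):
        acc = 0.0
        for i in range(m + 1):
            down = theta * i / m
            up = alpha * (m - i) / (n - i)
            if j == i - 1:
                acc += pi[i] * down
            elif j == i + 1:
                acc += pi[i] * up
            elif j == i:
                acc += pi[i] * (1.0 - down - up)
        if abs(acc - pi[j]) > tol:
            return False
    return True
```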

Numerical Results for 𝐺 1,1
To study the effect of different system parameters on system security, we calculated the security measures of the two MTD games for different values of the system parameters and graphed the results. We used the results of Section 4 to calculate π(m) and E^(1)_win for different choices of n, m, θ, and α. We used MATLAB for our calculations. For each set of system parameters n, m, θ, and α, using (6)–(14), we first obtained the transition matrix P. For C1, θ is varied from 1% to 99% in steps of 1%. For C2, α is varied from 1% to 1 − θ in steps of 1%.
To calculate E^(1)_win, we used the results of Theorem 4 and employed the linear equation solver (linsolve) of MATLAB to solve the set of equations k = r + Mk for each parameter set. For large values of m, the coefficient matrix becomes near-singular and an exact solution cannot be found (in fact, for large values of n and large values of m, a MATLAB warning occurred because the equation was near-singular, and thus only an approximation of the answer was calculated; we have excluded those cases from our analysis, and the graphs include only exact results). This explains the choice of m < 10 in our graphs. The choice of n determines the computation time but is otherwise not restricted. We chose values of n and m such that the cases m < n/2 and m > n/2 are both shown in the graphs.
To calculate π(m), we used the MATLAB eigenvector analysis function ([V,D]=eig(M)) to find the stationary distribution π, where πP = π. The stationary distribution is obtained by normalizing the eigenvector that corresponds to the eigenvalue 1. We could use n up to 1000 and m up to 200 in this analysis. For the choice of θ we ensured that θ < 0.5, θ = 0.5, and θ > 0.5 are represented. The results of the above calculations are graphed.
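A MATLAB-free alternative to the eigenvector computation is power iteration: because the chain is irreducible and aperiodic, iterating π ← πP from any starting distribution converges to the stationary distribution. A minimal pure-Python sketch (our own illustration; the parameter names and the example values n = 10, m = 3, θ = 0.5, α = 0.3 are ours):

```python
def stationary_power(P, iters=100000, tol=1e-13):
    """Iterate pi <- pi P until convergence (irreducible, aperiodic chain)."""
    pi = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(iters):
        new = [sum(pi[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Example: the G_{1,1} risk-averse chain with n=10, m=3, theta=0.5, alpha=0.3
n, m, theta, alpha = 10, 3, 0.5, 0.3
P = [[0.0] * (m + 1) for _ in range(m + 1)]
for i in range(m + 1):
    down, up = theta * i / m, alpha * (m - i) / (n - i)
    if i > 0:
        P[i][i - 1] = down
    if i < m:
        P[i][i + 1] = up
    P[i][i] = 1.0 - down - up
pi = stationary_power(P)
```

The returned vector is already normalized (each iterate of a stochastic matrix preserves the total mass of the distribution).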
6.1. Case C1. The results of the numerical computations for different settings are depicted in Figures 3, 4, and 5.
Stationary Probability. For fixed n and m, as θ increases, π(m) decreases. For fixed θ and m, when n increases, π(m) decreases. Both imply better security, as expected from more dynamic systems (Figure 3).
Figure 5(b) shows that, for fixed n and θ, an increase in m results in a reduction of π(m) and an increase in E^(1)_win, and so better security. However, this gain in security diminishes after m reaches a threshold; in this last case almost all paths are target paths, and so the attacker's chance of guessing correctly is high. The thresholding behaviour suggests that using a higher m (and so a higher system cost) will not have a substantial effect on security.
Figure 5: Results for the security measures π(m) and E^(1)_win as functions of m for n = 20 and different values of θ. Note that for large values of θ the system can remain secure even if m = n − 1, as E^(1)_win remains very large (a) or π(m) remains very close to zero (b).
The figures also show that, for fixed m, as n increases, π(m) decreases. For example, for n = 30 and m = 3, 5, and 7, for all θ ≥ 0.5 we have π(m) ≤ 10^−3 (see Figure 3).

Expected Number of Steps to the First Win E^(1)_win. Figure 4 shows that, for given parameters n and m, increasing θ increases the security of the game. Moreover, we observe that E^(1)_win grows linearly in the log-scale graphs of Figure 4 as θ increases, and it grows faster for larger m. Therefore, with increasing θ, the risk-taking attacker needs a much longer time to compromise the system.

We also observe that, for given n and θ, as m increases beyond a certain threshold value, π(m) increases. For example, for n = 20, m = n − 1 = 19, and θ = 0.2, this threshold is π(m) = 0.57 (see Figure 5(b)). For θ > 0.5, however, even for m = n − 1, π(m) remains small (very close to zero). The same behaviour exists for E^(1)_win: for given n and θ, the expected number of steps to the adversary's first win decreases as m increases. However, the decrease in security is negligible if θ is sufficiently large (see Figure 5(a)).

6.2. Case C2. The stationary distribution for given n, m, θ, and α is given in Figure 6. We consider three different values of θ: 0.3, 0.5, and 0.7.
We also show our results for E^(1)_win in Figure 7. The figures demonstrate that the defender achieves better security by choosing higher θ.
Figure 6 shows that, for fixed values of n, m, and α, the security of the system increases with the increase in θ, represented by the reduction in π(m). We also observe that, similar to case C1, for fixed n, θ, and α, we gain more security with increasing m. Moreover, as can be expected, smaller values of α let the defender reach higher security by increasing θ. Figure 7 shows our analytical calculation results for E^(1)_win. Again, for fixed values of n, m, and α, the security of the system increases with the increase in θ, and this is shown by the increase in E^(1)_win. It can also be seen that, for fixed values of n, θ, and α, more security can be gained with the increase in m. The figures also indicate that smaller values of α allow the defender to increase the security of the system by increasing θ.
As discussed in Section 7, we can calculate the cost of being a risk-averse adversary in terms of E^(1)_win. Figure 8 shows the behaviour of this cost (penalty) as a function of α < 1 − θ.

Utilities
There are costs and gains associated with the defender's and the attacker's actions. Below we outline important aspects of the utilities of the two players and note that our basic modelling and numerical analysis can provide insight into quantifying these utilities more precisely. More concrete analysis and estimation of utilities require considering specific realizations of path hopping systems and more detailed modelling and numerical computations.
The defender's cost in state i is denoted by C_{i,a_0} and C_{i,a_1} for no action and action a_1, respectively.
(i) C_{i,a_0} is the cost to the defender if they do not move. In this case, the defender does not bear any resource cost; however, the chance of the attacker winning in the next time step would increase, because the attacker may capture additional target paths in the next time step and the state will change to some j > i. The defender, however, does not know the exact state i of the system, and so their cost would be an expected value, where the expectation is taken over the stationary probabilities of the system.
(ii) C_{i,a_1} is the cost of randomization. This is a fixed cost (that could depend on the number of paths that are randomized in one time slot) associated with redirecting traffic, possible packet losses, and the delays of reestablishing the paths.
The attacker's costs in state i are denoted by A_{i,b_0} and A_{i,b_1} for no action and action b_1, respectively.
(iii) A_{i,b_0} is the cost of the attacker not taking action. They do not need to spend resources, but their success chance would reduce, because the defender's action in the next time step may result in one of the target paths in E_i being removed from the target path set. As discussed below, this cost can be estimated in terms of the increase in E^(1)_win. (iv) A_{i,b_1} is the cost of launching the attack in state i and indicates the resources that the attacker must spend to realize the attack. This cost is fixed as long as the attack rate is below 1 − θ and increases if the attack rate α is above 1 − θ.
One can also consider gains associated with the actions of the attacker and the defender. The utilities of the players will be a function of the costs and the gains.
Estimating A_{i,b_0}. A risk-averse adversary uses an attack rate below 1 − θ and so, with probability 1 − θ − α, there is no action from the attacker. Intuitively, no action means that the attacker has a reduced chance of breaking security. This can be quantified by the larger expected number of time steps to win for the first time. For example, consider n = 30, m = 7, and θ = 0.6. A risk-averse adversary who moves with probability α = 0.3 < 1 − 0.6 = 0.4 has E^(1)_win(α = 0.3, θ = 0.6) = 6 × 10^6; however, for the same values of n and m, a risk-taking attacker with α = 1 − θ = 0.4 has E^(1)_win(1 − θ, θ = 0.6) ≈ 1.0 × 10^6. This shows that a risk-averse adversary pays the penalty of being risk-averse and (most likely) must wait much longer for its first win than a risk-taking adversary. By defining the cost of being risk-averse as Δv = E^(1)_win(α, θ) − E^(1)_win(1 − θ, θ) for a risk-averse adversary with probability of moving α, we can graph the behaviour of this cost (penalty) as a function of α < 1 − θ.
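For the birth-death chain of G_{1,1}, E^(1)_win also admits a short recursion (the expected time t_i to go from state i to i + 1 satisfies t_i = (1 + p_{i,i−1} t_{i−1})/p_{i,i+1}), which makes the risk-averse penalty easy to compute. The sketch below is our own pure-Python illustration, consistent with Theorem 4 but not the paper's MATLAB code (parameter names ours):

```python
def expected_first_win(n, m, theta, alpha):
    """E^(1)_win: expected number of steps from state 0 to the winning state m."""
    t_prev, total = 0.0, 0.0
    for i in range(m):
        up = alpha * (m - i) / (n - i)      # p_{i,i+1}
        down = theta * i / m                # p_{i,i-1}
        t_i = (1.0 + down * t_prev) / up    # expected steps from state i to i+1
        total += t_i
        t_prev = t_i
    return total

def risk_averse_penalty(n, m, theta, alpha):
    """Extra expected waiting time for attacking at rate alpha < 1 - theta
    instead of the maximal (risk-taking) rate 1 - theta."""
    return (expected_first_win(n, m, theta, alpha)
            - expected_first_win(n, m, theta, 1.0 - theta))
```

The penalty is positive for every α < 1 − θ and shrinks as α approaches 1 − θ, matching the shape of Figure 8.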
Figure 8 shows that smaller α (being more risk-averse) incurs a higher cost, and as α increases, the cost decreases. However, as the defender increases θ, the attacker's available attack rate 1 − θ decreases; after a certain threshold value of α, the cost of being risk-averse decreases and becomes 0 when α = 1 − θ.

Concluding Remarks
We introduced path hopping as an approach to providing efficient long-term cryptographic security for communication against an adversary with access to a quantum computer. We considered a general class of dynamic strategies that can be modelled as a Markov chain capturing the attacker's and the defender's interaction, and gave a detailed analysis of G_{1,1}. Our work opens new directions for future research, including considering a more complex set of actions for the players: for example, allowing the defender and the attacker to choose the number of paths they swap and probe in each state, and/or to use different values of these parameters in different states. Including the actual costs of the defender and the attacker in the modelling and analysis will unravel the limits of randomization strategies in practice.

Figure 3: Numerical computation results for π(m), the winning probability in the stationary state, as a function of θ for different values of n and m for the case C1. For any fixed n and m, increasing θ increases the security of the system; that is, π(m) decreases.

Figure 4: Variation of E^(1)_win as a function of θ for different values of n and m for C1, using analytic calculations. Similar to π(m), for any fixed n and m, increasing θ improves the security of the system; that is, E^(1)_win increases.

Figure 6: Numerical computation results for π(m), the winning probability in the stationary state, as a function of α for different values of n and m for the case C2. For any fixed n, m, and α, increasing θ increases the security of the system; that is, π(m) decreases.

Figure 7: Variation of E^(1)_win as a function of α for different values of n and m for C2, using analytic calculations. Similar to π(m), for any fixed n, m, and α, increasing θ improves the security of the system; that is, E^(1)_win increases.

Figure 8: The cost of being risk-averse, in terms of E^(1)_win, as a function of α for different values of n, m, and θ.
Attacker's Move in State i. The attacker holds the captured target paths in E_i and knows the state of the game. The attacker will choose a set T_i of v paths from [n] \ E_i, the set of n − i available candidate target paths. Let T_{i,1} = T_i ∩ (S \ E_i) be the intersection of T_i and the defender's set of target paths S that are not captured yet, and let |T_{i,1}| = l. We have 0 ≤ l ≤ min{m − i, v}. With the l newly captured target paths, the state of the game will become i + l, and we have

Pr(i → i + l) = C(m − i, l) · C(n − m, v − l) / C(n − i, v),

where C(a, b) denotes the binomial coefficient.
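The distribution of l above is hypergeometric: the attacker draws v of the n − i unexplored paths, of which m − i are uncaptured targets and n − m are not. A quick sketch of the transition probability (our own illustration, using Python's math.comb):

```python
from math import comb

def capture_probability(n, m, i, v, l):
    """Pr[state i -> state i + l]: exactly l of the attacker's v fresh probes
    hit one of the m - i uncaptured target paths."""
    if not 0 <= l <= min(m - i, v):
        return 0.0
    return comb(m - i, l) * comb(n - m, v - l) / comb(n - i, v)
```

For v = 1 this reduces to the (m − i)/(n − i) step probability of the G_{1,1} game.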