A Novel Attack-and-Defense Signaling Game for Optimal Deceptive Defense Strategy Choice

Administrators (defenders) increasingly use deception-based defense strategies, such as honeypots, to improve IoT network security in response to attacks. In game theory, the signaling game is used to describe the confrontation between attack and defense. However, the traditional approach focuses only on the defender and ignores the analysis from the attacker's side. Moreover, the optimal deceptive defense strategy has not been sufficiently analyzed in models built on the signaling game. In this work, the signaling game model is extended to a novel two-way signaling game model that describes the game from the perspectives of both the defender and the attacker. First, the improved model is formally defined, and an algorithm is proposed for identifying the refined Bayesian equilibrium. Then, according to the calculated benefits, the optimal strategy choices for both the attacker and the defender are analyzed. Last, a simulation is conducted to evaluate the performance of the proposed model and to demonstrate that the defense strategy with deception is optimal for the defender.


Introduction
IoT networks and devices are highly vulnerable to sophisticated cyber-attacks. Despite the widespread deployment of security monitoring tools, which include firewalls and intrusion detection systems (IDSs), attackers can infiltrate target IoT devices by leveraging multiple attack vectors [1].
Recently, honeypot-enabled deceptive security mechanisms were introduced as an emerging proactive cyber defense strategy for confusing or misleading attackers and showed significant advantages over traditional security techniques [2]. For attackers, deceptive behaviors of defenders increase the uncertainty of the target to be compromised [3]. Attackers must spend additional resources (e.g., time and money) to deal with the uncertainty via reconnaissance and to develop situational awareness. In addition, deceptive behaviors prevent attackers from launching efficient custom attacks. For example, by collecting an attacker's information when he is compromising a target device that is disguised by honeypots, the defender can use the learned knowledge to enhance the IoT network security [4]. As a result, deception by providing seemingly convincing yet misleading information to deceive attackers has become a major defense mechanism. With the wide utilization of deception, the security status of organizations has been substantially improved. When attackers are following the seven phases of the cyber kill chain [5] in launching an attack, deception approaches can be performed effectively in disrupting each stage of the cyber kill chain, as illustrated in Figure 1.
The contributions of the paper are the following.
(1) A two-way signaling game model based on the signaling game is formally defined to describe the confrontation from the perspectives of both the defender and the attacker.
(2) With the two-way signaling game model, an algorithm is defined to identify the refined Bayesian equilibrium in the game.
(3) With the deception strategy introduced, the optimal strategy choices for both the attacker and the defender in the game are analyzed.

Related Works
In previous work [6], because the concept of deception was not clearly defined, deploying honeypots to detect an attacker and to obtain information on the attacker's intentions was the primary deception mode available to the defender. For instance, Rowe et al. [7] showed how to decrease the number of attacks to which a network is subjected by utilizing fake honeypots, namely, by disguising normal systems as honeypots. Garg and Grosu [8] used a honeynet system to characterize deception, where defenders may choose to conceal a regular host as a honeypot (or inversely) in response to the attackers' probe. Seamus et al. [9] created a honeypot that simulates a ZigBee gateway to assess the presence of ZigBee attack intelligence on an SSH attack vector in Wireless Sensor Networks (WSNs).
In recent years, as deception has become a powerful tool for protecting IoT networks and devices against attackers [10], game theory has been introduced into the field of cybersecurity to model the interaction between defender and attacker and to identify the optimal strategies for both players. Cohen [11] comprehensively discussed deception as a technique for protecting information systems and concluded that deception has a positive effect for the defenders and a negative effect for the attackers. Carroll and Grosu [12] modeled the way deception affects the attack-defense interactions based on a game in which the players (defenders and attackers) have incomplete knowledge of each other. Pawlick and Zhu [13] extended the signaling game by assuming that the adversary can obtain evidence of the true state of the system, and they concluded that the effectiveness of deceptive defenders sometimes increases if an adversary develops the ability to detect deception. Duan et al. [14] proposed an energy-aware trust derivation scheme using a game-theoretic approach to manage overhead while maintaining adequate security of WSNs. Fugate and Ferguson [15] discussed techniques for combining artificial intelligence algorithms with game theory models to estimate hidden states of the attacker, using feedback through payoffs to learn how to optimally defend the system using cyber deception. Additional works are listed in Table 1.
As discussed above, in contrast to the previous focus on the analysis of the defender, our work will describe the process from not only the perspective of the defender but also that of the attacker.
3. An Improved Signaling Game Model
3.1. Analysis of the Novel Attack-And-Defense Signaling Game.
According to [22][23][24], the information that is released by the defender actively or the information that is leaked via defensive behavior passively is an important decision-making basis for the attacker. Such information is referred to as the signal that is sent by the defender, and the defense signal can affect the behavior of the attacker by changing the benefits to both the attacker and the defender. Furthermore, we believe that the information that is released by the attacker and observed by the defender will also affect the defense decision and change the final attack-and-defense benefits. We construct an attack-and-defense behavior interaction model with incomplete information. According to signaling game theory, we analyze the dynamic game process and the signal mechanism from the perspectives of both attack and defense, and we investigate the influence of defense signals on the game equilibrium and strategy choice for both the attacker and the defender. We describe this process as a novel attack-and-defense signaling game that is defined as a two-way signaling game model, as illustrated in Figure 2.
The defender is defined as the leader of the signaling game, and the attacker is the follower when analyzing the forward signal transmission. The roles of the attacker and the defender will be exchanged when analyzing the reverse signal transmission. By constructing the attack-and-defense game process in both the forward and reverse directions, the influences of two examples on the defense strategy are analyzed: (1) in the forward phase, ① a defender mixes a defensive strategy with a (or no) deception strategy to deter, deceive, and induce the attacker and sends a defensive signal; ② the attacker forms an initial belief regarding the defender type by collecting reconnaissance information in advance and public information from the defender. The attack strategy is selected according to the calculation of the Bayesian posterior probability for the defender type; and ③ the defender selects the optimal defense strategy for implementing security defense. (2) In the reverse phase, ① the attacker sends an attack signal while attacking; ② the defender forms a belief regarding the attacker. Under the action of the attack signal, the defender calculates the Bayesian posterior probability of the attacker type and corrects the defense strategy accordingly; and ③ the attacker corrects the current optimal attack strategy.
For convenience, we analyze the forward signaling game process and the reverse process separately; however, logically, these two processes are conducted simultaneously. Therefore, the strategy choice that is made by the defender is simultaneously affected by these two processes.

Table 1: Additional related works.
Author | Focus of the study
Çeker et al. [16] | Used a similar game-theoretic approach that provides the option of disguising a real system as a honeypot (or vice versa) to mitigate denial of service (DoS) attacks
Hichem et al. [17] | Proposed a game-theoretic technique that activates anomaly detection only when a new attack's signature is expected to occur
Aaron et al. [18] | Introduced a novel game-theoretic model of deceptive interactions between a defender and a cyber-attacker in responses to network scans or reconnaissance, to increase the uncertainty of adversarial reconnaissance
Somdip [19] | Proposed a methodology in which game theory is used to model the activity of stakeholders in networks to detect anomalies such as collusion, using a supervised machine learning algorithm and algorithmic game theory
Pawlick and Zhu [20] | Investigated a model of signaling games in which the receiver can detect deception with a specified probability
Kun et al. [21] | Employed Nash equilibrium in a noncooperative game model and analyzed its efficiency in vehicular ad hoc networks

Figure 2: In a two-way signaling game model, the forward direction is defined as the defender sending a signal $m_d$ to the attacker, who will infer the type $\theta$ of the defender and choose the action $a$; the reverse direction is defined as the attacker sending a signal $m_a$ to the defender, who will infer the type $\eta$ of the attacker and choose the action $d$.

Wireless Communications and Mobile Computing
is the private information, which is determined by the defensive action that is taken; the type of attacker $\Theta_A = (\eta_j \mid j = 1, 2, \dots, n)$ is the private information of the attacker, which is determined by the attack action that is taken.
③ $M = (M_D, M_A)$ denotes the signal set for the defender and the attacker. $M_D = (m_d \mid d = 1, 2, \dots)$, $M_D \neq \emptyset$, denotes the signals that the defender selects and releases according to the set signal release mechanism. For ease of representation, the signal name is consistent with the defender type name. However, the defense signal and the defender type are not necessarily consistent, because the objective is to deceive and induce the attacker. Similarly, $M_A = (m_a \mid a = 1, 2, \dots)$, $M_A \neq \emptyset$, denotes the attack signal that is sent by the attacker, and the signal name is the same as the attacker type name.
④ $S = (D, A)$ denotes the strategy set for the defender and the attacker, where $D = \{d_g \mid g = 1, 2, \dots\}$ and $A = \{a_h \mid h = 1, 2, \dots\}$ denote the defense strategy and the attack strategy, respectively.
⑤ $P_A$ is the belief set of the attacker on the type of defender.
⑥ $P_A'$ is the posterior probability set of the attacker on the type of defender, where $P_A' = P_A'(\theta_i \mid m_d) = (\mu_1, \dots, \mu_n)$ denotes the posterior probability of the type of defender, which follows the Bayesian rule, after the attacker observes the defensive signal $m_d$.
⑦ $P_D$ is the belief set of the defender on the type of attacker.
⑧ $P_D'$ is the posterior probability set of the defender on the type of attacker, where $P_D' = P_D'(\eta_j \mid m_a) = (\delta_1, \dots, \delta_n)$ denotes the posterior probability of the type of attacker, which follows the Bayesian rule, after the defender observes the attack signal $m_a$.
⑨ $U = (U_D, U_A)$ denotes the expected utility set of the defender and the attacker, whose value is determined by the strategies that are chosen by all players. The corresponding utility functions will be discussed in the next section.
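As a minimal illustration of the Bayesian update in items ⑥ and ⑧, the following Python sketch computes the posterior over sender types after a signal is observed. The prior values, the mixed signaling probabilities, and the function name `posterior` are hypothetical and are not taken from the paper.

```python
def posterior(prior, signal_probs, signal):
    """Bayes' rule: P'(type | signal) is proportional to P(type) * P(signal | type)."""
    joint = {t: prior[t] * signal_probs[t].get(signal, 0.0) for t in prior}
    total = sum(joint.values())
    if total == 0:
        # The signal is never sent under this strategy, so Bayes' rule
        # places no restriction on the receiver's belief.
        return None
    return {t: p / total for t, p in joint.items()}


# Hypothetical prior over defender types (normal system, honeypot).
prior = {"theta_N": 0.5, "theta_H": 0.5}

# Hypothetical signaling strategy: a normal system sends the honeypot
# signal 40% of the time, and a honeypot always sends the honeypot signal.
signal_probs = {
    "theta_N": {"m_N": 0.6, "m_H": 0.4},
    "theta_H": {"m_N": 0.0, "m_H": 1.0},
}

post_H = posterior(prior, signal_probs, "m_H")
post_N = posterior(prior, signal_probs, "m_N")
```

With these assumed numbers, observing the honeypot signal raises the attacker's belief that the system is a honeypot from 0.5 to 5/7, while observing the normal signal reveals a normal system with certainty.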

Refined Bayesian Equilibrium Solution and the Optimal Defense Strategy Choice.
According to Definition 1, this section extends the refined Bayesian equilibrium to the two-way signaling game model based on the definition of the refined Bayesian equilibrium [25] and proposes a refined Bayesian equilibrium algorithm for the two-way signaling game. Instances in the forward direction and in the reverse direction for the two-way signaling game model were constructed to show the details.
Definition 2. The equilibrium in a two-way signaling game model for defense strategy choice with deception is a refined Bayesian equilibrium if the following requirements are satisfied:
(I) $a^*(m) \in \arg\max_a \sum_\theta P'(\theta \mid m)\, U_2(m, a, \theta)$.
(II) $m^*(\theta) \in \arg\max_m U_1(m, a^*(m), \theta)$.
(III) $P'(\theta \mid m)$ is the posterior probability that is calculated by the signal receiver according to the Bayesian rule based on the prior probability $P(\theta)$, signal $m$, and the signal sender's optimal strategy $m^*(\theta)$.
In (I), $a^*(m)$ denotes the optimal action that is adopted by the signal receiver after obtaining the posterior probability $P'(\theta \mid m)$ of the type to which the signal sender belongs; $U_2(m, a, \theta)$ denotes the utility function of the signal receiver, which is the expected utility function of the attacker $U_A(m_j, d_g, a_h, \theta_i)$ in the forward direction and the expected utility function of the defender $U_D(m_j, d_g, a_h, \theta_i)$ in the reverse direction; and $\theta \in \Theta = (\Theta_D, \Theta_A)$ denotes the type set for the defender and the attacker, where $\theta \in \Theta_D = (\theta_i \mid i = 1, 2, \dots, n)$ in the forward direction and $\theta \in \Theta_A = (\eta_j \mid j = 1, 2, \dots, n)$ in the reverse direction.
In (II), $m^*(\theta)$ denotes the optimal strategy that is selected by the signal sender after predicting the optimal action $a^*(m)$ of the signal receiver; $U_1(m, a^*(m), \theta)$ denotes the utility function of the signal sender, which is $U_D(m_j, d_g, a_h, \theta_i)$ in the forward direction and $U_A(m_j, d_g, a_h, \theta_i)$ in the reverse direction.
In (III), $P'(\theta \mid m)$ is the posterior probability that the signal receiver calculates via the Bayesian rule from the signal sent by the signal sender, which is $P_A'$ in the forward direction and $P_D'$ in the reverse direction.

Figure 4: In the signaling game tree $G_{DS}(R)$ for the reverse direction, nature assigns type $\eta_H$ with probability $p_A(\eta_H)$ and type $\eta_L$ with probability $p_A(\eta_L)$. The attacker can send either signal $m_A$ (signaling that the attacker is of type $\eta_H$) or $m_P$ (signaling that the attacker is of type $\eta_L$). The defender will revise her judgement on the type of the attacker by selecting {S N H, S N L} if observing signal $m_A$ and {D N H, D N L} if observing signal $m_P$ as the posterior probability for the type of the attacker $\{\eta_H, \eta_L\}$. $u_{ij}$ denotes eight outcomes, where each outcome results in the corresponding payoff.

Method of Refined Bayesian Equilibrium in the Two-Way Signaling Game Model.
The steps are as follows:
(1) Construct the posterior inference $P(\theta \mid m)$ of various information sets on the signaling game tree.
(2) Calculate the optimal strategy for the signal receiver according to the posterior inference. When observing the signal $m \in M$, the signal receiver will choose the optimal strategy $a^*(m)$ according to $P(\theta \mid m)$ for the type $\theta$ of the sender to maximize the expected utility $U_2$; namely, the signal receiver will identify his optimal strategy $a^*(m)$ by calculating $\max_a \sum_\theta P(\theta \mid m)\, U_2(m(\theta), a, \theta)$.
(3) Calculate the optimal strategy for the signal sender according to the posterior inference. The signal sender foresees that the signal receiver will select the optimal strategy based on observations of the signal that is released by him and chooses the strategy that maximizes the expected utility $U_1$; namely, the signal sender identifies his optimal strategy $m^*(\theta)$ based on the posterior inference by calculating $\max_m U_1(m, a^*(m), \theta)$.
(4) Calculate the refined Bayesian equilibrium. Calculate $P'(\theta)$ via the Bayesian rule according to $a^*(m)$ from (2), $m^*(\theta)$ from (3), and the belief $P$. If $P'(\theta)$ and $P(\theta \mid m)$ are not in conflict, then the refined Bayesian equilibrium solution is $EQ = (m^*(\theta), a^*(m), P'(\theta \mid m))$.

Algorithm 1: Optimal strategy choice algorithm description based on a two-way signaling game model.
    // Calculate the optimal strategy for attack and defence
    Bayesian($P_A'(\theta)$); // Calculate the posterior probability and apply the Bayesian rule for the defender
    Create($m^*(\theta)$, $a^*(m)$, $P_A'(\theta)$); // Construct the refined Bayesian equilibrium
    Sort($m^*(\theta)$); // descending
    Output($m^*(\theta)$); // output the optimal strategy for the defender
End
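Steps (1) to (4) can be sketched as a brute-force search over pure strategies in a small game with two sender types. This is not Algorithm 1 from the paper, only an illustration of the same equilibrium conditions. The payoff numbers, the action names `attack` and `retreat`, the function names, and the grid search over off-path beliefs are all assumptions made for this example.

```python
from itertools import product

EPS = 1e-9


def best_responses(belief, m, actions, U2):
    """Step (2): receiver actions that maximize expected utility under a belief."""
    value = {a: sum(p * U2[(t, m, a)] for t, p in belief.items()) for a in actions}
    top = max(value.values())
    return {a for a, v in value.items() if v >= top - EPS}


def pure_equilibria(types, signals, actions, prior, U1, U2, grid=101):
    """Enumerate pure-strategy equilibria (m*, a*, beliefs) of a two-type signaling game."""
    found = []
    for sent in product(signals, repeat=len(types)):
        m_star = dict(zip(types, sent))            # sender strategy m*(theta)
        for acts in product(actions, repeat=len(signals)):
            a_star = dict(zip(signals, acts))      # receiver strategy a*(m)
            # Step (3): no sender type may gain by switching to another signal.
            if any(U1[(t, m, a_star[m])] > U1[(t, m_star[t], a_star[m_star[t]])] + EPS
                   for t in types for m in signals):
                continue
            beliefs = {}
            for m in signals:
                senders = [t for t in types if m_star[t] == m]
                if senders:
                    # Steps (1) and (4): on-path beliefs follow Bayes' rule.
                    z = sum(prior[t] for t in senders)
                    candidates = [{t: prior[t] / z if t in senders else 0.0 for t in types}]
                else:
                    # Off-path signal: try beliefs on a grid (assumes exactly two types).
                    candidates = [{types[0]: k / (grid - 1), types[1]: 1 - k / (grid - 1)}
                                  for k in range(grid)]
                belief = next((b for b in candidates
                               if a_star[m] in best_responses(b, m, actions, U2)), None)
                if belief is None:
                    break                          # a*(m) is never a best response
                beliefs[m] = belief
            else:
                found.append((m_star, a_star, beliefs))
    return found


# Hypothetical forward-direction game: the defender is the sender.
types = ["theta_N", "theta_H"]
signals = ["m_N", "m_H"]
actions = ["attack", "retreat"]
prior = {"theta_N": 0.4, "theta_H": 0.6}
truthful = {"theta_N": "m_N", "theta_H": "m_H"}
U1, U2 = {}, {}
for t, m, a in product(types, signals, actions):
    attacked = a == "attack"
    defender_base = (-10 if t == "theta_N" else 1) if attacked else 0
    U1[(t, m, a)] = defender_base - (0 if m == truthful[t] else 2)   # 2 = deception cost
    U2[(t, m, a)] = (10 if t == "theta_N" else -10) if attacked else 0

equilibria = pure_equilibria(types, signals, actions, prior, U1, U2)
```

With these assumed payoffs, the search returns a single pooling equilibrium in which both defender types send `m_H` and the attacker retreats after `m_H`. The result depends entirely on the illustrative numbers and does not reproduce the paper's tables.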
The following two instances of the forward direction and the reverse direction of the signaling game demonstrate the process above. The defender type is denoted as $\Theta_D = (\theta_N, \theta_H) = (\text{Normal sys}, \text{Honeypot sys})$, and the signal corresponds to the defender type, namely, $M_D = (m_N, m_H) = (\text{NomlSysSig}, \text{HonSysSig})$. In addition, the defensive strategy set is $D = \{d_g \mid g = 1, 2, \dots\}$, and the utility function is $U_D(m_j, d_g, a_h, \theta_i)$; the attacker type is denoted as $\Theta_A = (\eta_H, \eta_L) = (\text{AdvAttacker}, \text{PrimAttacker})$, with the attack strategy $A = \{a_h \mid h = 1, 2, \dots\}$, and the utility function is $U_A(m_j, d_g, a_h, \theta_i)$.

Refined Bayesian Equilibrium Solution Method for the Forward Signaling Game.
A game of incomplete information can be transformed into a game of imperfect information by adding a hypothetical player, namely, nature (denoted by C here), and by conditioning the payoffs on nature's unknown moves. The nature player moves first by randomly choosing the defender type with the prior probability distribution over all defender types. In the forward direction, nature assigns type $\theta_N$ with probability $p_D(\theta_N)$ and type $\theta_H$ with probability $p_D(\theta_H)$.
Once the defender has learned her type, she decides what signal or message to send to the attacker. The signal provides indirect information for the attacker about the defender type. In our example, the defender can send either signal $m_N$ (signaling that the defender type is $\theta_N$) or $m_H$ (signaling that the defender type is $\theta_H$). The defender can send signal $m_H$ even if her real type is $\theta_N$, or send signal $m_N$ even if her real type is $\theta_H$. The attacker revises his judgement on the defender type and takes action {O P N, O P H} if observing signal $m_N$ and action {W P N, W P H} if observing the signal $m_H$, as the posterior probability for the defender type $\{\theta_N, \theta_H\}$. In the game tree, $a_{ij}$ indicates eight outcomes, each of which results in a corresponding payoff. The forward signaling game tree $G_{DS}(F)$ is presented in Figure 3.
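The structure of the forward game tree described above can be made concrete with a short enumeration. The sketch below lists the eight paths (type, signal, action) and the attacker's information sets; the prior values and the action labels `attack` and `withdraw` are placeholders chosen for illustration and do not come from the paper.

```python
from itertools import product

prior = {"theta_N": 0.5, "theta_H": 0.5}     # hypothetical move by nature
signals = ["m_N", "m_H"]
attacker_actions = ["attack", "withdraw"]    # placeholder action labels
truthful = {"theta_N": "m_N", "theta_H": "m_H"}

# Each leaf is one path: nature picks a type, the defender a signal,
# and the attacker an action.
leaves = list(product(prior, signals, attacker_actions))

# The attacker sees only the signal, so each information set groups
# both defender types that could have sent it.
information_sets = {m: [(t, m) for t in prior] for m in signals}

# Paths on which the defender's signal does not match her true type.
deceptive_leaves = [leaf for leaf in leaves if leaf[1] != truthful[leaf[0]]]
```

Two types, two signals, and two responses give the eight outcomes of the tree, and half of them involve a signal that misreports the defender's type.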
3.6. Refined Bayesian Equilibrium Solution Method for the Reverse Signaling Game.
Nature moves first by randomly choosing the attacker type with the prior probability distribution over the attacker types. The reverse signaling game tree $G_{DS}(R)$ is presented in detail in Figure 4.
According to the definition, $m^* = m_a^*(a^*, \eta) = m_a^*(\eta)$ indicates that attacker $\eta$ sends signal $m_a^*$ and chooses strategy $a^*(m_a^*)$ according to the signal, which is denoted as $m_a^*(\eta)$; $a^* = a^*(d_g, m_a) = a^*(m_a)$ indicates the defender's responding action $a^*(d_g, m_a)$, which is denoted as $a^*(m_a)$; $P' = P_D'(\eta \mid m_a) = P_D'$ indicates that the defender calculated $P_D'(\eta \mid m_a)$ as the posterior probability for the attacker type, which is denoted as $P_D'$; and the existence of a refined Bayesian equilibrium is denoted as $EQ = (m_a^*(\eta), a^*(m_a), P_D')$. Based on the two examples above and the algorithm in [26], the optimal strategy selection algorithm for the two-way signaling game model is presented as Algorithm 1.

Figure 5: Game scenario with deception, which considers two decision makers, namely, a defender and an attacker. The defender deploys a honeypot in the IoT network as either a system or a service host. In the specified scenario, the forward and reverse transmissions occur simultaneously. The sequences of moves, type sets, and action sets follow the modeling elements that were discussed in the previous section. The incomplete information comes from the attacker's uncertainty regarding the type of the system.

Simulation Environment.
To evaluate the proposed attack-and-defense signaling game model and the algorithm for optimal strategy selection, we construct the simulation environment illustrated in Figure 5.

4.2. Calculating the Utility.
According to Richard [27], common vulnerability [28], and the database of attack-and-defense behaviors from MIT [29], attack strategies that are composed of basic options are listed in Table 2.
Common defense strategies with deception that are composed of basic operations are described in Table 3.
To select the optimal strategy more rigorously and intuitively, the basic approach is to quantify the utilities of the strategies that are selected by the defender and the attacker. In this paper, we use the scheme that was proposed by Zhang and Li [30] to calculate the expected utility functions of the defenders and the attackers, given as equations (1) and (2). The notations that are used in equations (1) and (2) are described in Table 4.
For the defender type $\{\theta_N, \theta_H\}$, the defense strategy is assumed to be $D_1 = \{d_1, d_2\}$ or $D_2 = \{d_5, d_6\}$, and for the attacker type $\{\eta_H, \eta_L\}$, the attack strategy is $A_1 = \{a_1, a_2\}$ or $A_2 = \{a_4, a_6\}$. Based on historical data and experience, $C_a = \{C_{A_1}, C_{A_2}\} = \{590, 320\}$, $C_d = \{C_{D_1}, C_{D_2}\} = \{360, 285\}$, and $C_{ds} = \{C_{D_1}, C_{D_2}\} = \{20, 10\}$. All the utilities that are specified in Figures 6 and 7 were calculated via equations (1) and (2).

Table 4: Notation used in equations (1) and (2).
Notation | Description
System loss cost function | Takes the defensive strategy $d_g$ and the attack strategy $a_h$ as parameters and indicates the loss to the defender's system when it is compromised, namely, the benefit to the attacker of successfully compromising the system
$C_{ds}$: cost of deception signal | Cost of a signal using deception, namely, the cost that is incurred by the defender in sending a spoofing signal that does not match its type to deceive the attacker
The posterior inferences can be constructed on various sets of information. Via Algorithm 1, we obtain possible equilibria in the forward direction, as presented in Table 5.
To calculate the utility of the reverse-direction signaling game, we set $p_A(\eta_H) = 0.4$, $p_A(\eta_L) = 0.6$, S N H + S N L = 1, and D N H + D N L = 1. The posterior inferences that can be constructed on the two sets of information are $P_D'(\eta_H \mid m_A) = 0.46$ and $P_D'(\eta_L \mid m_P) = 0.65$. Via Algorithm 1, we obtain the possible equilibria in the reverse direction, which are presented in Table 6.
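One way to read the reported posteriors is to ask which mixed signaling strategy of the attacker would produce them under Bayes' rule. The sketch below is only a consistency check under the assumption that both posteriors stem from a single mixed strategy; the names `sigma_H_A` and `sigma_L_A` (the probability that each attacker type sends $m_A$) are hypothetical and do not appear in the paper.

```python
from fractions import Fraction as F

prior_H, prior_L = F(2, 5), F(3, 5)        # p_A(eta_H) = 0.4, p_A(eta_L) = 0.6
post_H_given_A = F(46, 100)                # P'_D(eta_H | m_A) = 0.46
post_L_given_P = F(65, 100)                # P'_D(eta_L | m_P) = 0.65

# Bayes' rule in odds form: likelihood ratio = posterior odds / prior odds.
k_A = (post_H_given_A / (1 - post_H_given_A)) / (prior_H / prior_L)  # sigma_H(m_A) / sigma_L(m_A)
k_P = (post_L_given_P / (1 - post_L_given_P)) / (prior_L / prior_H)  # sigma_L(m_P) / sigma_H(m_P)

# Each type sends m_A or m_P, so sigma(m_P) = 1 - sigma(m_A).
# Solving 1 - x = k_P * (1 - k_A * x) for x = sigma_L(m_A):
sigma_L_A = (k_P - 1) / (k_A * k_P - 1)
sigma_H_A = k_A * sigma_L_A

# Recompute the posterior from the implied strategy as a check.
check_H_given_A = prior_H * sigma_H_A / (prior_H * sigma_H_A + prior_L * sigma_L_A)
```

Under this assumption, the reported values correspond to the advanced attacker sending $m_A$ with probability 23/44 and the primary attacker sending it with probability 9/22.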
The proposed algorithm and the simulated game are compared with other approaches in Table 7. We have analyzed both directions of signal transmission in a dynamic incomplete information game, which is more in line with the actual attack-and-defense scenario, and the results can guide the defense decision more precisely.

Result Analysis.
By implementing the simulation above, we obtain the following results:
(1) In the forward-signaling game model, if $(P_A'(\theta_N \mid m_d), P_A'(\theta_H \mid m_d))$ and (S N H, D N H) do not conflict, the refined Bayesian equilibrium is a pooling equilibrium. Hence, the defender chooses a honey system and releases the honey system signal, which deceives the attacker, thereby influencing the attacker's judgement on the defender type and on the choice of attack strategy. Thus, the defender uses the signal to demonstrate a capability that exceeds the actual capability, thereby reducing the likelihood of suffering a loss.
(2) In the reverse-signaling game model, the attacker moves first. He can be of type $\eta_H$ and send signal $m_A$ (presenting himself as an advanced attacker) or $m_P$ (pretending to be the primary attacker). He can also be of type $\eta_L$ and send the signal $m_P$ (presenting himself as the primary attacker) or $m_A$ (pretending to be an advanced attacker). According to Table 6, the refined Bayesian equilibrium is realized when the advanced attacker pretends to be the primary attacker and the defender chooses strategy $D_1$ with the deception technique. The advanced attacker …
(3) From the perspective of utility for both the defender and the attacker in a two-way signaling game, regardless of whether the attacker's ability is low or high, the choice of the deception defense strategy would increase the payoff of the defender compared with the normal system without deception. The defense strategy with deception is the optimal strategy for the defender. Therefore, the defender would choose the deceptive strategy, namely, the normal system would be disguised as a honeypot.

Conclusions
We model the confrontation between a defender and an attacker by utilizing signaling game theory. We propose the concept of a two-way signaling game and an algorithm for identifying optimal defense strategies. Finally, we conduct a simulation analysis to evaluate the proposed approach by formulating the attack-and-defense confrontation under a two-way signal releasing mechanism and calculating the utilities for both sides. This paper mainly proposes a proactive defense mechanism that utilizes signal selection and release methods and does not consider other defense mechanisms. Our method has two main limitations: the expected utility functions used in equations (1) and (2) cannot be extended to multistage games, and the example shown in the simulation does not consider the synchronous effect between the attacker and the defender during the game. Both will be studied in future work. Nevertheless, the proposed two-way signaling game model provides a basis for subsequent research in IoT network security. For example, with the proposed method, the defender of an IoT network could infer the optimal strategy of the attacker and take action in advance, such as improving the protection level, to defend against attacks. In the future, we will integrate the analysis via mathematical description, implement the attack-and-defense model for multistage games, and explore the security defense decision-making method in IoT networks.

Data Availability
The data used to support the findings of this study are included within the article.