Stochastic Differential Game-Based Malware Propagation in Edge Computing-Based IoT

The Internet of Things (IoT) has played an important role in daily life since its emergence. IoT applications range from traditional devices to intelligent equipment. The great potential of IoT brings with it various kinds of security problems. In this paper, we study malware propagation under the dynamic interaction between attackers and defenders in edge computing-based IoT and propose an infinite-horizon stochastic differential game model to derive the optimal strategies for the attackers and defenders. To account for the effect of stochastic fluctuations in the edge network on malware propagation, we construct Itô stochastic differential equations that describe the propagation of the malware in edge computing-based IoT. Subsequently, we analyze the feedback Nash equilibrium solutions of the proposed game model, which can be regarded as the optimal strategies for the defenders and attackers. Finally, numerical simulations show the effectiveness of the proposed game model.


Introduction
Recently, physical devices and sensors have been connecting to the Internet at an unprecedented rate, which has led to the emergence of the Internet of Things (IoT). By deploying smart devices and sensors to collect and analyze physical data, the IoT can monitor and control the physical environment [1]. IoT has brought great convenience to daily life in the past few years. For example, it has been widely used in intelligent transportation, smart home appliances, smart healthcare, and other fields [2,3].
Since IoT devices typically have limited resources, it is common to forward the physical data to a cloud computing platform, which requires extra bandwidth and can cause data security problems. With the advance of IoT, edge computing has been introduced to address these issues [4][5][6]. Generally, edge computing provides powerful computing resources at the edge of the Internet, close to the IoT devices [7,8]. Edge computing relieves the pressure on bandwidth and overcomes the latency issue. However, the edge computing environment is an open ecosystem, and IoT devices with limited resources are more vulnerable to attack [9]. Moreover, the existing defense mechanisms based on cloud computing cannot be applied to edge computing because of the geographically dispersed nature of IoT devices. Thus, how to design effective defense mechanisms against attackers has become a serious problem that urgently needs to be solved. In this paper, we address the security problem of malware propagation using a stochastic differential game; within this framework, we seek an optimal defense strategy for IoT devices.
In edge computing-based IoT, attackers use their attack strategy to infect more IoT devices with malware for illegal gain, while defenders use their defense strategy to minimize the damage caused by infected IoT devices. Meanwhile, IoT devices join or exit the network randomly, which can affect the stability of the edge network. The dynamic interaction between the attackers and defenders drives the propagation of the malware, and the influence of network instability can be treated as a stochastic element. In this paper, we propose an infinite-horizon stochastic differential game model to study malware propagation among IoT devices under the dynamic interaction between attackers and defenders in edge computing-based IoT, taking the stochastic fluctuations in the network into account. The main contributions of our proposed scheme are as follows:
(1) We use an infinite-horizon stochastic differential game to model malware propagation under the dynamic interaction between the attackers and defenders in edge computing-based IoT.
(2) We use Itô stochastic differential equations to characterize the effect of stochastic fluctuations of the edge network on malware propagation.
(3) We discuss the feedback Nash equilibrium solutions of the proposed game model, which can be regarded as the optimal strategies for both the attackers and defenders.
This paper is organized as follows. Section 2 introduces related works. Section 3 formulates the security problem of the attackers and defenders as a stochastic differential game. The feedback Nash equilibrium solutions for our proposed game model are analyzed in Section 4. Numerical simulations are given in Section 5. Finally, we conclude this paper in Section 6.

Related Works
Malware propagation is one of the most fundamental security problems, and many studies have addressed it in the literature [10][11][12][13][14][15]. Malware propagation means that infected legitimate nodes, in addition to the attack nodes, are able to contaminate other noninfected legitimate nodes [16]. In edge computing-based IoT, edge users interact and share content through smart applications, which increases the probability of malware download.
Generally, there are two complementary classes of methods to defend against malware threats: detection-based methods and prevention-based methods. Tobias et al. [10] proposed a novel malware detection approach based on compression-based graph mining, in which characteristic behaviors were extracted from quantitative data flow graphs to improve the detection accuracy. TaeGuen et al. [11] discussed malware characteristics through feature vector generation methods and proposed a multimodal deep neural network malware detection model for Android applications. The advantage of this method is that it is more suitable for dynamic environments. Dehghantanha et al. [12] studied the malware detection problem in IoT using a deep learning-based method, which provided a new direction for further research. Because IoT devices have various vulnerabilities, Indre et al. [13] created a machine learning-based system that could detect and prevent malicious connections to enhance network security. Lan et al. [14] studied epidemic propagation in complex networks and proposed a dynamic prevention model with a time-varying community network. They considered the subnets of the network as communities and investigated the propagation process of the malware. Khouzani et al. [15] studied the propagation of malware among battery-constrained mobile devices, considering the fact that the malware can control the rate of killing the infectives and the scanning rate of the infectives. The maximum damage caused by the malware was quantified with optimal control theory, through which the network damage can be minimized by adjusting the relevant parameters.
Beyond defending successfully against malware threats, an effective defense scheme should also account for both the limited resources and the dynamic characteristics of the network. In recent years, game theory has been used to model the decision making between IoT devices and attackers [17,18]. Game theory provides a mathematical framework for problems in which different players compete with each other or pursue conflicting goals. Accordingly, an effective security scheme in edge computing-based IoT depends not only on successful defense strategies but also on the attackers' behavior.
Spyridopoulos et al. [19] proposed a game-based security model for the malware dissemination prevention problem and analyzed the optimal defense strategy with which the defender minimizes both the damage of the malware and the security cost. Quang et al. [20] modeled the problem of defending against attackers in IoT networks as a Bayesian game of incomplete information and showed that there was a threshold for the frequency of active attackers. Liao et al. [21] designed a zero-sum stochastic game to analyze the effect of malware in IoT and obtained the optimal defense strategy from the feedback Nash equilibrium solutions of the game model. Sedjelmaci et al. [22] presented a game-based detection technology for IoT devices, which can not only activate the anomaly detection technology but also balance the energy consumption. Kaur et al. [23] proposed a stochastic game net security model, which combined the advantages of game theory and stochastic Petri nets. Shen et al. [24] proposed a multistage privacy-preserved game model for malware detection in fog-cloud-based IoT networks. In [24], the optimal detection strategy was attained under the consideration of privacy leakage of IoT devices, and the proposed detection scheme overcame the problem of limited resources of IoT devices.
Nevertheless, none of the above studies considered the stochastic characteristics of the edge network. In this paper, we introduce an infinite-horizon stochastic differential game to analyze the malware propagation problem in edge computing-based IoT, in which the stochastic characteristics of edge networks are taken into account.

System Model
In this section, we use an infinite-horizon stochastic differential game to model malware propagation under the dynamic interaction between the attackers and defenders. An infinite-horizon stochastic differential game involves an m-dimensional vector-valued stochastic differential equation (1), which describes the evolution of the state, and n objective functions (2), one for each player. In these expressions, E_{t_0}{·} denotes the expectation operation taken at time t_0, the noise is driven by a Brownian motion, and the initial state x_0 is given [25].
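For reference, a generic game of this type is commonly written as follows. This is a sketch in the usual textbook notation rather than a reproduction of equations (1) and (2): the drift f, the running payoffs h^i, and the Brownian motion z are assumed symbols, while g denotes the diffusion coefficient whose product g g^T is the covariance matrix used in Theorem 1.

```latex
% State dynamics (assumed generic form)
dx(s) = f\big[x(s), u_1(s), \dots, u_n(s)\big]\,ds + g\big[x(s)\big]\,dz(s),
\qquad x(t_0) = x_0,

% Objective of player i, discounted at rate r
\max_{u_i}\; E_{t_0}\!\left\{ \int_{t_0}^{\infty}
h^{i}\big[x(s), u_1(s), \dots, u_n(s)\big]\, e^{-r(s - t_0)}\,ds \right\},
\qquad i = 1, \dots, n.
```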
We consider an edge computing-based IoT environment with N IoT devices. Figure 1 shows the architecture of edge computing-based IoT.
We first use the SEIRS model [26] to describe the spread of the malware in edge computing-based IoT. Following the SEIRS model, we divide the IoT devices into susceptible, exposed, infectious, and recovered classes. A device in the infectious state has been infected by the malware, whereas a susceptible device is prone to infection but has not yet been infected.
An exposed IoT device has been infected but is not yet infectious, and a device in the recovered state has become immune to malicious attacks. We use S(t), E(t), I(t), and R(t) to denote the number of susceptible, exposed, infectious, and recovered devices at time t ∈ [t_0, ∞), respectively.
Let β denote the rate of transmitting malware between a susceptible and an infectious IoT device, σ denote the rate of exposed IoT devices becoming infectious, γ denote the rate of infectious IoT devices becoming recovered, ξ denote the rate of recovered IoT devices becoming susceptible, u_0(t) denote the number of IoT devices moved from susceptible to exposed by the attacker strategy at time t ∈ [t_0, ∞), and u_1(t) denote the number of IoT devices moved from infectious to recovered by the defender strategy at time t ∈ [t_0, ∞). As shown in Figure 2, due to the dynamic interaction between the attackers and defenders, the spread of the malware in edge computing-based IoT can be described by the differential equations (3).
In edge computing-based IoT, the parameters β, σ, γ, and ξ may also fluctuate because of the effect of stochastic fluctuations of the edge network on the malware propagation. To characterize the fluctuation of these parameters, we work on a complete probability space (Ω, F, {F_t}_{t ≥ 0}, P) with a filtration {F_t}_{t ≥ 0} satisfying the usual conditions [27]. By the central limit theorem, the fluctuation of the parameters β, σ, γ, and ξ follows a normal distribution. Then, we may replace the parameters by β → β + δ_1 dB_1(t)/dt, σ → σ + δ_2 dB_2(t)/dt, γ → γ + δ_3 dB_3(t)/dt, and ξ → ξ + δ_4 dB_4(t)/dt, respectively, where B_i(t) is a standard Brownian motion defined on the complete probability space with B_i(t_0) = 0 and δ_i is a positive constant describing the intensity of the fluctuation for i = 1, 2, 3, 4.
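One way to write a controlled SEIRS system consistent with these definitions is sketched below. It assumes bilinear incidence βSI and assumes that u_0 and u_1 enter as direct flows from susceptible to exposed and from infectious to recovered; γ stands for the rate at which infectious devices recover. The second group shows what the substitution β → β + δ_1 dB_1(t)/dt (and likewise for the other three rates) produces after multiplying through by dt. Both groups are illustrative and may differ in detail from the paper's own equations.

```latex
% Deterministic sketch (assumed bilinear incidence)
\begin{aligned}
\frac{dS}{dt} &= -\beta S I - u_0 + \xi R, \\
\frac{dE}{dt} &= \beta S I + u_0 - \sigma E, \\
\frac{dI}{dt} &= \sigma E - \gamma I - u_1, \\
\frac{dR}{dt} &= \gamma I + u_1 - \xi R .
\end{aligned}

% Same system after perturbing each rate with white noise
\begin{aligned}
dS &= (-\beta S I - u_0 + \xi R)\,dt - \delta_1 S I\,dB_1 + \delta_4 R\,dB_4, \\
dE &= (\beta S I + u_0 - \sigma E)\,dt + \delta_1 S I\,dB_1 - \delta_2 E\,dB_2, \\
dI &= (\sigma E - \gamma I - u_1)\,dt + \delta_2 E\,dB_2 - \delta_3 I\,dB_3, \\
dR &= (\gamma I + u_1 - \xi R)\,dt + \delta_3 I\,dB_3 - \delta_4 R\,dB_4 .
\end{aligned}
```

In this sketch the four right-hand sides sum to zero, so S + E + I + R stays constant along every path.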
Thus, the differential equations (3) can be rewritten as Itô stochastic differential equations. As mentioned in Section 1, in edge computing-based IoT, attackers use the attack strategy to make malware infect more IoT devices for illegal gain, while defenders use the defense strategy to reduce the damage caused by infected IoT devices. More precisely, the attackers aim to maximize the number of infectious IoT devices and the number of IoT devices moved from susceptible to exposed while reducing the cost of the attack strategy; the defenders aim to minimize the number of infectious IoT devices and the number of IoT devices moved from susceptible to exposed while reducing the cost of the defense strategy. Inspired by Alpcan and Başar [28], the costs of the attack strategy and the defense strategy can be described as u_0(t)^2/(2c_0) and u_1(t)^2/(2c_1), respectively, where c_0 and c_1 are positive constants. The objective functions of the attackers and defenders are formulated accordingly, where a_0 and a_1 are positive constants, b_0 and b_1 are constants, and r is the discount factor. Note that a_0 represents the benefit of each infectious IoT device to the attackers, while a_1 represents the loss caused by each infectious IoT device to the defenders; c_0 relates the cost of the attack strategy to the number of IoT devices moved from susceptible to exposed by the attack strategy, while c_1 relates the cost of the defense strategy to the number of IoT devices moved from infectious to recovered by the defense strategy.
In summary, malware propagation under the dynamic interaction between the attackers and defenders can be formulated as the infinite-horizon stochastic differential game (6)-(7), in which each player optimizes its objective function subject to the stochastic dynamics.

Nash Equilibrium Solution
In this section, we discuss the feedback Nash equilibrium solutions for game 6-7 to obtain the optimal strategies for the defenders and attackers. Each participant is assumed to be rational, and the decision making of each participant depends on its own objective function in this game. The feedback Nash equilibrium solutions for game 1-2 can be characterized by the following theorem [25].
Theorem 1. An n-tuple of strategies {u_i* = ϕ_i*(x); i ∈ N} provides a feedback Nash equilibrium solution to game 1-2 if there exist continuously twice differentiable functions W_i(x): R^m → R, i ∈ N, satisfying a set of partial differential equations in which Ω[x(s)] = g[x(s)] g[x(s)]^T denotes the covariance matrix, with its elements indexed by row h and column k.
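In the standard statement of this result for discounted infinite-horizon games, the system of partial differential equations takes the form sketched below. The notation is assumed rather than taken from the paper: h^i is the running payoff of player i, f is the drift of the state equation, ∇W_i is the gradient of W_i, and Ω^{hk} is the element of Ω = g g^T in row h and column k.

```latex
r\,W_i(x) - \frac{1}{2}\sum_{h,k=1}^{m} \Omega^{hk}(x)\,
\frac{\partial^2 W_i(x)}{\partial x_h\,\partial x_k}
= \max_{u_i}\Big\{
h^{i}\big[x, \phi^*_1(x), \dots, u_i, \dots, \phi^*_n(x)\big]
+ \nabla W_i(x)^{T} f\big[x, \phi^*_1(x), \dots, u_i, \dots, \phi^*_n(x)\big]
\Big\}, \qquad i \in N .
```

Each equation fixes the other players at their equilibrium feedback strategies and maximizes only over the control of player i, which is what makes the resulting n-tuple a Nash equilibrium.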
To obtain the feedback Nash equilibrium solutions for game 6-7, we consider the alternative problem (13) subject to the stochastic dynamics (14). Invoking Theorem 1, we obtain two feedback strategies u_0* = ϕ_0*(x) and u_1* = ϕ_1*(x) constituting the feedback Nash equilibrium solutions for game 13-14, if there exist continuously twice differentiable functions W_i(x): R^4 → R, i = 0, 1, satisfying the corresponding set of partial differential equations.

□
According to the proof of Proposition 1, the feedback Nash equilibrium solutions for game 6-7 are given by equation (21). In other words, the optimal strategies for the defenders and attackers are derived. The optimal state for game 6-7 describes the propagation of the malware in edge computing-based IoT when both the attackers and defenders adopt their optimal strategies. It is obtained by substituting (21) into (7).
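To illustrate how feedback strategies of this kind arise, suppose (as an assumption; this is not equation (21)) that the running payoff of the attackers is a_0 I + b_0 u_0 - u_0^2/(2c_0), that the only term of the defenders' running payoff depending on u_1 is -u_1^2/(2c_1), and that u_0 moves devices from S to E while u_1 moves devices from I to R. Maximizing the right-hand side of each player's partial differential equation over that player's own control then gives first-order conditions of the following form:

```latex
% Attackers: derivative with respect to u_0 set to zero
b_0 - \frac{u_0}{c_0} - \frac{\partial W_0}{\partial S} + \frac{\partial W_0}{\partial E} = 0
\;\Longrightarrow\;
u_0^* = c_0\left( b_0 + \frac{\partial W_0}{\partial E} - \frac{\partial W_0}{\partial S} \right),

% Defenders: derivative with respect to u_1 set to zero
-\frac{u_1}{c_1} - \frac{\partial W_1}{\partial I} + \frac{\partial W_1}{\partial R} = 0
\;\Longrightarrow\;
u_1^* = c_1\left( \frac{\partial W_1}{\partial R} - \frac{\partial W_1}{\partial I} \right).
```

Under these assumptions each strategy scales linearly with its cost parameter c_i, and the quadratic cost makes each maximization problem concave, so the first-order condition identifies the maximizer.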

Numerical Simulations
In this section, we discuss the implementation of the stochastic game algorithm given in Algorithm 1 and analyze the proposed infinite-horizon stochastic differential game model by simulations. The algorithm is divided into two parts. One is the "feedback Nash equilibrium of defenders" part, which calculates the optimal defense strategies during the attacks. The other is the "feedback Nash equilibrium of attackers" part, which calculates the optimal attack strategies. The time complexity and the space complexity are both O(n), because the proposed algorithm is solved over a finite time horizon [0, T] for all the attackers and defenders and all the functions are invoked at each time step.
We assume that the number of IoT devices is N = 10000 and consider the time horizon to be T = 20 minutes. The rest of the related simulation parameters are shown in Table 1. Figure 3 shows the optimal trajectory x(t) over time t. It can be seen that the number of susceptible devices decreases rapidly over time, while the number of infected devices increases at the beginning and then gradually decreases to zero. The dynamic evolution of the number of exposed devices is similar to that of the infected devices. In addition, the number of recovered devices increases over time. This means that defenders can deploy their defense mechanisms in response to attackers, which is consistent with the practical network environment, where the infected devices are always recovered and the exposed devices always exist.
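A minimal Euler-Maruyama sketch of how such trajectories could be generated is given below. It is not the paper's Algorithm 1: the bilinear incidence term, the function name `simulate_seirs`, the proportional feedback strategies passed in as `phi0` and `phi1`, and every numeric value except N = 10000 and T = 20 are placeholders chosen for illustration, since the values of Table 1 and the strategies of equation (21) are not reproduced here.

```python
import math
import random


def simulate_seirs(x0, phi0, phi1, rates, deltas, T=20.0, dt=0.01, seed=0):
    """Euler-Maruyama path of an assumed controlled stochastic SEIRS model.

    x0     -- initial counts (S, E, I, R)
    phi0   -- hypothetical feedback attack strategy: state -> u0
    phi1   -- hypothetical feedback defense strategy: state -> u1
    rates  -- (beta, sigma, gamma, xi), placeholder mean rates
    deltas -- (d1, d2, d3, d4), placeholder noise intensities
    """
    rng = random.Random(seed)
    beta, sigma, gamma, xi = rates
    d1, d2, d3, d4 = deltas
    S, E, I, R = x0
    steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    path = [(0.0, S, E, I, R)]
    for k in range(1, steps + 1):
        state = (S, E, I, R)
        u0, u1 = phi0(state), phi1(state)
        # Independent Brownian increments, each distributed N(0, dt).
        dB1, dB2, dB3, dB4 = (rng.gauss(0.0, sqrt_dt) for _ in range(4))
        # Flow along each edge of the S -> E -> I -> R -> S cycle
        # (drift plus multiplicative noise). No clamping at zero is applied.
        s_to_e = (beta * S * I + u0) * dt + d1 * S * I * dB1
        e_to_i = sigma * E * dt + d2 * E * dB2
        i_to_r = (gamma * I + u1) * dt + d3 * I * dB3
        r_to_s = xi * R * dt + d4 * R * dB4
        S, E, I, R = (S - s_to_e + r_to_s, E + s_to_e - e_to_i,
                      I + e_to_i - i_to_r, R + i_to_r - r_to_s)
        path.append((k * dt, S, E, I, R))
    return path


if __name__ == "__main__":
    # Placeholder setup: N = 10000 devices, 100 of them initially infectious.
    path = simulate_seirs(
        x0=(9900.0, 0.0, 100.0, 0.0),
        phi0=lambda x: 0.01 * x[0],   # stand-in attack strategy
        phi1=lambda x: 0.1 * x[2],    # stand-in defense strategy
        rates=(5e-5, 0.3, 0.2, 0.05),
        deltas=(1e-6, 0.02, 0.02, 0.02),
    )
    t, S, E, I, R = path[-1]
    print(f"t={t:.2f}  S={S:.1f}  E={E:.1f}  I={I:.1f}  R={R:.1f}")
```

Writing the update as flows between compartments keeps the total number of devices constant along the path, and the loop performs a fixed amount of work per time step, so the running time grows linearly with the number of steps.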
Based on equation (21), we discuss the optimal strategies of the attackers and defenders in Figure 4. As shown in the results, the optimal defense strategy increases over time, while the optimal attack strategy decreases and then tends to be stable. Figure 5 shows the change of the optimal strategy of the attackers with a_0 and c_0, while Figure 6 shows the change of the optimal strategy of the defenders with a_1 and c_1. Here a_0 represents the benefit of each infectious IoT device to the attackers, while a_1 represents the loss caused by each infectious IoT device to the defenders; c_0 relates the cost of the attack strategy to the number of IoT devices moved from susceptible to exposed by the attack strategy, while c_1 relates the cost of the defense strategy to the number of IoT devices moved from infectious to recovered by the defense strategy. It can be seen that the level of the optimal strategy of the attackers increases with a_0 and c_0, while the level of the optimal strategy of the defenders increases with a_1 and c_1. As shown in the results, the level of the optimal strategy of the defenders grows faster.
Algorithm 1: The stochastic game algorithm for the defenders and attackers (the optimal strategies for the defenders and attackers).
The comparison of the proposed model with the existing model [23] is shown in Figure 7. It can be seen that the number of infected devices in both models increases rapidly over time and then decreases. The number of infected devices in the proposed model is lower than that in the comparative model, which means that the proposed security strategy is more effective and more suitable for the IoT environment.

Conclusions
In this paper, we have proposed an infinite-horizon stochastic differential game model to study the malware propagation problem under the dynamic interaction between the attackers and defenders in an edge computing-based IoT environment composed of N IoT devices and to maximize the profit for both the attackers and defenders. In terms of model construction, we assumed that the states of IoT devices were susceptible, exposed, infected, and recovered and considered the effect of the stochastic fluctuations of the network on the state of the IoT devices. By solving for the feedback Nash equilibrium solutions of our proposed game model, we obtained the optimal strategies for both the attackers and defenders. Based on the simulation results, it can be seen that the proposed model can curb malware propagation in edge computing-based IoT. In future work, we will apply this model to other resource-constrained environments.

Data Availability
The data used in this paper are given in Table 1.

Conflicts of Interest
The authors declare that they have no conflicts of interest.