A Virus Propagation Model and Optimal Control Strategy in the Point-to-Group Network to Information Security Investment

Epidemiological dynamics is a vital method for studying the spread of computer network viruses. In this paper, an optimal control measure is proposed based on the SEIR virus propagation model in point-to-group information networks. First, considering the need for antivirus measures in reality, an optimal control problem is introduced, and a controlled computer virus spread model in point-to-group information networks is established. Second, the optimal control measure is formulated by making a tradeoff between the control cost and the network loss caused by virus intrusion. Third, optimal control strategies are theoretically investigated using Pontryagin's maximum principle and the Hamiltonian function. Finally, through numerical simulations, effective measures for controlling virus spread in point-to-group information networks are proposed.


Introduction
Given that critical infrastructures and daily human activities are becoming more and more dependent on information networks, it is vital to understand the weaknesses of these networks and uncover potential risks that threaten their security. Network viruses are a main threat to cyber security. Observing the significant similarities between the propagation of biological epidemics and network viruses, Kephart and White carried out pioneering research that uses epidemic models to investigate the propagation characteristics of worms and targeted countermeasures [1][2][3]. After that, some biological epidemic models, such as susceptible-infected/susceptible-infected-susceptible (SI/SIS) and susceptible-infected-recovered (SIR), were adapted to capture the behavior of network virus propagation [4][5][6][7][8][9][10]. Based on these classical models, many extended models, which have more state transitions among nodes, have also been proposed to investigate the dynamical characteristics of internet worms [11][12][13][14][15]. These previous studies helped reveal the dynamical phenomena of internet worms and provided effective theoretical guidance for network managers to protect network security.
The optimal control theory has been widely applied to the control of the spread of epidemics [16][17][18][19][20]; however, to our knowledge, studies that combine the optimal control theory with models of network viruses are rare. Zhu et al. [21] established a delayed computer virus model with control and proved the existence and uniqueness of an optimal control strategy. Liu et al. [22] proposed a malware dynamical model that accounts for the impact of users' security awareness and, furthermore, developed effective prevention strategies with human interventions by optimal control theory. Yang et al. [23] proposed a novel conflicting opinion propagation model based on the differential game theory, which contributes to misinformation restraint and competitive viral marketing. These previous studies demonstrate that optimal control strategies are effective in preventing the diffusion of network viruses.
The existing models above suppose that the infection of network viruses is "point to point," i.e., an infected node infects one susceptible node at a time. In fact, there are situations in which an infected node infects a group of nodes in an information sharing network, such as the popular e-mail systems and instant messaging software. To investigate the effect of optimal control strategies on virus prevalence in the information sharing network, this paper addresses an epidemic model with optimal control measures, which is based on [24].
Our main contributions are as follows: (1) we present a virus propagation model with variable antivirus measures in the point-to-group information sharing network; (2) we construct the optimal control function that adjusts the antivirus strength; (3) we prove the existence of the optimal solution theoretically.
This paper is organized as follows. Section 2 formulates the model and the corresponding optimal control strategy. The optimal control strategy is studied in Section 3. Section 4 presents some numerical results. Finally, Section 5 concludes the paper.

Model
Formulation. An information sharing network is a complex communication network comprising a large number of nodes (computers, terminals, entities, etc.), in which a node that has received a piece of information can send it to other nodes in the network. We can represent the information sharing network by an undirected network, as shown in Figure 1.
Our model is based on the network virus epidemic model with point-to-group (P2G) information propagation proposed by Hua Yuan and Guoqing Chen [24]. This model assumes that the individuals in an information network system have four states: susceptible (S), exposed (E), infected (I), and recovered (R). At any time, a node is in exactly one of the four states, and the state of an individual can switch under the action of virus infection and antivirus measures. Consider the following two facts:
(1) Hosts undergo a latent period (E-state) during the transition from the S-state to the I-state because users do not open the link or e-mail immediately.
(2) Users may immunize their hosts with countermeasures in states S, E, and I.
These countermeasures can be seen as multistate antivirus strategies and may result in the following state transition paths:
S ⟶ R, using countermeasures of real-time immunization
E ⟶ R, cleaning viruses after hosts are infected
I ⟶ R, cleaning viruses after hosts are infected
As a result, the model is formulated as a system of ordinary differential equations, system (1), where μ is the replacement rate of old nodes, α is the transition rate from E to I, δ is the recovery rate from I to R, N is the total number of nodes in the network, ρ_SR describes the impact of implementing real-time immunization, ρ_ER describes the impact of cleaning the virus and immunizing the nodes in the latent period, v(t) = (αr/N)E(t) represents the transition rate from S to E, and r is the average number of neighbor nodes (in any state) that are directly connected with an infected node.
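The display form of system (1) is not reproduced in this text, so the following is only a minimal sketch of one right-hand side that is consistent with the transitions and parameters described above. The specific arrangement of terms (inflow μN, outflow μ from every compartment) is an assumption, as are the function name and the numerical values, which are hypothetical and chosen only for illustration.

```python
def p2g_seir_rhs(state, mu, alpha, delta, rho_sr, rho_er, r):
    """Right-hand side of an ASSUMED P2G SEIR system (not equation (1) verbatim)."""
    s, e, i, rec = state
    n = s + e + i + rec            # total number of nodes N
    v = alpha * r * e / n          # v(t) = (alpha * r / N) * E(t), as defined in the text
    ds = mu * n - v * s - rho_sr * s - mu * s            # S -> E and S -> R
    de = v * s - alpha * e - rho_er * e - mu * e         # E -> I and E -> R
    di = alpha * e - delta * i - mu * i                  # I -> R
    dr = rho_sr * s + rho_er * e + delta * i - mu * rec  # inflow from S, E, and I
    return (ds, de, di, dr)


# Hypothetical parameter values and initial state, for illustration only.
derivs = p2g_seir_rhs((990.0, 10.0, 0.0, 0.0), mu=0.01, alpha=0.3,
                      delta=0.2, rho_sr=0.05, rho_er=0.05, r=5.0)
total_drift = sum(derivs)  # should vanish if N is conserved
```

Under this assumed form the four derivatives sum to zero, so the total number of nodes N stays constant, which matches the interpretation of μ as a replacement rate.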

Optimal Control Strategy.
In system (1), the parameters ρ_SR, ρ_ER, and δ represent the strength of antivirus measures. In reality, they should be variable, so in this paper we introduce a control function φ(t) to adjust ρ_SR, ρ_ER, and δ. Our purpose is to find the control strategy that yields the minimum loss and cost when a cyber security incident occurs. Let Θ denote the admissible control set. The financial loss caused by network viruses is related to the number of infected nodes. Assume that the average loss caused by a node per unit time is the constant χ. Then, the whole loss caused by all infected nodes in unit time is proportional to the number of infected nodes I and is expressed as χI. Define L_loss(I(t)) as the loss function; then, during the time interval [0, T], L_loss(I(t)) can be calculated as follows:
L_loss(I(t)) = ∫_0^T χ I(t) dt.
In addition, countermeasures require investment by enterprises, such as buying antivirus software and educating users about security. The famous Gordon-Loeb model shows that the maximum security investment is not necessarily optimal from the economic point of view [25]. It is therefore vital to find the equilibrium at which the loss of the enterprise is minimized with minimal investment in information security when a cyber security incident occurs. Let L_cost(φ(t)) be the cost of deploying security systems during the time interval [0, T]. L_cost(φ(t)) is defined in terms of the control function φ(t), subject to control system (4), where κ is a tradeoff coefficient. Equation (4) is the control system under study. Our aim is to find a control function φ(t) that minimizes the objective function (5), where K(φ) is the sum of the loss and the investment related to cyber security.
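As an illustration of how such an objective could be evaluated numerically, the sketch below approximates K(φ) on a time grid with the trapezoidal rule. The loss term χI follows the text; the quadratic cost (κ/2)φ² is an assumption (a common choice in the optimal control literature), since the exact form of L_cost is not reproduced here. The function name and sample values are hypothetical.

```python
def objective(times, infected, control, chi, kappa):
    """Approximate K = integral over [0, T] of chi*I(t) + (kappa/2)*phi(t)^2.

    The quadratic control cost is an ASSUMPTION made for this sketch.
    """
    total = 0.0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        left = chi * infected[k] + 0.5 * kappa * control[k] ** 2
        right = chi * infected[k + 1] + 0.5 * kappa * control[k + 1] ** 2
        total += 0.5 * (left + right) * dt  # trapezoidal rule on one interval
    return total


# Hypothetical constant trajectories on [0, 2], for illustration only.
times = [0.0, 1.0, 2.0]
k_value = objective(times, [10.0, 10.0, 10.0], [0.5, 0.5, 0.5], chi=2.0, kappa=4.0)
```

With constant I = 10 and φ = 0.5, the running cost is 2·10 + 2·0.25 = 20.5 per unit time, so the value over [0, 2] is 41.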
To investigate the dynamics of control system (4), we can calculate the basic reproduction number and the equilibria using the dynamical theory of differential equations [26]. The fourth equation of system (4) does not affect the dynamics of the other three, so, without loss of generality, we can omit it when discussing the dynamics of equation (4). The expression of the basic reproduction number R_0 is then easily obtained. When R_0 < 1, equation (4) has the unique equilibrium Q_0 = (S_0, E_0, I_0) = (μ/(ρ_SR φ(t) + μ), 0, 0), which means that the network virus will be eliminated completely as time evolves. When R_0 > 1, equation (4) has an epidemic equilibrium Q* in addition to Q_0.
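The expression for R_0 is not reproduced in this text. As a hedged illustration only, the derivation below shows how a threshold of this kind arises if one assumes that the S and E equations of the controlled system take the form written in the first two lines, with φ treated as a constant for the equilibrium analysis. This assumed form is not taken from the paper; with N normalized to 1 it reproduces the disease-free value S_0 = μ/(ρ_SR φ + μ) stated above.

```latex
% ASSUMED form of the S and E equations (phi held constant):
\begin{aligned}
\frac{dS}{dt} &= \mu N - \frac{\alpha r}{N} E S - (\rho_{SR}\varphi + \mu) S, \\
\frac{dE}{dt} &= \frac{\alpha r}{N} E S - (\alpha + \rho_{ER}\varphi + \mu) E .
\end{aligned}

% Disease-free equilibrium (E = I = 0):
S_0 = \frac{\mu N}{\rho_{SR}\varphi + \mu}.

% Linearizing the E equation around S = S_0:
\frac{dE}{dt} \approx \left[ \frac{\alpha r S_0}{N} - (\alpha + \rho_{ER}\varphi + \mu) \right] E .

% E decays exactly when the bracket is negative, which gives the threshold
R_0 = \frac{\alpha r S_0}{N(\alpha + \rho_{ER}\varphi + \mu)}
    = \frac{\alpha r \mu}{(\rho_{SR}\varphi + \mu)(\alpha + \rho_{ER}\varphi + \mu)} .
```

Under these assumptions, a larger control φ increases both factors in the denominator and therefore lowers R_0, which is consistent with the role of φ as the strength of antivirus measures.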

Solving the Optimal Control Strategies
To obtain the optimal control solution, we need to define the Lagrangian and the Hamiltonian function for optimal control problems (4) and (5). The Lagrangian L(I, φ) of the optimal control problem is the integrand of the objective function (5). Next, we seek an optimal function φ that minimizes the integral of the Lagrangian. To do so, we define the Hamiltonian H, equation (8), as the Lagrangian plus the sum of the right-hand sides of system (4), each multiplied by its adjoint variable.
Theorem 1. There exists an optimal control function φ* such that K(φ*(t)) = min_{φ∈Θ} K(φ(t)).
Proof. According to the result of Kamien and Schwartz [27], it is easy to confirm the existence of an optimal control function to system (4).
First, the control set and the corresponding state variables are not empty. The control set Θ is convex and closed. Meanwhile, the right-hand sides of the equations of system (4) are bounded and continuous and can be written as a linear function of φ with coefficients depending on the state variables. Besides, L(I, φ) is convex on Θ, and there exist a constant ρ > 1 and two positive numbers ξ_1 and ξ_2 such that L(I, φ) ≥ ξ_1 + ξ_2 |φ|^(ρ/2). Thus, we conclude that there exists an optimal control function.
Next, we will find the optimal solution by means of Pontryagin's maximum principle.
Proof. By differentiating Hamiltonian (8) with respect to the state variables S, E, I, and R, we obtain the adjoint system. Applying the optimality conditions and taking into account the feasible region Θ, we obtain the optimal control function φ*(t). Substituting φ*(t) into the state and adjoint equations yields the optimal system (16).
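The explicit formulas of this proof are not reproduced in this text. The sketch below illustrates the usual pattern behind such a result under several stated assumptions: the control φ multiplies ρ_SR, ρ_ER, and δ in the state equations, the running cost is χI + (κ/2)φ², and the admissible set is the interval [0, phi_max]. Setting ∂H/∂φ = 0 then gives an unconstrained candidate, which is projected onto the interval. All names and values here are hypothetical.

```python
def candidate_control(state, adjoint, rho_sr, rho_er, delta, kappa, phi_max=1.0):
    """Stationary point of an ASSUMED Hamiltonian in phi, clipped to [0, phi_max].

    Assumed Hamiltonian: H = chi*I + (kappa/2)*phi**2 + sum(lambda_x * dx/dt),
    where phi scales the rates rho_sr, rho_er, and delta.
    """
    s, e, i, _ = state
    lam_s, lam_e, lam_i, lam_r = adjoint
    # dH/dphi = kappa*phi - rho_sr*S*(lam_s - lam_r)
    #           - rho_er*E*(lam_e - lam_r) - delta*I*(lam_i - lam_r) = 0
    unconstrained = (rho_sr * s * (lam_s - lam_r)
                     + rho_er * e * (lam_e - lam_r)
                     + delta * i * (lam_i - lam_r)) / kappa
    # Project onto the assumed admissible interval.
    return min(max(unconstrained, 0.0), phi_max)


# Hypothetical state and adjoint values, for illustration only.
state = (100.0, 10.0, 5.0, 0.0)
adjoint = (0.01, 0.02, 0.05, 0.0)
phi_interior = candidate_control(state, adjoint, 0.05, 0.05, 0.2, kappa=1.0)
phi_clipped = candidate_control(state, adjoint, 0.05, 0.05, 0.2, kappa=0.01)
```

A small κ (cheap control) pushes the candidate above the upper bound, so the projection returns phi_max; a large κ keeps it in the interior.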

Numerical Simulations
In this section, numerical simulations are performed to compare system (1) without optimal control with system (16) with optimal control. Both systems are solved with the fourth-order Runge-Kutta method. The parameter values are listed in Table 1, and initial states are assumed for both systems. Figure 2 shows that the growth rate of exposed nodes decreases significantly in the system with optimal control and that the transition from exposed nodes to infected nodes slows down noticeably. Hence, by taking optimal control measures, network managers gain more time to protect information systems when viruses intrude. Similarly, Figure 3 shows that the number of infected nodes grows much more slowly with optimal control measures than without them. Moreover, the scale of infected nodes also decreases in the network system with optimal control.
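A minimal sketch of the classical fourth-order Runge-Kutta scheme applied to a system of this kind is given below. The right-hand side is an assumed form consistent with the model description (with a control φ scaling ρ_SR, ρ_ER, and δ), and φ is held constant here rather than computed from the optimality system. The parameter values and initial state are hypothetical and are not those of Table 1.

```python
def rhs(state, phi, p):
    """ASSUMED controlled P2G SEIR right-hand side; phi scales the antivirus rates."""
    s, e, i, rec = state
    n = s + e + i + rec
    v = p["alpha"] * p["r"] * e / n  # v(t) = (alpha * r / N) * E(t)
    ds = p["mu"] * n - v * s - (p["rho_sr"] * phi + p["mu"]) * s
    de = v * s - (p["alpha"] + p["rho_er"] * phi + p["mu"]) * e
    di = p["alpha"] * e - (p["delta"] * phi + p["mu"]) * i
    dr = (p["rho_sr"] * s + p["rho_er"] * e + p["delta"] * i) * phi - p["mu"] * rec
    return (ds, de, di, dr)


def rk4_step(state, phi, p, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, k, scale):
        return tuple(b + scale * ki for b, ki in zip(base, k))

    k1 = rhs(state, phi, p)
    k2 = rhs(shifted(state, k1, h / 2), phi, p)
    k3 = rhs(shifted(state, k2, h / 2), phi, p)
    k4 = rhs(shifted(state, k3, h), phi, p)
    return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state, phi, p, h, steps):
    """Integrate with a constant control phi and return the whole trajectory."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, phi, p, h)
        trajectory.append(state)
    return trajectory


# Hypothetical parameters and initial state (S, E, I, R), for illustration only.
params = {"mu": 0.01, "alpha": 0.3, "delta": 0.2,
          "rho_sr": 0.05, "rho_er": 0.05, "r": 5.0}
traj = simulate((990.0, 10.0, 0.0, 0.0), phi=1.0, p=params, h=0.1, steps=500)
```

Running `simulate` with two different constant values of φ gives a rough feel for how stronger countermeasures change the E and I curves; reproducing the optimally controlled trajectories would additionally require solving the adjoint system backward in time.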
Figure 4 depicts the evolution of the optimal control strategy over time. The control strategy first stays at a stable level; then the intensity of control increases as the number of infected nodes increases. Subsequently, the intensity of control decreases as the virus spread declines.

Conclusion
The objective of this work is to model virus prevalence in point-to-group information sharing networks and to find optimal strategies for controlling virus propagation. Taking into account the tradeoff between investment and income, the strength of network security defense measures should vary with the severity of the virus damage. So, based on the e-SEIR model [24], we consider a control parameter φ(t) that changes over time. First, we put forward the objective function and study the optimal control strategy with respect to the parameter φ(t). Second, the existence and uniqueness of the optimal control strategy are proved. Finally, some numerical experiments are performed, which show that the scale and speed of virus prevalence decrease greatly when optimal control strategies are taken. In the future, more effort is needed to collect real data to test our results.

Data Availability
All the data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.