Dynamic Modeling and Analysis of the Email Virus Propagation

A novel deterministic SEIS model for the transmission of email viruses in growing communication networks is formulated. Interestingly, the model is different from classical SEIS models not only in the form, but also in the dynamical features. We study the equilibria and their stability and analyse the bifurcation dynamics of the model. In particular, we find that the virus-free equilibrium is locally asymptotically stable for any parameter values, which may attribute to the absence of the basic reproduction number. It is shown that the model undergoes a saddle-node bifurcation and admits the bistable phenomenon. Moreover, on the basis of the Lyapunov function, the domains of attraction of equilibria are estimated by solving an LMI optimization problem. Based on the above theoretical results, some effective strategies are also provided to control the propagation of the email viruses. Additionally, our results are confirmed by numerical simulations.


Introduction
Currently, email has become one of the most basic applications in the Internet with the development of networked computer.As more and more people rely on email communications for business and everyday life, email viruses constitute one of the Internet major security threats.When an email virus infects a machine, it sends an infected email to the addresses in the computer's email address book.This self-broadcast mechanism allows for the virus's rapid reproduction and spread.Due to this facility, and the advantages of sending viruses through email 1 , hackers mostly tend to choose the email as the vector of spreading their viruses 2 .Indeed, according to the Virus Bulletin 3 and National Computer Virus Emergency Response Center 4 , the email viruses still account for a large share of the virus prevalence today.
In order to eradicate viruses, as well as to control and limit the impact of an outbreak, a lot of efforts have been devoted to develop various mathematical models to simulate the real case of viruses' spreading 5-9 .Unlike scanning worms such as Code Red or Slammer, email viruses spread over a logical network defined by email address relationship, making traditional epidemic models invalid for modelling the propagation of email virus.
The spread of email virus is a complex process.An adequate modeling of this process requires both a correct description of the logical network along which viruses spread 10-12 and a quantitative formulation of various behavioral mechanisms of the users who motivate the spread of the virus 1, 2 .
Due to the specific characteristics of email viruses propagation, some researchers focus attention on the topological structures of email networks 13 , while some researchers deal with the development of new techniques for the detection and elimination of viruses 14 .Immunization strategies in email networks are presented by experiments and simulations in 15 and 16 .In 2003, Zou et al. 1 presented an email viruses' model that accounts for the behaviors of email users, such as email checking frequency and the probability of opening an email attachment.But they rely primarily on simulation rather than mathematical analysis.In 12 , the simulations for a stochastic susceptible-hidden-infectious-recovered model and mathematical analysis for homogeneous and heterogeneous SIR model are provided.When considering immunization in the heterogeneous SIR model, the saddle-node bifurcation occurs.Here we must emphasize the hidden state which is a key characteristic that the email virus different from other viruses.So based on the spreading mechanism of email viruses, we proposed a SEIS model in growing homogeneous networks by stochastic approach.Relying primarily on mathematical analysis, we give some effective control strategies.
The rest of the paper is organized as follows.In Section 2, a discrete-time model of the email viruses' spreading is proposed whose detailed process is also given.Then we deal with the model to derive the continuous-time equations for the dynamics of email viruses spreading.In Section 3, we present a qualitative analysis of the model.We show that the virus-free equilibrium is locally asymptotically stable for any parameter values and the system admits a saddle-node bifurcation.Section 4 estimates the domain of attraction of the equilibrium on the basis of the determined Lyapunov function.The subsequent sections are the numerical examples and the conclusion and discussion.

Mechanism Analysis and Model Establishment
In this paper, we consider email viruses that are only transferred through users' email address books.Thus email address relationship forms a logical network for email viruses.Email address book of a user usually contains the user's friends' or business partners' email addresses.Thus if user A has user B's address, user B probably also has user A's address in his own address book.Due to this, we model the email network as an undirected graph, although this may not always be true.Without loss of generality, we can safely assume that every distinct email address is associated with one host computer which is associated with a single user.Then the logical network can be seen as host computer network, in which the vertices are the computers and the links are the address relationships of the corresponding users.Here we consider the network as a homogeneous network that assumes each computer has same neighbors which are supposed as k; that is, an email user has k addresses in his own address book and his address is in k users' address books.According to the already studied propagation characteristics, the spreading mechanism of email viruses can be described as follows: a healthy computer, which links with infectious one, would receive virus emails.But it may not be infectious immediately.The email viruses are hiding in the email box.At this moment, the computer is infected but not yet infectious.Only if the user opens the virus attachments, the computer becomes infectious.Then if the computer has been found infectious, the user would take antivirus measures to clean the viruses.After that, the computer recovers from infectious state and returns to healthy state again.
Obviously, the viruses' spreading is just the process of computers switching between different states essentially.We call computers that have not received virus emails and could be infected easily after receiving virus emails as the susceptible computers.The computer that has received virus emails but the virus attachments have not been opened is called the exposed computer.And further, a computer that has been infected by the viruses and can transmit them to susceptible computers is called infectious computer.The numbers of susceptible, exposed, and infectious computers at time t are denoted as S t , E t , and I t , respectively.Then the total number of computers at time t is N t S t E t I t .Here we assume the total number of computers is varying dynamically as time t; that is, the new computers can be connected to the network and some old computers can be removed from the network including the obsolescence computers .Suppose A is the rate at which new computers are connected and d is the rate at which the computers are removed per unit time.The flow diagram of the state transition is depicted in Figure 1.
By the above analysis, we have already known the mechanism of the virus propagation.In the following, we will elaborate the probabilities of transforming from S to E, E to I, and I to S, by assuming the number of infected contacts as a random variable that follows the binomial distribution.

The Transition from Susceptible State to Exposed State
Consider now a computer U is in the susceptible state at time t.By the usual homogeneous mixing approximation, Θ t I t /N t denotes the rate that a neighbor of the computer U is in the infectious state at time t.Suppose that k neighbors of computer U are independent with each other, we have k independent events occurring with the same probability Θ t .Then the number of the infected neighbors of computer U is a random variable that follows the binomial distribution B k, Θ t .And the probability of the number of the infected neighbors of computer U equal to k i at time t can be written as follows: When a computer is infected, it will send a virus email to all email addresses in the user's email address book.So as long as there are infected computers in the k neighbors, the susceptible computers would have chance to turn into exposed.Let p SS denote the probability that computer U still stays in the susceptible state in the time interval t, t Δt , then 1−p SS is the probability that it enters into the exposed state.k i is the number of the infected neighbors of computer U, and if p denotes the probability that the computer U received virus emails from each infected neighbor per unit time, then we have Because the infected number k i of the computer's k neighbors can change from 0 to k and the corresponding p SS k i , t changes from 1 to 1 − Δtp k , using 2.1 and 2.2 , we obtain the transition probability p SS t averaged over all possible values of k i as Then in the interval t, t Δt , the probability of a susceptible computer transforming to the exposed state is 1 − 1 − ΔtpΘ t k .

The Transition from Exposed State to Infectious State
An exposed computer being infected or not depends on whether the user opens the virus emails.As long as the email user opens a virus attachment one virus email only has one virus attachment arbitrarily, the computer will be infected, so only if the user does not open all virus emails that he has received, the computer will stay in the exposed state.Let p o be the probability that the email user opens a virus attachment 0 < p o < 1 ; thus 1 − p o is the probability that the email user does not open a virus attachment.Let the number of the virus emails that the computer U received during the time interval t, t Δt be n e ; then the probability that all n e virus attachments would not be opened is 1 − p o n e .Therefore, the probability of the computer infected is 1 In addition, from 2.1 , we can obtain that the average infected number of the computer's neighbors is Suppose the computer U received a virus emails from each infected neighbor per unit time, then in the time interval t, t Δt , the computer U received akΘ t Δt virus emails on average; that is, n e akΘ t Δt.
Substituting n e into the above formula, the probability of an exposed computer becomes infected is 1 − 1 − p o kaΘ t Δt in the time interval Δt.

The Transition from Infectious State to Susceptible State
An infected computer recovering from the infection and reentering into the susceptible state is dependent on the computer users.If the computers have been found virus-infected, the users would take antivirus measures; as a result the viruses could be removed and the computers will enter into susceptible state.But if the users do not do this, the computers will stay in the infected state.So, we suppose the computers return to the susceptible state at the probability δ per unit time.Based on the above analyses and Figure 1, we can indeed write the spread mechanism of email viruses as the following deterministic model:

2.5
In the limit Δt → 0, the above model can be simplified by keeping only the two first terms in the development of the binomial.This leads to the following continuous-time model: It is easy to see that system 2.6 can be shown to be mathematically well posed in the positive invariant region D { S, E, I | 0 ≤ S E I ≤ N 0 } and solution in D exists for all positive time.
Hence, 2.6 has the following limit system:

2.8
Rescaling 2.8 by x 1 α/N 0 d δ E, x 2 α/N 0 d δ I, τ d δ t and still writing τ as t, we obtain 2.9 where B α/ d δ , m β/α, n d/ d δ , and the corresponding positive invariant set is

Dynamic Analysis of System 2.9
The objective of this section is to perform a qualitative analysis of system 2.9 .We first consider the nonexistence of limit cycle.Then we study the equilibria and their stability.Finally, we prove the occurrence of saddle-node bifurcation.
Theorem 3.1.For system 2.9 , there is no limit cycle.
Proof.Take the Dulac function where Therefore, there is not a closed orbit inside the first quadrant.
Next, we will study the equilibrium situation of system 2.9 .The system always has the virus-free equilibrium E 0 0, 0 , and the Jacobian matrix is Obviously, 0, 0 is locally asymptotically stable for any parameter values.
To find the positive equilibrium, set which yields the equation of x 2 : We can obtain that 1 there is no positive equilibrium if B < 2 n/m 1/m 1; 2 there is one positive equilibrium if B 2 n/m 1/m 1; 3 there are two positive equilibria if B > 2 n/m 1/m 1.
In the following, we analyse the stability of the positive equilibria in different situations, respectively, and prove the existence of saddle-node bifurcation, to elaborate from three cases.
Case 1 only one positive equilibrium in system 2.9 .Suppose B 2 n/m 1/m 1; system 2.9 has one positive equilibrium the Jacobian matrix at E * is Clearly, the eigenvalues of matrix J E * are σ 1 0 and σ 2 − 1 m x * 2 − n; E * is a nonhyperbolic point.Therefore, we investigate the dynamics near E * by the center manifold theorem 17, 18 .
Firstly, shift E * to the origin via y 1 x 1 − x * 1 and y 2 x 2 − x * 2 ; system 2.9 can be transformed into

3.8
Secondly, define the transformation which transformed system 3.8 into the following standard form:

3.10
where By the existence theorem 19 , there exists a center manifold for system 3.10 , which can be expressed locally as follows: with ρ sufficiently small and Dh is the derivative of h with respect to z 1 .
To compute the center manifold W c 0 , we suppose h z 1 has the form By the local center manifold theorem, the center manifold 3.12 satisfies where , respectively, where

3.14
Substituting 3.10 into 3.13 and then equating coefficients on each power of z 1 to zero yields

3.15
Then, we get the approximation for h:

3.16
Substituting 3.16 to the first equation of system 3.10 , we achieve So from 3.17 see 20, p. 338-340 , we get the following theorem.
Theorem 3.2.If B 2 n/m 1/m 1, the system 2.9 has one positive equilibrium E * and E * is a saddle node.
Case 2 two positive equilibria in system 2.9 .Suppose B > 2 n/m 1/m 1, there are two positive equilibria

3.18
We first determine the stability of E * 1 .The Jacobian matrix at E * 1 is

3.19
Obviously, det 1 is a saddle.Now we analyze the stability of the second positive equilibrium E * 2 .The Jacobian matrix at E * 2 is By a similar argument as above, we obtain that

3.22
After some algebra calculations, we obtain Δ > 0; E 2 * is a stable node.So we have the following theorem.
Theorem 3.3.If B > 2 n/m 1/m 1, the system 2.9 has two positive equilibria E * 1 and E * 2 ; E * 1 is a saddle and E * 2 is a stable node.
Case 3 saddle-node bifurcation .From Cases 1 and 2, we can see system 2.9 experiences a saddle-node bifurcation.In this part, we give the proof.Let us consider B as a bifurcation parameter, define B 0 2 n/m 1/m 1 and x 0 E * .Rewrite system 2.9 as Then we have

3.24
Df x 0 , B 0 has a simple eigenvalue λ 0 with eigenvector v 0, 1 T , and that D T f x 0 , B 0 has an eigenvector w mx * 2 / 1 m x * 2 n , 1 T corresponding to λ 0. Furthermore, the following conditions are satisfied:

3.25
According to the theorem Sotomayor; see 21, p.148 , we get the following result.
Theorem 3.4.System 2.9 experiences a saddle-node bifurcation at the equilibrium x 0 E * as the parameter B passes through the bifurcation value B B 0 .

Estimation of the Domain of Attraction
From the previous study, we know, when B > 2 n/m 1/m 1, E * 2 is a stable node, and at the same time, the virus-free equilibrium is also locally asymptotically stable, that is, the bistable state.In this case, the eventual behavior of the system is sensitive to the initial positions.So in this section, we study the domains of attraction about the two equilibria using the method in 22 .For this task, we make some preparations.
Consider the following system: where f : R n → R n is an analytical function with the following properties: 1 f 0 0; that is, x 0 is an equilibrium point of 4.2 ; 2 every eigenvalue of ∂f/∂x 0 , the Jacobian matrix of f at the origin, has negative real part; that is, x 0 is an asymptotically stable equilibrium point.
Let DA be the domain of attraction of the zero solution of 4.2 .The following result provides a tool to determine DA.

4.3
The function V is positive on DA and lim x → x 0 V x ∞ for any x 0 ∈ ∂DA, where ∂DA is the boundary of DA.
Based on Lemma 4.2, the problem of determining DA can be reduced to finding the natural domain of analyticity of the solution V of 4.3 .The function V is called the optimal Lyapunov function for 4.2 .Generally, it is not easy to construct the optimal Lyapunov function V and determine its domain of analyticity.But, in the diagonalizable case, we can determine the coefficients of the expansion of W V • H at 0, where H reduces ∂f/∂x 0 to the diagonal form.By the Cauchy-Hadamard-type theorem 25 , we can obtain the domain of convergence D 0 of the series W, and DA 0 H −1 D 0 is a part of the domain of attraction.
For system 4.2 , the following lemma holds.

Lemma 4.3 see 26 .
For each isomorphism H : has a unique analytical solution; namely, W V • H, where V is the optimal Lyapunov function for 4.2 .
Let H : C n → C n be one isomorphism which reduces ∂f/∂x 0 to the diagonal form where V is the optimal Lyapunov function for 4.2 .
For given H, we consider the expansion of W at origin: and the expansions of the scalar components g i of g at origin: then the coefficients B j 1 j 2 •••j n of the development 4.5 are given by where λ i 0 is the i 0 th eigenvalue of ∂f/∂x 0 , i 0 1, 2, . . ., n, s ip s iq represents the entry of H which lies in the ith row and the pth qth column.Let V p , p ≥ 2 be the Taylor polynomials of degree p associated to V at the origin.For V p , we have the following results 27 .Lemma 4.4.For any p ≥ 2, there exists r p > 0 such that for any x ∈ B r p \ {0} one has where B r p {x ∈ R n | x < r p } and B r p is its closure.
Hence, for any positive integer p ≥ 2, V p is a Lyapunov function for system 4.2 in B r p .Now, we construct the Lyapunov function at the virus-free equilibrium.Define the following transformation: 4.9 Then system 2.9 is transformed into the following form: where

4.11
For the isomorphism H : C 2 → C 2 and the function g : R 2 → R 2 , it follows from Lemma 4.3 that there exists a unique analytical function W V • H such that ∇W, g − Hz 2 and W 0 0, where V is the optimal Lyapunov function for the reduced system 4.10 .Let the expansion of W at 0, 0 be 4.12 and denote the components g i of g as The coefficients of expansion for W are given by

4.15
Let W 2 z 1 , z 2 be the sum of the square terms of the expansion 4.12 .Clearly, Correspondingly, by the reversibility of H, we obtain an analytical function 4.17 where By using Lemma 4.4, it is easy to see that there exists a domain of the origin and in which V 2 is a positive definite function and further, it provides a Lyapunov function to guarantee that the trivial equilibrium E 0 of system 2.9 is asymptotically stable.Obviously, this domain belongs to DA.
Next step, we will use the Lyapunov function V 2 to estimate the domain of attraction for the virus-free equilibrium.For this, we first introduce the following results.Lemma 4.5 see 28 .Let V x be a Lyapunov function for system 4.2 in the domain Assume, moreover, that Ω c is bounded and contains the origin.If V is negative definite in Ω c , then the origin is asymptotically stable and every solution in Ω c tends to the origin as t → ∞.
To determinate the whole domain of attraction DA of a general nonlinear system is an extremely difficult problem.However, by Lemma 4.5 it is possible to compute subsets of the domain of attraction which are defined by 4.19 .Our objective is to find the maximum value c * of c 29 .This c * is defined by the following optimization problem.

4.20
Geometrically, this means that we seek the global minimum of the Lyapunov function V x along the hypersurface {x ∈ R n | dV x /dt 0, x / 0}.Correspondingly, for system 2.9 and the determined Lyapunov function V 2 , the optimization problem 4.20 should be reformulated as where V 2 is defined by 4.17 .Clearly, 4.21 is a linear matrix inequality LMI relaxations of global optimization problem with multivariable real-valued polynomial criterion and constraints see 30, 31 and the references therein for details .
Then Ω c * is the estimation of domain of attraction for analogously, for the positive equilibrium E * 2 , after tedious computation, we can obtain the Lyapunov function: where

4.23
From the discussion in Section 3, we can obtain hence k 1 , k 2 , and k 3 are all positive.Correspondingly, we can obtain the estimation of domain of attraction for E * 2 .

Numerical Examples
In this section, we will perform numerical examples to verify the results that obtained in previous sections, hence, provide an intuitive impression about saddle-node bifurcation, bistability, and the domain of attraction.Firstly, we present the saddle-node bifurcation.Here we emphasize that the virus-free equilibrium is locally asymptotically stable for any parameter values, and it can be observed in Figures 2, 3, and 4 simultaneously the green asterisk in Figures 2-4 .In Figure 2  B > 2 n/m 1/m 1.According to Theorem 3.3, there have been two positive equilibria; one of them is a saddle and the other is a node.Above all, system 2.9 experiences a saddle-node bifurcation which can be seen obviously from Figures 2-4.After crossing the critical value, the saddle node the red asterisk in Figure 3 becomes one saddle and one node the red pentagrams in Figure 4 .Consequently, the bistable case occurs.
Secondly, we show the domain of attraction for the equilibria.From Figure 4, we can see that system 2.9 has two locally asymptotically stable equilibria under the condition B > 2 n/m 1/m 1. Letting the parameters be the same as in Figure 4, we give the estimated results of domains of attraction Figures 5 and 6 .From 4.17 , we get the Lyapunov function corresponding to E 0 as follows: and its total derivative along the trajectory of system 2.9 has the form

5.2
Then solving 4.21 by using the MATLAB tool, we obtain c * 1 2.8989975698 for E 0 .The corresponding region is V 2 x E 0 c * 1 which is shown in Figure 5.In analogy with the virus-free equilibrium E 0 , we obtain the Lyapunov function of E * 2 as follows: c * 2 which is shown in Figure 6.

Conclusion and Discussion
In this paper, we have established a model to study email virus propagation and analyzed its dynamical behaviors.It is worthwhile to note that we use βEI/N instead of βE to depict the transition from E to I, which is different from most SEIS infectious disease models.

6.1
It is shown that the global dynamics is completely determined by the basic reproduction number R 0 .If R 0 ≤ 1, the disease-free equilibrium is globally stable.If R 0 > 1, a unique endemic equilibrium is globally stable in the interior of the feasible region.The reason of using βEI/N is that we have considered the number of virus emails that the user received in our model, which is proportional to the relative number of infected computers, that is, I/N.Due to this difference in the form, our model shows many different and complex dynamics.In our model, however, the basic reproduction number R 0 does not appear; hence the virus-free equilibrium is locally asymptotically stable for any parameter values.And it is proved that the model experiences a saddle-node bifurcation as parameter varies through the bifurcation value, which also occurs in vector-borne disease model 33 .Furthermore, the bistability is obtained.In short, the spreading mechanism and dynamical features of email virus are indeed different from most biological virus.This can offer a valuable perspective to the future dynamics modeling.Especially, the amount of virus is considerable in the transition from E to I when modeling the infectious disease.By doing so, the models would be beneficial for better understanding the infectious disease propagation.The main aim of the model is to investigate how email viruses transmit and to identify effective strategies for their prevention and control.So based on the arguments above and the simulation results, it is straightforward to provide some control strategies.
From the theoretical analyses and as a threshold parameter.If R < 0, the virus will die out; if R > 0, virus may die out or outbreak.
In the first case, to make R below than zero, the most direct way is to change the parameter values which correspond to the related measures.Obviously, R increases with k, a, and p o .The relationships between R and p, δ are shown in Figure 7; R increases with p, while decreases with δ.According to the sensitivity analysis, some measures can be taken to control virus propagation for instance, enhancing the people's understanding of email viruses to minimize the probability of opening a virus email p o and appealing people to install the antivirus software and update in time to increase the ability of finding and removing the viruses δ .
In the second case, from the definition of the domain of attraction, we know that starting from any initial points in the region Ω c * of an equilibrium, the solution of a dynamical system will converge to the equilibrium.As is well known, the solution tending to the virusfree equilibrium indicates that the virus will be extinct without taking any measures, whereas tending to the positive equilibrium indicates the outbreak of the virus; that is, a large number of computers will be infected.As a result, in order to control the propagation of the virus, we should take measures to restrict the initial value of the computers within the domain of attraction for virus-free equilibrium the white region in Figure 5 , such as improving the computers' detection ability and enhancing the people's awareness of email virus defense.
It should be noted that the epidemiological model described here for email viruses could also be used to study other disease similar to email viruses from an epidemiological point of view.Moreover, in our model, we only consider some fundamental factors.In the future work; more factors will be incorporated into new and improved models for prediction and containment of email viruses.

Figure 1 :
Figure 1: The flowchart of email virus spreading.

Definition 4 . 1
domain of attraction 23 .The domain of attraction of the origin is given by

2 Figure 3 :
Figure 3: The phase diagram of system 2.9 when B 2 n/m 1/m 1 saddle node .

Figure 5 :
Figure 5: The white region is the estimation of domain of attraction for E 0 .

Figure 6 :
Figure 6: The white region is the estimation of domain of attraction for E * 2 .

Figure 7 :
Figure 7: The combined influence of parameters on R; R is plotted as a function of p and δ.Here, k 6, a 1, p o 0.5, d 0.05.

Figures
For example, Fan et al. 32 proposed an SEIS model as