Network Security Situation Assessment Model Based on Extended Hidden Markov

This paper designs a network security situation assessment system based on an extended hidden Markov model. First, the standard hidden Markov model is extended from a five-tuple to a seven-tuple by adding two parameters, network defense efficiency and the risk loss vector, so that the model describes the network security situation more completely. Then, an initial algorithm for the state transition matrix is defined, observation vectors are extracted by fusing data from various security detection systems, the network state transition matrix is created and modified using the observation vectors, and a procedure for solving the hidden state probability distribution sequence under the extended hidden Markov model is derived. Finally, a method for calculating the risk loss vector according to the international definition is designed, the current network risk value is calculated from the hidden state probability distribution, and the global security situation is assessed. The experiment shows that the model is suitable for practical applications and that the assessment results are accurate and effective.


Introduction
With the widespread use of Internet technology, network security has gradually attracted public attention. Attacks on networks are increasingly complex. Although defense measures based on intrusion detection [1], firewalls [2], virus prevention, and other techniques have been established, it is increasingly difficult to extract useful information and take effective emergency measures because the volume of alarm information is too large. For example, IDS alarm data are enormous, and false alarms and missed alarms occur frequently, so the network security situation is difficult to grasp. Moreover, these traditional security means focus on solving isolated security problems. How to grasp the current network situation accurately has therefore become a hot topic in the Internet field [3]. Network security situation assessment technology considers security elements comprehensively, reflects network states continuously, accurately predicts potential threats, and helps network administrators take effective measures [4].
Situation awareness technology was first used in the military field [5], and it is now widely used in aviation, transportation, networking, medical emergency response, and many other areas [6]. In 1988, Endsley first proposed the concept of situation awareness [7]. Bass then proposed the concept of network situation awareness, which includes element extraction, situation understanding, situation assessment, and other components, and gave a conceptual model of network situation awareness [8]. Lakkaraju et al. identified data mining as one of the key technologies of network situation awareness [9]. Elshoush fused the elements extracted by data mining and applied the fusion to intrusion detection, but false alarms were difficult to avoid because of the huge volume of data [10,11]. Domestic research on network security situation assessment started relatively late. Wei et al. proposed a network security situation assessment model based on information fusion, which used an improved D-S evidence theory to fuse multisource information [12], but this method was prone to evidence conflicts. Chen et al. proposed a quantitative hierarchical threat evaluation model for network security, which started the threat calculation from the bottom layer [13], but the method was too subjective and not accurate enough. Xi proposed a situation assessment method based on attack graphs, which used the network topology and attack targets to construct the attack path, but this method was prone to state space explosion [14]. Zhu et al. proposed an evaluation method based on honeynets, which used honeynets to collect intrusion behavior and draw the situation curve, but this method addressed only intrusion behavior and relied on a single data source [15].
An analysis of existing network security situation assessment models, both domestic and international, shows that many problems remain. First, the state transition matrix is generally obtained from the experience of administrators, so it is highly subjective and depends on the administrator's own ability. Second, because the two parameters of network defense capability and risk loss are missing, the hidden state vector sequence computed by the evaluation model is prone to deviation when the observation vector sequence is generated. In this paper, we propose an improved hidden Markov model that extends the five-tuple of the traditional hidden Markov model to a seven-tuple. We call the new model HMM-Plus, or HMMP for short.
The system fuses a variety of security detection data and extracts the main attack logs from the network security equipment to form the observation vector sequence. It then corrects the state transition matrix according to the real-time state and forms the hidden state probability distribution sequence using the improved Viterbi algorithm. Finally, it combines the network topology and network asset information with the hidden state probability distribution to calculate the current network risk value and assesses the global security situation of the current network, which considerably improves the analysis and processing ability of network security products across multiple indexes.

Network Security Situation Assessment Technology
A network security situation assessment model [16] describes the factors that affect the network security situation and the relationships among them. Threat sources include hostile network or physical attacks; negligent or intentional human errors; and natural or man-made disasters. Once a threat event occurs, it leads to unauthorized disclosure, modification, or destruction of information, or to loss of confidentiality, integrity, and availability of information systems. It follows that risk is a function of the probability that a threat occurs and the harm it causes.
Information system assessment aims to understand the current and future risks of the system; assess the potential threats and the extent of the harm caused by these risks; and provide a basis for security decision-making, information system construction, and safe operation. The information system assessment process [17] is divided into four steps: the first is to prepare the assessment; the second is to execute the assessment; the third is to feed back the evaluation result; and the fourth is to maintain the assessment, as shown in Figure 1.

Network Security Situation Assessment Model Based on HMMP
Before introducing the HMMP model, we first introduce the hidden Markov model (HMM). For an HMM, let $S$ be the set of all possible hidden states and $V$ the set of all possible observed states:

$$S = \{s_1, s_2, \ldots, s_N\}, \qquad V = \{v_1, v_2, \ldots, v_M\},$$

where $N$ is the number of possible hidden states and $M$ is the number of possible observed states. For a sequence of length $T$, $I$ denotes the state sequence and $O$ the observation sequence:

$$I = (i_1, i_2, \ldots, i_T), \qquad O = (o_1, o_2, \ldots, o_T).$$

The HMM rests on two important assumptions:

(1) The homogeneous Markov chain hypothesis: the hidden state at any time depends only on the previous hidden state. The advantage of this assumption is that the model is simple and easy to solve. If the hidden state at time $t$ is $i_t = s_i$ and the hidden state at time $t + 1$ is $i_{t+1} = s_j$, then the state transition probability $p_{ij}$ from time $t$ to $t + 1$ can be expressed as

$$p_{ij} = P(i_{t+1} = s_j \mid i_t = s_i).$$

The $p_{ij}$ form the state transition matrix $P$ of the Markov chain:

$$P = [p_{ij}]_{N \times N}.$$

(2) The observational independence hypothesis: the observed state at any time depends only on the hidden state at that time. If the hidden state at time $t$ is $i_t = s_j$ and the corresponding observed state is $o_t = v_k$, then the probability $q_j(v_k)$ that the observed state $v_k$ is generated under the hidden state $s_j$ at time $t$ satisfies

$$q_j(v_k) = P(o_t = v_k \mid i_t = s_j).$$

The $q_j(v_k)$ form the observation probability matrix $Q$:

$$Q = [q_j(v_k)]_{N \times M}.$$

In addition, we need the initial hidden state probability distribution $\Pi$ at time $t = 1$:

$$\Pi = [\pi(i)]_{N},$$

where $\pi(i) = P(i_1 = s_i)$.
An HMM is determined by the initial hidden state probability distribution $\Pi$, the state transition probability matrix $P$, and the observation probability matrix $Q$. $\Pi$ and $P$ determine the state sequence, and $Q$ determines the observation sequence. Therefore, the HMM can be represented by a five-tuple as follows: $\lambda = (S, V, P, Q, \Pi)$.
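To make the two assumptions concrete, the following minimal Python sketch samples a hidden state sequence and an observation sequence from $(\Pi, P, Q)$. The two-state, two-observation parameters and the function name `sample_hmm` are invented purely for illustration; they are not taken from the paper.

```python
import random


def sample_hmm(pi, P, Q, T, seed=0):
    """Sample (states, observations) of length T from an HMM.

    pi[i]   : initial probability of hidden state i
    P[i][j] : probability of moving from hidden state i to hidden state j
    Q[j][k] : probability of emitting observation k in hidden state j
    """
    rng = random.Random(seed)
    states, observations = [], []
    state = rng.choices(range(len(pi)), weights=pi)[0]
    for _ in range(T):
        states.append(state)
        # Observational independence: o_t depends only on i_t.
        obs = rng.choices(range(len(Q[state])), weights=Q[state])[0]
        observations.append(obs)
        # Homogeneous Markov chain: i_{t+1} depends only on i_t.
        state = rng.choices(range(len(P[state])), weights=P[state])[0]
    return states, observations


# Illustrative (made-up) parameters for a two-state model.
PI = [0.8, 0.2]
P = [[0.9, 0.1], [0.3, 0.7]]
Q = [[0.7, 0.3], [0.2, 0.8]]

states, observations = sample_hmm(PI, P, Q, T=10)
```

Only the observation sequence would be visible in practice; recovering information about `states` from `observations` is the inference problem addressed later in the paper.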

HMMP Model.
Extending the traditional HMM to support research in related fields is currently a hot topic [18,19]. The standard HMM consists of the five-tuple $\lambda = (S, V, P, Q, \Pi)$. In this paper, we expand it to the seven-tuple $\lambda = (S, V, P, Q, \Pi, F, C)$, called the HMM-Plus model, or HMMP for short, in which two parameters, network defense efficiency and the risk loss vector, are added so that the model can describe the network security situation better.
(1) $S$, the hidden state set, $S = \{s_1, s_2, \ldots, s_N\}$, contains all the hidden states that the system may be in; there are $N$ hidden states. In this paper, according to practical demand, the network hidden state is divided into Safe State G, Probe State P, Attack State A, and Compromise State C, with $s_1 = G$, $s_2 = P$, $s_3 = A$, and $s_4 = C$.
(i) Safe State G (good) indicates that the host or network is not under attack.
(ii) Probe State P (probed) indicates that the host or network is being probed or scanned.
(iii) Attack State A (attacked) indicates that the host or network is being attacked by one or more objects.
(iv) Compromise State C (compromised) indicates that the network or host has been compromised.
(2) $V$, the observation set, contains all the observations of the system. The observations are the following types of security equipment logs:
(i) Compromise log: indicates that an attacker has successfully gained administrator privileges.
(ii) Scan log: indicates that the system has been scanned.
(iii) Attack log: indicates that the system has been attacked.
(iv) No log: the network security equipment produced no logs.
(v) Suspicious log: logs that cannot be classified correctly.
(3) $P$, the hidden state transition matrix, denotes the transition probabilities between the hidden states of the system, $p_{ij} = P(i_{t+1} = s_j \mid i_t = s_i)$, where $i_t = s_i$ means the network is in hidden state $s_i$ at time $t$ and $i_{t+1} = s_j$ means the network is in hidden state $s_j$ at time $t + 1$.
(4) $Q$, the observation probability matrix, $q_j(v_k) = P(o_t = v_k \mid i_t = s_j)$, denotes the probability of observing $v_k$ when the network is in hidden state $s_j$.
(5) $\Pi$, the initial hidden state probability distribution, where $\pi(i) = P(i_1 = s_i)$ is the probability that the network is in state $s_i$ at the initial moment.
(6) $F$, the current network defense efficiency, indicates the defense efficiency of the current network. The higher the $F$, the better the network defense capability.
(7) $C$, the risk loss vector, $C = (C_G, C_P, C_A, C_C)$, where each component indicates the risk value that the system faces when the network is in the corresponding state.

Figure 1: Information system assessment process (Step 1: prepare for assessment, derived from the organizational risk frame; Step 4: maintain assessment).

Mathematical Problems in Engineering
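The seven-tuple can be held in a small container, as in the hedged sketch below. The class name `HMMP`, the field names, and all numeric entries of `P`, `Q`, `pi`, and `F` are assumptions made up for illustration; only the state names, the five log types, and the risk loss vector (taken from formula (28) of the case study) come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HMMP:
    states: List[str]          # S: hidden states
    observations: List[str]    # V: observation (log) types
    P: List[List[float]]       # N x N hidden state transition matrix
    Q: List[List[float]]       # N x M observation probability matrix
    pi: List[float]            # initial hidden state distribution
    F: float                   # current network defense efficiency
    C: List[float]             # risk loss per hidden state

    def validate(self, tol: float = 1e-9) -> None:
        """Check shapes and that every probability row sums to 1."""
        n, m = len(self.states), len(self.observations)
        if len(self.pi) != n or len(self.C) != n:
            raise ValueError("pi and C need one entry per hidden state")
        if len(self.P) != n or any(len(row) != n for row in self.P):
            raise ValueError("P must be N x N")
        if len(self.Q) != n or any(len(row) != m for row in self.Q):
            raise ValueError("Q must be N x M")
        for row in [self.pi, *self.P, *self.Q]:
            if abs(sum(row) - 1.0) > tol:
                raise ValueError("probability rows must sum to 1")


model = HMMP(
    states=["G", "P", "A", "C"],
    observations=["compromise", "scan", "attack", "none", "suspicious"],
    # Made-up transition and observation probabilities.
    P=[[0.70, 0.20, 0.08, 0.02],
       [0.30, 0.40, 0.25, 0.05],
       [0.10, 0.20, 0.50, 0.20],
       [0.05, 0.05, 0.20, 0.70]],
    Q=[[0.01, 0.04, 0.02, 0.90, 0.03],
       [0.02, 0.60, 0.10, 0.20, 0.08],
       [0.05, 0.10, 0.60, 0.10, 0.15],
       [0.60, 0.05, 0.15, 0.10, 0.10]],
    pi=[0.97, 0.01, 0.01, 0.01],
    F=0.8,                      # made-up efficiency value
    C=[1.0, 2.6, 6.3, 14.7],    # risk loss vector from the case study
)
model.validate()
```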

Primary Generation of State Transition Matrix.
This section introduces the initial algorithm for the hidden state transition matrix. Unlike the traditional approach, which relies on expertise, this algorithm uses game theory to assess the transition relationships between hidden states and ultimately determine an initial state transition matrix.
The hidden state transition model is shown in Figure 2. The circles represent the system states; according to the definition of this model, there are four hidden states: Safe State G, Probe State P, Attack State A, and Compromise State C. E represents the security events that may occur in the network, and D represents the defense measures in the network. Assume that the current network has security measure $D_j$ and is in hidden state $s_i$; if security event $E_j$ occurs at this moment, the network will enter hidden state $s_j$ at the next time step. The process can be expressed as $s_i \xrightarrow{E_j,\, D_j} s_j$. When the current state is $s_i$, the hidden state transitions can be described by the following $|E| \times |D|$ matrix, as shown in Table 1.
$E_1$ to $E_m$ indicate security events and $D_1$ to $D_n$ indicate network defense measures. The entry $s_j$ indicates that, when the network state is $s_i$ and the network defense measure is $D_1$, the network state is transferred to $s_j$ when security event $E_1$ occurs. Similarly, $s_o$ indicates that, when the network state is $s_i$ and the network defense measure is $D_n$, the network state is transferred to $s_o$ when security event $E_1$ occurs. The distribution of security state transitions can be read directly from the $|E| \times |D|$ matrix. The probability of a transition from $s_i$ to $s_j$ is derived from this matrix, which gives the security state transition vector $P_i = (P_{iG}, P_{iP}, P_{iA}, P_{iC})$ for the current state $s_i$.
Establishing the state transition vector for each possible current state $s_i$ of the network in turn gives the initial transition matrix:

$$P = \begin{bmatrix} P_{GG} & P_{GP} & P_{GA} & P_{GC} \\ P_{PG} & P_{PP} & P_{PA} & P_{PC} \\ P_{AG} & P_{AP} & P_{AA} & P_{AC} \\ P_{CG} & P_{CP} & P_{CA} & P_{CC} \end{bmatrix}. \tag{8}$$
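As a rough illustration of how one row of the initial transition matrix could be read off an $|E| \times |D|$ outcome table, the sketch below counts how often each next state appears. It assumes that every (event, defense) cell carries equal weight, which is a simplification introduced here and not necessarily the weighting used in the paper; the table contents and the function name `transition_row` are hypothetical.

```python
from collections import Counter

STATES = ["G", "P", "A", "C"]


def transition_row(outcome_table):
    """Estimate one row of P from an |E| x |D| table of next states.

    outcome_table[e][d] is the next hidden state when event e occurs
    under defense measure d. ASSUMPTION: all cells are equally likely.
    """
    cells = [state for row in outcome_table for state in row]
    counts = Counter(cells)
    return [counts[s] / len(cells) for s in STATES]


# Hypothetical table for current state G: 3 events (rows) x 2 defenses (columns).
table_G = [
    ["G", "G"],
    ["P", "G"],
    ["A", "P"],
]
row_G = transition_row(table_G)
```

Repeating this for the tables of the other three current states and stacking the rows would give a 4 x 4 row-stochastic matrix of the same shape as formula (8).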

Network State Transfer Matrix Modification Based on Defense Efficiency.
Network security equipment defense efficiency describes the extent to which network defense equipment and basic network equipment can still provide adequate defense or external service when their availability is degraded by high load, attacks, or other causes. To assess the defense efficiency of network security equipment, the following factors are selected: number of connections, bandwidth utilization, CPU utilization, and memory utilization. These situation assessment factors are quantified in the following steps:
(1) Establish the hierarchy structure: by analyzing the relationship between network efficiency and CPU utilization, memory utilization, network bandwidth utilization, and the number of connections, the two-layer structure shown in Figure 3 is established.
(2) Construct the judgment matrix and assign values: assuming that network bandwidth utilization is more important than the number of connections, the score is 3; conversely, the score of the number of connections relative to network bandwidth utilization is 1/3. Memory utilization and CPU utilization are rated equally relative to the number of connections, each with a score of 2. The resulting judgment matrix is shown in Table 2, in which the data are for illustrative purposes only.
(3) Weight calculation and consistency test: the analysis results obtained with the SPSSAU online analysis tool are shown in Table 3. The weight vector $w_i$ can also be obtained by the arithmetic mean method according to the following formula:

$$w_i = \frac{1}{n} \sum_{j=1}^{n} \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}},$$

where $a_{ij}$ is the comparison score in the judgment matrix and $n$ is the number of indicators.
Once the weights have been calculated and the judgment matrix has passed the consistency test, the overall defense efficiency $F$ of the current network is calculated according to the following formula:

$$F_t = \sum_{i=1}^{n} w_i f_i^{t},$$

where $f_i^{t}$ represents the normalized standard value of indicator $i$ at the current time $t$.
(4) Within the current period, the average state of each index is used to measure the defense efficiency of that period.
(5) The probability of a successful attack differs with the defense efficiency, so the state transition matrix differs under different conditions. A modification vector $c^{t}$ is therefore derived from the defense efficiency, with one component $c^{t}_{j}$ per hidden state, where $j$ indicates the hidden state of the system.
(6) The probability transition matrix $P_t$ is modified according to the modification vector $c^{t}$.
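The weighting steps above can be sketched in a few lines of Python. The judgment matrix below uses the scores quoted in step (2) (3, 1/3, and 2); the remaining entries, the random index value 0.90 (Saaty's commonly tabulated value for four criteria), the weighted-sum form of the efficiency, and the indicator values in `f_now` are assumptions made for illustration.

```python
def ahp_weights(A):
    """Arithmetic mean method: normalise each column, then average each row."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency_ratio(A, w, random_index=0.90):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(A)
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / random_index


def defense_efficiency(w, f):
    """Weighted sum of normalised indicator values (assumed form of F)."""
    return sum(wi * fi for wi, fi in zip(w, f))


# Order: connections, bandwidth, CPU, memory.
# Entries not quoted in the text are made up but keep the matrix reciprocal.
A = [
    [1.0, 1 / 3, 1 / 2, 1 / 2],
    [3.0, 1.0, 2.0, 2.0],
    [2.0, 1 / 2, 1.0, 1.0],
    [2.0, 1 / 2, 1.0, 1.0],
]
weights = ahp_weights(A)
cr = consistency_ratio(A, weights)

# Hypothetical normalised indicator values, where 1.0 means fully healthy.
f_now = [0.9, 0.6, 0.8, 0.7]
F = defense_efficiency(weights, f_now)
```

A consistency ratio below 0.1 is the usual acceptance threshold for a judgment matrix, which matches the criterion applied in the case study.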

Solution of Network Hidden State Sequence Based on Improved Viterbi Algorithm.
The Viterbi algorithm is a dynamic programming algorithm that is usually used to find the hidden state sequence most likely to have produced an observed event sequence under a hidden Markov model [20]. In this paper, the probability distribution of the hidden states of the system at each time in HMMP is obtained following the idea of the Viterbi algorithm. The solution procedure is as follows.
For the network security situation assessment model $\lambda = (S, V, P, Q, \Pi, F, C)$, suppose the observation vector sequence is $Y = (y_1, \ldots, y_t, \ldots, y_T)$. Take the first observation $y_1$; $c_1(i)$ in the initial state is calculated as follows:

$$c_1(i) = \alpha_1(s_i) = P(x_1 = s_i \mid y_1, \lambda), \tag{13}$$

where $\alpha_1(s_i)$ represents the probability that the system is in state $s_i$ at time $t = 1$ (i.e., the initial time), and $P(x_1 = s_i \mid y_1, \lambda)$ represents the probability that the system is in state $s_i$ given that $y_1$ is observed and the model parameter is $\lambda$.
By the conditional probability formula,

$$P(x_1 = s_i \mid y_1, \lambda) = \frac{P(y_1 \mid x_1 = s_i, \lambda)\, P(x_1 = s_i \mid \lambda)}{P(y_1 \mid \lambda)}. \tag{14}$$

Given the model $\lambda$, the probability of observing $y_1$ equals the sum, over all states, of the probability of observing $y_1$ in that state multiplied by the probability that the system is in that state:

$$P(y_1 \mid \lambda) = \sum_{s_j \in S} P(y_1 \mid x_1 = s_j, \lambda)\, P(x_1 = s_j \mid \lambda). \tag{15}$$

From formulas (14) and (15) we get

$$P(x_1 = s_i \mid y_1, \lambda) = \frac{P(y_1 \mid x_1 = s_i, \lambda)\, P(x_1 = s_i \mid \lambda)}{\sum_{s_j \in S} P(y_1 \mid x_1 = s_j, \lambda)\, P(x_1 = s_j \mid \lambda)}. \tag{16}$$

Substituting the specific model parameters $\lambda$, we get

$$\alpha_1(s_i) = \frac{q_i(y_1)\, \pi_i}{\sum_{j=1}^{N} q_j(y_1)\, \pi_j}. \tag{17}$$

Formula (17) gives the probability of the system being in each state at the initial time, which forms the system probability matrix at the initial time, $X_1 = [\alpha_1(s_i)]_{1 \times N}$, $s_i \in S$.

Exhaustive enumeration requires too many operations, so a recursive method is used to simplify the computation of the subsequent state probability vectors. Assume that at time $t$ the system probability matrix is $X_t = [\alpha_t(s_i)]_{1 \times N}$, $s_i \in S$, where $\alpha_t(s_i) = P(x_t = s_i \mid y_1, y_2, \ldots, y_t, \lambda)$ represents the probability that the system is in state $s_i$ given the model $\lambda$ and the observation vector sequence $y_1, y_2, \ldots, y_t$. To solve for the system state probability matrix $X_{t+1}$ at time $t + 1$, let

$$\alpha_{t+1}(s_i) = P(x_{t+1} = s_i \mid y_1, y_2, \ldots, y_t, y_{t+1}, \lambda), \tag{18}$$

where $\alpha_{t+1}(s_i)$ indicates the probability that the system is in state $s_i$ at time $t + 1$ given the model $\lambda$ and the observations up to and including $y_{t+1}$. In the same way as for the initial time,

$$\alpha_{t+1}(s_i) = \frac{P(y_{t+1} \mid x_{t+1} = s_i, \lambda)\, P(x_{t+1} = s_i \mid y_1, y_2, \ldots, y_t, \lambda)}{\sum_{s_j \in S} P(y_{t+1} \mid x_{t+1} = s_j, \lambda)\, P(x_{t+1} = s_j \mid y_1, y_2, \ldots, y_t, \lambda)}, \tag{19}$$

where $P(y_{t+1} \mid x_{t+1} = s_i, \lambda)$ indicates the probability that $y_{t+1}$ is observed when the model is $\lambda$ and the system is in state $s_i$. This probability is $q_i(y_{t+1})$, taken from the observation probability matrix of model $\lambda$. $P(x_{t+1} = s_i \mid y_1, y_2, \ldots, y_t, \lambda)$ indicates the probability that the system is in state $s_i$ at time $t + 1$ given the observations up to time $t$. It therefore remains only to compute this predicted probability.

Figure 2: Hidden state transition model.
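From the definition of the state transition matrix, the probability that the system is in state $s_i$ at time $t + 1$ equals the sum, over all states $s_k$, of the probability that the system is in state $s_k$ at time $t$ multiplied by the probability that the system transfers from $s_k$ to $s_i$:

$$P(x_{t+1} = s_i \mid y_1, y_2, \ldots, y_t, \lambda) = \sum_{s_k \in S} P(x_t = s_k \mid y_1, y_2, \ldots, y_t, \lambda)\, P(x_{t+1} = s_i \mid x_t = s_k, \lambda), \tag{20}$$

where $P(x_t = s_k \mid y_1, y_2, \ldots, y_t, \lambda)$ indicates the probability that the system is in state $s_k$ given the model $\lambda$ and the observation vector sequence $y_1, y_2, \ldots, y_t$, which is the previously assumed quantity $\alpha_t(s_k)$, and $P(x_{t+1} = s_i \mid x_t = s_k, \lambda)$ indicates the probability that the system transfers to state $s_i$ at the next time step when the model is $\lambda$ and the current state is $s_k$, which is $p_{ki}$. We get

$$P(x_{t+1} = s_i \mid y_1, y_2, \ldots, y_t, \lambda) = \sum_{s_k \in S} \alpha_t(s_k)\, p_{ki}. \tag{21}$$

For convenience, let $\beta_{t+1}(s_i) = P(x_{t+1} = s_i \mid y_1, y_2, \ldots, y_t, \lambda)$, so that

$$\beta_{t+1}(s_i) = \sum_{s_k \in S} \alpha_t(s_k)\, p_{ki}. \tag{22}$$

Substituting into the expression for $\alpha_{t+1}(s_i)$, we get

$$\alpha_{t+1}(s_i) = \frac{q_i(y_{t+1})\, \beta_{t+1}(s_i)}{\sum_{s_j \in S} q_j(y_{t+1})\, \beta_{t+1}(s_j)}. \tag{23}$$

Then, the system state probability distribution at time $t + 1$ is

$$X_{t+1} = [\alpha_{t+1}(s_i)]_{1 \times N}, \quad s_i \in S. \tag{24}$$

Finally, combining the above, we get

$$\alpha_1(s_i) = \frac{q_i(y_1)\, \pi_i}{\sum_{j=1}^{N} q_j(y_1)\, \pi_j}, \qquad \alpha_{t+1}(s_i) = \frac{q_i(y_{t+1}) \sum_{s_k \in S} \alpha_t(s_k)\, p_{ki}}{\sum_{s_j \in S} q_j(y_{t+1}) \sum_{s_k \in S} \alpha_t(s_k)\, p_{kj}}. \tag{25}$$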
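The claim that the recursion replaces an exhaustive computation can be checked numerically on a toy model. The sketch below, using made-up two-state parameters, computes the final state distribution once by the recursion and once by summing over every possible hidden state path. The function names are illustrative, and the defense-efficiency modification of the transition matrix is deliberately left out.

```python
from itertools import product


def filter_recursive(pi, P, Q, obs):
    """State distribution at the last time step via the alpha/beta recursion."""
    n = len(pi)
    unnorm = [Q[i][obs[0]] * pi[i] for i in range(n)]       # initial step
    alpha = [u / sum(unnorm) for u in unnorm]
    for y in obs[1:]:
        # Prediction: beta(s_i) = sum_k alpha(s_k) * p_ki
        beta = [sum(alpha[k] * P[k][i] for k in range(n)) for i in range(n)]
        # Correction with the new observation, then normalisation.
        unnorm = [Q[i][y] * beta[i] for i in range(n)]
        alpha = [u / sum(unnorm) for u in unnorm]
    return alpha


def filter_exhaustive(pi, P, Q, obs):
    """Same distribution by enumerating all N**T hidden state paths."""
    n = len(pi)
    mass = [0.0] * n
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * Q[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= P[path[t - 1]][path[t]] * Q[path[t]][obs[t]]
        mass[path[-1]] += p          # accumulate by final state
    return [m / sum(mass) for m in mass]


# Made-up parameters for illustration.
PI = [0.8, 0.2]
P = [[0.9, 0.1], [0.3, 0.7]]
Q = [[0.7, 0.3], [0.2, 0.8]]
obs = [0, 1, 1, 0]

recursive = filter_recursive(PI, P, Q, obs)
exhaustive = filter_exhaustive(PI, P, Q, obs)
```

The recursion costs on the order of $T N^2$ operations, whereas the enumeration visits $N^T$ paths, which is why only the former is practical for the 288 periods of the case study.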

Algorithm Pseudocode Description.
From the previous section, the steps for solving the hidden state probability distribution of the system in HMMP are as follows:
(1) Check whether all observation vectors have been read; if so, go to step 8; otherwise, go to step 2.
(2) Obtain the current observation vector $y_t$ and check whether the current time is the initial time; if so, go to step 3; otherwise, go to step 5.
(3) Calculate the conditional probability of each hidden state given that $y_1$ is observed, using the formula $\alpha_1(s_i) = \frac{q_i(y_1)\, \pi_i}{\sum_{j=1}^{N} q_j(y_1)\, \pi_j}$.
(4) Arrange the conditional probabilities of the hidden states, in order, into the matrix $X_t$ and store it in the final hidden state probability distribution sequence $X$; then return to step 1.
(5) Calculate the hidden state probability distribution of the system without considering the observation vector $y_t$, using the formula $\beta_t(s_i) = \sum_{s_k \in S} p_{ki} \cdot c^{t}_{k} \cdot \alpha_{t-1}(s_k)$.
(6) Calculate the conditional probability of each hidden state given that $y_t$ is observed, using the formula $\alpha_t(s_i) = \frac{q_i(y_t)\, \beta_t(s_i)}{\sum_{s_j \in S} q_j(y_t)\, \beta_t(s_j)}$.
(7) Arrange the conditional probabilities of the hidden states, in order, into the matrix $X_t$ and store it in the final hidden state probability distribution sequence $X$; then return to step 1.
(8) Output the final hidden state probability distribution sequence $X$ and end the program.

The pseudocode of Algorithm 1 is as follows:

ALGORITHM 1: Algorithm for solving the hidden state probability distribution sequence.
Input: model λ of HMMP, observation vector sequence Y, real-time efficiency sequence F
Output: final hidden state probability distribution sequence X
for t = 1 to T do
    if t == 1 then
        compute α_1(s_i) for every hidden state s_i (step 3)
    else
        compute β_t(s_i) for every hidden state s_i (step 5)
        compute α_t(s_i) for every hidden state s_i (step 6)
    end if
    append X_t = [α_t(s_i)] to X (steps 4 and 7)
end for
return X
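A runnable counterpart to Algorithm 1 might look like the following sketch. Because the exact rule for modifying the transition matrix by the defense efficiency is not reproduced here, the function takes a hypothetical callback `transition_at(t)` that is assumed to return the already-modified matrix for time `t`; the numeric parameters are made up, with states ordered G, P, A, C and observations ordered compromise, scan, attack, none, suspicious.

```python
def solve_state_distributions(pi, Q, observations, transition_at):
    """Return X = [X_1, ..., X_T]; X_t[i] is alpha_t(s_i).

    transition_at(t) is a HYPOTHETICAL hook returning the transition
    matrix to use for the step into time t (t >= 2), i.e. the matrix
    after any defense-efficiency modification has been applied.
    """
    n = len(pi)
    X = []
    for t, y in enumerate(observations, start=1):
        if t == 1:
            prior = pi                                    # step 3
        else:
            P_t = transition_at(t)                        # step 5
            prev = X[-1]
            prior = [sum(prev[k] * P_t[k][i] for k in range(n))
                     for i in range(n)]
        unnorm = [Q[i][y] * prior[i] for i in range(n)]   # steps 3 and 6
        total = sum(unnorm)
        if total == 0:
            raise ValueError(f"observation at t={t} has zero probability")
        X.append([u / total for u in unnorm])             # steps 4 and 7
    return X


# Made-up parameters for illustration only.
PI = [0.97, 0.01, 0.01, 0.01]
P = [[0.70, 0.20, 0.08, 0.02],
     [0.30, 0.40, 0.25, 0.05],
     [0.10, 0.20, 0.50, 0.20],
     [0.05, 0.05, 0.20, 0.70]]
Q = [[0.01, 0.04, 0.02, 0.90, 0.03],
     [0.02, 0.60, 0.10, 0.20, 0.08],
     [0.05, 0.10, 0.60, 0.10, 0.15],
     [0.60, 0.05, 0.15, 0.10, 0.10]]

# Observation indices: none, scan, attack, compromise.
Y = [3, 1, 2, 0]
X = solve_state_distributions(PI, Q, Y, transition_at=lambda t: P)
```

Passing a constant matrix, as in the usage line, recovers the unmodified recursion; a time-varying callback is where a defense-efficiency correction would be plugged in.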

Calculation of Risk Loss Vector.
According to the national standard definition, risk is a function of the likelihood that the system is attacked and the degree of loss when the attack occurs. The previous section calculated the hidden state probability distribution sequence, which gives the probability that the system is under attack. This section calculates how much loss the system suffers in each state, which is called the risk loss vector. The first step is asset evaluation. The three security attributes used in asset evaluation are confidentiality, integrity, and availability. According to the national standard GB/T20984, assets are classified into five levels; the more important the asset, the higher its level. The second step is severity level classification. Most network equipment now produces syslog-type logs, and syslog divides severity into eight levels, as shown in Table 4.
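For reference, the eight syslog severity levels as defined in RFC 5424 can be written as a small enumeration. Whether Table 4 uses exactly these names is an assumption, and the helper `severity_from_pri` is added only to show how the level is encoded in a syslog priority value.

```python
from enum import IntEnum


class SyslogSeverity(IntEnum):
    """Syslog severity levels (RFC 5424); lower numbers are more severe."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def severity_from_pri(pri: int) -> SyslogSeverity:
    """Extract the severity from a PRI value, where PRI = facility * 8 + severity."""
    return SyslogSeverity(pri % 8)


level = severity_from_pri(34)   # facility 4, severity 2
```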

The Current Network State Assessment.
The risk that the current system faces is calculated from the hidden state probability distribution sequence $X$ and the loss incurred when the system is in each state. The risk value of the current network state can be calculated by the following formula:

$$R_t = \sum_{i=1}^{N} c_t(i)\, C(i),$$

where $R_t$ denotes the network risk value at time $t$, $c_t(i)$ indicates the probability that the system is in state $s_i$ at time $t$, and $C(i)$ indicates the risk that the system faces in state $s_i$.

Figure 5: CPU utilization sequence.
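As a quick numerical illustration, the risk value is the expected loss under the current state distribution. The sketch below uses the risk loss vector reported in the case study, $C = (1, 2.6, 6.3, 14.7)$; the two state distributions and the function name `risk_value` are hypothetical.

```python
# Risk loss per hidden state, taken from formula (28) of the case study.
RISK_LOSS = {"G": 1.0, "P": 2.6, "A": 6.3, "C": 14.7}


def risk_value(distribution):
    """Expected risk: sum over states of probability times risk loss."""
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(distribution[s] * RISK_LOSS[s] for s in RISK_LOSS)


# Hypothetical distributions: a safe network and a maximally uncertain one.
safe = risk_value({"G": 1.0, "P": 0.0, "A": 0.0, "C": 0.0})
uncertain = risk_value({"G": 0.25, "P": 0.25, "A": 0.25, "C": 0.25})
```

With this vector the risk value always lies between 1 (certainly safe) and 14.7 (certainly compromised).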

Case Study
To verify the rationality of the evaluation method proposed in this paper, a quantitative evaluation of the network security situation was carried out using the IDS data of a certain department in a real environment on November 3, 2015. The network topology is shown in Figure 4. The experimental network is connected to the Internet through a router, behind which a firewall, an intrusion prevention system, and other security defense systems are deployed; the local area network includes the business office area and the server area. The experimental data include security event alarm records from the firewall, the intrusion detection system, and the server hosts, together with operating records of the efficiency indexes.

First, the data, such as the IPS data, are imported into the database; Table 5 shows part of the original IPS data.
The data are then organized into a standard weblog format and the required fields are extracted: time, attack name, source IP, source port, destination port, and attack level, as shown in Table 6. The time interval of the test is 5 minutes. The calculation, based on the alarm importance calculation method, proceeds as follows.
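Grouping alarm records into 5-minute periods can be sketched as follows. The record layout (a dictionary with `time` and `attack_name` keys) and the sample records are hypothetical; only the 5-minute interval comes from the text, which yields 24 × 60 / 5 = 288 periods per day.

```python
from collections import defaultdict
from datetime import datetime

PERIOD_MINUTES = 5
PERIODS_PER_DAY = 24 * 60 // PERIOD_MINUTES


def period_index(ts: datetime) -> int:
    """Zero-based index of the 5-minute period of the day containing ts."""
    return (ts.hour * 60 + ts.minute) // PERIOD_MINUTES


def group_by_period(records):
    """Map period index -> list of alarm records falling in that period."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[period_index(rec["time"])].append(rec)
    return buckets


# Hypothetical alarm records.
records = [
    {"time": datetime(2015, 11, 3, 0, 1, 12), "attack_name": "SMTP attachment"},
    {"time": datetime(2015, 11, 3, 0, 4, 59), "attack_name": "SQL injection"},
    {"time": datetime(2015, 11, 3, 0, 5, 0), "attack_name": "SMTP attachment"},
]
buckets = group_by_period(records)
```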
There are 43 alarm logs in the period from 0:00 to 0:05 on 2015/11/3. Of these, 42 are SMTP e-mail attachment vulnerability alarms, whose attack level is prompt and which occurred for the first time. One is a Web application SQL injection attack alarm, whose attack level is prompt and which occurred for the first time. The importance of the SMTP e-mail attachment vulnerability is therefore 42 * 1 + 42 * 25 + 1 * 25 = 1117, and the importance of the SQL injection attack is 1 * 1 + 1 * 25 + 1 * 25 = 51. So, the main alarm in this period is the SMTP e-mail attachment vulnerability.
There are 56 alarm logs in the period from 0:05 to 0:10 on 2015/11/3. Of these, 51 are SMTP e-mail attachment vulnerability alarms, whose attack level is prompt and which had already occurred in the previous period. Five are Web application SQL injection attack alarms, whose attack level is prompt and which are treated as occurring for the first time. The importance of the SMTP e-mail attachment vulnerability in this period is therefore 51 * 1 + 51 * 1 + 1 * 1 = 103, and the importance of the SQL injection attack is 5 * 1 + 5 * 25 + 1 * 25 = 155. So, the main alarm in this period is the Web application SQL injection attack.
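The arithmetic of the two periods can be reproduced with a short sketch. The reading of the three terms (alarm count times a level weight, alarm count times a novelty weight, and one further novelty term) is an assumption inferred from the numbers quoted above, since the alarm importance calculation method itself is not spelled out here; the function and parameter names are illustrative.

```python
def alarm_importance(count, level_weight, novelty_weight):
    """Reproduce the three-term sums quoted in the text.

    ASSUMED reading: level_weight is 1 for the 'prompt' level, and
    novelty_weight is 25 for an alarm counted as first-time and 1 for
    an alarm already seen in the previous period.
    """
    return count * level_weight + count * novelty_weight + 1 * novelty_weight


def main_alarm(importances):
    """Name of the alarm type with the highest importance."""
    return max(importances, key=importances.get)


period_1 = {
    "SMTP e-mail attachment vulnerability": alarm_importance(42, 1, 25),
    "SQL injection attack": alarm_importance(1, 1, 25),
}
period_2 = {
    "SMTP e-mail attachment vulnerability": alarm_importance(51, 1, 1),
    "SQL injection attack": alarm_importance(5, 1, 25),
}
```

Under this reading, a repeated alarm loses most of its weight in the next period, which is why a handful of new SQL injection alarms can outrank 51 repeated SMTP alarms.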
The main alarm vector of each of the 288 time periods is obtained in turn to form the alarm vector sequence, as shown in Table 7. The next step is to build the Markov model. First, the initial state transition matrix is constructed according to the security events and the defense strategy:

$$P = \begin{bmatrix} P_{GG} & P_{GP} & P_{GA} & P_{GC} \\ P_{PG} & P_{PP} & P_{PA} & P_{PC} \\ P_{AG} & P_{AP} & P_{AA} & P_{AC} \\ P_{CG} & P_{CP} & P_{CA} & P_{CC} \end{bmatrix}.$$

The main efficiency indexes for this period, including CPU utilization, memory utilization, bandwidth utilization, and connection utilization, are extracted from the firewall, the IPS, and the host, as shown in Figures 5-8, respectively.
The weight of each parameter is analyzed using the Analytic Hierarchy Process as follows. CPU utilization is as important as memory utilization. Because the main task of the network is to provide external services, network bandwidth utilization and the number of connections are more important than CPU utilization and memory utilization. The department prohibits large-flow operations but allows many users to access the service at the same time, so the number of connections is slightly more important than bandwidth utilization. The resulting relation matrix is shown in Table 8.
Processing this matrix gives CR = 0.0571 < 0.1, which meets the consistency standard. The weights are shown in Table 9.
Combining the efficiency indexes with these weights gives the final defense efficiency curve, as shown in Figure 9.
The hidden state probability distribution curve is shown in Figure 10.
The network risk vector is then obtained in a similar way.
According to the confidentiality, integrity, and availability attributes of the national standard, the asset importance attributes are shown in Table 10.
Combined with the losses caused by security events, the risk loss vector is obtained as follows:

$$C = (C_G, C_P, C_A, C_C) = (1, 2.6, 6.3, 14.7). \tag{28}$$

Finally, we obtain the network situation curve shown in Figure 11.
Figure 11 shows that in the period from 18 min to 20 min, the number of host connections is almost saturated, the host cannot provide services, and the host is actually in the compromised state. At this point, the network risk value is also at its highest, in line with the actual situation.
Using the security event information above and the network situation diagram, network administrators can clearly understand the global network security events occurring at that time and control the network situation in real time.

Conclusions
This paper studies network situation assessment technology based on HMMP, with the aim of enabling network administrators to control the global network state in real time when faced with multisource logs. To achieve this goal, a method for generating the state transition matrix in HMMP is designed. Through this method, the system can automatically modify the state transition matrix in real time according to the network state and obtain the hidden state probability distribution sequence of the current system using the improved Viterbi algorithm. Finally, the network risk value is obtained using the method for calculating the risk loss vector. The experiment shows that the HMMP-based security assessment in this paper can describe the current network state precisely and comprehensively.

Data Availability
The data set can be obtained free of charge from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

Conflicts of Interest
The authors declare that they have no conflicts of interest.