
A network security situation assessment system based on an extended hidden Markov model is designed in this paper. First, the standard hidden Markov model is extended from a five-tuple to a seven-tuple by adding two parameters, network defense efficiency and the risk loss vector, so that the model can describe the network security situation more completely. Then, an initialization algorithm for the state transition matrix is defined; observation vectors are extracted by fusing various system security detection data; the network state transition matrix is created and modified using the observation vectors; and a procedure for solving the hidden state probability distribution sequence based on the extended hidden Markov model is derived. Finally, a method for calculating the risk loss vector according to the international definition is designed, the current network risk value is calculated from the hidden state probability distribution, and the global security situation is assessed. The experiment shows that the model meets the needs of practical applications and that its assessment results are accurate and effective.

With the widespread use of Internet technology, network security has gradually attracted public attention. Attacks on the network are increasingly complex although the defense measures based on the intrusion detection [

Situation awareness technology was first used in the military field [

Research on network security situation assessment started relatively late in China. Wei et al. proposed a network security situation assessment model based on information fusion, which used an improved D-S evidence theory to fuse multisource information [

An analysis of network security situation assessment models in China and abroad shows that several problems remain. First, the state transition matrix is generally derived from administrators' experience, so it is highly subjective and depends on the administrator's own ability. Second, because these models lack two parameters, network defense capability and risk loss, the hidden state vector sequence computed from the observation vector sequence is prone to deviation. In this paper, we propose an improved hidden Markov model that extends the five-tuple of the traditional hidden Markov model to a seven-tuple; we call the new model HMM-Plus, or HMMP for short. The system fuses a variety of security detection data and extracts the main attack logs from network security equipment to form the observation vector sequence. It then corrects the state transition matrix using the real-time state and forms the hidden state probability distribution sequence with an improved Viterbi algorithm. Finally, it combines the network topology and network asset information with the hidden state probability distribution to calculate the current network risk value and assess the global security situation of the current network, substantially improving the analysis and processing capability of network security products across multiple indicators.

Network security situation assessment model [

Information system assessment aims to understand the current and future risks of the system; assess the potential threats and the extent of the harm caused by these risks; and provide the basis for security decision-making, information system construction, and safe operation. Information system assessment process [

Information system assessment process.

Before introducing the HMMP model, let us first introduce the hidden Markov model (HMM). For an HMM, we assume that $S = \{S_1, S_2, \ldots, S_N\}$,

For a sequence of length $T$, let $X = \{x_1, x_2, \ldots, x_T\}$ and $O = \{o_1, o_2, \ldots, o_T\}$, where

The HMM rests on two important assumptions:

The homogeneous Markov chain assumption: the hidden state at any time depends only on the hidden state at the previous time. This keeps the model simple and easy to solve. If the hidden state at time $t$ is $x_t = S_i$ and the hidden state at time $t+1$ is $x_{t+1} = S_j$, then the state transition probability $a_{ij}$ from time $t$ to time $t+1$ is

$$a_{ij} = P(x_{t+1} = S_j \mid x_t = S_i).$$

The probabilities $a_{ij}$ form the state transition matrix $A = [a_{ij}]_{N \times N}$.

The observation independence assumption: the observed state at any time depends only on the hidden state at that time. If the hidden state at time $t$ is $x_t = S_j$ and the corresponding observed state at time $t$ is $o_t = v_k$, then

$$b_j(k) = P(o_t = v_k \mid x_t = S_j).$$

These probabilities form the observation probability matrix $B = [b_j(k)]$.

In addition, we need the initial hidden state probability distribution $\Pi = (\pi_1, \pi_2, \ldots, \pi_N)$ at time $t = 1$,

where $\pi_i = P(x_1 = S_i)$.

An HMM is therefore determined by the initial hidden state probability distribution $\Pi$, the state transition probability matrix $A$, and the observation probability matrix $B$.
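Under the two assumptions above, the probability of an observation sequence together with a hidden state sequence factorizes in the standard HMM way. The equation below uses $x_t$ for the hidden state and $o_t$ for the observation at time $t$, $\pi_i$ for the initial probabilities, $a_{ij}$ for the transition probabilities, and $b_j(o_t)$ for the probability that state $S_j$ emits $o_t$; this is standard notation assumed for illustration, not necessarily the paper's own symbols.

$$P(O, X \mid A, B, \Pi) = \pi_{x_1}\, b_{x_1}(o_1) \prod_{t=2}^{T} a_{x_{t-1} x_t}\, b_{x_t}(o_t), \qquad P(O \mid A, B, \Pi) = \sum_{X} P(O, X \mid A, B, \Pi).$$

The first factor uses $\Pi$, each $a$ term follows from the homogeneous Markov assumption, and each $b$ term from observation independence. Summing directly over all $N^T$ hidden state sequences is exponential in $T$, which is why recursive methods are used in practice.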

Extending the traditional HMM to carry out research in related fields is currently an active topic [

The four hidden states of the model are:

$S_1$: Safe State

$S_2$: Probe State

$S_3$: Attack State

$S_4$: Compromise State

Compromise log: this type of log indicates that an attacker has successfully gained administrator privileges.

Scan log: this type of log indicates that the system has been scanned.

Attack log: this type of log indicates that the system has been attacked.

No log: no network security device on the network has generated a log.

Suspicious log: the log cannot be reliably classified into any of the above types.

The state transition matrix is $A = \{a_{ij}\}$, where $x_t = S_i$ means the network is in hidden state $S_i$ at time $t$, and $x_{t+1} = S_j$ means the network is in hidden state $S_j$ at time $t+1$.

This section introduces the initialization algorithm for the hidden state transition matrix. Unlike traditional algorithms, which rely on expert knowledge, it uses game theory to assess the transitions between hidden states and thereby determine an initial state transition matrix.

The hidden state transition model is shown in Figure

Hidden state transition model.

These circles represent the system states. According to the definition of this model, there are four hidden states: Safe State, Probe State, Attack State, and Compromise State. When the hidden state is $S_i$, if a security event occurs in the network at this moment, the network enters hidden state $S_j$ at the next time step. The process can be expressed as

When the current state is $S_i$, the hidden state transition can be described by the following

The attack and defense game matrix under security state $S_i$.

Defense measure | $E_1$ | … | $E_m$ |
---|---|---|---|

$D_1$ | $S_j$ | … | $S_k$ |

… | … | … | … |

$D_n$ | $S_o$ | … | $S_p$ |

$E_1$ to $E_m$ indicate security events, and $D_1$ to $D_n$ indicate network defense measures. $S_j$ indicates that when the network state is $S_i$ and the network defense measure is $D_1$, the network state transfers to $S_j$ if security event $E_1$ occurs. Similarly, $S_o$ indicates that when the network state is $S_i$ and the network defense measure is $D_n$, the network state transfers to $S_o$ if security event $E_1$ occurs.

The distribution of Safe State transitions can be seen intuitively from the matrix

The Safe State transition vector

Establishing state transition matrix for all states which are
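A game matrix like the one above can be collapsed into one row of an initial transition matrix by adding up the probability of every (defense measure, security event) cell that leads to each target state. The sketch below does this under an assumption not stated in the text: defense measures and security events occur independently with given probabilities, uniform by default. The function `transition_row` and the example matrix are hypothetical.

```python
from collections import defaultdict

# Hidden states of the model, in matrix order.
STATES = ["Safe", "Probe", "Attack", "Compromise"]


def transition_row(game, defense_probs=None, event_probs=None):
    """Collapse an attack-defense game matrix for one current state into a
    transition probability row.

    game[d][e] is the state reached when defense measure d meets event e.
    Assumption: defense d and event e occur independently with the given
    probabilities (uniform when omitted).
    """
    n_def, n_evt = len(game), len(game[0])
    p_d = defense_probs if defense_probs is not None else [1.0 / n_def] * n_def
    q_e = event_probs if event_probs is not None else [1.0 / n_evt] * n_evt
    mass = defaultdict(float)
    for d in range(n_def):
        for e in range(n_evt):
            mass[game[d][e]] += p_d[d] * q_e[e]
    # States never reached get probability 0.
    return [mass[s] for s in STATES]


# Hypothetical game matrix for the Safe State: 2 defenses x 3 events.
safe_game = [["Safe", "Probe", "Attack"],
             ["Safe", "Safe", "Probe"]]
print(transition_row(safe_game))  # roughly [0.5, 0.333, 0.167, 0.0]
```

Repeating this for each current state, one row at a time, yields a full row-stochastic initial matrix.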

Network security equipment defense efficiency reflects the extent to which network defense equipment and basic network equipment can still provide sufficient defense or external service when their availability is degraded by high load, attacks, or other causes. To assess it, we select four factors: number of connections, bandwidth utilization, CPU utilization, and memory utilization.

These situation assessment factors are quantified in the following steps:

Establish the hierarchy structure:

By analyzing the relationship between network defense efficiency and CPU utilization, memory utilization, network bandwidth utilization, and the number of connections, we establish the following two-layer structure, as shown in Figure

Construct the judgment matrix and assign values:

Assuming that network bandwidth utilization is more important than the number of connections, its score is 3; conversely, the number of connections scores the reciprocal, 1/3, against bandwidth utilization. Memory utilization and CPU utilization are each slightly more important than the number of connections and are scored 2. The resulting judgment matrix is shown in Table

Hierarchy structure of network equipment defense efficiency.

Network equipment defense efficiency judgment matrix.

Item | Number of connections | Bandwidth utilization | Memory utilization | CPU utilization |
---|---|---|---|---|

Number of connections | 1 | 1/3 | 1/2 | 1/2 |

Bandwidth utilization | 3 | 1 | 2 | 2 |

Memory utilization | 2 | 1/2 | 1 | 2 |

CPU utilization | 2 | 1/2 | 1/2 | 1 |

Weight calculation and consistency test:

The analysis was carried out with the SPSSAU online analysis tool; the results are shown in Table

where $a_{ij}$ is the comparison score of the judgment matrix.

AHP analysis results.

Item | Eigenvector | Weight value (%) | The largest eigenvalue | CI | RI | CR |
---|---|---|---|---|---|---|

Number of connections | 0.484 | 12.094 | 4.071 | 0.024 | 0.900 | 0.026 |

Bandwidth utilization | 1.667 | 41.680 | ||||

Memory utilization | 1.078 | 26.948 | ||||

CPU utilization | 0.771 | 19.278 |

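Here, CR = CI/RI. CI is computed from the largest eigenvalue obtained together with the eigenvector as $CI = (\lambda_{\max} - n)/(n - 1)$, here $(4.071 - 4)/3 \approx 0.024$, and RI is the random consistency index, read directly from a lookup table (0.900 for $n = 4$). The corresponding CR is 0.026 < 0.1, so the judgment matrix passes the consistency test. SPSSAU prints this result directly, along with the consistency test.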
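The weight and consistency calculation can be reproduced without SPSSAU. The sketch below is a minimal version, assuming the weights are the normalized principal eigenvector of the judgment matrix (found by power iteration) and using Saaty's random index values; the function name `ahp` is ours.

```python
# Saaty's random consistency index RI, indexed by matrix order n.
RANDOM_INDEX = {2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp(matrix, iterations=200):
    """Return (weights, lambda_max, CI, CR) for a pairwise judgment matrix
    of order 2 to 5. Weights are the normalized principal eigenvector,
    found by power iteration.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # For an eigenvector, (Aw)_i / w_i equals lambda_max for every i; average them.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, lambda_max, ci, cr


# Judgment matrix from the table: connections, bandwidth, memory, CPU.
JUDGMENT = [
    [1, 1 / 3, 1 / 2, 1 / 2],
    [3, 1, 2, 2],
    [2, 1 / 2, 1, 2],
    [2, 1 / 2, 1 / 2, 1],
]
weights, lam, ci, cr = ahp(JUDGMENT)
print([round(x, 3) for x in weights], round(lam, 3), round(cr, 3))
```

With this matrix the sketch gives weights of roughly 0.12, 0.42, 0.27, and 0.19 and $\lambda_{\max} \approx 4.071$, agreeing with the table to within a few thousandths; SPSSAU may use an approximate eigenvector method, which would account for the small differences.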

Now, the weights have been calculated and the judgment matrix satisfies the consistency test. Then, the overall defense efficiency

where $x_t$ represents the normalized standard value of indicator

Within each period, the average value of each indicator is used to measure the defense efficiency of that period.
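Because the exact defense efficiency formula is not preserved above, the following is only a sketch of the per-period calculation, under two stated assumptions: each indicator is already normalized to a load value in [0, 1], and efficiency is the AHP-weighted sum of one minus the period's average load. The weights are the percentages from the AHP results table; the function `defense_efficiency` and the sample readings are hypothetical.

```python
from statistics import mean

# AHP weights from the results table, as fractions.
WEIGHTS = {"connections": 0.12094, "bandwidth": 0.41680,
           "memory": 0.26948, "cpu": 0.19278}


def defense_efficiency(samples):
    """Defense efficiency of one period, in [0, 1].

    samples maps each indicator to its normalized load readings (0 = idle,
    1 = saturated) collected during the period. Assumption: efficiency is the
    weighted sum of (1 - average load), i.e. higher load lowers efficiency.
    """
    return sum(w * (1.0 - mean(samples[name])) for name, w in WEIGHTS.items())


# Hypothetical readings for one period.
period = {"connections": [0.2, 0.4], "bandwidth": [0.5, 0.7],
          "memory": [0.3, 0.3], "cpu": [0.1, 0.3]}
print(round(defense_efficiency(period), 4))
```

An idle system scores 1 (the weights sum to 1) and a fully saturated one scores 0, so the result can be compared across periods.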

The probability of a successful attack varies with defense efficiency, so the state transition matrix differs under different conditions. Now, the modified vector

The probability transition matrix
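The modification formula itself is not preserved here, so the following is one plausible scheme rather than the paper's: a hypothetical modification vector supplies one multiplier per target state, each row of the transition matrix is scaled column by column, and every row is renormalized so it remains a probability distribution. How the multipliers are derived from defense efficiency is left open; `modify_transition` and the numbers are illustrative.

```python
def modify_transition(matrix, factors):
    """Reweight every row of a transition matrix column by column and renormalize.

    factors[j] is a hypothetical per-target-state multiplier, e.g. values
    below 1 for worse states when defense efficiency is high.
    """
    modified = []
    for row in matrix:
        scaled = [p * f for p, f in zip(row, factors)]
        total = sum(scaled)
        if total == 0:
            raise ValueError("row has no probability mass after scaling")
        modified.append([x / total for x in scaled])
    return modified


# Illustrative initial matrix over (Safe, Probe, Attack, Compromise).
A0 = [[0.7, 0.2, 0.1, 0.0],
      [0.3, 0.4, 0.2, 0.1],
      [0.1, 0.2, 0.5, 0.2],
      [0.0, 0.1, 0.3, 0.6]]
A1 = modify_transition(A0, [1.2, 1.0, 0.8, 0.8])
print([[round(p, 3) for p in row] for row in A1])
```

Renormalizing after scaling is what keeps each row summing to 1; multipliers that are all equal leave the matrix unchanged.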

The Viterbi algorithm is a dynamic programming algorithm, usually used to find the hidden state sequence that is most likely to have produced an observed event sequence in a hidden Markov model [
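For reference, a textbook Viterbi decoder for an ordinary HMM is sketched below. It returns the single most likely hidden state path, whereas the HMMP procedure derived next works with probability distributions, so this is background rather than the paper's improved algorithm. Function and argument names and all numbers in the example are illustrative.

```python
def viterbi(obs, start, trans, emit):
    """Most likely hidden state path for an observation index sequence.

    start[i] = P(x_1 = i), trans[i][j] = P(x_{t+1} = j | x_t = i),
    emit[i][k] = P(o_t = k | x_t = i).
    """
    n = len(start)
    delta = [start[i] * emit[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        pointers, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this step.
            best = max(range(n), key=lambda i: delta[i] * trans[i][j])
            pointers.append(best)
            new_delta.append(delta[best] * trans[best][j] * emit[j][o])
        backpointers.append(pointers)
        delta = new_delta
    # Backtrack from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy 2-state example (0 = Safe, 1 = Attack); observations 0 = no log,
# 1 = scan log, 2 = attack log. All numbers are illustrative.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(viterbi([0, 1, 2], start, trans, emit))  # [0, 0, 1]
```

For long sequences, such as the 288 five-minute periods of a day, these products underflow quickly; summing log probabilities instead of multiplying is the usual remedy.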

For the network security situation assessment model _{1}, and the calculation method of

The conditional probability formula is derived as follows:

When the parameter model is

By formulas (

Substituting the specific parameter model

And then using formula (

Because exhaustive enumeration requires too many operations, a recursive method is used to simplify the computation of the subsequent state probability vectors. Assume that at time

Now we only need the probability of the system in the state

For notational convenience, let

Substituting into the formula, we get

Then, the system state probability distribution at time

Finally, combining all of the above, we obtain

And

From the previous section, the steps for solving the system hidden state probability distribution in HMMP are as follows:

1. Check whether the observation vector sequence has been fully read; if so, jump to step 8; otherwise, go to step 2.

2. Obtain the current observation vector

3. Separately calculate the conditional probability of each hidden state when

4. Convert the conditional probability of each hidden state into matrix

5. Calculate the system hidden state probability distribution without considering the observation vector

6. Calculate the conditional probability of each hidden state when

7. Convert the conditional probability of each hidden state into matrix

8. Output the final hidden state probability distribution sequence

The pseudocode of Algorithm

Input: model

Output: final hidden state probability distribution sequence
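As an illustration of the predict-then-condition loop in the steps above, the following sketch implements standard normalized forward filtering for an ordinary HMM: it first computes the distribution without the new observation, then conditions on the observation with Bayes' rule. It is an approximation of the idea, not the authors' improved Viterbi algorithm; the HMMP-specific parts (defense-efficiency correction of the matrix, risk loss) are left out, and the names `filter_distributions`, `start`, `trans`, and `emit` are hypothetical.

```python
def filter_distributions(obs, start, trans, emit):
    """Hidden state probability distribution at each time, given o_1..o_t.

    Each step predicts with the transition matrix (no observation yet),
    then conditions on the new observation with Bayes' rule and normalizes.
    """
    n = len(start)
    result = []
    current = None
    for t, o in enumerate(obs):
        if t == 0:
            prior = list(start)
        else:
            # Prediction: distribution before seeing o_t.
            prior = [sum(current[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update: weight by the emission probability of o_t and normalize.
        joint = [prior[j] * emit[j][o] for j in range(n)]
        z = sum(joint)
        if z == 0:
            raise ValueError(f"observation {o} impossible at t={t}")
        current = [p / z for p in joint]
        result.append(current)
    return result


# Illustrative 2-state model (0 = Safe, 1 = Attack), 3 observation types.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
for dist in filter_distributions([0, 1, 2], start, trans, emit):
    print([round(p, 3) for p in dist])
```

Normalizing at every step keeps the numbers in a safe range even over a full day of periods, which is one reason this form suits a long observation sequence.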

According to the national standard definition, risk is a function of the likelihood that the system is attacked and the degree of loss the system suffers when attacked. The previous section calculated the hidden state probability distribution vector sequence, which gives the probability of the system being attacked. The following sections calculate how much loss the system suffers in each state, which is called the risk loss vector.

The risk loss vector measures the degree of loss of the system in each state.

The first step is asset evaluation.

Asset evaluation considers three security attributes: confidentiality, integrity, and availability. According to the national standard GB/T 20984, assets are classified into five levels; the more important an asset, the higher its level.

The second step is severity level classification.

Most network equipment now uses syslog-format logs; syslog divides severity into eight levels, as shown in Table

Severity levels based on syslog.

Severity level | Log description |
---|---|

0 | Emergency: system is unusable |

1 | Alert: action must be taken immediately |

2 | Critical: critical conditions |

3 | Error: error conditions |

4 | Warning: warning conditions |

5 | Notice: normal but significant condition |

6 | Informational: informational messages |

7 | Debug: debug-level messages |
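In a syslog message the severity is not a separate field: the leading `<PRI>` value encodes both facility and severity as PRI = facility × 8 + severity (RFC 3164 and RFC 5424). A minimal sketch of recovering the severity level from a raw message is shown below; the helper name `parse_pri` is ours.

```python
# Severity names in RFC 5424 order (numerical code 0-7).
SEVERITY = ["Emergency", "Alert", "Critical", "Error",
            "Warning", "Notice", "Informational", "Debug"]


def parse_pri(message):
    """Return (facility, severity) from the leading <PRI> field of a syslog message.

    PRI = facility * 8 + severity, so severity is PRI mod 8.
    """
    if not message.startswith("<"):
        raise ValueError("message has no <PRI> field")
    pri = int(message[1:message.index(">")])
    return divmod(pri, 8)


facility, severity = parse_pri("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
print(facility, severity, SEVERITY[severity])  # 4 2 Critical
```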

The risk currently faced by the system is then calculated from the hidden state probability distribution sequence
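A natural reading of "risk as a function of attack likelihood and loss" is an expected loss: at each time, the risk value is the hidden state probability distribution weighted by the risk loss vector. The sketch below assumes exactly that dot-product form, which the text does not spell out; the loss values in the example are hypothetical.

```python
def risk_value(distribution, loss):
    """Expected loss: sum over hidden states of P(state) * loss of that state."""
    if len(distribution) != len(loss):
        raise ValueError("distribution and loss vector differ in length")
    return sum(p * x for p, x in zip(distribution, loss))


# Hypothetical loss vector for (Safe, Probe, Attack, Compromise).
loss = [0.0, 1.0, 3.0, 5.0]
sequence = [[1.0, 0.0, 0.0, 0.0], [0.7, 0.2, 0.08, 0.02]]
print([round(risk_value(d, loss), 3) for d in sequence])  # [0.0, 0.54]
```

Applying `risk_value` to every distribution in the sequence gives one risk value per period, which is what a situation curve over time would plot.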

To verify the soundness of the evaluation method proposed in this paper, the network security situation was quantitatively evaluated using IDS data from a department in a real environment on November 3, 2015. The network topology is shown in Figure

Real network experimental environment.

First, data from devices such as the IPS are imported into the database; Table

IPS original log.

Attack ID | Time | Attack name | Source IP | Destination IP | Source port | Destination port | Application protocol | Hit counts | Attack level |
---|---|---|---|---|---|---|---|---|---|

151000249 | 2015/11/3 0:00 | SMTP mail vulnerability | 27.24.159.231 | 168.160.167.28 | 24476 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:00 | SMTP mail vulnerability | 116.207.12.175 | 168.160.167.28 | 2724 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:00 | SMTP mail vulnerability | 221.239.226.214 | 168.160.200.18 | 2858 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:00 | SMTP mail vulnerability | 221.239.226.214 | 168.160.200.18 | 2858 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 119.147.194.226 | 168.160.1.104 | 47271 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 119.147.194.226 | 168.160.1.104 | 47271 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 111.176.71.222 | 168.160.200.18 | 3843 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 27.24.159.228 | 168.160.1.104 | 11682 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 27.24.159.228 | 168.160.1.104 | 11682 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 49.70.232.139 | 168.160.1.109 | 4687 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 116.207.13.72 | 168.160.200.18 | 1785 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:01 | SMTP mail vulnerability | 116.207.13.72 | 168.160.200.18 | 1785 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:02 | SMTP mail vulnerability | 27.24.159.231 | 168.160.167.28 | 2981 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:02 | SMTP mail vulnerability | 111.176.77.199 | 168.160.200.18 | 3101 | 25 | SMTP | 1 | Prompt |

151000249 | 2015/11/3 0:02 | SMTP mail vulnerability | 111.176.77.199 | 168.160.200.18 | 3101 | 25 | SMTP | 1 | Prompt |

The data are then organized into a standard weblog format, and the required fields (time, attack name, source IP, source port, destination IP, destination port, and attack level) are extracted, as shown in Table

Network security equipment log.

Time | Attack name | Source IP | Source port | Destination IP | Destination port | Attack level |
---|---|---|---|---|---|---|

2015/11/3 0:00 | SMTP mail attachment vulnerability | 27.24.159.231 | 24476 | 168.160.167.28 | 25 | Prompt |

2015/11/3 0:00 | SMTP mail attachment vulnerability | 116.207.12.175 | 2724 | 168.160.167.28 | 25 | Prompt |

2015/11/3 0:00 | SMTP mail attachment vulnerability | 221.239.226.214 | 2858 | 168.160.200.18 | 25 | Prompt |

2015/11/3 0:00 | SMTP mail attachment vulnerability | 221.239.226.214 | 2858 | 168.160.200.18 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 119.147.194.226 | 47271 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 119.147.194.226 | 47271 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 111.176.71.222 | 3843 | 168.160.200.18 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 27.24.159.228 | 11682 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 27.24.159.228 | 11682 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 49.70.232.139 | 4687 | 168.160.1.109 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 116.207.13.72 | 1785 | 168.160.200.18 | 25 | Prompt |

2015/11/3 0:01 | SMTP mail attachment vulnerability | 116.207.13.72 | 1785 | 168.160.200.18 | 25 | Prompt |

2015/11/3 0:02 | SMTP mail attachment vulnerability | 27.24.159.231 | 2981 | 168.160.167.28 | 25 | Prompt |

2015/11/3 0:02 | SMTP mail attachment vulnerability | 111.176.77.199 | 3101 | 168.160.200.18 | 25 | Prompt |

2015/11/3 0:02 | SMTP mail attachment vulnerability | 111.176.77.199 | 3101 | 168.160.200.18 | 25 | Prompt |

2015/11/3 0:02 | SMTP mail attachment vulnerability | 111.176.71.222 | 4214 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:02 | SMTP mail attachment vulnerability | 111.176.71.222 | 4214 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:02 | SMTP mail attachment vulnerability | 58.217.74.33 | 2270 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:02 | SMTP mail attachment vulnerability | 58.217.74.33 | 2270 | 168.160.1.104 | 25 | Prompt |

2015/11/3 0:03 | SMTP mail attachment vulnerability | 222.72.175.185 | 4819 | 168.160.200.18 | 25 | Prompt |

The test uses a time interval of 5 minutes, and the calculation proceeds as follows, based on the alarm importance calculation method.

There are 43 alarm logs between 0:00 and 0:05 on 2015/11/3. Of these, 42 are SMTP e-mail attachment vulnerability alarms with attack level prompt, occurring for the first time, and one is a Web application SQL injection attack with attack level prompt, also occurring for the first time. So the importance of SMTP e-mail attachment vulnerability is 42

There are 56 alarm logs between 0:05 and 0:10 on 2015/11/3. Of these, 51 are SMTP e-mail attachment vulnerability alarms with attack level prompt, which also occurred in the previous period, and five are Web application SQL injection attacks with attack level prompt, occurring for the first time. So, the importance of SMTP e-mail attachment vulnerability in this period is 51

The main alarm vector of each of the 288 five-minute periods of the day is obtained in turn to form the alarm vector sequence, as shown in Table

Observation vector sequence.

Start time | End time | Attack name | Number of attacks | Importance of attack |
---|---|---|---|---|

2015/11/3, 0:00 | 2015/11/3, 0:05 | SMTP mail attachment vulnerability | 42 | 1117 |

2015/11/3, 0:05 | 2015/11/3, 0:10 | SQL injection attack (select) | 51 | 155 |

2015/11/3, 0:10 | 2015/11/3, 0:15 | SMTP mail attachment vulnerability | 31 | 191 |
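The windowing step behind this table can be sketched as follows: log rows are grouped into 5-minute windows (288 per day) and one main alarm is chosen per window. Because the alarm importance formula is not reproduced here, the sketch ranks attacks by count as a stand-in, so it will not necessarily pick the same main alarm as the table; `main_alarms` and the sample rows are illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime


def main_alarms(rows, minutes=5):
    """Group (time, attack name) log rows into fixed windows and pick the
    most frequent attack per window as a stand-in for the importance ranking.

    rows: iterable of ("2015/11/3 0:02", "SMTP ...") pairs.
    Returns {window start: (attack name, count)} in time order.
    """
    windows = defaultdict(Counter)
    for stamp, name in rows:
        t = datetime.strptime(stamp, "%Y/%m/%d %H:%M")
        start = t.replace(minute=t.minute - t.minute % minutes)
        windows[start][name] += 1
    return {start: counts.most_common(1)[0] for start, counts in sorted(windows.items())}


logs = [("2015/11/3 0:00", "SMTP mail attachment vulnerability"),
        ("2015/11/3 0:03", "SMTP mail attachment vulnerability"),
        ("2015/11/3 0:04", "SQL injection attack (select)"),
        ("2015/11/3 0:06", "SQL injection attack (select)")]
for start, (name, count) in main_alarms(logs).items():
    print(start.strftime("%H:%M"), name, count)
```

Replacing the `Counter` ranking with an importance score per attack type would only change how each window's winner is chosen; the grouping stays the same.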

The next step is to build the Markov model. First, the initial state transition matrix is constructed from the security events and defense strategies:

The main efficiency indicators of the firewall, IPS, and host in this period, namely CPU utilization, memory utilization, bandwidth utilization, and connection utilization, are extracted, as shown in Figures

CPU utilization sequence.

Memory utilization sequence.

Bandwidth utilization sequence.

Connections utilization sequence.

The weight of each parameter is analyzed with the Analytic Hierarchy Process as follows:

CPU utilization is as important as memory utilization. Because the main task of the network is to provide external services, network bandwidth utilization and the number of connections are more important than CPU utilization and memory utilization. The department prohibits high-traffic operations but allows many users to access the network at the same time, so the number of connections is slightly more important than bandwidth utilization. The resulting judgment matrix is shown in Table

Equipment protection efficiency judgment matrix.

Item | Number of connections | Bandwidth utilization | Memory utilization | CPU utilization |
---|---|---|---|---|

Number of connections | 1 | 3 | 5 | 5 |

Bandwidth utilization | 1/3 | 1 | 3 | 3 |

Memory utilization | 1/5 | 1/3 | 1 | 1 |

CPU utilization | 1/5 | 1/3 | 1 | 1 |

Running the AHP calculation on this matrix gives CR = 0.0571 < 0.1, which meets the consistency criterion. The weights are shown in Table

The weight of each index of equipment defense efficiency.

Number of connections | Bandwidth utilization | Memory utilization | CPU utilization |
---|---|---|---|

0.5441 | 0.2481 | 0.1039 | 0.1039 |

Substituting each efficiency indicator yields the final defense efficiency curve, as shown in Figure

Final defense efficiency curve.

The resulting hidden state probability distribution curve is shown in Figure

Hidden state distribution sequence.

The network risk vector is then obtained in a similar way.

The asset importance attributes, rated on confidentiality, integrity, and availability according to the national standard, are shown in Table

Asset importance information.

Asset | Confidentiality | Integrity | Availability | Importance value | Importance level |
---|---|---|---|---|---|

Server 1 | 2 | 3 | 4 | 2.884 | 3 |

Server 2 | 2 | 3 | 5 | 3.017 | 3 |

Server 3 | 3 | 5 | 4 | 3.914 | 4 |

Server 4 | 3 | 4 | 5 | 3.914 | 4 |
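The importance values in this table appear to be the geometric mean of the three ratings (for example, the cube root of 2 × 3 × 4 is about 2.884, and the cube root of 3 × 5 × 4 is about 3.915, listed as 3.914), with the level being the value rounded to an integer. The sketch below reproduces that inferred rule; it is our reconstruction, not a formula stated in the text, and `asset_importance` is a hypothetical name.

```python
def asset_importance(confidentiality, integrity, availability):
    """Return (importance value, importance level) for ratings on a 1-5 scale.

    Assumption: value = geometric mean of the three ratings, level = value
    rounded to the nearest integer.
    """
    value = (confidentiality * integrity * availability) ** (1 / 3)
    return value, round(value)


for name, cia in {"Server 1": (2, 3, 4), "Server 3": (3, 5, 4)}.items():
    value, level = asset_importance(*cia)
    print(name, f"{value:.3f}", level)  # Server 1 2.884 3, then Server 3 3.915 4
```

For Server 2 the same rule gives the cube root of 30, about 3.107, rather than the listed 3.017; both round to level 3.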

Combined with the losses caused by security events, the risk loss vectors are obtained as follows:

Finally, the network situation diagram is obtained, as shown in Figure

Final network situation diagram.

From Figure

This paper studies network situation assessment technology based on HMMP, with the goal of enabling network administrators to monitor the global network state in real time in the face of multisource logs. To achieve this goal, we design a method for generating the state transition matrix in HMMP. With this method, the system automatically modifies the state transition matrix in real time according to the network state and obtains the hidden state probability distribution sequence of the current system through the improved Viterbi algorithm. Finally, the network risk value is obtained by calculating the risk loss vector. The experiment shows that the HMMP-based security assessment proposed in this paper can describe the current network state precisely and comprehensively.

The data set can be obtained free of charge from

The authors declare that they have no conflicts of interest.

This present research work was supported by the National Natural Science Foundation of China (nos. 61202458 and 61403109), the Natural Science Foundation of Heilongjiang Province of China (no. F2017021), and the Harbin Science and Technology Innovation Research Funds (no. 2016RAQXJ036).