Evaluation and Prediction Method of System Security Situational Awareness Index Based on HMM Model

In recent years, with the continuous development of information technology, big data has entered all walks of life, become part of everyday life, and become a necessity for the operation of society; the gradual development of artificial intelligence has also made modern life more and more convenient. However, the development of science and technology brings corresponding problems, and information warfare has gradually begun. This paper simulates possible information security events with a hidden Markov model and then verifies the feasibility and effectiveness of the situation assessment method and the situation prediction method, in order to evaluate the information security level effectively and to predict the situation value accurately. The experimental results show that the fluctuation of the situation value corresponds to the different attack behaviors carried out by the attacker and accurately describes the information security status of the system, which verifies the effectiveness and accuracy of the situational awareness method proposed in this paper. The situation prediction method based on ARIMA can predict short-term changes in the situation value and can be used for short-term forecasts that require high accuracy.


Introduction
In recent years, with the continuous improvement of people's economic income, mobile phones and computers have become necessities in every household. Cybersecurity incidents occur frequently, cyberspace security faces severe challenges, and many enterprises and organizations face cyberattacks, such as the once-famous "Aurora Attack," which targeted Google's mail server. Some foreign hacker organizations hope to steal our country's military secrets and to learn about its political and economic situation by attacking its software. In China, some criminals attack websites over the Internet to steal their internal data and defraud elderly people online. These situations harm the social atmosphere and reduce residents' sense of well-being. As these problems become increasingly prominent, improving system security protection capabilities, paying attention to network security, and monitoring and predicting possible events should be put on the agenda. Related research covers security aggregation and configuration-related data [12]; analysis of the sources and identification methods of information risks [13]; application of artificial intelligence in music education [14]; performance improvement of water pollution monitoring and rapid decision-making systems [15]; network security situation prediction [16]; the combination of artificial intelligence algorithms and physical modeling [17]; and so on. Situational awareness is defined as "the extraction and understanding of the surrounding environmental factors within a certain time and space, and the prediction of the future development trend." Situational awareness originated in the military field and was soon applied to other fields. Its application in network security is even more extensive and far-reaching; in enterprise information security management, the defense-discovery-repair approach is adopted [18].

Situational Awareness Model.
The three-layer situational awareness model shown in Figure 1 is a widely accepted general theoretical model given by Endsley.
The model is composed of situational element extraction, situational understanding, and situational prediction, and it plays an important role in subsequent research on network security situational awareness systems.

Security Situation Forecast.
Security situation prediction is the highest-level technology in the whole situational awareness model [19], and the prediction of the network security situation plays an important role in network security defense. Situation prediction means estimating in advance the events or scenarios that will occur in the future in order to determine the probability of their occurrence, which usually requires rigorous investigation and observation. Artificial intelligence algorithms such as machine learning and deep learning can discover and identify potential patterns in input data and output the required prediction information. They have achieved great success in computer vision, natural language processing, and other fields, and they have also been used in network security situation prediction with initial results. Therefore, on a scientific basis and through the analysis and study of the relevant factors, a specific prediction model is established, as shown in Figure 2.

Cybersecurity Situational Awareness.
The network situation is the state of a network and its changing trend, which is affected by factors such as the operating conditions of different types of network equipment, network behaviors, and user behaviors. These factors combine to form the network situation. In a large-scale network environment, we can select security elements for analysis, fully understand the changes in the network situation, and use big data technology to process different types of information. The perception platform integrates user terminals and draws on different types of perception data sources, mining technologies, and intelligent algorithms to improve the sensitivity of the network security situational awareness platform. Situational awareness describes the network system and requires a full understanding of its microstate, which is reflected by various connection parameters. After in-depth mining of these parameters, the correlation and development trend of the information are determined, related tools (algorithms or measures) are used to detect and perceive, and the data and information obtained from detection and perception are associated in some way to form knowledge. This completes a basic process of situational awareness, as shown in Figure 3.
We know that the defining feature of network situational awareness [20] is the need to measure the physical network.
The following capabilities serve as support:
(1) Big data processing capability: the Internet is huge, and a large amount of concurrent data traffic is transmitted through each node. When analyzing data at this scale, network maintenance personnel cannot keep pace with network activity or with the rate at which the network situation changes. Sampling-based detection of abnormal network activity therefore cannot meet the requirements, and network situational awareness must be able to process massive data.
(2) Fine analysis capability: the detection of data packets must meet fine-grained requirements, and various data streams and parameters must be efficiently retrieved and matched. To achieve a good perception of the network microstate, situational awareness technology must be able to identify multiple logical parameters in addition to the traditional key physical parameters (source and destination addresses, ports, and protocol types).
(3) Protocol identification capability: the protocol type is an important parameter, and identifying it accurately is essential. In implementing situational awareness, in addition to identifying widely used Internet standard protocols (such as the TCP/IP protocol suite), other nonstandard and private protocols should also be captured as far as possible, since they may pose a security threat. The identified protocol features (fingerprints) shall be collected in a fingerprint database, which shall be continuously updated and maintained to support effective matching and retrieval.
(4) Reliable operation capability: situational awareness equipment must operate reliably to ensure long-term normal operation. Any interruption of operation may make the perceived information inaccurate.
(5) Business diversion capability: after all kinds of information are accurately identified, they should be redirected in a defined way; that is, after the information is classified, each class is processed in a targeted manner in the next step. This not only makes the data more accurate but also greatly reduces the background processing load, improves operating efficiency, and saves memory.

Hidden Markov Model
The hidden Markov model is a commonly used probability model in statistics. In this section, the concept of the hidden Markov model is introduced through the classic stock market problem, and then the three types of problems of the hidden Markov model and their solutions are introduced. The observation sequence problem extends the one-dimensional hidden Markov model to a multidimensional hidden Markov model.

Overview of Hidden Markov Models (HMM).
The hidden Markov model (HMM) is a powerful probabilistic modeling tool for characterizing implicit stochastic processes through observable sequences. HMMs have been used in areas such as signal processing, pattern recognition, and machine learning. At the beginning of the twentieth century, Andrei Markov proposed the mathematical theory of the Markov process. It was not until the 1960s that Baum and his colleagues proposed and developed the theory of the hidden Markov model. Figure 4 depicts a simple example of a Markov process used to describe changes in the stock market. This stochastic process divides the daily changes of the stock market into three states: market A, market B, and a volatile market, which correspond to three observations of a stock's price: rising, falling, and remaining unchanged. The transition between states is a Markov process with discrete time and a finite discrete state space, also known as a Markov chain.
Assume that the probability that the stock is in market A on the first day is 0.7. In the Markov chain of Figure 4 each state corresponds to exactly one observation, so if the stock is observed to rise, fall, and fall on three consecutive days, the state sequence for those days must be market A, market B, market B, and the probability of this sequence is

$$P(A, B, B) = P(q_1 = A)\, P(q_2 = B \mid q_1 = A)\, P(q_3 = B \mid q_2 = B) = 0.7\, a_{AB}\, a_{BB},$$

where $a_{AB}$ and $a_{BB}$ are the transition probabilities given in Figure 4.

If each state is allowed to correspond to multiple observations, for example, if in market A the stock may not only rise but may also fall or fluctuate, the Markov chain is extended to a hidden Markov model. This change makes the model more expressive. In Figure 5, the stock may also fluctuate or fall slightly while in market A. If the stock is observed to rise, fall, and fall on three consecutive days, it can no longer be concluded that the stock was in market B on the last two days; the state is "hidden," and any state sequence may have produced the observations with a certain probability.

The mathematical definition of the hidden Markov model can be given as follows:

$$\lambda = (S, V, A, B, \pi). \tag{2}$$

$S$ is the set of hidden states and $V$ is the set of observable states:

$$S = \{s_1, s_2, \ldots, s_N\}, \qquad V = \{v_1, v_2, \ldots, v_M\}.$$

Then define the hidden state sequence $Q$ of length $T$ and the corresponding observation sequence $O$:

$$Q = (q_1, q_2, \ldots, q_T), \qquad O = (o_1, o_2, \ldots, o_T), \qquad q_t \in S,\ o_t \in V.$$

$A$ is the hidden state transition matrix. The element $a_{ij}$ of the matrix represents the probability of transitioning from state $s_i$ to state $s_j$. Note that the state transition probability is independent of time:

$$A = [a_{ij}]_{N \times N}, \qquad a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i).$$

$B$ is the observation matrix, and the element $b_i(k)$ of the matrix represents the probability of observing $v_k$ when the system is in hidden state $s_i$. This probability is also independent of time:

$$B = [b_i(k)]_{N \times M}, \qquad b_i(k) = P(o_t = v_k \mid q_t = s_i).$$

$\pi$ is the initial probability distribution, and its elements represent the probability that the system is in each hidden state at the initial moment:

$$\pi = [\pi_i]_{N}, \qquad \pi_i = P(q_1 = s_i).$$

The HMM makes two assumptions.
The first is called the Markov assumption, which states that the current state of the system depends only on the state of the system at the previous moment:

$$P(q_t \mid q_{t-1}, q_{t-2}, \ldots, q_1) = P(q_t \mid q_{t-1}).$$

The second is called the independence assumption, which states that the observation of the system depends only on the hidden state of the system at the current moment:

$$P(o_t \mid q_1, \ldots, q_T, o_1, \ldots, o_{t-1}) = P(o_t \mid q_t).$$
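As an illustration of the definition above, the following minimal Python sketch stores a three-state model as an initial distribution, a transition matrix, and an observation matrix, and computes the probability of a state path and the joint probability of a path with its observations. All numeric values except the initial probability 0.7 for market A are hypothetical and are not taken from Figures 4 and 5; the names `path_probability`, `joint_probability`, and `sample` are illustrative only.

```python
import random

# Hypothetical three-state model; only pi[0] = 0.7 comes from the text.
STATES = ["market A", "market B", "volatile"]
SYMBOLS = ["rise", "fall", "unchanged"]
pi = [0.7, 0.2, 0.1]                  # pi[i]   = P(q_1 = s_i)
A = [[0.6, 0.2, 0.2],                 # A[i][j] = P(q_{t+1} = s_j | q_t = s_i)
     [0.5, 0.3, 0.2],
     [0.4, 0.1, 0.5]]
B = [[0.7, 0.1, 0.2],                 # B[i][k] = P(o_t = v_k | q_t = s_i)
     [0.1, 0.6, 0.3],
     [0.3, 0.3, 0.4]]


def path_probability(pi, A, path):
    """Probability of a hidden state path under the Markov assumption."""
    p = pi[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= A[prev][cur]
    return p


def joint_probability(pi, A, B, path, obs):
    """P(O, Q | lambda): path probability times the emission probabilities."""
    p = path_probability(pi, A, path)
    for state, symbol in zip(path, obs):
        p *= B[state][symbol]
    return p


def sample(pi, A, B, length, rng):
    """Draw a hidden path and an observation sequence from the model."""
    state = rng.choices(range(len(pi)), weights=pi)[0]
    path, obs = [], []
    for _ in range(length):
        path.append(state)
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return path, obs


# Path "market A, market B, market B" with observations "rise, fall, fall".
p_path = path_probability(pi, A, [0, 1, 1])
p_joint = joint_probability(pi, A, B, [0, 1, 1], [0, 1, 1])
example_path, example_obs = sample(pi, A, B, 5, random.Random(0))
```

With these hypothetical values the path probability is the product of the initial probability and two transition probabilities, which mirrors the three-day stock example.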

Three Kinds of Problems of the Hidden Markov Model.
For HMMs to be useful in practical applications, three related problems must be solved: the estimation problem, the decoding problem, and the learning problem.

Estimation Problem.
Given an HMM model $\lambda$, calculate the probability $P(O \mid \lambda)$ of the observation sequence $O$. This problem can be viewed as evaluating how well a known model predicts a given sequence of observations, and the most appropriate model can be selected by comparing $P(O \mid \lambda)$. Given a sequence of hidden states $Q$, the probability of observing the sequence $O$ is

$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} b_{q_t}(o_t).$$

The probability of the hidden state sequence $Q$ is

$$P(Q \mid \lambda) = \pi_{q_1} a_{q_1 q_2} a_{q_2 q_3} \cdots a_{q_{T-1} q_T}.$$

Given a model, the observation probability can be calculated:

$$P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda).$$

From this, the probability of the observation sequence $O$ for a given model can be calculated, but the time complexity is exponential in $T$ (to be precise, it requires about $2T \cdot N^T$ calculations). The calculation contains a large number of identical operations, and this redundancy can be removed by caching intermediate results, which reduces the time complexity. A grid is used to cache the operations that would otherwise be repeated, and the grid is advanced step by step until time $T$ to obtain the result. This method is called the forward algorithm. To this end, an intermediate variable $\alpha$ is introduced, which represents the probability that the hidden state at time $t$ is $s_i$ and that the observation sequence up to time $t$ is $o_1, o_2, \ldots, o_t$:

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = s_i \mid \lambda).$$

The specific algorithm is as follows:
(1) Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1)$, $1 \le i \le N$.
(2) Recursion: $\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(o_{t+1})$, $1 \le t \le T-1$, $1 \le j \le N$.
(3) Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.

Strictly speaking, the forward algorithm is sufficient to solve the evaluation problem, but in order to solve the learning problem a backward algorithm must be introduced, which can also solve the evaluation problem. Similar to the forward algorithm, define an intermediate variable $\beta$ as the probability of observing the sequence from $o_{t+1}$ to $o_T$ given the state $s_i$ at time $t$:

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = s_i, \lambda).$$

Unlike the forward algorithm, the backward algorithm recurses from the back to the front.
The specific algorithm is as follows:
(1) Initialization: $\beta_T(i) = 1$, $1 \le i \le N$.
(2) Recursion: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$, $t = T-1, T-2, \ldots, 1$, $1 \le i \le N$.
(3) Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$.
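The forward and backward recursions can be sketched in a few lines of Python. This is a minimal, unscaled illustration on a hypothetical two-state, two-symbol model (the parameter values and the function names are assumptions made for this example); it also enumerates all state paths by brute force to confirm that the three ways of computing the observation probability agree. For long sequences a practical implementation would scale the variables or work with logarithms to avoid underflow.

```python
from itertools import product

# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]


def forward(pi, A, B, obs):
    """Return the grid alpha[t][i] = P(o_1..o_t, q_t = s_i | lambda)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]          # initialization
    for symbol in obs[1:]:                                      # recursion
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][symbol]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """Return the grid beta[t][i] = P(o_{t+1}..o_T | q_t = s_i, lambda)."""
    n, length = len(A), len(obs)
    beta = [[1.0] * n for _ in range(length)]                   # initialization
    for t in range(length - 2, -1, -1):                         # recursion, back to front
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def brute_force(pi, A, B, obs):
    """Sum P(O | Q) P(Q) over every state path; exponential in len(obs)."""
    total = 0.0
    for path in product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


p_forward = sum(forward(pi, A, B, obs)[-1])                     # termination (forward)
beta = backward(A, B, obs)
p_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
p_brute = brute_force(pi, A, B, obs)
```

The brute-force sum visits every one of the N^T paths, whereas each grid column costs on the order of N^2 operations, which is the saving the caching argument describes.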

Decoding Problem.
The purpose of decoding is to find the hidden state sequence that is most likely to have produced a given observation sequence; that is, given the model $\lambda$, find the hidden state sequence $Q$ that makes the observation sequence $O$ most likely to appear. The best solution to the decoding problem is the Viterbi algorithm, another grid algorithm that is similar to the forward algorithm except that the probability at each moment is maximized instead of summed. We can define

$$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = s_i, o_1, \ldots, o_t \mid \lambda),$$

which is the probability of the most likely hidden state path that ends in state $s_i$ and produces the observation sequence up to time $t$. The Viterbi algorithm is as follows, where $\psi_t(j)$ stores the best predecessor of state $s_j$ at time $t$:
(1) Initialization: $\delta_1(i) = \pi_i\, b_i(o_1)$, $\psi_1(i) = 0$, $1 \le i \le N$.
(2) Recursion: $\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right] b_j(o_t)$, $\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, a_{ij}\right]$, $2 \le t \le T$.
(3) Termination: $P^{*} = \max_{1 \le i \le N} \delta_T(i)$, $q_T^{*} = \arg\max_{1 \le i \le N} \delta_T(i)$.
(4) Backtracking of the optimal state sequence: $q_t^{*} = \psi_{t+1}(q_{t+1}^{*})$, $t = T-1, T-2, \ldots, 1$.

The main difference between the Viterbi algorithm and the forward algorithm is that the Viterbi algorithm maximizes the probability in the recursion rather than summing it, and stores the state at which the probability is largest for use in the final backtracking.
Backtracking finds the optimal state sequence from the states stored in the recursive steps; there is no easy way to find a suboptimal state sequence.
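A minimal Python sketch of the Viterbi steps is given below. The two-state model and the observation sequence are hypothetical, and the function names are chosen for this example only; a brute-force search over all paths is included to check that the decoded path attains the maximum joint probability.

```python
from itertools import product

# Hypothetical two-state, two-symbol model (illustrative values only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 0, 1, 1, 0]


def joint(pi, A, B, path, obs):
    """Joint probability of a state path and the observation sequence."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


def viterbi(pi, A, B, obs):
    """Return (most likely state path, its joint probability)."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]        # initialization
    pointers = []                                           # best predecessor per step
    for symbol in obs[1:]:                                  # recursion: max instead of sum
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][symbol])
        pointers.append(step)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])            # termination
    path = [last]
    for step in reversed(pointers):                         # backtracking
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]


def brute_force_best(pi, A, B, obs):
    """Largest joint probability over all state paths (exponential cost)."""
    return max(joint(pi, A, B, path, obs)
               for path in product(range(len(pi)), repeat=len(obs)))


best_path, best_prob = viterbi(pi, A, B, obs)
```

Only the single best predecessor is stored at each step, which is why the backtracking recovers one optimal path and gives no direct access to the second-best one.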

Learning Problems.
Learning problems are divided into two categories, supervised and unsupervised, corresponding to two standard solutions. If the training data set is supervised, that is, the hidden state sequence corresponding to each observation sequence is also given, a supervised learning algorithm is used. If the training data set is unsupervised, that is, only the observation sequence is given, an unsupervised learning algorithm, known as the Baum-Welch (B-W) algorithm, can be used. The B-W algorithm, proposed by Baum and his colleagues, is a classic algorithm for estimating model parameters.
In order to describe the estimation of the HMM parameters, the B-W algorithm first defines an intermediate variable $\xi_t(i, j)$ that represents the probability of being in state $s_i$ at time $t$ and in state $s_j$ at time $t+1$, given the model parameters and the observation sequence:

$$\xi_t(i, j) = P(q_t = s_i, q_{t+1} = s_j \mid O, \lambda).$$

The B-W algorithm uses the forward variable at time $t$ and the backward variable at time $t+1$, cleverly combining the forward algorithm and the backward algorithm; it is therefore also known as the forward-backward algorithm, and $\xi_t(i, j)$ is also called the forward-backward variable.
According to Bayes' formula, the forward-backward variable can be written as

$$\xi_t(i, j) = \frac{P(q_t = s_i, q_{t+1} = s_j, O \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}.$$

In this formula, the numerator is $P(q_t = s_i, q_{t+1} = s_j, O \mid \lambda)$ and the denominator is $P(O \mid \lambda)$, which is obtained from the total probability formula.
Define $\gamma_t(i)$ as the probability that the model is in state $s_i$ at time $t$, and establish the relationship between $\gamma_t(i)$ and $\xi_t(i, j)$ as follows:

$$\gamma_t(i) = P(q_t = s_i \mid O, \lambda) = \sum_{j=1}^{N} \xi_t(i, j).$$

The sum of $\gamma_t(i)$ over time represents the expected number of visits to state $s_i$, that is, the expected number of transitions out of state $s_i$. Likewise, the sum of $\xi_t(i, j)$ over time represents the expected number of transitions from $s_i$ to $s_j$. The idea of the B-W algorithm is to obtain a new model parameter $\bar{\lambda}$ iteratively. Once $P(O \mid \bar{\lambda}) > P(O \mid \lambda)$ is found, $\bar{\lambda}$ is assigned to $\lambda$, and the iteration continues until $P(O \mid \lambda)$ no longer changes significantly. Unfortunately, the algorithm can therefore only reach a local optimum. In order to calculate $\bar{\lambda}$ efficiently, a helper function is introduced:

$$L(\lambda, \bar{\lambda}) = \sum_{Q} P(Q \mid O, \lambda) \log P(O, Q \mid \bar{\lambda}). \tag{31}$$

No matter how $\bar{\lambda}$ changes, as long as $L(\lambda, \bar{\lambda})$ is increased, the lower bound of $P(O \mid \bar{\lambda})$ is increased, so maximizing $L(\lambda, \bar{\lambda})$ increases $P(O \mid \bar{\lambda})$, namely,

$$\max_{\bar{\lambda}} L(\lambda, \bar{\lambda}) \;\Rightarrow\; P(O \mid \bar{\lambda}) \ge P(O \mid \lambda).$$

Therefore, iterative calculation makes $P(O \mid \lambda)$ converge to a maximum point.
In solving the above problem, there are three natural constraints:

$$\sum_{i=1}^{N} \bar{\pi}_i = 1, \qquad \sum_{j=1}^{N} \bar{a}_{ij} = 1, \qquad \sum_{k=1}^{M} \bar{b}_i(k) = 1.$$

According to the Lagrange multiplier method, combined with the above constraints, the extreme value of each component of $L(\lambda, \bar{\lambda})$ can be obtained, which gives the one-step optimal estimate of the HMM parameters:

$$\bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \bar{b}_i(k) = \frac{\sum_{t=1,\ o_t = v_k}^{T} \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}. \tag{36}$$

The new model parameter $\bar{\lambda}$ is obtained iteratively from the above. Once $P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$ is found, $\bar{\lambda}$ is assigned to $\lambda$, and the iteration is repeated until $P(O \mid \lambda)$ no longer changes significantly; the final $\bar{\lambda}$ is used as the parameter estimate of the HMM. The B-W algorithm cleverly uses the idea of maximum likelihood estimation to obtain the model parameter $\lambda$ that maximizes $P(O \mid \lambda)$.
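The re-estimation loop can be illustrated with the following Python sketch for a single observation sequence. It is a minimal, unscaled version under stated assumptions: the starting parameters and the observation sequence are hypothetical, the helper names are chosen for this example, and the code relies on every state keeping nonzero occupancy. It records the observation probability before each update so that the non-decreasing behavior described above can be checked.

```python
def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for symbol in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][symbol]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, length = len(A), len(obs)
    beta = [[1.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def baum_welch_step(pi, A, B, obs):
    """One re-estimation step; returns (new_pi, new_A, new_B, P(O | old model))."""
    n, m, length = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    prob = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t given O.
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(n)]
             for t in range(length)]
    # xi[t][i][j]: probability of the transition i -> j at time t given O.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(n)] for i in range(n)] for t in range(length - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(length - 1)) /
              sum(gamma[t][i] for t in range(length - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(length) if obs[t] == k) /
              sum(gamma[t][i] for t in range(length))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, prob


def fit(pi, A, B, obs, iterations):
    """Repeat the re-estimation step and record P(O | lambda) before each update."""
    history = []
    for _ in range(iterations):
        pi, A, B, prob = baum_welch_step(pi, A, B, obs)
        history.append(prob)
    return pi, A, B, history


# Hypothetical starting point and observation sequence.
obs = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
pi_fit, A_fit, B_fit, history = fit([0.6, 0.4],
                                    [[0.7, 0.3], [0.4, 0.6]],
                                    [[0.8, 0.2], [0.3, 0.7]],
                                    obs, 10)
```

Because the procedure only climbs from its starting point, different initial parameters can end at different local maxima, so several restarts are a common precaution.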

Experimental Results and Analysis
The gateway computer, which is connected to both the signal system and the ATS intranet and is therefore easily attacked, is selected as the attack target of the experiment. With DDoS as the main attack method, vulnerability scanning, an MS17-010 vulnerability attack, and a DDoS attack are carried out in turn. The experimental analysis is based on the data collected in this process.

Situation Assessment Experiment Results and Analysis.
The situational factor index data of the gateway computer, CI, and ZC are collected while the system is running normally. The maximum likelihood estimation method is used to fit the probability distribution of the data, and the K-S test is used to confirm the distribution and its parameters. Then, according to the distribution of the data, the Lloyd-Max method is used to divide the data into states, which gives the state division intervals with the minimum quantization error.
Taking the gateway computer as an example, the change and distribution of CPU usage within 1 hour are shown in Figures 6 and 7. The real data and the CDF curve of the fitted t distribution basically coincide, and the K-S test is passed; it can be considered that the gateway computer CPU usage follows a t distribution with parameters μ = 10.6727, σ = 0.5594, and ν = 4.3870. The change and distribution of RAM usage within 1 hour are shown in Figures 8 and 9. The real data and the CDF curve of the fitted Gaussian distribution basically coincide, and the K-S test is passed; it can be considered that the RAM usage of the gateway computer follows a Gaussian distribution with parameters μ = 29.4752 and σ = 0.3447. Figures 10 and 11 show the change and distribution of the network sending rate within 1 hour. The real data basically coincide with the CDF curve of the fitted t distribution and pass the K-S test. It can be considered that the gateway computer network sending rate follows a t distribution with parameters μ = 16.1603, σ = 0.3524, and ν = 3.7766. Figures 12 and 13 show the change and distribution of the network reception rate within 1 hour. The real data basically coincide with the CDF curve of the fitted t distribution and pass the K-S test. It can be considered that the gateway computer network reception rate follows a t distribution with parameters μ = 21.5830, σ = 1.0182, and ν = 4.3939. The distributions and distribution parameters of the index data of the gateway computer are summarized in Table 1, which lists the distribution and parameters of the gateway computer's CPU usage, RAM usage, network sending rate, and network receiving rate. While the gateway computer is running there is basically no data interaction with the disk, and the disk read and write rates are usually 0, so they are not studied further here.
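The fit-then-test procedure can be sketched for the Gaussian case in Python using only the standard library. The data below are synthetic: they are evenly spaced quantiles generated from the Gaussian parameters reported for RAM usage (29.4752 and 0.3447), not the measured data, and a deliberately skewed sample is added for contrast. The threshold 1.36/sqrt(n) is the asymptotic 5% critical value of the K-S test for a fully specified distribution; using it with parameters estimated from the same data is a simplification made for this sketch, and the function names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist, fmean, pstdev


def fit_gaussian(samples):
    """Maximum likelihood estimates of the Gaussian mean and standard deviation."""
    return fmean(samples), pstdev(samples)


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_passes(samples, cdf, coeff=1.36):
    """Compare the statistic with the asymptotic 5% critical value coeff / sqrt(n)."""
    return ks_statistic(samples, cdf) < coeff / sqrt(len(samples))


n = 200
# Synthetic stand-in for the RAM usage data: quantiles of N(29.4752, 0.3447).
reference = NormalDist(29.4752, 0.3447)
ram_like = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
mu, sigma = fit_gaussian(ram_like)
ram_ok = ks_passes(ram_like, NormalDist(mu, sigma).cdf)

# A strongly skewed sample (exponential quantiles) that a Gaussian fits poorly.
skewed = [-log(1.0 - (i + 0.5) / n) for i in range(n)]
mu_s, sigma_s = fit_gaussian(skewed)
skewed_ok = ks_passes(skewed, NormalDist(mu_s, sigma_s).cdf)
```

The same statistic applies to the t-distribution fits reported for the other indices once the corresponding CDF is supplied in place of the Gaussian one.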
After the distribution of each indicator is obtained, the Lloyd-Max algorithm can be used to quantize the data. The quantization interval of each indicator is shown in Table 2.
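A sample-based version of the Lloyd-Max iteration can be sketched as follows. The sketch alternates between placing each decision threshold midway between neighboring levels and moving each level to the mean of the samples in its cell. The input data are synthetic Gaussian quantiles, the number of levels (4) is an arbitrary choice for illustration, and the function names are not taken from the paper.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean


def uniform_levels(samples, n_levels):
    """Starting point: equally spaced levels over the data range."""
    lo, hi = min(samples), max(samples)
    return [lo + (hi - lo) * (k + 0.5) / n_levels for k in range(n_levels)]


def midpoints(levels):
    """Decision thresholds halfway between neighboring levels."""
    return [(a + b) / 2 for a, b in zip(levels, levels[1:])]


def mse(samples, levels):
    """Mean squared quantization error with nearest-level assignment."""
    thresholds = midpoints(levels)
    return fmean((x - levels[bisect_right(thresholds, x)]) ** 2 for x in samples)


def lloyd_max(samples, n_levels, max_iter=100, tol=1e-9):
    """Return (thresholds, levels) after iterating to a fixed point."""
    levels = uniform_levels(samples, n_levels)
    for _ in range(max_iter):
        thresholds = midpoints(levels)
        cells = [[] for _ in range(n_levels)]
        for x in samples:
            cells[bisect_right(thresholds, x)].append(x)
        # Each level moves to the centroid of its cell; empty cells keep their level.
        new_levels = [fmean(c) if c else old for c, old in zip(cells, levels)]
        shift = max(abs(a - b) for a, b in zip(levels, new_levels))
        levels = new_levels
        if shift < tol:
            break
    return midpoints(levels), levels


# Synthetic data: 400 evenly spaced quantiles of a Gaussian distribution.
data = [NormalDist(29.4752, 0.3447).inv_cdf((i + 0.5) / 400) for i in range(400)]
start_error = mse(data, uniform_levels(data, 4))
thresholds, levels = lloyd_max(data, 4)
final_error = mse(data, levels)
```

The returned thresholds play the role of the state division intervals: a new measurement is mapped to the state whose interval contains it.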
Following the same steps, the distribution and distribution parameters of each index of the CI are obtained, as shown in Table 3. Since the CI performs a large number of logical operations while running, it consumes a lot of CPU, and its average CPU usage reaches about 61%. However, the CI application software occupies less memory when running than the gateway application software, so the average RAM usage of the CI is slightly lower than that of the gateway computer.
The Lloyd-Max algorithm is used to quantize the data of each index of the CI, and the resulting quantization intervals are shown in Table 4. Table 5 shows the distribution and distribution parameters of the ZC index data. The ZC simulation software does not have high hardware requirements, so its average CPU usage and RAM usage are at a low level.
The Lloyd-Max algorithm is used to quantize the data of the various indicators of the ZC, and the quantization interval of each indicator is shown in Table 6.
Figure 14 shows the trend of the basic operation situation value of the system obtained by simulation. It can be seen from the figure that the change of the situation value can be divided into three stages in time. At around 20 s, the situation value suddenly increased from 0 to 0.4, which is consistent with the vulnerability scanning carried out by the attacker at this time. The situation value then returned to 0, indicating that the attacker scanned for vulnerabilities for a short time and then had no other activity for a while. From 34 s to 54 s, the situation value fluctuated between 0.4 and 0.9 with an obvious upward trend. During this period, the attacker used the MS17-010 vulnerability to attack the system, obtained administrator permissions on the devices in the system, implanted the DDoS virus on the devices, and then stopped activity for a while, and the situation value returned to 0. From 60 s to 100 s, the situation value fluctuated around 0.8, which is consistent with the attacker controlling the devices of the system to launch DDoS attacks on the gateway computer. The attacker then ceased activity, and the situation value returned to 0. The experimental results show that the fluctuation of the situation value corresponds to the different attack behaviors carried out by the attacker, which accurately describes the information security status of the system and verifies the effectiveness and accuracy of the situational awareness method proposed in this paper.
In addition, the times at which the situation value changes are synchronized with the times of the attacker's attacks, which verifies the real-time performance of the situation assessment method.

Situation Forecasting Experiment Results and Analysis.
This section selects the situation values from 60 s to 70 s and uses the first 20 data points as training samples and the last 10 data points as comparison samples to verify the situation prediction method. The selected data are shown in Figure 15. The ADF test shows that the situation sequence is not stationary, so it must first be made stationary. The selected situation sequence is differenced once, and the resulting difference sequence passes the ADF stationarity test. Differencing is therefore stopped, and the subsequent steps of ARIMA can be used to predict the situation sequence. The differencing result is shown in Figure 16. The training samples are fitted by the ARIMA method [22]. The model orders of ARIMA are determined as p = 3, d = 1, and q = 4. The parameters of the model are obtained by maximum likelihood estimation: the constant term is μ = −0.026513; the three autoregressive coefficients are ϕ1 = −0.20651, ϕ2 = −0.33681, and ϕ3 = −0.4685; and the four moving average coefficients are θ1 = −0.54385, θ2 = 0.24325, θ3 = −0.78119, and θ4 = 0.081789. Figure 17 shows the situation prediction result obtained with this estimated model. It can be seen from the figure that the predicted values are basically consistent with the trend of the actual values. This shows that the ARIMA model has high prediction accuracy for the information security situation value of the train control system. The experimental results show that the situation prediction method based on ARIMA can effectively predict short-term changes of the situation value. However, because ARIMA forecasts tend toward a stable value, the predicted value converges to the neighborhood of the mean of the real values over longer horizons and cannot reflect the fluctuation of the situation value. The method is therefore suited to short-term forecasts with high accuracy requirements.
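The convergence of the forecasts can be illustrated with a small Python sketch that applies first-order differencing and then iterates an ARMA(3, 4) recursion with the coefficients reported above. Several points are assumptions made for this sketch: the recursion is written as y_t = c + Σ ϕ_k y_{t−k} + e_t + Σ θ_k e_{t−k} with the reported constant used as c (the paper's exact sign and parameterization conventions are not stated), the input series and the past residuals are hypothetical, and future innovations are replaced by their expected value of zero.

```python
def difference(series):
    """First-order differencing (the d = 1 step)."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert first-order differencing by cumulative summation."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


def arma_forecast(history, residuals, const, phi, theta, steps):
    """Iterate the ARMA recursion; future innovations are set to zero."""
    y, e = list(history), list(residuals)
    out = []
    for _ in range(steps):
        ar = sum(p * y[-k] for k, p in enumerate(phi, start=1))
        ma = sum(q * e[-k] for k, q in enumerate(theta, start=1))
        nxt = const + ar + ma
        out.append(nxt)
        y.append(nxt)
        e.append(0.0)
    return out


# Coefficients reported in the text; the convention used here is an assumption.
CONST = -0.026513
PHI = [-0.20651, -0.33681, -0.4685]
THETA = [-0.54385, 0.24325, -0.78119, 0.081789]

# Hypothetical situation values and residuals (not the experimental data).
series = [0.78, 0.83, 0.81, 0.84, 0.80, 0.79, 0.82]
diffs = difference(series)
past_residuals = [0.0, 0.0, 0.0, 0.0]

diff_forecast = arma_forecast(diffs, past_residuals, CONST, PHI, THETA, 200)
level_forecast = undifference(series[-1], diff_forecast)[1:]
# Value that the differenced forecasts approach under this convention.
limit = CONST / (1.0 - sum(PHI))
```

Under these assumptions the forecast of the differenced series settles to a constant after a few dozen steps, so the forecast carries information about fluctuations only over a short horizon.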

Conclusion
Information security is an issue that every country should pay attention to. This article mainly explains the use of the hidden Markov model. Through a simulated information security experiment, the feasibility and effectiveness of the situation assessment method and the situation prediction method are verified. The results show that the situation assessment method based on the hidden Markov model can effectively assess the level of information security. However, as the prediction horizon grows, the prediction gradually converges to the mean of the situation value sequence, so it cannot reflect the fluctuation of the situation. Long-term prediction of the situation value is therefore a direction for follow-up research.
This article provides only a preliminary treatment of security situational awareness and omits some details of the analysis process. It is hoped that the article can provide ideas for future situational awareness research.

Scientific Programming

Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.