Identification of Encrypted Traffic Using Advanced Mathematical Modeling and Computational Intelligence

This paper proposes a hybrid approach for the identification of encrypted traffic based on advanced mathematical modeling and computational intelligence. Network traffic identification is the premise and foundation of improving network management, service quality, and application security. It is also the focus of network behavior analysis, network planning and construction, network anomaly detection, and network traffic model research. With the increase in user and service requirements, many applications use encryption algorithms to encrypt traffic during data transmission. As a result, traditional traffic classification methods struggle to classify encrypted traffic on the network, which brings great difficulties and challenges to network monitoring and data mining. In this article, a nonlinear modified DBN method is proposed and applied to encrypted traffic identification. Firstly, based on Deep Belief Networks (DBN), this paper introduces the proposed Modified Elliott (ME)-DBN model, analyzes the function graph, and presents the ME-DBN learning algorithm. Secondly, this article designs an encrypted traffic recognition model based on the ME-DBN model. Feature extraction is carried out by training the ME-DBN model, and finally, classification and recognition are carried out by the classifier. The experimental results on the ISCX VPN-nonVPN database show that the ME-DBN method proposed in this article can enhance the classification and recognition rate and is more robust in recognizing encrypted traffic from different software.


Introduction
Network traffic classification is a basic step for managing and controlling network resources. Earlier traffic classification methods, such as the methods [1] based on port number and Deep Packet Inspection (DPI), cannot deal with encrypted traffic and can hardly adapt to the current traffic environment. Methods based on traffic statistics and machine learning (ML), for example, the decision tree (DT) and KNN algorithms, are currently popular and can deal with both encrypted and regular traffic. Nevertheless, the performance of ML-based methods depends largely on manually designed features and private information in traffic. Therefore, the generality and accuracy of these methods are limited. In addition, they require a large amount of storage and computing resources, which limits their implementation in resource-constrained nodes [2], such as vehicles, home gateways, and mobile phones. Real-time and accurate network traffic classification is the basis of network management tasks and intrusion detection systems, so a new traffic identification method is urgently needed.
The development of mobile traffic identification technology has gone through three stages: port-based, payload-based, and based on statistical traffic characteristics. The advent of port spoofing, random ports, and tunneling quickly rendered the port-based models ineffective. As users become more aware of privacy protection and security, technologies such as SSL, SSH, VPN, and Tor have become more widely used, resulting in an increasing proportion of encrypted traffic in network traffic. The payload-based approach, known as Deep Packet Inspection (DPI), cannot handle encrypted traffic because it requires matching packet content, and it is computationally expensive. Therefore, to handle the problem of encrypted traffic classification, the method based on traffic statistics has emerged.
This approach relies on statistical or time-series properties and uses ML algorithms such as tree-based methods, SVM, and KNN. Furthermore, some statistical methods such as GMM [3] and HMM [4] are also employed to identify encrypted traffic. Classical machine learning methods can handle some problems that port-based and payload-based methods cannot solve, but they still have certain limitations: (1) The characteristics of data traffic need to be extracted manually, which often depends on expert experience and is very time-consuming and labor-intensive. (2) The characteristics of traffic change rapidly and need to be updated frequently. (3) For traffic identification tasks, category imbalance is a major problem. Category imbalance means that some categories in the data set contain several times as many samples as others, or even more. When a model is trained on such a data set, a high recognition rate can be obtained simply by classifying all samples of the small categories into the large categories, which is not meaningful in actual production. The way to solve this problem is to expand the amount of data in the small categories in different ways, but current data expansion methods cannot accurately generate samples that are sufficiently close to the original data. (4) Model training relies mostly on labeled samples. How to combine a large amount of easily obtained unlabeled data with a small amount of difficult-to-obtain labeled data for traffic classification, in order to reduce the need for labeled data, is a key research topic. Unlike most traditional ML algorithms, DL automatically extracts features without human intervention, which makes it a well-suited traffic classification method, especially for encrypted mobile service traffic. Recent research work proves the superiority of the DL method in traffic classification [5][6][7][8][9].
Therefore, it is of great importance to study the application of DL in traffic classification and how to improve the recognition rate of small-sample traffic in unbalanced data sets, so as to identify encrypted traffic more effectively and conveniently and improve the accuracy of application identification.
Recently, DL-based methods have been employed in many fields, such as image recognition, speech recognition, and natural language processing, and have achieved good results. Building on Deep Learning (DL), this article proposes a classification and detection framework that constructs the feature space through a deep structure of multiple hidden layers and discovers data features through autonomous learning from a large amount of data. It addresses the difficulty of feature subset selection and improves classification efficiency, which lays a foundation for the real-time classification of network traffic. In the second part, we summarize the existing research. In the third part, we introduce the identification model of encrypted traffic; in the fourth part, we introduce the evaluation process of the model in detail; the fifth part describes the data collection and processing in detail; the sixth part introduces the detailed process of experiment and simulation; the summary and discussion are arranged in the last part.

Research Overview
As early as 1995, Claffy et al. [10] used the traditional classification method based on service host attributes to identify network traffic. Almost all communication protocol packets, including encrypted packets, have their own unique traffic characteristics, which can be analyzed and distinguished from a large number of traffic samples. Therefore, Gu et al. [11] made an in-depth analysis of classification methods based on traffic payload content characteristics. As smart computing continues to evolve, Yeganeh et al. [12] summarized the relative positional word set carried in the session flow payloads of each protocol and then inspected the payloads using deep packet inspection with word sequence matching to identify the protocol types.
Due to the quantitative limitations of current technologies, new methods have been developed that rely on the statistical characteristics of traffic to classify applications. In recent years, flow classification methods based on statistical features have attracted extensive attention. Common statistical features include packet interval statistics, flow arrival time statistics, flow duration, packet length, traffic idle time, packet arrival interval, and other statistical characteristics of the network. With the explosive growth of traffic in the current network environment, simple traffic statistics methods can no longer achieve the ideal classification effect, and methods based on machine learning came into being. Machine learning mainly includes supervised learning, semisupervised learning, and unsupervised learning.
Recognition models based on ML and DL are widely used. Ibrahim et al. [13] designed a classifier for online traffic classification (SSPC) that combines three identification methods: port-based, payload-based, and statistics-based. Payload-based classification results are preferred for identification, followed by results on which the port-based and statistical methods agree. Conti et al. [14] used random forest (RF) to identify user actions on mobile phones from the IP address, packet size, port, direction, and other characteristics of the encrypted traffic generated by labeled users of mobile application clients. In contrast to the methods above, in 2004, reference [15] used packet length, packet interval, and flow duration as statistical features and applied an expectation-maximization algorithm to classify traffic types by unsupervised learning. Reference [16] uses an unsupervised machine learning algorithm to train long short-term memory (LSTM) recurrent neural networks so that the network can distinguish a group of time series and cluster them. The results show that the neural network has a strong time series learning ability and clustering ability based on multiple features. Reference [17] proposed a method of malicious traffic detection using representation learning.
This method does not need manually designed traffic characteristics but directly takes the raw data as input for classification. This is the first time that representation learning was applied to the detection and classification of malicious traffic. When the three classifiers are verified in two cases, the results meet the accuracy requirements of practical applications. This work proves that the efficiency of representation learning in malicious traffic detection is high, but there are also shortcomings: the tuning of the convolutional neural network parameters is not studied, and the time factor and unknown malicious traffic are not considered [18]. Reference [19] classifies more than 20 kinds of fine-grained network traffic based on hierarchical learning. The results on large data sets show that the average accuracy of the hierarchical classifier can reach 90%, and the accuracy and recall for commonly used traffic categories are higher than 95%. Wang et al. used a long short-term memory (LSTM) recurrent neural network to automatically learn the timing characteristics in the traffic, which removed the need for manually designed features and achieved a high detection rate and a low false alarm rate. In the literature, a recurrent neural network has also been used to learn the timing characteristics of encrypted traffic in order to recognize mobile application types on the Android platform, achieving a high recognition rate and recall rate. Reference [5] uses an improved RNN and a density clustering method to detect abnormal network traffic, achieving better results than existing methods. Reference [20] introduces Deep Packet, an algorithm that uses deep learning [21] to automatically extract features from network traffic in order to classify traffic.
Deep Packet is the first traffic classification system using deep learning algorithms, namely a stacked autoencoder (SAE) and a one-dimensional convolutional neural network, and it can both identify applications and handle traffic characterization tasks. The automatic feature extraction process of network traffic saves the cost of relying on experts to identify traffic and manually extract features and reduces the overhead of traffic classification. Classification methods based on machine learning have high classification accuracy and can be used for the identification of encrypted traffic, but the cost is high, and the data set needs to be understood and preprocessed in advance. Different business types have different requirements for the packet size of the data flow. For example, streaming media packets are small, while packets of a file download can be as large as the maximum segment size. Therefore, the distribution of packet sizes differs between business types. Methods based on the packet size distribution are not affected by encryption and have good applicability. Qin et al. proposed a new method based on the packet size distribution signature, which can reduce the amount of packet processing and accurately identify P2P and VoIP applications. Renyi cross-entropy is used for identification by calculating the similarity between the bidirectional flow and the message size distribution of specific applications [22]. Wang et al. [5] simultaneously used CNN and LSTM to learn and classify packet headers and payloads, showing good performance in real-time intrusion detection. Aceto et al. [23,24] designed and implemented a recognition model based on MLP in order to track which app a data stream came from.
They used some features from the first N bytes of the payload and raw data, and some features from the first 20 packets before interactive communication, including source port, payload bytes, TCP sliding window size, packet inter-arrival time, and direction, as input, and compared the results experimentally with random forest, stacked autoencoder (SAE), CNN, and LSTM. Martin et al. [25] conducted a group of controlled experiments using RNN, CNN, and a combination of RNN and CNN, respectively, to identify Internet of Things traffic. The results showed that the combination of RNN and CNN performed best. Hochst et al. [26] designed and implemented an autoencoder in order to detect actions such as web browsing interaction, game download, online play, and upload in network traffic, which achieved good results. It can be seen that using deep learning to classify encrypted traffic is a promising research direction.

Encrypted Traffic Classification Model Design

Restricted Boltzmann Machine. The restricted Boltzmann machine (RBM) is a variant of the Boltzmann machine (BM). Based on statistical mechanics, the samples of a BM follow the Boltzmann distribution.
The probability distribution of an energy-based probability model is defined by the energy function $E(x)$:

$$P(x) = \frac{e^{-E(x)}}{Z}, \tag{1}$$

where $x$ is the input sample and $Z = \sum_{x} e^{-E(x)}$ is the normalization function. The commonly used method to solve for $P(x)$ is gradient descent, and the negative log-likelihood of the training set $D$ is its cost function:

$$\ell(\theta, D) = -\frac{1}{N} \sum_{x^{(i)} \in D} \log P\left(x^{(i)}\right), \tag{2}$$

where $\theta$ is the parameter space of the model and $N$ is the number of samples in $D$. The partial derivative with respect to $\theta$ is obtained through the optimization algorithm so as to get the optimal solution of the cost function:

$$\Delta = \frac{\partial \ell(\theta, D)}{\partial \theta}. \tag{3}$$

The Boltzmann machine is a stochastic neural network defined by the above energy function, which consists of a visible layer and a hidden layer, as shown in Figure 1(a). As can be seen from the figure, both intralayer nodes and interlayer nodes have connection weights, and each node has only two states: activated and inactive, where 1 means activated and 0 means inactive. Therefore, the visible unit vector is $v \in \{0, 1\}^{D}$ and the hidden unit vector is $h \in \{0, 1\}^{k}$, and the learning mode is unsupervised. The energy function between the visible layer neurons and hidden layer neurons of the BM model is defined as follows:

$$E(v, h; \theta) = -\frac{1}{2} v^{T} L v - \frac{1}{2} h^{T} R h - v^{T} W h - a^{T} v - b^{T} h, \tag{4}$$

where $\theta = \{W, L, R, a, b\}$ is the parameter set of the BM model; $W$, $L$, and $R$ are the connection weights between visible and hidden nodes, among visible nodes, and among hidden nodes, respectively, and the diagonal elements of $L$ and $R$ are 0; $a$ and $b$ represent the biases of $v$ and $h$. Through this energy function, the probability distribution can be obtained by formula (1), and the model can then be solved. Although BM has a strong self-learning ability and can learn complex internal features in data, its complex structure results in a very long training time. In addition, it is difficult to obtain random samples of the distribution represented by BM, so its practical value is relatively low. The difference between RBM and BM lies in that the neurons in the same layer are independent of each other, that is, $L = 0$ and $R = 0$.
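Only interlayer neurons are connected, and the structure is shown in Figure 1. The energy function of the RBM is defined as follows:

$$E(v, h; \theta) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i} \sum_{j} v_i W_{ij} h_j, \tag{5}$$

where $\theta = \{W_{ij}, a_i, b_j\}$, and $W_{ij}$ is the weight between the visible layer and the hidden layer. The purpose of learning an RBM is to fit the distribution of the training data by finding the appropriate parameter $\theta$. To get the optimal value of $\theta$, we can use the stochastic gradient ascent method; the key step is therefore to find the partial derivative with respect to each parameter. The gradient of the RBM log-likelihood function is as follows:

$$\frac{\partial \log L(\theta)}{\partial W_{ij}} = \langle v_i h_j \rangle_{P(h \mid v)} - \langle v_i h_j \rangle_{P(v, h)}, \quad \frac{\partial \log L(\theta)}{\partial a_i} = \langle v_i \rangle_{P(h \mid v)} - \langle v_i \rangle_{P(v, h)}, \quad \frac{\partial \log L(\theta)}{\partial b_j} = \langle h_j \rangle_{P(h \mid v)} - \langle h_j \rangle_{P(v, h)}. \tag{6}$$

In the above formula, $L(\theta)$ is the likelihood function of the RBM model, and $\langle \cdot \rangle_{P}$ represents the expectation with respect to the distribution $P$. For the former term, the probability distribution of $h$ under a given sample can be calculated; for the latter term, all possible values of $v$ need to be enumerated before the joint probability distribution can be calculated. Therefore, a feasible sampling method is needed to approximate this expectation.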
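The RBM energy function and the resulting conditional probability can be checked numerically on a toy model. The following is a minimal sketch, assuming the standard sigmoid RBM (not the modified activation introduced later) and a tiny, randomly initialized model with 3 visible and 2 hidden units; all sizes, seeds, and names are illustrative assumptions. It enumerates every state to compute the normalization function $Z$ by brute force and compares the brute-force conditional $P(h_0 = 1 \mid v)$ with the closed-form sigmoid expression.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_energy(v, h, W, a, b):
    # E(v, h) = -a.v - b.h - v^T W h for binary vectors v and h
    return -(a @ v) - (b @ h) - (v @ W @ h)

def binary_vectors(n):
    # All 2^n binary vectors of length n (only feasible for tiny n)
    return [np.array(bits, dtype=float) for bits in itertools.product([0, 1], repeat=n)]

# Hypothetical toy parameters, chosen only for illustration
rng = np.random.default_rng(0)
D, K = 3, 2
W = rng.normal(scale=0.5, size=(D, K))
a = rng.normal(scale=0.1, size=D)
b = rng.normal(scale=0.1, size=K)

# Brute-force joint distribution P(v, h) = exp(-E(v, h)) / Z
states = [(v, h) for v in binary_vectors(D) for h in binary_vectors(K)]
weights = np.array([np.exp(-rbm_energy(v, h, W, a, b)) for v, h in states])
Z = weights.sum()
probs = weights / Z

# Compare brute-force P(h_0 = 1 | v0) with the closed form sigmoid(b_0 + sum_i v_i W_i0)
v0 = np.array([1.0, 0.0, 1.0])
num = sum(p for (v, h), p in zip(states, probs) if np.array_equal(v, v0) and h[0] == 1.0)
den = sum(p for (v, h), p in zip(states, probs) if np.array_equal(v, v0))
brute_force = num / den
closed_form = sigmoid(b + v0 @ W)[0]
print(brute_force, closed_form)
```

The enumeration grows as $2^{D+k}$, which is why the model-expectation term of the gradient is intractable for realistic layer sizes and has to be approximated by sampling.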

Gibbs Sampling Method. Gibbs sampling [1] is a sampling method based on the Markov Chain Monte Carlo (MCMC) strategy that constructs random samples from probability distributions of multiple variables. For example, samples from the joint distribution of two or more variables are constructed in order to compute integrals and expectations. The efficiency of general MCMC algorithms is low on high-dimensional data because proposed transitions are accepted only with a certain acceptance rate. If the acceptance rate can be set to 1, the slow convergence caused by the frequent rejection of transitions can be avoided, and Gibbs sampling achieves this when sampling the joint distribution of high-dimensional random variables. The specific process is as follows. Assume that for an $M$-dimensional random vector $X = (x_1, x_2, \ldots, x_M)$ the joint probability distribution $P(X)$ cannot be obtained directly, but given the remaining components $x_{k^-} = (x_1, x_2, \ldots, x_{k-1}, x_{k+1}, \ldots, x_M)$ of $X$, the conditional probability of the $k$-th component $x_k$ is $P(x_k \mid x_{k^-})$. Then, starting from an initial state of $X$ (such as $[x_1^{(0)}, x_2^{(0)}, \ldots, x_M^{(0)}]$), each component of the state is sampled in turn from its conditional probability, iteratively. The distribution of the random variable converges geometrically to $P(X)$ as the number of samples increases.
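The component-wise procedure can be illustrated on a distribution whose conditionals are known in closed form. The sketch below is an illustrative assumption, not part of the original experiments: it targets a standard bivariate normal with correlation `rho`, for which $x_1 \mid x_2 \sim N(\rho x_2, 1 - \rho^2)$ and symmetrically for $x_2$. The function name, seed, burn-in length, and sample count are hypothetical.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, burn_in, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    cond_std = np.sqrt(1.0 - rho ** 2)   # std of each conditional distribution
    x1, x2 = 0.0, 0.0                    # arbitrary initial state
    samples = np.empty((n_samples, 2))
    for step in range(burn_in + n_samples):
        # Sample each component from its conditional given the other one;
        # every draw is accepted, so there is no rejection step.
        x1 = rng.normal(rho * x2, cond_std)
        x2 = rng.normal(rho * x1, cond_std)
        if step >= burn_in:
            samples[step - burn_in] = (x1, x2)
    return samples

rng = np.random.default_rng(42)
samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000, burn_in=1000, rng=rng)
sample_corr = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
print(sample_corr)
```

After the burn-in period, the empirical correlation of the chain should be close to the target `rho`, which is one simple way to check that the chain has reached the joint distribution.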
The Gibbs algorithm is employed to obtain random samples conforming to the model distribution of the RBM. The sampling process is shown in Figure 2.
The specific steps of t-step Gibbs sampling in an RBM are as follows:
Step 1: Use the input sample to initialize the state $v^{(0)}$ of the visible nodes.
Step 2: Determine the number of sampling steps $t$. For $s = 1, 2, \ldots, t$, sampling is carried out according to the conditional probabilities: $h^{(s-1)}$ is obtained by sampling from $P(h \mid v^{(s-1)})$, and then $v^{(s)}$ is obtained by sampling from $P(v \mid h^{(s-1)})$.
Step 3: Step 2 is repeated $t$ times; when $t$ is large enough, $v^{(t)}$ can be regarded as a sample from the model distribution.
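The three steps above can be sketched as a short alternating sampling loop. This is a minimal illustration assuming binary units and the standard sigmoid conditionals of an RBM; the layer sizes, the random initialization, and the helper names `sample_bernoulli` and `gibbs_chain` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    # Draw binary states with activation probabilities p
    return (rng.random(p.shape) < p).astype(float)

def gibbs_chain(v0, W, a, b, t, rng):
    """Run t steps of alternating Gibbs sampling starting from visible state v0."""
    if t < 1:
        raise ValueError("t must be at least 1")
    v = v0.astype(float)                                # Step 1: initialize v^(0)
    for _ in range(t):                                  # Steps 2-3: repeat t times
        h = sample_bernoulli(sigmoid(b + v @ W), rng)   # h^(s-1) ~ P(h | v^(s-1))
        v = sample_bernoulli(sigmoid(a + W @ h), rng)   # v^(s)   ~ P(v | h^(s-1))
    return v, h

# Hypothetical toy model: 6 visible units, 4 hidden units
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 4))
a = np.zeros(6)
b = np.zeros(4)
v0 = np.array([1, 0, 1, 1, 0, 0], dtype=float)
v_t, h_last = gibbs_chain(v0, W, a, b, t=10, rng=rng)
print(v_t, h_last)
```

Because the units within a layer are conditionally independent given the other layer, each layer can be sampled in one vectorized operation instead of unit by unit.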

DBN Model Based on Modified Elliott Function. A DBN model is composed of multiple RBMs stacked on top of each other, so in the training process of each RBM, the activation function also determines the ability of feature extraction. The RBM performs one-step sampling using the contrastive divergence (CD) algorithm and Gibbs sampling. Firstly, the visible layer is mapped to the output of the hidden layer through the activation function, and then this output is taken as the input to the visible layer.
According to the analysis of the activation function in this article, the activation function occupies a core position in network training. If the activation function is improperly selected, it is difficult to improve training accuracy no matter how the model structure is constructed. However, if the activation function is properly selected, the feature extraction ability of the network can be significantly improved. Based on this, a DBN model based on the Modified Elliott function (ME-DBN) is proposed in this article. The Elliott function [2] satisfies the generalized logistic differential equation, so this paper introduces the Elliott function, given as formula (7), into the model to improve on the traditional sigmoid activation function. To ensure that all neurons are saturated in the pretraining stage, the activation function should have a high gradient near zero. Based on this analysis, formula (7) is revised in this paper to obtain the modified Elliott function. Figure 3 shows the function graphs of the modified Elliott function and the sigmoid function.
As can be seen from Figure 3, the modified Elliott function is steeper near zero, which causes more of the major features to fall into the middle region of the function. At the same time, it reaches the threshold at a lower input value, which is closer to the behavior of a biological neuron than the sigmoid function.
Next, this paper compares the modified Elliott function with the ReLU function, as shown in Figure 4. As can be seen from Figure 4, ReLU has no gradient on the negative half-axis of the input, whereas the modified function in this paper does, so the problem of failing to update the weights will not occur.
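The qualitative comparison between the activation functions can be reproduced numerically. The sketch below is an illustration under stated assumptions: the exact modified Elliott function is not reproduced in this text, so `elliott` uses a commonly cited form of the Elliott function rescaled to the range (0, 1), and the `steepness` parameter is a hypothetical knob added only to show how the slope near zero can be raised. It is not the paper's modification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def elliott(x, steepness=1.0):
    # Assumed Elliott-type unit with outputs in (0, 1); `steepness` is hypothetical
    z = steepness * x
    return 0.5 * z / (1.0 + np.abs(z)) + 0.5

def numerical_grad(f, x, eps=1e-6):
    # Central finite difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# Slope at zero: the Elliott-type unit is steeper than the sigmoid
slope_sigmoid_0 = numerical_grad(sigmoid, 0.0)
slope_elliott_0 = numerical_grad(elliott, 0.0)

# Gradient on the negative half-axis: ReLU is flat, the Elliott-type unit is not
grad_relu_neg = numerical_grad(relu, -2.0)
grad_elliott_neg = numerical_grad(elliott, -2.0)

print(slope_sigmoid_0, slope_elliott_0, grad_relu_neg, grad_elliott_neg)
```

Under these assumptions the slope at zero is about 0.25 for the sigmoid and about 0.5 for the Elliott-type unit, while at an input of -2 the ReLU gradient is exactly zero and the Elliott-type gradient is small but positive.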
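Figure 2: Gibbs sampling process.
To better fit the distribution of the input data in the pretraining stage, the activation function used in pretraining is improved in this paper. After introducing the modified Elliott function into the RBM, the conditional probabilities of the visible layer and hidden layer can be deduced as follows:

$$P(h_j = 1 \mid v) = f_{ME}\Big(b_j + \sum_{i} v_i W_{ij}\Big), \quad P(v_i = 1 \mid h) = f_{ME}\Big(a_i + \sum_{j} W_{ij} h_j\Big), \tag{9}$$

where $f_{ME}$ denotes the modified Elliott function. In the pretraining stage, the training objective is still to maximize the likelihood function. The CD algorithm is used for sampling, so the parameter update formula is as follows:

$$\Delta W_{ij} = \varepsilon \big( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{recon} \big), \quad \Delta a_i = \varepsilon \big( \langle v_i \rangle_{data} - \langle v_i \rangle_{recon} \big), \quad \Delta b_j = \varepsilon \big( \langle h_j \rangle_{data} - \langle h_j \rangle_{recon} \big), \tag{10}$$

where $\varepsilon$ is the learning rate, $\langle \cdot \rangle_{data}$ is the expectation over the training data, and $\langle \cdot \rangle_{recon}$ is the expectation over the reconstructions obtained by CD sampling. The parameters trained in the pretraining stage are used as the initialization of the fine-tuning stage, and the whole ME-DBN network is fine-tuned using the gradient descent method.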
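One pretraining update of this kind can be sketched as a single contrastive divergence step (CD-1) with a pluggable activation. This is a sketch under stated assumptions, not the authors' implementation: `elliott_unit` is an assumed stand-in for the modified Elliott function (whose exact form is not reproduced in this text), its output is interpreted as an activation probability, and the batch, layer sizes, and learning rate are hypothetical.

```python
import numpy as np

def elliott_unit(x):
    # Assumed stand-in for the modified Elliott function; outputs lie in (0, 1)
    return 0.5 * x / (1.0 + np.abs(x)) + 0.5

def sample_bernoulli(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def cd1_step(v0, W, a, b, lr, rng, act=elliott_unit):
    """One CD-1 parameter update on a batch v0 of shape (n_samples, n_visible)."""
    p_h0 = act(b + v0 @ W)             # hidden probabilities given the data
    h0 = sample_bernoulli(p_h0, rng)   # sampled hidden states
    p_v1 = act(a + h0 @ W.T)           # reconstruction probabilities
    v1 = sample_bernoulli(p_v1, rng)   # sampled reconstruction
    p_h1 = act(b + v1 @ W)             # hidden probabilities given the reconstruction
    n = v0.shape[0]
    dW = (v0.T @ p_h0 - v1.T @ p_h1) / n   # data term minus reconstruction term
    da = (v0 - v1).mean(axis=0)
    db = (p_h0 - p_h1).mean(axis=0)
    return W + lr * dW, a + lr * da, b + lr * db

# Hypothetical toy batch: 8 binary samples with 6 features, 4 hidden units
rng = np.random.default_rng(0)
batch = (rng.random((8, 6)) < 0.5).astype(float)
W0 = rng.normal(scale=0.01, size=(6, 4))
a0 = np.zeros(6)
b0 = np.zeros(4)
W1, a1, b1 = cd1_step(batch, W0, a0, b0, lr=0.1, rng=rng)
print(W1.shape, a1.shape, b1.shape)
```

Passing `act` as a parameter keeps the update rule unchanged while allowing the sigmoid and an Elliott-type unit to be swapped for comparison. In a stacked DBN, the hidden probabilities of one trained RBM would serve as the input batch of the next.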

Evaluation Processing
After the three classification models are trained on the training data, their performance is evaluated on the test data. The classifier best suited to the current traffic environment is the one with the highest classification accuracy. Accuracy is defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{11}$$

In the formula, TP is a true positive, indicating that traffic belonging to category C is classified into category C; FP is a false positive, indicating that traffic not belonging to category C is mistakenly classified into category C; FN is a false negative, indicating that traffic belonging to category C is classified into other categories; TN is a true negative, indicating that traffic not belonging to category C is classified as not belonging to category C.
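The accuracy defined in formula (11) is used to select the optimal model. At the same time, three indicators are used to evaluate the performance of the proposed model: Precision, Recall, and F1 score, defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{12}$$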
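These indicators can be computed per category from one-vs-rest counts. The sketch below uses a deliberately unbalanced toy example (nine samples of one category, one sample of another, and a classifier that always predicts the large category) to show why accuracy alone can be misleading on unbalanced data. The labels and helper names are hypothetical, and undefined ratios (zero denominators) are reported as 0.0 by convention.

```python
import numpy as np

def one_vs_rest_counts(y_true, y_pred, c):
    """Return (TP, FP, FN, TN) for category c."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == c) & (y_pred == c)))
    fp = int(np.sum((y_true != c) & (y_pred == c)))
    fn = int(np.sum((y_true == c) & (y_pred != c)))
    tn = int(np.sum((y_true != c) & (y_pred != c)))
    return tp, fp, fn, tn

def safe_div(num, den):
    # Convention: report 0.0 when the ratio is undefined
    return num / den if den else 0.0

def metrics(tp, fp, fn, tn):
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy unbalanced data: category 0 is large, category 1 is small
y_true = [0] * 9 + [1]
y_pred = [0] * 10          # a classifier that always predicts the large category
major = metrics(*one_vs_rest_counts(y_true, y_pred, 0))
minor = metrics(*one_vs_rest_counts(y_true, y_pred, 1))
print(major)
print(minor)
```

In this toy case the accuracy is 0.9 for both categories, yet the recall and F1 score of the small category are 0, which is why Precision, Recall, and F1 are reported alongside accuracy.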

Data Processing
The ISCX VPN-nonVPN traffic dataset was selected for the experiment. As shown in Table 1, this data set consists of 15 applications, such as Facebook, YouTube, and Netflix. The selected applications use various security protocols for encryption, including HTTPS, SSL, SSH, and proprietary protocols. The selected data set contains a total of 206,688 packets. The data set is clearly unbalanced: some applications have a large number of traffic samples, such as Netflix, which accounts for 25.126% of the total data set, while others have a small number of traffic samples, such as AIM Chat and ICQ, which account for only 2-3% of the total data set.

Experiment and Simulation
To further assess the effectiveness of our proposed model for identifying encrypted traffic, we compare it with a series of classic benchmark models that have been proven to achieve excellent prediction and classification results in various fields. Figure 5 shows the classification and recognition results together with those of the benchmark models. The benchmarks include the XGBoost and GBDT algorithms, which are based on tree models; the Bayesian classification and SVM algorithms, which are classical classification algorithms; and the LSTM and RNN models, which are based on neural networks. We also take a single DBN model as one of the benchmark models in order to compare the classification and recognition results of the DBN model and our ME-DBN model. As can be seen from Figure 5: (1) compared with all benchmark models, the ME-DBN model achieves the best performance on the five indicators, which indicates that our proposed model is effective in the classification of encrypted traffic; (2) among the benchmark models, the DBN model achieves the best results on the ACC and F1 indicators and also ranks in the top three on the other indicators, which indicates that, on the whole, the DBN model can effectively identify encrypted traffic; (3) although the RNN model achieves the best result on the FRrate indicator, its performance on the other indicators is poor, which suggests that classification results based on the RNN model are unstable; (4) the ME-DBN model outperforms the DBN model on all five indicators, which indicates that the proposed method effectively enhances the basic DBN model; (5) the experimental results also show that neural network based models such as CNN, RNN, and LSTM achieve significantly better classification and identification of encrypted traffic than the other benchmark models (including the decision tree-based classification models and the classical mathematical classification models).
Figures 6 and 7 show the comparison of the training and prediction times of different algorithms on the ISCX VPN-nonVPN dataset. Among the samples, 70% are selected as training samples and 30% as test samples. It can be seen from the figure that in the ISCX

Conclusions
At present, the deep neural network (DNN) has become an important research topic in machine learning. Feature extraction algorithms based on DNN mainly use a deep neural network model to extract features from data by imitating the information processing mechanism of the human brain, so as to screen out the important information in the data. The deep neural network performs excellently in extracting information from images, sound, text, and other data. However, as data sets grow in scale, the network structure becomes more and more complex, making network training more difficult, which requires more effective training methods. Secondly, when the traditional sparse deep network model learns input data, all hidden layer nodes may have the same effect, so the phenomenon of feature homogeneity is not completely eliminated. In addition, the output of the traditional sigmoid activation function has a nonzero mean, which makes it difficult to train the network effectively and makes it prone to the phenomenon of gradient disappearance. To handle the above problems, this article studies the feature extraction algorithm based on DNN and proposes a DBN traffic classification method based on nonlinear correction. In view of the gradient disappearance that the traditional sigmoid function is prone to, the Elliott function, which satisfies the generalized logistic differential equation, is proposed to replace the sigmoid activation function, and the Elliott function is then modified to meet the characteristics of the RBM. The modified Elliott function can keep the nodes in the saturation state, so the problem of gradient disappearance is less likely to occur.
Data Availability
The supporting data can be obtained from the author upon request.

Conflicts of Interest
The author declares no conflicts of interest.

Mathematical Problems in Engineering