ETCC: Encrypted Two-Label Classification Using CNN

Because of the increasing variety of encryption protocols and services in networks, the characteristics of an application differ considerably from one protocol to another. However, very few existing studies on encrypted application classification take the type of encryption protocol into account. To achieve refined classification of encrypted applications, this paper proposes Encrypted Two-Label Classification using CNN (ETCC), a method that identifies both the protocol and the application. ETCC is a two-stage, two-label classification method. The first stage classifies the protocol used by the encrypted traffic. The second stage selects the classifier corresponding to that protocol and uses it to classify the application. Experimental results show that ETCC achieves 97.65% accuracy on a public dataset (CICDarknet2020).

with encryption [6]. In recent years, machine learning has been the most commonly used approach for classifying encrypted traffic. This is because encryption usually applies only to the payload, whereas machine learning methods rely on statistical features rather than on payload values. Hence, machine learning methods are less affected by encryption, which makes them more accurate than other methods.
Most encrypted application classification methods are single-label. In other words, they use one classifier to determine the application type of network traffic directly. Under different encryption protocols, however, the characteristics of an application also differ. An encryption protocol has two main steps: the initialization of the connection and the transmission of encrypted data. The initialization of the connection is divided into the initial handshake, identity verification, and shared key establishment. Because different encryption protocols rest on different encryption principles, these steps vary widely, which leads to different representations of the final encrypted traffic [5].
Therefore, if we classify encrypted applications on the basis of a known encryption protocol, we can obtain more accurate results than with single-label classification.
In this paper, we propose an Encrypted Two-Label Classification method, referred to as ETCC, to improve the accuracy of encrypted application classification. ETCC is a two-stage, two-label classification method. The two labels are the encryption protocol and the application. The first stage classifies the protocol used by the encrypted traffic. The second stage uses the classifier corresponding to that protocol to classify the application. The contributions of this paper are summarized as follows:
(1) We propose a two-stage, two-label scheme called ETCC, which carries out refined application classification according to the encryption protocol used.
(2) In the second stage of application classification, encrypted traffic selects the corresponding classifier according to its protocol type, instead of uniformly using the same classifier.
(3) Our scheme can identify both the protocol and the application, which can meet various needs.
The rest of this paper is organized as follows. Section 2 introduces encrypted traffic classification methods and multilabel classification methods. Section 3 proposes a scheme to achieve refined application classification. Section 4 presents experiments and evaluations. Finally, Section 5 concludes our work and proposes future work.

Related Work
In this section, we introduce methods for classifying encrypted traffic and methods for multilabel classification. These works inspired our research.
In early research, the commonly used methods included port-based, entropy-based, payload-based, and pattern-matching-based classification. In the early days of the Internet, every application had a fixed port number assigned by IANA [7]. Therefore, checking the IANA TCP/UDP port list was enough to determine the type of application. However, with the emergence of techniques such as port obfuscation and network address translation (NAT), port-based methods are no longer feasible. Entropy-based methods classify encrypted traffic by extracting geometric features between traffic. Casino et al. [8] propose a method that distinguishes encrypted from nonencrypted traffic based on the entropy value. To ensure real-time performance, they analyze only a random subset rather than the complete network traffic. Payload-based methods can no longer inspect packet contents once the payload is encrypted and are therefore no longer usable [9]. Pattern-matching methods check the header format to decide whether traffic is encrypted and which encryption protocol it uses, but they cannot go further and determine the application type. In summary, more advanced methods are needed for the encrypted traffic classification task.
The most commonly used methods are based on machine learning. They differ in feature extraction, model selection, and parameter setting. Liu et al. [10] consider only the first N packets in a sliding window, which reduces both the dimension of the encrypted traffic features and the number of packets in each flow. Similarly, Hasan et al. [11] analyze the first 64 packets to identify Android applications and conclude that most Android applications can be identified through the TCP/IP header. Shen et al. [12] combine the certificate packet length and the first application data size into a unique fingerprint for a given application and then use a second-order Markov chain to classify encrypted applications.
Cui et al. [13] propose the SPCaps model, which uses capsule neural networks (CapsNet) to learn the spatial features of encrypted traffic. The advantage of this model is that it simultaneously learns the position of a feature within a packet and the order of the packets. Ly Vu et al. [14] used time series as an entry point to classify encrypted traffic. Their method has two steps. The first step extracts behavior patterns based on the time series of packets. The second step classifies according to the correlation between time series samples. Zeng et al. [15] take a more comprehensive view.
Their scheme analyzes not only spatial features but also temporal and coding features. However, these works still ignore the burstiness of network traffic and cannot capture complex nonlinear features. The framework proposed in [16] leverages multifractal feature extraction, which can capture the self-similarity of the network traffic structure over a wide time range. Because it is difficult to be comprehensive when extracting features by hand, Wang et al. [17] took a different approach: they directly converted each flow into an image and fed it to the model for classification. Lotfollahi et al. [18] employ a CNN and an SAE, respectively, to classify encrypted traffic. Their approach needs no expert to extract features and has served as a reference for many later studies.
The methods above address classification in general scenarios; for specific scenarios, dedicated methods can be more efficient. Shen et al. [19] study traffic classification in Ethereum. Because these flows are all generated on the same platform, they are more difficult to distinguish. To this end, the authors examine where existing methods tend to misclassify and extract features from three aspects: packet length, packet burst, and time series. To evaluate quality of experience (QoE) and bring better services to users, Orsolic et al. [20] propose a system for YouTube videos called YouQ. They collect YouTube videos and evaluate the QoE of each video based on the traffic characteristics of its session. Similarly, Tarun Mangla et al. [21] evaluated the QoS of encrypted HTTP-based adaptive streaming (HAS) sessions. Anderson et al. [22] analyze TLS-encrypted sessions in commercial malware sandboxes and two enterprise networks. They claim that the choice of features has a great impact on performance. To monitor and detect specific users, Pierre-Olivier Brissaud et al. [23] propose a scheme for monitoring HTTP/2 communication over the TLS protocol.
This scheme is designed to detect whether the user has performed certain specified operations. The QUIC (Quick UDP Internet Connection) protocol is a new, encrypted-by-default Internet communication protocol that provides many improvements to speed up HTTP communication while making it more secure. However, since it is a new protocol, very little data is available. Rezaei et al. [24] propose a semisupervised learning method that first trains the model with a large amount of unlabeled data and then retrains it with a small amount of labeled data. For network traffic classification, this greatly reduces the amount of labeled data required.
Studies on multilabel classification are few. There are two common ways to handle it: convert the multilabel classification problem into several single-label classification problems, or integrate multiple labels into a single label. Grigorios Tsoumakas et al. [25] give a detailed introduction to multilabel classification and compare several classification methods, which provides a lot of guidance for our research. Tien Thanh Nguyen et al. [26] propose a Bayes-based method that considers not only the relationship between labels and features but also the relationship between label pairs. Jesse Read et al. [27] constructed a multilabel Hoeffding tree with classifiers at the leaves. Moreover, they create a new set of benchmarks in predictive performance and time complexity. Darshin Kalpesh Shah et al. [28] use RNN and LSTM to classify multilabel text; the performance is significantly better than that of Logistic Regression and ExtraTrees. Ou Guangjin et al. [29] present a multilabel zero-shot learning model based on graph convolutional networks to recognize novel categories. Most multilabel classification work assumes that categories are independent. However, Nadia Ghamrawi et al. [30] study the problem of high label dependence. Jesse Read et al. [31] also study the high dependency between labels, using a chaining method to model the label relationship. Pengcheng Yang et al. [32] regard the multilabel classification task as a sequence generation problem and use a sequence generation model for classification. Experiments show that this method can effectively capture the correlation between labels.
These works helped us considerably. Similarly, our paper proposes a two-stage, two-label method in which protocols are classified in the first stage and applications in the second stage. The biggest difference between our method and other multilabel classification methods is that ours selects the classifier for the second stage based on the result of the first stage. This yields a refined classification, and the two labels can serve various needs.

Methodology
In this section, we propose a two-stage, two-label scheme, called ETCC, to classify encrypted applications. Our scheme consists of three modules: the preprocessing module, the first label module, and the second label module. They are used to preprocess data, classify protocols, and classify applications, respectively. Figure 1 presents the details.

Preprocessing Module.
This module processes raw data and converts it into a format suitable as classifier input.
First, we collect encrypted traffic and label it with protocols and applications.
Second, we select and extract features. A flow is a collection of packets with the same IP five-tuple {Source IP, Destination IP, Source Port, Destination Port, Protocol}. Because the packets of one flow usually share the same encryption protocol and application, we process data in units of flows. We use spatial features and temporal features to distinguish encrypted traffic, because these two kinds of features are not easily affected by encryption. Spatial features relate to quantity and size. Temporal features are associated with time series. The specific features are shown in Table 1.
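As an illustration of processing data in units of flows, the following minimal sketch groups packets by their five-tuple and derives a few spatial and temporal features. The `Packet` record, the function names, and the feature names are hypothetical and chosen only for this example; the sketch also treats each direction of a connection as its own flow, which is a simplification and not necessarily how the features in Table 1 were extracted.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record holding only the fields this sketch needs."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    timestamp: float  # seconds
    length: int       # bytes


FlowKey = Tuple[str, str, int, int, str]


def group_into_flows(packets: List[Packet]) -> Dict[FlowKey, List[Packet]]:
    """Group packets that share the same IP five-tuple into one flow."""
    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p)
    return dict(flows)


def flow_features(flow: List[Packet]) -> Dict[str, float]:
    """Compute a few illustrative spatial and temporal features of one flow."""
    ordered = sorted(flow, key=lambda p: p.timestamp)
    lengths = [p.length for p in ordered]
    # Inter-arrival times (IAT) between consecutive packets of the flow.
    iats = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    return {
        "packet_count": len(ordered),                       # spatial
        "total_bytes": sum(lengths),                        # spatial
        "mean_packet_length": sum(lengths) / len(lengths),  # spatial
        "flow_duration": ordered[-1].timestamp - ordered[0].timestamp,  # temporal
        "mean_iat": sum(iats) / len(iats) if iats else 0.0,             # temporal
    }
```

Each flow then yields one feature vector, which is the unit that the later modules classify.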
Third, we use the Sequential Floating Forward Selection (SFFS) algorithm [33] to select the most suitable features. We finally selected 41 features covering Port, Protocol, Flow Duration, Length of Packet, Flow Bytes/s, Packets/s, Flow IAT, Forward IAT, Backward IAT, Flag Count, and Active Time. Detailed features are shown in Table 1. With this reduced feature set, we obtain a classifier that generalizes better and runs faster.
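To make the selection step more concrete, here is a minimal sketch of the general SFFS idea: a greedy forward step that adds the most helpful feature, followed by conditional backward steps that drop a feature whenever doing so beats the best subset of that smaller size seen so far. The function name `sffs`, the `score` callback, and the target size `k` are assumptions for illustration; the paper does not state which scoring criterion or stopping rule was used.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Subset = FrozenSet[str]


def sffs(features: Iterable[str], score: Callable[[Subset], float], k: int) -> Subset:
    """Return a k-feature subset found by Sequential Floating Forward Selection.

    `score` is a user-supplied function (hypothetical here) that rates a
    feature subset, for example by cross-validated accuracy; higher is better.
    """
    pool = list(dict.fromkeys(features))  # de-duplicate while keeping order
    if not 0 < k <= len(pool):
        raise ValueError("k must be between 1 and the number of features")

    selected: Subset = frozenset()
    best: Dict[int, Tuple[float, Subset]] = {}  # best (score, subset) per size

    while len(selected) < k:
        # Forward step: add the feature that gives the highest score.
        candidate = max(
            (f for f in pool if f not in selected),
            key=lambda f: score(selected | {f}),
        )
        selected = selected | {candidate}
        current = score(selected)
        size = len(selected)
        if size not in best or current > best[size][0]:
            best[size] = (current, selected)

        # Floating step: drop the least useful feature for as long as the
        # reduced subset beats the best subset of that size seen so far.
        while len(selected) > 2:
            weakest = max(selected, key=lambda f: score(selected - {f}))
            reduced = selected - {weakest}
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, reduced)
            else:
                break

    return best[k][1]
```

Because a backward step is accepted only when it strictly improves the best score recorded for that subset size, the search cannot cycle and always terminates.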
Finally, we apply Min-Max Scaling [34] to normalize the features, which meets the input requirements of the supervised classifier and speeds up model training. The formula of Min-Max Scaling is
$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}},$$
where $X_{max}$ is the maximum value of the sample data, $X_{min}$ is the minimum value of the sample data, $X$ is the current sample value, and $X_{norm}$ is the normalized value of the current sample. After this step, all feature values are mapped to the interval [0, 1] and fed into the first label module.

First Label Module.
... input. However, if several convolution layers are used in succession, the amount of computation becomes very large; the pooling layer effectively reduces it through downsampling. Next, the flatten layer converts the convolved data into a one-dimensional vector so that it can be connected to the dense layer. Finally, the dense layer combines all local features into global features to produce the classification result. Figure 3 depicts the architecture of the LSTM. The input layer and output layer of the LSTM are similar to those of the CNN; the difference lies in the intermediate computation. LSTM cells learn from two pieces of information: the new input and the previous memory. This allows the LSTM to use historical information effectively, so that it can learn long-term dependencies [35].
After input and calculation, the output layer produces a probability distribution over the m protocol classes, $p_1, p_2, \ldots, p_m$. The predicted category is the one with the largest probability, $p_{max} = \max\{p_1, p_2, \ldots, p_m\}$.
In this way, the protocol types of the encrypted traffic are obtained. We then send these m classes of encrypted traffic to the next module.
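The decision rule of the first label module, picking the class with the largest output probability, can be sketched as follows. The sketch assumes that the output layer applies a softmax to raw scores, which the text does not state explicitly; the function names are hypothetical, and the labels are the four protocol classes used later in the experiments.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw output scores into a probability distribution (assumed here)."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(probabilities: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the label whose probability is the largest, together with p_max."""
    if len(probabilities) != len(labels):
        raise ValueError("one probability is required per label")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index], probabilities[best_index]


PROTOCOLS = ["Tor", "Non-Tor", "VPN", "Non-VPN"]
label, p_max = predict_label(softmax([0.2, 2.5, 0.1, -1.0]), PROTOCOLS)
```

The returned label determines which application classifier the flow is routed to in the next module.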

Second Label Module.
On the basis of known encryption protocols, we leverage this module to further classify encrypted applications into n categories.
Corresponding to the m encryption protocols obtained in the previous module, we prepare m classifiers. That is, each protocol has its own classifier. Encrypted traffic selects the classifier that matches its protocol type, and each classifier is responsible only for the application classification of one specific protocol. By using different classifiers for different protocols, we can obtain more accurate results.
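The routing logic described above, one protocol classifier followed by one application classifier per protocol, can be sketched as a small dispatcher. The class name `TwoStageClassifier` and the toy threshold functions standing in for trained models are hypothetical; in ETCC each stand-in would be a trained CNN.

```python
from typing import Callable, Dict, Mapping, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


class TwoStageClassifier:
    """Stage 1 predicts the protocol; stage 2 uses that protocol's classifier."""

    def __init__(
        self,
        protocol_classifier: Classifier,
        application_classifiers: Mapping[str, Classifier],
    ) -> None:
        self.protocol_classifier = protocol_classifier
        self.application_classifiers: Dict[str, Classifier] = dict(application_classifiers)

    def predict(self, features: Features) -> Tuple[str, str]:
        """Return the (protocol, application) pair for one flow."""
        protocol = self.protocol_classifier(features)
        if protocol not in self.application_classifiers:
            raise KeyError(f"no application classifier for protocol {protocol!r}")
        application = self.application_classifiers[protocol](features)
        return protocol, application


# Toy stand-ins for trained models (assumptions for illustration only).
stage1 = lambda x: "Tor" if x[0] > 0.5 else "Non-Tor"
stage2 = {
    "Tor": lambda x: "chat" if x[1] > 0.5 else "video",
    "Non-Tor": lambda x: "email",
}
etcc = TwoStageClassifier(stage1, stage2)
```

Keeping the second-stage classifiers in a mapping keyed by protocol makes the selection explicit and surfaces a protocol without a registered classifier as an error instead of a silent misclassification.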
We consider CNN and LSTM as candidate classifiers for this module and ultimately adopt CNN. The performance of the two algorithms is discussed in Section 4.3.

Experiment and Evaluation
In this section, we conduct experiments to evaluate ETCC and compare it with the state-of-the-art method. We deploy our model on Ubuntu 16.04, on a machine equipped with an NVIDIA GTX 1050 GPU.

Dataset Description.
Three public datasets, CICDarknet2020 [36], ISCXTor2016 [37], and ISCXVPN2016 [38], are used to evaluate ETCC. These datasets include four types of protocols and five types of applications. The four protocols are Tor, Non-Tor, VPN, and Non-VPN. The five applications are chat, FTP, email, audio, and video, as shown in Table 2.
CICDarknet2020 is a complete dataset covering both Tor traffic and VPN traffic. The specific quantity of each type of data is shown in Table 3. Since ISCXTor2016 contains only Tor traffic and ISCXVPN2016 contains only VPN traffic, we merge them into one dataset, called ISCX-Tor-VPN. To eliminate errors caused by data sample selection, ISCX-Tor-VPN uses the same sample quantities as CICDarknet2020. In addition, we set the ratio of the training set to the test set to 4 : 1.
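A 4 : 1 split means that 80% of the samples are used for training and 20% for testing. The following sketch shows one simple way to produce such a split; the shuffling, the fixed seed, and the function name are assumptions, since the paper does not say how samples were assigned to the two sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split_4_to_1(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into train and test sets at 4 : 1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = len(shuffled) * 4 // 5           # first four fifths go to training
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = train_test_split_4_to_1(range(100))
```

With imbalanced classes such as the ones reported here, a stratified split per class would keep class proportions equal in both sets; that variant is not shown.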

Parameter Settings.
We ran experiments for each classifier in each stage.
For the first label module, the structures of the CNN classifier and the LSTM classifier are shown in Figure 4. The dropout layer discards neurons with a certain probability to prevent overfitting and improve generalization. Furthermore, we set the activation function, loss function, batch size, and number of epochs to ReLU, categorical_crossentropy, 32, and 15, respectively. As the optimizer, the CNN classifier uses SGD and the LSTM classifier uses Adam.
For the second label module, we have four classifiers to classify encrypted applications. The structures of the CNN classifier and the LSTM classifier are shown in Figure 5. The other parameters are the same as in the first label module.

Results and Discussion.
In this section, we analyze the performance of ETCC on the two datasets and compare ETCC with the state-of-the-art method.
We first evaluate the classification results of the first label module. Figure 6 shows the confusion matrices of the results. Rows represent the true category and columns the predicted category. Each value is the proportion of samples from the true category that were classified into the predicted category.
From Figure 6, we find that, under the same model, the results on CICDarknet2020 are better than the results on ISCX-Tor-VPN. This is because the data of CICDarknet2020 were generated in the same network environment, whereas ISCX-Tor-VPN is a mixed dataset, which makes the distinction between Tor and Non-Tor and between VPN and Non-VPN smaller. Of the two classifiers, CNN clearly produces better results, so we choose CNN as the first-stage classifier. In addition, we find that the easily confused pairs are VPN with Tor and Non-VPN with Non-Tor. This is understandable: the two encrypted categories share similar characteristics, as do the two nonencrypted categories. Tables 4 and 5 show the experimental results of the second label module on the premise that the first label module uses the CNN classifier. Accuracy, precision, recall, and F1 are used to evaluate the scheme.
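They are defined as follows:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}, \quad Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times Precision \times Recall}{Precision + Recall},$$
where, for category X, TP is the number of samples correctly classified into X, TN is the number correctly classified into Not-X, FP is the number incorrectly classified into X, and FN is the number incorrectly classified into Not-X.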
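As a worked illustration, the sketch below derives TP, TN, FP, and FN for each category from a confusion matrix of raw counts (rows are true categories, columns are predicted categories) and computes the four metrics with their standard definitions. The function name and the example counts are hypothetical and are not taken from the paper's tables.

```python
from typing import Dict, List, Sequence


def per_class_metrics(confusion: Sequence[Sequence[int]], labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Compute accuracy, precision, recall, and F1 for every category X."""
    total = sum(sum(row) for row in confusion)
    results: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]                          # X classified as X
        fn = sum(confusion[i]) - tp                   # X classified as Not-X
        fp = sum(row[i] for row in confusion) - tp    # Not-X classified as X
        tn = total - tp - fn - fp                     # Not-X classified as Not-X
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / total if total else 0.0
        results[label] = {
            "accuracy": accuracy,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Made-up counts for a two-class example.
example: List[List[int]] = [[8, 2], [1, 9]]
metrics = per_class_metrics(example, ["Tor", "Non-Tor"])
```

The zero-denominator guards return 0.0 for a category that never occurs or is never predicted, which is one common convention among several.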
As can be seen from Tables 4 and 5, CNN performs better than LSTM. For CICDarknet2020, CNN is better on every indicator except the F1 of Tor. For ISCX-Tor-VPN, CNN is better on every indicator except the precision of Tor, the precision of Non-VPN, and the F1 of Non-VPN. This is because CNN captures local features better, while LSTM memorizes context information. In our datasets, the category of a flow has little relationship with the flows before and after it, so CNN performs better. Therefore, we also chose CNN as the second-stage classifier; even its worst indicator exceeds 91.1%.
Tables 6 and 7 show the performance of the second label module with the CNN classifier. We find that the Non-Tor and Non-VPN classifiers classify better than the Tor and VPN classifiers. This indicates that encryption makes traffic classification more difficult. Another observation is that the precision of email is very low, because the number of email samples in the dataset is very small. This phenomenon would not occur with a balanced sample size. Moreover, audio and video achieve the best classification results.
Finally, we compare the results on CICDarknet2020 and ISCX-Tor-VPN with the state-of-the-art method [39], as shown in Table 8. The result on CICDarknet2020 is better than that on ISCX-Tor-VPN. The reason, as mentioned earlier, is that ISCX-Tor-VPN is a mixed dataset whose data are less distinguishable. Moreover, compared with [39], all indicators improve except the precision of email and the recall of video. Total precision and recall increase by 1% and 1.6%, respectively. In general, ETCC significantly improves the classification accuracy of encrypted applications through its two-stage, two-label method. This supports the view that applications have different characteristics under different protocols and that classifying applications on the basis of known protocols yields more accurate results.

Conclusion and Future Work
In this paper, to achieve refined classification of encrypted applications, we propose a two-stage, two-label scheme. The first stage classifies the protocol used by the encrypted traffic. The second stage uses the classifier corresponding to that protocol to classify the application.
The experimental results show that our scheme is effective and feasible.
This paper discusses two-label classification. We will consider more labels in the future and propose more practical solutions. In addition, our method depends on identifying the encryption protocol. When traffic uses an unknown encryption protocol, the application classification results will be affected. Therefore, we will address unknown encryption protocols in our future work.

Data Availability
The datasets used in this paper are mainly obtained from the following websites: https://www.unb.ca/cic/datasets/darknet2020.html; https://www.unb.ca/cic/datasets/tor.html; https://www.unb.ca/cic/datasets/vpn.html. The raw/processed data required to reproduce these findings cannot be shared at this time, as the data also form part of an ongoing study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.