HTTP Cookie Covert Channel Detection Based on Session Flow Interaction Features

An HTTP cookie covert channel is a covert communication method that encodes malicious information in cookie fields to escape regulatory audits. Such channels are difficult to detect from the cookie content because cookie fields are mainly encoded in custom modes. To effectively identify HTTP cookie covert channels, this paper proposes a detection method based on the interaction features of the session flow. First, we split the HTTP session flow into fine-grained “interaction process” subflows to comprehensively describe the communication process of the cookie. Then, we compare and analyze the differences between HTTP cookie covert channels and normal cookie communications based on the interaction process, design seven features of three types, and build the detection model with a machine learning algorithm. Experimental results show that our method can effectively detect HTTP cookie covert channels, with a detection rate reaching 99%. Experiments and analysis also show that our method has advantages in stability and time performance over existing detection methods. In addition, our method retains a degree of practicability in a simulation environment with imbalanced data.


Introduction
A covert channel refers to the secret transmission of messages over channels that are not designed to transmit information, achieved by using resources shared by the sender and the receiver to encode covert information [1]. Shared resources that can be used to construct covert channels include data packets in the network and data transmission time [2,3]. In some new scenarios, even resources such as caches [4] and optical frequency combs [5] can be used to encode covert information. At present, the application layer protocol HTTP, which is still widespread in networks, is widely used to design covert channels because of its flexibility. The HTTP cookie covert channel delivers information by writing covert data in custom encoding modes in the cookie field of the HTTP protocol. Typically, standard intrusion detection systems do not inspect the cookie content. In addition, the encoding formats of cookies are usually specified separately by different servers [6], making HTTP cookie covert channels well concealed.
The HTTP cookie covert channel is a covert channel that directly manipulates network protocol data. In fact, many covert channels manipulate random values or strings in the protocol in some way [7]. Although covert channels based on random values are well concealed, their transmission efficiency is low because the usable resources are limited. The URL is another common string used to construct covert channels in the HTTP protocol. However, compared with the cookie, the URL contains more semantic information, resulting in relatively poor covertness. In recent years, many attack groups have used HTTP cookie covert channels to carry out network attacks, seriously endangering cyber security. In 2016-2017, after implanting a malicious program into the victim host, the APT10 organization [8] collected the hostname, process identifier, current working directory, window resolution, and Microsoft Windows version of the victim host, then encrypted and encoded them in the cookie field to send them to the command and control (C2) server. In 2016-2017, the backdoor program implanted into the victim host by the APT15 organization [9] transmitted relevant parameters of the victim host and query commands to the malicious server through the cookie field. The data transmitted in the cookie was first encrypted with AES-CBC and then encoded with Base64. In 2019-2021, to establish a secure session with the C2 server, the backdoor implanted into the victim host by the APT29 organization [10] hard-coded the commands that request the session key into the cookie field and sent it to the C2 server.
Therefore, effective detection of HTTP cookie covert channels makes it possible to detect and block related network attacks in time and also provides a basis for attack source tracing, which is of great significance for protecting cyber security. Currently, there is much research on covert channel detection, which, according to the analysis method, can be classified into detection based on packet payload features, detection based on flow behavior statistical features, and detection based on deep learning with automatic feature mining. In most cases, HTTP cookie covert channels and normal HTTP cookie communications behave similarly in terms of packet payload and flow behavior, so detection based on packet payload features or flow behavior statistical features cannot identify HTTP cookie covert channels well. Moreover, detection based on deep learning and automatic feature mining often incurs large model training overheads.
To solve the above problems, we propose an HTTP cookie covert channel detection method based on the interaction features of the session flow. We extract features based on the interaction principles of the HTTP cookie session flow to achieve effective detection of the HTTP cookie covert channel. The main contributions of this paper are as follows:
(1) On the basis of aggregating the HTTP session flow, the HTTP session flow is split into "interaction process" subflows according to cookie interaction behavior characteristics so that the effective features of the HTTP cookie covert channel can be extracted.
(2) Through the network behavior analysis method, three types of behavior features of HTTP cookie covert channels are extracted and combined with a traditional machine learning algorithm to build a detection model, so as to achieve effective detection of HTTP cookie covert channels.
(3) Experiments and theoretical analysis show that the detection method proposed in this paper has advantages in stability and time performance over existing detection methods. Furthermore, in a simulation environment with imbalanced data, our method retains a degree of practicability.
The rest of this paper is organized as follows. Section 2 introduces current covert channel detection methods. Section 3 introduces the framework of the detection method proposed in this paper. Section 4 details our detection method. Section 5 presents experiments that verify the effectiveness and superiority of the proposed detection method. Section 6 concludes the paper and outlines future research.

Related Work
In this section, we introduce existing covert channel detection methods. A covert channel detection method consists of two main steps: feature extraction and classification.
2.1. Covert Channel Feature Extraction. Unlike in cookie theft, the cookie field of the HTTP cookie covert channel is constructed by the attacker according to a custom encoding method and conforms to the standard format rather than being stolen or intercepted from legitimate users. Therefore, the critical point of feature extraction lies in the portrayal of covert behavior rather than the content analysis of the cookie. The existing covert channel feature extraction methods fall into two types depending on the perspective from which they portray covert behavior.
The first type of method extracts features based on packet payloads and portrays covert behavior at the packet level. Several works [11][12][13] extracted information such as field difference, entropy value, and semantic content from specific positions in HTTP packets as features to characterize HTTP covert channels. However, such methods can only extract features for specific types of covert channels and cannot effectively characterize the HTTP cookie covert channel. The second type of method extracts behavior statistical features by aggregating network flows and portrays covert behavior at the network flow level. Some works [14][15][16] mainly extracted the time information of the network flow as the feature to detect covert timing channels, which use time to encode information. Such methods cannot detect HTTP cookie covert channels, which do not use time to encode information. Several works [17][18][19][20] constructed multidimensional features from raw data such as packet size, packet interval, transport layer flag information, and ports in the network flow to portray covert behavior from multiple perspectives. Such methods can effectively characterize traditional covert communication behaviors. However, these general flow features are not directly related to the encoding of the cookie, so such methods are unstable when detecting HTTP cookie covert channels and risk being bypassed.

Covert Channel Classification.
The classification methods for covert channels fall into three types. The first type classifies covert channels by setting thresholds for a small number of features. Relevant works [11,12,15] detected covert channels by setting a threshold for a single complex feature. Such methods detect specific types of data well, but the low dimension of the extracted features tends to make the detection model less robust. The second type classifies covert channels with traditional machine learning algorithms. Relevant works [13,16-20] extracted multidimensional features and input them into traditional machine learning classifiers to classify covert communications. The classification ability of such methods depends on the effectiveness of feature extraction. As long as effective features of the covert channel can be extracted, a model with good detection ability can be obtained. The third type uses deep learning techniques to automatically mine features from raw network traffic for classification without relying on expert knowledge. Relevant works [21][22][23] encoded the original network traffic and used it as the input data of a deep learning classifier to detect covert communications. On the one hand, when the payload of malicious traffic is not very different from that of normal traffic, for example, when there is no significant difference in cookie fields between the HTTP cookie covert channel and normal communication, methods that use the original payload as input may not detect well, and their interpretability is poor. On the other hand, such detection methods rely on deep learning techniques, which cause enormous time overhead in model training. Table 1 shows the main information of the above covert channel detection methods and summarizes their defects in detecting HTTP cookie covert channels.
In summary, existing detection methods are not good at detecting HTTP cookie covert channels. To solve this problem, in this paper we analyze the differences in session flow between HTTP cookie covert channels and normal HTTP cookie communications based on the function of the cookie and the interaction characteristics of the HTTP session flow, extract effective features, and combine them with a machine learning algorithm to detect HTTP cookie covert channels.

Detection Framework
The overall framework of the HTTP cookie covert channel detection method proposed in this paper is shown in Figure 1, and it consists of three phases: (1) Data processing. In this phase, we first aggregate the original network packets into session flows, then filter the data to obtain the HTTP session flows that complete communication successfully and contain cookie fields, and finally extract "interaction process" subflows. (2) Feature extraction. In this phase, we extract three types of features based on the interaction process (cookie distribution features, response content distribution features, and cookie and related data relationship features) and generate feature vectors with category labels. (3) Detection. In this phase, we use supervised machine learning algorithms to train on and classify the data and determine whether the input session flow is an HTTP cookie covert channel or a normal HTTP cookie communication.
The main contributions of the above detection framework are as follows: (1) We propose the definition of the "interaction process," which can comprehensively describe the communication process of the cookie and is the basis of effective feature extraction. Moreover, we detail the process of extracting the interaction process. (2) The features we extract not only detect well under various classification algorithms but also have strong interpretability. (3) Experiments show that our detection method is superior to the existing detection methods to a certain extent and has a degree of practicability.

Detection Method
In this section, we detail the detection method we propose. We first analyze why this kind of covert channel is difficult to detect and, combined with the communication principle of HTTP, propose the definition of the "interaction process," which comprehensively describes the communication process of the cookie. Then, we detail the method to extract the interaction process and, on that basis, the method to extract features according to the analysis of differences between HTTP cookie covert channels and normal HTTP cookie communications. Finally, we introduce the classification method we choose.

Problem Analysis.
The cookie is a small piece of data that the server first sends to the client when using the HTTP protocol, and the client carries it the next time it sends requests to the server. The main functions of the cookie include:
(1) Session state management. Cookies enable the stateless HTTP protocol to stably record session state information.
(2) User behavior tracking. Servers push specific content through the analysis of user behavior.
The latest standard relating to the cookie specification, RFC 6265 [24], does not mandate an encoding method for the cookie, which gives attackers the conditions to arbitrarily encode the cookie field to build covert channels. Moreover, many normal servers also customize the encoding or encryption method of the cookie.
The process of HTTP cookie covert channel generation is shown in Figure 2. First, the attacker implants the malicious backdoor program into the victim host; implanting methods include file bundling, email attachments, and web page implantation. To bypass defense mechanisms such as firewalls and intrusion detection systems and communicate with the malicious server, the backdoor program encodes the malicious messages to be delivered in the cookie field of the HTTP protocol. After decoding the cookie, the malicious server launches further attacks on the victim host according to the information provided by the backdoor. Table 2 shows examples of the cookie in a normal communication and in a covert channel, respectively. The cookie of the covert channel was captured during the communication of the malicious program ChChes [8], and it can be decoded as "AUSER-PC * 1620?3618468394?C:\\Users\\admin\\AppData\\Local\\Temp?1.4.1 (1280 × 720) * 6.1.7601.24545." It can be seen that this HTTP cookie covert channel leaks relevant information about the victim host. However, the examples in Table 2 both conform to the standard cookie format defined in RFC 6265: key-value pairs separated by semicolons. In addition, it can be inferred that the unreadable fields are either encrypted or encoded in a particular way. The encoded cookie field is essentially uncertain information. At present, there are many works on uncertain information processing [25,26]. However, the cookie encoding of both normal communications and covert channels can be arbitrary and irregular, so it is difficult to detect the HTTP cookie covert channel from the character-level dimension of the packet field. One of the cookie functions is session state management, so in this paper we consider detecting HTTP cookie covert channels at the session flow level. Common covert channel detection features at the session flow level include packet length, number of packets, and packet interval. These features can achieve coarse-grained characterization and detection of covert channels from the perspective of flow behavior without considering the construction principles. However, as the behavior of the HTTP cookie covert channel flows constructed by attackers becomes more and more similar to that of normal flows, the detection effect of the above features will gradually worsen.

Table 1: Main information of existing covert channel detection methods and their defects in detecting HTTP cookie covert channels.
Reference | Year | Features | Classification | Defects
[11] | 2018 | Character difference in HTTP header field | Set thresholds | Can only achieve detection of specific HTTP covert channels
[12] | 2020 | Relative entropy between HTTP header field probability matrices | Set thresholds | Can only achieve detection of specific HTTP covert channels
[19] | 2019 | The number, duration, port information, dissimilarity, and average length ratio of sending to receiving of flow | Traditional machine learning |
[20] | 2022 | General features of network flow | Traditional machine learning |
[21] | 2017 | Convert traffic into images and feed them into deep learning for automatic feature extraction | Deep learning | Poor interpretability; long model training time; when the character-level difference is not obvious, the detection effect may not be good
[22] | 2018 | Encode and aggregate traffic into matrices and feed them into deep learning for automatic feature extraction | Deep learning | Same as [21]
[23] | 2020 | Extract a certain amount of payload within the flow and feed it into deep learning for automatic feature extraction | Deep learning | Same as [21]
To extract effective features from the session flow, we next analyze the communication process of an HTTP session flow that uses the cookie field. An example of an HTTP session flow that uses cookie fields is shown in Figure 3. It is an example of a covert channel; in fact, there are no differences between normal communication and covert channels in session flow structure or packet structure. The session flow begins with the handshake phase. After the TCP connection is successfully established, the client and the server start to transmit HTTP packets. At this stage, the client first sends a single packet or a set of continuous packets to the server to request HTTP content, and then the server returns a single packet or a set of continuous packets to the client. Based on the above analysis, this paper proposes the definition of the interaction process.
Definition 1 (Interaction process). In an HTTP session flow, a single HTTP packet or a set of continuous HTTP packets sent by the client to the server, together with the subsequent single HTTP packet or set of continuous HTTP packets sent by the server to the client, constitutes an interaction process.
In an interaction process, the application layer payload in packets sent by the client to the server constitutes the HTTP request line, request header, and request data, and the application layer payload in packets sent by the server to the client constitutes the HTTP status line, response header, and response entity. In general, when the client has not cached information from the server, the first few interaction processes do not contain cookie fields. Once the server sends the client a set-cookie field defining the cookie content, subsequent interaction processes include cookie fields. So the interaction process can comprehensively describe the communication process of the cookie.
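As a rough illustration of where the cookie and set-cookie fields sit inside the request and response headers described above, the following sketch pulls a header value out of a raw HTTP message and splits a cookie into key-value pairs. The function names `header_value` and `parse_cookie_pairs` are hypothetical helpers introduced here, not part of the paper, and the sketch assumes the header block is available in a single unfragmented payload.

```python
from typing import List, Optional, Tuple


def header_value(raw_http: bytes, name: str) -> Optional[str]:
    """Return the first value of header `name` in a raw HTTP message, or None.

    Assumption: the whole header block is contained in `raw_http`.
    """
    head = raw_http.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    # Skip the request line / status line, then scan "Name: value" lines.
    for line in head.split("\r\n")[1:]:
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == name.lower():
            return value.strip()
    return None


def parse_cookie_pairs(cookie: str) -> List[Tuple[str, str]]:
    """Split a Cookie header value into (key, value) pairs on semicolons."""
    pairs = []
    for item in cookie.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


request = b"GET / HTTP/1.1\r\nHost: example.com\r\nCookie: a=1; b=2\r\n\r\n"
response = b"HTTP/1.1 200 OK\r\nSet-Cookie: sid=xyz; Path=/\r\n\r\nbody"
cookie = header_value(request, "Cookie")
set_cookie = header_value(response, "Set-Cookie")
```

A response may carry several Set-Cookie headers; this sketch returns only the first one, which is a simplification.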

Interaction Process Extraction.
The HTTP session flow contains packets for establishing, maintaining, and disconnecting TCP connections, which do not constitute the actual content of the HTTP request or response, so such packets need to be filtered out when extracting the interaction process. The filter condition is: discard packets whose application layer payload length is 0 in the HTTP session flow. Next, the remaining packets in the session flow need to be labeled, as shown in Figure 4. Each interaction process begins with a packet labeled 0 or 1, followed by a set of continuous packets labeled 2. When another packet labeled 0 or 1 follows, a new interaction process starts. The specific method for interaction process extraction is shown in Algorithm 1. In the sequence of labeled packets, if a packet labeled 0 or 1 is found, followed by several packets labeled 2, these packets are stored in a subsequence representing an interaction process. The resulting final sequence stores all interaction processes of an HTTP session flow.
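The grouping rule described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the `(label, payload)` tuple representation and the function name `extract_interaction_processes` are assumptions made here, and the sketch keeps a process even when no label-2 packet follows its opening packet.

```python
from typing import List, Tuple

# Hypothetical packet representation: (label, application-layer payload).
Packet = Tuple[int, bytes]


def extract_interaction_processes(packets: List[Packet]) -> List[List[Packet]]:
    """Group labeled packets of one HTTP session flow into interaction processes.

    A packet labeled 0 or 1 opens a new interaction process; the packets
    labeled 2 that follow it are appended to the same process.
    """
    processes: List[List[Packet]] = []
    current: List[Packet] = []
    for label, payload in packets:
        if len(payload) == 0:
            # Filter condition from the text: drop zero-length payloads.
            continue
        if label in (0, 1):
            if current:
                processes.append(current)
            current = [(label, payload)]
        elif label == 2 and current:
            current.append((label, payload))
        # A label-2 packet seen before any 0/1 packet has no process to join
        # and is skipped (an assumption of this sketch).
    if current:
        processes.append(current)
    return processes


flow = [(0, b"GET /a"), (2, b"resp-1"), (2, b""), (2, b"resp-2"),
        (1, b"GET /b"), (2, b"resp-3")]
processes = extract_interaction_processes(flow)
```

In this toy flow the empty packet is filtered out and the remaining five packets form two interaction processes.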

Feature Extraction.
We analyze the differences between HTTP cookie covert channels and normal cookie communications based on the interaction process and design seven features of three types.
(1) Cookie distribution features. Some covert channels transmit a large amount of malicious information, so the encoded cookie field is relatively long. Moreover, HTTP uses persistent connections by default; that is, multiple requests from the client to the server share a single connection. So in the same session flow of a normal communication, the cookies of the interaction processes are strongly similar, and their differences exist only in the key-value pairs of a few cookies. Covert channels usually use cookie fields to encode different malicious information at different stages of a session flow, so the similarity among multiple cookies is weak. The similarity can be characterized by the length dispersion and the character-level similarity of multiple cookies in a session flow. Therefore, in a session flow, compared with normal communications, the length distribution of multiple cookies in covert channels is more dispersed, and the character-level differences among multiple cookies are greater. Based on the above analysis, the extracted features are as follows:
(1) Average cookie length in a session flow. Extract the length of the cookie from each interaction process containing the cookie to get the sequence {cookieLen_1, cookieLen_2, ..., cookieLen_n}, and calculate its average value:
$$F_1 = \frac{1}{n}\sum_{i=1}^{n}\mathrm{cookieLen}_i \tag{1}$$
(2) Cookie length variance in a session flow. Calculate the variance of the above cookie length sequence.
(3) Adjacent cookie similarity in a session flow. Extract the field content of the cookie from each interaction process containing the cookie to get the sequence {cookie_1, cookie_2, ..., cookie_n}. To evaluate the character-level similarity of all cookies in a sequence, calculating the similarity score for every pair and then taking the average would cause a huge time overhead and is not interpretable. So we calculate the character-level similarity score of adjacent cookies in the sequence and take the average value. On the one hand, this reduces the time overhead. On the other hand, cookies may gradually change over time in the session flow of normal communications, so cookies that are far apart in time may differ significantly even in normal communications, whereas the character-level difference between adjacent cookies is theoretically smaller in normal communication than in covert channels. The calculation method is shown in Algorithm 2.
In Algorithm 2, the similarity calculation function CaculateSimilarity is based on the Gestalt pattern matching approach [27]. The core idea is as follows: given two strings S_1 and S_2, first find their longest common substring, then recursively keep searching for the longest common substring at the remaining unmatched positions of the two strings until there is no common substring in any subregion of the two strings. The similarity is then calculated as
$$\mathrm{Similarity}(S_1, S_2) = \frac{2K_m}{|S_1| + |S_2|} \tag{3}$$
In formula (3), K_m represents the total number of characters of all the longest common substrings found in the whole process, and |S_1| and |S_2| represent the lengths of the two strings, respectively. The value range of the result is [0, 1]. The closer the result is to 1, the more similar the two strings are; the closer the result is to 0, the more different the two strings are.

Security and Communication Networks
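The three cookie distribution features can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the similarity uses Python's `difflib.SequenceMatcher`, whose `ratio()` returns 2M/T and which the standard library documentation relates to gestalt pattern matching; the variance is the population form (division by n), which the text does not specify; and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(s1: str, s2: str) -> float:
    """Character-level similarity in [0, 1], computed as 2*M / (|s1| + |s2|)."""
    # autojunk=False disables difflib's heuristic for long sequences.
    return SequenceMatcher(None, s1, s2, autojunk=False).ratio()


def adjacent_cookie_similarity(cookies: List[str]) -> float:
    """Average similarity of adjacent cookies; -1 when fewer than two cookies."""
    if len(cookies) < 2:
        return -1.0
    scores = [similarity(a, b) for a, b in zip(cookies, cookies[1:])]
    return sum(scores) / len(scores)


def cookie_distribution_features(cookies: List[str]) -> Tuple[float, float, float]:
    """Return (average length, length variance, adjacent similarity)."""
    if not cookies:
        raise ValueError("the session flow must contain at least one cookie")
    lengths = [len(c) for c in cookies]
    mean_len = sum(lengths) / len(lengths)
    # Assumption: population variance (divide by n).
    var_len = sum((x - mean_len) ** 2 for x in lengths) / len(lengths)
    return mean_len, var_len, adjacent_cookie_similarity(cookies)


stable = cookie_distribution_features(["sid=abc; t=1", "sid=abc; t=2", "sid=abc; t=3"])
varied = cookie_distribution_features(["Zm9vYmFy", "cXV4eHl6enk5OTk=", "MTIz"])
```

In this toy input, `stable` mimics cookies that barely change within a flow, while `varied` mimics cookies that carry different encoded content at each stage.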
(2) Response content distribution features. Take the most common HTTP application, the web, as an example. Because the web page elements requested by clients are diverse, the response content returned by the server also varies across the interaction processes of a session flow. In contrast, to improve the data transmission rate, malicious servers using HTTP cookie covert channels usually pay attention only to the legality of the header structure of the HTTP response packets. So the response content of an HTTP cookie covert channel session flow tends to be similar and short across different interaction processes. Based on the above analysis, the extracted features are as follows.
(4) Average response content length. The response content length of an interaction process is obtained by accumulating the application layer payload lengths of all its response packets. This gives the sequence {ResConLen_1, ResConLen_2, ..., ResConLen_m} for a session flow, whose average value is calculated:
$$F_4 = \frac{1}{m}\sum_{i=1}^{m}\mathrm{ResConLen}_i \tag{4}$$
(5) Response content length variance. Calculate the variance of the above response content length sequence.
(3) Cookie and related data relationship features. In general, the position where information is actually transmitted in network traffic stores the longer content. The actual position for the HTTP cookie covert channel to transmit information is the cookie field, while the main position for normal HTTP communication to transmit information is the response content returned by the server to the client. So in each interaction process of a session flow, calculating the ratio of the content lengths stored in the above two locations and then calculating the average value in the session flow can describe the difference between normal communications and covert channels to a certain extent. In addition, some covert channels also respond to the malicious information passed in the cookie field through the set-cookie field of the HTTP response packets.

ALGORITHM 2: Adjacent cookie similarity in a session flow.
Input: Cookie content sequence {cookie_1, cookie_2, ..., cookie_n}
Output: Adjacent cookie similarity in a session flow
    if n < 2 then
        return −1
    end if
    for i = 1 to n − 1 do
        Similarity_i ← ||CaculateSimilarity(cookie_i, cookie_{i+1})||
    end for
    sum ← 0
    for i = 1 to n − 1 do
        sum ← sum + Similarity_i
    end for
    return sum/(n − 1)
Therefore, in each interaction process of a session flow, calculating the ratio of the set-cookie field length to the cookie field length and then calculating the average value in the session flow can reflect the individual difference between such covert channels and normal communications. Based on the above analysis, the extracted features are as follows:
(6) Relationship between cookie and response content. For all interaction processes that contain cookies, calculate the ratio of the response content length to the cookie length, and take the average value in the session flow:
$$F_6 = \frac{1}{n}\sum_{i=1}^{n}\frac{\mathrm{ResConLen}_i}{\mathrm{cookieLen}_i} \tag{6}$$
(7) Relationship between cookie and set-cookie. For all interaction processes that contain cookies, calculate the ratio of the set-cookie length to the cookie length, and take the average value in the session flow.

Table 5 (rows F3 to F7):
F3 | Character-level difference of adjacent cookies | 5
F4 | Information quantity of response content | 2
F5 | Dispersion degree of response content length | 7
F6 | Information quantity comparison of response content and cookie | 4
F7 | Information quantity comparison of set-cookie and cookie | 1
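The response content features and the two ratio features can be sketched together. The `InteractionProcess` record and the function name below are hypothetical, the variance is again assumed to be the population form, and a cookie length of 0 is used here to mean that the interaction process carries no cookie.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InteractionProcess:
    """Hypothetical per-process summary used only for this sketch."""
    response_len: int        # summed payload length of all response packets
    cookie_len: int = 0      # 0 when the request carries no cookie field
    set_cookie_len: int = 0  # 0 when the response carries no set-cookie field


def response_and_ratio_features(
    processes: List[InteractionProcess],
) -> Tuple[float, float, float, float]:
    """Return (avg response length, response length variance,
    avg response/cookie ratio, avg set-cookie/cookie ratio)."""
    if not processes:
        raise ValueError("no interaction processes")
    res = [p.response_len for p in processes]
    avg_res = sum(res) / len(res)
    # Assumption: population variance (divide by m).
    var_res = sum((x - avg_res) ** 2 for x in res) / len(res)

    # The ratio features only use interaction processes that contain cookies.
    with_cookie = [p for p in processes if p.cookie_len > 0]
    if not with_cookie:
        raise ValueError("no interaction process contains a cookie")
    res_ratio = sum(p.response_len / p.cookie_len for p in with_cookie) / len(with_cookie)
    set_ratio = sum(p.set_cookie_len / p.cookie_len for p in with_cookie) / len(with_cookie)
    return avg_res, var_res, res_ratio, set_ratio


flow = [
    InteractionProcess(response_len=100, cookie_len=50),
    InteractionProcess(response_len=300, cookie_len=100, set_cookie_len=50),
    InteractionProcess(response_len=200),  # no cookie yet
]
features = response_and_ratio_features(flow)
```

For this toy flow the ratios are averaged over the two cookie-bearing processes only, while the response length statistics use all three.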

Classification Method.
We extract a total of 7 numerical features from three perspectives, so a detection method that sets a threshold for a single feature is not applicable. Moreover, to obtain a detection model with a better classification effect and a shorter training time on small sample data, we implement the detection of HTTP cookie covert channels with a traditional supervised machine learning algorithm rather than a deep learning method.

Experiments and Analysis
In this section, we introduce the data used for the experiments and the experimental procedures and analyze the experimental results.

Data Collecting and Processing.
To ensure the purity and diversity of the normal data used in the experiment, the pcap files in the ISCXIDS2012 dataset [28] and the CTU dataset [29] that have been marked as collected from normal network activity were selected as the normal data samples. There are currently no public datasets for HTTP cookie covert channels. We obtained 7 types of backdoor programs in the ChChes malware family [8] that use HTTP cookie covert channels from threat intelligence, executed them in a virtual machine running 32-bit Windows 7, and collected the traffic of the virtual machine. The malicious servers that communicate with these backdoor programs were still alive, but they had their own stealth mechanisms. When the backdoor was executed frequently, the malicious server would respond with the status "429 Too Many Requests," and the communication recovery time was unknown. So the amount of data collected is relatively limited.
Next, we aggregated the original packets by the IP addresses and ports of the two communicating parties and the transport layer protocol to obtain bidirectional session flows. On this basis, each session flow was truncated according to the two time standards proposed in [30]: (1) Truncate the session flow when its duration exceeds 1800 seconds. (2) Truncate the session flow when no new packets are generated between the two communicating parties within 120 seconds. The resulting session flows contained various transport layer and application layer protocols and needed to be filtered to obtain the HTTP session flows that contain the cookie field and complete communication successfully. The logic of data filtering is as follows: first, determine whether there is at least one packet in the session flow with the HTTP application layer protocol and the cookie field. If so, retain this session flow; otherwise, discard it. Then determine whether there is at least one HTTP response packet in the session flow with a status code of 200. If so, retain this session flow; otherwise, discard it. The final data collecting and processing information in this paper is shown in Table 3.
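The truncation and filtering rules above can be sketched as follows. This is a simplified illustration: packets are reduced to timestamps for truncation and to a hypothetical `HttpPacket` record for filtering, and the treatment of exact boundary values (strictly greater than 1800 s or 120 s) is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


def truncate_flow(timestamps: List[float],
                  max_duration: float = 1800.0,
                  idle_timeout: float = 120.0) -> List[List[float]]:
    """Split time-ordered packet timestamps of one flow into session flows."""
    segments: List[List[float]] = []
    current: List[float] = []
    for ts in timestamps:
        if current and (ts - current[-1] > idle_timeout
                        or ts - current[0] > max_duration):
            # Idle gap or total duration exceeded: start a new session flow.
            segments.append(current)
            current = []
        current.append(ts)
    if current:
        segments.append(current)
    return segments


@dataclass
class HttpPacket:
    """Hypothetical summary of one HTTP packet in a session flow."""
    has_cookie: bool = False
    status_code: Optional[int] = None  # None for requests


def keep_session_flow(packets: List[HttpPacket]) -> bool:
    """Keep flows with at least one cookie-bearing packet and one 200 response."""
    return (any(p.has_cookie for p in packets)
            and any(p.status_code == 200 for p in packets))


segments = truncate_flow([0.0, 10.0, 200.0, 250.0, 2100.0])
kept = keep_session_flow([HttpPacket(has_cookie=True), HttpPacket(status_code=200)])
```

In the toy timestamp list, the gaps of 190 s and 1850 s both exceed the idle timeout, so three session flows result.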

Feature Validity Experiment.
To verify the validity of the proposed features, we constructed supervised classifiers using Random Forest, SVM, Decision Tree, Naive Bayes, and KNN, respectively. After feature extraction and labeling of the filtered session flows, we divided them into a training set and a test set at a ratio of 7 : 3 while ensuring that the proportion of positive and negative samples was the same in both parts. We then trained each classifier and compared the results on common evaluation indicators: Accuracy, Precision, Recall, and F1 score. The results are shown in Table 4 and compared in Figure 5. Bold values in Table 4 represent the best result for each indicator. The experimental results show that the extracted features perform well in most supervised classifiers, and the Random Forest algorithm performs best.
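The 7 : 3 split with equal class proportions and the four evaluation indicators can be sketched without any external library. The function names, the fixed random seed, and the convention of returning 0.0 when a denominator is zero are assumptions of this sketch; label 1 stands for the covert channel class.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train indices, test indices), splitting each class separately."""
    rng = random.Random(seed)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_fraction)
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, Precision, Recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


labels = [1] * 10 + [0] * 20
train_idx, test_idx = stratified_split(labels)
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

With 10 positive and 20 negative samples, the split places 7 positives and 14 negatives in the training set, so both parts keep the 1 : 2 class ratio.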
Then, to evaluate the contribution of each feature, we used Random Forest to obtain the contribution scores of features F1-F7 (following the order described in Section 4.3). The higher the score, the greater the feature's contribution. The results are shown in Figure 6. It can be seen that the contributions of the features are relatively balanced, which indicates that our extracted features are comparatively effective for detecting HTTP cookie covert channels. Table 5 summarizes all the features.

Comparative Experiment.
Next, we compared our detection method with the two types of existing covert channel detection methods.
To detect the various covert channels generated by malware from the perspective of flow behavior without considering the construction principle, paper [31] extracted 28-dimensional general flow statistical features by combining existing work, as shown in Table 6. The features in Table 6 give relatively good detection results for various covert channels under several commonly used machine learning algorithms. Paper [23] extracted a certain amount of payload within the flow and fed it into Convolutional Neural Networks for covert communication detection. To make a valid comparison, we used the same data to evaluate the detection indicators of our method and the above two methods. Moreover, our method and the method of paper [31] both used the Random Forest algorithm. The results are shown in Table 7 and compared in Figure 7. Bold values in Table 7 represent the best result for each indicator.
It can be seen that the detection indicators of our method are slightly higher than those of the other two methods. Moreover, compared with paper [31], our method extracts fewer features while maintaining a high detection rate, which demonstrates the effectiveness of our proposed features. Paper [31] mainly extracts features from four aspects: bytes sent and received, packet length, packet time, and request/response time. These four aspects are not directly related to the encoding of the cookie. The features extracted by our method, in contrast, are related to the behavior of cookie encoding, so in theory they are more stable when detecting HTTP cookie covert channels. That is, when the general flow statistical features change, an HTTP cookie covert channel can still transmit information because the cookie does not change; our method still works in this situation because it does not rely on general flow statistical features. Paper [23] uses a deep learning method for classification, which incurs more time overhead. Figure 8 compares the training time needed for the models to converge between our method and the method of paper [23] for different data sizes. Therefore, our method has better time performance than deep learning methods.

Data Imbalance Experiment.
Typically, covert channels account for only a small part of network traffic. Therefore, to test the practicability of our method, we conducted a data imbalance experiment. We used 500 normal and 500 abnormal data points as the training set. The test set consisted of two parts. The first part was 500 normal data points that were different from the training set. For the second part, we selected 500, 100, 50, 10, and 5 abnormal data points that were different from the training set, to test the classification performance of our method when the ratio of abnormal to normal data was 1 : 1, 1 : 5, 1 : 10, 1 : 50, and 1 : 100, respectively. We used Random Forest as the classifier. To exclude possible individual errors in a single experiment, we repeated the experiment ten times with different data under each data ratio. Finally, the classification performance was measured by the average value of each evaluation indicator. The results are listed in Table 8 and compared in Figure 9. It can be seen that the four indicators fluctuate within a certain range. In summary, our method performs well under each data ratio and has certain practicability.
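The protocol of this experiment (a fixed set of 500 normal test samples, a shrinking number of abnormal test samples, and ten repetitions averaged per ratio) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the one-dimensional synthetic data, and the threshold rule standing in for the trained Random Forest are all hypothetical, and only Precision and Recall are computed to keep the example short. Both pools are assumed to be held out from the training set.

```python
import random
from statistics import mean


def precision_recall(y_true, y_pred):
    """Precision and Recall for the abnormal (positive = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def imbalance_experiment(predict, normal_pool, abnormal_pool,
                         abnormal_counts=(500, 100, 50, 10, 5),
                         n_normal=500, repeats=10, seed=0):
    """Average Precision and Recall over repeated draws for each abnormal count."""
    rng = random.Random(seed)
    results = {}
    for n_abnormal in abnormal_counts:
        runs = []
        for _ in range(repeats):
            # Draw a fresh test set for every repetition.
            normal = rng.sample(normal_pool, n_normal)
            abnormal = rng.sample(abnormal_pool, n_abnormal)
            y_true = [0] * n_normal + [1] * n_abnormal
            y_pred = [predict(x) for x in normal + abnormal]
            runs.append(precision_recall(y_true, y_pred))
        results[n_abnormal] = (mean([r[0] for r in runs]),
                               mean([r[1] for r in runs]))
    return results


def toy_predict(x):
    """Hypothetical stand-in for the trained classifier: a simple threshold."""
    return 1 if x > 2.0 else 0


# Hypothetical held-out pools: one synthetic feature value per flow.
data_rng = random.Random(1)
normal_pool = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]
abnormal_pool = [data_rng.gauss(4.0, 1.0) for _ in range(2000)]

results = imbalance_experiment(toy_predict, normal_pool, abnormal_pool)
for n_abnormal, (precision, recall) in results.items():
    print(f"abnormal:normal = {n_abnormal}:500  "
          f"precision={precision:.3f}  recall={recall:.3f}")
```

Averaging over repeated draws matters most at the 1 : 100 ratio, where only five abnormal samples are present and a single misclassification shifts Recall by 20 percentage points.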

Conclusion
Aiming at the problem that existing covert channel detection methods cannot detect HTTP cookie covert channels well, we propose an HTTP cookie covert channel detection method based on session flow interaction features. We further divide the session flow into smaller subflows, each defined as an interaction process, and extract effective features based on the interaction process. Experimental results show that, for HTTP cookie covert channels, the detection rate of our method can reach 99%. We also demonstrate the superiority of our method over existing methods and its practicability in the simulation environment through experiments and analysis. In future work, we plan to further expand the dataset to optimize the effectiveness and performance of our method in practical applications.

Data Availability
The data used to support the findings of this study are available from the first author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.