Neural Linguistic Steganalysis via Multi-Head Self-Attention

Linguistic steganalysis can indicate the existence of steganographic content in suspicious text carriers. Precise linguistic steganalysis on suspicious carriers is critical for multimedia security. In this paper, we introduce a neural linguistic steganalysis approach based on multi-head self-attention. In the proposed approach, words in the text are first mapped into a semantic space with a hidden representation to better model semantic features. Then, we use multi-head self-attention to model the interactions between words in the carrier. Finally, a softmax layer categorizes the input text as cover or stego. Extensive experiments validate the effectiveness of our approach.


Introduction
Steganography is an ancient technique for embedding secret messages into carriers. According to the type of carrier, it can be divided into image steganography [1], text steganography [2], and audio steganography [3]. In contrast, steganalysis focuses on detecting hidden messages in suspicious carriers.
Text steganography is the process of embedding secret data in a cover text so that the existence of the data is undetectable by adversaries or casual viewers. It has been widely considered an attractive complement to conventional cryptographic algorithms in multimedia security, because concealing a secret message or watermark in a cover text file or message helps protect confidential information. However, this technology can also be used by terrorists and other criminals for malicious purposes, which poses a serious threat to security in cyberspace.
Moreover, text steganography has changed significantly with the rapid development of natural language processing. Thus, it is crucial to develop linguistic steganalysis approaches based on the most recent technologies.
Traditional linguistic steganalysis first extracts statistical features directly from the carrier and then performs classification. For example, Taskiran et al. [4] distinguished cover and stego text using features from a 3-gram language model and a Support Vector Machine (SVM) for classification. Chen et al. [5] proposed a steganalysis scheme (NFZ-WDA) that models language structure based on the word distribution in different natural frequency zones. The authors in [6] used meta features, including word frequency, word length, and space rate, together with an immune mechanism to select appropriate features. The main limitations of these approaches are that they require domain knowledge and generalize poorly to recently proposed text steganography methods.
Zuo et al. [7] first proposed a word embedding feature that better exploits semantic and statistical distortion in linguistic steganography. Many other neural steganalysis approaches are based on CNNs, RNNs, or their combination [8]. Despite these efforts, precise linguistic steganalysis remains an unsolved problem.
Our work is motivated by the observation that the interactions between words in a text are important for steganalysis and that multi-head self-attention has great potential to model these interactions [9,10]. Thus, we propose a neural steganalysis approach with multi-head self-attention. First, a hidden representation layer maps the words in a text into a semantic space to better exploit semantic features. Second, we use multi-head self-attention to exploit the relationships between different words in a text, which is crucial in linguistic steganalysis. Finally, we concatenate the word representation and the calibration representation from multi-head self-attention for classification. The softmax layer is used to categorize the text as "cover" or "stego." Experiments validate the effectiveness of the proposed approach.
The contributions of our work are as follows.
(1) As far as we know, we are the first to propose an approach based on the attention mechanism to model and extract correlation features in linguistic steganalysis.
(2) Experiments show that the proposed steganalysis method achieves excellent performance in detecting generative linguistic steganography.
The paper is divided into four parts. Section 2 describes the details of the proposed linguistic steganalysis approach based on multi-head self-attention. Section 3 presents the experimental results and discusses the model. Finally, concluding remarks and future work are given in Section 4.

Proposed Approach
The architecture of the proposed linguistic steganalysis approach is shown in Figure 1. It contains three major modules: text representation, carrier encoder, and carrier prediction. The components of the architecture are described in the following subsections.

Text Representation.
The core of text representation is word embedding, which converts a text carrier from a sequence of words into a sequence of low-dimensional embedding vectors. Denote a suspicious carrier text with $M$ words as $[w_1, w_2, \ldots, w_M]$. Through this layer, it is converted into a vector sequence $[e_1, e_2, \ldots, e_M]$. In addition, to provide the model with information about the position of each word in the text, we add a position embedding vector to each word embedding and thus obtain a new word representation sequence $[h_1, h_2, \ldots, h_M]$.

Carrier Encoder.
In the carrier encoder layer, we adopt multi-head self-attention [9], which has recently achieved remarkable performance in modeling complicated relations between context words. Taking the $m$-th word representation feature $h_m$ as an example, we explain how this mechanism identifies multiple meaningful correlation features involving $h_m$. First, we define the correlation between feature $h_m$ and feature $h_k$ under a specific attention head $i$ as follows:

$$\alpha_{m,k}^{(i)} = \frac{\exp\left(\varphi^{(i)}(h_m, h_k)\right)}{\sum_{l=1}^{M} \exp\left(\varphi^{(i)}(h_m, h_l)\right)},$$

where $\varphi^{(i)}(\cdot, \cdot)$ is an attention function which defines the correlation between feature $h_m$ and feature $h_k$. Attention functions can take many different forms, and most of them are neural networks. In our case, we adopt the widely used inner product, which can be formulated as follows:

$$\varphi^{(i)}(h_m, h_k) = \left\langle h_m W_{query}^{(i)}, \; h_k W_{key}^{(i)} \right\rangle,$$

where $W_{query}^{(i)}, W_{key}^{(i)} \in \mathbb{R}^{d \times d'}$ are transformation matrices which map the original embedding space $\mathbb{R}^{d}$ into a new semantic space $\mathbb{R}^{d'}$.
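The text representation and the attention coefficients for a single head can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the toy embeddings, and the identity query/key matrices are all hypothetical, word representations are treated as row vectors, and the coefficients are assumed to be softmax-normalized inner products.

```python
import math

def project(h, W):
    """Multiply a row vector h (length d) by a matrix W (d x d_prime)."""
    return [sum(h[r] * W[r][c] for r in range(len(h))) for c in range(len(W[0]))]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attention_coefficients(H, W_query, W_key):
    """Return alpha, where alpha[m][k] is the softmax over k of
    the inner product between (h_m W_query) and (h_k W_key)."""
    queries = [project(h, W_query) for h in H]
    keys = [project(h, W_key) for h in H]
    alpha = []
    for q in queries:
        scores = [dot(q, k) for k in keys]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alpha.append([e / total for e in exps])
    return alpha

# Toy input (hypothetical values): M = 3 words, embedding size d = 2.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # word embeddings e_m
P = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]   # position embeddings
# Word representation: position embedding added to word embedding.
H = [[e + p for e, p in zip(e_m, p_m)] for e_m, p_m in zip(E, P)]

W_query = [[1.0, 0.0], [0.0, 1.0]]  # identity matrices, so d' = d = 2
W_key = [[1.0, 0.0], [0.0, 1.0]]
alpha = attention_coefficients(H, W_query, W_key)
```

Each row of `alpha` sums to one, so row `m` can be read as how strongly word `m` attends to every word in the carrier under this head.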
After obtaining these correlation coefficients, we recalibrate the representation of feature $h_m$ in subspace $i$ by combining all relevant features, weighted by the coefficients $\alpha_{m,k}^{(i)}$:

$$\tilde{h}_m^{(i)} = \sum_{k=1}^{M} \alpha_{m,k}^{(i)} \left( h_k W_{value}^{(i)} \right),$$

where $W_{value}^{(i)} \in \mathbb{R}^{d \times d'}$ is a transformation matrix. Intuitively, combining features from different levels may improve performance. Inspired by the residual connection structure in ResNet [11], we concatenate the calibration representation $\tilde{h}$, which collects the recalibrated representations from all attention heads, with the original word representation $h$ in the concatenation layers. The final feature vector $r$ in these layers can be formulated as follows:

$$r = \tilde{h} \oplus h,$$

where $\oplus$ denotes concatenation and $r$ is taken as the text representation for the proposed linguistic steganalysis problem. Then, global average pooling is used to reduce the feature dimension, because $r$ is high-dimensional, which puts the model at risk of overfitting. After that, the pooled features are fed into a classification layer, which generates a probability distribution over the label set.
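The recalibration, concatenation, and pooling steps can be sketched as follows for a single head. This is a hedged illustration rather than the authors' code: the helper names, the value matrix `W_value`, the toy coefficients, and the choice to average over the word axis are assumptions made for the example.

```python
def recalibrate(alpha, H, W_value):
    """Compute h_tilde[m] = sum_k alpha[m][k] * (h_k W_value) for one head."""
    d_prime = len(W_value[0])
    values = [[sum(h[r] * W_value[r][c] for r in range(len(h)))
               for c in range(d_prime)] for h in H]
    return [[sum(alpha[m][k] * values[k][c] for k in range(len(H)))
             for c in range(d_prime)] for m in range(len(H))]

def concat_and_pool(H_tilde, H):
    """Concatenate recalibrated and original representations per word,
    then apply global average pooling over the word axis (an assumption)."""
    R = [h_t + h for h_t, h in zip(H_tilde, H)]  # list concatenation per word
    M = len(R)
    return [sum(r[c] for r in R) / M for c in range(len(R[0]))]

# Toy input (hypothetical values): M = 2 words, d = d' = 2.
H = [[1.0, 0.0], [0.0, 1.0]]
alpha = [[0.75, 0.25], [0.5, 0.5]]        # attention coefficients, rows sum to 1
W_value = [[1.0, 0.0], [0.0, 1.0]]        # identity value matrix

H_tilde = recalibrate(alpha, H, W_value)  # [[0.75, 0.25], [0.5, 0.5]]
pooled = concat_and_pool(H_tilde, H)      # [0.625, 0.375, 0.5, 0.5]
```

Concatenation keeps the original word representation alongside the attention output, so the pooled vector has length d' + d per head in this sketch.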

Carrier Prediction.
The carrier prediction module decides whether a text is "cover" or "stego." The prediction layer is composed of two dense layers with ReLU and sigmoid activation functions, respectively, and can be formulated as

$$T = \sigma\left(w_2 \, \mathrm{ReLU}(w_1 \bar{r} + b_1) + b_2\right),$$

where $\bar{r}$ denotes the pooled feature vector, $\sigma(x) = 1/(1 + e^{-x})$, and $w$ and $b$ are the weights and bias terms of the linear transformations. The output value $T$ is the probability that the suspicious text is "stego." The predicted label is then determined by a threshold, which can be formulated as follows:

$$\text{label} = \begin{cases} \text{stego}, & T \geq \text{threshold}, \\ \text{cover}, & T < \text{threshold}. \end{cases}$$
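A minimal sketch of such a prediction layer is given below, assuming a ReLU dense layer followed by a single sigmoid unit and a threshold comparison. The function names, the toy weights, and the use of `>=` at the threshold are illustrative assumptions, not details confirmed by the paper.

```python
import math

def relu(x):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in x]

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, W, b):
    """Linear layer: one output per row of W, plus the matching bias."""
    return [sum(w * v for w, v in zip(row, x)) + b_j for row, b_j in zip(W, b)]

def predict(x, W1, b1, w2, b2, threshold=0.5):
    """Return (T, label), where T is the estimated stego probability."""
    hidden = relu(dense(x, W1, b1))
    T = sigmoid(dense(hidden, [w2], [b2])[0])
    label = "stego" if T >= threshold else "cover"
    return T, label

# Toy parameters (hypothetical): 2 input features, 2 hidden units.
W1 = [[1.0, 0.0], [0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [2.0, -1.0]
b2 = 0.0

T_a, label_a = predict([1.0, -1.0], W1, b1, w2, b2)  # hidden = [1, 0], logit = 2
T_b, label_b = predict([-1.0, 1.0], W1, b1, w2, b2)  # hidden = [0, 1], logit = -1
```

With the default threshold of 0.5, a positive logit maps to "stego" and a negative logit maps to "cover."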

Training Framework.
The proposed approach is optimized in a supervised learning framework. The loss function of the network is the cross-entropy loss, and the parameters of the model are updated by backpropagating the gradients of this loss. The loss can be formulated as follows:

$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log T_i + (1 - y_i) \log (1 - T_i) \right],$$

where $N$ is the number of training samples, $y_i \in \{0, 1\}$ is the true label of the $i$-th sample (1 for stego), and $T_i$ is the predicted probability. Moreover, to mitigate overfitting, we applied batch normalization [12] and dropout [13] to regularize the proposed model.
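For a binary cover/stego label, the cross-entropy loss can be computed as in the sketch below. It assumes labels encoded as 1 for stego and 0 for cover and clips probabilities with a small `eps` to avoid `log(0)`; both choices, and the function name, are assumptions for illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between labels (0 or 1) and probabilities."""
    total = 0.0
    for y, t in zip(y_true, y_prob):
        t = min(max(t, eps), 1.0 - eps)  # clip to keep log() finite
        total += y * math.log(t) + (1 - y) * math.log(1.0 - t)
    return -total / len(y_true)

labels = [1, 0, 1, 0]
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
uninformed = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
```

Predictions that agree with the labels give a lower loss than uninformed predictions of 0.5, whose loss equals log 2.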

Dataset and Experimental Settings.
To evaluate the performance of the proposed approach, a linguistic stegosystem was first constructed based on the approach proposed in [14]. Three large-scale text datasets containing the most common text media on the Internet were used to train the linguistic stegosystem: Twitter [15], Movie reviews [16], and News.
Then, we used the linguistic stegosystem to construct our own steganalysis dataset. For each dataset, 10000 stego samples were generated and 10000 natural texts were randomly chosen for the steganalysis experiments. Note that the generated sentences differ across text types and embedding rates. The hyperparameters of the proposed model were determined by cross-validation on the validation set. Specifically, the number of heads in multi-head self-attention is 8. The embedding size d is 256. The dimension of the fully connected layer in the classification layer is 100. The detection threshold is set to 0.5. The optimization method used in training is Adam [17], with the learning rate initially set to 0.001, the dropout rate set to 0.9, and the batch size set to 256.
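The reported settings can be collected in a small configuration object, as sketched below. The class and field names are hypothetical, and the per-head dimension `head_dim` assumes that the embedding size is split evenly across heads, which the text does not state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SteganalysisConfig:
    """Hyperparameters as reported in the text; names are illustrative."""
    num_heads: int = 8                 # heads in multi-head self-attention
    embedding_size: int = 256          # embedding size d
    dense_units: int = 100             # fully connected layer dimension
    detection_threshold: float = 0.5   # threshold on the stego probability
    learning_rate: float = 0.001       # initial learning rate for Adam
    dropout_rate: float = 0.9          # value as reported in the text
    batch_size: int = 256

    @property
    def head_dim(self) -> int:
        # Assumption: d' = d / num_heads (not stated in the text).
        return self.embedding_size // self.num_heads

config = SteganalysisConfig()
```

Freezing the dataclass keeps the settings immutable, which makes it harder to change a value by accident between experiments.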

Evaluation Metrics.
Several evaluation metrics commonly used in classification tasks were used to evaluate the performance of the proposed model, including accuracy (Acc), precision (P), recall (R), and F1 score. The metrics are defined as follows:

$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot P \cdot R}{P + R},$$

where TP (true positive) is the number of positive samples predicted to be positive, FN (false negative) is the number of positive samples predicted to be negative, TN (true negative) is the number of negative samples predicted to be negative, FP (false positive) is the number of negative samples predicted to be positive, and the F1 score is the harmonic mean of precision and recall.
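These metrics can be computed directly from the confusion counts. The sketch below assumes that stego is the positive class (label 1) and returns 0.0 when a denominator is zero; the function name and the toy labels are illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = stego)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"acc": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels: TP = 3, FN = 1, FP = 1, TN = 3.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy example all four metrics equal 0.75, because the single false positive and the single false negative balance each other.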

Performance Evaluation.
Several representative steganalysis algorithms were chosen as baseline models [18][19][20][21] to validate the performance of the proposed model. The results of the comparison are shown in Table 1. From the results, we can conclude that, compared to other linguistic steganalysis methods, the proposed model achieves the best detection performance on the various metrics across different text formats and embedding rates. We can also observe that detection performance differs between datasets. This may be because text lengths differ between datasets: longer texts may contain more clues for steganalysis, which leads to higher detection accuracy. In addition, detection performance increases as more steganographic information is embedded in the generated texts. One explanation is that once more information is embedded in a text, the distortion of the generated text increases, which damages the coherence of the text semantics and gives more steganalysis clues. We also conducted multi-class classification experiments, which can be regarded as an embedding rate estimation task [22], where we mixed texts at various embedding rates, i.e., bpw = 0, 1, 2, 3, 4, 5. The experimental results are shown in Table 2. From Table 2, we can see that our model also outperforms all the baseline models.

Conclusions
Precise linguistic steganalysis on suspicious carriers is critical for multimedia security. In this paper, we introduced a neural linguistic steganalysis approach based on multi-head self-attention. In the proposed approach, words in the text are first mapped into a semantic space with a hidden representation for better exploitation of semantic features. Then, we use multi-head self-attention to model the interactions between words in the carrier. Finally, a softmax layer categorizes the input text as cover or stego. Extensive experiments validate the effectiveness of our approach. In the future, we will construct a more general steganalysis approach to detect more types of linguistic steganography.

Data Availability
The linguistic stegosystem was trained on three large-scale text datasets containing the most common text media on the Internet: Twitter [15], Movie reviews [16], and News (https://www.kaggle.com/snapcrack/all-the-news/data).