Toward High Capacity and Robust JPEG Steganography Based on Adversarial Training

JPEG steganography has become a research hotspot in the field of information hiding. However, the capacity of conventional JPEG steganography methods can hardly meet the requirements of high-capacity application scenarios, and these methods cannot extract secret messages accurately after JPEG compression. To mitigate these problems, we propose a high-capacity and robust JPEG steganography method based on adversarial training, called HRJS, which implements an end-to-end framework in the JPEG domain for the first time. The encoder is responsible for embedding the secret message, while the decoder reconstructs the original secret message. To enhance robustness, an attack module forces the neural network to automatically learn how to correctly recover the secret message after an attack. Experimental results show that our method achieves nearly 100% decoding accuracy against JPEG_50 compression at a payload of 1/3 bits per channel (bpc) while preserving the imperceptibility of the stego image. Compared with conventional JPEG steganography methods, the proposed method is feasible at high capacity (e.g., 1 bpc) and at the same time has an obvious advantage in robustness against JPEG compression.


Introduction
Modern steganography is a technology used to realize secret communication, which embeds a secret message into the cover image and ensures imperceptibility. Nowadays, image steganography [1] plays a significant role in the fields of covert communication, medical systems [2], information certification [3], digital communication [4], and so on.
Image steganography makes use of visual redundancy to embed secret information so that neither the naked eye nor a steganalyzer can detect suspicious visual changes in the image. To achieve this goal, traditional adaptive steganography methods hide the secret message in complex texture regions while avoiding smooth regions. Based on this perception, researchers have proposed many spatial adaptive steganography algorithms, such as S-UNIWARD [5], HILL [6], and MiPOD [7]. For the JPEG domain, J-UNIWARD [5] and UERD [8] have been proposed.
However, the design of distortion functions is heuristic and relies excessively on the experience of designers. At the same time, hand-crafted distortion functions based on heuristic principles do not fully consider statistical undetectability, so traditional adaptive steganography methods cannot effectively resist detection by many advanced steganalysis methods [9,10].
With the development of deep learning, some steganography methods combined with CNNs (convolutional neural networks) have been proposed, which can effectively alleviate the disadvantages of heuristic design and improve resistance to advanced steganalysis [11]. Deep steganography methods can be mainly divided into two categories: automatic embedding cost learning-based image steganography methods and end-to-end image steganography methods. In the first category, neural networks are used, similarly to traditional algorithms, to identify the locations suitable for embedding data, as in ASDL-GAN [12], UT-GAN [13], and JS-GAN [14]. These methods can automatically find the embedding locations and generate the embedding cost. However, their message embedding and extraction processes are completely dependent on syndrome trellis codes (STC) [15]. It should be noted that cost design-based steganography cannot withstand attacks; e.g., the message cannot be extracted when the stego image undergoes a JPEG compression attack, so the robustness should be improved for practical application.
In order to improve the robustness of steganography, researchers have proposed end-to-end steganography methods, such as deep steganography [16], HiDDeN [17], SteganoGAN [18], and IS-GAN [19]. As shown in Figure 1, the end-to-end steganography framework takes the cover image and secret message as input and finally outputs the decoded secret message. In particular, embedding and extracting secret messages are accomplished by the hiding network and the revealing network, respectively. By inserting an attack module that forces the networks to learn how to recover messages after being attacked, the robustness can be improved. Nevertheless, all of the above end-to-end steganography methods are concentrated in the spatial domain. Considering that images on the Internet will inevitably be compressed during transmission, JPEG image steganography [20] has higher practical value. However, there are the following challenges when end-to-end steganography is applied to the JPEG domain. Firstly, due to the complex statistical characteristics of discrete cosine transform (DCT) coefficients, the difficulty and complexity of steganography in the JPEG domain are usually higher than in the spatial domain. Secondly, the lossy quantization in JPEG compression makes it more difficult to recover the secret message. Therefore, none of the previous works has implemented end-to-end steganography in the JPEG domain. As a result, most existing JPEG image steganography methods are not robust to JPEG compression and have a low embedding capacity. Moreover, existing image steganography methods have difficulty allocating the secret message effectively among the RGB channels. In order to address these limitations, we propose a novel end-to-end JPEG steganography framework based on adversarial training [21]. The advantages of the proposed method are as follows:
(1) We propose a high-capacity and robust JPEG steganography framework called HRJS, which embeds and extracts the secret message by a neural network. It is worth mentioning that our method implements end-to-end steganography in the JPEG domain for the first time.
(2) The robustness is greatly improved. We insert an attack module between the encoder and decoder to simulate practical application scenarios. It can extract meaningful information from the network disturbance through adversarial training, which forces the neural network to automatically learn how to correctly recover the secret message from the attacked stego image.
(3) The capacity is greatly improved. Our method utilizes a neural network to adaptively embed a secret message into the RGB channels. It realizes the embedding of a 1 bpc secret message while preserving the imperceptibility and robustness of the stego image. By comparison, conventional works are only effective up to a payload of around 0.5 bpnzAC (bits per nonzero AC DCT coefficient) or even much lower.
The rest of this paper is organized as follows. We review traditional JPEG image steganography and introduce end-to-end image steganography in Section 2. Then, in Section 3, we introduce the architecture and loss function of HRJS in detail. Next, we show the experimental results and a comprehensive analysis in Section 4. Finally, in Section 5, we present conclusions and avenues for future work.

[22,23] is the quantized DCT coefficients. Many adaptive JPEG image steganography methods are committed to designing a more appropriate distortion cost function for the embedding modification of DCT coefficients. UED [24] and UERD [8] proposed to spread the embedding modifications uniformly over the quantized DCT coefficients that have similar magnitudes. J-UNIWARD [5] took into account the statistical characteristics of the spatial domain when designing the cost function and obtained high security. BET [25] transformed the distortion cost function for spatial images into one for JPEG images and utilized the statistics of both the DCT domain and the spatial domain. It showed excellent performance in resisting advanced JPEG steganalysis. GUED [26] proposed new distortion measures that keep the statistical characteristics of the cover unchanged on each DCT block and AC (alternating current) mode. Moreover, a general and effective empirical rule was proposed to select the parameters of the exponential function. Work [27] creatively put forward a block boundary maintenance (BBM) principle, in which the nonadditive cost function of JPEG steganography is defined by the intrablock coefficient correlation within the DCT domain.

End-to-End Image Steganography.
The previous cost learning-based steganography methods using deep learning still rely on traditional algorithms such as STC to embed or extract the secret message. Different from these methods, the embedding and extracting processes of end-to-end steganography methods are both accomplished by networks.
End-to-end steganography methods usually contain encoder and decoder networks. The encoder takes the secret message and cover image as input and produces a visually indistinguishable stego image, from which the decoder can reconstruct the original secret message. Work [16] proposed an end-to-end steganography structure and embedded a full-size color image within another image of the same size, which significantly increased the payload. Due to the large payload, it could be easily detected by steganalysis tools. Work [17] can extract useful information under the attack of adversarial perturbations and greatly improves the robustness by inserting an optional noise layer. It is worth mentioning that it has achieved good performance in both steganography and watermarking. Although this work can effectively resist many kinds of attacks, it applied upsampling to the cover image before feeding it into the encoder, which reduced the quality of the image significantly. SteganoGAN [18] explored three connectivity patterns for the encoder architecture, which helped optimize the perceptual quality of the stego image. Different from the above methods, which took the cover and the secret message/image as the input of the encoder, UDH [28] disentangled the encoding of a secret image from the cover, which makes it convenient to analyze the embedding mechanism of the secret image. Benefiting from this universal deep hiding (UDH) framework, it found that the frequency discrepancy between the encoded secret image and the cover is the key to the success of deep steganography. Work [29] utilized the forward and backward propagation of an invertible steganography network to handle the message embedding and extraction processes. In addition, the steganography capacity increases with the number of hidden image channels. Although end-to-end steganography has been developed in the spatial domain, it is still in its initial stage in the JPEG domain.

Proposed Methods
In this section, we define the basic notation and introduce the overall architecture of our proposed HRJS. Moreover, the details of the encoder, inverse discrete cosine transform (IDCT) module, attack module, decoder, and discriminator are given. After that, we describe our loss function in detail. Finally, we introduce the training steps of HRJS.

Notation.
The capital letters $C = (c_{i,j})^{3 \times W \times H}$ and $S = (s_{i,j})^{3 \times W \times H}$ represent the cover and stego images, respectively, where $W$ and $H$ are the width and height of the images. Specifically, both $C$ and $S$ are RGB color images. We use $M \in \{0, 1\}^{D \times W \times H}$ and $M' \in \{0, 1\}^{D \times W \times H}$ to represent the binary secret message and the decoded secret message, respectively, where $D$ is the depth of the secret message.

Architecture.
The architecture of our proposed HRJS is shown in Figure 2 and consists of the following five modules. The encoder is responsible for embedding the secret message: it takes the DCT coefficients of the cover image and the secret message as input and generates the DCT coefficients of the stego image. The IDCT module is similar to the JPEG decompression operation, which is the reverse process of JPEG compression. It converts the JPEG domain image into the corresponding spatial domain image, while the DCT module does just the opposite. The attack module simulates JPEG compression. It receives a stego image from the IDCT module and produces the attacked stego image. In particular, it can efficiently force the network to learn how to recover secret messages correctly from the attacked stego image. The decoder receives the DCT coefficients of the attacked stego image and attempts to recover the original secret message. The discriminator is responsible for distinguishing the stego image from the cover image. The principles and implementation details of these five modules are introduced in the following subsections.

Encoder.
The encoder receives the DCT coefficients of the cover image as well as a binary secret message and then produces the DCT coefficients of the stego image, which is also an RGB color image and has the same shape as the cover image. It should be noted that the secret message is binary data and has the same width and height as the cover image. It is worth mentioning that the vast majority of JPEG image steganography methods hide binary messages with a low embedding capacity, which does not satisfy the needs of large-scale data hiding. To mitigate this problem, we set an adjustable parameter $D$, which denotes the depth of the secret message. By adjusting the value of $D$, the payload of our HRJS can reach up to 1 bpc, which is much larger than that of the cost design-based steganography methods.
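The relation between the message depth and the payload can be made concrete with a small calculation. The sketch below assumes that bpc is read as message bits divided by the number of channel samples of a 3 × W × H cover, so that a D × W × H binary message gives D/3 bpc; this reading is an inference from the notation and from the 1/3 bpc steps reported in the experiments, and the function names are hypothetical.

```python
from fractions import Fraction


def payload_bpc(depth: int, channels: int = 3) -> Fraction:
    """Payload in bits per channel (bpc) for a D x W x H binary message.

    Assumption: every position of every message plane carries one bit, so
    W and H cancel: (D * W * H) / (channels * W * H) = D / channels.
    """
    if depth < 0 or channels <= 0:
        raise ValueError("depth must be >= 0 and channels must be > 0")
    return Fraction(depth, channels)


def message_bits(depth: int, width: int, height: int) -> int:
    """Total number of message bits carried by one cover image."""
    return depth * width * height


if __name__ == "__main__":
    # 256 x 256 is the image size used in the experiments of this paper.
    for d in (1, 2, 3):
        print(f"D={d}: {payload_bpc(d)} bpc, {message_bits(d, 256, 256)} bits")
```

Under this reading, D = 1 corresponds to the 1/3 bpc setting and D = 3 to the 1 bpc setting.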
The network structure of the encoder is shown in Figure 3. We choose a residual variant inspired by ResNet [30]. Firstly, we apply a convolutional block to the cover image to produce a high-dimensional representation of shape (32, W, H) and then concatenate it with the secret message. The combined tensor is passed through three convolutional blocks, and the input cover image tensor is then added to form the output. Specifically, each convolutional layer has a kernel size of 3 × 3, with stride 1 and padding 1. The first three convolutional blocks utilize the LeakyReLU activation function, while the last one utilizes the Tanh activation function.

IDCT Module
The IDCT module performs dequantization and inverse discrete cosine transformation on the coefficients in each block and finally concatenates the subblocks to transform them into the spatial image. Because our method uses a color image dataset, a color space transformation is added to this process. The core of this module is the following IDCT function:

$$f(a, b) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\,\alpha(v)\,F(u, v) \cos\left[\frac{(2a+1)u\pi}{2N}\right] \cos\left[\frac{(2b+1)v\pi}{2N}\right],$$

where $a, b = 0, 1, 2, \ldots, N-1$, and $f(a, b)$ and $F(u, v)$ represent the spatial pixel value and the DCT coefficient, respectively. The coefficients $\alpha(u)$ and $\alpha(v)$ can be expressed as

$$\alpha(u) = \begin{cases} \sqrt{1/N}, & u = 0, \\ \sqrt{2/N}, & u \neq 0, \end{cases}$$

and likewise for $\alpha(v)$. This IDCT module preserves back-propagation while performing the IDCT process and ensures that the gradient does not vanish. In addition, we utilize matrix multiplication to realize the IDCT module, which ensures high efficiency.

Generally, JPEG compression consists of two steps, which are the DCT transformation and quantization. The quantization process is followed by rounding. Because the rounding function is a piecewise step function and cannot be differentiated, the gradient is truncated after rounding. Therefore, we utilize the following operation to simulate rounding [31]:

$$\tilde{x}_q = [x_q] + (x_q - [x_q])^3, \tag{3}$$

where $[x_q]$ stands for rounding $x_q$. Formula (3) has a nonzero derivative almost everywhere, so it can simulate the rounding operation and at the same time keep the gradient propagating.
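The behavior of the simulated rounding can be checked numerically. The sketch below is a minimal illustration that assumes the cubic approximation, i.e., the rounded value plus the cube of the rounding residual; the function names are hypothetical and are not taken from the authors' implementation.

```python
def soft_round(x: float) -> float:
    """Differentiable stand-in for rounding: [x] + (x - [x])**3.

    Note: Python's round() resolves exact .5 ties to the nearest even
    integer, which only matters at the measure-zero set of half-integers.
    """
    nearest = round(x)
    return nearest + (x - nearest) ** 3


def soft_round_grad(x: float) -> float:
    """Analytic derivative of soft_round away from half-integers.

    d/dx (x - [x])**3 = 3 * (x - [x])**2, which is zero only when x is
    exactly an integer, i.e. nonzero almost everywhere.
    """
    nearest = round(x)
    return 3.0 * (x - nearest) ** 2


if __name__ == "__main__":
    for value in (2.0, 2.1, 2.4, -1.3):
        print(value, soft_round(value), soft_round_grad(value))
```

The output stays close to the hard-rounded value (the deviation is at most 0.125), while the gradient is no longer zero between integers.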
Thanks to the careful design of the JPEG compression attack module, our method can effectively resist the corresponding attacks by simulating them in the training step, which greatly improves the robustness of the JPEG steganography method. It is worth mentioning that the attack module can also simulate other malicious attacks, such as Gaussian noise and dropout, and can also achieve excellent performance.
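To make the simulated attack concrete, a single block can be passed through DCT, quantization with a differentiable rounding surrogate, dequantization, and IDCT. The following sketch uses the matrix-multiplication form of the transform; the flat quantization step `q` (real JPEG uses a quality-dependent quantization table), the cubic rounding surrogate, and all function names are assumptions made for illustration.

```python
import math


def dct_matrix(n: int = 8):
    """Orthonormal DCT-II basis T with T[u][a] = alpha(u) * cos((2a+1)u*pi / 2n)."""
    rows = []
    for u in range(n):
        alpha = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        rows.append([alpha * math.cos((2 * a + 1) * u * math.pi / (2 * n))
                     for a in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """Forward 2D DCT as two matrix products: F = T f T^T."""
    t = dct_matrix(len(block))
    return matmul(matmul(t, block), transpose(t))


def idct2(coeffs):
    """Inverse 2D DCT as two matrix products: f = T^T F T."""
    t = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(t), coeffs), t)


def soft_round(x: float) -> float:
    """Cubic rounding surrogate (assumed form): [x] + (x - [x])**3."""
    nearest = round(x)
    return nearest + (x - nearest) ** 3


def simulate_jpeg_block(block, q: float = 16.0):
    """DCT -> quantize with soft rounding -> dequantize -> IDCT for one block."""
    coeffs = dct2(block)
    quantized = [[soft_round(c / q) for c in row] for row in coeffs]
    dequantized = [[v * q for v in row] for row in quantized]
    return idct2(dequantized)


if __name__ == "__main__":
    block = [[float((8 * i + j) % 256) for j in range(8)] for i in range(8)]
    attacked = simulate_jpeg_block(block)
    worst = max(abs(x - y) for rx, ry in zip(block, attacked)
                for x, y in zip(rx, ry))
    print("largest pixel change after simulated compression:", worst)
```

Because every step is a matrix product or a smooth elementwise function, the same structure can be expressed in an automatic differentiation framework so that gradients flow from the decoder back to the encoder.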

Decoder.
The decoder takes the DCT coefficients of the attacked stego image as input and produces the decoded secret message, which has the same shape as the original secret message. The network structure of the decoder is shown in Figure 4. We apply six convolutional blocks to obtain the decoded secret message. Each convolutional layer has a kernel size of 3 × 3, with stride 1 and padding 1. In particular, we add the DCT coefficients of the attacked stego image to the output of the fifth convolutional block and then carry out the last convolution. This skip connection retains more of the weak steganographic signal, which is conducive to the recovery of secret messages.

Discriminator.
To provide feedback on the performance of the network and produce more realistic images, we introduce an adversarial discriminator, as shown in Figure 5. It takes a spatial cover image and a spatial stego image as input. We use four convolutional blocks, in which each convolutional layer has a kernel size of 3 × 3, with stride 2 and padding 1. The first three blocks take LeakyReLU as the activation function, and the last block uses ReLU. At the end of the last convolutional block, adaptive average pooling is performed. After that, the discriminator performs squeeze and fully connected operations to output the result of binary classification.
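The effect of the stride on the feature-map size can be traced with the usual convolution output-size formula. The sketch below assumes standard convolutions without dilation and the 256 × 256 input size used in the experiments; the helper names are hypothetical.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    if size + 2 * padding < kernel:
        raise ValueError("input is smaller than the kernel")
    return (size + 2 * padding - kernel) // stride + 1


def trace_sizes(size: int, n_layers: int, stride: int):
    """Feature-map side length after each of n_layers 3 x 3 convolutions."""
    sizes = [size]
    for _ in range(n_layers):
        sizes.append(conv_out_size(sizes[-1], stride=stride))
    return sizes


if __name__ == "__main__":
    # Stride 1, padding 1 (encoder and decoder): resolution is preserved.
    print("stride 1:", trace_sizes(256, 6, stride=1))
    # Stride 2, padding 1 (discriminator): resolution halves in every block.
    print("stride 2:", trace_sizes(256, 4, stride=2))
```

With stride 1 and padding 1 the encoder and decoder keep the message aligned with the cover at full resolution, whereas the four stride-2 blocks of the discriminator reduce the map before the adaptive average pooling.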

Loss Function.
The objective loss function of HRJS contains a reconstruction loss $L_{rec}$ and an adversarial loss $L_{adv}$. The training objective is to minimize

$$L = L_{rec} + \lambda_a L_{adv},$$

where $\lambda_a$ is used to adjust the weight of the above two losses. The reconstruction loss $L_{rec}$ encourages the stego image and the decoded secret message to be close to the cover image and the original secret message, respectively. Therefore,

$$L_{rec} = \lambda_c L_c + \lambda_m L_m,$$

where $\lambda_c$ and $\lambda_m$ are hyper-parameters that balance the fidelity of the stego image and the degree of secret message recovery. $L_c$ and $L_m$ represent the image reconstruction loss and the message reconstruction loss, respectively. In the next section, we will present the effect of different weights of $L_m$ in detail.
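Read as weighted sums, the objective can be assembled in a few lines. The sketch below assumes the additive form implied by the description of the weights; the default values are the ones reported in the experiments of this paper ($\lambda_c = 1$, $\lambda_m = 3$, $\lambda_a = 1$), and the function names are hypothetical.

```python
def reconstruction_loss(l_c: float, l_m: float,
                        lambda_c: float = 1.0, lambda_m: float = 3.0) -> float:
    """L_rec as a weighted sum of the image loss L_c and the message loss L_m."""
    return lambda_c * l_c + lambda_m * l_m


def total_loss(l_c: float, l_m: float, l_adv: float,
               lambda_c: float = 1.0, lambda_m: float = 3.0,
               lambda_a: float = 1.0) -> float:
    """Training objective: L_rec plus the weighted adversarial loss L_adv."""
    return reconstruction_loss(l_c, l_m, lambda_c, lambda_m) + lambda_a * l_adv


if __name__ == "__main__":
    # Illustrative loss values only; they are not measurements from the paper.
    print(total_loss(l_c=0.1, l_m=0.2, l_adv=0.05))
```

Raising `lambda_m` shifts the optimum toward message recovery at the expense of image fidelity, which is the trade-off examined later in the experiments.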
In order to ensure the visual quality of the stego image, we use the mean square error (MSE) and structural similarity (SSIM) to measure the similarity between the cover image $C$ and the stego image $S$. The image reconstruction loss $L_c$ therefore combines MSE(C, S) and SSIM(C, S), where the parameter $\beta$ is used to adjust the relative importance of MSE and SSIM; we will explain the determination of $\beta$ in detail in the next section. MSE(C, S) denotes the mean square error between the cover image $C$ and the corresponding stego image $S$. The smaller the value of MSE, the better the image quality. Similarly, SSIM(C, S) denotes the structural similarity between $C$ and $S$. SSIM typically lies in [0, 1], and a value closer to 1 means that the stego image is more similar to the cover image. Given the current image $X = (x_{i,j})^{W \times H}$ and the reference image $Y = (y_{i,j})^{W \times H}$, where $W$ and $H$ represent the width and height of the images, respectively, MSE and SSIM are calculated by

$$\mathrm{MSE}(X, Y) = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} (x_{i,j} - y_{i,j})^2,$$

$$\mathrm{SSIM}(X, Y) = [L(x, y)]^{l} \cdot [C(x, y)]^{m} \cdot [S(x, y)]^{n},$$

where $L(x, y)$, $C(x, y)$, and $S(x, y)$ represent the brightness comparison, contrast comparison, and structural comparison, respectively, while $l$, $m$, and $n$ are used to adjust the weights of the three components. Usually, we set $l = m = n = 1$. $L(x, y)$, $C(x, y)$, and $S(x, y)$ are calculated by

$$L(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad C(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad S(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3},$$

where $\mu_x$ and $\mu_y$ represent the average pixel values, $\sigma_x$ and $\sigma_y$ represent the standard deviations, $\sigma_x^2$ and $\sigma_y^2$ represent the variances, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $C_1$, $C_2$, $C_3$ are constants.
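The two image quality measures can be sketched directly from their definitions. The code below computes SSIM from global image statistics rather than from local sliding windows, which is a simplification; the constants $C_1 = (0.01 \cdot 255)^2$, $C_2 = (0.03 \cdot 255)^2$, and $C_3 = C_2 / 2$ are commonly used defaults assumed here, since the paper does not state its values, and the function names are hypothetical.

```python
import math


def _flatten(img):
    return [float(v) for row in img for v in row]


def mse(x, y) -> float:
    """Mean square error between two equally sized single-channel images."""
    fx, fy = _flatten(x), _flatten(y)
    return sum((a - b) ** 2 for a, b in zip(fx, fy)) / len(fx)


def ssim_global(x, y, data_range: float = 255.0, l: int = 1, m: int = 1, n: int = 1) -> float:
    """SSIM from whole-image statistics (no sliding window; a simplification)."""
    fx, fy = _flatten(x), _flatten(y)
    count = len(fx)
    mu_x, mu_y = sum(fx) / count, sum(fy) / count
    var_x = sum((a - mu_x) ** 2 for a in fx) / count
    var_y = sum((b - mu_y) ** 2 for b in fy) / count
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(fx, fy)) / count
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    # Assumed default constants; the paper only says that C1, C2, C3 are constants.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance ** l * contrast ** m * structure ** n


if __name__ == "__main__":
    cover = [[10, 20], [30, 40]]
    stego = [[12, 18], [33, 41]]
    print("MSE :", mse(cover, stego))
    print("SSIM:", ssim_global(cover, stego))
```

Practical implementations usually average SSIM over local windows, so values from this global variant are not directly comparable with the figures reported in the tables.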
At the same time, in order to ensure the decoding accuracy of the message, we use MSE to measure the difference between the secret message $M$ and the decoded secret message $M'$. We define the following message reconstruction loss $L_m$:

$$L_m = \mathrm{MSE}(M, M').$$

Finally, the adversarial loss $L_{adv}$ is defined in terms of the discrimination loss $L_d$, which aims to train the discriminator to distinguish the cover image from the corresponding stego image. We define $L_d$ by the following cross-entropy loss:

$$L_d = -\sum_{i=1}^{2} z_i' \log z_i,$$

where $z_1$ and $z_2$ are the softmax outputs of the discriminator, while $z_1'$ and $z_2'$ stand for the ground truth labels.
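The discrimination loss can be illustrated for a single image. The sketch below assumes the standard two-class cross-entropy over softmax outputs with a one-hot label; the small `eps` clamp and the function names are illustrative choices and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def cross_entropy(probs, labels, eps: float = 1e-12) -> float:
    """Cross-entropy -sum_i z'_i * log(z_i) for one sample.

    probs  : softmax outputs (z_1, z_2) of the discriminator
    labels : one-hot ground truth (z'_1, z'_2), e.g. [1, 0] for a cover image
    """
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, labels))


if __name__ == "__main__":
    undecided = softmax([0.0, 0.0])    # discriminator cannot tell cover from stego
    confident = softmax([5.0, -5.0])   # discriminator is sure the image is a cover
    print(cross_entropy(undecided, [1, 0]))
    print(cross_entropy(confident, [1, 0]))
```

An undecided discriminator yields a loss of log 2 per image, which is the value the encoder pushes toward when the stego images become indistinguishable from the covers.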
The training steps of HRJS are shown in Algorithm 1.

Experimental Results
In this section, we perform extensive experiments to verify the imperceptibility and high robustness of the proposed HRJS. We begin by introducing the implementation details.

The model is optimized by the Adam optimizer. We set the learning rate to 0.0001 and the batch size to 16 to fit our devices. Each epoch includes 2500 iterations, and the whole process has 100 epochs. At the end of the training, the model has converged sufficiently.

Evaluation Metrics.
We evaluate our method with three metrics that are commonly used to evaluate deep steganography methods: capacity, decoding accuracy, and image quality. We utilize bits per channel (bpc) to evaluate the capacity. For a steganography algorithm, we have to make sure that the decoded secret message and the original secret message are as similar as possible; consequently, the decoding accuracy is undoubtedly a fundamental metric. Moreover, we utilize the peak signal-to-noise ratio (PSNR), a metric commonly used to measure image quality, to evaluate the distortion of the stego image. PSNR can be calculated by

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^n - 1)^2}{\mathrm{MSE}},$$

where $n$ is the number of bits per pixel, generally set to 8. The larger the value, the smaller the distortion. Because most error sensitivity-based quality assessment methods (such as MSE and PSNR) utilize linear transforms to decompose image signals, they cannot reflect human visual characteristics well. So we also utilize SSIM, which is more relevant to the perception of the human eye, to measure image quality. MSE and SSIM have been described previously.
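The PSNR definition translates into a short helper. The sketch below assumes 8-bit images by default and returns infinity for identical images; the function name is hypothetical.

```python
import math


def psnr(mse_value: float, bits: int = 8) -> float:
    """Peak signal-to-noise ratio in dB for a given MSE and bit depth."""
    if mse_value < 0:
        raise ValueError("MSE cannot be negative")
    if mse_value == 0:
        return math.inf  # identical images: no distortion
    peak = 2 ** bits - 1
    return 10.0 * math.log10(peak ** 2 / mse_value)


if __name__ == "__main__":
    # Illustrative MSE values only; they are not measurements from the paper.
    for value in (1.0, 10.0, 100.0):
        print(f"MSE={value}: PSNR={psnr(value):.2f} dB")
```

Each tenfold increase in MSE lowers the PSNR by 10 dB, which gives a sense of scale when PSNR values under different compression strengths are compared.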

Imperceptibility of the Image Steganography.
As we all know, the most fundamental and intuitive indicator of image steganography is imperceptibility. Therefore, we randomly select a cover image from the MS COCO dataset and obtain the corresponding stego images by embedding secret messages with different payloads, as shown in Figure 6. In particular, to make the secret messages easier to inspect visually, we choose three clear images from the MNIST dataset [33] and process them into binary images to replace random secret messages. The differences between the cover image and the corresponding stego images are magnified three times for illustration. We can observe that the differences between the cover image and the corresponding stego images are very small in magnitude. This shows that HRJS has good imperceptibility and that its steganography process cannot be recognized by human eyes. Even when a 1 bpc secret message is embedded, the quality of the stego image is well preserved. In addition, Table 1 shows the image quality and decoding accuracy of embedding secret messages with different payloads in the case of no attack. We can clearly observe that when the embedding capacity is 1/3 bpc, the values of SSIM, PSNR, and decoding accuracy are as high as 0.9908, 40.39, and 0.9999, respectively. Even if a 1 bpc secret message is embedded, our method can still maintain high image quality and achieve almost 100% decoding accuracy when extracting secret messages. The above visual and quantitative results verify that our HRJS has high imperceptibility.

Robustness against JPEG Compression.
We train several different models, and the results of image quality and decoding accuracy are presented in Table 2. We set $\beta = 1$, $\lambda_c = 1$, $\lambda_m = 3$, and $\lambda_a = 1$ in this experiment. The determination of these parameters is discussed in the next subsection. We can observe that the image quality is well maintained under JPEG compression. In particular, under JPEG compression with QF = 95, the values of SSIM and PSNR reach 0.9602 and 34.16, respectively. Even under heavy JPEG compression with QF = 50, the values of SSIM and PSNR reach 0.8791 and 28.85, respectively. It is worth mentioning that the overall decoding accuracy is close to 100%. The above analysis shows that our method is robust to JPEG compression.

Input: Secret message; DCT coefficients of the cover image.
Step 1: Concatenate the DCT coefficients of the cover image with a binary secret message and input them into the encoder to obtain the DCT coefficients of the stego image;
Step 2: Convert the DCT coefficients of the cover and stego images into the corresponding spatial images by utilizing the IDCT module;
Step 3: Input the spatial stego image into the attack module to obtain the attacked stego image;
Step 4: Convert the attacked stego image into DCT coefficients;
Step 5: Input the DCT coefficients of the attacked stego image into the decoder to obtain the decoded secret message;
Step 6: Feed the spatial cover and stego images into the discriminator;
Step 7: Update network parameters alternately:
ALGORITHM 1: Training steps of HRJS.
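The data flow of Steps 1 to 6 can be sketched as a single forward pass in which every module is passed in as a callable. This is only an illustration of the order of operations: the function name, the argument names, and the returned dictionary are hypothetical, and the parameter update of Step 7 is left out because its details are not given here.

```python
from typing import Any, Callable, Dict


def forward_pass(cover_dct: Any, message: Any,
                 encoder: Callable[[Any, Any], Any],
                 idct: Callable[[Any], Any],
                 attack: Callable[[Any], Any],
                 dct: Callable[[Any], Any],
                 decoder: Callable[[Any], Any],
                 discriminator: Callable[[Any], Any]) -> Dict[str, Any]:
    """Run Steps 1-6 of the training procedure once and collect the outputs."""
    stego_dct = encoder(cover_dct, message)   # Step 1: embed the message
    cover_img = idct(cover_dct)               # Step 2: to the spatial domain
    stego_img = idct(stego_dct)
    attacked_img = attack(stego_img)          # Step 3: simulated attack
    attacked_dct = dct(attacked_img)          # Step 4: back to DCT coefficients
    decoded = decoder(attacked_dct)           # Step 5: recover the message
    d_cover = discriminator(cover_img)        # Step 6: discriminator outputs
    d_stego = discriminator(stego_img)
    # Step 7 (alternating parameter updates) is intentionally omitted.
    return {"stego_dct": stego_dct, "stego_img": stego_img,
            "decoded": decoded, "d_cover": d_cover, "d_stego": d_stego}


if __name__ == "__main__":
    # Toy stand-ins: the "encoder" adds the bit to an even number and the
    # "decoder" reads the parity back; all transforms are identities.
    result = forward_pass(
        cover_dct=[2, 4, 6], message=[1, 0, 1],
        encoder=lambda c, m: [a + b for a, b in zip(c, m)],
        idct=list, attack=list, dct=list,
        decoder=lambda d: [v % 2 for v in d],
        discriminator=lambda img: 0.5,
    )
    print(result["decoded"])
```

The losses would then be computed from `stego_img`, `decoded`, and the two discriminator outputs before the encoder-decoder pair and the discriminator are updated in turn.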

Security and Communication Networks
Figure 7 shows the decoded secret messages. We can see that the secret messages can be recovered in high visual quality even if the stego images have gone through various heavy attacks. The numbers contained in the recovered secret messages are clearly recognizable.
Because JPEG compression discards a lot of trivial information in the cover image, it causes more disturbance to the stego image than other attacks. So when the stego image suffers from JPEG compression, the image distortion is larger, and the secret message is more difficult to recover. We therefore show the image quality and decoding accuracy under JPEG compression with different payloads in Table 3. In order to explore a more general situation, the embedded secret messages in Tables 3-5 are randomly generated binary messages.
As shown in Table 3, at the same quality factor, the more information is embedded, the lower the image quality and decoding accuracy. For example, under JPEG_75 compression, the model at 1/3 bpc payload achieves 4.3% and 7.27% higher decoding accuracy than at the two other payloads, respectively. Similarly, at the same payload, the smaller the quality factor, the greater the information loss, and the lower the image quality and decoding accuracy. In addition, we can observe that binary images from the MNIST dataset are easier to embed and extract than randomly generated secret messages. For example, as shown in Table 2, the decoding accuracy reaches 0.9990 under JPEG_95 compression at 1/3 bpc payload, whereas it drops from 0.9990 to 0.9778 in Table 3. The reason for this phenomenon is that the binary images from the MNIST dataset carry less information and have strong texture regularity, which makes the network easier to train. The experimental results in Table 3 also show that our HRJS has high robustness despite the presence of heavy JPEG compression.
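The decoding accuracy discussed above can be read as the fraction of message bits that are recovered correctly. The sketch below assumes that the decoder emits values in [0, 1] (its last block uses a sigmoid) and that a threshold of 0.5 turns them into bits; both the threshold and the function name are assumptions, since the exact definition is not spelled out in this paper.

```python
from typing import Sequence


def decoding_accuracy(original_bits: Sequence[int],
                      decoded_values: Sequence[float],
                      threshold: float = 0.5) -> float:
    """Fraction of bits recovered correctly after thresholding the decoder output."""
    if len(original_bits) != len(decoded_values):
        raise ValueError("message and decoder output must have the same length")
    if not original_bits:
        raise ValueError("message must not be empty")
    hits = sum(1 for bit, value in zip(original_bits, decoded_values)
               if (1 if value >= threshold else 0) == bit)
    return hits / len(original_bits)


if __name__ == "__main__":
    message = [1, 0, 1, 1]
    decoder_output = [0.9, 0.2, 0.4, 0.7]  # third bit is decoded incorrectly
    print(decoding_accuracy(message, decoder_output))
```

For a random binary message, an accuracy of 0.5 corresponds to guessing, so the reported accuracies should be judged against that baseline rather than against zero.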
Figure 8 presents the PSNR and SSIM results for different payloads and different JPEG quality factors. The experimental results show that the image quality decreases as the payload increases. We can conclude that, with either a small payload or a relatively large payload, the image quality remains good. This indicates that HRJS has good robustness against JPEG compression. We randomly select 500 cover and 500 stego images and apply the common steganalysis tool StegExpose [34] to measure the security of our method. As shown in Figure 9, we utilize receiver operating characteristic (ROC) curves to show the detection results. These results show that StegExpose does not work well when QF = 95 or when there is no attack, and that our method has a certain degree of security under slight JPEG compression.

Further Analysis.
In order to further verify the security of our method, we utilize the discriminator trained by HRJS as a steganalyzer to distinguish the stego images from the cover images. The detection accuracy of this discriminator is 63.1% in the case of 1/3 bpc without attack. This indicates that HRJS has a certain degree of security. Furthermore, we utilize two advanced deep steganalysis methods, YangNet [35] and WISERNet [36], to detect the stego images. Unfortunately, these two methods can easily distinguish the stego images, and the detection accuracy is close to 100%. The reasons for this phenomenon are as follows: (1) The end-to-end steganography method has a larger embedding capacity, which inevitably leads to a loss of security. (2) Incorporating the attack module improves robustness but sacrifices security. (3) The security also depends on the weight of the discriminator term in the loss function. We will focus on improving the security of end-to-end JPEG steganography at the same payload in the future.

The Effectiveness of the Attack Module.
To verify the effectiveness of the attack module, we train HRJS without and with the attack module and then run experiments on the testing set under different JPEG compression attacks.
According to the results shown in Table 4, the attack module improves the decoding accuracy by 0.3526, 0.4073, and 0.3969 under the three quality factors, which proves that the attack module is essential in this architecture.

Different Weights of the Message Reconstruction Loss $L_m$.
Because the weight $\lambda_m$ of the message reconstruction loss affects both decoding accuracy and image quality, we set $\lambda_c = 1$, $\lambda_a = 1$, and different $\lambda_m$ in this experiment to explore its effect. Decoding accuracy and image quality conflict with each other: image quality is lower when decoding accuracy is higher, so we have to adopt a trade-off strategy.
According to the experimental results shown in Table 5, the secret message cannot be extracted correctly under a QF = 75 compression attack when $\lambda_m = 1$. We choose $\lambda_m = 3$ empirically to perform all the experiments. The parameter can be determined by the application scenario. For instance, if there is a higher requirement for image quality, choosing $\lambda_m = 2$ would be better.

Different Weights of the Image Reconstruction Loss $L_c$.
We set $\lambda_c = 1$, $\lambda_m = 3$, $\lambda_a = 1$, and different $\beta$ to explore the effect of different weights in the image reconstruction loss $L_c$. According to the results shown in Table 6, both $\beta = 0$ and $\beta = 1$ perform well on decoding accuracy, but $\beta = 1$ achieves better SSIM. We choose $\beta = 1$ to perform all the experiments. The determination of $\beta$ depends on the application; $\beta = 0$ is preferred when decoding accuracy is the priority.

Conclusion
In this paper, we propose an end-to-end JPEG steganography method based on adversarial training, whose message embedding and extraction processes are both realised by a neural network. Besides, to enhance robustness, we insert an attack module to force the neural network to automatically learn how to correctly recover the secret message after being attacked. The proposed method greatly improves the embedding capacity and decoding accuracy.
Comprehensive experimental results demonstrate that the end-to-end method is feasible for high-capacity JPEG steganography and at the same time has an obvious advantage in robustness against JPEG compression. Moreover, we explore the effects of the different parts of the loss function. In our future work, we will explore more advanced and complex network structures to propose a more robust model. We will also focus on improving the security of end-to-end JPEG steganography at a large payload.

This module is used to obtain the spatial cover and stego images. It first segments the DCT coefficients of the image and then performs dequantization

Figure 1: Existing end-to-end architecture in the spatial domain.

Figure 2: The architecture of the HRJS. The encoder is used to generate the DCT coefficients of the stego image, while the decoder is used to reconstruct the original secret message. The attack module simulates JPEG compression. The IDCT module and DCT module convert the image into the spatial domain and the JPEG domain, respectively. The discriminator is used to identify whether a given image conceals a secret message.

Figure 6: Example of embedding secret messages with different payloads. "Message 1," "message 2," and "message 3" are binary secret images selected from the MNIST dataset, and when each of them is embedded, the embedding capacity is increased by 1/3 bpc.

Figure 7: Secret message recovery of the HRJS under various attacks. "GT" means the original image; the four rows represent four different examples.
$L_m$. $\lambda_m$ is used to adjust the importance of the message reconstruction loss $L_m$. Different values of $\lambda_m$ would have an obvious effect

Figure 8: The PSNR and SSIM of different payloads with different JPEG quality factors. As the payload increases, the PSNR and SSIM decrease, which shows that image quality decreases with a bigger payload. (a) PSNR. (b) SSIM.

Digital images are always subject to various attacks during real Internet transmission, such as JPEG compression on social networks. However, most existing image steganography networks are vulnerable to these attacks and therefore cannot satisfy the steganographic requirements of practical application scenarios. Moreover, if the stego image obtained by traditional steganography suffers these attacks during transmission, it is difficult to recover the secret message correctly. To mitigate this problem, we insert an attack module after the IDCT module during the training stage to simulate JPEG compression.
Figure 4: The architecture of the decoder. The first five convolutional blocks utilize the LeakyReLU activation function, while the last one utilizes the sigmoid activation function.

Figure 5: The architecture of the discriminator. "Squeeze" compresses the dimension of the tensor.
Implementation Details.
We use the MS COCO dataset [32] to train and evaluate our model. MS COCO is an RGB color image dataset, and we randomly select 40000 images as our training set, 1000 images as the validation set, and 5000 images as the testing set. The images were resized to 256 × 256 and then compressed with QF (quality factor) = 75 (JPEG_75) by Matlab.

Table 1: The image quality and decoding accuracy of embedding secret messages with different payloads in the case of no attack. The embedded secret messages come from the MNIST dataset, and their size is adjusted to 256 × 256 by Matlab. "Decoding_acc" means the decoding accuracy.

Table 2: The image quality and decoding accuracy at 1/3 bpc under JPEG compression with different quality factors. The embedded secret messages also come from the MNIST dataset.

Table 3: The image quality and decoding accuracy under JPEG compression with different payloads.

Table 4: The decoding accuracy under different JPEG compression attacks, where HRJS is implemented without and with the attack module, respectively. "-" denotes no JPEG compression.