Anti-Forensics of Image Contrast Enhancement Based on Generative Adversarial Network

In the multimedia forensics community, anti-forensics of contrast enhancement (CE) in digital images is an important topic for understanding the vulnerability of the corresponding CE forensic methods. Some traditional CE anti-forensic methods have demonstrated an effective ability to erase the forensic fingerprints of the contrast-enhanced image in the histogram and even in the gray level co-occurrence matrix (GLCM), but they ignore the problem that the way they change pixel values can expose them in the pixel domain. In this paper, we study CE anti-forensics based on Generative Adversarial Network (GAN) to handle this problem. Firstly, we exploit GAN to process the contrast-enhanced image and make it indistinguishable from the unaltered one in the pixel domain. Secondly, we introduce a specially designed histogram-based loss to enhance the attack effectiveness in the histogram domain and the GLCM domain. Thirdly, we use a pixel-wise loss to keep the visual enhancement effect of the processed image. The experimental results show that our method achieves high anti-forensic attack performance against CE detectors in the pixel domain, the histogram domain, and the GLCM domain, and maintains the highest image quality compared with traditional CE anti-forensic methods.


Introduction
With the development of computer techniques, digital images are widely used in our world. Accordingly, potential security problems in digital images have been emerging in recent years, and many manipulated digital images are threats to our forensic systems. To deal with this issue, researchers have studied a large number of image forensic methods. However, numerous forensic methods are limited in robustness. To understand their limitations and weaknesses, anti-forensic studies in digital images have been developed as well.
In recent years, many anti-forensic studies have been proposed [1-14]. The common practice of digital image anti-forensics [1-3, 5, 7-11, 14] is to introduce a minimum distortion into the digital image to eliminate or change the fingerprints that forensic methods rely on, which leads to a successful anti-forensic attack against those methods. In this case, the anti-forensic image is visually close to the attacked image. Recently, a different practice in operation image anti-forensics has been to model the anti-forensic problem as a GAN-based image translation or restoration problem, as in median filtering anti-forensics [4], JPEG compression anti-forensics [6], and multi-operation image anti-forensics [12, 13].
This kind of practice translates the operated image to its unaltered counterpart, which makes the operation fingerprint disappear. Besides, Chen et al. [14] proposed a GAN-based camera model anti-forensic method that deceives camera model detectors while preserving the visual effect of the attacked image. Up to now, GAN-based operation image anti-forensics that preserves the visual effect of the operation remains open for study. As a first attempt at this issue, we focus on a single-operation anti-forensic task, namely CE anti-forensics.
CE anti-forensics, as one of the tasks of anti-forensics, has been developed in recent years to counter CE forensic methods. Early CE anti-forensic strategies [8-10] were studied against histogram-based CE forensic methods. Cao et al. [8] proposed the method of local random dithering (LRD), which aims at removing the peak-gap artifacts that appear in the gray level histogram of the contrast-enhanced image. Barni et al. [10] proposed a universal anti-forensic method against histogram-based forensic detectors. After that, to further remove the artifacts in both the histogram and the gray level co-occurrence matrix (GLCM), Ravi et al. [11] proposed an effective anti-forensic CE technique based on solving an optimization problem.
Although these methods can deceive histogram-based detectors and even GLCM-based CE detectors, they ignore the fact that the way they change pixel values can expose them in the pixel domain. So far, countering CE forensic detectors in the pixel domain, the histogram domain, and the GLCM domain simultaneously is still a challenging CE anti-forensic task. To solve this problem and to explore the potential of GAN, we propose a novel GAN-based CE anti-forensic method in this paper. We exploit GAN to process the contrast-enhanced image and make the processed image indistinguishable from the unaltered one in the pixel domain. Meanwhile, a specially designed histogram-based loss is introduced to enhance the attack effectiveness in the histogram domain and the GLCM domain. Besides, we use a pixel-wise loss to keep the visual enhancement effect of the processed image. We follow the method based on mean-shifted Gaussian functions in [15] to calculate a differentiable histogram suitable for the deep learning training procedure. The experimental results show that our method successfully deceives three deep-learning-based CE forensic detectors [16, 24] in the pixel domain, the histogram domain, and the GLCM domain, respectively, and keeps the highest image quality compared to traditional CE anti-forensic methods.
Our contributions are summarized as follows:
(1) We exploit GAN to accomplish CE anti-forensics while preserving the visual effect of CE to a large extent. To the best of our knowledge, this is the first attempt to use GAN for CE anti-forensics under this condition.
(2) We introduce a specially designed histogram-based loss to enhance the attack effectiveness in the histogram domain and the GLCM domain.
(3) Our method shows high anti-forensic attack performance in the pixel domain, the histogram domain, and the GLCM domain.
(4) Compared with traditional CE anti-forensic methods, our method achieves the highest image quality.
The rest of the paper is organized as follows. In Section 2, we describe the background. In Section 3, we describe the proposed method. In Section 4, we present the details of our experiment. Finally, we summarize the work and look into future development in Section 5.

Generative Adversarial Network and Anti-Forensics.
GAN is a deep learning framework proposed by Goodfellow et al. [17] to generate visually realistic images. Classical GAN includes two networks, a generator G and a discriminator D.
G tries to generate an image $x'$ whose distribution $p_g(x')$ is as close as possible to the distribution $p_r(x)$ of the real image $x$. D tries to distinguish the real image $x$ from the generated image $x'$. The two networks are trained alternately in a competitive way by optimizing the following minimax problem:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_r(x)}[\log D(x)] + \mathbb{E}_{x' \sim p_g(x')}[\log(1 - D(x'))]$$

The two networks in a GAN oppose each other, which is similar to the relationship between the attacker and the forensic investigator. Thus, GAN is a natural basis for studying anti-forensic methods [18].

CE Artifacts and CE Detection in Histogram Domain.
In early CE forensic studies, Stamm and Liu [19] proposed a blind CE forensic method for digital images based on the fact that the histogram of an unaltered image is smooth, while the histogram of the corresponding contrast-enhanced image exhibits peak-gap artifacts, as illustrated in Figure 1.
Specifically, a CE operation is usually a nonlinear pixel mapping, for example, gamma correction, which can be separated into locally contractive mapping and locally expansive mapping. The rounding after locally contractive mapping and locally expansive mapping results in the appearance of peaks and gaps, respectively. The peak-gap artifacts in the histogram are a kind of high-frequency signal, which adds a high-frequency component in the Fourier domain, as shown in Figure 1. Meanwhile, a similar phenomenon exists in unaltered images that are saturated at the high end or the low end. In the corresponding histograms, there is an impulsive peak at the pixel value of 255 or 0. The Fourier transform of an impulse is a constant function, which also adds a high-frequency component. To avoid this effect, Stamm and Liu [19] proposed a pinch-off function to process the histogram as follows:

$$y(x) = p(x)\,h(x)$$

where $x$ represents the image, $p(x)$ is the pinch-off function, and $h(x)$ represents the histogram of $x$. Then, the high-frequency measure $F$ is calculated by the following formula:

$$F = \sum_{\omega} \left| \beta(\omega)\, Y(\omega) \right|$$

where $Y(\omega)$ is the Fourier transform of $y(x)$ and $\beta(\omega)$ is a weighting function that takes values between 0 and 1 to deemphasize low-frequency regions of $Y(\omega)$. $\beta(\omega)$ is formulated as

$$\beta(\omega) = \begin{cases} 1, & |\omega| \geq c \\ 0, & \text{otherwise} \end{cases}$$

where $c$ is a cutoff frequency.

Security and Communication Networks
Finally, Stamm and Liu [19] performed a threshold test on F to identify the CE operation. From the anti-forensic point of view, lowering the high-frequency component of the histogram of the contrast-enhanced image is a possible solution.
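The histogram test described above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch under stated assumptions: the raised-cosine shape and the width of `pinch_off`, the normalization by the pixel count, and the synthetic Gaussian-shaped histogram are all choices made here for illustration and are not taken from the paper. Only the cutoff of 0.875π matches the value reported later in the experiment setup.

```python
import cmath
import math

def pinch_off(level, width=8, levels=256):
    """Raised-cosine taper: 0 at both ends of the gray range, 1 in the middle.
    The shape and width are assumptions made for this sketch."""
    top = levels - 1
    if level <= width:
        return 0.5 - 0.5 * math.cos(math.pi * level / width)
    if level >= top - width:
        return 0.5 + 0.5 * math.cos(math.pi * (level - (top - width)) / width)
    return 1.0

def high_frequency_measure(hist, cutoff=0.875 * math.pi):
    """Sum of DFT magnitudes of the tapered histogram at |omega| >= cutoff,
    divided by the pixel count (the normalization is an assumption)."""
    n = len(hist)
    tapered = [pinch_off(l, levels=n) * hist[l] for l in range(n)]
    total = 0.0
    for k in range(n):
        omega = 2 * math.pi * k / n
        if omega > math.pi:
            omega -= 2 * math.pi  # map the frequency to (-pi, pi]
        if abs(omega) < cutoff:
            continue  # weighting function is 0 below the cutoff
        coeff = sum(v * cmath.exp(-1j * omega * l) for l, v in enumerate(tapered))
        total += abs(coeff)
    return total / sum(hist)

def gamma_histogram(hist, gamma):
    """Histogram obtained after gamma correction with rounding."""
    top = len(hist) - 1
    out = [0.0] * len(hist)
    for level, count in enumerate(hist):
        out[round(top * (level / top) ** gamma)] += count
    return out

# Synthetic smooth histogram standing in for an unaltered image.
smooth = [1000.0 * math.exp(-((l - 128) / 40.0) ** 2) for l in range(256)]
enhanced = gamma_histogram(smooth, 0.8)
f_smooth = high_frequency_measure(smooth)
f_enhanced = high_frequency_measure(enhanced)
print(f_smooth, f_enhanced)
```

In this toy setting the gamma-corrected histogram yields the larger measure, which is the property a threshold test would exploit.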

CE Artifacts in GLCM Domain.
GLCM is used to describe texture features through the correlation between the gray levels of pixels. De Rosa et al. [20] first discovered that empty rows and columns appear in the GLCM of the contrast-enhanced image, while this kind of artifact does not exist in the GLCM of the unaltered one, as illustrated in Figure 1. The empty rows and columns in the GLCM correspond to the gaps in the histogram, because the corresponding pixel values are absent. From the anti-forensic point of view, the empty rows and columns can be removed once the peak-gap artifacts in the histogram are eliminated.
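The link between histogram gaps and empty GLCM rows can be checked with a small sketch. The choices here are assumptions for illustration only: a symmetric GLCM over horizontally adjacent pixels, a synthetic ramp image, and a gamma value of 0.5. The paper does not specify which GLCM offsets are used.

```python
def gamma_correct(value, gamma, top=255):
    """Gamma correction of one 8-bit gray level, with rounding."""
    return round(top * (value / top) ** gamma)

def glcm_horizontal(image, levels=256):
    """Symmetric co-occurrence counts of horizontally adjacent gray levels."""
    glcm = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            glcm[left][right] += 1
            glcm[right][left] += 1
    return glcm

def empty_levels(glcm):
    """Gray levels whose GLCM row (and, by symmetry, column) is all zero."""
    return [g for g, row in enumerate(glcm) if not any(row)]

# Synthetic ramp image in which every gray level 0..255 occurs.
original = [[(r + c) % 256 for c in range(256)] for r in range(4)]
enhanced = [[gamma_correct(v, 0.5) for v in row] for row in original]

empty_before = empty_levels(glcm_horizontal(original))
empty_after = empty_levels(glcm_horizontal(enhanced))
missing_levels = sorted(set(range(256)) - {v for row in enhanced for v in row})
print(len(empty_before), len(empty_after))
```

In this example the empty rows of the enhanced image's GLCM are exactly the gray levels that gamma correction never produces, while the ramp image has none.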

Histogram Calculation for Convolutional Neural Network.
Shifted step functions centered on the corresponding histogram bins can be used to calculate the histogram without any information loss, but they are useless for a CNN to learn the histogram feature because their derivative is zero everywhere except at the edges. To address this issue, Sedighi and Fridrich [15] proposed a method to approximate the histogram with mean-shifted Gaussian functions, as illustrated in Figure 2. With this method, the histogram bins can be calculated by the following formula:

$$B(k) = \sum_{i=1}^{W} \sum_{j=1}^{H} \exp\left( -\frac{(I_{ij} - k)^2}{2\sigma^2} \right)$$

where $I_{ij}$ represents the pixel value of the image $I$ at location $(i, j)$, $W$ and $H$ represent the width and height of $I$, $k$ denotes the mean value, and $\sigma$ denotes the standard deviation. Mean-shifted Gaussian functions are continuously differentiable and their derivatives are not always 0. With this property, a valid back-flow of gradients can be obtained for updating the CNN parameters.
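To make the gradient argument concrete, the following sketch compares a hard bin count with a Gaussian soft bin and evaluates the derivative of the soft bin with respect to each pixel. The Gaussian is written as exp(-(p - k)^2 / (2σ^2)), which is an assumption about the exact form, and σ = 0.3 is the value reported later in the paper for the generated-image histogram.

```python
import math

def hard_bin(pixels, k):
    """Step-function bin: counts pixels that round to k (zero gradient almost everywhere)."""
    return sum(1 for p in pixels if round(p) == k)

def soft_bin(pixels, k, sigma=0.3):
    """Gaussian bin centered on k; differentiable in every pixel value."""
    return sum(math.exp(-((p - k) ** 2) / (2 * sigma ** 2)) for p in pixels)

def soft_bin_grad(pixels, k, sigma=0.3):
    """Derivative of soft_bin with respect to each pixel value."""
    return [
        -(p - k) / sigma ** 2 * math.exp(-((p - k) ** 2) / (2 * sigma ** 2))
        for p in pixels
    ]

pixels = [99.8, 100.0, 100.0, 100.3, 101.0, 103.0]
print(hard_bin(pixels, 100), soft_bin(pixels, 100))
print(soft_bin_grad(pixels, 100))
```

A pixel slightly below the bin center receives a positive gradient and a pixel slightly above receives a negative one, so an optimizer can move pixel values between bins, which a step function would not allow.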

GAN-Based CE Anti-Forensic Framework.
In this section, we propose a GAN-based framework for CE anti-forensics that preserves the visual effect of CE. Given a contrast-enhanced image $x$, our goal is to reconstruct it so that it can attack CE forensic methods while maintaining the visual effect of $x$. Figure 3 shows the overall architecture of our framework, which is composed of three portions. In the blue portion, to enhance the attack effectiveness in the pixel domain, the generator G transforms the contrast-enhanced image into a generated image capable of deceiving the discriminator D by optimizing the adversarial loss $L_{adv}^{G}$. In the green portion, we approximate the histogram of the generated image, process the histogram using the Fourier transform (FT), and finally calculate the corresponding high-frequency measure $F$ as the histogram-based loss $L_{F}^{G}$. Because the high-frequency measure of the contrast-enhanced image histogram is higher than that of the unaltered one, and because the CE artifacts in the histogram and the GLCM are interrelated, as mentioned in Section 2, we can enhance the attack effectiveness of our method in the histogram domain and the GLCM domain by minimizing $L_{F}^{G}$. In the orange portion, we use a pixel-wise loss $L_{pixel}^{G}$ to lower the visual difference between the contrast-enhanced image and the generated image. The details of our network and loss function are described in the next two subsections.

CE Anti-Forensic Network.
We design our generator to process the contrast-enhanced image so that the result can deceive the discriminator. Our generator network is illustrated in Figure 4. We introduce a skip connection from the input position to the position before the last clamp layer, which is a type of residual learning strategy to accelerate the network training [21]. The last clamp layer restricts the maximum and minimum pixel values to keep the pixel range of the generated image consistent with that of the contrast-enhanced image. The backbone network is composed of several groups. The first group includes a 3 × 3 convolution layer with an output of 16 feature maps, a batch normalization (BN) layer, and a leaky rectified linear unit (LeakyReLU). Then, three identical residual blocks (ResBlocks) are connected. Each ResBlock has two repeated parts, each of which includes a 3 × 3 convolution layer with an output of 16 feature maps, a BN layer, and a LeakyReLU layer. Besides, there is a skip connection between the input of the ResBlock and the position before the second LeakyReLU of the ResBlock. The last group includes a 3 × 3 convolution layer with a single-channel output, a BN layer, and a hyperbolic tangent (Tanh) activation layer.
To ensure that the discriminator is sufficiently capable of detecting CE in the generated image, the discriminator directly adopts the structure of P-CNN in [16], which was proposed for CE forensics.

Loss Function.
Our ultimate goal is to generate anti-forensic images that can deceive CE forensic methods and are visually close to the corresponding contrast-enhanced images. To achieve this goal, we set separate loss functions for the generator and the discriminator and optimize the parameters of both networks by minimizing these loss functions during the training procedure. The details of our loss functions are as follows.

Generator Loss.
The generator loss function is

$$L^{G} = \lambda_1 L_{adv}^{G} + \lambda_2 L_{F}^{G} + \lambda_3 L_{pixel}^{G}$$

where $L_{adv}^{G}$ represents the adversarial loss for fooling the discriminator, $L_{F}^{G}$ represents the high-frequency measure $F$ loss based on the image histogram in the Fourier domain, and $L_{pixel}^{G}$ represents the pixel-wise image quality loss of the generated image compared with the contrast-enhanced image. The coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ represent the corresponding weights of each loss term.
$L_{adv}^{G}$ ensures that our generator G can deceive the discriminator D. We calculate $L_{adv}^{G}$ by the following formula:

$$L_{adv}^{G} = \log(1 - D(G(x)))$$

where $x$ represents the contrast-enhanced image, $G(x)$ represents the generated image, and $D(\cdot)$ denotes the output of the discriminator. $L_{F}^{G}$ lowers the high-frequency component in the histogram for better attack effectiveness in the histogram domain and the GLCM domain. Before calculating this loss, we need to calculate the histogram of the generated image. To ensure that the back-flow of gradients is not blocked by the calculation of the image histogram, we follow the method of using mean-shifted Gaussian functions to approximate it [15]. Because the pixel values of the generated image are not integers and cannot be rounded without blocking the back-flow of gradients in the training procedure, we introduce a bias term $b$ to the mean value of the mean-shifted Gaussian functions so that they center on fractional pixel values.
Different values of $b$ correspond to different fractional pixel values. The corresponding histogram bin of the generated image $G(x)$ is calculated by the following formula:

$$V_k(G(x), b) = \sum_{i=1}^{W} \sum_{j=1}^{H} \exp\left( -\frac{(G(x)_{ij} - (k + b))^2}{2\sigma^2} \right)$$

where $k = 0, 1, \ldots, 255$, $-1 < b < 1$, $k + b$ represents the center of the histogram bin, and $\sigma = 0.3$. We obtain the final histogram $V(G(x), b)$ by concatenating the 256 histogram bins. After that, we calculate $L_{F}^{G}$ by the following formula:

$$L_{F}^{G} = \frac{1}{N} \sum_{i=1}^{N} M(V(G(x), b_i))$$

where $N$ denotes the number of bias terms, $b_i$ represents the $i$-th bias term, $V(G(x), b_i)$ represents the calculated histogram, and $M(V(G(x), b_i))$ represents the high-frequency measure of the histogram $V(G(x), b_i)$ in the Fourier domain. We follow the method [19] mentioned in Section 2 to calculate the high-frequency measure. $L_{pixel}^{G}$ ensures that the generated image is visually close to the contrast-enhanced image. We calculate the mean absolute difference between the generated image $G(x)$ and the contrast-enhanced image $x$. The formula of $L_{pixel}^{G}$ is as follows:

$$L_{pixel}^{G} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \left| G(x)_{ij} - x_{ij} \right|$$

where $i$ and $j$ denote the pixel indexes and $W$ and $H$ denote the width and height of the image, respectively.
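The role of the bias term and the two generator-side losses can be illustrated with a short sketch. Everything below is a simplified illustration under stated assumptions: the Gaussian form exp(-d^2 / (2σ^2)), the flat list of pixels, and the `measure` argument, which is a hypothetical placeholder for the high-frequency measure of [19] (any callable that maps a histogram to a number can be passed).

```python
import math

def shifted_histogram(pixels, bias, sigma=0.3, levels=256):
    """Soft histogram whose k-th bin is centered on k + bias."""
    return [
        sum(math.exp(-((p - (k + bias)) ** 2) / (2 * sigma ** 2)) for p in pixels)
        for k in range(levels)
    ]

def histogram_loss(pixels, biases, measure):
    """Average of measure(histogram) over all bias terms."""
    return sum(measure(shifted_histogram(pixels, b)) for b in biases) / len(biases)

def pixel_loss(generated, enhanced):
    """Mean absolute difference between two images given as lists of rows."""
    diffs = [
        abs(g - e)
        for g_row, e_row in zip(generated, enhanced)
        for g, e in zip(g_row, e_row)
    ]
    return sum(diffs) / len(diffs)

# Ten generated pixels that all sit at the fractional value 100.25.
generated_pixels = [100.25] * 10
on_center = shifted_histogram(generated_pixels, 0.25)[100]   # bin centered on 100.25
off_center = shifted_histogram(generated_pixels, 0.0)[100]   # bin centered on 100.0
print(on_center, off_center)
```

With the bias of 0.25 the bin sits exactly on the fractional pixel value and counts all ten pixels, whereas the unbiased bin only partially captures them. Averaging over several bias terms therefore covers fractional values that a single integer-centered histogram would under-represent.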

Discriminator Loss.
The discriminator is trained to distinguish the generated image from the unaltered one by optimizing the traditional discriminator loss function [17], which is as follows:

$$L^{D} = -\left[ \log D(y) + \log(1 - D(G(x))) \right]$$

where $y$ represents the unaltered image and $G(x)$ represents the image generated from the corresponding contrast-enhanced image $x$.
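As a rough numerical illustration of how the loss terms interact, the sketch below evaluates scalar versions of the adversarial, discriminator, and weighted generator losses. The logarithmic forms follow the classical GAN objective of [17] and are an assumption about the exact expressions used. Here `d_real` and `d_fake` stand for the discriminator's probability that an unaltered or generated image is unaltered, and the default weights are the second-stage coefficients reported in the experiment setup.

```python
import math

def generator_adversarial_loss(d_fake):
    """Classical minimax generator term; lower when D believes the generated image."""
    return math.log(1.0 - d_fake)

def discriminator_loss(d_real, d_fake):
    """Classical discriminator loss, written as a quantity to minimize."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(adv, hist, pixel, weights=(1.0, 0.35, 100.0)):
    """Weighted sum of the adversarial, histogram-based, and pixel-wise terms."""
    lambda_1, lambda_2, lambda_3 = weights
    return lambda_1 * adv + lambda_2 * hist + lambda_3 * pixel

# An undecided discriminator (output 0.5 for both inputs) gives 2 * ln 2.
print(discriminator_loss(0.5, 0.5))
# First training stage: the histogram-based term is switched off.
print(generator_loss(generator_adversarial_loss(0.5), 2.0, 0.01, (1.0, 0.0, 100.0)))
```

The large pixel weight relative to the adversarial weight reflects the requirement that the generated image stay visually close to the contrast-enhanced input.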

Experiment Setup.
In our experiment, we chose the public BOSSbase dataset [22] as the original dataset, which contains 10,000 grayscale images of size 512 × 512 in PNG format. Considering the limited hardware configuration, we decided to run our experiment with images of size 128 × 128. Eight non-overlapping 128 × 128 patches were cropped from each image in the original dataset. In this way, we obtained the unaltered dataset containing 80,000 images. Accordingly, we created 80,000 contrast-enhanced images using gamma correction. We chose four γ values of 0.5, 0.8, 1.2, and 1.5, with 20,000 images for each γ value. Therefore, we got 80,000 pairs of unaltered and contrast-enhanced images. We divided these image pairs into the training set and the testing set at a ratio of 4 : 1. The proposed network was implemented in the PyTorch framework [23] and trained on one NVIDIA RTX 2080 Ti GPU.
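The data preparation can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the paper does not state which eight of the sixteen possible non-overlapping patches are kept, so `crop_patches` simply takes the first eight in row-major order, and images are represented as plain lists of rows.

```python
def gamma_correct(value, gamma, top=255):
    """Gamma correction of one 8-bit gray level, with rounding."""
    return round(top * (value / top) ** gamma)

def crop_patches(image, size=128, count=8):
    """First `count` non-overlapping size x size patches in row-major order (assumed choice)."""
    patches = []
    for top_row in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            if len(patches) == count:
                return patches
            patches.append([row[left:left + size] for row in image[top_row:top_row + size]])
    return patches

def enhance(patch, gamma):
    """Apply gamma correction to every pixel of a patch."""
    return [[gamma_correct(v, gamma) for v in row] for row in patch]

# Number of distinct output levels for each gamma value used in the experiment.
levels_used = {g: len({gamma_correct(v, g) for v in range(256)}) for g in (0.5, 0.8, 1.2, 1.5)}
print(levels_used)
```

Each gamma value maps the 256 input levels onto fewer than 256 output levels, which is the source of the peak-gap artifacts discussed in Section 2.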
During each training iteration, the generator was trained with 40 contrast-enhanced images and the discriminator was trained with 40 pairs of images, consisting of contrast-enhanced images and the corresponding unaltered ones. The generator and the discriminator were trained alternately. Our training procedure was divided into two parts. Firstly, we trained our network for 35 epochs. The learning rates for the generator and the discriminator were fixed to $5 \times 10^{-5}$ and $1 \times 10^{-4}$, respectively. We set the coefficients $\lambda_1 = 1$, $\lambda_2 = 0$, and $\lambda_3 = 100$ in the generator loss. Then, we continued to train our network for 5 epochs.
The learning rates for the generator and the discriminator were both fixed to $1 \times 10^{-6}$. We set the coefficients $\lambda_1 = 1$, $\lambda_2 = 0.35$, and $\lambda_3 = 100$ in the generator loss. The cutoff frequency was set to $0.875\pi$. Besides, we set four bias terms $b_1 = -0.25$, $b_2 = 0$, $b_3 = 0.25$, and $b_4 = 0.5$ in $L_{F}^{G}$. We used Adam as the optimizer with $\beta_1 = 0.5$, $\beta_2 = 0.999$, and $\varepsilon = 0.5$ for the generator and used SGD as the optimizer with momentum = 0.9 and weight decay = $5 \times 10^{-4}$ for the discriminator. After the training procedure, we input the 16,000 contrast-enhanced images in the testing set into the trained generator model to obtain 16,000 anti-forensic images.

Evaluation.
Before evaluating CE anti-forensic algorithms, we trained three deep-learning-based CE forensic detectors proposed in [16, 24]. Two detectors, P-CNN and H-CNN, whose input data are images and histograms, respectively, were proposed in [16]. Another detector was proposed in [24]. For convenience, we refer to it as GLCM-CNN, as it distinguishes contrast-enhanced images from unaltered images by analyzing the GLCM of images. The performance of the three detectors on the testing set is shown in Table 1.
We evaluated CE anti-forensic methods in two aspects, attack effectiveness and image quality. Firstly, we carried out anti-forensic attacks against the three trained CE forensic detectors using four types of anti-forensic images, which were obtained by our method and three traditional methods [8, 10, 11]. The detection accuracies of each detector for these four types of anti-forensic images are shown in Table 2.
A lower detection accuracy indicates better attack effectiveness of the corresponding anti-forensic method. The average detection accuracy of P-CNN for our anti-forensic images is 0.1304, which is the lowest among the four anti-forensic methods. This is because our method considers the anti-forensic attack in the pixel domain, while the other methods do not take it into account. The detection accuracies of H-CNN and GLCM-CNN for our method are still at low levels because we enhance the anti-forensic attack performance in the histogram domain and the GLCM domain by introducing the histogram-based loss $L_{F}^{G}$. Even though they are not the lowest, these results indicate that our method is still effective enough to deceive H-CNN and GLCM-CNN. In general, our method successfully deceives P-CNN [16], H-CNN [16], and GLCM-CNN [24].
Secondly, to verify the image quality of these four CE anti-forensic methods in the condition of keeping the contrast-enhanced visual effect, we calculated PSNR and SSIM between anti-forensic images and the corresponding contrast-enhanced ones.
Higher values of PSNR and SSIM mean better image quality. The average PSNR and average SSIM of these four types of anti-forensic images are shown in Table 3. Our method achieves the highest image quality, with a PSNR of 49.0258 dB and an SSIM of 0.9926. To summarize, our method keeps good anti-forensic attack effectiveness while achieving the highest image quality.
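For reference, PSNR between an anti-forensic image and its contrast-enhanced counterpart can be computed as in the sketch below, assuming 8-bit images (peak value 255) stored as lists of rows. SSIM is omitted here because it requires local windowed statistics.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared = [
        (a - b) ** 2
        for ref_row, test_row in zip(reference, test)
        for a, b in zip(ref_row, test_row)
    ]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

enhanced = [[10, 20], [30, 40]]
anti_forensic = [[11, 19], [31, 39]]  # every pixel differs by exactly one level
print(psnr(enhanced, anti_forensic))
```

When every pixel differs by one gray level the mean squared error is 1, so the PSNR equals 20·log10(255), roughly 48.13 dB. This gives a sense of scale for the reported 49.0258 dB.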
For visualization, Figure 1 presents an example that contains an unaltered image, a contrast-enhanced image, an anti-forensic image produced by our method, and the corresponding histograms and GLCMs. We can hardly find any visible distortion in the anti-forensic image compared to the contrast-enhanced one. The peak-gap artifacts in the histogram and the empty rows and columns in the GLCM of the contrast-enhanced image are successfully erased. Besides, the high-frequency component in the Fourier transform of our anti-forensic image histogram is at a low level, close to that of the unaltered one.
Finally, we evaluated the impact of the histogram-based loss $L_{F}^{G}$. Figure 5 shows that the loss term $L_{F}^{G}$ helps to enhance the attack ability against P-CNN, H-CNN, and GLCM-CNN. In particular, the enhancement of the attack effectiveness against H-CNN and GLCM-CNN is obvious, which accords with our idea of enhancing the attack effectiveness in the histogram domain and the GLCM domain by using $L_{F}^{G}$.

Conclusions
In this paper, we propose a novel CE anti-forensic method based on GAN. Our method shows high anti-forensic attack performance against deep-learning-based CE detection techniques in the pixel domain, the histogram domain, and the GLCM domain. The image quality of our anti-forensic images is also superior to that of traditional methods. In the future, we will attempt to study a general GAN-based operation image anti-forensic method that preserves the visual effect of the operation and applies to more anti-forensic tasks.

Data Availability
All data included in this study are available upon request to the corresponding author.

Figure 5: The detection accuracies of (a) P-CNN [16], (b) H-CNN [16], and (c) GLCM-CNN [24] for our anti-forensic method with and without $L_{F}^{G}$. A lower accuracy indicates better attack effectiveness of the anti-forensic method.