Reversible Privacy Protection with the Capability of Antiforensics

In this paper, we propose a privacy protection scheme using image dual-inpainting and data hiding. In the proposed scheme, the privacy contents in the original image are concealed, and the concealment is reversible: the privacy content can be perfectly recovered. We use an interactive approach to select the areas to be protected, that is, the protection data. To address the disadvantage that single-image inpainting is susceptible to forensic localization, we propose a dual-inpainting algorithm to implement the object removal task. The protection data is embedded into the object-removed image using a popular data hiding method. We further use pattern noise forensic detection and objective metrics to assess the proposed method. The results on different scenarios show that the proposed scheme achieves better visual quality and antiforensic capability than state-of-the-art works.


Introduction
Photo sharing has become a widespread user activity with the advent of intelligent mobile devices and online social networks (OSN). Image distribution causes privacy concerns and the requirement to modify permissions, since the shared content contains sensitive data of users. By providing unique rights to selected communicating parties in OSN, users' security and privacy can be strengthened. A well-established form of privacy protection is to obscure part of an image, which can be achieved by various image processing techniques, for example, blurring, mosaic, masking, and object removal, as shown in Figure 1. Among these methods, the first three must introduce a significant amount of distortion to hide the underlying content. Object removal provides more natural viewing conditions and is still able to protect the content.
This process is reversible such that the original data can be accessed with permission [1].
After object removal in an image, the broken parts can be inpainted using the surrounding contents. Generally, image inpainting algorithms can be divided into four groups: statistical-based, diffusion-based, patch-based, and deep generative model-based methods [2,3]. Statistical methods use parametric models to describe textures but fail when additional intensity gradients are present [4]. Diffusion-based methods propagate pixels from the known areas of the image [5][6][7] using smoothness priors; however, blurring occurs when large or high-frequency regions need to be inpainted. Patch-based and deep generative models are the most widely used, where the former fills the holes in the image using patches from local or global search regions [8][9][10][11][12] and the latter exploits semantics learned from large-scale datasets [13][14][15]. None of these inpainting algorithms considers the secrecy of the inpainted areas from a security perspective. The inpainted images are easily detected and located by forensic algorithms.
In this paper, we propose a new privacy protection scheme using image inpainting and data hiding, which realizes antiforensic capability. Considering the undetectability of edge inpainting, we use the DFNet network [16]. The regions around the broken edge are inpainted twice, and the inpainting results are fused to achieve antiforensic capability. By combining image dual-inpainting and data hiding, a privacy protection scheme with antiforensic capability is realized. We combine local variations within and between channels and use the popular data hiding algorithm HILL [17] to embed the protection data. The rest of this paper is organized as follows: we introduce the related works in Section 2. The proposed method is described in Section 3. Experimental results and analysis are provided in Section 4. Section 5 concludes the paper.

Related Works
In this section, we introduce the works that are related to the proposed method, including image inpainting, data hiding, and image forensics.

Image Inpainting.
Image inpainting is a method to fill in missing information in an image and is quite important in the field of image processing. Nowadays, deep generative model-based methods are widely used for image inpainting [14,[18][19][20][21][22][23]. These numerous methods can be divided into two categories [24]. One approach uses an effective loss function or constructs an attention model to fill in the missing regions and make the content more realistic. Such methods fill using the background content; a better way is to fix the unknown region by partial convolution [18]. The other approach focuses on structural consistency. To ensure the continuity of the image structure, these approaches usually adopt edge-based contextual priors. For example, [19] designed an edge-linking strategy that solves the problem of image semantic structure inconsistency well.
Regardless of the inpainting method, there is a discontinuous transition zone at the edge of the inpainted region. This area becomes a forensic target, making it easy for an interested party to locate the inpainted area, which is quite unsafe. In order to achieve both a good visual effect and security, a smooth transition needs to be achieved in advance. An iterative method to optimize the pixel gradients in the edge transition regions is proposed in [25]. The quality of fusion depends on whether the incorporated content is consistent with the original content in terms of gradient changes. Thus, Hong et al. [16] designed a learnable fusion block to implement pixel-level fusion in the transition region, named the deep fusion network for image completion (DFNet). The results show that DFNet has superior performance, especially in the aspects of harmonious texture transition, texture detail, and semantic structural consistency.

Data Hiding.
To further optimize the data embedding problem in information hiding, adaptive embedding algorithms have been widely proposed. Among them, adaptive architectures based on STC (Syndrome-Trellis Codes) [26] are the most preferred by researchers.
This method uses a predefined distortion function to minimize the additive distortion between the stego and cover images. Owing to the multiscale characteristics of the image space, the design of the distortion function has attracted more and more attention. For instance, Li et al. [17] proposed a new distortion function for image information hiding, HILL, whose cost function is composed of one high-pass filter and two low-pass filters. The high-pass filter is used to locate the difficult-to-predict parts of an image, and the low-pass filters then make the low-cost values more clustered. Furthermore, the methods MiPOD (Minimizing the Power of Optimal Detector) [27] and ASO (Adaptive Steganography by Oracle) [28] were proposed one after another. In addition, a number of distortion functions have been proposed for JPEG steganography as well, such as IUERD (Improved UERD) [29], UED (Uniform Embedding Distortion) [30], and RBV (Residual Blocks Value) [31].
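The HILL construction described above can be sketched as follows. This is a minimal NumPy/SciPy illustration using the filter choices commonly reported for HILL (a 3 × 3 KB high-pass filter followed by 3 × 3 and 15 × 15 averaging filters); it is a sketch of the idea, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

def hill_cost(image):
    # KB high-pass filter: large responses mark hard-to-predict texture
    kb = np.array([[-1.,  2., -1.],
                   [ 2., -4.,  2.],
                   [-1.,  2., -1.]])
    x = image.astype(float)
    residual = np.abs(convolve(x, kb, mode='mirror'))
    # First low-pass filter (3x3 average) smooths the residual magnitude
    smoothed = uniform_filter(residual, size=3, mode='mirror')
    # Invert: textured (large-residual) areas receive low embedding cost
    cost = 1.0 / (smoothed + 1e-10)
    # Second low-pass filter (15x15 average) clusters the low-cost values
    return uniform_filter(cost, size=15, mode='mirror')
```

A flat region yields very large costs (hard to embed safely), while a textured region yields small, spatially clustered costs, which is exactly the behavior the distortion function is designed to produce.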
In addition, some works use machine learning algorithms to design steganalysis tools to detect steganography. Most of these approaches learn a general steganography model through a supervised strategy and then use it to distinguish suspicious images [32][33][34][35]. With the rapid development of deep learning, the performance of steganalysis has been greatly improved [36][37][38]. However, deep features still have limitations in steganalysis [39]. For example, the truncation and quantization operations in the feature extraction process are difficult for existing networks to learn. Therefore, feature extraction is still a challenge in steganalysis, and many rich feature sets have been used for JPEG steganalysis. The main available feature sets include the JPEG rich model [40], GFR (Gabor filter residuals) [41], and DCTR (Discrete Cosine Transform Residual) [42]. In the classification process, the ensemble classifier is considered effective for evaluating feature sets [43,44].

Image Forensics.
Currently, there are two forensic methods of detecting image inpainting [45,46]. In [45], the authors find that the Laplacian operations along the isophote direction in the inpainted regions are different from those in the other regions. Accordingly, the inpainted regions can be identified by exploring the changes of local variances within and between channels. In [46], noise pattern analysis is used to locate the inpainted regions. For the images captured by one camera, the noise patterns in each image are approximately the same, and vice versa. Therefore, the noise pattern can be used as the fingerprint of a camera, which is widely adopted in image forensics. The noise pattern analysis algorithm in [46] is popular. In this model, the pixel values are composed of ideal pixel values, multiplicative noises, and various additive noises, which can be expressed by

$I = f(O + K \cdot O + a)$, (1)

where I and O are the actual pixel value and the ideal pixel value of the natural scene, a is the sum of various additive noises, f(·) is the camera processing such as CFA interpolation, and K is the coefficient of the noise pattern. In equation (1), the multiplicative noise K·O is the theoretical expression of the noise pattern, which is a high-frequency multiplicative noise related to the image contents. Generally, we can use a low-pass filter to remove the additive noises. The residual noise is then used to estimate the noise pattern [47], as shown in the following equation:

$p = I - F(I)$, (2)

where F(·) is the low-pass filter and p is the estimated noise pattern. The noise pattern can be used to distinguish content from different images. Therefore, the inpainted region can be detected after extracting the noise pattern from each part of the image.
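The residual-based noise pattern estimation above can be sketched as follows. A Gaussian blur serves here as a stand-in for the denoising filter F of [47], and the correlation helper illustrates how residual patches are compared against a camera's reference pattern; both function names are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def noise_residual(image, sigma=2.0):
    # p = I - F(I): a low-pass filter removes scene content and additive
    # noise; the residual approximates the camera's noise pattern
    x = image.astype(float)
    return x - gaussian_filter(x, sigma)

def correlation(p, q):
    # Normalized correlation between two residual patches; inpainted
    # regions correlate poorly with the camera's reference pattern
    p = p - p.mean()
    q = q - q.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(q) + 1e-12
    return float((p * q).sum() / denom)
```

In a forensic pass, the image is split into blocks and each block's residual is correlated with the estimated camera fingerprint; blocks with low correlation are flagged as inpainted.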
During inpainting, since there are limited pixels around the damaged regions, each diffusion step is smoothed based on the surrounding pixels. Therefore, the pixels located in the inpainted region satisfy $I_{nt}(i, j) = 0$, which means that the result of the Laplacian operation at this position remains unchanged along the isophote direction after diffusion-based inpainting. The Laplacian variation along the isophote direction can be calculated by

$I_{nt}(i, j) = \Delta I(i, j) - \Delta I(i_v, j_v)$, (3)

where ΔI(i, j) is the (i, j)-th Laplacian value and ΔI(i_v, j_v) is the result of the Laplacian operation on a virtual pixel at (i_v, j_v). The virtual pixel is located in the direction of $\nabla I^{\perp}(i, j)$, and its distance to the pixel I(i, j) is equal to 1.
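The detector idea of [45] described above can be sketched as follows: compute the Laplacian at each pixel and at a virtual pixel one unit away along the isophote direction (perpendicular to the gradient), and flag positions where the two agree. This is a simplified illustration, not the reference implementation.

```python
import numpy as np
from scipy.ndimage import laplace, map_coordinates

def isophote_laplacian_change(image):
    x = image.astype(float)
    lap = laplace(x)                      # Laplacian at every pixel
    gy, gx = np.gradient(x)               # image gradient (rows, cols)
    norm = np.hypot(gx, gy) + 1e-12
    # Unit vector perpendicular to the gradient: the isophote direction
    ty, tx = gx / norm, -gy / norm
    rows, cols = np.indices(x.shape)
    # Laplacian at a virtual pixel one unit away, via bilinear interpolation
    lap_v = map_coordinates(lap, [rows + ty, cols + tx],
                            order=1, mode='nearest')
    return np.abs(lap - lap_v)            # ~0 for diffusion-inpainted pixels
```

Values near zero indicate pixels consistent with diffusion-based inpainting, so thresholding this map yields a candidate localization mask.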

Proposed Method
In this section, we present an antiforensic framework to perform object removal in images using dual-inpainting and data hiding. As shown in Figure 2, the proposed framework contains four parts. We first select the protected area interactively and calculate the percentage of the area in the whole image. Then, the background with the missing protected area is inpainted. In order to achieve a satisfactory visual effect while remaining as forensics-resistant as possible, an image dual-inpainting algorithm is proposed, as shown in Figure 3 and described in Sections 3.1-3.3. For the inpainted image, region segmentation is performed based on the changes of local variances within and between channels. Meanwhile, the protected region is converted into a bitstream and embedded into the background by combining the HILL embedding algorithm with the segmentation. On the recipient side, we can extract the embedded data, fuse it with the background image, and recover the original image.

Protection Region Selection.
We interactively specify the area in the image to be protected, which also determines the area to be hidden. After that, we count the pixels to be hidden, including the values and coordinates of these RGB pixels. The pixels are converted into a bit stream for embedding. We define the number of bits per pixel as 5 × 9, in which "5" stands for the pixel values in the three channels plus the horizontal and vertical coordinate values, and "9" means that each decimal value is converted to 9 bits. In a color image, information can be embedded in all three channels at each position; thus, the maximum amount of embeddable information is three times the image size in bits. Since each protected pixel requires 45 bits while each cover pixel provides 3 bits of capacity, the maximum embedding ratio T is 3/45 ≈ 6.66% per image. Let t be the proportion of the selected protection region; t must be smaller than the predefined threshold T. An example of interactive region selection is shown in Figure 4.
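The capacity arithmetic above can be checked directly; the function name and defaults are ours, chosen to mirror the 5 × 9 serialization and the 1-bit-per-channel capacity assumption.

```python
def max_protection_ratio(values_per_pixel=5, bits_per_value=9, channels=3):
    # Each protected pixel is serialized as 5 values (R, G, B, row, col)
    # of 9 bits each; capacity is assumed to be 1 bit per channel per pixel
    payload_bits = values_per_pixel * bits_per_value   # 45 bits per pixel
    capacity_bits = channels                           # 3 bits per pixel
    return capacity_bits / payload_bits                # 1/15, about 6.66%
```

With these defaults the bound is T = 3/45 = 1/15 ≈ 6.66%, independent of the image resolution, which is why the same threshold applies to both the 512 × 512 and 384 × 512 test images.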

Background Processing.
After specifying the protection area, we remove the contents in this area and inpaint the image. When inpainting large areas, it is often not possible to perfectly blend the inpainted area with the existing content, especially in the edge areas [16]. To fill this gap, the DFNet network [16] introduces a fusion block, which combines structural and texture data and blends them smoothly during the inpainting process. As shown in Figure 5, I is the input image, F_k is the feature map from the k-th layer, and I_k is I resized to the resolution of F_k. A learnable function M is designed to extract the raw completion C_k from the feature maps F_k:

$C_k = M(F_k)$, (4)

where M denotes the channel conversion operation, which converts the n-channel feature maps into a 3-channel image at the same resolution. In addition, another learnable function A is used to generate the alpha composition map a_k:

$a_k = A(F_k)$. (5)

The map a_k can be obtained as a single channel for image-wise alpha composition or as 3 channels for channel-wise composition; previous experience has demonstrated that channel-wise alpha composition performs better. A is a convolutional module consisting of 3 convolutional layers with kernel sizes of 1, 3, and 1, respectively. The final result I_k' is achieved by

$I_k' = a_k \odot C_k + (1 - a_k) \odot I_k$. (6)

The fusion block makes the image inpainted by the DFNet network almost visually free of edge discontinuity. Although the DFNet network achieves good visual results, it is not suitable for privacy protection since the inpainted region can be easily localized by forensics. For example, pattern noise detection reveals clear artifacts in the edge area of the restoration. To conceal these traces and achieve privacy preservation, further manipulation of the inpainted image is required. The detected area is mostly found at the edges of the restoration, so we apply secondary processing to the edge area to eliminate the traces left during the restoration process.
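The alpha composition performed by the fusion block can be sketched in NumPy as follows. This is a stand-in for the learned block (in DFNet, the completion and the alpha map are produced by trained convolutions); the function `fuse` and its argument names are ours.

```python
import numpy as np

def fuse(c_k, i_k, a_k):
    # Per-pixel blend: I_k' = a_k * C_k + (1 - a_k) * I_k.
    # a_k may be one channel (image-wise) or three (channel-wise).
    if a_k.ndim == 2:
        a_k = a_k[..., None]              # broadcast over the channels
    return a_k * c_k + (1.0 - a_k) * i_k
```

Where the alpha map is near 1, the network's completion dominates; near 0, the resized input shows through, which is what produces the smooth transition at the hole boundary.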
In this process, we use the mathematical morphology operations of dilation and erosion. In the dilation operation, the structuring element B serves as an external window to expand the overall boundary of the target image. In the erosion operation, the structuring element serves as an internal window to shrink the boundary of the image. The dilation operation is expressed by equation (7) and the erosion operation by equation (8):

$A \oplus B = \{ z \mid (\hat{B})_z \cap A \neq \varnothing \}$, (7)

$A \ominus B = \{ z \mid (B)_z \subseteq A \}$, (8)

where $(\hat{B})_z$ denotes the reflection of B translated by z and $(B)_z$ denotes B translated by z. The specific dual-inpainting process is shown in Figure 3. Firstly, the background image is inpainted using the DFNet network. Then, we apply a morphological dilation operation to the edges of the broken-region mask map. Based on this mask map, secondary inpainting of the primary inpainted image is performed in this region. Next, a morphological erosion operation is applied to the secondarily inpainted region, leaving only the portion of the region close to the edge. Note that the dilation operation uses a larger structuring element than the erosion operation to ensure that the results of the secondary inpainting near the edge are preserved. The results of the secondary inpainting of the edge region are fused with the primary inpainting result to obtain an image that resists edge-based forensic detection.
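The dilate-then-erode construction of the secondary inpainting band can be sketched as follows. Iterated dilation/erosion with SciPy's default cross-shaped element is used here as an approximation of the circular structuring elements of sizes 10 and 5 mentioned in the experiment setup; the function name is ours.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def secondary_inpaint_band(mask, dilate_iters=10, erode_iters=5):
    # Dilate the broken-region mask with the larger element and erode it
    # with the smaller one; the ring between the two is the band that is
    # inpainted a second time and later fused with the primary result
    grown = binary_dilation(mask, iterations=dilate_iters)
    shrunk = binary_erosion(mask, iterations=erode_iters)
    return grown & ~shrunk
```

Because the dilation reaches further outward than the erosion retreats inward, the band straddles the original hole boundary, exactly where the first inpainting pass leaves its detectable transition artifacts.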

Area Segmentation and Data Hiding.
To hide the secret data of the protection region, we employ the popular data hiding framework based on STC [26]. We improve the popular cost function HILL [17] for STC to fit the requirements of our method.
In the STC framework, the theoretical minimum steganographic distortion D for a marked image with an embedding amount of c (bits) can be defined as

$D = \sum_{i,j} \rho_{i,j} \left( p^{+}_{i,j} + p^{-}_{i,j} \right)$, (9)

where $p^{+}_{i,j}$ and $p^{-}_{i,j}$ are the probabilities of adding 1 or subtracting 1 to the cover pixel at (i, j), with $0 < p^{+}_{i,j} + p^{-}_{i,j} < 1$, and $\rho_{i,j}$ stands for the distortion value used to measure the effect of modification. The parameter λ (λ > 0) is chosen to make the ternary entropy of the modification probabilities identical to the capacity c, as shown in the following equation:

$c = \sum_{i,j} \left[ -p^{+}_{i,j} \log_2 p^{+}_{i,j} - p^{-}_{i,j} \log_2 p^{-}_{i,j} - \left(1 - p^{+}_{i,j} - p^{-}_{i,j}\right) \log_2 \left(1 - p^{+}_{i,j} - p^{-}_{i,j}\right) \right]$. (10)
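The role of λ in the entropy constraint above can be illustrated with the standard Gibbs parametrization of the modification probabilities and a binary search; this is the usual construction in minimal-distortion steganography, sketched here with our own function names.

```python
import numpy as np

def probs(rho, lam):
    # Standard Gibbs form: p+ = p- = exp(-lam*rho) / (1 + 2*exp(-lam*rho))
    e = np.exp(-lam * rho)
    return e / (1.0 + 2.0 * e)

def ternary_entropy(p):
    # Entropy of (p+, p-, 1 - p+ - p-) summed over all pixels, in bits
    p0 = 1.0 - 2.0 * p
    def xlog2(t):
        out = np.zeros_like(t)
        np.log2(t, where=t > 0, out=out)
        return t * out
    return float(-(2 * xlog2(p) + xlog2(p0)).sum())

def find_lambda(rho, capacity, lo=1e-6, hi=1e3, iters=60):
    # Entropy decreases monotonically as lambda grows, so binary search
    # finds the lambda whose total entropy matches the payload c
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ternary_entropy(probs(rho, mid)) > capacity:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Small λ spreads modifications almost uniformly (high entropy, high distortion); large λ concentrates them on the lowest-cost pixels, and the search picks the value that just accommodates the payload.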

Security and Communication Networks
To achieve the minimum distortion D, STC encoding computes

$y = \arg\min_{z \in C(m)} D(x, z)$, (11)

where $y \in \{0, 1\}^{MN}$ denotes the least significant bits of the stego image, $C(m) = \{ z \in \{0, 1\}^{MN} \mid Hz = m \}$ is the coset of m, and $H \in \{0, 1\}^{c \times MN}$ is a predefined low-density parity-check matrix related to embedding speed and embedding efficiency. The embedded bits m can be extracted simply by a matrix multiplication:

$m = H y$. (12)

To fit the requirements of our method, we improve the popular cost function HILL for STC by combining variations within and between adjacent pixel channels. Specifically, we divide the cover image into four regions (marked with green, blue, black, and red in Figure 6) using the cost values of HILL and edge connectivity. The pixel complexity of the four regions decreases in the order green, blue, black, red. In other words, the green region has the most complex pixels and is the best embedding region in the whole image. Therefore, secret bits are embedded into the green region preferentially.
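The extraction side is just a parity computation, which can be shown with a toy example; the random H below is only an illustration, not an actual low-density STC parity-check matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
H = rng.integers(0, 2, size=(4, 12))   # toy parity-check matrix (c x MN)
y = rng.integers(0, 2, size=12)        # LSB vector of the stego image
m = (H @ y) % 2                        # extraction: m = H y (mod 2)
```

Note the asymmetry that makes STC practical: the encoder must search the coset C(m) (via the Viterbi algorithm on the trellis), while the decoder only multiplies by H.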

Experimental Results
This section presents the experimental evaluation results. Firstly, we introduce the database employed and the corresponding parameters.
Then, the experiments for each part are presented in turn and their validity is demonstrated.

Performance for Antiforensics.
To evaluate the performance of antiforensics, we randomly select images from the database for validation and interactively select the areas to be protected, as described in Section 3.1.
In each image, the selected protected area is generally of irregular shape. To allow the later embedding of data, we strictly control the ratio of the protected area to the whole image to less than 6.66%. We use two separate forensic approaches to analyze our results: one is based on pattern noise, and the other is based on changes within and between adjacent pixel channels.
Firstly, we select 50 landscape images sized 512 × 512 from Today's Headlines. As shown in Figure 7, we select four of them, denoted I1, I2, I3, and I4. Table 1 lists the space proportion t and the number of pixels to be embedded for the protection area of each of the four images in Figure 7. Comparing Figure 7(d) with Figure 7(b), we find that Figure 7(d) has obvious traces at the inpainted edges, which makes the inpainted region easy to locate forensically. Our method overcomes this drawback well: it is difficult to locate our tampered region by pattern noise forensics alone, which shows that our approach resists pattern noise forensics effectively.
In Figure 8, we show the experimental results for five images (M1, M2, M3, M4, and M5) from the UCID database, sized 384 × 512. Table 2 lists the space proportion t and the number of pixels to be embedded for the protection area of each of the five images in Figure 8. Two traditional methods and a deep learning method are used for comparison: the traditional methods are the edge-oriented and Delaunay-oriented ones provided by G'MIC [48], a full-featured open-source framework for image processing.
The deep learning-based one is the DFNet method mentioned in [16].

In terms of subjective visual quality, both our experimental results and those of the deep learning method outperform the traditional methods and achieve good visual connectivity at the edges. In particular, in row 7 of Figure 8, the region at the red petal achieves a good visual effect after our secondary processing of the inpainted edges is blended with the primary inpainting result.
In addition, we localize the inpainted images for forensics using the forensic algorithm proposed in [46], as shown in the even rows of Figure 8. The traditional inpainting algorithms are easily detected and located, while the DFNet-based inpainting achieves reasonable antiforensic results. However, the images obtained by our method are better suited to hiding the area to be protected. In particular, the results are better when the area to be protected accounts for less than 4% of the whole image.
In Table 3, we show the F1 values of the five images in Figure 8, where a smaller F1 value indicates a weaker ability of the forensic algorithm to correctly locate the inpainted region and thus a better antiforensic effect. We can see from Table 3 that our method is superior in terms of objective indicators.

Experiment Setup.
In our experiments, we use the free user-shared image dataset provided by Today's Headlines, which contains a large number of images of people, landscapes, and everyday life. We also use the UCID database. Based on the maximum amount of data that can be embedded in an image, the size of the protected area must not exceed 6.66% of the whole image (T = 6.66%), regardless of the image size. For the structuring elements used in the mathematical morphology of the background processing, a circular structure is employed since it yields smoother edges; the structure size is 10 for the dilation operation and 5 for the erosion operation.
To evaluate the performance of image dual-inpainting against detection and localization, we adopt the F1-score, peak signal-to-noise ratio (PSNR), and mean square error (MSE) as objective indicators of the inpainting results:

$F1 = \frac{2TP}{2TP + FN + FP}$,

where TP (true positive), FN (false negative), and FP (false positive) stand for the numbers of detected inpainted pixels, undetected inpainted pixels, and wrongly detected untouched pixels, respectively;

$MSE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ A(i, j) - B(i, j) \right]^2$, $PSNR = 10 \log_{10} \frac{255^2}{MSE}$,

where A(i, j) and B(i, j) are the original image and the inpainted image, respectively, and M × N is the image size.
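The three metrics above are straightforward to compute; a minimal sketch (function names ours, assuming 8-bit images for the PSNR peak):

```python
import numpy as np

def f1_score(tp, fn, fp):
    # F1 = 2TP / (2TP + FN + FP); lower F1 means weaker localization
    return 2.0 * tp / (2.0 * tp + fn + fp)

def mse(a, b):
    # Mean squared error between original a and inpainted b
    d = a.astype(float) - b.astype(float)
    return float(np.mean(d ** 2))

def psnr(a, b, peak=255.0):
    # PSNR = 10 * log10(peak^2 / MSE); infinite for identical images
    m = mse(a, b)
    return float('inf') if m == 0 else 10.0 * np.log10(peak ** 2 / m)
```

Note the opposite reading directions: for antiforensics a lower F1 is better, while for visual fidelity a higher PSNR (lower MSE) is better.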

Reversibility Analysis.
In this section, we show that our privacy protection method is effective during communication and sharing. Meanwhile, our method is fully reversible, which enables the data to be extracted when it reaches the recipient side.
In Figure 9, we show five sets of comparisons between the recovered images and the original images. The first two are from the Today's Headlines database and the last three from the UCID database. In the pre-recovery and embedding operations, there is no damage or tampering to the regions other than the region to be protected. Therefore, given the pixel values and coordinates of the region to be protected, the original images can be recovered.

Table 1: The percentage of protected areas in the whole image (t) and the total number of pixels in the protected area (p); I1, I2, I3, and I4 represent the four pictures in Figure 7, respectively.

Figure 8: Examples from the UCID database. Rows 1, 3, 5, 7, and 9: from left to right, the first image is the original image, and the second to fifth images represent the images inpainted by references [16,48] and our method, respectively. Rows 2, 4, 6, 8, and 10: from left to right, the first image is the ground truth, and the second to fifth images represent the localization results calculated by forensic algorithm 2.

Conclusion
Currently, most privacy protection methods focus only on visual quality, whereas real protection needs to be considered from the perspective of image security analysis. We propose a reversible privacy protection scheme using image dual-inpainting and data hiding, in which the original image can be perfectly recovered.
Experimental results show that, after inpainting the image with the protected area removed using the dual-inpainting algorithm, antiforensic performance against the two current object-removal forensic methods can be achieved. The subsequent embedding and extraction of the protected region also achieve an effective combination of the two research directions of antiforensics and steganography. In addition, reversible privacy protection not only effectively stops snooping but also guarantees that the original image can be recovered when needed.

Data Availability
In our experiments, we use the free user-shared image dataset provided by Today's Headlines, which contains a large number of images of people, landscapes, and everyday life. We also use the UCID database.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Table 2: The percentage of protected areas in the whole image (t) and the total number of pixels in the protected area (p); M1, M2, M3, M4, and M5 represent the five pictures in Figure 8, respectively.