A Novel DIBR 3D Image Hashing Scheme Based on Pixel Grouping and NMF

School of Information Science and Technology, Heilongjiang University, Harbin 150080, China
Guangxi Key Laboratory of Cryptography and Information Security, Guilin 541004, China
College of Mathematics Physics and Information Engineering, Jiaxing University, Jiaxing 314000, China
State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China


Introduction
Depth-image-based rendering (DIBR) [1] is a 3D representation technology in which the virtual left and right images are generated from the center image according to the depth information described by the depth image. Viewers can then easily obtain stereo perception from the virtual image pair. In the digital communication model of DIBR 3D images, the receiver performs the depth-image-based rendering operation to generate the virtual image pair for 3D video perception. In practice, any of the center image, the virtual left image, and the virtual right image may suffer from illegal or unauthorized redistribution. To address this problem, robust perceptual hashing has been widely used for digital multimedia protection. Because many copies of the center image and the virtual images may exist, image hashing can also help to find similar copies and detect tampered ones [2][3][4][5][6]. In this paper, we focus on designing a robust image hashing scheme for DIBR 3D image identification. In the DIBR system, the virtual right and left images are generated from the corresponding center image by pixel mapping. The virtual images therefore have visual content similar to that of their corresponding center image, which requires the hashing scheme to identify the virtual images as having the same content as the center image, as shown in Figure 1.
Generally, a traditional 2D image hash should have several characteristics, such as being a one-way function, compactness, perceptual robustness, visual fragility, and unpredictability [7]. For DIBR 3D image hashing, the perceptual robustness requirements are more stringent. In these requirements, $I_c$ represents the center image, $I_v$ represents the virtual image, and $I_d$ represents a perceptually similar copy of $I_c$ or $I_v$ with minor distortion; $\varepsilon$ and $\tau$ should be close to zero. This paper focuses on designing a robust DIBR 3D image hashing scheme for DIBR 3D image identification.
In general, robustness and discrimination are two important aspects that should be considered when designing an image hashing scheme. Robust image hashing has been extensively studied for content-based identification of traditional 2D images. As feature extraction affects the identification performance of image hashing, many existing methods focus on extracting robust features that resist content-preserving operations [9,19]. In addition, some dimensionality reduction methods have been adopted to extract robust features for hash generation, such as singular value decomposition (SVD) [20] and nonnegative matrix factorization (NMF) [21], which are robust to most kinds of signal distortion attacks but sensitive to geometric distortions such as rotation. In [22], a robust image hashing scheme with multidimensional scaling is proposed, which achieves better performance in terms of image classification. To make the hashing scheme robust to geometric distortion attacks such as rotation, a robust image hashing scheme that extracts geometric-invariant features to generate the final hash vector is proposed in [23]. In [24], a robust image hash in the Radon transform domain is proposed, which is robust against rotation, but its discriminative capability is not good enough. In [25], invariant moments extracted from color spaces are used to generate the final hash vector. This hashing scheme is robust against rotation but increases misclassification. In [26], Li uses Gabor filtering to extract features and compresses these features with dithered lattice vector quantization to generate a compact hash. This method is robust against rotation, but its discriminative capability is also not good enough.
In recent years, some novel and excellent hashing algorithms have been proposed. Qin et al. [27] propose a secure image hashing scheme based on perceptual texture and structure features, but its image classification performance is not good enough. In [28], a robust image hashing scheme based on tensor decomposition is proposed, which is robust to common signal distortion attacks. However, its discriminative capability is not good enough. Lv et al. [7] propose an image hashing scheme based on shape contexts and local feature points. By compressing the descriptors of SIFT feature points in each hash bin to form the final hash vector, their hashing scheme is robust to geometric distortion attacks such as rotation. However, the performance degrades when the key points detected in the test image are not stable enough to coincide with the key points detected in the original. Tang et al. [29,30] propose a kind of robust image hashing scheme based on ring partition. Using the pixels in each ring to form a secondary image that is insensitive to rotation, they extract the final hash vector from the secondary image. The experimental results show that their hashing schemes are robust to rotation with good discriminative capability. This kind of method assumes that the viewpoint never changes when the digital image is attacked by most content-preserving manipulations. In other words, the image center of the original image and of its copies does not change.
Performance comparisons among some traditional 2D image hashing algorithms are summarized in Table 1. For signal distortion attacks, the word "Yes" means that the algorithm is robust against operations including additive noise, blurring, and JPEG compression. For geometric distortion attacks, the word "Yes" means that the algorithm is robust against scaling and rotation by an arbitrary angle, and the word "Unknown" means that, as far as we know, such a result has not been reported in the literature.
In fact, the image centers of the center image and the virtual images are different, which is caused by the DIBR operations. As a result, this kind of traditional 2D hashing scheme would not achieve good performance when applied to DIBR 3D image identification. Some of the state-of-the-art traditional 2D robust image hashing schemes that resist geometric distortions do not take viewpoint changes into account [7,29,30]. Dividing the image into several rings, or constructing a rotation-invariant secondary image according to the unchanged image center, is the key step in constructing a hash vector that is robust to rotation. However, the image center changes when virtual images are generated in the DIBR system.
In this work, a pixel grouping and nonnegative matrix factorization-based hashing scheme is designed for DIBR 3D image identification. The key contribution is the use of the approximate invariance of histogram shape to extract features that are insensitive to the operation of virtual image generation, so that our DIBR 3D image hashing scheme identifies the virtual images as having the same visual content as the original center image. The rest of this paper is organized as follows: Section 2 briefly reviews the DIBR operations. Section 3 introduces the pixel grouping according to the approximate invariance of histogram shape and the nonnegative matrix factorization-based image hashing. Section 4 shows the experimental results and performance comparisons. Section 5 gives the final conclusions.

Review of Depth-Image-Based Rendering Process
Figure 2 illustrates the relationship between the center image and the virtual images generated by DIBR operations [31]. Suppose $P$ is a point in space; $C_c$, $C_l$, and $C_r$ represent the center viewpoint, the left viewpoint, and the right viewpoint, respectively; $f$ represents the focal length of the center viewpoint; and $Z$ represents the depth of $P$. $x_c$, $x_l$, and $x_r$ represent the x-coordinate of the pixel in the center image, the virtual left image, and the virtual right image, respectively. $t_x$ represents the baseline distance, which is equal to the distance between the left and right viewpoints. Following the geometric relations shown in Figure 2, the x-coordinates $x_l$ and $x_r$ of the pixel in the virtual images are computed from $x_c$, $t_x$, $f$, and $Z$. In fact, the gray value of a pixel in the depth image is not the real depth value. A pixel with a gray value close to 255 indicates that $P$ is close to the near clipping plane $Z_{near}$, whereas a pixel with a gray value close to 0 indicates that $P$ is close to the far clipping plane $Z_{far}$. According to formula (3), the depth value $Z(v)$ of $P$ is computed, where $v$ represents the gray value.
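As a rough illustration of the mapping reviewed above, the following sketch uses the shift-sensor relations commonly found in the DIBR literature, $x_l = x_c + \frac{t_x}{2}\frac{f}{Z}$ and $x_r = x_c - \frac{t_x}{2}\frac{f}{Z}$, together with the usual 8-bit depth conversion $Z(v) = 1 / \left(\frac{v}{255}\left(\frac{1}{Z_{near}} - \frac{1}{Z_{far}}\right) + \frac{1}{Z_{far}}\right)$. These forms are assumptions that agree with the symbol definitions and the 255/0 clipping-plane convention above; they are not a transcription of this paper's formulas, and the function names are hypothetical.

```python
def depth_from_gray(v, z_near, z_far):
    """Map an 8-bit depth-map gray value v (0..255) to a depth Z(v).

    Assumed convention: v = 255 lies on the near clipping plane and
    v = 0 on the far clipping plane.
    """
    inv_z = (v / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z


def virtual_x(x_c, v, t_x, f, z_near, z_far):
    """Return (x_l, x_r) for a center-image pixel at column x_c.

    Assumed relation: the pixel moves by (t_x / 2) * (f / Z) in opposite
    directions for the left and right virtual views.
    """
    z = depth_from_gray(v, z_near, z_far)
    shift = (t_x / 2.0) * (f / z)
    return x_c + shift, x_c - shift
```

Under these assumptions, pixels close to the near clipping plane are shifted the most, which is why the virtual views are not simple translations of the center image.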

Proposed Image Hashing
Our DIBR 3D image hashing scheme includes the following steps. First, the original center image is filtered with a Gaussian low-pass filter to obtain its low-frequency component, which is then standardized for hash generation. Next, the pixels of the normalized low-frequency image are divided into different groups according to the histogram shape. These pixel groups are used to construct a secondary image, which is almost unchanged under geometric distortions and changes only slightly after DIBR operations. Lastly, the secondary image is decomposed by nonnegative matrix factorization to obtain the coefficient matrix, and the final hash is constructed from these coefficients.

Preprocessing.
Low-pass filtering is adopted to extract the low-frequency component of the original center image, with the aim of enhancing the robustness of the proposed hashing scheme to common content-preserving manipulations [32]. The low-frequency component $IC_{low}$ of the original center image $IC$ is obtained as
$$IC_{low}(x, y) = G(x, y, \sigma) * IC(x, y),$$
where $*$ represents the convolution operation, and the Gaussian low-pass filter $G(x, y, \sigma)$ is represented as
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right),$$
where $\sigma$ is the standard deviation. Following the parameter settings in [32], $\sigma$ is set to 1.
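The preprocessing step can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation (the experiments are reported to use MATLAB): the kernel radius of about $3\sigma$, the edge padding, and the function names are assumptions.

```python
import numpy as np


def gaussian_kernel(sigma=1.0, radius=None):
    """Sampled 2D Gaussian, normalised so that its entries sum to 1."""
    if radius is None:
        radius = int(3 * sigma + 0.5)  # assumption: cover about +/- 3 sigma
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    return g / g.sum()


def low_pass(image, sigma=1.0):
    """Filter a 2D gray image with the Gaussian kernel (same output size)."""
    kernel = gaussian_kernel(sigma)
    r = kernel.shape[0] // 2
    padded = np.pad(image.astype(float), r, mode="edge")  # assumption: edge padding
    out = np.zeros(image.shape, dtype=float)
    rows, cols = image.shape
    # The kernel is symmetric, so this sliding weighted sum equals convolution.
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out
```

With $\sigma = 1$ the kernel is 7 × 7 under the stated radius assumption; a flat image passes through unchanged because the kernel is normalised.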

Pixel Grouping.
The gray levels of the filtered image $I_{low}$ also range from 0 to 255. In this paper, only pixels with $M$ different gray levels are randomly selected to construct the secondary image, which is intended to ensure the security of the proposed hashing algorithm. With a key-based sequence $P(M)$, the selected gray levels form the set $H_M$. After resizing $I_{low}$ to $m \times m$, pixels with $L_B$ neighbouring gray levels in $H_M$ are selected to form one pixel group. In total, $n = \lfloor M / L_B \rfloor$ groups are formed, where $\lfloor \cdot \rfloor$ is the floor function.
Let $g_i$ be one of the pixel groups. To form the $i$th column of the secondary image, we sort and resize $g_i$ to a new vector $v_i$ of size $k \times 1$. Then, the secondary image is represented as
$$V = [v_1, v_2, \ldots, v_n].$$
It is clear that the histogram shape of $V$ is the same as that of the resized $I_{low}$, and the secondary image $V$ is robust to geometric distortions such as rotation. In this paper, $M$ is set to 240, $m = 256$, $L_B = 6$, and $k = 4m$, which gives $n = 40$ groups and $k = 1024$.
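The construction of the secondary image can be sketched as follows. Several details are assumptions made for illustration only: the key-based sequence is emulated with a seeded NumPy generator, "neighbouring" gray levels are taken as consecutive entries of the sorted selection, each group is resized by nearest-index resampling, and an empty group yields a zero column. The function name `secondary_image` is hypothetical.

```python
import numpy as np


def secondary_image(img, key, M=240, L_B=6, k=1024):
    """Build the k x n secondary image V from a gray image (values 0..255)."""
    rng = np.random.default_rng(key)  # assumption: stands in for P(M)
    levels = np.sort(rng.choice(256, size=M, replace=False))  # the set H_M
    n = M // L_B  # number of pixel groups
    flat = np.rint(img).astype(int).ravel()
    V = np.zeros((k, n))
    for i in range(n):
        group_levels = levels[i * L_B:(i + 1) * L_B]
        g = np.sort(flat[np.isin(flat, group_levels)])  # sorted pixel group g_i
        if g.size == 0:
            continue  # assumption: an empty group gives a zero column
        # Resize g_i to length k by nearest-index resampling (assumption).
        idx = np.rint(np.linspace(0, g.size - 1, k)).astype(int)
        V[:, i] = g[idx]
    return V
```

Because each column depends only on which gray values occur and how often, moving pixels around (for example by a 90° rotation) leaves $V$ unchanged in this sketch.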

Hash Generation.
Since the histogram shape is almost unchanged under geometric distortions and changes only slightly after DIBR operations, features extracted from the secondary image $V$ also have this property. NMF is applied to $V$ to obtain the base matrix $W$ and the coefficient matrix $H$. The entries of the coefficient matrix $H$ are concatenated to obtain the final hash vector; the length $L$ of the hash vector is $n \times r$, where $n$ is the number of pixel groups and $r$ is the rank for NMF. In this paper, $r$ is set to 2.
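A minimal sketch of this step is given below. The text does not state which NMF solver or concatenation order is used, so the Lee-Seung multiplicative updates, the random initialisation, the iteration count, and the row-wise flattening of $H$ are all assumptions; the function names are hypothetical.

```python
import numpy as np


def nmf(V, r=2, iters=200, seed=0, eps=1e-9):
    """Approximate V (k x n, nonnegative) as W @ H with W: k x r, H: r x n."""
    rng = np.random.default_rng(seed)
    k, n = V.shape
    W = rng.random((k, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def hash_from_secondary(V, r=2):
    """Concatenate the coefficient matrix H into a vector of length n * r."""
    _, H = nmf(V, r)
    return H.ravel()  # assumption: row-wise concatenation
```

With $n = 40$ and $r = 2$, the resulting hash has $L = 80$ entries.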
In this paper, the correlation coefficient is taken as the metric to measure the similarity between two image hash vectors Hash1 and Hash2. The correlation coefficient $S(\mathrm{Hash1}, \mathrm{Hash2})$ is defined by formula (8). According to formula (8), $S(\mathrm{Hash1}, \mathrm{Hash2})$ ranges from −1 to 1, and a larger value indicates that the input image is more similar to the corresponding original center image. If $S(\mathrm{Hash1}, \mathrm{Hash2})$ is higher than the predefined threshold, the perceptual content of the input image is regarded as unchanged. If $S(\mathrm{Hash1}, \mathrm{Hash2})$ is lower than the predefined threshold, the input image is regarded as a different image or a maliciously tampered version of the corresponding original center image. For DIBR 3D images, the virtual images should yield a high $S(\mathrm{Hash1}, \mathrm{Hash2})$ value when compared with their corresponding original center image. According to the experimental results listed in Tables 2 and 3, some virtual images are regarded as different from the original center image when the hashing method proposed in [30] is adopted, whereas our DIBR 3D image hashing scheme identifies the virtual images as having the same visual content as the original center image.
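The decision rule can be sketched as below, assuming that the correlation coefficient takes the standard Pearson form (the exact expression of formula (8), including any stabilising constant in the denominator, is not reproduced here). The default threshold of 0.96 is the value used in the perceptual robustness experiments; the function names are hypothetical.

```python
import math


def correlation(hash1, hash2):
    """Pearson correlation coefficient between two equal-length hash vectors."""
    n = len(hash1)
    m1 = sum(hash1) / n
    m2 = sum(hash2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(hash1, hash2))
    den = math.sqrt(sum((a - m1) ** 2 for a in hash1)
                    * sum((b - m2) ** 2 for b in hash2))
    return num / den  # assumption: neither vector is constant


def same_content(hash1, hash2, threshold=0.96):
    """True when the two hashes are judged to come from the same visual content."""
    return correlation(hash1, hash2) > threshold
```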

Invariance of Histogram Shape.
Robustness to geometric distortion attacks, especially rotation attacks, is the major problem to be considered when designing a traditional 2D image hashing scheme with features insensitive to geometric operations. According to [33], the histogram shape is robust to scaling, rotation, and affine attacks. To design a DIBR 3D image hashing scheme, robustness to the operation of virtual image generation is also important.
The resistance of the histogram shape to the operation of virtual image generation is discussed as follows. According to [33], when some regions are cropped from the original image, the histogram shape of the cropped image differs from that of the original. Strictly speaking, the robustness of the histogram shape under cropping attacks depends on the image and the cropped area, so the invariance of the histogram shape under cropping attacks is only approximate. Similarly, in the operation of virtual image generation, the virtual images are generated from the center image with some regions cropped and holes filled, and the robustness of the histogram shape under this operation depends on the image, the baseline distance, and the key-based sequence used for selecting the gray levels. The invariance of the histogram shape of the virtual images is therefore also approximate. As shown in Figure 3, although the virtual images are generated from the center image by translating pixels and cropping some of them, the number of pixels in each group changes only slightly compared with that of the center image. When each pixel group is resized to a new vector that forms a column of the secondary image, the resulting secondary image is similar to that formed from the center image. When NMF is used to extract features from the secondary image, the final hash vector of the virtual image is almost the same as that of the center image. Table 4 reports the statistics of the perceptual distances between the tested center images and their corresponding virtual images. All means are close to 1, and their standard deviations are small. Moreover, the minimum values are also close to 1.
This indicates that the approximate invariance of histogram shape can be used to extract features insensitive to DIBR operations, so that our DIBR 3D image hashing scheme identifies the virtual images as having the same visual content as the original center image.
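The difference between exact and approximate invariance can be illustrated with a small experiment on synthetic data. The random image, the use of the first $M$ gray levels instead of a key-based selection, and column cropping as a crude stand-in for DIBR warping are all assumptions; the sketch does not reproduce the results in Table 4.

```python
import numpy as np


def group_fractions(img, L_B=6, M=240):
    """Fraction of pixels in each group of L_B neighbouring gray levels."""
    counts = np.bincount(img.ravel(), minlength=256)[:M]
    return counts.reshape(-1, L_B).sum(axis=1) / img.size


rng = np.random.default_rng(0)
center = rng.integers(0, 256, size=(256, 256))  # synthetic stand-in image
rotated = np.rot90(center)   # same pixels, different positions
cropped = center[:, 13:]     # roughly 5% of the columns removed

# Largest change in any group's share of the pixels.
d_rot = np.abs(group_fractions(center) - group_fractions(rotated)).max()
d_crop = np.abs(group_fractions(center) - group_fractions(cropped)).max()
```

Here `d_rot` is exactly zero, since rotation by 90° only rearranges pixels, while `d_crop` is small but nonzero, which is the sense in which the invariance under cropping is approximate.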

Experimental Results
A dataset of 2727 images is constructed to evaluate the identification performance for DIBR 3D images. Pairs of center and depth images are drawn from sources including [35] to construct the dataset, and the sizes of these images range from 450 × 375 to 1390 × 1110. Hashes of the center image, the virtual left image, the virtual right image, and their distorted versions are generated with our hashing scheme in order to calculate the identification accuracy rate. The distorted versions are generated by attacking the center and virtual images with 10 classes of common content-preserving operations. In this paper, MATLAB is used to implement these 10 classes of operations with different parameters. The operations include common signal and geometric distortion attacks such as JPEG compression, blurring, additive noise, scaling, rotation, and cropping after rotation. The operations and their parameters are listed in Table 5.

Discrimination.
A total of 120 different color images are collected from the Ground Truth Database [36] to test the discriminative capability of the proposed hashing. Hash vectors are generated for these 120 images, and the 7140 (= 120 × 119 / 2) correlation coefficients $S$ between each pair of different hash vectors are computed. The maximum value of these correlation coefficients is 0.9785, and the minimum value is -0.5101. If the threshold $T$ is set to 0.92, 0.32 percent of the pairs of different images are identified as having similar content. When $T$ is set to 0.94, 0.09 percent of the pairs are identified as having similar content. No pair of different images is identified as having similar content when $T$ is set to 0.98.

Perceptual Robustness.
Firstly, four pairs of center and depth images are selected from the above dataset. They are "Breakdancers," "Books," "Dolls," and "Ballet," as listed in Table 2. Each virtual image pair and the center image are attacked by the content-preserving operations listed in Table 5. As shown in Figure 4, no pair of visually identical images (including the distorted center and virtual images) is identified as having different content when the threshold $T$ is set to 0.96.
In this paper, combinational attacks that combine geometric distortion attacks and signal distortion attacks are also applied to many images to evaluate the perceptual robustness of the proposed image hashing scheme. The combinational attacks are as follows: Gaussian noise + rotation, Gaussian noise + cropping after rotation, salt and pepper noise + rotation, salt and pepper noise + cropping after rotation, speckle noise + rotation, speckle noise + cropping after rotation, Gaussian blurring + rotation, Gaussian blurring + cropping after rotation, circular blurring + rotation, circular blurring + cropping after rotation, motion blurring + rotation, motion blurring + cropping after rotation, JPEG compression + rotation, and JPEG compression + cropping after rotation. In addition, scaling + rotation and scaling + cropping after rotation are also performed.
To obtain the versions under combinational attacks, both the center image and the virtual image are first attacked by rotation (2°, 10°, and 45°) or cropping after rotation (2°, 10°, and 45°). Then, all operations listed in Table 5 (with the quality factor of JPEG compression set from 30 to 100) are performed, except rotation and cropping after rotation. Finally, hash vectors of the attacked versions are generated to compute the perceptual distances represented by the correlation coefficient $S$. Owing to space limitations, only a typical example is shown here. Figures 5 and 6 illustrate the robustness of our hashing against combinational attacks, where the x-axis is the parameter value of each manipulation and the y-axis is the correlation coefficient $S$; the center image and the virtual image are first attacked with rotation by 45° or cropping after rotation by 45° and then further attacked by all operations listed in Table 5, except rotation and cropping after rotation. All correlation coefficients are above 0.94, except for the combinational attack of Gaussian noise with variance 0.005 + rotation by 45°. This means that our hashing is also robust against most of the above combinational attacks. As shown in Table 6, the correlation coefficients are above 0.98 when the angle of rotation is 2°. The experiments demonstrate that our DIBR 3D hashing is robust against these combinational attacks.
To show that the identification performance of our DIBR 3D image hashing scheme is better than that of existing traditional 2D hashing schemes, two kinds of state-of-the-art 2D image hashing schemes are tested for experimental comparison. One is the NMF-based hashing algorithm proposed in [21], and the other is the ring partition-based hashing algorithm proposed in [29,30].
Let $IC = \{IC_i, 1 \le i \le N\}$ be the set of original center images. We generate the compact hash $H(IC_i)$ from each center image, where $H(IC_i) = (h_1, h_2, \cdots, h_L)$ is the hash vector of length $L$ for center image $IC_i$.
In this paper, we use the correlation coefficient as the performance metric to evaluate the distance between two hash vectors. Suppose $H(IC_i)$ is the hash vector of one image in the center image set, and $H(I_Q)$ is the query hash vector of a distorted version of either a center image or one of its corresponding virtual images. Then, we calculate the correlation between the query hash and the hash of each center image, where $S(H(I_Q), H(IC_i))$ is the correlation coefficient between $H(I_Q)$ and $H(IC_i)$.
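One plausible reading of this identification step is a nearest-hash search: the query is assigned to the center image whose hash has the largest correlation with the query hash. The argmax rule and the function name `identify` are assumptions for illustration, since the exact matching rule is not spelled out in the text.

```python
import numpy as np


def identify(query_hash, center_hashes):
    """Return (index, score) of the center hash most correlated with the query."""
    scores = [np.corrcoef(query_hash, h)[0, 1] for h in center_hashes]
    best = int(np.argmax(scores))
    return best, float(scores[best])


# Toy example with three hypothetical center hashes and a perturbed query.
centers = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 4, 1, 3]]
query = [4.1, 0.9, 3.0, 2.2]
index, score = identify(query, centers)
```

The identification accuracy rate would then be the fraction of queries whose returned index matches the true source image.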
A higher identification accuracy rate means that an image attacked by common content-preserving operations can still be identified as having perceptual content similar to that of the original. For DIBR 3D image identification, high identification performance means that a virtual image is identified as having perceptual content similar to that of its corresponding center image even when the virtual image is attacked by common content-preserving operations.
As shown in Table 7, the proposed hashing, the NMF-based hashing, and the ring partition-based hashing algorithms all achieve good identification performance when only the identification of center images is considered. The schemes in [29,30] assume that perceptual distortions and malicious operations on digital images do not change the viewpoint, so that the image center is usually unchanged and relatively stable under geometric attacks such as rotation, scaling, and cropping after rotation. In the DIBR process, however, the virtual image is generated from the center image through pixel shifting. Therefore, the hashing methods based on ring partition lose their advantage in generating robust hashes for DIBR 3D images, as shown in Table 8. The experimental results show that a signal-distorted virtual image can still be classified as the corresponding original center image with the proposed hashing method. The NMF-based method is sensitive to rotation attacks because of the change of predefined positions caused by geometric synchronization distortion. In contrast, the hashing proposed in this paper is robust to this kind of geometric attack. According to the experimental results listed in Table 9, our DIBR 3D hashing scheme outperforms the ring partition-based hashing schemes and the NMF-based hashing scheme under the content-preserving operations listed in the above section.
Identification accuracy under combinational attacks that combine geometric distortion attacks and signal distortion attacks is also tested with many images. As shown in Table 10, the proposed hashing achieves good identification performance under most combinational attacks, with slight degradation under Gaussian noise + geometric attacks (rotation by 45° and cropping after rotation by 45°) and speckle noise + geometric attacks (rotation by 45°). This means that our DIBR 3D hashing is robust against most of the above combinational attacks.
In this paper, the FRR (false reject rate) and FAR (false accept rate) are also used to evaluate the perceptual robustness of the proposed DIBR 3D image hashing scheme. The FRR describes the probability of erroneous identification: the smaller the FRR, the better the robustness of the hash algorithm. The FAR reflects the discrimination of the hashing algorithm: the smaller the FAR, the better the discrimination. An excellent hashing algorithm should attain the minimum FRR and the minimum FAR at a certain threshold. As shown in Figure 7(d), the proposed hashing attains its minimum FRR and FAR with the threshold set from 0.86 to 0.93. This means that the proposed hashing could achieve the highest probability of true identification with zero false classification rate. As shown in Figure 7(a), for the NMF-based hashing [21], the minimum FRR and the minimum FAR are 0.164 when the threshold is set to 52. As shown in Figure 7(b), for the ring partition-based hashing [30], the minimum FRR and the minimum FAR are 0.176 when the threshold is set to 0.45. As shown in Figure 7(c), for the ring partition-based hashing [29], the minimum FRR and the minimum FAR are 0.16 when the threshold is set to 570. This experiment shows that the proposed hashing scheme is robust to common signal and geometric distortion attacks, such as additive noise, blurring, JPEG compression, scaling, and rotation.
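For a correlation-based hash, the two error rates at a given threshold can be computed as in the sketch below. It assumes that a pair is accepted as similar when its score exceeds the threshold (so a score equal to the threshold is rejected); methods that compare distances rather than correlations would invert the comparisons. The function name and the sample scores are hypothetical.

```python
def frr_far(same_scores, diff_scores, threshold):
    """Return (FRR, FAR) for correlation scores at one threshold.

    same_scores: scores of pairs that are perceptually identical.
    diff_scores: scores of pairs with different content.
    """
    # Identical-content pairs that fail to exceed the threshold are false rejects.
    frr = sum(s <= threshold for s in same_scores) / len(same_scores)
    # Different-content pairs that exceed the threshold are false accepts.
    far = sum(s > threshold for s in diff_scores) / len(diff_scores)
    return frr, far


same = [0.99, 0.97, 0.90]   # hypothetical scores
diff = [0.50, 0.95, -0.10]  # hypothetical scores
rates = {t: frr_far(same, diff, t) for t in (0.92, 0.96)}
```

Sweeping the threshold over a range and plotting both rates gives curves of the kind described for Figure 7.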
The underlying reason is that these traditional 2D image hashing methods assume that perceptually insignificant distortions and malicious manipulations of a digital image do not lead to viewpoint changes, so that the center of an image is generally preserved and thus relatively stable under geometric attacks such as rotation. In fact, virtual images are generated from the center image by pixel shifting in the DIBR process. The schemes in [29,30] divide the image into several rings centered at the image center. Using the pixels in every ring to form a secondary image, they extract the final hash from the secondary image. Consequently, different image centers lead to different secondary images, and the final hash vector of the center image differs from the hash vector of either virtual image.

Robustness against Baseline Distance Adjustment.
As shown in Section 3, in the DIBR process, a virtual image can be generated using an appropriate baseline distance $t_x$. Usually, $t_x$ is set differently to suit different viewers' vision.
Because $t_x$ is not fixed during DIBR rendering, baseline distance adjustment may affect the identification performance for virtual images. To evaluate the robustness of the proposed hashing method to baseline distance adjustment, the baseline distance $t_x$ is varied from 5% to 7% of the image width. As shown in Table 11, the identification accuracy remains unchanged across the different baseline distances.

Key Dependence.
To enhance the security of hashing scheme, a secret key is usually used in the processes of feature extraction and feature compression to generate the final hash. As a result, the key-based hashing scheme is key dependent, making the hash unpredictable to prevent unauthorized access.
In the proposed hashing scheme, only pixels with $M$ different gray levels are used to construct the secondary image. Using a key-based sequence $P(M)$ to select pixel groups enhances the security of the proposed hashing scheme. To validate the key dependence of the proposed hashing scheme, four images, "Breakdancers," "Ballet," "Dolls," and "Books," are adopted.
For each image, hashes are generated with 100 different keys. Then, we calculate the correlation coefficients between the original key-based hash and the hashes generated with different keys; all correlation coefficients between different hashes of the four images are small. It should be noted that the parameters of hash generation are kept unchanged. The results are shown in Figure 8, where the x-axis is the index of the key and the y-axis is the correlation coefficient $S$, which represents the hash distance. For the image "Breakdancers," the maximum, the minimum, and the average distances are 0.4507, -0.1849, and 0.1525, respectively. For the image "Ballet," the maximum, the minimum, and the average distances are 0.4754, 0.0838, and 0.3185, respectively. For the image "Dolls," the maximum, the minimum, and the average distances are 0.3162, -0.2067, and 0.1226, respectively. For the image "Books," the maximum, the minimum, and the average distances are 0.5470, -0.0440, and 0.3319, respectively. The maximum distances between the original key-based hash and the other 400 hashes with different keys are all lower than 0.96. This experimental result supports the key dependence of the proposed hashing scheme.
Figure 7: (a) The FAR and FRR of hash algorithm in [21]. (b) The FAR and FRR of hash algorithm in [30]. (c) The FAR and FRR of hash algorithm in [29]. (d) The FAR and FRR of proposed hash algorithm.

Conclusions
In this paper, we propose a pixel grouping and NMF-based DIBR 3D image hashing scheme, which can be used for virtual image identification and retrieval. Low-pass filtering and histogram shape-based pixel grouping are the key steps that make the proposed hashing scheme robust to common content-preserving manipulations, and the approximate invariance of histogram shape to cropping and DIBR operations ensures that our DIBR 3D image hashing scheme also performs better for virtual image identification. The experimental results show that the proposed DIBR 3D image hashing resists common content-preserving manipulations, including signal distortion attacks and geometric distortion attacks. However, the proposed hashing method may identify an input image with different content as visually identical when the input image has the same histogram shape. We will address this problem in future work.

Data Availability
To get the dataset for discrimination, please visit http://www.cs.washington.edu/research/imagedatabase/groundtruth/. Further details can be provided upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Acknowledgments
We would like to thank the anonymous reviewers for their helpful comments and suggestions, which helped us to improve the quality of this paper.
Figure 8: (a) Correlation coefficients between hashes of "Books" generated by different keys, (b) correlation coefficients between hashes of "Dolls" generated by different keys, (c) correlation coefficients between hashes of "Ballet" generated by different keys, and (d) correlation coefficients between hashes of "Breakdancers" generated by different keys.