Robust Image Hashing with Low-Rank Representation and Ring Partition

Image hashing has attracted much attention from the multimedia security community in recent years. It has been successfully used in social event detection, image authentication, copy detection, image quality assessment, and so on. This paper presents a novel image hashing algorithm based on low-rank representation (LRR) and ring partition. The proposed hashing finds the saliency map by the spectral residual model and exploits it to construct the visual representation of the preprocessed image. Next, it calculates the low-rank recovery of the visual representation by LRR and extracts a rotation-invariant hash from the low-rank recovery by ring partition. Hash similarity is finally determined by the L2 norm. Extensive experiments are conducted to validate the effectiveness of the proposed hashing. The results demonstrate that the proposed hashing reaches a good balance between robustness and discrimination and is superior to some state-of-the-art hashing algorithms in terms of the area under the receiver operating characteristic curve.


Introduction
With the popularity of social network platforms, such as Facebook and Twitter, more and more digital images are transmitted via the Internet and stored in cyberspace. Therefore, efficient techniques are required for processing massive numbers of images and protecting content security. For example, when an important event happens, such as an opening ceremony of the Olympic Games, many people forward the same image of the event in their social networks. These forwarded images may undergo some digital operations, such as compression and enhancement. Consequently, there are many copies of an image of a hot event in cyberspace. Detecting image copies is therefore an important way to find a hot event in a social network [1]. In recent years, a useful technique called image hashing [2,3] has attracted much attention from the multimedia security community. It extracts a short code, called a hash, from the visual content of the input image regardless of its detailed bits. It can be applied not only to social event detection [1] but also to many other applications [4][5][6][7][8], e.g., image retrieval, image authentication, image copy detection, and image quality assessment. In practice, the hash is used to represent the input image. As the hash is a short representation, the use of image hashing enables efficient data processing. In this paper, we study a novel hashing algorithm based on the low-rank representation model and ring partition.
The most important properties of the image hashing algorithm are robustness and discrimination [9]. The requirement of robustness is that the image hashing algorithm must map visually similar images to the same or similar hashes no matter whether their bit representations are the same or not. In other words, the hashing algorithm should be robust to normal digital operations, e.g., compression, filtering, and enhancement. This is because they alter the bit representations of digital images but keep their visual appearances unchanged. The requirement of discrimination is that the image hashing algorithm must map different images to completely different hashes. Since the number of different images is much bigger than that of similar images in practice, good discrimination means a low error rate of judging different images as similar images. This property is helpful to many applications, such as social event detection and image retrieval. Note that the two properties restrict each other. A high-performance algorithm should reach a good balance between them.
In the past years, many researchers have devoted themselves to developing image hashing algorithms. Some typical techniques are briefly introduced below. For example, Swaminathan et al. [10] combined the Fourier-Mellin transform and randomization method to develop secure image hashing. Monga and Evans [11] proposed to use feature points detected by the end-stopped wavelet to build the hash. These two hashing algorithms [10,11] are resilient to small-angle rotation. Lv and Wang [12] combined SIFT features and Harris points to construct the hash. This scheme is robust to large-angle rotation and brightness adjustment, but its robustness against additive noise and blurring must be improved. Zhao et al. [13] derived the robust image hash for authentication from global features determined by Zernike moments and local features based on the position information of salient features. Tang et al. [14] calculated the image hash by jointly using the color vector angle (CVA) and discrete wavelet transform (DWT). Both the above hashing methods [13,14] only resist small-angle rotation. Laradji et al. [15] extracted the hash of the color image by using the hypercomplex numbers and quaternion Fourier transform (QFT). This approach has good discrimination, but its robustness against rotation needs to be improved. Wang et al. [16] jointly exploited the Watson model and SIFT features to design the hashing algorithm for authentication. This method has better performance than the hashing method [13].
Recently, Yan et al. [17] introduced the quaternion techniques called quaternion Fourier transform and quaternion Fourier-Mellin moments to image hashing design. In another study, Yan et al. [18] exploited the adaptive local image features to design multiscale hashing. Both the hashing schemes [17,18] reach good performance in tampering detection. Tang et al. [19] constructed the feature matrix via the DCT (discrete cosine transform) and learned hash code from the DCT-based matrix by local linear embedding. This hashing only resists image rotation within 5°. In [20], Tang et al. proposed to extract local statistical features from image rings and compressed statistical features by calculating vector distance. As the contents of image rings are rotation-invariant, rotation robustness of this hashing [20] is thus enhanced. Davarzani et al. [21] combined SVD (singular value decomposition) with CSLBP (Center-Symmetric Local Binary Patterns) to make the hashing scheme for authentication. The scheme is robust to additive noise and JPEG compression, but it is sensitive to rotation. Huang et al. [22] introduced the random walk to hash generation for improving security. In another work, Qin et al. [23] used SVD to conduct preprocessing and extracted hybrid features based on the circle-based feature and block-based feature to construct the hash. The hybrid feature-based hashing has good robustness against compression and filtering, but its computational cost should be reduced. Tang et al. [24] first proposed to construct a three-order tensor with image blocks and extracted the image hash from the three-order tensor by Tucker decomposition. This approach can resist small-angle rotation. Shen and Zhao [25] proposed to compute the image hash by using the color opponent component and quadtree structure features. This method shows good performance in image authentication, but it is fragile to image rotation.
From the above review, it can be seen that many algorithms do not reach a good balance between rotation robustness and discrimination. To address this problem, we propose a new image hashing algorithm based on low-rank representation and ring partition. The main contributions of the proposed algorithm are as follows: (1) We calculate the visual representation of the preprocessed image based on the saliency map extracted by the spectral residual model. Since the saliency map can indicate the visual attention of human beings, the visual representation using the saliency map can effectively describe the salient regions of the image. Consequently, hash generation with the visual representation can improve the robustness of the proposed algorithm. (2) We propose to incorporate low-rank representation into ring partition. The low-rank representation can capture the global structure of data, which helps to make the hash discriminative. Since ring partition produces a set of image rings invariant to rotation, hash code extraction based on image rings can reach good rotation robustness. Many experiments are conducted to validate the effectiveness of the proposed algorithm. The results illustrate that the proposed algorithm can resist many digital operations, including image rotation with a large angle. Performance comparisons with some state-of-the-art algorithms are also made. The receiver operating characteristic (ROC) curve results show that the proposed algorithm has better classification performance than the compared algorithms in terms of discrimination and robustness.
The structure of the rest of the paper is as follows. Section 2 describes the image hashing algorithm proposed in this paper. Sections 3 and 4 discuss experimental results and performance comparisons, respectively. Section 5 summarizes this paper.

Proposed Image Hashing
The proposed image hashing can be divided into four steps: preprocessing, visual representation calculation, low-rank representation, and ring partition. Figure 1 is the diagram of our algorithm. Preprocessing produces a normalized image, and the visual representation calculation generates an image representation that indicates the salient regions of the image. Low-rank representation is used to extract principal image features for making a discriminative hash, and ring partition makes the extracted feature code invariant to image rotation. These steps are described in detail in the following sections.
Wireless Communications and Mobile Computing
2.1. Preprocessing. This step includes three operations: bilinear interpolation, Gaussian low-pass filtering, and color space conversion. Bilinear interpolation resizes the input image to a standard size n × n, which makes our algorithm resilient to image scaling. A 3 × 3 Gaussian low-pass filter is then applied to the resized image. This operation alleviates the influence of some digital operations on the resized image, such as image noise and JPEG compression. Finally, the filtered image in RGB color space is converted to HSV color space [26] and the brightness component in the HSV color space is used to denote the input image. The HSV color space is also called the hexagonal cone model and has shown good performance in many existing hashing algorithms [26]. Let $H_1$ be the hue of the pixel and $S_1$ and $V_1$ be the saturation and brightness of the pixel in the HSV color space, respectively. The formulas for converting from RGB space to HSV space are expressed in terms of the following quantities: $R_1$ is the red component of the pixel, $G_1$ and $B_1$ are the green and blue components of the pixel, respectively, and $\mathrm{Max}(R_1, G_1, B_1)$ and $\mathrm{Min}(R_1, G_1, B_1)$ represent the maximum value and the minimum value of $R_1$, $G_1$, and $B_1$, respectively.
Figure 2 shows a practical example of the preprocessing.
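For reference, the hexagonal cone conversion is commonly written as below. This is the standard textbook form with $R_1$, $G_1$, and $B_1$ scaled to [0, 1]; it is only assumed to match the conversion intended in this section, and the shorthand $\Delta$ is introduced here for illustration. Only the brightness $V_1$ is used by the later steps.

```latex
\begin{aligned}
\Delta &= \mathrm{Max}(R_1, G_1, B_1) - \mathrm{Min}(R_1, G_1, B_1), \\
V_1 &= \mathrm{Max}(R_1, G_1, B_1), \\
S_1 &=
\begin{cases}
\Delta / \mathrm{Max}(R_1, G_1, B_1), & \mathrm{Max}(R_1, G_1, B_1) \neq 0, \\
0, & \text{otherwise},
\end{cases} \\
H_1 &=
\begin{cases}
60^{\circ} \times \left( \dfrac{G_1 - B_1}{\Delta} \bmod 6 \right), & \mathrm{Max}(R_1, G_1, B_1) = R_1, \\[2mm]
60^{\circ} \times \left( \dfrac{B_1 - R_1}{\Delta} + 2 \right), & \mathrm{Max}(R_1, G_1, B_1) = G_1, \\[2mm]
60^{\circ} \times \left( \dfrac{R_1 - G_1}{\Delta} + 4 \right), & \mathrm{Max}(R_1, G_1, B_1) = B_1,
\end{cases}
\qquad (\Delta \neq 0;\ H_1 \text{ is undefined for } \Delta = 0).
\end{aligned}
```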

Visual Representation Calculation.
In this work, a well-known model of saliency detection called the spectral residual model (SRM) [27] is exploited to find the saliency map of the image. Then, the visual image representation is determined by combining the saliency map and the brightness component of the preprocessed image. We select the SRM as the method of saliency detection because it is better than conventional methods such as Itti's method [28] in detection performance and computational speed [27].
The saliency map of the classical spectral residual model is computed from the spectral residual of the image, $R(f)$, which is defined as follows:
$$R(f) = L(f) - h_n(f) * L(f),$$
where $*$ denotes convolution, $h_n(f)$ is a matrix of size n × n, and $L(f)$ is the log spectrum of the image. More specifically, $h_n(f)$ is defined as follows:
$$h_n(f) = \frac{1}{n^2} \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}.$$
In addition, $L(f)$ is determined by the formula
$$L(f) = \log(A(f)),$$
where $A(f)$ is the amplitude spectrum of the image, which is determined by
$$A(f) = |F[I(x)]|,$$
where $I(x)$ denotes a given image, $F[\cdot]$ is the Fourier transform, and $|\cdot|$ denotes the amplitude. Finally, the saliency map $S(x)$ in the spatial domain is constructed by using the inverse Fourier transform as follows:
$$S(x) = g(x) * \left| F^{-1}\left[ \exp\left( R(f) + iP(f) \right) \right] \right|^2,$$
where $g(x)$ is a low-pass filter for smoothing the output of the inverse Fourier transform for better visual effects (a circular averaging filter of radius 3 is used here), $F^{-1}[\cdot]$ is the inverse Fourier transform, and $P(f)$ is the phase spectrum of the image defined by
$$P(f) = \varphi(F[I(x)]),$$
where $\varphi(\cdot)$ denotes the phase. In [27], it is stated that an image width (or height) of 64 pixels gives a good estimate of the scale of normal visual conditions. Following this, we resize the n × n brightness component to 64 × 64 and convert the calculated saliency map back to the original size n × n by bilinear interpolation. More details of the SRM can be found in [27].
Suppose that V is the brightness component of the preprocessed image and V(i, j) is the element of V in the ith row and the jth column. Also, assume that S denotes the saliency map of V and S(i, j) is the element of S in the ith row and the jth column. The visual representation X is then computed element by element from V(i, j) and S(i, j), where X(i, j) is the element of X in the ith row and the jth column.
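The saliency computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments: the function names are hypothetical, the local averaging size `k = 3` is an assumed value for the size of $h_n(f)$, a small square box filter stands in for the circular averaging filter $g(x)$, and the input is assumed to be the brightness component already resized to 64 × 64.

```python
import numpy as np


def box_filter(a, k=3):
    """Local k x k average with edge replication (plays the role of h_n)."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    rows, cols = a.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + rows, dx:dx + cols]
    return out / (k * k)


def spectral_residual_saliency(img, k=3, smooth=3, eps=1e-12):
    """Saliency map of a 2-D array following the spectral residual idea.

    k, smooth, and eps are illustrative assumptions, not values from the paper.
    """
    spectrum = np.fft.fft2(img)
    log_amplitude = np.log(np.abs(spectrum) + eps)         # L(f)
    phase = np.angle(spectrum)                             # P(f)
    residual = log_amplitude - box_filter(log_amplitude, k)  # R(f)
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    # Assumption: a box filter replaces the circular averaging filter g(x).
    return box_filter(saliency, smooth)
```

The returned map has the same shape as the input; in the pipeline above it would then be resized back to n × n before being combined with the brightness component.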
2.3. Low-Rank Representation. Low-rank representation (LRR) is a useful technique for capturing the global structure of data [29]. The LRR is robust to noise and can extract the lowest-rank representation of all data [29,30]. It has been widely used in many applications, such as subspace segmentation [31], image segmentation [32], and image classification [33]. Suppose that X is an observation matrix corrupted by noise E. The LRR is then obtained by solving the regularized rank minimization problem [29][30][31]:
$$\min_{Z, E} \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \tag{9}$$
in which $\|\cdot\|_*$ is the nuclear norm of a matrix (the sum of the singular values of the matrix), λ > 0 is a parameter for balancing the effects of the two parts, $\|\cdot\|_{2,1}$ is the $l_{2,1}$ norm, and $\|E\|_{2,1}$ is defined as follows:
$$\|E\|_{2,1} = \sum_{j} \sqrt{\sum_{i} \left( E(i, j) \right)^2}. \tag{10}$$
Let Y be the low-rank recovery of X. Assume that the minimizer of (9) is $(Z^*, E^*)$. Then Y can be obtained by $Y = XZ^*$ or $Y = X - E^*$. In practice, problem (9) can be converted to an equivalent optimization problem as follows:
$$\min_{Z, E, J} \|J\|_* + \lambda \|E\|_{2,1} \quad \text{s.t.} \quad X = XZ + E, \; Z = J. \tag{11}$$
This optimization problem can be solved via the following ALM (Augmented Lagrange Multiplier) problem:
$$\min_{Z, E, J, Y_1, Y_2} \|J\|_* + \lambda \|E\|_{2,1} + \mathrm{tr}\left[ Y_1^{T} (X - XZ - E) \right] + \mathrm{tr}\left[ Y_2^{T} (Z - J) \right] + \frac{\mu}{2} \left( \|X - XZ - E\|_F^2 + \|Z - J\|_F^2 \right), \tag{12}$$
in which $Y_1$ and $Y_2$ are the Lagrange multipliers, μ > 0 is the penalty parameter, $\mathrm{tr}[\cdot]$ is the trace, and $\|\cdot\|_F$ is the Frobenius norm. Problem (12) can be solved by the inexact ALM method [30]. More details of LRR can be found in [29][30][31].
In this work, the input of LRR is the visual representation X and the low-rank recovery Y is taken for hash code extraction. The reasons for using LRR in image hashing design are as follows. The influences of digital operations (e.g., compression, filtering, and noise) on an image can be viewed as noise added to the image. Since LRR is robust to noise, hash generation with the low-rank recovery can improve the robustness of our algorithm. In addition, LRR can efficiently capture the global structure of the input data. Therefore, the use of LRR helps to ensure the discriminative capability of the proposed algorithm.
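To make the LRR step concrete, the following is a minimal NumPy sketch of an inexact ALM iteration of the kind referred to above. It alternates closed-form updates of an auxiliary matrix J (singular value thresholding), Z (a linear solve), and E (column-wise shrinkage), then updates the multipliers. The function names and the defaults for `rho`, `mu`, `mu_max`, `tol`, and `max_iter` are assumptions chosen for illustration; only `lam = 0.9` comes from the parameter settings reported in the experiments.

```python
import numpy as np


def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def shrink_columns(M, tau):
    """Column-wise shrinkage: proximal operator of tau * l2,1 norm."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return M * scale


def lrr_inexact_alm(X, lam=0.9, rho=1.1, mu=1e-6, mu_max=1e10,
                    tol=1e-8, max_iter=1000):
    """Approximately solve min ||Z||_* + lam * ||E||_{2,1} s.t. X = XZ + E.

    All defaults except lam are illustrative assumptions.
    """
    d, n = X.shape
    Z = np.zeros((n, n))
    E = np.zeros((d, n))
    Y1 = np.zeros((d, n))   # multiplier for X = XZ + E
    Y2 = np.zeros((n, n))   # multiplier for Z = J
    XtX = X.T @ X
    inv = np.linalg.inv(np.eye(n) + XtX)
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)
        Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)
        E = shrink_columns(X - X @ Z + Y1 / mu, lam / mu)
        r1 = X - X @ Z - E   # residual of the first constraint
        r2 = Z - J           # residual of the second constraint
        Y1 = Y1 + mu * r1
        Y2 = Y2 + mu * r2
        mu = min(rho * mu, mu_max)
        if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
            break
    return Z, E
```

Given the returned pair, the low-rank recovery would be formed as `X @ Z` or `X - E`. For a 512 × 512 input, each iteration involves a full SVD, which is consistent with the high running time reported for the LRR step.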

Ring Partition.
To make the proposed algorithm resilient to image rotation, a well-known technique of image segmentation called ring partition (RP) [9,34] is exploited here. RP takes the image center as the circle center and divides the inscribed circle of the image into a set of rings. Figure 4 presents an example of RP with 4 image rings. Clearly, the contents of image rings are unchanged after image rotation. Therefore, we can calculate the hash code resistant to rotation by using the mean values of these rings. Details of hash code extraction based on RP are explained as follows.
Suppose that the low-rank recovery Y is divided into m rings. Note that the size of Y is n × n and the area of each ring is kept the same. The elements of each image ring, except those of the innermost ring, can be determined by using two adjacent radii. Let $r_i$ be the ith radius (i = 1, 2, ⋯, m), labeled in ascending order. Therefore, $r_1$ is the radius of the innermost circle and $r_m$ is the radius of the outermost circle. It is clear that $r_m = \lfloor n/2 \rfloor$, where $\lfloor \cdot \rfloor$ means rounding down. To calculate the other radii, the average area of an image ring, denoted $\mu_A$, is first determined by
$$\mu_A = \frac{C}{m},$$
where $C = \pi r_m^2$ is the area of the inscribed circle. Thus, the radius of the innermost circle is
$$r_1 = \sqrt{\frac{\mu_A}{\pi}}.$$
Next, the other radii $r_i$ (i = 2, 3, ⋯, m − 1) are determined by
$$r_i = \sqrt{\frac{\mu_A + \pi r_{i-1}^2}{\pi}}.$$
After all radii are obtained, the elements of Y can be classified into image rings by using the radii and the distances from these elements to the image center. Let $U_i$ be the set of the elements of the ith ring (i = 1, 2, ⋯, m) and p(x, y) be the element of Y in the xth row and yth column. Assume that the coordinates of the image center are $(x_c, y_c)$. Therefore, $x_c = n/2 + 0.5$ and $y_c = n/2 + 0.5$ if n is an even number. Otherwise, $x_c = (n + 1)/2$ and $y_c = (n + 1)/2$. Thus, the distance from p(x, y) to the image center $(x_c, y_c)$ is
$$d_{x,y} = \sqrt{(x - x_c)^2 + (y - y_c)^2}.$$
Consequently, the set $U_i$ is determined by one of the following equations:
$$U_1 = \{ p(x, y) \mid d_{x,y} \le r_1 \},$$
$$U_i = \{ p(x, y) \mid r_{i-1} < d_{x,y} \le r_i \} \quad (i = 2, 3, \cdots, m).$$
For each set, the mean of its elements is selected as a compact feature. Let $v_i$ be the mean of the elements in $U_i$ (i = 1, 2, ⋯, m). To reduce storage, each $v_i$ is quantized to an integer h(i) by a rounding operation [·]. Finally, our hash is obtained by concatenating these integers as follows:
$$\mathbf{h} = [h(1), h(2), \cdots, h(m)].$$
Therefore, the length of our hash is m integers.
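The ring construction above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, row and column indices are taken as 1-based so that the center is (n + 1)/2 in both directions, and the quantization in the usage note is plain rounding of the ring mean, since the exact quantization rule is not reproduced here.

```python
import numpy as np


def ring_radii(n, m):
    """Radii of m equal-area rings inside the inscribed circle of an n x n image."""
    r_m = n // 2
    mu_a = np.pi * r_m ** 2 / m           # average ring area
    radii = np.empty(m)
    radii[0] = np.sqrt(mu_a / np.pi)      # innermost circle
    for i in range(1, m - 1):
        radii[i] = np.sqrt((mu_a + np.pi * radii[i - 1] ** 2) / np.pi)
    radii[m - 1] = r_m                    # outermost radius is fixed
    return radii


def ring_means(Y, m):
    """Mean of the elements of Y in each of the m rings."""
    n = Y.shape[0]
    radii = ring_radii(n, m)
    center = (n + 1) / 2.0                # 1-based center coordinate
    idx = np.arange(1, n + 1)
    dist = np.sqrt((idx[:, None] - center) ** 2 + (idx[None, :] - center) ** 2)
    # ring[x, y] = smallest i with dist <= radii[i]; equals m outside the circle
    ring = np.searchsorted(radii, dist, side="left")
    return np.array([Y[ring == i].mean() for i in range(m)])
```

With these helpers, a hash could be formed as `np.rint(ring_means(Y, m)).astype(int)`, again assuming plain rounding. Because ring membership depends only on the distance to the center, rotating `Y` by 90° with `np.rot90` leaves the ring means unchanged up to floating-point summation order.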

Hash Similarity Computation.
As our hash is composed of integers, the well-known distance metric called the L2 norm is exploited to measure the similarity between two hashes. Assume that $\mathbf{h}_1 = [h_1(1), h_1(2), \cdots, h_1(m)]$ and $\mathbf{h}_2 = [h_2(1), h_2(2), \cdots, h_2(m)]$ are the hash sequences of two images. Thus, the L2 norm of the two hashes is defined as follows:
$$d(\mathbf{h}_1, \mathbf{h}_2) = \sqrt{\sum_{i=1}^{m} \left( h_1(i) - h_2(i) \right)^2},$$
in which $h_1(i)$ is the ith element of $\mathbf{h}_1$ and $h_2(i)$ is the ith element of $\mathbf{h}_2$. Generally, a smaller L2 norm means more similar hashes of the evaluated images. If the L2 norm is bigger than a threshold T, the evaluated images corresponding to the input hashes are judged as different images. Otherwise, they are viewed as similar images.
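The similarity decision can be written as a short sketch. The function names are hypothetical, and the default threshold of 15 is taken from the value recommended in the experimental section; other thresholds trade robustness against discrimination.

```python
import numpy as np


def hash_distance(h1, h2):
    """L2 norm between two hash sequences of equal length."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return float(np.sqrt(np.sum((h1 - h2) ** 2)))


def is_similar(h1, h2, threshold=15.0):
    """Images are judged similar unless the distance exceeds the threshold."""
    return hash_distance(h1, h2) <= threshold
```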

Experimental Results
The parameter settings of the proposed algorithm are as follows. The λ of LRR is 0.9, the input image is resized to 512 × 512, and the ring number is 64, i.e., n = 512 and m = 64. In the following experiments, Sections 3.1 and 3.2 validate the properties of robustness and discrimination, respectively. Section 3.3 analyzes our hash storage in binary form. Section 3.4 discusses the influence of the ring number on our algorithm performance.

3.1. Robustness. An open database called the Kodak dataset [35] is exploited to construct a test database of similar images. The Kodak dataset is composed of 24 color images of different categories with a size of 768 × 512 or 512 × 768. To produce similar images of these color images, some commonly used operations are applied as robustness attacks. These operations are carried out with Photoshop, MATLAB, and StirMark. More specifically, Photoshop provides brightness and contrast adjustments (parameters are ±10 and ±20). MATLAB provides 3 × 3 Gaussian low-pass filtering (standard deviations range from 0.3 to 1.0 with a step of 0.1), gamma correction (parameters are 0.75, 0.9, 1.1, and 1.25), salt-and-pepper noise, and speckle noise (both parameters range from 0.001 to 0.01 with a step of 0.001). StirMark provides JPEG compression (quality factors range from 30 to 100 with a step of 10), watermark embedding (strengths range from 10 to 100 with a step of 10), image scaling (scaling ratios are 0.5, 0.75, 0.9, 1.1, 1.5, and 2.0), and image rotation (rotation angles are ±1°, ±2°, ±5°, ±10°, ±15°, ±30°, ±45°, and ±90°). Note that image rotation increases the image size and pads the rotated images with extra pixels. In this experiment, only the 361 × 361 central parts of the 24 original images and their rotated images are taken for evaluating rotation robustness. Therefore, the number of operations used is 10, which contribute 80 manipulations in total. This implies that each original image has 80 similar images. So the total number of visually similar images is 24 × 80 = 1920, and the number of images used is 1920 + 24 = 1944. Figure 5 shows the results of the robustness experiments under different operations, where the x-axis is the parameter value of the digital operation and the y-axis is the L2 norm. Note that the curves in Figure 5 are the mean values of the L2 norms between the hashes of the 24 color images and those of their similar images.
From Figure 5, it can be seen that the mean L2 norms are all smaller than 15, except for two values of the rotation operation. For image rotation, the maximum value is 17.29, which is a little bigger than those of the other operations. If the threshold is selected as T = 15, our algorithm correctly detects 92.19% of the similar images. If rotated images are excluded, our algorithm recognizes all similar images. A high correct detection rate illustrates the good robustness of our algorithm.
3.2. Discrimination. The UCID database [36] is selected to test the discrimination of our algorithm. This database contains 1338 color images with a size of 512 × 384 or 384 × 512. Hashes of these 1338 color images are first calculated, and the L2 norm between each pair of hashes is then computed. Therefore, the total number of valid distances is $C_{1338}^{2}$ = (1338 × 1337)/2 = 894453. Figure 6 illustrates the distribution of these distances, where the x-axis is the value of the L2 norm and the y-axis is the frequency of the L2 norm. It can be observed that the maximum L2 norm is 163.25 and the minimum L2 norm is 4.80. Moreover, the mean value of these L2 norms is 42.72 and the standard deviation is 18.48. From Figure 6, it can be seen that most distances are bigger than the abovementioned threshold T = 15, indicating good discrimination performance.
Actually, the performances of discrimination and robustness are closely related to the selected threshold. Different thresholds lead to different performances. Table 1 lists the performances of robustness and discrimination under different thresholds, where the robustness is measured by the correct detection rate and the discrimination is indicated by the false detection rate. Note that the correct detection rate is the ratio between the number of similar images correctly detected and the total number of similar images. The false detection rate is the ratio between the number of different images falsely judged as similar images and the total number of different images. From Table 1, we select T = 15 as the recommended threshold since it yields the minimum total error rate.
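The threshold selection described above can be illustrated with a short sketch. The function names are hypothetical, and the total error is assumed here to be the sum of the missed detection rate (one minus the correct detection rate) and the false detection rate; the exact definition behind Table 1 may differ.

```python
import numpy as np


def detection_rates(similar_d, different_d, threshold):
    """Correct and false detection rates for one threshold.

    similar_d: distances between hashes of similar image pairs.
    different_d: distances between hashes of different image pairs.
    """
    similar_d = np.asarray(similar_d, dtype=float)
    different_d = np.asarray(different_d, dtype=float)
    correct = float(np.mean(similar_d <= threshold))
    false = float(np.mean(different_d <= threshold))
    return correct, false


def best_threshold(similar_d, different_d, candidates):
    """Candidate threshold with the smallest assumed total error rate."""
    def total_error(t):
        correct, false = detection_rates(similar_d, different_d, t)
        return (1.0 - correct) + false
    return min(candidates, key=total_error)
```

When several candidates give the same total error, `min` returns the first of them in the order supplied.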

Hash Storage.
To analyze the bits required for storing our hash, the hashes of the 1338 images in UCID are selected as the data source. Note that each hash generated by our algorithm is composed of 64 integers. Therefore, there are 1338 × 64 = 85632 integers in the data source. Figure 7 illustrates the distribution of these 85632 hash elements, where the x-axis is the element value and the y-axis is the frequency of the element value. It can be found that the …
To make visual and quantitative comparisons, the receiver operating characteristic (ROC) graph [37] is taken. In the ROC graph, the x-axis represents the FPR (false positive rate) and the y-axis represents the TPR (true positive rate). Let $P_{\mathrm{FPR}}$ and $P_{\mathrm{TPR}}$ be the FPR and the TPR, respectively. They are defined as follows:
$$P_{\mathrm{TPR}} = \frac{M_1}{M_2}, \qquad P_{\mathrm{FPR}} = \frac{M_3}{M_4},$$
in which $M_1$ is the number of similar images correctly judged as similar images, $M_2$ represents the total number of similar images, $M_3$ is the number of different images falsely identified as similar images, and $M_4$ denotes the total number of different images. Obviously, $P_{\mathrm{FPR}}$ and $P_{\mathrm{TPR}}$ indicate discrimination and robustness, respectively. In the ROC graph, a curve is plotted by using a set of points with coordinates $(P_{\mathrm{FPR}}, P_{\mathrm{TPR}})$. High performances of discrimination and robustness mean a low $P_{\mathrm{FPR}}$ and a high $P_{\mathrm{TPR}}$. Therefore, a curve near the top-left corner has better classification performance than a curve far away from it. To make a quantitative analysis, the AUC (area under the ROC curve) is taken. The range of the AUC is [0, 1]. In general, a bigger AUC means better classification performance of the evaluated algorithm. Figure 8 shows … Table 2 summarizes the performances of different ring numbers. From the viewpoint of overall performance, it can be observed that the ring number m = 64 reaches a good balance among the three performance indices.
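As an illustration of how ROC points and the AUC can be obtained from hash distances, the sketch below sweeps the decision threshold over all observed distances and integrates the resulting curve with the trapezoidal rule. The function names are hypothetical, and the thresholds actually used to draw the ROC curves are not specified here, so the sweep is an assumption.

```python
import numpy as np


def roc_points(similar_d, different_d):
    """FPR and TPR for thresholds swept over all observed distances."""
    similar_d = np.asarray(similar_d, dtype=float)
    different_d = np.asarray(different_d, dtype=float)
    observed = np.unique(np.concatenate((similar_d, different_d)))
    # Start below every distance so the curve begins at (0, 0).
    thresholds = np.concatenate(([-np.inf], observed))
    tpr = np.array([np.mean(similar_d <= t) for t in thresholds])    # M1 / M2
    fpr = np.array([np.mean(different_d <= t) for t in thresholds])  # M3 / M4
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

Perfectly separated distance sets give an AUC of 1, while identical distributions of similar and different distances give 0.5.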

Performance Comparisons
To demonstrate the advantages of our hashing algorithm, we compare it with some popular hashing algorithms. The selected hashing algorithms are CVA-DWT hashing [14], SVD-CSLBP hashing [21], random walk-based hashing [22], and hybrid feature-based hashing [23]. The datasets used in the above robustness experiment and discrimination test are also taken here, i.e., 1920 pairs of similar images and 1338 different images. To make a fair comparison, the parameters of the selected algorithms are set to the same values as in their original papers. Their hash similarity metrics are kept unchanged, i.e., the L2 norm for CVA-DWT hashing and hybrid feature-based hashing, the correlation coefficient for SVD-CSLBP hashing, and the normalized Hamming distance for random walk-based hashing. The result of our hashing with m = 64 is taken for comparison.
Figure 9 illustrates the ROC curves of the evaluated hashing algorithms. To show more details, the ROC curves in the upper-left area are zoomed in and placed in the bottom-right part of Figure 9. It is observed that the curves of our hashing and hybrid feature-based hashing intersect with each other. Moreover, both of them are above the curves of the other evaluated hashing algorithms. To conduct a quantitative analysis, the AUCs of the evaluated hashing algorithms are also calculated. The AUCs of CVA-DWT hashing, SVD-CSLBP hashing, random walk-based hashing, hybrid feature-based hashing, and our hashing are 0.97563, 0.74522, 0.95758, 0.97545, and 0.99069, respectively. These results show that our hashing is better than the compared algorithms in classification performance.
The average time of calculating a hash is also compared. The abovementioned computer is used again, and the compared algorithms are also implemented with MATLAB. The average times of CVA-DWT hashing, SVD-CSLBP hashing, random walk-based hashing, hybrid feature-based hashing, and our hashing are 0.05, 0.19, 0.02, 6.12, and 29.36 seconds, respectively. Our hashing is slower than the compared algorithms because the computational cost of the LRR method is relatively high. Moreover, the hash lengths of all algorithms are also compared. The hash lengths of CVA-DWT hashing and random walk-based hashing are 960 and 144 bits, respectively. The hash lengths of SVD-CSLBP hashing and hybrid feature-based hashing are 64 and 104 floating-point numbers. As a floating-point number requires 32 bits for storage according to the IEEE standard [38], the hash lengths of SVD-CSLBP hashing and hybrid feature-based hashing are 64 × 32 = 2048 and 104 × 32 = 3328 bits, respectively. The length of our hash is 384 bits. It is longer than that of random walk-based hashing, but it is much shorter than those of the other compared hashing algorithms. Table 3 summarizes the performance indices of the evaluated hashing algorithms, where the text in italic is the best result of the corresponding column. Our hashing outperforms the compared hashing algorithms in classification performance in terms of AUC, but it runs slower than the compared algorithms. As to hash length, it is better than all compared algorithms except random walk-based hashing.

Conclusions
In this paper, we have proposed a novel image hashing algorithm with LRR and RP. An important contribution is the calculation of the visual representation based on the saliency map determined by the SRM. Hash generation based on the visual representation can improve robustness. Another significant contribution is the combination of LRR and RP, which makes the discriminative hash invariant to rotation. Many experiments with two well-known databases have been carried out. The results have shown that the proposed hashing is robust and discriminative. ROC curve comparisons have illustrated that the proposed hashing outperforms the compared hashing algorithms in classification performance. In addition, hash length comparisons have shown that the proposed hashing is better than all compared algorithms except random walk-based hashing. As to running speed, the proposed hashing runs slower than the compared algorithms due to the high computational cost of the LRR method. In the future, we plan to design fast hashing algorithms, deep learning-based hashing algorithms, and hashing algorithms for image authentication.

Data Availability
The image datasets used to support the findings of this study can be downloaded from the public websites whose hyperlinks are provided in this paper.