CF Model: A Coarse-to-Fine Model Based on Two-Level Local Search for Image Copy-Move Forgery Detection



Introduction
With the rapid development of technology worldwide, there are many ways to obtain and process images [1]. Evolutions in computer technology, the Internet, and image applications have allowed individuals to tamper easily with image content. Copy-move is the most common means of image forgery, in which a copy of a region is inserted into the same image. Two examples are shown in Figure 1, where the copy-move forgeries are used to enrich image content. Considering scenarios involving the court, news, and so on, it is of paramount importance to determine whether an image has been tampered with. The purpose of digital image forensics is to verify the authenticity of an image.
As one of the most common means of image tampering, copy-move forgeries may be accompanied by certain postprocessing, including JPEG compression, noise addition, and blurring, to change the image content and confuse the information recipient [2]. In particular, the copied area is often geometrically transformed (rotated, scaled, etc.). Therefore, the passive forensics of copy-move tampered images faces great technical challenges and has strong practical application value.
This paper studies the corresponding passive forensic techniques for copy-move operations.
Our main contributions can be summarized as follows: (1) This paper proposes a coarse-to-fine model (CFM) for detecting forged regions by the affine transformation matrix. The localization of the forged regions from sparse to accurate is achieved. (2) To further extract the forgery region accurately, a two-stage local search algorithm is designed in the refinement stage to better maintain the balance between complexity and effectiveness of forgery detection. (3) The method has better detection results and higher robustness to postprocessing operations such as scaling, rotation, noise, and JPEG compression.

Related Work
Numerous methods for copy-move forgery detection (CMFD) have been proposed in the last decade, which are traditionally categorized into two classes: block-based and interest point-based methods.

Block-Based CMFD.
In 2003, Fridrich [3] proposed the first CMFD algorithm, which divided an input image into overlapping blocks to yield similar block pairs and used the discrete cosine transform (DCT) to describe image blocks. LBP is a grey-scale texture operator which is used to describe the spatial structure of the image texture. Wang et al. [4] extracted Quaternion Exponent Moment (QEM) moduli from each overlapped circular color block. The main limitation of this method is the higher computational complexity, which can be reduced by applying superpixel theory. Chen et al. [5] proposed a scheme to detect copy-move regions through the invariant features extracted from each block, and each block was only compared with other blocks under the intersection of closed mean and variance features. Mahmood et al. [6] divided the approximation sub-band of the shift invariant stationary wavelet transform into overlapping blocks. Distinct features extracted from the overlapping blocks were used to expose tampered regions forged in digital images. The features of these algorithms can be classified as follows: invariant moments, dimension reduction, textural features, and polar transform. Matching techniques include dictionary sorting and Euclidean distance [7]. However, most algorithms based on image blocks do not perform well in resisting affine transformation attacks.

Interest Point-Based CMFD.
Different from block-based algorithms, interest point-based CMFD algorithms are more robust against affine transformations. Rather than dividing an image into blocks, these methods extract interest points from the image, and image features are then extracted around the interest points. He et al. [8] used PCA on the feature vector to reduce computational complexity. Mohamadian and Pouyan [9] combined SIFT and Zernike moments to reduce the risk of failing to detect tampered regions in flat regions. Pun et al. [10] proposed a novel CMFD scheme using adaptive oversegmentation and feature point matching, which integrates block-based and interest point-based forgery detection methods. Pandey et al. [11] proposed a fast and effective copy-move forgery detection algorithm through hierarchical feature point matching. Due to its high stability under intermediate and postprocessing operations, the SIFT method has been widely used in CMFD. To improve SIFT performance, Bay et al. [12] initially proposed the speeded-up robust features (SURF) technique. The SURF operator maintains the excellent performance of the SIFT operator but addresses the shortcomings of high computational complexity and time consumption. Bo et al. [13] proposed a CMFD technique based on SURF and extended the dimensions of Bay's techniques to 128 to reduce false matching. Many scholars have only used this technique in interest point detection to produce feature points, after which local features were employed to describe an interest point to achieve satisfactory results [14,15]. Mishra et al. [16] presented a detection method based on the combination of SURF and hierarchical agglomerative clustering (HAC). Zandi et al. [17] proposed a new interest point detector that leverages the advantages of block-based and traditional interest point-based methods and uses improved strategies to implement the algorithm.
However, because the interest points are comparatively few and scattered, interest point-based detection methods can encounter difficulties in locating a precise forged region. The block-based CMFD algorithm and the interest point-based CMFD algorithm share a similar framework, as depicted in Figure 2 [18].
(i) Preprocessing: its main purpose is to eliminate irrelevant information in the image and restore useful real information; the most common approach is to convert the image from RGB to grayscale
(ii) Feature extraction: local image information is extracted from an image block or interest point and represented by a feature descriptor
(iii) Matching: similar pairs of image blocks or points are determined during the matching process
Most existing algorithms based on image blocks are vulnerable to attacks such as scaling, rotation, and noise addition, and interest point-based methods cannot locate the tampered region precisely. To solve these problems, a hybrid two-level method combining image blocks and interest points is proposed in this paper. We chose SIFT as the feature descriptor to represent the interest points. Then, the adaptive oversegmentation method is used to improve the matching process and calculate the affine transformation matrix. Finally, the proposed local search algorithm is applied at the image block level and the pixel level, respectively, to locate the tampered region accurately.

Proposed Detection Algorithm
In this paper, an accurate CMFD method based on interest points and a local search algorithm is proposed. The process is illustrated in Figure 3. The main flow of the proposed algorithm is as follows: (1) feature extraction: interest points are detected in the input image and represented by a feature descriptor, after which accurate interest point matches are obtained via a matching process; (2) affine transformation calculation: a random verification algorithm is used to calculate the affine transformation matrix; (3) forgery region extraction: the local search algorithm is applied at the image block level and the pixel level. The image block level locates the tampered region, and the pixel level refines the tampered region boundary. The image-level detection and pixel-level detection of the proposed model on the testing dataset show promising results. Our main contributions are as follows:
(i) A method combining image blocks and pixels is proposed. Based on the blocks, the forged region can be located, and the pixels are used to refine the region boundary. This method compensates for the poor performance of extracting tampered areas with interest points alone, thereby improving detection performance.
(ii) Considering the balance between algorithm complexity and performance, a two-level local search algorithm is designed. In the first stage, the image is divided into small rectangular blocks. If an image block contains an interest point, it is marked as a forged unit, and its corresponding block is calculated by the affine transformation. The search algorithm matches the results to obtain the forgery region. In the second stage, the boundary of the forged area is extracted at the pixel level, and a second search is used to further improve the accuracy of model detection.
(iii) Four different postprocessing operations were performed on the test dataset, and the experimental results show that our model still exhibits high robustness.
In the rest of this section, we present the process of this detection algorithm as illustrated in Figure 3. The details of our proposed algorithm are given in the following sections: Section 3.1 presents the feature extraction and description along with image segmentation using the adaptive oversegmentation algorithm to prepare for the next matching process. Section 3.2 outlines the feature-matching process using the two nearest neighbor (2NN) algorithm [19], after which the affine transformation is calculated. Section 3.3 introduces the local search algorithm. In Section 3.4, the two-level local search algorithm using the affine transformation matrix is utilized to locate the tampered region accurately. In the first stage, the image blocks are used as search units for feature matching, and the second stage operates at the pixel level to refine the edge of the region.

Feature Extraction and Adaptive Oversegmentation.
The first phase of the proposed algorithm involves interest point detection and feature extraction based on SIFT features, which are local features of an image. SIFT remains invariant to rotation, scaling, and light intensity and maintains stable robustness to changes in the viewing angle, affine transformation, and noise. The interest points and their corresponding descriptors are obtained. Based on these results, the proposed algorithm performs a matching operation to identify similar local regions.
To obtain good performance in matching and in the calculation of the affine transformation matrix, the adaptive oversegmentation method is adopted [10]. In our proposed method, the segmentation algorithm is simple linear iterative clustering (SLIC). The SLIC algorithm can generate compact and nearly uniform superpixels and rates highly in terms of operation speed, object contour preservation, and superpixel shape, which is more in line with the expected segmentation effect. When the SLIC segmentation method is used, the balance between computational cost and detection precision must be guaranteed. Therefore, the adaptive oversegmentation algorithm is adopted to adaptively define the size of superpixels according to the texture of the test images.

Security and Communication Networks
Next, the segmented image yields the image block set $\{B_1, B_2, \ldots, B_{NB}\}$, where $NB$ is the total number of image blocks; the interest points and feature descriptors in the $i$th image block are stored in $B_i$. Figure 4 depicts the relationship of the block set. Then, we find the corresponding interest point pairs via the feature-matching process.
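As a rough illustration of how such a block set could be organized, the sketch below groups interest points and descriptors by the segment label found under each point. The function name `build_block_set`, the label-map input, and the dictionary layout are assumptions made for illustration; the paper does not specify a data structure.

```python
import numpy as np

def build_block_set(labels, points, descriptors):
    """Group interest points by the segment label found under each point.

    labels: (H, W) integer label map from any segmentation (assumed input).
    points: iterable of (x, y) pixel coordinates.
    descriptors: one feature vector per point.
    """
    blocks = {}
    for (x, y), f in zip(points, descriptors):
        label = int(labels[int(y), int(x)])  # row index is y, column index is x
        block = blocks.setdefault(label, {"points": [], "descriptors": []})
        block["points"].append((x, y))
        block["descriptors"].append(np.asarray(f))
    return blocks

# Toy label map with two segments: left half is 0, right half is 1.
labels = np.zeros((4, 6), dtype=int)
labels[:, 3:] = 1
points = [(0, 0), (5, 1), (4, 3)]
descriptors = np.eye(3)  # placeholder descriptors, one row per point
blocks = build_block_set(labels, points, descriptors)
print({label: block["points"] for label, block in blocks.items()})
```

Each dictionary entry plays the role of one $B_i$: it holds the points and descriptors that fall inside one segment.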

Interest Point Matching and Affine Transformation Calculation.
The 2NN algorithm utilizes the ratio of the distance to the nearest neighbor and the distance to the second nearest neighbor. To match image blocks $B_i$ and $B_j$, for any feature point $p_i^k$, where $p_i^k$ is the $k$th point in block $B_i$, the calculation is as follows:
$$\frac{d_1}{d_2} < T_b, \tag{1}$$
where $T_b$ is the similarity threshold, $d_1$ is the distance to the closest neighbor, and $d_2$ is the distance to the second closest neighbor. The distance $d_m$ is calculated as
$$d_m = \lVert f_i^k - f_j^m \rVert, \tag{2}$$
where $d_m$ denotes the distance between point $p_i^k$ and point $p_j^m$, $p_j^m$ is the $m$th point in $P_j$, and $f_i^k$ and $f_j^m$ are the corresponding feature descriptors.
In our experiment, $T_b$ is set to 0.2. If constraint (1) is satisfied, then the inspected interest point $p_i^k$ is matched with $p_j^m$ ($p_i^k$ and $p_j^m$ denote an interest point pair). We iterate the 2NN process over different image blocks until all blocks have been traversed, resulting in a dataset MP of matched point-pair sets. Here, size() represents the number of point pairs in MP[x]; it is compared against the threshold $T_p$, which is set to 3, to filter the failed pairs. Thus, most mismatches are filtered.
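A minimal sketch of the 2NN test between the descriptors of two blocks is given below. It assumes Euclidean distance between descriptors and the ratio form $d_1 / d_2 < T_b$; the function name `two_nn_matches` and the toy descriptors are hypothetical and only illustrate the idea.

```python
import numpy as np

def two_nn_matches(desc_i, desc_j, t_b=0.2):
    """Return (k, m) index pairs that pass the ratio test d1 / d2 < t_b."""
    matches = []
    if len(desc_j) < 2:
        return matches  # the ratio test needs at least two candidates
    for k, f in enumerate(desc_i):
        dists = np.linalg.norm(desc_j - f, axis=1)  # distance to every descriptor in B_j
        order = np.argsort(dists)
        d1, d2 = dists[order[0]], dists[order[1]]
        if d2 > 0 and d1 / d2 < t_b:
            matches.append((k, int(order[0])))
    return matches

rng = np.random.default_rng(0)
desc_j = rng.random((5, 128))       # five toy 128-dimensional descriptors in block B_j
desc_i = desc_j[[3, 1]] + 1e-3      # near-copies of descriptors 3 and 1 in block B_i
print(two_nn_matches(desc_i, desc_j))
```

With a strict threshold such as 0.2, only descriptors whose best match is far closer than the runner-up survive, which trades some recall for fewer false matches.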
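To better display the tampered region, an affine transformation matrix $T$ is used to describe the relationship between the two points of each matched pair, and the matched pairs are stored for this calculation. For a matched pair with coordinates $(x, y)$ and $(x', y')$, the affine transformation is described as follows:
$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = T \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \tag{4}$$
where the affine transformation matrix $T$ is represented as
$$T = \begin{pmatrix} a_1 & a_2 & t_x \\ a_3 & a_4 & t_y \\ 0 & 0 & 1 \end{pmatrix}, \tag{5}$$
where $t_x$ and $t_y$ denote translations and $a_1$, $a_2$, $a_3$, and $a_4$ are associated with scaling and rotation. The matrix $C$ is used to obtain the affine transformation matrix $T$.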
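One simple way to obtain such a matrix from matched point pairs is an ordinary least-squares fit, sketched below. This is a stand-in for illustration only: the paper refers to a random verification algorithm, which is not reproduced here, and the function name `estimate_affine` is hypothetical.

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares 3x3 affine matrix T such that dst ~ T @ src in homogeneous form."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])      # rows of (x, y, 1)
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # shape (3, 2)
    T = np.eye(3)
    T[:2, :] = params.T                               # [[a1, a2, tx], [a3, a4, ty]]
    return T

# Toy check: a 90-degree rotation followed by a translation of (10, 5).
T_true = np.array([[0.0, -1.0, 10.0],
                   [1.0,  0.0,  5.0],
                   [0.0,  0.0,  1.0]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
dst = (T_true @ np.hstack([src, np.ones((4, 1))]).T).T[:, :2]
T_est = estimate_affine(src, dst)
print(np.round(T_est, 6))
```

At least three non-collinear pairs are needed for the six unknowns; a plain least-squares fit is sensitive to mismatched pairs, which is why a verification step on the estimated matrix is useful.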
To verify the accuracy of matrix $T$, all point pairs in $M_i^j$ must be tested using this matrix. For any interest point pair $(p, p')$ in $M_i^j$, the point $p''$ corresponding to $p$ is obtained using the following equation:
$$p'' = T p. \tag{6}$$
We verify the matrix accuracy based on the distance between $p'$ and $p''$:
$$\sqrt{(x' - x'')^2 + (y' - y'')^2} < T_d, \tag{7}$$
where $(x', y')$ and $(x'', y'')$ are the coordinates of $p'$ and $p''$, and $T_d$ is the similarity threshold of the matrix ($T_d = 1.5$ in our experiment). Then, we obtain the number of correct point pairs, count, in $M_i^j$ and compute
$$\text{rate} = \frac{\text{count}}{\text{size}(M_i^j)}, \tag{8}$$
where size($M_i^j$) is the number of all point pairs in $M_i^j$. When rate is greater than 0.5, the matrix $T$ is considered correct. In most cases, the source region and the replicated region are covered by many image blocks, so many affine transformation matrices can be obtained through MP. We propose an algorithm to deal with this problem. Before a matrix is calculated for any set M in MP, we examine the relationship between the point pairs in M and the existing matrices using formulas (6)-(8). If rate is more than 0.5 for an existing matrix, no new matrix is calculated for the set M. Finally, the matrix set is obtained. Next, we will display the tampered region using the search algorithm.
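The verification step can be sketched as follows, assuming the distance test is Euclidean and strict. The function name `verify_affine` and the sample pairs are hypothetical; the thresholds 1.5 and 0.5 are the values quoted in the text.

```python
import numpy as np

def verify_affine(T, src, dst, t_d=1.5, min_rate=0.5):
    """Return (rate, accepted) for matrix T on the point pairs (src[k], dst[k])."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_h = np.hstack([src, np.ones((len(src), 1))])
    projected = (T @ src_h.T).T[:, :2]                # p'' = T p for every pair
    errors = np.linalg.norm(projected - dst, axis=1)  # distance between p' and p''
    count = int(np.sum(errors < t_d))                 # number of correct pairs
    rate = count / len(src)
    return rate, rate > min_rate

# Pure translation by (10, 0); the last pair is a deliberate mismatch.
T = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0,  0.0],
              [0.0, 0.0,  1.0]])
src = [(0, 0), (5, 5), (2, 7), (9, 1)]
dst = [(10, 0), (15, 5), (12, 8), (40, 40)]
rate, accepted = verify_affine(T, src, dst)
print(rate, accepted)
```

The same check can be reused to decide whether a new set of pairs is already explained by an existing matrix before a further matrix is estimated.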

Local Search Algorithm.
Extracting the tampered region using only the interest points results in poor performance. To balance algorithm complexity and performance while extracting the forgery region more accurately, we propose a local search algorithm that can be applied at the image block level and the pixel level. The role of the local search algorithm is described in Figure 5, where the grid is used to represent the test image, the region outlined in red is the forged region, and the small blue block is the forged unit; when the search algorithm is used the first time, the forged unit is an image block, and the second time, the forged unit is a pixel. Details of the search algorithm are provided in the following section. The detection unit can find a corresponding unit via the affine transformation matrix, which is key to the local search algorithm. Before executing the search algorithm, the forged units must be collated and added to the forgery region set (TR). Then, the local search algorithm is executed; the steps are shown in Algorithm 1.
$TR_{cnt}$ is the result of the current detection, $D_{nei}$ is the set of neighbors $p_{nei} = \{p_1, p_2, p_3, p_4\}$, and the indices (1, 2, 3, 4) denote the four directions (0°, 90°, 180°, and 270°). Notably, a unit in $p_{nei}$ may already have been detected; therefore, the detected elements in $p_{nei}$ must be deleted. Then, the corresponding unit of each $p_i$ is calculated by matrix $T$, and feature descriptors are used to measure the similarity. These descriptors are explained in detail in the following section. The successfully matched unit pairs are added to $TR_{cnt+1}$. This operation is iterated until all elements in $TR_{cnt}$ have been detected. Finally, the test result $TR_{cnt+1}$ is combined with the original result $TR_{cnt}$, and we obtain the final result $TR_{cnt+1} = TR_{cnt+1} \cup TR_{cnt}$. To understand the algorithm flow and prove the validity of the local search algorithm, a flow chart is used for descriptive purposes (Figure 6). Figure 6 presents the ordinary flow of the local search algorithm. There are only six forged units ($a$, $b$, $c$, $a'$, $b'$, and $c'$) at the beginning of the algorithm, so the forged region is not completely covered. Implementation steps of the algorithm are described in Figure 6, where the blue blocks are forged units, green tags stand for detecting units, red blocks are nonforged units, and white blocks are units that have not been detected. Assume that there is only one affine transformation matrix $T$; the final result is shown in the figure.
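The iteration described above can be sketched as a region-growing loop on a grid of units. This is an illustrative reading of the text, not a reproduction of Algorithm 1: the names `local_search`, `transform`, and `similar` are hypothetical, a single matrix is assumed, and the similarity test is replaced by a toy lookup.

```python
def local_search(seeds, shape, transform, similar):
    """Grow the forged set from seed units using 4-neighbours and a unit mapping."""
    rows, cols = shape

    def inside(unit):
        return 0 <= unit[0] < rows and 0 <= unit[1] < cols

    forged = set(seeds)     # TR_cnt: units accepted so far
    examined = set(seeds)   # units already used as detection units
    frontier = set(seeds)
    while frontier:
        found = set()
        for r, c in frontier:
            # Neighbours in the four directions 0, 90, 180, and 270 degrees.
            for dr, dc in ((0, 1), (-1, 0), (0, -1), (1, 0)):
                p = (r + dr, c + dc)
                if p in examined or not inside(p):
                    continue
                examined.add(p)
                q = transform(p)            # corresponding unit under the matrix
                if inside(q) and similar(p, q):
                    found.update((p, q))    # a matched pair joins TR_cnt+1
        frontier = found - forged           # only new units are expanded next round
        forged |= found                     # TR_cnt+1 united with TR_cnt
    return forged

# Toy grid: a 3x3 source region copied 6 columns to the right.
source = {(r, c) for r in range(1, 4) for c in range(1, 4)}
target = {(r, c + 6) for r, c in source}
shift = lambda p: (p[0], p[1] + 6)
same_content = lambda p, q: p in source and q in target  # stand-in for a feature test
result = local_search({(2, 2), (2, 8)}, (6, 12), shift, same_content)
print(len(result))
```

Starting from one seed pair, the loop recovers both the source and the copied region, because every accepted unit contributes its neighbours as new detection units.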

Tampered Region Localization.
To balance the complexity and accuracy of the algorithm, a two-stage local search algorithm is proposed: the first stage operates at the image block level, and the second stage operates at the pixel level to refine the edge of the tampered region. The framework of the algorithm is displayed in Figure 7.

The First Stage.
In our method, interest points in MP are extracted and stored in $P_{right}$. First, small, nonoverlapping rectangular blocks are used to cover the host image, and all image blocks are scanned. If an image block contains interest points in $P_{right}$, the block is marked as a forged unit. Then, the image blocks, as detection units, are added to $TR_0$, and the search algorithm is employed at the image block level. Corresponding image blocks are calculated by the affine transformation $T$. Assume that image block $B_i$ is mapped to the corresponding image block $B_i'$; in general, the center of $B_i'$ does not coincide with the center of another block of the grid, so the true matching image block $B_j$ must be extracted, and a feature comparison must be executed between $B_i$ and $B_j$. The ZNCC (zero-mean normalized cross-correlation) is calculated between $B_i$ and $B_j$ as follows:
$$\text{ZNCC} = \frac{\sum_u \left(I(u) - \bar{I}\right)\left(I''(u) - \bar{I}''\right)}{\sqrt{\sum_u \left(I(u) - \bar{I}\right)^2 \sum_u \left(I''(u) - \bar{I}''\right)^2}}, \tag{10}$$
where $I(u)$ and $I''(u)$ denote the pixel intensities at location $u$, and $\bar{I}$ and $\bar{I}''$ are the average pixel intensities of $B_i$ and $B_j$. We apply a Gaussian filter of 7 × 7 pixels with a standard deviation of 0.5 to reduce noise; the threshold $T_{RD}$ is set up to obtain similar image block pairs:
$$\text{ZNCC} > T_{RD}. \tag{11}$$
In our work, $T_{RD}$ is set to 0.55. Once formula (11) is satisfied, the two image blocks ($B_i$ and $B_j$) are considered similar, and the results of the search algorithm are stored in $TR_1$.
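For reference, the zero-mean normalized cross-correlation between two equally sized blocks can be computed as in the sketch below. The Gaussian prefiltering mentioned in the text is omitted, and returning 0 for a completely flat block is an assumption of this sketch rather than something the paper specifies.

```python
import numpy as np

def zncc(block_a, block_b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two equally sized blocks."""
    a = np.asarray(block_a, dtype=float)
    b = np.asarray(block_b, dtype=float)
    a0 = a - a.mean()   # remove the average intensity of each block
    b0 = b - b.mean()
    denom = np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum())
    if denom < eps:
        return 0.0      # flat block: correlation undefined, treated as no match here
    return float((a0 * b0).sum() / denom)

rng = np.random.default_rng(1)
block = rng.random((8, 8))
brighter = 2.0 * block + 10.0        # same content under a gain and offset change
print(zncc(block, brighter) > 0.55)  # compared against the threshold used in the text
print(zncc(block, -block))
```

Because the mean is subtracted and the result is normalized, the score is insensitive to uniform brightness and contrast changes between the source and the copy.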
A filtering algorithm is used to render the test results more accurate. For each forged unit in TR 1 , the neighbor of detection element D must be extracted, and the neighboring blocks are defined as D nei � {d 0 , d 1 , d 2 , d 3 , d 4 , d 5 , d 6 , d 7 }. In our experiment, if the number of forged units in D nei is less than 2, the detection element D is deleted.
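The neighborhood filter can be sketched on a boolean grid of forged units as follows. The function name `filter_isolated` is hypothetical, and the sketch applies the rule once over the whole grid; whether the original procedure repeats the filtering is not stated.

```python
import numpy as np

def filter_isolated(mask, min_neighbors=2):
    """Drop forged units that have fewer than min_neighbors forged 8-neighbours."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask.astype(int), 1)   # zero border so edge units have 8 neighbours
    count = np.zeros((h, w), dtype=int)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue                   # the unit itself is not its own neighbour
            count += padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
    return mask & (count >= min_neighbors)

grid = np.zeros((5, 5), dtype=bool)
grid[1:3, 1:3] = True   # a compact 2x2 group of forged units
grid[4, 4] = True       # an isolated false detection
filtered = filter_isolated(grid)
print(filtered.astype(int))
```

The compact group survives because each of its units has three forged neighbours, while the isolated unit is removed.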

The Second Stage.
It is challenging to extract the forgery region accurately at the image block level alone, because the algorithm does not perform well at the edge of the tampered region. Thus, the edge of $TR_1$ is extracted, and we obtain an edge region $ER_0$ and a center region $CR_1$ at the image block level, where $ER_0$ is considered inaccurate and $CR_1$ is accurate. Using matrix $T$, the corresponding pixels of all pixels in $ER_0$ are calculated. For the obtained pixel pairs, the ZNCC algorithm is used to measure similarities, and the threshold $T_{RD}$ is set to 0.55. The matching result is saved in $ER_1$, and the forgery region $TR_2$ is obtained by combining the center region $CR_1$ and the matching result $ER_1$. To improve the edge of the forged region, the edge $ER_2$ of $TR_2$ is extracted at the pixel level, and $ER_2$ is used to execute the local search algorithm. Assume that we get the unit pair $(I, I')$ by matrix $T$; the color features of $I$ and $I'$ are extracted from $R(\cdot)$, $G(\cdot)$, and $B(\cdot)$, the three color channels of the detected image unit. $F_I$ and $F_{I'}$ are the color features of $I$ and $I'$; if $F_I$ and $F_{I'}$ satisfy the similarity condition with threshold $T_{RD2}$, matching is successful between units $I$ and $I'$. Here, $T_{RD2}$ is the degree of similarity between $I$ and $I'$. In our work, $T_{RD2}$ is 0.5. Results are stored in $ER_3$. The tampered region $TR_3$ is obtained by combining $ER_3$ and the center region $CR_2$. After the filtering step, the morphological close operation is applied to $TR_3$ to eliminate small gaps, after which the tampered region $TR_{end}$ is generated. The algorithm is evaluated in the following section to demonstrate its effectiveness.
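The morphological close operation mentioned above (a dilation followed by an erosion) can be sketched with NumPy alone. The 3 × 3 structuring element and the border handling are assumptions of this sketch, since the text does not state the element size; the function names are hypothetical.

```python
import numpy as np

def _neighborhood_stack(mask, pad_value):
    """All nine 3x3-shifted views of the mask, padded with pad_value at the border."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=pad_value)
    return [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]

def dilate(mask):
    return np.logical_or.reduce(_neighborhood_stack(mask, False))

def erode(mask):
    return np.logical_and.reduce(_neighborhood_stack(mask, True))

def close_mask(mask):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(np.asarray(mask, dtype=bool)))

region = np.zeros((9, 9), dtype=bool)
region[2:7, 2:7] = True
region[4, 4] = False                 # a one-pixel gap inside the detected region
closed = close_mask(region)
print(closed.astype(int))
```

Closing fills gaps smaller than the structuring element while leaving the outer boundary of the region where it was.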

Experimental Results
In this section, a series of experiments are conducted to evaluate the performance of the proposed CMFD method. Section 4.1 introduces the image dataset used in our experiments and the criteria used to evaluate the performance of the proposed method. Section 4.2 shows the experimental results of the proposed algorithm. In Section 4.3, the experimental results of the proposed CMFD method are compared with existing state-of-the-art CMFD methods.

Datasets and Evaluation Criteria.
In the following experiments, a benchmark database [20] that includes realistic copy-move forgeries was used to test the proposed scheme. This image dataset included 48 source images along with manually prepared, semantically meaningful regions to be copied for each image. Each image measured 3000 × 2300 pixels. Forgery regions comprised approximately 10% of each image. The copied regions belonged to the categories of living, natural, artificial, and mixed textures ranging from smooth to complex. Transformed images, such as those that underwent rotation, scaling, JPEG artifacts, and added noise, were also included in the image dataset.
To quantitatively evaluate the detection performance, we adopted two metrics: precision and recall. Precision is the fraction of pixels identified as forgery that are truly forgery, defined as the ratio of the number of correctly detected forged pixels to the total number of detected forged pixels. Recall refers to the fraction of forged pixels that are correctly classified, defined as the ratio of the number of correctly detected forged pixels to the number of forged pixels in the ground-truth forgery image. Precision and recall are calculated using (14) and (15):
$$\text{Precision} = \frac{|\Omega \cap \Omega'|}{|\Omega|}, \tag{14}$$
$$\text{Recall} = \frac{|\Omega \cap \Omega'|}{|\Omega'|}, \tag{15}$$
where $\Omega$ denotes the set of the detected forged regions in forged images with the CMFD method at the pixel level and $\Omega'$ denotes the forged regions of the ground truth of forged images. We provide the $F_1$ score as a measure that combines precision and recall in a single value:
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{16}$$
Using these metrics, we show how precisely the CMFD algorithms identified tampered regions. To reduce the effects of random samples, the average precision and recall were computed for all images in the dataset.
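A short sketch of the pixel-level metrics on binary masks is given below. The function name `pixel_scores` and the convention of returning 0 when a denominator is empty are assumptions of this sketch.

```python
import numpy as np

def pixel_scores(detected, truth):
    """Pixel-level precision, recall, and F1 for two boolean masks."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.logical_and(detected, truth).sum())  # correctly detected forged pixels
    n_detected = int(detected.sum())
    n_truth = int(truth.sum())
    precision = tp / n_detected if n_detected else 0.0
    recall = tp / n_truth if n_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

truth = np.zeros((8, 8), dtype=bool)
truth[0:4, 0:4] = True               # 16 forged pixels in the ground truth
detected = np.zeros((8, 8), dtype=bool)
detected[0:4, 1:5] = True            # 16 detected pixels, 12 of them correct
print(pixel_scores(detected, truth))  # (0.75, 0.75, 0.75)
```

Averaging these per-image scores over the dataset gives the summary values reported for each method.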

Experimental Results on Plain Copy-Move Forgery.
Plain copy-move forgery is a kind of one-to-one copy-move method that does not involve other transformation operations: a local area of the target image is cut and pasted back into the same image to generate a new tampered image, whereas the attacked variants additionally apply rotation, scaling, and other operations. We experimented on 48 plain copy-move forgery images in total. Figure 8 displays eight copy-move forgery detection results for the plain copy-move forgery, and the forgery content is either smooth (e.g., sky), rough (e.g., rocks), or structured (typically man-made buildings). These groups can be used as categories for CMFD images. From top to bottom are the test images and the corresponding ground-truth forged regions, and the final row shows the forged regions detected by the CFModel. As can be seen from the figure, the proposed model obtains fine prediction masks, even for small forgery regions.

Experimental Results under Various Attacks.
In addition to one-to-one copy-move forgery, we also experimented on the various attacks to verify the effectiveness of the proposed algorithm.
(i) Scale: the tampered region is rescaled to between 91% and 109% of its original size in steps of 2%.

Comparative Analysis of Algorithms.
This section presents the comparison between CFModel and existing methods, with experiments on the dataset proposed in [20], which includes 1488 tampered images. Three recent methods based on SIFT [20] and SURF [20] along with iterative CMFD [17] were selected for comparison.

Detection Results under Plain Copy-Move Forgery.
We first evaluated our algorithm under plain copy-move forgery. We experimented on 48 original images and 48 forged images, which were tampered by one-to-one copy-move forgery. Tables 1 and 2 present the results of the evaluation at the image level and pixel level. As noted in Table 1, the CFModel achieved 97.82% precision and 93.75% recall, better than most state-of-the-art methods at the image level. Our scheme also achieved better performance at the pixel level. As indicated in Table 2, the CFModel achieved up to 84.58% precision and up to 97.41% recall, surpassing most state-of-the-art methods. Compared to Bi [22] and Chen [23], the F1 score of our method is slightly lower. The possible reason is that the proposed model is based on blocks and interest points, which focuses more on the recall rate (whether the forged pixels are detected completely and correctly). These results show that the proposed method is more effective than others. Figure 8 also provides the representative results of eight examples. As shown in the figure, our proposed algorithm can accurately locate the tampered region even in small or smooth copy-move regions.

Detection Results under Various Attacks.
In order to obtain a more detailed assessment of the discriminative properties of the method, the detailed copy-move forgery detection results, obtained on 1392 tampered images under various attacks in total, are shown in Figure 13. Figure 13 provides all results: from top to bottom, scale attack, rotation attack, Gaussian noise addition, and JPEG compression; from left to right, precision rate, recall rate, and F1 score. As shown in the figure, the precision rate and recall rate of our scheme reached a higher level than other methods, and the F1 score was particularly prominent under scaling, indicating that our method provides a good balance of precision and recall. The main reason is that our method proposes a two-stage local search algorithm, which can not only locate the tampered region at the image block level but also locate the edge at the pixel level. In other words, our scheme performed better than most state-of-the-art methods in most cases; however, our method has a very low score when the standard deviation of the added noise exceeds 0.6, and we will address this deficiency in subsequent work.

Conclusion
With the development of digital technology, digital images can be easily forged using image processing software. Forged images must be identified given the potential legal and other implications. In this paper, we propose a copy-move forgery detection algorithm using SIFT as the interest point and feature extraction method. The affine transformation matrix was then calculated, followed by a local search algorithm to locate the forged region. Experimental results show that the proposed scheme performs much better than state-of-the-art copy-move forgery detection algorithms and demonstrates good performance under various attacks. However, performance was poor when images contained noise; we will focus on this image type in later work.
Future research is mainly as follows:
(1) To address the problem that the method cannot adapt to noise operations, we plan to incorporate richer texture feature information to achieve better robustness
(2) We will focus on detection tasks with multiple copy-move tampered regions in the same image to realize practical applications of the detection algorithm

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.