Image Anomaly Detection Based on Adaptive Iteration and Feature Extraction in Edge-Cloud IoT

The Internet of Things (IoT) has penetrated into various application fields. If the multimedia information obtained by an IoT device is tampered with, subsequent information processing will be affected, resulting in incorrect services and even security threats. Therefore, it is necessary to study multimedia forensics technology for IoT security. In the edge-cloud IoT environment, an image anomaly detection technology for security services is proposed in this paper. First, preprocessing is performed before image anomaly detection. Then, sparse features are extracted from the image to roughly localize the region of anomaly detection. Feature extraction based on the polar cosine transform (PCT) is then performed only on the candidate region of anomaly detection. To further improve the detection accuracy, we use iterative updating. The proposed method exploits the fact that edge nodes are physically closer to the multimedia source and migrates the complex computing task of image anomaly detection from the cloud computing center to the edge nodes. The security service for abnormal data is deployed on the edge-cloud server to reduce the pressure on the cloud. Overall, preprocessing improves the ability of feature extraction in smooth or small regions of anomaly detection, and the iterative strategy enhances the security service. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods.


Introduction
In recent years, with the continuous integration of emerging technologies such as artificial intelligence, blockchain [1], big data [2], and the Internet of Things (IoT) [2][3][4][5][6][7] and the increasing number of intelligent devices [8], the image data to be processed by the IoT has increased exponentially. IoT technology has penetrated into many fields, and its development has attracted extensive attention. A large amount of multimedia data is generated in the IoT. If these multimedia data are tampered with, information security and the Internet will be threatened [9]. Therefore, research on multimedia forensics is of great significance. Image forensics is an important branch of multimedia forensics. Aiming at the problems of high delay and low processing efficiency of cloud-only processing, an image anomaly detection method based on edge computing is proposed. We deploy the image security service task to the edge device closest to the image data to be processed, sharing the computing pressure of the cloud server.
The methods of image anomaly detection [10] can be divided into active methods and passive methods. Active methods embed useful information in an image and then verify the authenticity and integrity of the image by evaluating the embedded information. However, conventional digital cameras lack digital watermarking functions for security. Consequently, active methods cannot be used when embedded information is unavailable. Alternatively, passive methods, also known as blind forensics, do not require preprocessing of digital images. Thus, they can identify the authenticity of images without embedded information, making them more widely applicable than active methods. To conceal tampering and make the image visually more realistic, postprocessing can be applied to the cloned area with methods such as rotation, lossy JPEG compression, scaling, and other distortions.
Two main types of passive forensic algorithms are used. One is based on block matching, also known as the dense-field approach, and the other is based on key points, also known as the sparse-field approach. Dense-field algorithms usually divide an image into circular or square overlapping blocks and extract a feature vector from each block. After lexicographic sorting, the similarity between successive vectors is evaluated, and the region of anomaly detection is determined by thresholding. Generally, dense-field algorithms have high computational complexity and may falsely match similar smooth areas in natural images. On the other hand, sparse-field algorithms extract selected points, called key points, to generate feature descriptors. Key points are distinctive and reflect essential characteristics of an image, allowing target objects to be identified. However, sparse-field algorithms cannot extract enough key points from smooth or small areas in images, limiting their performance. In addition, the sparsity of key points impedes the accurate localization of duplicated areas.
To handle the abovementioned problems and leverage both dense-field and sparse-field approaches, we propose an algorithm integrating them. First, the region of anomaly detection is roughly localized using a sparse-field algorithm, and then a dense-field algorithm is applied to accurately determine the region. Furthermore, we propose an adaptive iterative strategy to improve the localization accuracy. The main contributions of this study are summarized as follows: (1) In the edge-cloud IoT, an anomaly detection technology for security services is proposed to further construct the trust mechanism of network data. This method exploits the fact that edge nodes are physically closer to the multimedia source and migrates the complex computing task of image anomaly detection from the cloud computing center to the edge nodes.
(2) The advantages of dense-field and sparse-field algorithms are combined in the proposed method. The proposed algorithm first obtains the approximate location of anomaly detection by sparse-field algorithm and then obtains the accurate location of anomaly detection by dense-field algorithm.
(3) An adaptive iterative strategy is introduced to improve the accuracy of tampering localization. Even if few matching points are available, the region of anomaly detection can be accurately determined.

The remainder of this paper is organized as follows. Section 2 presents related work. In Section 3, we detail the proposed algorithm. Section 4 reports experimental results. Finally, we draw conclusions in Section 5.

Related Work
Edge-cloud computation in IoT means processing data at the edge of the network. Edge computing may solve the problems of response time requirements, battery life constraints, and bandwidth cost savings and provide data security services [11]. Ferrari et al. used full-cloud and edge-cloud architectures for industrial IoT anomaly detection [12]. The results show that the edge domain can reduce data transmission and communication delay. Feature extraction and feature matching are the basis of image anomaly detection [13]. In a dense-field algorithm, detection involves block feature extraction and feature matching across blocks [14]. The discrete cosine transform (DCT) was first proposed by Fridrich et al. [15]. However, the corresponding algorithm has high computational complexity and low robustness. Subsequent improvements to feature extraction have been proposed, such as principal component analysis (PCA) [16], singular value decomposition (SVD) [17], discrete wavelet transform (DWT) [18], blur-invariant moment features [19], and local binary patterns (LBP) [20]. Bayram et al. [21] extracted scale-invariant features from each block using the Fourier-Mellin transform (FMT). However, this algorithm is only robust to small rotations. On the other hand, the Zernike moments (ZM) proposed by Ryu et al. [22,23] and the polar cosine transform (PCT) proposed by Li [24] allow robust rotation-invariant features to be extracted from small overlapping blocks. For matching, lexicographic sorting is widely used [25]. To accelerate matching, k-dimensional trees [19] and locality-sensitive hashing [24] have been adopted to detect similar patches. However, these algorithms have high computational complexity because all image blocks must be matched. Recently, a fast approximate nearest neighbor search algorithm called PatchMatch (PM) was introduced [26,27].
Regarding performance, sparse-field algorithms are faster than dense-field algorithms because the former process fewer points. The scale-invariant feature transform (SIFT) was proposed by Lowe [28] in 1999. Luo et al. [29] extracted rotation- and scale-invariant descriptors. Subsequently, an accelerated version called speeded up robust features (SURF) was proposed [30]. Other fast feature detection and description algorithms include oriented features from accelerated segment test (FAST) and rotated binary robust independent elementary features (BRIEF) [31], the multisupport region order-based gradient histogram [32], and the histogram of oriented gradients.
In recent years, blockchain [33] and deep learning have been used for information protection [34,35]. Fusion strategies based on SIFT have achieved suitable detection results [36][37][38][39]. In particular, the histogram of oriented gradients has been applied to feature extraction and tampering detection using a support vector machine (SVM) [36]. Nonoverlapping superpixel segmentation has been used as a preprocessing step before applying feature extraction [37]. Features have been extracted and matched in two different color spaces for rough detection [38], and DCT features have been extracted for accurate localization. Furthermore, key points have been detected using a uniqueness metric and described using PCT [39], with iterative improvement enabling accurate localization. Despite its advantages, SIFT has various drawbacks. Specifically, it cannot detect tampering of smooth or small areas in an image. In addition, the sparsity of feature points provided by SIFT impedes accurate localization of the region of anomaly detection. We propose three strategies to overcome the limitations of this method. First, the target image is represented in the Lab color space for smooth areas. Second, rescaling is applied for small areas. Third, the localization accuracy is improved by combining dense-field and sparse-field algorithms.

Proposed Algorithm
IoT technology [40] has penetrated into many fields [41], and its development has attracted extensive attention [42]. The edge cloud is a cloud computing platform built on edge infrastructure; it combines the core capabilities of cloud computing with edge computing to form an elastic platform with comprehensive computing, network, storage, and security capabilities at the edge. The edge-cloud IoT architecture is shown in Figure 1. The edge cloud, central cloud, and IoT terminal in Figure 1 form an end-to-end "cloud three-body collaboration" technical framework. By placing tasks such as computing and intelligent data analysis at the edge, cloud pressure can be reduced. The image data generated by massive numbers of terminal devices are transmitted to the cloud computing layer [43,44] for centralized processing through the network, which incurs a large computational load and long image processing delays. An image anomaly detection method for security services, based on edge computation, is proposed in this paper. Taking advantage of the fact that edge nodes are physically closer to the multimedia source, the complex image analysis and processing tasks are migrated from the cloud computing center to the edge computing layer. We propose an iterative algorithm based on dense-field and sparse-field algorithms in edge-cloud IoT. First, SIFT is applied to roughly locate the region of anomaly detection. Then, PCT feature extraction is performed only on the candidate region of anomaly detection, and PM is used for matching. As SIFT may only partially identify a region of anomaly detection, an adaptive iterative strategy is introduced to further improve the localization accuracy. Finally, after morphological operations, the region of anomaly detection is accurately localized.
The flowchart of the proposed algorithm is shown in Figure 2. The algorithm comprises a rough localization stage (including preprocessing) and an accurate localization stage. The following subsections detail each process in the proposed algorithm.
3.1. Image Preprocessing. Firstly, the image is preprocessed. The image analysis process does not need to transmit the image to the cloud through the network; instead, the image is analyzed and processed directly in the edge server close to the data source. SIFT is a feature extraction and matching algorithm that provides higher accuracy and robustness to scaling attacks than similar algorithms such as SURF, BRIEF, and oriented FAST and rotated BRIEF. SIFT extracts key points, such as corner points, edge points, bright spots in dark areas, and dark spots in bright areas, across spatial scales without being affected by illumination, affine transformations, or noise. Based on these key points, a feature descriptor is generated for each key point. Owing to its superior performance, we use SIFT for feature extraction in the rough localization stage.
A common preprocessing step before applying SIFT is representing the target RGB (red-green-blue) image in grayscale. However, detection often fails when using grayscale images, especially in smooth areas. To prevent this problem, channels a and b of the Lab color space, the grayscale image, and contrast limited adaptive histogram equalization have been used for preprocessing before feature extraction [38]. Reducing the contrast threshold and rescaling the image have also been used as preprocessing methods [45]. Although such preprocessing methods can increase the number of matching points, they apply various techniques simultaneously, resulting in a large computational overhead. Figure 3 gives an example of using SIFT with two preprocessing methods. Figures 3(a)-3(c) show the original image, the tampered image, and the ground truth, respectively.

Figures 3(d) and 3(e) show the key points extracted from the grayscale image and from channel a of the Lab space, respectively. The key points in the Lab space are denser than those in grayscale. Whereas detection fails on the grayscale image, tampering can be detected using channel a, as shown in Figure 3(f). The Lab color space thus allows more key points to be extracted from smooth areas than the grayscale representation. Nevertheless, the grayscale representation is more robust than the Lab color space against various postprocessing attacks.
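The RGB-to-Lab conversion used in this preprocessing step can be sketched as follows. This is a minimal NumPy implementation of the standard sRGB-to-CIELAB transform, assuming a D65 white point; in practice, a library routine (e.g., from an image-processing package) would be used, and SIFT would then be run on the a channel.

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert sRGB values (floats in [0, 1], shape (..., 3)) to CIELAB (L, a, b)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Linearize the sRGB gamma curve.
    lin = np.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # sRGB -> XYZ matrix (D65 reference white).
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ m.T
    white = np.array([0.95047, 1.0, 1.08883])  # D65 white point
    t = xyz / white
    delta = 6 / 29
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)
```

For a neutral gray pixel, a and b are close to zero; tampering in smooth regions is detected on the a channel, where chromatic differences produce denser key points than luminance alone.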
In the proposed algorithm, three preprocessing methods are used: (1) RGB-to-grayscale transformation, (2) RGB-to-Lab transformation, and (3) image resizing. If these methods were used simultaneously, the computational overhead would increase notably. Therefore, each method is applied only when the preceding one fails, effectively reducing the computational burden. On the other hand, the proposed algorithm does not require many matching points for rough localization: if three or more matching points are identified, accurate localization can proceed iteratively. The main preprocessing steps are as follows.
Step 1. The RGB image is converted into a grayscale image, and feature matching is performed.
Step 2. If the security detection fails, the image is represented in the Lab color space for detection.
Step 3. Otherwise, the image is rescaled, and detection is repeated. If the security detection fails after applying all three preprocessing methods, the image is considered authentic and safe.

3.2. Rough Localization Stage. We use SIFT [46] for feature extraction and description. After preprocessing, 128-dimensional SIFT features are extracted. We denote the key points as x_i (i = 1,…,n) and the feature descriptors as f_i (i = 1,…,n) for n feature points. Then, generalized two-nearest-neighbor matching is applied [47]. The Euclidean distance between a feature descriptor and the other descriptors is calculated. For example, we calculate the distances between f_1 and f_2, f_3,…,f_n and obtain the distance vector D = {d_1, d_2,…,d_{n−1}} after sorting. If d_k/d_{k+1} < T_thresh and d_{k+1}/d_{k+2} ≥ T_thresh for some k (1 ≤ k ≤ n − 2), then feature point x_1 and the key points at distances {d_1, d_2,…,d_k} from x_1 are considered matching. In this study, we set the threshold T_thresh to 0.05.
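The generalized two-nearest-neighbor test described above can be sketched as follows. This is a minimal NumPy sketch operating on precomputed descriptors (the 128-dimensional SIFT vectors in the actual pipeline); it assumes no two descriptors are identical, since identical descriptors would give a zero distance ratio.

```python
import numpy as np

def g2nn_matches(descriptors, t_thresh=0.05):
    """Generalized two-nearest-neighbor matching.
    Returns (i, j) pairs where key point j is considered a match of key point i."""
    f = np.asarray(descriptors, dtype=np.float64)
    n = len(f)
    pairs = []
    for i in range(n):
        d = np.linalg.norm(f - f[i], axis=1)
        order = np.argsort(d)[1:]   # drop the point itself (distance 0)
        dist = d[order]             # sorted distances d_1 <= d_2 <= ...
        for k in range(len(dist) - 2):
            # d_k/d_{k+1} < T and d_{k+1}/d_{k+2} >= T: the first k+1 neighbors match
            if dist[k] / dist[k + 1] < t_thresh and dist[k + 1] / dist[k + 2] >= t_thresh:
                pairs.extend((i, int(order[m])) for m in range(k + 1))
                break
    return pairs
```

With the strict threshold T_thresh = 0.05 used here, only neighbors dramatically closer than the rest of the descriptor cloud are accepted, which suits duplicated (cloned) regions.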
As many similar areas can appear in natural images, false matching should be prevented. To this end, we use agglomerative hierarchical clustering [47] to filter out classes with fewer than three points. Furthermore, we use robust random sample consensus (RANSAC) to estimate the homography, filtering out the effects of unwanted outliers. When at least two classes are detected and at least three matched pairs between classes are available, we consider the image to be tampered with.
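The class-filtering idea can be illustrated with the following sketch. The actual method uses agglomerative hierarchical clustering [47] followed by RANSAC; here, as a simplified stand-in, single-linkage grouping with a union-find structure clusters key-point locations, and clusters with fewer than three points are discarded. The linkage threshold `link_thresh` is a hypothetical parameter, not a value from the paper.

```python
import numpy as np

def cluster_points(points, link_thresh=50.0, min_size=3):
    """Single-linkage spatial clustering; drops clusters smaller than min_size."""
    pts = np.asarray(points, dtype=np.float64)
    n = len(pts)
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge any two points closer than the linkage threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pts[i] - pts[j]) <= link_thresh:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_size]
```

In the full pipeline, an image would be flagged as tampered only if at least two such clusters survive and at least three matched pairs connect them.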
The sparse-field algorithm can only provide an approximate location of the anomaly through the abovementioned steps. For smooth or small regions of anomaly detection, few matched points may be extracted, undermining the accuracy. As shown in Figure 3(f), after rough localization, only eight matching points are obtained, making it difficult to accurately determine the region of anomaly detection. Therefore, we use a dense-field algorithm and an iterative strategy for accurate localization in the following stage.

3.3. Accurate Localization Stage.
To improve the localization accuracy, we use an iterative update strategy as described below.
Step 1. Centering at the matching points, the candidate tampering area R is expanded as

R = ⋃_{j=1}^{m} {(x, y) ∈ I : ‖(x, y) − x_j‖ ≤ B},   (1)

where x_j (j = 1,…,m) are the matching points obtained during rough localization, I is the target image, B = 30 + ⌊0.1√(M × N)⌋ is the expansion radius, and M × N is the size of the target image.
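The expansion of Step 1 can be sketched as follows: each matched key point is dilated into a disc of radius B = 30 + ⌊0.1√(M × N)⌋, and the union of discs forms the candidate region R as a binary mask.

```python
import numpy as np

def candidate_region(shape, match_points):
    """Binary mask R: the union of radius-B discs around the matched key points.
    shape = (M, N); match_points are (x, y) coordinates from rough localization."""
    M, N = shape
    B = 30 + int(0.1 * np.sqrt(M * N))      # expansion radius from the image size
    yy, xx = np.mgrid[0:M, 0:N]
    R = np.zeros((M, N), dtype=bool)
    for (x, y) in match_points:
        R |= (xx - x) ** 2 + (yy - y) ** 2 <= B ** 2
    return R, B
```

Restricting the subsequent dense PCT matching to this mask is what keeps the accurate localization stage cheap compared with whole-image block matching.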
Step 2. Using R, block matching is used for accurate localization. Considering the powerful distinguishing performance of the PCT, we use it to extract block features [24]. Specifically, 9-dimensional PCT block features are extracted from the expanded matching area R. Let f(r, θ) denote the image in polar coordinates. The PCT moment with order n and repetition l can be expressed as

M_{n,l} = Ω_n ∫₀^{2π} ∫₀^1 f(r, θ) H*_{n,l}(r, θ) r dr dθ,

where H_{n,l}(r, θ) = cos(πnr²) e^{ilθ} is the kernel function of the PCT, H* denotes its complex conjugate, and Ω_n is a normalization factor (Ω_0 = 1/π and Ω_n = 2/π for n ≥ 1). The PCT feature vector is then formed from the magnitudes |M_{n,l}| of the low-order moments. After PCT block feature extraction, PM [26] and dense linear filtering are applied for matching and for filtering out mismatches, respectively. The PM algorithm proposed by Barnes et al. [26] is an approximate nearest neighbor search algorithm that finds a dense approximate nearest-neighbor field between image blocks within a single image. It mainly comprises three steps: random initialization, propagation, and random search. After this step, we obtain the candidate anomaly detection map map^(i), where i is the number of iterations.
Step 3. To remove isolated small erroneous detections, erosion with radius B is applied to map^(i), obtaining the eroded area Cor_map^(i).

Step 4. Dilation with radius B is applied to map^(i), obtaining the expanded area Exp_map^(i).

Step 5. Difference map dif_map^(i) is obtained as (Exp_map^(i) − Cor_map^(i) > 0). The algorithm returns to Step 2 to obtain map_new^(i). Except for the first iteration, PCT feature matching is applied only to the new area dif_map^(i).

Step 6. The detection map is updated as map^(i+1) = map^(i) ∪ map_new^(i).

Step 7. The morphological open operation is applied to delete objects with area below T in map^(i+1). In this study, we used eight-connected neighborhoods and a minimum clone size T of 1200.
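The small-object removal in Step 7 can be sketched as follows. This is a minimal pure-NumPy stand-in for the morphological cleanup: 8-connected components are found by breadth-first search, and components with area below T = 1200 are deleted. A real implementation would use an image-processing library's labeling and morphology routines instead.

```python
import numpy as np
from collections import deque

def remove_small_objects(mask, t_area=1200):
    """Delete 8-connected components with area below t_area from a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    out = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    M, N = mask.shape
    for sy in range(M):
        for sx in range(N):
            if mask[sy, sx] and not seen[sy, sx]:
                # Flood-fill one 8-connected component.
                comp, q = [], deque([(sy, sx)])
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dxx in (-1, 0, 1):
                            ny, nx = y + dy, x + dxx
                            if (0 <= ny < M and 0 <= nx < N
                                    and mask[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                q.append((ny, nx))
                if len(comp) >= t_area:     # keep only sufficiently large clones
                    for y, x in comp:
                        out[y, x] = True
    return out
```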
Step 8. The candidate regions of anomaly detection obtained over the iterations are denoted as map_F^i = {map^(1),…,map^(i)} (i ≥ 2), and their first difference is denoted as ∇map_F^i = diff(map_F^i). If ∇map_F^i(end) ≤ T_term (set to 500 in this study) or the number of iterations exceeds the maximum limit T_iter (set to 5 in this study), the algorithm terminates. Otherwise, the algorithm returns to Step 3 to start a new iteration. The pseudocode is shown in Algorithm 1.
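The stopping rule of Step 8 can be sketched as a small helper that inspects the sequence of detected-region areas across iterations: the loop ends once the latest iteration adds at most T_term = 500 pixels or the iteration count reaches T_iter = 5.

```python
def should_terminate(areas, t_term=500, t_iter=5):
    """Adaptive-iteration stopping rule.
    areas: detected-region pixel counts [area(map^(1)), ..., area(map^(i))]."""
    if len(areas) >= t_iter:                 # iteration cap reached
        return True
    if len(areas) >= 2 and (areas[-1] - areas[-2]) <= t_term:
        return True                          # last iteration added little area
    return False
```

This is the mechanism that lets the algorithm grow the detected region from only a handful of rough matching points yet stop as soon as the map stabilizes.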

Experimental Results
We evaluated the performance of the proposed algorithm on the GRIP dataset [14]. This dataset contains 80 original images of 768 × 1024 pixels along with the corresponding copy-move forged images and ground truths. Most of the copies in this dataset are obtained from smooth areas. For the experiments, we used a computer equipped with a 2.60 GHz Intel(R) Core i7-9850H CPU running a MATLAB R2019a implementation.

Evaluation Criteria.
We calculated the precision and recall at the image level and the pixel level to evaluate the performance of the proposed anomaly detection algorithm:

Precision = T_P / (T_P + F_P),  Recall = T_P / (T_P + F_N),

[Algorithm 1: the proposed adaptive iterative algorithm. Inputs: I, the tested image; T_iter, the maximum number of iterations; T_term, the algorithm termination threshold. The candidate tampering area R(x, y) is obtained using formula (1).]
where T_P is the number of tampered images (at the image level) or tampered pixels (at the pixel level) correctly detected, F_P is the number of original images or pixels erroneously detected as tampered, and F_N is the number of tampered images or pixels incorrectly detected as authentic.
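These metrics can be computed directly from the counts. The sketch below illustrates the calculation with hypothetical image-level counts consistent with the figures reported later (80 tampered images all detected, and 6 originals assumed falsely flagged to yield a precision near 93%); the exact false-positive count is an assumption for illustration.

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 score from detection counts."""
    precision = tp / (tp + fp)                 # accuracy of positive predictions
    recall = tp / (tp + fn)                    # fraction of positives recovered
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```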
The precision represents the accuracy of the predicted results, and the recall represents the fraction of the positive samples that are detected. Thus, higher precision and recall indicate a better algorithm. However, precision and recall often trade off against each other. Thus, we used another comprehensive measure, the F_1 score, obtained as the harmonic mean of the precision and recall:

F_1 = 2 · Precision · Recall / (Precision + Recall).

4.2. GRIP Dataset. Given an image, we need to determine the presence of tampering, in which case it becomes necessary to accurately localize the region of anomaly detection. We evaluated the proposed algorithm at the image level and the pixel level separately, using 160 images: 80 original images and 80 tampered images from the GRIP dataset. At the image level, we obtained a precision of 93%, a recall of 100%, and an F_1 score of 96%. At the pixel level, we obtained a precision of 95%, a recall of 99%, and an F_1 score of 97%. The F_1 scores obtained by different methods are listed in Table 1. At the image level, the proposed algorithm provides the highest F_1 score. At the pixel level, the proposed algorithm has the same F_1 score as the method in Ref. [38] and a higher F_1 score than the other methods. The proposed algorithm provides better detection because mismatched points obtained from rough localization are likely eliminated after accurate localization. Figure 4 shows an example of this situation. We tested the tampered image in Figure 4(a) at the image level. The detection results for SIFT matching are shown in Figure 4(b). The points enclosed by the red circle indicate SIFT mismatching, which is eliminated after PM matching, as shown in Figure 4(c). Figure 5 shows examples of textured, mixed, and smooth regions of anomaly detection. Figure 5(a) shows the forged images, and Figures 5(b)-5(d) show the corresponding results for PM [27], SIFT [39], and the proposed algorithm, respectively.
The red area in the detection result indicates false detection, while the white area indicates that tampering could not be detected, and the green area indicates correct detection. The remaining black areas represent areas that neither have been tampered with nor have been misdetected. The PM algorithm suitably detects tampering in smooth areas (second and third rows), but it provides false detection for the textured area (first row). SIFT fails to accurately localize the region of anomaly detection and is completely unable to detect tampering in the smooth area. In contrast, the proposed algorithm combining SIFT and PCT provides the best detection results.

4.3. FAU Dataset.
We also used the public image dataset in Ref. [50] to test the performance of the proposed algorithm under rotation and scaling attacks in smooth areas. In Figure 6, we tested 15 rotated or scaled images with smooth-area tampering. The first row shows rotation attacks from 2° to 10°, with a step of 2°. The second row shows scaling attacks from 91% to 109%, with a step of 2%. We compare the proposed method with the state-of-the-art methods: the SIFT-based method [39], indicated in red, and the PM-based method [27], indicated in blue. The results indicated in green are the detection results of the proposed method. We can see that the proposed scheme performs better than the other two methods for smooth-area tampering.
The computation time of the proposed algorithm and similar methods is listed in Table 2. Averaged over the 160 images in the GRIP dataset, the mean computation time of the proposed algorithm is slightly higher than that of the methods in Refs. [38,39], but it remains within an acceptable range.

Conclusions
At present, in the image anomaly detection task of IoT, a large number of terminal devices transmit images to the cloud computing center through the network, resulting in a large computing load and high image processing delay. In the edge-cloud IoT, a security service-oriented image anomaly detection technology is proposed in this paper. The RGB image is represented in grayscale and in channel a of the Lab color space, and it is resized for preprocessing. Then, SIFT feature extraction is applied. The preprocessing methods are not performed simultaneously; instead, each method is applied only if the preceding one cannot detect tampering, effectively reducing the computational overhead. SIFT feature matching then provides a rough localization of the anomaly, while PCT block feature extraction and PM feature matching provide accurate localization. An adaptive iterative update strategy is introduced to gradually improve the localization accuracy. The performance of the proposed algorithm was evaluated at the image and pixel levels. The experimental results show that migrating the image security service task to the edge computing device can reduce the pressure on the computing center, handle image data anomaly detection in a timely manner, and improve image privacy and security. In the future, deep learning algorithms will be combined to broaden the scope of application of image anomaly detection.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.