An Image Similarity Acceleration Detection Algorithm Based on Sparse Coding

Aiming at the low efficiency of image similarity detection based on local features, an algorithm called ScSIFT for accelerated image similarity detection based on sparse coding is proposed. The algorithm improves the image similarity matching speed by sparse coding and indexing the extracted local features. Firstly, the SIFT features of the images are extracted as training samples to train an overcomplete dictionary, yielding a set of overcomplete bases. The SIFT feature vectors of the images are then sparse-coded with the overcomplete dictionary, and the sparse feature vectors are used to build an index. The image similarity detection result is obtained by comparing the sparse coefficients. The experimental results show that, compared with traditional algorithms based on local feature detection, the proposed algorithm can significantly improve the detection speed while guaranteeing the detection accuracy.


Introduction
Image similarity detection is a hot issue in the field of multimedia information processing. Similar images are a set of images obtained from the same scene or the same object taken under different environmental conditions, such as different angles or different lighting, or produced by editing transformations of the same original image in different ways. Examples of some similar images are shown in Figure 1.
Image similarity detection judges the similarity of visual content by matching images. According to the adopted features, image similarity detection methods can be divided into two categories: global-feature-based detection methods and local-feature-based detection methods. The global feature of an image refers to the use of one or a few feature vectors to represent the whole image content. Common global features include color histograms, texture features, and block features. Because the number of feature points is small, similarity detection based on global features is usually very fast. However, due to the singularity of the feature selection and the coarseness of the image description, global features are very susceptible to edits and local transformations. For example, image similarity detection with a color histogram as the global feature is very sensitive to the illumination of the image. Since similar images are usually created by editing transformations, the detection accuracy based on global features is generally relatively low.
In recent years, some scholars have proposed local features for image similarity detection. Compared with global features, local features of an image usually have some local invariance to illumination, rotation, and scaling and have been widely applied in the field of content-based image and video retrieval. Local feature points are usually local extreme points in an area of the image and have more distinctive characteristics than the rest of the pixels in the region. The description of a local feature point generally combines the characteristics of the key point with the information of its surrounding area, thus ensuring the local invariance of the feature.
The SIFT feature proposed in [1] is still the most accurate feature for image matching because of its good invariance to rotation, scaling, and illumination in the local neighborhood of the image, but its 128-dimensional feature vectors and the large number of detected feature points usually bring a heavy computational burden and reduce the efficiency of the algorithm. In [2], the dimension of the SIFT feature points is reduced by principal component analysis (PCA), and a PCA-SIFT feature is proposed. This feature reduces the SIFT feature vector from 128 dimensions to 32 dimensions or fewer, which improves the efficiency of the algorithm.
City University of Hong Kong [3] proposes a similar key frame detection algorithm based on PCA-SIFT that outperforms previous algorithms. Based on the timing information of key frames, the algorithm divides the key frames into different time-series groups, performs similar key frame detection with PCA-SIFT in each group, and then analyzes the correlation between different story units. Although the algorithm improves efficiency through PCA dimensionality reduction, it is still difficult to meet the real-time requirements of large-scale data processing. In [4], a SIFT feature filtering algorithm is proposed, which effectively reduces the number of SIFT feature points extracted from an image through a punishment mechanism, reduces the computational complexity, and improves the matching accuracy of the SIFT algorithm. In [5, 6], images are classified into different categories by a clustering algorithm. The distance between the image to be detected and the different clustering centers is calculated, the nearest several classes of images are selected, and their SIFT features are matched exactly, which reduces the amount of SIFT feature matching data and improves the detection speed.
In summary, the shortcomings of current image similarity detection are as follows. Although global features are fast, they carry no local information, are usually sensitive to partial changes of the image, and are not robust enough. Although local features are invariant to local deformations, their computational efficiency is very low due to their high dimensionality. At present, research based on local features mainly focuses on accelerating local invariant features, and some studies focus on the combined use of global and local features [7]. How to detect similar image content quickly and effectively is still an open problem.

Principle of ScSIFT Algorithm
The Scale Invariant Feature Transform based on Sparse Coding (ScSIFT) algorithm focuses on sparsifying the SIFT feature vectors. After extracting the SIFT features, the algorithm sparse-codes the 128-dimensional SIFT feature vectors and establishes a sparse-coding feature index to improve the matching speed. The quality of the sparse representation of the original signal depends largely on the selection and design of the overcomplete dictionary. A dictionary learned from part of the feature vectors yields smaller errors when sparsely representing the remaining feature vectors than a default fixed dictionary, and the errors of dictionaries learned by different dictionary learning algorithms are basically the same. This paper uses the dictionary learning algorithm proposed in [8] to train the overcomplete dictionary. The main idea of the ScSIFT algorithm is to use the SIFT features extracted from the key frame images of the query library as training samples to train the overcomplete dictionary and obtain a set of overcomplete bases. The SIFT feature vectors of the query images are sparsely encoded with the overcomplete dictionary, and the sparse feature vectors are indexed. Then the SIFT features of the image to be detected are matched against the query image feature index by using the overcomplete dictionary to obtain a set of similar candidates; the sparse coefficients of the image to be detected are compared with those of the candidate images to obtain the similar image detection results. The sparse coefficient vector produced by sparse coding of the SIFT feature is called the ScSIFT feature of the image.
When learning the dictionary with the SIFT features as training samples, the SIFT features should first be normalized. The normalization is

$$\bar{F} = \frac{F - F_{\mathrm{mean}}}{|F|_{\mathrm{mod}}}, \qquad (1)$$

where $F$ is a $128 \times n$ SIFT feature matrix, $F_{\mathrm{mean}}$ is a $128 \times n$ matrix whose columns hold the means of the corresponding columns of $F$, $|F|_{\mathrm{mod}}$ is a $128 \times n$ matrix whose columns hold the moduli (Euclidean norms) of the corresponding columns of $F$, and $\bar{F}$ is the normalized SIFT feature vector group.
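As a minimal sketch (not the authors' code), the column-wise normalization of formula (1) might look like this in NumPy; the function name is an assumption:

```python
import numpy as np

def normalize_sift(F):
    """Formula (1): subtract each column's mean, then divide by the
    column's Euclidean norm (the 'mod' of the column)."""
    F = np.asarray(F, dtype=float)
    mean = F.mean(axis=0, keepdims=True)            # per-column mean
    mod = np.linalg.norm(F, axis=0, keepdims=True)  # per-column norm
    mod[mod == 0] = 1.0                             # guard against all-zero columns
    return (F - mean) / mod
```

Each column of the result has zero mean, so descriptors with different brightness offsets and magnitudes become comparable.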
In general, the distance between the image's SIFT feature points is measured by the Euclidean distance or the absolute distance. The Euclidean distance, also known as the $\ell_2$-norm distance, is

$$d(x_1, x_2) = \|x_1 - x_2\|_2. \qquad (2)$$

The sparse representation of an eigenvector by the trained overcomplete dictionary $D$ is

$$x \approx D\alpha, \qquad (3)$$

so that

$$\|x_1 - x_2\|_2^2 \approx \|D\alpha_1 - D\alpha_2\|_2^2, \qquad (4)$$

where $\|x_1 - x_2\|_2$ is the Euclidean distance of feature vectors $x_1$ and $x_2$ and $\|\alpha_1 - \alpha_2\|_2$ is the Euclidean distance of sparse vectors $\alpha_1$ and $\alpha_2$. From formula (4), we can see that the squared distance between the SIFT features is related to the squared distance between their sparse representations. Thus, the distance between the original SIFT features can be represented by the distance of the sparse feature vectors:

$$d(\alpha_1, \alpha_2) = \|\alpha_1 - \alpha_2\|_2. \qquad (5)$$
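The paper does not specify which pursuit algorithm computes the sparse code $\alpha$ in formula (3); as an illustration only, here is a sketch of orthogonal matching pursuit, a standard choice (function name and the sparsity parameter `k` are assumptions):

```python
import numpy as np

def omp(D, x, k):
    """Greedy orthogonal matching pursuit: approximate x ~= D @ alpha
    with at most k nonzero coefficients. D should have unit-norm columns."""
    residual = x.astype(float).copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit coefficients on the current support by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        alpha[:] = 0.0
        alpha[support] = coef
        residual = x - D @ alpha
    return alpha
```

The returned vector is sparse by construction, which is what makes the simplified distance and the index described below possible.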

Since the dimension of the ScSIFT feature vector is much higher than that of the SIFT feature vector, it would be time-consuming to compute the distance by (5). However, since the vast majority of sparse vector elements are zero, the ScSIFT feature distance calculation can be simplified as

$$d = \sum_{i} \alpha_{1i}^2 + \sum_{j} \alpha_{2j}^2 + \sum_{k} (\alpha_{1k} - \alpha_{2k})^2, \qquad (6)$$

where $i$ runs over the elements that are nonzero in vector $\alpha_1$ and zero in vector $\alpha_2$, $j$ runs over the elements that are zero in $\alpha_1$ and nonzero in $\alpha_2$, and $k$ runs over the elements that are nonzero in both $\alpha_1$ and $\alpha_2$. When comparing the ScSIFT features of the query image and the candidate set images, the ScSIFT features of the candidate set images can be indexed to improve the retrieval efficiency. The indexing process is as follows.
Each ScSIFT vector $\alpha$ is first mapped to a binary secondary index $\alpha_{\mathrm{bool}}$, whose elements are 1 where $\alpha$ is nonzero and 0 elsewhere. Calculate the number num of nonzero elements in the secondary index $\alpha_{\mathrm{bool}}$. Because the vast majority of elements in $\alpha_{\mathrm{bool}}$ are 0, the range of num is not large and varies within [0, 20]. Use num as the first-level index of the ScSIFT feature.
When a query ScSIFT eigenvector is to be compared with the ScSIFT vectors in the dataset, the eigenvector $\alpha$ is transformed into a binary vector $\alpha_{\mathrm{bool}}$ according to formula (7), and the number $\mathrm{num}_q$ of nonzero elements in $\alpha_{\mathrm{bool}}$ is calculated. Then $\mathrm{num}_q$ is compared with the first-level index to locate the corresponding bucket, and $\alpha_{\mathrm{bool}}$ is compared with the secondary indexes in that bucket to retrieve the corresponding ScSIFT eigenvectors $(\alpha_1, \alpha_2, \ldots, \alpha_k)$ in the dataset. Finally, the ScSIFT feature vector $\alpha$ is matched against $(\alpha_1, \alpha_2, \ldots, \alpha_k)$ to find the nearest neighbor.
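A minimal sketch of the first-level index described above, assuming the coefficient matrix stores one ScSIFT vector per column (all names are illustrative, not the authors' code):

```python
import numpy as np
from collections import defaultdict

def build_index(coeffs):
    """First-level index: bucket the columns of `coeffs` (one ScSIFT
    vector per column) by their number of nonzero entries."""
    index = defaultdict(list)
    for col in range(coeffs.shape[1]):
        index[int(np.count_nonzero(coeffs[:, col]))].append(col)
    return index

def candidates(index, query):
    """Columns whose nonzero count matches the query's, i.e. the
    bucket selected by the first-level index."""
    return index.get(int(np.count_nonzero(query)), [])
```

Because num only varies within a small range, this bucketing discards most of the dataset before any distance computation is done.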

Process of ScSIFT Algorithm
The implementation of the image similarity acceleration detection algorithm based on sparse coding is mainly divided into three subprocesses: the sparse dictionary learning algorithm, the offline sparse coding algorithm for the query library images, and the real-time image matching algorithm.

Sparse Dictionary Learning.
The basic process of the sparse dictionary learning algorithm is as follows: (1) Select $n$ images $I_1, I_2, \ldots, I_n$ from the query library, extract the SIFT features of these images, and constitute the training feature set $F$.
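The paper trains the dictionary with the algorithm of [8], which is not reproduced here. Purely as an illustration of what dictionary learning on the training set $F$ involves, the following is a toy MOD-style alternating minimization (hard-thresholding sparse step, least-squares dictionary update); every name and parameter is an assumption, not the authors' method:

```python
import numpy as np

def learn_dictionary(X, n_atoms, n_iter=20, k=5, seed=0):
    """Toy MOD-style dictionary learner: alternate a hard-thresholding
    sparse coding step with a least-squares dictionary update.
    X: (d, n) training features; returns D (d, n_atoms) with unit-norm columns."""
    rng = np.random.RandomState(seed)
    d, n = X.shape
    D = rng.randn(d, n_atoms)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # sparse coding step: keep only the k largest correlations per sample
        A = D.T @ X
        kth = -np.sort(-np.abs(A), axis=0)[k - 1]   # k-th largest per column
        A[np.abs(A) < kth] = 0.0
        # dictionary update step: D = X A^T (A A^T + eps*I)^-1
        G = A @ A.T + 1e-8 * np.eye(n_atoms)
        D = X @ A.T @ np.linalg.inv(G)
        # re-seed unused atoms, then renormalize columns
        norms = np.linalg.norm(D, axis=0)
        dead = norms < 1e-10
        if dead.any():
            D[:, dead] = rng.randn(d, int(dead.sum()))
            norms = np.linalg.norm(D, axis=0)
        D /= norms
    return D
```

For the ScSIFT setting, $X$ would be the normalized 128-dimensional training features and `n_atoms` the (overcomplete) dictionary dimension.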

Offline Sparse Coding.
Offline sparse coding is carried out for the images in the query library; the basic process is as follows: (1) Read the first image $I$ of the query library. (4) Search the index for the nearest $k$ column coefficients $(\alpha_1, \ldots, \alpha_k)$ of the ScSIFT feature $\alpha$.
(5) Calculate the distance $d_i$ between each column $\alpha_i$ and $\alpha$ according to formula (6). If $d_i \le T_d$, then the feature matching count $\Sigma_p$ of the image $p$ to which the column belongs is increased by 1. (6) Repeat steps (4) and (5) until all columns of the ScSIFT feature have been processed once. (7) For each image $p$ whose feature matching count satisfies $\Sigma_p \neq 0$, calculate its total number of sparse feature points $\mathrm{Total}_p$ and the similarity degree $\Sigma_p/\mathrm{Total}_p$. If $\Sigma_p/\mathrm{Total}_p \ge T_s$, then the image $p$ is similar to the query image. (8) End.
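Steps (4)-(8) can be sketched as follows; `sparse_distance` implements the index-set computation of formula (6), while the threshold names and containers are illustrative assumptions (the sketch scans all columns rather than using the num index, for brevity):

```python
import numpy as np

def sparse_distance(a1, a2):
    """Formula (6): squared distance computed only over positions where
    at least one of the two sparse vectors is nonzero."""
    nz1, nz2 = a1 != 0, a2 != 0
    i = nz1 & ~nz2                      # nonzero in a1, zero in a2
    j = ~nz1 & nz2                      # zero in a1, nonzero in a2
    k = nz1 & nz2                       # nonzero in both
    return (a1[i]**2).sum() + (a2[j]**2).sum() + ((a1[k] - a2[k])**2).sum()

def detect_similar(query, db, db_image_ids, d_thresh, s_thresh):
    """For each ScSIFT column of `query`, find its nearest column in `db`;
    if that distance is within d_thresh, credit the column's image.
    Images whose match ratio reaches s_thresh are reported as similar."""
    ids = list(db_image_ids)
    totals = {p: ids.count(p) for p in set(ids)}
    matches = {p: 0 for p in totals}
    for q in query.T:
        dists = [sparse_distance(q, c) for c in db.T]
        j = int(np.argmin(dists))
        if dists[j] <= d_thresh:
            matches[ids[j]] += 1
    return [p for p in totals if matches[p] / totals[p] >= s_thresh]
```

Note that ignoring positions where both vectors are zero changes nothing: those positions contribute 0 to the full Euclidean sum, so `sparse_distance` equals the exact squared distance.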

Experimental Results and Analysis
In the experiment, we select a total of 10816 frames extracted from 1000 videos as the query library images. Five different images are selected from the library, and, for each image, Gaussian blur, a mark transform, a gray-scale transformation, and a scale-cropping transform are carried out. The original images and the transformed images, 50 images altogether, are compared against the query library by similarity detection, and the experimental results are compared.

Sparse Coding Dictionary Learning Experiment.
The error of sparse coding depends mainly on the choice of the overcomplete dictionary. In this paper, we compare the overcomplete dictionary based on feature learning with the overcomplete DCT dictionary and carry out sparse coding experiments to test the influence of different dictionaries on sparse coding. The experimental factors include two aspects: the time of dictionary training and feature coding, and the error of dictionary sparse coding.
The sparse coding error $E$ is calculated by

$$E = \|X - D\alpha\|_2^2 + \lambda \|\alpha\|_1,$$

where $X$ is the original data signal, $D$ is the overcomplete dictionary, $\alpha$ is the sparse coding of the signal $X$, and $\lambda$ is the sparse coding regularization parameter. In this paper, the number of blocks, the number of cycles, and the dictionary dimension are used as control variables when training dictionaries, to compare the coding errors and the differences in dictionary learning time between different settings. At the same time, the DCT dictionary is used as the benchmark to compare the coding error and feature coding time between the feature learning dictionary and the DCT dictionary at different dimensions. The experimental results are shown in Figure 2.
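A direct NumPy transcription of this objective (function and argument names assumed):

```python
import numpy as np

def coding_error(X, D, alpha, lam):
    """Sparse-coding objective: squared reconstruction error plus
    an l1 sparsity penalty weighted by lam."""
    return float(np.linalg.norm(X - D @ alpha) ** 2 + lam * np.abs(alpha).sum())
```

When the reconstruction is exact, only the l1 penalty term remains, which is what makes the error a useful measure of how well a given dictionary trades reconstruction quality against sparsity.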
Figures 2(a) and 2(b) show the coding errors and the dictionary learning times for different numbers of blocks and different numbers of training cycles. Here, the number of blocks in Figure 2(a) and the number of training cycles in Figure 2(b) are set to (100, 200, ..., 1000), respectively. It can be seen from Figures 2(a) and 2(b) that changing the number of blocks and the number of cycles has no significant effect on the dictionary coding error, with the variation of the coding error being less than 0.01, while it has a greater impact on dictionary learning time. Therefore, the block number and the cycle number can be set to smaller values to save training time and improve the efficiency of dictionary learning.
Figures 2(c) and 2(e) show the dictionary coding errors and the dictionary learning times for dictionary dimensions in the set (200, 400, ..., 5000). Figures 2(d) and 2(f) show the time required for sparse coding of the features with dictionaries of the corresponding dimensions. Comparing Figures 2(c) and 2(e), it can be found that the coding errors of both the feature learning dictionary and the DCT dictionary decrease significantly as the dictionary dimension increases, but the DCT dictionary's coding error levels off at about 0.43 with no obvious further change, and its dictionary construction time is basically constant. The error of the feature learning dictionary decreases from 0.28 to 0.15 as the dimension increases, but its dictionary learning time also increases significantly. Comparing Figures 2(d) and 2(f), the coding times of both dictionaries increase significantly as the dictionary dimension increases; the feature learning dictionary's coding time eventually levels off at around 15 seconds, while the DCT dictionary's encoding time increases markedly from near 0 to 40 seconds.
By contrast, we can find that the feature learning dictionary has obvious advantages over the DCT dictionary: its coding error is lower than that of the DCT dictionary (below 0.28), and its coding time is also better. When the dictionary is trained on the features, changing the block number and the cycle number has little effect on the coding error of the dictionary but significantly increases the dictionary learning time, so these two parameters can be set low to improve the dictionary learning efficiency. In this paper, we set the block number batch = 400, the cycle number iter = 100, and the dictionary dimension to 512.

ScSIFT Algorithm Validity Test.
The validity of the ScSIFT algorithm is verified by comparison with the SIFT algorithm and the SURF algorithm. In the experiment, five key frame images and 10 sets of key frame images are selected as query images to test each algorithm's running time. The numbers of key frame images in the sets are (5, 15, ..., 95) frames, respectively.
The feature extraction time, the distance calculation time between feature points, the average matching time per frame, and the total running time of the three algorithms are shown in Table 1. The total runtime is shown in Figure 3.
In Table 1, single-frame detection time is the average similarity detection time for a frame.
From Table 1 and Figure 3, we can see that SIFT feature extraction is the fastest, with an average of 0.7 seconds, while SURF feature extraction is the slowest, with an average of 2.76 seconds. For the same number of key frames, the SIFT algorithm's detection time is 4 times that of the ScSIFT algorithm. The detection times of the ScSIFT and SURF algorithms are very close when the number of key frames is small, and as the number of key frames increases, ScSIFT becomes slower than SURF. This is mainly because the number of feature points extracted by the SIFT algorithm is larger than that extracted by SURF; when the number of key frames is large, the disadvantage of the increased computational burden gradually becomes evident. However, due to the slow feature point detection of the SURF algorithm, its total running time is longer than that of the ScSIFT and SIFT algorithms when the number of key frames is small. Experiments show that the average running speed of the ScSIFT algorithm is 52% higher than that of the SIFT algorithm and 45% higher than that of the SURF algorithm. Six frames transformed by gray-scale transformation, scale-cropping transformation, tag adding, and Gaussian blur are selected, and similarity determination against fifteen other selected frames is performed by the ScSIFT algorithm and the SIFT algorithm, respectively, to test the robustness of the ScSIFT algorithm under different edit styles. The similarity detection results of the ScSIFT and SIFT algorithms on the different key frame transformations are shown in Tables 2 and 3.
The comparison results of Tables 2 and 3 are shown in Figure 4.
In Figure 4, one curve shows the ScSIFT algorithm's detection results and the other the SIFT algorithm's. The abscissa is the image frame number, and the ordinate is the similarity. Combining the results in Tables 2 and 3 and Figure 4, the similarity search results of the ScSIFT algorithm are basically the same as those of the SIFT algorithm, and slightly higher for some videos. In summary, the experimental results show that the ScSIFT algorithm's accuracy is similar to that of the SIFT algorithm, but its running speed is faster, 52% higher than that of the latter.

Figure 1: Examples of some similar images.

(2) Extract the SIFT features of the image $I$ to constitute the image feature set $F$. (3) Normalize the training feature set $F$ according to formula (1) to obtain the normalized feature set $\bar{F}$. (4) Perform the sparse coding algorithm on the normalized feature set $\bar{F}$ and preserve the sparse coefficients $\alpha$, that is, the ScSIFT feature. (5) If there is an uncoded image in the query library, read the next image $I$ and repeat steps (2), (3), and (4); otherwise, go to step (6).

(6) Create an index and save all images' ScSIFT features in the query library. (7) End.

Real-Time Matching of Images.
The basic process of the real-time image matching algorithm is as follows: (1) Read the image $I$, extract its SIFT features to form the feature set $F$, and normalize it to $\bar{F}$. (2) Obtain the sparse representation of the normalized feature set $\bar{F}$ by the sparse coding algorithm, giving the ScSIFT feature $\alpha$. (3) Set the ScSIFT feature similarity distance threshold $T_d$ and the image similarity threshold $T_s$, and read the first column of the ScSIFT feature, $\alpha = \alpha_0$.

Figure 3: Total runtime comparison of the three algorithms.

Figure 4: The detection results of the two algorithms on different key frame transforms.

Table 1: Feature extraction and distance detection times of the three algorithms.

Table 3: SIFT algorithm similarity detection results.


Conclusion
Image similarity detection is a hot issue in the field of multimedia information processing. Current image similarity detection methods are mainly based on global features or local features. Image similarity matching based on global features is fast but lacks robustness; matching based on local features is just the opposite. Aiming at the high computational complexity and low efficiency of local-feature-based detection methods, an algorithm for image similarity detection based on sparse coding is proposed. The algorithm improves the detection speed by sparsely coding the local features and indexing them. The experimental results show that, compared with traditional image similarity detection algorithms based on local features, the proposed algorithm can improve the detection speed while ensuring the detection accuracy. In future work, we will further combine global and local features to achieve the best balance between detection accuracy and speed.