Maximum Entropy Threshold Segmentation for Target Matching Using Speeded-Up Robust Features



Introduction
In recent decades, image target matching has not only played a significant role in many research fields, such as computer vision and digital image processing [1], but has also been widely used in a variety of military and civil applications [2], such as image target detection, autonomous navigation, 3-dimensional reconstruction, target and scene recognition, and visual positioning and tracking. Image target matching involves two main categories of matching algorithms: gray correlation-based algorithms [3] and feature-based algorithms [4]. Gray correlation-based algorithms compute image similarities and search for their extreme values using the optimal parameters of a transformation model. Feature-based algorithms, in contrast, rely mainly on matching feature parameters extracted from the images (e.g., points, lines, and surfaces). Under slight gray and geometric distortion, gray correlation-based algorithms, although computationally expensive, normally outperform feature-based algorithms in terms of accuracy, robustness, and antinoise ability. Under serious distortion, however, feature-based algorithms are much preferred due to their lower false matching rates and better robustness against gray changes, image deformation, and occlusion.
The basic motivation for addressing 2D maximum entropy threshold segmentation (2DMETS) based SURF in this paper is to further improve cost efficiency and matching accuracy. In concrete terms, owing to the smaller descriptors computed on integral images (each descriptor in SURF contains only 64 bins, half the size of a SIFT descriptor), 2DMETS based SURF requires a lower computation cost for detecting and matching feature points than the conventional SIFT [5]. Two main steps are involved in 2DMETS based SURF: (i) constructing a 2D gray histogram and performing 2DMETS on it and (ii) conducting feature point searching and target matching by SURF.
The rest of this paper is structured as follows. Section 2 reviews related work. Section 3 discusses the detailed steps of 2DMETS based SURF. Experimental results are provided in Section 4. Finally, Section 5 concludes this paper and presents some future directions.

Related Work
As the first representative work on image target matching, the authors in [6] proposed a cross correlation algorithm to match targets in remote multispectral and multitemporal images by using the fast Fourier transform. The sequential similarity detection algorithm (SSDA), also addressed in [6], can not only effectively eliminate unmatched points but also remarkably reduce the cost of image matching. Rosenfeld and Kak in [7] used a new concept of cross-correlation-based target matching which relies on the similarities of gray areas in different templates. Subsequent work studied mutual information-based medical image target matching. By conducting normalization of rotation and translation to obtain affine invariants, SIFT [11] was proved to perform well with respect to image rotation, transformation, and zooming [12]. One of the most popular ways to represent local features, the histogram of gradient locations and orientations, was introduced in [13].
In recent decades, many institutes and universities have proposed a variety of enhanced approaches for image target matching, such as principal component analysis-based SIFT (PCA-SIFT), Harris-SIFT, affine SIFT (ASIFT), shape SIFT (SSIFT), and speeded-up robust features (SURF).
The descriptors in PCA-SIFT effectively reduce the number and dimensionality of feature points. In concrete terms, PCA-SIFT encodes the salient aspects of the image gradients in the neighborhood of each feature point and then normalizes the gradient patches by the PCA approach [14]. Harris-SIFT relies on Harris operators to extract feature points and calculate descriptors [5]. Two camera axis parameters, the latitude angle and the longitude angle, are considered in ASIFT [15]. Based on the global shape context, SSIFT was applied in [16] to recognize Chinese characters in images contaminated by complex surroundings. By using integral images for the image convolution, SURF requires only a small number of histograms to quantize the gradient orientations [17]. Sergieh et al. in [18] studied how to reduce the number of features required by SURF while preserving high correct-matching performance. Zhang and Hu in [3] derived Fast-Hessian detectors for SURF from the features from accelerated segment test (FAST) corner detector. Kai et al. in [19] proposed a normalized SURF to reduce the influence of large differences between images on target matching. Juan and Gwun in [20] focused on panorama image stitching by integrating SURF and multiband blending. The abovementioned algorithms fail to carefully consider the interference of background noise and edge pixels with image target matching. To address this problem, we propose 2DMETS based SURF in this paper, which can be simply recognized as an integration of 2DMETS and SURF.

Steps of 2DMETS based SURF
3.1. Flow Chart. In 2DMETS based SURF, we first construct a 2D gray histogram based on the gray level of each pixel and the average gray level of its 8 neighboring pixels (its neighborhood). Then, we conduct image segmentation to mitigate the interference from background noise and edge pixels. Finally, we use SURF to conduct the target matching. The flow chart of 2DMETS based SURF for image matching is shown in Figure 1.

3.2. Gray Histogram Construction.
For each raw image (with L gray levels), the gray level s of each pixel (s = 0, ..., L − 1) and the average gray level t of its neighborhood (t = 0, ..., L − 1) form a pair of gray levels [s, t]. We can calculate the probability p(s, t) of each gray level pair (0 ≤ p(s, t) ≤ 1) by

p(s, t) = f(s, t) / (M × N),

where f(s, t) denotes the frequency of the pair [s, t], and M and N stand for the numbers of pixels in the horizontal and vertical directions, respectively, in the raw image. The gray histogram constructed in this paper can then be recognized as a 2D histogram consisting of the frequencies of the gray level pairs.
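The histogram construction above can be sketched as follows (a minimal Python illustration; the function name gray_pair_histogram, the edge-replication padding at the image border, and the 256-level input range are our assumptions, not details fixed by the paper):

```python
import numpy as np

def gray_pair_histogram(img, levels=16):
    """Build the normalized 2D gray histogram p(s, t) used by 2DMETS.

    s is the quantized gray level of each pixel, t the quantized average
    gray level of its 8-neighborhood; p(s, t) is the frequency f(s, t)
    divided by the pixel count M * N.
    """
    img = np.asarray(img, dtype=np.float64)
    M, N = img.shape
    # Average of the 8 neighbors, replicating edge pixels at the border.
    padded = np.pad(img, 1, mode="edge")
    neigh_sum = np.zeros_like(img)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neigh_sum += padded[1 + di:1 + di + M, 1 + dj:1 + dj + N]
    neigh_avg = neigh_sum / 8.0
    # Quantize both channels to `levels` gray levels (0 .. levels - 1).
    s = np.clip((img * levels / 256.0).astype(int), 0, levels - 1)
    t = np.clip((neigh_avg * levels / 256.0).astype(int), 0, levels - 1)
    hist = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(hist, (s, t), 1.0)  # count each (s, t) pair
    return hist / (M * N)         # p(s, t), summing to 1
```

With levels = 16 this yields the 16 × 16 = 256 gray level pairs used in the experiments.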

3.3. Optimization of Maximum Entropy
The total entropy H(s, t) for the target and background regions can be obtained by

H(s, t) = H_O(s, t) + H_B(s, t),

where H_O(s, t) and H_B(s, t) denote the entropies of the target and background regions, respectively. We select the pair [s*, t*] that yields the largest total entropy as the optimal maximum entropy threshold, such that

(s*, t*) = arg max_{0 ≤ s, t ≤ L−1} H(s, t).
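The exhaustive search for [s*, t*] can be sketched as follows (a hypothetical Python illustration assuming, as in the usual 2D maximum entropy formulation, that the target region is the quadrant [0..s, 0..t] of the histogram and the background the quadrant [s+1.., t+1..]; the function name max_entropy_threshold is ours):

```python
import numpy as np

def max_entropy_threshold(p):
    """Return the gray level pair (s*, t*) maximizing H(s, t) = H_O + H_B.

    p is the normalized 2D gray histogram; probabilities inside each
    region are renormalized by the region's total mass before taking
    the Shannon entropy.
    """
    L = p.shape[0]
    eps = 1e-12
    best, best_st = -np.inf, (0, 0)
    for s in range(L - 1):
        for t in range(L - 1):
            obj = p[:s + 1, :t + 1]   # target quadrant
            bkg = p[s + 1:, t + 1:]   # background quadrant
            P_o, P_b = obj.sum(), bkg.sum()
            if P_o < eps or P_b < eps:
                continue  # degenerate split, no entropy defined
            q_o = obj[obj > 0] / P_o
            q_b = bkg[bkg > 0] / P_b
            H = -(q_o * np.log(q_o)).sum() - (q_b * np.log(q_b)).sum()
            if H > best:
                best, best_st = H, (s, t)
    return best_st
```

The double loop over all (s, t) pairs is the dominant cost of 2DMETS; with L = 16 it is negligible next to the SURF stage.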
3.4. Feature Point Determination. The three main steps involved in the determination of feature points are integral image construction, interest point detection, and Gaussian scale approximation. After the Gaussian scales have been approximated, all the interest points can be detected. As the final step of feature point determination, we compare each interest point with its 26 neighbors in the 3 × 3 regions at the current and adjacent scales by the nonmaximum suppression approach and then localize the feature points at the interest points that have locally maximal or minimal box filter responses.
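A minimal sketch of the first and last of these steps (the function names and the toy 3 × 3 × 3 response cube are our own illustration, not from the paper): an integral image makes any box filter response computable in constant time, and a candidate survives nonmaximum suppression only if it is the extremum of its 26 neighbors.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero first row and column, so the sum of
    any axis-aligned rectangle costs four lookups, whatever its size."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1+1, x0:x1+1] in O(1) via the integral image."""
    return ii[y1 + 1, x1 + 1] - ii[y0, x1 + 1] - ii[y1 + 1, x0] + ii[y0, x0]

def is_scale_space_extremum(cube):
    """cube: 3 x 3 x 3 box filter responses (scale, y, x). The centre is
    kept as a feature point candidate only if it is strictly larger or
    strictly smaller than all 26 neighbours (nonmaximum suppression)."""
    c = cube[1, 1, 1]
    others = np.delete(cube.ravel(), 13)  # drop the centre itself
    return bool(c > others.max() or c < others.min())
```

Constant-time box sums are what let SURF evaluate its box-filter approximation of the Hessian at every scale without rescaling the image.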

3.5. Calculation of SURF Descriptor.
To guarantee rotation invariance, each feature point is assigned a reproducible orientation. Assuming a feature point is found at scale s, Haar wavelet responses of size 4s can be obtained for the neighboring pixels within a radius of 6s. The Haar wavelet responses are weighted by a Gaussian with σ = 2s and then represented as points in a space centered at the feature point. The longest orientation vector is selected as the dominant orientation assigned to the descriptor.
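Assuming the standard SURF convention of summing the responses inside a window of size π/3 that slides around the circle (a detail not spelled out in this paper), the dominant-orientation selection can be sketched as follows; the function name and the π/18 window step are our own choices:

```python
import numpy as np

def dominant_orientation(dx, dy, angles):
    """Pick the dominant orientation of a feature point from the
    Gaussian-weighted Haar responses (dx, dy) of the samples inside the
    radius-6s circle, given each sample's response angle.  A pi/3 window
    slides around the circle; the window whose summed response vector is
    longest defines the orientation."""
    best_len, best_angle = -1.0, 0.0
    for start in np.arange(0.0, 2.0 * np.pi, np.pi / 18):
        offset = np.mod(angles - start, 2.0 * np.pi)
        mask = offset < np.pi / 3          # samples inside this window
        sx, sy = dx[mask].sum(), dy[mask].sum()
        length = sx * sx + sy * sy         # squared vector length
        if length > best_len:
            best_len, best_angle = length, np.arctan2(sy, sx)
    return best_angle
```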

3.6. Target Matching.
We adopt the Euclidean distance to evaluate the similarity of every two normalized SURF descriptors, as described by

d(X_i, Y_j) = ||X_i − Y_j||,

where X_i and Y_j stand for the ith and jth normalized SURF descriptors in the two different images. We calculate the Euclidean distances from each feature point in one of the two images to its first nearest neighbor (1st NN) and second nearest neighbor (2nd NN) in the other image. A match is accepted when the ratio d(1st NN)/d(2nd NN) is smaller than a given threshold. In our experiments, we set the threshold to 0.75. A smaller threshold results in a smaller number of matching points between the two images.
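The nearest-neighbor ratio test can be sketched as follows (a minimal numpy illustration; ratio_test_match is a hypothetical name, and the brute-force distance computation stands in for whatever search structure an implementation would actually use):

```python
import numpy as np

def ratio_test_match(desc1, desc2, threshold=0.75):
    """Match SURF descriptors between two images with the
    nearest-neighbour ratio test: a feature in image 1 is matched to its
    1st NN in image 2 only when d(1st NN) / d(2nd NN) < threshold."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)  # Euclidean distances
        j1, j2 = np.argsort(dists)[:2]             # 1st and 2nd NN
        if dists[j1] < threshold * dists[j2]:
            matches.append((i, j1))
    return matches
```

A smaller threshold makes the test stricter, yielding fewer but more reliable matches.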

4.1. Image Description.
Four groups of images are selected for testing: (i) group 1 (Figure 2): indoor short-distance images containing one target, with slight differences in illumination intensity and angle rotation; (ii) group 2 (Figure 3): indoor short-distance images containing multiple targets, with similar illumination intensity but slight differences in angle rotation; (iii) group 3 (Figure 4): outdoor long-distance images with great differences in angle rotation (this group of images is also used in [13, 14]); and (iv) group 4 (Figure 5): image 1 is from the SOSO street view [21], while image 2 was taken with a SONY L26i cellphone. The interference of background noise (e.g., passing pedestrians) in this group is more significant than in the previous three groups.

4.2. Matching Results.
First of all, we apply Otsu segmentation and 2DMETS to transform the raw images into black-and-white images on a uniform gray scale to mitigate the interference from background noise and edge pixels, as shown in Figures 2-5. By setting L = 16, we have 16 × 16 = 256 pairs of gray levels, represented on the horizontal coordinates of the gray histogram, while the vertical coordinates stand for the frequencies of the gray level pairs. Figures 2(a), 3(a), 4(a), and 5(a) show the segmentation results by Otsu, while Figures 2(b), 3(b), 4(b), and 5(b) show the results by 2DMETS. Then, we define Repeatability as the ratio between the number of correspondences and the minimal number of feature points:

Repeatability = correspondence / min(feature_img1, feature_img2),

where feature_img1 and feature_img2 stand for the numbers of feature points in the two different images, respectively. A higher Repeatability indicates that the targets are more likely to be matched.
As can be seen from Figure 10, (i) the targets are very likely to be matched by 2DMETS based SURF due to the high Repeatability achieved; (ii) 2DMETS has only a slight influence on the Match Score of SURF; (iii) our proposed 2DMETS based SURF performs best in terms of Correct Matching Rate; and (iv) although a little extra time is required by 2DMETS processing, real-time capability is still guaranteed by the proposed 2DMETS based SURF.

Conclusion
The novel 2DMETS based SURF proposed in this paper is proved to perform well in accuracy and computation cost for image target matching. Compared to the conventional SIFT, SURF, Otsu based SIFT, Otsu based SURF, and the enhanced 2DMETS based SIFT, it effectively improves the Correct Matching Rate without a significant loss of real-time capability, which is an important advantage for time-efficient image processing applications. However, this paper mainly focuses on target matching between gray images. In the future, we will pay more attention to the design of accurate and cost-efficient image target matching approaches for color images.

Figure 6: Target matching for images in group 1.

Figure 7: Target matching for images in group 2.

Figure 8: Target matching for images in group 3.
Second, Figures 6, 7, 8, and 9 show the results of target matching by using SIFT, SURF, Otsu based SIFT, Otsu based SURF, 2DMETS based SIFT, and 2DMETS based SURF for each group of images. Last, the matching performance is compared in Tables 1, 2, 3, and 4.

4.3. Result Discussion

4.3.1. Repeatability. After the affine transformation, if a pair of feature points is located at the same target in the two different images, a correspondence occurs.

Figure 9: Target matching for images in group 4.

Figure 10: Repeatability, Match Score, Correct Matching Rate, and matching time for each group of images.

Table 1: Matching performance for images in group 1.
4.3.2. Match Score. Match Score is defined as the ratio between the number of correct matches and the value min(feature_img1, feature_img2):

Match Score = correct matches / min(feature_img1, feature_img2).

Obviously, a higher Match Score means that the targets are more likely to be matched correctly.

4.3.3. Correct Matching Rate. We use Correct Matching Rate to examine the probability of the targets being matched correctly. Correct Matching Rate is defined as the ratio between the number of correct matches and the total number of matches.
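The evaluation metrics of Section 4.3 reduce to simple ratios; a small helper (the function and argument names are our own) makes the three definitions concrete:

```python
def matching_metrics(n_correspondences, n_correct, n_total_matches,
                     n_feat_img1, n_feat_img2):
    """Repeatability, Match Score, and Correct Matching Rate as defined
    in Section 4.3; all three are ratios in [0, 1]."""
    denom = min(n_feat_img1, n_feat_img2)
    repeatability = n_correspondences / denom       # 4.3.1
    match_score = n_correct / denom                 # 4.3.2
    correct_matching_rate = n_correct / n_total_matches  # 4.3.3
    return repeatability, match_score, correct_matching_rate
```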

Table 2: Matching performance for images in group 2.

Table 3: Matching performance for images in group 3.

Table 4: Matching performance for images in group 4.