A Method of Waypoint Selection in Aerial Images for Vision Navigation



Introduction
In recent years, research on vision navigation based on scene matching has attracted increasing attention for its accuracy and independence [1]. To ensure matching precision, a primary step is to select waypoints for scene matching from aerial images of candidate flying regions, which can be implemented via suitability analysis [2][3][4].
Several researchers have addressed the problem of suitability analysis. Yang et al. [2] used gray-based descriptors and edge-based descriptors as the input of an SVM to classify the matching suitability of the image. Jiang and Chen [5] provided a hierarchical way of selecting optimal scene matching areas. Liu et al. [6] presented a method of selecting matching areas using independent pixel numbers and variance. In [7] it was suggested that information entropy and the summation of image gradient can be used for evaluating image suitability. However, the features used in the above methods are inadequate to describe suitability, and there is redundant information among the feature descriptors. Research has shown that suitability consists of information, obviousness, stability, and uniqueness [1]. From the viewpoint of visual analysis, information and obviousness can be analyzed via visual saliency. Visual saliency is a cognitive procedure that rapidly selects a small set of highly informative or visually salient objects from a scene for further processing [8]. Stability and uniqueness are feature attributes of an image. We can therefore transform suitability analysis into a combination of visual saliency analysis and feature attribute classification.
For visual saliency analysis, in order to avoid the uncertain influence of statistical features, Li et al. [9] proposed a visual saliency analysis method based on the length of incremental coding, which builds on sparse coding [10, 11] and the center-surround (C-S) model; Han et al. [12] proposed a saliency analysis method based on the weighted length of incremental coding. Both are efficient approaches for selecting regions with local saliency, but for matching suitability the salient region should also carry unique structure information, which means it is sparse in the global image. Low-rank recovery is therefore introduced to analyze global and local saliency together with sparse coding for preparatory selection. For feature attribute classification, an SVM is used to analyze stability and uniqueness for optimal selection. This paper thus presents a practical framework for waypoint selection, as illustrated in Figure 1.
The proposed framework consists of two major components: a preparatory selection model based on visual saliency analysis and an optimal selection model based on SVM. In the first component, the initial image is decomposed into the sum of a sparse matrix and a low-rank matrix, and the saliency of each matrix is then analyzed to construct a new saliency map. The preparatory selection results are obtained with a threshold constraint and nonmaxima suppression. In the second component, an SVM is used for optimal selection, and the input vector is composed of four measure parameters based on the edge and the cross correlation surface.
The rest of the paper is organized as follows. Section 2 describes the preparatory selection model based on visual saliency analysis. Section 3 presents the optimal waypoint selection model based on feature attribute classification. Section 4 reports and analyzes the evaluation results. Finally, conclusions are drawn in Section 5.

The Preparatory Selection Based on Visual Saliency Analysis
Salient objects can be viewed as a small number of foreground regions that differ from the surrounding background. Low-rank matrix recovery is therefore introduced to separate foreground from background [13]. The low-rank matrix $L$ and the sparse matrix $S$ are recovered from the image $I$ by solving the constrained problem

$$\min_{L,S}\ \big(\operatorname{rank}(L) + \lambda\|S\|_0\big) \quad \text{s.t.} \quad I = L + S, \qquad (1)$$

where $\operatorname{rank}(\cdot)$ represents the matrix rank, $\|\cdot\|_0$ represents the $\ell_0$-norm, and $\lambda$ is the weight parameter that balances the rank of $L$ against the sparsity of $S$. Given a suitable $\lambda$, we want to recover the pair $(L, S)$. However, the problem is nonconvex because of $\operatorname{rank}(\cdot)$ and $\|\cdot\|_0$. Usually, (1) is therefore replaced by a tractable relaxation.
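A common way to relax a rank-plus-sparsity objective of this kind is principal component pursuit, which replaces the rank with the nuclear norm and the $\ell_0$-norm with the $\ell_1$-norm. The block below states that standard relaxation, writing the image as $I$, the low-rank part as $L$, and the sparse part as $S$. That the authors solve exactly this form is an assumption, since the surviving text does not show their relaxed problem.

```latex
\min_{L,S}\ \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad I = L + S,
\qquad
\|L\|_{*} = \sum_{k} \sigma_k(L),
\qquad
\|S\|_{1} = \sum_{i,j} |S_{ij}|
```

Here $\sigma_k(L)$ are the singular values of $L$. Both terms are convex, so the problem can be handled by standard solvers such as augmented Lagrangian methods.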

There are two kinds of saliency in aerial images. One carries unique information about the scene structure (shown by the green dotted line in Figure 2(c)); the other consists of small man-made areas with high brightness and no structure information (shown by the red dotted line in Figure 2(c)). The second kind of saliency resembles its surrounding objects in one of the two recovered matrices, whereas the first kind differs from its surrounding objects in both the low-rank matrix $L$ and the sparse matrix $S$. So we can separate the two kinds of saliency by local saliency analysis of $L$ and $S$, respectively, based on the center-surround (C-S) model.
Here sparse coding is introduced to describe local saliency with the C-S model. Sparse coding codes the center patch over a dictionary constructed from the surrounding patches. If the center patch is similar to its surroundings, it has sparse coefficients. Denote by $p_i$ the $i$th patch of an image $X$, and let $D(p_i) = \{p_i^1, p_i^2, \ldots, p_i^K\}$ be the dictionary consisting of its surrounding patches, with $p_i \cap p_i^j = \emptyset$. The saliency problem based on sparse coding is then

$$\alpha_i^{*} = \arg\min_{\alpha}\ \tfrac{1}{2}\|p_i - D(p_i)\alpha\|_2^2 + \mu\|\alpha\|_1,$$

where $\mu$ is the balance factor between sparsity and data integrity. The $\ell_1$-norm optimization problem can be solved efficiently by the Lasso method [11]. The local saliency of image patch $p_i$ is obtained as in (4). The process is shown in Figure 3. The saliency can be calculated for all patches of the image, which gives the saliency map $\mathrm{Sal}(X)$.
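The coding step of the C-S model can be sketched as a small Lasso solver. The following is a minimal illustration, not the authors' implementation: it assumes the objective 0.5*||center - sum_j a[j]*surround[j]||^2 + lam*||a||_1, solves it by cyclic coordinate descent, and uses hypothetical names (`code_center_patch`, `soft_threshold`, `lam`). How the coefficients are converted into a saliency value follows (4) and is not reproduced here.

```python
def soft_threshold(value, t):
    """Shrink value toward zero by t (the proximal step of the l1 penalty)."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def code_center_patch(center, surround, lam, n_sweeps=100):
    """Code a center patch over its surrounding patches (illustrative sketch).

    Minimizes 0.5*||center - sum_j a[j]*surround[j]||^2 + lam*||a||_1
    by cyclic coordinate descent. Patches are flat lists of pixel values.
    Returns the coefficients and the final reconstruction residual.
    """
    coeffs = [0.0] * len(surround)
    residual = list(center)  # center minus the current reconstruction
    for _ in range(n_sweeps):
        for j, atom in enumerate(surround):
            norm_sq = sum(v * v for v in atom)
            if norm_sq == 0.0:
                continue  # an all-zero patch cannot explain anything
            # Correlation of atom j with the residual that excludes atom j.
            rho = sum(a * r for a, r in zip(atom, residual)) + coeffs[j] * norm_sq
            updated = soft_threshold(rho, lam) / norm_sq
            delta = updated - coeffs[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                coeffs[j] = updated
    return coeffs, residual
```

When the center patch coincides with one surrounding patch, a single coefficient carries almost all of the weight and the residual is small, which matches the remark that a patch similar to its surroundings has sparse coefficients.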
We calculate the local saliency of both the low-rank matrix and the sparse matrix based on sparse coding, and the two saliency maps are then combined into a new saliency map. A threshold $T$ is used to identify possible waypoints: a region is labeled 1 if its saliency passes the threshold and 0 otherwise. Within the regions labeled 1, nonmaxima suppression is used to obtain the $N$ peaks of the saliency map, which are the centers of possible waypoints.
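The thresholding and nonmaxima suppression step can be illustrated with a short greedy sketch. It assumes the saliency map is a list of rows, that a pixel passes when its saliency is at least the threshold, and that suppression uses a square window of a given radius. The function name `select_peaks` and its parameters are hypothetical and not taken from the paper.

```python
def select_peaks(sal, threshold, radius, max_peaks):
    """Greedy nonmaxima suppression on a 2D saliency map (illustrative sketch).

    Keeps at most max_peaks positions whose saliency is >= threshold,
    visiting candidates from most to least salient and discarding any
    candidate within `radius` pixels (Chebyshev distance) of a kept peak.
    """
    candidates = [
        (sal[r][c], r, c)
        for r in range(len(sal))
        for c in range(len(sal[0]))
        if sal[r][c] >= threshold
    ]
    candidates.sort(key=lambda item: -item[0])  # strongest response first
    peaks = []
    for _, r, c in candidates:
        if len(peaks) == max_peaks:
            break
        if all(max(abs(r - pr), abs(c - pc)) > radius for pr, pc in peaks):
            peaks.append((r, c))
    return peaks
```

In practice the suppression radius would be tied to the waypoint image size so that selected waypoints do not overlap heavily; that choice is left open here.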

Optimal Selection Based on Feature Attribute Classification
It must then be judged whether the preparatory results possess stability and uniqueness. From the viewpoint of pattern recognition, this can be solved by two-class classification, so an SVM is introduced.
Here $\|\cdot\|$ is the $\ell_2$-norm of a vector, so (7) is equivalent to minimizing $\tfrac{1}{2}\|w\|^2$ under the same constraint. The decision function is

$$f(x) = \operatorname{sgn}(w^{T}x + b).$$

When the original set is not linearly separable, it is common to define a soft margin by introducing slack variables $\xi_i$ and a parameter $C > 0$:

$$\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.} \quad y_i(w^{T}x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, N.$$

The original samples can be mapped into a high-dimensional space $H$ (named the feature space) by a nonlinear transform $\Phi : R^n \to H$, and $w$ is expressed in the dual form $w = \sum_{i=1}^{N}\alpha_i y_i x_i$. The classification output can then be predicted using the decision function

$$f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{N}\alpha_i y_i K(x_i, x) + b\Big),$$

where $K(x_i, x_j)$ is the kernel function. Three forms of kernel function are considered: the radial basis function, the linear kernel, and the polynomial kernel. Preliminary research suggests that the radial basis function outperforms the others, so the radial basis function kernel in the following equation is used in the classification:

$$K(x_i, x_j) = \exp\big(-\gamma\|x_i - x_j\|^2\big). \qquad (11)$$

3.2. Feature Selection. Suitable descriptors as the input vector of the SVM can improve computational efficiency and yield better classification results. Here measure parameters based on the edge and the cross correlation surface are considered for stability and uniqueness analysis.

Stability.
Stability is an important feature attribute of an aerial image that is suitable for scene matching, so we need to select waypoints with stable features. Edge complexity and edge density are selected as the measure parameters of stability.
(1) Edge Complexity. Edge complexity is a homogeneity parameter of the edge texture distribution. The smaller it is, the smoother the image, and the more easily mismatching occurs. Edge complexity is calculated over $\Gamma(x, y)$, a local neighborhood centered at $(x, y)$, from the 1-dimensional derivative and the 2-dimensional derivative of the image.
(2) Edge Density. Edge density shows the concentration of features in the original image. It is computed by

$$ED(x, y) = \frac{N_{\text{EdgePixel}}(x, y)}{N(x, y)},$$

where $N_{\text{EdgePixel}}(x, y)$ is the number of edge points in the neighborhood centered at $(x, y)$ and $N(x, y)$ is the number of pixels in the neighborhood.
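As a small illustration of the edge density measure, the sketch below counts edge pixels in a square neighborhood of a binary edge map and divides by the number of pixels in that neighborhood. The edge detector that produces the map, the window shape, and the clipping at the image border are assumptions, and the name `edge_density` is hypothetical.

```python
def edge_density(edge_map, cy, cx, half):
    """Fraction of edge pixels in the window centered at (cy, cx).

    edge_map is a list of rows holding truthy values at edge pixels.
    The window spans `half` pixels on each side of the center and is
    clipped at the image border (an assumption made for this sketch).
    """
    rows = range(max(0, cy - half), min(len(edge_map), cy + half + 1))
    cols = range(max(0, cx - half), min(len(edge_map[0]), cx + half + 1))
    n_total = len(rows) * len(cols)
    n_edge = sum(1 for r in rows for c in cols if edge_map[r][c])
    return n_edge / n_total
```

For a 3 x 3 window crossed by a single vertical edge line, three of the nine pixels are edge pixels, so the density is one third.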

Uniqueness.
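The global uniqueness of waypoints is analyzed to avoid selecting areas of repeated scenery. Uniqueness is determined by statistical features of the cross correlation plane. The cross correlation plane $C$ is computed pixel by pixel over the whole image $I$ by matching the waypoint image $W$:

$$C(u, v) = \frac{\sum_{x,y}\big[I(u+x, v+y) - \bar{I}_{u,v}\big]\big[W(x, y) - \bar{W}\big]}{\sqrt{\sum_{x,y}\big[I(u+x, v+y) - \bar{I}_{u,v}\big]^2\,\sum_{x,y}\big[W(x, y) - \bar{W}\big]^2}},$$

where $\bar{W}$ is the mean of $W$ and $\bar{I}_{u,v}$ is the mean of an area in $I$ with the same size $m \times n$ as $W$. We use two features of the cross correlation, Submaxratio and Ngb8maxratio.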
(1) Submaxratio. Submaxratio denotes the ratio of the secondary maximum peak to the maximum peak, which is computed using

$$SMR = \frac{V_{\text{sub}}}{V_{\text{max}}},$$

where $V_{\text{sub}}$ is the secondary highest correlation peak and $V_{\text{max}}$ is the maximal correlation peak. The closer the value of SMR is to zero, the better the uniqueness of the waypoint.
(2) Ngb8maxratio. Ngb8maxratio represents the ratio of the maximum of the eight neighbor peaks to the maximum peak, which is computed using

$$NMR = \frac{V_{\text{ngb}}}{V_{\text{max}}},$$

where $V_{\text{ngb}}$ is the maximum of the eight neighbor peaks and $V_{\text{max}}$ is the maximal correlation peak.
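Both ratios can be read off a correlation surface once the main peak is located. The sketch below is one possible reading, not the authors' code: it assumes the eight neighbor values are the 8-connected neighbors of the main peak, and that the secondary peak is the highest local maximum outside that 3 x 3 neighborhood. The name `correlation_ratios` is hypothetical, and the surface is assumed to hold more than one pixel.

```python
def correlation_ratios(surface):
    """Return (SMR, NMR) for a 2D correlation surface given as a list of rows."""
    rows, cols = len(surface), len(surface[0])

    def neighbors(r, c):
        # In-bounds 8-connected neighbors of (r, c).
        return [
            (r + dr, c + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
        ]

    # Main peak: the global maximum of the surface and its position.
    v_max, pr, pc = max(
        (surface[r][c], r, c) for r in range(rows) for c in range(cols)
    )
    if v_max <= 0:
        raise ValueError("the maximal correlation peak must be positive")

    # Largest value among the eight neighbors of the main peak.
    v_ngb = max(surface[r][c] for r, c in neighbors(pr, pc))

    # Assumption: the secondary peak is the highest local maximum lying
    # outside the 3x3 neighborhood of the main peak.
    v_sub = 0.0
    for r in range(rows):
        for c in range(cols):
            if abs(r - pr) <= 1 and abs(c - pc) <= 1:
                continue
            value = surface[r][c]
            if all(value >= surface[nr][nc] for nr, nc in neighbors(r, c)):
                v_sub = max(v_sub, value)

    return v_sub / v_max, v_ngb / v_max
```

A sharp, isolated main peak gives small values for both ratios, while repeated scenery raises SMR and a broad peak raises NMR.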

Training and Classifying.
We select 100 sample images as a training set and label each image manually. Fifty images are waypoints and the other 50 are nonwaypoints, as shown in Figure 4.
Each image is described by the measure parameter vector based on the edge and the cross correlation surface, so the vector In = [EC, ED, SMR, NMR] is used as the input for the SVM. The feature vectors should be normalized before training. The best parameters $C = 6.8$ and $\gamma = 0.0769$ for SVM classification using (11) are obtained via training. In testing, each preparatory result is described as In, normalized, and fed to the SVM to be classified as suitable or unsuitable.
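The classification step can be sketched as feature normalization followed by evaluation of an RBF-kernel decision function. The sketch makes several assumptions that the text does not confirm: min-max scaling as the normalization, the kernel form exp(-gamma * ||a - b||^2), and 0.0769 being the kernel parameter. The support vectors, coefficients, and function names are hypothetical placeholders rather than values from a trained model.

```python
import math


def min_max_normalize(vectors):
    """Scale each feature to [0, 1] using the ranges found in `vectors`.

    Returns the scaled vectors and a function that applies the same
    scaling to new vectors (for example, test samples).
    """
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]

    def scale(v):
        return [
            (v[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        ]

    return [scale(v) for v in vectors], scale


def rbf_kernel(a, b, gamma):
    """Radial basis function kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def svm_decide(x, support_vectors, dual_coefs, bias, gamma):
    """Sign of sum_i coef_i * K(sv_i, x) + bias, where coef_i = alpha_i * y_i.

    Returns 1 (waypoint) or -1 (nonwaypoint).
    """
    score = bias + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return 1 if score >= 0 else -1
```

A typical use would normalize the training vectors [EC, ED, SMR, NMR], reuse the returned scaling function on each candidate, and pass the result to `svm_decide` together with the support vectors and coefficients produced by training.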

Experiments
To our knowledge, no automatic waypoint selection method has been reported so far, so there is no public dataset for method validation. To evaluate the performance of our method, experiments are conducted on aerial images from Google Earth. Each image pair shows the same scene taken at different times, as shown in Figure 5. One image is used as the reference image for waypoint selection, and the other is used as the sensed image to verify the suitability of the selected waypoints. For simplicity, the size of the reference image and the sensed image is set to 223 × 223 and the size of the waypoint image is set to 51 × 51 in all experiments.
There are two kinds of reference images. One, called Class 1, contains parts with unique scene structure information that can be selected as waypoints. The other, called Class 2, contains no unique scene structure information at all, as shown in Figure 6. There are 150 reference images of each kind.
Several waypoints can be selected from one reference image, so the number of waypoints is larger than the number of references. Some results for Class 1 are shown in Figure 7. In order to reduce the complexity of analysis, ... of scene structure. Our method can extract the areas with salient structure information and effectively inhibit the disturbance of brightness; for example, in the fourth row of results, only the traffic intersections are extracted because of their structure information. Therefore, the number of waypoint candidates is smaller than in the results of the former methods. When we analyzed the reference images from Class 2 in the preparatory selection, salient regions were still found in them, as shown in Figure 8. We note that the methods are ineffective in analyzing the saliency of the Class 2 references. This is because of the normalization of the saliency coefficients. The results in Class 2 are not suitable as waypoints, so we need to separate the results in Class 1 from the results in Class 2 with the SVM.

Optimal Selection.
The SVM is used to optimally select waypoints by classifying the feature attributes. To evaluate the classification results, the cross correlation matching method is used for verification (a match is considered correct when the matching error is smaller than 5 pixels). The result is shown in Table 2.
There are two kinds of mistakes in the process of classification. One is called "undetected," which means a waypoint is classified as a nonwaypoint. The other is called "false detected," which means a nonwaypoint is classified as a waypoint. The former error is tolerable, but the latter is fatal for vision navigation, so the latter should be avoided or reduced as much as possible. The classification rate is shown in Table 3.
From Table 3, we see that although there are many waypoints in the images of Class 1, false detection still exists because the scenes change with time. Undetected mistakes are caused by the difference between the training samples and the testing samples. There is no undetected mistake in Class 2, because there is no waypoint in the reference images of Class 2.
For comparison, the algorithms of [2, 5] are used for waypoint selection. Random sampling was carried out on the sets of Class 1 and Class 2 with the same number of waypoints as in Table 1. The sampling was repeated 1000 times, and the result is shown in Table 4.
We can see that, in the analysis of Class 1, our method is better than the other two methods, and, in the analysis of Class 2, our method is better than [2] and almost the same as [5]. This is because the threshold in [5] is set manually.
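The two kinds of mistakes can be counted directly from verified and predicted labels. The sketch below assumes labels of 1 for waypoint and -1 for nonwaypoint. The function name `count_errors` and the returned keys are hypothetical, and the exact rate definitions used in the tables are not reproduced here.

```python
def count_errors(true_labels, predicted_labels):
    """Count "undetected" and "false detected" mistakes.

    Labels are 1 for waypoint and -1 for nonwaypoint (an assumption).
    "undetected": a true waypoint classified as a nonwaypoint.
    "false_detected": a true nonwaypoint classified as a waypoint.
    """
    pairs = list(zip(true_labels, predicted_labels))
    undetected = sum(1 for t, p in pairs if t == 1 and p == -1)
    false_detected = sum(1 for t, p in pairs if t == -1 and p == 1)
    return {
        "undetected": undetected,
        "false_detected": false_detected,
        "total": len(pairs),
    }
```

Because a false detection is the costly error for navigation, a decision threshold could be tuned to trade a few extra undetected waypoints for fewer false detections.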

Conclusions
A method of waypoint selection was proposed in this paper, which first selects salient areas as candidate waypoints and then classifies the candidates based on their feature attributes. The method combines visual saliency analysis with feature attribute classification and, in particular, avoids the interference of small man-made areas with high brightness where there is no structure information.
The sensed image and the reference image are both from Google Earth, so the suitability analysis depends only on the original reference image itself and does not consider the matching condition under geometrical transformations such as image scaling and rotation. In the next stage, we plan to extend this work along the following directions. Firstly, the matching condition will be considered in the suitability analysis, and more aerial images taken under real flying conditions will be used to test the validity of the approach. Secondly, we will incorporate more powerful features or improve the saliency analysis model and the classification model to improve the effectiveness of the method.

Figure 7: Results of waypoints in Class 1 (the first row shows two reference images; the second row shows the results of Li et al. [9]; the third row shows the results of Han et al. [12]; the fourth row shows ours).
3.1. Introduction of SVM. SVM is strongly connected to the underlying statistical learning theory, in which it implements structural risk minimization for solving the two-class classification problem [2]. SVM has advantages in problems with small samples, high dimensions, and large scale. Given a training sample set $\{(x_i, y_i)\}_{i=1}^{N}$ of labeled examples, with each input $x_i \in R^n$ and output label $y_i \in \{-1, 1\}$, $N$ is the number of training samples. The best hyperplane $w^{T}x + b = 0$ ($b$ is a constant) separating the two classes is obtained by satisfying the constraint $y_i(w^{T}x_i + b) \ge 1$, $i = 1, \ldots, N$.

Table 1: Result of preparatory selection.

Table 2: Results of optimal selection by SVM.

Table 4: The comparison of results.