Feature Selection for Object-Based Classification of High-Resolution Remote Sensing Images Based on the Combination of a Genetic Algorithm and Tabu Search

In object-based image analysis of high-resolution images, the number of features can reach hundreds, so it is necessary to perform feature reduction prior to classification. In this paper, a feature selection method based on the combination of a genetic algorithm (GA) and tabu search (TS) is presented. The proposed GATS method aims to reduce the premature convergence of the GA by the use of TS. A prematurity index is first defined to judge the convergence situation during the search. When premature convergence does take place, an improved mutation operator is executed, in which TS is performed on individuals with higher fitness values. As for the other individuals with lower fitness values, mutation with a higher probability is carried out. Experiments using the proposed GATS feature selection method and three other methods, a standard GA, the multistart TS method, and ReliefF, were conducted on WorldView-2 and QuickBird images. The experimental results showed that the proposed method outperforms the other methods in terms of the final classification accuracy.


Introduction
With the development of satellite remote sensing technologies, more and more high spatial resolution images are becoming available. High spatial resolution images have been widely and successfully utilized in land-cover classification [1]. As smaller-scale ground objects can be identified and more detailed information can be obtained from high-resolution images, the traditional pixel-based image analysis methods cannot satisfy the classification demands of high-resolution remote sensing images, because of their low accuracy and insufficient utilization of the rich information [2]. In object-based classification approaches, by grouping pixels together with a specific method, the images are segmented into homogeneous regions named "objects," which can provide not only spectral information, but also texture features, shape features, and neighboring relationships for the classification [3]. Therefore, in high-resolution image classification, it is reasonable to use object-based methods instead of pixel-based methods. A large number of studies have compared pixel-based and object-based classification techniques, and it can be concluded that the classification accuracies obtained by the object-based methods are higher than those obtained by the pixel-based methods [4].
The features extracted from image objects are of much higher dimension than those of pixels, which mainly contain spectral-based information (e.g., mean, ratio, and standard deviation) [5]. In object-based classification, hundreds of features involving spectral, geometry, and texture information can be obtained from the image objects. However, large numbers of features participating in classification give rise to the "curse of dimensionality," which decreases the classification accuracy. As some features make contributions to the classification and others have less influence on the result, features are commonly divided into relevant features, redundant features, and irrelevant features [2]. To yield better classification results, the irrelevant information should be removed as much as possible, and the utilization of relevant information should be maximized. Therefore, feature selection prior to the object-based classification of high-resolution remote sensing images is a prerequisite. After the redundant and irrelevant features are removed, the training time is reduced and the classification efficiency can be improved [6].
The task of feature selection is to obtain the optimal feature subset that achieves a similar or better classification quality than using all the features [6]. Various approaches have been put forward for feature selection, including the branch and bound method [7], the sequential forward selection (SFS) method [8], the sequential backward selection (SBS) method [9], and ReliefF [10]. However, a variety of problems still exist in these methods, including high computational complexity, monotonicity of the objective function, and insufficient consideration of the correlations between features. The feature selection process is actually a kind of combinatorial optimization problem; therefore, intelligent search algorithms can be used to solve it. Evolutionary computation (EC) methods have recently been applied to the feature selection problem with much success [11]. For example, as a global optimization method developed from the genetic process of natural selection, genetic algorithms (GAs) have been widely used in feature selection studies [12][13][14] because of their robustness and fast searching speed. With mutual information as the evaluation function, Huang et al. [15] searched for the optimal feature subset using a GA, considering the correlations not only between candidate features and classes, but also between candidate features and selected features. Yan et al. [16] carried out feature selection using an adaptive GA, in which the probabilities of crossover and mutation for each individual depend on its fitness value. In this way, superior individuals can be found and the convergence speed is increased. However, in terms of population diversity, there is still room for improvement. Therefore, further study and improvement of the use of GAs for feature selection are necessary.
Tabu search (TS) is also a typical way to solve optimization problems [17], and it has been performed successfully in the combinatorial optimization field. Danenas et al. [18] developed a credit risk evaluation method based on TS and the correlation-based feature subset evaluator. Based on TS and variable neighborhood search, Sicilia et al. [19] presented an optimization algorithm to solve the problem of vehicle routing in urban areas. Moreover, combinations of TS and other approaches have been proved to be able to solve different optimization problems. For example, by combining the advantages of simulated annealing (SA) and TS, Katsigiannis et al. [20] presented a hybrid method named SA-TS to solve the optimal sizing problem of autonomous power systems. Shen et al. [21] proposed a gene selection method based on a combination of particle swarm optimization (PSO) and TS for tumor classification. A continuous tabu simplex search (CTSS) method was developed by Chelouah and Siarry [22] by the combination of TS and the Nelder-Mead simplex approach to solve global optimization problems in multiminima functions.
In the work of El Ferchichi et al. [23], both a GA and TS were used to select optimal feature subsets based on data from a transport system, and it was found that each method had its own advantages in terms of processing speed or dimensionality reduction. An investigation of the performance of the GA and TS shows that the weaknesses of TS are its dependence on the initial solutions and its slow speed of convergence, while the problem with the GA is premature convergence. However, in theory, the GA could provide good initial solutions for TS and, in return, the characteristics of TS could help the GA to escape from premature convergence. Consequently, a novel feature selection method named GATS, based on the combination of a GA and TS, is proposed in this paper.
The rest of this paper is organized as follows. In Section 2, the main principle of the proposed GATS method is introduced. The implementation of the GATS method is detailed in Section 3. The experimental results and discussions are provided in Section 4. Finally, the conclusions are drawn in Section 5.

Introduction to GATS
2.1. Overview of the Genetic Algorithm (GA). As a random heuristic search algorithm inspired by the laws of natural evolution, the GA was first proposed by Holland in 1975 [24]. To solve a problem with a GA, the first step is to establish the initial population. Each member of the initial population is called an "individual" (or chromosome), corresponding to a solution to the problem [19]. Commonly, fitness is used to represent a chromosome's adaptability to the environment, so each chromosome is evaluated by a certain objective function [25]. A selection operation is then carried out to pick the individuals with higher fitness values, which are used to generate new offspring [26, 27]. After this, crossover is an essential step that produces new individuals by recombining the selected parent chromosomes at a random crossover point with a specific probability. Finally, a mutation operation is implemented with a relatively small probability, which can reduce the occurrence of local optima by randomly replacing one or more genes of the current chromosomes [13, 28]. The crossover and mutation operators of the GA are illustrated in Figure 1.
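The three operators described above can be sketched as follows (a minimal Python illustration; the function names, the probabilities, and the single-point crossover variant are illustrative assumptions, not the exact implementation used in this paper):

```python
import random

def roulette_select(population, fitnesses):
    """Selection: pick an individual with probability proportional to fitness."""
    r = random.uniform(0, sum(fitnesses))
    acc = 0.0
    for indiv, f in zip(population, fitnesses):
        acc += f
        if r <= acc:
            return indiv
    return population[-1]  # guard against floating-point round-off

def crossover(parent_a, parent_b, p_cross=0.8):
    """Crossover: recombine two binary chromosomes at a random point."""
    if random.random() < p_cross:
        point = random.randint(1, len(parent_a) - 1)  # random crossover point
        return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]
    return parent_a[:], parent_b[:]

def mutate(chromosome, p_mut=0.05):
    """Mutation: flip each gene independently with a small probability."""
    return [1 - g if random.random() < p_mut else g for g in chromosome]
```

Chromosomes are represented here as Python lists of 0/1 genes, matching the binary coding scheme described later in the paper.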

Overview of Tabu Search (TS).
As a metasearch strategy first put forward by Glover [17, 29], the TS algorithm has been widely used to solve combinatorial optimization problems. Starting with an initial feasible solution x, TS searches the neighborhood solutions of x generated by neighborhood moves (explained later). The best solution so far, x_best, is initially assigned the value of x. Supposing that x' is the best among the neighborhood solutions, the value of x is replaced with the value of x' in two cases: x' is not included in the tabu list (explained later); or x' is included in the tabu list but satisfies the aspiration criterion (explained later). At the same time, if the new solution x is superior to x_best, the value of x_best is replaced with the value of x. The move from x to x' is then recorded in the tabu list, which means that this move is forbidden for a certain number of iterations. The neighborhood search continues from the new feasible solution x. This whole procedure is executed iteratively until the stopping condition is satisfied. After the iterative process has ended, the best solution so far, x_best, is the final optimal solution provided by the TS method [30].
In this section, we explain the key elements of the TS method mentioned above: the neighborhood moves, the tabu list, and the aspiration criterion. Commonly, neighborhood moves can be realized in several ways, for example, increasing or decreasing the values of the chromosome genes by one, or exchanging the positions of two genes belonging to the same chromosome. In this paper, the position exchange pattern is adopted, as shown in Figure 2. The tabu list is a kind of short-term memory table used to store the latest neighborhood moves, which are forbidden for T (the length of the tabu list) iterations. In this way, local optima can be effectively avoided. Generally, the first-in, first-out (FIFO) strategy is used to update the elements in the tabu list, which means that after T iterations an element is released from the tabu list and the tabu property for this move is removed. However, once a move included in the tabu list leads to a better solution than x_best, the tabu property is ignored, because the move has led the search to the best solution so far. Accordingly, its solution replaces both the current solution and the best solution so far, x_best. This is the so-called aspiration criterion. On the one hand, it can help prevent the loss of superior solutions during the iterations; on the other hand, it can encourage movement into unexplored solution fields to further realize the global search [31].
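The search loop described above can be sketched as follows (a minimal Python illustration, written here for fitness maximization; the generic `neighbors` callback and the parameter names are assumptions, and the paper's position-exchange moves would be one possible neighborhood):

```python
from collections import deque

def tabu_search(initial, fitness, neighbors, n_iter=100, tabu_len=7):
    """Minimal tabu search sketch.

    `neighbors(x)` yields (move, candidate) pairs; a move is a hashable
    description (e.g. the exchanged positions) stored in the tabu list.
    """
    current, best = initial, initial
    tabu = deque(maxlen=tabu_len)  # FIFO short-term memory of recent moves
    for _ in range(n_iter):
        candidates = []
        for move, cand in neighbors(current):
            # aspiration criterion: a tabu move is allowed if it beats the best so far
            if move not in tabu or fitness(cand) > fitness(best):
                candidates.append((move, cand))
        if not candidates:
            break
        move, current = max(candidates, key=lambda mc: fitness(mc[1]))
        tabu.append(move)  # forbid this move for tabu_len iterations
        if fitness(current) > fitness(best):
            best = current
    return best
```

The `deque` with `maxlen` implements the FIFO updating of the tabu list: once more than `tabu_len` moves have been recorded, the oldest move is automatically released and becomes admissible again.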

Basic Principle of the GATS Method.
As previously mentioned, premature convergence is the main problem of the GA, and the weaknesses of TS are its dependence on the initial solutions and its single-point search mode. Fortunately, the GA can provide high-quality initial solutions for TS, and its fast searching speed can compensate for the speed problem of TS. Moreover, the flexible tabu list and the aspiration criterion of TS can help the GA to escape from local optima. In the proposed GATS method, TS is integrated with the GA in the following way: a mutation operator based on TS replaces the original mutation operator once the prematurity warning has been triggered. Evolutionary computation approaches refined by local search methods are termed memetic algorithms, which have been successfully applied in many studies [32][33][34][35]. A memetic framework is also utilized in the proposed feature selection method.
To judge whether or not the search process has been trapped in premature convergence, the prematurity index P is defined by calculating the similarity degree between each pair of individuals as follows:

P = (1 / (M × L)) × Σ_{i<j} S_ij, with M = N!/((N − 2)! × 2!), (1)

where P is the prematurity index, N is the number of individuals in the population, M is the number of pairwise combinations of individuals from the population, S_ij is the number of genes with the same value at the same location for individuals i and j, and L is the length of the chromosome. During each iteration of the feature selection procedure, the prematurity index is calculated after the crossover step to judge whether the prematurity problem has occurred. Once prematurity does happen, all the individuals in the current population are first sorted in descending order by fitness value. TS is then performed on the top 50% of the sorted individuals, and the mutation operator is executed on the others with a high probability. By the use of the proposed GATS method, on the one hand, the prematurity problem of the GA is alleviated and, on the other hand, TS can start searching with a batch of favorable initial solutions instead of a single common one. In this way, both the GA and TS can give full play to their respective advantages in the optimization search.
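Under these definitions, the prematurity index can be computed as follows (a Python sketch, assuming binary chromosomes of equal length; a value close to 1 indicates that the population has collapsed onto near-identical individuals):

```python
from itertools import combinations

def prematurity_index(population):
    """Average pairwise similarity of a binary population.

    For each pair of chromosomes, count the genes that match at the same
    position, then average over all N!/((N - 2)! 2!) pairs and normalize
    by the chromosome length L.
    """
    L = len(population[0])
    pairs = list(combinations(population, 2))
    same = sum(sum(a == b for a, b in zip(x, y)) for x, y in pairs)
    return same / (len(pairs) * L)
```

In GATS this value would be compared against the empirically derived threshold after each crossover step to decide whether the TS-based mutation should be triggered.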

The Proposed GATS Methodology
3.1. Coding Scheme. The binary coding scheme is the most commonly used coding technique, and its encoding and decoding are simple. It is also easy to realize the genetic operators, including crossover and mutation, in the binary coding scheme. Therefore, the binary coding method is adopted in this paper to express the chromosomes in the GATS feature selection procedure. As shown in Figure 3, supposing that the number of candidate features is L, the length of each chromosome is then L, and each gene of the chromosome corresponds to one feature. When a gene of a chromosome is set to "1," the corresponding feature has been selected; when the gene is "0," the feature has not been selected.
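A chromosome is thus decoded by keeping the features whose genes are "1". A minimal sketch (the feature names below are hypothetical):

```python
def decode(chromosome, feature_names):
    """Map a binary chromosome to the names of the selected features."""
    return [name for gene, name in zip(chromosome, feature_names) if gene == 1]

# hypothetical candidate features
features = ["mean_R", "mean_G", "GLCM_mean", "area", "compactness"]
print(decode([1, 0, 1, 0, 0], features))  # → ['mean_R', 'GLCM_mean']
```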

Objective Function.
The objective function in the GA is designed to calculate the fitness values, which are used to evaluate the viability of the individuals. A set of good features can separate the classes precisely by making the within-class distance as short as possible and the between-class distance as long as possible. In this paper, the within-class and between-class distances are chosen as the main factors to form the objective function. They are calculated as follows:

S_w = (1/n) Σ_{i=1}^{c} Σ_{j=1}^{n_i} ||x_ij − m_i||, (2)

where S_w is the within-class distance, c is the number of classes, n_i is the number of samples from class i, x_ij is the feature vector of sample j from class i, m_i is the feature vector of the center of class i, and n is the total number of samples.

S_b = (1/M_c) Σ_{i<j} ||m_i − m_j|| is the between-class distance, where M_c is the number of combinations between classes, calculated as c!/((c − 2)! × 2!). Based on the principle of minimizing the within-class distance and maximizing the between-class distance, the objective function can finally be expressed as follows:

F = S_b / (S_w + ε), (3)

where F is the fitness value and ε is an extremely small constant (here, ε = 10^−10) in case the value of S_w is zero.
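The fitness computation can be sketched as follows (a Python illustration assuming Euclidean distances between feature vectors and class centers; the exact distance measure used in the paper is not specified beyond the definitions above):

```python
from itertools import combinations

import numpy as np

def fitness(samples_by_class, eps=1e-10):
    """Fitness F = S_b / (S_w + eps).

    `samples_by_class` maps a class label to an (n_i, d) array of feature
    vectors restricted to the candidate feature subset.
    """
    centers = {c: x.mean(axis=0) for c, x in samples_by_class.items()}
    n_total = sum(len(x) for x in samples_by_class.values())
    # within-class distance: mean distance of each sample to its class center
    s_w = sum(np.linalg.norm(x - centers[c], axis=1).sum()
              for c, x in samples_by_class.items()) / n_total
    # between-class distance: mean distance between every pair of class centers
    pairs = list(combinations(centers.values(), 2))
    s_b = sum(np.linalg.norm(a - b) for a, b in pairs) / len(pairs)
    return s_b / (s_w + eps)
```

A larger value of F means the candidate subset packs each class tightly around its center while keeping the class centers far apart.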

Procedure of GATS.
A flowchart of the proposed GATS method is shown in Figure 4. The implementation of the whole procedure can be explained as follows.
Step 1 (initial population). N individuals with chromosomes of length L are randomly generated to form the initial population of size N.
Step 2 (objective function). Values of fitness for each individual are calculated by (3).
Step 3 (selection). The purpose of selection is to retain the superior individuals with higher fitness values. As a classical random selection technique, roulette wheel selection [36] (also named proportional selection) is adopted in GATS. In the roulette wheel selection method, the probability of each individual being selected is proportional to its fitness value. When the size of the population is N, the probability of individual i being chosen can be calculated as p_i = f_i / Σ_{j=1}^{N} f_j, where p_i is the probability of individual i being chosen and f_i is the fitness value of individual i. Individuals with higher fitness values are more likely to be selected.
Step 4 (crossover). By exchanging the genes of two parent individuals with a certain crossover probability p_c, as shown in Figure 1, two offspring individuals are produced. Through the crossover operation, the information of the individuals is sufficiently recombined and the search range is effectively expanded.
Step 5 (judgment of prematurity). As the proposed GATS method is designed to alleviate the premature convergence problem of the GA, the detection of prematurity is quite important. The prematurity index P is calculated by (1). Through a large number of experiments, a threshold value T was derived; once the value of P is larger than T, Step 7 is executed. Otherwise, prematurity has not yet occurred, so go to Step 6.
Step 6 (conventional mutation). As an important operation, this step simulates the gene mutation of the biological evolution process. The mutation operation is executed on a random gene of the parent individuals by changing it from "1" to "0" or from "0" to "1" with a specific mutation probability p_m. Then go back to Step 2.
Step 7 (the improved mutation). This is the key step in helping the search procedure to jump out of the local convergence situation. When premature convergence occurs, TS is carried out based on the superior individuals with higher fitness values. The conventional mutation operation with a higher probability is performed on individuals with lower fitness values. The proportion of superior individuals is set to 50% in GATS. Then go back to Step 2.
Step 8 (termination condition). In GATS, after a specified number of iterations, the feature selection process stops.
It is worth mentioning that a voting strategy is utilized in GATS to obtain optimal feature subsets of a fixed size, as the number of features selected in a single run of the proposed method varies. First, the above procedure is carried out iteratively for a certain number of times (50 times in this paper). Statistical analysis is then conducted on these feature selection results, and the features are ranked according to the number of times they were selected. Finally, the features selected the most times are included in the optimal feature subset.
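The voting step can be sketched as follows (a Python illustration; the subset size of seven matches the value used later in the experiments):

```python
from collections import Counter

def vote(runs, k=7):
    """Rank features by how often they appear in the selected subsets of
    repeated runs, and keep the k most frequently selected ones."""
    counts = Counter(feature for subset in runs for feature in subset)
    return [feature for feature, _ in counts.most_common(k)]
```

Here `runs` is the list of feature subsets produced by the 50 independent runs of the GATS procedure.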

Results and Discussion
The proposed GATS method was realized in the Visual C++ programming language on a computer with a 3.10 GHz CPU and 4.00 GB of RAM under the Windows 7 operating system. As shown in Figure 5, a WorldView-2 image with a 0.5 m spatial resolution and a QuickBird image with a 0.6 m resolution were used in two experiments to verify the proposed method. The experimental regions are both located in the city of Wuhan, Central China. As shown in Figure 5(a), the first study site is a typical urban area, with land-cover types including buildings, vegetation, water, shadows, and ground surfaces. The second study area, displayed in Figure 5(b), is a complex suburban area, with vegetation, water, buildings, ground surfaces, bare land, and secondary bare land.

Experimental Design.
As the first step of the whole classification procedure, a bottom-up region merging method was employed to segment the images. Through segmentation experiments, the parameter settings were decided, as shown in Table 1. Figure 6 shows the final segmentation results for the WorldView-2 and QuickBird images. After the segmentation process, 790 objects for the WorldView-2 image and 1319 objects for the QuickBird image were obtained, and 249 features were extracted from the objects, as shown in Table 2. All the texture features listed in the table were derived from the gray-level cooccurrence matrix (GLCM) proposed by Haralick et al. [37]. Then, 77 training samples for the WorldView-2 image and 96 training samples for the QuickBird image were randomly selected by hand for both the feature selection and the subsequent classification procedure.
Based on the above analysis and preprocessing of the experimental images, the GATS method was executed on the 249 features to select the optimal feature subset. Table 3 shows the details of the parameter settings for the GATS feature selection method; as the parameters of the GA and TS are commonly used ones, most of these parameters were assigned values empirically according to the practical problem [26, 38, 39]. As the main purpose of GATS is to improve the premature convergence of the GA, a comparison between GATS and the GA is essential to confirm the effectiveness of the proposed method. To prove that the GA can provide multiple high-quality initial solutions for TS, a multi-initial-solution TS method with common initial solutions was utilized for the comparison. In addition, as a typical feature selection technique, ReliefF [2] was also compared with GATS. In summary, three comparative experiments based on a standard GA, a multistart TS approach [40], and the classical ReliefF algorithm were carried out in this study. In these experiments, the values of the parameters for the GA and TS were kept the same as in GATS. In the experiment with ReliefF, the sampling times were 500 and 800, the numbers of neighbor points were 5 and 10, and the threshold values of the feature weight were 0.8 and 0.8 for the WorldView-2 and QuickBird images, respectively. Table 4 lists the numbers of selected features and the means and standard deviations of both the fitness values and the CPU times of the GATS, GA, multistart TS, and ReliefF methods over 50 runs. As a typical highly efficient feature weighting method, ReliefF takes less time than the other three methods. Among the other three methods, although the GA takes the least time, the number of features obtained by this method is always the largest, and its mean fitness values are the lowest.

Experimental Results.
In the multistart TS method, as several initial solutions are used in the search instead of a single one, the CPU time is longer than for the GA, but the fitness values are higher. As for the proposed GATS method, the time required is longer than for the multistart TS method, but the feature selection effect is much better, as the number of optimal features obtained by GATS is much smaller. Most importantly, the mean fitness values obtained by the proposed GATS method are clearly improved compared with the other methods. In addition, statistical analysis was conducted on the above experiments, and the standard deviations of the fitness values and CPU times were obtained. Both standard deviations are low, which demonstrates the high stability and reliability of the proposed method. Table 5 lists the final feature selection results of each method (the number of features participating in the following classification was uniformly set to seven for all the experiments). It is not difficult to observe that, as important information in object-based high-resolution image analysis, texture features such as the GLCM mean are always selected by the proposed method, whereas for the other methods texture is ignored in most cases and the spectral information occupies the dominant position.
As one of the most effective machine learning algorithms, the support vector machine (SVM) has been widely employed in the classification of remote sensing images [41][42][43][44]. Using the above feature selection results, an SVM was adopted to classify the WorldView-2 and QuickBird images. Through k-fold cross-validation, the values of C and gamma (the parameters of the SVM) were set to 100 and 5, respectively, for the WorldView-2 image and 32 and 0.5 for the QuickBird image. The final classification results are shown in Figures 7 and 8.
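The classification step can be sketched with a standard library such as scikit-learn (an illustrative assumption; the paper does not state which SVM implementation was used, and the candidate parameter grid below is hypothetical):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_svm(X_train, y_train, folds=5):
    """Tune C and gamma of an RBF-kernel SVM by k-fold cross-validation."""
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [1, 32, 100], "gamma": [0.1, 0.5, 5]},  # hypothetical grid
        cv=folds,
    )
    grid.fit(X_train, y_train)
    return grid.best_estimator_
```

Here `X_train` would hold the selected feature values of the training objects and `y_train` their land-cover labels; the fitted model is then applied to all segmented objects.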
For the WorldView-2 classification results, it can be observed that buildings with more accurate contours and higher integrity are provided by the GATS method, as highlighted with the yellow rectangle in Figure 7(a). Meanwhile, the extraction of buildings by GA and ReliefF is incomplete, and, for TS, some of the buildings are misclassified as water and shadows. In terms of a place featuring a mixture of ground and vegetation, as highlighted by the yellow ellipse, it can be distinguished by GATS, whereas the other methods result in misclassification. As highlighted by the blue circle, the small area of vegetation surrounded by other large-scale objects can also be successfully recognized by the proposed method. The TS method misclassifies the small vegetation object as shadow, and GA and ReliefF fail to recognize it.
As shown in Figure 8, for the QuickBird image, there are several missed extractions of buildings in the results of the other three methods, as highlighted by the yellow ellipse in the bottom right corner of the image. The extraction of the main road in Figure 8(a) by GATS is more complete than for the other methods. Despite the high similarity of their spectral characteristics, the water, buildings, and shadows in Figure 8(a) are less likely to be misclassified because of the participation of the texture information in the classification. From the visual assessment of the classification results, it is not difficult to conclude that, in general, the proposed method leads to a preferable classification effect.

Accuracy Analysis and Discussion.
As the final objective of analyzing images by the proposed GATS method is to improve the classification accuracy, confusion matrices are used to quantitatively evaluate the performance of the different methods. Producer's accuracy, user's accuracy, overall accuracy, and the Kappa coefficient calculated from the matrix are the key indicators used to assess the classification quality in Figures 7 and 8. The producer's accuracy refers to the probability of a reference sample being correctly classified [45]. It can be calculated by PA_i = x_ii / x_+i, where PA_i represents the producer's accuracy for class i, x_ii refers to the element in the ith row and ith column of the matrix, and x_+i represents the sum of the elements in the ith column of the confusion matrix. The user's accuracy represents the probability of the classified land being grouped into the true ground reference category: UA_i = x_ii / x_i+, where UA_i represents the user's accuracy for class i and x_i+ is the sum of the elements in the ith row.
The overall accuracy refers to the percentage of correctly classified samples and can be calculated by OA = (1/n) Σ_{i=1}^{r} x_ii, where OA represents the overall accuracy, r is the dimension of the confusion matrix (also the number of classes), n is the total number of samples, and Σ_{i=1}^{r} x_ii is the sum of the elements on the confusion matrix's diagonal.
To measure the classification ability of the utilized method with respect to a random classification, the Kappa coefficient is obtained by statistical calculation of the confusion matrix and can be expressed as Kappa = (n Σ_{i=1}^{r} x_ii − Σ_{i=1}^{r} x_i+ x_+i) / (n² − Σ_{i=1}^{r} x_i+ x_+i), where Kappa represents the Kappa coefficient and n is the total number of samples. Generally speaking, the higher the above accuracies or coefficients, the better the classification effect. Overall accuracies and Kappa coefficients for both images are listed in Table 6. For the WorldView-2 image, the GATS method produces a significantly better classification effect than the other methods, with increments of at least 11.5% in OA and 0.14 in Kappa. In terms of the QuickBird image, the highest OA of 88.25% is yielded by GATS. Table 7 lists the producer's accuracies and user's accuracies for each class of the WorldView-2 image. Although ReliefF outperforms the other methods in CPU time, it provides the lowest user's accuracies of 51.95% and 47.5%, respectively, for vegetation and shadows. The producer's accuracies and user's accuracies for the QuickBird image are listed in Table 8. Compared to the other classes, the buildings class shows the biggest increase in user's accuracy with the GATS method. Both the producer's and user's accuracies for vegetation 1 and vegetation 2 provided by GATS reach a stable level of 90%, whereas the accuracies yielded by the other methods are mainly between 70% and 80%. Not only does ReliefF obtain the lowest overall accuracy and Kappa, but it also yields the lowest user's accuracy of only 50% for bare land and the lowest producer's accuracy of 55% for roads.
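Under these definitions, all four indicators can be computed from a single confusion matrix (a Python sketch, assuming rows correspond to the classified map and columns to the reference data, as in the formulas above):

```python
import numpy as np

def accuracy_metrics(cm):
    """Overall accuracy, per-class producer's/user's accuracy, and Kappa
    from a confusion matrix (rows = classified, columns = reference)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    diag = np.diag(cm)
    oa = diag.sum() / n
    pa = diag / cm.sum(axis=0)  # producer's accuracy: per-column
    ua = diag / cm.sum(axis=1)  # user's accuracy: per-row
    chance = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (oa - chance) / (1 - chance)
    return oa, pa, ua, kappa
```

For example, a 2 × 2 matrix with 40 correct and 10 confused samples per class gives an overall accuracy of 0.8 and a Kappa of 0.6.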

Conclusions
In this paper, we have put forward a feature selection method based on the integration of a GA and TS (GATS). The proposed GATS method aims to improve the premature convergence of the GA with a new mutation operator modified by TS. To validate the reliability and effectiveness of the proposed feature selection method, three other feature selection methods, a traditional GA, multistart TS, and ReliefF, were also implemented. An SVM was then utilized to classify the WorldView-2 and QuickBird images based on the selected features. Through the experiments and comparisons, it was demonstrated that the proposed GATS method can increase the classification accuracy by providing feature subsets with within-class distances that are as small as possible and between-class distances that are as large as possible. However, the proposed method could be further improved in terms of the feature number. As restriction of the feature number is not easy to implement in the binary coding scheme, a voting strategy is used in GATS to select a fixed number of features from the feature selection results. Therefore, in our future work, a novel coding scheme for GATS will be studied, by which control of the optimal feature number will be realized.

Conflicts of Interest
The authors declare that they have no conflicts of interest.