Automatic Classification of Remote Sensing Images Using Multiple Classifier Systems

Obtaining accurate results in remote sensing image classification is a challenge, because the task is affected by many factors. In this paper, aiming at correctly identifying the land use types reflected in remote sensing images, support vector machine, maximum likelihood classifier, backpropagation neural network, fuzzy c-means, and minimum distance classifier were combined to construct three multiple classifier systems (MCSs). Two MCSs were implemented, namely, comparative major voting (CMV) and Bayesian average (BA). A third method, called WA-AHP, was proposed, which introduces the analytic hierarchy process into MCS. Classification results of the base classifiers and the MCSs were compared with the ground truth map. Accuracy indicators were computed and receiver operating characteristic curves were plotted to evaluate the performance of the MCSs. Experimental results show that employing MCSs can increase classification accuracy significantly compared with the base classifiers. From the accuracy evaluation results and a visual check, the best MCS is WA-AHP, with an overall accuracy of 94.2%, which outperforms BA and rivals CMV in this paper. The producer's accuracy of each land use type confirms the good performance of WA-AHP. Therefore, we can draw the conclusion that MCS is superior to the base classifiers in remote sensing image classification, and WA-AHP is an efficient MCS.


Introduction
Remote sensing technology has developed rapidly and is now widely applied in fields such as land use land cover (LULC) monitoring, investigation of forest resources, disaster monitoring, and urban planning [1,2], where the identification of land use types by image classification plays a very important role. There are a large number of land use types with irregular distribution on the surface of the earth. Remote sensing images reflect this complicated information, so accurate classification of remote sensing images is a challenge.
Many artificial intelligence (AI) methods have been utilized in remote sensing image classification, including neural networks [3][4][5], support vector machine (SVM) [6][7][8], and maximum likelihood classifier (MLC) [9,10]. Although the results of these classifiers are generally positive, each has its own limitations. According to a large number of studies, no individual algorithm performs perfectly in classification. Different classifiers achieve different accuracies for the same class, and the same classifier achieves different accuracies for different classes, so the classifiers have complementary advantages. As a result, multiple classifier systems (MCSs) naturally become a good choice. Combining strategies for multiple classifiers have been widely investigated; the aim is to find an efficient combination method that makes full use of the complementary advantages of the classifiers and offsets the drawbacks of individual classifiers, thereby improving classification accuracy.
In the early period, MCS theory was developed in pattern recognition applications such as signal processing, handwriting recognition, face recognition, and fingerprint identification [11][12][13][14]. In recent years, MCS has been gradually introduced to remote sensing image processing [1,7]. Previous work offers many methods for combining multiple classifiers. According to the type of output of the base classifiers, MCS methods can be grouped into three categories, namely, abstract level, ranked level, and measurement level. In remote sensing image classification, MCSs are usually implemented at the abstract level and the measurement level. Xu proposed standard MCS methods that combine multiple classifiers at both levels in the pattern recognition field [11,12]. The voting method and the Bayesian average (BA) method are the classical methods at the abstract level and the measurement level, respectively.
To improve the accuracy of identifying LULC types, we applied some traditional AI methods to remote sensing image classification. Three MCS methods integrating support vector machine (SVM), maximum likelihood classifier (MLC), back propagation neural network (BPNN), fuzzy c-means (FCM), and minimum distance classifier (MDC) were implemented. One of them is a novel method called weighted average based on AHP (WA-AHP), which makes full use of the advantages of AHP and MCS. In this paper, we compared the voting method, BA, and WA-AHP. The confusion matrix and several indicators were computed to evaluate classification accuracy. Meanwhile, we plotted the receiver operating characteristic (ROC) curve, from which the area under the curve (AUC) was derived, to evaluate the performance of all classifiers, including the base classifiers and the MCSs. Finally, we discussed the application of MCSs in remote sensing image classification.

Data Sets
We chose a plot of a SPOT-5 satellite image of Liaoyang city, China, as shown in Figure 1(a). Based on visual interpretation and field-measured data, we made the ground truth map shown in Figure 1(b). The region covered by the image contains four main types of land use, namely, vegetable field, settlement place, farm land 1, and farm land 2. We also analyzed the spectral character of each type, as shown in Figure 2, where the coordinate axes represent the bands of the SPOT-5 image. It can be seen that the four types have different spectral characters.

Methodology
We aimed at improving the classification accuracy for remote sensing images by combining multiple classifiers. As shown in Figure 3, the framework consists of three major steps. Firstly, five base classifiers were selected and implemented separately. Secondly, based on the output of the five base classifiers, the classification results of comparative major voting (CMV), Bayesian average, and WA-AHP were obtained. Finally, the performance of all classification methods, including the base classifiers and the MCSs, was evaluated.

Base Classifiers
Support Vector Machine.
SVM is a machine learning method which uses a certain distance between samples as the criterion of classification, based on the principle of structural risk minimization. This method has been found to be efficient for pattern recognition and, more recently, for satellite image classification [7,8,15]. In SVM classification, the selection of the kernel function is an important step; the Gaussian radial basis function was selected in this paper.
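As a small illustration of the kernel choice mentioned above, the Gaussian radial basis function can be written in a few lines. This is only a sketch: the width parameter `gamma` and the two pixel vectors are hypothetical, since the paper does not report the kernel parameters or raw band values.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel K(x, y) = exp(-gamma * ||x - y||^2).

    gamma is a hypothetical width parameter (not reported in the paper).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical 4-band pixel vectors (illustrative values only).
pixel_a = [0.21, 0.35, 0.40, 0.18]
pixel_b = [0.25, 0.30, 0.42, 0.20]
similarity = rbf_kernel(pixel_a, pixel_b)
```

The kernel equals 1 for identical pixels and decays toward 0 as the spectral distance grows, which is the notion of "distance between samples" that the SVM relies on.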

Maximum Likelihood Classifier.
MLC is a well-known method for determining a class based on the Bayesian formula [9,10]. In theory, MLC assumes that the training samples follow a normal distribution. During classification, MLC builds the probability density function for each class. All unclassified pixels are assigned membership based on the relative likelihood (probability) of that pixel occurring within each class. The probability is calculated as follows. First, it is supposed that there are $M$ predefined classes. Then, according to the Bayesian formula, the posterior probability that pixel $x$ belongs to class $\omega_i$, $P(\omega_i \mid x)$, is defined by the following equation:
$$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\,P(\omega_i)}{P(x)}, \qquad (1)$$
where $P(\omega_i)$ is the prior probability of class $\omega_i$, $P(x \mid \omega_i)$ is the conditional probability of $x$ given $\omega_i$, and $P(x)$ is the probability of $x$, which is the same for each class. Once the probability of the given pixel has been calculated for every class, pixel $x$ is assigned to the class with the highest probability.

Back Propagation Neural Network.
A BPNN is organized in three layers: an input layer, a hidden layer, and an output layer [4,5]. A simple sketch map of BPNN is shown in Figure 4. To learn, BPNN needs a set of input samples together with the known correct output for each case. The learning process of BPNN consists of forward propagation and back propagation. In forward propagation, information entering at the input layer is propagated through the hidden layer to the output layer. The state of the neurons in one layer affects only the state of the neurons in the next layer. BPNN defines an error function using the mean squared error (MSE), as shown in (2), where $t_k$ is the actual output, $y_k$ is the predicted output, and the sum runs over the $n$ outputs being compared:
$$E = \frac{1}{n}\sum_{k=1}^{n}\left(t_k - y_k\right)^2. \qquad (2)$$
If there is a difference between $t_k$ and $y_k$, the neural network turns to the process of error back propagation. The error value is propagated backwards through the network, and small changes are made to the weights of the neurons in each layer. The whole process is repeated until the error value drops below a predetermined threshold, at which point the network has learned the problem well.
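The BPNN learning loop described above (forward propagation, MSE evaluation, error back propagation, repetition until a threshold is reached) can be sketched as follows. This is a minimal illustration under stated assumptions, not the network used in the paper: the sigmoid activation, the two hidden neurons, the learning rate, the threshold, and the toy training set are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward propagation: input -> hidden -> output (last weight is a bias)."""
    hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
              for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1])
    return hidden, output


def mse(samples, w_hidden, w_out):
    """Mean squared error between known targets and network outputs."""
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2
               for x, t in samples) / len(samples)


def train(samples, n_hidden=2, lr=0.5, threshold=0.01, max_epochs=5000, seed=0):
    """Train by per-sample gradient descent; all hyperparameters are assumptions."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_error = mse(samples, w_hidden, w_out)
    for _ in range(max_epochs):
        for x, t in samples:
            hidden, y = forward(x, w_hidden, w_out)
            # Error back propagation: output delta first, then hidden deltas.
            delta_out = (y - t) * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Small weight changes in each layer.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
            w_out[-1] -= lr * delta_out
            for j in range(n_hidden):
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                w_hidden[j][-1] -= lr * delta_hidden[j]
        # Stop once the error drops below the predetermined threshold.
        if mse(samples, w_hidden, w_out) < threshold:
            break
    return w_hidden, w_out, initial_error


# Toy two-input problem (logical AND), standing in for labelled pixels.
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
w_hidden, w_out, initial_error = train(samples)
final_error = mse(samples, w_hidden, w_out)
```

In a real classification setting the inputs would be band values and the network would have one output per land use class; the single-output form is kept here only to keep the sketch short.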

Fuzzy c-Means.
FCM is a method of clustering which allows one piece of data to belong to two or more clusters. This method is frequently used in pattern recognition [16]. It is based on minimization of the objective function $J_m$ in (3), where $C$ is the number of cluster centers, $N$ is the number of data, $x_i$ is the feature vector for the unknown input, $c_j$ is the center of cluster $j$, $u_{ij}$ is the degree of membership of $x_i$ belonging to cluster $j$, and $m$ is a real number greater than 1:
$$J_m = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{m}\,\lVert x_i - c_j \rVert^{2}. \qquad (3)$$
FCM is carried out by iterative optimization of the objective function $J_m$ through the update of the memberships $u_{ij}$ and the cluster centers $c_j$ in (4):
$$u_{ij} = \frac{1}{\sum_{l=1}^{C}\left(\dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert}\right)^{2/(m-1)}}, \qquad c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\,x_i}{\sum_{i=1}^{N} u_{ij}^{m}}. \qquad (4)$$
The iteration stops when $\max_{ij}\{|u_{ij}^{(k+1)} - u_{ij}^{(k)}|\} < \varepsilon$, where $\varepsilon$ is a termination threshold between 0 and 1 and $k$ is the iteration step. The objective function $J_m$ will have converged to a local minimum when this procedure is finished.

[Figure 5: flowchart of CMV; in a tie, pixel X is assigned to the class m whose summed probability sum(P_em) is the largest among the tied classes.]

Minimum Distance Classifier.
MDC is a method which classifies an unknown pixel according to the distance between that pixel and the centers of the clusters, as shown in (5), where $x_i$ is the $i$th feature vector for the unknown input and $c_j$ is the center of cluster $j$. The unknown pixel is assigned to the class with the minimum distance:
$$d_{ij} = \lVert x_i - c_j \rVert, \qquad x_i \in \text{class } \arg\min_{j} d_{ij}. \qquad (5)$$
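The minimum distance rule can be sketched in a few lines. The sketch assumes Euclidean distance and class centers computed as the mean of labelled training pixels; the two-band training values below are hypothetical and serve only to make the example runnable.

```python
import math


def class_centers(training):
    """Mean feature vector per class from labelled training pixels."""
    centers = {}
    for label, vectors in training.items():
        n_bands = len(vectors[0])
        centers[label] = [sum(v[b] for v in vectors) / len(vectors)
                          for b in range(n_bands)]
    return centers


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify_min_distance(pixel, centers):
    """Assign the pixel to the class whose center is nearest."""
    return min(centers, key=lambda label: euclidean(pixel, centers[label]))


# Hypothetical two-band training pixels for two of the land use types.
training = {
    "vegetable field": [[0.2, 0.6], [0.3, 0.7]],
    "settlement place": [[0.8, 0.2], [0.7, 0.3]],
}
centers = class_centers(training)
label = classify_min_distance([0.3, 0.6], centers)
```

The per-class distances computed here are also what a measurement level MCS needs when a base classifier such as MDC cannot output posterior probabilities directly.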

Abstract Level MCS-Comparative Major Voting.
The voting method derives from the hypothesis that the decision of a group is superior to that of the individuals. It includes the majority voting rule and the conservative voting rule. Under majority voting, if one pixel is identified as the same class by most base classifiers, the pixel is labeled to this class. Under conservative voting, a pixel is labeled to a class only if all the base classifiers identify it as that class; otherwise, it cannot be classified. The rule of class determination is shown in (6), where $V_j$ denotes the voting value of class $j$, that is, the number of classifiers whose results agree on class $j$, $K$ is the number of base classifiers, and $M + 1$ denotes the unclassified class:
$$E(x) = \begin{cases} j, & \text{if } V_j = \max_{i} V_i \ge \alpha K, \\ M + 1, & \text{otherwise.} \end{cases} \qquad (6)$$
When $\alpha$ is 0.5 and 1, the rule is majority voting and conservative voting, respectively.
However, this voting strategy has a drawback: when different classes obtain the same voting value for one pixel, the pixel cannot be labeled, which may greatly affect the classification accuracy. We therefore proposed a new approach called CMV to combine the base classifiers. The general flowchart of CMV is given in Figure 5. CMV follows two principles: the decision of the majority is superior to that of the individual; and if several different classes get the same voting value for one pixel, the class of this pixel is determined by comparing the summations of the probabilities that all the base classifiers output for the classes concerned. The pixel is labeled to the class with the highest summation.
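The two CMV principles can be sketched as a small function. This is an illustrative reading of the rule, not the authors' implementation: it assumes that the class with the most votes wins outright, and that ties are broken by the summed per-class probabilities of all base classifiers. The labels and probability rows are hypothetical.

```python
from collections import Counter


def comparative_major_voting(labels, probabilities):
    """Combine base classifier decisions for one pixel.

    labels[k] is the class chosen by classifier k;
    probabilities[k][c] is classifier k's probability for class c.
    """
    votes = Counter(labels)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: compare the summed probabilities of the tied classes
    # over all base classifiers and keep the largest.
    return max(tied, key=lambda c: sum(p[c] for p in probabilities))


# Five hypothetical base classifiers, three classes (0, 1, 2).
probabilities = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
]
labels = [0, 0, 1, 1, 2]  # classes 0 and 1 tie with two votes each
winner = comparative_major_voting(labels, probabilities)
```

Here class 0 collects a summed probability of 1.9 and class 1 collects 2.3, so the tie is resolved in favor of class 1, whereas plain majority voting would leave the pixel unclassified.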

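Measurement Level MCS-Bayesian Average.
Most classifiers can output a posterior probability, which represents the probability that input pixel $x$ belongs to class $\omega_i$, as shown in (7). Specifically, we suppose that there are $M$ classes and $K$ base classifiers, each of which can output a vector $P_k(x) = [p_k(1), \ldots, p_k(M)]^{T}$. Here, $p_k(i) = P_k(x \in \omega_i \mid x)$ denotes the posterior probability that pixel $x$ belongs to $\omega_i$ according to classifier $k$:
$$P_k(x \in \omega_i \mid x), \qquad i = 1, \ldots, M, \quad k = 1, \ldots, K. \qquad (7)$$
The approach called Bayesian average calculates the average value of the posterior probabilities. The final classification criterion is shown in (8), which means that pixel $x$ is labeled with the class of the largest averaged posterior probability:
$$P_E(x \in \omega_i \mid x) = \frac{1}{K}\sum_{k=1}^{K} P_k(x \in \omega_i \mid x), \qquad E(x) = j \ \text{ if } \ P_E(x \in \omega_j \mid x) = \max_{i} P_E(x \in \omega_i \mid x). \qquad (8)$$
Some base classifiers are not capable of outputting a posterior probability, in which case we adopted (9), where $d_k(\omega_i \mid x)$ denotes the distance between pixel $x$ and the center of cluster $\omega_i$ according to the $k$th classifier:
$$P_k(x \in \omega_i \mid x) = \frac{1 / d_k(\omega_i \mid x)}{\sum_{j=1}^{M} 1 / d_k(\omega_j \mid x)}. \qquad (9)$$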
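A minimal sketch of the Bayesian average follows. It assumes that distance-based classifiers are converted to pseudo-posteriors by normalized inverse distance (an assumption about the conversion, and one that requires strictly positive distances), and that all classifiers are then averaged with equal weight. The numbers are hypothetical.

```python
def distances_to_probabilities(distances):
    """Convert per-class distances to pseudo-posteriors.

    Assumes normalized inverse distance and strictly positive distances.
    """
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def bayesian_average(posteriors):
    """posteriors[k][i] is the posterior of class i from classifier k.

    Returns the winning class index and the averaged posteriors.
    """
    n_classifiers = len(posteriors)
    n_classes = len(posteriors[0])
    averaged = [sum(p[i] for p in posteriors) / n_classifiers
                for i in range(n_classes)]
    label = max(range(n_classes), key=lambda i: averaged[i])
    return label, averaged


# Two hypothetical probabilistic classifiers plus one distance-based classifier.
from_distances = distances_to_probabilities([1.0, 2.0, 4.0])
label, averaged = bayesian_average([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    from_distances,
])
```

Because every classifier contributes equally, a weak classifier can pull the average away from the correct class just as strongly as an accurate one.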

Weighted Average Based on AHP
(1) Analytic Hierarchy Process. Analytic hierarchy process (AHP) [17,18] is a structured technique, based on expert grading, for organizing and analyzing complex decisions. AHP uses experience and knowledge to order the indicators in the criteria layer and constructs a judgment matrix to calculate the weight of each indicator. It was developed by Thomas L. Saaty in the 1970s and has been extensively studied and refined since then. It has particular application in group decision making and weight determination. It is widely used in fields such as government, business, industry, healthcare, and education, but it has rarely been used in remote sensing image classification.
(2) WA-AHP. BA is a simple method that calculates the average posterior probability. Its obvious drawback is that it does not consider the differences in accuracy among the base classifiers. We therefore take the advantages of both AHP and BA and integrate them into a novel method, as shown by (10), where $w_k$, determined by AHP, is the weight of the $k$th classifier and $K$ is the number of base classifiers:
$$P_E(x \in \omega_i \mid x) = \sum_{k=1}^{K} w_k\,P_k(x \in \omega_i \mid x). \qquad (10)$$
To be specific, after comparing the classification accuracy of the base classifiers, we calculated the weight of each base classifier through AHP. The higher the classification accuracy is, the larger the weight of the classifier. In this paper, we selected producer's accuracy to measure the classification accuracy of each base classifier for each class. In the consistency check, the consistency ratio (CR) is less than 0.10, meaning that the judgment matrix has reasonable consistency. The final classification rule is like the one used by BA: pixel $x$ is classified into the class with the largest combined posterior probability among all the predetermined classes.
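The weighting step can be illustrated with a short sketch. It assumes the common AHP procedure of taking the principal eigenvector of the pairwise judgment matrix (computed here by power iteration) and checking the consistency ratio against Saaty's random index. The 3-by-3 judgment matrix and the posteriors are hypothetical; the paper's actual matrix and weights are those behind Table 1.

```python
def ahp_weights(judgment, iterations=100):
    """Principal eigenvector of a pairwise judgment matrix by power iteration."""
    n = len(judgment)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(judgment[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(judgment[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


# Saaty's random consistency index for small matrix orders.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def consistency_ratio(lambda_max, n):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def weighted_average(posteriors, weights):
    """Weighted combination of per-classifier posteriors for one pixel."""
    n_classes = len(posteriors[0])
    combined = [sum(w * p[i] for w, p in zip(weights, posteriors))
                for i in range(n_classes)]
    return max(range(n_classes), key=lambda i: combined[i]), combined


# Hypothetical judgment matrix: classifier 0 is judged twice as important
# as classifier 1 and four times as important as classifier 2.
judgment = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, lambda_max = ahp_weights(judgment)
cr = consistency_ratio(lambda_max, len(judgment))
label, combined = weighted_average(
    [[0.3, 0.7], [0.6, 0.4], [0.6, 0.4]], weights
)
```

With equal weights the two classes in this example would tie at 0.5 each; with the AHP weights the most trusted classifier decides the outcome, which is the behavior that distinguishes the weighted average from BA.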

Smooth Processing.
Because of the complexity of the earth's surface, the classification results contain some isolated pixels that are inconsistent with their neighboring pixels, as well as many noise spots. To obtain better classification results, we therefore used mean filtering to smooth them [19].
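A mean filter of the kind mentioned above can be sketched as follows. The paper does not state the window size or whether the filter is applied to class labels or to per-class scores, so this sketch assumes a 3-by-3 window applied to a per-class membership map, with border pixels averaged over the neighbors that exist. The grid is hypothetical.

```python
def mean_filter(grid, radius=1):
    """Replace each cell by the mean of its (2*radius+1)^2 neighborhood.

    Border cells use only the neighbors inside the grid.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out[r][c] = sum(window) / len(window)
    return out


# Hypothetical membership map for one class with an isolated pixel in the center.
membership = [
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
smoothed = mean_filter(membership)
```

After smoothing, the isolated center value rises from 0 to 8/9, so a subsequent per-pixel decision would agree with the surrounding neighborhood.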

Evaluation of Classification Accuracy.
The classification algorithms mentioned in this paper are compared using the following evaluation methods.
(1) ROC Curve. The ROC curve is widely used in data mining because it is an intuitive method that evaluates the performance of classifiers visually. However, it is designed for two-class problems, so it is rarely applied in the classification of remote sensing images. In this paper, we adopted the following strategy to introduce the ROC curve into the accuracy evaluation of remote sensing image classification. A remote sensing image contains many classes, and we divide all of them into two groups, namely, the target class and the nontarget class. The target class is the one to be evaluated, while all remaining classes together form the nontarget class. The multiclass problem is thereby converted to a binary one, and the ROC curve can be obtained. Usually, a higher AUC, which is the area under the ROC curve, denotes better performance [20].
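The target versus nontarget strategy can be sketched with a small AUC routine. It relies on the known equivalence between AUC and the probability that a randomly chosen target pixel receives a higher score than a randomly chosen nontarget pixel (ties counted as one half). The scores and labels are hypothetical.

```python
def one_vs_rest_auc(scores, truth, target):
    """AUC for `target` against all other classes.

    scores[i] is the classifier's score for the target class on pixel i;
    truth[i] is the ground truth class of pixel i.
    """
    positives = [s for s, t in zip(scores, truth) if t == target]
    negatives = [s for s, t in zip(scores, truth) if t != target]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for the class "vegetable field" on six pixels.
truth = ["vegetable field", "vegetable field", "settlement place",
         "farm land 1", "vegetable field", "farm land 2"]
scores = [0.9, 0.8, 0.3, 0.4, 0.35, 0.1]
auc = one_vs_rest_auc(scores, truth, "vegetable field")
```

Repeating the call with each land use type as the target yields one AUC per class and classifier, which is the form in which AUC values are compared across classifiers.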
(2) Indicators Derived from Confusion Matrix. We used the ground truth map and the classification results to calculate the confusion matrix, with which the classification accuracy is evaluated. The confusion matrix is denoted by $C$ in the following. Each element $c_{ij}$ of the confusion matrix is the number of records pertaining to class $i$ that have been automatically classified into class $j$, so the diagonal elements correspond to the numbers of records that have been correctly classified. Indicators such as overall accuracy ($P_O$), kappa coefficient (Kappa), producer's accuracy ($P_{P_i}$), and user's accuracy ($P_{U_i}$) are derived from the confusion matrix as shown by (11), where $c_{i+}$ and $c_{+i}$ represent the sums of the elements in the $i$th row and the $i$th column and $N$ is the number of samples. Kappa is at most 1, and a value of 0 means agreement no better than chance. Higher values of the indicators indicate more accurate results. Thus,
$$P_O = \frac{1}{N}\sum_{i} c_{ii}, \qquad \text{Kappa} = \frac{N\sum_{i} c_{ii} - \sum_{i} c_{i+}c_{+i}}{N^{2} - \sum_{i} c_{i+}c_{+i}}, \qquad P_{P_i} = \frac{c_{ii}}{c_{i+}}, \qquad P_{U_i} = \frac{c_{ii}}{c_{+i}}. \qquad (11)$$
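The four indicators can be computed directly from a confusion matrix, as in the sketch below. It assumes the convention stated above, with rows as ground truth classes and columns as assigned classes, so producer's accuracy divides by the row sum and user's accuracy by the column sum. The 2-by-2 matrix is hypothetical.

```python
def accuracy_indicators(cm):
    """cm[i][j]: number of samples of true class i assigned to class j.

    Returns overall accuracy, kappa, producer's and user's accuracies.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    diagonal = sum(cm[i][i] for i in range(n_classes))
    row_sums = [sum(cm[i]) for i in range(n_classes)]
    col_sums = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
    overall = diagonal / total
    # Agreement expected by chance from the row and column marginals.
    chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - chance) / (1.0 - chance)
    producers = [cm[i][i] / row_sums[i] for i in range(n_classes)]
    users = [cm[i][i] / col_sums[i] for i in range(n_classes)]
    return overall, kappa, producers, users


# Hypothetical confusion matrix for two classes and 100 samples.
cm = [
    [40, 10],
    [5, 45],
]
overall, kappa, producers, users = accuracy_indicators(cm)
```

For this matrix the overall accuracy is 0.85 and kappa is 0.70; the gap reflects the share of agreement (0.5 here) that would be expected by chance alone.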

Experimental Results and Analysis
The Classification Results.
We classified the remote sensing images using SVM, MLC, BPNN, FCM, and MDC separately. The classification results of the base classifiers are shown in Figure 6.
We then used CMV, BA, and WA-AHP to construct MCSs to classify the remote sensing images. WA-AHP requires the priorities of the base classifiers, which are derived from their classification accuracy. The producer's accuracy of the base classifiers was used to order the classifiers. Finally, the weight of each base classifier was obtained through AHP, as shown in Table 1. The classification results of the three MCSs are shown in Figure 7.

Analysis of Classification Accuracy.
We used the ROC curve to evaluate the performance of all classifiers. Meanwhile, because the confusion matrix is the basis for calculating the accuracy evaluation indicators, we constructed a confusion matrix for each classifier; the resulting producer's and user's accuracies are given in Tables 4 and 5. Table 4 shows that, in terms of producer's accuracy, the best individual classifiers (BIC) are SVM for vegetable field and farm land 1, BPNN for settlement place, and FCM for farm land 2. Table 5 shows that, in terms of user's accuracy, the BIC are MDC for vegetable field, SVM for settlement place and farm land 2, and FCM for farm land 1. Compared with the base classifiers, the three MCSs improved the producer's accuracy for each land use type. We also found that the producer's accuracy of each class using WA-AHP rivals that of the other two MCSs. We can therefore conclude that WA-AHP is an effective method for MCS.

Conclusion
We used multiple classifier systems in remote sensing image classification. We selected five classifiers as base classifiers: SVM, MLC, BPNN, FCM, and MDC. Three MCSs were constructed, namely, comparative major voting, Bayesian average, and the weighted average based on AHP proposed in this paper. We compared the base classifiers and the MCSs. ROC curves were plotted and several accuracy indicators were calculated to evaluate the classification results. From this paper, we can draw the following conclusions.
(1) In terms of the overall classification accuracy, MCS is more accurate than the base classifiers (MDC, MLC, BPNN, FCM, and SVM), because it integrates the advantages of each base classifier. The best of the five selected base classifiers is SVM, whose overall accuracy and kappa coefficient are 0.873 and 0.814, respectively.
(2) In terms of producer's accuracy of each type, the best individual classifiers are SVM for vegetable field and farm land 1, BPNN for settlement place, and FCM for farm land 2.
(3) In terms of classification accuracy evaluation, the performance of WA-AHP proposed in this paper is superior to the Bayesian average method and rivals the comparative major voting method. That is because WA-AHP takes the accuracy differences among the base classifiers into consideration.
(4) The ROC curve is effective for evaluating the classification accuracy of remote sensing images.
Improving the classification accuracy of remote sensing images is a challenge. We introduced artificial intelligence technologies such as MCS, SVM, BPNN, MLC, FCM, and AHP to the research of remote sensing image classification, and the experimental results show that they improve the classification accuracy significantly. Our future research will focus on the application of more artificial intelligence technologies to the processing of remote sensing images.

Figure 2: Location of the data for each class in feature space.

Figure 3: Framework of remote sensing images classification using MCS.

Table 1: Weight of each base classifier in WA-AHP.
ROC curves for each class of the different classifiers are shown in Figure 8, and Table 2 gives the corresponding AUC. Although all the classifiers, including the base classifiers and the MCSs, performed rather well, the performance of different classifiers varied for different land use types. To be specific, the best performances are those of SVM for vegetable field and farm land 1, BPNN for settlement place, and FCM for farm land 2. MCSs are superior to

Table 2: AUC of each ROC curve.

Table 3: The comparison of overall accuracy and kappa coefficient.
BIC means the best individual classifier; SVM and MLC are the two best base classifiers in this paper. MCS improved the classification accuracy significantly compared with the base classifiers. The overall accuracy and kappa coefficient of WA-AHP were as good as those of CMV and better than those of BA.

Table 5: The comparison of user's accuracy.