Sparse Representation Based Binary Hypothesis Model for Hyperspectral Image Classification

The sparse representation based classifier (SRC) and its kernel version (KSRC) have been employed for hyperspectral image (HSI) classification. However, state-of-the-art SRC methods typically target extended surface objects with linear mixing in smooth scenes and assume that the number of classes is given. To handle small targets in complex backgrounds, a sparse representation based binary hypothesis (SRBBH) model is established in this paper. In this model, a query pixel is represented in two ways: by a background dictionary and by a union dictionary. The background dictionary is composed of samples selected from a local dual concentric window centered at the query pixel. Thus, for each pixel the classification task becomes an adaptive multiclass classification problem in which only the number of desired classes is required. Furthermore, the kernel method is employed to improve the interclass separability. In kernel space, the coding vector is obtained with the kernel-based orthogonal matching pursuit (KOMP) algorithm, and the query pixel is then labeled according to the characteristics of the coding vectors. Instead of directly using the reconstruction residuals, the different impacts that the background dictionary and the union dictionary have on reconstruction are used for validation and classification. This enhances the discrimination and hence improves the performance.


Introduction
The technology for recognizing artificial targets in a natural background is important to military science and civilian automatic control. A hyperspectral remote sensor captures digital images in hundreds of narrow spectral bands, which span the visible to infrared spectrum. The high spectral resolution of the data provides an invaluable source of information regarding the physical nature of the different materials and strengthens the capability to identify structures and objects in the image scene. As a result, it becomes possible to detect and classify the target at the same time, that is, to perform integrated detection and classification. However, such a large number of spectral channels implies high dimensionality of the data and poses a challenge to image analysis. Most of the common techniques designed for the analysis of grey level, color, or multispectral images are not applicable to hyperspectral images.
One of the most important applications of HSI is classification. Different materials usually reflect electromagnetic energy differently at specific wavelengths. This enables discrimination of materials based on their spectral characteristics. Various techniques have been developed for HSI classification. SVM [1][2][3] is a powerful tool for solving supervised classification problems in remote sensing image scenes and performs well. Variations of SVM-based algorithms have also been proposed to improve the classification accuracy [4,5]. In the context of supervised classification, such as SVM, an additional problem is the so-called Hughes phenomenon, which occurs when the training set does not have enough samples to ensure a reliable estimation of the classifier parameters. It is hard to find the separating hyperplane between two classes with very limited reference data while the number of spectral channels is usually very large in a hyperspectral image scene.
Sparse representation [6] has recently been applied to hyperspectral detection and classification [7][8][9], relying on the observation that pixels belonging to the same class approximately lie in the same low-dimensional subspace. Thus, a query pixel can be sparsely represented by a few training samples (atoms) from a dictionary, and the associated sparse representation vector will implicitly encode the class information. While SVM is a binary classifier (multiclass SVM requires a one-against-one or one-against-all strategy [10]), the SRC is a multiclass classifier that works from a reconstruction point of view. The SRC can be regarded as a generalized model and often results in good performance. However, the number of classes present in the hyperspectral image scene has to be known in order to structure the dictionary and calculate the residuals when employing the sparsity model as in [11,12]. In addition, the sparsity model may not be suitable for background pixels due to the interaction between the target and background sparse vectors.
In this paper, a sparse representation based binary hypothesis (SRBBH) model, which can be regarded as a discriminative and semisupervised model, is proposed to strengthen the performance of the sparsity model. Only the number of desired classes, which is accessible in most practical cases, is required for the SRBBH model. In this model, a target pixel is approximately represented by a union dictionary consisting of both the corresponding target training samples and background training samples, while a background pixel can be approximately represented by the background dictionary alone. Unlike in SRC, the different impacts that the background dictionary and the union dictionary have on reconstruction, instead of the residuals themselves, are used for validation and classification in the SRBBH model. This scheme enhances the discriminative power of the different subspaces and thereby improves the classification performance. However, when the data structure is complex and the problem becomes nonlinear, the SRBBH model based classifier (SRBBHC) may no longer be adequate. By implicitly exploiting the higher order structure of the given data, kernel algorithms obtain significant performance improvements. Therefore, the kernel SRBBHC (KSRBBHC) is developed to project the data into a high-dimensional feature space in which the data become linearly separable. Working on the projected data, KSRBBHC separately represents the desired class and the undesired class in the corresponding high-dimensional feature space and thus improves the classification performance.
The rest of the paper is organized as follows. Section 2 briefly reviews the conventional SRC and its kernel version. Section 3 proposes the SRBBHC and its kernel version for HSI. The effectiveness of the proposed SRBBHC and KSRBBHC is demonstrated by experimental results in Section 4. Finally, conclusions are drawn in Section 5.

SRC.
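In the sparsity model, it is assumed that the spectral signatures of pixels belonging to the same class approximately lie in the same low-dimensional subspace. Let $\mathbf{y} \in \mathbb{R}^{B}$ be a query pixel, where $B$ is the number of bands. Then the linear representation of $\mathbf{y}$ can be written in terms of all training samples as

$$\mathbf{y} = \mathbf{A}\mathbf{x}, \tag{1}$$

where $\mathbf{A} = [\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_M] \in \mathbb{R}^{B \times N}$ is a structured dictionary whose columns are the $N$ training samples of all $M$ classes and $\mathbf{x} \in \mathbb{R}^{N \times 1}$ is the sparse coefficient vector. $\mathbf{x}$ can be recovered by solving

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2 \quad \text{subject to} \quad \|\mathbf{x}\|_0 \le K_0, \tag{2}$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm, which is defined as the number of nonzero entries in the vector, and $K_0$ is a preset upper bound on the sparsity level. The problem in (2) is NP-hard; it can be approximately solved by greedy algorithms, such as orthogonal matching pursuit (OMP) [13] and subspace pursuit (SP) [14], or relaxed to a convex program [15]. In this paper, the OMP algorithm is used to generate the sparse coefficient vector. The OMP algorithm augments the support set by one index per iteration until $K_0$ atoms are selected or the approximation error is within a preset threshold. Once the sparse coefficient vector is obtained, the class label of $\mathbf{y}$ is determined by the minimal residual between $\mathbf{y}$ and its approximation from each class subdictionary:

$$\operatorname{Class}(\mathbf{y}) = \arg\min_{i = 1, \ldots, M} r_i(\mathbf{y}), \tag{3}$$

where $r_i(\mathbf{y}) = \|\mathbf{y} - \mathbf{A}_i\hat{\mathbf{x}}_i\|_2$ and $\hat{\mathbf{x}}_i$ is the part of $\hat{\mathbf{x}}$ associated with the subdictionary $\mathbf{A}_i$.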
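The recovery and labeling steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the paper: the function names `omp` and `src_classify`, the tolerance `tol`, and the toy dictionary are assumptions made for the example, and the dictionary columns are assumed to be normalized to unit length.

```python
import numpy as np

def omp(A, y, k0, tol=1e-6):
    """Orthogonal matching pursuit: pick at most k0 columns of A to fit y."""
    y = np.asarray(y, dtype=float)
    n_atoms = A.shape[1]
    support = []                      # indices of the selected atoms
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(min(k0, n_atoms)):
        if np.linalg.norm(residual) <= tol:
            break                     # approximation error is small enough
        corr = np.abs(A.T @ residual)
        if support:
            corr[support] = -np.inf   # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares refit on the whole support (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n_atoms)
    if support:
        x[support] = coef
    return x

def src_classify(A, labels, y, k0):
    """Label y with the class whose atoms give the smallest residual."""
    y = np.asarray(y, dtype=float)
    x = omp(A, y, k0)
    classes = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
        for c in classes
    ])
    return classes[int(np.argmin(residuals))], residuals

# Toy example (hypothetical data): four unit-norm atoms, two per class.
A = np.eye(4)
labels = np.array([0, 0, 1, 1])
y = np.array([0.0, 0.0, 1.0, 0.5])
label, residuals = src_classify(A, labels, y, k0=2)
print(label, residuals)
```

In the toy example the query pixel lies entirely in the span of the class-1 atoms, so the class-1 residual vanishes and the pixel is assigned to class 1.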

KSRC.
Kernel methods outperform the classical linear algorithms by implicitly exploiting the nonlinear information of the given data [16]. They rely on the observation that a pixel in the kernel-induced feature space can be linearly represented in terms of the training samples in the same space [9]. Let $\mathbf{y} \in \mathbb{R}^{B}$ be the data point of interest and let $\phi(\mathbf{y})$ be its representation in the kernel-induced feature space. Similar to the SRC, the linear representation of $\phi(\mathbf{y})$ in terms of training samples in the kernel-induced feature space can be formulated as

$$\phi(\mathbf{y}) = \mathbf{A}_{\phi}\mathbf{x}_{\phi}, \tag{4}$$

where $\mathbf{A}_{\phi}$ is the training dictionary in the kernel-induced feature space and $\mathbf{x}_{\phi}$ is the coefficient vector. The vector $\mathbf{x}_{\phi}$ can be recovered by solving

$$\hat{\mathbf{x}}_{\phi} = \arg\min_{\mathbf{x}_{\phi}} \|\phi(\mathbf{y}) - \mathbf{A}_{\phi}\mathbf{x}_{\phi}\|_2 \quad \text{subject to} \quad \|\mathbf{x}_{\phi}\|_0 \le K_0. \tag{5}$$

Problem (5) can be approximately solved by kernelized sparse recovery algorithms, such as kernelized orthogonal matching pursuit (KOMP) and kernelized subspace pursuit (KSP). Implementation details of KOMP and KSP can be found in [17]. In this paper, KOMP with the RBF kernel function is used to solve the problem. To avoid directly evaluating the inner product in the high-dimensional feature space, the kernel-based learning algorithm uses the kernel trick to compute dot products in the feature space without knowing the exact mapping function $\phi$.
Once the sparse vector $\hat{\mathbf{x}}_{\phi}$ is obtained, the residual associated with the $i$th class in the feature space is computed by

$$r_i(\mathbf{y}) = \|\phi(\mathbf{y}) - \mathbf{A}_{\phi}\hat{\mathbf{x}}_{\phi}^{(i)}\|_2 = \sqrt{\kappa(\mathbf{y}, \mathbf{y}) - 2(\hat{\mathbf{x}}_{\phi}^{(i)})^{T}\mathbf{k}_{A,y} + (\hat{\mathbf{x}}_{\phi}^{(i)})^{T}\mathbf{K}_{A}\hat{\mathbf{x}}_{\phi}^{(i)}}, \tag{6}$$

where $\hat{\mathbf{x}}_{\phi}^{(i)}$ keeps the entries of $\hat{\mathbf{x}}_{\phi}$ associated with class $i$ and sets all other entries to zero, and $\mathbf{k}_{A,y}$ and $\mathbf{K}_{A}$ are, respectively, the kernel evaluations of the dictionary with the query pixel and with itself. The class label of the query pixel $\mathbf{y}$ is then determined as

$$\operatorname{Class}(\mathbf{y}) = \arg\min_{i = 1, \ldots, M} r_i(\mathbf{y}). \tag{7}$$

Though SRC and KSRC have been shown to be powerful approaches [8], their main idea is only appropriate for the classification of extended surface targets, such as plantations and geological formations. In this case, the pixels in a large neighborhood are likely to consist of similar materials, and different subjects are next to each other without an undesired subject (background) between them. As a result, spectral mixing only occurs along the boundaries, so that most of the target pixels are observed without corruption by the background and most of the training samples selected from the dataset for the dictionary are pure. However, this is unlikely to hold for small targets, whose spectra are almost always mixed with the background. Thus, it may be unreliable for conventional SRC and KSRC to use the residual information for validation and classification. Moreover, these methods assume that all classes of subjects present in the hyperspectral image scene are known, including the number of classes and the corresponding training samples. In other words, the SRC and KSRC cannot distinguish between small targets in a hyperspectral image scene because training samples of the undesired class are generally lacking.
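The kernel trick used for the feature-space residual can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: the names `rbf_kernel` and `kernel_residual` are hypothetical, the RBF form $\exp(-\|\mathbf{a} - \mathbf{b}\|^2/\sigma^2)$ is assumed, and the coefficient vector is taken as given instead of being produced by KOMP. With a linear kernel the feature map is the identity, so the kernel residual must equal the ordinary residual $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2$.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """kappa(x, z) = exp(-||x - z||^2 / sigma^2) for all column pairs of X and Z."""
    sq = (X ** 2).sum(axis=0)[:, None] + (Z ** 2).sum(axis=0)[None, :] - 2.0 * X.T @ Z
    return np.exp(-np.maximum(sq, 0.0) / sigma ** 2)

def kernel_residual(K_A, k_Ay, k_yy, x):
    """||phi(y) - A_phi x||_2 computed from kernel values only."""
    r2 = k_yy - 2.0 * x @ k_Ay + x @ K_A @ x
    return float(np.sqrt(max(r2, 0.0)))   # clip tiny negative rounding errors

# Sanity check with a linear kernel, where phi is the identity map.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))   # hypothetical dictionary: 6 bands, 4 atoms
y = rng.normal(size=6)        # hypothetical query pixel
x = rng.normal(size=4)        # any coefficient vector
r_kernel = kernel_residual(A.T @ A, A.T @ y, y @ y, x)
r_direct = np.linalg.norm(y - A @ x)
print(np.isclose(r_kernel, r_direct))

# The same function works unchanged with RBF kernel values.
K_A = rbf_kernel(A, A, sigma=10.0)
k_Ay = rbf_kernel(A, y[:, None], sigma=10.0)[:, 0]
r_rbf = kernel_residual(K_A, k_Ay, 1.0, x)   # kappa(y, y) = 1 for the RBF kernel
```

To obtain a class-wise residual, the entries of `x` that do not belong to the class of interest would be set to zero before calling `kernel_residual`.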

SRBBHC and the Kernelized Version
In this section, we introduce the proposed SRBBH model based classification algorithm for HSI, which uses the binary hypothesis for quality validation and the reconstruction residuals under the two hypotheses for classification. Moreover, a kernelized version of the proposed classifier is also introduced for nonlinear classification in a high-dimensional feature space.

SRBBHC.
When employing SRC and KSRC for HSI, it is assumed that the number of classes present in the image scene, $M$, is known. However, it is actually difficult to obtain this information due to scene complexity. In many practical situations, only the number of desired classes is available.
Fortunately, for regions of interest such as artificial targets in a natural background, what we care about is the class label of the desired subjects, not the class labels of all kinds of subjects. In other words, we need to detect and then reject the undesired query samples before classification. On the other hand, unlike extended surface targets, the pixels of small targets are almost always mixed with the background spectrum. Detecting and classifying desired subjects from mixed pixels is generally difficult, especially when the background spectrum has an abundance close to or even larger than that of the target. In addition, although the background and target training samples have distinct spectral signatures and lie in two different subspaces, the two subspaces are usually not orthogonal, due to spectral variation [18]. In such a case, the reconstruction residual obtained with the corresponding target training samples may, on the contrary, be larger than that obtained with the background training samples, which leads to mistaking a target for background. Thus, it is no longer sufficient to directly use the reconstruction residuals for validation and classification. The SRBBHC solves these problems by using a binary hypothesis model with more reasonable dictionaries, where the query pixel is modeled with the background dictionary under the null hypothesis and with the union dictionary under the alternative hypothesis. The binary hypothesis is then used for validation. In SRBBHC and its kernelized version, the different impacts that the background dictionary and the union dictionary have on reconstruction, instead of the residuals themselves, are used for validation and classification. In a sense, the SRBBHC can be viewed as a joint target detection and classification scheme, which first detects the valid samples and then classifies them.
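In detail, denote the union dictionary consisting of both target training samples from class $i$ and background training samples as $\mathbf{A}^{u}_{i} = [\mathbf{A}^{t}_{i}, \mathbf{A}^{b}]$, $i = 1, \ldots, C$, where $\mathbf{A}^{t}_{i}$ is the target subdictionary associated with class $i$, $\mathbf{A}^{b}$ is the background subdictionary, and $C$ is the number of desired classes. If $\mathbf{y}$ belongs to the undesired class, its spectrum lies in a low-dimensional subspace spanned by the background training samples. As a result, the residuals of the different union dictionaries $\mathbf{A}^{u}_{i}$ are similar to the residual of the background subdictionary $\mathbf{A}^{b}$; this is hypothesis $H_0$. On the other hand, if $\mathbf{y}$ belongs to class $i$, the union dictionary $\mathbf{A}^{u}_{i}$ will give a better representation, leading to a smaller residual than the background subdictionary; this is hypothesis $H_1$. The binary hypothesis for quality validation is modeled as follows:

$$H_0: \mathbf{y} = \mathbf{A}^{b}\boldsymbol{\alpha} \quad (\text{undesired class}), \qquad H_1: \mathbf{y} = \mathbf{A}^{u}_{i}\boldsymbol{\beta}_{i} \quad (\text{desired class } i),$$

where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}_{i}$ are the coefficient vectors associated with $\mathbf{A}^{b}$ and $\mathbf{A}^{u}_{i}$, respectively. In other words, the problem is reformulated as a local binary classification problem, where the binary hypothesis is used to decide whether the test pixel is a valid sample from one of the desired classes.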

Mathematical Problems in Engineering
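According to sparse coding theory [15], the coefficient vectors can be recovered by solving the following minimization problems with the same sparsity level:

$$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{A}^{b}\boldsymbol{\alpha}\|_2 \quad \text{subject to} \quad \|\boldsymbol{\alpha}\|_0 \le K_0,$$

$$\hat{\boldsymbol{\beta}}_{i} = \arg\min_{\boldsymbol{\beta}_{i}} \|\mathbf{y} - \mathbf{A}^{u}_{i}\boldsymbol{\beta}_{i}\|_2 \quad \text{subject to} \quad \|\boldsymbol{\beta}_{i}\|_0 \le K_0, \quad i = 1, \ldots, C.$$

Once the sparse coefficient vectors are obtained, the semantic information can be directly extracted from them. The residuals of the background subdictionary and of the different union dictionaries are calculated as

$$r_0(\mathbf{y}) = \|\mathbf{y} - \mathbf{A}^{b}\hat{\boldsymbol{\alpha}}\|_2, \qquad r_i(\mathbf{y}) = \|\mathbf{y} - \mathbf{A}^{u}_{i}\hat{\boldsymbol{\beta}}_{i}\|_2, \quad i = 1, \ldots, C.$$

If the given $\mathbf{y}$ is a valid sample belonging to class $i$, the union dictionary $\mathbf{A}^{u}_{i}$ will also give a much better representation than the other union dictionaries $\mathbf{A}^{u}_{j}$, $j \ne i$, leading to a larger difference between $r_i$ and $r_0$. Then, $\mathbf{y}$ is assigned to the class with the greatest difference between $r_i$ and $r_0$. Defining the vector $\mathbf{R} = (r_0 - r_1, r_0 - r_2, \ldots, r_0 - r_C)$, the integrated detection and classification decision is made by

$$\operatorname{Detector}(\mathbf{y}) = \max(\mathbf{R}), \qquad \operatorname{Class}(\mathbf{y}) = \arg\max_{i = 1, \ldots, C} (r_0 - r_i) \quad \text{if } \operatorname{Detector}(\mathbf{y}) \ge \delta,$$

where $\delta$ is a threshold used for validation. When $\operatorname{Detector}(\mathbf{y}) < \delta$, the query pixel is labeled as the undesired class, that is, background. The threshold $\delta$ has an important effect on validation and hence on classification. However, in this study, $\delta$ is determined experimentally because no theory for analyzing this parameter is available. In future work, we will investigate how to automatically choose an appropriate $\delta$ for different test datasets.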
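The decision rule above depends only on the background residual, the union-dictionary residuals, and the threshold, so it can be illustrated independently of the sparse solver. The following is a minimal sketch: the function name `srbbh_decide`, the use of label 0 for background, and the numeric residuals are assumptions made for the example, not values from the paper.

```python
import numpy as np

def srbbh_decide(r0, r_union, delta):
    """Joint validation and classification from residuals.

    r0      : residual of the background dictionary
    r_union : residuals of the union dictionaries, one per desired class
    delta   : validation threshold (set experimentally in the paper)
    Returns (label, detector) with label 0 for background and 1..C otherwise.
    """
    R = r0 - np.asarray(r_union, dtype=float)   # gain of each union dictionary
    detector = float(R.max())
    if detector < delta:
        return 0, detector                      # rejected: undesired class
    return int(np.argmax(R)) + 1, detector      # class with the largest gain

# Hypothetical residuals for a pixel that the class-2 atoms explain well.
print(srbbh_decide(1.0, [0.9, 0.2, 0.8], delta=0.3))
# Hypothetical residuals for a pixel that the background explains equally well.
print(srbbh_decide(1.0, [0.95, 0.9, 0.97], delta=0.3))
```

Because the rule compares differences of residuals, the residuals must come from sparse codes with the same sparsity level; otherwise a larger dictionary budget alone could produce a spurious gain.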
Considering the size of the desired subjects, the background dictionary is generated locally for each query pixel through a dual concentric window centered at the query pixel. Only the samples in the outer region, that is, inside the outer window but outside the inner window, are included in $\mathbf{A}^{b}$. As a result, the background dictionary is constructed adaptively for each pixel and better captures the background spectral signature around the query pixel. It is important to note that the same sparsity level must be adopted for each union dictionary $\mathbf{A}^{u}_{i}$ to ensure that the residuals of the two hypotheses in SRBBHC are comparable.
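The local construction of the background dictionary can be sketched as follows. This is an illustrative assumption about the details the text leaves open: the function name `dual_window_background` is hypothetical, the window sizes are taken to be odd side lengths, and windows that extend past the image border are simply clipped.

```python
import numpy as np

def dual_window_background(cube, row, col, w_in, w_out):
    """Collect background atoms from the region between two concentric windows.

    cube        : hyperspectral image of shape (height, width, bands)
    row, col    : position of the query pixel
    w_in, w_out : odd side lengths of the inner and outer windows
    Returns a (bands, n_atoms) background dictionary.
    """
    height, width, _ = cube.shape
    half_in, half_out = w_in // 2, w_out // 2
    atoms = []
    for i in range(max(0, row - half_out), min(height, row + half_out + 1)):
        for j in range(max(0, col - half_out), min(width, col + half_out + 1)):
            # Keep the pixel only if it lies outside the inner window.
            if abs(i - row) > half_in or abs(j - col) > half_in:
                atoms.append(cube[i, j, :])
    return np.stack(atoms, axis=1)

# Hypothetical 20 x 20 scene with 5 bands and a 3 x 3 / 9 x 9 dual window.
cube = np.random.default_rng(0).random((20, 20, 5))
A_b = dual_window_background(cube, 10, 10, w_in=3, w_out=9)
print(A_b.shape)
```

Away from the border a 3 × 3 inner window and a 9 × 9 outer window yield 81 − 9 = 72 background atoms; near the border the clipped window yields fewer.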

KSRBBHC.
For a hyperspectral image scene, the spectral mixing may be nonlinear due to the complex imaging conditions in many practical situations [19]. As a result, the data structure may become complex and the problem is no longer linearly separable. In such a case, the linear SRBBHC is no longer adequate. Fortunately, kernel methods can project linearly nonseparable data into a high-dimensional feature space in which the data become more separable. Here we extend the proposed SRBBHC to a kernel version, referred to as KSRBBHC.
Similar to the KSRC, suppose that $\phi(\mathbf{y})$ is the representation of the query pixel in the high-dimensional feature space; the SRBBH model becomes

$$H_0: \phi(\mathbf{y}) = \mathbf{A}^{b}_{\phi}\boldsymbol{\alpha}_{\phi}, \qquad H_1: \phi(\mathbf{y}) = \mathbf{A}^{u}_{\phi,i}\boldsymbol{\beta}_{\phi,i}, \quad i = 1, \ldots, C,$$

and the decision is made as in SRBBHC from the residuals computed in the feature space, with $\operatorname{Detector}(\mathbf{y})$ compared against $\delta_{\kappa}$, where $\delta_{\kappa}$ is also a threshold used for validation. When $\operatorname{Detector}(\mathbf{y}) < \delta_{\kappa}$, the query pixel is labeled as the undesired class, that is, background. As in SRBBHC, although $\delta_{\kappa}$ is important for validation and classification, it is also determined experimentally.

Experimental Results
In this section, the classification performance of KSRBBHC is evaluated and compared with that of four other classifiers (SVMC, SRC, KSRC, and SRBBHC). The RBF kernel function $\kappa(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\|\mathbf{x}_i - \mathbf{x}_j\|^2/\sigma^2)$ is used for KSRC and KSRBBHC. The average recognition rate (ARR) and the overall recognition rate (ORR) are used as performance measures. The effectiveness of the proposed algorithms is evaluated with two datasets: a synthetic dataset ROI-I and a real dataset ROI-II, as shown in Figures 4 and 5.
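The two performance measures can be computed as in the sketch below. The definitions are assumptions, since the text does not spell them out: ORR is taken as the fraction of all pixels labeled correctly, and ARR as the mean of the per-class recognition rates over every class present in the ground truth, background included. The function name `recognition_rates` and the toy labels are hypothetical.

```python
import numpy as np

def recognition_rates(y_true, y_pred):
    """Return (ARR, ORR) for integer label arrays of equal length."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    # Recognition rate of each class: correct pixels / pixels of that class.
    per_class = np.array([(y_pred[y_true == c] == c).mean() for c in classes])
    arr = float(per_class.mean())
    orr = float((y_pred == y_true).mean())
    return arr, orr

# Toy case: 8 background pixels (label 0) and 2 target pixels (label 1),
# with one target pixel mistaken for background.
y_true = np.array([0] * 8 + [1] * 2)
y_pred = np.array([0] * 8 + [1, 0])
arr, orr = recognition_rates(y_true, y_pred)
print(arr, orr)
```

In the toy case a single error on the small target class lowers ARR to 0.75 while ORR stays at 0.9, which shows why the two measures can rank classifiers differently when desired pixels are rare.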
ROI-I is constructed by implanting five classes of targets (artificiality, clay, tree, plane, and grass) into a background scene of 100 × 100 pixels collected by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) over San Diego, CA, USA. The image has 224 spectral channels (189 available) with wavelengths ranging from 370 to 2510 nm. In detail, each class of target is linearly mixed with the background with an abundance varying from 0.3 to 0.7 in steps of 0.1, in five small neighborhoods of 2 × 2 pixels. In other words, the image contains 25 desired subjects occupying 100 pixels. Four unmixed samples per class are randomly chosen for training, and the dual window sizes $w_{\text{in}}$ and $w_{\text{out}}$ are set to 3 × 3 and 9 × 9 according to the size of the desired targets. We compute the ARRs and ORRs of KSRC and KSRBBHC for ROI-I with varying kernel parameter $\sigma$ and sparsity level $K_0$, as shown in Figure 1. One can see from Figure 1 that KSRC and KSRBBHC are both sensitive to $\sigma$, while $K_0$ plays a nearly negligible role for the two kernel algorithms when $\sigma$ is fixed, especially when $\sigma$ is fixed at 0.001, 0.01, or 0.1. When $\sigma$ is fixed at 10, the ARRs and ORRs of KSRC and KSRBBHC both remain at a relatively high level and change smoothly with $K_0$. As a result, the kernel parameter $\sigma$ is set to 10 for KSRC and KSRBBHC.
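The implanting procedure for the synthetic dataset can be sketched as a linear mixture applied to a small neighborhood. This is an illustration under assumptions: the abundance is taken to be the fraction of the target spectrum in the mixed pixel, and the function name `implant`, the flat background, and the target spectrum are hypothetical stand-ins for the AVIRIS data.

```python
import numpy as np

def implant(cube, target, top, left, abundance, size=2):
    """Linearly mix a target spectrum into a size x size neighborhood.

    Each affected pixel becomes abundance * target + (1 - abundance) * background.
    The input cube is left unchanged; a mixed copy is returned.
    """
    out = cube.astype(float).copy()
    patch = out[top:top + size, left:left + size, :]
    out[top:top + size, left:left + size, :] = (
        abundance * target + (1.0 - abundance) * patch
    )
    return out

# Hypothetical scene: flat background with 3 bands and one target spectrum.
background = np.ones((10, 10, 3))
target = np.full(3, 3.0)
abundances = [0.3, 0.4, 0.5, 0.6, 0.7]

scene = background
for k, a in enumerate(abundances):
    scene = implant(scene, target, top=1, left=2 * k, abundance=a)

# Pixel count for five classes, five abundances, and 2 x 2 neighborhoods.
n_classes, patch_pixels = 5, 2 * 2
print(n_classes * len(abundances) * patch_pixels)
```

The final count reproduces the 100 desired pixels stated in the text: 5 classes × 5 abundance levels × 4 pixels per neighborhood.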
ROI-II, with a size of 100 × 100 pixels, is a region taken directly from the real AVIRIS image of San Diego. It contains 3 classes of desired subjects occupying 317 pixels and undesired subjects occupying 9683 pixels. For each class, around 10% of the labeled samples are chosen randomly for training. The dual window sizes $w_{\text{in}}$ and $w_{\text{out}}$ are manually set to 7 × 7 and 11 × 11 according to our previous work [20]. The ARRs and ORRs of KSRC and KSRBBHC with varying $\sigma$ and $K_0$ for ROI-II are shown in Figure 2. In detail, as shown in Figure 2(a), when $\sigma$ is very small (0.001, 0.01), KSRC always performs poorly. When $\sigma$ is relatively large, the performance of KSRC generally increases smoothly with $K_0$ until the parameter reaches a certain level. Only when $\sigma$ is too large ($\sigma = 100$) does the performance of KSRC fluctuate with the value of $K_0$. For KSRBBHC, as shown in Figure 2(b), the performance remains at a high level at all sparsity levels when $\sigma$ is small (0.001, 0.01, and 0.1). The performance increases smoothly with $K_0$ when $\sigma$ is fixed at 1, until the parameter reaches a certain level. However, the performance decreases with $K_0$ when $\sigma$ is large (10, 100). In short, the experimental results for ROI-II also show that the classification performance is more sensitive to the kernel parameter than to the sparsity level. Unlike in ROI-I, where the same kernel parameter is optimal for both classifiers, in ROI-II the kernel parameter $\sigma$ is set to 10 for KSRC and 0.01 for KSRBBHC.
Figure 3 shows the classification performance of the four sparsity-based classifiers with varying $K_0$ when the kernel parameter $\sigma$ is optimized for each algorithm. For ROI-I, as shown in Figure 3(a), the classification performance of the four algorithms generally increases with $K_0$ until the parameter reaches a certain level and then levels off. As a result, $K_0$ is set to 8 to balance performance and complexity for all sparsity-based classifiers in ROI-I. For ROI-II, the classification performance of SRC, KSRC, and SRBBHC increases with $K_0$ when the parameter is smaller than 6; beyond that, the performance of SRC and KSRC keeps increasing smoothly with $K_0$, whereas the performance of SRBBHC decreases gradually. The reason may be that only a proper $K_0$ leads to good discriminative performance of sparsity-based classifiers. In detail, for a very small $K_0$, the residuals of the background dictionary and the union dictionary are both large, which weakens the classification performance. When $K_0$ is too large, given the nonorthogonality between the desired target and the background, the solution may be dense and result in degraded discriminative power. As expected, for both ROI-I and ROI-II the performance of KSRBBHC reaches a good result with a relatively small $K_0$ and remains at a high level when $K_0$ increases (i.e., it is insensitive to $K_0$). The parameter $K_0$ is set to 6 to balance performance and complexity for all sparsity-based classifiers in ROI-II.
After obtaining the optimal values of $K_0$ and $\sigma$, the experimental results of the five classifiers, averaged over five independent realizations, are evaluated via ORR and ARR, as shown in Tables 1 and 2. Because only the number of desired classes is known, it should be noted that the SVMC is designed as a local $(C + 1)$-class ($C = 5$ for ROI-I; $C = 3$ for ROI-II) classifier with a one-against-all strategy, where the training samples from class $C + 1$ (background) are adaptively collected for each query pixel in the local dual concentric window. One can observe from Tables 1 and 2 that both SRBBHC and KSRBBHC significantly improve the classification accuracy. One can also see that the ordering of two classifiers by ORR sometimes does not match their ordering by ARR. The reason for this may be that the total number of desired pixels is so small that even a modest increase in the classification accuracy of the desired targets can lead to a large increase in ARR but only a small change in ORR. For example, for KSRBBHC in Table 1, the classification accuracy for every single class is greater than or equal to that of SRBBHC, leading to a greater ARR. However, because a small relative gain on the background corresponds to a large absolute number of correctly classified pixels, the ORR of SRBBHC turns out to be greater than that of KSRBBHC. Considering this mismatch, the best of the five independent results for each classifier is chosen to illustrate the performance intuitively, as shown in Figures 4 and 5. The corresponding algorithm and ORR are given at the bottom of each map. One can clearly see that both binary hypothesis model based classifiers significantly improve the classification performance. The SRBBHC not only classifies the desired targets accurately but also effectively avoids mistaking background for desired targets. Furthermore, the KSRBBHC outperforms all other classifiers. In detail, for ROI-I, because of the mixture of desired target and background, the SVMC, SRC, and KSRC mistake desired targets for background to a great extent. More seriously, the SRC mistakes many background pixels for desired targets, while the KSRBBHC and SRBBHC yield superior performance. For ROI-II, the SVMC barely classifies the second class of target correctly because of the lack of training samples and the similarity between the second class of target and the background. The conventional sparsity-based classifiers (SRC and KSRC) produce a considerable number of misclassifications due to their weak discriminative power. In short, one can observe from the experimental results that the SRBBH model based classifier (SRBBHC) offers better performance than some traditional techniques. Moreover, the kernelization of SRBBHC further improves the performance.

Figure 1: Effect of sparsity level $K_0$ and RBF kernel parameter $\sigma$ on the performance of the two kernel-based classifiers for ROI-I: (a) KSRC; (b) KSRBBHC.

Figure 2: Effect of sparsity level $K_0$ and RBF kernel parameter $\sigma$ on the classification performance of the two kernel-based classifiers for ROI-II: (a) KSRC; (b) KSRBBHC.

Table 1: Classification accuracy for ROI-I using 4 unmixed samples per class as the training set.

Table 2: Classification accuracy for ROI-II using around 10% of the labeled samples as the training set.
corresponding algorithm (SRBBHC) based on this model is proposed for HSI classification. Furthermore, taking the