Dictionary-Based, Clustered Sparse Representation for Hyperspectral Image Classification

This paper presents a new, dictionary-based method for hyperspectral image classification that incorporates both the spectral and contextual characteristics of a sample: the image is clustered to obtain a dictionary for each pixel, so that pixels in the same clustered group share a common sparsity pattern. We calculate the image's sparse coefficients with this dictionary, which yields the sparse representation features of the remote sensing images. The sparse coefficients are then used to classify the hyperspectral images via a linear SVM. Experiments show that the proposed dictionary-based, clustered sparse coefficients create better representations of hyperspectral images, with greater overall accuracy and Kappa coefficient.


Introduction
With the development of electronic spectrum theory as well as electronic and computer technology, hyperspectral remote sensing (HRS) has advanced rapidly in recent years. The resulting hyperspectral remote sensing images are of much better quality, which entails the need for larger storage capacity. Hyperspectral imagery (HI) captures detailed terrestrial information with high resolution in both the spatial and spectral dimensions [1]. Analyzing hyperspectral remote sensing data can yield abundant spectral information and detailed features [2]. Substances that cannot normally be detected by multispectral remote sensing techniques can be successfully studied using hyperspectral remote sensing. HRS data has become an important data source in fields such as precision agriculture, natural disaster research, atmospheric observation, fog monitoring, environmental monitoring, and resource investigation [3].
Hyperspectral remote sensing images are composed of pixels that reflect the characteristics of the terrain objects. Each pixel represents the surface features at hundreds of wavelengths of solar radiation. Mathematically, these pixels are modeled as members of a vector space. However, due to the inherent limitations of hyperspectral sensors, which often neglect the correlation between the signals, the sampled data has a rate that far exceeds the effective dimension of the signals, thereby causing issues with dimensionality. In recent years, researchers have proposed many dimension reduction methods, such as principal component analysis (PCA) [4], linear discriminant analysis (LDA) [5], and independent component analysis (ICA) [6]. Since 2005, Camps-Valls and Bruzzone have carried out research on hyperspectral remote sensing image classification using machine learning methods [7]. Representing signals with low-dimensional models is a recent research trend, known as dictionary learning [8]. The idea of dictionary learning is to represent a signal using a linear combination of a few elements from a dictionary, which is learned from the data. Each data point is represented through a sparse vector of coefficients, as a member of a low-dimensional subspace spanned by a few dictionary elements. Iordache et al. studied HI unmixing using the dictionary learning method [9]. In 2010, Charles et al. achieved the sparse representation of hyperspectral remote sensing images using the common dictionary learning method for images [10]. Then, in 2013, Soltani-Farani et al. created a sparse representation of remote sensing images in the spatial domain using the spatial relationships within the hyperspectral remote sensing image [11].
In this paper we propose a new high-quality and efficient classification technique that extends existing dictionary learning-based classification frameworks in several aspects. We incorporate both the spectral and contextual characteristics of a hyperspectral sample by clustering the remote sensing images to obtain a dictionary of pixels, and we present a cluster-based dictionary learning method; pixels that belong to the same cluster group are often made up of the same materials. This property holds for various hyperspectral image singularities such as straight and corner edges, as shown in Figure 1. Using a linear SVM as a classifier, we then classify the hyperspectral remote sensing images. We compare this cluster-based dictionary learning method with other alternatives for classification and show that it performs significantly better in terms of both accuracy and Kappa coefficient.

Basic Model
For a set of pixels of a hyperspectral image, let x ∈ R^B denote a pixel with B spectral bands. The fundamental goal of the dictionary learning method is to find a set of atomic signals D = [d_1, ..., d_K] that represents the hyperspectral data by a small number of terms in a linear generative model; that is,

x = Dy + ε. (1)

In this paper, we use lowercase letters to represent vectors (such as x) and capital letters to represent matrices. Here, ε is a small residual due to modeling x in a linear manner with the sparse representation vector y ∈ R^K. The formulation of (1) is often cast as a regularized least-squares optimization:

arg min_{D,Y} (1/2)‖X − DY‖_F^2 + λ Ω(Y), (2)

where X = [x_1, ..., x_N], Y = [y_1, ..., y_N], and ‖·‖_F denotes the Frobenius norm of a matrix. The parameter λ trades off between the data-fidelity (least-squares) term and the sparsity-based regularizer Ω(Y). Problem (2) can be interpreted as finding a maximum a posteriori (MAP) estimate of the coefficients under the assumptions of Gaussian noise and an independent identically distributed (i.i.d.) prior; traditionally, a Laplacian prior is preferred, as it leads to the well-known Lasso or ℓ1 minimization and can be expressed as [12]

arg min_{D,Y} (1/2)‖X − DY‖_F^2 + λ Σ_i ‖y_i‖_1. (3)

Tibshirani et al. used (3) to solve problem (2). The basic problem of dictionary learning is to learn, through sparse regularization, the representations y_1, ..., y_N. The above optimization is convex in either D or Y, but not in both. Commonly, a two-step strategy is used for this problem.
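As a concrete sanity check, the regularized objective of (3) can be evaluated directly. The following sketch uses illustrative names (it is not the paper's code) and computes the data-fidelity term plus the ℓ1 penalty:

```python
import numpy as np

def dictionary_objective(X, D, Y, lam):
    """Evaluate 0.5*||X - D Y||_F^2 + lam * sum_i ||y_i||_1,
    the Lasso-style dictionary learning objective."""
    residual = X - D @ Y
    data_fit = 0.5 * np.linalg.norm(residual, "fro") ** 2
    sparsity = lam * np.abs(Y).sum()  # l1 norm over all coefficients
    return data_fit + sparsity
```

A perfect reconstruction with λ = 0 gives an objective of zero; increasing λ penalizes the total magnitude of the coefficients.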
(1) Sparse Coding. In this step, D is fixed and the optimization is solved with respect to Y. The objective function (3) then decouples and can be solved for each y_i independently; that is,

arg min_{y_i} (1/2)‖x_i − D y_i‖_2^2 + λ‖y_i‖_1. (4)

Several efficient algorithms have been proposed in recent years to solve (4); this paper uses the solver provided by the SPAMS toolbox [11].
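As a minimal stand-in for the sparse coding step, the ISTA algorithm reaches the same Lasso solution as the SPAMS solver used in the paper, though far less efficiently; this is an illustrative sketch, not the toolbox's implementation:

```python
import numpy as np

def sparse_code_ista(x, D, lam, n_iter=200):
    """Solve min_y 0.5*||x - D y||_2^2 + lam*||y||_1 by the
    iterative shrinkage-thresholding algorithm (ISTA)."""
    y = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)       # gradient of the quadratic term
        z = y - grad / L               # gradient step
        y = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return y
```

For an orthonormal dictionary the result reduces to soft-thresholding of the correlations D^T x, which is a convenient correctness check.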
(2) Dictionary Update. In this step, Y is fixed and the optimization becomes

arg min_D (1/2)‖X − DY‖_F^2, (5)

which is quadratic in D. The gradient of the objective function equals DYY^T − XY^T, which vanishes for D = XY^T(YY^T)^{−1}. There are many ways of solving this problem; we use block coordinate descent (BCD) [13], which updates the dictionary atoms iteratively. Since the objective function is strongly convex in each atom, BCD is guaranteed to achieve the unique solution. Writing A = YY^T and B = XY^T, solving for the k-th atom d_k with all other atoms fixed gives the update

d_k ← d_k + (b_k − D a_k)/A_kk,

where a_k and b_k denote the k-th columns of A and B, respectively.
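The atom-wise BCD sweep can be sketched as follows. This follows the classical atom update with a unit-norm projection and is an illustrative reconstruction, not the paper's code:

```python
import numpy as np

def bcd_dictionary_update(X, D, Y, n_passes=1):
    """Block coordinate descent over dictionary atoms with Y fixed:
    d_k <- d_k + (b_k - D a_k) / A_kk, then projection to unit norm,
    where A = Y Y^T and B = X Y^T."""
    A = Y @ Y.T
    B = X @ Y.T
    D = D.copy()
    for _ in range(n_passes):
        for k in range(D.shape[1]):
            if A[k, k] < 1e-12:
                continue  # atom unused by the current sparse codes
            u = D[:, k] + (B[:, k] - D @ A[:, k]) / A[k, k]
            D[:, k] = u / max(np.linalg.norm(u), 1.0)  # keep ||d_k||_2 <= 1
    return D
```

Each sweep cannot increase the quadratic objective (5), so alternating this with the sparse coding step drives the overall cost down monotonically.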

Clustered Sparse Representation for Hyperspectral Image Classification
Recently, Song and Jiao used the sparse representation method for hyperspectral image classification [14], in which the sparse representation coefficients y_i are considered independent of each other. Soltani-Farani et al. [11] partitioned the pixels into groups of the same size, such as group 1 in Figure 1. Yet the features of a group are not necessarily similar when pixels are grouped by identical size, such as those in groups 2 and 3 in Figure 1. To solve this problem and further improve classification accuracy, we propose partitioning the pixels of the hyperspectral images into a number of spatial neighborhoods, called groups, by clustering. Pixels that belong to the same cluster group are often made up of the same materials, so we assume that their representations use a common set of atoms from the dictionary. Thus, the sparse representations of the HSI pixels that belong to the same group are no longer independent; in fact, the pixels in the same cluster group reveal hidden relationships within the spectral bands. HSI are a collection of hundreds of images acquired simultaneously in narrow and adjacent spectral bands. In this research, x_1, ..., x_N denote the pixels of a hyperspectral image, and the cluster groups G_1, ..., G_g are defined as nonoverlapping image patches. Figure 1 shows how the pixels of a hyperspectral image may be partitioned into a number of different groups. Under the above assumption, the sparse representation model can now be written as

X_g = D Y_g + E_g, (7)

where the columns of Y_g and E_g are the sparse representations and error vectors corresponding to the hyperspectral samples of group g, respectively. To obtain the dictionary and sparse representations, we employ the ℓ2/ℓ1 convex joint-sparsity-inducing regularizer in (2) to arrive at

arg min_{D,Y} Σ_g (1/2)‖X_g − D Y_g‖_F^2 + λ_g ‖Y_g‖_{2,1}, (8)

where λ_g is the regularization parameter for group g and ‖Y_g‖_{2,1} is the ℓ2/ℓ1 norm, that is, the sum of the ℓ2 norms of the rows of Y_g. To solve this problem, we have empirically adopted the regularized M-FOCUSS algorithm [15]: by estimating the ℓ2 norm of each row, we update Y_g according to the estimated value.
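The group objective above can be evaluated as a sanity check. The column-index representation of clusters, and the names groups and lams, are illustrative assumptions:

```python
import numpy as np

def group_objective(X, D, Y, groups, lams):
    """Joint-sparsity objective: for each group g,
    0.5*||X_g - D Y_g||_F^2 + lam_g * ||Y_g||_{2,1},
    where ||Y_g||_{2,1} sums the l2 norms of the rows of Y_g."""
    total = 0.0
    for g, lam_g in zip(groups, lams):
        Xg, Yg = X[:, g], Y[:, g]
        fit = 0.5 * np.linalg.norm(Xg - D @ Yg, "fro") ** 2
        l21 = np.linalg.norm(Yg, axis=1).sum()  # row-wise l2 norms, summed
        total += fit + lam_g * l21
    return total
```

The ℓ2/ℓ1 penalty is what couples the pixels of a group: zeroing an entire row of Y_g removes one dictionary atom for the whole group at once.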
Setting the gradient of the objective function in (8) to zero, we arrive at

D^T (D Y_g − X_g) + λ_g Λ^{−1} Y_g = 0, (9)

where Λ = diag(‖Y_g^k‖_2) is built from the ℓ2 norms of the rows of Y_g. By solving (9), we arrive at Y_g = Λ D^T (D Λ D^T + λ_g I)^{−1} X_g. As shown in Figure 1, the pixels in the same cluster group have hidden relationships within the spectral bands. To implement the dictionary method, we used x_1, ..., x_n to denote the spectral representations of the training data with respective labels l_1, ..., l_n, and then applied the dictionary learning formulation of (4) to these samples to yield the corresponding sparse representations y_1, ..., y_n and the dictionary D. When a new hyperspectral sample arrives, sparse coding can be applied (as in (4)) to find the corresponding sparse representation y, which is then classified using the trained linear SVM to find the corresponding label l. The specific steps are as follows: (1) cluster the hyperspectral image into different groups by k-means++ [16]; (2) apply the dictionary learning method, using the SPAMS toolbox to solve (4), which yields the dictionary D and the corresponding sparse representation coefficients y with respective labels l_1, ..., l_n; (3) train a linear SVM classifier on the sparse representations and their corresponding labels; (4) use the trained linear SVM classifier on the sparse representations of the remote sensing image to obtain the final classification.
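One regularized M-FOCUSS iteration for a cluster group can be sketched as follows; eps is a small illustrative constant guarding against zero rows, and the names are assumptions rather than the paper's code:

```python
import numpy as np

def focuss_step(Xg, D, Yg, lam_g, eps=1e-8):
    """One M-FOCUSS reweighting step:
    Lambda = diag of row-wise l2 norms of Y_g, then
    Y_g <- Lambda D^T (D Lambda D^T + lam_g I)^{-1} X_g."""
    row_norms = np.linalg.norm(Yg, axis=1) + eps  # avoid singular Lambda
    Lam = np.diag(row_norms)
    M = D @ Lam @ D.T + lam_g * np.eye(D.shape[0])
    return Lam @ D.T @ np.linalg.solve(M, Xg)
```

Iterating this map shrinks rows whose ℓ2 norm is already small, driving the group representation toward a shared set of active atoms.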

Experimental Results and Analysis
In this section, in order to validate and test the effectiveness of the proposed clustered dictionary-based algorithm, we provide experimental results on two sets of real hyperspectral images. We first compare against the basic SVM classifier (SVM) [17], a prominent machine learning method, alongside artificial neural networks, that is applied to practical problems such as classification and regression estimation. In this paper, Libsvm 3.17 is used for the experiments. Classification accuracy depends on the choice of parameters; all parameters (the polynomial kernel degree, the RBF kernel parameter, the regularization parameter, the composite kernel weight, and the window width) are obtained by fivefold cross validation. The spectral-contextual dictionary learning (SCDL) method was presented by Soltani-Farani et al. [11], in which the pixels are partitioned into groups of the same size. The clustered dictionary learning (Cluster-DL) method is presented by our team; the pixels of a hyperspectral image are partitioned into a number of different groups by k-means++. We also compare the spectral characteristics gathered from the dictionary learning method, made up of dictionary atoms, with the original spectral characteristics of the remote sensing images. The experiments adopt four indicators for evaluation: overall accuracy (OA), average accuracy (AA), Kappa coefficient, and execution time.
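As an illustration of the fivefold protocol used for parameter selection, index splits can be generated as follows; this is a generic sketch (the paper's actual parameter grid is not reproduced, and the function name is illustrative):

```python
import numpy as np

def five_fold_indices(n, seed=0):
    """Shuffle n sample indices and split them into 5 folds,
    returning (train_idx, test_idx) pairs for cross validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, 5)
    return [
        (np.concatenate(folds[:k] + folds[k + 1:]), folds[k])
        for k in range(5)
    ]
```

Each candidate parameter setting would be scored by averaging the classifier's accuracy over the five held-out folds, and the best-scoring setting kept for the final model.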
4.1. The 1st Experiment. The 1st experiment uses an image collected over an agricultural/forested area in NW Indiana by the AVIRIS sensor, called the Indian Pines image. The image is 145 × 145 pixels and consists of 220 bands across the spectral range 0.2 to 2.4 μm; the 20 noisy bands (104-108, 150-163, and 220) that correspond to the region of water absorption have been removed. The image consists of 16 ground-truth classes; the specific classes and the number of training and test samples in each class are shown in Table 1.
We randomly chose 10% of the samples as training data, as shown in Figure 2(c); the remaining 90% are test data. Table 2 displays the test results, which contain the OA, AA, and Kappa coefficient. The SVM classification result is shown in Figure 2(d), whereas the classification maps obtained by the other methods can be found in Figures 2(e) and 2(f). As a means of visual comparison, we learned dictionaries with 138 atoms (using 10% of the Indian Pines training data). Figure 3 shows the comparison of sample spectra for Alfalfa in the Indian Pines dataset with the learned dictionary atoms obtained by SCDL and Cluster-DL.
In Figure 3, we can see that, for the sample spectra of Alfalfa in the Indian Pines dataset, the learned dictionary atoms obtained by both Cluster-DL and SCDL are close to the sample; relatively speaking, the Cluster-DL atom is closer to the sample spectra.
According to the experimental results, the clustered dictionary learning algorithm proposed in this paper can significantly improve classification accuracy. In the 1st experiment, Cluster-DL improves the classification accuracy from 0.9664, without the clustered dictionary learning algorithm, to 0.9679. In the 2nd experiment, Cluster-DL improves the classification accuracy from 0.9488 to 0.9734; this means that the clustered dictionary learning algorithm has more obvious advantages when the terrain is more complex. The execution time of the SVM algorithm is less than that of the dictionary learning algorithms, which also illustrates that the clustered structural dictionary learning improves the classification accuracy at the cost of increased execution time.

Conclusion and Discussion
In this paper, we have investigated clustered dictionary learning algorithms based on models of hyperspectral data for HSI classification. Our method represents a hyperspectral sample with a linear combination of a few atoms learned from the data; identical clustered groups share the atoms of the dictionary. The hyperspectral samples are classified by a linear SVM trained on the coefficients of this linear combination.
Experiments on two sets of real HSI data confirmed this model's effectiveness for HSI classification and show that the proposed method can achieve better overall accuracy and Kappa coefficients. This is because the basic SVM classification does not take into account the relationships between the pixels, and the SCDL classification partitions the pixels into groups of the same size, where the features of a group are not necessarily similar. In this paper, the hyperspectral image is partitioned into a number of different groups by k-means++; the pixels in the same cluster groups reveal hidden relationships within the spectral bands, which is closer to the real distribution of objects. Further research is needed to better understand how to integrate the spatial and spectral information of HSI and to utilize supervised classification algorithms to improve classification accuracy and execution time.

Figure 1: The pixels of a hyperspectral image partitioned into a number of different clustered groups.

Figure 2: Indian Pines hyperspectral image and the comparison maps of different classification.

Figure 3: The comparison map of sample spectra for Alfalfa in the Indian Pines dataset and the learned dictionary atom obtained by Cluster-DL and SCDL.

Figure 4: Pavia Center hyperspectral image and the comparison maps of different classification.

Table 1: Indian Pines ground-truth classes and train/test sets.
Figures 4(d) and 4(e) show the classification maps obtained by SCDL and Cluster-DL (Table 3).

Table 2: Classification accuracy and execution time (s) for AVIRIS Indian Pines for different classifiers.

Table 3: Classification accuracy for Pavia Center for different classifiers.