Target Recognition of Synthetic Aperture Radar Images Based on Two-Phase Sparse Representation

A synthetic aperture radar (SAR) target recognition method is proposed via linear representation over global and local dictionaries. The collaborative representation is performed on the local dictionary, which comprises training samples from a single class. The resulting reconstruction errors for the test sample reflect the absolute representation capabilities of the different training classes. Accordingly, the target label can be decided directly when one class achieves a notably lower reconstruction error than the others. Otherwise, several classes with relatively low reconstruction errors are selected as candidates to form the global dictionary, on which sparse representation-based classification (SRC) is performed. SRC also produces reconstruction errors for the candidate classes, which reflect their relative representation capabilities for the test sample. The reconstruction errors from the collaborative representation and SRC are then fused for decision-making. The proposed method therefore inherits the high efficiency of the collaborative representation. In addition, the selection of candidate training classes relieves the computational burden of SRC. By combining the absolute and relative representation capabilities, the final classification accuracy can also be improved. In the experimental evaluation, the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset is employed to test the proposed method under several operating conditions, and the proposed method is compared with several other SAR target recognition methods. The results show the superior performance of the proposed method.


Introduction
Synthetic aperture radar (SAR) works day and night to provide high-resolution observations of ground scenes. In battlefield information acquisition in particular, automatic target recognition (ATR) is often employed to determine the labels of the targets of interest in a SAR image. Over the past thirty years, a rich set of SAR ATR methods has been developed. They can be broadly divided into two mainstreams, template-based and model-based, which differ in how they describe the targets' characteristics [1]. The Semi-Automated Image Intelligence Processing (SAIP) program was a typical template-based SAR ATR system, in which the classical three-stage processing procedure was proposed, i.e., target detection, discrimination, and classification [2]. The template-based methods describe the target characteristics by template images measured under different conditions. The template set is first constructed from SAR images with different view angles, backgrounds, target configurations, etc. Afterwards, the classification stage builds the relations between the test sample to be classified and the template classes. Finally, the template class with the highest matching score (relation) with the test sample is determined as the target class. In comparison, the model-based methods describe the targets using CAD models [3], global scattering center models [4,5], etc. The Moving and Stationary Target Acquisition and Recognition (MSTAR) program [4] provided a feasible way of conducting model-based SAR ATR, in which CAD models were used to predict the targets' signatures. In recent years, 3D scattering center models of complex targets have been developed and applied to SAR ATR because of their concise forms and high flexibility [4,5]. Owing to the release of the MSTAR dataset, many pattern recognition techniques, including feature extraction techniques and classifiers, have been employed or improved to enhance the recognition performance.
Various types of features are extracted to convey the targets' characteristics in SAR images. Geometrical features describe the physical sizes or shapes of the target, such as the target contour [6,7], region [8][9][10][11], and shadow [12]. Projection features depict the intensity distribution of the target's image using mathematical transformations (e.g., principal component analysis (PCA) [13], nonnegative matrix factorization (NMF) [14], and other manifold learning algorithms [15][16][17]) or signal processing techniques (e.g., wavelet [18] and monogenic signal [19,20]). When working in the high-frequency region, the backscattering field of the whole target can be regarded as the summation of several scattering centers [21,22]. Therefore, scattering centers, such as attributed scattering centers [23][24][25][26], are also good candidates for SAR ATR. Based on the extracted features, classifiers are designed to make decisions on the target labels. The nearest neighbor (NN) [13,27], support vector machine (SVM) [28,29], sparse representation-based classification (SRC) [29][30][31], adaptive boosting [32], and the recent deep learning classifiers (e.g., convolutional neural network (CNN) [33][34][35][36][37]) are popular classification schemes in SAR ATR.
SRC is a classifier that originated from compressive sensing theory and was first applied to face recognition by Wright et al. [38]. Their results demonstrated its superiority over traditional classifiers such as NN and SVM, especially its robustness to nuisance conditions such as noise corruption and partial occlusion. Liu and Li brought SRC into the classification of SAR targets and validated its effectiveness [29]. Thereafter, many SAR ATR methods employed SRC as the basic classifier while making some improvements [19,20,30]. As reported in these works, the basic idea behind SRC is to linearly represent the test sample over a global dictionary comprising the training samples of all classes and then to compare the class-wise reconstruction errors. Ideally, the training samples from the correct class reconstruct the test sample with the minimum error, so the target label can be correctly decided. In essence, the reconstruction errors of different classes reflect their relative capabilities for representing the test sample. However, the absolute representation capabilities of different training classes are not fully compared in traditional SRC. For a reliable decision, the correct class should be able to approximate the test sample with a small error, which validates its absolute representation capability, before the relative representation capabilities of different classes are compared. Therefore, this study proposes a SAR ATR method via two-phase sparse representation over local and global dictionaries. Each local dictionary is formed by the training samples from an individual class. Then, the collaborative representation is performed over each local dictionary [39]. Different from the sparse representation, the collaborative representation aims to best reconstruct the test sample using all the atoms in the local dictionary, with no sparsity constraint on the linear coefficients.
As a result, the collaborative representation has an analytic solution with high precision. Therefore, it can obtain more effective reconstruction results than traditional sparse representations. By comparing the reconstruction errors of different classes from the collaborative representation, their absolute representation capabilities can be compared, which provides a good reference for determining the target label. When the reconstruction error of one class is notably lower than that of the others, the target label can be decided directly. Otherwise, when more than one class has similar reconstruction errors, the absolute representation capability alone is not reliable for the present classification task. In this case, the classes with low reconstruction errors are selected as candidates to form a global dictionary, based on which the test sample is further classified by SRC. Afterwards, the reconstruction errors from SRC are fused with those from the collaborative representation for the classification. In this way, both the absolute and relative representation capabilities of a training class can be exploited. In addition, the selection of the candidate classes for the global dictionary can effectively reduce the computational complexity of SRC and the interference from wrong classes. Therefore, both the effectiveness and the efficiency of the classification can be enhanced.
The remainder of this study is organized as follows. In Section 2, the collaborative representation over the local dictionary is introduced. Section 3 describes SRC over the global dictionary formed by the candidate classes selected in the collaborative representation stage. In Section 4, the proposed target recognition algorithm via the two-phase sparse representation is explained. Experiments on the MSTAR dataset are presented in Section 5. Section 6 draws conclusions based on the experimental results.

Collaborative Representation over Local Dictionary
Collaborative representation [39] was proposed by Zhang et al. with application to face recognition. In comparison with the sparse representation, the collaborative representation tries to linearly represent the test sample with the minimum error. Therefore, the collaborative representation is a convex optimization problem with an analytic solution. For target classification, the reconstruction errors of different classes are compared, and the class with the minimum error is determined to be the target label. In this study, the collaborative representation is used to evaluate the absolute representation capabilities of different classes. Different from the strategy in [28], this paper performs the collaborative representation over the local dictionary formed by the training samples from a single class. In this way, each training class can be fully exploited to represent the test sample, and the reconstruction errors from the collaborative representation over the local dictionaries can better reflect the absolute representation capabilities of the classes. Denote the training samples from the $k$th class as $X_k = [x_{k,1}, \cdots, x_{k,n_k}] \in \mathbb{R}^{d \times n_k}$ $(k = 1, \cdots, C)$, where $d$ is the dimension of the atoms. A test sample $y$ from the $k$th class can be linearly represented as follows:

$$y = X_k \alpha_k, \tag{1}$$

where $\alpha_k = [\alpha_{k,1}, \cdots, \alpha_{k,n_k}]^T \in \mathbb{R}^{n_k}$ denotes the linear coefficient vector.
In the collaborative representation, the linear coefficient vector for the test sample $y$ is obtained using the following Lagrangian formulation:

$$\hat{\alpha}_k = \arg\min_{\alpha_k} \|y - X_k \alpha_k\|_2^2 + \lambda \|\alpha_k\|_2^2. \tag{2}$$

The problem in equation (2) is a convex optimization problem, which has a closed-form solution as

$$\hat{\alpha}_k = (X_k^T X_k + \lambda I)^{-1} X_k^T y, \tag{3}$$

where $I$ denotes the identity matrix and $\lambda$ is the regularization factor. With the estimated coefficient vectors $\hat{\alpha}_k$, the reconstruction errors of different training classes can be calculated as follows:

$$r_{CR}(k) = \|y - X_k \hat{\alpha}_k\|_2^2 \quad (k = 1, \cdots, C). \tag{4}$$

The reconstruction errors of different classes reflect how precisely they can linearly represent the test sample. In this sense, they can be used to decide the target label according to the rule of the minimum error as

$$\text{identity}(y) = \arg\min_{k} r_{CR}(k). \tag{5}$$

The decision rule in equation (5) compares the absolute representation capabilities of different classes to determine the target label. However, when the minimum reconstruction error is close to those of other classes, the decision from equation (5) is assumed to be not reliable enough. Therefore, in this study, some modifications are made to the conventional decision rule. Denote $r_{CR}(k)$ as the minimum of all the reconstruction errors, and the modified decision rule is as follows:

$$\text{identity}(y) = \begin{cases} k, & \text{if } r_{CR}(j) - r_{CR}(k) > T_1 \text{ for all } j \neq k, \\ \text{undecided}, & \text{otherwise}. \end{cases} \tag{6}$$

In equation (6), $T_1$ denotes the threshold used to evaluate the difference between the minimum reconstruction error and the other ones. Accordingly, when the minimum reconstruction error is notably lower than the others, the decision made from the collaborative representation is assumed reliable. Otherwise, the decision is not adopted, but some candidate classes can be selected. The candidate classes are assumed to be the potential target labels of the test sample, while the unselected classes are, with high probability, not the correct label. In this study, the classes with reconstruction errors lower than the threshold $T_2$ are selected as the candidate classes, which are used to construct the global dictionary for SRC.
In detail, we define the two thresholds $T_1$ and $T_2$ as $m/5$ and $m/2$, respectively, where $m = \mathrm{mean}([r_{CR}(1), r_{CR}(2), \cdots, r_{CR}(C)])$.
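The first stage described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cr_errors` and `cr_decide`, the default value of `lam`, the use of the gap to the second-lowest error when testing against $T_1$, and the fallback that always keeps the best class among the candidates are all hypothetical choices made for the example.

```python
import numpy as np

def cr_errors(y, class_dicts, lam=0.01):
    """Reconstruction error of y over each local (single-class) dictionary.

    class_dicts is a list of arrays X_k with shape (d, n_k); lam is the
    regularization factor (its value here is an assumption).
    """
    errors = []
    for X in class_dicts:
        gram = X.T @ X + lam * np.eye(X.shape[1])
        alpha = np.linalg.solve(gram, X.T @ y)   # closed-form regularized solution
        errors.append(np.linalg.norm(y - X @ alpha) ** 2)
    return np.array(errors)

def cr_decide(errors):
    """Return (label or None, candidate class indices) using T1 = m/5, T2 = m/2."""
    errors = np.asarray(errors, dtype=float)
    m = errors.mean()
    t1, t2 = m / 5.0, m / 2.0
    order = np.argsort(errors)
    best = int(order[0])
    # Assumption: "notably lower" means the gap to the runner-up exceeds T1.
    if errors[order[1]] - errors[best] > t1:
        return best, [best]
    candidates = [int(k) for k in np.flatnonzero(errors < t2)]
    if best not in candidates:                   # assumed fallback, not from the paper
        candidates.insert(0, best)
    return None, candidates
```

A call such as `cr_decide(cr_errors(y, class_dicts))` either returns a label directly or returns `None` together with the candidate classes that would be passed to the SRC stage.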

SRC over Global Dictionary
SRC is a popular classifier with successful applications in pattern recognition fields, e.g., face recognition [36] and SAR target recognition [28][29][30]. During the linear representation of the test sample $y$ over the global dictionary, a sparsity constraint is imposed on the coefficient vector as

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|y - X\alpha\|_2^2 \le \varepsilon, \tag{7}$$

where $X = [X_1, \cdots, X_C] \in \mathbb{R}^{d \times n}$ represents the global dictionary comprising $n$ training samples from all the $C$ classes; $\alpha$ and $\varepsilon$ represent the sparse coefficient vector and the error tolerance, respectively. Unlike the collaborative representation, the problem in equation (7) is nonconvex due to the $\ell_0$-norm objective. To solve it efficiently, the $\ell_1$-norm relaxation [38] or greedy algorithms (e.g., orthogonal matching pursuit (OMP)) [30] can be employed to obtain an approximate solution. Similar to the decision mechanism in equation (5), SRC assigns the target label to the training class that achieves the minimum reconstruction error.
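As an illustration of the SRC step, the following sketch solves the sparse coding problem with a plain OMP loop and then computes class-wise reconstruction errors. It is a sketch under assumptions rather than the implementation used in the paper: the names `omp` and `src_errors`, the default sparsity level, and the stopping tolerance are hypothetical, and the dictionary atoms are assumed to have comparable (ideally unit) norms.

```python
import numpy as np

def omp(X, y, sparsity, tol=1e-10):
    """Greedy orthogonal matching pursuit over the columns (atoms) of X."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        scores = np.abs(X.T @ residual)
        if support:
            scores[support] = -1.0               # never re-select an atom
        support.append(int(np.argmax(scores)))
        # Least-squares refit on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    alpha = np.zeros(X.shape[1])
    alpha[support] = coef
    return alpha

def src_errors(y, X, labels, sparsity=5):
    """Class-wise reconstruction errors using only each class's coefficients."""
    alpha = omp(X, y, sparsity)
    errors = {}
    for c in np.unique(labels):
        mask = labels == c
        errors[int(c)] = float(np.linalg.norm(y - X[:, mask] @ alpha[mask]) ** 2)
    return errors
```

Here `labels` is an integer array giving the class of each column of `X`; the predicted class is the key with the smallest error.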
According to equation (7), the representation precision of the test sample is closely related to the completeness of the global dictionary. If the dictionary covers the condition of the test sample, the linear representation is likely to be correct. In this sense, a large and comprehensive dictionary is preferred. However, when the dictionary has too many atoms, solving for the sparse coefficients becomes costly. As reported in [30], the complexity of the OMP algorithm for solving the sparse representation problem is $O(LNd)$, where $L$ denotes the sparsity level, $N$ is the number of atoms, and $d$ is the dimensionality of the atoms. Therefore, the best choice is to construct a global dictionary that contains only the most probable classes of the test sample. Then, the high time consumption and the interference caused by redundant classes can be avoided. The classification scheme based on the collaborative representation can serve as a prescreener that selects the candidate classes for building a proper global dictionary for SRC.
It is much easier to obtain a precise solution with the collaborative representation than with SRC, so the representation precision of different classes for the test sample can be better compared. However, SRC over the global dictionary reflects the relative representation capabilities of different classes under a unified framework. Therefore, it is feasible to combine the results from the collaborative representation and SRC to comprehensively evaluate the correlations or differences between the test sample and each training class, which is expected to improve the final classification accuracy.

Target Recognition
In this study, the collaborative representation and SRC are jointly used to comprehensively evaluate the absolute and relative representation capabilities of different training classes in order to form robust decisions. Figure 1 shows the general idea of the proposed method, which can be divided into two stages.
In the first stage, the collaborative representation is used for classification according to the decision rule in equation (6). In this case, when one class significantly outperforms the others in absolute representation capability, the target label can be directly determined. Otherwise, the candidate classes are selected based on the reconstruction errors to form the global dictionary for SRC in the second stage. The reconstruction errors from the collaborative representation and SRC are fused to form more robust decisions. Compared with SRC, the proposed method incorporates the collaborative representation as a prescreener, which has much higher efficiency. For the test samples that can be reliably classified by the collaborative representation, there is no need to perform SRC further. In addition, the collaborative representation selects the candidate classes for SRC, thus effectively relieving the interference from the classes that share notably low similarities with the test sample. Therefore, it helps improve both the classification accuracy and the efficiency of SRC. Denote the $M$ selected candidate classes from the collaborative representation as $\Gamma(1), \Gamma(2), \cdots, \Gamma(M)$, where $\Gamma(m)$ corresponds to the original class index, and the reconstruction errors of the candidate classes from the collaborative representation and SRC are $r_{CR}(\Gamma(m))$ and $r_{SR}(\Gamma(m))$ $(m = 1, 2, \cdots, M)$, respectively. When there is no reliable decision from the collaborative representation, the two reconstruction errors are linearly fused as

$$r(\Gamma(m)) = \omega_1 r_{CR}(\Gamma(m)) + \omega_2 r_{SR}(\Gamma(m)) \quad (m = 1, 2, \cdots, M), \tag{8}$$

where $\omega_1$ and $\omega_2$ are the weights. This study sets $\omega_1 = 1/3$ and $\omega_2 = 2/3$ to place more importance on SRC because the collaborative representation could not achieve a reliable decision in this situation. Based on the fused reconstruction errors, the target label is decided as the candidate class with the minimum fused error.

Figure 5: Accuracy of different methods under depression angle variance.
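The weighted fusion used in the second stage can be written compactly. The snippet below is a hedged sketch: the function name `fuse_and_decide` and the dictionary-based inputs are hypothetical, and it assumes that the raw reconstruction errors are fused directly, since the text does not state whether any normalization is applied beforehand.

```python
def fuse_and_decide(r_cr, r_sr, candidates, w1=1/3, w2=2/3):
    """Fuse the two reconstruction errors over the candidate classes.

    r_cr and r_sr map an original class index to its error from the
    collaborative representation and from SRC; candidates lists the
    selected class indices.
    """
    fused = {k: w1 * r_cr[k] + w2 * r_sr[k] for k in candidates}
    label = min(fused, key=fused.get)            # minimum fused error wins
    return label, fused
```

For instance, with candidates 0 and 2, errors `r_cr = {0: 0.30, 2: 0.33}` and `r_sr = {0: 0.60, 2: 0.30}`, the fused errors are 0.50 and 0.31, so class 2 is chosen even though class 0 has the slightly lower collaborative-representation error.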
Figure 2 shows the optical images of the ten targets. Table 1 displays the training and test samples for the experimental setup under the standard operating condition (SOC). Images at 17° depression angle are adopted as the training samples whereas those at 15° depression angle are classified. In addition, the dataset contains some extended operating conditions (EOC) like configuration variance and

(i) KNN. The number of neighbors "K" is set to be 3, and the Euclidean distance is adopted as the distance measure between the test sample and training ones
(ii) SVM. LIBSVM [40] is used to perform the multiclass SVM, and the parameters (e.g., the kernel parameter and cost factor) are determined via the cross validation
(iii) SRC. The sparse representation is performed over the global dictionary comprised of all the training classes. The sparsity level and error tolerance are consistent with those in the proposed method
(iv) CRC. The collaborative representation-based classification (CRC) in [39] is introduced in SAR ATR for comparison, which is performed on the global dictionary
(v) CNN. The networks designed in [33] are used for comparison

For fair comparison, PCA is employed for feature extraction in the proposed method, KNN, SVM, SRC, and CRC, whose feature dimension is set as 80. CNN conducts the training and classification based on the image intensities. In the remainder of this section, SOC is first established to test the proposed method. Afterwards, different types of EOCs are used to evaluate the robustness of the proposed method.
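The PCA feature extraction mentioned above can be sketched as follows. This is only an illustrative example under assumptions: the helper names `fit_pca` and `apply_pca` are hypothetical, the images are assumed to be flattened into row vectors, and mean-centering before the decomposition is a common convention that the text does not confirm.

```python
import numpy as np

def fit_pca(train, n_components=80):
    """Learn a PCA projection from training vectors of shape (n_samples, n_pixels)."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components].T             # basis has shape (n_pixels, n_components)

def apply_pca(samples, mean, basis):
    """Project samples onto the learned basis to obtain the feature vectors."""
    return (samples - mean) @ basis
```

The projection is fitted on the training samples only and then applied unchanged to the test samples, so that both share the same 80-dimensional feature space.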

Recognition of 10-Class Targets under SOC.
Based on the experimental setup in Table 1, the proposed method is first evaluated under SOC. The detailed recognition results of the ten targets are recorded as the confusion matrix in Figure 3, in which the diagonal elements represent the recognition rates of the corresponding targets. Each target is classified with a recognition rate of over 97%, and the average reaches 98.86%. BMP2 and T72 have the lowest recognition rates among the ten targets because there are configuration differences between their training and test samples, as shown in Table 1. Table 2 compares the performance of different methods. The proposed method achieves a slightly higher recognition rate than CNN and outperforms the remaining methods by a notable margin. As a deep learning technique, CNN is able to learn highly discriminative features when the training samples are sufficient for the recognition task. Under SOC, the test samples share high similarities with the training ones, so CNN works very effectively, with a recognition rate close to that of the proposed method. In comparison with SRC and CRC, the proposed method effectively improves the recognition performance by combining their advantages. The time consumption of different methods for classifying one image is also compared in Table 2. For fair comparison, all the methods perform the classification on a PC platform with an Intel i7 3.4 GHz CPU and 8 GB RAM. The proposed method significantly improves the efficiency in contrast with the traditional SRC. The collaborative representation has an analytic solution, which can be computed with very high efficiency. In this experiment, 2144 of the 3203 test samples can be reliably classified during the collaborative representation. In addition, the selection of candidate classes also reduces the computational complexity of SRC. Therefore, the actual time consumption of the proposed method is reduced.

Configuration Variance.
Variety in configurations is common for ground vehicle targets. Taking a tank as an example, its shield and spare barrels may be installed or removed for different applications. In this experiment, the training and test samples are set as in Table 3. As listed, the configurations of BMP2 and T72 for classification are different from those of their training samples. The average recognition rates of different methods are listed in Table 4 for comparison. Because of the configuration differences, the average recognition rates of all the methods decrease compared with those under SOC. The proposed method achieves the highest recognition rate, which validates its superior robustness to possible configuration variance. The collaborative representation and SRC complement each other when evaluating the representation capabilities of different training classes. Therefore, the combination of their results can better capture the configuration differences between the test sample and the corresponding training class, which helps improve the classification accuracy under configuration variance. CNN achieves the second highest recognition rate among all the methods due to its high classification capability. However, a network trained on one configuration is assumed to lose some effectiveness when classifying other configurations. As a result, the gap between CNN and the proposed method becomes more remarkable in this case.

Table 5, in which the training set is from 17° depression angle whereas the test samples are from 30° and 45° depression angles. Figure 4 illustrates the influence of the depression angle variance on the captured SAR images. It is clear that the image at 45° differs considerably from that at 17°. Table 6 shows the results of the proposed method at different depression angles. It achieves a very high average recognition rate of 98.15% at 30° depression angle, but the rate decreases sharply to 72.06% at 45° depression angle.
The performance of different methods under the depression angle variance is compared in Figure 5. All the methods share a similar trend, i.e., high recognition rates at 30° depression angle but much lower ones at 45° depression angle. This is mainly because the images at 45° depression angle differ considerably from the training samples at 17° depression angle, as shown in Figure 4. In comparison, the proposed method outperforms the other methods at both depression angles, reflecting its superior robustness to the depression angle variance.

Noise Corruption.
The measured SAR data may be contaminated by background clutter or system noise [41,42], and noisy SAR images at low signal-to-noise ratios (SNRs) are much more difficult to classify with high precision. Therefore, a target recognition method needs to work robustly under noise corruption. The original MSTAR images were collected at high SNRs, which eases the difficulty of target recognition. In this experiment, different levels of additive Gaussian noise are first added to the test samples in Table 1, and then the noisy test samples are classified. Some simulated noisy SAR images are given in Figure 6 to illustrate the influence of noise corruption. Compared with the original image in Figure 6(a), the noisy samples at lower SNRs have more obscure target contours and less stable intensity distributions. Figure 7 shows the average recognition rates of the proposed method at different SNRs, together with those of the other methods. The proposed method obtains the highest recognition rate at each SNR, so its robustness to possible noise corruption is validated. Both the collaborative representation and the sparse representation perform linear representations of the test sample. In essence, both are optimization problems that reconstruct the test sample from clean training atoms, so they can suppress part of the noise interference during the classification. By combining their advantages, the robustness to noise corruption can be further improved. Similar to the former two EOCs, the performance of CNN degrades significantly as the SNR decreases.

Partial Occlusion.
Object occlusion is a nuisance problem in pattern recognition fields, e.g., face recognition. Partial occlusion is also common in SAR ATR because ground targets may be occluded by nearby obstacles.
To test the robustness of the proposed method to partial occlusion, occluded SAR images are first simulated according to the model in [43,44] based on the original test samples in Table 1. Then, these occluded images are classified. Figure 8 illustrates the influence of partial occlusion, where 20% of the target region is occluded (removed) from different directions. Compared with the original image in Figure 8(a), the absence of some portions of the target corrupts the original target outline and intensity distribution. Figure 9 shows the performance of the proposed method at different occlusion levels and compares it with that of the other methods. The linear approximation of the test sample in both the collaborative representation and the sparse representation is assumed to have some robustness to partial occlusion, as demonstrated in [36]. Then, via the hierarchical combination of the collaborative representation and SRC, the robustness to partial occlusion can be further strengthened.

Conclusions
In this study, we propose a SAR ATR method based on two-phase sparse representation, which combines the advantages of the collaborative representation and SRC. The collaborative representation is performed on the local dictionaries to evaluate the absolute representation capabilities of different classes, whereas SRC is employed to evaluate the relative representation capabilities of the selected candidate classes. The two classification schemes are fused hierarchically to perform the recognition tasks. Therefore, both the efficiency and the classification accuracy of the proposed method can be improved. Based on the experiments on the MSTAR dataset under several operating conditions, some conclusions can be reached. First, the proposed method achieves a high recognition rate of 98.86% on 10 classes of targets under SOC, which demonstrates its good performance in classifying several similar targets. In addition, because of the fast prescreening by the collaborative representation, the average time consumption of the proposed method is significantly reduced, which enhances the overall efficiency. Second, under different types of EOCs such as configuration and depression angle variances, noise corruption, and partial occlusion, the proposed method outperforms the reference methods. Although the test samples are deformed or corrupted under EOCs, the proposed method still maintains higher robustness and obtains reliable classification results. Third, as an overall evaluation, the proposed method keeps robust performance under different operating conditions compared with other methods, with relatively high efficiency. In the future, the proposed method can be further improved via adaptive determination of the thresholds in the collaborative representation and intelligent fusion of the decisions from the collaborative representation and SRC.

Data Availability
The MSTAR dataset used to support the findings of this study is available online at http://www.sdms.afrl.af.mil/datasets/mstar/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.