Target Recognition in SAR Images Based on Multiresolution Representations with 2D Canonical Correlation Analysis

This study proposes a synthetic aperture radar (SAR) target-recognition method based on features fused from multiresolution representations by 2D canonical correlation analysis (2DCCA). Multiresolution representations have been demonstrated to be more discriminative than the original image alone, so their joint classification is beneficial to SAR target recognition performance. 2DCCA is capable of exploiting the inner correlations of the multiresolution representations while significantly reducing the redundancy. Therefore, the fused features can effectively convey the discrimination capability of the multiresolution representations while relieving the storage and computational burdens caused by the original high dimension. In the classification stage, sparse representation-based classification (SRC) is employed to classify the fused features. SRC is an effective and robust classifier, which has been extensively validated in previous works. The moving and stationary target acquisition and recognition (MSTAR) data set is employed to evaluate the proposed method. According to the experimental results, the proposed method achieves a recognition rate of 97.63% for the 10 classes of targets under the standard operating condition (SOC). Under extended operating conditions (EOC) such as configuration variance, depression angle variance, and noise corruption, the robustness of the proposed method is also quantitatively validated. In comparison with several other SAR target recognition methods, the proposed method demonstrates superior performance.


Introduction
Synthetic aperture radar (SAR) plays an important role in modern battlefield surveillance owing to its all-day, all-weather imaging capabilities. Automatic target recognition (ATR) has been a hot topic in SAR image interpretation since it was first researched in the 1990s [1]. As a typical supervised pattern-recognition problem, a SAR ATR algorithm usually involves two key techniques, i.e., feature extraction and classification. Feature extraction seeks discriminative representations of the original SAR images that better embody the target's properties. At present, the available features for SAR ATR can be generally divided into three categories. The first depicts the geometrical properties of the target, including the binary target region [2][3][4], target outline [5], and target shadow [6]. Ding et al. proposed a binary region matching algorithm for SAR target recognition [2]. In [3], Zernike moments were used to describe the binary target regions from SAR images. Anagnostopoulos employed outline descriptors as the basic features for SAR ATR [5]. The target's shadow in SAR images was investigated in [6] for target recognition. The second category mainly describes the intensity distribution of the original SAR images using mathematical tools or signal processing techniques [7][8][9][10][11][12]. In [7], principal component analysis (PCA) and linear discriminant analysis (LDA) were used for SAR image feature extraction. Cui et al. applied nonnegative matrix factorization (NMF) to SAR ATR [8]. Some manifold learning algorithms were also demonstrated to be effective for feature extraction of SAR images [9,10]. Dong et al. introduced the 2D monogenic signal to comprehensively investigate the spectral properties of SAR images [11,12]. The last category reflects the electromagnetic characteristics of SAR targets [13][14][15][16][17]. In the high-frequency region, the backscattering of the whole target can be modeled as the summation of several local phenomena, i.e., scattering centers [13]. In [14], a Bayesian matching scheme was designed for attributed scattering centers in SAR ATR. Ding et al. developed several different ways of applying attributed scattering centers to SAR target recognition by exploiting the local structural properties of the scattering center set.
Based on the extracted features, different kinds of classification schemes are designed to make decisions on the target labels. At the early stage, template matching was employed to compare the test sample with the stored templates and evaluate the intensity divergences between them. In essence, it is a nearest neighbor (NN) classifier. As a modified version of NN, K-nearest neighbor (KNN) was employed to classify the PCA and LDA features in [7]. Zhao and Principe applied support vector machines (SVM) to SAR target recognition and demonstrated good performance [18]. Since then, many SAR ATR methods have employed SVM as the basic classifier for different kinds of features, e.g., region moments [3], outline descriptors [5], and projection features [19]. Sparse representation-based classification (SRC) was developed based on compressive sensing theory and has been successfully applied to pattern recognition applications, e.g., face recognition [20] and SAR target recognition [21] [22]. Several works validated that SRC is an effective and robust classifier for SAR ATR. In [21], Thiagarajan et al. introduced SRC to SAR ATR by classifying random projection features. Song et al. further investigated the performance of SRC on different kinds of features extracted by PCA, down-sampling, etc. Dong and Kuang employed SRC as the basic classifier for the monogenic components in [11]. The emergence of deep learning has triggered waves of progress in artificial intelligence and machine learning [23,24]. As a typical representative, convolutional neural networks (CNN) have been widely used in image interpretation, including SAR ATR [25][26][27][28]. Several different networks were designed to improve SAR ATR performance. Chen et al. proposed all-convolutional networks for SAR ATR, thus significantly reducing the number of parameters. In [26], SVM was combined with CNN to enhance SAR ATR performance.
This study proposes a SAR ATR method based on the fused features of multiresolution representations by 2D canonical correlation analysis (2DCCA) [29]. In previous works, multiresolution representations were demonstrated to be effective for SAR ATR. In [30], the multiresolution representations were independently classified by SRC, and their results were combined using score-level fusion. In order to capture the inner correlations of different resolutions, joint sparse representation was adopted to jointly classify all the resolutions [31]. Furthermore, considering that some resolutions may have low discriminability, a discrimination analysis was performed before the joint sparse representation of the multiresolution representations [32]. Then, only the highly discriminative resolutions are used for the final decision. These works effectively improved SAR ATR performance. However, they have some shortcomings. First, the inner correlations among the multiresolution representations cannot be fully exploited. In [30], different resolutions were classified independently, so their correlations are actually neglected. For the methods using joint sparse representation [31,32], the correlations were reflected by the sparsity constraint during the solution of the multitask learning problem. However, such a constraint is not robust, especially when there are nuisances in the multiresolution representations. Second, each resolution is conveyed by a SAR image of the same size as the original SAR image. Hence, the previous methods inevitably increase the storage and computation loads notably. As a remedy, this study seeks a unified representation of the multiresolution SAR images, thus better capturing the inner correlations while improving the classification efficiency. In detail, 2DCCA is employed to fuse the multiresolution representations sequentially. 2DCCA is the generalization of CCA [33] to the 2D space, which considers the structural information of 2D images. In addition, 2DCCA can maintain the inner correlations of the components while reducing the redundancy, which is beneficial to the overall classification accuracy and efficiency. At each turn, 2DCCA is performed to capture the correlations between the two highest resolutions, and the two resolutions are combined into a new feature matrix. Then, the new feature matrix is combined with the next resolution (the highest one among the remaining). A final feature matrix is obtained after processing the last resolution and is used for target classification. SRC is adopted as the classifier in this study. As demonstrated in previous works, SRC works very well on different kinds of features for SAR ATR. In addition, it has been shown to be robust to nuisance conditions, e.g., noise contamination and partial occlusion.
The remainder of this study is organized in four sections. Section 2 introduces feature generation from multiresolution representations based on 2DCCA. In Section 3, the basic theory of SRC is described with application to SAR target recognition. Section 4 presents the experimental results of the proposed method on the moving and stationary target acquisition and recognition (MSTAR) data set. Conclusions are drawn in Section 5 to summarize the whole paper.

2D Canonical Correlation Analysis of Multiresolution Representations
According to the SAR imaging mechanism, the low-resolution representations of a SAR image can be conveniently generated by using only a proportion of the original frequency spectrum. The detailed procedure can be found in the previous works [30][31][32]. Figure 1 illustrates the multiresolution representations of a BMP2 SAR image from the MSTAR data set. The original image with a resolution of 0.3 m × 0.3 m is used to generate low-resolution images of 0.4 m × 0.4 m, 0.5 m × 0.5 m, and 0.6 m × 0.6 m, respectively. As shown, the multiresolution representations are capable of describing the target from coarse to fine. At a very low resolution, mainly the region information of the target is manifested. As the resolution increases, more details of the target can be observed, e.g., the distribution of the scattering centers. It is assumed that the multiresolution representations of the same SAR image share some inner correlations. Meanwhile, they contain much redundancy, e.g., the backgrounds. Therefore, this study aims to construct new features from the multiresolution representations that exploit their inner correlations while reducing the redundancy.
2DCCA [29] is the extension of conventional CCA to the 2D space, which is capable of investigating the correlations between two 2D variables. Two matrix sets $X_t \in \mathbb{R}^{m_x \times n_x}$, $t = 1, \ldots, N$, and $Y_t \in \mathbb{R}^{m_y \times n_y}$, $t = 1, \ldots, N$, can be regarded as the realizations of random matrix variables $X$ and $Y$, respectively. In CCA, the 2D matrices are first transformed into 1D vectors and the canonical analysis is conducted afterwards. However, the vectorization operation may lose the 2D structural information of the matrices. 2DCCA was therefore proposed to directly analyze the correlations between the two matrix sets.
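The low-resolution generation described at the start of this section can be sketched as follows. This is a simplified illustration, not the exact procedure of [30][31][32]: it assumes the image spectrum is centered after an FFT shift, ignores zero-padding and windowing in the real data, and assumes that the retained bandwidth ratio equals the ratio of resolutions (e.g., 0.3/0.4 = 0.75 to go from 0.3 m to 0.4 m). The function name `lower_resolution` is hypothetical.

```python
import numpy as np

def lower_resolution(image, ratio):
    """Keep only the central `ratio` fraction of the 2D spectrum per axis.

    The output has the same size as the input, so every resolution can be
    stored as a matrix of identical shape (assumption for this sketch).
    """
    rows, cols = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    keep_r = max(1, int(round(rows * ratio)))
    keep_c = max(1, int(round(cols * ratio)))
    r0 = (rows - keep_r) // 2
    c0 = (cols - keep_c) // 2
    # Rectangular mask that zeroes the outer part of the spectrum.
    mask = np.zeros((rows, cols))
    mask[r0:r0 + keep_r, c0:c0 + keep_c] = 1.0
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.abs(filtered)

# Example: three coarser versions of a 0.3 m image (ratios are assumptions).
ratios = [0.3 / 0.4, 0.3 / 0.5, 0.3 / 0.6]
```

Keeping the output size fixed mirrors the statement that each resolution is conveyed by an image of the same size as the original.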
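At first, the mean matrices of $X_t$ and $Y_t$ are obtained as

$$M_X = \frac{1}{N}\sum_{t=1}^{N} X_t, \quad M_Y = \frac{1}{N}\sum_{t=1}^{N} Y_t.$$

Afterwards, the original matrices are centralized as

$$\tilde{X}_t = X_t - M_X, \quad \tilde{Y}_t = Y_t - M_Y.$$

The objective of 2DCCA is to seek left transforms ($l_x$ and $l_y$) and right transforms ($r_x$ and $r_y$) that maximize the correlation between $l_x^T X r_x$ and $l_y^T Y r_y$. So, 2DCCA can be solved as follows:

$$\arg\max_{l_x, r_x, l_y, r_y} \operatorname{cov}\left(l_x^T X r_x,\; l_y^T Y r_y\right) \quad \text{s.t.} \quad \operatorname{var}\left(l_x^T X r_x\right) = 1, \quad \operatorname{var}\left(l_y^T Y r_y\right) = 1.$$

The detailed solution of 2DCCA can be found in the original work [29]. Based on the resulting left and right transforms, the corresponding matrices from the two sets are fused into a unified feature matrix, which maintains their inner correlations.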
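As a brief sketch of how such an objective is typically solved (the authoritative derivation is in [29]; the covariance symbols below are notation introduced here for illustration), the transforms can be updated alternately. Let $\tilde{X}_t$ and $\tilde{Y}_t$ denote the matrices after subtracting their respective sample means. With the right transforms $r_x$ and $r_y$ held fixed, define

$$\Sigma^{r}_{xy} = \frac{1}{N}\sum_{t=1}^{N} \tilde{X}_t r_x r_y^T \tilde{Y}_t^T, \quad \Sigma^{r}_{xx} = \frac{1}{N}\sum_{t=1}^{N} \tilde{X}_t r_x r_x^T \tilde{X}_t^T, \quad \Sigma^{r}_{yy} = \frac{1}{N}\sum_{t=1}^{N} \tilde{Y}_t r_y r_y^T \tilde{Y}_t^T,$$

so that the covariance term becomes $l_x^T \Sigma^{r}_{xy} l_y$ and the variance constraints become $l_x^T \Sigma^{r}_{xx} l_x = 1$ and $l_y^T \Sigma^{r}_{yy} l_y = 1$. Setting the gradient of the Lagrangian to zero gives the generalized eigenvalue problem

$$\begin{bmatrix} 0 & \Sigma^{r}_{xy} \\ \Sigma^{r}_{yx} & 0 \end{bmatrix} \begin{bmatrix} l_x \\ l_y \end{bmatrix} = \lambda \begin{bmatrix} \Sigma^{r}_{xx} & 0 \\ 0 & \Sigma^{r}_{yy} \end{bmatrix} \begin{bmatrix} l_x \\ l_y \end{bmatrix}, \quad \Sigma^{r}_{yx} = \left(\Sigma^{r}_{xy}\right)^T.$$

The right transforms are updated in the same way with $l_x$ and $l_y$ held fixed, using $\tilde{X}_t^T l_x$ and $\tilde{Y}_t^T l_y$ in place of $\tilde{X}_t r_x$ and $\tilde{Y}_t r_y$. The two updates are repeated until the transforms stabilize.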
In this study, 2DCCA is used for the fusion of multiresolution representations. Assume there are $M$ resolutions to be fused, i.e., $Z_t^i \in \mathbb{R}^{m \times n}$, $t = 1, \ldots, N$ ($i = 1, \ldots, M$), which are arranged in descending order of resolution. Figure 2 illustrates the feature generation based on the multiresolution representations using 2DCCA. At the start, the first two resolutions, i.e., $Z_t^1$ and $Z_t^2$ ($t = 1, \ldots, N$), are combined using 2DCCA. Afterwards, the fused feature matrix is combined with the third resolution. The process is repeated until the $M$th resolution. In this way, $M - 1$ sets of transformation matrices are calculated, and each set contains four transforms (two left and two right ones).
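The sequential fusion above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2DCCA subproblems are solved by whitening plus an SVD with a small ridge term, each transform keeps `d` columns, and the fused matrix is taken as the sum of the two projected matrices. The fusion rule, the value of `d`, and all function names are assumptions made here for illustration.

```python
import numpy as np

def _inv_sqrt(S, reg=1e-6):
    # Inverse square root of a symmetric PSD matrix (ridge-regularized).
    w, V = np.linalg.eigh(S + reg * np.eye(S.shape[0]))
    return (V / np.sqrt(w)) @ V.T

def _cca_step(A, B, d):
    # A: (N, p, k), B: (N, q, k). Solves one CCA subproblem by whitening
    # followed by an SVD, returning p x d and q x d transforms.
    N = A.shape[0]
    Saa = np.einsum("tik,tjk->ij", A, A) / N
    Sbb = np.einsum("tik,tjk->ij", B, B) / N
    Sab = np.einsum("tik,tjk->ij", A, B) / N
    Wa, Wb = _inv_sqrt(Saa), _inv_sqrt(Sbb)
    U, _, Vt = np.linalg.svd(Wa @ Sab @ Wb)
    return Wa @ U[:, :d], Wb @ Vt.T[:, :d]

def fit_2dcca(X, Y, d, n_iter=5):
    # X: (N, mx, nx), Y: (N, my, ny); d must not exceed any of the four sizes.
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    Rx, Ry = np.eye(X.shape[2])[:, :d], np.eye(Y.shape[2])[:, :d]
    for _ in range(n_iter):
        # Update left transforms with the right transforms fixed.
        Lx, Ly = _cca_step(Xc @ Rx, Yc @ Ry, d)
        # Update right transforms with the left transforms fixed.
        Rx, Ry = _cca_step(np.swapaxes(Xc, 1, 2) @ Lx,
                           np.swapaxes(Yc, 1, 2) @ Ly, d)
    return mean_x, mean_y, Lx, Rx, Ly, Ry

def fuse(X, Y, model):
    # ASSUMPTION: fused feature = sum of the two projected matrices.
    mean_x, mean_y, Lx, Rx, Ly, Ry = model
    return Lx.T @ (X - mean_x) @ Rx + Ly.T @ (Y - mean_y) @ Ry

def fit_sequential(resolutions, d):
    # resolutions: list of M arrays of shape (N, m, n), highest resolution first.
    models, fused = [], resolutions[0]
    for Z in resolutions[1:]:
        model = fit_2dcca(fused, Z, d)
        models.append(model)          # one set of four transforms per step
        fused = fuse(fused, Z, model)
    return models, fused

def transform_sequential(sample_resolutions, models):
    # Apply the learned M - 1 transform sets to one sample's resolutions.
    fused = sample_resolutions[0]
    for Z, model in zip(sample_resolutions[1:], models):
        fused = fuse(fused, Z, model)
    return fused
```

With `M` resolutions, `fit_sequential` returns `M - 1` models, each holding two left and two right transforms, which matches the count stated in the text. The final fused matrix has size `d × d`, far smaller than the original images.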

Sparse Representation of Fused Feature for Target Recognition
3.1. SRC. SRC is a newly proposed classification scheme based on sparse signal processing [20]. SRC rests on the assumption that a test sample from a certain class can be linearly reconstructed using the training samples from that class. Denote the training samples from the $k$th class as $\Phi_k = [x_{k,1}, \ldots, x_{k,n_k}] \in \mathbb{R}^{d \times n_k}$ ($k = 1, \ldots, C$), where $d$ is the dimension of the atoms. Then, a test sample $y$ from the $k$th class can be linearly represented as

$$y = \Phi_k \alpha_k,$$

where $\alpha_k = [\alpha_{k,1}, \ldots, \alpha_{k,n_k}]^T \in \mathbb{R}^{n_k}$. In a classification task, however, the target label of the test sample is unknown. Therefore, the global dictionary is used in the sparse representation as follows:

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|y - \Phi\alpha\|_2^2 \le \varepsilon, \qquad (6)$$

where $\Phi = [\Phi_1, \ldots, \Phi_C] \in \mathbb{R}^{d \times n}$ denotes the global dictionary formed by the $n$ training samples from the $C$ classes; $\alpha = [\alpha_1^T, \ldots, \alpha_C^T]^T \in \mathbb{R}^n$ represents the coefficient vector over the global dictionary; and $\varepsilon$ is the preset error tolerance. The optimization task in equation (6) is proven to be a nondeterministic polynomial (NP) hard problem. As a result, it is hard to directly find the optimal solution of equation (6). Considering the high sparsity of the coefficient vector, it is feasible to replace the $\ell_0$ norm in equation (6) by the $\ell_1$ norm, thus relaxing it into a convex optimization problem. In addition, greedy algorithms, e.g., orthogonal matching pursuit (OMP) [19,20], are also effective in finding approximate solutions to equation (6).
Ideally, the nonzero elements in the solved sparse coefficient vector $\hat{\alpha}$ mainly occur in the class corresponding to the test sample. In this sense, the representation capability of different training classes can be reflected by their reconstruction errors. Then, the minimum reconstruction error criterion is adopted to decide the target label of the test sample $y$ as

$$\operatorname{identity}(y) = \arg\min_{i} r(i), \quad r(i) = \|y - \Phi_i \hat{\alpha}_i\|_2^2, \quad i = 1, \ldots, C,$$

where $\hat{\alpha}_i$ denotes the coefficient vector related to the $i$th training class and $r(i)$ represents the error of representing the test sample using the atoms of the $i$th training class.
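The SRC decision rule can be sketched with a basic OMP solver as follows. This is a minimal illustration under assumptions, not the exact solver used in the experiments: the dictionary atoms are normalized to unit length, and a fixed sparsity level `n_nonzero` is used as the stopping rule instead of the error tolerance. The function names are hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit over the columns of D."""
    residual = y.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Re-fit all selected atoms by least squares, then update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def src_classify(D, labels, y, n_nonzero=5):
    """Return the class with the minimum class-wise reconstruction error."""
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    alpha = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    errors = [np.linalg.norm(y - D[:, labels == c] @ alpha[labels == c])
              for c in classes]
    return classes[int(np.argmin(errors))], alpha
```

Here `D` stacks the vectorized training features as columns and `labels` is a NumPy array holding the class of each column, so `labels == c` selects the sub-dictionary of class `c`.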

Target Recognition. The fused features from the multiresolution representations are classified by SRC for target recognition. Figure 3 shows the implementation procedure of the proposed target recognition method. In detail, it can be summarized as the following six steps:
Step 1: generate the multiresolution representations of all the training samples
Step 2: analyze the multiresolution representations to calculate the transform matrices
Step 3: calculate the feature matrix of each training sample and use the vectorized forms of all the feature matrices to build the overcomplete dictionary
Step 4: generate the same multiresolution representations of the test sample
Step 5: calculate the feature matrix of the test sample using the transform matrices and vectorize it
Step 6: classify the feature vector of the test sample by SRC to determine its target label

Data Set and Reference Methods.
To quantitatively verify the performance of the proposed method, the MSTAR data set is used for experiments. The data set contains SAR images of 10 classes of ground targets (shown in Figure 4) collected by 10 GHz HH-polarization SAR sensors. The resolution of the original SAR images is 0.3 m × 0.3 m. Table 1 presents the 10-class training and test sets, which form a classical experimental setup for recognition under the standard operating condition (SOC). Images at a 17° depression angle are used for training, whereas those at 15° are used for testing. Both the training and test sets cover the full azimuth range of 0° to 359°.
Several other SAR ATR methods are used for comparison, as listed in Table 2. SVM, SRC, and CNN are the most prevalent classification schemes in SAR ATR at present. In detail, SVM [18] and SRC [21] are used to classify the features extracted by PCA, which is a common feature extraction method in SAR ATR, with the feature dimension set to 80. For CNN, the network architecture in [25] is adopted. PAR-Res and JSR-Res are the methods proposed in [30] and [31], respectively, which also operate on the multiresolution representations. In [30], score-level fusion is used to combine the decisions from individual resolutions in parallel. In [31], joint sparse representation is adopted to jointly classify the multiresolution representations. In the following, the proposed method is tested under different conditions including SOC and several typical extended operating conditions (EOC).
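For reference, the PCA feature extraction used by the SVM and SRC baselines can be sketched as below. This is only an illustrative sketch: it assumes the images are vectorized into rows of a matrix and that the 80 leading principal directions are obtained from an SVD of the centered training data. The function names are hypothetical and the preprocessing in [18] and [21] may differ.

```python
import numpy as np

def fit_pca(train, n_components=80):
    """train: (num_samples, num_pixels). Returns the mean and principal axes."""
    mean = train.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, sorted by variance.
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def apply_pca(x, mean, components):
    """Project one vector or a batch of vectors onto the principal axes."""
    return (x - mean) @ components.T
```

The same `mean` and `components` learned from the training set are applied to the test samples before classification.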

Recognition of 10-Class Targets under SOC.
The preliminary performance of the proposed method is first tested under SOC based on the 10-class training and test samples in Table 1. The confusion matrix of the proposed method for the recognition of 10-class targets under SOC is given in Figure 5, in which each element on the diagonal denotes the recognition rate of the corresponding target. As shown, all the targets can be classified with recognition rates over [...]
[...] Table 5, where the highest one is achieved by the proposed method. Also, in this case, the methods using multiresolution representations generally achieve better performance than the remaining ones. As analyzed in Section 2, the multiresolution representations describe the target's characteristics from coarse to fine, so they are able to capture the local variations caused by the configuration variance. The higher recognition rate of the proposed method over PAR-Res and JSR-Res indicates that 2DCCA is more capable of maintaining stable features under configuration variance.

Depression Angle Variance. SAR images captured at different depression angles differ considerably in both the target region and the shadow. Table 6 lists the training and test samples for recognition under depression angle variance. The training samples are measured at a depression angle of 17°, and the test samples are from 30° and 45°. Table 7 presents the classification results of the proposed method at the different depression angles. A notably high recognition rate of 98.15% is achieved at the 30° depression angle because the images at 17° and 30° still share many resemblances. In addition, the 3-class recognition problem here is much easier than the 10-class one. However, the test samples at the 45° depression angle are classified with a much lower recognition rate of 72.50%. The large depression angle variance causes many differences between the test and training samples, which severely degrades the recognition performance. The average recognition rates of different methods are compared in Table 8. All the methods share a similar trend under depression angle variance. With the highest recognition rates at both depression angles, the proposed method is validated to be the most robust to depression angle variance.

Noise Corruption. The MSTAR images are collected at high signal-to-noise ratios (SNR), which eases the subsequent target recognition. In practical applications, however, the measured SAR images to be classified are likely to be contaminated by noise from the background environment [34,35]. Hence, it is desired that target-recognition methods can correctly classify noisy SAR images. In this experiment, the noisy test samples are first generated by adding different levels of additive Gaussian noise to the original 10-class test images. The detailed process of noise addition can be found in [35]. Then, the noisy samples are classified by the different methods to examine their robustness. Figure 6 shows the average recognition rates of different methods as a function of the SNR. The proposed method outperforms all the reference methods at each SNR, indicating the best noise robustness. In addition, the methods using sparse representation (SRC, PAR-Res, JSR-Res, and the proposed method) outperform the remaining ones (SVM and CNN), especially at low SNRs. Therefore, the good performance of the proposed method benefits from the effectiveness of 2DCCA as well as the robustness of sparse representation.
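The generation of noisy test samples can be sketched as follows. This is a simplified illustration rather than the exact procedure of [35]: it assumes the SNR is defined as the ratio of the mean image power to the noise variance, expressed in dB, and that real-valued white Gaussian noise is added to the image. The function name `add_noise` is hypothetical.

```python
import numpy as np

def add_noise(image, snr_db, rng=None):
    """Add white Gaussian noise so that the result has the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(np.abs(image) ** 2)
    # SNR(dB) = 10 * log10(signal_power / noise_power)  (assumed definition)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise
```

Sweeping `snr_db` over a range of values and classifying the resulting samples yields a recognition-rate curve of the kind plotted against SNR in Figure 6.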

Conclusion
The multiresolution representations are exploited using 2DCCA for SAR target recognition. The multiresolution representations from the same SAR image describe the target from coarse to fine, so they complement each other and provide more information for the subsequent classification. In addition, they share inner correlations, which also benefit correct classification. 2DCCA is adopted to fuse the multiresolution representations, and the resulting features describe the correlations among different resolutions while greatly reducing the dimension. Finally, SRC is employed to classify the fused features to determine the target label. Experiments are conducted on the MSTAR data set to evaluate the performance of the proposed method. According to the experimental results, the superior effectiveness and robustness of the proposed method are quantitatively validated in comparison with several reference methods.

Data Availability
The data used to support the findings of this study are available online at http://www.sdms.afrl.af.mil/datasets/mstar/.