Product Dictionary Learning-Based SAR Target Configuration Recognition

Dictionary construction is a key factor for sparse representation (SR)-based algorithms, and learned dictionaries have been verified to be more effective than predefined ones. In this paper, we propose a product dictionary learning (PDL) algorithm for synthetic aperture radar (SAR) target configuration recognition. The proposed algorithm learns the dictionaries from a statistical standpoint to enhance its robustness to noise. Taking the inevitable multiplicative speckle in SAR images into account, it employs the product model to describe SAR images; a more accurate description of the SAR image results in higher recognition rates. The accuracy and robustness of the proposed algorithm are validated on the moving and stationary target acquisition and recognition (MSTAR) database.


Introduction
Owing to its ability to operate day and night and under inclement weather conditions, synthetic aperture radar (SAR) has attracted increasing attention in recent years [1]. As one of the most active topics in SAR remote-sensing applications, SAR automatic target recognition (ATR) focuses on recognizing targets of interest from two-dimensional (2D) high-resolution SAR images. SAR ATR algorithms can be roughly categorized into template-based methods and model-based methods [2,3]. Compared with template-based methods, model-based ones can achieve better performance.
Model-based methods often involve two related parts: feature extraction and classifier design [4]. Plenty of effective features have been exploited over the past decades, such as physical models [5], geometrical characteristics [6], and mathematical features [7]. The performance of these algorithms relies heavily on the precision of feature extraction.
As for the classifier design, advanced classifiers such as the support vector machine (SVM) [8], sparse representation [7, 9-11], and convolutional neural networks (CNN) [1,5,8] have been employed. Algorithms based on CNNs [1,5,8] or other deep learning models [3] have developed rapidly. However, owing to the complexity of SAR images and the shortage of data, these algorithms usually suffer from overfitting and local minima [2]. Besides, estimating and initializing the associated parameters, such as the learning rate, the number of hidden layers, and the number of hidden layer units, is quite challenging. Linear representation-based classification has been widely studied because of its great discriminative power; it includes collaborative representation (CR)-based algorithms and sparse representation (SR)-based algorithms. Thanks to the efficient closed-form solution of the CR-based algorithm, satisfactory recognition results can be obtained [12-15]. In SR-based algorithms, the testing sample is sparsely described by using only a few atoms in the dictionary [2,9,11], and these algorithms have also achieved excellent recognition results. However, SAR image recognition suffers from a shortage of samples, and SAR image quality cannot be completely guaranteed because of various nonideal factors. We therefore adopt learned dictionaries to overcome this obstacle. Learned dictionaries have been shown to be more powerful than predefined ones and have found wide application [16,17].
Moreover, a better description of the SAR image results in better recognition, since the essential characteristics of the sample are captured more accurately. Because of the coherent imaging mechanism of SAR, the speckle noise in SAR images is multiplicative [18,19]. Therefore, unlike existing work in which SAR images are modelled by the additive model [16], we describe SAR images more precisely by the product model in this paper. The motivation of the proposed method is to fuse the advantages of SR with learned dictionaries and the product model to improve the robustness of recognition under various severe conditions. In other words, we aim to realize SAR target configuration recognition with a dictionary learned in a statistical way. Since we adopt the product model to describe the SAR images, we name the proposed method product dictionary learning (PDL).
Target configuration recognition aims to distinguish subtle differences among samples of the same target type. It is of crucial importance to application fields such as battlefield interpretation and reconnaissance [2,10,16].

The Proposed Product Dictionary Learning (PDL) Algorithm
The speckle component in SAR images is inevitable due to the special imaging mechanism. As a result, any SAR image can be expressed as [18,19] $i = A \circ n$ (1), where $A$ represents the radar cross-section coefficients of the clutter, $n$ represents the noise component, and the symbol "$\circ$" represents element-wise multiplication. The product model shown in (1) gives a better description of the SAR image than other models, such as the additive model [18,19].
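To make the product model (1) concrete, the following minimal sketch generates a speckled patch by element-wise multiplication. It assumes unit-mean gamma speckle with shape parameter equal to the number of looks, which is the family the paper adopts later for the speckle component; the function name, the constant-RCS patch, and the choice of 4 looks are hypothetical and only for illustration.

```python
import numpy as np


def simulate_product_model(rcs, looks, rng):
    """Return (image, speckle) with image = rcs * speckle, element-wise.

    Assumption: speckle is unit-mean gamma noise with shape `looks`
    (an L-look intensity image); this choice is illustrative only.
    """
    rcs = np.asarray(rcs, dtype=float)
    # Gamma(shape=L, scale=1/L) has mean 1 and variance 1/L.
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=rcs.shape)
    return rcs * speckle, speckle


rng = np.random.default_rng(0)
clean = np.full((64, 64), 2.0)  # hypothetical constant-RCS patch
noisy, speckle = simulate_product_model(clean, looks=4, rng=rng)
```

Because the noise scales with the underlying RCS, brighter regions fluctuate more in absolute terms, which is the behaviour an additive model with fixed noise variance does not reproduce.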
Owing to its natural discriminative power, SR has been widely used in pattern recognition applications [7,9,20]. In this paper, we aim to enhance the performance and robustness of SAR image recognition by fusing the product model and the SR technique with learned dictionaries. The product model provides an accurate description of the SAR image, and SR supplies the discriminative power.
Assume that there are $C$ different configurations in the training datasets. For configuration $c$ ($c = 1, 2, \ldots, C$), there are $N_c$ training samples. The training dataset of configuration $c$ can be expressed as $I_c = \{i_{c1}, i_{c2}, \ldots, i_{cj}, \ldots, i_{cN_c}\}$ ($j = 1, 2, \ldots, N_c$). With the definition $A_{cj} = \Phi_c \theta_{cj}$, a given sample $i_{cj}$ can be expressed according to (1) as $i_{cj} = (\Phi_c \theta_{cj}) \circ n_{cj}$ (2), where $\Phi_c$ is the dictionary, $\theta_{cj}$ is the sparse vector of $i_{cj}$, and $n_{cj}$ is the corresponding noise component.
In this paper, we seek an optimized dictionary that yields satisfactory and robust recognition performance. From a statistical viewpoint, the dictionary is estimated by maximum likelihood estimation (MLE) [21], which gives the objective (3). Because $p(i_{cj}, \theta_{cj} \mid \Phi_c) = p(i_{cj} \mid \theta_{cj}, \Phi_c)\, p(\theta_{cj})$ has a tightly peaked maximum, the integration in (3) can be approximated by the value of the integrand at its maximum point. Therefore, (3) can be simplified to (4).
First, we model the likelihood function $p(i_{cj} \mid \theta_{cj}, \Phi_c)$. As can be seen from (2), the likelihood function shares the same distribution as the speckle component. It has been shown that the gamma distribution [18,19] models the speckle component accurately, which gives a better description of the SAR image and, in turn, contributes to better recognition results. The distribution of the speckle component $n$ in SAR images is given by (5) [18,19], where $L$ is the number of looks of the SAR image and $\Gamma(\cdot)$ denotes the gamma function. Suppose that the dimensionality of $i_{cj}$ is $D$, i.e., $i_{cj} = [i_{cj1}, i_{cj2}, \ldots, i_{cjd}, \ldots, i_{cjD}]^T$. According to (1) and (2), we have $i_{cjd} = a_{cjd} \circ n_{cjd}$, where $T$ denotes matrix transposition, $a_{cjd}$ is the $d$th element of $A_{cj} = [a_{cj1}, a_{cj2}, \ldots, a_{cjd}, \ldots, a_{cjD}]^T$, and $n_{cjd}$ is the $d$th element of $n_{cj} = [n_{cj1}, n_{cj2}, \ldots, n_{cjd}, \ldots, n_{cjD}]^T$. As a result, $p(i_{cjd} \mid \theta_{cj}, \Phi_c)$ is given by (6), and the distribution of the likelihood function follows as (7).
Next, we derive the prior $p(\theta_{cj})$. SR-based algorithms require the sparse vector to be sparse, and the Laplace distribution [22] has been validated as an effective choice for this purpose. It is expressed as (8), where $\exp(\cdot)$ represents the exponential function and $\sigma$ is a constant that can be determined by cross-validation.
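The displayed forms of (3) to (8) can be sketched from the surrounding definitions. The block below is a hedged reconstruction rather than the typeset formulas of the paper: it assumes the standard unit-mean $L$-look gamma model for intensity speckle, independence across the $D$ elements of a sample, and a Laplace prior in which $\sigma$ acts as a scale. The exact normalization and the way $\sigma$ enters (8) in the paper may differ.

```latex
\begin{align}
% (3) MLE of the dictionary, marginalizing over the sparse vectors
\Phi_c^{*} &= \arg\max_{\Phi_c} \sum_{j=1}^{N_c}
  \log \int p(i_{cj}, \theta_{cj} \mid \Phi_c)\, d\theta_{cj} \tag{3} \\
% (4) integral replaced by the value at the peak of the integrand
\Phi_c^{*} &= \arg\max_{\Phi_c} \sum_{j=1}^{N_c} \max_{\theta_{cj}}
  \log \bigl[ p(i_{cj} \mid \theta_{cj}, \Phi_c)\, p(\theta_{cj}) \bigr] \tag{4} \\
% (5) unit-mean gamma speckle with L looks
p(n) &= \frac{L^{L} n^{L-1}}{\Gamma(L)} \exp(-L n), \qquad n \ge 0 \tag{5} \\
% (6) change of variables i_{cjd} = a_{cjd} n_{cjd}
p(i_{cjd} \mid \theta_{cj}, \Phi_c) &=
  \frac{L^{L}\, i_{cjd}^{\,L-1}}{\Gamma(L)\, a_{cjd}^{\,L}}
  \exp\!\left( -\frac{L\, i_{cjd}}{a_{cjd}} \right) \tag{6} \\
% (7) independence across the D elements (assumption)
p(i_{cj} \mid \theta_{cj}, \Phi_c) &=
  \prod_{d=1}^{D} p(i_{cjd} \mid \theta_{cj}, \Phi_c) \tag{7} \\
% (8) Laplace prior, sigma taken as a scale (assumption)
p(\theta_{cj}) &\propto \exp\!\left( -\frac{\lVert \theta_{cj} \rVert_1}{\sigma} \right) \tag{8}
\end{align}
```

Step (6) follows from (5) by the change of variables $n_{cjd} = i_{cjd} / a_{cjd}$, whose Jacobian contributes the extra factor $1 / a_{cjd}$.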
Here, we briefly discuss how (8) works. To maximize the objective function (4), $p(\theta_{cj})$ should be large. From the expression of the prior in (8), $p(\theta_{cj})$ is a decreasing function of $\lVert \theta_{cj} \rVert_1$. Therefore, a larger $p(\theta_{cj})$ corresponds to a smaller $\lVert \theta_{cj} \rVert_1$, and a smaller $\lVert \theta_{cj} \rVert_1$ implies that most entries of $\theta_{cj}$ are zero or close to zero, according to the definition of the L1 norm.
By combining (7) and (8), the objective (4) can be rewritten as (9).
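As a hedged sketch of what this combination looks like, assume a unit-mean $L$-look gamma likelihood for each element and a Laplace prior with scale $\sigma$ (both parameterizations are assumptions, not confirmed by the text). Taking logarithms and dropping the terms that depend on neither $\Phi_c$ nor $\theta_{cj}$ would give an objective of the following form, with $a_{cjd}$ denoting the $d$th element of $\Phi_c \theta_{cj}$:

```latex
\begin{equation}
\Phi_c^{*} = \arg\max_{\Phi_c} \sum_{j=1}^{N_c} \max_{\theta_{cj}}
\left\{
  -L \sum_{d=1}^{D} \left( \log a_{cjd} + \frac{i_{cjd}}{a_{cjd}} \right)
  - \frac{\lVert \theta_{cj} \rVert_1}{\sigma}
\right\},
\qquad a_{cjd} = [\Phi_c \theta_{cj}]_d .
\end{equation}
```

The first term is the data-fit term induced by the multiplicative speckle, and the second is the sparsity penalty; $\sigma$ controls the trade-off between them.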
From (9), the optimized dictionary $\Phi_c^{*}$ can be obtained by iterating two steps: (1) solve $\theta_{cj}$ with $\Phi_c$ fixed; (2) solve $\Phi_c$ with $\theta_{cj}$ fixed. In Step (1), solving $\theta_{cj}$ with a fixed $\Phi_c$ is not a convex problem, so we obtain the solution by employing the multistage convex relaxation method [23]. In Step (2), we adopt the quasi-Newton algorithm [24] to obtain the dictionary. The gradient of the objective function with respect to $\Phi_c$ is expressed in terms of $\Theta_c = [\theta_{c1}, \theta_{c2}, \ldots, \theta_{cj}, \ldots, \theta_{cN_c}]$ and a matrix $K_c$.
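As a rough numerical illustration of Step (2), the sketch below evaluates a gamma-speckle data-fit term and its gradient with respect to the dictionary. It is built on assumptions: the per-element negative log-likelihood is taken as $L(\log a + i/a)$ (unit-mean $L$-look gamma speckle, constants dropped), all entries of $\Phi_c \Theta_c$ are assumed strictly positive, and the intermediate matrix `K` is a construction of this sketch that need not coincide with the paper's $K_c$. The sparsity prior does not depend on $\Phi_c$, so it does not enter this gradient.

```python
import numpy as np


def neg_log_likelihood(Phi, Theta, I, looks=1.0):
    """Gamma-speckle negative log-likelihood, dropping terms constant in Phi.

    Phi: (D, K) dictionary, Theta: (K, N) sparse codes, I: (D, N) samples.
    Assumes every entry of A = Phi @ Theta is strictly positive.
    """
    A = Phi @ Theta
    return looks * np.sum(np.log(A) + I / A)


def grad_wrt_dictionary(Phi, Theta, I, looks=1.0):
    """Gradient of neg_log_likelihood with respect to Phi."""
    A = Phi @ Theta
    # Derivative of L * (log a + i / a) with respect to a, element-wise.
    K = looks * (A - I) / A**2
    # Chain rule through A = Phi @ Theta.
    return K @ Theta.T
```

A gradient of this kind could be handed to a quasi-Newton routine, for example `scipy.optimize.minimize` with `method="L-BFGS-B"` after flattening the dictionary; that is one possible choice and not necessarily the solver used in [24].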

International Journal of Antennas and Propagation
After all the dictionaries have been learned, recognition is realized by a decision function $L(x)$ that determines the label of the testing sample $x$. Here, $\Phi = \{\Phi_1^{*}, \Phi_2^{*}, \ldots, \Phi_c^{*}, \ldots, \Phi_C^{*}\}$ is the dictionary composed of all the optimized class-specific dictionaries, $\theta_x$ is the sparse vector of $x$ obtained under dictionary $\Phi$, and $h_c(\theta_x)$ is a function that keeps only the entries of $\theta_x$ with label $c$ and sets the other entries to zero. The minimum reconstruction error corresponds to the desired label.
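The decision rule can be sketched as follows, under two assumptions that the text does not fix: the reconstruction error is measured with the Euclidean norm of $x - \Phi\, h_c(\theta_x)$, and the sparse vector $\theta_x$ has already been computed by some sparse solver. The function name and the `atom_labels` array (one class label per dictionary atom) are hypothetical.

```python
import numpy as np


def classify_by_residual(x, Phi, atom_labels, theta):
    """Pick the class whose atoms alone reconstruct x with the least error.

    x: (D,) test sample, Phi: (D, K) concatenated dictionary,
    atom_labels: (K,) class label of each atom, theta: (K,) sparse vector.
    """
    atom_labels = np.asarray(atom_labels)
    classes = np.unique(atom_labels)
    residuals = []
    for c in classes:
        # h_c(theta): keep the entries of class c, zero out the rest.
        theta_c = np.where(atom_labels == c, theta, 0.0)
        residuals.append(np.linalg.norm(x - Phi @ theta_c))
    residuals = np.array(residuals)
    return classes[int(np.argmin(residuals))], residuals
```

Returning the full residual vector alongside the label makes it easy to inspect how close the runner-up configuration was, which is useful when configurations differ only slightly.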

Experimental Results and Analysis
The proposed algorithm is tested on the standard moving and stationary target acquisition and recognition (MSTAR) database. We conduct the experiments on different configurations of the targets. Samples acquired at a depression angle of 17° are used for training, whereas those acquired at a depression angle of 15° are used for testing.
Subimages with 64 × 64 pixels are extracted and normalized from the original images [9,11,16]. The parameter $\sigma$ is determined from the set $\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$ by using 3-fold cross-validation. The number of atoms in each dictionary is empirically set to 60. Dimensionality reduction is realized by using an independent and identically distributed Gaussian random matrix [11,20]. The feature dimensionality is set to 800. The original SR algorithm [20], the monogenic signal-based SR algorithm (MSR) [9], the joint SR algorithm (JSR) [11], the SR-based algorithm using the K-SVD algorithm [25] to learn dictionaries (DL), and the SR-based algorithm using the statistical dictionary learning algorithm [16] to learn dictionaries (SDL) are chosen as competitors to test the advantage of the proposed method.
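The dimensionality reduction step can be illustrated with a minimal sketch that maps flattened 64 × 64 chips (4096 values) to 800 features with an i.i.d. Gaussian matrix. The function name, the fixed seed, and the $1/\sqrt{800}$ scaling of the matrix are assumptions of this sketch; the stand-in data are random numbers, not MSTAR images.

```python
import numpy as np


def gaussian_random_projection(chips, out_dim=800, seed=0):
    """Project flattened image chips onto `out_dim` random directions."""
    X = np.asarray(chips, dtype=float).reshape(len(chips), -1)
    rng = np.random.default_rng(seed)
    # i.i.d. Gaussian entries; the 1/sqrt(out_dim) scaling is an assumption.
    R = rng.standard_normal((X.shape[1], out_dim)) / np.sqrt(out_dim)
    return X @ R, R


chips = np.random.default_rng(1).random((5, 64, 64))  # stand-in data
features, R = gaussian_random_projection(chips)
```

The same matrix `R` has to be applied to training and testing samples so that both live in the same feature space, which is why the sketch returns it and fixes the seed.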

Target Configuration Recognition on Various Targets.
First, we conduct the experiment on 7 different configurations belonging to 3 different targets. The dataset description is given in Table 1. To show the advantage of the proposed algorithm, we compare it not only with the abovementioned SR-based algorithms but also with a deep learning-based algorithm named A-convnet [26]. The corresponding results are shown in Table 2. To make the comparison between A-convnet and the proposed algorithm convenient, we refer to the structure of A-convnet constructed in [26,27]. The structure that yields the best recognition results for the 64 × 64 pixel images is displayed in Figure 2.
From the experimental results, we can see that MSR and JSR perform better than the baseline (SR), thanks to the utilization of extra useful information. All the dictionary learning-based algorithms enhance the performance effectively, because learned dictionaries bring more robustness. Among the three dictionary learning-based algorithms, SDL performs better than DL. The reason is that SDL gives a better description of the SAR image than DL owing to the utilization of the additive model and the Gaussian mixture distribution. As previously discussed, better descriptions lead to better recognition performance. From Table 2, we can find that A-convnet-based recognition obtains the highest recognition rates for BMP2-C21 and BTR70-C71. Overall, however, the proposed PDL achieves the best performance of all the algorithms, which is 6.16%, 2.93%, 2.13%, 3.89%, 1.10%, and 1.76% better than SR, MSR, JSR, DL, SDL, and A-convnet, respectively. Thanks to the better description of the SAR images, satisfactory recognition results can be obtained by using the proposed algorithm.

Target Configuration Recognition on One Target.
In this part, we test the proposed algorithm on a much more challenging case, and we compare only the SR-based recognition algorithms. We recognize 11 different T72 configurations. The dataset description is given in Table 3, and the corresponding targets are shown in Figure 3. The recognition rates are given in Table 4. In this case, with 11 configurations that differ only slightly, the proposed PDL attains a recognition rate of 88.09%, which is 7.07%, 4.65%, 3.39%, 4.94%, and 1.91% better than SR, MSR, JSR, DL, and SDL, respectively. The reason is that the proposed algorithm describes the characteristics of SAR images more accurately in a statistical way. The comparison between the proposed PDL and SDL further supports the claim that the product model describes SAR images better than the additive model.

Configuration Recognition under Random Corruption.
With learned dictionaries, the recognition algorithms can achieve more robustness. In this section, we test the proposed PDL algorithm under random corruption [9,20]. We corrupt up to 15% of the pixels of each SAR image in the datasets, in increments of 5%. The randomly chosen pixels are replaced with independent and identically distributed samples from a uniform distribution [9,20]. The recognition results of the 11 T72 configurations under all the algorithms are given in Figure 4. From the results, we can see that the performance of all the algorithms drops as the corruption percentage increases. However, the reduction for the dictionary learning-based algorithms is much smaller than that for the SR-based algorithms without dictionary learning. The proposed algorithm still obtains the highest recognition rates under all corruption conditions owing to the use of the product model and the gamma distribution.
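The corruption procedure can be sketched as below. The text does not state the range of the uniform distribution, so this sketch assumes it spans the minimum and maximum of the image being corrupted; the function name and the random stand-in image are likewise hypothetical.

```python
import numpy as np


def corrupt_pixels(image, fraction, rng):
    """Replace a random `fraction` of pixels with i.i.d. uniform samples.

    Assumption: the uniform range is [image.min(), image.max()].
    """
    out = np.asarray(image, dtype=float).copy()
    n = int(round(fraction * out.size))
    # Choose distinct pixel positions in the flattened image.
    idx = rng.choice(out.size, size=n, replace=False)
    out.flat[idx] = rng.uniform(out.min(), out.max(), size=n)
    return out


rng = np.random.default_rng(0)
chip = rng.random((64, 64))  # stand-in for a normalized SAR chip
corrupted = {p: corrupt_pixels(chip, p, rng) for p in (0.05, 0.10, 0.15)}
```

Sampling positions without replacement keeps the corrupted fraction exact, so the 5%, 10%, and 15% levels are directly comparable across images.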

Configuration Recognition with Limited Number of Training Samples.
In this part, we test the proposed algorithm on another challenging case with limited training samples. We select 1/2, 1/3, and 1/4 of all the training SAR images, respectively, to learn the dictionaries. The corresponding recognition results of the 11 T72 configurations are displayed in Figure 5. Similarly, we can see that the performance of all the algorithms becomes poorer as the number of training samples decreases. Thanks to the accurate description of the SAR images by the product model, the proposed PDL algorithm achieves the best performance with all the tested proportions of training samples.

Computational Complexity Analysis.
Finally, we evaluate the computational complexity of the proposed PDL algorithm and compare it with that of the other dictionary learning-based algorithms. The main computational cost of the dictionary learning-based methods lies in obtaining the dictionary. Suppose that the size of the dictionary is $D \times N_{\Phi_c}$, where $D$ is the dimensionality of the training samples and $N_{\Phi_c}$ is the number of atoms of $\Phi_c$.
Comparing the computational complexity of the proposed PDL algorithm with that of the DL algorithm, we can tell that they are of the same order.

Conclusions
In this paper, we proposed a PDL algorithm to realize SAR target configuration recognition. The dictionaries needed for the SR-based recognition algorithms are obtained from a statistical viewpoint to enhance robustness to noise, and the product model is utilized to describe SAR images in view of the multiplicative speckle. Experiments on the MSTAR database have demonstrated that the recognition accuracy and the robustness to noise of the proposed PDL algorithm are improved. Comparisons with other state-of-the-art algorithms further support its superiority.
From the experimental results, we can also see that the recognition algorithm based on deep learning has advantages over the SR-based ones for some targets. How to combine it with the SR technique to achieve better results is well worth further study.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.