Automatic Target Recognition Strategy for Synthetic Aperture Radar Images Based on Combined Discrimination Trees

A strategy is introduced for achieving high accuracy in synthetic aperture radar (SAR) automatic target recognition (ATR) tasks. Initially, a novel pose rectification process and an image normalization process are sequentially introduced to produce images with less variations prior to the feature processing stage. Then, feature sets that have a wealth of texture and edge information are extracted with the utilization of wavelet coefficients, where more effective and compact feature sets are acquired by reducing the redundancy and dimensionality of the extracted feature set. Finally, a group of discrimination trees are learned and combined into a final classifier in the framework of Real-AdaBoost. The proposed method is evaluated with the public release database for moving and stationary target acquisition and recognition (MSTAR). Several comparative studies are conducted to evaluate the effectiveness of the proposed algorithm. Experimental results show the distinctive superiority of the proposed method under both standard operating conditions (SOCs) and extended operating conditions (EOCs). Moreover, our additional tests suggest that good recognition accuracy can be achieved even with limited number of training images as long as these are captured with appropriately incremental sample step in target poses.


Introduction
Synthetic aperture radar (SAR) is a valuable technique for remote sensing and monitoring applications. Automatic target recognition (ATR) of SAR images is one of the most challenging SAR applications [1]. A typical SAR ATR system recognizes tactical ground targets of interests, that is, tanks, howitzers, and armoured vehicles, which is essential for identifying friends and foes and prerequisite for precision strikes.
SAR ATR involves a sequence of processes, such as some type of preprocessing, feature extraction, classifier construction, and finally target classification. The preprocessing stage may involve multiple types of processing that aims at facilitating the efficiency of image interpretation and analysis in the subsequent stages, for example, by suppressing the clutter reflections that obscure the contrast between the target of interest and the clutter. Moreover, SAR images are resized, shifted, and rotated to predefined standards. The so-called resizing is normally implemented by cropping out part of the image. The shifting and rotating processes are also known as image registration and pose rectification, respectively [2,3].
Feature extraction is another essential stage which extracts effective discriminant features for improving recognition accuracy. Several features have already been exploited in SAR ATR [4][5][6][7][8][9][10]. Based on the consideration that tactical ground targets usually have a rectangular shape with different widths and lengths, geometric features are commonly used in SAR ATR. Zernike moments (ZMs) are employed in [6], taking advantage of their linear transformation invariance properties and robustness to the presence of noise. In [7], features are extracted based on pseudo-Zernike moments (pZm), which have merits such as the invariance properties, the independent property, and much lower sensitivity to noise in comparison with the ZMs. In [11], multiple geometric features are produced from calculating the axis projection of a target shape blob rotated clockwise with certain increment about the centre of the target. Then, the redundancy of the learned feature set is eliminated by keeping the rank of the covariance matrix of the new feature set the same as that of the entire data set. However, the geometric features of the target of interest in SAR images are difficult to measure precisely due to the cluttered background and variations in poses and depression angles. Therefore, the recognition accuracy is not guaranteed. The polar mapping method, which is frequently used in ISAR image classification, is modified and used in [3] to address the SAR ATR problem. The original images are converted from the original 2D spatial domain (range and cross-range) to images in the polar coordinate domain (radius and angle) to produce polar-mapped images. The polar-mapped images are similar to the images that are mapped from the same target even in different poses. For that reason, the commonly used pose estimator is not necessarily needed for polar-mapped images. However, the performance of the polar mapping method depends highly on the determination of the reference central point for coordinate transformation, which is not a simple task especially for SAR images captured under various clutter environments. Certain features are not feasible to be directly applied to classification due to their high dimensionality [12][13][14][15][16][17][18][19]. In [12], a compact representation feature, the monogenic signal, is employed for SAR ATR, where the high dimensional problem is circumvented by uniform downsampling, normalization, and concatenation of the monogenic components. Feature dimensionality reduction methods for SAR ATR based on manifold learning theory are also studied in recent years [13][14][15][16][17]. In [16], each sample is given a weight, which is called the sample discriminant coefficient (SDC), relating to its similarity to neighbouring samples, and then the SDC is combined with the Local Discriminant Embedding (LDE) method for producing redundancy-reduced features. Similarly, in [17], the so-called neighbourhood geometric centre scaling embedding (NGCSE) method is proposed, where geometric centre scaling is introduced into the neighbourhoods such that the samples are provided with clear clustering directions. However, the performance of most of the nonlinear dimensionality reduction methods relies heavily on the parameter selection of the neighbourhood, which is still an open problem.
The nearest neighbour classifier is one of the most used classifiers, where the extracted features are directly fed into the classifier to achieve the classification results [16]. Sparse representation based classification (SRC) is recently developed and exploited in SAR ATR, where the feature vectors of the testing samples are coded as sparse linear combinations of the feature vectors of the training samples, and the target with the minimum residual energy is recognized [12,20]. Methods such as Support Vector Machines (SVM), Neural Networks (NN), and adaptive boosting (AdaBoost) are all vastly exploited in SAR ATR [2,5,[21][22][23]. Various choices of base learners can be combined with the AdaBoost algorithm to solve the SAR ATR problem [2]. As explained in the Hughes phenomenon (also known as the curse of dimensionality), the difficulty of constructing classifier models becomes more prominent especially when the feature set is high in dimensionality while the number of the training data is limited (a fact in SAR ATR). However, the combination of the AdaBoost and graphical models is empirically proven in [24] to demonstrate good performance even when the training data is limited in number.
The SAR images are known for their indistinct appearances, variations in target appearances, and small number of available training samples. These problems must be properly addressed to achieve good recognition results for ATR tasks. To this end, a SAR ATR scheme is introduced as illustrated in Figure 1. Firstly, an initial processing stage is applied to facilitate the efficiency of feature extraction in the subsequent stages. More specifically, aiming at reducing the impact of variations in SAR images caused by variational echo energy and target poses, an image energy normalization process and a pose rectification process are applied sequentially. The construction of effective feature sets for ATR tasks is of crucial importance for achieving reliable recognition results. Therefore, it is suggested to extract a rich feature set that is formed by combining various types of discrimination features and then construct a more compact feature set by eliminating the redundancy of the rich feature set. We have decided to employ wavelet-based features. A rich feature set is firstly formed by combining the decomposed wavelet subband features, for example, the low-frequency information in LL subband coefficients and the high-frequency information in both LH and HL subband coefficients, where the HH subband is not involved since it is not stable feature in SAR images [25]. The involved coefficients actually depict the combination of texture features and horizontal and vertical edge features. After this, a compact low dimensional feature set which comprises features which retain most of the variance is constructed by employing the Principle Component Analysis (PCA) technique [26]. The relationship among features is statistically learned in a discriminative fashion rather than a generative fashion. Specifically, instead of using the true distribution, which is usually unknown for most of the time, the empirical estimates are learned in a discriminative fashion by maximizing the -divergence. Therefore, although the learned models may have low consistency with the real model of target classes due to limited amount of training data, high discrimination ability can still be achieved. Then, a final classifier is constructed by combining several discriminative tree based classifiers with the Real-AdaBoost framework [27]. To evaluate the performance of the proposed method, the moving and stationary target acquisition and recognition (MSTAR) public release data set is involved. Experimental results demonstrate that the proposed method outperforms several widely cited methods under both standard operating conditions (SOCs) and extended operating conditions (EOCs).
Variation reduction techniques that facilitate the efficiency of feature extraction are introduced in Section 2. The feature extraction and processing techniques are introduced in Section 3. The recognition scheme is detailed in Section 4. Experimental results using the MSTAR public database are shown in Section 5, followed by our conclusions in Section 6.

Variation Reduction Techniques
2.1. Image Energy Normalization. The echo strength of SAR is strongly affected by, for example, the range distance between the imaging target and its corresponding radar and several other reasons; therefore the average amplitude of image pixels in different image chips may be different even for the same target [28]. To mitigate the potential influence of amplitude variations in subsequent features extraction, the image energy normalization process needs to be applied. Let and denote the number of pixels in range and crossrange dimension for a given SAR image chip. The SAR image chip can be denoted as ( , ), where = 1, . . . , and = 1, . . . , are the dimension of range and cross-range, respectively. The energy normalized image pixel ( , ) can be described as where min ( , ) and max ( , ) is the minimum and maximum value among all pixels of ( , ), respectively, and ( , ) is calculated as The benefit of employing the image energy normalization process is provided in Section 5.1.

Pose Rectification.
Pose rectification is beneficial for improving the accuracy of SAR ATR and can be achieved by rotating the given images according to the pose of target of interests. However, targets with partial defected contour shapes that are caused by the shadow effect may suffer from poor pose estimation accuracies. This section introduces a pose estimation method that is based on the exploration of targets' geometrical information for achieving higher estimation accuracy.
Several methods have been proposed for achieving higher accuracy in pose estimation. The methods proposed in [29,30] are based on maximizing the mutual information with multilayer perceptron (MLP). Although a low estimation error is achieved, these methods are computationally expensive and require a long training time. The method proposed in [31] is based on the 2D continuous wavelet transform (CWT), where the orientation that maximizes the angular energy is considered as the estimated pose. However, this method is based on the assumption that the target of interest is already placed in the image centre, which is difficult to achieve especially for SAR images with indistinct targets.
In fact, the tactical ground targets show rectangular shaped boundaries, which can be used for pose estimation. Therefore, methods based on the analysis of the geometrical information of target of interests have been proposed. The methods proposed in [2,32] are based on finding the encapsulating box of the target of interest, where the basic assumption is that the edges of the estimated box should be tangent to the rectangular shaped target boundaries. However, this is not always true with incomplete target shape boundaries due to the shadow effects in SAR images. Moreover, the least squares linear fit based methods estimate the centreline of the target of interest, where the slope of the centreline is considered as the target pose. However, for similar reasons, the shadow effect in SAR may produce images with defected target, which can affect the corresponding pose estimation results. As discussed, the encapsulating box based methods have failed to achieve the optimum estimation result due to the defect targets in SAR images. However, as will be introduced, the Radon transform based method can achieve better estimation result in such scenarios [33]. Therefore, better estimation accuracy can be achieved by employing these two methods in a well-designed fashion. Firstly, the target of interest is segmented from the SAR image, and the rectangle that has the minimum perimeter around the segmented target is considered as the minimum bounding rectangle (MBR) [34]. Then, the completeness of the target of interest can be evaluated. In the case of a target with complete contour shape in the SAR image, the MBR estimated result is considered as the final result. Otherwise, the Radon transform is conducted and its estimation is used as the final result.

Estimation for Targets with Complete Contour Shapes.
Tactical targets in SAR images have randomly distributed poses ranging from 0 ∘ to 360 ∘ (the target pose is defined as the angle between the target's longer edge and the horizontal image axis). Tactical targets in SAR images show rectangularlike shapes. Figure 2 shows the segmented SAR chips, where the target poses can be estimated according to the inclination angle of its MBR. As introduced in [34], the rectangle that has the shortest perimeter enclosing a convex polygon has at least one side collinear with one of the convex edges. The MBR can be efficiently calculated as follows: Step 1. Estimate the centroid of the target of interest.
Step 2. Compute the convex polygon of the target of interest. Step 3. Compute and store the edge orientations of the convex polygon.
Step 4. Rotate a bounding rectangle according to the stored edge orientations until a full rotation is done.
Step 4.1. Find a fitted rectangle.
Step 4.2. Store the perimeter of the fitted rectangle.
Step 5. Return the rectangle corresponding to the minimum perimeter.

Estimation for Targets with Incomplete Contour Shapes.
Due to the imaging principle of SAR, partial part of the target of interest is not radiated by radar beam, and therefore the imaged target shows incomplete boundary shape. However, the long edge of the target of interest is always well imaged, as shown in Figure 3. In fact, the Radon transform (RT) can be used for long edge detection. Therefore, for SAR images with targets that show incomplete contour shapes, the RT based estimation can achieve higher accuracy. The application of the RT on a target image I( , ) limited by a set of angles can be considered as calculating the projection of the target along given angles. The calculated projection result is the sum of pixel numbers in each single direction, where a line can be found in the corresponding target image according to the peak of the projection result [33]. Define G( , ) as the projection at angle with distant to the image centroid, and the RT is implemented as follows: where (⋅) is the Dirac delta function. The parameters and determine the projection direction, where the projection is repeated from ∈ [0 ∘ : 180 ∘ ). Note that a pixel in the RT transform is divided into four subpixels such that accurate projection result can be achieved, where the projection contribution is calculated according to the position of the subpixel that hits the projection bin.

Degree of Overlapping
Rectangle. In fact, for any given image, the completeness of the target in SAR images can be automatically calculated. As introduced in Section 2.2.1, the calculated MBR has at least one edge overlap with the target boundary. Therefore, in the case of a complete target, one long edge of the target of interest will overlap with that of its corresponding MBR. In the case of a target with partial defect, the diagonal line of the target of interest may overlap with a long edge of its corresponding MBR with few pixels. Let denote number of pixels of the two MBR long edges, and let denote the number of target pixels that overlap with the two MBR long edges. The completeness of the target in SAR images can be evaluated as follows: the target is firstly dilated, and then the degree of overlapping rectangle is calculated as / , and finally the completeness of the target boundary is evaluated according to the calculated degree of overlapping rectangle. After dilation, since the difference between the complete contour shape and defected contour shape is large, the proposed method is not sensitive to the selected threshold employed for evaluating the degree of overlapping. Overall, as shown in Figure 4, the target pose is estimated using the MBR based method or the RT based method depending on the evaluation result of the degree of overlapping rectangle, and several estimation results are shown in Figure 5.

Rich Feature Set Extraction.
Feature extraction is of crucial importance to the overall performance of the entire ATR system. It is ideally preferable to extract features that have characteristics of high discrimination ability (or, in other words, high interclass variation) and high tolerance to target translation. These feature characteristics can be achieved by efficiently employing the wavelet decomposition technique. As depicted in Figure 6, the texture features are reflected in LL and the horizontal and vertical edge feature are reflected in LH and HL, respectively. HH is actually a combination of features reflected in LH and HL. Furthermore, the translation invariant features can be extracted by sequentially further decomposing the previously decomposed image to a much  coarser resolution. The idea behind the translation invariant features is that each decomposition process throws away the exact positional information of certain feature that exists in a specific area. More specifically, as illustrated in Figure 7, a pixel point in a newly decomposed image implicitly reflects the presence of certain feature(s) in a corresponding entire local region in the original image.
MMC finds the mother wavelet that maximizes the average margin between classes. This is achieved by comparing the difference between the average within-class distance and the average between-class distance − . The mother wavelet that achieves the maximum difference is the best selection. Suppose we have classes 1 , 2 , . . . , , each class with samples and therefore, = ∑ =1 samples in total. Let denote the th sample in the th class, let be the centroid of the th class, and let be the centroid of the training set. The average within-class distance and the average between-class distance can be denoted as The comparison of the discrimination performance of the mentioned wavelets is illustrated in Figure 8. It is noted that the Reverse biorthogonal wavelet 3.1 achieves the highest value, an observation which indicates that it has the highest discrimination ability among these wavelets. Therefore, the Reverse biorthogonal wavelet 3.1 is selected as the default mother wavelet for feature extraction in SAR images.
The above process yields large sets of features which exhibit a high variability as far as the quality of the discriminative information that they convey is concerned. To achieve the truly effective features, the PCA is used, which achieves comparable result to both 2D-PCA and two-stage 2D-PCA when they are employed for SAR feature compression purposes, as analysed in [26]. Moreover, the PCA is much more efficient as far as both computation time and storage space are concerned. The implementation of the PCA is introduced as follows: Number of principal components .
Step 1. Subtract the mean of variables from .
Step 3. The dimensionality reduced feature set is calculated with the first column of as .

Learn Statistical Relationship among Features.
Since access to data arising from true distributions is often not available, the learned models based separately on positive/negative samples are usually not accurate enough for classification. In fact, the discriminative methods construct models from both the positively and negatively labelled samples in a discriminative fashion. Since the final objective is classification, even if the learned distributions may not converge to the true distributions, the constructed discriminative models tend to have better discrimination performance than the generative models [36,37]. In binary classification case, which can be naturally extended to the more general -ary classification case, for a given labelled training set The indexes of the mother wavelets Discrimination ability Figure 8: Discrimination ability of various types of mother wavelets. The involved 47 mother wavelets are sequentially discrete Meyer wavelet, Biorthogonal wavelets (orders 1.
where is the threshold [38]. In most cases, it is impossible to have access to the true conditional distributions and . Approximationŝandâ re normally built to learn the unknown distribution from the labelled training set S. Therefore, the log-likelihood ratio test can be rewritten as The recently proposed method named discriminative tree estimates the multivariate distributionŝand̂jointly from both the positively and negatively labelled samples in the training set of Tan et al. [37]. This method is based on the assumption that the learned distribution̂( ) is Markov with respect to an undirected graph G = (V, E), where V = {1, . . . , } represents the vertex set and E ⊂ ( V 2 ) represents the set of all unordered pairs of vertexes. The mentioned Markov conforms to the local Markov property where N( ) fl { ∈ V : ( , ) ∈ E} represents the set of neighbour nodes of and A = { : ∈ A} for any set A ⊂ V.
A tree structured distribution̂that is Markov with respect to an undirected graph G = (V, E) can be factorized as follows [39]: wherê( ) represents the marginal of the random variable and̂, ( , ) represents the pairwise marginal of the pair ( , ).
Based on this, for a given distribution , the projection of onto some tree distribution G = (V, E) is defined as follows:̂( We digress here to introduce the method for constructing models in generative fashion and then provide the method for constructing models in discriminative fashion. The generative methods attempt to construct a model that is the same as the underling model of the classification target. The widely researched generative method, namely, the Chow-Liu algorithm [40], employs the KL-divergence as the measure of the differences between two probability distributions and . The optimization in the Chow-Liu algorithm is therefore defined as wherê∈ T states that̂is a tree structured distribution over the same alphabet as T. It is shown by Chow and Liu that this optimization problem can be solved by using a maximum weight spanning tree (MWST) algorithm (e.g., Kruskal's [41]) where the mutual information is used to represent the edge weights between pairs of variables. In contrast, the recently proposed discriminative method employs the -divergence as the measure of the separation between two probability distributions and . Thedivergence is defined as follows [42]: The optimization problem reduces to two tractable MWST problems for maximizing the tree approximatedivergence over the two tree structured-distributionŝand for known empirical distributions̃and̃, which is defined as It is noted that, as described in [37], (13) can be decoupled into two independent optimization problems: These can be solved by the MWST algorithm Overall, the procedure of the learning of the discriminative tree is summarized in the following steps [37]: Given. Training set S.
Step 3. Find the optimal tree structures with the given edge weights.
Since the classification result is finally determined by the numerical result of the log-likelihood ratio test, we choose to employ one fixed threshold 0 for the entire training process. This is because likelihoods larger than 0 indicate higher probability of belonging to ( ). Similarly, likelihoods smaller than 0 indicate high probability of belonging to ( ).

Recognition Scheme
The main aim of classifier construction in ATR is to convert a wealth of training data into useful knowledge for classification by learning. However, a classifier learned from massive amounts of high varying data is not guaranteed to achieve good performance in classification and may yield large feature dimensions. Therefore, it is of great importance to find effective representations for the targets of interest to be used for constructing the classifiers.
Extracted features might comprise large sets of features which at a glance might be worth of exploiting but turn out to be too "messy" and high in redundancy. In fact, the learning process of the classifiers could be enormously benefited from a feature dimensionality reduction process after the acquisition of the extracted features as previously discussed.
The redundancy-reduced features can be used for learning classifiers, where efficient classifiers and better classification accuracy results can be achieved. Therefore, it is suggested to enlarge the quantity of the potential features but then eliminate the existing redundancy, reduce the dimensionality of the enlarged feature set, and finally exploit the preserved features for classification, which comprise the characteristics of proper combination of both quality and quantity. In the proposed recognition scheme, features are extracted with wavelet decomposition, but then the dimension of the feature set is reduced to provide a feature set rich in discriminative information but with limited dimensionality and less redundancy. To make the most of the extracted features, tree structured classifiers are learned in discriminative fashion based on the statistical information provided by the training data of the target classes. In the learned classifiers, the feature nodes are connected as a spanning tree, where each node is connected to another node which has the maximum relevance. Moreover, the relevance between feature nodes can be accordingly calculated. Finally, classifiers are combined using the Real-AdaBoost algorithm to construct the final classifier that has high classification accuracy and is less prone to overfitting, where the recently proposed discriminative trees are involved as the base classifiers. A generic sequence of steps of the proposed scheme is illustrated in Figure 9.

Construct a Strong Classifier.
Efforts have been constantly made to construct a classifier with high classification accuracy and strong generalization ability (the later meaning that performance of the classifier learned from a given training dataset will still be good when the classifier is exposed to unseen data) [43]. Employing ensemble learning methods is one of the solutions. Ensemble learning methods construct and combine a set of base classifiers instead of constructing and using one single classifier learned from the training dataset. Base classifiers can be generated from a training dataset with the use of any learning algorithm (e.g., decision tree, graphical models, and neural networks).
AdaBoost is one of the ensemble methods that have achieved great success in diverse domains [27,[43][44][45][46][47]. The general idea of the AdaBoost is to constantly update the distribution of the training data such that the learning of the base classifiers in each iteration focuses more on the wrongly labelled samples by the previous learned base classifiers. Real-Adaboost is a variant of the AdaBoost which has been empirically proved to have better performance than ordinary AdaBoost (Discrete-AdaBoost) [27,37,44]. Specifically, for a given training dataset S, each sample is assigned with an initial weight ( ) 0 = 1/ , where is the number of training samples. A base classifier is learned in each iteration such that ℎ : X → R, where a larger absolute value in ℎ ( ) indicates higher confidence. Then, the samples wrongly labelled by ℎ ( ) are increased in weights such that the constructed classifiers in the following iterations can focus on the misclassified samples. Finally, the combined classifier resulting after iterations is Image pixel data of the original target chip image

Feature dimensionality reduction
Preserved features

Classi er construction
Constructed classifier models
where sgn is the sign function that sgn( ) = 1 if ≥ 0 and −1 otherwise and is the coefficient calculated in each iteration for minimizing the weighted training error. Overall, the Real-AdaBoost algorithm trains a set of base classifiers sequentially and combines them to a strong classifier, where the current learned base classifiers focus more on the wrongly labelled samples by the previous base classifiers. The ensemble process of the Real-AdaBoost is iterated with the rearrangement of the training set distribution while the learning method of the base classifiers is not changed. For the learning of the base classifier in each iteration , the group of redundancy and dimensionality reduced wavelet features are employed and fed to learn the discriminative trees for classifier construction. By employing the learning method introduced in Section 3.2, a pair of discriminative trees is constructed to provide an estimation of the classification result. Specifically, the pair of discriminative trees constitutes a base classifier for the Real-AdaBoost ℎ : X → R, where ℎ ( ) = log[̂( )/̂( )], and̂and̂denotes the learned discriminative tree models at the th iteration of the Real-AdaBoost. After iterations, pairs of discriminative trees are learned and combined to construct a stronger classifier with better approximation of the classification result which can be written as Viola and Jones [45] wherê * ( ) = ∏ =1̂( ) and̂ * ( ) = ∏ =1̂( ) . For the iterative updating of the training set distribution, the misclassified samples are reassigned with larger weights and the correctly classified samples are reassigned with smaller weights compared to their previous weights. Regarding the weight distribution updating problem, simply reduplicating the samples with higher weights is time and computation inefficient. This is because as the number of iterations increases, the wrongly labelled samples would be much less in number but have much larger weights. Therefore, the final training set is fixed in size and constructed in random sampling fashion, where samples of the original training set are chosen according to the updated distribution weights. The entire classifier construction scheme is summarized as below: Given. Training dataset . Number of iterations .
Step 1. Wavelet feature extraction from the given training dataset .
Step 2. Redundancy and dimensionality reduction for the extracted features.

Experimental Results
In this section, the performance of the proposed scheme is evaluated and compared with several established methods. The widely used SAR ATR experimental validation and comparison benchmark moving and stationary target acquisition and recognition (MSTAR) public release database is employed for performance evaluation [49][50][51]. The MSTAR database consists of 10 vehicle classes, which are collected by X-band SAR with 1-ft by 1-ft resolution, including BMP2, BTR70, T72, BTR60, 2S1, BRDM2, D7, T62, ZILI131, ZSU234, and SLICY. The collection of target images is captured under various depression angles and aspect angles, which are  There are two categories of operating conditions in the MSTAR database: the standard operating conditions (SOCs) and the extended operating conditions (EOCs) [50]. The targets captured under SOCs are listed in Table 1 including information about vehicle types, number of chip images, serial numbers, and depression angles. It is worth noting that the EOCs are much more difficult for SAR ATR than the SOCs. In EOC-1, the depression angles are larger in variation where the training images are captured under 15 ∘ and the testing images are captured under 30 ∘ , as shown in Table 2.  Table 3.
We experiment with both two-level and three-level twodimensional wavelet decomposition with respect to the Reverse biorthogonal wavelet (the selected mother wavelet as introduced in Section 3.1). In the following, wavelet 768 (256 × 3 = 768) and wavelet 192 (64 × 3 = 192) are used to denote the two-level and three-level wavelet decomposition, respectively. The stopping criterion of the Real-AdaBoost is set to 400 iterations. The segmentation of the target of interest is implemented with the MRF model based method, where the potential class number is 2, the expectation is 0.4, and the maximum iteration number is 50. The segmented target is dilated with a disk-shaped template with radius 3. The degree of overlapping rectangle is 0.5 indicating that an appropriate MBR must have more than 50% long edge overlapping pixels in terms of the target of interest. Moreover, the proposed method is implemented using Matlab R2013a and tested on a computer with 1.8 GHz CPU and 4 GB RAM. Regarding the computation complexity, for a classifier trained for classifying 10 targets in OvO fashion, the processing time for one single sample takes less than 0.02 s, including the processes of extraction and compressing of features and recognition of targets.
Before applying the proposed method to SAR ATR and comparing with other methods, it is necessary to test the proposed method in conjunction with several important processes, including image energy normalization, feature extraction, extension of two-class to multiclass classification, and pose rectification. These four tests are conducted in Sections 5.1 to 5.4, and the comparisons of recognition accuracy performance with other methods are provided in Section 5.5.

Image Energy Normalization.
The significance of image energy normalization in SAR ATR is tested in this section, where the performance of the proposed scheme is tested with or without image energy normalization processing. The dataset includes all 10 classes captured under SOCs as listed in Table 1. The wavelet 192 is used for feature extraction. It is noticed in Figure 10 that the involvement of normalization before feature extraction is beneficial for improving classification accuracy. In fact, as the dimension of feature vectors employed for classification grows, the advantage of image energy normalization diminishes. This is because a larger training feature set provides more information for classification, where the classifier is empowered with more discrimination ability by exploiting the provided information. However, the classification with normalization achieves good classification accuracy (around 96%) even when the feature vector dimension is much lower, yielding an accuracy which is almost the same as the accuracy achieved with higher feature dimensions. Therefore, it is still suggested to employ image energy normalization for preprocessing, especially for classifiers constructed from training feature sets of lower dimensionality. In the following, the image energy normalization process is employed as a standard default processing step.

Extension to Multiclass.
We compare the OvO and OvA strategies on the same training set (all 10 classes under SOCs) to test their performance on the SAR ATR problem. It is noted in Figure 11 that the OvO strategy appears to be outperforming the OvA strategy marginally in the SAR ATR problem. The marginal differences in recognition accuracy lie in the unbalance of the training set, where the OvA strategy employs the positive sample classes that are much less in quantity than the negative sample classes. In fact, the advantage of the OvA strategy is that it is less in computation and time complexity, where the OvO constructs 45 classifiers and the OvA constructs 10 classifiers for a 10-class problem, respectively. Since the aim of this paper is to provide a SAR ATR scheme with high recognition accuracy, OvO is employed as the default strategy for solving the multiclass problem.

Feature Extraction.
In this section, we compare the performance of feature extraction using the wavelet 192 12 Computational Intelligence and Neuroscience  (three-level wavelet decomposition) and the wavelet 768 (two-level wavelet decomposition). All 10 classes captured under SOCs are employed for both training and testing. As illustrated in Figure 12, these two curves coupled with each other. The wavelet 192 outperforms the wavelet 768 when feature vectors possess lower dimensions. However, this situation changes as the dimension of feature vectors grows to 40. Moreover, the best classification result (97.46%) is achieved by the use of wavelet 768 with feature dimension of 70. This is because the wavelet 768 provides more features for discrimination and the proposed ATR scheme constructs and combines several discriminative tree classifiers that make the most of the discriminative information inherent to the extracted features.

Pose Estimation.
To test the performance of the proposed pose estimation method, the estimation results of the proposed methods are compared to the ground truth of target poses (the azimuth information provided in the MSTAR database). The correctness of the estimation results is evaluated with the so-called mean absolute difference (MAD), which is calculated as Err = 1/ ∑ =1 | ( ) − ( )|. The MAD reveals the actual estimation error about the ground truth in comparison to the mean error (ME), since it prevents the offset of the positive and negative errors. Moreover, the performance of the proposed method is evaluated and compared with several widely cited methods, such as the least square method (LSM) based estimation, the Hough transform (HT) based method, the MBR based method, and the Radon transform (RT) based method. All 10 targets captured with different depression angles and target poses are involved to test the robustness of the proposed method over depression angle variations. The evaluation results for the 10 targets at depression angles 17 ∘ and 15 ∘ are listed in Tables 4 and 5, respectively. All serial number variants of BMP2 and T72 in MSTAR dataset are involved to test the robustness over variation in serial numbers. The evaluation results of the data captured at depression angles 17 ∘ and 15 ∘ are listed in Tables 6 and 7, respectively. It is noted that the MBR based method has achieved much lower estimation error in comparison to other methods. However, the performance of the MBR based method has estimation error higher than 10 ∘ in several tests. More specifically, the MBR based method achieves the highest estimation error 15.32 ∘ for the BRDM2 captured at depression angle of 15 ∘ . In comparison with these methods, the proposed method achieves the lowest estimation error in all tests (lower than 10 ∘ ).
Furthermore, the average estimation error of the above tests is illustrated as bar figure in Figures 13 and 14, such that a much more distinct comparison can be observed. Similarly, the proposed method is compared to the least square method (LSM) based estimation, the HT based method, the MBR    based method, and the RT based method. It is noted that, for most of these methods, higher estimation error is achieved in the serial number variation test. Compared with these methods, the proposed method achieves robust and accurate estimation results in both tests. Specifically, the proposed method achieves the lowest average estimation error in all tests (lower than 8 ∘ ).

Pose Rectification.
The various target poses introduce great variations into the SAR images. It has been experimentally proven in several researches that rotating images in certain directions or introducing rotationally invariant features is beneficial for improving classification accuracy [2,3]. To this end, we rotate the image according to the target poses in the SAR images, which is named as pose rectification.
In this section, we test the performance of the proposed scheme with or without pose rectification using the same training and testing set (all 10 classes under SOC). The SAR images are rotated anticlockwise according to their poses. As can be seen from Figure 15, the classification with pose rectification universally outperforms the classification without pose rectification. It is also noted that the best classification accuracy (99.3%) is achieved by the wavelet 768 with feature dimension of 75. These results meet our expectation that the classification can benefit from eliminating the pose variations in SAR images. Specifically, the rectification of poses provides target images for classification with fewer variations.

Outlier Rejection Performance.
To evaluate the outlier rejection performance of the proposed method, a varying threshold for the log-likelihood test, which is introduced in Section 3.2, was incorporated to provide the ROC curve. BTR70, BMP2, and T72 listed in Table 1  with 1168 image chips, that is, 210 chips captured at 15 ∘ , 298 chips captured at 16 ∘ , 386 chips captured at 17 ∘ , and 274 chips captured at 29 ∘ . As shown in Figure 16, at the probability of detection = 90%, the probability of false alarm for feature dimension of 75 is fa = 2.34%. It is clear that the proposed method is robust at rejecting confuser targets.

ATR Performance Comparisons.
The effectiveness of the proposed ATR scheme is tested in this section. Several widely cited methods are involved for performance comparison, for example, the Extended Maximum Average Correlation Height Filter (EMACH) [53], the Support Vector Machine (SVM) classifier with Gaussian kernel [52], feature fusion via AdaBoost with neural networks as the base classifiers [2], and the Iterative Graph Thickening (IGT) approach [24]. The best result obtained from the proposed method is used to compare with other methods. Table 8 lists the performance comparison of the mentioned methods under SOCs. It is noted that the proposed method has achieved significant improvements in classification accuracy in comparison with other methods. A majority of the classes are correctly classified with 100% accuracy and the rest have cc higher than 98%, which is also much higher than other methods. Moreover, the superiority of the proposed method is strengthened by the fact that the average cc (99.3%) is much higher than the second highest average cc (84.8%).
Four distinct target classes are involved in the following test: EOC-1, including 2S1, BRDM2, T72, and ZSU234, as listed in Table 2. All of these four classes are involved in training and testing stages. The only difference is that the training and testing set are captured under depression angles 15 ∘ and 30 ∘ , respectively. The increase in depression angle variations introduces a bigger challenge to the classification problem. It is noted in Table 9 that the classification accuracy of the most of the mentioned methods is lower than 88% under EOC-1, where the superiority of the proposed method (higher than 96%) is obvious. Furthermore, the average classification accuracy of the proposed method is 97.5% which is much higher than the other listed methods.
The training dataset under EOC-2 is composed of four different target classes, BMP2, BRDM2, BTR70, and T72, as summarized in Table 2. This test aims at testing the performance of the SAR ATR algorithms with significant different in serial numbers and configurations. The testing set has only the T72 family with five different serial numbers and the training set is composed of all these four mentioned classes. In addition, the training set was obtained at 17 ∘ while the testing set was obtained at depression angles of both 15 ∘ and 17 ∘ as shown in Table 3. Table 10 lists the performance comparison of the mentioned methods under EOC-2. The improvement in classification accuracy is substantial since the average cc of the proposed method is 96.9% which is much higher than the second highest 84.8%.

Performance Comparison of Variations in Target Poses.
As analysed in Section 5.4, the involvement of pose rectification is beneficial for improving the classification performance. In fact, a single target will exhibit different appearances when it is captured under various poses. In this section, we conduct an experiment to test the influence of the appearance differences introduced by the pose variations. The experimental database is almost the same as the data listed in Table 1 except that only one single serial number of each target is involved for training and testing (C21 for BMP2 and S7 for T72). The images for training are selected from the training database with different sample steps of target poses (varied from 1 ∘ to 7 ∘ ), where 51 images are selected for the training of each target. For example, the poses for Step 1 are 1 ∘ , 2 ∘ , . . . , 51 ∘ , the poses for Step 2 are 1 ∘ , 3 ∘ , . . . , 101 ∘ , and the poses for Step 7 are 1 ∘ , 8 ∘ , . . . , 351 ∘ . Additionally, we have also investigated the possibility of training classifiers using training databases with different sizes, for example, Table 8: Confusion matrix of EMACH, method proposed in [52], method proposed in [2], IGT, and the proposed method tested under SOCs ( cc (%)).
It is noted in Figure 17 that as the incremental step of poses increases, the achieved classification accuracy grows too. More specifically, a much higher cc of 90.3% is achieved by employing 51 training images with incremental Step 7 in pose, in comparison with a cc of 42.9% achieved by employing 85 training images with incremental Step 1 in pose. The principle behind this observation is that the training datasets formed with small pose variation steps can provide less target signal information and thus, their content is not sufficient enough to cover the different appearances of the targets captured under various poses. In contrast, a much more complete training dataset can be formed when the involved images are captured with larger pose variations. The experimental results in Figure 17 show that the best classification performance 16 Computational Intelligence and Neuroscience 90.3% is achieved when training with 51 images captured using incremental Step 7 and 93.0% is achieved when training with 85 images captured using incremental Step 4. Moreover, it is quite straightforward to find that better classification performance is always achieved when training with relatively larger number of training images. It is worth pointing out that only a small number of images are involved in the training stage rather than several hundreds of them as used in the previous tests. This is a promising result which implies that a good classification result can be achieved even with much less number of training images, as long as they are captured with appropriate incremental step.

Conclusion
In this paper, we presented a systematic scheme for the SAR ATR task. The proposed scheme involves three main stages: preprocessing, feature extraction and processing, and classifier construction. The effectiveness of involving several preprocessing approaches (e.g., the image energy normalization and the pose rectification processes) is analysed and empirically verified. The results suggest that the involvement of these preprocessing steps is beneficial for improving the classification accuracy. Moreover, we proposed to expand the feature set to provide more information for discrimination and then eliminate the redundancy and dimensionality of the extended feature set to form a more compact and efficient feature set. Finally, the discriminative trees are learned as the base classifiers and combined to construct a strong classifier by using the Real-AdaBoost algorithm. The proposed method is evaluated with the MSTAR dataset under various operating conditions. Experimental results demonstrate that the proposed method outperforms traditional methods, for example, EMACH, SVM, NN, and IGT. The advantages of the proposed method give credit to the reduction of variations in target images, the improvement of feature efficiency, the elimination of redundancy in feature sets, and the excellent generalization capability of the combined strong classifier. Moreover, we have tested the classification performance of the classifiers trained with different combinations of target poses. Experimental results show that a classifier trained with training images covering large variations of target poses can produce good classification result even with limited number of training images.

Conflicts of Interest
The authors declare that they have no conflicts of interest.