Robust SAR Automatic Target Recognition Based on Transferred MS-CNN with L2-Regularization

Though Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) via Convolutional Neural Networks (CNNs) has made huge progress with deep learning, some key issues remain unsolved due to the lack of sufficient samples and robust models. In this paper, we propose an efficient transferred Max-Slice CNN (MS-CNN) with L2-regularization for SAR ATR, which can enrich the features and recognize targets with superior performance. Firstly, a data amplification method is presented to reduce computational time and enrich the raw features of SAR targets. Secondly, the proposed MS-CNN framework with L2-regularization is trained to extract robust features; the L2-regularization is incorporated to avoid overfitting and to further optimize the proposed model. Thirdly, transfer learning is introduced to enhance feature representation and discrimination, which boosts the performance and robustness of the proposed model on small samples. Finally, various activation functions and dropout strategies are evaluated to further improve recognition performance. Extensive experiments demonstrate that the proposed method not only outperforms other state-of-the-art methods on the public and extended MSTAR dataset but also obtains good performance on random small datasets.


Introduction
SAR ATR is widely used in fields such as urban monitoring, natural environment survey, and military target reconnaissance [1,2]; it can acquire earth observation images under severely adverse weather conditions and uncover hidden and camouflaged targets effectively [3,4]. Compared with other microwave detection tools, the distinctive characteristics derived from SAR images, owing to the coherent imaging system and electromagnetic scattering mechanism, can outperform other sensors such as optical and infrared methods. Nowadays, SAR ATR is an essential technique in remote sensing applications, though it has long been restricted by imaging quality and the state of image classification. SAR target classification methods can be categorized into traditional methods and deep learning methods. Generally, traditional methods aim to extract discriminative and representative features from the training samples. Traditional feature extraction methods, such as Histogram of Oriented Gradients (HOG) [5], Local Binary Pattern (LBP) [6], Principal Component Analysis (PCA) [7], and Scale Invariant Feature Transform (SIFT) [8], have been applied to the SAR target classification task. Song et al. [9] designed a novel gradient HOG-like feature-based SAR ATR method to tackle complex application environments. Li et al. [10] proposed a HOG descriptor-based method to match features between SAR images and optical images. Ghannadi and Saadatseresht [6] proposed a modified LBP descriptor to obtain robust features for SAR ATR. Wang et al. [11] presented an improved SAR interferogram denoising method based on PCA to improve the accuracy of phase unwrapping. Xiang et al. [12] combined an adaptive sampling method with SAR-SIFT to handle images with obvious scale differences. However, the presence of speckle noise and the lack of robust features have seriously degraded the feature robustness of SAR images.
In recent years, deep learning methods have been proposed to extract more robust features. Rumelhart et al. [13] proposed the Back Propagation (BP) algorithm for the Multilayer Perceptron (MLP) to effectively solve nonlinear classification problems. LeCun et al. [14] presented the LeNet structure to improve classification performance. In 2006, Hinton et al. [15] proposed a self-learning method to overcome gradient disappearance in deep network training. Zhang et al. [16] proposed a SAR-CNN model with batch normalization to estimate the speckled image and improve its performance. Zhen et al. [17] integrated effective image preprocessing and CNNs for SAR target classification. Shahzad et al. [18] used a CNN to generate high-quality visible images from SAR images and yielded good results. Hughes et al. [19] used a pseudosiamese CNN to identify corresponding patches in high-resolution optical and SAR remote sensing imagery. Though the above methods achieve relatively good performance, two key problems remain unsolved in SAR ATR [17,20]. The first challenging problem is effective model design, which is mainly impacted by objective and cost function design, super-high-dimensional parameter optimization, and so on. The second challenging problem is model generalization, such as overfitting caused by insufficient samples and generalization ability on unknown targets.
In this paper, a transferred MS-CNN with L2-regularization is proposed to tackle the aforementioned challenging problems. Firstly, joint ROI feature extraction and data amplification methods are adopted to preprocess the SAR regions of interest and enhance the richness of the raw training samples, respectively, which locates the interesting image regions accurately and effectively scales up the dataset. Secondly, a novel transferred MS-CNN framework with L2-regularization is proposed to extract robust features and address the overfitting challenge. Thirdly, transfer learning is employed to improve feature discrimination and evaluate the model robustness. Then, a dropout strategy is utilized to address the redundancy of the features extracted by the network. Finally, experiments conducted on the original and extended MSTAR dataset indicate that the proposed method achieves excellent performance. The contributions of this paper can be summarized as follows: (1) Joint ROI and data amplification: ROI extraction and data amplification methods are presented to suppress noise and enrich the number of raw training samples. (2) Max-Slice CNN model: the presented MS-CNN model can not only extract robust features but also recognize targets correctly on the MSTAR database and yield satisfactory performance on small samples. (3) L2-regularization: an L2-regularization term is incorporated to avoid overfitting and optimize the trained model, boosting accuracy by 8.53% compared with L1-regularization. (4) Transfer learning strategy: transfer learning is employed to improve robustness under small samples and outperforms other state-of-the-art methods, greatly increasing feature generalization, representation, and discrimination. The rest of the paper is organized as follows: Section 2 introduces the related work on SAR ATR. Section 3 describes the proposed method. Section 4 presents the transfer learning strategy. Section 5 details the conducted experiments. Section 6 draws the conclusions.

Related Work

2.1. Convolutional Neural Networks.
A convolutional neural network is a feed-forward neural network [21] that realizes the connections between network layers through convolution operations [22]; it incorporates Convolutional (Conv) layers, Rectified Linear Unit (ReLU) layers, Pooling layers, Fully Connected (FC) layers, and so on. The Conv layers apply linear filters followed by activation functions, such as Randomized Parameterized ReLU, Exponential Linear Units (ELU), Scaled Exponential Linear Units (SELU), Hyperbolic Tangent (Tanh), and so on. The filter weights are shared across receptive fields in the convolutions. The activation layer is adopted to increase the nonlinearity of the network without affecting the receptive fields of the Conv layers. The pooling layer is applied to the nonlinear activation maps, discarding useless or redundant information in the feature maps. The fully connected layer maps the extracted visual features to the desired outputs, generating values that represent the class probabilities.
Thus, CNNs can extract low-level features from images in their early layers, which provided the justification behind the development of other improved CNNs.
GoogleNet achieved a significant recognition effect on large-scale visual recognition [23] by using maximum and average pooling, random inactivation (dropout), and a softmax classifier. The residual module [24] was proposed to enhance feature learning through skip connections and to prevent gradient dissipation. The inception module [22] extracts multiscale information from an image by convolution operations in different branches to widen the network. In addition, the Inception-v4 [25] structure, formed by introducing skip connections into inception, can greatly accelerate training and improve network performance. The pyramid model [26], composed of bottom-up and top-down repetitive processing with intermediate supervision, was also proposed to improve performance by processing and integrating features in a multiscale architecture. Liu et al. [27] analyzed the performance of GoogleNet in SAR ATR and achieved good results. Wang et al. [28] used very deep convolutional networks (VGG) for ship classification in SAR images to solve the training bottleneck caused by a small dataset. Fu et al. [29] used learning optimization in ResNet for SAR ATR to solve the feature extraction challenge. CNNs rely on training samples, and insufficient training data results in overfitting.

2.2. Regularization. Due to overfitting caused by insufficient samples and the large scale of parameters, regularization methods have become a significant strategy in deep learning to improve model generalization. Regularization techniques, which discourage model complexity, can be categorized as L1- and L2-regularization. Their difference lies in the parameter restriction term. L1-regularization, also referred to as the L1 norm or Lasso, helps shrink parameters to zero and performs feature selection by assigning insignificant input features zero weight and useful features nonzero weight. L2-regularization adds the sum of squares of all feature weights to the cost, penalizing large weights. Regularization makes a model less prone to overfitting and is widely used in deep learning models.
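The distinction between the two penalty terms can be sketched as follows; the function names and the weight-decay coefficient `lam` are illustrative, not part of the paper:

```python
import numpy as np

def l1_penalty(w, lam):
    # L1 (Lasso) term: lam * sum(|w_i|); drives insignificant weights to exactly zero
    return lam * np.abs(w).sum()

def l2_penalty(w, lam):
    # L2 (ridge) term: (lam / 2) * sum(w_i^2); shrinks all weights smoothly toward zero
    return 0.5 * lam * np.square(w).sum()
```

Either penalty is simply added to the training loss; the L1 term's constant gradient pushes small weights to zero, while the L2 term's gradient shrinks each weight proportionally to its magnitude.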
Bi et al. [30] used an L1-regularization-based SAR imaging and CFAR detection method to efficiently improve SAR imaging performance, including suppressing sidelobes and clutter. Rambour et al. [31] introduced spatial regularization into SAR tomography reconstruction to serve ground analysis. Wagner et al. [32] proposed a deep learning SAR ATR system using regularization and prioritized classes to improve the convergence properties. Meng et al. [33] adopted an adaptive pseudo-p-norm-regularization-based SAR image despeckling method to provide a high-quality interpretation of SAR data. Ni et al. [34] presented L1/L2-regularization SAR imaging via complex image data to better reconstruct the image for the target detection task. Kang and Kim [35] used improved L2-regularization for compressive sensing to enhance performance. Although regularization is a good choice to avoid model overfitting in SAR ATR, feature extraction and feature discrimination remain challenging issues for accurately distinguishing targets.

2.3. Transfer Learning. Transfer learning [36] is an important research topic in machine learning. The goal of transfer learning is to transfer the knowledge learned in one domain to different but related ones, reusing shared information from the source domain in the target domain. The transfer learning method can effectively use existing labeled data to assist the classification of similar datasets and improve the target recognition rate of SAR images. Therefore, it can effectively alleviate the interference of inherent factors and the shortage of tagged training samples, providing an effective path to improve target recognition performance.
Wang et al. [37] adopted transfer learning for SAR target detection based on SSD with data augmentation and obtained better performance than other methods. Zhong et al. [38] presented a simple and feasible approach by using transfer learning and achieved a good performance. Xu et al. [39] proposed a differentiated adaptive regularized transfer learning framework for SAR ship classification to overcome the limitation under insufficient labeled training samples. Al Mufti et al. [40] employed a pretrained AlexNet to train a multiclass SVM classifier.
Thus, transfer learning is becoming a popular approach to solving small-sample problems.

Proposed Method
In our work, the transferred MS-CNN method is proposed to develop feature refinement from the initial SAR image to the final classification map. The structure incorporates a training stage and a testing stage. During the training stage, the input of the transferred MS-CNN is the enriched samples augmented by ROI extraction and data amplification, and the output is the corresponding predicted label; a softmax classifier with L2-regularization is used as the loss function to optimize the network. During the testing stage, images and labels are input into the network, and the aim is to extract features with the learned model and predict the recognized classes. The framework is shown in Figure 1.
The convolution operation in MS-CNN is y_s = f(∑_k x_k ⊗ W_{ks} + b_s), where x_k and y_s denote the k-th input and s-th output feature maps, respectively, W_{ks} is the convolutional filter connecting the k-th input feature map and the s-th output map, ⊗ denotes the convolution operation, and b_s is the bias of the s-th output map. For learning various regional features, the weights in each layer are locally shared. In addition, max-pooling is given by y_s(i, j) = max_{(u,v) ∈ Ω_{ij}} x_s(u, v), where each value in the s-th output map y_s pools over a p × p non-overlapping region Ω_{ij} in the s-th input map x_s. In the training stage, images input into the network are processed layer-wise to obtain the representative features and make the data more intuitive. As shown in Table 1, data visualization is employed using this method.
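A minimal numpy sketch of the two operations above (cross-correlation for a single input/output map pair, and non-overlapping max-pooling); the helper names are illustrative:

```python
import numpy as np

def conv2d_single(x, w, b):
    # Valid 2-D correlation of one input map x with filter w, plus bias b
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

def max_pool(x, p):
    # Max over p x p non-overlapping regions, matching the pooling equation
    H, W = x.shape
    return x[:H - H % p, :W - W % p].reshape(H // p, p, W // p, p).max(axis=(1, 3))
```

In the full layer, the outputs of `conv2d_single` over all input maps k are summed before the activation f is applied, as in the convolution equation.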

Data Preprocessing and Amplification.
Background noise, especially speckle noise, has a negative effect on the classification task. ROI extraction from the SAR targets, via resizing the input samples, contributes to reducing the influence of irrelevant background noise and improving the training time and convergence speed. The ROI in a SAR image lies in the central region of the whole picture, and the ROI algorithm below is employed to obtain the region of interest. The centroid of the image is taken as the center to locate the target: i_c = m_{10} / m_{00}, j_c = m_{01} / m_{00}, where (i_c, j_c) are the centroid coordinates of the SAR target image and m_{10} and m_{01} are the first-order origin moments. The origin moments are defined as m_{ij} = ∑_x ∑_y x^i y^j p(x, y), where m_{ij} denotes the (i + j)-order origin moment, (x, y) denotes the pixel coordinates of the image, and p(x, y) denotes the pixel value.
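The centroid computation from the image moments can be sketched directly; the function names are illustrative:

```python
import numpy as np

def image_moment(p, i, j):
    # (i + j)-order origin moment: m_ij = sum_x sum_y x^i * y^j * p(x, y)
    xs, ys = np.mgrid[0:p.shape[0], 0:p.shape[1]]
    return np.sum((xs ** i) * (ys ** j) * p)

def centroid(p):
    # Centroid (i_c, j_c) = (m10 / m00, m01 / m00), used to locate the target
    m00 = image_moment(p, 0, 0)
    return image_moment(p, 1, 0) / m00, image_moment(p, 0, 1) / m00
```

A fixed-size ROI window can then be cropped around the returned centroid before the chip is resized for the network.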
Feature extraction and classifier discrimination are also affected by the limitation of training samples, and the adequacy and confidence of the feature information determine the classification performance as well. In this regard, a data amplification method that rotates the target image through 360 degrees is proposed to address the challenge of insufficient training samples. This method generates additional SAR target images and can not only remove background noise from the images and locate the region of interest to process but also offset the disadvantages caused by insufficient samples, avoiding overfitting as well as gradient explosion. The data processing and amplification procedure is depicted in Figure 2.
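A hypothetical sketch of rotation-based amplification; for simplicity it uses only 90-degree steps (no interpolation library needed), whereas the paper rotates through 360 degrees at finer angles:

```python
import numpy as np

def amplify_by_rotation(img, n_steps=4):
    # Generate rotated copies of a target chip to enlarge the training set.
    # np.rot90 rotates counterclockwise by 90 degrees per step.
    return [np.rot90(img, k) for k in range(n_steps)]
```

With an interpolating rotation routine, the same loop over finer angle increments would yield the full 360-degree amplification described above.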

MS Block for Feature Refinement.
The MS-block is used to refine the features extracted by the convolution layers. The role of the slice layer is to decompose the bottom blob into multiple top blobs as needed. Considering an input feature map of dimensions N × 2 × H × W, the axis is set to 1, that is, the dimension along which the feature map is decomposed. After the slice operation, the dimensions of the two output feature maps are N × 1 × H × W each. The aim of the designed eltwise layer is to obtain refined feature maps with the same dimensions as the output of the convolution layers by element-wise operation. Three basic operations, SUM, PROD, and MAX, are available for its realization. In this paper, the MAX operation compares the output feature maps element-wise to obtain the refined feature maps. The operation of the MS-block is described in Figure 3. Firstly, the scale space of the MS-block representation is obtained by smoothing convolution with different Gaussian kernels and shares the same resolution on all scales. Secondly, due to the redundant information produced by the block, max pooling is applied to reduce the redundancy and increase efficiency. Thirdly, the advantage of this feature representation is that the local features of the image can be described at different scales in a simple form, with an abundant theoretical basis for analyzing the local features of the image.

L2-Regularization-Based Classifier.
After the MS-CNN is amply trained, an L2-regularization-based classifier is employed to recognize the SAR images. Consider N input variables represented by the vector V. The prediction can be illustrated as ŷ = f(W^T V + b), where W contains the weights of the above layers, T is the transposition operation, and b is the bias of the output map.

Figure 1: Structure of the transferred MS-CNN for SAR target recognition (target classes: 2S1, BMP2, BRDM_2, BTR60, BTR70, D7, T62, T72, ZIL131, ZSU_23/4). Firstly, raw SAR images and transformed SAR images are fed into the MS-CNN to extract information, and the extracted features are preprocessed by a normalization operation. Then, the learned features are passed to the Max-Slice block; the obtained feature maps are scaled to different sizes and aggregated, and the processed feature maps are merged and associated with each specified size. Thirdly, various filters are utilized to obtain the feature information, and max-pooling is used to enforce the robustness of the features. The fully connected high-level feature layer and softmax layer predict the recognized classes. Finally, parameters from outside datasets are transferred to the target classification.
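The softmax prediction and its L2-penalized cross-entropy loss can be sketched as follows; the function names and the coefficient `lam` are illustrative:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class logits
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(V, W, b):
    # Linear prediction W^T V + b followed by softmax over the classes
    return softmax(W.T @ V + b)

def loss(V, W, b, y, lam):
    # Cross-entropy of the true class y plus the L2 term (lam / 2) * ||W||^2
    p = predict(V, W, b)
    return -np.log(p[y]) + 0.5 * lam * np.sum(W ** 2)
```

Minimizing this loss trades classification accuracy against weight magnitude, which is what discourages overfitting in the trained classifier.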
Let d denote the index of a class of submodels defined by an element-wise binary mask vector, each predicting p(y | v, d). The integrated predictor is defined as the geometric average of the predictions of all members, renormalized:

p̃_ensemble(y | v) = p_ensemble(y | v) / ∑_{y′} p_ensemble(y′ | v), with p_ensemble(y | v) = (∏_d p(y | v, d))^{1/2^N},

where N is the number of masked units. The formula is normalized by simplifying the operation and ignoring multiplicative terms that are constant with respect to y. In order to reduce the classification error, the network parameters are regularized during the training stage. The chosen regularization adds weight decay to the training criterion, as in ridge regression. The influence of regularization is shown in Figure 4. L2-regularization is a good choice for the network. It is also known as ridge regression or Tikhonov regularization, and the objective function can be defined as

J̃(w; X, y) = J(w; X, y) + (λ/2) wᵀw,

where wᵀw is the regularization term, λ is the weight decay coefficient, and J(·) is the target function. We assume there are no bias parameters, and the corresponding parameter gradient is λw + ∇_w J(w; X, y).
To take a single gradient step to update the weights, we perform the update w ← w − ε(λw + ∇_w J(w; X, y)). This regularization strategy drives the weights closer to the origin by adding the regularization term Ω(θ) = (1/2)‖w‖₂² to the objective function. The addition of the weight decay term modifies the learning rule to multiplicatively shrink the weight vector by a constant factor at each step, just before performing the usual gradient update.
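The weight-decay update above is one line of code; the function name is illustrative:

```python
import numpy as np

def l2_update(w, grad, lr, lam):
    # One gradient step with weight decay:
    # w <- w - lr * (lam * w + grad),
    # i.e. shrink w multiplicatively by (1 - lr * lam) before the usual step.
    return w - lr * (lam * w + grad)
```

With `grad` equal to the task-loss gradient, setting `lam = 0` recovers plain gradient descent, making the shrinkage effect of the decay term easy to isolate.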

Dropout.
Overfitting is a common problem in machine learning. To further address it, one usually adopts ensembling, training multiple models and combining their advantages; the problem is that such ensembles are time consuming to train and test. The dropout strategy can effectively alleviate overfitting and achieve regularization, and it is commonly used as a trick for training deep networks. In this paper, dropout is adopted after the final MS-block during network training. The contributions of dropout are as follows. The first is the averaging effect: the strategy effectively prevents overfitting, since randomly deleting half of the hidden neurons yields different network structures, and the whole dropout process is equivalent to averaging over a host of different neural networks. The second is reducing complex co-adaptation relationships between neurons in the network: the updating of weights no longer depends on the joint action of hidden nodes with fixed relationships, which prevents some features from being effective only in the presence of other specific features and forces the network to learn more robust features that also exist in random subsets of other neurons.
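A minimal sketch of (inverted) dropout, the standard formulation in which activations are rescaled at training time so that no change is needed at test time; the function name is illustrative:

```python
import numpy as np

def dropout(x, rate, rng, train=True):
    # Zero each unit with probability `rate` during training, then rescale by
    # 1 / (1 - rate) so the expected activation matches the test-time value.
    if not train or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```

Each training pass samples a fresh mask, so the network effectively averages over the exponentially many sub-networks described above.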

Transfer Learning
Transfer learning is the ability of a system to recognize and apply knowledge and skills learned in previous domains/tasks to novel tasks/domains that share some commonality. Given a source domain and source learning task, and a target domain and target learning task, transfer learning aims to help improve the learning of the target predictive function f_T(·) using the source knowledge, where D_S ≠ D_T or T_S ≠ T_T. A domain consists of two components: a feature space X = {x_1, x_2, . . ., x_n} and a marginal distribution P(X). Given a specific domain and a label space Y = {y_1, . . ., y_n}, the task is, for each x_i in the domain, to predict its corresponding label y_i, where y_i ∈ Y. In general, if two tasks are different, they may have different label spaces or different conditional distributions P(Y | X). Specifically, for the SAR ATR task, the domain tasks share the same feature space F = {f_1, . . ., f_n} with n dimensions, while the marginal probability distributions differ due to the different classification tasks. In this paper, the transfer learning configurations are implemented as in Figure 5. The transfer learning algorithm is as follows: for the input, given the source dataset D_S and target dataset D_T, set the initialized MS-CNN model M_Pre = f(x, θ_Pre). Firstly, fine-tune the MS-CNN model M_0 based on M_Pre using D_S to get a well-pretrained MS-CNN model. Then, transfer the shallow layers' parameters by freezing the learned layers, and retrain the hyperparameters of the MS-CNN on D_T until the model converges to the optimal solution. For example, when transferring the parameters up to conv4, the layers before and including conv4 keep the pretrained parameters, and the parameters of the remaining layers are trained from scratch. Finally, a small learning rate is used to further fine-tune the model slightly, making it more suitable for the SAR ATR task.
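A framework-agnostic sketch of the transfer step, assuming the model is represented as an ordered dict of layer parameters; the names (`transfer`, `freeze_upto`, the layer keys) are hypothetical:

```python
import numpy as np

def transfer(pretrained, freeze_upto):
    # Copy parameters learned on the source dataset and freeze every layer
    # up to and including `freeze_upto` (e.g. "conv4"); the remaining layers
    # are re-initialized and trained from scratch on the target data.
    layers = list(pretrained)  # ordered layer names (Python 3.7+ dicts)
    cut = layers.index(freeze_upto) + 1
    model = {}
    for name in layers[:cut]:
        model[name] = {"w": pretrained[name].copy(), "trainable": False}
    for name in layers[cut:]:
        model[name] = {"w": np.zeros_like(pretrained[name]), "trainable": True}
    return model
```

In a real framework the same effect is achieved by loading the pretrained checkpoint and setting the learning rate of the frozen layers to zero before fine-tuning on D_T.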

Experiments
All the experiments are conducted on a deep learning acceleration computing service with an Intel Core i3-7350K CPU, on an Ubuntu 16.04 LTS operating system. The graphics card is an NVIDIA GTX 1080 Ti, and the RAM is 8 GB. All the proposed convolutional neural network models are implemented using the publicly available Caffe framework. The total number of samples was 698 images. The optical images and SAR images are shown in Figure 6. The dataset configuration is listed in Tables 2 and 3.

Training/Validation Methodology.
In the training process, 50 epochs are set to fine-tune the proposed models, and information regarding the loss values and output classification results is recorded. As shown in Figure 7, the accuracy of training and validation reaches 100%, and the predicted loss value converges to zero quickly at an early stage. To evaluate the proposed model, we visualize the network layer by layer, as shown in Figure 8. Observing the feature maps from Conv1, the model learns some fundamental features from the input, such as edges and corners in all directions. The deeper the layers, the more complex and richer the learned features, including outlines, background, and higher-level semantic information. Features in the later layers are more discriminative, containing the corresponding label information; thus, the learned features are more targeted toward the belonging class. Different numbers of MS-blocks are then validated on the SAR image database, and the accuracy of network feature extraction under this variable is tested. The confusion matrix of the test dataset is shown in Figure 9. It is apparent that the network does not work efficiently when the Max-slice number is six, as overfitting is caused by inadequate feature selection. From eight MS-blocks to twelve MS-blocks, the performance improves gradually and reaches an excellent result at ten MS-blocks.

Evaluation of Data Amplification. Ten MS-blocks are selected in the proposed network for further performance evaluation. We evaluated the chosen network both with and without the data processing operations and noticed that the adopted algorithm serves the model quite well. As shown in Table 4, the results without ROI extraction and without data amplification are 74.51% and 80.66%, respectively.

Evaluation of Various Activation Functions.
Meanwhile, motivated by the fact that the activation function contributes to the understanding of the obtained features and boosts classification performance, an ablation study on MS-CNN is also conducted. Traditional activation functions, such as ReLU, TanH, and Sigmoid, do not yield the top accuracy as expected. The Power function performs poorly because its logistic operation undermines the amount of output features. ELU is a common choice; unlike ReLU, which zeroes negative values, the average activation of ELU units can be close to zero, similar to the effect of batch normalization but with lower computational complexity and soft saturation, and the accuracy of ELU is 95.93%. The model with Sigmoid activation achieves 96.70% accuracy. The results are shown in Figure 10.
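The ELU behavior described above (identity for positive inputs, soft saturation toward -alpha for negative inputs, keeping mean activations near zero) can be sketched as:

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha * (exp(x) - 1) for x <= 0;
    # negative inputs saturate softly at -alpha instead of being hard-zeroed.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

Compared with ReLU, the nonzero negative branch lets gradients flow for negative inputs and pulls the unit's mean activation toward zero.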

Evaluation on Dropout and Regularization.
To get insight into the strategy, a comparative study was conducted, as shown in Table 5. The accuracy with the dropout strategy is 98.93%, a boost of 1.64% compared with the model without dropout. As indicated in Table 6, L2-regularization has the advantage of enhancing the performance by reducing redundancy via adjusting the weight decay. The L1-regularization method achieves 90.40% accuracy, 8.53% lower than with L2-regularization. This demonstrates that L2-regularization obtains excellent accuracy compared with the other options, especially compared with the method without any regularization, which causes overfitting during the training stage.

Generalization Evaluation. A good model shares good generalization ability on an unknown dataset. The intuitive manifestation of this metric is the fitting and prediction performance on the unknown dataset. In this regard, we selected the optimized MS-CNN as the pretrained model to learn from the various-class SAR image classifications. Specifically, we fine-tune the model by keeping all the layers before the last fc layer fixed and setting the number of classes to six, eight, and ten to attain more generic features. Table 7 shows the generalization evaluation performance. The involved classes are randomly selected from the ten-class SAR images. From three classes to ten classes, the accuracies are all satisfactory at 98.97%, 97.57%, 97.26%, and 94.19%, and the performance using the pretrained model surpasses that without it, with improvements of 0.97%, 1.35%, and 2.59% from six classes to ten classes. Intuitively, the model using the pretrained MS-CNN outperforms the one trained from scratch.

Robustness Evaluation
(1) Comparison with the State of the Art on Small Samples. Inspired by the superior performance on the three-target classification task, we carried out experiments with different state-of-the-art networks. We randomly select 1/2, 1/3, 1/4, and 1/8 of the images from the raw training samples and enrich the dataset by the method proposed in Section 3. The aim is to validate the robustness of MS-CNN when the raw images are reduced. As observed in Figure 11, the performance of our proposed method is better than LeNet, AlexNet, and ResNet when applied to SAR ATR. Generally, deep structures are not well suited to simple images characterized by grayscale and small size, while our proposed method achieves results comparable to GoogleNet (VGG16).
(2) Transfer Learning vs. Scratch. As shown in Figure 11, from LeNet to MS-CNN, we notice that our proposed model outperforms some networks, while other results are not satisfactory. We therefore introduce the transfer learning approach. We trained the MS-CNN on the ten-class SAR images and transferred the weights to the three-class SAR image classification. From MS-CNN to MS-CNN (transfer learning) in Figure 11, we notice that the model using transfer learning significantly surpasses the performance of the model trained from scratch. Specifically, it achieves improvements over the current MS-CNN of 6.22%, 2.93%, 3.34%, and 1.61% on the 1/8, 1/4, 1/3, and 1/2 datasets, respectively. This observation validates the effectiveness of transfer learning: the model trained from scratch is less satisfactory than the model using pretrained weights.

Conclusion
In this paper, a novel transferred MS-CNN structure with L2-regularization is proposed to solve the overfitting problem caused by insufficient samples and the computational cost of model design. The data processing pipeline is employed to address the data acquisition limitation and to reduce the computation and redundancy for SAR target recognition. It is demonstrated that combining the ROI extraction and data amplification algorithms has potential advantages for solving the sample problems. The transferred MS-CNN structure refines the extracted features and contributes to SAR ATR. Furthermore, the dropout strategy and regularization term in this model help avoid the overfitting phenomenon. Overall, the performance on the extended MSTAR dataset indicates that our method is discriminative and effective and also shows that our proposed method has good regularization and robustness. Due to computational complexity and insufficient samples, more efficient methods, such as transfer learning and few-shot learning, will be explored in our future work.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.