A Method Combining CNN and ELM for Feature Extraction and Classification of SAR Image



Introduction
Synthetic aperture radar (SAR) is an important means of obtaining information and is widely used in geological survey, topographic mapping, the utilization of marine resources, and so on. Its wide range of practical applications makes it well worth further exploration.
SAR image recognition has long been regarded as a research hotspot in the process of obtaining information, and feature extraction is a key factor in the success of an image target recognition system. Mishra and Mulgrew first applied principal component analysis (PCA) to SAR image classification. Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) database show that the PCA-based classifier outperforms the Bayes classifier based on a Gaussian model when training data are limited [1]. Knee et al. proposed an automatic classification method for SAR images using image partitioning and sparse representation-based feature vector generation [2]. Wang et al. proposed a complementary spatial pyramid coding (CSPC) approach in the framework of spatial pyramid matching, in which both the coding coefficients and the coding residuals are exploited to develop more discriminative and robust features for representing SAR images [3]. Chen et al. proposed deep convolutional networks (DCN) for target classification of SAR images, which achieve an average accuracy of 99% on the ten-class targets of the MSTAR database [4]. Ding et al. proposed a method combining global and local filters (CGLF) for SAR target recognition under the standard operating condition and various extended operating conditions, which can serve as a standard of comparison [5].
A convolutional neural network (CNN) is a class of feedforward neural network that involves convolution computation and has a deep structure. It is one of the representative algorithms of deep learning [6]. CNNs have been widely used in many fields. In the field of recognition in particular, CNNs are used for handwritten digit recognition [7,8], speech recognition [9,10], facial expression recognition [11,12], human face recognition [13,14], refrigerator fruit and vegetable recognition [15], verification code recognition [16], traffic sign classification [17] and recognition [18], and so on. In image recognition, images can be fed directly into the CNN, which reduces the complexity of the experiment. Furthermore, the image information is passed by forward propagation through the convolution and downsampling layers and is processed in the different network layers, which avoids the complex feature extraction process of traditional algorithms. The most basic features of an image can be captured by the neurons in the local sensing domain of the CNN. A CNN also retains a high degree of invariance when extracting complex image features, regardless of shift, scaling, rotation, or other forms of deformation of the image [19]. Cireşan et al. built multicolumn deep neural networks and applied them to the Modified National Institute of Standards and Technology (MNIST) handwritten digit database, achieving excellent recognition performance (an error rate below 0.3%) [20]. Therefore, the CNN is a well-suited image feature extractor.
In recent years, CNN has also been successfully applied to radar image recognition [23]. Cho and Park proposed a CNN architecture using aggregated features and fully connected layers, which achieves an accuracy of 94.38% in recognizing the 10 classes of military targets in the MSTAR dataset [24].
Generally speaking, the last layer of a CNN can be regarded as a linear classifier, but it is not the optimal classifier. The classifier commonly used in its place is the support vector machine (SVM) and its improved algorithms. In 2006, Huang and LeCun combined convolutional networks (CN) with the SVM algorithm to identify targets and obtained a high recognition rate [25]. In 2012, experiments on handwritten digit recognition with the combined algorithm reached an accuracy of up to 99.81% [7]. In reference [26], where the CNN works as a trainable feature extractor and the SVM as a recognizer, a classification accuracy of 98.49% is obtained on the Kennedy Space Center (KSC) dataset and 99.45% on the Pavia University Scene (PU) dataset.
However, compared with the traditional SVM, back propagation (BP), and other classification algorithms, the Extreme Learning Machine (ELM) not only trains quickly and has fewer parameters to adjust but also has a short running time and high training precision [27]. This paper proposes a new SAR image recognition method based on the CNN-ELM algorithm. The method proceeds as follows: first, the ReLU function is used in the CNN instead of the Sigmoid function; second, the image features are extracted; finally, the last layer of the CNN is replaced with an ELM in order to recognize the images. The method is characterized by a high recognition rate and a short running time.

CNN and ELM
2.1. Convolution Neural Network. In recent years, machine learning has become very popular in image recognition because it does not need to change the topological structure of the images. The convolution neural network (CNN) is both a deep learning model [28] and an artificial neural network, and it is mainly used in speech analysis [29] and image recognition [30].
The structure of a traditional CNN model is shown in Figure 1. There are five layers in the CNN model. The input layer is a matrix of the normalized pattern with size S × S. Each feature map is connected to the output of its previous layer; that is, the features obtained by a convolution layer are used as the input of the pooling layer. All the neurons in one feature map share the same kernel and connecting weights (known as weight sharing in [31]). For example, with a kernel size of 5 and a subsampling ratio of 2, each feature map layer reduces the feature size from the previous feature size S to (S − 4)/2.
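The size reduction in the example above can be checked with a few lines of arithmetic. The sketch below assumes a "valid" convolution (no padding, stride 1) followed by non-overlapping subsampling; the function name `feature_map_size` and the 32 × 32 input are illustrative choices, not values taken from the paper.

```python
def feature_map_size(size: int, kernel: int = 5, ratio: int = 2) -> int:
    """Side length after a 'valid' convolution followed by subsampling."""
    conv_size = size - kernel + 1   # valid convolution: S - 4 for a 5 x 5 kernel
    return conv_size // ratio       # subsampling by `ratio`: (S - 4) / 2


# Hypothetical 32 x 32 input traced through two feature map layers.
sizes = [32]
for _ in range(2):
    sizes.append(feature_map_size(sizes[-1]))
print(sizes)  # [32, 14, 5]
```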
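There are three unique structural characteristics in the CNN model: the local sensing domain, weight sharing [31], and downsampling. The local sensing domain means that each neuron of a layer is connected only to the neurons within a certain region (generally a rectangular area of 5 × 5 neurons) of the network input layer. Owing to this structure, each neuron can extract structural features of the input image. Weight sharing can greatly reduce the number of training parameters of the network and the number of training samples.
2.2. Extreme Learning Machine. For $N$ arbitrary samples $(X_j, t_j)$, if there are $L$ hidden layer nodes, the single hidden layer neural network can be expressed as
$$\sum_{i=1}^{L} \beta_i \, g(W_i \cdot X_j + b_i) = o_j, \quad j = 1, 2, \cdots, N,$$
where $g(x)$ is an activation function, $W_i$ is the input weight, $\beta_i$ is the output weight, $b_i$ is the bias of the $i$th hidden node, and $W_i \cdot X_j$ is the inner product of $W_i$ and $X_j$. The learning aim of the single hidden layer neural network is to minimize the error of the output, which can be written as
$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0;$$
that is, there exist $\beta_i$, $W_i$, and $b_i$ such that
$$\sum_{i=1}^{L} \beta_i \, g(W_i \cdot X_j + b_i) = t_j, \quad j = 1, 2, \cdots, N.$$
In matrix form, this can be expressed as
$$H\beta = T,$$
where $H$ is the output matrix of the hidden layer, $\beta$ is the output weight matrix, and $T$ is the target matrix. In order to train the single hidden layer neural network, one seeks $\hat{W}_i$, $\hat{b}_i$, and $\hat{\beta}$ such that
$$\lVert H(\hat{W}_i, \hat{b}_i)\hat{\beta} - T \rVert = \min_{W, b, \beta} \lVert H(W_i, b_i)\beta - T \rVert,$$
where $i = 1, 2, \cdots, L$; this is equivalent to minimizing the loss function
$$E = \sum_{j=1}^{N} \left( \sum_{i=1}^{L} \beta_i \, g(W_i \cdot X_j + b_i) - t_j \right)^2.$$
The ELM algorithm requires no iterative adjustment of parameters. Once the input weights $W_i$ and the hidden layer biases $b_i$are randomly determined, the hidden layer output matrix $H$ and the output weights $\beta$ are uniquely determined.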
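The ELM training procedure described above can be illustrated with a minimal NumPy sketch. It assumes a Sigmoid activation for g and solves for the output weights with the Moore-Penrose pseudoinverse, which is the usual ELM solution but is not spelled out in the text here. The function names (`elm_train`, `elm_predict`), the number of hidden nodes, and the synthetic two-class data are hypothetical and unrelated to the MSTAR experiments.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def elm_train(X, T, n_hidden, rng):
    """Fit an ELM: random hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights W_i
    b = rng.standard_normal(n_hidden)                # random hidden biases b_i
    H = sigmoid(X @ W + b)                           # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                     # least-squares solution of H beta = T
    return W, b, beta


def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta


# Toy two-class problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
labels = np.repeat([0, 1], 50)
T = np.eye(2)[labels]                                # one-hot targets t_j

W, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
train_accuracy = float((pred == labels).mean())
print(train_accuracy)
```

Because W and b are never updated after being drawn, training reduces to a single linear least-squares solve, which is where the short training time of the ELM comes from.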

Recognition of SAR Images Based on Improved CNN-ELM
In the convolution layer of Figure 1, the feature map is convolved with a convolution kernel, and the output map of the convolution layer is obtained by passing the result through the activation function. Convolution layers and downsampling layers alternate, and each output map of a convolution layer is related to its input maps. Generally, the output of the convolution layer is
$$x_j^{n} = f\left( \sum_{i \in M_j} x_i^{n-1} * W_{ij} + \phi_j \right),$$
where $n$ is the index of the convolution layer, $x_j^{n}$ is the $j$th output map of layer $n$, $W_{ij}$ is the convolution kernel, $\phi_j$ is the bias, $M_j$ is the set of input maps, and $f(\cdot)$ is the activation function. The standard CNN activation function is the Sigmoid function, which can be expressed as [32]
$$f(x) = \frac{1}{1 + e^{-x}}.$$
The output range of the Sigmoid function is (0, 1). In the process of adjusting the weights, the change in a weight is proportional to the output of the previous layer; when part of that output tends to zero, the weight adjustment shrinks or stops altogether, which increases the training time. From Figure 2(a), it can be seen that the derivative curve resembles an inverted bowl and approaches zero at both ends, which easily causes the problem of gradient dispersion.
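The saturation of the Sigmoid derivative mentioned above is easy to verify numerically. The following sketch evaluates the derivative at a few sample points chosen for illustration; the helper names are not from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


x = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
grads = sigmoid_grad(x)
print(grads)  # largest at x = 0 (value 0.25), close to 0 for large |x|

# Upper bound on the factor contributed by the activation alone
# when a gradient is propagated back through 5 Sigmoid layers.
print(0.25 ** 5)
```

Since the derivative never exceeds 0.25, each additional Sigmoid layer can only shrink the back-propagated gradient, which is one way to see why deep Sigmoid networks train slowly.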
Therefore, the activation function is replaced with the ReLU function [23], an unsaturated nonlinear function that is easy to differentiate and makes the network sparse. From Figure 2(b), the output of some neurons is 0, which reduces the dependence among the parameters, relieves the overfitting problem, and allows the gradient to be transmitted well to the earlier layers during back propagation. At the same time, it reduces the problems caused by gradient dispersion and speeds up the convergence of the network. The formula of the ReLU function is
$$f(x) = \max(0, x).$$
In this paper, the improved CNN is used to extract the image features, which are then used as the input of the ELM algorithm to obtain the recognition accuracy. A SAR image recognition algorithm based on the improved CNN-ELM is therefore proposed, and the structure of the proposed algorithm is shown in Figure 3.
Compared with optical images, SAR images contain a lot of noise, so they are first preprocessed by denoising, segmentation, and edge detection. The resulting images are shown in Figure 5. The preprocessed SAR images taken at a depression angle of 15° are then used as the training set, and the SAR images taken at a depression angle of 17° are used as the test set. The details are given in Table 1; the total size of the training set is 2747, and the size of the test set is 2426. Next, the improved CNN is used to extract the features of the training set and the test set, and the resulting feature vectors serve as the input of the Extreme Learning Machine (ELM). The recognition accuracy obtained by the ELM classifier is compared with that of other experimental algorithms (the algorithms in the literature [7,25]), and the performance of the proposed algorithm is significantly improved.
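To make the feature-extraction stage concrete, the sketch below runs one convolution + ReLU + downsampling stage in NumPy and flattens the result into a feature vector of the kind that would be passed to the ELM. It is an illustration under stated assumptions, not the network used in this paper: the kernels are random and untrained, mean pooling is assumed for the downsampling step, the convolution is implemented as cross-correlation (as is common in CNN code), and the 32 × 32 input, the six 5 × 5 kernels, and all function names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def mean_pool(fmap, ratio=2):
    """Non-overlapping mean pooling; rows/columns that do not fit are dropped."""
    h, w = fmap.shape[0] // ratio, fmap.shape[1] // ratio
    blocks = fmap[:h * ratio, :w * ratio].reshape(h, ratio, w, ratio)
    return blocks.mean(axis=(1, 3))


def extract_features(image, kernels, biases):
    """One convolution + ReLU + downsampling stage, flattened to a vector."""
    maps = [mean_pool(relu(conv2d_valid(image, k) + b))
            for k, b in zip(kernels, biases)]
    return np.concatenate([m.ravel() for m in maps])


rng = np.random.default_rng(0)
image = rng.random((32, 32))               # stand-in for a preprocessed SAR image
kernels = rng.standard_normal((6, 5, 5))   # 6 untrained 5 x 5 kernels (hypothetical)
biases = np.zeros(6)
features = extract_features(image, kernels, biases)
print(features.shape)  # (1176,)
```

Each 32 × 32 input yields six 14 × 14 maps, hence 6 × 196 = 1176 features; in the proposed scheme a vector of this kind, produced by the trained CNN, replaces the raw image as the ELM input.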

Experimental Result Analysis.
The accuracy is calculated as follows. The 10 kinds of SAR images (2S1, BMP_2, BRDM_2, BTR_60, BTR_70, D_7, T_62, T_72, ZIL_13_1, ZUS_23_4) in the training set and the test set are labeled 1, 2, ..., 10, respectively, as the original labels. The ELM algorithm is then used to predict their labels. Each time a predicted label matches the original label, the count for that label is increased by 1; finally, the count for each label is divided by the total number of samples carrying that label to obtain the accuracy. Feature extraction with the CNN takes about 1.2 seconds and ELM recognition takes about 0.15 seconds, so the total time is about 1.35 seconds.
In order to test the performance of the proposed CNN-ELM algorithm for SAR image recognition, it is compared with principal component analysis (PCA) [1], deep convolutional networks (DCN) [4], sparse representation classification (SRC) [2], complementary spatial pyramid coding (CSPC) [3], combined convolutional nets and support vector machine (CN-SVM) [25], and CNN-SVM [7]. The results of the experiment are shown in Table 2.
Compared with CN-SVM, the accuracy of CNN-ELM is higher by 5.47% for type 2S1, 4.03% for BMP_2, 14.96% for BRDM_2, 2.54% for BTR_60, 8.67% for BTR_70, 1.82% each for D_7, T_72, and ZUS_23_4, 8.79% for T_62, and 6.2% for ZIL_13_1.
Compared with CNN-SVM, the classification accuracy for five kinds of SAR images (BTR_60, BTR_70, T_62, T_72, and ZIL_13_1) is improved. Specifically, the accuracy of CNN-ELM is higher by 0.01% for BTR_60, 2.34% for BTR_70, 0.5% each for T_62 and T_72, and 1% for ZIL_13_1. On the whole, the recognition accuracy of the improved CNN-ELM algorithm is 5.62% higher than that of CN-SVM and 0.43% higher than that of CNN-SVM. The experiment time is very short, which shows that the algorithm is highly feasible and can be further applied to the classification and recognition of other objects.
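The accuracy calculation described above can be sketched as follows. The description admits both a per-label and an overall reading, so the sketch computes both; the function names and the small label lists are hypothetical and do not come from the MSTAR experiments.

```python
from collections import Counter


def class_accuracies(predicted, original):
    """Per-label accuracy: matches for a label divided by that label's sample count."""
    totals = Counter(original)
    correct = Counter(o for p, o in zip(predicted, original) if p == o)
    return {label: correct[label] / totals[label] for label in sorted(totals)}


def overall_accuracy(predicted, original):
    """Matches over all samples divided by the total number of samples."""
    return sum(p == o for p, o in zip(predicted, original)) / len(original)


# Hypothetical labels (not taken from the paper's data).
original = [1, 1, 2, 2, 3, 3, 3, 3]
predicted = [1, 1, 2, 3, 3, 3, 3, 1]
print(class_accuracies(predicted, original))  # {1: 1.0, 2: 0.5, 3: 0.75}
print(overall_accuracy(predicted, original))  # 0.75
```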

Conclusion
In this paper, the CNN is improved by changing its activation function; the improved CNN is then used to extract deep features from 10 kinds of grayscale SAR images that have undergone denoising, segmentation, and edge detection. These SAR images are then classified by the ELM algorithm using the features extracted by the CNN. The experimental results show that the combination of CNN and ELM achieves a high accuracy and a short running time for the recognition of SAR images, which demonstrates the effectiveness of the algorithm. The algorithm can be further applied to the classification and recognition of other objects, especially images whose features are not obvious.

Figure 4: Target shapes and SAR image shape.

Figure 5: 10 kinds of SAR target images at a depression angle of 15° after preprocessing (denoising, segmentation, and edge detection).
Downsampling is an effective way of extracting image features; it gives the model good robustness to noise and greatly reduces the feature dimension of the images. The CNN model is divided into an input layer, hidden layers, and an output layer. There are two kinds of hidden layers: the convolution layer (extracting features) and the downsampling layer (selecting the optimal features).

Table 1: Training and testing sets of the MSTAR database.

Table 2: Recognition and contrast results (%) of different algorithms for SAR images.