Feature Representation Using Deep Autoencoder for Lung Nodule Image Classification

This paper focuses on the problem of lung nodule image classification, which plays a key role in the early diagnosis of lung cancer. In this work, we propose a novel model for lung nodule image feature representation that incorporates both local and global characteristics. First, lung nodule images are divided into local patches with Superpixel. Then these patches are transformed into fixed-length local feature vectors using an unsupervised deep autoencoder (DAE). The visual vocabulary is constructed based on the local features, and bag of visual words (BOVW) is used to describe the global feature representation of the lung nodule image. Finally, the softmax algorithm is employed for lung nodule type classification, which allows the whole training process to be assembled in an end-to-end mode. Comprehensive evaluations are conducted on the widely used, publicly available ELCAP lung image database. Experimental results with regard to different parameter settings, data augmentation, model sparsity, classifier algorithms, and model ensemble validate the effectiveness of our proposed approach.


Introduction
Lung cancer is one of the deadliest diseases in the world, accounting for about 20% of all cancers in 2016. The 5-year cure rate is only 18.2% in spite of great progress in recent diagnosis and treatment. If a patient is accurately diagnosed at an early stage and suitable treatment is implemented, the chance of survival is greater [1]. Therefore, research on the early diagnosis of lung cancer is of great significance. Computed Tomography (CT) is currently the most popular method among lung cancer screening technologies [2]. CT generates high resolution data, which enables small or low-contrast lung nodules to be detected more effectively than with conventional radiography. According to the report of National Lung Screening, low-dose CT scanning reduces lung cancer mortality by 20% [3]. Because traditional lung cancer diagnosis relies only on professional experts, it has two main drawbacks: (1) subjectivity, since different doctors may reach different diagnostic results for the same CT scan image, and (2) a huge workload, since reading CT images consumes much time and effort. Both inevitably reduce efficiency. The development of computer vision technology has brought benefits for medical image processing and analysis. Its efficiency and stability provide auxiliary help for doctors in an automatic or semiautomatic manner.
During the last two decades, a number of researchers have been devoted to medical image processing and analysis with computer vision and machine learning technologies, especially for lung disease diagnosis [4]. Among these studies, lung nodule image classification has attracted much attention because it is a key step in lung cancer analysis. A lung nodule is characterized by its appearance and its relation to the surrounding regions. Usually, lung nodules are classified into 4 types [5], as shown in Figure 1. To be specific, Figures 1(a)-1(d) demonstrate nodule types W, V, J, and P, respectively, where W is a well-circumscribed nodule located centrally in the lung without any connection to other structures; V is a vascularized nodule that is also central in the lung but closely attached to the neighbouring vessels; J is a juxtapleural nodule that has a large portion connected to the pleura; P is a pleural-tail nodule that is near the pleural surface and connected to it by a thin tail.
Lung nodule CT image classification includes two main steps. First, feature extraction and representation uses segmentation, filtering, and statistical methods to describe the features of a lung nodule based on shape and texture. Second, classifier design constructs a classifier based on supervised or unsupervised machine learning methods. However, these methods belong to traditional image processing and machine learning, which can only characterize lung nodule images at a shallow level of abstraction. As a result, the complex structure of lung nodules still makes classification a challenging problem. This paper proposes a novel model for lung nodule feature representation and classification.

Related Works
Many studies have reported on the classification of lung nodules in CT images. Some representative works are introduced in this section. Many researchers designed features based on the texture, shape, and intensity of lung nodule images. A feature extraction method based on the morphology and shape of lung nodules was designed in [7]. A subclass local constraint based method was proposed in [8]. Spectral clustering and an approximate affine matrix were used to construct data subclasses, and each subclass was used as a reference dictionary. The testing image was represented by a sparse dictionary. Finally, two metrics based on approximation and distribution degree were merged. In [9], the spectrum was sampled around the center of the lung nodule and features were constructed by FFT. All features were used to construct the dictionary, and then the BOVW model was used to represent the features of the lung nodule. The Haralick texture feature based on spatial direction distribution was proposed in [10], with SVM as the classifier. Ridge direction information was adopted in [11]. A local random comparison method was used to construct the feature vector, and then random forest was used as the classifier. Reference [12] first labeled nodules as solid, part-solid, and nonsolid. Then shape based features were extracted and kNN was used to train the classifier. Reference [13] adopted the smoothness and irregularity of lung nodules as feature representation. Texture, shape, statistics, and intensity were extracted as feature representation and ANN was used as the classifier in [14]. A feature extraction method based on the eigenvalues of the Hessian matrix was adopted in [15], and AdaBoost was used as the classifier. Reference [16] used a rotation-invariant second-order Markov-Gibbs random field to model the intensity distribution of lung nodules, and Gibbs energy was used to describe the feature vector. Finally, a Bayes classifier was constructed. LDA and a 3D lattice were used to construct the mapping between lung nodule images and feature representation in [17]. Reference [18] used a topology histogram to represent the feature vector of lung nodules, and discriminant analysis and k-means were used as classifiers. These methods represent lung nodule image features at a relatively low level and lack sophisticated extraction. In addition, they need heavy participation of professional experts and have limited generality.
Some well-engineered feature extraction and representation methods widely used in the computer vision domain were adopted for lung nodule image classification. Reference [22] proposed a method based on the texture and context of lung nodules. Lung images were divided into nodule level and context level; then SIFT, LBP, and HOG features were extracted. References [19,23] divided the lung nodule into foreground and background with a graph model and conditional random field. Then SIFT was used to extract features and SVM was used as the classifier. In [24], SIFT features were first extracted. Then PCA and LDA were used for dimension reduction. Finally, the complex Gabor response was used for representation. In [25], a supervised method was used for initial classification with a 128-length SIFT descriptor, and a weighted Clique was constructed using a 4-length probability vector against the 4 nodule types. The overlap in which a lung nodule belongs to different types was used to optimize the final classification result. These methods adopt generally designed features. They obtain higher performance than traditional low-level features, but they are considered a mid-level abstraction of lung nodules and have less flexibility.
Several methods were concerned with other aspects. An ensemble based method was applied in [26] for lung nodule classification. Lung nodule image patches were used as input, and six large scale artificial neural networks were trained for classification. The data imbalance problem was discussed in [27], which used downsampling and SMOTE algorithms to train the lung nodule classifier.
Due to its breakthroughs in the fields of image processing and speech recognition, deep learning has become one of the hottest topics in machine learning research and application [20, 28-30]. A deep learning model can describe a high-level abstraction of an image object. Meanwhile, feature extraction and representation are more efficient and effective. In [28], curvature, Hu-moment, morphology, and shape features were used to detect nodule candidate regions. Then a convolutional neural network (CNN) was used to extract features for the candidate regions, and multiple classifiers were merged for the final result. Some changes were made in [29,30], where OverFeat was used for CNN parameter initialization. In [20], deep feature extraction with a one-hidden-layer autoencoder was adopted, and a binary decision tree was used as the classifier for lung cancer detection. This paper proposes a lung nodule image classification method combining both local and global feature representation. Our proposed work is close to, but essentially different from, the work of [20]. The method in [20] just applied a one-hidden-layer autoencoder to the lung nodule image. Our proposed method uses Superpixel to generate intact patches and a deep autoencoder to extract local features. Moreover, BOVW is incorporated for global feature representation of the lung nodule, which the method in [20] does not consider.

Framework
The procedure of the proposed lung nodule classification method is shown in Figure 2.

Local Feature Representation
Local feature representation is presented in this section. The process consists of two steps: (1) local patch generation and (2) local feature extraction and representation.

Local Patch Generation.
Decomposing a lung image into small patches is useful and practical, for important tissues can be picked up and unrelated ones can be discarded. As shown in (1), a lung nodule image $I$ can be composed of a group of image patches $p_i$, where $n$ denotes the number of local patches:

$$I = \{p_1, p_2, \ldots, p_n\}. \tag{1}$$

The location and scale of local patches are determined through generation [22,24]. A patch that is too large contains useless parts, while a patch that is too small may not cover enough intact tissue. Superpixel is a popular method that can partition an image into small similar regions with better representativeness and integrity [31], so it is adopted in this work.
Figure 3 illustrates the process of the proposed local patch generation method. A lung nodule image (Figure 3(a)) is first segmented into local patches using Superpixel, and a Superpixel map is obtained (Figure 3(b)). A generated patch qualifies for local feature extraction only if it passes the following criteria:
(i) Let $p_i$ be a local patch; it is removed when the area of $p_i$ is larger than $T_{\max}$ or smaller than $T_{\min}$.
(ii) Let $p_i$ and $p_j$ be two local patches; if the ratio between their intersection and their union is larger than $T_r$, then the smaller one is removed.
$T_{\max}$, $T_{\min}$, and $T_r$ are predefined thresholds.
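Criteria (i) and (ii) can be illustrated with a small sketch. It assumes each patch has already been expanded to an axis-aligned bounding rectangle `(x0, y0, x1, y1)`; the function names and the threshold names `t_min`, `t_max`, and `t_ratio` are hypothetical and do not come from the paper. As a simplification, patches are visited from largest to smallest and a patch is dropped when it overlaps too much with a larger patch that was already kept.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def area(b: Box) -> int:
    """Area of a box; degenerate (empty) boxes have area 0."""
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Ratio between the intersection and the union of two boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def filter_patches(boxes: List[Box], t_min: int, t_max: int,
                   t_ratio: float) -> List[Box]:
    # Criterion (i): discard patches whose area is outside [t_min, t_max].
    candidates = [b for b in boxes if t_min <= area(b) <= t_max]
    # Criterion (ii): visit larger patches first, so that for a strongly
    # overlapping pair the smaller patch is the one that gets removed.
    candidates.sort(key=area, reverse=True)
    kept: List[Box] = []
    for b in candidates:
        if all(iou(b, k) <= t_ratio for k in kept):
            kept.append(b)
    return kept
```

For example, `filter_patches(boxes, t_min=100, t_max=2000, t_ratio=0.5)` removes patches that are tiny or huge and keeps only the larger of two patches that cover nearly the same region.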

Local Feature Extraction and Representation.
With the rapid development of unsupervised learning in recent years, extracting features from unlabeled data with an autoencoder has become an appropriate approach. An autoencoder is essentially a multilayered neural network. Its original version is a forward network with one hidden layer. Let $x$ be the input data, $a_i^{(j)}$ be the activation of unit $i$ in layer $j$, and $W^{(j)}$ be the matrix of weights controlling the function mapping from layer $j$ to layer $j+1$. If layer $j$ has $s_j$ units and layer $j+1$ has $s_{j+1}$ units, then $W^{(j)}$ is a matrix of size $s_j \times s_{j+1}$. The activation can be formulated as (2), where $a_1^{(2)}$ is the 1st unit in the 2nd layer and $x_0$ to $x_3$ are 4 input features:

$$a_1^{(2)} = f\left(W_{01}^{(1)} x_0 + W_{11}^{(1)} x_1 + W_{21}^{(1)} x_2 + W_{31}^{(1)} x_3\right). \tag{2}$$

The main difference between an ordinary forward neural network and an autoencoder is that an autoencoder's output is always the same as or similar to its input. The basic formula can be expressed as follows:

$$y = f(W_e x), \qquad \hat{x} = g(W_d y). \tag{3}$$

An autoencoder can be seen as a combination of an encoder and a decoder. The encoder includes an input layer and a hidden layer, which convert an input image $x$ into a feature vector $y$. The decoder includes a hidden layer and an output layer, which transform the feature $y$ into the output feature $\hat{x}$. $W_e$ and $W_d$ are the weight matrices of the encoder and decoder, respectively. The functions $f(\cdot)$ and $g(\cdot)$ can be either sigmoid or tanh activation functions, which are used to activate the units in each layer. When $\hat{x}$ approximates $x$, the input feature is considered to be reconstructible from the abstract and compressed feature vector $y$. The cost function can be generally defined as follows:

$$J(W_e, W_d) = \frac{1}{2m} \sum_{i=1}^{m} \left\lVert \hat{x}^{(i)} - x^{(i)} \right\rVert^2, \tag{4}$$

where $m$ is the number of training samples.

A deep autoencoder can be constructed by stacking more hidden layers. As shown in Figure 4, there are 5 layers in the model (including 3 hidden layers). $L_1$ to $L_3$ are encoding layers, and $L_3$ to $L_5$ are decoding layers. $L_i$ is used as the input of layer $L_{i+1}$, and the weights can be obtained based on (3). There are 2 stacked autoencoders. The activation of the 1st hidden layer is the input of the 2nd stacked autoencoder. The network can be trained in a fine-tuning stage by minimizing (4). $W^{(1)}$ and $W^{(4)}$ are trained through the encoding and decoding weights of the 1st stacked autoencoder, and $W^{(2)}$ and $W^{(3)}$ are trained through the encoding and decoding weights of the 2nd stacked autoencoder. Finally, the whole network can be constructed layer by layer in a stacked way. Moreover, Figure 4 just shows an example of a symmetric encoding and decoding structure, and other variant structures can also be adopted.
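The encoder and decoder structure described above can be sketched as a plain forward pass. This is a minimal illustration under stated assumptions, not the paper's Matlab implementation: bias terms are omitted, the sigmoid is used for every layer, the reconstruction cost is taken as the mean of half the squared error, and all function names are hypothetical. Training (layer-wise pretraining and fine-tuning) is not shown.

```python
import math
import random
from typing import List

Vector = List[float]
# W[k][i] is the weight from unit k of one layer to unit i of the next layer.
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def init_weights(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Small random weights for a layer with n_in inputs and n_out units."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]


def layer(x: Vector, w: Matrix) -> Vector:
    """Activation of the next layer: a_i = sigmoid(sum_k W[k][i] * x[k])."""
    return [sigmoid(sum(w[k][i] * x[k] for k in range(len(x))))
            for i in range(len(w[0]))]


def encode(x: Vector, encoders: List[Matrix]) -> Vector:
    """Pass the input through the encoding layers to get the feature vector."""
    for w in encoders:
        x = layer(x, w)
    return x


def reconstruct(x: Vector, encoders: List[Matrix], decoders: List[Matrix]) -> Vector:
    """Encode the input, then pass the code through the decoding layers."""
    y = encode(x, encoders)
    for w in decoders:
        y = layer(y, w)
    return y


def reconstruction_cost(batch: List[Vector], encoders: List[Matrix],
                        decoders: List[Matrix]) -> float:
    """Mean over the batch of half the squared reconstruction error."""
    total = 0.0
    for x in batch:
        x_hat = reconstruct(x, encoders, decoders)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / (2 * len(batch))
```

With two encoding and two decoding weight matrices, this mirrors the 5-layer symmetric example of Figure 4; the output of `encode` plays the role of the fixed-length local feature vector of a patch.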
In the BOVW model, the distance between histograms of visual words can be treated as the similarity between lung nodule image samples.

Global Feature Representation
Recall that a lung nodule image is decomposed into a group of local patches and each patch is represented with a feature vector based on the deep autoencoder. Assume there are $N$ local patches generated from all lung nodule training images and each local patch is represented with a $d$-dimensional feature vector; then all local feature vectors can be assembled into a feature space of size $N \times d$. Clustering is performed on the $N \times d$ features, and the $k$-means clustering method is adopted since it has relatively low time and storage complexity and is irrelevant to the data processing order. Each cluster center $c_k$ represents a visual word, and the $K$ cluster centers constitute the visual vocabulary. A lung nodule image sample $I$ can be represented by its encoded local patches as a bag, that is, by the occurrence frequency of each visual word in the vocabulary. To get the histogram representation $h(I)$ of an image $I$, all local patch feature vectors of $I$ are mapped onto the cluster centers of the visual vocabulary, and each local feature is assigned the label of its closest cluster center using the Euclidean distance in feature space. Then a $K$-bin histogram $h(I)$ is obtained by counting the labels of all local patches generated by image $I$, as shown in (6):

$$h(I) = [h_1, h_2, \ldots, h_K], \qquad h_k = \sum_{i=1}^{n} 1\left\{\arg\min_{l} \lVert v_i - c_l \rVert_2 = k\right\}, \tag{6}$$

where $v_i$ is the feature vector of the $i$th local patch of $I$ and $n$ is the number of local patches of $I$.

Classifier Model
For an input image sample $x_i$, we want to compute $P(y = j \mid x_i)$ ($j \in \{0, 1, 2, 3\}$). The output, a 4-dimensional vector, represents the estimated probability that $x_i$ belongs to each type. The hypothesis function can be expressed as follows:

$$h_\theta(x_i) = \begin{bmatrix} P(y = 0 \mid x_i; \theta) \\ P(y = 1 \mid x_i; \theta) \\ P(y = 2 \mid x_i; \theta) \\ P(y = 3 \mid x_i; \theta) \end{bmatrix} = \frac{1}{\sum_{l=0}^{3} e^{\theta_l^{T} x_i}} \begin{bmatrix} e^{\theta_0^{T} x_i} \\ e^{\theta_1^{T} x_i} \\ e^{\theta_2^{T} x_i} \\ e^{\theta_3^{T} x_i} \end{bmatrix},$$

where $\theta = \{\theta_0, \theta_1, \theta_2, \theta_3\}$ is the model parameter set. This equation normalizes the result so that the probabilities sum to 1. For the training procedure, the loss function is given as follows:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=0}^{3} 1\{y_i = j\} \log \frac{e^{\theta_j^{T} x_i}}{\sum_{l=0}^{3} e^{\theta_l^{T} x_i}},$$

where $1\{\cdot\}$ is an indicator function and $m$ is the number of training samples. Stochastic gradient descent (SGD) is used for optimization, and the corresponding derivative functions are given as follows:

$$\nabla_{\theta_j} J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} x_i \left(1\{y_i = j\} - P(y_i = j \mid x_i; \theta)\right).$$

Figure 6: Demonstration of lung CT images (downloaded from [6]).

Figure 6 demonstrates the lung nodule CT scan images, which are sampled from different slices. Table 1 shows the format of a *.csv file. Each row denotes a lung nodule. The 4th column indicates the slice number where the lung nodule exists. The 2nd and 3rd columns give the position at which the lung nodule is located. In this section, lung nodule images are cropped from the raw CT images based on the $x$ and $y$ coordinates of the nodule center given in Table 1. The raw lung CT scan image has a fixed size of 512 × 512 pixels, and the cropped nodule images are too small for the algorithm. Therefore we further resize the cropped lung nodule images to 180 × 180 pixels with the bicubic method. The lung nodule images are labeled with one of four types under the guidance of an expert. Programs are implemented in Matlab 2016a and tested on a Windows PC with a Pentium i7 CPU, 8 GB RAM, and an NVIDIA GTX 960 GPU.
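The cropping step described above, which extracts a nodule image around the annotated center, can be sketched as follows. This is only an illustration: the crop size is not stated in the text, so `size` is a hypothetical parameter, the image is modeled as a row-major list of rows, and pixels outside the slice are filled by repeating the nearest edge pixel, which is an assumption. The subsequent bicubic resize to 180 × 180 pixels would be left to an image library and is not shown.

```python
from typing import List

Image = List[List[int]]  # grayscale image stored row-major: image[y][x]


def crop_centered(image: Image, cx: int, cy: int, size: int) -> Image:
    """Crop a size x size window centered near (cx, cy).

    Coordinates that fall outside the image are clamped to the border,
    so the nearest edge pixel is repeated.
    """
    height, width = len(image), len(image[0])
    half = size // 2

    def clamp(v: int, upper: int) -> int:
        return max(0, min(v, upper - 1))

    return [[image[clamp(y, height)][clamp(x, width)]
             for x in range(cx - half, cx - half + size)]
            for y in range(cy - half, cy - half + size)]
```

For a 512 × 512 slice, `crop_centered(slice_pixels, x, y, size)` would be called once per row of the position file, using the 2nd and 3rd columns as `x` and `y`.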

Experimental Evaluations
The overall classification rate is defined as

$$\text{rate} = \frac{N_{\text{correct}}}{N_{\text{all}}},$$

where $N_{\text{correct}}$ is the number of correctly labeled images and $N_{\text{all}}$ is the number of all testing images. Cross validation is adopted. The dataset is divided into 8 groups: 7 randomly chosen groups are used for training and the remaining group is used for testing. This process is repeated 7 times and the result is computed by averaging the 7 independent tests.
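The evaluation protocol can be sketched in a few lines. The helper names below are hypothetical, and the random partition into groups is one plausible reading of the description (the text does not say whether the groups are stratified by nodule type).

```python
import random
from typing import List, Sequence, Tuple


def classification_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of test images whose predicted label equals the true label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def make_folds(n_samples: int, n_folds: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition sample indices into n_folds groups of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def split(folds: List[List[int]], test_fold: int) -> Tuple[List[int], List[int]]:
    """Use one group for testing and all remaining groups for training."""
    train = [i for k, fold in enumerate(folds) if k != test_fold for i in fold]
    return train, list(folds[test_fold])
```

A full run would train on `train`, predict on `test`, call `classification_rate` for each repetition, and average the resulting rates.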

Parameter Setting.
Parameters need to be set for local patch generation, local feature representation, and global feature representation. For local patch generation, we need to set the number of superpixels generated for each lung nodule image. For local feature representation, the number of hidden layers and the number of nodes in each layer should be set. For global feature representation, the size of the visual vocabulary should be set.
As shown in Table 2, the number of patches generated for each lung nodule image is set to 15, 20, 25, and 30. The number of hidden layers in the deep autoencoder is set to 1, 2, and 3. The number of nodes in the deep autoencoder is set to 50, 75, 100, 125, and 150. The size of the visual vocabulary is set to 200, 300, 400, and 500. The classification rate is evaluated on combinations of these parameters. For convenience, the four parameters are denoted by $P_1$, $P_2$, $P_3$, and $P_4$, respectively.

Classification Rate with Different Parameters.
The size of each local patch is set to 30 × 30 pixels in our experiment. Table 3 gives the average performance of lung nodule image classification for combinations of the parameters $P_1$ (number of patches), $P_2$ (number of hidden layers), $P_3$ (number of nodes), and $P_4$ (vocabulary size). It can be seen that the classification model with $P_1 = 25$, $P_2 = 2$, $P_3 = (100, 50)$, and $P_4 = 400$ gets the optimal result.

The sparsity regularization term is regulated by the Kullback-Leibler divergence $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$. $\hat{\rho}_j$ is the average activation of the $j$th layer of the deep autoencoder and $\rho$ is the target activation. A small value of $\rho$ can reduce the mean activation of the model. $\beta$ is a trade-off parameter. Table 5 gives the classification performance with different $\rho$ (values from 0.1 to 0.9). It can be seen that setting $\rho$ around 0.3-0.4 leads to superior performance.
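As a hedged illustration of the sparsity term, the sketch below uses the Kullback-Leibler divergence between two Bernoulli distributions, which is the usual form for sparse autoencoders; the text does not spell out the exact expression, so this form, and the names `rho`, `rho_hat`, and `beta`, are assumptions.

```python
import math
from typing import Sequence


def kl_bernoulli(rho: float, rho_hat: float) -> float:
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat).

    Both arguments must lie strictly between 0 and 1.
    """
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparsity_penalty(rho: float, mean_activations: Sequence[float],
                     beta: float) -> float:
    """Trade-off weight beta times the summed KL terms.

    The penalty is zero when every average activation equals the target rho
    and grows as the average activations move away from it.
    """
    return beta * sum(kl_bernoulli(rho, r) for r in mean_activations)
```

The penalty would be added to the reconstruction cost during training, so a smaller target `rho` pushes the average activations toward lower values.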

Classification Rate with Different Classifier Algorithms.
In this subsection, we evaluate the performance of 4 commonly used classifier algorithms: softmax (which is used in this paper), SVM, kNN, and decision tree. The same feature representation is adopted for all of them. Table 6 shows that softmax slightly outperforms SVM, kNN, and decision tree.

Reference [19] studies the same problem as ours. Reference [20] adopts the primitive autoencoder method. References [7,21] use non-deep-learning methods for classification. Reference [9] employs the BOVW model. The compared methods are reimplemented and tested with diverse parameters. Table 7 gives the testing result. Among all tested methods, the proposed one demonstrates the best performance. Compared with the non-deep-learning methods, our method can construct a better feature representation, while, compared with the primitive autoencoder method, the Superpixel and DAE used in our method can capture more detailed information.
| Classification method | Performance |
| Ref. [19] | 0.877 |
| Ref. [7] | 0.88 |
| Ref. [20] | 0.82 |
| Ref. [21] | 0.895 |
| Ref. [9] | 0.891 |
| Our proposed method | 0.939 |

In the Majority Rule vote function $V(c) = \sum_{i=1}^{M} \delta(C_i(x) = c)$, $x$ is a testing image, $c$ is a class label, $M$ denotes the number of selected models, and $C_i$ means the $i$th classifier. $\delta(C_i(x) = c) = 1$ if $C_i$ classifies $x$ as $c$. The label with the maximal value of $V(\cdot)$ is determined as the final result. Across different parameter combinations, the models with top performances are retained for the ensemble. Table 8 gives the testing result. The 1st row denotes the single model. The 2nd to 4th rows denote model ensembles with 5, 6, and 7 individual models, respectively. The result demonstrates that model ensemble can complement the individual models, and the performance is improved by about 1.5%.
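The Majority Rule can be sketched as below. Each model is assumed to output a probability vector over the classes and to vote for its most probable class; on a tie, the arithmetic average of the predicted class probabilities decides. Restricting the tie-break to the tied labels is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(probabilities: List[Sequence[float]]) -> int:
    """Return the ensemble label for one test image.

    probabilities[i][c] is the probability that model i assigns to class c.
    """
    n_classes = len(probabilities[0])
    # Each model votes for its most probable class.
    votes = Counter(max(range(n_classes), key=p.__getitem__)
                    for p in probabilities)
    top = max(votes.values())
    tied = [c for c, v in votes.items() if v == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: fall back to the arithmetic mean of the class probabilities.
    mean = [sum(p[c] for p in probabilities) / len(probabilities)
            for c in range(n_classes)]
    return max(tied, key=lambda c: mean[c])
```

With an odd number of models such as 5 or 7, ties are less frequent but still possible because there are four classes.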

Conclusion and Future Works
In this paper, a novel feature representation method is proposed for lung nodule image classification. Superpixel is first used to divide the lung nodule image into local patches. Then local features are extracted from the local patches and represented with a deep autoencoder. The bag-of-visual-words model is used as the global feature representation, with the visual vocabulary constructed from the local feature representation. Finally, end-to-end training is implemented with a softmax classifier. The proposed method is evaluated from many aspects, including parameter setting, data augmentation, model sparsity, comparison among different algorithms, and model ensemble. We conclude that the proposed method achieves superior performance. The merits of our method are the combination of local and global feature representation and the better model generalization gained by incorporating an unsupervised deep learning model.
Our future work will focus on two aspects: (1) studying new classification frameworks and methods based on up-to-date convolutional neural networks and (2) analyzing our method on large data sets for further improvement and optimization.

Figure 1: Demonstration of four types lung nodule image samples (cropped from images in [6]).
The procedure shown in Figure 2 contains training and testing stages.

Figure 2: Framework of the proposed method.
Figure 3(c) is an individual patch sample. However, the region in Figure 3(c) has an irregular shape, which is inconvenient for local feature extraction and representation. So we expand each local patch to its minimum enclosing rectangle, as shown in Figure 3(d). Finally, a lung nodule image is decomposed into a set of local patches, as shown in Figure 3(e). Besides, additional criteria, given as items (i) and (ii) in the Local Patch Generation section, determine whether an image patch is qualified for local feature extraction.

Figure 3: The process of local patch generation by Superpixel.

For the BOVW model, the visual vocabulary is first constructed by clustering all local patch descriptors (local feature representations) generated from a set of training images. Then each lung nodule image can be represented globally by a histogram of visual words.

Figure 5: Procedure of BOVW representation of lung nodule image.
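The BOVW procedure of Figure 5 can be sketched with a small k-means implementation. This is an illustrative sketch only: the function names, the fixed iteration count, and the random initialization from sampled features are assumptions, and a real experiment would typically rely on a library clustering routine.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(feature: Vector, centers: List[Vector]) -> int:
    """Index of the visual word (cluster center) closest to the feature."""
    return min(range(len(centers)), key=lambda k: sq_dist(feature, centers[k]))


def kmeans(features: List[Vector], k: int, iters: int = 20,
           seed: int = 0) -> List[List[float]]:
    """Build a visual vocabulary of k cluster centers from local features."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(list(features), k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for f in features:
            groups[nearest(f, centers)].append(f)
        for j, group in enumerate(groups):
            if group:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bovw_histogram(patch_features: List[Vector],
                   centers: List[Vector]) -> List[int]:
    """Count how many patches of one image fall on each visual word."""
    hist = [0] * len(centers)
    for f in patch_features:
        hist[nearest(f, centers)] += 1
    return hist
```

The vocabulary is built once from the local features of all training images, and `bovw_histogram` is then applied to the patch features of each image to obtain its global representation.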

7.1. Dataset and Program Implementation. In order to evaluate the performance of the proposed lung nodule image representation and classification method, a widely used, publicly available lung nodule image dataset, ELCAP, is used for testing [6]. The dataset contains 379 lung CT images, which are collected from 50 distinct low-dose CT lung scans. The center position of each lung nodule is marked in an extra *.csv file.

The experiments include the following aspects: (1) parameter setting; (2) classification rate with different parameters; (3) classification rate with data augmentation; (4) classification rate with model sparsity; (5) classification rate with different classifier algorithms; (6) comparison with other methods; (7) classification rate with model ensemble. The performance of lung nodule image classification is reported as the overall classification rate.

7.8. Classification Rate with Model Ensemble. Model ensemble can improve the classification performance by aggregating multiple individual classifiers [34]. We evaluate model ensemble based on the Majority Rule in this subsection. In the Majority Rule, the class label is assigned to the class that most classifiers vote for. The function to evaluate the class label $c$ for image $x$ is given as follows:

$$V(c) = \sum_{i=1}^{M} \delta(C_i(x) = c), \qquad \hat{c} = \arg\max_{c} V(c).$$

The proposed model considers both local and global features. Lung nodule CT images are first divided into local patches with Superpixel, and each patch is associated with a relatively intact tissue. Then local features are extracted from each patch with a deep autoencoder. The visual vocabulary is constructed from the local features. The global representation is constructed by the bag of visual words (BOVW) model, and the classifier is trained using the softmax algorithm. The main contributions of our work are as follows: (i) a novel feature representation model for lung nodule image classification is proposed, in which local and global features are constructed by an unsupervised deep autoencoder and the BOVW model, and (ii) comprehensive evaluations are conducted, and performance analyses are reported from multiple aspects. The structure of this paper is organized as follows. Related works are introduced in Section 2. Section 3 gives the framework. Local feature representation and global feature representation are given in Sections 4 and 5. Section 6 presents the classifier model. Experimental evaluations are shown in Section 7. Section 8 concludes this paper.

Table 1: Format of lung nodule position.

Table 3: Performance with different parameters.

For data augmentation, each original lung nodule image is sampled with a probability of 0.5. The newly created examples are given the same labels as the originals. As shown in Table 4, data augmentation can increase the classification rate.

Table 4: Performance with data augmentation.

Table 5: The effect of model sparsity.

Table 6: The effect of different classifier algorithms.
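For reference, the softmax classifier compared in Table 6 can be sketched as a 4-class model trained by stochastic gradient descent on BOVW histograms. This is a minimal sketch under stated assumptions: there is no bias term or regularization, the learning rate and epoch count are arbitrary, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_probs(theta: List[List[float]], x: Sequence[float]) -> List[float]:
    """Class probabilities exp(theta_j . x) / sum_l exp(theta_l . x)."""
    scores = [sum(t * v for t, v in zip(row, x)) for row in theta]
    top = max(scores)  # subtract the maximum score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(theta: List[List[float]], x: Sequence[float], y: int,
             lr: float) -> None:
    """One SGD update on a single sample, using the gradient x * (p_j - 1{y = j})."""
    probs = softmax_probs(theta, x)
    for j, row in enumerate(theta):
        grad_scale = probs[j] - (1.0 if y == j else 0.0)
        for d in range(len(row)):
            row[d] -= lr * grad_scale * x[d]


def train(samples: List[Sequence[float]], labels: List[int], n_classes: int,
          epochs: int = 200, lr: float = 0.5) -> List[List[float]]:
    """Train one parameter vector per class, starting from zeros."""
    theta = [[0.0] * len(samples[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            sgd_step(theta, x, y, lr)
    return theta


def predict(theta: List[List[float]], x: Sequence[float]) -> int:
    """Label with the highest estimated probability."""
    probs = softmax_probs(theta, x)
    return max(range(len(probs)), key=probs.__getitem__)
```

Here each sample would be the visual-word histogram of one lung nodule image, and the four labels correspond to the nodule types W, V, J, and P.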
7.7. Comparing with Other Methods. In order to evaluate the classification rate of different methods, 5 related algorithms are used for testing.

Table 7: Performance comparing with other methods.

Table 8: Performance of model ensemble. If multiple labels have the same number of votes, the arithmetic average of the class probabilities predicted by the individual models is used as the classification result.