A Five-Level Wavelet Decomposition and Dimensional Reduction Approach for Feature Extraction and Classification of MR and CT Scan Images

This paper presents a two-dimensional wavelet based decomposition algorithm for classification of biomedical images. The two-dimensional wavelet decomposition is done up to five levels for the input images. Histograms of the decomposed images are then used to form the feature set. This feature set is further reduced using probabilistic principal component analysis. The reduced set of features is then fed into either a K nearest neighbor (KNN) algorithm or a feed-forward artificial neural network (ANN) to classify the images. The algorithm is compared with three other techniques in terms of accuracy. The proposed algorithm was found to be better by up to 3.3%, 12.75%, and 13.75% on average over the first, second, and third algorithm, respectively, using KNN and by up to 6.22%, 13.9%, and 14.1% on average using ANN. The dataset used for comparison consisted of CT Scan images of lungs and MR images of the heart obtained from different sources.


Introduction
Biomedical images such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT) Scan, and ultrasound images have been recognized as a powerful tool for the detection of diseases in recent times. Various supervised and unsupervised algorithms have been proposed to analyze biomedical images for purposes such as segmentation of an organ, identification of the disease affected area, and classification of images [1][2][3]. The following subsections summarize various algorithms used for biomedical image classification.
1.1. MRI Related Work.
Chaplot et al. [4] used the Daubechies wavelet transform to extract the features of MR images of patients suffering from brain tumors. A self-organizing map and support vector machines were then used to distinguish images of patients suffering from a tumor from images of patients not suffering from one. Saravanan and Ramachandran [5] extended this approach to the Daubechies wavelets db1 to db15 and selected the wavelet with the highest potential. The coefficients extracted for that wavelet were then classified using the backpropagation algorithm.
White matter hyperintensities in the brain are commonly observed in ageing people. Griffanti et al. [6] proposed a method in which correlation amongst images was identified using the mean and standard deviation between various features such as cognition and tissue microstructure. This correlation made it possible to identify similar MR images with white matter disturbances. Ramakrishnan and Sankaragomathi [7] used SVM along with sequential minimal optimization (SMO) and modified region growing (MRG) with grey wolf optimization (GWO). The features fed into these two systems were the grey level co-occurrence matrix, maximum intensity, and local Gabor XOR pattern. The proposed framework was compared with other similar techniques on the basis of accuracy and was claimed to be better. Nayak et al. [8] diagnosed pathological brains by taking the fifty largest coefficients from a level-5 discrete curvelet transform and then reducing the feature vector using PPCA. SVM was then used to classify between healthy and pathological brains. The authors of [9] proposed a system that employs a contrast limited adaptive histogram equalization scheme to enhance the diseased region in brain MR images. Subsequently, a two-dimensional wavelet transform is applied and correlated features are extracted using a symmetric uncertainty ranking based filter. Zhang et al. [10] used stationary wavelet entropy to extract features from MR images and then employed a feed-forward neural network classifier to distinguish images of healthy people from those of patients suffering from hearing loss. The authors of [11] proposed a scheme to identify pathological brains using a simplified pulse-coupled neural network (SPCNN) for region of interest (ROI) segmentation and the fast discrete curvelet transform (FDCT) for feature extraction. PCA and linear discriminant analysis (LDA) were then used to reduce the features, and a probabilistic neural network classified the images. The system achieved an accuracy of up to 99.5%.
Maitra and Chatterjee [12] used the slantlet transform to extract features. The slantlet transform is an extension of the discrete wavelet transform (DWT) in which the support of the discrete-time basis functions is minimized. The number of extracted features was limited to six, which were then fed into a feed-forward artificial neural network for classification. The results obtained showed 100 percent accuracy when compared with other DWT based classification algorithms. Nayak et al. [13] extracted the 2D wavelet components of an MR image and reduced them using probabilistic principal component analysis (PPCA). Finally, with the reduced feature set of thirteen, the authors used an AdaBoost random forest classifier and claimed 100 percent accuracy. Zhang et al. [14] applied level 3 decomposition via the Haar wavelet transform to obtain the features and then applied principal component analysis (PCA) to reduce them. The backpropagation algorithm was then used to classify the images as normal or diseased. The image dataset used was T2-weighted MR brain images from Harvard University. Sauwen et al. [15] compared various unsupervised classification techniques for brain tumor segmentation using two datasets of their own. Chen et al. [16] first defined a cluster center and then used a simple extenics based correlation function to identify the relation between features and remove the redundant ones. Particle swarm optimization was then applied to classify the images. The accuracy and error rates were compared with those of similar algorithms and the proposed one was found to be superior.
Termenon [17] used extreme learning machines to extract features from MR images and applied majority vote classification to classify them. Cabria and Gondra [18] fused the segmented parts of a brain MR image to detect a brain tumor in it. The fusion is achieved using intersection and union methods. AdaBoost with SVM is then applied for classification.

CT Scan or Other Biomedical Images Related Work.
Sudarshan et al. [19] presented a review of the application of wavelets in the detection of different types of cancer and used ultrasound images to measure performance. Their comparison focused mainly on the wavelet analysis of ultrasound images. The authors of [20] compared various artificial neural network based classifiers that can be used for classification and clustering in biomedical images. Im and Park [21] also proposed a feature based classifier using ANN. The algorithm was tested for classification accuracy on a voting database and the Monks problem. Polat et al. [22] used a fuzzy based algorithm for classification of breast cancer and liver disorders. They normalized the input data and then obtained artificial recognition balls (ARBs) for them. El-Dahshan et al. [23] used DWT to extract the components, which were reduced by first computing a covariance matrix. This covariance matrix composed of eigenvalues is then rearranged in ascending order and the feature vector is selected out of it. This method is also known as PCA. Both KNN and ANN were then applied to classify the images. Saritha et al. [24] extracted the DWT components based on the Daubechies wavelet up to 8 levels. The extracted features are arranged on a spider web plot. The area components under the edges of these spider web plots are fed into a probabilistic neural network for classification. Trigui et al. [25] distinguished CT images of patients suffering from prostate cancer from regular ones by extracting spectrum signal based information, which was analyzed to retrieve the choline and citrate levels in the prostate glands. A global feature vector was constructed by combining these two feature vectors, and a supervised learning algorithm, namely, SVM, was then applied for classification.
Many wavelet based techniques have been used by researchers for classification of biomedical images. To date, however, the various wavelet decomposition based approaches have not considered a feature set formed by concatenating the histograms of the five images obtained by wavelet decomposition of a biomedical image up to five levels. The proposed work extends the wavelet based methodology for the classification of biomedical images. It decomposes an image up to five levels using two-dimensional wavelet decomposition. Wavelet transforms have proven to be an efficient way of extracting information from images and are less complex than techniques like DWCT, the curvelet transform, and so forth, and thus are used here. The approximation coefficient matrix at each level is selected and its corresponding histogram is generated. The five histograms thereby obtained are concatenated to form a feature vector. The dimensionality of this feature vector is further reduced by using probabilistic PCA. The feature vector of reduced dimension is then used for classification by either KNN or ANN. This approach is found to be more robust than other approaches, as discussed in Section 3.
The rest of the paper is organized as follows: Section 2 describes the methodology used for classification. It describes the various steps of the proposed algorithm in detail. Section 3 discusses the results obtained when the proposed work is compared with the algorithms in [23,24,26]. In Section 4, the conclusion and possible future work are discussed.

Methodology
The proposed algorithm consists of the following steps.
The discrete wavelet transform (DWT) of a signal $x[n]$ can be calculated by passing it through a high pass and a low pass filter simultaneously. If the low pass filter has impulse response $g[n]$, then the DWT can be evaluated by calculating the convolution of the original signal with the impulse response as
$$y[n] = (x * g)[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[n-k]. \tag{1}$$
Here $*$ denotes convolution [27]. The signal is simultaneously decomposed with a high pass filter. The wavelet decomposition is done using the Daubechies-4 wavelet technique. The high pass and low pass filters used are given in (2) and (3), respectively (where $h$ and $g$ define the wavelet sequences for the high and low pass filters used for convolution) [28].
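For reference, the textbook four-tap Daubechies filter pair can be written out explicitly. This is a sketch under the assumption that the filters in (2) and (3) use the usual orthonormal normalization; the sign and ordering convention of the high pass sequence varies between references and may differ from the one used in [28]. Following the naming above, $g$ is the low pass and $h$ the high pass sequence.

```latex
g[0] = \frac{1+\sqrt{3}}{4\sqrt{2}}, \qquad
g[1] = \frac{3+\sqrt{3}}{4\sqrt{2}}, \qquad
g[2] = \frac{3-\sqrt{3}}{4\sqrt{2}}, \qquad
g[3] = \frac{1-\sqrt{3}}{4\sqrt{2}},

h[n] = (-1)^{n}\, g[3-n], \qquad n = 0, 1, 2, 3,

\sum_{n} g[n] = \sqrt{2}, \qquad
\sum_{n} h[n] = 0, \qquad
\sum_{n} n\, h[n] = 0 .
```

The last two sums express the two vanishing moments of this wavelet: constant and linear trends in the signal produce zero detail coefficients.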
To compute the DWT of a two-dimensional image, the original image is convolved along the $x$ and $y$ directions by low pass and high pass filters as shown in Figure 1. The images obtained are downsampled by columns, indicated by 2↓. Downsampling by columns means that only even indexed columns are selected [27,29]. The resultant images are then convolved again with high pass and low pass filters. These images are now downsampled by rows, denoted by 1↓, which ultimately yields four subband images of half the size of the original image. Thus the four subband images generated are $A_1$, $H_1$, $V_1$, and $D_1$. $H_1$, $V_1$, and $D_1$ contain the horizontal, vertical, and diagonal information of the image. $A_1$ is the approximation coefficient and contains the maximum information of the image. $A_1$ is selected for the next round of decomposition in the same manner as the original image. From the next round also, the approximation coefficient, that is, $A_2$, is extracted. Similarly the image is decomposed by two-dimensional wavelet decomposition up to five levels. The approximation coefficients obtained, that is, $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$, are then used to form the feature set as demonstrated in the following subsections. The Daubechies wavelet has two vanishing moments and thus extracts better features than simpler wavelets like Haar while achieving results similar to those of complex wavelets like Gabor. It also takes less time to produce results than complex wavelet techniques and is therefore a suitable choice for this work [30]. The decomposition is done up to five levels since at the sixth level the image loses most of its details. Experimental results also confirmed that classification accuracy was lower with fourth level decomposition and decreased again with sixth level decomposition.
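The five-level decomposition described above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: it assumes the orthonormal four-tap Daubechies filters, periodic extension at the image borders, and the correlation form of the filtering step; the function names `analyze_1d`, `dwt2`, and `approximations` are hypothetical.

```python
import math

# Four-tap Daubechies low pass filter (orthonormal normalization) and its
# quadrature-mirror high pass partner. Assumed convention, see lead-in.
S3 = math.sqrt(3.0)
LOW = [c / (4 * math.sqrt(2.0)) for c in (1 + S3, 3 + S3, 3 - S3, 1 - S3)]
HIGH = [(-1) ** n * LOW[3 - n] for n in range(4)]


def analyze_1d(signal, taps):
    """Filter a sequence (periodic extension) and keep every second sample."""
    n = len(signal)
    return [
        sum(taps[k] * signal[(i + k) % n] for k in range(len(taps)))
        for i in range(0, n, 2)
    ]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def dwt2(image):
    """One decomposition level; returns the subbands (A, H, V, D)."""
    # Filtering along each row halves the number of columns.
    low_rows = [analyze_1d(row, LOW) for row in image]
    high_rows = [analyze_1d(row, HIGH) for row in image]

    def along_columns(matrix, taps):
        # Filtering along each column halves the number of rows.
        return transpose([analyze_1d(col, taps) for col in transpose(matrix)])

    a = along_columns(low_rows, LOW)     # approximation
    h = along_columns(low_rows, HIGH)    # horizontal detail
    v = along_columns(high_rows, LOW)    # vertical detail
    d = along_columns(high_rows, HIGH)   # diagonal detail
    return a, h, v, d


def approximations(image, levels=5):
    """Return [A1, ..., A_levels]; each level decomposes the previous A."""
    result = []
    current = image
    for _ in range(levels):
        current = dwt2(current)[0]
        result.append(current)
    return result
```

For a 32 × 32 input, `approximations(image)` returns matrices of size 16 × 16, 8 × 8, 4 × 4, 2 × 2, and 1 × 1. In practice a tested wavelet library would normally replace these hand-written loops.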

Feature Extraction and Dimensionality Reduction Using Probabilistic PCA.
The histograms of the five approximation coefficient matrices, that is, $A_1$, $A_2$, $A_3$, $A_4$, and $A_5$, are computed. To compute a histogram, we consider 256 equally spaced bins and count the number of pixels that belong to each bin. Thus even if the image sizes are different, we get a histogram of size 256 × 1 for every image. The five histograms are therefore vectors of size 256 × 1 each. These five histograms are concatenated, and the concatenated matrix is a feature set of size 256 × 5.
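The histogram step can be illustrated with a short sketch. The text does not state the value range over which the 256 bins are spread, so this example assumes that each matrix is binned over its own minimum-to-maximum range; the names `histogram` and `concatenate_histograms` are hypothetical.

```python
def histogram(matrix, bins=256):
    """Count the values of a 2D matrix in equally spaced bins.

    Assumption: the bins span the matrix's own [min, max] range.
    """
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant matrix
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


def concatenate_histograms(matrices, bins=256):
    """Place one histogram per matrix in the columns of a bins x N table."""
    columns = [histogram(m, bins) for m in matrices]
    return [list(row) for row in zip(*columns)]
```

Because every histogram has the same number of bins, matrices of different sizes (as produced at different decomposition levels) still yield columns of equal length, which is what allows them to be placed side by side.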
Applied Computational Intelligence and Soft Computing
The feature set is then reduced by applying probabilistic principal component analysis (PPCA). Principal component analysis maps a given set of dimensions into a lower dimensional space. In PPCA the concept of an associated likelihood function is used. It extracts a $k$-dimensional vector $x$ from a $p$-dimensional vector variable $y$ by the relationship
$$y^{T} = W * x^{T} + \mu + \varepsilon,$$
where $y$ is the row vector of the observed variable, $*$ stands for multiplication, $x$ is the row vector of latent variables, and $\varepsilon$ is the isotropic error term [13]. The $p$-by-$k$ weight matrix $W$ relates the latent and observation variables, and the vector $\mu$ permits the model to have a nonzero mean. In our case $y$ is a vector of size 256 × 5. $W$ is a 256 × 1 predefined weight matrix. Thus $x$ comes out to be a vector of size 5 × 1. This step reduces the feature set of 256 × 5 values into a feature set of just 5 values for each image. As suggested by the authors of [31], PPCA prevents overfitting of data during classification, particularly for images. Moreover, it helps in modelling data in higher dimensions with relatively few parameters, and hence PPCA has been chosen for dimensionality reduction over regular PCA in this paper. The set of features was reduced to a single dimensional vector of size 256 × 1, which took the minimum time to classify images when compared with a feature set of two or more columns and the same number of rows (obtained if we use PPCA to reduce the feature set to a matrix of size 256 × 2, 256 × 3, and so on). However, there was a negligible change in accuracy. Therefore PPCA is used to reduce the feature set to a single column, that is, a 256 × 1 vector.
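A minimal sketch of PPCA with a single latent dimension follows. It uses the closed-form maximum-likelihood solution of the PPCA model rather than an iterative fit, and it assumes that the rows of the 256 × 5 feature set are treated as observations and the five columns as variables, so that the output is one score per row (a 256 × 1 vector). The paper does not spell out this convention, and the function name `ppca_one_component` is hypothetical.

```python
import math
import random


def ppca_one_component(rows, iterations=500, seed=0):
    """Closed-form maximum-likelihood PPCA with one latent dimension.

    rows: n observations, each a list of d >= 2 values.
    Returns (scores, weights, mean, noise_variance).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]

    # Leading eigenvector of the covariance by power iteration.
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    length = math.sqrt(sum(x * x for x in vec))
    vec = [x / length for x in vec]
    top = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))

    # Noise variance: average of the discarded eigenvalues.
    trace = sum(cov[i][i] for i in range(d))
    noise = max((trace - top) / (d - 1), 0.0)
    # Maximum-likelihood weights: W = u * sqrt(lambda - sigma^2).
    scale = math.sqrt(max(top - noise, 0.0))
    weights = [scale * u for u in vec]

    # Posterior mean of the latent variable for each observation:
    # x = W^T (y - mu) / (W^T W + sigma^2).
    m = sum(w * w for w in weights) + noise
    scores = [sum(w * c for w, c in zip(weights, row)) / m if m else 0.0
              for row in centered]
    return scores, weights, mean, noise
```

Applied to a 256 × 5 table, `scores` holds 256 values, matching the single-column feature vector described above. Extending the sketch to more than one latent dimension would require a full eigendecomposition instead of power iteration.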

Classification of MR Images.
The feature set obtained is then fed into a classifier. Two different classifiers, the $K$ nearest neighbor (KNN) classifier and the ANN classifier, have been used for performance measurement. Support vector machines (SVM) in their basic form classify between two classes only and are therefore not used here, where four classes must be distinguished. A brief overview of these classifiers is presented below.

𝐾 Nearest Neighbor Classifier.
In this method, the given input image is classified on the basis of its $K$ closest training vectors. The $K$-nearest neighbor classifier is a nonparametric supervised classifier which performs better when an optimal value of $K$ is chosen. Supervised learning is used for training this classifier. In the training phase a given feature vector is mapped to one predefined class out of the four classes given in Section 3 to form a classifier. During the testing phase, classification of any feature vector is done by determining the lowest Euclidean distance to one of the four classes of biomedical images [23].
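The classification rule can be sketched as follows. This is a generic majority-vote KNN with Euclidean distance, given as an illustration rather than the exact configuration used in the experiments; the value of `k` and the function name `knn_classify` are assumptions.

```python
import math
from collections import Counter


def knn_classify(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Euclidean distance from the query to every training vector.
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

With `k=1` the rule reduces to assigning the class of the single closest training vector, which corresponds to the lowest-distance description above.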

Feed-Forward Artificial Neural Network Classifier.
A feed-forward artificial neural network with one hidden layer has been used as a classifier. The hidden layer has 10 neurons. The output layer has four neurons to classify between four classes. The weights are initialized randomly, supervised learning is used to train the network, and the weights are updated to map a given feature vector to its corresponding known class. After the training phase is over, matching on the testing dataset is performed and performance accuracy is measured [24].
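The network shape described above (one hidden layer of 10 neurons, four outputs, random initial weights) can be sketched as a forward pass. The tanh hidden activation, the softmax output, and the weight range are assumptions made for illustration, since the text does not specify them; the training loop is omitted, and the names `init_network`, `forward`, and `predict` are hypothetical.

```python
import math
import random


def init_network(n_inputs, n_hidden=10, n_outputs=4, seed=0):
    """Random initial weights; the last weight of each neuron is its bias."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_out)]

    return {"hidden": layer(n_inputs, n_hidden),
            "output": layer(n_hidden, n_outputs)}


def forward(network, features):
    """Return one probability per class for a single feature vector."""
    def affine(weights, inputs):
        # zip stops at len(inputs), so row[-1] is left over as the bias.
        return [sum(w * x for w, x in zip(row, inputs)) + row[-1]
                for row in weights]

    hidden = [math.tanh(z) for z in affine(network["hidden"], features)]
    logits = affine(network["output"], hidden)
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(network, features):
    """Index (0 to 3) of the most probable class."""
    probabilities = forward(network, features)
    return probabilities.index(max(probabilities))
```

With untrained weights the prediction is arbitrary; supervised training would adjust both weight matrices so that each training feature vector maps to its known class.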

Proposed Algorithm. The block diagram of the proposed feature extraction method for classification of MR images is shown in Figure 2.
We consider Read_image() as a function to read an image of a given format, two_dimensional_wavelet_decomp() as a function to compute the two-dimensional wavelet decomposition of the input image, and Histogram() as a function which computes the histogram of the input image. Also Probabilistic_principal_component_analysis($A$, $k$) reduces a matrix $A$ of dimension $m \times n$ to a matrix of dimension $m \times k$. The proposed algorithm can then be summarized in Algorithm 1.
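The control flow of the feature extraction steps described above can be sketched as a single function that chains them. This is an illustration of the sequence of steps, not a reproduction of Algorithm 1: the decomposition, histogram, and reduction routines are passed in as callables, and the name `extract_features` and its parameters are hypothetical.

```python
def extract_features(image, decompose, histogram, reduce, levels=5, k=1):
    """Chain the feature extraction steps; all callables come from the caller.

    decompose(matrix) -> approximation matrix of the next level
    histogram(matrix) -> list of bin counts (same length at every level)
    reduce(table, k)  -> table with the same rows and k columns
    """
    columns = []
    current = image
    for _ in range(levels):
        current = decompose(current)        # A1, A2, ..., A_levels
        columns.append(histogram(current))  # one histogram per level
    # Concatenate the histograms column-wise: bins x levels.
    table = [list(row) for row in zip(*columns)]
    # Dimensionality reduction: bins x levels -> bins x k.
    return reduce(table, k)
```

Passing the routines in as arguments keeps the sketch independent of any particular wavelet or PPCA implementation; the reduced table returned here is what would be handed to the KNN or ANN classifier.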
The novelty of the proposed work lies in the fact that it decomposes a given image using Daubechies-4 wavelet decomposition up to five levels. No existing algorithm forms a feature vector from the concatenated histograms of five decomposed images. Moreover, PPCA is applied to reduce the size of the feature vector while retaining the information needed for classification. A highly informative feature vector of comparatively small length is thereby obtained.

Experimental Results and Performance Analysis
All simulation work has been carried out on a computational device with 4 GB RAM, [...] infarct, patients suffering from hypertrophy, and normal patients without any heart disease, as shown in Figure 4.
The configuration of two datasets is given in Table 1.
The superiority of this algorithm over others is demonstrated in Figures 5 and 6. If we compare Figures 5(f) and 6(f), we find a significant difference between the two images. Thus the difference between the images increases at each level. Consequently, when we concatenate the histograms of all these images, we obtain a more discriminative feature vector than the one obtained by single level decomposition.
The performance of the proposed feature set is compared with three similar classification algorithms in terms of accuracy, which is defined in [23] as
$$\text{Accuracy} = \frac{\text{correctly classified samples}}{\text{total number of samples}}.$$
The proposed algorithm also yields better results when the type of classifier used is varied. The comparison of the four algorithms in terms of running time (seconds) is summarized in Table 5.
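The accuracy measure used for the comparison is a simple ratio and can be computed as in the sketch below. The function name `accuracy` and the list-based interface are assumptions for illustration.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, three correct predictions out of four test samples give an accuracy of 0.75, that is, 75 percent.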
As can be seen in Table 5, the proposed algorithm takes much less time than [23], even though [23] gave equal accuracy in a few cases. The algorithms of [24,26] take less time to execute but are much inferior to the proposed algorithm in terms of accuracy in almost all cases.

Conclusion and Future Scope
In this paper, a multilevel wavelet transform based feature matrix has been proposed for classification of CT Scan images and MR images. The feature set is extracted by concatenating the histograms of the images obtained by decomposing the original image through the wavelet transform up to five levels. The extracted feature set is used with two classifiers, that is, KNN and ANN. The feature set gives better results with both classifiers, and thus it can be claimed that the proposed feature vector is robust. For the proposed method, an increased accuracy of 3.3%, 12.75%, and 13.75% is achieved using KNN with respect to the techniques of [23], [24], and [26], respectively. Similarly, an increased accuracy of 6.22%, 13.9%, and 14.1% is achieved using ANN with respect to the techniques of [23], [24], and [26], respectively. Thus it can also be claimed that the proposed feature set is more effective in terms of accuracy for multiple classifiers when compared with the other three algorithms. As future work, the proposed feature set can also be tested with other classifiers such as random forests and deep neural networks, and with different medical images.

Figure 2: Schematic representation of the proposed work.

Figure 4: MR images of patients with heart condition as (a) heart failure with infarcts; (b) heart failure without infarcts; (c) hypertrophy; (d) normal.

Figure 5(a) shows the middle view of the lungs of a patient suffering from CLE and Figure 6(a) shows the middle view of the lungs of a patient suffering from PSE. When we apply only one level of wavelet decomposition to these images we obtain Figures 5(b) and 6(b), respectively. Figures 5(b) and 6(b) are very similar, so it is evidently difficult to distinguish between two images belonging to different classes of emphysema by a single level decomposition. The successive images, that is, Figures 5(c)-5(f), are obtained by decomposing Figure 5(b) one further level at a time until the fifth level decomposition of Figure 5(a) is reached. Similarly, decomposing Figure 6(b) one level at a time yields Figures 6(c)-6(f), and thus Figure 6(f) corresponds to the fifth level wavelet decomposition of Figure 6(a).

Figure 5: CT Scan image of a patient suffering from CLE and its different levels of approximation ($A$) images in the wavelet transform: (a) original image; (b) 1st level $A$ image; (c) 2nd level $A$ image; (d) 3rd level $A$ image; (e) 4th level $A$ image; (f) 5th level $A$ image.

Figure 6: CT Scan image of a patient suffering from PSE and its different levels of approximation ($A$) images in the wavelet transform: (a) original image; (b) 1st level $A$ image; (c) 2nd level $A$ image; (d) 3rd level $A$ image; (e) 4th level $A$ image; (f) 5th level $A$ image.

Table 1: Configuration of datasets used for experimentation.

Table 2: Accuracy for 75 and 25 percent training and testing data.

Table 3: Accuracy for 80 and 20 percent training and testing data.
The comparisons in these tables are made for both datasets. In all cases the proposed algorithm performs better in terms of accuracy than the other three algorithms, even when the percentage of training and testing data is varied.

Table 4: Accuracy for 85 and 15 percent training and testing data.

Table 5: Running time comparison (in seconds) of different algorithms.