The DNA microarray classification technique has gained popularity in both research and practice. Real datasets, such as microarray data, contain a huge number of insignificant and irrelevant features that tend to obscure useful information. Selected features are generally required to have high relevance to the classes and high significance, as they determine the classification of samples into their respective classes. In this paper, the kernel fuzzy inference system (K-FIS) algorithm is applied to classify the microarray data (leukemia) using
Accurate diagnosis of a disease, particularly “cancer,” is vital for the successful application of any specific therapy. Even though classification related to cancer diagnosis has improved significantly over the last decade, there is still a need for proper diagnosis with less subjective methods. Recent developments in diagnosis indicate that the DNA microarray provides insight into cancer classification at the gene level due to its capability to measure messenger ribonucleic acid (mRNA) transcripts for thousands of genes concurrently.
Microarray-based gene expression profiling has emerged as an efficient technique for cancer classification as well as for diagnosis, prognosis, and treatment purposes [
The major drawback that exists in microarray data is the curse of dimensionality problem; that is, the number of genes
In this paper,
However, a linear subspace cannot describe the nonlinear variations of microarray genes. Alternatively, a kernel feature space can reflect nonlinear information of genes, in which the original data points are mapped onto a higher-dimensional (possibly infinite-dimensional) feature space defined by a function
The kernel trick is a mathematical technique that can be applied to any algorithm that depends solely on the dot product between two vectors: wherever a dot product is used, it is replaced by the kernel function. When properly applied, such candidate linear algorithms are transformed into nonlinear algorithms (sometimes with little effort or reformulation). These nonlinear algorithms are equivalent to their linear originals operating in the range space of a feature space.
In the literature, it is observed that the following types of kernels have been used to map the function in high dimensional space:
where
The choice of a kernel function depends on the problem at hand, because it depends on what we are trying to model. For instance, a polynomial kernel allows feature-conjunction modeling up to the order of the polynomial. The radial basis function kernel allows picking out circles (or hyperspheres), in contrast with the linear kernel, which allows only picking out lines (or hyperplanes). The choice of a particular kernel can be very intuitive and straightforward, depending on what kind of information is to be extracted from the data.
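As an illustration, the kernels discussed here (linear, polynomial, RBF, and the sigmoid/tansig kernel used later in the experiments) can be sketched in a few lines. The parameter names and default values (`degree`, `c`, `gamma`, `alpha`) are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = <x, y>: picks out lines (hyperplanes)
    return float(np.dot(x, y))

def polynomial_kernel(x, y, degree=2, c=1.0):
    # K(x, y) = (<x, y> + c)^d: models feature conjunctions up to order d
    return float((np.dot(x, y) + c) ** degree)

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2): picks out circles (hyperspheres)
    diff = np.asarray(x) - np.asarray(y)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def tansig_kernel(x, y, alpha=0.1, c=0.0):
    # K(x, y) = tanh(alpha * <x, y> + c): the sigmoid ("tansig") kernel
    return float(np.tanh(alpha * np.dot(x, y) + c))
```

Each function maps a pair of input vectors to the dot product of their (implicit) images in the feature space, which is all a kernelized algorithm ever needs.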
Fuzzy logic provides a means of arriving at a definite conclusion from vague, ambiguous, imprecise, noisy, or missing input information. The nature of the dataset is quite fuzzy, that is, not predictable: the same data can lead to different inferences, and the relationship between the data and the inference is unknown. The fuzzy concept has been used in this work to study the behavior of the data (capturing the human way of thinking), while still making it possible to represent and describe the data mathematically. Further, a fuzzy system has been considered because only a limited number of rules need to be learned in the present system; the number of free parameters to be learned is reduced considerably, leading to efficient computation. In general, if the number of features is larger than 100, it is more suitable to use machine learning techniques than statistical approaches.
If an ANN were applied by itself to the same problem, designing the model would be far more challenging due to the large number of cases. Hence, coupling an ANN with fuzzy logic makes the problem easier to handle through the inferred rule base of the fuzzy system.
In the current scenario, neurofuzzy networks have been successfully applied in various areas of analytics. Two typical types of neurofuzzy networks are Mamdani-type [
Along with feature selection using the t-statistic, a nonlinear version of FIS called the kernel fuzzy inference system (K-FIS) is applied using 10-fold cross-validation (CV). The results obtained from the experimental work carried out on the leukemia dataset show that the proposed methods perform well on the considered performance indicators.
The rest of the paper is organized as follows. Section
This section gives a brief overview of the feature selection methods and classifiers used by various researchers and practitioners and their respective accuracy rate achieved in gene classification. Table
Relevant works on cancer classification using microarray (leukemia) dataset.
Author | Feature selection/extraction method | Classifier used | Accuracy (%) |
---|---|---|---|
Cho et al. [ | | Kernel Fisher feature discriminant analysis (KFDA) | 73.53 |
Deb and Raji Reddy [ | NSGA-II | | 100 |
Lee et al. [ | Bayesian model | Artificial neural network (ANN), KNN, and SVM | 97.05 |
Ye et al. [ | Uncorrelated linear discriminant analysis (ULDA) | KNN ( | 97.5 |
Cho et al. [ | SVM-RFE | Kernel KFDA | 94.12 |
Paul and Iba [ | Probabilistic model building genetic algorithm (PMBGA) | Naive-Bayes (NB), weighted voting classifier | 90 |
D | | Random forest | 95 |
Peng et al. [ | Fisher ratio | NB, decision tree J4.8, and SVM | 100, 95.83, and 98.6 |
Pang et al. [ | Bootstrapping consistency gene selection | KNN | 94.1 |
Hernandez et al. [ | Genetic algorithm (GA) | SVM | 91.5 |
Zhang and Deng [ | Based Bayes error filter (BBF) | Support vector machine (SVM), | 100, 98.61 |
Bharathi and Natarajan [ | ANOVA | SVM | 97.91 |
Tang et al. [ | ANOVA | Discriminant kernel partial least squares (Kernel-PLS) | 100 |
Mundra and Rajapakse [ | | SVM | 96.88, 98.12, 97.88, and 98.41 |
Lee and Leu [ | | Hybrid with GA + KNN and SVM | 100 |
Salem et al. [ | Multiple scoring gene selection technique (MGS-CM) | SVM, KNN, and linear discriminant analysis (LDA) | 90.97 |
The presence of a huge number of insignificant and irrelevant features degrades the quality of analysis of a disease like “cancer.” To enhance the quality, it is essential to analyze the dataset in the proper perspective. This section presents the proposed approach for classification of microarray data, which consists of two phases: in the first phase, the input data is preprocessed using various methods such as missing data imputation, normalization, and feature selection; in the second phase, the K-FIS algorithm is applied as a classifier.
Figure
Proposed work for microarray classification.
This section describes the performance parameters used for classification [
Classification matrix.
 | NO | YES |
---|---|---|
NO | True Negative (TN) | False Positive (FP) |
YES | False Negative (FN) | True Positive (TP) |
Performance parameters.
Performance parameter | Description |
---|---|
Precision | The degree to which repeated measurements under unchanged conditions show the same results |
Recall | The proportion of the relevant items that are identified |
F-measure | Combines the “precision” and “recall” values into a single score, defined as the harmonic mean of precision and recall |
Specificity | Indicates how effectively a classifier identifies negative labels |
Accuracy | The percentage of inputs in the test set that the classifier correctly labels |
Receiver operating characteristic (ROC) curve | A graphical plot illustrating the performance of a binary classifier system as its discrimination threshold is varied; it plots the relationship between the “true positive rate (sensitivity)” and the “false positive rate ( |
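All of these scalar parameters can be derived from the four entries of the classification matrix. A minimal sketch (the function name is ours):

```python
def classification_metrics(tp, tn, fp, fn):
    """Derive the performance parameters from a 2x2 classification matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # sensitivity / true positive rate
    specificity = tn / (tn + fp)
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f_measure": f_measure}
```

For example, a matrix with TP = 24, TN = 46, FP = 1, FN = 1 (72 samples) gives accuracy 70/72 ≈ 0.9722, precision = recall = 0.96, and specificity ≈ 0.9787.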
Generally, the problems with microarray data are (a) the “curse of dimensionality,” where the number of features is much larger than the number of samples, and (b) the fact that many features have very little effect on the classification result. To alleviate these problems, feature selection approaches are used. In this paper,
A widely used filter method for microarray data applies a univariate criterion separately to each feature, assuming that there is no interaction between features. For a two-class problem, a test of the null hypothesis (
Empirical cumulative distribution function (CDF) of the
From Figure
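The per-gene ranking described above can be sketched as a standard two-sample t-score per feature; the helper names are ours, and the paper's exact formula may differ in detail.

```python
import numpy as np

def t_statistic_scores(X, y):
    """Absolute two-sample t-score for each feature (gene).

    X: (n_samples, n_genes) expression matrix; y: binary class labels (0/1).
    """
    a, b = X[y == 0], X[y == 1]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # unequal-variance (Welch-style) standard error per gene
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(mean_diff / se)

def select_top_features(X, y, k):
    # keep the k genes with the highest relevance score
    return np.argsort(t_statistic_scores(X, y))[::-1][:k]
```

Genes whose expression differs strongly and consistently between the two classes receive the highest scores and are retained.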
For a given universe set
Fuzzy sets are obtained by generalizing the concept of characteristic function to a membership function
The TSK fuzzy model (FIS) is an adaptive rule model introduced by Takagi et al. [
In this section, K-FIS, a nonlinear version of FIS, is described. The number of rules (
Framework of kernel fuzzy inference system (K-FIS).
The kernel subtractive clustering (KSC) is a nonlinear version of subtractive clustering [
Reject ratio (
For a given data point
Accept
Accept
Reject
After computing the number of rules (
IF
where
THEN
The degree (firing strength) with which the input matches
In this case, each rule produces a crisp output. The overall output is calculated using the weighted average, as shown in the following:
Using the usual kernel trick, the inner product can be substituted by kernel functions satisfying Mercer’s condition. Substituting the expansion of
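Concretely, the squared distance in the feature space can be written purely in terms of kernel evaluations, which is what allows the clustering potentials to be kernelized. A sketch under that standard identity (the RBF kernel, its `gamma` value, and the Chiu-style 4/r_a² weighting are illustrative assumptions, not the paper's exact settings):

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    d = np.asarray(x) - np.asarray(y)
    return float(np.exp(-gamma * np.dot(d, d)))

def feature_space_dist_sq(x, y, k=rbf):
    # ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y)  (kernel trick)
    return k(x, x) - 2.0 * k(x, y) + k(y, y)

def potentials(X, r_a=0.5, k=rbf):
    # subtractive-clustering potential of each point, with the Euclidean
    # distance replaced by the feature-space distance
    alpha = 4.0 / r_a ** 2
    return [sum(np.exp(-alpha * feature_space_dist_sq(xi, xj, k)) for xj in X)
            for xi in X]
```

The point with the highest potential becomes the first cluster center (and hence the first rule), exactly as in the linear subtractive-clustering procedure.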
In this section, the obtained results are discussed for the proposed algorithm (Section
The leukemia dataset consists of expression profiles of 7129 features (genes), categorized as acute lymphoblastic leukemia (ALL), and acute myeloid leukemia (AML) classes, having 72 samples [
Classification matrix before classification.
 | ALL(0) | AML(1) |
---|---|---|
ALL(0) | 47 | 0 |
AML(1) | 25 | 0 |
Since the dataset contains a very large number of features with irrelevant information, a feature selection (FS) method has been applied to select the features (genes) with a high relevance score; genes with a low relevance score are discarded. The objectives of FS are to avoid overfitting and improve model (classifier) performance, to provide faster and more cost-effective models, and to gain deeper insight into the underlying processes that generated the data.
To achieve these objectives of FS, forward selection method has been employed by selecting the features having high “
Selected features with “
Number of features | Notation | Selected features with gene ID. |
---|---|---|
5 | F5 |
|
10 | F10 | F5 |
15 | F15 | F10 |
20 | F20 | F15 |
25 | F25 | F20 |
30 | F30 | F25 |
After feature selection using
The dataset is divided into different subsets for training and testing purposes. First, every tenth sample out of the seventy-two (72) samples is extracted for testing, and the rest of the data is used for training. The training set is then partitioned into learning and validation sets in the same manner, as shown below.
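One plausible reading of this partition (our interpretation of "every tenth sample"; the paper's exact indexing is not reproduced here):

```python
def holdout_partition(n_samples, step=10):
    """Every `step`-th sample goes to the test set; the rest form the training set."""
    test = list(range(step - 1, n_samples, step))
    train = [i for i in range(n_samples) if i not in set(test)]
    return train, test
```

With n_samples = 72 this yields 7 test samples and 65 training samples; the training set is then split into learning and validation sets the same way.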
After partitioning data into learning set and validation set, model selection is performed using 10-fold CV process by varying the parameters of K-FIS. The parameters used in the proposed work are shown in Table
Parameters of K-FIS model.
Parameters used | Range | Value used |
---|---|---|
Squash factor ( |
|
1.25 |
Accept ratio ( |
(0, 1] | 0.75 |
Reject ratio ( |
(0, 1] | 0.15 |
Cluster radius ( |
(0, 1] | — |
By varying the value of
Divide the training set (
Train the model using the learning set (
Validate the model using the validation set (
Calculate the accuracy of the model.
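The steps above amount to a standard k-fold loop; a generic sketch follows, where the `fit`/`predict` callables stand in for the K-FIS training and inference routines, which are not reproduced here.

```python
import numpy as np

def cross_validate(X, y, fit, predict, n_folds=10, seed=0):
    """Return the mean validation accuracy over n_folds folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for f in range(n_folds):
        val = folds[f]                         # current validation fold
        learn = np.concatenate([folds[g] for g in range(n_folds) if g != f])
        model = fit(X[learn], y[learn])        # train on the learning set
        accuracies.append(float(np.mean(predict(model, X[val]) == y[val])))
    return float(np.mean(accuracies))
```

Model selection then amounts to running this loop once per candidate parameter setting (e.g., each cluster radius) and keeping the setting with the best mean validation accuracy.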
After feature selection using
In this study, the kernel TSK fuzzy (K-FIS) approach based on kernel subtractive clustering (KSC) has been used to classify the microarray gene expression data. The classifier (model) is built using KSC by forming clusters in the data space and translating these clusters into TSK rules. The number of clusters signifies the number of rules; that is, the number of rules in K-FIS is equal to the number of clusters obtained using KSC. The parameters used in K-FIS are shown in Table
After feature selection using
After performing “10-fold CV” on the dataset, the predicted values of test data are collected from each of the folds and classification matrix has been computed in each of the cases as shown in Table
Classification matrix for FIS with different set of features.
F5
0 | 1 | |
0 | 44 | 3 |
1 | 2 | 23 |
F10
0 | 1 | |
0 | 46 | 1 |
1 | 1 | 24 |
F15
0 | 1 | |
0 | 43 | 4 |
1 | 2 | 23 |
F20
0 | 1 | |
0 | 44 | 4 |
1 | 0 | 25 |
F25
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
F30
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
Performance analysis of FIS with different set of features with best suitable cluster radius (
Models ( | Accuracy | Precision | Recall | Specificity | F-measure |
---|---|---|---|---|---|
F5 (0.5) | 0.9306 | 0.9200 | 0.8846 | 0.9565 | 0.9020 |
F10 (0.2) | 0.9722 | 0.9600 | 0.9600 | 0.9787 | 0.9600 |
F15 (0.4) | 0.9167 | 0.9200 | 0.8519 | 0.9556 | 0.8846 |
F20 (0.45) | 0.9444 | 1.0000 | 0.8621 | 1.0000 | 0.9259 |
F25 (0.2) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
F30 (0.4) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
ROC curve for FIS with different set of features.
F5
F10
F15
F20
F25
F30
It has been observed that L-FIS as a classifier achieved its highest accuracy when ten features (i.e., F10) were selected. The L-FIS model has a high (
Hence from the obtained results, it is concluded that the role of feature selection is very important to classify the data with the classifier.
Classification matrix for K-FIS with different sets of features using polynomial (
F5
0 | 1 | |
0 | 46 | 1 |
1 | 0 | 25 |
F10
0 | 1 | |
0 | 46 | 1 |
1 | 1 | 24 |
F15
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
F20
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
F25
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
F30
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
The
In comparison with Table
ROC curve for K-FIS using polynomial kernel (
F5
F10
F15
F20
F25
F30
After analyzing K-FIS (polynomial) with various sets of features, Table
Performance analysis of K-FIS using polynomial kernel (
Models ( | Accuracy | Precision | Recall | Specificity | F-measure |
---|---|---|---|---|---|
F5 (0.2) | 0.9861 | 1.0000 | 0.9615 | 1.0000 | 0.9804 |
F10 (0.2) | 0.9722 | 0.9600 | 0.9600 | 0.9787 | 0.9600 |
F15 (0.3) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
F20 (0.2) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
F25 (0.2) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
F30 (0.4) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
After performing “10-fold CV” on the dataset, the predicted values of test data are collected from each of the folds and classification matrix has been computed in each of the cases as shown in Table
Classification matrix for K-FIS with different set of features using RBF kernel (
F5
0 | 1 | |
0 | 45 | 2 |
1 | 0 | 25 |
F10
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
F15
0 | 1 | |
0 | 45 | 2 |
1 | 0 | 25 |
F20
0 | 1 | |
0 | 42 | 5 |
1 | 0 | 25 |
F25
0 | 1 | |
0 | 42 | 5 |
1 | 0 | 25 |
F30
0 | 1 | |
0 | 40 | 7 |
1 | 2 | 23 |
In comparison with Table
ROC curve for K-FIS using RBF kernel (
F5
F10
F15
F20
F25
F30
After analyzing K-FIS (RBF) with various sets of features, Table
Performance analysis of K-FIS using RBF kernel (
Models | Accuracy | Precision | Recall | Specificity | F-measure |
---|---|---|---|---|---|
F5 (0.4) | 0.9722 | 1.0000 | 0.9259 | 1.0000 | 0.9615 |
F10 (0.2) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
F15 (0.3) | 0.9722 | 1.0000 | 0.9259 | 1.0000 | 0.9615 |
F20 (0.4) | 0.9306 | 1.0000 | 0.8333 | 1.0000 | 0.9091 |
F25 (0.6) | 0.9306 | 1.0000 | 0.8333 | 1.0000 | 0.9091 |
F30 (0.6) | 0.8750 | 0.9200 | 0.7667 | 0.9524 | 0.8364 |
After performing “10-fold CV” on the dataset, the predicted values of test data are collected from each of the folds and classification matrix has been computed in each of the cases as shown in Table
In comparison with Table
Classification matrix for K-FIS with different set of features using tansig kernel (
F5
0 | 1 | |
0 | 46 | 1 |
1 | 0 | 25 |
F10
0 | 1 | |
0 | 46 | 1 |
1 | 1 | 24 |
F15
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
F20
0 | 1 | |
0 | 45 | 2 |
1 | 2 | 23 |
F25
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
F30
0 | 1 | |
0 | 45 | 2 |
1 | 1 | 24 |
ROC curve for K-FIS using tansig kernel (
F5
F10
F15
F20
F25
F30
After analyzing K-FIS (Tansig) with various sets of features, Table
Performance analysis of K-FIS using tansig kernel (
Models ( | Accuracy | Precision | Recall | Specificity | F-measure |
---|---|---|---|---|---|
F5 (0.2) | 0.9861 | 1.0000 | 0.9615 | 1.0000 | 0.9804 |
F10 (0.2) | 0.9722 | 0.9600 | 0.9600 | 0.9787 | 0.9600 |
F15 (0.2) | 0.9583 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
F20 (0.2) | 0.9444 | 0.9200 | 0.9200 | 0.9575 | 0.9200 |
F25 (0.2) | 0.9683 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
F30 (0.2) | 0.9683 | 0.9600 | 0.9231 | 0.9783 | 0.9412 |
The best model for classification of microarray data is chosen based on performance parameters such as accuracy, precision, recall, specificity, and F-measure. In the case of K-FIS classification using different kernel functions, the tansig kernel function obtained high accuracy values with the different feature sets, namely, F5, F10, F15, F20, F25, and F30; the respective test accuracies are 98.61%, 97.22%, 95.83%, 94.44%, 96.83%, and 96.83%. In the case of the SVM classifier with different kernel functions:
the parameters of the kernel functions like from Table
Average training, average testing accuracy, and CPU time (in seconds) with different models.
Models∖number of Features | F5 | F10 | F15 | F20 | F25 | F30 | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Train Acc. | Test Acc. | Train Acc. | Test Acc. | Train Acc. | Test Acc. | Train Acc. | Test Acc. | Train Acc. | Test Acc. | Train Acc. | Test Acc. | |
KFIS (linear kernel) | 95.71 | 93.06 (2.9) | 97.81 | 97.22 (7.6) | 96.86 | 91.66 (14.7) | 94.5 | 94.44 (24.6) | 96.88 | 95.83 (30.4) | 95.98 | 95.83 (37.1) |
KFIS (poly kernel) | 98.55 | 98.61 (44.3) | 97.19 | 97.22 (52.1) | 97.83 | 95.83 (60.2) | 94.31 | 95.83 (79.5) | 96.79 | 95.83 (81.5) | 96.76 | 95.83 (80.7) |
KFIS (RBF kernel) | 99.24 | 97.22 (5.5) | 95.71 | 95.83 (13.4) | 96.55 | 97.22 (18.8) | 92.12 | 93.05 (26.1) | 92.07 | 93.05 (31.3) | 89.36 | 87.50 (36.8) |
KFIS (tansig kernel) | 98.71 | 98.61 (41.7) | 97.19 | 97.22 (53.4) | 97.5 | 95.83 (69.9) | 93.88 | 94.44 (81.1) | 96.92 | 96.83 (80.7) | 96.62 | 96.83 (84.2) |
SVM (linear Kernel) | 97.22 | 97.22 (3) | 97.37 | 97.22 (3.5) | 96.61 | 94.44 (3.6) | 97.22 | 95.83 (3.8) | 97.22 | 95.83 (4) | 97.84 | 97.22 (4.2) |
SVM (poly Kernel) | 96.75 | 91.67 (1.5) | 96.14 | 94.44 (1.6) | 96.76 | 93.06 (1.7) | 95.83 | 93.06 (2) | 97.22 | 97.22 (2.2) | 97.22 | 97.22 (2.3) |
SVM (RBF Kernel) | 97.68 | 94.44 (2) | 97.84 | 97.22 (2.3) | 99.38 | 100.00 (2.7) | 98.00 | 95.83 (3.2) | 98.61 | 98.61 (3.7) | 98.15 | 98.61 (4.7) |
SVM (tansig Kernel) | 98.00 | 97.22 (3.1) | 98.30 | 98.61 (3.3) | 98.15 | 95.83 (3.5) | 97.69 | 94.44 (3.7) | 97.22 | 95.83 (4) | 97.84 | 97.22 (4.7) |
The comparative analysis of the accuracies of different models has been presented in Figure
Comparison of testing accuracy of K-FIS using different feature set.
Training accuracy in each fold with different set of features using K-FIS with linear kernel.
Testing accuracy in each fold with different set of features using K-FIS with linear kernel.
Training accuracy in each fold with different set of features using K-FIS with polynomial kernel.
Testing accuracy in each fold with different set of features using K-FIS with polynomial kernel.
Training accuracy in each fold with different set of features using K-FIS with RBF kernel.
Testing accuracy in each fold with different set of features using K-FIS with RBF kernel.
Training accuracy in each fold with different set of features using K-FIS with tansig kernel.
Testing accuracy in each fold with different set of features using K-FIS with tansig kernel.
The running time of the classification algorithm depends on the number of features (genes) and the number of training data points. The running times were recorded using MATLAB R2013a on an Intel Core(TM) i7 CPU with a 3.40 GHz processor and 4 GB RAM for the different models in Table
In this paper, an attempt has been made to design a classification model for classifying the samples of leukemia dataset either into ALL or AML class. In this approach, a framework was designed for construction of K-FIS model. K-FIS model was developed on the basis of KSC technique in order to classify the microarray data using “kernel trick.” The performance of the classifier for leukemia dataset was evaluated by using 10-fold cross-validation.
From the computed results, it is observed that the K-FIS classifier using different kernels yields results that are very competitive with the SVM classifier. Moreover, when the overall performance is taken into consideration, the tansig kernel coupled with the K-FIS classifier is the most effective classifier among those selected in this analysis. It is evident from the obtained results that the “kernel trick” provides a simple but powerful method for classification where the data are nonlinearly separable: data existing in a nonlinear space can be classified easily by applying a kernel trick.
Further, the kernel trick can be applied to existing classifiers, or to recently proposed ones, to classify data with high predictive accuracy.
For more details see Figures
The authors declare that there is no conflict of interests regarding the publication of this paper.