An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset

Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Owing to this complexity, feature selection and classifier construction on an MDD are typically expensive and time-consuming. A robust feature selection technique is therefore needed to select a single optimal subset of the features of the MDD for further analysis or for designing a classifier. In this paper, an efficient feature selection algorithm is proposed for the classification of MDD. The proposed multidimensional feature subset selection (MFSS) algorithm yields a single unique feature subset for further analysis or for building a classifier, and it offers a computational advantage on MDD over existing feature selection algorithms. The proposed work is applied to benchmark multidimensional datasets. Using the proposed MFSS, the number of features was reduced to a minimum of 3% and a maximum of 30%. The results show that MFSS is an efficient feature selection algorithm that does not affect classification accuracy even with the reduced number of features. The proposed MFSS algorithm is also suitable for both problem transformation and algorithm adaptation, and it has great potential in applications that generate multidimensional datasets.


Introduction
Multidimensional classification, in which each data instance is associated with multiple class variables, has become a popular task [1]. High-dimensional datasets contain irrelevant and redundant features [2], so feature selection is an important preprocessing step in mining high-dimensional data [3]. When the number of features and targets (class variables) in the dataset is large, the time complexity of selecting the feature subset and of the subsequent analysis or classifier design is high. Computational complexity depends on three factors: the number of training examples, the dimensionality, and the number of possible class labels [4,5].
The prime challenge for a classification algorithm arises when the number of features is very large while the number of instances is very small. A common approach to this problem is to apply a feature selection method in a preprocessing phase, that is, before applying a classification algorithm to the data, in order to select a small subset of relevant features, as is done for microarray data classification (high-dimensional data) [6,7]. Multidimensional data degrade the performance and accuracy of classifiers, and processing such data is too complex for traditional methods and requires a systematic approach [8]. Mining multidimensional datasets is therefore a challenging task in current data mining research.
Most of the proposed feature selection algorithms support only single-labelled data classification [9,10]. The related feature selection algorithms do not fit applications that generate multidimensional datasets [9]. An effective feature selection algorithm is important for efficient machine learning [11]. Feature selection in the multidimensional setting is a challenging task. The solution space, which is exponential in the number of target attributes, becomes enormous even with a limited number of target attributes. The relationships between the target attributes add a level of complexity that needs to be taken into account [12]. The $\chi^2$ statistic has been used to rank the features of high-dimensional textual data after transforming the multilabel dataset into a single-label classification problem using the label powerset transformation [13]. However, the chi-square test is not suitable for determining the correlation between the decision classes and the features, nor is it suitable for high-dimensional datasets [14]. Pruned problem transformation has been applied to transform a multilabel problem into a single-label one, with greedy feature selection based on mutual information [15]. The REAL algorithm has been employed to select the significant symptoms (features) for each syndrome (class) in a multilabel dataset [16]. A classifier built from a multilabel dataset (MLD) with multiple feature subsets is typically more expensive and time-consuming. The work in [1] discusses future directions for multidimensional classification, such as studying different single-labelled classifiers and feature selection. A genetic algorithm has been used to identify the most important feature subset for prediction, and principal component analysis has been used to remove irrelevant and redundant features [17].
In multidimensional learning tasks, where there are multiple target variables, it is not clear how feature selection should be performed. Only limited research is available on multilabel feature selection [9]. A robust feature selection technique is therefore needed to select a single significant subset of features from the multidimensional dataset. In this paper, an efficient feature selection algorithm is proposed for the multidimensional dataset (MDD).
The rest of this paper is organized as follows. Section 2 briefly presents the basics of multidimensional classification and addresses the importance of data preprocessing. Section 3 describes the proposed multidimensional feature subset selection (MFSS), which is based on the weight of feature-class interactions. Section 4 presents the experimental results and analysis used to evaluate the effectiveness of the proposed model. Section 5 concludes our work.

Preliminaries
This section presents some basic concepts of multidimensional classification and the importance of preprocessing in data mining.

Multidimensional Paradigm.
In general, a multidimensional dataset contains multiple independent variables (features) and multiple dependent variables (class variables). Each instance is associated with multiple class values. The classifier is built from a number of training samples. Figure 1 shows the relationship between the different classification paradigms in terms of the number of class variables and the number of values that each class variable can take. Multidimensional classification assigns each data instance to multiple classes. In the problem transformation approach, the multidimensional classification problem is decomposed into multiple independent classification problems, and the results from all the independent classifiers are aggregated; that is, one single-dimensional multiclass classifier is applied to each class variable [1].
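The decomposition described above can be illustrated with a minimal sketch, assuming NumPy. A 1-nearest-neighbour rule is used only as a stand-in base classifier (any single-dimensional multiclass classifier could be substituted), and the function names `one_nn_predict` and `predict_independent` are hypothetical.

```python
import numpy as np

def one_nn_predict(X_train, y_train, X_test):
    """Stand-in base classifier: label of the nearest training instance."""
    # Squared Euclidean distance between every test and training instance.
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[dist.argmin(axis=1)]

def predict_independent(X_train, Y_train, X_test):
    """Problem transformation: one independent classifier per class variable.

    Y_train has one column per class variable; the per-variable predictions
    are aggregated column by column into the final multidimensional output.
    """
    columns = [
        one_nn_predict(X_train, Y_train[:, i], X_test)
        for i in range(Y_train.shape[1])
    ]
    return np.column_stack(columns)
```

Each column of the result is produced without reference to the other class variables, which is why the cost of this approach grows with the number of class variables.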

Multidimensional Classification.
Multilabel classification (MLC) refers to the problem of instance labelling where each instance may have more than one correct label. Multilabel classification has recently received increased attention from researchers working on machine learning and data mining, and it is becoming increasingly common in modern applications. For example, a news article could belong to multiple topics, such as politics, finance, and economics, and could also be related to China and the USA as regional categories. Typical applications include medical diagnosis, gene/protein function prediction, document (or text) categorization, multimedia information retrieval, tag recommendation, query categorization, drug discovery, and marketing [18][19][20][21]. Traditional single-label classification algorithms address classification tasks that predict only one label. Such algorithms are not suitable for the data structures found in many real-world applications. For example, in medical diagnosis, a patient may be suffering from diabetes and prostate cancer at the same time [18,22].
Research on MLC has received much less attention than single-labelled classification. An MLC problem can be decomposed into multiple independent binary classification problems, with the final labels for each data point determined by aggregating the classification results from all the binary classifiers [23]. Owing to its complex nature, the labelling process for a multilabel dataset is typically more expensive or time-consuming than in single-label cases. Learning effective multilabel classifiers from a small number of training instances is therefore an important problem to investigate [8].

Handling Missing Values.
Raw data collected from different sources in different formats are highly susceptible to noise, irrelevant attributes, missing values, and inconsistencies. Data preprocessing is therefore an important phase that helps to prepare high-quality data for efficient data mining in large datasets. Preprocessing improves the data mining results and eases the mining process. Missing values arise in many situations in which no value is available for some variables, and they affect the data mining results. It is therefore important to handle missing values in order to improve classifier accuracy in data mining tasks [24][25][26][27].
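As a simple illustration of missing-value handling, the sketch below replaces each missing numeric value with the mean of its column. Mean imputation is only one common strategy and is an assumption here; it is not stated to be the method used in this work. The function name `impute_column_means` is hypothetical, and missing values are assumed to be encoded as NaN.

```python
import numpy as np

def impute_column_means(X):
    """Return a copy of X with NaN entries replaced by their column mean."""
    X = np.asarray(X, dtype=float).copy()
    # Column means computed while ignoring the missing entries.
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X
```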
The Scientific World Journal

Feature Selection.
Feature selection (FS) is an important and critical phase in pattern recognition and machine learning. It aims to select the essential features and discard the less significant ones from the analysis. It serves several objectives: reducing the cost of data storage, facilitating data visualization, reducing the dimension of the dataset to shorten the classification process, and improving classifier accuracy by removing redundant and irrelevant variables [28][29][30].
Feature selection methods are classified into three main categories: filters, wrappers, and embedded methods. In the filter method, the selection criterion is independent of the learning algorithm. In the wrapper method, by contrast, the selection criterion depends on the learning algorithm and uses its performance index as the evaluation criterion. The embedded method incorporates feature selection as part of the training process [28][29][30].

Proposed Multidimensional Feature Subset Selection Algorithm
In this section, the proposed algorithm for selecting a single subset of features from the MDD is presented. The block diagram of the proposed MFSS is shown in Figure 2. MFSS has three phases. In the first phase, the feature-class correlation is calculated and a weight is assigned to each feature, based on that correlation, for each class. In the second phase, the per-class feature weights are aggregated using the proposed overall weight. In the third phase, the optimal feature subset is selected on the basis of the overall weight for further analysis or for building a classifier. The proposed algorithm is developed from correlation-based attribute evaluation. The proposed MFSS algorithm for MDD is as follows.
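Output. An optimal single unique subset of features selected from the $m$ features of the dataset; the subset contains fewer than $m$ features.

Step 1. Compute Pearson's correlation between each feature and each class using the equation

$$R_{ij} = \frac{n \sum_{k=1}^{n} x_{kj} y_{ki} - \sum_{k=1}^{n} x_{kj} \sum_{k=1}^{n} y_{ki}}{\sqrt{n \sum_{k=1}^{n} x_{kj}^{2} - \left(\sum_{k=1}^{n} x_{kj}\right)^{2}} \, \sqrt{n \sum_{k=1}^{n} y_{ki}^{2} - \left(\sum_{k=1}^{n} y_{ki}\right)^{2}}},$$

where $i = 1, \ldots, l$ indexes the classes, $l$ is the number of classes, $j = 1, \ldots, m$ indexes the features, $m$ is the number of features, $n$ is the number of observations, $x_{kj}$ and $y_{ki}$ are the values of the $j$th feature and the $i$th class for the $k$th observation, and $R_{ij}$ is Pearson's correlation between the $j$th feature and the $i$th class.

Step 2. Represent Pearson's correlation between the $j$th feature and the $i$th class as an $(l \times m)$ matrix $R = [R_{ij}]$ having $l$ rows and $m$ columns.

Step 3. Let $W_{ij}$ be the weight of feature $j$ for class $i$. For each class $i$, $i = 1, \ldots, l$:

{
Consider that $m$ is the number of features in the dataset.
Assign the weight $m$ to the feature $j$ that has the highest value of $R_{ij}$.
Assign the weight $m - 1$ to the feature $j$ that has the next highest value of $R_{ij}$, and so on.
}

Step 4. Compute the overall weight of each feature by aggregating its weights over all classes.

Step 5. Rank the features according to the overall weight.

Step 6. Select the top $\log_2 m$ features based on the overall weight.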
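As an illustration only, the steps of the algorithm can be sketched in Python with NumPy. Several details are assumptions rather than part of the stated algorithm: features are ranked by the absolute value of the correlation, the overall weight is taken as the plain sum of the per-class weights (which may differ from the proposed overall-weight equation), and the base-2 logarithm of the number of features is rounded down with a minimum of one feature. The names `pearson_matrix`, `rank_weights`, and `mfss_sketch` are hypothetical.

```python
import math
import numpy as np

def pearson_matrix(X, Y):
    """Correlation matrix with one row per class and one column per feature.

    X: (observations x features), Y: (observations x classes), both numeric.
    Constant columns are not handled (they would cause division by zero).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    num = Yc.T @ Xc
    den = np.sqrt((Yc ** 2).sum(axis=0))[:, None] * np.sqrt((Xc ** 2).sum(axis=0))[None, :]
    return num / den

def rank_weights(R):
    """Per class: weight m for the top feature, m - 1 for the next, and so on."""
    scores = np.abs(R)  # assumption: rank by correlation magnitude
    n_classes, m = scores.shape
    order = np.argsort(-scores, axis=1, kind="stable")
    W = np.empty((n_classes, m), dtype=int)
    W[np.arange(n_classes)[:, None], order] = np.arange(m, 0, -1)
    return W

def mfss_sketch(X, Y):
    """Return (indices of selected features, overall weight per feature)."""
    W = rank_weights(pearson_matrix(X, Y))
    overall = W.sum(axis=0)  # assumption: overall weight = sum over classes
    m = X.shape[1]
    k = max(1, int(math.log2(m)))  # assumption: floor of log2(m), at least 1
    ranked = np.argsort(-overall, kind="stable")
    return ranked[:k], overall
```

Because a single overall weight is computed per feature, the sketch returns one feature subset for all class variables rather than one subset per class.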

Experimental Evaluation
This section evaluates the proposed MFSS algorithm in terms of several evaluation metrics and the number of selected features on multidimensional datasets.


Results and Discussion.
This section discusses the results obtained with the proposed MFSS and the classification algorithms adopted in this study. The proposed MFSS algorithm uses the threshold $\log_2 m$ to select the top features, where $m$ is the number of features in the dataset [39][40][41]. In our experiments, several evaluation metrics, namely, Hamming loss, Hamming score, exact match, and zero-one loss, are calculated before feature selection (BFS) and after applying the proposed MFSS for each of the four classifiers, namely, J48, Naive Bayes, SVM, and IBk, for MDCSC. In this work, Hamming score (HS) and exact match (EM) are used to evaluate the effectiveness of the proposed MFSS [33,37]. Tables 2, 3, 4, and 5 show the experimental results on the five datasets for the four classifiers J48, Naive Bayes, SVM, and IBk, using the raw features and the features selected by the proposed MFSS. Hamming loss is the fraction of misclassified instance-label pairs. It is a loss function, and its value is close to zero both before and after applying the proposed MFSS. Figures 3, 4, 5, 6, 7, 8, 9, and 10 show the relationship between BFS and MFSS for the evaluation metrics HS and EM for the four classifiers.
Hamming score is the accuracy measure in the multilabel setting. The highest Hamming score, obtained using J48, was 99% before feature selection (BFS) and 97.8% after applying MFSS. Exact match is the percentage of samples for which all labels are correctly classified. The highest exact match, also obtained using J48, was 94.8% before feature selection (BFS) and 89.6% after applying MFSS. For the solar flare dataset, the highest Hamming score was 91.2% before and after ... datasets, namely, thyroid, solar flare, music, and yeast. For the scene dataset, however, the exact match is very low after applying MFSS for all four classifiers. Compared with the other three algorithms, SVM performs well on all five datasets. From Figures 3, 4, 5, 6, 7, 8, 9, and 10, it is inferred that the proposed MFSS performs well in terms of Hamming score and exact match, although it achieves a slightly poorer exact match on the scene dataset for all four classifiers.
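The four metrics named above can be sketched as follows, assuming NumPy arrays with one row per instance and one column per class variable. Hamming score is taken here as one minus Hamming loss and zero-one loss as one minus exact match; these are assumptions about the definitions in use, and the function names are hypothetical.

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of misclassified instance-label pairs."""
    return float(np.mean(Y_true != Y_pred))

def hamming_score(Y_true, Y_pred):
    """Assumed definition: fraction of correctly classified instance-label pairs."""
    return 1.0 - hamming_loss(Y_true, Y_pred)

def exact_match(Y_true, Y_pred):
    """Fraction of instances for which every class variable is predicted correctly."""
    return float(np.mean(np.all(Y_true == Y_pred, axis=1)))

def zero_one_loss(Y_true, Y_pred):
    """Assumed definition: fraction of instances with at least one wrong label."""
    return 1.0 - exact_match(Y_true, Y_pred)
```

Exact match is the stricter measure: a single wrong class value makes the whole instance count as incorrect, which is consistent with exact match being lower than Hamming score in the reported results.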
The proposed algorithm needs to be validated by comparing the results of the classifiers before and after feature selection using statistical methods [42]. Correlation analysis is a technique used to measure the strength of the association between two or more variables. Correlation coefficient values always lie between −1 and +1. A positive value indicates a positive linear association between the two variables, and a negative value indicates a negative linear association; values of +1 and −1 indicate perfect positive and perfect negative linear association, respectively. If the value is zero, there is no linear association between the variables. Evans classified the correlation coefficient into five categories: very weak, weak, moderate, strong, and very strong [43]. Table 6 gives the details of the Evans correlation coefficient classification. Pearson's correlation coefficient ($r$) is given by

$$r = \frac{n \sum_{k=1}^{n} x_{k} y_{k} - \sum_{k=1}^{n} x_{k} \sum_{k=1}^{n} y_{k}}{\sqrt{n \sum_{k=1}^{n} x_{k}^{2} - \left(\sum_{k=1}^{n} x_{k}\right)^{2}} \, \sqrt{n \sum_{k=1}^{n} y_{k}^{2} - \left(\sum_{k=1}^{n} y_{k}\right)^{2}}},$$

where $x$ is the metric before feature selection (BFS), $y$ is the metric obtained with the proposed MFSS, and $n$ is the number of paired values. The correlation coefficients between BFS and MFSS for the evaluation metrics Hamming score and exact match are given in Table 7. They indicate that the strength of association between BFS and MFSS is very strong for all four classifiers ($r$ = 0.93, 0.868, 0.868, and 0.930 for HS and $r$ = 0.947, 0.909, 0.909, and 0.947 for EM) based on the Evans categorization.
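A minimal sketch of this check, assuming NumPy, is given below. The category boundaries are the commonly quoted Evans ranges (0.00 to 0.19 very weak, 0.20 to 0.39 weak, 0.40 to 0.59 moderate, 0.60 to 0.79 strong, 0.80 to 1.00 very strong); they are stated here as an assumption, and Table 6 remains the reference for the values used in this work. The function names are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two paired sequences."""
    return float(np.corrcoef(x, y)[0, 1])

def evans_category(r):
    """Map a correlation coefficient to an Evans strength category (assumed cut-offs)."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"
```

Under these cut-offs, every coefficient reported for HS and EM (the smallest being 0.868) falls in the very strong category.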
The paired $t$-test is used to compare two sets of measurements taken from the same subjects before and after some manipulation. To test the efficiency of the proposed feature selection algorithm, the paired $t$-test is used, and the results are given in Table 8. The paired $t$-test statistic is given by

$$t = \frac{\bar{d}}{s_{d} / \sqrt{n}},$$

where $\bar{d}$ is the mean of the paired differences, $s_{d}$ is their standard deviation, and $n$ is the number of pairs. The hypotheses for the evaluation of the proposed MFSS are as follows. $H_0$: there is no significant difference between the performance of the classifier before feature selection (BFS) and after applying MFSS. $H_1$: there is a significant difference between the performance of the classifier before feature selection (BFS) and after applying MFSS.
From the paired $t$-test results, it is inferred that there is no significant difference between the performance of the classifier before feature selection and after MFSS for all the datasets, given the critical values 2.7764 ($\alpha = 0.05$) and 4.6041 ($\alpha = 0.01$) for 4 degrees of freedom. Table 9 gives the details of the features selected using the proposed MFSS. Figure 11 shows the relationship between the number of features before feature selection (BFS) and the number selected using MFSS. From Table 9, it can be observed that the proposed MFSS selects only a small percentage of the features (minimum 3% and maximum 30%) for further analysis or for building a classifier and therefore offers a computational advantage for multidimensional classification.
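The test can be sketched with the Python standard library as follows. The critical values are those quoted above for 4 degrees of freedom (two-sided); the function names are hypothetical, and any scores passed to them in an example would be illustrative rather than results from this study.

```python
import math
import statistics

# Two-sided critical values of t for 4 degrees of freedom, as quoted in the text.
CRITICAL_T_DF4 = {0.05: 2.7764, 0.01: 4.6041}

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired measurements."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)  # sample standard deviation of the differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1

def reject_h0(t, alpha):
    """True if |t| exceeds the critical value, i.e. the difference is significant."""
    return abs(t) > CRITICAL_T_DF4[alpha]
```

With five datasets there are five paired values per classifier, which gives the 4 degrees of freedom used for the critical values.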
Multilabel classification methods fall into two categories, namely, problem transformation and algorithm adaptation. Problem transformation decomposes the multilabel learning problem into a number of independent binary classification problems. Algorithm adaptation methods tackle the multilabel learning problem by adapting popular learning techniques to deal with multilabel data directly [5]. Feature selection methods are categorized as global or local. Selecting the same subset of features for all classes is called global selection, whereas identifying a separate subset of features for each class is called local selection [44]. Existing feature selection techniques in the literature concentrate only on problem transformation (i.e., first transforming the multilabel data into single-label data, which are then used to select features with traditional single-label feature selection techniques) [13][14][15][16]. This approach does not remove features overall, because the union of the feature subsets identified for all classes is equal to the full feature set [44].
The existing feature selection techniques are compared with the proposed MFSS in Table 10 in terms of the time complexity of further analysis or classifier construction in the multilabel setting. The comparison is expressed in terms of the number of classes, the number of features, and the number of features selected using the proposed MFSS in the MDD. Table 10 shows that the time complexity of further analysis or classifier construction is higher with the existing feature selection techniques than with the proposed MFSS. Existing feature selection algorithms are suitable only for single-label datasets; the multidimensional dataset is therefore transformed into single-label data using problem transformation before feature selection. This results in one feature subset per class after problem transformation (i.e., a relevant feature subset for each class), whereas MFSS results in only a single unique feature subset. The existing approach is computationally expensive and complex because the further analysis or classifier construction must be repeated once per class. Algorithm adaptation methods deal with multilabel data directly and require only one feature subset for further analysis or for building a classifier. The highlight of the proposed MFSS is that it yields only a single unique feature subset. The proposed MFSS algorithm is also suitable for both problem transformation and algorithm adaptation and has great potential in applications that generate multidimensional datasets.

$fs_i$: feature subset for class $i$ after problem transformation, $i = 1, \ldots, l$; $l$: the number of classes. $fs$: optimal single unique feature subset obtained using the proposed MFSS for all the $l$ classes.
To diagnose a disease, the physician has to consider many factors from the data obtained from the patients. Most researchers aim to identify the predictors that are used for diagnosis and prediction, since the most important predictors increase the predictive accuracy of the model. To diagnose thyroid disease, physicians rely on the most important clinical tests, TSH, TT4, and T3. The experimental results of the proposed MFSS show that T3, FTI, TT4, T4U, and TSH are the top-ranked features. This reveals that the features selected by the proposed method include the clinical tests used by specialists to diagnose thyroid diseases. In almost all cases, the classification results obtained using the proposed MFSS did not differ significantly from those obtained using the raw features. In conclusion, the study results indicate that the proposed MFSS is an effective and reliable feature subset selection method that does not affect classification accuracy even with a small number of features for the multidimensional dataset.

Conclusions
The prime aim is to select a single optimal subset of the features of the MDD for further analysis or for designing a classifier. Selecting features on the basis of the interaction between feature and class in the MDD is a challenging task. In this paper, an efficient and reliable algorithm for feature subset selection from MDD based on the class-feature interaction weight is proposed, and the effectiveness of this algorithm is verified by statistical methods. The proposed method consists of three phases. Firstly, the feature-class correlation is calculated for each class to identify the importance of each feature for that class. Secondly, a weight is assigned to each feature based on the feature-class correlation for each class. Finally, the overall feature weight is calculated using the proposed weighting method, and a single subset of $\log_2 m$ features, where $m$ is the number of features in the dataset, is selected for further analysis or for designing a classifier. The proposed MFSS algorithm selects only a small percentage of the features (minimum 3% and maximum 30%), yields a unique feature subset for further analysis or for building a classifier, and offers a computational advantage for multidimensional classification. The experimental results of this work (MFSS) on five multidimensional benchmark datasets show that prediction accuracy is maintained while considering only a small number of features. The proposed MFSS algorithm is suitable for both problem transformation and algorithm adaptation, and it has great potential in applications that generate multidimensional datasets.