The evolution of fuzzy systems has proven influential and successful in universal approximation and in many applications. This paper proposes a hybrid Neuro-Fuzzy and Feature Reduction (NFFR) model for data analysis. The proposed NFFR model uses a feature-based class-belongingness fuzzification process for all patterns. During fuzzification, every feature is expanded based on the number of classes available in the dataset. This helps to deal with uncertainty and assists the Artificial Neural Network (ANN) based model in achieving better performance. However, this expansion of input features increases the complexity of the problem, and the expanded features may not always contribute significantly to the model. To overcome this problem, feature reduction (FR) is used to filter out the insignificant features, resulting in lower computational cost for the network. The reduced set of significant features is then used in the ANN-based model to classify the data. The effectiveness of the proposed model is tested and validated on ten benchmark datasets (both balanced and unbalanced) to demonstrate the performance of the proposed NFFR model. The NFFR model is compared with its counterparts on various performance measures such as classification accuracy, root mean square error, precision, recall, and f-measure for quantitative analysis of the results. The simulated results have been examined using the Friedman, Holm, and ANOVA tests under the null hypothesis for statistical validity and correctness of the results. The result analysis and statistical analysis show that the NFFR model achieves a considerable improvement in accuracy and is efficient in eliminating redundant and noisy information.
In the last few decades, machine learning [
The objective of this work is to develop a hybrid model, called NFFR, for data classification. In the first step, a Π-type membership function is employed for fuzzification of the input patterns. Then, a feature reduction algorithm is applied to the fuzzified variables after the fuzzification process (post-feature reduction) to develop the neuro-fuzzy with feature reduction (NFFR) model. In this model, post-feature reduction is employed on the fuzzified patterns to filter out irrelevant, redundant, and noisy features. Unlike pre-feature reduction, this allows all features to participate in the fuzzification process before irrelevant features are identified from the fuzzified patterns, making it possible to discover potentially useful fuzzified features derived from a weak original feature set. The NFFR model retains the fuzzified information that truly contributes to the network, speeding up the classification process by eliminating irrelevant fuzzified features. A major observation is that the overall running time of the algorithm decreases considerably when the FR algorithm, PCA, is used. Thus, the NFFR model not only provides more accurate results but also reduces execution time. We compare four models, namely ANN, ANN with FR using PCA (ANNFR), NF, and NFFR, on ten benchmark datasets from the UCI machine learning repository. Each dataset is evaluated on various performance measures such as root mean square error (RMSE), f-measure, precision, and recall derived from the confusion matrix.
The remainder of this paper is organized as follows: Section
Several soft computing techniques such as FS, NN, NF, and dimensionality reduction have played a critical role in the development of hybrid models over the last few decades. The hybridization of these techniques is considered one of the benchmark lines of work in data mining, machine learning, and pattern recognition. This literature review covers the most recent developments of the aforesaid models and their applications in various fields. FS, proposed by Zadeh, represents human perception, especially in diverse fields such as language communication, pattern recognition, and information abstraction, and thereby addresses uncertainty issues. These uncertainty problems can be resolved through different fuzzification techniques that convert the input features into their corresponding fuzzified feature sets. The fuzzification process can be realized in two ways: class-belongingness fuzzification and class non-belongingness fuzzification. Ghosh et al. [
PCA also plays an important role in eliminating redundant features from the input pattern, which improves system performance along with accuracy. It extracts the important information from the dataset and represents it as a new set of orthogonal variables called principal components. It is a statistical method that reduces the number of variables by grouping highly correlated ones. Polat and Güneş [
Over the last few decades, researchers have been designing hybrid systems that combine fuzzy systems and neural networks for pattern classification. The basic concepts of NF models, PCA, and the proposed hybrid NFFR model are presented in the following sections.
In real-world problems, uncertainty is one of the major challenges, leading to incomplete and imprecise information about the input data in pattern classification. Therefore, it is necessary to provide ample provision to handle uncertainty. In the NF model, instead of normal crisp values, fuzzy values are fed to the neural network. The fuzzification process generates a membership matrix in which the total number of elements equals the product of the number of features and the number of classes present in the dataset; this fuzzified matrix is the input to the neural network. The fuzzified input matrix carries the degree of belongingness with respect to the classes, which captures the feature-wise information of the input pattern. Each feature value of a pattern is mapped to membership values for each class, where the membership values are measured using Π-type membership functions as represented in Figure
Π-type membership function.
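As a concrete illustration, the standard Π-type membership function can be built from Zadeh's S-function. The sketch below is a minimal NumPy version, assuming a center `c` and radius `lam` as the two shape parameters; the paper's exact parameterization may differ:

```python
import numpy as np

def s_function(x, a, b, c):
    # Zadeh's S-function: 0 below a, 1 above c, smooth quadratic rise in between
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    y[x >= c] = 1.0
    m1 = (x >= a) & (x < b)
    y[m1] = 2.0 * ((x[m1] - a) / (c - a)) ** 2
    m2 = (x >= b) & (x < c)
    y[m2] = 1.0 - 2.0 * ((x[m2] - c) / (c - a)) ** 2
    return y

def pi_function(x, c, lam):
    # Pi-type membership: rises to 1 at the center c, falls back to 0
    # at distance lam on either side of the center
    x = np.asarray(x, dtype=float)
    left = s_function(x, c - lam, c - lam / 2.0, c)
    right = 1.0 - s_function(x, c, c + lam / 2.0, c + lam)
    return np.where(x <= c, left, right)
```

The function attains 1 at the center, 0.5 at the crossover points `c ± lam/2`, and 0 beyond `c ± lam`, matching the bell shape in the figure.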
The fuzzification process may result in high-dimensional data, where not every feature carries significant information for discriminating the patterns. Furthermore, this increase in dimension affects the complexity of machine learning algorithms. This section describes the working principle of the feature reduction algorithm PCA, which extracts the relevant features from the original feature set to reduce the dimensionality of the data. This is achieved by transforming the high-dimensional features into a smaller set of transformed features without losing the essential information of the original datasets. These new features, called principal components, are linear combinations of the original features along which the data vary. PCA retains only those components that capture a larger variance of the data. The major objectives of PCA are to identify the hidden patterns of the data, determine the correlation among the features, and decrease the dimensionality of the feature set by eliminating redundant and noisy features.
In this model, feature-wise information of the input patterns is extracted from the original data with respect to the different classes. Since not all features are equally important in discriminating the instances, the feature-wise belongingness is expected to help the classification process. In this section, a detailed schematic diagram of the novel NFFR model is presented for the classification of nonlinear data. The proposed model proceeds in three major steps: input feature fuzzification, feature reduction using PCA, and classification using an ANN with backpropagation learning.
Initially, the NFFR model extracts the feature-wise information of the input pattern into its corresponding fuzzified matrix using the class-belongingness fuzzification technique. In the present study, we use the popular Π-type membership function for fuzzification of the input pattern. Since not all features contribute significantly to the classification process, it is essential to find the class belongingness of each attribute. To achieve this, Π-type membership functions are used in the fuzzification process, providing the degree of belongingness of individual features with respect to the class labels. As a result, each feature value of the input patterns has been expanded to
Framework of the NFFR model.
Detailed working model of the proposed models.
In this step, the
In equation (
In equations (
After the fuzzification process, the fuzzified matrix
Fuzzification process of the petal width feature of the IRIS dataset.
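The class-belongingness expansion described above can be sketched as follows. This is a hypothetical NumPy illustration: it assumes the Π-function centers are taken as class-wise feature means and the radii from class-wise standard deviations, choices not fixed by the text here:

```python
import numpy as np

def pi_mf(x, c, lam):
    # Compact Pi-type membership built from two quadratic halves
    z = np.clip(np.abs(np.asarray(x, dtype=float) - c) / lam, 0.0, 1.0)
    return np.where(z <= 0.5, 1.0 - 2.0 * z ** 2, 2.0 * (1.0 - z) ** 2)

def fuzzify(X, y):
    # Expand each of the d features into one membership value per class:
    # an (n, d) pattern matrix becomes an (n, d * n_classes) fuzzified matrix.
    cols = []
    for j in range(X.shape[1]):
        for c in np.unique(y):
            xc = X[y == c, j]
            center = xc.mean()                   # assumed: class-wise mean
            radius = max(2.0 * xc.std(), 1e-6)   # assumed: class-wise spread
            cols.append(pi_mf(X[:, j], center, radius))
    return np.column_stack(cols)
```

For the IRIS dataset (4 features, 3 classes) this yields a 12-column fuzzified matrix, matching the features-times-classes expansion described above.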
Due to the expansion of the input features, the complexity of the model increases. To make the classification process more effective and efficient, PCA is used to reduce the features of the fuzzified membership matrix. In this step, the fuzzified membership matrix
Let the aforesaid fuzzified membership matrix
Let
The eigenvalues (
The output of this step is the reduced matrix (
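The reduction steps above can be sketched with an eigendecomposition of the covariance of the fuzzified matrix. The following is a minimal NumPy illustration, assuming components are retained until a chosen fraction of the total variance (here 95%, an illustrative threshold) is explained:

```python
import numpy as np

def pca_reduce(F, var_ratio=0.95):
    # Center the fuzzified membership matrix, then project it onto the
    # leading principal components that explain `var_ratio` of the variance.
    Fc = F - F.mean(axis=0)
    cov = np.cov(Fc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(explained, var_ratio)) + 1
    return Fc @ eigvecs[:, :k]                  # reduced matrix, shape (n, k)
```

Highly correlated fuzzified columns collapse onto a few components, which is exactly how the redundant expanded features are filtered out.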
In this step, the artificial neural network with backpropagation (ANN-BPN) model is used for the classification process. The ANN-BPN model uses backpropagation as a supervised learning algorithm to train the artificial neural network. It updates the weights of the model to minimize the loss by efficiently computing the gradients. The network takes as input the reduced fuzzified matrix generated in Step
In the feedforward step, the model is trained on the reduced fuzzified matrix. The net input is computed as the sum of the products of the input pattern and the assigned weights, plus the bias. Mathematically, the net input of the
In the backpropagation step, the error is computed by subtracting the actual output from the target output, and is expressed as follows:
The errors are computed in the same manner, and the weights and biases are updated during learning. The weights of the connections between the different layers are adjusted by computing the change in weights so as to reduce the overall error of the model, which is realized in equation (
The new weights and biases of the model are computed using the following equations, respectively:
This process is repeated until the root-mean-square error of the model is minimized or the stopping criterion is reached. The proposed approach differs from ANN (feedforward with backpropagation), ANNFR, and NF as follows: (i) in ANN, all input features are processed in parallel without removing insignificant features, which takes more time to train the model and also leaves the uncertainty problem unaddressed; (ii) the ANNFR model uses PCA in the preprocessing stage to eliminate insignificant features, but it is unable to address uncertainty; (iii) the NF model handles uncertainty through the class-belongingness fuzzification process, but it cannot eliminate redundant or noisy features; (iv) considering the aforesaid issues, the proposed model addresses uncertainty through class-belongingness fuzzification and also eliminates insignificant fuzzified features using PCA, rather than discarding entire original features that appear insignificant.
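The feedforward and backpropagation steps above can be sketched as a minimal one-hidden-layer network trained by batch gradient descent on the squared error. This is an illustrative NumPy implementation, not the paper's exact configuration; the layer size, learning rate, and epoch count are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpn(X, T, hidden=8, lr=0.5, epochs=2000, seed=0):
    # One-hidden-layer feedforward network trained with backpropagation.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # feedforward: hidden activations
        O = sigmoid(H @ W2 + b2)            # feedforward: network output
        dO = (O - T) * O * (1.0 - O)        # output-layer delta (error signal)
        dH = (dO @ W2.T) * H * (1.0 - H)    # hidden-layer delta
        W2 -= lr * (H.T @ dO)               # weight and bias updates
        b2 -= lr * dO.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
```

In the NFFR pipeline, `X` would be the PCA-reduced fuzzified matrix and `T` the one-hot class targets.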
In this section, the simulation environment and the datasets used in the training and testing phases of the proposed model are presented. The four models (ANN, NF, ANNFR, and NFFR) are implemented in Matlab (version R2015a) on the Windows 7 operating system. The benchmark datasets from the UCI machine learning repository [
Several performance measures, namely classification accuracy, root mean square error (RMSE), precision, recall, and f-measure, are obtained from the confusion matrix for all benchmark datasets of each model, and the comparison results are presented. The details of these performance measures are outlined below. The comparison of the RMSE of the ANN, NF, ANNFR, and NFFR models is shown in Table
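For reference, these confusion-matrix-based measures can be computed as sketched below for the binary case (for multi-class datasets the paper's per-class averaging scheme would apply; this sketch assumes a single positive class):

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    # True/false positives and negatives with respect to the positive class
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    tn = np.sum((y_pred != positive) & (y_true != positive))
    return tp, fp, fn, tn

def scores(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure

def rmse(target, output):
    # Root mean square error between target and network output
    t, o = np.asarray(target, dtype=float), np.asarray(output, dtype=float)
    return float(np.sqrt(np.mean((t - o) ** 2)))
```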
Comparison of RMS errors of ANN, NF, ANNFR, and NFFR models.
Datasets/models  ANN  NF  ANNFR  NFFR 

Iris  0.18721  0.10258  0.15471  0.11721 
Mammographic  0.30163  0.19111  0.26163  0.18031 
Breast Cancer  0.39734  0.17052  0.21637  0.02546 
Pima Indian  0.41819  0.29887  0.38248  0.30353 
Hayes-Roth  0.32242  0.14770  0.35525  0.16222 
Thyroid  0.15335  0.05315  0.14085  0.08838 
Titanic  0.19373  0.28599  0.07685  0.34237 
Wine  0.05249  0.03854  0.08942  0.10774 
Haberman  0.25329  0.33007  0.35470  0.35749 
Blood Transfusion Service  0.31176  0.31284  0.13597  0.29339 
Error plots of four datasets: (a) Titanic; (b) Mammographic; (c) Breast Cancer; (d) Wine.
The results presented here are based exclusively on the observed experiments. Table
Comparison of classification accuracy of ANN, NF, ANNFR, and NFFR.
Datasets/models  ANN  NF  ANNFR (studied)  NFFR (proposed)  

WC  AC  BC  WC  AC  BC  WC  AC  BC  WC  AC  BC  
Iris  85.71  91.56  94.44  90.00  92.63  97.06  90.63  94.02  98.02  91.43  95.15  99.26 
Mammographic Mass  73.91  79.21  84.62  79.99  84.19  87.35  82.53  84.50  86.36  83.05  85.31  87.72 
Breast Cancer Wisconsin  85.42  91.07  95.00  91.18  93.98  96.48  93.66  95.39  97.74  93.86  95.44  97.04 
Pima Indian diabetes  65.52  72.96  78.41  71.76  78.78  86.84  76.00  81.66  86.49  76.74  82.55  85.94 
Hayes-Roth  56.25  66.34  76.00  60.00  73.85  83.33  69.23  75.20  83.87  70.59  77.15  86.95 
Thyroid  83.72  86.76  92.85  86.04  92.01  96.07  88.46  92.53  97.29  92.98  95.60  99.55 
Titanic  75.13  77.64  81.87  76.00  79.25  83.74  77.00  80.54  85.00  77.10  81.47  85.94 
Wine  85.29  91.24  94.12  88.88  93.1  97.22  90.90  93.94  98.07  90.62  95.59  99.88 
Haberman  70.21  75.42  78.79  75.00  78.05  82.81  76.60  80.29  83.93  77.97  81.02  86.89 
Blood Transfusion Service Center  73.37  75.86  80.00  73.61  76.79  80.41  76.00  80.14  84.50  76.58  80.22  84.91 
WC: worst case; AC: average case; BC: best case.
Comparison of different performance parameters (precision, recall, and f-measure) of ANN, NF, ANNFR, and NFFR models.
Precision, recall, and f-measure of models  

Datasets/Models  ANN  NF  ANNFR  NFFR  
P  R  FM  P  R  FM  P  R  FM  P  R  FM  
Iris  0.918  0.910  0.911  0.926  0.928  0.923  0.947  0.942  0.940  0.950  0.953  0.948 
Mammographic  0.733  0.523  0.610  0.846  0.545  0.663  0.808  0.491  0.610  0.852  0.532  0.653 
Breast Cancer  0.972  0.359  0.501  0.914  0.343  0.499  0.942  0.327  0.485  0.925  0.324  0.480 
Pima Indian  0.393  0.181  0.246  0.903  0.750  0.819  0.633  0.263  0.368  0.936  0.723  0.815 
Hayes-Roth  0.683  0.706  0.673  0.781  0.754  0.751  0.750  0.751  0.734  0.801  0.758  0.756 
Thyroid  0.953  0.751  0.803  0.915  0.868  0.875  0.933  0.831  0.863  0.932  0.944  0.933 
Titanic  0.762  0.389  0.512  0.781  0.681  0.728  0.840  0.324  0.465  0.816  0.686  0.740 
Wine  0.9155  0.9289  0.9186  0.9361  0.9419  0.9359  0.8512  0.9437  0.9363  0.9629  0.9544  0.9569 
Haberman  0.6233  0.0807  0.1417  0.635  0.0699  0.1245  0.7014  0.074  0.132  0.5233  0.0308  0.0576 
Blood Transfusion  0.5015  0.0839  0.0574  0.7283  0.9923  0.8703  0.129  0.1015  0.1011  0.8105  0.9741  0.8843 
P: precision; R: recall; FM: f-measure.
In Tables
Friedman’s rank to all the models based on avg. classification accuracy.
Datasets/models  ANN  NF  ANNFR  NFFR 

Iris  91.56 (4)  92.63 (3)  94.02 (2)  95.15 (1) 
Mammographic Mass  79.21 (4)  84.19 (3)  84.50 (2)  85.31 (1) 
Breast Cancer Wisconsin  91.07 (4)  93.98 (3)  95.39 (1)  95.04 (2) 
Pima Indian Diabetes  72.96 (4)  78.78 (3)  81.66 (2)  82.55 (1) 
Hayes-Roth  66.34 (4)  73.85 (3)  75.20 (2)  77.15 (1) 
Thyroid  86.76 (4)  92.01 (3)  92.53 (2)  95.60 (1) 
Titanic  77.64 (4)  79.25 (3)  80.54 (2)  81.47 (1) 
Wine  91.24 (4)  93.1 (3)  93.94 (2)  95.59 (1) 
Haberman  75.42 (4)  78.05 (3)  80.29 (2)  81.02 (1) 
Blood Transfusion Service  75.86 (4)  76.79 (3)  80.14 (2)  80.22 (1) 
Friedman’s Rank  4  3  1.9  1.1 
In this experiment, a few parameters were considered in the design of the models. The fuzzy expansion, number of input neurons, and number of output neurons for the ten datasets are presented in Table
Parameters of the models.
Datasets  Fuzzy expansion  No. of input neurons  No. of output neurons 

Iris  3  4  3 
Mammographic  2  5  2 
Breast Cancer  2  9  2 
Pima Indian  2  8  2 
Hayes-Roth  3  6  3 
Thyroid  3  6  3 
Titanic  2  14  2 
Wine  3  14  3 
Haberman  2  4  2 
Blood Transfusion Service  2  5  2 
The complexity analysis describes how efficient the proposed NFFR model is. The model comprises three components: fuzzification, feature reduction, and ANN classification. The fuzzification step requires a constant amount of time for the initialization of parameters, which takes
Statistical analysis is a well-known means of analyzing the performance of various models across several datasets. Generally, different statistical tools are used to analyze the nature of the data and the algorithms. In this section, statistical analysis [
ANOVA [
ANOVA test with descriptive statistics.
Models 

Mean  Std. deviation  Std. error  95% confidence interval for mean  Minimum  Maximum  

Lower bound  Upper bound  
ANN  10  80.806  8.83292  2.79321  74.4873  87.1247  66.34  91.56 
NF  10  84.263  7.89241  2.4958  78.6171  89.9089  73.85  93.98 
ANNFR  10  85.821  7.39622  2.33889  80.5301  91.1119  75.2  95.39 
NFFR  10  86.91  7.53305  2.38216  81.5212  92.2988  77.15  95.6 
Total  40  84.45  7.97159  1.26042  81.9006  86.9994  66.34  95.6 
ANOVA test with mean square and sum of squares.
Sum of squares  d 
Mean square 

Sig.  

Between groups  (Combined)  212.449  3  70.816  1.125  0.352  
Linear term  Contrast  197.408  1  197.408  3.136  0.085  
Deviation  15.041  2  7.521  0.119  0.888  
Quadratic term  Contrast  14.019  1  14.019  0.223  0.64  
Deviation  1.022  1  1.022  0.016  0.899  
Within groups  2265.856  36  62.94  
Total  2478.305  39 
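The between-group and within-group decomposition underlying the ANOVA table can be reproduced directly. The sketch below implements the classic one-way ANOVA F-statistic in NumPy, as an illustration of the test's mechanics rather than of any particular statistics package:

```python
import numpy as np

def one_way_anova(groups):
    # One-way ANOVA: F = (between-group mean square) / (within-group mean square)
    all_vals = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand = all_vals.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g, dtype=float) - np.mean(g)) ** 2).sum()
                    for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between, ss_within
```

Feeding in the four models' average accuracies from the Friedman rank table reproduces the combined between-groups sum of squares (about 212.45) and F ≈ 1.125 reported above.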
To reject the null hypothesis, the Tukey test [
Tukey test and Dunnett test.
(I) Algo  (J) Algo  Mean difference (IJ)  Std. error  Sig.  95% confidence interval  

Lower bound  Upper bound  
Tukey HSD  ANN  NF  −3.457  3.54797  0.765  −13.0125  6.0985 
ANNFR  −5.015  3.54797  0.499  −14.5705  4.5405  
NFFR  −6.104  3.54797  0.328  −15.6595  3.4515  
NF  ANN  3.457  3.54797  0.765  −6.0985  13.0125  
ANNFR  −1.558  3.54797  0.971  −11.1135  7.9975  
NFFR  −2.647  3.54797  0.878  −12.2025  6.9085  
ANNFR  ANN  5.015  3.54797  0.499  −4.5405  14.5705  
NF  1.558  3.54797  0.971  −7.9975  11.1135  
NFFR  −1.089  3.54797  0.99  −10.6445  8.4665  
NFFR  ANN  6.104  3.54797  0.328  −3.4515  15.6595  
NF  2.647  3.54797  0.878  −6.9085  12.2025  
ANNFR  1.089  3.54797  0.99  −8.4665  10.6445  


Dunnett 
ANN  NFFR  −6.104  3.54797  0.221  −14.8041  2.5961 
NF  NFFR  −2.647  3.54797  0.798  −11.3471  6.0531  
ANNFR  NFFR  −1.089  3.54797  0.98  −9.7891  7.6111 
Means for groups in homogeneous subsets are displayed. ^{a}Dunnett
Homogeneous group of models based on their level of significance.
Algorithms 

Subset for alpha = 0.05  

1  
Tukey HSD^{a}  ANN  10  80.806 
NF  10  84.263  
ANNFR  10  85.821  
NFFR  10  86.91  
Sig.  0.328  


Duncan^{a}  ANN  10  80.806 
NF  10  84.263  
ANNFR  10  85.821  
NFFR  10  86.91  
Sig.  0.124 
^{a}Uses harmonic mean sample size = 10.000.
Friedman test [
The average ranks of the four models, ANN, ANNFR, NF, and NFFR, are computed from the assigned ranks, as represented in equation (
The Friedman statistics
The performances of the models are considered different if the corresponding average ranks differ by at least the critical difference. The critical value is computed as 4.6 with (4 − 1 = 3) and (4 − 1 = 3) × (10 − 1 = 9) degrees of freedom and significance level
Density plot.
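The Friedman statistic and its Iman-Davenport F correction can be computed directly from the average ranks in the rank table above; the sketch below assumes the usual chi-square formulation over N datasets and k models:

```python
def friedman_statistic(avg_ranks, n_datasets):
    # Friedman chi-square over k models ranked on N datasets,
    # plus the Iman-Davenport F-distributed correction.
    k = len(avg_ranks)
    n = n_datasets
    chi2 = (12.0 * n / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )
    f_stat = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_stat
```

With the average ranks 4, 3, 1.9, and 1.1 from the table and N = 10, this gives a chi-square of 28.92 and F ≈ 241, far above the critical value of 4.6 quoted above, consistent with rejecting the null hypothesis of equal performance.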
The Holm procedure [
Results of the Holm statistical test.
Models  Models 




1  NFFR: ANN  5.02  2.584871e-7  0.0033 
2  NFFR: ANNFR  3.29  0.000501  0.005 
3  NFFR: NF  1.38  0.083793  0.01 
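The step-down comparison in the table above follows Holm's procedure; a minimal sketch, assuming the adjusted thresholds alpha/(m − i) shown in the last column with alpha = 0.01:

```python
def holm_reject(p_values, alpha=0.01):
    # Holm step-down test: sort p-values ascending and compare the i-th
    # smallest against alpha / (m - i); stop at the first non-rejection.
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break
    return reject
```

With the three p-values from the table, the NFFR:ANN and NFFR:ANNFR comparisons are rejected (their p-values fall below 0.0033 and 0.005, respectively), while NFFR:NF is not.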
In this paper, the proposed NFFR model is successfully demonstrated for solving data classification problems in data mining. Initially, the model uses the fuzzification process to expand the input features according to the class-wise belongingness of the features to the various classes, which helps handle imprecise and uncertain information. Due to the expansion of features, the model structure becomes massively parallel, and it is observed that not all features contribute significantly to the model. In the next step, PCA is applied to reduce the dimension of the expanded features by selecting the most relevant and non-redundant features. As a result, the learning time of the proposed model is also reduced with the selected relevant features. However, the particular Π-type membership function considered for the fuzzification process may not always be suitable for every dataset; in such cases, the selection of a suitable membership function may be taken into consideration for data analysis. The experimental analysis shows that the proposed method classifies the datasets with superior performance compared with the ANN, NF, and ANNFR models. The statistical analysis confirms that the proposed NFFR model is valid and efficient compared with the ANN, NF, and ANNFR models. In the future, the proposed model can be applied to various real-life problems such as gene expression classification, document classification, and satellite image classification.
The data used to support the findings of this study are included in
This study was not funded by any research organization.
All authors declare that there are no conflicts of interest.
This research work was supported by the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), New Delhi, Govt. of India, under the research project grant Sanction Order No. EEQ/2017/000355.