An Adaptive Fuzzy Min-Max Neural Network Classifier Based on Principle Component Analysis and Adaptive Genetic Algorithm

A novel adaptive fuzzy min-max neural network classifier called AFMN is proposed in this paper. Combined with principal component analysis and an adaptive genetic algorithm, this integrated system can serve as a supervised, real-time classification technique. Considering the loophole in the expansion-contraction process of FMNN and GFMN and the overly complex network architecture of FMCN, AFMN maintains the simple architecture of FMNN for fast learning and testing while rewriting the membership function and the expansion and contraction rules for hyperbox generation to resolve the confusion in hyperbox overlap regions. Meanwhile, principal component analysis is adopted to reduce the dimensionality of the dataset and so increase learning efficiency. After training, the confidence coefficient of each hyperbox is calculated based on the distribution of samples. During classification, an adaptive genetic algorithm performs parameter optimization for AFMN, which is faster than the traversal method. When training samples are insufficient, updating the data core weights is indispensable for enhancing the robustness of the classifier, and the modified membership function can adjust itself according to the input varieties. The paper demonstrates the performance of AFMN on a range of examples in terms of classification accuracy and operating speed by comparing it with FMNN, GFMN, and FMCN.


Introduction
The emergence of fuzzy set theory [1-6] has stimulated its development in pattern recognition and classification. The capacity of fuzzy logic to separate complex class boundaries has produced many achievements in neuro-fuzzy pattern recognition systems [7-23]. The fuzzy min-max neural network (FMNN) proposed in [24] laid a solid foundation for further research in this field. FMNN uses hyperbox fuzzy sets to represent regions of the n-dimensional pattern space; input samples that fall inside a hyperbox have full membership. An n-dimensional hyperbox is defined by stating its min and max vertices. The algorithm finds suitable hyperboxes for the input patterns through a three-step process: expansion, overlap test, and contraction. However, the contraction of hyperboxes of different classes may lead to classification errors, as demonstrated in [25], and the performance of FMNN depends heavily on the order in which the training data are presented and on the expansion coefficient, which controls the size of the hyperboxes.
GFMN [26] is also an online classifier based on the hyperbox fuzzy set concept. Its improvement lies in a new membership function that decreases monotonically with growing distance from a cluster prototype, thus eliminating the likely confusion between cases of equally likely and unknown inputs [26]. But the contraction problem remains. The same holds for the modified fuzzy min-max neural network with a genetic-algorithm-based rule extractor for pattern classification, although its use of a genetic algorithm to minimize the number of features of the input dataset is creative [27]. In [25], a fuzzy min-max neural network classifier with compensatory neuron architecture (FMCN) is reported. This method introduces compensatory neurons to handle the confusion in overlap regions and dispenses with the contraction process. However, FMCN does not allow hyperboxes of different classes to overlap, which increases the number of neurons in the middle layer of the network and thus consumes much more time during training and testing. The algorithm also distinguishes between simple overlap and containment. Even though FMCN performs better than FMNN and GFMN in most cases, its structural complexity is higher and it consumes more time during training and testing. Moreover, it omits one kind of overlap [28], which results in classification errors. Another improved network based on the data core is the data-core-based fuzzy min-max neural network (DCFMN). DCFMN [28] can adjust the membership function according to the distribution of samples in a hyperbox to obtain higher classification accuracy, and its structure is simpler than that of FMCN. However, none of these four networks performs well with relatively insufficient training samples. A weighted fuzzy min-max neural network (WFMM) is proposed in [29]. The membership function of WFMM is designed to take the frequency of input patterns into consideration.
The proposed AFMN has advantages in several respects. First, AFMN maintains the simple architecture of FMNN and adds a preprocessing stage for input patterns based on principal component analysis (PCA) [30]. This dimensionality reduction technique reduces the number of features of the input patterns and extracts the useful information. Without preprocessing of the training dataset, it is hard to implement the classifier in practice because of the high dimensionality, the redundancy, and even the noise inherent in the input patterns.
Second, considering that a hyperbox may contain patterns of more than one class, it is not reasonable to allocate full membership to every input pattern that falls in the hyperbox. A confidence coefficient for each hyperbox is therefore introduced to resolve this confusion and achieve higher classification accuracy.
Third, the membership function is modified, inspired by the data core of DCFMN. The data core, which can update itself during testing, helps adjust the membership according to the distribution of the training samples. Loopholes in the overlap test cases of FMNN, GFMN, and FMCN are identified and resolved by rewriting the rules. Meanwhile, an adaptive genetic algorithm (AGA) [31-33] is used in the classifying algorithm for parameter optimization instead of the traversal method, to improve the speed and accuracy of the entire neural network classifier.
Finally, this new classifier is not only an original theoretical attempt but also an important initial step toward its application to working condition recognition for operating pipelines, a typical nonlinear control system [34-39].
The rest of the paper is organized as follows. Section 2 analyzes the traditional fuzzy neural network classifiers. Section 3 introduces the AFMN classifier system in detail. Section 4 provides examples that demonstrate the performance of AFMN. Section 5 concludes with a summary.

Analysis of Precedent Fuzzy Min-Max Neural Network Classifier
The FMNN learning algorithm consists of three procedures: (1) expansion, (2) overlap test, and (3) contraction. Its rule is to find a suitable hyperbox for each input pattern. If an appropriate hyperbox exists, its size after expansion cannot exceed the minimum and maximum limits; otherwise, a new hyperbox needs to be added to the network. After expansion, all hyperboxes that belong to different classes have to be checked by the overlap test to determine whether any overlap exists, so a dimension-by-dimension comparison between hyperboxes of different classes is performed. FMNN designs four test cases; if at least one of the four cases is satisfied, overlap exists between the two hyperboxes. If no overlap occurs, the hyperboxes are isolated and no contraction is required. Otherwise, a contraction process is needed to eliminate the confusion in the overlapped areas.
GFMN focuses on the disadvantages of the membership function proposed in FMNN and proposes an improved membership function whose value decreases steadily as input patterns move farther away from the hyperbox.
FMCN distinguishes between simple overlap and containment and introduces overlapped compensation neurons (OCNs) and containment compensation neurons (CCNs) to resolve the confusion in the overlap region.
However, there exist two cases in the overlap area in which FMNN, GFMN, and FMCN cannot adjust the hyperboxes properly. Figure 1 depicts the two hyperbox overlap cases. The positions of the minimum and maximum points are described below:

2.1
When input data satisfying this condition are trained according to the overlap test rules designed in FMNN, GFMN, and FMCN, the overlap cannot be detected because the hyperboxes do not satisfy any of the four cases in the overlap test. However, the two hyperboxes in Figure 1(a) are clearly partly overlapped, and the two hyperboxes in Figure 1(b) are fully overlapped. This shows that a loophole exists in the overlap test cases of the three algorithms. In the case depicted in Figure 1(b) in particular, the network cannot cancel one of the two identical hyperboxes, which means that the same hyperbox is created twice. Meanwhile, if overlap occurs between two hyperboxes of the same class, the number of nodes increases and so does the computational complexity. Figure 2 illustrates this situation again: there should be four hyperboxes after training, but the overlap test counts five. As the preceding discussion shows, the overlap test cases are incomplete and need revising. Another disadvantage of the traditional classifiers, shared by DCFMN, is that they do not verify the efficiency of a hyperbox.

Mathematical Problems in Engineering
The idea of testing the efficiency of a hyperbox is inspired by the situation in which a hyperbox contains input patterns of more than one class. For convenience of explanation, we call input patterns of the class to which a hyperbox belongs primary patterns (PPs) and those of any other class subordinate patterns (SPs). Figure 3 shows the hyperboxes generated according to the learning algorithms of FMNN and DCFMN. Among them, hyperboxes 1-3 belong to class 1 and hyperbox 4 belongs to class 2. In hyperbox 1 of class 1 there are more SPs than PPs, which shows that the creation of this hyperbox is not appropriate and may have a negative impact on classification.
Meanwhile, in other traditional fuzzy min-max neural network classifiers, the input data are not preprocessed before training. The redundancy and noise of the data can undermine classification performance and consume more time during training and testing. In AFMN, the problem is solved by using principal component analysis (PCA) to reduce the dimensionality of the input data and by adopting a genetic algorithm to quickly select the optimal parameter combination during the test procedure instead of the traversal method.

Confidence Coefficient
The hyperboxes generated during training have different sizes, and the input patterns included in a hyperbox may belong to different classes, which means that a hyperbox cannot guarantee that an input pattern falling within it fully belongs to its class. Figure 4 shows a hyperbox creation result in which there are input patterns of three classes A, B, and C. It is clearly not rational to regard the membership of all input patterns that fall in hyperbox B as 1, because PPs and SPs coexist in the same hyperbox. This problem can be removed by accounting for the proportion of PPs to total patterns in the same hyperbox: calculating this proportion yields the confidence coefficient of each hyperbox. Let $H = \{\eta_1, \eta_2, \ldots\}$, where $\eta_k$ is the confidence coefficient of the $k$th hyperbox. Two possibilities have to be considered when designing $H$.
(1) As discussed before, we call input patterns of the class to which a hyperbox belongs primary patterns (PPs) and those of any other class subordinate patterns (SPs). We should consider the proportion of PPs to total patterns as well as the proportion between PP and SP patterns. Weighting the patterns by introducing a parameter $\xi$ is therefore necessary, which means assigning a weight value to each class.
The solution consists of two steps.
Step 1. Compute Weight Value ξ for Each Class.
Suppose there are $p$ classes and the numbers of input patterns in the classes are $\phi_1, \phi_2, \ldots, \phi_p$, respectively; the function that decides the weight value $\xi_k$ for each class is given by
For the $j$th hyperbox $b_j$ with $b_j \in c_k$, the corresponding $\eta_j$ is given by
where $\varphi_{jk}$ ($k = 1, 2, \ldots, p$) represents the number of input patterns of class $k$ in hyperbox $b_j$, $j = 1, 2, \ldots, m$, and $m$ is the number of hyperboxes. The value of $\eta_j$ is then decided by a rule involving the parameter $\beta$, which ranges from 0.1 to 1.
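To make the idea of a class-weighted proportion of PPs concrete, the following minimal sketch computes one possible confidence coefficient per hyperbox. It is an illustration under stated assumptions, not the expression used for AFMN: the weight $\xi_k$ is assumed to be inversely proportional to the class size $\phi_k$ and normalized to sum to 1, $\eta_j$ is assumed to be the weighted share of PPs among all patterns in hyperbox $b_j$, and the role of $\beta$ is not modeled. The function and argument names are hypothetical.

```python
import numpy as np

def confidence_coefficients(counts, box_class, class_sizes):
    """Assumed form of the confidence coefficient (illustration only).

    counts[j, k]   : number of training patterns of class k inside hyperbox j
    box_class[j]   : index of the class that hyperbox j belongs to
    class_sizes[k] : total number of training patterns of class k (phi_k)
    """
    counts = np.asarray(counts, dtype=float)
    class_sizes = np.asarray(class_sizes, dtype=float)
    # Assumption: rarer classes get larger weights, normalized to sum to 1.
    xi = 1.0 / class_sizes
    xi = xi / xi.sum()
    # Weight each per-class pattern count by the weight of its class.
    weighted = counts * xi
    # Weighted count of primary patterns (PPs) in each hyperbox.
    primary = weighted[np.arange(len(box_class)), box_class]
    # Share of weighted PPs among all weighted patterns in the hyperbox.
    return primary / weighted.sum(axis=1)
```

With balanced classes this reduces to the plain proportion of PPs in each hyperbox; with imbalanced classes, a few SPs from a small class lower the confidence more than the same number of SPs from a large class.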

AFMN Architecture Overview
The architecture of AFMN is shown in Figure 5. The connections between the input and middle layers are stored in matrices V and W. The connections between the middle and output layers are binary valued and stored in U. The equation for assigning the values from $b_j$ to the output layer node $c_i$ is as follows:
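In Simpson's original FMNN, the binary matrix U has $u_{ji} = 1$ if hyperbox $b_j$ belongs to class $c_i$ and 0 otherwise, and each output node takes the maximum membership among the hyperboxes of its class. The sketch below illustrates that convention only; whether AFMN additionally scales the memberships (for example by the confidence coefficient) before taking the maximum is not assumed here, and the function names are hypothetical.

```python
import numpy as np

def build_U(hyperbox_classes, n_classes):
    """Binary middle-to-output connections: U[j, i] = 1 iff hyperbox j is of class i."""
    U = np.zeros((len(hyperbox_classes), n_classes), dtype=int)
    U[np.arange(len(hyperbox_classes)), hyperbox_classes] = 1
    return U

def class_outputs(b, U):
    """Output node i takes the largest membership among hyperboxes of class i."""
    b = np.asarray(b, dtype=float)
    return (b[:, None] * U).max(axis=0)
```

The predicted class of an input is then the index of the largest entry of `class_outputs(b, U)`, where `b` holds the hyperbox memberships of that input.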

Fuzzy Hyperbox Membership Function
The membership function $b_j(X_h)$ for an input $X_h$ is given by
where $d_{ji} = c_{ji} - (v_{ji} + w_{ji})/2$; $c_{ji}$ is the geometrical core, known as the data core. It is given by
where $N_j$ is the number of patterns belonging to the hyperbox's class and $x^{q}_{hi}$ are the patterns belonging to the hyperbox's class.

$\lambda$ is given by
$\lambda_j = \max(\varphi) / \varphi_j, \quad j = 1, \ldots, m,$ (3.7)
where $\varphi_j$ indicates the number of PPs in hyperbox $j$. $f$ is a two-parameter ramp threshold function, defined in (3.8) for $o = 1, 2$.

Data Preprocessing by Principal Component Analysis (PCA)
Principal component analysis is chosen as a dimensionality reduction technique that removes redundant features from the input data. The reduced input data accelerate the training and testing procedures and improve network performance, because PCA extracts the primary features of the original dataset and so avoids the influence of its redundancy and noise. In this paper, the number of features retained is the number of dimensions needed to include 80% of the total information.
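The 80% rule above can be sketched as follows. This is a minimal illustration that assumes "information" means the cumulative explained variance of the principal components; the function name `pca_reduce` and its arguments are hypothetical.

```python
import numpy as np

def pca_reduce(X, info_ratio=0.80):
    """Project X (samples x features) onto the fewest principal components
    whose cumulative explained variance reaches info_ratio."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center each feature
    # SVD of the centered data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                 # variance ratio per component
    # Smallest k such that the first k components reach info_ratio.
    k = int(np.searchsorted(np.cumsum(explained), info_ratio)) + 1
    k = min(k, len(s))
    components = Vt[:k]
    return Xc @ components.T, components, mean, explained[:k]
```

Test patterns should be centered with the training `mean` and projected with the same `components`, so that training and testing use one consistent feature space.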

Hyperbox Expansion
This procedure decides the number and the min-max points of the hyperboxes. Its rule is as follows.
The hyperbox is expanded if the following criterion is satisfied
where $\theta$ ($0 < \theta \le 1$) controls the size of a hyperbox.
If the expansion criterion has been met, the minimum and maximum points of the hyperbox are adjusted using the following equation

(3.10)
Otherwise, a new hyperbox is created and its min and max points are set as below: $v_{ji} = w_{ji} = x_{hi}, \quad i = 1, 2, \ldots, n.$ (3.11)

Repeat the procedure until all the input patterns have been used for training.
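The expand-or-create logic can be sketched as below. The sketch assumes Simpson's original FMNN criterion, $n\theta \ge \sum_{i=1}^{n} (\max(w_{ji}, x_{hi}) - \min(v_{ji}, x_{hi}))$, together with the usual min/max update; the criterion actually used by AFMN may differ. For brevity it expands the first same-class hyperbox that satisfies the criterion instead of ranking candidates by membership. All names are hypothetical.

```python
import numpy as np

def expand_or_create(V, W, labels, x, y, theta):
    """Expand a hyperbox of class y to include x, or create a new one.

    V, W   : lists of min and max points (one numpy array per hyperbox)
    labels : list with the class of each hyperbox
    Returns the index of the hyperbox that now contains x.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    for j in range(len(V)):
        if labels[j] != y:
            continue                       # only same-class hyperboxes may expand
        new_v = np.minimum(V[j], x)        # candidate min point
        new_w = np.maximum(W[j], x)        # candidate max point
        # Assumed criterion (Simpson's FMNN): total edge length at most n * theta.
        if np.sum(new_w - new_v) <= n * theta:
            V[j], W[j] = new_v, new_w
            return j
    # No hyperbox could be expanded: create a point hyperbox, v = w = x.
    V.append(x.copy())
    W.append(x.copy())
    labels.append(y)
    return len(V) - 1
```

A smaller `theta` yields more, smaller hyperboxes; a larger `theta` yields fewer, larger ones, which is why the expansion coefficient has such a strong effect on the network size.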

Hyperbox Overlap Test
As previously stated, new cases are needed to resolve the problem that exists in FMNN. First, to test whether two hyperboxes are fully overlapped, we design the case below:
$v_{ji} = v_{ki} < w_{ji} = w_{ki}.$ (3.12)
If this case is satisfied, two hyperboxes of the same class fully overlap, and one of them is removed from the network.
Assuming $\alpha^{old} = 1$ and $\Delta = 1$ initially, for hyperbox $j$ and hyperbox $k$ the four overlap cases and the corresponding overlap values for the $i$th dimension are given as follows.
Case 1. $v_{ji} \le v_{ki} < w_{ji} \le w_{ki}$. One has $\alpha^{new} = \min(w_{ji} - v_{ki}, \alpha^{old})$. (3.13)
Case 2. $v_{ki} \le v_{ji} < w_{ki} \le w_{ji}$. One has $\alpha^{new} = \min(w_{ki} - v_{ji}, \alpha^{old})$. (3.14)
Case 3. $v_{ji} < v_{ki} < w_{ki} < w_{ji}$. One has $\alpha^{new} = \min(\min(w_{ki} - v_{ji}, w_{ji} - v_{ki}), \alpha^{old})$. (3.15)
Case 4. $v_{ki} < v_{ji} < w_{ji} < w_{ki}$ (the mirror image of Case 3, with the roles of hyperboxes $j$ and $k$ exchanged).
If any dimension cannot satisfy any of the four cases, then $\Delta = 0$. Otherwise, if $\Delta \ne 0$, there is overlap between hyperbox $j$ and hyperbox $k$.
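The overlap cases above can be sketched as follows. This is a minimal illustration under two assumptions: the overlap value for Case 4 is taken to be the Case 3 expression with $j$ and $k$ exchanged, and $\Delta$ is updated to the index of the dimension with the smallest overlap whenever $\alpha^{new} < \alpha^{old}$, as in Simpson's FMNN. Function names are hypothetical.

```python
import numpy as np

def fully_overlapped(vj, wj, vk, wk):
    """Case (3.12): identical min and max points in every dimension."""
    vj, wj, vk, wk = (np.asarray(a, dtype=float) for a in (vj, wj, vk, wk))
    return bool(np.all(vj == vk) and np.all(wj == wk) and np.all(vj < wj))

def overlap_test(vj, wj, vk, wk):
    """Return (delta, alpha): delta is the 1-based index of the dimension with
    the smallest overlap, or 0 if the two hyperboxes do not overlap."""
    alpha_old, delta = 1.0, 1
    for i, (vji, wji, vki, wki) in enumerate(zip(vj, wj, vk, wk), start=1):
        if vji <= vki < wji <= wki:            # Case 1
            alpha_new = min(wji - vki, alpha_old)
        elif vki <= vji < wki <= wji:          # Case 2
            alpha_new = min(wki - vji, alpha_old)
        elif vji < vki < wki < wji:            # Case 3
            alpha_new = min(min(wki - vji, wji - vki), alpha_old)
        elif vki < vji < wji < wki:            # Case 4 (assumed mirror of Case 3)
            alpha_new = min(min(wji - vki, wki - vji), alpha_old)
        else:
            return 0, 0.0                      # no overlap along this dimension
        if alpha_new < alpha_old:              # remember the smallest overlap so far
            alpha_old, delta = alpha_new, i
    return delta, alpha_old
```

Because Cases 1 and 2 use non-strict inequalities, hyperboxes that share a min or max coordinate, including two identical hyperboxes, are reported as overlapping instead of slipping through the test.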

Hyperbox Contraction
If overlap exists between hyperboxes of different classes, the network will assign a membership of 1 to the overlap region, thus generating classification confusion. Only one of the $n$ dimensions needs to be adjusted, so as to keep the hyperboxes as large as possible. If $\Delta = i$, then the $\Delta$th dimension is the one to select. The adjustment should be made as follows.

(3.20)
Through all the preceding procedures, the parameters V and W are determined. The entire learning procedure is summarized in Figure 6.

Genetic Algorithm in Network Classifying Procedure
The GA is given the task of finding the best parameter combination instead of the traversal method. Compared with the traditional traversal method of searching for the parameters with which the network achieves its best performance, the genetic algorithm has two advantages.
(a) For the traversal method, choosing an appropriate step size is an obstacle. A step size that is too small achieves better classification performance at the cost of more time. Otherwise, the testing procedure is fast at the cost of relatively low accuracy.
(b) High classification accuracy requires a small step size, and the genetic algorithm completes this task faster than traversal.
The GA fitness function used is defined as
The genetic operation implemented consists of the following six steps.
Step 1 (initialization). Set the range for each parameter and initialize the population strings in each generation. Here $\theta$ ranges from 0 to 1, $\beta$ ranges from 0.1 to 1, and $\gamma$ ranges from 1 to 10.
Step 2 (selection). Select a certain number of pairs of strings from the current population according to the rule known as roulette wheel selection.
Step 3 (crossover). For each selected pair, choose the bit position for crossover. The rule is specified as below:
where $f$ indicates the larger fitness value in the pair, $f_{max}$ is the maximum fitness value, and $f_{avg}$ is the average fitness value of the current population.
Step 4 (mutation). For each bit value of the strings, apply the following mutation operation with the probability defined as below:
where $f$ is the fitness value of the individual being mutated.
Step 5 (elitist strategy). Select the string with maximum fitness and pass it to the next generation directly.
Step 6 (termination test). Here we use the number of generations as the termination condition for the genetic algorithm.
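The adaptive crossover and mutation probabilities of Steps 3 and 4, together with the roulette wheel selection of Step 2, can be sketched as follows. This is a minimal sketch that assumes the widely used adaptive scheme of Srinivas and Patnaik, in which the probability shrinks linearly to 0 as an individual's fitness approaches $f_{max}$ and stays constant for below-average individuals. The constants `k1` to `k4`, the handling of the degenerate case $f_{max} = f_{avg}$, and all function names are illustrative assumptions, not values taken from this paper.

```python
import random

def adaptive_prob(f, f_max, f_avg, k_scaled, k_const):
    """Constant probability for below-average individuals; otherwise a
    probability that decreases linearly to 0 as f approaches f_max."""
    if f < f_avg or f_max == f_avg:      # second test avoids division by zero
        return k_const
    return k_scaled * (f_max - f) / (f_max - f_avg)

def crossover_prob(f_pair_best, f_max, f_avg, k1=1.0, k3=1.0):
    """f_pair_best is the larger fitness value of the selected pair."""
    return adaptive_prob(f_pair_best, f_max, f_avg, k1, k3)

def mutation_prob(f, f_max, f_avg, k2=0.5, k4=0.5):
    """f is the fitness value of the individual being mutated."""
    return adaptive_prob(f, f_max, f_avg, k2, k4)

def roulette_select(population, fitness, rng=random):
    """Roulette wheel selection: pick one individual with probability
    proportional to its (non-negative) fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]
```

Under this scheme the best individuals are disturbed least, which complements the elitist strategy of Step 5, while below-average individuals are recombined and mutated at the full rate to keep the search exploring.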

The Entire Classifier System
The learning and classification algorithm can be summarized in the flowchart in Figure 7.

Examples to Demonstrate the Effectiveness of Overlap Test and Contraction Cases
As in the previous discussion of the cases represented in Figures 1(a) and 1(b), when overlap occurs in such a case, the overlap and contraction algorithms of FMNN, GFMN,

Principal Component Analysis
To understand the effect of PCA in improving classification efficiency through dimensionality reduction, we use AFMN to classify five groups of the complete GLASS dataset [40], both with and without PCA preprocessing; the PCA step produces simplified input patterns while guaranteeing that the retained information is not less than 80%. The training dataset ranges from 20% to 100% of the data. The training dataset is selected randomly each time, and the entire glass dataset is used for testing. The experiment is conducted 100 times. The number of classification errors is shown in Table 1.
The results in the table demonstrate that principal component analysis can accomplish the task of dimensionality reduction. It is important to notice that adding PCA is not bound to increase classification accuracy, as the 40% and 100% training sets show. But thanks to its ability to reduce the dimensionality of the raw dataset, the time consumed is shortened considerably.

Genetic Algorithm for Parameter Optimization
The task of the genetic algorithm is to find, more quickly, the combination of the three parameters that gives the best classification performance, and the result with the genetic algorithm should be no worse than without it. Here the Iris dataset is chosen for demonstration; 10% of the dataset is used for training and the rest for testing. The experiment is repeated 100 times to obtain the minimum number of misclassifications and the average time consumed. The result is shown in Table 2, which demonstrates that the GA finds a better combination of parameters and does so faster. This ability is important for real-world applications.

Various Dataset for Training and Testing with Complete Dataset
Here, for the given Iris dataset, 35%, 45%, 50%, 60%, and 70% of the dataset were selected randomly for training and the complete dataset was used for testing. The learning and testing performance is shown in Table 3. AFMN performs better, with fewer misclassifications.

Different Dataset for Training and Testing
In this section, datasets such as wine, thyroid, and ionosphere are selected to compare the abilities of several network classifiers (Table 4). 50% of each dataset is randomly selected for training and the entire dataset is used for testing. We conduct the experiment 100 times for each dataset. The results show that, in terms of classification accuracy, FMCN and AFMN perform very similarly, but in terms of time consumed and stability AFMN is clearly better than FMCN, which demonstrates its advantage (Table 5).

Test the Robustness of AFMN with Noise-Contaminated Dataset
The robustness of a network classifier is important, especially in applications. 50% of the Iris data was randomly chosen for training, and the entire Iris dataset was used for testing. To check the robustness of AFMN, FMCN, FMNN, and GFMN, we added random noise to the Iris dataset. The amplitude of the added noise is 1%, 5%, and 10%. The expansion coefficient varies from 0.01 to 0.4 in steps of 0.02. One hundred experiments were performed to obtain the accuracy results, which are shown in Table 6. When the amplitude of the noise is 1%, the maximum and minimum misclassifications of the four methods are the same as in the experiments with clean data, which indicates that all four methods have some robustness. As the amplitude of the noise increases, the performance of all four methods becomes worse. However, Table 6 shows that the average misclassification of AFMN increases more slowly than that of the others, so AFMN has better robustness.

Fixed Training Dataset Size (60% of Iris Dataset)
In this simulation, the effect of the expansion coefficient on the performance of AFMN, FMCN, GFMN, and FMNN is studied. 60% of the Iris data is chosen for training and the entire Iris data for testing. The expansion coefficient varies from 1.0 to 1 in steps of 0.1. The results of training and testing are shown in Figures 9 and 10, respectively.
From the results we can conclude that FMCN is vulnerable to fluctuations of the expansion coefficient, and that GFMN and FMNN have a relatively higher classification error. Compared with them, AFMN performs better.

Test on Synthetic Image
The dataset consists of 950 samples belonging to two nested classes, which makes the classification more difficult. Figure 11 shows the synthetic image.
Figure 12 shows the performance of AFMN, FMCN, GFMN, and FMNN on this dataset. 60% of the dataset is randomly selected for training. The expansion coefficient varies from 0 to 0.2 in steps of 0.02. AFMN works better than the other algorithms in both training and testing.

Comparison with Other Traditional Classifier
In this section we compare the performance of AFMN, FMCN, GFMN, and FMNN with other traditional classifiers on the Iris dataset, as shown in Table 7. The results show that AFMN has no misclassification.

Comparison with Nodes Number (Hyperbox Number)
The complexity of the network created by training affects the speed and efficiency of classification. 100% of the Iris dataset is selected for training to see how many nodes each classifier creates in the middle layer. The results are shown in Figure 13.
As the expansion coefficient increases, the number of nodes decreases. AFMN, GFMN, and FMNN generate networks with a relatively simple structure. In contrast, the architecture of FMCN is much more complex.

Conclusion
Technique                     Misclassification
Bayes classifier (1)          2
k-nearest neighborhood (1)    4
Fuzzy k-nn (2)                4
Fisher ratios (1)             3
Ho-Kashyap (1)                2
Perceptron (3)                3
Fuzzy perceptron (3)          2
FMNN (1)                      2
GFMN (1)                      1/0
GFMN (3)                      0
FMCN (1)                      0
FMCN (3)                      0
AFMN (1)                      0
AFMN (3)                      0
(1) The training set consists of 75 data points (25 from each class) and the test set consists of the remaining data points.
(2) The training data consist of 36 data points (12 from each class) and the test set consists of 36 data points; results are then scaled up for 150 points.
(3) Training and testing data are the same.

This paper proposes a complete classification system based on a new neural algorithm called AFMN, principal component analysis, and a genetic algorithm. The development of this classifier derives from the modification and completion of the fuzzy min-max neural network proposed by Simpson. Unlike subsequent neural algorithms for clustering and classification such as GFMN and FMCN, our classifier system is more complete and practical. The advantages of AFMN can be summarized as follows.
(1) AFMN adds a preprocessing stage for input patterns based on principal component analysis (PCA). This dimensionality reduction technique reduces the number of features of the input patterns and extracts the useful information. It saves training and testing time and makes the algorithm more suitable for pattern classification on real data.
(2) The confidence coefficient is overlooked by preceding neural algorithms for clustering and classification. Since a hyperbox may contain patterns of more than one class, the confidence of different hyperboxes must differ, so allocating a membership of 1 to every input pattern that falls in a hyperbox is not reasonable. In AFMN we therefore calculate the confidence coefficient of each hyperbox for more precise classification.
(3) An adaptive genetic algorithm (AGA) is used in testing for parameter optimization, which removes the step-setting obstacle of the traversal method. The GA finds a proper parameter combination more precisely and faster.
(4) The modification of the membership function ensures self-adjustment according to the distribution of samples and maintains the data core concept proposed in DCFMN. The data core can update itself online during the classifying procedure, which is an indispensable ability for improving classifier performance when training samples are insufficient.
(5) AFMN solves the problem in the overlap test of FMNN, GFMN, and FMCN; thus it can generate hyperboxes properly and remove redundant ones. By rewriting the contraction rules, AFMN maintains the simple architecture of FMNN, and extensive simulations demonstrate its high recognition rate.
In conclusion, integrated with principal component analysis for dimensionality reduction and a genetic algorithm for parameter optimization, AFMN is a fast fuzzy min-max neural network classifier with a high recognition rate and robustness. The use of the AFMN network outside the laboratory will be explored in future work.

Figure 13: Number of nodes generated by each algorithm.

Table 1: The effectiveness of the PCA process.
and FMCN will produce misclassification errors. This problem is solved by the revised overlap test cases. The hyperbox generation result is shown in Figures 8(a) and 8(b).

Table 2: Parameter optimization using GA.
LE: learning error; TE: testing error.

Table 5: Classification time consumed by FMCN and AFMN.

Table 6: Robustness test on different amplitudes of noise.

Table 7: Comparison with other traditional classifiers.