A Multiple Classifier Fusion Algorithm Using Weighted Decision Templates

Fusing classifiers' decisions can improve the performance of a pattern recognition system. Many application areas have adopted multiple classifier fusion methods to increase classification accuracy in the recognition process. By fully considering the performance differences between classifiers and the information in the training samples, this paper proposes a multiple classifier fusion algorithm using weighted decision templates. The algorithm uses a statistical vector to measure each classifier's performance and applies a weighted transform to each classifier according to the reliability of its output. To make a decision, the information in the training samples around an input sample is used by the k-nearest-neighbor rule if the algorithm evaluates the sample as being highly likely to be misclassified. An experimental comparison was performed on 15 data sets from the KDD'99, UCI, and ELENA databases. The experimental results indicate that the algorithm can achieve better classification performance. Next, the algorithm was applied to cataract grading in the cataract ultrasonic phacoemulsification operation. The application result indicates that the proposed algorithm is effective and can meet the practical requirements of the operation.


Introduction
Traditional pattern recognition systems typically used a single classifier. However, in recent years, many experiments have found that the samples misclassified by distinct classifiers usually do not coincide. This finding means that different classifiers can potentially offer complementary information about the object to be recognized, and effective fusion of this complementary information is expected to considerably improve the performance of a pattern recognition system. When the member classifiers are diverse or complementary, multiple classifier systems can usually obtain higher classification accuracies than a single classifier [1]. A large number of experiments and applications have shown that fusing classifiers' decisions can achieve better performance than the best single classifier and improve the efficiency and robustness of a pattern recognition system. Currently, many application areas have adopted multiple classifier fusion methods, such as fault diagnosis [2,3], radar emitter recognition [4], remote sensing image recognition [5][6][7], medical diagnosis [8], face recognition [9], and intrusion detection [10]. Some of the fusion methods have good generality and show good classification performance in certain applications. However, as with individual classifiers, no fusion method obtains the optimal classification performance for all applications. Therefore, the study of multiple classifier fusion is still an open problem, and a specific application requires further research on a more suitable fusion algorithm.
Multiple classifier fusion assumes that all of the classifiers are equally "experienced" over the entire feature space. Thus, the outputs of all of the classifiers are fused in a certain way to achieve the final decision. According to the form of the classifiers' outputs, the fusion algorithms can be further divided into three categories. When the output is in decision form, Naive Bayes [11], majority voting, and Behavior Knowledge Space (BKS) [12] are the representative fusion algorithms. When the output is in ranked form, Borda Count [13,14] is a typical fusion algorithm. When the output is in measurement form, the fusion algorithms mainly include Max, Min, Sum, Product, Median, Support Vector Machine (SVM) fusion [15][16][17], Neural Networks fusion [18], Dempster-Shafer theory [19], and decision templates (DT) [20].
The decision templates (DT) algorithm is a simple and effective fusion algorithm for measurement-level outputs that has been widely researched and used [21][22][23][24][25][26]. During the training stage, the DT algorithm uses the outputs of all of the classifiers to calculate, for each class, the average of the decision profiles of the training samples that belong to that class. The averages are the decision templates (one per class). Then, the input sample's class is determined by evaluating the similarity between its decision profile and the decision templates. The algorithm has the following advantages [27]: the training process is simple; it requires no strict assumptions compared to probability-based algorithms; it is less sensitive to the size of the training set than other algorithms, and overtraining rarely happens; and it is intuitive, requires few calculations, and is not time-consuming. However, the DT algorithm still has two problems: (1) A decision template is only the average of the decision profiles that belong to a class and does not fully reflect the differences in the classifiers' performance. (2) It uses only the classification information of the decision template and does not take full advantage of the training samples' information.
Therefore, an improved DT algorithm (called VWDT) is proposed here. This algorithm measures each classifier's performance by a statistical vector and assigns different weights to each classifier according to its outputs. A reliable output is assigned a large weight so that it represents a significant share of the decision templates. For a sample that is easily misclassified, the information in the surrounding training samples is used to make a decision in addition to the similarities between the sample and the decision templates.
This algorithm was compared with the DT algorithm on 15 data sets from the KDD'99, UCI, and ELENA databases. The experimental results show that the algorithm can achieve better classification performance. Then, the algorithm was applied to cataract grading in the cataract ultrasonic phacoemulsification operation. The application effect indicates that the proposed algorithm is effective and can meet the practical requirements of the operation. Thus, the algorithm can be applied to the computer-aided cataract recognition system of the ultrasonic emulsification apparatus to automatically recognize the hardness grade of the cataract, which lowers the operation difficulty level, shortens the study period, and improves the safety of the operation.
The organization of this paper is as follows: Section 2 describes the multiple classifier fusion algorithm proposed in this paper. The experimental test of the algorithm is given in Section 3, and Section 4 provides the validation in a practical application, including the application background, the application description, and the application effect. Finally, Section 5 presents the conclusions and summarizes the contents of this paper.

Relative Expression.
The structure of the multiple classifier fusion used in this paper is shown in Figure 1. Let $\mathbb{R}^n$ be the $n$-dimensional feature space; let $x = [x_1, x_2, \ldots, x_n]^T$ be the $n$-dimensional feature vector, $x \in \mathbb{R}^n$; let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_c\}$ be the set of potential class labels; and let $C = \{C_1, C_2, \ldots, C_L\}$ be the set of trained classifiers for decision fusion. Given the input pattern $x$, the output of the $i$th classifier is denoted as in (1):

$$C_i(x) = [d_{i,1}(x), d_{i,2}(x), \ldots, d_{i,c}(x)], \tag{1}$$

where $d_{i,j}(x)$, $i = 1, 2, \ldots, L$, $j = 1, 2, \ldots, c$, represents the measure value of the possibility that classifier $C_i$ considers that $x$ belongs to class $\omega_j$. The fused output of the $L$ classifiers is constructed as in (2):

$$\mu(x) = F(C_1(x), C_2(x), \ldots, C_L(x)), \tag{2}$$

where $F$ is the fusion rule.
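The output of all of the classifiers can be represented as the decision profile, an $L \times c$ matrix with entries $d_{i,j}(x)$, as shown in (3):

$$\mathrm{DP}(x) = \begin{bmatrix} d_{1,1}(x) & \cdots & d_{1,j}(x) & \cdots & d_{1,c}(x) \\ \vdots & & \vdots & & \vdots \\ d_{i,1}(x) & \cdots & d_{i,j}(x) & \cdots & d_{i,c}(x) \\ \vdots & & \vdots & & \vdots \\ d_{L,1}(x) & \cdots & d_{L,j}(x) & \cdots & d_{L,c}(x) \end{bmatrix}. \tag{3}$$

The $i$th row is the measure layer output of the $i$th classifier $C_i(x)$, which is given according to (1). The $j$th column contains the possibility measure values with which the $L$ classifiers consider that the input pattern $x$ belongs to class $\omega_j$. In addition, the fusion result $\mu(x)$ is a $c$-dimensional vector in measure layer form, which is denoted as in (4):

$$\mu(x) = [\mu_1(x), \mu_2(x), \ldots, \mu_c(x)], \tag{4}$$

where $\mu_j(x)$, $j = 1, 2, \ldots, c$, is the possibility measure value that $x$ belongs to class $\omega_j$ after fusion. After acquiring the fusion result of (4), a certain rule is used to judge which class the input pattern belongs to. Typically, the maximum rule is adopted; namely, if $\mu_s(x) = \max_{j=1}^{c} \mu_j(x)$, then it shall be deemed that $x \in \omega_s$.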
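The simple fusion rules (Max, Min, Sum, Product, and Median) acquire the system output by operating on each column of $\mathrm{DP}(x)$.
(1) Max rule is

$$\mu_j(x) = \max_{i=1}^{L} d_{i,j}(x), \quad j = 1, 2, \ldots, c. \tag{5}$$

The Max rule takes the maximum of every $\mathrm{DP}(x)$ column as the fused output $\mu(x)$.
(2) Min rule is

$$\mu_j(x) = \min_{i=1}^{L} d_{i,j}(x), \quad j = 1, 2, \ldots, c. \tag{6}$$

The Min rule takes the minimum of every $\mathrm{DP}(x)$ column as the fused output $\mu(x)$.
(3) Sum rule is

$$\mu_j(x) = \sum_{i=1}^{L} d_{i,j}(x), \quad j = 1, 2, \ldots, c. \tag{7}$$

The Sum rule computes the sum of every $\mathrm{DP}(x)$ column as the fused output $\mu(x)$. It is also called the mean rule when it computes the mean; these are simply two forms of the same rule.
(4) Product rule is

$$\mu_j(x) = \prod_{i=1}^{L} d_{i,j}(x), \quad j = 1, 2, \ldots, c. \tag{8}$$

The Product rule computes the product of every $\mathrm{DP}(x)$ column as the fused output $\mu(x)$.
(5) Median rule is

$$\mu_j(x) = \operatorname*{median}_{i=1,\ldots,L} d_{i,j}(x), \quad j = 1, 2, \ldots, c. \tag{9}$$

The Median rule takes the median of every $\mathrm{DP}(x)$ column as the fused output $\mu(x)$. If $L$ is an even number, then the mean of the two middle values is taken as the result of a column.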
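As an illustration of the simple rules above, the following minimal sketch applies each rule column by column to a small decision profile and then applies the maximum rule. The function names `fuse_simple` and `decide`, the 0-based class indices, and the example numbers are assumptions made for this sketch only; they are not taken from the experiments in this paper.

```python
import numpy as np

def fuse_simple(dp, rule="sum"):
    """Fuse an L x c decision profile column by column.

    dp[i, j] is the support that classifier i gives to class j.
    Returns the c-dimensional fused support vector.
    """
    rules = {
        "max": np.max,
        "min": np.min,
        "sum": np.sum,
        "mean": np.mean,       # same ranking of classes as "sum"
        "product": np.prod,
        "median": np.median,   # averages the two middle values when L is even
    }
    return rules[rule](np.asarray(dp, dtype=float), axis=0)

def decide(mu):
    """Maximum rule: index of the class with the largest fused support."""
    return int(np.argmax(mu))

# Three classifiers, two classes (illustrative numbers only).
dp = np.array([[0.6, 0.4],
               [0.3, 0.7],
               [0.8, 0.2]])

for rule in ("max", "min", "sum", "product", "median"):
    mu = fuse_simple(dp, rule)
    print(rule, mu, "-> class index", decide(mu))
```

On this example all five rules select the first class, although in general the rules can disagree.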
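The DT algorithm acquires the system output using the entire $\mathrm{DP}(x)$ and calculates the corresponding decision template $\mathrm{DT}_j$, $j = 1, 2, \ldots, c$, on the training set $Z$, as in (10):

$$\mathrm{DT}_j = \frac{1}{N_j} \sum_{z \in Z_j} \mathrm{DP}(z), \tag{10}$$

where $Z_j$ represents the samples that belong to class $\omega_j$ in the training set $Z$ and $N_j$ represents the number of samples in $Z_j$. Then, the input sample's class is determined by evaluating the similarity between its decision profile and the decision templates. Here, the squared Euclidean distance is used for calculating the similarity, but other measures can also be applied. For a training sample $x \in Z$, the calculation equation is shown in (11):

$$d_E^2(\mathrm{DP}(x), \mathrm{DT}_j) = \sum_{i=1}^{L} \sum_{k=1}^{c} \left[ d_{i,k}(x) - dt_j(i,k) \right]^2, \tag{11}$$

where $d_{i,k}(x)$ is the element in the $i$th row and $k$th column of $\mathrm{DP}(x)$ and $dt_j(i,k)$ represents the element at the intersection of the $i$th row and $k$th column in $\mathrm{DT}_j$.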
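The DT procedure described above (average the decision profiles per class, then compare a new profile with every template by squared Euclidean distance) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the array layout `(N, L, c)`, the 0-based integer labels, and the toy numbers are all hypothetical, and every class is assumed to have at least one training sample.

```python
import numpy as np

def train_decision_templates(profiles, labels, n_classes):
    """Average the decision profiles of each class.

    profiles: array of shape (N, L, c), one L x c decision profile per sample.
    labels:   integer class indices in [0, n_classes).
    Returns an array of shape (n_classes, L, c), one template per class.
    Assumes every class occurs at least once in `labels`.
    """
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)
    return np.stack([profiles[labels == j].mean(axis=0)
                     for j in range(n_classes)])

def squared_distances(dp, templates):
    """Squared Euclidean distance between one profile and every template."""
    diff = np.asarray(templates, dtype=float) - np.asarray(dp, dtype=float)
    return (diff ** 2).sum(axis=(1, 2))

def classify_dt(dp, templates):
    """Assign the class whose template is closest to the profile."""
    return int(np.argmin(squared_distances(dp, templates)))

# Toy training set: 4 samples, L = 2 classifiers, c = 2 classes.
profiles = np.array([[[0.9, 0.1], [0.8, 0.2]],
                     [[0.7, 0.3], [0.6, 0.4]],
                     [[0.2, 0.8], [0.1, 0.9]],
                     [[0.4, 0.6], [0.3, 0.7]]])
labels = np.array([0, 0, 1, 1])
templates = train_decision_templates(profiles, labels, n_classes=2)

query = np.array([[0.75, 0.25], [0.6, 0.4]])
print(squared_distances(query, templates), classify_dt(query, templates))
```

Choosing the smallest distance is equivalent to choosing the largest similarity whenever the similarity decreases monotonically with the distance.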

Algorithm Idea.
Because classifiers differ in performance, even a single classifier distinguishes the different classes of data in a data set with different capacity. When the classifier output preserves detailed information about a class, the DT algorithm performs quite well. However, when the classifier is very sensitive to certain features of the input space, the class information in the output space changes considerably, and the performance of the algorithm is clearly reduced. To address this problem, the VWDT algorithm measures the performance of the classifiers by a statistical vector and adaptively assigns weights to the classifiers according to their specific outputs. The output of a classifier that performs better takes a larger share in the construction of the decision template, which yields a more reliable decision template and improves the classification accuracy.
Certain samples differ obviously even from the other samples of their own class. These samples are often the outliers of each class, and their decision profiles are quite different from those of most samples. When they are rare, the decision template calculated by the DT algorithm is quite different from their decision profiles. Moreover, the DT algorithm evaluates only the similarity between the decision profile and the decision template, so some samples are misclassified because their similarity is small or because they lie in a region where multiple classes overlap. To address this problem, the VWDT algorithm takes advantage of the information in the training samples around the test sample. It searches for several nearest neighbors of the test sample and combines the training samples' information with the calculated decision template similarity, which to some extent avoids the deviation that results from total dependence on the decision template and makes the final class judgment more reliable.

Algorithm Steps.
Firstly, a statistical vector is used to measure the performance of the classifiers, and a weighted transform is applied to the decision profile.
The classification performance of the $i$th classifier $C_i$ can be represented by a $c$-dimensional statistical vector $V_i$, as in (12):

$$V_i = [v_i^1, v_i^2, \ldots, v_i^c], \tag{12}$$

where the element $v_i^j$, $j = 1, 2, \ldots, c$, represents the number of training samples in the $j$th class that are correctly recognized by classifier $C_i$. Let the total number of training samples be $N$. The vector $V_i$ is normalized by dividing by $N$, so that each element becomes the corresponding percentage. The normalized vector is represented as in (13):

$$\bar{V}_i = \frac{V_i}{N} = [\bar{v}_i^1, \bar{v}_i^2, \ldots, \bar{v}_i^c]. \tag{13}$$

Here, the reliability of the output vector of classifier $C_i$ is weighted by a term that involves a parameter $h$, and the value of $h$ should satisfy the condition in (14). Thus, the reliable output of classifier $C_i$ can be represented as in (15). From (1), (15) is denoted as in (16). According to the reliable output vectors of all of the classifiers calculated by (16), the new decision profile $\mathrm{DP}'(x)$ can be acquired as in (17). Using $\mathrm{DP}'(x)$ to replace $\mathrm{DP}(x)$ and adopting the Euclidean distance as the similarity measure, the VWDT algorithm is described as follows.
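The counting and normalization of the statistical vector described above can be sketched as follows. The function name `statistical_vector`, the 0-based integer labels, and the toy predictions are assumptions made for illustration; the subsequent weighting step that involves $h$ is deliberately not reproduced here.

```python
import numpy as np

def statistical_vector(predicted, true, n_classes):
    """Per-class count of correctly recognized training samples for one
    classifier, plus the same vector divided by the total sample count N.

    predicted, true: integer class indices in [0, n_classes).
    """
    predicted = np.asarray(predicted)
    true = np.asarray(true)
    hits = true[predicted == true]                  # true labels of the correct decisions
    counts = np.bincount(hits, minlength=n_classes)
    return counts, counts / true.size

# Six training samples, three classes (illustrative labels only).
true = np.array([0, 0, 0, 1, 1, 2])
predicted = np.array([0, 0, 1, 1, 1, 0])
counts, normalized = statistical_vector(predicted, true, n_classes=3)
print(counts, normalized)
```

Because the normalization divides by the total number of training samples rather than by the class size, a class with few samples receives a small entry even when the classifier recognizes all of its samples.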

Training Process
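(1) Use each classifier to classify the training samples, acquire each classifier's statistical vector $V_i$, $i = 1, 2, \ldots, L$, and construct the set $Z'$ of the corresponding decision profiles $\mathrm{DP}'(x)$; from (10), the corresponding decision template $\mathrm{DT}'_j$, $j = 1, 2, \ldots, c$, can be acquired, as shown in (18):

$$\mathrm{DT}'_j = \frac{1}{N_j} \sum_{z \in Z_j} \mathrm{DP}'(z), \tag{18}$$

where $Z_j$ is the set of the $N_j$ training samples that belong to class $\omega_j$.
(2) For a training sample $x \in Z$ and from (11), the squared Euclidean distance between the decision profile $\mathrm{DP}'(x)$ and $\mathrm{DT}'_j$ can be acquired, as shown in (19):

$$d_E^2(\mathrm{DP}'(x), \mathrm{DT}'_j) = \sum_{i=1}^{L} \sum_{k=1}^{c} \left[ d'_{i,k}(x) - dt'_j(i,k) \right]^2, \tag{19}$$

where $d'_{i,k}(x)$ and $dt'_j(i,k)$ are the elements in the $i$th row and $k$th column of $\mathrm{DP}'(x)$ and $\mathrm{DT}'_j$, respectively.
(3) Calculate the possibility measure values of training sample $x$ belonging to the various classes, as shown in (20). If $\mu_s(x) = \max_{j=1}^{c} \mu_j(x)$, then it shall be deemed that $x \in \omega_s$.
(5) If $Z_e \neq \emptyset$, where $Z_e$ is the set of weighted decision profiles of the misclassified training samples, calculate the average of all of the elements in $Z_e$ as an additional decision template, which shall be denoted as $\mathrm{DT}'_{c+1}$, and use the set $Z_e$ to train a $k$-NN classifier $C_{knn}$.
(6) Return all of the decision templates $\mathrm{DT}'_j$, $j = 1, 2, \ldots, c+1$, the statistical vectors $V_i$, $i = 1, 2, \ldots, L$, and the classifier $C_{knn}$.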

Classification Process
(1) For an input pattern $x$, use the statistical vectors that are acquired in the above training process to construct its decision profile $\mathrm{DP}'(x)$.
(3) Judge the class of $x$.

The algorithm measures each classifier's performance by a statistical vector, which is acquired from the statistical data on the training sample set. Different weights are adaptively assigned to the classifiers' output vectors according to their judgments, so that the decisions from different classifiers, and the decisions for different classes from one classifier, have different proportions in the decision templates. The weights can be acquired independently according to the prior information of the training samples and the classifiers' decisions on the current input sample. According to the statistical vector, if a classifier has better classification performance for a certain class of samples, then its output can be considered to be more reliable when it judges that the current input sample belongs to this class, and it is assigned a larger weight. As a result, more reliable outputs have a higher proportion in the decision templates, which makes the templates more reliable. The training samples that are wrongly classified by the multiple classifier system are used to construct an additional decision template. When an input sample is nearest to the center of this template, it can be reasonably considered that the sample is easily misclassified by the decision templates. In this case, the nearest-neighbor rule is adopted to make the judgment, which combines the information in the training samples near the input sample with the distances between the input sample and the templates and thus improves the accuracy.
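The decision logic described in the preceding paragraph can be sketched as follows. This is only one possible reading of the prose, not the authors' exact rule: the sketch assumes that the weighted decision profiles are already computed, that the additional template is stored separately, and that the fallback is a plain majority vote among the `k` nearest misclassified training profiles. All names (`classify_with_fallback`, `hard_profiles`, `hard_labels`) and all numbers are hypothetical.

```python
import numpy as np

def classify_with_fallback(dp, templates, extra_template,
                           hard_profiles, hard_labels, k=1):
    """Nearest-template decision with a k-NN fallback.

    dp:             weighted L x c decision profile of the input sample.
    templates:      array (c, L, c), one template per class.
    extra_template: array (L, c), mean profile of misclassified training samples.
    hard_profiles:  array (M, L, c), profiles of those misclassified samples.
    hard_labels:    their true class indices (non-negative integers).
    """
    dp = np.asarray(dp, dtype=float)
    templates = np.asarray(templates, dtype=float)
    all_templates = np.concatenate(
        [templates, np.asarray(extra_template, dtype=float)[None]], axis=0)
    dist = ((all_templates - dp) ** 2).sum(axis=(1, 2))
    nearest = int(np.argmin(dist))
    if nearest < len(templates):
        return nearest                       # an ordinary class template wins
    # The extra template is nearest: treat the sample as easily misclassified
    # and vote among the k closest misclassified training profiles.
    hard_profiles = np.asarray(hard_profiles, dtype=float)
    hard_labels = np.asarray(hard_labels)
    d = ((hard_profiles - dp) ** 2).sum(axis=(1, 2))
    neighbours = np.argsort(d)[:k]
    return int(np.argmax(np.bincount(hard_labels[neighbours])))

templates = np.array([[[0.8, 0.2], [0.7, 0.3]],
                      [[0.3, 0.7], [0.2, 0.8]]])
extra_template = np.array([[0.5, 0.5], [0.5, 0.5]])
hard_profiles = np.array([[[0.55, 0.45], [0.5, 0.5]],
                          [[0.45, 0.55], [0.5, 0.5]]])
hard_labels = np.array([1, 0])

easy = np.array([[0.9, 0.1], [0.8, 0.2]])
hard = np.array([[0.56, 0.44], [0.5, 0.5]])
print(classify_with_fallback(easy, templates, extra_template, hard_profiles, hard_labels))
print(classify_with_fallback(hard, templates, extra_template, hard_profiles, hard_labels))
```

In the second call the nearest-template rule alone would favor the first class, while the neighboring misclassified training profile indicates the second class, which is the kind of correction the additional template is intended to enable.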

Experimental Analyses
Currently, there are many practical classification algorithms, such as the nearest-neighbor algorithm, linear classifiers, the minimum distance algorithm, support vector machines, and artificial neural networks. The nearest-neighbor algorithm is very intuitive: the basic idea is to find the sample in the training set that has the minimum distance to the object to be recognized and to take that sample's class as the recognition result. The $k$-nearest-neighbor algorithm, as a typical representative of nonparametric algorithms, is used as the base classifier in this paper because it is simple and reliable. The experiments in this paper were implemented in Matlab, using the pattern recognition toolbox "PRTools4" [28]. In the experiments, the code of the $k$-nearest-neighbor ($k$-NN) classifier is from the toolbox. To simplify the experiments, the parameter $k$ is selected automatically and optimally by the leave-one-out method. The measure of the classifier's output is the posterior probability. To sufficiently test the performance and generality of the proposed algorithm, 15 data sets from the KDD'99, UCI, and ELENA databases were selected for the experiments. These databases are comparatively authoritative test data in related research fields, and the selected data sets are relatively representative.
In the experimental results, if the sample data have separate training and test sets and the test result does not change between runs, the error rate is computed as

$$E = \frac{N_e}{N_t},$$

where $E$ represents the error rate, $N_e$ represents the number of misclassified test samples, and $N_t$ represents the number of test samples. If the sample data have only one set and $K$-fold cross-validation is used, the test result is the average of the error rates of the $K$ validations, and the average error rate is computed as

$$\bar{E} = \frac{1}{K} \sum_{i=1}^{K} E_i,$$

where $\bar{E}$ represents the average error rate and $E_i$ is the error rate of the $i$th validation.

The connection patterns related to the ftp service were selected from two data sets with corrected labels. Each pattern was processed by dimension reduction (features whose values were always constant were discarded), quantification of the symbolic features, and normalization (a linear transformation was applied to bring each feature within the range [0, 1]). Training set A and test set B were obtained correspondingly. This group of data sets with five-class data was denoted by ftp5c. Combining the four attack classes into one abnormal class, a group of data sets with two-class data was obtained and denoted by ftp2c.
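The two error statistics defined earlier in this section (the error rate on a fixed test set and the average error rate over cross-validation folds) amount to the following small computation. The function names and the example counts are assumptions for illustration only.

```python
def error_rate(n_errors, n_test):
    """Fraction of test samples that were misclassified."""
    return n_errors / n_test

def average_error_rate(fold_errors, fold_sizes):
    """Mean of the per-fold error rates in cross-validation.

    fold_errors[i] is the number of misclassified samples in fold i and
    fold_sizes[i] is the number of test samples in that fold.
    """
    rates = [error_rate(e, n) for e, n in zip(fold_errors, fold_sizes)]
    return sum(rates) / len(rates)

print(error_rate(5, 100))                       # single train/test split
print(average_error_rate([2, 4], [20, 20]))     # two folds, illustrative counts
```

Note that the average is taken over the fold error rates, so folds of unequal size contribute equally; this differs slightly from pooling all errors over all test samples.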

Using the KDD Data.
Through a similar process, the connection patterns that were related to the http service were also selected. Two groups of data sets were acquired, which were denoted by http4c and http2c. The former was four-class data, and the latter was two-class data. Because the number of http connection patterns was large, a restriction was added to the selection process: the duration had to be above zero. Table 1 describes the KDD data sets. On each training set A, three $k$-NN classifiers were trained, each using a distinct category of features, and together they constituted the classifier set. The original test results of the DT and VWDT algorithms are shown in Table 2 and Figure 2 (ER represents the error rate).
It can be observed that the VWDT algorithm performs better than the DT algorithm on all four data sets. The error rate is reduced most clearly on the multiple-class data sets ftp5c and http4c. The reason is that the more classes there are, the more the classifier's performance differs between classes; this difference is reflected in the decision templates, which makes them more reliable. At the same time, the difference between the misclassified samples and the other samples is more obvious, so combining the information in the training samples around the input sample with the calculated similarity between the decision templates and the input sample's decision profile is more complementary.

Using the UCI Data.
A total of 8 data sets from the UCI machine learning database (http://www.ics.uci.edu/∼mlearn/MLRepository.html/) were selected for experimental analysis. Because of the requirements of the algorithms under study, the selected data sets were all in numeric form without errors or missing feature values. Normalization was performed on the selected data. The basic information about the selected data sets is in Table 3. The name of each data set is only the first word of its name in the original database.
For each data set, 6-fold cross-validation was used: the data set was divided into 6 subsets with similar size and distribution. Each time, one subset was selected as the test set, and the remainder comprised the training set. On each training subset, a $k$-NN classifier was trained. The corresponding 5 classifiers constituted the classifier set for the fusion. The test results are the average over the 6 runs (see Table 4 and Figure 3; AER represents the average error rate).
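The partitioning scheme described above can be sketched as follows: the samples are dealt into 6 subsets with similar size and class distribution, and in each round one subset is the test set while each of the remaining 5 subsets would train one member classifier. The function names, the random seed, and the toy labels are assumptions for this sketch; classifier training itself is omitted.

```python
import numpy as np

def stratified_folds(labels, n_folds=6, seed=0):
    """Split sample indices into n_folds subsets of similar size and class mix."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    start = 0  # running offset so that leftovers do not always land in fold 0
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, sample in enumerate(idx):
            folds[(start + pos) % n_folds].append(int(sample))
        start += len(idx)
    return [np.array(sorted(f), dtype=int) for f in folds]

def rounds(folds):
    """Yield (test_fold, training_folds) pairs; every fold is the test set once
    and each remaining fold is the training data of one member classifier."""
    for t in range(len(folds)):
        yield folds[t], [f for i, f in enumerate(folds) if i != t]

labels = np.array([0] * 12 + [1] * 6)   # 18 toy samples, two classes
folds = stratified_folds(labels)
for test_fold, training_folds in rounds(folds):
    print(len(test_fold), [len(f) for f in training_folds])
```

With this layout the member classifiers see disjoint training subsets, which is one simple way to obtain some diversity among classifiers of the same type.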
It can be observed that the VWDT algorithm is better than the DT algorithm on 7 of the 8 data sets. On the data sets that have more classes (thyroid, wine, and vehicle), the performance of the VWDT algorithm improves more obviously. However, the glass data set is an exception, mainly because it has few samples but many classes, and the distributions of the samples of the different classes vary considerably. In such a situation, taking the average decision profile of a class as the representative of the samples of that class could result in a larger error, which is also reflected in the DT algorithm's performance. For the VWDT algorithm, an additional decision template is split off, which further lowers the performance. However, it can be observed that, on the glass data set, the error rate increase of the VWDT algorithm compared with the DT algorithm is not obvious (2.83%).
On the other data sets, the minimum error rate reduction of this algorithm is 8.5% (data set pima).

Using the ELENA Data.
A total of 3 data sets from the ELENA database (https://www.elen.ucl.ac.be/neural-nets/Research/Projects/ELENA/elena.htm) were selected for experimental analysis. Normalization was also performed on the selected data. The basic information about the selected data sets is in Table 5. For each data set, 6-fold cross-validation was also used, and the test results are in Table 6 and Figure 4. It can also be observed that the VWDT algorithm is better than the DT algorithm. On the satimage data set, which has more classes, the performance of the VWDT algorithm improves more obviously.
The experimental results show that the VWDT algorithm is effective for improving the classification performance. It must be noted that $k$-NN classifiers are adopted in the experiments of this paper. When the classifier set is composed of classifiers of different types, the physical meanings and measures of the classifiers' outputs differ, so the performances of both the DT and VWDT algorithms could be obviously impacted. In that case, the outputs of all of the classifiers should be transformed into a unified reliability measure.

Practical Application
4.1. Application Background.
Age-related cataract is the leading cause of blindness in the world. Currently, there are 25 million people with cataract-induced blindness, and almost 2.5 million of them, or 10 percent, are in China. Because the population is aging, more than one million new cataract patients are expected each year in China. Until now, an operation has been the only effective method to cure cataract blindness. However, the current capacity for cataract operations cannot fully treat the newly added patients, and, thus, the outlook for cataract blindness in China appears to be rather gloomy.
The ultrasonic phacoemulsification operation is considered to be the main treatment for cataracts and is widely accepted because of its small incision, short operation time, lack of sutures, no need for hospitalization, no restriction on activities, and fast recovery of eyesight. Before performing the operation, the doctor in charge of a case must undergo strict training to become very familiar with cataract grading. However, because the hardness grades of cataracts differ between individuals, the problem of "how to correctly apply a proper oscillation frequency according to a different lens nucleus" requires long-term practice to accumulate the rich experience needed to reach the best operation outcome. Its low level of automation, its difficulty, and its long learning cycle are the main reasons why this operation cannot be widely popularized.
Along with the development of information technology, computer-aided healthcare, which integrates medical devices and healthcare information systems to improve healthcare quality and productivity, is receiving more and more attention. In particular, a computer-aided healthcare system could provide a solution in developing areas where medical resources are scarce [29]. Image processing and pattern recognition technologies can be applied to the computer-aided cataract recognition system of the ultrasonic emulsification apparatus to automatically recognize the hardness grade of the cataract and, thus, to lower the operation's difficulty level, shorten the study period, and improve the safety of the operation.
Such a cataract recognition system is mainly composed of five modules, as shown in Figure 5 (the position of the research content of this paper is in the dash-dotted ellipse). The image acquisition module receives real-time operation video from a camera on the operating table, obtains single-frame images by sampling at a certain frequency, and converts them into formats that can be processed by the system. The object detection module detects the emulsification pinhead in the image and locates the image area for recognition, which is the first key problem of the system. The image preprocessing module preprocesses the image area, for example by filtering and denoising, to eliminate the noise. The feature extraction module extracts the features of the image area; the extracted features should effectively represent the tissue and preserve as much as possible of the original information of the emulsification pinhead, cataract tissues, and normal tissues. The cataract grading module applies a classification algorithm to the extracted features to recognize the tissue and submits the result to the control system. Finding a suitable recognition algorithm to ensure accurate cataract grading is the second key problem of the system. If the recognition result is a cataract with a certain hardness, the control system emits different signals to control the ultrasonic frequency according to the hardness grade. Otherwise, no ultrasonic signal is sent out.

Application Description.
The cataract grading is mainly based on Emery and Little's grade standard, which judges the color of the lens nucleus to grade its hardness according to the examination results under the slit lamp. Human eyes can directly distinguish normal tissue from diseased tissue and recognize different hardness grades of the lens nucleus based on their different colors. Considering the practical situation of lens nucleus recognition, color is selected as the image feature for the system, and the RGB color model is used in this paper.
The image of the area near the emulsification pinhead was partitioned into $m \times n$ pieces, and the color value of a piece was the average over all of the pixels in the piece. This paper selected $m = 15$ and $n = 5$ as the parameters. Thus, each image was partitioned into 75 pieces, and each piece had a similar number of pixels. The average R, G, and B values of all of the pixels in a piece were calculated separately as the feature values of that piece. In other words, every image corresponds to a 225-dimensional feature vector.
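The feature extraction described above can be sketched as follows. This is a minimal sketch, not the system's Visual C++ implementation: it assumes that the 15 divisions run along the image rows and the 5 divisions along the columns, that the image is an `H x W x 3` RGB array with at least 15 rows and 5 columns, and that the output is ordered as all R means, then all G means, then all B means. The function name and the test image are hypothetical.

```python
import numpy as np

def block_color_features(image, m=15, n=5):
    """Average R, G, and B over an m x n grid of image pieces.

    image: array of shape (H, W, 3) with H >= m and W >= n.
    Returns a vector of length 3 * m * n: the m*n R means, then G, then B.
    """
    image = np.asarray(image, dtype=float)
    # array_split yields pieces whose sizes differ by at most one pixel row/column.
    row_blocks = np.array_split(np.arange(image.shape[0]), m)
    col_blocks = np.array_split(np.arange(image.shape[1]), n)
    means = np.empty((3, m, n))
    for i, rows in enumerate(row_blocks):
        for j, cols in enumerate(col_blocks):
            block = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
            means[:, i, j] = block.mean(axis=(0, 1))   # one mean per color channel
    return means.reshape(-1)

# A 20 x 20 test image with constant channels (illustrative values only).
image = np.zeros((20, 20, 3))
image[..., 0], image[..., 1], image[..., 2] = 10, 20, 30
features = block_color_features(image)
r_part, g_part, b_part = features[:75], features[75:150], features[150:]
print(features.shape, r_part[0], g_part[0], b_part[0])
```

With this ordering, the three 75-dimensional slices can serve as the inputs of single-channel classifiers, and the full 225-dimensional vector as the input of a classifier that uses all three channels.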
The software development environment of the recognition system was Visual C++. On the training set, three $k$-NN classifiers were trained separately by using the R, G, and B features; that is, the input of each classifier is a 75-dimensional feature vector. The decisions of the three single-feature classifiers were fused by the DT and VWDT algorithms. For comparison, a $k$-NN classifier was trained by using the RGB feature; that is, the input of the classifier is a 225-dimensional feature vector. To test the effect of the parameter $k$ on the results, the parameter is selected to be $k = 3, 5, 8$. In the VWDT algorithm, the parameter $k$ of classifier $C_{knn}$ is selected to be $k = 1$. The original application results are shown in Table 8, and the average error rates (AER) are shown in Figure 6.
It can be observed that the classifiers that use the R feature have the highest error rates. The classifiers that use the G feature have the lowest error rate among the single color feature classifiers, which is slightly lower than that of the classifiers that use the RGB feature. The error rate of the VWDT algorithm is lower than that of all of the single classifiers and the DT algorithm, which again shows the effectiveness of the algorithm. Among the images that were misclassified by all of the algorithms, there is no normal tissue, and more than 80% of them are assigned to lower grades. This result is ideal, and it will not cause adverse effects on the patients. The doctor need only properly increase the output signal intensity according to experience. The VWDT algorithm has the highest average recognition rate, 94.75%, which meets the actual demand of the cataract ultrasonic phacoemulsification operation.
When the parameter $k$ takes on different values, the error rates change accordingly but not very obviously (as shown in Figure 7). When $k = 5$, the error rates of all of the algorithms (except for the VWDT algorithm, when $k = 3$) are slightly lower. Thus, in practical applications of the VWDT algorithm, the parameter $k$ can be determined by experience (a fifteenth of the number of training samples is a suggestion).

Conclusions
In this paper, a multiple classifier fusion algorithm is proposed and applied to the recognition of the hardness of a cataract lens nucleus. The algorithm considers the difference in the classifiers' performances and makes full use of the training

Figure 1: The structure of the multiple classifier fusion.

(4) Repeat steps (2)-(3) until all of the training samples are classified. According to the classification results, acquire a weighted decision profile set $Z_e$ ($Z_e \subseteq Z'$) that corresponds to the training samples that are misclassified.

Figure 2: Performance comparison on the KDD data sets.

Figure 3: Performance comparison on the UCI data sets.

Figure 5: Module structure of the cataract recognition system.

4.3. Application Effect.
The cataract images were captured from the operation videos provided by Beijing Tongren Hospital. The images had three sizes: 20 × 20, 30 × 30, and 40 × 40. The training image set had 649 images, and the test image set had 647 images. Each set had 6 types of images: normal tissues and cataracts with hardness grades I to V. The class of each image was confirmed by the ophthalmologists from the hospital. The numeric distribution of the different classes of images is shown in Table 7.

Figure 6: Comparison of the average error rates.

Table 1: Description of the KDD data sets.

Table 2: Performance on the KDD data sets (in %).

that were extracted from TCP/IP packets. Each connection pattern is a 41-dimensional feature vector and belongs to one of five classes (i.e., normal, R2L attacks, DoS attacks, U2R attacks, and probing attacks). The features can be divided into three categories: intrinsic features, traffic features, and content features.

Table 3: Description of the UCI data sets.

Table 4: Performance on the UCI data sets (in %).

Table 5: Description of the ELENA data sets.

Table 6: Performance on the ELENA data sets (in %).

Table 7: Number distribution of the different classes of images.