A similarity classifier based on Bonferroni mean operators is introduced. The new Bonferroni mean based variant of the similarity classifier is also extended to cover a new Bonferroni-OWA variant. The Bonferroni-OWA based similarity classifier raises the question of how to generate the required weights, and for this reason we also examine a number of linguistic quantifiers for weight generation. The proposed similarity classifier variants are tested on four real-world data sets related to medical research. The results are compared with those of two previously presented similarity classifiers, one based on the generalized mean and the other on the arithmetic mean. The results show that comparatively better classification accuracy can be reached with the proposed similarity classifier variants.
In this paper we introduce a new generalization of the similarity classifier that uses Bonferroni mean operators in the aggregation of similarities. The Bonferroni mean aggregation operator was introduced in [
In this paper we also apply an ordered weighted averaging (OWA) based variant of the Bonferroni mean, the so-called “Bonferroni-OWA operator,” proposed by Yager [
In the field of medical research, classification is a key concept, and the use of classifiers is warranted in many practical problems, such as patient diagnosis and the prognosis of various human conditions and pathologies [
The rest of the paper is organized as follows. In the second section we briefly go through the aggregation operators, the weight generation schemes for the new OWA based classifier variants, and the similarity measures applied in the paper. In the third section we introduce the new similarity classifiers and their variants. In the fourth section we first briefly introduce the medical research data sets used and then examine the achieved results. The paper closes with discussion and conclusions.
The choice of an aggregation operator that is used in a similarity classifier is a fundamental issue, as it affects the final classification accuracy of the classifier. Several aggregation operators that can be used are available in the existing literature; in this paper we concentrate on averaging type aggregation operators [
One of the most common aggregation operators is the arithmetic mean, of which several generalizations exist, for example, the generalized mean and the ordered weighted average (OWA). In this paper we propose and examine the use of the Bonferroni mean and the Bonferroni-OWA as aggregation operators in a similarity classifier, which yields new similarity classifiers. The new variants are compared with previously presented methods that use the generalized mean and the arithmetic mean. Both the generalized mean and the arithmetic mean are special cases of the Bonferroni mean [
Let
By varying the value of the parameter
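The generalized mean discussed above can be illustrated with a minimal sketch. It assumes the standard textbook form M_m(a) = ((1/n) Σ a_i^m)^(1/m) with a nonzero parameter m; the function name `generalized_mean` and the argument names are illustrative and do not come from the paper.

```python
def generalized_mean(values, m=1.0):
    """Generalized (power) mean of a list of numbers.

    Assumed form: ((1/n) * sum(a_i ** m)) ** (1/m), with m != 0.
    For m < 0 all values are assumed to be strictly positive.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if m == 0:
        raise ValueError("m must be nonzero in this sketch")
    n = len(values)
    return (sum(v ** m for v in values) / n) ** (1.0 / m)


similarities = [0.2, 0.4, 0.9]
arithmetic = generalized_mean(similarities, m=1.0)   # arithmetic mean
quadratic = generalized_mean(similarities, m=2.0)    # quadratic mean
harmonic = generalized_mean(similarities, m=-1.0)    # harmonic mean
```

With m = 1 the sketch reduces to the arithmetic mean, and larger values of m move the result toward the largest argument.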
Another generalization of the arithmetic mean is the ordered weighted averaging (OWA) operator, which was introduced by Yager in [
Let
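As a hedged illustration of the OWA operator, the sketch below assumes the usual definition: the arguments are sorted in descending order and combined with a weighting vector whose nonnegative entries sum to one. The function name `owa` is illustrative only.

```python
def owa(values, weights):
    """Ordered weighted average.

    The weights are attached to positions in the descending ordering
    of the values, not to the values themselves.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    ordered = sorted(values, reverse=True)  # b_1 >= b_2 >= ... >= b_n
    return sum(w * b for w, b in zip(weights, ordered))


values = [0.2, 0.9, 0.4]
largest = owa(values, [1.0, 0.0, 0.0])          # all weight on the largest value
smallest = owa(values, [0.0, 0.0, 1.0])         # all weight on the smallest value
average = owa(values, [1 / 3, 1 / 3, 1 / 3])    # equal weights: arithmetic mean
```

The three calls show how the choice of weights moves the OWA between the maximum, the minimum, and the arithmetic mean.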
As it is our intention to apply the OWA together with the Bonferroni mean, we next present the Bonferroni mean operator and its OWA extension, the so-called Bonferroni-OWA operator, following the work by Yager in [
Let
It has been shown that the Bonferroni mean is an averaging operator and that it satisfies the necessary axioms (see [
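The Bonferroni mean can be sketched as follows, assuming the commonly used form B^{p,q}(x) = ((1/(n(n-1))) Σ_{i≠j} x_i^p x_j^q)^(1/(p+q)) with p, q ≥ 0 and p + q > 0. The function name `bonferroni_mean` and its argument names are illustrative.

```python
def bonferroni_mean(values, p=1.0, q=1.0):
    """Bonferroni mean over all ordered pairs (i, j) with i != j.

    Assumed form: (sum_{i != j} x_i**p * x_j**q / (n*(n-1))) ** (1/(p+q)).
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    if p < 0 or q < 0 or p + q == 0:
        raise ValueError("p and q must be nonnegative with p + q > 0")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += values[i] ** p * values[j] ** q
    return (total / (n * (n - 1))) ** (1.0 / (p + q))


x = [0.2, 0.4, 0.9]
as_arithmetic = bonferroni_mean(x, p=1.0, q=0.0)   # reduces to the arithmetic mean
as_generalized = bonferroni_mean(x, p=2.0, q=0.0)  # reduces to the generalized mean, m = 2
pairwise = bonferroni_mean(x, p=1.0, q=1.0)        # uses products of distinct pairs
```

Setting q = 0 makes the inner product depend on x_i only, which is why the arithmetic mean and the generalized mean appear as special cases in this sketch.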
Let
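A minimal sketch of the Bonferroni-OWA operator is given below. It assumes the form usually attributed to Yager, in which the inner average over the remaining arguments is replaced by an OWA with a weighting vector of dimension n - 1: ((1/n) Σ_i x_i^p · OWA_w(V_i))^(1/(p+q)), where V_i collects x_j^q for all j ≠ i. The function name `bonferroni_owa` is illustrative, and the exact formulation used in the paper may differ.

```python
def bonferroni_owa(values, weights, p=1.0, q=1.0):
    """Bonferroni-OWA aggregation (assumed form, see the accompanying text).

    `weights` is an OWA weighting vector of length n - 1.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    if len(weights) != n - 1:
        raise ValueError("weights must have length n - 1")
    if p < 0 or q < 0 or p + q == 0:
        raise ValueError("p and q must be nonnegative with p + q > 0")
    total = 0.0
    for i in range(n):
        # V_i: the other arguments raised to q, sorted in descending order.
        others = sorted((values[j] ** q for j in range(n) if j != i), reverse=True)
        inner = sum(w * v for w, v in zip(weights, others))
        total += values[i] ** p * inner
    return (total / n) ** (1.0 / (p + q))


x = [0.2, 0.4, 0.9]
uniform = bonferroni_owa(x, [0.5, 0.5])        # equal weights
optimistic = bonferroni_owa(x, [1.0, 0.0])     # inner OWA picks the largest other value
pessimistic = bonferroni_owa(x, [0.0, 1.0])    # inner OWA picks the smallest other value
```

With equal weights 1/(n - 1) the inner OWA is an arithmetic mean, so the sketch coincides with the Bonferroni mean.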
Using the OWA operator requires generating the weights that it applies; we propose to do this with linguistic quantifiers, introduced by Zadeh [
Linguistic quantifiers use a scale of linguistic expressions to summarize the properties of a class of objects without enumerating them; in this way they offer an imprecise and flexible methodology for the quantification of objects; Ying [
A fuzzy subset
During the ordered weighted aggregation process, terms like
In this paper we consider five RIM quantifiers: the “basic,” “polynomial,” “quadratic,” “exponential,” and “trigonometric” RIM quantifiers. In what follows, these are denoted with the subscripts 1–5 in the order given above. Next we briefly present each of the five selected RIM quantifiers and show how they can be applied in creating weight generating schemes for OWA.
The basic linguistic quantifier,
which is associated with the weights
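Weight generation from a RIM quantifier can be sketched as follows. The sketch assumes Yager's standard construction w_i = Q(i/n) - Q((i-1)/n) and takes the basic quantifier to be Q(r) = r^α with α > 0; the names `rim_weights`, `basic_quantifier`, and `alpha` are illustrative. The other four quantifiers are not reproduced here, but any nondecreasing Q with Q(0) = 0 and Q(1) = 1 can be passed to the same function.

```python
def rim_weights(quantifier, n):
    """OWA weights from a RIM quantifier Q: w_i = Q(i/n) - Q((i-1)/n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [quantifier(i / n) - quantifier((i - 1) / n) for i in range(1, n + 1)]


def basic_quantifier(alpha):
    """Assumed basic RIM quantifier Q(r) = r ** alpha, alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return lambda r: r ** alpha


equal = rim_weights(basic_quantifier(1.0), 4)    # alpha = 1: equal weights
skewed = rim_weights(basic_quantifier(2.0), 2)   # alpha = 2: more weight on later positions
```

Because the weights telescope to Q(1) - Q(0), they sum to one for any valid RIM quantifier.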
The linguistic quantifier,
when
The quadratic linguistic quantifier,
applying it to weight generation we get
For the purposes of practical implementation, we have chosen
The exponential linguistic quantifier,
when it is applied to weight generation we get
The trigonometric linguistic quantifier,
and application to weight calculation gives
These operators, with the generated weighting vectors, are applied in the aggregation of similarities.
In this paper we use similarity measures in a generalized Łukasiewicz structure (see [
Several other means can be used instead of the arithmetic mean in (
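A hedged sketch of a similarity measure of this kind is shown below. It assumes the form commonly used with the generalized Łukasiewicz structure, S(x, y) = (1 - |x^p - y^p|)^(1/p) for x, y in [0, 1], and combines the feature-wise similarities with the arithmetic mean. The function names and the parameter name `p` are illustrative.

```python
def lukasiewicz_similarity(x, y, p=1.0):
    """Similarity of two values in [0, 1] (assumed generalized Lukasiewicz form)."""
    if p <= 0:
        raise ValueError("p must be positive")
    return (1.0 - abs(x ** p - y ** p)) ** (1.0 / p)


def sample_similarity(sample, ideal, p=1.0):
    """Arithmetic mean of the feature-wise similarities between two vectors."""
    if len(sample) != len(ideal) or not sample:
        raise ValueError("vectors must be non-empty and of equal length")
    sims = [lukasiewicz_similarity(x, v, p) for x, v in zip(sample, ideal)]
    return sum(sims) / len(sims)


s = sample_similarity([0.2, 0.8], [0.4, 0.5], p=1.0)  # mean of 0.8 and 0.7
```

Replacing the arithmetic mean in `sample_similarity` with another aggregation operator is the point at which the different classifier variants diverge.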
A new Bonferroni mean based similarity classifier and its OWA variant are introduced in this section. Before going into details of these new classifiers, we briefly describe the main components typically found in similarity classifiers.
The similarity between samples in a given data set is determined by comparing the samples and expressing the result of the comparison as a numerical value. Typically for similarity classifiers, resulting values closer to 1 indicate high similarity between objects and values closer to
Suppose a data matrix
The sample
In order to use similarity with the Bonferroni-OWA in Algorithm
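The overall idea of a similarity classifier can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of the paper: features are assumed to be scaled to [0, 1], one ideal vector per class is assumed to be the generalized mean of that class's training samples, feature-wise similarities are aggregated with a Bonferroni mean, and a sample is assigned to the class with the highest aggregated similarity. All function and parameter names are hypothetical.

```python
def bonferroni_mean(values, p=1.0, q=1.0):
    """Bonferroni mean over all ordered pairs (i, j), i != j (assumed form)."""
    n = len(values)
    total = sum(values[i] ** p * values[j] ** q
                for i in range(n) for j in range(n) if i != j)
    return (total / (n * (n - 1))) ** (1.0 / (p + q))


def fit_ideal_vectors(samples, labels, m=1.0):
    """One ideal vector per class: feature-wise generalized mean (assumption)."""
    by_class = {}
    for x, c in zip(samples, labels):
        by_class.setdefault(c, []).append(x)
    ideals = {}
    for c, rows in by_class.items():
        width = len(rows[0])
        ideals[c] = [(sum(r[k] ** m for r in rows) / len(rows)) ** (1.0 / m)
                     for k in range(width)]
    return ideals


def classify(sample, ideals, p_sim=1.0, p=1.0, q=1.0):
    """Return the class whose ideal vector is most similar to the sample."""
    best_class, best_score = None, -1.0
    for c, ideal in ideals.items():
        # Feature-wise similarities in the assumed generalized Lukasiewicz form.
        sims = [(1.0 - abs(x ** p_sim - v ** p_sim)) ** (1.0 / p_sim)
                for x, v in zip(sample, ideal)]
        score = bonferroni_mean(sims, p, q)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


train_x = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]]
train_y = [0, 0, 1, 1]
ideals = fit_ideal_vectors(train_x, train_y)
predicted = [classify(x, ideals) for x in [[0.2, 0.2], [0.7, 0.9]]]
```

A Bonferroni-OWA variant would only change the aggregation call inside `classify`, with a weighting vector of length one less than the number of features.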
The experiments were carried out by splitting each studied data set into two parts, one part for training and the other for testing. The data set divisions were repeated randomly
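The evaluation protocol described above can be sketched as a repeated random hold-out. The sketch assumes 30 repetitions (the number of runs reported with the results) and an equal split between training and testing, which is a hypothetical choice; `train_and_score` is a hypothetical callback that trains a classifier on the training part and returns its accuracy on the testing part. The sample variance is used here, which may differ from the variance estimator used in the paper.

```python
import random
import statistics


def repeated_holdout(samples, labels, train_and_score,
                     runs=30, train_fraction=0.5, seed=0):
    """Mean and variance of the accuracy over repeated random splits."""
    if runs < 2:
        raise ValueError("at least two runs are needed to compute a variance")
    rng = random.Random(seed)          # fixed seed for reproducibility
    indices = list(range(len(samples)))
    accuracies = []
    for _ in range(runs):
        rng.shuffle(indices)           # new random division for every run
        cut = int(len(indices) * train_fraction)
        train, test = indices[:cut], indices[cut:]
        accuracy = train_and_score(
            [samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test],
        )
        accuracies.append(accuracy)
    return statistics.mean(accuracies), statistics.variance(accuracies)
```

The returned pair corresponds to the mean accuracy and variance columns of the result tables, up to the assumptions listed above.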
Data sets used in testing our new classifier were retrieved from the UCI Machine Learning Repository [
Data sets used and their main properties.
Data set  Number of classes  Number of attributes  Number of instances 

Fertility  2  10  100 
Blood transfusion service center  2  5  748 
Echocardiogram  2  12  132 
Lung cancer  3  56  32 
Further detailed attribute information for the fertility, blood transfusion service center, and echocardiogram data sets is presented in Tables
Fertility data set attribute information.
Attribute number  Attribute name 

1  Season 
2  Age 
3  Childish diseases 
4  Accident or serious trauma 
5  Surgical intervention 
6  High fevers in the last year 
7  Frequency of alcohol consumption 
8  Smoking habit 
9  Number of hours 
10  Output (class attribute) 
Blood transfusion service center data set attribute information.
Attribute number  Attribute name 

1  Recency: months since last donation 
2  Frequency: total number of donations 
3  Monetary: total blood donated in c.c. 
4  Time: months since first donation 
5  Donated blood or not (class attribute) 
Echocardiogram data set attribute information.
Attribute number  Attribute name 

1  Survival 
2  Still alive 
3  Age at heart attack 
4  Pericardial effusion 
5  Fractional shortening 
6  EPSS 
7  LVDD 
8  Wall motion score 
9  Wall motion index 
10  Mult 
11  Name 
12  Alive at one year or not (class) 
In this section we present the obtained results from the experiments. Mean accuracies from 30 separate runs were computed for each data set and for each classifier combination. The resulting classification accuracies and the variances obtained are reported in Tables
Classification results with the fertility data set.
Method  Mean accuracy (%)  Variance 

Similarity classifier with Bonferroni-OWA using  
Basic RIM quantifier  70.60  0.0073 
Polynomial quantifier 

0.0068 
Quadratic quantifier  69.13  0.0094 
Exponential quantifier  70.40  0.0084 
Trigonometric quantifier  69.40  0.0076 


Similarity classifier with  
( 

0.0105 
( 
66.87  0.0031 
( 
69.07  0.0073 
Classification results with blood transfusion service center data set.
Method  Mean accuracy (%)  Variance 

Similarity classifier with Bonferroni-OWA using  
Basic RIM quantifier 


Polynomial quantifier 


Quadratic quantifier 


Exponential quantifier 


Trigonometric quantifier 




Similarity classifier with  
( 


( 
67.87 

( 
76.43 

Classification results with the echocardiogram data set.
Method  Mean accuracy (%)  Variance 

Similarity classifier with Bonferroni-OWA using  
Basic RIM quantifier 


Polynomial quantifier 


Quadratic quantifier 


Exponential quantifier 


Trigonometric quantifier 




Similarity classifier with  
( 


( 
86.89 

( 
90.11 

Classification results with the lung cancer data set.
Method  Mean accuracy (%)  Variance 

Similarity classifier with Bonferroni-OWA using  
Basic RIM quantifier  81.96  0.0052 
Polynomial quantifier  81.37  0.0038 
Quadratic quantifier  82.75  0.0036 
Exponential quantifier  82.16  0.0027 
Trigonometric quantifier  81.57  0.0033 


Similarity classifier with  
( 

0.0032 
( 
82.16  0.0032 
( 
82.16  0.0032 
For the fertility data set, the highest classification accuracy of
Figure
Mean classification accuracies (a) and the variances (b) obtained from the fertility data set with the Bonferroni mean based classifier, when
When the new similarity classifiers were used to classify the blood transfusion service center data set, the highest achieved mean accuracy was
Mean classification accuracies (a) and the variances of the accuracies (b) obtained from the blood transfusion service center data set with the Bonferroni-OWA based similarity classifier with the basic linguistic quantifier variant.
With the echocardiogram data set, the highest achieved mean classification accuracy was
Figure
Mean classification accuracies (a) and the variances (b) obtained from the echocardiogram data set with the trigonometric linguistic quantifier variant of the Bonferroni-OWA based similarity classifier.
When the lung cancer data set was examined, the highest achieved mean accuracy was
Mean classification accuracies (a) and the variances (b) obtained from the lung cancer data set with the Bonferroni mean based similarity classifier, when
In Table
A summary of classification accuracies for each of the similarity classifiers.
Data set  Bonferroni mean based classifier  Bonferroni-OWA based classifier  Generalized mean based classifier  Arithmetic mean based classifier 

Fertility 

70.63%  69.07%  66.87% 
Blood transfusion  76.86% 

76.43%  67.87% 
Echocardiogram  90.96% 

90.11%  86.89% 
Lung cancer 

82.75%  81.57%  82.16% 
In this paper we have proposed a new Bonferroni mean based similarity classifier and a new Bonferroni-OWA based similarity classifier with five variants, each of which uses a different linguistic quantifier based weight generation scheme for the OWA in the classifier. The classification performance of the proposed similarity classifiers was tested on four real-world medical data sets; for each data set thirty runs were made and the mean classification accuracy was recorded. As a benchmark, we compared the results from the proposed similarity classifiers with two previously presented similarity classifiers, based on the generalized mean and on the arithmetic mean. The mean classification performance of the proposed similarity classifiers was better than that of the benchmarks; however, the difference in performance was not statistically significant on all data sets. Nevertheless, the evidence suggests that the proposed similarity classifiers perform at least as well as, and often better than, the benchmark similarity classifiers. We note that the performance of these classifiers is data dependent.
In future research on similarity classifiers, a multiclassifier approach could be considered, in which each classifier has one vote on the class of a sample and the final class of the sample is decided by the consensus of the classifiers.
The authors declare that there are no competing interests regarding the publication of this paper.