Support vector machines (SVMs) are popular machine learning methods owing to their high generalization ability. Finding an adaptive kernel function is a key problem in moving SVMs from theory to practical applications. This paper proposes a support vector classifier based on a vague sigmoid kernel and its similarity measure. The proposed method exploits the characteristics of vague sets and replaces the traditional inner product with a vague similarity measure between training samples. Experimental results show that the proposed method reduces CPU time while maintaining classification accuracy.
1. Introduction
Support vector machine (SVM) constructs a hyperplane or set of hyperplanes in a high-dimensional feature space, which can be used for classification, regression, or other tasks. It was first introduced by Vapnik [1] for classification and has been applied to many application fields successfully. SVM is based on the structural risk minimization principle, which incorporates capacity control to prevent overfitting and thus is a partial solution to the bias-variance trade-off dilemma. The basic idea of SVM classification is to find such a separating hyperplane that corresponds to the largest possible margin between points of different classes.
Finding an adaptive kernel function is a key problem in moving SVM from theory to practical applications. There are several kernel functions, for example, the linear kernel, the polynomial kernel, and the RBF kernel. These kernel functions are positive semidefinite (PSD). However, some non-PSD matrices are used in practice. An important one is the sigmoid kernel K(xi,xj)=tanh(axiTxj+r), which is related to neural networks. It was first pointed out by Vapnik [1] that the sigmoid kernel matrix might not be PSD for certain values of the parameters a and r. However, the sigmoid kernel matrix is conditionally positive definite for certain parameter settings and is thus a valid kernel.
Meanwhile, real-life datasets are usually noisy, and a classifier trained on noisy data cannot classify some data samples correctly. Many researchers have therefore introduced fuzzy theory into support vector machines to address this problem, in two ways. The first is the fuzzy support vector machine (FSVM) [2–4]. FSVM takes into account the noise in the training set and associates a fuzzy membership with every sample to account for the uncertainty in the class to which it belongs. It uses the membership function to express the grade to which a sample belongs to the positive or negative class. The second is to combine fuzzy theory with the kernel functions of SVM and propose a fuzzy kernel-based SVM. A fuzzy kernel is apt to construct a robust classifier and to handle classification and regression of uncertain or fuzzy data. Soria-Olivas et al. [5] propose a fuzzy-based activation function for artificial neural networks. Camps-Valls et al. [6] extend the fuzzy-based activation function and propose a support vector classifier based on a fuzzy sigmoid kernel. The fuzzy sigmoid function allows lower computational cost and a higher rate of positive eigenvalues of the kernel matrix than the standard sigmoid kernel [6]. Yang et al. [7] develop a kernel fuzzy c-means clustering-based fuzzy SVM algorithm to deal with classification problems with outliers or noise.
In fuzzy theories, the vague set theory [8] is one of the methods used to deal with uncertain information and has gradually become more and more popular for handling decision-making problems. Since vague sets can provide more information than fuzzy sets, they are considered superior in mathematical analysis of uncertain information. This paper combines vague sets with sigmoid kernel and proposes a novel support vector classifier based on vague sigmoid kernel.
The rest of this paper is organized as follows: Section 2 reviews the related research and briefly describes the vague set theory and support vector machine. We present a novel support vector classifier based on vague sigmoid kernel and its similarity measure in Section 3. Section 4 presents the experimental results obtained on benchmark data sets and analyzes the performance of the proposed algorithm. Section 5 concludes the paper with some final remarks.
2. Vague Set and Support Vector Machine
2.1. The Concept of Vague Sets
Fuzzy set theory was first proposed by Zadeh [9]. It is an important mathematical approach to uncertain and fuzzy data analysis and has successfully been applied in the areas of fuzzy control, fuzzy decision making, and so on.
Introduced by Gau and Buehrer [8], the vague set is a generalization of the concept of a fuzzy set. Note that vague sets are essentially the same as intuitionistic fuzzy sets according to some research work [10]. The major advantage of vague sets over fuzzy sets is that the former make descriptions of the objective world more realistic, practical, and accurate. Many scholars have since studied the theory further, and vague sets have been widely applied in medical diagnosis, decision making, pattern recognition, uncertain knowledge acquisition, and so forth [11–14].
Let X be the universe of discourse, X={x1,x2,…,xn-1,xn}, with a generic element of X denoted by xi. A vague set A in X is characterized by a truth-membership function tA and a false-membership function fA,
(1)tA:X→[0,1],fA:X→[0,1],
where tA(xi) is a lower bound on the grade of membership of xi derived from the evidence for xi, fA(xi) is a lower bound on the negation of xi derived from the evidence against xi, and tA(xi)+fA(xi)≤1. It is clear that the grade of membership of xi in the vague set A is restricted to a subinterval [tA(xi), 1-fA(xi)] of [0,1]. The subinterval [tA(xi), 1-fA(xi)] is called the vague value of xi in vague set A.
The vague value [tA(xi), 1-fA(xi)] indicates that the exact grade of membership μA(xi) of xi may be unknown but is bounded by tA(xi)≤μA(xi)≤1-fA(xi).
When the universe of discourse X is continuous, a vague set A can be written as
(2)A=∫X[tA(xi),1-fA(xi)]/xi.
When the universe of discourse X is discrete, a vague set A can be written as
(3)A=∑i=1n[tA(xi),1-fA(xi)]/xi.
Let πA(xi) be an uncertain degree of xi in vague set A, πA(xi)=1-tA(xi)-fA(xi). πA(xi) characterizes the precision of our knowledge about xi. If πA(xi) is small, our knowledge about xi is relatively precise; if it is large, we know correspondingly little. If 1-fA(xi) is equal to tA(xi), our knowledge about xi is exact, and the theory reverts back to that of fuzzy sets. If both 1-fA(xi) and tA(xi) are equal to 1 or 0, our knowledge about xi is very exact, and the theory reverts back to that of ordinary sets.
For example, let A be a vague set with truth-membership function tA and false-membership function fA. If a vague value is [0.5, 0.8], then by the definition above tA(xi)=0.5, fA(xi)=1-0.8=0.2, and πA(xi)=1-0.5-0.2=0.3. This can be interpreted as follows: "out of a total of 10 votes on a resolution, 5 are in favor, 2 are against, and 3 are abstentions." Obviously, a fuzzy set cannot exactly represent and process this type of obscure information.
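The voting interpretation above can be sketched in a few lines of code; the function name is ours, not from the paper.

```python
# Vague value from the voting model: out of `total` votes, `votes_for` are in
# favor and `votes_against` are against; the rest are abstentions.
def vague_value_from_votes(votes_for, votes_against, total):
    """Return the vague value [t_A(x), 1 - f_A(x)]."""
    t = votes_for / total        # lower bound t_A(x) from evidence for x
    f = votes_against / total    # lower bound f_A(x) from evidence against x
    return t, 1.0 - f

t, one_minus_f = vague_value_from_votes(5, 2, 10)
pi = one_minus_f - t             # uncertainty degree pi_A(x) = 1 - t - f
print([t, one_minus_f])          # t = 0.5, 1 - f ≈ 0.8, so pi ≈ 0.3
```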
Many similarity measures have been proposed in the literature for measuring the degree of similarity between vague sets. Chen [15, 16] proposed the concept of similarity measures between vague sets and defined the expression MC(X,Y) as follows: MC(X,Y)=1-|(S(X)-S(Y))/2|, where X and Y are two vague values, S(X)=tX-fX, and S(Y)=tY-fY. Obviously, the larger the value of MC(X,Y), the more similar the vague values X and Y. Hung and Yang [11] presented three new similarity measures between intuitionistic fuzzy sets based on the Hausdorff distance. Li et al. [17] analyzed and summarized several similarity measures between vague sets. Dou et al. [18] developed a new similarity measure of vague sets and defined a new relative degree of similarity to solve the fuzzy shortest path problem.
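Chen's measure MC is simple enough to state directly in code; a minimal sketch, with each vague value passed as its (t, f) pair:

```python
# Chen's similarity between vague values X and Y:
# M_C(X, Y) = 1 - |(S(X) - S(Y)) / 2|, where S(X) = t_X - f_X.
def chen_similarity(tx, fx, ty, fy):
    return 1.0 - abs(((tx - fx) - (ty - fy)) / 2.0)

# Identical vague values are maximally similar; opposite extremes are not:
print(chen_similarity(0.5, 0.2, 0.5, 0.2))  # 1.0
print(chen_similarity(1.0, 0.0, 0.0, 1.0))  # 0.0
```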
2.2. Support Vector Machine and Its Kernel
In this section, we briefly review the learning algorithm of the support vector machine (SVM) initially proposed in [1]. Given a binary classification problem represented by a dataset {(x1,y1),(x2,y2),…,(xl,yl)}, where xi∈ℜn represents an n-dimensional data sample and yi∈{+1,-1} represents the class of that data sample, for i=1,…,l, the goal of the SVM learning algorithm is to find an optimal hyperplane that separates these data samples into two classes. In order to find a better separation of classes, the data are first transformed into a higher-dimensional feature space by a mapping function ϕ. A possible separating hyperplane in this feature space can then be represented by
(4)w·ϕ(x)+b=0.
The support vector technique requires the solution of the following optimization problem:
(5)minw,b,ξ12∥w∥2+C∑i=1nξi
subject to the constraints
(6)yi(w·ϕ(xi)+b)≥1-ξi,ξi≥0,i=1,…,n,
where the training vectors xi are mapped into a higher-dimensional space by the function ϕ, and C is a user-specified positive parameter that controls the trade-off between maximizing the margin and minimizing the training error. The slack variables ξi≥0 are nonzero for samples that violate the margin, so ∑i=1nξi can be regarded as a measure of the amount of misclassification. This quadratic optimization problem can be solved by constructing a Lagrangian representation and transforming it into the following dual problem:
(7)maxλW(λ)=∑i=1nλi-12∑i=1n∑j=1nλiλjyiyjϕ(xi)·ϕ(xj)=∑i=1nλi-12∑i=1n∑j=1nλiλjyiyjK(xi,xj)
subject to the constraints
(8)∑i=1nλiyi=0,0≤λi≤C,i=1,…,n,
where λi is the Lagrangian parameter. Note that the kernel trick K(xi,xj)=ϕ(xi)·ϕ(xj) is used in the last equality in (7). The Karush-Kuhn-Tucker conditions of SVM are defined by
(9)λi[yi(w·ϕ(xi)+b)-1+ξi]=0,i=1,…,n,(C-λi)ξi=0,i=1,…,n.
The sample xi with the corresponding nonzero λi is called a support vector. The optimal weight vector w0 is obtained by w0=∑i=1nλiyiϕ(xi)=∑i=1nsλiyiϕ(xi), where ns is the number of support vectors. The optimal bias b0 can be computed from the Karush-Kuhn-Tucker conditions (9); namely, b0=yi-w0·ϕ(xi) for any support vector xi with 0<λi<C. Once the optimal pair (w0,b0) is determined, the SVM decision function is given by
(10)f(x)=sign(∑i=1nsλiyiK(x,xi)+b),
where K(xi,xj) is called the kernel function as follows:
(11)K(xi,xj)=ϕ(xi)Tϕ(xj).
Several typical kernel functions are the linear kernel K(xi,xj)=xi·xj, the polynomial kernel K(xi,xj)=(axiTxj+r)d, and the RBF kernel K(xi,xj)=exp(-γ∥xi-xj∥2).
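For concreteness, the three typical kernels can be written out directly (a sketch; the parameter defaults are arbitrary illustrative choices):

```python
import numpy as np

def linear_kernel(xi, xj):
    return float(xi @ xj)

def poly_kernel(xi, xj, a=1.0, r=1.0, d=2):
    return float((a * (xi @ xj) + r) ** d)

def rbf_kernel(xi, xj, gamma=0.5):
    return float(np.exp(-gamma * np.sum((xi - xj) ** 2)))

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(linear_kernel(xi, xj))   # 4.0
print(poly_kernel(xi, xj))     # (4 + 1)^2 = 25.0
print(rbf_kernel(xi, xi))      # 1.0 at zero distance
```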
The kernel functions above must satisfy Mercer's condition; namely, the kernel matrix must be symmetric and positive semidefinite (PSD). Nevertheless, some non-PSD matrices are used in SVMs in practice [19]. The sigmoid kernel K(xi,xj)=tanh(axiTxj+r) is a commonly used non-PSD kernel function. The sigmoid kernel is also known as the hyperbolic tangent kernel and as the multilayer perceptron (MLP) kernel, which comes from the neural networks field.
It was first pointed out by Vapnik [1] that its kernel matrix might be non-PSD for certain values of the parameters a and r. When the kernel function is non-PSD, (11) cannot be satisfied. H. T. Lin and C. J. Lin [19] also study non-PSD kernel functions and their application to SVMs, and they show that the sigmoid kernel matrix is conditionally positive definite (CPD). When a>0 and r<0, the sigmoid kernel is a valid kernel. The sigmoid kernel has been used in several practical cases, such as support vector machine classification [6, 19], decision rule extraction [20], and chaotic time series prediction [21].
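The non-PSD behavior is easy to exhibit numerically. Below, a sigmoid kernel matrix built with a > 0 and a strongly negative r has negative diagonal entries, which forces at least one negative eigenvalue; the data and parameter values are our own illustrative choices:

```python
import numpy as np

# Build a small sigmoid (tanh) kernel matrix and inspect its spectrum.
X = np.array([[0.1, 0.2], [0.3, -0.1], [-0.2, 0.05]])
a, r = 1.0, -5.0
K = np.tanh(a * X @ X.T + r)      # symmetric by construction
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() < 0)          # True: the matrix is not PSD
```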
3. The Proposed Support Vector Classifier Based on Vague Similarity Measure
This section presents the proposed method for vague sigmoid kernel-based support vector classifier. It first gives a brief introduction to fuzzy kernel and then focuses on a proposed algorithm.
3.1. Fuzzy Kernel
Several researchers have studied fuzzy kernels. Kwan [22] proposes a simple sigmoid-like nonlinear activation function that is more suitable for digital hardware implementation:
(12)f(x)={sign(x),if|x|≥L,-x·|x|/L2+2x/L,otherwise,
where L is the width of the transition region.
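A direct transcription of (12) (the function name is ours):

```python
# Kwan's sigmoid-like activation: saturates to +/-1 outside the transition
# region of width L, and follows -x*|x|/L^2 + 2x/L inside it.
def kwan_activation(x, L=1.0):
    if abs(x) >= L:
        return 1.0 if x > 0 else -1.0   # sign(x)
    return -x * abs(x) / L**2 + 2.0 * x / L

print(kwan_activation(2.0))    # 1.0 (saturated)
print(kwan_activation(0.5))    # 0.75
print(kwan_activation(-0.5))   # -0.75
```

Note that the two branches meet continuously at |x| = L, where the quadratic piece equals ±1.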
Inspired by the work of Kwan, Soria-Olivas et al. [5] argue that the activation function of (12) can be derived more naturally by defining the classical activation function by means of fuzzy logic methodology, and they propose a fuzzy-based activation function for artificial neural networks that uses triangular membership functions for their simplicity. The fuzzy-based sigmoid function models the hyperbolic tangent function by means of linguistic variables. Camps-Valls et al. [6] extend the fuzzy-based activation function and propose a support vector classifier based on a fuzzy sigmoid kernel. The fuzzy sigmoid kernel allows lower computational cost and a higher rate of positive eigenvalues of the kernel matrix, which alleviates current limitations of the sigmoid kernel.
Although fuzzy set theory can characterize fuzziness well, it has an obvious shortcoming: it uses a single-valued membership μA(ui)∈[0,1] to represent the degree of membership and thus lacks consideration of some nondeterministic factors among samples. We therefore propose a sigmoid kernel support vector classifier based on vague set theory.
3.2. Vague Value Computation
The proposed algorithm considers two-class classification for simplicity. We first determine the class center of each class, namely, xA* and xB* in Figures 1–3, and then compute the vague values of the samples.
Figure 1: Vague value computation in class A.
Figure 2: Vague value computation in class B.
Figure 3: Vague value computation in the intersection area between class A and class B.
For the sample x in the training set, if x belongs to class A but does not belong to the intersection area between class A and class B, we define vague value of x as follows:
(13)v(x)=[1-∥x-xA*∥rA,1-∥x-xA*∥rA],
where rA is the radius of class A. This case is shown in Figure 1. Similarly, letting rB be the radius of class B, we have ∥x-xB*∥/rB>1.
If x belongs to class B but does not belong to the intersection area between class A and class B, we define vague value of x as follows:
(14)v(x)=[1-∥x-xB*∥rB,1-∥x-xB*∥rB],
where rB is the radius of class B. This case is shown in Figure 2. In this case, ∥x-xA*∥/rA>1.
If sample x belongs to the intersection area, we label mA and mB as mA=∥x-xA*∥/rA and mB=∥x-xB*∥/rB and define vague value of x as follows:
(15)v(x)={[mA(1-mB),mB(1-mA)],if mA≤mB,[mB(1-mA),mA(1-mB)],if mA>mB.
This case is shown in Figure 3.
Through a detailed analysis of the samples in class A and class B, we find that tA(x)=1-fA(x) in the cases shown in Figures 1 and 2, so the vague set reverts to a fuzzy set in these two cases. For a sample x in the intersection area (Figure 3), the vague value reduces the influence of the sample on classification.
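The assignment (13)–(15) can be sketched as follows, assuming the class centers and radii are given (in the paper they come from FCM; the function and argument names are ours). The case of a sample outside both classes is not covered by (13)–(15) and falls into the intersection branch here:

```python
import numpy as np

def vague_value(x, center_a, radius_a, center_b, radius_b):
    m_a = float(np.linalg.norm(x - center_a)) / radius_a
    m_b = float(np.linalg.norm(x - center_b)) / radius_b
    if m_a <= 1.0 < m_b:                      # inside A only, eq. (13)
        return (1.0 - m_a, 1.0 - m_a)
    if m_b <= 1.0 < m_a:                      # inside B only, eq. (14)
        return (1.0 - m_b, 1.0 - m_b)
    # intersection area, eq. (15): sorted() picks the right branch, since
    # mA(1 - mB) <= mB(1 - mA) exactly when mA <= mB
    lo, hi = sorted((m_a * (1.0 - m_b), m_b * (1.0 - m_a)))
    return (lo, hi)

c_a, c_b = np.array([0.0, 0.0]), np.array([2.0, 0.0])
print(vague_value(np.array([0.5, 0.0]), c_a, 1.0, c_b, 1.0))  # (0.5, 0.5)
```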
3.3. Similarity Measure of Vague Value and Its Kernel
For samples xi and xj, their inner product xi·xj is typically computed from the Euclidean distance between the two samples. In this paper, after introducing vague memberships, we replace this Euclidean-distance-based computation with a similarity measure between vague sets. A similarity measure estimates the degree of similarity between two sets. The main idea is as follows. We first compute the vague values of the samples, then represent these vague values as points in a spatial coordinate system, and finally compute similarity measures between the points.
Definition 2 (see [23]).
Let v(xi) be the vague value of sample xi computed by the above method. The corresponding point in the spatial coordinate system is represented as Di(t(xi)(1+π(xi)),f(xi)(1+π(xi)),π2(xi)). We also denote a 3-tuple Di(T(xi),F(xi),Π(xi)) for simplicity, where T(xi)=t(xi)(1+π(xi)), F(xi)=f(xi)(1+π(xi)), and Π(xi)=π2(xi).
As shown in Definition 2, Di in the spatial coordinate system comprises three parts derived from t(xi), f(xi), and π(xi), whose meanings are given in Section 2.1 and Definition 2. From the perspective of the voting model, we consider that some abstentions are likely to lean in favor, others are likely to lean against, and the rest remain abstentions. So, we further divide the abstention part into three parts: t(xi)π(xi), f(xi)π(xi), and π2(xi), which represent the in-favor, against, and abstention shares among all abstentions, respectively. We can thus use a point in three-dimensional space to depict the membership degree of a training sample.
Obviously, t(xi)+f(xi)+π(xi)=1. Using Definition 2, we can get a point Di in three-dimensional space. For the three parts of Di, t(xi)(1+π(xi))+f(xi)(1+π(xi))+π2(xi)=(t(xi)+f(xi))(1+π(xi))+π2(xi)=(1-π(xi))(1+π(xi))+π2(xi)=1.
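Definition 2 and the identity above can be checked numerically (a small sketch; the function name is ours):

```python
# Map a vague value (t, f) to the 3-D point D = (T, F, Pi) of Definition 2:
# T = t(1 + pi), F = f(1 + pi), Pi = pi^2, with pi = 1 - t - f.
def to_point(t, f):
    pi = 1.0 - t - f
    return (t * (1.0 + pi), f * (1.0 + pi), pi ** 2)

T, F, Pi = to_point(0.5, 0.2)    # the [0.5, 0.8] voting example, pi = 0.3
print(round(T + F + Pi, 12))     # 1.0: the three parts always sum to 1
```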
Definition 3.
Let Di and Dj be two points defined as in Definition 2; then their similarity measure T(Di,Dj) can be defined as follows:
Based on the vague value and similarity measure above, we give a computation method for the vague sigmoid kernel function. Expression (12) can be readily rewritten as a function of a and r as follows:
(17)K(xi,xj)={sign(M),if|M|≥1/a,2aM-a2M·|M|,otherwise,
where M=T(Di,Dj)+r/a.
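Expression (17) is straightforward to implement once the similarity T(Di,Dj) is available. Since the closed form of Definition 3's measure is not reproduced in the text, the sketch below takes the similarity value as an input; the parameter defaults (a = 1, r = -0.5) are our own illustrative choices:

```python
# Vague sigmoid kernel (17): saturates to sign(M) when |M| >= 1/a,
# and equals 2aM - a^2 * M * |M| inside the transition region.
def vague_sigmoid_kernel(sim, a=1.0, r=-0.5):
    M = sim + r / a
    if abs(M) >= 1.0 / a:
        return 1.0 if M > 0 else -1.0    # sign(M)
    return 2.0 * a * M - a * a * M * abs(M)

print(vague_sigmoid_kernel(1.0))   # identical samples: M = 0.5 -> 0.75
print(vague_sigmoid_kernel(2.0))   # saturated: 1.0
```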
3.4. The Proposed Vague Sigmoid-Based Support Vector Classifier
In order to compute the vague values of the training samples, we first determine the class center of each class. In this paper, we select the fuzzy c-means (FCM) method [24, 25] for this task; FCM has become one of the most popular fuzzy clustering techniques.
The FCM algorithm starts with an initial guess for the cluster centers, which are intended to mark the mean location of each cluster [24, 25]. The initial guess for these cluster centers is most likely incorrect. Additionally, FCM assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades for each data point, FCM moves the cluster centers to the "right" location within the data set. This iteration is based on minimizing an objective function that represents the distance from any given data point to a cluster center, weighted by that data point's membership grade; namely,
(18)minJ(U,V)=∑i=1c∑k=1luikmd(xk,vi),
(19)s.t.∑i=1cuik=1,0≤uik≤1,
where c is the number of clusters and selected as a specified value in this paper, l is the number of data points, uik∈[0,1] denotes the degree to which the sample xk belongs to the ith cluster, m is the fuzzy parameter controlling the speed and achievement of clustering, d(xk,vi)=∥xk-vi∥2 denotes the squared distance between point xk and the cluster center vi, and V is the set of cluster centers or prototypes (vi∈Rp). When the objects change clusters, the membership values are recalculated according to the following formula:
(20)uik=1/∑j=1c(∥xk-vi∥/∥xk-vj∥)2/(m-1).
Each cluster center is then calculated by
(21)vi=∑k=1l(uik)mxk∑k=1l(uik)m.
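A compact FCM iteration implementing the updates (20) and (21) (a sketch, not the authors' code; note that raising the ratio of squared distances to the power 1/(m-1) equals raising the ratio of distances to 2/(m-1)):

```python
import numpy as np

def fcm(X, c=2, m=2.0, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                       # each column sums to 1
    p = 1.0 / (m - 1.0)
    for _ in range(n_iter):
        # cluster centers, eq. (21)
        V = (U**m @ X) / (U**m).sum(axis=1, keepdims=True)
        # squared distances d(xk, vi), shape (c, n); epsilon avoids 0-division
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12
        # membership update, eq. (20)
        U = 1.0 / (d2**p * (1.0 / d2**p).sum(axis=0))
    return U, V

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
U, V = fcm(X)
print(np.sort(V[:, 0]))   # one center near each of the two groups
```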
After obtaining the cluster centers, the algorithm can compute the vague values of the training samples and measure the similarity of the vague values.
According to the analysis above, we propose a novel vague sigmoid-based support vector classifier. The algorithm is described as follows.
Step 1.
Preprocess the data set and split it into a training set and a testing set.
Step 2.
Use the FCM algorithm to compute the membership uik of each training sample xk in each cluster i and to obtain the cluster centers.
Substep 2.1. Select the number of clusters c, the maximal iteration count Nmax, the fuzziness parameter m (let m=2), and the convergence error ε>0.
Substep 2.2. Initialize the membership matrix U(0)={uik(0)} satisfying the constraints
(22)∑i=1cuik=1,0≤uik≤1.
Substep 2.3. For t=1,…,Nmax,
calculate the membership matrix U(t)={uik(t)} according to (20);
calculate the cluster centers vi(t)(i=1,…,c) according to (21);
calculate the objective function J(t)(U,V) according to (18);
when |J(t)(U,V)-J(t-1)(U,V)|<ε or t=Nmax, stop iteration and return the membership matrix U(t) and the cluster centers vi(t).
Step 3.
Compute vague values of training samples using (13)–(15).
Step 4.
Use SVM based on vague sigmoid kernel to train samples with vague values.
The key steps of the proposed algorithm are computing the vague values and computing the vague sigmoid kernel (17), which replaces the standard sigmoid kernel K(xi,xj)=tanh(axiTxj+r).
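The pipeline (Steps 2–4, with FCM replaced by class means for brevity) can be sketched end to end on toy data. Since the closed form of Definition 3's similarity is not reproduced in the text, Chen's measure MC from Section 2.1 stands in for it; all names and parameter values below are our own:

```python
import numpy as np

def chen_sim(v1, v2):
    # v = (t, 1 - f); S(X) = t - f, so S = t - (1 - (1 - f))
    s1 = v1[0] - (1.0 - v1[1])
    s2 = v2[0] - (1.0 - v2[1])
    return 1.0 - abs((s1 - s2) / 2.0)

def vague_of(x, c_a, r_a, c_b, r_b):          # eqs. (13)-(15)
    m_a = float(np.linalg.norm(x - c_a)) / r_a
    m_b = float(np.linalg.norm(x - c_b)) / r_b
    if m_a <= 1.0 < m_b:
        return (1.0 - m_a, 1.0 - m_a)
    if m_b <= 1.0 < m_a:
        return (1.0 - m_b, 1.0 - m_b)
    lo, hi = sorted((m_a * (1.0 - m_b), m_b * (1.0 - m_a)))
    return (lo, hi)

def vague_sigmoid(sim, a=1.0, r=-0.5):        # eq. (17)
    M = sim + r / a
    if abs(M) >= 1.0 / a:
        return 1.0 if M > 0 else -1.0
    return 2.0 * a * M - a * a * M * abs(M)

XA = np.array([[0.0, 0.0], [0.2, 0.1]])       # class A samples
XB = np.array([[2.0, 2.0], [2.2, 2.1]])       # class B samples
c_a, c_b = XA.mean(axis=0), XB.mean(axis=0)   # stand-in for FCM centers
vals = [vague_of(x, c_a, 1.0, c_b, 1.0) for x in np.vstack([XA, XB])]
K = np.array([[vague_sigmoid(chen_sim(u, v)) for v in vals] for u in vals])
print(K.shape)   # (4, 4): a Gram matrix ready for a precomputed-kernel SVM
```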
4. Experimental Analysis
Six data sets from the University of California at Irvine (UCI) machine learning repository [26] are used in our experiments: Ionosphere, Sonar, Pima-diabetes, Wdbc, Iris, and Vehicle. Iris and Vehicle are multiclass problems; the others are binary classification problems. The characteristics of these data sets are shown in Table 1. We tested the proposed vague sigmoid kernel method and compared it to the sigmoid-based SVM and the fuzzy sigmoid-based SVM. The results of these methods depend on the values of the kernel parameters a and r and the penalization parameter C.
Table 1: Characteristics of experimental data sets.

Dataset       | No. of samples | No. of attributes | No. of classes
Ionosphere    | 351            | 34                | 2
Sonar         | 208            | 60                | 2
Pima-diabetes | 768            | 8                 | 2
Wdbc          | 569            | 30                | 2
Iris          | 150            | 4                 | 3
Vehicle       | 846            | 18                | 4
In order to test the proposed methods, SVM models were trained using LIBSVM [27]. Parameter a was fixed to 1/A, where A is the input dimension of the data set, and the other parameters of all methods were optimized by grid search with 5-fold cross-validation. For all data sets, the training and testing splits were the same for all methods, and 5-fold cross-validation was used to estimate the accuracy of the classifiers. We compare classification accuracy and CPU time (s) for the sigmoid, fuzzy sigmoid, and vague sigmoid kernels. Experimental results are shown in Table 2.
Table 2: Accuracy (%) and time (s) comparison of different methods.

Dataset       | Sigmoid          | Fuzzy sigmoid    | Vague sigmoid
              | Acc (%) | Time (s) | Acc (%) | Time (s) | Acc (%) | Time (s)
Ionosphere    | 94.8    | 22.30    | 93.4    | 20.52    | 94.3    | 21.83
Sonar         | 88.2    | 31.85    | 88.1    | 24.87    | 87.9    | 28.32
Pima-diabetes | 70.9    | 17.56    | 72.8    | 12.38    | 73.5    | 14.54
Wdbc          | 97.3    | 21.12    | 97.9    | 18.04    | 97.9    | 19.28
Iris          | 97.4    | 4.35     | 93.8    | 3.87     | 96.1    | 4.21
Vehicle       | 71.2    | 36.74    | 65.3    | 23.47    | 70.8    | 30.42
From Table 2, we can see that the sigmoid-based method achieves better accuracy than the fuzzy sigmoid and vague sigmoid methods on the Ionosphere, Sonar, Iris, and Vehicle data sets, whereas the fuzzy sigmoid and vague sigmoid methods are more accurate on the Pima-diabetes and Wdbc data sets. However, the sigmoid-based method also requires more CPU time than the other methods. Notice that the vague sigmoid method achieves the same or slightly better accuracy than the fuzzy sigmoid method.
If we weigh accuracy against CPU time, we may prefer the solution found with the vague sigmoid method. The average CPU time over all 6 data sets is 22.32 s for the sigmoid method versus 19.77 s for the vague sigmoid method, while the classification accuracy does not decrease remarkably.
5. Conclusions
The support vector machine is a machine learning method that has been applied successfully in many application fields. The sigmoid kernel has been popular for support vector machines due to its origin in neural networks. In this paper, we propose a vague sigmoid kernel-based support vector classifier. The proposed method combines the vague set methodology with the sigmoid kernel, which simplifies the computation of the SVM. In the vague sigmoid kernel, we replace the Euclidean-distance-based inner product computation between two samples with a vague similarity measure. Experiments were conducted on 6 data sets from the UCI machine learning repository, and the classification results were evaluated and compared in terms of accuracy and time. The results indicate that the proposed method can reduce CPU time while maintaining classification accuracy.
Acknowledgments
This work is supported by China Postdoctoral Science Foundation (no. 20110491530), the Science Research Plan of Liaoning Education Bureau (no. L2011186), and the Dalian Science and Technology Planning Project of China (no. 2010J21DW019).
References
[1] V. N. Vapnik.
[2] J. H. Chen and C. S. Chen, "Fuzzy kernel perceptron."
[3] D. Tsujinishi and S. Abe, "Fuzzy least squares support vector machines for multiclass problems."
[4] C. F. Lin and S. D. Wang, "Training algorithms for fuzzy support vector machines with noisy data."
[5] E. Soria-Olivas, J. D. Martín-Guerrero, G. Camps-Valls, A. J. Serrano-López, J. Calpe-Maravilla, and L. Gómez-Chova, "A low-complexity fuzzy activation function for artificial neural networks."
[6] G. Camps-Valls, J. D. Martín-Guerrero, J. L. Rojo-Álvarez, and E. Soria-Olivas, "Fuzzy sigmoid kernel for support vector classifiers."
[7] X. Yang, G. Zhang, J. Lu, and J. Ma, "A kernel fuzzy c-means clustering-based fuzzy support vector machine algorithm for classification problems with outliers or noises."
[8] W. L. Gau and D. J. Buehrer, "Vague sets."
[9] L. A. Zadeh, "Fuzzy sets."
[10] H. Bustince and P. Burillo, "Vague sets are intuitionistic fuzzy sets."
[11] W. L. Hung and M. S. Yang, "Similarity measures of intuitionistic fuzzy sets based on Hausdorff distance."
[12] D. Zhang, J. Zhang, K. K. Lai, and Y. Lu, "A novel approach to supplier selection based on vague sets group decision."
[13] J. Ye, "Using an improved measure function of vague sets for multicriteria fuzzy decision-making."
[14] L. Feng, T. Li, D. Ruan, and S. Gou, "A vague-rough set approach for uncertain knowledge acquisition."
[15] S. M. Chen, "Measures of similarity between vague sets."
[16] S. M. Chen, "Similarity measures between vague sets and between elements."
[17] Y. Li, D. L. Olson, and Z. Qin, "Similarity measures between intuitionistic fuzzy (vague) sets: a comparative analysis."
[18] Y. Dou, L. Zhu, and H. S. Wang, "Solving the fuzzy shortest path problem using multi-criteria decision method based on vague similarity measure."
[19] H. T. Lin and C. J. Lin, "A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods," Department of Computer Science and Information Engineering, National Taiwan University, 2003, http://www.csie.ntu.edu.tw/~cjlin/papers/tanh.pdf.
[20] Q. Liu, Y. Zhang, and Z. Hu, "Extracting decision rules from sigmoid kernel," in Proceedings of the 4th International Conference on Advanced Data Mining and Applications, 2008, pp. 294–304.
[21] H. Liu, D. Liu, and L. F. Deng, "Chaotic time series prediction using fuzzy sigmoid kernel-based support vector machines."
[22] H. K. Kwan, "Simple sigmoid-like activation function suitable for digital hardware implementation."
[23] H. W. Liu, "Basis of fuzzy pattern recognition-similarity measures."
[24] J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters."
[25] J. C. Bezdek, "A convergence theorem for the fuzzy ISODATA clustering algorithms."
[26] A. Frank and A. Asuncion.
[27] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines."