An Optimal Classification Method for Biological and Medical Data

This paper proposes a classification method based on unions of hyperspheres, formulated as a mixed-integer nonlinear program, for biological and medical datasets. The nonlinear terms of the classification program are handled by a piecewise linearization technique so that a global optimum can be obtained. Numerical examples illustrate that the proposed method obtains the global optimum more effectively than current methods.


Introduction
Classification techniques have been widely applied in the biological and medical research domains [1-5]. Both object classification and pattern recognition for biological and medical datasets demand optimal accuracy, because patients' lives may depend on the result. However, cancer identification with supervised learning techniques does not take a global view in identifying species or predicting survival. An improvement should cover the whole scope of the data and yield implications, instead of considering only the efficiency of diagnosis. This research aims to extract features from whole datasets in the form of induction rules.
Given a dataset of objects, in which each object has several attributes and belongs to a specific class, classification techniques are used to find a rule on the attributes that appropriately describes the features of a specified class. These techniques have been studied over the last four decades and include decision tree-based methods [6-11], hyperplane-based methods [12-14], and machine learning-based methods [14-17].
To assess the effects of these classification techniques, three criteria are used to evaluate the quality of the induced rules, based on the study of Li and Chen [3].
(i) Accuracy. The rule fitting a class should not cover the objects of other classes. The higher the accuracy of a rule, the better.
(ii) Support. A good rule fitting a class should be supported by most of the objects of the same class.
(iii) Compactness. A good rule should be expressed in a compact way. That is, the fewer the rules, the better.
This study proposes a novel method to induce rules with high rates of accuracy, support, and compactness based on global optimization techniques, which have become increasingly useful in biological and medical research.
The rest of this paper is organized as follows. Section 2 gives an overview of the related literature. Two types of mathematical models and a classification algorithm are proposed in Section 3. Numerical examples demonstrate the effectiveness of the proposed method in Section 4. Finally, the main conclusions of this study and future work are presented in Section 5.

Literature Review
Currently, two well-known methods are used to induce classification rules. The first is the decision tree-based method, which has been developed over the last few decades [6-10]. It has been widely applied, for example to fault isolation of an induction motor [18], classification of normal or tumor tissues [19], skeletal maturity assessment [20], proteomic mass spectra classification [21], and other cases [22, 23]. Because the decision tree-based method assumes that all classes can be separated by linear operations, the induced rules suffer if the boundaries between the classes are nonlinear. In fact, the linearity assumption limits practical applications because many biological and medical datasets have complicated nonlinear interactions between attributes and predicted classes.
Consider the classification problem with two attributes shown in Figure 1, where " " represents a first-class object and "•" represents a second-class object. Figure 1 depicts a situation in which a nonlinear relationship exists between the objects of the two classes. The decision tree method induces classification rules for the objects as shown in Figure 1(b), where it requires four rectangular regions to classify the objects.
The second is the support vector hyperplane method, which has been used for feature selection and rule extraction from gene expression data of cancer tissue [24]; it has also been applied in other areas [12-14, 25]. The technique separates observations of different classes by multiple hyperplanes. Because decision variables are required to express the relationship between each training datum and the hyperplane, and finding the separating hyperplane is formulated as a nonlinear programming problem, training becomes slow for a large number of training data. Additionally, similar hypersphere support vector methods have been developed by Lin et al. [26], Wang et al. [27], Gu and Wu [28], and Hifi and M'Hallah [29] for classifying objects. These classification algorithms partition the sample space using the sphere-structured support vector machine [14, 30]. However, these methods need to formulate the classification problem as a nonlinear nonconvex program, which makes reaching an optimal solution difficult. Taking Figure 1 as an example, a hyperplane-based method requires four hyperplanes to discriminate the objects, as shown in Figure 2.
As previously mentioned, many biological and medical datasets have complicated boundaries between attributes and classes. Both decision tree-based methods and hyperplane-based methods find only rules with high accuracy, which either cover only a narrow part of the objects or require numerous attributes to explain a classification rule. Although these methods are computationally efficient for deducing classification rules, they have the following two limitations.
(i) Decision tree-based methods are heuristic approaches that can only induce feasible rules. Moreover, decision tree-based methods split the data into hyperrectangular regions using a single variable, which may generate a large number of branches (i.e., low rates of compactness).
(ii) Hyperplane-based methods use numerous hyperplanes to separate objects of different classes and divide the objects in a dataset into indistinct groups. They may generate a large number of hyperplanes and associated rules with low rates of compactness.
Therefore, this study proposes a novel hypersphere method to induce classification rules based on a piecewise linearization technique. The technique reformulates the original hypersphere model using binary variables and constraints whose numbers depend on the number of piecewise line segments. As the number of break points used in the linearization increases, the error of the linear approximation decreases, and an approximately global optimal solution of the hypersphere model can be obtained. That is, the proposed method is an optimization approach that can find optimal rules with high rates of accuracy, support, and compactness. The concept of the hypersphere method is depicted in Figure 3, in which only one circle is required to classify the objects. All objects of class "•" are covered by the circle, and those not covered by the circle belong to class " ".

The Proposed Models and Algorithm
As the classification rules directly affect the rates of accuracy, support, and compactness, we formulate two models to determine the highest accuracy rate and the highest support rate, respectively. To facilitate the discussion, the related notations are introduced first:
$a_{i,j}$: $j$th attribute value of the $i$th object,
$h_{t,k,j}$: $j$th center value of the $k$th hypersphere for class $t$,
$r_{t,k}$: radius of the $k$th hypersphere for class $t$,
$n_t$: number of objects in class $t$,
$c_i$: class of the $i$th object, $c_i \in \{1, 2, \ldots, g\}$,
$m$: number of attributes,
$R_t$: a rule describing class $t$.
Based on these notations, we propose two types of classification models as follows.

Two Types of Classification Models
Considering the object $x_i$ and the hypersphere $S_{t,k}$, and normalizing $a_{i,j}$ (i.e., to express its scale easily), we have the following three notations.
Notation 1. Normalization rescales all $a_{i,j}$ as $\tilde{a}_{i,j}$. The normalizing formula is
$$\tilde{a}_{i,j} = \frac{a_{i,j} - \underline{a}_j}{\overline{a}_j - \underline{a}_j},$$
where $0 \le \tilde{a}_{i,j} \le 1$, $\overline{a}_j$ is the largest value of attribute $j$, and $\underline{a}_j$ is the smallest value of attribute $j$.
Notation 2. A general form for expressing an object $x_i$ is written as
$$x_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,m}; c_i),$$
where $c_i$ is the class index of object $x_i$.
Notation 3. A general form for expressing a hypersphere $S_{t,k}$ is written as
$$S_{t,k} = (h_{t,k,1}, h_{t,k,2}, \ldots, h_{t,k,m}; r_{t,k}),$$
where $S_{t,k}$ is the $k$th hypersphere for class $t$. We use two and three dimensions (i.e., two and three attributes) to visualize a circle and a sphere, respectively (Figure 4). Figure 4(a) denotes the centroid of the circle as $(h_{t,k,1}, h_{t,k,2})$ and the radius of the circle as $r_{t,k}$. The circle extends to a sphere in three dimensions (Figure 4(b)) and, in $m$ dimensions (i.e., $m$ attributes) with $m > 3$, to a hypersphere.
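To make Notations 1 to 3 concrete, the following minimal Python sketch rescales attributes by the minimum and maximum of each attribute and tests whether an object falls inside a hypersphere given by a center and a radius. The function names, the sample values, and the choice to count boundary points as covered are illustrative assumptions, not part of the original formulation.

```python
def normalize(rows):
    """Min-max rescale each attribute (column) to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # A constant attribute carries no information; map it to 0.0 (assumption).
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def is_covered(point, center, radius):
    """True if `point` lies inside or on the hypersphere (center, radius)."""
    dist_sq = sum((p - h) ** 2 for p, h in zip(point, center))
    return dist_sq <= radius ** 2


# Hypothetical objects with two attributes, for illustration only.
raw = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0], [10.0, 50.0]]
scaled = normalize(raw)
flags = [is_covered(x, center=(0.25, 0.25), radius=0.3) for x in scaled]
```

The same membership test applies unchanged for any number of attributes $m$, since the squared distance is summed over all coordinates.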
To find the center and the radius of each hypersphere, the following two nonlinear models are considered. The first model seeks a support rate as high as possible while the accuracy rate is fixed to 1, as shown in Model 1.

Mathematical Problems in Engineering
Model 1. One has the following:
where $I^{+}$ and $I^{-}$ are the two sets of objects expressed, respectively, by
$$I^{+} = \{i \mid \text{object } i \in \text{class } t\}, \quad (3.5)$$
$$I^{-} = \{i \mid \text{object } i \notin \text{class } t\}. \quad (3.6)$$
Referring to Li and Chen [3], the rates of accuracy and support of $R_t$ in Model 1 can be specified by the following definitions.
where $I^{+}$ and $I^{-}$ are the two sets expressed by (3.5) and (3.6), respectively. Similarly, the rates of accuracy and support of $R_t$ in Model 2 can be considered as follows.
Definition 3.3. The accuracy rate of a rule $R_t$ of Model 2 is denoted as $AR(R_t)$ and is specified as follows.
(i) If $\sum_{k \in K} v_{t,i,k} = 0$, where object $i$ belongs to class $t$, then $V_{t,i} = 1$ for all $i$; otherwise, $V_{t,i} = 0$, where $K$ represents the hypersphere set for class $t$.
Proposition 3.6 (referring to Beale and Forrest [31]). Denote the approximate function $L(f(x))$ as a piecewise linear function (i.e., a linear convex combination) of $f(x)$, where $b_l$, $l = 1, 2, \ldots, q$, represent the break points of $L(f(x))$. $L(f(x))$ is expressed as follows:
$$L(f(x)) = \sum_{l=1}^{q} f(b_l)\, w_l,$$
$$x = \sum_{l=1}^{q} w_l b_l, \quad (3.12)$$
$$\sum_{l=1}^{q} w_l = 1, \quad (3.13)$$
where $w_l \ge 0$, and (3.13) is a special-ordered set of type 2 (SOS2) constraint (see Beale and Forrest [31]). Note that an SOS2 constraint acts on a set of variables in which at most two variables may be nonzero. If two variables are nonzero, they must be adjacent in the set.
Notation 4. According to Proposition 3.6, let $f(x) = h_{t,k,j}^{2}$. Then $f(x)$ is linearized by Proposition 3.6 and is expressed as $L(h_{t,k,j}^{2})$.
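The convex-combination form of Proposition 3.6 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it computes the SOS2 weights directly for a known value of $x$, whereas in the mixed-integer program the weights are decision variables chosen by the solver. The equally spaced break points on $[0, 1]$ and all function names are hypothetical.

```python
from bisect import bisect_right


def sos2_weights(x, breakpoints):
    """Return weights w_l >= 0 that sum to 1, with at most two adjacent
    nonzero entries, such that sum(w_l * b_l) equals x."""
    b = breakpoints
    if not b[0] <= x <= b[-1]:
        raise ValueError("x lies outside the break-point range")
    # Index of the segment [b[l], b[l + 1]] that contains x.
    l = min(bisect_right(b, x) - 1, len(b) - 2)
    lam = (x - b[l]) / (b[l + 1] - b[l])
    w = [0.0] * len(b)
    w[l] = 1.0 - lam
    w[l + 1] = lam
    return w


def piecewise_linear(f, x, breakpoints):
    """Evaluate L(f(x)) = sum_l f(b_l) * w_l for the weights above."""
    w = sos2_weights(x, breakpoints)
    return sum(wl * f(bl) for wl, bl in zip(w, breakpoints))


def square(h):
    return h * h


# Nine break points give eight line segments on [0, 1] (assumed spacing).
breakpoints = [l / 8 for l in range(9)]
weights = sos2_weights(0.3, breakpoints)
approx = piecewise_linear(square, 0.3, breakpoints)
```

Because $h^2$ is convex, the piecewise linear value never falls below the true value, and adding break points tightens the approximation.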

Solution Algorithm
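The following algorithm is proposed to seek the highest accuracy rate or the highest support rate.
Step 1. Normalize all attributes (i.e., rescale each $a_{i,j}$ to $(a_{i,j} - \underline{a}_j)/(\overline{a}_j - \underline{a}_j)$, where $\overline{a}_j$ and $\underline{a}_j$ are the largest and smallest values of attribute $j$, so that every rescaled value lies in $[0, 1]$).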
Step 2. Let $t = 1$ and $k = 1$.
Step 3. Solve Model 1 or Model 2 to obtain the $k$th hypersphere of class $t$. Temporarily remove the objects covered by $S_{t,k}$ from the dataset.
Step 4. Let $k = k + 1$, and re-solve Model 1 or Model 2 until all objects in class $t$ are assigned to hyperspheres of the same class.
Step 5. Let $k = 1$ and $t = t + 1$, and repeat Step 3 until all classes are processed.
Step 6. Check the independent hyperspheres and unions of hyperspheres $S_{t,k}$ in the same class $t$.
Step 7. Calculate and record the number of independent hyperspheres and unions of hyperspheres in $US_t$, and iterate over $t$ until all classes are done.
According to this algorithm, we can obtain the optimal rules to classify objects most efficiently.The process of the algorithm is depicted in Figure 6.
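The outer loop of the algorithm (Steps 3 to 5) can be sketched as follows. This is a hedged illustration only: the function `greedy_sphere` is a hypothetical stand-in heuristic and is not the mixed-integer program of Model 1, which would be passed to a solver instead. It keeps the accuracy rate at 1 by shrinking each radius below the distance to the nearest object of another class. Steps 6 and 7 (counting unions of hyperspheres) are omitted.

```python
import math


def greedy_sphere(targets, others):
    """Stand-in for solving Model 1 (NOT the paper's optimization model).
    Tries each remaining target object as a center, uses the largest radius
    that excludes every object of the other classes, and keeps the sphere
    that covers the most target objects."""
    best = None
    for center in targets:
        limit = min((math.dist(center, o) for o in others), default=math.inf)
        if math.isfinite(limit):
            radius = 0.999 * limit  # stay strictly inside the nearest outsider
        else:
            radius = max(math.dist(center, p) for p in targets)
        n_cov = sum(math.dist(center, p) <= radius for p in targets)
        if best is None or n_cov > best[0]:
            best = (n_cov, center, radius)
    return best[1], best[2]


def induce_rules(objects, solve=greedy_sphere):
    """objects: list of (normalized attribute tuple, class label).
    Returns {class label: [(center, radius), ...]}."""
    rules = {}
    for t in sorted({c for _, c in objects}):            # Step 5: next class
        remaining = [x for x, c in objects if c == t]
        others = [x for x, c in objects if c != t]
        rules[t] = []
        while remaining:                                  # Step 4: next k
            center, radius = solve(remaining, others)     # Step 3
            rules[t].append((center, radius))
            # Temporarily remove the objects covered by the new hypersphere.
            remaining = [p for p in remaining
                         if math.dist(center, p) > radius]
    return rules


# Hypothetical normalized objects of two classes, for illustration only.
data = [((0.1, 0.1), 1), ((0.2, 0.1), 1), ((0.15, 0.2), 1),
        ((0.8, 0.8), 2), ((0.9, 0.85), 2), ((0.85, 0.9), 2)]
rules = induce_rules(data)
```

Passing a different `solve` callable, for example a wrapper around a mixed-integer solver, leaves the loop structure unchanged.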

Operation of a Simple Example
Consider the dataset $T$ in Table 1 as an example, in which each object $i$ has two attributes $(a_{i,1}, a_{i,2})$ and a class index $c_i$, for $i = 1, 2, \ldots, 15$. The dataset is expressed as $T = \{x_i \mid (a_{i,1}, a_{i,2}; c_i),\ \forall i = 1, 2, \ldots, 15\}$, with $c_i \in \{1, 2, 3\}$. As there are only two attributes, these 15 objects can be plotted in a two-dimensional space after normalization, as shown in Figure 7(a). This example can be solved by the proposed algorithm as follows.
Step 6. Check and calculate the unions of hyperspheres $S_{t,k}$ for all $k$ in class $t$ (initially, $t = 1$).
Step 7. Let $t = t + 1$, record the number of unions of class $t$ in $US_t$, and repeat Step 6 until $t = g$.

Numerical Examples

Iris Flower Dataset
The Iris Flower dataset contains 150 objects. Each object is described by four attributes (i.e., sepal length, sepal width, petal length, and petal width) and belongs to one of three classes (i.e., setosa, versicolor, and virginica). By solving the proposed method, we induced six hyperspheres i.e., S

The accuracy rates of $(R_1, R_2, R_3)$ in the proposed method are $(1, 1, 1)$, as Model 1 has been solved. This finding indicates that none of the objects in classes 2 or 3 are covered by $S_{1,1}$, none of the objects in classes 1 or 3 are covered by $S_{2,1} \cup S_{2,2} \cup S_{2,3}$, and none of the objects in classes 1 or 2 are covered by $S_{3,1} \cup S_{3,2}$. The support rates of $(R_1, R_2, R_3)$ in the proposed method are $(1, 0.98, 0.98)$, indicating that all objects in class 1 are covered by $S_{1,1}$, 98% of the objects in class 2 are covered by $S_{2,1}$, $S_{2,2}$, and $S_{2,3}$, and 98% of the objects in class 3 are covered by $S_{3,1}$ and $S_{3,2}$. The compactness rate of rules $R_1$, $R_2$, and $R_3$ is computed as $CR(R_1, R_2, R_3) = 3/3 = 1$. Finally, we determine the following.
(i) Although all three methods perform very well in the rates of accuracy and support, the proposed method has the best accuracy for classes 2 and 3 (i.e., $R_2$ and $R_3$).
(ii) The proposed method has the best compactness rate.
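The accuracy and support rates reported above can be computed from a set of hyperspheres as in the sketch below. The exact formulas are an assumption based on the verbal criteria in the Introduction: accuracy is taken as the share of covered objects that belong to class $t$, and support as the share of class $t$ objects that are covered. The function names and sample objects are hypothetical.

```python
import math


def rule_covers(point, spheres):
    """True if `point` lies in the union of the given hyperspheres."""
    return any(math.dist(point, center) <= radius for center, radius in spheres)


def accuracy_and_support(objects, spheres, t):
    """objects: list of (attribute tuple, class label); spheres: rule for class t.
    Returns (accuracy rate, support rate) under the assumed definitions."""
    covered = [c for x, c in objects if rule_covers(x, spheres)]
    n_t = sum(1 for _, c in objects if c == t)
    hits = sum(1 for c in covered if c == t)
    accuracy = hits / len(covered) if covered else 0.0
    support = hits / n_t if n_t else 0.0
    return accuracy, support


# Hypothetical normalized objects, for illustration only.
objects = [((0.1, 0.1), 1), ((0.2, 0.2), 1), ((0.6, 0.6), 1),
           ((0.3, 0.2), 2), ((0.9, 0.9), 2)]
rule_1 = [((0.15, 0.15), 0.1)]
ar, sr = accuracy_and_support(objects, rule_1, t=1)
```

In this toy case the single hypersphere covers two of the three class 1 objects and no class 2 object, so the rule is fully accurate but only partly supported.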

Swallow Dataset
The European barn swallow (Hirundo rustica) dataset was obtained by trapping individual swallows in Stirlingshire, Scotland, between May and July 1997. This dataset contains 69 swallows. Each object is described by eight attributes and belongs to one of two classes (i.e., the birds are classified by gender).
Here, we also used Model 1 to induce the classification rules. Table 4 lists the optimal solutions (i.e., centroid and radius) for both rules $R_1$ and $R_2$.
The result of the decision tree method, which is referred to in Li and Chen [3], is listed in Table 5, where $AR(R_1, R_2) = (0.97, 1)$, $SR(R_1, R_2) = (0.97, 1)$, and $CR = 0.3$.
The result of the hyperplane method, referred to in Chang and Lin [35], is also listed in Table 5, where $AR(R_1, R_2) = (0.97, 1)$, $SR(R_1, R_2) = (0.97, 1)$, and $CR = 0.1$. Comparing the three methods in Table 5 shows that the proposed method can induce rules with better or equivalent values of AR and SR. The proposed method also has the best compactness rate.

HSV Dataset
The HSV dataset contains 122 patients classified into four classes, with each patient having 11 preoperative attributes. To maximize the support rate (i.e., using Model 1), the proposed method generated seven hyperspheres and three unions of hyperspheres. The centroids and radii of the hyperspheres are reported in Table 6, and a comparison with other methods is reported in Table 7.
Applying the decision tree method to the HSV dataset generates 24 rules. In addition, the hyperplane method produces 45 hyperplanes for the HSV dataset. Table 7 also shows that the proposed method finds rules with the highest rates (i.e., AR, SR, and CR) compared with the other two methods.

Limitation of the Proposed Method
The hypersphere models are solved by CPLEX [32], one of the most powerful mixed-integer programming packages, running on a PC. The results of the numerical examples illustrate that the proposed optimization-based method is more useful than current methods, including the decision tree method and the hyperplane support vector method. The solving time of the linearized hypersphere model depends mainly on the number of binary variables and constraints. Solving the reformulated hypersphere model with the proposed algorithm takes about one minute for each dataset (i.e., in Sections 4.1 and 4.3) when eight piecewise line segments are used to linearize the nonlinear nonconvex term (i.e., $L(h_{t,k,j}^{2})$) of Model 1. The computing time for solving a linearized hypersphere program grows rapidly as the numbers of binary variables and constraints increase. Also, the proposed method is slower than the decision tree method and the hyperplane method, especially for large datasets or a large number of piecewise line segments. In future studies, utilizing mainframe-version optimization software [36-38], integrating metaheuristic algorithms, or using distributed computing techniques could increase the solving speed and overcome this problem.
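As a worked illustration of the trade-off between the number of line segments and the approximation error, the following derivation bounds the error of linearizing $h^2$ on one segment. It assumes equally spaced break points with width $\Delta$ on $[0, 1]$ and a center coordinate within the normalized range; the source does not state the spacing, so the numerical value is indicative only.

```latex
% Assumption: break points b_l are equally spaced with width \Delta on [0, 1].
% On a segment [b_l, b_{l+1}], the chord of f(h) = h^2 is
%   L(h) = b_l^2 + (b_l + b_{l+1})(h - b_l).
\begin{align*}
L(h) - h^2 &= (h - b_l)(b_{l+1} - h) \;\le\; \frac{\Delta^2}{4},
  \qquad b_l \le h \le b_{l+1}, \\
\Delta = \tfrac{1}{8} \;&\Rightarrow\;
  \max_{h} \bigl( L(h) - h^2 \bigr) = \tfrac{1}{256} \approx 0.0039 .
\end{align*}
```

Under these assumptions, doubling the number of segments reduces the maximum error by a factor of four, at the cost of additional binary variables and constraints.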

Conclusions and Future Work
This study proposes a novel method for deducing classification rules, which can find the optimal solution based on a hypersphere domain. Because the rules are obtained by an optimization technique, they approach the optimum. Results of the numerical examples illustrate that the proposed method is more useful than current methods, including the decision tree method and the hyperplane method. The proposed method is guaranteed to find an optimal rule, but the computational complexity grows rapidly as the problem size increases. More investigation and research are required to enhance further

Figure 1: Classifying the objects of two classes.

Figure 4: The concept of hypersphere method.

Figure 6: Flowchart of the proposed algorithm.

Figure 7: (a) Normalized data for Example 1. (b) Classified by the hypersphere method for Example 1.
The second model looks for an accuracy rate as high as possible while the support rate is fixed to 1, as shown in Model 2.
Definition 3.1. The accuracy rate of a rule $R_t$ for Model 1 is $AR(R_t) = 1$.
Definition 3.2. The support rate of a rule $R_t$ for Model 1 is specified as follows.
(i) If $\sum_{k \in K} u_{t,i,k} \ge 1$ for all $i$ belonging to class $t$, then $U_{t,i} = 1$; otherwise $U_{t,i} = 0$, where $K$ indicates the hypersphere set for class $t$.

Table 2: Centroid points for the Iris data set by the proposed method.

Table 3: Comparing results for the Iris flower data set ($R_1$, $R_2$, $R_3$).
Step 3. The classification model (i.e., Model 1) is linearly formulated as follows:
where $I^{+} = \{x_1, x_2, \ldots, x_6\}$ and $I^{-} = \{x_7, x_8, \ldots, x_{15}\}$. The optimal solution of the $h_{t,k,1}$, $h_{t,k,2}$, $r_{t,1}$
This study uses experimental results to evaluate the performance, including the accuracy, support, and compactness rates, and compares the proposed model with different methods using CPLEX [32]. All tests were run on a PC equipped with an Intel Pentium D 2.8 GHz CPU and 2 GB RAM. Three datasets were tested in our experiments:
(i) the Iris Flower dataset introduced by Sir Ronald Aylmer Fisher (1936),
(ii) the European barn swallow (Hirundo rustica) dataset obtained by trapping individual swallows in Stirlingshire, Scotland, between May and July 1997 [1, 3],
(iii) the highly selective vagotomy (HSV) patient dataset of F. Raszeja Memorial Hospital in Poland [3, 33, 34].

Table 4: Centroid points for the Swallow data set by the proposed method.

Table 5: Comparing results for the Swallow data set ($R_1$, $R_2$).

Table 6: Centroid points for the HSV data by the proposed method.

Table 7: Comparing results for the HSV data set ($R_1$, $R_2$, $R_3$, $R_4$).
computational efficiency of globally solving large-scale classification problems, such as running mainframe-version optimization software, integrating meta-heuristic algorithms, or using distributed computing techniques.