Effect of Prior Probabilities on the Classificatory Performance of Parametric and Mathematical Programming Approaches to the Two-Group Discriminant Problem

A mixed-integer programming (MIP) model incorporating prior probabilities for the two-group discriminant problem is presented. Its classificatory performance is compared against that of Fisher's linear discriminant function (LDF) and Smith's quadratic discriminant function (QDF) for simulated data from normal and nonnormal populations under different settings of the prior probabilities of group membership. The proposed model is shown to outperform both LDF and QDF for most settings of the prior probabilities when the data are generated from nonnormal populations, but it underperforms the parametric models for data generated from normal populations.


Introduction
Mathematical programming approaches to discriminant analysis have attracted considerable research interest in recent years. Simulation studies by Freed and Glover [5], Joachimsthaler and Stam [7], Stam and Jones [12], Hosseini and Armacost [6] and Loucopoulos and Pavur [10] have shown that these mathematical programming approaches are viable alternatives to Fisher's [2] linear discriminant function (LDF) and Smith's [11] quadratic discriminant function (QDF). An appealing characteristic of mathematical programming (MP) approaches is that they do not rely on the assumption of multivariate normality, nor do they impose any conditions on the covariance structures for optimal classificatory performance. In contrast, both LDF and QDF assume multivariate normality, with equal or unequal covariance structures, respectively. Despite a plethora of proposed mathematical programming models for the two-group discriminant problem by Freed and Glover [3], [4], Choo and Wedley [1], Koehler and Erenguc [8] and Lam, Choo and Moy [9], the effect of prior probabilities on the classificatory performance of MP models has not received any research interest. This paper proposes a mathematical programming model incorporating the effect of prior probabilities and compares its holdout classificatory performance against that of LDF and QDF for various settings of the prior probabilities of group membership. The proposed model is presented in the next section. The simulation experiment for the comparison of classificatory performance is described in Section 3, with simulation results analyzed in Section 4 and conclusions presented in Section 5.

An MIP Model Incorporating Prior Probabilities for the Two-Group Problem
In this section, a modification to the MIP model for the two-group discriminant problem is proposed. The modification involves the incorporation of prior probabilities into the objective function and the elimination of the risk of unacceptable solutions through the inclusion of appropriate constraints. This mixed-integer programming model is presented below.
Notation:
$a_k$ is the weight assigned to attribute variable $X_k$ ($k = 1, 2, \dots, p$)
$X_k^{(i)}$ is the value of variable $X_k$ for observation $i$ ($i = 1, 2, \dots, n$)
$M$ is the maximum deviation of a misclassified observation from the cutoff value of its group
Thus, $a_k$ ($k = 1, 2, \dots, p$), $I_i$ ($i = 1, 2, \dots, n$) and $c$ are decision variables whose values are to be determined by the model, whereas $\pi_j$ ($j = 1, 2$), $M$ and $\varepsilon$ are parameters.
Formulation:
where $a_k$ ($k = 1, 2, \dots, p$) and $c$ are sign-unrestricted variables.
The objective of this formulation is the minimization of the weighted sum of misclassifications, with the prior probabilities of group membership being the weights. In this formulation, a discriminant score $\sum_{k=1}^{p} a_k X_k^{(i)}$ is computed for each observation.
According to the first constraint, an observation $i \in G_1$ will be correctly classified if its discriminant score $\sum_{k=1}^{p} a_k X_k^{(i)}$ does not exceed $c$. Otherwise it is misclassified. However, its discriminant score cannot exceed $c$ by more than $M$, where $M$ is a preset large positive constant.
According to the second constraint, an observation $i \in G_2$ will be correctly classified if its discriminant score $\sum_{k=1}^{p} a_k X_k^{(i)}$ exceeds $c + \varepsilon$, where $\varepsilon$ is a preset small positive constant. Otherwise, the observation is misclassified. If $i \in G_2$ is misclassified, the value of its discriminant score cannot fall below $c + \varepsilon - M$. The purpose of the gap of width $\varepsilon$ between the two groups is to enhance group separation.
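The classification rule described by the first two constraints can be illustrated with a short Python sketch. This is not the authors' code: the function names, the toy data and the unnormalized form of the weighted objective are assumptions made for illustration, and the boundary case for group $G_2$ is treated as "at least $c + \varepsilon$".

```python
def discriminant_score(a, x):
    """Linear discriminant score: sum over k of a_k * x_k for one observation."""
    return sum(a_k * x_k for a_k, x_k in zip(a, x))


def misclassification_indicators(a, c, eps, X, groups):
    """Return I_i = 1 if observation i is misclassified, else 0.

    Group 1 is correct when score <= c; group 2 is correct when
    score >= c + eps (boundary handling is an assumption).
    """
    indicators = []
    for x, g in zip(X, groups):
        score = discriminant_score(a, x)
        correct = score <= c if g == 1 else score >= c + eps
        indicators.append(0 if correct else 1)
    return indicators


def weighted_misclassifications(indicators, groups, priors):
    """Assumed objective: each misclassification weighted by its group's prior."""
    return sum(priors[g] * ind for ind, g in zip(indicators, groups))


# Hypothetical toy data: two attributes, two observations per group.
a = [1.0, -1.0]
c, eps = 0.0, 0.001
X = [[1, 2], [3, 1], [2, 0], [0, 1]]
groups = [1, 1, 2, 2]
priors = {1: 0.35, 2: 0.65}

I = misclassification_indicators(a, c, eps, X, groups)
objective = weighted_misclassifications(I, groups, priors)
```

In the MIP itself the weights $a_k$ and the cutoff $c$ are chosen by the solver; the sketch only evaluates the rule for fixed values.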
The last two constraints guarantee that, whatever the values of the prior probabilities or the attribute variables, an unacceptable solution with $a_1 = a_2 = \dots = a_p = 0$ is not feasible. Under such a solution, all the observations would be classified into the same group.
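For orientation, the parts of the model that the verbal description pins down can be written out as follows. This is a sketch reconstructed from that description, not the formulation as printed: the exact form of the objective (in particular, any scaling by group sample sizes) is an assumption, and the two constraints that exclude the trivial solution are omitted because their form cannot be recovered from the text.

```latex
\begin{align*}
\min \quad & \pi_1 \sum_{i \in G_1} I_i + \pi_2 \sum_{i \in G_2} I_i \\
\text{s.t.} \quad
& \sum_{k=1}^{p} a_k X_k^{(i)} \le c + M I_i, && i \in G_1, \\
& \sum_{k=1}^{p} a_k X_k^{(i)} \ge c + \varepsilon - M I_i, && i \in G_2, \\
& I_i \in \{0, 1\}, && i = 1, 2, \dots, n.
\end{align*}
```

With $I_i = 0$ the constraints force a correct classification, and with $I_i = 1$ they relax to the bounds $c + M$ and $c + \varepsilon - M$ mentioned above.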

Simulation Experiment
The holdout sample classificatory performance of the proposed model was compared against that of Fisher's linear discriminant function (LDF) and Smith's quadratic discriminant function (QDF) using data generated from bivariate normal, contaminated normal and exponential populations. The different configurations included in this simulation study are presented in Table 1. The prior probabilities $\pi_i$ of membership in group $G_i$ were assigned values .20, .35, .50, .65 and .80, whereas the values of the parameters $M$ and $\varepsilon$ in the MIP model were set at 100 and .001, respectively. Such values of the parameters $M$ and $\varepsilon$ are consistent with the practice employed in previous simulation studies on the classificatory performance of mathematical programming approaches to the discriminant problem, calling for the assignment of a large value to $M$ and a small value to $\varepsilon$. Training samples of size 100 (50 per group) were simulated. Holdout samples of size 1000 were generated with the number of observations from group $G_i$ being $1000\pi_i$ ($i = 1, 2$), where $\pi_i$ represents the prior probability of membership in group $G_i$. Each experimental condition was replicated 100 times. The simulation study was carried out using SAS 6.11 on a RISC 6000/58H computer.
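The holdout design above can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes that the two priors sum to one, so that $\pi_2 = 1 - \pi_1$; the study itself was run in SAS.

```python
def holdout_group_sizes(prior_g1, holdout_size=1000):
    """Split the holdout sample so that group G_i has holdout_size * pi_i cases.

    Assumes pi_2 = 1 - pi_1 (an assumption of this sketch).
    """
    n1 = round(holdout_size * prior_g1)
    return n1, holdout_size - n1


def percent_misclassified(true_groups, predicted_groups):
    """Percentage of holdout observations assigned to the wrong group."""
    wrong = sum(t != p for t, p in zip(true_groups, predicted_groups))
    return 100.0 * wrong / len(true_groups)


# Settings of the prior probability of membership in G_1 used in the study.
PRIORS_G1 = (0.20, 0.35, 0.50, 0.65, 0.80)
sizes = {p: holdout_group_sizes(p) for p in PRIORS_G1}
```

For example, a prior of .35 for $G_1$ gives 350 holdout observations from $G_1$ and 650 from $G_2$.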
In configurations $N_1$, $N_2$ and $N_3$, the data were simulated from normal populations with equal and unequal covariance structures. In configurations $C_1$, $C_2$ and $C_3$, the data were generated from contaminated normal populations with a contaminating fraction of .10. It should be noted that $\mu_i^{(c)}$ and $\Sigma_i^{(c)}$ refer to the mean and covariance structure, respectively, of the contaminant component of group $G_i$ ($i = 1, 2$). In configurations $E_1$, $E_2$ and $E_3$, the data were generated from exponential populations with starting points $a_i$ and density function:

Simulation Results
The percentage misclassification rates of the different models in the holdout sample are presented in Table 2. Under experimental conditions optimal for Fisher's linear discriminant function (configuration $N_1$), the proposed MIP model yielded higher mean misclassification rates than either LDF or QDF in the holdout sample. Under experimental conditions optimal for QDF (configurations $N_2$ and $N_3$), the proposed model underperformed QDF for all settings of the prior probabilities, but outperformed LDF for certain settings of the prior probabilities.
When the data are generated from contaminated normal populations (configurations $C_1$, $C_2$ and $C_3$), the MIP model had lower average misclassification rates than both LDF and QDF in the holdout sample. This was true for all values assigned to the prior probabilities $\pi_i$.

Conclusions
This paper examines the effect of prior probabilities on the classificatory performance of a proposed MIP model as well as the standard parametric procedures (LDF and QDF). It is shown that, regardless of the values assigned to the prior probabilities, the proposed MIP model will yield higher misclassification rates in the holdout sample when the experimental conditions are optimal for the parametric procedures. It is also shown that for data generated from contaminated normal populations, the proposed model outperforms both LDF and QDF, regardless of the values assigned to the prior probabilities. For data generated from exponential populations, the MIP model outperformed the other two models when the prior probability $\pi_1$ of membership in group $G_1$ was .35, .50 or .65. However, for $\pi_1 = .20$ and $\pi_1 = .80$, the results of the simulation study were inconclusive for data generated from exponential populations.
Because of the numerous possibilities in terms of data configurations, prior probabilities and sample sizes, it may be inappropriate to draw generalized conclusions about the classificatory performance of the proposed model. Further research should focus on the relative performance of the proposed MIP model under different experimental conditions.

Table 1. Configurations used in the simulation study