Multinomial Regression with Elastic Net Penalty and Its Grouping Effect in Gene Selection

Abstract and Applied Analysis

Introduction
Support vector machine [1], lasso [2], and their extensions, such as the hybrid huberized support vector machine [3], the doubly regularized support vector machine [4], the 1-norm support vector machine [5], sparse logistic regression [6], the elastic net [7], and the improved elastic net [8], have been successfully applied to binary classification problems of microarray data. However, these binary classification methods cannot easily be extended to multiclass classification. Hence, multiclass classification problems remain difficult issues in microarray classification [9][10][11].
Besides improving the accuracy, another challenge for the multiclass classification of microarray data is how to select the key genes [9][10][11][12][13][14][15]. By solving an optimization formula, a new multicategory support vector machine was proposed in [9]. It can be successfully applied to microarray classification [9]. However, this optimization model needs to select genes using additional methods. To select genes automatically while performing multiclass classification, new optimization models [12][13][14], such as the $\ell_1$-norm multiclass support vector machine in [12], the multicategory support vector machine with sup-norm regularization in [13], and the huberized multiclass support vector machine in [14], were developed.
Note that the logistic loss function not only has good statistical significance but also is twice differentiable. Hence, regularized logistic regression optimization models have been successfully applied to binary classification problems [15][16][17][18][19]. Multinomial regression is obtained when logistic regression is applied to the multiclass classification problem. The emergence of sparse multinomial regression provides a reasonable approach to the multiclass classification of microarray data that also identifies important genes [20][21][22]. Using Bayesian $\ell_1$ regularization, a sparse multinomial regression model was proposed in [20]. By adopting a data augmentation strategy with Gaussian latent variables, the variational Bayesian multinomial probit model, which can reduce the prediction error, was presented in [21]. Using the elastic net penalty, a regularized multinomial regression model was developed in [22]; it can be applied to the multiple sequence alignment of proteins related to mutation. Although the above sparse multinomial models achieved good prediction results on real data, all of them failed to select genes (or variables) in groups.
For the multiclass classification of microarray data, this paper combines the multinomial likelihood loss function, which has an explicit probability meaning [23], with the multiclass elastic net penalty, which selects genes in groups [14]; it proposes a multinomial regression with elastic net penalty and proves that the proposed model encourages a grouping effect in gene selection.

Multinomial Regression with the Multiclass Elastic Net Penalty

Following the idea of sparse multinomial regression [20][21][22], we fit the above class-conditional probability model by the regularized multinomial likelihood. Let $p_c(x_i) = \Pr(y_i = c \mid x_i)$. It is easily obtained that

$$p_c(x_i) = \frac{\exp(b_c + x_i^{T} w_c)}{\sum_{k=1}^{K} \exp(b_k + x_i^{T} w_k)}.$$

Let $y_{ic} = 1$ if $y_i = c$ and $y_{ic} = 0$ otherwise, and note that $\sum_{c=1}^{K} y_{ic} = 1$. Then, from the class-conditional probability model (13), the multinomial likelihood loss function can be defined as

$$L(b, W) = -\frac{1}{n}\sum_{i=1}^{n}\Big[\sum_{c=1}^{K} y_{ic}\,(b_c + x_i^{T} w_c) - \log \sum_{c=1}^{K} \exp(b_c + x_i^{T} w_c)\Big]. \tag{17}$$

In order to improve the performance of gene selection, the following elastic net penalty for the multiclass classification problem was proposed in [14]:

$$P_{\lambda_1, \lambda_2}(W) = \lambda_1 \sum_{j=1}^{p} \|w^{(j)}\|_1 + \lambda_2 \sum_{j=1}^{p} \|w^{(j)}\|_2^{2}, \tag{18}$$

where $w^{(j)}$ denotes the $j$th column of the coefficient matrix $W$, that is, the coefficients of gene $j$ across the $K$ classes. By combining the multiclass elastic net penalty (18) with the multinomial likelihood loss function (17), we propose the following multinomial regression model with the elastic net penalty:

$$(\hat{b}, \hat{W}) = \arg\min_{b, W}\; L(b, W) + \lambda_1 \sum_{j=1}^{p} \|w^{(j)}\|_1 + \lambda_2 \sum_{j=1}^{p} \|w^{(j)}\|_2^{2}, \tag{19}$$

where $\lambda_1, \lambda_2 \ge 0$ represent the regularization parameters. Note that, for identifiability, $(b_K, w_K) = (0, \vec{0})$ may be fixed; hence, the optimization problem (19) can be simplified accordingly, yielding (20).

3.2. Grouping Effect. For microarray classification, it is very important to identify the related genes in groups. In this section, we prove that the multinomial regression with elastic net penalty can encourage a grouping effect in gene selection. To this end, we must first prove the inequality stated in Theorem 1.
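As a concrete reference for the objective, the loss (17) and penalty (18) can be written down directly. The sketch below (NumPy; the array shapes and helper names are illustrative choices, not the paper's code — here $W$ is stored as a $p \times K$ array so that row $j$ plays the role of $w^{(j)}$):

```python
import numpy as np

def multinomial_loss(b, W, X, Y):
    """Multinomial likelihood loss (17).
    b: (K,) intercepts; W: (p, K) coefficients (row j ~ gene vector w^(j));
    X: (n, p) samples; Y: (n, K) one-hot labels."""
    scores = X @ W + b                                 # b_k + x_i^T w_k
    log_norm = np.log(np.exp(scores).sum(axis=1))      # log-sum-exp term
    return -np.mean((Y * scores).sum(axis=1) - log_norm)

def elastic_net_penalty(W, lam1, lam2):
    """Multiclass elastic net penalty (18):
    lam1 * sum_j ||w^(j)||_1 + lam2 * sum_j ||w^(j)||_2^2."""
    return lam1 * np.abs(W).sum() + lam2 * (W ** 2).sum()

# Sanity check: with all parameters zero, every class is equally
# likely, so the loss equals log K regardless of the data.
K, p, n = 3, 4, 5
X = np.ones((n, p))
Y = np.eye(K)[np.zeros(n, dtype=int)]                  # all samples in class 0
print(np.isclose(multinomial_loss(np.zeros(K), np.zeros((p, K)), X, Y), np.log(K)))
```

The two functions summed with weights $\lambda_1, \lambda_2$ give the objective minimized in (19).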
Theorem 1. Let $(\hat{b}, \hat{W})$ be the solution of the optimization problem (19) or (20). For any suitably constructed new parameter pairs, inequality (21) holds.

Proof. From (24) and (25), we obtain (26). Equation (26) is equivalent to an inequality that implies (21). Hence, inequality (21) holds. This completes the proof.
Using the result of Theorem 1, we now prove that the multinomial regression with elastic net penalty (19) can encourage a grouping effect.

Theorem 2. Given the training data set $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, assume that the matrix $X$ and the vector $y$ satisfy condition (1). If the pairs $(\hat{b}, \hat{W})$ are the optimal solution of the multinomial regression with elastic net penalty (19), then inequality (29) holds, bounding the difference $\|\hat{w}^{(j)} - \hat{w}^{(l)}\|$ by a quantity that vanishes as the gene correlation $\rho \to 1$, where $\rho = x^{(j)T} x^{(l)} = \sum_{i=1}^{n} x_{ij}\, x_{il}$, $\hat{w}^{(j)}$ is the $j$th column of the parameter matrix $\hat{W}$, and $\hat{w}^{(l)}$ is the $l$th column of the parameter matrix $\hat{W}$.
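The quantity $\rho$ in Theorem 2 is the inner product of two gene columns; assuming, as is standard for such grouping results, that each gene column is centered and scaled to unit norm, $\rho$ is exactly the sample correlation of genes $j$ and $l$. A small numerical sketch with hypothetical data:

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5, 0.5])
xj = x - x.mean()
xj = xj / np.linalg.norm(xj)       # standardized gene column x^(j)
xl = xj.copy()                     # a perfectly correlated duplicate gene x^(l)
rho = float(xj @ xl)               # rho = sum_i x_ij * x_il
print(rho)                         # ~1.0: the Theorem 2 bound then forces w^(j) = w^(l)
```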
Proof. First, we construct new parameter pairs $(b^{*}, W^{*})$ from $(\hat{b}, \hat{W})$. Since the pairs $(\hat{b}, \hat{W})$ are the optimal solution of the multinomial regression with elastic net penalty (19), the objective value at $(b^{*}, W^{*})$ is no smaller than that at $(\hat{b}, \hat{W})$, which gives (33). From (33), inequality (21), and the definition of the parameter pairs $(b^{*}, W^{*})$, we obtain (37). From (37), inequality (38) is easily obtained, where $\rho = x^{(j)T} x^{(l)} = \sum_{i=1}^{n} x_{ij}\, x_{il}$. This completes the proof.
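The mechanism behind the proof can be illustrated numerically: for two identical gene columns, replacing both coefficient vectors by their average leaves every score $b_k + x_i^{T} w_k$ unchanged, while the squared $\ell_2$ part of penalty (18) strictly decreases whenever the two vectors differ, and the $\ell_1$ part never increases. A sketch with hypothetical values:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 20
x = rng.normal(size=n)              # one gene column, duplicated: x^(j) = x^(l) = x
wj = rng.normal(size=K)             # coefficient vectors w^(j), w^(l)
wl = rng.normal(size=K)
w_avg = (wj + wl) / 2.0

# The contribution of the two genes to every score is unchanged:
before = np.outer(x, wj) + np.outer(x, wl)
after = 2.0 * np.outer(x, w_avg)
assert np.allclose(before, after)

# ...while the squared l2 penalty strictly decreases (strict convexity of ||.||^2)
# and the l1 penalty does not increase (triangle inequality):
pen2_before = np.sum(wj ** 2) + np.sum(wl ** 2)
pen2_after = 2.0 * np.sum(w_avg ** 2)
pen1_before = np.abs(wj).sum() + np.abs(wl).sum()
pen1_after = 2.0 * np.abs(w_avg).sum()
print(pen2_before > pen2_after, pen1_after <= pen1_before)  # True True
```

This is exactly why an optimal solution cannot assign very different coefficient vectors to highly correlated genes.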
According to the inequality shown in Theorem 2, the multinomial regression with elastic net penalty can assign the same parameter vectors (i.e., $\hat{w}^{(j)} = \hat{w}^{(l)}$) to the highly correlated predictors $x^{(j)}, x^{(l)}$ (i.e., $\rho = 1$). This means that the multinomial regression with elastic net penalty can select genes in groups according to their correlation. Following the technical term in [14], this performance is called the grouping effect in gene selection for multiclass classification. Equation (40) can be easily solved by using the publicly available R package glmnet.
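glmnet is an R package; for readers working in Python, a roughly analogous multinomial elastic net fit is available in scikit-learn. This is a hedged sketch on synthetic data, not the paper's experiment; note the different parameterization (a single strength C and mixing ratio l1_ratio instead of $\lambda_1, \lambda_2$):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))       # 60 samples, 10 "genes"
y = rng.integers(0, 3, size=60)     # 3 classes

# penalty="elasticnet" requires solver="saga"; l1_ratio mixes the
# l1 and squared-l2 terms of the penalty.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
clf.fit(X, y)
print(clf.coef_.shape)              # one coefficient vector per class
```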

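Coordinate descent solvers for $\ell_1$-penalized objectives such as (19) are built around the soft-thresholding operator. The sketch below shows the operator and a standard one-dimensional elastic net update; this is a generic textbook step under that assumption, not necessarily the exact update used by glmnet:

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def elastic_net_step(z, lam1, lam2):
    """Closed-form minimizer of 0.5*(w - z)^2 + lam1*|w| + lam2*w^2,
    the one-dimensional elastic net subproblem solved per coordinate."""
    return soft_threshold(z, lam1) / (1.0 + 2.0 * lam2)

vals = soft_threshold(np.array([3.0, -0.5, -2.0]), 1.0)
print(vals)                         # small entries are thresholded to zero
```

Iterating such updates over coordinates, exploiting the sparsity of the iterates, is what makes the small-$n$, large-$p$ problem tractable.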
Particularly, for binary classification, that is, $K = 2$, inequality (29) reduces to the grouping-effect bound of the standard binary elastic net. Microarray data are a typical small-$n$, large-$p$ problem. Because the number of genes in microarray data is very large, solving the proposed multinomial regression directly suffers from the curse of dimensionality. To improve the solving speed, Friedman et al. proposed the pairwise coordinate descent algorithm, which takes advantage of the sparsity of the coefficients. Therefore, we choose the pairwise coordinate descent algorithm to solve the multinomial regression with elastic net penalty. To this end, we convert (19) into the following form:

$$\max_{b, W}\; \frac{1}{n}\sum_{i=1}^{n}\Big[\sum_{k=1}^{K} y_{ik}\,(b_k + x_i^{T} w_k) - \log \sum_{k=1}^{K} \exp(b_k + x_i^{T} w_k)\Big] - \lambda_1 \sum_{j=1}^{p} \|w^{(j)}\|_1 - \lambda_2 \sum_{j=1}^{p} \|w^{(j)}\|_2^{2}. \tag{40}$$

Conclusion

By combining the multinomial likelihood loss function, which has an explicit probability meaning, with the multiclass elastic net penalty, which selects genes in groups, this paper proposed the multinomial regression with elastic net penalty for the multiclass classification of microarray data. The proposed multinomial regression is proved to encourage a grouping effect in gene selection. In future work, we will apply this optimization model to real microarray data and verify its specific biological significance.