A High-Dimensional Modeling System Based on Analytical Hierarchy Process and Information Criteria

High-dimensional data sets frequently occur in many scientific areas, and special techniques are required to analyze them. In particular, applying a suitable model becomes important in classification problems. In this study, a novel approach is proposed to estimate a statistical model for high-dimensional data sets. The proposed method uses the analytical hierarchy process (AHP) and information criteria to determine the optimal principal components (PCs) for the classification model. The high-dimensional "colon" and "gravier" data sets were used in the evaluation. Application results demonstrate that the proposed approach can be used successfully for modeling purposes.


Introduction
High-dimensional data refer to the setting n < p, where p is the number of unknown parameters and n is the sample size. High-dimensional data are encountered in many areas. In statistical modeling, classical solutions fail to produce useful results when the data are high dimensional, and high-dimensional modeling techniques are used to overcome this problem. Dimension reduction is a convenient approach for high-dimensional modeling [1]. Some of the advantages of dimension reduction techniques are as follows: (i) it reduces the number of dimensions and the data storage requirement; (ii) it requires less computation time; (iii) irrelevant, noisy, and redundant data can be removed; (iv) data quality can be improved; (v) it helps an algorithm work efficiently and improves accuracy; (vi) it allows data to be visualized; and (vii) it simplifies classification and improves performance.
Researchers have studied PCA for reducing the dimension of the explanatory variable set. Liu et al. [2] used kernel PCA for gene expression classification. Nyamundanda et al. [3] gave a novel extension of probabilistic PCA (PPCA) called probabilistic principal component and covariates analysis (PPCCA), which provides a flexible approach to jointly model metabolomics data and additional covariate information. In [4], PCA was applied for EEG-based emotion recognition classification. Khedher et al. [5] established a classification system using PCA for the early diagnosis of Alzheimer's disease. Gaikwad and Joshi [6] used PCA within a probabilistic neural network to classify brain tumors. Kondo et al. [7] applied PCA and logistic regression analysis for the diagnosis of lung cancer. Pamukçu et al. [1] used probabilistic PCA and several classification algorithms on gene expression data sets. Passemier et al. [8] developed new statistical theory for probabilistic principal component analysis models in high dimensions. Smallman et al. [9] enhanced a sparse method for unsupervised dimension reduction for data from an exponential-family distribution. In [10], deep neural networks, PCA, and linear support vector machine algorithms were used for classification of the Hoda data set. Yao and Lopes [11] demonstrated the role of transformations in bootstrap methods for high-dimensional PCA through simulations and numerical experiments. Hung and Huang [12] proposed the generalized information criterion (GIC) for high-dimensional PCA sequence selection. Ayesha et al. [13] presented state-of-the-art dimension reduction techniques and their suitability for different types of data and application areas. Choubey et al. [14] proposed PCA, particle swarm optimization (PSO), and different machine learning algorithms as feature reduction or feature selection methods for the detection of diabetes. The literature can be extended with further studies. While constructing a model with PCA, determining the optimal number of principal components (PCs) is a very important issue. However, there is no objective criterion for identifying the optimal PCs. Although probabilistic PCA considers information criteria for this selection, the approach has two drawbacks: (1) probabilistic PCA considers only the explanatory variable set and does not take the classification model into account, and (2) each criterion can determine a different number of PCs.
In this article, a novel hybrid approach is proposed to obtain an optimal model for high-dimensional data sets. The proposed method uses the analytical hierarchy process (AHP) and information criteria to determine the optimal PCs for the classification model. The multivariate adaptive regression splines (MARS) algorithm is used as the classification algorithm on the selected PCA scores. This article is an attempt to construct an objective high-dimensional model in the context of principal component regression modeling.
The article is organized as follows. Section 2 introduces the proposed hybrid system for determining the optimal PCs in high-dimensional settings and presents the employment of the MARS algorithm. Section 3 provides the application results on the data sets. Section 4 presents the conclusions.

Hybrid System Process for Selecting the Optimal PCs
A hybrid system is proposed to perform classification when high-dimensional data are available. The multivariate adaptive regression splines (MARS) model is used as the classification model. The main reason for choosing MARS is its inherent variable selection capability: MARS can intelligently exclude redundant dimensions. The proposed system has three steps. (1) First, principal component analysis is applied to the explanatory variables. However, for multivariate techniques based on an accurate estimation of the true covariance, the classical sample covariance matrix has a systematically distorted eigenstructure when the n < p problem exists. In this case, the structure of the covariance matrix is distorted such that the largest eigenvalues are biased upward and the smallest eigenvalues are biased downward.
To overcome the limitations of the sample covariance matrix, shrinkage approaches are often used to estimate the high-dimensional covariance matrix [15]. Therefore, the shrinkage covariance matrix is used instead of the classical one. Principal component analysis is performed via the shrinkage covariance matrix, and the PCA scores are used as the explanatory variables. In this way, fewer predictors are used to build the MARS model.
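As an illustration of step (1), the following Python sketch forms a shrunk covariance estimate and extracts PCA scores from it. The fixed shrinkage intensity `rho` and the diagonal target are simplifying assumptions made for this sketch; the study estimates these quantities from the data.

```python
import numpy as np

def shrinkage_pca_scores(X, rho=0.2, n_components=5):
    """PCA scores via a shrunk covariance matrix (illustrative sketch).

    Shrinks the sample covariance toward a diagonal target:
        C_shrunk = rho * D + (1 - rho) * C_mle
    A fixed rho is used here for illustration only.
    """
    Xc = X - X.mean(axis=0)                     # center the variables
    C_mle = (Xc.T @ Xc) / X.shape[0]            # sample (ML) covariance
    D = np.diag(np.diag(C_mle))                 # naive diagonal target
    C_shrunk = rho * D + (1.0 - rho) * C_mle    # convex combination
    # Eigendecomposition of the symmetric shrunk covariance
    eigvals, eigvecs = np.linalg.eigh(C_shrunk)
    order = np.argsort(eigvals)[::-1]           # largest eigenvalues first
    V = eigvecs[:, order[:n_components]]
    return Xc @ V, eigvals[order]               # PC scores, sorted eigenvalues

# High-dimensional toy example: n = 20 samples, p = 50 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
scores, eigvals = shrinkage_pca_scores(X, rho=0.3, n_components=5)
print(scores.shape)
```

Because the diagonal target is positive definite, the shrunk matrix has strictly positive eigenvalues even when n < p, which is exactly the property the paper relies on.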
(2) The PCA step selects the optimal number of PCs according to the information criteria using AHP. While performing PCA, the number of PCs should be chosen carefully. Since the PC scores directly affect the classification model, the PCA process is linked with the MARS model. For each candidate number of PCs, a MARS model is constructed and the information criteria values are computed. Each criterion may reach a different decision on the number of PCs, so a unique solution is required. This solution is obtained via the AHP approach; the TOPSIS method is used for the AHP process. TOPSIS evaluates the several information criteria for every number of PCs and produces common scores, from which the optimal number of PCs is selected. (3) After selecting the optimal number of PCs, the MARS model is constructed. Due to the nature of MARS, some redundant dimensions can be removed to improve the inference capability.
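Steps (2) and (3) can be sketched as follows in Python. An ordinary least-squares fit stands in for the MARS classifier purely to keep the example self-contained; the resulting criteria table (one row per candidate number of PCs) is what the TOPSIS step would then rank.

```python
import math
import numpy as np

def criteria_table(scores, y, max_pcs):
    """Build an (alternatives x criteria) table for the AHP/TOPSIS step.

    `scores` holds the PCA scores from step (1). A least-squares fit is
    used here as a stand-in for the MARS model, only to keep the sketch
    self-contained; the paper fits MARS at this point.
    """
    rows = []
    n = len(y)
    for m in range(1, max_pcs + 1):
        Z = np.column_stack([np.ones(n), scores[:, :m]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        k = m + 1                                  # intercept + m PCs
        # Gaussian log-likelihood with the variance profiled out
        loglik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
        aic = -2 * loglik + 2 * k
        bic = -2 * loglik + k * math.log(n)
        rows.append([aic, bic])
    return np.array(rows)   # fed to TOPSIS to pick the number of PCs

rng = np.random.default_rng(1)
scores = rng.normal(size=(40, 10))                 # toy PCA scores
y = (scores[:, 0] + 0.1 * rng.normal(size=40) > 0).astype(float)
table = criteria_table(scores, y, max_pcs=6)
print(table.shape)
```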

Principal Component Analysis with Shrinkage Covariance Matrix.
Principal component analysis (PCA) was first introduced by Pearson [16] and developed by Hotelling [17] and Rao [18]. PCA is a vector-based approach whose aim is to convert high-dimensional, interrelated vectors into a small number of uncorrelated vectors. Principal component analysis explains the variance-covariance structure through a few linear combinations of the original variables. The general objectives are to reduce dimension and to aid interpretation, as well as to guard against the rank problem and to remove linear dependence in the variance-covariance matrix. Let X be a data matrix whose p rows are the centered variables and whose n columns are the samples, and let C = (1/n)XX′ denote its covariance matrix. For the purpose of PCA, we seek an orthonormal matrix P such that, under the transformation Y = PX, the matrix D = (1/n)YY′ is diagonal; the rows of P are then the principal components of X. Since C is symmetric, it has an eigendecomposition C = VΛV′ with V orthonormal, so D can be rewritten as D = PCP′ = PVΛV′P′ (1). That is, selecting P ≡ V′ makes D = Λ diagonal, which means that the covariances of the newly obtained variables are zero; this is the purpose of PCA.
Thus, the principal components of X are the eigenvectors of C [19]. The general process of principal component analysis is summarized above. In this study, some eigenvalues of the covariance matrix C used in the principal component analysis were negative. The shrinkage covariance matrix is therefore considered instead of C to eliminate possible bias in the analysis results. Shrinkage estimators of the variance-covariance matrix shrink the eigenvalues of the maximum likelihood estimate C_MLE toward a common center. The purpose of shrinkage estimation is to take a convex combination of C_MLE with an appropriately selected diagonal target matrix D [20]. The shrinkage estimator of the covariance matrix is then C* = ρD + (1 − ρ)C_MLE, where ρ is the optimal shrinkage coefficient (or intensity) and takes values between 0 and 1; this value can also be a function of the observations. The matrix D is called the shrinkage target; its naive form is the diagonal of the sample covariance, D = diag(c_11, c_22, ..., c_pp). Further information on the different shrinkage target matrices is given in [21].

Multivariate Adaptive Regression Splines with Several Information Criteria.
Multivariate adaptive regression splines (MARS) is a nonparametric regression technique that models the nonlinear relationship between a response variable and a set of predictors via basis functions [22]. Friedman [23] devised MARS to capture nonlinearities in the model. In MARS, it is not necessary to know the functional relationship in advance; MARS can detect the relationships among predictors and response using basis functions. A general MARS model has the form y = β_0 + Σ_m β_m B_m(x) + ε, where the β's are the coefficients, the B_m(·) are the basis functions, and ε is the random error term. The basis functions are pairs of truncated linear (hinge) functions, (x − t)_+ = max(0, x − t) and (t − x)_+ = max(0, t − x), where the subscript "+" denotes the positive part and t is a knot point. Knots can be defined as the values that control the starting and ending points of local relationships; MARS uses these knots to represent nonlinear relationships with piecewise-linear segments. Figure 1 shows a graphical representation of the knots: two knot points, x_1 and x_2, are selected according to the behavior of the relationship between predictors and response. MARS handles optimal knot selection with a goodness-of-fit measure. The most commonly used measure is generalized cross-validation (GCV), defined as GCV = (1/n) Σ_i (y_i − ŷ_i)² / (1 − C(M)/n)², where ŷ_i denotes the predicted values and C(M) is a penalty that grows with the number of selected parameters.
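A minimal Python illustration of the hinge basis functions and the GCV measure described above; the helper names are ours, not taken from any MARS library.

```python
import numpy as np

# Hinge (truncated linear) basis functions used by MARS:
#   (x - t)_+ = max(0, x - t)   and   (t - x)_+ = max(0, t - x)
def hinge_pos(x, t):
    return np.maximum(0.0, x - t)

def hinge_neg(x, t):
    return np.maximum(0.0, t - x)

# Generalized cross-validation for a fitted model:
#   GCV = (1/n) * sum((y - yhat)^2) / (1 - C(M)/n)^2
def gcv(y, y_hat, effective_params):
    n = len(y)
    rss = np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)
    return (rss / n) / (1.0 - effective_params / n) ** 2

# Each knot t splits the axis into two one-sided linear pieces
x = np.array([-1.0, 0.0, 1.0, 2.0])
left = hinge_neg(x, 0.5)    # active below the knot t = 0.5
right = hinge_pos(x, 0.5)   # active above the knot t = 0.5
```

Adding a pair of such hinges per knot is what lets MARS approximate a smooth nonlinear curve with piecewise-linear segments.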
In this study, we used the following information criteria in addition to the GCV measure most commonly used with MARS; these criteria are used in the model evaluation part. The included criteria are the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the consistent information complexity criterion (CICOMP) [24][25][26]. The first two are formulated as AIC = −2L(M) + 2k and BIC = −2L(M) + k log n, where L(M) denotes the log-likelihood of the regression model, k is the number of parameters, and n is the sample size; CICOMP combines the consistent AIC penalty k(log n + 1) with an informational complexity term computed from the estimated covariance of the model parameters [26].
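A small Python sketch of how these criteria compare candidate models. The log-likelihood values below are hypothetical toy numbers, and CICOMP is omitted here because its complexity term depends on the estimated model covariance.

```python
import math

def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return -2.0 * loglik + k * math.log(n)

def caic(loglik, k, n):
    """Consistent AIC: the BIC penalty with an extra +k term."""
    return -2.0 * loglik + k * (math.log(n) + 1.0)

# Compare candidate models indexed by the number of PCs k;
# the log-likelihoods are toy values, improving with k
candidates = {k: -100.0 + 3.0 * k for k in range(1, 6)}
best_bic = min(candidates, key=lambda k: bic(candidates[k], k, n=62))
print(best_bic)
```

With n = 62, each extra parameter costs log 62 ≈ 4.13 under BIC, so in this toy setting the 3-unit likelihood gain per PC still wins and the largest candidate is selected.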

Analytical Hierarchy Process for Choosing the Number of PCs.
The analytical hierarchy process (AHP), developed by Saaty [27], is a mathematical decision-making method that gives researchers efficient tools for organizing and analyzing complex decisions. It is easy to use and takes into account both measurable and nonmeasurable criteria. The AHP method organizes the selected factors in a hierarchical structure of criteria, subcriteria, and alternatives under a general target and works toward the result. The following steps are followed in the AHP method: (i) by making pairwise comparisons, decision-makers create one matrix for each alternative and one matrix for the criteria; (ii) these matrices are normalized and their consistency is checked (i.e., whether the decision-makers' pairwise comparisons are consistent); (iii) then, with the help of matrix algebra, an average score is obtained for each alternative; (iv) the alternative with the highest score is the most appropriate one according to the decision-makers' comparisons [28].
In this study, for each candidate number of PCs, the MARS model is constructed and the information criteria values are computed. Since each criterion may reach a different decision on the number of PCs, a unique solution is required, and we obtain it via the AHP approach. The TOPSIS method is used for the AHP process: TOPSIS evaluates the several information criteria for every number of PCs and produces common scores.
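The TOPSIS scoring can be sketched as follows. The decision matrix values below are hypothetical, and equal criterion weights are assumed for illustration; the alternatives play the role of candidate numbers of PCs, and the criteria are smaller-is-better information criteria.

```python
import numpy as np

def topsis_scores(matrix, weights, benefit):
    """TOPSIS closeness scores (illustrative sketch).

    matrix:  alternatives x criteria decision matrix
    weights: criterion weights (summing to 1)
    benefit: True for larger-is-better criteria, False for smaller-is-better
    """
    M = np.asarray(matrix, dtype=float)
    # Vector-normalize each criterion column, then apply the weights
    V = (M / np.linalg.norm(M, axis=0)) * np.asarray(weights)
    benefit = np.asarray(benefit)
    # Ideal best/worst per criterion, respecting its direction
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    return d_worst / (d_best + d_worst)   # closeness to the ideal solution

# Alternatives = candidate numbers of PCs (13, 14, 15);
# criteria = AIC, BIC, GCV, all smaller-is-better; values are hypothetical
decision = [[210.0, 230.0, 0.20],
            [205.0, 228.0, 0.18],
            [207.0, 233.0, 0.19]]
scores = topsis_scores(decision, weights=[1/3, 1/3, 1/3],
                       benefit=[False, False, False])
print(int(np.argmax(scores)))
```

Here the middle alternative is best on every criterion, so its closeness score is exactly 1 and it is ranked first, mirroring how the common score resolves disagreement among criteria.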

Application
In this part, we applied the proposed method, based on the hybridization of principal component analysis (PCA), the analytical hierarchy process (AHP), and information criteria, to high-dimensional data sets.
Example 1. The colon gene expression data set is available in the R package "plsgenomics" [29]. The data set includes p = 2000 genes and n = 62 samples. The response variable has two groups, tumor tissues and normal tissues [30]. The task is to classify the tissues using the 2000 predictors. Obviously, the data set is high dimensional because n < p.
In the n < p situation, for multivariate techniques based on an accurate estimation of the true covariance, the classical sample covariance matrix has a systematically distorted eigenstructure. As mentioned earlier, negative eigenvalues of the covariance matrix are a major problem for the analysis. Table 1 presents the eigenvalues of both the classical and shrinkage covariance matrices. As can be seen, the classical covariance matrix produced negative eigenvalues because of the high dimensionality. This problem was handled by using the shrinkage covariance matrix, which carries the negative eigenvalues into the positive range. After obtaining positive eigenvalues, we applied principal component analysis with the positive definite shrinkage covariance matrix. Table 2 shows the information criteria of the MARS model for each number of principal components. In Table 2, AIC and BIC select 14 principal components, while CAIC and the MARS model's GCV measure select 15 or more principal components. The reported results clearly show that each criterion determined a different number of principal components, so deciding on the number of principal components becomes a problem for researchers. We applied the AHP process to solve this problem and find the optimal number of principal components. Table 3 shows the AHP scores for each criterion. According to Table 3, the number of principal components with a rank value of 1 is 14, which indicates that the optimal number of principal components for this study is 14. After the number of principal components was determined by the AHP process, we fitted the MARS regression model. Table 4 shows the results of the MARS regression model.
We applied the logit transformation in the MARS regression model; the link function is logit(p) = log(p/(1 − p)). The fitted MARS regression model is built from the selected basis functions. Also, as shown in Table 4, the MARS regression model performs variable selection as well as model estimation (the model did not select PC1, PC2, and PC7 among the 14 principal components).
In these equations, TP (true positive) means a correct positive prediction, FP (false positive) means an incorrect positive prediction, TN (true negative) means a correct negative prediction, and FN (false negative) means an incorrect negative prediction.
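These counts yield the standard performance measures. The following minimal sketch uses hypothetical counts (not the values reported in Table 5) and leaves MCCR aside, since its formula is not reproduced above.

```python
def classification_measures(tp, fp, tn, fn):
    """Standard measures from the confusion-matrix counts above."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Hypothetical confusion-matrix counts for a 62-sample problem
acc, prec, rec, f1 = classification_measures(tp=30, fp=5, tn=20, fn=7)
print(round(acc, 3), round(f1, 3))   # 0.806 0.833
```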
When the performance measures of the MARS model applied to the colon cancer data are examined, all values are seen to be good. Of particular interest in Table 5 is the MCCR value (0.789), which is quite good, because this measure generally does not yield good results with model selection methods.
To evaluate the performance of the proposed hybrid approach, we used two high-dimensional modeling techniques: the least absolute shrinkage and selection operator (Lasso) and the adaptive elastic net (Aenet). Lasso and Aenet are appropriate for modeling high-dimensional data sets. Both models require tuning parameter selection, for which information criteria are generally used. We need an information criterion adapted to high-dimensional data sets, and therefore we prefer the extended Bayesian information criterion (EBIC) for selecting the tuning parameter in Lasso and Aenet [27]. EBIC is formulated as EBIC = −2L(M) + k log n + 2c log C(p, k), where c = 0.5 and C(p, k) denotes the binomial coefficient (the combination function counting the candidate models of size k). The Lasso and Aenet applications are handled with the glmnet and msaenet packages [28, 29]. Table 6 shows the performance measures for Lasso-EBIC, Aenet-EBIC, and the proposed model. The comparisons are mainly based on three important measures: accuracy, F-measure, and MCCR. It is clear that the proposed model has the highest accuracy, F-measure, and MCCR values; in particular, the MCCR represents the classification performance of the proposed modeling approach. According to the results, the proposed hybrid approach gives more accurate results than the other classification models on high-dimensional data sets.
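A Python sketch of the EBIC computation defined above, with the binomial coefficient evaluated on the log scale via the log-gamma function so that large p (here p = 2000 genes) does not overflow; the inputs in the example are hypothetical.

```python
import math

def log_comb(p, k):
    """log of the binomial coefficient C(p, k), computed via lgamma."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)

def ebic(loglik, k, n, p, gamma=0.5):
    """Extended BIC: BIC plus a penalty on the number of size-k models."""
    return -2.0 * loglik + k * math.log(n) + 2.0 * gamma * log_comb(p, k)

# With gamma = 0 the criterion reduces to the ordinary BIC;
# with gamma = 0.5 the model-space penalty is added (toy log-likelihood)
plain_bic = ebic(-50.0, 3, 62, 2000, gamma=0.0)
extended = ebic(-50.0, 3, 62, 2000, gamma=0.5)
print(plain_bic < extended)
```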
Example 2. The gravier breast cancer data set is available in the R package "datamicroarray." The data set includes p = 2,905 genes and n = 168 samples. The response variable has two groups: patients with no event after diagnosis were labeled good, and those with early metastasis were labeled poor [31]. The analyses were performed on the first p = 1,000 features of the data set. The data set is again high dimensional because n < p. Table 7 presents the eigenvalues of both the classical and shrinkage covariance matrices; the negative eigenvalue problem is solved using the shrinkage covariance matrix. After obtaining positive eigenvalues, we applied principal component analysis with the positive definite shrinkage covariance matrix. Table 8 shows the information criteria of the MARS model for each number of principal components. In Table 8, AIC, BIC, and the MARS model's GCV measure select 19 principal components, while CAIC selects 20. We applied the AHP process to find the optimal number of principal components. Table 9 shows the AHP scores for each criterion. According to Table 9, the number of principal components with a rank value of 1 is 19. After the number of principal components was determined by the AHP process, we fitted the MARS regression model. Table 10 shows the results of the MARS regression model. We applied the logit transformation in the MARS regression model; among the terms of the fitted model is the basis function max(0, PC14 − 0.03291). Table 11 shows the performance measures for Lasso-EBIC, Aenet-EBIC, and the proposed model. In Table 11, the proposed hybrid approach gives more accurate results than the other classification models.

Conclusions
In high-dimensional data sets, where the number of variables is greater than the number of samples, the classical covariance matrix structure degenerates systematically. This degeneration leads to negative eigenvalues, which causes the results of a PCA to be misleading. Beyond that, the reduction phase requires choosing the optimal number of components. In this study, we have introduced a new model estimation approach to overcome this problem. The proposed approach objectively identifies the number of PCs in a high-dimensional setting.
The proposed system also enables the removal of irrelevant components through MARS. One of the major advantages of this approach is that it ties the selection process to the estimation model. Empirical findings show that the developed hybrid system produces accurate results for high-dimensional data sets.

Data Availability
The colon gene expression data set is available in the R package "plsgenomics" [29]. The gravier breast cancer data set is available in the R package "datamicroarray" [31].

Conflicts of Interest
The author declares that there are no conflicts of interest.