Incremental Multiple Hidden Layers Regularized Extreme Learning Machine Based on Forced Positive-Definite Cholesky Factorization

The theory and implementation of the extreme learning machine (ELM) show that it is a simple, efficient, and accurate machine learning method. Compared with other single hidden layer feedforward neural network algorithms, ELM is characterized by simpler parameter selection rules, faster convergence, and less human intervention. The multiple hidden layer regularized extreme learning machine (MRELM) inherits these advantages of ELM and has higher prediction accuracy. In the MRELM model, the number of hidden layers is randomly initialized and then fixed, and there is no iterative tuning process. However, the number of hidden layers is the key factor that determines the generalization ability of MRELM, so it is unreasonable to determine this number by trial and random initialization. In this paper, an incremental MRELM training algorithm (FC-IMRELM) based on forced positive-definite Cholesky factorization is put forward to solve the network structure design problem of MRELM. First, an MRELM-based prediction model with one hidden layer is constructed, and then a new hidden layer is added to the prediction model in each training step until the generalization performance of the prediction model reaches its peak value. Thus, the optimal network structure of the prediction model is determined. In the training procedure, forced positive-definite Cholesky factorization is used to calculate the output weights of MRELM, which avoids the calculation of the inverse matrix and the Moore-Penrose generalized inverse involved in training the hidden layer parameters. Therefore, the FC-IMRELM prediction model can effectively reduce the computational cost incurred by increasing the number of hidden layers. Experiments on classification and regression problems indicate that the algorithm can be effectively used to determine the optimal network structure of MRELM, and that the prediction model trained by the algorithm performs well in both prediction accuracy and computational cost.


Introduction
The neural network is a complex nonlinear system formed by a large number of interconnected neurons, and it builds on research into the information processing ability of the human brain in modern neurobiology and cognitive science. The neural network is also a mathematical simulation of the physiological structure of the human brain, with strong adaptability, self-learning ability, and nonlinear mapping ability, and it has been widely used by researchers in many scientific fields [1][2][3]. However, these prediction models are all based on traditional neural networks, whose training process has to modify the network weights repeatedly according to the training objectives and gradient information. The entire training process usually takes hundreds or even thousands of iterations to complete, which requires a large amount of computation.
The extreme learning machine (ELM) is a novel single hidden layer feedforward neural network. It transforms the iterative adjustment process of traditional neural network parameter training into the solution of linear equations. According to the Moore-Penrose (MP) generalized inverse matrix theory, the least squares solution with the minimum norm is obtained analytically as the network weights. The whole training process can be completed in one pass without iteration. Compared with traditional neural network training algorithms, which require many iterations to determine the network weights, the training speed of ELM is significantly improved [4,5]. This advantage enables ELM to be successfully applied in pattern recognition [6,7] and regression estimation [8][9][10]. In order to improve the generalization ability of ELM, the literature [11] draws on the principle of structural risk minimization in statistical learning theory and proposes a regularized extreme learning machine (RELM). RELM achieves a better generalization ability by introducing a regularization parameter that weighs structural risk against empirical risk [12][13][14][15]. For the single hidden layer RELM model with multiple inputs and a single output, the literature [16] designed a Cholesky factorization method for the regularized output weight matrix. In the learning and forgetting process of the sample sequence, the Cholesky factor is calculated recursively by adding and deleting samples one by one, and then the output weights are adjusted, while the network structure remains fixed. However, when dealing with input data with complex noise and high-dimensional information, or with a larger number of classification categories, RELM shows its own shortcomings, and the accuracy of the established model is greatly reduced.
In order to overcome these shortcomings of RELM, the literature [17,18] starts from improving its network structure. On the basis of the traditional three-layer RELM structure, the number of hidden layers is increased to form a neural network with one input layer, multiple hidden layers, and one output layer, that is, the multiple hidden layers RELM network model (MRELM), in which the neuron nodes of each hidden layer are fully connected. MRELM inherits from RELM the idea of randomly initializing the weight matrix between the input layer and the hidden layer as well as the bias vector of the hidden layer. By forcing the actual output of the hidden layer to be as close as possible to the expected output, the weight matrix and the bias vector of the added hidden layers are calculated; thereby a neural network model with multiple hidden layers is established. The parameter training process needs to calculate the inverse matrix and the MP generalized inverse matrix: the first hidden layer parameters are randomly initialized, and the parameters of the remaining hidden layers are obtained by minimizing the error between the actual output and the expected output of the corresponding hidden layer. Compared with the traditional RELM model, MRELM can effectively improve the prediction accuracy through the layer-by-layer optimization of network parameters between different hidden layers. Moreover, it has the advantages of strong generalization ability and fast computing speed and is not prone to falling into local optima [19][20][21][22]. However, since the initial parameter values of MRELM are randomly initialized, although this helps the algorithm avoid local optima and overfitting, it also causes some hidden layers to fail or to contribute less to the neural network during the modeling process. As a result, there are some redundant hidden layers in the MRELM network, which calls for more principled methods and theories for selecting the number of hidden layers. Meanwhile, the network structure of MRELM is determined by the users based on their own practical experience, but this empirical choice is not well founded, and it is difficult to guarantee optimality. In practical applications, users often need to carry out repeated experiments and, by comparing many results, choose the network structure with the lowest time consumption and the highest accuracy as the network model for the training and prediction of the actual data.
In order to design the MRELM network structure effectively, select the number of hidden layers reasonably, and achieve the desired accuracy, an incremental MRELM training algorithm based on forced positive-definite Cholesky factorization (FC-IMRELM) [23,24] is put forward in this paper. The algorithm can adjust the number of hidden layers in the network adaptively according to the predicted data, so as to determine the optimal network structure of FC-IMRELM. At the same time, a novel method is adopted to calculate the parameters of the newly added hidden layers, that is, the connection weight matrix and the bias vector of the hidden layers. Based on previous research, MRELM typically requires fewer hidden neurons than ELM to achieve a desirable performance level. This is a basic motivation for considering the multiple-hidden-layers structure presented here. Compared with other ELM variants, the foundational ideas of the FC-IMRELM algorithm are simpler to implement and more stable. Experimental results for classification and regression problems show that the proposed FC-IMRELM algorithm achieves higher average accuracy than the traditional RELM model and other improved models of MRELM.
The rest of this paper is organized as follows: Section 2 presents a brief review of the basic concepts and related work of multiple hidden layers RELM, Section 3 describes the proposed incremental FC-IMRELM technique, Section 4 reports and analyzes the experimental results, and, finally, Section 5 summarizes key conclusions of the present study.

Brief Review of Multiple Hidden Layers Regularized Extreme Learning Machine
The MRELM algorithm tries to find a mapping that makes the output predicted by the ELM neural network with multiple hidden layers as close as possible to the given target. This mapping is embodied in the solution of the weight and bias parameters of the hidden layers. The number of hidden layers in the MRELM neural network needs to be selected according to the change of the predicted data. Therefore, in the training process of network parameters, in order to ensure that the final hidden layer output is closer to the expected hidden layer output, the parameters of the first hidden layer are randomly initialized, and the parameter training process then optimizes the network parameters from the second hidden layer onward until all the network parameters are obtained. Furthermore, during the establishment of the neural network, the weight matrix and bias vector of each hidden layer are acquired and recorded, so as to obtain the final predicted output of the MRELM neural network. The solving process of the network parameters is explained in detail in the following algorithm flow.

Suppose that the training dataset given to the MRELM neural network is $\{X, T\} = \{x_i, t_i\}$ $(i = 1, 2, \ldots, N)$, where $N$ is the total number of training samples. Meanwhile, it is assumed that all hidden layers in the MRELM model contain the same number of hidden nodes $L$, and each hidden node uses the same activation function $g(x)$. In the modeling process of the MRELM algorithm, the multiple hidden layers in the neural network are first treated as a single hidden layer, and then the hidden layer parameters of this single-hidden-layer network are randomly initialized, namely, the input weight matrix $W_1 = [(w_1)_1, (w_1)_2, \ldots, (w_1)_L]^T \in \mathbb{R}^{L \times n}$ connecting the input layer and the first hidden layer, and the bias vector $B_1 = [b_{11}, b_{12}, \ldots, b_{1L}]^T \in \mathbb{R}^{L}$ of the first hidden layer nodes. Thus, the output matrix of the first hidden layer $H_1$ can be calculated as follows:

$$H_1 = [(h_1)_{ij}]_{N \times L}, \quad (1)$$

whose scalar entries $(h_1)_{ij} = g((w_1)_j \cdot x_i + b_{1j})$ $(i = 1, 2, \ldots, N,\ j = 1, 2, \ldots, L)$ are interpreted as the output of the $j$-th hidden node in the first hidden layer with respect to $x_i$, and $(w_1)_j = [(w_1)_{j1}, (w_1)_{j2}, \ldots, (w_1)_{jn}]^T$ is the vector of connection weights between the $n$ input nodes and the $j$-th hidden node in the first hidden layer. To better balance the empirical risk and the structural risk, MRELM adjusts the proportion of the two risks by introducing a parameter $\lambda$, which can be expressed as the following constrained optimization problem:

$$\min_{\beta_1,\ \varepsilon}\ \frac{1}{2}\|\beta_1\|^2 + \frac{\lambda}{2}\|\varepsilon\|^2 \quad \text{s.t.}\quad H_1 \beta_1 = T - \varepsilon, \quad (2)$$

where $\beta_1 = [(\beta_1)_1, (\beta_1)_2, \ldots, (\beta_1)_L]^T$ is the connection weight matrix between the first hidden layer and the output layer, with vector components $(\beta_1)_j = [(\beta_1)_{j1}, (\beta_1)_{j2}, \ldots, (\beta_1)_{jm}]^T$ $(j = 1, 2, \ldots, L)$ that denote the connection weights between the $j$-th hidden node in the first hidden layer and the $m$ output nodes, $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N]^T$ denotes the training error, and $\lambda > 0$ is the regularization parameter.
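According to the KKT theorem, the constrained optimization problem (2) can be transformed into the following dual optimization problem:

$$L(\beta_1, \varepsilon, \alpha) = \frac{1}{2}\|\beta_1\|^2 + \frac{\lambda}{2}\|\varepsilon\|^2 - \alpha\,(H_1\beta_1 - T + \varepsilon),$$

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]$ is the vector of Lagrange multipliers. Utilizing the KKT optimality conditions, the following equations can be obtained:

$$\frac{\partial L}{\partial \beta_1} = 0 \Rightarrow \beta_1 = (H_1)^T\alpha^T, \qquad \frac{\partial L}{\partial \varepsilon} = 0 \Rightarrow \alpha^T = \lambda\,\varepsilon, \qquad \frac{\partial L}{\partial \alpha} = 0 \Rightarrow H_1\beta_1 - T + \varepsilon = 0.$$

Finally, $\beta_1$ can be obtained as

$$\beta_1 = \left(\lambda^{-1}I + (H_1)^T H_1\right)^{-1}(H_1)^T T \quad (5a)$$

or

$$\beta_1 = (H_1)^T\left(\lambda^{-1}I + H_1 (H_1)^T\right)^{-1} T. \quad (5b)$$

In order to reduce the computational cost, if $N > L$, one may prefer solution (5a), and if $N < L$, one may prefer solution (5b).

Now the second hidden layer is added to the MRELM neural network, the network structure with two hidden layers is restored, and the two hidden layers are fully connected, so the predicted output of the second hidden layer $H_2$ can be obtained as follows:

$$H_2 = g(W_2 H_1 + B_2), \quad (6)$$

where $W_2$ denotes the weight matrix between the first hidden layer and the second hidden layer. We suppose that the first and second hidden layers have the same number of nodes, and thus $W_2$ is a square matrix. The matrix $B_2$ represents the bias of the second hidden layer. The expected output of the second hidden layer $H_2^{*}$ can be calculated as

$$H_2^{*} = T\,(\beta_1)^{+},$$

where $(\beta_1)^{+}$ is the MP generalized inverse of the matrix $\beta_1$, which can be calculated using the orthogonal projection method. Namely, if $(\beta_1)^T\beta_1$ is nonsingular, then $(\beta_1)^{+} = ((\beta_1)^T\beta_1)^{-1}(\beta_1)^T$, and if $\beta_1(\beta_1)^T$ is nonsingular, then $(\beta_1)^{+} = (\beta_1)^T(\beta_1(\beta_1)^T)^{-1}$. To make the predicted output of the hidden layer in the MRELM neural network as close as possible to the expected output, we set $H_2 = H_2^{*}$. Subsequently, we define the augmented matrix $(W_{HE})_2 = [B_2\ \ W_2]$, which can be obtained as

$$(W_{HE})_2 = g^{-1}(H_2^{*})\,((H_E)_2)^{+},$$

where $((H_E)_2)^{+}$ is the MP generalized inverse of the matrix $(H_E)_2 = [\mathbf{1}\ \ H_1]^T$, and $\mathbf{1}$ represents a one-column vector of size $N$ whose elements are all equal to 1. The method for computing $((H_E)_2)^{+}$ is the same as previously discussed for $(\beta_1)^{+}$. The notation $g^{-1}(x)$ indicates the inverse of the activation function $g(x)$. For both classification and regression problems, we use the widely used logistic sigmoid function $g(x) = 1/(1 + e^{-x})$. The predicted output of the second hidden layer $H_2$ is then updated as

$$H_2 = g\left((W_{HE})_2\,(H_E)_2\right).$$

Therefore, the connection weight matrix $\beta_2$ between the second hidden layer and the output layer is calculated as

$$\beta_2 = \left(\lambda^{-1}I + (H_2)^T H_2\right)^{-1}(H_2)^T T \quad (10a)$$

or

$$\beta_2 = (H_2)^T\left(\lambda^{-1}I + H_2 (H_2)^T\right)^{-1} T. \quad (10b)$$

The formula used for $\beta_2$ is chosen according to the rule previously discussed for $\beta_1$.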
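As an illustration of the two closed forms (5a) and (5b), the following is a minimal NumPy sketch, not the authors' implementation. It assumes a row-per-sample layout (`H1` has shape `N x L`), a uniform random initialization on `[-1, 1]`, and the hypothetical helper name `relm_output_weights`; the regularization value `lam=100.0` is arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relm_output_weights(H, T, lam):
    """Regularized output weights; solves whichever linear system is smaller."""
    N, L = H.shape
    if N >= L:
        # Form (5a): (I/lam + H^T H) beta = H^T T, an L x L system.
        return np.linalg.solve(np.eye(L) / lam + H.T @ H, H.T @ T)
    # Form (5b): beta = H^T (I/lam + H H^T)^{-1} T, an N x N system.
    return H.T @ np.linalg.solve(np.eye(N) / lam + H @ H.T, T)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))             # N = 50 samples, n = 4 inputs
T = rng.normal(size=(50, 2))             # m = 2 outputs
W1 = rng.uniform(-1, 1, size=(4, 10))    # random input weights, L = 10 nodes
b1 = rng.uniform(-1, 1, size=10)         # random biases
H1 = sigmoid(X @ W1 + b1)                # first hidden layer output, N x L

beta_5a = relm_output_weights(H1, T, lam=100.0)   # N > L, so (5a) is used
# Form (5b) computed explicitly, only to compare the two expressions.
beta_5b = H1.T @ np.linalg.solve(np.eye(50) / 100.0 + H1 @ H1.T, T)
```

Both expressions give the same weights up to rounding; the choice between them only affects the size of the linear system that has to be solved.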
According to the MRELM algorithm flow, the third hidden layer is then added to the MRELM network, and the network structure with three hidden layers is restored. Since the nodes of adjacent hidden layers are fully connected, the predicted output of the third hidden layer can be obtained as

$$H_3 = g(W_3 H_2 + B_3),$$

where $W_3$ represents the weight matrix between the second hidden layer and the third hidden layer, and $B_3$ denotes the bias of the third hidden layer. Thus, the expected output of the third hidden layer can be obtained as

$$H_3^{*} = T\,(\beta_2)^{+},$$

where $(\beta_2)^{+}$ is the MP generalized inverse of the weight matrix $\beta_2$, obtained using the approach described before. To meet the requirement that the predicted output of the third hidden layer is as close as possible to the expected output, let $H_3 = H_3^{*}$. Accordingly, the augmented matrix can be defined as $(W_{HE})_3 = [B_3\ \ W_3]$, and we can solve it as follows:

$$(W_{HE})_3 = g^{-1}(H_3^{*})\,((H_E)_3)^{+},$$

where $((H_E)_3)^{+}$ is the MP generalized inverse of the matrix $(H_E)_3 = [\mathbf{1}\ \ H_2]^T$, the symbol $\mathbf{1}$ has the meaning described above, and the calculation of $((H_E)_3)^{+}$ also proceeds in the manner discussed before. Therefore, we can update the predicted output of the third hidden layer as

$$H_3 = g\left((W_{HE})_3\,(H_E)_3\right).$$

Finally, the connection weight matrix $\beta_3$ between the third hidden layer and the output layer can be calculated as

$$\beta_3 = \left(\lambda^{-1}I + (H_3)^T H_3\right)^{-1}(H_3)^T T \quad (15a)$$

or

$$\beta_3 = (H_3)^T\left(\lambda^{-1}I + H_3 (H_3)^T\right)^{-1} T. \quad (15b)$$

The formula used for $\beta_3$ is again selected according to the rule discussed previously for $\beta_1$. The final output of the MRELM network with three hidden layers after training can be expressed as

$$f(x) = H_3\,\beta_3. \quad (16)$$

If the number of hidden layers $K$ in the MRELM network is more than 3, an iterative scheme can be adopted to carry out the calculation. In other words, the calculation from formula (6) to formulas (15a) and (15b) is repeated $K - 3$ more times until all hidden layer parameters are solved. Finally, it should be emphasized that this algorithm does not add all hidden layers to the network at one time, nor does it calculate all hidden layer parameters at one time; instead, the hidden layers are added to the network one after another. Every time a new hidden layer is added, the weight matrix and the bias vector of that hidden layer are calculated immediately to prepare for the parameter calculation of the hidden layer to be added next.
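The layer-by-layer procedure reviewed above can be sketched in a few lines of NumPy. This is a hedged illustration rather than the reference implementation: it assumes a row-per-sample layout (so the augmented matrix is `[1 H]` instead of its transpose), uses `numpy.linalg.pinv` for the MP generalized inverse, clips the expected hidden output into (0, 1) before applying the inverse sigmoid, and the function names `train_mrelm` and `predict_mrelm` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(h, eps=1e-6):
    # Inverse of the sigmoid; clipping is an assumption added here so that
    # expected outputs outside (0, 1) do not produce infinities.
    h = np.clip(h, eps, 1.0 - eps)
    return np.log(h / (1.0 - h))

def ridge(H, T, lam):
    # Regularized output weights, form (5a).
    return np.linalg.solve(np.eye(H.shape[1]) / lam + H.T @ H, H.T @ T)

def train_mrelm(X, T, n_hidden, n_layers, lam, rng):
    W1 = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))  # random first layer
    b1 = rng.uniform(-1, 1, size=n_hidden)
    H = sigmoid(X @ W1 + b1)
    beta = ridge(H, T, lam)
    layers = []
    for _ in range(n_layers - 1):
        H_star = T @ np.linalg.pinv(beta)                  # expected output
        HE = np.hstack([np.ones((H.shape[0], 1)), H])      # augmented [1 H]
        W_HE = np.linalg.pinv(HE) @ logit(H_star)          # bias and weights
        H = sigmoid(HE @ W_HE)                             # updated output
        beta = ridge(H, T, lam)                            # new output weights
        layers.append(W_HE)
    return W1, b1, layers, beta

def predict_mrelm(model, X):
    W1, b1, layers, beta = model
    H = sigmoid(X @ W1 + b1)
    for W_HE in layers:
        HE = np.hstack([np.ones((H.shape[0], 1)), H])
        H = sigmoid(HE @ W_HE)
    return H @ beta

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
T = np.sin(X.sum(axis=1, keepdims=True))
model = train_mrelm(X, T, n_hidden=8, n_layers=3, lam=1e3, rng=rng)
pred = predict_mrelm(model, X)
```

Each pass through the loop corresponds to adding one hidden layer: the expected output is derived from the current output weights, the augmented weight matrix is fitted to it, and the output weights are then recomputed.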

Solutions of IMRELM by the Forced Positive-Definite Cholesky Factorization
For the single hidden layer feed-forward neural network, the literature [15] puts forward the regularized extreme learning machine algorithm based on Cholesky factorization (CF-FORELM), introduces the Cholesky factorization of a positive definite matrix into the solving process of RELM, and designs a recursive method for calculating the Cholesky factor of the regularized output matrix. The advantages of the CF-FORELM algorithm prompted us to introduce the forced positive-definite Cholesky factorization method into the framework of the MRELM algorithm with multiple hidden layers, and we thus propose an MRELM neural network training algorithm based on forced positive-definite Cholesky factorization (FC-IMRELM). Compared with the inversion of the matrix $\lambda^{-1}I + (H_k)^T H_k$ in the traditional RELM algorithm and the calculation of the MP generalized inverse $(\beta_k)^{+}$ in the MRELM algorithm ($k$ denotes the hidden layer number), the algorithm effectively reduces the computational cost and complexity brought by the matrix inversion process. Meanwhile, the numerical stability of the forced positive-definite Cholesky factorization method also greatly weakens the effect of the randomness of the ELM algorithm on the prediction results.

Forced Positive-Definite Cholesky Factorization (FC).
The main difficulty of the MRELM algorithm is the calculation of the inverse matrix and the MP generalized inverse matrix involved in the training process, including the inversion of symmetric positive semidefinite matrices. In this case, an improved MRELM model based on the traditional Cholesky factorization cannot be realized, because the Cholesky factorization of a symmetric positive semidefinite matrix might not exist. Even if such a factorization exists, the calculation process is generally numerically unstable, because the elements of the factors may be unbounded. In order to overcome these difficulties, we put forward a modified approach based on the forced positive-definite Cholesky factorization for the MRELM algorithm with multiple hidden layers, which is a numerically stable approach.
When the forced positive-definite strategy is adopted to improve the MRELM algorithm, the key problem is how to form a positive definite matrix from the modified Cholesky decomposition of an indefinite matrix. If the matrix $A$ is not positive definite, the Cholesky factorization method that forces positive definiteness finds, for the general symmetric matrix $A$, a unit lower triangular matrix $L$ and a positive definite diagonal matrix $D$ such that the matrix

$$\hat{A} = L D L^T = A + E \quad (17)$$

is positive definite and differs from the matrix $A$ only by a diagonal matrix $E$.

In fact, the Cholesky factorization of a symmetric positive definite matrix can be described as follows:

$$d_j = a_{jj} - \sum_{s=1}^{j-1} l_{js}^2 d_s, \qquad l_{ij} = \frac{1}{d_j}\left(a_{ij} - \sum_{s=1}^{j-1} l_{is}\, l_{js}\, d_s\right), \quad i > j,$$

where $l_{ij}$ represents an element of the matrix $L$ and $d_j$ denotes a main diagonal element of the matrix $D$. Here, the Cholesky factors $L$ and $D$ are required to satisfy two requirements: one is that all elements of $D$ are strictly positive, and the other is that the elements of the factor $L D^{1/2}$ are uniformly bounded. That is, for $k = 1, \ldots, n$ and a positive number $\beta$, formula (19) is required:

$$d_k > \delta, \qquad |r_{ik}| \le \beta, \quad i > k, \quad (19)$$

where the auxiliary quantity $r_{ik} = l_{ik}\sqrt{d_k}$ and $\delta$ is a given small positive number. A matrix satisfying the above conditions is said to be sufficiently positive definite, in which case $E$ is a zero matrix.

Next, we describe the $j$-th step of this factorization. Suppose the first $j - 1$ columns of the forced positive-definite Cholesky factorization have been calculated, so that equation (19) holds for $k = 1, \ldots, j - 1$. First calculate

$$\psi_j = \left|\phi_j - \sum_{s=1}^{j-1} l_{js}^2 d_s\right|,$$

where $\phi_j$ is taken as $a_{jj}$, and the test value $\bar{d}$ is defined as

$$\bar{d} = \max\{\psi_j, \delta\},$$

where $\delta$ is a small positive number. In order to determine whether $\bar{d}$ can be accepted as the $j$-th element of $D$, we check whether $r_{ij} = l_{ij}\sqrt{\bar{d}}$ satisfies formula (19). If so, let $d_j = \bar{d}$, and obtain the $j$-th column of $L$ from $l_{ij} = r_{ij}/\sqrt{d_j}$. Otherwise, let $\phi_j = a_{jj} + e_{jj}$, select the positive number $e_{jj}$ so that $\max_i |r_{ij}| = \beta$, and produce the $j$-th column of $L$.

When the above process is completed, we obtain the Cholesky factorization (17) of the positive definite matrix $\hat{A}$, where $E$ is a nonnegative diagonal matrix with diagonal elements $e_{jj}$. For the given matrix $A$, this nonnegative diagonal matrix $E$ depends on $\beta$, and its norm is bounded by

$$\|E(\beta)\|_{\infty} \le \left(\frac{\xi}{\beta} + (n-1)\beta\right)^2 + 2\left(\gamma + (n-1)\beta^2\right) + \delta,$$

where $\xi$ is the maximum modulus of the off-diagonal elements of $A$ and $\gamma$ is the maximum modulus of the diagonal elements of $A$. If $\beta^2 = \xi/\sqrt{n^2 - 1}$, the upper bound is minimized. So, let $\beta$ satisfy formula (24):

$$\beta^2 = \max\left\{\gamma,\ \frac{\xi}{\sqrt{n^2 - 1}},\ \varepsilon_M\right\}, \quad (24)$$

where $\varepsilon_M$ represents the machine precision. The term $\varepsilon_M$ is included to guard against the case where $\|A\|$ is very small. Finally, we present the forced positive-definite Cholesky factorization algorithm, where the auxiliary quantity $c_{ij} = l_{ij} d_j$, $j = 1, \ldots, n$, $i = j, \ldots, n$. These values need not be stored separately; they can be stored in the matrix $A$.
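The following NumPy sketch shows one common way to realize a forced positive-definite factorization of this kind, in the style of the Gill-Murray modified Cholesky method. It is offered as an assumption about the intended procedure, not as the paper's exact algorithm: the pivot rule `d_j = max(|c_jj|, theta_j^2 / beta^2, delta)`, the default choice of `delta`, and the function name `forced_pd_ldl` are choices made here for illustration.

```python
import numpy as np

def forced_pd_ldl(A, delta=None):
    """Return (L, d, e) with L unit lower triangular, d > 0, e >= 0 and
    L @ diag(d) @ L.T == A + diag(e)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eps = np.finfo(float).eps
    gamma = np.max(np.abs(np.diag(A)))                  # largest |diagonal|
    xi = np.max(np.abs(A - np.diag(np.diag(A))))        # largest |off-diagonal|
    beta2 = max(gamma, xi / np.sqrt(max(n * n - 1, 1)), eps)   # formula (24)
    if delta is None:
        delta = eps * max(gamma + xi, 1.0)              # assumed small threshold
    L = np.eye(n)
    d = np.zeros(n)
    e = np.zeros(n)
    for j in range(n):
        # c_jj = a_jj - sum_s l_js^2 d_s
        c_jj = A[j, j] - np.dot(L[j, :j] ** 2, d[:j])
        # c_ij = a_ij - sum_s l_is l_js d_s for i > j
        c = A[j + 1:, j] - L[j + 1:, :j] @ (L[j, :j] * d[:j])
        theta = np.max(np.abs(c)) if c.size else 0.0
        # Force a positive pivot that also keeps |l_ij| * sqrt(d_j) <= beta.
        d[j] = max(abs(c_jj), theta ** 2 / beta2, delta)
        e[j] = d[j] - c_jj                              # diagonal correction
        L[j + 1:, j] = c / d[j]
    return L, d, e

A_indef = np.array([[1.0, 2.0], [2.0, 1.0]])            # eigenvalues 3 and -1
L1, d1, e1 = forced_pd_ldl(A_indef)

A_pd = np.array([[4.0, 1.0], [1.0, 3.0]])               # already positive definite
L2, d2, e2 = forced_pd_ldl(A_pd)
```

For the indefinite example the diagonal correction is nonzero, while for the positive definite example no correction is applied, so the result coincides with the ordinary factorization.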

Process of Matrices Decomposition for FC-IMRELM.
According to the MRELM training process shown in equations (1) to (16), its essence is to solve the connection weight matrix $\beta_k$ between the hidden layer and the output layer. However, it can be seen from equations (5a), (5b), (10a), (10b), (15a), and (15b) that the solution method given in the literature [18] involves matrix inversion, and the solution of each hidden layer's parameters $(W_{HE})_k$ ($k$ is the hidden layer number) involves the calculation of an MP generalized inverse. The large amount of computation reduces the modeling efficiency of the MRELM prediction model. In order to solve these problems effectively, we propose a method for solving the weight matrix $\beta_k$ and the hidden layer parameters $(W_{HE})_k$ based on the forced positive-definite Cholesky factorization.
Therefore, the process of solving $\beta_1$ according to equation (5a) can be transformed into solving $m$ linear equations in the form of equation (25), and the process of solving $\beta_1$ according to equation (5b) can be transformed into solving $m$ linear equations in the form of equation (26), where $m$ is the dimension of the observation vector.
At this point, we have the connection weight matrix $\beta_1$ between the first hidden layer and the output layer. The connection weight matrix $\beta_k$ between any other hidden layer and the output layer can also be calculated by the above method. Compared with the solution of the connection weight matrix $\beta_k$ shown in equations (5a), (5b), (10a), (10b), (15a), and (15b), the solution of $\beta_k$ based on Cholesky factorization does not involve matrix inversion, and it can be achieved using simple algebraic operations.
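To make the phrase "simple algebraic operations" concrete, here is a minimal sketch of solving the regularized system of form (5a) through a Cholesky factor and two triangular substitutions, one right-hand side column per output. It assumes a row-per-sample hidden output matrix `H`, uses the standard factorization from `numpy.linalg.cholesky` (valid because the coefficient matrix is symmetric positive definite), and the helper names are hypothetical.

```python
import numpy as np

def forward_sub(L, B):
    # Solve L Y = B for lower triangular L, row by row.
    Y = np.zeros_like(B, dtype=float)
    for i in range(L.shape[0]):
        Y[i] = (B[i] - L[i, :i] @ Y[:i]) / L[i, i]
    return Y

def backward_sub(U, Y):
    # Solve U X = Y for upper triangular U, from the last row upward.
    n = U.shape[0]
    X = np.zeros_like(Y, dtype=float)
    for i in range(n - 1, -1, -1):
        X[i] = (Y[i] - U[i, i + 1:] @ X[i + 1:]) / U[i, i]
    return X

def output_weights_cholesky(H, T, lam):
    A = np.eye(H.shape[1]) / lam + H.T @ H   # symmetric positive definite
    C = np.linalg.cholesky(A)                # A = C C^T, C lower triangular
    Y = forward_sub(C, H.T @ T)              # C Y = H^T T
    return backward_sub(C.T, Y)              # C^T beta = Y

rng = np.random.default_rng(2)
H = rng.uniform(size=(40, 6))
T = rng.normal(size=(40, 3))
beta_chol = output_weights_cholesky(H, T, lam=50.0)
beta_ref = np.linalg.solve(np.eye(6) / 50.0 + H.T @ H, H.T @ T)
```

No explicit inverse is formed at any point; the factor is computed once and reused for every output column.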
The MRELM model contains multiple hidden layers, and solving each hidden layer parameter $(W_{HE})_k$ requires the MP generalized inverse of the corresponding matrices. However, it can be concluded from equations (7), (8), (11), and (13) that computing $(\beta_k)^{+}$ and $((H_E)_k)^{+}$ with the orthogonal projection method [5] involves the inversion of the symmetric positive semidefinite matrices $(\beta_k)^T\beta_k$, $\beta_k(\beta_k)^T$, $((H_E)_k)^T(H_E)_k$, and $(H_E)_k((H_E)_k)^T$, which suffers from large computational cost and numerical instability; if the condition number of these matrices is too large, the MP generalized inverses $(\beta_k)^{+}$ and $((H_E)_k)^{+}$ usually cannot be obtained. This not only affects the modeling efficiency and prediction performance of the MRELM model but may also make the modeling process impossible to complete. Moreover, the traditional Cholesky factorization method can only be applied to symmetric positive definite matrices. To overcome these difficulties effectively, we use the forced positive-definite Cholesky factorization to compute the MP generalized inverses $(\beta_k)^{+}$ and $((H_E)_k)^{+}$, and we then obtain the hidden layer parameters $(W_{HE})_k$.
Finally, on the basis of obtaining   and (  ) () , the element (  ) ()   of (  ) () can be calculated by using the elements of   and (  ) () .

FC-IMRELM Training Algorithm.
Studies have shown that the number of hidden layers determines the learning accuracy and generalization ability of the MRELM model [25], and it is also a key factor that must be determined in advance when designing the MRELM network structure. Because of the complexity of the various training samples used with the MRELM prediction model, it is difficult to determine the optimal number of hidden layers accurately from human experience alone, that is, to give the MRELM prediction model enough hidden layers to ensure its learning accuracy while keeping the number of hidden layers as small as possible to maintain a compact network structure. To avoid the disadvantages and difficulties of manually selecting the number of hidden layers, we propose an incremental MRELM training algorithm based on forced positive-definite Cholesky factorization, which can automatically determine the optimal number of hidden layers in MRELM. The training process is as follows.
where  is the learning precision and   is the maximum value of  1 , ⋅ ⋅ ⋅   ,  = 0, ⋅ ⋅ ⋅ , 3. If the formula (58) is satisfied, the training process terminates, determine  as the optimal number of hidden layers, and establish the corresponding MRELM prediction model; otherwise continue to increase  until the condition is met.
The number of hidden layers in MRELM increases successively from the initial value, and the expansion stops when the error measure is no longer significantly reduced. At this point, even if further hidden layers are added, the error measure representing the learning accuracy and generalization ability of the MRELM model will not be significantly improved; instead, a large number of redundant hidden layers will appear in MRELM. Therefore, the MRELM model at this point has the optimal number of hidden layers.
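The incremental strategy can be sketched as a simple loop over the number of hidden layers. The exact termination formula is not recoverable from the text above, so the rule used here (stop when the error improves by less than a learning precision `eps`) is an assumption, as are the function name `select_depth`, the `max_layers` cap, and the toy error values.

```python
def select_depth(error_for_depth, eps=1e-3, max_layers=10):
    """Grow the network one hidden layer at a time and stop when the error
    no longer decreases by at least eps (assumed stopping rule)."""
    best_k = 1
    best_err = error_for_depth(1)
    history = [best_err]
    for k in range(2, max_layers + 1):
        err = error_for_depth(k)
        history.append(err)
        if best_err - err < eps:
            break                      # no significant improvement: stop growing
        best_k, best_err = k, err
    return best_k, history

# Toy error curve standing in for the validation error of a k-layer model.
toy_errors = {1: 0.5, 2: 0.3, 3: 0.29, 4: 0.2895, 5: 0.4}
best_k, history = select_depth(lambda k: toy_errors[k], eps=0.005)
```

In practice `error_for_depth` would train a model with `k` hidden layers (reusing the layers already computed) and return its error on held-out data.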

Proof of Positive Definiteness for FC-IMRELM.
In the implementation of the FC-IMRELM algorithm, the process of solving the connection weight matrix $\beta_k$ and the learning parameters $(W_{HE})_k$ of the $k$-th hidden layer can be transformed into solving linear equations in the form of equations (25), (26), (37), (38), (47), and (48). The premise of applying the standard Cholesky factorization to solving linear equations is that the coefficient matrix must be symmetric positive definite, and thus we need to prove that the matrices $A = \lambda^{-1}I + (H_1)^T H_1$ and $B = \lambda^{-1}I + H_1(H_1)^T$ are symmetric positive definite. The precondition of applying the forced positive-definite Cholesky factorization to solving linear equations is that the coefficient matrix must be symmetric; therefore it is necessary to prove that the matrices $(\beta_k)^T\beta_k$, $\beta_k(\beta_k)^T$, $((H_E)_k)^T(H_E)_k$, and $(H_E)_k((H_E)_k)^T$ are symmetric positive semidefinite. The following theorems show that the matrices $(H_k)^T H_k$ and $H_k(H_k)^T$ are symmetric positive semidefinite and that $\lambda^{-1}I$ is symmetric positive definite; hence the matrices $\lambda^{-1}I + (H_k)^T H_k$ and $\lambda^{-1}I + H_k(H_k)^T$ are symmetric positive definite, too. Consequently, Eq. (25) and Eq. (26) can be solved by the standard Cholesky factorization, and the connection weight matrix $\beta_k$ of the $k$-th hidden layer can be obtained. In addition, the matrices $(\beta_k)^T\beta_k$, $\beta_k(\beta_k)^T$, $((H_E)_k)^T(H_E)_k$, and $(H_E)_k((H_E)_k)^T$ are symmetric positive semidefinite, so Eq. (37), Eq. (38), Eq. (47), and Eq. (48) can be solved by the forced positive-definite Cholesky factorization, and the learning parameters $(W_{HE})_k$ of the $k$-th hidden layer can be obtained.
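So, $A$ is symmetric. Similarly, for the matrix $B = \lambda^{-1}I + H_k(H_k)^T$, applying the properties of the transpose again, we get

$$B^T = (\lambda^{-1}I)^T + \left(H_k(H_k)^T\right)^T = \lambda^{-1}I + H_k(H_k)^T = B.$$

Hence, $B$ is symmetric. On the other hand, let $x = (x_1, x_2, \ldots, x_L)^T$ be a nonzero vector in $\mathbb{R}^L$. We can easily verify that

$$x^T (H_k)^T H_k\, x = \|H_k x\|^2 \ge 0.$$

In addition,

$$x^T(\lambda^{-1}I)\,x = \lambda^{-1}\|x\|^2 > 0.$$

Thus, the matrix $A = \lambda^{-1}I + (H_k)^T H_k$ is positive definite. Also, let $y = (y_1, y_2, \ldots, y_N)^T$ be a nonzero vector in $\mathbb{R}^N$. Then we deduce that

$$y^T B\, y = \lambda^{-1}\|y\|^2 + \|(H_k)^T y\|^2 > 0.$$

Proof (apparent). We can prove Theorem 5 using the same idea as in Theorem 4.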
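The argument above can be checked numerically on a small example. The sketch below is only a sanity check under assumed sizes (`N = 20`, `L = 30`) and an arbitrary regularization value; it confirms that the Gram matrix is positive semidefinite and singular when `N < L`, while the regularized matrix has all eigenvalues at least `1/lam`.

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(20, 30))          # N = 20 samples, L = 30 hidden nodes
lam = 10.0

G = H.T @ H                            # symmetric positive semidefinite, rank <= 20
A = np.eye(30) / lam + G               # regularized coefficient matrix

eig_G = np.linalg.eigvalsh(G)          # eigenvalues in ascending order
eig_A = np.linalg.eigvalsh(A)

# The standard Cholesky factorization succeeds on A because it is positive definite.
C = np.linalg.cholesky(A)
```

The smallest eigenvalue of `G` is zero up to rounding, which is exactly the situation in which the standard factorization is unreliable and a forced positive-definite variant is needed.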

Results and Discussion
In this section, experiments with our proposed FC-IMRELM are conducted on benchmark datasets for classification and regression problems. In order to investigate the improvement in learning accuracy of our method, the original ELM [5], TELM [17], and MRELM [18] are also evaluated. All the  [27,28]. To evaluate the robustness of our FC-IMRELM algorithm, we conducted the tests using simple benchmark datasets and real datasets collected from the coal and iron ore industries. The characteristics of the datasets are shown in Table 1.

Evaluation of Testing Accuracy on Classification Datasets.
In order to make the performance evaluation more comprehensive, real datasets related to complex industrial data were added to our performance evaluation. The original ELM, TELM, and MRELM algorithms are tested using the simple benchmark datasets and the real datasets to validate the improvement in learning accuracy of our FC-IMRELM algorithm. From Table 2 and Figure 1, we can see that each algorithm has a good classification accuracy for the Banknote dataset. For the Blood, Diabetic, Wilt, Coal spectral, and Iron spectral datasets, the algorithms TELM, MRELM, and FC-IMRELM all outperform the ELM algorithm, and the FC-IMRELM algorithm has the highest classification accuracy. For the Image dataset, the classification accuracy of the algorithms TELM, MRELM, and FC-IMRELM is much higher than that of the ELM algorithm, and the classification accuracy of the FC-IMRELM algorithm is still the highest, reaching 91.98%. The experimental results show that the average classification accuracy of our FC-IMRELM algorithm is higher than that of the original ELM, TELM, and MRELM algorithms, and the computational experiments using the Coal spectral and Iron spectral datasets also demonstrate that our FC-IMRELM algorithm can be extended to practical applications.

Regression Problems.
For the regression model, the root-mean-square error (RMSE) and the coefficient of determination ($R^2$) [29] are used as the model performance evaluation indexes in this study to verify the effectiveness of the proposed FC-IMRELM algorithm:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n_p}\sum_{i=1}^{n_p}(y_i - \hat{y}_i)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n_p}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n_p}(y_i - \bar{y})^2},$$

where $n_p$ is the number of samples in the prediction set; $y_i$ is the actual value of the sample; $\bar{y}$ is the average of the actual values; $\hat{y}_i$ is the predicted value calculated by the model. The value of $R^2$ lies in the range (0, 1); the closer the value of $R^2$ is to 1 and the smaller the value of RMSE, the better the performance of the model.
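The two evaluation indexes can be computed as in the short sketch below, which uses the standard definitions of RMSE and the coefficient of determination; the function names and the sample values are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

y_actual = [1.0, 2.0, 3.0, 4.0]
y_model = [1.1, 1.9, 3.2, 3.8]
example_rmse = rmse(y_actual, y_model)   # sqrt(0.1 / 4), about 0.158
example_r2 = r2(y_actual, y_model)       # 1 - 0.1 / 5 = 0.98
```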

Characteristics of Regression Datasets.
The regression datasets are obtained from the LIBSVM website [30].The characteristics of the datasets are shown in Table 3.

Evaluation of Estimation Accuracy on Regression Datasets.
As shown in Table 4 and Figure 2, the algorithms TELM, MRELM, and FC-IMRELM all have good prediction results for the Bodyfat dataset, where the RMSE values of these algorithms are small and the R2 values are above 0.98, while the prediction ability of the ELM algorithm is slightly worse, with R2 equal to 0.8. For the Pyrim and Triazines datasets, the prediction results of the FC-IMRELM algorithm are better than those of the algorithms ELM, TELM, and MRELM: the RMSE of the FC-IMRELM algorithm is the smallest, and its R2 is the highest. The above analysis of the experimental results indicates that the FC-IMRELM algorithm can better extract data characteristics from multiattribute data, so it has better predictive ability.

Conclusions
(1) First of all, compared with MRELM, the FC-IMRELM algorithm proposed in this paper uses the idea of forced positive definite Cholesky factorization to determine the hidden layer parameters.The training process is more simplified, with low calculation amount and high numerical stability.In addition, the MRELM algorithm needs to set the network structure in advance, and the number of hidden layers remains unchanged in the training process, while the FC-IMRELM algorithm can automatically select the optimal number of hidden layers through the principle of structural risk and empirical risk minimization and adjust the network structure adaptively according to the training samples.
(2) Secondly, compared with CF-FORELM, the FC-IMRELM algorithm proposed in this paper is designed for the semipositive definite matrices appearing in the parameter solving process of MRELM model.The condition number of the matrix is improved while forcing the matrix positive definite, thereby accelerating the convergence speed of the MRELM model and ensuring the numerical stability of the modeling process.
(3) Finally, by introducing a regularization parameter to weigh the structural risk against the empirical risk of the ELM model, the FC-IMRELM algorithm has significantly improved generalization ability compared with the traditional neural network. In addition, the forced positive-definite Cholesky factorization is used to calculate its output weights, effectively reducing the computational cost incurred by increasing the number of hidden layers. The prediction examples show that the FC-IMRELM algorithm can effectively avoid the numerical instability of the MRELM model and has the advantages of high prediction accuracy and fast calculation speed, which can provide a novel and efficient solution to the prediction problem.

Figure 1: The average testing classification accuracy for the algorithms ELM, TELM, MRELM, and FC-IMRELM using classification datasets.

Table 1: Properties of the classification datasets.

Table 3: Specifications for the regression datasets.