Globality-Locality Preserving Maximum Variance Extreme Learning Machine

An extreme learning machine (ELM) is a useful technique for machine learning; however, existing ELM methods cannot fully exploit the geometric structure information or the discriminant information of the data space. Therefore, we propose a globality-locality preserving maximum variance extreme learning machine (GLELM) based on manifold learning. Building on the characteristics of the traditional ELM method, GLELM introduces the basic principles of linear discriminant analysis (LDA) and locality preserving projections (LPP) into ELM, fully taking account of the discriminant information contained in the samples. This method preserves the global and local manifold structures of the data to optimize the projection direction of the classifier. Experiments on several widely used image databases and UCI datasets validate the performance of GLELM. The experimental results show that the proposed model achieves promising results compared to several state-of-the-art ELM algorithms.


Introduction
Single-hidden-layer feedforward networks (SLFNs) have been intensively studied over the past several decades. A well-known training algorithm for SLFNs is the backpropagation (BP) algorithm proposed by Rumelhart et al. [1] in 1986. The BP algorithm uses gradient descent to optimize the parameters of the neural network, but this optimization trains slowly and easily falls into a local minimum. Researchers have therefore proposed different improved algorithms to address these two problems. Hagan et al. [2] proposed a second-order optimization method in 1994. Branke et al. [3] proposed a global optimization method in 1995. Li et al. [4] proposed a subset selection method in 2005.
Recently, the extreme learning machine (ELM) [5] has attracted increasing attention from scholars. ELM is developed on the basis of single-hidden-layer feedforward networks (SLFNs) and can be regarded as an extension of SLFNs. Traditional neural network algorithms, for example, the BP [1] neural network, use a gradient descent-based method to adjust the input weights and bias values of the hidden layer nodes iteratively. However, methods based on gradient descent are slow and easily fall into a locally optimal solution. In contrast, ELM randomly generates the input weights and bias values of the hidden layer nodes, so it is faster to solve and requires less human intervention during training. The literature [6,7] showed that ELM, with randomly generated input weights and bias values of the hidden layer nodes and analytically determined output weights, maintains the universal approximation ability of SLFNs. At the same time, a near-global optimal solution can be obtained. The literature [8,9] notes that ELM has better classification performance than the support vector machine (SVM) [10]. Owing to its good generalization ability, ELM has been widely used in pattern recognition [11][12][13][14][15].
In recent years, researchers have studied ELM in various ways and proposed various improvements. Huang et al. [6] further studied the universal approximation ability of ELM. Lin et al. and Liu et al. [16][17][18] used statistical learning theory to conduct in-depth research on the generalization ability of ELM. Wang et al. [19] proposed a local generalization error model for the generalization ability of ELM, and researchers have also compared ELM with other classification algorithms. Shi et al. [20] studied ELM and SVM and their improved algorithms in depth and concluded that ELM is superior to SVM in training speed and generalization ability. Many variants of ELM have been proposed to meet particular application requirements. For example, Wang et al. [21] analysed the influence of the hidden layer node output matrix on the ELM algorithm and proposed an improved algorithm. Zheng et al., Riccardo et al., and Zhang et al. [22][23][24] proposed various improvements to the ELM algorithm by analysing the influence of data on the ELM model from the perspective of cost sensitivity coefficients. Li et al. [25] studied the defects of ELM on unbalanced data and missing data to improve the ELM algorithm. Zhou et al. and Javier et al. [26][27][28] applied ELM to remote sensing images. Zhou et al. [29] proposed various improvements to ELM to solve the problems in online continuous data applications. Recently, researchers have combined ELM with dimensionality reduction techniques. Castaño et al. [30] applied principal component analysis (PCA) to ELM, and Wang et al. [31] combined the local tangent space alignment (LTSA) dimensionality reduction algorithm with ELM. Researchers have also applied ensemble techniques to ELM to improve the robustness of ELM algorithms. Zhang et al. [32] applied AdaBoost to ELM, and Liu et al. [33] proposed an ensemble-based extreme learning machine. Deepak et al. [34] applied bagging to the ELM algorithm.
The above improvements in theory and application enhance the generalization capability of ELM and greatly expand the application range of the ELM algorithm. However, the way ELM uses the discriminant information of the data samples and the global and local manifold structures among the data samples has not yet been carefully studied from a mathematical or geometric point of view. Recently, researchers have noted that manifold learning methods [35,36] can effectively reveal the intrinsic geometry of data points [9]. Assuming that data samples $x_1$ and $x_2$ are drawn from the same marginal distribution $P_X$, if two points $x_1$ and $x_2$ are close to each other, then the conditional probabilities $P(y \mid x_1)$ and $P(y \mid x_2)$ should be similar as well. This assumption is widely referred to as the smoothness assumption in machine learning. Therefore, mining the geometry of the data can provide effective information for pattern classification. Researchers have recently studied manifold learning thoroughly and put forward different methods to preserve the local characteristics of data [37][38][39]. To overcome the drawback that ELM ignores the intrinsic manifold structure of the data space, and inspired by manifold learning and the literature [40], we introduce the basic principles of linear discriminant analysis (LDA) [41] and locality preserving projections (LPP) [42] into ELM and propose a novel learning algorithm called the globality-locality preserving maximum variance extreme learning machine (GLELM), in which the manifold structure within each class is explicitly considered. This method introduces the intraclass and interclass divergence matrices of LDA and the basic principle of LPP into ELM so that it not only maintains the intrinsic local geometry of the samples but also maintains the global geometric structure of the samples to a certain extent and embodies the global discriminant information contained in the samples.
GLELM retains the locality preserving characteristic of LPP and utilizes the global discriminative structure obtained from the maximum margin criterion (MMC), which maximizes the between-class distance and minimizes the within-class distance. By combining the idea of LPP and the principle of LDA in the ELM model, we enhance the discriminant ability of ELM, so GLELM is superior to ELM for recognition tasks. Moreover, the experimental results show that exploiting the intrinsic manifold structure of the data samples can effectively improve the classification performance of the ELM algorithm. In addition, the literature [43] noted that recent research shows that images reside on a nonlinear submanifold. Therefore, in this case, GLELM can usually achieve better performance than ELM. The contributions of the GLELM algorithm proposed in this paper are as follows.
(1) While inheriting the characteristics of ELM, GLELM avoids the problem of insufficient learning to some extent.
(2) The basic principles of LDA and LPP are introduced into ELM, which effectively maintains the intrinsic local geometry and global geometry of the sample and introduces the global discriminant information of the data samples into the ELM model.
(3) The idea of manifold learning is applied to the ELM model, and the validity of the GLELM algorithm is verified by experiments.
The rest of the paper is organized as follows. In Section 2, this paper introduces related work. In Section 3, we introduce the basic principles and framework of the ELM algorithm. Section 4 presents the GLELM algorithm framework. Section 5 describes and analyses the experimental results. Section 6 summarizes the paper.

[46] to optimize the network output weights of ELM. The GEELM provides a unified way to incorporate subspace learning criteria formulated using graphs in ELM optimization. In their paper, formulations using supervised and unsupervised subspace criteria in ELM optimization are used. Liu et al. proposed the robust discriminative extreme learning machine (RDELM) [47] to address the deficiency of the MCVELM algorithm in exploiting the discriminant information among data samples. The RDELM algorithm takes into account not only the intraclass discriminant information of the data samples but also their interclass discriminant information. The motivation for our paper is similar to that of the above papers, which also discussed the geometry of ELM. However, they directly used the geometric structure information of the data to optimize the network output weights. We focus on data samples $x_1$ and $x_2$ that are drawn from the same marginal distribution $P_X$. If two points $x_1$ and $x_2$ are close to each other, then the conditional probabilities $P(y \mid x_1)$ and $P(y \mid x_2)$ should be similar as well; therefore, the manifold structure information of the data samples is introduced into the ELM model, and the generalization ability of the ELM algorithm is enhanced. The most relevant work is found in the literature [48][49][50]. Iosifidis et al. introduced local class information into the ELM model and proposed a Local Class Variance Extreme Learning Machine (LCVELM) classifier [48]. Based on the consistency property of data, which enforces similar samples to share similar properties, Peng et al. proposed a discriminative graph regularized extreme learning machine (GELM) [49]. GELM constructs the Laplacian Eigenmap (LE) [51] structure with the discriminant information of the data samples and introduces it into the ELM algorithm as a regularization term. In addition, Peng et al. proposed a discriminative manifold extreme learning machine (DMELM) [50] based on local intraclass discriminant information, local interclass discriminant information, and the geometric structure information of the data. The GELM and DMELM algorithms proposed by Peng et al. enhance the classification performance and generalization ability of the ELM model by introducing the manifold structure and discriminant information of the data samples into the ELM model. However, the GELM and DMELM algorithms ignore the global geometry and global discriminant information of the data samples. The literature [52] shows that the intraclass divergence matrix, interclass divergence matrix, and global divergence matrix in linear discriminant analysis (LDA) maintain the global discriminant information and global geometric structure of the training samples. Therefore, based on the basic principles of the LDA and LPP algorithms, we introduce the global and local manifold structure and discriminant information into the ELM model and propose the GLELM model.

Background and Notation
Our GLELM model is a natural extension of ELM with manifold regularization. Manifold learning methods have also been combined with other machine learning algorithms, such as globality-locality preserving projections (GLPP) [53,54] and the support vector machine with globality-locality preserving (GLPSVM) [55]. GLPP first separates the data into a static part (subject-invariant factors) and a dynamic part (intrasubject factors) and then jointly learns the two corresponding graph Laplacians to yield a new graph Laplacian, which it uses to reduce the dimensionality of the data. By using LPP to keep the local geometric information and LDA to keep the global geometric information of the data, GLELM unifies LPP and LDA in a manifold regularization framework. The proposed GLELM algorithm thus combines the manifold criterion and the Fisher criterion and has a stronger discriminative ability. GLPSVM introduces manifold structure information into SVM, using geometric and discriminative information to construct a manifold regularization framework. Both GLPSVM and GLELM use LPP to construct the manifold framework; however, GLPSVM uses the mean vector of the data samples to obtain the global geometric structure information of the data, while GLELM uses LDA. In addition, the architecture of GLELM is completely different from those of GLPP and GLPSVM. GLPP is a dimensionality reduction algorithm. GLPSVM is a classification algorithm that classifies by maximizing the geometric margin. GLELM, based on a single-hidden-layer feedforward neural network, randomly generates the input weights and hidden layer offset values and analytically determines the output weights to classify the data. Different architectures lead to different recognition performance.

Extreme Learning Machine.
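The extreme learning machine proposed by Huang et al. [5] is an efficient and practical learning mechanism for single-hidden-layer feedforward neural networks. For $N$ distinct samples $\{(x_i, t_i) \mid x_i \in \mathbb{R}^{n}, t_i \in \mathbb{R}^{m}, i = 1, 2, \ldots, N\}$, where $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})^{T}$ and $t_i = (t_{i1}, t_{i2}, \ldots, t_{im})^{T}$, the ELM model with $L$ hidden layer nodes and activation function $g(x)$ is as follows:

$$\sum_{j=1}^{L} \beta_j \, g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, 2, \ldots, N, \tag{1}$$

where $w_j = (w_{j1}, w_{j2}, \ldots, w_{jn})^{T}$ is the input weight vector connecting the $j$th hidden layer node with the input nodes; $\beta_j = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm})^{T}$ is the output weight vector connecting the $j$th hidden layer node and the output nodes; $b_j$ is the offset value of the $j$th hidden layer node; $w_j \cdot x_i$ represents the inner product of $w_j$ and $x_i$; and $o_i = (o_{i1}, o_{i2}, \ldots, o_{im})^{T}$ is the network output corresponding to sample $x_i$. Requiring the network output $o_i$ to equal the target $t_i$ for all data samples, (1) can be rewritten as follows:

$$H\beta = T, \tag{2}$$

where $h(x_i)$ is the output vector of the hidden layer with respect to $x_i$, $H$ is the network hidden layer node output matrix, $\beta$ is the output weight matrix, and $T$ is the expected output matrix:

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \tag{3}$$

$$\beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_L^{T} \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}_{N \times m}. \tag{4}$$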
Training a standard single-hidden-layer feedforward neural network (SLFN) amounts to computing appropriate $\tilde{w}_j$, $\tilde{b}_j$, and $\tilde{\beta}$ ($j = 1, 2, \ldots, L$) that satisfy

$$\left\| H(\tilde{w}_1, \ldots, \tilde{w}_L, \tilde{b}_1, \ldots, \tilde{b}_L)\tilde{\beta} - T \right\| = \min_{w_j, b_j, \beta} \left\| H(w_1, \ldots, w_L, b_1, \ldots, b_L)\beta - T \right\|. \tag{5}$$

Formula (5) can be solved by the gradient descent method. Huang et al. [22] have proved that, in contrast to standard SLFNs, the input weights and the hidden layer biases need no adjustment. In the ELM algorithm, the weights and bias values of the hidden layer nodes are assigned randomly, so the nonlinear model of the single-hidden-layer feedforward neural network is converted into a linear model. Formula (5) can then be written as $H\beta = T$ and solved by the least squares method. When the number of hidden layer nodes is the same as the number of training samples ($L = N$), we can directly obtain the optimal output weight matrix $\beta$ from the inverse of the matrix $H$. However, in most cases, the number of hidden layer nodes is much smaller than the number of training samples ($L < N$). In this case, the matrix $H$ is not square and therefore has no inverse. We solve (5) by the least squares solution:

$$\hat{\beta} = H^{+} T, \tag{6}$$

where $H^{+}$ is the generalized inverse matrix of the matrix $H$, and $H^{+}$ can be calculated by SVD or least squares.
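As an illustration of the generalized-inverse solution described above, the following minimal NumPy sketch trains a basic ELM on toy data. The function names `elm_fit` and `elm_predict`, the Gaussian initialization of the random weights, and the synthetic two-class data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, T, n_hidden, rng):
    """Basic ELM: random hidden layer, output weights from the generalized inverse."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (assumed Gaussian)
    b = rng.standard_normal(n_hidden)                # random hidden-layer offsets
    H = sigmoid(X @ W + b)                           # hidden layer output matrix, N x L
    beta = np.linalg.pinv(H) @ T                     # least-squares output weights, L x m
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy two-class problem with one-hot targets (illustrative data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
T = np.eye(2)[labels]

W, b, beta = elm_fit(X, T, n_hidden=50, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
train_acc = (pred == labels).mean()
```

Only `beta` is learned; `W` and `b` stay at their random values, which is what turns training into a single linear least-squares problem.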
To improve the stability and generalization capability of traditional ELM, Huang [22] proposed the equality optimization constraint-based ELM. Its optimization formula minimizes not only the training error but also the norm of the output weight $\beta$, so the objective of the equality optimization constraint-based ELM can be written as

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\sum_{i=1}^{N}\|\xi_i\|^{2}, \quad \text{s.t. } h(x_i)\beta = t_i^{T} - \xi_i^{T}, \; i = 1, 2, \ldots, N. \tag{7}$$

In (7), $\xi_i = (\xi_{i1}, \cdots, \xi_{im})^{T}$ is the training error vector corresponding to the sample $x_i$, and $C$ is a penalty parameter.
The expression for the output weight depends on whether the number of training samples is larger or smaller than the number of hidden layer nodes. Substituting the constraint into the objective, (7) can be rewritten as follows:

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^{2} + \frac{C}{2}\|T - H\beta\|^{2}. \tag{8}$$

When the number of training samples is less than the number of hidden layer nodes ($L > N$), the solution to (8) is

$$\beta = H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1} T. \tag{9}$$

When the number of training samples is greater than the number of hidden layer nodes ($L < N$), the solution to (8) is

$$\beta = \left(\frac{I}{C} + H^{T}H\right)^{-1} H^{T} T. \tag{10}$$

The ELM algorithm solving process can be summarized as follows:
(1) Initialize the training sample set.
(2) Randomly specify the network input weights $w_j$ and the offset values $b_j$, $j = 1, 2, \cdots, L$.
(3) Calculate the hidden layer node output matrix $H$ by the activation function.
(4) Calculate the output weight matrix $\beta$ according to (9) or (10).

2.4. Linear Discriminant Analysis. The main idea of LDA is to enhance the global class discrimination after projection: it maximizes the trace of the interclass scatter (divergence) matrix while minimizing the trace of the intraclass scatter matrix, to find a subspace that distinguishes the different categories. According to the derivation of LDA in the literature [48], $S_w$ and $S_b$ are defined as follows:

$$S_w = \sum_{k=1}^{c}\sum_{i=1}^{N_k} \left(h_i^{(k)} - \mu^{(k)}\right)\left(h_i^{(k)} - \mu^{(k)}\right)^{T}, \tag{11}$$

$$S_b = \sum_{k=1}^{c} N_k \left(\mu^{(k)} - \mu\right)\left(\mu^{(k)} - \mu\right)^{T}. \tag{12}$$

In (11) and (12), $N_k$ is the number of samples in the $k$th class, and $h_i^{(k)}$ is the $i$th sample in the $k$th class. $\mu^{(k)}$ is the mean vector of the $k$th class, $\mu$ represents the mean vector of all samples, and $c$ is the total number of categories in the dataset. LDA has the following optimization criterion:

$$\max_{W} \; \frac{\operatorname{tr}\left(W^{T} S_b W\right)}{\operatorname{tr}\left(W^{T} S_w W\right)}. \tag{13}$$

The projection transformation matrix $W$ in (13) is found by the Lagrange multiplier method, and the corresponding low-dimensional expression of $h$ is then obtained via $y = W^{T}h$.
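The two LDA matrices can be computed directly from their definitions. The sketch below is illustrative only: the function name `scatter_matrices`, the weighting of each class by its sample count in the interclass matrix, and the random toy data are assumptions. It also checks the standard identity that the intraclass and interclass matrices add up to the total scatter matrix.

```python
import numpy as np

def scatter_matrices(Hf, labels):
    """Return (S_w, S_b) for samples stored as rows of Hf and integer class labels."""
    mu = Hf.mean(axis=0)                        # mean vector of all samples
    dim = Hf.shape[1]
    S_w = np.zeros((dim, dim))
    S_b = np.zeros((dim, dim))
    for k in np.unique(labels):
        Hk = Hf[labels == k]                    # samples of class k
        mu_k = Hk.mean(axis=0)                  # class mean vector
        centered = Hk - mu_k
        S_w += centered.T @ centered            # intraclass scatter of class k
        S_b += Hk.shape[0] * np.outer(mu_k - mu, mu_k - mu)  # interclass scatter
    return S_w, S_b

# Toy data: 30 samples, 4 features, 3 classes (illustrative only).
rng = np.random.default_rng(0)
Hf = rng.standard_normal((30, 4))
labels = np.repeat([0, 1, 2], 10)

S_w, S_b = scatter_matrices(Hf, labels)
S_t = (Hf - Hf.mean(axis=0)).T @ (Hf - Hf.mean(axis=0))  # total scatter matrix
```

Under the count-weighted definition assumed here, `S_w + S_b` equals `S_t` up to rounding, which is a convenient sanity check for an implementation.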
2.5. Locality Preserving Projections. As a linearization of the LE algorithm, the LPP algorithm overcomes the difficulty that the LE algorithm has in obtaining the low-dimensional projection of new test data [51], and it can be readily extended to nonlinear embeddings, thus uncovering the nonlinear manifold structure of high-dimensional data. LPP achieves dimensionality reduction by maintaining the neighbourhood structure of the data samples. The LPP model can be expressed as follows:

$$\min_{W} \; \frac{1}{2}\sum_{i,j=1}^{N} \left\|W^{T}h_i - W^{T}h_j\right\|^{2} A_{ij} = \min_{W} \; \operatorname{tr}\left(W^{T} H^{T} \mathcal{L} H W\right). \tag{14}$$

In formula (14), $\mathcal{L} = D - A$ represents the Laplacian matrix, where $D$ is a diagonal matrix with $D_{ii} = \sum_{j=1}^{N} A_{ij}$. $A \in \mathbb{R}^{N \times N}$ is the sparse affinity matrix; if $h_i$ and $h_j$ are not near neighbours, then $A_{ij} = 0$. If $h_i$ and $h_j$ are near neighbours, then $A_{ij} = \exp(-\|h_i - h_j\|^{2}/2\sigma^{2})$. By learning a projection $W$, the objective function minimizes the distance between the projections of those data points that are neighbours in the raw data space.
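To make the affinity and Laplacian matrices of LPP concrete, the following sketch builds a heat-kernel graph and checks numerically that the pairwise-distance form of the LPP objective equals its trace form. The function name `knn_heat_laplacian`, the k-nearest-neighbour rule (two samples are neighbours if either is among the other's `k` nearest), and the values of `k` and `sigma` are assumptions made for illustration; the text above does not fix how neighbours are chosen.

```python
import numpy as np

def knn_heat_laplacian(Hf, k=5, sigma=1.0):
    """Return (A, D, Lap) with Lap = D - A for a symmetric k-NN heat-kernel graph."""
    n = Hf.shape[0]
    sq = np.sum(Hf ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Hf @ Hf.T, 0.0)
    d = dist2.copy()
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbours
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), nbrs.ravel()] = True
    mask = mask | mask.T                         # symmetrize the neighbour relation
    A = np.where(mask, np.exp(-dist2 / (2.0 * sigma ** 2)), 0.0)
    D = np.diag(A.sum(axis=1))
    return A, D, D - A

# Toy data: rows of Hf are samples; Wp is an arbitrary projection (illustrative only).
rng = np.random.default_rng(0)
Hf = rng.standard_normal((20, 3))
Wp = rng.standard_normal((3, 2))

A, D, Lap = knn_heat_laplacian(Hf, k=4, sigma=1.0)
Y = Hf @ Wp                                      # projected samples, one per row
diffs = Y[:, None, :] - Y[None, :, :]
pairwise_form = 0.5 * np.sum(A * np.sum(diffs ** 2, axis=2))
trace_form = np.trace(Y.T @ Lap @ Y)
```

The two quantities `pairwise_form` and `trace_form` agree up to rounding, which is why the neighbourhood-preserving objective can be handled with a single matrix expression.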

Globality-Locality Preserving Maximum Variance ELM

Motivation of Globality-Locality Preserving Maximum Variance ELM.
The local geometry of the samples can be used as side information for improving the performance of learning models. Assuming that data samples $h_1$ and $h_2$ are drawn from the same marginal distribution $P_h$, if two points $h_1$ and $h_2$ are close to each other, then the conditional probabilities $P(y \mid h_1)$ and $P(y \mid h_2)$ should be similar as well. Based on the local geometry of the samples, many locality preserving methods have been proposed [56]. Zhao et al. proposed a new and effective semisupervised dimensionality reduction method, called Learning from Local and Global Information (LLGDI) [56], to utilize the underlying discriminative information. The literature [37] addresses the sensitivity of traditional subspace learning methods to outliers and proposes a series of dimensionality reduction methods based on the $L_{2,1}$-norm. The literature [38] studies the problem that ridge regression based methods are sensitive to variations of the data and can learn only a limited number of projections for feature extraction and recognition, and it proposes a new method called robust discriminant regression (RDR) for feature extraction. In the literature [39], LLE and ONPP are combined to form a framework for sparse subspace learning that is suitable for both sparse linear and sparse nonlinear subspace learning. Essentially, our method can be viewed as one type of manifold learning, which aims at preserving the local geometric structure during feature learning or classification.

Manifold Regularization Framework.
The manifold regularization framework can be obtained based on the LE algorithm [51]. However, the LE algorithm has difficulty obtaining the low-dimensional projection of new test data [42]; the LPP algorithm solves this problem. Inspired by the literature [40], this paper proposes a manifold regularization framework based on the LPP algorithm. At the same time, because the LPP algorithm cannot maintain the global geometry of the data samples or the discriminant information contained in the data, this paper introduces the basic principles of the LDA algorithm into the manifold regularization framework. Compared with the literature [49,50], the advantages of the algorithm proposed in this paper are as follows: (1) not only the local manifold structure but also the global manifold structure and the global discriminant information of the data samples are considered; (2) to address the singularity problem of the manifold regularization framework, the maximum margin criterion (MMC) [57] is used. After the projection transformation, the LDA algorithm brings samples of the same class closer together and pushes samples of different classes farther apart, while the LPP algorithm maintains the neighbourhood structure of the samples. Therefore, combining Sections 2.4 and 2.5, the loss function of the manifold regularization framework is shown in (15):

$$\min_{\beta} \; \operatorname{tr}\left(\beta^{T}\left(S_w - S_b + H^{T}\mathcal{L}H\right)\beta\right), \quad \text{s.t. } \beta^{T}\beta = I, \tag{15}$$

where $S_w$ is the intraclass scatter (divergence) matrix and $S_b$ is the interclass scatter matrix as described in Section 2.4. $\beta = [\beta_1, \beta_2, \ldots, \beta_m] \in \mathbb{R}^{L \times m}$ is the projection transformation matrix and $I$ is the unit matrix. $\mathcal{L} = D - A$ represents the Laplacian matrix and $D$ is a diagonal matrix as described in Section 2.5.

GLELM.
The existing ELM algorithm cannot make good use of the intrinsic manifold structure information of the data, which can lead to insufficient learning. To overcome this problem, we propose a globality-locality preserving maximum variance extreme learning machine (GLELM) based on manifold learning. The optimization problem of GLELM is formulated by using the manifold regularization framework.
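Based on manifold learning [9], assuming that data samples $h_1$ and $h_2$ are drawn from the same marginal distribution $P_h$, if two points $h_1$ and $h_2$ are close to each other, then the conditional probabilities $P(y \mid h_1)$ and $P(y \mid h_2)$ should be similar as well. This assumption is widely known as the smoothness assumption in machine learning. In this subsection, we introduce the manifold regularization framework into the ELM model. In the ELM algorithm, the network output is $f(x) = h(x)\beta$, so the output weight matrix acts as the projection matrix; therefore, the GLELM algorithm model can be written as follows:

$$\min_{\beta} \; \frac{1}{2}\|\beta\|^{2} + \frac{C_1}{2}\sum_{i=1}^{N}\|\xi_i\|^{2} + \frac{C_2}{2}\operatorname{tr}\left(\beta^{T}\left(S_w - S_b + H^{T}\mathcal{L}H\right)\beta\right), \quad \text{s.t. } h(x_i)\beta = t_i^{T} - \xi_i^{T}, \; i = 1, 2, \ldots, N, \tag{16}$$

where $\operatorname{tr}(\beta^{T}(S_w - S_b + H^{T}\mathcal{L}H)\beta)$ is the manifold regularization term, as described in Section 3.2; $\|\beta\|^{2}$ is the $\ell_2$-norm regularization term; and $\sum_{i=1}^{N}\|\xi_i\|^{2}$ is the training error term. $C_1$ is a penalty constant on the training errors, and $C_2$ is a penalty constant on the manifold regularization term. $h(x_i) = (g(w_1 \cdot x_i + b_1), g(w_2 \cdot x_i + b_2), \ldots, g(w_L \cdot x_i + b_L))$ is the output vector of the hidden layer with respect to $x_i$, as described in Section 2.3.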
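We rewrite the constraint of (16) in the following matrix form, where $\xi$ stacks the training error vectors $\xi_i^{T}$ as rows:

$$\xi = T - H\beta. \tag{17}$$

By substituting (17) into (16), and writing $M = S_w - S_b + H^{T}\mathcal{L}H$ for the matrix of the manifold regularization term, we obtain the unconstrained objective

$$J(\beta) = \frac{1}{2}\|\beta\|^{2} + \frac{C_1}{2}\|T - H\beta\|^{2} + \frac{C_2}{2}\operatorname{tr}\left(\beta^{T} M \beta\right). \tag{18}$$

Setting $\partial J / \partial \beta = 0$ in formula (18), and using the symmetry of $M$, we obtain the following formula:

$$\beta - C_1 H^{T}\left(T - H\beta\right) + C_2 M \beta = 0. \tag{19}$$

The output weight matrix is obtained by solving formula (19) as follows:

$$\beta = \left[\frac{I}{C_1} + H^{T}H + \frac{C_2}{C_1}\left(S_w - S_b + H^{T}\mathcal{L}H\right)\right]^{-1} H^{T} T. \tag{20}$$

Based on the above derivation, the specific steps of the GLELM algorithm are as shown in Algorithm 1.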
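The derivation above ends in a single linear solve for the output weights. The following sketch shows one way this could look in NumPy; it is not the authors' implementation. It assumes that the manifold regularization matrix is the intraclass matrix minus the interclass matrix plus the Laplacian term, all computed on the hidden layer outputs, and that the Laplacian comes from a symmetric k-nearest-neighbour heat-kernel graph. The names `glelm_fit`, `scatter_matrices`, and `graph_laplacian`, the values of `k` and `sigma`, and the toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scatter_matrices(Hf, labels):
    # Intraclass (S_w) and interclass (S_b) scatter of the hidden-layer outputs.
    mu = Hf.mean(axis=0)
    S_w = np.zeros((Hf.shape[1], Hf.shape[1]))
    S_b = np.zeros_like(S_w)
    for k in np.unique(labels):
        Hk = Hf[labels == k]
        mu_k = Hk.mean(axis=0)
        S_w += (Hk - mu_k).T @ (Hk - mu_k)
        S_b += len(Hk) * np.outer(mu_k - mu, mu_k - mu)
    return S_w, S_b

def graph_laplacian(Hf, k, sigma):
    # Assumed graph: symmetric k-NN with heat-kernel weights; returns D - A.
    sq = np.sum(Hf ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Hf @ Hf.T, 0.0)
    d = dist2.copy()
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    mask = np.zeros(dist2.shape, dtype=bool)
    mask[np.repeat(np.arange(len(Hf)), k), nbrs.ravel()] = True
    mask = mask | mask.T
    A = np.where(mask, np.exp(-dist2 / (2.0 * sigma ** 2)), 0.0)
    return np.diag(A.sum(axis=1)) - A

def glelm_fit(X, labels, n_hidden, C1, C2, k=5, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))   # Step 1: random input weights
    b = rng.standard_normal(n_hidden)                 #         and offsets
    H = sigmoid(X @ W + b)                            # Step 2: hidden layer output matrix
    T = np.eye(labels.max() + 1)[labels]              # one-hot targets
    S_w, S_b = scatter_matrices(H, labels)            # Step 3: manifold regularization matrix
    M = S_w - S_b + H.T @ graph_laplacian(H, k, sigma) @ H
    lhs = np.eye(n_hidden) / C1 + H.T @ H + (C2 / C1) * M
    beta = np.linalg.solve(lhs, H.T @ T)              # Step 4: output weights
    return W, b, beta

# Toy data: two well-separated Gaussian blobs (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (40, 2)), rng.normal(2.0, 1.0, (40, 2))])
labels = np.repeat([0, 1], 40)
W, b, beta = glelm_fit(X, labels, n_hidden=30, C1=1.0, C2=0.5)
pred = (sigmoid(X @ W + b) @ beta).argmax(axis=1)
acc = (pred == labels).mean()
```

A design note on this sketch: the interclass matrix enters with a negative sign, so the matrix being inverted is not positive definite for every parameter choice. With the count-weighted scatter matrices used here it is positive definite whenever `C2 / C1` is at most 1, which is why the example uses `C1=1.0` and `C2=0.5`.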

Experiments
In this section, to verify the validity of the proposed GLELM algorithm, we perform experiments on image datasets and UCI datasets [58]. A detailed description of the UCI datasets and image datasets is given in Table 1. In all our experiments, we compare the results of GLELM with those of ELM, MCVELM [44], RDELM [47], and GELM [49]. The specific comparison results are given in Figures 2 and 3 and Tables 2 and 3. For all ELMs, we choose the sigmoid function as the activation function, and the number of hidden layer nodes is selected from $L \in \{500, 1000\}$. For each ELM algorithm, we use threefold cross-validation and grid search on the training dataset to find the optimal parameters. For ELM and MCVELM, the penalty parameter is selected from $C \in \{2^{-5}, 2^{-4}, \ldots, 2^{4}, 2^{5}\}$. RDELM, GELM, and GLELM each contain a penalty parameter and a regularization parameter, which are selected from $C_1 \in \{2^{-5}, 2^{-4}, \ldots, 2^{4}, 2^{5}\}$ and $C_2 \in \{2^{-5}, 2^{-4}, \ldots, 2^{4}, 2^{5}\}$, respectively.

Algorithm 1: GLELM algorithm.
Input: the training sample set $\{(x_i, t_i) \mid x_i \in \mathbb{R}^{n}, t_i \in \mathbb{R}^{m}, i = 1, 2, \ldots, N\}$, the activation function $g(x)$, the number of hidden layer nodes $L$, and the regularization parameters $C_1$ and $C_2$.
Output: the output weight matrix $\beta$.
Step 1: Randomly specify the network input weights $w_j$ and offset values $b_j$, $j = 1, 2, \cdots, L$.
Step 2: Calculate the hidden layer node output matrix $H$ by the activation function.
Step 3: Calculate the manifold regularization framework according to formula (15).
Step 4: Calculate the output weight matrix $\beta$ from equation (20).

The ORL dataset [61] contains 400 images of 40 people, with 10 images per person. We selected images of different expressions under different lighting conditions. The image size is 32 × 32.
The COIL20 dataset [62] contains 1440 images of 20 types of objects, with 72 images per type, and the image size is 32 × 32.
The USPS dataset [63] is a handwritten digit dataset containing 9298 images with an image size of 16 × 16.
Some samples of different image datasets are given in Figure 1. Table 1 gives specific details of the different image datasets.

Image Datasets Experiments.
In this subsection, we show the experiments with different ELM algorithms on the image datasets. Figure 2 and Table 2 show the recognition rate curves and average recognition rates of different algorithms.
In Figure 2, "TrainNum" denotes the number of training samples per subject. Figure 2 shows that the recognition rate curve of the GLELM algorithm on the six image datasets is better than those of the ELM, MCVELM, GELM, and RDELM algorithms. This is because the GLELM algorithm takes into account not only the local manifold structure information of the data samples but also their global geometry and the discriminant information contained therein. The basic principles of the LDA and LPP algorithms are used to define a manifold regularization framework, which is introduced into the ELM model to optimize the projection direction of the classifier. As seen in Table 2, the recognition rate of the GELM algorithm on the ORL, Yale, Yale B, MNIST, COIL20, and USPS datasets is better than that of the ELM algorithm. This is because the GELM algorithm takes into account the local manifold information and discriminant information of the data samples and introduces this information into the ELM model to optimize the output weights and enhance the classification performance. The experimental results of the MCVELM algorithm on the ORL and Yale datasets are very close to those of the ELM algorithm, while its recognition rates on the Yale B, COIL20, and USPS datasets are better than those of the ELM algorithm. This is because the MCVELM algorithm takes into account the discriminant information of the data samples by introducing the intraclass divergence matrix into the ELM model. The recognition rate of the RDELM algorithm on the Yale B, MNIST, COIL20, and USPS datasets is better than that of the ELM algorithm. This is because the RDELM algorithm takes into account both the intraclass and the interclass discriminant information of the data samples by introducing the intraclass divergence matrix and the interclass divergence matrix into the ELM model.

UCI Datasets Experiments.
To further verify the effectiveness of the proposed GLELM algorithm, we conduct experiments on the UCI datasets with GLELM, ELM, MCVELM, GELM, and RDELM. Figure 3 and Table 3 show the recognition rate curves and average recognition rates of the different ELM algorithms. In Figure 3, "TrainNum" denotes the number of training samples per subject. Figure 3 shows that the recognition rate curve of the GLELM algorithm on the six UCI datasets is better than those of the ELM, MCVELM, GELM, and RDELM algorithms. The recognition rates of the MCVELM and GELM algorithms on the Segment, Glass, and Pima datasets are better than that of the ELM algorithm. MCVELM, GELM, and ELM perform very similarly on the Banknote dataset. Based on Figures 2 and 3, we can see that the differences among the recognition rate curves of the different ELM algorithms on the image datasets are relatively small. However, the recognition rate curves of the different ELM algorithms on the UCI datasets fluctuate greatly, which may be related to the dimensionality of the data. The dimensionality of the image data is generally large, while the dimensionality of the UCI datasets is generally small. The experimental results of the different ELM algorithms on the image datasets and the UCI datasets show that the proposed GLELM algorithm enhances the classification performance and generalization ability of the ELM algorithm by introducing a manifold regularization framework into the ELM model.

Parameter Analysis for GLELM.
The ELM algorithm has two parameters: the number of hidden layer nodes $L$ and the penalty parameter $C$. The GLELM algorithm contains three parameters, namely, the number of hidden layer nodes $L$, the penalty parameter $C_1$, and the regularization parameter $C_2$. According to the literature [7], the performance of ELM is not very sensitive to the number of hidden nodes. We therefore set $L = 500$ for the experiments on the UCI datasets and $L = 1000$ for the experiments on the image datasets. We conduct experiments to investigate the effect of the penalty parameter $C_1$ and the regularization parameter $C_2$ on the final recognition accuracy; their values are $C_1 \in \{2^{-5}, 2^{-4}, \ldots, 2^{4}, 2^{5}\}$ and $C_2 \in \{2^{-5}, 2^{-4}, \ldots, 2^{4}, 2^{5}\}$. We use the image datasets to observe the effect of different parameter values on the final recognition rate. In the experiments, for the ORL and Yale datasets, we select 5 data samples in each subclass as the training set and the remainder as the test set. For the Yale B and MNIST datasets, and likewise for the COIL20 and USPS datasets, a fixed number of data samples in each subclass is selected as the training set, and the remaining data are used as the test set. On the image datasets, the effect of different values of the penalty parameter $C_1$ and the regularization parameter $C_2$ on the GLELM algorithm is shown in Figure 4.

Computing Complexity Analysis.
In this subsection, we analyse the computational complexity of the different algorithms. For the ELM algorithm, $\beta = (I/C + H^{T}H)^{-1}H^{T}T$, where $H^{T}H$ is an $L \times L$ matrix and $L$ is the number of hidden layer nodes. In most cases, the number of hidden layer nodes is much smaller than the number of training samples (i.e., $L \ll N$). Thus, the computational cost decreases dramatically compared to LS-SVM and PSVM, which need to compute the inverse of an $N \times N$ matrix [24]. The proposed GLELM, as well as RDELM, GELM, and MCVELM, has a complexity similar to that of conventional ELM. The output weight of GLELM is $\beta = [I/C_1 + H^{T}H + (C_2/C_1)(S_w - S_b + H^{T}\mathcal{L}H)]^{-1}H^{T}T$. The output weights of RDELM, GELM, and MCVELM are obtained in the same way, by inverting the sum of $H^{T}H$ and scaled regularization matrices: for RDELM, the regularization matrix is the difference between the intraclass divergence matrix and the interclass divergence matrix; for GELM, it is built from the graph Laplacian; and for MCVELM, it is the intraclass divergence matrix. All of the algorithms need to calculate the inverse of an $L \times L$ matrix. Therefore, the computational complexity of the above ELM algorithms is $O(L^{3})$. Figure 5 shows the training time curves of the different ELM algorithms on the image datasets and the UCI datasets. Table 4 shows the average training time of the different ELM algorithms. As shown in Table 4, the average training time of the four algorithms MCVELM, GELM, RDELM, and GLELM on the image datasets and the UCI datasets is higher than that of the ELM algorithm. This is because all four algorithms add regularization terms to the ELM algorithm. From Table 4, we observe that the average training time of the GLELM algorithm on the ORL, Yale, Iris, Wine, and Glass datasets is higher than that of the other algorithms. The average training time of GELM on the seven datasets Segment, Banknote, Pima, Yale B, COIL20, MNIST, and USPS is higher than that of the other ELM algorithms.
Based on the analysis of the 12 datasets in Figure 5 and Table 4, we find that Segment, Banknote, Pima, Yale B, COIL20, MNIST, and USPS contain considerably more training samples than ORL, Yale, Iris, Wine, and Glass. The longer training time of GELM on these larger datasets may be caused by its adjacency graph matrix, which becomes more expensive to construct as the number of samples increases. From this, we can infer that when the dataset contains a large number of training samples, the time efficiency of GLELM will be better than that of the GELM algorithm.
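As a rough illustration of why graph construction becomes more expensive with more samples, the sketch below builds a k-nearest-neighbor adjacency matrix and its graph Laplacian. The choice of a binary k-NN graph, the function name `knn_laplacian`, and the toy data are assumptions made for this example; the exact graph used by GELM is not specified in this section.

```python
import numpy as np

def knn_laplacian(X, k):
    """Graph Laplacian of a symmetric k-nearest-neighbor graph (binary weights)."""
    n = X.shape[0]
    # All pairwise squared distances: an N x N matrix, so time and memory
    # grow quadratically with the number of samples.
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    idx = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))
    return D - W

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # toy data: 50 samples, 4 features
Lap = knn_laplacian(X, k=5)
```

In this sketch the adjacency matrix has one row and one column per training sample, so doubling the number of samples roughly quadruples the size of the matrix that has to be built.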
Combining Tables 2, 3, and 4, we analyse the performance of the proposed GLELM algorithm in terms of classification recognition rate and time efficiency. Tables 2 and 3 show that GLELM has good classification ability and is superior to the other ELM algorithms. As shown in Table 4, GLELM does not have a considerable advantage in time overhead, and it is not the least time-consuming algorithm on some datasets. However, given the classification speed inherited from ELM and its good classification performance, GLELM can be used as an effective classifier in pattern recognition.

Relationship between LCVELM and GEELM Models as well as GLELM
In this section, we analyze the proposed GLELM algorithm in comparison with LCVELM and GEELM in detail. Combining the discriminant information of the data samples with the adjacency graph structure, the LCVELM algorithm constructs a Laplacian feature matrix to capture the local geometric structure of the data samples and to strengthen the generalization ability of the ELM algorithm. The GEELM algorithm constructs a graph embedding framework on top of the LCVELM algorithm. This framework can be composed of algorithms such as LLE, LE, and LDA, which enables GEELM to further improve on the classification capability of LCVELM. The GLELM algorithm takes both the local and global geometry and the discriminant information of the data into consideration, whereas LCVELM and GEELM essentially use only the local geometry of the data samples. In addition, the different manifold regularization structures give LCVELM, GEELM, and GLELM distinct classification performance. To verify the classification performance of the three algorithms, we conduct experiments on the AR face image dataset and on speech data. For all ELMs, we choose the Sigmoid function as the activation function and set the number of hidden layer nodes to 1000. For each ELM algorithm, we use threefold cross-validation and grid search on the training set to find the optimal parameters. LCVELM, GEELM, and GLELM each contain a penalty parameter and a regularization parameter, both of which are selected from $\{2^{-5}, 2^{-4}, \ldots, 2^{4}, 2^{5}\}$. For the AR dataset, we randomly select 11, 14, 17, or 20 images per subject for training and use the rest for testing.
Similarly, for the remaining datasets, we randomly select 5, 8, 11, or 14 samples per class for training. Each experiment was run 10 times with randomly selected training and test sets, and the average of the 10 runs was taken as the final recognition result. The recognition results of the different ELM algorithms are shown in Tables 5 and 6.
(i) The AR face database [64]: The dataset contains 4,000 images of 126 people (70 men and 56 women). We use a subset that contains 2600 grey images of 100 subjects, selecting images with different expressions under different lighting conditions. In our experiments, the images were cropped and scaled to a size of 50 × 40.
(ii) Isolet1 (http://www.cad.zju.edu.cn/home/dengcai/Data/MLData.html): The dataset contains 150 subjects who spoke the name of each letter of the alphabet twice. The speakers are grouped into sets of 30 speakers each, referred to as isolet1 through isolet5. We selected Isolet1 for the experiment.
Tables 5 and 6 compare the recognition rates of LCVELM, GEELM, and GLELM on the AR face image dataset and the Isolet1 speech dataset. From Tables 5 and 6, we can see that the recognition rate of the GLELM algorithm is better than that of the LCVELM and GEELM algorithms. From these experimental results, we conclude that combining the local and global geometric information of the data samples effectively enhances the recognition performance of the ELM algorithm and outperforms the LCVELM and GEELM algorithms.
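The evaluation protocol used in this section (a random selection of training samples per subject, 10 repetitions, and an averaged result) can be sketched as follows. The helper names and the toy scoring function are hypothetical, and `run_once` stands in for training and testing an actual classifier.

```python
import random
from collections import defaultdict
from statistics import mean

def per_class_split(labels, n_train, rng):
    """Pick n_train random indices per class for training; the rest are test."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return train, test

def repeated_evaluation(labels, n_train, run_once, repeats=10, seed=0):
    """Average the score of run_once(train, test) over `repeats` random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = per_class_split(labels, n_train, rng)
        scores.append(run_once(train, test))
    return mean(scores)

# Toy data: 4 subjects with 26 samples each. The "score" here is only the
# fraction of samples held out for testing, not a real recognition rate.
labels = [s for s in range(4) for _ in range(26)]
avg = repeated_evaluation(labels, 11, lambda tr, te: len(te) / len(labels))
```

Seeding the random generator makes the 10 splits reproducible, which is a design choice of this sketch rather than something stated in the text.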

Conclusions
A current research focus for ELM is that the ELM algorithm cannot fully use the local and global geometric structure information of data samples. Although researchers have made efforts to solve this problem by proposing related algorithms such as MCVELM, GELM, and GEELM, these algorithms approach the problem from different angles, and more effective solutions are still needed. To describe the geometry of the data samples, we introduce manifold learning into the ELM algorithm. In manifold learning, LPP depicts the local geometric structure well, and LDA captures the global geometric structure of the data samples as well as the discriminant information. We adopt the basic principles of the LDA and LPP algorithms, define a manifold regularization framework, introduce this framework into the ELM model, and propose a globality-locality preserving extreme learning machine algorithm (GLELM). Compared with ELM, GLELM can acquire the manifold structure information of the samples and has a stronger recognition ability. We validate the GLELM algorithm using image datasets and UCI datasets, and the experimental results verify the effectiveness of GLELM. In the future, we will introduce local discriminant information of the data into the ELM algorithm to explore its effect on recognition performance.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.