A Review of Feature Extraction Software for Microarray Gene Expression Data

When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can extract the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). A summary and sources of the software are provided in the last section for each feature extraction method.


Introduction
The advances of microarray technology allow the expression levels of thousands of genes to be measured simultaneously [1]. This technology has caused an explosion in the amount of microarray gene expression data. However, the gene expression data generated are high-dimensional, containing a huge number of genes and a small number of samples. This is called the "large p, small n" problem [2]. High dimensionality is the main obstacle when analysing the data. As a result, alongside gene selection methods, feature extraction methods are important for reducing the dimensionality of high-dimensional data. Instead of eliminating irrelevant genes, feature extraction methods work by transforming the original data into a new representation. Feature extraction usually causes less information loss than gene selection. As a result, the high-dimensionality problem can be addressed using feature extraction.
Software is a set of machine readable instructions that direct a computer's processor to perform specific operations. With increases in the volume of data generated by modern biomedical studies, software is required to facilitate and ease the understanding of biological processes. Bioinformatics has emerged as a discipline in which emphasis is placed on easily understanding biological processes. Gheorghe and Mitrana [3] relate bioinformatics to computational biology and natural computing. Higgs and Attwood [4] believe that bioinformatics is important in the context of evolutionary biology.
In this paper, the software applications that can be used for feature extraction are reviewed. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component Analysis (ICA), Partial Least Squares (PLS), and Local Linear Embedding (LLE). In the last section for each feature extraction method, a summary and sources are provided.

Software for Principal Component Analysis (PCA)
PCA aims to decrease the dimensionality of a given data set whilst retaining as much as possible of the variation present in the initial predictor variables. This is attained by transforming the initial variables $X = [x_1, x_2, \ldots, x_p]$ into a new set of predictor variables, $T = [t_1, t_2, \ldots, t_q]$, each of which is a linear combination of the initial variables. In mathematical terms, PCA successively maximizes the variance of a linear combination of the initial predictor variables:

$$w_k = \underset{w^{\top} w = 1}{\arg\max}\; \operatorname{Var}(Xw) \qquad (1)$$

subject to the constraint $w_i^{\top} S w_k = 0$ for every $1 \le i < k$. The orthogonality constraint makes sure that the linear combinations are uncorrelated; that is, $\operatorname{Cov}(Xw_i, Xw_k) = 0$, $i \ne k$. These linear combinations are denoted as the principal components (PCs):

$$t_k = X w_k. \qquad (2)$$

The projection vectors (also known as the weighting vectors) $w_k$ can be obtained by eigenvalue decomposition of the covariance matrix $S$:

$$S w_k = \lambda_k w_k, \qquad (3)$$

where $\lambda_k$ is the $k$th eigenvalue in decreasing order, for $k = 1, \ldots, q$, and $w_k$ is the corresponding eigenvector. The eigenvalue $\lambda_k$ measures the variance of the $k$th PC and the eigenvector $w_k$ gives the weights for the linear transformation (projection).
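As an illustration of the eigenvalue decomposition route to PCA described above, the following is a minimal sketch in Python with NumPy. It assumes samples in rows and variables (genes) in columns; the function name pca_eig and the random toy matrix are hypothetical and are not taken from any of the packages reviewed in this section.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                  # centre each variable (column)
    S = np.cov(Xc, rowvar=False)             # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # S is symmetric; eigenvalues ascend
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                    # weight vectors w_k (columns)
    T = Xc @ W                               # principal components t_k = X w_k
    return T, W, eigvals[order]

# Toy data (an assumption for illustration): 20 samples, 50 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
T, W, lam = pca_eig(X, 3)
```

In this sketch the sample variance of each column of T equals the corresponding eigenvalue, and the columns of T are mutually uncorrelated, which mirrors the constraint stated for the projection vectors.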

FactoMineR.
FactoMineR is an R package that provides various functions for the analysis of multivariate data [5]. The newest version of this package is maintained by Husson et al. [6]. Its main features include the ability to take into account different types of variables, data structures, and supplementary information. Besides that, it offers dimension reduction methods such as Principal Component Analysis (PCA), Multiple Correspondence Analysis (MCA), and Correspondence Analysis (CA). The steps in implementing PCA are described in Lê et al. [5] and Hoffmann [7]. For PCA, there are three main functions: one for performing the PCA, one for plotting it, and one for printing its results. This package is available for Windows, MacOS, and Linux.

ExPosition.
ExPosition is an R package for the multivariate analysis of quantitative and qualitative data. ExPosition stands for Exploratory Analysis with the Singular Value Decomposition. The newest version of this package is maintained by Beaton et al. [8]. A variety of multivariate methods are provided in this package such as PCA, multidimensional scaling (MDS), and Generalized PCA. All of these methods can be performed by using the corePCA function in this package. Another function, epPCA, can be applied to implement PCA. Besides that, Generalized PCA can be implemented using the function epGPCA as well. All of these methods are used to analyse quantitative data. A plotting function is also offered by this package in order to plot the results of the analysis. This package can be installed on Windows, Linux, and MacOS.

amap.
The R package "amap" was developed for clustering as well as PCA, offering both parallelized functions and robust methods. It is an R package for multidimensional analysis. The newest version is maintained by Lucas [9]. Three different types of PCA are provided by this package: PCA, Generalized PCA, and Robust PCA. They can be implemented using the functions acp and pca for PCA, acpgen for Generalized PCA, and acprob for Robust PCA. This package also allows the implementation of correspondence factorial analysis through the function afc. Besides that, a plotting function is provided for plotting the results of PCA as a graphical representation. The clustering methods offered by this package are k-means and hierarchical clustering. The dissimilarity matrix and distance matrix can be computed using this package as well. This package is mainly for Windows, Linux, and MacOS.

MADE4.
MADE4 was developed by Culhane et al. [11] for multivariate analysis of gene expression data based on the R package "ade4". Basically, it is an extension of the R package "ade4" for microarray data. The purpose of writing this software was to help users in the analysis of microarray data using multivariate analysis methods. This software is able to handle a variety of gene expression data formats, and new visualization software has been added to the package in order to facilitate the visualization of microarray data. Other extra features such as data preprocessing and gene filtering are included as well. This package was further improved by Moorthy et al. [39] through the addition of the LLSimpute algorithm to handle missing values in microarray data. It is implemented in an R environment. The advantage of this package is that multiple datasets can be integrated to carry out the analysis of microarray data. The newest version is maintained by Culhane [40]. This package can be installed on Linux, Windows, and MacOS.

XLMiner.
XLMiner is add-in software for Microsoft Excel that offers numerous data mining methods for analysing data [12]. It offers a quick start in the use of a variety of data mining methods for analysing data. This software can be used for data reduction using PCA, classification using Neural Networks or Decision Trees [41,42], class prediction, data exploration, affinity analysis, and clustering. In this software, PCA can be implemented using the Principal Component tab [43]. This software is implemented in Excel. As a result, the dataset should be in an Excel spreadsheet. Before XLMiner is run, the dataset needs to be manually partitioned into training, validation, and test sets. Please see http://www.solver.com/xlminer-data-mining for further details. This software can be installed on Windows and MacOS.

ViSta.
ViSta stands for Visual Statistics System and can be used for multivariate data analysis and visualization in order to provide a better understanding of the data [13]. This software is based on the Lisp-Stat system [44]. It is an open source system that can be freely distributed for multivariate analysis and visualization. PCA and multiple and simple CA are provided in this software. Its main advantage is that the data analysis is guided in a visualization environment in order to generate more reliable and accurate results. The four state-of-the-art visualization methods offered by this software are GuideMaps [45], WorkMaps [46], Dynamic Statistical Visualization [47], and Statistical Re-Vision [48]. The plug-ins for PCA can be downloaded from http://www.mdp.edu.ar/psicologia/vista/vista.htm. An example of implementation of the analysis using PCA can be viewed in Valero-Mora and Ledesma [49]. This software can be installed on Windows, Unix, and Macintosh.

imDEV.
Interactive Modules for Data Exploration and Visualization (imDEV) [14] is an application of RExcel that integrates R and Excel for the analysis, visualization, and exploration of multivariate data. It is used in Microsoft Excel as an add-in by means of an R package. Basically, it is implemented in Visual Basic and R. In this software, numerous dimension reduction methods are provided such as PCA, ICA, PLS regression, and Discriminant Analysis. Besides that, this software also offers clustering, imputing of missing values, feature selection, and data visualization. Visualization methods such as dendrograms, distribution plots, biplots, and correlation networks are offered. This software is compatible with a few versions of Microsoft Excel such as Excel 2007 and 2010.

Statistics Toolbox.
Statistics Toolbox offers a variety of algorithms and tools for data modelling and data analysis. Multivariate data analysis methods are offered by this toolbox. The methods include PCA, clustering, dimension reduction, factor analysis, visualization, and others. In the Statistics Toolbox of MATLAB, several PCA functions are provided for multivariate analysis, for example, pcacov, princomp, and pcares (The MathWorks [15]). Most of these functions are used for dimension reduction. pcacov is used for covariance matrices, princomp for raw data matrices, and pcares for residuals from PCA. All of these functions are implemented in MATLAB.

Weka.
Weka [16] is data mining software that provides a variety of machine learning algorithms. This software offers feature selection, data preprocessing, regression, classification, and clustering methods [50]. This software is implemented in a Java environment. PCA is used as a dimension reduction method in Weka to reduce the dimensionality of complex data through transformation. However, not all datasets are complete. Prabhume and Sathe [51] introduced a new PCA filter for Weka in order to solve the problem of incomplete datasets. It works by estimating the complete dataset from the incomplete dataset. This software is mainly for Windows, Linux, and MacOS.

NAG Library.
In the NAG Library, PCA is provided as the g03aa routine [17] in both C and Fortran. This routine performs PCA on data matrices. This software was developed by the Numerical Algorithms Group. In the NAG Library, more than 1700 algorithms are offered for mathematical and statistical analysis. The PCA routine belongs to the multivariate methods chapter, G03. Other methods provided include correlation analysis, wavelet transforms, and partial differential equations. Please refer to http://www.nag.com/numeric/MB/manual_22_1/pdf/G03/g03aa.pdf for further details about the g03aa routine. This software can be installed on Windows, Linux, MacOS, AIX, HP UX, and Solaris.

Case Study.
In this section, we discuss the use of coinertia analysis (CIA) for cross-platform visualization in MADE4 and ADE4 to perform multivariate analysis of microarray datasets. To demonstrate, PCA was applied to 4 childhood tumors (NB, BL-NHL, EWS, and RMS) from a microarray gene expression profiling study [52]. From these data, a subset (khan$train, 306 genes × 64 cases), a factor denoting the class of each case (khan$train.classes, length = 64), and a data frame of gene annotations are accessible in the aforementioned dataset in MADE4:
> plotgenes(results.coa, genelabels = geneSym)
Figure 1 shows the PCA of a 306-gene subset. With the origin as the point of reference, the further a gene and a case are projected in the same direction, the stronger the association between that gene and that case (the gene is upregulated in that array sample). Tables 1 and 2 show the summary and sources of PCA software, respectively. Table 3 discusses the related work of this software.

Software for Independent Component Analysis (ICA)
ICA is considered a valuable extension of PCA that was established for the blind separation of independent sources from their linear combination [53]. In a way, the starting point of ICA is the uncorrelatedness property of general PCA. Given a $p \times n$ data matrix $X$, whose rows $r_i$ ($i = 1, \ldots, p$) correspond to observational variables and whose columns $c_j$ ($j = 1, \ldots, n$) are the individuals of the corresponding variables, the ICA model of $X$ can be written as

$$X = AS. \qquad (4)$$

Without loss of generality, $A$ is a $p \times p$ mixing matrix, whereas $S$ is a $p \times n$ source matrix whose rows are required to be as statistically independent as possible. The "independent components" are the new variables contained in the rows of $S$; that is, the observed variables are linear mixtures of the independent components.
Statistical independence can be quantified by the mutual information $I = \sum_{k} H(s_k) - H(S)$, where $H(s_k) = -\int p(s_k) \log p(s_k)\, ds_k$ is the marginal entropy of the variable $s_k$, $p(s_k)$ is the probabilistic density function, and $H(S)$ is the joint entropy [54]. The independent components can be obtained by finding the correct linear combinations of the observational variables, since the mixing can be inverted as

$$S = A^{-1} X = WX.$$

FastICA.
FastICA is the most widely used method of ICA [55]. It is implemented in an R environment as the R package "fastICA" for performing ICA and Projection Pursuit by using the FastICA algorithm. FastICA was first introduced by Hyvärinen [54] for single and multiple component extraction. The FastICA algorithm is based on a fixed-point iteration scheme maximizing non-Gaussianity as a measure of statistical independence. This package is maintained by Marchini et al. [18]. ICA is used to extract the informative features through a transformation of the observed multidimensional random vectors into independent components. This package is mainly for Windows, Linux, and MacOS. FastICA is also implemented in MATLAB. In MATLAB, FastICA implements a fast fixed-point algorithm for ICA as well as projection pursuit. It provides a simple user interface and also a powerful algorithm for computation.
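To make the fixed-point scheme more concrete, the following is a minimal sketch of a deflation-style FastICA iteration in Python with NumPy. It is not the code of the R or MATLAB implementations; the tanh nonlinearity, the function names whiten and fastica_deflation, the mixing matrix, and the two toy sources are assumptions chosen for illustration. Variables are stored in rows and individuals in columns.

```python
import numpy as np

def whiten(X):
    """Centre X (variables in rows) and rescale so its covariance is the identity."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    return (E / np.sqrt(d)) @ E.T @ Xc

def fastica_deflation(Z, n_components, n_iter=200, tol=1e-8, seed=0):
    """Estimate unmixing vectors one at a time on whitened data Z."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_components, Z.shape[0]))
    for k in range(n_components):
        w = rng.normal(size=Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            wz = w @ Z
            g, g_prime = np.tanh(wz), 1.0 - np.tanh(wz) ** 2
            # Fixed-point update: w+ = E{z g(w'z)} - E{g'(w'z)} w
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove projections onto components already found
            w_new -= W[:k].T @ (W[:k] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[k] = w
    return W

# Toy example (hypothetical data): two non-Gaussian sources, linearly mixed
rng = np.random.default_rng(1)
n = 5000
S = np.vstack([rng.uniform(-1, 1, n), rng.laplace(size=n)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])
X = A @ S
Z = whiten(X)
S_hat = fastica_deflation(Z, 2) @ Z
```

The recovered rows of S_hat match the true sources only up to order, sign, and scale, which is the usual indeterminacy of ICA.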

JADE.
JADE is an R package that provides a function for implementing ICA. This package is maintained by Nordhausen et al. [19]. In this package, Cardoso's JADE algorithm [56] is provided for ICA. In addition to the JADE algorithm, other Blind Source Separation (BSS) methods such as SOBI [57] and AMUSE [58] are offered. Both of these methods are mainly used for solving second-order BSS problems. The Amari error [59] is offered to evaluate the performance of an ICA algorithm. This package can be installed on Linux, Windows, and MacOS.
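The idea behind the Amari error can be sketched as follows in Python with NumPy. This is one commonly used form of the index, with a normalization chosen here so that the value lies between 0 and 1; the scaling used in the JADE package may differ, and the function name amari_error and the example matrices are hypothetical. The index is zero when the product of the estimated unmixing matrix and the true mixing matrix is a scaled permutation matrix, that is, when the sources are recovered up to order and scale.

```python
import numpy as np

def amari_error(W_hat, A):
    """Amari-type index of how far W_hat @ A is from a scaled permutation."""
    P = np.abs(W_hat @ A)
    p = P.shape[0]
    # Row term: each row should have a single dominant entry
    row = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    # Column term: each column should have a single dominant entry
    col = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return (row.sum() + col.sum()) / (2.0 * p * (p - 1))

A = np.array([[1.0, 0.5], [0.3, 1.0]])          # hypothetical mixing matrix
perm_scale = np.array([[0.0, 2.0], [-3.0, 0.0]])  # reorders and rescales sources
perfect = perm_scale @ np.linalg.inv(A)           # ideal unmixing, up to order/scale
err_perfect = amari_error(perfect, A)
err_identity = amari_error(np.eye(2), A)          # no unmixing at all
```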

High Performance Signal Analysis Tools (HiPerSAT).
HiPerSAT is written in C++ for processing electroencephalography (EEG) data with whitening of data and ICA [20]. MPI and OpenMP are used to perform parallel analysis of ICA. Basically, this software is used to analyse EEG data in order to understand the neurological components of brain activity. In this software, the FastICA, SOBI, and Infomax algorithms are offered. HiPerSAT is integrated into MATLAB and EEGLAB [60]. EEGLAB is MATLAB-based software that is used for analysing EEG data. The advantage of HiPerSAT is that it can handle larger datasets than MATLAB. In comparison to EEGLAB, HiPerSAT is able to handle large datasets without partitioning, whereas EEGLAB requires data partitioning. Data whitening is performed before implementing the algorithms. This software can be installed on all platforms.

MineICA.
MineICA is an R package that supplies the implementation of ICA on transcriptomic data [21]. The main purpose of MineICA is to provide an easier way of interpreting the decomposition results from ICA. Besides that, this software also provides a correlation-based graph for comparing the components from different datasets. The newest version of this package is maintained by Biton [61]. This package provides some features such as storage of ICA results, annotation of features, and visualization of the results of ICA. This package can be installed on Linux, MacOS, and Windows.

Pearson Independent Component Analysis.
Karvanen [22] developed an R package for a feature extraction technique based on the Pearson ICA algorithm. This is a mutual information-based blind source separation approach which applies the Pearson system as a parametric model. In order to extract the independent components using the ICA algorithm, the mutual information of the components has to be minimized. However, minimizing the mutual information requires a score function. The Pearson system was used to model the score function. The parameters of the Pearson system are estimated by the method of moments. In order to speed up the algorithm, tanh nonlinearity is used when the distribution is far from Gaussian.

Maximum Likelihood Independent Component Analysis.
Teschendorff [23] developed an R package for ICA using maximum likelihood estimation. This method was first introduced by Hyvärinen et al. [62]. It uses a fixed-point algorithm for the maximum likelihood estimation. For a fixed set of data and an underlying statistical model, maximum likelihood selects the set of values of the model parameters that maximizes the likelihood function. Maximum likelihood estimation gives a unified approach to estimation, which is well-defined in the case of the normal distribution. By using a maximum likelihood framework and controlling the number of algorithm runs, this fixed-point algorithm provides a very fast implementation for maximization of the likelihood.
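For reference, the likelihood that such a method maximizes can be written out under the noise-free ICA model. The following is the standard textbook form rather than a formula taken from the package itself; it assumes the model $X = AS$ with unmixing matrix $W = A^{-1}$, rows $w_i^{\top}$ of $W$, columns $x_j$ ($j = 1, \ldots, n$) of $X$, and assumed source densities $f_i$:

```latex
\log L(W) \;=\; \sum_{j=1}^{n} \sum_{i=1}^{p} \log f_i\!\left(w_i^{\top} x_j\right) \;+\; n \log \left|\det W\right|
```

The first term rewards unmixing directions whose outputs are probable under the assumed source densities, and the second term accounts for the change of variables from $X$ to $S = WX$.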

Sample Case Study.
In this section, we apply MineICA to microarray-based gene expression data of 200 breast cancer tumors stored in the package breastCancerMAINZ [63], based on a study done by Biton et al. [21]. In this study, we focused on how MineICA can be utilized to study an ICA-based decomposition. Pseudo code for this case study is as follows:
(1) Loading the library and the data
(2) Creation of an IcaSet object
Tables 4 and 5 show the summary and sources of ICA software, respectively.

Table 3: Related work on PCA software (recovered rows; [...] marks cells lost in the source).
Software | Authors | Motivation | Advantages
[...] | [...] | (i) Providing a multivariate data analytic technique for applications in biological systems (ii) To combine "Omics" data structured into groups (iii) To help on their functional interpretations | (i) It provides a geometrical point of view and a lot of graphical outputs (ii) It can take into account a structure on the data (iii) A GUI is available
MADE4 | Culhane et al. [11] | To provide a simple-to-use tool for multivariate analysis of microarray data | (i) Accepts a wide variety of gene-expression data input formats (ii) No additional data processing is required
Statistics Toolbox | The MathWorks [15] | High-dimensional and complex microarray data need automatic/computer aided tools for analysis | Elegant matrix support; visualization
imDEV | Grapov and Newman, 2012 [14] | Omics experiments generate complex high-dimensional data requiring multivariate analyses | (i) User-friendly graphical interface (ii) Visualizations can be exported directly from the R plotting interface in a variety of file formats (iii) Dynamic loading of R objects between analysis sessions

Software for Partial Least Squares (PLS)
The fundamental hypothesis of PLS is that the observed data are generated by a system or process that is driven by a small number of latent characteristics. Thus, PLS aims to find uncorrelated linear transformations of the initial predictor variables that have high covariance with the response variables. Based on these latent components, PLS predicts the response variables $y$ (the regression task) and reconstructs the initial matrix $X$ (the data modelling task) at the same time. The purpose of building components in PLS is to maximize the covariance between the response variable $y$ and the initial predictor variables $X$:

$$w_k = \underset{w^{\top} w = 1}{\arg\max}\; \operatorname{Cov}(Xw, y), \qquad (6)$$

subject to the constraint $w_i^{\top} S w_k = 0$ for all $1 \le i < k$, where $S$ is the covariance matrix of $X$. The crucial task of PLS is to obtain the weight vectors $w_k$ ($k = 1, \ldots, q$) to build a small number of components, while PCA is an "unsupervised" method that utilizes the $X$ data only. To develop the components $[t_1, t_2, \ldots, t_q]$, PLS decomposes $X$ and $y$ to yield a bilinear representation of the data [64]:

$$X = t_1 w_1^{\top} + t_2 w_2^{\top} + \cdots + t_q w_q^{\top} + E, \qquad y = t_1 v_1 + t_2 v_2 + \cdots + t_q v_q + f, \qquad (7)$$

where the $w$'s are vectors of weights for building the PLS components $t = Xw$, the $v$'s are scalars, and $E$ and $f$ are the residuals. The idea of PLS is to estimate $w$ and $v$ by regression.
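The construction of PLS components can be illustrated with a minimal single-response sketch in Python with NumPy. It uses the common deflation (NIPALS-style) scheme, in which each weight vector is computed on the deflated predictor matrix, so it is one possible realization of the idea rather than the algorithm of any package reviewed below. The function name pls1 and the simulated data are hypothetical.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS for a single response, by successive deflation of X."""
    Xk = X - X.mean(axis=0)
    yc = y - y.mean()
    T, W = [], []
    for _ in range(n_components):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)        # unit vector maximizing Cov(Xk w, y)
        t = Xk @ w                    # latent component
        p = Xk.T @ t / (t @ t)        # loading of Xk on t
        Xk = Xk - np.outer(t, p)      # remove what t already explains
        T.append(t)
        W.append(w)
    T = np.column_stack(T)
    v = np.linalg.lstsq(T, yc, rcond=None)[0]   # regress y on the components
    return T, np.column_stack(W), v

# Simulated data (an assumption): 30 samples, 100 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=30)
T, W, v = pls1(X, y, 3)
```

Because of the deflation step, the columns of T are mutually orthogonal, which corresponds to the requirement that the components be uncorrelated.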

Partial Least Squares Discriminant Analysis.
Barker and Rayens [24] developed a PLS for discriminant analysis. The original PLS was not designed for discriminant purposes. PLS Discriminant Analysis is used to find a linear regression model by projecting the dependent features and the independent features to a new space. Then the fundamental relations can be extracted from the latent variables. This method was developed for software called Unscrambler, which was first developed by Martens and Naes [65]. Unscrambler is a commercial software product for multivariate data analysis. Unscrambler is used for analysing large and complex datasets quickly and easily using the power of multivariate analysis. Moreover, this multivariate data analysis software also offers exceptional data visualization.

Least Squares: Partial Least Squares.
Jørgensen et al. [25] proposed a method of using an iterative combination of PLS and ordinary least squares to extract the relationship between the predictor variables and the responses. This method is based on a combination of least squares estimates for the design variables and PLS regression on the spectra. The PLS scores were incorporated into the ordinary least squares equation on the spectra. The idea is to separate the information from the spectral and design matrices cleanly. This method is able to extract the information even when fewer components are used. In addition, this method is insensitive to the relative scaling of the spectra and the process. Moreover, this combined method is also less biased than the individual PLS technique.

Powered Partial Least Squares Discriminant Analysis.
Liland and Indahl [26] extended Powered PLS to Powered PLS Discriminant Analysis to address the extraction of information in multivariate classification problems. This method can construct more efficient group separation and generate more interpretable outcomes than the ordinary Partial Least Squares Discriminant Analysis technique. The features extracted by Powered PLS can help to reveal the relevance of particular predictors, and the method often requires fewer and simpler components than ordinary PLS. Moreover, the optimization task is equivalent to maximizing the correlation between the transformed predictors and the groups. This makes it possible to discard the influence of less important predictors. This method was also made available by the authors in an R package.

Penalized Partial Least Squares.
Krämer et al. [27] proposed a combination of the feature extraction technique PLS with a penalization framework. This method is an extension of PLS regression using a penalization technique. Ordinary PLS is suited for regression problems by minimizing a quadratic loss function iteratively. In addition, the representation in terms of kernel matrices provides an intuitive geometric interpretation of the penalty term. The penalty terms control the roughness of the estimated functions. With the incorporation of penalization into this framework, the research direction became more promising. This method is used to extract relevant information for high-dimensional regression problems and also for noisy data. This method was also made available in an R package by Krämer and her colleagues [66].

SlimPLS.
The chosen feature set will be suboptimal when the features of the original set are dependent. Some of the features will add little discriminative power on top of previously selected features. SlimPLS is a multivariate feature extraction method which incorporates feature dependencies into its calculation. This multivariate property is constructed by combining the highly predictive feature with some less predictive but correlated features. This is because the added features will provide more information on the behaviour of the samples.

Sparse Partial Least Squares Discriminant Analysis and Sparse Generalized Partial Least Squares.
Chung and Keles [28] proposed two extended feature extraction approaches based on Sparse PLS. These approaches are Sparse PLS Discriminant Analysis and Sparse Generalized PLS for high-dimensional datasets. These two approaches improved ordinary PLS by employing feature extraction and dimension reduction simultaneously. These two approaches perform well even with unbalanced sample sizes of the classes. Sparse PLS Discriminant Analysis is computationally efficient because it only requires computational time for one run of Sparse PLS and a classifier. Moreover, Sparse Generalized PLS extends Sparse PLS to the generalized linear model framework. These methods were also made available by Chung and Keles in an R package.

Degrees of Freedom of Partial Least Squares.
Krämer and Sugiyama [29] proposed a method of unbiased estimation of the degrees of freedom for PLS regression. The authors stated that the construction of latent components from the independent variable also depends on the dependent variable. For PLS regression, the optimal number of components needs to be determined first. One of the ways of determining the optimal number of components is through the degrees of freedom, which measure the complexity of fitted models. Moreover, the degrees of freedom estimate can be used for the comparison of different regression methods. Furthermore, the two implementations for the degrees of freedom utilize the connection between PLS regression and methods from numerical linear algebra. The authors also developed an R package for this unbiased estimation of the degrees of freedom of PLS.

Surrogate Variable Analysis Partial Least Squares.
Chakraborty and Datta [30] proposed a surrogate variable analysis method based on PLS. In differential gene expression analysis, one of the important issues is to avoid the hidden confounders in the dataset. The hidden confounders of gene expression are caused by different environmental conditions of the samples. However this problem cannot be simply overcome by modifying the gene expression data by using a normalizing technique. This method can extract the informative features by identifying the hidden effects of the underlying latent factors using ordinary PLS and applying analysis of covariance (ANCOVA). ANCOVA is applied with the PLS signatures of these hidden effects as covariates in order to identify the genes that are truly differentially expressed. This method was also developed by the authors for availability in an R package.

Partial Least Squares Path Modelling.
Sanchez and Trinchera [31] developed an R package for Partial Least Squares Path Modelling (PLS-PM). PLS-PM was first introduced by Wold [67] and is also known as Structural Equation Modelling (SEM). It can be used as a composite-based alternative to factor-based SEM. PLS-PM can be used when the distributions are highly skewed. Moreover, PLS-PM can also be used to estimate relationships between latent variables with several indicators even though the sample size is small. Basically, PLS-PM consists of two sets of linear equations: the inner model and the outer model. The inner model specifies the relations between latent variables, while the outer model specifies the relations between a latent variable and its observed indicators. PLS-PM is a multivariate feature extraction analysis technique based on the cause-effect relationships of the unobserved and observed features.

Partial Least Squares Regression for Generalized Linear Models.
Bertrand et al. [32] developed a software application of PLS regression for generalized linear models. Generalized linear models are important because they allow the response features to have a distribution other than normal. Ordinary linear models can be viewed as a special case of generalized linear models with an identity link. From the perspective of generalized linear models, it is useful to suppose that the distribution function is the normal distribution with constant variance and the link function is the identity, which is the canonical link if the variance is known. However, generalized linear models preserve all the predictive power of the features when the predicted means are not assumed to be normally distributed. PLS regression is used to extract the predictive features from the generalized linear models.

Case Study.
In this section, we discuss the R package svapls, whose main function is svpls. This function calls the fitModel function in order to fit a number of ANCOVA models, as specified by pmax, to the data and selects the best model as the one with the minimum value of Akaike's information criterion (AIC) [68]. Subsequently, this model is used to predict the true pattern of the genes' differential expression. The command lines in R are as follows:
> ## Fitting the optimal ANCOVA model to the data gives:
> fit <- svpls(10, 10, hidden_fac.dat, pmax = 5, fdr = 0.05)
> ## The optimal ANCOVA model, its AIC value and the positive genes detected
> ## from it are given by:
> fit$opt.model
> fit$AIC.opt
> fit$genes
> ## The corrected gene expression matrix obtained after removing the effects of the hidden variability is given by:
> Y.corrected <- fit$Y.corr
> pval.adj <- fit$pvalues.adj
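The selection of a model by minimum AIC can be illustrated with a generic sketch in Python with NumPy. This is not the svapls implementation: it fits ordinary least-squares models with an increasing number of covariates rather than ANCOVA models with PLS signatures, and the function name gaussian_aic, the simulated covariates, and the value of pmax are assumptions made for illustration only.

```python
import numpy as np

def gaussian_aic(y, X):
    """AIC of a least-squares fit with Gaussian errors (up to an additive constant)."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + 2 * (k + 1)   # +1 for the error variance

# Simulated data (hypothetical): only the first two covariates affect y
rng = np.random.default_rng(0)
n, pmax = 200, 5
Z = rng.normal(size=(n, pmax))
y = 2.0 * Z[:, 0] - 1.5 * Z[:, 1] + rng.normal(scale=0.5, size=n)

# Candidate model m uses an intercept plus the first m covariates, m = 0..pmax
aics = [
    gaussian_aic(y, np.column_stack([np.ones(n), Z[:, :m]]))
    for m in range(pmax + 1)
]
best = int(np.argmin(aics))   # index of the model with the smallest AIC
```

The penalty term 2(k + 1) is what stops the criterion from always preferring the largest model, since adding covariates can only decrease the residual sum of squares.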
For instance, we study the efficacy of svapls on the preprocessed ALL/AML dataset [69]. These data consist of log-transformed expression levels of 7129 genes over two groups of patients: 47 patients and 25 patients reported to suffer from Acute Lymphoblastic Leukemia (ALL) and Acute Myeloid Leukemia (AML), respectively. Using the svpls function, we obtained the corrected expression matrix for the first 1000 genes. The random distribution of samples from the four sources in this matrix indicates that the extra effects owing to the reported batch-specific clustering in the initial data have been removed. In this context, svapls performed as efficiently as another popular R package, ber, for removing batch effects in microarray data, as shown in Figure 3. Tables 6 and 7 show the summary and sources of PLS software, respectively. Table 8 shows the related works on the discussed software.

Software for Local Linear Embedding (LLE)
Straightforward geometric intuitions are the basis of the LLE algorithm. Assume that the given data comprise $N$ real-valued vectors $X_i$, each of dimensionality $D$, sampled from some underlying manifold. Given that there are sufficient data, every data point and its neighbors are expected to lie on or near a locally linear patch of the manifold. These patches are described by linear coefficients that reconstruct every data point from its neighbors. Equation (8) is the cost function used to calculate reconstruction errors, which sums the squared distances between all the data points and their reconstructions. The weights $W_{ij}$ summarize the contribution of the $j$th data point to the $i$th reconstruction. The optimal weights $W_{ij}$ are found by solving a least-squares problem [70]:

$$\varepsilon(W) = \sum_{i} \Bigl| X_i - \sum_{j} W_{ij} X_j \Bigr|^{2}. \qquad (8)$$

Recovered table row (software and authors lost in the source): Motivation: To obtain a low dimensional approximation of a matrix that is "as close as possible" to a given vector. Advantages: (i) Focuses solely on feature selection (ii) Can be used as a pre-processing stage with different classifiers.

lle.
The R package "lle" is maintained by Diedrich and Abel [34]. The main functions of this package allow users to perform LLE and also to plot the results of LLE. The implementation of LLE is based on the idea of Ridder and Duin [71]. Besides that, some enhancements such as selection of the subset and calculation of the intrinsic dimension are offered. This package can be installed on Windows, Linux, and MacOS.
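The least-squares step that yields the LLE reconstruction weights, as described in the overview of LLE above, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code of the R package: neighbors are chosen by Euclidean distance, a small ridge term (the parameter reg, an assumption) stabilizes the local system when the number of neighbors exceeds the local dimension, and the names lle_weights and reconstruction_error as well as the toy data are hypothetical.

```python
import numpy as np

def lle_weights(X, k, reg=1e-3):
    """Reconstruction weights W_ij of each point from its k nearest neighbors."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # a point is not its own neighbor
        nbrs = np.argsort(d)[:k]
        G = X[nbrs] - X[i]                   # neighbors centred on X_i
        C = G @ G.T                          # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)   # ridge term in case C is singular
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()             # enforce sum_j W_ij = 1
    return W

def reconstruction_error(X, W):
    """Sum of squared distances between points and their reconstructions."""
    return np.sum((X - W @ X) ** 2)

# Toy data (an assumption): 40 points on a 2-D plane embedded in 3-D
rng = np.random.default_rng(0)
k = 5
X = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 3))
W = lle_weights(X, k)
err = reconstruction_error(X, W)
```

The embedding step of LLE, which finds low-dimensional coordinates that preserve these weights, is omitted here.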

RDRToolbox.
RDRToolbox is an R package developed for nonlinear dimension reduction with LLE and Isomap. The package is maintained by Bartenhagen [35]. It offers the transformation of high-dimensional to low-dimensional data by using either LLE or Isomap. Besides that, a plotting function is provided to plot the results. In addition, the Davies-Bouldin Index is provided for the purposes of validating clusters. It is mainly for Linux, MacOS, and Windows.
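For readers unfamiliar with the Davies-Bouldin Index, the following is a minimal sketch of its usual definition in Python with NumPy. It is not the RDRToolbox code, whose details may differ; the function name davies_bouldin and the four-point example are hypothetical. Lower values indicate compact, well-separated clusters.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index with Euclidean distances."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in classes])
    # Scatter: mean distance of a cluster's points to its centroid
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    worst = []
    for i in range(len(classes)):
        # Similarity of cluster i to every other cluster; keep the worst case
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(classes)) if j != i
        ]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Two tight, well-separated clusters (hypothetical points)
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
db_separated = davies_bouldin(X, [0, 0, 1, 1])
# The same points with a labelling that mixes the two groups
db_mixed = davies_bouldin(X, [0, 1, 0, 1])
```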

Scikit-Learn.
Scikit-learn is software implemented in Python that integrates machine learning algorithms [36]. It is simple-to-use software that allows users to implement a variety of machine learning algorithms. Scikit-learn is available from http://scikit-learn.org/dev/install.html.

Case Study.
The residual variance of Isomap can be used to estimate the intrinsic dimension of the dataset:
> Isomap(data = golubExprs, dims = 1:10, plotResiduals = TRUE, k = 5)
Based on Figure 4, regarding the dimensions for which the residual variances stop decreasing significantly, we can expect a low intrinsic dimension of two or three and, therefore, a visualization true to the structure of the original data. Next, we compute the LLE and Isomap embedding for two target dimensions:
> golubIsomap = Isomap(data = golubExprs, dims = 2, k = 5)
> golubLLE = LLE(data = golubExprs, dim = 2, k = 5)
Both visualizations, using either Isomap or LLE, show distinct clusters of ALL and AML patients, although the clusters overlap less in the Isomap embedding. This is consistent with the DB-Index, which is very low for both methods, but slightly higher for LLE. A three-dimensional visualization can be generated in the same manner and is best analyzed interactively within R. Tables 9 and 10 show the summary and sources of LLE software, respectively. Table 11 shows the related works on the discussed software.

Conclusion
Nowadays, numerous software applications have been developed to help users implement feature extraction of gene expression data. In this paper, we present a comprehensive review of software for feature extraction methods. The methods are PCA, ICA, PLS, and LLE. These software applications have some limitations in terms of statistical aspects as well as computational performance. In conclusion, there is a need for the development of better software.