Learning Latent Variable Gaussian Graphical Model for Biomolecular Network with Low Sample Complexity

Learning a Gaussian graphical model with latent variables is ill posed when there is insufficient sample complexity, thus having to be appropriately regularized. A common choice is convex ℓ 1 plus nuclear norm to regularize the searching process. However, the best estimator performance is not always achieved with these additive convex regularizations, especially when the sample complexity is low. In this paper, we consider a concave additive regularization which does not require the strong irrepresentable condition. We use concave regularization to correct the intrinsic estimation biases from Lasso and nuclear penalty as well. We establish the proximity operators for our concave regularizations, respectively, which induces sparsity and low rankness. In addition, we extend our method to also allow the decomposition of fused structure-sparsity plus low rankness, providing a powerful tool for models with temporal information. Specifically, we develop a nontrivial modified alternating direction method of multipliers with at least local convergence. Finally, we use both synthetic and real data to validate the excellence of our method. In the application of reconstructing two-stage cancer networks, “the Warburg effect” can be revealed directly.


Introduction
Learning a graphical model from high-dimensional but partial observations is ill posed, leading to infinitely numerous solutions. A possible approach to address this underdetermined problem is to impose a low complexity solution with a low-dimensional structure (geometry), such as the sparse vector [1], the low-rank matrix [2], and their combinations (the sparse and low-rank decomposition) [3].
One feasible way to learn such a graphical model is to capture any conditional independence between each pair of the variables with a sparsity prior. Under the assumption of multivariate normal distribution, this reconstruction can be simplified as an inverse of the covariance matrix through a penalized optimization plus a sparsity-induced regularization (Gaussian graphical model) as [4] Ω = arg min where is the covariance matrix of the data, Ω = −1 represents its inverse, and denotes the tuning parameter for the sparsity-induced regularization.
Unfortunately, there is the possibility that a few of the variables are hidden or unobserved, thus requiring a latent model. Imagine a complex network with a few latent variables, each densely interacting with multiple observed variables. Thus, the sparsity assumption will not hold because of this latent structure ( Figure 1). For instance, transcriptional factors (proteins) which regulate RNA transcriptions are not directly observed from whole-genome expressions (genechip or microarray). Therefore, an additive regularization (sparse plus low-rank recovery) has been developed to decompose the sparse interactions among the observed variables (sparsity) from a few latent variables (low rankness), that is, latent variables Gaussian graphical model (LVGGM) [5] as where is the covariance matrix of the observed data, Ω is the inverse of , and Ω − is the surl component of Ω , corresponding to the latent variables with low rank. 1  respectively. The details of (1) and (2) will be depicted in Section 2. An ultimate approach for such a sparse and low-rank recovery would be ℓ 0 (base pursuit) plus the selection of exact ranks and thus a NP-hard problem [6]. A common relaxation is the use of ℓ 1 plus the nuclear norm. An alternative relaxation is the use of concave penalties (Table 1, such as MCP, log or exponential type penalty, and SCAD), which have been verified to be very effective including biomolecular network reconstruction and image denoising [7,8]. However, this concave approach so far has only been applied in sparse estimation problems.
The overall goal of this paper is therefore to develop a computational framework for a concave regularization with additive sparse and low-rank constraints [9] because of our desire to encode latent variables in a Gaussian graphical model but with insufficient samples, a very important issue in gene interaction network with latent regulatory factors. Below we want to justify why we chose a concave approach.
We chose an additive concave regularization because it does not require the strong irrepresentable condition, particularly when sample complexities are not sufficient [10]. This irrepresentable condition is highly relevant for biological observations ( ≫ and often with very limited ). In contrast, a strong irrepresentable condition is necessary for Lasso in order for it to be selection consistent (requiring more sample complexities).
We chose a concave regularization also because of its parameter-estimation consistency, able to correct the intrinsic estimation biases from Lasso [11]. Here the bias is defined as the inevitable shrinkage introduced by Lasso, linearly expanded along with the tuning parameter . This bias issue has been noted during a regression setting [12] and a sparse precision matrix estimation [7].
Finally, we chose a concave regularization because an error bound has already been established for sparse least square problems [10]. C.-H. Zhang and T. Zhang established a bound by imposing an appropriate ℓ 2 regularity condition such that a family of column-normalized matrices can guarantee a desirable estimation under an appropriate sparsity assumption [10], leading to the error bound that is no worse than Lasso. This general result holds for the entire concave regularization family including the bridge penalties (ℓ , < 1). Note that both of our regularizations belong to the family of bridge penalties.
As aforementioned we have intentionally selected a regularization scheme from the general concave regularization family because of its established theoretical supremacy (improve the variable selection accuracy and gain the oracle properties by reducing the bias of Lasso) [12,13]. Our motivation is to provide the community with at least a computational alternative when some additive concave penalties are critically needed for some niches of timely applications (really demanding the oracle property), particularly when observations are limited (such as gene expression arrays).
Overall, the first contribution of our effort is having provided a novel bridge-nuclear penalty to induce a lowrank structure as well as a bridge-fused penalty to induce a fused-structural sparsity [14]. Note that we have explicitly derived the proximity operators [15] for these penalties (concave), respectively, an essential and important step towards a gradient-based optimization [16]. The structural sparsity here is used to join multiple graphical models to compare their differences (evolution of network structures, different stages and types of tumorigenesis, and other network comparison problems).
Our second contribution is to provide a modified alternative direction multiplier method (modified ADMM) [17] to numerically optimize these concave estimators (via their proximity operators), leading to at least some local solutions. We chose ADMM as the optimization method because we can prove its local convergence. We note the convergence of ADMM when applied to convex LVGGM as has been proved by [18].
Our third contribution is having provided a vigorous proof for the local convergence of this ADMM based on a framework for analyzing linear constrained optimization algorithms [19]. We use variational inequality to derive the contraction property in each iteration, which guarantees the monotonic convergence to a stationary point. Our experiments using both synthetic and real data indicated better performances compared to the classical convex regularizations. Overall we have developed an unified computational approach for additive concave regularization.
identity matrix. For rectangular matrices , in R × , the spectral norm ‖ ‖ is the largest singular value, ‖ ‖ = sup ∈R (‖ ‖/‖ ‖), and the nuclear norm ‖ ‖ * is the sum of the singular values. The Frobenius norm ‖ ‖ is the 2 norm of the singular values; ‖ ‖ = (Tr( The function sign( ) extracts the sign of a real number .

Gaussian Graphical Model with Concave Regularization.
In this section, we briefly review the related works on Gaussian graphical model (GGM) and latent variable Gaussian graphical model (LVGGM).
A GGM also known as a Gaussian-Markov random fieldbased method is defined with respect to a graph ( , ). The set of nodes consists of individual variables (features) with observations = ( 1 , 2 , . . . , ) ∈ R × under the multivariate normal distribution N( , Σ). The edges represent the conditional independencies among the variables, where the edges ∉ , if and only if and are independent, conditioned on the remaining variables. With the Gaussian assumption, this conditional independence for any pairs of nodes and is equivalent to their partial correlation being zero, so that ∉ ⇔ It is important to note that such an equivalence requires that the random variables be sampled from the families of distributions characterized with a semigroup property, including multivariate Gaussian, elliptical, multivariate hypergeometric, multivariate negative hypergeometric, multinomial, and Dirichlet distributions [20]. Therefore, the GGM can be learned by estimating its precision matrix, the inverse of the covariance matrix, in which each off-diagonal element represents its conditional independence. This estimation can be formulated as a sparsity-regularized optimization problem (1) with the sparsity assumption about the network structure. Note that there are a number of alternatives to formulate this sparsityinduced regularization. For instances, convex regularizations with the group 2 norm have been used to estimate the precision matrix [4,21]. These regularization schemes have been extensively discussed, whose bounds have been estimated and even analytically proved [22,23].
A more recent effort to achieve better statistical properties, including the oracle properties and unbiased estimation, has been developed by Fan et al., using a folder concave-based regularization [7] as follows: The SCAD is symmetric and a quadratic spline on [0, ∞), whose first-order derivative is given by SCAD , ( ) = { (| | ⩽ ) + (( − | |) + /( − 1) ) (| | > )} with a suggested value of = 3.7. Model (3) has some admirable performances in practice but still a few limitations. At least it is difficult to be extended to include possible complex and additive regularization schemes, such as the combination of sparsity with low rankness or structural sparsity. Still the most challenging is nonconvex nature for all concave-based approaches, possibly leading to many local solutions. Thus, one of the goals of this paper is to extend Fan et al. 's work to more complex and even additive regularization schemes.

Latent Gaussian Graphical Model with Additive Concave
Regularization. A Gaussian graphical model can be incorporated with a few hidden variables (such as the latent Gaussian Graphical Model). Let ∈ R be the observed variables and ∈ R ( ≪ ) the latent ones. Here typically we assume the number for those latent variables is small (low rankness). Thus, ( , ) can be jointly sampled from a multivariate normal distribution. Suppose its covariance matrix is Σ ( , ) = ( Σ Σ Σ Σ ) and its corresponding precision matrix is where Ω is a sparse matrix and Ω Ω −1 Ω is denoted as , being a low-rank matrix due to ≪ . Chandrasekaran introduced a regularized 4 Computational and Mathematical Methods in Medicine optimization with multiple additive terms, named as the latent variable graphical model (2) [5], and the error bound of LVGGM is proofed by Meng et al. [24]. Here 1 and 2 are two tuning parameters, which are often hard to choose. An empirical way is to use certain information criteria or conduct -fold cross-validations. We propose a novel bridge-nuclear penalty to induce a low-rank structure as follows.
By employing the 1/2 quasinorm (bridge penalty) plus the 1/2 quasinuclear norm (bridge-nuclear penalty), we formulate our latent Gaussian graphical model with additive concave penalties as Note again, our concave LVGGM does not need strong irrepresentable condition and thus can be applied to low sample complexity.

Joint Multiple Latent Gaussian Graphical Model with
Additive Concave Regularization. To demonstrate the applicability of our additive approach, we purposely include a fused-structural sparsity, being used together with the aforementioned sparsity and low rankness. Here a total of three penalties are additively combined, potentially useful to model a network comparison problem with latent variables (joint latent variable Gaussian Graphical Model, JLVGGM). The evolution of biological and some other networks (such as social network) often has some invariant portion over the progression, which can thus be captured by our fused regularization over individual snapshots. We consider this a potentially very useful approach to model a regulatory network in biology, since many gene interactions will remain invariantly imposed by their functional constraints (housekeeping, etc.). The most commonly used constraint so far assumes that network evolution is gradual and local, representing mainly sporadic and minor structural changes, with most of the systems remaining intact. For instance, Guo et al. developed a joint Gaussian graphical model to learn multiple snapshots, assuming their biological networks being only partially changed [25]. Recently, Danaher et al. [26] used a fused-lasso scheme to model multiple stages of tumorigenesis.
As an extension to Danaher's fused graphical model with latent variables, also as an example to demonstrate our additive strategy with a structural sparsity, we formulate a joint model with latent variables.
∈ R × are independent and identically distributed from N( , Σ ). We formulate our joint model with latent variables as where 3 , = 1, 2, . . . , − 1 represent the similarity measure among the temporal networks according to the tuning parameters.

Modified Alternating Direction Method of Multipliers
In this section, we want to establish the algorithm and its convergence of the modified alternating direction method of multipliers (ADMM). We applied this numerical method to our graphical model with latent variables. First we derived the proximity operators individually for ℓ 1/2 , the 1/2 quasinuclear norm ‖⋅‖ ♣ (bridge-nuclear penalty), and the fused ℓ 1/2 . With these proximity operators, we design a gradient-based but nonsmooth optimization based on alternating direction method of multipliers.

Proximity Operator
Theorem 2. The proximity operator of ‖ ‖ 1/2 1/2 is the global solution for the following problem: One has Computational and Mathematical Methods in Medicine 5 Proof. Since the subdifferential of a nonconvex function is not well defined, we resolve the optimization problem (6) has three roots in a compact trigonometric form as √ | | = 2√ 3 cos under the conditions > 0 and > (3/2) 3 √ 2 /2. It is validated that (6) will reach a global minimum, when = 0. Since and are either positive or negative simultaneously, we have Similarly we have By taking a square root on both sides of (10) and (11) within , the statement in Theorem 2 is proven.
Theorem 3. Assuming , ∈ R × , the proximity operator of the 1/2 quasinuclear norm is given as the global minimum of where ( ), ( ) ∈ R ×1 represent the single value of and , respectively, in nonincreasing order. and are the left and right orthogonal matrices with the singular value decomposition of = Diag( ( )) . We have Proof. Through von Neumann's trace inequality [28], we have The equality holds if and only if = Diag( ( )) . Then the optimization is reduced to arg min We note here (15) is just a special case of (6). This completes the proof of Theorem 3. We note this regularization would induce a low-rank approximation of due to the threshold of .

Modified Alternating Direction Method of Multipliers
In this section, we provide an algorithm to approach the local solution of additive concave regularization problem (2). The Lagrangian of (2) is where ∈ R × represents the corresponding dual variable. We propose a modified ADMM discretization to optimize the Lagrange as Algorithm 5 with at least local convergence.
Before we prove the convergence result, we need to prove the following contraction property which is the key for the proof of the convergence of general ADMM [29].

Computational and Mathematical Methods in Medicine
Proof. From Theorem 6, we can easily get the following: (2) { ( ) } lies in a compact region.
It follows that ( +1) is a Cauchy sequence and thus has a limit point . Therefore we have ( +1) converging monotonically to .

Numerical Evaluation
In this paper, we focus on the effective local convergence of ill-posed graphical model problems with additive concave regularization. The specific aim is to develop an additive regularization combining both sparsity and low rankness, essential for our latent graphical model. Again we chose a concave approach (instead of the popular convex methods) because of its presumptive advantage (the oracle property) especially when the data complexity is not sufficient ( ≫ ). It is thus important to first develop a robust numerical method to at least obtain a consistent local solution. We believe our methods are highly applicable to situations where the number of variables far exceeds their observations (large small problem, typical for most biological observations). Below we use both artificial and real data to establish the effectiveness of our methods. We demonstrate that our nonconvex regularization can reduce the biases, as illustrated in its applications to the selected biological and financial problems. In addition, we assess the performance of our concave models comparing with the corresponding convex ones.

Artificial Data.
Our concave graphical model is supposed to perform better as an asymptotically unbiased estimator. We generate our artificial data according to the following steps.
We generate 5% nonzero entries uniformly from a 50 × 50 matrixΩ. Each nonzero element in thisΩ is sampled from a Gaussian distribution of N(0, 2 ). We construct a symmetric precision matrix as Ω =Ω +Ω . To guarantee the positive definiteness of this Ω, we update the matrix Ω = Ω + 1.1 50 iteratively, such that all of its eigenvalues be greater than zero. We randomly choose 45 columns and their corresponding rows as a subblock matrix Ω form Ω; thus the remaining 10% variables are latent. Finally, we take the inverse of the precision matrix Σ = Ω −1 as its original covariance matrix to sample a Gaussian distribution.
We divide our experiments into two groups with the increasing sample complexity: = (1/2) , random samples from the multivariate Gaussian distribution N(0, 2 ), with 2 = 3. Important to note is that in order to test the effectiveness of our method for an ill-posed problem ( ≫ ), we purposely only increase the number of observations while keeping their variances a constant.
We use this example to validate that our method has better performance than the classic LVGGM. Firstly, for parameters estimation consistence as Figure 2 shows, our concave LVGGM estimate parameters more accurately than the classic model in [5], even with a local solution. We want to note that our method performs better with the larger observed data in Figure 2(c) but is significantly superior to the classical method with inadequate data in Figure 2

The Biomolecular Network of Medulloblastoma.
To demonstrate that our method performs consistent in both artificial data and real application, we apply our method to human medulloblastoma. Medulloblastoma is the most common form of childhood brain tumors. This cancer has at least four subgroups, including the WNT subgroups and the sonic hedgehog (SHH), plus the subgroups 3 and 4, with the molecular etiology remaining elusive for the latter two. In China, patients with medulloblastomas are still largely treated with universal and aggressive procedures, combining radical surgery, radiation, and chemotherapy, which might in fact fail the subgroups 3 and 4 (poor prognosis with unknown reasons), or probably have overtreated the WNT subgroup (more differentiated). Thus, it is important to develop a methodology to clearly distinguish and identify the key molecular markers and their interactions at system level, representing each of the subgroups decisively. We hope our results will yield a set of genes in some given structures which can be used as the signatures (biomarkers) to guide diagnosis, treatment, and prognosis in future.
The gene expression data are publicly available from the National Center for Biotechnology Information (NCBI), consisting of 73 individual cancer samples (8,10,16, and 39 samples for WNT, SHH, subgroup 3, and subgroup 4, resp.), each labeled with a specific subgroup (accession number: GDS4296) [31]. Firstly, We select 1146 genes out of 54676, whose expressions show significantly larger variance across both of the cancer samples. In order to tune our parameters we use Bayesian information criterion (BIC): BIC = −2 (− log det ( − ) + Tr ( ( − ))) + log (30) to select the tuning parameters 1 , 2 with a constant parameter . Here presents the nonzeros numbers of matrix , while denotes the total sample numbers. We use BIC because it is intrinsically incline to identify the "true" model while being asymptotically consistent in selecting such a model. We did consider another criterion such as Akaike information criterion (AIC) as it tends to explain the data, thus suffering the risk of overfitting. Since the BIC does not have a closed form with respect to 1 and 2 , we carry out a grid-based screening ( Figure 3).
As demonstrated in Figure 4(a), the famous WNT/betacatenin pathway is recovered, including WNT16 and XIST. The WNT pathway is required for basic developmental processes, such as cell-fate specification, progenitor-cell proliferation, and the control of asymmetric cell division. Identified here is the canonical pathway, which inhibits the beta-catenin degradation complex. Our results thus strongly support the etiological role of the canonical WNT signaling in the pathogenesis of this subgroup of tumors [32]. As illustrated in Figure 4(b), part of the SHH signaling pathway is identified, including oncogene MAGEA, which is associated with shorter survival of tumor cells [33]. Since SHH subgroup medulloblastomas develops from cerebellar granule neuron progenitors, which are supposed to guide axon growth into muscles, it is our speculation that cells of this subgroup of tumors might interact with muscle cells sometime during early development. Interestingly, many modules identified here including TAC1 and TTR,are implicated in some sensory-related activities during early development for subgroup 3 (Figure 4(c)). Consistent with our observations, those sensory genes are overexpressed for this subgroup of tumors according to the independent studies [32]. It is clearly distinguishable from the other three, particularly when combined with the sensory basis (see above). However, note that the genes of subgroup 4 are more randomly distributed than the other three, without the apparent modular structures. Interestingly, a number of important oncogenes and tumor suppressor genes are found such as FOXG1 (Figure 4(d)).

The Structural Changes on Gene Regulatory Networks
Occurring during the Progression of Human Lung Cancers. To indicate our concave additive regularizations can be used to model a structured sparsity (fused sparsity). We perform the following experiment to identify the structural changes on gene regulatory networks occurring during the progression of human lung cancers. The lung cancer dataset contains 22283 microarray-derived gene expression measurements from large airway epithelial cells sampled from 97 patients with lung cancer and 90 controls [34]. The data are publicly available from the Gene Expression Omnibus [35]; its accession number is GDS2771. We first selected the 36 most important biological modules (CDK4, CDK6, CDK2AP2, BCL2, XIAP, BAX, CASP3,  CASP8, CASP9, FOXA2, HNF4A, EBP, HGF, NFKB2, STAT3,  IL6, IL10, HIF1AN, MYC, GSK3B, TP53, PTEN, RB1, MDM2,  PDGFRA, SOX2, PIK3CA, FGFR1, IGF1R, EPHA2, MET, EGFR, DDR2, KEAP1, KRAS, and AKT1) out of the whole genome for lung cancer according to [36]. The goal of this experiment is to find out if our method can consistently detect key structural changes on the level of gene regulatory networks associated with lung cancer. It is well established that many protein-factors are directly or indirectly engaged in the control of transcripts. Also important to note is the availability of gene annotation information, which is still limited mainly for some of the key and frequently expressed genes. We take the notion that the network structure will largely determine the key aspects of biological functionality [37].
We want to know if there are any key differences between normal and cancer cells. We use the 36 key biological modules to reconstruct a two-stage gene regulatory network by our Concave JLVGGM. To approach statistical significance, we do a random permutation to the tuning parameter 11 for 1000 times. According to the second law of thermodynamics, it implies that the cancer network becomes unstable during the progress of tumorigenesis, consistent with the notion that instability is a common phenotype of cancer cells. The tuning parameters are taken as 12 = 0.8 11 . An interesting phenomenon we observed during our modeling effort is as follows: we found that the abnormal metabolism and apoptosis may be closely related to the progression of lung cancers according to Figure 5. As "the Warburg effect" suggests, most proliferation of cancer cells relies on aerobic glycolysis. Thus, the abnormal metabolism we detected in the cancer cells here is intriguing and significant. It is widely known that apoptosis plays a critical role in cancer and particularly lung cancer. However, based on our structural results (undirected graph), we do not know how these changes would be causally related.

Conclusion
We have developed a concave regularization approach for LVGGM for low sample complexity as well as for reducing the biases. Computational method based on proximity operators is provided with at least local convergence. Our methodology establishes its practical value as demonstrated by the numerical results (see Figure 2). Finally, as a future direction, we plan to assess a fully Bayesian interpretation of concave LVGGM. This may be helpful for an advisable choice of the tuning parameter as displayed in the framework of [38].