Genome-Scale Metabolic Modeling of Archaea Lends Insight into Diversity of Metabolic Function

Decades of biochemical, bioinformatic, and sequencing data are currently being systematically compiled into genome-scale metabolic reconstructions (GEMs). Such reconstructions are knowledge-bases useful for engineering, modeling, and comparative analysis. Here we review the fifteen GEMs of archaeal species that have been constructed to date. They represent primarily members of the Euryarchaeota with three-quarters comprising representative of methanogens. Unlike other reviews on GEMs, we specially focus on archaea. We briefly review the GEM construction process and the genealogy of the archaeal models. The major insights gained during the construction of these models are then reviewed with specific focus on novel metabolic pathway predictions and growth characteristics. Metabolic pathway usage is discussed in the context of the composition of each organism's biomass and their specific energy and growth requirements. We show how the metabolic models can be used to study the evolution of metabolism in archaea. Conservation of particular metabolic pathways can be studied by comparing reactions using the genes associated with their enzymes. This demonstrates the utility of GEMs to evolutionary studies, far beyond their original purpose of metabolic modeling; however, much needs to be done before archaeal models are as extensively complete as those for bacteria.


Introduction
Since their discovery and classification in the late 1970s and early 1980s [1][2][3][4][5] archaea have garnered considerable interest, due in part to prevailing thoughts at the time that they lived primarily in extreme conditions, a property that results in unique cell physiology and metabolic characteristics [6]. Although the original classification of organisms was based on only thirteen sequences with only four representatives of archaea [2], the proposal of the three domains of life has been tested time and time again [6][7][8][9][10] and holds up well. Archaea have now been found to reside in essentially every terrestrial environment, and the unique natural capability of methane production among certain archaeal groups makes this domain of life remarkably novel.
Despite the significant progress in sequencing archaeal genomes, a systematic understanding of the metabolism of archaea is still lacking. This is especially true for peripheral metabolic pathways and mechanisms of adaptation to extreme environments [11]. It has often been noted that the environmental niches dominated by archaea constitute extremely stressful or even fatal homes for their bacterial cousins; thus, they have evolved unique coping mechanisms and optimized their metabolisms to salvage the energy that would otherwise be left unused in the environment. It has been proposed that adaptation to energy stress could be the primary factor driving the evolution of archaea [12]. The consequence would be that they have evolved specialized tolerance and metabolic capabilities unique to their environments which make them relatively inflexible to 2 Archaea adaptation like their bacterial counterparts. It has been proposed that this inflexibility possibly results in tighter phylogenetic groups that directly represent less metabolic diversity [12]. Indeed, the evidence seems to support this hypothesis as only 89 genera of archaea have been identified in contrast to the over 1,400 bacterial genera. This fact should be exploitable by systems biology researchers as it means that information gained by one member of a taxon can largely be extended to other related members of the taxa.
For this reason, systematic databases of the metabolic properties of the archaea are highly desirable; the field of systems biology is uniquely positioned to provide useful insight into the diversity and evolution of metabolic capabilities. To date, fifteen genome-scale metabolic models (GEMs; one of the main products of systems biology research) have been constructed for ten archaeal species. However, these models represent primarily members of the Euryarchaeota with almost three-quarters representatives of methanogens. An examination of the phylogenetic tree demonstrates a lack of well-curated metabolic reconstructions in many of the archaeal taxa (see Figure 1).
Despite the limited representation, much can already be learned from the GEM "knowledge-bases". Here, we review the GEMs constructed to date and the knowledge gleaned from them. We begin by briefly reviewing the construction process for GEMs and general predictions made by metabolic models. We also give a historical perspective of the construction of the archaeal GEMs. We then review the models and specific insights gained from model constructions, including novel metabolic enzymes/pathways. Finally, we demonstrate the utility of these metabolic models to the study of evolution of diversity in archaea. We do this by computing the conservation of reactions (based on genetic association of the enzymes) across the archaea and visualizing the extent of conservation on a comprehensive map of the metabolism of the methanogen Methanosarcina acetivorans.

Genome-Scale Metabolic Models (GEMs)
Metabolic networks are invaluable tools for qualitatively understanding an organism's metabolic behavior under given conditions and have a long history of use in biology. Systematic construction of metabolic models which couple metabolic networks with genetic associations, reactions that exchange metabolites with the environment, and the organism's biomass composition only began to take shape in the mid-1990s when Fleischmann et al. [46] fully sequenced the entire genome of the bacterium Haemophilus influenzae Rd. Through comparative genomics they showed that 68% of the known E. coli proteins had homologs in the H. influenzae Rd genome, enabling a hypothesis of which metabolic pathways exist in H. influenzae Rd. Since then, the pioneering work of Thiele and Palsson [47] has established genome-scale metabolic models (GEMs) as the standard computational tool with which to quantitatively study the metabolic behaviors of organisms. In 2010, a well-established workflow was published in an article detailing the best practices for the model construction process [47].
A GEM can best be described as a knowledge-base containing all the biochemical information describing an organism's metabolic network. They are typically presented as Systems Biology Markup Language (SBML) [48] files that can be queried to obtain information about individual reactions, metabolites, and genes coding for the enzymes that catalyze the reactions. Software such as MATLAB COBRA Toolbox [49,50] or COBRApy [51] that implement Constraint-Based Reconstruction and Analysis (COBRA) methods can then use the information within a GEM to compute predicted metabolic behaviors of the organism subject to specified environmental and physiological limitations [52]. Alternatively, one can create independent analysis tools that simply use GEMs to identify product synthesis pathways [53][54][55][56], optimize bioprocessing efficiency [57,58], predict metabolic engineering targets [58,59], and elucidate more complex phenomena such as symbiosis in microbial communities [24,[60][61][62][63].

Model Construction and Predictions.
Here, we will briefly summarize the GEM construction process and highlight the most important characteristics of GEMs that one typically encounters. The de facto standard GEM construction protocol is that published by Thiele and Palsson [47] and should be referred to for standards within the field.
The construction process is divided into four broad stages: (1) automated construction of a draft model, (2) manual refinement of the draft model, (3) conversion of the model into a mathematical model, and (4) quantitative evaluation and refinement of the model. The first stage involves identifying all the potential reactions and pathways that the organism harbors based on its annotated genome. This process can be automated as it is essentially a bioinformatics problem requiring the comparison of the genome with databases that document known genes and their associated metabolic enzymes and pathways (e.g., KEGG [64], Uniprot [65], and BioCyc [66]). Many tools have been designed to facilitate this process such as the RAVEN toolbox [67], KBASE, PathwayTools [68], and the ModelSEED [69].
The manual curation stage of draft model refinement is the most time-consuming-arguably the most criticalportion of the process. All the reactions and pathways identified in the first stage are evaluated to ensure a variety of consistencies: experimental data should support their existence in the organism, masses and charges need to be balanced and consistent with reaction stoichiometries, and reaction directionalities need to be consistent with thermodynamic data. Missing pathways are added at this stage along with transport reactions responsible for the organism's influx and efflux of metabolites from the environment. The most crucial features that make GEMs unique and enable subsequent quantitative predictions are also established in this stage, specifically, the biomass objective reaction, the growth associated ATP maintenance reaction (GAM) and the corresponding nongrowth associated ATP maintenance (NGAM) reaction, and the Boolean gene-protein-reaction associations (GPRs).   [27] where the maximum-likelihood tree was constructed using 33 conserved ribosomal proteins and three largest RNA polymerase subunits. Highlighted species indicate that genome-scale metabolic models have been constructed for that organism. Although not shown in this adapted figure, a Methanobrevibacter smithii model within the Euryarchaeota has also been constructed. While numerous models have been constructed for Euryarchaeota, the Crenarchaeota are severely underrepresented.
Quantitative prediction using GEMs is typically framed as a linear programming problem in which one feature of the model is optimized under a given set of constraints. This feature is typically the model's biomass production rate (analogous to growth rate) which is described by a single pseudoreaction that produces a "biomass" pseudo metabolite by consuming all the metabolites that the organism requires to grow (e.g., individual amino acids, carbohydrates, lipids, nucleic acids, vitamins, cofactors, ions, and trace metals). Ideally, this reaction is constructed using the experimentally characterized biomass composition of the organism. However, this data is often difficult to obtain, leaving curators Archaea to either estimate biomass compositions from the organism's genome or adopt the compositions available from other organisms.
The GAM and NGAM reactions consume ATP. The GAM reaction reflects the ATP consumption required for the organism to grow whereas the NGAM reaction reflects the organism's basal ATP consumption required to survive but not necessarily to grow (e.g., maintaining membrane potential and redox balance). Since both reactions reflect the organism's energy requirements in the model, the choice of stoichiometric coefficients for these two reactions greatly influences the model's growth predictions. Ideally, the stoichiometric coefficients of these two vital reactions should be determined from a chemostat experiment in which the growth rate is tracked alongside ATP consumption (or some fiducial metabolite tracing ATP consumption, e.g., as in [70]). In practice, one will find that researchers often use a variety of estimation schemes based on the experimental data at hand. The most common alternative method for determining ATP requirements involves matching the growth rate and growth yield (grams dry mass per substrate uptake or efflux) data from batch cultures grown in different media assuming a specific biomass composition and stoichiometric matrix (e.g., as in [19]). In absence of measurements for the NGAM, it is often assumed to be some fraction of the GAM (for example, 2.5% the GAM reaction in the 2006 M. barkeri model by Feist et al. [19]).
The GPRs are Boolean expressions containing the genes that code for metabolic enzymes facilitating the reactions. By piecing the genes together in series of AND and OR operations, a GPR encodes which genes are necessary for an enzyme to be synthesized by the cell and therefore which genes are required for a metabolic reaction to exist. Predictions of gene knockout effects are commonly computed with GEMs. Not all reactions in the model will have GPRs due to either the lack of experimental gene characterizations, the use of nonphysical "gapfill" reactions [71], or the presence of novel uncharacterized pathways hypothesized by the curator.
Once this manual curation is complete, one can proceed to the third stage of converting the GEM into a quantitatively predictive model. This is done by defining the "objective" reaction to be optimized in the model and constraining the flux ranges on all model reactions. These flux ranges must reflect a specific growth condition to which the organism is subject. During model construction, most internal reaction fluxes will likely be unbounded, due to the relatively limited biochemical and proteomics data available for most reactions and organisms; it is the exchange reaction fluxes that must be constrained to reflect the nutrient availability of the organism's environment. Exchange reactions are nonphysical external reactions of the model that introduce compounds into the system, thereby simulating the organism's growth environment. These constraints [72] will have to be applied through COBRA-capable software. Once these model constraints have been set, flux balance analysis can be run to predict the organism's growth rate and the distribution of fluxes through the metabolic network. The fourth stage is validating these predictions with experimental growth data and discrepancies rectified with iterative manual refinement of the model. Numerous tools [67,69,[73][74][75][76][77][78] have been developed over the years to automate many stages of this arduous construction process, allowing researchers to focus their effort on the last stage of model refinement. Dias et al. [73] provide a comparative review of these various computational tools.

Genealogy of Archaeal GEMs
The genealogy of all the published archaeal GEMs to date is shown in Figure 2 (see Table 1 for statistics about the various models). The current archaeal GEMs can conveniently be divided between methanogenic and nonmethanogenic archaeal species with the former being the most developed due to the ecological roles that methanogens play in the global carbon cycle and their use in wastewater treatment [79]. Although the very first archaeal GEM was developed for Methanococcus jannaschii by Tsoka et al. [22] in 2003, the majority of the later methanogen GEMs were derived from a model for Methanosarcina barkeri (iAF692) which was first constructed by Feist et al. [19] in 2006. This inheritance stems from the fact that iAF692 was the first manually curated methanogen GEM thoroughly verified against experimental growth data. M. barkeri is also one of the most metabolically diverse methanogens in the Euryarchaeota kingdom, capable of consuming acetate, methylamines, methanol, CO, and CO 2 /H 2 . Two models, iVS941 [15] and iMB745 [16], were independently constructed for M. acetivorans and published in 2011 by the Maranas and Price research groups, respectively. M. acetivorans is equally diverse and similar in metabolism and thus inherited much of the M. barkeri GEM characteristics. iMB745 was then used as the base model from which to draft the more recent methanogen GEMs, Methanobrevibacter smithii (iMsi385) [24] and Methanospirillum hungatei (iMhu428) [21], both of which were qualified as preliminary reconstruction for use in larger microbial community studies. The most recent M. acetivorans models include iMAC868 [17] and our own iST807, which are both independent updates to iMB745. The GEM for M. maripaludis (iMM518) was constructed in 2014 by Goyal et al. [23,80] independent of the other methanogen GEMs. This is not surprising given that by this time GEMs construction had already been well established.
Nonmethanogenic archaeal GEM construction has been largely dominated by the work from Dieter Oesterhelt's research group. In 2008, they released the first manually curated GEM for Halobacterium salinarum R-1 (iOG478) [13]. This was followed in 2010 by a new GEM for a haloalkaliphile, Natronomonas pharaonis (iOG654) [25], which inherited significantly from the H. salinarum model. The only other nonmethanogenic archaeal GEM to our knowledge was independently constructed in 2012 for the Sulfolobus solfataricus by Ulas et al. [26].
Although archaea had been established since the 1980s as the third domain of life by the pioneering work of Carl Woese and collaborators [3-5, 7, 10], the lack of experimental data on metabolic characteristics of various archaeal species explains why so few GEMs have been constructed to date. Nevertheless, this early stage of archaeal GEMs development Archaea 5 Table 1: Model Statistics. The elementary units of genome-scale metabolic models are genes, metabolites, and reactions. These models consist of (1) a metabolic network that describes the connections between metabolites through reactions (often organized by metabolic subsystem), (2) gene-protein-reaction rules that map the gene dependencies of reactions, and (3)    Archaea provides a ripe opportunity for the community to grasp the core governing properties of archaeal metabolic networks and perhaps adopt standardized model building practices in order to facilitate more efficient communication of metabolic information among researchers going forward.

Methanogenesis Framework.
As the most defining metabolic pathway within methanogens, methanogenesis has been well characterized by numerous biochemical studies over the years. Therefore, the most significant and notable differences between the methanogen models will be found in the methanogenesis pathway and supportive pathways producing novel cofactors for different substrates. The basic framework is shown in Figure 3 where CO 2 is reduced to methane in a series of steps. Although this basic framework is well conserved among methanogens, the key difference lies in the exergonic-endergonic reaction couplings in the pathway. The first step of CO 2 reduction is an endergonic reaction that oxidizes ferredoxin and produces formylmethanofuran. In simple hydrogenotrophic methanogens that lack cytochromes, this energy is typically recovered by the methyl-H 4 MPT:CoM methyltransferase (Mtr) reaction and the heterodisulfide reductase (Hdr) reaction. Mtr expels Na + ions in the process of transferring the methyl group onto coenzyme M (CoM) and thus establishes the electrochemical gradient responsible for driving ATP synthesis. Electrons are extracted from formate or H 2 by Hdr which then uses these electrons to split the CoM-CoB heterodisulfide and reduce ferredoxin, thus replenishing the ferredoxin pool that is required to run the very first step of CO 2 reduction. This oxidation of formate or H 2 to reduce the heterodisulfide and ferredoxin is called the electron bifurcation reaction. This is in contrast to cytochrome-containing methanogens which are almost exclusively found within the Methanosarcinales. In these substrate-diverse methanogens, the Hdr enzyme evolved to harbor a cytochrome and can utilize methanophenazine as another electron carrier. Instead of directly reducing and replenishing the organism's supply of ferredoxin, Hdr expels hydrogen ions to establish a proton-based electrochemical gradient that is used by a membrane-bound energy conserving hydrogenase (Ech) to regenerate the reduced ferredoxin. This system is best exemplified by the M. barkeri model (iAF692) in which the Ech reaction was of particular interest during model construction because the ratio of protons translocated to electrons extracted was unknown at the time.
Using experimental growth yield data, a stoichiometry of 1 proton/2e and GAM/NGAM of 70/1.75 mmol/gDWT/hr enabled the model to predict growth yields consistent with experimental data for growth on methanol, acetate, H 2 /CO 2 , and pyruvate. This Ech stoichiometry was later updated in iMG746 to 2 protons/2e − . Although very closely related to M. barkeri, M. acetivorans has significant differences as a marine methanogen. Within methanogenesis, it substitutes Ech with the ferredoxin:NAD + oxidoreductase complex (Rnf) which interestingly translocates sodium ions instead of hydrogen ions [81]. This establishes a primarily Na + dominated electrochemical gradient and helps explain why M. acetivorans inhabits a marine environment [40] in contrast to freshwater M. barkeri [82]. Since M. acetivorans is not able to consume CO 2 , it would not be carrying out the endergonic first step of reducing CO 2 and thus justifies the absence of an Ech.

Methanogen
GEMs. The majority of methanogenic GEMs available to date derive from the M. barkeri model iAF692 and the M. acetivorans model iMB745. Both models describe seven major metabolic subsystems: vitamins and cofactor biosynthesis, amino acid metabolism, nucleotide metabolism, central metabolism, lipid and cell wall biosynthesis, and methanogensis. iMB745 inherited most of the reactions in iAF692 but also incorporated various additional pathways. The most notable changes include a modification of the methanofuran biosynthesis pathway based on homology of enzymes to those from the same pathway in M. jannaschii, a modified electron transport chain reflecting the aforementioned substitution of Rnf for Ech, and an updated biomass reaction that incorporated new carbohydrate, lipid, and nucleotide composition data. Although an attempt was made to estimate the GAM purely from genomic data, the model had to retain and optimize iAF692's original value in order to fit experimental growth data. The biomass reaction was more systematically constructed in iMB745 than iAF692.
The general components of the biomass reaction (proteins, RNA, DNA, lipids, carbohydrates, and trace components) were taken from a typical bacterial cell instead of an average methanogenic archaea cell, most likely due to the lack of experimental data. This practice is quite common when reconstructing archaeal GEMs and can have serious consequences because the biomass composition has significant influence over metabolic flux distributions throughout the network. These computed flux distributions may be biased by the use of bacterial biomass compositions rather than archaeal biomass compositions (see Tables 2 and 3 for biomass compositions from models). iVS941 was developed and published independently from iMB745 at the same time through homology comparison with M. barkeri and an automated curation procedure published by Suthers et al. [83]. The biomass reaction, which includes the GAM parameter, is directly inherited from iAF692, but the nucleotide compositions were modified to reflect the differences in G/C content between M. barkeri and M. acetivorans. The most recent models of the M. acetivorans lineage are iMAC868 and our own iST807, both of which are independently updated metabolic models. Although the M. smithii iMsi385 and M. hungatei iMhu428 models are indeed independent curations, we will not discuss them here because they are directly inherited from iMB745 and were qualified as preliminary draft models needing further revisions. iMAC868 was constructed to incorporate an engineered pathway that allowed for methane oxidation, essentially enabling the model to grow on methane and thus reversing the entire process of methanogenesis to produce the growth substrates that M. acetivorans would normally consume. Nevertheless, the model can still be used for simulations of a wild-type M. acetivorans and contains important updates to iMB745. iMAC868 merged the information from both iMB745 and iVS941 into a single model and corrected   [21]. Although it cannot use acetate as an energy source, the pathway to take up acetate is still present to shuttle it into gluconeogenesis. (c) The hypothesized methanogenesis pathway for M. acetivorans (iST807) [29]. The conventional CO 2 reduction pathway is only run in reverse as this methanogen cannot metabolize CO 2 . (d) The hypothesized methanogenesis pathway for M. barkeri (iAF692) [19] which bears great resemblance to that of M. acetivorans. The major differences between the two organisms' methanogenesis pathways lie in the electron transport chain (ETC). The specific pathways for each methanogen follow the same topological structure as the general methanogenesis illustration. Red circles are metabolites while green diamonds signify enzymatic reactions of the pathway.  numerous charge and mass imbalances within the electron transport chain. 64 GPRs were also updated with the most recent M. acetivorans gene annotations. The biomass, GAM, and NGAM requirements remained the same as those from iMB745. In iST807, we updated iMB745 by revising the methanofuran biosynthesis pathway with the most recent experimental data from M. jannaschii [84][85][86][87], adding 13 new reactions and 62 new genes, and revising the biomass reaction to utilize charged tRNAs instead of free amino acids. Among the new additions are reactions to enable pyrrolysine biosynthesis during methylamine growth, methyl-3mercaptopropionate metabolism, and o-phosphoserine conversion to cysteine after aminoacylation. Being able to uptake the various media components (Wolfe medium [88]) in which M. acetivorans is typically grown is crucial for accurately simulating the organism's metabolism. Many of the reactions required to emulate this are either missing or turned on in iMB745 and iMAC868. Cysteine is an important media component usually added with the purpose of quenching any oxygen in the methanogen's growth environment, but no one to date has verified whether this media component is also metabolized. Since unconstraining its uptake within iMB745 caused erroneously high growth rates, the cysteine uptake reaction was shut off and this was inherited by iMAC868 along with the various missing Wolfe media uptake reactions. iST807 fluxes this by incorporating uptake reactions for all the components of the Wolfe medium that have use in the metabolic network, including cysteine which is constrained  Figure 4: Growth characteristics of M. acetivorans models. The models were simulated using experimental growth substrate uptakes of MeOH:20, Acetate:7, and CO:11.6 mmol/gDCW/hr. Since experimental TMA uptake rates were not available, it was set to 6.77 mmol/gDCW/hr across all the models. This value was determined by fitting iST807 to experimental growth rates on TMA. iVS941 gave unrealistically large growth yields and therefore the values were omitted from the growth yield plot for a clearer display of the other models' performances. iVS941 also did not predict any methane production under the given growth conditions. Experimental growth rates are from [30][31][32][33][34][35][36][37][38][39][40]. Experimental growth yields are from [40][41][42]. Experimental CH 4 production rates are from [33,[42][43][44].
to a nongrowth-limiting value that maximizes the model's agreement with the experimental growth rates shown in Figure 4.
From the methanogenic GEMs geneology, it is clear that most of the methanogenic GEMs are inherited from iMB745 despite the fact that iVS941 was independently published at the same time. This inheritance trend is most likely due to the more complete model documentations and availability of a readily testable GEM provided for iMB745 in contrast to iVS941. Given that metabolic modeling for archaea is still a developing effort, this practice of providing poorly assembled GEM files that are ill-prepared for quantitative assessment is still, unfortunately, common in the field. In order to alleviate this problem, we provide in the supplementary information (in Supplementary Material available online at https://doi.org/10.1155/2017/9763848) of this review all the currently available M. acetivorans models standardized to use BIGG IDs (http://bigg.ucsd.edu/data access) and proper compartment tags such that the models can be conveniently handled within COBRApy. We also compare their growth characteristics as shown in Figure 4 to give a sense of how well these models perform with respect to each other and experimental data. We chose to focus on these models from this species because they are often used as templates for the reconstruction of many other methanogens.
iVS941 predicts unrealistic growth rates, growth yields, and no methane efflux, which indicates the model's deficiency. This growth characteristic assessment shows that, Archaea besides proper documentation, iMB745 also demonstrates better predictive ability over iVS941 and thus serves as a more reliable parent model for M. acetivorans. This is also evidenced by the predictive performances of iST807 and iMAC868 which are updated versions of iMB745. Across all growth substrates and growth characteristics, iMAC868 predictions showed a median deviation of 36% from experimental values. In contrast, iST807 demonstrated only a median deviation of 12% which is a marginal improvement over the 14% median deviation of iMB745. Although these statistics may seem to suggest that iMB745 and iST807 are more reliable models overall, it is important to keep in mind that growth predictions are heavily dependent on each model's allowed uptake reactions and their respective rates. In this assessment, each model's uptake reactions were set to the defaults that were provided within their respective publications. The uptake rate for the growth substrate being tested was uniformly set to the experimental value across all the models, and all other major growth substrate uptake reactions were turned off.

Halobacterium salinarum.
While only four GEMs have been developed for only three nonmethanogenic archaea, they provided significant insight into the metabolism and growth of the organisms. A reconstruction of the halophilic archaeum Halobacterium salinarum R-1 capable of growing on 15 different carbon/energy sources was developed by the group of Dieter Oesterhelt [13]. During construction, a novel pentose phosphate pathway (PPP) for the generation of ribulose-5-phosphate (R5P) was predicted and later verified. It was known that different archaea used different pathways to produce R5P (e.g., nonoxidative PPP, reverse ribulosemonophosphate pathway, and oxidative PPP). H. salinarum was missing all or portions of these pathways. An alternate pathway using the partial Entner-Doudoroff (ED) pathway was connected to the partial oxidative branch of the PPP by a semiphosphorylated 6-phosphogluconate. This pathway thus described why the organism retained parts of the oxidative PPP and part of the ED pathway even though it is incapable of growing on sugars. During the reconstruction the authors also noted that shikimate production was incomplete and thus proposed that hexose and L-aspartate-4 semialdehyde were used, consistent with 13 C labeling data from tryptophan degradation. Additionally, draft pathways for synthesis leucine, isoleucine, and valine could be generated in the model.
To calibrate the model, they measured the amino acid composition and content using experiments and found that protein mass constitutes 49% of the dry mass, much less than in the other archaea. Using dynamic simulations with experimentally measured uptake rates for amino acids they predicted internal fluxes from which they drew a number of conclusions. Most strikingly, only 15% of amino acid carbons ended up in biomass with the majority being used to produce energy in the TCA cycle. They found that all amino acids were simultaneously used, though arginine, aspartate, leucine, and isoleucine were taken up most quickly, even the essential amino acids methionine, lysine, isoleucine, leucine, and valine which the cells are incapable of producing. Using flux balance analysis, they found that H. salinarum primarily produces isoprenoid lipids using leucine (∼10%) while isoleucine was primarily degraded entering the TCA as acetyl-CoA and succinyl-CoA. Valine was the only amino acid that was primarily incorporated into biomass. Because the uptake rate of amino acid far outpaced the biomass incorporation they hypothesized that degradation pathways for all amino acids exist and proposed six enzymes to facilitate some of these reactions. However, it was only later that they determined the biosynthetic pathways for aromatic amino acids which they shared in common with M. jannaschii; during the discovery they used the metabolic model to identify uptake rates in auxotrophs [89]. Most impressively, they predicted, and later experimentally verified, that arginine is interconverted to ornithine during its degradation and is excreted to the environment early in growth, only to be taken up later as a source of arginine. Overall, they suggested that the greedy consumption of all available amino acids results in the "blooms" observed in the wild [13] and indicates that the metabolic pathways that have evolved are such that the organism can eat as quickly as possible to outgrow competitors.
The model was later updated to include a refined description of the respiratory chain as well as phototrophic growth leading to additional insights into metabolism [14]. Several key differences in the oxidative phosphorylation pathway compared to bacteria and mitochondria were proposed. First, because complex I is missing the NADH oxidation subunits, it uses another energy carrier. Second, that halocyanin carries electrons from complex III to complex IV rather than menaquinone. Finally, that ATP synthase has a stoichiometry of 10 protons per ATP, which is much higher than in most organisms.
By fitting uptake rates of amino acids to aerobic growth experiment measurements, they identified isoleucine, leucine, and valine as the preferred energy sources, while others such as alanine, proline, and ornithine had distinct periods of different uptake rate [14]. Thus, the organism hierarchically uses metabolites to maximize growth rate. They also predicted significant overflow of alanine, acetate, and succinate. Interestingly, they identified that arginine fermentation essentially revives cell growth, after which amino acid degradation and photosynthetic growth become dominant. They found that even during anaerobic phototrophic growth, the organism breaks down amino acids to obtain energy, even though they were incapable of deriving the maximal energy from respiration. Interestingly, they could identify that the network structure of amino acid degradation could describe why alanine was produced, specifically, as an overflow pathway during serine consumption. This is in contrast to aerobic growth where serine and alanine consumption appear to coincide with one another, likely due to the fact that pyruvate can be funneled into the TCA cycle. Overall, the studies of H. salinarum led to the conclusion that the organism evolved its metabolic behavior to maximize growth during blooms, which can occur sporadically with many years in between [13,14]. It was suggested that they use this as a strategy to outcompete other organisms that feed on available nutrients and build up enough of a population that they can survive long periods of starvation [14].

Natronomonas pharaonis.
The metabolic network for the polyextremophile (high salt concentration and alkaline pH) Natronomonas pharaonis was developed [25] using the reconstruction for H. salinarum. The network is significantly larger, with nearly 30% more genes associated with reactions, mostly due to additional amino acid and carbon degradation pathways. As N. pharaonis is capable of growth on a single carbon source the reconstruction complements that for H. salinarum which requires a complex broth for growth. For this reason, the reconstructions could be used to investigate questions regarding the metabolic objective of halophiles that are subject to different evolutionary pressures and answer questions about optimality of energy production.
The authors measured the amino acid content to define the biomass composition and found, similar to H. salinarum, that it made up about 75% of organic mass [13,25]. Perhaps the high protein content helps to compensate for the high osmolarity in which the organisms are grown. Using the model, predictions about aerobic growth were obtained; most importantly is that at very high (>7 : 3) and low (<3 : 7) acetate to oxygen consumption ratios the organism was incapable of growth. Using experiments, they identified an acetate:oxygen ratio of about 1 : 2 and an ATP maintenance cost of ∼30 mol/ΔOD⋅mL. A wide range of maintenance energies and acetate:oxygen ratios gave near optimal growth, indicating that growth of N. pharaonis is robust to environment and the biological objective is to maximize growth and energy production [25]. They found that the carbon incorporation was actually quite low (∼35%). Finally, using arguments about respiratory exchange ratio (e.g., the ratio between CO 2 production and oxygen consumption) the authors were able to demonstrate that about 10% of carbon is neither incorporated in biomass nor respired, suggesting that the organism uses some form of overflow metabolism [25]. While they did not make any suggestions, there are a number of likely suspects such as succinate or pyruvate which could act as available nutrients for other organisms.

Sulfolobus solfataricus. The final nonmethanogen model developed for an archaeon is for the hyperthermoacidophile
Sulfolobus solfataricus [26]. The model and organism are remarkable among the archaea represented here in that they grow optimally at a pH of 3.5 and temperature of 80 ∘ and consume 35 different carbon sources. The thermostability of their enzymes is of interest to bioengineers and makes the organism attractive for bioreactor design. Their unique abilities give them an edge in the hot-springs where they are found and allow them to consume a plethora of degraded organic mass. The final reconstruction consists of 706 reactions associated with 515 genes and conveys the ability to consume all 35 carbon sources. The model was calibrated with growth and nongrowth associated maintenances of 24.68 mmolATP/gDCW and 1.9 mmolATP/gDCW/hr, respectively, to match experiments. Interestingly, the GAM is the smallest of any archaea while the NGAM is moderate. Unfortunately, the model itself was not available and thus the biomass composition used in the study could not be compared with the others to identify the source of this low cost for growth (see Table 2). The authors of the study chose a phosphate/oxygen ratio of 0.5 as the final fit parameter of their model; this low value was due to the fact that the archaeon uses inefficient cytochrome complexes SoxABCD and SoxEFGHIM for respiration. Using these parameters, the model incorporates about 25% of carbon while respiring the rest.
During the model reconstruction, the authors identified the fact that S. solfataricus uses a reverse ribulosemonophosphate pathway (RRMP) instead of the pentose phosphate pathway. Specifically, they found that the organism was missing a transaldolase and thus they allowed accumulation of sedoheptulose 7-phosphate. They found that accumulation of sedoheptulose 7-phosphate in their simulated media accounted for ∼3% of all carbon atoms and thus is a significant portion of the overall carbon available for biomass. Simulations indicated that on glucose growth about 22% of carbon flux was fed into the RRMP pathway while the rest was metabolized to pyruvate via the Entner-Doudoroff (ED) pathway to be subsequently used in TCA cycle. Flux variability analysis of the metabolic model demonstrated that both the semiphosphorylative and nonphosphorylative branches of the ED pathway were possible and indicates that further studies are required to understand the growth of the organism. Similarly, the TCA cycle showed significant variability, primarily due to the glyoxylate shunt. Finally, variability in the production of amino acids such as histidine, tryptophan, alanine, and glutamate indicates different routes of synthesis.
Because a related organism Sulfolobus sp. VE 6 could grow autotrophically fixing bicarbonate, the authors searched for the hydroxypropionate-hydroxybutyrate cycle. They found 11 of the 16 enzymes and performed BLAST searches to identify putative homologs of the 5 remaining enzymes. Thus, they predicted that S. solfataricus is able to grow autotrophically and suggested that experiments should be performed. It is important to note, however, that autotrophic growth of S. solfataricus had already been verified by Zillig et al. [90]. During autotrophic growth, the model predicted that the TCA cycle was little used with flux flowing from succinyl-CoA through malate to form pyruvate which could be used in gluconeogenesis. Additionally, hydrogen sulfide was fixed to provide a sulfur source, and in fact it produces energy allowing the simulated organism to grow much more quickly than on glucose; however, this is likely due to the lack of an uptake rate on H 2 S. Regardless, this gives a hint about the possibility of syntrophic interactions with sulfate reducing bacteria.
The authors went on to compare the growth of the organisms on the 35 different carbon sources. To do this they fixed the carbon uptake rate and compared biomass flux. Overall, the organism grew significantly faster when growing on glycerol and propanol and marginally better on oligosaccharides. They also grew significantly more slowly Archaea on carbon sources than on compounds that enter the TCA cycle at points other than 2-oxoglutarate. Gene deletion assays indicated that over 50% of all single gene knockouts were nonlethal and an additional ∼25% allowed limited growth, suggesting S. solfataricus is metabolically versatile. Although S. solfataricus is naturally a thermophile and therefore likely possesses a large repertoire of thermostable enzymes, it has been suggested [91] that thermostability does not necessarily guarantee thermoactivity. Therefore, S. solfataricus' metabolic versatility is potentially advantageous in high-temperature environments, allowing the organism to circumvent the potential issues of decreased enzyme catalytic activity and higher mutation rates. Overall, these results suggest that S. solfataricus can preferentially consume certain carbon sources and regulate alternative pathways to support beneficial symbiotic relationships. Alternatively, one can also argue that S. solfataricus can take advantage of its metabolic versatility to outcompete other organisms.

Comparison of Metabolic Capabilities
Well-curated metabolic models function as comprehensive databases of the knowledge about organisms; thus, they are potentially useful tools for studying evolution and diversity of metabolism. Three properties of the metabolic models are of particular utility for comparative studies: (1) metabolic models connect gene function with metabolic function via their gene-protein-reaction rules, (2) the metabolic network is topologically defined by metabolites and the reactions that interconvert them, rather than the genes that facilitate those interconversions, and (3) metabolic networks coupled with modeling techniques allow for the identification of function redundancy/degeneracy. The first of these properties allows direct comparison of gene content based on metabolic function and the application of traditional evolutionary tools (e.g., bioinformatics and phylogenetic approaches). The second of these properties allows networks to be compared by function rather than gene content; for example, the network could be used to identify convergent evolution. The final of these properties could be used to provide insight into the selective pressures of the organism; specifically, duplicated functionality might suggest a critically important function for the organism.
To demonstrate the utility of metabolic models to evolutionary analyses we computed the conservation of genes facilitating metabolic reactions. An ITEP database [45] of 221 archaea, including each of the organisms in Table 1, was constructed using the default parameters. Briefly, ITEP is a software toolkit for examining microbial pan-genomes that provides functionality for constructing a BLAST database and querying protein family prediction, ortholog detection, and analysis of functional domains. Among its capabilities is assessing the GPRs of a metabolic model for each reaction and determine whether or not the homologs exist in another organism. The GPRs from the M. acetivorans model iST807 were used as input to the db evaluateReactionsFromGpr.py function to assess the conservation in other organisms. The "or" option to the function was used to assess whether any genes for each M. acetivorans reaction existed in the other archaea. Doing this for each organism, we computed the extent of conservation for each reaction (e.g., the fraction of organisms in which the reaction had conserved genes). The results can be seen in Figure 5. Given the broad comparison of many archaea against the well-curated model for M. acetivorans, it is not surprising that the extent of conservation is relatively low (blue) across the network. Nevertheless, it is interesting to note that the most visible conservation (red) occurs within Nucleotide metabolism, coenzyme synthesis-related reactions, and various amino acid biosynthesis reactions. This suggests that the underlying transcription and translation machinery of archaea are fairly similar.
Categorizing the homologous genes by metabolic subsystem as annotated in iST807 computed by ITEP lends more specific insight into conservation of metabolism in these archaea (see Figure 6). Amino acid biosynthetic pathways are generally highly conserved (labeled in blue if Figure 6). Proline and cysteine biosynthesis are notable exceptions. None of the proline biosynthesis genes annotated in M. acetivorans were found in type I methanogens, H. salinarum or S. solfataricus. Additionally, all genes annotated as synthesizing serine or glycine are missing in S. solfataricus. This could indicate either an incorrect annotation in the model or multiple proline biosynthetic pathways in metabolism. Biosynthesis of methanofuran, a cofactor in methanogenesis, is surprisingly well conserved beyond the methanogens. Notably nitrogen metabolism is the least similar among the archaea as other have previously identified [92]. Although this analysis leads to rather qualitative statements, it demonstrates how information from metabolic reconstructions can be used to compare differences between metabolism and study evolution and conservation of metabolic pathways. A similar analysis was performed with the GPRs from each of the metabolic models. The aggregate statistics from this analysis can be seen in Figure 7(a) where the count of reactions with a particular conservation level is shown. Similar to what was seen for M. acetivorans there appear to be mostly reactions that are highly conserved (low reaction uniqueness) or very lowly conserved (high reaction uniqueness). This is not a surprising observation as it has long been known that metabolic networks have a bowtie topology [93] and are generally scale-free networks [94]. Cross comparison between each of these conservation predictions in fine detail is beyond the scope of this review; however, we feel we have demonstrated the utility of using metabolic reconstructions as a tool to compare metabolism.

Conclusions
We have presented an overview of genome-scale metabolic models and discussed the defining metabolic features among the few GEMs available for archaea. In these discussions, we have also highlighted some much-needed improvements to model building practices in order to facilitate the development of archaeal models. By using the gene-protein-reaction associations in these archaeal GEMs, we also demonstrate the invaluable utility of these metabolic models as they can Reactions with thin grey lines are not associated with genes. To assess conservation, the db evaluateReactionsFromGpr.py functionality of the ITEP software [45] was used. It computes homologous genes to those in the GPRs of each reaction in iST807. The ITEP function was executed with the "or" option enabled to identify whether any of the enzymes (or enzymatic subunits) annotated as facilitating the reactions were encoded in the organism.
be extended beyond flux analysis to gain significant insight into evolutionary patterns among organisms. Visualizing the known archaeal metabolic models on a phylogenetic tree (see Figure 1) leads to the conclusion that model development in the community thus far has mostly focused on Euryarchaeota, leaving the Crenarchaeota largely unexplored. Although models do not yet exist for members of the Archaeoglobi, Thermoplasmata, and Thermococci classes, all other major Euryarchaeota classes have at least one representative model. This is not to underestimate the importance of further developing these Euryarchaeota models as Archaeoglobi have some of the most diverse metabolisms of any Euryarchaeota, capable of chemolithotrophy by reduction of sulfates, thiosulfates, nitrates, and heterotrophy via reduction of sulfates via organic compounds [95]. However, the paucity of GEMs for the Crenarchaeota is a major impediment for a comprehensive study of evolution and diversity in archaea. GEMs are invaluable tools to help guide the exploration and comparison of the great metabolic diversity of energy conservation in these organisms which are capable of sulfate reduction both chemolithotrophically and heterotrophically (members of the Desulfococcales), nitrate reduction (members of Thermoproteales) hydrogen oxidation, and sulfur reduction (members of the  The overall height of each bar indicates the total fraction of reactions in the metabolic subsystem that are conserved, while the height of individual portions of each bar indicates the relative conservation of reaction in the subsystem from that organism. The 221 archaea were grouped by taxonomic order, yielding 12 distinct groups as seen in the legend. Metabolic subsystems labels are color coded: amino acid metabolism subsystems in blue, vitamin and cofactor metabolism subsystems in green, central metabolism subsystems in red, and other categories in black. Here, we define uniqueness to be the fraction of 221 archaea that do not have the reaction currently present in the respective model; higher uniqueness means fewer of the organisms contain the genes coding enzymes that are annotated by the GPRs of the specified models. The presence of the genes is computed using ITEP as discussed in Figure 5. (b) A phylogenetic tree computed based on similarity to the M. acetivorans model iST807. This tree is based on the ITEP results discussed in Figure 5.
Sulfolobales) [95]. The existence of diverse energy conservation pathways will likely come with diverse electron transport chain and transport systems. Understanding these unique characteristics will be paramount in understanding growth in extreme conditions and syntrophy among microorganisms as well as for engineering communities for biotechnology applications.