In Silico Characterization and Homology Modeling of a Cyanobacterial Phosphoenolpyruvate Carboxykinase Enzyme

ATP-dependent phosphoenolpyruvate carboxykinase (PEPCK) is a key catabolic enzyme found in various species of bacteria, plants, and yeast. PEPCK may play a role in carbon fixation in aquatic ecosystems consisting of photosynthetic cyanobacteria. RuBisCO-based CO2 fixation is prevalent in cyanobacteria through C3 intermediates; however, a significant amount of carbon flows into C4 acids during cyanobacterial photosynthesis. This indicates that a C4 mechanism for inorganic carbon fixation is prevalent in cyanobacteria with PEPCK as an important β-carboxylation enzyme. Newly available genomic information has confirmed the existence of putative PEPCK genes in a number of cyanobacterial species. This project represents the first structural and physicochemical study of cyanobacterial PEPCKs. Biocomputational analyses of cyanobacterial PEPCKs were performed and a homology model of Cyanothece sp. PCC 7424 PEPCK was generated. The modeled enzyme consists of N-terminal and C-terminal domains with a mixed α/β topology, with the active site located in a deep cleft between the two domains. Active site residues and those involved in metal ion coordination were found to be conserved in the cyanobacterial enzymes. An active site lid, which is known to close upon substrate binding, was also predicted. Amino acid stretches that are unique to cyanobacterial PEPCKs were also identified.


Introduction
Phosphoenolpyruvate carboxykinase (PEPCK; EC 4.1.32) catalyzes the reversible ATP- or GTP-dependent decarboxylation of oxaloacetate (OAA) to yield phosphoenolpyruvate (PEP). This reaction uses the phosphate group from the nucleoside triphosphate and, as a result, produces CO2 and the corresponding nucleoside diphosphate. PEPCK has a strict requirement for divalent cations with Mn2+ as its best activator [1]. Two classes of PEPCKs exist in nature, and they are classified on the basis of the nucleotide substrate: ATP-utilizing enzymes are found in bacteria, plants, and yeast, while GTP-dependent PEPCKs are found mostly in higher eukaryotes [2]. GTP-dependent PEPCKs also occur in some bacteria such as Corynebacterium glutamicum [3]. While there is no significant sequence identity between the two classes, a number of residues are completely conserved across all PEPCKs in the regions of the enzyme that are necessary for nucleotide binding and metal ion coordination [1]. The crystal structures of PEPCKs from representative species of plants, bacteria, and mammals have been published, and conservation in metal and substrate binding was confirmed [4].
In mammals, two forms of PEPCK exist: PEPCK-M, which is expressed in mitochondria, and PEPCK-C, the cytosolic form of the enzyme. Mammalian PEPCK plays a major role in gluconeogenesis and glyceroneogenesis. The enzyme is also a contributor to other downstream processes [7]. The significance of the PEPCK reaction has been studied in Streptococcus bovis, Selenomonas ruminantium, and other bacteria where PEPCK was found to be involved in growth initiation and amino acid synthesis [8]. E. coli overexpressing PEPCK exhibited a slower growth rate and increased ATP production [9]. PEPCK has also been investigated as the sole anaplerotic enzyme in the yeast Saccharomyces cerevisiae [10]. In plants, where PEPCK isoforms are expressed in various tissues, regulation of the enzyme by phosphorylation is dependent upon location and illumination [11]. The same is true for CAM plants such as Ananas comosus [12].
Figure 1: Multiple sequence alignment of selected regions of ATP- and GTP-dependent PEPCKs. The yellow block highlights are amino acid stretches that are conserved and exclusive to cyanobacterial PEPCKs; the amino acids in the blue blocks make up the pyruvate binding site; the kinase-1a and kinase 2 motifs are highlighted by the red and pink boxes, respectively; the active site lid amino acids are shaded in green; the adenine binding site is highlighted in olive green. The alignment was generated with Clustal W [5].
PEPCK is also prevalent in marine ecosystems where a significant amount of CO2 fixation takes place. The enzyme is involved in light-dependent carbon fixation, and its activity varies with changes in water temperature and nitrogen concentration [13]. Furthermore, PEPCK contributes to photosynthetic carbon fixation as an addition to the C3 pathway catalyzed by RuBisCO [14]. The same may be said of marine cyanobacteria where β-carboxylation reactions involving phosphoenolpyruvate carboxylase (PEPC) and PEPCK are prevalent [15].
Cyanobacteria are unable to derive metabolic energy from the Krebs cycle since they lack α-ketoglutarate dehydrogenase and NADH oxidase. Furthermore, C4 acid concentration fluctuates in cyanobacteria in response to light-dark cycles. This indicates that C4 mechanisms for inorganic carbon fixation involving PEPC and PEPCK are widespread in those organisms [16]. In marine ecosystems, PEPCK may be involved in light-independent carbon fixation (LICF) in diatoms [17].
Despite their potential importance in net CO2 fixation, PEPC and PEPCK from cyanobacteria have not been extensively studied. A model of PEPC from a marine cyanobacterium is available, but no such model exists for cyanobacterial PEPCK [18]. In this work, a homology model for a cyanobacterial PEPCK is presented for the first time. Furthermore, the enzymes' physicochemical properties were characterized in silico.

Materials and Methodology
2.1. PEPCK Protein Sequences. Phosphoenolpyruvate carboxykinase (PEPCK) protein sequences were retrieved from the National Center for Biotechnology Information (NCBI) [http://www.ncbi.nlm.nih.gov/]. The search of the protein database yielded 21 cyanobacterial sequences from which 10 cyanobacterial enzymes were selected after redundant sequences were excluded. Bacterial, mammalian, and plant sequences were also obtained from the NCBI protein database (Table 1). The sequences were converted to FASTA format using the ReadSeQ sequence conversion server [19].
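The redundancy filter described above can be illustrated with a minimal sketch. The source does not state the criterion used to call two sequences redundant, so the example assumes the simplest one, exact sequence identity; the function names and the toy records are hypothetical and are not real PEPCK entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_redundant(records):
    """Keep the first record for each distinct sequence.

    Assumption: "redundant" means an exact duplicate sequence.
    """
    seen = set()
    unique = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


# Hypothetical toy input, not real PEPCK sequences.
example = """>seq_a
MKTAYIAK
QR
>seq_b duplicate of seq_a
MKTAYIAKQR
>seq_c
MSLNGTERG
"""

unique_records = drop_redundant(parse_fasta(example))
for header, seq in unique_records:
    print(header, len(seq))
```

A stricter filter (for example, clustering at a chosen identity threshold) would need a dedicated tool; the exact-match rule is used here only because it requires no assumptions beyond the sequences themselves.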

Sequence Alignments.
Multiple sequence alignments were performed with Clustal W [5]. The Clustal W alignment file was imported into the BoxShade sequence alignment editor. Identical and similar amino acids were shaded or colored. A phylogenetic tree of the protein sequences was generated from the alignment obtained with Clustal W.
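Sequence identity between two rows of an alignment can be computed in a few lines. The sketch below assumes one common definition (identical residues divided by the number of columns in which neither row has a gap); alignment programs may use other denominators, so values are not guaranteed to match those reported by any particular server. The two aligned rows are hypothetical toy strings.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned rows of equal length (toy data).
row_a = "GLSGTGKT-LIGDD"
row_b = "GLSGSGKTALVGDD"
print(round(percent_identity(row_a, row_b), 1))
```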

Structural Analysis.
The amino acid composition of the cyanobacterial PEPCK sequences was computed using the PEPSTATS analysis tool [24]. Physicochemical parameters such as the molecular weight, isoelectric point (pI), extinction coefficient, half-life, aliphatic index, amino acid property, instability index, and grand average of hydropathy (GRAVY) were calculated using the ProtParam tool of the ExPASy proteomics server. Secondary structure prediction was performed using the Network Protein Sequence Analysis (NPS@) server and the Secondary Structural Content Prediction (SSCP) server [6, 25]. The consensus secondary structure content and predicted disulfide patterns of each cyanobacterial PEPCK are tabulated in Table 3. The presence of disulfide bridges was analyzed using the CYS-REC tool, which predicts the most probable bonding patterns between available cysteine residues (http://linux1.softberry.com/berry.phtml). The 3D models of cyanobacterial PEPCKs were constructed using the protein structure homology model building program SWISS-MODEL with energy minimization parameters [21]. The modeled tertiary structures were built on the basis of sequence identity with the high-resolution crystal structures of the enzyme from Thermus thermophilus. The Swiss PDB viewer [26] was used to visualize and refine the models, and PyMOL (Schrödinger Inc.) was used to generate publishable images of the PEPCK models. The modeled 3D structures were evaluated and validated with the WHAT IF and RAMPAGE programs [22, 27].
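Two of the parameters listed above, GRAVY and the aliphatic index, have simple closed-form definitions. The following sketch is not the ProtParam implementation; it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index, AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names are hypothetical, and the lid peptide from this study is used only as a short example input.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Short example input: the active site lid peptide quoted in this study.
lid = "SKLAGTERGITAP"
print(round(gravy(lid), 3), round(aliphatic_index(lid), 1))
```

A negative GRAVY value indicates a net hydrophilic composition; both functions expect sequences containing only the 20 standard one-letter codes.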

Molecular Docking.
The Molecular Operating Environment (MOE) program was used to calculate the docking energies between oxaloacetate (OAA) and the active site of Cyanothece sp. PCC 7424 PEPCK. OAA was drawn using the MOE software [23]. The PDB file of the modeled Cyanothece sp. PCC 7424 PEPCK was imported into MOE, and a binding pocket in the area between the C-terminal and N-terminal domains was selected as the target area in a 3D docking box. OAA was imported to the protein's 3D box, and the docking program was initiated to search for favourable binding configurations between OAA and Cyanothece sp. PCC 7424 PEPCK. MOE's docking program flexes the substrate and attempts to fit it into the active site using an energy minimization algorithm. Docking studies were carried out at the Laboratory of Molecular Computations and Bioinformatics at Howard University, Washington, DC.

Results and Discussion
The primary structure analysis of cyanobacterial phosphoenolpyruvate carboxykinase enzymes showed that the most abundant amino acid is leucine, which accounts for 9% of the enzyme's primary structure. The least common amino acids were tryptophan and cysteine. Cysteine residues are much more prevalent in GTP-dependent PEPCKs; as such, a greater number of disulfide bridges were predicted for
A multiple sequence alignment of selected ATP- and GTP-dependent PEPCKs was performed (Figure 1). While sequence identity was insignificant between the two types, domains involved in catalysis and metal ion coordination were conserved between ATP- and GTP-dependent PEPCKs. A phylogenetic tree was constructed from the alignment (Figure 2). The cladogram illustrated the high degree of divergence between ATP- and GTP-utilizing enzymes. The multiple sequence alignment also revealed amino acid sequences that are only found in cyanobacterial enzymes and do not occur in PEPCKs from eukaryotes and other prokaryotic organisms (Figure 1).
The predicted secondary structure composition of Cyanothece sp. PCC 7424 PEPCK was determined using the NPS@ server, which generates a consensus report from twelve secondary structure prediction methods [6]. All PEPCKs shared similar α-helical and β-sheet content (Table 3). A more detailed analysis of the secondary structural elements of Cyanothece sp. PCC 7424 PEPCK was performed using the PDBsum tool [20]. The secondary structure prediction server revealed that 26.05% of amino acids resided in helices, while 16.26% of residues were in β-sheets. The rest of the amino acids were found in other conformations such as β-hairpins and β-turns. The topology of Cyanothece sp. PCC 7424 PEPCK is illustrated in Figure 3(b). The overall fold of the enzyme is similar to other PEPCKs for which crystal structures are available. The protein's main chain consists of a C-terminal domain and an N-terminal domain that are both folded into a mixed α/β topology.
The predicted secondary structures generated by the PDBsum tool were generally in agreement with the three-dimensional structure of Cyanothece sp. PCC 7424 PEPCK (Figure 3(a)). However, while the enzyme's computed schematic diagram illustrated the existence of α-helices and β-sheets in the N-terminal domain between residues 50 and 88, no such structures exist in the corresponding region of the 3D model. This may be due to the fact that PEPCKs across species align poorly in this particular area of the N-terminal domain. According to the schematic diagram, the active site lid 417 SKLAGTERGITAP 428 is located between two strands. This was accurately represented in the 3D structure. The same can be said for the locations of the kinase-1a and kinase 2 motifs as well as the ATP-binding site.
The homology model of Cyanothece sp. PCC 7424 PEPCK was generated with the SWISS-MODEL software using Thermus thermophilus PEPCK as a template (Figure 4). The overall fold of the Cyanothece sp. PCC 7424 enzyme is similar to that of Thermus thermophilus PEPCK, as expected from a sequence identity score of 51% between the two proteins. According to the Ramachandran plot generated with the RAMPAGE server, 96.2% of residues are found in the most favoured region, while 2.7% of amino acids reside in the generously allowed region (Figure 5). Furthermore, the model's calculated QMEAN4 score of 0.771 along with a Z-score of 0.3 confirmed its reliability [28]. The protein is organized into a C-terminal domain and an N-terminal domain with mixed α/β content. The active site is located in a deep cleft between the two domains.
Based on what is known of ATP-dependent PEPCKs, the purported active site residues of Cyanothece sp. PCC 7424 PEPCK are R100, K240, K241, H260, K282, D297, K316, and R362. In the well-characterized E. coli enzyme, the lysine residues homologous to K240 and K241 are involved in OAA and PEP binding [28]. Furthermore, R100 along with R362 are thought to be involved in pyruvate binding; additionally, R100 and Y235 form H-bonds with CO2 [29]. The residues form a positively charged active site that is well suited for OAA and PEP binding (Figure 6). Metal ion coordination is also an important aspect of the catalytic mechanism of PEPCK. In Cyanothece sp. PCC 7424 PEPCK, active site residues H260, D297, and K316 are most likely involved in Mn2+ or Mg2+ binding. Furthermore, the cyanobacterial protein features the sequence 306 IFNFEGCYAK 315, which forms the Ca2+-binding domain in ATP-dependent PEPCKs [30].
Cyanobacterial PEPCKs also feature the nucleotide-binding motif kinase-1a, whose consensus sequence is GXXGXGKT [31]. This motif, which forms the phosphate-binding loop, is common in ATPase enzymes. Another motif found in cyanobacterial PEPCKs is kinase 2 with the consensus sequence XXXXD, where X is a hydrophobic amino acid. This motif is often present in nucleotide-binding proteins. In the C-terminal domain of Cyanothece sp. PCC 7424 PEPCK, the kinase-1a motif is 276 GLSGTGKT 283, and the kinase 2 motif is 293 LIGDD 297. In E. coli PEPCK, D296 and D297 are involved in Mn2+ binding. The N-terminal domain of Cyanothece sp. PCC 7424 PEPCK includes two additional motifs that appear to be exclusive to cyanobacterial enzymes: 48 PTYGLE 53 and 203 GLHGDPE/I 209. The adenine-binding motif, 482 RIPIKHT 488, is found in the C-terminal domain of Cyanothece sp. PCC 7424 PEPCK. H487 is found in all cyanobacterial sequences, while the equivalent amino acid is found as either a valine or aspartate residue in other ATP-dependent PEPCKs (Figure 1).
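A consensus such as GXXGXGKT translates directly into a regular expression, with each X replaced by a single-character wildcard. The sketch below is only an illustration of that idea and was not part of the original analysis; the helper name and the toy sequence are hypothetical, and the toy sequence simply embeds the kinase-1a motif reported above.

```python
import re

# Kinase-1a consensus GXXGXGKT, where X is any residue.
KINASE_1A = re.compile(r"G..G.GKT")


def find_motif(seq, pattern=KINASE_1A, offset=1):
    """Return (position, match) pairs; positions are 1-based by default.

    Note: re.finditer reports non-overlapping matches only.
    """
    return [(m.start() + offset, m.group())
            for m in pattern.finditer(seq.upper())]


# Hypothetical toy sequence containing the reported motif GLSGTGKT.
toy = "MAAAGLSGTGKTLLS"
print(find_motif(toy))
```

Applied to a full-length sequence, the reported position would be the residue number of the first glycine of the motif, provided the input starts at residue 1.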
The sequence 417 SKLAGTERGITAP 428 makes up the active site lid, which is prevalent in ATP-utilizing PEPCKs [32]. This loop stands above the active site cleft and prevents water molecules from interfering with the substrate (Figure 4). The Ω-loop is also present in GTP-dependent enzymes, but it has a different location and sequence. The 3D model of Cyanothece sp. PCC 7424 PEPCK generated with the SWISS-MODEL software was that of the unliganded, lid-open conformation of the enzyme. Additional models of the protein were built with the Phyre2 and ESyPred3D automated homology modeling programs using default parameters [33, 34]. The generated 3D structures depicted the enzyme in the lid-closed conformation, as both programs used the ATP-Mg2+-oxalate ternary complex of E. coli PEPCK as a template [35].
The MOE software correctly predicted the binding site of Cyanothece sp. PCC 7424 PEPCK (Figure 7). While the program flexes the substrate, the conformation of the modeled protein is left unchanged when energy minimization calculations are being performed. The substrate is able to bind to the active site when the lid is in the open conformation, but the lid must close in order for all active site residues to interact with the substrate and to initiate catalysis [36]. Furthermore, the binding of ATP to the nucleotide binding site causes a conformational change which correlates with lid closure and catalytic activity [37].

Conclusion
This study provided the 3D structure of Cyanothece sp. PCC 7424 PEPCK through a homology modeling approach. This particular enzyme was modeled because of this group's interest in investigating the role of PEPCK in overall carbon assimilation by Cyanothece sp. PCC 7424, an obligate autotroph with unique metabolic properties. Docking studies of the interaction of OAA with the enzyme's active site showed that the spatial arrangement of catalytic amino acids in the cyanobacterial enzyme was similar to that of other modeled PEPCKs; this suggests a catalytic mechanism that is similar to other forms of the enzyme. In general, the enzyme's primary structure varies considerably among species, suggesting significant evolutionary divergence. The precise role of PEPCK in cyanobacteria has yet to be investigated, and whether PEPCK is present in marine cyanobacteria remains to be seen. More work will be needed to determine the potential role of PEPCK in cyanobacterial carbon fixation.

Figure 2: Cladogram of selected PEPCKs based on amino acid sequences of ATP- and GTP-dependent enzymes.

Figure 3: Schematic and topology diagrams showing the secondary structural elements in the Cyanothece sp. PCC 7424 PEPCK. (a) α-helices are labeled with the letter "H", and β-strands are lettered in uppercase. β, γ, and hairpin turns are also labeled. (b) Helices are represented as cylinders and β-strands as arrows. The secondary motif map and topology diagram were calculated using the PDBsum tool [20].

Figure 4: Predicted 3D structure of the Cyanothece sp. PCC 7424 PEPCK. The model was generated with SWISS-MODEL using PDB template 1xkvB. PyMOL was used to visualize the model. The active site lid is shown as a purple loop in the C-terminal domain [21].

Figure 5: The Ramachandran plot for the modeled Cyanothece sp. PCC 7424 PEPCK. 96.2% of residues are found in the most favoured regions, 2.7% of residues are in the most allowed regions, and 1.1% of residues are found in the outlier regions. The plot was generated with the RAMPAGE program [22].

Figure 6: The electrostatic surface representation of the active site of Cyanothece sp. PCC 7424 PEPCK generated by MOE [23]. The surface residues are color coded based on their nature: the blue coloration indicates that the enzyme's substrate-binding pocket is positively charged (MOE; Chemical Computing Group, Inc.).

Figure 7: (a) Predicted 3D structure of liganded Cyanothece sp. PCC 7424 PEPCK. The binding pocket is shown in red with the bound OAA substrate in green. (b) Active site of Cyanothece sp. PCC 7424 PEPCK showing active site residues as sticks and OAA as spheres. The model was generated with SWISS-MODEL, and docking studies were performed with MOE (Chemical Computing Group, Inc.) [21, 23].

Table 3: Predicted consensus secondary structure content and predicted disulfide patterns of cyanobacterial PEPCKs. The data were generated from the Protein Sequence Analysis server and CYS-REC (http://linux1.softberry.com/berry.phtml).
[6]s indicates that the enzyme is likely to precipitate in acidic buffers (Table 2). The extinction coefficient (EC) of cyanobacterial and prokaryotic PEPCKs was lower than that of GTP-dependent enzymes. The instability indices (Ii) of bacterial PEPCKs were below 40, indicating that they would be stable in solution. The ProtParam tool calculated instability indices above 40 for selected eukaryotic PEPCKs [6]. All PEPCKs have negative GRAVY scores, attesting to their solubility in hydrophilic solvents. The aliphatic index (Ai), which evaluates the relative volume occupied by the side chains of hydrophobic amino acids, was generally higher in bacterial PEPCKs. A high aliphatic index indicates that a protein may remain stable over a wide range of temperatures.