We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank structures of proteins that bind to at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential for each residue was calculated. At a false positive rate of 5%, our method predicted NAD- and FAD-binding sites at true positive rates of 67.1% and 68.4%, respectively. Our method performs well in identifying FAD- and NAD-binding sites in proteins and, most importantly, allows the requirement for conservation of residue types and local structures in these binding sites to be verified.
1. Background
Over the past 12 years, projects involving structural genomics have generated structural data for ~12,000 proteins within the Protein Data Bank (PDB) [1]. For most of these proteins, however, biological function is unknown. It is therefore important to develop computational methodologies that can identify a protein’s function from its structure. Many biochemical processes depend on interactions between proteins and cofactors, such as metal ions, vitamins, and adenine dinucleotides, for example, flavin adenine dinucleotide (FAD) and nicotinamide adenine dinucleotide (NAD). Adenine dinucleotides play important roles in many central biological processes, including DNA repair [2, 3], glycolysis, photosynthesis, and transcription [4–7]. By June 2010, 5293 proteins in PDB were annotated “nucleotide binding,” and nucleotides constitute ~15% of biologically relevant ligands [8]. These statistics demonstrate how ubiquitous and essential protein-nucleotide interactions are to biological processes.
Although protein-ligand interactions are fundamental to most biochemical reactions, structural information concerning these binding sites is still inadequate. Once ligand-binding sites can be predicted from structural data, putative functions can be assigned to these proteins. More complete annotation of protein function will benefit both basic science and the pharmaceutical industry. Mutations or deletions within these ligand-binding domains often alter biochemical reactions and are the root causes of many diseases. This makes binding sites attractive targets for drug therapies, including anticancer chemotherapy. In recent years computational methods have been used to identify ligand-binding sites within proteins. These methods include empirical approaches [9], support vector machines (SVM) [8, 10, 11], random forest [12, 13] and artificial neural networks [14], and structure comparison approaches [15–17]. These prediction methods can be divided into two broad categories: ones that use protein-sequence information, for example, amino acid composition, position-specific scoring matrix, and physicochemical properties, and ones that use protein-structure information, for example, dihedral angles, secondary structure, and 3D-structure comparison. The most effective prediction methodologies, however, tend to use a combination of sequence and structure data.
The structural genomics initiative resolves 20 new protein structures each week, and more than 60,000 structures have been deposited into PDB. The functional surfaces of proteins, which interact with cofactors, tend to be more structurally conserved than internal structures [18]. Residues that form a functional binding region are usually quite close to one another when the three-dimensional structure of a protein is examined. In addition, binding regions typically constitute only 10–30% of the entire protein [19–21]. We took advantage of previously generated structural information and used the fragment transformation method [22] to identify new binding sites for the NAD and FAD ligands.
2. Results
2.1. Residues that Bind NAD or FAD
To characterize the structural environment of NAD-/FAD-binding sites, we compared the amino acid composition of binding-site residues with that of whole proteins. The three-dimensional structure of the NAD/FAD molecule was divided into three moieties according to function. A residue type is listed below when the ratio of its binding-site frequency to its whole-protein frequency was greater than 1.2. Within the spherical environment of NAD, the adenosine-binding site typically contained glycine, isoleucine, tyrosine, and aspartic acid residues; the phosphate-binding site contained glycine, isoleucine, serine, threonine, methionine, phenylalanine, tyrosine, tryptophan, arginine, and histidine residues; and the nicotinamide-binding site contained serine, threonine, cysteine, phenylalanine, asparagine, tyrosine, tryptophan, and histidine residues. For FAD, adenosine was bound by glycine, valine, cysteine, and tryptophan; phosphate was bound by glycine, serine, and arginine; and flavin was bound by cysteine, methionine, phenylalanine, tyrosine, tryptophan, and histidine. As such, the binding residues were primarily polar residues containing charged groups, amide groups, and nucleophilic groups (Figure 1).
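The enrichment criterion described above (binding-site frequency divided by whole-protein frequency, with a cutoff of 1.2) can be sketched as follows. This is only an illustration: the function names and the toy residue strings are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter

def enrichment_ratios(binding_residues, all_residues):
    """Ratio of binding-site frequency to whole-protein frequency per residue type."""
    if not binding_residues or not all_residues:
        raise ValueError("both residue lists must be non-empty")
    site_counts = Counter(binding_residues)
    total_counts = Counter(all_residues)
    ratios = {}
    for aa, total in total_counts.items():
        site_freq = site_counts.get(aa, 0) / len(binding_residues)
        whole_freq = total / len(all_residues)
        ratios[aa] = site_freq / whole_freq
    return ratios

def preferred_residues(ratios, cutoff=1.2):
    """Residue types whose enrichment ratio exceeds the cutoff (1.2 in the text)."""
    return sorted(aa for aa, ratio in ratios.items() if ratio > cutoff)

# Toy one-letter sequences (hypothetical, for illustration only).
all_res = list("GGGGAAAALLLLDDSS")   # 16 residues in the whole protein
site_res = list("GGGD")              # 4 residues in the binding site
ratios = enrichment_ratios(site_res, all_res)
preferred = preferred_residues(ratios)
print(preferred)  # ['D', 'G']
```

In this toy case glycine has a ratio of 3.0 and aspartic acid a ratio of 2.0, so both pass the 1.2 cutoff, while residue types absent from the site have a ratio of 0.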
Figure 1: Amino acid frequencies within NAD-/FAD-binding sites. Frequencies within NAD-/FAD-binding sites (black) are compared with whole-protein frequencies (white). (a) Adenosine-binding of NAD. (b) Phosphate-binding of NAD. (c) Nicotinamide-binding of NAD. (d) Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flavin-binding of FAD. The preferred types of amino acids surrounding the different moieties of NAD/FAD are shown.
We also characterized the types of atoms that were within 3.5 Å of the three moieties of each NAD/FAD ligand (Figure 2). Nicotinamide and flavin moieties were most commonly associated with nitrogen and oxygen atoms within the backbone and side-chains of the protein. Phosphate moieties were commonly bound by backbone and side-chain nitrogen or side-chain oxygen. Each ligand moiety preferentially bound certain atoms within certain residues.
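A minimal sketch of the 3.5 Å contact analysis is given below, assuming atoms are supplied as plain tuples. The tuple layout, the helper name `contact_atom_counts`, and the toy coordinates are assumptions made for illustration; the backbone atom names follow standard PDB naming.

```python
import math
from collections import Counter

BACKBONE_ATOMS = {"N", "CA", "C", "O"}  # standard PDB backbone atom names

def contact_atom_counts(protein_atoms, moiety_coords, cutoff=3.5):
    """Count protein atoms within `cutoff` angstroms of any atom of a ligand moiety.

    protein_atoms: list of (atom_name, element, (x, y, z)) tuples.
    moiety_coords: list of (x, y, z) tuples for one ligand moiety.
    Returns a Counter keyed by (location, element), where location is
    "backbone" or "side-chain".
    """
    counts = Counter()
    for atom_name, element, xyz in protein_atoms:
        if any(math.dist(xyz, m) <= cutoff for m in moiety_coords):
            location = "backbone" if atom_name in BACKBONE_ATOMS else "side-chain"
            counts[(location, element)] += 1
    return counts

# Toy coordinates (hypothetical, not taken from any PDB entry).
protein_atoms = [
    ("N", "N", (0.0, 0.0, 0.0)),     # backbone nitrogen, 3.0 A from the moiety atom
    ("OG", "O", (3.0, 2.0, 0.0)),    # side-chain oxygen, 2.0 A away
    ("CD1", "C", (12.0, 0.0, 0.0)),  # side-chain carbon, 9.0 A away
]
phosphate_atoms = [(3.0, 0.0, 0.0)]
contacts = contact_atom_counts(protein_atoms, phosphate_atoms)
print(dict(contacts))
```

Only the first two atoms fall within the cutoff, so the result contains one backbone nitrogen and one side-chain oxygen.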
Figure 2: Atom-type frequencies within NAD-/FAD-binding sites. Frequencies for both backbone (black) and side-chain (white) atoms are shown. (a) Adenosine-binding of NAD. (b) Phosphate-binding of NAD. (c) Nicotinamide-binding of NAD. (d) Adenosine-binding of FAD. (e) Phosphate-binding of FAD. (f) Flavin-binding of FAD. The preferred types of atoms surrounding the different moieties of NAD/FAD are shown.
2.2. Prediction Performance
We chose two criteria to evaluate the performance of our binding-site predictions: performance at a false positive rate (FPR) of less than 5% and the Matthews correlation coefficient (MCC). We used a combination of features that included the number of aligned residues, RMSD, BLOSUM, and DSSP. Using a 5% FPR threshold, NAD-binding sites were predicted with an accuracy of 93.46%, a sensitivity of 67.09%, and an MCC of 0.52. Under these same conditions, FAD-binding-site predictions yielded 93.59% accuracy, 68.43% sensitivity, and an MCC of 0.54 (Table 1). When the MCC was maximized, NAD-binding residues were identified with 95.34% accuracy, 57.88% sensitivity, 97.64% specificity, and an MCC of 0.57. Under these same conditions, FAD-binding residues were identified with an accuracy of 94.33%, a sensitivity of 64.13%, a specificity of 96.27%, and an MCC of 0.55 (Table 2). These data indicate that our method can predict binding residues for both ligands.
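One way to pick a score cutoff that respects a 5% FPR limit is sketched below. This is an assumed procedure for illustration, not necessarily the one used in the study: it scans candidate thresholds from high to low and keeps the lowest one whose FPR stays at or below the limit. The scores and labels are invented toy values.

```python
def rates_at(scores, labels, threshold):
    """Return (FPR, TPR) when residues with score >= threshold are called binding."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

def threshold_at_fpr(scores, labels, max_fpr=0.05):
    """Lowest threshold whose FPR does not exceed max_fpr (None if no threshold qualifies)."""
    best = None
    for t in sorted(set(scores), reverse=True):
        fpr, _ = rates_at(scores, labels, t)
        if fpr > max_fpr:
            break  # FPR can only grow as the threshold is lowered
        best = t
    return best

# Toy data (hypothetical): 3 binding residues and 20 nonbinding residues.
scores = [3.0, 2.5, 1.0] + [2.0, 1.5] + [0.0] * 18
labels = [True] * 3 + [False] * 20
cutoff = threshold_at_fpr(scores, labels)
fpr, tpr = rates_at(scores, labels, cutoff)
print(cutoff, fpr, tpr)
```

With 20 nonbinding residues, a 5% limit tolerates one false positive, so the cutoff settles at 2.0 and two of the three binding residues are recovered.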
Table 1: The performance of binding-site predictions at a 5% FPR threshold.

| Ligand | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|
| NAD | 93.46 | 67.09 | 95.08 | 0.52 |
| FAD | 93.59 | 68.43 | 95.22 | 0.54 |
Table 2: The performance of binding-site predictions at a maximum MCC threshold.

| Ligand | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|
| NAD | 95.34 | 57.88 | 97.64 | 0.57 |
| FAD | 94.33 | 64.13 | 96.27 | 0.55 |
2.3. Comparison with Other Methods
We next compared our results with other prediction methodologies. For these comparisons we chose two published methods that use similar criteria for analyzing these kinds of ligand-protein complexes [10, 11]. These methods assign binding or nonbinding status to each residue within NAD-/FAD-binding proteins. Because these published methods use equal numbers of binding and nonbinding residues, we applied our prediction method to a similarly balanced dataset to make the results comparable. Nonbinding residues within each ligand-protein complex were randomly sampled five times so that each protein contributed equal numbers of binding and nonbinding residues. For NAD-binding proteins, our method predicted binding residues with an average sensitivity of 86.21% and an MCC of 0.75, compared with 86.13% and 0.75 for the method developed by Ansari and Raghava [10] (Table 3). For FAD-binding proteins, our method yielded an average sensitivity of 85.68% and an MCC of 0.75, compared with 83.36% and 0.66 for the method developed by Mishra and Raghava [11] (Table 4). Our method therefore performs similarly to the published method for NAD-binding sites and better for FAD-binding sites. In native proteins, however, the numbers of binding and nonbinding residues are not equal, so results obtained with balanced datasets should be interpreted with caution.
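The balanced sampling used for this comparison can be sketched as below, assuming that for each protein the nonbinding residues are drawn without replacement to match the number of binding residues. The function name, the fixed seed, and the residue labels are hypothetical.

```python
import random

def balanced_samples(binding, nonbinding, n_repeats=5, seed=0):
    """Draw n_repeats random nonbinding subsets, each the same size as the binding set.

    Returns a list of (binding, sampled_nonbinding) pairs for one protein.
    A fixed seed is used here only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    k = min(len(binding), len(nonbinding))
    return [(list(binding), rng.sample(list(nonbinding), k)) for _ in range(n_repeats)]

# Toy protein (hypothetical residue labels): 3 binding and 48 nonbinding residues.
binding = ["G10", "G12", "D33"]
nonbinding = [f"X{i}" for i in range(48)]
samples = balanced_samples(binding, nonbinding)
for pos, neg in samples:
    print(len(pos), len(neg))
```

Each of the five draws yields three binding and three nonbinding residues; metrics computed on the five draws can then be averaged, as in the "Average" rows of the comparison tables.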
Table 3: Comparison between the fragment transformation and SVM methods for predicting NAD-binding-site residues.

| Dataset | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|
| Random 1 | 87.46 | 86.45 | 88.48 | 0.75 |
| Random 2 | 87.23 | 85.79 | 88.67 | 0.74 |
| Random 3 | 87.38 | 85.65 | 89.11 | 0.75 |
| Random 4 | 87.46 | 86.91 | 88.01 | 0.75 |
| Random 5 | 87.38 | 86.25 | 88.51 | 0.75 |
| Average | 87.38 | 86.21 | 88.56 | 0.75 |
| SVM [10] | 87.25 | 86.13 | 88.37 | 0.75 |
Table 4: Comparison between the fragment transformation and SVM methods for predicting FAD-binding-site residues.

| Dataset | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|
| Random 1 | 87.38 | 85.68 | 89.08 | 0.75 |
| Random 2 | 87.48 | 85.73 | 89.23 | 0.75 |
| Random 3 | 87.35 | 85.55 | 89.15 | 0.75 |
| Random 4 | 87.58 | 85.73 | 89.43 | 0.75 |
| Random 5 | 87.44 | 85.73 | 89.15 | 0.75 |
| Average | 87.45 | 85.68 | 89.21 | 0.75 |
| SVM [11] | 82.86 | 83.36 | 82.36 | 0.66 |
2.4. Template Matching
Figures 3–6 show alignments of predicted NAD-/FAD-binding proteins and corresponding templates. Structures within these figures were drawn using PyMOL [23] and color coded: light gray for the query protein; blue lines for the ligand; hot pink, orange, and forest sticks for adenosine-, phosphate-, and nicotinamide-/flavin-binding residues that are predicted correctly; and dark gray sticks for nonbinding residues that are predicted to be binding residues. Our method accurately identified 21 NAD-binding residues within chain A of D-2-hydroxyisocaproate dehydrogenase (PDB ID:1DXY) [24, 25], with ten false positives (Figure 3). Nine nicotinamide-binding residues were identified based on D-Lactate dehydrogenase (chain A; PDB ID:3KB6) [26, 27], three phosphate-binding residues were identified based on phosphoglycerate dehydrogenase (chain A; PDB ID:1YBA) [28], five adenosine-binding residues were identified based on C-terminal-binding protein/brefeldin A-ADP ribosylated substrate (chain A; PDB ID:1HKU) [29], and four were identified based on other protein templates. Our method also accurately predicted 23 NAD-binding residues within chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDB ID:2D4E), with only eight false positives (Figure 4). Nine nicotinamide-binding residues were identified based on aldehyde dehydrogenase (chain A; PDB ID:3B4W), three phosphate-binding and eight adenosine-binding residues were identified based on 1-pyrroline-5-carboxylate dehydrogenase (chain A; PDB ID:2EHU), and three were identified based on other protein templates.
Figure 3: Identification of NAD-binding sites. (a) Chain A of D-2-hydroxyisocaproate dehydrogenase (PDB ID:1DXY) was the query protein. Templates were constructed from (b) D-Lactate dehydrogenase (chain A; PDB ID:3KB6), (c) phosphoglycerate dehydrogenase (chain A; PDB ID:1YBA), and (d) C-terminal-binding protein/brefeldin A-ADP ribosylated substrate (chain A; PDB ID:1HKU).
Figure 4: Identification of NAD-binding sites. (a) Chain C of 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (PDB ID:2D4E) was the query protein. Templates were constructed from (b) aldehyde dehydrogenase (chain A; PDB ID:3B4W) and (c) 1-pyrroline-5-carboxylate dehydrogenase (chain A; PDB ID:2EHU).
Figure 5: Identification of FAD-binding sites. (a) Chain A of deoxyribodipyrimidine photolyase (PDB ID:1OWL) was the query protein. Templates were constructed from (b) human cryptochrome DASH (chain X; PDB ID:2IJG), (c) photolyase-like domain of cryptochrome 1 (chain A; PDB ID:1U3C), and (d) photolyase (chain A; PDB ID:1IQR).
Figure 6: Identification of FAD-binding sites. (a) Chain H of D-amino acid oxidase (PDB ID:1DDO) was the query protein. Templates were constructed from (b) putidaredoxin reductase (chain B; PDB ID:1Q1R), (c) D-amino acid oxidase (chain A; PDB ID:1C0I), and (d) glycine oxidase (chain B; PDB ID:1NG3).
For the FAD-binding proteins, our method accurately predicted 24 FAD-binding residues within chain A of deoxyribodipyrimidine photolyase (PDB ID:1OWL) [30], with only six false positives (Figure 5). Three adenosine-binding residues were identified based on human cryptochrome DASH (chain X; PDB ID:2IJG) [31, 32], six phosphate-binding residues were identified based on photolyase-like domain of cryptochrome 1 (chain A; PDB ID:1U3C) [33], eleven flavin-binding residues were identified based on photolyase (chain A; PDB ID:1IQR) [34], and four were identified based on other protein templates. In addition, 30 FAD-binding residues were accurately predicted within chain H of D-amino acid oxidase (PDB ID:1DDO) [35], with 14 false positives (Figure 6). Five adenosine-binding residues were predicted based on putidaredoxin reductase (chain B; PDB ID:1Q1R) [36, 37], three adenosine-binding and nine flavin-binding residues based on D-amino acid oxidase (chain A; PDB ID:1C0I) [38], three phosphate-binding and five flavin-binding residues based on glycine oxidase (chain B; PDB ID:1NG3) [39], and five based on other protein templates.
3. Discussion
Small molecular cofactors (ligands) are essential for cells to perform numerous biological functions. NAD and FAD, for example, bind to proteins that play critical roles in energy transfer, energy storage, and signal transduction, to name just a few. To understand the mechanism by which these ligands affect protein function, it is important to identify ligand-binding residues within relevant proteins. The experimental identification of these interacting residues is so difficult, however, that computational methods to accomplish this task are in high demand.
Here we developed a structure comparison method that uses both sequence and structure information to predict NAD-/FAD-binding residues within proteins. This approach also provides valuable information concerning the microenvironment of the protein-ligand interaction. The composition of the NAD-/FAD-binding residues that we identified is generally similar to that reported in previous studies [10, 11]. Interestingly, glycine was the most frequent binding residue, binding to NAD through the phosphate or adenosine moieties more often than through the nicotinamide moiety. In contrast, arginine preferentially interacted with the phosphate moiety and aspartic acid preferentially interacted with the adenosine moiety of NAD, whereas threonine, cysteine, and histidine bound to nicotinamide. The most common residue within FAD-binding sites was also glycine, which preferentially bound the phosphate and adenosine moieties. Serine interacted with the phosphate moiety, whereas cysteine, tyrosine, and tryptophan primarily bound to the flavin moiety. Structural information of this kind may reveal further details of these critical binding sites. To investigate the influence of amino acid type on prediction performance, the sensitivity and specificity associated with each residue type were calculated (Figure 7). For NAD-binding-site predictions, specificity for each residue type was excellent (0.927–0.966), but sensitivity was below 0.5 for phenylalanine, tryptophan, arginine, and glutamine. For FAD-binding sites, all residue types achieved high specificity (0.933–0.971), with sensitivity ranging from 0.532 to 0.791. It should be noted that the ratio of NAD-/FAD-binding residues to nonbinding residues is about 1 to 16 in our dataset. With such an imbalance, even a low false positive rate produces many false positives relative to the number of true binding residues. This likely explains why our predictions show high specificity and accuracy but comparatively low sensitivity.
Hence, the positions of false positive residues in sequence were also investigated; 20% and 25% of the false positive residues in the NAD- and FAD-binding predictions, respectively, occurred next to true positive residues in sequence. These residues were also located near the ligand in coordinate space. If these residues were treated as true positives, our NAD-binding predictions yielded 71.55% sensitivity and an MCC of 0.61 at a 5% FPR threshold. Under the same conditions, FAD-binding-site predictions yielded 73.34% sensitivity and an MCC of 0.64. Unlike other prediction methods, ours did not use protein evolutionary information, relying instead on protein structure, and it did not require a training dataset with equal numbers of binding and nonbinding residues; instead, it made predictions for whole proteins by comparing their structures against a template database. Our method predicted NAD-/FAD-binding residues well and thus provides important details concerning the binding-site microenvironment. This approach, therefore, may be used to predict putative NAD-/FAD-binding proteins and the specific residues involved in the interaction.
Figure 7: Sensitivity and specificity associated with each amino acid in NAD-/FAD-binding-site predictions. (a) NAD. (b) FAD.
4. Methods
4.1. Overview
We extracted structures of proteins bound to NAD or FAD from PDB and constructed a database of NAD-/FAD-binding residue templates. Residues that were defined as binding residues by the ligand-binding database BioLiP [40] were included in the template. Query protein structures were then compared with each template in the database using a “leave-one-out” comparison method. The fragment transformation method [22] was used to align query and template structures. After comparing the local protein structure, each residue was assigned a score based on both protein sequence and structure. Sequence similarity was calculated using the BLOSUM62 substitution matrix [41], whereas structural similarity was calculated by measuring the root mean square deviation (RMSD) of the Cα carbons from local structure alignments and using a secondary structure substitution matrix [22] according to the Dictionary of Secondary Structure of Proteins’ (DSSP) definition of secondary structure [42]. Residues with an alignment score that exceeded a predetermined threshold were predicted to bind NAD/FAD. This method is illustrated in Figure 8.
Figure 8: Schematic of the method for predicting NAD-/FAD-binding sites.
4.2. NAD-/FAD-Binding Proteins and Binding Residue Templates
We adopted the same datasets as those used in previous research [10, 11]. All protein complexes were collected from PDB and, after filtering with CD-HIT, had pairwise sequence identity <40%. Protein chains that are not involved in NAD/FAD binding were excluded. Residues were defined as binding or nonbinding using the ligand-binding database BioLiP. The main dataset included 184 and 165 polypeptide chains for NAD and FAD, respectively. Because NAD is composed of a nicotinamide moiety, an adenosine moiety, and a phosphate moiety, binding residues were divided into three groups: nicotinamide binding, adenosine binding, and phosphate binding. FAD-binding sites similarly contain flavin-binding residues, adenosine-binding residues, and phosphate-binding residues. Groups that contained two or more binding residues were considered a binding residue template (see Figures 9 and 10).
We used the fragment transformation method to align NAD-/FAD-binding residues. Each residue was treated as an individual structural unit, a triplet formed by its N–Cα–C atoms, and was used to align the query protein $S$ with the binding template $T$. The query protein $S$ of length $m$ and the template $T$ of $n$ residues can therefore be expressed in terms of triplets as $S = \{\sigma_1, \sigma_2, \ldots, \sigma_m\}$ and $T = \{\tau_1, \tau_2, \ldots, \tau_n\}$, where $\sigma_i = (p_N, p_{C\alpha}, p_C)$, $\tau_j = (q_N, q_{C\alpha}, q_C)$, and $p$ and $q$ are the PDB coordinates of each atom.
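A matrix of dimensions $m \times n$ was then constructed for the residues of $S$ and $T$ as

$$M = \begin{pmatrix} M_{1,1} & M_{1,2} & \cdots & M_{1,n} \\ M_{2,1} & M_{2,2} & \cdots & M_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ M_{m,1} & M_{m,2} & \cdots & M_{m,n} \end{pmatrix}, \tag{1}$$

where the element $M_{i,j}$ is a rigid-body transformation matrix that transforms the triplet $\sigma_i$ to $\tau_j$ (i.e., $M_{i,j}\sigma_i = \tau_j$).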
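As an illustration of how a rigid-body transformation between two N–Cα–C triplets might be built, the sketch below attaches an orthonormal frame to each triplet (origin at Cα, first axis along Cα→C) and derives the rotation and translation that carry one frame onto the other. This frame-based construction is an assumption chosen for clarity and is not necessarily the procedure used in the original implementation; all function names and coordinates are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)

def local_frame(triplet):
    """Orthonormal axes and origin (the C-alpha atom) for an (N, CA, C) triplet."""
    n, ca, c = triplet
    e1 = unit(sub(c, ca))                  # first axis along CA -> C
    e3 = unit(cross(e1, sub(n, ca)))       # normal to the N-CA-C plane
    e2 = cross(e3, e1)                     # completes the right-handed frame
    return (e1, e2, e3), ca

def transform_between(sigma, tau):
    """Rotation matrix and translation that map sigma's frame onto tau's frame."""
    axes_s, origin_s = local_frame(sigma)
    axes_t, origin_t = local_frame(tau)
    # R = sum_k (tau axis k)(sigma axis k)^T sends each sigma axis to the matching tau axis.
    rot = tuple(
        tuple(sum(axes_t[k][r] * axes_s[k][c] for k in range(3)) for c in range(3))
        for r in range(3)
    )
    rotated_origin = tuple(dot(rot[r], origin_s) for r in range(3))
    return rot, sub(origin_t, rotated_origin)

def apply_transform(transform, point):
    rot, shift = transform
    return tuple(dot(rot[r], point) + shift[r] for r in range(3))

def transformation_matrix(query, template):
    """m x n nested list whose (i, j) entry maps query triplet i onto template triplet j."""
    return [[transform_between(s, t) for t in template] for s in query]

# Toy triplets (hypothetical): tau is sigma rotated 90 degrees about z and shifted by (5, 5, 5).
sigma = ((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
tau = ((4.0, 5.0, 5.0), (5.0, 5.0, 5.0), (5.0, 6.0, 5.0))
M = transformation_matrix([sigma], [tau])
mapped = tuple(apply_transform(M[0][0], atom) for atom in sigma)
print(mapped)
```

Because the toy triplets have identical internal geometry, the transformed query triplet coincides with the template triplet; for real residues, small differences in bond lengths and angles mean the match is only approximate.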
4.4. Performing Triplet Clustering
$D_{kl}^{ij}$ is the Cartesian distance between the target $\tau_l$ and the transformed triplet $M_{i,j}\sigma_k$, providing a measure of how similarly the triplet pairs $(\sigma_i, \tau_j)$ and $(\sigma_k, \tau_l)$ are oriented. This allows clustering of triplet fragments using the single-linkage algorithm [43] as follows. If, for two triplet pairs $(\sigma_i, \tau_j)$ and $(\sigma_k, \tau_l)$, $D_{kl}^{ij} < D_0$ with $i \neq k$ and $j \neq l$, then the triplets are clustered. Let $G_1$ and $G_2$ be two clusters, with the first containing $(\sigma_i, \tau_j)$ and $(\sigma_k, \tau_l)$ and the second containing $(\sigma_{i'}, \tau_{j'})$ and $(\sigma_{k'}, \tau_{l'})$. If $D_{k'l'}^{ij} < D_0$, then $G_1$ and $G_2$ are merged to form a new cluster $G_3$, where $G_3 = G_1 \cup G_2$. These procedures are performed iteratively until no new clusters can be formed. For each final cluster $G_\mu$, we can obtain the transformation matrix $M_{k,l}^{\mu}$ and the aligned substructure pair $S_\mu = \bigcup_{\sigma_k \in G_\mu} \sigma_k$ and $T_\mu = \bigcup_{\tau_l \in G_\mu} \tau_l$, where $G_\mu$ has the minimum Cartesian distance when using $M_{k,l}^{\mu}$.
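The single-linkage step can be sketched with a union-find structure, as below. To keep the example short, the distance between two triplet pairs is supplied as a callable, and the toy data use a translation-only stand-in for the rigid-body transformation; both choices, along with all names and coordinates, are assumptions made for illustration. The distance is evaluated in one direction only, which is a further simplification.

```python
import math

def single_linkage(pairs, distance, d0):
    """Cluster (i, j) triplet pairs: two pairs are linked when they share no
    query or template residue and distance(pair_a, pair_b) < d0."""
    parent = list(range(len(pairs)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            (i, j), (k, l) = pairs[a], pairs[b]
            if i != k and j != l and distance(pairs[a], pairs[b]) < d0:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for idx, pair in enumerate(pairs):
        clusters.setdefault(find(idx), []).append(pair)
    return list(clusters.values())

# Toy C-alpha positions (hypothetical): query residues 0 and 1 are spaced like the template.
query = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 2: (20.0, 0.0, 0.0)}
template = {0: (10.0, 0.0, 0.0), 1: (13.8, 0.0, 0.0)}

def toy_distance(pair_a, pair_b):
    """Translation-only stand-in for D: shift query k by (template j - query i), compare with template l."""
    (i, j), (k, l) = pair_a, pair_b
    shift = tuple(t - q for t, q in zip(template[j], query[i]))
    moved = tuple(q + s for q, s in zip(query[k], shift))
    return math.dist(moved, template[l])

pairs = [(i, j) for i in query for j in template]
clusters = single_linkage(pairs, toy_distance, d0=1.0)
largest = sorted(max(clusters, key=len))
print(largest)  # [(0, 0), (1, 1)]
```

The pairs (0, 0) and (1, 1) are mutually consistent under the same shift and therefore end up in one cluster, whereas every other pair remains a singleton.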
4.5. Scoring Function
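For each residue $i$, the binding score $C_i$ is defined as

$$C_i = \max_{\sigma_i \in G_\mu} \left( \varepsilon_\mu \times C_\mu^R \times C_\mu^B \times C_\mu^D \right), \tag{2}$$

where $\varepsilon_\mu$ is the number of triplets of $S_\mu$ (i.e., the aligned residues of the query structure). The alignment scores $C_\mu^R$, $C_\mu^B$, and $C_\mu^D$ are defined as

$$C_\mu^R = \frac{1}{1 + \mathrm{RMSD}(S_\mu, T_\mu)}, \quad C_\mu^B = \frac{\mathrm{BLOSUM}(S_\mu, T_\mu)}{\mathrm{BLOSUM}(T_\mu, T_\mu)} + 1, \quad C_\mu^D = \frac{\mathrm{DSSP}(S_\mu, T_\mu)}{\mathrm{DSSP}(T_\mu, T_\mu)} + 1, \tag{3}$$

where $\mathrm{RMSD}(S_\mu, T_\mu)$ is the RMSD of all Cα atoms between $S_\mu$ and $T_\mu$, $\mathrm{BLOSUM}(S_\mu, T_\mu)$ is the sequence alignment score between $S_\mu$ and $T_\mu$ calculated using the BLOSUM62 [41] substitution matrix, $\mathrm{BLOSUM}(T_\mu, T_\mu)$ is the maximum sequence alignment score of $T_\mu$, $\mathrm{DSSP}(S_\mu, T_\mu)$ represents the secondary structure alignment score between $S_\mu$ and $T_\mu$ based on a secondary structure substitution matrix [22] using the DSSP definition [42], and $\mathrm{DSSP}(T_\mu, T_\mu)$ is the maximum secondary structure alignment score of $T_\mu$. The value of $\mathrm{RMSD}(S_\mu, T_\mu)$ should be <3 Å.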
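A minimal numerical sketch of the scoring function follows. It assumes that the "+1" terms are added to the BLOSUM and DSSP ratios (which keeps each factor non-negative when an alignment score is negative) and that alignments with an RMSD of 3 Å or more are discarded. The alignment scores are passed in as precomputed numbers, and every name and value here is hypothetical.

```python
def alignment_score(n_aligned, rmsd, blosum_st, blosum_tt, dssp_st, dssp_tt, max_rmsd=3.0):
    """Score of one aligned cluster: n_aligned * C_R * C_B * C_D, or None if RMSD is too large."""
    if rmsd >= max_rmsd:
        return None
    c_r = 1.0 / (1.0 + rmsd)            # structural term
    c_b = blosum_st / blosum_tt + 1.0   # sequence term (assumed placement of the +1)
    c_d = dssp_st / dssp_tt + 1.0       # secondary structure term (same assumption)
    return n_aligned * c_r * c_b * c_d

def binding_score(cluster_scores):
    """C_i: the maximum score over all clusters that contain residue i (0.0 if none qualify)."""
    valid = [s for s in cluster_scores if s is not None]
    return max(valid) if valid else 0.0

# Toy clusters containing the same query residue (hypothetical values).
score_a = alignment_score(4, 1.0, 10, 20, 4, 4)   # 4 * 0.5 * 1.5 * 2.0 = 6.0
score_b = alignment_score(3, 0.0, 15, 15, 3, 3)   # 3 * 1.0 * 2.0 * 2.0 = 12.0
score_c = alignment_score(5, 3.5, 20, 20, 5, 5)   # rejected: RMSD is not below 3 A
c_i = binding_score([score_a, score_b, score_c])
print(c_i)  # 12.0
```

The smaller but perfectly matched cluster outscores the larger one with a 1 Å RMSD and a weaker sequence match, showing how the three alignment terms trade off against the number of aligned residues.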
For each residue $i$, we predict a geometric center $\Theta_i^{\omega}$ of the ligand by $\Theta_i^{\omega} = (M_{k,l}^{\mu})^{-1} L^{\omega}$, where $L^{\omega}$ is the geometric center of the binding template type $\omega$ in template $T$. Here $\omega$ represents the three moieties of NAD/FAD: nicotinamide, adenosine, and phosphate for NAD; flavin, adenosine, and phosphate for FAD. The binding score $C_k$ is added to $C_i$ if the distance between $\Theta_i^{\omega}$ and $\Theta_k^{\omega'}$ is between 3 and 9 Å and $\omega \neq \omega'$. Finally, the normalized binding score $Z_i^C$ is calculated as

$$Z_i^C = \frac{C_i - \bar{C}}{SD_C}, \tag{4}$$

where $\bar{C}$ and $SD_C$ denote the mean and standard deviation, respectively, of the binding score $C_i$.
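The score reinforcement and normalization steps can be sketched as follows. Two details are assumptions, since the text does not specify them: the scores added are the original (pre-reinforcement) values of the neighboring residues, and the standard deviation is the population form. Names and toy values are hypothetical.

```python
import math
import statistics

def reinforced_scores(scores, centers, moieties, lo=3.0, hi=9.0):
    """Add C_k to C_i when the predicted ligand centers of residues i and k lie
    between lo and hi angstroms apart and belong to different moieties."""
    out = list(scores)
    for i in range(len(scores)):
        for k in range(len(scores)):
            if i == k or moieties[i] == moieties[k]:
                continue
            if lo <= math.dist(centers[i], centers[k]) <= hi:
                out[i] += scores[k]  # uses the original C_k (assumption)
    return out

def normalized_scores(scores):
    """Z-scores of the binding scores, using the population standard deviation (assumption)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        return [0.0] * len(scores)
    return [(c - mean) / sd for c in scores]

# Toy residues (hypothetical): two with mutually supporting predictions, one isolated.
raw = [6.0, 12.0, 3.0]
moieties = ["adenosine", "phosphate", "adenosine"]
centers = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
boosted = reinforced_scores(raw, centers, moieties)
print(boosted)  # [18.0, 18.0, 3.0]

z = normalized_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The first two residues predict ligand centers for different moieties that sit 5 Å apart, so each gains the other's score; the third residue has no compatible neighbor and keeps its raw score.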
4.6. Performance Assessment
The accuracy of predicting NAD-/FAD-binding sites was defined as the proportion of residues correctly classified as true positives or true negatives and was evaluated using a leave-one-out approach. Accuracy (ACC), the true positive rate (TPR), and the false positive rate (FPR) were calculated using true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values as follows:

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{TPR} = \mathrm{Sensitivity} = \frac{TP}{TP + FN}, \quad \mathrm{FPR} = 1 - \mathrm{Specificity} = \frac{FP}{FP + TN}. \tag{5}$$
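These measures can be computed directly from the four confusion-matrix counts, as in the sketch below. The MCC, which is reported in the Results but not written out in this section, is included using its standard definition. The counts are invented for illustration and mimic a 1-to-16 ratio of binding to nonbinding residues.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, TPR (sensitivity), FPR (1 - specificity), and MCC from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Standard Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "TPR": tpr, "FPR": fpr, "MCC": mcc}

# Toy counts (hypothetical): 100 binding and 1600 nonbinding residues.
metrics = confusion_metrics(tp=60, tn=1520, fp=80, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the FPR is exactly 5% and the sensitivity is 60%, yet the accuracy is close to 93% because nonbinding residues dominate; the MCC (about 0.47) gives a more balanced picture of performance on such skewed data.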
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Authors’ Contribution
Chih-Hao Lu and Chin-Sheng Yu developed and implemented the methods; Chih-Hao Lu, Chin-Sheng Yu, Yu-Feng Lin, and Jin-Yi Chen carried out the analysis; Chih-Hao Lu and Yu-Feng Lin drafted the paper; Chih-Hao Lu supervised the work. All the authors have read and approved the content of the final paper. Chih-Hao Lu and Chin-Sheng Yu contributed equally to this work.
Acknowledgment
This work was supported by Grants from the National Science Council (NSC 99-2113-M-039-002-MY2) and China Medical University (CMU99-N2-02), Taiwan, to Chih-Hao Lu. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the paper. The authors are grateful to Yeong-Shin Lin (National Chiao Tung University, Taiwan) for invaluable comments.
References
1. Berman H. M., Westbrook J., Feng Z., Gilliland G., Bhat T. N., Weissig H., Shindyalov I. N., Bourne P. E. The protein data bank. 2000;28(1):235–242. doi:10.1093/nar/28.1.235
2. Wilkinson A., Day J., Bowater R. Bacterial DNA ligases. 2001;40(6):1241–1248. doi:10.1046/j.1365-2958.2001.02479.x
3. Bürkle A. Physiology and pathophysiology of poly(ADP-ribosyl)ation. 2001;23(9):795–806. doi:10.1002/bies.1115
4. Zhang Q., Piston D. W., Goodman R. H. Regulation of corepressor function by nuclear NADH. 2002;295(5561):1895–1897. doi:10.1126/science.1069300
5. Smith J. S., Boeke J. D. An unusual form of transcriptional silencing in yeast ribosomal DNA. 1997;11(2):241–254. doi:10.1101/gad.11.2.241
6. Anderson R. M., Bitterman K. J., Wood J. G., Medvedik O., Cohen H., Lin S. S., Manchester J. K., Gordon J. I., Sinclair D. A. Manipulation of a nuclear NAD+ salvage pathway delays aging without altering steady-state NAD+ levels. 2002;277(21):18881–18890. doi:10.1074/jbc.M111773200
7. Rutter J., Reick M., Wu L. C., McKnight S. L. Regulation of clock and NPAS2 DNA binding by the redox state of NAD cofactors. 2001;293(5529):510–514. doi:10.1126/science.1060698
8. Chen K., Mizianty M. J., Kurgan L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. 2012;28(3):331–341. doi:10.1093/bioinformatics/btr657
9. Saito M., Go M., Shirai T. An empirical approach for detecting nucleotide-binding sites on proteins. 2006;19(2):67–75. doi:10.1093/protein/gzj002
10. Ansari H. R., Raghava G. P. S. Identification of NAD interacting residues in proteins. 2010;11, article 160. doi:10.1186/1471-2105-11-160
11. Mishra N. K., Raghava G. P. S. Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information. 2010;11(1), article S48. doi:10.1186/1471-2105-11-S1-S48
12. Liu Z.-P., Wu L.-Y., Wang Y., Zhang X.-S., Chen L. Prediction of protein-RNA binding sites by a random forest method with combined features. 2010;26(13):1616–1622. doi:10.1093/bioinformatics/btq253
13. Wang L., Liu Z. P., Zhang X. S., Chen L. Prediction of hot spots in protein interfaces using a random forest model with hybrid features. 2012;25(3):119–126. doi:10.1093/protein/gzr066
14. Chauhan J. S., Mishra N. K., Raghava G. P. S. Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information. 2010;11, article 301. doi:10.1186/1471-2105-11-301
15. Roy A., Zhang Y. Recognizing protein-ligand binding sites by global structural alignment and local geometry refinement. 2012;20(6):987–997. doi:10.1016/j.str.2012.03.009
16. Xie L., Bourne P. E. Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments. 2008;105(14):5441–5446. doi:10.1073/pnas.0704422105
17. Yang J., Roy A., Zhang Y. Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment. 2013;29(20):2588–2595. doi:10.1093/bioinformatics/btt447
18. Yan Y. T., Li W.-H. Identification of protein functional surfaces by the concept of a split pocket. 2009;76(4):959–976. doi:10.1002/prot.22402
19. Dill K. A. Dominant forces in protein folding. 1990;29(31):7133–7155. doi:10.1021/bi00483a001
20. Govindarajan S., Goldstein R. A. Evolution of model proteins on a foldability landscape. 1997;29(4):461–466.
21. Parisi G., Echave J. Structural constraints and emergence of sequence patterns in protein evolution. 2001;18(5):750–756. doi:10.1093/oxfordjournals.molbev.a003857
22. Lu C. H., Lin Y. S., Chen Y. C., Yu C. S., Chang S. Y., Hwang J. K. The fragment transformation method to detect the protein structural motifs. 2006;63(3):636–643. doi:10.1002/prot.20904
23. Schrodinger L. 2010.
24. Dengler U., Niefind K., Kieß M., Schomburg D. Crystal structure of a ternary complex of D-2-hydroxy-isocaproate dehydrogenase from Lactobacillus casei, NAD+ and 2-oxoisocaproate at 1.9 Å resolution. 1997;267(3):640–660. doi:10.1006/jmbi.1996.0864
25. Gross E., Sevier C. S., Vala A., Kaiser C. A., Fass D. A new FAD-binding fold and intersubunit disulfide shuttle in the thiol oxidase Erv2p. 2002;9(1):61–67. doi:10.1038/nsb740
26. Antonyuk S. V., Strange R. W., Ellis M. J., Bessho Y., Kuramitsu S., Inoue Y., Yokoyama S., Hasnain S. S. Structure of d-lactate dehydrogenase from Aquifex aeolicus complexed with NAD+ and lactic acid (or pyruvate). 2009;65, part 12:1209–1213. doi:10.1107/S1744309109044935
27. Wu C. K., Dailey T. A., Dailey H. A., Wang B. C., Rose J. P. The crystal structure of augmenter of liver regeneration: a mammalian FAD-dependent sulfhydryl oxidase. 2003;12(5):1109–1118. doi:10.1110/ps.0238103
28. Thompson J. R., Bell J. K., Bratt J., Grant G. A., Banaszak L. J. Vmax regulation through domain and subunit changes. The active form of phosphoglycerate dehydrogenase. 2005;44(15):5763–5773. doi:10.1021/bi047944b
29. Nardini M., Spanò S., Cericola C., Pesce A., Massaro A., Millo E., Luini A., Corda D., Bolognesi M. CtBP/BARS: a dual-function protein involved in transcription co-repression and Golgi membrane fission. 2003;22(12):3122–3130. doi:10.1093/emboj/cdg283
30. Kort R., Komori H., Adachi S. I., Miki K., Eker A. DNA apophotolyase from Anacystis nidulans: 1.8 Å structure, 8-HDF reconstitution and X-ray-induced FAD reduction. 2004;60(7):1205–1213. doi:10.1107/S0907444904009321
31. Huang Y., Baxter R., Smith B. S., Partch C. L., Colbert C. L., Deisenhofer J. Crystal structure of cryptochrome 3 from Arabidopsis thaliana and its implications for photolyase activity. 2006;103(47):17701–17706. doi:10.1073/pnas.0608554103
32. Thoden J. B., Wohlers T. M., Fridovich-Keil J. L., Holden H. M. Molecular basis for severe epimerase deficiency galactosemia. X-ray structure of the human V94M-substituted UDP-galactose 4-epimerase. 2001;276(23):20617–20623. doi:10.1074/jbc.M101304200
33. Brautigam C. A., Smith B. S., Ma Z., Palnitkar M., Tomchick D. R., Machius M., Deisenhofer J. Structure of the photolyase-like domain of cryptochrome 1 from Arabidopsis thaliana. 2004;101(33):12142–12147. doi:10.1073/pnas.0404851101
34. Komori H., Masui R., Kuramitsu S., Yokoyama S., Shibata T., Inoue Y., Miki K. Crystal structure of thermostable DNA photolyase: pyrimidine-dimer recognition mechanism. 2001;98(24):13560–13565. doi:10.1073/pnas.241371398
35. Todone F., Vanoni M. A., Mozzarelli A., Bolognesi M., Coda A., Curti B., Mattevi A. Active site plasticity in D-amino acid oxidase: a crystallographic analysis. 1997;36(19):5853–5860. doi:10.1021/bi9630570
36. Sevrioukova I. F., Li H., Poulos T. L. Crystal structure of putidaredoxin reductase from Pseudomonas putida, the final structural component of the cytochrome P450cam monooxygenase. 2004;336(4):889–902. doi:10.1016/j.jmb.2003.12.067
37. Song S. Y., Xu Y. B., Lin Z. J., Tsou C. L. Structure of active site carboxymethylated D-glyceraldehyde-3-phosphate dehydrogenase from Palinurus versicolor. 1999;287(4):719–725. doi:10.1006/jmbi.1999.2628
38. Pollegioni L., Diederichs K., Molla G., Umhau S., Welte W., Ghisla S., Pilone M. S. Yeast D-amino acid oxidase: structural basis of its catalytic properties. 2002;324(3):535–546. doi:10.1016/S0022-2836(02)01062-8
39. Settembre E. C., Dorrestein P. C., Park J. H., Augustine A. M., Begley T. P., Ealick S. E. Structural and mechanistic studies on thiO, a glycine oxidase essential for thiamin biosynthesis in Bacillus subtilis. 2003;42(10):2971–2981. doi:10.1021/bi026916v
40. Yang J., Roy A., Zhang Y. BioLiP: a semi-manually curated database for biologically relevant ligand-protein interactions. 2013;41(1):D1096–D1103. doi:10.1093/nar/gks966
41. Henikoff S., Henikoff J. G. Amino acid substitution matrices from protein blocks. 1992;89(22):10915–10919. doi:10.1073/pnas.89.22.10915
42. Kabsch W., Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. 1983;22(12):2577–2637. doi:10.1002/bip.360221211
43. Gower J. C., Ross G. J. S. Minimum spanning trees and single-linkage cluster analysis. 1969;18(1), article 11.