Elucidation of Novel Structural Scaffold in Rohu TLR2 and Its Binding Site Analysis with Peptidoglycan, Lipoteichoic Acid and Zymosan Ligands, and Downstream MyD88 Adaptor Protein

Toll-like receptors (TLRs) play key roles in sensing wide array of microbial signatures and induction of innate immunity. TLR2 in fish resembles higher eukaryotes by sensing peptidoglycan (PGN) and lipoteichoic acid (LTA) of bacterial cell wall and zymosan of yeasts. However, in fish TLR2, no study yet describes the ligand binding motifs in the leucine rich repeat regions (LRRs) of the extracellular domain (ECD) and important amino acids in TLR2-TIR (toll/interleukin-1 receptor) domain that could be engaged in transmitting downstream signaling. We predicted these in a commercially important freshwater fish species rohu (Labeo rohita) by constructing 3D models of TLR2-ECD, TLR2-TIR, and MyD88-TIR by comparative modeling followed by 40 ns (nanosecond) molecular dynamics simulation (MDS) for TLR2-ECD and 20 ns MDS for TLR2-TIR and MyD88-TIR. Protein (TLR2-ECD)–ligands (PGN, LTA, and zymosan) docking in rohu by AutoDock4.0, FlexX2.1, and GOLD4.1 anticipated LRR16–19, LRR12–14, and LRR20-CT as the most important ligand binding motifs. Protein (TLR2-TIR)—protein (MyD88-TIR) interaction by HADDOCK and ZDOCK predicted BB loop, αB-helix, αC-helix, and CD loop in TLR2-TIR and BB loop, αB-helix, and CD loop in MyD88-TIR as the critical binding domains. This study provides ligands recognition and downstream signaling.


Introduction
The innate immune response elicited by a variety of pattern recognition receptors (PRRs) is an immediate nonspecific and first line of defense of the host against invading various pathogens [1]. Toll-like receptors (TLRs) are a component of PRR and play a critical role by sensing organisms ranging from protozoa to bacteria and were involved in many infectious diseases [2]. They recognize a wide array of microbe associated molecular patterns (MAPMs) and activate downstream signaling to induce innate immunity [3]. The number of TLRs varied in different organisms and among these most TLRs are located on cell surface except for TLR3, TLR7, TLR8, and TLR9 [4,5].
Toll-like receptor 2 (TLR2) was shown to be the principal mediator of macrophages activation. It functions as homodimer [6] or heterodimer with TLR1 or TLR6 [7] to recognize diverse bacterial products [8] and activation of MyD88-dependent signaling pathway. In this pathway, the TIR domain of MyD88 interacts with the TIR domain of TLR [9] and transmits downstream signals to induce innate immune genes expression.
PGN is a highly complex structural component and an important derivative of both Gram-positive and Gramnegative bacterial cell wall, and is the target of the innate immune system [10]. It is composed of alternating -(1-4)-linked N-acetyl-glucosamine and N-acetyl-muramic acid residues cross-linked by peptide bridges [11]. It is recognized by various families of PRRs, including TLRs, nucleotide-binding oligomerization domain-containing proteins (NLRs), and peptidoglycan recognition proteins (PGRPs) [12]. In monocytes and macrophages, PGN binds to extracellular domain of TLR2 and activates signaling to induce inflammatory cytokines [13][14][15]. Structurally, PGN of most Gram-positive bacteria contains lysine at third position, and in Gram-negative and most rod-shaped Grampositive bacteria lysine is replaced by DAP [16]. Nascent PGN of bacterial cell wall is poorly recognized by TLR2. However, after its autolysin the remodeled PGN binds TLR2 with high affinity [16]. LTA is an amphiphilic, negatively charged glycolipid [17] component of Gram-positive bacteria cell wall. TLR2 binds LTA and activates signaling cascade to induce TNF-, IL-6, and IL-8 gene expression [18][19][20]. Zymosan is the cell wall derivative of Saccharomyces cerevisiae. It comprised mainly polysaccharides, of whichglucan and mannan are the major constituents. It was widely used as a model to study fungus-mediated inflammation, phagocytosis, and the production of inflammatory cytokines and chemokines [21,22]. TLR2 recognizes it directly or in coaction with CD14 and TLR6 [23,24] to induce TNF-gene expression [25]. TLR2 is the major pathway of proinflammatory signaling by zymosan interaction and is needed for the development of specific immune responses against pathogens [26].
India is the major supplier of fish in the world and ranks 3rd in freshwater fish production (FAO). Among various freshwater fishes, rohu (Labeo rohita) is the most commercially important and highly favored fish in the Indian subcontinent. TLR2 was characterized in rohu and the ligands that stimulate TLR2 signaling were also reported [35]. However, no studies have reported yet describing the structural characteristics of TLR2 in rohu and their key domains that binds to the specific ligands to stimulate cytokine expression. Furthermore, the key amino acids in the TLR2-TIR domains that interact with adapter molecule MyD88 to induce down-stream signaling were still unclear across the species.
To elucidate the structural scaffold in rohu TLR2, we report the 3D-model of extracellular domain of rohu TLR2 along with its key domains that are predicted to be involved in recognizing PGN, LTA and zymosan, and the critical region of interaction between TIR domains of TLR2 and MyD88. This is the first report across the fish species.  [38] program was used to perform sequence-structure comparison between the target and the template and was represented by JOY annotation program [39]. For each three domains (TLR2-ECD, TLR2-TIR, and MyD88-TIR) a set of twenty 3D models were generated by Modeller9v10 program [40]. Among these 20 models (for each domain), the model with lowest discrete optimized protein energy (DOPE) score was considered for further studies. The lowest DOPE models of TLR2-ECD, TLR2-TIR, and MyD88-TIR were subjected for loop modeling and refinement in Accelrys DS 2.5 (San Diego, Accelrys) under CHARMM force field. The long BB loops and DD loops in TLR2-TIR and MyD88-TIR models after loop refinement were notably analyzed, and changes were marked by superimposing them with their respective templates. The refined models were subjected to energy minimization by DS 2.5.

Molecular Dynamics
Simulation. Molecular dynamics (MD) simulations were carried out for the modeled systems using the GROMACS 4.5.5 program [41]. Homology models were set for MDS under GROMOS54a7 force field. The 3D models were placed in a cubic box maintaining a distance of 10Å for TLR2-ECD, 9Å for TLR2-TIR, and 9Å for MyD88-TIR between the box edges and the protein surface. The systems were solvated in simple point charge (SPC) models and were neutralized by adding counter ions. In order to remove spurious contacts energy minimization of the solvated systems was done using the steepest descent integrator. The bond lengths and geometry of water molecules were constrained. All of the three restrained models were subjected to position-restrained MD under NPT conditions for 1 ns (nanosecond). Finally, 40 ns production MD run was carried out for TLR2-ECD and 20 ns for TLR2-TIR and MyD88-TIR models using particle mesh Ewald (PME) electrostatics method under NPT conditions. Snapshots of the trajectory were taken in every 0.5 picoseconds. GROMACS and VMD 1.9.1 (http://www.ks.uiuc.edu/Research/vmd/) routines were utilized to check trajectories and the quality of the simulations. The graphs of trajectory analysis were created using Xmgr 4.1.2 (http://plasma-gate.weizmann.ac.il/Xmgr/).

Model Validation.
The final snapshot obtained at the end of each MDS was considered to represent the structures of the TLR2-ECD, TLR2-TIR, and MyD88-TIR models. These simulated models were set for validation by SAVES (http://nihserver.mbi.ucla.edu/SAVES/), WHAT IF [42], MolProbity [43], ProQ [44], ModFOLD [45], and MetaMQAP [46] servers. The simulated models were superimposed with their respective templates to examine the deflections by PyMOL (http://www.pymol.org/). Cross-check validation was carried out using model as template and the primary amino acids of the respective template as target. , were generated by Chemsketch (http://www.acdlabs.com/resources/freeware/chemsketch/). The 2D structure of zymosan (CID: 11375554) was obtained from the NCBI PubChem database (http://pubchem.ncbi.nlm.nih.gov/) and LTA was from KEGG (KEGG: C06042) ligand database (http://www .genome.jp/ligand/). The 3D structures of all these compounds were generated using PRODRG2 server (http:// davapc1.bioch.dundee.ac.uk/prodrg/) subjecting to chirality, full charges with energy minimization. The generated 3D structures were subjected to DS 2.5 for ligand minimization. The probable ligand binding pockets in TLR2-ECD were predicted by metaPocket finder [47] and Q-site finder [48]. The LTA binding site in mouse TLR2 (PDB ID: 3A7B) was also considered for docking. Molecular docking was carried out using AutoDock 4.0 [49], FlexX 2.1 [50], and GOLD 4.1 [51] following previously described methods [52] with receptor and ligand flexibility. In this, the important neighbouring residues at the predicted binding sites were set to flexible that covered all the active site residues and allowed for the flexible rotation of the ligand. Docking of previously reported PGN binding sites in other species [53] was also carried out. In AutoDock, the lowest-energy solution of the ligand all-atom RMSD cluster was taken to calculate the binding energy. The predicted interacting residues obtained by AutoDock were matched with the predicted binding pocket amino acids of metaPocket finder and Q-site finder, and these binding pockets were referred for docking in FlexX and GOLD. The docking poses with H-bond forming amino acids were graphically represented by PyMOL and DS 2.5.

Structural Refinement and Stability Evaluation of Complexes.
The best protein-ligand complexes obtained from docking studies of PGN, LTA, and zymosan with TLR2-ECD were subjected to MDS using the previously defined parameters in GROMACS. To gain insight into the structural stability of the protein-ligand and protein-protein complexes, MD simulations were performed for PGN-I-TLR2-ECD, PGN-II-TLR2-ECD, PGN-DAP-TLR2-ECD, LTA-TLR2-ECD, zymosan-TLR2-ECD, and TLR2-TIR-MyD88-TIR complex for different time periods of MDS. A production MD run for 10 ns was carried out for TLR2-ECD ligand complexes and protein-protein complex. The existence of H-bonds in the complex in different periods of MDS was analyzed.

In Silico Site-Directed Mutagenesis.
To identify the key amino acids among interacting amino acid residues in TLR2-ECD, TLR2-TIR, and MyD88-TIR domains, site-directed mutagenesis was carried out in DS 2.5 under build mutant protocol. Redocking was performed to calculate the fitness score in GOLD after mutation, and docking scores were cross-checked with previous fitness scores. Protein-protein interaction hot spots were predicted after mutagenesis by HADDOCK.    them, rohu TLR2 showed highest identity with common carp (88.1%) and lowest identity with mouse (40.5%). Nglycosylation site prediction server predicted 10 glycosylation sites, out of which 9 were in the ECD and 1 in the TIR domain. Among these 10 N-glycosylation sites, 8 were potential with a value greater than threshold value (0.5) and the remaining two were below the threshold. Single Nglycosylation site was present each at LRR1, 3, 8, 14, 16, 18, 21, and TIR domain, and the remaining two were at LRR6. The multiple sequence alignment of TLR2-TIR with other species (Figure 1(a)) and secondary structure analysis (Figure 1(b)) showed well-conserved -helices, -sheets, and biologically most important BB and DD loops [56]. In MyD88-TIR domain the multiple sequence alignment and secondary structure prediction (Figures 2(a) and 2(b)) also revealed a good conservation among the phylogenetically divergent species.

Structural Analysis of TLR2-ECD, TLR2-TIR, and
MyD88-TIR Domains. The BLAST search analysis showed that the ligand recognizing LRR regions in TLR2-ECD shared the close structural relationship with mouse TLR1-TLR2 heterodimer (PDB ID: 2Z81) having 35% and 52% sequence identities and similarities, respectively. The TLR2-TIR and MyD88-TIR domains shared 71% and 78% sequence identities with their respective templates (PDB ID: 1O77 and 2Z5V). The sequence-structure alignment by FUGUE revealed a good conservation of secondary structures ( -helices and -sheets) between the target and template. The identity scores between the LRR regions of TLR2-ECD and 2Z81 were presented in Table 1. The sequence identities between the important biological regions as reported in human and mouse TIR domains with their respective templates were given in Tables 2 and 3. The structure-structure alignment between the lowest DOPE score models and their respective templates showed good structural conservation across the domains. The TLR2-ECD took a horseshoe shape with 23 LRR domains including LRRNT and LRRCT. Most of the LRR domains consisted of -strands connected by long loop and some with -helices. The -strands faced towards the concave surface in TLR2-ECD model, and the -helices were present Confidence of prediction Predicted secondary structure Target sequence       trajectories of TLR2-TIR and MyD88-TIR were observed to be stable after 5 ns with an average RMSD of 2.74Å and 3.68Å, respectively. The RMSF of C atoms of each amino acid in TLR2-ECD identified LRR7-11 as the most flipped region (Figure 4(a)). These regions are constituted of sixsheets, one -helix, and long loops. In higher vertebrates, this region was reported as lipopeptide binding region [57]. The flexible long loops (BB and DD loops) in TLR2-TIR and MyD88-TIR domains showed major fluctuations in the RMSF graph and were expected to be engaged in protein-protein interaction (Figures 4(b) and 4(c)). Secondary structure analysis from the trajectory in TLR2-ECD showed little variation of -helices and -sheets. However, the coil regions much varied with respect to simulation period. In TLR2-TIR, no major secondary structural changes were observed during MDS.

Validation of Homology Models.
The PROCHECK analysis at SAVES of three models (TLR2-ECD, TLR2-TIR, and MyD88-TIR) showed that the phi-psi angles of most of the residues were in the allowed regions of Ramachandran plot ( Figure S2). The SAVES results (Table 4(a)) of all models were within the cut-off range suggesting the reliability of our proposed models. The protein stereochemical quality analysis by ProQ, ModFOLD, and MetaMQAP servers showed an acceptable score of all models (Table 4(b)). The average coarse packing quality, planarity, and the collision with symmetry axis, bond lengths, and bond angles obtained by the WHAT IF server of all models revealed the satisfactory acceptance of the models. The results of MolProbity server of all models were also within the range. The RMSD value for all atoms and C atoms by superimposing target models with their respective templates showed very low deviation along the significant biological domains. The low deviation between the target-template structures suggested the acceptance of the proposed models. The cross-check validation report indicated acceptability between the experimental structures (PDB ID: 2Z81, 1O77, and 2Z5V) and their models. The RMSD values calculated by PyMOL superimposition program for C atoms between the PDB coordinates and the lowest DOPE score models of 2Z81, 1O77, and 2Z5V generated by Modeller were 1.13Å, 0.518Å, and 0.705Å, respectively. The comparison of Ramachandran plot analysis of the homology models of 2Z81, 1O77, and 2Z5V showed similar results for both experimental and hypothetical models (Table S1). The cross-check validation fortified the acceptance of TLR2-ECD, TLR2-TIR, and MyD88-TIR models. (close to the N-glycosylation sites) were considered (Table S2) including previously reported LTA binding site in mouse TLR2 [58]. Interaction of PGN with TLR2-ECD in AutoDock revealed PGN binding sites at B5, and it was in agreement with the previous observation [53]. Both PGN-I and PGN-DAP showed good interactions at B5 regions (Figures 5(a)  and 5(b)). The lists of interacting amino acids were presented in  10) also resulted a satisfactory FlexX score for PGN-II (−13.23) and comparatively a low FlexX score was predicted for PGN-I and PGN-DAP. The rest of the binding pockets were found to be irrelevant for PGN interaction with very little positive and negative FlexX score. The interaction of LTA with TLR2-ECD with highest FlexX score (−6.92) was obtained at B mouse region (LRR12-14). Interaction of LTA at other binding sites yielded irrelevant FlexX score. Zymosan interacted with TLR2-ECD at B7 (LRR20-CT) region effectively with a FlexX score of (−13.81), and other binding sites were found to be very less interactive. The FlexX scores at different binding regions were presented in Table 5(b).

Binding Site
The GOLD scores for PGN-I, PGN-II, and PGN-DAP at B5 and B6 sites, for LTA at B mouse and for zymosan at B7 site were given in Table 5(c). The GOLD fitness score was found to be highest in the proposed binding sites of FlexX program in comparison to other binding sites (Figures 6(a)-6(f)). The GOLD docking score of PGN-I, PGN-II, and PGN-DAP was predicted to be less at B6 site as compared to B5 site (Figures 6(a)-6(c)). Interaction of PGN-II at B6 site also showed a good GOLD fitness score (Figure 6(d)). Docking of zymosan ( Figure 6(e)) and LTA (Figure 6(f)) in GOLD also ensured B7 and B mouse as the potential binding sites. cons-PPISP, InterProSurf, and PatchDock were observed to be present in the most flexible regions of TLR2-TIR and MyD88-TIR domains (Table S3). Majority of the interacting amino acids were distributed in BB loop, B, C, and CD loop in TLR2-TIR and BB loop, B helix, and CD loop in MyD88-TIR (Table S3). The best cluster obtained in HADDOCK showed a very low RMSD (1.2 ± 0.7Å) and intermolecular energy (−722.9 kcal mol-1) with a buried surface area of 2029.2Å 2 . The binding orientation and amino acid interactions generated by DS 2.5 were presented in Figure 7(a). The phylogenies of interacted amino acids identified the strongly bonded residues and were highlighted in Figure 7(b). DIMPLOT analysis of the complex showed that BB loop, C, and C helix residues of TLR2-TIR domain were mostly interacted with AA, BB loops and B, C helices of MyD88-TIR domain ( Figure S3(a)). The protein-protein complex (Rank-1) in ZDOCK yielded approximately same interacting amino acids residues (as HADDOCK) for TLR2-TIR and MyD88-TIR domains ( Figure S3(b)). Amino acid residues in TLR2-TIR and MyD88-TIR domain involved in hydrogen bonding and hydrophobic interactions were presented in Table 6.

Stability Analysis of Protein Complexes by MDS.
Previously it was reported that binding site B5 (LRR16-19) (corresponding to human) had the highest affinity for PGN recognition, and binding site B6 (LRR8-10) had a low potential for PGN recognition [53]. In this study, docking scores in B6 for PGN were comparatively lower than B5 region. However, the number of N-glycosylation sites closer to B6 was higher. This data may suggest the possibility of PGN interaction as it is comprised of N-acetylglucosamine. The H-bond analysis showed more number of H-bonds in B5 region (avg. 8 numbers) than B6 region (avg. 5 numbers) with highest binding affinity (Figures 8(a)-8(c)). The existence of H-bonds fluctuated at B6 region during different time periods    the amino acid residues that were involved in proteinprotein interaction were retained after the MDS ( Figure S4). This suggested that the predicted interacting loops and helices in TLR2-TIR and MyD88-TIR domains were significant for protein-protein interaction.

Validation of Binding Domains by Site-Directed Mutagenesis.
Alanine scanning of B5 regions of PGN binding domains showed loss of interaction due to absence of donors or acceptors in the active sites. But no single mutation of alanine for the interacting residues at this region showed a complete loss of docking. Proline and aspartic acid scanning of B5 region also ensued very low fitness score (9.6 and 12.8) in GOLD. Analysis of B5 region by mutating residues in pair, triplet, and quadruple combinations at a time indicated the viable importance of Asp394, Ser396, Asn397, and Ser399 as the fitness score attained a very minimum value in comparison to other GOLD runs.
Mutagenesis study at B7 residues in TLR2-ECD that formed H-bond with zymosan revealed a good fitness score for alanine scanning. However, proline scanning of all residues revealed loss of zymosan interaction with TLR2-ECD. Single proline mutation and acidic to basic mutation of different interacting residues followed by individual GOLD runs explored the importance of Ser520 and Asp522. A single mutation of Ser520-Pro520 and Asp522-Gln522 resulted in complete loss of zymosan interaction. Mutation of other residues altered the GOLD fitness score; however, in none of the cases docking loss was ensued. Alanine and proline scanning of LTA binding site resulted in low docking score emphasizing the role of key residues Asp320 and Phe324 in LTA recognition.

Conclusion
The proposed 3D model of rohu TLR2 describes the protein features and its important domains. It also accentuates the importance of predicting key amino acids and LRR regions that are responsible for the specific ligand interaction and TLR2 signaling in fish and depicts a residuedetailed structural theoretical model. In the absence of crystal structure of TLR2 in any fish, this study provides structural insight into the TLR2 domains architecture. In rohu fish, the peptides at LRR16-19 (at B5), LRR12-14 (at B mouse ) and LRR20-CT (at B7) are predicted to be the most important interacting regions for PGN, LTA, and zymosan interactions, respectively. The structural organization of TIR domains in fish TLR2 and adapter molecule MyD88 have also been described. The interaction between TLR2-TIR and MyD88-TIR domain highlighted the contribution of BB loop, B, C, and CD loop in TLR2-TIR and BB loop, B helix, and CD loop in MyD88-TIR domain. The data generated in this study are likely to be helpful to conduct further in vivo study to develop the strategy of innate immune activation and disease prevention in fish.