Research Article in Silico Identification of Potent Ppar-í Μí»¾ Agonists from Traditional Chinese Medicine: a Bioactivity Prediction, Virtual Screening, and Molecular Dynamics Study

The peroxisome proliferator-activated receptors (PPARs) related to regulation of lipid metabolism, inflammation, cell proliferation, differentiation, and glucose homeostasis by controlling the related ligand-dependent transcription of networks of genes. They are used to be served as therapeutic targets against metabolic disorder, such as obesity, dyslipidemia, and diabetes; especially, PPAR-í µí»¾ is the most extensively investigated isoform for the treatment of dyslipidemic type 2 diabetes. In this study, we filter compounds of traditional Chinese medicine (TCM) using bioactivities predicted by three distinct prediction models before the virtual screening. For the top candidates, the molecular dynamics (MD) simulations were also utilized to investigate the stability of interactions between ligand and PPAR-í µí»¾ protein. The top two TCM candidates, 5-hydroxy-L-tryptophan and abrine, have an indole ring and carboxyl group to form the H-bonds with the key residues of PPAR-í µí»¾ protein, such as residues Ser289 and Lys367. The secondary amine group of abrine also stabilized an H-bond with residue Ser289. From the figures of root mean square fluctuations (RMSFs), the key residues were stabilized in protein complexes with 5-Hydroxy-L-tryptophan and abrine as control. Hence, we propose 5-hydroxy-L-tryptophan and abrine as potential lead compounds for further study in drug development process with the PPAR-í µí»¾ protein.


Introduction
The peroxisome proliferator-activated receptors (PPARs) belonged to the nuclear receptor superfamily of ligandinducible transcription factors. They are "fatty acid sensors" related to regulation of lipid metabolism, inflammation, cell proliferation, differentiation, and glucose homeostasis by controlling the related ligand-dependent transcription of networks of genes [1][2][3]. There are three different isoforms of PPARs in mammal, which are PPAR-, PPAR-, and PPAR-/ . They have different tissue distributions and responses to different ligands [4][5][6]. PPARs are used to be served as therapeutic targets against metabolic disorder, such as obesity, dyslipidemia, and diabetes; especially, PPAR-is the most extensively investigated isoform for the treatment of dyslipidemic type 2 diabetes [7][8][9][10]. It is a well-known receptor located in fat for antidiabetic insulin sensitizers and has the functions related to adipogenesis, lipogenesis, and glucose homeostasis [11][12][13]. In rat stroke models, PPAR-has been served as a brain protector against ischemic cerebral infraction [14].
In the former study, we aim to detect potential candidates from TCM compounds as agonists targeting PPAR-, PPAR-, and PPAR- [40]. However, a compound which had   a higher binding affinity with target protein may not always obtain a higher bioactivity. In this paper, we aimed to focus on the target protein of PPAR-and filter TCM compounds using bioactivities predicted by three distinct prediction models before the virtual screening. The molecular dynamics (MD) simulations were also utilized to investigate the stability of interactions between ligand and PPAR-protein in the docking pose under dynamic conditions. We attempt to identify the potent TCM compounds with higher bioactivities and binding affinity for PPAR-protein and discuss the functional group of these candidates and common binding residues of PPAR-protein in their docking pose.  The X-ray crystallography structure of the human peroxisome proliferator-activated receptor gamma (PPAR-) protein was obtained from RCSB Protein Data Bank with PDB ID: 3K8S [43]. After protein preparation, the chain A of PPAR-protein was used as target protein for virtual screening, and T2384, cocrystallized in PPAR-protein, was used as control.

Biological Activity Prediction Using Multiple Linear Regression (MLR), Support Vector Machine (SVM), and Bayes
Network Toolbox (BNT) Models. For the prediction of biological activity for the TCM compounds, three distinct prediction models were constructed with the pEC 50 (log(1/EC 50 )) value of 20 compounds from Rikimaru et al. 's study [2] as training set. The genetic function approximation module [44] of DS 2.5 was utilized to determine the suitable molecular descriptors for constructing the prediction models, and the fitness of individual model was estimated by square correlation coefficient ( 2 ). Cross-validation test was used to validate the prediction model. For three distinct prediction models, multiple linear regression and Bayes network toolbox were performed using MATLAB, and support vector machine was performed using LibSVM developed by Chang and Lin [45].

Docking Simulation.
For virtual screening, LigandFit protocol [46] in DS 2.5 was employed to dock each compound into an active site using a shape filter and Monte Carlo ligand conformation generation, and each docked pose was minimized with Chemistry at HARvard Macromolecular Mechanics (CHARMM) force field [47] and evaluated with a set of scoring functions. In addition, LigPlot v.2.2.25 program [48] was employed to identify the interactions between protein and ligand in each docking pose.

Molecular Dynamics Simulation.
Before the molecular dynamics simulation by Gromacs [49], each protein-ligand complex in docking pose has been reprepared. Each ligand was reprepared by SwissParam program [50], and the protein was reprepared with charmm27 force field by Gromacs. The protein-ligand complex was solvated using a water model of TIP3P with a minimum distance of 1.2Å from the complex and then minimized by steepest descent algorithm [51] with maximum of 5,000 steps. Then a single 10 ps constant temperature (NVT ensemble) equilibration was performed using Berendsen weak thermal coupling method followed by a 40 ns production simulation. For each MD simulation, it adopts the particle mesh Ewald (PME) option with a time step of 2 fs. A series of protocols in Gromacs were employed to analyze the MD trajectories. descriptors for constructing prediction models with 20 compounds of training set. The selected descriptors were ES Sum sssCH, ES Count aaN, BIC, IAC Mean, CHI 3 P, and JY. These six optimum molecular descriptors can be broadly divided into two groups, which are electronic and special topological descriptors. For electronic topological descriptors, it includes ES Sum sssCH, ES Count aaN for calculating the sums of the electrotopological state (E-state) values and the counts of each atom type, respectively. For special topological descriptors, BIC and IAC Mean are bonding information content and mean information of atomic composition, which both belong to Graph-Theoretical InfoContent descriptors [52]. CHI 3 P is a Kier and Hall molecular connectivity index [53]. JY is a Balaban index [54]. According to these selected descriptors, the functional formula of multiple linear regression (MLR) model was constructed as follows:

Results and Discussion
The support vector machine (SVM) and Bayes network toolbox (BNT) models were also constructed with the identical training set and descriptors. The correlation of predicted and observed activities shown in Figure 1 illustrates the correlation trend and 95% prediction bands for each prediction model. The square correlation coefficients ( 2 ) of training set for MLR, SVM, and BNT models are 0.8442, 0.8536, and 0.7612, respectively. These prediction models are acceptable for predicting activity of PPAR-protein.

Docking
Simulation. The potent compounds, which have acceptable predicted activities in all three prediction models, have been virtual screening with the target protein. After filtering by the absorption properties, the top TCM candidates ranked by Dock score were listed in Table 1 with their predicted activities and pharmacokinetics properties. Human intestinal absorption model displayed in Figure 2 suggested that the top five TCM candidates may have good absorption.
For the docking simulation, the binding site of PPARprotein was defined by the volume and position of control, T2384 (Figure 3(a)). We visually inspected docking poses of top ranked TCM candidates (Figure 3(b)), 5-hydroxy-Ltryptophan, abrine, and saussureamine C interacting with   Figure 4 displays the structure of T2384 and top three candidates. According to the docking poses shown in Figure 5, T2384 has interactions with residues Phe264 and Phe363, hydrogen bonds (H-bonds) with residues Cys285 and Lys367, and hydrophobic contact with other nine residues. Compared with T2384 in PPAR-protein, the top three TCM candidates have been docked with similar docking poses. Due to the molecular size of three TCM compounds, none of them have interaction with Phe264 as T2384. Except saussureamine C, both of 5-hydroxy-L-tryptophan and abrine have interaction with residue Phe363 as control. However, saussureamine C still has hydrophobic contact with residue Phe363. All top three candidates have similar H-bond with residue Lys367 and hydrophobic contacts with some common residues, such as Leu330 and Met364. Except that 5hydroxy-L-tryptophan has hydrophobic contact instead of Hbond with residue Cys285, the other two candidates have the similar H-bond with residues Cys285 as T2384. In addition, abrine and saussureamine C also have H-bond with Ser289 and Met364, respectively.

Molecular Dynamics Simulation.
The docking poses in the docking simulation illustrate that the top three TCM candidates have similar interactions with the target proteins as T2384. However, the structure of PPAR-protein is fixed during the progress of docking simulation. As this reason, the molecular dynamics (MD) simulations for each proteinligand complex were performed to investigate the stability of interactions between ligand and target protein in the docking pose under dynamic conditionsand investigate the possible variations for each protein-ligand complex after docking.
The root mean square deviations (RMSDs) and radii of gyration for each protein and ligand in the complexes were illustrated in Figure 6. For RMSD, it calculates the deviation of the structure compared with the starting structure over 40 ns of MD simulation. They indicate that all protein-ligand complexes tend to be stable after 30 ns of MD simulation. Radius of gyration, which measures the mass of the atom relative to the center of mass of the complex, is indicative of the compactness of each complex. As shown in Figure 6, there is no significant variation for the compactness of each complex. Figure 7 illustrates the variation of total energy for each protein-ligand complex over the course of 40 ns MD simulation with the average fluctuations in a cycle of 21 frames shown in the center of each graph. Total energy trajectories indicate that these systems were stabilized for PPAR-protein in the complex with T2384 and top three TCM candidates over the course of 40 ns MD simulation. Figure 8 displays the variation of secondary structure of PPAR-protein and secondary structural feature ratio over the course of 40 ns MD simulation for each complex with T2384 and top three TCM candidates. It indicates that docking with three TCM candidates may not cause the significant differences from docking with the control in the secondary structure of PPARprotein.
The representative structures of each complex after MD simulation were identified by the cluster analysis with a RMSD cutoff of 0.1 nm. In Figure 9 Figure 11: Distances of potential H-bonds between PPAR-protein and each compound during 40 ns MD simulation. in a nonstatic condition. In the representative structures after MD simulation, it has H-bonds with residues His449 and Leu476, as well as a interaction with residue Phe282.
The H-bonds occupancies for key residues of PPARprotein in each complex are shown in Table 2 with cutoff of 0.3 nm. Figure 11 displays the variation of these distances over the course of 40 ns MD simulation. For T2384, the potential H-bonds with key residues of PPAR-protein are formed by its sulfonamide group. 5-Hydroxy-L-tryptophan and abrine form H-bonds with residue Lys367 by the carboxyl group. They form H-bonds with residue Ser289 by the indole group in the beginning of MD simulation, but the H-bond for abrine has shifted from the indole group to secondary amine group after 5 ns of MD simulation. In addition, the carboxyl group of abrine also forms a stable H-bond with residue Tyr327 after 7 ns of MD simulation. For saussureamine C, the docking pose in the docking simulation had changed after MD simulation. The H-bonds formed by the carboxyl group are shifted from residue Lys367 to residue Ser289 after 5 ns of MD simulation. In addition, it forms stable H-bonds with residue His449 by its sulfonamide group and heterocycle group after MD simulation.
The root mean square fluctuations (RMSFs) shown in Figure 12 illustrate the stability of each residue over 30-40 ns MD simulation. Residues Cys285, Lys367, and His449 are stabilized by all top three TCM candidates and T2384.
As abrine forms stable H-bond with residues Ser289 and Tyr327, the RMSFs of Ser289 and Tyr327 are much lower in the complex with abrine than with others. For saussureamine C, as the H-bonds with residue Tyr327 are shifted between the heterocycle group, secondary amine group, and sulfonamide group, it causes the highest value of RMSF for residue Tyr327 in the complex with saussureamine C.
To consider the variation of each ligand during MD simulation, variation of torsion angles during 40 ns of MD simulation for each ligand in the PPAR-complexes is shown in Figure 13. As T2384 is the cocrystallized compound in the PPAR-protein, the docking pose is stable during 40 ns of MD simulation. For 5-hydroxy-L-tryptophan, the docking pose which is also stable during 40 ns of MD simulation except for the hydroxyl group in the indole ring has a 180-degree shift after MD simulation. For abrine, the variation of torsions 10 and 11 at the initial period of MD simulation may be the reason that the H-bond has shifted from the indole group to secondary amine group, and carboxyl group forms a stable H-bond with residue Tyr327 after MD simulation. Torsions 14 and 16 for saussureamine C indicate that the docking pose of saussureamine C has a fluctuation during 15-30 ns of MD; it can also be seen in the ligand RMSD (Figure 6) and the distance variation with residue Tyr327 (Figure 11). The variation of torsion 19 shows that the sulfonamide group of saussureamine C is flexible over MD simulation.

Conclusion
This study aims to investigate the potent TCM candidates for PPAR-protein. The biologically activities of candidates were predicted by three distinct prediction models (MLR, SVM and BNT) based on their ligand characteristics. After docking simulation, the docking poses of top TCM compounds ranked by the scoring function were validated by the MD simulation. For the top three TCM candidates, both of 5hydroxy-L-tryptophan and abrine have an indole ring and carboxyl group to form the H-bonds with the key residues of PPAR-protein. The secondary amine group of abrine also stabilized an H-bond with residue Ser289. The key residues were stabilized in protein complexes with 5-Hydroxy-Ltryptophan and abrine as control. For saussureamine C, the interactions of docking pose in the docking simulation are not stable after MD simulation. Hence, we propose 5-hydroxy-L-tryptophan and abrine as potential lead compounds for further study in drug development process with the PPAR-protein.