Investigation of Estrogen Receptor (ESR1) for Breast Cancer from Traditional Chinese Medicine

Recently, an important topic of breast cancer had been published in 2013. In this report, estrogen receptor (ESR1) had defined the relation of hormone-cause breast cancer. The screening of traditional Chinese medicine (TCM) database has found the molecular compounds by simulating molecular docking and molecular dynamics to regulate ESR1. S-Allylmercaptocysteine and 5-hydroxy-L-tryptophan are selected according to the highest docking score than that of other TCM compounds and Raloxifene (control). The simulation from molecular dynamics is helpful in analyzing and detecting the protein-ligand interactions. After a comparing the control and the Apo form, then based on the docking poses, hydrophobic interactions, hydrogen bond and structure variations, this research postulates that S-allylmercaptocysteine may be more appropriate than other compounds for protein-ligand interaction.


Introduction
There is an important topic of breast cancer that had been published in 2013. In this report, the estrogen receptor (ESR1) had defined the relation of breast cancer [1].
The breast cancer is a common cause of death in women and may originate either hormonally [2][3][4][5] from hereditary factors [6][7][8][9] and some other reasons [10]. Although there are lots of methods to find breast cancer based on accurate prediction, the surgery treatment causes patient low emotion from losing breast. Thus the prevention and treatment early medicinal treatment is widely accepted.
The ESR1 is a ligand-activated transcription factor that mediates the biological effects of the steroid hormone estrogen [11,12]. From the inhibition of ESR1, the hormone would be regulated then decreased the risk for hormone breast cancer.
The Raloxifene is an FDA approval drug for reduction in the risk of breast cancer (http://www.cancer.gov/cancertopics/druginfo/fda-raloxifene-hydrochloride). The reference [13] had reported that the Raloxifene could target ESR1. For the above reason, the Raloxifene was chosen as the control for this investigation.
The computer-aided drug design (CADD) is an in silico simulation technique used in the screening of compounds based on the structure and biological activity. The CADD has the advantages of both greater speed and lower cost than the traditional drug design. The two major application areas of CADD are structure-based drug design and ligandbased drug design [14][15][16][17]. CADD is used to investigate drug design application centered on structure-based drug design and molecular dynamics.
The personalized medicine and biomedicine [18] are a novel department knowledge which could analyze the mutation [19], and the cause for special disease [20]. The traditional Chinese medicine (TCM) is defined a model case in this department. TCM has an important diagnosis culture in Asia, especially in China, Taiwan, Korea, and Japan. The TCM Database@Taiwan (http://tcm.cmu.edu.tw/) [21] is the largest traditional Chinese medicine database in the world. Since it was established in 2011, there have been successful discoveries of novel lead compounds from the TCM Database@Taiwan application for cancer treatment [22][23][24][25], memory disease prevention [26], pain relief [15], and antivirals [27][28][29][30][31]. Today, the screening of TCM compounds is possible from the application system of the website [32] and the cloud computing platform [33].
In this research, we screen a candidate compound against breast cancer from the TCM Database@Taiwan. The computational techniques of docking screening are used to select ligands prior to applying molecular simulation by molecular dynamics (MD) to investigate the variations in protein-ligand interactions that may contribute to the evaluation of the effects on ESR1 inhibition.

Data Set.
A total of 61,000 TCM compounds were downloaded from the TCM database (http://tcm.cmu.edu.tw/). The human ESR1 (PDB ID: 1GWQ) crystal structure was obtained from RCSB Protein Data Bank [1,13]. The Accelrys Discovery Studio 2.5 (DS 2.5) was used as the molecular simulations platform.

Disorder Protein Detection.
The disorder region in protein plays an important role in drug design; thus we take the sequence to predict the disorder region by the Database of Protein Disorder (DisProt: http://www.disprot.org/) [34]. The prediction decides the character of protein structure; then taking the comparison with the docking site could evaluate the efficacy of the drug during protein-ligand interaction.

Molecular
Docking. The LigandFit module [35], a receptor-rigid docking algorithm program in Discovery Studio 2.5 (DS 2.5), was used for docking simulations of Raloxifene and TCM compounds to ESR1 in the CHARMm force field [36]. The docking site of ESR1 was designed on the basis of the research [1,13]. Through docking simulation, the top two compounds with the highest docking scores of the TCM compounds were selected to make the analysis of the hydrophobic interactions by Ligplus [37,38].

Molecular Dynamics
Simulation. The ligands of candidate complex must be reprepared before applying MD  simulation by using SwissParam (http://swissparam.ch/) [39] based on the reference force field [40] of GROMACS 4.5.5 [41]. The ESR1, with ligands, was placed in a simulation box in an appropriate buffer or other solution at a minimum distance of 1.2Å from the complex. The solution for simulation was based on the TIP3P water model in which sodium and chloride ions were added to neutralize complex charges. Based on the Steepest Descent method for 5,000 steps to minimize the complex, the structure with the lowest energy was transferred to MD simulation. The electrostatic interactions were calculated on the basis of the particle-mesh Ewald (PME) method [42]. The calculation with each time step was 2 fs and the numbers of steps were 5,000,000 times then the total simulation time of MD was 10,000 ps. The equilibration under the 100 ps constant temperature (PER ensemble) was based on the Berendsen weak thermal coupling method. The protocols in Gromacs used the MD data to calculate the MD trajectories, RMSD, energy variations, and eigenvector after MD.

The Detection of Disorder Protein.
The disorder protein is defined as unstructured protein which makes the compounds dock to protein and stabilize the complex with difficultly while the docking site is a disorder region. The cited references [17,43] indicate that the disorder region may have lower side effect than the widespread domain. Thus the disorder Docking site  region can be defined as a challenge for drug design. In the prediction, the residue with the disposition greater than 0.5 is defined as disorder region (Figure 1). In this result, all the important amino acids of ESR1 are less than the threshold; thus the disorder protein has a weaker effect on docking and simulation.
The structure of control and the candidate compounds was selected after screening from the TCM database ( Figure 2). The docking poses presents the ligands had interactions with different critical amino-acids in the protein. Figure 3 indicates the selected compounds could target and interact with amino acids around docking site.
The hydrophobic interaction can be analyzed by Ligplus (Figure 4). This result shows that the amino acids Glu353, Leu391, Arg394, and Phe404 can interact with all ligands through hydrophobic interactions or hydrogen bonds, indicating that these amino acids might be important in ligandprotein binding situations. ( Figure 5). The total energy is in the range between −744 and −750 * 10 3 kJ/mol and tends to −748 * 10 3 kJ/mol. From this figure, if both the amplitude of complex RMSD and energy are less; then the simulation may become balanced. Sallylmercaptocysteine has larger variation than other ligands but the complex RMSD of S-allylmercaptocysteine is the lowest. From this situation, we suggest S-allylmercaptocysteine can still interact, and this interaction makes the complex more stable.

Molecular Dynamics
The RMSF focus on each residue was analyzed, and on the variation of the whole protein, including with the ligand interaction ( Figure 6). In this result, it can be seen that the regions of protein-ligand interaction are similar.
The reference-identified eigenvector was used to represent the protein variation [65]. The first two eigenvectors were selected based on PCA (principal component analysis) calculation, and become the -and -axes. The comparison with apo (unbound protein) could find protein variation of first main character of protein (Figure 7). The upper subunit in these figures are the first eigenvector diffusion between apo and complex. The following is a matrix established from first two eigenvectors. After the comparison, we find that complex with S-allylmercaptocysteine is similar to apo then different from other compounds. This situation may not mean an absence variation, but the position of variation might be smaller.
The clustering is a result of the division of data into several groups based on RMSD variation; thus data in the same group will have the similar structure (position and composition) ( Figure 8). In this result, S-allylmercaptocysteine has the least group and the largest one is in last. This situation means the complex with S-allylmercaptocysteine will tend to balance quickly. The largest group of apo and 5-hydroxy-Ltryptophan is not in last and it might present this simulation for apo and 5-hydroxy-L-tryptophan is not enough. Thus S-allylmercaptocysteine and control, Raloxifene, might have better effect on ESR1.
After the structure variation discussion is based on eigenvector and clustering, we should take focus on the structure variation during protein-ligand interaction (Figures 9 to 11). In Figure 9(a), there is high percentage (100%) of H bond occupancy in Glu353 which indicates that Glu353 may have a function in the inhibition of ESR1 from Raloxifene interaction. Figure 9(b) shows the variation between MD 0 ns and 10 ns which present the position and composition variation as control drug inhibit ESR1. As the variation of apo between 0 ns and 10 ns is smaller position variation, the variation of control might become a sample to detect the efficacy of compounds from the structure variation of simulation.
S-Allylmercaptocysteine has high H bond occupancy in both Glu353 and Arg394, with the variation in 2 and 5 being more variable than the control (Figure 10). This result might present that not only S-allylmercaptocysteine the efficacy as control, but also that the force might be stronger than control.
The 5-hydroxy-L-tryptophan complex interactions were recorded ( Figure 11). Besides Glu353 and Arg394, Leu346 (2) (4) (4) (3) has been recorded in Figure 11(a). The structure variation of 5-hydroxy-L-tryptophan is more similar to Sallylmercaptocysteine than control. From this result, we suggest 5-hydroxy-L-tryptophan might also have the function for the inhibition of ESR1. From these variations found, we suggest Glu353 might be important in inhibition and Arg394 might make the force stronger based on the H bond and structure variation.
The pathway definition is according to the calculation of caver 3.0 to determine the inter-path protein path during MD simulation [66]. The pathway could help to determine the ligand moving and the pole provided from protein after interaction ( Figure 12). In these results, we could find a path through ESR1. We suggest the ligands inhibit ESR1 while the interaction is in the protein.

Conclusion
Based on the above discussion, we can find that the top two TCM compounds S-allylmercaptocysteine and 5-hydroxy-Ltryptophan can have effect on ESR1 against breast cancer. Glu353 might have important role in inhibition based on high H bond occupancy in MD. Finally, according to the discussion from docking, interaction, and variation, we suggest that S-allylmercaptocysteine might be the best compound to inhibit ESR1 against breast cancer, even better than the control.