Structure-Based In Silico Investigation of Agonists for Proteins Involved in Breast Cancer

Cancer is recognized as one of the main causes of mortality worldwide by the World Health Organization. The high cost of currently available cancer therapy and certain limitations of current treatment make it necessary to search for novel, cost-effective, and efficient methods of cancer treatment. Therefore, in the current investigation, sixty-two compounds from five medicinal plants (Tinospora cordifolia, Ocimum tenuiflorum, Podophyllum hexandrum, Andrographis paniculata, and Beta vulgaris) and two proteins that are associated with breast cancer, i.e., HER4/ErbB4 kinase and ERα were selected. Selected compounds were screened using Lipinski's rule, which resulted in eighteen molecules being ruled out. The remaining forty-four compounds were then taken forward for docking studies followed by molecular dynamics studies of the best screened complexes. Results showed that isocolumbin, isopropylideneandrographolide, and 14-acetylandrographolide were potential lead compounds against the selected breast cancer receptors. Furthermore, in vitro studies are required to confirm the efficacy of the lead compounds.


Introduction
Breast cancer is a heterogeneous group of diseases that originate in the breast tissue and result in the formation of a lump or a mass in the breast. Breast cancer mostly originates from the epithelial cells lining the milk duct [1,2]. When the tumor is small and easily treatable, no concrete symptoms are observed, therefore making early screening important. However, a major symptom is the presence of a painless lump/mass in the breast. In some cases, cancer spreads to the lymph nodes present in the underarms and can cause swelling or lumps, even though the tumor itself is not large enough to be felt by the patient. Pain in the breast, a feeling of heaviness, swelling, or redness of the skin, and spontaneous discharge from the nipples are some of the rare symptoms experienced by some patients [3]. It is the most frequent cause of cancer and cancer-related death among women worldwide. It impacts more than 2.1 million women each year. According to the World Health Organization, in 2018, more than 620 thousand women died from breast cancer worldwide. is constitutes about 15% of all cancer-related deaths among women [4]. A total of 6.9% of cancer deaths are attributed to breast cancer with 684,996 deaths in 2020.
Estrogen receptor (ER) α, an ER subtype, plays a major role in the physiological development of the body [5]. e reproductive, central nervous system, skeletal, and cardiovascular systems are some of the organ systems where ERα plays an important role in the development and functioning [5]. As shown in [5][6][7], ERα is widely expressed throughout the body such as in the uterus, mammary glands, male reproductive system, ovaries, spleen, kidney, and lungs among other organs. ERα is responsible for human breast cancer progression [6,8]. Approximately 75% of breast tumors have ERα at the time of diagnosis (ER-positive breast cancer). erefore, it has been selected as a marker for prognosis during the course of therapy that a patient receives. ER-positive markers have been found to have a better prognosis than ER-negative tumors. Tamoxifen is the most widely used drug for treating cases of ER-positive breast cancer when the patient requires endocrine therapy [9]. Other selective ER modulatory drugs are toremifene and raloxifene. Aminoglutethimide and exemestane are potent aromatase inhibitors for ER [10].
HER4/ErbB4 is a member protein of the epidermal growth factor receptor (EGFR) family. Each member of the EGFR family is essential for normal development [11] in animals and humans. ey are necessary for the healthy development of the heart, nervous system, and mammary gland [11]. HER4 is known for its crucial role in carcinogenesis [12]. However, unlike other members of EGFR, HER4 signaling is less understood. It is known to play a positive role in cancer progression, especially breast cancer in humans [13,14]. As reported by Zhu et al. [15], HER4, alone, could moderate the estrogen-induced growth of breast cancer cells. And, as observed in vitro, overexpression of HER4 influences cell cycle arrest and apoptosis significantly in breast cancer [15,16]. Canertinib, developed by Pfizer, inhibits HER2 and HER4, but due to its limited effect, as shown in phases I and II, this drug was discontinued [17]. Afatinib, an irreversible tyrosine kinase inhibitor, simultaneously targets HER2 and HER4 [18]. It also suppresses HER3-mediated signaling [19].

Protein and Ligand.
e 3D crystal structures of the proteins considered in this study were imported from the Protein Data Bank (RCSB PDB) Web server [28]. For the following study, two proteins were taken for the search of their antagonists, HER4/ErbB4 kinase (PDB ID: 3BBT) and human estrogen receptor ɑ (PBD ID: 2IOG).

Ligands.
For the analysis and prediction of the potential ligand, a data set of 62 phytocompounds from Tinospora cordifolia, Podophyllum hexandrum, Andrographis paniculata, Beet vulgaris, and Ocimum sanctum was selected (Table 1). e structures obtained from the PubChem database [27,29] of all compounds were converted from .sdf format to .pdb format using the Discovery Studio Visualizer by BIOVIA.

ADME Testing.
e SwissADME tool [30] by ExPASy tools was used to determine the values of parameters proposed in the Lipinski Rule of Five [31]. In addition to lipophilicity, the number of hydrogen bond donors and accepters, and molecular weight, one parameter, molar refractivity (Ghose rule), was also taken into account to test the drug-likeliness of all the 62 compounds [32].

Molecular Docking.
Interaction studies of receptor structure with selected ligands were performed using the AutoDock v4.2.6 interface. e protocol started with the preparation of a receptor/protein in which surrounding water was removed from the vicinity of the protein, Kollman charges were added, and polar hydrogens were added, merging nonpolar at the same time. All the rotatable bonds of the ligand were allowed to rotate, and the Gasteiger charges were computed. Grid coordinates were selected using the existing inhibitor in the receptor after which the bound inhibitor was also removed to vacate the active site.
e Lamarckian GA output was used to obtain the docking results using a genetic algorithm. Of the 10 conformations obtained, the one with the least binding energy was selected, and 2D binding interactions between active site residues and the ligand were generated using the BIOVIA Discovery Studio Visualizer v19.1.0.18287.

Molecular
Dynamics. 100 ns simulations of the docked receptor-ligand complex were carried out. Molecular simulations were carried out on the Desmond-Maestro module 2020. All the parameters and algorithms were kept as default. Desmond by itself uses the most efficient algorithms for generating precise output data. e TIP3P model system of water was provided to the docked complex to provide a medium. 0.15 M Na + ions were added to neutralize the whole system with OPLS-AA 2005 as the assigned force field. e SHAKE/RATTLE algorithm restricted the covalent bond movement. NVT ensemble with 300K as temperature and 1 bar as pressure, and all inputs were combined using the RESPA integrator. 100 ns simulations were allowed to be subdivided into 1000 frames for dynamic analysis of proteinligand interaction [33,34].
2.6. PASS Webserver. Based on the structures using multilevel neighbors of atoms description, the PASS webserver can be used to predict the biological activity of the compound. e SMILES of the compound are taken as the input, and the probability of a biological activity can be obtained as the output.

Result and Discussion
3.1. ADME Testing. A preliminary structure-based analysis was conducted on the selected 62 phytocompounds which 2 Evidence-Based Complementary and Alternative Medicine were reported in our earlier study [35]. Compounds were subjected to a total of 5 parameters: logP/lipophilicity (<5), molar refractivity (40-130), molecular weight (<500 Da), hydrogen bond acceptor (<10), and hydrogen bond donors (<5) [36,37]. 18 small molecules showed two or more violations of parameters which resulted in their omission from further analysis.
Out of the 10 conformations obtained for the 3BBTisocolumbin complex, −9.06 kcal/mol was the best binding energy. Four different types of bond formation took place. THR835 and LEU 825 interacted with the five membered ring with pi-sigma interaction. ASP836 formed a conventional carbon-hydrogen bond. LYS726, VAL756, LEU758, LEU769, LEU839, and MET747 interacted by forming alkyl    Figure 2). Figure 3 represents the best docked conformation of the 2IOG-isopropylideneandrographolide complex having a binding energy of −10.3 kcal/mol. 18 residues in total interacted with the present ligand. CYS530 and THR347 formed conventional hydrogen bonds. ASP351 was the only residue forming a carbon-hydrogen bond. 4 residues interacted with weak van der Waals forces. All the other residues either interacted with either the alkyl or pi-alkyl bond formation. e best complex with 14-actetylandrographolide showed a binding energy of −9.26 kcal/mol; cumulatively, 21 residues around the ligand interacted with it. CYS530 and MET343 acted as anchors forming strong conventional hydrogen bonds. MET421, LEU384, MET388, LEU387, PHE404, LEU525, and LEY346 formed either alkyl or pialkyl bonds. 11 residues showed weak van der Waals interactions. THR347 formed an unfavorable donor-donor bond ( Figure 4) (Table 1).

Molecular Dynamics.
Simulation studies were carried out on the two best docked complexes for each receptor taken into account. Each dock was allowed to simulate in the dynamic environment for 100 ns seconds to yield the potential energy and total energy (Table 3).

Structural Deviation and Compactness.
Using four parameters, the conformational stability of the protein and ligand can be analyzed. e root mean square deviation (RMSD), root mean square fluctuations (RMSF), radius of gyration (rGyr), and solvent accessible surface area (SASA) plots can be insightful for defining the compactness of protein and ligand complexes.

HER4/ErbB4 Kinase (3BBT).
A stabilized RMSD plot can be observed with isocolumbin as the ligand molecule. e graph can be observed to fluctuate a little but around a fixed average value only, i.e., 2.5Å. Although during 20-40 ns a few small peaks are visible of magnitude 1Å for backbone atoms and Cɑ atoms, this sudden peak can be regarded to the internal vibrations of the atoms. After 40 ns, a stable plateau is attained, lasting throughout the simulation. With isopropylideneandrographolide as a ligand molecule, a significant deviation (>3) can be seen (>3Å) suggesting large conformational changes in the protein, which is not preferable. After 40 ns the graph stabilizes to the end around a value of 5 which is still higher compared to the RMSD of the 3BBT-isocolumbin complex ( Figure 5(a)).
RMSF graphs for both the ligand complexes can be observed as they should be. Secondary structures like the alpha-helix and beta-sheets are more rigid portions of the protein and therefore should fluctuate less, which is exactly what the RMSF plot showed. e unstructured part, including loops and straight chains of amino acids, fluctuated relatively more ( Figure 5(b)). e rGyr plot for the isolcolumbin complex yielded a value of 3.66 ± 0.12Å. e probability distribution graph revealed  Evidence-Based Complementary and Alternative Medicine that in most frames, the 3.66Å was achieved with little or no fluctuation overall. e isopropylideneandrographolide complex showed a higher rGyr value of 4.40Å with a very nominal crest and troughs. Both the graphs suggest the compactness of the complex; however, the lower value of isocolumbin gives it an edge suggesting a more compact structure during the simulation course ( Figure 5(c)).
Isocolumbin again provides a lower SASA value than the isopropylideneandrographolide; an average of 30Å SASA is observed with the isocolumbin ligand, whereas this number goes up to 100Å with the other ligand, with more variations in the overall graph. e binding of isolcolumbin with the protein is fairly more stable and the core residues are not exposed to the surrounding water, suggesting a good binding of the ligand with the 3BBT receptor ( Figure 5(d)). Receptor ɑ (2IOG). Toward the later stages of the dynamics, the RMSD plot with 14-acetylandrographolide stabilized almost at 3.2 without any   . At the beginning, from 10 to 20 ns, a relatively higher peak can be observed, but after 20 ns, the plateau is attained. Toward the end, there was a dip in the graph as well. e observations are suggestive of protein stability with both the ligand molecules ( Figure 6(a)).

Human Estrogen
Fluctuations considering residues as the basic entity were measured using the RMSF plot. e general convention of amino acids forming secondary structures deviating less as compared to free amino acids such as loops and straight chains was quite visibly followed by both ligand complexes. e binding pocket can be said to lie somewhere in the middle of being flexible and rigid as residues of both natures interacted with the ligand (Figure 6(b)). e rGyr plot is suggestive of the compactness of the protein structure. At 100 ns, a value of 3.8Å was obtained with the protein-14-acetylandrographolide complex, while with isopropylideneandrographolide a value of 4.2Å was obtained.
Both the complexes averaged 10Å 2 in the SASA plot which is also confirmed by the probability distribution plot. e exposure of core residues to the surrounding solvent is minimal which is an additive property to protein stability. e formation of a hydrophobic pocket around both Evidence-Based Complementary and Alternative Medicine 7 residues also provides confirmation for the lower SASA value (Figure 6(d)).

Secondary Structure Count and Interaction Dynamics.
44.39% of amino acids of 3BBT when bound to isocolumbin took part in secondary structure formation, while with isopropylideneandrographolide the percentage reduced to 41.03% which can be targeted toward the increased SASA value.
A difference of 1% can be observed in the 2IOG protein with two different ligands. With 14-acetylandrographolide, a total of 58.08% of residues formed secondary structures, while the number decreased to 57.07% with isopropylideneandrographolide (Table 4).
To assess the binding of protein-ligand complexes, protein-ligand contact estimation and analysis become crucial.
Considering the 3BBT-isocolumbin complex (Figure 7(b) ), ASP836 and GLY838 anchored the ligand molecules with multiple contacts, and both formed significant string hydrogen bonds. ey are assisted by the VAL756 residue, which forms the major water bridges. A total of 25 residues interacted with the ligand at different instances. LYS726, PHE837, LEU758, and VAL756 formed multiple types of bonds. e timeline   (Figure 7(a)) interacts with 27 amino acids, but only only MET774 acts as the anchor to the ligand. e remaining amino acids interacted with the ligand in a scattered and faded manner with a lot more instances of zero contact with the ligand. Two 2IOG complexes showed quite significant interactions in their own sense. e complex with 14-acetylandrographolide had 31 total residues that interacted with it. GLU419 forms a strong hydrogen bond along with HIS524 and a few more residues. ALA350, TRP383, LEU387, MET388, LEU391, PHE404, VAL418, and LEU525 (Figure 7(c)) stabilized the protein-ligand complex using hydrophobic interactions and contributed to the retention of the ligand in the binding pocket. On the other hand, with isopropylideneandrographolide, LEU346 formed major interactions by showing 3 types of binding, i.e., hydrogen bonds, hydrophobic bonds, and water bridges. e LEU346 interaction is briefly assisted by various residues such as ALA350, LEU383, LEU391, LEU 428, MET525, MET421, PHE404, and MET388 (Figure 7(d)). In the beginning, a lot of instances can be observed having no contact with the ligand which may explain the higher RMSD in the beginning.

PASS Webserver Prediction
e webserver can be used to predict the biological activity based on the ligand's structure. e three shortlisted compounds in the study were subjected to the prediction. All of the compounds led to the same biological activity. e probability of the ligands acting as antineoplastics, i.e., drugs having tumor restrictive property, ranged between 0.882 and 0.959 when Pa > Pi (Table 5).

Conclusion
e aim of this study was to identify natural compounds that can prove effective against different proteins associated with breast cancer with few or no side effects. Sixty-two compounds from five selected plants were shortlisted for this study, out of which eighteen compounds were ruled out in violation with Lipinski's rule of five. e least binding affinities and corresponding binding poses for the remaining forty-four compounds were determined. ereafter, MD simulation studies were carried out on the two best docked complexes for each receptor taken into account, followed by prediction of biological activity of the lead compounds. e results revealed that isocolumbin, isopropylideneandrographolide, and 14-acetylandrographolide are lead compounds against selected breast cancer proteins. All these compounds have antineoplastic effects. Further research should be encouraged to determine the in vitro efficacy of the lead compounds and their exact mechanism of action.

Data Availability
e data used to support the findings of this study are included within the article.

Conflicts of Interest
e authors declare that they have no conflicts of interest. Pa � probability to be active; Pi � probability to be inactive.