In Silico Identification of New Targets for Diagnosis, Vaccine, and Drug Candidates against Trypanosoma cruzi

Chagas disease is a neglected tropical disease caused by the parasite Trypanosoma cruzi. Despite the efforts made with distinct methodologies, the search for antigens for diagnosis and vaccines, and for drug targets for the disease, is still needed. The present study aimed to identify possible antigens that could be used for diagnosis, vaccines, and drug targets against T. cruzi using reverse vaccinology and molecular docking. The genomes of 28 T. cruzi strains available in GenBank (NCBI) were used to obtain the core genome. Then, subtractive genomics was carried out to identify core genes nonhomologous to the host. A total of 2630 proteins conserved in the 28 strains of T. cruzi were predicted using OrthoFinder and Diamond software, of which 515 showed no homology to the human host. These proteins were evaluated for their subcellular localization: 214 are cytoplasmic, and 117 are secreted or present in the plasma membrane. To identify antigens for diagnosis and vaccine targets, we used the VaxiJen software and selected 14 nonhomologous proteins with high predicted antigenicity and potential for in vitro and in vivo tests. When these 14 nonhomologous molecules were compared against other trypanosomatids, the retrotransposon hot spot (RHS) protein was found to be specific to the T. cruzi parasite, suggesting that it could be used for Chagas disease diagnosis. These 14 proteins were analyzed using IEDB tools to predict their B- and T-cell epitopes, including MHC I and MHC II binding. Furthermore, protein structures were modeled with MHOLline and used for molecular docking analysis. As a result, we identified 6 possible T. cruzi drug targets that could interact with 4 compounds with known antiparasitic activity. These 14 protein targets, along with the 6 potential drug targets, can be further validated in future in vivo studies of Chagas disease.


Introduction
Chagas disease is a neglected tropical disease caused by the protozoan Trypanosoma cruzi that affects around 8 million people worldwide [1]. It is primarily transmitted by blood-feeding insect vectors, the triatomines, and also through oral ingestion, blood transfusion, and organ transplantation, especially in countries where blood donors are not screened for T. cruzi. Clinical manifestations of the disease vary from severe myocarditis and/or gastrointestinal changes, such as megaesophagus and/or megacolon, to an asymptomatic or indeterminate form [2]. During these phases, individuals mount a strong cellular and humoral immune response [3,4]. Despite these responses, no diagnostic antigen is yet available to characterize and identify each phase of the disease. It is still unclear which mechanisms trigger the transition from the asymptomatic to the symptomatic form, although the factors involved in etiopathogenesis are known to relate to the parasite strain, parasite load, infection phase, and host immune response [5].
The development of an anti-T. cruzi vaccine has proven challenging because of the difficulty of finding antigens or formulations that confer effective protection and because of the risk of inducing autoimmunity, which many consider a potential cause of disease progression and/or pathogenesis [6]. A wide variety of vaccine formulations have been tested over the past decade, providing strong evidence that T. cruzi can be controlled by vaccines in experimental models. However, these studies have shown that vaccine formulations and vaccine antigens are still not satisfactory, so further studies are essential to obtain a vaccine that is truly effective for the entire population [6,7].
The vaccines developed using conventional approaches and already tested in experimental models are based on inactivated or attenuated pathogens or on isolated parasite antigens. Although successful in several cases, these vaccines have not advanced to approval or to multicenter trials in humans [8][9][10]. For vaccines based on previously selected antigens, few studies have investigated the properties of these antigens; their origin and diversity, their most important epitopes, and the immune response activated by each one still need to be explored [6,7]. The application of computational methods to analyze immunological processes, known as immunoinformatics, is revolutionizing vaccine development. In this regard, reverse vaccinology is an approach for identifying vaccine candidates in the postgenomic era at reduced cost and time. It is a genome-based screening of B- and T-cell epitopes in predicted proteins that can elicit an immune response. First, proteins conserved across all strains of the species of interest are identified using approaches such as pangenomics, with the aim of finding vaccine targets common to all pathogen strains. Since vaccines may induce an autoimmune response, these conserved proteins must be compared with host proteins through a subtractive genomics approach. Subcellular localization prediction is also needed, since membrane and secreted proteins are the first to contact the host. Then, MHC I and MHC II binding is predicted to select possible diagnostic and vaccine targets. Each stage filters the candidate set, retaining the targets most likely to generate an effective immune response [11].
The drugs available for Chagas disease treatment cause various side effects and have variable cure rates at different stages of the disease. Moreover, these drugs were developed in the 1970s, and relatively few studies have focused on this area since then. In this context, molecular docking is a useful tool that describes the interaction between small molecules (compounds or ligands) and the active-site residues of a receptor protein of interest [12]. The approach is considered successful when it identifies the ligand conformation that best fits the receptor, determining the geometry of the ligand within the steric constraints of the binding site and the interactions it forms [12]. It has become an important computational technique, playing an essential role in drug discovery against various pathogens.
Thus, the present study aimed to identify potential diagnostic and vaccine candidates and pharmacological targets for T. cruzi/Chagas disease using subtractive genomics, reverse vaccinology, and molecular docking tools.

Data Collection, Gene Prediction, and Orthology Analyses
The complete genome sequences of 28 T. cruzi strains were obtained from the GenBank database of the National Center for Biotechnology Information (NCBI). Gene prediction was then performed for all genomes with the GeneMark software suite, so that the predictions were homogeneous across strains and unexpected results and possible misinterpretations were avoided.
The orthology of the predicted proteins was determined using OrthoFinder in standard mode, with the Diamond tool (v0.9.22.123) performing the all-versus-all sequence comparison. OrthoFinder is a fast, accurate, and comprehensive tool for comparative genomics: it identifies orthologs and orthogroups, infers phylogenetic trees, and provides comprehensive statistics for comparative genomic analysis [13].
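The core-genome step can be illustrated with a minimal Python sketch. It assumes an orthogroup-by-strain gene-count table laid out like OrthoFinder's Orthogroups.GeneCount.tsv output (an Orthogroup column, one count column per strain, and a Total column). The strain names and counts are hypothetical, and because the study does not state whether single-copy orthogroups were required, both rules are shown.

```python
import csv
import io


def core_orthogroups(tsv_text: str, single_copy: bool = False) -> list:
    """Return IDs of orthogroups present in every strain.

    single_copy=True additionally requires exactly one gene per strain
    (an assumed variant; the study does not say which rule it applied).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Every column except the orthogroup ID and the row total is a strain.
    strains = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    core = []
    for row in reader:
        counts = [int(row[s]) for s in strains]
        if single_copy:
            present = all(c == 1 for c in counts)
        else:
            present = all(c >= 1 for c in counts)
        if present:
            core.append(row["Orthogroup"])
    return core


# Hypothetical three-strain table.
toy = (
    "Orthogroup\tStrainA\tStrainB\tStrainC\tTotal\n"
    "OG0000000\t3\t2\t4\t9\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t1\t0\t1\t2\n"
)
print(core_orthogroups(toy))                    # ['OG0000000', 'OG0000001']
print(core_orthogroups(toy, single_copy=True))  # ['OG0000001']
```

OG0000002 is dropped because it is absent from StrainB; with 28 strains the same test simply runs over 28 count columns.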

Identification of Intraspecies Conserved Proteins Nonhomologous to the Host
Vaccine and drug targets must avoid inducing autoimmune responses, and diagnostic targets must be molecules specific to a single microorganism. For this reason, subtractive genomics was carried out to identify core proteins nonhomologous to the host. We ran BLASTp with the core-genome proteins as queries against the human proteins available in NCBI databases, and core proteins homologous to human proteins were excluded.
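A minimal sketch of this host-subtraction filter is shown below, assuming BLASTp results in the default BLAST+ tabular format (-outfmt 6, twelve columns). The e-value and identity thresholds are hypothetical placeholders, since the study does not report the cutoffs it used, and the protein identifiers are invented for illustration.

```python
# Default BLAST+ tabular (-outfmt 6) column order.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def host_homologs(blast_tab: str, max_evalue: float = 1e-5,
                  min_identity: float = 35.0) -> set:
    """Query IDs with at least one significant hit to a human protein.

    max_evalue and min_identity are hypothetical thresholds.
    """
    hits = set()
    for line in blast_tab.strip().splitlines():
        rec = dict(zip(COLUMNS, line.split("\t")))
        if float(rec["evalue"]) <= max_evalue and float(rec["pident"]) >= min_identity:
            hits.add(rec["qseqid"])
    return hits


def nonhomologous(core_ids: list, blast_tab: str) -> list:
    """Keep core proteins with no significant human hit, preserving order."""
    excluded = host_homologs(blast_tab)
    return [pid for pid in core_ids if pid not in excluded]


# Invented identifiers and hits for illustration.
core = ["Tc_001", "Tc_002", "Tc_003"]
blast = (
    "Tc_001\tHUMAN_A\t62.5\t300\t100\t3\t1\t300\t5\t304\t1e-80\t250\n"
    "Tc_002\tHUMAN_B\t28.0\t120\t80\t6\t10\t130\t40\t160\t0.3\t30\n"
)
print(nonhomologous(core, blast))  # ['Tc_002', 'Tc_003']
```

Tc_002 has a human hit, but its e-value (0.3) is not significant under the assumed cutoff, so it is retained; Tc_003 has no hit at all.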

Protein Subcellular Localization.
The proteins were evaluated for their subcellular localization using PSORT [14] and von Heijne signal sequence recognition [15]. PSORT predicts the presence of a signal peptide by the McGeoch method [14], which considers an N-terminal charged region and a central hydrophobic region. A discriminant score is calculated from three values: the length of the hydrophobic region, the peak hydrophobicity value of that region, and the net charge of the N-terminal charged region. A large positive discriminant score indicates a high probability of a signal sequence, whether cleaved or not. The second method, von Heijne signal sequence recognition [15], is a weight matrix method that incorporates information around the cleavage site, so it also indicates whether a signal sequence is likely to be cleaved. A large positive output indicates a greater chance of a cleavable signal sequence [16]. Based on these data, PSORT classifies proteins as cytoplasmic, secreted (those that present a signal for the endoplasmic reticulum and secretory system vesicles), nuclear, or membrane proteins.
We submitted to MHOLline the multi-FASTA files containing all amino acid sequences, regardless of their subcellular localization [17]. This online tool combines several programs, including HMMTOP, BLAST, BATS, MODELLER, and PROCHECK, to predict the three-dimensional structures of target proteins. Only models classified by MHOLline in group G2 with very high, high, good, or medium-to-good quality were used. G2 structures are those with high levels of identity to their templates, and they were chosen for the molecular docking analysis [17].
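The three quantities used by the McGeoch method can be illustrated with the Python sketch below. It is not PSORT's implementation: it uses the Kyte-Doolittle hydropathy scale as a stand-in, the N-terminal region length, search length, window size, and hydrophobicity threshold are hypothetical parameters, and the discriminant coefficients PSORT uses to combine the features are not reproduced.

```python
# Kyte-Doolittle hydropathy scale, used here as a stand-in hydrophobicity measure.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def mcgeoch_features(seq: str, n_region: int = 8, search_len: int = 30,
                     window: int = 8, min_hyd: float = 1.5) -> dict:
    """Compute McGeoch-style features (all parameters are assumptions)."""
    seq = seq.upper()
    # Net charge of the N-terminal region: K/R count +1, D/E count -1.
    n_term = seq[:n_region]
    charge = (sum(n_term.count(a) for a in "KR")
              - sum(n_term.count(a) for a in "DE"))
    # Mean hydropathy of each sliding window within the N-terminal search region.
    head = seq[:search_len]
    means = [sum(KD.get(a, 0.0) for a in head[i:i + window]) / window
             for i in range(len(head) - window + 1)]
    peak = max(means) if means else 0.0
    # Longest run of consecutive windows at or above min_hyd, expressed in residues.
    best = run = 0
    for m in means:
        run = run + 1 if m >= min_hyd else 0
        best = max(best, run)
    length = best + window - 1 if best else 0
    return {"net_charge": charge, "peak_hydropathy": peak,
            "hydrophobic_length": length}


print(mcgeoch_features("MKKR" + "L" * 12 + "SAQA"))  # signal-peptide-like toy sequence
print(mcgeoch_features("MDEK" * 8))                  # hydrophilic toy sequence
```

The signal-peptide-like toy sequence has a positive N-terminal charge and a long hydrophobic stretch, whereas the hydrophilic one has no window above the threshold.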

Identification of Targets for Vaccines.
The proteins predicted as secreted or present in the plasma membrane of T. cruzi were submitted to VaxiJen to evaluate antigenicity and immunogenicity [18]. This tool applies an auto cross covariance (ACC) transformation that converts protein sequences into uniform vectors of principal amino acid properties [18]. Because every sequence yields a vector of the same length, the ACC transformation removes the influence of sequence length. Antigenicity and immunogenicity are not simple linear properties, and ACC-transformed physicochemical properties adequately discriminate between antigens and nonantigens [18]. All proteins with an antigenicity score above the cutoff (>0.7) were considered possible vaccine targets. The molecules identified were also compared with proteins of other trypanosomatids to prospect targets that could be used as diagnostic tools.
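The length independence of the ACC transformation can be seen in the short Python sketch below. It follows the usual ACC definition, ACC_jk(lag) = (1 / (n - lag)) * sum over i of z_j(i) * z_k(i + lag), for a sequence of n residues described by d properties. The two-property descriptor table is hypothetical (VaxiJen uses z-scale descriptors), and the trained VaxiJen model that maps these vectors to an antigenicity score, which is then compared with the 0.7 cutoff, is not reproduced.

```python
from itertools import product

# Hypothetical two-property descriptors (stand-ins for z-scale values).
DESCRIPTORS = {
    "A": (0.1, -0.5),
    "G": (-0.3, 0.2),
    "K": (0.9, 0.4),
    "L": (-0.7, -0.1),
}


def acc_vector(seq: str, descriptors: dict, max_lag: int) -> list:
    """Auto (j == k) and cross (j != k) covariance terms for lags 1..max_lag.

    Returns d * d * max_lag values for d descriptors, whatever len(seq) is.
    """
    n = len(seq)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    z = [descriptors[aa] for aa in seq]
    d = len(z[0])
    vec = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(d), repeat=2):
            total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
            vec.append(total / (n - lag))
    return vec


short = acc_vector("AGKL", DESCRIPTORS, max_lag=2)
long_ = acc_vector("AGKLLKAGGA", DESCRIPTORS, max_lag=2)
print(len(short), len(long_))  # 8 8
```

Both sequences map to 2 x 2 x 2 = 8 values, which is what allows proteins of different lengths to be fed to a single classifier.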
Identification of Targets for Diagnosis
The proteins predicted as secreted or present in the plasma membrane of T. cruzi were also submitted to NCBI BLASTp (protein-protein BLAST) to evaluate their similarity to proteins of other organisms, including other trypanosomatids. This step matters because a diagnostic antigen should induce a strong immune response while being specific to a single parasite, in this case T. cruzi. All proteins with antigenicity above the cutoff (>0.7) were considered possible diagnostic targets, but only one was specific to T. cruzi.
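The specificity check can be sketched as a taxon-aware filter over BLASTp hits. In this hypothetical Python example, each hit is reduced to a (query, subject species, e-value) tuple, and a candidate is kept only if it has no significant hit outside T. cruzi; the e-value cutoff and the candidate identifiers are assumptions for illustration.

```python
def cruzi_specific(candidates: list, hits: list, max_evalue: float = 1e-5,
                   target: str = "Trypanosoma cruzi") -> list:
    """Keep candidates with no significant BLASTp hit outside the target species.

    hits: (query_id, subject_species, evalue) tuples; max_evalue is hypothetical.
    """
    cross_reactive = {q for q, species, ev in hits
                      if ev <= max_evalue and species != target}
    return [c for c in candidates if c not in cross_reactive]


# Invented candidates and hits.
candidates = ["cand_A", "cand_B"]
hits = [
    ("cand_A", "Trypanosoma cruzi", 1e-120),
    ("cand_A", "Trypanosoma rangeli", 1e-40),
    ("cand_B", "Trypanosoma cruzi", 0.0),
    ("cand_B", "Leishmania major", 0.8),
]
print(cruzi_specific(candidates, hits))  # ['cand_B']
```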
Epitope Prediction
The Immune Epitope Database and Analysis Resource (IEDB) contains a diverse catalog of information on immunogenic epitopes and immune response cells and uses this information to predict and analyze epitope candidates, i.e., molecular targets of the adaptive immune response [19].
Identification of Drug Targets
Compounds described in the literature as having antiparasitic activity, whether natural products isolated from medicinal plants or secondary metabolites, were selected to build a ligand library. The structures of 67 compounds were downloaded from PubChem (https://pubchem.ncbi.nlm.nih.gov/) [20] in .sdf format and converted to .pdb format using the Open Babel tool (v2.4.1) [21]. The PDB files were then used to assign Gasteiger partial atomic charges and to convert all ligands to PDBQT format with the prepare_ligand4.py script, run from the terminal.
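The two ligand-preparation steps can be scripted as below. This Python sketch only builds the command lines and does not run them. It assumes the Open Babel obabel executable and the MGLTools prepare_ligand4.py script, which adds Gasteiger charges by default; in many MGLTools installations the script is launched through MGLTools' own pythonsh interpreter, so the exact invocation may need adjusting. The file names are hypothetical.

```python
import shlex
from pathlib import Path


def ligand_prep_commands(sdf_files: list) -> list:
    """Build (but do not run) conversion commands for each ligand.

    Step 1: Open Babel converts SDF to PDB.
    Step 2: prepare_ligand4.py writes PDBQT (Gasteiger charges are its default).
    """
    commands = []
    for sdf in map(Path, sdf_files):
        pdb = sdf.with_suffix(".pdb")
        pdbqt = sdf.with_suffix(".pdbqt")
        commands.append(["obabel", str(sdf), "-O", str(pdb)])
        commands.append(["prepare_ligand4.py", "-l", str(pdb), "-o", str(pdbqt)])
    return commands


# Hypothetical file names; pass each list to subprocess.run(cmd, check=True) to execute.
for cmd in ligand_prep_commands(["emodin.sdf", "diospirin.sdf"]):
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to log or review the exact calls applied to all 67 compounds before running them.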
Three-dimensional structure information and druggability analysis play an important role in pathogen target prioritization and validation [22]. The three-dimensional structures of the final drug targets identified by the MHOLline workflow (http://www.mholline.lncc.br/) were submitted to DoGSiteScorer druggability analysis [23]. DoGSiteScorer is an automated online tool that detects pockets on the protein surface and estimates their druggability, i.e., their ability to bind drug-like molecules. For each identified cavity, the tool reports the cavity residues and a druggability score ranging from 0 to 1 [23]. The three-dimensional structures of the drug target proteins were then converted to the required PDBQT format using AutoDockTools (ADT) from MGLTools (version 1.5.4) [24]. For each target, a grid box centered on the active site (comprising the residues identified by DoGSiteScorer) was created for docking analysis. Docking was performed with AutoDock Vina [25]. The best-ranked molecules were then identified with the Python script topmolecule.py. The three-dimensional poses of the docked molecules were analyzed with Chimera [26], and PoseView was used for two-dimensional representations [27].
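Centering the grid box on the DoGSiteScorer pocket residues can be sketched as follows. This hypothetical Python helper reads fixed-column PDB ATOM records, takes the centroid of the atoms of the chosen residues as the box center, and pads their extent by an assumed margin. The residue numbers, coordinates, padding, and exhaustiveness value are illustrative only; the output uses standard AutoDock Vina configuration keys.

```python
def pocket_box(pdb_text: str, residues: set, chain=None, padding: float = 5.0):
    """Return (center, size) of a box around the atoms of the given residues."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if chain is not None and line[21] != chain:
            continue
        if int(line[22:26]) in residues:  # residue sequence number, columns 23-26
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no atoms found for the requested residues")
    center = tuple(sum(c[i] for c in coords) / len(coords) for i in range(3))
    size = tuple(max(c[i] for c in coords) - min(c[i] for c in coords) + 2 * padding
                 for i in range(3))
    return center, size


def vina_config(receptor: str, ligand: str, center, size, exhaustiveness: int = 8) -> str:
    """Render an AutoDock Vina configuration file body."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{ax} = {v:.3f}" for ax, v in zip("xyz", center)]
    lines += [f"size_{ax} = {v:.3f}" for ax, v in zip("xyz", size)]
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines)


def atom_line(serial, resname, chain, resseq, x, y, z):
    """Format a minimal fixed-column PDB ATOM record for a CA atom (toy data)."""
    return (f"ATOM  {serial:5d}  CA  {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00")


# Hypothetical structure: residues 10 and 20 form the pocket, residue 99 lies elsewhere.
pdb = "\n".join([
    atom_line(1, "ASP", "A", 10, 0.0, 0.0, 0.0),
    atom_line(2, "ASN", "A", 20, 2.0, 4.0, 6.0),
    atom_line(3, "GLY", "A", 99, 100.0, 100.0, 100.0),
])
center, size = pocket_box(pdb, {10, 20})
print(vina_config("target.pdbqt", "ligand.pdbqt", center, size))
```

Using all pocket atoms rather than only the residue labels keeps the box tied to the actual cavity geometry reported for each target.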

Results
The workflow of our approach, the methods, and the number of proteins retained at each step are shown in Figure 1. We compared the genomes of 28 T. cruzi strains (Table 1). The coding DNA sequences (CDSs) shared by all strains, known as the core genome, correspond to 2630 CDSs. Among these CDSs, using the human genome as the host genome, we found 515 conserved proteins not homologous to the host. We then predicted the subcellular localization of these proteins and selected secreted proteins (targeted to the endoplasmic reticulum), membrane proteins, and proteins of the secretory vesicle system, since they are probably the most antigenic and the most readily recognized by the immune system [28]. Of the 515 conserved proteins not homologous to the host, 117 are secreted, membrane components, or proteins of the secretory vesicle system (Table 2). These 117 proteins were then submitted to VaxiJen to find proteins with an antigenicity score greater than 0.7. We found 14 proteins that are likely to be antigenic: 6 mucin-associated surface proteins (MASPs), 6 hypothetical proteins from different parasite strains, 1 GP63 surface protease, and 1 putative retrotransposon hot spot protein (Table 3).
The epitopes of the 14 proteins selected with VaxiJen were predicted using IEDB algorithms for both B cells and T cells (MHC I and MHC II). B-cell epitope analysis identified 100, 50, and 2 epitopes in the proteins predicted to localize in the plasma membrane, endoplasmic reticulum, and secretory vesicle system, respectively. Figure 2 (panels A, B, and C) shows the localization of these epitopes. A threshold of 0.5 was used, and epitopes scoring above it are shown in yellow. The average, maximum, and minimum values are given in the legend of each panel.
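Applying the 0.5 threshold to per-residue B-cell epitope scores can be sketched as grouping consecutive above-threshold residues into segments and summarizing the score distribution, as reported in the Figure 2 legends. The scores below are hypothetical, and the minimum segment length is an assumed parameter rather than the IEDB setting used in the study.

```python
from statistics import mean


def epitope_segments(scores, threshold=0.5, min_length=1):
    """Return 1-based inclusive (start, end) runs of residues with score > threshold."""
    segments, start = [], None
    for i, s in enumerate(list(scores) + [float("-inf")]):  # sentinel closes a trailing run
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    return segments


def score_summary(scores):
    """Average, maximum, and minimum per-residue score."""
    return {"average": mean(scores), "maximum": max(scores), "minimum": min(scores)}


scores = [0.2, 0.6, 0.7, 0.4, 0.55, 0.1]  # hypothetical per-residue scores
print(epitope_segments(scores))  # [(2, 3), (5, 5)]
print(score_summary(scores))
```

Raising min_length (for example, to the shortest peptide considered immunologically meaningful) discards isolated above-threshold residues such as residue 5 here.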
HLA genes are highly polymorphic, and populations differ in both the frequency and the presence or absence of alleles. We therefore used a tool that ranks epitopes by their predicted MHC binding, in which values closer to 0 indicate a higher probability of binding to MHC, i.e., a greater chance of being recognized as an epitope. Because of the great diversity of alleles, the 30 best-ranked epitopes for MHC I were selected, of which 20 belong to plasma membrane proteins, 9 to endoplasmic reticulum proteins, and only 1 to a secretory vesicle system protein (Table 4). For MHC II, the 30 best-ranked epitopes were also selected, 25 from plasma membrane proteins and 5 from endoplasmic reticulum proteins (Table 5).
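Selecting the 30 best-ranked MHC predictions amounts to sorting by the binding score, where lower values mean stronger predicted binding, and then tallying the selection by subcellular localization. The Python sketch below assumes each prediction is a record with hypothetical field names and uses invented peptides and scores, with n = 2 for brevity.

```python
from collections import Counter


def top_epitopes(predictions, n=30):
    """Return the n predictions with the lowest score (strongest predicted binding)."""
    return sorted(predictions, key=lambda p: p["score"])[:n]


def count_by_location(selected):
    """Tally selected epitopes by the localization of their source protein."""
    return Counter(p["location"] for p in selected)


# Invented predictions; field names are hypothetical.
predictions = [
    {"peptide": "AAAKLLLLA", "location": "plasma membrane", "score": 0.3},
    {"peptide": "GGSLKVLLA", "location": "endoplasmic reticulum", "score": 0.1},
    {"peptide": "KLLAVAGSA", "location": "plasma membrane", "score": 2.5},
    {"peptide": "VLLKAAGSS", "location": "plasma membrane", "score": 0.2},
]
best = top_epitopes(predictions, n=2)
print([p["peptide"] for p in best])  # ['GGSLKVLLA', 'VLLKAAGSS']
print(count_by_location(best))
```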
Currently, phytotherapeutics and natural plant products are frequently used in health services in both developed and developing countries and play important roles in recent drug development. They are mixtures of chemicals synthesized by plants, and their impact can be moderate because of very low absorption after oral administration [29,30]. The compounds selected by our group are described in the literature as medicinal plant constituents or natural products with antiprotozoal activity against T. cruzi. For each target protein, all ligands were docked against the residues of the cavity identified by DoGSiteScorer [23] (Table 6). The compounds with the best AutoDock Vina binding affinity scores were then analyzed to determine their best poses (Table 7). The predicted protein-ligand interactions for the best ligands with each target, selected because they interacted with most pocket residues, had lower binding affinity scores, and formed hydrogen bonds, are described in Table 7; the 3D and 2D representations of the target protein docking are shown in Figures 3-5.
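Ranking ligands by Vina affinity can be sketched by reading the "REMARK VINA RESULT:" lines that AutoDock Vina writes for each pose in its output PDBQT file. This Python sketch is a hypothetical stand-in for the topmolecule.py script mentioned in the Methods, whose exact behavior is not described; the ligand names and affinities are invented.

```python
def best_affinity(pdbqt_text: str) -> float:
    """Lowest (most favorable) affinity, in kcal/mol, among the Vina poses."""
    values = [float(line.split()[3]) for line in pdbqt_text.splitlines()
              if line.startswith("REMARK VINA RESULT:")]
    if not values:
        raise ValueError("no 'REMARK VINA RESULT:' lines found")
    return min(values)


def rank_ligands(outputs: dict, top: int = 3) -> list:
    """outputs maps ligand name -> Vina output PDBQT text; returns best first."""
    scored = sorted((best_affinity(text), name) for name, text in outputs.items())
    return [(name, affinity) for affinity, name in scored[:top]]


def fake_output(*affinities):
    """Build a minimal hypothetical Vina output containing only MODEL and REMARK lines."""
    return "\n".join(
        f"MODEL {i}\nREMARK VINA RESULT:  {a:6.1f}      0.000      0.000\nENDMDL"
        for i, a in enumerate(affinities, start=1))


outputs = {
    "ligand_A": fake_output(-7.2, -6.9),
    "ligand_B": fake_output(-8.4),
    "ligand_C": fake_output(-5.1),
}
print(rank_ligands(outputs, top=2))  # [('ligand_B', -8.4), ('ligand_A', -7.2)]
```

Affinity alone is only a first filter; as described above, the final choice also weighed contacts with pocket residues and hydrogen bonding.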

Discussion
The vast repertoire of MASP sequences in the T. cruzi genome and the fact that they can be secreted by the parasite contribute to the ability of this protozoan to infect various host cell types and/or to participate in immune evasion mechanisms. MASP has been shown to induce endocytosis in Vero cells, a process through which the parasite's trypomastigote forms actively invade host cells. Additionally, MASP peptides can elicit different IgG (immunoglobulin G) and IgM (immunoglobulin M) antibody responses, and the level of antibodies to a given peptide may vary after sequential passage in mice. Moreover, it has been shown that changes in the repertoire of antigenic MASP peptides may contribute to evasion of the host immune response during the acute phase of the disease [31].
Proteomic and immunoinformatics techniques showed that several members of the MASP family, expressed [...] It has been revealed that a synthetic 20-mer peptide (MASPpep) containing potentially overlapping B-cell, CD4+ T-cell, and CD8+ T-cell epitopes can induce immunity mediated by these cell types against T. cruzi infection in mice. These data demonstrated that a MASPpep synthetic peptide-based vaccine can effectively control T. cruzi infection, prolonging survival and possibly reducing disease progression by inducing optimal immune stimulation, i.e., involving both humoral and cellular responses [32]. The central region of MASP is highly variable, contributing to a vast repertoire of peptides that can interact with several receptors of different host cell types. Therefore, it is worth investigating whether MASP stimulates the immune system, especially during the acute phase of infection, when many trypomastigotes circulate in the human host [33].
Proteases are present in different protozoan parasites and appear to be important in several aspects of parasite-host interaction, in addition to their role in pathogen nutrition [34]. Metalloproteases have been described in several parasites, but only those of Leishmania spp. have been fully characterized [35]. On the outer surface of their plasma membrane, Leishmania spp. express an important 63 kDa glycosylphosphatidylinositol (GPI)-anchored glycoprotein called gp63 or leishmanolysin, which represents more than 1% of the total cellular protein content [36,37]. Gp63 plays several roles in parasite-host interactions and is an important virulence factor [38]. In T. cruzi, different metalloprotease activities have been described [39], some of them expressed only during metacyclogenesis [40,41]. Four gp63 homologous genes have already been identified in T. cruzi, some of which are predominantly expressed at the mRNA level in the amastigote stage [42].
Immunocomplexes (ICs) are direct and real-time products of humoral immune responses. Among the various parasitic antigens incorporated in ICs, gp63 is relatively well known for its function [43]. Antipeptide antibodies against the C-terminal epitope, present in a subset of gp63 proteins, are recognized at all stages of the parasite and subsequently  [34]. In vitro studies also demonstrate that the presence of anti-gp63 serum has a significant inhibitory effect on T. cruzi infection [44].
Retrotransposon hot spot (RHS) proteins are encoded by a multigene family present in T. cruzi and Trypanosoma brucei but absent from the Leishmania spp. genome [45]. A recent proteomic analysis identified around 39 RHS isoforms expressed in the circulating trypomastigote form of T. cruzi [46]. In ELISA tests, only the recombinant RHS protein elicited a strong serum response in patients with different clinical manifestations of Chagas disease [47]. Proteomic studies have shown that T. brucei also expresses RHS proteins [48]. However, alignments of several RHS protein sequences showed that T. cruzi and T. brucei RHS proteins share less than 33% identity. No cross-reactivity of T. cruzi RHS protein with serum from patients with African sleeping sickness or leishmaniasis has been observed, indicating that RHS can be used as an antigen to increase the specificity of Chagas disease diagnosis [47]. More importantly, this protein could be tested as the basis of a diagnostic test able to determine the clinical form of the disease, a test that is not yet available.
Understanding antigen recognition at the molecular level opens the way to designing new epitope-based vaccines [49]. Identifying epitopes for both B and T cells is required to develop such vaccines, since antigenic determinants are immunodominant and capable of inducing a specific immune response [50]. Epitope prediction tools can serve as filters, excluding from further experimental analysis the regions that are probably not epitopes [51]. This leads to the nomination of new candidates for more targeted and probably more efficient vaccines [49]. Therefore, using reverse vaccinology, our work identified possible vaccine targets; the predicted antigens could be purified and tested in vitro and in vivo together with adjuvants to evaluate their potential as a prophylactic vaccine.
Inosine-guanosine nucleoside hydrolase showed homology to PDB model 3FZ0 (inosine-guanosine nucleoside hydrolase (IG-NH) of Trypanosoma brucei), with identity ≥50% and ≤75%. Nucleoside hydrolases (NHs) are found in all kingdoms except mammals, and their absence from the host has led to their recommendation as potential targets for antitrypanosomal drugs [55]. Studies have also reported that NH inhibitors selectively inhibit the IAG-NH or IG-NH isoenzyme because of differences in active site characteristics, and that inhibiting only one NH is not sufficient to impair the purine salvage pathway in parasites [56,57]. Comparing the DoGSiteScorer active site identification analysis (Table 6) with our molecular docking analysis for this protein, we found that the compound Emodin showed strong interaction with residues ASP15, TRP60, and ASN40 (Table 7), with good scores consistent with the active site analysis. Figures 4(a) and 4(b) show the three-dimensional and two-dimensional interactions, respectively, with the compound Diospirin.
Mitochondrial RNA-binding protein 1 showed homology to PDB model 2GIA (T. brucei MRP1/MRP2 crystal structures), with identity ≥50%. RNA-binding proteins (RBPs) are essential and play an important role in posttranscriptional gene regulation, coordinating the processing, storage, and control of cellular RNAs, which ultimately influences the expression of each gene in the cell [58]. Trypanosomatids have been important models for identifying new biological mechanisms, such as RNA trans-splicing, mitochondrial RNA editing, and antigenic variation [59]. RBPs themselves are considered poor drug targets because they lack enzymatic activity. It has therefore been proposed that the enzymes that post-translationally modify RBPs (kinases, phosphatases, SUMO E3 ligases, and methyltransferases) could be good candidates for new drugs [59].
Comparing the active site residues of mitochondrial RNA-binding protein 1 identified by DoGSiteScorer (Table 6) with our molecular docking analysis, we found that the compound (1R)-1,6,6-trimethyl-1,2,7,8-tetrahydronaphtho[1,2-g][1]benzofuran-9,10,11-trione showed a high interaction.
Calpain-like cysteine peptidase showed homology to PDB model 2FE0 (SMP-1 small myristoylated protein from Leishmania major), with identity ≥25% and <35%. Calpains are calcium-dependent heterodimeric cysteine peptidases that have been widely studied in mammals and exist in two major isoforms, μ-calpain (calpain 1) and m-calpain (calpain 2), which require micromolar and millimolar Ca2+ concentrations, respectively, for activation. These proteins are composed of a large subunit of approximately 80 kDa (divided into four domains) and a small subunit of 28 kDa [60]. Proteases are enzymes that cleave peptide bonds and are essential for numerous biological activities: peptide digestion, activation of other enzymes, immune system modulation, cell cycle participation, differentiation, and autophagy [61]. Given that the calpain family plays an important role in a wide range of human diseases and biological processes, it has crucial therapeutic potential, and much effort has gone into developing or identifying selective calpain inhibitors [62]. These features make the calpain-like proteins of trypanosomatids candidate drug targets. In our molecular docking analysis for this protein, we found that the compound Usambarensine showed a high interaction and binding energy with residue GLU42 (Table 6). Figures 5(a) and 5(b) show the three-dimensional and two-dimensional representations, respectively, with the compound Usambarensine.

T. cruzi trans-sialidase (TS) plays a key role in immunopathological events. The enzyme catalyzes the transfer of sialic acid from host glycoconjugates to acceptor molecules on the parasite surface. TS activity causes several biological effects that subvert the host immune system, favoring both parasite survival and the establishment of chronic infection (Nardy et al., 2016). Trans-sialidase has been reported as a drug target against Chagas disease (Miller III; Roitberg, 2013). In our molecular docking analysis for this protein, we found that the compound Usambarensine showed notable interaction and binding energy with residue THR155 (Table 6). Figures 5(c) and 5(d) show the three-dimensional and two-dimensional interactions, respectively, with the compound Usambarensine.
The trypanosomatids T. brucei spp., T. cruzi, and Leishmania spp. cause potentially fatal diseases in humans and animals. Unfortunately, there are no effective vaccines against these parasites, and current drug treatments are highly toxic, poorly tolerated, and require long treatment courses that depend on patient compliance [59]. While current drug treatments may be effective during the acute stage of infection, newer, safer, and more effective treatments against these neglected diseases are needed. Given the toxicity and limited efficacy of available antiprotozoal drugs and the emergence of drug resistance, the discovery of new trypanosomatid targets and new bioactive compounds is of utmost importance. Here, we performed subtractive genomics for drug target identification and molecular docking analysis with the 6 identified drug targets. Interestingly, some of the identified targets have already been reported as drug targets for Chagas disease. We prepared the ligand library from a thorough literature search and docked it against these targets.
In our docking analysis, we identified compounds such as Diospirin, Emodin, and Usambarensine that showed high binding affinity for several of the identified targets.
The compound Diospirin is a plant product with a significant inhibitory effect on the growth of Leishmania donovani promastigotes; it inhibits the catalytic activity of parasite DNA topoisomerase I [63]. Emodin is a natural trihydroxyanthraquinone found in the roots and bark of numerous plants (particularly rhubarb and buckthorn) and is an active ingredient of several Chinese herbs. It acts as a tyrosine kinase inhibitor, an antineoplastic agent, and a laxative, and is a plant metabolite [64]. Usambarensine is a plant product isolated from the roots of Strychnos usambarensis in Central Africa. It exhibits strong antimalarial and cytotoxic activities, and its toxicity to B16 melanoma cells has been described [65].

Conclusions
Diagnostic tests able to determine the clinical form of Chagas disease are not yet available, there is no vaccine for its prevention, and new drugs are needed for its treatment. Here, we applied a reverse vaccinology approach and identified 14 vaccine candidate proteins; among them, the RHS protein, which is specific to the T. cruzi parasite, is also a candidate target for diagnosing the clinical forms of the disease. Through molecular docking, we also identified potential targets for already available drugs and natural products. Both sets of predictions are time-consuming to pursue and must be validated through in vitro and in vivo experiments.

Data Availability
All data generated or analyzed during this study are included in this published article. Additional information about the data is available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that they have no conflict of interests.