Application of the Subtractive Genomics and Molecular Docking Analysis for the Identification of Novel Putative Drug Targets against Salmonella enterica subsp. enterica serovar Poona

The emergence of novel pathogenic strains with increased antibacterial resistance poses a significant threat to the management of infectious diseases. In this study, we aimed to use the subtractive genomics approach to identify novel drug targets against Salmonella enterica subsp. enterica serovar Poona strain ATCC BAA-1673. We employed in silico bioinformatics tools to subtract the strain-specific paralogous and host-specific homologous sequences from the bacterial proteome. The sorted proteome was further refined to identify the essential genes in the pathogenic bacterium using the Database of Essential Genes (DEG). We carried out metabolic pathway and subcellular location analysis of the essential proteins of the pathogen to elucidate the involvement of these proteins in important cellular processes. We found 52 unique essential proteins in the target proteome that could be utilized as novel targets to design newer drugs. Further, we investigated these proteins in the DrugBank databases, and 11 of the unique essential proteins showed druggability according to the FDA approved DrugBank databases, with diverse broad-spectrum property. Molecular docking analyses of the novel druggable targets with the drugs were carried out with AutoDock Vina based on its scoring functions. The results showed promising candidates for novel drugs against Salmonella infections.


Introduction
Recent progress in computational biology and bioinformatics has produced a variety of in silico analysis and drug design approaches that reduce the time and cost of the trial-and-error experimentation involved in drug development [1]. These methods serve to shortlist potential drug targets that can subsequently be tested in the laboratory. Subtractive genomics is one such in silico approach, in which drug targets are identified by determining the essential and nonhomologous proteins within the pathogenic organism [1,2]. The Database of Essential Genes (DEG) server can be used to identify proteins involved in important metabolic pathways required for the survival of the pathogen. Furthermore, proteins homologous to human proteins can be screened out to avoid potential adverse drug reactions during the computer-based drug development process. By selecting essential proteins unique to pathogen survival and propagation, the subtractive genomics technique allows identification of novel drug targets within the pathogen. Moreover, in silico docking studies between the identified drug targets and existing drugs, with slight modification, may lead to the discovery of novel drugs for treatment of infection. As a result, a wide range of drug targets and lead compounds can be identified prior to laboratory experimentation, saving extensive time and money. This study focuses on identifying potential drug targets against Salmonella enterica subsp. enterica serovar Poona str. ATCC BAA-1673 using the subtractive genomics approach.
Salmonella enterica subsp. enterica serovar Poona str. ATCC BAA-1673 is a potent food-borne pathogen in humans [3]. Salmonella is a genus of flagellated, Gram-negative, facultatively anaerobic bacteria and is the causative agent of salmonellosis. In most people, infection by Salmonella is manifested as abdominal cramps, diarrhea, and fever, which resolve within 4-7 days [4]. However, in some cases, such as in infants, the elderly, and immunocompromised patients, the pathogen may penetrate the wall of the intestine and enter the circulation, from which it can travel to other sites of the body. These cases have a high mortality rate and must be treated promptly with antibiotics [4]. In 2015-2016, a total of 907 people were infected by Salmonella Poona in the United States alone [5]. Epidemiological and laboratory studies showed that this outbreak was transmitted through ingestion of contaminated cucumbers imported from Mexico [5]. In other cases, several separate outbreaks of salmonellosis have occurred due to exposure to pet turtles [6,7]. Overall, salmonellosis incidence has not decreased in the past decade; while the incidence of some serotypes has decreased, the incidence of others has increased, which has been attributed to the emergence of antibiotic resistant strains [8,9]. Since 1996, several Salmonella serovars have started showing resistance to various antimicrobial agents, namely, ciprofloxacin and ceftriaxone [5]. Furthermore, the CDC reported that 5% of Salmonella species are resistant to 5 or more types of drugs. Therefore, it is imperative to have a protective plan in case of major future outbreaks. Hence, we aim to apply the subtractive genomics approach to identify novel potential drug targets against Salmonella enterica subsp. enterica serovar Poona str. ATCC BAA-1673.

Retrieval of Proteomes of Host and Pathogen.
The complete proteome of the pathogen, Salmonella enterica subsp. enterica serovar Poona str. ATCC BAA-1673 (proteome ID: UP000017517), along with the complete proteome of the host, Homo sapiens (proteome ID: UP000005640), was retrieved from UniProt. The flow chart of the steps performed in the current study is given in Figure 1.

Identification of Essential Proteins in Salmonella enterica subsp. Poona.
The whole proteome of Salmonella enterica subsp. Poona was processed with CD-HIT Suite at a sequence identity cut-off of 60% [6] to remove all paralogous proteins within the pathogen. This filtered dataset, referred to as the set 1 proteome, was then used as the query in a BlastP search with BLAST+ 2.2.26 against a customized database of the human proteome with a cut-off expectation value (E-value) of 10⁻⁴; proteins with hits were excluded to obtain the set 2 proteome, which did not contain any proteins homologous to those of Homo sapiens. BlastP analysis was then carried out for the set 2 proteome against DEG, with the genes indispensable for the survival of the Salmonella genus selected as the reference database. An E-value cut-off of less than 10⁻¹⁰⁰ and a minimum bit score cut-off of 100 were used to obtain a set of essential genes. The resulting protein sequences (set 3 proteome) were thus essential to the pathogen and nonhomologous to the Homo sapiens proteome.
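As an illustration of the subtraction steps described above, the following minimal Python sketch filters BLAST tabular output (-outfmt 6) with the thresholds stated in this section. The function names, the use of tabular output, and the strict inequality applied at the human E-value boundary are assumptions made for illustration; they are not taken from the original workflow.

```python
import csv

# Default 12-column layout of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Yield one dict per hit from BLAST tabular (outfmt 6) text."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def queries_with_hit(hits, max_evalue, min_bitscore=0.0):
    """Return query IDs with at least one hit below max_evalue
    and at or above min_bitscore."""
    return {h["qseqid"] for h in hits
            if h["evalue"] < max_evalue and h["bitscore"] >= min_bitscore}


def subtract(set1_ids, human_hits, deg_hits):
    """Return (set 2, set 3) protein IDs from the set 1 IDs.

    set 2: set 1 minus proteins with a human hit (E-value < 1e-4).
    set 3: set 2 proteins with a DEG hit (E-value < 1e-100, bit score >= 100).
    """
    human_like = queries_with_hit(human_hits, max_evalue=1e-4)
    set2 = set(set1_ids) - human_like
    essential = queries_with_hit(deg_hits, max_evalue=1e-100, min_bitscore=100)
    return set2, set2 & essential
```

Under these assumptions, `subtract` would be called with the IDs that survive CD-HIT clustering and with the hits parsed from the two BlastP runs (against the human proteome and against DEG).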

Analyses of Metabolic Pathway(s).
The essential proteins of Salmonella enterica serovar Poona identified above were subjected to metabolic pathway analysis using the KEGG Automatic Annotation Server (KAAS) for the identification of potential targets. KAAS provides functional annotation of genes by BLAST comparisons against the manually curated KEGG GENES database. The result contains KO (KEGG Orthology) assignments and automatically generated KEGG pathways [10].

Prediction of Subcellular Location.
The subcellular locations of the essential proteins must be known in order to select suitable drug targets, because localization supports prediction of protein function and genome annotation. Computational prediction methods are used to establish the location of a particular protein in the cell. PSORTb version 3.0 (http://www.psort.org/psortb/) was used for this purpose. CELLO version 2.5 (http://cello.life.nctu.edu.tw/) [11] was used to cross-check the data obtained from PSORTb. The proteins were then sorted according to their subcellular localization.

Evaluation of Druggability Potential of the Essential Proteins.
The essential proteins associated with the unique pathways in Salmonella enterica were subjected to BlastP analysis against a customized database of all FDA approved drug targets retrieved from DrugBank [12]. Targets that showed a high degree of matching (80% or more) with the database of FDA approved drug targets were considered druggable targets. On the other hand, targets that did not show a considerable degree of matching with the FDA approved drug targets were considered novel targets for new drug identification.
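The classification rule described above can be sketched as follows. This is a minimal illustration that assumes the degree of matching is measured as the best BlastP percent identity per protein; the function names and the treatment of proteins without any hit (identity 0) are hypothetical choices, not part of the original analysis.

```python
def best_identity(protein_ids, hits):
    """Map each protein ID to its highest percent identity among the hits.

    hits is an iterable of (query_id, percent_identity) pairs; proteins
    without any hit keep an identity of 0.0 (an assumption of this sketch).
    """
    best = {pid: 0.0 for pid in protein_ids}
    for query_id, pident in hits:
        if query_id in best and pident > best[query_id]:
            best[query_id] = pident
    return best


def classify(best, threshold=80.0):
    """Split proteins into (druggable, novel) lists at the identity threshold."""
    druggable = sorted(p for p, v in best.items() if v >= threshold)
    novel = sorted(p for p, v in best.items() if v < threshold)
    return druggable, novel
```

With the default threshold, a protein whose best match to an FDA approved drug target reaches 80% is reported as druggable, and every other protein is reported as a candidate novel target.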

Analysis of Drug Spectrum.
BlastP was performed individually for each of the drug targets found above against a database containing nonredundant protein sequences. Drug targets that, according to the taxonomy report, were present in more than 25 bacteria were classified as broad-spectrum targets. Different bacterial species were used as references.
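The broad-spectrum criterion can be expressed as a short sketch. It assumes that the taxonomy report has already been reduced to a list of bacterial species names per target; the function name and the "narrow-spectrum" label for the remaining targets are hypothetical, since only the broad-spectrum class is defined here.

```python
def classify_spectrum(species_by_target, min_bacteria=25):
    """Label each target by the number of distinct bacteria carrying a homolog.

    A target found in more than min_bacteria distinct species is labeled
    "broad-spectrum"; the label for all other targets is an assumption.
    """
    labels = {}
    for target, species in species_by_target.items():
        n_distinct = len(set(species))  # duplicates in the report count once
        labels[target] = ("broad-spectrum" if n_distinct > min_bacteria
                          else "narrow-spectrum")
    return labels
```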

Molecular Docking Analysis of the Novel Target with the Drugs.
In order to understand the structural basis of the specificity of the protein targets for the drugs, a computational target-ligand docking approach was used to analyze structural complexes of the novel druggable targets with the ligands (drugs). For this purpose, the three-dimensional structures of the novel druggable targets were downloaded from the UniProt database. The chemical structures of the ligands were obtained from the DrugBank database [12]. For docking analysis, the coordinates of the target protein and the potential drug molecule were optimized with Drug Discovery Studio version 3.0 software and the UCSF Chimera tool, respectively. Molecular docking analyses of the druggable targets with the drugs were carried out with AutoDock Vina based on its scoring functions [13]. The energy of interaction of the ligands with the targets is assigned to grid points. The grid box was optimized to cover the whole area of the target.
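One way to obtain a grid box that covers the whole target is to take the bounding box of all atoms in the structure file. The sketch below assumes a PDB-format input with fixed-column coordinates and a hypothetical padding of 5 Å on each side; the function names are illustrative, and the output keys follow the center and size options of an AutoDock Vina configuration file.

```python
def grid_box(pdb_text, padding=5.0):
    """Return Vina-style box center and size enclosing every atom.

    Coordinates are read from the fixed columns of ATOM/HETATM records
    (x: 31-38, y: 39-46, z: 47-54). The padding value is an assumption.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    if not coords:
        raise ValueError("no ATOM or HETATM records found")

    box = {}
    for axis, values in zip("xyz", zip(*coords)):
        low, high = min(values), max(values)
        box["center_" + axis] = (low + high) / 2
        box["size_" + axis] = (high - low) + 2 * padding
    return box


def vina_config_lines(box):
    """Format the box as 'key = value' lines for a Vina configuration file."""
    keys = ["center_x", "center_y", "center_z", "size_x", "size_y", "size_z"]
    return ["%s = %.3f" % (key, box[key]) for key in keys]
```

A box built this way corresponds to blind docking over the whole protein surface, in contrast to a box restricted to a known binding pocket.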

Results and Discussion
This article describes a simple subtractive genomics approach for the identification of suitable drug targets among the essential proteins within the proteome of Salmonella enterica serovar Poona. The subtractive genomics approach has been reported as an innovative and powerful method for identifying unique sequences as potential therapeutic targets [2,14,15]. The in silico subtractive genome analysis is based on sorting the essential proteins of a pathogen as unique (absent in the host organism) in order to facilitate precise drug design by avoiding host toxicity through cross-reactivity with the Homo sapiens proteome.

Identification of Nonhomologous Essential Proteins.
Salmonella enterica serovar Poona contains a total of 4906 proteins in its proteome. Following analysis with CD-HIT suite, 154 proteins were found to be duplicates or paralogs with 60% identity and were eliminated from the dataset as these were redundant as drug targets. The remaining 4752 proteins (set 1 proteome) were analyzed using BlastP against a customized human protein database and 1088 proteins were found to be homologous to human proteins and were again excluded from the dataset as these proteins may cause drug cross-reactivity and host cytotoxicity when used as drug targets during treatment. The resulting set 2 proteome containing 3664 proteins was used for further analysis and subject to BlastP search in the database of essential genes (DEG) in order to determine the essential genes required for the survival of the pathogen. A total of 198 proteins were found to be essential, which means that these proteins are involved in metabolic pathways indispensable for the propagation of this pathogen and thus can be used as target for treatment options (Table 1).
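The counts reported above can be checked with simple arithmetic. The short sketch below, with a hypothetical helper name, reproduces the size of each dataset from the starting proteome and the number of proteins removed at each step.

```python
def funnel(total, removals):
    """Return the dataset size before and after each removal step."""
    remaining = [total]
    for removed in removals:
        if removed > remaining[-1]:
            raise ValueError("cannot remove more proteins than remain")
        remaining.append(remaining[-1] - removed)
    return remaining


# 4906 proteins, minus 154 paralogs (CD-HIT), minus 1088 human homologs (BlastP).
whole_proteome, set1, set2 = funnel(4906, [154, 1088])
```

Here `set1` and `set2` correspond to the 4752 and 3664 proteins of the set 1 and set 2 proteomes, and the 198 essential proteins are a subset of set 2.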

Metabolic Pathway Analysis.
The set of 198 proteins deemed to be essential through the DEG analysis was passed through the KEGG KAAS server to analyze their metabolic pathways. It was found that 52 proteins were involved in metabolic pathways unique to the Salmonella enterica species and thus not found in humans. The associated pathways, along with the names of the essential genes and their KO, are presented in Table 2. A metabolic pathway of particular importance is lipopolysaccharide (LPS) biosynthesis in Salmonella Poona. LPS is composed of lipid A, a conserved core oligosaccharide, and a variable O-antigen, and is located in the outer membrane of Gram-negative bacteria, where it provides outer membrane stability [16]. 2-Dehydro-3-deoxyphosphooctonate aldolase (KDO 8-P synthase) was recognized as a potential drug target specific to this pathway.
Peptidoglycan composes the cell wall of bacterial cells, and inhibitors of peptidoglycan biosynthesis form a major class of antibiotics. Drugs that inhibit peptidoglycan biosynthesis can minimize microbe-generated pathogenicity [14]. Three unique proteins involved in peptidoglycan biosynthesis within Salmonella Poona were found to be inhibited by drugs, namely, alanine racemase, UDP-N-acetylglucosamine 1-carboxyvinyltransferase, and penicillin-binding protein 2. The two-component system is a signal transduction system responsible for sensing any change in the environment or intracellular state of the bacteria and inducing the appropriate response to adapt to these changes [17,18]. Thus, proteins involved in this pathway are attractive drug targets, and their inhibition may make bacteria susceptible to various drugs. Using current in silico approaches, four such proteins were found: outer membrane channel protein TolC, outer membrane pore protein F, histidine kinase PhoQ, and histidine kinase EnvZ.
Cationic antimicrobial peptides (CAMPs) are key components of the innate immune system and weaken bacterial cell membrane integrity. On the other hand, various bacteria, including Salmonella Poona, have developed pathways that confer resistance to CAMPs [19,20]. Metabolites involved in this pathway are good targets for altering CAMP resistance and cancelling virulence. Our study found several target molecules that may interfere with the pathways responsible for developing resistance.
Vancomycin is a glycopeptide antibiotic that is active against most Gram-positive bacteria. It inhibits the synthesis of peptidoglycan in the bacterial cell wall by binding to the D-Ala-D-Ala C-terminus of the pentapeptide precursors and preventing their addition to the peptidoglycan chain [21].

Druggability of the Unique Essential Proteins.
Since the ultimate goal of the current study was to identify novel drug targets, the next step was to evaluate the druggability of the essential proteins involved in unique Salmonella-specific metabolic pathways. The analysis of druggable targets, available drugs, and broad-spectrum property of the drugs showed that 11 of the shortlisted unique essential proteins are druggable according to the FDA approved DrugBank databases, with diverse broad-spectrum property (Table 3). Table 3 presents these already identified drug targets with FDA approved drugs. In order to identify the novel druggable targets among the shortlisted protein sequences, we further carried out BLAST analysis of the essential proteins against the DrugBank database. A total of 6 proteins selected from the 52 proteins associated with unique pathways were identified as plausible novel targets. These proteins were chosen on the basis of their uniqueness and essentiality in pathogen-specific vital pathways. All 52 proteins in Table 2, which were identified via KEGG pathway analysis, are essential for the existence of this specific Salmonella strain. Among these, the 11 druggable proteins presented in Table 3 showed more than 80% sequence identity with already FDA approved drug targets. There were also 41 other proteins left in the unique protein groups for this strain presented in Table 3. Inhibition of these 41 proteins could also be used to fight this specific microbe, and these 41 proteins also showed some sequence identity (<80%) with the targets of FDA approved drugs used against other organisms. A drug molecule does not bind to the whole protein to exert its activity; the amino acid residues of the drug-binding active site are the important residues for the binding of a drug to a protein. Among these 41 proteins, 6 are presented in Table 4.
The reasons for choosing these 6 proteins were as follows: (A) these proteins showed moderate similarity (30-65%) with the targets of FDA approved drugs used against other organisms. Because the amino acid residues of the drug-binding active site are the important residues for the binding of a drug to a protein, these 30-65% identical residues may lie within the drug-binding active site. (B) These 6 proteins were also unique to this specific Salmonella strain and associated with essential pathways that are important for the existence of this organism. UDP-N-acetylglucosamine O-acyltransferase is associated with both the lipopolysaccharide biosynthesis and cationic antimicrobial peptide (CAMP) resistance pathways. UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase and 3-deoxy-manno-octulosonate cytidylyltransferase (CMP-KDO synthetase) are associated with the lipopolysaccharide biosynthesis pathway. Phosphate regulon response regulator OmpR, nitrogen regulation sensor histidine kinase GlnL, and response regulator CheB are associated with the two-component system pathway. LPS is composed of lipid A, a conserved core oligosaccharide, and a variable O-antigen, and is located in the outer membrane of Gram-negative bacteria, where it provides outer membrane stability [16]. A drug that inhibits LPS biosynthesis can kill the microbe. Cationic antimicrobial peptides (CAMPs) are key components of the innate immune system and weaken bacterial cell membrane integrity. On the other hand, various bacteria, including Salmonella Poona, have developed pathways that confer resistance to CAMPs [19,20]. Metabolites involved in this pathway are good targets for altering CAMP resistance and cancelling virulence.
The two-component system is a signal transduction system responsible for sensing any change in the environment or intracellular state of the bacteria and inducing the appropriate response to adapt to these changes [17,18]. Thus, proteins involved in this pathway are attractive drug targets, and their inhibition may make bacteria susceptible to various drugs.

Molecular Docking of the Novel Druggable Proteins.
The current study was further reinforced by comparative docking studies of the novel druggable proteins with the ligands. Binding affinities from docking were compared, for each drug, between our target proteins and the intended targets of that drug from other species. The shortlisted potential drug targets showed a pattern of similar binding characteristics, similar residues involved in the active site, and lower free energy (Table 4 and Figures 2-7). Thus, the potential targets whose binding affinities are similar to those of the intended protein-drug pairs can be deemed novel drug targets to be used in treatment strategies.

Conclusion
The vast array of information regarding the proteomes and genomes of various prokaryotic organisms, together with the knowledge obtained from the human genome project, can be exploited to accelerate drug design and gain further knowledge of pharmacogenomics in the treatment of bacterial infections. Subtractive genomics can aid in the identification of proteins targeted by existing FDA approved drugs. A total of 52 potential targets were found within the Salmonella Poona system. Among these, 11 proteins were already highly identical to FDA approved drug targets. Six proteins that showed moderate similarity (30-65%) with the targets of FDA approved drugs used against other organisms were proposed as novel drug targets to combat Salmonella Poona. Furthermore, docking studies were used to predict the binding of existing FDA approved drugs to the novel targets within the proteome of the pathogen and to the drug-specific FDA approved database targets, in order to compare the binding patterns between them.