Structural and Functional Annotation and Molecular Docking Analysis of a Hypothetical Protein from Neisseria gonorrhoeae: An In-Silico Approach

Background. Worldwide, Neisseria gonorrhoeae-related sexually transmitted infections (STIs) continue to be of significant public health concern. This obligate human pathogen has developed a number of defenses against both innate and adaptive immune responses during infection, some of which are mediated by the pathogen's proteins. Hence, annotating the uncharacterized proteins of N. gonorrhoeae can give insight into the functions of this organism related to its pathogenicity and help find more efficient therapeutic targets. Methods. In this study, a hypothetical protein (HP) of N. gonorrhoeae was chosen for analysis, and an in-silico approach was used to explore its physicochemical characteristics, subcellular localization, secondary structure, 3D structure, and functional annotation. Finally, a molecular docking analysis was performed to design an epitope-based vaccine against that HP. Results. This study identified a potential role of the chosen HP in plasmid transfer, cell cycle control, cell division, and chromosome partitioning. The acidic nature, thermal stability, and cytoplasmic localization of the protein, along with some of its other physicochemical properties, were also identified. Molecular docking analysis demonstrated that one of the T cell epitopes of the protein has a significant binding affinity for the human leukocyte antigen HLA-B*15:01. Conclusions. The in-silico characterization of this protein will help us understand the molecular mechanism of action of N. gonorrhoeae and give insight into the identification of novel therapeutics. This research will, therefore, enhance our knowledge to find new medications to tackle this potential threat to humankind.


Introduction
N. gonorrhoeae, the etiological agent of gonorrhea, was first isolated in 1878 and belongs to the Neisseriaceae family [1,2]. It is a gram-negative, encapsulated bacterium [4], 0.6-1 micrometer in diameter [3]. It is fastidious [5], non-acid-fast [6], oxidase-positive [7], and non-spore-forming [8]. In addition, it is a non-motile [9], obligate human pathogen [10] that can grow aerobically, or anaerobically in the presence of nitrite [11]. These kidney-shaped diplococci infect both men and women and cause the sexually transmitted disease (STD) gonorrhea [12,13]. Every year, 87 million new infections are reported for this rapidly spreading contagious disease. This STD has already emerged as a major problem in low- and middle-income countries in Africa, Asia, Latin America, and the Caribbean [1,5,12]. Gonorrhea can be asymptomatic or symptomatic. In men, it can manifest as urethritis, with complications such as epididymitis, urethral stricture, and prostatitis. In women, it may manifest as urethritis or cervicitis, with complications including tubal infertility, chronic pelvic discomfort, severe pelvic inflammatory disease sequelae, and ectopic pregnancy [3,14]. Oropharyngeal and anorectal gonococcal infections can be transmitted from one person to another through kissing and during oral-anal intercourse. Furthermore, gonorrhea can be transmitted via contaminated cervical fluids [14,15]. However, there is still no effective treatment for gonococcal infection, and no gonococcal vaccine is available yet. To make the situation worse, N. gonorrhoeae has been found to be resistant to several antimicrobial drugs, such as penicillins, tetracyclines, sulphonamides, fluoroquinolones, macrolides, azithromycin, and ceftriaxone [12,16,17]. For the time being, WHO recommends azithromycin and ceftriaxone as a dual therapy against this disease [12].
Together, these factors make the discovery of novel antibacterial drugs and the development of alternative therapies an urgent need in combating this disease [18].
The genome size of N. gonorrhoeae varies from strain to strain, at about 2001 ± 197 kbp [19]. For example, the genome of N. gonorrhoeae NCCP11945 comprises one circular chromosome of 2232.025 kbp that encodes 2662 predicted open reading frames (ORFs) and a 4153 bp plasmid that encodes 12 predicted ORFs [20]. Additionally, N. gonorrhoeae is known to encode several proteins with unknown functions, known as hypothetical proteins (HPs). HPs are considered to be expressed in an organism, but there is no experimental or chemical proof of their existence [21-23]. In most genomes, HPs cover approximately half of the protein-coding regions, but the roles of these proteins are yet to be discovered [21,24,25]. Although there is no empirical evidence for the existence of these proteins, they can be predicted to be generated from an ORF [23,24]. As a result, the annotation of the functions of hypothetical proteins has become increasingly popular [25]. Hypothetical proteins can be categorized into uncharacterized protein families (UPFs) and domains of unknown function (DUFs) [23]. UPFs have been experimentally confirmed to exist, although they have yet to be identified or connected to a known gene. DUFs, on the other hand, are proteins that have been found experimentally but have no known functional or structural domains [23]. Even though they have not been characterized, elucidating their structure and function can lead to the identification of new domains and motifs, pathways and cascades, structural conformations, protein networks, etc. [21,22]. These are crucial in understanding biochemical and physiological pathways, for example, in identifying pharmaceutical targets [21,22,25] and providing early detection and advantages for proteomic and genomic studies [21].
It is now easier to analyze hypothetical proteins utilizing a variety of bioinformatics tools that provide benefits such as 3D structural conformation prediction, identification of new domains and pathways, phylogenetic profiling, and functional annotation [22,23].
The purpose of this study is to characterize a hypothetical protein F0T10_13280 (plasmid) of N. gonorrhoeae with an integrated computational approach, using previously validated tools and databases, to gain insight into the HP's physical and structural information along with its potential functions. The potential role of this HP in plasmid transfer, cell cycle control, cell division, and chromosome partitioning may give insight into the pathogenic flexibility of N. gonorrhoeae. Analyzing the phylogenetic relationship between this HP and other proteins, physicochemical properties analysis, prediction of the HP's location in the cell, analysis of the secondary and tertiary structure, prediction of the potential function of the HP, and evaluation of the active sites are some of the main focuses of this research. Finally, this research also aims to design an epitope-based peptide vaccine and validate it with a molecular docking study. Figure 1 illustrates the complete workflow and tools used in this study. Table 1 depicts the entire framework, which includes all the tools used to annotate the structural and functional properties of the HP of N. gonorrhoeae. A preprint of this research has previously been published by Mazumder et al. [26].

Materials and Methods
2.1. Sequence Retrieval and Phylogeny Analysis. The amino acid sequence (accession No. QIH20856.1) was selected by searching the NCBI protein database for HPs of N. gonorrhoeae. The sequence was obtained in FASTA format. To identify sequence similarity, BlastP [27] was performed. MUSCLE v3.6 [28] was used to perform multiple sequence alignment. Phylogenetic analysis was carried out using MEGA X [29].

2.2. Physicochemical Properties Analysis. The physicochemical properties of the target protein sequence were investigated using ExPASy's ProtParam program [30]. The molecular weight, atomic composition, estimated half-life, theoretical isoelectric point (pI), extinction coefficient, amino acid composition, aliphatic index, instability index, the total number of positive and negative residues, and grand average of hydropathicity (GRAVY) were all analyzed using this tool.
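Some of the sequence-derived descriptors listed above follow from simple published formulas and can be reproduced by hand. The following is a minimal sketch in Python, not ProtParam's own code: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)), and the function names are hypothetical. The example input is the nonamer AIYHLNVRY discussed later in this article, used here only as a short test string.

```python
# Kyte-Doolittle hydropathy values (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residue_counts(seq):
    """Return (positive, negative) counts: Arg + Lys and Asp + Glu."""
    positive = seq.count("R") + seq.count("K")
    negative = seq.count("D") + seq.count("E")
    return positive, negative


if __name__ == "__main__":
    peptide = "AIYHLNVRY"  # short illustrative input, not the full HP
    print(round(gravy(peptide), 3))
    print(round(aliphatic_index(peptide), 1))
    print(charged_residue_counts(peptide))
```

The instability index and extinction coefficient need additional lookup tables and are deliberately left out of this sketch.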

2.3. Subcellular Localization Prediction. It is crucial to know the subcellular localization of proteins in order to fully comprehend their functions [31]. Computer analysis helps in the discovery and localization of adhesion-like intercellular proteins [32]. In the last few decades, several computational tools have been developed that can efficiently determine and synthesize ORFs of various proteins (mitochondrial, cytoplasmic, nuclear, or extracellular) to convert them into potential vaccine candidates. However, vaccine candidates should be free of membrane or cytoplasmic localization [33]. CELLO v.2.5 [34] was first used to predict the subcellular localization of hypothetical protein F0T10_13280 (plasmid) of N. gonorrhoeae. PSORTb v3.0.3 [35] was then used to confirm the predicted location. To cross-check the results, we used PSLpred [36], a web server for predicting the subcellular localization of gram-negative bacterial proteins.
2.4. Secondary Structure Prediction. Secondary structure predictions of the hypothetical protein were performed using the SOPMA server [37]. The PSIPRED server [38] was also used to ensure the accuracy of the SOPMA results.
2.5. 3D Structure Prediction and Quality Assessment. The HHpred server [39] provided a 3D model of the protein. The YASARA server [40] (http://www.yasara.org/minimizationserver.htm) was used for energy minimization. To visualize the final model and perform structural analysis, PyMOL v2 [41] was employed. The quality assessment tools of the SAVES server (https://services.mbi.ucla.edu) were used to assess the reliability of the predicted 3D structural model of the hypothetical protein. The Ramachandran plot was built using the PROCHECK [42] tool to visualize the backbone dihedral angles of amino acid residues. The overall quality of the protein 3D structure was evaluated with the ERRAT server [43]. The Verify 3D server [44] was used to check whether the atomic model (3D) was compatible with its amino acid sequence and to compare the results to standard structures.
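The backbone dihedral angles shown in a Ramachandran plot can be computed directly from atomic coordinates: phi is the torsion defined by C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The following is a minimal sketch in plain Python of a generic torsion-angle calculation. It is not PROCHECK's implementation, and the function name and the toy coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 bond for four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (not taken from the modelled protein).
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```

Applying such a function to every residue of a model gives the (phi, psi) pairs whose distribution over favored, allowed, and disallowed regions is what the Ramachandran statistics summarize.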
2.6. Functional Annotation. To make accurate and reliable functional predictions for the HP, we used a variety of tools and databases: INTERPRO [45], MOTIF [46], Pfam [47], and the conserved domain database of NCBI [48].
2.7. Active Site Detection. For active site assessment and structure-based ligand design, the shape and size of protein pockets and cavities are crucial. The computed atlas of surface topography of proteins (CASTp) was utilized in this experiment to detect possible binding sites, pockets, and cavities from the 3D structure of the target protein [49].

2.8. Prediction of CTL Epitope and MHC I Binding Allele Analysis. In order to design an epitope-based vaccine against the hypothetical protein, cytotoxic T lymphocyte (CTL) epitope prediction was performed using the NetCTL server [50]. The threshold parameter was set to 0.4 with 0.89 sensitivity and 0.94 specificity. To analyze the MHC I binding alleles, all CTL epitopes were evaluated with the immune epitope database (IEDB) utilizing the SMM method [51]. The MHC I alleles for which the epitopes showed higher affinity (IC50 < 500 nM) were selected for further analysis.
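The candidate space for this step and the affinity cutoff can be illustrated with a short sketch. It assumes that candidate CTL epitopes are overlapping nonamers (the epitopes reported in this article are all nine residues long) and that the binding predictions have already been exported as (epitope, allele, IC50 in nM) tuples. The function names and the IC50 values in the example are hypothetical and are not results from NetCTL or IEDB.

```python
def nonamers(sequence, k=9):
    """All overlapping windows of length k across a protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def strong_binders(predictions, cutoff_nm=500.0):
    """Keep (epitope, allele, ic50_nm) records with IC50 below the cutoff."""
    return [(epitope, allele, ic50)
            for epitope, allele, ic50 in predictions
            if ic50 < cutoff_nm]


if __name__ == "__main__":
    # Made-up IC50 values, for illustration only.
    example = [
        ("AIYHLNVRY", "HLA-B*15:01", 120.0),
        ("AIYHLNVRY", "HLA-A*02:01", 900.0),
    ]
    for record in strong_binders(example):
        print(record)
```

For a protein of 478 residues, this windowing yields 478 - 9 + 1 = 470 candidate nonamers, which the prediction servers then score and rank.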

2.9. Epitope Selection for Docking and Epitope Prioritization. Among all the CTL epitopes, one epitope was selected based on its interaction with the maximum number of MHC I binding alleles. The suitability of this epitope for vaccine construction was cross-checked with the VaxiJen 2.0 [52], ToxinPred [53], and AllerTOP 2.0 [54] servers to investigate its antigenic, toxic, and allergenic properties, respectively. The threshold parameter of the VaxiJen 2.0 server was set to 0.4, and all the parameters of the ToxinPred and AllerTOP 2.0 servers were set to default.
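The selection rule described here, choosing the epitope that binds the largest number of MHC I alleles, amounts to a simple grouping and ranking. The sketch below is one possible way to express it in Python. The function names are hypothetical, ties are broken alphabetically (an assumption, since no tie-breaking rule is stated), and the second epitope's allele in the example is a placeholder, not a value from Table 7.

```python
from collections import defaultdict


def alleles_per_epitope(binders):
    """Group (epitope, allele) pairs into {epitope: set of alleles}."""
    grouped = defaultdict(set)
    for epitope, allele in binders:
        grouped[epitope].add(allele)
    return dict(grouped)


def top_epitope(binders):
    """Return (epitope, allele_count) for the epitope with the most alleles.

    Ties are broken alphabetically by epitope sequence (assumed rule).
    """
    grouped = alleles_per_epitope(binders)
    best = sorted(grouped, key=lambda e: (-len(grouped[e]), e))[0]
    return best, len(grouped[best])


if __name__ == "__main__":
    binders = [
        ("AIYHLNVRY", "HLA-A*30:02"),
        ("AIYHLNVRY", "HLA-A*32:01"),
        ("AIYHLNVRY", "HLA-B*15:01"),
        ("AIYHLNVRY", "HLA-A*03:01"),
        ("AIYHLNVRY", "HLA-A*11:01"),
        ("GMEVEITQY", "HLA-B*15:01"),  # placeholder pairing for illustration
    ]
    print(top_epitope(binders))
```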

2.10. Peptide Designing and Docking Analysis. The three-dimensional structure of the epitope was constructed with the APPTEST server [55]. APPTEST is a peptide tertiary structure prediction tool that predicts peptide structure using a neural network architecture and simulated annealing methods. A molecular docking experiment was performed to examine the binding interaction between the epitope and the receptor molecule. The crystal structure of HLA-B*15:01 (PDB ID: 1XR8) was retrieved from the RCSB database [56] to perform the docking analysis. The docking analysis between the peptide (ligand) and the human receptor HLA-B*15:01 was performed using the AutoDock Vina tool [57]. The grid box size in AutoDock Vina was set to 12.702, 31.843, and 18.307 for X, Y, and Z, respectively. The binding interactions and the residues at the interacting surface between the peptide and receptor were investigated with Discovery Studio 2021 [58].
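A docking setup of this kind is commonly passed to AutoDock Vina as a plain-text configuration file. The following sketch writes such a file from Python. The grid box size values are the ones stated above; the file names, the box center, and the exhaustiveness value are assumptions made only for illustration, because they are not reported in this article.

```python
def vina_config(receptor, ligand, center, size,
                out="docked.pdbqt", exhaustiveness=8):
    """Build the text of an AutoDock Vina configuration file."""
    center_x, center_y, center_z = center
    size_x, size_y, size_z = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {center_x}",
        f"center_y = {center_y}",
        f"center_z = {center_z}",
        f"size_x = {size_x}",
        f"size_y = {size_y}",
        f"size_z = {size_z}",
        f"exhaustiveness = {exhaustiveness}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config_text = vina_config(
        receptor="receptor_1xr8.pdbqt",   # hypothetical file name
        ligand="epitope.pdbqt",           # hypothetical file name
        center=(0.0, 0.0, 0.0),           # placeholder: center not reported
        size=(12.702, 31.843, 18.307),    # grid box size stated in the text
    )
    with open("vina_conf.txt", "w") as handle:
        handle.write(config_text)
    print(config_text)
```

The resulting file would typically be used as `vina --config vina_conf.txt`, after the receptor and peptide have been converted to PDBQT format.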

Results and Discussion
3.1. Sequence and Similarity Information. We selected a hypothetical protein (accession no. QIH20856.1) from the organism N. gonorrhoeae. This hypothetical protein contains 478 amino acids. The amino acid sequence for this protein was selected from the NCBI database and obtained in FASTA format. BlastP was performed against the non-redundant protein sequences (nr) database (Table 2) and the UniProt/Swiss-Prot (SwissProt) database (Table 3) to identify sequence similarity with other known proteins. According to the non-redundant protein sequence database, the HP exhibits similarities with other MobA/MobL family proteins. A phylogenetic tree showing the relatedness among the sequences obtained from the non-redundant database was constructed in MEGA X using the neighbor-joining method with 1000 bootstrap replicates (Figure 2).

BioMed Research International
information about genomic annotation and drug design [31]. The prediction of an unknown protein's subcellular localization can be used to understand disease mechanisms as well as to develop drug or vaccine targets in a given pathogen genome [59]. Cytoplasmic proteins may be less suitable as potential therapeutic targets, whereas surface membrane proteins are thought to be effective vaccine targets [33,60]. In our study, CELLO predicted the protein to be cytoplasmic, with a localization score of 1.680. PSORTb v3.0.3 and PSLpred were used to verify the result. PSORTb v3.0.3 also identified the protein as cytoplasmic, with a score of 8.96. PSLpred likewise predicted the protein to be a cytoplasm-resident protein, with a score of 64.47.

3.4. Secondary Structure Prediction. The secondary structure (helix, sheet, turn, and coil) provides information on the conformation of each amino acid. Secondary structure links the primary sequence to the tertiary structure, so its prediction is an essential first step toward predicting tertiary structure [61]. It also provides details on protein activity, interactions, and functions. When examined with SOPMA, alpha helices were the most frequently occurring structure in the HP (69.87 percent) (Figure 3). Random coil accounted for 19.67 percent, followed by extended strand at 5.65 percent and beta-turn at 4.81 percent. We cross-checked the results using PSIPRED, which gave a similar result (Figure 4).
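The percentages above are residue counts in each state divided by the sequence length. As a rough illustration, the sketch below computes such a composition from a per-residue state string. It assumes a four-state alphabet (h, e, t, c); the function name is hypothetical, and the residue counts in the example (334, 94, 27, and 23 of 478) are back-calculated from the reported percentages rather than taken from the SOPMA output.

```python
from collections import Counter

# Assumed four-state codes: helix, extended strand, beta turn, random coil.
STATE_NAMES = {
    "h": "alpha helix",
    "e": "extended strand",
    "t": "beta turn",
    "c": "random coil",
}


def composition(states):
    """Percentage of residues in each secondary structure state."""
    counts = Counter(states.lower())
    total = len(states)
    return {name: round(100.0 * counts.get(code, 0) / total, 2)
            for code, name in STATE_NAMES.items()}


if __name__ == "__main__":
    # Back-calculated counts for a 478-residue protein (assumption).
    states = "h" * 334 + "c" * 94 + "e" * 27 + "t" * 23
    for name, percent in composition(states).items():
        print(f"{name}: {percent} percent")
```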

3.5. Homology Modelling, Quality Assessment of the 3D Model, and Visualization. The 3D structure of a protein is closely related to its function. It also helps to predict the binding sites and active sites of the protein, which may contribute to designing an effective vaccine against the pathogen. The 3D structure of the HP was obtained from the HHpred server using homology modelling. By lowering the energy from -48,361.0 kJ/mol to -11487.9 kJ/mol, the YASARA energy minimization server made the model structure more stable. The 3D structure of the protein was visualized with PyMOL v2 (Figure 5). A variety of quality assessment tools were employed to determine how reliable the predicted 3D structural model was. PROCHECK's Ramachandran plot analysis, Verify3D, and ERRAT were used to verify the protein's 3D structure. According to the Ramachandran plot statistics (Figure 6(a)), the model was considered acceptable, with 93.6 percent of residues in the most favored regions (Table 5), compared with 90.8 percent before energy minimization. The model was further assessed with the ERRAT and Verify3D programs. After energy minimization, ERRAT determined that the model was of good quality, with an overall quality factor of 95.556 (Figure 6(b)), compared with 78.453 prior to energy minimization. After energy minimization, Verify3D showed (Figure 6(c)) that 96.30 percent of the residues have an averaged 3D-1D score >= 0.2, indicating that the model's environmental profile is good. A comparison of all the quality factors of the predicted structure before and after energy minimization is summarized in Table 6.
3.6. Functional Annotation. Using the NCBI's conserved domain search tool, two functional domains of the HP were identified. The first domain belongs to the MobA/MobL protein family (accession No. pfam03389). This family includes the MobA protein from the E. coli plasmid RSF1010 and the MobL protein from the Thiobacillus ferrooxidans plasmid PTF1. These are mobilization proteins, which are required for the transfer of particular plasmids. The second belongs to the Smc (chromosome segregation ATPase) superfamily, which is involved in cell cycle control, cell division, and chromosome partitioning. Plasmid transfer, cell division, cell cycle regulation, and chromosomal partitioning are essential aspects of genetic engineering and biotechnology. Cell cycle regulation is critical for cell survival and proliferation. Lack of cell cycle maintenance can result in harmful mutations, leading to cell death and cancer [62]. This result was also cross-checked using INTERPRO, MOTIF, and Pfam. All produced similar findings, with the domain spanning amino acid residues 23 to 211 and an E-value of 3.5e-29.
3.7. Active Site Detection. Several studies have documented that the discovery and identification of active sites on proteins are becoming highly significant. The position of the active site on a protein is pivotal for a variety of purposes, including structural identification, functional site comparison, molecular docking, and de novo drug creation [26].
Since the computed atlas of surface topography of proteins (CASTp) employs only the Cα atoms to represent the protein structure, it is quick and appropriate for use with models and unreliable structures. The geometric potential is a concept that quantitatively describes the shape of the protein structure, which can be affected by the overall form of the structure and the surroundings of individual residues. About 85% of known binding sites may be reliably predicted by CASTp with above 50% residue coverage and 80% specificity, and it often uses the geometric potential for this purpose [63]. Hence, the CASTp server was used in this study to examine the protein's active site. The region involved in active site formation is illustrated in Figure 7. The CASTp server revealed that the active site of the protein comprises 16 amino acid residues, with the best active site located in a region with an area of 63.924 and a volume of 57.845.
3.8. Prediction of CTL Epitope and Analysis of the MHC I Binding Alleles. The majority of vaccines now in use are based on B cell immunity. However, any foreign particle can eventually avoid the antibody memory response due to antigenic drift. Therefore, T cell epitope-based vaccines have been promoted, since the T cell immune response frequently results in long-lasting protection. A powerful immunological response against the infected cell can be produced by the host via CD8+ T cells [64]. Hence, T cell epitope prediction was performed with the most widely used computational server, NetCTL 1.2. The NetCTL server predicted 13 effective T cell epitopes from the selected protein sequence, namely QSAQAKNDY, LTDKNQGFL, GMEVEITQY, DSGSNKLPY, HTDKNNHNP, QANQALEQY, KQAQGMGKY, FAEDNPQEF, NQALEQYGY, LDDLQFSGY, AIYHLNVRY, DLQRIQGDY, and TVDSGSNKL, with a specificity of 0.940 and a sensitivity of 0.89. The MHC I alleles for which the epitopes showed higher affinity (IC50 < 500 nM) are shown in Table 7.

3.9. Epitope Selection for Docking and Epitope Prioritization. Among the 13 T cell epitopes, the epitope AIYHLNVRY was found to interact with the highest number of MHC I alleles and was selected for vaccine design. This epitope interacted with 5 MHC I binding alleles, namely HLA-A*30:02, HLA-A*32:01, HLA-B*15:01, HLA-A*03:01, and HLA-A*11:01. A vaccine candidate epitope must meet a number of requirements, and our predicted epitope met every requirement. The first criterion is that the epitope must induce an immunogenic response in the host. The VaxiJen 2.0 antigenicity analysis tool was used to determine whether the epitope could induce an immunogenic response in the host. This tool identified the epitope as a probable antigen (antigenicity score 1.5783). Testing for toxicity is another crucial step in the creation of a vaccine. The ToxinPred server identified the epitope as non-toxic. Allergenicity is one of the main challenges in the creation of vaccines. Most vaccines trigger an allergic immune response by inducing type 2 T helper (Th2) cells and immunoglobulin E [65]. The AllerTOP 2.0 server identified the epitope as a non-allergen.
All these results have identified the epitope as a suitable vaccine candidate.
3.10. Molecular Docking Analysis. To evaluate the affinity of the proposed epitope vaccine for the human leukocyte antigen HLA-B*15:01, ligand-receptor docking was performed. The docking analysis with the AutoDock Vina tool revealed that the predicted epitope formed a total of nine hydrogen bonds involving the residues Tyr9, Arg8, Val7, Ala1, Tyr3, Ile2, Asn6, Leu5, and His4. The binding energy between the epitope and the HLA-B*15:01 receptor was found to be -7.5 kcal/mol. The strong hydrogen bonds and the lowest energy value of the docked complex suggest a stable interaction between the ligand and the receptor molecule. The three-dimensional structure of the peptide and the binding interactions between the peptide and HLA-B*15:01 after docking analysis were visualized and captured with Discovery Studio 2021 and are shown in Figure 8.

Conclusion
In this study, we investigated a hypothetical protein from the bacterium Neisseria gonorrhoeae using several bioinformatics tools. Our analysis revealed several physicochemical and functional properties of the studied hypothetical protein. Although the cytoplasmic localization of this protein makes it less suitable for prospective vaccine design, the molecular docking analysis performed in this study may serve as a foundation for future in-silico vaccine design research. This study may also enhance our understanding of how to investigate the structure and function of proteins with unknown functions, and may help other researchers carry out in-silico studies independently. However, as our analysis is based on computational tools and databases, further in vitro and in vivo research is suggested for experimental validation.

Abbreviations
CASTp: Computed Atlas of Surface Topography of Proteins
CTL: Cytotoxic T lymphocyte
DUF: Domain of unknown functions
GRAVY: Grand average of hydropathicity
HP: Hypothetical protein
IEDB: Immune epitope database
ORF: Open reading frame
pI: Isoelectric point
STD: Sexually transmitted disease
STI: Sexually transmitted infection
UPF: Uncharacterized protein families

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
All authors declare that they have no competing interests.

Authors' Contributions
LM designed the study and the experimental work. MRH and KF collected the necessary data and performed data analysis. MRH, KF, LM, and MZI participated in drafting the manuscript. LM and SKT reviewed the draft and revised the manuscript for necessary changes in format. LM supervised the entire study and handled all correspondence. All authors read and approved the final version of the manuscript.