Identification of Five Novel Salmonella Typhi-Specific Genes as Markers for Diagnosis of Typhoid Fever Using Single-Gene Target PCR Assays

Salmonella Typhi (S. Typhi) causes typhoid fever which is a disease characterised by high mortality and morbidity worldwide. In order to curtail the transmission of this highly infectious disease, identification of new markers that can detect the pathogen is needed for development of sensitive and specific diagnostic tests. In this study, genomic comparison of S. Typhi with other enteric pathogens was performed, and 6 S. Typhi genes, that is, STY0201, STY0307, STY0322, STY0326, STY2020, and STY2021, were found to be specific in silico. Six PCR assays each targeting a unique gene were developed to test the specificity of these genes in vitro. The diagnostic sensitivities and specificities of each assay were determined using 39 S. Typhi, 62 non-Typhi Salmonella, and 10 non-Salmonella clinical isolates. The results showed that 5 of these genes, that is, STY0307, STY0322, STY0326, STY2020, and STY2021, demonstrated 100% sensitivity (39/39) and 100% specificity (0/72). The detection limit of the 5 PCR assays was 32 pg for STY0322, 6.4 pg for STY0326, STY2020, and STY2021, and 1.28 pg for STY0307. In conclusion, 5 PCR assays using STY0307, STY0322, STY0326, STY2020, and STY2021 were developed and found to be highly specific at single-gene target resolution for diagnosis of typhoid fever.


Introduction
To date, there are more than 2,500 serotypes identified within the Salmonella enterica species [1]. Most are harmless to humans but one serotype, Salmonella enterica subspecies enterica serovar Typhi (S. Typhi), causes typhoid fever, a severe and life-threatening systemic infection in humans. Worldwide, typhoid fever causes 269,000 deaths from 26.9 million new cases each year [2]. Travellers, children, the elderly, and immune-compromised individuals are especially at risk [3,4]. The clinical manifestations of typhoid fever are similar to other febrile illnesses. Therefore, diagnosis based on clinical signs and symptoms alone is difficult [5]. The emergence of multidrug-resistant S. Typhi strains and development of the typhoid carrier state have further complicated the management of typhoid fever [6,7]. Delay in diagnosis and initiation of antibiotic treatment can cause serious clinical complications and fatality [8]. Thus, early and correct laboratory diagnosis of typhoid fever is critical to reduce the morbidity and mortality, as well as curtail transmission of the disease.
DNA-based detection methods, such as polymerase chain reaction (PCR), have proven to be sensitive, specific, and rapid compared to conventional culture-based methods for the diagnosis of many infectious diseases [9][10][11]. Several target genes have been used for S. Typhi identification using PCR, such as the O antigen somatic genes (tyv and prt) [12], H antigen flagellar gene (fliC-d) [13], and Vi capsular antigen gene (viaB) [14]. However, these genes cannot stand alone as single S. Typhi-specific diagnostic markers since they are not specific to S. Typhi and are also found in other Salmonella serotypes. Thus, these markers provide a provisional rather than a differential diagnosis of typhoid fever. For example, the fliC-d gene of S. Typhi shares the same nucleic acid sequence as that of S. Muenchen [15]; the prt gene is present in S. Typhi, S. Paratyphi A, and S. Enteritidis [12]; and the viaB gene is found not only in S. Typhi but also in S. Dublin, a few strains of S. Paratyphi C [16], and Citrobacter freundii [17]. Due to the lack of specificity of these target genes, a combination of different primer pairs using multiplex PCR [18] or nested PCR [19] is needed to increase the sensitivity and specificity of the PCR diagnostic test. This, however, increases the cost, time, and complexity of the laboratory diagnosis.
Diagnostic markers that can detect pathogens at single-gene target resolution could lead to a simpler, more cost-effective, and more functional DNA-based detection method since fewer primers are needed for target detection. Many approaches, such as subtractive hybridization [20], next generation sequencing [21], and microarray [22] techniques, have been used to identify genes that are specific or unique to a pathogen. However, these high-end technologies are cumbersome and expensive and sometimes yield false negative or false positive results [23]. Since bacterial genome databases have expanded tremendously over the past decade and advances in computing technologies have made nucleic acid sequence alignment services readily accessible at NCBI, an in silico comparative hybridization approach coupled with in vitro PCR (wet-lab) validation is sufficient to facilitate the translation of genomic data into diagnostic marker discoveries. In this study, a low-cost and simple attempt was made to identify new DNA diagnostic markers specific for S. Typhi by utilizing genome data (stored in NCBI databases) and nucleic acid sequence alignment tools (BLASTn) that are readily available in the public domain. The diagnostic sensitivities and specificities of the primers designed for amplifying whole gene sequences can be validated using a panel of confirmed bacteria isolates selected from S. Typhi, non-Typhi Salmonella, and non-Salmonella clinical isolates. The 16S rRNA gene, which is ubiquitous among bacteria species, can be used as a PCR amplification control [24].

Bacterial Strains.
A total of 111 bacteria isolates including 39 S. Typhi, 62 non-Typhi Salmonella serotypes, and 10 non-Salmonella strains were used in this study. The S. Typhi strains consisted of 1 S. Typhi reference strain (ATCC 7251) and 38 different pulsed-field types (PFTs) representing all strains in the state of Kelantan in Malaysia. These 38 PFTs were the result of screening 279 S. Typhi clinical isolates using pulsed-field gel electrophoresis (PFGE) [25]. Non-Typhi Salmonella serotypes were closely related Salmonella species made up of 26 different serotypes (

Culture Conditions and Confirmation Tests.
All bacteria isolates used in this study were confirmed by traditional culture, biochemical, and serotyping methods as described in ISO 6579 with some modifications. Bacteria isolates were revived from frozen glycerol stocks by pipetting 100 μL of thawed cells into 10 mL of nutrient broth and incubating at 37 °C for 18 hours in an orbital shaker at 200 rpm. The bacteria were streaked on Xylose Lysine Deoxycholate (XLD) selective agar and incubated at 37 °C for 18 hours. Colonies grown on the agar were tested with a panel of biochemical tests, including Triple Sugar Iron (TSI), urease, Methyl Red Voges Proskauer (MRVP), citrate, and indole tests. Suspected Salmonella isolates were then sent to the Salmonella Reference Centre, Institute for Medical Research (IMR), Malaysia, to confirm their serotypes using specific antisera and the latex agglutination method.

Identification of S. Typhi-Specific Genes Using Bioinformatics (In Silico).
The full genome sequence of S. Typhi CT18 (GenBank accession number AL513382) was downloaded from the National Center for Biotechnology Information (NCBI) database and used as the reference genome. The 2 plasmids, namely, pHCM1 and pHCM2, which reside in S. Typhi CT18 were excluded since plasmids are genetically unstable. The 6 complete S. Typhi whole-genome sequences available in NCBI were used for data mining. They comprised CT18 (GenBank accession number AL513382) [27], Ty2 (GenBank accession number AE014613) [28], Pstx-12 (GenBank accession number CP003278) [29], Ty21a (GenBank accession number CP002099) [30], B/SF/13/03/195 (GenBank accession number CP012151) [31], and PM016/13 (GenBank accession number CP012091) [32]. In order to ascertain whether the genomic regions were conserved and specific to S. Typhi, the nucleotide Basic Local Alignment Search Tool (BLASTn), a free online tool for nucleic acid analysis, was used to compare the whole-genome sequence of S. Typhi CT18 with the other 5 complete S. Typhi genomes and other bacteria genomes in the NCBI database (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Genes found in unique regions that have no nucleotide similarity with other enteric organisms were identified and retrieved from GenBank at NCBI. These genes were further screened individually using similarity searches against the NCBI nonredundant nucleotide (nr/nt) database to reconfirm their specificities. The program was set for "somewhat similar sequences search," which allowed nucleotide sequence matching down to 7 bases (the smaller the word size, the more sensitive the search). Realizing the high genome similarity among the enteric pathogens and the possibility that different geographical areas may result in different bacterial genotypes, only genes which had 100% sequence conservation (E-value = 0.0) in all 6 complete S. Typhi genomes and had little or no similarity (E-value ≥ 1e−41) to other bacterial sequences in the NCBI database were considered as potential targets and were subjected to wet-lab analysis. The experimental pipeline is shown in Figure 1.
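The two selection criteria above can be sketched as a simple filter over BLASTn hits. This is a minimal illustration only, not the authors' pipeline: the `Hit` record, the `is_candidate` function, and the use of strain names as organism labels are hypothetical, and the example hits are invented to exercise the logic.

```python
from dataclasses import dataclass

# The 6 complete S. Typhi genomes named in the text (strain labels as written there).
TYPHI_GENOMES = {"CT18", "Ty2", "Pstx-12", "Ty21a", "B/SF/13/03/195", "PM016/13"}


@dataclass
class Hit:
    """Hypothetical summary of one BLASTn hit for a query gene."""
    organism: str      # strain or species label of the subject sequence
    identity: float    # percent identity
    coverage: float    # percent query coverage
    evalue: float      # BLAST E-value


def is_candidate(hits):
    """Return True if a gene meets both criteria described in the text.

    1. 100% identity and coverage (E-value 0.0) in all 6 S. Typhi genomes.
    2. Every hit outside S. Typhi has E-value >= 1e-41 (little or no similarity).
    """
    conserved_in = {
        h.organism
        for h in hits
        if h.organism in TYPHI_GENOMES
        and h.identity == 100.0
        and h.coverage == 100.0
        and h.evalue == 0.0
    }
    if conserved_in != TYPHI_GENOMES:
        return False
    return all(h.evalue >= 1e-41 for h in hits if h.organism not in TYPHI_GENOMES)


# Invented example data: perfect hits in all 6 S. Typhi genomes.
typhi_hits = [Hit(name, 100.0, 100.0, 0.0) for name in TYPHI_GENOMES]
weak_other = Hit("other enteric organism", 78.0, 12.0, 1e-5)
strong_other = Hit("other enteric organism", 99.0, 100.0, 1e-120)

print(is_candidate(typhi_hits + [weak_other]))    # passes both criteria
print(is_candidate(typhi_hits + [strong_other]))  # fails the specificity criterion
```

A filter of this kind only reflects the sequences present in the database at the time of the search, so a gene that passes still needs wet-lab validation.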

Design of Oligonucleotide Primers for PCR Amplification.
Primers were designed manually to amplify the S. Typhi-specific genes identified previously, including the start and the stop codons. A pair of primers specific for 16S rRNA gene amplification as described by Marchesi and colleagues [24] was also incorporated into each PCR assay to serve as an internal amplification control (IAC). This is a universal gene target which is highly conserved in bacteria [24]. All primers were synthesized by Integrated DNA Technologies (IDT) Pte. Ltd., Malaysia.

Template DNA Extraction.
DNA from all bacteria isolates was extracted using the DNeasy Blood & Tissue Kit (Qiagen, USA) according to the manufacturer's instructions. The purity and concentration of the extracted DNA were determined using a Nanodrop Spectrophotometer ND-1000 (Thermo Fisher Scientific, USA). DNA concentration was measured from the absorbance at 260 nm. The ratio of the absorbance at 260 and 280 nm (A260/280) and the ratio of the absorbance at 230 and 260 nm (A230/260) were used to evaluate the DNA quality. The extracted DNA was diluted to a final stock concentration of 50 ng/μL using ultrapure water and stored at −20 °C until ready for PCR amplification.

Optimization of PCR.
Each PCR assay was optimized using a modified Taguchi method as described by Cobb and Clarkson [33]. The effects and interactions of the 4 main PCR factors (IAC primers, S. Typhi-specific gene primers, MgCl2, and annealing temperature), each at 3 different levels (IAC primers: 0.05, 0.10, and 0.15 μM; S. Typhi primers: 1.00, 1.50, and 2.00 μM; MgCl2: 2.00, 2.50, and 3.00 mM; and annealing temperature: 50, 55, and 60 °C), were investigated in a balanced orthogonal array of 9 experimental combinations. The PCR amplifications were carried out in a total reaction volume of 20 μL, and the PCR products were analysed on a 1.2% (w/v) agarose gel containing SYBR Safe DNA Gel Stain (Invitrogen, USA) and visualized using a blue-light transilluminator (Syngene, UK).
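To illustrate how 4 factors at 3 levels fit into 9 runs instead of the 81 of a full factorial design, the sketch below builds one valid L9 orthogonal array from the levels listed above. The row order and the column assignment are assumptions made for illustration; the exact layout used in the study (following Cobb and Clarkson) is not given in the text, and the names `FACTORS`, `l9_runs`, and `is_balanced` are hypothetical.

```python
from itertools import combinations

# Factor levels as listed in the text (primer concentrations in uM, MgCl2 in mM,
# annealing temperature in degrees Celsius).
FACTORS = {
    "iac_primer_uM": [0.05, 0.10, 0.15],
    "typhi_primer_uM": [1.00, 1.50, 2.00],
    "mgcl2_mM": [2.00, 2.50, 3.00],
    "annealing_C": [50, 55, 60],
}


def l9_runs(factors):
    """Build 9 runs for four 3-level factors using modular arithmetic.

    Level indices for run (i, j) are (i, j, (i + j) % 3, (i + 2*j) % 3),
    which yields an orthogonal array: every pair of factors sees each of
    the 9 possible level combinations exactly once.
    """
    names = list(factors)
    runs = []
    for i in range(3):
        for j in range(3):
            idx = (i, j, (i + j) % 3, (i + 2 * j) % 3)
            runs.append({name: factors[name][k] for name, k in zip(names, idx)})
    return runs


def is_balanced(runs):
    """Check pairwise balance: each factor pair covers all 9 level pairs."""
    names = list(runs[0])
    return all(
        len({(run[a], run[b]) for run in runs}) == 9
        for a, b in combinations(names, 2)
    )


RUNS = l9_runs(FACTORS)
for number, run in enumerate(RUNS, start=1):
    print(number, run)
print("balanced:", is_balanced(RUNS))
```

Because every pair of factors is balanced, the main effect of each factor can be estimated from only 9 reactions per assay.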

Analytical Specificities of Genes Unique to S. Typhi.
Analytical specificities of the PCR assays were assessed by running each PCR assay on a panel of bacteria strains consisting of 39 S. Typhi, 62 non-Typhi Salmonella, and 10 non-Salmonella clinical isolates.

Detection Limit of the PCR Assays.
The detection limit of the PCR assays was defined as the minimum amount of S. Typhi DNA (ng/μL) that yielded positive PCR amplicons. The analytical sensitivities were determined by amplification of a 5-fold serial dilution of S. Typhi ATCC 7251 DNA, ranging from 50 ng to 25.6 fg. Two microliters of the DNA was subjected to PCR amplification. The analytical sensitivity was indicated by the presence of visible PCR product bands on the agarose gel using the transilluminator as described above.
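The arithmetic of the dilution series can be sketched as follows. The sketch assumes that the stated range (50 ng to 25.6 fg) refers to the concentration per microliter, and that the mass per reaction is this concentration multiplied by the 2 μL of template; both are interpretations rather than statements from the text, and `dilution_series` is a hypothetical helper.

```python
def dilution_series(stock_ng_per_ul=50.0, fold=5, steps=10, template_ul=2.0):
    """Return (step, concentration in ng/uL, DNA mass per reaction in ng).

    Assumes each step dilutes the previous one by `fold` and that
    `template_ul` microliters of each dilution go into one reaction.
    """
    rows = []
    for step in range(steps):
        concentration = stock_ng_per_ul / fold ** step
        rows.append((step, concentration, concentration * template_ul))
    return rows


SERIES = dilution_series()
for step, concentration, mass in SERIES:
    # 1 ng = 1000 pg, so multiply by 1000 to print picograms.
    print(f"step {step}: {concentration * 1000:.4f} pg/uL, {mass * 1000:.4f} pg per reaction")
```

Under these assumptions, ten 5-fold steps span 50 ng/μL down to 25.6 fg/μL, and the per-reaction masses at steps 5, 6, and 7 are 32 pg, 6.4 pg, and 1.28 pg.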

DNA Sequencing.
To confirm that the PCR products were indeed derived from the S. Typhi strains, PCR amplicons from all assays produced using Phusion High-Fidelity DNA Polymerase (New England Biolabs, USA) were purified and sent to First BASE Laboratories Pte. Ltd., Malaysia, for sequencing. The resultant nucleotide sequences were compared with the reference S. Typhi CT18 gene sequences in NCBI using BioEdit software.

Results
Using the bioinformatic method for whole-genome comparison (Figure 1), 6 potential diagnostic markers with NCBI locus tags STY0201, STY0307, STY0322, STY0326, STY2020, and STY2021 were found. They exhibited 100% query coverage and identity (E-value = 0) with all 6 S. Typhi gene sequences but had low or no significant similarity (E-value ≥ 1e−41) with other enteric bacteria nucleotide sequences as of 11 March 2016. These genes were found to be (bioinformatically) highly conserved and specific and thus were selected for further wet-lab validation using the PCR method. The primers designed to amplify these selected genes are shown in Table 1.
The results showed that all 6 designed primer pairs successfully amplified their target genes with amplicon sizes of 1176, 495, 678, 261, 429, and 732 bp, respectively. DNA sequencing results of the amplicons showed 100% identity with their corresponding S. Typhi genes, confirming the fidelity and sensitivity of the primers.
The 6 single-gene target PCR assays were then optimized using the Taguchi method with the incorporation of an IAC which targeted the 16S rRNA gene. The optimized master mix for the PCR assays targeting the STY0201, STY0307, and STY2020 genes consisted of 1× Green GoTaq Flexi Buffer, 2.0 mM MgCl2, 0.2 mM dNTPs, 1.5 μM S. Typhi-specific gene primers, 0.10 μM IAC primers, 0.75 U GoTaq Flexi DNA Polymerase (Promega, USA), and 5% glycerol in a total volume of 20 μL. Two microliters of test DNA (50 ng/μL) was added to the master mix and amplified using the following optimized thermal-cycling parameters: initial denaturation at 95 °C for 1 min, followed by 30 cycles of 95 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min, and a final extension at 72 °C for 5 min. Similar PCR conditions were used for amplification of the STY0322, STY0326, and STY2021 genes except for the concentrations of MgCl2 and IAC primers, which were set at 3.0 mM and 0.15 μM, respectively. The optimal annealing temperature was set at 50 °C. Under these conditions, the IAC primer pair produced an amplicon of 1,362 bp for all bacteria isolates tested (111/111).
The results of serial dilution of S. Typhi genomic DNA showed that the detection limit of the optimized PCR assays was 32 pg for gene STY0322, 6.4 pg for genes STY0326, STY2020, and STY2021, and 1.28 pg for gene STY0307.
Although gene STY0201 exhibited 100% sensitivity (detection of 39/39 S. Typhi isolates), it showed cross-reactivity with S. Oslo and S. Kissi (Table 2), resulting in a specificity of only 97.2% (detection of 2/72 non-Typhi isolates). Sequencing of their PCR products showed a nucleotide substitution of C → T at position 89 and T → C at positions 354 and 1,026 for both S. Kissi and S. Oslo. The sequence variation between these 2 serotypes and the S. Typhi CT18 reference genome was very small (only 3 nucleotide differences), indicating that the false positive results were due to the high sequence similarity of this gene among the 3 serotypes.
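The sensitivity and specificity figures quoted for STY0201 follow directly from the isolate counts. The short sketch below reproduces the calculation; the function names are illustrative only.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of S. Typhi isolates that the assay detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-Typhi isolates that the assay correctly leaves negative."""
    return true_neg / (true_neg + false_pos)


# STY0201: 39/39 S. Typhi isolates detected; 2 of 72 non-Typhi isolates
# (S. Oslo and S. Kissi) gave false positive amplicons.
sty0201_sensitivity = sensitivity(true_pos=39, false_neg=0)
sty0201_specificity = specificity(true_neg=70, false_pos=2)

print(f"sensitivity: {sty0201_sensitivity:.1%}")
print(f"specificity: {sty0201_specificity:.1%}")
```

With 70 of 72 non-Typhi isolates negative, the specificity is 70/72, which rounds to 97.2%.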

Discussion
The diagnosis of typhoid fever based on clinical signs and symptoms is often ambiguous, while phenotypic detection of S. Typhi bacteria based on biochemical and serotyping methods is laborious and time-consuming. Thus, rapid molecular detection methods, such as nucleic acid amplification by PCR, are critically needed to help diagnose this contagious disease. Development of such tests requires diagnostic markers that are sensitive and specific. This is the first report on the use of genes STY0307, STY0322, STY0326, STY2020, and STY2021 as S. Typhi-specific diagnostic markers. Unlike other S. Typhi PCR targets that were selected based on immunological properties, these genes are individually highly specific for S. Typhi and therefore can be used in single-gene target PCR assays without the need for nested or multiplex PCR. Also, these targets are whole gene sequences (from start to stop codon for the purpose of whole gene amplification), unlike other diagnostic markers which are only partial gene sequences. The rationale for this strategy is that if the whole gene sequence is specific to the bacterium, then primers can be designed at any location in the gene. Thus, these gene sequences not only serve as specific targets for PCR assays but also are suitable for more advanced diagnostic tests that require multiple primer annealing sites, such as loop-mediated isothermal amplification (LAMP) and strand displacement amplification (SDA) [34]. These genes could be utilized for the development of innovative Point-of-Care (POC) diagnostics to address the need for low-cost, simple, rapid, and accurate diagnostics for low-resource settings.
The gene STY0201 has been used as a PCR target, and the PCR assays that were developed based on this gene were reported to be 100% sensitive and specific [35,36]. However, this study found that this gene was only 97.2% specific and cross-reacted with S. Oslo and S. Kissi. The incorrect bioinformatic prediction of the specificity of gene STY0201 may be due to the incomplete genome sequences available for these 2 bacteria in the NCBI database, which limit the matching accuracy of the BLASTn search. This is a limitation of the alignment-based marker identification method, as it relies on the availability of complete genome sequences. Thus, whenever new sequence data become available, the bioinformatic analysis should be repeated to align the current diagnostic markers with the new sequences to ensure their specificity.
The other 5 genes identified in this study showed no sequence homology to proteins of known function using protein BLAST (BLASTp) programs. Genes STY0307, STY0322, and STY0326 encode hypothetical proteins, while genes STY2020 and STY2021 encode putative bacteriophage proteins. Interestingly, genes STY0307, STY0322, and STY0326 are located in the Salmonella Pathogenicity Island 6 (SPI-6). Yet, their role in bacterial virulence and pathogenicity remains unknown. More importantly, antigenicity prediction scores using SCRATCH protein prediction software [26] showed that genes STY0201, STY0207, STY0307, STY0326, and STY2020 were highly antigenic and may have potential to serve as antigens for serodiagnosis of typhoid fever (Table 3). When compared with the deduced amino acid sequences of S. Paratyphi A, which is the closest relative of S. Typhi [37], the putative S. Typhi proteins showed weak or no similarity (Table 3). These findings provide an opportunity for gene cloning and protein expression to investigate their serodiagnostic value for development of low-cost antibody-based diagnostic tests or vaccines for typhoid fever.
In conclusion, 5 S. Typhi-specific genes, namely, STY0307, STY0322, STY0326, STY2020, and STY2021, were found to be highly conserved among S. Typhi strains. Wet-lab experiments found no false positive reactions with non-Typhi serotypes or non-Salmonella enteric pathogens. These genes could serve as useful diagnostic markers for development of DNA-based diagnostics for sensitive and specific detection of typhoid fever.