Evaluation of the Ribosomal Protein S1 Gene (rpsA) as a Novel Biomarker for Mycobacterium Species Identification

Objectives. To evaluate the resolution and reliability of the rpsA gene, encoding ribosomal protein S1, as a novel biomarker for mycobacterial species identification. Methods. A segment of the rpsA gene (565 bp) was amplified by PCR from 42 mycobacterial reference strains, 172 nontuberculous mycobacteria clinical isolates, and 16 M. tuberculosis complex clinical isolates. The PCR products were sequenced and aligned using the multiple alignment algorithm in the MegAlign package (DNASTAR) and the MEGA program. A phylogenetic tree was constructed by the neighbor-joining method. Results. Comparative sequence analysis of the rpsA gene provided the basis for species differentiation within the genus Mycobacterium. Slow- and rapid-growing groups of mycobacteria were clearly separated, and each mycobacterial species was differentiated as a distinct entity in the phylogenetic tree. Clear sequence differences were found between M. kansasii and M. gastri, M. chelonae and M. abscessus, M. avium and M. intracellulare, and M. szulgai and M. malmoense, species pairs that cannot be separated by comparison of 16S ribosomal DNA (rDNA) sequences. Of the 188 clinical isolates, comprising 8 mycobacterial species, 183 (97.3%) were identified correctly by BLAST analysis of the rpsA gene. Conclusions. Our study indicates that rpsA sequencing can be used effectively for mycobacterial species identification as a supplement to 16S rDNA sequence analysis.


Introduction
Members of the genus Mycobacterium are widespread in nature and range from harmless saprophytic species to strict pathogens that cause serious human and animal diseases. Both slow-growing and rapid-growing mycobacteria can cause human infections. Traditionally, taxonomy based on biochemical characteristics has been used for species determination of mycobacteria, but this approach is limited by the overlapping biochemical and phenotypic patterns among different mycobacterial species. Another approach, analysis of cell-wall fatty acid and mycolic acid composition, is likewise limited by profile similarity among some emerging nontuberculous mycobacteria (NTM) [1]. Comparison of 16S rDNA gene sequences has been an important method for mycobacterial species identification; however, ambiguous results have been obtained either because more than one copy of the 16S rDNA gene is present in the genome, for example, in M. celatum and the M. terrae complex [2,3], or because of sequence homology between species [4]. Therefore, alternative phylogenetic markers capable of complementing the 16S rRNA gene would be useful for phylogenetic study and species identification within the genus Mycobacterium.
Ribosomal protein S1 (RpsA), a component of the 30S ribosomal subunit, contains the S1 domain that has been found in a large number of RNA-associated proteins. RpsA is a vital protein involved in protein translation and in the ribosome-sparing process of trans-translation. In addition, it has been reported that RpsA is the target of pyrazinoic acid, the active form of the antituberculosis drug pyrazinamide [5]. Mycobacterium tuberculosis has a single copy of the rpsA gene in its genome [6], and the sequence homology among the mycobacterial species in which the gene has already been sequenced ranges from 86.7% to 100%. This suggests that rpsA may be suitable for phylogenetic study of the genus Mycobacterium. In this paper, we report an evaluation of the rpsA gene as a novel biomarker for mycobacterial species identification.

Mycobacterial Reference Strains and Clinical Isolates.
Forty-two type and reference strains of the genus Mycobacterium (Table 1), 172 clinical NTM isolates, and 16 M. tuberculosis complex (MTC) isolates were investigated. All the type and reference strains were purchased from the American Type Culture Collection (ATCC), and all the clinical isolates used in this study were obtained from the Clinical Database and Sample Bank of tuberculosis of Beijing, National Clinical Lab on Tuberculosis, Beijing Chest Hospital. All the clinical NTM isolates were identified to the species level by sequence alignment of at least two of the following: 16S rDNA, the 16-23S rRNA gene internal transcribed spacer (ITS), rpoB, and hsp65, as described before [2, 7-9].

rpsA Gene Amplification and Sequencing.
DNA was released from cultured mycobacteria by boiling the mycobacterial suspension in TE buffer for 10 min. After centrifugation, the supernatant was used for PCR amplification [10]. The forward primer was 5'-CCCTACATCGGCAAGGAG-3' (positions 487-504 in the rpsA gene of Mycobacterium tuberculosis, GenBank accession number NC_000962.2), and the reverse primer was 5'-TGTCGATGACCTTGACCATC-3' (positions 1032-1051 in the same gene). The amplified product was 565 bp. PCR products were purified and sequenced by a commercial company (YINGJUN Biotech Company, Beijing, China) using an ABI 3730 DNA Analyzer (Applied Biosystems, California, USA).
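As a quick consistency check on the primer description above, the following minimal Python sketch confirms that the stated positions match the primer lengths and span a 565 bp product. It assumes the coordinates are 1-based and inclusive, which the text does not state explicitly; the function names are illustrative only.

```python
# Primer data as given in the text. Coordinates are ASSUMED to be
# 1-based and inclusive within the M. tuberculosis rpsA gene.
FORWARD = {"seq": "CCCTACATCGGCAAGGAG", "start": 487, "end": 504}
REVERSE = {"seq": "TGTCGATGACCTTGACCATC", "start": 1032, "end": 1051}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range."""
    return end - start + 1


def amplicon_length(fwd: dict, rev: dict) -> int:
    """Product length from the first base of the forward primer to the
    last base covered by the reverse primer."""
    return span_length(fwd["start"], rev["end"])


if __name__ == "__main__":
    print(span_length(FORWARD["start"], FORWARD["end"]), len(FORWARD["seq"]))
    print(span_length(REVERSE["start"], REVERSE["end"]), len(REVERSE["seq"]))
    print(amplicon_length(FORWARD, REVERSE))
    # Sense-strand sequence that the reverse primer anneals against.
    print(reverse_complement(REVERSE["seq"]))
```

The reverse primer is written 5' to 3' on the antisense strand, so its reverse complement is the sequence expected at positions 1032-1051 of the sense strand.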

rpsA Sequence Alignment of the Reference Strains.
Between 85.4% and 100% sequence homology (interspecies divergence, 0% to 14.6%) was observed among the 42 tested reference strains and the 5 additional mycobacterial species whose sequences were obtained from the GenBank database (Table 2). All the rpsA gene sequences of the analyzed mycobacterial strains were distinct from that of the outgroup strain R. equi. Among the 19 reference strains of slow-growing mycobacteria, 11 strains shared greater than 97% homology, including 5 M. tuberculosis complex strains. Among the 28 reference strains of rapid-growing mycobacteria, 14 strains shared greater than 97% homology.
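The homology and divergence figures above are complementary (85.4% identity corresponds to 14.6% divergence). A minimal sketch of how such values can be derived from pre-aligned sequences is given below. Skipping gap columns is an assumption made here for illustration; the MegAlign and MEGA programs used in the study may treat gaps and ambiguous bases differently.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    ASSUMPTION: columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def divergence(a: str, b: str) -> float:
    """Percent divergence, defined here as 100 minus percent identity."""
    return 100.0 - percent_identity(a, b)


def pairwise_divergence(aligned: dict) -> dict:
    """Divergence for every unordered pair of named aligned sequences."""
    names = sorted(aligned)
    return {
        (p, q): divergence(aligned[p], aligned[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


if __name__ == "__main__":
    # Toy sequences, not data from the study.
    toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACGTACGTAC"}
    for pair, value in pairwise_divergence(toy).items():
        print(pair, round(value, 1))
```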

Phylogenetic Tree Construction.
A phylogenetic tree, which provided the basis for species differentiation in the genus Mycobacterium, was constructed (Figure 1). The great majority of the tested species showed good separation. The rapid-growing species were clearly separated from the slow-growing species in the tree. M. chelonae, M. abscessus subspecies massiliense, and M. abscessus, which are categorized in the pathogenic taxonomic group of rapid-growing mycobacteria, formed a distinct cluster that was much closer to the slow-growing species than was the nonpathogenic group of rapid-growing mycobacteria. The reliability of the phylogenetic tree was verified by the bootstrap method, using R. equi as the outgroup.

No discrepancies were found in the species submitted as M. intracellulare, M. avium, M. abscessus, and M. tuberculosis complex. The sequence divergence among MTC members was 0.4%. The intraspecies divergence among the 188 clinical strains ranged from 0.6% to 5.3% (Table 3). The sequence diversity among the M. abscessus isolates was 0.6%, while that among the M. gordonae clinical isolates was 5.3%. No M. gordonae clinical strain had a sequence identical to that of the M. gordonae reference strain (ATCC 14470). Interestingly, the rpsA sequence of one strain among the 23 M. kansasii isolates exhibited a relatively low level of sequence similarity (95.6%) to that of the reference strain (ATCC 12478), while those of all the other M. kansasii isolates were identical to it. Since the strain was confirmed as M. kansasii by rpoB and hsp65 gene alignment, it may represent a distant variant or a new subtype.
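The tree discussed in this section was built by the neighbor-joining method. For readers unfamiliar with it, the sketch below illustrates only the pair-selection step of neighbor joining (the Q criterion and the two resulting branch lengths) on a small toy distance matrix. It is not the MEGA implementation, the distances are not from this study, and the function name `nj_first_join` is hypothetical.

```python
from itertools import combinations


def nj_first_join(names, dist):
    """Select the first pair joined by neighbor joining.

    names: list of taxon labels (at least 3).
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    Returns (pair, q_value, (branch_length_a, branch_length_b)).
    """
    n = len(names)
    if n < 3:
        raise ValueError("need at least three taxa")

    def d(a, b):
        return 0.0 if a == b else float(dist[frozenset((a, b))])

    # Sum of distances from each taxon to all others.
    totals = {a: sum(d(a, b) for b in names) for a in names}

    best_pair, best_q = None, None
    for a, b in combinations(names, 2):
        # Q criterion: the pair with the smallest value is joined.
        q = (n - 2) * d(a, b) - totals[a] - totals[b]
        if best_q is None or q < best_q:
            best_pair, best_q = (a, b), q

    a, b = best_pair
    # Branch lengths from a and b to the newly created internal node.
    len_a = 0.5 * d(a, b) + (totals[a] - totals[b]) / (2 * (n - 2))
    len_b = d(a, b) - len_a
    return best_pair, best_q, (len_a, len_b)


if __name__ == "__main__":
    # Illustrative toy distances only.
    taxa = ["a", "b", "c", "d", "e"]
    raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
           ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
           ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
    dist = {frozenset(k): v for k, v in raw.items()}
    print(nj_first_join(taxa, dist))
```

A full tree is obtained by replacing the joined pair with the new node, recomputing distances to it, and repeating until three nodes remain; bootstrap support is then estimated by rebuilding the tree on resampled alignment columns.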

Discussion
16S rDNA gene sequence alignment has been used as the reference method for mycobacterial species identification. However, it has been reported that, when the 16S rDNA gene alone was used for species identification of clinical NTM, 37% of such isolates remained unclassified, which illustrates the need for additional molecular tools for proper phylogenetic assignment and accurate NTM identification [11]. The common assumption that bacterial isolates belong to the same species if they have fewer than 5-15 bp differences within the 16S rDNA gene sequence [12] or if they have more than 97% 16S rDNA gene sequence identity [13] may not be applicable to the genus Mycobacterium, whose members are much more closely related to each other. [9, 14-16]. Three of the 4 M. gordonae strains with discrepant identification still had M. gordonae as the top match by rpsA, but the sequence identity was lower than 97%, which suggests that they might be identified correctly once a more complete in-house database has been developed.
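The identification logic implied above (take the best database match, then check its identity against a cutoff) can be sketched as follows. The 97% default is taken from the threshold mentioned in the text, but applying it as a hard cutoff, reporting exact ties as ambiguous, and the function name `classify_top_match` are all assumptions made for illustration; the identity values in the example are invented.

```python
def classify_top_match(identities: dict, threshold: float = 97.0) -> dict:
    """Classify a query from its percent identity to each reference species.

    identities: species name -> percent identity of the query sequence.
    ASSUMPTIONS: a hard identity cutoff, and exact ties reported as ambiguous.
    """
    if not identities:
        raise ValueError("no reference matches supplied")
    top_identity = max(identities.values())
    tied = sorted(s for s, v in identities.items() if v == top_identity)
    if len(tied) > 1:
        status = "ambiguous"          # e.g. species the marker cannot separate
    elif top_identity >= threshold:
        status = "identified"
    else:
        status = "below_threshold"    # top match exists but identity is low
    return {"species": tied, "identity": top_identity, "status": status}


if __name__ == "__main__":
    # Invented identity values, for illustration only.
    print(classify_top_match({"M. gordonae": 95.2, "M. kansasii": 90.1}))
    print(classify_top_match({"M. avium": 99.5, "M. intracellulare": 93.0}))
```

Under this scheme, an isolate whose best match is M. gordonae at less than 97% identity is flagged rather than rejected, which mirrors the situation described for three of the discrepant M. gordonae strains.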
rpsA alone cannot differentiate between members of the M. tuberculosis complex, between M. senegalense and M. thermoresistibile, between M. parafortuitum and M. triviale, between M. diernhoferi and M. duvalii, or between M. austroafricanum and M. terrae; however, these species are also difficult to separate using any other single marker such as 16S rDNA, ITS, rpoB, or hsp65. Several reference strains had similar homology with two or more species, which suggests inadequate taxonomy within the currently described species. Based on nucleotide sequences of rpoB, hsp65, and sodA in clinical isolates of the Mycobacterium abscessus group, one-fourth of isolates had discordant identification [16]. Multilocus sequence typing and sequence analysis of several genes facilitate the identification of closely related species or subspecies [17]. In our study, we have demonstrated that the rpsA gene is a promising marker for mycobacterial species identification for both fast- and slow-growing mycobacteria. To be a good marker for species differentiation, the target gene should be stable and sequence variations should occur randomly; additionally, an extremely conserved or highly variable gene may not be adequate. As a single-copy housekeeping gene, rpsA could work well as a target gene for Mycobacterium species discrimination without ambiguous identification. The 16S rDNA gene has higher homology within the mycobacteria than the rpsA gene: 94.3% to 100% compared with 85.4% to 100%. According to our work, when used as the sole marker, rpsA had better resolution than 16S rDNA and resolution similar to that of ITS, rpoB, and hsp65.

Figure 1: Phylogenetic tree based on rpsA gene sequences, showing the relationship of the 47 type strains of mycobacteria and 1 outgroup strain. This tree was constructed by the neighbor-joining method. Topology was also evaluated by bootstrap analysis (MEGA program, 1000 repeats, with R. equi as the outgroup). The numerical values in the tree represent bootstrap results. The distance between two strains is the sum of the branch lengths between them.

a Numbers in parentheses represent the numbers of isolates identified as a particular species.
b Identification based on sequencing of at least two of the following: 16S rDNA, 16-23S rRNA gene internal transcribed spacer (ITS), and rpoB and hsp65 genes.
Because of the high resolving power of rpsA for species identification, we presume that it has potential for clinical use. From our own experience, we recommend that species identification by DNA sequence comparison start with the 16S rRNA gene plus at least one other marker, such as ITS, rpoB, hsp65, or rpsA. When conflicting or dubious results are obtained, further markers should be added. Even though the 16S rRNA gene is inferior to the other markers in discriminatory capacity, it should be chosen first because it has the most robust sequence database, which helps to avoid major identification errors caused by unreliable databases for the other markers [18].
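The recommendation above can be read as a simple decision rule. The sketch below is one possible interpretation, not a validated protocol: each marker contributes the set of species it cannot exclude, the sets are intersected, and more markers are requested when the result is empty or still contains several species. The status labels and the function name `combine_markers` are hypothetical.

```python
from typing import Dict, Set, Tuple


def combine_markers(calls: Dict[str, Set[str]]) -> Tuple[str, Set[str]]:
    """Combine per-marker candidate species into a single decision.

    calls: marker name -> set of species consistent with that marker.
    ASSUMPTION: agreement is modelled as set intersection across markers.
    """
    if "16S" not in calls:
        return ("need_16S", set())            # 16S is the recommended start
    if len(calls) < 2:
        return ("need_second_marker", set(calls["16S"]))
    candidates = set.intersection(*(set(v) for v in calls.values()))
    if len(candidates) == 1:
        return ("identified", candidates)
    if not candidates:
        return ("conflict_add_markers", set())
    return ("unresolved_add_markers", candidates)


if __name__ == "__main__":
    # 16S cannot separate M. kansasii from M. gastri; rpsA can.
    print(combine_markers({
        "16S": {"M. kansasii", "M. gastri"},
        "rpsA": {"M. kansasii"},
    }))
    # Conflicting calls: add a further marker such as rpoB or hsp65.
    print(combine_markers({
        "16S": {"M. avium"},
        "rpsA": {"M. intracellulare"},
    }))
```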
The main deterrent to the primary use of rpsA sequencing as a routine means of identifying mycobacteria is the need for a comprehensive database. Therefore, we constructed our own rpsA sequence database by including as many type strains as possible and integrating rpsA sequences deposited in GenBank, such as those of M. avium subspecies paratuberculosis, M. ulcerans, M. vanbaalenii, M. abscessus subspecies massiliense, and M. canettii. Besides type strains, rpsA sequences of confirmed clinical strains were also included. We will continue to update our database to improve its capability to identify the most recently described species.