Genomic Analysis of Bacillus megaterium NCT-2 Reveals Its Genetic Basis for the Bioremediation of Secondary Salinization Soil

Bacillus megaterium NCT-2 is a nitrate-uptake bacterium that shows high bioremediation capacity in secondary salinization soil, including nitrate-reducing capacity, phosphate solubilization, and salinity adaptation. To gain insight into this capacity at the genetic level, the complete genome sequence was obtained using a multiplatform strategy involving HiSeq and PacBio sequencing. The NCT-2 genome consists of a circular chromosome of 5.19 Mbp and ten indigenous plasmids, totaling 5.88 Mbp with an average GC content of 37.87%. The genome encodes 5,606 coding sequences, 142 tRNAs, and 53 rRNAs. Genes related to the bioremediation of secondary salinization soil and to plant growth promotion were identified in the genome, such as those for nitrogen metabolism, phosphate uptake, the synthesis of organic acids and phosphatase for phosphate-solubilizing ability, and a Trp-dependent IAA synthetic system. Furthermore, genes involved in cation transport, osmotic stress, and oxidative stress suggest that strain NCT-2 has a strong ability to adapt to its environment. This study sheds light on the molecular basis of using B. megaterium NCT-2 in the bioremediation of secondary salinization soils.


Introduction
Soil application of organic and inorganic fertilizers for crop and vegetable cultivation is the major source of soil nitrate-nitrogen (nitrate-N), which increases agricultural productivity. However, vegetable yields do not increase continuously with soil nitrate-N [1]. A large accumulation of nitrate in soil results in soil secondary salinization, which has various adverse effects on soil productivity and leads to nitrate accumulation in vegetables [2]. Moreover, the reduction of nitrate to nitrite can cause various human diseases [1]. Soil secondary salinization is a severe problem in intensively managed agricultural ecosystems [3]. A low-cost bioremediation method for removing nitrate from soil is therefore needed.
In our previous study, Bacillus megaterium NCT-2 was isolated from secondary nitrate-salinized soil in a greenhouse; it shows high nitrate-reducing capacity and salinity adaptation in secondary salinization soil [4]. It can remove nitrate at initial nitrate-N concentrations ranging from 100 mg/L to 1,000 mg/L and grows well in inorganic salt medium with 4.0% sodium chloride [4]. In our field trials, the concentrations of NO₃⁻ in both soil and plants were reduced significantly when the NCT-2 strain mixed with straw powder was used to treat secondary salinization soil (unpublished). Moreover, this strain showed significant phosphate-solubilizing ability toward insoluble inorganic phosphates in culture medium [5]. Strain NCT-2 therefore has the potential to be used as a biofertilizer for the bioremediation of secondary nitrate-salinized soil and for plant growth promotion [6].
The Gram-positive bacterium Bacillus megaterium is found in diverse habitats, from soil to sediment, sea, and dried food. It was named for its large size, with a volume approximately 100 times that of Escherichia coli [7]. Its size makes it ideal for studies of cell structure, protein localization, sporulation, and membranes [8,9]. Because B. megaterium strains produce no endotoxins associated with the outer membrane and no external alkaline proteases, they are widely used as cloning hosts in food and pharmaceutical production processes, for example for α- and β-amylases in the baking industry [10,11], penicillin acylase [12-14], and vitamin B12 [15]; such strains include Bacillus megaterium DSM 319, Bacillus megaterium QM B1551, and Bacillus megaterium WSH 002 [16,17]. Their genomes have been sequenced to gain insight into the metabolic versatility that facilitates biotechnological applications, but not into the bioremediation of secondary salinization soil [18,19].
Although previously published work sequenced a 5.68 Mb draft genome of B. megaterium NCT-2 on the Solexa platform, consisting of 204 contigs, it focused only on multiple alignments of nitrate assimilation-related gene sequences [20]. The functional nitrate assimilation-related genes (the nitrate reductase electron transfer subunit, the nitrate reductase catalytic subunit, the nitrite reductase [NAD(P)H] large and small subunits, and the glutamine synthetase) were identified [20]. The genes that underlie the full potential of strain NCT-2 in the bioremediation of secondary salinization soil remain unknown. We therefore obtained its complete genome sequence using a multiplatform strategy involving HiSeq and PacBio sequencing. Furthermore, we performed a comprehensive analysis of nitrogen metabolism and plant growth-promoting features. The comparative analysis may be helpful for applying the strain in soil bioremediation.

DNA Preparation and Genome Sequencing. B. megaterium NCT-2, isolated from secondary salinized greenhouse soil in China, was cultured in a defined inorganic salt medium as previously described [4]. It is registered in the China General Microbiological Culture Collection Center under CGMCC No. 4698. Genomic DNA was isolated using the QIAGEN DNeasy Blood & Tissue Kit (Hilden, Germany). The concentration and quality of the DNA were determined with a Qubit Fluorometer (Thermo Scientific, USA), a NanoDrop Spectrophotometer (Thermo Scientific, USA), and agarose electrophoresis. The whole genome of B. megaterium strain NCT-2 was sequenced by BGI Tech Solutions Co., Ltd. (Shenzhen, China) using the Illumina HiSeq 4000 short-read sequencing platform (Illumina Inc., San Diego, CA, USA) (insert size, 500 bp; 2 × 125 bp read length) and the PacBio RSII long-read sequencing platform (Pacific Biosciences of California, Inc., Menlo Park, CA, USA) (Figure S1).

Genome Assembly and Annotation.
After quality control, the de novo assembly of the whole NCT-2 genome was performed using RS_HGAP Assembly3 in the SMRT Analysis pipeline version 2.2.0 [21]. The HiSeq clean reads were preliminarily assembled into contigs, which were then used for hybrid error correction of the PacBio subreads. Two rounds of error correction were carried out: one using SOAPsnp and SOAPIndel [22] and the other using the Genome Analysis Toolkit (GATK) [23]. Finally, SSPACE-LongRead [24] and Celera Assembler [25] were used to generate a high-quality genome. The finished NCT-2 genome was submitted to GenBank, replacing the previous draft genome [20].
The protein-coding genes were predicted by using Glimmer 3.02 [26], and the tandem repeats were detected with Tandem Repeat Finder 4.04 [27]. The gene function annotation was accomplished by blasting the protein sequences against the database of Kyoto Encyclopedia of Genes and Genomes (KEGG) [28]. In addition, the RAST web server (https://rast.nmpdr.org) with the default parameters was used to catalog all the predicted genes into subsystems according to functional categories [29,30]. CGView was used to produce the maps of the circular genomes with gene feature information [31]. Genome alignments with locally collinear blocks were performed with MAUVE [32].
2.3. Phylogenetic Analysis. The whole genome-based phylogenetic analysis was performed using the CVTree 3.0 online server [33,34]. Fourteen genome sequences were obtained from GenBank. A phylogenetic tree was constructed by the neighbor-joining method using MEGA [35-37]. In addition, FusionDB was used to analyze the functional repertoire of B. megaterium NCT-2 and to identify its nearest "neighbors" based on functional similarities [38,39].
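The composition-vector idea behind alignment-free tools such as CVTree can be illustrated with a small sketch. The code below is a simplified illustration and not the CVTree 3.0 implementation: the toy sequences, the function names, and the choice of k = 3 are assumptions made here for readability, and the background-subtraction and distance formulas follow the commonly described form of the method rather than anything stated in this article.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(proteins, k):
    """Relative frequencies of all length-k substrings across a set of proteins."""
    counts = Counter()
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def composition_vector(proteins, k=3):
    """Background-corrected k-mer vector with components a = (p - p0) / p0.

    p0 is the frequency predicted from shorter strings:
    p0(a1..ak) = p(a1..ak-1) * p(a2..ak) / p(a2..ak-1).
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    pk = kmer_frequencies(proteins, k)
    pk1 = kmer_frequencies(proteins, k - 1)
    pk2 = kmer_frequencies(proteins, k - 2)
    vector = {}
    # Enumerate every k-mer whose two overlapping (k-1)-mers were observed,
    # so that k-mers predicted but never seen (p = 0) are included.
    # The nested loop is quadratic and only suitable for toy inputs.
    for left in pk1:
        for right in pk1:
            if left[1:] != right[:-1]:
                continue
            kmer = left + right[-1]
            p0 = pk1[left] * pk1[right] / pk2[left[1:]]
            vector[kmer] = (pk.get(kmer, 0.0) - p0) / p0
    return vector


def cv_distance(vec_a, vec_b):
    """D = (1 - C) / 2, where C is the cosine of the angle between the vectors."""
    dot = sum(value * vec_b.get(kmer, 0.0) for kmer, value in vec_a.items())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("empty or all-zero composition vector")
    return (1 - dot / (norm_a * norm_b)) / 2


# Hypothetical toy "proteomes"; real analyses use complete proteomes.
proteome_a = ["MKKLAVLGAGSWGTALA", "MSTNPKPQRKTKRNT"]
proteome_b = ["MKKLAVIGAGSWGTALS", "MSTNPKPQRITKRNT"]
proteome_c = ["MADEEKLPPGWEKRMSR", "MGLSDGEWQLVLNVWGK"]

vec_a = composition_vector(proteome_a)
vec_b = composition_vector(proteome_b)
vec_c = composition_vector(proteome_c)

print("D(a, b) =", round(cv_distance(vec_a, vec_b), 3))
print("D(a, c) =", round(cv_distance(vec_a, vec_c), 3))
```

A pairwise distance matrix built this way could then be passed to a neighbor-joining routine; that step is omitted here.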

Results and Discussion
3.1. General Genomic Characteristics. A total of ~1,189 Mb of raw data and ~1,147 Mb of clean data were obtained after filtering the low-quality reads generated by the HiSeq platform. The PacBio platform yielded 48,392 polymerase reads (average size 12.9 kb) and 622 Mb of subreads after quality control. The complete genome was assembled by combining the higher-accuracy short reads from the HiSeq platform with the long subreads from the PacBio platform. The genome consists of a circular chromosome of 5.19 Mb with an average GC content of 38.2% (accession number: CP032527.2) and ten circular plasmids designated pNCT2-1 to pNCT2-10 (accession numbers: CP032528.1-CP032537.1). Sequence information was visualized with the CGView Server (Figures 1 and S2). The total genome size is 5.88 Mb with an average GC content of 37.87%. The whole genome contains 6,039 genes, including 5,606 coding sequences, 203 RNA genes, and 230 pseudogenes. There are 127 tandem repeat sequences identified by Tandem Repeat Finder (TRF), 83 minisatellite DNAs, and 7 microsatellite DNAs.
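The headline figures in this section can be cross-checked with simple arithmetic. The sketch below is only an illustration: it uses the rounded values quoted in the text, so the results are approximate, and the variable names and the idea of estimating sequencing depth as total bases divided by genome size are assumptions made here rather than calculations reported by the authors.

```python
# Figures quoted in this section (sizes in Mb, GC content in percent).
CHROMOSOME_MB = 5.19
GENOME_MB = 5.88
CHROMOSOME_GC = 38.2
GENOME_GC = 37.87
HISEQ_CLEAN_MB = 1147
PACBIO_SUBREADS_MB = 622
GENE_COUNTS = {"coding sequences": 5606, "RNA genes": 203, "pseudogenes": 230}


def depth(total_bases_mb, genome_mb):
    """Approximate fold coverage: sequenced bases divided by genome size."""
    return total_bases_mb / genome_mb


# Combined size of the ten plasmids and their share of the genome.
plasmid_mb = GENOME_MB - CHROMOSOME_MB
plasmid_fraction = plasmid_mb / GENOME_MB

# The genome-wide GC content is a size-weighted mean of its replicons:
# GENOME_GC * GENOME_MB = CHROMOSOME_GC * CHROMOSOME_MB + plasmid_gc * plasmid_mb
implied_plasmid_gc = (GENOME_GC * GENOME_MB - CHROMOSOME_GC * CHROMOSOME_MB) / plasmid_mb

total_genes = sum(GENE_COUNTS.values())

print(f"HiSeq depth  ~{depth(HISEQ_CLEAN_MB, GENOME_MB):.0f}x")
print(f"PacBio depth ~{depth(PACBIO_SUBREADS_MB, GENOME_MB):.0f}x")
print(f"Plasmids: {plasmid_mb:.2f} Mb ({plasmid_fraction:.1%} of the genome)")
print(f"Implied mean plasmid GC: {implied_plasmid_gc:.1f}%")
print(f"Total genes: {total_genes}")
```

Under these assumptions, the plasmid share and the implied mean plasmid GC content agree with the values reported for the plasmids in this section (11.7% of the genome; 33.7-37.0% GC), and the three gene categories sum to the stated 6,039 genes.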
The general features of B. megaterium NCT-2 were compared with those of five Bacillus genomes (Bacillus megaterium DSM 319, Bacillus megaterium QM B1551, Bacillus subtilis subsp. subtilis str. 168, Bacillus cereus Q1, and Bacillus licheniformis DSM 13) (Table 1). The GC contents of the three B. megaterium genomes are around 38%. Strain NCT-2 has the largest genome and the most coding sequences and RNA genes, including 53 rRNAs and 142 tRNAs. There were 14 rRNA operons on the negative strand and one rRNA operon on the positive strand with a 16S-23S-5S organization. In addition, the positive strand had one unusual rRNA operon with a 16S-23S-5S-5S organization and a single 5S rRNA. Microbial genome size is positively correlated with environmental adaptability [40]. One typical characteristic of soil microorganisms is a high number of rRNAs, which supports fast growth, successful sporulation, germination, and rapid response to changes in nutrient availability [41-44]. These features suggest that strain NCT-2 has a strong ability to adapt to various environments.
Most strains of Bacillus megaterium carry multiple plasmids; for example, strain QM B1551 has seven resident plasmids [18], Bacillus megaterium strain 216 has ten plasmids [45], and Bacillus megaterium NBRC 15308 has six plasmids. The ten plasmids in strain NCT-2 range in size from 9,625 bp to over 132 kb and make up 11.7% of the whole genome (Table S1). The plasmids have lower GC contents than the chromosome (33.7-37.0% versus 38.2%). They carry 761 coding sequences and 23 RNA genes. Plasmids pNCT2-2 and pNCT2-6 each had one tRNA. In addition, pNCT2-7 had 18 tRNAs, one 5S rRNA, one large subunit ribosomal RNA (LSU rRNA), and one small subunit ribosomal RNA (SSU rRNA). Additional rRNA operons carried on plasmids have been reported to slow the growth of E. coli on poor carbon sources [46].
Further investigations are needed to clarify the role of plasmids in bacterial growth and adaptations to high-nitrate environments in bioremediation of the secondary salinization soils.
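The rRNA and plasmid RNA counts described above can be tallied as a quick consistency check. This is a sketch under one stated assumption made here, not by the authors: that the total of 53 rRNAs refers to the whole genome, so that the three rRNA genes on pNCT2-7 are counted together with the chromosomal operons. The data structures and names are illustrative.

```python
# Chromosomal rRNA loci as described in the text:
# (description, number of loci, rRNA genes per locus)
CHROMOSOMAL_RRNA_LOCI = [
    ("16S-23S-5S operons, negative strand", 14, 3),
    ("16S-23S-5S operon, positive strand", 1, 3),
    ("16S-23S-5S-5S operon, positive strand", 1, 4),
    ("single 5S rRNA, positive strand", 1, 1),
]

# RNA genes reported on individual plasmids.
PLASMID_RNA = {
    "pNCT2-2": {"tRNA": 1},
    "pNCT2-6": {"tRNA": 1},
    "pNCT2-7": {"tRNA": 18, "rRNA": 3},  # one 5S, one LSU, one SSU
}

chromosomal_rrna = sum(loci * genes for _, loci, genes in CHROMOSOMAL_RRNA_LOCI)
plasmid_rrna = sum(rna.get("rRNA", 0) for rna in PLASMID_RNA.values())
plasmid_trna = sum(rna.get("tRNA", 0) for rna in PLASMID_RNA.values())

total_rrna = chromosomal_rrna + plasmid_rrna
plasmid_rna_genes = plasmid_trna + plasmid_rrna

print("Chromosomal rRNA genes:", chromosomal_rrna)
print("Plasmid rRNA genes:", plasmid_rrna)
print("Total rRNA genes:", total_rrna)
print("Plasmid RNA genes (tRNA + rRNA):", plasmid_rna_genes)
```

With this reading, the operon descriptions account for all 53 rRNAs and for the 23 RNA genes reported on the plasmids.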
3.2. Phylogenetic Lineage Analysis. We used CVTree 3.0 to construct a phylogenetic tree based on complete proteomes, with Macrococcus caseolyticus JCSC5402 as the outgroup. The resulting tree (Figure 2(a)) indicated that B. megaterium NCT-2 was most closely related to B. megaterium DSM 319, followed by B. megaterium QM B1551. Similarly, genome comparison using the RAST Prokaryotic Genome Annotation Server showed that the genomic sequence of NCT-2 had higher comparison scores with B. megaterium QM B1551 and B. megaterium DSM 319 (Figure S3). Furthermore, 16S rDNA sequences from 15 Bacillus strains were used to construct a phylogenetic tree in MEGA7 with the neighbor-joining method. The neighbor-joining tree shows that strain NCT-2 is closest to B. megaterium QM B1551, B. megaterium DSM 319, and B. megaterium WSH 002 (Figure 2(b)). Whole-genome alignment of B. megaterium NCT-2 with the closely related QM B1551 and DSM 319 using MAUVE revealed that the chromosomes of the three strains are collinear overall (Figure 2(c)).

Functional Annotations of B. megaterium NCT-2.
To investigate the functions of the 5,606 coding sequences, the GO database, the KEGG database, the COG database, and the RAST web server were used. The 3,159 genes annotated by GO were classified into biological processes, cellular components, and molecular functions (Figure S4). The top five categories were catalytic activity (1,822), metabolic process (1,786), cellular process (1,567), single-organism process (1,400), and binding (1,214).
Like most strains of B. megaterium, which carry more than four plasmids, strain NCT-2 harbors ten indigenous plasmids. Only 75 of the plasmid genes (10%) were assigned to 37 subsystems by RAST (Figure S5b), including genes for riboflavin metabolism, butanol biosynthesis, and xylose utilization, and some of the genes for benzoate degradation and the metabolism of central aromatic intermediates. There are also genes for cobalt-zinc-cadmium resistance, oxidative stress, and nitrosative stress.

Microbial Functional Similarities. The translated protein sequences of B. megaterium NCT-2 were downloaded from RAST and submitted to the FusionDB web server (https://services.bromberglab.org/fusiondb/mapping) [38]. The submitted proteome (containing 5,364 proteins) matched 3,662 FusionDB functions, while 228 proteins could not be mapped to any function in the database. The functional similarities of B. megaterium NCT-2 to 1,374 taxonomically distinct bacteria (with similarity > 40%) are shown in Table S2; most of these are soil bacteria. Strain NCT-2 is most functionally similar to B. megaterium DSM 319 (90%) and B. megaterium QM B1551 (89%). The functional relationships among nine Bacillus strains are shown in the fusion+ networks (Figure 4(a)). All nine strains share 1,290 functions. The common functional annotations related to nitrogen metabolism were nitrite transporter NirC, nitrogen-fixing NifU domain protein, nitroreductase, nitrate transport protein, and 2-nitropropane dioxygenase. Notably, 3,047 functions are shared among the three B. megaterium strains (NCT-2, QM B1551, and DSM 319) (Figure 4(b)). Strain NCT-2 has most of the core genes and pathways, including vitamin biosynthesis and nitrogen metabolism. The nitrogen metabolism-related genes, such as those encoding nitrate transport protein, nitrate/nitrite sensor protein, nitric oxide reductase activation protein, nitrite reductase [NAD(P)H] large subunit, nitrite reductase [NAD(P)H] small subunit, nitrite transporter, nitrite-sensitive transcriptional repressor, nitrogen regulatory protein P-II, nitrogen-fixing NifU domain protein, nitroreductase, and nitroreductase family protein, were located on the chromosomes of the three strains. Furthermore, only strain NCT-2 carries the gene encoding periplasmic nitrate reductase.
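The notions of shared and strain-specific functions used above amount to set operations on per-strain function lists. The following sketch shows the idea on a deliberately tiny, hypothetical subset: the function names are taken from the text, but the per-strain sets are illustrative and are not FusionDB output. The mapped-protein fraction additionally assumes that every protein other than the 228 unmapped ones was assigned to a function.

```python
# Hypothetical, heavily reduced function sets for the three B. megaterium strains.
SHARED_NITROGEN_FUNCTIONS = {
    "nitrite transporter NirC",
    "nitrogen-fixing NifU domain protein",
    "nitroreductase",
    "nitrate transport protein",
}

functions_by_strain = {
    "NCT-2": SHARED_NITROGEN_FUNCTIONS | {"periplasmic nitrate reductase"},
    "QM B1551": set(SHARED_NITROGEN_FUNCTIONS),
    "DSM 319": set(SHARED_NITROGEN_FUNCTIONS),
}

# Core functions: present in every strain.
core_functions = set.intersection(*functions_by_strain.values())

# Strain-specific functions: present in one strain and absent from all others.
strain_specific = {}
for strain, functions in functions_by_strain.items():
    others = set().union(
        *(f for s, f in functions_by_strain.items() if s != strain)
    )
    strain_specific[strain] = functions - others

# Share of the submitted proteome that could be mapped (figures from the text).
SUBMITTED_PROTEINS = 5364
UNMAPPED_PROTEINS = 228
mapped_fraction = (SUBMITTED_PROTEINS - UNMAPPED_PROTEINS) / SUBMITTED_PROTEINS

print("Core functions:", sorted(core_functions))
print("Specific to NCT-2:", sorted(strain_specific["NCT-2"]))
print(f"Mapped proteins: {mapped_fraction:.1%}")
```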

Genome Inventory for Nitrogen Metabolism.
In our field experiment, strain NCT-2 showed high nitrate-reducing capacity in secondary salinization soil (unpublished). The functional nitrate assimilation-related genes involved in converting nitrate to glutamine have been identified [20]. The genes encoding nitrate and nitrite reductase were cloned and overexpressed in Escherichia coli [47]. Here, whole-genome analysis also revealed genes encoding the sensors, transporters, and enzymes involved in nitrogen metabolism. The genes were scattered across the chromosome. Genes encoding the nitrite-sensitive transcriptional repressor (NsrR), which is directly sensitive to nitrosative stress, were found in both chromosomal and plasmid DNA (Table S3 and Figure S6). B. megaterium NCT-2 possessed [...] amino acids through L-glutamine and L-glutamate by glutamine synthetase type I (GSI), ferredoxin-dependent glutamate synthase (GOGATF), glutamate synthase [NADPH] large chain (GOGDP1), and glutamate synthase [NADPH] small chain (GOGDP2). An ammonium transporter (Amt) was also encoded in the genome. Ammonium is an important nitrogen source for plant growth. Environmental NH₄⁺/NH₃ is imported across membranes by Amt for cell growth in prokaryotes and plants [49]. Bacterial Amt proteins act as passive channels for the uncharged gas ammonia (NH₃) [50]. This suggests that B. megaterium NCT-2 might scavenge NH₄⁺/NH₃ from soil rather than supply it. As noted above, NsrR genes, which respond to nitrosative stress, are present in both chromosomal and plasmid DNA. NsrR plays a pivotal role in regulating NirK (nitrite reductase), which is expressed aerobically in response to increasing NO₂⁻ concentration and decreasing pH [51]. However, no functional NirK could be found.
Instead, two nitric oxide reductase activation proteins (NorD and NorQ) from denitrifying reductase gene clusters were found, but no nitric oxide reductase, which makes denitrification highly unlikely. Thus, the genome analysis suggests that B. megaterium NCT-2 converts nitrate from secondary salinization soil into biomass through glutamate rather than reducing it to nitrous oxide or dinitrogen, which are lost from the soil (Figure 5). Such assimilation would be an effective bioremediation approach for removing nitrate from soils.
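The reasoning in this section, that assimilation is genetically complete while denitrification is not, can be expressed as a simple presence/absence check. The sketch below is illustrative only: the detected features are those named in the text, whereas the step lists for each route are a simplified assumption made here (for example, only NirK is considered as the denitrifying nitrite reductase, and the nitrous oxide reductase step is omitted).

```python
# Features reported as present in the NCT-2 genome (names as used in the text).
DETECTED_FEATURES = {
    "nitrate reductase electron transfer subunit",
    "nitrate reductase catalytic subunit",
    "nitrite reductase [NAD(P)H] large subunit",
    "nitrite reductase [NAD(P)H] small subunit",
    "glutamine synthetase",
    "glutamate synthase",
    "periplasmic nitrate reductase",
    "nitric oxide reductase activation protein NorD",
    "nitric oxide reductase activation protein NorQ",
}

# Simplified, assumed step lists for two competing fates of nitrate.
PATHWAY_STEPS = {
    "assimilation (nitrate to glutamate)": [
        "nitrate reductase electron transfer subunit",
        "nitrate reductase catalytic subunit",
        "nitrite reductase [NAD(P)H] large subunit",
        "nitrite reductase [NAD(P)H] small subunit",
        "glutamine synthetase",
        "glutamate synthase",
    ],
    "denitrification (nitrate to nitrous oxide)": [
        "periplasmic nitrate reductase",
        "nitrite reductase NirK",
        "nitric oxide reductase",
    ],
}


def missing_steps(pathway):
    """Return the steps of a pathway for which no gene was detected."""
    return [step for step in PATHWAY_STEPS[pathway] if step not in DETECTED_FEATURES]


for name in PATHWAY_STEPS:
    gaps = missing_steps(name)
    status = "complete" if not gaps else "incomplete, missing: " + ", ".join(gaps)
    print(f"{name}: {status}")
```

Note that the activation proteins NorD and NorQ do not satisfy the nitric oxide reductase step, which mirrors the argument in the text.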

Genes Associated with Plant Growth-Promoting Features.
Our previous studies on plant growth promotion by B. megaterium NCT-2 revealed that it could produce organic acids (lactic acid, acetic acid, propionic acid, and gluconic acid) and phosphatase in culture medium, showing significant phosphate-solubilizing ability [5]. Inoculation with B. megaterium NCT-2 significantly increased the root fresh weight of maize [6]. The genome of NCT-2 contains genes encoding glucose 1-dehydrogenase (EC 1.1.1.47) and alkaline phosphatase (EC 3.1.3.1). Glucose dehydrogenase can oxidize glucose to gluconic acid, which is the organic acid most frequently produced by phosphate-solubilizing bacteria [52]. Additionally, the phosphate starvation system for phosphate uptake, encoded by pstS, pstC, pstA, and pstB, was also found in the genome. The phosphate solubilization capacity of strain NCT-2 plays a positive role in promoting plant growth by converting unavailable P (PO₄³⁻) in soil into plant-available forms.
Many plant growth-promoting bacteria can synthesize the plant auxin indole-3-acetic acid (IAA) [53,54], which is a key regulator of plant growth and development, including cell division and elongation, lateral root production, and flowering [55]. Large-scale genomic analysis of IAA synthesis pathways suggested that many bacteria could synthesize IAA via multiple incomplete pathways and that Firmicutes genomes had the simplest Trp-dependent IAA synthetic system [56]. According to the KEGG analysis, strain NCT-2 could assimilate tryptophan (Trp) (Figure S7) but had incomplete Trp-dependent IAA synthesis pathways, such as the indole-3-acetamide (IAM) pathway and the indole-3-pyruvate (IPA) pathway (Figure S8). It had aldehyde dehydrogenase (NAD+) (EC 1.2.1.3) and amidase (EC 3.5.1.4), which catalyze the final steps of IAA synthesis. However, we could not find the enzymes that convert Trp into IAM and IPA. These results suggest that strain NCT-2 might synthesize IAA from intermediates.
Both the phosphate solubilization and IAA synthesis play important roles in plant growth promotion of strain NCT-2 during biocontrol and bioremediation of the secondary salinization soils.
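The conclusion that IAA might be made from intermediates rather than from Trp can be illustrated as a reachability question on a small reaction graph. The sketch below is an assumption-laden illustration: the enzyme names for the Trp-consuming steps and for the IPA decarboxylation step are the standard textbook names and do not come from this article, the middle step of the IPA pathway is treated as unconfirmed because the text does not report it, and the abbreviation IAAld (indole-3-acetaldehyde) is introduced here.

```python
from collections import deque

# (substrate, product, enzyme, detected in NCT-2?)
# True/False follow the text; None marks a step the text does not report.
REACTIONS = [
    ("Trp", "IAM", "tryptophan 2-monooxygenase", False),
    ("IAM", "IAA", "amidase (EC 3.5.1.4)", True),
    ("Trp", "IPA", "aminotransferase", False),
    ("IPA", "IAAld", "indolepyruvate decarboxylase", None),
    ("IAAld", "IAA", "aldehyde dehydrogenase (EC 1.2.1.3)", True),
]


def precursors_of(target, reactions):
    """Compounds that can be converted to `target` using only detected enzymes."""
    reverse_edges = {}
    for substrate, product, _enzyme, detected in reactions:
        if detected:  # unconfirmed (None) and absent (False) steps are skipped
            reverse_edges.setdefault(product, []).append(substrate)

    reachable = set()
    queue = deque([target])
    while queue:
        compound = queue.popleft()
        for substrate in reverse_edges.get(compound, []):
            if substrate not in reachable:
                reachable.add(substrate)
                queue.append(substrate)
    return reachable


iaa_precursors = precursors_of("IAA", REACTIONS)
print("Can be converted to IAA:", sorted(iaa_precursors))
print("Trp is a usable starting point:", "Trp" in iaa_precursors)
```

Under these assumptions only the intermediates IAM and IAAld connect to IAA, which is consistent with the interpretation that the strain would depend on externally supplied intermediates.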

Genes Involved in Stress Response. B. megaterium NCT-2 showed high salinity adaptation in secondary salinization soil in our previous study [4]. The genome contains genes for cation transporters (magnesium transport and copper transport systems) and for stress responses such as osmotic stress, oxidative stress, and detoxification. Glycine betaine, a very efficient osmoprotectant, can be synthesized or acquired from exogenous sources [57]. The NCT-2 genome carries glycine betaine ABC transport systems (opuA, opuC, and opuD) for choline uptake and genes for the glycine betaine biosynthetic enzymes (choline dehydrogenase, gbsB, and betaine-aldehyde dehydrogenase, gbsA). Moreover, the genome contains genes encoding superoxide dismutase (EC 1.15.1.1), catalase (EC 1.11.1.6), and ferroxidase (EC 1.16.3.1), which protect bacteria from oxidative stress. These genes suggest that NCT-2 has a strong ability to adapt to its environment.

Conclusion
A hybrid approach with multiple assemblers was used to assemble the complete genome of B. megaterium NCT-2. Deeper investigation identified genes associated with the bioremediation of secondary salinization soil and with plant growth promotion, such as those for nitrogen metabolism, phosphate uptake, synthesis of organic acids and phosphatase for phosphate-solubilizing ability, and a Trp-dependent IAA synthetic system. Furthermore, the genes involved in cation transport, osmotic stress, and oxidative stress suggest that NCT-2 has a strong ability to adapt to its environment. In summary, these results provide valuable genomic resources for further studies and applications of B. megaterium NCT-2 in the bioremediation of secondary salinization soil.

Data Availability
All data generated or analyzed during this study are included in this published article and its supplementary information files. The genome sequence of B. megaterium NCT-2 has been deposited in GenBank. The accession number for the B. megaterium NCT-2 chromosome is CP032527.2, and those for the ten plasmids are CP032528.1-CP032537.1.
International Journal of Genomics

Disclosure
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Conflicts of Interest
The authors declare no conflict of interest.