Complete Genome Sequence and Comparative Genome Analysis of Variovorax sp. Strains PAMC28711, PAMC26660, and PAMC28562 and Trehalose Metabolic Pathways in Antarctica Isolates

The complete genomes of the Antarctic isolates Variovorax sp. strains PAMC28711, PAMC28562, and PAMC26660 were analyzed and compared with the other complete genomes of Variovorax strains. Genomic information was collected from the NCBI database and the CAZyme database, and Prokka annotation was used to find the genes that encode the trehalose metabolic pathway. Likewise, CAZyme annotation (dbCAN2 meta server) was performed to predict the CAZyme families responsible for trehalose biosynthesis and degradation. Trehalose has been implicated in the response to osmotic stress and extreme temperatures. The trehalose metabolic pathway was therefore studied in isolates from the Antarctic, a harsh environment in which Variovorax sp. strains PAMC28711, PAMC28562, and PAMC26660 survive extreme conditions such as cold temperatures. The pathway was analyzed with bioinformatics tools, including the dbCAN2 meta server, Prokka annotation, multiple sequence alignment, an ANI calculator, and the PATRIC database, which helped to predict the trehalose biosynthesis and degradation genes in the complete genomes of Variovorax strains. Likewise, MEGA X was used for the evolutionary analysis of conserved genes. The complete genomes of Variovorax strains PAMC28711, PAMC26660, and PAMC28562 are circular chromosomes of 4,320,000, 7,390,000, and 4,690,000 bp, with GC contents of 66.00, 66.00, and 63.70%, respectively. The GC content of these three Variovorax strains is lower than that of the other Variovorax strains with complete genomes. Strains PAMC28711 and PAMC28562 exhibit three complete trehalose biosynthetic pathways (OtsA/OtsB, TS, and TreY/TreZ), but strain PAMC26660 possesses only one (OtsA/OtsB). Although all three strains contain trehalose, only strain PAMC28711 has two trehalases according to CAZyme families (GH37 and GH15). Moreover, among the three Antarctic isolates, only strain PAMC28711 exhibits auxiliary activities (AAs), a CAZyme family. Although Variovorax strains have been studied for different purposes, the trehalose metabolic pathways in Variovorax strains have not been reported to date. This study provides additional information on trehalose biosynthesis genes and degradation genes (trehalases) as one of the factors facilitating bacterial survival in extreme environments, and these enzymes have shown potential for application in biotechnology.


Introduction
Variovorax is a Gram-negative and motile bacterium belonging to the family Comamonadaceae [1] that is found in a straight to slightly curved or rod-shaped form. Variovorax colonies are yellow due to the presence of carotenoid pigments, and their colonies are slimy and shiny on nutrient agar. Many strains belonging to the family Comamonadaceae thrive in polluted environments and degrade complex organic compounds [2], whereas Variovorax generally inhabits soil and water [3]. Variovorax sp. PAMC28711 [4], Variovorax sp. PAMC26660, and Variovorax sp. PAMC28562 were isolated from Antarctica, and they are complete metagenomic assembled genomes.
According to the Pearce group, due to the size of Antarctica, there are many other specific extremes, such as areas with volcanic activity, hypersaline lakes, subglacial lakes, and even the inside of the ice itself, to which specialized extremophiles may be adapted [5]. Therefore, numerous microorganisms have specifically adapted to survive in a wide range of extreme environments, giving rise to novel biodiversity, much of which has yet to be elucidated [3]. Another key feature of the Antarctic ecosystem is the extreme variation in physical conditions, ranging from freshwater lakes (some of the most oligotrophic environments on Earth) to hypersaline lakes [6]. Microorganisms found under extreme environmental conditions like those of Antarctica are ideal candidates for the study of eco-physiological and biochemical adaptations of such life forms [5]. Antarctica is one of the most physically and chemically challenging terrestrial environments for habitation [7]. Habitats with permanently low temperatures dominate the temperate biosphere and have been successfully colonized by a wide variety of organisms that are collectively termed psychrophiles or cold-adapted organisms [8]. Lichens are characterized by a mutualistic symbiosis between fungi and photosynthetic algae or cyanobacteria, but they also have other associated bacterial communities [9]. Bacteria associated with lichens were initially reported in the first half of the 20th century [10]. Lichen-associated microorganisms were reported to carry genes involved in the degradation of polymers [11].
CAZymes belong to a large class of enzymes that are involved in the synthesis and degradation of complex carbohydrates. Based on their amino acid sequences, they are classified into families with conserved catalytic mechanism, structure, and active site residues but differ in substrate specificity [12]. They are responsible for carbohydrate synthesis through glycosyltransferases (GTs), degradation of complex carbohydrates via glycoside hydrolases (GHs), polysaccharide lyases (PLs), carbohydrate esterases (CEs), and enzymes for auxiliary activities (AAs) and recognition (carbohydrate-binding module, CBM) [13]. The GHs are the largest family of CAZymes that hydrolyze the glycosidic bond between two or more carbohydrates, or between carbohydrate and noncarbohydrate moieties, via the overall inversion of anomeric carbon [14].
Although trehalose metabolism has been studied in various microorganisms, it has yet to be elucidated in the genomes of Variovorax. Species belonging to the phylum Proteobacteria are able to degrade complex carbohydrates, ranking after Bacteroidetes and Firmicutes [15]. Even so, the ability of the genus Variovorax to utilize disaccharides (such as trehalose) has not previously been highlighted. Therefore, this study compares the trehalose metabolic pathway in the cold-adapted Variovorax strains PAMC28711, PAMC26660, and PAMC28562 acquired from the Antarctic region with that in other complete genomes of Variovorax strains deposited in the NCBI until October 2021. In addition, the study also covers the genes that encode the different CAZy families involved in the trehalose metabolic pathway in the complete genomes of Variovorax along with our three strains isolated from the Antarctic region. Bioinformatics tools like dbCAN, RAST, the PATRIC database, the KEGG pathway database, and the Prokka annotation standalone program can assist in predicting the trehalose synthesis and degradation genes of microorganisms as a preliminary screening approach without any experimental work.

[4]. The bacterial sample for DNA analysis was isolated in pure culture on R2A agar at 15°C. Using a QIAamp DNA Mini Kit (Qiagen, Valencia, CA), genomic DNA was extracted from Variovorax sp. PAMC28562 and PAMC26660, and the quantity and purity were evaluated with the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). Agarose gel electrophoresis was used to assess the quality of the isolated DNA. DNA was kept at −20°C until required. PacBio Sequel single-molecule real-time (SMRT) sequencing technology (Pacific Biosciences, Menlo Park, CA) was used to sequence the genome. SMRT cells were used to sequence SMRTbell library inserts (20 kb). Raw sequence data of 7,388,698 and 4,693,528 bp were obtained for strains PAMC26660 and PAMC28562, respectively. These were assembled de novo using the hierarchical genome-assembly process (HGAP v.4) protocol [16] and HGAP4 assembly using Pacific Biosciences' SMRT analysis software (version 2.3) (https://github.com/PacificBiosciences/SMRT-Analysis). The complete genome sequences for PAMC26660 and PAMC28562 were deposited in the GenBank database under the accession numbers NZ_CP060295 and NZ_CP060296, respectively.

Genome Annotation of Variovorax sp.
The genomes of strains PAMC28711, PAMC28562, and PAMC26660 were annotated using the Rapid Annotation using Subsystem Technology (RAST) server [17] and Prokka annotation [18]. For comparative studies, data on the Variovorax complete genomes were obtained from the National Center for Biotechnology Information (NCBI) database and the PATRIC database [19]. The enzymes involved in the trehalose metabolism pathways were determined using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and a 0.01 cutoff value [20]. CAZyme gene analysis was performed using the dbCAN program [21] and a hidden Markov model (HMM) profile retrieved from the dbCAN2 HMMdb database (version 7.0). Simultaneously, we obtained information regarding the existence of CAZyme genes from SignalP (version 4.0) [22]. The coverage criterion was >0.35, and the e-value cutoff was 1e−15. To maximize prediction accuracy, we applied DIAMOND [23] (e-value 1e−102) and Hotpep [24] (frequency >2.6, hits >6).
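The HMM cutoffs quoted above (coverage >0.35, e-value below 1e−15) amount to a simple per-hit filter. The following is a minimal sketch, assuming the HMM hits have already been parsed into records; the `HmmHit` class, its field names, and the example records are hypothetical illustrations and do not reproduce dbCAN's own output format or any result from this study.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text for the HMM-based CAZyme search.
MIN_COVERAGE = 0.35
MAX_EVALUE = 1e-15


@dataclass(frozen=True)
class HmmHit:
    """Hypothetical container for one parsed HMM hit (not a dbCAN type)."""
    gene_id: str
    family: str      # CAZyme family label, e.g. "GH37"
    evalue: float
    coverage: float  # fraction of the HMM profile covered by the alignment


def passes_cutoffs(hit: HmmHit) -> bool:
    """Keep a hit only if coverage > 0.35 and e-value < 1e-15."""
    return hit.coverage > MIN_COVERAGE and hit.evalue < MAX_EVALUE


def filter_hits(hits):
    """Return the hits that satisfy both cutoffs, in their original order."""
    return [hit for hit in hits if passes_cutoffs(hit)]


# Made-up records, for illustration only.
example_hits = [
    HmmHit("gene_0001", "GH37", 1e-80, 0.92),  # passes both cutoffs
    HmmHit("gene_0002", "GH15", 1e-10, 0.80),  # e-value too high
    HmmHit("gene_0003", "GT20", 1e-40, 0.20),  # coverage too low
]
kept = filter_hits(example_hits)
```

Keeping the two cutoffs as named constants makes it easy to see how sensitive the predicted CAZyme set is to either threshold.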

Complete Nucleotide Sequence and Strain Accession Numbers
The complete nucleotide sequences of Variovorax sp. strains PAMC28562 and PAMC26660 were deposited in the GenBank database under the accession numbers CP060296 and CP060295, respectively.

Phylogenomic Classification and Average Nucleotide Identity (ANI) of Variovorax sp. PAMC28711, PAMC26660, and PAMC28562
The genomes of Variovorax sp. strains PAMC28711, PAMC28562, and PAMC26660 were uploaded to the Type (Strain) Genome Server (TYGS) [25] for whole-genome-based taxonomic analysis [26]. The genomes of the closest type strains were determined in two ways. First, the genomes of the PAMC28711, PAMC26660, and PAMC28562 strains were compared to all the type strain genomes available in the TYGS database using the MASH algorithm, a fast approximation of intergenomic relatedness [27], and the type strains with the smallest MASH distances were chosen for each of the PAMC28711, PAMC26660, and PAMC28562 genomes. Second, the 16S rDNA gene sequences were used to identify an additional group of closely related type strains. RNAmmer [28] was used to extract these sequences from the genomes of the PAMC28711, PAMC26660, and PAMC28562 strains, and each sequence was then BLASTed [29] against the 16S rDNA gene sequences of each of the 11,252 type strains now accessible in the TYGS database. The pairwise comparison of the user strains with the type strains was performed using GBDP, and accurate intergenomic distances were inferred under the "trimming" algorithm and distance formula d5. Digital DDH values and confidence intervals were calculated following the recommended settings of GGDC 2.1 [26]. The intergenomic distances were used to create a balanced minimum evolution tree using FASTME 2.1.4 with 100 pseudo-bootstrap replicates for branch support [30]. ANI analysis was performed using three different tools: the Orthologous Average Nucleotide Identity Software Tool (OAT) [31], JSpeciesWS [32], and FastANI [33].
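As an illustration of how pairwise ANI results can be screened against the >95% species threshold referred to in this study, the sketch below parses FastANI's tab-separated output (query, reference, ANI, mapped fragments, total query fragments). The file names and ANI values in the example are invented for demonstration and are not results from this work.

```python
SPECIES_ANI_THRESHOLD = 95.0  # percent; pairs above this are treated as one species


def parse_fastani(text):
    """Parse FastANI output lines into (query, reference, ani, mapped, total) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        query, reference, ani, mapped, total = line.split("\t")
        rows.append((query, reference, float(ani), int(mapped), int(total)))
    return rows


def same_species(ani):
    """Apply the >95% ANI cutoff for species delineation."""
    return ani > SPECIES_ANI_THRESHOLD


# Invented example lines (hypothetical file names and values).
example_output = "\n".join([
    "strainA.fna\tstrainB.fna\t88.12\t950\t1400",
    "strainA.fna\tstrainC.fna\t97.40\t1300\t1400",
])

for query, reference, ani, mapped, total in parse_fastani(example_output):
    verdict = "same species" if same_species(ani) else "different species"
    print(f"{query} vs {reference}: ANI {ani:.2f}% ({mapped}/{total} fragments), {verdict}")
```

Because OAT, JSpeciesWS, and FastANI use different alignment strategies, the same cutoff can be applied to each tool's values separately to check that the three methods agree.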

Comparative Genomics Analysis.
All complete genomes of Variovorax strains deposited in the NCBI database (https://www.ncbi.nlm.nih.gov/) until October 2021 were analyzed. First, we determined the relationship of PAMC28711, PAMC26660, and PAMC28562 with other strains from the same genus using complete genome sequences and checked their similarity by phylogenomic analysis. We then compared the CAZymes of the registered species using bioinformatics tools, such as CAZyme annotation (dbCAN2 meta server; https://bcb.unl.edu/dbCAN2/), as well as CAZy (https://www.cazy.org/). The Prokka annotation standalone program (https://vicbioinformatics.com) and the NCBI database were also used to find the genes that encode trehalose biosynthesis and degradation. The dbCAN2 meta server annotates genomes using DIAMOND, HMMER, and Hotpep via the CAZy, dbCAN, and PPR databases [21]. The dbCAN2 meta server allows the submission of nucleotide sequences for prokaryotic and eukaryotic genomes, although protein sequences are preferred. This server uses three tools: DIAMOND (for fast BLAST hits in the CAZy database), HMMER (for annotated CAZyme domain boundaries according to the dbCAN CAZyme domain HMM database), and Hotpep (for conserved short motifs in the PPR library). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database and the Prokka annotation standalone program were used to analyze the trehalose metabolic pathways of the strains [18,20,34]. Likewise, the PATRIC database (https://patricbrc.org/) [19] was also used for genomic information.

Various Polysaccharides Screening of Strain PAMC28711 by AZCL Activity
We confirmed CAZyme activity through azurine cross-linked (AZCL) analysis, which is based on the visible solubilization of small particles of the AZCL polysaccharide substrate. Seven AZCL substrates (AZCL-amylose, AZCL-barley β-glucan, AZCL-arabinoxylan, AZCL-HE-cellulose, AZCL-xylan (beech wood), AZCL-xylan (birch wood), and AZCL-xyloglucan) were used to determine the polysaccharide-degrading enzyme activity of strain PAMC28711. In this assay, the formation of blue haloes around the well in agar media indicates polysaccharide-degrading activity [35]. PAMC28711 was incubated in four different media, Bennett's media (B's), Marine agar (MA), Malt Yeast (MY) media, and Reasoner's 2A agar (R2A), to detect active CAZyme-producing strains specifically and rapidly. The active culture plate consisted of 2% agarose, 25 mM sodium phosphate buffer (pH 5.5), and xanthan gum, solidified in the plate. A total of 20 μL of the original strain was dispensed on AZCL plates. The plates were incubated at different temperatures (4°C, 15°C, 25°C, and 37°C) for 7 to 10 days, and a blue halo was recorded to confirm activity. The AZCL assay was performed using a commercial kit from Megazyme© (Bray, Ireland; https://www.megazyme.com/), and activity was expressed as the area (cm²) of the blue halo around the sample well [36,37].

Phylogenomic Classification and ANI Analysis of Variovorax Strains.
The relationship between strains PAMC28711, PAMC26660, and PAMC28562 and their associated type strains was shown via a phylogenetic tree derived from the intergenomic distance measured using GBDP on the TYGS database (Figure 1). Based on the 16S rDNA comparison, strains PAMC28711 and PAMC28562 were found to be in the same node, while strain PAMC26660 was found in a different node. These three strains were found to be closest to the type strains V. boronicumulans NBRC 103145ᵀ, V. beijingensis 502ᵀ, and V. paradoxus NBRC 15149ᵀ (Figure 1(a)), sharing the same clade. Likewise, the whole-genome-based phylogeny revealed a cluster of the same species as the closest relatives of PAMC28711, PAMC26660, and PAMC28562 (Figure 1(b)).

AZCL Screening of the Polysaccharide Degradation
According to the findings, degradation of starch substrates such as AZCL-amylose was observed at 4, 15, and 25°C in MA and R2A media, but no activity was observed at 37°C (Table 5). The enzymes encoded by the GH13, GH15, and GH37 genes (Figure 3) in Variovorax sp. PAMC28711 can degrade starch and other carbohydrates such as trehalose.

Discussion
The study of the trehalose metabolic pathway in bacteria has attracted researchers' attention since trehalose has a wide range of industrial and therapeutic applications. It has also been observed that trehalose accumulation or production in bacteria confers resistance to desiccation, osmotic stress, and other stress factors. When we compared the genome size and GC content (%) of our three isolates, Variovorax sp. PAMC28711, PAMC26660, and PAMC28562, we found that strains PAMC28711 (4.32 Mb genome; GC = 63.70%) and PAMC28562 (4.69 Mb genome; GC = 66%) both have smaller genomes and lower GC content than strain PAMC26660 and than the other Variovorax genomes studied here. According to Almpanis et al., there may be a correlation between chromosomal length and genome GC content. The longer the genome, the higher the GC content, which may be true for our two strains, PAMC28711 and PAMC28562, but not for one of our strains, PAMC26660, or most of the other Variovorax strains studied here. Furthermore, as revealed by the findings of the linear regression model, this alone is not sufficient to explain the whole variation in genome G + C content. As a result, other factors must be explored in order to explain the G + C content [40]. The organism's normal optimum temperature range is probably the most noticeable of these [41,42]. Based on ANI values, our three strains, PAMC28711, PAMC26660, and PAMC28562, share less than 95% identity, which is below the species delineation threshold. The average nucleotide identity (ANI) is a genome similarity metric that may be applied to prokaryotic organisms regardless of their G + C composition, and a cutoff value of >95% indicates that they belong to the same species [39,43]. Because orthologous genes can differ widely between genomes, ANI values do not imply genome evolution. On the other hand, ANI closely replicates the classic microbiological idea of DNA-DNA hybridization relatedness for defining species, which is why many researchers prefer it: it accounts for the fluid nature of the bacterial gene pool and hence indirectly considers shared functions [33].

One of the earliest examples of trehalose-mediated protection is the discovery that elevating trehalose levels in Streptomyces griseus spores increases resistance to heat and desiccation stress [44], which likely adds to actinomycete spores' capacity to endure harsh environmental conditions. The germination of spores was similarly delayed by high levels of accumulated trehalose; however, the relevance of this is unclear. Trehalose has since been revealed to protect bacterial vegetative cells from a range of abiotic stresses. Variovorax sp. strains PAMC28711, PAMC26660, and PAMC28562 have trehalose biosynthesis and degradation genes that might help these organisms survive in harsh environments like Antarctica.
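The GC content values compared in this discussion are simple base-composition percentages. As a minimal sketch of the calculation (not the pipeline used in this study), the functions below compute GC% from a FASTA-formatted sequence; the function names and the toy sequence are hypothetical.

```python
def read_fasta_sequence(lines):
    """Concatenate the sequence lines of a single-record FASTA, skipping the header."""
    return "".join(
        line.strip() for line in lines if line.strip() and not line.startswith(">")
    )


def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / (gc + at)


# Toy record for illustration only; a real genome would be read from a file.
toy_fasta = [">toy_chromosome", "GGCCGGCC", "AATTGGCC"]
toy_gc = gc_percent(read_fasta_sequence(toy_fasta))
print(f"GC content: {toy_gc:.2f}%")
```

Ambiguous bases such as N are excluded from the denominator here; that is a design choice of this sketch, and other tools may count them differently.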

PGGRFXEXY[G/Y]WD[S/T]Y) and motif 2 (QWD[Y/F]P[N/Y][G/A]W[P/A]P), whereas GH65 and GH15 trehalases lack these motifs [49,51]. In the CDs of GH37 enzymes, in addition to the two well-known trehalase signature motifs 1 and 2, three CRs (motifs) are also present. The two catalytic residues stated above are found in motifs 4 and 5. Lip loop regions are also observed in motif 5, which may play an important role in substrate recognition [54]. A cytoplasmic trehalase was found using the CAZyme database, based on the results of Rapid Annotation using Subsystem Technology (RAST) annotation [17].
Among the three Antarctic isolates studied, only strain PAMC28711 has both the GH37 and GH15 trehalases, which are found in a small number of other Variovorax strains as well. The sequence from Mycolicibacterium smegmatis MC2155 (ABK72415.1) was used as a reference. The signature motif of GH15 trehalase differs from that of GH37 trehalase. Glucoamylases (GAs) and glucodextranases (GDases) are GH15 enzymes that have five CRs in their basic structures, which are assumed to represent the active sites. In hydrolytic reactions, two Glu residues in GH15 CRs 3 and 5 are important [55][56][57][58]. WE[F/D/E/V] and [S/G/A]E[E/H] are the analogous regions in GH15 trehalases, where a comparison with the GA sites, WEE and [S/P/N]EQ, revealed two Glu residues at identical positions that are significant for the catalytic process as catalytic residues. It has been reported that trehalases from the GH15 family have greater K_M values for trehalose than trehalases from other families [59].
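The GH37 signature motifs quoted above use a compact notation in which X stands for any residue and [A/B] lists the alternatives at one position. A minimal sketch of how such motifs can be turned into regular expressions and searched for in a protein sequence is given below; the helper names and the test sequence are hypothetical, and the synthetic sequence is not a real trehalase.

```python
import re

# GH37 trehalase signature motifs as written in the text (spaces removed).
GH37_MOTIFS = {
    "motif 1": "PGGRFXEXY[G/Y]WD[S/T]Y",
    "motif 2": "QWD[Y/F]P[N/Y][G/A]W[P/A]P",
}


def motif_to_regex(motif):
    """Translate the motif notation into a compiled regex.

    '[G/Y]' becomes the character class '[GY]' and 'X' becomes '.' (any residue).
    """
    pattern = motif.replace(" ", "").replace("/", "").replace("X", ".")
    return re.compile(pattern)


def find_motifs(protein, motifs=GH37_MOTIFS):
    """Return {motif name: [(0-based start, matched segment), ...]} for a protein."""
    protein = protein.upper()
    hits = {}
    for name, motif in motifs.items():
        regex = motif_to_regex(motif)
        hits[name] = [(m.start(), m.group()) for m in regex.finditer(protein)]
    return hits


# Synthetic sequence built only to contain both motifs (illustration only).
synthetic = "MKPGGRFAEAYGWDSYAAQWDYPNGWPPL"
for name, matches in find_motifs(synthetic).items():
    print(name, matches)
```

A sequence with empty match lists for both motifs would be consistent with the statement that GH65 and GH15 trehalases lack these signatures, although a regex scan of this kind is only a first-pass check and not a substitute for alignment-based analysis.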

Conclusions
In summary, the complete genomes of Variovorax strains PAMC28711, PAMC28562, and PAMC26660 were compared with the complete genomes of Variovorax that were deposited at the NCBI until October 2021. A comparative analysis of the obtained genomes showed that strain PAMC26660 has only one complete trehalose biosynthesis pathway (TPS/TPP), whereas strains PAMC28711 and PAMC28562 possess all three complete trehalose biosynthesis pathways (TPS/TPP, TS, and TreY/TreZ). In addition, it was found that only strain PAMC28711 has two trehalases (GH37 and GH15) among the three Antarctic isolates studied here. Based on the results of AZCL screening, strain PAMC28711 thrived at 25°C even though it was isolated from cold-adapted lichen. Based on 16S rRNA sequence analysis and ANI value similarity with other Variovorax species, the two isolates, PAMC28562 and PAMC26660, have been confirmed as Variovorax sp. There have been no previous studies of the trehalose metabolic pathway in Variovorax, including isolates from Antarctica. Strains PAMC28711, PAMC28562, and PAMC26660 are anticipated to be able to synthesize and degrade trehalose. Furthermore, a genomic comparison of Variovorax species along with the Antarctic isolates demonstrated that these cold-adapted organisms can withstand harsh environments. In conclusion, we expect that the genome sequence analysis may provide additional information regarding the role of genes encoding trehalose biosynthesis and degradation enzymes that are active at low temperatures and can be employed for biotechnological applications and fundamental research purposes.

Data Availability
On reasonable request, the corresponding author will provide the datasets used and analyzed during the current study.

Disclosure
A preprint has previously been published [62].

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Supplementary Materials
Supplementary