A Comparative Analysis of Biosynthetic Gene Clusters in Lean and Obese Humans

Obesity is intrinsically linked with the gut microbiome, and studies have identified several obesity-associated microbes. Microbe-microbe interactions can alter the composition of the microbial community and influence host health through the production of secondary metabolites (SMs). However, the contribution of these SMs to the prevention and treatment of obesity has been largely ignored. We identified several SM-encoding biosynthetic gene clusters (BGCs) from the metagenomic data of lean and obese individuals and found significant associations between obesity and some BGCs, including those that produce hitherto unknown SMs. In addition, the mean abundance of BGCs was positively correlated with obesity, consistent with the lower taxonomic diversity in the gut microbiota of obese individuals. By comparing the BGCs of known SMs between obese and nonobese samples, we found that menaquinone produced by Enterobacter cloacae showed the highest correlation with BMI, in agreement with a recent study on human adipose tissue composition. Furthermore, an obesity-related nonribosomal peptide synthetase (NRPS) was negatively associated with Bacteroidetes, indicating that the SMs produced by intestinal microbes in obese individuals can change the microbiome structure. This is the first systematic study of the association between gut microbiome BGCs and obesity and provides new insights into the causes of obesity.


Introduction
Recent studies show that gut microbes play an important role in the pathogenesis of obesity [1,2]. Diet-induced alteration of the gut microbiota alleviated obesity in children [3,4], and several intestinal microbes (e.g., Actinobacteria) have been significantly associated with obesity [5]. Occasionally, an obesity-associated microbe detected in one study cannot be validated in other studies. For example, some studies report an increased Firmicutes to Bacteroidetes ratio in obese patients [6,7], whereas others found no association between the above phyla and obesity [8,9].
Microbial interactions can alter the composition of the community through the production of secondary metabolites (SMs). SMs are organic compounds produced by bacteria and fungi that mediate microbial competition and interaction and thereby influence the composition of the gut microbiota. The biosynthesis of SMs is controlled by enzymes encoded by biosynthetic gene clusters (BGCs).
Genomic mining of gut microbiota BGCs has helped identify numerous bioactive SMs with antimicrobial potential [10,11]. Most microbial BGCs that have been identified so far contain genes encoding core biosynthetic enzymes such as polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS). More than 3000 small molecule BGCs were identified in the NIH Human Microbiome Project [12], of which lactocillin showed similar structure to some clinically tested antibiotics, and the in vivo expression was validated by the metatranscriptome sequencing analysis. These small molecules not only inhibit the growth of competing bacteria but also alter the composition of the gut microbiome. In addition, microbial SMs have also been implicated in human physiology, although their precise role in obesity is unclear.
In our previous study, we used a systematic approach to detect putative BGCs enriched in Parkinson's disease from raw metagenomic data, of which many originated from microbes that were not abundant in the corresponding patients [13]. In this study, we analyzed the differences in the BGCs of nonobese and obese individuals using human fecal metagenomic data, in order to identify obesity-associated BGCs. Our findings illustrate the widespread distribution of SM-encoding BGCs in the human microbiome and provide new insights into the causes of obesity.

Data Collection and Construction of Human-Related BGC Protein Database.
A total of 10,042 genomes were identified from the IMG-ABC website (Version 4.560) using "Homo sapiens" as the host name, and 246,188 human-related BGCs and 2,640,191 protein sequences were extracted from these genomes [14]. The gut metagenomic data was extracted from the European Bioinformatics Institute-Sequence Read Archive database using the accession number ERP003612, which initially was used to analyze the correlation between the colonic microbiota and metabolic disorders in a Danish cohort of 123 nonobese and 169 obese individuals [15]. In the quality control step, we only kept the first 70 bp of the reads for each sample, and samples with read length less than 70 bp were discarded. The remaining 278 samples were analyzed further.
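
The quality control step can be illustrated with a minimal Python sketch. The function name `trim_sample` is hypothetical, and the rule used here for discarding a sample (any read shorter than 70 bp) is an assumption, since the text does not state how read length was assessed per sample.

```python
from typing import List, Optional


def trim_sample(reads: List[str], length: int = 70) -> Optional[List[str]]:
    """Return the reads truncated to their first `length` bases.

    Assumption (not stated in the text): the whole sample is discarded,
    signalled by None, if it is empty or any read is shorter than `length`.
    """
    if not reads or any(len(read) < length for read in reads):
        return None
    # Keep only the first `length` bases of every read.
    return [read[:length] for read in reads]
```

A sample whose reads are all at least 70 bp long is returned with every read cut to 70 bp; any other sample yields `None` and would be dropped from the analysis.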

Identification of Putative BGCs.
To determine the abundance of each putative BGC per sample, the metagenomic reads were first aligned against the protein sequence database of the human-related BGCs using the DIAMOND tool with an E-value of 1e-05 [16], and the top hit protein of each read was subsequently analyzed. To avoid contamination by nonbiosynthesis genes, lists of biosynthesis-related and nonbiosynthesis-related Pfam domains were extracted from AntiSMASH [17] and a recent study [12], respectively. A database was constructed using these Pfam domains and queried against the top hit proteins, and the biosynthesis genes were validated using the hmmscan program in the HMMER package with an E-value of 0.01. Finally, the abundance scores of the biosynthesis genes of each BGC per sample were calculated, and the BGCs with at least 50% biosynthesis genes that were detected in more than 10 samples with a frequency of reads ≥ 10 were selected for the following analysis [13].
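
The final selection rule can be sketched as follows. This is one possible reading of the criteria, not the authors' implementation: a BGC is counted as detected in a sample when at least 50% of its biosynthesis genes receive a read and the BGC collects at least 10 reads in total, and it is kept when it is detected in more than 10 samples. The names `bgc_detected`, `select_bgc`, and the input layout are hypothetical.

```python
from typing import Dict, List

# counts[sample][gene] = number of reads whose top hit is that biosynthesis gene
GeneCounts = Dict[str, Dict[str, int]]


def bgc_detected(gene_counts: Dict[str, int], bgc_genes: List[str],
                 min_gene_fraction: float = 0.5, min_reads: int = 10) -> bool:
    """Assumed per-sample rule: enough genes hit and enough reads overall."""
    genes_hit = sum(1 for gene in bgc_genes if gene_counts.get(gene, 0) > 0)
    total_reads = sum(gene_counts.get(gene, 0) for gene in bgc_genes)
    return (genes_hit >= min_gene_fraction * len(bgc_genes)
            and total_reads >= min_reads)


def select_bgc(counts: GeneCounts, bgc_genes: List[str],
               min_samples: int = 10) -> bool:
    """Keep the BGC if it is detected in more than `min_samples` samples."""
    n_detected = sum(bgc_detected(sample_counts, bgc_genes)
                     for sample_counts in counts.values())
    return n_detected > min_samples
```

Under these assumptions, a four-gene BGC with two genes hit by five reads each passes the per-sample rule, and the cluster is retained once that holds in 11 or more samples.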

Detection of Known SM-Encoding BGCs.
A database of 13,460 protein sequences extracted from all SM-encoding BGCs was constructed, and the trimmed metagenomic reads from ERP003612 were aligned against this database using the DIAMOND tool with an E-value of 1e-05 [16]. The putative BGCs encoding known SMs were detected in the same way as the human-related BGCs. For each known SM-encoding BGC, we compared its abundance with the body mass index (BMI).

Normalization and Comparison.
The abundance of these putative BGCs and known SM-encoding BGCs was further assessed across different samples. Each BGC was normalized as $N_{\mathrm{BGC}} = F_{\mathrm{BGC}} \times 10^{6} / \Sigma_{\mathrm{total}}$, where $F_{\mathrm{BGC}}$ is the sum of reads aligned to all biosynthetic genes in a particular BGC and $\Sigma_{\mathrm{total}}$ is the total number of reads in the metagenomic data. A BGC absent in a specific sample was assigned a value of 0.01. Spearman's rank correlation analysis was used to evaluate the correlation between BMI and the BGCs, and the p-values were corrected by the Benjamini-Hochberg method.
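
The normalization and the statistical steps can be sketched in plain Python. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, the correlation is computed as the Pearson correlation of average ranks (the standard definition of Spearman's coefficient with ties), and the p-values passed to the Benjamini-Hochberg step are assumed to come from a statistics package.

```python
from typing import List, Sequence


def normalize_bgc(f_bgc: int, total_reads: int,
                  absent_value: float = 0.01) -> float:
    """N_BGC = F_BGC * 10^6 / total reads; an absent BGC gets 0.01."""
    if f_bgc == 0:
        return absent_value
    return f_bgc * 1e6 / total_reads


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's coefficient as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, a BGC with 50 aligned reads in a sample of one million reads has a normalized abundance of 50, and a BGC with no aligned reads is set to 0.01 so that it still takes part in the rank correlation with BMI.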

NRPS Analysis.
NRPSs synthesize a class of peptide SMs that are produced by bacteria and fungi and have been successfully used as antibiotics [18]. AntiSMASH 4.0 was used to predict the domain information and core chemical structure of the putative NRPS [17], NRPSsp was used to predict the substrate of the NRPS [19], and NaPDoS was used to define the class of the condensation domains [20]. In order to determine the potential effect of the NRPS on microbes positively or inversely related to obesity, we evaluated the distribution of the NRPS using the taxonomic profiles of the ERP003612 data and the Human Microbiome Project (HMP) data [8,21]. The taxonomic profiling of metagenomic reads was performed using MetaPhlAn2 [22].

Overview of BGCs in the Gut Microbiome of Obese Subjects.
The IMG-ABC is the largest freely accessible database of predicted and experimental BGCs that includes more than one million reads isolated from both genomes and metagenomes. After aligning the reads of the obesity-related metagenomes against the BGCs extracted from the IMG-ABC database with the DIAMOND tool, we calculated the abundance of these BGCs by normalizing the aligned metagenomic reads, retaining the BGCs detected in at least 10 samples with a read frequency ≥ 10. A total of 4,640 BGCs, corresponding to ∼2% of the total human-related BGCs, were finally selected, of which 2,183 were detected in at least 80% of the samples (Figure 1). Most BGCs are species specific, mirroring the individual-specific taxonomic profiles of the gut microbiome [23].
Interestingly, there was a significant positive correlation between the mean abundance of BGCs and the BMI of the corresponding subjects (Figure 2(a)). The gut microbiota of obese individuals exhibits reduced taxonomic diversity compared to that of lean individuals [5]. This suggests that obesity-associated SMs inhibit the growth of competing species and thereby reduce the diversity of the gut microbiota of obese individuals. Spearman's rank correlation analysis was used to determine the correlation of each detected BGC with BMI. The most significantly correlated BGCs are shown in Table 1. Most of them were derived from metagenomic data (Genome ID with prefix "70000"), indicating that they originate from microbial species that have not been identified so far [24]. For the BGCs derived from unknown species, we defined the putative host organism as the best hit genome using NCBI-BLAST [13], and, not surprisingly, most of them belonged to obesity-related genera like Akkermansia, Ruminococcus, Bacteroides, and Prevotella [4,7,25]. However, the association between the BGCs and BMI did not always mirror that between the host organism and obesity. For example, BGCs from Bacteroides and Prevotella were positively associated with BMI (Table 1), but these two genera are usually negatively associated with obesity [26]. Furthermore, many species listed in Table 1 have not been linked with obesity, e.g., Gastranaerophilaceae MH 37 (Figure 2(b)).

Obese Individuals Have Characteristic BGCs with Known SM.
We also determined the correlation between obesity and BGCs encoding known SMs using Spearman's rank correlation analysis (Table 2). Menaquinone produced by Enterobacter cloacae showed the strongest correlation with BMI (Figure 2(c)). This is consistent with the high concentration of menaquinone detected in the adipose tissues of obese adults [27]. In addition, Enterobacter cloacae B29 isolated from the gut of morbidly obese individuals induced obesity in germ-free mice [28], and reduction in Enterobacteriaceae and other bacteria could decrease fecal levels of menaquinone [29]. Taken together, these results show that our approach can identify obesity-associated BGCs and SMs.

An NRPS Found to Increase with BMI.
We also detected an NRPS-encoding BGC (Cluster ID: 160336495) that was significantly correlated with BMI (Figure 2(d)). The best match genome of this NRPS is Clostridium leptum DSM 753, which is associated with both obesity and weight loss [30]. The structure of this NRPS was analyzed by antiSMASH (Figure 3), and its putative substrate was identified as phenylalanine by NRPSsp. Finally, both condensation domains of this NRPS were recognized as being of the LCL class by NaPDoS.
We compared the abundance of this NRPS with that of each phylum in the ERP003612 data and found that Acidobacteria, Bacteroidetes, and Chlorobi were negatively associated with the abundance of this NRPS (Table 3). To further determine the potential role of this NRPS in the gut microbiome of obese individuals, we calculated its abundance in the HMP data and correlated it with each phylum per sample (Table 4). The phyla Bacteroidetes and Verrucomicrobia were negatively associated with the abundance of this NRPS.

Discussion
We identified several obesity-associated BGCs by comparing the metagenomic data of obese and lean individuals. In agreement with previous studies [31], the BGCs were highly host specific, with only half of them detected in at least 80% of the individuals. In addition, most of these BGCs encode unknown secondary metabolites, thereby indicating a potential source of antimicrobials. Studies have largely focused on the effects of externally administered drugs on the human body [32], and those of endogenously produced antibiotics are virtually unknown. We found that obesity, measured in terms of high BMI, was associated with increased BGC abundance, suggesting lower complexity of the gut microbiome due to the inhibitory function of the encoded SMs. This is consistent with a previous study which identified decreased microbial complexity of the gut as one of the factors promoting obesity [5]. This could be due to obesity-associated SMs that kill competing bacteria and reduce diversity. In addition, an obesity-associated NRPS identified in this study was negatively correlated with Bacteroidetes, an obesity-associated bacterial phylum, in both the ERP003612 data and the HMP data [4]. Comparison of the BGCs with known SMs between obese and nonobese individuals showed the strongest correlation of menaquinone with high BMI. In addition, many obesity-associated BGCs encoded SMs hitherto unrelated to obesity, indicating potential biomarkers for obesity. Several BGCs are associated with mobile genetic elements like transposons that are involved in horizontal gene transfer [33], which can account for their spread across multiple genomes. This could explain the correlation of these BGCs even with bacteria not associated with obesity. For example, the role of Bacteroidetes in obesity has been largely ambiguous [4], whereas we found an inverse association of some BGCs from this phylum with BMI.
It is possible that only some species of Bacteroidetes are associated with obesity, while the rest have gained BGCs encoding SMs that inhibit the obesity-associated species. In addition, the Clostridium genus of the phylum Firmicutes has been positively associated with obesity [34], especially the pathogenic C. difficile [35]. However, some species like C. bolteae [7] are more abundant in lean individuals. Faecalibacterium prausnitzii is another obesity-related member of the family Clostridiaceae and can decrease adipose tissue inflammation and improve hepatic health [36].
To summarize, we identified obesity-related BGCs from metagenomic data and provided novel insights into gut microbial SMs as potential markers for obesity.

Conclusions
We identified 4,640 BGCs in the human gut microbiota, which provides novel insights into the role of the intestinal microbial community in obesity.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.