De Novo Sequencing of a Sparassis latifolia Genome and Its Associated Comparative Analyses

Known to be rich in β-glucan, Sparassis latifolia (S. latifolia) is a valuable edible fungus cultivated in East Asia. A few studies have suggested that S. latifolia has antidiabetic, antihypertensive, antitumor, and antiallergic effects. However, the genetic basis of these medicinal effects remains unclear, which has become a key bottleneck for its further application. To provide a better understanding of this fungus, we sequenced its whole genome, which has a total size of 48.13 megabases (Mb) and contains 12,471 predicted gene models. We then performed comparative and phylogenetic analyses, which indicate that S. latifolia is closely related to several species in the antrodia clade, including Fomitopsis pinicola, Wolfiporia cocos, Postia placenta, and Antrodia sinuosa. Finally, we annotated the predicted genes. Interestingly, the S. latifolia genome encodes most enzymes involved in carbohydrate and glycoconjugate metabolism and is also enriched in genes encoding enzymes critical to secondary metabolite biosynthesis, including those in the indole, terpene, and type I polyketide pathways. In conclusion, the genome content of S. latifolia sheds light on the genetic basis of its reported medicinal properties and could also serve as a reference genome for comparative studies of fungi.


Introduction
Sparassis latifolia (S. latifolia), also called the cauliflower mushroom, is a valuable brown-rot fungus belonging to the Sparassidaceae (Polyporales). S. latifolia usually grows on trees such as pine or larch and is widely distributed across the Northern Temperate Zone.
The mating system of S. latifolia is bipolar [1], and the basidiocarps are composed of numerous loosely arranged flabella that are morphologically large, broad, dissected, and slightly contorted [2].
Polysaccharides represent a major class of bioactive compounds found in mushrooms. Beta-glucan is the major bioactive component of S. latifolia, accounting for more than 40% of its dry weight [3]. Previous studies suggest that a 6-branched 1,3-beta-glucan forms the primary structure of the purified beta-glucan from this mushroom. The purified beta-glucan exhibits various biological activities, such as immune stimulation and antitumor effects [1,3,4]. Oral administration of S. latifolia also has antihypertensive [5], antiallergic [6], and antidiabetic effects [7,8]. Because of its medicinal potential, factory cultivation of S. latifolia has been achieved in Japan, South Korea, and China. However, the long life cycle and high labor intensity remain key bottlenecks for wider cultivation.
In recent years, many fungal genomes have been sequenced because of their importance in industry, agriculture, and medicine. Based on whole-genome sequencing, enzymes engaged in carbohydrate metabolism and key enzymes for secondary metabolite biosynthesis have been analyzed in Ganoderma lucidum and Lignosus rhinocerotis [9][10][11]. In addition, Martinez et al. analyzed the lignocellulose conversion mechanism of the brown-rot fungus Postia placenta using genome, transcriptome, and secretome data [12].
They also compared it with Phanerochaete chrysosporium, a white-rot fungus, and found that the capacity for efficient lignin depolymerization was lost during the evolutionary shift from white-rot to brown-rot fungi. The genomes of a few other edible or medicinal mushrooms have also been sequenced, for example, Volvariella volvacea [13], …

Protein Domain Analysis for S. latifolia.
We used the widely adopted Pfam database [22] to perform protein domain analysis. In total, 6821 deduced protein sequences of S. latifolia were found to be associated with protein domains (Appendix S2), and the top 20 Pfam domains are plotted in Figure 4. The top two Pfam domains are associated with protein kinase activity (197 protein kinase domains and 149 protein tyrosine kinase domains). Protein kinases have roles in every aspect of regulation and signal transduction [23]. For example, tyrosine kinases (TKs) catalyze the phosphorylation of Tyr residues in proteins. It is generally thought that orthologs of animal TKs are rare in fungi [24,25]. In addition, we found two transporter domains: a major facilitator superfamily/MFS_1 domain (PF07690.11) present in 149 proteins and a sugar (and other) transporter/Sugar_tr domain (PF00083.19) present in 76 proteins. These transporters are inferred to move small solutes such as sugars in response to chemiosmotic ion gradients.
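The top-domain tally described above can be sketched in Python, assuming the Pfam search was run with hmmscan and its per-domain table (--domtblout) was saved. The sample table below is hypothetical, and the function name count_domains is illustrative. Because the text reports some counts as domains and others as proteins, the sketch keeps both: total domain hits per family and distinct proteins per family.

```python
from collections import Counter

# Hypothetical excerpt of `hmmscan --domtblout` output. In hmmscan output the
# "target" (column 1) is the Pfam family and the "query" (column 4) is the protein.
SAMPLE_DOMTBL = """\
# target name  accession  tlen query name accession qlen ...
Pkinase      PF00069.25 264 prot_001 - 512 1e-40 140.0 0.1 1 1 1e-42 1e-40 139.5 0.1 2 260 30 290 28 292 0.95 Protein kinase domain
Pkinase_Tyr  PF07714.17 259 prot_002 - 480 1e-30 105.0 0.1 1 1 1e-32 1e-30 104.0 0.1 5 255 35 288 30 290 0.90 Protein tyrosine kinase
MFS_1        PF07690.11 353 prot_003 - 540 1e-50 170.0 9.0 1 1 1e-52 1e-50 169.0 9.0 1 350 40 460 38 462 0.88 Major Facilitator Superfamily
Sugar_tr     PF00083.19 452 prot_004 - 530 1e-60 200.0 8.0 1 1 1e-62 1e-60 199.0 8.0 3 450 20 500 18 505 0.91 Sugar (and other) transporter
Pkinase      PF00069.25 264 prot_005 - 700 1e-35 120.0 0.1 1 2 1e-37 1e-35 119.0 0.1 1 262 50 300 48 302 0.93 Protein kinase domain
Pkinase      PF00069.25 264 prot_005 - 700 1e-35 120.0 0.1 2 2 1e-20 1e-18 65.0 0.1 5 200 400 600 398 602 0.85 Protein kinase domain
"""


def count_domains(domtbl_text):
    """Tally Pfam domains: total domain hits and distinct proteins per family."""
    hits = Counter()
    proteins = {}
    for line in domtbl_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and HMMER comment/header lines
        fields = line.split()
        family, protein = fields[0], fields[3]
        hits[family] += 1
        proteins.setdefault(family, set()).add(protein)
    proteins_per_family = Counter({f: len(p) for f, p in proteins.items()})
    return hits, proteins_per_family


hits, proteins_per_family = count_domains(SAMPLE_DOMTBL)
# Rank families by number of proteins (the text reports a top-20 list).
for family, n_proteins in proteins_per_family.most_common(20):
    print(f"{family}\t{hits[family]} domain hits\t{n_proteins} proteins")
```

In the sample, prot_005 carries two kinase domains, so Pkinase has three domain hits but only two proteins, which is why the two counts are reported separately.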

Cytochrome P450 Monooxygenases.
Cytochromes P450 (P450s) are heme-containing monooxygenases that are widely present in species across the biological kingdoms. We retrieved the P450 genes in S. latifolia and 12 other Polyporales species using BLAST against the P450 database (Table 3). Phanerochaete carnosa contains the highest number of putative P450 genes (262), followed by Ganoderma sp. (209), Wolfiporia cocos (206), and Bjerkandera adusta (199). By contrast, S. latifolia has only 105 CYPs, of which 85 can be assigned to 26 families according to Nelson's nomenclature, while the remaining 20 await assignment (Appendix S6) [32]. The CYP5146 family has the largest number of genes (20), followed by the CYP620 (9), CYP53 (7), and CYP63 (6) families (Table 3). CYP5146 and CYP5150 family proteins are involved in the oxidation of heterocyclic aromatic compounds, and S. latifolia has the highest number of CYP5146 proteins among the selected fungi. This enrichment suggests that CYP5146 proteins might contribute to fungal adaptation to ecological niches by participating in the oxidation of plant material. The number of CYP620 genes (a family involved in secondary metabolism) is significantly higher than in the other selected fungi. The CYP53 family, also known as benzoate p-hydroxylase, may play a key role in plant colonization through its involvement in wood degradation [33]. S. latifolia also harbours six genes from the CYP63 family, which is associated with xenobiotic degradation in Phanerochaete chrysosporium [34]. Notably, compared with other fungi [11], S. latifolia has 24 genes in the "Metabolism of xenobiotics by cytochrome P450" KEGG subpathway and 21 genes in the "Drug metabolism-cytochrome P450" KEGG subpathway (Appendix S6). However, the exact roles of these CYPs remain to be studied.
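Nelson's nomenclature is commonly summarized by two identity cut-offs: P450s sharing more than 40% amino-acid identity belong to the same family, and those sharing more than 55% belong to the same subfamily. The Python sketch below applies those cut-offs to each gene's best hit against a named reference CYP. This is a simplification under stated assumptions (a single best-hit identity per gene; the gene IDs, reference names, and identities are hypothetical), not the procedure behind Appendix S6. It also illustrates the bookkeeping in the text, where assigned plus unassigned genes must equal the total (85 + 20 = 105).

```python
import re
from collections import Counter

# Matches standard CYP names such as "CYP53A1": family number, subfamily letters, member number.
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")

FAMILY_MIN_IDENTITY = 40.0     # >40% amino-acid identity: same family
SUBFAMILY_MIN_IDENTITY = 55.0  # >55% amino-acid identity: same subfamily


def assign(ref_name, identity):
    """Return (family, subfamily) for a best-hit reference CYP, or (None, None)."""
    match = CYP_NAME.match(ref_name)
    if match is None or identity <= FAMILY_MIN_IDENTITY:
        return None, None
    family = f"CYP{match.group(1)}"
    subfamily = family + match.group(2) if identity > SUBFAMILY_MIN_IDENTITY else None
    return family, subfamily


def classify(best_hits):
    """Count genes per CYP family and list genes that could not be assigned."""
    families = Counter()
    unassigned = []
    for gene, (ref_name, identity) in best_hits.items():
        family, _subfamily = assign(ref_name, identity)
        if family is None:
            unassigned.append(gene)
        else:
            families[family] += 1
    return families, unassigned


# Hypothetical best hits: gene -> (named reference CYP, % identity).
BEST_HITS = {
    "Sl_cyp01": ("CYP5146A3", 62.0),
    "Sl_cyp02": ("CYP5146B1", 45.0),
    "Sl_cyp03": ("CYP53A1", 58.0),
    "Sl_cyp04": ("CYP620C1", 38.0),
}

families, unassigned = classify(BEST_HITS)
print(dict(families), unassigned)
```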

Secondary Metabolism.
The secondary metabolism of fungi is a rich source of bioactive compounds with great potential for pharmaceutical, agricultural, and nutritional applications, and secondary metabolite biosynthetic genes are often clustered [37]. The S. latifolia genome contains several secondary metabolite gene clusters, suggesting its potential to produce certain biologically active compounds (Appendix S7). There are 15 gene clusters encoding key enzymes critical to the biosynthesis of terpenes, indoles, polyketides, and other secondary metabolites. Interestingly, most of these clusters have homologs in other fungi, except for clusters 1, 16, 18, and 33 (Appendix S8).
Fungal polyketides are one of the first classes of secondary metabolites and include both aromatic and highly reduced polyketide metabolites [38]. The S. latifolia genome has 24 putative synthesis-associated genes assigned to three type I polyketide clusters. As probably the largest class of nitrogen-containing secondary metabolites, indole alkaloids are widely present in species across the biological kingdoms, and many of them display potent biological activities [39]. An indole prenyltransferase (indole-PTase)-encoding gene was detected in cluster 16. Indole-PTase, also referred to as …

[Figure: GO molecular function categories (transporter, structural molecule, electron carrier, molecular transducer, enzyme regulator, antioxidant, nucleic acid binding transcription factor, receptor, guanyl-nucleotide exchange factor, protein binding transcription factor, and nutrient reservoir activity)]

Terpenoids are a well-recognized group of secondary metabolites that are widely used in pharmacy. Based on antiSMASH analysis, terpene synthase clusters were the largest group (located on 6 different scaffolds). Terpene synthases are known to be critical to the biosynthesis of monoterpene, sesquiterpene, and diterpene backbones [40]. A total of 4 terpene synthase genes were identified in the S. latifolia genome, many of which are clustered together with modifying enzymes (Appendix S7).
In addition, we identified 17 key enzymes in the mevalonate (MVA) pathway in the genome of S. latifolia based on KEGG.

Strains and Culture Conditions.
Cultivated in China, the S. latifolia strain "Minxiu NO.1" was provided by the Institute of Edible Fungi, Fujian Academy of Agricultural Sciences, and was grown at 25°C on PDA (20% potato, 0.2% peptone, 2% glucose, and 1.5% agar) for 25 days. To isolate genomic DNA and total RNA from mycelia, a 300 mL Erlenmeyer flask containing 50 mL of PDB liquid medium (20% potato, 0.2% peptone, and 2% glucose) was inoculated with fresh plugs from the plate (five mycelial plugs per flask) and incubated at 25°C for 25 days with rotation.
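The medium recipes above can be turned into weighed amounts with a little arithmetic. The Python sketch below assumes the percentages are weight/volume (grams per 100 mL), which the text does not state explicitly. In potato-based media, "20% potato" conventionally refers to 200 g of potato per liter, typically used as an infusion; the sketch only does the weight arithmetic, and the helper name grams_needed is illustrative.

```python
# Percent compositions from the text, assumed to be w/v (grams per 100 mL).
PDB = {"potato": 20.0, "peptone": 0.2, "glucose": 2.0}
PDA = {**PDB, "agar": 1.5}


def grams_needed(recipe, volume_ml):
    """Grams of each component for a given medium volume, assuming % (w/v)."""
    return {component: pct * volume_ml / 100.0 for component, pct in recipe.items()}


per_flask = grams_needed(PDB, 50)    # one 50 mL liquid culture
per_liter = grams_needed(PDA, 1000)  # one liter of plate medium
print(per_flask)
print(per_liter)
```

Under this assumption, each 50 mL flask needs about 10 g of potato, 0.1 g of peptone, and 1 g of glucose.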

Protein Domain Estimation.
We adopted a procedure similar to that of Kumar et al. [49] to estimate the protein domains of the S. latifolia genome. Briefly, the predicted proteins of the S. latifolia genome were scanned against the Pfam [22] protein domain collection. Pfam domains were inferred using HMMER 3.0 [50], and overlapping hits from families in the same clan were removed. Readers are referred to [49] for detailed steps.
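One common reading of "removing overlapping clans" is this: when hits from Pfam families in the same clan overlap on a protein, only the most significant hit is kept. The Python sketch below implements that rule, ranking hits by E-value. It is a simplified illustration, not the exact Kumar et al. pipeline or the logic of any Pfam tool. The hits are hypothetical, the clan accessions are illustrative, and the helper names are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DomainHit:
    protein: str
    family: str
    clan: Optional[str]  # Pfam clan accession, or None if the family has no clan
    start: int           # envelope start on the protein (1-based)
    end: int             # envelope end on the protein (inclusive)
    evalue: float


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    return a.start <= b.end and b.start <= a.end


def resolve_clan_overlaps(hits: List[DomainHit]) -> List[DomainHit]:
    """Keep the best (lowest E-value) hit among overlapping hits of the same clan."""
    by_protein = {}
    for hit in hits:
        by_protein.setdefault(hit.protein, []).append(hit)

    kept = []
    for protein_hits in by_protein.values():
        accepted = []
        for hit in sorted(protein_hits, key=lambda h: h.evalue):
            # Hits without a clan never compete; same-clan hits compete only if they overlap.
            clash = hit.clan is not None and any(
                k.clan == hit.clan and overlaps(hit, k) for k in accepted
            )
            if not clash:
                accepted.append(hit)
        kept.extend(sorted(accepted, key=lambda h: h.start))
    return kept


# Hypothetical hits; clan accessions are illustrative.
HITS = [
    DomainHit("prot_001", "Pkinase", "CL0016", 28, 292, 1e-40),
    DomainHit("prot_001", "Pkinase_Tyr", "CL0016", 30, 290, 1e-30),
    DomainHit("prot_001", "hypothetical_A", None, 100, 200, 1e-5),
    DomainHit("prot_002", "MFS_1", "CL0015", 38, 462, 1e-50),
    DomainHit("prot_002", "Sugar_tr", "CL0015", 18, 505, 1e-20),
]

kept = resolve_clan_overlaps(HITS)
for hit in kept:
    print(hit.protein, hit.family, hit.start, hit.end)
```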

Secondary Metabolite Gene Clusters Annotation.
We first used BLAST (E-value < 1e-3) to identify putative genes encoding proteins involved in the production of bioactive compounds. Subsequently, we analyzed the S. latifolia genome with antiSMASH (http://antismash.secondarymetabolites.org/) [37] to identify putative clusters, which were then manually examined against RNA-Seq data.
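The E-value filter in this step can be sketched in Python, assuming BLAST was run with tabular output (-outfmt 6, whose default columns are listed in the code comment). The rows and reference IDs below are hypothetical. Keeping only the best-scoring subject per query is an added choice for illustration; the text does not say how multiple passing hits were handled.

```python
# Standard BLAST tabular columns (-outfmt 6): qseqid sseqid pident length mismatch
# gapopen qstart qend sstart send evalue bitscore
ROWS = [
    ("Sl_g0001", "ref_PKS_A", 48.2, 1200, 600, 10, 1, 1200, 5, 1204, 1e-150, 520.0),
    ("Sl_g0001", "ref_PKS_B", 41.0, 1100, 640, 12, 1, 1100, 9, 1108, 1e-100, 350.0),
    ("Sl_g0002", "ref_TPS_A", 25.3, 150, 110, 4, 20, 169, 30, 179, 5e-2, 30.0),
    ("Sl_g0003", "ref_PTase_A", 31.7, 400, 270, 8, 10, 409, 15, 414, 2e-10, 60.0),
]
SAMPLE_TSV = "\n".join("\t".join(str(v) for v in row) for row in ROWS)


def best_hits(tabular_text, max_evalue=1e-3):
    """Best-scoring subject per query among hits with E-value < max_evalue."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue >= max_evalue:
            continue  # fails the E-value < 1e-3 cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


result = best_hits(SAMPLE_TSV)
for query, (subject, evalue, bits) in result.items():
    print(query, subject, evalue, bits)
```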
Protein sequences of β-glucan synthases from different species were aligned using MUSCLE 3.6 [55,56]. The multiple sequence alignments were concatenated after poorly aligned regions were removed with the Gblocks server [57]. We then used ProtTest 3.4 [58] to select the best-fitting model of protein evolution for the concatenated alignment. Phylogenetic analysis was conducted with Bayesian inference (BI) implemented in MrBayes v3.2.5 [59] under the LG + G + I model.
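The text does not describe how the trimmed alignments were concatenated. One common approach builds a supermatrix in which taxa missing from a gene alignment are padded with gaps. The Python sketch below does this under that assumption. The sequences and taxon labels are hypothetical, and Gblocks trimming is assumed to have been applied already. The returned column ranges could support a partitioned analysis, although the text reports a single LG + G + I model.

```python
def concatenate(alignments, gap="-"):
    """Concatenate trimmed per-gene alignments (dicts taxon -> aligned sequence).

    Taxa missing from a gene alignment are padded with gaps, and the 1-based
    column range of each gene in the supermatrix is returned as a partition.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("all sequences in one alignment must have the same length")
        length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * length))
        partitions.append((start, start + length - 1))
        start += length
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}, partitions


# Hypothetical, already-trimmed alignments for two genes.
gene1 = {"S_latifolia": "MKV-L", "P_placenta": "MKVAL"}
gene2 = {"S_latifolia": "GT", "W_cocos": "GS"}

matrix, partitions = concatenate([gene1, gene2])
for taxon, seq in matrix.items():
    print(f">{taxon}\n{seq}")
print(partitions)
```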

Data Availability
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession LWKX00000000. The version described in this paper is version LWKX01000000. Additional data can be downloaded from our institute website: http://www.fj-mushroom.cn/Sparassis%20latifolia%20genome/1.rar.

Conflicts of Interest
The authors declare that there are no conflicts of interest.