Network Analysis of Transcriptome and LC-MS Reveals a Possible Biosynthesis Pathway of Anthocyanins in Dendrobium officinale

Anthocyanins, a group of flavonoids, are widely present in plants and determine the colors of the peels of stems, fruits, and flowers. In this study, we used UHPLC-ESI-MS to identify anthocyanins in the herbal plant Dendrobium officinale, which has been used for centuries in China. The results indicated that the total anthocyanin content in samples from Guangxi was the highest. Seven anthocyanins were identified, and the fragmentation pathways were proposed from D. officinale. Most of the identified anthocyanins were composed of cyanidin and sinapoyl groups. We also carried out that the sinapoyl group had active sites on breast cancer receptors by using Schrödinger. The relative levels of the 7 anthocyanins in the samples from the three locations were determined. Transcriptomic analysis was used to analyze the sinapoyl anthocyanin synthesis-related genes in plants, such as genes encoding UGTs and serine carboxypeptidase. We speculated that sinapoyl anthocyanin biosynthesis was associated with the activities of certain enzymes, including chalcone flavonone isomerase-like, hydroxycinnamoyltransferase 1, UGT-83A1, UGT-88B1 isoform X1, serine carboxypeptidase-like 18 isoform X3, and serine carboxypeptidase-like 18.


Introduction
Dendrobium officinale is a widely used herb in China and southern Asian countries. This plant was first recorded as a traditional Chinese medicine in Shennong's Classic of Materia Medica, which was written more than 2000 years ago [1,2]. The stems of D. officinale provided the benefits of improving body immunity, promoting salivation, antineoplasticity, and regulating blood sugar level because of large amounts of polysaccharides, flavonoids, alkaloids, and bibenzyl compounds [3][4][5]. With the increasing demand for human beings, this plant became nearly extinct in the wild and is now a second-class protected species in China. Currently, most D. officinale in the pharmaceutical markets is obtained by artificial cultivation in South China [6].
Anthocyanin, a type of flavonoids, is one of the most important secondary metabolites in plants. The concentration and type of those compounds determine the colors of plants [7]. Most reports on anthocyanins have focused on different kinds of flowers and the peels or seeds of fruits, such as grape [8]. Modern studies have shown that anthocyanin exhibits many types of bioactivity in vitro and in vivo, for example, antimicrobial [9], antidiabetic [10], anti-inflammatory [11], antioxidant [12], and anticancer [13,14]. Anthocyanin use is gaining popularity, and many food companies have developed several types of anthocyanin products for health preservation, such as grape seed extract, which is an important raw material in the food industry [15]. However, there remains a lack of studies on anthocyanins in other important plants. A preliminary study by our research group revealed that there exist significant variations in D. officinale based on place of production, such as the quantity and quality of flavonoids [16]. Notably, the appearance of the plant, especially the color of the peel, can differ. A previous report has stated that there exist substantial differences in anthocyanin content between D. officinale plants with green and red peels. Peel color is determined by the anthocyanin present in the stem of this plant [17]. But the related reports of qualitative differences of D. officinale among different areas are rare. Those genes involved in regulating the red peels are still defective.
Therefore, in this study, we used the UHPLC-MS/MS system to compare the quantitative and qualitative differences of D. officinale among the Guangdong, Guangxi, and Zhejiang provinces. For the identified anthocyanins, we aimed to propose the compound-related disease gene and analyze the active site of the disease-related gene at the molecular level by molecular docking. Base on a certain pharmacological effect of the identified anthocyanin, it is necessary to understand the biosynthesis of anthocyanin in plants. Thus, the biosynthesis-related genes in anthocyanin pathways of D. officinale, including UGTs and serine carboxypeptidase, were analyzed by transcriptome.

Plant Materials.
Nine batches of fresh, mature samples of the plant D. officinale were collected from 3 different places of production in Guangdong (GD), Zhejiang (ZJ), and Guangxi (GX) provinces ( Figure 1). As shown in Figure 1, the peels of the samples from Guangxi province were more purple than those from Guangdong while the samples from Zhejiang province were green.
2.3. Extraction of D. officinale Anthocyanins. Ten grams of fresh D. officinale stem was weighed precisely and cut into pieces. 1000 mL of 90% ethanol with 1% hydrochloric acid was added to the samples in a conical flask. The extraction was performed in the dark for 24 h. The liquid from the conical flask was filtered to obtain anthocyanin extract from the residue. The extract was dried at 40°C using a rotary evaporator in the dark. Ethanol was added to dissolve the extract and bring it to a 5 mL volumetric flask. A 0.22 μm filter mem-brane was used to obtain the final anthocyanin extract, which was used for quantitative and qualitative analyses.

Determination of the Total Anthocyanin Content in D.
officinale. The total anthocyanin content in D. officinale was calculated by the pH differential method. The formula used for calculation was as follows: where A is the absorption at a specific wavelength, calculated as follows: ðA520 − A700Þ pH 1:0 refers to the absorption of the sample solution at 520 nm minus the absorption at 700 nm when the pH of the sample solution is 1.0; ðA520 − A700Þ pH 4:5 refers to the absorption of the sample solution at 520 nm minus the absorption at 700 nm when the pH of the sample solution is 4.5.
MW in the first formula refers to the molecular weight of cyanidin 3-glycoside, which was equal to 449.2; DF refers to the dilution factor of the sample solution; ɛ refers to the molar absorption coefficient, which was equal to 26900; and W refers to the weight of the sample.

Composition and Relative
Quantitative Analysis by UHPLC-MS/MS. UHPLC-MS/MS analysis was performed on an HPLC with a UV detector (Thermo Separation Products Inc., Riviera Beach, FL, USA) system equipped with a Thermo Finnigan LCQ FLEET (Thermo Finnigan, Riviera Beach, FL, USA) ion trap mass spectrometer as well as an ESI source. The column used in this study was an Agilent Zorbax SB-Aq (250 mm × 4:6 mm, 5 μm). The mobile phase was composed of A (acetonitrile) and B (1% formic acid water) using a gradient elution: 0~15 min, 15~17% A; 15~20 min, 17~18% A; 20~35 min, 18~20% A; and 35~40 min, 20~25% A. The wavelength for UV detection was 520 nm; the flow rate was set at 1.0 mL/min; the column temperature was 35°C; the injection volume was 4 μL.
The settings for the ESI source was as follows: the analysis was performed in a positive mode ([M+H] + ), capillary temperature at 320°C, sheath gas flow rate at 40 psi, Aux gas flow at 2.0 psi, capillary voltage at 4.2 kV, and tube lens at 90.00 kV. The mass range was from 50 to 1900 m/z.
The relative quantitative analyses of all the identified anthocyanins were performed by calculating the peak area ratios of the target and the largest peak at 520 nm, which could be used to distinguish anthocyanins from flavonoids and other phenolic compounds. The qualitative analyses of anthocyanins were performed by data-dependent MS n scanning to trigger the target ion fragmentation and prevented repetition.   [16]. Due to anthocyanidins and flavonoids sharing the same biosynthesis pathway in the KEGG database, we aimed to compare the expression levels of anthocyanidin synthesis-related genes and flavonoid synthesisrelated genes. In addition, we also compared genes encoding UDP-glycosyltransferase and serine carboxypeptidase, which are involved in anthocyanin synthesis and led to the component diversity. Gene expression levels are shown as FPKM values.

Correlation Analysis of Metabolites and Transcriptome
Data. The Pearson correlation coefficient was used to perform the association analysis of transcriptomic and related anthocyanin (Pearson correlation coefficient > 0:8 as a significant correlation) [18,19]. The relationships between metabolites and related genes were visualized by using Cytoscape 3.7.2 (The Cytoscape Consortium, San Diego, USA).  Figure 3. The identification processes based on regular fragmentation of the anthocyanins were described below. Peak 1 was identified as cyanidin 3-[2-(glucosyl)-6-(sinapoyl)glucoside]-5-glucoside [20,21], C 44 H 51 O 25 + , which exhibited a retention time of 10.944 min by UV chromatography. The molecular weight of this compound was 979.01, and the peak was fragmented to 817.12 and 655.05 by MS 2 ; the ion peak at 655.05 was fragmented to 449.09 and 287.09. The peak at 287.16 was attributed to cyanidin, while the peak at 449.09 was attributed to cyanidin-5-glucoside. The peak at 817.02 was attributed to cyanidin 3-[6-(sinapoyl)glucoside]-5-glucoside.

Total and Relative Anthocyanin Content in D. officinale.
The total anthocyanin content of D. officinale from Zhejiang, Guangdong, and Guangxi provinces is shown in Figure 4(b). The average total anthocyanin content in the plant from Guangxi, Guangdong, and Zhejiang provinces was 2.144, 0.648, and 0.532 mg/mL. The total anthocyanin content in D. officinale from Guangxi province was the highest. The relative content of anthocyanin determined by a peak area in D. officinale is shown in Figure 4 Table S1.

Molecular Docking Analysis.
The potential interactions of the active compounds and disease-related genes were clarified by molecular docking. CTSD and CA9 frequently appeared in identified anthocyanins with a count of four and three, respectively. Cyanidin 3-[2-(glucosyl)-6-(sinapoyl)glucoside]-5-glucoside was the second highest compound of the seven anthocyanins. Thus, they were chosen as the proteins and ligand in this docking. The results showed that cyanidin 3-[2-(glucosyl)-6-(sinapoyl)glucoside]-5-glucoside was successfully docked into the breast cancerrelated proteins, CTSD and CA9. The dock details are shown in Table 2. The binding mode of cyanidin 3-[2-(glucosyl)-6-(sinapoyl)glucoside]-5-glucoside in the active site of CTSD (PDB:4OD9) and CA9 (PDB:6FE0) is represented in its three-dimensional mode in Figure 5. In the active site of CTSD, the glycosyl group showed H-bond interactions with GLY233 (3-glucoside), SER235, LEU236 (2-glucoside), and GLY79 (5-glucoside). A hydrogen in flavonoid skeleton A-ring showed aromatic H-bond with GLN285. In the active site of CA9, we surprisingly found that the carbonyl and hydroxyl in the sinapoyl group had H-bond interactions with GLN67 and VAL131. Flavonoid skeleton B-ring showed not only H-bond interactions but also pi-pi stacking with HID 94. Oxygen in flavonoid skeleton C-ring showed H-bond with GLN92 and in 3-glucoside interacted with ARG60.  [27]. The enzyme anthocyanidin 5-aromatic acyltransferase is involved in the malonylation of the 5-Oglucose residue of anthocyanidins [28]. Besides, 3 genes were relative to anthocyanin content. Chalcone flavonone isomerase not only is a key enzyme in the flavonoid synthesis pathway but also can regulate the accumulation of anthocyanins [29]. Hydroxycinnamoyltransferase catalyzes the transfer of    [30,31]. Anthocyanidin reductase is an important enzyme to convert cyanidin to (-)-epicatechin [32]. Based on the heat map, anthocyanidin 3-O-glucoside-6 ″ -O-malonyltransferase-like isoform X2, anthocyanidin 5-aromatic acyltransferase, chalcone flavonone isomerase-like, and hydroxycinnamoyltransferase 1 were highly expressed in samples from Guangxi province, where the total anthocyanin content was high.
3.6. Expression Level of UGTs. According to the heat map, there were no regular trends in the expression of UDPglycosyltransferase; the samples from three different locations exhibited different expression patterns. The heat map of UGTs is shown in Figure 6(b). For example, UGT-90A2, UGT-73B5, and UGT-86A1 were highly expressed in sam-ples from Guangdong province, while UGT-73C1, UGT-73B4, and UGT-708A6 were highly expressed in samples from Zhejiang. UGT-73E1, UGT-73C3, and UGT-71A1 were highly expressed in samples from Guangxi. The main function of UGTs is to catalyze the addition of glycosyl groups to other molecules. Some UGTs identified in this study were flavonoid glycosyltransferases. For example, the main function of UGT-73B3 is to synthesize 3-O-flavone in vitro [33]. The protein UGT-708A6 was reported to catalyze the conversion of glucose to 2-hydroxynaringenin [34]. The expression of these UGTs was certainly associated with anthocyanin synthesis in D. officinale. Based on the FPKM values, some UGT expression levels were high such as UGT-91C1, UGT-83A1, and UGT-88B1 isoform X1. UGT-91C1 was highly expressed in samples from Guangxi province, while UGT-83A1 and UGT-88B1 isoform X1 were highly expressed in samples from Guangdong. and serine carboxypeptidase-like 9 [35]. Although all serine carboxypeptidases have not been reported to be associated with the synthesis of sinapoyl in anthocyanins, we believe that certain highly expressed serine carboxypeptidases should be investigated. In this study, 20 genes annotated as encoding serine carboxypeptidases are shown in Figure 6(c). 14 genes were relatively highly expressed in samples from Guangxi, including serine carboxypeptidase-like 18 isoform X3, 36, and 3 isoform X4. Five genes, including serine carboxypeptidase-like 27, 2, and 33 isoform X2, were relatively highly expressed in samples from Zhejiang province, while six genes, including serine carboxypeptidase-like, 16 and 18, were highly expressed in samples from Guangdong.

Correlation Analysis of Metabolites and Related Genes.
The association analysis between 7 identified anthocyanins and transcripts was carried out. Eight transcripts were selected to be strongly correlated (Pearson correlation coefficient > 0:8, p value < 0.01) to the 7 metabolites. 19 transcripts were chosen as related genes (0:8 > Pearson correlation coefficient > 0:668, 0:01 < p value < 0.05) to the identified metabolites (Table S2). As shown in Figure 7, high values of the Pearson correlation coefficient to bright colors (yellow), some UGTs, and serine carboxypeptidases such as UGT-89B1-like, UGT-83A1, and UGT-88B1 isoform X1 serine carboxypeptidase 5, 18, 42, and 18 isoform X3 had a strong correlation with the content of peaks 1-7. Except for the UGTs and serine carboxypeptidases, the genes on the anthocyanin pathway such as chalcone flavonone isomerase-like and hydroxycinnamoyltransferase 1 also had a correlation to the anthocyanin components.

Discussion
Anthocyanins are the main determinants of the colors of the plants' tissue [36]. Normally, these compounds are present as glycosides. This is the first report that the structures of anthocyanins in D. officinale are present not only with glycosides but also with sinapoyl groups. Thus, the total anthocyanin content and the relative quantities of these identified anthocyanins were determined. Although the relative quantities of the samples from the three areas were different, the compositions of the anthocyanin have certain similarities. For example, cyanidin 3-[6-sinapoyl-2-O-(2-(sinapoyl)glucosyl)-glucoside]-5-glucoside was present at the highest level in all the samples.
To analyze the pharmacological activities of the identified anthocyanins, we found that all of them could act on breast cancer-related targets from the compound and disease database. Both CTSD and CA9 are the important targets that have reported the therapeutic effects on breast cancer [37][38][39][40]. At the same time, increasing studies have reported the therapeutic effect of anthocyanins on breast cancer, but they mainly focused on cyanidin-3-glucoside [41,42]. In this study, we found that the sinapoyl of cyanidin 3-[2-(glucosyl)-6-(sinapoyl)glucoside]-5-glucoside provided active sites on the breast cancer-related protein  Based on the pharmacological activities of anthocyanins, the synthetic pathway of anthocyanin in D. officinale needs to be clarified, which can provide the basis for the transformation of medicinal plants. Sinapoyl anthocyanins are widely present in many plants, such as Arabidopsis thaliana [43] and Orychophragonus violaceus [44]. However, the sinapoylation of anthocyanins in plants has not been studied extensively.
The biosynthesis of cyanidin and delphinidin was attributed to structural genes in the anthocyanin pathway. It can be speculated that the expression levels and correlation of the genes in the anthocyanin synthesis pathway determined the total anthocyanin content. We concluded that anthocyanidin 3-O-glucoside-6 ″ -O-malonyltransferase-like isoform X2, anthocyanidin 5-aromatic acyltransferase, chalcone flavonone isomerase-like, and hydroxycinnamoyltransferase 1 were closely related genes, especially chalcone flavonone isomerase-like and hydroxycinnamoyltransferase 1. The high expression level of these genes in samples from Guangxi contributed to the higher total anthocyanin content in samples from Guangxi than others. Although anthocyanidin reductase, an important enzyme to convert cyanidin to (-)-epicatechin [32], was expressed low in samples from Zhejiang, it might lead to the low content of (-)-epicatechin. Due to the high total anthocyanin content in Guangxi, it was possible that anthocyanidin reductase could not dramatically transform cyanidin to (-)-epicatechin. Many structural genes were highly expressed in Zhejiang province based on the heat map, but a majority of them were in the flavonoid synthesis pathway without a significant correlation of anthocyanin components. Thus, we concluded that the highly expressed genes in samples of Zhejiang province did not necessarily affect the contents of anthocyanins. Except for structural genes, the types of anthocyanins presented in plants were determined by other enzymes such as UGTs and serine carboxypeptidase genes. Thus, we speculated that the highly expressed and correlative genes of 2 UGTs and 2 serine carboxypeptidases also play key roles in anthocyanin biosynthesis in D. officinale.
Here, we proposed a sinapoyl anthocyanin biosynthesis pathway in D. officinale, which was described below. First, cyanidin and delphinidin were synthesized via the structural genes of the anthocyanin pathway. Chalcone flavonone isomerase-like and hydroxycinnamoyltransferase 1 may be the key enzyme to determine the total anthocyanin content. Then, cyanidin and delphinidin were glycosylated by UGTs such as UGT-83A1 and UGT-88B1 isoform X1 to attach the glycosyl to specific positions. Third, via the activity of serine carboxypeptidase-like 18 and serine carboxypeptidaselike 18 isoform X3, sinapoylation of cyanidin and delphinidin were achieved.

Conclusions
Quantitative and qualitative analyses showed that the anthocyanins in D. officinale plants from different places were different. The total anthocyanin concentration in samples from Guangxi was the highest. This result was consistent with the apparent characteristics of this herbal plant. By transcriptomic analysis, we revealed a possible biosynthetic pathway for the anthocyanins identified in this study; anthocyanidin synthesis-related genes, UGTs, and serine carboxypeptidase were involved in this pathway. The possible drug targets of anthocyanin were predicted by molecular docking. However, identification of the exact functions or roles of these analyzed genes required further investigation. In vitro enzyme activity assays and the exact pharmacological activity of anthocyanins needed to be verified.

Data Availability
LC-MS datas and the anthocyanidin related targets were included in supplemental data.

Conflicts of Interest
All authors declare no conflict of interest.