A Metabolomics Approach to Stratify Patients Diagnosed with Diabetes Mellitus into Excess or Deficiency Syndromes

The prevalence of type 2 diabetes continues to increase globally. Traditional Chinese medicine (TCM) can stratify diabetic patients based on their TCM syndromes and thus allow personalized treatment. Metabolomics can provide metabolite biomarkers for disease subtypes. In this study, we applied a metabolomics approach using ultraperformance liquid chromatography (UPLC) coupled with a quadrupole time-of-flight (QTOF) mass spectrometry system to characterize the metabolic alterations of different TCM syndromes, including excess and deficiency, in patients diagnosed with diabetes mellitus (DM). We obtained a snapshot of the distinct metabolic changes of DM patients with different TCM syndromes. DM patients with excess syndrome had higher serum 2-indolecarboxylic acid, hypotaurine, pipecolic acid, and progesterone than patients with deficiency syndrome. The unique metabolite signatures suggest that the excess patients have more oxidative stress than the deficiency subjects. The results provide an improved understanding of the systemic alteration of metabolites in different syndromes of DM. The identified serum metabolites may be of clinical relevance for subtyping diabetic patients, leading to personalized DM treatment.


Introduction
Diabetes mellitus (DM) is a chronic disease defined by high blood glucose levels, which may be due either to the progressive failure of pancreatic β-cell function and consequently a lack of insulin production (type 1: T1DM), or to the development of insulin resistance and subsequently the loss of β-cell function (type 2: T2DM). DM affects more than 230 million people worldwide, and T2DM is predicted to affect approximately 8% of the population by 2030 [1]. The chronic hyperglycemia of diabetes is associated with significant long-term sequelae, particularly damage to, dysfunction of, and failure of various organs, especially the kidneys, eyes, nerves, heart, and blood vessels [2]. Both the macrovascular (coronary artery disease, peripheral artery disease, and stroke) and microvascular (retinopathy, nephropathy, and neuropathy) complications are major causes of morbidity and mortality in diabetes.
Traditional Chinese medicine (TCM) has a long history and particular advantages in the diagnosis and treatment of diabetes mellitus. Syndrome differentiation is not only the basic unit of TCM theory but also the bridge that associates disease with formula. TCM can stratify diabetic patients based on their TCM syndromes and thus allow personalized treatment. When people suffer from a disease, Yin (things associated with the physical form of an object), Yang (things associated with energetic qualities), Qi (life force that animates the forms of the world), and Xue (dense form of body fluids that have been acted upon and energized by Qi) [3] are in an abnormal state. Accordingly, DM can be classified as having deficiency syndrome or excess syndrome, which refers to insufficiency or excess in Qi, Xue, Yin, and Yang. However, the assignment of syndromes depends on medical experience, academic origins, and other factors, so the concept of syndromes is vague and broad, which makes clinical application difficult. Hence, it is all the more important to achieve syndrome objectification and standardization. Furthermore, although the diagnosis and treatment of manifest diabetes have been thoroughly investigated, the identification of novel pathways or biomarkers indicative of the TCM syndrome differentiation of diabetes is still underway.
Evidence-Based Complementary and Alternative Medicine
With the rapid development of analytical technology and advanced multivariate statistical and bioinformatic tools, metabolomics has become a promising approach for understanding and elucidating the etiology and mechanisms of human diseases [4][5][6][7] and has been extensively applied in the life sciences [8][9][10]. Metabolomics can also provide metabolite biomarkers for disease subtypes. Over the last decade, the growing field of metabolomics has provided new insights into the pathology of diabetes, introduced methods to predict disease onset, and revealed new biomarkers. Recent epidemiological studies were the first to use metabolomics to predict incident diabetes and revealed branched-chain and aromatic amino acids, including isoleucine, leucine, valine, tyrosine, and phenylalanine, as highly significant predictors of future diabetes [11,12]. Our previous work also characterized the urinary carbohydrate metabolism of DM patients with different TCM syndromes and identified biomarkers that differed from those of non-DM patients [13]. Xu et al. found that three TCM syndromes, namely, Qi-deficiency, Qi and Yin-deficiency, and damp heat, can be separated using metabolomics technology and that such differences are reflected in plasma fatty acids and lipid parameters [14]. Wei et al. designed an explorative study of 50 prediabetic males and reported more disturbances of carbohydrate metabolism and renal function in the subtype "Qi-Yin deficiency with stagnation" than in "Qi-Yin deficiency with dampness" [15]. However, despite these many investigations, the differences between the syndromes of diabetes remain far from clear [16].
In this study, we applied a metabolomics approach using ultraperformance liquid chromatography (UPLC) coupled with quadrupole time-of-flight (QTOF) mass spectrometry to characterize the metabolic alterations of different TCM syndromes, including excess and deficiency, in patients diagnosed with DM and to discover biomarkers that further clarify the underlying nature of TCM syndromes. [17]. TCM syndromes, including deficiency and excess syndromes, were differentiated according to the guidelines [18]. Patients suffering from other serious diseases involving major organs or from infectious diseases were excluded from the study. Patients presenting with deficiency and excess syndrome simultaneously were also excluded.

Methods
The detailed inclusion and exclusion criteria are described in our previous work [13]. Altogether, 295 subjects with T2DM (238 deficiency and 57 excess samples) were recruited to the study. All subjects provided written informed consent. The ethics committee of the hospital approved the study plan, and the study complied with the Declaration of Helsinki.

Biochemical Analysis.
Body mass index (BMI) was calculated as weight (kg) divided by height (m) squared. The waist-hip ratio (WHR) was defined as the waist circumference (cm) divided by the hip circumference (cm). Systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured using standard mercury sphygmomanometers on the right arm of seated participants. Serum fasting glucose, triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), very low-density lipoprotein cholesterol (VLDL-C), alanine aminotransferase (ALT), and 2 h PG were analyzed using an automatic bioanalyzer (Hitachi 7180, Tokyo, Japan). Liver ultrasound examinations were all carried out on the same equipment (Aloka 1700, Japan).
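The two anthropometric indices defined above can be written out directly. The following is a minimal sketch; the function names and the example participant are hypothetical and are not taken from the study data.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def waist_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist-hip ratio: waist circumference divided by hip circumference."""
    if hip_cm <= 0:
        raise ValueError("hip circumference must be positive")
    return waist_cm / hip_cm


# Hypothetical participant (illustrative values only).
print(round(bmi(70.0, 1.75), 2))              # 22.86
print(round(waist_hip_ratio(90.0, 100.0), 2))  # 0.9
```

Both circumferences must be given in the same unit, since the ratio is dimensionless.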

Chemicals.
HPLC grade methanol, acetonitrile, and formic acid were purchased from Merck Chemicals (Darmstadt, Germany). L-chlorophenylalanine was purchased from Sigma-Aldrich (St. Louis, MO). Ultrapure water was produced by a Milli-Q water system (Millipore, Billerica, USA).

Serum Sample Preparation for UPLC-QTOFMS.
Fasting serum samples were obtained and prepared strictly according to the previous work [19]. An aliquot of 100 μL of serum was mixed with 400 μL of a mixture of methanol and acetonitrile (5 : 3, containing 0.1 mg/mL L-chlorophenylalanine as the internal standard). The mixture was then vortexed for 2 min, allowed to stand for 10 min, and centrifuged at 14 500 g for 20 min. The supernatant was used for UPLC-QTOFMS analysis.

UPLC-QTOFMS Spectral Acquisition of Serum Samples and Data Preprocessing.
A Waters ACQUITY ultraperformance liquid chromatography (UPLC) system equipped with a binary solvent manager and a sample manager (Waters Corporation, Milford, MA, USA), coupled to a QTOF mass spectrometer with an electrospray interface (Waters Corporation, Milford, MA), was used throughout the study as previously described [19]. All chromatographic separations were performed with an ACQUITY BEH C18 column (1.7 μm, 100 × 2.1 mm internal dimensions, Waters). The column was maintained at 50 °C, and the injection volume of all samples was 5 μL. The LC elution conditions were optimized as follows: linear gradient from 1 to 20% B (0-1 min), 20 to 70% B (1-3 min), 70 to 85% B (3-8 min), 85 to 100% B (8-9 min), and isocratic at 100% B (9-9.5 min) with a flow rate of 0.4 mL/min. (A) Water with 0.1% formic acid and (B) acetonitrile with 0.1% formic acid were used for positive ion mode (ESI+), while (A) water and (B) acetonitrile were used for negative ion mode (ESI−). The mass spectrometer was operated with source and desolvation temperatures set at 120 °C and 300 °C, respectively. The desolvation gas was set at a flow rate of 600 L/hr. The capillary voltage was set to 3.2 and 3 kV and the cone voltage to 35 and 50 V in the positive and negative ion modes, respectively.
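The elution program above is a piecewise-linear gradient, which can be captured as a table of breakpoints. The sketch below is only an illustration: the breakpoints are transcribed from the text, while the function name and the interpolation helper are hypothetical and do not represent instrument control software.

```python
# (time in min, % mobile phase B) breakpoints transcribed from the text.
GRADIENT = [(0.0, 1.0), (1.0, 20.0), (3.0, 70.0),
            (8.0, 85.0), (9.0, 100.0), (9.5, 100.0)]
FLOW_RATE_ML_MIN = 0.4  # flow rate stated in the text


def percent_b(t_min: float) -> float:
    """Linearly interpolate the %B composition at time t_min."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable for a contiguous gradient table")


print(percent_b(2.0))   # 45.0, halfway through the 20-70% B segment
print(percent_b(9.25))  # 100.0, isocratic hold
```

Writing the program as data makes it easy to check that consecutive segments share their end points, as they do here.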
The UPLC-MS raw data were processed using MarkerLynx 4.1 (Waters, Manchester, UK) with the parameters described in previous work [20][21][22][23]. After the ion peaks generated by the internal standard were removed, the data were normalized by dividing each peak intensity by the sum of all peak intensities within the sample. A data matrix consisting of the retention time, m/z value, and normalized peak area was then exported for multivariate statistical analysis using the K-OPLS package (available at http://kopls.sourceforge.net/download.shtml) and the Statistics Toolbox of MATLAB (version 7.1, MathWorks Inc.). Compound annotation was carried out by comparing the retention time, molecular weight, preferred adducts, and in-source fragments with our in-house reference standard library (∼800 mammalian metabolite standards available) and web-based resources, including the Human Metabolome Database (http://www.hmdb.ca/).
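The normalization step described above (drop the internal-standard peaks, then divide each peak by the total intensity of its sample) can be sketched as follows. This is an illustration only: the function name, the row-per-sample layout, and the example numbers are assumptions, and the actual processing was done in the vendor software.

```python
def normalize_by_total_intensity(peak_table, internal_standard_columns):
    """Drop internal-standard columns, then scale each sample to sum to 1.

    peak_table: list of rows, one row of raw peak areas per sample.
    internal_standard_columns: column indices belonging to the internal standard.
    """
    drop = set(internal_standard_columns)
    normalized = []
    for row in peak_table:
        kept = [area for j, area in enumerate(row) if j not in drop]
        total = sum(kept)
        if total <= 0:
            raise ValueError("sample has no signal after filtering")
        normalized.append([area / total for area in kept])
    return normalized


# Hypothetical sample with four features; column 2 is the internal standard.
raw = [[10.0, 30.0, 999.0, 60.0]]
print(normalize_by_total_intensity(raw, [2]))  # [[0.1, 0.3, 0.6]]
```

After this step every sample sums to 1, so differences in overall signal between injections no longer dominate the comparison between groups.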

Data Analysis.
General and clinical data were expressed as mean ± standard deviation (S.D.). Differences between group means were analyzed using the independent samples t-test for continuous variables and the Pearson chi-square test for categorical variables in SPSS 17.0 (SPSS, Chicago, Illinois, USA), with a two-sided P value of <0.05 considered statistically significant.
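For a categorical variable with two levels, the Pearson chi-square test reduces to a 2 × 2 table with one degree of freedom. The sketch below shows that special case without continuity correction; the function name and the counts are hypothetical, and the study itself used SPSS rather than this code.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns the statistic and the two-sided P value."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one count")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows are two groups, columns are condition yes/no.
chi2, p = pearson_chi_square_2x2(20, 30, 30, 20)
print(round(chi2, 3), round(p, 4))  # 4.0 0.0455
```

With the 0.05 threshold used in the text, this hypothetical table would count as a significant difference.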
Differential metabolites between groups were identified from the UPLC-QTOFMS data by combining a synthetic minority oversampling technique (SMOTE) bagging rebalancing method with a genetic algorithm (GA) coupled to kernel-based orthogonal projections to latent structures (K-OPLS). The protocol followed our previous work [13,24] and the related literature [25,26], as shown in Figure S1. First, because the data were unbalanced, the SMOTE algorithm was used to generate synthetic samples for the smaller class and balance the whole data set. The nearest neighbor parameter k of SMOTE was optimized using 50% of the samples, randomly selected by class, as the training set and the original sample data as the test set, with the balanced prediction error of the GA-KOPLS algorithm on the test set as the fitness function.
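The core idea of SMOTE is to create each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbors. The following is a minimal sketch of that basic step under stated assumptions: it is plain SMOTE, not the SMOTE bagging variant used in the study, and the function name, seed handling, and example points are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic samples by interpolating between a random
    minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Hypothetical two-feature minority class.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(points, n_synthetic=5, k=3)
print(len(new_points))  # 5
```

Because every synthetic point lies on a segment between two real minority samples, a small k keeps the new points close to local structure, while a large k spreads them across the class; this is why k is worth tuning.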
Secondly, with the optimized SMOTE parameter, the GA-KOPLS algorithm was used to build the model and simultaneously select the important variables (metabolites), with the balanced prediction error on the test set as the fitness function to be minimized. Classifier accuracy during each GA run was evaluated by cross-validation [27].
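The text does not spell out how the balanced prediction error is computed. One common definition, assumed here, is the mean of the per-class error rates, which prevents the majority class from dominating the fitness value. The function name and labels below are hypothetical.

```python
def balanced_error(y_true, y_pred):
    """Mean of the per-class misclassification rates (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    per_class = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(wrong / len(idx))
    return sum(per_class) / len(per_class)


# Hypothetical unbalanced test set: a classifier that always predicts the
# majority class has a plain error of 0.2 but a balanced error of 0.5.
truth = ["deficiency"] * 8 + ["excess"] * 2
always_majority = ["deficiency"] * 10
print(balanced_error(truth, always_majority))  # 0.5
```

Under this definition a GA that minimizes the fitness cannot succeed by ignoring the smaller class.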
Thirdly, the important variables selected by the GA were applied to a K-OPLS algorithm for classification, and the model parameters, namely, the Gaussian kernel function parameter (σ) and the number of Y-orthogonal components (Ao), were optimized by internal tenfold cross-validation on the training set. The kernel matrix was centered before model estimation. The samples from each training set were classified in turn, with the samples being classified excluded from the selected samples in the training set. The prediction accuracy on the original data set, AUC, sensitivity, and specificity were used to evaluate the K-OPLS model performance. Details of the model are provided in the previous work [13].
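Two ingredients of the kernel step can be illustrated in a few lines: building a Gaussian kernel matrix for a given width and centering it. The sketch assumes the common form k(x, y) = exp(−‖x − y‖² / (2σ²)), which may differ in scaling from the K-OPLS package, and the function names and sample values are hypothetical.

```python
import math


def gaussian_kernel_matrix(X, sigma):
    """Pairwise Gaussian kernel, assuming exp(-||x - y||^2 / (2 sigma^2))."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return K


def center_kernel(K):
    """Center K in feature space: K - row means - column means + grand mean."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - col_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


# Hypothetical samples described by two normalized peak areas each.
X = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]
Kc = center_kernel(gaussian_kernel_matrix(X, sigma=2.5))
print([round(sum(row), 12) for row in Kc])  # each row sums to about zero
```

Centering the kernel matrix plays the same role as mean-centering the variables in a linear model, which is why it is applied before the model is estimated.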
Finally, a frequency-based variable significance test was performed on the GA results: P values were calculated from the binomial probabilities of variables being selected in the 50 independent runs, to identify the metabolites with significant influence on the classification. One has where = number of runs, V = number of variables, and = mean number of times variables are selected, rounded to an integer. For details, see the literature [28]. In addition, the metabolites selected from the model were validated at the univariate level with the nonparametric Wilcoxon rank sum test, with the critical P value set to 0.05.
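The selection-frequency test can be illustrated with a binomial tail probability. The sketch below is one plausible reading and not the authors' exact formula: it assumes that, under the null hypothesis, a variable is picked in any single run with probability equal to the mean number of selected variables divided by the number of variables. All names and example values other than the 50 runs are hypothetical.

```python
from math import comb


def selection_p_value(times_selected, n_runs, n_variables, mean_selected):
    """P(X >= times_selected) for X ~ Binomial(n_runs, mean_selected / n_variables).

    Assumed null model: each run picks any given variable by chance with
    probability mean_selected / n_variables.
    """
    if not 0 <= times_selected <= n_runs:
        raise ValueError("times_selected must lie between 0 and n_runs")
    p = mean_selected / n_variables
    return sum(comb(n_runs, i) * p ** i * (1.0 - p) ** (n_runs - i)
               for i in range(times_selected, n_runs + 1))


# 50 runs as in the text; 133 variables and 13 selected per run are
# illustrative assumptions only.
for hits in (5, 15, 30):
    print(hits, selection_p_value(hits, 50, 133, 13))
```

A variable chosen far more often than chance would predict receives a small P value, which is the sense in which frequently selected metabolites are called significant.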

Results

Clinical Characteristics of Patients.
The clinical characteristics of the subjects in each group are summarized in Table 1.
The clinical characteristics of this subset of subjects did not differ significantly between the groups at baseline, except for age, BMI, and the co-occurrence of fatty liver disease, which were significantly higher in the deficiency group than in the excess group.

UPLC-QTOFMS Analysis of Serum Metabolite Profiles.
The ESI positive ion mode was more efficient with a significantly greater number of serum metabolites detected than the ESI negative ion mode and, therefore, was selected for the full scan detection mode. Among a total of approximately 6680 metabolite features obtained from the UPLC-QTOFMS, 133 metabolites were identified with our in-house reference standard library and further verified by available reference standards. Their peak areas were integrated for further multivariate analysis.

Classification of the K-OPLS Models.
In the present study, the nearest neighbor parameter k of SMOTE was 3 after optimization for the DM patients with excess or deficiency syndrome. A K-OPLS model was fitted using the Gaussian kernel function with the important variables selected by the GA. The GA parameters, namely, the initial population size, the number of generations, the initial variable selection ratio, and the single-point crossover probability, were 30, 150, 0.1, and 0.7, respectively (Table 2). The accuracy of classification of cross-validation (ACCV) was calculated for each combination of σ and Ao, which were optimized using 10-fold cross-validation. ACCV was the largest when σ = 2.5 and Ao = 3 for DM patients with excess and deficiency syndrome (Figure 1(a)). Table 2 shows the R2X, R2Y, Q2Y, AUC, sensitivity, and specificity used in evaluating all the calibration models of the two groups. R2Xcum and R2Ycum represent the cumulative sum of squares of all the X's (metabolic data) and Y's (disease category data) explained by all extracted components. Q2Ycum is an estimate of how well the model predicts the Y's [28]. High values of R2Y and Q2Y represent good prediction [29]. As displayed by the score plots of K-OPLS (Figure 1(b)), the two sample groups can be separated into distinct clusters, indicating differences in the metabolic response of serum samples between the DM patients with excess and deficiency syndrome. The model statistics R2X = 0.425, R2Y = 1.000, and Q2Y = 0.944 suggest a highly predictive and general model (Table 3). Because a nonlinear method was used in the present study, R2X is less meaningful; instead, AUC is the major indicator of model accuracy for a nonlinear method. AUC = 0.968 (95% confidence interval = 0.950-0.987) indicated that the model had high accuracy (Table 3).
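The performance measures named above can be computed from predicted scores and labels. The following sketch uses the rank-based definition of AUC, that is, the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function names, scores, and threshold are hypothetical and are unrelated to the reported values.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifier scores for three positive and three negative samples.
pos, neg = [0.9, 0.8, 0.4], [0.7, 0.3, 0.2]
print(round(auc(pos, neg), 3))  # 0.889
labels = [1, 1, 1, 0, 0, 0]
predicted = [1 if s >= 0.5 else 0 for s in pos + neg]
print(sensitivity_specificity(labels, predicted, positive=1))
```

Unlike accuracy, AUC does not depend on a particular decision threshold, which makes it a convenient summary for an unbalanced two-class problem.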

Representative Differential Metabolites Based on Multivariate and Univariate Analysis.
The metabolites contributing to the separation between groups in the UPLC-QTOFMS analysis were selected according to the criteria of multivariate statistics (GA, P < 0.001, Figure 2(a)) and nonparametric univariate statistics (Wilcoxon rank sum test, P < 0.05, Figure 2(b)). Four metabolites, namely, 2-indolecarboxylic acid, hypotaurine (HTAU), pipecolic acid, and progesterone, differed between DM patients with excess and deficiency syndrome (Figure 3). The serum levels of these four metabolites were higher in DM patients with excess syndrome than in those with deficiency syndrome.
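The univariate check can be illustrated with a Wilcoxon rank sum test based on the normal approximation. This sketch uses midranks and a tie correction but no continuity correction, so its P values may differ slightly from those of statistical packages; the function name and the example values are hypothetical.

```python
import math


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (W, z, p) where W is the rank sum of the first sample.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0  # average rank of the tied block
        for idx in range(i, j + 1):
            ranks[idx] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    mean_w = n1 * (n + 1) / 2.0
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p


# Hypothetical normalized peak areas of one metabolite in two groups.
w, z, p = wilcoxon_rank_sum([1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0])
print(w, round(z, 3), p < 0.05)  # 15.0 -2.611 True
```

A rank-based test is a natural choice here because peak areas are often skewed and the two groups differ greatly in size.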

Discussion
Serum patterns of metabolites reflect the homeostasis of the organism to some extent. Metabolomics, a discipline dedicated to the global study of metabolites, may deepen our understanding of human health and diseases. In the present study, we found that four metabolites can differentiate two TCM syndromes in DM that cannot be distinguished by the clinical biochemical indicators. The clear separation of the two groups by both TCM symptoms and metabolic profiles indicates that excess and deficiency syndromes each have their own material basis.
The clinical characteristics of this subset of subjects in Table 1 showed no significant difference between the deficiency and excess groups at baseline in terms of sex, WC, HC, WHR, DBP, TG, ALT, VLDL, and HDL levels; the exceptions were age, BMI, and the co-occurrence of fatty liver disease, which were significantly higher in the deficiency group than in the excess group. The higher age in the deficiency group is in accordance with the clinical TCM theory that Qi, Xue, Yin, and Yang are more insufficient in older than in younger persons. This result shows that it is difficult to differentiate the TCM syndromes using clinical biochemical indicators. Hence, novel approaches for differentiating syndromes are urgently needed. Nontargeted metabolomics provides a global view of the organism and also provides an opportunity to stratify the different TCM syndromes, as in our previous work [13].
a Accuracy of classification of cross-validation (ACCV) produced from each combination of σ and Ao parameters after cross-validation. b R2Xcum and R2Ycum represent the cumulative sum of squares (SS) of all the X's and Y's explained by all extracted components. c Q2Ycum is an estimate of how well the model predicts the Y's. d An AUC of 0.5∼0.7 indicates lower accuracy, an AUC of 0.7∼0.9 indicates moderate accuracy (the model can be accepted), and an AUC of more than 0.9 indicates high accuracy. When AUC = 0.5, the model has no value.
In the present study, we performed UPLC-QTOFMS-based serum metabolic profiling combined with GA-KOPLS analysis on DM patients with different syndromes, and four metabolites were eventually found to differ between the two TCM syndromes. To exclude the effects of age and BMI, we split the subjects into two subsets at a cutoff of 70 for age and of 25 for BMI. The Mann-Whitney test showed no significant difference in the four differential metabolites between the subsets (P > 0.05, Supplementary Table S1). Furthermore, we fitted the K-OPLS model using the same parameters. Whether age was >70 or ≤70 (n = 161 versus 134), the deficiency group and the excess group could be distinctly separated by the classification (Supplementary Figure S2). Similar results were found for BMI ≥25 or <25 (n = 153 versus 142) (Supplementary Figure S3). These results suggest that age and BMI, although significantly different between the deficiency and excess groups, do not affect our final metabolomics results.
HTAU is a product of the enzyme cysteamine dioxygenase in the taurine and hypotaurine metabolic pathway. It may function as an antioxidant and a protective agent under physiological conditions [31,32], and it prevents peroxynitrite-induced tyrosine nitration to 3-nitrotyrosine and oxidation to dityrosine. Nitration and oxidation of tyrosine residues in proteins have been detected in several conditions of oxidative stress that involve the overproduction of NO and oxygen radicals. Hence, it is tempting to postulate that the protection afforded by HTAU against tyrosine modification may have important physiological significance. Gossai and Lau-Cam compared taurine, aminomethanesulfonic acid, homotaurine, and HTAU for their ability to modify indices of oxidative stress and membrane damage associated with T2DM. Relative to control values, taurine and its congeners were equiprotective in reducing membrane damage, the formation of intracellular malondialdehyde and oxidized glutathione, and the decreases in reduced glutathione and antioxidative enzyme activities in diabetic erythrocytes [33].
Pipecolic acid (piperidine-2-carboxylic acid), the carboxylic acid of piperidine, is a small organic molecule which accumulates in pipecolic acidemia. It is a metabolite of lysine found in human physiological fluids such as urine and serum. However, it is uncertain whether pipecolic acid originates directly from food intake or from mammalian or intestinal bacterial enzyme metabolism.
Progesterone, converted from pregnenolone, serves as an intermediate in the biosynthesis of gonadal steroid hormones and adrenal corticosteroids. Progesterone has been observed to have antioxidant properties, reducing the concentration of oxygen free radicals [34]. Recently, crosstalk between progesterone and melatonin has been observed in various preclinical studies. Melatonin is reported to increase the progesterone level and the expression of progesterone receptors in reproductive tissues [35].
Interestingly, the above four metabolites are all related to oxidative stress. We suggest that diabetic patients with excess syndrome may have more severe systemic oxidative stress than those with deficiency syndrome. Understanding syndromes is central to developing more efficient therapeutic strategies, classifications, and diagnostic criteria for patients. It will contribute to TCM syndrome objectification and standardization and thereby to better diagnosis and therapy of disease. Further investigations with larger sample sizes are needed to confirm our findings.

Conclusion
The present study provides an improved understanding of the systemic alteration of metabolites in different syndromes of DM. It also indicates that metabolomics may be helpful in establishing a suitable model for evaluating disease syndromes, exploring the pathological mechanisms of syndromes, and clarifying the relationships between syndromes and related diseases. Furthermore, the identified serum metabolites may be of clinical relevance for subtyping diabetic patients, leading to personalized DM treatment.