Traditional Chinese Medicine Constitution Discrimination Model Based on Metabolomics and Random Forest Decision Tree Algorithm

Constitution refers to the comprehensive and relatively stable characteristics of morphological structure, physiological function, and psychological state, inherited or acquired, over the course of an individual's life. A dedicated metabolomics data processing method was established to find the m/z values unique to each constitution. Combined with the random forest decision tree algorithm, a discrimination model of the 9 constitutions in traditional Chinese medicine was constructed, verified, and tested. The test results show that the classification accuracy of each constitution is higher than 80%, indicating that the model can identify the nine constitutions of traditional Chinese medicine well. The classification accuracy is related to the difficulty of distinguishing between constitutions. In summary, this study provides a fast and accurate method to distinguish the constitutions of traditional Chinese medicine, provides an objective representation for the clinical classification and judgment of constitution, and provides a scientific basis for the modernization of traditional Chinese medicine.


Introduction
Constitution refers to the comprehensive and relatively stable characteristics of an individual's morphological structure, physiological function, and psychological state, formed by heredity or acquired over the course of life. In the physiological state, differences in constitution are reflected in differences in response and adaptability to external stimuli, as well as in susceptibility to certain pathogenic factors and in the tendency of disease development. The study of constitution is helpful to analyze the Xueyu constitution, and Tebing constitution, among which Pinghe constitution is a relatively healthy constitution and the rest are biased constitutions [3].
Pinghe (peaceful) constitution [4][5][6]: a robust physical state characterized by moderate build, ruddy complexion, and abundant energy. Qixu (qi deficiency) constitution: a physical state characterized by weak breath and low function of the body and viscera. Tanshi (phlegm dampness) constitution: a physical state characterized by stagnation of water and fluid and condensation of phlegm and dampness, with viscosity and turbidity as the main features. Shire (damp heat) constitution: a physical state characterized by damp heat. Yangxu (Yang deficiency) constitution: a physical state characterized by deficiency of Yang qi and deficiency cold. Qiyu (qi depression) constitution: a physical state characterized by introversion, instability, melancholy, fragility, sensitivity, and paranoia due to long-term emotional stagnation and qi stagnation. Yinxu (Yin deficiency) constitution: a physical state characterized by Yin deficiency and internal heat, due to deficiency of Yin fluids such as body fluid, essence, and blood. Xueyu (blood stasis) constitution: a physical state in which there is a latent tendency toward poor blood circulation, or a pathological basis of internal obstruction by blood stasis, accompanied by a series of external signs. Tebing (special) constitution: a specific constitution, mostly referring to physical defects caused by congenital and genetic factors, including congenital and genetic physiological defects, congenital and genetic diseases, allergic reactions, and primary immune defects.
Compared with the highly subjective TCM syndrome diagnosis criteria and syndrome differentiation index system, metabolomics shows the overall changes of metabolic products in the body and infers the internal changes of the body from the whole. This holistic character is consistent with the TCM thinking modes of "holistic view," "syndrome differentiation and treatment," and "inferring the interior from the exterior" [7]. Through dynamic tracking analysis of metabolites in the metabolic cycle, with the help of multivariate statistical analysis, metabolomics can promptly and sensitively reflect changes in the functional status of organisms caused by internal and external factors such as genes and environment, providing objective information for clinical practice. Compared with other omics, metabolomics is closer to phenotype and has similarities with TCM constitution identification [8][9][10]. The random forest (RF) algorithm is an ensemble learning method. Taking the CART decision tree as the base classifier, RF introduces random selection of samples and random selection of features into the training of each decision tree to form multiple decision trees. The final result is the mode (for classification) or the average (for regression) of the outputs of all decision trees in the forest. RF offers high classification accuracy, strong generalization, and fast training, so it can be used for classification, regression, and feature importance analysis. Therefore, we combine metabolomics with the RF algorithm to make a preliminary analysis of TCM constitution from a "scientific" and "objective" perspective, which will bring a more comprehensive understanding of the phenomenon of TCM constitution [11][12][13].
This study recruited volunteers representing the 9 constitutions of TCM, analyzed the serum of all volunteers with the UPLC-Q-TOF-MS technique, and established a discriminant model of the 9 constitutions through the RF algorithm. This provides a new method and idea for the study of constitution classification and an objective basis for revealing the principles behind the theoretical constitution classification of traditional Chinese medicine.

Volunteer Recruitment and Ethics Approval Number.
This study recruited members of the general population at two centers (Jiangxi University of Chinese Medicine and Physical Examination Department of Affiliated Hospital of Jiangxi University of Chinese Medicine) through word of mouth, recruitment posters, and public account publicity [14]. Volunteers who met the inclusion criteria were informed of the details of the experiment and signed informed consent before enrollment. The ethics approval number is JZFYLL20200914015.

Constitution Diagnostic Criteria.
The DS01-A tongue and pulse information acquisition and constitution identification system was adopted, combined with the standard ZYYXH/T157-2009 (Classification and Determination of TCM Constitution) issued by the China Association of Traditional Chinese Medicine, and the Table of Classification and Determination of TCM Constitution in the standard was filled in. The criteria for determining the Pinghe (mild) constitution and the 8 biased constitutions are shown in Table 1.

Inclusion Criteria
(1) Meet the classification criteria of TCM constitution
(2) General population aged ≥18 years old
(3) Voluntarily participate and sign informed consent

Exclusion Criteria
(1) People with mixed constitution
(2) Pregnant or lactating women
(3) Those with mental, cognitive, or conscious disorders who cannot cooperate to complete the test
(4) Patients with serious primary diseases of heart, brain, and hematopoietic system, serious liver and kidney

Sample Collection.
The volunteers did not take any drugs before blood collection on the specified date of sample collection and took no food after 10 pm prior to blood collection. After collection, the blood was left to stand for 3 h and then centrifuged; the supernatant was collected and stored at −80°C.

Sample Preparation.
Frozen serum samples were thawed at room temperature. Then, 100 μL of the sample was placed in centrifuge tubes, and 300 μL of methanol was added. The tubes were vortexed for 1 min, incubated for 3 h at 4°C, and then centrifuged (21300 × g, 10 min, 4°C). The supernatants were collected and dried by SpeedVac®, and the residues were reconstituted in 200 μL of methanol:water (15:85). Then, the samples were vortexed for 1 min and centrifuged (30000 × g, 15 min, 4°C). The supernatants were collected and subsequently analyzed following a previously described UPLC-Q-TOF-MS-based untargeted metabolic profiling strategy [15][16][17].
(2). Q-TOF-MS Conditions. The ionization source temperature was 120°C. The cone gas flow rate was 50 L·h⁻¹. The desolvation temperature was 400°C, and the desolvation gas flow rate was 800 L·h⁻¹. In positive and negative ion modes, the capillary voltage was 3.0 kV and 2.5 kV, respectively, and the cone voltage was 40 V. In the positive mode, the extraction cone voltage was 80 V. The compensating voltage was 80 V in the negative ion mode. The mass range was 50 to 1000 Da. To ensure mass accuracy and reproducibility, sodium formate standard was used to establish the mass axis calibration curve, and leucine enkephalin was used for real-time mass correction. The collision gas for tandem mass spectrometry was argon, with a low collision energy of 4 eV and a high collision energy of 20-40 eV.
In this experiment, the stability of quality control (QC) samples was investigated to ensure the stability of instrument detection. Five QC samples were run before sample injection so that the instrument could reach a stable state.

Instrument Stability Investigation and Data Processing.
Pearson analysis was performed on 6 QC samples obtained in positive and negative modes, respectively, and correlation coefficients were calculated to investigate the stability of instrument detection.
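As a minimal sketch of this stability check, the pairwise Pearson coefficients between QC runs could be computed as below. The function names and the toy intensity vectors are illustrative assumptions, not the software or data used in this study; each QC run is assumed to be a list of peak intensities aligned on the same m/z features.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_qc_correlations(qc_runs):
    """Return {(i, j): r} for every pair of QC runs (hypothetical helper)."""
    return {
        (i, j): pearson(qc_runs[i], qc_runs[j])
        for i, j in combinations(range(len(qc_runs)), 2)
    }


# Toy data (made up for illustration): three QC runs over four aligned features.
qc_runs = [
    [100.0, 250.0, 80.0, 400.0],
    [102.0, 248.0, 79.0, 405.0],
    [99.0, 252.0, 82.0, 398.0],
]
correlations = pairwise_qc_correlations(qc_runs)
# The 0.99 threshold mirrors the value reported for the QC samples in this study.
stable = all(r > 0.99 for r in correlations.values())
```

With 6 QC samples, the same helper would return 15 pairwise coefficients per ion mode.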

Data Processing.
After the data were imported into Progenesis QI 2.0 software (Waters Corporation, USA), a text file with retention time and mass-to-charge ratio information was obtained [23]. Variable quality control: based on the QC group samples, variables with a coefficient of variation greater than 30% in the QC samples were deleted. The preprocessed data matrix was imported into GraphPad Prism 8.0 software (GraphPad Software, USA), and each of the eight biased constitutions was compared with the Pinghe constitution to obtain the m/z values that differed significantly between that biased constitution and the Pinghe constitution. The m/z values common to the eight biased constitutions were found through the intersection method, and these common m/z values were removed to obtain the unique m/z values of the eight biased constitutions. This step was completed on the jvenn website (https://jvenn.toulouse.inra.fr/app/example.html), whose Venn diagrams make the result more intuitive [24][25][26].

The unique m/z values of the eight biased constitutions obtained in 2.2.8 were used to establish the model, and the missing values in the data set were filled with the mean value of the corresponding feature (m/z) within each constitution. A total of n training sets are obtained by random sampling of the original training set with replacement. For each training set, m features are randomly selected to obtain n classification models, and the final classification is then determined by voting.
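The sampling-and-voting scheme described above can be illustrated with a small standard-library sketch. This is not the implementation used in this study: to keep the example short, a nearest-centroid classifier stands in for the CART base learner, and all function names, parameter values, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) samples with replacement."""
    picks = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in picks], [labels[i] for i in picks]


def fit_centroids(rows, labels, features):
    """Stand-in base learner (NOT CART): per-class mean over the chosen features."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append([row[f] for f in features])
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in grouped.items()
    }


def predict_centroids(centroids, features, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    point = [row[f] for f in features]
    return min(
        centroids,
        key=lambda label: sum((p - c) ** 2 for p, c in zip(point, centroids[label])),
    )


def fit_ensemble(rows, labels, n_models, m_features, seed=0):
    """Train n_models learners, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_models):
        boot_rows, boot_labels = bootstrap_sample(rows, labels, rng)
        features = rng.sample(range(len(rows[0])), m_features)
        ensemble.append((features, fit_centroids(boot_rows, boot_labels, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Majority vote over the individual model predictions."""
    votes = Counter(predict_centroids(c, f, row) for f, c in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated classes over three features.
rows = [[0.1, 0.2, 0.0], [0.0, 0.1, 0.3], [0.2, 0.0, 0.1], [0.3, 0.1, 0.2],
        [5.0, 5.2, 4.9], [5.1, 4.8, 5.0], [4.9, 5.0, 5.3], [5.2, 5.1, 4.8]]
labels = ["Qixu"] * 4 + ["Yangxu"] * 4
ensemble = fit_ensemble(rows, labels, n_models=25, m_features=2)
```

In practice an established random forest implementation with decision-tree base learners would replace the stand-in learner; the sketch only shows how bootstrap sampling, per-model feature subsets, and voting fit together.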

Building Discriminant Model. (1). The Discriminant Model of 9 Kinds of Constitution of TCM Was Established by RF Algorithm
(2). The 10-Fold Cross Validation Method Was Used to Test the Model. 10-fold cross-validation refers to dividing the data into 10 equal pieces. In each round, one of the pieces is selected as the test set and the remaining 9 pieces are used as the training set. This is repeated 10 times so that each piece serves once as the test set and nine times as part of the training set. The advantage of this method is that as much data as possible is used for training, the training set and test set in each round are independent of each other, and together the test sets completely cover the whole data set.
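A minimal sketch of how such a split could be generated is given below. The function name, the shuffling seed, and the round-robin assignment are assumptions for illustration; when the number of samples is not divisible by 10, the folds differ in size by at most one sample, and no stratification by constitution is attempted here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 9 groups of 49 volunteers gives 441 samples in total.
splits = list(k_fold_indices(441, k=10))
```

Each sample index appears in exactly one test fold, which is the coverage property described above.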
Classification accuracy is the probability that samples are correctly classified, and the calculation formula is as follows:
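Assuming the standard definition of classification accuracy, which matches the description above, the formula can be written as follows. The symbols N_correct (number of correctly classified samples) and N_total (total number of classified samples) are notation introduced here for illustration.

```latex
\[
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\%
\]
```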

Comparison of Clinical General Data.
The results of clinical examination showed that all 9 groups of volunteers met the requirements for inclusion, and there were no differences in indicators other than constitution. In the Qixu group, there were 12 males and 37 females, with a mean age of 26.41 years (standard deviation 4.11). In the Tanshi group, there were 16 males and 33 females, with a mean age of 28.98 years (standard deviation 4.41). In the Pinghe group, there were 12 males and 37 females, with a mean age of 26.96 years (standard deviation 4.37). In the Shire group, there were 16 males and 33 females, with a mean age of 26.73 years (standard deviation 2.94). In the Yangxu group, there were 10 males and 39 females, with a mean age of 27.57 years (standard deviation 4.33). In the Qiyu group, there were 13 males and 36 females, with a mean age of 26.69 years (standard deviation 2.59). In the Yinxu group, there were 14 males and 35 females, with a mean age of 26.31 years (standard deviation 4.75). In the Xueyu group, there were 15 males and 34 females, with a mean age of 26.67 years (standard deviation 2.14). In the Tebing group, there were 11 males and 38 females, with a mean age of 28.51 years (standard deviation 4.21). There were no significant differences in the distribution of sex and age among the nine constitution groups, as analyzed by the chi-square test and analysis of variance, respectively (Table 2). Figures 1 and 2 (uploaded separately) show the total ion chromatograms (TIC) of the 9 TCM constitutions in positive and negative ion modes, respectively.

Instrument Stability Investigation.
Pearson correlation coefficients were calculated between the 6 QC samples; the closer the correlation coefficient is to 1, the better the system stability and the higher the data quality. The correlation of QC samples is shown in Figure 3. As shown in the figure, R² is greater than 0.99 and close to 1 in both positive and negative modes, indicating that the system had good stability and high data quality during detection.

Table 3. The m/z values of the eight biased constitutions are obtained by comparing them with the Pinghe constitution.

m/z Values Unique to Eight Biased Constitutions.
Each biased constitution was intersected with the other seven biased constitutions to obtain the common m/z values of the eight biased constitutions. After the common m/z values were removed, the unique m/z values of the eight biased constitutions were obtained, as shown in Figures 4 and 5. Table 4 shows the number of unique m/z values of the eight biased constitutions.
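One possible reading of this step is that an m/z value is unique to a constitution when it appears in no other constitution's list of differential features. Under that assumption, the operation reduces to set differences, as in the sketch below. The function name, the rounding of m/z values, and the toy numbers are illustrative assumptions; the study itself used the jvenn website.

```python
def unique_features(differential):
    """Map each constitution to the m/z values found in no other constitution.

    differential: dict of constitution name -> set of m/z values that differ
    significantly from the Pinghe constitution (assumed already rounded so
    that equal features compare equal).
    """
    unique = {}
    for name, features in differential.items():
        others = set().union(
            *(f for other, f in differential.items() if other != name)
        )
        unique[name] = features - others
    return unique


# Toy differential lists (made up for illustration).
differential = {
    "Qixu": {100.1, 200.2, 300.3},
    "Yangxu": {200.2, 400.4},
    "Yinxu": {200.2, 300.3, 500.5},
}
unique = unique_features(differential)
```

Here 200.2 is shared by all three toy lists and 300.3 by two of them, so both are removed and each constitution keeps a single unique value.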

The Test Results.
The verification results of the 9 constitutions are shown in Figure 6. In the positive mode, the overall classification accuracy was 89.66%; the average classification accuracy of Yinxu constitution was the highest, at 97.78%, and that of Qiyu constitution was the lowest, at 86.38%. In the negative mode, the overall classification accuracy was 90.68%; the average classification accuracy of Yangxu constitution was the highest, at 99.38%, and that of Qiyu constitution was the lowest, at 83.35%.

Discussion
According to the test results of the model, the overall classification accuracy and the average classification accuracy of each constitution are both higher than 80%, indicating that the constructed model can distinguish the 9 constitutions of TCM well; the classification accuracy may be related to the difficulty of distinguishing each constitution. The classification accuracy of Yinxu constitution and Yangxu constitution is high, which may be because these two constitutions are easily distinguished from the others [27,28]. The classification accuracy of Qiyu constitution is low in both models, which may be related to the difficulty of distinguishing Qiyu constitution from other constitutions or to the number of characteristic m/z values of Qiyu constitution [29]. The nine constitution discrimination models of traditional Chinese medicine constructed in this study are based traditional Chinese medicine constructed by us can continuously expand the data, continue to receive the characteristic information of each constitution, and is expected to distinguish many samples to achieve the purpose of scientifically, objectively, and quickly identifying the nine constitutions of traditional Chinese medicine. However, because the sample size used for modeling is currently small, larger-scale research is needed to further explore the differences in the mechanism of each constitution and to serve clinical constitution identification [30,31].

Evidence-Based Complementary and Alternative Medicine

By studying the physiological and pathological reaction states of the human body and individual differences, TCM constitution theory divides people into different constitution types, discusses the relationship between constitution types and diseases, and then prevents and treats diseases by intervening in different constitutions. Tian et al. [32] analyzed and discussed the historical evolution of TCM constitution identification, its guidance for syndrome differentiation and treatment, and its guiding significance for the prevention and treatment of COVID-19. They proposed applying TCM constitution identification theory to analyze the physical characteristics of different populations and to carry out early intervention and correction, which may reduce the incidence of COVID-19. They also proposed that the treatment of COVID-19 should be based on regional and climate differences, combined with the physical characteristics of different individuals, so as to combine syndrome differentiation with constitution differentiation, improve clinical efficacy, and provide a theoretical reference for the application of TCM constitution identification in the prevention and treatment of COVID-19. Xie et al. [33] showed that TCM constitution was a risk factor for the occurrence of cardiovascular disease in community residents, and that the risk of cardiovascular disease was significantly higher in people with biased constitutions such as qi deficiency, Yin deficiency, blood stasis, and blood deficiency. Hu et al. [34] showed that phlegm-dampness constitution, Yang deficiency constitution, and damp-heat constitution are the main biased constitution types of patients with polycystic ovary syndrome (PCOS), and that phlegm-dampness constitution, damp-heat constitution, and blood stasis constitution may be risk factors for the occurrence of PCOS. Yao et al. [35] showed that patients with alcoholic liver disease mainly had damp-heat constitution, while those with alcoholic hepatitis tended toward qi stagnation constitution and blood stasis constitution. The constitution of patients with alcoholic cirrhosis tends toward blood stasis constitution and phlegm-dampness constitution.

[Figure: The classification accuracy of 9 constitutions in positive mode]

Conclusion
In conclusion, these studies show that constitution types are closely related to diseases, and TCM constitution identification is an important method for determining individual constitution types. Therefore, it is possible to prevent and treat diseases through good constitution identification [36,37].
TCM constitution identification has played and will continue to play an important role in guiding TCM practitioners to prevent and treat human diseases. In this study, based on metabolomics data, a discriminant model of the nine constitutions of traditional Chinese medicine was constructed, and a new method and idea were proposed. The model database can continuously expand its data, continue to accept new feature information for each constitution, and is expected to be able to identify many samples so as to achieve objective, scientific, and rapid identification of constitution. Although this study can provide an objective basis for the identification of constitutions in traditional Chinese medicine, more research is still needed to further explore the differences in the mechanisms of the various constitutions.

Consent
The patients/participants provided their written informed consent to participate in this study. Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.

Disclosure
The funding bodies had no role in the study design, data collection, analysis, and interpretation of data.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
HCD contributed to sample processing, method optimization, visualization, original draft writing, and supervision.
CYF and NB participated in the model construction and algorithm optimization. LBT and XGL contributed to methodology, visualization, oversight, and review and editing. ZQY and JL contributed to model construction and method optimization. CLH and JH contributed to conceptualization, data curation, funding acquisition, and supervision. All authors read and approved the final manuscript.