A Study of Logistic Regression for Fatigue Classification Based on Data of Tongue and Pulse

Methods. The Tongue and Face Diagnosis Analysis-1 (TFDA-1) instrument and the Pulse Diagnosis Analysis-1 (PDA-1) instrument were used to collect tongue images and sphygmograms from healthy controls (n = 551), a subhealth fatigue population (n = 252), and a disease fatigue population (n = 1160). We analyzed the tongue and pulse characteristics and constructed classification models using logistic regression. Results. Subhealth fatigue and disease fatigue populations showed different tongue and pulse characteristics, and the logistic regression models based on tongue and pulse data showed good classification performance. The accuracies of the models for healthy controls vs subhealth fatigue, subhealth fatigue vs disease fatigue, and healthy controls vs disease fatigue were 68.29%, 81.18%, and 84.73%, and the AUCs were 0.698, 0.882, and 0.924, respectively. Conclusion. This study provides a new noninvasive method for fatigue diagnosis based on objective tongue and pulse data, and modern tongue diagnosis and pulse diagnosis have good application prospects.


Introduction
Fatigue refers to physical tiredness with a lack of energy or mental exhaustion with a lack of concentration, and it can be divided into physical fatigue and mental fatigue [1]. Fatigue is the leading cause of subhealth and one of the most common symptoms in primary care, and it is experienced by many patients with chronic hepatitis [2,3], depression [4], and various cancers [5]. Subhealth and a wide variety of diseases are accompanied by different degrees of fatigue, which negatively affects people's lives. With improvements in general medical care and living standards, public awareness of fatigue has grown; however, because objective diagnostic evidence is lacking, there is still no reliable and stable method for distinguishing disease fatigue from subhealth fatigue.
A large body of clinical practice and research has shown that the tongue and pulse can reflect the overall state of the body [6]. The tongue and pulse of people with fatigue have their own characteristics. Studies have shown that the tongue of patients with brain fatigue is usually dull, slow-moving, weak, or difficult to extend [7]. In a physical examination population, the tongues of people with fatigue showed specific features in tongue body color, tongue coating color, and tongue shape. Fatigue is closely related to tooth-marked tongue, and the degree of fatigue is positively correlated with the tooth-mark area [8].
The tongue of patients with fatigue syndrome shows obvious signs of stasis, mainly a purple tongue, petechiae or ecchymoses on the tongue, sublingual varices, and a thick white tongue coating [9]. In patients with chronic fatigue syndrome in Hong Kong, the tongue body is usually pale, enlarged, and dull, the tongue coating is thin white or thin greasy, and the pulse is usually deep and thin [10].
Intelligent diagnosis in traditional Chinese medicine (TCM) is a new research field that reflects the gradual development of TCM diagnostic methods towards intelligence and their potential application in clinical practice [11,12]. In recent research on tongue and pulse diagnosis, new diagnostic systems have been adopted to collect and analyze disease-related clinical data, and machine learning methods such as artificial neural networks [13,14], support vector machines (SVM) [15,16], and k-nearest neighbors (KNN) [17] have been used to build diagnostic models that can effectively assist doctors in diagnosing disease. In recent years, there have been increasingly many objective and standardized studies of fatigue based on tongue and pulse diagnosis [18][19][20].
Building on modern research on tongue and pulse diagnosis, this study aims to explore the distribution patterns of tongue and pulse data in disease fatigue and subhealth fatigue and to evaluate their contribution to fatigue diagnosis through modeling, so as to provide a new reference for convenient, noninvasive fatigue diagnosis. If an objective evaluation method based on tongue and pulse data can be established, it will play an important role in the clinical diagnosis of fatigue.

Study Design.
A total of 7,025 subjects were enrolled from January 2015 to December 2018 at the medical examination center of Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, and their Western medicine physical examination indices and TCM tongue and pulse data were collected. From these 7,025 subjects, 799 healthy controls, 361 subjects with subhealth fatigue, and 1,529 subjects with disease fatigue were identified. After excluding subjects with extreme outlier values in the tongue or pulse data, 551, 252, and 1,160 subjects remained in the healthy control, subhealth fatigue, and disease fatigue groups, respectively. The overall flow diagram of the study is shown in Figure 1.
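The text states only that subjects with extreme tongue or pulse values were excluded; the exclusion rule itself is not given. As an illustration, assuming Tukey's fences (values more than 1.5 × IQR below the first quartile or above the third quartile), a per-index filter could look like the sketch below. The rule, the 1.5 multiplier, the function names, and the toy TB-R values are all hypothetical and are not the authors' procedure.

```python
import statistics
from typing import Dict, List, Tuple

Subject = Dict[str, float]  # one subject: index name -> measured value


def tukey_fences(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (Q1 - k*IQR, Q3 + k*IQR) using the stdlib's default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_extreme_subjects(subjects: List[Subject], indices: List[str], k: float = 1.5) -> List[Subject]:
    """Keep only subjects whose value for every listed index lies inside the fences.

    HYPOTHETICAL exclusion rule: the source does not say how outliers were defined.
    """
    bounds = {name: tukey_fences([s[name] for s in subjects], k) for name in indices}
    return [
        s
        for s in subjects
        if all(bounds[name][0] <= s[name] <= bounds[name][1] for name in indices)
    ]


# Toy usage: one tongue index (TB-R) with a single implausible reading.
toy = [{"TB-R": v} for v in (100, 102, 98, 101, 99, 103, 97, 250)]
kept = drop_extreme_subjects(toy, ["TB-R"])
print(len(kept))  # 7: the 250 reading falls outside the upper fence
```

Note that `statistics.quantiles` uses the "exclusive" method by default; other software may compute quartiles slightly differently, which shifts the fences.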

Diagnostic Criteria.
The health and subhealth status of each individual was determined using the Health Status Assessment Questionnaire H20 Scale [21] and the Information Record Form of Four Diagnosis of TCM [22] (Copyright No. 2016Z11L025702), both designed by the Sub-Health Research Group. After excluding people with disease, those scoring 60-79 on the H20 scale were classified as the subhealth population, and those scoring 80-100 were classified as healthy controls. The diagnostic criteria of disease are shown in Table 1.
Disease was diagnosed by four well-trained clinicians according to the Western medicine diagnostic criteria above. The Information Record Scale of Four Diagnosis of TCM and the H20 scale were used to select the fatigue population: people who reported the symptom "fatigue" on the two scales were classified as the fatigue population.
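The group definitions above can be summarized as a small decision function. This is a hypothetical sketch: the function name and flags are illustrative, `has_disease` stands for the clinicians' diagnosis against Table 1, and two points the text leaves open (how the fatigue items of the two scales are combined, and whether healthy controls had to be fatigue-free) are handled by assumption and marked in the comments.

```python
from typing import Optional


def assign_group(h20_score: int, has_disease: bool, reports_fatigue: bool) -> Optional[str]:
    """Return the study group for one subject, or None if the subject is not enrolled.

    has_disease: clinician diagnosis against the Table 1 criteria.
    reports_fatigue: "fatigue" recorded on the H20 scale / TCM four-diagnosis form
    (how the two instruments are combined is an ASSUMPTION left to the caller).
    """
    if has_disease:
        # Disease fatigue: a diagnosed disease plus the fatigue symptom.
        return "disease fatigue" if reports_fatigue else None
    if 60 <= h20_score <= 79:
        # Subhealth (H20 score 60-79) with the fatigue symptom.
        return "subhealth fatigue" if reports_fatigue else None
    if 80 <= h20_score <= 100:
        # Healthy (H20 score 80-100). Requiring no fatigue here is an ASSUMPTION.
        return None if reports_fatigue else "healthy control"
    return None  # scores below 60 are not assigned by the rules in the text


for args in [(85, False, False), (70, False, True), (55, True, True), (70, False, False)]:
    print(args, "->", assign_group(*args))
```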

Tongue Diagnosis and Pulse Diagnosis Instruments.
The TFDA-1 tongue and face diagnosis instrument (Patent no. 2018SR033451) [27] and the PDA-1 pulse diagnosis instrument (Patent no. ZL201620157027.6) [28], shown in Figures 2 and 3, were used for data collection. The tongue was imaged with a 12-megapixel camera (Nikon 1 J5) with a fixed-focal-length lens at a resolution of 5568 × 3712 pixels. The light source had a color rendering index of 96 and a color temperature of approximately 5,000-6,500 K. The tongue image indices were derived from the RGB, HSI, Lab, and YCrCb color spaces; the prefix TB denotes a tongue body index, and TC denotes a tongue coating index. The PDA-1 pulse diagnosis instrument uses a pressure sensor (model HK-2000H). The meaning of each tongue and pulse index is described in [11,13]. In our research, the L value ranged from 0 to 255, and to better observe the continuity of data trends and identify patterns and real differences, we rotated the axis of the H (hue) value by 180 degrees.

Normally distributed measurement data were expressed as mean ± SD, and nonnormally distributed data were expressed as median (P25, P75). Analysis of variance (ANOVA) was used for data meeting the assumptions of normality and homogeneity of variance among groups, and the Kruskal-Wallis H test was used for nonnormally distributed data. GraphPad Prism version 8.0 was used to draw the violin plots. The significance level was α = 0.05, and a two-tailed P value < 0.05 was considered statistically significant.
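The tongue indices come from several color spaces, and the H (hue) axis is rotated by 180 degrees. A minimal stdlib sketch of two of those conversions for a single mean tongue color, plus the hue rotation, is given below. It assumes the textbook HSI formulas, the BT.601 YCrCb form with a 128 offset that OpenCV documents for 8-bit images, and that the rotation means H' = (H + 180) mod 360; the source does not confirm any of these details, and Lab is omitted for brevity.

```python
import math
from typing import Tuple


def rgb_to_hsi(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Textbook RGB -> HSI. Inputs 0-255; returns H in degrees [0, 360), S and I in [0, 1]."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    # Equal to sqrt((r-g)^2 + (r-b)(g-b)), written as a sum of squares so it is never negative.
    den = math.sqrt(((r - g) ** 2 + (r - b) ** 2 + (g - b) ** 2) / 2.0)
    if den == 0:
        h = 0.0  # grey pixel: hue is undefined, report 0 by convention
    else:
        # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        h = theta if b <= g else 360.0 - theta
    return h, s, i


def rgb_to_ycrcb(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """BT.601 luma/chroma with a 128 offset (ASSUMED to match the instrument's YCrCb indices)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128.0
    cb = (b - y) * 0.564 + 128.0
    return y, cr, cb


def rotate_hue(h: float, shift: float = 180.0) -> float:
    """ASSUMED reading of the 180-degree rotation: shift hue around the color circle."""
    return (h + shift) % 360.0


# Two reddish hues that straddle 0 degrees: 350 and 10 are 20 degrees apart on the
# circle but 340 apart as plain numbers. After rotation they become 170 and 190.
print(rotate_hue(350.0), rotate_hue(10.0))  # 170.0 190.0
```

One plausible reading of the rotation under these assumptions: tongue colors are reddish, so their hues sit near 0/360 degrees and a cluster of similar reds can be split into two distant tails; after a 180-degree shift the same cluster sits around 180 degrees and varies continuously.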

Modeling.
Logistic regression analysis was performed on the factors that were statistically significant by ANOVA or the rank sum test. Logistic regression is widely used in data mining, automatic disease diagnosis, economic forecasting, and other fields, and the accuracy of decisions can be improved by adjusting the parameters of the regression model [29,30]. The models were evaluated by accuracy, sensitivity, and specificity, as well as by ROC curves. These indices were defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

$$\text{Sensitivity} = \frac{TP}{TP + FN},$$

$$\text{Specificity} = \frac{TN}{TN + FP}.$$

In the abovementioned formulas, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
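The evaluation indices named above can be computed from a confusion matrix, as in the stdlib sketch below: accuracy = (TP + TN)/(TP + TN + FP + FN), sensitivity = TP/(TP + FN), and specificity = TN/(TN + FP). The AUC is computed here with the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration only.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for 0/1 labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity from the confusion-matrix counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a random positive scores
    higher than a random negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]  # toy predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # ASSUMED 0.5 decision threshold
print(evaluate(y_true, y_pred))
print(round(roc_auc(y_true, scores), 3))  # 0.889
```

Moving the threshold trades sensitivity against specificity, whereas the AUC summarizes the ranking over all thresholds.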

Baseline Characteristics of the Study Participants.
The main diseases in the disease fatigue group were hypertension, diabetes, hyperlipidemia, and fatty liver; their distribution is shown in Figure 4. The numbers and percentages in Figure 4 give the number of patients with each disease and its proportion of all patients, and the overlapping regions give the number and percentage of patients with multiple diseases at the same time. Table 2 shows the general characteristics of the healthy controls, the subhealth fatigue group, and the disease fatigue group.
The statistical results showed that, compared with the healthy controls, both the disease fatigue group and the subhealth fatigue group differed significantly in age and BMI (P < 0.01). Table 3 shows the statistical analysis of the distribution of the characteristic parameters of the tongue body and tongue coating among the healthy controls, the subhealth fatigue group, and the disease fatigue group. To show the distribution trends of the data more clearly, violin plots of selected tongue body and tongue coating parameters with statistical significance were drawn, as shown in Figure 5.

Evidence-Based Complementary and Alternative Medicine

Table 1: Diagnostic criteria of disease.
Disease | Diagnostic criteria
Diabetes [23] | Fasting blood glucose ≥ 7.0 mmol/L and/or blood glucose at any time point ≥ 7.8 mmol/L and/or 2-hour postprandial blood glucose ≥ 11.1 mmol/L
Hypertension [24] | Systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg
Hyperlipidemia [25] | TC ≥ 6.2 mmol/L and/or LDL-C ≥ 4.1 mmol/L and/or HDL-C ≥ 4.9 mmol/L and/or TG ≥ 2.3 mmol/L and/or non-HDL-C ≥ 1.55 mmol/L
Fatty liver disease [26] | Ultrasound examination

Statistical Analysis of Tongue Indices.
The main results for the tongue indices were as follows: (1) among the three groups, the TB indices changed more significantly than the TC indices in the subhealth fatigue and disease fatigue groups; (2) the differences in indices were most pronounced between the disease fatigue and subhealth fatigue groups; and (3) for several indices (TB-B, TB-R, TB-G, TC-B, TB-I, TB-Y, TB-L, TB-Cb, and TB-Cr), the values of the healthy controls lay between those of the two fatigue groups. This indicates that the two fatigue groups had different tendencies in the changes of tongue characteristics.

Table 4 shows the statistical analysis of the distribution of pulse characteristic parameters in the healthy controls, the subhealth fatigue group, and the disease fatigue group, and Figure 6 shows violin plots of selected pulse parameters with statistical significance. For the pulse feature parameters, t1, t2, t3, t4, h1, h4, h5, w1, w2, w1/t, w2/t, h1/t1, h3/h1, As, and Ad differed significantly between the disease fatigue group and the healthy controls (P < 0.05 or P < 0.01); t4 differed significantly between the subhealth fatigue group and the healthy controls (P < 0.05); and t1, h1, h4, h5, h1/t1, Ad, w1, w2, w1/t, and w2/t differed significantly between the subhealth fatigue and disease fatigue groups (P < 0.05 or P < 0.01). The main finding was that, compared with the healthy controls, both fatigue groups showed a gradually increasing tendency in each parameter, which indicates that the two fatigue groups had a consistent tendency in pulse changes. In addition, the pulse features changed more markedly in the disease fatigue group than in the subhealth fatigue group.
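For the nonnormal indices, the three-group comparisons above rely on the Kruskal-Wallis H test. The stdlib-only sketch below computes the tie-corrected H statistic for three groups (for example healthy controls, subhealth fatigue, and disease fatigue) and its p-value from the chi-square approximation; with three groups the statistic has 2 degrees of freedom, and the chi-square survival function with 2 degrees of freedom is exactly exp(-H/2). The function names and toy values are hypothetical, the source does not state which software ran the tests, pairwise post hoc comparisons are not covered, and a library routine such as `scipy.stats.kruskal` computes the same statistic.

```python
import math
from collections import Counter
from itertools import chain
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square (df = 2) p-value for three groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes exactly 3 groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of (rank sum)^2 / group size, walking the pooled ranks group by group.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie-group sizes t.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1.0 - ties / (n ** 3 - n)
    # Chi-square with 2 df is exponential with mean 2, so P(X > h) = exp(-h / 2).
    return h, math.exp(-h / 2.0)


groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h, p = kruskal_wallis_3(groups)
print(round(h, 2), round(p, 4))  # 7.2 0.0273
```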

Modeling Results.
Logistic regression was used to build classification models based on the tongue and pulse data of the three groups. The classification results are shown in Table 5.
The ROC curves are shown in Figure 7. In addition, the classification models were rebuilt after adding BMI and age to the tongue and pulse data; Table 6 shows the classification results of the rebuilt models, and the ROC curves are shown in Figure 8. The results showed that tongue and pulse data classified healthy controls vs disease fatigue best, followed by subhealth fatigue vs disease fatigue, and then healthy controls vs subhealth fatigue. After BMI and age were added, both model accuracy and the ROC curves improved. Because BMI and age are convenient, noninvasive data, this suggests that combining them with tongue and pulse data could improve the accuracy of fatigue diagnosis.
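The source does not report the software, solver, feature selection, or regularization used to fit the logistic regression models. As a self-contained sketch under those unknowns, the code below fits an unregularized binary logistic regression by batch gradient descent on z-scored features; the "with BMI and age" variant then amounts to appending two columns to each subject's feature vector. All names, the learning rate, the epoch count, and the toy data are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def add_columns(X: Sequence[Sequence[float]], extra: Sequence[Sequence[float]]) -> Matrix:
    """Append extra per-subject features (for example age and BMI) to each row."""
    return [list(row) + list(e) for row, e in zip(X, extra)]


def standardize(X: Sequence[Sequence[float]]) -> Matrix:
    """Z-score each column so features on different scales share one learning rate."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def sigmoid(z: float) -> float:
    # Split on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Matrix, y: Sequence[int], lr: float = 0.1, epochs: int = 2000) -> Tuple[float, List[float]]:
    """Minimize the mean log-loss by batch gradient descent; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b: float, w: Sequence[float], X: Matrix) -> List[float]:
    """Predicted probability of the positive class for each row."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# HYPOTHETICAL toy data: two made-up tongue/pulse indices per subject, label 1 = fatigue group.
X = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [5.0, 6.0], [5.5, 5.8], [6.0, 6.2]]
y = [0, 0, 0, 1, 1, 1]
age_bmi = [[30, 21.0], [35, 22.5], [40, 23.0], [50, 25.0], [55, 26.5], [60, 27.0]]
Xs = standardize(add_columns(X, age_bmi))  # "tongue + pulse + age + BMI" variant
b, w = fit_logistic(Xs, y)
print([round(p, 2) for p in predict_proba(b, w, Xs)])
```

In a real analysis the training-set column means and SDs would be stored and reused to transform new subjects, models would be evaluated on held-out data, and a library implementation such as scikit-learn's `LogisticRegression` would usually replace the hand-written loop.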

Discussion
In this study, the objective tongue data showed different distribution trends in the subhealth fatigue and disease fatigue populations. TB-B, TB-R, TB-G, TC-B, TB-I, TB-Y, TB-L, TB-Cb, TB-Cr, TB-a, and TC-a increased in the order subhealth fatigue group, healthy controls, disease fatigue group, which indicates that people with disease fatigue generally had a more purple or red-purple tongue body and a more white, greasy tongue coating. The tongue parameters of the subhealth fatigue population were lower than those of the healthy controls, whereas those of the disease fatigue population were higher. The fatigue groups thus differed from the healthy controls in tongue parameters: subjects with disease fatigue had a darker tongue body and tongue coating and a more yellow or brown tongue coating, which is more associated with excess syndrome, whereas subjects with subhealth fatigue had a pale red tongue body with a white coating, which is more associated with deficiency syndrome. This finding is consistent with the TCM theory that subhealth manifests as decreased vitality, function, and adaptability, whereas disease mostly results from an excess of pathogenic factors or from organ dysfunction caused by phlegm [31], dampness [32], blood stasis [33], and other pathological products. These results could help distinguish subhealth fatigue from disease fatigue.
In our study, the pulse analysis of the three groups showed that the fatigue state directly affected sphygmogram parameters and that the changes followed a consistent trend: the indices of the disease fatigue group were more abnormal and differed more significantly from those of the healthy controls, only w2/t differed significantly between the subhealth fatigue group and the healthy controls, and several indices differed significantly between the subhealth fatigue and disease fatigue groups. In terms of the distribution trend of the pulse indices, the subhealth fatigue group lay between the healthy controls and the disease fatigue group. Studies […] The findings of this study, to a certain extent, indicate that patients with disease fatigue had more severe functional decline and other abnormal changes in cardiovascular function, such as left ventricular function, peripheral resistance, great artery compliance, vessel wall elasticity, and blood viscosity. Because fatigue in the most severe cases can cause sudden cardiac death, using a sphygmograph to detect fatigue is of great practical value for diagnosing cardiovascular disease and guiding early intervention.
In this study, our focus was on whether tongue and pulse data, alone or combined with age and BMI, could distinguish different fatigue states well, and whether age and BMI affected tongue and pulse; the extent of that effect was not the focus of our study. Age and BMI are basic indicators of human health and are closely related to disease. Studies have shown a correlation between age and disease [36], with the risk of disease increasing with age. BMI is an index of obesity that is closely related to health status, and studies have shown that BMI combined with waist circumference can be used to assess the risk of coronary heart disease in diabetic patients [37]. Our results are consistent with this: age and BMI combined with tongue and pulse data classified fatigue better. This study provides a noninvasive differential diagnosis method for the data-driven evaluation of different fatigue states based on tongue and pulse data, and modern tongue diagnosis and pulse diagnosis have good application prospects. Modern tongue and pulse diagnosis techniques are simple and feasible. With the development of information technology for tongue and pulse diagnosis, small, portable tongue and pulse diagnosis instruments have made family health monitoring possible.
With more wearable health products and more collection and use of personal health data, integrating tongue and pulse diagnosis information with other health information can effectively assess fatigue and other health conditions and provide early warning of disease. It can also promote the development of Internet-based smart healthcare and remote diagnosis and treatment, and foster innovation in intelligent TCM diagnosis and treatment models. In the future, building on multidisciplinary collaboration, natural language or graphical interfaces, multichannel intelligent human-computer interaction, and data mining and machine learning on big data for automated analysis, we can effectively improve the accuracy of diagnosis and treatment.

Figure 6: Violin plots of the pulse characteristic parameters of the three groups.

Conclusions
In this study, we analyzed the characteristics and distribution trends of tongue and pulse data in fatigued and healthy populations, and logistic regression modeling was able to distinguish disease fatigue from subhealth fatigue to a certain extent. The study provides a noninvasive differential diagnosis method for the data-driven evaluation of different fatigue states based on tongue and pulse data.

Consent
Written informed consent was obtained from all patients.

Disclosure
This manuscript has been presented as a preprint in "Research Square" [38]. The preprint has since been withdrawn. The funders were not involved in preparing this manuscript or in the decision to submit it for publication. They had no role in the study design; the collection, analysis, and interpretation of data; or the writing of the manuscript.