Interpretation of a Quantitative Diagnosis Model of Traditional Chinese Medicine Syndromes Based on Computer Adaptive Testing

Objectives The aim of this study is to interpret a quantitative diagnosis model of traditional Chinese medicine (TCM) syndromes based on computer adaptive testing (CAT), from the perspective of both patients and clinicians. Methods In this cross-sectional study, patients with postprandial distress syndrome completed the CAT model of TCM syndromes and the Chinese version of the Quality of Life Questionnaire for Functional Digestive Disorders (Chin-FDDQL); the clinicians' diagnosis was concurrently recorded. The patients completed both again after 14 ± 2 days. The kappa test and paired chi-square test were used to evaluate the consistency between the CAT model and clinical diagnosis. Minimal clinically important differences (MCID) of the Chin-FDDQL scores were used to assess clinical efficacy from the patients' perspective. Logistic regression was used to examine the association between changes in the CAT model syndrome domain scores and changes in clinical outcomes. Results Changes in the CAT model syndrome domain scores were associated with patients' clinical outcomes as measured by the total scores of the Chin-FDDQL (all P < 0.05). Different syndrome elements had different effects on various Chin-FDDQL domains, which was consistent with the theory of TCM. Conclusions This study proposes a method for the clinical interpretation of the CAT model of TCM syndromes, including evidence derived from the application. It may provide a reference for future interpretation of other CAT models.


Introduction
Accurate syndrome diagnosis is the foundation of effective management and treatment [1]. However, traditional approaches to syndrome diagnosis rely on clinicians' experience; at present, there is a lack of objective traditional Chinese medicine (TCM) syndrome detection protocols [2]. Therefore, the use of statistical models and artificial intelligence, among other methods, has increased in recent years, aiming to make TCM syndrome differentiation objective [3,4].
We have previously introduced syndrome elements based on the theory of TCM and established a quantitative diagnosis model of TCM syndromes in functional gastrointestinal diseases, specifically, functional dyspepsia and irritable bowel syndrome, using traditional statistical theory and modern advanced measurement theory [2,5]. The model allows patients to input their symptoms and obtain scores per syndrome domain, helping to quantify the syndrome; subsequently, the model has been combined with computer technology. The computer adaptive test (CAT) model [6] streamlines the process of patients inputting their symptoms and maintains accuracy in syndrome differentiation. It is a novel and feasible tool for the quantification of TCM syndromes.
However, the clinical interpretation of the model has not been examined to date. To our knowledge, a clinical interpretation of the quantitative CAT model of TCM syndromes has not been established for any specific disease. In fact, the accuracy of syndrome differentiation is often based on a clinician's judgment, which is unsatisfactory. Patients are the main recipients of syndrome differentiation. TCM theory stipulates that changes to any of the syndrome domains may change patients' symptoms and outcomes. Therefore, we examined the changes in patients' symptoms and clinical outcomes to assess whether the CAT model-based syndrome differentiation is accurate, helping in the quantification and objective assessment of TCM syndromes.
In this study, we used a postprandial distress syndrome (PDS) model. PDS is among the most common functional gastrointestinal diseases observed in clinical practice, and its incidence is increasing. PDS is not life-threatening; however, it is associated with a long disease course and recurring symptoms, which may affect the patients' quality of life. PDS is also associated with a high economic burden to patients and healthcare systems [7]. Functional dyspepsia can be divided into two subtypes: PDS and epigastric pain syndrome (EPS). Impaired gastric accommodation is more prevalent in PDS than in EPS [8]. PDS belongs to the TCM category of gastric stuffiness and is among the most common diseases in the clinic. TCM has been reported as an effective complementary and alternative approach in the treatment of PDS [9,10].
There are no laboratory indicators for evaluating clinical outcomes of PDS. TCM tends to account for patients' subjective symptoms; therefore, we converted patients' symptoms into numeric values using the Chinese version of the Quality of Life Questionnaire for Functional Digestive Disorders (Chin-FDDQL) [11], which is a commonly used patient-reported outcome scale; minimal clinically important differences (MCID) were calculated to estimate any relationship between the changes in the Chin-FDDQL scores and clinically meaningful outcomes for patients. The MCID may help make symptom and outcome reporting more objective [12]. It is commonly used in the clinical interpretation of patient-reported outcomes and can be calculated using anchor- and distribution-based methods [13-16].

Methods
This cross-sectional study included patients who attended the outpatient clinic at the study site. This work was approved by the Clinical Research and Ethics Committee at the First Affiliated Hospital of the Guangzhou University of Chinese Medicine (NO. K (2019) 074), and all patients were diagnosed by senior clinicians referring to the Rome IV classification criteria for PDS.
Patients were eligible for the present study if they met the following criteria: aged ≥16 years, met the Rome IV PDS criteria, and agreed to study participation. Patients were excluded from the present study if they had other digestive system diseases, cognitive or other impairments (including mental illness and visual impairment, among others) that affected their ability to complete self-reports, or diagnoses of cardiovascular or cerebrovascular diseases, renal insufficiency, diseases of the hematopoietic system, or another serious primary disease; pregnant women were also excluded from the present study. Further, data from patients who met any of the following criteria were considered "invalid" and were excluded from analysis: misdiagnosis, another diagnosis or a major accident experienced during the study period, loss to follow-up, or missing ≥20% of data.

Data Collection.
The CAT model is an adaptive quantitative evaluation system (patent no.: 2017 sr559575) for TCM syndromes of functional gastrointestinal diseases (FGIDs), covering three TCM diseases: stomachache, gastric stuffiness, and diarrhea. It integrates the TCM syndrome differentiation diagnosis tree, artificial intelligence, computer engineering, and multivariate statistical models that account for syndrome domains and other aspects of the TCM theory. Development, simulation, and verification of the CAT model have been previously described [6,17-19]. In this study, we selected PDS, a common disease that belongs to the gastric stuffiness category of TCM, to explore the CAT model clinical interpretation methods.
The gastric stuffiness CAT model had 39 items extracted from a bank of 215 items. The next test item was selected by maximizing the determinant of the information matrix, and ability levels were estimated by the maximum a posteriori method.
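The determinant-based selection rule can be illustrated with a small sketch. It is not the system's actual implementation: it assumes a two-dimensional, dichotomous multidimensional two-parameter logistic item model, an identity prior precision matrix standing in for the maximum a posteriori prior, and a hypothetical three-item bank (the names `bank`, `prior`, and `select_next_item` are illustrative only).

```python
import math

def item_information(a, d, theta):
    """2x2 Fisher information of one dichotomous item at ability vector theta.

    a: discrimination vector (a1, a2); d: intercept. Assumed item model:
    P(endorse) = 1 / (1 + exp(-(a . theta + d))).
    """
    z = a[0] * theta[0] + a[1] * theta[1] + d
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * a[0] * a[0], w * a[0] * a[1]],
            [w * a[1] * a[0], w * a[1] * a[1]]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add2(m, n):
    """Element-wise sum of two 2x2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

def select_next_item(bank, administered, theta, prior_precision):
    """Index of the unused item that maximizes det(posterior information)."""
    info = [row[:] for row in prior_precision]
    for idx in administered:
        a, d = bank[idx]
        info = add2(info, item_information(a, d, theta))
    best_idx, best_det = None, -math.inf
    for idx, (a, d) in enumerate(bank):
        if idx in administered:
            continue
        candidate = det2(add2(info, item_information(a, d, theta)))
        if candidate > best_det:
            best_idx, best_det = idx, candidate
    return best_idx

# Hypothetical bank: (discrimination vector, intercept) per item.
bank = [((1.5, 0.0), 0.0), ((0.0, 1.2), 0.0), ((0.4, 0.4), 0.0)]
prior = [[1.0, 0.0], [0.0, 1.0]]  # assumed standard-normal prior on each domain
theta = (0.0, 0.0)                # current ability estimate

first = select_next_item(bank, set(), theta, prior)
second = select_next_item(bank, {first}, theta, prior)
print(first, second)
```

In this toy bank the rule first picks the item that is most informative about the first domain and then switches to the item that loads on the second domain, which is the behavior a determinant criterion is meant to produce: it rewards information spread across all domains rather than piled onto one.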
The test terminated once 20 items had been answered. We asked patients to input data on their symptoms and experiences into the CAT evaluation system. Finally, the patients' scores per syndrome domain were displayed in the form of a radar chart.
The Chin-FDDQL [11] was translated by our team from the original version, which was designed to measure the pathology and symptom scores of functional dyspepsia (FD) and irritable bowel syndrome across eight domains (daily activity, anxiety, diet, sleep, discomfort, health perceptions, stress levels, and total scores) and 43 items [20]. It is a useful health assessment instrument for Chinese patients with FD; it is associated with good reliability, validity, responsiveness, item test function, differential item functioning characteristics, and interpretation systems [19,20].
Outcome assessment and follow-up protocols were as follows. First, the investigators presented the study aims to eligible patients; subsequently, the patients completed the Chin-FDDQL, using the Wen Juan Xing application, and the CAT model system; the questionnaires were completed again after 14 ± 2 days. The clinicians' diagnoses were recorded at the same time; for patients unable to attend follow-up assessments on schedule, we provided a link to the electronic version of the scale via WeChat or collected their answers via phone interviews, subsequently requesting that the participating clinicians make a diagnosis based on the patient's statement.
Evidence-Based Complementary and Alternative Medicine

Statistical Methods.
The CAT model and Chin-FDDQL data were exported to and sorted in Excel. To standardize the evaluation of syndrome domain, the CAT model scores were transformed, according to the distribution characteristics of the full-sample computer adaptive test scores. The conversion formula was as follows:
The clinicians' syndrome differentiation results were divided into syndrome element forms, according to the theory of syndromes, and used as state variables. The CAT model diagnosis results were used as test variables to draw the receiver operating characteristic (ROC) curve for every syndrome element. The area under the curve (AUC) was used to verify the accuracy of model diagnosis; AUC values of >0.8 were considered indicative of high model accuracy.
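As an illustration of the AUC criterion, the following minimal sketch computes the AUC as the proportion of (positive, negative) patient pairs in which the clinician-positive patient has the higher model score, with ties counted as one half. The scores and labels are invented for the example and are not study data; `auc` is a hypothetical helper name.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: model scores for one syndrome element (test variable).
    labels: 1 if the clinician diagnosed the element, else 0 (state variable).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both diagnosed and non-diagnosed patients are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented example: eight patients, one syndrome element.
scores = [30, 35, 42, 44, 50, 55, 38, 61]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
area = auc(scores, labels)
print(round(area, 3), area > 0.8)
```

The pairwise form is quadratic in the number of patients, which is acceptable for a sample of a few hundred; a rank-based computation would give the same value more efficiently.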
The Youden Index was used as a reference parameter; when the Youden Index reached its maximum value, the score corresponding to the cut-off point was regarded as the diagnostic threshold of an element.
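A minimal sketch of this threshold search is given below. It assumes that a score at or above the cut-off counts as a positive model diagnosis and that every observed score is a candidate cut-off; the data are invented and `youden_threshold` is a hypothetical helper name.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as positive when score >= threshold (assumption).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both diagnosed and non-diagnosed patients are required")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest qualifying threshold
            best_threshold, best_j = t, j
    return best_threshold, best_j

# Invented example: eight patients, one syndrome element.
scores = [30, 35, 42, 44, 50, 55, 38, 61]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
threshold, j_max = youden_threshold(scores, labels)
print(threshold, round(j_max, 2))
```

Here the cut-off of 44 points gives a sensitivity of 0.8 and a specificity of 1.0, so patients scoring 44 or more would be assigned the element by the model.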
We examined the CAT model diagnosis from the physician's perspective, according to the diagnostic threshold of every syndrome domain. We then used the kappa test and paired chi-square test to analyze the consistency between the CAT model and expert diagnoses. Kappa values of ≥0.75 indicated excellent consistency; values of 0.40-0.75 and <0.40 represented fair to good and poor consistency, respectively.
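The two agreement statistics can be sketched from a 2 × 2 table of paired diagnoses. The counts below are invented, and the continuity-corrected form of the paired chi-square (McNemar) statistic is an assumption, since the correction used in the analysis is not stated. The P value uses the identity that, for one degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).

```python
import math

def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table.

    a: model+ / clinician+, b: model+ / clinician-,
    c: model- / clinician+, d: model- / clinician-.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1.0 - expected)

def mcnemar(b, c):
    """Continuity-corrected McNemar statistic and P value (1 degree of freedom)."""
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def interpret_kappa(kappa):
    """Bands as defined in the text above."""
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"

# Invented counts for one syndrome element.
a, b, c, d = 40, 10, 5, 45
kappa = cohen_kappa(a, b, c, d)
stat, p_value = mcnemar(b, c)
print(round(kappa, 3), interpret_kappa(kappa), round(stat, 3), round(p_value, 3))
```

A nonsignificant McNemar result together with a kappa in the 0.40-0.75 band would be read, under the criteria above, as fair to good consistency without systematic disagreement between the model and the clinicians.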
To account for the patients' perspective, we used the paired sample t-test or Wilcoxon signed-rank test to measure the responsiveness of the Chin-FDDQL scores to time-dependent changes. We then calculated the associated MCID. To reduce bias associated with using a single method, we obtained averages of the MCID values by anchor-based and distribution-based methods; these values were used as final estimates. Anchor-based methods rely on an external measure of change as the standard, and distribution-based methods are based on a statistical measure of variability.
Because PDS has no objective index for clinical efficacy evaluation, we chose the most applied patient self-assessment method, adding an item as an anchor at the end of the Chin-FDDQL. This item was captured during the follow-up period to determine the MCID [21]. This item was "how do you feel now compared with last time?," with the following response options: obviously worse, somewhat worse, no change, somewhat better, and obviously better; the corresponding scores were set to −2, −1, 0, 1, and 2 points, respectively. We identified patients who reported having experienced a change and then calculated the difference between their baseline and follow-up Chin-FDDQL scores (total and domain-specific). If the score differences followed a normal or skewed distribution, the mean or median of the difference was used as the MCID value, respectively. This study used the common effect size (ES) estimating methods [21]; MCID was estimated by multiplying the baseline standard deviation value of the Chin-FDDQL scores by the ES. Some studies in China have proposed an ES value of 0.5 [22], while ES values of 0.2 have been recommended for the evaluation of the MCID in the Western context [23]. Therefore, we used both methods to estimate the MCID; we combined these estimates with the expert opinion to obtain the MCID that reflected clinical practice.
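The two families of MCID estimates described above can be sketched as follows. The data are invented, the sample standard deviation and the plain average used to combine the estimates are assumptions (the expert-opinion step is not modeled), and the score difference is taken as follow-up minus baseline. All function names are hypothetical.

```python
import statistics

def anchor_based_mcid(baseline, follow_up, anchor, use_median=False):
    """Mean (or median, for skewed data) score change among patients
    whose anchor item reports any change (anchor score != 0)."""
    diffs = [f - b for b, f, a in zip(baseline, follow_up, anchor) if a != 0]
    if not diffs:
        raise ValueError("no patient reported a change")
    return statistics.median(diffs) if use_median else statistics.mean(diffs)

def distribution_based_mcid(baseline, effect_size):
    """Effect size multiplied by the baseline standard deviation."""
    return effect_size * statistics.stdev(baseline)

def combined_mcid(estimates):
    """Plain average of the individual estimates (an assumption)."""
    return statistics.mean(estimates)

# Invented scores for six patients; anchor codes follow the -2..2 scale.
baseline = [50.0, 54.0, 58.0, 62.0, 66.0, 58.0]
follow_up = [53.0, 59.0, 62.0, 68.0, 68.0, 58.0]
anchor = [1, 1, 2, 1, 1, 0]

mcid_anchor = anchor_based_mcid(baseline, follow_up, anchor)
mcid_es05 = distribution_based_mcid(baseline, 0.5)
mcid_es02 = distribution_based_mcid(baseline, 0.2)
mcid_final = combined_mcid([mcid_anchor, mcid_es05, mcid_es02])
print(mcid_anchor, round(mcid_es05, 3), round(mcid_es02, 3), round(mcid_final, 3))
```

Because the ES-based estimates scale with the baseline spread while the anchor-based estimate depends on how much patients actually changed, the two can diverge noticeably over a short follow-up, which is one reason for reporting them side by side before combining.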
To explore the clinical value of the CAT model, we compared changes (d) to the Chin-FDDQL total and domain scores with the corresponding MCID; d ≥ MCID represented clinical benefits from the patients' perspective. We then classified patient outcomes into "change" and "no change" groups. Finally, we performed logistic regression analysis to explore the association between syndrome element score changes in the CAT model (independent variable) and clinical outcomes (dependent variable) (1 = clinically significant change, 0 = no clinically significant change). Figure 1 presents a schematic of the approach to the CAT model exploration.
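The classification and regression steps can be sketched as follows, under simplifying assumptions: a single syndrome element as the only predictor, invented data, and a hand-written Newton-Raphson fit in place of a statistical package. The MCID of 4.5485 is the total-score value reported in the Results; everything else (`classify_outcomes`, `fit_logistic`, the example vectors) is hypothetical.

```python
import math

def classify_outcomes(changes, mcid):
    """1 = clinically significant change (d >= MCID), 0 = no such change."""
    return [1 if d >= mcid else 0 for d in changes]

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1).

    Assumes the data are not perfectly separable, so the estimates are finite.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented data: change in one syndrome element score and in the total score.
element_change = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
total_change = [1.0, 5.0, 2.0, 3.0, 6.0, 4.0, 7.0, 8.0]

outcomes = classify_outcomes(total_change, 4.5485)
b0, b1 = fit_logistic(element_change, outcomes)
odds_ratio = math.exp(b1)
print(outcomes, round(b1, 3), round(odds_ratio, 3))
```

An odds ratio above 1 for an element would mean that larger changes in that element's score go with higher odds of a clinically significant change; a full analysis would include all syndrome elements as predictors and report confidence intervals and P values.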

Results
A total of 300 patients with PDS were included in the present study at baseline, and a total of 291 patients were included at follow-up, with a total of nine study dropouts. The patients' demographic characteristics are presented in Table 1. There were slightly more females than males; most patients were young and middle-aged, and the proportion of those with a bachelor's or higher degree was relatively high.
Syndrome element scores included in the CAT model are presented in Table 2. Although the average syndrome element scores decreased over time, this change was not uniform; specifically, liver and qi stagnation syndrome element scores changed markedly, while spleen-dampness and stomach syndrome element scores changed the least. The AUC for the CAT model was >0.8 (Appendix I). The diagnostic thresholds at the maximum Youden Index for the liver, stomach, spleen-dampness, qi deficiency, heat, and qi stagnation syndrome elements were 39, 44, 52, 41, 47, and 43 points, respectively. For the qi deficiency syndrome element, the McNemar test indicated a difference between the CAT and clinician diagnoses (P < 0.05). The kappa coefficient was 0.628, indicating diagnostic consistency, although it remained below the 0.75 threshold for excellent consistency. For the other five syndrome elements, the McNemar test result was nonsignificant, indicating consistency between the diagnoses obtained by the CAT model and those obtained by clinicians. The kappa test revealed moderate consistency between the two diagnoses; the heat syndrome element was associated with the highest kappa coefficient. The total scores of the Chin-FDDQL were not normally distributed; the Wilcoxon signed-rank test yielded P < 0.05, indicating that the Chin-FDDQL score was sensitive to changes in patients' conditions.
Using the patients' experience as anchors, we identified 198 patients reporting changes in their condition, including 39, 156, and 3 patients who experienced obvious improvement, some improvement, and a worsening of their condition, respectively. The differences in scores were normally distributed, and the average value was used as the MCID. However, PDS is a recurrent chronic disease with long courses of treatment; given the study period of 14 ± 2 days, no obvious changes from baseline were observed. In our previous study, the total Chin-FDDQL score change was approximately 4 points (minimum clinically significant change) [24]. Therefore, in the present study, we used the median value as the MCID of the total score. The differences in domain scores followed a skewed distribution; thus, the median value was used; however, the median scores in the diet and stress domains were 0 points, which was inconsistent with clinical practice. Considering possible sampling bias, we incorporated the expert opinion method and took the average as the MCID in the diet and stress domains. Given that the study period was short, the changes between baseline and follow-up scores were small (Table 3). The final MCID results were determined by the weighting methods.
According to the results, the total score of the Chin-FDDQL must change by at least 4.5485 points to represent a clinically meaningful improvement. Among the various domains, the MCID values of worry (6.9477) and disease control (6.2919) were high, and those of daily activities (3.2043) and stress (3.5209) were low, suggesting that anxiety and disease control scores require greater changes than do daily activity and stress scores for patients to experience clinical benefits.
Tables 4-12 present findings on the association between changes in syndrome element scores of the CAT model and the patients' clinical outcomes, suggesting that score change in any syndrome element may affect patient outcomes; changes to the spleen-dampness scores had the greatest impact on patient outcomes.

Discussion
Syndrome diagnosis is at the core of TCM prescriptions. However, there is no standardized approach to syndrome diagnosis, which may restrict TCM modernization. Quantitative models integrated with modern technologies such as artificial intelligence may provide novel instruments for the objectification of TCM syndrome assessment. Despite a growing number of available models, their clinical interpretation has not been established to date, which has kept them out of clinical practice.
In this study, we used the conventional method to evaluate the reliability of the model diagnosis by considering the clinician syndrome differentiation results as the gold standard; AUC values were >0.8, indicating good diagnostic accuracy of the model (P < 0.05). Nevertheless, this finding suggests that the model may be further optimized, likely by adding factors such as tongue and pulse diagnosis. This study focused on a new approach from the perspective of patients to interpret the model. The findings obtained by conventional methods are presented in the appendix.
The aim of this study was to establish a relationship between the CAT model findings and clinical practice. The diagnosis of a syndrome is made by clinicians, based on a set of patients' symptoms and signs; TCM theory postulates that symptom changes may affect syndrome elements, suggesting a correlation between changes to the model syndrome element scores and clinically meaningful outcomes. Based on model score changes, clinicians may track patient symptom changes, providing evidence for the efficacy of TCM. However, to our knowledge, there is currently no objective method to assess the relationship between subjective symptoms and syndrome characteristics.
Scales are commonly used clinical instruments for measuring disease status that cannot otherwise be accurately quantified. This study used the Chin-FDDQL to measure patients' subjective symptoms, and the MCID was used to correlate the Chin-FDDQL scores with clinical outcomes.
The relationships between the model score changes and patient outcomes were quantified, aiding in an objective interpretation of the model. The MCID refers to the minimum change in scores that a patient considers beneficial, regardless of the associated side effects or costs [12]. It has been used in the clinical interpretation of scales related to computer adaptive tests [25-27]; however, it has not yet been applied to a TCM syndrome quantification model.
In this study, we applied the MCID to the clinical interpretation of the TCM syndrome quantification model, showing that any syndrome element score change may affect clinical outcomes. In addition, we found that different syndrome elements of the CAT model differentially affected the Chin-FDDQL domains, in a manner consistent with that proposed by the TCM theory; for example, stomach scores may significantly affect discomfort and diet outcomes, both of which are associated with gastrointestinal complaints.
This study has two main limitations. First, the study period was relatively short, and the captured score changes were small; consequently, this study showed a relationship between score changes and patient outcomes but did not establish a specific regression equation. Future studies should involve extended follow-up to explore the contribution of every syndrome element to the changes in clinical outcomes and to establish the corresponding regression equation to improve syndrome quantification. Second, despite the use of multiple methods to estimate the MCIDs, the presented values may be subject to bias. The strengths of this study include objective evaluation of the accuracy of syndrome differentiation by the CAT model, and a description of a novel method for the interpretation of a quantitative diagnosis of a TCM syndrome.

Conclusion
This study showed an association between changes to the syndrome element scores of the CAT model and patient outcomes. In addition, this study showed that changes to different syndrome elements had a differential impact on the Chin-FDDQL sub-scores; this finding is consistent with the theory of TCM, which indicates that the MCID values of the relevant quality of life scales may aid the clinical interpretation of the CAT model of a TCM syndrome. These findings may provide a reference for interpretation of other CAT models.

Disclosure
Simeng Yao and Zhongyu Huang contributed equally to this article as co-first authors.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request. Requests for data (6/12 months) after publication of this article will be considered by the corresponding author.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.

Authors' Contributions
Simeng Yao and Zhongyu Huang contributed to the study concept, data analysis, writing of the manuscript, and reviewing of the draft. Simeng Yao, Zhongyu Huang, Xianhua Liu, Qiaofeng Yan, and Jing Tang contributed to patient recruitment and data collection. Fengbin Liu and Zhengkun Hou contributed to the study concept and manuscript review.