Validation of the Constitution in Chinese Medicine Questionnaire: Does the Traditional Chinese Medicine Concept of Body Constitution Exist?

The study aimed to adapt and validate the Constitution in Chinese Medicine Questionnaire (CCMQ) in Hong Kong (HK) Chinese people. Ten patients and 10 Chinese medicine practitioners (CMP) confirmed the content validity (CVI: 50%–100%) of the CCMQ. A total of 1084 HK subjects completed a cross-sectional study, of whom 98.6% could be classified into one or more body constitution (BC) types. Scaling success rates were 85.7%–100% for the 9 BC scales. Construct validity was supported by moderate correlations between CCMQ and SF-12v2 scores. The confirmatory factor analysis showed a reproducible structure as hypothesized. People with the gentleness BC type had better health-related quality of life (HRQOL) than those with other (imbalanced) BC types. Internal consistency (Cronbach's alpha > 0.6) and test-retest reliability (ICC > 0.6) were also satisfactory for all scales. However, the sensitivity and specificity in predicting the BC types diagnosed by CMP were only fair, ranging from 42.7% to 82.7%. After 6 months, 27.6% of subjects had changed from imbalanced BC types to the gentleness BC type. The CCMQ was adapted for HK Chinese people and proved to be valid, reliable, and responsive. People classified as having imbalanced BC types had significantly lower HRQOL than those with the gentleness BC type, which supported the validity and importance of the traditional Chinese medicine (TCM) concept of the physiological BC type.


Introduction
Body constitution (BC), an ancient core concept in traditional Chinese medicine (TCM), is widely applied in daily practice by Chinese medicine practitioners (CMP), but there is little standardization of its measurement. The concept has long been regarded as subjective and has been much debated and challenged. Studies have found low agreement on BC type diagnoses among CMP [1][2][3], which prevents proper selection of subjects for clinical trials and hinders the development of TCM research [1][2][3]. Enhancing the consistency of the fundamental classification of physiological BC types under TCM diagnosis has been a major development direction since the 1990s [4]. With the help of objective and standardized questions, the BC concept can be investigated further. To improve the consistency of BC type diagnosis, TCM scholars have developed structured questionnaires to classify BC types [4,5]. The most common BC instruments are the Constitution in Chinese Medicine Questionnaire (CCMQ), developed by Wang et al. in Mainland China [6][7][8][9], and the Body Constitutions Questionnaire (BCQ), developed by Su et al. in Taiwan [10][11][12][13][14][15][16][17][18][19]. There are some data supporting the face validity (by expert panel discussion), validity (construct validity and criterion validity), and reliability (internal consistency) of these questionnaires in their populations of origin, with data from 2854 subjects in Mainland China [7][8][9]; however, they have never been evaluated for applicability and validity in other Chinese populations.
The Constitution in Chinese Medicine Questionnaire (CCMQ) was developed by Wang et al. in Mainland China [6], by consensus among experts in TCM BC types. It has 60 items measuring the 9 BC types: gentleness, Qi-deficiency, Yang-deficiency, Yin-deficiency, phlegm-wetness, wetness-heat, blood-stasis, Qi-depression, and special diathesis. It was pilot-tested in the Beijing population to establish its face validity. Its reliability and construct validity were proven in 2500 people from five different geographical districts in China [20]. Although the CCMQ has been used in nationwide campaigns in China since 2008 [21], mainly in epidemiological studies on the prevalence of BC types [22], it has never been tested or used on Chinese populations outside Mainland China, including that of Hong Kong, where lifestyle, language, health beliefs, and culture might be different [23]. The content and construct validity and other psychometric properties of the CCMQ need to be confirmed before it can be applied to Chinese populations in Hong Kong or overseas. Confirmation of this would support the application of the CCMQ in cross-region and cross-country research on BC types. In addition, this is the first study to provide empirical data to validate this ancient concept in TCM theory, which is important to future TCM clinical trials or investigation.
1.1. Aims. The aim of this study was to adapt and validate the CCMQ in Hong Kong Chinese in order to establish evidence on its content and construct validity, reliability, sensitivity, and responsiveness.

Objectives
(1) To adapt the CCMQ to a HK version that is linguistically valid for Cantonese speaking Chinese in Hong Kong.
(2) To evaluate the content validity of the HK version of the CCMQ with Chinese medicine practitioner (CMP) experts and lay persons in Hong Kong.
(3) To test the psychometric properties including construct validity by scaling assumptions, factor structure and known group comparison, criterion validity, reliability, sensitivity, and responsiveness of the HK version of CCMQ.

Subjects.
To evaluate the content validity of the CCMQ, convenience samples of 10 patients and 10 Chinese medicine practitioners (CMP), respectively, were recruited from June to July, 2010, to complete the CCMQ and cognitive debriefings. A convenience sample of Cantonese-speaking patients, stratified by age and gender, was recruited from the Ap Lei Chau General outpatient clinic (ALCGOPC), and all subjects completed a written consent form. All CMP were academically qualified with a bachelor's degree in CM and more than 5 years of clinical experience (average 7.2-8.4 years). The characteristics of the subjects are shown in Table 1.

Data Collection. A total of 2128 eligible patients attending a Western medicine (WM) outpatient clinic (ALCGOPC) and two Chinese medicine outpatient clinics were invited, and 1084 patients participated in the cross-sectional validation study from July to October, 2010. The characteristics of the subjects are summarized in Table 1.
The CCMQ was reviewed and adapted to be linguistically appropriate for Chinese people in Hong Kong by a professional translator to form a draft HK version of the CCMQ. This was sent to 10 Chinese medicine practitioners (CMPs), who also completed the cognitive debriefing questionnaire (see Supplementary Appendix A in Supplementary Material available online at http://dx.doi.org/10.1155/2013/481491). Ten lay subjects (patients from the ALCGOPC) completed the cognitive debriefing questionnaire administered by a trained interviewer, and their responses to the open questions were recorded verbatim. The standard procedure and questions of cognitive debriefing were followed [23,24].
To investigate the construct validity, reliability, sensitivity, and responsiveness of the CCMQ, each of the 1068 patients ≥18 years old answered the Hong Kong version of the CCMQ, the HK Chinese Short Form-12 version 2 Health Survey (SF-12v2), and a structured questionnaire on sociodemographics and chronic morbidity (Supplementary Appendix B) before assessment by a Chinese medicine practitioner (CMP). The CMP, blinded to the CCMQ results, assessed the subject and completed a structured evaluation form to indicate the BC types and severity (Supplementary Appendix C).
To evaluate test-retest reliability, 225 patients attending the ALCGOPC for routine chronic disease followup were retested with the CCMQ (HK version) administered by telephone 2 weeks after the first test.
1084 subjects agreed to a follow-up survey in 3 to 6 months, and 404 subjects completed the telephone interviews with the HK versions of the CCMQ and SF-12v2 in addition to a one-item Global Rating Scale (GRS) on change in health condition [25] to assess the responsiveness of the CCMQ (HK version). The recruitment of patients is shown in Figure 1.

Sample Size Calculation. The sample size for the cognitive debriefing study on content validity was recommended by an international group [24]. The sample size for the construct validity and reliability study was powered to determine the proportion of subjects with each BC type based on the most conservative estimate of 50%. In order to restrict the 95% confidence interval of this estimated proportion to a two-sided margin of error of 3%, a sample size of 1068 was needed. The sample size was also sufficient for the psychometric testing of the CCMQ based on the general guideline of at least 7 subjects for each item [26].
The CCMQ [6,7,9,27] consists of 60 items to classify a person into one or more of nine BC types: gentleness (8 Items), Qi-deficiency (8 Items), Yang-deficiency (7 Items), Yin-deficiency (8 Items), phlegm-wetness (8 Items), wetness-heat (6 Items), blood-stasis (7 Items), Qi-depression (7 Items), and special diathesis (7 Items). Coexistence
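The figure of 1068 in the sample size calculation above can be reproduced with the usual normal-approximation formula for a single proportion, n = z²p(1 − p)/e². The sketch below assumes that the 3% refers to the half-width of the 95% confidence interval and that z = 1.96 was used; the function name is illustrative and does not come from the study.

```python
import math


def sample_size_for_proportion(p: float, margin: float, z: float = 1.96) -> int:
    """Smallest n for which the normal-approximation CI half-width is <= margin.

    p      : anticipated proportion (0.5 is the most conservative choice)
    margin : desired half-width of the confidence interval (assumed reading of "3%")
    z      : standard normal quantile (1.96 for a 95% interval, an assumption here)
    """
    n = (z ** 2) * p * (1.0 - p) / (margin ** 2)
    return math.ceil(n)


# Most conservative estimate of 50% with a 3% margin, as described in the text.
print(sample_size_for_proportion(p=0.5, margin=0.03))  # prints 1068
```

Under these assumptions the formula gives 1067.1, which rounds up to the 1068 subjects reported.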

The Global Rating on Change Scale (GRS) on Change in Health. The Global Rating on Change Scale (GRS) asked the subjects to rate the change in their own illness condition since the initial TCM/WM consultations. The response was given as a score of zero for no change, +1, +2, or +3 for different degrees of improvement, and −1, −2, or −3 for different degrees of deterioration.

Data Analysis
3.1. Content Validity. The content validity index (CVI), the proportion of subjects who gave a positive rating on clarity and relevance, was calculated for each item [29]. A CVI ≥ 80% was considered satisfactory [29]. The Chinese medicine practitioners' (CMPs) and lay subjects' answers to the open-ended cognitive debriefing questions were reviewed by an expert panel that consisted of the original author of the measure and experts in health-related quality of life (HRQOL) research. Discrepancies between subjects' interpretation of the meaning of the items or response options and the intended meaning were highlighted, and items that were found to be unclear were identified. Revisions were made to the problematic item(s), taking into account respondents' suggested rewording, to form the final HK version of the CCMQ.
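As an illustration of the CVI definition above, the following minimal sketch computes the percentage of positive ratings per item and flags items below the 80% threshold. The item labels and ratings are hypothetical and are not data from this study.

```python
def content_validity_index(ratings):
    """Percentage of raters who gave a positive rating (True) to one item."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return 100.0 * sum(ratings) / len(ratings)


def items_below_threshold(ratings_by_item, threshold=80.0):
    """Return the items whose CVI falls below the satisfactory threshold."""
    return [
        item
        for item, ratings in ratings_by_item.items()
        if content_validity_index(ratings) < threshold
    ]


# Hypothetical clarity ratings from 10 raters for two illustrative items.
example_ratings = {
    "item_a": [True] * 9 + [False] * 1,  # 9 of 10 positive
    "item_b": [True] * 5 + [False] * 5,  # 5 of 10 positive
}

for item, ratings in example_ratings.items():
    print(item, content_validity_index(ratings))
print(items_below_threshold(example_ratings))
```

With 10 raters per group, as in the cognitive debriefing, each rater shifts the CVI by 10 percentage points, so an item needs at least 8 positive ratings to be considered satisfactory.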

Scaling Assumptions.
The scaling assumptions were tested by (i) item-scale correlations, against the hypothesis that there should be substantial linear correlations (≥0.4), and (ii) scaling success, defined as the correlation between an item and its hypothesized scale being greater than its correlations with competing scales. The proportion of all item-scale comparisons in each scale that were successful was calculated [30]. Floor and ceiling effects of CCMQ scales were considered significant if over 15% of subjects got a minimal or maximum baseline score for each question [31].
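The scaling success rate described above can be sketched as follows. This is a simplified illustration and not the procedure actually run in the study: it assumes Pearson correlations, assumes that the own-scale correlation is corrected for overlap (the item is left out of its own scale total), and uses a hypothetical two-scale data set with made-up item names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scale_total(responses, items, exclude=None):
    """Sum of the item scores per respondent, optionally leaving one item out."""
    return [sum(row[i] for i in items if i != exclude) for row in responses]


def scaling_success_rate(responses, scales, own_scale):
    """Percentage of item-versus-competing-scale comparisons won by the own scale."""
    successes = comparisons = 0
    for item in scales[own_scale]:
        item_scores = [row[item] for row in responses]
        # Assumption: own-scale correlation is corrected for overlap.
        r_own = pearson(item_scores, scale_total(responses, scales[own_scale], item))
        for name, items in scales.items():
            if name == own_scale:
                continue
            r_other = pearson(item_scores, scale_total(responses, items))
            comparisons += 1
            if r_own > r_other:
                successes += 1
    return 100.0 * successes / comparisons


# Hypothetical responses (1-5) of six respondents to two three-item scales.
scales = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
responses = [
    {"a1": 1, "a2": 1, "a3": 2, "b1": 5, "b2": 4, "b3": 5},
    {"a1": 2, "a2": 2, "a3": 2, "b1": 1, "b2": 2, "b3": 1},
    {"a1": 3, "a2": 3, "a3": 4, "b1": 4, "b2": 5, "b3": 4},
    {"a1": 4, "a2": 4, "a3": 4, "b1": 2, "b2": 1, "b3": 2},
    {"a1": 5, "a2": 5, "a3": 5, "b1": 3, "b2": 3, "b3": 3},
    {"a1": 1, "a2": 2, "a3": 1, "b1": 3, "b2": 3, "b3": 2},
]

print(scaling_success_rate(responses, scales, "A"))
print(scaling_success_rate(responses, scales, "B"))
```

For the 9 CCMQ scales, each item would be compared against 8 competing scales, so a 7-item scale contributes 56 comparisons to its success rate.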

Confirmatory Factor Analysis (CFA).
CFA was used to determine whether the items load onto the hypothesized subscales by the Satorra-Bentler scaled chi-square statistic [32]. The CFA models were considered to have acceptable model fit if the root mean square error of approximation (RMSEA), its 90% confidence interval, and the standardized root mean square residual (SRMR) were close to 0.08 or below, and the comparative fit index (CFI), Tucker-Lewis index (TLI), and incremental fit index (IFI) values were close to 0.95 or greater [33].

Known-Group Comparison.
Correlations between scores of corresponding subscales of CCMQ were calculated by Spearman's correlation. Known-group comparison was carried out by studying the difference of CCMQ and SF-12v2 scores by gender and age groups using the Mann-Whitney test and the Kruskal-Wallis test, respectively. Moreover, the SF-12v2 scores of different BC types classified by CCMQ were compared by independent samples t-test. It was hypothesized that subjects of the gentleness BC type of CCMQ should have the highest SF-12v2 scores because they were supposedly the most healthy.

Criterion Validity. CMP diagnosis of the BC type was used as the "gold standard". The sensitivity and specificity of the CCMQ in predicting the CMP BC type diagnosis were calculated. The agreement between the diagnoses by the CCMQ and by the CMP was assessed by the Kappa coefficient, of which "1" indicates complete agreement and "0" complete disagreement [34].
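The three criterion-validity statistics named above can be computed from paired yes/no classifications for a single BC type. The following minimal sketch uses the standard definitions of sensitivity, specificity, and Cohen's kappa; the classifications in the example are hypothetical and are not study data.

```python
def diagnostic_agreement(questionnaire, practitioner):
    """Sensitivity, specificity, and Cohen's kappa for paired binary calls.

    questionnaire : list of bool, True if the questionnaire classifies the BC type
    practitioner  : list of bool, True if the reference diagnosis finds the BC type
    """
    pairs = list(zip(questionnaire, practitioner))
    n = len(pairs)
    tp = sum(1 for q, p in pairs if q and p)          # both positive
    tn = sum(1 for q, p in pairs if not q and not p)  # both negative
    fp = sum(1 for q, p in pairs if q and not p)      # questionnaire only
    fn = sum(1 for q, p in pairs if not q and p)      # practitioner only

    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (observed - expected) / (1.0 - expected),
    }


# Hypothetical calls for 20 subjects: 6 TP, 2 FN, 3 FP, 9 TN.
ccmq_calls = [True] * 6 + [False] * 2 + [True] * 3 + [False] * 9
cmp_calls = [True] * 8 + [False] * 12

print(diagnostic_agreement(ccmq_calls, cmp_calls))
```

In this example both sensitivity and specificity are 0.75, yet kappa is only about 0.49, which illustrates how chance-corrected agreement can look weaker than the raw accuracy figures.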

Sensitivity.
The sensitivity of the CCMQ (HK version) was tested by comparing patients across demographic groups (i.e., age and gender). It was hypothesized that patients who were older or female would have higher CCMQ scores [22]. The difference between two groups was tested by the Mann-Whitney test, while the difference among more than two groups was tested by the Kruskal-Wallis test, and p values less than 0.05 were considered statistically significant.

Reliability.
Internal consistency of the CCMQ was measured by Cronbach's alpha, which indicates the extent to which items in a scale are homogeneous in supporting the same concept. Test-retest reliabilities of CCMQ scales were evaluated by intraclass correlations (ICC) and paired t-tests of the difference between 2-week test-retest scores, which assess the stability and reproducibility of the instrument. Reliability coefficients ≥0.7 are the usual standard for group comparisons [35,36].
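For readers unfamiliar with Cronbach's alpha, a minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / variance of the total score), is given below. The item scores are hypothetical; the study itself computed alpha in SPSS.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores : list of lists, one inner list per item, each holding one
                  score per respondent (same respondent order in every list).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total scale score of each respondent.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_variance_sum = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical scores of six respondents on a three-item scale.
example_scale = [
    [1, 2, 3, 4, 5, 1],
    [1, 2, 3, 4, 5, 2],
    [2, 2, 4, 4, 5, 1],
]

print(round(cronbach_alpha(example_scale), 3))
```

Alpha approaches 1 when the items rise and fall together across respondents, and falls toward 0 when the items vary independently of one another.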

Responsiveness.
The proportion of subjects who had a change in the BC type classified by the CCMQ in 3-6 months was used as a measure of the responsiveness of the instrument to detect a change over time from the summer to the winter season. According to Chinese medicine theory, 10-20% of patients would be expected to have a change in their BC types. Subjects were divided into the gentleness BC type or any imbalanced BC types at the baseline survey, and the proportion of each with a change in the gentleness or imbalanced BC type classification was determined. The change in the mean scale scores of subjects who reported a change in health condition measured by the GRS was analyzed and tested by paired t-test and McNemar-Bowker test.
Confirmatory factor analysis was carried out in LISREL 8.80, and other data analysis was carried out in SPSS for Windows 17.0. Statistical significance was set at p values of 0.05.
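For a two-category classification (gentleness versus any imbalanced BC type), the McNemar-Bowker test reduces to the ordinary McNemar test on the two discordant cells. The sketch below shows that special case only, without continuity correction, and is not the SPSS procedure used in the study; the counts are hypothetical.

```python
import math


def mcnemar_test(changed_to_gentleness, changed_to_imbalanced):
    """McNemar chi-square (1 df, no continuity correction) and its p value.

    The two arguments are the discordant counts: subjects who moved from an
    imbalanced type to gentleness, and subjects who moved the other way.
    """
    b, c = changed_to_gentleness, changed_to_imbalanced
    if b + c == 0:
        raise ValueError("no discordant pairs, the test is undefined")
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical discordant counts between baseline and followup.
statistic, p_value = mcnemar_test(changed_to_gentleness=30, changed_to_imbalanced=10)
print(statistic, p_value)
```

Subjects whose classification did not change do not enter the statistic; only the imbalance between the two directions of change is tested.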

Content Validity.
Time taken to complete the CCMQ was 10.9 ± 5.4 minutes for patients and 15.6 ± 8.9 minutes for Chinese medicine practitioners (CMPs). The CCMQ had satisfactory CVIs (≥80%) on clarity, consistency of response options, and relevance to health in all items except for 6 items (Table 2). Three items had low ratings from CMP and 3 had low ratings from lay subjects. Firstly, CMP opined that the response options of "usually" (Jing Chang) and "often" (Chang Chang) could not be differentiated (CVI of 50%). Items 3 "will you easily feeling shortage of breath (gasping for breath)" and 10 "do you think you are sentimental or fragile emotionally?" were rated as unclear by some CMP with CVIs of 60%. Some lay persons found the items "are you capable to adapt to the external changes of the natural and social environment?," "will you easily feel palpitation?," and "will you feel wetness in your scrotum (men only)" unclear with CVIs of 50-60%. Although these items were thought by some subjects to be unclear, the interpretations given by either CMP or lay persons were consistent with the intended meaning of the original CCMQ.
On the other hand, 5 items, including the response option "often" (Chang Chang) and the terms "shortness of breath" (Qi Duan), "excessive sweating" (Xu Han), "urticaria" (Xun Ma Zhen), and "purpura" (Zi Dian), were interpreted wrongly by some lay persons. The expert panel revised these 5 items based on the suggestions on rewording from the CMP and lay persons (Table 3) to form the HK version of the CCMQ (Supplementary Appendix D) that was used in the further validation study.

CCMQ Score and BC Type Distribution.
The baseline score distribution and the floor and ceiling effect proportions of the CCMQ and SF-12v2 scales are shown in Table 1. No significant floor or ceiling effect was found in the CCMQ. Higher CCMQ scores indicate more severe imbalance (i.e., poor health), except for the gentleness scale. Among 1084 patients, 98.6% could be classified into at least one BC type by the CCMQ, with 20% classified as the gentleness BC type (Table 4), and 64.9% had more than one imbalanced BC type. Table 5 shows satisfactory item-scale correlations and scaling success rates of CCMQ scales.

Construct Validity.
Item 53 "are you capable to adapt to the external changes of the natural and social environment?" of the gentleness scale had an item-scale correlation <0.4.

CFA.
Confirmatory factor analysis (CFA) (Table 6) confirmed the 9-factor structure of the CCMQ as originally hypothesized. The overall fit indexes (RMSEA, SRMR, CFI, TLI, and IFI) were up to standard, which indicates that the model fit of the CCMQ was good.

Known Group Comparison.
The correlations between the scale scores of the CCMQ and SF-12v2 are shown in Table 7. As hypothesized, there was a positive correlation between the CCMQ gentleness BC type and SF-12v2 scores; higher scores of both indicate better conditions. Significant negative correlations were found between SF-12v2 and all other CCMQ scale scores, indicating worse HRQOL associated with more severe imbalanced BC types, which supported the concurrent validity of the CCMQ. The scale scores of the CCMQ correlated most strongly with the VT, RE, MH, and MCS of SF-12v2. Table 8 shows that subjects who had the gentleness BC type had the highest SF-12v2 scores in all domains, further supporting the construct validity of the CCMQ by known group comparison.

Criterion Validity. CMP diagnoses were used as a gold standard to assess the accuracy of the BC type classification by the CCMQ. Table 11 shows that the sensitivities of the CCMQ in predicting the CMP BC type diagnosis ranged from 42.9% to 75%. The best sensitivity (75%) was for the detection of the special diathesis BC type. The specificities of the CCMQ ranged from 42.7% to 82.7% with the best (82.7%) in excluding the gentleness BC type.
Table 12 shows the CCMQ and SF-12v2 scores classified by gender and age groups, respectively. It was found that female patients had lower mean CCMQ scores except for gentleness and wetness-heat of the CCMQ scales. Elderly subjects classified as gentleness BC type had a higher mean score than younger counterparts, but elderly subjects classified with imbalanced BC types had lower mean CCMQ scores than younger people. This indicates that the CCMQ was sensitive in detecting the differences between groups.

4.8. Reliability. The internal reliability of Cronbach's alpha and intraclass correlation were all satisfactory (>0.6) (Table 9). The intraclass correlation (ICC) coefficients of CCMQ ranged from 0.71 to 0.88, supporting good test-retest reliability of the scale scores. Table 10 shows the number and proportion of people who were classified into the same BC type after 2 weeks with kappa statistics ranging from 0.318 to 0.531, which indicated poor reproducibility.
Table 13 shows significant difference in the mean CCMQ scale scores after 3-6 months. A significant proportion of people had a change in the classification by each BC type. 20.6% of subjects had a change from the gentleness BC type to one or more imbalanced BC types. Apart from changes in CCMQ scores, 27.6% of subjects with any imbalanced BC types had changed to the gentleness BC type and had reported a better global rating scale of health status (i.e., 36.6% unchanged while 6.1% worse than baseline). These results confirmed that the CCMQ was responsive to change in BC type over time either by score change or patients' reported outcome.

Content Validity.
The results confirmed the need to evaluate the validity of a psychometric measure before its application to a different population. Some items of the CCMQ were not clear to Chinese people in Hong Kong even though the instrument was developed in Chinese. Some concepts such as "shortness of breath" (Qi Duan), "urticaria" (Feng Zhen), or "excessive sweating" (Xu Han) from Chinese medicine were not well understood by Chinese in Hong Kong, as found in a previous study [23]. Western medicine dominates the health-care system in Hong Kong making people unfamiliar with these terms used only by CMP. Psychometric testing supported the construct validity, reliability, sensitivity, and responsiveness of the CCMQ scales, but the accuracy and reproducibility in BC type classifications were uncertain.

BC Type Classification.
The CCMQ was able to classify 98.6% of subjects into at least one BC type, supporting its feasibility and acceptability. Only around 20% of subjects were classified as having the gentleness BC type, which was lower than the 32.1% found in population studies in China [22], probably because the study subjects were recruited from a patient population. A very high proportion (65%) of subjects had more than one BC type, which was compatible with the theory of traditional Chinese medicine and clinical experience. The identification of overlapping BC types has important implications for individualized health promotion practices in that some treatments intended to improve one BC type may be contraindicated for another BC type. However, this might have caused the low agreement between the CCMQ classification and CMP diagnoses. The CMP would usually diagnose one or at most two (a major and a minor) BC types in any one person, taking the person as a whole, but a survey

Psychometric Properties of the CCMQ.
There was no floor or ceiling effect in the CCMQ scale suggesting that the instrument could be useful in monitoring improvement or deterioration in a particular BC type over time or in response to interventions [31].

Validity of the CCMQ.
There were positive correlations between the gentleness BC type and SF-12v2 scores but negative correlations between the other (imbalanced) BC types and SF-12v2 scores. People with the gentleness BC type are considered the healthiest; therefore, they had the highest SF-12v2 scores. The results were good evidence on not only the concurrent validity of the CCMQ but also the importance of imbalanced BC types and the concept of "Not Yet Ill." Using the pragmatic diagnosis by CMP as the gold standard, the sensitivity and specificity of the CCMQ in predicting the BC type were lower than expected. A previous study showed similar results [37], and its authors pointed out the problem of subjectivity in the diagnosis by CMP. In the field testing for this study, we found that agreement on the BC type diagnosis between two CMP was only 52%, which improved to 78% after standardization of the diagnostic criteria. It would be interesting to find out whether the CCMQ could be used as an instrument to standardize BC type diagnosis among CMP. The cut-off score of 30 for a positive BC type might need to be recalibrated to improve its predictive accuracy.

Reliability.
The reliability coefficients of the CCMQ scales were comparable to those of the established SF-12v2 Health Survey, meaning that the CCMQ scales were consistent and reproducible. However, there was relatively poor reproducibility in the classification of the same BC type on retest after 2 weeks. Although the kappa values were rather low, 40% of subjects classified to the gentleness BC type still had the same classification after two weeks, supporting fair reproducibility. The reproducibility of the specific imbalanced BC types was rather low, probably because of the overlapping BC types and the use of a single cut-off score, which may result in a change in the classification with a slight change in the response to scale items. It was also possible that subjects had a change in their conditions, although we only included patients whose reason for visiting the clinic was routine followup of their chronic conditions.
5.6. Sensitivity and Responsiveness. The CCMQ (HK version) showed a difference in the severity of BC types between gender and age groups, supporting its sensitivity. It is likely that it could also differentiate between the healthy, not yet ill, and ill subjects. Further studies on people with different conditions should be carried out to establish its sensitivity in detecting difference in BC types associated with specific illnesses.
This study was the first to investigate and confirm the responsiveness of the CCMQ to change in BC type classification and severity. The results also supported the TCM theory that BC type is not static, can change with season, and may be amenable to health promotion. It was encouraging to note that a higher proportion of subjects changed back to the gentleness BC type, and more people changed from "positive" to "negative" for any particular imbalanced BC type on followup. This could be related to an improvement in the health of these clinic patients after their consultations or to a seasonal effect. In TCM theory, certain BC types occur more frequently or may become more severe during each season; for example, phlegm-wetness is expected to be more common in the summer than in winter, while Yang-deficiency becomes more severe during winter but not in summer. Further epidemiological studies on the general population should be carried out to determine the effect of season on BC types.

5.7. Limitations. The subjects were convenience samples of patients, and the response rate in the psychometric study was relatively low, which might have biased the distributions of BC types. However, our study samples included people from all age groups and both genders, so the results on validity and reliability of the CCMQ should be generalizable to other Chinese people in Hong Kong. The CCMQ was administered by an interviewer to all subjects, so the results might not be applicable to self-administration. No standardization of the CMP diagnosis of BC type was made, which might have affected the accuracy of the predictive sensitivity and specificity of the CCMQ. Better standardization of the CMP diagnosis and assessment of each subject by more than one CMP should be considered in future studies on the accuracy of the CCMQ.

Conclusions
The CCMQ was adapted to a HK version with some changes to the wording of four items for Cantonese speaking Chinese people in Hong Kong. The construct validity, reliability, sensitivity, and responsiveness of the CCMQ scales were satisfactory. The CCMQ was able to classify the majority of people into one or more BC types. The instrument is useful in differentiating people with the gentleness BC type from those with imbalanced BC types, but the significance of having more than one imbalanced BC type needs to be confirmed. One weakness is the relatively low sensitivity and specificity in predicting the CMP BC type diagnosis and the low reproducibility in specific BC type classification. The CCMQ has potential applications in population-based epidemiological studies as well as clinical trials.
Further research should also be done to explore whether the CCMQ can be shortened to improve its acceptability. Calibration of the cut-off scores for the definition of specific BC type should be carried out against better assured gold standards. The performance as an outcome measure in health promotion interventions should be evaluated.