Visual Agreement Analyses of Traditional Chinese Medicine: A Multiple-Dimensional Scaling Approach

Studying agreement in Traditional Chinese Medicine (TCM) with a powerful statistical tool is critical for providing objective evaluations. Several previous studies have been conducted on the consistency of TCM diagnoses, and their results indicate that agreement is low. Traditional agreement measures provide only a single value, which is not sufficient to judge whether the agreement among several raters is strong. In light of this observation, a novel visual agreement analysis for TCM via multiple-dimensional scaling (MDS) is proposed in this study. A group of 11 experienced TCM practitioners from the Chinese Medicine Department at Changhua Christian Hospital (CCH) in Taiwan, with clinical experience ranging from 3 to 15 years (mean 5.5 years), were asked to diagnose a total of fifteen tongue images according to the Eight Principles derived from TCM theory. The results of the statistical analysis show that, if clusters are latently present among the raters, MDS can prove itself an effective distinguisher.


Introduction
Reliability is an indispensable requirement in biomedical diagnostics. Intraclass and interclass reliability measures have been proposed by many authors [1][2][3][4][5][6][7]. Many works study agreement measures for Western medical diagnostics. However, only a few of them perform agreement analysis for TCM practitioners. In most of the literature concerning TCM agreement, even though complex combinations of TCM diagnostics are considered, a so-called proportion of agreement measure is adopted. As the evidence shows, the "proportion of agreement" overlooks the possible bias caused by chance agreement. To remedy this bias, Cohen proposed his renowned "kappa" measure. Soon after his contribution, weighted kappa, Fleiss kappa, and so forth were proposed to deal with more complex data types and more raters. Reference [8] considered a reliability measure called "Krippendorff's alpha" to investigate the agreement of tongue diagnoses when there are many practitioners and the data are ordinal. A Krippendorff's alpha coefficient of 0.7343 was reported in that study.
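The difference between the raw proportion of agreement and a chance-corrected measure can be illustrated with a minimal sketch. It uses the standard two-rater definition of Cohen's kappa, (p_o − p_e)/(1 − p_e); the function names and the ratings are hypothetical and are not data from this study.

```python
from collections import Counter


def proportion_of_agreement(r1, r2):
    """Fraction of items on which two raters give the same label."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters and nominal labels."""
    n = len(r1)
    p_o = proportion_of_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Agreement expected by chance, from each rater's marginal label frequencies.
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of ten tongue images (illustration only).
rater_1 = ["thick"] * 8 + ["thin"] * 2
rater_2 = ["thick"] * 7 + ["thin", "thick", "thin"]

p_o = proportion_of_agreement(rater_1, rater_2)
kappa = cohen_kappa(rater_1, rater_2)
print(f"proportion of agreement = {p_o:.3f}, kappa = {kappa:.3f}")
```

In this illustration both raters label most images "thick," so much of the raw agreement (0.80) is expected by chance, and kappa is considerably lower (0.375).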
The core of diagnosis in Chinese medicine is "pattern identification/syndrome differentiation and treatment," with inspection, listening and smelling, inquiry, and palpation as its bases. Inspection ranks first among the four diagnostic methods, and tongue diagnosis is a crucial part of it. The tongue is connected to the internal organs through meridians; thus the conditions of the organs, qi, blood, and body fluids, as well as the degree and progression of disease, are all reflected on the tongue. Organ conditions and the properties and variations of pathogens can be revealed through observation of the tongue. Tongue inspection covers the shape, color, and coating of the tongue; that is, tongue diagnosis has three dimensions. Krippendorff's alpha is a good approach for agreement analysis when evaluating the agreement of many TCM practitioners with ordinal data. However, it is complex, and it renders only a single index representing agreement. More importantly, Krippendorff's alpha cannot deal with the high-dimensional ordinal data obtained through TCM tongue diagnosis. These two pitfalls invalidate the application of Krippendorff's alpha to the analysis of such data. In light of the previous observation, we aim to propose an effective approach that deals simultaneously with high-dimensional ordinal data and with the case in which clusters are present in the rating results.
A single value of agreement can only represent the "average mass" of agreement. We can hardly derive any meaningful information from a single agreement measure, especially when clusters are present. For example, in the diagnosis of tongue shapes (thick, medium, and thin), suppose that three TCM practitioners judge some patients as "thick" and another three practitioners judge them as "medium." We might reach a low-agreement conclusion, even though the agreement within each of the two groups is strong. It is interesting that, although overall agreement might be low, different TCM prescriptions could work equally well. From this perspective, multiple-dimensional scaling (MDS) may prove to be a better alternative for analyzing the agreement of diagnostics among many TCM practitioners with high-dimensional ordinal data. Kupper and Hafner proposed a method to assess the extent of interrater agreement when each unit to be rated is characterized by a subset of distinct nominal attributes [9]. When the attribute data are high-dimensional, the interrater agreement can be treated as the similarity used in MDS [10]. The essence of MDS is an attempt to represent the observed similarities or dissimilarities in the form of a geometrical model by embedding the stimuli of interest in some coordinate space so that a specified measure of distance, for example, Euclidean distance, between the points in the space represents the observed proximities. In other words, MDS is the search for a low-dimensional space in which each point represents a stimulus and the distance between points corresponds to dissimilarity.
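The six-practitioner example can be sketched numerically. The block below is an illustration under stated assumptions: the ratings are hypothetical, dissimilarity is taken as one minus the pairwise proportion of agreement, and the embedding uses a plain metric MDS (SMACOF-style Guttman updates from a fixed circular starting layout) rather than the nonmetric MDS used in this study.

```python
import math
from itertools import combinations


def agreement(r1, r2):
    """Proportion of items on which two raters give the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def distances(points):
    """Matrix of Euclidean distances between 2-D points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def raw_stress(delta, points):
    """Sum of squared gaps between dissimilarities and embedded distances."""
    d = distances(points)
    return sum((delta[i][j] - d[i][j]) ** 2
               for i, j in combinations(range(len(points)), 2))


def initial_layout(n):
    """Deterministic start: n points evenly spaced on the unit circle."""
    return [[math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
            for i in range(n)]


def metric_mds(delta, iterations=200):
    """Metric MDS in two dimensions via repeated Guttman transforms."""
    n = len(delta)
    x = initial_layout(n)
    for _ in range(iterations):
        d = distances(x)
        new_x = []
        for i in range(n):
            px = py = 0.0
            for j in range(n):
                if i != j and d[i][j] > 0:
                    ratio = delta[i][j] / d[i][j]
                    px += ratio * (x[i][0] - x[j][0])
                    py += ratio * (x[i][1] - x[j][1])
            new_x.append([px / n, py / n])
        x = new_x
    return x


# Hypothetical ratings of five patients: three raters say "thick", three "medium".
ratings = [["thick"] * 5] * 3 + [["medium"] * 5] * 3
delta = [[1.0 - agreement(a, b) for b in ratings] for a in ratings]

pairs = list(combinations(range(len(ratings)), 2))
overall = sum(1.0 - delta[i][j] for i, j in pairs) / len(pairs)
layout = metric_mds(delta)

print(f"mean pairwise agreement = {overall:.2f}")
for index, (px, py) in enumerate(layout, start=1):
    print(f"rater {index}: ({px:+.3f}, {py:+.3f})")
```

The mean pairwise agreement is only 0.40, yet in the embedded layout the first three raters and the last three raters fall into two separate groups, which is the kind of structure a single coefficient cannot show.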
In this study, we recruited eleven TCM practitioners with ages ranging from 29 to 47. A total of 15 tongue

Patients and TCM Tongue Inspectors.
Fifteen pictures of tongues were randomly selected from the archive of the Department of TCM, Changhua Christian Hospital (CCH). The pictures were taken by a digital image capturing and analyzing system called ATDS and were rated by eleven TCM practitioners with ages ranging from 29 to 47. The recruited TCM physicians were asked to classify each image, based on the Eight Principles, according to the features revealed by the tongues.

Statistical Analysis.
In this study we use four measures to conduct a nonmetric MDS, which was first proposed by Kruskal [11,12]. The four measures are Kupper and Hafner's IAMA [9] (interrater agreement for multiple attributes), mean character difference (MCD), index of association [10] (IOA), and average Cohen's kappa (Cohen's kappa). The IAMA measure is a chance-corrected concordance. Among these four measures, IAMA and Cohen's kappa are similarity measures, while the other two measure dissimilarity. These four measures are described in detail in the Appendix. Table 1 is a summary of the patterns of the fifteen patients that were identified by the eleven TCM physicians of CCH according to the Eight Principles. The letters in the body of the table refer to specific TCM physicians. In Table 2, the dissimilarities obtained by IAMA among the TCM physicians are listed. For example, the interrater agreement between rater A and rater C is 0.2462; therefore, the dissimilarity can be defined as 1 − 0.2462 = 0.7538. Naturally, the diagonal entries are identically zero. The MDS graphs of agreement measures by the four proposed approaches are illustrated in Figure 1. The upper-left graph uses the IAMA measure to conduct MDS, the upper-right one corresponds to the MCD method, the lower-left one represents the IOA method, and the lower-right one employs the average of Cohen's kappa over each attribute in the eight patterns between two distinct raters.
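How a pairwise measure becomes an MDS input matrix can be sketched as follows. This is a minimal illustration, not the computation used for Table 2: the rater profiles are hypothetical 0/1 vectors over eight attributes, the function names are invented for the example, and the mean character difference is taken in its usual form, the average absolute difference across attributes.

```python
def to_dissimilarity(similarity):
    """Turn a similarity in [0, 1] into a dissimilarity, as in 1 - 0.2462."""
    return 1.0 - similarity


def mean_character_difference(u, v):
    """Average absolute difference between two attribute vectors."""
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def dissimilarity_matrix(profiles, measure):
    """Symmetric matrix of pairwise dissimilarities with a zero diagonal."""
    names = sorted(profiles)
    matrix = [[0.0 if a == b else measure(profiles[a], profiles[b])
               for b in names] for a in names]
    return names, matrix


# Hypothetical presence/absence profiles over eight attributes (illustration only).
profiles = {
    "P1": [1, 0, 0, 1, 1, 0, 0, 1],
    "P2": [1, 0, 1, 0, 1, 0, 0, 1],
    "P3": [0, 1, 0, 1, 0, 1, 1, 0],
}

names, matrix = dissimilarity_matrix(profiles, mean_character_difference)
for name, row in zip(names, matrix):
    print(name, [round(value, 4) for value in row])

# The worked example from the text: agreement 0.2462 between raters A and C.
print(round(to_dissimilarity(0.2462), 4))
```

A similarity measure such as IAMA or Cohen's kappa would first pass through `to_dissimilarity`, while MCD can be used directly because it already measures dissimilarity.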

Results
We summarize the diagnoses of the patterns of the fifteen patients in Table 1. Using the four measures mentioned previously, MDS analysis can be conducted on the resulting similarities or dissimilarities. Figure 1 shows that the MDS graphs by IAMA and Cohen's kappa are similar. Rater C is an outlier in all four graphs. The graphs by IAMA and Cohen's kappa share further characteristics. First, raters I and F lie slightly away from the biggest cluster, formed by raters B, D, E, G, J, and K. Second, raters A and H form a small cluster. The traditional MDS distances, MCD and IOA, lead to results similar to each other. In these two graphs, raters C, I, H, and A are isolated singletons, and there is only one cluster, formed by raters B, D, E, F, G, J, and K. In all four graphs, raters B, D, E, G, J, and K form a cluster.

Conclusion
In TCM diagnostics, practitioners are routinely confronted with a multiple-dimensional qualitative problem of symptom identification. Conventionally, the diagnosis according to the Eight Principles summarizes the dynamics of a patient pursuing TCM treatment. When a TCM practitioner receives the information gathered by way of the four diagnostic methods, "inspection, listening (smelling), inquiring, and palpation," he has to distinguish the patterns that are coherent with the symptoms exhibited by the patient. Therefore, how to measure the agreement of diagnoses according to the vector of attributes observed by TCM practitioners is an important issue.
For a single attribute, researchers are used to adopting Cohen's kappa, Fleiss kappa, or Krippendorff's alpha to obtain a single-valued agreement measure. These popular agreement measures share a drawback: they come with no rule of thumb for judging the level of agreement. In this study, we introduce a novel approach that uses interrater agreement measures, including the IAMA proposed by Kupper and Hafner and the average Cohen's kappa, to calculate dissimilarities between any pair of raters. Using the dissimilarity measures, the MDS analysis can be conducted and an agreement graph is subsequently obtained. Figure 1 shows that rater C remains an outlier for all four methods. This might be because his diagnoses include many "mixture" patterns, for example, "Yin" mixed with "Yang," or "Cold" mixed with "Hot," and so forth. Rater C is a senior TCM physician in the Department of TCM of CCH and has extensive research experience. Moreover, raters A and H are not only TCM practitioners at CCH but have also participated actively in advanced TCM studies for many years. From these analyses, beyond agreement itself, we can distinguish the raters by clusters. As mentioned in the Introduction, the conventional single agreement measure is quite restricted in its ability to reveal the meaning hidden underneath. It cannot judge whether a given "moderate" agreement coefficient is sufficient to quantify the reliability of TCM diagnostics. If clusters are latently present among the raters, MDS can prove itself an effective distinguisher.

A. IAMA Responses Proposed by Kupper and Hafner
Consider a study in which two equally trained raters, say raters A and B, independently examine each of $N$ units. Let $A_i$ denote the subset of attributes for the $i$th unit chosen by rater A, and let $\mathrm{card}(A_i) = a_i$, $0 \le a_i \le k$, denote the cardinality of set $A_i$. The symbol $\bar{A}$ stands for the complement of set $A$. We may depict the data for the $i$th unit as follows. Define the random variable to be the number of attributes for the $i$th unit either chosen by both raters or not chosen by either rater. Define the following agreement proportion: the overall concordance and the chance-corrected concordance $\pi_{AB} = \dfrac{\pi - \pi_0}{1 - \pi_0}$, where $\pi_0 = \dfrac{1}{Nk} \cdots$
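The two quantities that are fully specified above can be sketched in a few lines: the per-unit count of attributes on which the raters concur, and the chance correction itself. The function names and the attribute labels are hypothetical; the overall concordance and the chance term are taken as given inputs here, because their full expressions are not reproduced in this appendix.

```python
def concordant_count(chosen_a, chosen_b, k):
    """Number of the k attributes chosen by both raters or by neither."""
    a, b = set(chosen_a), set(chosen_b)
    if len(a | b) > k:
        raise ValueError("more distinct attributes chosen than k allows")
    chosen_by_both = len(a & b)
    chosen_by_neither = k - len(a | b)
    return chosen_by_both + chosen_by_neither


def chance_corrected(pi, pi_0):
    """Chance-corrected concordance (pi - pi_0) / (1 - pi_0)."""
    if pi_0 == 1.0:
        raise ValueError("correction is undefined when pi_0 equals 1")
    return (pi - pi_0) / (1.0 - pi_0)


# Hypothetical unit with k = 8 attributes (illustration only).
rater_a = {"yin", "cold"}
rater_b = {"yin", "deficiency"}
print(concordant_count(rater_a, rater_b, k=8))

# Hypothetical values of the overall concordance and the chance term.
print(chance_corrected(0.75, 0.5))
```

In the example the raters share one chosen attribute and jointly leave five unchosen, so six of the eight attributes are concordant.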

Conflict of Interests
No competing financial interests exist.
Evidence-Based Complementary and Alternative Medicine