Tongue Color Analysis for Medical Application

An in-depth systematic tongue color analysis system for medical applications is proposed. Using the tongue color gamut, tongue foreground pixels are first extracted and assigned to one of 12 colors representing this gamut. The ratio of each color over the entire image is calculated and forms a tongue color feature vector. In experiments on a large dataset consisting of 143 Healthy and 902 Disease samples (13 groups of more than 10 samples and one miscellaneous group), a given tongue sample can be classified into one of these two classes with an average accuracy of 91.99%. Further testing showed that Disease samples can be split into three clusters, and within each cluster most if not all of the illnesses are distinguished from one another. In total, 11 illnesses have a classification rate greater than 70%. This demonstrates a relationship between the state of the human body and its tongue color.


Introduction
The human tongue contains numerous features that can be used to diagnose disease, with color features being the most important [1][2][3][4]. Traditionally, medical practitioners would examine these color features based on years of experience [5][6][7][8][9][10][11]. However, their diagnostic results are always accompanied by ambiguity and subjectivity. To remove these qualitative aspects, tongue color can be analyzed objectively through its color features, which offers a new way to diagnose disease, one that minimizes the physical harm inflicted on patients (compared with other medical examinations).
A literature review on this topic revealed only a few papers where color features from the tongue are the main component used to diagnose disease. Reference [12] used tongue color along with qualitative and quantitative analysis to examine 207 patients suffering from lung cancer. The patients were split into four syndrome groups according to Chinese Medicine, and the CIELAB color model was used for quantitative classification. The authors reported a statistically significant difference between the four groups in each tongue's CIELAB value. The correct classification rates of the groups were 69.4%, 54.4%, 72.2%, and 54.4%, respectively. A color metric was utilized in [13] to diagnose appendicitis using tongue images. A total of 798 tongue samples were captured from 399 patients (two samples from each person), consisting of common illnesses such as diabetes, pancreatitis, and hypertension, along with 114 images from tongues affected by appendicitis. The samples were captured using a specially designed device consisting of a 3-CCD digital camera and two D65 lights, calibrated with a white color plate. Four color spaces (RGB, CIExyY, CIELUV, and CIELAB) were evaluated to find the best combination. The reported correct classification rate for appendicitis was 66.67%. Pancreatitis (29 samples) and appendicitis (53 samples) tongue images again appeared in [14], this time with 56 normal samples. These images were captured with a device designed by the authors' research center consisting of a lens, CCD sensor board, two D65 lights, and video frame grabber. Assessing the same four color spaces as [13], the experimental results showed that normal and pancreatitis, as well as appendicitis and pancreatitis, can be linearly separated using color features.
In [12] the dataset was quite small and only one disease was analyzed. The patients were also diagnosed using Chinese Medicine. As for [13], its dataset was substantially larger but only appendicitis was classified. The samples in [14] include two illnesses as well as normal, but their sizes are too small to have any statistical significance. Neither [12] nor [14] used any image correction to ensure uniform feature extraction and analysis under different operating conditions. Also, [12][13][14] used some variation of the CIE color space to embody the tongue colors, which may not be very accurate. Therefore, there is a lack of any work on an in-depth systematic tongue color analysis system for medical applications, one that accurately captures the images and represents their colors using a tongue color gamut [15]. In this paper such an application is described to address these problems.
Tongue images are first captured using a specifically designed device with image correction. Afterwards, the images are segmented [16], with the background removed and the tongue foreground remaining. Color features from each pixel are then extracted and assigned to 1 of 12 colors symbolizing the tongue color gamut [15]. This produces the tongue color feature vector. Experiments were carried out on a large-scale dataset consisting of 143 Healthy and 902 Disease samples (diagnosed using Western Medicine) taken from Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangdong, China. The Disease class was composed of 13 specific illnesses (with at least 10 samples in each group) and one sizeable miscellaneous group (made up of various illnesses). Classification was performed between the illnesses in addition to Healthy versus Disease.
Evidence-Based Complementary and Alternative Medicine
Table 1 (recovered rows only; the remaining rows and the first Disease ID below are missing).
Disease ID    Disease name                         Number of samples
              Upper Respiratory Tract Infection    12
10            Erosive Gastritis                    10
11            Coronary Heart Disease               13
12            Chronic Bronchitis                   11
13            Mixed Hemorrhoid                     11
14            Miscellaneous                        500
The rest of this paper is organized as follows. An introduction to the tongue image acquisition device and dataset used is given in Section 2. Section 3 summarizes the tongue color gamut and explains how color features are extracted using it. In Section 4 classification between the two classes of Healthy and Disease is performed. Following this, illnesses in the Disease class are classified. Finally, concluding remarks are made in Section 5.

Materials
The tongue database is composed of 1045 images (one image per person) split into 143 Healthy and 902 Disease, captured at Guangdong Provincial Hospital of Traditional Chinese Medicine, Guangdong, China. The patients' consent was obtained according to the Declaration of Helsinki, and the Ethical Committee of the Institution in which the work was performed approved it. The capture device used was a three-chip CCD camera with 8-bit resolution and two D65 fluorescent tubes placed symmetrically around the camera in order to produce uniform illumination. The captured images were color corrected [17] to eliminate any noise caused by variations of illumination and device dependency. This allows consistent feature extraction and classification in the following steps. Figure 1 shows the capture device. Healthy samples were verified through a blood test and other experiments. If the indicators from the tests fell within a certain range, the subjects were deemed healthy. In the Disease class, samples were collected from inpatients with illnesses determined by their admission note, diagnosed using Western Medical practices. Inpatients suffering from the same illness were grouped together into a single class. In total there were 13 ailment groups (with at least 10 samples) and one miscellaneous group containing various illnesses. A summary of the Disease class breakdown is given in Table 1. Please note that any future reference to a specific illness in Table 1 will be made using its Disease ID.

Methods
The following section describes how color features are extracted from tongue images. The tongue color gamut is first summarized in Section 3.1. In Section 3.2, every foreground tongue pixel is compared to 12 colors representing the tongue color gamut and assigned its nearest color. This forms the color features.

Tongue Color Gamut.
The tongue color gamut [15] represents all possible colors that appear on the tongue surface and exists within the red boundary shown in Figure 2 (CIExy chromaticity diagram). Further investigation revealed that 98% of the points lie inside the black boundary. To represent the tongue color gamut using 12 colors, the RGB color space is employed and plotted in Figure 3 [...] and P. LR (Light red), LP (Light purple), and LB (Light blue) are midpoints between lines from the black boundary to W, while DR (Deep red) is selected as no previous point occupies that area. More details about the tongue color gamut can be found in [15]. GY (Gray) and BK (Black) are not shown in Figure 3 since both belong to grayscale.
The 12 colors representing the tongue color gamut are extracted from Figure 3 and shown in Figure 4, each as a color square with its label on top. The corresponding RGB and CIELAB values are given in Table 2.

Tongue Color Features.
Given a tongue image, segmentation is first applied to locate all foreground tongue pixels [16]. Once each pixel is located, its RGB value is extracted and converted to CIELAB [18] by first converting RGB to CIEXYZ and then converting CIEXYZ to CIELAB using (2):
\[ L^* = 116 f(Y/Y_0) - 16, \qquad a^* = 500\,[f(X/X_0) - f(Y/Y_0)], \qquad b^* = 200\,[f(Y/Y_0) - f(Z/Z_0)], \tag{2} \]
where $f(t) = t^{1/3}$ if $t > 0.008856$ and $f(t) = 7.787\,t + 16/116$ otherwise. In (2), $X_0$, $Y_0$, and $Z_0$ are the CIE tristimulus values of the reference white point. The LAB values are then compared to the 12 colors from the tongue color gamut (see Table 2), and the pixel is assigned the color closest to it (measured using Euclidean distance). After all tongue foreground pixels have been processed, the total for each color is summed and divided by the number of pixels. These ratios of the 12 colors form the tongue color feature vector $V = [c_1, c_2, \ldots, c_{12}]$, where $c_i$ follows the sequence of colors in Table 2. As an example, the color features of two tongues are shown in visual form (refer to Figures 5 and 6) along with their extracted tongue color feature vectors, where each pixel of the original image is mapped to one of the 12 colors. Figure 5 is from a Healthy sample and Figure 6 is from a Disease sample. In the Healthy sample the majority of pixels are LR, and in the Disease sample the majority are GY. The mean colors of Healthy and Disease are displayed in Table 3, and three typical samples from each class are shown in Figure 7. Disease tongues have a higher ratio in R, DR, BK, GY, and Y according to Table 3. On the other hand, LR and W are greater in Healthy. Only 7 of the 12 colors are listed, as the remaining 5 have ratios of less than 1%.
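The feature extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the RGB to CIEXYZ matrix assumes sRGB primaries with a D65 white point and no gamma decoding, and the palette holds four placeholder colors with made-up RGB values, since Table 2 is not reproduced here. A real run would list all 12 entries of Table 2 in order.

```python
import math
from collections import Counter

# ASSUMPTION: sRGB primaries, D65 white point; the paper's own matrix may differ.
RGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)
# Reference white (X0, Y0, Z0): the XYZ value of RGB = (1, 1, 1) under this matrix.
WHITE = tuple(sum(row) for row in RGB_TO_XYZ)

# PLACEHOLDER palette (hypothetical RGB values, not the values of Table 2).
PALETTE_RGB = {
    "R": (200, 30, 30),
    "W": (255, 255, 255),
    "BK": (0, 0, 0),
    "GY": (128, 128, 128),
}


def f(t):
    """Nonlinearity of the standard CIELAB definition."""
    return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0


def rgb_to_lab(rgb):
    """Convert an 8-bit RGB triple to CIELAB via CIEXYZ (values treated as linear)."""
    r, g, b = (c / 255.0 for c in rgb)
    x, y, z = (m[0] * r + m[1] * g + m[2] * b for m in RGB_TO_XYZ)
    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


PALETTE_LAB = {name: rgb_to_lab(rgb) for name, rgb in PALETTE_RGB.items()}


def nearest_color(rgb):
    """Label of the palette color closest to the pixel in CIELAB (Euclidean distance)."""
    lab = rgb_to_lab(rgb)
    return min(PALETTE_LAB, key=lambda name: math.dist(lab, PALETTE_LAB[name]))


def color_feature_vector(pixels):
    """Ratio of foreground pixels assigned to each palette color, in palette order."""
    counts = Counter(nearest_color(p) for p in pixels)
    return [counts[name] / len(pixels) for name in PALETTE_RGB]


# Four hypothetical foreground pixels: one near white, two near black, one gray.
pixels = [(250, 250, 250), (5, 5, 5), (5, 5, 5), (130, 130, 130)]
vector = color_feature_vector(pixels)
```

The resulting vector always sums to 1, so it can be compared across images with different numbers of foreground pixels.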

Results and Discussion
In this section classification using color features is described. Classification between Healthy and Disease is first given in Section 4.1, while illnesses in Disease are classified in Section 4.2. Table 4 shows the classification rate between Healthy and Disease on the test data. Half the images were randomly selected from each class to form the training set, and the remaining samples were assigned to the test set. The training data in each class are the mean tongue color features of Healthy and Disease. To reduce the number of tongue color features, feature selection with sequential forward search was implemented. Both $k$-NN [19] and SVM [19] using a quadratic kernel were tested, producing the same result, as can be seen in Table 4. This means that, for both $k$-NN and SVM, the tongue color feature vectors of the training set consisting of Healthy and Disease were placed in an $n$-dimensional space. Each tongue color feature vector in the test set was mapped to this space and classified according to the rule of the classifier used ($k$-NN or SVM).
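The experimental protocol above (a random half split per class, a nearest-neighbor classifier, and sequential forward search over the color features) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: "average accuracy" is taken here to be the mean of the per-class rates, the forward search is scored directly on a held-out set, and all function names are hypothetical. The SVM variant is omitted.

```python
import math
import random


def split_half(samples, rng):
    """Randomly send half of each class to training and the rest to testing."""
    train, test = [], []
    for label in sorted({lab for _, lab in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        half = len(group) // 2
        train += group[:half]
        test += group[half:]
    return train, test


def knn_predict(train, x, feats, k=1):
    """Majority label among the k nearest training vectors, using only `feats`."""
    def dist(v):
        return math.sqrt(sum((v[i] - x[i]) ** 2 for i in feats))
    labels = [lab for _, lab in sorted(train, key=lambda s: dist(s[0]))[:k]]
    return max(sorted(set(labels)), key=labels.count)


def average_accuracy(train, test, feats, k=1):
    """ASSUMPTION: average accuracy = mean of the per-class correct rates."""
    hits, totals = {}, {}
    for x, label in test:
        totals[label] = totals.get(label, 0) + 1
        if knn_predict(train, x, feats, k) == label:
            hits[label] = hits.get(label, 0) + 1
    return sum(hits.get(c, 0) / totals[c] for c in totals) / len(totals)


def sequential_forward_search(train, held_out, n_feats, k=1):
    """Greedily add the feature that most improves accuracy; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(n_feats))
    while remaining:
        score, feat = max(
            (average_accuracy(train, held_out, selected + [f], k), f)
            for f in remaining
        )
        if score <= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Toy two-feature data (made up): feature 0 separates the classes, feature 1 does not.
train = [([0.8, 0.5], "Healthy"), ([0.7, 0.1], "Healthy"),
         ([0.2, 0.5], "Disease"), ([0.3, 0.1], "Disease")]
held_out = [([0.75, 0.1], "Healthy"), ([0.9, 0.5], "Healthy"),
            ([0.25, 0.5], "Disease"), ([0.1, 0.1], "Disease")]
selected, best = sequential_forward_search(train, held_out, n_feats=2)
```

Averaging the per-class rates keeps the much larger Disease class (902 samples against 143 Healthy) from dominating the score, which is why it is used in this sketch.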

Typical Disease Analysis.
With Healthy and Disease separated, the next step is to examine whether certain illnesses within the Disease class can be distinguished from one another. All 13 illnesses were grouped into three clusters by fuzzy c-means (FCM) [19], with Table 5 illustrating which cluster each illness belongs to. The mean tongue color features of each cluster are shown in Table 6. R, DR, and LR are greater in Cluster 3. Cluster 2 has higher concentrations of GY, BK, and W, while Y is more significant in Cluster 1. Table 7 shows the classification rate of the three clusters, calculated for each pair of clusters. In each case the two clusters in question are clearly separable, as seen in this table and Figures 8, 9, and 10. Three typical samples from each cluster are depicted in Figure 11. From a visual perspective, the tongue color features in each cluster are quite different from those of the other clusters.
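The clustering step can be illustrated with a compact FCM sketch. It assumes each illness is represented by one mean feature vector, a fuzzifier of m = 2, and cluster centers initialized from the first c input points (which must be distinct); none of these settings is stated in the text, and the two-dimensional points below are made up for the example.

```python
import math


def memberships(point, centers, m):
    """Fuzzy membership of one point in every cluster (each row sums to 1)."""
    d = [math.dist(point, ctr) for ctr in centers]
    zeros = d.count(0.0)
    if zeros:  # the point coincides with one or more centers
        return [1.0 / zeros if di == 0.0 else 0.0 for di in d]
    e = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** e for dk in d) for di in d]


def fcm(points, c, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means; returns (centers, membership matrix)."""
    # ASSUMPTION: simplistic initialization from the first c (distinct) points.
    centers = [list(p) for p in points[:c]]
    dims = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wj * p[k] for wj, p in zip(w, points)) / total
                 for k in range(dims)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [memberships(p, centers, m) for p in points]


def hard_labels(u):
    """Assign each point to the cluster with its largest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]


# Six made-up points forming three well-separated groups.
points = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0],
          [0.5, 0.2], [10.3, 0.1], [0.2, 9.7]]
centers, u = fcm(points, c=3)
labels = hard_labels(u)
```

Taking the largest membership per illness yields a hard assignment of the kind listed in Table 5, while the membership values themselves show how close an illness lies to a neighboring cluster.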
Next, each cluster is examined one by one to determine whether the illnesses within it can be classified. This is accomplished by comparing the illnesses inside the cluster and removing the illness with the highest classification rate. The process is repeated until all illnesses have been classified. The same experimental setup described in Section 4.1 was applied, where half the images are randomly selected for the training set and the remainder form the test set. Both $k$-NN and SVM were used as the classifiers, along with sequential forward search for feature selection. An illness is considered successfully classified if its average accuracy is greater than or equal to 70%. The average accuracies stated in the following represent only SVM. For a complete list of the results please refer to Table 8. Diseases 1 and 13 in Cluster 1 are separable with an average accuracy of 76.08%. In Cluster 2, Disease 7 can be removed first, as its classification rate of 93.06% is the highest amongst the six illnesses. Diseases 10, 8, and 9 are subsequently taken out in that order, which leaves illnesses 5 and 12 (classification rate of 81.45%). Looking at Cluster 3, Disease 6, with a classification rate of 74.05%, is removed first. This is followed by Diseases 3 and 11, leaving 2 and 4, which produced the lowest classification result of 54.41%. Table 8 summarizes this result. Diseases 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, and 13 achieved an average accuracy greater than 70% and are therefore deemed successfully classified. Typical samples of the successfully classified illnesses are shown in Figures 12 to 22.
As part of future work, we plan on returning to Guangdong Provincial Hospital of Traditional Chinese Medicine and collecting more diseased tongue images. Color features (discussed in Section 3.2) will be extracted from these new images before combining them with the previous batch. The experimental results, in the form of Healthy versus Disease classification and typical disease analysis, will be recalculated in order to further validate their statistical accuracy.
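The within-cluster procedure of this section (repeatedly remove the illness with the highest classification rate, then classify the last two against each other) can be sketched as a short loop. The function rate_fn is a hypothetical stand-in for a full train and test run, assumed here to score one illness against the other illnesses still remaining in the cluster; the illness names and rates in the example are invented and are not the results of Table 8.

```python
def peel_cluster(illnesses, rate_fn, threshold=70.0):
    """Remove the best-classified illness one at a time; compare the final pair.

    rate_fn(illness, others) -> classification rate in percent (hypothetical hook).
    Returns a list of (illnesses, rate, successfully_classified) tuples.
    """
    remaining = list(illnesses)
    results = []
    while len(remaining) > 2:
        rates = {d: rate_fn(d, [o for o in remaining if o != d])
                 for d in remaining}
        best = max(rates, key=rates.get)
        results.append(((best,), rates[best], rates[best] >= threshold))
        remaining.remove(best)
    if len(remaining) == 2:
        # The last two illnesses are classified against each other and share one rate.
        rate = rate_fn(remaining[0], [remaining[1]])
        results.append((tuple(remaining), rate, rate >= threshold))
    return results


# Made-up rates for four placeholder illnesses; a real rate_fn would train a classifier.
fake_rates = {"A": 90.0, "B": 80.0, "C": 60.0, "D": 75.0}
results = peel_cluster(["A", "B", "C", "D"],
                       lambda illness, others: fake_rates[illness])
```

With the 70% threshold used in the text, the final pair in this toy run falls below the cut and is reported as not successfully classified, mirroring how the last pair of a cluster shares a single rate.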

Conclusion
Given a tongue image, the tongue color analysis system is able to first distinguish Healthy from Disease with an average accuracy of 91.99%. If the image is from Disease, it is further assigned to one of three clusters. From these clusters, 11 illnesses can be successfully classified given a classification rate of at least 70%. The proposed method uses a special capture device with image correction and extracts a tongue color feature vector from each image. This vector consists of 12 color ratios calculated with the tongue color gamut to better characterize each foreground tongue pixel. Testing was carried out on a large dataset collected from Guangdong, China, consisting of 143 Healthy and 902 Disease images (13 specific illnesses with at least 10 samples each and a miscellaneous group). The experimental results showed that there is a relationship between tongue color and the state of the human body, which can be used in medical applications to detect various illnesses. This can potentially lead to a new painless and efficient way to examine patients.