Effective Heart Disease Detection Based on Quantitative Computerized Traditional Chinese Medicine Using Representation Based Classifiers

At present, heart disease is the number one cause of death worldwide. Traditionally, heart disease is commonly detected using blood tests, electrocardiograms, cardiac computerized tomography scans, cardiac magnetic resonance imaging, and so on. However, these traditional diagnostic methods are time consuming and/or invasive. In this paper, we propose an effective noninvasive computerized method based on facial images to quantitatively detect heart disease. Specifically, facial key block color features are extracted from facial images and analyzed using the Probabilistic Collaborative Representation Based Classifier. The idea of facial key block color analysis is founded in Traditional Chinese Medicine. The proposed method was evaluated on a new dataset consisting of 581 heart disease and 581 healthy samples. To optimize the Probabilistic Collaborative Representation Based Classifier, an analysis of its parameters was performed. According to the experimental results, the proposed method obtains the highest accuracy compared with other classifiers and is proven to be effective at heart disease detection.


Introduction
Heart disease (HD) is a broad term used for a wide variety of diseases of the heart and blood vessels, such as coronary artery disease (CAD) [1] and heart rhythm disorders called arrhythmias (ARR) [2]. According to the World Health Organization (WHO), HD is the number one cause of death globally [3]. In 2012, it was estimated that HD caused about 17.5 million deaths, which means a person died from HD roughly every 2 seconds [4]. There are many tests to diagnose HD; the main traditional diagnostic methods of HD are [5] blood tests, Electrocardiogram (ECG) [6], Holter monitoring [7], echocardiogram [8], cardiac catheterization [9], cardiac computerized tomography (CT) scan [10], and cardiac magnetic resonance imaging (MRI) [11].
Many clues about the health of a person's heart can be discovered in his/her blood. However, a single blood test cannot reflect the risk of heart disease. Two common blood tests for heart disease are a cholesterol test and a C-reactive protein (CRP) test. These tests analyze cholesterol and CRP contents in the blood, respectively, and together the results can help create a clear picture of a person's heart health [12]. An ECG records electrical signals, while a Holter monitor is a portable device the patient wears to record a continuous ECG, usually for 24 to 72 hours. An echocardiogram uses sound waves to produce images of a person's heart, while a stress test records a person's signs and symptoms during exercise using an ECG or echocardiogram. For cardiac catheterization, a special dye needs to be injected into a person's coronary arteries through a long, thin, and flexible tube (catheter), usually inserted in the leg. The dye then outlines narrow spots and blockages that appear in X-ray images. A CT scan and MRI can also help doctors detect calcium deposits in the patient's arteries that can narrow them.
Blood tests performed on individuals with HD are considered invasive since bodily fluids are drawn, and the laboratory analysis can take time to produce a result. An ECG, on the other hand, is not as invasive as a blood test, but in the case of Holter monitoring, it is time consuming. As for cardiac catheterization, the injection of a special dye is invasive by definition. Given these issues, there is a need to develop a noninvasive computerized method to detect HD.
In 2008, Kim et al. proposed one such method to conduct the color compensation of a facial image based on the analysis of facial color [13], rooted in Traditional Chinese Medicine (TCM). In [13], they extracted the center forehead and lips of a person and analyzed the red color value distribution of these regions. The authors intended to survey real clinical data of HD patients and group them into different cases based on the premise that facial color can help doctors diagnose HD. However, the authors only proposed the method and did not evaluate it on a real dataset.
Recently, Zhang et al. [14] used facial block color features to detect diabetes in a noninvasive manner with the Sparse Representation Based Classifier (SRC). Even though their detection results are relatively high, further analyses using other representation algorithms have not been studied nor have these algorithms been applied to detect other nondiabetic diseases. To resolve these issues, we propose an effective noninvasive computerized method to detect HD through facial image analysis via the Probabilistic Collaborative Representation Based Classifier (ProCRC) and apply our proposed method on a real dataset. ProCRC was first proposed in [15] and applied in pattern recognition, being developed from the Collaborative Representation Based Classifier (CRC) of [16]. Zhang et al. [16] proved that Collaborative Representation played a more important role than sparsity in pattern recognition and proposed CRC, which outperformed the SRC [17] and also runs much faster. In our work, the ProCRC was modified to be applied for HD detection based on facial key block color features. The ProCRC combines CRC and the probabilistic theory.
For the proposed method, facial images are first captured through a specially designed facial image capture device and four facial key blocks are extracted from each image. A color gamut with six-facial-color centroids is employed to extract color features from each block. The dataset used in this paper has two distinctive classes: (1) HD with 581 samples and (2) healthy (H) consisting of 581 samples. Based on the seven facial key block permutations, ProCRC with its optimal parameters is applied to classify HD versus H. To the best of our knowledge, this is the first time noninvasive computerized heart disease detection has been proposed in the literature.
The organization of this paper is as follows. The details about the dataset are presented in Section 2. Feature extraction of the facial key blocks is given in Section 3, followed by a description of our proposed method using ProCRC in Section 4. Section 5 describes and discusses the experimental results, and Section 6 concludes this paper.

Dataset
The dataset we collected and used in this work consists of 581 H and 581 HD samples from the Guangdong Provincial TCM Hospital, Guangdong, China, in 2015. Individuals were diagnosed as healthy by medical professionals practicing Western medicine, while heart disease patients were diagnosed using the methods described in Section 1. Please note the handling of human subjects was done according to the principles outlined in the Declaration of Helsinki and each individual gave their consent to be a part of this study. Ethical approval was obtained from the Science and Technology Development Fund (FDCT) of Macao for this study with the project number FDCT 124/2014/A3.
The gender and age distributions of H and HD are described in this section. During data collection, it was sometimes difficult to record everyone's information due to various circumstances; therefore, the gender and age distributions contain cases of no record (NR). The pie charts in Figure 1 show the gender distribution of the dataset, where blue represents males, yellow females, and gray NR: (1) H (Figure 1(a)) and (2) HD (Figure 1(b)). According to Figure 1(a), 72 people in H are missing their gender information, and about half of the healthy dataset is female (295), while the number of males is 214. Different from the H dataset, the HD dataset has only 6 NR cases: about one-third of the HD patients are female (171), while about two-thirds are male (404) (see Figure 1(b)). The age distribution (in years) is given in Table 1, where age is split into 5 ranges: [1-17], [18-24], [25-60], [61-80], and [≥81]. In this table, the first column is the class name, and each class has two rows: the first gives the number of people in each age range and the second the corresponding percentage of the total. For the H dataset, most people are aged from 18 to 60 (56.28% + 30.81% = 87.09%), no healthy person is above 80, and only 4 people are above 60. As for the HD dataset consisting of 581 samples, the majority of HD patients are aged from 25 to 80 (68.5% + 19.79% = 88.29%).
It should be noted that the missing gender and age information does not affect our study since we are only interested in each individual's health status.

Facial Key Block Feature Extraction
In order to decrease the effects of the capture environment, a specially designed facial image capture device was applied. Using the device, the individual just needs to place his/her head on the chin rest and the device operator clicks the capture button. More details about the device can be found in [14]. A color correction procedure [18] was also performed to portray the facial images in an accurate way after image capture.
In Traditional Chinese Medicine (TCM), it is believed that the status of the internal organs can be determined from different regions of the face [19][20][21]. Figure 2 shows a human face partitioned into various regions according to TCM [22]. Facial blocks were previously defined in [23] to detect hepatitis from digital facial images. The authors extracted 5 facial blocks, one between the eyebrows, two below the eyes, one under the bridge of the nose, and one underneath the lower lip. Applying this idea to our proposed method, four facial key blocks are automatically extracted from each facial image representing the main regions. No facial block is used to represent region C in Figure 2 due to the existence of facial hair.
Hence, according to the facial regions, four facial key blocks (FHB, LCB, RCB, and NBB) are automatically extracted from each calibrated facial image. The position of each key block is defined relative to the width and height of the facial image, so only these four small blocks, rather than the whole facial image (whose dimensionality is much larger), need to be processed. A facial color gamut with six color centroids is used for feature extraction; Figure 5 shows each of the six colors as a solid colored square, with its label on top and its corresponding RGB value below.
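As a rough illustration, key block extraction can be sketched as cropping fixed-size regions at positions defined relative to the image dimensions. The relative positions and block size below are hypothetical placeholders, not the paper's actual coordinates:

```python
import numpy as np

# Hypothetical relative (x, y) centers for each key block, as fractions of
# the image width and height; the paper defines the exact positions.
KEY_BLOCKS = {
    "FHB": (0.50, 0.20),  # forehead area
    "LCB": (0.30, 0.55),  # left cheek area
    "RCB": (0.70, 0.55),  # right cheek area
    "NBB": (0.50, 0.45),  # nose bridge area
}

def extract_key_blocks(image, size=64):
    """Crop a square block of side `size` centered at each key-block
    position, given an RGB facial image of shape (H, W, 3)."""
    h, w = image.shape[:2]
    blocks = {}
    for name, (rx, ry) in KEY_BLOCKS.items():
        cx, cy = int(rx * w), int(ry * h)
        half = size // 2
        blocks[name] = image[cy - half:cy + half, cx - half:cx + half]
    return blocks
```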
Each pixel in a facial block is compared with the six color centroids and assigned to the nearest one. After all pixels of a facial block are evaluated, the number of pixels assigned to each centroid is divided by the total number of pixels. These ratios form the facial color feature vector v = [v1, v2, v3, v4, v5, v6], where the index corresponds to the sequence of the six color centroids in Figure 5.
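The nearest-centroid feature extraction above can be sketched in a few lines of NumPy. The centroid RGB values below are placeholders, since the paper's actual six-color gamut values are not reproduced here:

```python
import numpy as np

# Placeholder RGB centroids; the paper's actual gamut values (Figure 5)
# would be substituted here.
CENTROIDS = np.array([
    [220, 180, 160],
    [180, 120, 100],
    [150, 110,  80],
    [110,  80,  60],
    [200, 150, 140],
    [ 90,  60,  50],
], dtype=float)

def color_feature(block):
    """Assign each pixel of an RGB block (H x W x 3) to its nearest color
    centroid and return the 6-dim ratio vector v; the entries sum to 1."""
    pixels = block.reshape(-1, 3).astype(float)
    # squared Euclidean distance from every pixel to every centroid
    d = ((pixels[:, None, :] - CENTROIDS[None, :, :]) ** 2).sum(axis=2)
    nearest = d.argmin(axis=1)                  # index of the closest centroid
    counts = np.bincount(nearest, minlength=6)  # pixels per centroid
    return counts / pixels.shape[0]             # ratios
```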
By comparing the four facial color feature vectors (per facial image) in groups of two (using all images in the dataset), and calculating the mean absolute difference of each group, LCB and RCB are shown to have the smallest difference [14]. This is not surprising given LCB and RCB are symmetrical and located on either side of the face. Therefore, in the following experiments, RCB is removed.
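Under the assumption that each block's color features over all N images are stacked into an N x 6 matrix, the pairwise block comparison can be sketched as:

```python
import numpy as np
from itertools import combinations

def block_pair_differences(features):
    """features: dict mapping block name -> (N x 6) array of color feature
    vectors over all N images. Returns the mean absolute difference for
    every pair of blocks, so the most redundant pair can be identified."""
    diffs = {}
    for a, b in combinations(features, 2):
        diffs[(a, b)] = float(np.mean(np.abs(features[a] - features[b])))
    return diffs
```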

Sparse Representation Based Classifier (SRC).
The SRC was first proposed by Wright et al. [17] and used for face recognition. Since then, this classifier has been applied in numerous fields such as pattern recognition [14, 24], object detection [25], image restoration [26], image denoising [27], video restoration [28], and image super-resolution [29]. In the following, A represents the matrix of training samples; y denotes a query sample; x (or its estimate x̂) stands for a coefficient vector; and λ denotes a positive scalar. The principle of the SRC is to represent the query testing sample y as a linear combination of the training data A while keeping the coefficients x sufficiently sparse: the coefficients associated with the class the testing sample belongs to are expected to dominate, while the rest stay close to zero. The SRC coding is defined as

x̂ = argmin_x ||y - Ax||_2^2 + λ_SRC ||x||_1,  (2)

where λ_SRC can be set to obtain a sufficiently sparse coding vector x̂ of y over A.
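A minimal sketch of SRC follows. The paper does not specify an ℓ1 solver, so ISTA (a standard proximal-gradient method) is used here as one possible choice; classification is by the per-class reconstruction residual:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def src_classify(A, labels, y, lam=0.1, n_iter=500):
    """Solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1 with ISTA (equivalent
    to the SRC objective up to a rescaling of lam), then assign y to the
    class whose training columns give the smallest reconstruction residual."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth gradient
    step = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    residuals = {}
    for c in set(labels):
        mask = np.array(labels) == c
        residuals[c] = np.linalg.norm(y - A[:, mask] @ x[mask])
    return min(residuals, key=residuals.get)
```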

Collaborative Representation Based Classifier (CRC).
In [16], Zhang et al. showed that it is the Collaborative Representation (CR) mechanism, rather than the ℓ1-norm sparsity constraint, that truly improves the method's effectiveness, and further proposed the Collaborative Representation Based Classifier (CRC). The authors of [16] derived CRC by replacing the ℓ1-norm in the SRC objective (2) with an ℓ2-norm:

x̂ = argmin_x ||y - Ax||_2^2 + λ_CRC ||x||_2^2,  (3)

where λ_CRC is the regularization parameter. The solution of (3) can be easily and analytically derived as

x̂ = (A^T A + λ_CRC · I)^(-1) A^T y.  (4)

The first part, (A^T A + λ_CRC · I)^(-1) A^T, of (4) is independent of y. Therefore, it can be precalculated, and once a query sample y is available, it is simply projected to obtain x̂. This makes calculating x̂ in (4) much faster than in (2). More details about CRC can be found in [16].
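Because the CRC projection matrix is query-independent, the classifier splits naturally into a one-off precomputation step and a cheap per-query step. A minimal NumPy sketch (the class-wise regularized residual rule follows [16]):

```python
import numpy as np

def crc_fit(A, lam=0.01):
    """Precompute the CRC projection matrix P = (A^T A + lam*I)^(-1) A^T,
    which does not depend on the query sample."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)

def crc_classify(P, A, labels, y):
    """Project a query y to get x_hat = P y, then pick the class with the
    smallest residual normalized by the coefficient energy."""
    x = P @ y
    best, best_r = None, np.inf
    for c in set(labels):
        mask = np.array(labels) == c
        r = np.linalg.norm(y - A[:, mask] @ x[mask]) / (np.linalg.norm(x[mask]) + 1e-12)
        if r < best_r:
            best, best_r = c, r
    return best
```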

Probabilistic Collaborative Representation Based Classifier (ProCRC).
Cai et al. [15] proposed the Probabilistic Collaborative Representation Based Classifier (ProCRC) algorithm for pattern classification. Let X = [X_1, X_2, ..., X_C] ∈ R^(d×n) denote the training samples, where X_c ∈ R^(d×n_c) represents the training samples of the c-th class with n_c samples (n = Σ_{c=1}^{C} n_c) and d is the dimension of each sample. The coefficient vector representing a test sample y ∈ R^(d×1) via ProCRC is solved with the following:

x̂ = argmin_x ||Xx - y||_2^2 + λ||x||_2^2 + (γ/C) Σ_{c=1}^{C} ||Xx - X_c x_c||_2^2,  (5)

where λ and γ are regularization parameters. Using ProCRC, the class label of the test sample is determined by locating the minimum value of the residual error over the classes:

label(y) = argmin_c ||Xx̂ - X_c x̂_c||_2^2,  (6)

where x̂_c represents the coefficients of the test sample associated with the c-th class. Algorithm 1 shows the procedure of ProCRC. To state the procedure clearly, let x̄_c = [0, ..., x̂_c, ..., 0] ∈ R^(n×1), which has the same size as x̂, so that X_c x̂_c = X x̄_c. More details about ProCRC can be found in [15].
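A compact NumPy sketch of ProCRC under the formulation in [15]: each class-c term zeroes out the coefficients of the other classes' columns, so the whole objective remains quadratic and, like CRC, admits a closed-form, precomputable solution:

```python
import numpy as np

def procrc_fit(X, labels, lam=0.1, gamma=0.001):
    """Precompute the ProCRC projection for training matrix X (d x n, columns
    grouped by class). The class-c penalty uses B_c = X with class-c columns
    zeroed, giving x_hat = (X^T X + lam*I + (gamma/C)*sum_c B_c^T B_c)^-1 X^T y."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, C = X.shape[1], len(classes)
    S = np.zeros((n, n))
    for c in classes:
        mask = (labels != c).astype(float)  # keep only the other classes
        Bc = X * mask                        # zero out class-c columns
        S += Bc.T @ Bc
    P = np.linalg.solve(X.T @ X + lam * np.eye(n) + (gamma / C) * S, X.T)
    return P, classes

def procrc_classify(P, X, labels, classes, y):
    """x_hat = P y; assign y to the class minimizing ||X x_hat - X_c x_hat_c||."""
    labels = np.asarray(labels)
    x = P @ y
    recon = X @ x
    errs = [np.linalg.norm(recon - X[:, labels == c] @ x[labels == c])
            for c in classes]
    return classes[int(np.argmin(errs))]
```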

Experimental Results
The experimental results are presented in this section. The settings for HD detection are given first, followed by the detection results using 10 classifiers to compare and contrast with the ProCRC. Finally, the analysis of the ProCRC parameters λ and γ is presented in Section 5.3.

Experimental Setting.
We randomly selected close to half (580) of the data for training and used the remaining data (582) for testing, with accuracy (the proportion of correctly classified samples among all samples) as the performance measure. To overcome the shortcoming that different data partitions yield different results [30], 5 random partitions were applied, and the final accuracy is the mean over the partitions. The following experiments were conducted on a PC with an Intel i7-6700 CPU @ 3.40 GHz, 16.0 GB RAM, and a 64-bit OS.
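The partition-averaging protocol can be sketched generically; here `classify` stands in for any classifier function mapping (train features, train labels, test features) to predicted labels, and the split sizes are parameters:

```python
import numpy as np

def mean_accuracy(features, labels, classify, n_train=580, n_splits=5, seed=0):
    """Average accuracy over several random train/test partitions, mirroring
    the protocol of 5 random partitions with roughly half the data for
    training."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_splits):
        idx = rng.permutation(len(labels))
        tr, te = idx[:n_train], idx[n_train:]
        pred = classify(features[tr], labels[tr], features[te])
        accs.append(np.mean(pred == labels[te]))
    return float(np.mean(accs))
```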

HD Detection Results.
Other than the ProCRC, 10 other classifiers, representing an array of traditional and state-of-the-art methods, were applied to detect HD. The 10 classifiers are (i) k-Nearest Neighbor (k-NN) [31] with k = 1, (ii) Support Vector Machines (SVM) [31] with a linear kernel function, (iii) SRC [17] with λ_SRC = 0.1, (iv) Dictionary Learning (DL) with SRC [32] using λ_SRC = 0.1, λ_DL = 0.1, and a dictionary size equal to half of the feature dimensionality (e.g., 3 for one key block), (v) CRC [16] with λ_CRC = 0.01, (vi) Softmax [33], (vii) Decision Tree [34], (viii) AdaBoost [35] with Tree Learner, (ix) LogitBoost [36] with Tree Learner, and (x) GentleBoost [37]. Each classifier's parameters were fine-tuned for its best performance, and the two ProCRC parameters are analyzed in Section 5.3. Figure 6 illustrates the best accuracies of all 11 classifiers based on facial key block color features for all seven block combinations. From this bar chart, it is clear that the ProCRC results (in red) outperformed or came close to the highest accuracy for nearly every combination.
To be thorough, the complete set of results, including accuracy, sensitivity, and specificity [38] of the 11 classifiers using the seven block combinations, is shown in Table 2. In the table, ACC, SEN, and SPC represent accuracy, sensitivity, and specificity, respectively. To further demonstrate the effectiveness of the proposed method, Figure 7 shows three examples of FHB for HD and H, respectively; the top row is FHB from HD and the bottom row from H. Looking at the figure, it is difficult to distinguish the blocks with the naked eye. However, the proposed method can classify each block correctly. In the parameter analysis, the accuracy fluctuated only slightly for parameter values above 0.2. The best accuracy in this case, which was also the highest accuracy among all 11 classifiers, was 88.01%, obtained with λ = 0.1 and γ = 0.001.
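For reference, ACC, SEN, and SPC can be computed from the confusion counts, treating HD as the positive class:

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate over HD samples), and
    specificity (true-negative rate over healthy samples)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    acc = (tp + tn) / len(y_true)
    sen = tp / (tp + fn)
    spc = tn / (tn + fp)
    return acc, sen, spc
```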

Conclusions
This paper proposed a noninvasive computerized method to detect HD based on facial key block color analysis classified using the ProCRC. The experiments were conducted on a new dataset consisting of 581 HD samples and 581 H samples. The facial images are first captured through a specially designed device, and four facial key blocks are extracted to represent one sample. For each facial key block, color features are extracted using a facial color gamut with six color centroids. To obtain optimal HD detection, three facial key blocks are permuted and applied for classification. The proposed method used the ProCRC, which was developed from CRC by analyzing CR from a probabilistic viewpoint [15]. Compared with 10 other classifiers, the best accuracy of HD detection was 88.01%, with a sensitivity of 84.95% and a specificity of 91.07% (using the ProCRC with λ = 0.1 and γ = 0.001 on FHB + LCB + NBB). This proves the effectiveness of the ProCRC based on facial key block color feature analysis to detect HD and potentially provides a new, innovative, noninvasive way to detect this disease.
As part of the future work, more features from the facial key blocks will be explored and extracted. In addition, other representation learning algorithms will be developed and applied to HD detection.