Assessment of Heart Disease using Fuzzy Classification Techniques

In this paper we discuss the classification results for cardiac patients with ischemic cardiopathy, valvular heart disease, and arterial hypertension, based on 19 characteristics (descriptors) including ECHO data, effort testing, age, and weight. To this end we used different fuzzy clustering algorithms, namely hierarchical fuzzy clustering, hierarchical and horizontal fuzzy characteristics clustering, and a new clustering technique, fuzzy hierarchical cross-classification. The characteristics clustering techniques produce fuzzy partitions of the characteristics involved and are therefore useful tools for studying the similarities between different characteristics and for selecting the essential ones. The cross-classification algorithm produces not only a fuzzy partition of the cardiac patients analyzed, but also a fuzzy partition of their considered characteristics. In this way it is possible to identify which characteristics are responsible for the similarities or dissimilarities observed between different groups of patients.


INTRODUCTION
The mathematics of fuzzy set theory was originated by L.A. Zadeh in 1965 [15]. It deals with the uncertainty and fuzziness arising from interrelated humanistic phenomena such as subjectivity, thinking, reasoning, cognition and perception. This type of uncertainty is characterized by structures that lack sharp (well-defined) boundaries. The approach provides a way to translate a linguistic model of the human thinking process into a mathematical framework for developing computer algorithms for decision-making processes. The theory has grown very quickly [1,2,3,5].
There are two opposite approaches to hierarchical clustering, namely, agglomerative and divisive procedures. An agglomerative hierarchical classification places each object in its own cluster and gradually merges the clusters into larger and larger clusters until all objects are in a single cluster. The divisive hierarchical clustering reverses the process by starting with all the objects in a single cluster and subdividing it into smaller ones until, finally, each object is in a cluster of its own. The number of clusters to be generated may be either specified in advance or optimized by the algorithm itself according to certain criteria.
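The two opposite strategies may be sketched computationally as follows. This is a minimal illustrative example in Python (crisp clusters, 1-D data, centroid distance), not the fuzzy procedures developed below; all function and variable names are ours.

```python
# Illustrative sketch: agglomerative merging on crisp 1-D data.
# Starting from singletons, repeatedly merge the two clusters whose
# centroids are closest, recording each intermediate partition.
def agglomerative(points):
    clusters = [[p] for p in points]
    history = [[tuple(c) for c in clusters]]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = sum(clusters[i]) / len(clusters[i])
                cj = sum(clusters[j]) / len(clusters[j])
                if best is None or abs(ci - cj) < best[0]:
                    best = (abs(ci - cj), i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
        history.append([tuple(c) for c in clusters])
    return history

# Four points, so the hierarchy has four levels: 4, 3, 2, 1 clusters.
steps = agglomerative([1.0, 1.2, 5.0, 5.1])
```

A divisive procedure would traverse the same hierarchy top-down, starting from the single cluster containing all four points.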
Among other interesting applications, the fuzzy clustering theory developed in References [3,4,6,8] has been used for the selection and the optimal combination of solvents [7,13], for the classification of Roman pottery [9], for the cross-classification of Greek muds [6], for the development of a fuzzy system of chemical elements [12,14], for producing a performant fuzzy regression algorithm [10], and for the cross-classification of thin layer chromatography data [11].
In this paper we analyze the possibility of identifying the correct diagnosis of cardiac patients using different fuzzy clustering algorithms.

THEORETICAL CONSIDERATIONS

Fuzzy Substructure of a Fuzzy Set
In this section we recall the so-called generalized fuzzy n-means algorithm [3,4,6], a generalization of the well-known fuzzy n-means algorithm [1,4]. Let us consider a set of objects X = {x_1, ..., x_p} ⊂ R^s and let C be a fuzzy set on X. We are searching for the fuzzy partition corresponding to the cluster substructure of the fuzzy set C. Let us suppose that this fuzzy partition is {A_1, ..., A_n}. We admit that each fuzzy class A_i may be represented by a prototype L_i from the representation space R^s. If L_i is from X, it is natural to suppose that L_i has the greatest membership degree to A_i, that is:

A_i(L_i) = max_{j=1,...,p} A_i(x_j).   (1)

Otherwise, if L_i is not from X, we cannot speak about its membership degree to the fuzzy set A_i, since the universe of the fuzzy set A_i is X. Let us denote by d a distance in the space R^s; for example, we may consider the distance induced by the norm of the space. The dissimilarity D_i(x_j, L_i) between a point x_j and the prototype L_i is defined as the squared local distance in the class A_i:

D_i(x_j, L_i) = A_i(x_j)^2 d^2(x_j, L_i),   (2)

and is interpreted as a measure of the inadequacy of the representation of the point x_j by the prototype L_i. Relation (2) applies whether or not L_i is a point from the data set X; if L_i is from X, we have from (1) that A_i(x_j) ≤ A_i(L_i) for any x_j in X, and relation (2) is still valid. The inadequacy between the fuzzy partition P and its representation L = {L_1, ..., L_n} is given by the following function:

J(P, L) = Σ_{i=1}^{n} Σ_{j=1}^{p} A_i(x_j)^2 d^2(x_j, L_i).   (3)

J(P, L) may also be interpreted as the representation error of P by L.
It is easy to observe that J is a criterion function of the sum-of-squared-errors type. The classification problem becomes the determination of the fuzzy partition P and of its representation L for which the inadequacy J(P, L) is minimal. We note that, intuitively, to minimize J means to give small membership degrees to A_i for those points in X whose dissimilarity to the prototype L_i is large, and vice versa. Another useful remark is that the fuzzy sets A_i, i = 1, ..., n are components of the fuzzy partition P of the fuzzy set C, and thus the obvious 'solution' A_i ≡ 0 for all i is not acceptable, since it does not form a fuzzy partition of C.
If we admit that d is a distance induced by a norm, we may write d(x_j, L_i) = ||x_j - L_i||. If the norm is induced by an inner product, we have

d^2(x_j, L_i) = (x_j - L_i)^T M (x_j - L_i),   (4)

where M is a symmetric and positive definite matrix and T denotes transposition. The criterion function becomes:

J(P, L) = Σ_{i=1}^{n} Σ_{j=1}^{p} A_i(x_j)^2 (x_j - L_i)^T M (x_j - L_i).   (5)

Because an algorithm to obtain an exact solution of the problem (5) is not known, we will use an approximate method in order to determine a local solution. The minimization problem will be solved using an iterative (relaxation) method, in which J is successively minimized with respect to P and L.
Supposing that L is given, the minimum of the function J(·, L) is obtained [3,4] for:

A_i(x_j) = C(x_j) / Σ_{k=1}^{n} [ d^2(x_j, L_i) / d^2(x_j, L_k) ],   (6)

for all j for which d(x_j, L_k) ≠ 0 for every k. If, on the contrary, there exists a j so that d(x_j, L_k) = 0 for some values of k, then the membership degrees of x_j are distributed among those classes so that they sum to C(x_j), and are set to zero for the remaining classes. For a given P, the minimum of the function J(P, ·) is obtained for:

L_i = Σ_{j=1}^{p} A_i(x_j)^2 x_j / Σ_{j=1}^{p} A_i(x_j)^2.   (7)

We observe [3,4] that L_i is the weighted center of the class A_i. The iterative procedure for obtaining the cluster substructure of the fuzzy class C is called generalized fuzzy n-means (GFNM) [3]. Essentially, the GFNM algorithm works with Picard iterations using the relations (6) and (7). The iterative process begins with an arbitrary initialization of the partition P and ends when two successive partitions are close enough. To measure the distance between two partitions, we associate to each partition P a matrix Q with the dimensions n × p, named the representation matrix of the fuzzy partition P and defined as:

Q_ij = A_i(x_j),  i = 1, ..., n,  j = 1, ..., p.   (8)

Considering that Q^1 and Q^2 are the representation matrices of the partitions P^1 and P^2, we may define ||Q^1 - Q^2|| = max_{i,j} |Q^1_ij - Q^2_ij|. The process ends at the r-th iteration if

||Q^(r) - Q^(r-1)|| < ε,   (9)

where ε is an admissible error (usually 10^-5). For C = X this procedure is the well-known fuzzy n-means (FNM) algorithm [1].
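The Picard iteration alternating between the membership update and the prototype update may be sketched as follows. This is an illustrative Python sketch for the Euclidean distance, with all names ours; taking C identically 1 recovers the classical FNM case.

```python
# Sketch of the GFNM iteration: alternate prototype and membership updates
# until two successive partitions are close enough.
# X: list of points (lists of floats); C: membership of each point to the
# parent fuzzy class; n: number of fuzzy subclasses.
def gfnm(X, C, n, eps=1e-5, max_iter=100):
    p, s = len(X), len(X[0])
    # arbitrary initial partition: cyclic assignment gated by C
    A = [[C[j] if j % n == i else 0.0 for j in range(p)] for i in range(n)]
    for _ in range(max_iter):
        # prototypes: centres weighted by squared memberships
        L = []
        for i in range(n):
            w = [A[i][j] ** 2 for j in range(p)]
            tot = sum(w) or 1.0
            L.append([sum(w[j] * X[j][k] for j in range(p)) / tot
                      for k in range(s)])
        # memberships: proportional to inverse squared distances, gated by C
        A_new = [[0.0] * p for _ in range(n)]
        for j in range(p):
            d2 = [sum((X[j][k] - L[i][k]) ** 2 for k in range(s))
                  for i in range(n)]
            if min(d2) == 0.0:           # point coincides with a prototype
                A_new[d2.index(0.0)][j] = C[j]
            else:
                z = sum(1.0 / d for d in d2)
                for i in range(n):
                    A_new[i][j] = C[j] * (1.0 / d2[i]) / z
        # stop when two successive partitions are close enough
        delta = max(abs(A_new[i][j] - A[i][j])
                    for i in range(n) for j in range(p))
        A = A_new
        if delta < eps:
            break
    return A, L

# two well-separated 1-D groups; memberships should polarize
A, L = gfnm([[0.0], [0.1], [5.0], [5.1]], [1.0] * 4, 2)
```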

Fuzzy Divisive Hierarchical Clustering
Let us consider a fuzzy set C on X, a fuzzy binary partition P = {A_1, A_2} of C, and the following function, called the polarization degree:

R(P) = Σ_{x∈X} m(x) / Σ_{x∈X} C(x),   (10)

where m(x) = max(A_1(x), A_2(x)) if this maximum is larger than C(x)/2, and m(x) = 0 otherwise. (For a complete study of the polarization degree and its properties, please see Reference [3].) We will say here only that R(P) is larger the more polarized the partition P is, and smaller the fuzzier the partition P is. We say that the binary partition P describes 'real' clusters if the polarization degree R(P) is larger than a certain threshold t ∈ (0,1) chosen a priori, and if each class contains at least one point whose greatest membership degree is to that class. Using the FNM algorithm we may determine a binary fuzzy partition P = {A_1, A_2} of the data set X. If this partition does not describe 'real' clusters, the data set X does not have a substructure. If it does, we denote P_1 = {A_1, A_2}. Using the GFNM algorithm for two subclasses (n = 2), we may determine a binary fuzzy partition for each class A_i of P_1. If this partition of A_i describes real clusters, its classes will be attached to a new fuzzy partition, P_2; otherwise, A_i will remain undivided, will be marked, and will be allocated to the partition P_2. The unmarked classes of P_2 will follow the same procedure. The divisive procedure stops when all the classes of the current partition P_l are marked, i.e., when there are no more 'real' clusters.
The procedure described here is called the Fuzzy Divisive Hierarchical Clustering (FDHC) algorithm [3,9]. It may be used to determine the optimal cluster substructure of a data set and is especially useful when the number of classes is unknown. We emphasize here that the ability to choose the value of the polarization threshold allows us to stop the hierarchical analysis at the degree of refinement considered relevant for the application. If we choose a high threshold, we obtain the fuzzy partition corresponding to the macroscopic structure of the data set, while by choosing a smaller threshold we obtain a more detailed image of the fuzzy substructure of the data. Moreover, we are interested not only in the final fuzzy partition, but also in the relationships between the different fuzzy sets. These relationships may be observed very well in the binary classification tree [6,7,9,11,12,14].
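The control flow of the divisive procedure may be sketched as follows. This is an illustrative Python sketch on 1-D data: binary_split is a crude stand-in for GFNM with n = 2 (prototypes fixed at the extreme support points), and polarization is a simplified stand-in for R(P); all names are ours.

```python
# Toy binary split: inverse-squared-distance memberships towards the two
# extreme well-supported points, gated by the parent memberships C.
def binary_split(points, C):
    support = [x for x, c in zip(points, C) if c > 0.5]
    lo, hi = min(support), max(support)
    A1, A2 = [], []
    for x, c in zip(points, C):
        d1, d2 = (x - lo) ** 2, (x - hi) ** 2
        a1 = c if d1 + d2 == 0.0 else c * d2 / (d1 + d2)
        A1.append(a1)
        A2.append(c - a1)
    return A1, A2

# Simplified polarization degree: how decisively the parent membership
# is split between the two subclasses (1 = crisp, 0 = fully ambiguous).
def polarization(A1, A2, C):
    tot = sum(C) or 1.0
    return sum(abs(a1 - a2) for a1, a2 in zip(A1, A2)) / tot

# FDHC control flow: split each class, keep the split only if it
# describes 'real' clusters; otherwise mark the class as terminal.
def fdhc(points, C=None, t=0.9):
    if C is None:
        C = [1.0] * len(points)
    if sum(1 for c in C if c > 0.5) < 2:
        return [C]                      # too small to split: marked
    A1, A2 = binary_split(points, C)
    if polarization(A1, A2, C) <= t:
        return [C]                      # no 'real' clusters: marked
    return fdhc(points, A1, t) + fdhc(points, A2, t)

# two compact groups: the first split is accepted, further splits are not
parts = fdhc([0.0, 0.05, 0.1, 5.0, 5.05, 5.1])
```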

Interpretation of the Final Fuzzy Partition
The fuzzy hierarchy obtained is richer in information (see Reference [5]) than a hierarchy based on classical sets, but sometimes it is useful to have a classical partition as well. For a complete discussion of the problem of passing from fuzzy partitions to classical partitions, see Reference [5]. We will only show the method used here for obtaining a classical partition.
Defuzzification of the final fuzzy partition will be realized using the maximum membership rule or a hierarchical assignment rule. The latter rule means that the classical sets corresponding to the fuzzy classes are built at the same time as the respective fuzzy classes, based on the following rules (here, and in all that follows, C̃ denotes the classical set obtained by defuzzification from the fuzzy set C): 1) initially, since X is a classical set, X̃ = X; 2) when we build the fuzzy partition {C_1, C_2} of the fuzzy set C, we will say that x ∈ C̃_1 if x ∈ C̃ and C_1(x) ≥ C_2(x), and x ∈ C̃_2 if x ∈ C̃ and C_1(x) < C_2(x). Remark. It is obvious that {C̃_1, C̃_2} is a hard partition of the classical set C̃.
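The maximum membership rule may be written in one line: each object is assigned to the crisp class whose fuzzy membership is highest. Illustrative Python, names ours:

```python
# Maximum membership rule: for each object (column), pick the index of
# the fuzzy class (row) with the greatest membership degree.
def defuzzify(partition):
    n, p = len(partition), len(partition[0])
    return [max(range(n), key=lambda i: partition[i][j]) for j in range(p)]

# two fuzzy classes over three objects; the third object is a tie and
# goes to the first class
labels = defuzzify([[0.9, 0.2, 0.5], [0.1, 0.8, 0.5]])
```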
Finally, when obtaining the fuzzy hierarchy of the set X, we will also obtain the so-called classical hierarchy associated to that fuzzy hierarchy.

Associative Simultaneous Fuzzy n-Means Algorithm
Let X = {x_1, ..., x_p} ⊂ R^s be the set of objects to be classified. A characteristic may be specified by its values for the p objects; thus, we may say that Y = {y_1, ..., y_s} ⊂ R^p is the set of characteristics. Denoting by y_k^j the value of the characteristic k with respect to the object j, we may write y_k^j = x_j^k.
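The duality between objects and characteristics amounts to transposing the data table: row j is the object x_j, column k is the characteristic y_k. A one-line illustration in Python (the function name is ours):

```python
# Transpose the p x s object table into the s x p characteristic table:
# characteristic k collects the k-th value of every object.
def characteristics(X):
    return [list(col) for col in zip(*X)]

# three objects with two characteristics each
Y = characteristics([[1, 2], [3, 4], [5, 6]])
```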
Let P be a fuzzy partition of the fuzzy set C of objects and Q a fuzzy partition of the fuzzy set D of characteristics. The problem of the cross-classification (or simultaneous classification) is to determine the pair (P, Q) which optimizes a certain criterion function.
By starting with an initial partition P_0 of C and an initial partition Q_0 of D, we will obtain a new partition P_1. The pair (P_1, Q_0) allows us to determine a new partition Q_1 of the characteristics. The algorithm thus produces a sequence (P_k, Q_k) of pairs of partitions, starting from the initial pair (P_0, Q_0). The rationale of the hierarchical cross-classification method [4,6] essentially supposes the splitting of the sets X and Y into two subclasses. The classes obtained are alternately divided into two subclasses, and so on. The two hierarchies will be represented by the same tree, having in each node a pair (C, D), where C is a fuzzy set of objects and D is a fuzzy set of characteristics.
As a first step we wish to determine simultaneously the fuzzy partitions (as a particular case, the binary fuzzy partitions) of the classes C and D, so that the two partitions should be highly correlated. With the generalized fuzzy n-means algorithm, we will determine a fuzzy partition P = {A 1 , ..., A n } of the class C, using the original characteristics.
In order to classify the characteristics, we will compute their values for the classes A_i, i = 1, ..., n. The value y_i^k of the characteristic k with respect to the class A_i is defined as:

y_i^k = Σ_{j=1}^{p} A_i(x_j)^2 x_j^k / Σ_{j=1}^{p} A_i(x_j)^2.   (12)

Thus, from the original s p-dimensional characteristics we computed s new n-dimensional characteristics, which are conditioned by the classes A_i, i = 1, ..., n. We may admit that these new characteristics do not describe objects, but characterize the classes A_i. Let Ỹ = {ỹ_1, ..., ỹ_s} ⊂ R^n be the set of these new characteristics, and let D be the fuzzy set on Ỹ given by D(ỹ_k) = 1, k = 1, ..., s.
The way the set Ỹ has been obtained lets us conclude that, if we obtain an optimal partition of the fuzzy set D, this partition will be highly correlated with the optimal partition of the class C. With the generalized fuzzy n-means algorithm we will determine a fuzzy partition Q = {B_1, ..., B_n} of the class D, by using the characteristics given by the relation (12). We may now characterize the objects in X with respect to the classes of characteristics B_i, i = 1, ..., n. The value x_i^j of the object j with respect to the class B_i is defined as:

x_i^j = Σ_{k=1}^{s} B_i(ỹ_k)^2 x_j^k / Σ_{k=1}^{s} B_i(ỹ_k)^2.   (13)

Thus, from the original p s-dimensional objects we have computed p new n-dimensional objects, which correspond to the classes of characteristics B_i, i = 1, ..., n.
Let us consider now the set X̃ = {x̃_1, ..., x̃_p} of the modified objects. We define the fuzzy set C on X̃ given by C(x̃_j) = 1, j = 1, ..., p. With the generalized fuzzy n-means algorithm we will determine a fuzzy partition P′ = {A′_1, ..., A′_n} of the class C by using the objects given by the relation (13). The process continues until two successive partitions of objects (or characteristics) are close enough to each other.
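The alternation just described condenses the data table on the current classes: the characteristics are condensed on the object classes, and, symmetrically, the objects are condensed on the characteristic classes (after transposing). Since the relations are not reproduced in the text, the sketch below assumes a squared-membership weighted mean, consistent with the weighting used in the prototype formula; illustrative Python, names ours.

```python
# Condense a p x s data table on a fuzzy partition of its rows:
# row i of the result is the squared-membership weighted mean row of
# class i, i.e. the value of each column 'with respect to' class i.
def condense(data, partition):
    p, s = len(data), len(data[0])
    out = []
    for A in partition:
        w = [a ** 2 for a in A]
        tot = sum(w) or 1.0
        out.append([sum(w[j] * data[j][k] for j in range(p)) / tot
                    for k in range(s)])
    return out

# three objects, two characteristics, crisp partition {0,1} | {2}:
# each class is summarized by the mean row of its members
new_chars = condense([[1.0, 10.0], [1.0, 11.0], [5.0, 0.0]],
                     [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

The symmetric step is obtained by applying the same function to the transposed table with the fuzzy partition of the characteristics.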
Consider that P = {A_1, ..., A_n} is the fuzzy n-partition of X and Q = {B_1, ..., B_n} is the fuzzy n-partition of Y produced after this step of our algorithm. Let us remark that we made no explicit association of a fuzzy set A_i on X with a fuzzy set B_j on Y; i.e., we have not determined which fuzzy set B_j best describes the essential characteristics corresponding to the fuzzy set A_i. The procedure only supposes that A_i is to be associated with B_i, and this is not always true.
Let us denote by S_n the set of all permutations of {1, ..., n}. We wish to find the permutation σ ∈ S_n which best associates the fuzzy set A_i with the fuzzy set B_σ(i), for every i = 1, ..., n. Our aim is to build a function J : S_n → R so that the optimal permutation σ is the one which maximizes this function. Let us consider the matrix Z ∈ R^{n,n} whose entry z_ik expresses the degree of association between the class of objects A_i and the class of characteristics B_k, computed by condensing the data in the same manner as the new objects and characteristics in relations (12) and (13).

Experience enables us to consider the function J given by

J(σ) = Σ_{i=1}^{n} z_{i σ(i)}.   (15)

Thus, supposing that the permutation σ maximizes the function J defined above, we will be able to associate the fuzzy set A_i with the fuzzy set B_σ(i), i = 1, ..., n. As we will see in the comparative study below, this association is more natural than the association of A_i with B_i, i = 1, ..., n. Based on these considerations we are able to introduce the following algorithm, the associative simultaneous fuzzy n-means algorithm (ASF):

S1. Set l = 0. With the generalized fuzzy n-means algorithm we determine a fuzzy n-partition P^(l) of the class C by using the initial objects.

S2. With the generalized fuzzy n-means algorithm we determine a fuzzy n-partition Q^(l) of the class D by using the characteristics defined in (12).

S3. With the generalized fuzzy n-means algorithm we determine a fuzzy n-partition P^(l+1) of the class C by using the objects defined in (13).

S4. If the partitions P^(l) and P^(l+1) are close enough, that is, if ||P^(l+1) - P^(l)|| < ε, where ε is a preset value, then go to S5; otherwise increase l by 1 and go to S2.

S5. Compute the permutation σ that maximizes the function J given in relation (15).

S6. Re-label the fuzzy sets B_1, ..., B_n so that the fuzzy set B_σ(i) becomes B_i, i = 1, ..., n.
Let us remark now that, after steps S5 and S6, we are able to associate the fuzzy set A i with the fuzzy set B i , i = 1, ..., n.
Let us also remark that the computation required in step S5 is not an obvious one. But, as we will see, our purpose is to use this algorithm for developing a hierarchical technique. Thus, we will use the ASF algorithm in the particular case n = 2. In this case, the computation required in step S5 becomes trivial.
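The search in step S5 may be sketched as an exhaustive scan of S_n; since relation (15) is not reproduced in the text, the additive score J(σ) = Σ_i z_{i σ(i)} is our assumption here. For n = 2 there are only two candidate permutations, the identity and the swap, which is why the computation becomes trivial. Illustrative Python, names ours:

```python
from itertools import permutations

# Brute-force search over all n! permutations for the one maximizing
# the additive association score (our assumed form of J).
def best_association(Z):
    n = len(Z)
    return max(permutations(range(n)),
               key=lambda sig: sum(Z[i][sig[i]] for i in range(n)))

# n = 2: only the identity (0, 1) and the swap (1, 0) are compared
sigma = best_association([[0.1, 0.9],
                          [0.8, 0.2]])
```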

Fuzzy Hierarchical Cross-Classification Algorithm
The method described here is the straightforward way of developing a hierarchical algorithm that uses, at each node of the classification tree, our ASF algorithm. We first show the way to build the binary classification tree. The tree nodes are labeled with a pair (C, D), where C is a fuzzy set from a fuzzy partition of objects and D is a fuzzy set from a fuzzy partition of characteristics. The root node corresponds to the pair (X, Y). In the first step the two sub-nodes (A_1, B_1) and (A_2, B_2) will be computed by using the ASF algorithm. Of course, these two nodes will be effectively built only if the fuzzy partitions {A_1, A_2} and {B_1, B_2} describe real clusters. For each terminal node of the tree we try to determine partitions of the form {A_1, A_2} and {B_1, B_2} by using the ASF algorithm, modified as mentioned before. In this way the binary classification tree is extended with two new nodes, (A_1, B_1) and (A_2, B_2). The process continues until, for every terminal node, we are unable to determine a structure of real clusters, either for the set of objects or for the set of characteristics. The final fuzzy partitions will contain the fuzzy sets corresponding to the terminal nodes of the binary classification tree. This algorithm, termed the FHCCA algorithm, is suitable for applications where the aim is to capture most of the relationships between different classes of objects and different classes of characteristics. In Pop and Sârbu [11] we introduced two more variants of this algorithm, FHCCB and FHCCC.

Characteristics Clustering
In this section we address the problem of characteristics clustering, which may be useful in many situations. For example, dimensionality reduction may be considered a characteristics classification process. The characteristics in the same class (which are, consequently, very similar to each other) provide little discrimination among the objects. On the contrary, the more distant the classes that contain two different characteristics, the greater their discrimination power. If the classes of characteristics are homogeneous and well separated, a class may be replaced by its most representative characteristic. This characteristic represents an average of the properties of the class; the more compact the class, the smaller the loss of information produced by this replacement. In this way we realize a dimensionality reduction. By choosing a unique characteristic from each class, the number of selected characteristics equals the number of clusters in the set Y. Alternatively, instead of merely selecting some of the existing characteristics, we may replace each class of characteristics by its prototype characteristic. The technique obtained by using the fuzzy divisive hierarchical clustering algorithm on the set of characteristics will be called fuzzy hierarchical characteristics clustering (FHiCC). Similarly, the technique obtained by using the fuzzy n-means algorithm on the set of characteristics will be called fuzzy horizontal characteristics clustering (FHoCC).
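The selection of one representative characteristic per class may be sketched as follows; as a simple proxy for 'most representative' we take, for illustration, the characteristic with the greatest membership degree to the class. Illustrative Python, names ours:

```python
# Dimensionality reduction by representative selection: from each fuzzy
# class of characteristics, keep the index of the characteristic with the
# highest membership degree to that class.
def select_representatives(partition):
    return [max(range(len(A)), key=lambda k: A[k]) for A in partition]

# two classes over three characteristics: characteristic 0 represents the
# first class, characteristic 2 the second
kept = select_representatives([[0.9, 0.8, 0.1],
                               [0.1, 0.2, 0.9]])
```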

Fuzzy Hierarchical Classification of Cardiac Patients
The successive partition of the cardiac patients produced by using the 19 descriptors mentioned above is presented in Table 3. Comparing the results in Table 3 with the classes obtained by paraclinical and clinical investigations, we observed certain differences. To ensure a more uniform participation of the various descriptors, we also used a normalization (auto-scaling) of the descriptors, a procedure that prevents descriptors expressed by larger numerical values from prevailing. The classification based on the same 19 descriptors but normalized finally gives just two classes; Table 4 shows the partition of the cardiac patients in this case. With data normalization the algorithm seems to produce better results, providing only two classes: the first one including the majority of the valvular heart disease patients, and the second one the majority of the dilatative cardiomyopathy and ischemic cardiopathy patients. In this case a good agreement with the original diagnostics was established, as may be seen in Table 5, which also presents the NYHA functional class as established by paraclinical and clinical investigations. Moreover, considering the relatively large number of variables, we attempt, in the next section, to reduce it by applying a fuzzy clustering algorithm.

Fuzzy Hierarchical and Horizontal Characteristics Clustering
The large number of available variables is always an issue, both because of the extra computation required and because not all the variables describe the data equally well. Following the fuzzy clustering of patients, our aim is to use fuzzy clustering to select the most relevant variables. We will then classify the data by using only these most relevant variables, and compare the results with those of the original fuzzy classification. In order to develop the classifications presented in this section we applied the FHiCC procedure to the initial descriptors.
The characteristics clustering with the 19 descriptors for the 72 cardiac patients, without data normalization, produced the final partition presented in Table 6. The first descriptor separated from the others is 5, followed by 6 and then by 4. The cluster containing the descriptors from 7 to 19 is not subject to any further splitting (their membership degrees, MD, to this cluster are all near 1). In the next step descriptor 2 follows and, finally, 3 and 1.
The characteristics clustering with the same descriptors but with normalization (see Table 6) illustrates the same aspect, i.e., the high similarity of the last 13 descriptors, based on effort testing, and a large dissimilarity among the first descriptors, including ECHO data and age and weight, respectively. This conclusion is also supported by the horizontal characteristics clustering, with the number of classes preset to seven, i.e., the number of classes produced by the hierarchical clustering procedure. The results of the horizontal characteristics clustering procedure are shown in Table 7. It is interesting to remark that all the divisions in Table 7 are clear-cut: the membership degrees to the different classes are all 1 or 0. We have to stress that the same treatment, but with a predefined number of eight classes, gives exactly the same results, i.e., seven classes are clear-cut and one remains vacant, the MDs of all the descriptors to this class being zero.

TABLE 6 Membership Degrees of the 19 Descriptors to the Clusters of the Final Fuzzy Partition without and with Data Normalization, Respectively
We may conclude that the most significant descriptors, as shown by fuzzy clustering, are, in order of their importance, 5 (ECHO data for right ventricle), 6 (ECHO data for left atrium), 4 (ECHO data for left ventricle), 2 (weight), 3 (ECHO data for left ventricle), and 1 (age). This is because these descriptors have the highest discriminative power within the data set.
On the other hand, the descriptors from 7 to 19 had the same membership degrees to all the produced classes, and this may indicate that they are representations of the same unique property. As such, we have added one of these descriptors to the set of six most relevant descriptors.

Fuzzy Hierarchical Clustering of Cardiac Patients Considering Only Seven Characteristics
Taking into account the results obtained above referring to the characteristics clustering, it appears more illuminating and intuitive to use for the classification of cardiac patients only the first seven descriptors, namely age (1), weight (2), and ECHO data (3-7). In order to validate our method, we will use the same fuzzy clustering procedures, but on the data set described only by these seven descriptors.
The results obtained in this case, without and with normalization, are presented in Table 8. By careful examination, and by comparison with the results in Table 9 concerning the membership degrees of the patients to the four final classes, it is easy to observe a good agreement with the classification based on the paraclinical and clinical examinations (see Table 5).
This analysis confirms that the data set using only the seven selected descriptors conserves its discriminative power, and supports our decision to discard the less relevant descriptors.

Fuzzy Horizontal Cardiac Patients Clustering Considering Only Seven Characteristics
We continue our analysis by clustering the set of patients with the seven descriptors, without and with data normalization. Because the human experts indicated a classification of the patients into three classes, we work here with the same number of classes. The fuzzy horizontal clustering distributes the cardiac patients according to the data presented in Table 10. It is interesting to remark in this case a better agreement with the classification obtained by paraclinical and clinical examinations. The class of arterial hypertension patients, A_3, is much better separated than the other ones. Concerning the class of valvular heart disease patients, A_1, and the class of ischemic cardiopathy patients, A_2, each of them contains patients from the other one. However, we have to observe that each of these classes contains the majority of the patients indicated by paraclinical and clinical investigations (see Table 5), and this is a good validation of our technique. The membership degrees of the cardiac patients to the classes of the final fuzzy partitions obtained by horizontal fuzzy clustering for seven descriptors, without and with data normalization, presented in Table 11, also illustrate the efficiency of the fuzzy clustering approach. These fuzzy membership degrees agree well with the medical experience that cardiac patients may present signs of more than one illness, since a clear-cut separation of the three groups of cardiac patients is practically impossible.

Fuzzy Hierarchical Cross-Clustering
In what follows our aim is to identify the descriptors responsible for the separation of each class of patients. We achieve this by using our fuzzy hierarchical cross-clustering algorithm on the set of 72 cardiac patients characterized by the same seven descriptors. The classification hierarchies produced in this way, using both non-normalized and normalized data, are presented in Table 12. The partitioning of the cardiac patients into classes 1 and 2 is practically the same in both cases; what differs is the partitioning of the descriptors. The descriptors associated with class 1 (without normalization), comprising the majority of the valvular heart disease patients, are age (1), left ventricle (4), right ventricle (5), and left atrium (6). The patients in class 2 (without normalization), mostly arterial hypertension and ischemic cardiopathy patients, have as main descriptors weight (2), left ventricle (3), and contractility (7). In the case with data normalization, the only main descriptor associated with class 2 is the contractility (7); the rest, namely age (1), weight (2), left ventricle (3 and 4), right ventricle (5), and left atrium (6), are classified with class 1, which includes the majority of the valvular heart disease patients and half of the ischemic cardiopathy patients. We remark again a good agreement with the medical observations presented in Table 5.

CONCLUSIONS
Fuzzy classification algorithms applied to cardiac patients based on seven descriptors, namely ECHO data and also age and weight, allow an objective interpretation of their similarities and dissimilarities. Moreover, the results obtained may be very useful in their reclassification. It is very interesting to study the classification of valvular heart disease and ischemic cardiopathy patients considering their membership degrees: some of them belong with practically the same MD to the two classes, illustrating in this way the fuzziness of cardiac diseases. The new fuzzy approach, the fuzzy cross-classification algorithm, allows the qualitative and quantitative identification of the variables (descriptors) responsible for the observed similarities and dissimilarities between cardiac patients.
In addition, the fuzzy hierarchical characteristics clustering (FHiCC) and fuzzy horizontal characteristics clustering (FHoCC) procedures revealed a high similarity between the descriptors referring to the effort testing. This is one of the main conclusions, and it suggests their highly redundant character with respect to the diagnosis of cardiac diseases.