A Computer-Aided Diagnosis System for Breast Cancer Using Independent Component Analysis and Fuzzy Classifier

Screening mammograms is a repetitive task that causes fatigue and eye strain: for every thousand cases a radiologist analyzes, only 3–4 are cancerous, so an abnormality may be overlooked. Computer-aided detection (CAD) algorithms were developed to assist radiologists in detecting mammographic lesions. In this paper, a computer-aided detection and diagnosis (CADD) system for breast cancer is developed. The framework combines principal component analysis (PCA), independent component analysis (ICA), and a fuzzy classifier to identify and label suspicious regions. The approach is novel in that it integrates a fuzzy classifier into the ICA model. The system was implemented and tested using the MIAS database. The algorithm classifies a mammogram as either normal or abnormal and, if abnormal, further classifies the tissue as benign or malignant. Results show that the system achieves 84.03% accuracy in detecting all kinds of abnormalities and 78% diagnosis accuracy.


INTRODUCTION
Breast cancer is one of the most common and fatal cancers among women in the USA [1]. According to the National Cancer Institute, 40 480 women died of this disease, and on average one woman is diagnosed with it every three minutes. Currently, more than two and a half million women in the US have been treated for it [1]. Radiologists visually examine mammograms for signs of abnormal regions, usually looking for clusters of microcalcifications, architectural distortions, or masses.
Early detection of breast cancer via mammography improves the chances of successful treatment and survival rates [2]. Unfortunately, mammography is not perfect. False positive (FP) rates are 15–30% because malignant and benign abnormalities can look alike, while false negative (FN) rates are 10–30%. An FP result occurs when a radiologist reports a suspicious change in the breast but no cancer is found on further examination; it therefore leads to unnecessary biopsies and anxiety. An FN result is a failure to detect or correctly characterize a breast cancer that later tests show to be present. Nonetheless, mammography has an overall accuracy rate of 90% [3].
CAD algorithms have been developed to assist radiologists in detecting mammographic lesions. These systems act as a second reader, and the final decision is left to the radiologist. CAD algorithms have improved radiologists' overall accuracy in detecting cancerous tissue [4]. Developing CADD algorithms is extremely challenging for several reasons. First, the imaging system may have serious imperfections. Second, the image analysis task is complicated by the large variability in the appearance of abnormal regions. Finally, abnormal regions are often hidden in dense breast tissue. The goal of the detection stage is to assist radiologists in locating abnormal tissue.
Many methods have been proposed in the literature for mammographic detection and diagnosis, using a wide variety of algorithms. Chang et al. [5] developed a 3D snake algorithm that first reduces noise levels and enhances edges, and then estimates the tumor's contour using a gradient vector flow snake. Kobatake et al. [6] proposed the iris filter to detect lesions, that is, suspicious regions with low contrast relative to their background; the filter is able to extract features of malignant tissue. Bocchi et al. [7] developed an algorithm for microcalcification detection and classification in which existing tumors are detected using a region-growing method combined with a neural network-based classifier; microcalcification clusters are then detected and classified using a second fractal model. Li et al. [8] developed a method for detecting tumors using segmentation, adaptive thresholding, and modified Markov random fields, followed by a classification step based on a fuzzy binary decision tree. Bruce and Adhami [9] used the modulus-maxima technique of the discrete wavelet transform for feature extraction, combined with a Euclidean distance classifier. A radial distance measure of mass boundaries is used to extract multiresolution shape features, and the leave-one-out and apparent methods are used to test the technique. Peña-Reyes and Sipper [10] applied a new combined fuzzy-genetic approach as a computer-aided diagnosis system. Zheng and Chan [11] combined artificial intelligence methods with the discrete wavelet transform to build an algorithm for mass detection. Hassanien and Ali [12] proposed an enhanced rough set technique for feature reduction and classification. Swiniarski and Lim [13] integrated ICA with a rough set model for breast cancer detection: features are first reduced and extracted using ICA, the extracted features are then selected using a rough set model, and finally a rough set-based method is used to design a rule-based classifier.
This work integrates PCA, ICA, and a fuzzy classifier to identify and label suspicious regions in digitized mammograms. The rest of this paper is organized as follows: Section 2 presents the PCA and ICA algorithms and the adaptation of fuzzy logic as a classifier. The proposed integrated approach is presented in Section 3. Section 4 presents the experimental results, followed by the conclusions in Section 5.

PCA
PCA is a decorrelation-based technique that finds the basis vectors of a subspace in order to retain the most important information. PCA consists of two phases: the first phase finds v uncorrelated, orthogonal vectors, and the second phase projects the testing data onto the subspace spanned by these v vectors [14]. The PCA algorithm can be summarized as follows:
(i) construct the training matrix R_train of dimension N × M, where N is the total number of training subimages and M is the number of pixels in each square subimage; then generate its normalized matrix P of dimension M × N;
(ii) construct the M × M covariance matrix of P:
$$C = \frac{1}{N} P P^{T}$$
(iii) let $\lambda_i$ and $E_i$, $i = 1, 2, \ldots, M$, be the eigenvalues and eigenvectors of C, which satisfy $C E_i = \lambda_i E_i$; discard all eigenvalues less than T (a predetermined threshold) and retain the rest (the principal components) to produce the reduced matrix $R_R$ of dimension M × v.
The testing data R_test are projected onto the space spanned by the reduced training matrix $R_R$ using
$$\hat{R}_{\text{test}} = R_{\text{test}} R_R \qquad (3)$$
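The PCA steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: "normalized" is taken to mean mean-centered, the covariance uses a 1/N factor, the threshold T is supplied by the caller rather than computed, and the projection is the plain matrix product of R_test and R_R. The function names `pca_reduce` and `pca_project` are hypothetical.

```python
import numpy as np


def pca_reduce(r_train, threshold):
    """Return R_R (M x v): eigenvectors whose eigenvalues are >= threshold.

    r_train: N x M matrix, one flattened subimage per row.
    Assumptions: 'normalized' means mean-centered; T is supplied by the caller.
    """
    n = r_train.shape[0]
    # (i) normalized data matrix P, transposed to M x N
    p = (r_train - r_train.mean(axis=0)).T
    # (ii) M x M covariance matrix (1/N scaling assumed)
    cov = p @ p.T / n
    # (iii) eigen-decomposition of the symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # keep only the principal components whose eigenvalue reaches T
    return eigvecs[:, eigvals >= threshold]


def pca_project(r_test, r_reduced):
    """Project testing rows onto the retained subspace (N x M -> N x v)."""
    return r_test @ r_reduced


# Toy usage: 50 "subimages" of 6 pixels driven by 2 latent directions plus tiny noise
rng = np.random.default_rng(0)
loadings = 5.0 * np.array([[1, 1, 0, 0, 0, 0],
                           [0, 0, 1, 1, 0, 0]], dtype=float)
data = rng.normal(size=(50, 2)) @ loadings + 0.01 * rng.normal(size=(50, 6))
r_r = pca_reduce(data, threshold=1.0)
print(r_r.shape)   # (6, 2)
```

Because `eigh` is used on a symmetric matrix, the retained columns are orthonormal, so the projection is a simple matrix product.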

ICA
ICA techniques, which rely on higher-order statistics, are used to compensate for the shortcomings of PCA. ICA uses moments and cumulants up to fourth order to describe the distribution of a random variable.
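As a small illustration of these statistics, the sketch below computes the second- to fourth-order central moments and cumulants of one feature. It uses the standard population (1/n) estimates and the standard relations κ2 = m2, κ3 = m3, κ4 = m4 − 3m2²; the function name is hypothetical. The fourth-order cumulant is zero for a Gaussian variable, which is why it is useful as a measure of non-Gaussianity.

```python
def moments_and_cumulants(values):
    """Central moments m2..m4 and cumulants k2..k4 of one feature (population estimates)."""
    n = len(values)
    mu = sum(values) / n                                   # mean of the feature
    m2, m3, m4 = (sum((x - mu) ** k for x in values) / n for k in (2, 3, 4))
    # Cumulant/central-moment relations up to fourth order
    k2, k3, k4 = m2, m3, m4 - 3 * m2 ** 2
    return (m2, m3, m4), (k2, k3, k4)


moments, cumulants = moments_and_cumulants([1, 2, 3, 4, 5])
print(moments, cumulants)   # (2.0, 0.0, 6.8) (2.0, 0.0, -5.2)
```

The negative κ4 indicates a distribution flatter than a Gaussian (sub-Gaussian), as expected for evenly spaced values.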
In general, ICA is a relatively new technique for finding a linear representation of non-Gaussian data whose components are as statistically independent as possible. ICA can describe localized shape variations and, unlike PCA, does not require the data to be Gaussian distributed. However, the vectors it produces are not ordered, so ICA requires a method for ordering them.
ICA is defined using a statistical latent variable model. Assume that we observe n linear mixtures $x_1, \ldots, x_n$ of n independent components $s_1, \ldots, s_n$:
$$x_j = a_{j1} s_1 + a_{j2} s_2 + \cdots + a_{jn} s_n, \qquad j = 1, \ldots, n$$
Similarly, the digital mammographic image R is considered a linear mixture of statistically independent source regions S:
$$R = A S$$
where A is the mixing matrix; its coefficients uniquely describe the mixed source regions and can be used as extracted features. After estimating the matrix A and its inverse W (the separating matrix), the independent components are estimated using
$$S = W R \qquad (5)$$
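The mixing model R = AS and the separation step S = WR can be illustrated with synthetic signals. In this sketch the sources and the mixing matrix A are hypothetical; because A is known here, W = A⁻¹ recovers S exactly, whereas in practice A is unknown and W has to be estimated from the mixtures alone.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two statistically independent, non-Gaussian sources (one per row of S)
s = np.vstack([rng.uniform(-1.0, 1.0, 1000),
               rng.laplace(0.0, 1.0, 1000)])
# Hypothetical mixing matrix A; the observations are R = A S
a = np.array([[1.0, 0.5],
              [0.3, 1.0]])
r = a @ s
# With A known, the separating matrix is its inverse, and S = W R
w = np.linalg.inv(a)
s_hat = w @ r
print(np.allclose(s_hat, s))   # True
```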

Fuzzy classifier
Fuzzy logic can be interpreted as the emulation of human reasoning on computers [15]. Fuzzy rules are more comprehensible than crisp rules because they can be expressed in terms of linguistic concepts. The value of a linguistic variable is not a number but a word. For example, the linguistic variable "size" might take the values "small," "medium," and "large." Each of these values is called a fuzzy set when implemented using fuzzy logic; thus, fuzzy sets can be used to model linguistic variables. A fuzzy classifier is well suited to labeled observed data and provides interpretable solutions: it handles imprecise data, and its fuzzy rules are interpretable, that is, the classifier can be analyzed through its semantic structure. There are two methods for developing fuzzy classifiers: an approximate fuzzy rule base and a descriptive fuzzy rule base. In an approximate fuzzy rule base, which is the approach implemented in this work, each fuzzy rule is defined by the membership functions of its fuzzy sets. Values of a linguistic variable can be described numerically using membership functions. A membership function gives the degree to which an object belongs to a fuzzy set; its domain is the universe of discourse (all values the object may take), and its range is the interval [0.0, 1.0]. A commonly used membership function is the triangular function. Figure 1 shows a triangular membership function of the fuzzy set "Small"; in Figure 1, an object x has a membership degree of 0.7 to the fuzzy set "Small." A fuzzy space is the set of fuzzy sets that define the fuzzy classes for a particular object, as shown in Figure 2.
A fuzzy space allows an object to belong partially to different classes simultaneously. This is very useful when the boundaries between classes are not well defined. For example, the object x has a membership degree of 0.7 to the fuzzy set "Small" and 0.3 to the fuzzy set "Medium." Similarly, in mammographic images, the boundaries between benign and malignant subimages, and between normal and abnormal subimages, are not well defined. For example, an abnormal subimage with a membership degree of 0.7 to the fuzzy set "benign" and 0.3 to the fuzzy set "malignant" would be classified as benign rather than malignant. Fuzzy membership functions are easy to implement, and the corresponding fuzzy inference engines are fast.
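A minimal sketch of triangular membership functions that reproduces the 0.7/0.3 example above. The breakpoints of "Small," "Medium," and "Large" are hypothetical values chosen so that x = 3 gives those degrees; Figure 1 may use different ones.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge


# Hypothetical fuzzy space for the linguistic variable "size"
fuzzy_space = {"Small": (-10, 0, 10), "Medium": (0, 10, 20), "Large": (10, 20, 30)}
x = 3
degrees = {label: triangular(x, *abc) for label, abc in fuzzy_space.items()}
print(degrees)   # {'Small': 0.7, 'Medium': 0.3, 'Large': 0.0}
```

Because neighboring triangles overlap, x partially belongs to two fuzzy sets at once, which is exactly the property the fuzzy space exploits.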
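In a descriptive fuzzy rule base, linguistic variables are commonly defined by fuzzy if-then rules, where labels $A_{ij}$ represent a discrete set of linguistic fuzzy sets. For example, one fuzzy classification rule may be developed to describe each class of subimages. Fuzzy rules have the form
$$\text{Rule } i:\ \text{IF } x_1 \text{ is } A_{i1} \text{ AND } \cdots \text{ AND } x_t \text{ is } A_{it} \text{ THEN } Y$$
Fuzzy rules can also be expressed compactly, with $\mathbf{x} = (x_1, \ldots, x_t)$, as
$$\text{Rule } i:\ \text{IF } \mathbf{x} \text{ is } A_i \text{ THEN } Y, \qquad A_i = A_{i1} \times \cdots \times A_{it}$$
where Y represents the decision class (i.e., normal, abnormal, benign, or malignant) and $A_{ij}$ represents a fuzzy set for the jth selected feature, j = 1, ..., t.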
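To illustrate how such rules produce a decision, the sketch below computes each rule's firing strength from the membership degrees of its antecedents and picks the class whose rule fires most strongly. The AND operator (a t-norm, here the minimum or the product) and the membership degrees are assumptions for illustration; the aggregation this paper actually uses is described in the fuzzy classifier modeling steps.

```python
import operator
from functools import reduce


def firing_strength(degrees, t_norm=min):
    """Combine antecedent degrees mu_Aij(x_j), j = 1..t, with a t-norm (the AND)."""
    return reduce(t_norm, degrees)


# Hypothetical membership degrees of one subimage's t = 2 features in each rule
rules = {"benign": [0.7, 0.4], "malignant": [0.3, 0.6]}

for t_norm in (min, operator.mul):
    strengths = {label: firing_strength(d, t_norm) for label, d in rules.items()}
    decision = max(strengths, key=strengths.get)
    print(t_norm.__name__, strengths, decision)
```

Both t-norms select "benign" here; they can disagree when degrees are closer, which is one reason the choice of AND operator matters.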

PROPOSED CADD ALGORITHM
In this section, a computer-aided algorithm for detecting and diagnosing suspicious regions in mammograms is developed. PCA is used as a dimensionality reduction module, followed by ICA as a feature extraction module. Finally, a fuzzy classifier classifies the testing subimages as normal or abnormal and, at a later stage, classifies the abnormal subimages as malignant or benign, acting as a diagnosis system. Figure 3 presents the general framework of this system. Each time, one set of abnormal subimages is mixed with one set of normal subimages, and the combined set is divided into two groups, one for the training phase and the other for the testing phase, as shown in Table 1.
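One way to form a test set as described above is sketched below: an abnormal set and a normal set are concatenated, labeled, shuffled, and split. The 50/50 split, the random shuffling, and the function name are assumptions; the actual partitions are those listed in Table 1.

```python
import numpy as np


def make_test_set(abnormal, normal, train_fraction=0.5, seed=0):
    """Mix one abnormal and one normal set (rows = flattened subimages), then split.

    Labels: 1 = abnormal, 0 = normal. The split fraction is an assumption.
    """
    x = np.concatenate([abnormal, normal])
    y = np.concatenate([np.ones(len(abnormal), dtype=int),
                        np.zeros(len(normal), dtype=int)])
    idx = np.random.default_rng(seed).permutation(len(x))   # shuffle before splitting
    n_train = int(round(train_fraction * len(x)))
    train, test = idx[:n_train], idx[n_train:]
    return (x[train], y[train]), (x[test], y[test])


# 119 abnormal and 119 normal 35 x 35 subimages (placeholder pixel values)
abnormal = np.zeros((119, 35 * 35))
normal = np.ones((119, 35 * 35))
(x_train, y_train), (x_test, y_test) = make_test_set(abnormal, normal)
print(x_train.shape, x_test.shape)   # (119, 1225) (119, 1225)
```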

Subimages generation
Each training set is used to create the matrix R_train of dimension N × M, where each row contains one subimage. The dimensionality of the training matrix is then reduced with the PCA algorithm of Section 2, which estimates the covariance matrix of the normalized training data, to generate R_R.
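Building R_train amounts to flattening each square subimage into one row. A minimal sketch, assuming row-major flattening (the helper name is hypothetical); for 35 × 35 subimages M = 1225, and for 45 × 45 subimages M = 2025.

```python
import numpy as np


def build_data_matrix(subimages):
    """Stack equally sized square subimages as the rows of an N x M matrix (M = k*k)."""
    shape = np.asarray(subimages[0]).shape
    if any(np.asarray(s).shape != shape for s in subimages):
        raise ValueError("all subimages must have the same size")
    # Row-major flattening: one subimage per row
    return np.stack([np.asarray(s, dtype=float).ravel() for s in subimages])


subimages = [np.full((45, 45), i) for i in range(4)]   # four placeholder 45 x 45 crops
r_train = build_data_matrix(subimages)
print(r_train.shape)   # (4, 2025)
```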

Unsupervised learning
The separating matrix W and the independent source regions S are estimated in an unsupervised manner. The independent source regions are estimated using
$$S = W (R_R)^T \qquad (9)$$
where $(R_R)^T$ is the transpose of the reduced matrix $R_R$. The separating matrix W is initialized to the identity matrix, which yields $S = (R_R)^T$. To reach maximum statistical independence of S, the nonlinear function Φ(S) is used to estimate the marginal probability density function of S from its central moments and cumulants. The minimum mutual information algorithm [16] is used to estimate Φ(S), as shown in (10)–(14). Equations (10) and (11) estimate the ith central moments and cumulants, where E is the expected value and μ is the mean of the current feature r; the ith central moment, for example, is $m_i = E[(r - \mu)^i]$. Equations (12)–(14) are used to estimate Φ(S), where ∘ denotes the Hadamard product of two matrices. The natural gradient method [16] is used to estimate the change in W according to $dW/dt = \eta(t)[I - \Phi(S) S^T] W$, where η(t) is the learning rate and I is the identity matrix. If dW/dt is not close to zero, W is updated using
$$W \leftarrow W + \eta(t)\,[I - \Phi(S) S^T]\,W \qquad (15)$$
Finally, the selected features resulting from the training process are estimated using the minimum square error (MSE) method [17, 18].
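The natural gradient update W ← W + η[I − Φ(S)Sᵀ]W described above can be sketched as below. This is not the paper's exact estimator: Φ is replaced by tanh, a common fixed choice for super-Gaussian sources, instead of the cumulant-based minimum mutual information estimate of [16], and Φ(S)Sᵀ is averaged over samples. The learning rate, tolerance, iteration cap, test signals, and function name are assumptions for illustration.

```python
import numpy as np


def natural_gradient_ica(x, eta=0.05, tol=1e-4, max_iter=2000, phi=np.tanh):
    """Estimate the separating matrix W for mixtures x (one mixed signal per row).

    phi stands in for Phi(S); tanh is a substitute for the paper's
    cumulant-based estimate (assumption).
    """
    n, t = x.shape
    w = np.eye(n)                                  # W initialized to the identity
    for _ in range(max_iter):
        s = w @ x                                  # current source estimates
        grad = np.eye(n) - phi(s) @ s.T / t        # I - Phi(S) S^T, sample average
        if np.linalg.norm(grad) < tol:             # dW/dt close to zero: stop
            break
        w = w + eta * grad @ w                     # natural gradient step
    return w


rng = np.random.default_rng(1)
sources = rng.laplace(size=(2, 5000))              # super-Gaussian sources
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])        # hypothetical mixing matrix
w = natural_gradient_ica(mixing @ sources)
print(np.round(w @ mixing, 2))   # approximately a scaled permutation matrix
```

ICA recovers sources only up to order and scale, so success is judged by W·A being close to a scaled permutation rather than the identity; this is also why the text notes that ICA needs a method for ordering its vectors.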
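(i) From (8), the training matrix is reconstructed as
$$R_{\text{train}} \approx R_{\text{train}} R_R (R_R)^T \qquad (16)$$
(ii) Substituting (9), rewritten as $(R_R)^T = W^{-1} S$, into (16) gives
$$R_{\text{train}} \approx R_{\text{train}} R_R W^{-1} S \qquad (17)$$
(iii) Therefore, the reduced-dimensionality selected features of the training set are estimated by
$$Q_{\text{train}} = R_{\text{train}} R_R W^{-1} \qquad (18)$$
The same procedure used for the training data is applied to the testing data: R_test is projected onto the reduced matrix R_R from the training procedure, and the reduced-dimensionality extracted features of the testing set are estimated by
$$Q_{\text{test}} = R_{\text{test}} R_R W^{-1} \qquad (19)$$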
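The feature computation can be checked numerically. Assuming S = W(R_R)ᵀ with an invertible W, so that (R_R)ᵀ = W⁻¹S, the PCA approximation R ≈ R R_R (R_R)ᵀ becomes R ≈ (R R_R W⁻¹) S, and the rows of Q = R R_R W⁻¹ are the coefficients of each subimage on the independent source regions. The sketch below is a reconstruction consistent with steps (i)–(iii), not a verbatim copy of the paper's equations; the random matrices and the function name are hypothetical.

```python
import numpy as np


def extract_features(r, r_reduced, w):
    """Q = R R_R W^{-1}: one row of v ICA coefficients per subimage (row of R)."""
    return r @ r_reduced @ np.linalg.inv(w)


rng = np.random.default_rng(2)
r_train = rng.normal(size=(8, 5))                      # N = 8 subimages, M = 5 pixels
r_reduced, _ = np.linalg.qr(rng.normal(size=(5, 3)))   # M x v orthonormal basis, v = 3
w = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)          # invertible separating matrix
s = w @ r_reduced.T                                    # S = W (R_R)^T
q_train = extract_features(r_train, r_reduced, w)
# Q S reproduces the PCA reconstruction R R_R (R_R)^T
print(np.allclose(q_train @ s, r_train @ r_reduced @ r_reduced.T))   # True
```

The same function applied to R_test with the training R_R and W gives the testing features, so both feature matrices have v columns.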

Fuzzy classifier modeling
The matrices Q_train and Q_test contain the reduced-dimensionality extracted features of the subimages; each is of size N × v, with one row per subimage. Each class of subimages (normal, abnormal, benign, and malignant) is represented by a single fuzzy rule, obtained by aggregating the membership functions of each antecedent fuzzy set using the selected feature values of the training subimages. The proposed fuzzy classification algorithm can be summarized as follows.
(1) Four activation vectors, μ_bs, μ_ms, μ_as, and μ_ns, each of size N × 1, are initialized to 0. Each element holds the aggregated membership value of the selected features of the corresponding testing subimage, that is, its degree of activation for a class:
(i) μ_bs: degree of activation for benign testing subimages;
(ii) μ_ms: degree of activation for malignant testing subimages;
(iii) μ_as: degree of activation for abnormal testing subimages;
(iv) μ_ns: degree of activation for normal testing subimages.
(2) Because subimages have different intensities, and to reduce variation and computational complexity, the selected features of Q_train and Q_test are mapped into a limited range [r1, r2] using (20).
(3) Using (21), the membership functions of the fuzzy sets of the testing subimages are obtained from the product space of the selected features from the training phase:
$$\mu_{A_j}(x_{ij}) = \frac{s_i(x_j)}{s(x_j)} \qquad (21)$$
where $s_i(x_j)$ is the number of samples of the current feature $x_j$ and $s(x_j)$ is the total number of samples of $x_j$, that is, the product space of the current feature. The subscript j indexes the selected feature of each training subimage, and i indexes the currently processed sample of the current feature.
(4) The membership functions are normalized using (22).
(5) The degree of activation is computed for the testing subimages, for μ_as and μ_ns in the detection phase and for μ_bs and μ_ms in the diagnosis phase, by aggregating the estimated membership functions using (23).
(6) Many methods in the literature determine the class to which a subimage belongs (i.e., normal/abnormal or benign/malignant). An efficient one is the maximum method, which assigns the testing subimage to the class with the maximum degree of activation according to
$$C_1 = \arg\max\{\mu_{ns}, \mu_{as}\}, \qquad C_2 = \arg\max\{\mu_{bs}, \mu_{ms}\} \qquad (24)$$
where C_1 indicates whether a testing subimage is identified as normal or abnormal and C_2 indicates whether it is identified as benign or malignant.
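The six steps can be sketched end to end as below. Several details are assumptions: the mapping to [r1, r2] is taken as a linear rescaling using training minima and maxima followed by rounding to integer bins; a membership degree is the relative frequency of a bin among one class's training samples, normalized so that its peak is 1; and memberships are aggregated across features by summation. The same function serves detection (normal vs. abnormal, giving μ_ns and μ_as) and diagnosis (benign vs. malignant, giving μ_bs and μ_ms). All names and the toy data are hypothetical.

```python
import numpy as np


def map_range(q, lo, hi, r1=0, r2=9):
    """Linearly rescale each feature to integer bins r1..r2 (assumed mapping)."""
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = r1 + (q - lo) / span * (r2 - r1)
    return np.clip(np.rint(scaled), r1, r2).astype(int)


def fit_memberships(bins, r1=0, r2=9):
    """Per-feature membership of each bin: normalized relative frequency."""
    n_bins = r2 - r1 + 1
    mu = np.zeros((n_bins, bins.shape[1]))
    for j in range(bins.shape[1]):
        counts = np.bincount(bins[:, j] - r1, minlength=n_bins)
        mu[:, j] = counts / counts.sum()       # s_i(x_j) / s(x_j)
        mu[:, j] /= mu[:, j].max()             # normalize: peak membership = 1
    return mu


def classify(q_train, y_train, q_test, r1=0, r2=9):
    """Label each test row with the class of maximum degree of activation."""
    lo, hi = q_train.min(axis=0), q_train.max(axis=0)
    train_bins = map_range(q_train, lo, hi, r1, r2)
    test_bins = map_range(q_test, lo, hi, r1, r2)
    cols = np.arange(q_test.shape[1])
    classes = np.unique(y_train)
    # One activation vector per class (e.g. mu_ns and mu_as), summed over features
    activation = np.column_stack([
        fit_memberships(train_bins[y_train == c], r1, r2)[test_bins - r1, cols].sum(axis=1)
        for c in classes
    ])
    return classes[np.argmax(activation, axis=1)]    # maximum method


rng = np.random.default_rng(3)
q_train = np.vstack([rng.normal(0, 0.5, (40, 3)), rng.normal(5, 0.5, (40, 3))])
y_train = np.repeat([0, 1], 40)                      # 0 = normal, 1 = abnormal
q_test = np.vstack([rng.normal(0, 0.5, (10, 3)), rng.normal(5, 0.5, (10, 3))])
y_test = np.repeat([0, 1], 10)
pred = classify(q_train, y_train, q_test)
print((pred == y_test).mean())
```

Mapping into a small integer range keeps the number of membership values per feature small, which matches the stated motivation of reducing variation and computational cost.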

EXPERIMENTAL RESULTS
Table 2 compares the proposed CADD algorithm with the PCA and ICA algorithms on the same testing data, all using the fuzzy classifier. Algorithm accuracy is defined as the ratio of the number of correctly classified testing subimages to the total number of testing subimages. The results show that combining the ICA and PCA algorithms improves overall performance on every testing set compared with using PCA alone: the best result is 80.67% for PCA and 84.03% for the proposed CADD algorithm (Table 2). On average over all tests, the proposed algorithm improves the accuracy of PCA by 8.56%. Table 2 also compares ICA with the proposed CADD algorithm. ICA achieves an accuracy of 49.58% on every testing set, whereas the best result of the proposed CADD algorithm is 84.03%. These results indicate that applying PCA for dimensionality reduction before ICA improves ICA accuracy by an average of 50.51%. The ICA results also show that the fuzzy classifier's performance degrades when no dimensionality reduction module is used. A fuzzy classifier requires a feature reduction method to minimize the total number of membership functions and to improve its accuracy. With ICA alone, each subimage has a larger number of selected features, so the fuzzy classifier's performance degrades on all testing subimages.
The experimental results of the proposed CADD algorithm as a computer-aided diagnosis system are shown in Table 3. The best result is 78%, with 15 of 25 malignant subimages and 31 of 34 benign subimages correctly classified.
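The reported diagnosis accuracy follows directly from these counts, using the accuracy definition given above (correctly classified testing subimages divided by all testing subimages):

```python
# Counts reported for the best diagnosis result (Table 3)
malignant_correct, malignant_total = 15, 25
benign_correct, benign_total = 31, 34

accuracy = (malignant_correct + benign_correct) / (malignant_total + benign_total)
print(f"overall:   {100 * accuracy:.2f}%")                              # 46/59 -> 77.97%
print(f"malignant: {100 * malignant_correct / malignant_total:.2f}%")   # 60.00%
print(f"benign:    {100 * benign_correct / benign_total:.2f}%")         # 91.18%
```

The 78% figure is thus the overall rate over 59 abnormal testing subimages. The per-class rates show that most errors (10 of 13) are malignant subimages labeled benign.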
This system uses several parameters that impact the performance and accuracy of results such as the number of selected principal components, learning rate, and mapping range.

Number of selected PC
Using PCA to reduce data dimensionality as a preprocessing step for ICA affects the overall accuracy of the algorithm. Table 4 shows simulation results on test sets 1–5, where PC denotes the number of selected principal components. These results indicate that selecting fewer than 11 principal components achieves acceptable results in all simulations. This means that roughly 0.8% or less of the principal components are selected for 35 × 35 pixel subimages, and less than 0.5% for 45 × 45 pixel subimages. This is consistent with other work that used PCA for dimensionality reduction.
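The percentages follow from M = k × k pixels per subimage, taking at most 10 retained components and assuming the number of possible components equals the pixel count M:

```python
def pc_percentage(n_components, side):
    """Share (in %) of the M = side * side possible components that are retained."""
    return 100.0 * n_components / (side * side)


for side in (35, 45):
    print(f"{side} x {side}: M = {side * side}, 10 PCs = {pc_percentage(10, side):.3f}%")
# 35 x 35: M = 1225, 10 PCs = 0.816%
# 45 x 45: M = 2025, 10 PCs = 0.494%
```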

Learning rate
The learning rate used to compute the change in W for the ICA algorithm determines the speed of convergence of dW/dt and affects the overall accuracy of the algorithm. Figures 4–8 show the impact of the learning rate on test sets 1–5. A learning rate close to 0.0045 produces acceptable results for all sets.

Mapping range
Figures 9–13 show the accuracy versus the mapping range values for test sets 1–5. A mapping range of [0, 9] or [0, 15] is acceptable for all testing sets.
The performance of the proposed system is parameter dependent; investigating this dependency is outside the scope of this presentation and is left for future work. Earlier efforts, such as [19, 20], could be investigated. Parameter estimation will remain one of the main disadvantages of algorithms such as ICA, where human intervention is needed. Among other classification methods, the fractal-model approach [7] uses a set of 30 mammograms containing single and clustered microcalcifications. Fifty subimages are extracted and divided into 30 subimages for the training phase and 20 subimages for the testing phase. Results using two different multilayer subnetworks in the neural network-based classifier indicate a classification accuracy of 90%. The discrete wavelet transform method [9] uses a set of 60 mammograms. Masses are segmented ... Furthermore, Table 2 shows that the algorithm's accuracy differs between test sets, indicating that factors such as the cropping size affect the results.

CONCLUDING REMARKS
A CADD system has been developed and implemented. Its framework is based on integrating PCA, ICA, and fuzzy logic. The performance of the proposed CADD system is compared against that of PCA and ICA individually. Extensive simulations are performed using 833 subimages. The results indicate that combining the ICA and PCA algorithms improves PCA accuracy by about 8.56% across all test sets and ICA accuracy by about 50.51%. The best results are obtained with 45 × 45 pixel subimages rather than 35 × 35 pixel subimages. Using ICA for feature extraction without a PCA preprocessing module degrades the fuzzy classifier's performance; ICA takes advantage of the reduction in dimensionality and noise to produce more accurate and robust results. Parameter values play a vital role in the system's performance, and their selection should be investigated to improve the system's robustness. Other membership functions could be modeled based on the mean and standard deviation of the selected feature values.

Figure 1: A triangular membership function of the fuzzy set "Small."

Figure 3: Block diagram of the proposed CADD system.

Figure 4: Learning rate impact on algorithm accuracy for test set no. 1, with other parameters kept constant.

Figure 5: Learning rate impact on algorithm accuracy for test set no. 2, with other parameters kept constant.

Figure 6: Learning rate impact on algorithm accuracy for test set no. 3, with other parameters kept constant.

Figure 7: Learning rate impact on algorithm accuracy for test set no. 4, with other parameters kept constant.

Figure 8: Learning rate impact on algorithm accuracy for test set no. 5, with other parameters kept constant.

Figure 9: Mapping range impact on algorithm accuracy for test set no. 1.

Figure 10: Mapping range impact on algorithm accuracy for test set no. 2.

Figure 11: Mapping range impact on algorithm accuracy for test set no. 3.

Figure 12: Mapping range impact on algorithm accuracy for test set no. 4.

Figure 13: Mapping range impact on algorithm accuracy for test set no. 5.
The MIAS database has a total of 119 regions of suspicion (ROS): 51 malignant and 68 benign. Two sets of abnormal subimages, each consisting of the 119 ROS, are cropped around the center of each abnormality and scaled to 35 × 35 and 45 × 45 pixels, respectively. Then, five sets of normal subimages, each consisting of 119 subimages, are cropped at random from normal MIAS mammograms and scaled: two sets to 35 × 35 pixels and three sets to 45 × 45 pixels.

Table 1: Different sets used to evaluate the performance of the detection algorithm.

Table 2: FP and FN, and total accuracy of the PCA, ICA, and PCA-ICA algorithms.

Table 4: Impact of the number of selected principal components on algorithm accuracy, with the learning rate and mapping range of each set kept fixed.