We propose a computer-aided detection (CAD) system for the detection and classification of suspicious regions in mammographic images. This system combines a dimensionality reduction module (using principal component analysis), a feature extraction module (using independent component analysis), and a feature subset selection module (using a rough set model). The rough set model is used to reduce the effect of data inconsistency, while a fuzzy classifier is integrated into the system to label subimages as normal or abnormal regions. The experimental results show that this system achieves an accuracy of 84.03% and a recall of 87.28%.

Breast cancer is the most common cancer among women worldwide. The National Cancer Institute [

Several rough set-based and fuzzy-based methods have been proposed in the literature for breast cancer detection. Hassanien and Ali [

In [

In [

The novelty of this work is the integration of RSM for feature selection with a fuzzy classifier as well as generating the framework for the integration of the PCA, ICA, RSM, and fuzzy classifier for breast cancer detection. The rest of this paper is organized as follows. Section

PCA is an orthogonal transform and a decorrelation technique that captures the maximum variance in the data. The correlation between the components of a vector is a measure of data redundancy, so most of the information contained in the original vector can be represented by a much smaller vector after the PCA stage. In this paper, PCA is used as a dimensionality and noise reduction module; this step also ensures that the components passed to the next stage are uncorrelated.
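As a concrete sketch of this stage (illustrative only; the block sizes and component counts used in the paper appear later), PCA can be computed from the eigendecomposition of the sample covariance matrix, and the retained scores come out decorrelated:

```python
import numpy as np

def pca_reduce(X, k):
    """Project (n_samples, n_features) data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)               # center each feature
    cov = np.cov(Xc, rowvar=False)        # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    top = np.argsort(vals)[::-1][:k]      # indices of the k largest variances
    return Xc @ vecs[:, top]              # decorrelated, reduced-dimension scores

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # toy data standing in for subimage vectors
Z = pca_reduce(X, 3)
C = np.cov(Z, rowvar=False)               # retained components are uncorrelated:
print(np.allclose(C - np.diag(np.diag(C)), 0.0, atol=1e-8))  # → True
```

Discarding the low-variance directions is what provides the noise reduction (prewhitening) role that PCA plays in this system.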

ICA is a statistical technique that can be used to extract hidden features within a set of data.

A mammographic image

The ICA algorithm estimates the separating matrix

Rough set theory can be used as a feature subset selection algorithm. RSM determines and removes the dispensable attributes, which represent redundant information in the data, while keeping the core attributes, which represent the minimum essential information.

By relaxing the core algorithm, more attributes can be selected; these are called

Cardinality is used to replace the traditional rough set theory operations, which improves the algorithm's efficiency and reduces its complexity. The cardinality of a set is defined as the number of elements in the set. For example, Table

Selected features for eight images.

| Image | Feature 1 | Feature 2 | Feature 3 | Decision |
|---|---|---|---|---|
| 1 | A | F | C | |
| 2 | A | F | D | S |
| 3 | E | E | C | |
| 4 | B | E | C | S |
| 5 | B | E | C | S |
| 6 | A | E | D | |
| 7 | A | E | D | S |
| 8 | A | F | D | S |

Two objects are considered consistent if they have the same condition and decision values. For example, in Table
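This consistency test can be sketched with cardinality in mind: group objects by their condition values and count the distinct decisions per group; a condition pattern that maps to more than one decision marks inconsistent objects. The rows below are modeled on the selected-features table above (blank decisions kept as empty strings), and the helper name is hypothetical, not the paper's exact procedure:

```python
def inconsistent_conditions(table):
    """table: list of (condition_tuple, decision). Returns conditions that
    appear with more than one decision, i.e., the inconsistent objects."""
    decisions = {}
    for cond, dec in table:
        decisions.setdefault(cond, set()).add(dec)
    return {cond for cond, decs in decisions.items() if len(decs) > 1}

# Rows modeled on the selected-features table: three condition features
# and a decision ('S' = suspicious, '' = blank).
table = [
    (("A", "F", "C"), ""),
    (("A", "F", "D"), "S"),
    (("E", "E", "C"), ""),
    (("B", "E", "C"), "S"),
    (("B", "E", "C"), "S"),   # duplicate of the row above: consistent
    (("A", "E", "D"), ""),
    (("A", "E", "D"), "S"),   # same conditions, different decision: inconsistent
    (("A", "F", "D"), "S"),
]
print(inconsistent_conditions(table))  # → {('A', 'E', 'D')}
```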

Fuzzy logic emulates human reasoning and has proved to be a powerful tool for handling and processing noisy and vague data. Fuzzy rules are more flexible than crisp rules for several reasons. They allow partial set membership and overlap between fuzzy set definitions, which simplifies the classification phase compared with crisp rules, which are restricted to either membership or nonmembership in a set. They can also be expressed as linguistic statements based on expert knowledge. Finally, fitting fuzzy rules to the labeled observed data improves the interpretability of the results.

Fuzzy membership functions are easy to implement, and they improve the speed of inference engines. The difference between normal and suspicious mammographic images may not be well defined. Figure

Fuzzy space for an object x consisting of two fuzzy sets: “Normal” and “Suspicious”.
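A minimal sketch of two such overlapping membership functions follows; the breakpoints are assumed for illustration and are not the paper's fitted shapes. A value in the overlap region belongs partially to both "Normal" and "Suspicious", unlike a crisp threshold:

```python
def mu_normal(x):
    """Membership in 'Normal': 1 below 0.25, falling linearly to 0 at 0.75."""
    return max(0.0, min(1.0, (0.75 - x) / 0.5))

def mu_suspicious(x):
    """Membership in 'Suspicious': 0 below 0.25, rising linearly to 1 at 0.75."""
    return max(0.0, min(1.0, (x - 0.25) / 0.5))

x = 0.5                                  # a value in the overlap region
print(mu_normal(x), mu_suspicious(x))    # → 0.5 0.5 (partial membership in each set)
```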

Several approaches have been developed for automatic derivation of fuzzy rules from the labeled observed data such as genetic algorithm [

Fuzzy if-then rules are used to implement membership function of fuzzy sets as shown in (

The weight is a number in the interval [

Equation (

The antecedent results are then applied to the consequent, a step known as inference. In this case, the classifier labels the tested subimage as normal.

This paper integrates four techniques, namely, PCA, ICA, RSM, and a fuzzy classifier, to build a CAD system. The PCA algorithm is used as a dimensionality and noise reduction tool (prewhitening), the ICA algorithm as a feature extraction module, and RSM as a feature subset selection module, followed by a fuzzy classifier.

A total of 119 regions of suspicion (ROS) are manually extracted from the MIAS database [

Four other sets of normal subimages are randomly and automatically extracted such that the first set is of size

Data sets used in the evaluation of the proposed algorithm's performance.

| Set no. | Training ROS | Training normal | Training total | Testing ROS | Testing normal | Testing total | Size (pixels) |
|---|---|---|---|---|---|---|---|
| 1 | 60 | 59 | 119 | 59 | 60 | 119 | |
| 2 | 60 | 59 | 119 | 59 | 60 | 119 | |
| 3 | 60 | 59 | 119 | 59 | 60 | 119 | |
| 4 | 60 | 59 | 119 | 59 | 60 | 119 | |

(a) Benign, normal, and malignant subimages of size

A training matrix

In this paper, the ICA scheme is based on minimizing the mutual information of the source components, which can be achieved using cumulants. The proposed rule (a modified version of [

(i)

(ii) The change in

(iii) The momentum method is used to boost the convergence speed of (

(iv) The separating matrix is updated and then normalized:

(v) Stop the algorithm when
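The iteration in steps (i)-(v) can be sketched generically. The paper's cumulant-based learning rule is elided in the text above, so the tanh nonlinearity and natural-gradient form below are stand-in assumptions; only the momentum term, the update-then-normalize step, and the small-change stopping rule are taken from the steps above:

```python
import numpy as np

def ica_fit(X, lr=0.01, alpha=0.9, tol=1e-6, max_iter=1000):
    """X: (n_components, n_samples) mixed data; returns the separating matrix W."""
    n = X.shape[0]
    W = np.eye(n)                                    # initial separating matrix
    dW_prev = np.zeros_like(W)
    for _ in range(max_iter):
        Y = W @ X                                    # current source estimates
        g = np.tanh(Y)                               # odd nonlinearity (assumed)
        grad = (np.eye(n) - g @ Y.T / X.shape[1]) @ W   # natural-gradient direction
        dW = lr * grad + alpha * dW_prev             # momentum boosts convergence (iii)
        W = W + dW                                   # update the separating matrix (iv)
        W /= np.linalg.norm(W, axis=1, keepdims=True)   # ... then normalize its rows
        if np.abs(dW).max() < tol:                   # stop when the change is small (v)
            break
        dW_prev = dW
    return W

rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, size=(2, 2000))   # two independent non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # hypothetical mixing matrix
W = ica_fit(A @ S)
```

The momentum constant `alpha` blends the previous update into the current one, which is what speeds up convergence in step (iii).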

Finally, the reduced-dimensionality selected features can be estimated as follows.

A minimum square error approximation of the training matrix

From (10),

Since

First, a testing matrix

The reduced dimensionality extracted features from the corresponding testing set are estimated using (

The estimated matrices

There are some inconsistent elements (subimages) in the estimated matrices

The proposed training framework can be summarized as follows.

The consistent (i.e., duplicate) elements are removed from the training matrix. The resulting matrix is

Construct the decision matrix,

Find the Core attributes using the following procedure.

Initialize Core vector into

Check the cardinality for each attribute

Find

Initialize

Set

Let

Update

If

Else, go to step (II).

In this step, features are selected from the matrix

Finally,
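The Core-finding procedure above can be illustrated in simplified form (the threshold and vector details are elided in the text, so this sketch only captures the idea): drop one attribute at a time, and if the reduced decision table becomes inconsistent, the dropped attribute is indispensable and belongs to the Core. The data below are hypothetical:

```python
def is_consistent(rows):
    """True if no condition pattern maps to two different decisions."""
    seen = {}
    for cond, dec in rows:
        if seen.setdefault(cond, dec) != dec:
            return False
    return True

def core_attributes(table, n_attrs):
    core = []                                  # initialize the Core to empty
    for a in range(n_attrs):
        # remove attribute a from every object's condition tuple
        reduced = [(tuple(v for i, v in enumerate(cond) if i != a), dec)
                   for cond, dec in table]
        if not is_consistent(reduced):         # removing a caused inconsistency:
            core.append(a)                     # a is indispensable (Core)
    return core

# Hypothetical consistent decision table: three condition attributes, one decision.
table = [
    (("A", "F", "C"), "N"),
    (("A", "F", "D"), "S"),
    (("B", "E", "C"), "S"),
    (("A", "E", "D"), "S"),
]
print(core_attributes(table, 3))   # → [2] (only the third attribute is indispensable)
```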

Two single fuzzy if-then rules are used to represent the normal and abnormal fuzzy sets. The membership functions of each antecedent fuzzy set are aggregated using the information about the selected feature values of the training subimages.

The proposed fuzzy-based classification algorithm can be summarized as follows:

Two activation functions

Using (

The membership functions are normalized using

The membership functions are aggregated using (

By assigning the corresponding testing subimage to the fuzzy set with the maximum degree of activation, a crisp decision is made, that is, normal or abnormal. Equation (
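The classification steps above can be sketched as follows. The paper's equations are elided in the text, so the aggregation by averaging and the membership shapes here are assumptions; the sketch only shows the two-rule structure and the crisp decision by maximum activation:

```python
def activation(features, mu_funcs):
    """Aggregate the per-feature memberships into one degree of activation."""
    return sum(mu(f) for mu, f in zip(mu_funcs, features)) / len(features)

def classify(features, normal_mus, suspicious_mus):
    a_n = activation(features, normal_mus)       # rule 1: "Normal" antecedents
    a_s = activation(features, suspicious_mus)   # rule 2: "Suspicious" antecedents
    return "normal" if a_n >= a_s else "abnormal"   # crisp decision by max

# Assumed triangular-edge memberships per selected feature (see the earlier figure).
mu_lo = lambda x: max(0.0, min(1.0, (0.75 - x) / 0.5))
mu_hi = lambda x: max(0.0, min(1.0, (x - 0.25) / 0.5))

print(classify([0.2, 0.3], [mu_lo, mu_lo], [mu_hi, mu_hi]))  # → normal
print(classify([0.8, 0.9], [mu_lo, mu_lo], [mu_hi, mu_hi]))  # → abnormal
```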

Table

Results of PCA-ICA-Fuzzy, PCA-ICA-Rough-Fuzzy, PCA-Fuzzy, PCA-Rough-Fuzzy, ICA-Fuzzy, and ICA-Rough-Fuzzy algorithms. NA: not applicable.

| Algorithm | Set no. | PC | FP | FN | Accuracy | Precision | Recall |
|---|---|---|---|---|---|---|---|
| PCA-ICA-Rough-Fuzzy | 1 | 8 | 21.85% | 9.24% | 68.91% | 56.66% | 75.56% |
| | 2 | 7 | 9.24% | 7.56% | 83.19% | 84.49% | |
| | 3 | 8 | 12.61% | 12.61% | 74.79% | 74.99% | 74.99% |
| | 4 | 8 | 10.08% | 5.88% | 80.01% | | |
| PCA-ICA-Fuzzy | 1 | 8 | 16.81% | 14.28% | 68.91% | 66.66% | 70.18% |
| | 2 | 20 | 10.92% | 5.89% | 78.34% | | |
| | 3 | 6 | 12.61% | 21% | 66.39% | 74.99% | 64.29% |
| | 4 | 5 | 8.4% | 9.25% | 82.35% | 81.96% | |
| PCA-Fuzzy | 1 | 20 | 26.05% | 10.08% | 63.87% | 48.33% | 70.74% |
| | 2 | 5 | 10.08% | 14.29% | 75.63% | 73.84% | |
| | 3 | 6 | 12.61% | 21% | 66.39% | 74.99% | 64.29% |
| | 4 | 5 | 11.75% | 7.58% | 76.7% | | |
| PCA-Rough-Fuzzy | 1 | 16 | 20.17% | 18.49% | 61.35% | 60% | 62.06% |
| | 2 | 5 | 9.25% | 10.08% | | | |
| | 3 | 6 | 16.81% | 17.65% | 65.55% | 66.66% | 65.57% |
| | 4 | 8 | 10.08% | 10.08% | 79.83% | 80.01% | 80.01% |
| ICA-Fuzzy | 1 | NA | 10.08% | 40.34% | 49.58% | 80.01% | 50% |
| | 2 | NA | 10.08% | 40.34% | 49.58% | 80.01% | 50% |
| | 3 | NA | 10.08% | 40.34% | 49.58% | 80.01% | 50% |
| | 4 | NA | 10.08% | 40.34% | 49.58% | 80.01% | 50% |
| ICA-Rough-Fuzzy | 1 | NA | 15.97% | 15.13% | 68.91% | 68.33% | 69.48% |
| | 2 | NA | 10.08% | 9.24% | 80.01% | | |
| | 3 | NA | 14.29% | 18.49% | 67.22% | 71.66% | 66.15% |
| | 4 | NA | 8.4% | 11.77% | 79.83% | 78.12% | |

Table

A comparison of the different computer-aided detection system results.

| Algorithm | Best accuracy | Average accuracy | Average FN | Average FP |
|---|---|---|---|---|
| PIRF | 84.03% | 77.73% | 8.82% | 13.45% |
| PIF | 83.19% | 75.21% | 12.61% | 12.19% |
| IRF | 80.67% | 74.16% | 13.66% | 12.19% |
| PRF | 80.67% | 71.85% | 14.08% | 14.08% |
| PF | 80.67% | 71.64% | 13.24% | 15.12% |
| IF | 49.58% | 49.58% | 40.34% | 10.08% |

As the results show, the fuzzy classifier cannot be implemented with the ICA model alone: without a dimensionality reduction stage, a large number of membership functions is generated, and without a feature subset selection module, the classifier's task complexity grows and its performance degrades. Furthermore, the results indicate that integrating the ICA model with PF generated better results than integrating RSM with PF. Using a PCA model with the ICA model improved the average accuracy by 4.68% and the false negative rate by 4.76%, whereas following PF with RSM improved its average accuracy by only 0.29% while degrading its FN rate by 6.34%. Results also indicate that integrating RSM with PIF improves accuracy by an average of 3.35%.

Comparing average FN rates, PIRF achieves 8.82%, PIF 12.61%, IRF 13.66%, PF 13.24%, PRF 14.08%, and IF 40.34%. The results indicate that using PCA as a dimensionality reduction module reduces the FN rates of PIRF and PF at the expense of a slight increase in FP rates. The average FN rates are also very close to the average FP rates for the PIF and PRF algorithms. On the other hand, the average FN rates increase in the IRF and IF algorithms, where no dimensionality reduction is integrated. Finally, integrating RSM into the PIF and PF algorithms reduces the number of principal components required to obtain the Reduct. This discussion shows that each of the integrated techniques (PCA, ICA, RSM, and the fuzzy classifier) is necessary and should be applied in the proposed sequence in order to achieve the highest accuracy rates.
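The reported figures are related in a way worth making explicit: with FP and FN expressed as percentages of all tested subimages, accuracy is simply 100 minus both error rates, while precision and recall require the raw counts. The confusion counts below are reconstructed from the reported rates of the best run on a 119-subimage testing set (an assumption, not values quoted verbatim from the paper):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy and error rates over all subimages; precision/recall from counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "fp_rate": 100.0 * fp / total,      # rate over all tested subimages
        "fn_rate": 100.0 * fn / total,
        "precision": 100.0 * tp / (tp + fp),
        "recall": 100.0 * tp / (tp + fn),
    }

# Reconstructed counts: FP 10.08% of 119 -> 12 subimages, FN 5.88% -> 7 subimages,
# with 59 ROS and 60 normal subimages in the testing set.
m = metrics(tp=52, fp=12, fn=7, tn=48)
print(round(m["accuracy"], 2))   # → 84.03
print(round(m["fn_rate"], 2))    # → 5.88
```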

An implementation of the PIF proposed in [

Observations of the different developed algorithms.

| Algorithm | Observations |
|---|---|
| PIRF | Has the highest accuracy and recall percentage but not the highest precision |
| PIF | Needs a feature subset selection module; has the highest average precision |
| IRF | Needs a dimensionality and noise reduction module; has the highest average precision |
| PRF | Needs a feature extraction module |
| PF | Needs a feature extraction module and a feature subset selection module |
| IF | Needs a dimensionality and noise reduction module and a feature subset selection module; has the lowest accuracy and recall percentage |

The average accuracy of PIF improved by 3.35% with the PIRF system, and its average FN rate improved by 30.01%. Also, the average number of selected principal components in the PIRF algorithm (7.75) is less than that of the PIF algorithm (9.75). In other classification methods such as in [

The proposed CAD system uses several parameters that impact performance accuracy, such as the number of principal components in the PCA algorithm, the learning rate and alpha in the ICA algorithm, the threshold in the Reduct process, and the mapping range.

Reducing data dimensionality using the PCA module affects the accuracy of the PIRF algorithm. When a large number of principal components is selected, the extracted features contain redundant information, which degrades accuracy. However, if too small a number is selected, the extracted features cannot be estimated precisely and the fuzzy classifier's performance is also degraded.
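One common way to steer this trade-off is to keep the smallest number of components whose cumulative explained variance exceeds a target. The paper selects the PC count empirically, so this criterion is an assumption shown only to illustrate the redundancy-versus-precision balance:

```python
import numpy as np

def choose_k(X, target=0.95):
    """Smallest k whose components explain at least `target` of the variance."""
    Xc = X - X.mean(axis=0)
    vals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]   # descending
    ratio = np.cumsum(vals) / vals.sum()                        # cumulative share
    return int(np.searchsorted(ratio, target) + 1)

rng = np.random.default_rng(0)
# Rank-4 signal plus weak noise: a handful of components carry the variance.
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 20)) \
    + 0.01 * rng.normal(size=(300, 20))
print(choose_k(X))   # at most 4 here, since the signal has rank 4
```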

Table

The influence of the number of PCs on the accuracy of the results while the learning rate, mapping range, and threshold parameters are kept constant.

| PC | Set no. 1 | Set no. 2 | Set no. 3 | Set no. 4 |
|---|---|---|---|---|
| 5 | 62.19% | 78.99% | 69.75% | 80.67% |
| 6 | 63.03% | 81.52% | 69.75% | 80.67% |
| 7 | 62.19% | 69.75% | 81.52% | |
| 8 | 78.15% | | | |
| 9 | 68.07% | 74.79% | 73.95% | 78.99% |
| 10 | 67.23% | 75.63% | 71.43% | 73.95% |

On the other hand, Figure

ROC plot for different values of selected principal components for testing set number 4.

The estimation of the matrices

Learning rate impact on accuracy for test set number 1 (all other parameters were kept constant).

Learning rate impact on accuracy for test set number 2 (all other parameters were kept constant).

Learning rate impact on accuracy for test set number 3 (all other parameters were kept constant).

Learning rate impact on accuracy for test set number 4 (all other parameters were kept constant).

ROC plot for different values of the learning rate for testing set number 4.

This constant determines the ratio of the previous

In investigating the effect of the mapping range on the accuracy of the results, we found that mapping the data into a limited range causes some accuracy loss but reduces computational complexity and processing time. Figures

Mapping range impact on accuracy for test set number 1.

Mapping range impact on accuracy for test set number 2.

Mapping range impact on accuracy for test set number 3.

Mapping range impact on accuracy for test set number 4.

ROC plot for different values of the mapping range for testing set number 4.
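The mapping step itself can be sketched as a linear rescale into a fixed interval; the target interval [0, 1] below is an assumed example rather than one of the ranges tested above:

```python
def map_range(values, lo=0.0, hi=1.0):
    """Linearly rescale a list of feature values into the interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    scale = (hi - lo) / (vmax - vmin)         # assumes vmax > vmin
    return [lo + (v - vmin) * scale for v in values]

print(map_range([2.0, 4.0, 10.0]))   # → [0.0, 0.25, 1.0]
```

A narrower target range (e.g., a small integer range) is cheaper to process but quantizes the features more coarsely, which is the accuracy loss discussed above.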

A threshold value

| Threshold | Set no. 1 |
|---|---|
| 1 | 69% |
| 0.75 | 65.55% |
| 0.5 | 63.87% |
| 0.25 | 63.87% |

A computer-aided detection system has been developed and implemented by integrating PCA, ICA, RSM, and a fuzzy classifier. Its performance is compared against the performance of PCA-ICA-Fuzzy, PCA-Fuzzy, PCA-Rough-Fuzzy, ICA-Fuzzy, and ICA-Rough-Fuzzy algorithms.

Results from Tables

Parameter values as well as block size play a vital role in the system's performance, and an investigation of this relationship, and perhaps automation of parameter selection, is needed to further improve the system's robustness. Although cumulants offer simple computations, they are sensitive to outliers (large values within the set). Therefore, an alternative route that may be worth investigating is an ICA learning rule based on negentropy instead of cumulants.

The authors would like to acknowledge Western Michigan University for its support and contributions to the Information Technology and Image Analysis (ITIA) Center, funded by the National Science Foundation Grant (MRI-0215356).