One of the parameters usually stored for mammograms is the BI-RADS density, which gives an idea of the breast tissue composition. In this work, we study the effect of BI-RADS density in our ongoing project to develop an image-based CAD system for detecting masses in mammograms. This system consists of two stages. First, a blind feature extraction is performed on regions of interest (ROIs) using Independent Component Analysis (ICA). In the second stage, those features form the input vectors to a classifier, either a neural network (NN) or a support vector machine (SVM). To train and test our system, the Digital Database for Screening Mammography (DDSM) was used. The results show that the maximum variation in the performance of our system, when considering only prototypes obtained from mammograms with a single density value (both for training and test), is about 7%, with the best values for density equal to 1 and the worst for density equal to 4, for both classifiers. Finally, with the overall results (i.e., using prototypes from mammograms with all possible density values), the performance is only about 2% lower than the maximum, also for both classifiers.
Several factors can affect the composition of breast tissue. An increase or decrease in the amount of glandular tissue is part of the normal physiological changes of the breast and usually occurs in both breasts simultaneously. These changes may be caused by hormonal fluctuations (natural or synthetic), including menarche, pregnancy, breastfeeding, or menopause. The increase in glandularity also depends on the woman’s genetic predisposition. In young women, the breast is normally composed mostly of glandular tissue and very little fat. Although this composition varies with age, it is possible to find older women with extremely dense breasts, that is, breasts consisting mostly of glandular tissue rather than fat. Weight gain or loss also increases or decreases the fat content of the breast and therefore also affects the breast glandularity [
The composition of breast tissue is defined by the BI-RADS parameter called “density” [
Meaning of the BI-RADS density.
Density value | Description |
---|---|
1 | Breast tissue mainly fatty |
2 | Scattered fibroglandular densities |
3 | Breast tissue heterogeneously dense |
4 | Breast tissue extremely dense |
The degree of difficulty of analyzing a mammogram depends on the nature of the breast tissue, as can be seen in Figure
The left image shows the RCC (right craniocaudal) view of case 1468 in USF's DDSM database, which corresponds to a 71-year-old woman to whom the radiologist assigned a density equal to 1. The right image shows the RMLO (right mediolateral oblique) view of case 1985 in the same database, which corresponds to a 41-year-old woman with a density equal to 4.
The rest of our paper is organized as follows. Section
In this section, we present the techniques used in this study for the generation and selection of prototypes, for feature extraction tasks, and for classification. We are going to review these methods in the following subsections.
In the literature, one can find various proposals focused on the detection and segmentation of masses on mammograms, such as those reviewed in [
The DDSM is a resource available to the mammographic image analysis research community and contains a total of 2,620 cases. Each case provides four screening views: mediolateral oblique (MLO) and craniocaudal (CC) projections of left and right breasts. Therefore, the database has a total of 10,480 images. Cases are categorized in four major groups:
The DDSM database contains 2,582 images that contain an abnormality identified as mass, whether benign or malignant. Some of them were located on the border of the mammograms and could not be used (see the following paragraph, dedicated to ROIs). Consequently, only 2,324 prototypes could be considered, namely, those which might be taken centered in a square without stretching. Some mass prototype examples are shown in Figure
Examples of masses for each combination of shape and margins. Each ROI image has been resized to a common size of 128
The ground truth region defined by the radiologist (red solid line) and the square region taken as the ROI (purple box) on a DDSM mammogram.
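The ROI selection described in this section (the smallest square that contains the whole area marked by the radiologist, centered on it, and discarded when it would cross the border of the mammogram) can be illustrated with a minimal sketch. The function name `square_roi`, the representation of the outline as a list of `(x, y)` pixel coordinates, and the integer centering rule are assumptions made for illustration; the text does not describe the authors' actual implementation.

```python
def square_roi(outline, image_width, image_height):
    """Return (left, top, side) of the smallest square that contains the
    outline and is centered on its bounding box, or None if that square
    does not fit inside the image (the prototype is then discarded).

    `outline` is a hypothetical list of (x, y) pixel coordinates.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    width = x_max - x_min + 1
    height = y_max - y_min + 1
    side = max(width, height)

    # Grow the shorter dimension symmetrically so the square stays centered.
    left = x_min - (side - width) // 2
    top = y_min - (side - height) // 2

    # No padding or stretching: reject squares that leave the image.
    if left < 0 or top < 0:
        return None
    if left + side > image_width or top + side > image_height:
        return None
    return left, top, side
```

For example, an outline that touches the upper border of the image and is much wider than it is tall returns `None`, which mirrors the prototypes that could not be used.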
The generated regions have different sizes, but the selected image feature extractor needs to operate on regions of the same size, so we need to reduce the selected regions to common sizes. The reduction of ROIs to a common size has been shown to preserve mass malignancy information [
As we commented above, we used Independent Component Analysis (ICA) [
Decomposition of the image using an ICA basis.
The added value of our approach, compared to other methods that use some generic functions, is that our basis should be more specific for our problem, since it is obtained using a selection of the images to be classified.
In that sense, if we suppose that we have
In our system, the classification algorithm has the task of learning from data. An excessively complex model will usually lead to poorly generalizable results. It is advisable to use at least two independent sets of patterns in the learning process: one for training and another for testing. In the present work, we use three independent sets of patterns: one for training, one to avoid overtraining (validation set), and another for testing [
We implemented an MLP with a single hidden layer, trained with a variant of the back-propagation algorithm termed Resilient Back-Propagation (Rprop) [
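As an illustration of the idea behind Rprop, the following minimal sketch adapts a separate step size for each weight using only the sign of the gradient. It uses the simplified variant without weight backtracking and the commonly cited factors of 1.2 and 0.5; the exact variant and parameter values used by the authors are not given here, so the function `rprop_minimize` and all its defaults are assumptions.

```python
def rprop_minimize(grad, weights, steps=200, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   step_min=1e-6, step_max=50.0):
    """Minimize a function given its gradient, using sign-based updates.

    `grad` maps a list of weights to a list of partial derivatives.
    All parameter values are illustrative defaults, not the paper's.
    """
    weights = list(weights)
    step = [step_init] * len(weights)
    prev_grad = [0.0] * len(weights)

    for _ in range(steps):
        g = list(grad(weights))
        for i in range(len(weights)):
            sign_product = prev_grad[i] * g[i]
            if sign_product > 0:
                # Same direction as before: accelerate.
                step[i] = min(step[i] * eta_plus, step_max)
            elif sign_product < 0:
                # Sign change: the minimum was overshot, so slow down
                # and skip the update for this weight in this iteration.
                step[i] = max(step[i] * eta_minus, step_min)
                g[i] = 0.0
            # Move against the gradient by the step size only;
            # the gradient magnitude is ignored.
            if g[i] > 0:
                weights[i] -= step[i]
            elif g[i] < 0:
                weights[i] += step[i]
            prev_grad[i] = g[i]
    return weights


def quadratic_grad(w):
    # Gradient of the toy function (w0 - 3)^2 + (w1 + 1)^2.
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]
```

Calling `rprop_minimize(quadratic_grad, [0.0, 0.0])` drives the weights toward the minimum of the toy function at `(3, -1)`. In an MLP the same rule would be applied to every connection weight, with `grad` computed by back-propagation.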
As with MLP, the goal of using an SVM is to find a model (based on the training prototypes) which is able to predict the class membership of the test subset’s prototypes based on the value of their characteristics. Given a labeled training set of the form
In this algorithm, the training vectors
In the model,
In this section, we provide an overview of the structure of our system, describing the main steps required to configure it to discriminate ROIs corresponding to masses from ROIs corresponding to normal breast tissue. In addition, we present the experiments devised to determine how the performance of the classifiers is affected by the breast density associated with each mammogram (and, therefore, with each ROI).
The main scheme that summarizes in a more graphical form all phases of this work is represented in Figure
Overview of the system proposed.
To determine the optimal configuration of the system, various ICA bases were generated to extract different numbers of features (from 10 to 65 in steps of 5) from the original patches, and operating on patches of the different sizes noted above (
The training process consisted of two stages: first training the NN classifiers, and then the SVM classifiers. The results thus obtained on the test subsets in a 10-fold cross validation scheme are shown in Figure
Choosing the best configuration for the feature extractor. The top row shows the results when using an NN classifier, and the bottom row shows the results for an SVM classifier. In both cases, prototypes of
The study was done with a total of 5052 prototypes: 1197 of malignant masses, 1133 of benign masses, and 2722 of normal tissue.
We found that the optimal ICA-based feature extractor configuration for an NN classifier was a feature extractor that operated on prototypes of
To determine how the density associated with each mammogram (and, therefore, with each ROI) could affect the performance of our system, we carried out five experiments. In each experiment we ran the same tests, but with different sets of prototypes: first with all the available prototypes (one experiment), and then with prototypes obtained from mammograms with a given density value (four experiments).
For each of the experiments, a 30-fold cross validation scheme was used. In this process, 30 partitions of the data set are generated randomly, and, iteratively, one partition is reserved for test, and the remaining 29 are used for training and validation (80% of the prototypes for training and 20% for validation). As a result we have 30 performance values that can be studied statistically.
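The partitioning scheme just described can be sketched as follows. This is a minimal illustration that assumes a plain random shuffle without stratification by class; the helper name `k_fold_splits`, the fixed seed, and the rounding of the 80/20 cut are assumptions, not details taken from the paper.

```python
import random


def k_fold_splits(n_items, k=30, train_fraction=0.8, seed=0):
    """Yield (train, validation, test) index lists for k-fold cross validation.

    Each of the k partitions is used once as the test set; the remaining
    k - 1 partitions are pooled and split into training and validation.
    """
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)

    # Deal the shuffled indices into k partitions of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        test = folds[i]
        rest = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        rng.shuffle(rest)
        cut = int(round(train_fraction * len(rest)))
        yield rest[:cut], rest[cut:], test


# Usage with the 5052 prototypes of the overall experiment.
splits = list(k_fold_splits(5052, k=30))
```

Each prototype appears in exactly one test partition, so the 30 test results are obtained on disjoint subsets of the data.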
Finally, to analyze the performance and compare results, ROC curves [
Regarding the prototypes, Table
Average number of prototypes of malignant (M) and benign (B) masses and normal tissue (N) divided into training, validation, and test sets, distributed by density value.
Average distribution of prototypes by density value in the 30-fold cross validation study:

Density | Training M | Training B | Training N | Validation M | Validation B | Validation N | Test M | Test B | Test N | Total |
---|---|---|---|---|---|---|---|---|---|---|
1 | 158.2 | 132.9 | 280.0 | 50.8 | 43.0 | 89.3 | 7.0 | 6.0 | 12.7 | 780.0 |
2 | 402.9 | 349.4 | 720.0 | 129.0 | 111.9 | 232.0 | 18.0 | 15.7 | 32.4 | 2012.0 |
3 | 236.3 | 238.9 | 559.6 | 76.2 | 76.3 | 178.5 | 10.5 | 10.8 | 25.0 | 1412.0 |
4 | 78.9 | 108.3 | 432.9 | 25.5 | 34.8 | 139.7 | 3.6 | 4.8 | 19.4 | 848.0 |
Overall | 876.3 | 829.5 | 1992.5 | 281.5 | 266.0 | 639.5 | 39.1 | 37.3 | 89.5 | 5052.0 |
In Figure
Average number of prototypes of malignant and benign masses and normal tissue divided into training (L), validation (V), and test (T) sets, distributed by density value.
As stated above, our main interest in this paper is to evaluate how the performance of our system depends on the composition of breast tissue, as determined by the BI-RADS density parameter. For this study, we have considered all those prototypes of masses in the DDSM for which a square shape could be obtained by determining the smallest square region that includes the complete area marked by the radiologist, always without resizing. As noted before, the distribution of prototypes is shown in Table
To determine the influence of the density parameter on the performance of our system, we first applied a 30-fold cross validation scheme to train and test the system with the whole set of 5,052 prototypes. Next, a ROC analysis was performed on each of the 30 test results, calculating the area under the curve (AUC) as a parameter that describes the performance on each test set. Finally, the mean value of the 30 AUCs was determined, to give a parameter that describes the overall performance of the system with those prototypes.
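The evaluation step can be illustrated with a short sketch: the AUC of one test set is computed from the classifier outputs, and the AUCs of all folds are then summarized by their mean and a 95% confidence interval. The AUC is computed here through its rank interpretation (the probability that a mass prototype scores higher than a normal tissue prototype). The normal approximation used for the interval (`z = 1.96`) and the function names are assumptions, since the text does not state how the interval was obtained.

```python
import math
import statistics


def auc(scores_pos, scores_neg):
    """Area under the ROC curve from classifier scores.

    Equals the fraction of (positive, negative) pairs in which the
    positive prototype receives the higher score; ties count as half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def mean_with_ci(values, z=1.96):
    """Mean of the per-fold AUCs with an approximate 95% interval.

    Uses mean +/- z * s / sqrt(n), which is an assumed formula.
    """
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, (mean - half_width, mean + half_width)
```

In the setting of this study, `auc` would be called once per test partition and `mean_with_ci` once on the resulting 30 values.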
This scheme was repeated later considering sets of prototypes containing only a given value of the density parameter, in order to compare the results. Those results are presented in Table
This table shows the average results obtained on the different test subsets (considering all the prototypes, or only those with a given density), expressed as the area under the ROC curve (AUC) together with its 95% confidence interval (CI).
Mass vs. normal tissue, depending on density (30-fold cross validation, test subsets):

Description | SVM AUC | SVM CI (95%) | NN AUC | NN CI (95%) |
---|---|---|---|---|
Overall | | | | |
Cases with density 1 | 0.964 | | 0.965 | |
Cases with density 2 | 0.959 | | 0.961 | |
Cases with density 3 | 0.927 | | 0.916 | |
Cases with density 4 | 0.897 | | 0.892 | |
Results obtained over the test subsets, considering all the prototypes.
Results obtained over the test subsets, considering NN classifiers and the cases of density 1 and 4.
Results obtained over the test subsets, considering SVM classifiers and the cases of density 1 and 4.
As expected, the best results were obtained for a density value equal to 1 (almost entirely fatty breasts with very little glandular tissue, usually corresponding to older women), and the worst results for a density of 4 (very dense breasts with much glandular tissue, usually corresponding to younger women). These results are consistent with other studies on the nature of the cancer cases that radiologists discard in a larger proportion [
It is also important to note that the distribution of prototypes differs considerably between density values. While for a density of 1 the numbers of mass and normal tissue prototypes are almost the same (a 3% difference in favor of mass prototypes), for a density of 4 the difference is large (57% in favor of normal tissue prototypes). This difference in the number of prototypes of each class introduces a statistical bias that could affect the training of the classifiers.
In this work, we have studied the influence of the BI-RADS density parameter assigned to a mammogram on the performance of our system. We conclude that the performance is affected by that parameter, since the AUC of the ROC curves decreases from 0.965 to 0.892 (−7.56%) for NN classifiers and from 0.964 to 0.897 (−6.95%) for SVM classifiers when moving from density 1 to density 4. However, mammograms with density 4 are more difficult to analyze than those with density 1 (density 4 means very dense breasts with much glandular tissue, in which masses are difficult to find, while density 1 means that very little glandular tissue is present), and training is further hampered by the different number of prototypes of the two classes. Taking this into account, we conclude that our system is rather robust and performs very well even in the worst conditions.
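The percentages quoted in this paragraph are relative changes with respect to the density 1 result. A minimal check of the arithmetic, using only the AUC values given in the text (the helper name is an illustrative choice):

```python
def relative_change_percent(reference, value):
    """Change of `value` with respect to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# AUC at density 1 (reference) and density 4, as quoted in the text.
nn_change = round(relative_change_percent(0.965, 0.892), 2)   # -7.56
svm_change = round(relative_change_percent(0.964, 0.897), 2)  # -6.95
```

Both values are close to 7%, which is the maximum variation reported for the two classifiers.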
It is also important that the AUC for the global set of prototypes is only 2.28% (NN classifiers) and 2.07% (SVM classifiers) lower than the AUC achieved for density 1, which is the most favourable case, so the performance of the system with the overall set is acceptable.
Finally, as the number of samples in the subsets of prototypes with densities equal to 2 and 3 is significantly higher than in the subsets with densities equal to 1 and 4, we conclude that the variation in the performance of our system due to the BI-RADS density is limited to about 7% for both classifiers.
It is also worth noting that the two types of classifiers tested achieved practically the same performance.
The authors declare that they have no conflicts of interest.
A. G.-Manso developed the preprocessing system (selection and acquisition of ROIs, and obtaining the ICA bases), integrated the global system, conducted the experiments, obtained and analysed the results, and drafted the paper. C. J. G.-Orellana developed the neural network classifier training algorithm, helped to adapt and adjust the hardware for the simulations (two Beowulf clusters with 45 and 48 nodes, respectively), and adapted the software to run on the clusters. R. G.-Caballero developed the database associated with the experiments. H. M. G.-Velasco was responsible for the assembly and tuning of the clusters. M. M.-Macias developed the training algorithm for the support vector machine classifiers.
This work was supported in part by the “Junta de Extremadura” and FEDER through Projects PRI08A092, PDT09A036, GR10018, and PDT09A005.