Clusters of microcalcifications can be an early sign of breast cancer. In this paper, we propose a novel approach based on convolutional neural networks for the detection and segmentation of microcalcification clusters. We used 283 mammograms to train and validate our model, obtaining an accuracy of 99.99% on microcalcification detection and a false positive rate of 0.005%. Our results show how deep learning could be an effective tool to support radiologists during mammogram examination.
Breast cancer is one of the most common malignant neoplasms in the female population. The reference examination used for breast cancer screening is mammography.
Mammography is a radiological procedure that uses a beam of X-ray photons to map the breast tissue attenuation. With the use of high-resolution detectors, it is possible to detect microstructures with a high atomic number in the breast. Among them, breast microcalcification (MC) can be an indicator for the diagnosis of breast cancer, as it is the expression of cell necrosis.
In mammograms, microcalcifications appear as regions with high intensity compared to the local background, and they can vary in size and have shapes ranging from circular geometries to strongly irregular ones with sharp or soft contours.
The Breast Imaging Reporting and Data System (BIRADS) standardized the interpretation of MCs by defining a scale ranging from 2 (benign finding) to 5 (highly suspicious of malignancy) based on their shape, density, and distribution within the breast.
An important type of benign calcification that can be seen incidentally on mammography is breast arterial calcification (BAC), which seems to correlate with coronary calcification. Breast vascular calcifications are differentiated from malignant and ductal calcifications by size, morphology, and distribution and appear as linear “tram tracks” [
Since there are studies [
Because of the variability of connective, glandular, and adipose tissue within the breast, microcalcifications are often difficult to find even for experienced operators. The heterogeneity of the breast tissue and the projective geometry of image acquisition make it impossible to use a simple density threshold to automatically detect MCs. In addition, it is difficult to search for them by means of morphological filtering operations because of the large variability of their geometry.
Due to the intrinsic limitations of classical methods, with this work, we propose a system based on deep learning [
In the literature, a wide range of algorithms has been proposed for the automatic detection of clusters of mammary calcifications, highlighting the importance of this task. The first attempts were mainly based on the spatial characteristics of these lesions; an example of that is the morphological system proposed by Zhao et al. [
Subsequently, Wang and Karayiannis [
Another proposal for the MCs detection pipeline is the multiresolutional analysis carried out by Netsch and Peitgen [
Later on, several papers proposed machine learning approaches to solve the task. Particularly, Edwards et al. [
Unfortunately, even if in some cases these methods were able to achieve a good sensitivity (i.e., in [
Due to intrinsic limitations of classical methods, recent years have seen an increasing interest in nonlinear approaches based on convolutional neural networks. Such tools allow the avoidance of hand-crafted features definition, providing at the same time both automatic feature extraction and evaluation for the task at hand.
In particular, Mordang et al. [
Going further, in this paper, we propose a fast MC detection and segmentation procedure based on two CNNs: one to quickly detect the candidate regions of interest (ROIs) and one to subsequently segment them. Our system then identifies the clusters of MCs in the image.
Subsequent to the detection of calcification clusters, further development of this work might deal with the identification of potential cancers or identification of BAC for the CVD stratification. In this work, however, we focus only on the segmentation of MCs and the cluster detection, without making any clinical assessment of patient risk.
Mammograms are high-resolution images that can correspond to large matrices (e.g., 4095 × 5625 pixels), which can be time-consuming to analyse. For this reason, we developed a model consisting of two CNNs: we called the first one the detector and the second one the segmentator. The detector's role is to find candidate ROIs to be analysed, while the segmentator classifies every pixel inside a given ROI.
The detection of suspect ROIs must be computationally inexpensive, because its role is to accelerate the processing of the whole mammographic image. For the same reason, we preprocessed the input images using Otsu thresholding to detect background pixels and exclude them from further evaluation.
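As an illustration of the background-exclusion step, the following is a minimal NumPy sketch of Otsu thresholding. The function names, the number of histogram bins, and the convention that tissue is brighter than background are assumptions made for this example, not details taken from our implementation.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the grey level that maximises the between-class variance."""
    hist, edges = np.histogram(np.asarray(image).ravel(), bins=bins)
    hist = hist.astype(np.float64)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)               # pixel count in bins 0..k
    w1 = w0[-1] - w0                   # pixel count in bins k+1..end
    s0 = np.cumsum(hist * centers)     # intensity sum in bins 0..k
    s1 = s0[-1] - s0                   # intensity sum in bins k+1..end
    valid = (w0 > 0) & (w1 > 0)        # both classes must be non-empty
    between = np.zeros(bins)
    mu0 = s0[valid] / w0[valid]
    mu1 = s1[valid] / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (mu0 - mu1) ** 2
    k = int(np.argmax(between))
    return edges[k + 1]                # upper edge of the last "background" bin

def tissue_mask(image):
    """Boolean mask that is True on pixels kept for further evaluation."""
    return np.asarray(image) >= otsu_threshold(image)
```

Only the pixels where `tissue_mask` is true would then be passed on to the detector, which is what keeps this step cheap compared with evaluating every patch.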
We implemented both neural networks in Python, using the open-source software library Tensorflow.
We chose a patch-based approach to process the input mammograms, assuming that local information is sufficient to classify such small and circumscribed regions. Moreover, in contrast to fully convolutional approaches, a patch-based approach allowed us to considerably enlarge the training set and easily perform data augmentation.
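A patch-based scheme of this kind can be sketched as follows. The helper name `extract_patch` and the reflect padding used near the image border are assumptions for illustration; the only constraint that follows from the text is that the patch side is odd, so that a central pixel exists.

```python
import numpy as np

def extract_patch(image, row, col, side):
    """Return the side x side patch centred on pixel (row, col)."""
    if side % 2 == 0:
        raise ValueError("side must be odd so that the patch has a central pixel")
    half = side // 2
    # Assumption: mirror the image at the borders so that every pixel,
    # including those close to the edge, can be the centre of a full patch.
    padded = np.pad(image, half, mode="reflect")
    # After padding, original pixel (row, col) sits at (row + half, col + half).
    return padded[row:row + side, col:col + side]
```

For example, `extract_patch(image, 50, 60, 29)` returns a 29 × 29 array whose element `[14, 14]` is `image[50, 60]`.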
With this purpose in mind, we extracted squared patches with
In the proposed model, once completed the training process of the networks and during test time, the input mammogram is patched at run-time with
Example of breast segmentation process on a zoomed region in the mammogram (best viewed in color).
The resulting mask is then analysed via a labelling algorithm in order to localize the position of every MC inside the matrix. Regions with more than 5 distinct MCs per cm² are considered clusters, as the radiological definition suggests [
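The labelling and cluster criterion can be sketched in plain Python as below. This is an illustrative reading rather than the exact procedure: the 8-connectivity, the use of MC centroids, and the 1 cm × 1 cm window centred on each MC are assumptions, while the pixel size of 0.05 mm and the threshold of more than 5 MCs per cm² come from the text.

```python
from collections import deque

def label_centroids(mask):
    """Return the centroid (row, col) of every 8-connected blob in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            cy = sum(p[0] for p in pixels) / len(pixels)
            cx = sum(p[1] for p in pixels) / len(pixels)
            centroids.append((cy, cx))
    return centroids

def in_cluster(centroids, pixel_mm=0.05, max_isolated=5):
    """Flag each MC whose 1 cm x 1 cm neighbourhood holds more than 5 MCs."""
    half = 5.0 / pixel_mm  # half side of a 1 cm window, in pixels
    flags = []
    for cy, cx in centroids:
        count = sum(1 for y, x in centroids
                    if abs(y - cy) <= half and abs(x - cx) <= half)
        flags.append(count > max_isolated)
    return flags
```

At 0.05 mm per pixel, 1 cm corresponds to 200 pixels, so the window extends 100 pixels on each side of the MC centroid.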
For our experiments, we used 283 mammography images with a resolution of 0.05 mm. Among these images, there are both natively digital mammograms and digitized images.
Every image is associated with the corresponding manual segmentation mask produced by a breast imaging radiologist. Since each segmentation mask is a binary matrix that marks every pixel as either part of an MC or not, we could use the masks to validate the obtained results.
In order to train our neural networks, the binary masks were also utilized to create the labels to be fed together with training samples to the CNNs, as ground truth.
We randomly chose 231 mammograms, with their annotated labels, to build the training set, while 25 mammographic images were used to validate intermediate results and compare different network architectures. The remaining 27 images were set aside to build the test set and measure the final performance.
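A split of this kind, performed at the level of whole mammograms so that no image contributes patches to more than one set, can be sketched as follows. The function name and the fixed seed are assumptions for the example.

```python
import random

def split_dataset(image_ids, n_train=231, n_val=25, seed=0):
    """Shuffle mammogram identifiers and split them into train/validation/test."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)   # seeded only to make the sketch reproducible
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # whatever remains (27 of 283 images)
    return train, val, test
```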
We experimented different values of patch dimension
We paid particular attention to collecting samples for the training, validation, and test sets, so that the networks could see as many input typologies as possible. For this reason, we considered 4 possible classes of patch (see the figure below):
- Class C1: patches whose central pixel belongs to a microcalcification
- Class C2: patches with MCs close to the center but with the central pixel not belonging to a calcification
- Class C3: cases where a calcification resides inside the patch but is located peripherally, and the central pixel does not belong to an MC
- Class C4: cases where no MC is present inside the patch
Example of patches and their subdivision in 4 different classes.
Summary of the patch classes.
MC position | Class C1 | Class C2 | Class C3 | Class C4 |
---|---|---|---|---|
Center | ✓ | | | |
Close to the center | | ✓ | | |
Periphery | | | ✓ | |
Outside | | | | ✓ |
Since MCs are small circumscribed regions inside mammograms, class C4 contains the largest number of patches in the database and class C1 is the least numerous class.
Moreover, we considered patches of class C2 to be those containing calcifications within a range of 2 to 3 pixels from the center. This is a tricky class because, as a consequence of the partial volume effect, the MC border is frequently weakly defined and the classification of these pixels is often uncertain.
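The four classes can be assigned from the ground-truth mask of a patch, as in the sketch below. The function name is hypothetical, and interpreting "2 to 3 pixels from the center" as a square neighbourhood of radius 3 around the central pixel is an assumption made for this example.

```python
import numpy as np

def patch_class(mask_patch, near=3):
    """Assign class C1..C4 to a square ground-truth mask patch of odd side."""
    mask_patch = np.asarray(mask_patch, dtype=bool)
    c = mask_patch.shape[0] // 2
    if mask_patch[c, c]:
        return "C1"                      # central pixel belongs to an MC
    lo = max(c - near, 0)
    hi = c + near + 1
    if mask_patch[lo:hi, lo:hi].any():
        return "C2"                      # MC close to, but not on, the centre
    if mask_patch.any():
        return "C3"                      # MC only in the periphery
    return "C4"                          # no MC inside the patch
```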
We organized the training set in a SQLite database to obtain customizable access to its samples during training. In particular, during the training process, we sampled patches belonging to each of these classes, taking care to feed the network with the same number of positive and negative samples, so that every input minibatch was well balanced. We built each minibatch on the fly with random patches sampled from the database. This approach always leads to different minibatches: in the limit, exactly the same set of samples never occurs in two different minibatches. We believe this could improve the regularizing effect of batch normalization, because it adds more randomness to the mean and variance inside minibatches. Moreover, we found this strategy useful to train networks with such a strongly unbalanced dataset.
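The on-the-fly balanced sampling can be sketched with Python's standard `sqlite3` module. The table schema, the function names, and the equal share given to each negative class are assumptions; the example also uses the segmentator convention, in which only class C1 is positive.

```python
import random
import sqlite3

def build_demo_db(counts):
    """Create an in-memory table of patch rows; the schema is hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE patches (id INTEGER PRIMARY KEY, cls TEXT)")
    rows = [(cls,) for cls, n in counts.items() for _ in range(n)]
    db.executemany("INSERT INTO patches (cls) VALUES (?)", rows)
    return db

def sample_minibatch(db, batch_size=256, rng=random):
    """Draw half positives (C1) and half negatives (C2, C3, C4 in equal shares)."""
    half = batch_size // 2
    quotas = {"C1": half}
    for i, cls in enumerate(["C2", "C3", "C4"]):
        # Spread the negative half as evenly as possible over the three classes.
        quotas[cls] = half // 3 + (1 if i < half % 3 else 0)
    batch = []
    for cls, quota in quotas.items():
        ids = [row[0] for row in
               db.execute("SELECT id FROM patches WHERE cls = ?", (cls,))]
        batch += [(pid, cls) for pid in rng.sample(ids, quota)]
    rng.shuffle(batch)
    return batch
```

Because every call draws a fresh random subset, two minibatches are very unlikely to coincide, even though class C4 dominates the database.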
In addition, we used data augmentation at training time to increase the dimension of the dataset with artificial samples, obtained randomly by rotating and flipping the images.
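A rotation-and-flip augmentation can be sketched as follows. Restricting the rotations to multiples of 90 degrees is an assumption made to keep the example exact on the pixel grid; the function names are likewise hypothetical.

```python
import numpy as np

def dihedral_variants(patch):
    """Return the 8 rotated/flipped versions of a square patch."""
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k)          # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # mirrored counterpart
    return variants

def random_augment(patch, rng):
    """Apply one random rotation and an optional horizontal flip."""
    out = np.rot90(patch, int(rng.integers(4)))
    if rng.random() < 0.5:
        out = np.fliplr(out)
    return out
```

For a patch with an odd side, the central pixel stays in the centre under all of these transformations, so the label of the patch does not change. A generator such as `np.random.default_rng(0)` can be passed as `rng`.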
Patches for the validation set and test set were extracted from the 52 mammograms excluded from the training. Even in those cases, each set contained a balanced number of samples, considering the presence of each class inside.
Both detector and segmentator CNN share the same architecture (Figure
Neural network architecture implemented for segmentator and detector.
In particular, we tested the difference between the usage of a
In the architecture, the first and the second convolutional layers are each followed by a max-pooling layer with a 2 × 2 kernel to reduce the computational burden and induce the network to extract more abstract representations of the data. Two fully connected layers of 64 and 2 units close the architecture. We additionally applied dropout with 50% probability to the 64-unit fully connected layer to limit network overfitting.
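To make the effect of the padding choice and of the pooling layers concrete, the sketch below traces the spatial size of a patch through a stack of layers. The layer list used in the usage note (three 3 × 3 convolutions with stride 1) is a placeholder and not the actual configuration; only the 2 × 2 pooling after the first two convolutions is taken from the text.

```python
def conv_output_size(size, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    if padding == "same":
        return size                  # zero padding preserves the input size
    if padding == "valid":
        return size - kernel + 1     # no padding: the border is lost
    raise ValueError("padding must be 'same' or 'valid'")

def pool_output_size(size, window=2):
    """Spatial size after non-overlapping max pooling without padding."""
    return size // window

def trace_sizes(patch_side, layers, padding):
    """Return the feature-map side after each layer in `layers`."""
    sizes = [patch_side]
    size = patch_side
    for kind, k in layers:
        if kind == "conv":
            size = conv_output_size(size, k, padding)
        else:
            size = pool_output_size(size, k)
        sizes.append(size)
    return sizes
```

With the hypothetical stack `[("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2), ("conv", 3)]` and a 49-pixel patch, valid convolution gives sides 49, 47, 23, 21, 10, 8, while same convolution gives 49, 49, 24, 24, 12, 12.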
With the exception of the last fully connected layer, each of the above layers is followed by a batch normalization layer, which we found gave a consistent speedup in the learning process, as suggested by Ioffe and Szegedy [
Neural networks often employ the softmax function to map the nonnormalized output to a probability distribution over predicted output classes. In line with the literature, we computed the posterior probability to belong to the
As common in modern CNNs, we trained the model employing Stochastic Gradient Descent with Adam optimizer [
We trained each network with minibatches of 256 samples. Moreover, we chose a learning rate of
We applied an early stopping strategy, keeping the last model configuration obtained before any evidence of overfitting on the training set.
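One common way to implement such a strategy is to monitor the validation loss and keep the checkpoint that precedes a sustained increase. The sketch below follows that convention; the patience-based criterion and the function name are assumptions, since the stopping rule is only described qualitatively here.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the index of the epoch whose checkpoint should be kept."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break   # no improvement for `patience` epochs: stop training
    return best_epoch
```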
Patch dimensions with side of 29, 39, and 49 pixels were tested. The following results relate to the usage of patch dimension
We obtained a final accuracy of 98.22% for detector CNN and an accuracy of 97.47% for segmentator CNN when considered independently and tested on a balanced number of patches extracted from the test set images.
We performed a more extensive analysis on the segmentation masks extracted from the entire mammograms. Since most classical methods suffer from a high false positive rate (FPR) in the MC detection process, we calculated the FPR obtained with our system. In particular, we analysed the binary masks produced by the whole system (the segmentator working on the preliminary ROIs identified by the detector CNN on the entire mammograms), and we obtained an FPR of 0.005%. The corresponding accuracy was 99.99%.
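For reference, pixel-wise accuracy and FPR can be computed from a predicted mask and its ground truth as in the sketch below. Treating every pixel as one sample is an assumption about how the figures were aggregated, and the function name is hypothetical.

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Return (accuracy, false positive rate) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr
```

Because MC pixels are a very small fraction of a mammogram, the accuracy computed this way is dominated by true negatives, which is why the FPR is reported alongside it.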
Table
The obtained test error rate for each class and the overall test accuracy for the detector CNN and the segmentator CNN, using both valid and same convolution.
| | Class C1 | Class C2 | Class C3 | Class C4 | Overall test accuracy |
|---|---|---|---|---|---|
| Detector | | | | | |
| | 0.19 | 1.22 | | 1.69 | 96.04 |
| | 0.09 | 0.47 | | 2.09 | |
| Segmentator | | | | | |
| | 1.85 | | 0.34 | 0.55 | 96.37 |
| | 1.57 | | 0.24 | 0.20 | |
To deepen our understanding of the network behavior, we also conducted an analysis of misclassified patches in the features domain using a nonlinear dimensionality reduction technique, namely, t-SNE [
Latent feature spaces of the first fully connected layer projected on 2D plane for the detector and segmentator neural networks. (a, c) Projection of samples from the positive and negative classes. (b, d) Misclassified sample position in the bidimensional projected spaces (best viewed in color).
Example of detector failure case. The patch on the left represents the misclassified input sample containing a microcalcification, while the patch on the right is the (well classified) closest class C4 sample in the features space. Below, you can see the ground truth segmentations. Please note that the maximum possible error is equal to 1 and an error <0.5 means that the input patch is still correctly classified (best viewed in color).
Example of segmentator failure case. The patch on the left represents the misclassified input sample containing a microcalcification, while the patch on the right is the (well classified) closest class C2 sample in the features space. Below, you can see the ground truth segmentations. Please note that the maximum possible error is equal to 1 and an error <0.5 means that the input patch is still correctly classified (best viewed in color).
We tested the entire model on a GTX 970 GPU, and every mammogram was processed in roughly 20 seconds.
An example of input region segmented by the model is represented in Figure
From the results illustrated in Table
We believe these differences to be independent from stochastic oscillations of the cost function during the network training. Instead, this improvement can most probably be explained by the observation that using
We consequently prefer the usage of a
The gap in overall performance between the segmentator and the detector can have several contributions and interpretations. There is obviously a stochastic component due to the optimization procedure; moreover, the task performed by the detector may be easier than the one carried out by the segmentator (e.g., the detector may look at the whole input patch rather than only at its central pixels, relying on other signal carriers).
Table
As a confirmation of all these hypotheses, we conducted an analysis of the misclassified patches both for detector and segmentator neural networks. In particular, we analysed extracted data representations from the penultimate fully connected layer using t-SNE.
We examined C2, C3, and C4 top misclassified patches from the segmentator and extracted the nearest neighbors in the features domain belonging to class C1 and vice versa. On the other hand, we examined C1, C2, and C3 top misclassified samples from the detector and corresponding nearest neighbor samples in the features domain which belonged to class C4 and vice versa.
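The nearest-neighbour lookup in the features domain can be sketched as follows, with each sample represented by its activation vector from the 64-unit fully connected layer. The Euclidean distance and the function name are assumptions made for this example.

```python
import numpy as np

def nearest_in_class(query, features, labels, target_class):
    """Return (index, distance) of the closest sample of `target_class`."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    candidates = np.flatnonzero(labels == target_class)
    if candidates.size == 0:
        raise ValueError("no sample of the requested class")
    diffs = features[candidates] - np.asarray(query, dtype=float)
    dists = np.linalg.norm(diffs, axis=1)   # Euclidean distance (assumption)
    best = int(np.argmin(dists))
    return int(candidates[best]), float(dists[best])
```

For a misclassified detector sample of class C1, C2, or C3, `target_class` would be `"C4"`; for a misclassified segmentator sample of class C2, C3, or C4, it would be `"C1"`.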
An example of meaningful misclassified patches and their closest
By contrast, we highlight how Table
In addition, aside from the tricky classes, a qualitative analysis of the segmentation masks revealed a tendency of the model to make mistakes at the transition region from the breast tissue to the background pixels. This is probably because these regions usually correspond to areas with strong contrast variation. An example of the phenomenon is illustrated in Figure
Example of false positives at the transition region from the breast tissue to the background pixels (best viewed in color).
Finally, this analysis highlighted that the major source of false positives seems to reside in the digitized images, where a greater amount of widespread noise leads the network to commit a greater number of errors.
We propose a model to detect and segment breast microcalcifications within mammographic images. This model is composed of two consecutive blocks based on convolutional neural networks: the detector and the segmentator. Thanks to the preliminary analysis carried out by the first CNN, the computational burden is considerably reduced and the total segmentation process does not become time consuming.
Moreover, the quality of the achieved results suggests the potential of this tool to effectively support radiologists during mammogram examination, helping with the nontrivial evaluation of uncertain regions and reducing the diagnosis time. This could be especially useful in the screening setting, where the large number of examinations could reduce the attention of the reader, either to support diagnosis or to narrow the differential diagnosis.
The data used to support the findings of this study are restricted by the ethical board committee in order to protect patient privacy.
This study was performed as part of the employment of the authors at Imaging Department, Fondazione Gabriele Monasterio, Massa, Italy.
The authors declare that they have no conflicts of interest.