Breast Cancer is a serious threat and one of the largest causes of death of women throughout the world. The identification of cancer largely depends on analysis of digital biomedical images, such as histopathological images, by doctors and physicians. Analyzing histopathological images is a nontrivial task, and decisions from investigation of these kinds of images always require specialised knowledge. However, Computer Aided Diagnosis (CAD) techniques can help the doctor make more reliable decisions. The state-of-the-art Deep Neural Network (DNN) has been recently introduced for biomedical image analysis. Normally each image contains structural and statistical information. This paper classifies a set of biomedical breast cancer images (BreakHis dataset) using novel DNN techniques guided by structural and statistical information derived from the images. Specifically, a Convolutional Neural Network (CNN), a Long-Short-Term-Memory (LSTM), and a combination of CNN and LSTM are proposed for breast cancer image classification. Softmax and Support Vector Machine (SVM) layers have been used for the decision-making stage after extracting features utilising the proposed novel DNN models. In this experiment the best Accuracy value of 91.00% is achieved on the 200× dataset.
The unwanted growth of cells causes cancer, which is a serious threat to humans. Statistics show that millions of people all over the world suffer from various cancers. As an example, Table
Cancer statistics for Australia 2017 [
| | Female | Male | Total |
|---|---|---|---|
| Estimated number of new diagnoses (all cancers) | 62005 | 72169 | 134174 |
| Estimated number of deaths (all cancers) | 20677 | 27076 | 47753 |
| Estimated number of new diagnoses (breast cancer) | 17586 | 144 | 17730 |
| Deaths due to breast cancer | 3087 | 57 | 3114 |
Proper Breast Cancer (BC) diagnosis can save thousands of women’s lives, and it largely depends on correct identification of the cancer. Finding BC largely depends on imaging of the cancer-affected area, which gives information about the current state of the cancer. A few biomedical imaging techniques have been utilised; some are noninvasive, such as Ultrasound imaging, X-ray imaging, and Computed Axial Tomography (CAT) imaging, while others, such as histopathological imaging, are invasive. Investigation of these kinds of images is always very challenging, especially in the case of histopathological imaging due to its complex nature. Histopathological image analysis is nontrivial, and the investigation of this kind of image often produces contradictory decisions among doctors. Since doctors and physicians are human, it is natural that errors will occur.
A Computer Aided Diagnosis (CAD) system provides doctors and physicians with valuable information, for example, classification of the disease. Different research groups investigate opportunities to improve the CAD systems’ performance. Some advanced engineering techniques have been utilised to take a general image classifier and adjust it as a biomedical image classifier, such as a breast image classifier. The state-of-the-art Deep Neural Network (DNN) techniques have been adapted for a BC image classifier to provide reliable solutions to patients and their doctors.
The working principle of a DNN is rooted in the basic neural network (NN). Rosenblatt in 1957 [
As with mammogram images, histopathological breast images have been classified by different research groups. Referring to the most recent, Zheng et al. classify a set of histopathological images into benign and malignant classes by locating the nucleus from the images using the blob detection method [
Images normally preserve local as well as hidden patterns that represent similar information. Histopathological images represent different observations of a biopsy, and biopsy images belonging to the same group normally preserve similar kinds of knowledge. Unsupervised learning can detect this kind of hidden pattern. The main contribution of this paper is to classify a set of biomedical breast cancer images using proposed novel DNN models guided by an unsupervised clustering method. Three novel DNN architectures are proposed based on a Convolutional Neural Network (CNN), a Long-Short-Term-Memory (LSTM), and a combination of the CNN and LSTM models. After the DNN model extracts the local and global features from the images, the final classification decision is made by the classifier layer. As the classifier layer, this paper has utilised both the Softmax layer and a Support Vector Machine (SVM). Figure
Overall image classifier model for benign and malignant image classification.
The remainder of this paper is organized as follows. Section
Images naturally contain significant amounts of statistical and geometrical information, and representing this kind of structural information is a prior step for many data analysis procedures such as image classification. One technique for finding the structural information is clustering the data in an unsupervised manner. A clustering method partitions data of a similar nature in such a way that the separation between the grouped data is maximised. A few clustering methods are available; to find the hidden structure of the data, in this paper, we use the K-Means (KM) and the Mean-Shift (MS) algorithms. The MS algorithm by nature is nonparametric and does not make any assumption about the number of clusters. The MS algorithm can be described as shown in Algorithm
In the KM algorithm, given a chosen number of clusters K, each data point is assigned to its nearest centroid, and each centroid point is then recomputed based on the mean of the points assigned to it; the two steps are repeated until the assignments no longer change.

In the MS algorithm, define a neighbour determining function (a kernel with a Bandwidth, BW); each point is repeatedly shifted to the mean of the data points inside its neighbourhood until it converges to a density mode, and points converging to the same mode form one cluster.
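As a concrete illustration of the MS procedure, the following pure-NumPy sketch applies a flat-kernel mean shift to 1-D grey-level pixel values and then replaces each pixel by its converged mode, analogous to the cluster-transformed images shown later. The bandwidth, kernel choice, and toy data are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def mean_shift_1d(values, bandwidth=0.2, iters=50, merge_tol=1e-2):
    """Shift every point to the mean of its bandwidth-neighbourhood
    until convergence, then merge nearby modes into clusters."""
    x = values.astype(float).copy()
    for _ in range(iters):
        # neighbour determining function: |x_i - x_j| <= bandwidth
        within = np.abs(x[:, None] - values[None, :]) <= bandwidth
        x = (within * values[None, :]).sum(axis=1) / within.sum(axis=1)
    # merge converged points that landed on (numerically) the same mode
    modes, labels = [], np.empty(len(x), dtype=int)
    for i, xi in enumerate(x):
        for k, m in enumerate(modes):
            if abs(xi - m) < merge_tol:
                labels[i] = k
                break
        else:
            labels[i] = len(modes)
            modes.append(xi)
    return np.array(modes), labels

# Toy "image": grey-level pixels drawn from two intensity modes.
rng = np.random.default_rng(0)
pixels = np.concatenate([rng.normal(0.25, 0.02, 50),
                         rng.normal(0.75, 0.02, 50)])
modes, labels = mean_shift_1d(pixels, bandwidth=0.2)
# The cluster-transformed image replaces each pixel by its mode value.
transformed = modes[labels]
```

Note that, unlike KM, the number of clusters (two here) emerges from the bandwidth rather than being fixed in advance.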
Figure
(a), (b), and (c) represent an original benign image, the KM cluster-transformed image, and the MS cluster-transformed image, respectively. (d), (e), and (f) represent an original malignant image, the KM cluster-transformed image, and the MS cluster-transformed image, respectively.
A Deep Neural Network is a state-of-the-art technique for data analysis and classification. A few different DNN models are available, among them the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), which have made some revolutionary improvements in the data analysis field. The following subsections present the working principle of the CNN and the RNN (specifically the Long-Short-Term-Memory algorithm), as well as the working mechanism of the combined CNN and LSTM method.
A CNN model is an advanced engineering version of a conventional neural network where the convolution operation has been introduced, which allows the network to extract local as well as global features from the data, enhancing the decision-making procedure of the network. To perfectly control the workflow of a CNN network, along with a convolutional layer, a few intermediate layers have been introduced. These are explained in more detail below.
Each of the neurons produces a linear output. When the output of a neuron is fed to another neuron, the result is still a linear function of the input. To overcome this issue, nonlinear activation functions have been introduced, such as the Sigmoid, σ(x) = 1/(1 + e^(−x)); the TanH, tanh(x) = (e^x − e^(−x))/(e^x + e^(−x)); the ReLU, f(x) = max(0, x); and the Leaky-ReLU, f(x) = x for x > 0 and αx otherwise, for a small constant α.
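The four activations named above can be written in a few lines of NumPy; the Leaky-ReLU slope alpha is an illustrative choice.

```python
import numpy as np

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes any real input into (-1, 1)
    return np.tanh(x)

def relu(x):
    # zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # like ReLU but with a small slope alpha for negative inputs
    return np.where(x > 0, x, alpha * x)
```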
Figure
Sigmoid, TanH, ReLU, and Leaky-ReLU.
Figure
The main ingredient of the convolutional layer is the kernel, which scans across the input data and extracts local feature information. The number of positions the kernel moves at each step is known as the stride. The border rows and columns might not be convolved completely if the kernel size and stride do not fit the input dimensions. To conduct the convolution operation properly at the border, a few extra rows and columns (with all zeros) are added, which is known as zero padding.
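The stride and zero-padding bookkeeping can be sketched as follows: the output length along one axis is floor((n + 2p − k)/s) + 1 for input size n, kernel size k, stride s, and padding p. The sizes used here are illustrative, not the paper's configuration.

```python
import numpy as np

def conv_output_size(n, k, s, p):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def conv2d(image, kernel, stride=1, pad=0):
    """Naive 2-D convolution after adding `pad` rows/columns of zeros."""
    img = np.pad(image, pad)                 # zero padding on every border
    kh, kw = kernel.shape
    oh = conv_output_size(image.shape[0], kh, stride, pad)
    ow = conv_output_size(image.shape[1], kw, stride, pad)
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3))
same = conv2d(image, kernel, stride=1, pad=1)   # padding keeps the 4x4 size
```

With pad = 1 and stride = 1, a 3 × 3 kernel leaves the spatial size unchanged, which is exactly why zero padding is used at the borders.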
The convolutional model produces a significant amount of feature information. As the model grows, the amount of feature information also increases, which increases the computational complexity and makes the model more sensitive. To overcome this problem, a sampling (pooling) process has been introduced, such as Max-Pooling, Average-Pooling, Mixed max-average pooling, and Gated max-average pooling. Figure
Pooling operation performed by
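A minimal sketch of the Max- and Average-Pooling operations, here with a 2 × 2 kernel and stride 2 (an illustrative configuration):

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    # slide a size x size window with the given stride and keep
    # either the maximum or the mean of each window
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

x = np.array([[1.,  2.,  5.,  6.],
              [3.,  4.,  7.,  8.],
              [9., 10., 13., 14.],
              [11., 12., 15., 16.]])
max_pooled = pool2d(x, mode="max")   # [[4, 8], [12, 16]]
avg_pooled = pool2d(x, mode="avg")   # [[2.5, 6.5], [10.5, 14.5]]
```

Either variant quarters the number of feature values, which is the downsampling effect described above.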
A DNN contains a large number of neurons, which gives the network enough capacity to fit a large number of predictions to the training data. This can produce very good performance on the training dataset but worse performance on the test dataset, a problem known as overfitting. To overcome this kind of problem the drop-out procedure has been introduced. It is described in more detail below.
Drop-out.
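Drop-out can be sketched as follows; this is the common "inverted" variant, in which each neuron's output is zeroed with probability p during training and the survivors are rescaled so that no change is needed at test time. The drop rate p is an illustrative choice, not the paper's setting.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    if not training:
        return activations                      # identity at test time
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p   # keep with probability 1 - p
    return activations * mask / (1.0 - p)       # rescale the survivors

a = np.ones(10000)
out = dropout(a, p=0.5, rng=np.random.default_rng(0))
# roughly half the units are zeroed; the mean activation stays near 1.0
```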
At the end of the network, all the neurons are arranged in a flattened way. The neurons of the flat layer are fully connected to the next layer and behave like a conventional neural network. Normally more than one fully connected layer is introduced. Consider the last layer as the “end” layer; then, immediately before the “end” layer, there must be at least one flat or fully connected layer. The end layer function can then be represented as y = f(Wx + b), where x is the output of the preceding flat layer, W and b are the weights and bias of the end layer, and f is its activation function.
Workflow of a Convolutional Neural Network.
In the decision layer, the Softmax-Regression technique as well as the Support Vector technique is utilised. In the Softmax layer the cross-entropy loss is calculated as L = −Σᵢ yᵢ log(pᵢ), where pᵢ = e^(zᵢ) / Σⱼ e^(zⱼ) is the Softmax probability of class i, z is the input to the Softmax layer, and y is the one-hot vector of the true class.
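The Softmax probabilities and cross-entropy loss can be sketched as follows, with the usual max-subtraction for numerical stability; the logit values are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    """Cross-entropy loss; y is the one-hot true label vector."""
    p = softmax(z)
    return -np.sum(y * np.log(p))

z = np.array([2.0, 1.0, 0.1])      # illustrative decision-layer outputs
y = np.array([1.0, 0.0, 0.0])      # the true class is the first one
p = softmax(z)
loss = cross_entropy(z, y)         # equals -log(p[0]) for a one-hot label
```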
In a CNN, learning proceeds from scratch and an error signal is fed back through the network. In a Recurrent Neural Network, instead of learning from scratch, the network learns from a reference point: the output of a particular layer is fed back to the input, where it works as the reference input. A generalised RNN model is presented in Figure
A generalised RNN model, where the RNN output is computed and the reference information passes through the hidden unit.
Here,
A normal RNN suffers from the vanishing-gradient problem. To overcome this problem, the Long-Short-Term-Memory (LSTM) architecture has been introduced by Hochreiter and Schmidhuber [
A generalised cell structure of an LSTM.
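A single step of a standard LSTM cell (the Hochreiter–Schmidhuber formulation with a forget gate) can be sketched as follows; the sizes and random weights are placeholders, not the trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step. W, U, b each stack the four gate parameter blocks
    (input i, forget f, cell candidate g, output o) row-wise."""
    z = W @ x + U @ h_prev + b           # (4H,) pre-activations
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])                  # input gate
    f = sigmoid(z[H:2*H])                # forget gate
    g = np.tanh(z[2*H:3*H])              # candidate cell state
    o = sigmoid(z[3*H:4*H])              # output gate
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 4                              # illustrative input and hidden sizes
x = rng.normal(size=D)
h0, c0 = np.zeros(H), np.zeros(H)
W = rng.normal(scale=0.1, size=(4*H, D))
U = rng.normal(scale=0.1, size=(4*H, H))
b = np.zeros(4*H)
h1, c1 = lstm_step(x, h0, c0, W, U, b)
```

The additive cell-state update c = f·c_prev + i·g is what lets gradients flow over long sequences, mitigating the vanishing-gradient problem mentioned above.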
A CNN has the benefit of extracting global information. On the other hand, an LSTM has the ability to take advantage of long-term dependencies of the data sequences. To utilise both these advantages, the CNN and LSTM models have been hybridised together for the classification [
The output of the CNN model is not in the time-series format the LSTM needs to extract the dependencies of the data. To obtain such a format, we have converted the convolutional output (which is 2-dimensional) into 1D data. Figure
CNN and LSTM models combined.
We have utilised three different models for our data analysis (Figure
Conventional CNN, LSTM based architecture (a, b), and CNN-LSTM based architecture (c).
In this method, the input image is convolved by a
After the C-2 layer the pooling operation P-1 is performed with the kernel size 2 × 2
In the second model we utilised the LSTM method, which is a branch of the RNN model. Our input image is in two-dimensional format. To make it a suitable format for the LSTM model we have converted the data to 1D data format, and the newly created data vector is 3072
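Assuming the 3072-length vector comes from a 32 × 32 × 3 pixel array (an inference from the vector length, not stated explicitly here), preparing an image for the LSTM amounts to viewing the flattened vector as TS time steps of ID values each, with TS × ID = 3072 for every pairing examined later.

```python
import numpy as np

def to_sequence(image, ts, idim):
    """View a flattened image as (time steps, input dimension) for an LSTM."""
    flat = image.reshape(-1)                   # 1-D vector of 3072 values
    assert flat.size == ts * idim, "TS * ID must equal the vector length"
    return flat.reshape(ts, idim)

image = np.zeros((32, 32, 3))                  # assumed input size: 32*32*3 = 3072
pairings = [(24, 128), (32, 96), (48, 64), (64, 48), (96, 32), (128, 24)]
for ts, idim in pairings:
    seq = to_sequence(image, ts, idim)         # each pairing covers all 3072 values
```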
In this model we have utilised both the CNN model and the LSTM model together. At first the input image is convolved by the convolutional layer C-1 with a
We have utilised the BreakHis breast image dataset for our experiment [
Statistical breakdown of the BreakHis dataset.
As Figure
The following subsections analyze the performance of the algorithms based on parameters such as True Positive (TP/Sensitivity), False Positive (FP), True Negative (TN/Specificity), False Negative (FN), Accuracy, Precision, Recall, and the Matthews Correlation Coefficient (MCC). For the sake of comparison we have also performed all the experiments on the original images; this case is denoted (OI). When we utilised the KM algorithm we have fixed the cluster size
This subsection describes the True Positive (TP/Sensitivity), False Positive (FP), True Negative (TN/Specificity), and False Negative (FN) performance from this experiment, and the data related to this experiment are presented in Table
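The metrics used throughout this section follow their standard definitions and can be computed directly from the TP, FP, TN, and FN counts; the example counts below are illustrative, not taken from the tables.

```python
import math

def metrics(tp, fp, tn, fn):
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                  # sensitivity / true positive rate
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, precision, recall, specificity, mcc

# Illustrative counts for a 200-image test split.
acc, precision, recall, specificity, mcc = metrics(tp=95, fp=19, tn=81, fn=5)
```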
Comparison of TN, FP, FN, and TP values% for the different algorithms and different datasets.
Dataset(x) | Cluster | Decision | Model 1 | Model 2 | Model 3 | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Algorithm | TN | FP | FN | TP | TN | FP | FN | TP | TN | FP | FN | TP | ||
40x | MS | SVM | 68.39 | 31.60 | 8.00 | 92.00 | 53.55 | 46.44 | 5.20 | 94.76 | 59.10 | 40.00 | 5.00 | 95.00 |
Softmax | 93.00 | 7.00 | 19.00 | 81.00 | 75.28 | 24.71 | 12.23 | 87.76 | 68.39 | 31.60 | 9.40 | 90.50 | ||
KM | SVM | 67.00 | 32.00 | 6.00 | 93.10 | 53.00 | 46.99 | 7.00 | 92.00 | 70.68 | 29.30 | 10.00 | 90.00 | |
Softmax | 84.82 | 15.51 | 7.00 | 92.94 | 72.98 | 27.01 | 10.58 | 89.51 | 70.68 | 29.31 | 6.90 | 93.10 | ||
OI | SVM | 62.00 | 37.00 | 5.00 | 94.00 | 66.00 | 34.00 | 10.00 | 90.00 | 72.00 | 28.00 | 11.00 | 89.00 | |
Softmax | 77.00 | 23.00 | 6.00 | 93.00 | 74.00 | 26.00 | 10.00 | 90.00 | 78.00 | 21.00 | 5.00 | 94.00 | ||
| ||||||||||||||
100x | MS | SVM | 68.00 | 32.00 | 11.00 | 88.00 | 66.00 | 33.00 | 9.00 | 90.11 | 53.00 | 46.00 | 7.00 | 92.00 |
Softmax | 80.00 | 20.00 | 6.00 | 94.00 | 72.57 | 27.42 | 17.00 | 82.00 | 54.85 | 45.14 | 10.00 | 90.00 | ||
KM | SVM | 80.00 | 20.00 | 6.00 | 94.00 | 56.00 | 44.00 | 6.10 | 93.80 | 66.00 | 34.00 | 12.00 | 88.00 |
Softmax | 75.00 | 25.00 | 4.00 | 95.96 | 64.71 | 35.42 | 12.00 | 87.20 | 61.14 | 38.87 | 10.10 | 89.00 | ||
OI | SVM | 70.00 | 30.00 | 8.00 | 92.00 | 56.00 | 44.00 | 4.00 | 96.00 | 71.00 | 28.00 | 6.00 | 93.00 | |
Softmax | 64.00 | 36.00 | 8.00 | 92.00 | 71.00 | 29.00 | 12.00 | 88.00 | 73.00 | 26.00 | 8.00 | 91.00 | ||
| ||||||||||||||
200x | MS | SVM | 70.70 | 29.00 | 4.10 | 95.82 | 61.61 | 38.38 | 4.00 | 95.30 | 72.00 | 27.00 | 6.80 | 93.00 |
Softmax | 81.00 | 19.00 | 5.00 | 95.00 | 75.75 | 24.24 | 9.80 | 90.17 | 65.00 | 35.00 | 2.00 | 97.00 | ||
KM | SVM | 69.69 | 30.30 | 3.60 | 96.00 | 64.14 | 35.85 | 5.00 | 94.00 | 75.00 | 24.00 | 8.00 | 91.00 | |
Softmax | 85.85 | 14.16 | 8.00 | 91.00 | 78.00 | 21.00 | 12.00 | 87.00 | 71.71 | 22.20 | 6.00 | 93.00 | ||
OI | SVM | 73.00 | 27.00 | 4.00 | 96.00 | 63.63 | 36.36 | 6.00 | 93.00 | 76.00 | 23.00 | 6.00 | 94.00 |
Softmax | 78.00 | 22.00 | 5.00 | 94.10 | 76.00 | 24.00 | 12.00 | 88.00 | 70.00 | 30.00 | 6.00 | 93.00 | ||
| ||||||||||||||
400x | MS | SVM | 68.30 | 31.69 | 4.00 | 95.31 | 53.55 | 46.44 | 5.20 | 94.76 | 61.20 | 38.79 | 5.70 | 94.21 |
Softmax | 84.00 | 15.00 | 6.00 | 93.00 | 65.01 | 34.97 | 9.30 | 90.06 | 61.20 | 38.79 | 6.00 | 93.38 | ||
KM | SVM | 68.00 | 31.00 | 4.00 | 95.00 | 53.00 | 46.99 | 7.00 | 92.00 | 59.01 | 40.98 | 6.61 | 93.38 | |
Softmax | 78.00 | 22.00 | 4.00 | 96.00 | 63.93 | 36.06 | 11.57 | 88.42 | 65.00 | 35.00 | 5.00 | 95.00 | ||
OI | SVM | 75.00 | 25.00 | 6.00 | 94.00 | 61.00 | 39.00 | 12.00 | 88.00 | 79.95 | 24.06 | 9.00 | 90.00 | |
Softmax | 76.00 | 24.00 | 10.00 | 90.94 | 70.00 | 30.00 | 10.00 | 90.00 | 82.51 | 17.48 | 12.67 | 87.32 |
For the 40
For the 100
When we use the 200
For the 400
Figure
Comparison of the Accuracy in Model 1, Model 2, and Model 3.
For the 100
When we use the 200
For the 400
Figure
Comparison of Precision between Model 1, Model 2, and Model 3.
For the 100
For the 200
For the 400
Figure
Comparison of
For the 100
For the 200
For the 400
The best Accuracy performance is achieved when we utilised Model 1 along with MS clustering and the Softmax layer on the 40
Accuracy Loss and MCC values for Model 1 when we have utilised 40
After epoch 300 the Train Accuracy remains constant at about 90.00%. Interestingly, after around epoch 180 the Train Accuracy overtakes the Test Accuracy, and from that point the gap between the Train and Test Accuracy widens while the Test Accuracy remains constant.
Model 2 provides the best Accuracy with the 200
Accuracy, loss, and MCC values for Model 2 when we utilise the 200
Model 3 is the most accurate with the 200
Accuracy, loss, and MCC values for Model 3 with the 200
The Time Step (TS) and Input Dimension (ID) values have an effect on LSTM performance. In this subsection we analyze the effect of the TS and ID values with reference to Accuracy, average time, and the number of required parameters for Model 2.
Table
Average time and parameters for various TS and ID.
TS | ID | Average time (s) | Parameters |
---|---|---|---|
24 | 128 | 191 | 58280 |
32 | 96 | 240 | 52904 |
48 | 64 | 346 | 47528 |
64 | 48 | 438 | 44840 |
96 | 32 | 636 | 42152 |
128 | 24 | 822 | 40808 |
For the 40
(a), (b), (c), and (d) represent the Accuracy for the 40
For the local partitioning we have utilised the KM and MS algorithms. The cluster size of the KM method and the Bandwidth (neighbourhood size, BW) of the MS method largely control the performance of the clustering. In this subsection we investigate how these two parameters affect the overall performance, as presented in Table
Effect of the cluster size
Method | Parameter | TN | FP | FN | TP | Precision | | Accuracy |
---|---|---|---|---|---|---|---|---|
KM | | 85.85 | 14.16 | 8.00 | 91.00 | 91.00 | 92.00 | 90.00 |
| 77.00 | 23.00 | 06.00 | 94.00 | 90.00 | 92.00 | 88.90 | |
| 77.00 | 23.00 | 05.00 | 95.00 | 90.00 | 93.00 | 89.75 | |
| ||||||||
MS | BW = 0.2 | 81.00 | 19.00 | 5.00 | 95.00 | 93.00 | 93.00 | 91.00 |
BW = 0.4 | 70.00 | 30.00 | 04.00 | 96.00 | 87.10 | 91.00 | 87.00 | |
BW = 0.6 | 76.00 | 24.00 | 06.00 | 94.00 | 89.00 | 91.00 | 87.00 |
For the MS method the obtained Precision values are 93.00%, 87.10%, and 89.00% for BW equal to 0.2, 0.4, and 0.6, respectively. The best Accuracy performance (91.00%) is achieved when we utilise BW = 0.2. For both BW = 0.4 and BW = 0.6 the obtained Accuracy was 87.00%, which is lower than that for BW = 0.2.
DNN methods have been implemented for breast image classification with some success. Table
CNN and histopathological findings.
Authors | Dataset | Method | Augmentation | Number of classes | Accuracy | Sensitivity | Recall | ROC |
---|---|---|---|---|---|---|---|---|
Araujo et al. [ | [ | CNN | YES | 2 | 80.60 | 70.00 | — | — |
Araujo et al. [ | [ | CNN + SVM | YES | 2 | 83.20 | 80.00 | — | — |
Bejnordi et al. | BREAST | CNN | YES | — | 92.00 | — | — | 92.00 |
Bejnordi et al. [ | [ | CNN | YES | — | 92.45 | — | — | — |
However, we cannot directly compare our performance with these existing findings because of the different datasets. We have therefore compared our findings with those based on the BreakHis dataset, which are presented in Table
Comparing Accuracy (%) in different models.
Method | 40x | 100x | 200x | 400x |
---|---|---|---|---|
CNN [ | 90.40 | 87.40 | 85.00 | 83.80 |
VLAD [ | 91.80 | 92.10 | 91.40 | 90.20 |
PFTAS [ | 83.80 | 82.10 | 85.10 | 82.30 |
ORB [ | 74.40 | 69.40 | 69.60 | 67.60 |
LPQ [ | 73.80 | 72.80 | 74.30 | 73.70 |
LBP [ | 75.60 | 73.20 | 72.90 | 73.10 |
GLCM [ | 74.70 | 78.60 | 83.40 | 81.70 |
CLBP [ | 77.40 | 76.40 | 70.20 | 81.80 |
The judgement about benign and malignant status from digital histopathological images is subjective and might vary from specialist to specialist. CAD systems largely help in making an automated decision from the biomedical images and allow both the patient and doctors to have a second opinion. A conventional image classifier utilises hand-crafted local features from the images for the image classification. However, the recent state-of-the-art DNN model mostly employs global information using the benefit of kernel-based working techniques, which act to extract global features from the images for the classification. Using this DNN model, this paper has classified a set of breast cancer images (BreakHis dataset) into benign and malignant classes.
Images normally preserve some statistical and structural information. In this paper, to extract the hidden structural and statistical information, an unsupervised clustering operation has been performed, and the DNN models have been guided by this clustered information to classify the images into benign and malignant classes. At the classifier stage both Softmax and SVM layers have been utilised and the detailed performance has been analyzed. Experiments found that the proposed CNN-based model provides better performance than the LSTM model and the combined CNN-LSTM model. We have also found that, in most cases, the Softmax layer performs better than the SVM layer.
Most of the recent findings on the BreakHis dataset provide information about the Accuracy performance but do not provide information about the sensitivity, specificity, Recall,
A definite conclusion about a biomedical condition must be drawn with great care, as it is directly related to the patient’s life. In a practical scenario, the classification outcome for BC images should be 100.00% accurate. Due to the complex nature of the data we have obtained 91% Accuracy, which is comparable with the most recent findings. There are a few avenues for obtaining more reliable solutions, such as the following. Each histopathological image contains cell nuclei, which provide valuable information about the malignancy, so a DNN model guided by the cell nuclei orientation and position could improve the performance, since it provides more objective information to the network. As our dataset is comparatively small for a DNN model, Data Augmentation and Transfer Learning with some fine local tuning can be considered in the future. Finally, locally hand-crafted features also provide valuable information, so feeding the local features in parallel with the raw pixels could improve the model’s performance with reference to Accuracy.
The authors declare that there are no conflicts of interest regarding the publication of this paper.