Involvement of Machine Learning for Breast Cancer Image Classification: A Survey

Breast cancer is one of the leading causes of death among women worldwide. Advances in natural image classification techniques and Artificial Intelligence methods have been widely applied to the breast image classification task. Digital image classification gives doctors and physicians a second opinion and saves them time. Despite the many publications on breast image classification, very few review papers provide a detailed description of breast cancer image classification techniques, feature extraction and selection procedures, classification performance measures, and image classification findings. We have put a special emphasis on the Convolutional Neural Network (CNN) method for breast image classification. Along with the CNN method we have also described the involvement of the conventional Neural Network (NN), Logic Based classifiers such as the Random Forest (RF) algorithm, Support Vector Machines (SVM), Bayesian methods, and a few of the semisupervised and unsupervised methods which have been used for breast image classification.


Introduction
The cells of the body maintain a cycle of regeneration. Balanced growth and death rates of the cells normally maintain the natural working mechanism of the body, but this is not always the case. Sometimes an abnormal situation occurs, where a few cells start growing aberrantly. This abnormal growth of cells creates cancer, which can start in any part of the body and spread to any other part. Different types of cancer can form in the human body; among them, breast cancer is a serious health concern. Due to the anatomy of the human body, women are more vulnerable to breast cancer than men. Age, family history, breast density, obesity, and alcohol intake are among the risk factors for breast cancer.
Statistics reveal that in the recent past the situation has become worse. As a case study, Figure 1 shows the breast cancer situation in Australia for the last 12 years. This figure also shows the number of new male and female breast cancer cases. In 2007, the number of new breast cancer cases was 12775, while the expected number of new cases in 2018 is 18235. Statistics show that, in the last decade, the number of new patients increased every year at an alarming rate. Figure 2 shows the number of males and females who died of breast cancer. It is predicted that in 2018 around 3156 people will die of breast cancer; 3128 of them will be women, which is about 99.11% of the overall deaths due to breast cancer.
Women's breasts are composed of lobules, ducts, nipples, and fatty tissues. Milk is created in the lobules and carried towards the nipple by the ducts. Normally epithelial tumors grow inside the lobules as well as the ducts and later form cancer inside the breast [1]. Once the cancer has started it can also spread to other parts of the body. Figure 3 shows the internal structure of the breast from a breast image.
Breast cancer tumors can be categorized into two broad scenarios.
(ii) Malignant (Cancerous). Malignant cancer starts from abnormal cell growth and might rapidly spread or invade nearby tissue. Normally the nuclei of malignant tissue are much bigger than those of normal tissue, which can be life-threatening in future stages. Cancer is always a life-threatening disease. Proper treatment of cancer saves people's lives. Identification of normal, benign, and malignant tissues is a very important step for further treatment of cancer. For the identification of benign and malignant conditions, imaging of the targeted area of the body helps the doctor and the physician in further diagnosis. With advanced modern imaging techniques, the image of the targeted part of the body can be captured more reliably. Based on the penetration of the skin and damage to the tissue, medical imaging techniques can be classified into two groups.
(i) Noninvasive. (a) Ultrasound: this imaging technique uses principles similar to SOund Navigation And Ranging (SONAR); it operates in the very-high-frequency domain and records the echoes of that frequency. It was invented by Karl Theodore Dussik [2]. An ultrasound imaging machine contains a Central Processing Unit (CPU), a transducer, a display unit, and a few other peripheral devices. This device is capable of capturing both 2D and 3D images. Ultrasound techniques do not have any side-effects, with some exceptions like production of heat bubbles around the targeted tissue. (b) X-ray: X-ray imaging utilizes electromagnetic radiation and was invented by Wilhelm Conrad Roentgen in 1895. The mammogram is a special kind of (low-dose) X-ray imaging technique which is used to capture a detailed image of the breast [3]. X-rays sometimes increase the hydrogen peroxide level of the blood, which may cause cell damage. Sometimes X-rays may change the bases of DNA. (c) Computer Aided Tomography (CAT): CAT, or in short CT imaging, is an advancement of X-ray imaging techniques, where the X-ray images are taken at different angles. The CT imaging technique was invented in 1970 and has been mostly used for three-dimensional imaging. (d) Magnetic Resonance Imaging (MRI): MRI is a noninvasive imaging technique which produces a 3D image of the body. It was invented by Professor Sir Peter Mansfield, and it utilizes both a magnetic field and radio waves to capture the images [4]. MRI techniques take longer to capture images, which may create discomfort for the patient. Extra caution is needed for patients who have metal implants.
(ii) Invasive. (a) Histopathological images (biopsy imaging): histopathology is the microscopic investigation of a tissue. For histopathological investigation, a patient needs to go through a number of surgical steps. The photographs taken from the histopathological tissue provide histopathological images (see Figure 4).

Breast Image Classification
Researchers have used various algorithms and investigation methods to examine breast images from different perspectives, depending on the demands of the disease, the status of the disease, and the quality of the images. Among these tasks, breast image classification relies heavily on machine learning (ML) and Artificial Intelligence (AI). A general breast image classifier consists of four stages (see

Available Breast Image Databases.
Doctors and physicians rely heavily on ultrasound, MRI, X-ray, and similar images to determine the present status of breast cancer. However, to ease the doctors' work, some research groups are investigating how to use computers more reliably for breast cancer diagnostics. To make a reliable decision about the cancer [...]
Computational and Mathematical Methods in Medicine
Figure 6 shows the number of published breast image classification papers based on the MIAS and DDSM databases from the years 2000 to 2017.
Histopathological images provide valuable information and are being intensively investigated by doctors to determine the current situation of the patient. The TCGA-BRCA and BreakHis databases contain histopathological images, and a few experiments have been performed on these databases too. Of the two, BreakHis is the most recent histopathological image database, containing a total of 7909 images which have been collected from 82 patients [6]. So far around twenty research papers have been published based on this database.

Feature Extraction and Selection.
An important step of image classification is extracting the features from the images. In the conventional image classification task, features are crafted locally using specific rules and criteria. However, state-of-the-art Convolutional Neural Network (CNN) techniques generally extract the features globally using kernels, and these Global Features have been used for image classification. Among the local features, texture, detector, and statistical features are accepted as important for breast image classification. Texture features represent the low-level feature information of an image, which provides more detailed information about an image than might be possible from histogram information alone. More specifically, texture features provide the structural and dimensional information of the color as well as the intensity of the image. Breast Imaging-Reporting and Data System (BI-RADS) is a mammography image assessment technique, containing 6 categories normally assigned by the radiologist. A feature detector provides information on whether a particular feature is present in the image or not. Structural features provide information about the structure and orientation of the features, such as the area, Convex Hull, and centroid. This kind of information gives more detailed information about the features. In a cancer image, it can provide the area of the nucleus or the centroid of the mass. Mean, Median, and Standard Deviation provide important information about the dataset and its distribution. Features of this kind are categorized as statistical features. The total hierarchy of image feature extraction is presented in Figure 7. Tables 2 and 3 further summarize the local features in detail.

Tamura features [8]: (1) Coarseness, (2) Contrast, (3) Directionality, (4) Line-likeness, (5) Roughness, (6) Regularity.

Detector
Single scale detector: (1) Moravec's Detector (MD) [9], (2) Harris Detector (HD) [10], (3) Smallest Univalue Segment Assimilating Nucleus (SUSAN) [11], (4) Features from Accelerated Segment Test (FAST) [12,13], (5) Hessian Blob Detector (HBD) [14,15].
Multiscale detector [8]: (1) Laplacian of Gaussian (LoG) [9,16], (2) Difference of Gaussian (DoG) Contrast [17], (3) Harris Laplace (HL), (4) Hessian Laplace (HeL), (5) Gabor-Wavelet Detector (GWD) [18].

Structural
Features which are extracted for classification do not always carry the same importance. Some features may even contribute to degrading the classifier performance. Prioritization of the feature set can reduce the classifier model complexity and so it can reduce the computational time. Feature set selection and prioritization can be classified into three broad categories: (i) Filter: the filter method selects features without evaluating any classifier algorithm. (ii) Wrapper: the wrapper method selects the feature set based on the evaluation performance of a particular classifier. (iii) Embedded: the embedded method takes advantage of the filter and wrapper methods for classifier construction.
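As a rough illustration of the filter category described above, the following sketch ranks features by the absolute Pearson correlation between each feature column and the class label, without training any classifier. The scoring criterion, the function names (`pearson`, `filter_select`), and the toy data are assumptions made for this example; the survey does not prescribe a particular filter score.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no ranking information
    return cov / math.sqrt(var_x * var_y)


def filter_select(samples, labels, k):
    """Return the indices of the k features with the highest |correlation|."""
    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        scores.append((abs(pearson(column, labels)), j))
    # Sort by descending score; break ties by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy data: 4 samples, 3 features, binary labels.
samples = [
    [1.0, 5.0, 0.3],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.1],
    [4.0, 5.0, 0.8],
]
labels = [0, 0, 1, 1]
selected = filter_select(samples, labels, k=2)
print(selected)
```

A wrapper method would instead score each candidate subset by training and evaluating a specific classifier, which is usually more expensive than this classifier-free ranking.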

Classifier Model.
From the learning point of view, breast image classification techniques can be categorized into the following three classes [41]: (i) Supervised, (ii) Semisupervised, and (iii) Unsupervised. These three classes can be split into Deep Neural Network (DNN) and conventional classifiers (without DNN), and into some further classes as in Table 4.

Performance Measuring Parameter.
A Confusion Matrix is a two-dimensional table which is used to give a visual perception of classification experiments [54]. The $(i, j)$th position of the confusion table indicates the number of times that the $i$th object is classified as the $j$th object. The diagonal of this matrix indicates the number of times the objects are correctly classified. Figure 9 shows a graphical representation of a Confusion Matrix for the binary classification case.
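(vi) Matthews Correlation Coefficient (MCC): MCC is a performance parameter of a binary classifier, in the range $[-1, +1]$. As the MCC value tends towards +1 the classifier is more accurate, and the opposite holds as the MCC value tends towards −1. MCC can be defined as
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$
where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.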
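The binary confusion-matrix counts and the MCC can be illustrated with a short sketch. It assumes the standard definition of the MCC in terms of true/false positives and negatives; the function names and the toy labels are hypothetical and chosen only for this example.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
score = mcc(tp, tn, fp, fn)
print(tp, tn, fp, fn, accuracy, score)
```

Returning 0.0 for a zero denominator is a common convention rather than part of the definition; it covers the case where one row or column of the confusion matrix is empty.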

Performance of Different Classifier Model on Breast Images Dataset
Based on Supervised, Semisupervised, and Unsupervised methods, different research groups have performed classification operations on different image databases. In this section we summarize a few of the works on breast image classification.

Performance Based on Supervised Learning.
In supervised learning, a general hypothesis is established based on externally supplied instances to produce future predictions. For the supervised classification task, features are extracted or automatically crafted from the available dataset and each sample is mapped to a dedicated class. With the help of the features and their labels a hypothesis is created. Based on the hypothesis unknown data are classified [55]. Figure 10 represents an overall supervised classifier architecture. In general, the whole dataset is split into training and testing parts. Sometimes the data are also split into a validation part. After the data splitting the most important part is to find the appropriate features to classify the data with the utmost Accuracy. Feature finding can be classified into two categories, locally and globally crafted. Locally crafted means that the method requires manual effort to find the features, whereas globally crafted means that a kernel method has been introduced for the feature extraction. Handcrafted features can be prioritized, whereas Global Feature selection does not have this luxury.
Dendrites collect signals and axons carry the signal to the next dendrite after processing by the cell body, as shown in Figure 11. Using the neuron working principle, the perceptron model was proposed by Rosenblatt in 1957 [56]. A single-layer perceptron linearly combines the input signal and gives a decision based on a threshold function. Based on this working principle and with some advanced mechanisms and engineering, NN methods have established a strong footprint in many problem-solving tasks. Figure 12 shows the basic working principle of NN techniques. In the NN model the input data $X = \{x_0, x_1, \ldots, x_n\}$ is first multiplied by the weight data $W = \{w_0, w_1, \ldots, w_n\}$ and then the output is calculated using
$$y = g\left(\sum_{i=0}^{n} w_i x_i\right).$$
Function $g$ is known as the activation function.
This function can be a threshold function, a Sigmoid, a hyperbolic tangent, and so forth. In the early stages, feed-forward Neural Network techniques were introduced [57]; later the backpropagation method was invented to utilize the error information to improve the system performance [58,59].
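The weighted-sum-and-activation principle described above can be sketched for a single neuron as follows. This is a minimal illustration, not the survey's implementation: the input and weight values are hypothetical, and treating the first input as a constant 1 (so that its weight acts as a bias) is an assumption of the example.

```python
import math


def step(z):
    """Threshold activation: fires when the weighted sum is non-negative."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    """Logistic (Sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x, w, g):
    """Linearly combine inputs and weights, then apply the activation g."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return g(z)


# Hypothetical values; x[0] = 1.0 plays the role of a bias input.
x = [1.0, 0.5, -1.0]
w = [0.2, 0.4, 0.6]

y_step = neuron_output(x, w, step)
y_sigmoid = neuron_output(x, w, sigmoid)
y_tanh = neuron_output(x, w, math.tanh)
print(y_step, y_sigmoid, y_tanh)
```

Swapping `g` changes only the final nonlinearity; the linear combination of inputs and weights is identical in all three calls.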
The history of breast image classification by NN is a long one. To the best of our knowledge a lot of the pioneer work was performed by Dawson et al. in 1991 [60]. Since then, NN has been utilized as one of the strong tools for breast image classification. We have summarized some of the work related to NN and breast image classification in Tables 5, 6, and 7.

Figure 12: Working principle of a simple Neural Network technique.

Lessa and Marengoni [43] | (1) Mean, Median, Standard Deviation, Skewness, Kurtosis, Entropy, Range | Thermographic | 94
Chen et al. [40] | (1) 19 BI-RADS features have been used | Ultrasound | 238 | (1) Chi squared method has been utilized for the feature selection.
de Lima et al. [45] | (1) Total 416 features have been used | Mammogram | 355 | (1) Multiresolution wavelet and Zernike moment have been utilized for the feature extraction.
Abirami et al. [46] | (1) 12 statistical measures such as Mean, Median, and Max have been utilized as the features | Mammogram | 322 | (1) Wavelet transform has been utilized for the feature extraction.
El Atlas et al. [47] | (1) 13 morphological features have been utilized | Mammogram | 410 | (1) Firstly the edge information has been utilized for the mass segmentation and then the morphological features were extracted. (2) The best Accuracy achieved is 97.5%.
[...] | (1) Feature reduction has been performed by Rough-Set theory, selecting 5 prioritized features. (2) The Accuracy achieved by NN techniques is 99.60% when the RMS slope is utilized.
Chen et al. [53] | (1) Autocorrelation features | Ultrasound | 1020 | (1) The obtained ROC area is 0.9840 ± 0.0072.

[...] the model. A kernel of size $k \times k$ is scanned through the input data for the convolutional operation, which ensures the local connectivity and weight sharing property.
(ii) Stride and Padding. In the convolutional operation, a filter scans through the input matrices. The number of positions the kernel filter moves through the matrix in each step is known as the stride. By default the stride is 1. With an inappropriate selection of the stride the model can lose the border information. To overcome this issue the model adds extra rows and columns around the borders of the matrices, and these added rows and columns contain all 0s. Adding extra rows and columns which contain only zero values is known as zero padding.
(iii) Nonlinear Operation. The output of each of the kernel operations is passed through an activation function such as the Rectified Linear Unit (ReLU), Leaky-ReLU, TanH, or Sigmoid. The Sigmoid function can be defined as
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
and the tanh function can be defined as
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.$$
However the most effective rectifier is ReLU. The ReLU method converts all the information into zero if it is less than or equal to zero and passes all the other data unchanged, as shown in Figure 13:
$$f(x) = \max(0, x).$$
Another important nonlinear function is Leaky-ReLU,
$$f(x) = \begin{cases} x, & x > 0, \\ \alpha x, & x \le 0, \end{cases}$$
where $\alpha$ is a predetermined parameter which can be varied to give a better model.
(iv) Subsampling. Subsampling is the procedure of reducing the dimensionality of each of the feature maps of a particular layer; this operation is also known as a pooling operation. It reduces the amount of feature information in the overall data. By doing so, it reduces the overall computational complexity of the model.
[...] regularize the overfitting problem. The technique of randomly removing neurons from the network is known as dropout.
(vi) Soft-Max Layer. This layer contains normalized exponential functions to calculate the loss function for the data classification. Figure 15 shows a generalized CNN model for image classification. All the neurons of the layer immediately preceding a fully connected layer are completely connected with the fully connected layer, like a conventional Neural Network. Let $x_i^{l-1}$ represent the $i$th feature map at the layer $l-1$. The $j$th feature map at the layer $l$ can be represented as
$$x_j^{l} = f\left(\sum_{i=1}^{N^{l-1}} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right),$$
where $N^{l-1}$ represents the number of feature maps at the $(l-1)$th layer, $k_{ij}^{l}$ represents the kernel function, and $b_j^{l}$ represents the bias at layer $l$, where $f$ performs a nonlinear function operation. The layer before the Soft-Max Layer can be represented as
$$z_c = \sum_{j} w_{cj}\, x_j^{L} + b_c,$$
where $x_j^{L}$ are the outputs of the last hidden layer $L$. As we are working on a binary classification, the Soft-Max regression normalized output can be represented as
$$\hat{y}_c = \frac{e^{z_c}}{\sum_{c'=1}^{2} e^{z_{c'}}}.$$
Let $c = 1$ represent the Benign class and $c = 2$ represent the Malignant class. The cross-entropy loss of the above function can be calculated as
$$L_c = -\ln\left(\hat{y}_c\right).$$
Whichever group experiences a large loss value, the model will consider the other group as the predicted class.
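The building blocks listed above (kernel scanning with stride and zero padding, ReLU, pooling, and a Soft-Max output with a cross-entropy loss) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the architecture of any surveyed model: the convolution is implemented as cross-correlation without kernel flipping, the pooling is max pooling, and the image, kernel, and logit values are hypothetical.

```python
import math


def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a zero-padded 2D list (cross-correlation)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out = []
    for r in range(0, len(padded) - k + 1, stride):
        row = []
        for c in range(0, len(padded[0]) - k + 1, stride):
            row.append(sum(padded[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out


def relu(fmap):
    """Set non-positive entries to zero and pass the rest unchanged."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def softmax(z):
    """Normalized exponentials, shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[true_index])


image = [[1, 2, 0, 1],
         [0, 1, 3, 1],
         [2, 1, 0, 0],
         [1, 0, 1, 2]]
kernel = [[1, 0],
          [0, -1]]

fmap = conv2d(image, kernel)                 # stride 1, no padding: 3 x 3
fmap_strided = conv2d(image, kernel, stride=2)
fmap_padded = conv2d(image, kernel, padding=1)
pooled = max_pool(relu(fmap), size=2)
probs = softmax([2.0, 0.0])                  # hypothetical two-class logits
print(fmap, fmap_strided, len(fmap_padded), pooled, probs)
```

With a 4 x 4 input and a 2 x 2 kernel, stride 1 gives a 3 x 3 map, stride 2 gives a 2 x 2 map, and one ring of zero padding gives a 5 x 5 map, which shows how stride and padding control the output size.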
A difficult part of working with DNN is that it requires a specialized software package for the data analysis. A few research groups have been working on how data can be analyzed effectively by DNN from different perspectives and demands. Table 8 summarizes some of the software which is available for DNN analysis.
The history of the CNN and its use for biomedical image analysis is a long one. Fukushima first introduced a CNN named "neocognitron", which has the ability to recognize stimulus patterns with some tolerance to shifts in position [113]. To the best of our knowledge, Wu et al. first classified a set of mammogram images into malignant and benign classes using a CNN model [78]. In their proposed model they utilized only one hidden layer. After that, in 1996 Sahiner et al. [...] [114]. Their proposed model is known as AlexNet. After this work a revolutionary change was achieved in the image classification and analysis field. As an advancement of AlexNet, the paper titled "Going Deeper with Convolutions" by Szegedy et al. [115] introduced the GoogleNet model. This model contains a much deeper network than AlexNet. Subsequently ResNet [116], Inception [117], Inception-v4, Inception-ResNet [118], and a few other models have been introduced. Later, directly or with some advanced modification, these DNN models have been adapted for biomedical image analysis. In 2015, Fonseca et al. [81] classified breast density using CNN techniques. A CNN requires a sufficient amount of data to train the system, and it is always very difficult to find a sufficient amount of medical data for training a CNN model. A pretrained CNN model with some fine tuning can be used rather than creating a model from scratch [119]. The authors of [119] did not perform their experiments on a breast cancer image dataset; however they performed their experiments on three different medical datasets with layerwise training and claimed that "retrained CNN along with adequate training can provide better or at least the same amount of performance." The Deep Belief Network (DBN) is another branch of the Deep Neural Network, which mainly consists of Restricted Boltzmann Machine (RBM) techniques. The DBN method was first utilized for supervised image classification by Liu et al. [120].
After that, Abdel-Zaher and Eldeib utilized the DBN method for breast image classification [121]. This field is still not fully explored for breast image classification. Zhang et al. utilized both RBM and Point-Wise Gated RBM (PRBM) for shear-wave elastography image classification, where the dataset contains 227 images [97]. Their achieved classification Accuracy, Sensitivity, and Specificity are 93.40%, 88.60%, and 97.10%, respectively. Tables 9, 10, and 11 summarize the most recent work on breast image classification along with some pioneer work on CNN.

Logic Based Algorithm.
A Logic Based algorithm is a very popular and effective classification method which follows the tree structure principle and logical argument as shown in Figure 16. This algorithm classifies instances based on the features' values. Along with other criteria, a decision-tree based algorithm contains the following features: (i) Root node: a root node contains no incoming edge, and it may or may not contain any outgoing edge. (ii) Splitting: splitting is the process of subdividing a set of cases into a particular group. Normally the following criteria are maintained for the splitting: [...]
Among all the tree based algorithms, Iterative Dichotomiser 3 (ID3), proposed by Quinlan [149], can be considered a pioneer. The problem with the ID3 algorithm is that its search for the optimal solution is very prone to overfitting. To overcome this limitation Quinlan introduced the C4.5 algorithm [150], which uses a pruning method to control the overfitting problem. Pritom [...] database classification where they utilized 11 features and obtained 91.13% Accuracy.

Figure 16: [...] (labels: root node, decision node, decision node).

Fonseca et al. [81] | (1) Global Features | Mammogram | - | (1) Breast density classification has been performed utilizing HT-L3 convolution.
Albayrak and Bilgin [86] | (1) Global Features | Histopathology | 100 | (1) Cluster-based segmentation has been performed to find out the cellular structure. (2) Blob analysis has been performed on the segmented images. (3) To reduce the high dimensionality, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) methods have been utilized.
[...] | Mammogram | - | (1) They performed their experiments on the DDSM database. (2) The total number of required parameters is $5.8 \times 10^{7}$ and the processing time per image is 1.10 ms. (3) The best classification Accuracy achieved is 96.70%; however they show that when they utilize the VGG model the Accuracy was 97.00%, which is slightly better than their model. In terms of memory size and processing time per image, however, their model gives better performance than the VGG model.
[...] | (2) The best Accuracy obtained when they utilized the GoogleNet model was 83.00%.

Logic Based algorithms allow us to produce more than one tree and combine the decisions of those trees for an improved result; this mechanism is known as an ensemble method. An ensemble method combines more than one classifier hypothesis and produces more reliable results through a voting concept. Boosting and bagging are two well-known ensemble methods. Both boosting and bagging aggregate the trees. The difference is that in bagging successive trees do not depend on the predecessor trees, whereas in boosting successive trees depend on the information gathered from the predecessor trees. Gradient boosting is a very popular method for data classification [153,154], and a state-of-the-art boosting algorithm such as "Extreme Gradient Boosting" (XGBoost) is a very effective method for data classification [155]. Interestingly, there has not been a single paper published for breast image classification using the XGBoost algorithm. Along with the boosting method, different bagging methods are available; among them Random Forest (RF) is very popular, where a large number of uncorrelated trees are aggregated together for a better prediction. Tables 12 and 13 summarize a set of papers where a Logic Based algorithm has been used for image classification.
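The bagging idea described above, in which trees are trained independently and their decisions are combined by voting, can be sketched with one-level decision trees (decision stumps). This is an illustrative simplification and not Random Forest itself: the stump learner, the error-count split criterion, the function names, and the toy data are all assumptions of this example.

```python
import random
from collections import Counter


def fit_stump(samples, labels):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for j in range(len(samples[0])):
        for t in sorted({row[j] for row in samples}):
            left = [y for row, y in zip(samples, labels) if row[j] <= t]
            right = [y for row, y in zip(samples, labels) if row[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = (Counter(right).most_common(1)[0][0]
                           if right else left_label)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def fit_bagging(samples, labels, n_trees, seed=0):
    """Train each stump on an independent bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(samples)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([samples[i] for i in idx],
                                [labels[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Majority vote over the individual stump decisions."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
samples = [[1.0, 7.0], [2.0, 3.0], [3.0, 8.0],
           [7.0, 2.0], [8.0, 9.0], [9.0, 4.0]]
labels = [0, 0, 0, 1, 1, 1]

stumps = fit_bagging(samples, labels, n_trees=15)
low = predict_bagging(stumps, [0.5, 5.0])
high = predict_bagging(stumps, [9.5, 5.0])
print(low, high)
```

A boosting method would differ in the training loop: each new tree would be fitted with knowledge of the errors made by the trees before it, instead of on an independent resample.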

Support Vector Machine (SVM).
SVMs were proposed by Vapnik and Chervonenkis (VC). Unlike the Bayesian classification technique, this technique does not require any prior knowledge of the data distribution for the classification task. In many practical situations, the distribution of the features is not available. In such cases, SVM can be used to classify the available data into the different classes.
Consider the set of two-dimensional data plotted in Figure 17. The symbol "∘" represents those data which belong to Class-1 and "◻" represents data which belong to Class-2. A hyperplane has been drawn which classifies the data into two classes. Interestingly, many hyperplanes are available which can separate the data.
[...] | (2) Transfer learning was performed and the obtained AUC was 0.88, whereas when the system learned from scratch, the best ROC is 0.82.
[...] | (2) The AUC of the CNN and DLCNN models is 0.89 and 0.93, respectively.
[...] | (2) When using ensemble techniques the soft voting method has been used.
Geras et al. [94] | (1) Global Features | Mammogram | 102800 | (1) They investigated the relation of the Accuracy with the database size and image size.
Arevalo et al. [82] | (1) Global Features | Mammogram | 736 | (1) The best ROC value was 0.822.
[...] | (1) Two-dimensional discrete orthonormal S-transform has been used for the feature extraction.
Zhang et al. [97] | (1) 133 features (mass based and content based) | Mammogram | 400 | (1) A computer model has been created which is able to find a location that was not detected by a trainee.
Ahmad and Yusoff [98] | (1) Nine features selected | Biopsy | 700 | (1) The achieved Sensitivity, Specificity, and Accuracy are 75.00%, 70.00%, and 72.00%, respectively.
[...] | (2) When they use the RF algorithm on the Mammogram (DDSM) dataset, the obtained Accuracy and ROC are 79.00% and 0.89.
[...] | (2) They claimed that the RLTP feature provides better performance than the rotation invariant patterns.
Dong et al. [106] | (1) NRL margin gradient

[...] where $W, X \in \mathbb{R}^{n}$. As the training data are linearly separable, no training data will satisfy the condition $f(X, W, b) = 0$.
Sometimes it is very difficult to find a perfect hyperplane which can separate the data, but if we transform the data into a higher dimension the data may be easily separable. To separate this kind of data, a kernel function can be introduced.
Kernel Methods. Assume a transformation $\phi$ such that it transforms the dataset $X_1 \in \mathbb{R}^{n}$ into the dataset $X_2 \in \mathbb{R}^{m}$, where $m > n$. Now train the linear SVM on the dataset $X_2$ to get a new classifier.
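The effect of such a transformation can be illustrated with a small sketch. It assumes a specific degree-2 feature map from two to three dimensions and the matching homogeneous polynomial kernel; these choices, the function names, and the toy points are made for this example only and are not taken from the surveyed papers.

```python
import math


def phi(x):
    """Explicit degree-2 feature map from R^2 to R^3."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return dot(x, z) ** 2


# The kernel value equals the inner product in the transformed space,
# so the transformed vectors never have to be built explicitly.
x = (1.0, 2.0)
z = (3.0, -1.0)
explicit = dot(phi(x), phi(z))
implicit = poly_kernel(x, z)


def linear_score(p):
    """A linear function in the transformed space: phi_1 + phi_3 - 1."""
    features = phi(p)
    return features[0] + features[2] - 1.0


# Points inside and outside the unit circle are not linearly separable
# in R^2, but a single hyperplane separates them after the transformation.
inner = [(0.2, 0.1), (-0.3, 0.4)]
outer = [(1.5, 0.0), (-1.0, 1.2)]
print(explicit, implicit,
      [linear_score(p) for p in inner],
      [linear_score(p) for p in outer])
```

A Gaussian kernel follows the same pattern, except that its implicit feature space is infinite-dimensional, so only the kernel form can be evaluated.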
The advantage of the kernel method for breast cancer image classification using an SVM was first introduced by El-Naqa et al. [156]. They classified Microcalcification (MC) clusters in mammogram images (76 images were utilized for the experiment, where the total number of MCs was 1120). They utilized the SVM method along with the Gaussian kernel as well as the polynomial kernel. In 2003, Chang et al. classified a set of sonography images using SVM techniques, where they consider that the image is surrounded by speckle noise [157] and the database contains 250 images. Their achieved Accuracy was 93.20%. A total of thirteen features, including shape, law, and gradient features, were utilized along with SVM and a Gaussian kernel for mammogram image classification. They performed their operation on 193 mammogram images and achieved 83.70% Sensitivity and a 30.20% False Positive Rate [158]. SVM has been combined with the NN method by B. Sing et al. for ultrasound breast image classification, where the database contained a total of 178 images. They performed a hybrid feature selection method to select the best features [159]. A breast ultrasound image is always very complex in nature. The Multiple Instance Learning (MIL) algorithm was first used along with SVM for breast image classification in [176], and the obtained Accuracy was 91.07%. The Concentric Circle BOW feature extraction method was utilized to extract the features and later the SVM method was used for breast image classification [177]. The achieved Accuracy is 88.33% when the dimension of the features was 1000. A Bag of Features has been extracted from histopathological images (using SIFT and DCT) and classified using SVM by Mhala and Bhandari [178]. The experiment is performed on a database which contains 361 images, where 119 images are normal, 102 images are ductal carcinoma in situ, and the rest of the images are invasive carcinoma. Their experiment achieved 100.00% classification Accuracy for ductal carcinoma in situ, 98.88% classification Accuracy for invasive carcinoma, and 100.00% classification Accuracy for normal image classification. A mammogram (DDSM) image database has been classified by Hiba et al. [179] using SVM along with the Bag of Feature method. Firstly the authors extract LBP and quantize the binary pattern information for feature extraction. Their obtained Accuracy was 91.25%.
Along with the above-mentioned work, different breast image databases have been analyzed and classified using SVM. We have summarized some of the work related to SVM in Tables 14, 15, and 16.

[...] | (1) Benign and malignant lesions are investigated. (2) A linear kernel, a polynomial kernel, and a radial basis function kernel were utilized along with the SVM method for the breast image classification.
[...] | (2) The achieved Accuracy, Sensitivity, and Specificity are 94.94%, 92.86%, and 93.33%, respectively.

Bayesian.
A Bayesian classifier is a statistical method based on Bayes theorem. This method does not follow any explicit decision rule; however it depends on estimating probabilities. The Naive Bayes method can be considered one of the earlier Bayesian learning algorithms. Zhang et al. [122] (1) Fractional Fourier transform information utilized as features
(2) SVM and Mixed Gravitational Search Algorithm (MGSA) used together for feature reduction.
(2) Removing unwanted objects from the images for reducing the redundancy and computational complexity.
(2) They utilized the radial basis function (RBF) for their analysis.
(3) The Sequential Forward Floating Selection (SFFS) method utilized for the feature selection.

Kavitha and
Thyagharajan [128] (1) Histogram of the intensity has been used as a statistical feature.
The Naive Bayes (NB) method works on the basis of the Bayes formula, where each of the features is considered statistically independent given the class. Consider a dataset with $N$ samples, with each sample containing a feature vector $\mathbf{x}_k$ with $n$ features [180] and belonging to a particular class $c$. According to the NB formula, the probability of the particular class $c$ given the vector $\mathbf{x}_k$ is represented as
$$P(c \mid \mathbf{x}_k) = \frac{P(c)\, P(\mathbf{x}_k \mid c)}{P(\mathbf{x}_k)}.$$
Applying the chain rule to the class-conditional term, with $\mathbf{x}_k = (x_{k1}, \ldots, x_{kn})$,
$$P(\mathbf{x}_k \mid c) = P(x_{k1} \mid x_{k2}, \ldots, x_{kn}, c)\, P(x_{k2} \mid x_{k3}, \ldots, x_{kn}, c) \cdots P(x_{kn} \mid c).$$
The NB theorem considers all the features independently, which can be represented as
$$P(c \mid \mathbf{x}_k) = \frac{P(c) \prod_{i=1}^{n} P(x_{ki} \mid c)}{P(\mathbf{x}_k)}.$$
Raghavendra et al. [132] (1) Gabor wavelet transform utilized for feature extraction.
(2) Their obtained Accuracy is 96.4%.
The NB method is very easy to construct and very fast at predicting the data. This method can also utilize the kernel method. However, for a large dataset and continuous data, its performance can be poor. NB can be classified into the following subclasses: (i) Gaussian Naive Bayes, (ii) Multinomial Naive Bayes, and (iii) Bernoulli Naive Bayes.
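As an illustration of the NB decision rule, the following is a minimal sketch of the Gaussian Naive Bayes subclass written with NumPy only. It is not the implementation used in any of the cited works; the function names, the `var_smoothing` constant, and the two-dimensional toy data are assumptions made for this example. Each feature is modelled by a per-class normal distribution, and the class with the largest log posterior (log prior plus the sum of per-feature log likelihoods) is returned.

```python
import numpy as np


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # var_smoothing (hypothetical constant) avoids division by zero
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + var_smoothing
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """Return the class maximising log P(c) + sum_i log P(x_i | c)."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]          # shape (samples, classes, features)
    log_lik = -0.5 * (np.log(2.0 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik      # unnormalised log posterior
    return classes[np.argmax(log_post, axis=1)]


# Synthetic toy data (not a breast image dataset): two well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = fit_gaussian_nb(X, y)
pred = predict_gaussian_nb(model, X)
accuracy = float(np.mean(pred == y))
print("training accuracy:", accuracy)
```

The evidence term $P(\mathbf{x}_k)$ is omitted because it is the same for every class and does not change which class attains the maximum.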
One of the constraints of the NB classifier is that it considers that all the features are conditionally independent. A Bayesian Network is another Bayesian classifier which can overcome this constraint [181,182]. The literature shows that the Bayesian classifier method is not utilized much for breast image classification. In 2003 Butler et al. used an NB classifier for X-ray breast image classification [183]. They extracted features from the low-level pixels. For all feature combinations they obtained more than 90.00% Accuracy. Bayesian structural learning has been utilized for a breast lesion classifier by Fischer et al. [184]. Soria et al. [185] classified a breast cancer dataset utilizing C4.5, a multilayered perceptron, and the NB algorithm using WEKA software [186]. They concluded that the NB method gives better performance than the other two methods in that particular case. They also compared their results with the Bayes classifier output. Some other research on the Bayes classifier and breast image classification has been summarized in Tables 17 and 18.

Performance Based on Unsupervised Learning.
This learning algorithm does not require any prior knowledge about the target. The main goal of unsupervised learning is to find the hidden structure and relations between the different data [187] and distribute the data into different clusters. Basically clustering is a statistical process where a set of data points is partitioned into a set of groups, known as clusters. The $k$-means algorithm is a clustering algorithm proposed by [188]. Interestingly, unsupervised learning can be utilized as a preprocessing step too.
(i) In the $k$-means algorithm, firstly assign $k$ centroid points. Suppose that we have $n$ feature points $x_i$ where $i \in \{1, \ldots, n\}$. The objective of the $k$-means algorithm is to find positions $\mu_j$, where $j \in \{1, \ldots, k\}$, that minimize the distance from the data points to their cluster centroid by solving
$$\arg\min_{\mu_1, \ldots, \mu_k} \sum_{j=1}^{k} \sum_{x_i \in c_j} \lVert x_i - \mu_j \rVert^2,$$
where $c_j$ is the set of points assigned to centroid $\mu_j$.
(ii) Self-Organizing Map (SOM): SOM is another popular unsupervised classifier, proposed by Kohonen et al. [189][190][191]. The main idea of the SOM method is to reduce the dimension of the data and represent those dimensionally reduced data by a map architecture, which provides more visual information.
(iii) Fuzzy $c$-Means Clustering (FCM): the FCM algorithm, which clusters data based on the value of a membership function, was proposed by [192] and improved by Bezdek [193].
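The $k$-means objective in item (i) can be illustrated with a minimal sketch of the standard alternating procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). This is written with NumPy only; the function name `kmeans`, its parameters, and the six toy points are assumptions for illustration and are not taken from the cited works.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Alternate assignment and centroid update until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and objective value (sum of squared distances).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    objective = dists[np.arange(len(X)), labels].sum()
    return centroids, labels, objective


# Toy feature points forming two obvious groups (not image features).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels, objective = kmeans(X, k=2)
print(labels, objective)
```

In general the result depends on the initial centroids, so in practice the procedure is often restarted from several initialisations and the run with the smallest objective is kept.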
The history of using unsupervised learning for breast image classification is a long one. In 2000, Cahoon et al. [194] classified mammogram breast images (DDSM database) in an unsupervised manner, utilizing the $k$-NN clustering and Fuzzy $c$-Means (FCM) methods. Chen et al. classified a set of breast images into benign and malignant classes [164]. They utilized a SOM procedure to perform this classification operation. They collected 24 autocorrelation textural features and used a 10-fold validation method. Markey et al. utilized the SOM method for BIRADS image classification of 4435 samples [195]. Tables 19 and 20 summarize the breast image classification performance based on the $k$-means algorithm and SOM method.
(1) For the training data the AUC value is 0.959 for the inclusive model, whereas AUC value is 0.910 for the descriptor model.
Rodríguez-López and Cruz-Barbosa [137] (1) Eight image feature nodes utilized.
(2) This can classify the data as well as identify the target.
(2) Generation of number of clusters
(3) Detection of regions of interest.
(4) Mean detection of regions of interest is 85.00%.
(2) This method obtained 96.00% similarity between segmented and reference tumors.

Performance Based on Semisupervised Learning.
The working principle of semisupervised learning lies in between supervised and unsupervised learning. In semisupervised learning a few input data have an associated target while large amounts of data are not labeled [196]. It is always very difficult to collect labeled data. Some data, such as speech or information scraped from the web, are difficult to label.
(1) Both FCM method and Adaboost method utilized separately to classify images.
(2) For the classification purposes selected 23 features and also select the best features using feature selection algorithm. When they used the FCM method, the obtained Mean Accuracy was 75.00% whereas the Adaboost method Accuracy was 88.00%.
Nattkemper et al. [161] MRI (1) $k$-means algorithm as well as SOM method utilized.
Slazar-Licea et al. [162].
Marcomini et al. [163]
To the best of our knowledge, Li and Yuen have utilized GB semisupervised learning for biomedical image classification [197]. The kernel trick is applied along with the semisupervised learning method for breast image classification by Li et al. [198].
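The general idea of learning from a few labeled samples and many unlabeled ones can be illustrated with a minimal self-training sketch. This is a different and simpler technique than the methods of the works cited above, and it is shown only as an assumption-laden example: a nearest-centroid classifier is trained on the labeled data, confidently predicted unlabeled points are added with their predicted labels, and the process repeats. All function names, the `margin_threshold` parameter, and the toy data are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid model: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_with_margin(model, X):
    """Predict classes and a confidence margin (gap between the two nearest centroids)."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    ordered = np.sort(d, axis=1)              # requires at least two classes
    margin = ordered[:, 1] - ordered[:, 0]
    return classes[d.argmin(axis=1)], margin


def self_training(X_lab, y_lab, X_unlab, margin_threshold=1.0, max_rounds=10):
    """Iteratively pseudo-label confident unlabeled points and refit."""
    X_l, y_l, remaining = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        if len(remaining) == 0:
            break
        model = fit_centroids(X_l, y_l)
        pred, margin = predict_with_margin(model, remaining)
        confident = margin >= margin_threshold
        if not confident.any():
            break                             # nothing confident enough to add
        X_l = np.vstack([X_l, remaining[confident]])
        y_l = np.concatenate([y_l, pred[confident]])
        remaining = remaining[~confident]
    return fit_centroids(X_l, y_l), len(remaining)


# Toy data: two labeled points, seven unlabeled points (one is ambiguous).
X_lab = np.array([[0.0, 0.0], [10.0, 10.0]])
y_lab = np.array([0, 1])
X_unlab = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
                    [9.0, 10.0], [10.0, 9.0], [9.0, 9.0], [5.0, 5.2]])

final_model, n_left_unlabeled = self_training(X_lab, y_lab, X_unlab)
test_pred, _ = predict_with_margin(final_model, np.array([[0.5, 0.5], [9.5, 9.5]]))
print(test_pred, n_left_unlabeled)
```

The ambiguous point near the midpoint never passes the margin threshold and is therefore left unlabeled rather than being given a possibly wrong pseudo-label.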

Conclusion
Breast cancer is a serious threat to women throughout the world and is responsible for increasing the female mortality rate. Improving the current situation with breast cancer is a major concern and can be achieved by proper investigation, diagnosis, and appropriate patient and clinical management. Identification of breast cancer in the earlier stages and regular checks can save many lives. The status of cancer changes with time, as the appearance, distribution, and structural geometry of the cells change over time because of the chemical changes which are always going on inside the cell. The changing structure of cells can be detected by analysing biomedical images obtained by techniques such as mammography and MRI. However these images are complex in nature and require expert knowledge to analyze malignancy accurately. Due to the nontrivial nature of the images, physicians sometimes reach decisions that contradict one another. However computer-aided-diagnosis techniques emphasising machine learning can glean a significant amount of information from the images and provide a decision based on the gained information, such as cancer identification, by classifying the images. The contribution of machine learning techniques to image classification is a long story. Using some advanced engineering techniques with some modifications, the existing machine learning based image classification techniques have been used for biomedical image classification, especially for breast image classification and segmentation. A few branches of the machine learning based image classifier are available such as Deep Neural Network, Logic Based, and SVM.
Except for deep-learning, a machine learning-based classifier largely depends on handcrafted feature extraction techniques such as statistical and structural information that depend on various mathematical formulations and theories, from which they gain object-specific information. They are further utilized as an input for an image classifier such as SVM and Logic Based, for the image classification. This investigation finds that most of the conventional classifiers depend on prerequisite local feature extraction. The nature of cancer is always changing, so the dependencies on a set of local features may not provide good results on a new dataset. However the state-of-the-art Deep Neural Networks, especially CNNs, have recently advanced biomedical image classification due to their Global Feature extraction capabilities. As the core of the CNN model is the kernel, which gives this model the luxury of working with the Global Features, these globally extracted features allow the CNN model to extract more hidden structure from the images. This allows some exceptional results for breast cancer image classification. As the CNN model is based on the Global Features, this kind of classifier model should be easy to adapt to a new dataset.
Computational and Mathematical Methods in Medicine
Cordeiro et al. [166] (1) Zernike moments have been used for the feature extraction.
(2) For the fatty-tissue classification this method achieved 91.28% Accuracy.
(2) This experiment shows impressive results on DDSM database.
This paper also finds that the malignancy information is concentrated in the particular area defined as ROI. Utilizing only the ROI portions, information gathered from the segmented part of the data can improve the performance substantially. The recent development of the Deep Neural Network can also be utilized for finding the ROI and segmenting the data, which can be further utilized for the image classification.
For breast cancer patient care, the machine learning techniques and tools have been a tremendous success so far, and this success has gained an extra impetus with the involvement of deep-learning techniques. However the main difficulty of handling the current deep-learning based machine learning classifier is its computational complexity, which is much higher than for the traditional method. The current research is focused on the development of the light DNN model so that both the computational and timing complexities can be reduced. Another difficulty of using the DNN based cancer image classifier is that it requires a large amount of training data. However the reinforcement of learning techniques and data augmentation has been largely adapted with the current CNN model, which can provide reliable outcomes. Our research finds that the current trend of machine learning is largely towards deep-learning techniques. Among a few other implications, the availability of appropriate tools for designing the overall deep-learning model was an initial requirement for utilizing deep-learning based machine learning techniques. However some reliable software has been introduced which can be utilized for breast image classification. Initially it was difficult to implement a DNN based architecture in simpler devices; however due to cloud-computing based Artificial Intelligence techniques this issue has been overcome and DNN has already been integrated with electronic devices such as mobile phones. In future, combining the DNN network with the other learning techniques can provide more-positive predictions about breast cancer.
Due to the tremendous concern about breast cancer, many research contributions have been published so far. It is quite difficult to summarize all the research work related to breast cancer image classification based on machine learning techniques in a single research article. However this paper has attempted to provide a holistic approach to the breast cancer image classification procedure which summarizes the available breast datasets, generalized image classification techniques, feature extraction and reduction techniques, performance measuring criteria, and state-of-the-art findings.
In a nutshell, the involvement of machine learning for breast image classification allows doctors and physicians to obtain a second opinion, and it provides satisfaction to and raises the confidence level of the patient. There is also a scarcity of experts who can provide an appropriate opinion about the disease. Sometimes the patient might need to spend a long time waiting due to the lack of experts. In this particular scenario the machine learning based diagnostic system can help the patient to receive timely feedback about the disease, which can improve the patient-management scenario.