Local Binary Patterns Descriptor Based on Sparse Curvelet Coefficients for False-Positive Reduction in Mammograms

Breast cancer is the most prevalent cancer among women across the globe. Automatic detection of breast cancer using a Computer Aided Diagnosis (CAD) system suffers from false positives (FPs). Thus, reduction of FPs is one of the challenging tasks in improving the performance of diagnosis systems. In the present work, a new FP reduction technique is proposed for breast cancer diagnosis. It is based on appropriate integration of preprocessing, Self-organizing map (SOM) clustering, region of interest (ROI) extraction, and FP reduction. In preprocessing, contrast enhancement of mammograms is achieved using the Local Entropy Maximization algorithm. The unsupervised SOM clusters an image into a number of segments to identify the cancerous region and extracts tumor regions (i.e., ROIs). However, it also detects some FPs, which affect the efficiency of the algorithm. Therefore, to reduce the FPs, the output of the SOM is given to the FP reduction step, which aims to classify the extracted ROIs into normal and abnormal classes. FP reduction consists of feature mining from the ROIs using the proposed local sparse curvelet coefficients followed by classification using an artificial neural network (ANN). The performance of the proposed algorithm has been validated using the local TMCH (Tata Memorial Cancer Hospital) dataset and the publicly available MIAS (Suckling et al., 1994) and DDSM (Heath et al., 2000) databases. The proposed technique reduces FPs from 0.85 to 0.02 FP/image for MIAS, 4.81 to 0.16 FP/image for DDSM, and 2.32 to 0.05 FP/image for TMCH, reflecting a large improvement in the classification of mammograms.


Introduction
Breast cancer is the most common cancer among women worldwide. It is the leading cause of death for women suffering from cancer in India. It is estimated that breast cancer cases in India would reach as high as 1,797,900 by 2020 [1]. The rising incidence can cause high mortality. This is due to lack of awareness about breast screening, late reporting, and insufficient medical access [2].
This fact raises concern and makes screening for breast cancer at an early stage necessary to ensure longer survival. Among all techniques, namely, mammography, tomosynthesis, ultrasonography, computed tomography, and magnetic resonance, mammography is the modality most relied on and accepted by radiologists for preliminary examination of breast cancer due to its cost benefits and accessibility [3][4][5].
The diagnosis of breast cancer from mammograms varies from radiologist to radiologist, as symptoms are misinterpreted or overlooked due to the tedious task of screening mammograms. Studies reveal that 10% to 30% of the visible cancers on mammograms are overlooked, and only 20% to 30% of biopsies are positive [6][7][8]. Biopsies are traumatic and costly; therefore, computer aided detection and diagnosis (CAD) systems combined with expert radiologists' experience would provide a more comprehensive diagnosis [9]. A detailed survey of research on the design of CAD systems is given in the next section.

Literature Survey
The design and development of CAD systems is an important and progressive area of research covering contrast enhancement for better visualization and clarification [10][11][12], pectoral muscle removal, segmentation for better delineation of the region of interest (ROI), extraction of features, and classification [13,14]. Segmentation methods are classified as region-based, contour-based, and clustering methods [15]. The region- and contour-based methods are popular among researchers. Görgel et al. [16] developed the Local Seed Region Growing-Spherical Wavelet Transform (LSRG-SWT) algorithm using a local dataset and MIAS [17], with classification accuracy of 94% and 91.67%, respectively. Pereira et al. [18] presented segmentation and detection of masses in mammograms using wavelet transform and genetic algorithm, which provides an FP rate of 1.35 FP/image and sensitivity of 95% using DDSM [19]. Rouhi et al. [20] studied segmentation using region growing, Cellular Neural Network (CNN), and ANN. The classification results varied from 80 to 96%, which is the main weakness of their study. Berber et al. [21] proposed the Breast Mass Contour Segmentation (BMCS) approach and showed 6 FPR for a local dataset. A hybrid level set segmentation method [22] based on a combination of region growing and level set was used to segment tumors. The results showed that the sensitivity varied from 78 to 100% due to the presence of artifacts in the MIAS database. The difficulties in region- and contour-based segmentation methods are the appropriate initialization of the seed point and contour position.
Several researchers have implemented clustering methods such as K-means and Fuzzy C-means (FCM) for breast abnormality segmentation [3,23]. However, they have limitations in terms of learning abilities. Learning-based techniques such as the Self-organizing map (SOM) [24] have been successfully used in medical image segmentation [25]. The success of SOM in medical image segmentation motivated its choice for mammogram segmentation in this work. Often, the segmented tumor regions are not abnormal tissue (cancerous regions); such regions are known as false positives (FPs). These FPs consume much of the radiologists' time and result in unnecessary biopsies. Thus, reducing the FPs is an open research problem, and various researchers have proposed FP reduction algorithms to improve the specificity of CAD systems [5,9,23,26-31]. Usually, the FP reduction algorithm is a postprocessing step of a CAD system with two stages, namely, feature extraction and classification. Various methods have been developed for feature extraction based on wavelets [8,18,32], curvelets [33,34], Gabor [35,36], morphological descriptors [20], textural analysis [26,27,30,32], histograms [4,5,7,29,37-40], etc. Segmentation error can reduce the performance of morphological descriptors. When the Gray Level Co-occurrence Matrix (GLCM) of normal and abnormal regions in a dense mammogram is the same, the texture descriptors overlap, which leads to a larger number of FPs [37]. Ojala et al. proposed local binary patterns (LBPs) [41] for textural feature extraction, which work well compared with morphological descriptors and GLCM-based textural descriptors. The LBP descriptor can be considered as capturing local microstructures, namely, edges, flat areas, spots, etc. Variants of LBP have been proposed by various researchers to achieve rotation- and intensity-invariant features.
Also, LBP is computationally efficient and extracts robust features; therefore, LBP descriptors have been widely applied in FP reduction and classification methods for mammogram images [29,37,39,40]. However, LBP descriptor does not provide the directional information of local micropattern.
Therefore, a transform technique such as curvelet combined with LBP was used to extract features. Various curvelet-based approaches have been proposed in the literature [8,33,34,42], which conclude that the curvelet transform outperforms the wavelet transform.
In this work, a novel method is presented that extracts sparse curvelet subband coefficients by incorporating knowledge of the irregular shape of masses as they appear in the sparse matrix and then calculates LBP features. The scheme presented in this paper is as follows: (1) preprocessing of the mammogram image for contrast enhancement using a local entropy maximization-based image fusion algorithm and removal of background noise; (2) cluster-based segmentation of mammograms using SOM and extraction of tumor regions (i.e., ROIs); (3) FP reduction: extraction of sparse curvelet subband coefficients and computation of the LBP descriptor to classify true positives and false positives, improving the performance of the CAD system on the MIAS [17], DDSM [19], and Tata Memorial Cancer Hospital (TMCH) datasets.
The organization of the paper is as follows: Sections 1 and 2 give the introduction and the literature review on automatic segmentation and extraction of abnormal masses (i.e., tumor regions) as well as FP reduction methods. Section 3 presents in detail the proposed methodology for SOM-based segmentation of mammograms followed by the novel false-positive reduction. Section 4 presents the experimental results and discussion on three benchmark datasets. Finally, Section 5 concludes the proposed approach for accurate extraction of abnormal masses (i.e., tumor regions) by excluding the FPs.

Methodology
The block schematic of the proposed integrated method for automatic detection of breast cancer using the sparse curvelet coefficient-based LBP descriptor is shown in Figure 1.

Preprocessing.
The mammogram images are low-dose X-ray images, so they have poor contrast and suffer from noise. Figures 2(a)-2(d) show the preprocessing of a mammogram, and Figures 2(e)-2(g) show SOM clustering and ROI extraction.

Local Entropy Maximization-Based Image Fusion: Contrast Enhancement.
The contrast enhancement of the mammogram is performed using local entropy maximization [12] for better segmentation. Here, the original image is given to the contrast limited adaptive histogram equalization (CLAHE) algorithm to obtain the second input to our image fusion algorithm. The original image and the CLAHE output are then given to the image fusion algorithm. The procedure for the image fusion is given in Algorithm 1. We have used local entropy as the fusion rule, where ENT is the local entropy and p_org(k) and p_CLAHE(k) are the probabilities of the k-th pixel in a 5 × 5 sliding window [12]. Here, the high-frequency components from both the original mammogram and the CLAHE mammogram are fused using the maximum entropy criterion. Figure 3(b) presents the contrast-enhanced mammogram obtained using local entropy maximization-based image fusion.
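The maximum-entropy fusion rule can be illustrated with a minimal sketch. It assumes a simplified pixel-wise selection between the two inputs rather than the fusion of high-frequency components described above, and it does not reproduce Algorithm 1; the function names `local_entropy` and `fuse_max_entropy` are hypothetical.

```python
import math


def local_entropy(img, r, c, half=2):
    """Shannon entropy (in bits) of the grey levels in the window centred at (r, c).

    half=2 gives the 5 x 5 window used in the text; the window is clipped at the
    image border (an assumption of this sketch).
    """
    counts = {}
    total = 0
    for i in range(max(0, r - half), min(len(img), r + half + 1)):
        for j in range(max(0, c - half), min(len(img[0]), c + half + 1)):
            counts[img[i][j]] = counts.get(img[i][j], 0) + 1
            total += 1
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def fuse_max_entropy(org, clahe):
    """Per pixel, keep the input whose neighbourhood has the larger local entropy."""
    fused = []
    for r in range(len(org)):
        row = []
        for c in range(len(org[0])):
            e_org = local_entropy(org, r, c)
            e_clahe = local_entropy(clahe, r, c)
            # Ties keep the original pixel (assumption).
            row.append(clahe[r][c] if e_clahe > e_org else org[r][c])
        fused.append(row)
    return fused
```

Here `org` and `clahe` are equally sized lists of rows of integer grey levels; a flat region in one input has zero entropy, so the fused pixel is taken from the other input whenever that one shows local variation.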

Pectoral Muscle Removal.
Pectoral muscle suppression has been performed by defining a rectangle as suggested in [14] (Figure 3(c)). It illustrates the rectangle (ABDC) and fixes the points G and has intensity variation and joins them for pectoral muscle suppression. Figure 3(d) shows the image after pectoral muscle removal, which avoids errors in the algorithm caused by the similar intensities of the pectoral muscle and masses.

SOM Clustering.
SOM is a special type of neural network designed to map the input image of size N_x × N_y to M clusters based on their characteristic features [25]. For SOM, the image (I) is converted into a feature vector f = (f_1, f_2, ..., f_m), where m is the number of features. In this experiment, we have trained SOM with M = 4 clusters using p = 9 neighbourhood features: given a centre pixel (g_c) in the image, the neighbourhood features are taken from the 3 × 3 window around it, where n is the number of neighbours (3 × 3 window), g_p are the neighbouring pixels, and F is the feature vector corresponding to the centre pixel g_c. The selection of the 3 × 3 window is based on [43] to capture local details.
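The construction of the per-pixel feature vectors can be sketched as follows. This is an illustration only: the handling of border pixels is not stated in the text, so edge replication is assumed, and the function name `neighbourhood_features` is hypothetical.

```python
def neighbourhood_features(img):
    """Return one 9-element feature vector per pixel: its 3 x 3 neighbourhood.

    Vectors are returned in row-major pixel order, so an image with m x n pixels
    yields m * n vectors of length 9.
    """
    rows, cols = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            vec = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Assumption: border pixels reuse the nearest valid pixel.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    vec.append(img[rr][cc])
            feats.append(vec)
    return feats
```

Each vector includes the centre pixel itself, which gives the nine features per pixel mentioned above.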
At the start, the weight vector $W_i = (w_{i1}, w_{i2}, \ldots, w_{i,m-1})$ is random, and it is updated as the network learns. The node with the minimum Euclidean distance $\|f - W_i\|$ is described as the best matching component or winner node $c$:
$$\|f - W_c\| = \min_i \|f - W_i\|.$$
The weight vectors of the winning output neuron and its neighboring neurons are updated as
$$W_i(t+1) = W_i(t) + N_{ci}(t)\,[f(t) - W_i(t)],$$
where $t = 1, 2, \ldots$ is the time coordinate. The function $N_{ci}(t)$ is the neighbourhood kernel function and is expressed as
$$N_{ci}(t) = \eta(t)\exp\left(-\frac{\|m_c - m_i\|^2}{2\sigma^2(t)}\right),$$
where $\eta(t)$ is the learning rate, $\sigma(t)$ is the width of the kernel that corresponds to the neighbourhood neurons around node $c$, and $m_c$ and $m_i$ are the location vectors of nodes $c$ and $i$. Figures 4(a) and 4(b) show the cluster map and the cluster boundaries marked on the mammogram.
After several observations of known areas, it was empirically noticed that regions containing an abnormality have pixel counts (pixel level threshold, PLT, based on the pixel count in TPs) in the range 450 to 31,500; 16,000 to 200,000; and 4,000 to 200,000 for the MIAS, DDSM, and TMCH databases, respectively, which was verified by the expert. The size of the tumor varies because the mammogram size is 1024 × 1024 pixels for MIAS, 2728 × 3920 pixels to 4608 × 6048 pixels for DDSM, and 2294 × 1914 or 4096 × 3328 pixels for the TMCH datasets. Therefore, cluster regions below or above the specified threshold are discarded and the remaining region is marked as true positive (TP), as shown in Figure 4.
We can see that there are many FPs along with the TP (marked in pink), which are reduced using the pixel level threshold (PLT, based on the pixel count in TPs) as explained above. Figure 4(c) shows the filtered result using PLT.
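The SOM training step described in this section (winner selection followed by a neighbourhood-weighted update) can be sketched as below. This is a minimal illustration under stated assumptions: the map is one-dimensional with each node's index as its location, the learning rate and kernel width decay linearly, and the names `winner`, `som_update`, and `train_som` are hypothetical; none of these choices are confirmed by the text.

```python
import math
import random


def winner(weights, f):
    """Index of the node whose weight vector is closest to f (Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(f, weights[i])))


def som_update(weights, f, eta, sigma):
    """Move every node towards f, scaled by a Gaussian kernel centred on the winner."""
    c = winner(weights, f)
    for i, w in enumerate(weights):
        # Assumption: 1-D map, so the node location is simply its index.
        h = eta * math.exp(-((c - i) ** 2) / (2.0 * sigma ** 2))
        for k in range(len(w)):
            w[k] += h * (f[k] - w[k])
    return c


def train_som(features, n_nodes=4, epochs=20, eta0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; n_nodes=4 mirrors the M = 4 clusters used in the text."""
    rng = random.Random(seed)
    lo = min(min(f) for f in features)
    hi = max(max(f) for f in features)
    dim = len(features[0])
    weights = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(epochs):
        frac = t / epochs
        eta = eta0 * (1.0 - frac)                  # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 1e-3)   # assumed linear decay, kept positive
        for f in features:
            som_update(weights, f, eta, sigma)
    return weights
```

After training, `winner(weights, f)` gives the cluster label of a pixel's feature vector `f`, which is how a cluster map such as the one in Figure 4(a) could be produced.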

ROI Extraction.
After SOM clustering (initial segmentation), the next step is to classify the detected regions into TP and FP using the proposed local sparse curvelet features (LSCF) followed by an ANN classifier. To do so, we first extracted ROIs from the regions detected by SOM clustering and manually categorized them into TP and FP. We collected these ROIs from the three datasets according to their maximum height and maximum width using connected components, e.g., the region marked in Figure 4(c). Therefore, their patch sizes differ, as shown in Figure 5 (ROIs for the MIAS, DDSM, and TMCH datasets). Further, these extracted patches have been used to train the ANN for the task of FP reduction.
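The combination of connected-component analysis, pixel-count filtering, and bounding-box extraction can be sketched as follows. The sketch assumes a binary mask of the tumorous cluster and 8-connectivity, neither of which is specified in the text, and the function name `extract_rois` is hypothetical.

```python
from collections import deque


def extract_rois(mask, min_pixels, max_pixels):
    """Bounding boxes (top, left, bottom, right) of connected regions in a binary mask.

    Only regions whose pixel count lies in [min_pixels, max_pixels] are kept,
    mirroring the pixel level threshold (e.g., 450 to 31,500 for MIAS).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Breadth-first flood fill over the 8-connected neighbourhood (assumption).
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            count, top, left, bottom, right = 0, r0, c0, r0, c0
            while queue:
                r, c = queue.popleft()
                count += 1
                top, bottom = min(top, r), max(bottom, r)
                left, right = min(left, c), max(right, c)
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            queue.append((rr, cc))
            if min_pixels <= count <= max_pixels:
                boxes.append((top, left, bottom, right))
    return boxes
```

Because each box spans the maximum height and width of its region, the cropped patches naturally differ in size, as noted for Figure 5.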

False-Positive (FP) Reduction.
After ROI extraction, the FP reduction algorithm computes the proposed local sparse curvelet features (LSCF), which are then passed to an ANN classifier. The LBP descriptor computed on a circular neighbourhood, called the uniform LBP (ULBP) descriptor, was proposed in [43].
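For orientation, the commonly used definition of the LBP code thresholds each of the P neighbours g_p against the centre g_c and sums the resulting bits weighted by 2^p; a pattern is called uniform when its circular bit string has at most two 0/1 transitions. The sketch below follows that standard definition, which may differ in detail from the expression used by the authors. It approximates the circular neighbourhood with P = 8 and R = 1 by the 3 × 3 square window (no interpolation), and the names `lbp_code`, `transitions`, and `UNIFORM_CODES` are hypothetical.

```python
def lbp_code(window):
    """8-bit LBP code of a 3 x 3 window (P = 8, R = 1) with the centre at window[1][1]."""
    gc = window[1][1]
    # Neighbours are visited in a fixed circular order starting at the right-hand pixel.
    order = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    return sum(1 << p for p, (r, c) in enumerate(order) if window[r][c] >= gc)


def transitions(code, bits=8):
    """Number of 0/1 changes when the bit pattern is read circularly."""
    rotated = (code >> 1) | ((code & 1) << (bits - 1))
    return bin(code ^ rotated).count("1")


# Uniform patterns have at most two circular transitions.
UNIFORM_CODES = [code for code in range(256) if transitions(code) <= 2]
```

For P = 8 there are 58 uniform patterns, which matches the 58 LBP features per subband reported later in this section.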

Proposed Algorithm.
Computation of LBP based on the actual shape of the mass according to the sparse matrix is shown in Figure 6: pixels related to the shape of the mass, called foreground pixels, are used, and the other pixels, called background pixels, are rejected. The proposed algorithm uses only foreground pixels for LBP computation, which reduces the number of pixels involved in the LBP computations. Identification of foreground and background pixels is therefore an important step, and it is performed using a lookup table approach. The identification is based on the number of nonzero pixels in the lookup table: if the count of nonzero pixels in the sliding window is greater than 2, i.e., count(p(i, j)) > 2, the pixel is identified as foreground and LBP is estimated. On the other hand, if the count of nonzero pixels in the sliding window is less than 2, i.e., count(p(i, j)) < 2, the pixel is identified as background, LBP is not estimated, and the entry is rejected from the lookup table.
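The foreground test can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: the count includes the centre pixel of the 3 × 3 window, and a count of exactly 2 (which the rule above leaves unspecified) is treated as background. The function names are hypothetical.

```python
def is_foreground(window):
    """True when the 3 x 3 window contains more than two nonzero entries."""
    nonzero = sum(1 for row in window for value in row if value != 0)
    # Assumption: a count of exactly 2 is treated as background.
    return nonzero > 2


def foreground_positions(coeffs):
    """Interior (row, col) positions of a sparse matrix whose 3 x 3 window is foreground."""
    positions = []
    for r in range(1, len(coeffs) - 1):
        for c in range(1, len(coeffs[0]) - 1):
            window = [row[c - 1:c + 2] for row in coeffs[r - 1:r + 2]]
            if is_foreground(window):
                positions.append((r, c))
    return positions
```

LBP codes would then be computed only at the returned positions, which is where the reduction in the number of processed pixels comes from.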

Journal of Healthcare Engineering
Nonzero pixels provide actual shape of mass and are taken for LBP computations. Graphical representation of proposed algorithm for LBP descriptor computation using foreground pixels has been given in Figure 7 and the algorithm has been described in Algorithm 2.

The Fast Discrete Curvelet Transform (FDCT).
The authors of [44] introduced the computationally simple and efficient Fast Discrete Curvelet Transform (FDCT). We have preferred the wrapping-based FDCT approach in the proposed work, as it is faster.
The curvelet coefficients C^D(j, l, k) are indexed by scale j, angle l, and spatial location k. Figure 8 illustrates LBP code computation based on sparse curvelet coefficients. Each ROI is decomposed using the curvelet transform with 16 orientations and 2 scales, as the database contains ROIs as small as 25 × 22 pixels. This decomposition produces 1 + 16 = 17 different subbands. Further, the coefficients of each curvelet subband are represented in a lookup table using a 3 × 3 sliding window, and if a row in the lookup table identifies a foreground coefficient, then LBP is computed with radius R = 1 and P = 8 neighboring pixels as shown in Algorithm 2; a total of 58 LBP features are obtained from the foreground coefficients of each curvelet subband. Therefore, a total of 17 × 58 = 986 LBP features are extracted from the 17 curvelet subbands. As can be observed from Figure 8, the curvelet subbands also provide the shape of the mass in 16 different directions, so that directional information can be associated with the LBP features. Kanadam et al. [3] used the concept of sparse ROI; similarly, we have extended it to sparse curvelet subbands and LBP feature computation.
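The feature-count arithmetic (17 subbands × 58 features = 986) can be made concrete with a short sketch. It assumes that the 58 features of a subband are a histogram over the 58 uniform LBP codes and that non-uniform codes are discarded; the text does not confirm either detail. The curvelet decomposition itself is not shown, so the per-subband LBP codes are taken as given, and all names are hypothetical.

```python
def uniform_codes(bits=8):
    """All bit patterns with at most two circular 0/1 transitions, in ascending order."""
    codes = []
    for code in range(1 << bits):
        rotated = (code >> 1) | ((code & 1) << (bits - 1))
        if bin(code ^ rotated).count("1") <= 2:
            codes.append(code)
    return codes


# Map each uniform code to its histogram bin (58 bins for 8 bits).
BIN_OF = {code: i for i, code in enumerate(uniform_codes())}


def subband_histogram(codes):
    """58-bin histogram of the uniform LBP codes found in one subband."""
    hist = [0] * len(BIN_OF)
    for code in codes:
        if code in BIN_OF:  # assumption: non-uniform codes are dropped
            hist[BIN_OF[code]] += 1
    return hist


def roi_feature_vector(codes_per_subband):
    """Concatenate the per-subband histograms into one feature vector for the ROI."""
    features = []
    for codes in codes_per_subband:
        features.extend(subband_histogram(codes))
    return features
```

With 17 subbands, `roi_feature_vector` returns a vector of length 986, which is the size of the ANN input layer mentioned in the classification step.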

Classification.
In this work, we have analyzed the ROIs extracted from mammograms using normal-abnormal, benign-malignant, and normal-malignant classes with ANN, SVM, and KNN classifiers. A detailed description of the ANN classifier is given in [45,46]. To evaluate the performance of the proposed system, we have used 3-fold cross validation, in which the database is randomly divided into three sets and accuracy is calculated for each set. The final accuracy of the system is the average of the accuracies of the three sets. However, it would not be fair to compare the 3-fold cross validation results of the SVM and KNN classifiers with ANN, because the ANN classifier is tested on only one set of images (33% for training, 33% for testing, and 33% for validation). Thus, for a fair comparison, we have trained the ANN with an input layer of 986 neurons over three different sets (the same sets considered for SVM and KNN) and calculated its average accuracy. Our proposed false-positive reduction algorithm is illustrated in Figures 9(a)-9(c). Algorithm 3 summarizes the flow of the proposed method for FP reduction in mammograms.
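The 3-fold evaluation protocol can be sketched generically as follows. The classifier is passed in as a callback, `train_and_score`, which is a hypothetical stand-in for training an ANN, SVM, or KNN model and returning its accuracy on the held-out fold; the shuffling seed and the way samples are dealt into folds are likewise assumptions of this sketch.

```python
import random


def three_fold_indices(n_samples, seed=0):
    """Randomly split sample indices into three nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::3] for k in range(3)]


def cross_validate(samples, labels, train_and_score, seed=0):
    """Average score over three folds; each fold is held out once for testing."""
    folds = three_fold_indices(len(samples), seed)
    scores = []
    for k in range(3):
        test_idx = folds[k]
        train_idx = [i for j in range(3) if j != k for i in folds[j]]
        scores.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    return sum(scores) / 3.0
```

Using the same `seed` for every classifier keeps the three sets identical across ANN, SVM, and KNN, which is the condition for the fair comparison described above.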

Experimental Results and Discussions
The proposed method has been tested and validated using three classifiers and three clinical mammographic image datasets.
The mini-MIAS [17] database consists of 322 mammograms, each of 1024 × 1024 pixels and annotated with the character of the background tissue, class, severity, center of abnormality, and radius of the circle enclosing the abnormality. This database includes 64 benign, 51 malignant, and 207 normal cases, all of which have been used for experimentation.

Digital Database for Screening Mammography (DDSM).
The DDSM [19] dataset consists of 2500 studies and is composed of cranial-caudal (CC) and mediolateral-oblique (MLO) views of the left and right breast, annotated with ACR breast density, type of abnormality, and ground truth. A random selection of 150 abnormal and 100 normal cases from both the HOWTEK and LUMISYS scanners, at a resolution of 12 bits per pixel, has been used for experimentation.

Segmentation Evaluation and ROI Extraction.
A region detected by SOM segmentation is considered a TP if it is a suspicious mass region and an FP if it is a nonmass region. From  depends upon the shape of the ROI as per the sparse matrix. Tables 2 and 3 do not represent the exact reduction in pixels for the complete database, but they exhibit the pixel reduction for sample mammograms.

Classifier Evaluation and False-Positive Reduction.
From Figures 10-13
(1) Load input image (img1)
(2) Apply CLAHE algorithm and obtain enhanced image (img2)
(3) Process img1 and img2 and obtain enhanced image using procedure given in Algorithm 1
(4) Remove pectoral muscle using proposed approach (Section 3.1.2)
(5) Extract neighbourhood features for each pixel and apply SOM clustering
(6) Obtain clustered image and separate out the tumorous cluster
(7) Extract detected regions, i.e., ROIs, from clustered result
(8) Extract Sparse Curvelet Coefficients (Subband) up to 2 levels from each ROI
(9) Extract Sparse LBP code for each subband and obtain a combined feature vector for each ROI
(10) Classify each ROI into tumorous and nontumorous class, i.e., TP and FP, respectively
(11) Map each TP region on original mammogram (img1)
(12) end
ALGORITHM 3: Summary of proposed method for FP reduction in mammograms.
Data augmentation has been used for some classes to maintain balance between the two classes, to improve performance, and to learn a more powerful model. Table 4    Similarly, Table 5 shows the reduction in FPs as 0.85 to 0.01 FP/image for MIAS, 4.81 to 0.03 FP/image for DDSM, and 2.32 to 0.00 FP/image for TMCH using sparse curvelet coefficient-based LBP features. The results show the effectiveness of sparse curvelet coefficient-based LBP and ANN. From Table 6, the best value of AUC = 0.99 is obtained in benign versus malignant classification for MIAS, AUC = 0.98 in benign versus malignant in case of   The worst performance of AUC = 0.53 for MIAS is obtained with the proposed algorithm using the KNN classifier, as shown in Table 7.
Similarly, from Table 7, the best value of AUC = 0.98 is obtained for TMCH: Scanner1, AUC = 1 is obtained for the TMCH: Scanner2 database for normal versus malignant classification, AUC = 0.98 is attained for benign versus malignant classification on the MIAS database, and AUC = 0.98 is achieved for normal versus malignant classification on the DDSM database using the ANN classifier with sparse curvelet subband-based LBP features.
However, from Table 7, it should be noted that the performance of the proposed algorithm is the best using the ANN classifier. Figure 14 shows the automated CAD system for breast cancer diagnosis with sample mammograms. Table 8 provides a comparative study of methods developed for breast tissue classification. The proposed method provides the best results in terms of AUC and reduction of the number of FPs: 0.85 to 0.01 FP/image for MIAS, 4.81 to 0.03 FP/image for DDSM, and 2.32 to 0.00 FP/image for TMCH. The earlier reported work uses a fixed patch size-based approach, which limits the scope of an automatic CAD system, whereas the proposed system provides a complete solution, from automatic tumor patch segmentation to reduction in FPs and final representation of the mammogram with the TP marked on it. It will drastically reduce the radiologist's work by locating the tumor directly on the mammogram.

Conclusion
A fully automatic CAD system, which can accurately locate the tumor on a mammogram and reduce FPs, has been proposed. The developed CAD system consists of preprocessing, SOM clustering, ROI extraction,    classifier. The performance of LBP features and of LBP features based on sparse curvelet coefficients is nearly the same, which shows that the proposed algorithm is suitable for breast cancer tissue diagnosis. In future, the reduced curvelet coefficients can be used to extract local ternary patterns, local directional patterns, and other local descriptors. The present work deals with mammograms containing a single mass; it can be further extended to multiple-mass models with multiple LBP features based on sparse curvelet coefficients.

Data Availability
In this research, we have used two publicly available datasets, MIAS and DDSM. These datasets can be found in [17] and [19]. The third database was collected from the local Tata Memorial Cancer Hospital, Mumbai; it can be found at http://eureka.sveri.ac.in/ or is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.