Breast Mass Detection in Digital Mammogram Based on Gestalt Psychology

Inspired by gestalt psychology, we combine human cognitive characteristics with knowledge of radiologists in medical image analysis. In this paper, a novel framework is proposed to detect breast masses in digitized mammograms. It can be divided into three modules: sensation integration, semantic integration, and verification. After analyzing the progress of radiologist's mammography screening, a series of visual rules based on the morphological characteristics of breast masses are presented and quantified by mathematical methods. The framework can be seen as an effective trade-off between bottom-up sensation and top-down recognition methods. This is a new exploratory method for the automatic detection of lesions. The experiments are performed on Mammographic Image Analysis Society (MIAS) and Digital Database for Screening Mammography (DDSM) data sets. The sensitivity reached to 92% at 1.94 false positive per image (FPI) on MIAS and 93.84% at 2.21 FPI on DDSM. Our framework has achieved a better performance compared with other algorithms.


Introduction
Breast cancer is responsible for 23% of all cancer cases and 14% of cancer-related deaths amongst women worldwide [1]. Mammography is a reliable and trustworthy tool for early detection of breast cancer [2,3]. Early detection of potential abnormalities could generate a recommendation for further examination [4]. Current progress has shown that computer-aided detection (CAD) systems can assist doctors in finding breast masses from digitized mammograms at an early stage, which greatly improves doctor's working efficiency [5]. Efficient CAD systems have potential to reduce the breast biopsies and release radiologists from heavy workload [6,7].
However, CAD for breast mass detection is a challenging task due to the varying sizes, shapes, and appearances as well as densities of masses [8,9]. Conventional methods for breast mass detection mainly rely on the threshold values [10] or mass templates [11] based on various kinds of filter operators. However, the false-positive results are still very high [12,13]. e threshold methods based on gray-level images or various filtered images consider only the simple brightness of masses. Although there are ongoing research studies trying to model templates that use the general geometric properties of the masses [11], these are always complex, multiparameter models which are not applicable to all masses with various sizes and shapes.
In order to cope with the problems above, some researchers get inspirations from the doctors' detection process. ey use the visual salience to locate the suspicious lesions [14,15]. Visual saliency models human beings' ability that perceives salient features in an image. In computer vision, these visual saliency-based methods compute probabilistic maps of an image where the pixels are very different from surrounding regions [16]. ese methods greatly simplify the process of mass detection. For example, Tourassi et al. [14] proposed a novel similarity measure by incorporating the Gaussian salient map of image pixels. Agrawal et al. [16] proposed an automatic mass detection algorithm using the graph-based vision saliency (GBVS) map. However, most of these methods are derived from natural scene statistics while the characteristics of medical images are different. In fact, they only consider the size and brightness of suspicious regions, which is not enough for mass detection, as shown in Figure 1. Some masses are so small or they adhere tightly to glandular tissue. e visual significance is relatively low, resulting in a high false positive.
At present, deep learning [18] has been shown to be consistently producing higher performance compared with traditional machine learning methods [19]. Directly distilling information from training samples and convolutional neural networks (CNNs) [20] have been successfully applied to some medical tasks: for example, breast mass detection/diagnosis [21,22], segmentation of the left ventricle [23], and classification of skin cancer [19]. Dhungel et al. [24] presented an automated mass detection method using a cascade of deep learning and random forest classifiers. Kooi et al. [25] have shown that a CNN trained on a large data set of around 45,000 images outperforms a state-of-the-art system in CAD. Most of these methods work well on large data sets but exhibit certain limitations on small data sets because they need to decide a large number of parameters [26]. erefore, the traditional machine learning method is still valuable in some aspects, like data mining based on small samples [27], integration of multiple knowledge [28], and so on.
Studies have shown that the recognition of doctors plays an important role in lesion detection of radiology [29,30]. Gestalt psychology tries to understand the laws of our ability to acquire and maintain meaningful perceptions in an apparently chaotic world [31]. Meanwhile, the theory has been validated on extensive experiments, which are performed on neatly organized dot lattices on a screen. ese dots share many similarities with pixels in medical images [32,33].
Hence, we consider incorporating visual perception properties described by the Gestalt psychology framework into mass detection. Inspired by Gestalt psychology, the Gestalt framework covers sensation (bottom-up) and perception (top-down), which are also collectively called recognition [34]. e theory aims to formulate visual rules according to which perceptual input is organized into unitary forms. e Gestalt theory includes the following principles: proximity, similarity, continuity, symmetry, closure, simplicity, and so on [34]. ese visual rules can be used to help doctors to distinguish cancer masses from normal tissues.
Inspired by the framework of Gestalt theory, we propose to apply visual rules to medical image analysis. More exactly, we present an automatic mass detection framework based on Gestalt psychology. It contains three modules: sensation integration, semantic integration, and validation. In each module, a series of mathematical and calculation models for visual rules are presented. e proposed automatic mass detection method integrates human cognition properties and the visual characteristics of breast masses. To the best of our knowledge, combining bottomup sensation and top-down recognition of the radiologist has not been attempted before. Experimental results demonstrated that the proposed method has yielded better performance than other algorithms.

Mass Detection Framework Inspired by Gestalt Psychology
In this paper, we propose to incorporate the visual perception properties into breast mass detection. First, the characteristics of mammogram reading by radiologists are analyzed as per Gestalt psychology. Second, a framework for automatic detection of masses is proposed. All visual rules in the framework are quantified through mathematical methods.

Analysis of Mammogram Screening under the Gestalt
Framework. In most cases, screening mammogram is a process of discovery, detection, and diagnosis by the radiologist [35]. e diversity of mammogram tissues brings many problems to mass detection. Radiologists are professional analysts who spend thousands of hours refining their abilities of detecting lesions in medical images. ey gain a lot of experience in the learning process. According to Gestalt psychology research, radiologists read the images with the eyes and the brain. e visual rule plays an important role in recognizing masses from mammogram screening.
In clinical practice, radiologists tend to analyze medical images from overall impression down to individual parts for single-read mammography (or top-down), as shown in Figure 2. At the beginning, radiologists go through the mammogram and then focus on the highly suspicious areas. In vision psychology, the eyes can only accept a small number of associated units. If a Gestalt framework contains too many unrelated units, the eyes try to simplify it and combine the units into a big unit that can be processed easily. at is, our brain tends to combine and simplify the units [31]. en, all the suspicious areas (called regions of interest, ROIs) will be further analyzed to get the masses. Many factors need to be considered by radiologists, such as morphology, density, and correlation with surrounding tissue. Generally speaking, a mass is a generic term indicating a localized swelling protuberance or lump in the breast [11].

e Framework of Automatic Mass Detection Based on
Gestalt Framework. Inspired by the clinical practice, a mass detection method based on Gestalt framework is proposed in this paper ( Figure 3). We divide the framework into three stages including sensation integration, semantic integration, and verification. It can be viewed as a combination of bottom-up sensation and top-down recognition methods. For each part, there are various visual rules based on Gestalt psychology and morphological characteristics.

Stage 1: Sensation Integration.
In the initial stage, the visual sense of the radiologist plays an important role in mass detection. e attention is a process of selecting and getting visual information from pixels of the image (bottom-up) [36]. e principle of figureground segregation is one of the basic cognitive principles [37]. When applied to a mammogram, this principle supposes that the background does not contain any valuable information and is neglected by radiologists, obviously a simplified treatment. As a result, some existing methods propose to separate the figure from the background [38]. e  mammograms have low contrast and still have noise in the background such as tape markings and labels (as shown in Figure 3). We use an adaptive global threshold to compute the outline of the breast region [39]. Based on the morphological analysis, an image enhancement method is adopted [40], which can effectively suppress the background and enhance the features of masses on mammograms simultaneously. Meanwhile, the pectoral muscle is removed [41].
(2) Visual Attention in the Medical Image. As we all know, the mass areas are always "brighter" than the surrounding areas. at is, the highlighted region attracts more visual attention than the background region when doctors browse the mammogram. It is called prominence in Gestalt psychology. A simple method is to predefine a threshold value for a breast image. However, this approach is unfavorable as there is a large variance between tissues in mammogram. Following [42], opening operation is adopted to find the focal area in the mammogram: where ∘ is the morphological opening operation, ⊗ is the erode operation, and ⊕ is the dilation operation. e morphological opening operation consists of two steps in our method. Firstly, the original image I is eroded (⊗) with the structural element Φ. Φ is created by a flat disk-shaped structuring element with the specified radius of 6 pixels in the experiments. Secondly, dilation (⊕) is performed on the eroded image to produce the final reconstructed image (I Φ ). And then, we perform regional maximum on the reconstructed image, which retrieves all the salient regions without overselecting any of them.
(3) Visual Patches Generation. Gestalt theory aims to formulate some rules according to which the perceptual input is organized to unitary forms such as wholes, groups, or gestalt. e most common method is to group the similar or proximate object together. Inspired by the concept of "superpixel" [43], the basic processing units (visual patches) are generated by using our previous work [44]. In [44], the abdominal computed tomography (CT) image is divided into many visual patches as per the law of similarity evaluated by both intensity and spatial distance. Now, the proposed method is applied on mammograms ( Figure 4). Visual patches are generated by clustering pixels based on both intensity similarity and spatial proximity. Firstly, K cluster centers are set to divide the image into several rectangular patches. en, we use the following similarity index D s (2) to cluster pixels in mammograms: where d xy is the spatial proximity, which is calculated by Euclidean distance on image plane, d g is the intensity distance, and μ is the parameter of the pixel compactness [44]. S is a constant which balances the spatial proximity in image gray space, which is set as 25 in the experiment. e generated visual patches act as the basic processing units in the mass detection. Visual patches are generated only in the salient positions which reduce the computation expense greatly as shown in Figure 4. ese visual patches are the basic processing and analysis units in our algorithm. Suppose we have M visual patches which meet the condition of visual attention. U � P 1 , P 2 , . . . , P M is the set of M visual patches in a mammogram, and we assume that all of these patches are candidate masses in the sensation phase. Meanwhile, we define another set N � ϕ to store the patches which is regarded as normal tissue.

Stage 2: Semantic Integration.
After the first stage (sensation integration), there are still many false-positive patches in U after the first stage (sensation integration). According to morphological characteristics of breast tissue, we introduce the semantic integration to further distinguish the masses patches from normal tissue. Kinds of semantic features of visual patches are used to help separate the mass from the normal tissue. e semantic integration can be regarded as a top-down recognition process [45].
Observation 2: Normal tissue is always rich in glands and has poorer or lower density than that of the mass region. In addition, the shape of masses tends to be round and oval [46,47]. Masses and glandular tissues have very different shapes. Following the Gestalt principle, we propose two rules based on morphological characteristics of masses, including densification and shape.
(4) Densification of Mass. Eltonsy et al. [11] showed that the growth of a mass disrupted the normal breast parenchyma structure and formed a focal activity area called "focal seeds" in our research. According to the Gestalt principle, the law of similarity indicates that elements are perceptually grouped together if they are similar. e focal seed is regarded as an entirety because it has a strong self-similarity as shown in Figure 5(a), whereas normal tissue sometimes has a poor densification and even forms some holes in the visual patch ( Figure 5(b)). e main reason is that normal tissue contains rich glands that affect the consistency of a given visual patch.
e visual patches of masses exhibit homogeneity, whose solidity values are very high [48]. Here, the densification is defined as where M(i, j) is the gray-level co-occurrence matrix of the visual patch P(i, j) and u and σ are the mean value and variance for each visual patch P, respectively. e bigger the Dens (densification) is, the greater the probability of it being a tumor; otherwise, it is more likely to be the normal tissue. erefore, a threshold T dens is assigned based on the results of statistical analysis. Each visual patch in the candidate set U is detected, and the false-positive rate of visual patches is reduced as shown in (4), where N is the set to store visual patches of normal tissue: (5) Shape of Mass. As we all know, an abundance of glands exist in the breast. ey are radically arranged from the center of nipple, like a crown, occupying a large part of the mammogram. e brightness of the gland is most similar to the tumor tissue, and this causes high false-positive results in various CAD systems [16]. But the shape of visual patches which contain glandular tissue is always a long strip as shown in Figure 6.
Comparatively, the visual patches of tumor tend to be round and oval [46] (Figure 5(a)). In addition, the focused visual patch is positioned only in the center area of mass rather than its edge. e main reason is that many masses have no obvious margins surrounded by glandular tissues. e continuity law of Gestalt states that elements of objects tend to be grouped together and integrated into perceptual wholes. In other words, the patches containing glandular tissue can be easily perceived with the distribution of continuity. In that case, the linear patches can be filtered from the candidate visual patches.
Here, we bring in the concept of eccentricity to restrict the shape of each visual patch, which is an important index in ellipse. An ellipse fitting algorithm is used for each visual Journal of Healthcare Engineering patch edge to get the eccentricity (Figure 6).

e elliptic equation is defined as
A ere are 6 parameters in (5), that is, A ′ , B ′ , C ′ , D ′ , E ′ , and F ′ . ey can be estimated by using the leastsquares method according to the edge points of each patch. en, the eccentricity is defined using the long and minor axis of the fitted ellipse: where a i and b i are margin axis and minor axis of the ellipse, respectively. E(P i ) measures the circularity of the visual patch P i and the range is from 0 to 1. Since we assume that the eccentricity value of a visual patch is significantly lower in tumors than that in normal tissues. A threshold T e is defined to reduce the false-positive rate: Based on the above rules, the number of candidate patches in set U is greatly reduced, while the sensitivity and specificity are very high which will be explained in detail in the experiment. Another advantage of our method is that the focused visual patches only lie in the center of tumors. It can lay the foundation for the further proceeding and analysis, such as mass segmentation and mass diagnosis. e importance of texture features is obvious for mass detection. In this research, we have extracted intensity and gray-level co-occurrence matrix (GLCM) features on the candidate visual patches, including N dimensions for the patch P i , F(P i ) � g 1 , g 2 , . . . , g N . Among these features, two of them are the first-order texture feature, which is the mean of gray value and gray variance describing visual patches. Ten features are calculated from the gray-level cooccurrence matrix, namely, contrast, correlation, energy, homogeneity, standard deviation, inverse difference movement, kurtosis, skewness, entropy, and root mean square. e candidate patches left in U serve as the seeds of region growing algorithm. ROIs are obtained by clustering these visual patches based on texture features F. e features of each ROI are represented by the average value of associated patches as shown in (8), where M j is the total number of visual patches in ROI j : Recently, extreme learning machine (ELM) has been extensively studied, and important progress has been made in both theories and practical applications. e ELM theory in [49] has proved that random feature mapping can provide universal approximation capability. e ELM has built some tangible links between machine learning techniques and biological learning mechanisms. It is an emerging learning algorithm for the generalized hidden layer feedforward neural network [49,50]. Here, the ELM is used to simulate the final decision of doctors. Furthermore, it classifies the ROIs into normal and abnormal cases based on the texture feature extraction.

Data Set and Parameter Setting.
Our proposed method is tested on two publicly available data sets: MIAS [17] and DDSM [51]. A set of 257 mammograms of MIAS is used in the research. Among these images, 207 images do not contain any lesions while other 50 images have masses. e spatial resolution of image in MIAS is 50 µm × 50 µm, and grayscale intensity is quantized to 8 bits. e DDSM data set contains 210 images, in which 130 images contain masses and the other ones are normal mammograms. e images of DDSM have been resized to 1024 × 1024 pixels, and grayscale intensity is quantized to 8 bits in accordance with images in MIAS. In both MIAS and DDSM data sets, the mammograms containing masses have been annotated by expert radiologists, which are used for reporting the detection performance in our experiments. e ELM classification method divides visual patches into mass and nonmass candidates using 10-fold cross validation.
In our research, a series of indexes are used to quantitatively evaluate the effectiveness of our method. e performance indexes include sensitivity (Sens), false positives per image (FPI), homogeneity (Dens), and free-response receiver operating characteristic (FROC). ese indexes are described below.
(a) Sens and FPI are computed as a region-based evaluation. If the ratio of the overlapping region of the ground truth and the computer-segmented mass region is larger than 50%, the region is considered as "True Positive Marks" or "Positive ROIs" [52]. Otherwise, it is considered as "False Positive Marks," which is also called "Negative ROIs" in our experiments. For computer-aided systems, we would like the value of sensitivity to be as high as possible.
Meanwhile, FPI should be low while guaranteeing high sensitivity [53].
(b) In Section 3.2, we use homogeneity (Dens) to characterize the distribution of visual patches, which is defined as (3). It is based on the fact that every visual patch is an independent processing unit that should be homogeneous as per Gestalt rules [44]. e value of Dens ranges from 0 to 1. When patches are uniform, the value of Dens tends to be 1, while for nonuniform patches, the value tends to be 0. (c) In Section 3.5, free-response receiver operating characteristic (FROC) [52] curve is used. e FROC curve is defined as the plot of sensitivity (Sens) versus the average number of false positives per image (FPI).
All numerical methods are performed using MATLAB 2012b software running on a desktop PC with a 2.50 GHZ CPU and 2G RAM. Different from data-driven algorithms like deep learning, our method does not need a large amount of data. e major reason is that it is designed based on human visual characteristic and radiologists' experience.
ere are all together three types of parameters in our method, which are (descending order of importance) the thresholds for visual rules, the parameters for generating visual patch, and some other parameters of the ELM classifier. e medical images from different hospitals or different apparatuses may be completely different.
us, the parameters of the method should be modified on different apparatuses or data sets. e thresholds for visual rules (medical image attention, densification of mass, and shape of mass) are crucial in the proposed method, which determine the number of suspicious regions, because three key parameters in the method are independent. By comparing the Sens and FPI in different thresholds, three parameters can be determined for new validation data. In addition, the parameters for visual patches generation and ELM classifier have less impact on detection performance than that for visual rules. If the input size of image is [M, N], then the initial size of the visual patch can be calculated by M * N/K, where K is the cluster parameter for visual patches generation. at is to say, the bigger the value of K is, the smaller the size of visual patches is, and vice versa. e initial size of the visual patch is preferably less than the size of the entire breast mass. In this manuscript, both M and N is 1024, and K is set as 2000. ELM has been extensively studied, and it shows a good convergence speed and stability [49]. In our research, the parameters of ELM do not require careful adjustment. All parameters used in the experiments are shown in Table 1.

e Effectiveness of Visual Patches.
e mass region patches on mammogram are called "positive patches," and the normal tissue ones are "negative patches" in the following content. Figure 7 shows the statistical histogram of homogeneity of visual patches both for negative and positive patches. It is obvious that the homogeneity of all patches is above 0.85, with positive patches having a homogeneity value above 0.9, center around the value 0.95. We can draw a conclusion that the use of visual patches ensures the semantic consistency of objects in ROIs, which lays a solid foundation for further research. Besides, the homogeneity of positive patches is a bit higher than that of negative patches, which supports the description of similarity rules defined in Section 2.2.2. As shown in Figure 7, the homogeneity distribution curves of positive and negative patches are similar to normal distribution. e two curves are distinguishable owing to the higher homogeneity of positive patches than that of the negative ones.

Mass Detection Performance.
In the proposed framework, a series of visual rules have been defined, and finally, all the salient patches are saved (in set U). We compare the performance of our framework to three existing visual saliency algorithms: Agrawal et al. [16] (graph-based vision saliency), Achanta and Süsstrunk [54] (maximum symmetric surround saliency), and Murray et al. [55] (saliency estimation using a nonparametric low-level vision model), as shown in Figure 8. In Figure 8, each row corresponds to the output of four algorithms, and the corresponding mammograms are shown in the first column. e detection results of three stages in the proposed method have been listed in Figure 8(d). It is obvious that our method outperforms other saliency algorithms for mammogram mass detection. At the stage of verification, the false-positive rate is further reduced and there is a bit impact on the mass region compared with normal tissue. We experimentally observe that Agrawal et al. [16] yielded a relatively good result that is in accordance with what is reported in [16]. However, it computes the saliency of a region with respect to its local neighborhood using the directional contrast. But it is invalid when the mass is surrounded by dense gland tissues as shown in the last row of Figure 8. However, the positive aspect of our method is that it combines visual cognitive theory with various morphological characteristics of masses. e advantages can be summarized as follows: (1) e detection method based on gestalt rules is able to detect masses of varying sizes, resulting in a low false-positive rate (the green region in Figure 8). (2) e salient visual patches of our method mostly lie in center of the ground truth regions. e detected results based on the proposed method can be used for further analysis, such as mass segmentation and diagnosis. Table 2 shows the overall performance of our proposed method on MIAS and DDSM data set. As shown in Table 2, the performances of three stages of our method are given. Masses are detected effectively, and Sens reaches 92% at 1.94 FPI and 93.84% at 2.21 on MIAS and DDSM data sets, respectively. As shown in Table 3, we can get extended statistic information on both the total number and percentage of patches and ROIs. Meanwhile, the performance curves are plotted in Figure 9. From Table 3 and Figure 9, the number of positive patches and positive ROIs remains largely unchanged, whereas the number of negative examples is greatly reduced as the detection stages are introduced. e performance curves show the similar change of detection performance for MIAS and DDSM. We can draw a conclusion that the positive visual patches can be preserved, and false-positive results are declined dramatically under the gestalt rules constraint.

Influence of the Breast Density for Mass Detection
Performance. In general, masses in low-density breast are easily detected than masses in high-density breast [56]. To further evaluate the ability of our method, mass images with different densities are separated to test on the MIAS and DDSM data sets. e results are summarized in Table 4.
ere are three types of densities for MIAS, that is, fatty (F), fatty-glandular (G), and dense-glandular (D). Different from MIAS, the images in DDSM data set are divided into four categories based on breast density, that is, 1, 2, 3, and 4. e quantity proportions of each subset with different densities are listed in the first row of Table 4. is table shows Sens and FPI of mass images with different densities. It shows that the algorithm has good performance and works well on different mass images. Looking at the MIAS results, the proposed method has superior performance on fatty (F) and fatty-glandular (G) breast images compared to the dense-glandular (D) images. Similarly, the method performs better in low-density (level 1 and level 2) images than highdensity (level 3 and level 4) images on DDSM. Masses in low-density images usually have distinctive visual features compared with the tissue around it. Conversely, some visual patches with high density may cause erroneous judgment at the final recognition stage. So, the false-positive rates would increase when the data set includes many breast images with high density. In this section, the sensitivities for all lesions are 92% at 1.94 FPI on MIAS and 93.84% at 2.21 FPI on DDSM.
A common method for evaluating true-positive detection is free-response receiver operating characteristic (FROC) analysis [57]. It is a plot of operating points showing a tradeoff between the sensitivity rate and the average number of false positives per image. e complete FROC curves of our method are presented in Figure 10. e blue and red curves denote the detection results on MIAS and DDSM, respectively. We can get a favorable detection result when a false positive reaches 2 per image on the two data sets.

Comparison of Experimental Results.
e mass saliency is introduced in the proposed framework. In order to evaluate the ability of our method, experiments were conducted with 10-fold cross validation on MIAS and DDSM.
e performance is compared with other popular algorithms in terms of Sens and FPI. It is clearly shown in Table 5 that the proposed method has equivalent or even better accuracy     [52,53,58], following the similar pipeline, include two parts: image preprocessing and suspicious mass regions identification with different adaptive thresholds. In contrast, a sliding window scheme is utilized in [59], and texture features are modeled by kernel-based supervised hashing to get the mass location. Different from the sliding window in [59], visual attention of radiologists is used in our method. In the end, our method not only utilizes the advantages of machine learning approaches, but the visual saliency of mass is also modeled which achieves significant improvement in reducing false positives and sensitivity.

Conclusion
In this paper, we have proposed an automatic mass detection framework for digitized mammograms. e main contributions of our research can be summarized as follows: (1) e visual characteristic of radiologists is modeled based on the Gestalt theory. (2) An automatic mass detection framework is proposed which is in accordance with the doctors' visual perception. Some constraints are defined such as density and shape of visual patches. ese parameters are probably used by experienced radiologists in detection/diagnosis masses and so on. (3) Our framework achieves a good performance both on MIAS and DDSM data sets.   Different from existing detection techniques, our methods use the visual patches as the basic processing unit. We focus on providing a more efficient and more innovative data analysis method for lesions detection than traditional methods. In our future work, more mammograms from different apparatuses will be tested to evaluate the performance of our proposal. We will further improve our method as per the feedback reports from more radiologists. Moreover, we would like to focus on expanding this research to the deep learning method.

Conflicts of Interest
e authors would like to confirm that no conflicts of interest, financial or others, exist.