A New Approach for Clustered MCs Classification with Sparse Features Learning and TWSVM

In digital mammograms, an early sign of breast cancer is the existence of microcalcification clusters (MCs), which is very important to the early breast cancer detection. In this paper, a new approach is proposed to classify and detect MCs. We formulate this classification problem as sparse feature learning based classification on behalf of the test samples with a set of training samples, which are also known as a “vocabulary” of visual parts. A visual information-rich vocabulary of training samples is manually built up from a set of samples, which include MCs parts and no-MCs parts. With the prior ground truth of MCs in mammograms, the sparse feature learning is acquired by the l P-regularized least square approach with the interior-point method. Then we designed the sparse feature learning based MCs classification algorithm using twin support vector machines (TWSVMs). To investigate its performance, the proposed method is applied to DDSM datasets and compared with support vector machines (SVMs) with the same dataset. Experiments have shown that performance of the proposed method is more efficient or better than the state-of-art methods.


Introduction
Breast cancer is the most common tumor disease in women, with the increasing incidences in recent years. And also, it is one of the major death causes among middle-aged women in the world. Currently, digital mammograms are one of the most reliable methods to perform the early diagnosis, which is very important for the effectiveness of treatment methods.
In digital mammograms, an important sign of the early breast cancer is the existence of MCs. They always exist in 30%-50% of mammographically diagnosed cases, which are present with tiny bright spots of different morphology. Microcalcifications are small calcifications with different shapes and densities, approximately 0.1-1 mm in diameter. Isolated microcalcifications are not dangerous, but a microcalcification cluster might be an early sign of breast cancer [1], which is a region including more than three microcalcifications per 5 mm × 5 mm.
However, there is only about 3% of useful information in mammograms, which can be seen by doctors with the naked eye. Due to the fact that most details in mammograms cannot be perceived by human eyes, it is even very difficult for a skillful radiologist to find the sign of early breast cancer, that is, MCs, as a result missing the best time for treatment. So, one of the key techniques for early diagnosis of the breast cancer is to detect MCs and to judge whether they are malignant or not in mammograms.
According to recent researches, there are several existing criteria to characterize the MCs shape properties. Among them, one of the well known is the category criterion proposed by Le Gal et al. [2], which illustrates five groups (shown in Figure 1). Different groups identify different kinds of MCs in the ascending order of the degree of malignancy. As shown in Figure 1, the first group describes the O-shaped calcifications and partially calcified ones, known as teacup calcifications. The second one includes regular and round calcifications with uniform density. The third is composed of calcifications with the same shape and smaller size than the second one. Class IV is also called salt shaped, which is irregular MCs related to the high degree of malignancy. Type V is also closely related to a very high degree of malignancy, which is called vermicular shaped.
Up till now, computer aided diagnosis (CAD) is still a useful tool in breast cancer detection to improve the accuracy of radiologists and to help radiologists to read mammogram films. It may provide good help to radiologists in interpreting 2 The Scientific World Journal V IV III II I Figure 1: Le Gal's MCs classification standards. Type I: annular; Type II: regularly punctiform; Type III: dusty; Type IV: irregularly punctiform; Type V: vermicular calcification. mammograms to detect MCs and classify them into malignancy or not. A large number of researchers in this field have been trying to find effective methods to automatically detect MCs and categorize them as normal, benign, or malignant.
However, how to successfully apply the mammography technology to detect breast cancer and design a breast cancer detection system greatly depends on the careful designing of the two important modules: feature selection and sample classification. A lot of well-established methods have been proposed to address this challenge problem. According to [30], these methods can be categorized into the following groups: (a) traditional methods, such as linear discriminant analysis (LDA), K-nearest neighbor (KNN), logistic regression (LR), and generalized partial least square (GPLS); (b) classification trees and aggregation methods, such as classification and regression tree (CART), bagging and boosting (BB), random forest (RF), and ensemble learning (EL); (c) machine learning based methods, such as neural network (NN), support vector machines (SVMs), and twin support vector machines (TWSVMs); and (d) generalized methods, such as flexible discriminant analysis (FDA), bias discriminant analysis (BDA), mixture discriminant analysis (MDA), and shrunken centroid method.
To detect the early sign of this disease and to aid doctors to diagnose breast cancer in early stage, a novel approach for MCs classification is proposed based on sparse feature learning and representation with TWSVMs, which is inspired by the recent progress in 1 -norm minimization-based approaches [31,32]. These approaches, such as compressive sensing for sparse signal reconstruction, basis pursuit denoising, and the Lesso algorithm for features selection, have been well developed. Inspired from the above well-established approaches, our approach presented in this paper is based on the belief that the key problem to finding a solution to the problem depends on learning the suitable representation.
Especially, to extract high-level and conceptual information, for example, the existence of an MC in a mammogram block, it is very important for us to convert the low-level input, such as the pixel value, to high-level and more meaningful representations. Through this transformation, the feature learning and detection process will be well constructed.
Ideally, a test example can be represented just from the training samples of the same category. Therefore, when the test sample is expressed as a linear combination of the entire training sample, the coefficient vector will be sparse. That is, there will be relatively few nonzero coefficients in the vector. Test samples from the same category will have a similar sparse representation, while test samples from different categories will lead to different sparse representations. So the sparse representation coefficients can be treated as the more meaningful and discriminant information for the samples classification. In order to get the sparse coefficient vector, we use 1 -regularized least square [33] to solve the problem.
To achieve a good performance for MCs detection, we designed two methods to achieve the goal of the detection system. The first one is the sparse discriminant analysis algorithm, which is achieved by computing the residuals of sparse coefficients of the test sample between the centroid sparse coefficients of training samples. As we have known traditional supervised learning methods always use a training procedure to create a classification model for testing. But the proposed sparse representation based approach does not contain the separate training and testing sections. We directly achieved the classification goal out of the testing samples' sparse representation according to the training samples. Another unique feature of the new method is that no model selection is needed. The second one is designed by the combination of sparse representation and the state-of-the-art classifier TWSVMs. We employ the sparse representation approach as a feature learning method in terms of the coefficient vector for samples feature extraction, and then we feed it with the The Scientific World Journal 3 trained TWSVMs to formulate the detection method as a supervised learning approach.
The paper is organized in five sections. Technology backgrounds of our approach are presented in Section 2; sparse representation based MCs detection algorithm and TWSVM based MCs detection with sparse representation are given in Section 3. Thereafter, the sparse representation based MCs detection algorithms are formulated, and experimental results are illustrated in Section 4 accordingly. Finally, conclusions are drawn in Section 5.

Technology Backgrounds
and ‖ ‖ 0 is minimized, where ‖ ‖ 0 represents the 0 -norm, which means that it is equal to the number of non-zero components in the vector c.
Suppose that we define a matrix by putting x as the th column of A = [x 1 , x 2 , . . . , x ]; we can convert the problem of sparse representation into c = min c ∈ c 0 subject to y = Ac. ( How to get the close solution of the sparse representation problem is NP-hard, because it is a combinational optimization. If we replace the 0 -norm in (2) with -norm, an approximation solution can be gotten. Thus, where the -norm of a vector u is defined as ‖u‖ = (∑ |u | ) 1/ . A generalized version of (3), which allows for certain degree of noise, is defined to find a vector c, when the following objective function is minimized: where the scalar regularization is a positive parameter, which balances the trade-off between sparsity and reconstruction error.
Recently, development in the theory of compressed sensing and sparse representation reveals that, if the solution of (2) is sparse enough, the solution of the 0 -minimization problem is equal to the solution of the following 1minimization problem [33], which takes = 1 in (4): We can solve the problem in polynomial time by quadratic programming or standard linear programming methods. If the solution is very sparse, there will be more efficient methods to solve this problem.

Twin Support Vector Machines.
Twin support vector machines (TWSVMs) are a new binary data classifier proposed by Jayadeva et al. [34], which aims at obtaining two nonparallel planes close to two nonparallel planes such that each plane is closer to one of the two classes and is as far as possible from the other. There are two quadratic programming problems (QPPs) to be solved in the TWSVMs. But each QPP is of a smaller size instead of a large one as we have in the traditional SVMs. So, to some extent, TWSVMs work much faster than the stand SVMs facing the same classification problem; that is, it is more efficient According to (6) we can solve the following two QPPs to obtain the TWSVM classifier: where e 1 and e 2 are vectors of ones with proper dimensions and 1 , 2 > 0 are scalar parameters. The first term, in the objective function of (6) or (7), is defined by the sum of squared distances from the points of each class to the hyper plane. So, minimizing the objective function tends to keep the hyper plane close to points which belong to one class (say class +1). Meanwhile, the constraints request of the hyper plane to be a distance of no less than 1 to the points belonging to the other class (say class −1). In the model, we use a set of error variables to measure the computing error, whenever the distance to the hyper plane is less than 1. In (7), the objective function aims at minimizing the total error variables, so as to attempt to minimize the misclassification with respect to the points belonging to class −1.
TWSVMs are composed of two QPPs. And the objective function in each QPP, corresponding to a particular class and constrains, is determined by the patterns belonging to the other class. In TWSVM1, patterns of class +1 are clustered near the hyper plane x w (1) + (1) = 0. Similarly, in TWSVM2, patterns of class −1 are clustered around the hyper plane x w (2) + (2) = 0. The Lagrange equation corresponding to TWSVM1 is given by (w (1) , (1) , q, , ) where H H is positive semidefinite. With the K.K.T. conditions and the Lagrange equation of problem TWSVM1, we can get the Wolfe dual of TWSVM1 as follows: Similarly, if we consider TWSVM2; then we can also obtain its dual as (DTWSVM2) And we define the augmented k = [w (2) ; (2) ] as If we get the vector u and k from (10) and (13), the hyper planes can be obtained. Suppose that we have a new sample x ∈ , which is assigned to class ( = 1, 2). The data belongs to which category will be determined by the closer plane to the sample; that is, where | ⋅ | represents a vertical distance of the point x to the plane x w ( ) + ( ) = 0, = 1, 2. From the K.K.T. conditions, we can observe that patterns lie on the hyper plane given by x w (1) + (1) = 0 of class −1 given 0 < < 1 ( = 1, 2, . . . , 2 ), and such patterns of class −1 are called support vectors of class 1 according to class −1 as they play a key role when we determine the required hyper plane. A similar observation can be gotten from the problem TWSVM2. From the example shown in Figure 2, one can find the difference between TWSVMs and traditional SVMs.

Database and Evaluation
Metrics. The digital database for screening mammography (DDSM) [35] was built by the University of South Florida, which is available for research at [36]. In our experiments, all images manually selected from this database are intensity images, digitized at 43.5 m/pixel and 12-bit gray scale. To evaluate our methods, we totally selected a set of 267 images from the DDSM, which are all clinical mammograms, to form an evaluation database.
To summarize quantitatively the performance of the proposed method, we used receiver operating characteristic (ROC) curves [37]. Receiver operating characteristic analysis is based on statistical decision theory, which is a commonly used criterion for classification performance measure. We can get a ROC curve by figuring the plotting of the classifier's sensitivity (also known as true positive classification rate) as a function of the classifier's specificity (also known as false positive rate). Sensitivity is a probability of correctly classifying a target object. Specificity is a probability of incorrectly classifying a non-target object. The area under the ROC curve (Az) is defined as an accepted way to compare the classifiers performance. Higher Az would characterize the greater discrimination capacity. A good classifier should have a true positive rate of 1.0 (or 100%) and the false positive rate of 0.0 accordingly with respect to an Az of 1.0.

MCs Classification Based on Sparse
Representation. Ideally, the nonzero entries in the estimated c will be associated with all the columns in A from a single category. So we can easily designate the test image y to that category. However, because of the existence of noise, the nonzero entries sometimes may be related to multiple categories. A lot of the-stateof-art classifiers can resolve the problem. For example, we can simply designate y to the category with the largest entry of c. However, such heuristics cannot model the structure of the subspace associated with MCs blocks.
To better model the structure, we classify y based on how much the coefficients relate to the training sample of each category reproducing y alternatively. For each category we define the corresponding characteristic function : R → R to select the coefficients associated with the th category. If ∈ R , ( ) ∈ R is a new vector whose nonzero entries are the entries in that belongs to the category and whose entries associated with all the other subjects are zero or very close to zero. The classification algorithm can be summarized as follows.

MCs Classification Approach with Twin Support Machines and Sparse Representation (TWSVMs-SR).
Given a set of mammogram blocks, each block is transformed and represented in terms of the sparse coefficients with respect to the parts from the vocabulary constructed in the image sparse representation and learning stage. Each block is then transformed into a vector and represented as a sparse feature vector based on the vocabulary parts (the negative and positive samples in the sparse learning set). Then we can use the learned sparse coefficients as each block's sparse feature. So we consider the sparse representation approach as a feature extraction method in terms of the coefficient vector, and then we feed it with the state-of-the-art classifier (TWSVMs) to formulate the detection method as a supervised learning approach.
If we have a set of training blocks labeled as positive (MCs) or negative (non-MCs), each image is represented as a feature vector using the method described above. These feature vectors are then fed as inputs to TWSVMs, which learns to classify an image block as a member or not a member of the object category, under some associated confidence. When we get the learned TWSVMs model, we can use the learned classifier as a reliable detector to perform the detection task.

Results and Discussion
Up till now, we have illustrated our new methods to detect MCs in mammograms. In this section, we will evaluate the performance of our methods by using the real mammogram data obtained from DDSM. In our experiments, we use the training, test, and validation sets, which were randomly selected from the preprocessed image blocks. The blocks included 3000 with true MCs and 3000 with normal tissue. We chose 75% of the blocks for training and 25% for test.
For a given digital mammogram, we formulate the MCs detection and classification approach as the following steps.
Step 1. We first preprocess the mammogram to remove the artifacts, suppress inhomogeneity of the image background, and enhance microcalcifications in the breast area.
Step 2. At each pixel location of the image, we manually select a small window (x = × , where = 115) to describe its surrounding image feature.
Step 3. Get the sparse representation of each image block by using the proposed methods.
Step 4. Apply the proposed MCs detection methods to decide whether x belongs to MCs or not.
In our experiments, they are designed to quantitatively verify the performance of sparse representation based methods for MCs detection and classification by using mammograms. To get an accurate performance measure in this study, a stratified 5-fold cross validation method is employed. We also compared our approaches with the state-of-the-art algorithm, SVMs, which have been successfully applied in MCs detection.
All the experiments are performed on a notebook computer with DUO Intel 2.54 G CPU and 4 G memory under Windows 7. MATLAB 2011 is used to implement sparse representation based MCs detection methods. To get the sparse representation of each image block, we employed the 1 MATLAB package to achieve the optimization. 1 is a toolbox in MATLAB implementation of the interior-point method for 1 -regularized least squares solution.
Before we performed the MCs detection methods, we first used the 1 package to learn the sparse transform matrix with the training dataset. The trained sparse transform matrix is evaluated using all the mammograms in the test dataset. Because TWSVMs do not vary significantly over a wide range of parameter settings, we chose the fixed parameters of TWSVMs, having an RBF kernel with = 15 and 1 = 2 = 1000 in our experiments.
First, we did the experiments using MCs-SRC algorithm to perform the MCs detection and classification. The test results are summarized with ROC curves in Figure 3 for the MCs-SRC method. For comparison, ROC curve is also shown for the sparse representation based TWSVMs and SVMs methods with the same inputs.
From Figure 3, it can be shown that the proposed algorithm has a higher detection accuracy rate compared with SVMs and TWSVMs with the same dataset and configuration. By using the same test samples, compared with SVMs and TWSVMs, the proposed method has a better detection performance when we train the classifier. In particular, the proposed method achieved the averaged sensitivity of approximately 92.17% with respect to 7.83% false positive rate and Az = 0.9507. With the same training data set and test data set, the TWSVMs classifier achieved a sensitivity of 90.01%, 9.63% false positive rate, and Az = 0.9459, and SVMs achieved a sensitivity of 87.13%, 9.88% false positive rate, and Az = 0.9298.
To evaluate the stability of our methods, we repeat the sampling 50 times so that we can compute the mean and standard deviation of the detection accuracy, sensitivity, and specificity. We perform the detection and classification task in 20 rounds, and in each round we randomly select training samples from 95% of training samples to 5% to train classifiers. The trained classifiers are evaluated using the other 500 test samples. Average experimental results of the MCs-SRC method, compared with SVMs and TWSVMs, are shown in Table 1.
From Table 1, we can see that the MCs classification with Twin Support Machines and sparse representation (TWSVMs-SR) approach has a bit better performance than the other two. But as we have known, the MCs-SRC method does not need to train a classifier, because the features are acquired by learning, so it will get the classification more efficiently.

Conclusions
In this paper, a novel approach is described to aid breast cancer detection and classification using digital mammograms. The proposed method is based on sparse feature learning and representation, which expresses a testing sample as a linear combination of the built vocabulary (training samples). The sparse coefficient vector is obtained by using