Accurate classification of hepatocellular carcinoma (HCC) images is of great importance in pathological diagnosis and treatment. This paper proposes a concave-convex variation (CCV) method to optimize three classifiers (random forest, support vector machine, and extreme learning machine) for more accurate HCC image classification. First, in the preprocessing stage, hematoxylin-eosin (H&E) pathological images are enhanced using a bilateral filter and each HCC image patch is obtained under the guidance of pathologists. Then, after extracting the complete features of each patch, a new sparse contribution (SC) feature selection model is established to select the features beneficial to each classifier. Finally, a concave-convex variation method is developed to improve the performance of the classifiers. Experiments on 1260 HCC image patches demonstrate that the proposed CCV classifiers improve greatly on each original classifier and that CCV-random forest (CCV-RF) performs best for HCC image recognition.
Liver cancer is a malignant tumor with high morbidity and poses a long-term challenge to human health. Research demonstrates that early diagnosis can greatly reduce the incidence of cancer. Computer-aided diagnosis is important for early diagnosis and can largely improve the accuracy of cancer diagnosis. With the rapid development of machine learning, many algorithms are available for image classification, such as
Recently, various classification models have been developed for pathological image classification. The paper [
However, accurate recognition of cells in pathological images using hand-crafted features with a classifier model remains a challenging task because of two main issues. One is that redundant features not only degrade classification results but also increase computing cost; the other is how to optimize classifiers in a new way for more accurate results. To solve these two problems, this paper proposes a concave-convex variation (CCV) method to optimize three classifiers (random forest, support vector machine, and extreme learning machine) for more accurate HCC image classification. The main contributions of this paper are as follows. First, a new sparse contribution (SC) feature selection method is proposed to remove redundant and low-contribution features. Second, the CCV model computes weights according to the geometrical characteristics of the cell nucleus and uses these weights to optimize the three classifiers.
The remainder of the paper is organized as follows. In Section
Cell image features directly influence the performance of a classifier and are generally divided into three groups: intensity, morphology, and texture. The three groups of features are shown in Table
Nuclei features used in histopathology.
Category | Features |
---|---|
Intensity | Density, mean, median, variance, kurtosis, skewness, and so on |
Morphology | Area, perimeter, diameter, area overlap ratio, center of mass, minor axis, major axis, smoothness, symmetry, and so on |
Texture | Gray level cooccurrence matrix, local binary pattern, scale-invariant feature transform, Tamura, fractal, Markov random field, wavelets, Haar-like features, Gabor, run-length, and so on |
The flow of random forest.
RF increases the diversity among classification models by constructing different training sets. After training, there is a sequence of classification models
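The construction of different training sets per tree is typically done by bootstrap sampling (bagging). The sketch below is a minimal illustration of that step, not the paper's exact implementation; the function names are our own.

```python
import random

def bootstrap_sample(data, labels, rng):
    """Draw one bootstrap training set: n samples drawn with replacement."""
    n = len(data)
    idx = [rng.randrange(n) for _ in range(n)]
    return [data[i] for i in idx], [labels[i] for i in idx]

def build_training_sets(data, labels, n_trees, seed=0):
    """One resampled training set per tree increases diversity in the ensemble."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, labels, rng) for _ in range(n_trees)]
```

Each of the `n_trees` classifiers is then trained on its own resampled set, which is what makes the individual trees decorrelated.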
Hematoxylin and eosin (H&E) staining is ubiquitous in pathology practice and research, which facilitates pathologist interpretation of microscopic slides by enhancing the contrast between cell nuclei and other histological structures. This allows pathologists to visually identify cellular components, extracellular structures, and lumen with relative ease [
As shown in Figure
Our proposed classification model flowchart.
In the training stage, the whole slide images (WSI) are first preprocessed using bilateral filtering. Then, normal and HCC cells are selected from the WSI under the guidance of a pathologist. Next, single cell images are obtained by cell segmentation with a size of
During the testing stage, similarly, the testing images are preprocessed with bilateral filtering. Next, gray-scale and binary images containing the segmented cells are obtained. Features are extracted and selected from the testing images as in the training stage. After that, initial classification results are acquired through the trained original classifiers. Finally, as the contribution of this paper, a more accurate result is calculated using the classifier optimized with the CCV model.
Feature extraction is an important step of image classification. As described in Section
The LBP texture feature is computed as follows. Divide the input image into cells with size of For every pixel in each cell, compare its gray value with those of its 8 neighboring pixels: if the center pixel's gray value is less than a neighboring pixel's gray value, mark that neighboring pixel as 1; otherwise, mark it as 0. Each Plot each cell's histogram by computing the probability of each code (interpreted as a decimal number) and then normalize the histogram. Obtain the LBP texture feature of the whole image by concatenating each cell's histogram into a feature vector.
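The per-cell part of this procedure can be sketched as follows. This is a minimal illustration assuming the thresholding rule stated above (center less than neighbor gives bit 1) and a clockwise bit ordering; the paper's exact cell size and bit order are not specified here.

```python
def lbp_histogram(img, rows, cols):
    """Normalized LBP histogram for one cell of a grayscale image.

    img: 2D list of gray values; border pixels are skipped.
    For each interior pixel, the 8 neighbors are compared to the center:
    center < neighbor -> bit 1, else 0, yielding an 8-bit code (0..255).
    """
    hist = [0] * 256
    count = 0
    # 8-neighborhood offsets, clockwise from the top-left neighbor
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offs):
                if img[r + dr][c + dc] > center:  # neighbor brighter -> 1
                    code |= 1 << bit
            hist[code] += 1
            count += 1
    return [h / count for h in hist]  # normalize to probabilities
```

Concatenating the histograms of all cells then yields the final LBP feature vector for the image.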
Following these feature extraction methods, this paper extracts a 463-dimensional feature vector for each patch, covering intensity, morphology, and texture. Table
The number of features in each category.
Intensity features | Morphology features | Texture features | Total features |
---|---|---|---|
21 | 185 | 257 | 463 |
After obtaining the complete features of HCC, feature selection is common practice. In recent years, some methods [
Let
After normalizing the data, a contrast mapping is performed to normalize the feature data of each row from
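A per-row linear mapping of this kind can be sketched as below. The target interval is assumed to be [0, 1] here, since the paper's exact range is not reproduced in this excerpt.

```python
def contrast_map_row(row, lo=0.0, hi=1.0):
    """Linearly map one feature row into [lo, hi]; constant rows map to lo."""
    mn, mx = min(row), max(row)
    if mx == mn:
        return [lo] * len(row)  # degenerate row: no contrast to stretch
    scale = (hi - lo) / (mx - mn)
    return [lo + (v - mn) * scale for v in row]
```

Applying this mapping row by row puts every feature on a common scale before the contribution values are computed.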
An example of the calculation of contribution value.
Based on the first two steps, a new feature set is represented as follows:
In the proposed feature selection method, there are three important characteristics with respect to HCC image feature set. First,
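The selection step can be sketched as follows. The contribution score used here (absolute difference of normalized class means) is a hypothetical stand-in, since the paper's exact SC formula is not reproduced in this excerpt; the threshold value is likewise an assumption.

```python
def select_features(X, y, threshold=0.1):
    """Keep indices of features whose contribution exceeds a threshold.

    X: list of normalized feature rows (one row per sample).
    y: labels, 1 for HCC and 0 for normal.
    Hypothetical contribution: |mean over HCC - mean over normal| per feature.
    """
    n_feat = len(X[0])
    pos = [row for row, lbl in zip(X, y) if lbl == 1]
    neg = [row for row, lbl in zip(X, y) if lbl == 0]
    selected = []
    for j in range(n_feat):
        m_pos = sum(r[j] for r in pos) / len(pos)
        m_neg = sum(r[j] for r in neg) / len(neg)
        if abs(m_pos - m_neg) > threshold:
            selected.append(j)  # discriminative feature: keep it
    return selected
```

Features whose score falls below the threshold are treated as redundant or low-contribution and are dropped before training each classifier.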
Concavity-convexity is an essential feature for image processing [
Following this a priori knowledge, and different from the concave-convex feature in [
A circumscribed rectangle of the contour is built through four horizontal and vertical lines. In this way, 4 tangent points are acquired, which are marked as
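For an axis-aligned circumscribed rectangle, the four tangent points are simply the extreme points of the contour in each direction. A minimal sketch, assuming the contour is given as a list of (x, y) pixel coordinates:

```python
def bounding_tangent_points(contour):
    """Return the 4 points where the axis-aligned circumscribed rectangle
    touches the contour: leftmost, rightmost, topmost, and bottommost."""
    left = min(contour, key=lambda p: p[0])    # smallest x
    right = max(contour, key=lambda p: p[0])   # largest x
    top = min(contour, key=lambda p: p[1])     # smallest y (image coords)
    bottom = max(contour, key=lambda p: p[1])  # largest y
    return left, right, top, bottom
```

If the extreme value is attained by several contour points, this sketch returns the first one; the paper does not specify how ties are resolved.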
As shown in Figure
The edge curve and circumscribed rectangle of a cell. The red points represent tangent points and the blue points mark the selected points used for computing the slope vector.
For writing convenience, let
After obtaining each HCC image's CCV, the CCV models of the normal and abnormal classes are established according to the labels of all training data.
A classification result of each original classifier (RF, SVM, and ELM) without CCV is acquired, providing the probability of each nucleus belonging to the normal or abnormal class. All testing data are roughly divided into two parts. For the classified normal testing data,
Following this method, the label of each HCC testing image is obtained using the optimized CCV classifiers. To verify the effectiveness of CCV, three common classifiers (RF, SVM, and ELM) are utilized, and the experimental results demonstrate that the proposed CCV classifiers improve classification performance compared with the corresponding original classifiers.
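The weight-adjustment rule described above (Euclidean distance between the testing CCV and the training mean CCVs) can be sketched as follows. This is a hypothetical illustration of one plausible rule consistent with the text, not the paper's exact formula; all function names are our own.

```python
import math

def ccv_weights(test_ccv, mean_ccv_normal, mean_ccv_hcc):
    """The class whose training mean CCV lies closer (Euclidean distance)
    to the test CCV receives the larger weight; weights sum to 1."""
    d_normal = math.dist(test_ccv, mean_ccv_normal)
    d_hcc = math.dist(test_ccv, mean_ccv_hcc)
    total = d_normal + d_hcc
    if total == 0:
        return 0.5, 0.5  # equidistant: no preference
    # inverse-distance weighting: smaller distance -> larger weight
    return d_hcc / total, d_normal / total

def adjust_probability(p_normal, p_hcc, w_normal, w_hcc):
    """Reweight the classifier's class probabilities and renormalize."""
    a, b = p_normal * w_normal, p_hcc * w_hcc
    return a / (a + b), b / (a + b)
```

The final label is taken from the larger of the two adjusted probabilities, which is how the CCV model shifts borderline decisions of the original classifier.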
The experimental platform is an Intel(R) Core(TM) i7-4790 CPU@3.60 GHz, 8 GB RAM, 930 GB hard disk, Windows 7 OS, and the MATLAB R2016a simulation environment.
The experimental data of this paper were obtained from the pathology department of a large hospital in Shenyang, China. 96 hematoxylin-eosin (H&E) pathological images are used in our experiment. The image format is TIFF and the spatial resolution is 1280 × 960 pixels. The magnification of the pathology images is 400×. Figure
The liver tissue pathology images. (a) is a normal image and (b) shows an HCC image.
As mentioned in Section
The number of training images and testing images used in experiment.
Image type | Training images | Testing images | Total images |
---|---|---|---|
Normal images | 441 | 189 | 630 |
HCC images | 441 | 189 | 630 |
Total images | 882 | 378 | 1260 |
The cell gray-scale images and the cell binary images. (a1, a2) are normal cell gray-scale images and (b1, b2) are the matching binary images. (a3, a4) are HCC cell gray-scale images and (b3, b4) are the matching binary images.
This paper uses accuracy (ACC), sensitivity (SEN), and specificity (SPE) to evaluate the performance of classification. Sensitivity is the proportion of HCC cell images that are correctly classified, and specificity is the proportion of normal cell images that are correctly classified.
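With HCC as the positive class, the three metrics follow directly from the confusion-matrix counts. A minimal sketch:

```python
def classification_metrics(tp, fn, tn, fp):
    """ACC, SEN, SPE from confusion-matrix counts, with HCC as positive:
    tp/fn count HCC images classified correctly/incorrectly,
    tn/fp count normal images classified correctly/incorrectly."""
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall correct rate
    sen = tp / (tp + fn)                   # correctly classified HCC images
    spe = tn / (tn + fp)                   # correctly classified normal images
    return acc, sen, spe
```

For example, with 8 of 10 HCC images and 9 of 10 normal images classified correctly, ACC = 0.85, SEN = 0.8, and SPE = 0.9.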
As introduced in Section
The performance comparison of three classifiers. (a) ELM versus CCV-ELM, (b) SVM versus CCV-SVM, and (c) RF versus CCV-RF. The blue bars represent original classifiers and the red bars are the classifiers optimized by CCV.
Based on Figure
This paper also plots precision and recall (
ROC curves of three classifiers. (a) ELM versus CCV-ELM, (b) SVM versus CCV-SVM, and (c) RF versus CCV-RF. The red curves represent original classifiers and the green curves are the classifiers optimized by CCV.
Figure
[
The performance comparison of VRRF and CCV-RF. The blue bars represent VRRF and the red bars show the RF optimized by CCV.
According to Figure
In Section
The running time of three classifiers.
ELM | SVM | RF | |
---|---|---|---|
Before selection | 2.320221 s | 463.804518 s | 35.183513 s |
After selection | 1.900448 s | 262.635644 s | 18.348838 s |
The ACC of three classifiers.
ELM | SVM | RF | |
---|---|---|---|
Before selection | 63.14% | 85.97% | 78.62% |
After selection | 75.26% | 92.12% | 87.15% |
Following Tables
According to the compared results and analysis above, the proposed classifiers optimized by CCV improve classification performance compared to the original ones. Different from the concavity-convexity feature, the CCV model utilizes statistical properties and the difference between the testing CCV and the training mean CCV to adjust the optimization weights. The rule for adjusting weights is based on the Euclidean distance in the CCV difference model. In addition, instead of optimizing parameters for each classifier, the CCV model considers the differences among samples rather than the performance of the classifiers.
This paper proposes a concave-convex variation (CCV) method to optimize three classifiers (random forest, support vector machine, and extreme learning machine). A new SC feature selection method is developed to remove redundant features from the complete feature set, which benefits the final classification and reduces the computational cost. Each classifier provides initial classification results and the corresponding probabilities. Then, a CCV statistical model is established from all training data. The final classification results are obtained through the CCV classifiers using the weight-adjustment rule. Experiments with 1260 HCC image patches demonstrate that the proposed CCV classifiers perform better than the original classifiers in terms of ACC, SEN, SPE, and
The authors declare that they have no conflicts of interest.
This research is supported by the National Natural Science Foundation of China (no. 61472073).