Feature Quantification and Abnormal Detection on Cervical Squamous Epithelial Cells

Feature analysis and classification of abnormal cells in pathological images are important for realizing computer-assisted disease diagnosis. This paper studies a method for cervical squamous epithelial cells. Based on the cervical cytological classification standard and expert diagnostic experience, expressive descriptors are extracted according to the morphology, color, and texture features of cervical squamous epithelial cells. Further, quantitative descriptors related to cytopathology are derived, including the morphological difference degree, cell hyperkeratosis, and the deeply stained degree. These descriptors establish the relationship between quantified values and pathological features. Finally, an effective method is proposed for detecting abnormal cells based on feature quantification. Integrated with clinical experience, the method can realize fast abnormal cell detection and preliminary cell classification.


Introduction
Cervical cancer is one of the most malignant tumors threatening women's health, and its morbidity has been rising consistently in recent years. Generally, the incubation period before the actual formation of cervical cancer is long, and early detection and confirmation can prevent it from deteriorating further.
Because cervical cancer is comparatively easy to cure in the early stage, manual detection and identification are necessary. However, fatigue and subjective factors may contribute to the improper diagnosis of cervical cancer [1][2][3]. Thus, it is necessary to build an efficient and highly accurate automatic diagnosis system.
Computer image processing and analysis methods are applied to the study of cervical cell images, mainly covering the preprocessing of original images, cell feature extraction, data classification, and the diagnosis outcome. There are many related works in the literature. In [4], a bottom-up searching method is applied to automatically examine cancer cells. The authors used 40 images, containing 149 cells, to validate the performance of their proposed method.
Using the method, all cells are classified into 41 abnormal cells and 108 normal cells. In [5], a multilevel segmentation method, applicable to abnormal nucleus detection on cervical cells, is used to tackle the segmentation of abnormal nucleus areas and the separation of adhered cells and cell clusters. The experimental results of [5] show that this method can deliver a high detection accuracy.
In [6,7], a cervical cancer detection method based on a pixel-level top-down feature extraction strategy and SVM (Support Vector Machine) feature classification is proposed. In [8], the authors extracted cell-level morphological and luminosity features for classification, but the segmentation result is not satisfactory and may undermine the accuracy of the features. In [9], the authors proposed an automatic method for cervical cancer cell segmentation and classification. The authors used their proposed method to classify cervical cells into four classes, that is, normal cells, LSIL (low-grade squamous intraepithelial lesion), HSIL (high-grade squamous intraepithelial lesion), and SCC (squamous cell carcinoma), which are shown in Figure 1. However, most previous works analyzed only a single or a few cell images, and the extracted features and analysis results are restricted to specific applications.
In this paper, the images are provided by pathologists and are used for lesion screening. In the pathology domain, cervical cancer can be divided into two categories, that is, cervical adenocarcinoma and cervical squamous cell carcinoma. Compared to cervical adenocarcinoma, cervical squamous cell carcinoma is more common. Clinically, cervical cancer mostly refers to cervical squamous cell carcinoma. This paper is mainly concerned with cervical epithelial cells, and 48 pathological images are processed and analyzed in our study.
In pathological diagnosis, liquid-based thin-layer cytology preparation is applied to obtain cervical smears, from which high-quality microscopic images can be conveniently observed and obtained [10]. Figure 2 shows images of cells in different stages. The categories are defined in the Bethesda system (TBS) [11].
In this paper, both feature quantification and abnormal detection are based on TBS grading standards and expert diagnosis experiences. According to the lesion degree, TBS classifies cervical squamous epithelial cells into different categories, as shown in Figure 3. Details of TBS grading standards are described below.
(1) Normal: normal stage, no lesions.
(3) LSIL (low-grade SIL): expanded nucleus, at least three times as big as a normal nucleus, with enlarged N/C (ratio between nucleus and cytoplasm), commonly binucleated or multinucleated, hyperchromatic and in homogeneous distribution, nucleus hyperkeratosis, and jacinth-dyed cytoplasm.
(4) HSIL (high-grade SIL): expanded nucleus the same as LSIL, with reduced cytoplasm, more enlarged N/C than LSIL, hyperchromatic, fine or coarse granules in homogeneous distribution, irregular nucleus boundary, and the existence of nuclear grooves.
Taking full advantage of images of practical lesion screening and specialists' diagnostic experience can make computer assisted image analysis more valuable.
This paper is conducted under the assistance and instructions of pathologists. They also provide the cervical squamous epithelial cell images. The two main contributions of our study are summarized below.
One is cell feature quantification. Besides the commonly used features, like size, N/C, circularity, compactness, and color strength [12], some features related to pathology need to be extracted, including abnormal morphology, hyperkeratosis, and deeply stained degree. The extracted feature descriptors are related to cervical cell pathological descriptors, making feature parameters more valuable for further analysis.
The other is an abnormal cell detection method based on feature quantification. The affinity propagation clustering method [13,14] is applied to classify abnormal cells into different categories. Moreover, research on the features of abnormal cells can produce more information.

Acquisition of the Image Set.
Due to the complexity of pathological cell images, there are many overlapping and aggregated cells, as well as weak boundaries caused by uneven dyeing [15][16][17][18]. These situations may lead to unsatisfactory segmentation outcomes. In our study, we apply a manual segmentation approach to get the informative regions. The standard of manual segmentation is shown in Figure 4.
Taking the image in Figure 2(c) as an example, its manual segmentation results are shown in Figure 5. The informative sections of Figure 2 ... Centroid locations can be used to determine which nucleus region belongs to which cell region. The judgment rule is the minimum distance between the centroid pair. The regions in the same class have the same color label, as shown in Figures 5(e) and 5(f). Cell image sets and nucleus image sets are the foundations of feature extraction.
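The minimum-centroid-distance rule can be sketched as follows. This is only an illustrative sketch, not the implementation used in the study: it assumes each manually segmented region is available as a list of (x, y) pixel coordinates, and the function names `centroid` and `match_nuclei_to_cells` are hypothetical.

```python
import math


def centroid(pixels):
    """Centroid (mean x, mean y) of a region given as a list of (x, y) pixels."""
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n, sum(p[1] for p in pixels) / n)


def match_nuclei_to_cells(nucleus_regions, cell_regions):
    """For each nucleus region, return the index of the cell region
    whose centroid is nearest to the nucleus centroid."""
    cell_centroids = [centroid(c) for c in cell_regions]
    matches = []
    for nucleus in nucleus_regions:
        nx, ny = centroid(nucleus)
        dists = [math.hypot(nx - cx, ny - cy) for cx, cy in cell_centroids]
        matches.append(dists.index(min(dists)))
    return matches


# Toy regions (assumed data): two square cells and two small nuclei.
cells = [[(0, 0), (0, 4), (4, 0), (4, 4)],
         [(10, 10), (10, 14), (14, 10), (14, 14)]]
nuclei = [[(11, 11), (13, 13)], [(1, 1), (3, 3)]]
print(match_nuclei_to_cells(nuclei, cells))  # [1, 0]
```

Nuclei that receive the same cell index would then share that cell's color label.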

Computational and Mathematical Methods in Medicine

Quantification of Morphological Difference.
The level of difference is calculated mainly by comparing abnormal cells with normal cells. In this paper, the level of morphological difference is described by the feature combination of the size of nucleus $S$, N/C $\mathrm{NC}$, circularity $C$, compactness $K$, centroid position $(x_0, y_0)$, and the nucleus boundary $(x_i, y_i)$. The level of morphological difference can help pathologists detect morphological abnormalities of a single cell and determine the lesion areas.
The morphological difference degree can be composed of two parts, which are the size difference degree and the shape difference degree. The size difference degree is described by the size of nucleus and N/C, which is mainly compared to normal cells. The shape difference degree can be described by circularity, compactness, and string distribution shape descriptor.

Size Difference Degree.
The size difference degree comprises two ratios: $\nabla S$, the ratio between the size of the detected nucleus and the size of a normal nucleus, and $\nabla \mathrm{NC}$, the ratio between the N/C of the detected cell and that of normal cells. The corresponding equations are written as follows:

$$\nabla S = \frac{S}{S_{\text{normal}}}, \qquad \nabla \mathrm{NC} = \frac{\mathrm{NC}}{\mathrm{NC}_{\text{normal}}},$$

where $S_{\text{normal}}$ and $\mathrm{NC}_{\text{normal}}$ represent the size and N/C of normal cells, respectively, and $S$ and $\mathrm{NC}$ represent the size and N/C of detected cells, respectively. Based on pathology, we have the following.
Criterion 1. When the nucleus of the detected cell satisfies $\nabla S > \nabla S_0$ or $\nabla \mathrm{NC} > \nabla \mathrm{NC}_0$, the detected cell can be treated as abnormal and pathologists should not rule out the possibility of a lesion in further analysis. $\nabla S_0$ and $\nabla \mathrm{NC}_0$ represent the thresholds. In this paper, $\nabla S_0$ and $\nabla \mathrm{NC}_0$ are set to 2 and 2.5, respectively.

Shape Difference Degree.
The shape difference degree is described in two ways, both of which are related to pathology and described by the following two criteria.
Criterion 2. When the circularity $C$ and compactness $K$ of the detected cell satisfy $C < C_0$ or $K < K_0$, the cell can be determined as abnormal and pathologists cannot rule out the possibility of a lesion in further analysis. Here, $C_0$ indicates the circularity of a normal nucleus, while $K_0$ indicates the compactness of a normal nucleus. Each threshold is determined by the weighted average over a set of normal nuclei and cells. The larger the set, the more reliable the thresholds. In this paper, $C_0$ and $K_0$ are set to 0.8 and 0.7. In Figure 6(a), $C$ is 0.7485 and $K$ is 0.6667, which satisfy Criterion 2. Therefore, the cell in Figure 6(a) can be judged as abnormal.
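As a hedged illustration, Criteria 1 and 2 reduce to simple threshold tests. The sketch below uses the threshold values quoted in the text; the constant and function names are assumptions introduced for illustration, and circularity and compactness are taken as precomputed inputs because their formulas are not given here.

```python
# Thresholds quoted in the text; the names are illustrative assumptions.
SIZE_RATIO_THRESHOLD = 2.0    # threshold on the nucleus-size ratio (Criterion 1)
NC_RATIO_THRESHOLD = 2.5      # threshold on the N/C ratio (Criterion 1)
CIRCULARITY_THRESHOLD = 0.8   # circularity of a normal nucleus (Criterion 2)
COMPACTNESS_THRESHOLD = 0.7   # compactness of a normal nucleus (Criterion 2)


def size_difference(size, nc, size_normal, nc_normal):
    """Return (size ratio, N/C ratio) of a detected cell relative to normal cells."""
    return size / size_normal, nc / nc_normal


def criterion_1(size, nc, size_normal, nc_normal):
    """True if either size-difference ratio exceeds its threshold."""
    size_ratio, nc_ratio = size_difference(size, nc, size_normal, nc_normal)
    return size_ratio > SIZE_RATIO_THRESHOLD or nc_ratio > NC_RATIO_THRESHOLD


def criterion_2(circularity, compactness):
    """True if circularity or compactness falls below the normal-nucleus value."""
    return (circularity < CIRCULARITY_THRESHOLD
            or compactness < COMPACTNESS_THRESHOLD)


# Circularity and compactness reported in the text for the cell in Figure 6(a).
print(criterion_2(0.7485, 0.6667))  # True
```

Because both criteria use a logical "or", a single out-of-range feature is enough to flag a cell for further review.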
Criterion 3. When the string distribution shape descriptor $N$ of the detected cell satisfies $N > N_0$, the cell is determined as abnormal.
In this paper, $N_0$ is set to 4. The string distribution shape descriptor is based on the descriptor of the nucleus boundary. It can be extracted by the following procedure. Given the binary maps, locate the nucleus centroid $(x_0, y_0)$ and obtain the nucleus boundary with an edge detection algorithm. Starting from a random point on the boundary, the distance between each boundary point $(x_i, y_i)$ and the nucleus centroid is calculated by traversing all points on the boundary.
The distance can be calculated as $d_i = \sqrt{(x_i - x_0)^2 + (y_i - y_0)^2}$. The distance values can be plotted in Cartesian coordinates against the boundary point index. After a high-order polynomial function is fitted to the points, the total number of peaks and valleys in the fitted curve is counted as the string distribution shape descriptor $N$. The bigger $N$ is, the more complex the nucleus shape is. From Figure 6(a) to Figure 6(c), the $N$ values of the nuclei are 4, 4, and 6, respectively. In Figure 6(c), the detected cell satisfies Criterion 3, so it can be determined as abnormal.
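The procedure for the string distribution shape descriptor can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: the boundary points are assumed to be ordered along the contour, the polynomial degree (10 here) is an arbitrary choice since the text only says "high order", and the function name `string_shape_descriptor` is hypothetical.

```python
import numpy as np


def string_shape_descriptor(boundary, centroid, degree=10):
    """Count peaks and valleys of a polynomial fitted to the
    centroid-to-boundary distance curve.

    boundary: ordered sequence of (x, y) boundary points.
    centroid: (x0, y0) of the nucleus.
    degree:   polynomial order (assumed value).
    """
    pts = np.asarray(boundary, dtype=float)
    x0, y0 = centroid
    # Euclidean distance from every boundary point to the centroid.
    d = np.hypot(pts[:, 0] - x0, pts[:, 1] - y0)
    t = np.arange(len(d), dtype=float)
    # Least-squares polynomial fit; Polynomial.fit rescales t internally.
    fitted = np.polynomial.Polynomial.fit(t, d, degree)(t)
    slope = np.diff(fitted)
    # Ignore numerically flat steps so a circle yields no extrema.
    slope[np.abs(slope) < 1e-9 * d.mean()] = 0.0
    signs = np.sign(slope[slope != 0.0])
    # Each sign change of the slope is one peak or one valley.
    return int(np.count_nonzero(signs[1:] != signs[:-1]))


# Synthetic two-lobed contour (assumed data) around the origin.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
radius = 10.0 + 2.0 * np.sin(2.0 * theta)
contour = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
print(string_shape_descriptor(contour, (0.0, 0.0)))
```

A returned value greater than the threshold of 4 used in the text would then satisfy Criterion 3.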

Hyperkeratosis and Deeply Stained Feature Quantification.
The phenomenon in which cervical squamous epithelial cells turn jacinth after dyeing is called hyperkeratosis. It is commonly seen in the LSIL condition. From Figure 2 ... The deeply stained nucleus feature is important for lesion identification, especially for the judgment of cells in the SCC stage. The color strength $(R, G, B)$ is used as the descriptor of this feature. The strength is defined by the averages of the $R$ (red), $G$ (green), and $B$ (blue) values. The relationship between the descriptors and pathological judgment can be defined as follows.
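The color strength descriptor is a per-channel average, which can be sketched as follows. The sketch assumes the pixels of a segmented region are available as (red, green, blue) tuples; the function name `color_strength` is hypothetical.

```python
def color_strength(pixels):
    """Average red, green, and blue values over the pixels of one region.

    pixels: list of (r, g, b) tuples for the segmented nucleus or cell.
    """
    n = len(pixels)
    return tuple(sum(p[channel] for p in pixels) / n for channel in range(3))


# Two toy pixels (assumed data).
print(color_strength([(100, 50, 200), (120, 70, 220)]))  # (110.0, 60.0, 210.0)
```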

Abnormal Detection and Grading on Individual Cells.
Based on the features discussed above, a fast abnormal detection method for cervical squamous epithelial cells is proposed. The detection procedure is as follows: when the features of a detected cell satisfy any criterion, the cell is determined as abnormal and pathologists should not rule out the possibility of a lesion in further analysis.
Abnormal cervical squamous epithelial cells have traits including enlarged nucleus area, enlarged N/C, heteromorphism, deep staining, and hyperkeratosis. Based on the experience of pathologists, most abnormal cells can be easily identified using nucleus area and N/C. In this paper, Criterion 1 is applied first and then the cell is judged by Criteria 4, 5, 2, and 3, successively. The affinity propagation (AP) algorithm is implemented for further analysis of the detected abnormal cells. The AP algorithm can cluster a large amount of data directly without a predefined number of classes or preset centers. It is used to simplify the dataset and to perform further classification based on the clustering centers, which realizes a preliminary grading of abnormal cells.
The generated clustering centers can be used as sample centers for further data analysis. Pathologists only need to make further analysis on the sample centers and therefore the screening efficiency can be highly improved. Because different parameters have different weights for identification, these parameters cannot be mixed up.
In our study, we use each feature separately to form the sample distance for the AP algorithm. Each sample center and the corresponding feature threshold can then be obtained. Our proposed fast abnormal detection method realizes a preliminary grading of abnormal cell samples into three categories, which are LSIL, HSIL, and SCC.
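To illustrate how sample centers arise without a preset number of classes, the following is a compact re-implementation of the standard AP message-passing updates, applied to a single feature. It is a sketch under assumptions rather than the code used in the study: similarity is the negative squared difference of one feature, the preference is the median off-diagonal similarity, and the damping factor, iteration count, and example values are arbitrary.

```python
import numpy as np


def affinity_propagation(S, damping=0.7, max_iter=300):
    """Standard AP updates on a similarity matrix S whose diagonal
    holds the preferences. Returns (exemplar indices, labels)."""
    n = S.shape[0]
    idx = np.arange(n)
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        first = AS.argmax(axis=1)
        first_val = AS[idx, first]
        AS[idx, first] = -np.inf
        second_val = AS.max(axis=1)
        R_new = S - first_val[:, None]
        R_new[idx, first] = S[idx, first] - second_val
        R = damping * R + (1.0 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0.0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0.0)
        A_new[idx, idx] = diag
        A = damping * A + (1.0 - damping) * A_new
    exemplars = np.flatnonzero((A + R)[idx, idx] > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    # Assign every sample to its most similar exemplar.
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


def cluster_feature(values):
    """Cluster one feature; similarity is the negative squared difference."""
    x = np.asarray(values, dtype=float)
    S = -(x[:, None] - x[None, :]) ** 2
    off_diagonal = S[~np.eye(len(x), dtype=bool)]
    np.fill_diagonal(S, np.median(off_diagonal))  # preference (assumed choice)
    return affinity_propagation(S)


# Hypothetical N/C ratios of six abnormal cells.
exemplars, labels = cluster_feature([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
print(exemplars, labels)
```

The returned exemplars play the role of the sample centers that pathologists would review, and running the function once per feature keeps features with different weights from being mixed.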

Experiments and Results
From the image sets of cervical squamous epithelial cells, we randomly select 40 cells. After feature quantification on cell images and nucleus images, our proposed fast abnormal detection method is applied to cells classification. The 40 cells are classified into 34 abnormal cells and 4 normal cells. The detection result is shown in Figure 7 and the detection accuracy is 100%. Experimental results show that the abnormal detection method on cervical squamous epithelial cells is efficient and effective.
Applying the AP algorithm to the quantified features of the 36 abnormal cells, we get sample centers, which are shown in Tables 1 and 2. The classification thresholds can be set by the results of the AP algorithm. In Criterion 1, $\nabla S_0 = 2$ and ... Table 3. In the following, $(R, G, B)$ denotes the color strength, $\nabla S$ and $\nabla \mathrm{NC}$ denote the nucleus size and N/C ratios relative to normal cells, and $C$ and $K$ denote circularity and compactness. In the detection and identification of abnormal cells, cells in the SCC stage can be easily detected, since they show deep staining, enlarged area, and enlarged N/C. More specifically, when the parameters of cell features satisfy $R < 120$, $G < 120$, $B < 200$, $\nabla S > 2$, and $\nabla \mathrm{NC} > 10$ at the same time, the detected cell is abnormal and is possibly cancerous. There are two main differences between LSIL and HSIL. The first is that the N/C of LSIL cells is smaller than that of HSIL cells, while the second is that HSIL cells have heteromorphic features and deeply stained nuclei. Thus, when the parameters of cell features satisfy $C < 0.8$ and $K < 0.7$ at the same time, or $\nabla \mathrm{NC} < 5$, $R < 170$, $G < 120$, and $B < 200$ at the same time, the detected cell is abnormal and the possibility of its being in the LSIL stage cannot be ruled out. After the determination of all SCC and LSIL cells, the possibility of being in the HSIL stage cannot be ruled out for the remaining undetermined abnormal cells.
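The grading rules above can be sketched as an ordered sequence of tests. This is a hedged sketch: the numeric thresholds come from the text, but the assignment of each threshold to a feature (color channels in red, green, blue order, the first ratio as nucleus size and the second as N/C, and 0.8 and 0.7 as circularity and compactness) is an assumption, and the function name `grade_abnormal_cell` is hypothetical.

```python
def grade_abnormal_cell(r, g, b, size_ratio, nc_ratio, circularity, compactness):
    """Preliminary grade of a cell already flagged as abnormal.

    r, g, b:     average color strength of the nucleus.
    size_ratio:  nucleus size relative to a normal nucleus.
    nc_ratio:    N/C relative to normal cells.
    The feature-to-threshold mapping is an assumption (see lead-in).
    """
    # SCC: deeply stained nucleus with strongly enlarged area and N/C.
    if r < 120 and g < 120 and b < 200 and size_ratio > 2 and nc_ratio > 10:
        return "SCC"
    # LSIL: either the shape rule or the moderate N/C and color rule.
    shape_rule = circularity < 0.8 and compactness < 0.7
    color_rule = nc_ratio < 5 and r < 170 and g < 120 and b < 200
    if shape_rule or color_rule:
        return "LSIL"
    # Remaining abnormal cells: HSIL cannot be ruled out.
    return "HSIL"


# Toy feature vectors (assumed data).
print(grade_abnormal_cell(100, 100, 150, 3.0, 12.0, 0.9, 0.9))  # SCC
print(grade_abnormal_cell(160, 110, 190, 2.5, 3.0, 0.9, 0.9))   # LSIL
print(grade_abnormal_cell(180, 150, 210, 2.5, 6.0, 0.9, 0.9))   # HSIL
```

The order of the tests matters: SCC is checked first, then LSIL, and HSIL is the fallback for whatever remains.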
Based on the experimental results in Table 3, the grading accuracy of abnormal cells is 76.47%. A small amount of misclassification is tolerable, mainly because of two practical facts. First, for cells in more severe stages, the possibility of being in comparatively less severe stages cannot be ruled out. In practical application, cells in SCC stage may

Conclusion
This paper presents the study on feature quantification and abnormal detection on cervical squamous epithelial cells.
Two main aspects are accomplished. First, on the foundation of stored cell image sets and by integrating various feature descriptors, we extract quantified features of individual cells and aggregated cells. These feature descriptors convert images into data and are used for building criteria. Second, a fast abnormal cell detection method is proposed. The method takes advantage of clinical experience and can realize the detection and identification of individual cells. Integrated with pathological experience about feature quantification and criteria, the detection accuracy is high, but the segmentation and classification of cell images need more substantial work to improve their design and effectiveness.