An Automatic Mass Screening System for Cervical Cancer Detection Based on Convolutional Neural Network

Department of Electrical Engineering, GIK Institute of Engineering Sciences and Technology, Topi 23640, District Swabi, Khyber Pakhtunkhwa, Pakistan
Department of Electrical Engineering, Capital University of Science and Technology, Islamabad Expressway, Kahuta Road, Zone-V, Islamabad, Pakistan
Department of Electrical Engineering, Mirpur University of Science and Technology (MUST), Mirpur 10250, Pakistan
Centre for Innovative and New Technologies of Academy of Sciences of the Republic of Tajikistan, Rudaki Ave., 33, Dushanbe 734015, Tajikistan


Introduction
Cervical cancer is one of the leading causes of cancer-related deaths in women. It arises in the cervix, i.e., the lower, narrow end of the uterus, as shown in Figure 1. It starts with the abnormal growth of cells that can spread to other parts of the body. Infection with human papillomavirus (HPV) is the major risk factor for cervical cancer. There are no symptoms in the early stages of the disease; as it progresses, symptoms may include abnormal vaginal bleeding, pelvic pain, and pain during sexual intercourse. The disease can be detected early through regular medical check-ups [1].
There are many diagnostic tests for identifying cervical cancer. The Papanicolaou (Pap) smear is the most commonly used cervical cancer screening test worldwide. In the conventional Pap smear procedure [2], a speculum is inserted into the vagina to widen its walls so that the cervix can be viewed and a smear collected. Several weeks may pass before the final results of the Pap smear test are available. The process is time-consuming and laborious: it requires microscopic examination of hundreds of thousands of cells to identify precancerous and cancerous cells. Roughly one in every 10 to 15 positive cases may be missed by conventional screening [3]. The incidence of cervical cancer is lower in the USA and other developed countries because of early detection and better screening methods [4]. Its incidence has dropped by 80% since screening programs were introduced in some Nordic countries. In Sweden [5], it has dropped by 65% over the last four decades, and incidence and mortality figures have been stable over the last decade. However, improved screening systems are still unavailable in underdeveloped countries, partly because manual screening of abnormal cells in a cervical cytology specimen is complex and tedious [6,7]. While automation-assisted mass screening techniques can boost efficiency, they are not yet accurate enough to be used as a primary tool for cervical screening [8].
During the past few years, extensive research has been carried out on computer-assisted automated reading systems based on cell image analysis [7,9,10]. The manual screening process normally begins with collecting cervical cell samples from the uterine cervix and placing them on a glass slide. After visual inspection under a microscope, the cells are classified into different categories. The shape, size, and texture of the cells and the nucleus-to-cytoplasm ratio are the main characteristics used for classification. Hence, the first step of an automated system may be to segment the cell sample images, separating regions of interest that contain single cells with nucleus and cytoplasm from noncell regions. This initial segmentation is followed by separation of the main cell components, namely the nucleus and cytoplasm, and extraction of their shape and texture features. However, separating the main components and extracting shape features is not an integral part of an automated screening system: schemes proposed in the literature include both options, i.e., with and without prior geometrical feature extraction.
For a system that includes prior feature extraction, accurately segmenting the nucleus from the cytoplasm in cervical cell images is difficult and error-prone, which limits the success of the overall system. Large irregular shapes, dissimilar appearances, and cell clusters, in both malignant and benign nuclei, are the major obstacles to accurately segmenting the cytoplasm and nucleus. Researchers have proposed various segmentation algorithms to separate cell components. In [11], an iterative algorithm assigns pixels according to a statistical criterion function to separate the nucleus, cytoplasm, and background. In other studies [12,13], Gabor filters were applied to exploit the textural variation of cervical cells and segment regions of interest. Fuzzy C-means clustering was used in [14,15] to segment single-cell images into nucleus, cytoplasm, and background. However, when overlapping cells are taken into account, classification accuracy decreases significantly. Therefore, most of the segmentation approaches presented [11,12,14,16-20] perform well only on single, clear cervical cell images; with overlapping cells or other shape changes, their accuracy drops.
To overcome this dependency on segmentation, many techniques proposed during the past few years skip prior segmentation and classify unsegmented cell images directly. A pixel-level classification method is proposed in [21] to classify normal and abnormal cells without prior segmentation, using block-wise feature selection and extraction. However, the validation accuracy of that algorithm is unsatisfactory. In [22], block image processing was proposed, in which arbitrary image blocks are cropped prior to feature extraction and the cropped blocks are then classified using an SVM. However, arbitrary cropping can split a single cell across separate patches.
Recently, deep learning-based feature representation has become popular for image classification [23]. In particular, convolutional neural networks (ConvNets) [24] achieved unprecedented results in the 2012 ImageNet Large Scale Visual Recognition Challenge, which consisted of classifying natural images in the ImageNet dataset into 1000 fine-grained categories [25]. ConvNets have also drastically increased accuracy in medical imaging [26,27], specifically in classifying lung diseases and lymph nodes in CT images [28,29] and in detecting cervical intraepithelial neoplasia from cervigram images [30] or multimodal data [31]. They have likewise shown superior performance in classifying cell images for the diagnosis of pleural cancer [32].
However, ConvNets need large datasets to achieve high performance and avoid overfitting [33]. This is a major limitation for cervical cell classification, because only a limited number of annotated cervical cell datasets are available. For instance, the Herlev dataset [34] contains only 917 cervical cells (675 abnormal and 242 normal), which is insufficient for training ConvNets. To overcome this limitation, image data augmentation techniques have recently been proposed to virtually increase the size of training datasets and reduce overfitting [25]. Data augmentation can be achieved by transforming the data, e.g., by mirroring, scaling, translation, rotation, and color shifting, as long as the information about the object in the image remains intact. Transfer learning [21,22,35-39] is another way to overcome overfitting: a ConvNet is first trained on a large-scale natural image dataset and then fine-tuned on the target dataset of limited size. In this paper, an automatic screening system is proposed that uses ConvNets to classify malignant and benign cell images without prior segmentation. Because the Herlev dataset is small, transfer learning is used to initialize the weights, which are then fine-tuned on the dataset. After fine-tuning, the feature vector at a fully connected layer is extracted and passed to various classifiers. To show the efficacy of the proposed approach, its performance is evaluated on the Herlev dataset for 2-class and 7-class problems. The 2-class problem considers malignant versus benign cells, while the 7-class problem covers all seven categories of cervical cells. In short, the research contributions of this work are as follows:
(1) We develop a tool for automatic classification of cervical cells using a convolutional neural network. Unlike previous methods, it requires neither prior segmentation nor hand-crafted features; it automatically extracts the hierarchical features embedded in the cell image for classification.
(2) A data augmentation technique is used to avoid overfitting. Applying the augmentation strategy while training our network reduces overfitting, which helps the network learn the most discriminative features of cervical cells and thus achieve superior classification results.
(3) Transfer learning is explored for pretraining: the initial weights are transferred to another network, which is fine-tuned on cervical cell images. Training from scratch requires a large amount of labeled data, which is extremely difficult to obtain in medical diagnosis. Moreover, designing and tuning the hyperparameters is challenging with respect to overfitting and other issues. Transfer learning is the easiest way to overcome these problems.
(4) We conduct extensive malignant and benign cell assessment experiments on the Herlev dataset. The results demonstrate the effectiveness of the proposed convolutional neural architecture, and comparison with recently proposed methods shows that our approach provides superior performance for cervical cell classification.
The paper is organized as follows: the proposed methodology is presented in Section 2; experiments and results are given in Section 3; a discussion of the results is presented in Section 4; and conclusions and future work are summarized in Section 5.

Proposed Methodology
The proposed automatic mass screening system for cervical cancer detection using ConvNets is shown in Figure 2. It consists of four steps: (1) data collection, (2) preprocessing, (3) feature learning, and (4) classification of cervical cells. These steps are explained in the following sections.

Data Collection.
The publicly available Herlev Pap smear dataset is used for training and testing. It contains 917 single cervical cell images with ground-truth classification and segmentation. The cells are categorized into seven classes, diagnosed by both doctors and cytologists to increase the reliability of the diagnosis. These seven classes are further grouped into two categories, malignant and benign: classes 1 to 3 are normal (benign), while classes 4 to 7 are abnormal (malignant). The class distribution is shown in Table 1.
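As a minimal sketch of the 7-to-2 class grouping just described, the mapping can be written as a lookup table. The helper `to_binary` is hypothetical, and the names for classes 1 and 2 follow the public Herlev description rather than this document, which only fixes that classes 1 to 3 are benign and 4 to 7 malignant.

```python
# Herlev class index -> cell type (names of classes 1-2 follow the public Herlev description).
HERLEV_CLASSES = {
    1: "superficial squamous epithelial",
    2: "intermediate squamous epithelial",
    3: "columnar epithelial",
    4: "mild dysplasia",
    5: "moderate dysplasia",
    6: "severe dysplasia",
    7: "carcinoma in situ",
}

def to_binary(label):
    """Map a Herlev class (1-7) to the 2-class problem: 0 = benign/normal, 1 = malignant/abnormal."""
    if label not in HERLEV_CLASSES:
        raise ValueError(f"unknown Herlev class: {label}")
    return 0 if label <= 3 else 1

print([to_binary(c) for c in range(1, 8)])  # [0, 0, 0, 1, 1, 1, 1]
```

Keeping the 7-class labels and deriving the 2-class labels from them lets both problems share one annotation file.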
Normal and abnormal cell images are shown in Figure 3. The nucleus of a malignant (abnormal) cell is larger than that of a normal cell. From a classification perspective, the difficulty is that normal columnar cells have nuclei of a size similar to that of severely dysplastic nuclei, with the same chromatin distribution.

Preprocessing.
The Herlev dataset consists of images that contain multiple cells in a single image. The preprocessing phase consists of extracting image patches from the original cervical cell images and augmenting the data for training ConvNet 2.

Image Patch Extraction.
Like previous patch-based classification methods, the proposed approach does not operate directly on the original images of the Herlev dataset, which contain multiple cells at a time [40-43]. Instead, image patches, each containing a single cell, are extracted first. Extracting individual cells requires presegmentation of the cytoplasm [44]. The nuclei are detected first, and then image patches of size M × M, each centered on a nucleus, are generated. These patches embed not only the size and scale information of the nucleus but also the textural information of the surrounding cytoplasm. The scale and size of the nucleus are very important features for discriminating between malignant and benign cells.
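A minimal NumPy sketch of nucleus-centered patch extraction follows. It assumes the nucleus centers are already known (e.g., from a separate nucleus detector, which is not shown), and it edge-pads the image so that nuclei near the border still yield a full M × M patch; the padding mode and the helper name `extract_patch` are assumptions, not the authors' implementation.

```python
import numpy as np

def extract_patch(image, center, m=128):
    """Return an m x m patch whose centre pixel is `center` = (row, col).

    The image is edge-padded by m // 2 on each side so that nuclei near the
    border still yield a full-size patch (the padding mode is an assumption).
    """
    half = m // 2
    pad = [(half, half), (half, half)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")
    r, c = center
    # Original pixel (r, c) sits at (r + half, c + half) in the padded array,
    # so the window starting at (r, c) places it at patch index (half, half).
    return padded[r:r + m, c:c + m]

# Hypothetical nucleus centres, e.g. produced by a separate nucleus detector.
image = np.random.rand(300, 400, 3)
patches = [extract_patch(image, rc) for rc in [(10, 20), (150, 200), (290, 395)]]
```

Centering every patch on the nucleus is what lets the network see the nucleus scale and the surrounding cytoplasm texture in a consistent position.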

Data Augmentation.
An image data augmentation technique is used to virtually increase the size of the training dataset and reduce overfitting [25]. Because the appearance of cervical cells is invariant to rotation, the cells can be rotated through 0° to 360° with a step angle θ. In the augmentation process, N_r = 10 rotations with θ = 36° are combined with translations of up to 15 pixels in the horizontal and vertical directions: N_t = 15 translations in each direction for every normal cell and N_t = 8 translations in each direction for every abnormal cell. Hence, 10 × (15 + 15) = 300 image patches are generated from each normal cell and 10 × (8 + 8) = 160 image patches from each abnormal cell.
This transformation yields a relatively balanced class distribution, whereas in the original data the abnormal cell images far outnumber the normal ones. The size of each generated patch is set to M = 128 pixels to cover the cytoplasm region. These patches are then upsampled to 256 × 256 × 3 using bilinear interpolation. The upsampled patches, shown in Figure 4, are used in ConvNet 2 for layer transfer and training [28]. The Herlev dataset contains nearly three times as many malignant cells as benign cells (675 versus 242); therefore, a classifier naturally tends to be biased towards the majority class, i.e., the malignant cells.
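The 128 → 256 bilinear upsampling step can be sketched in NumPy as below. This uses the common half-pixel-center convention for mapping output to input coordinates; the exact convention of the MATLAB implementation used in the paper is an assumption here, and `bilinear_resize` is a hypothetical helper.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear resize of an (H, W) or (H, W, C) array using half-pixel centres."""
    in_h, in_w = img.shape[:2]
    # Map each output pixel centre back to (fractional) input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    if img.ndim == 3:                       # broadcast weights over colour channels
        wy, wx = wy[..., None], wx[..., None]
    img = img.astype(np.float64)
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

patch = np.random.rand(128, 128, 3)        # one M = 128 patch
upsampled = bilinear_resize(patch, 256, 256)
```

Bilinear interpolation keeps every output value within the range of its four input neighbours, so upsampling does not invent intensities that were not in the patch.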
An imbalanced class distribution is commonly handled by normalizing the data prior to classification so that positive and negative samples are evenly represented [45]. This normalization improves not only the convergence rate of ConvNet training but also the classification accuracy [25]. In the proposed approach, the training dataset is balanced by augmenting benign and malignant cells unequally, so that proportionally more benign training samples are generated than malignant ones.
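The effect of the unequal augmentation on class balance can be checked with a short sketch. It reads the augmentation as 10 rotations combined with N_t translations in each of the horizontal and vertical directions, which reproduces the stated 300 and 160 patches per cell. The evenly spaced offsets and the names `augmentation_grid`, `PER_NORMAL`, and `PER_ABNORMAL` are assumptions for illustration.

```python
from itertools import product

N_ROT = 10       # number of rotations
THETA = 36       # step angle in degrees, so 10 x 36 covers 0-360
MAX_SHIFT = 15   # maximum translation in pixels

def augmentation_grid(n_shift):
    """List (angle, dx, dy) transforms: each rotation paired with n_shift
    horizontal and n_shift vertical translations of up to MAX_SHIFT pixels."""
    angles = [k * THETA for k in range(N_ROT)]
    # Evenly spaced nonzero offsets (the exact spacing is an assumption).
    offsets = [round(MAX_SHIFT * (i + 1) / n_shift) for i in range(n_shift)]
    shifts = [(d, 0) for d in offsets] + [(0, d) for d in offsets]
    return [(a, dx, dy) for a, (dx, dy) in product(angles, shifts)]

PER_NORMAL = len(augmentation_grid(15))    # 10 x (15 + 15) = 300
PER_ABNORMAL = len(augmentation_grid(8))   # 10 x (8 + 8) = 160

n_normal, n_abnormal = 242, 675            # Herlev cell counts
print("before:", round(n_normal / n_abnormal, 2))                              # before: 0.36
print("after:", round(n_normal * PER_NORMAL / (n_abnormal * PER_ABNORMAL), 2))  # after: 0.67
```

Over the whole dataset, the normal-to-abnormal ratio thus rises from about 1:2.8 to about 2:3. Within each cross-validation fold, only the training cells would be augmented, but the ratio is the same in expectation.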

Feature Learning.
ConvNets can learn discriminative features automatically for an underlying task. In this work, a deep model consisting of two ConvNets, named ConvNet 1 and ConvNet 2, is used. First, the base network ConvNet 1 is pretrained on the ImageNet database, which consists of over 15 million labeled high-resolution images belonging to roughly 22,000 categories [46]. The images were collected from the web and labeled by human annotators using Amazon's Mechanical Turk crowdsourcing tool. The subset used for pretraining contains roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images [25]. ConvNet 1 contains five convolutional (conv) layers, denoted conv1 to conv5, three pooling (pool) layers, denoted pool1, pool2, and pool5, and three fully connected (fc) layers, denoted fc6, fc7, and fc. All these layers are transferred to ConvNet 2, the network used for feature extraction, to set its initial parameters. This new network is then fine-tuned on the single cervical cell images of the Herlev database. The procedure is shown in Figure 5.
As described earlier, the conv and pool layers are transferred from ConvNet 1 to the same locations in ConvNet 2, so both networks share the same structure from conv1 to pool5. However, the fully [...] The classification score is then calculated using three different classifiers, SR, SVM, and GEDT, which are described as follows.

Softmax Regression (SR).
For the multiclass dataset, SR is used to classify unknown samples that have first been preprocessed as described above. SR generalizes logistic regression to multiple classes: the sigmoid function is replaced by the softmax function, and the model is trained with the cross-entropy loss. Mathematically,
$$P\left(y_i = j \mid z\right) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}},$$
where K is the number of classes (2 or 7 here) and the network input for class j is defined as
$$z_j = w_j^{T} x_i + w_{0j},$$
where w_j is a weight vector, x_i is the feature vector of training sample i, and w_{0j} is the bias. The softmax function computes the probability score that training sample x_i belongs to class j given the network input z. This probability score is generated at the softmax layer of ConvNet 2, which follows the fully connected layer, and the cross-entropy function is used for classification at the final layer of ConvNet 2. The softmax layer of the ConvNet is shown in Figure 6.
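The softmax and cross-entropy computations above can be sketched in NumPy as follows. The feature dimension of 16 and the random weights are placeholders, not the dimensions of the fc-layer features used in the paper.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow without changing the result."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sr_probabilities(X, W, w0):
    """P(y = j | x) for feature rows X (n, d), class weights W (d, K) and biases w0 (K,)."""
    return softmax(X @ W + w0)   # z_j = w_j^T x + w_0j for every class j

def cross_entropy(P, y):
    """Mean negative log-probability assigned to the true integer labels y."""
    return float(-np.mean(np.log(P[np.arange(len(y)), y] + 1e-12)))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))                     # 4 placeholder feature vectors
W, w0 = rng.normal(size=(16, 7)), np.zeros(7)    # K = 7 classes
P = sr_probabilities(X, W, w0)
print(P.sum(axis=1))                             # each row sums to 1
print(cross_entropy(P, np.array([0, 3, 6, 2])))
```

The small constant inside the logarithm only guards against `log(0)`; it has no effect when the true class has non-negligible probability.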

Mathematical Problems in Engineering

Support Vector Machine (SVM).
SVM is a supervised learning model that uses an optimization method to identify support vectors S_i, weights α_i, and a bias b. A vector x is classified according to
$$c = \sum_{i} \alpha_i \, k(S_i, x) + b,$$
where k is the kernel function, which depends on the model assumed for the decision boundary; for a linear kernel, k is the dot product. If c ≥ 0, x is classified as a member of the first group; otherwise, it is assigned to the second group. An error-correcting output codes (ECOC) classifier is trained with SVM binary learners. The batch size is set to 256. The classifier is trained on the deep hierarchical feature vectors that ConvNet 2 extracts from the training set, and validation data are used to calculate the validation accuracy of the SVM.
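The decision rule above, together with a simple way of combining binary SVMs for the 7-class problem, can be sketched as follows. The toy support vectors and weights are placeholders. The one-vs-one hard-voting decoder `one_vs_one_vote` is a hypothetical simplification of ECOC: ECOC implementations often use loss-based decoding instead.

```python
import numpy as np

def linear_kernel(a, b):
    return float(np.dot(a, b))

def svm_decision(x, S, alpha, b, kernel=linear_kernel):
    """c = sum_i alpha_i * k(S_i, x) + b over the support vectors S_i."""
    return sum(a_i * kernel(s_i, x) for a_i, s_i in zip(alpha, S)) + b

def svm_group(x, S, alpha, b, kernel=linear_kernel):
    """First group (1) if c >= 0, otherwise second group (2)."""
    return 1 if svm_decision(x, S, alpha, b, kernel) >= 0 else 2

def one_vs_one_vote(x, learners, n_classes):
    """Simplified multiclass decoding: learners maps (i, j) -> (S, alpha, b);
    each binary learner votes for class i if c >= 0, else for class j."""
    votes = np.zeros(n_classes, dtype=int)
    for (i, j), (S, alpha, b) in learners.items():
        votes[i if svm_decision(x, S, alpha, b) >= 0 else j] += 1
    return int(np.argmax(votes))

S = np.array([[1.0, 0.0], [-1.0, 0.0]])   # toy support vectors
alpha = [0.5, -0.5]                        # signed weights (class label folded in)
print(svm_group(np.array([2.0, 0.0]), S, alpha, 0.0))   # 1
print(svm_group(np.array([-2.0, 0.0]), S, alpha, 0.0))  # 2
```

With K classes, one-vs-one coding trains K(K − 1)/2 binary learners, i.e., 21 for the 7-class problem.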

GentleBoost Ensemble of Decision Tree (GEDT).
GEDT uses regression trees to predict the response of the data. A classification decision is made as the query sample follows the path from the root node to a leaf node.
In GEDT, the predictions of an ensemble of trees are combined by voting. The ensemble is trained on the training data with the number of trees set to 100 and the batch size set to 256. The validation accuracy on the Herlev dataset is evaluated using the validation data.
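A compact sketch of GentleBoost for the 2-class problem is shown below. As a simplification, it uses depth-1 regression trees (stumps) as weak learners; the tree depth used in the paper is not stated. Each round fits a stump by weighted least squares, adds it to the ensemble, and up-weights the samples that the ensemble still misclassifies. All function names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted least-squares regression stump: f(x) = a if x[j] <= t else b."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            wl, wr = w[left].sum(), w[~left].sum()
            # Leaf values are the weighted means of y on each side.
            a = (w[left] * y[left]).sum() / wl if wl > 0 else 0.0
            b = (w[~left] * y[~left]).sum() / wr if wr > 0 else 0.0
            err = (w * (y - np.where(left, a, b)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, a, b)
    return best[1:]

def stump_predict(stump, X):
    j, t, a, b = stump
    return np.where(X[:, j] <= t, a, b)

def gentleboost_fit(X, y, n_rounds=100):
    """GentleBoost for labels y in {-1, +1}: add one stump per round, reweight samples."""
    w = np.full(len(y), 1.0 / len(y))
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        f = stump_predict(stump, X)
        w = w * np.exp(-y * f)   # up-weight samples the ensemble still gets wrong
        w = w / w.sum()
        stumps.append(stump)
    return stumps

def gentleboost_predict(stumps, X):
    """Sign of the summed stump outputs (the ensemble score)."""
    score = sum(stump_predict(s, X) for s in stumps)
    return np.where(score >= 0, 1, -1)
```

The default of 100 rounds mirrors the 100 trees quoted above. For the 7-class problem, one option is one such ensemble per class (one-vs-rest); the document does not say how multiclass GEDT was configured.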

Aggregated Score.
Cervical cell classification is evaluated using 5-fold cross-validation on the Herlev dataset for both the 2-class and 7-class problems. The performance metrics are accuracy, F1 score, area under the curve (AUC), specificity, and sensitivity. Finally, the number of correctly classified cells is obtained for each category in the Herlev dataset.
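The document does not say whether its five folds were stratified by class. The sketch below assumes stratification, using the 242 benign and 675 malignant cells stated earlier, and the helper `stratified_kfold` is hypothetical. The split is done at the cell level, before augmentation.

```python
import numpy as np

def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) for k folds, spreading every class evenly."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)                         # random order within the class
        fold_of[idx] = np.arange(len(idx)) % k   # deal the class out round-robin
    for f in range(k):
        yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)

labels = np.array([0] * 242 + [1] * 675)   # 0 = benign, 1 = malignant cells
for train_idx, test_idx in stratified_kfold(labels):
    # Augment only the cells in train_idx, after the split, so that rotated or
    # shifted copies of a test cell never appear in the training set.
    print(len(train_idx), len(test_idx), int((labels[test_idx] == 0).sum()))
```

Splitting before augmentation matters here because each cell produces 160 to 300 near-duplicate patches; splitting at the patch level would leak copies of test cells into training.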

Experimental Protocol.
In the training stage, the conv and pool layers of AlexNet, i.e., ConvNet 1 (Figure 5), are used as the initial layers of ConvNet 2, and the fc layer is initialized with random weights. To train ConvNet 2, a 227 × 227 patch is cropped randomly from each augmented image so that the training and test images match the input size of the network. For zero-center normalization, the mean image over the dataset is subtracted. ConvNet 2 is trained with stochastic gradient descent (SGD) for 30 epochs. Mini-batches of image patches are fed to ConvNet 2, and the validation accuracy is evaluated on batches. The mini-batch size is set to 256. The initial learning rate for the convolutional and pooling layers is set to 0.0001 and is decreased by a factor of 10 every 10 epochs. L2 regularization and momentum can be tuned to reduce overfitting and speed up the learning of ConvNet 2 [25]; the L2 regularization (weight decay) and momentum are empirically set to 0.0005 and 0.9, respectively. Finally, the network is trained using a randomly selected subset of epochs and validated for accuracy, and the model with the minimum validation error is used for classification.
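The training settings above (random 227 × 227 crops, mean-image subtraction, step learning-rate decay, and SGD with momentum and L2 weight decay) can be sketched in NumPy as follows. The update follows the usual momentum formulation, with weight decay added to the gradient. The random patches and the helper names are placeholders, not the MATLAB training code used in the paper.

```python
import numpy as np

def step_lr(epoch, base_lr=1e-4, drop=0.1, every=10):
    """Learning rate for 1-based `epoch`: divided by 10 after every 10 epochs."""
    return base_lr * drop ** ((epoch - 1) // every)

def sgdm_step(w, grad, velocity, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD-with-momentum update; L2 weight decay is added to the gradient."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity

def random_crop(img, rng, size=227):
    """Random size x size crop, e.g. 227 x 227 from a 256 x 256 patch."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

rng = np.random.default_rng(0)
patches = rng.random((8, 256, 256, 3))     # placeholder upsampled patches
mean_image = patches.mean(axis=0)          # mean image over the dataset
batch = np.stack([random_crop(p - mean_image, rng) for p in patches])
print(batch.shape)                         # (8, 227, 227, 3)
print([step_lr(e) for e in (1, 10, 11, 21, 30)])
```

With 30 epochs and a drop every 10 epochs, the learning rate takes the values 10^-4, 10^-5, and 10^-6.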
To test the system on an unseen image, multiple cropped patches, each containing a single cell, are generated from the original test image containing multiple cells. The classifier generates an abnormal score for each crop.
The abnormal scores of all N_test × N_crop patches of the test image are aggregated to generate the final score [47]. N_test = 300 patches (10 rotations × 30 translations) are generated in the same way as for the training images. Furthermore, N_crop = 10 cropped images (the four corners, the center of the cell, i.e., the nucleus region, and their mirrored images) are generated from each test patch. These N_test × N_crop image patches are input to the classifier, and the prediction scores of the classifiers (SR, SVM, and GEDT) are aggregated to calculate the final score, as shown in Figure 7.
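The multiple-crop testing scheme can be sketched as follows. The ten crops (four corners, center, and their mirrors) follow the text. Averaging the per-crop scores and thresholding at 0.5 are assumptions, since the aggregation rule of [47] is not spelled out here, and a 0.5 threshold only makes sense for probability-like scores.

```python
import numpy as np

def ten_crop(img, size=227):
    """Four corner crops, the centre crop (the nucleus region for nucleus-centred
    patches) and the horizontal mirror of each."""
    h, w = img.shape[:2]
    corners = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    crops = [img[top:top + size, left:left + size] for top, left in corners]
    return crops + [c[:, ::-1] for c in crops]

def final_abnormal_score(abnormal_scores, threshold=0.5):
    """Average the per-crop abnormal scores; the mean and the 0.5 threshold are assumptions."""
    score = float(np.mean(abnormal_scores))
    return score, score >= threshold

N_TEST, N_CROP = 300, 10
print(N_TEST * N_CROP)   # 3000 classifier calls per test cell
```

The 3000 classifier calls per cell explain the roughly 8-second test time reported below.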

ConvNet 2 Learning Results.
ConvNet 2 is fine-tuned on the Herlev dataset for the 2-class and 7-class problems for 30 epochs. The validation accuracy reaches its maximum after 10 epochs: 0.9935 for the 2-class problem and 0.8745 for the 7-class problem. Figure 8 illustrates the fine-tuning of ConvNet 2 over the 30 training epochs. These results are improved by applying different classifiers to the extracted features. GEDT performs best for both problems, possibly because it exploits the randomness of the data more efficiently than the other classifiers. Table 3 compares SR, SVM, and GEDT. The structure of the network layers is also examined by passing a test image through the fine-tuned network. Features learned at the first layer (conv1) are more generic, as shown in Figure 9.
These learned filters contain gradients of different frequencies, blobs, and color orientations. Going deeper through the convolutional layers (conv1 to conv5), the features become more prominent and more informative. Figure 10 shows the features learned in conv2 to conv5 for a test cervical cell image. The strongest activation at the pooling layer is shown in Figure 11: white pixels indicate strong positive activation, black pixels strong negative activation, and gray pixels weak activation. The strongest activation is negative on right edges and positive on left edges. The features at the fully connected layer are more abstract than those of the previous layers, as shown in Figure 12. The fully connected layer provides the features learned for the 7 classes.

ConvNet 2 Testing Results.
It is well known that a single evaluation metric is not adequate for assessing an algorithm when some classes in the dataset are imbalanced or the number of training labels is large [48]. Therefore, the performance of the deep model is reported in terms of four distinct metrics, accuracy, sensitivity, specificity, and F1 score, as in previous studies [49]. These metrics are calculated as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
$$\text{Sensitivity} = \frac{TP}{TP + FN},$$
$$\text{Specificity} = \frac{TN}{TN + FP},$$
$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}},$$
where precision and recall are
$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN},$$
and TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. For ConvNet 2, a test set of cervical images is evaluated using the multiple-crop testing scheme with three classifiers, SR, SVM, and GEDT. GEDT again outperforms SR and SVM in classification accuracy on the test set. The results are presented in Table 4.
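The metric definitions above, plus the AUC reported in Tables 5 and 6, can be computed from labels and scores as in the sketch below. The AUC uses its rank interpretation: the probability that a random positive is scored above a random negative, with ties counting one half. Both helper names are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall), specificity, precision and F1 from TP/TN/FP/FN counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
    }

def auc(y_true, scores, positive=1):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

For the 7-class problem, the same function can be applied one class at a time (one-vs-rest) and the results averaged. Whether the paper averages in this way is not stated.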
The class-wise accuracy shows that the proposed method predicts the cervical cell images well. The classification accuracy of each of the seven cell categories is calculated by feeding all images to the classifiers as test images. GEDT performs best on classes 1, 2, 4, and 5, possibly because it can eliminate irrelevant features and efficiently extract decision rules from decision trees. The performance deteriorates slightly for classes 3 and 6, as shown in Figure 13.
Figure 9: (a) Test image; (b) features learned at conv1.
The evaluation parameters of the classification performance of the trained ConvNet 2, i.e., accuracy, F1 score, area under the curve, specificity, and sensitivity, are displayed in Tables 5 and 6, which compare the proposed work with [13,39] and [50-56]. Two scenarios with different classifiers, SVM and GEDT, are proposed. For the 2-class problem, the mean accuracy, F1 score, area under the curve (AUC), sensitivity, and specificity of the fine-tuned ConvNet 2 with the GEDT classifier are 99.6%, 99.14%, 99.9%, 99.30%, and 99.35%, respectively; for the 7-class problem, they are 98.85%, 98.77%, 99.8%, 98.8%, and 99.74%, respectively. The accuracy of our system (ConvNet with GEDT) is 99.6% for the 2-class problem and 98.85% for the 7-class problem, compared with 99.5% and 91.2%, respectively, for [39].
This indicates that the prediction accuracy of our classification model is better than that of existing models. Similarly, the sensitivities of 99.38% and 99.30% indicate better performance of the proposed method than existing methods in classifying cervical cell images. Likewise, the specificity and accuracy of the proposed system for the 2-class problem are better than those of the previous methods in [15,16,34,36,38,41,56]. Correctly classified and misclassified cervical cell images were also analyzed. Figure 14 shows correctly classified malignant cell images; columns 1 to 4 are mild dysplasia, moderate dysplasia, severe dysplasia, and carcinoma, respectively. Figure 15 shows misclassified test cell images (normal misclassified as abnormal and abnormal misclassified as normal).

Computational Complexity.
In the training phase, ConvNet 1 is trained on a Core i7 machine with a 2.8 GHz clock speed, an NVIDIA 1080 Ti GPU, and 8 GB of memory, using MATLAB R2017b. The average training time for ConvNet 2 over 30 epochs is about 4 hours 30 minutes for the 2-class problem and 8 hours 20 minutes for the 7-class problem. In the testing phase, classifying one cell image as normal or abnormal takes around 8 seconds on average, because multiple-crop testing requires N_patches × N_crops = 3000 classifications followed by score aggregation.

Results-Related Discussion
The experimental results presented in this study suggest the following key observations:
(1) Compared with traditional schemes that rely on prior feature extraction, the proposed cervical cell screening system is more effective and robust, because the ConvNets encode cervical cancer-specific features automatically. In traditional methods such as [11,12,14,15], cervical cell feature extraction is hand-crafted, which limits the success of the overall system. Moreover, large irregular shapes, dissimilar appearances, and cell clusters in both malignant and benign nuclei make accurate segmentation of the cytoplasm and nucleus difficult. In contrast, this method extracts cervical cell features automatically to encode the representation of cancerous cells and thus achieves superior classification accuracy across a range of cell severities.
(2) To prevent overfitting, a data augmentation technique suited to the underlying task of cervical cell grading has been proposed. The training and validation losses over 30 epochs were evaluated to analyze the impact of the proposed augmentation on classification accuracy. Overfitting is greatly reduced when the augmentation strategy is applied to train our classification model: Figure 16 shows that data augmentation narrows the gap between training and validation losses. This suggests that the approach helps the model learn the most discriminative features for the task. Furthermore, the proposed model works across a variety of cervical cells and preserves discriminative information during training, while in the testing stage, a cell image with an arbitrary level of severity can be classified into its true grade. Hence, the method keeps the classification model from overfitting and remains accurate across the varying nature of cervical cells.
(3) A multiple-crop testing scheme is used with three classifiers to calculate the accuracy of each individual class of cervical cell images. The class-wise accuracy displayed in Figure 13 shows that the clearer the cervical cells, the more robust the classification. For example, classes 1, 2, 4, and 5 reach the highest accuracy, 100%, because these cell types can be identified more effectively by the model. The classification accuracy for class 7 is 99.20%. Conversely, the performance degrades slightly for class 3 (97.50%) and class 6 (97.79%), because their features are very close to each other, leaving the model less cell-specific discriminative information.
(4) In general, a single performance metric can give a misleading picture of classification performance when some classes in the dataset are imbalanced or the number of training samples is too small or too large. Existing methods on cervical cell images, such as [13,39] and [50-56], report classification performance in terms of accuracy only. In contrast, we consider four distinct evaluation metrics: accuracy, sensitivity, specificity, and area under the curve. The experimental results displayed in Tables 5 and 6 show the consistent performance of our proposed models in cervical cell image classification across different evaluation metrics. Two scenarios with different classifiers, SVM and GEDT, are proposed, and the proposed scheme outperforms all the previous approaches listed in Table 6, even though the training and test images also contain overlapping cells. This performance is mainly due to the following: (a) during training, transfer learning is used and the network is then fine-tuned on the Herlev dataset; (b) the trained network is used only to extract deep features; and (c) the extracted features are fed to more robust classifiers, SVM and GEDT, for the final classification. This suggests the effectiveness of our method across the wide variety of cervical cell images, from class 1 to class 7.
(5) The structure of the different layers of the fine-tuned network was also explored. Features learned in the initial conv layers are more general, and deeper in the network the extracted features become more abstract. The features learned at the different layers are displayed in Figures 9-12.
(6) Despite its high performance, the deep learning-based cervical cell screening system has some limitations. Classifying a single cropped cell image takes 8 seconds, which is very slow in a clinical setting, since one Pap smear slide yields a large number of samples. This limitation can be addressed by skipping data augmentation for the test data and using only multiple-crop testing for classification. This reduces the classification time to 0.08 seconds but lowers the accuracy by 1.5%. Although the classification accuracy of the system on the Herlev dataset is high, there is room for further improvement.

[Figure 13: Class-wise accuracy; y-axis: Accuracy (%)]

Table 6: Comparison of the proposed work with existing state-of-the-art methods using the Herlev dataset.
System | Features | Dataset | Accuracy (%)
[50] | SVM | 149 images of Herlev | 98 (2-class)
[51] | k-NNs and ANNs | Herlev | k-NNs: 88; ANNs: 54 (2-class)
[52] | 20 features, fuzzy C-means, 2nd-order NN | Herlev | 98.4 (2-class)
[53] | Decision tree | Local | 67.5 (2-class)
[54] | Backpropagation neural networks | Local | 95.6 (2-class), 78.06 (7-class)
[55] | 20 features, C-means/fuzzy clustering | Herlev | 94-96 (2-class), 72-80 (7-class)
[13] | Shape, texture, Gabor, SVM | Local | 96 (2-class)
[56] | 20 features, GA | Herlev | 98 (2-class), 96.9 (7-class)
[39] | CNN-ELM | | 99.5 (2-class), 91.2 (7-class)

Conclusions and Future Work
This paper proposes an automatic cervical cancer screening system using a convolutional neural network. Unlike previous methods, which rely on cytoplasm/nucleus segmentation and hand-crafted features, our method automatically extracts the deep features embedded in the cell image patch for classification.
The system requires cell patches with a coarsely centered nucleus as network input. Transfer learning is used for pretraining: the initial weights are transferred from a pretrained network to a new convolutional neural network, which is fine-tuned on the cervical cell dataset. The features learned by the fine-tuned network are extracted and given as input to different classifiers, SR, SVM, and GEDT, and the validation results for the 2-class and 7-class problems are analyzed. To test a single cell image, test patches are generated in the same way as the training data, the multiple-crop testing scheme is applied to all patches to obtain classifier scores, and these scores are aggregated into the final score. On the Herlev Pap smear dataset, the proposed method outperforms previous state-of-the-art approaches in classification accuracy, sensitivity, specificity, F1 score, and area under the curve. A segmentation-free, highly accurate cervical cell classification system of this type is a promising approach for developing automation-assisted cervical cancer screening systems.
In future work, the system will be evaluated on field-of-view images containing overlapping cells; it should avoid misclassifying overlapping objects, and specific deep learning-based classifiers may be used to address these problems. Moreover, deep learning-based cervical cell classification still needs to be explored for high-precision diagnosis.
Data Availability
The data used to support the findings of this study will be made available on request.