The Performance of Artificial Intelligence in Cervical Colposcopy: A Retrospective Data Analysis

Objective. We aimed to evaluate the performance of an artificial intelligence (AI) system in detecting high-grade precancerous lesions. Methods. A retrospective diagnostic study was conducted in Chongqing Cancer Hospital. Anonymized medical records with cytology, HPV testing, colposcopy findings with images, and histopathological results were selected. The sensitivity, specificity, and areas under the curve (AUC) in detecting CIN2+ and CIN3+ were evaluated for the AI system, the AI-assisted colposcopy, and the human colposcopists. Results. Anonymized medical records from 346 women were obtained. The colposcopy images of 194 women were classified as positive by the AI system; 245 women were classified as positive by either the human colposcopists or the AI system. In detecting CIN2+, the AI-assisted colposcopy significantly increased the sensitivity compared with human colposcopists (96.6% vs. 88.8%, p=0.016). The specificity was significantly lower for AI-assisted colposcopy (38.1%) than for human colposcopists (59.5%, p < 0.001) or the AI system (57.6%, p < 0.001). The AUCs for the human colposcopists, AI system, and AI-assisted colposcopy were 0.741, 0.765, and 0.674, respectively. In detecting CIN3+, the sensitivities of the AI system and AI-assisted colposcopy were not significantly higher than that of human colposcopists (97.5% vs. 92.6%, p=0.13). The specificity was significantly lower for AI-assisted colposcopy (37.4%) than for human colposcopists (59.2%, p < 0.001) or the AI system (56.6%, p < 0.001). The AUCs for the human colposcopists, AI system, and AI-assisted colposcopy were 0.759, 0.674, and 0.771, respectively. Conclusions. The AI system provided sensitivity comparable to that of human colposcopists in detecting CIN2+ and CIN3+. The AI-assisted colposcopy significantly improved the sensitivity in detecting CIN2+.


Introduction
Cervical cancer is a common malignant tumor among women. According to estimates from the International Agency for Research on Cancer, there were more than 600,000 new cases worldwide and 340,000 women died from cervical cancer in 2020 [1]. It is well known that persistent infection with high-risk human papillomavirus (HPV) is the cause of cervical cancer and precancerous lesions, and cervical cancer is highly preventable through prophylactic HPV vaccination and screening [2].
In recent decades, guidelines have recommended HPV testing as a primary screening approach. Women with positive screening results on HPV testing and cytology are referred for colposcopy and biopsy [3]. The pathological diagnosis of the biopsy specimen is the gold standard for the early diagnosis of cervical cancer and precancerous lesions. Hence, the biopsy specimen obtained under colposcopy is essential for an accurate diagnosis. However, the accuracy of colposcopy and biopsy depends on the experience of the colposcopist [4]. The accuracy and reproducibility among different colposcopists, and the agreement between colposcopy findings and histopathology-confirmed CIN, vary greatly [5,6]. To avoid wasting health resources through overdiagnosis or missed cases, it is imperative to improve the diagnostic accuracy of colposcopy [7].
With the fast development of computing and Internet science, artificial intelligence (AI) has been engaged in the healthcare industry in recent years, especially in the diagnosis of cancers [8][9][10][11][12]. In the field of cervical cancer prevention, efforts have been made in the development of computing scoring systems and artificial intelligence [13][14][15][16][17][18][19]. Computing scoring systems involving artificial intelligence were proposed to improve the quality of management of women with abnormal screening results [13,15]. Computational analysis was involved to improve the accuracy of cytology grading [16][17][18][19]. In the year 2020, Xue et al. reported that a colposcopic artificial intelligence auxiliary diagnostic system (CAIADS) for grading colposcopic impressions and guiding biopsies was developed and successfully validated and concluded that CAIADS achieved high sensitivity and comparable specificity to colposcopies interpreted by colposcopists [20]. We are interested in the performance of the AI system that identified the colposcopic images alone or assisted the human colposcopists in detecting high-grade cervical precancerous lesions. In this study, we selected an independent dataset to further evaluate its performance as an independent diagnosis system and as an assisted system.

Materials and Methods
This was a retrospective diagnostic study conducted at Chongqing University Cancer Hospital, Chongqing, China. The cytology, HPV testing, colposcopy findings, and histopathological results were collected along with the colposcopy images.
Records were eligible if the woman had abnormal cytology, a positive HPV test, or self-reported symptoms that led the gynecologist to perform colposcopy and biopsy, and if the record included a colposcopy examination with sequential images and a histopathological diagnosis. The images were captured by electronic colposcopy devices (Goldway, China) and stored in JPEG format (640 × 480 pixels). Each woman had at least five images: one preacid image and four postacid images taken at 60 s, 90 s, 120 s, and 150 s. The personal information in all selected records was fully anonymized. All methods were carried out in accordance with relevant guidelines and regulations. The study was approved by the Research Ethics Committee of Chongqing University Cancer Hospital. The need for informed consent was waived because the personal information was fully anonymized.

Cytology and HPV Testing.
The cytology findings of the selected medical records were liquid-based cytology results and were reported according to the 2014 Bethesda nomenclature, including negative for intraepithelial lesion or malignancy (NILM), atypical squamous cells of undetermined significance or worse (ASC-US+), atypical glandular cells (AGC), atypical squamous cells that cannot exclude high-grade squamous intraepithelial lesion (ASC-H), low-grade squamous intraepithelial lesion (LSIL), high-grade squamous intraepithelial lesion (HSIL), squamous cell carcinoma (SCC), adenocarcinoma in situ (AIS), and adenocarcinoma (ADC).

Colposcopy and Histopathology.
In the colposcopy examination, 5% acetic acid was applied to the cervix. The colposcopy finding was classified as normal/benign or abnormal (including low-grade, high-grade, and cancer). A punch biopsy was performed if acetowhite epithelium was observed after the application of the acetic acid. The colposcopy-directed biopsy targeted each suspected lesion area. If the colposcopy impression was normal, HPV testing and/or cytology results, self-reported symptoms, disease history, and benign findings (such as a polyp or condyloma) were taken into consideration when deciding whether biopsy or diagnostic excision was necessary. Endocervical curettage (ECC) was performed if necessary. The pathological results of the histological specimens were the gold standard. The final pathological diagnosis for a woman was based on the worst finding from the histopathological slides. All slides were reviewed by pathology experts from the Chongqing University Cancer Hospital.

The AI System.
The development and validation of the AI system were reported elsewhere by Xue et al. [11]. The AI system consists of a deep learning framework and a risk prediction scoring model. A convolutional neural network (CNN) is trained to crop the cervix region from the colposcopy images.
The CNN-ResNet-50 [21] is employed as the backbone to identify the cervix bounding box. A fully convolutional network, U-Net5, is adopted in the AI system to perform lesion area segmentation. Cervical images with manual annotation of the lesion areas were used to train and validate the lesion segmentation U-Net. To address the false negatives yielded by the deep learning framework, a risk prediction scoring model was designed to optimize the diagnosis by analyzing the cytology and/or HPV testing results. Cases with negative colposcopy but HSIL+ cytology and hrHPV 16/18 with LSIL+ cytology were suggested to be biopsied. Example images in which the AI system identifies and marks the suggested biopsy areas are shown in Supplementary Figure 1.
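The cytology and HPV fallback described above can be read as a simple rule applied to colposcopy-negative cases. The sketch below shows one possible reading and is not the scoring model of Xue et al.: the function name `suggest_biopsy`, the label strings, the treatment of the two criteria as alternatives, and in particular which Bethesda categories count as "LSIL+" and "HSIL+" are all assumptions made for illustration.

```python
# Illustrative sketch only. The category groupings are assumptions,
# not the published risk prediction scoring model.
HSIL_PLUS = {"HSIL", "SCC", "AIS", "ADC"}    # assumed meaning of "HSIL or worse"
LSIL_PLUS = HSIL_PLUS | {"LSIL", "ASC-H"}    # assumed meaning of "LSIL or worse"


def suggest_biopsy(image_positive: bool, cytology: str, hpv16_18_positive: bool) -> bool:
    """Return True when a biopsy would be suggested under this reading of the rule."""
    if image_positive:
        # The image-based model already flagged a lesion.
        return True
    if cytology in HSIL_PLUS:
        # Negative images but HSIL+ cytology.
        return True
    # Negative images, HPV 16/18 positive, and LSIL+ cytology.
    return hpv16_18_positive and cytology in LSIL_PLUS


print(suggest_biopsy(False, "LSIL", True))
print(suggest_biopsy(False, "NILM", True))
```

Under these assumptions the first call suggests a biopsy and the second does not, because NILM cytology falls below the assumed LSIL+ threshold.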

Statistical Analysis.
CIN2+ and CIN3+ were the clinical endpoints for the evaluation. The finding of the AI system was a dichotomous variable. A positive result of the AI system indicated a low-grade or worse finding under colposcopy. The addition of the AI system to human colposcopists was named "AI-assisted colposcopy." The AI-assisted colposcopy was also a dichotomous variable, and a positive result was defined as a positive finding by either the human colposcopists or the AI system. The enrolled medical data were classified by histopathology finding (negative, CIN1, CIN2, and CIN3+), cytology result (NILM, ASC-US, AGC, LSIL, ASC-H, HSIL, and SCC), HPV status (negative, HPV 16/18 positive, or other high-risk subtypes positive), human colposcopists' colposcopy finding (normal, LSIL, HSIL, or cancer), the AI system finding, and the AI-assisted colposcopy (negative or positive). The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were evaluated with 95% confidence intervals (CIs) calculated by the Wilson score method.
The areas under the curve (AUC) were evaluated. McNemar's test was used to evaluate the differences in sensitivity and specificity between the AI system, AI-assisted colposcopy, and human colposcopists. A two-sided p value less than 0.05 was considered statistically significant. Statistical analyses were conducted with IBM SPSS 21 software (IBM, New York, USA).
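As an illustration of the Wilson score method named above, the following sketch computes a 95% CI for a proportion such as sensitivity. It is a minimal standalone implementation and not the SPSS procedure used in the study. The function name `wilson_interval` is ours, and the counts (85 positive results among 89 diseased women) are hypothetical values chosen only because they give a proportion of 95.5%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: 85 test-positive women among 89 with the disease.
low, high = wilson_interval(85, 89)
print(f"sensitivity = {85 / 89:.1%}, 95% CI = ({low:.1%}, {high:.1%})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves well when the proportion is close to 0 or 1, which is the usual situation for sensitivity in a referral population.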

Discussion
In the present study, we further validated the accuracy of the colposcopic deep learning auxiliary diagnosis system developed by Xue et al. The results showed that the AI system alone was as accurate as the human colposcopists in detecting high-grade precancerous lesions of the cervix, with comparable sensitivity and specificity. The addition of the AI system to the human colposcopists could improve the sensitivity of detecting histopathologically confirmed CIN2+, although with a lower specificity.
Colposcopy is a real-time visualization and assessment instrument of the cervix for the detection of CIN and invasive cancer. The accuracy of colposcopy and colposcopy-guided biopsy in detecting high-grade CIN and cervical cancer has been a concern for decades. It has been well documented that colposcopic assessment and biopsy have limited reproducibility and can miss a substantial proportion of prevalent high-grade CIN, with false negative rates ranging from 13% to 69% [22][23][24][25]. To minimize the potential harm caused by colposcopy and biopsy, it was suggested that colposcopy should be performed by a well-trained, knowledgeable provider to reduce inaccurate diagnosis and resultant inappropriate management [26]. However, in real-world clinical practice, the countries and areas that bear a heavy disease burden of cervical cancer usually have a shortage of experienced colposcopists. To improve the sensitivity of colposcopy-guided biopsy, some have suggested taking multiple biopsies and random biopsies from the normal-appearing quadrants [27][28][29]. However, no widely adopted biopsy guideline exists to date.
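The paired comparison behind this sensitivity gain can be sketched as follows. The code applies the either-positive rule that defines AI-assisted colposcopy and then runs an exact (binomial) McNemar test on the discordant pairs. The exact form of the test is an assumption, since the text states only that McNemar's test was used, and the per-woman results are hypothetical: they were constructed to be consistent with sensitivities of 88.8% and 96.6% in 89 diseased women and are not the study data.

```python
from math import comb


def either_positive(human: list[bool], ai: list[bool]) -> list[bool]:
    """AI-assisted result: positive when the human reader or the AI is positive."""
    return [h or a for h, a in zip(human, ai)]


def discordant_counts(test_a: list[bool], test_b: list[bool]) -> tuple[int, int]:
    """Count pairs where only test A is positive and where only test B is positive."""
    only_a = sum(1 for a, b in zip(test_a, test_b) if a and not b)
    only_b = sum(1 for a, b in zip(test_a, test_b) if b and not a)
    return only_a, only_b


def mcnemar_exact(only_a: int, only_b: int) -> float:
    """Two-sided exact McNemar p value from the two discordant counts."""
    n = only_a + only_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(only_a, only_b) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical diseased women: 78 flagged by both readers, 1 by the human only,
# 7 by the AI only, and 3 by neither.
human = [True] * 78 + [True] * 1 + [False] * 7 + [False] * 3
ai = [True] * 78 + [False] * 1 + [True] * 7 + [False] * 3
assisted = either_positive(human, ai)

only_assisted, only_human = discordant_counts(assisted, human)
print(sum(human), sum(assisted), only_assisted, only_human)
print(f"p = {mcnemar_exact(only_assisted, only_human):.3f}")
```

Because the assisted result is positive whenever the human result is positive, the second discordant count is always zero under this rule, so the test depends only on how many diseased women the AI system alone flagged.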
As computer science and technology develop rapidly, the advantages of AI lie in recognizing complex patterns in images and transforming image interpretation from a qualitative and subjective task into one that is quantifiable and effortlessly reproducible [30]. The shortage of well-trained personnel therefore seems likely to be solved within a shorter time frame. The application of artificial intelligence in medical services has become promising for screening of cancer and precancerous lesions [30]. To meet the need of improving the quality of colposcopy and biopsy, especially in low- and middle-income countries, Xue et al. developed and validated a colposcopic deep learning auxiliary diagnosis system. In the previous results reported by Xue et al., the AI system achieved a high agreement (82.2%) with the pathological gold standard for grading colposcopic impressions (kappa 0.750). However, the observed agreement between the AI system grading and histopathological findings was 66.9% for HSIL.
* A positive finding by the AI system indicated that the images were classified as positive by the AI system alone. † A positive AI-assisted finding indicated that the images were classified as positive by the AI system, the human colposcopists, or both. AI, artificial intelligence; ASC-US, atypical squamous cells of undetermined significance; ASC-H, atypical squamous cells that cannot exclude high-grade squamous intraepithelial lesion; AGC, atypical glandular cells; CIN, cervical intraepithelial neoplasia; HPV, human papillomavirus; HSIL, high-grade squamous intraepithelial lesion; LSIL, low-grade squamous intraepithelial lesion; NILM, negative for intraepithelial lesion or malignancy; SCC, squamous cell carcinoma.
Journal of Oncology
Since the task of the colposcopy examination is to decide whether to take a biopsy and to locate the suspicious lesions so that underlying cervical precancerous lesions can be detected for subsequent treatment, the AI system-graded HSIL finding seemed not to be a practical threshold for biopsy. In their validation set, if the biopsy threshold is set at low-grade or worse colposcopy findings, the sensitivity for the analysis of images by the AI method was 87.3% (95% CI: 85.5%, 88.9%) and the specificity was 48.9% (95% CI: 46.8%, 50.9%). In our data, the sensitivity for detecting CIN2+ by the AI system was 95.5% (95% CI: 89.0%, 98.2%) and the specificity was 57.6% (95% CI: 51.5%, 63.5%). The sensitivity for detecting CIN3+ by the AI system was numerically higher, at 97.5% (95% CI: 91.4%, 99.3%). Xue et al. did not report the accuracy of adding CAIADS to the human colposcopists, instead presenting the diagnostic performance of CAIADS and colposcopists separately, because the main task of the previous study was to construct an accurate AI method. However, for clinical implementation, it may not be possible to make the biopsy decision based on the AI system alone, although it showed comparable sensitivity and specificity to the human colposcopists. Our data implied that the scenario of combining the AI system and the human colposcopists was practical, since in a population at high risk of cervical cancer and precancerous lesions identified by HPV testing and/or cytology, a relatively higher sensitivity with a loss of specificity may be tolerable in clinical practice.
Our study further compared the performance of the AI system with human colposcopists and evaluated the AI-assisted colposcopy in a practical clinical condition. The results suggest that in resource-limited areas that lack well-trained, knowledgeable colposcopy providers but bear a heavy disease burden of cervical cancer, the AI system may be useful for assisting the biopsy procedure and for training young colposcopists. The limitations of this study were its single-center and retrospective design. Disagreements between the AI system and human colposcopists could not be resolved in the present study when the AI system suggested an extra biopsy. A prospective study is necessary to further validate the predictive performance of the AI system and the AI-assisted colposcopy.
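The trade-off between sensitivity and specificity can be summarized in a single number when the test result is dichotomous. In that case the ROC curve has only one operating point, and the area under the two straight segments through it equals the mean of sensitivity and specificity. The sketch below assumes the AUC is obtained this way, which the text does not state explicitly. The function name is ours, and the inputs are the CIN2+ sensitivity and specificity values reported in this article.

```python
def auc_single_threshold(sensitivity: float, specificity: float) -> float:
    """Area under the ROC curve drawn through one operating point."""
    fpr = 1 - specificity
    # Triangle from (0, 0) to (fpr, sensitivity), plus the trapezoid
    # from (fpr, sensitivity) to (1, 1).
    left = 0.5 * fpr * sensitivity
    right = 0.5 * (1 - fpr) * (sensitivity + 1)
    return left + right


# CIN2+ values reported in this article (sensitivity, specificity).
ai_alone = auc_single_threshold(0.955, 0.576)
ai_assisted = auc_single_threshold(0.966, 0.381)
print(f"AI alone: {ai_alone:.2f}, AI-assisted: {ai_assisted:.2f}")
```

Under this assumption, a large loss of specificity outweighs a small gain in sensitivity, so the combined rule has a lower AUC than the AI system alone even though it detects more CIN2+ cases.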
In conclusion, our study indicates that the AI system's analysis of the images provided sensitivity comparable to that of the human colposcopists in detecting CIN2+ and CIN3+. The AI-assisted colposcopy significantly improved the sensitivity in detecting CIN2+.

Data Availability
The datasets used and/or analyzed during the present study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Dr. Yuqian Zhao and Dr. Yucong Li contributed equally to this work.