Use of Nuclear Morphometry Characteristics to Distinguish between Normal and Abnormal Cervical Glandular Histologies

This is a methodological study exploring the use of quantitative histopathology of the cervix to discriminate between normal and cancerous (adenocarcinoma and adenocarcinoma in situ) tissue samples. The goal is to classify tissue samples, which are populations of cells, from measurements made on the cells. Our method uses one particular feature, the IODs-Index, to create a tissue-level feature, and the specific goal of this study is to find the threshold for the IODs-Index that is used to create this tissue-level feature. The main statistical tool is Receiver Operating Characteristic (ROC) curve analysis. Applied to our data set, the method achieved promising results, with high estimated sensitivity and specificity. The optimal threshold for the IODs-Index was found to be 2.12.


Introduction
This study explores quantitative pathology (QP) as a possible aid for the detection and diagnosis of adenocarcinoma (AdCa) and adenocarcinoma in situ (ACIS) of the cervix. QP methods measure nuclear characteristics through digital image analysis. Commercially available systems that collect and analyze these nuclear images exist, and some of them are designed for the cytologic diagnosis of the more common squamous lesions of the cervix. These systems take a large number of measurements based on the optical properties of nuclear images in order to capture a subset of pertinent nuclear characteristics related to ploidy and abnormality [8]. QP reduces the subjectivity associated with classic visual diagnostic procedures and provides reproducible criteria for the diagnosis of ACIS and AdCa. QP has been shown to be effective in detecting premalignant and malignant cells [19]. There is evidence that QP methods have potential in detecting malignancy-associated changes (changes in DNA structure associated with premalignant lesions that cannot be detected using classical visual techniques) [17]. QP has also successfully monitored regression and progression in chemoprevention studies of premalignant conditions [3,23], and its measurements can be correlated with biomarkers evaluating ploidy [20,21].

¹ The work described in this article was performed while LCDR Loyd A. West, MC, USNR was a fellow at M.D. Anderson Cancer Center training in Gynecologic Oncology. The views expressed in this article are those of the author and do not reflect the official policy or position of the Department of the Navy, Department of Defense, nor the U.S. Government.
ACIS was thought to be rare when it was first described, but recent evidence suggests that more attention should be given to diagnosing this disease. Hepler et al. [12] first described ACIS in 1953, and Friedell and McKay [11] detailed further characteristics of the lesion in a later publication. This lesion currently occurs in 1 out of 25,000 Pap smears, but several studies offer evidence that its prevalence is rising [7,22]. ACIS is the putative precursor of AdCa, and its detection might improve the prognosis of patients with AdCa. AdCa currently has a poorer prognosis than the more common squamous cervical lesions [4,7,9,10,26]. The histologic and cytologic diagnostic criteria for ACIS have been established only relatively recently [1,2,5,13,15]. The cytologic and histologic characteristics of ACIS can resemble those of other lesions, such as metaplasia, endometriosis, and reparative changes [13]. Therefore, traditional cytologic procedures may not detect this lesion with high sensitivity, which may result in underdiagnosis of this potentially serious disease [4,16].
Previously, West et al. found several QP features whose means differed significantly across cell types [27]. The cellular measurements were taken using the Cyto-Savant™, one of the commercially available image analysis systems. Subsequently, Swartz et al. reported promising results for classifying tissue slices based on cellular measurements [25]. In that study, several classification procedures were examined. Each procedure first used the cellular measurements from the Cyto-Savant™ to create features that apply to the tissue sample as a whole, and then used these tissue-sample features to classify the tissue sample as diseased or not diseased.
One of the promising procedures used by Swartz et al. involved only the IODs-Index [25]. The IODs-Index (also called ICM-DNA, or Image Cytometric Measurement of DNA [3]) measures the Integrated Optical Density (IOD) of the nuclear image. The lowercase "s" emphasizes that the measurement is applied to cells collected from sections. It is a normalized feature: the IOD measurement of each cell of interest (here, the glandular epithelial cells from the cervical canal) is divided by the mean IOD measurement of a collection of control cells from the same tissue slice. In this study, lymphocytes serve as the control cells. The normalization reduces patient-to-patient variability and the effects of stain intensity, so the IODs-Index is easier to compare across patients than the raw IOD measurement. In addition, the IODs-Index is highly correlated with the amount of DNA in the nucleus.
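The normalization amounts to IODs-Index = (IOD of the cell of interest) / (mean IOD of the lymphocytes on the same slice). The Python sketch below illustrates this computation, assuming per-nucleus IOD values are available as plain lists of numbers; the function name iods_index and the sample values are hypothetical and are not part of the Cyto-Savant™ software.

```python
from statistics import mean


def iods_index(epithelial_iod, lymphocyte_iod):
    """Normalize each epithelial nucleus IOD by the mean lymphocyte IOD
    measured on the same tissue slice (the internal control cells)."""
    control = mean(lymphocyte_iod)
    if control <= 0:
        raise ValueError("mean lymphocyte IOD must be positive")
    return [iod / control for iod in epithelial_iod]


# Hypothetical IOD values (arbitrary units) for a single tissue slice.
epithelial = [50.0, 100.0, 210.0]
lymphocytes = [48.0, 52.0, 50.0]
print(iods_index(epithelial, lymphocytes))  # [1.0, 2.0, 4.2]
```

Because every cell on a slide is divided by the same slide-specific control value, a uniform change in stain intensity across a slide largely cancels out of the ratio.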
In the previous work by Swartz et al. [25], the tissue-slice feature created from the IODs-Index was the proportion of cells in a given tissue slice with IODs-Index values greater than or equal to 2.5. Higher values of this proportion were associated with cancerous tissue slices. Although the value 2.5 was biologically motivated, it was not chosen to optimize the classification procedure.
In this work, we empirically find a threshold value that is optimal for the classification task at hand on the given data set. Our results suggest that optimizing the threshold for the IODs-Index yields a value smaller than 2.5, which could improve classification performance when distinguishing normal from cancerous (ACIS and AdCa) tissue slices. These results indicate that a simple algorithm involving only the IODs-Index has the potential to perform very well when classifying tissue slices.

Materials and methods
We will begin by discussing the patient case selection and the selection of the cells from the tissue slices. Then we discuss the classification algorithm along with the optimization procedure.

Patient case selection
The data used in this article are the same as those described in [25] and [27]. Patients for this study were identified by a retrospective computerized search of the pathology records at the M.D. Anderson Cancer Center by pathologic diagnosis. Archival tissue blocks and pathology slides stained with hematoxylin and eosin (H&E) were retrieved, reviewed, and mapped by two pathologists specializing in gynecology (I.B. and A.M.). The pathologists confirmed the diagnoses of ACIS and AdCa using established criteria [5,15]. Cases containing mixed lesions or insufficient tissue for additional preparation and analysis were excluded. Patients in the normal group were also identified from the archival tissue blocks; they had been treated for reasons unrelated to adenocarcinoma, and their slices showed no evidence of this disease. Further quality-control exclusions, described below, resulted in a final data pool of 68 tissue samples: 13 with normal histology, 37 with ACIS, and 18 with AdCa.

Specimen preparation
Sections 4 µm thick were cut from the archival specimens and stained using a thionin-Feulgen reaction method as described in [3]. The pathologists then re-reviewed the slides and compared them with the original H&E slides to confirm both the diagnosis and that sufficient tissue was present for analysis. One of the pathologists also mapped the region of interest for the image analysis.

Image analysis and cell selection
The QP measurements were made with a Cyto-Savant™ computer-assisted image analysis system (Cancer Imaging, Vancouver, BC, Canada) using feature set FB5. Seventy-five tissue slices, one from each patient, were initially collected. Data from two specimens were discarded because the values used for normalization were missing, and five other slides were discarded because the images were out of focus. From each of the remaining 68 tissue sections, images of 146 ± 50 (mean ± standard deviation) epithelial and 105 ± 30 lymphocyte cell nuclei were collected from the mapped areas using a semi-interactive procedure. Only nonoverlapping nuclei with clearly discernible borders and no evidence of "capping" were collected for the final analysis. Capping refers to the cutting effect in which only the tip, or cap, of a nucleus is present on the slide.
The images were then stored in the computer memory. Nuclear boundaries can be defined by the computer software in a precise and reproducible manner, and measurements are made on the cell images and stored for analysis. The lymphocyte nuclear images are used as an internal standard on each slide to normalize some of the features, including the IODs-Index, so these features are adjusted for effects of stain intensity.
In addition to stain intensity variation, there was concern that DNA might degrade over time, because the samples were collected over a wide range of years (1985-1999). This issue was examined by finding normal epithelial cells on all slides and computing the mean DNA amount for each slide. No significant correlation was found between the age of the sample and the mean DNA amount for either the squamous cells or the lymphocytes. This suggests that there is minimal or no DNA degradation over the range of the samples. The analysis is discussed in more detail in Section 4.

Statistical analysis
Before analyzing the data, the AdCa and ACIS groups were combined to make up one group of diseased cases, and this diseased group was compared to the normal cases. Macros were written in SAS version 8.01 to perform all the analyses. Here we describe a composite discriminant algorithm we developed, which performs two basic tasks. First, it creates a feature for the tissue slice using the cellular information from the IODs-Index. Second, the tissue slices are classified using statistical methods applied to the tissue-slice feature. In this section, we describe the basics of the algorithm and the optimization of the threshold used to create the tissue-slice feature.

General algorithm
The general algorithm has two steps. In the first step, we generate a tissue-slice feature from the IODs-Index. We consider all the cells from a given tissue slice and calculate the proportion of cells with IODs-Index values greater than or equal to an Index Cutoff (IC) value. We will refer to the Proportion of IODs-Index values greater than or equal to this IC value as the PIV score, and to designate a PIV score associated with a particular IC value, we will include the IC value in the name (for example, the PIV score associated with an IC value of 2.12 is the PIV2.12 score). The PIV score is a feature that applies to the tissue slice.
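A minimal Python sketch of the PIV score for one slice, assuming the slice's IODs-Index values are held in a list. The name piv_score and the example values are hypothetical illustrations, not the authors' SAS macros.

```python
def piv_score(index_values, ic):
    """Proportion of a slice's IODs-Index values that are >= the Index Cutoff (IC)."""
    if not index_values:
        raise ValueError("the slice has no measured cells")
    at_or_above = sum(1 for v in index_values if v >= ic)
    return at_or_above / len(index_values)


# Hypothetical IODs-Index values for one slice; the PIV2.12 score uses ic=2.12.
slice_values = [0.9, 1.1, 2.12, 2.5, 1.0]
print(piv_score(slice_values, 2.12))  # 0.4
```

Note that a cell whose value equals the IC exactly (2.12 here) is counted, matching the "greater than or equal to" definition.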
In the second step, we use the tissue-slice feature, the PIV score, to classify the tissue slices. In the Swartz et al. study [25], a high PIV score was indicative of cancerous tissue slices. We therefore apply a threshold to the PIV score, chosen so that the tissue slices are classified in an optimal way: tissue slices with a PIV score at or above this threshold are classified as cancerous, and those with a PIV score below it are classified as normal.
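The second step reduces to a single comparison. In this hypothetical Python sketch, classify_slice is an illustrative name, and the example threshold of 0.08 is the PIV threshold reported in the Results for the PIV2.12 score; "at or above" follows the convention of the ROC procedure.

```python
def classify_slice(piv, piv_threshold):
    """Call a slice cancerous when its PIV score is at or above the threshold."""
    return "cancerous" if piv >= piv_threshold else "normal"


# 0.08 is the PIV threshold reported for the PIV2.12 score in the Results.
for piv in (0.02, 0.08, 0.30):
    print(piv, classify_slice(piv, 0.08))
```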

Optimization procedure
Before implementing the algorithm, we must find the IC value and the PIV score threshold that optimize the classification algorithm. We use Receiver Operating Characteristic (ROC) curve analysis to optimize both thresholds. A ROC curve is a visual tool for assessing the performance of a diagnostic test [18]. Each point on the ROC curve is called an operating point and is associated with a threshold and with the sensitivity and specificity of the test performed at that threshold. To generate an empirical ROC curve, we follow a procedure similar to the one outlined by Schein et al. [24]. First, we rank order the observations according to the PIV score. For each observed value of the PIV score, we consider a threshold at that value, classify all samples at or above the threshold as diseased and those below it as normal, and record the percentage of diseased samples correctly classified (the sensitivity) and the percentage of normal samples correctly classified (the specificity) for that operating point. Repeating this for all observed values of the PIV score traces out the ROC curve. We add a final operating point associated with classifying all samples as normal, which ensures that both extreme points of the ROC curve are represented. The ROC curve is plotted with 1 - specificity on the horizontal axis and sensitivity on the vertical axis. The operating point lying farthest above the diagonal line sensitivity = 1 - specificity, toward the upper-left corner of the plot, is the point that maximizes the sum of the sensitivity and specificity, because its vertical distance above that line equals sensitivity + specificity - 1. We can therefore find the sensitivity and specificity of the test based on the PIV score that are optimal in the sense that they maximize this sum.
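The procedure above can be sketched in Python as follows. This is one reading of the described steps, not the authors' SAS code; the function names, the tie-breaking rule, and the example scores are hypothetical. Each returned point corresponds to plotting (1 - specificity, sensitivity).

```python
def empirical_roc(scores, labels):
    """Empirical ROC operating points as (threshold, sensitivity, specificity).

    labels: 1 = diseased, 0 = normal. A sample is called diseased when its
    score is at or above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    # One candidate threshold at each observed score, in rank order.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    # Final operating point: every sample called normal.
    points.append((float("inf"), 0.0, 1.0))
    return points


def best_operating_point(points):
    """Operating point with the largest sensitivity + specificity.

    max() returns the first maximum, so ties go to the lowest threshold
    (an arbitrary choice made for this sketch).
    """
    return max(points, key=lambda p: p[1] + p[2])


# Hypothetical PIV scores for three normal (0) and three diseased (1) slices.
scores = [0.01, 0.03, 0.05, 0.10, 0.20, 0.06]
labels = [0, 0, 0, 1, 1, 1]
print(best_operating_point(empirical_roc(scores, labels)))  # (0.06, 1.0, 1.0)
```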
This procedure also helps us find an optimal IC value. Each IC value defines its own PIV score, so each IC value generates a different ROC curve. We summarize the performance associated with each ROC curve using the area under the curve (AUC), a widely used summary for comparing ROC curves [6]. The AUC is calculated exactly from the empirical ROC curve. By comparing the AUC across all ROC curves, we find the IC value whose PIV score has the largest AUC; this is the optimal IC value.
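The text does not state how the exact AUC was computed. One exact method for an empirical ROC curve is the trapezoidal area, which equals the Mann-Whitney statistic with ties counted as one half; the Python sketch below uses that form. The function name and example scores are hypothetical.

```python
def empirical_auc(scores, labels):
    """Exact area under the empirical ROC curve (labels: 1 diseased, 0 normal).

    Uses the Mann-Whitney form: the fraction of (diseased, normal) pairs in
    which the diseased slice scores higher, counting ties as one half. This
    equals the trapezoidal area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one diseased and one normal slice")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(empirical_auc([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1]))  # 0.75
```

The pairwise form makes the scale explicit: an AUC of 1 means every diseased slice has a higher PIV score than every normal slice.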
We used this idea to implement a simple procedure to optimize the IC value. We examined a mesh of IC values ranging from 1.95 to 3.00 using increments of size 0.01. A ROC curve was estimated for the PIV score associated with each IC value, and the operating point yielding the highest sum of sensitivity and specificity on the estimated ROC curve was identified. Then we determined the IC value associated with the ROC curve that obtained the highest AUC value.
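The mesh search can be sketched in Python as below. The helpers repeat the PIV and AUC definitions so the block runs on its own; all names and the four example slices are hypothetical. The text does not say how ties in AUC between IC values were broken, so this sketch keeps the smallest such IC value, which is an assumption. Grid values are rounded to two decimals so that, for example, the grid point 2.12 equals the literal 2.12 rather than an accumulated floating-point sum.

```python
def piv_score(values, ic):
    """Proportion of IODs-Index values at or above the IC value."""
    return sum(1 for v in values if v >= ic) / len(values)


def empirical_auc(scores, labels):
    """Exact empirical AUC (Mann-Whitney form, ties counted as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# IC mesh 1.95, 1.96, ..., 3.00 (106 values), rounded to two decimals.
IC_GRID = [round(1.95 + 0.01 * k, 2) for k in range(106)]


def optimal_ic(slices, labels, grid=IC_GRID):
    """Return (IC value, AUC) for the IC whose PIV score has the largest AUC.

    slices: one list of IODs-Index values per tissue slice.
    Ties keep the smallest IC value (an assumption; the text does not say).
    """
    best_ic, best_auc = None, -1.0
    for ic in grid:
        scores = [piv_score(s, ic) for s in slices]
        auc = empirical_auc(scores, labels)
        if auc > best_auc:
            best_ic, best_auc = ic, auc
    return best_ic, best_auc


# Hypothetical IODs-Index values: two normal (0) and two diseased (1) slices.
slices = [
    [2.0, 2.05, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [2.2, 1.0, 1.0, 1.0],
    [2.3, 2.4, 1.0, 1.0],
]
labels = [0, 0, 1, 1]
print(optimal_ic(slices, labels))  # (2.06, 1.0)
```

In this toy example, IC values up to 2.05 still count cells from the first normal slice, so the AUC only reaches 1 once the cutoff passes 2.05.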

Error-rate estimates
To reduce the bias of the error-rate estimates, we used a leave-one-out cross-validation procedure. For each tissue slice in turn, all cells from that slide were omitted, and the optimal IC value was found from the remaining slides. The IC value found from this training set was then used to calculate the PIV score for the omitted tissue slice. After this was repeated for all tissue slices, the sensitivity and specificity maximizing the sum (sensitivity + specificity) were estimated from the ROC curve generated from the leave-one-out PIV scores.
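A self-contained Python sketch of this leave-one-out loop is given below, assuming the same IC mesh as in the optimization procedure and the smallest-IC tie-breaking rule (an assumption). All function names and the four example slices are hypothetical; the actual study applied the procedure to 68 slides.

```python
def piv_score(values, ic):
    """Proportion of IODs-Index values at or above the IC value."""
    return sum(1 for v in values if v >= ic) / len(values)


def empirical_auc(scores, labels):
    """Exact empirical AUC (Mann-Whitney form, ties counted as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


IC_GRID = [round(1.95 + 0.01 * k, 2) for k in range(106)]  # 1.95 .. 3.00


def optimal_ic(slices, labels):
    """IC value whose PIV score has the largest AUC (ties: smallest IC)."""
    best_ic, best_auc = None, -1.0
    for ic in IC_GRID:
        auc = empirical_auc([piv_score(s, ic) for s in slices], labels)
        if auc > best_auc:
            best_ic, best_auc = ic, auc
    return best_ic


def best_sens_spec(scores, labels):
    """(sensitivity, specificity) maximizing their sum on the empirical ROC curve."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (0.0, 1.0)  # the "call every sample normal" operating point
    for t in sorted(set(scores)):
        sens = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / n_pos
        spec = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t) / n_neg
        if sens + spec > best[0] + best[1]:
            best = (sens, spec)
    return best


def loo_cross_validate(slices, labels):
    """Leave one slice out, tune the IC on the rest, score the held-out slice."""
    loo_scores, loo_ics = [], []
    for i in range(len(slices)):
        train = slices[:i] + slices[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        ic = optimal_ic(train, train_labels)
        loo_ics.append(ic)
        loo_scores.append(piv_score(slices[i], ic))
    return best_sens_spec(loo_scores, labels), loo_ics


slices = [
    [2.0, 2.05, 1.0, 1.0],  # normal
    [1.0, 1.0, 1.0, 1.0],   # normal
    [2.2, 1.0, 1.0, 1.0],   # diseased
    [2.3, 2.4, 1.0, 1.0],   # diseased
]
labels = [0, 0, 1, 1]
print(loo_cross_validate(slices, labels))  # ((1.0, 0.5), [1.95, 2.06, 2.01, 2.06])
```

Returning the per-fold IC values makes it easy to check how stable the optimal IC is across training sets, which is the stability check reported in the discussion.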

Results
For this data set, the IC value associated with the ROC curve that had the maximum overall AUC was 2.12. Following our earlier convention, we refer to the PIV score associated with this optimal IC value as the PIV2.12 score. When sensitivity and specificity are expressed as proportions rather than percentages, both axes of the ROC curve range from 0 to 1, and the maximum possible AUC is 1. On this scale, the AUC of the PIV2.12 score was 0.976, which is very high. Fig. 1 shows the leave-one-out cross-validated ROC curve, with the operating point that maximizes the sum of sensitivity and specificity for the PIV2.12 score marked. This operating point has an estimated sensitivity of 94.5% and specificity of 100%, for a sum of 194.5, and its associated PIV threshold is 0.08. The classification results by tissue type are reported in Table 1.
As mentioned in Section 2.3, we also looked for a confounding effect from DNA degradation and found no evidence of significant degradation. The scatter plots for the normal glandular cells of the cervix and for the lymphocytes, along with their regression lines, are shown in Figs 2 and 3, respectively. The coefficient for slide age in years was not significant for either cell type (Table 2).

Conclusions and discussion
Quantitative features in histologic settings can be useful in diagnosing adenocarcinoma. In the present study, we developed a methodology for an algorithm that discriminates between normal and diseased tissue slices based on the IODs-Index alone. This study shows that the IODs-Index contains valuable information regarding the cancer status of histologic tissue slices. The optimal operating point for the PIV2.5 score from [25] had 90.9% sensitivity and 92.3% specificity, for a sum of 183.2. As reported in Section 3, the PIV2.12 score yielded 94.5% sensitivity and 100% specificity, for a sum of 194.5. The PIV2.12 score thus has a higher sum of sensitivity and specificity than the PIV2.5 score, which suggests that optimizing the IC value improves performance. For the best procedure from [25], the sensitivity and specificity were 96.4% and 92.3%, respectively, for a sum of 188.7; that procedure was a more complicated algorithm based on multilevel statistical modelling. The simple algorithm in this work performs comparably to the more complex algorithm of the previous work and thus has the potential to be an effective tool for discriminating between normal and cancerous tissue slices. A larger study is required before the algorithm could be applied clinically.

Figure 4 shows histograms of the IODs-Index values for three representative tissue slices, one from each diagnostic group: normal, ACIS, and AdCa. The optimal IC value of 2.12 is marked with a dotted line. The cancerous tissue slices (ACIS and AdCa) have histograms with longer right tails than the normal slice, indicating that they contain more cells with high IODs-Index values. The normal tissue slice, in contrast, has very few cells with IODs-Index values greater than the IC value.
In fact, because the optimal operating point uses a PIV threshold of 0.08 and achieves 100% specificity, every normal tissue slice in this data set has a PIV score below 0.08; that is, fewer than 8% of the cells in each normal slice have an IODs-Index value at or above 2.12.
We are confident that the performance of the classification algorithm is not due to confounding effects from possible DNA degradation. First, as noted before, the coefficient for slide age in years was not significant for either cell type (Table 2). Second, even if the relationship were significant for both cell types, the coefficients are very small (−0.450 for the normal cells and −0.269 for the lymphocytes, a loss of a fraction of a percent per year in a typical specimen), so over the 14-year span of slide ages we would expect a reduction of 6.3 units in the DNA amount for the normal cells and 3.8 units for the lymphocytes. These numbers are small relative to the standard deviation of the cells on the slides (the minimum was 11.5). Third, the lymphocytes degrade at a slower estimated rate than the normal cells. Considering only the possible effect of slide age, the IODs-Index, which is the IOD value of an epithelial cell divided by the mean IOD value of the lymphocytes, would therefore tend to decrease as slides get older, since the denominator shrinks more slowly than the numerator. In our samples, the AdCa and ACIS groups have the oldest slides (1-14 years old and 0-4 years old, respectively), while the normal group consists of the most recent slides (0-1 years old). Yet the normal cells, which come from the youngest slides, have the smallest IODs-Index values.
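The arithmetic in the second and third points can be checked directly. In the Python sketch below, the slopes and the 14-year span come from the text; the common starting value of 100 DNA units for both cell types is purely hypothetical and is used only to show the direction of the effect on a ratio whose denominator shrinks more slowly than its numerator.

```python
SPAN_YEARS = 14  # slide ages span 1985-1999

# Slopes of mean DNA amount on slide age (units per year), as given in the text.
slopes = {"normal": -0.450, "lymphocyte": -0.269}

for cell_type, slope in slopes.items():
    print(f"{cell_type}: {slope * SPAN_YEARS:.1f} units over {SPAN_YEARS} years")
# normal: -6.3 units over 14 years
# lymphocyte: -3.8 units over 14 years

# Hypothetical baseline: assume both cell types start at 100 DNA units.
BASELINE = 100.0
ratio_fresh = BASELINE / BASELINE
ratio_aged = (BASELINE + slopes["normal"] * SPAN_YEARS) / (
    BASELINE + slopes["lymphocyte"] * SPAN_YEARS
)
# The numerator loses more than the denominator, so the ratio falls below 1.
print(f"fresh ratio {ratio_fresh:.3f}, after {SPAN_YEARS} years {ratio_aged:.3f}")
```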
In summary, there is no statistically significant evidence of an effect due to DNA degradation. Even if the effect were considered significant, it would make our classification results conservative. Therefore, the good performance of the classification algorithm is not due to a confounding effect of slide age. Furthermore, performance might improve in a clinical setting where fresh samples are used.
Because our 13 normal samples are strictly normal (they do not contain inflammation or other abnormal, non-cancerous conditions), the sensitivity and specificity values are potentially biased upward. Further investigation using a larger data set is required before this tool could be used clinically; in particular, a larger prospective study that includes more non-cancerous conditions should be used to determine the optimal IC value. However, the methodology is sound. During leave-one-out cross-validation, the optimal value of 2.12 was very stable: only 3 of the 68 leave-one-out training sets had an optimal IC value other than 2.12. Thus there is potential benefit in optimizing the threshold for the IODs-Index empirically, and the IODs-Index does carry valuable information that can potentially improve the assessment of cancerous lesions.
In future work, we hope to develop several aspects of this study further. With a larger data set, we could explore distinguishing among all three diagnostic groups: normal, ACIS, and AdCa. We could also explore other ways of deriving tissue-slice features from the features measured on the cells within each slice. Although the IODs-Index carries potentially useful information, discriminating among the three groups might require features beyond the IODs-Index alone.