Relevance of Chromatin Features in the Progression of Esophageal Epithelial Severe Dysplasia

Since 1983, a long-term clinical trial of esophageal carcinoma chemoprevention has been conducted in a high-risk area in China. From this study, 25 patients with esophageal severe dysplasia who received no therapy were selected for analysis. After 5 years of follow-up, 14 cases had progressed to esophageal carcinoma, while the other 11 cases remained stable. Three Papanicolaou-stained smears were used for each case: one from the initial esophageal cytological examination and two from the re-examinations three and five years later. About 100 visually normal intermediate cells per slide were randomly selected and measured by high-resolution image analysis. More than 100 features (morphologic, densitometric, textural) were extracted. Classifications were made by stepwise linear discriminant analysis, using up to ten features, at the single-cell level as well as at the specimen level. In all three comparisons of patients with and without progression (at the time of diagnosis, three years after diagnosis and five years after diagnosis), the correct cell classification rates were about 70%. The subsequent specimen classifications based on the a posteriori probability (APOP) distribution of the cells in each case led to about 80% correct classification. All selected features reflected the chromatin structure of the nuclei. The results demonstrate that the chromatin structure of esophageal epithelial cells in patients with severe dysplasia differs between cases with and without progression. They suggest that image analysis could be applied in clinical trials to identify dysplasia patients at higher risk of progression, and thus to reduce the number of patients requiring therapy.


Introduction
Esophageal carcinoma is one of the high-incidence tumors in China. Nearly 15,000 people die of it every year. After stomach carcinoma, it is the leading cause of death among all malignant tumors. In some areas, known as high-risk areas, the incidence of esophageal carcinoma reaches 0.2% per year. Prevention and therapy of esophageal carcinoma are therefore indispensable in these areas. A series of previous studies has shown that the development of esophageal carcinoma is closely related to esophageal epithelial dysplasia, and that severe dysplasia is a precancerous lesion of the esophagus [18]. It is thus both essential and feasible to treat esophageal epithelial precancerous lesions, inhibiting their malignant progression and promoting their regression to normal, in order to prevent esophageal carcinoma.
Since 1983, a long-term clinical trial of esophageal carcinoma chemoprevention has been conducted in two high-risk areas of Henan Province, China [16,17]. First, an esophageal cytological examination was carried out among 9,633 residents aged 40-65 years. The 2,531 cases of esophageal severe dysplasia detected were stratified by sex, age and grade of esophageal epithelial lesion and strictly randomized into three groups. The patients of the three groups were treated with Anti Tumour B (ATB, a Chinese herbal medication), Retinamide and a placebo, respectively. A quality control system was set up for drug delivery and cancer case registration. After 3 and 5 years of treatment, the esophageal cytological examinations were repeated; both re-examination rates exceeded 90%. The results showed that ATB and Retinamide reduced the incidence rate of esophageal carcinoma to about 50%, with statistically significant differences (p < 0.01) after both 3 and 5 years of treatment.
According to these studies, only a small proportion (about 2% per year) of patients with esophageal severe dysplasia progressed to carcinoma without therapy, while most cases remained stable [17]. It is therefore important to find a method of identifying those dysplasia cases with a higher risk of progression. Such a method could be used to reduce the number of patients selected for therapy, with both economic and ethical advantages.
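As a rough illustration of why treating every patient is inefficient, assume (a simplification not made in the source) that the quoted progression rate of about 2% per year is constant and independent from year to year. The cumulative probability of progression within five years is then:

```latex
P_{5} = 1 - (1 - 0.02)^{5} = 1 - 0.98^{5} \approx 1 - 0.904 = 0.096
```

Under this assumption, roughly nine out of ten untreated patients with severe dysplasia would not progress within five years, yet all of them would receive therapy if every case were treated.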
Our investigation was carried out to determine whether morphological, densitometric and textural features obtained from an image analysis system can be used to predict the outcome of esophageal epithelial dysplasia.

Material and methods
Twenty-five cases of esophageal severe dysplasia were selected for analysis from the control (placebo) group of the clinical trial described above. Fourteen cases progressed to esophageal squamous carcinoma within 5 years; the other 11 cases remained stable. Three smears were used for each case: one from the first esophageal cytological examination and two from the re-examinations 3 and 5 years later.
Smears were obtained with the balloon sampler according to the standard method of esophageal cytological examination, fixed in 95% ethanol and stained with the standard Papanicolaou stain. Each diagnosis was made by at least two cytologists. The carcinoma cases were confirmed by X-ray, endoscopy and biopsy. According to the clinical data, all 14 cancer patients selected for this study either underwent surgery or died of esophageal carcinoma.

Data acquisition
About 100 visually normal intermediate squamous cells per slide were randomly selected and measured with an Axiomat microscope (Zeiss, Oberkochen, Germany) equipped with a TV camera (Bosch, T1VK9B1, Stuttgart, Germany; 512 × 512 pixels). The cells were scanned in transmission with a 100× oil-immersion objective (numerical aperture 1.3) through a narrow-band optical filter at 548 nm. The pixel distance was 0.25 µm, and the nominal grey-value resolution was 256 levels [8]. The digitized images were processed on a VAX 4000-500 computer (Digital, Maynard, USA) with software written in IDL (Interactive Data Language, RSI, Boulder, Colorado, USA). After each nucleus had been segmented with an automatically estimated threshold followed by interactive control, more than 100 features (morphological, densitometric, textural) were extracted from the extinction (optical density) image, which was derived from the transmission image [34]. A shading correction was performed. Densitometric features such as the mean, standard deviation, skewness, median, mode and entropy were calculated from the histogram of the whole nucleus as well as from bright and dark particle regions inside the nucleus, which were segmented automatically [34]. For the latter, a grey-scale skeleton was applied to the extinction image (upper skeleton) and to its inversion (lower skeleton), which partitions the nucleus into regions around dark and bright particles; the skeleton is similar to the watershed algorithm. For the chromatin distribution features, several transformed images were obtained by linear and non-linear filtering, such as the Roberts gradient, Laplacian transforms and the above-mentioned flat texture image; further features were local fractal and multifractal dimensions, the topological gradient, the difference between upper and lower skeletons, and statistical features derived from run-length and co-occurrence matrices [32,36]. These textural features are derived from pattern recognition methods.
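The conversion from transmission to extinction and the whole-nucleus histogram features can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' IDL software: the shading correction is assumed to be a division by a blank-field image, the nucleus mask is taken as given (segmentation is not shown), and the histogram bin count used for the mode and entropy is a hypothetical choice.

```python
import numpy as np


def extinction_image(raw, blank, dark=None):
    """Convert a transmission image into an extinction (optical density) image.

    Assumed shading correction: divide by a blank-field image recorded in an
    empty area of the slide (after subtracting an optional dark frame), so that
    transmission T = I / I0; extinction then follows Beer-Lambert, E = -log10(T).
    """
    raw = np.asarray(raw, dtype=float)
    blank = np.asarray(blank, dtype=float)
    if dark is not None:
        raw = raw - dark
        blank = blank - dark
    # Clip T into (0, 1] so that noise cannot produce log(0) or negative E.
    transmission = np.clip(raw / np.maximum(blank, 1e-6), 1e-4, 1.0)
    return -np.log10(transmission)


def densitometric_features(ext, mask, bins=64):
    """Histogram statistics of the extinction values inside a nucleus mask."""
    v = np.asarray(ext, dtype=float)[mask]
    mean = v.mean()
    centred = v - mean
    # Moment-based skewness m3 / m2**1.5 (population moments).
    skewness = np.mean(centred ** 3) / np.mean(centred ** 2) ** 1.5
    hist, edges = np.histogram(v, bins=bins)
    k = np.argmax(hist)
    mode = 0.5 * (edges[k] + edges[k + 1])   # centre of the most populated bin
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))        # Shannon entropy in bits
    return {
        "mean": mean,
        "sd": v.std(ddof=1),
        "skewness": skewness,
        "median": np.median(v),
        "mode": mode,
        "entropy": entropy,
    }
```

The same function could be applied to masks of the bright and dark particle regions instead of the whole nucleus.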
The biological interpretation of these textural features has to be derived from the order defined in Fig. 1. Because the staining is non-stoichiometric, only features proven to be independent of staining intensity were used for classification, in order to avoid variance due to preparation and staining [35]. The remaining feature set contained about 60 variables. However, the preparation of the smears varied so much over the acquisition period that only specimens collected in the same year could be pooled. Nevertheless, the same subset of features was used in all three classifications. The features are described in the Appendix.
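The co-occurrence-matrix statistics listed among the chromatin distribution features can be illustrated with a small NumPy sketch. Its details are hypothetical (16 grey levels, a single horizontal offset, and the Haralick-type statistics chosen here); the paper's own definitions are given in its Appendix. Rescaling each nucleus to its own minimum and maximum before quantization is one assumed way of removing a uniform scaling of the extinction values, which, to the extent that a change in staining intensity scales all extinction values by a common factor, is the kind of staining dependence the feature selection sought to avoid.

```python
import numpy as np


def quantize(ext, mask, levels=16):
    """Map extinction values inside the mask to grey levels 0..levels-1,
    rescaled to the nucleus' own min/max (pixels outside the mask get 0)."""
    q = np.zeros(ext.shape, dtype=int)
    v = ext[mask]
    lo, hi = v.min(), v.max()
    if hi > lo:
        scaled = (v - lo) / (hi - lo) * levels
        q[mask] = np.minimum(scaled.astype(int), levels - 1)
    return q


def glcm(q, mask, dr=0, dc=1, levels=16):
    """Symmetric, normalised grey-level co-occurrence matrix for a
    non-negative pixel offset (dr, dc); only pairs with both pixels
    inside the mask are counted."""
    rows, cols = q.shape
    a = q[: rows - dr, : cols - dc]
    b = q[dr:, dc:]
    both_inside = mask[: rows - dr, : cols - dc] & mask[dr:, dc:]
    m = np.zeros((levels, levels))
    np.add.at(m, (a[both_inside], b[both_inside]), 1)
    m = m + m.T                     # count each pair in both directions
    total = m.sum()
    return m / total if total > 0 else m


def cooccurrence_features(p):
    """A few Haralick-type statistics of a normalised co-occurrence matrix."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "contrast": np.sum(p * (i - j) ** 2),
        "energy": np.sum(p ** 2),                      # angular second moment
        "homogeneity": np.sum(p / (1.0 + np.abs(i - j))),
        "entropy": -np.sum(nz * np.log2(nz)),
    }
```

Because of the min/max rescaling, doubling every extinction value leaves these statistics unchanged, whereas the same statistics computed on unscaled extinction values would shift with overall stain intensity.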

Statistics
The statistical evaluations were made with the SAS (SAS Institute, Inc., Cary, NC, USA) and BMDP (Statistical Software Inc., Los Angeles, CA, USA) program packages. All cells from specimens of the same clinical group were pooled, and the classifier was designed by two-class stepwise linear discriminant analysis at the cell level. Only those features of the evaluated set that were univariately significant and not highly dependent on staining and preparation methods were used in the classification steps. Up to 10 features were selected stepwise and accepted for the subsequent hold-one-out classification; the selection stopped earlier if the next feature had a non-significant F value. The F value of the first selected feature is univariate, whereas the subsequent F values are multivariate, reflecting the contribution of each feature in combination with the features already selected. For each specimen, the mean of the a posteriori probability (APOP) distribution of its cells was calculated. For the specimen classification, this mean APOP value and twice its standard error of the mean (SEM) were used. A specimen was assigned to the class with the highest APOP value only if the interval mean APOP ± 2 SEM did not include the threshold (THR) set as the border between the two classes. In all cases, the threshold was set at APOP = 0.5, i.e. halfway between the two group means. Cases with THR ∈ [APOP − 2 SEM, APOP + 2 SEM] were classified as unclear [4]. The significance of the specimen classifications was calculated with contingency tables without the unclear category.
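The cell-level procedure (forward stepwise selection followed by hold-one-out linear discriminant classification) can be sketched with NumPy. This is an illustration, not a reproduction of the SAS or BMDP routines: the selection criterion assumed here is the partial F-to-enter derived from Wilks' lambda, the fixed entry threshold `f_to_enter=4.0` is a hypothetical stand-in for the significance test, and equal prior probabilities are assumed for the two classes, so that the decision boundary lies at a posterior probability of 0.5.

```python
import numpy as np


def sscp(A):
    """Sums of squares and cross-products about the column means."""
    C = A - A.mean(axis=0)
    return C.T @ C


def wilks_lambda(X, y, feats):
    """Wilks' lambda det(W) / det(T) for the chosen feature columns."""
    Xs = X[:, feats]
    W = sscp(Xs[y == 0]) + sscp(Xs[y == 1])   # pooled within-group SSCP
    return np.linalg.det(W) / np.linalg.det(sscp(Xs))


def forward_stepwise(X, y, max_features=10, f_to_enter=4.0):
    """Greedy forward selection using the partial F-to-enter for two groups."""
    n, g = len(y), 2
    selected, lam = [], 1.0
    while len(selected) < max_features:
        p = len(selected)
        best, best_f, best_lam = None, -np.inf, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            lam_new = wilks_lambda(X, y, selected + [j])
            # For p = 0 this reduces to the univariate F; later values are partial F.
            f = (n - g - p) / (g - 1) * (lam / lam_new - 1.0)
            if f > best_f:
                best, best_f, best_lam = j, f, lam_new
        if best is None or best_f < f_to_enter:
            break                                  # next candidate not significant
        selected.append(best)
        lam = best_lam
    return selected


def lda_posterior(X_train, y_train, X_new):
    """Posterior P(class 1 | x) for a pooled-covariance LDA with equal priors."""
    m0 = X_train[y_train == 0].mean(axis=0)
    m1 = X_train[y_train == 1].mean(axis=0)
    S = (sscp(X_train[y_train == 0]) + sscp(X_train[y_train == 1])) / (len(y_train) - 2)
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * (m0 + m1) @ w
    return 1.0 / (1.0 + np.exp(-(X_new @ w + b)))


def hold_one_out_posteriors(X, y):
    """Each cell's posterior comes from a classifier trained without that cell."""
    post = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        post[i] = lda_posterior(X[keep], y[keep], X[i:i + 1])[0]
    return post
```

The resulting per-cell posteriors play the role of the APOP values that are averaged per specimen.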
All statistical evaluations were performed at the 95% confidence level.
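The specimen-level decision rule described above can be written as a short function. It is a sketch under stated assumptions: the APOP value is taken to be each cell's posterior probability of belonging to the progression group, the SEM is computed as the sample standard deviation divided by the square root of the number of cells, and the `allow_unclear` switch (a name introduced here for illustration) reproduces the forced two-class assignment used for the contingency tables.

```python
import math
import statistics


def classify_specimen(cell_apop, threshold=0.5, k=2.0, allow_unclear=True):
    """Assign a specimen from the APOP values of its cells.

    cell_apop: posterior probabilities of the progression class, one per cell.
    The specimen is 'unclear' when the threshold lies inside mean +/- k * SEM.
    """
    mean = statistics.mean(cell_apop)
    sem = statistics.stdev(cell_apop) / math.sqrt(len(cell_apop))
    if allow_unclear and mean - k * sem <= threshold <= mean + k * sem:
        return "unclear"
    # Otherwise take the class with the higher mean APOP.
    return "progression" if mean > threshold else "non-progression"
```

With `allow_unclear=False` every specimen is forced into one of the two classes, which corresponds to evaluating the classification without the unclear category.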

Results
Because of changes in preparation and/or staining over the long acquisition period, the classifications could only be performed within each year (1983, 1987 and 1989). However, the same subset of 10 chromatin features, which was largely independent of staining and preparation changes, was used throughout. No size-dependent features were taken into account.

Discrimination of patients with non-progression and progression at time of diagnosis (1983)
All cells of the specimens belonging to the non-progression or the progression group were pooled, and stepwise linear discriminant analysis was applied. The most significant feature was NC9, the entropy of the median-filtered image, followed by HETERO and MFRANG (Table 1). Figure 2 shows cells with high (4.7 [A.U.]) and low (3.9 [A.U.]) values of NC9, and Fig. 3 shows cells with low and high values of HETERO. The cell and specimen classification results are listed in Table 1. The correct cell classification rate was 69.3% for the non-progression group and 77.8% for the progression group. The subsequent specimen classification by means of the mean APOP value and its standard error led to 8/10 correct specimen classifications in the non-progression group and 11/14 in the progression group. Three cases were classified as unclear, and one case in each group was misclassified. In Fig. 4 all specimens are ranked according to their mean APOP value; the cases plotted with open symbols were classified as unclear. Without the unclear class, these three specimens were misclassified. The result is significant at p < 0.0001.
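NC9 is described here only as the entropy of the median-filtered image; its exact definition is given in the Appendix. A hypothetical analogue, assuming a 3 × 3 median filter with edge replication and a 32-bin histogram of the filtered nucleus (both choices are assumptions, not taken from the paper), could look like this:

```python
import numpy as np


def median3x3(img):
    """3 x 3 median filter with edge replication, using plain NumPy."""
    padded = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    r, c = img.shape
    # Stack the nine shifted views and take the median across them.
    windows = [padded[i:i + r, j:j + c] for i in range(3) for j in range(3)]
    return np.median(np.stack(windows), axis=0)


def median_entropy(ext, mask, bins=32):
    """Shannon entropy (bits) of the grey-value histogram of the
    median-filtered nucleus: a hypothetical stand-in for NC9."""
    v = median3x3(ext)[mask]
    hist, _ = np.histogram(v, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))
```

The median filter suppresses isolated bright or dark pixels before the histogram is formed, so the entropy reflects the spread of grey values over larger chromatin structures rather than single-pixel noise.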

Discrimination of patients with non-progression and progression three years after diagnosis (1987)
In this cell classification the most important feature was HLNO, which gives the mean value of the chromatin particles in the nucleus extracted from the filtered image after the ricefield transformation, followed by RGM2 (Table 2). Fig. 5 shows cells with different values of HLNO. The correct cell classification rate was 71.9% for the non-progression group and 71.1% for the progression group. Eight of the 11 specimens of the non-progression group and 12 of the 14 of the progression group were correctly classified, with two unclear cases and one false decision in the non-progression group and two false decisions in the progression group (Fig. 6). Without the unclear class, both unclear cases were correctly classified (p < 0.001).

Discrimination of patients with non-progression and progression five years after diagnosis (1989)
Using the subset of chromatin features, correct cell classification rates of 65.3% for the non-progression group and 74.6% for the progression group were achieved (Table 3). The best feature was again NC9, followed by RGM2. The subsequent specimen classification resulted in 9/11 correct non-progression decisions and 11/14 correct progression decisions. Three cases of the progression group were unclear, and two cases of the non-progression group were misclassified (Fig. 7). Without the unclear class, two of the three unclear cases were correctly classified and one was misclassified (p < 0.0001).
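The overall specimen-level accuracy of about 80% can be checked from the counts reported above, counting unclear specimens as not correctly classified and using the group sizes exactly as reported for each year:

```python
# Correct specimen decisions / specimens, per group, as reported in the Results.
REPORTED = {
    1983: {"non-progression": (8, 10), "progression": (11, 14)},
    1987: {"non-progression": (8, 11), "progression": (12, 14)},
    1989: {"non-progression": (9, 11), "progression": (11, 14)},
}


def overall_accuracy(counts):
    """Pool both groups of one year into (correct, total)."""
    correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return correct, total


for year, counts in REPORTED.items():
    correct, total = overall_accuracy(counts)
    print(f"{year}: {correct}/{total} = {correct / total:.1%}")
# 1983: 19/24 = 79.2%
# 1987: 20/25 = 80.0%
# 1989: 20/25 = 80.0%
```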

Discussion
Cancer chemoprevention has been taken increasingly seriously in recent years. At present, the reduction in cancer incidence is normally used as the endpoint of clinical trials in cancer chemoprevention. However, the barriers to the development of chemoprevention are the unacceptable cost, long duration and large scale of such trials. It is therefore urgent to replace this endpoint with a Surrogate Endpoint Biomarker (SEB) that occurs early during neoplastic development and is precise enough to achieve adequate statistical power. Boone et al. suggested that morphometric parameters obtained by computer-assisted image analysis can be used to predict the progression and regression of precancerous lesions [1]. Palcic concluded from a cervical chemoprevention study that nuclear texture measurements of malignancy-associated changes (MACs) can be used as an SEB [30]. In a recent study, Herlin et al. used DNA ploidy abnormalities to classify esophageal intraepithelial neoplasia and invasive carcinoma [12]; their results showed that these parameters were unable to discriminate between the two. Meanwhile, two other studies [39,44] using texture features as MACs achieved a possible discrimination between dysplasia and carcinoma of the cervix, and between in situ and invasive breast carcinoma.
The results above show that it is possible to discriminate patients with esophageal severe dysplasia who progress to carcinoma within a few years from those who do not. Despite differences in preparation and staining conditions, which only permitted the pooling of specimens within each year, the classification results were surprisingly good, with about 80% correct decisions. In 1989, none of the patients of the progression group was classified as non-progression, an error that should certainly be avoided. False positive decisions can be accepted if the subsequent treatments are not too aggressive.
The differences in the significance of the selected features in certain classification steps were caused by the staining and preparation conditions. Nevertheless, chromatin features extracted by high-resolution image analysis were found to give clinicians significant additional information about which patients are likely to progress to cancer. In the future, pathologists may learn to differentiate patients with and without progression by means of textural features measured in normal-appearing intermediate cells (Figs 2-4). An individual treatment plan for the patient (operation, chemoprevention, re-examination at shorter intervals, etc.) could then be chosen according to the estimated probability. Above all, features obtained from high-resolution image analysis appear to be applicable as markers of subtle changes associated with malignancy, and they might be used as SEBs in cancer chemoprevention clinical trials.
Prompted by these surprisingly good results, a further prospective study using standardized air-dried preparations and staining conditions has since been started. It should make it possible to identify the subset of chromatin features that is highly correlated with the prognosis of dysplasia patients. Other questions, such as when the cancer actually develops or how aggressive the tumour is, could also be addressed.