DNA Histogram Interpretation Based on Statistical Approaches

Image cytometric DNA measurements provide data that are most often interpreted as equivalent to chromosomal ploidy, although chromosomal and DNA ploidy are not identical. The common link between them is the cell cycle. Therefore, if intended for DNA ploidy interpretation, DNA cytometry should be performed on a population-oriented, stochastic basis. With stochastic sampling, the data can be interpreted by applying the rules of stochastic processes. A set of statistical methods is presented that enables a DNA histogram to be interpreted objectively and without human interaction. These statistics analyse the precision and accuracy of the entire measurement process. They yield error probabilities for accepting a measurement as reliable, for the recognition of stemlines and of stemline aneuploidy, and for evaluating so-called rare events. Nearly 300 image cytometric DNA measurements of breast cancers and rat liver imprints were selected as examples to demonstrate the efficiency of the statistics in each step of interpreting DNA histograms.


Introduction
The ESACP consensus [6] on image cytometric DNA measurements provides a basis for standardizing the measurements, including several details of the measurement conditions and of specimen preparation. However, the diagnostic and/or prognostic interpretation of the results of DNA image cytometry remains difficult. Compared with other quantitative morphological techniques, DNA cytometry has the advantage of being backed by a clearly understandable biological model, the cell cycle. From a statistical point of view, such a model offers several parameters that describe DNA stemlines and cell cycle fractions. The statistical question is how well these parameters fit a given DNA histogram.
The most crucial point in DNA cytometry is the rescaling of the DNA axis, i.e., the definition of the position of a DNA histogram peak, which can be given in relative units only. All further interpretations, be they histogram classifications [1], distributional variables [2,14,18,22,30] or interpretative variables [3,20,21] of the DNA measurement, have to be based on reliable calibration, which in turn requires a reliable determination of the position of the histogram peaks in terms of DNA content.
Furthermore, the seemingly simple definition of a peak itself also remains a matter of discussion. It is closely related to the problems of cell selection for measurements, to the representativity of the measured sample as well as to its validity. As far as the representativity is concerned, the question is whether all possible cellular clones in a sample are really represented in their true proportions in the measurement data. This is essentially out of our control. The question of validity of the data is dependent on whether the number of cells measured allows a valid interpretation of the data set in terms of the underlying model, e.g., whether a histogram spike represents a peak.
For all of these problems, appropriate statistics may provide sufficient solutions for a reliable interpretation of DNA cytometry. The present paper presents such statistical approaches and demonstrates their actual performance in normal and tumour cell samples.

Specimen preparation and staining
Twenty-seven rat liver imprints (Fischer rats, both sexes, aged between eight and ten weeks) and 247 fine needle aspirates from breast cancers were immediately fixed in 4% buffered formaldehyde. Feulgen staining was performed after 40 min hydrolysis in 5 M HCl at 22 °C, followed by pararosaniline Schiff's reagent (CI 42500) for 60 min. In each staining batch at least one rat liver imprint was stained together with up to ten breast cancer specimens.

DNA image cytometry
The image cytometry device consisted of an Axioplan microscope (Zeiss, Germany) with a CCD-TV camera XC-77CE (Sony, Japan), connected to a 486/66 MHz IBM-compatible PC with an MFG frame grabber (Imaging Technology, USA). The microscope was also equipped with a computer-controlled, motor-driven xy-scanning stage (Zeiss, Germany) for exact relocation of all nuclei measured. For DNA cytometry a Plan-Neofluar 63× objective (n.a. 1.25) and a green interference filter (570 nm) were used.
The DNA cytometry is based on routines written in C and linked with the OPTIMAS image analysis system (OPTIMAS Inc., USA). The segmentation of each nucleus, focused individually, is performed by a local multi-step extinction thresholding, which leads to the segmentation at the highest gradient of the spatial transmission profile, taking into account a local background correction.
Before the specimens are measured, the cytometry system is calibrated for a dark (transmission 0%) and a bright field state (transmission 100% at full range of the input video signal) in order to adjust it as near as possible to a linear characteristic.
For a highly precise measurement of the integrated optical density (IOD), two procedures were combined that compensate to a large extent for the glare and diffraction errors of the microscope system [17].
In rat liver specimens 150 diploid, tetraploid, and octoploid nuclei, as well as between 10 and 15 mesenchymal cells were interactively analysed; and in breast tumour specimens 250 tumour cells and 15-30 lymphocytes. Only those cells morphologically well preserved and isolated from each other were selected for the measurement. In each field all cells fulfilling these criteria were measured, including the appropriate internal reference cells. The fields were selected at random.

Test for uniformity of reference cell populations
In DNA image cytometry all values being interpreted are relative values of the DNA content. Due to the limited reproducibility of DNA stains for absorption cytometry [27], the calibration of the DNA axis has to be made by the use of reference cells with known DNA content for each specimen measured.
For all problems concerning the accuracy of DNA image cytometry, the reference cells are of utmost importance. Their measurements reflect the adequacy of the sampling, the homogeneity (or heterogeneity) of the specimen preparation, and the stability of the measurement settings and of the optoelectronic properties of the cytometry device. The reference cells should indicate the process quality that can be obtained in a given specimen. Therefore, sensitive tests are needed to check the reference cell data for unimodality and to exclude outliers from the computation of the reference cell measurement precision. In most diagnostic specimens only a few internal reference cells can be found. The Shapiro-Wilk W statistic [10,23,24] is a test that is also suited to such comparatively low cell numbers.
The W test detects both outliers and departures from normality (e.g., bimodality) very sensitively. The resulting p-values can be interpreted as red flags for the entire measurement: if they fall below an agreed level, e.g., 0.05, the diagnostic evaluation of the specimen should be done with care. Fig. 1 gives examples of reference cell populations in 4 different specimens. In usual DNA histograms the departures from normality are not obvious; therefore, a very small histogram bin size was chosen for the reference cells in Fig. 1.
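To illustrate the W test on reference cell IODs, the following Python sketch computes W and an approximate p-value with Royston's polynomial approximations, the approach behind R's `shapiro.test` and SciPy's `scipy.stats.shapiro`. It is not necessarily the implementation used in this study. The function name `shapiro_wilk` and the example values are hypothetical, and in practice a library routine would normally be preferred.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _poly(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... (coefficients in ascending order)."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def shapiro_wilk(values):
    """Return (W, p) for the Shapiro-Wilk test using Royston's approximations.

    A small p flags outliers or departures from normality (e.g. bimodality).
    """
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("at least 3 values are required")
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    if ss == 0:
        raise ValueError("all values are identical")

    # Antisymmetric, unit-length coefficients a_i for the ordered sample.
    if n == 3:
        a = [-math.sqrt(0.5), 0.0, math.sqrt(0.5)]
    else:
        m = [_STD_NORMAL.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
        mm = sum(v * v for v in m)
        u = 1.0 / math.sqrt(n)
        a_n = m[-1] / math.sqrt(mm) + _poly(
            [0.0, 0.221157, -0.147981, -2.071190, 4.434685, -2.706056], u)
        if n > 5:
            a_n1 = m[-2] / math.sqrt(mm) + _poly(
                [0.0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633], u)
            phi = (mm - 2 * m[-1] ** 2 - 2 * m[-2] ** 2) / (1 - 2 * a_n ** 2 - 2 * a_n1 ** 2)
            a = [v / math.sqrt(phi) for v in m]
            a[1], a[-2] = -a_n1, a_n1
        else:
            phi = (mm - 2 * m[-1] ** 2) / (1 - 2 * a_n ** 2)
            a = [v / math.sqrt(phi) for v in m]
        a[0], a[-1] = -a_n, a_n

    w = min(sum(ai * xi for ai, xi in zip(a, x)) ** 2 / ss, 1.0)

    # p-value via Royston's normalising transformations of W.
    if n == 3:
        p = 6 / math.pi * (math.asin(math.sqrt(w)) - math.pi / 3)
        return w, min(max(p, 0.0), 1.0)
    if w >= 1.0:
        return w, 1.0
    y = math.log(1.0 - w)
    if n <= 11:
        gamma = -2.273 + 0.459 * n
        if y >= gamma:
            return w, 0.0  # extreme departure: p is effectively zero
        y = -math.log(gamma - y)
        mu = _poly([0.5440, -0.39978, 0.025054, -0.0006714], n)
        sigma = math.exp(_poly([1.3822, -0.77857, 0.062767, -0.0020322], n))
    else:
        ln_n = math.log(n)
        mu = _poly([-1.5861, -0.31082, -0.083751, 0.0038915], ln_n)
        sigma = math.exp(_poly([-0.4803, -0.082676, 0.0030302], ln_n))
    return w, 1.0 - _STD_NORMAL.cdf((y - mu) / sigma)
```

A bimodal set of ten reference IODs, for example five values near 1.0 and five near 2.0, yields W of about 0.68 and p well below 0.05, which would raise the red flag described above.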

Detection of stemlines
From the biological point of view, the DNA stemline was already defined by Sandritter [25] as a peak in a DNA histogram that represents the G0/1-phase of a proliferating population. It must therefore show a second peak or G2/M-phase cells in the duplication position. However, not all spikes in a DNA histogram can be regarded as distinct peaks. The decisive criterion is whether a cluster of data values is statistically different from the other, non-clustered data values (peak events vs. out-of-peak events). This difference must be defined by the precision of the measurements and by the distribution of the data values, usually represented as a distribution density (i.e., a histogram).
A spike in the histogram is thus a peak if a local maximum in the distribution density can be statistically defined by estimating the random events in the measurement.
Let us assume a data set of n values with a known standard deviation SD. A peak can then be found with a given error probability $p_{err}$ (e.g., $p_{err} < 0.05$) within a region of ±SD around a real maximum if the number of events within this region is significantly different from zero. The SD can be derived from the precision of the measurements of the appropriate reference cells, e.g., by using the coefficient of variation (cv = SD/mean) of the IOD values of their G0/1-phase fraction.
The binomial distribution is used to assess the probability of occurrence of a detected number k of events within a region. Fig. 2 shows an exemplary binomial distribution for n = 30 and p = 0.5. In this distribution 95% of all values k lie between $k_{low} = 9$ (lower tail probability $p_{low} \le 0.025$) and $k_{high} = 21$ (upper tail probability $p_{high} \le 0.025$).
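Values smaller than 9 are thus not within the same confidence region as values greater than 21. For a given number n of nuclei measured, a probability $p_0$ is calculated for which $k_{low} = 0$, i.e., for which the probability of observing no event at all equals half the desired error probability:

$$p_{low} = (1 - p_0)^{n} = \frac{p_{err}}{2} \quad\Rightarrow\quad p_0 = 1 - \left(\frac{p_{err}}{2}\right)^{1/n} \tag{2}$$

We now consider the probability $p_{high}$ of the occurrence of at least $k_{high}$ events, which is required to reach the same level as $p_{low}$:

$$p_{high} = \sum_{k=k_{high}}^{n} \binom{n}{k}\, p_0^{k}\,(1 - p_0)^{n-k} \tag{3}$$

This computation is repeated with increasing $k_{high}$ until $p_{high}$ falls below 0.025 (i.e., half of the desired error probability). The resulting $k_{high}$ is the number of cells that is no longer in the same confidence region as $k_{low}$. The search for a peak starts at the first (lowest) of the sorted data values and moves forward value by value, so that n trials are made to find a peak.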
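The binomial calculations in this paragraph can be sketched in Python with the standard library only. `central_region` reproduces the n = 30, p = 0.5 example. `peak_threshold` is one reading of the $k_{high}$ search: $p_0$ is chosen so that the probability of observing zero events, $(1 - p_0)^n$, equals a given one-sided tail level, and $k_{high}$ is the smallest count whose upper-tail probability falls below that level. All function names are hypothetical, and the sketch is not guaranteed to reproduce the figures of Table 1.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), 0 < p < 1, computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))


def central_region(n, p, tail=0.025):
    """(k_low, k_high): largest k with P(X <= k) <= tail and
    smallest k with P(X >= k) <= tail."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    k_low, cum = -1, 0.0
    for k in range(n + 1):
        cum += pmf[k]
        if cum > tail:
            break
        k_low = k
    k_high, cum = n + 1, 0.0
    for k in range(n, -1, -1):
        cum += pmf[k]
        if cum > tail:
            break
        k_high = k
    return k_low, k_high


def peak_threshold(n, tail):
    """k_high for the peak test: p0 solves (1 - p0)**n = tail, and k_high is
    the smallest k with P(X >= k) < tail for X ~ Binomial(n, p0)."""
    p0 = 1.0 - tail ** (1.0 / n)
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    upper, k_high = 0.0, n + 1
    for k in range(n, -1, -1):  # accumulate P(X >= k) from the top
        upper += pmf[k]
        if upper >= tail:
            break
        k_high = k
    return k_high


print(central_region(30, 0.5))              # (9, 21)
print(peak_threshold(10, 0.025))            # 7 (unadjusted)
print(peak_threshold(10, 0.05 / (2 * 10)))  # 10 (Bonferroni-adjusted)
```

The last two lines show the effect of the multiple-trial adjustment: with n = 10 the stricter per-trial level raises $k_{high}$ from 7 to 10.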
In each trial we check whether the number of values within the ±SD region around a maximum is greater than $k_{high}$; such a region would contain 68% of all cells of a peak. It should be kept in mind that, because of the multiple trials, the error probability $p_{err}$ has to be replaced by $p_{err}/n$ when making n trials (Bonferroni adjustment [8]); i.e., the error probability allowed for each single trial is much lower.
A peak is found if the statistics confirm the occurrence of more than $k_{high}$ events within the test region.
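A minimal sketch of the sliding search over sorted values, using the Bonferroni-adjusted one-sided level $p_{err}/(2n)$ per trial. Counting the values in each ±SD window directly on the sorted data (via `bisect`) amounts to working on the cumulative distribution, so no histogram bin size is needed. Assumptions: the SD is a single constant supplied by the caller (e.g., reference cell cv times the expected peak position); a window counts as a peak candidate when its count reaches the smallest count whose upper-tail probability is below the per-trial level; and `k_high_threshold` and `find_peak_candidates` are hypothetical names. The subsequent Gaussian fitting and subtraction of peaks is not shown.

```python
from bisect import bisect_left, bisect_right
from math import exp, lgamma, log, log1p


def k_high_threshold(n, tail):
    """Smallest k with P(X >= k) < tail, X ~ Binomial(n, p0), (1 - p0)**n = tail."""
    p0 = 1.0 - tail ** (1.0 / n)
    pmf = [exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p0) + (n - k) * log1p(-p0)) for k in range(n + 1)]
    upper, k_high = 0.0, n + 1
    for k in range(n, -1, -1):  # accumulate P(X >= k) from the top
        upper += pmf[k]
        if upper >= tail:
            break
        k_high = k
    return k_high


def find_peak_candidates(values, sd, p_err=0.05):
    """Return (centre, count) for each data value whose +/- sd window holds
    at least k_high values, testing at the level p_err / (2n) per trial."""
    data = sorted(values)
    n = len(data)
    k_high = k_high_threshold(n, p_err / (2 * n))
    candidates = []
    for centre in data:  # n trials, one per data value
        count = bisect_right(data, centre + sd) - bisect_left(data, centre - sd)
        if count >= k_high:
            candidates.append((centre, count))
    return candidates


cluster = [1.96 + 0.004 * i for i in range(20)]  # hypothetical G0/1 peak near 2c
spread = [2.5 + 0.2 * i for i in range(10)]      # hypothetical out-of-peak events
hits = find_peak_candidates(cluster + spread, sd=0.06)
print(max(hits, key=lambda h: h[1]))
```

Here the window SD of 0.06 stands for a reference cell cv of 3% at a peak position of 2.0 (both hypothetical); only centres inside the simulated cluster reach the threshold, and the best window holds all 20 clustered values.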
An empirical approximation of (3) for an error probability $p_{err} = 0.05$ can be given (maximum error of $k_{high}$: ±1 event):

$$k_{high} \approx 6\log_{10} n - \frac{1}{\log_{10} n} + 8 \tag{4}$$
Table 1 lists, for given numbers n of data values and a desired error probability $p_{err}$, the binomial limits $k_{high}$, i.e., the maximum number of peak events that is statistically not different from zero. It becomes evident that this procedure is an iterative, converging process. The table is to be read as follows: if 10 data values have been measured, 10 trials can be performed, and the resulting $k_{high}$ is 8. In the next step 10 − 8 = 2 values are available for the test, resulting in $k_{high}$ = 6. The third step uses 10 − 6 = 4 values, resulting in $k_{high}$ = 7. The further steps converge at 7.
The maxima found by these procedures can be used as starting values for any Gaussian fitting to compute the means, variances and counts of the peaks. The fitted peaks are subtracted from the original data, and the entire procedure is repeated until no statistically significant peak remains. Figs 3-5 show examples of DNA histograms of breast cancers; the effect of the peak recognition approach, using the simplified formula (4), is shown in each histogram. A comparison of Figs 4 and 5 demonstrates that the visual impression of the histogram shape can be misleading with respect to peak recognition.
The search for a peak should be performed in a cumulative frequency distribution rather than in a frequency density distribution (classical histogram), because in the former no assumptions for the histogram bin size (and therefore for the validity of the histogram classes) are necessary [28].
Finally, a peak is a stemline if a second peak or out-of-peak events (see below) are found in the duplication region.
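The stemline criterion can be sketched as a count of events in the duplication region. The text does not specify the width of that region, so this sketch assumes a window of ±cv × (2 × peak mean) around twice the peak mean (i.e., a constant cv) and accepts the peak as a stemline if at least `min_events` values fall there. Both the window and the threshold are assumptions, and `stemline_check` is a hypothetical name; in practice the count would be judged with the binomial statistics described in this paper.

```python
from bisect import bisect_left, bisect_right


def stemline_check(values, peak_mean, cv, min_events=1):
    """Count values in the duplication region 2*peak_mean +/- cv*2*peak_mean.

    Returns (is_stemline, count). The window width (constant cv) and the
    min_events threshold are illustrative assumptions.
    """
    data = sorted(values)
    centre = 2.0 * peak_mean
    half_width = cv * centre
    count = bisect_right(data, centre + half_width) - bisect_left(data, centre - half_width)
    return count >= min_events, count


g01 = [1.97, 1.99, 2.0, 2.01, 2.03]                  # hypothetical 2c peak
print(stemline_check(g01 + [3.98, 4.02], 2.0, 0.03))  # (True, 2)
print(stemline_check(g01, 2.0, 0.03))                 # (False, 0)
```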

Test for DNA-stemline abnormality (aneuploidy)
As mentioned above, the calibration of the DNA axis has to be performed using reference cells with a known DNA content, which are also subject to measurement errors. In the theoretical model of the cell cycle, the reference cells should represent the "truly normal" G0/1-phase; therefore, the reference cells have to be included in the model assumptions as well. A test of whether two samples from one case (e.g., internal reference cells and analysis cells) come from one population is not reliable, because these samples do not belong to a common population. It is not clear whether the measurement system yields identical IOD values in the two samples, because the effects of preparation, glare, diffraction, digitisation, etc., may differ considerably between cell types [9,28], even between cell types obtained from the same tissue. With external reference cells this becomes even more obvious. However, the ratio of the mean IODs of the G0/1-phase fractions of both sample types can in many cases be tested for its statistical properties: the population of these ratios in normal (euploid) cases is then compared with the ratio from a single case. Such ratios are not restricted to reference and analysis cells; ratios between the G0/1- and G2/M-phases of analysis cells can also be used to check for polyploidization of a tissue.
For a given DNA peak there are three error sources for the determination of ploidy: (1) the imprecision of the mean IOD of the G0/1-phase reference cells, (2) the imprecision of the mean IOD of the analysis cells in the peak, and (3) the variation of the IOD ratio between analysis and reference cells (the corrective factor) among normal specimens. Any test statistic has to take all of these error sources into account. Let $M_R$ = mean IOD of the G0/1-phase reference cells; $S_R$ = standard deviation of the reference cells' G0/1-phase IOD; $n_R$ = number of G0/1-phase reference cells measured; $M_A$ = mean IOD of the analysis cells in the peak; $S_A$ = standard deviation of the analysis cells in the peak; $n_A$ = number of analysis cells in the peak; $R = M_A/M_R$ = IOD ratio in the specimen under investigation; $cf_{mean}$ = mean of the IOD ratios $M_A/M_R$ (the so-called corrective factor) in a series of normal specimens prepared under the same methodological conditions; and $S_{cf}$ = standard deviation of these ratios in the same series. A statistical test value t is then given by the difference between $cf_{mean}$ and R, divided by the square root of the sum of the variances of the three error sources:

$$t = \frac{cf_{mean} - R}{\sqrt{S_{cf}^{2} + \dfrac{S_A^{2}}{n_A M_R^{2}} + \dfrac{M_A^{2} S_R^{2}}{n_R M_R^{4}}}} \tag{5}$$

With $cv = S/M$ (so that $cv_{cf} = S_{cf}/cf_{mean}$) and $cf_{mean} \approx R$, the test value t becomes

$$t = \frac{cf_{mean} - R}{cf_{mean}\sqrt{cv_{cf}^{2} + \dfrac{cv_A^{2}}{n_A} + \dfrac{cv_R^{2}}{n_R}}} \tag{6}$$

Under the null hypothesis, t is normally distributed with mean 0 and SD 1. The (two-sided) probability of the null hypothesis that the ratio tested belongs to the population of "normal" ratios is then

$$p = 2\left[1 - \Phi(|t|)\right] \tag{7}$$

where Φ is the standard normal cumulative distribution function.

Eq. (6) shows how the three kinds of error influence the test value and, consequently, the error probability. The error of $cf_{mean}$ is independent of the number of cells measured. If the cv of the reference cell IOD is below 5% (compare the ESACP consensus [6]) and only 8 reference cells are measured, their share of the overall error is about 50%. From the statistical point of view it would therefore not be necessary to measure more reference cells. However, these considerations do not address the representativity of the reference cells.
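A minimal sketch of the stemline test in its cv form: t is the difference between $cf_{mean}$ and R divided by $cf_{mean}$ times the square root of $cv_{cf}^2 + cv_A^2/n_A + cv_R^2/n_R$ (first-order error propagation, assuming $cf_{mean} \approx R$), and the two-sided p-value comes from the standard normal distribution. Parameter names mirror the symbols in the text; the function names and the example values ($cv_A$ = 3.5%, $n_A$ = 100, $cv_R$ = 5%, $n_R$ = 8, $cv_{cf}$ = 3%) are hypothetical.

```python
import math
from statistics import NormalDist


def stemline_t_test(r, cf_mean, cv_cf, cv_a, n_a, cv_r, n_r):
    """Return (t, p) comparing a specimen's IOD ratio r = M_A / M_R with the
    ratios of normal specimens (mean cf_mean, coefficient of variation cv_cf)."""
    # Relative variances of the three error sources, combined by
    # first-order error propagation (assumes cf_mean is close to r).
    rel_var = cv_cf ** 2 + cv_a ** 2 / n_a + cv_r ** 2 / n_r
    t = (cf_mean - r) / (cf_mean * math.sqrt(rel_var))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))  # two-sided
    return t, p


def reference_share(cv_cf, cv_a, n_a, cv_r, n_r):
    """Fraction of the overall standard error contributed by the reference cells."""
    total = cv_cf ** 2 + cv_a ** 2 / n_a + cv_r ** 2 / n_r
    return math.sqrt(cv_r ** 2 / n_r) / math.sqrt(total)


t, p = stemline_t_test(r=1.1, cf_mean=1.0, cv_cf=0.03,
                       cv_a=0.035, n_a=100, cv_r=0.05, n_r=8)
print(round(t, 2), round(p, 4))                                 # -2.86 0.0043
print(round(reference_share(0.03, 0.035, 100, 0.05, 8), 2))  # 0.51
```

With these values the reference cells contribute roughly half of the overall standard error, in line with the 50% figure quoted in the text.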
If the DNA stemlines are detected as described above, the measurements usually yield a $cv_A$ of 3-4% for the analysis cells and a $cv_R$ of 2-3% for the reference cells. The IOD ratios between reference cells and diploid analysis cells are both precise and accurate, with a $cv_{cf}$ of about 3%. The mean IOD ratio $cf_{mean}$ is near 1.0, 2.0 and 4.0 for diploid, tetraploid and octoploid analysis cells, respectively. Table 2 gives the actual $cf_{mean}$ and $cv_{cf}$ for different reference/analysis cell combinations. An analysis of the sources of $cv_{cf}$ in a set of 22 measurements of five rat liver imprints shows that preparative variation contributed nearly twice as much to the overall cv as repeated measurement (Table 3). If Eqs (6) and (7) are applied to each DNA stemline, the resulting error probabilities show to what extent the peak differs from the reference system. This is shown for two aneuploid breast cancers in Figs 6 and 7. The error probability is a clearly defined statistical figure, independent of additional assumptions or conditions. It could also be used for a continuous grading of DNA ploidy, resulting in "normal" grades around the euploid regions.
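The decomposition of the corrective factor's variation into preparative and measurement components can be illustrated with a one-way random-effects analysis of variance, treating each preparation as a group and its repeated measurements as replicates. This is an assumed method: the text does not state how Table 3 was derived, and this sketch requires a balanced design, unlike the 22 measurements of five imprints. `variance_components` and the example ratios are hypothetical.

```python
from statistics import mean


def variance_components(groups):
    """One-way random-effects ANOVA for a balanced design.

    groups: list of equally sized lists, one per preparation, each holding
    repeated measurements. Returns (between_var, within_var).
    """
    k = len(groups)
    r = len(groups[0]) if groups else 0
    if k < 2 or r < 2 or any(len(g) != r for g in groups):
        raise ValueError("need at least 2 groups of equal size >= 2")
    grand = mean(v for g in groups for v in g)
    means = [mean(g) for g in groups]
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (r - 1))
    ms_between = r * sum((m - grand) ** 2 for m in means) / (k - 1)
    # Negative estimates of the between-group variance are truncated at zero.
    between_var = max((ms_between - ms_within) / r, 0.0)
    return between_var, ms_within


ratios = [[1.00, 1.02], [1.04, 1.06], [0.96, 0.98]]  # hypothetical IOD ratios
between, within = variance_components(ratios)
print(between, within)
```

Dividing the square roots of the two components by the grand mean expresses them as cvs, the form in which preparative and measurement contributions can be compared.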

Evaluation of out-of-peak events and cell cycle phases
After one or several peaks have been defined in the DNA histogram, the cells belonging to these peaks as well as the remaining out-of-peak events can be quantified. However, one of the drawbacks of DNA image cytometry is the comparatively low number of cells analysed, which makes such a quantification unreliable. Given these low cell numbers, the binomial distribution can again help in interpreting the data.
Let us assume a number of events e, e.g., between the G0/1- and G2/M-phase of a detected stemline, with a given proportion p, which is derived (for example) from an assumed S-phase fraction, and a number of cells n in the appropriate stemline (or in the entire population). The probability that e events out of n occur for a given p is then

$$P(e) = \binom{n}{e}\, p^{e}\,(1 - p)^{n-e}$$

The possible number of events for a given error probability $p_{err}$ lies in the range between $n_{low}$ and $n_{high}$, with

$$\sum_{k=0}^{n_{low}-1} P(k) \le \frac{p_{err}}{2}, \qquad \sum_{k=n_{high}+1}^{n} P(k) \le \frac{p_{err}}{2}$$

Table 4 and Fig. 8 show how large these regions are, within which the probability of occurrence of events from cycle phase fractions, as well as of out-of-peak events, is equal for different absolute numbers of events (Table 4); or, vice versa, within which the probabilities of occurrence are in the same confidence region for a given event number.

Table 4 (entries: n_low-n_high)
n       p = 0.01   p = 0.05   p = 0.10   p = 0.20   p = 0.50
200     0-4        4-15       12-28      29-50      86-113
500     1-9        16-34      37-63      83-117     228-271
1000    4-16       37-63      82-118     176-224    469-530
2000    12-28      81-119     174-226    365-434    969-1043

For example, a DNA ploidy analysis of a breast cancer resulted in the DNA histogram shown in Fig. 9. This histogram was evaluated for its cycle phases, peaks, and out-of-peak events as follows. The DNA histogram shows two peaks and a few out-of-peak events. The S-phase cells relative to the corresponding diploid and tetraploid G0/1 peaks have strongly overlapping regions of probability of occurrence. The single nucleus at 8c may also represent the polyploidized G2/M fraction of the 4c peak. The 4 cells between G2/M (4c) and G2/M (8c) do not have a higher probability of occurrence than the S-phase cells between 2c and 4c, which therefore does not imply a higher proliferative activity of the tetraploid cycle compared with the diploid one.
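The ranges $n_{low}$ to $n_{high}$ can be computed directly from the binomial distribution. This sketch assumes the region is the narrowest range for which the probability of fewer than $n_{low}$ events and that of more than $n_{high}$ events are each at most $p_{err}/2$; other conventions (e.g., a normal approximation) shift the limits by about one event, so the output need not match Table 4 exactly. `event_region` and `regions_overlap` are hypothetical names; overlapping ranges mean that two event counts cannot be distinguished at the chosen error probability.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), 0 < p < 1, computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))


def event_region(n, p, p_err=0.05):
    """Narrowest (n_low, n_high) with P(e < n_low) <= p_err/2 and
    P(e > n_high) <= p_err/2 for e ~ Binomial(n, p), 0 < p < 1."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    tail = p_err / 2
    n_low, cum = 0, 0.0
    while n_low < n and cum + pmf[n_low] <= tail:
        cum += pmf[n_low]
        n_low += 1
    n_high, cum = n, 0.0
    while n_high > 0 and cum + pmf[n_high] <= tail:
        cum += pmf[n_high]
        n_high -= 1
    return n_low, n_high


def regions_overlap(a, b):
    """True if two (low, high) event ranges share at least one count."""
    return a[0] <= b[1] and b[0] <= a[1]


print(event_region(30, 0.5))                 # (10, 20)
print(regions_overlap((10, 20), (15, 25)))   # True
```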
However, the quantitation of cycle phases in DNA histograms with more than one stemline requires a priori knowledge of the G0/1-phase fractions, which cannot be obtained from the histogram alone. A reasonable quantitation should therefore be restricted to histograms with only one stemline.

Discussion
Although widely used, the term DNA ploidy has not yet been clearly defined. If it is taken to be the quantitative cytometric equivalent of chromosomal ploidy [7], the link between the two terms is the cell cycle. The two terms are, however, not identical, as stressed by Schulte et al. [28]. Whereas chromosomal ploidy is in theory detectable by cytogenetic methods in each single cell, the DNA content of a single cell cannot be equated unequivocally with a certain chromosomal constitution of that cell. Therefore, DNA ploidy is understood here as a cell-cycle related variable that is not to be applied to single cells. Consequently, DNA cytometry should be performed on a population-oriented, stochastic basis, and its evaluation has to be based on statistical methods. Given stochastic sampling, the data can be interpreted reliably by applying the rules of stochastic processes. This approach does not directly address the diagnostic meaning of the results of DNA cytometry, i.e., whether a lesion is "normal" or "abnormal". Certain diagnoses can also be made without these statistical preconditions; e.g., the mere occurrence of one cell with an abnormally high DNA content indicates malignancy in certain lesions.
The advantage of statistical approaches is that diagnostic conclusions can be drawn at a specified error probability, which makes errors a quantifiable factor. This allows both a priori performance levels to be postulated and the degree of accuracy of an actual measurement to be specified a posteriori.
The statistics have to include both the precision and the accuracy of the measurements. In the large body of literature on image cytometric DNA ploidy analysis, only two approaches can be found that considered the inherent measurement errors (or the precision) in a reproducible way for the evaluation and interpretation of DNA data [5,15]. Yet in these approaches, too, the measurement errors were considered to be confined to limitations in the precision of single cell measurements and of single specimens. In almost all papers a factor for the correction of the mean IOD of the reference cells is given to find the "real" diploid value for the analysis cells, but the statistics of this factor have been neglected so far. However, all experience in DNA cytometry contradicts the assumption that the IOD ratio between reference cells (both internal and external) and analysis cells, the so-called correction factor, could be kept constant [28]. Not only must these correction factors be determined carefully and extensively for every reference/analysis cell combination, but the variance of these factors must also be measured in comprehensive test sets. It is this variance that mainly defines the degree of accuracy that can be reached in a DNA ploidy analysis, as already demonstrated in a previous paper [17]. All methodological and technological improvements [9,11,12,16,19,26] of the measurement process that increase precision will also reduce this very important error source and will therefore improve accuracy.
On the other hand, the interpretation algorithms for DNA stemlines referred to in the literature do take this variance into account, albeit in a rather empirical way: the histogram regions defined as diploid (and tetraploid) depend more or less on the variation of the correction factors, although this is not made explicit [13,29,31].
But there is another important aspect of a generally increased precision of the measurements. The higher this precision is, the more dependent the interpretations will be on methodological errors in the actual measurements. Therefore, the entire process has to be controlled by statistics to indicate deviations from the expected precision as sensitively as possible. This control concerns the sampling, the preparation effects, the reliability of the instrumentation and of the reference cells. For the first time the statistical procedures proposed in this study allow such a control of the entire process: In a first step the representativity of the reference cells concerning their uniformity is tested. In the next step the precision that can be reached in a given specimen is defined by the coefficient of variation of the reference cells. This precision is used for the estimation of the validity of analysis cell peaks in a third step. In a further step the precision of the actual measurements and the variance of the correction factor, known from previous measurements, are the basis for the interpretation of a stemline of the analysis cells. Finally, the out-of-peak events can be evaluated as to their designation and quantitation of cycle phases.
For each of these steps, appropriate statistical tests have to be used that are both sensitive and robust enough for practical use. Thus, the sometimes recommended Kolmogorov-Smirnov test [5,28] is not sensitive enough to test reference cells for departures from normality. It is, however, a very robust non-parametric test that requires no assumptions about the type of the distributions being compared. For some problems, therefore, combinations of test statistics had to be found.
At each step a statistical figure, the error probability, emerges that is continuously distributed between 0.0 and 1.0. Probability thresholds can therefore be defined for each of these steps; a measurement that crosses such a threshold is designated as borderline or out of acceptable limits. Usually, these thresholds would be set at 0.05. Scientific committees should aim to agree on appropriate thresholds in order to ensure the product quality of DNA image cytometry, so that all DNA ploidy analyses could be certified by such a procedure.
The consequence could be an automatic quality control and interpretation procedure for image cytometric DNA ploidy analyses, provided by a server or by software manufacturers to all of their users.