A Feature Set for Cytometry on Digitized Microscopic Images

Feature extraction is a crucial step in most cytometry studies. In this paper a systematic approach to feature extraction is presented. The feature sets that have been developed and used for quantitative cytology at the Laboratory for Biomedical Image Analysis of the GSF as well as at the Center for Image Analysis in Uppsala over the last 25 years are described and illustrated. The feature sets described are divided into morphometric, densitometric, textural and structural features. The latter group is used to describe the eu‐ and hetero‐chromatin in a way that complements the textural methods. The main goal of the paper is to bring attention to the need for a common and well-defined description of features used in cyto‐ and histometrical studies. The application of the sets of features is shown in an overview of projects from different fields. Finally some rules of thumb for the design of studies in this field are proposed. Colour figures can be viewed on http://www.esacp.org/acp/2003/25‐1/rodenacker.htm.


Introduction
The visual interpretation of the appearance of cells and tissue parts through a light microscope is at the heart of medical diagnosis as carried out in cytology and histopathology. The human visual system is excellent at recognizing qualitative aspects of patterns. Even small deviations from normality can be recognized and interpreted. Humans are, however, not at all as good at quantitative interpretations. Even the trivial task of counting the number of objects quickly becomes tedious and difficult to carry out accurately when the numbers become large. And estimating size, shape and texture quantitatively is very difficult and usually leads to substantial subjectivity. The digital computer, on the other hand, is excellent at counting and measuring things as soon as a clear definition of what is to be measured can be given. Ever since the first digital computers became available there have therefore been efforts at applying them to the task of quantitatively analyzing images. An additional motivation for the use of computers in this field has been to automate tedious mass analysis tasks such as screening for precursors of gynecological cancer on appropriately prepared cell specimens, so called Pap smears [17].
At the heart of all these quantification and automation attempts is the definition of what is to be measured and how these measurements should take place. Unfortunately, in spite of more than 30 years of intense efforts in this field, there still does not exist any generally accepted and applied set of feature definitions and measurement standards. The most typical paper in cytometry selects a few dozen cases (or less) from each of a couple of different cell populations, measures some features and then reports how well the populations could be separated based on these features. Thousands of such papers have been published over the years. Most of the publications are problem and not method oriented. Many times the exact definition of the features is not even presented in sufficient detail to allow the reader to repeat the experiment or to try the same method on other applications. This is probably mostly ruled by limited space and perhaps by limited awareness of the importance of the exact definition of the features used. The exact definition is perhaps less important as long as the features used are directly related to observable or visually readily comprehensible properties like size or length [57]. However, for more complex features derived from statistical properties, either from histograms (1st order statistics) or from two-dimensional distributions (2nd order statistics) like most of the texture features, this is not acceptable. Statistical features are strongly dependent on parameters and configuration decisions. These features have to be described exactly, or referenced in publications with an exact description. Even worse, sometimes black-box machines are used for the measurements, where the implemented algorithms are not even known by the author.
This lack of commonly accepted standards for how different features should be defined and studies documented in such a way that different results can be readily reproduced and compared is probably one of the main reasons why quantitative image analysis based methods have not yet fully penetrated mainline pathology practices. Another reason for the slow penetration may be the relatively frequent over-optimistic findings from experiments with too limited data sets [110].
This paper is written in recognition of the problem outlined in the previous paragraphs and can be seen as an attempt to contribute to the method oriented discussion about feature extraction in cytometry. It is based on more than 25 years of experience at each of the authors' laboratories. The proposed feature sets have been used, at various stages of their development, by the authors in some 80 studies in various applications.
The paper is organized in the following way. In Section 2 some general concepts are discussed and the different stages in the feature extraction process are outlined. In Section 3 a taxonomy for feature classes is proposed and the various kinds of features are described in some detail. In Section 4 some brief details about an implementation of the outlined feature set are given. In Section 5 we give a list of applications to which the proposed feature sets have been applied. And finally the paper concludes with a brief summary, including a set of rules of thumb that tries to summarize the experiences from the various studies discussed in the paper. The paper is supplemented by a rather extensive set of appendices in which the exact mathematical definitions of the proposed feature sets are given as well as a description of the parameters and data structures involved. Additional information and illustrations can be found on the web pages of the authors http://www.gsf.de/ILIAD/.

From image to feature (and vice versa)
There are two typical situations in which a pathologist, cell biologist or other similar researcher is interested in applying cytometry methods: for automation and for decision support [19].
(i) For automation we have a situation where the pathologist is trained to recognize deviations from normality by observation. He or she can distinguish classes, states, etc. To make these findings objective and reproducible, or to construct automata to decrease the workload, a machine should be designed to fulfill this classification task. (ii) In the decision support scenario we typically have different groups of patients defined by different developments of diseases, e.g., by survival time, re-occurrence of metastases, etc. There is no explicitly known image representation for this functional behavior of cells. However, the pathologist is interested in differentiating these groups of patients with a view to possible treatment.
Common to these two scenarios are the image sets grouped into certain classes. The difference is that for scenario (i) verbal descriptions and possibly action-reaction relationships exist, whereas for scenario (ii) only information about the future behavior of cells or organs exists. These scenarios lead either to mimicking the recognition abilities of the pathologist or to designing some sort of useful quantitative description. The latter is an endless task; however, the structuring of the descriptive methods into visual classes, as well as the structuring of the feature extraction methods, the taxonomy, as described in the following, should hopefully help make the task more manageable.

The dimensions of the image to be measured
A digital image is a regular, usually rectangular, array of measurement values of some property of interest from the area under study. The array elements are called pixels, short for picture elements. Each pixel represents the integrated intensity of light measured over the spatial extent of the pixel as defined by the optical and spectral properties of the measurement device. This measured intensity may in light microscopy either be proportional to the amount of transmitted light, or to intensities of fluorescence. Other modes such as reflection and phase contrast are also possible but far less common in quantitative biomedical microscopy.
The image array is at least two dimensional, representing the two spatial dimensions of a normal flat image. Since all objects intrinsically are three dimensional it is becoming increasingly common to capture 3D images by e.g., confocal, deconvolution or multiphoton fluorescence microscopy.
The spectral domain offers an additional data dimension. By limiting the spectral range over which a sensor registers light, more or less well resolved spectra of the images may be obtained. By far the most common case is to create normal RGB color images by having three spectral channels, red, green and blue. Another common case is to have narrow bandwidth limiting filters that match some spectral property of the stains used. Thus a number of specific spectral channels can be obtained. Recently devices that can register whole spectra in each pixel have become available and found application also in biomedical microscopy [38].
Still another dimension is represented by time. If the specimen is subject to some process under which it changes with time, this process may be monitored by successive images taken at suitably spaced time intervals. These image sequences can be seen as a multidimensional image with one temporal dimension.
A digital image may thus be up to five-dimensional (three spatial, spectral and temporal dimensions). There are a number of feature extraction aspects that need to be considered in order to take all these dimensions into account. We will in this paper, however, limit our discussion to the common monochrome two dimensional case. In the following it is thus assumed that the images only have one intensity value per pixel called gray value. This value has been obtained by suitable limitations in the other spatial, spectral and temporal domains through filtering, focusing and exposure time control.
When the image is analyzed the spatial arrangement and the measured values are assessed, and some numerical values or features that hopefully carry some information of interest about the imaged object are computed. We thus have two closely linked aspects of the properties of the pixels: the actual measured gray values and their spatial arrangements. We have some features that express only the spatial arrangements; they are called morphometric features. Other features express the overall gray values without considering the spatial distributions; they are called densitometric. Finally we have the more complex and interesting features that express the combined aspects, the spatial distribution of the gray values. Those are called textural or structural features.
All area and size features are expressed in units of pixels. They are not normalized to absolute quantities of [µm] or [µm²]. Knowing and documenting the actual pixel size in a particular study is, however, essential for the interpretation and reproducibility of the obtained results. Typically the pixel size may be anything in the range 0.5 to 0.05 µm per pixel. Larger values may be found in some studies, but then the resolution of chromatin texture will be very limited.
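Converting pixel-based size features to physical units is a one-line calculation once the pixel size is known and documented. A minimal sketch (the 0.2 µm/pixel value is a hypothetical example, not taken from any particular study):

```python
# Illustrative sketch: converting pixel-based size features to physical
# units. The pixel size (here 0.2 um/pixel) is a hypothetical value; it
# must be known and documented for results to be reproducible.

def area_um2(area_pixels, pixel_size_um):
    """Convert an area measured in pixels to square micrometers."""
    return area_pixels * pixel_size_um ** 2

def length_um(length_pixels, pixel_size_um):
    """Convert a length measured in pixels to micrometers."""
    return length_pixels * pixel_size_um

# A nucleus of 5000 pixels imaged at 0.2 um/pixel:
print(area_um2(5000, 0.2))   # area in um^2
print(length_um(80, 0.2))    # e.g. a diameter of 80 pixels, in um
```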

The selection of appropriate features
The main objective when designing the feature extraction approach for a particular study in cytometry or histometry is usually to find a set of features that can discriminate between the different relevant populations of cells/specimens/cases/patients as well as possible. An additional requirement is robustness, so that the results can be reproduced for new independently collected material. Usually the researcher has some notion about what features should be useful. This may be based on experience from visual image analysis, i.e., one tries to define features that express the properties of the cells that are used for the visual discrimination between the cell populations. It may be based on personal experience from previous similar studies with quantitative methods. Or it may be based on studies of the relevant literature.
All features calculated can be grouped into shape features (Section 3.1), intensity features (Section 3.2), texture features (Section 3.3) and structure features (Section 3.4). In many applications the features of the last two groups are the most useful ones. They are at the same time the most difficult to define in a unique, robust and reproducible way and the most difficult to understand intuitively. We will thus devote most of this paper to features of those categories.
If the features that are expected to be useful are simple aspects such as size or global shape, the definition of what to measure may be straightforward. It is much harder to express coarseness of the chromatin distribution or other features related to the textural appearance of the cells. But many studies have shown that these latter features may be more efficient in discriminating between various cell types of interest. The problems with the textural features are (i) the difficulty or even impossibility to comprehend and visualize texture features, (ii) the relative variability of selected (textural) features for different problems (cells, organs, treatments) and (iii) the missing relation of specific feature values, e.g., found by a feature selection, to the appearance and function of cells or nuclei. Parallel to the advances in molecular biology, morpho- and photometric approaches using texture measures were less widely used in the early nineties. However, recently new and interesting methods and applications have been published, e.g., [14,45,110,132].
Sometimes authors give names to the texture appearance of certain features, most probably because reviewers have requested this. However, these are often misleading hints. The necessary (and very time consuming) step of building a system of correspondence between visual and functional terms and features (type and value) has up to now largely not been taken. In this publication we will avoid any verbal descriptions of texture features that pretend to give visual insight into possible representations. Instead we will concentrate on defining the different features in an exact mathematical way and sorting them into systematic groupings, hopefully improving the understanding and overview as well as making the features comparable and reproducible.
Since the selection of the best, or even a useful, set of features by intuitive methods may be quite difficult in many cases, a powerful supplement may be to perform a systematic evaluation of potentially useful features with statistical methods. As far as the authors know, there exists no realistic methodology to automatically construct feature extraction methods on the basis of a given material grouped by certain attributes. The methods proposed are strongly heuristic and combine learning and/or training capabilities with algorithmic knowledge [91] in combination with classical methods of statistical data evaluation.
The classical statistical approach, to evaluate a certain set of features per object of interest, is based on the construction of correlation functions between object feature values and object attributes or diagnoses. Or in the common case with multiple classes of objects the features with the greatest discriminating power may be found. This is straightforward when the features are taken one at a time. But combinations of features will in most cases give stronger separations of objects in the respective classes. So it is often desired to find the best combination of two or more features. If many features are calculated the number of possible combinations of several features may be very large. This makes the risk of over-fitting of the discriminating function to the training data substantial, unless the number of cases in the data set is very large [110]. It is thus of interest to initially limit the number of features used for the statistical evaluation to a number as small as possible without limiting it to the point where essential information is missed. Finding this tradeoff is an important reason for learning to understand what the different features express and how they should be combined to have the best chance of doing the desired discrimination or recognition job.
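As a hedged illustration of such a pre-selection step, the following sketch ranks single features by a Fisher-style discriminant ratio between two classes. This is one common heuristic for limiting the feature set before multivariate analysis, not the specific procedure used by the authors, and all data values are invented:

```python
# Hedged sketch: ranking single features by a Fisher-style discriminant
# ratio between two object classes. One simple way to pre-limit the
# feature set before multivariate evaluation; not the authors' method.
import statistics

def fisher_ratio(values_a, values_b):
    """(mean difference)^2 / (sum of variances) for one feature."""
    ma, mb = statistics.mean(values_a), statistics.mean(values_b)
    va, vb = statistics.variance(values_a), statistics.variance(values_b)
    return (ma - mb) ** 2 / (va + vb)

# Two hypothetical classes, each measured on two features:
class_a = {"area": [100, 110, 105, 95], "p2a": [1.10, 1.20, 1.15, 1.10]}
class_b = {"area": [200, 210, 190, 205], "p2a": [1.12, 1.18, 1.14, 1.10]}

ranking = sorted(class_a,
                 key=lambda f: fisher_ratio(class_a[f], class_b[f]),
                 reverse=True)
print(ranking)  # "area" separates these invented classes far better
```

The ratio is computed one feature at a time; as noted above, the best single features need not form the best combination, so such a ranking is only a starting point.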

The feature extraction process
It is easy to invent many different features that can be extracted from an image. Thousands of features can be found in the literature that has accumulated over the decades. In order to be able to compare different studies and understand what the different features actually express, it is worthwhile to consider the feature definition process in a systematic way.
The feature extraction can be expressed in terms of (a) the definition of the zone of measurement, (b) an image transformation with a non-scalar result and (c) a measurement on the latter (see Figs 1, 2). This separation is heuristic and is a method to formally describe the algorithms applied [91,111]. An added advantage of this division of the feature extraction into three stages is implementational convenience. A few image transformations combined with a few measurement routines can generate hundreds of useful features for each zone of measurement.
The zone of measurement delimits the region to be quantified, i.e., it describes where to measure. It may be the nucleus, the cytoplasm or the whole cell. But it may also be sub-regions that are helpful for the quantification of semantically meaningful entities of the object, e.g., the nuclear margin or the eu- and hetero-chromatin compartments of the cell nucleus, e.g., [78]. The feature extraction methods described here are primarily designed for digital images of single, complete cell nuclei; however, they are also applicable to sections of cell nuclei and to scenes, e.g., sets of nuclei. The definition of the zone of measurement is called segmentation and this is one of the crucial steps of any cytometry system (see Section 2.7).
The transformation is used to bring out the aspect of the distribution of the pixel values that is of interest. The transformation reflects the aim of the feature extraction method. If we are interested in the overall amount of absorbing material in the cell, the creation of a simple 1D density distribution or histogram is an appropriate transformation. If we are primarily interested in the overall shape of the cell, a thresholding transform that creates a binary image of the cell is a natural transform step. If we want to express the local intensity variations, a 2D gray level co-occurrence matrix may be an appropriate transform. The local intensity variations may also be enhanced by some initial image transform that, e.g., removes the average gray level and creates an image of local deviations from the average. A large number of such transformations may be defined and produce more or less useful results depending on the particular application. The transformation results in a non-scalar value: a new image, a matrix of statistical co-occurrence values or a frequency distribution, which are intermediate results used for the subsequent calculation of a scalar feature value. The transforms may very well be combined so that a histogram of some measurement from an initially transformed image is created and used for the feature calculation. Several feature groups are calculated from such frequency distributions. The calculation of a histogram is the most frequent transformation throughout all feature extraction routines presented here.
The final measurement delivers the feature value. The measurement results in exactly one (scalar) value per feature. Measurements are usually a count, an integration or a selection. Since the transformation in many cases results in a histogram, the distribution expressed by this histogram can be characterized by its moments: the first moment is the average, and the second, third and fourth central moments yield the standard deviation, the skewness and the kurtosis. Similarly 2D moments may also be used to express the shape of the object or the overall density distribution. The statistical estimators A (= M0), M1, M2 and M3 are proportional to polynomial moments. Modelling a distribution or function via moments is a method commonly used in feature extraction and in other fields as well. Through a normalization that will be described in Section 3.1.4 the measurements can be made robust against several irrelevant variations such as orientation and position. Sometimes more than one scalar value is needed to express some property of interest. In such cases several measurements are performed. On one transformation several measurements can be performed, and the same type of measurement can be carried out for different transformations.
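The moment measurements on a histogram transformation can be sketched as follows; the histogram values are hypothetical, and the naming (A, M1, M2, M3) follows the text:

```python
# Minimal sketch of the measurement step: characterizing a gray-value
# histogram by moments. Naming follows the text: A = pixel count (M0),
# M1 = mean, M2 = standard deviation, M3 = skewness. The 8-bin
# histogram below is a hypothetical example.

def histogram_moments(hist):
    """Return (A, M1, M2, M3) for a histogram of counts per gray level."""
    a = sum(hist)                                    # A = M0, total count
    m1 = sum(g * h for g, h in enumerate(hist)) / a  # mean gray value
    var = sum(h * (g - m1) ** 2 for g, h in enumerate(hist)) / a
    m2 = var ** 0.5                                  # standard deviation
    m3 = sum(h * (g - m1) ** 3
             for g, h in enumerate(hist)) / (a * m2 ** 3)  # skewness
    return a, m1, m2, m3

hist = [0, 2, 5, 9, 5, 2, 1, 0]   # counts per gray level 0..7
a, m1, m2, m3 = histogram_moments(hist)
print(a, round(m1, 3), round(m2, 3), round(m3, 3))
```

The slight tail toward high gray values in this example shows up as a positive M3, while A and M1 together already give the total extinction discussed in Section 2.6.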
We have tried to reflect the three steps in the feature extraction process in the naming structure of the features. Thus the kind of measurement is described with short suffixes A, M1, M2, M3, MAX, MIN, MOD, MED, ENT, etc. (see Fig. 7 and Appendix C.1). The type of transformation and the zone of measurement is shown by a prefix e.g., BG for background, L for Laplace transformation or H for bright particles.

Concepts of texture and structure as descriptors of chromatin arrangement
When discussing features describing microscopic cell images, much attention must be given to the arrangement of chromatin in the cell nuclei. The reason for this is twofold. The eu- and hetero-chromatin has in many studies proven to be the most important basis for grouping cells into different states of malignancy or functional classes. It is thus very important to describe these aspects well. But this is also very difficult to do. Two main approaches have been developed. In the first approach (i) the chromatin distribution is seen as a local arrangement of more or less small objects of varying intensities. We will call this the structure approach. In the other, (ii) the statistical or texture approach, the chromatin distribution is characterized by 2nd or higher order statistics.
The first way of describing texture is more or less heuristic and is ruled by imagination and descriptive force. It is reflected by intensity measures on sub-regions of dark and bright particles. The resulting features try to mimic the visual diagnostic clues of a cyto-technologist. The second approach models the arrangement by more or less regular repetitions of arrangement configurations. It is reflected by transformations like Laplace and median filtering and especially by co-occurrence and run-length features. The latter two nowadays appear under the term GLDM (grey level dependency matrices) and are widely used. Several texture related publications use this concept, e.g., [109,129,132].
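A minimal sketch of the statistical approach: a grey level co-occurrence matrix for a single displacement, with two classical scalar measurements (energy and contrast) derived from it. The displacement and number of gray levels are illustrative parameter choices that, as stressed earlier, must be documented exactly for results to be comparable:

```python
# Hedged sketch of the texture approach: a grey level co-occurrence
# matrix for one displacement, plus two classical scalar measurements
# (energy and contrast). Displacement and gray-level count are
# illustrative parameters, and the image is a hypothetical example.

def cooccurrence(img, dx, dy, levels):
    """Count pairs (img[y][x], img[y+dy][x+dx]) inside the image."""
    m = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y][x]][img[y2][x2]] += 1
    return m

def energy_and_contrast(m):
    """Normalize the matrix to probabilities and derive two features."""
    n = sum(sum(row) for row in m)           # total number of pairs
    p = [[v / n for v in row] for row in m]
    energy = sum(v * v for row in p for v in row)
    contrast = sum(p[i][j] * (i - j) ** 2
                   for i in range(len(p)) for j in range(len(p)))
    return energy, contrast

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
m = cooccurrence(img, dx=1, dy=0, levels=4)
print(energy_and_contrast(m))
```

In practice the matrix is usually accumulated over several displacements and symmetrized, and the intensity range quantized to a fixed number of levels, all of which are parameter decisions of exactly the kind the text asks to be reported.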
A large disadvantage of texture features, especially of the co-occurrence and run-length types, is the difficulty of relating them to visually perceived chromatin changes. Another difficulty is the sensitivity of texture features to variations in specimen preparation and image gathering setup. In Rodenacker [86] the sensitivity to size and intensity changes was examined. Several features showed large dependencies on those influences and on parameter settings during the feature extraction process. Texture features are also very sensitive to proper focus of the microscope, even to the extent that the extraction of texture features has been used as a focus criterion by some researchers [79], i.e., when the texture feature has its maximal value the cell nucleus is in the best possible focus.

Intensity conversion
The feature extraction process will use several transformed versions of the original image obtained from the microscope. In addition to the original image, as it was gathered, a shading image containing an empty viewing field for bright field shading correction and the region of interest as a mask image are frequently used. Intensity values of the original image can represent transmitted light, inverted transmitted light, extinction values, reflection values, fluorescence intensities or whatever the image generating process was. In the case of color images the methods for analyzing scalar images discussed in this paper can be used, provided a method to extract one intensity value per pixel is chosen. This can be one channel (red, green or blue) or a transformation according to a certain color system (e.g., hue, saturation or luminance in the HLS color system). The actual choice depends on the physical properties of the stain used and on the image gathering method used. It is generally assumed that for one project the staining procedures and the gathering method are unchanged. The parameters for feature extraction control the way original values are converted. Values from transmitted light are typically transformed and used also as extinction values.
In the first step the final intensity image (one value per pixel) and the mask image are produced from the input data. For classical cytometric purposes including DNA content estimation, the guidelines for standardization [55] are fulfilled, including multiplicative bright field correction, extinction standardization using white value estimation from the near background as well as black shoulder correction.
In physical terms the amount of transmitted light is the measurable quantity. Often the transformation from transmission T to extinction E is performed in the image gathering device. Our strategy is to gather and store data in as original a form as possible. This means that in most cases we store values in transmitted light. The formula for intensity conversion is listed in Appendix C. Figure 12(1 1) shows the original image in transmission and Fig. 12(3 1) shows the transformed extinction image. In the following text the resulting input image used for feature extraction is referred to as the extinction image E.
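The transmission-to-extinction conversion can be sketched as below. The exact formula used by the authors is the one in their Appendix C; this illustration assumes the standard optical density definition E = -log10(T/T_white), with a white value estimated from the near background, and all numeric values are hypothetical:

```python
# Illustrative sketch of the transmission-to-extinction conversion,
# assuming the standard optical density definition; the authors' exact
# formula is in their Appendix C. All values are hypothetical.
import math

def extinction(transmission, white):
    """Convert a transmitted-light value to extinction (optical density),
    using a per-pixel white (near-background) value for shading."""
    t = max(min(transmission, white), 1)   # clamp to avoid log of <= 0
    return -math.log10(t / white)

# A pixel transmitting half of the background light has extinction ~0.301:
print(round(extinction(128, 256), 3))
# Full transmission (empty background) gives extinction 0:
print(extinction(256, 256))
```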

Segmentation
Segmenting the image into the relevant objects and background parts is a crucial step in almost all image analysis tasks. It is also in many cases one of the more difficult tasks. This paper will not go into the different aspects of this important topic. We will only note that in most of our applications on cytological material the object segmentation is performed via thresholding with an automatically estimated threshold [23,69] and subsequent mask processing. Sometimes this threshold is interactively controlled and corrected. Only the threshold is stored together with the image. In more complicated cases the automatic segmentation consists of extraction of features per pixel and their classification into different classes of sub-regions. For histometric purposes, i.e., when objects in tissue sections have to be segmented, or when the objects are not compact, automatic segmentation is much harder to achieve and often the object mask itself has to be interactively corrected [66,76,88] (see Fig. 4). In these cases a mask image or label image accompanies the image. In most applications the nucleus and the background are the primary regions to be segmented but sometimes, depending on staining and problem, the cytoplasm region is additionally of interest.
The choice of features per pixel depends on the characteristics of the sub-areas to be discriminated, and it is not fundamentally different from the statistical feature selection methods used for the discrimination of objects [46,91].
Objects are considered as simply connected and compact. The mask resulting from the initial thresholding is therefore filled, border-touching objects are deleted, and the mask is cleaned by openings and closings [111]. In a first step all objects are deleted which either disappear under an erosion of minimum radius or do not disappear under an erosion of maximum radius (Fig. 5). Erosion, dilation, opening and closing are performed with an octagon of given radius instead of a circle. The mask of the object is displayed in Fig. 12(2 1). Equivalently, the elimination of too small and too big objects can be achieved through a distance transform of the filled mask followed by a selection of connected components with a maximum distance value in the appropriate range. The result of the segmentation will for each object be a mask which we will denote O. It is the set of all pixels p = (x, y) of the object O (p ∈ O). We also define C ⊂ O as the ordered set of pixels of the contour or border, and |·| as the number of elements of a set. In other words O describes the mask or zone of measurement and C its border, e.g., as a list of coordinates. In Fig. 3 some typical cell images with overlaid borders are displayed. All further measurements are related to those data.
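A simplified sketch of this segmentation pipeline: thresholding, connected-component labeling, deletion of border-touching objects, and size filtering. The octagonal erosions, openings and closings of the actual method are not reproduced here; a pixel-count range stands in for the min/max-radius test, and the image is a hypothetical toy example:

```python
# Hedged, simplified sketch of the segmentation pipeline: threshold to a
# binary mask, label 8-connected components, delete border-touching
# objects, and discard objects outside a size range. The octagonal
# morphological operations of the original method are not reproduced;
# the pixel-count range stands in for the min/max-radius erosion test.
from collections import deque

def segment(img, threshold, min_size, max_size):
    h, w = len(img), len(img[0])
    mask = [[img[y][x] >= threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            comp, touches_border = [], False
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:                       # flood fill one component
                y, x = queue.popleft()
                comp.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if not touches_border and min_size <= len(comp) <= max_size:
                objects.append(comp)
    return objects

img = [[0, 0, 0, 0, 0, 0],
       [0, 9, 9, 0, 0, 0],
       [0, 9, 9, 0, 8, 0],
       [0, 9, 9, 0, 0, 0],
       [0, 0, 0, 0, 0, 9]]
objs = segment(img, threshold=5, min_size=2, max_size=50)
print(len(objs))  # only the 2x3 blob survives: the single pixel is too
                  # small and the corner pixel touches the border
```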

Feature combinations and normalization
Features that are directly extracted from the images can be combined to form new features that may express the desired properties of the cells in a better way. As a simple example, the area and the mean optical density of a cell nucleus are not by themselves biologically significant. However, their product, the total extinction TOTE = M1 · A, represents the total amount of stain, or the total amount of chromatin if a stoichiometric staining like Feulgen was used. Sometimes it is not clear which feature is the primary one and which is derived through a combination. Returning to our example of area, mean and total extinction, one can directly extract the total extinction and obtain the mean optical density by dividing it by the area. The choice of which features are primary and which are calculated is then mainly a question of implementational convenience.
One of the main reasons for calculating secondary or combined features is to obtain normalization against undesired factors (e.g., the number of particles is normally correlated with the area of the whole object) or to increase comprehensibility. Examples of feature combinations are the well known shape factor P2A (= P²/(4πA), with perimeter P and area A), the already mentioned integrated optical density (TOTE, total extinction), and any coefficients of variation or other ratios and differences for normalization purposes. Some examples are given in Appendix F.
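The two combinations named above can be sketched directly from their definitions:

```python
# Sketch of the feature combinations named in the text: total extinction
# (TOTE = M1 * A) and the shape factor P2A = P^2 / (4*pi*A), which is
# 1.0 for a perfect circle and grows for irregular shapes. The numeric
# inputs below are hypothetical.
import math

def tote(mean_extinction, area):
    """Total extinction: mean optical density times area."""
    return mean_extinction * area

def p2a(perimeter, area):
    """Dimensionless shape factor; 1.0 for a circle."""
    return perimeter ** 2 / (4 * math.pi * area)

# A circle of radius r has P = 2*pi*r and A = pi*r^2, hence P2A = 1:
r = 10.0
print(round(p2a(2 * math.pi * r, math.pi * r ** 2), 6))   # -> 1.0
print(tote(0.3, 5000))   # total extinction of a hypothetical nucleus
```

Because P2A divides out absolute size, it is exactly the kind of normalized combination the paragraph above describes: it compares shape irregularity across nuclei of different areas.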
Depending on the classification scheme certain combinations are not necessary. E.g., in the case of stepwise linear discriminant analysis it is not necessary to define linear combinations of features separately. However, the combination of base features normally increases the comprehensibility of the features.
It should also be pointed out that in case the images represent histological sections the third dimension needs to be taken into account. In that case several estimates represent densities in a stereological sense [35,128]. We will, however, not go into any detailed discussion of stereological considerations.

Features
There are several possible ways of grouping the different kinds of features that can be extracted from microscopy images of cells. As outlined in the previous sections we propose size and shape, intensity, texture and structure as the main categories of features, a grouping which is based on whether only the spatial, only the densitometric, or the combined spatial and densitometric distributions are assessed. Sections 3.1 to 3.4 correspond to the algorithmic descriptions in Appendices B to E.

Morphometric features, expressing size and shape
The morphometric features express the overall size and shape of the cell. They are distinguished from the densitometric features in that they do not take the density of the cell into account, except for the initial segmentation step. For these features only the object mask O and its border C are needed, not the actual gray scale image. They are listed in Appendix B and illustrated in Fig. 12(2 1).
In our experience the shape of cells or nuclei was rarely decisive for the problem under research. In fact the weak effects of early stages of diseases are not well represented by cellular and nuclear shape. However, shape features may be useful in distinguishing between various types of cells or between cells and other objects.

Position and orientation dependent features
A straightforward way of estimating the size of an object in an image is to find the minimum and maximum coordinates in the x and y directions ((LOX, LOY), (RUX, RUY)). These define the bounding box of the object, and the height, width and area of this box will say something about the size and shape of the object. The ratio of the area of the actual object to that of the bounding box provides a simple shape factor (FO1). Unfortunately all these measures have the serious disadvantage of being strongly orientation dependent. If we have a narrow, elongated object, these measures will vary greatly as the object is rotated relative to the image grid. Since absolute orientation rarely, if ever, is of interest, and can seldom be controlled, these kinds of features should be avoided.
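To make the orientation dependence concrete, here is a minimal Python sketch (the function name and the pixel-set representation are our illustration, not the paper's implementation):

```python
def bbox_features(pixels):
    """Bounding box (LOX, LOY), (RUX, RUY) and the simple shape factor FO1
    (object area / bounding-box area) for a set of (x, y) pixel coordinates."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    lox, loy, rux, ruy = min(xs), min(ys), max(xs), max(ys)
    width, height = rux - lox + 1, ruy - loy + 1
    fo1 = len(pixels) / (width * height)
    return (lox, loy), (rux, ruy), width, height, fo1

# A thin horizontal line of 5 pixels fills its bounding box completely ...
line = [(x, 0) for x in range(5)]
# ... but the same line rotated by 45 degrees and re-rasterized does not:
rotated = {(round((x - y) * 0.7071), round((x + y) * 0.7071)) for x, y in line}
```

For the horizontal line FO1 is 1.0; after the 45° rotation the bounding box becomes square and FO1 drops sharply, although the object itself has not changed.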
Similarly the absolute coordinates of the object centroid (KX, KY), are of no value for cytological specimens except for administrative purposes, e.g., to be able to relocate a cell in the cell image for future reference. In histological specimens they may be of use also for studying tissue architecture since the spatial distribution of the cells then is of interest [92].

Geometric features
Geometric features that are independent of position and orientation, and thus more generally useful, are the area A, the perimeter P, the radius of the largest inscribable circle RAD, and the largest extension EXT_MAX. The perimeter P is computed as the length of the outer contour (assuming 8-connected border pixels) [80], i.e., as a weighted sum of steps around the border. The weight factors should be 0.948 for horizontal/vertical steps and 1.343 for diagonal steps in order for the estimate to be unbiased [70,71]. However, in our studies so far we have used the weights 1 and √2. The largest extension is computed as the longest straight distance between any two points on the object border. This value is similar to the diameter of the smallest circumscribed circle, also called the minimum spanning circle [43], if the object has a reasonably regular shape.
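The weighted perimeter estimate can be sketched as follows (Python; the chain-code representation of the border is our assumption, not the paper's exact data structure):

```python
import math

def perimeter(chain, w_hv=0.948, w_diag=1.343):
    """Perimeter from an 8-connected chain code around the border:
    even codes (0, 2, 4, 6) are horizontal/vertical steps, odd codes diagonal.
    Default weights are the unbiased estimators of [70,71]."""
    n_hv = sum(1 for c in chain if c % 2 == 0)
    n_diag = len(chain) - n_hv
    return w_hv * n_hv + w_diag * n_diag

# A square border of 4 sides with 4 horizontal/vertical steps each:
square = [0] * 4 + [2] * 4 + [4] * 4 + [6] * 4
unbiased = perimeter(square)                               # 16 * 0.948
naive = perimeter(square, w_hv=1.0, w_diag=math.sqrt(2))   # 16 * 1
```

The naive weights systematically overestimate the true perimeter of smooth objects, which is why the corrected weights are preferable for unbiased estimation.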

Contour features: Curvature and Fourier descriptors
Most of the shape information about an object is contained in its contour. A number of different shape features can thus be based on analysis of the contour. The contour C can be represented as the ordered set of border pixels. To consider the functional behavior of this border, the contour function c(p), dependent on the position p along the perimeter, has to be defined. For several analysis methods this function has to be regularized in terms of p to lc(p). In words, this regularization means that the border is re-generated from points with a fixed, equal distance between all neighboring points. In many cases it is useful to select this distance such that the total contour has the same number of border points for all objects; alternatively, the distance between neighboring points can be fixed to the same value for all objects. We perform the regularization by piecewise linear interpolation between the given border points. lc(p) is now a parametric representation of the contour, consisting of lc_x(p) and lc_y(p). There is a variety of transformations and measures based on c(p) or lc(p) as shape representations. A thorough description would go beyond the scope of this presentation; a fairly general and instructive description of shape descriptor concepts can be found in [42].
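As an illustrative sketch (Python, assuming the contour is given as a list of (x, y) border points), the regularization to lc(p) by piecewise linear interpolation might look like this:

```python
import math

def resample_contour(contour, n_points):
    """Re-generate a closed contour as n_points points with equal arc-length
    spacing, by piecewise linear interpolation (the regularized lc(p))."""
    pts = contour + [contour[0]]            # close the polygon
    seg = [math.dist(pts[i], pts[i + 1]) for i in range(len(contour))]
    total = sum(seg)
    step = total / n_points
    out, acc, i = [], 0.0, 0
    for k in range(n_points):
        target = k * step
        while acc + seg[i] < target:        # advance to the segment holding target
            acc += seg[i]
            i += 1
        t = (target - acc) / seg[i] if seg[i] else 0.0
        (x0, y0), (x1, y1) = pts[i], pts[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

Resampling a unit square to 8 points, for instance, yields the four corners plus the four edge midpoints.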
Curvature and bending energy: The rate of change in orientation of successive border elements is the curvature of the border. If the object is smooth we have only small curvature values, while an irregular object also shows a number of large values. Representing the curvature values in a histogram is a transform which expresses the object shape in a detailed way. If we sum the squares of all curvature values and normalize by the length of the contour, we obtain the bending energy. The name comes from the fact that this measure is proportional to the energy it would take to bend a straight elastic rod into the shape of the object.
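A minimal discrete sketch of this measure (Python; the paper's exact curvature estimator may differ):

```python
import math

def bending_energy(contour):
    """Bending energy of a closed contour: the sum of squared discrete
    curvature (turning angle per unit step) weighted by step length,
    normalized by the total contour length."""
    n = len(contour)
    energy, length = 0.0, 0.0
    for i in range(n):
        p0, p1, p2 = contour[i - 1], contour[i], contour[(i + 1) % n]
        a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
        a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
        turn = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
        step = math.dist(p1, p2)
        energy += (turn / step) ** 2 * step if step else 0.0
        length += step
    return energy / length
```

For a circle of radius r the discrete estimate approaches 1/r², the theoretical minimum; any irregularity of the border increases the value.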
Convex hull, deficiency and elliptic deviation: The convex hull of an object is the smallest convex object that completely encloses the object. The difference between the convex hull and the object is called the convex deficiency. Since a typical cell nucleus is elliptical, and an ellipse is convex we will usually not have much convex deficiency for cell nuclei. But the measure can be quite useful in e.g., detecting overlaps, since those will have a larger convex deficiency than single nuclei [39]. Another feature, the elliptic deviation, similar to convex deficiency, calculated from the ellipse of equal area using the two first Fourier spectral coefficients, is likewise helpful for shape characterization (Fig. 10).
The convex hull can be generated by several different algorithms [22]. We construct it from the points in c(p) by using the Delaunay triangulation, which delivers as a by-product the list of points of the convex hull. The perimeter and area of the convex hull are calculated. The convex hull and deficiency have the great advantage of being visible and easily recognizable, in contrast to the features described next.
Fourier descriptors: The contour elements lc(p) can be considered as a complex valued function LC(p) = lc x (p) + i · lc y (p). Since it is a closed contour, it will be a periodic function and can as such be transformed by the Fourier transformation to its spectrum (see Fig. 9). To illustrate the meaning of the Fourier spectrum some low pass filtered contours are plotted in Fig. 9. The energy at different frequency intervals in this spectrum will be an, in many cases, useful descriptor of the object shape [60].
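The principle can be sketched with a direct DFT of the complex contour (Python; in practice an FFT of the regularized lc(p) would be used):

```python
import cmath, math

def fourier_descriptors(contour):
    """DFT of the complex contour LC(p) = x(p) + i*y(p); the magnitudes of
    low-order coefficients describe gross shape, higher orders fine detail."""
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    return [sum(z[p] * cmath.exp(-2j * cmath.pi * k * p / n) for p in range(n)) / n
            for k in range(n)]

# For a circle of radius 5 centred at the origin, all energy ends up in the
# first coefficient; every other coefficient is (numerically) zero.
circle = [(5 * math.cos(2 * math.pi * p / 32),
           5 * math.sin(2 * math.pi * p / 32)) for p in range(32)]
fd = fourier_descriptors(circle)
```

Coefficient 0 carries only the centroid (translation); dividing the remaining magnitudes by |FD₁| additionally gives scale invariance, so the energy in frequency bands becomes a pure shape descriptor.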

Invariant moment features
Geometric moments describing the extent of the object are straightforward to define and are useful shape descriptors. They are unfortunately sensitive to the overall size and orientation of the object. Invariant moments which are normalized against those factors were proposed by Hu [62] and later extended and improved by Reiss [82]. Following this notation, five to six invariant moments are derived from the central two-dimensional polynomial moments. Using the indicator function (see Appendix A), i.e., the function of the mask, for the calculation, the invariant moments are pure and sensitive shape descriptors (see Appendix B.4). As a by-product, useful in many applications, particularly when dealing with tissue sections, the angle THETM (θ_M) of the principal axis of the object is obtained (Fig. 12(2 1)). Moments may also be weighted by intensity or other transformed images as described in Section 3.2.2.
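A sketch of the first two invariant moments (Python; the paper's exact normalization follows Appendix B.4 and may differ in detail):

```python
def hu_moments(image):
    """First two Hu invariant moments of a grey-value (or binary mask) image,
    invariant to translation, scale and rotation. image is a 2-D list of
    weights indexed as image[y][x]."""
    h, w = len(image), len(image[0])
    m00 = sum(image[y][x] for y in range(h) for x in range(w))
    xc = sum(x * image[y][x] for y in range(h) for x in range(w)) / m00
    yc = sum(y * image[y][x] for y in range(h) for x in range(w)) / m00

    def mu(p, q):  # central moment
        return sum((x - xc) ** p * (y - yc) ** q * image[y][x]
                   for y in range(h) for x in range(w))

    def eta(p, q):  # scale-normalized central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    im1 = eta(2, 0) + eta(0, 2)
    im2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return im1, im2
```

Rotating the image grid by 90° leaves both values unchanged, which is exactly the property that makes them usable as shape (or, with intensity weighting, texture) descriptors.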

Densitometric features, expressing total intensity
For the densitometric features the absolute values of the intensity measurements in the image play a crucial role. It is thus important that those are as well controlled and normalized as possible. As stated in Section 2.6 we save the original image with transmitted light values. It is thus always possible to return to image data as close to the physical acquisition conditions as possible. But for calculating intensity based features the extinction pixel values are more directly useful. We thus use the extinction image E for this purpose. This guarantees that no original pixel values of the converted image are cut off (see Fig. 12(3 1)); pixel values are strictly positive. Since that image is based on a global white value estimate, the zero extinction value may be only approximately correct locally. A local correction of this zero extinction estimate is performed via the measurement of the background mean (extinction) BGM1 (see Appendix C). In a pure densitometric measurement the spatial positions of the pixels are not included; all the densitometric information is thus retained in the histogram of the image. By calculating moments of the distribution in the histogram, the densitometric information can be condensed into a few useful measures. Additionally the largest and smallest density values may be of interest. Since these are strongly influenced by noise, the nth largest or smallest value may be used instead. Here we have used n = 7, but in some of our studies n = 3 has also been used. The median and mode of the distribution are other useful measures, as are the grey levels of the 10th and 90th percentiles of the distribution.
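The histogram-based measures can be sketched as follows (Python; the feature names follow the text, while the exact percentile convention is our assumption):

```python
import math

def densitometric_features(values, n=7):
    """Histogram-based features of extinction values inside the mask:
    mean (M1), standard deviation (M2), skewness, total extinction (TOTE,
    integrated optical density), the n-th smallest/largest value (robust
    replacements for min/max) and the 10th/90th percentile grey levels."""
    v = sorted(values)
    N = len(v)
    m1 = sum(v) / N
    m2 = math.sqrt(sum((x - m1) ** 2 for x in v) / N)
    skew = sum((x - m1) ** 3 for x in v) / (N * m2 ** 3) if m2 else 0.0
    return {"M1": m1, "M2": m2, "SKEW": skew, "TOTE": sum(v),
            "MIN_n": v[n - 1], "MAX_n": v[-n],
            "P10": v[int(0.1 * (N - 1))], "P90": v[int(0.9 * (N - 1))]}
```

Note that TOTE = A · M1, which is why the integrated optical density couples densitometry to the object size.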

Intensity features from different regions
Calculating the histogram only for the whole image limits the usefulness of the densitometric features. Therefore the image is subdivided into five different regions and the histogram-based densitometric features can be extracted for each of these regions (Fig. 2). The background features (Fig. 12(3 1)) are computed mostly for normalization, quality estimation and quality assurance purposes: BGM1 expresses acquisition influences, while BGM2 gives an impression of how clean and even the background is. M1 is also called the mean optical density if the transformation to extinction (see above) is performed. Besides its normal interpretation, M2 can be used to measure, to a certain extent, the contrast between eu- and hetero-chromatin inside the cell nuclei [53].
The inside border zone (BI. . .) is defined as the set difference between the original object O and the eroded [111] object O ⊖ B_BI. The outside border zone (BO. . .) is defined as the set difference between the dilated object O ⊕ B_BO and the original O (Fig. 6). Features from these regions are often used for normalization purposes. They deliver either estimates of the mean extinction of the cytoplasm or, in the case of nucleus-specific stains, a near-background estimate. The zone features BI. . . and BO. . . also serve to quantify disparities in the location of object extinction compared with intensity features from the whole object (see also features HU. . . and HL. . . in Section 3.4). Border-located chromatin or koilocytosis of cells can nicely be represented by combinations of these features [21].
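Sketched in Python, with the object mask as a set of pixel coordinates and a square structuring element (a simplification of the morphological operators of [111]):

```python
def erode(mask, r):
    """Binary erosion with a (2r+1) x (2r+1) square structuring element."""
    return {(x, y) for (x, y) in mask
            if all((x + dx, y + dy) in mask
                   for dx in range(-r, r + 1) for dy in range(-r, r + 1))}

def dilate(mask, r):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    return {(x + dx, y + dy) for (x, y) in mask
            for dx in range(-r, r + 1) for dy in range(-r, r + 1)}

def border_zones(mask, r_bi=1, r_bo=1):
    """Inside zone BI = O minus eroded O; outside zone BO = dilated O minus O."""
    return mask - erode(mask, r_bi), dilate(mask, r_bo) - mask
```

Intensity statistics restricted to the BI and BO pixel sets then yield the border-zone features described above.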

Invariant moments from the extinction image
By combining information about the spatial location and density of the pixels, features expressing combined size, shape and texture can be extracted. A useful tool for this is the density weighted moments. Formulas used for the calculation of invariant moments are outlined in Appendix B.4. IM1–IM5 are the invariant moments where the extinction image was used as weight function. A crucial point of these features is the normalization: higher order polynomial moments cover a large range of values, so the normalization becomes important. The calculation of moments using the extinction image E as weight function is similar to the de-noising of invariant features proposed by Balslev [15].

Textural features
The goal of the extraction of the textural features is the quantification of the overall local density variability inside the object of interest. Here the concept of building up the feature extraction process from zone of measurement, transformation and measurement steps, as outlined in Section 2.4, becomes very useful. Usually the zone of measurement is the whole cell nucleus, but it can also be some subpart of the nucleus, e.g., the peripheral zone, since several studies have shown that splitting the central and peripheral parts of the nucleus into different zones may give better results [78].
Several possible transformations are of interest. Applying local operators such as the gradient, Laplace and median transformations implies a concept of texture as described in [72]. The application of global operators like the rice field reflects a model of texture where texture elements are extracted by applying a topology on gray images defined by monotonically increasing or decreasing path connectivity [91,95,111].
Taken together, the texture transformations applied are thus the gradient, the Laplace, the flat texture (median-based) and the rice field transformations. The absolute size of the local operations varies under changes of magnification. For the Laplace and the flat texture transformations, the size of the transformation window can be changed by parameters accordingly. Depending on the size of the filter used, the object region is eroded accordingly to avoid border effects. Hence, the measured areas LA, GA, FA, etc. are slightly smaller than the full area A of the object O. In Fig. 8 the mask of the object, the cell nucleus, is outlined with the distance transformation in two different representations. A threshold in the distance transformation defines an eroded mask, and its maximum is the radius of the largest inscribable circle. This reduction of the zone of measurement becomes important for complicated, non-compact and thin objects.
For each of the transformed images we can in principle extract all the histogram based features and all the invariant moments, using the transformed image instead of the extinction image. This generates great numbers of possible features. We will, however, give some comments about which we have found most useful.

Gradient image features
A 3×3 gradient approximation is used. The gradient value GRA(E(x, y)) is defined in Appendix D.1. This is an approximation of |(∂E(x, y)/∂x, ∂E(x, y)/∂y)|. The G. . . features can be considered as a quantification of the speed of change of the gray values (see Fig. 12(2 2)). The measuring field is eroded by a 3 × 3 square.
The skewness of the gradient value distribution was of importance in several projects. It shows how far the gradient distribution deviates from a random (triangular) distribution. The product GA · GM1 = GTOTE is a measure of the energy of the intensity changes.
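A Prewitt-style sketch of the G. . . features (Python; the paper's exact 3×3 operator is defined in Appendix D.1 and may differ):

```python
def gradient_features(image, mask):
    """3x3 gradient magnitude inside the eroded mask; returns the measured
    area GA, the mean gradient GM1 and GTOTE = GA * GM1."""
    grads = []
    for (x, y) in mask:
        # skip border pixels: the measuring field is eroded by a 3x3 square
        if not all((x + dx, y + dy) in mask
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1)):
            continue
        gx = sum(image[y + dy][x + 1] - image[y + dy][x - 1] for dy in (-1, 0, 1))
        gy = sum(image[y + 1][x + dx] - image[y - 1][x + dx] for dx in (-1, 0, 1))
        grads.append((gx ** 2 + gy ** 2) ** 0.5)
    ga = len(grads)
    gm1 = sum(grads) / ga if ga else 0.0
    return ga, gm1, ga * gm1
```

On a linear intensity ramp the gradient magnitude is constant over the eroded measuring field, so GM1 directly reflects the slope.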

Laplace image features
An approximation of the Laplace operator is used to transform the image. The formula can be found in Appendix D.2. The distance r, in practice the size of the convolution matrix, can be selected. The measuring field is eroded by an octagon of size r to eliminate border effects (Fig. 12(3 2)).
The Laplace transformation is often used for segmentation purposes. The so-called zero-crossings [73] define borders of objects via the zeros of the sum of the second partial derivatives, which are approximated by the convolution applied here. From this point of view, the features or distributions of a Laplace transformed image deliver a certain description of changes in pixel neighborhoods defined by the size of the convolution matrix used. An in-depth examination of this topic can be found in the recent thesis of G. Smith [112].
The convolution can also be considered as a correlation. Seen this way, the features reflect measurements of the probability distribution of the existence of pixel configurations described by the chosen convolution matrix. This matrix is sometimes called an approximation of a Mexican hat. Mathematically speaking, the Laplace transformation is an approximation of ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y², the sum of the second partial derivatives. The resulting features L. . . may roughly be considered as a quantification of the speed of change of the gradient.
The most important feature from this transformation is LM2, the standard deviation. Its magnitude reflects the intensity of regular particles fitting into the size r of the convolution kernel (see Appendix D.2).
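A minimal sketch with the 5-point Laplacian, i.e., r = 1 (Python; the larger convolution matrices of Appendix D.2 generalize this):

```python
def laplace_features(image, mask):
    """5-point Laplacian inside the mask eroded by its 4-neighborhood;
    returns the mean LM1 and the standard deviation LM2 of the response."""
    vals = []
    for (x, y) in mask:
        # skip pixels whose 4-neighbors fall outside the mask
        if not all((x + dx, y + dy) in mask
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))):
            continue
        vals.append(image[y][x + 1] + image[y][x - 1] + image[y + 1][x]
                    + image[y - 1][x] - 4 * image[y][x])
    lm1 = sum(vals) / len(vals)
    lm2 = (sum((v - lm1) ** 2 for v in vals) / len(vals)) ** 0.5
    return lm1, lm2
```

A linear ramp gives a zero response everywhere (LM2 = 0), whereas an isolated particle of the kernel's size produces a strong response, which is why LM2 reflects particle contrast at scale r.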

Flat texture image features
Image FT represents the difference between the original image and a two-dimensional median filtered one (see Appendix D.3). The median is a non-linear smoothing operation with reduced edge effects. This means that the shape of the transformation window generates only very few artifacts, in contrast to linear operations like the Laplace transformation or the well-known box-car smoothing.
The size r of the square median operator window can be chosen arbitrarily [59]. Depending on the window size, the median transformation will smooth away all particles of the object up to an area of half the window area (r²/2). The flat texture image can be considered as a peel or hull of the original object, containing just the interesting particles (and holes) on a flat background [93]. The transformation normalizes the global variation of intensities (large and slow changes of background intensity) (Fig. 12(4 2)). The median operator is a special case of the rank order operators [59]. Erosion and dilation in gray images with a square window as structuring element are other rank order operators, described in [111] and [93,94].
The measuring field is reduced by an erosion with an octagon of size r/2. The local reach of the median transformation is smaller than that of linear transformations like the Laplace. Deviations of FM1 from zero show a certain unbalanced ratio between dark and bright areas; positive values reflect a more frequent appearance of bright areas in the original image. FM2 quantifies particle contrast.
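Sketched in Python (square window, border pixels simply skipped; the paper instead erodes the measuring field with an octagon of r/2):

```python
import statistics

def flat_texture(image, r):
    """FT = E - median_r(E): difference between the image and its
    two-dimensional median filtering with a (2r+1) x (2r+1) window."""
    h, w = len(image), len(image[0])
    ft = {}
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [image[y + dy][x + dx]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            ft[(x, y)] = image[y][x] - statistics.median(window)
    return ft
```

A single bright particle on a constant background survives the subtraction unchanged, while the background itself, and any slow intensity drift larger than the window, maps to zero.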

Topological gradients and rice fields
The extension of the skeleton of binary images [111] leads, under an appropriate connectivity criterion in gray images, to watersheds [75] or, in the resulting gray image, to the so-called rice field [85,94]. The connectivity chosen is the monotonically decreasing or increasing path connectivity [95]. The watershed has in the last few years gained wide acceptance, since it is a partitioning procedure without a priori assumptions; additionally, an efficient algorithm [127] has made it usable for everybody. By the watershed, the whole image is partitioned into zones of influence of the local extreme values. The borders of these zones are the watersheds, which directly correspond to the topographical term if the gray scale values are considered as height values. Another watershed can be obtained if the gray scale values instead are seen as depth values.
Another way of expressing this is that there are watersheds for the local minima and watersheds for the local maxima. The regions surrounded by watersheds represent the set of pixels reachable by monotone decreasing or increasing paths from the local maximum or minimum, respectively. Rice field is the transformation where each pixel of these regions is assigned to the value of the corresponding local extreme value.
Features derived from the rice field difference (RG . . ., see Appendix D.4), which represents a topological gradient, reflect the variation of the gray values of particles in neighborhoods governed by this topology. The transformation shows the distance between the minimum and maximum hull (Fig. 13(3 1)) and can be interpreted as the mean peak distance between neighboring bright and dark particles.
The so-called 2nd difference D₂ between the original image and the mean of the upper and lower rice fields (RU and RL) is a transformation with very interesting and helpful properties (Fig. 13(4 1)). The D₂ image can be considered as another flat texture image (Fig. 12(4 1)). However, the pixels with D₂(v) > 0 deliver a region containing and surrounding all eu-chromatin particles; the complement is the surrounding of all hetero-chromatin particles (Fig. 13(1 2) and (2 2)). It is called the half-height partition since the border pixels of this partition are located, on any monotone path from a dark particle to a bright one, at half-height between the related local minimum and maximum. Intensity features of these zones of influence are called HU. . . and HL. . . (see Section 3.4.2).
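The one-dimensional illustration of Fig. 11 can be sketched as follows (Python; a naive fixed-point propagation for clarity, not the efficient algorithm of [127]):

```python
def rice_fields_1d(profile):
    """Upper/lower rice field of a 1-D grey-value profile: each position gets
    the value of the largest local maximum (RU) / smallest local minimum (RL)
    reachable by a monotone path; D2 = profile - (RU + RL)/2."""
    n = len(profile)
    ru, rl = list(profile), list(profile)
    changed = True
    while changed:      # propagate extreme values until a fixed point is reached
        changed = False
        for i in range(n):
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    if profile[j] >= profile[i] and ru[j] > ru[i]:
                        ru[i] = ru[j]; changed = True
                    if profile[j] <= profile[i] and rl[j] < rl[i]:
                        rl[i] = rl[j]; changed = True
    d2 = [p - (u + l) / 2 for p, u, l in zip(profile, ru, rl)]
    return ru, rl, d2
```

Positions with D₂ > 0 surround one family of particles and the complement the other, mimicking the half-height partition described above.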

Mayall/Young chromatin features
A special transform taking the global shape of the object into account was proposed by Mayall and Young in [133], together with a set of feature measurements taken from this transform, i.e., HETEROgeneity, CLUMPness and MARGination (see Appendix D.5). In addition to these features we also extract and store the intermediate features NB, NG, NW, RM0, RM1, RM2. For the calculation of the radial moments, the distance transformation is applied to the object mask to obtain iso-distance lines (see Fig. 8). A by-product is the maximum of this transformation, delivering the radius RAD of the maximum inscribable circle already listed under the general shape features (Appendix B.1 and Section 3.1). Figure 12(3 4) shows the regions O_NB, O_NG and O_NW, called black, grey and white pixels, and outlines the radial mean extinction distribution from border to center. The latter is calculated for the radial extinction moments RMi used in the formula for MARGination.
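The distance transformation and RAD can be sketched with a breadth-first propagation from the border (Python; a chamfer or Euclidean transform would be used for the actual iso-distance lines):

```python
from collections import deque

def distance_transform(mask):
    """4-connected step distance from each object pixel to the border;
    its maximum is RAD, the radius of the largest inscribable circle."""
    dist = {}
    queue = deque()
    for (x, y) in mask:
        # border pixels: at least one 4-neighbor lies outside the mask
        if any((x + dx, y + dy) not in mask
               for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))):
            dist[(x, y)] = 1
            queue.append((x, y))
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            p = (x + dx, y + dy)
            if p in mask and p not in dist:
                dist[p] = dist[(x, y)] + 1
                queue.append(p)
    return dist, max(dist.values())
```

Averaging the extinction values over each iso-distance ring then yields the radial distribution from which the RMi moments are computed.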

Run-length and co-occurrence features
The most popular of all feature extraction methods in the literature are the Haralick [58] co-occurrence and Galloway [49] run-length features. A recent example can be found in [132]. They are described and evaluated in detail in [86]. These features can be calculated for any of the previously described transformed images as well as for the original extinction image. We have mainly used the extinction image (CO1–CO15, RL1–RL5) and the flat texture image (NC1–NC15, NR1–NR5). The derived feature set is the original one proposed by Haralick. In Fig. 12(1 2) and (1 4), and Fig. 12(1 3) and (2 4), the normalized images and the intermediate matrices (co-occurrence and run-length) are displayed.
The parameters controlling the extraction of the co-occurrence features are the displacement vector d (e.g., d = (1.5, 1.5) µm, corresponding to (6, 6) pixels), the size of the co-occurrence matrix N_G (the number of gray values after normalization) and the normalization method (histogram equalization or linear spread).
Different displacement vectors will generate co-occurrence matrices for different sampling orientations of the texture. If the texture is directionally homogeneous (as is usually the case with chromatin distributions), the sampling statistics can be improved, and any irrelevant directional dependencies reduced, if the displacement vector is also applied with a rotation of 90° and accumulated into the co-occurrence matrix. However, if the texture can be expected to be directionally dependent, displacement vectors generating co-occurrence matrices for a number of different orientations should be used. The maximum, minimum and range of the feature values over the different directions can then be used to express the directional inhomogeneity of the texture. For a discussion of the choice of co-occurrence matrix parameters see Rodenacker [86].
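The accumulation with the 90°-rotated displacement can be sketched as follows (Python; the grey levels are assumed to be already normalized to N_G values):

```python
def cooccurrence(image, d, n_g):
    """Grey-level co-occurrence matrix for displacement d = (dx, dy),
    accumulated also with d rotated by 90 degrees, normalized to
    probabilities."""
    h, w = len(image), len(image[0])
    mat = [[0] * n_g for _ in range(n_g)]
    for (dx, dy) in (d, (-d[1], d[0])):     # d and its 90-degree rotation
        for y in range(h):
            for x in range(w):
                x2, y2 = x + dx, y + dy
                if 0 <= x2 < w and 0 <= y2 < h:
                    mat[image[y][x]][image[y2][x2]] += 1
    total = sum(map(sum, mat))
    return [[c / total for c in row] for row in mat]

def haralick_energy(p):
    """Angular second moment (energy), one of Haralick's original features."""
    return sum(v * v for row in p for v in row)
```

For a checkerboard at displacement (1, 0), all mass falls on the two off-diagonal cells, a typical high-contrast signature.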
Parameters for the run-length features are the direction of the run-length estimation (e.g., 45°), the number of gray values after normalization N_G, the maximum considered run length N_R and the normalization method, similar to the co-occurrence feature evaluation.
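A horizontal (0°) run-length matrix can be sketched as follows (Python; other directions follow by scanning along the corresponding lines):

```python
def run_length_matrix(image, n_g, n_r):
    """Horizontal grey-level run-length matrix: entry [g][l-1] counts the
    maximal runs of grey level g with length l (lengths capped at n_r)."""
    mat = [[0] * n_r for _ in range(n_g)]
    for row in image:
        run, val = 1, row[0]
        for v in row[1:]:
            if v == val:
                run += 1
            else:
                mat[val][min(run, n_r) - 1] += 1
                run, val = 1, v
        mat[val][min(run, n_r) - 1] += 1   # close the last run of the row
    return mat
```

Long runs of one grey level dominate for coarse textures, short runs for fine ones, which is exactly the size information the co-occurrence matrix misses.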
The co-occurrence matrices have the disadvantage of not directly expressing the size of the texture elements, only the grey level contrast, while the run-length features only express the size, not the contrast. Albregtsen and Nielsen [3] recently proposed a combination, the co-occurrence of grey level run lengths matrix. This is a 4D matrix expressing the probability of co-occurrence of two runs of a given grey level and given length. Instead of extracting predefined features from these matrices, they created adaptive features based on a statistical analysis of the object populations under study. They have also used this adaptive feature concept for the regular co-occurrence and run-length matrices. It seems to be a promising new approach [4,77].

Miscellaneous transformations
Even though the described combinations of zones, transformations and measurements can generate great numbers of useful texture features it is likely that new and better features will be discovered in the future. The chosen software structure allows an easy extension of the feature extraction procedure to other, newer or better methods.
As an example, the local fractal dimension [97] is displayed in Fig. 12(2 3) and the local multi-fractal dimension in Fig. 12(3 3). From the corresponding histograms of the whole object, FR. . . and MF. . . features are extracted (see Appendix C.1).

Structural or contextual features
If each chromatin particle is considered to be an object a number of features can be extracted that describe the relationships between those particles. This is the structural or contextual way of describing texture. It requires a way of defining particles followed by a way of generating useful measures of the relative locations and relations between the particles.
Such particles can automatically be segmented as dark and bright areas from the flat texture image, as well as upper and lower half-height areas from the rice field images. The thus defined particles are used to obtain statistical measurements from the extinction image E. Even though the measurements are the same as those described in the densitometry section (Section 3.2), we will here consider them as contextual features. With images from histological sections, the area measurements are useful for volume density estimation in nuclei [66,76,87,88]. These features deliver additional information about the dispersion of eu- and hetero-chromatin. The prefixes of the feature groups are D and H for dark and bright particle areas from the flat texture transformation, and HU and HL from the rice field.

Contextual features from the flat texture image
These features are derived (see Appendix D.3) via automatic application of thresholds T_D and T_H for dark and bright particles to the flat texture image FT (Fig. 12(4 2)). E.g., c₃ = 0.0 results in a partition of the nuclear area comparable with the half-height area (see Section 3.4.2 and Fig. 12(4 3)). For the resulting masks O_H and O_D, intensity features from the image E are calculated. As additional feature values, the numbers of connected particles are counted and stored in HNO and DNO.

The half height partitions from the rice field
The watershed or rice field transformation delivers not only the zones of influence (ZOI) around each particle but also, without any a priori assumptions, the number of dark and bright particles. To illustrate this, the transformation results are shown in one dimension in Fig. 11. For cell chromatin these properties are very valuable features. In Fig. 13 the images from the rice field calculation are outlined: especially Fig. 13(3 2) and (4 2) show the watersheds, and (1 1) and (2 1) the upper and lower rice fields (see also Fig. 13(2 4)).

Comparison of the different particle segmentations
The different ways of defining chromatin particles will give significantly different results. For the Mayall/Young method (see Section 3.3.5) a threshold is derived from the object mean M1 of the extinction image. For dark and bright particles a threshold is derived from the object mean value FM1 of the flat texture image FT which is already normalized in terms of global intensity changes. For the half height partition (see Section 3.4.2) the result is derived from a topological transformation where the border of the partition is independent of a threshold and only dependent on the topological neighborhood of the particles.
Comparing visually the respective segmentations in Fig. 12(3 4) and (4 3) and Fig. 13(1 2) and (2 2), the differences are obvious. The dark and bright particle segmentation delivers small regions as centers of the locations of the local extremes, but by far not as complete as the rice-field-driven half-height method. The Mayall/Young method is highly dependent on large scale intensity changes, which occur relatively frequently in nuclei.

Contextual features from particle relationships based on graphs
The locations of the particles can be used to generate graphs for structural analysis [85,90]. The Delaunay triangulation graph, the nearest neighborhood graph, the minimum spanning tree and the convex hull of the dark and the bright particles are calculated. From these graphs the numbers of nodes and neighboring nodes can be extracted and used for further analysis of topography and deposition of stain. The number of neighboring nodes per node is automatically calculated and stored. In Fig. 13(3 3) and (4 3) the Delaunay triangulation and the nearest neighborhood graph are outlined. The variation in the number of neighbors in different parts of the tissue may be a strong indicator of topological heterogeneity. This type of feature is little used in cytometry but is much more common in (micro-)histometry [63,68,130]. We also did some work on applying texture measures to histometry and on comparing the two approaches [40,41].
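One simple graph-derived measure can be sketched as follows (Python; the Delaunay and minimum spanning tree constructions go beyond a short example):

```python
import math

def nn_graph_features(points):
    """Nearest-neighbor graph over particle centroids: the distance from each
    node to its nearest neighbor; returns the mean and the coefficient of
    variation of these distances as simple architecture descriptors."""
    dists = []
    for i, p in enumerate(points):
        dists.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    mean = sum(dists) / len(dists)
    sd = (sum((d - mean) ** 2 for d in dists) / len(dists)) ** 0.5
    return mean, sd / mean if mean else 0.0
```

A perfectly regular arrangement of particles yields a coefficient of variation of zero; clustered or heterogeneous arrangements increase it.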

Some feature transformations
As already mentioned in Section 2.8, base features have to be transformed to reduce, e.g., size dependencies and to increase comprehensibility. In Appendix F some transformations are outlined as examples. The possible transformations are strongly dependent on the goal and problem of the specific experiment.

Implementation
The presented feature set is the result of about 25 years of work on cyto- and histological specimens at the GSF and has been widely used in a multitude of different projects. A similar set of features has been developed and used at the Centre for Image Analysis in Uppsala. The actual implementation of the GSF extraction program is written in IDL 1 . It succeeds several precursors (ILIAD [48], DIBIVE, BIP; developments in our labs) which were programmed in interpreted languages and ran on several different processors, following the technical development of computers. The current IDL implementation is nearly independent of the underlying hardware. It was primarily developed on VAX 2 and now runs on PC and Unix machines alike, dependent only on the range of architectures supported by IDL. In Uppsala the Unix based IMP image measurement program platform, which also traces some of its roots back to the ILIAD system, is used.
A software package embedded in a graphical user interface (GUI) has been developed for easy access by non-computer-professionals (Fig. 14). The package is fully equipped for different image acquisition modes, image display (PRAEPARAT_ZEIGEN) (Fig. 3), segmentation (SCHWELLE_A, automatic, and SCHWELLE_K, interactively controlled) (Fig. 4), as well as for different feature extraction methods (PRAEPARAT_RECHNEN) and the choice of certain groups of features (PARAMETER, PARAMETER_SET) and their parameterization. The possibility to switch between different feature groups and parameter sets is used for production systems based on preliminary studies using a selected, possibly reduced set of features.
The input of the procedure PRAEPARAT_RECHNEN for feature extraction is a set of images of arbitrary format, combined with the zones of interest described by a mask or a threshold, and a control parameter file. The set of images normally represents one specimen. Images can contain either one object (which might consist of more than one connected component) or scenes with multiple objects. The output of the procedure is a feature file containing one record of all computed features per object. It is also possible to display intermediate results on screen. This facility is used to illustrate the described features and to visualize certain specific feature values or combinations as well as segmentation results. Typically, transformed images are shown with the corresponding frequency distribution overlaid. The abscissa has [0 · · · 255] or [−128 · · · 127] values and the ordinate is scaled to twice the maximum frequency of the histogram. The bin size used for histograms is always 1. The feature set from specimens can be displayed directly as a graph in combination with the corresponding images.
The image data are stored in a directory tree with the project directory at the top and the specimen directories underneath. Image data can be in various formats. They can contain one object (single cell evaluation) or scenes with several objects. The extraction method is considered off-line: image data are gathered, stored and more or less automatically segmented, and finally the feature values are calculated. Currently, features and images are stored on hard disc using a database system [44] and are fed into the SAS 3 or BMDP 4 statistics packages.
Each extracted set of features per object represents a real valued vector. These vectors are stored in a feature file containing all feature vectors of the objects of one specimen. In addition to the computed features, some organizational data are stored in the feature file, such as:

NUM Running number of (cell) image in the set
TC Automatically computed threshold
TI Interactively controlled threshold (if any)
BGM1E Estimated background mean
BGM2E Estimated background SD

These values are obtained by the automatic threshold estimation routine. Finally, some stored data come from the gathering device, describing a preliminary object classification and the coordinates of the image field on the specimen:

DIA12 Two-character diagnosis (if any is given)
CX, CY Frame coordinates in the specimen in µm

For each project, consisting of an arbitrary number of specimens, a control file is created containing all appropriate parameters for the processing of the images. These data can be grouped into parameters for
• Automatic threshold estimation
• Object mask generation
• Intensity conversion
• Feature calculation control.
The latter consist of Boolean parameters for switching whole groups of features on and off, as well as the necessary numeric parameters such as the median window size or the radius of the outer border. Additionally, the representation used for display is controlled, for visual inspection of the feature extraction procedure. The parameters are described separately in the algorithmic appendices and on the web page of the authors.
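The per-object record described above, organizational fields plus the feature vector proper, might be sketched as a small dataclass. The field names follow the text; the example values are illustrative, and the actual feature file is a binary format of the IMP/IDL implementation, not this Python structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObjectRecord:
    """One record per object, as stored in the feature file.

    Organizational field names follow the text; the feature vector
    itself is stored as a list of real values.
    """
    NUM: int            # running number of (cell) image in the set
    TC: float           # automatically computed threshold
    TI: float           # interactively controlled threshold, if any
    BGM1E: float        # estimated background mean
    BGM2E: float        # estimated background SD
    DIA12: str          # two-character diagnosis, if any is given
    CX: float           # frame x-coordinate on the specimen, in micrometres
    CY: float           # frame y-coordinate on the specimen, in micrometres
    features: List[float] = field(default_factory=list)

# A feature file is then simply one such record per object of a specimen.
record = ObjectRecord(NUM=1, TC=112.0, TI=0.0, BGM1E=241.3, BGM2E=3.1,
                      DIA12="NO", CX=1520.0, CY=880.0,
                      features=[15.2, 0.83])
```

The real-valued feature vector can thus be fed to a statistics package while the organizational fields remain available for quality control and backtracking.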

Applications overview
The presented feature set has been used in a multitude of different cyto- and histometric projects; a list of typical projects, sorted by application field, is given in this section. Depending on the problem, the most important features vary. Typically, in classification problems involving different cell types, shape and simple intensity features are used. In all other cases, such as functional diagnosis or prognostic questions, mostly texture features were selected to relate the specimen to some relevant external criterion. This also holds for (micro-)histometry approaches. Typical projects are:
• Cytology
  - Cervix:
    * Early cancer recognition [25]
    * Malignancy associated changes [26,30,
  - Esophagus:
    * Discrimination of dysplasias by chromatin features [50,134]

Summary and recommendations
In this paper we have outlined in some detail the feature extraction strategies and the resulting feature sets used at the GSF image analysis laboratory in Neuherberg, Germany, and at the Centre for Image Analysis in Uppsala, Sweden. A list of studies in which the described feature extraction methodology has been applied has been provided to illustrate how the features are used in practice. In this final section we summarize the main recommendations we would like to make, based on these experiences, in a set of rules of thumb.
(1) It is important to carefully analyze the problem at hand. The initial definition of the problem is rarely the problem that has to be solved. It is particularly important to take a critical look at the available image quality. Is the image magnification/resolution optimal for the problem? Are the preparation and image acquisition procedures sufficiently stable? The chances of success are significantly greater if image analysis considerations are brought in before the specimens are prepared and the images collected. Too often the image analysts are contacted too late in the process, when significant parameters can no longer be changed. Many image analysis problems can be made much simpler with small changes in imaging parameters.

(2) The experimental design phase, including the first tests, should be ruled by a sincere sense of simplicity and modesty. The experimental design determines the final trade-off of the whole experiment. Never try to solve simple problems with complicated tools. Make the experiment as simple as possible, but not simpler. Try to determine visually which classes of features are likely to be able to discriminate between the cell categories of interest. The images (cells) should be observed carefully, and knowledge concerning the cell type, the preparation procedure and the image acquisition should be applied. Every recognized or estimated regularity or irregularity helps to select or reject an appropriate feature, either for cleaning or for refining and structuring the data set.

(3) The natural hierarchy of features as described in Section 2.3 (morphometrical, photometrical, texture and structure features) can serve as a guideline in this analysis. Considering features as statistical estimators, this hierarchy reflects the increasing variability of the features. Intensity features make it possible to discriminate morphologically similar groups, texture features make it possible to discriminate photometrically similar groups, and so on.
Hence, for the experimental design it is recommended to start with the most appropriate feature group, e.g., in cytometry with morphometrical and in histometry with photometrical features, and to continue on reduced strata (formed by a certain similarity criterion) with features from the next hierarchy level. It is usually not advisable to apply textural or structural features to data varying strongly in size and intensity.

(4) Within each relevant feature group in the hierarchy, a few features that express the aspects of the cells that seem most promising should be selected. Among the morphometric features, area and perimeter are commonly useful, while the Fourier shape features form a powerful but somewhat complicated way of extracting more detailed shape information. The total integrated optical density is the most commonly used photometric feature. In the texture group, the co-occurrence features are the most popular in the literature; unfortunately, there are very many degrees of freedom in the parameters and in the selection of the specific subset of features, which makes optimal feature selection difficult. The more recently developed texture and structure features based on watersheds and rice-field images form a most interesting alternative, with fewer degrees of freedom and less dependence on parameters, although with far fewer results in the literature on which to base an opinion about their usefulness.

(5) In addition to the features actually intended to be used for cell recognition or classification, it is often worthwhile to record a number of features for quality control purposes. An example is to calculate the background intensity near the cell. If that varies significantly for different cells, there may be problems with the illumination homogeneity that need to be taken care of. Even the time of recording the cell may be of interest, since it can reveal drifts in the instrumentation.
We once found a strong correlation between time of day and malignancy index in an experiment, which turned out to be caused by the higher temperature in the afternoon affecting the properties of the camera used.

(6) Be aware of the exact mathematical definition of the different features used, and document it in the experimental reports. Do not accept black-box features with unknown definitions, even if they have names that sound plausible. This is absolutely necessary to make the experiments reproducible for others.
(7) Only when the best effort in applying knowledge about cell differences and visually feasible features has been exhausted should direct statistical methods for feature selection be applied. Modern multivariate statistics, e.g., various types of stepwise discriminant analysis, do provide powerful tools for identifying which features are discriminatory, but the statistical outcome is strongly influenced by the number of features tested in the statistical analysis. It is far too easy to come up with overly optimistic results if hundreds of features are generated and tested on the few dozen cases typically available in a biomedical study.

(8) It is usually a good idea to divide the resulting feature set according to the feature hierarchy, to try to find the decisive cells. This is especially true if cells are not diagnosed individually but via a case diagnosis. Usually only a subset of cells carries the important properties pointing to the overall diagnosis. Backtracking of decisions is not only necessary in cluster analysis; it helps to improve insight into, and comprehension of, the variability of the material.

(9) Any classification result needs to be verified on independent material. Preferably the available material should be divided from the very start into a design set and a test set. If that is not possible, the leave-one-out, or jack-knife, classification method can be applied. But here one should be careful with the overly optimistic results that will follow if the entire material is used in a statistical feature selection process: the test case left out in each classification run is then not entirely independent, since it has influenced the feature set used for the classifier. Still more overoptimistic results will follow if only the results on the design set are quoted. Such results should not be trusted at all.
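The pitfall addressed in rule (9) can be made concrete with a minimal pure-Python sketch: leave-one-out evaluation in which the feature selection is repeated inside every fold, so the left-out case never influences the selected feature. The toy data, the nearest-class-mean classifier and the separation criterion are all constructed for illustration; they are not the statistical tools used in the studies cited above.

```python
import random

random.seed(0)

# Toy data: 30 cases, 2 classes, 5 features; only feature 0 is informative.
cases = []
for i in range(30):
    label = i % 2
    x = [random.gauss(label * 1.5, 1.0)] + [random.gauss(0, 1) for _ in range(4)]
    cases.append((x, label))

def select_best_feature(train):
    """Pick the single feature with the largest class-mean separation."""
    best, best_sep = 0, -1.0
    for f in range(len(train[0][0])):
        m0 = [x[f] for x, y in train if y == 0]
        m1 = [x[f] for x, y in train if y == 1]
        sep = abs(sum(m1) / len(m1) - sum(m0) / len(m0))
        if sep > best_sep:
            best, best_sep = f, sep
    return best

def classify(train, x, f):
    """Nearest class mean on the selected feature."""
    m0 = [xi[f] for xi, y in train if y == 0]
    m1 = [xi[f] for xi, y in train if y == 1]
    d0 = abs(x[f] - sum(m0) / len(m0))
    d1 = abs(x[f] - sum(m1) / len(m1))
    return 0 if d0 < d1 else 1

# Correct leave-one-out: selection sees only the training fold each time.
correct = 0
for i, (x, y) in enumerate(cases):
    train = cases[:i] + cases[i + 1:]
    f = select_best_feature(train)
    correct += classify(train, x, f) == y
accuracy = correct / len(cases)
```

Selecting the feature once on all 30 cases and only then running leave-one-out would give the overly optimistic estimate the rule warns against.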
Cytometry is a diverse field with many kinds of images and many scientific questions that need to be treated in different ways depending on the specific application. Straightforward application of a recipe such as the one above will therefore seldom work; researchers need to apply their own scientific judgement when applying these rules. Still, they may hopefully be of some use in avoiding some of the pitfalls we have encountered.
Our ambition with this paper has been to point out the need for clearer documentation of which features are used in the various studies of medical microscopic image analysis. This is necessary for the research results to become more transparent, comprehensible and interchangeable. We hope to see future papers by other authors presenting alternative approaches. The developed programs can be given to others for non-commercial purposes. However, support and responsibility for errors and problems cannot be taken over. For direct use of the programs, a licensed version of IDL or IMP, respectively, is a prerequisite.

Acknowledgement
This development work would not have been possible without the work and help of our colleagues: at the GSF especially Uta Jütting, as well as Michaela Aubele and Peter Gais, and the continuous and encouraging discussions with the head of our group, Dr. Georg Burger, who sadly passed away much too early in 1992. At CBA the implementation work by Dr. Bo Nordin is gratefully acknowledged.

Appendix A. Notations
I ⊂ Z² × R = image of given size (columns, rows)
f : I → R = image considered as a function
E ⊂ Z² × R = extinction image after intensity conversion
X ⊂ Z² × R = any other image after transformation
p = ((x, y), v) ∈ X = a pixel of an image X at position (x, y) with value v
O ⊂ I = {p ∈ I | p pixel of the object} = mask of the object
d(p₁, p₂) = distance between p₁ and p₂
distance(O) :
(1) Estimation of the border points

B.3.2. Convex hull, deficiency and elliptic deviation
The convex hull is derived from the border C by applying a Delaunay triangulation to the points of C; the returned border points of the triangulated region span the convex hull.

CH = convex hull of O (see Fig. 10, left, dark gray)

From the contour, the best-fit ellipse delivers further helpful and easily comprehensible shape features (see Appendix B.3.3, Pt. 4, and Fig. 10):

ELL_MN = minor axis of best-fit ellipse
ELL_MX = major axis of best-fit ellipse
ELL_DA = elliptic deviation
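The convex hull and the convex deficiency (hull area minus object area) can be sketched in pure Python on a small set of border points. For brevity this sketch uses Andrew's monotone-chain algorithm instead of the Delaunay-based construction described above; the L-shaped test polygon is purely illustrative.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    a = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        a += x1 * y2 - x2 * y1
    return abs(a) / 2.0

# Border points of a concave (L-shaped) object:
border = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
hull = convex_hull(border)
ch_area = polygon_area(hull)      # area of the convex hull CH
obj_area = polygon_area(border)   # area of the object polygon itself
deficiency = ch_area - obj_area   # convex deficiency
```

For this L-shape the hull fills in the concave corner, so the deficiency is exactly the area of the missing triangle.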
(1) Generation of the complex-valued contour function, where n_p new points are generated by linear interpolation at equidistant nodes. Equidistant sampling is a necessary prerequisite for the fast Fourier transform.
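The resampling step above can be sketched with NumPy: the contour is linearly interpolated to n_p points equidistant in arc length, the complex function z = x + iy is formed, and the FFT yields the Fourier coefficients. This is a minimal sketch of the generic technique, not the exact implementation of the described feature set.

```python
import numpy as np

def resample_contour(contour, n_p):
    """Linearly interpolate a closed contour to n_p points equidistant
    along the contour (arc length), returning the complex contour
    function z = x + i*y -- the prerequisite for the FFT."""
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])                # close the contour
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative arc length
    t = np.linspace(0.0, s[-1], n_p, endpoint=False)  # equidistant nodes
    x = np.interp(t, s, closed[:, 0])
    y = np.interp(t, s, closed[:, 1])
    return x + 1j * y

# Unit square contour, resampled to 64 equidistant points:
z = resample_contour([(0, 0), (1, 0), (1, 1), (0, 1)], 64)
coeffs = np.fft.fft(z) / len(z)   # Fourier coefficients of the contour
# coeffs[0] is the centroid; higher coefficients describe shape detail.
```

For the symmetric square, the zeroth coefficient comes out as the centroid (0.5, 0.5), which is a quick sanity check on the resampling.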

B.4.2. Invariant moment features
After Hu [62] and Reiss [82], with:

The affine invariants [82, p. 59] from the mask image:

The affine invariants from the extinction image, also invariant under contrast changes [82, p. 60]:

The affine invariants from the flat texture image, also invariant under contrast changes [82, p. 60]:

Thresholds T_H and T_D in F_T for bright (H) and dark (D) particles with parameter c_3, intermediate particle masks O¹_H and O¹_D, and resulting particle masks O_H and O_D:
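As an illustration of the moment-based invariants referenced above, the following NumPy sketch computes the first two Hu invariants from normalized central moments of a grey-value image (mask or extinction image). It covers only Hu's classical invariants, not Reiss's affine invariants, and the test images are constructed examples.

```python
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a grey-value image."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return ((x - xc) ** p * (y - yc) ** q * img).sum()

def hu_first_two(img):
    """First two Hu moment invariants from normalized central moments:
    phi1 = eta20 + eta02,  phi2 = (eta20 - eta02)^2 + 4*eta11^2."""
    m00 = img.sum()
    def eta(p, q):
        return central_moment(img, p, q) / m00 ** (1 + (p + q) / 2.0)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4.0 * eta(1, 1) ** 2
    return phi1, phi2

# Translation invariance: the same blob at two positions gives the
# same invariants (up to numerical precision).
a = np.zeros((32, 32)); a[4:10, 5:12] = 1.0
b = np.zeros((32, 32)); b[15:21, 18:25] = 1.0
phi_a = hu_first_two(a)
phi_b = hu_first_two(b)
```

Because the moments are central and normalized, the invariants are unchanged under translation and scaling, which is exactly the property that makes them useful as shape features.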