Automated Detection of Working Area of Peripheral Blood Smears Using Mathematical Morphology

The paper presents a technique to automatically detect the working area of peripheral blood smears stained with May‐Grünwuald Giemsa. The optimal area is defined as the well spread part of the smear. This zone starts when the erythrocytes stop overlapping (on the body film side) and finishes when the erythrocytes start losing their clear central zone (on the feather edge side). The approach yields a quick detection of this area in images scanned under low magnifying power (immersion objective ×25 or ×16). The algorithm consists of two stages. First, an image analysis procedure using mathematical morphology is applied for extracting the erythrocytes, the centers of erythrocytes and the erythrocytes with center. Second, the number of connected components from the three kinds of particles is counted and the coefficient of spreading ρs and the coefficient of overlapping ρo are calculated. The data from fourteen smears illustrate how the technique is used and its performance. Colour figures can be viewed on http://www.esacp.org/acp/2003/25‐1/angulo.htm.


Introduction
Historically examination under microscope of good quality blood smears has been the best way to estimate the number of leukocytes or white blood cells; for leukocyte differentiation; to study the morphology of erythrocytes or red blood cells; to characterise the morphology of lymphocytes and; to calculate the number and morphology of platelets [9].
From the last 20 years automated systems for leukocyte recognition are currently available in the market and are used in clinical laboratory routines. These generally rely on flow cytometry techniques whereby a blood sample flows through a detector and is then discarded. In addition these devices reach the limited aims of identifying normally circulating leukocytes and of flagging abnormal circulating cells. Thus examination of stained peripheral blood smears remains necessary for detecting and classifying abnormal cells [4], particularly to study the morphology of lymphocytes which is regarded as the principle basis for the identification and discrimination among the chronic lymphoproliferative disorders [5,10].
From a methodological point of view there are two desirable qualities of a blood smear. These are the sufficient working area, defined as the well spread part of the smear, and an adequate quality and reproducibility of the staining procedure. Both have been previously studied, see Benattar and Flandrin [3,4].
Following some research works on haematological cytology image analysis for microscopic diagnosis [11,12], the aim of this paper is to present a powerful application of mathematical morphology in order to automatically detect the working area of a peripheral blood smear using the fields scanned under low magnifying power (immersion objective ×25 or ×16, see Fig. 1(a)). In a strategy of automation using a motorised microscope and within a telepathology context, once the optimal area is detected, the following step is to increase the magnifying power (immersion objective ×100, see Fig. 1(b)) in order to locate the interesting fields which contain some leukocyte and in this case to acquire the fields for storing, transmitting or processing the images.

State of the art
Classification of blood cells from peripheral blood smears is an established problem in image analysis and pattern recognition which started over four decades ago, Ingram et al. [17], Young [22]; and which continues still to arouse interest. See in Beksaç et al. [6], the use of a neural network for blood cell recognition, and in Theera-Umpom and Gader [21], the application of morphological granulometries for the same purpose. In the early 80 s, image analysis based systems for leukocyte differential counting were introduced and several of these systems were quite capable of differentiating between many types of immature and abnormal cells, see Haus et al. [16]. At present, several systems based on automated microscopy and image processing are under development for differential blood cell analysis, see Comaniciu et al. [8], and CellaVision Company [7]. However we did not find any reference relating to the automated detection of the working zone of peripheral blood smears using image analysis and the above referenced systems do not claim any characteristic about this issue.

Background
In order to define the working area of a peripheral blood smear we have to examine the smear under low magnifying power in such a way that we have in the field a sufficiently large sample of the smear zone, i.e., enough number of cells for an understanding overview. Our methodology is valid for different magnification powers whenever this essential requirement is observed. For instance, in this study we worked with images obtained from oil-immersion objective ×25 and oil-immersion objective ×16. We believe that these two values are respectively the upper and lower limit of validity of the approach.
In the stained images, the erythrocytes (ennucleated cells) have a reddish color from the haemoglobin and usually appear round or oval with a pale-staining central region, caused by their biconcavity [9].
The working area or optimal area is defined as the well spread part of the blood smear. This zone starts on the body film side, when erythrocytes stop overlapping (piled on top of one another) and finishes on the feather edge side, when erythrocytes start to lose their clear central zone (they were not well spread). See in Fig. 2 the series of 20 images which illustrate this definition. The rationale behind our algorithm for automatic detection of the optimal area is the quantification of these two erythrocyte phenomena, the spread and the overlap.

Peripheral blood smear images
In this paper, we used the images from 14 peripheral blood smears stained with May-Grünwuald Giemsa using an uniform and stable technique [3]. For each one, the acquisition of several fields yields a succession of images which covers the complete smear area. For this study, the manual technique for selecting the field images on the film is the following: the vertical position corresponds approximately to the middle of the film and on the horizontal dimension, a carefully sampling allows to cover the total smear. These microscopic images were acquired at controlled light intensities (the RGB camera is calibrated to ensure correct color registration) using a ICG TRIBVN workstation. shows by a statistical study the performance of the method.

Mathematical morphology
First introduced as a shape-based tool for binary images, mathematical morphology has become a very powerful nonlinear image analysis technique with operators capable of handling sophisticated image processing tasks in binary, grey-scale, color and multiresolution imaging modalities. A tutorial in the technique can be found in Serra [18,19]. In this section we briefly define the basic morphological operators used in this paper. The practical effects are shown in Fig. 3 by means of a grey-tone image series.
In the framework of digital grids, a grey-tone image can be represented by a function f : D f → T , where D f is a subset of Z 2 and T = {t min , . . . , t max } is an ordered set of grey-levels. Let B be a subset of Z 2 and s ∈ N a scaling factor. sB is called structuring element (shape probe) B of size s. The basic morphological operators are The two elementary operations of erosion and dilation can be composed together to yield a new set of operators having desirable feature extractor properties which are given by The morphological openings (closings) filter out light (dark) structures from the images according to the predefined size and shape criterion of the structuring element.
A morphological tool that complements the opening and closing operators for feature extraction (extract the marked particles) is the morphological reconstruction, implemented using the geodesic dilation operator based on restricting the iterative dilation of a function marker f by B to a function mask g, The reconstruction by dilation is defined by . There are two composed operators derived from the geodesic reconstruction that are of great importance for the application in this paper are the close-hole operator and the size-and-close-grain operator.
-Close-hole: this operator fills all holes in an image f that do not touch the image boundary f ∂ (used as a marker) and therefore provides a parameter free approach to detect holes in an image: -Size-and-close-grain: this operator removes the small light structures or grains (smaller than λB 1 ) using an opening by reconstruction and closes the packets of grains when they are sufficiently near (gap smaller than µB 2 where B 2 is a circle) using an isotropic closing

Cell segmentation by thresholding
In blood images scanned under low magnification power, two regions may be observed: the plasma (background) and the cells (erythrocytes, leukocytes and platelets). The histogram summarises the greylevel contents of an image. In this case, the histogram of the filtered green channel h G [n] is typically bimodal. The cell segmentation by thresholding at the optimal value, u T , between the two modal peaks, n 1 and n 2 , yields a binary image which matches the leukocytes, the platelets and the erythrocytes without their central zone. Different methods, see the survey on image segmentation by Fu and Mui [13] or the overview and comparison of eleven common histogram-based thresholding methods in Glasbey [14], can be employed to determine u T .
We propose a new approach which combines the classical selection of the thresholding value by minimising the sum of within class variances with a morphological technique for selecting the central mode values which speeds up the convergence to u T . The algorithm, divided into three steps, is summarised as follows,

Smooth histogram
In order to obtain the smooth histogramĥ G [n], we filter the histogram by averaging every bin with its neighbours (anti-causal, zero-phase filter of size N = 5). In Fig. 4(b) the interest of this step is shown: after smoothing, the number of maxima is reduced and the computation of the next step is faster. On the other hand, this preprocessing step stabilises and enhances the histogram modes in order to avoid that in an extreme situation a secondary maxima could have a significant contrast. Maxima of the histogram In order to find the two peak maxima, we use a morphological tool: the dynamics or contrast extinction value. Introduced by Grimaud [15], this measure of contrast maps each maximum with a value: the contrast of a maximum is the minimum descent necessary to move from the maximum to another higher maximum (the contrast of the highest maximum is defined as the difference between the maximum and minimum of the function). Figure 4(a) illustrates the contrast of histogram maxima. Then, we can select the two highest maxima, n 1 and n 2 , which square with the central value of each mode. This definition of histogram modes as the histogram peaks of largest dynamic (the most relevant peaks) has been previously used by Soille [20].
Variance criterion Using a statistical test, we obtain the best histogram partition u T with respect to a criterion of variance C,

Algorithm
Our algorithm can be divided into two parts. First, an image analysis procedure using mathematical morphology is applied for extracting the erythrocytes, the centers of erythrocytes and the erythrocytes with center. Second, the number of connected components from the three kinds of particles is counted and the coefficient of spreading and the coefficient of overlapping are calculated.

Image analysis
The aim of the image analysis step is to segment and classify the cells, mainly the erythrocytes. We are interested in the morphology of erythrocytes: presence or absence of clear center and overlapping of neighbouring cells. The distinction between erythrocytes with and without center and between separated and overlapped erythrocytes are required. The first task is easy, we extract on one hand the cells and on the other the centers (brighter than the body), afterwards by reconstruction, the cells with center are obtained. The second task is less direct, for identifying the overlapped cells we consider that after segmentation two overlapped cells are only a connected component nevertheless they have probably two centers. The presence of some leukocytes is not a serious error since their contribution from a statistical viewpoint is negligible. However, it is desirable to extract the platelets and other small artefacts which could be numerous.
Let f be the blood smear color image. After comparing several color spaces, we found that the contours of the erythrocytes appear most continuous and contrasted against the background as well as a better contrast between center and body in the green channel f G of the RGB color space. After pre-filtering and binarisation, it is enough to apply morphological filtering to binary images. There are two reasons behind the use of binary operators: the first one is for particle counting we have to use a binary image and on the other hand using binary operators the algorithm is signifi- cantly faster (to envisage the integration of this algorithm in a microscopic system). The algorithm can be summarised as follows (the intermediate images which allow to follow the different steps are shown in Fig. 6), Filtering In a first step we filter the image f G using a median filter M F n of size n = 3 in order to remove the noise and small mistakes of digitalisation. In fact, this pre-filtering which is the same for images under ×25 and ×16 allows that the possible Gaussian and flicker noises, introduced in the optical and image acquisition chain system, are controlled before the thresholding. Binarisation The histogram off G is typically bimodal, i.e., two object categories: the cells (erythrocyte bodies, leukocytes and platelets) and the background (plasma and erythrocyte center). A simple thresholding process at u, i.e., the optimal value between the two modes, T [t min ,u] , seg- ments the two categories obtaining a binary image The choice of u is crucial since it is important to differentiate between center and body on the erythrocytes. If u is chosen too low, cell bodies are broken, if it is set too high, cell and background are not separated. In Fig. 5 some examples of histograms are shown. The overlapping phenomenon determines the locus of the first mode and the variations in the staining procedure condition the relative position of histogram. In order to automatically control this lack of definition, the optimal u is chosen by means of the adaptive morphological thresholding, presented above.

Extraction of cells
In I, the cells without their centers are included. In order to fill all the holes in the cells, we apply a close-hole operator I cl = ψ clohole (I).

Extraction of platelets and artefacts
As we know approximately their size, in order to eliminate the platelets we can apply a reconstruction using an opening as the marker with s 1 such that s 1 B is larger than the size of the platelets; the structuring element B is a circle. The result is a binary mask of cells, knowing that a certain amount of them could be overlapped. By difference, we can obtain I pl = I cl − I cl2 , that contains the platelets and some small artefacts, typically pieces of cell.

Extraction of centers
The goal is to extract the centers of cells. To achieve this, we start by taking the difference between I and I cl Then, we apply a size-and-close-grain operator in order to merge the grains which are part of the same center where s 2 is the size of the opening by reconstruction which removes the small centers and s 3 the size of the closing which packs the grains sufficiently near. As result, we have the binary mask of centers.

Extraction of cells with center
The final image is obtained by applying a reconstruction of the mask of cells I cl2 using the mask of centers I ct2 as the marker that provides a binary mask with the cells which have at least a clear center.
This algorithm has three parameters: the size of the platelets s 1 , the size of the centers considered too small s 2 and the size of the gap between particles belonging to the same center s 3 . Since they are parameters of size of biological structures, the use of different magnification powers of cells involves different sizes. For instance, under ×25: s 1 = 9, s 2 = 3 and s 3 = 5 and under ×16: s 1 = 7, s 2 = 1 and s 3 = 3. The robustness of an algorithm can be defined in respect to changes in the parameters or to image quality. We have tested the algorithm and the influence of parameters on an image database with 180 field images, introducing several alterations of quality (noise, resolution, etc.). In view of the results we can state that the behaviour of the algorithm with respect to limited changes is quite robust. Figure 7 depicts the results of the algorithm for three fields belonging to a blood smear under ×25 and another under ×16, showing the differences of the possible three zones that we could find: too spread, acceptable and too thick.

Parameters for quantification
Taking these three binary images I cl2 , I ct2 and I clct as a starting point, we can count for each one the number of connected components (separated particles) in such way that we have the number of cells N cells , the number of centers N centers and the number of cells with clear center N cells_with_center . In order to calculate the connected components of a binary image, a fast algorithm based on the geodesic reconstruction is used. From these three measurements, the following parameters are defined.
The coefficient of spreading ρ s quantifies the proportion of cells which have a clear center; i.e., A low value of ρ s involves that the cells are too spread on the slide or that they are too overlapped on top of one another and the central zones are also piled. Since N centers can be greater than N cells , the range of ρ s > 1. This happens when there are some overlapped cells. However, in order to quantify more precisely this situation, we introduce a second parameter. The coefficient of overlapping ρ o assesses the proportion of cells with clear central zone which are overlapped; i.e., The synthetic diagram, in Fig. 8, contains the possible situations of two neighbouring erythrocytes and the respective values of ρ s and ρ o . The combined use of the two coefficients is very meaningful and yields a discrimination of the different zones of the smear.
In order to define numerically the working area, we present a control experiment on Smear 0 (see Fig. 2) Table 1 Quantitative results of the study and subjective remarks by the haematological expert on Smear 0 for testing the ability of coefficients ρ s and ρ o against a human expert. The images have been subjected to analysis by a haematological grader in order to remark the contents and to assign the corresponding zone of the smear. We consider three kinds of zone: too spread, acceptable (or working area) and too thick. Then the parameters have been computed for each image. The results have been compared with the subjective study. The values of parameters and human comments are summarised in Table 1.
From this table we were able to establish the values of the coefficients that allow the objective verification of the classification done by the human expert. For the coefficient of spreading, it is easy to observe that if we consider for instance ρ s > 0.50 as criterion of acceptation, the fields too spread are rejected. The interpretation of this round value is also very intuitive: half the detected cells (50%) have at least a clear center. The coefficient of overlapping is less simple since its interpretation is also less intuitive. We took the limit value ρ o > 0.73; i.e., at least 73% of cells with clear center are not overlapping. This choice implied that three images, doubtfully classified by human expert as belonging to the thick area, were acceptable for reading. These undecided cases for a human operator are resolved according to objective criteria. Figure 9 shows the scatter plot of parameters ρ s and ρ o . The established limits divide the space, demarcating the fields belonging to the working zone.
In summary, from this control case the working area is defined by the following boundaries We fixed these values as the classification stage of our approach.
Apart from the control example ×25, we tested initially the algorithm on 3 smears ×16, with 20 images from each one. By this scale changing we showed that the definition of working area by means of ρ work area s and ρ work area o is independent of magnification power and consequently independent of size of cell sample. Also independent of the field image size. This is natural because the definition of working area is given by two relative coefficients.
In order to compare the results obtained by the proposed algorithm with the performance of a human expertise, the image contents has been manually assessed into tree classes: too thick (TT), too spread (TS), acceptable (OK) and doubtful (DB). The results are shown in Table 2. We observe that the results are good enough. The manual results matched the automatic classification in the well-defined cases. A response was also obtained when the human has a doubt. The conflicts (image 16 of Smear B and image 16 of Smear C) happened just in the border between acceptable and spread area, where the decision to take is very difficult for the human grader.
Finally, in order to complete the results we carried out an additional analysis on other 10 smears ×16, with 10 images from each one. The methodology used to validate the performance is the same, comparing the result of the proposed algorithm with the human expert evaluation. We propose to use the well-known notions of true positive TP, true negative TN, false positive F P and false negative FN. Besides the absolute number of classified images, we used the classical relative ratios: sensitivity, Sens. = TP/(TP + FN) (measures the number of field images which truly belong to the working area and which correctly classified); specificity, Spec. = TN/(TN +FP) (measures the number of field images which do not belong to the working zone which test negative) and the positive predictive value PPV = TP/(TP + FP) (measures the proportion of fields which have been classified as working area are Table 2 Comparison of the proposed automatic method with manual expertise on Smear A, Smear B and Smear C. Code for the manually classified zones: too thick (TT), too spread (TS), acceptable (OK) and doubtful (DB). The automated working zone belongs with the upper-right quadrant Table 3 Comparison of the results obtained by the proposed algorithm with the manually classified by a human expert  really in the working area). In Table 3 are summarised the satisfactory values obtained for the 13 smears ×16.
In essence, the choice of decision boundaries for ρ work area s and ρ work area o is not only a question of automated classification. Despite that the fixed values of the coefficients are only based on a well-representative smear and a proved human experience these decision limits have yielded interesting results. We can not use the term "optimal" limits for the fixed values however we believe that for classical cell smear blood analysis the images have to be acquired from zones which coefficients of spreading and overlapping belong approximately to the presented values, see in Table 4 the statistical values of the coefficients ρ s and ρ o on the 14 smears for the different smear zones. Moreover, the physical sense of parameters allows to lift the boundaries depending on the requirements of the blood cell image analysis application.
Considering the success of the classification technique we could say that the information provided by these two coefficients and their computation speed can yield a simple tool to help an expert or a software system during the process of image acquisition.

Conclusions
Definition of working area of peripheral blood smear stained with May-Grünwuald Giemsa is a crucial step in order to automate the haematological cytology image analysis for microscopic diagnosis; detection of leukocytes using a motorised microscope; acquisition in digital images and quantitative analysis of morphological cell features based on mathematical models [1,2]. An efficient algorithm for the detection of the well spread part of the smear has been presented. Robustness and accuracy in comparison to human expertise have been evaluated on several smears under two magnification powers.
The results are encouraging and they open up new possibilities. Besides integrating the presented algorithm as a software module in a microscopic workstation, there are other indirect applications of the approach in practical haematology as a tool for quality control and standardisation of automatic devices and laboratory routines. Benattar and Flandrin [4] performed a comparison of an apparatus for automatically spreading peripheral blood films to the manual wedgepull technique, showing that the average size of optimal area for counting is twice as large as compared to manual ones. Using our approach, it is possible to envisage comparing objectively the performance of different slide maker apparatus.