Probabilistic Model-Based Cell Tracking

The study of cell behavior is of crucial importance in drug and disease research. The fields of bioinformatics and biotechnology rely on the collection, processing, and analysis of huge numbers of biocellular images, including cell features such as cell size, shape, and motility. However manual methods of inferring these values are so onerous that automated methods of cell tracking and segmentation are in high demand. In this paper, a novel model-based cell tracker is designed to locate and track individual cells. The proposed cell tracker has been successfully applied to track hematopoietic stem cells (HSCs) based on identified cell locations and probabilistic data association.


INTRODUCTION
Recent advances in cell culture and cell imaging have made possible the automated acquisition of millions of cell images. The corresponding automation of the analysis of such huge sets of images would allow fundamentally new questions to be addressed in proteomics, genomics, and stem-cell research [1][2][3][4][5][6][7][8]. This paper proposes a coupling of advanced methods in pattern recognition and image processing [9][10][11][12][13][14][15][16] to an existing cell-imaging platform [17] in which the analysis, currently being undertaken by hand, is impossibly slow and tedious for the volumes of data being generated.
The object of this particular project is the analysis of stem-cell behavior and differentiation, the process by which stem cells specialize to different cell types, a process which is crucial to understand if stem cells are to be used in cell and tissue regeneration. Specifically, given a culture of cells, observed over time, we need some way of determining whether a given cell is likely to die, to cause cancer, to specialize into an incorrect tissue type, or, desirably, to specialize into the correct cell type.
The first major step in this process, and the end goal of the research described in this paper, is the automated construction of cell lineage trees, essentially the descendent family tree of a single ancestor cell, as illustrated in Figure 1. Building such a tree for each of multiple cells in a culture requires maintaining cell identity over time, clearly requiring the tracking and associating of cells over a long sequence of images, typically 7000 images taken over a period of several days.

PROBLEM FORMULATION
Although cell tracking is among the most important and common tasks for biomedical researchers it continues to be undertaken manually. Researchers visually perform cell motion analyses and observe cell movement or changes in cell shape for hours to discover when, where, and how fast a given cell moves, divides, or dies. This task is tedious due to the often corrupted or blurred images, the presence of clutter, the fixing of eyes for long periods of time, and repeating the same task for different cell types. Furthermore, with imaging data ever more simply and rapidly acquired, manual tracking becomes progressively impractical. As a result, automated cell tracking systems are mandatory to further advance the study of biological cells [2,3,6,8,[18][19][20].
To produce the data for this study, HSC samples are first extracted from mouse bone marrow, then cultured in custom arrays having up to forty wells. A small fraction of a typical HSC microscopic image is depicted in Figure 2 with the superimposed dynamics of a mature blood stem cell before and after splitting. The cells were imaged using manual focusing through a 5X phase contrast objective using a digital camera (Sony XCD-900) and acquired by an IEEE 1394 standard (FireWire) connector. Images were sampled every three minutes over the course of several days.
To keep cells alive and healthy, light exposure must be controlled during their life cycle to minimize phototoxicity. Therefore it is desired to limit light exposure in each frame and to sample the frames as far apart as possible, leading to infrequent, poorly contrasted images, directly at odds with the data desired for easy tracking: frequent, high-contrast images. Cell staining techniques may be used to increase the contrast between cell and background, however different parts of tissue are undesirably stained unevenly, causing inhomogeneity. Fortunately the HSCs in our study have fairly regular shape and brightness patterns. Hence, a segmentation method which exploits these attributes should be able to perform better than simple thresholding. Suppose we have an image sequence I 1:K ={I 1 , I 2 , . . . , I K }. The two fundamental tasks needed to construct a lineage tree, such as in Figure 1, are the detection of cells in each image and the subsequent association of the detected cells over time. The cell detection problem is essentially one of anomaly detection: the localization of groups of pixels inconsistent with the random behavior of the image background. A wide variety of semi automatic or automatic methods have been proposed to segment cell boundaries [2,3] which can be divided into three major categories.
We can make the problem much more specific by seeking particular features consistent with HSCs. From Figures 2 and

3(a)
, HSCs can be characterized as an approximately circular object with a darker interior and a brighter boundary-an effect due to phase contrast imaging modality. So rather than a heuristic thresholding approach, these cell attributes allow a more specific model, essentially a matched filter, more robust to noise and to low contrast. The model depicted in Figure 3(b) considers the following criteria: (i) cell size: the radius r is known to lie in a limited range related to cell age; (ii) boundary brightness: brighter due to phase contrast imaging; (iii) interior brightness: tends to be dimmer than the boundary; (iv) boundary uniformity or symmetry: want to assert uniformity to avoid a strong response when straddling cells, as shown in Figure 4.
As depicted in Figure 3(b), to model a dark region surrounded by a bright boundary, the proposed cell model consists of two concentric circles, with the radius of the internal circle being half that of the external one. To facilitate the analysis of the image as a function of cell center location (x c , y c ) and radius r we construct the set of boundary pixels B x c , y c , r, Nezamoddin N. Kachouie et al.
3 Figure 4: A scenario in which a spuriously hypothesized (white) boundary may have a large associated average brightness B and a low cell interior brightness C. The uniformity constraint in the cell boundary is intended to address this case. and the set of interior cell pixels from which we extract sample means where B i or C i is the ith element of the respective set. The four cell criteria are then combined to formulate the following probabilistic cell model: where the cell boundary P cb , cell interior P ci , and boundary uniformity P bu terms are elaborated below. Based on a visual examination of the distribution of sample points of B derived from real imagery, the probability density of cell boundary P cb is modeled as Gaussian with mean μ cb and variance σ 2 where μ cb and σ 2 cb are estimated empirically. Similarly the probability density of dark region inside the cell P ci is also modeled as Gaussian with mean μ ci and variance σ 2 where μ ci and σ 2 ci are again estimated empirically. It should be mentioned that the parameters of model (4) are time invariant, consistent with most of our acquired data sets. Therefore in cases where the intensity or contrast of the image changes over time due to background noise or spatiotemporal illumination variations, the nonstationarity of the data might make (4) in error or inapplicable. In such cases, to improve the robustness of the proposed method, the time variations of the image sequence need to be removed by background estimation and subtraction, considered in future work.
As illustrated in Figure 4 we wish to penalize spurious cell detection. We propose to calculate an empirical cumulative density function (CDF) to discriminate background from cell boundary. The CDF on cell boundary pixel intensities is computed by , n ∈ 1 : |B|.
As a set of constant or uniform values in B corresponds to a straight line CDF, we use a Kolmogorov-Smirnov test on B to test its deviation from uniformity: An exponential function P bu (D) is used to penalize the nonuniformity as This completes the development of a simple model for the detection of cells in background noise. We will use the model to generate and test cell hypotheses in the following section.

CELL TRACKING
With a model in place describing the spatial pattern of pixels with the appearance of a cell, we move to the core of the problem: given a sequence of images I 1:K = {I 1 , I 2 , . . . , I K } and a definition of our "target" (the cell model (4)), we need to associate the cells over time. Denote by F 1:K a possible hypothesis of the K-frame association problem, where f k is a parametric representation of frame k. In the case of HSCs, f k is defined as where l k, j is the cell parent label, z k, j is the cell coordinate (x c , y c ), r k, j is the radius, s k, j is the cell age corresponding to cell j, and M k is the number of cells in frame k. The parent label i = l k, j associates cell j in frame k to parent cell i in the previous frame. The goal is to solve the spatiotemporal cell segmentation-association problem of Figure 5: we wish to estimate F 1:K given the image sequence I 1:K and given an initialization f 0 in frame zero.

MAP estimation
The proposed solution to the association problem is the maximum a posteriori estimation of F 1:K : From Bayes' rule, As P(I 1:K , f 0 ) is fixed, F 1:K does not depend on it, thus At the same time P( f 0 ) is fixed So we conclude that Since (16) is realized, in principle, by examining and evaluating all possible cell parameterizations and associations. In virtually all tracking problems of this kind the problem is made tractable by searching over a limited number of hypotheses such that we find the best member of this set The original, optimal solution is found if it is included among the hypotheses, that is, if arg max The key, here, to efficiency is to minimize the number of hypothesis; the key to quality of estimation is finding the most likely hypothesis. As these goals are in opposition, we are left with a complexity/quality tradeoff.

Evaluation of P(I
The proposed cell model P(x c , y c , r | I k ) = P(z c , r | I k ) from (4) evaluates the likelihood of a single cell, given an image. To solve the MAP problem we need to compute P(I 1:K | F 1:K ), the likelihood of a given image sequence as a function of a specified cell parametrization and association. Since f k provides a complete paramerized description of I k , conditioned on F 1:k , I 1:k is Markov: The cell model (4) describes only the likelihood of a single cell; it says nothing about groups of cells, nor does it provide any kind of prior on z c or r. Fortunately these latter aspects are straightforward.
(1) As the cells may be located anywhere with no prior bias, z c is uniformly distributed over the image. (2) We empirically define a size range r ∈ [2,4] pixels, within which the cell radius is uniformly distributed. (3) Any hypothesis which has cells violating the minimum required separation between cells is assigned a probability of zero.
It follows, then, that as long as zero-probability hypotheses are not created, then all remaining hypotheses { f h k } are equally likely a priori. Because P(I k ) is fixed, and moreover because all valid hypotheses are equally likely, such that P(z c , r) is constant, we can conclude implying that the evaluation of P(I k | z c , r) can follow from evaluating P(z c , r | I k ). The proposed parametric cell model can be applied to each image frame I; a two-dimensional probability map is generated, and hypothesised cells are located at local maxima of this map. Although our cell model allows this probability map to be computed as a function of r for each r ∈ [r a , r b ], we have found that a value of r • = 2 functions robustly. The locations of local maxima in P(z | I, r • ) may either be used in the generation of hypotheses from I, or in computing P(I | f ), to assess an asserted cell arrangement.
First, to generate possible measurement hypotheses from I, find the spatial local maxima of P(z) and keep only those maxima such that the likelihood of the ith maximum P(z k,i | I k , r • ) > τ. Choosing T values of τ thus generates T sets of maxima, each a hypothesised measurement set for frame k, Nezamoddin N. Kachouie et al. We have found that a fixed threshold τ = 0.65 works effectively for the HSC data set being considered here however, in general multiple τ would be used, leading to multiple measurement hypotheses. Second, to compute P(I | f ) as depicted in Figure 7 the cells in f are divided into two sets: (1) f M : those cells in f which are located within δD of a maximum; (2) f M : those cells in f which are not within δD of a maximum.
A third set contains the unmatched maxima: ( where P(z k, j | I k , r • ) is the probability of the location of the jth cell in the state f h k for frame k, and P(z τ k,i | I k , r • ) is the probability of the ith maximum in frame k. As depicted in Figure 7, P(I k | f h k ) is evaluated by applying (23).

Evaluation of P(F h
The second part of (18) is the evaluation of association hypotheses {F h 1:K }. To track the HSCs over time, detected cells in the measurement hypothesis of the current frame z τ k must be associated to the most probable element in the previous frame.
Considering that for each image frame k, we associate cell features from (k − 1) only, Markovianity can be asserted on where we recall that f k is the set of cell properties in frame k.
The cell age s k, j is updated as i f ∃m such that, l k,m = l k, j (i.e., cell split), s (k−1),lk,j + 1 otherwise. (25) Each cell in f k must belong to one of the following sets.
Unassociated: N = { j | l k, j = 0}. Split: In contrast with joint probabilistic data association (JPDA) [26][27][28] in which new tracks can not be initiated, our proposed method initiates new tracks for divided cells, therefore the following constraints are considered: (i) each measurement must originate from cell or clutter; (ii) each measurement can be associated to one cell; (iii) up to two measurements in frame k can be associated to the same cell in frame k − 1.
Asserting Markovianity we evaluate P( f h k | f h k−1 ) in thef rest of this section. The association problem is resolved frame by frame by selecting the hypothesis with the maximum joint association probability. In this way the measurement hypothesis z τ k for frame k and association hypothesis f h k−1 from the previous frame are used to generate hypotheses f h k . Therefore we have The filter step is 6 International Journal of Biomedical Imaging P(z τ k | z τ 1:k−1 ) is fixed and we have where λ k is a normalization constant. The first term of (28), P( f h k | z τ 1:k−1 ), is a prediction step which is illustrated as follows. Because of the nonlinear and non-Gaussian nature of both measurements and dynamics, in contrast with JPDA, the Kalman filter is not considered for the prediction step. The prediction step in the proposed method is The former term P vel is a nonlinear term to predict the location of the hypothetical cell j in frame k based on its dynamics and its location in frame k − 1. The motion will be celltype specific, and may further be influenced by environmental factors, chemical gradients, and so forth. In our context there are no deliberate experimental biases, and a Gaussian random walk was found to well-approximate hand-tracked cell motion. The latter term P state predicts the likelihood of cell division in frame k. An age penalty such that cell division cannot happen below some age; this minimum age is cell-type specific and is asserted from biological experience.
The second term of (28), P(z τ k | f h k ), is the likelihood of measurement z τ k given hypothesis f h k and is given by where v k, j = z τ k,i − z k, j is an innovation term so that the ith measurement is within δD of the jth hypothesised cell location in frame k. P una is a penalty on the association of unassociated cells, and P sep is the probability of separation distance of a measurement pair. As we can see in the proposed method, the likelihood of measurement z k, j is penalized by the unlikely events such as minimum separation distance and unassociated cells.

EXPERIMENTAL RESULTS
We begin by evaluating the proposed cell model. The model generates cell hypotheses, as illustrated in Figure 6, where candidate cells are found as local maxima in P(x c , y c , r) = P(z c , r). Because of the availability of hand-labelled groundtruth data, a good assessment of cell detection is possible. The only unknown parameter in cell detection is τ, the probability threshold in declaring a cell present. Figure 8 shows the probability of false alarm and missed detection as a function on the chosen threshold τ h over a sequence of HSC phase contrast microscope images. It is clear that a threshold yielding acceptably low failures of both types is τ = 0.65.
Next, Figures 9 and 10 test the number of detected cells (using τ = 0.65) and the detected spatial locations with manual ground truth. The greatest probability of misdetection Nezamoddin N. Kachouie et al. occurs during division where a mature cell gives rise to two new cells. During division, notable between frames 9 to 29, the cells are inconsistent with a circular shape with a fixed radius. However, the divided cells are recognized as soon as the division process is completed, which takes at most five frames. As it can be seen in Figure 10, the maximum cell center spatial error is 1.8 pixels per cell. Considering the fact that manual ground truth is prone to error, the results obtained by the proposed method are very promising. Results obtained by applying the proposed probabilistic cell tracking method are depicted in Figure 11. Cell centers are detected by applying the cell model (4), locating the local maxima in the probability map, and thresholding the local maxima map. Finally cell centers are associated using the proposed tracking method. Color coding is used to highlight associated cell centers such that different colors show the association of cell centers over time. It takes about 0.5 seconds for a well to be segmented and associated in the current stage using a Pentium 4 running at 1.6 Ghz.
As can be observed from Figure 11, by applying our probabilistic model-based tracking to HSC image sequence, it is able to identify and associate both nondividing and dividing cell centers correctly. However there are some cases, such as having a large number of proximate young divided cells or a few number of nearby dividing mature cells, in which some of the hypotheses have very similar probability, therefore deriving the best hypothesis for such a frame is very difficult and prone to error. Employing a more robust Bayesian approach will resolve those ambiguous situations over time by the further integration of information of neighboring frames, maintaining several hypotheses, and selecting the most likely one over the subsequent images.

CONCLUSIONS
Image cytometry is a practical approach to measure and extract cell properties from large volumes of microscopic cell images. As an important application of image cytometry, this paper presents a probabilistic model-based cell tracking method to locate and associate HSCs in phase contrast microscopic images.
Our statistical cell model, which is constructed after carefully observing HSCs in typical image sequences, captures the key properties of these cells. The close match between the model and imaged HSCs allowed for threshold selection yielding very low false alarm or missed detection. Cells in isolation are detected well; recently split cells provide a proper fit to the model and rely on association to resolve ambiguities.
Cell association is accomplished based on the proposed joint association method. As it can be observed in Figure 2 the cell dynamics can be well approximated by a random   walk as it has been considered in the proposed method to model the cell motion. It can be seen from the previous section that such a probabilistic model-based cell tracking method produces promising results and is able to identify and associate both dividing and nondividing cell centers correctly. However, there are some cases, such as a few proximate dividing cells or large number of nearby cells, in which the proposed method may be inaccurate. To resolve association ambiguities in such cases and to make the method more ro-bust to noise and clutter, future work will be conducted to extend the proposed method by integrating the information over multiple neighboring frames. Future work will also include improving the cell model to more accurately reflect unique properties of the cells under different conditions. Moreover further work is required to better preprocess the images with background subtraction to improve homogeneity and eliminate camera artifacts. There is also considerable interest in designing a parametric cell Nezamoddin N. Kachouie et al. 9 model with additional degrees of freedom to generate lineage trees in which cells can be characterized by richer features so that cell properties can be more reliably extracted.