Nonnegative Mixed-Norm Convex Optimization for Mitotic Cell Detection in Phase Contrast Microscopy

This paper proposes a nonnegative mix-norm convex optimization method for mitotic cell detection. First, we apply an imaging model-based microscopy image segmentation method that exploits phase contrast optics to extract mitotic candidates in the input images. Then, a convex objective function regularized by mix-norm with nonnegative constraint is proposed to induce sparsity and consistence for discriminative representation of deformable objects in a sparse representation scheme. Finally, a Support Vector Machine classifier is utilized for mitotic cell modeling and detection. This method overcomes the difficulty in feature formulation for deformable objects and is independent of tracking or temporal inference models. The comparison experiments demonstrate that the proposed method produces results competitive with the state-of-the-art methods.


Introduction
Measurement of the proliferative behaviors of cells in vitro is important to many biomedical applications, such as drug discovery, stem cell manufacturing, and tissue engineering. Recently, the need for extended-time observation and the proliferation of high-throughput imaging have made automatic mitotic cell detection mandatory.
The state-of-the-art methods for this task generally fall into two categories. (1) Spatial feature-based methods: these methods detect mitotic cells directly in an image based on spatial visual characteristics. Liu et al. [1] considered the mitotic cell as a special visual pattern and trained a Support Vector Machine classifier with region features for identification. Li et al. [2] extracted volumetric Haar-like features and implemented a cascade framework to classify spatiotemporal sliding windows of an image sequence. Since current low level visual features usually have low discrimination for nonrigid and deformable objects, these methods often achieve unsatisfactory performance. (2) Sequential feature-based methods: these methods usually implement object tracking or temporal inference models to leverage sequential features for decision. Yang et al. [3] extracted individual cell trajectories by cell tracking and identified mitoses with the dynamic features of the mother and daughter cells during mitosis progression. To handle the difficulty of cell tracking, temporal inference models are implemented to leverage the temporal context for mitosis event recognition. Gallardo et al. [4] trained a hidden Markov model for mitosis recognition with cell shape and appearance dynamics. Liu et al. [5] applied a hidden-state conditional random field to learn the sequential structure of mitosis progression. Liang et al. [6] implemented a conditional random field model [7] to further localize different mitotic phases based on the visual features of the nuclei.
Although much work has been done on this task, several limitations remain. On one hand, the state-of-the-art visual features cannot discriminatively represent mitotic cells with irregular appearance changes, as shown in Figure 1, and therefore the spatial feature-based methods have relatively low generalization. On the other hand, the sequential feature-based methods rely on temporal inference models to take advantage of temporal context. Learning the complicated transitions among multiple states induces high computational complexity and keeps the system far from the requirement of real-time mitotic cell recognition for biological analysis.
To tackle this challenging task, we propose a nonnegative mix-norm convex optimization method for mitotic cell detection. First, we apply an imaging model-based microscopy image segmentation method that exploits phase contrast optics to extract mitotic candidates in the input images. Then, a convex objective function regularized by mix-norm with nonnegative constraint is proposed to induce sparsity and consistence for discriminative representation of deformable objects. Finally, a Support Vector Machine classifier is utilized for model learning and detection on the extracted candidate regions. The main contribution is twofold. (1) This method overcomes the difficulty in feature formulation for deformable objects. (2) It is independent of tracking or temporal inference models and can greatly reduce the computational complexity.
The rest of the paper is structured as follows. In Section 2, we briefly introduce the systematic workflow. Then, the mitotic cell representation is illustrated in Section 3. The experimental method and results are detailed in Sections 4 and 5. Finally, the conclusion is presented.

System Overview
The proposed method was designed to automatically identify mitotic cells in phase contrast microscope images. To achieve this goal, the method proceeds through three consecutive steps.
(i) Mitotic Candidate Extraction. This step aims to extract candidate regions, $C = \{c_i\}_{i=1}^{n}$ ($c_i \in \mathbb{R}^{w \times h}$, where $w$ and $h$ denote the width and height of one region), that potentially contain mitotic cells from the original image, while eliminating most background regions to reduce the search space for refinement. We adopted the imaging model-based microscopy image segmentation method proposed in our previous work [1,8] to detect mitotic cells in the input sequence. Since this step is not the main focus of this paper, we only briefly introduce it as follows. Please refer to [1,8] for more details.
Under a positive phase contrast microscope, adherent stem cells growing in culture appear as dark objects surrounded by bright halos. We have proposed an imaging model-based microscopy image segmentation method [1,8] to restore the ideal image in which pixel values are positive inside cell regions while being uniformly zero in the background. The objective function is

$$O(\mathbf{f}) = \|\mathbf{P}\mathbf{f} - \mathbf{g}\|_2^2 + \omega_{\mathrm{smooth}}\,\mathbf{f}^T\mathbf{L}\mathbf{f} + \omega_{\mathrm{sparse}}\,\|\mathbf{D}\mathbf{f}\|_1, \quad \text{s.t. } \mathbf{f} \ge 0. \tag{1}$$

During mitosis, stem cells usually appear with drastically intensified halo artifacts, which often completely immerse the cell while its volume reaches a minimum. By comparing these two phenomena, we found that the visual pattern of the mitotic region in the inverted phase contrast microscopy image, $-\mathbf{g}$, is similar to the visual appearance of the normal cell region in the original microscopy image, $\mathbf{g}$. Therefore, we can selectively enhance only mitotic regions by replacing $\mathbf{g}$ with $-\mathbf{g}$ and formulate the objective function as

$$O(\mathbf{f}) = \|\mathbf{P}\mathbf{f} - (-\mathbf{g})\|_2^2 + \omega_{\mathrm{smooth}}\,\mathbf{f}^T\mathbf{L}\mathbf{f} + \omega_{\mathrm{sparse}}\,\|\mathbf{D}\mathbf{f}\|_1, \quad \text{s.t. } \mathbf{f} \ge 0, \tag{2}$$

where $\mathbf{g}$ and $\mathbf{f}$ are $N$-dimensional vectorized representations of the observed image $g(x, y)$ and the artifact-free image $f(x, y)$ with only mitotic cells, respectively, with $N$ being the number of pixels in the image; $\mathbf{L}$ and $\mathbf{D}$ are, respectively, a Laplacian matrix and a diagonal matrix defining local smoothness and sparseness with corresponding weights $\omega_{\mathrm{smooth}}$ and $\omega_{\mathrm{sparse}}$ [8,9]. The similarity-based Laplacian matrix $\mathbf{L}$ is defined by $\mathbf{L} = \mathbf{D} - \mathbf{W}$. $\mathbf{W}$ is a symmetric matrix whose off-diagonal elements are defined as $W(i, j) = e^{-(g_i - g_j)^2/\sigma_1}$, where $g_i$ and $g_j$ denote intensities of neighboring pixels $i$ and $j$, $\sigma_1$ is the mean of all possible $(g_i - g_j)^2$ in the image, and $\mathbf{D}$ is a diagonal degree matrix where $D(i, i) = \sum_j W(i, j)$.

$\|\cdot\|_p$ denotes the $\ell_p$ norm; the nonnegativity constraint on $\mathbf{f}$ enforces the assumption that the cell-induced phase shifts to be restored are unidirectional; hence the restored pixel values are positive inside cell regions while being uniformly zero in the background; $\mathbf{P}$ is an $N \times N$ matrix such that the $i$th element of the vector $\mathbf{P}\mathbf{f}$ can be computed by the convolution between $\mathbf{f}$ and the discretized point spread function $\mathrm{PSF}(u, v) = \delta(u, v) - \mathrm{airy}(\sqrt{u^2 + v^2})$, where $\mathrm{airy}(\cdot)$ is an obscured Airy pattern [10][11][12].

The image $\mathbf{f}$ can be obtained by minimizing $O(\mathbf{f})$ using iteratively reweighted nonnegative multiplicative update [9]. Samples are shown in Figure 3.
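The smoothness term of the restoration objective relies on a similarity-based Laplacian built from neighboring pixel intensities. The following is a minimal sketch of that construction, not the implementation used in [1,8]. It assumes the off-diagonal weights take the form exp(-(g_i - g_j)^2 / sigma_1), that "neighboring" means the 4-connected neighborhood, and that sigma_1 is averaged over those neighbor pairs only; the function names `similarity_laplacian` and `smoothness` are hypothetical.

```python
import math


def similarity_laplacian(image):
    """Return L = D - W for a small grayscale image given as a list of rows."""
    h, w = len(image), len(image[0])
    g = [v for row in image for v in row]      # vectorized image, raster order
    n = h * w
    # Assumption: neighbors are the 4-connected pixels (right and bottom links).
    pairs = []
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:
                pairs.append((i, i + 1))
            if r + 1 < h:
                pairs.append((i, i + w))
    sq_diff = [(g[i] - g[j]) ** 2 for i, j in pairs]
    # sigma_1: mean squared difference; fall back to 1.0 for a flat image.
    mean = sum(sq_diff) / len(sq_diff) if sq_diff else 0.0
    sigma1 = mean if mean > 0 else 1.0
    W = [[0.0] * n for _ in range(n)]
    for (i, j), d in zip(pairs, sq_diff):
        W[i][j] = W[j][i] = math.exp(-d / sigma1)
    # Negated weights off the diagonal, node degree on the diagonal.
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])
    return L


def smoothness(L, f):
    """Evaluate the quadratic form f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


# Hypothetical 2 x 3 image with one intensity edge.
image = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
L = similarity_laplacian(image)
flat = smoothness(L, [1.0] * 6)        # a constant image incurs no penalty
edge = smoothness(L, [0.0, 0.0, 1.0, 0.0, 0.0, 1.0])
```

Because the weights shrink across large intensity differences, a restored image that changes value along the original edge is penalized less than one that changes value inside a uniform region.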
(ii) Mitotic Cell Representation. This step aims to represent $c_i$ with a high level feature vector $\alpha_i$, which directly denotes the similarity between one sample and the bases of the dictionary in the sparse representation scheme. Given a training set consisting of positive and negative mitotic cell regions and the corresponding low level image feature set $X = \{x_i\}_{i=1}^{n}$ ($x_i \in \mathbb{R}^{m \times 1}$), this step decomposes $x_i$ over a dictionary $\Phi \in \mathbb{R}^{m \times k}$ such that $x_i = \Phi\alpha_i + \varepsilon_i$, where $\alpha_i \in \mathbb{R}^{k \times 1}$ is a sparse vector and $\varepsilon_i \in \mathbb{R}^{m \times 1}$ is the residual. Since $\alpha_i$ denotes the correlation between $x_i$ and each basis, it can be utilized as the feature representation of one sample. We propose a convex objective function regularized by mix-norm with nonnegative constraints to simultaneously obtain the optimal $A^* = \{\alpha_i^*\}_{i=1}^{n}$ and $\Phi^*$. This method will be detailed in Section 3.
(iii) Mitotic Cell Modeling and Detection. A test sample $x_{\mathrm{test}}$ can likewise be reconstructed from the bases in $\Phi^*$ by a coefficient $\alpha_{\mathrm{test}}$. Therefore, $\alpha_{\mathrm{test}}$ also reflects the relationship between $x_{\mathrm{test}}$ and the bases and can be utilized as the characteristic representation for testing. In our work, an SVM classifier was trained with the high level feature set $A^* = \{\alpha_i^*\}_{i=1}^{n}$. Then it is utilized to predict each test sample, $x_{\mathrm{test}}$, as a mitotic cell or not. SVM is a supervised binary classifier that constructs a linear decision boundary or a hyperplane to optimally separate two classes. The literature shows that SVM usually has high generalization ability, especially when only a small amount of training data is available [14].

Problem Formulation.
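For the representation of one mitotic cell, $x$, we designed the objective function as follows:

$$\mathrm{Obj}(\alpha, \lambda_1, \lambda_2) = \|x - \Phi\alpha\|_2^2 + \lambda_1\|\alpha\|_1 + \lambda_2\|\alpha\|_2^2, \quad \text{s.t. } \alpha \ge 0, \tag{3}$$

where $\|\cdot\|_2$ means the $\ell_2$ norm; the relative importance of the three terms is controlled by the positive weights $\lambda_1$ and $\lambda_2$; the nonnegative constraint ($\alpha \ge 0$) is imposed because $\alpha$ represents the similarity between one sample and the bases. The optimal decomposed coefficient $\alpha^*$ can be achieved by solving $\alpha^* = \arg\min_{\alpha} \mathrm{Obj}(\alpha, \lambda_1, \lambda_2)$. The objective function in (3) consists of three parts.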
(i) Fidelity. The first term penalizes the sum-of-squares difference between the reconstructed and original sample. Assuming that there are enough training samples of mitotic cell regions so that the dictionary $\Phi$ consisting of all these samples is overcomplete, a new mitotic cell image can be faithfully represented by a linear combination of mitotic bases alone. However, it is impossible to enumerate all mitotic cases for the training set in reality, and sparse coding in this way would be rather unstable. A feasible compensation is to utilize the samples of nonmitotic cell regions. By tuning $\alpha$, the negative samples might be very helpful for reconstruction when there are only limited mitotic bases. Therefore, the Fidelity term utilizes both mitotic and nonmitotic samples for reconstruction.
(ii) Sparsity. In the sparse coding scheme, it is usually expected that a mitotic sample should be reconstructed with both low residual and few mitotic bases. Although a nonmitotic sample can also be reconstructed with the same dictionary and an acceptable residual, it will leverage many bases for compensation and result in a dense $\alpha$. The lasso penalty is well known to impose sparsity for decomposition [15]. Therefore, $\|\alpha\|_1$ (the $\ell_1$ norm of $\alpha$) is implemented for the Sparsity term.
(iii) Consistence. In the framework of sparse representation, the dictionary is typically overcomplete and consequently redundant. It is known that, to induce sparsity, the lasso tends to select only one basis from a group of highly correlated bases, which leads to nonunique solutions. To handle this problem, in the objective function (3) for mitotic cell representation, we impose the ridge penalty $\|\alpha\|_2^2$ (the squared $\ell_2$ norm of $\alpha$) as the Consistence term. Zou and Hastie [16] mathematically demonstrated that strict convexity guarantees consistence in the extreme situation with identical bases. Since the linear combination of the lasso and ridge penalties is strictly convex, the regularization in the objective function (4) preserves the consistence. Furthermore, Zou and Hastie [16] derived an upper bound on the difference between the coefficients of two different bases to quantitatively describe the consistence effect of the elastic net penalty [16, Theorem 1]. Following Theorem 1, the difference between the coefficients of two bases is almost 0 if these bases are highly correlated. Therefore, the Consistence term can theoretically avoid nonunique solutions when the dictionary is redundant.
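The interplay of the three terms can be illustrated numerically. The sketch below evaluates an objective of the assumed form fidelity + lambda_1 * lasso + lambda_2 * ridge (with a squared ridge term, as in the elastic net of [16]) on a toy dictionary with two identical bases. The function name `objective` and all numeric values are hypothetical and chosen only for illustration.

```python
def objective(x, Phi, alpha, lam1, lam2):
    """Evaluate ||x - Phi a||_2^2 + lam1 * ||a||_1 + lam2 * ||a||_2^2 for a >= 0."""
    if any(a < 0 for a in alpha):
        raise ValueError("alpha must be nonnegative")
    m, k = len(Phi), len(Phi[0])
    recon = [sum(Phi[r][j] * alpha[j] for j in range(k)) for r in range(m)]
    fidelity = sum((x[r] - recon[r]) ** 2 for r in range(m))
    sparsity = sum(abs(a) for a in alpha)        # lasso term
    consistence = sum(a * a for a in alpha)      # ridge term
    return fidelity + lam1 * sparsity + lam2 * consistence


# Toy dictionary with two identical one-dimensional bases.
Phi = [[1.0, 1.0]]
x = [1.0]
single = [0.4, 0.0]   # all weight on one of the duplicated bases
shared = [0.2, 0.2]   # weight split evenly across both

# Lasso only: both codes reconstruct x equally well and have the same l1 norm,
# so the objective cannot tell them apart (nonunique solution).
tie = objective(x, Phi, single, 0.1, 0.0) - objective(x, Phi, shared, 0.1, 0.0)

# With the ridge term, the evenly shared code has the strictly lower objective.
gap = objective(x, Phi, single, 0.1, 0.1) - objective(x, Phi, shared, 0.1, 0.1)
```

In this toy case `tie` is zero while `gap` is positive, which is the consistence effect described above: the ridge term breaks the tie in favor of spreading weight across correlated bases.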
In this way, the proposed convex objective function regularized by mix-norm with nonnegative constraint can be formulated as follows:

$$\mathrm{Obj}(\alpha, \lambda_1, \lambda_2) = \|x - \Phi\alpha\|_2^2 + \lambda_1\|\alpha\|_1 + \lambda_2\|\alpha\|_2^2, \quad \text{s.t. } \alpha \ge 0. \tag{4}$$

Optimization.
Given a training set of $n$ samples, where $x_i$ denotes the extracted visual feature of each training candidate, there exists a latent dictionary of $k$ bases where each basis characterizes a special visual pattern of a mitotic or nonmitotic cell region, such that a new image can be sparsely reconstructed with respect to this dictionary. Therefore, the goal of optimizing the objective function is to discover the optimal dictionary $\Phi^*$ and reconstruction coefficients $\alpha_i^*$ for the corresponding $x_i$. This task can be achieved by solving the following optimization problem:

$$\min_{\Phi \in \mathcal{C},\, A} \frac{1}{n}\sum_{i=1}^{n}\left(\|x_i - \Phi\alpha_i\|_2^2 + \lambda_1\|\alpha_i\|_1 + \lambda_2\|\alpha_i\|_2^2\right), \quad \text{s.t. } \alpha_i \ge 0, \tag{5}$$

where $A = \{\alpha_i\}_{i=1}^{n}$ and the convex set $\mathcal{C} = \{\Phi \in \mathbb{R}^{m \times k}, \text{ s.t. } \forall i,\ \|\phi_i\|_2^2 \le 1\}$. Since the optimization problem above is not jointly convex in $\Phi$ and $A$, we follow the coordinate descent framework and propose the Iterative Updating method, summarized in Algorithm 1, which is tailored from the online learning algorithm [17]. Assuming the training set consists of i.i.d. samples from a distribution $p(x)$, the proposed algorithm randomly draws one sample $x_t$ at a time and alternates between the sparse coding step, which computes $\alpha_t$ of $x_t$ over the dictionary $\Phi_{t-1}$ obtained at the previous iteration, and the dictionary updating step, which computes the new dictionary $\Phi_t$ with respect to $\alpha_t$. The two main components of the method are presented below.

Sparse Coding.
Given the dictionary obtained in the $(t-1)$th iteration, $\Phi_{t-1}$, the coefficient $\alpha_t$ of $x_t$ in the $t$th iteration is independent of the others. Therefore, we can optimize them independently as follows:

$$\alpha_t = \arg\min_{\alpha \ge 0}\ \|x_t - \Phi_{t-1}\alpha\|_2^2 + \lambda_1\|\alpha\|_1 + \lambda_2\|\alpha\|_2^2.$$

For this convex objective function regularized by L1/L2 mix-norm with nonnegative constraint, we adopt the linear-time projection method for the L1/L2 mix-norm regularization [17]. To fulfill the nonnegative constraint, we keep only the entries of $\alpha_t$ that are greater than 0 and set the others to 0 in each iteration.
With the optimal dictionary $\Phi^*$, the optimal decomposition coefficients $\alpha_i^*$ can be achieved by

$$\alpha_i^* = \arg\min_{\alpha_i \ge 0}\ \|x_i - \Phi^*\alpha_i\|_2^2 + \lambda_1\|\alpha_i\|_1 + \lambda_2\|\alpha_i\|_2^2.$$
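The sparse coding step can be made concrete with a small solver. The sketch below is not the linear-time projection method of [17]; it swaps in a plain proximal-gradient iteration, which handles the lasso term and the nonnegativity constraint together by shifting and clipping at zero. It assumes the objective has the form ||x - Phi a||_2^2 + lambda_1 ||a||_1 + lambda_2 ||a||_2^2 with a >= 0. The name `sparse_code`, the iteration count, and the toy data are hypothetical.

```python
def sparse_code(x, Phi, lam1, lam2, n_iter=500):
    """Nonnegative elastic-net coding of x over the columns of Phi.

    Minimizes ||x - Phi a||_2^2 + lam1 * ||a||_1 + lam2 * ||a||_2^2 subject to
    a >= 0 by proximal gradient descent with a fixed step size.
    """
    m, k = len(Phi), len(Phi[0])
    # Upper bound on the Lipschitz constant of the smooth part's gradient,
    # using the squared Frobenius norm of Phi in place of its spectral norm.
    frob_sq = sum(v * v for row in Phi for v in row)
    step = 1.0 / (2.0 * (frob_sq + lam2))
    alpha = [0.0] * k
    for _ in range(n_iter):
        resid = [sum(Phi[r][j] * alpha[j] for j in range(k)) - x[r]
                 for r in range(m)]
        grad = [2.0 * sum(Phi[r][j] * resid[r] for r in range(m))
                + 2.0 * lam2 * alpha[j] for j in range(k)]
        # Proximal step for lam1 * ||a||_1 restricted to a >= 0:
        # shift by step * lam1, then keep only the positive part.
        alpha = [max(0.0, alpha[j] - step * (grad[j] + lam1)) for j in range(k)]
    return alpha


# Toy dictionary: two identical bases plus one unrelated basis.
Phi = [[1.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
x = [1.0, 0.0]
alpha = sparse_code(x, Phi, lam1=0.1, lam2=0.1)
```

On this toy input the two identical bases receive equal coefficients and the unrelated basis stays at zero, which mirrors the consistence and sparsity effects the objective is designed to produce.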

Dictionary Updating.
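During the $t$th iteration, the algorithm aggregates the information computed from loop 1 to loop $t$ for dictionary updating. Given the obtained $\alpha_t$ for $x_t$ in the $t$th iteration, the expected dictionary in the $t$th iteration, $\Phi_t$, can be obtained by minimizing the average error over all iterations. (Here $1 \le t \le T_{\max}$, where $T_{\max}$ is the maximum iteration and is independent of the sample number, $n$, of the training set.) Therefore, $\Phi_t$ can be optimized by

$$\Phi_t = \arg\min_{\Phi \in \mathcal{C}} \frac{1}{t}\sum_{i=1}^{t}\left(\|x_i - \Phi\alpha_i\|_2^2 + \lambda_1\|\alpha_i\|_1 + \lambda_2\|\alpha_i\|_2^2\right).$$

The constraint $\|\phi_i\|_2^2 \le 1$ prevents the bases of $\Phi$ from growing arbitrarily large, which would otherwise allow arbitrarily small coefficients.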
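A dictionary update of this kind is commonly carried out column by column from two running statistics of the coded samples. The sketch below follows that block-coordinate pattern from the online dictionary learning literature; whether it matches the paper's Algorithm 2 is an assumption, since that algorithm is not reproduced here. The names `accumulate` and `update_dictionary`, the matrices `A` and `B`, and the toy numbers are hypothetical.

```python
import math


def accumulate(A, B, x, alpha):
    """Add one coded sample in place: A += alpha alpha^T, B += x alpha^T."""
    k, m = len(alpha), len(x)
    for i in range(k):
        for j in range(k):
            A[i][j] += alpha[i] * alpha[j]
    for r in range(m):
        for j in range(k):
            B[r][j] += x[r] * alpha[j]


def update_dictionary(Phi, A, B, n_sweeps=1, eps=1e-12):
    """Return a new dictionary after block-coordinate updates of each column.

    Each column is moved to the minimizer of the accumulated squared error
    with the other columns fixed, then projected onto the unit l2 ball.
    """
    m, k = len(Phi), len(Phi[0])
    Phi = [row[:] for row in Phi]                  # work on a copy
    for _ in range(n_sweeps):
        for j in range(k):
            if A[j][j] < eps:
                continue                           # basis j has not been used yet
            u = [(B[r][j] - sum(Phi[r][l] * A[l][j] for l in range(k))) / A[j][j]
                 + Phi[r][j] for r in range(m)]
            scale = max(math.sqrt(sum(v * v for v in u)), 1.0)
            for r in range(m):
                Phi[r][j] = u[r] / scale           # enforce ||phi_j||_2 <= 1
    return Phi


# One coded sample with a single basis (toy values).
A = [[0.0]]
B = [[0.0], [0.0]]
accumulate(A, B, x=[3.0, 4.0], alpha=[1.0])
Phi = update_dictionary([[1.0], [0.0]], A, B)
```

In this toy case the single basis rotates toward the sample direction and is rescaled to unit length, so the norm constraint is active after the update.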

Experiments.
Any state-of-the-art visual feature can be utilized for low level image representation. To demonstrate that the proposed method does not need specific object-dependent visual feature formulation with high discriminative capability, we extracted the pixel-wise intensity feature as well as three representative visual features for comparison. The pixel-wise intensity feature (Raw) represents the global intensity distribution of one image and implicitly contains appearance characteristics. This feature is formed by concatenating each pixel intensity in raster order [9]. Histogram of Oriented Gradients (HoG), GIST, and Scale Invariant Feature Transform (SIFT) are widely utilized to represent the shape characteristic, local structural information, and local visual saliency, respectively. Due to limited space, please refer to [18][19][20] for more details.
For dictionary construction, we need to discover which visual feature and what configuration of $\lambda_1$ and $\lambda_2$ form the best combination. Specifically, by fixing $\lambda_1$ and $\lambda_2$, we can compare the performances of the learned SVM models with respect to the four kinds of visual features. Moreover, by fixing the visual feature, we can compare the performances of the configurations of $\lambda_1$ and $\lambda_2$ by tuning both within $[10^{-4}, 10^{-1}]$.
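To demonstrate the superiority of the proposed method for mitotic cell detection, we compared its performance against the spatial feature-based method [1]. Both methods use only spatial visual features for recognition, which makes for a fair comparison. We utilized the coefficients decomposed by the proposed method with respect to each visual feature and the best corresponding configurations of $\lambda_1$ and $\lambda_2$ to train SVM models separately. Comparatively, we directly used each kind of visual feature extracted from the same training set to train a corresponding SVM model. We compared the performances on the same test set. Moreover, we compared the proposed method with the temporal context-based methods [4][5][6][13].

Input: Training samples $X = \{x_i\}_{i=1}^{n}$, $\lambda_1, \lambda_2 \in \mathbb{R}$, initial dictionary $\Phi_0 \in \mathbb{R}^{m \times k}$, maximum iteration $T_{\max}$
(1) initialize
(2) $A_0 \in \mathbb{R}^{k \times k} \leftarrow 0$, $B_0 \in \mathbb{R}^{m \times k} \leftarrow 0$
(3) end
(4) // main loop
(5) for $t \leftarrow 1, \ldots, T_{\max}$ do
(6) Select $x_t$ from $X$
(7) Sparse Coding: solve the objective function below with the linear time projection method [17]
    $\alpha_t = \arg\min_{\alpha \ge 0} \|x_t - \Phi_{t-1}\alpha\|_2^2 + \lambda_1\|\alpha\|_1 + \lambda_2\|\alpha\|_2^2$
    Dictionary Updating: solve the objective function below with Algorithm 2
    $\Phi_t = \arg\min_{\Phi \in \mathcal{C}} \frac{1}{t}\sum_{i=1}^{t}\left(\|x_i - \Phi\alpha_i\|_2^2 + \lambda_1\|\alpha_i\|_1 + \lambda_2\|\alpha_i\|_2^2\right)$
Algorithm 1: Iterative updating method.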
To evaluate the performance of mitotic cell detection, we examined four different outcomes of the proposed method: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). Precision (TP/(TP + FP)), Recall (TP/(TP + FN)), and the $F_1$ score ((2 × Precision × Recall)/(Precision + Recall), representing the overall performance of both) are used as quantitative metrics to evaluate the performance of mitotic cell recognition. Accuracy ((TP + TN)/(TP + FN + FP + TN)) is utilized to evaluate the overall performance of both mitotic and nonmitotic cell recognition.
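The four metrics follow directly from the outcome counts. The sketch below computes them as defined above; the function names and the example counts are hypothetical and are not taken from the experiments. It assumes every denominator is nonzero.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return precision, recall, F1, and accuracy from the four outcome counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=8, fn=2, fp=2, tn=88)

# Consistency check on the figures reported in the Comparison section:
# precision 88.0% and recall 83.6% give an F1 of about 85.7%.
reported_f1 = round(100 * f1_from(0.880, 0.836), 1)
```

The second call shows that the reported precision, recall, and $F_1$ score of the proposed method are mutually consistent under this definition.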

Mitotic Candidate Extraction.
The proposed mitotic candidate extraction method achieved 100% recall and 40% precision on the test sequence as compared to the ground truth. Samples are shown in Figure 3. It is intuitive that our method eliminates most of the surrounding background of mitotic cells, keeping only the essential visual patterns for feature extraction. This helps improve the performance of mitotic cell detection as presented next.

Table 1: Performance comparison for mitotic cell detection with respect to four kinds of visual features ($\lambda_1 = 0.1$ and $\lambda_2 = 0.1$).

Mitotic Cell Detection.
The performances of different dictionary learning strategies with respect to the four visual features and different configurations of $\lambda_1$ and $\lambda_2$ are shown in Figure 2. Comparing within each row, with the visual feature fixed, the best $F_1$ score and accuracy were achieved when both $\lambda_1$ and $\lambda_2$ were 0.1. This implies that stronger sparsity and consistence effects benefit the decomposed coefficients for model learning. In our experiment, the maximum standard deviation (MSD) of the $F_1$ score when fixing one of $\lambda_1$ and $\lambda_2$ and tuning the other is 0.044 (except the special case of 0.086 when using GIST and $\lambda_1 = 0.1$, shown in Figure 2(c)), and the MSD of accuracy is 0.019. These results show that the proposed method is robust with respect to different visual features and a broad range of parameters. The best performances with respect to different visual features when $\lambda_1$ and $\lambda_2$ are both fixed at 0.1 were compared to decide which feature is the best for representation. From the left side of Table 1, the dictionary learned with Raw consistently outperforms the others. This implies that it is not necessary to develop a special visual feature to overcome the variance of rotation, scale, shape, and so on for mitotic cell detection, although the appearance of mitotic cells may change irregularly. Comparatively, the decomposed coefficients of one sample explicitly reflect its correlation with the bases, and this relationship can be achieved stably by the optimization method regularized by L1/L2 mix-norm with nonnegative constraint, even though we did not explicitly align mitotic samples as is usually done for face recognition. Therefore, the proposed method can avoid the nontrivial task of feature extraction for deformable object recognition. The left column of Figure 3 shows samples of the final mitotic cell detection on frames with increasing confluency in one C3H10 image sequence.
In particular, when cell density becomes much higher, as shown in Figure 4, the proposed method can still effectively identify mitotic cells with the discriminative and robust high level feature stably produced by the convex optimization regularized by L1/L2 mix-norm with nonnegative constraint, although one false positive and one false negative occur.

Comparison.
The performance comparison between the proposed method and the spatial saliency-based method (SSM) [1] is shown in Table 1. The proposed method consistently outperforms SSM in terms of $F_1$ score and accuracy with respect to every visual feature. In particular, we achieve the best performance ($F_1$ = 85.7% and accuracy = 93.9%) when Raw was selected and both $\lambda_1$ and $\lambda_2$ were 0.1, which is competitive with the performance of GIST by both methods. However, the formulation of GIST costs much more computation [20] than the formulation of Raw. To our surprise, the SIFT feature works worse than both Raw and GIST. A plausible explanation is that the substantial and irregular appearance changes cannot be preserved simply by the SIFT formulation. It is expected that HoG works worst because it mainly represents the shape feature and is not suitable for deformable object representation.
The further comparison to the temporal context-based methods, with Raw as the low level feature, is summarized in Table 2. The proposed method achieved a precision of 88.0% and a recall of 83.6%, with its best $F_1$ score of 85.7% and best accuracy of 93.9%. In contrast, the CRF-based method [6] and the HMM-based method [4] achieved significantly lower precision, recall, and $F_1$ scores. These results reveal that the high level feature stably achieved by the convex optimization regularized by mix-norm with nonnegative constraint has high discriminative ability and is essential for mitotic cell modeling. Consequently, the proposed method can outperform both methods even though no temporal context is incorporated. Because HCRF and EDCRF can capture the intermediate structures using hidden-state variables and are more flexible in modeling the temporal state transition, the HCRF-based method [5] and the EDCRF-based method [13] obtained better results in terms of $F_1$ score (87.0% and 87.4%, resp.). However, both incur high computational complexity for only 1.3% and 1.7% improvement in $F_1$ score.

Conclusion
In this paper, we propose a nonnegative mix-norm convex optimization method for mitotic cell detection. This method overcomes the difficulty in feature formulation for deformable objects. Moreover, it is independent of tracking or temporal inference models. Large scale comparison experiments demonstrate that the proposed method produces results competitive with the state-of-the-art methods, with its best $F_1$ score (85.7%) and accuracy (93.9%). We plan to incorporate more characteristics of the mitosis event, drawn from biological knowledge, into the objective function design to improve the performance of mitotic cell detection.