Automatic Global Level Set Approach for Lumbar Vertebrae CT Image Segmentation

Vertebrae computed tomography (CT) image automatic segmentation is an essential step for Image-guided minimally invasive spine surgery. However, most of state-of-the-art methods still require human intervention due to the inherent limitations of vertebrae CT image, such as topological variation, irregular boundaries (double boundary, weak boundary), and image noise. Therefore, this paper intentionally designed an automatic global level set approach (AGLSA), which is capable of dealing with these issues for lumbar vertebrae CT image segmentation. Unlike the traditional level set methods, we firstly propose an automatically initialized level set function (AILSF) that comprises hybrid morphological filter (HMF) and Gaussian mixture model (GMM) to automatically generate a smooth initial contour which is precisely adjacent to the object boundary. Secondly, a regularized level set formulation is introduced to overcome the weak boundary leaking problem, which utilizes the region correlation of histograms inside and outside the level set contour as a global term. Ultimately, a gradient vector flow (GVF) based edge-stopping function is employed to guarantee a fast convergence rate of the level set evolution and to avoid level set function oversegmentation at the same time. Our proposed approach has been tested on 115 vertebrae CT volumes of various patients. Quantitative comparisons validate that our proposed AGLSA is more accurate in segmenting lumbar vertebrae CT images with irregular boundaries and more robust to various levels of salt-and-pepper noise.


Introduction
Image-guided minimally invasive spine surgery (IG-MISS) was wildly performed for the degenerated lumbar spine in the last decades [1]. The public demand has been increased for these procedures due to its various benefits, such as performing more accurate instrumentation placement with less radiation exposure, yielding less postoperative complications, and reducing recovery time [2,3]. In Image-guidedsurgery (IGS) system, image segmentation of the interesting anatomical structure is an essential preprocessing step for 3D reconstruction and image registration, which is commonly applied in preoperative planning, intraoperative navigation, and postoperative assessment [4]. However, manually segmenting lumbar vertebrae is a time-consuming, subjective, and nonrepeatable task. Implementing an automatic lumbar vertebrae segmentation approach is of great importance not only for simplifying surgery process but also for improving operations accuracy.
Although a large amount of literature has been focused on lumbar vertebrae CT image segmentation, there still exist some potential challenges due to the inherent limitations of CT imaging modality and complexity of vertebrae anatomical structure. Figure 1 shows some specific challenges on lumbar vertebrae CT image segmentation, such as topological variation of vertebrae anatomical structure, irregular boundaries (double boundary, weak boundary) [5], and image noise. Generally speaking, two major types of approaches are exploited to deal with these issues, i.e., statistical shape models and level set models. Statistical shape models (SSM) [6][7][8][9][10][11] characterized by matching a prior template to new images is able to address the problem of irregular boundaries in lumbar vertebrae CT image segmentation. Nevertheless, these models do not explicitly cope with the individual differences of the vertebrae anatomical structure, since the prior templates are mainly established by statistical means of training shapes. Additionally, these statistical methods are computationally expensive because it is heavily dependent on spatial registration of the deformable model [12]. By comparison, level set methods (LSM) [13][14][15][16][17][18][19][20][21][22][23][24] are mainly based on edge or region information, which can naturally figure out the topological variation issue. However, the edge-based LSM are mostly quite sensitive to image noise and often suffer from serious boundary leaking problems when the objects have weak boundaries [25]. On the other hand, the region-based LSM assume homogeneity of local region intensities, which cannot segment images with inhomogeneity [26]. Furthermore, the manual interaction of initial contour is still required in traditional LSM and pixels far away from the initial contour are meaningless for obtaining the object boundary [27]. Due to these limitations in CT image segmentation, a novel lumbar vertebrae CT image segmentation approach is proposed to achieve fast, robust, and accurate segmentation. Our segmentation strategy is an automatic global level set approach (AGLSA), which comprises two stages within the coarse-to-fine framework. First, we introduce an automatically initialized level set function (AILSF) which exploits hybrid morphological filter (HMF) and Gaussian mixture model (GMM) to automatically generate a smooth initial contour which is precisely adjacent to the object boundary. Second, a regularized level set formulation is designed to overcome the weak boundary leaking problem, which utilizes the region correlation of histograms inside and outside the level set contour as a global term. Ultimately, a gradient vector flow (GVF) based edge-stopping function is employed to guarantee a fast convergence rate of the level set evolution and to avoid level set function oversegmentation at the same time. Our proposed approach has been tested on 115 vertebrae CT volumes of various patients. Quantitative comparisons validate that our proposed AGLSA is more accurate in segmenting lumbar vertebrae CT images with irregular boundaries and more robust to various levels of salt-and-pepper noise. The rest of this paper is organized as follows. Section 2 reviews the related literature. Section 3 presents the conception of our proposed method. The evaluation methods of these approaches are then introduced, and experimental results and analysis are detailed in Section 4. Section 5 concludes this paper.

Related Work
Lumbar vertebrae segmentation methods can be briefly classified into two types: (1) statistical shape models [12] which take shape prior information into consideration and (2) active contour models (ACM) [28] which directly take intensity information into account. SSM generate mean shapes using their own shape parameters, such as Fourier and wavelet descriptors, and use shape constraints to overcome ambiguous boundary information [6]. For instance, profound prior knowledge, such as various kinds of models covering shape, gradient, and appearance information, was utilized by Klinder et al. [7] to obtain a robust vertebrae segmentation framework. Ma et al. [8] proposed a coarseto-fine deformable surface model based on learned bonestructure edge detectors to segment vertebrae in 3D CT images. Manifold embeddings [9] were introduced to treat multiple vertebrae as a whole shape for spine segmentation. Recently, Rasoulian and Suzani et al. [10,11] developed a statistical multi-vertebrae shape+pose model and employed a registration-based technique to segment the CT and MR images of spine. Unfortunately, all these methods for lumber vertebrae segmentation are semiautomated requiring manually marked initial locations of vertebras and suffer from expensive computational complexity since these algorithms rely heavily on spatial registration of the deformable model. Additionally, these deformable spine models depend upon sufficiently large training and testing dataset, which increases the complexity, and these models fail to segment the vertebrae regions that are apparently distinct from the dataset images. The level set method [29], which belongs to ACM, has been widely utilized in image segmentation for its intrinsic property in dealing with topological variation. The basic idea of LSM is to evolve the zero-level of the level set function (LSF) in the image domain until it reaches the boundaries of the regions of interest [13]. LSM is able to handle the issue that topology of the contours merges and breaks, which wildly exists in the segmentation of lumbar vertebrae in CT images. LSM for image segmentation can be briefly categorized into three types: (1) edge-based models, such as distance regularized level set evolution (DRLSE) [14], use edge-detecting function to stop evolving curves which results in leaking out the ideal contours when the edges are ambiguous; Khadidos et al. [30] calculated a weighted energy term according to the relative importance of boundary points to solve the problem of weak edge leaking; (2) regionbased models, such as Chan-Vese model (C-V) [15], assume object and background intensity to be homogeneous which cannot tackle the problems of intensity inhomogeneity; (3) gradient vector flow models (GVF) [16] use GVF as the external force field to extend the capture range, but they fail to efficiently solve the convergence problem for an image with deep concavities boundary and high noise level. Liu et al. [31] attempted to distinguish noises and object edge points by using the local regional properties of images points. All the methods mentioned above need manually initialized contour and the corresponding segmentation performance is sensitively dependent on it. Several researchers have come up with discrepant methods to tackle this initialization problem. For instance, Aslan et al. [5,17,18] have integrated intensity, spatial interaction, and shape information into a probabilistic energy model in order to obtain the optimum segmentation. Shalaby et al. [19,20] have used a two-dimensional principal component analysis to extract the shape prior information in order to initialize level set function and constructed a probabilistic shape-based model. Lim et al. [21] have introduced an edge-mounted Willmore flow as well as a prior shape kernel density estimator, to the level set segmentation framework, but it is still in desperate need of sufficiently large training and testing dataset. Symmetry property of target boundary has been utilized by Liu et al. [22] to initialize the level set function, but the application is limited to segment symmetrical objects. A level set algorithm has been proposed by Li et al. [23], which tries to evolve level set function from the initial segmentation via spatial fuzzy clustering, yet the initial contour is not smooth enough. A simple initialization method for the level set function is developed by Huang et al. [24], which exploited Otsu method to automatically initialize LSF, but it assumed that the intensity of the regions inside and outside the object boundary was homogeneous which is incapable of conforming to the feature of lumbar vertebrae CT images. Recently, Balla-Arabe et al. [32] and Liu et al. [33] have taken advantage of 2D histogram information to constrain the level set evolution, in order to reduce the computational complexity of the LSM. All the methods above fail to consider the relevance between regions that lie inside and outside the evolving contour. In our previous work [34], a regioncorrelation-based LSM is designed to address this problem.

Methodology
This section details the conception of our proposed AGLSA for lumbar vertebrae CT image segmentation. The process of this approach is illustrated in Figure 2. It can be observed from the figure that AGLSA includes two stages within the coarse-to-fine framework: In the first stage, we introduce an automatically initialized level set function (AILSF) which exploits hybrid morphological filter (HMF) and Gaussian mixture model (GMM) to automatically generate a smooth and well-defined initial contour, which is precisely adjacent to the object boundary. In the second stage, a regularized level set formulation based on the region correlation of histograms inside and outside the level set contour is employed to overcome the weak boundary leaking problem and eventually to obtain the desirable segmentation.

Automatically Initialized Level Set Function
To address the salt-andpepper noise problem, a sequence of morphological filters is performed for lumbar vertebrae CT image denoising. Let be an input image, ( , ) denote the gray level at pixel ( , ), and denote the structuring element. Unlike binary image morphological operations [35], the erosion and dilation operators in gray images [36] are defined as follows: The erosion of at pixel ( , ) with a structuring element is the minimum value of the image in the region coincident with when the origin of is at pixel ( , ); the dilation of at pixel ( , ) with a structuring element is the maximum value of the image over the window when the origin of is at pixel ( , ). Therefore, the erosion of image at pixel ( , ) is given by and the dilation of image at pixel ( , ) is given by The morphological opening operator and closing operator for gray images are defined as where Θ and ⊕ denote erosion and dilation, respectively.
Opening operator removes small objects from the foreground (usually taken as the dark pixels) of an image, while closing operator removes small holes in the background (usually taken as the bright pixels). Therefore, we introduce a HMF defined as where ∈ [1, +∞) is the number of iterations, which can control the smoothing effect of the filter. This HMF can remove the dark pixels from vertebrae regions and the bright pixels from background through several iterations of opening and closing operators, respectively. Noticeably, the reason why we take the closing operator firstly in the sequence of morphological filter is that this type of operation order is able to preserve the weak boundary.

Gaussian Mixture Model.
To automatically attain a smooth and well-defined contour for the anatomical object, the input images should be preliminarily segmented into foreground and background areas. In this regard, we take advantage of GMM to cluster the input image into two classes: background class and foreground class. GMM is characteristic of a weighted sum of component Gaussian densities defined as where is the gray value of the input image, , = 1, . . . , , represent the mixture weights, and ( | , ) denotes the ℎ component Gaussian densities with mean and standard variation . Φ = { 1 , . . . , , 1 , . . . , } refers to the complete set of parameters for GMM. This set of parameters in Φ is estimated using expectation-maximization (EM) algorithm.

Global Level Set Approach.
A global level set approach is utilized to obtain the ultimate refined lumbar vertebrae CT image segmentation result, which has been automatically initialized by the method mentioned in section A. The LSM evolves a high-dimensional surface ( , , ) with an evolution function defined as = ∇ where is the speed function that controls the evolution of the LSF. The main idea of edge-based LSM is to utilize gradient information to evolve the initial contour to the object boundary. The popularly used formulation of edgebased LSM is [14] = div ((1 − 1 ∇ ) ∇ ) where (∇ ) denotes the edge indicator function, ( ) denotes the Dirac function, and , , are positive constants that control the contributions of these function evolving terms. Region-based LSM separately consider the statistical information of the entire inside and outside the contour. The classic formulation of region-based LSM is [15] = where , 1 , 2 are positive constants and are the average intensities of the region inside and outside the contour, respectively. Unfortunately, these methods fail to consider the relevance between regions that lie inside and outside the contour. In this section, a global level set approach based on region correlation is designed to cope with this issue.

Region-Correlation-Based Energy Function.
Due to the presence of intensity inhomogeneity and noise, the gray values are neither constant nor continuous variation in lumbar vertebrae CT images. Thus, the average intensity values in (9) are not capable of segmenting images with these problems. To tackle this challenge, we modify (8) by considering the region correlation between regions inside and outside the contour as a global term. For a LSF : Ω ∈ R, a novel energy function E( ) is defined by where , are positive parameters that regulate the impact of energy terms. The first internal energy term int ( ) is designed to avoid LSF from unnecessary reinitialization. It is based on the formulation proposed by [14] given by BioMed The second global term ( ) is the external energy depending upon the region correlation between regions inside and outside the contour. We define this term as Two matrices in (12a) denote the normalized region histograms inside and outside the contour, respectively; denotes the gray level of the images. In (12b), ( ) represents the Heaviside function and is a positive parameter. To take the histogram information of weak boundary into consideration, we calculate the Mahalanobis distance between and as follows: where Σ denotes the covariance matrix. Eventually, given (10), (11), (12a), (12b), and (13), the energy functional of our method in (10) is as follows which can be minimized by solving the following gradient flow: where ( ) is the Dirac delta function. As defined above, the proposed energy function should be effective when segmenting lumbar vertebrae CT images with weak boundary in noisy conditions. This property will be demonstrated subjectively and objectively by experiments on clinical lumbar vertebrae CT images.

Edge-Stopping Function.
As mentioned above, the proposed LSM is automatically initialized by AILSF and the initial contour is already in proximity to the object boundary. Therefore, the edge-stopping function (∇ ) should quickly converge and effectively avoid LSF oversegmentation which results in weak boundary leaking. Usually, an edge-stopping function is defined as where represents a Gaussian kernel with a standard deviation and = 1, 2. The edge map of gradient vector flow [37] is adopted to accelerate the convergence of (∇ ) and (17) can be rewritten as where ∈ [2,+∞), is a scalar that controls the extent of edge-stopping function convergence rate. Since the initial contour is already close to the boundary of lumbar vertebrae in the first stage of our approach, the exponential form of (∇ ) can make the evolution of LSF converge rapidly before it crosses the weak boundary. [38] and SpineWeb [39] are employed to test the performance of our approach. The datasets consist of spine CT image scans (pixel resolution: 512 × 512) obtained from 115 different patients aging between 23 and 86 years old and these scans have 150 to 200 slices per patient. The ground-truth images are manually and accurately segmented via expert using TurtleSeg [40] open software. Our segmentation approach is implemented on Matlab R2017a platform installed on PC with 2 Intel Xeon (R) 3.07GHz CPUs, 12GB RAM, and NVIDIA Quadro 5000 graphic processor. The following parameters are determined empirically for our proposed AGLSA in all the experiments: = 1, = 1.0, = 3.0, = 0.7, = 2.0, = 2, = 0.9.

Evaluation
where Ω and Ω represent the volumes of segmented result and the ground truth, respectively. DSC is targeted to measure the overlap extent between the segmentation results and ground-truth images, which varies from 0% to 100%. Higher DSC values would be representative of larger overlapping region areas and better segmentation outcomes. The MCR is defined as follows: The MCR aims to calculate the proportion of the object region being segmented to the incorrect class, which also ranges from 0% to 100%. In comparison with DSC, the lower value the MCR scores, the fewer regions are being segmented into the mistaken categories and certainly the better segmentation consequence will be obtained. The MAD is given by where and denote the boundaries of segmentation results and ground truth, respectively, denotes the total number of pixels that lie on the boundary in the segmentation image, represents the distance from the ℎ pixel on the boundary of segmentation result to the nearest pixel on the boundary of ground truth. Therefore, MAD measures the mean absolute boundary distance between segmentation result and ground truth. Obviously, the lower the MAD is, the better the segmentation result is. Figure 4: Comparison of the lumbar vertebrae CT image segmentation results with original noise level. (a) Input images, images segmented by (b) the C-V model [15], (c) Lim's model [21], (d) Khadidos' model [30], (e) Liu's model [31], (f) our proposed AGLSA, and (g) ground truth.
The HD is defined as follows: This measurement represents the maximum distance from pixels on the boundary of segmentation result to the closest pixels on the boundary of ground-truth images. In a similar manner, larger HD indicates farther distance between the two boundaries and worse segmentation outcome to be attained.

Segmentation Results.
To validate the performance of our approach on CT lumbar vertebrae image segmentation, we have conducted two experiments. The first experiment is designed to evaluate the effect of automatically initialized level set contour by our proposed AILSF, and the corresponding segmentation results are compared with those of Otsu method. As can be seen in Figure 3, the contours automatically initialized by AILSF are, apparently, smoother and more well-defined than the contours initialized via Otsu method. It is able to specifically filter salt-and-pepper noise and make the initialized contour precisely outside and close to the object boundary. The second experiment is performed to evaluate the robustness and accuracy of our proposed AGLSA for lumbar vertebrae CT image segmentation by adding various levels of salt-and-pepper noise. The intuitional comparisons are illustrated in Figures 4, 5, and 6. Figures 4, 5, and 6 illustrate the comparisons of the segmentation results utilizing the C-V model [15], Lim's model [21], Khadidos' model [30], Liu's model [31], our proposed AGLSA, and ground truth with various levels of salt-and-pepper noise. From these figures, it can be seen that the C-V model performs stable segmentation from different levels of noise, but it is unable to solve the irregular boundary problem and leads to undersegmentation since it assumes that the intensity of the object region is homogeneous. Lim's model is so sensitive to salt-and-pepper noise that it cannot 8 BioMed Research International generate continuous contour when the noise level is added to 7%. Moreover, Lim's model raises oversegmentation which is attributed to its edge-based property. Khadidos' model and Liu's model have better segmentation performance than Lim's model because of the utilization of local regional information. However, Khadidos' model and Liu's model failed to segment the inner boundary of the vertebral foramen. It is noticeable that our proposed AGLSA achieves smoother and more accurate segmentation results than others. Furthermore, our proposed AGLSA is robust to various levels of salt-andpepper noise because of using global region-correlation information. Figure 7 compares the average iteration numbers for level set formulation to achieve convergence via the C-V model [15], Lim's model [21], Khadidos' model [30], Liu's model [31], and our proposed AGLSA with various levels of noise. The − represent different image noise levels of input image from 0 to 7%. The − represent the average iteration numbers for level set formulation to achieve convergence. It is obvious that, because of using AILSF to automatically generate a smooth initial contour which is precisely adjacent to the object boundary, our proposed AGLSA has a faster convergence rate and is robust to various levels of salt-andpepper noise. Table 1 indicates the quantitative comparison results of our proposed approach with the other four methods. It lists five different criteria: DSC, MCR, MAD, HD, and RT of these methods. It can be seen that Lim's model obtains satisfactory segmentation result under original noise level, but its edge-based property makes it sensitive to the image noise, and the performance under 1%, 3%, 5%, and 7% levels of salt-and-pepper noise is significantly degraded. Although Khadidos' model and Liu's model have better performance, the running times become longer under 7% noise level because the manually initialized contours are sensitive to the image noise. It is remarkable that for images added with 1%, 3%, 5%, and 7% levels of salt-and-pepper noise, our proposed approach outperforms others within all these five criteria since we adopt region correlation of the histograms inside and outside the contour. The conclusion can be drawn that our proposed AGLSA is robust and accurate in presence of salt-and-pepper noise and is capable of converging quickly to the target boundaries.

Conclusion
In this paper, we present a novel level set formulation for lumbar vertebrae CT image segmentation. Our proposed AGLSA is able to conduct the automatic initialization via AILSF which consists of HMF and GMM, and the corresponding initial contour is smoother and more well-defined than that initialized by Otsu method. Furthermore, a global Figure 5: Comparison of the lumbar vertebrae CT image segmentation results added with 3% noise level. (a) Input images, images segmented by (b) the C-V model [15], (c) Lim's model [21], (d) Khadidos' model [30], (e) Liu's model [31], (f) our proposed AGLSA, and (g) ground truth.
level set formulation is introduced based on the region correlation which is calculated by Mahalanobis distance of histograms inside and outside the level set contour in order to avoid weak boundary leaking problem. Ultimately, a GVFbased edge-stopping function is utilized to guarantee a fast convergence rate of the level set evolution and to avoid LSF oversegmentation at the same time. Experimental results on clinical images demonstrate that our proposed AGLSA is sufficiently more accurate and robust than the other four models. Moreover, our algorithm performs much more computationally efficient in presence of various levels of saltand-pepper noise. Our proposed approach could be extended to other applications in medical image segmentation (brain MRI and cardiac ultrasound). The limitation of our approach is that it only copes with the irregular boundaries (double boundary and weak boundary) and image noises in the segmentation. It does not take into account the occlusion of vertebrae caused by the pathological conditions. Shape prior information could be a possible solution to this issue. Further researches and experiments can be conducted to guarantee the robustness and accuracy of segmentation results with pathological conditions, such as tumors, lesions, and implanting pedicle screws.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
We declare that we do not have any conflicts of interest in connection with this work.

Image Noise Level
Original Image 7% 5% 3% 1% Iteration Number Liu AGLSA Figure 7: Average iteration number for the segmentation results using the C-V model [15], Lim's model [21], Khadidos' model [30], Liu's model [31], and our proposed AGLSA with different noise levels. of our previous paper "A novel automatically initialized level set approach based on region correlation for lumbar vertebrae CT image segmentation" published in 2015 IEEE International Symposium on Medical Measurements and Applications (MeMeA) Proceedings.