Segmentations of medical images are required in a number of medical applications such as quantitative analyses and patient-specific orthotics, yet accurate segmentation without significant user attention remains a challenge. This work presents a novel segmentation algorithm combining the region-growing Seeded Cellular Automata with a boundary term based on an edge-detected image. Both single processor and parallel processor implementations are developed and the algorithm is shown to be suitable for quick segmentations (2.2 s for
Segmentation, also known as labeling, of medical images is an important task for quantitative analyses, custom intervention planning such as localized radiotherapy, and design of patient-specific tools such as orthotics or jigs for joint replacement. Manually labeling images takes a prohibitive amount of time due to the large number of voxels in most medical images, while autonomous segmentations can fail to reach the required accuracy due to interpatient morphological variability. Supervised segmentation algorithms are a promising solution because they allow the user to guide the segmentation without requiring the user’s attention for each voxel.
Popular supervised segmentation algorithms include active contours (snakes) [
This paper presents a novel segmentation algorithm that extends the region-growing SCA with a boundary term based on an edge detector. The resulting algorithm demonstrates improved segmentation accuracy, particularly given minimal user attention, while allowing real-time user supervision. In addition to the algorithm, this paper provides an approach for generating the edge-detected image and a description of the code used to run the algorithm on a consumer graphics card (GPU). Validation results compare the proposed algorithm with Graphcut and SCA.
This section describes the proposed algorithm, which is created by combining a region-growing segmentation algorithm (Ford-Bellman), a cell-based dynamic system (Cellular Automata), and an edge image (Canny filter). This section also explores implementation details, investigates how to generate a good edge image, and describes validation experiments.
Region-growing segmentation algorithms expand labels from seeds, which are voxels with
For FBA, the graph is initialized such that each seed vertex has a distance of zero and a label, each nonseed vertex has an infinite distance and no label, each graph edge between neighboring vertices has a length based on some measure (e.g., absolute difference in image intensities of its endpoints), and all the seed vertices are interconnected with zero-length graph edges. In each iteration, every graph edge with at least one labeled endpoint is tested to see if the sum of the distance of the source (the endpoint with the smaller distance) and the length of the graph edge is less than the distance of the target (the other endpoint). If so, the target’s distance becomes that sum and it copies the source’s label. Thereby, for iteration number
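The relaxation step described above can be sketched on a small 2D grid. This is a minimal illustration rather than the implementation used in this work: the function name `fba_segment`, the 4-connected neighborhood, and the use of absolute intensity difference as the graph edge length are assumptions made for the example, and the zero-length edges joining the seeds are omitted because they do not affect the result here.

```python
import math

def fba_segment(image, seeds, max_iters=10_000):
    """Seeded Ford-Bellman region growing on a 2D grid (double-buffered).

    image: list of rows of intensities.
    seeds: dict mapping (row, col) -> label.
    Returns (labels, dist, iterations_run).
    """
    rows, cols = len(image), len(image[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    labels = [[None] * cols for _ in range(rows)]
    for (r, c), lab in seeds.items():
        dist[r][c] = 0.0          # seeds start at distance zero
        labels[r][c] = lab

    for iteration in range(1, max_iters + 1):
        # Read from the old buffers, write to the new ones.
        new_dist = [row[:] for row in dist]
        new_labels = [row[:] for row in labels]
        changed = False
        for r in range(rows):
            for c in range(cols):
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] is None:
                        continue  # an unlabeled neighbor cannot be a source
                    # Assumed edge length: absolute intensity difference.
                    length = abs(image[r][c] - image[nr][nc])
                    candidate = dist[nr][nc] + length
                    if candidate < new_dist[r][c]:
                        new_dist[r][c] = candidate
                        new_labels[r][c] = labels[nr][nc]
                        changed = True
        dist, labels = new_dist, new_labels
        if not changed:           # converged: no distance was lowered
            return labels, dist, iteration
    return labels, dist, max_iters
```

For a one-row image with an intensity step in the middle and one seed at each end, the two labels grow toward each other and meet at the step, because crossing the step would add its intensity difference to the path length.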
FBA is not the fastest shortest-path algorithm, but by taking advantage of its highly parallelizable nature, FBA can segment medical images faster than the fastest-known single-processor approach, Dijkstra’s algorithm. Furthermore, unlike Dijkstra’s algorithm (see Section
A Cellular Automaton (CA) is a discrete dynamic system that iteratively performs local calculations on a grid, for example, Conway’s Game of Life [
Since this approach needs
The proposed algorithm, Seeded Cellular Automaton Plus Edge (SCAPE) detector, improves SCA’s performance at weak boundaries via a boundary term that impedes region growth through edge-voxels provided by an edge detecting filter (Algorithm
A typical (e.g., [
FBA is
Three improvements were investigated to reduce the time for SCAPE to converge while running on a single processor. First, the calculations were performed using single-buffering instead of the double-buffering that is commonly described for FBA (colloquially, “in-place” updates instead of “ping-pong”) [
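The effect of single-buffering can be illustrated on a one-dimensional chain of vertices with a single seed at one end. The sketch below is a toy example under assumed names (`relax_chain`, `lengths`), not the code used in this work; it only counts how many sweeps each buffering scheme needs before the distances stop changing.

```python
import math

def relax_chain(lengths, in_place):
    """Relax a 1D chain until convergence and count the sweeps.

    lengths[i] is the length of the graph edge between vertex i and i + 1;
    vertex 0 is the only seed. Returns (distances, sweeps).
    """
    n = len(lengths) + 1
    dist = [0.0] + [math.inf] * (n - 1)
    sweeps = 0
    while True:
        sweeps += 1
        # Single buffer: read the array being written.
        # Double buffer: read a snapshot taken before the sweep.
        source = dist if in_place else dist[:]
        changed = False
        for i in range(n):
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    candidate = source[j] + lengths[min(i, j)]
                    if candidate < dist[i]:
                        dist[i] = candidate
                        changed = True
        if not changed:
            return dist, sweeps
```

With four unit-length edges, the in-place version propagates the seed's distance down the whole chain in its first sweep and needs one more sweep to detect convergence, whereas the double-buffered version advances one vertex per sweep. Both reach the same distances. The benefit of in-place updates depends on the sweep direction relative to the direction of growth, so the saving on real images is smaller than in this best case.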
The second implementation tested herein was Dijkstra’s algorithm on a single processor [
The algorithm initially places all vertices in a priority queue based on the shortest known distance to each. Iteratively, the queued vertex with the shortest distance is removed from the queue, and its graph neighbors’ shortest distances and labels are updated (as in Algorithm
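A compact sketch of seeded Dijkstra labeling follows. It is not the Boost Graph Library implementation used in the experiments: it is a common variant that pushes only the seeds onto a binary heap and skips stale heap entries, instead of queueing every vertex up front. The function name, the 4-connected neighborhood, and the absolute-difference edge length are assumptions for the example.

```python
import heapq
import math

def dijkstra_segment(image, seeds):
    """Label a 2D grid by shortest path to the nearest seed.

    image: list of rows of intensities.
    seeds: dict mapping (row, col) -> label.
    Returns (labels, dist).
    """
    rows, cols = len(image), len(image[0])
    dist = [[math.inf] * cols for _ in range(rows)]
    labels = [[None] * cols for _ in range(rows)]
    heap = []
    for (r, c), lab in seeds.items():
        dist[r][c] = 0.0
        labels[r][c] = lab
        heapq.heappush(heap, (0.0, r, c))

    done = set()  # vertices whose distance is final
    while heap:
        d, r, c = heapq.heappop(heap)
        if (r, c) in done:
            continue  # stale entry left behind by a later, shorter update
        done.add((r, c))
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in done:
                continue
            candidate = d + abs(image[r][c] - image[nr][nc])
            if candidate < dist[nr][nc]:
                dist[nr][nc] = candidate
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (candidate, nr, nc))
    return labels, dist
```

Once a vertex enters `done`, its distance is never revisited, which is why adding a seed afterwards requires restarting the computation.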
The algorithm has two traits that reduce its suitability for interactive segmentation. First, Dijkstra’s algorithm is not conducive to user corrections during a segmentation. Once a vertex is removed from Dijkstra’s priority queue, its distance is not updated unless the algorithm is reinitialized. Second, the algorithm is not clearly parallelizable due to its reliance on the priority queue, though multiple processors can provide some benefit, for example, Delta-Stepping [
SCAPE is well suited for running on parallel hardware, both because its calculations are local and because FBA does not constrain the order in which edges or vertices are inspected [
The initial step for running SCAPE on a GPGPU is to transfer the data into the GPGPU’s limited memory. Then, two kernels alternate running on the GPGPU. The first kernel performs the calculations to update the vertices’ distances and records which vertices had their distances updated. The second kernel checks if all the distances have stopped changing, which indicates the segmentation is finished. Finally, the data is transferred back out of the GPGPU’s memory.
In the distance-updating kernel, each thread updates the distance of a single vertex using the data from the neighboring vertices. Various approaches were attempted, with the fastest having a block consisting of all the voxels with the same
After the distances have been updated, each warp sets a flag bit high if any of its threads updated a distance. Then, the second kernel “reduces” those flag bits (one per warp) using the “or” operator to a single flag bit that is low when the segmentation has converged. For this work running on NVIDIA architecture, this kernel used the reduction code titled “Reduce6” that is provided with the CUDA documentation, and which required over 2 bytes per warp of 32 threads.
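The convergence test can be emulated serially to show its logic. The sketch below is not the CUDA "Reduce6" code; the names `warp_flags` and `or_reduce` are hypothetical, and the pairwise loop only mimics the shape of a parallel tree reduction.

```python
WARP_SIZE = 32

def warp_flags(changed):
    """One flag per group of WARP_SIZE threads: True if any thread updated."""
    return [any(changed[i:i + WARP_SIZE])
            for i in range(0, len(changed), WARP_SIZE)]

def or_reduce(flags):
    """Pairwise reduction with 'or' down to a single flag."""
    flags = list(flags)
    while len(flags) > 1:
        if len(flags) % 2:
            flags.append(False)  # pad so that every flag has a partner
        flags = [flags[i] or flags[i + 1] for i in range(0, len(flags), 2)]
    return flags[0] if flags else False

def has_converged(changed):
    """The segmentation has converged when no thread updated a distance."""
    return not or_reduce(warp_flags(changed))
```

Here `changed` holds one Boolean per thread, so 96 threads yield three warp flags, and a single update anywhere keeps the segmentation running for another iteration.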
SCAPE’s segmentations depend on the edge image used as the boundary term. Ideally, the edge image would have edge-voxels everywhere the user wants label borders and nowhere else, so that the regions would grow to meet at those borders. Placing the edge-voxels manually would amount to a manual segmentation, so a method requiring less user attention is needed. This work uses the Canny edge detecting filter [
The Canny filter is commonly available in image analysis software, and its exact form is considered optimal with respect to detection, localization, and minimal response to edges. The Canny image must be computed prior to use in the CA, because the filter uses nonlocal information. First, an intermediate image is created by clamping the intensities to a given range (equivalent to a window/level transform), which reduces spurious edge-voxels. The Canny filter itself then consists of a Gaussian blur, gradient magnitude calculation, edge detection from hysteresis thresholding of gradient magnitudes, and edge thinning using the gradient direction. Therefore, five parameters need to be selected: two for the clamp operation (minimum intensity, maximum intensity) and three for the Canny filter (sigma for the blur, low-gradient threshold, high-gradient threshold).
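The five parameters and the clamp step can be written down as follows. The container `EdgeParams` and the function `clamp_image` are hypothetical names for this sketch; the Canny stages themselves (blur, gradient, hysteresis, thinning) are left to an image analysis library and are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeParams:
    """The five parameters that define one edge image."""
    min_intensity: float    # clamp: lower bound of the intensity window
    max_intensity: float    # clamp: upper bound of the intensity window
    sigma: float            # Canny: width of the Gaussian blur
    low_threshold: float    # Canny: low-gradient hysteresis threshold
    high_threshold: float   # Canny: high-gradient hysteresis threshold

def clamp_image(image, params):
    """Clamp intensities to [min_intensity, max_intensity]."""
    lo, hi = params.min_intensity, params.max_intensity
    return [[min(max(value, lo), hi) for value in row] for row in image]
```

Intensities outside the window collapse onto its bounds, so variations outside the range of interest no longer produce gradients for the filter to detect.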
Various methods can be used to choose the parameters, including having the user heuristically select them, employing typical values, or using a nonlinear search. This section describes an approach to search the edge detecting filter’s parameter space and find an edge image that results in an accurate segmentation via SCAPE. The benefit of such a search is that, given the image and seeds, a large number of parameter sets can be searched automatically, finding a suitable edge image without additional user attention. This section also presents specific improvements to the search for use with SCAPE.
Beasley [
The following four improvements to that nonlinear search increase the utility of the resulting edge images for use in SCAPE. First, the cost function was adjusted by using the same edge-length calculation as in Algorithm
Third, the search space is reduced to four dimensions by fixing the low-gradient threshold in the Canny filter to three standard deviations below the mean gradient magnitude. The low-gradient threshold is used in the Canny filter’s hysteresis approach as the minimum gradient magnitude for a voxel to be an edge-voxel if the gradient direction is the same as an adjacent edge-voxel. In practice, the impact of the high-gradient threshold overwhelms that of the low-gradient threshold, and setting the low gradient threshold to a very low number typically has minimal effect because image noise makes the gradient directions erratic for voxels with low gradients. This change had a minimal effect on the resulting edge images, but significantly decreased the computation time.
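The fixed low-gradient threshold is a one-line computation. The sketch below assumes the population standard deviation, which the text does not specify, and the function name is hypothetical.

```python
import statistics

def fixed_low_threshold(gradient_magnitudes):
    """Return mean - 3 * standard deviation of the gradient magnitudes.

    Assumption: the population standard deviation (pstdev) is used.
    """
    mean = statistics.fmean(gradient_magnitudes)
    std = statistics.pstdev(gradient_magnitudes)
    return mean - 3.0 * std
```

Because gradient magnitudes are non-negative, this value is often at or below zero, in which case the low threshold excludes no voxel on magnitude alone. That is consistent with the observation above that a very low threshold has little effect on the edge image.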
Fourth, to further reduce the search time, the random search of the parameter space was replaced by a more effective combination of grid- and binary-based search. This change was motivated by the observation that “good” minimum and maximum intensity parameters remove intensity variations unrelated to features separating the seeds, making those parameters relatively independent of the other two parameters. Therefore, a binary divide-and-conquer search is performed over the minimum and maximum intensity parameters. Then, for each pair of minimum and maximum intensity parameters, an evenly spaced grid of sigma and high-gradient threshold parameters is examined. Throughout this search, the five best parameter sets are kept for initializing the simplex refinement performed as the last step.
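The grid stage of this search, together with the retention of the five best parameter sets, can be sketched as below. The binary subdivision of the intensity window is not reproduced: the candidate windows are simply passed in as a list. The names `linspace` and `grid_search` and the toy cost function in the usage note are assumptions for the example; in the search described above, the cost comes from the shortest-path calculation.

```python
import heapq

def linspace(start, stop, count):
    """Return `count` evenly spaced values from start to stop inclusive."""
    if count == 1:
        return [float(start)]
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]

def grid_search(intensity_windows, sigmas, high_thresholds, cost, keep=5):
    """Evaluate a grid of (sigma, high threshold) for each intensity window.

    Returns the `keep` lowest-cost (cost, params) pairs, lowest first, where
    params = (min_intensity, max_intensity, sigma, high_threshold).
    """
    scored = []
    for lo, hi in intensity_windows:
        for sigma in sigmas:
            for high in high_thresholds:
                params = (lo, hi, sigma, high)
                scored.append((cost(params), params))
    return heapq.nsmallest(keep, scored)
```

A call such as `grid_search([(0, 100), (10, 100)], linspace(1, 3, 3), linspace(0, 1, 3), cost)` returns up to five candidates that could then seed the simplex refinement.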
This search finds parameters such that the Canny filter creates an edge image best separating the regions grown from the seeds. Since SCAPE and the parameter search use the same shortest-path calculation on the same graph, the search effectively optimizes the segmentation based on the seeds, the image features, the framework for creating the edge image, and the segmentation algorithm. A typical work-flow would be acquiring the image, manually placing seeds, using the parameter search to get an edge image, segmenting the image with SCAPE, and then interactively correcting the segmentation by adding additional seeds.
Experiments were performed to evaluate SCAPE’s segmentations, the different implementations of SCAPE, and the different approaches to determining the edge images. The data used in these experiments are 40 T1-weighted MRI images of normal brains from the Segmentation Validation Engine (SVE) [
Typical data used for validation: an axial slice (a) from one of the 40 T1-weighted MRI volumes from the Segmentation Validation Engine, with the three-slice seeds overlaid ((b), brain in red, nonbrain in blue), and a 3D rendering of a resulting SCAPE segmentation (c).
Panels: (a) Axial slice. (b) Seeds on slice. (c) Segmentation.
These experiments used
When the following experiments used the parameter search (Section
First, SCAPE was compared with SCA and Graphcut. For each image, three seed images were created by manually placing seeds: seeds on a single axial slice (1-slice), seeds on three orthogonal slices (3-slice), and seeds on five slices in each of the three orthogonal directions (15-slice). Then, a segmentation was performed on each image by each algorithm for each set of seeds, a total of 360 segmentations. The SCAPE segmentations used an edge image found using the short parameter search, and the GPGPU implementation. The SCA implementation was similar to the SCAPE implementation, but without the edge image calculation terms. The Graphcut implementation was on the CPU, provided by Boykov and Kolmogorov [
Second, the computation time and update rate were evaluated for the different SCAPE implementations. The three implementations segmented an SVE image, as well as the same image upscaled to 512 × 512 × 426 (one of the maximum volume sizes that could be successfully segmented on the GPGPU, along with 484 × 484 × 484 and 1024 × 1024 × 106). Blank edge images and the 3-slice seeds were used. Dijkstra’s algorithm was provided by the Boost Graph Library [
In the third experiment, different approaches to determining the edge images used by SCAPE were evaluated by user time, computation time, and segmentation accuracy. Four approaches to determining edge images were investigated: manual selection, the short parameter search, the long parameter search, and fixed parameters. For the manual selection case, for each of the 40 SVE images, the author selected the parameters creating the edge image that seemed to best separate the brain from the surrounding tissue. For the two parameter searches, each search was performed on each image for each of the three sets of seeds. For the fixed-parameters test, the same values (the means of the manually chosen parameters) were used to create all 40 edge images.
Manually drawing the three sets of seeds took 14 ± 2 s (mean ± standard deviation) for the 1-slice set, 53 ± 7 s for 3-slice, and 440 ± 44 s for 15-slice.
SCAPE’s segmentations were more accurate than those of either SCA or Graphcut (Figure
Comparison of segmentation accuracy for 40 brain MRIs, across three algorithms and three sets of seeds (for which seeds were placed on 1, 3, or 15 image slices). The proposed algorithm, SCAPE, generated segmentations that match the gold standard better than those of Seeded Cellular Automata (SCA) and Graphcut, particularly when minimal seeds were provided. Error bars show standard deviations.
SCA on the GPGPU took the least time to perform the segmentations, 1.7 ± 0.4 s. The SCAPE segmentation on the GPGPU took 2.2 ± 0.5 s, the extra time being used for data transfer of the edge image and additional calculations due to the edge image. Graphcut on the CPU took 155 ± 222 s. SCA’s timing varied by 10% across the three sets of seeds (not shown), as did SCAPE’s, whereas Graphcut’s timing varied by an order of magnitude across the three sets of seeds (1-slice 23 ± 9.3 s, 3-slice 413 ± 220 s, 15-slice 31 ± 17 s). The SCAPE algorithm also requires an edge image, in this case determined by the short parameter search on the 3-slice seeds, for which the computation time is covered in Section
The GPGPU implementation was the fastest SCAPE implementation, segmenting the 256 × 256 × 124 image in 2.4 s and 1229 iterations (37 s and 1241 iterations for the 512 × 512 × 426), approximately 2 ms (29 ms) per iteration on average (Table
Comparison of segmentation time and number of iterations for different implementations of SCAPE. The parallel hardware (GPGPU) implementation has faster computation times than the single-processor implementations of both FBA and Dijkstra’s algorithm. Altogether, the improvements to the CPU implementation of SCAPE reduce the segmentation time by over 90%.
Image size | CPU, no improvements: time (s) | CPU, no improvements: iter. | CPU, all improvements: time (s) | CPU, all improvements: iter. | Dijkstra: time (s) | GPGPU: time (s) | GPGPU: iter. |
256 × 256 × 124 | 334 | 1229 | 28 | 219 | 12 | 2.4 | 1229 |
512 × 512 × 426 | 4975 | 1241 | 189 | 124 | * | 37 | 1241 |
*Dijkstra implementation failed due to memory constraints. The test system had 12 GB of RAM.
Altogether, the three improvements described for speeding up SCAPE on the CPU cut the segmentation time by over 90%. For the 256 × 256 × 124 image, the segmentations took 334 s (1229 iterations) without any improvements. The single-buffering improvement reduced that to 186 s (714 iterations), while single-buffering and Yen’s improvement together reduced the time to 122 s (219 iterations), and finally including the narrow-band tiles improvement resulted in 28 s (219 iterations). For the 512 × 512 × 426 volume, the segmentation time went from 4975 s (1241 iterations) with no improvements to 189 s (124 iterations) with all improvements. The resulting labels were identical in all cases.
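The reductions quoted above can be checked with a short calculation. The timings are those reported in this section; the helper name `percent_reduction` is introduced only for this check.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before

# Reported CPU timings in seconds for the 256 x 256 x 124 image.
baseline = 334
steps = {
    "single-buffering": 186,
    "single-buffering + Yen": 122,
    "all three improvements": 28,
}
small_image = {name: percent_reduction(baseline, t) for name, t in steps.items()}

# Reported CPU timings in seconds for the 512 x 512 x 426 volume.
large_image = percent_reduction(4975, 189)
```

The cumulative reductions for the smaller image come to roughly 44%, 63%, and 92%, and the reduction for the larger volume to roughly 96%, so both image sizes support the figure of over 90%.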
For the CPU implementation with all the improvements, the minimum update rate during the initial segmentation of the 256 × 256 × 124 image was 2 Hz, with an average of 8 Hz (512 × 512 × 426 image: 0.13 Hz minimum, 0.66 Hz average). To investigate interactive ease-of-use, an initial segmentation of the smaller image was performed and then additional seeds were added. The average update rate for these corrections varied with the number of affected voxels, from 16 Hz for 2.7 million voxels changed, to 220 Hz for 3.4 thousand voxels.
The Dijkstra implementation took 12 s to segment the 256 × 256 × 124 image, five times as long as SCAPE on the GPGPU, but less than half the time of SCAPE on the CPU. Unfortunately, the Dijkstra implementation was unable to load the 512 × 512 × 426 image due to memory constraints. The Boost Graph Library implementation of Dijkstra’s algorithm creates a separate graph data structure, so presumably an implementation of Dijkstra’s algorithm working directly on the array of image intensities would succeed for the larger image size.
Of the four methods tested for determining an edge image for use in SCAPE, both parameter searches generated edge images resulting in the most accurate segmentations, demonstrating that the proposed search method is suitable for this purpose (Figure
Comparison of segmentation accuracy resulting from SCAPE using edge images from four different methods and three sets of seeds. The long and short parameter searches both determined edge images that resulted in high quality SCAPE segmentations. Manually selecting parameters resulted in relatively low quality segmentations. Using the same “fixed” parameters for each edge image did reasonably well with more seeds. Quality measured by comparing with a gold standard using Jaccard index, for which the scale is shown ranging from 0.7 to 0.95 to show more detail. Error bars show standard deviations.
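For reference, the Jaccard index of a segmentation against a gold standard is the size of the intersection of the two labeled regions divided by the size of their union. The sketch below represents each region as a set of voxel coordinates; the function name and the convention of returning 1.0 when both regions are empty are assumptions of this example.

```python
def jaccard_index(segmentation, gold):
    """Jaccard index |A intersect B| / |A union B| of two voxel sets."""
    seg, ref = set(segmentation), set(gold)
    union = seg | ref
    if not union:
        return 1.0  # assumed convention: two empty regions agree perfectly
    return len(seg & ref) / len(union)
```

Two regions of three voxels each that share two voxels score 2/4 = 0.5, which shows how quickly the index drops as the regions diverge.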
Manual parameter selection resulted in relatively low quality segmentations (no greater than 0.876 ± 0.034), suggesting that manually setting the parameters based on inspection of the generated edge images can be a difficult task for a structure with a surface as complex as the brain’s. When using fixed parameters equal to the mean values from the manually selected parameters, relatively high quality segmentations resulted when 15-slice seeds were provided (0.908 ± 0.019), but not for 3-slice (0.851 ± 0.032) or 1-slice (0.820 ± 0.061) seeds. Other fixed parameter sets, for example, the means of the long search’s parameters, fared no better, presumably because MRI intensities have high interpatient variability.
In addition to any user time or computation needed to generate the seeds, additional time may be spent determining a suitable edge image for SCAPE. The parameter searches required no additional user time but did take offline computation time: 33 ± 5 min per image for the short search and 250 ± 51 min for the long search. The fixed parameters took no additional time once the parameters were chosen. Finally, the manual selection took 145 ± 83 s of user attention per image, but no additional computation time.
This paper presents a novel segmentation algorithm, Seeded Cellular Automata Plus Edge detector (SCAPE), combining a region-growing algorithm with edge-detected images to improve behavior at weak borders. This combination reduces the amount of user interaction necessary for quality segmentations, while allowing real-time user guidance. Experimental results from a typical MRI segmentation task demonstrated higher segmentation accuracy compared with Seeded Cellular Automata and a typical implementation of Graphcut. Furthermore, three software implementations of SCAPE were examined, with a version running on a consumer graphics card performing segmentations of 256 × 256 × 124 images in 2.2 s on average, and a single-processor version having an update rate suitable for interactive user corrections (2–220 Hz, same image size).
SCAPE requires an edge-detected image to use as a boundary term, and various methods were investigated for generating edge images using the Canny edge detecting filter. The presented parameter search resulted in high-quality segmentations and required no additional user time, but took half an hour or more of computation time. A suggested work-flow would be placing seeds on some number of image slices (e.g., three slices taking less than a minute of manual scribbles), running the short parameter search for half an hour of offline computation, and finally user correction of the segmentation (using the interactive CPU implementation) until the segmentation is accurate. For modalities in which the edge image parameters for a segmentation task may be consistent, such as CT, fixed parameters can be used to bypass the need for the parameter search. Alternatively, other segmentation tasks have responded well to manual selection of the parameters, though the task demonstrated in this paper was not so amenable.
Determining good edge images for SCAPE is an open area of investigation. Other edge image filters should be explored, for example, the SUSAN filter, as should a parallel hardware implementation of the Canny filter to reduce the necessary computation time. Further investigations of the edge image parameter search are also needed, as the edge images found by the long parameter search had better costs but did not generate more accurate segmentations than those found by the short parameter search.
Further comparisons should be performed between SCAPE and the popular supervised segmentation algorithms, for example, Level Sets, Graphcut, and Random Walker. Though Graphcut on the CPU took an order of magnitude more time in this experiment, if Graphcut were implemented on parallel hardware, its segmentation times could be expected to drop by 90%, bringing it into the same range as SCAPE on parallel hardware. Also warranted is a more thorough investigation of the relationship between minimal user attention, the graph edge lengths used for Graphcut, and Graphcut’s boundary-length bias.
A main thrust of this work is the tradeoff between user attention and offline computation time. The more time a user spends placing seeds, the more likely any of the algorithms will provide an accurate segmentation. A decrease in user attention (placing seeds or correcting initial segmentations) can be offset by an increase in computation time, for example, using SCAPE and an offline parameter search. Further investigation of minimal attention for accurate segmentations could significantly further clinical use of image segmentations, particularly investigation of supervised algorithms such as the one proposed in this work. In particular, a study is planned on user-guided corrections following an initial segmentation, testing the increase in segmentation accuracy as a function of user attention.
This work grew out of image analysis advice and software framework provided by Dr. Christopher Wagner and Orthorun, Ltd.