Efficient active contour and K-means algorithms in image segmentation

In this paper we discuss a classic clustering algorithm that can be used to segment images and a recently developed active contour image segmentation model. We propose integrating aspects of the classic algorithm to improve the active contour model. For the resulting CVK and B-means segmentation algorithms we examine methods to decrease the size of the image domain. 
 
The CVK method has been implemented to run on parallel and distributed computers. By changing the order in which the pixels are updated, it was possible to replace synchronous communication with asynchronous communication, thereby improving the parallel efficiency.


Introduction
Looking at an image, it is usually very easy for a human to see what it represents. For a computer this is not so easy. Before computers "know" what is represented, objects must be measured, but before that, the objects must first be detected. Understanding images is very important in problems like stereo and motion estimation, part recognition or image indexing. The first step in image understanding is image segmentation. Segmentation is the problem of dividing an image into objects or distinguishing objects from a background.
This paper is based on [11], which was presented at DCABES2002 and discusses the classic K-means algorithm, the recently developed Chan-Vese (CV) active contour model and combinations of both (CVK, B-means). K-means can only handle a small subset of images and needs image enhancement for noisy or blurred images. CV and CVK are designed to handle a much larger class of images without enhancement. Unfortunately these algorithms are often slower. Without any optimizations, segmenting a typical 600×480 image takes several hours. Therefore methods that reduce the size of the image domain, as well as parallel and distributed computers, are used to speed up the calculation.
Another combination of CV and K-means is the B-means algorithm. This algorithm was developed after DCABES2002 and gives segmentations of the same quality as CVK but at lower computational costs. No parallel implementation is yet available.

Image processing
An image is a representation or imitation of an object. Using this definition, many images cannot be seen with the naked eye. In this paper only a small subset of images is considered: images that can be represented by a brightness quantified in every point of a physical domain. For three- or lower-dimensional domains, these images can be visualized. The brightness can be quantified by a number between a minimum value (black) and a maximum value (white), by three numbers for a red, a green and a blue channel, or in general by a vector of suitable size.
Image processing produces a modified image from an original image, for example compression and restoration (noise removal, deblurring or image enhancement). Image analysis processes an original image but does not produce an image. Instead it produces a measurement. So image processing and image analysis distinguish themselves by their output, image or measurement. Image segmentation is a form of image analysis and is concerned with finding the boundaries between objects in the image. After the image is segmented and the objects are represented in mathematical form, properties of the objects can be measured in order to recognize and classify the objects. Segmentation, measurement and classification are forms of image analysis. The term image processing is sometimes used for both image processing and image analysis.
This paper concentrates on image segmentation algorithms that do not need image enhancement, since they are robust with noisy or blurred images.

Image segmentation
For every point in an image domain, x ∈ Ω, the intensity of a gray-valued image can be specified by a number in [0,1] (0 for black, 1 for white) and the intensity of an RGB image (red, green, blue) can be specified by three numbers in [0,1]. In general, the intensity of images considered in this paper can be specified by m numbers in [0,1]. Therefore images are mappings from Ω to [0,1]^m and can be written as u : Ω → [0,1]^m, u(x) = (u_0(x), …, u_{m−1}(x)), x ∈ Ω. The image domain can be discrete (a grid of points or pixels) or continuous, has a rectangular shape and can have any number of dimensions larger than one.
For example, Ω = [0, w_0] × [0, w_1] for a 2D continuous image, or Ω = {0, …, ω_0 − 1} × ⋯ × {0, …, ω_{d−1} − 1} for a d-dimensional digital image. Here w_0 and w_1 are the width and height of the image and ω_i is the number of grid points in the (i + 1)-th dimension. The goal of image segmentation is to segment the image domain Ω into several subdomains Ω_i, based on appropriate criteria that involve intensity or location of colors, such that the domain is the union of the subdomains and the subdomains do not overlap: Ω = Ω_0 ∪ … ∪ Ω_{k−1}, Ω_i ∩ Ω_j = ∅ for i ≠ j. After segmentation is completed, the subdomains are considered k separate objects, or k − 1 separate objects and a background. The objects can then be measured and classified or recognized. This step towards image understanding is not covered by this paper.

Review of K-means algorithm
In this chapter the classic K-means clustering algorithm is introduced (section 2.1), as well as some commonly used extensions (section 2.2).

K-means clustering algorithm [6]
The K-means clustering algorithm was designed to cluster a number N of objects {O_i : i = 0, …, N − 1} into K clusters or classes {K_i : i = 0, …, K − 1}, based on the location of the objects. The objects O_i can be assigned values v_i by some measure σ, v_i = σ(O_i); the distance between two objects can then be expressed as the distance between their values, d(O_i, O_j) = |v_i − v_j|, and the distance between an object and a class can be defined as the distance between the object's value and the mean value of the objects in the class. The algorithm works by sequentially assigning all objects to the classes that lie closest to them and then recalculating the mean objects. The segmentation is completed when every object lies closer to the mean of its own class than to the mean of any other class.
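As a concrete illustration, the assignment and update steps described above can be sketched in Python (the function name, the use of scalar values and the initialization by sampling are our assumptions; the paper does not prescribe an implementation):

```python
import random

def kmeans(values, k, iters=20, seed=0):
    """Cluster scalar values into k classes by distance to class means.
    A minimal sketch of the classic K-means loop described above."""
    rng = random.Random(seed)
    means = rng.sample(values, k)          # initial means: k distinct samples
    labels = [0] * len(values)
    for _ in range(iters):
        # assignment step: each value joins the class with the nearest mean
        labels = [min(range(k), key=lambda j: (v - means[j]) ** 2)
                  for v in values]
        # update step: recompute each mean from its members
        for j in range(k):
            members = [v for v, l in zip(values, labels) if l == j]
            if members:                    # empty classes keep their old mean
                means[j] = sum(members) / len(members)
    return labels, means

# Example: two well-separated gray-value groups
vals = [0.1, 0.12, 0.09, 0.85, 0.9, 0.88]
labels, means = kmeans(vals, 2)
```

On two well-separated groups of gray values, the loop converges to the two group means in a few iterations, regardless of the initial sample.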
The algorithm can be used to cluster pixels of a digital image with related color if the pixels are considered objects. Location is measured in terms of color rather than the actual position of the pixels in the image; pixels are located in color space, not in physical space. The average location in color space of the pixels in a class can be calculated by averaging the colors of all pixels in the class. The distance between a pixel p_i and the average a_j of a class can be expressed by

d(p_i, a_j) = Σ_{k=0}^{m−1} λ_{j,k} (p_{i,k} − a_{j,k})²   (2)

(m = 1 for a gray-valued image and m = 3 for an RGB image). The parameters λ_{j,k} can be used to give priority to classes, indexed by j, or colors, indexed by k. [8,9]

On-the-fly
In the standard K-means algorithm, all pixels are updated before the new means are calculated. The means could also be updated after every pixel update. This is called on-the-fly K-means. The idea is that the calculation time per pixel update increases, but the number of iterations decreases. In practice, the means do not have to be recalculated completely after every pixel update. If the numerator, the denominator and the quotient of Eq. (1) are stored for all classes at all times, only six operations are needed to move a pixel from one class to another.
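The bookkeeping behind the "six operations" claim can be sketched as follows: for every class the numerator (sum of values), the denominator (number of pixels) and their quotient (the mean) of Eq. (1) are stored, so moving a pixel only touches the two classes involved (a hypothetical helper, not code from the paper):

```python
def move_pixel(value, src, dst, sums, counts, means):
    """Move one pixel between classes, updating the running means
    incrementally instead of recomputing them from scratch.
    sums/counts/means hold, per class, the numerator, denominator
    and quotient of Eq. (1)."""
    sums[src] -= value; counts[src] -= 1        # leave the old class
    sums[dst] += value; counts[dst] += 1        # join the new class
    means[src] = sums[src] / counts[src] if counts[src] else 0.0
    means[dst] = sums[dst] / counts[dst]

# Example: a pixel with value 0.9 moves from class 0 to class 1
sums, counts = [1.0, 1.8], [2, 2]               # class sums and sizes
means = [s / c for s, c in zip(sums, counts)]   # [0.5, 0.9]
move_pixel(0.9, 0, 1, sums, counts, means)
```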

R-means
If the standard K-means algorithm is executed, the order in which the pixels are updated does not make a difference. However, if a pixel is updated in the on-the-fly version, it affects the means, and therefore the decisions to move other pixels. The order in which the pixels are updated influences the number of iterations, but no optimal deterministic order exists for an arbitrary input image. The R-means algorithm is a stochastic extension of the on-the-fly K-means algorithm where the order of updating the pixels is random. Pixels can be picked from a uniform distribution, or from a nonuniform distribution that models additional constraints.
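A uniform random update order, as used by R-means, can be produced by a simple shuffle (a sketch; a nonuniform distribution would replace the shuffle with weighted draws):

```python
import random

def random_update_order(n_pixels, rng=None):
    """Return pixel indices in a random order, as in R-means.
    Uniform by default; the function name is ours, not the paper's."""
    rng = rng or random.Random()
    order = list(range(n_pixels))
    rng.shuffle(order)          # uniform random permutation of the pixels
    return order

order = random_update_order(8, random.Random(42))
```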

X-means
A major disadvantage of the previously discussed K-means versions is that the user must supply both the input image and the number of classes (K). K-means is not applicable to large sequences of input images if the user is prompted for an extra input parameter for every image. X-means [9] is a variant of K-means that also determines the optimal number of classes.

Active contour image segmentation
Image segmentation methods that work by curve evolution are called snakes or active contour models. Here a parameterized hypersurface C (or contour in 2D) moves through the image domain with respect to constraints from the image and stops on the boundaries of objects. The parameterization of the hypersurfaces must be flexible enough to deal with merging or breaking hypersurfaces. Level set functions [7] are handy tools and are explained in section 3.1. In section 3.2. an active contour model is described and some numerical aspects are discussed in section 3.3 and section 3.4.

Level set functions [7]
Level set functions can be used to track the positions of evolving fronts through time, using speed functions that can be defined in the whole space or just near the fronts. They can be applied to image segmentation if the fronts represent estimates of the boundaries between objects and evolution occurs in the directions that give better estimates. In order for the segmentation algorithm to work on any image, the fronts must be able to take on every shape. Therefore a representation must be chosen that can represent every topology and also every topological change.
On a d-dimensional domain Ω, a contour or hypersurface C is a (d − 1)-dimensional object, but it can be represented by a function φ : Ω → R on Ω:

C = {x ∈ Ω : φ(x) = 0}.   (3)

C is the zero level set of φ, and

n(x_0) = ∇φ(x_0) / |∇φ(x_0)|   (4)

is a vector normal to {x ∈ Ω : φ(x) = φ(x_0)} at position x_0, pointing in the direction of level sets with φ(x) > φ(x_0). To make the normal point outward, φ is chosen to be negative inside C and positive outside C (Fig. 1). As a contour evolves over time, it remains represented by the zero level set:

φ(x(t), t) = 0 for x(t) on the contour.   (5)

Differentiating Eq. (5) gives φ_t + ∇φ · x_t = 0. The speed of a contour in tangential directions is not important for tracking the contour as it evolves over time, so only the speed ν = x_t · n in the normal direction is considered. This results in an evolution equation for φ:

φ_t + ν |∇φ| = 0.   (6)

Not just the zero level set, but all the level sets move in the normal direction with speed ν(x, t). [4,16,17] The next section gives a short introduction to Chan-Vese image segmentation; a more detailed derivation is given in Appendix A1.
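As a small numerical example of the sign convention above, a circle of radius r can be represented by the signed distance function φ(x) = |x − c| − r, which is negative inside and positive outside (the helper name is ours):

```python
import math

def circle_phi(cx, cy, r):
    """Signed distance level set for a circle: negative inside,
    positive outside, zero on the contour C (the convention of Fig. 1)."""
    return lambda x, y: math.hypot(x - cx, y - cy) - r

phi = circle_phi(0.0, 0.0, 1.0)
inside, on, outside = phi(0.0, 0.0), phi(1.0, 0.0), phi(2.0, 0.0)
```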

Chan -Vese image segmentation
For vector-valued images (m > 1), Chan & Vese base the constraints on the minimization of an energy functional

F^CV_{n,m}(C_0, …, C_{n−1}, c_0, …, c_{2^n−1}) = µ_0 Σ_{i=0}^{n−1} |C_i| + λ_0 Σ_{i=0}^{2^n−1} ∫_{Ω_i} |u(x) − c_i|² dx,   (7)

where |C_i| is the length of a contour (in 2D) or the area of a surface or hypersurface (in 3D or higher dimensions), c_i ∈ [0,1]^m are constants that represent the intensity of the image on every subdomain Ω_i, and µ_0 and λ_0 are parameters that can be set by the user.
The steps of the algorithm are to minimize the energy Eq. (7) by moving the hypersurfaces while keeping {c_0, …, c_{2^n−1}} fixed, and then to recalculate {c_0, …, c_{2^n−1}} while keeping the hypersurfaces fixed. The latter can be done by setting c_i equal to the average color of the image on subdomain Ω_i. In case |Ω_i| = 0, i.e. subdomain Ω_i is empty, it is not necessary to know c_i, since the subdomain does not contribute to the energy. For the former, an appropriate representation of the hypersurfaces must be chosen: level set functions.
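The second half-step, recalculating {c_0, …, c_{2^n−1}} with the hypersurfaces fixed, amounts to averaging the image over each subdomain and skipping empty subdomains; a sketch using a flattened list as image (the layout and helper name are our assumptions):

```python
def chan_vese_means(image, labels, n_classes):
    """The 'fixed hypersurfaces' half-step: set each c_i to the average
    color of the image on subdomain Omega_i; empty subdomains are skipped
    since they do not contribute to the energy."""
    c = [None] * n_classes
    for i in range(n_classes):
        members = [image[p] for p, l in enumerate(labels) if l == i]
        if members:                 # |Omega_i| = 0: c_i is not needed
            c[i] = sum(members) / len(members)
    return c

image = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
c = chan_vese_means(image, labels, 4)   # classes 2 and 3 are empty
```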
Hypersurface C_i is represented by the zero level set, Eq. (3), of the function φ_i : Ω → R. If the number of hypersurfaces n is larger than one, the level set functions are combined in a vector Φ = (φ_0, …, φ_{n−1}), and {c_0, …, c_{2^n−1}} is written as a 2^n × m matrix c to shorten notation. The energy Eq. (7) can be rewritten in terms of the level set functions as F^CV_{n,m}(Φ, c). This energy can be minimized with respect to Φ by calculating the Euler-Lagrange equations, Eq. (9). In Eq. (9), I(x) is the index of the subdomain that x belongs to and Ĩ_i(x) is the index of the subdomain that lies opposite subdomain I(x) relative to hypersurface i.
Introducing penalty functions p_i : Ω → R to express why points in Ω should not lie in Ω_i, the Euler-Lagrange equations Eq. (9) can be written in terms of these penalties. Notice that the same penalties are used in the K-means algorithm, Eq. (2).
To get an evolution equation for a contour moving in the normal direction, δ(φ_i) is replaced by |∇φ_i| [7]. In that case Eq. (6) is valid with the speed function of Eq. (11) and Neumann boundary conditions on ∂Ω. This applies, for example, to a gray-valued image with one contour (n = 1, m = 1) or an RGB image with two contours (n = 2, m = 3).

Singularities
Singularities arise if the denominator of the evolution equation Eq. (11) equals zero. This only happens in points where |∇φ_i| = 0, i.e. ∇φ_i = 0, so in points where the normal to the hypersurface is not well defined. These singularities can be dealt with by adding a small constant 0 < ε ≪ 1 to the denominator [4], or by rewriting the first term of Eq. (11) as a Rayleigh quotient (appendix A2) and defining its limit. This means that the limit operator is evaluated in the direction of the all-one vector, i.e. the line ∂φ_i/∂x_1 = … = ∂φ_i/∂x_d. The latter approach is used here, so Eq. (11) becomes Eq. (13). No other singularities occur as long as the first and second order derivatives of the level set functions exist.

Discretization
By discretizing, a step is taken towards what is called digital image processing. In digital image processing an image can be represented by a grid where every grid point is called a pixel. The color of a pixel is usually expressed by a number in {0, 1} for a black and white image (1 bit in computer memory), a number in {0, …, 15} or {0, …, 255} for a gray-valued image (4 bits or 1 byte), or three numbers in {0, …, 255} for a color image. In this article, gray values are scaled to [0,1] and color values are scaled to [0,1]³.
Discretizing a PDE usually means that a useful discrete representation of the PDE and the domain must be chosen. In digital image processing the domain is already represented in discrete form, a rectangular grid. In 2D, the origin is placed in the upper left corner. Here, to be able to work with any number of dimensions, discrete coordinates η are introduced, and the PDE Eq. (13) is defined on this grid. The time discretization is done with step size ∆t and time index τ, and central differences on a (2d + 1)-stencil are used for the spatial derivatives. An explicit scheme is used, in which only the center of the stencil is taken at the new time index and all other points are taken at the old time index. Stability of the numerical scheme can be achieved by choosing ∆t according to Eq. (14), with α close to one and µ not larger than one. This is the result of the CFL condition, which states that a hypersurface may move at most the distance of one pixel per time step. The factor (1 − µ) was introduced to compensate for the restriction just imposed on µ.
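The central differences on the (2d + 1)-stencil can be sketched for d = 2 and unit grid spacing (a hypothetical helper; boundary handling and the explicit time update are omitted):

```python
def central_diff(phi, i, j):
    """Central differences on the 2d+1 stencil (here d = 2) for the
    first-order spatial derivatives of a level set function phi,
    stored as a list of rows, assuming unit grid spacing."""
    dx = (phi[i + 1][j] - phi[i - 1][j]) / 2.0
    dy = (phi[i][j + 1] - phi[i][j - 1]) / 2.0
    return dx, dy

# phi sampled from the linear function phi(x, y) = 2x + 3y,
# whose exact derivatives are 2 and 3
phi = [[2 * i + 3 * j for j in range(3)] for i in range(3)]
dx, dy = central_diff(phi, 1, 1)
```

Central differences are exact for linear functions, so the sketch recovers the derivatives 2 and 3 exactly.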

Extensions to CV image segmentation
Because of the CFL condition, Eq. (14), the CV image segmentation algorithm generally requires much more calculation time than K-means clustering. In this chapter two methods are explored that can reduce the calculation time of CV image segmentation. The narrow band method (section 4.1.) updates only a small subset of all pixels per iteration, while the multiresolution method (section 4.2.) updates all pixels of a new image that is obtained by downsampling the original image.

The narrow band method
The first step in reducing the time complexity is to acknowledge that much work is done in vain. In the digital model, a hypersurface moves over a pixel if the level set function on that pixel changes sign in the iteration. Assuming that this sign change does not happen far away from the hypersurface, it is a waste of time to first update the level set function there using the evolution equation and then reinitialize it to be the signed distance function to the hypersurface. However, this assumption might not always be justified. Instead of updating the level set functions on every grid point, a speedup is achieved if only the values on grid points near the hypersurfaces are updated. By specifying a maximum distance δ to the contours and only updating the level set functions on grid points within this distance, a band-shaped domain is created.
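If the level set functions are kept close to signed distance functions, the band is simply the set of grid points where |φ| ≤ δ; a sketch (the data layout and function name are ours):

```python
def narrow_band(phi, delta):
    """Grid points within distance delta of the zero level set, assuming
    phi approximates a signed distance function. Only these points are
    updated in a narrow band iteration."""
    return [(i, j)
            for i, row in enumerate(phi)
            for j, v in enumerate(row)
            if abs(v) <= delta]

# Signed distances to a vertical front between columns 1 and 2
phi = [[-2.0, -0.5, 0.5, 2.0],
       [-2.0, -0.5, 0.5, 2.0]]
band = narrow_band(phi, 1.0)    # only the two middle columns qualify
```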
Applying the narrow band method to the segmentation algorithm may cause the algorithm to fail. The narrow band method is very effective in problems with evolving fronts where the hypersurface moves with certain speed in normal direction and where the speed function is based on properties of the hypersurface. In the segmentation algorithm, the fitting term (12) may cause new hypersurfaces to appear out of nothing. This means that new hypersurfaces that should appear more than δ away from existing hypersurfaces do not get a chance in the narrow band method. Or objects that are not yet detected might not be detected at all if they are located too far from objects that are already detected.
The narrow band method can still be used with the segmentation algorithm if the initial hypersurfaces are chosen well. The algorithm can be expected to succeed if the union of the narrow bands corresponding to the initial contours cover most of the image domain Ω. In that case no speedup can be expected in the first few iterations after initialization.
To make sure that the hypersurface stays within the narrow band during the curve evolution, two actions can be taken: reinitialize the narrow band after each iteration, or use the narrow band for as long as possible, detect when the hypersurface gets too close to the edge of the band and rebuild the band when it does. In [1] the use of the latter approach is proposed, because reinitializing the level set function is time-consuming. Therefore explicit information about the location of the band is required. The advantage of the former approach is that no additional data structures are required. All information needed to restrict calculations to the narrow band is already present in the level set functions, which makes parallelizing the algorithm more efficient.
To store the narrow band information, the δ-level sets and the (−δ)-level sets can be used to represent the edges of the narrow band. Reinitialization is done from the hypersurfaces, working outward to the edges of the narrow bands. Outside the narrow bands the value is either δ + 1 or −(δ + 1). If the value on a grid point is positive (negative) and the value on a neighbor grid point is negative (positive), then the hypersurface is located between those grid points. These values remain unchanged and all other values are updated based on them: working towards the edges of the band, the value at each grid point η is set from the already updated values of its neighbors, approximating the signed distance to the hypersurface.

The multiresolution method
Decreasing the resolution of an image decreases the size of the image domain and thereby reduces the time complexity of the segmentation algorithm. Changing the resolution of a digital image means that the same image is spread over a different amount of pixels. A multiresolution method can take advantage of this. The multiresolution method should not be confused with the standard multigrid method, in which an iterative solution and the corresponding problem are coarsened to another grid, where the problem is solved and interpolated back to the fine grid. The grids are used recursively and iteratively. The multiresolution method for the image segmentation problem uses lower resolution versions of the original image to find initial solutions for higher resolution problems instead. So where the multigrid method starts at the highest level, returns to the highest level and uses all coarse grids regularly, the multiresolution method starts at the lowest level, ends at the highest level and uses all coarse grids only once.
Because the image and the level set functions are represented by the same data structures, d-dimensional arrays (matrices if d = 2), the only addition to the previous algorithm is a mechanism for resizing a d-dimensional array. Popular mechanisms are linear (or more advanced) interpolation and nearest neighbor methods. Here, interpolation is used for the level set functions and the nearest neighbor method for the image.
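A minimal resizing mechanism by nearest neighbor sampling can be sketched as follows (the paper uses interpolation for the level set functions; here nearest neighbor is shown for both directions, as an illustration only):

```python
def downsample(img, factor):
    """Nearest-neighbor downsampling of a 2D image (list of rows),
    used to build the coarse levels of the multiresolution pyramid."""
    return [row[::factor] for row in img[::factor]]

def upsample(arr, factor):
    """Nearest-neighbor upsampling, used here in place of interpolation
    to carry a coarse solution to the next finer level."""
    out = []
    for row in arr:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
coarse = downsample(img, 2)      # 2x2 coarse level
fine = upsample(coarse, 2)       # back to the original 4x4 size
```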

CV and K-means mixtures
In the CV algorithm, the color of a pixel is compared to the mean color of its subdomain and the mean colors of the subdomains that lie opposite the hypersurfaces. Therefore a pixel can stay in its subdomain or move to one of n other subdomains (if there are n hypersurfaces). However, the K-means algorithm allows pixels to move to any of the other 2 n − 1 subdomains. Apparently in the CV algorithm, pixels might be denied the opportunity to move to the right subdomain. Or in other words, hypersurfaces might not evolve correctly in the neighborhood of other hypersurfaces (Fig. 2).
The figure shows a hypothetical situation where four classes/subdomains are separated by two hypersurfaces α and β. The averages in the classes are 4,1,2 and 5. A pixel/object with value 4 is currently assigned to the class with average 5 (A) and should be assigned to the class with average 4 (D), which means that both hypersurface α (B) and hypersurface β (C) will have to move. For the obvious choice of parameters λ i,j , the CV algorithm does not move the hypersurfaces, because (4 − 5) 2 < (4 − 1) 2 and (4 − 5) 2 < (4 − 2) 2 . The K-means algorithm does move the hypersurfaces, because (4 − 5) 2 > (4 − 4) 2 .
This chapter describes two attempts at mixing the two different approaches in order to find a new algorithm with better accuracy or lower calculation time. CVK (Section 5.1.) can be seen as CV with some K-means influences and B-means (Section 5.2.) can be seen as K-means with some CV influences.

CVK algorithm [10]
The CVK algorithm segments an image by evolving hypersurfaces according to the PDE Eq. (15), where µ ∈ [0,1] and y is a pixel in the subdomain where the penalty function Eq. (16) is minimized, i.e. the subdomain that pixel x should be moved to according to the K-means algorithm. Although there are many such pixels y, the level set function φ_i has the same sign on all of them.
For µ = 1, Eq. (15) reduces to pure curvature motion: hypersurface C_i moves in the direction opposite to the normal, Eq. (4), with a speed equal to its curvature. The curvature term v_i^curvature = −µκ(φ_i) makes the hypersurfaces move in the direction that makes the curvature decrease (∂|κ|/∂t < 0). Whether that is the direction of the normal or the opposite direction depends on the orientation of the level set functions (Fig. 3). As a result, the size of the hypersurface gets smaller; or, more precisely, the size of the hypersurface is minimized by the energy functional and as a result the curvature gets closer to zero. By setting µ small, small objects can be detected; larger objects are detected if µ is chosen larger. Since noise in images consists of very small objects, the CV and CVK image segmentation algorithms can be made robust to noise or blur by setting µ large enough.
The sign function in Eq. (15) makes sure that the level set function φ i on pixel x gets closer to zero or even changes sign when x is located on the wrong side of hypersurface C i . In case x is located on the right side of C i , φ i is updated such that |∂φ i /∂t| > 0. This is done so the color criterion can oppose the curvature criterion, to prevent the hypersurfaces from showing wiggling behavior when these criteria contradict and alternate dominance in subsequent iterations.

Narrow band K-means
The power of the CV and CVK algorithms is their robustness with respect to noise, blur or damage, caused by an active contour that keeps spatially correlated pixels connected while still honoring the color constraints. Unfortunately the speed of the active contour decreases as the resolution of the image increases, Eq. (14). The narrow band K-means algorithm (B-means) attempts to mimic the active contour behavior of CVK in a K-means framework.
The origin of the B-means algorithm lies in [14], where an attempt was made to speed up the CV algorithm for one active contour, by realizing that for CV image segmentation, and for many other algorithms that minimize an energy functional by updating level set functions, only the sign of the level set function holds significant information. The length of a contour is approximated by Eq. (16) and added to the K-means criterion Eq. (2), to segment an image into two classes. Since the sign can be represented by a single binary digit, the method can easily be extended to 2^n classes or arbitrary K classes. Equation (16) is a summation of terms that can only take three values, 0, 1 or √2, depending on the classes of the pixels (i + 1, j) and (i, j + 1) relative to the class of pixel (i, j). In [14], K-means and on-the-fly K-means are denoted by Jacobi and Gauss-Seidel iteration schemes respectively.
In this paper the on-the-fly B-means algorithm from [14] is extended to an arbitrary number of classes. For every pixel, a value of one is added to the energy functional for every neighboring pixel that lies in a different class. So in 2D, the length term can be 0, 1, 2, 3 or 4 instead of 0, 1 or √2. The distance of a pixel p_i to the mean a_j of class K_j is calculated by

d(p_i, a_j) = Σ_{k=0}^{m−1} λ_{j,k} (p_{i,k} − a_{j,k})² + µ Σ_{k ∈ N(i)} b_{i,k},

where N(i) is the set of neighbors of pixel i and b_{i,k} = 1 if pixels i and k lie in different classes, b_{i,k} = 0 if pixels i and k lie in the same class.
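The B-means distance, the K-means color term plus a penalty for neighbors in other classes, can be sketched as follows (the weight µ on the boundary term and the argument layout are our assumptions):

```python
def bmeans_distance(pixel, mean, lam, mu, neighbor_classes, j):
    """Distance of a pixel to class j in the B-means criterion: the
    K-means color term plus mu times the number of neighbors that
    would lie in a different class."""
    color = sum(l * (p - a) ** 2 for l, p, a in zip(lam, pixel, mean))
    boundary = sum(1 for c in neighbor_classes if c != j)
    return color + mu * boundary

# Gray value 0.4, class mean 0.5, all lambdas one, mu = 0.25,
# two of the four neighbors lie in another class
d = bmeans_distance((0.4,), (0.5,), (1.0,), 0.25, [0, 0, 1, 1], 0)
```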
The active contour is introduced by updating only the pixels that have neighbors that lie in different classes. To store the location of the band, all pixel indices of pixels that have neighbors in different classes are kept in a queue. Every iteration, an index is popped from the queue, then updated, and possibly pushed at the end of the queue. Also neighbor pixels that have become part of the band due to the last update are pushed to the queue if they were not in the queue already. Compared to the narrow band version of the CV and CVK algorithms where the level set functions have to be updated after every iteration, the narrow band of the B-means algorithm comes with practically no calculation overhead, if two conditions are satisfied: given the index of a pixel, the location of the pixel must be accessible from memory very fast, and given the location of the pixel, the location of the next pixel in the queue must be accessible fast. The data structure used consists of a grid of the same size as the digital image, where for every pixel the class index is stored along with the locations of the next and previous pixels in the queue. Also the locations of the first and last pixels in the queue are stored.
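The queue can be sketched with a deque plus a set for the constant-time membership test; the paper instead stores per-pixel next/previous links directly in a grid, which achieves the same complexity:

```python
from collections import deque

class BandQueue:
    """Queue of band pixels with O(1) membership test, a simplified
    sketch of the linked-list-in-a-grid structure described above."""
    def __init__(self):
        self.queue = deque()
        self.members = set()

    def push(self, idx):
        if idx not in self.members:     # each pixel enters the band once
            self.members.add(idx)
            self.queue.append(idx)

    def pop(self):
        idx = self.queue.popleft()
        self.members.remove(idx)
        return idx

band = BandQueue()
for p in [(0, 1), (0, 2), (0, 1)]:      # the duplicate push is ignored
    band.push(p)
first = band.pop()
```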

Parallelization
Due to the CFL condition Eq. (14), speedup methods discussed in chapter 5 are not sufficient for very large images. Therefore CV and CVK are implemented in parallel (section 6.1.). In 6.2. a different ordering of pixels is explored that allows for a more efficient use of calculation and communication resources on some parallel computers.

Parallel CV/CVK Algorithms
Reassigning pixels to classes or calculating average colors of pixels in classes can be seen as tasks of the algorithm. To reassign a pixel to a class, the color of the pixel must be compared to the average colors of some or all classes. The averages must stay constant until all comparisons are performed, so the averaging task may not start before the reassigning task has finished. Vice versa is also true, since the average colors of classes cannot be calculated correctly if pixels are shifted between classes during this calculation. In the sense of tasks that have to be performed, the segmentation algorithm is clearly sequential by nature. Therefore a data parallel model of computation is chosen. This way, several instances of the algorithm can run on different parts of the image domain. Every instance by itself runs sequentially. In general, hypersurfaces will not necessarily stay on one side of the artificial borders that arise when the image domain is decomposed. Communication must take place between the different instances of the algorithm to prevent occurrence of artificial segments in the segmented image.
The first dimension of the domain is partitioned while the other dimensions are not. If d = 2 this is called striped partitioning; the domain is a matrix and groups of columns are assigned to submatrices. In 3D it could be called sliced partitioning; to let the terminology stay valid for d > 2, the term column is used below for the (d − 1)-dimensional slices. The grid points of the subdomains are assigned to S processes; a processor can run more than one process. Assigning a grid point to a process means that all data associated with that grid point is stored in the memory of the processor on which the process will run and that operations on the data will be performed by that processor. For every column that does not belong to the same process as its neighbor column, an extra column is assigned; operations on this data are done by the neighbor process and the data is updated during a synchronization step between processes. In the synchronization step data is sent between processes, although sending is not the correct word if both processes are located on the same processor. This mechanism makes sure that subdomains overlap by one column, which is exactly enough for the scheme used in the discrete evolution equation.
Except right after initialization, the hypersurfaces will be close to the objects to be detected. Since the objects are not distributed uniformly over the image, neither are the narrow bands. Some parts of the domain may require more calculation than other parts. To distribute the work equally among the p processors, the S subdomains are mapped cyclically to the processors. Given p, S must be chosen so that S mod p = 0. Let b = S/p; then the domain is divided into b blocks and every block is distributed over p processors. So if p is given, b can be chosen small to reduce communication between processors, or large to ensure a good load balance. In case no narrow band is used, the load is balanced automatically, so b can be chosen equal to one.
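The cyclic mapping of the S subdomains to the p processors can be sketched as:

```python
def cyclic_map(S, p):
    """Map subdomain i to processor i mod p; requires S mod p == 0 so
    every processor receives b = S / p subdomains."""
    assert S % p == 0
    return [i % p for i in range(S)]

mapping = cyclic_map(6, 3)   # b = 2: each processor gets two subdomains
```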

Synchronous versus asynchronous communication
Before every iteration the overlapping data must be synchronized. This requires S − 1 exchanges of columns between neighbor processes. Also two all-to-all broadcasts must take place, to evaluate the stop criterion and to recalculate the average intensities c_{i,j}.

Fig. 4. 2D domain distributed to three processes. Data associated with a subdomain is assigned to the concerning process, and the calculations needed to update the data are done by that process. Columns of data from neighbor processes are needed, so synchronization must take place.
Subdomain Ω_i is assigned to processor i mod p. If neighbor processes are not located on the same processor, which is the case for every process if p > 1, synchronization must take place. This takes either two or three steps, depending on whether p is even or odd. In every step a process communicates with either its left or its right neighbor process. Here the communication time is minimized by bundling the b overlapping areas into one message. The data does not actually have to be copied into a combined message; instead, computer memory can be organized so that blocks of data that have to be sent in the same communication step are always located at consecutive addresses.

Asynchronous communication
On parallel computer systems where communication is slow, run time can be reduced if calculation and communication are done in parallel. The advantage of communicating asynchronously is that processors do not have to wait until the other processor involved in the communication is ready. The disadvantage is that after the communication is initiated, the data involved is accessible but not yet usable. This can be dealt with by carefully choosing the order in which grid points are updated: first the non-overlapping grid points are updated, then the overlapping grid points. In MPI implementations, synchronous and asynchronous receive operations are implemented by the functions MPI_Recv and MPI_Irecv respectively. Synchronous and asynchronous send operations are implemented by the functions MPI_Send and MPI_Isend. MPI_Isend is always asynchronous, but MPI_Send can behave synchronously or asynchronously, depending on the size of the system's message buffer; if the buffer is large enough, asynchronous sending is used.
Because some versions of MPI always use asynchronous sending, little improvement can sometimes be observed from using MPI_Isend and MPI_Irecv instead of MPI_Send and MPI_Recv.

Evaluation of the algorithms
The segmentation results of the different algorithms are compared in Section 7.1; calculation times are compared in Section 7.2.

Qualitative comparison
For undamaged, unblurred, synthetic images, all four algorithms (K-means, CV, CVK, B-means) work well. For natural or noisy images, the K-means algorithm cannot be used to completely segment the images [6,10], although it can still be useful to create an initial guess for other algorithms. CV, CVK and B-means are designed to handle these images as well [4,10,16]. In these algorithms the hypersurfaces are moved by two effects: a fitting term makes sure that pixels in the same object have similar color, and a curvature term moves the contours in the direction that decreases the curvature, ∂|κ|/∂t < 0. Small objects have large curvature, while large objects have smaller curvature. Noise actually consists of many very small objects and therefore has large curvature. The parameter µ can be set by the user to specify whether large or small objects should be detected and can be used to make the algorithms robust to noise. The curvature term deals with noise and keeps detected objects from being scattered.
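The curvature driving the second effect has a standard discrete form, κ = ∇·(∇φ/|∇φ|), expanded with central differences. The scheme below is a common textbook discretization, not necessarily the paper's exact one, and the small guard constant is an added assumption.

```cpp
#include <cmath>
#include <functional>

// Central-difference approximation of the level set curvature
// kappa = div(grad phi / |grad phi|) at the point (x, y); h is the
// grid spacing.
double curvature(const std::function<double(double, double)>& phi,
                 double x, double y, double h) {
    double px  = (phi(x + h, y) - phi(x - h, y)) / (2 * h);
    double py  = (phi(x, y + h) - phi(x, y - h)) / (2 * h);
    double pxx = (phi(x + h, y) - 2 * phi(x, y) + phi(x - h, y)) / (h * h);
    double pyy = (phi(x, y + h) - 2 * phi(x, y) + phi(x, y - h)) / (h * h);
    double pxy = (phi(x + h, y + h) - phi(x + h, y - h)
                - phi(x - h, y + h) + phi(x - h, y - h)) / (4 * h * h);
    double g2 = px * px + py * py + 1e-12;  // guard against a flat gradient
    return (pxx * py * py - 2 * px * py * pxy + pyy * px * px)
           / std::pow(g2, 1.5);
}
```

As a sanity check, for the signed distance function of the unit circle the level set through a point at distance d from the origin is a circle of radius d, so the computed curvature there should be close to 1/d.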
The improvement in quality of CVK and B-means over CV cannot be measured in terms of correct output of the algorithms, but in terms of user friendliness; the algorithms produce correct results if the right set of parameters µ and λ_{i,j} is supplied. However, it can be tricky to tune the parameters µ and λ_{i,j} for the CV algorithm, whereas CVK and B-means work fine by just choosing µ and setting all parameters λ_{i,j} equal to one [10,16]. So CVK and B-means are more robust in usage.

Quantitative comparison of the CVK versions
In the narrow band method for CV/CVK, time is saved because calculations are only performed on a small domain. On the other hand, extra administration is needed to calculate and store the location of the narrow band. In the current implementation the narrow band does not result in speedup but in some slowdown, whereas previous versions of the narrow band did result in speedup. This is not a flaw in the current implementation of the narrow band method: in the previous implementation of the full domain method, the level set functions had to be reinitialized after every iteration. This reinitialization could be eliminated in the current implementation of the full domain method, but not in the implementation of the narrow band method. The narrow band in B-means slows down the K-means part, but gives a significant improvement over CVK.
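A minimal narrow-band construction can be sketched as follows, assuming φ is stored as a flat array and w is a user-chosen band half-width; both names and the flat layout are illustrative, not the paper's data structures.

```cpp
#include <cmath>
#include <vector>

// Collect the narrow band: the indices of all grid points where |phi|
// is below the band half-width w. Subsequent iterations then update
// only these points instead of the full domain.
std::vector<int> narrow_band(const std::vector<double>& phi, double w) {
    std::vector<int> band;
    for (std::size_t p = 0; p < phi.size(); ++p)
        if (std::fabs(phi[p]) < w) band.push_back(static_cast<int>(p));
    return band;
}
```

The extra administration mentioned above is exactly this list: it must be stored and recomputed as the zero level set moves.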
The multiresolution method not only reduces the number of grid points, but also reduces the number of operations that have to be performed on every grid point. For stability reasons, τ = O(h²), with τ the time step and h the spatial step. On a coarser grid, larger time steps can be taken, so fewer iterations are needed.

Results
The CVK algorithm was implemented (full domain, narrow band and multiresolution) in C++ using MPI functions for communication. B-means was implemented in C++ for sequential computation. For qualitative comparison, K-means was implemented in Matlab.
For undamaged, unblurred, synthetic images, K-means is very fast. The algorithm can finish in just a few iterations, because in these kinds of images there are only a small number of distinct colors, and pixels are assigned to classes based on color rather than location: if it is decided that a pixel with some color should be in some class, then all pixels with the same color are assigned to that class in the same iteration. The active contour algorithms CV/CVK/B-means, on the other hand, are slower, because only pixels near hypersurfaces are reassigned. For natural or noisy images, the speed of the K-means algorithm is irrelevant, because the algorithm is not applicable.
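A minimal gray-value K-means sketch illustrates why such images converge quickly: the assignment step depends only on color, so all pixels with equal gray value switch class together. The 1-D gray-value restriction and all names are illustrative.

```cpp
#include <cmath>
#include <vector>

// Minimal 1-D (gray value) K-means: assign each pixel to the nearest
// cluster mean, then recompute each mean as the average of its class.
std::vector<int> kmeans_labels(const std::vector<double>& pixels,
                               std::vector<double> means, int iters) {
    std::vector<int> label(pixels.size(), 0);
    for (int it = 0; it < iters; ++it) {
        // assignment step: nearest mean, by color only
        for (std::size_t i = 0; i < pixels.size(); ++i) {
            int best = 0;
            for (std::size_t k = 1; k < means.size(); ++k)
                if (std::fabs(pixels[i] - means[k]) <
                    std::fabs(pixels[i] - means[best]))
                    best = static_cast<int>(k);
            label[i] = best;
        }
        // update step: mean color of each class
        std::vector<double> sum(means.size(), 0.0);
        std::vector<int> cnt(means.size(), 0);
        for (std::size_t i = 0; i < pixels.size(); ++i) {
            sum[label[i]] += pixels[i];
            ++cnt[label[i]];
        }
        for (std::size_t k = 0; k < means.size(); ++k)
            if (cnt[k]) means[k] = sum[k] / cnt[k];
    }
    return label;
}
```

With only a handful of distinct gray values the means stabilize after very few iterations, matching the observation above.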
Calculation time for CV is discussed in [16]. For the CVK algorithm, calculation time was measured on a Cray T3E parallel computer [19] in Delft and on the DAS-2 clusters [20]. The efficiency of the algorithm run on DAS-2 on several gray-valued and color images, ranging in size from 100×100 to 600×480, using one or two level set functions, different initializations and different numbers of coarse grids, is shown in Fig. 7. Improvement in efficiency can be observed for most test cases on DAS-2 if asynchronous communication is used. Figure 8 shows an example of typical improvement by the asynchronous version. For large images much less improvement can be seen, but for these images communication overhead plays a less important role relative to the increased number of calculations. Because both MPI_Send and MPI_Isend are implemented asynchronously with the default setting of MPI_BUFFER_MAX on the Cray T3E, the choice between MPI_Send and MPI_Isend does not influence the efficiency much there.

Concluding remarks
In this paper we discussed two new image segmentation methods (CVK and B-means) and the parallel implementation of the former. Both methods were created by integrating a classic clustering algorithm (K-means) into a recently developed active contour model (Chan–Vese). The narrow band and multiresolution methods were attempts to decrease the size of the image domain and thereby the calculation time. The multiresolution method proved very useful for regular images and indispensable for large images. Because of reinitialization after every time step, the narrow band method for CVK could not compete with the full domain version. However, for B-means the narrow band is a significant improvement.
Parallelization is useful for both small and large images. Efficiency decreases when extra processors are added, but this decrease is smaller for large images than for small images. The MPI 1.1 standard does not support dynamic allocation of resources during a run. The MPI 2.0 standard will support dynamic allocation of resources, which will make the multiresolution method more efficient; every time the algorithm moves from a coarse grid to a finer grid, more processors could be added.
Replacing synchronous communication functions with asynchronous ones improved efficiency on the DAS-2 [20] for most test cases. Efficiency did not change much for very large images. On the Cray T3E [19] that was used, no improvement in efficiency could be detected: communication on the Cray is much faster than on the DAS-2, while calculation is slower.

Appendix A1. Derivation of Chan–Vese image segmentation [4,16,17]
Mumford & Shah base the constraints on the minimization of an energy functional

F(u, C) = µ surface(C) + λ ∫_Ω |u_0 − u|² dx + ∫_{Ω\C} |∇u|² dx,

where u_0 is the original image, u is the segmented image, C is a hypersurface around detected objects, surface(C) is the length of the contour (in 2D) or the area of the surface or hypersurface (in 3D or higher dimensions), and µ ≥ 0 and λ ≥ 0 are parameters that can be set by the user.
Chan & Vese represent the hypersurface C by a set of n simple closed hypersurfaces {C_0, . . . , C_{n−1}}. Every hypersurface C_i segments Ω \ C into 2 subdomains, and the set of n hypersurfaces therefore segments Ω \ C into 2^n subdomains (Fig. 9). Let γ_i (Eq. (17)) be the characteristic functions of the regions outside the hypersurfaces C_i; then every subdomain can be indexed by a binary representation (Fig. 9). The index of the subdomain that lies opposite Ω_i relative to hypersurface C_j is Ĩ_j, so Ĩ_j(x) (Eq. (19)) can be calculated by inverting the (j + 1)-th bit of I(x) (Eq. (18)). For a large subset of images the gradient is small inside objects and large on boundaries. However, objects are harder to detect if they are not defined by a gradient, as in blurred or noisy images or when boundaries are smooth. To adapt the functional so that those images can be segmented too, u is restricted to piecewise constant functions: u has a constant value c_i ∈ [0, 1]^m on every subdomain Ω_i, where χ_i : Ω → {0, 1} is the characteristic function of subdomain Ω_i.
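The bit-inversion rule for the opposite subdomain index reduces to a single XOR. Treating the bits as zero-based, with bit j corresponding to hypersurface C_j, is an assumption about the exact indexing in Eqs (18)-(19).

```cpp
// Index of the subdomain opposite Omega_I relative to hypersurface C_j:
// flip the bit of I that corresponds to C_j (zero-based bit j).
int opposite_index(int I, int j) {
    return I ^ (1 << j);
}
```

Flipping the same bit twice returns the original index, matching the geometric picture of crossing C_j and crossing back.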
For vector-valued images (m > 1), the Chan–Vese energy functional is defined by Eq. (21), or equivalently using the characteristic functions χ_i : Ω → {0, 1} (Eq. (20)). The characteristic functions (Eq. (20)) can be decomposed in terms of β_{i,j} (Eq. (22)), γ_i(x) (Eq. (17)) and G_{i,j}(x) (Eq. (23)) (Fig. 10), with

G_{i,j}(x) = 1 if x and Ω_i lie on the same side of C_j, and 0 otherwise.

This way the characteristic functions χ_i can be conveniently decomposed into functions G_{i,j} that depend on the location of one hypersurface C_j only, which will prove very useful in the derivation of speed functions for the hypersurfaces C_j. The steps of the algorithm are to minimize the energy (Eq. (21)) by moving the hypersurfaces while keeping {c_0, . . . , c_{2^n−1}} fixed, and then to recalculate {c_0, . . . , c_{2^n−1}} while keeping the hypersurfaces fixed. The latter can be done by setting the derivatives with respect to c_{i,j} to zero:

c_i = (1 / |Ω_i|) ∫_{Ω_i} u_0 dx.

In other words, c_i is the average color of the image on subdomain Ω_i. In case |Ω_i| = 0, subdomain Ω_i is empty and it is not necessary to know c_i, since the subdomain does not contribute to the energy. For the former, an appropriate representation for the hypersurfaces must be chosen: level set functions.
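The c_i update has a direct discrete counterpart: average the image over each subdomain and skip empty subdomains. The flat array layout and names below are illustrative.

```cpp
#include <vector>

// Per-subdomain average color: u0 holds the (gray value) image, index[p]
// is the subdomain index of pixel p, nsub = 2^n is the subdomain count.
std::vector<double> subdomain_means(const std::vector<double>& u0,
                                    const std::vector<int>& index,
                                    int nsub) {
    std::vector<double> sum(nsub, 0.0);
    std::vector<int> count(nsub, 0);
    for (std::size_t p = 0; p < u0.size(); ++p) {
        sum[index[p]] += u0[p];
        ++count[index[p]];
    }
    std::vector<double> c(nsub, 0.0);
    for (int i = 0; i < nsub; ++i)
        if (count[i] > 0)                  // empty subdomain: c_i unused
            c[i] = sum[i] / count[i];
    return c;
}
```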
Hypersurface C_i is represented by the zero level set (Eq. (3)) of the function φ_i : Ω → R. If the number of hypersurfaces n is larger than one, the level set functions are combined into a vector Φ : Ω → R^n, Φ = (φ_0, . . . , φ_{n−1}). To shorten notation, {c_0, . . . , c_{2^n−1}} is written as a 2^n × m matrix c.
With H the Heaviside function (Eq. (34)) and δ the Dirac function (Eq. (35)) (Appendix A3), the energy (Eq. (25)) can be minimized with respect to Φ by calculating the Euler-Lagrange equations. Let Ψ be a test function and define the first variation L. For convenience, L is split into a surface term L_S and a fitting term L_F. Regularizations of H and δ (Eq. (36)) are used instead of the usual Heaviside and Dirac functions (Appendix A3); for the regularized versions, however, the same notation is used. By doing so, it is possible to expand the Dirac function. Also expanding (see Appendix A4) gives an expression for L_S. Finally, to get an expression without ∇Ψ_i, Green's theorem

∫_{∂Ω} n · (g f) dS = ∫_Ω ∇ · (g f) dx

can be used with f = δ(φ_i) ∇φ_i/|∇φ_i| and g = Ψ_i. A similar expression can be found for L_F: ∏_{l≠k} G_{i,l}(Φ) can be seen as the characteristic function in which the presence of C_k is ignored.
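One common regularization pair is the arctangent form used by Chan & Vese; whether this is exactly the paper's Eq. (36) is an assumption, so the code below is a sketch of that standard choice.

```cpp
#include <cmath>

// Regularized Heaviside H_eps and its derivative, the regularized
// Dirac function delta_eps; eps controls the width of the smoothing.
double H_eps(double z, double eps) {
    const double pi = std::acos(-1.0);
    return 0.5 * (1.0 + (2.0 / pi) * std::atan(z / eps));
}

double delta_eps(double z, double eps) {
    const double pi = std::acos(-1.0);
    return (1.0 / pi) * eps / (eps * eps + z * z);
}
```

Because delta_eps is nonzero everywhere (unlike the true Dirac function), the evolution can act on all level sets of φ_i rather than only on the zero level set.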
Combining Eqs (26)-(28) and (29) results in an equation that must hold for all test functions Ψ_i, i ∈ {0, . . . , n − 1}. Choosing all Ψ_i equal to zero except for one i ∈ {0, . . . , n − 1}, Eq. (30) becomes decoupled. The Euler-Lagrange equations can now be obtained by introducing an artificial time t for the curve evolution. Introducing penalty functions p_i : Ω → R to express why points in Ω should not be in Ω_i, the Euler-Lagrange equations (Eq. (33)) are modified accordingly. To get an evolution equation for a contour moving in the normal direction, δ(φ_i) is replaced by |∇φ_i| [7]. In that case Eq. (6) is valid with the corresponding speed function and with Neumann boundary conditions ∇φ_i · n = 0 ∀ i ∈ {0, . . . , n − 1} on ∂Ω.

Appendix A2. Alternative form of the curvature term
Let f_i = ∂f/∂x_i and f_{ij} = ∂²f/(∂x_i ∂x_j); then