A Novel Multiscale Edge Detection Approach Based on Nonsubsampled Contourlet Transform and Edge Tracking

Edge detection is a fundamental task in many computer vision applications. In this paper, we propose a novel multiscale edge detection approach based on the nonsubsampled contourlet transform (NSCT): a fully shift-invariant, multiscale, and multidirection transform. Indeed, unlike traditional wavelets, contourlets have the ability to fully capture directional and other geometrical features for images with edges. Firstly, compute the NSCT of the input image. Secondly, the K-means clustering algorithm is applied to each level of the NSCT for distinguishing noises from edges. Thirdly, we select the edge point candidates of the input image by identifying the NSCT modulus maximum at each scale. Finally, the edge tracking algorithm from coarser to finer is proposed to improve robustness against spurious responses and accuracy in the location of the edges. Experimental results show that the proposed method achieves better edge detection performance compared with the typical methods. Furthermore, the proposed method also works well for noisy images.


Introduction
Edges are prominent features in images and their detection and analysis are an essential task in computer vision and image processing.However, edge detection is a very difficult task, mainly because of the abstract nature of the edge detection problem.For instance, there is no clear definition of what an edge is.Typically, edges are defined as those points of an image where the gradient is noticeably large [1].When viewing an image, humans can easily determine the boundaries within that image without needing to do so consciously.However, no single edge detection approach, up to the present, has been devised which will successfully determine every different type of edges.
The conventional edge detectors include different differential operators, such as Sobel operator [2], Prewitt operator [3], and Robert operator [4].These operators are easy to implement and, however, sensitive to noise due to lack of smoothing stage.They are inappropriate for the real images.Canny [5] puts forward well known three criteria of edge detectors.The criteria expressed by Canny are as follows [5,6].
(i) Good detection (C1): there should be a low probability of failing to mark real edge points and low probability of falsely marking nonedge points.(ii) Good localization (C2): the points marked as edge points by the operator should be as close as possible to the center of the true edge.(iii) Only one response to a single edge (C3): each edge in the image should produce a single response.
Besides, Canny also proposed a scheme for combining the outputs from different scales.His strategy is fine-to-coarse and this method is called feature synthesis.It was observed in [7,8] that, at a single scale, the Canny edge detector is equivalent to the detection of the local maxima of the wavelet transform of an image, for some particular choices of the analyzing wavelet.However, the edges are still obtained by globally threshold processing.In this kind of approaches, many edges are extracted accompanying noises when the threshold value is too low; fewer significant edges remain when the threshold value is too high [9].A richer description can be obtained by considering the response of the image to a family of filters of different scales and orientations.In the past, many multiscale methods based on wavelets have been successfully applied to the analysis and detection of edges [7][8][9][10][11][12].For instance, Mallat et al. [7,8] have implemented multiscale Canny edge detection with the discrete wavelet transform and proved that the wavelet transform modulus maxima detect all the singularities of a function.Shih and Tseng [9] proposed a two-stage edge extraction approach with contextual filter edge detector and multiscale edge tracker for further analysis of computer vision.Heric and Zazula [10] presented an edge detection algorithm using Haar wavelet transform and signal registration.Brannock and Weeks [11] have proposed an edge detection method based on the discrete wavelet transform (DWT), which combines DWT with other methods to achieve an optimal solution to the edge detection problem.Jiang et al. [12] propose a set of simplified version of Gabor wavelets and an efficient algorithm for extracting the features for edge detection.For one-dimensional (1D) piecewise smooth signals, wavelets have provided an optimal representation for these signals in a certain sense.However, natural images contain intrinsic geometrical structures that are key features in visual information.As a result of a separable extension from 1D bases, wavelets in two-dimension (2D) are not very effective in representing images containing distributed discontinuities such as edges.In addition, separable wavelets have a limited capability in dealing with directional information: an important and unique feature of multidimensional signals.These disappointing behaviors indicate that edge detection based on wavelet methods cannot get satisfactory results.
In recent years, Do and Vetterli [13] proposed an efficient directional multiresolution image representation method, namely, contourlet transform.Indeed, unlike wavelet, the contourlet transform has good direction sensitivity, the anisotropy, and can catch accurately the image edge information.Owing to the geometric information, the contourlet transform achieves better results than wavelet transform in image analysis applications such as enhancement and denoising.Due to downsampling and upsampling, the contourlet transform lacks shift-invariance, which is desirable in many image applications such as edge detection, contour characterization, and image enhancement [14].da Cunha et al. [15] propose an overcomplete transform called the nonsubsampled contourlet transform, which is a fully shift-invariant, multiscale, and multidirection expansion that has better directional frequency localization and a fast implementation.
In this paper, a novel multiscale edge detection method based on the NSCT is proposed.Our edge detection approach has similarities with a number of other methods based on curvelet, complex wavelet, and shearlet.By contrast, the proposed approach is very competitive for detecting both the location and orientation of edges and can effectively reduce the noise influence and produce proper edges for images.
The rest of this paper is organized as follows.Section 2 introduces NSCT and -means clustering algorithm.The proposed algorithm is described in Section 3. Section 4 shows the experimental results to demonstrate the effectiveness of the proposed method.Section 5 concludes this paper.

NSCT and 𝐾-Means Clustering
In this section, we will briefly review the theory of the NSCT and the -means clustering, presented in [14,15,17] and in [18,19], respectively.They will be applied in the subsequent sections.

NSCT.
The NSCT is a shift-invariant version of the contourlet transform.The contourlet transform employs Laplacian pyramids (LPs) for multiscale decomposition and directional filter banks (DFBs) for directional decomposition [20].Due to downsamplers and upsamplers present in both the LPs and DFBs, the contourlet transform is not shiftinvariant.To achieve the shift-invariance, the NSCT is built upon nonsubsampled pyramids and nonsubsampled DFBs as shown in Figure 1.
(1) Nonsubsampled Pyramid (NSP).The nonsubsampled pyramid structure ensures the multiscale property of the NSCT.The building block of the nonsubsampled pyramid is two-channel nonsubsampled 2D filter banks.Figure 2(a) illustrates the nonsubsampled pyramid decomposition with  = 3 stages [15].The perfect reconstruction condition is given as x H 0 (z) where  0 () is low pass decomposition filter, where  0 () and  1 () denote the low pass filter and the corresponding high pass filter at the first stage, respectively.The ideal frequency response of the nonsubsampled pyramid is given in Figure 2(b).
(2) Nonsubsampled Directional Filter Bank (NSDFB).The nonsubsampled DFB structure gives varying directions.The building block of the nonsubsampled DFB is also twochannel nonsubsampled 2D filter banks.The NSDFB is constructed by eliminating the downsamplers and upsamplers in the DFB [15].To achieve multidirection decomposition, the NSDFB is iteratively computed.Figure 3 illustrates a four-channel directional decomposition with two-channel fan filter banks.The equivalent filter in each channel is given by (3) Combining the Nonsubsampled Pyramid and Nonsubsampled Directional Filter Bank in the NSCT.The NSCT is constructed by combining the NSP and the NSDFB as shown in Figure 1(a).The NSP provides multiscale decomposition and captures the point discontinuities.The NSDFB provides directional decomposition and links point discontinuities into linear structures [21].This scheme can be iterated repeatedly on the low pass subband outputs of nonsubsampled pyramids.The NSCT is very suitable for image edge detection because it has such important properties as multiresolution, localization, shift-invariance, and multidirection.In this paper, the "maxflat" filters and the "dmaxflat7" filters are, respectively, selected for NSPs and NSDFBs.The details of design for filters are referred to [15].

𝐾-Means
Clustering.The -means is a classical unsupervised clustering algorithm [22].As an unsupervised technique, -means clustering is effective in producing clusters for many practical applications such as image segmentation and image compression and determining the boundaries of the objects in an image.The aim of -means algorithm is to divide  points in dimensional space into  clusters so that the within-cluster sum of squares is minimized and the between-cluster sum of squares is maximized.Suppose we have a set of  data points . ., } is cluster centers.The measure of the distance is commonly chosen as Euclidian distance to minimize the following mean square error (MSE) cost function: where  , is a binary variable that indicates the cluster assignment of the point; if   is assigned to cluster  then  , = 1 and  , = 0, for  ̸ = ,  = 1, 2, . . ., .In order to minimize (4) we need to find the set of cluster ownerships { , } and the set of cluster centers {  }.Starting from a set of initial values of   , this can be accomplished through an iterative procedure that interleaves between optimizing with respect to the cluster ownerships, keeping the cluster centers fixed, and optimizing with respect to the cluster centers while keeping the ownerships fixed [18].The -means algorithm can be summarized as follows.
Step 1. Generate the starting condition by defining the number of clusters and randomly select the initial cluster centers.For instance, select a set of  candidate cluster centers { 1 ,  2 , . . .,   }.
Step 2. Assign each   to its nearest cluster center   .
Step 3. Recalculate the cluster centers to the mean value of the points in each cluster.
Step 4. Repeat Steps 2 and 3 until a distance convergence criterion is met.

Research Methodology
In this section, the proposed multiscale edge detection algorithm will be described in detail.This includes the feature classification using -means clustering algorithm, the NSCT modulus maxima processing, and the edge tracking for coarse-to-fine.A schematic representation of the proposed method is shown in Figure 4.

Feature Classification Using 𝐾-Means
Clustering.In this section, we will first analyze and discuss the coefficients distribution of the NSCT in different subbands at a fixed level, for some typical locations.Then, we describe the feature classification using -means clustering.As described in Section 2.1, the NSCT is a fully shift-invariant, multiscale, and multidirection transform.It provides not only multiresolution analysis, but also geometric and directional representation [14]; thus the NSCT can efficiently capture the geometrical structures in natural images.Since edges are geometric structures, while noises are not, we can use the NSCT to distinguish them.The NSCT is shift-invariant, which leads to each pixel of the transform subbands corresponding to that of the original image in the same spatial location [15].Therefore, we gather the geometric information from the NSCT coefficients for some typical spatial locations.Consider a simple image  containing edges and smooth regions, like the one illustrated in Figure 5(a), and let NSCT [, , ] be the NSCT of , where  is the scale variable,  is the direction variable, and  is the location variable.We examine the behavior of the NSCT [, , ], at a fixed scale  0 , for some typical locations  0 .As Figure 5 shows, we can classify pixels into three categories by analyzing the plot of   0 () = |NSCT ( 0 , ,  0 )|, namely, strong edge points, weak edge (near edge) points, and smooth (noise) points.For instance, at the point  0 =  on smooth edges and the junction point  0 = ,   0 () have the large magnitude coefficients in all subbands (strong edge points); at the point  0 = ,   0 () has large magnitude coefficients in some directional subbands and small magnitude coefficients in other directional subbands within the same scale (weak edge points); at the point  0 = ,   0 () has small magnitude coefficients in all subbands (smooth points).The same behavior holds for more general images, even in the presence of additive white Gaussian noise.
This suggests devising a strategy for classifying strong edge points, weak edge points, and smooth points based on the plot patterns of Figure 5(f).Let  0 be a fixed (fine) scale.In this paper, we choose to use two statistical feature variables (the energy and the maximum denoted by  and Max, resp.) of the coefficients for each pixel across directional subbands at this level.The main reason is the magnitude of the NSCT coefficients at the points along the boundaries (including strong edges and weak edges) which is significantly larger than at the other points.On the one hand, the  value has large difference to some boundary points and smooth (noise) points; they can be easily separated by looking at the energy function at : On the other hand, the  value is similar to other weak edge points and smooth points but the Max with large difference; they can also be easily separated by looking at the maximum function at : Therefore, we construct a two-dimensional feature vector (, Max) based on the energy value  and the maximum Max for each pixel as shown in Figure 6.Next, using the (, Max) as the feature function, we apply the -means clustering algorithm, as described in Section 2.2, to cluster the image points into three sets  1 ,  2 , and  3 .Let , , and  denote the set of points   max ,   middle , and   min , respectively.This can be defined by where  is the set of strong edge points and  is the set of smooth points.The remaining set  contains the weak edge points.In order to obtain a complete edge, we reserve the sets  and , and the set  is set to 0. We denote the original edge map of an image  as (, ) = {(),  ∈  ∪ } for  = 1, 2, . . ., , where ∪ represents union.Figure 7 illustrates the application of the feature classification using -means clustering algorithm to a test image.The results show that the feature classification using -means clustering can effectively distinguish nonedge points from edge points.

Identifying of the NSCT Modulus Maxima.
In this section, we will describe the modulus maxima processing in NSCT domain.The theory of modulus maxima to edge detection lies in the fact that modulus maxima carry a significant degree of information about the position of edges.Mallat and Zhong [7] proved that the wavelet transform modulus maxima detect all the singularities of a function.They calculated the local maxima of discrete wavelet transform at each scale and formed a multiscale edge representation of an image.In what follows, we apply the modulus maxima in the NSCT domain.
The NSCT provides a multidirectional structure, which can be used to efficiently obtain the edge direction for each pixel point.Figure 8 illustrates an eight directions decomposition of the NSCT.As in the wavelet method, we will select the edge point candidates of an image  by identifying the local maxima of the function (, ) at fine scales.The NSCT modulus maxima algorithm consists of the following steps.
Step 5. Input the raw edge subband (, ) at each scale.The (, ) is obtained in Section 3.1.
Step 6. Estimate the edge direction (, ) by  (, ) = arg max      NSCT  (, , )     .In order to compare modulus values in any direction, we use the interpolation method (in our case, linear interpolation).
Step 7. At each edge point candidate, the magnitude of (, ) is compared with the values of its neighbors along the gradient direction (this is obtained from ( 8)).If the magnitude is the largest, the point is kept.Otherwise, it is discarded and (, ) is set to 0. These produce a set of possible edge points denoted by {(, ),  = 1, 2, . . ., }, where  is the decomposition level of the NSCT.The results of the NSCT modulus maxima processing are shown in Figure 9.We can find that the NSCT modulus maxima processing can effectively remove pseudoedge responses.from the multiscale edge images because very few works propose techniques for combining multiscale information.

Edge Tracking. After modulus maxima processing, it is still a challenging problem to extract accurately edge image
To this end, we propose to combine the edges appearing at different scales by performing coarse-to-fine edge tracking.This algorithm is used to extract the complete and true edges and improve the robustness against noise.According to the multiscale property of the NSCT, for an image , it provides scale space decomposition in the frequency domain.
In general, the fine scales tend to produce results with a higher spatial accuracy, but also to be particularly sensitive to noise.Alternatively, the coarse scales are more robust against noise but tend to suffer from displacements of the edges from their true positions [23].As shown in Figure 9, we observe that the nonedges disappear and the true edges progressively move away from their true positions and eventually merge or vanish when scale  increases.Besides, we also found that the displacements of the edges are along its gradient direction and the distance  = 1 commonly between two consecutive scales.We chose Chebyshev distance as a measure of distance.
We can see that the finest scale edge image has the problem of double edges and broken edges as shown in Figure 9(d).Therefore, we replace the finest scale edge image with canny edge image to obtain a complete and continuous edge image.
The following sections describe whole edge tracking algorithm process in detail.
We summarize our edge tracking algorithm as shown in Algorithm 1.
The edge tracking results at each scale are shown in Figures 10(a)-10(d).We can observe that spurious responses are removed and produce edge results with a higher spatial accuracy.Meanwhile, we can also find that some significant edges are removed at finest scale but it is retained at subfinest scale.Therefore, we integrate next the edges at finest scale and the significant edges by removing small fragments at  subfinest.To this end, the final edge map is obtained by the edge subsequent processing (e.g., edge connector and thinning processing).The final edge image using the proposed approach is shown in Figure 10(e).

Experiments and Results
In this section, we intend to compare the performance of the proposed method with some conventional edge detection algorithms, including the Sobel edge detector [2], the LoG edge detector [24], the Canny edge detector [5], and the M-Sobel edge detector [23].In the experiments we have used the test set of the Berkeley segmentation dataset (BSDS) because it has the advantage of providing multiple human labeled segmentations per image as shown in Figure 11.Thus we use them as ground truth for the quantification of the quality of the edge detection results and these images have a resolution of 481 × 321 pixels.We first use different scale decomposition values of the NSCT to generate the final edge images for testing the proposed approach, which are [2,2,3,3], and [2,3,3,3].The extracted edges using the proposed approach are shown in Figure 12.We can observe that the edge detector can reduce the noised influence with the increase of the number of scale decomposition of the NSCT, especially for noisy image as shown in Figures 12(f), 12(g), and 12(j).As previously described in Section 3.3, with the NSCT scale changing from finer to coarser, the nonedges disappear and the true edges progressively move away from their true positions and eventually merge or vanish.So the slight improvement comes at the cost of a higher computational complexity with the increase of the number of scale decomposition.In addition, we also found Mathematical Problems in Engineering that, when the number of scale decomposition is the same, the extracted edges are more complete in the case of 2 3 direction decomposition.The simulation results are shown in Figures 12(c)-12(e) and Figures 12(h)-12(j).Therefore, we use four scales of decomposition of the NSCT for the following experiments, and the direction numbers of decomposition are 2, 3, 3, 3 from coarser to finer, respectively.
Then we have compared the performances of the proposed method with the Canny method, the LoG method, the Sobel method, and the M-Sobel method.Figure 13(a) is the original images, and the edge detection results using different methods are shown in Figures 13(b)-13(f).We can see that the Canny method and LoG method detect many false edges when they try to obtain more real edges, which can be seen in Figures 13(b) and 13(c).The Sobel method has better edge detection results and less false edges.However, the detected edges are usually incomplete which can be clearly observed in Figure 13(d).Figure 13(f) shows the detection results using the proposed method.It can be seen that our method can remove the false edges and keep the real edges as much as possible.In addition, we can also find that the proposed method compared with the M-Sobel method (Figure 13(e)) has similar or better edge detection results for original images.Meanwhile, the edge detection results of our method are more complete and continuous than the edge images detected using the other three methods.These are mainly because the NSCT has the ability to fully capture directional and other geometrical features for images with edges.
Next, experiments are tested to validate that the proposed approach is efficient to the images with Gaussian white noise, where the noise variance is  2  = 0.001 in our experiments.Figure 14(a) is the original images and the extracted results using different methods for noisy images are shown in Figures 14(b)-14(f).We can find that the proposed approach has better robustness against noise than the other four methods.The main reasons are as follows: in general, fine scales of the NSCT provide spatially accurate results but are particularly sensitive to noise; meanwhile, coarse scales of the NSCT have better robustness against noise but tend to suffer from displacements of the edges from their actual position.Thus we achieved better edge detection results by performing coarse-to-fine edge tracking presented in this paper.Finally, in order to objectively evaluate the edge detection performances of the proposed approach and other methods, we use the classification inspired methodology by Martin et al. [25].In essence, edge detection can be seen as a binary classification problem.Therefore, it can be evaluated in terms of success and fallout, comparing the candidate edge image (generated by an edge detection method) with the ground truth (generated by a human).According to the ground truth, the pixels in the candidate edge image can be classified into four different categories: true positive (TP), false positive (FP), true negative (TN), and false negative (FN) [6].In this way, we also build a confusion matrix as shown in Table 1 [23].From the confusion matrix we extract the precision and recall evaluations, defined as The precision and recall measures hold good stability properties and illustrate specific aspects of the problem.Therefore, we use the F-measure by Martin et al. [25] to evaluation of the overall quality of an edge image, defined as where  is a parameter weighing the relative impact of the precision and recall evaluations.In addition, Pratt's figure of merit (PFoM) [26] is used for quantitative evaluation of spatial accuracy, defined as where   is the number of the candidate edges and  gt is the number of the ground-truth edges.() denotes the Euclidian distance from the th candidate edge to the corresponding ground-truth edge. is a constant, generally In this work, we use a one-to-one pixel matching algorithm to map the edge pixels in the candidate edge image and the ground truth.This matching allows for a certain spatial tolerance (in our case, as much as 0.5% of the diagonal of the image), so that an edge pixel can be slightly moved from its true position, yet being considered as correctly classified.For a given image, the result of each edge detection method is compared with several ground-truth images, each one producing a tuple of evaluations ( 0.5 ,   ).Then, the comparison producing the best quality assessment, in terms of  0.5 , is the one we select.Tables 2 and 3 show the results obtained of each detector in Figures 13(a) and 14(a), respectively.The results show that the proposed approach is quantitatively better than all other methods.More specifically, the proposed method can achieve better results with the typical methods for original images.In particular, it also works well for noisy images and the advantages are obvious.

Conclusions
In this paper, we developed a multiscale edge detection approach based on NSCT and edge tracking.This method is able to distinguish the real edges from the noise and overcome the shortcoming of the limited directions of wavelet transform.In order to produce the final edge image, we have proposed a coarse-to-fine edge tracking algorithm, to improve the edge localization accuracy and yield better edge detection results.Experimental results demonstrate that the proposed approach achieves better edge detection performance compared with the typical methods.In particular, it provides an accurate method for extracting the information about edges and their locations even in presence of noise.Nevertheless, the proposed method has a higher computational complexity due to the NSCT.

Figure 4 :
Figure 4: Schematic representation of the proposed method.

Figure 5 :Figure 6 :
Figure 5: The coefficients pattern for some typical locations across directional subbands at a fixed scale.(a) Representative points on the test image.(b)-(e) NSCT pattern (), as a function of  for the points indicated in (a).(f) Representative points on the test image.

Figure 7 :
Figure 7: Results of feature classification using -means clustering.(a) Original image.(b) Low pass subband image of the NSCT.(c)-(f) The results of feature classification of high pass subbands from coarser to finer.

Input:Figure 8 :
Figure 8: The gradient map with directional decomposition number is 2 3 .

Figure 9 :
Figure 9: (a)-(d) Results of the NSCT modulus maxima processing from coarser to finer.

Figure 10 :
Figure 10: (a)-(d) The results of the edge tracking from coarser to finer.(e) The final edge map on original in Figure 7(a).

Figure 11 :
Figure 11: Berkeley segmentation data set [16].Top to bottom: image and ground-truth segment boundaries hand drawn by three different human subjects.The BSDS500 consists of 300 training and 200 test images, each with multiple ground-truth segmentations.
1 () is high pass decomposition filter,  0 () is low pass reconstruction filter, and  1 () is high pass reconstruction filter.The ideal frequency support of the low pass filter at the th stage is the region [−/2  , /2  ]

Table 1 :
Confusion matrix for the edge detection evaluation.

Table 2 :
The  0.5 and   values, where the 1, 2, 3, and 4 denote top, middle top, middle bottom, and bottom original images in Figure12(a), respectively.

Table 3 :
The  0.5 and   values, where the 1, 2, 3, and 4 denote top, middle top, middle bottom, and bottom noise images in Figure13(a), respectively./9 for keeping the values in [0, 1].A better approach possesses larger   and   values.