Effective Volumetric Feature Modeling and Coarse Correspondence via Improved 3 DSIFT and Spectral Matching

This paper presents a nonrigid coarse correspondence computation algorithm for volumetric images. Our matching algorithm first extracts then correlates image features based on a revised and improved 3DSIFT (I3DSIFT) algorithm.With a scale-related keypoint reorientation and descriptor construction, this feature correlation is less sensitive to image rotation and scaling.Then, we present an improved spectralmatching (ISM) algorithmon correlated features to obtain a one-to-onemapping between corresponded features. One can effectively extend this feature correspondence to dense correspondence between volume images. Our algorithm can benefit nonrigid volumetric image registration in many tasks such as motion modeling in medical image analysis and processing.


Introduction
Establishing spatial correspondence between volume images is an important and fundamental computational problem in 3D image and video processing.It can greatly benefit many tasks such as nonrigid image registration [1][2][3][4], registration evaluation [5][6][7][8], volumetric ultrasound panorama [9], object recognition [10], and three-dimensional classification [11], to name a few.In biomedical computation, many medical scans (CT and MRI) are volumetric images, whose spatial and temporal correspondence are critical in clinical monitoring, diagnosis, and treatment planning [12][13][14].However, robust volumetric images registration is often challenging due to the nonrigid characteristics of the deformations of the organs and tissues.
A nonrigid registration is often formulated as a nonlinear optimization which is usually nonconvex and very expensive to solve.Feature correspondence suggests desirable initial guess to this optimization problem which can improve the computation efficiency.Furthermore, it can be adopted as an extra guidance in the dense registration computation to overcome many undesirable local minima and obtain a reliable global solution.
In 2D image registration, the SIFT descriptor and its variants [15,16] are widely adopted feature modeling methods due to their great robustness and discrimination power.3DSIFT [17][18][19] generalizes the 2D SIFT into volumetric images and has been demonstrated to be effective in volume image matching problems.However, the existing 3DSIFT algorithms are still inadequate in handling dynamically deforming volumetric images such as aforementioned medical images, due to their sensitivity against image rotation and scaling.Also, the extracted feature correspondences using local descriptors are usually many-to-many and include mismatches.We need a reliable correspondence selection/refinement algorithm to remove mismatches and obtain an objective mapping between feature point sets extracted from the source and target images.In the recent computer vision literature, such a correspondence refinement procedure is often formulated as a graph matching problem; Leordeanu and Hebert [20] proposed an efficient and effective spectral algorithm to solve this graph matching problem.

Mathematical Problems in Engineering
In this work, we propose a volumetric image matching algorithm in a three-step pipeline: (1) feature extraction and initial correlation, (2) feature matching and correspondence refinement, and (3) extending the coarse correspondence (feature matching) to dense intervolume correspondence.Specifically, consider the following: (1) we develop an improved 3DSIFT (I3DSIFT) descriptor that is less sensitive to rotation and scaling, to reliably extract feature correspondences between given volumes; (2) we propose an improved spectral matching (ISM) algorithm that can more effectively refine the initial feature correlation and obtain correct matching between corresponding features; (3) we implement a spline fitting algorithm to extend these coarse correspondences to dense intervolume correspondences.
The remainder of this paper is organized as follows.We briefly review related work in Section 2. After giving an overview of our algorithm pipeline in Section 3, we elaborate the three steps of this pipeline in Sections 4, 5, and 6.Some volumetric matching results are shown in Section 7 and one application on the motion modeling of medical images is shown in Section 8.

Related Work
Our algorithm of computing dense correspondences between volumetric data is guided by coarse correspondences between features.The major challenges and our contributions in this work lie in the robust extraction and matching of features.Therefore, we mainly discuss related work in these fields.

Extraction of Correlated Features.
Volumetric feature extraction is the process to build coarse correspondences between volume images.It has been widely studied.
Scovanner et al. [17] first proposed the 3DSIFT descriptor and showed its application in action recognition.Their work is only on 3DSIFT descriptor construction and does not suggest feature correspondences construction.Furthermore, their descriptor is sensitive to scaling and rotation change.Ni et al. [9] implemented a 3DSIFT algorithm for feature extraction and applied it in volumetric ultrasound panorama.They extended scale space extreme detection and dominant direction reorientation step of SIFT to 3DSIFT.However, they did not utilize scale information so that their method is sensitive to scaling change.Cheung and Hamarneh extended SIFT to the so-called N-dimension SIFT (N-SIFT) [18].They demonstrated its effectiveness using synthetic experiments on volumetric MRI and 4DCT data matching.However, N-SIFT algorithm is still sensitive to both scaling and rotation.Allaire et al. [19] proposed a full 3DSIFT algorithm.They filtered out potential keypoints that either have low contrast or locate on edges or ridges.But many correct corresponding keypoints are filtered out unnecessarily.As a result, only 0.1%∼8% of potential keypoints were kept, which is sometimes too few for reliable volumetric matching.Also, this 3DSIFT is sensitive to scale change.Cheung and Hamarneh [18] also proposed the global histogram feature and reoriented global histogram feature.Both of them use single histogram on a keypoint region, while the latter has an additional reorientation step.The reoriented global histogram feature normalizes the direction of the gradients with respect to the highest peak in the histogram to increase robustness to orientation change.Unfortunately, both are less discriminative than N-SIFT for the lack of space information.Han [21] proposed a method based on 3DSURF.It is based on the partial derivatives of local regions and their absolute values.Therefore, 3DSURF is sensitive to noise.

Feature Correspondence
Refinement.Feature extraction and matching algorithms depending on local descriptors usually compute a set of correspondences between feature pairs.Some correspondences are many-to-many and some are mismatches.In order to provide correct and one-toone matching for the subsequent dense correspondence computation, a refinement is needed.The RANdom SAmple Consensus (RANSAC) algorithm [22,23] is a widely adopted randomized algorithm that iteratively estimates correspondences and eliminates outliers.It starts with a small feasible dataset and enlarges this set with consistent data correspondences when possible.The RANSAC algorithm is nondeterministic and is only stable in a probabilistic sense.In Xu et al. [24], we used 3DSIFT to extract features and applied RANSAC to obtain the feature matching.
Spectral matching [20,25], on the other hand, is a deterministic algorithm that has been reported with better efficiency [20].In spectral matching, a correspondence set is treated as a bipartite graph.It is based on the observation that correct matches corelate with each other strongly, while incorrect matches corelate with each other accidently.Despite its successful application when the bipartite graph is connected, how well the algorithm works when the graph is not connected (which is the case in our pipeline) is not clear.

Algorithm Overview
In the paper, we propose a pipeline to find dense correspondences between volumetric images.The pipeline has three steps.
(1) Given two volume images   and   , we extract an initial coarse correspondence set (  ,   ) using a revised 3DSIFT (I3DSIFT) algorithm.This initial coarse correspondence set  contains all the potential correspondences between features on   and features on   .In the following  is called the potential correspondence set or initial correspondence set (Section 4).
(2) Some correspondences in  are mismatches; therefore, we perform a revised spectral matching (ISM) algorithm to filter them out and get a refined coarse correspondence set   (Section 5).(3) Finally, we perform a B-spline fitting algorithm, guided by   , to compute the dense correspondences between   and   (Section 6).

Revised 3DSIFT
As discussed in Section 2, a reliable feature extraction algorithm should be insensitive to scaling and rotation change.
We implement a revised 3DSIFT (I3DSIFT) algorithm consisting of four main steps: keypoint detection, orientation assignment, direction reorientation, and descriptor construction.Figure 1 illustrates the feature extraction through these steps.The main changes on I3DSIFT are in the direction reorientation and descriptor construction.We propose a scale-related region definition method in these two steps.Moreover, 5-linear interpolation and feature normalization are introduced in the descriptor construction step to improve the robustness.Compared with the original 3DSIFT with fixed-size region definition, I3DSIFT is insensitive to rotation and scaling transformations.We will elaborate these four steps in the following.Keypoint detection is an essential procedure in volumetric image matching.In 3DSIFT, scale space extrema are identified as keypoints.The detection is based on the scale space theory [26,27].In real implementation, DoG (difference of Gaussian) [15,28] is usually adopted to detect keypoints.Given a volume image, its DOG images can be defined as a stack of images that are smoothed using series of DoG functions with the variable .For efficiency, an image pyramid with  octaves and  levels per octave is implemented to represent DoG images.The extremes of 3 × 3 × 3 × 3 across scale in each octave are defined as keypoints.The  of the Gaussian image in which the keypoint shows up is selected as the keypoint's scale.For example, a keypoint that shows up on the th level image, its scale  is assigned as  = 2 / .The scale of a keypoint can represent its surrounding feature region size (e.g., keypoint with big scale has a big feature region and vice versa).Therefore, we implement a scale-related region definition method in the following direction reorientation and descriptor construction steps.
Direction reorientation is an essential procedure to deal with image rotation.It includes two steps: dominant direction definition and keypoint region reorientation.In the previous 3DSIFT algorithms [17], the size of the keypoint regions is defined as a constant (e.g., sized 16 × 16 × 16).However, this definition does not distinguish keypoints with different feature regions.Keypoints with small feature regions have the same size of keypoint regions with the large ones.Accordingly, the definition of the dominant direction is not accurate.In the paper, inspired by SIFT [15], we implement a direction reorientation step that is related to scale.Our size of the keypoint region for orientation definition is defined as  * , where  is a user-defined parameter and  is the scale of the keypoint defined above.The definition of  is related to the content of volume images.In our lung CT images application, we set  = 6.Consequently, this definition is more accurate.
Descriptor construction is to build description for each keypoint.In order to design a 3DSIFT descriptor that is insensitive to scale change, similarly, we adore a scale-related descriptor region.We set the region size 12 × 12 × 12.After subdivision of the descriptor region into 4 × 4 × 4 subregions, we add magnitudes of the pixels in each subregion (its size is 3) according to its  and .In order to avoid all boundary effects that the descriptor abruptly changes as a sample shifts from being within one subregion to another or one orientation to another during this step, we implement a 5-linear interpolation method to interpolate within three subregion dimensions (, , and ) and two gradient orientations ((, , ) and (, , )).Each entry into a bin is multiplied by a weight of 1− for each dimension, where  is the distance of the sample from central value of the bin as measured in units of the histogram bin spacing.Then, we join the feature of these subregions and get the descriptor.In order to discard information due to noise or illumination, we normalize the descriptor to [0, 1] and discard features below a threshold (0.2 in the paper) by setting it as 0. Finally we normalize the descriptor into [0, 1] and get the final descriptor.

Revised Spectral Matching
After initial feature extraction and matching (Section 4), we obtain an initial correspondence set , indicating potential matches between features on   and   .To filter out incorrect matches, we can use the spectral matching algorithm.Compared with the well-known RANSAC [22] algorithm, spectral matching is in general more efficient [20,25], especially when a good initial correspondence set  is given.

Basic Idea of Spectral
Matching.Spectral matching is an efficient spectral method to find consistent correspondences in an initial correspondence set .An assignment graph  is assigned to represent .Each node represents a correspondence, while the weights on the edges represent the pairwise agreements between the correspondences. can be represented by an affinity matrix .The definition of  is shown as follows: where  = (,   ) and  = (,   ) are two correspondences from the potential correspondence set.  is the distance between points  and , the same as      . is a nonnegative symmetric matrix and each   encodes the pairwise agreement of right matches between  and  correspondences.
The basic idea of spectral matching is from an observation: correct assignments are likely to establish link with each other and thus form a strongly connected cluster, while incorrect correspondences establish links with others only accidently so that they are unlikely to belong to strongly connected clusters.Accordingly, correct matches form a strong cluster in , while incorrect matches do not.This phenomenon can be illustrated by the principal eigenvector of  (denoted as V  ).The elements of V  represent the confidence of each match.Correct matches have big values in V  , while incorrect matches have small ones.Consequently, a greedy selection method based on V  is presented to pick out the correct matches.Intuitively, the matches are picked based on the elements from big value to small value, while the one-one matching constraint is considered.

A Revision on Spectral
Matching.The basic idea of spectral matching is strongly based on the analysis of the affinity matrix  and the assignment graph .When most of the elements in  are larger than 0,  usually corresponds to a connected graph; when most of them are 0,  usually corresponds to a unconnected graph.
The basic spectral matching algorithm works well when  is highly connected.However, when a discriminative descriptor was used to establish the potential correspondence set, the mismatch pairs have been significantly reduced, and  is often sparsely connected or even not fully unconnected (see Figures 2(a)-2(c)).For such graphs, the basic spectral matching algorithms have the following problems.
The first problem is the theoretical error on unconnected graphs for spectral matching.As was mentioned above, spectral matching can be illustrated by the principal eigenvector of .However, this is not true when  is not a primitive matrix (i.e.,  corresponds to a unconnected graph) according to the Perron-Frobenius theorem [29].Directly applying spectral matching on unconnected graphs is theoretically unsupported.
The second problem is the failure on filtering out unobvious wrong matches.Spectral matching considers all the positive values in V  .It can filter out those matches whose values equal zero or which do not meet the matching constraint.However, when wrong matches have some connection with others they have positive value in V  .Spectral matching can not filter them out correctly.
The existing spectral matching algorithms [20,25] can not work well when  is unconnected, specifically, when initial correspondences in  are almost one-one matching.In this work we present a revised spectral matching (ISM) to overcome these problems (see Figures 2(d First,  is decomposed into several submatrices.Suppose that  has ,  ≥ 1 connected components; it can be decomposed into  primitive submatrices.Hence, it is theoretically supported to apply spectral matching on each submatrix.Second, a threshold is given to filter out unobvious wrong matches.In particular, we add a reject ratio ,  ∈ [0, 1] to discard mismatches. represents the rate of matches that we want to discard.A small  could not filter out mismatches, while a larger  leads to a higher accuracy but could reject many correct matches at the same time.We find that a suitable  should be corelated with the accuracy rate  of .We always choose  = 1 −  or a slightly larger value.The accuracy rate  can always be estimated by randomly choosing a small amount of correspondences for verification.For example, in our biomedical application (see Section 8), we model the deformation of lung volume images during the respiratory cycles, where organs will not deform too drastically.And medical therapists suggest that correspondences should not have the offset more than 3 voxel in -direction and 20 voxels in or -directions.These type of rules can be used as guidance to estimate  and therefore .

Feature-Guided Image Registration
This algorithm is based on our previous paper [24].However, unlike [24], which focuses on solving a continuous parameterization and smooth trajectory of an image sequence, we solve this registration to evaluate the effectiveness of our coarse correspondence results.Hence we used a simplified model with only the intensity term and regularization term.The computation of this model is hence simpler.
Assume that we have two volume images.One image, the moving image   (x), is deformed to fit the other image, the fixed image   (x).  (x) and   (x) are defined on their own spatial domain: Ω  ⊂  3 and Ω  ⊂  3 , respectively.We are going to find a transformation T(x) = x + u(x) that makes   (()) spatially aligned to   ().The transformation is defined as a mapping from the fixed image to the moving image; that is, : Ω  ⊂  3 → Ω  ⊂  3 .The quality of alignment is defined by a cost function  with a regularization term .In here, we will adopt the sum of squared differences with a feature regularization term: where  weighs similarity against feature constraints.To solve the above minimization problem, there are basically two approaches: parametric and nonparametric.In here we will consider the parametric methods by introducing a parametrization model of the transformation.The original optimization problem thus becomes where the subscript  indicates that the transformation has been parameterized.The vector  contains the values of the "transformation parameters." In here, we use the nonrigid Bspline transformation.Consider with   the knot points,  3 () the cubic multidimensional B-spline polynomial,   the B-spline coefficient vectors (the control point displacements),  the B-spline control point spacing, and   the set of all knot points within the compact support of the B-spline at .The parameters  are formed by the B-spline coefficients   .
The cost function is defined as with Ω  the domain of the fixed image   and |Ω  | the number of voxels.Given a transformation , this measure can easily be implemented by looping over the voxels in the fixed image   (  ), calculating   (  (  )) by interpolation, and adding the squared difference to the sum.Basing on our proposed feature matching algorithm, we can get the feature correspondences between two volume images.They will guide the spline-fitting.The regularization term therefore minimizes the distance of two corresponding feature points: where |{   }| indicates the size of the correspondence set and    ,    are corresponding features from the fixed and moving images, respectively.

Experimental Results
We denote the first two steps of our algorithm (that use I3DSIFT and ISM) as I3DSIFT-ISM.This section demonstrates the effectiveness of our feature matching (the first two steps of our algorithm).In our experiments, we generate an image pyramid with  = 3 octaves and  = 6 levels, where we set  = 1.6 and  = 2 1/3 .We set  = 0.2 for I3DSIFT-ISM.Our experiment materials are lung volumetric images.We perform experiments on two sets of data: 1 and 2.1 includes 3 real volume images (SE 1, SE 5, and SE 8) from different respiratory periods of the same person.2 is from the public POPI data [30].The volume images in 1 have a resolution of 465 × 465 × 20, while images 2 are 482 × 360 × 141.In preprocessing, we extract region of interests (ROI) through the graph-cut image segmentation and rescale the gray level of each volume image to [0, 255].
In our experiments, we detect scale space extremes as keypoints.We match descriptors based on NN (nearest neighborhood) search and only those matches whose ratio of NN distance and the second NN distance is smaller than a threshold  ( = 0.8 in our paper).We perform the NN search in two directions and a matched pair is kept only if it satisfies the NN optimality in both directions.We conduct NN search using a k-d tree algorithm for the sake of efficiency.We adopt a 1.5 voxel matching accuracy in the experiments on synthetic images.In the paper, most existing volumetric image matching algorithms are included to make a full range of comparison on volumetric image matching.The algorithms include global histogram (Global) [18], reoriented global histogram (Reor.)[18], N-SIFT [18], and 3DSURF [21].
We implement a 3DSURF descriptor based on the paper [21] and use a source code provided in [18] on global histogram, reoriented global histogram, and N-SIFT.I3DSIFT and I3DSIFT-ISM are implemented in C++ and ITK.We show the performance of I3DSIFT and I3DSIFT-ISM on synthetic images to demonstrate the performance with scaling and rotation change.Moreover, we show results on real data.
7.1.On Synthetic Images.Rotation insensitive is an important property to evaluate volumetric matching methods.In order to demonstrate the performance of I3DSIFT-ISM with rotation change, we do matching on SE 1 (from the first dataset 1) and its synthetic images with transforms of varying angles about -axis.Figure 3 shows the result.In Figure 3, I3DSIFT gets an accuracy about 80% while N-SIFT [18] and 3DSURF [21] will fail when rotation change is large (larger than 30 ∘ ).I3DSIFT outperforms N-SIFT [18] and 3DSURF [21].I3DSIFT performs much better when the rotation angle is times of 10 ∘ , because we define the dominant direction every 10 degrees.As a result, we can change the way we define the dominant direction, for example, every 5 ∘ definition instead.However, we should make a compromise between accuracy and efficiency.I3DSIFT-ISM gets an accuracy of almost 100% in every rotation angle.Above all, I3DSIFT is less insensitive to rotation change to the state of the art, while, furthermore, I3DSIFT-ISM is nearly rotation-invariant.
Scale insensitive is another important property to evaluate volumetric matching methods.In order to demonstrate the performance of I3DSIFT-ISM with scale change, we show matching results on SE 1 and its down-sampling synthetic images by a varying factor along each of the 3 axes.Figure 4 shows the result.I3DSIFT outperforms N-SIFT [18] and 3DSURF [21].There are some further improvement when using I3DSIFT-ISM.I3DSIFT is relatively robust to scale change and I3DSIFT-ISM further improves the robustness.
In conclusion, I3DSIFT-ISM is very robust against rotation and insensitive to scaling.

Matching Real Data.
Our dataset images 1 are volumetric CT images scanned during respiratory cycles.SE  indicates the th frame.We pick three frames 1, 5, and 8, and use I3DSIFT-ISM to match their corresponding pairs: SE 15, SE 58, and SE 18.The gray values of corresponding points vary little (i.e., the lighting condition is stable) in the volume images.Therefore, we compute mean of Sum of Square Difference  to evaluate the matching results.Effective matching should produce small  value.Consider where   and   are the fixed image and the moving image, ( Fi ,  Mi ) is a matching pair between the two images, and  is the match number.Table 1 shows the results.
I3DSIFT finds matches about 2 times of N-SIFT and 1.5 times of 3DSURF.Moreover, I3DSIFT is more accurate.The I3DSIFT-ISM filters out mismatches of the matching result of I3DSIFT.The I3DSIFT-ISM is the most accurate.Figure 5 shows some matching results on volumetric images using I3DSIFT.

Application on Feature-Guided Image Registration
Integrating the step-3, we now apply our whole volumetric matching algorithm on the publicly available POPI dataset.In the 3D volume at time frame , the coherent landmarks (a set of 3D points, denoted as   = { ,1 ,  ,2 , . . .,  ,|  | }) are available and can be used to evaluate the registration.
In our experiments, we adopt the 3D consecutive image registration with the alinement of corresponded features enforced.The registration results are evaluated by the mean where  , is a landmark  in time  and  , is the transformation from time  to time .In our experiments, we set the control weight in (3) as  = 0.1.Table 2 shows the comparison results between feature-guided matching and the registration algorithm without any feature constraints  = 0.By comparing the MTRE errors, we can see that the method with both I3DSIFT and I3DSIFT-ISM outperforms (by 8%) the method that does not consider the feature correspondences.Also, the method with I3DSIFT-ISM outperforms the method with I3DSIFT.This demonstrates that (i) correct feature correspondences (computed by our first two-steps) are very important in improving the volumetric matching accuracy; (ii) better (more matches, higher correctness rate) feature correspondences lead to better matching.
Based on our feature-guided image registration, we can do the dynamic tumor motion modeling.With our volumetric registration between consecutive images, we can build up (by linear interpolation) a temporal deformation model.The motion and dynamics of the tumor can be analyzed using this model.The coherent tracking of the deforming tumor is visualized in Figure 6.

Conclusion
We propose a volumetric feature modeling and coarse correspondence algorithm for volumetric images.The pipeline Table 2: The registration error in mm, on 40 landmarks among POPI-data. , is the matching error from th to th frame;  is the mean error for the whole sequence.We show the result using B-spline fitting without feature as well as the result using B-spline fitting constrained with the correspondence result of I3DSIFT and I3DSIFT-ISM.We demonstrate the effectiveness of our coarse matching computation on both synthetic and real data by showing that our algorithm outperforms the existing volumetric matching algorithm, such as N-SIFT [18], 3DSURF [21], global histogram [18], and the reoriented global histogram [18].Our method can be used for reliable volumetric matching.As its application, we show effective feature-guided image registration using our volumetric matching pipeline.We show that it outperforms the image registration method without feature guidance.
A limitation of the current pipeline is the expensive computational cost in 3DSIFT.Feature extraction based on this needs a convolution in 3-dimensional space so it is quite slow.However, in this multiscale space, the computation can be easily conducted in a parallel manner.In the near future, we will explore its GPU/multithread implementation to achieve more efficient computation.

Figure 1 :
Figure 1: The four-step pipeline of I3DSIFT.(a) A small amount of the detected keypoints is illustrated.(b) The orientation of a pixel is computed.(c) A keypoint's surrounding region is reoriented according to its dominant direction .(d) 3DSIFT descriptor construction.Note that we draw fixed size regions in (c) and (d) for convenient visualization while both steps actually use scale-related region in I3DSIFT.

2 Figure 2 :
Figure 2: An simple example demonstrating the effectiveness of ISM.(a) shows an initial correspondence set .(b) is the affinity matrix computed from .(c) shows the assignment graph  which is unconnected.(d) shows the result of SM.(e) and (f) show the result of two revised steps of ISM, respectively.

Figure 3 :
Figure 3: Accuracy on rotation change.Accuracy comparison is shown on a volume image and its synthetic images with varying rotation angles around -axis.

Figure 4 :
Figure 4: Accuracy on scale change.Accuracy comparison is shown on a volume image and its synthetic images with varying scale change.

Figure 5 :
Figure 5: Matching examples on volumetric images using I3DSIFT.Note that we show the results in planar images for convenience, although I3DSIFT is applied to volumetric images.

Figure 6 :
Figure 6: Temporal image registration guided by our feature matching results in 0th, 5th, and 9th time frame (experiments conducted on CT scans of lung during respiratory cycles).

Table 1 :
Mean of Square Difference  on 3 real volume image pairs.SE  represents a volume image pair, that is, SE  is the fixed image and SE  the moving image.The volume images are from different respiratory periods of the same person.Note that this error is small; for example, out of the gray value range of 255,  = 32.04 indicates a mean gray difference of only 5.66.