Nonrigid 3D Medical Image Registration and Fusion Based on Deformable Models

For coregistration of medical images, rigid methods often fail to provide enough freedom, while reliable elastic methods are available clinically for special applications only. The number of degrees of freedom of elastic models must be reduced for use in the clinical setting to archive a reliable result. We propose a novel geometry-based method of nonrigid 3D medical image registration and fusion. The proposed method uses a 3D surface-based deformable model as guidance. In our twofold approach, the deformable mesh from one of the images is first applied to the boundary of the object to be registered. Thereafter, the non-rigid volume deformation vector field needed for registration and fusion inside of the region of interest (ROI) described by the active surface is inferred from the displacement of the surface mesh points. The method was validated using clinical images of a quasirigid organ (kidney) and of an elastic organ (liver). The reduction in standard deviation of the image intensity difference between reference image and model was used as a measure of performance. Landmarks placed at vessel bifurcations in the liver were used as a gold standard for evaluating registration results for the elastic liver. Our registration method was compared with affine registration using mutual information applied to the quasi-rigid kidney. The new method achieved 15.11% better quality with a high confidence level of 99% for rigid registration. However, when applied to the quasi-elastic liver, the method has an averaged landmark dislocation of 4.32 mm. In contrast, affine registration of extracted livers yields a significantly (P = 0.000001) smaller dislocation of 3.26 mm. In conclusion, our validation shows that the novel approach is applicable in cases where internal deformation is not crucial, but it has limitations in cases where internal displacement must also be taken into account.


Introduction
In many clinical tasks it is necessary to acquire images using different modalities such as magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT), which often provide complementary information on anatomy and tissue function. Combination of these multimodal images can improve the diagnosis by providing synergistic information. Image registration is an important tool for fusion of medical images. It generates an image that simultaneously displays the information of the reference and the registered image. Image registration aims at identifying corresponding points in two images using spatial transform. Due to the spatial difference in the local coordinate systems of images acquired with different imaging modalities registration has to align the images.
Another application of image registration is serial imaging of a patient using the same imaging modality, which is often required for purposes such as treatment planning and monitoring [1], evaluation of disease development [2,3], and tracking of contrast bolus propagation in perfusion studies [4][5][6]. Registration is required to correct for the motion caused by patient movement and respiration and to compensate for the displacement of structures resulting from different patient positions in serial imaging studies. Although many spatial displacements can be traced back to rigid movement and are easy to correct, elastic deformation is required to describe the movement of many anatomic structures such as the liver. Thus nonrigid registration is often required for both repeated acquisitions using the same image modality and examinations using different image modalities.
Rigid transformation, which allows translation and rotation, and affine registration, which allows shearing and scaling, are widely used for image fusion in a clinical setting. However, since the human body is intrinsically deformable, rigid techniques often provide insufficient registration. Thus elastic or nonrigid methods are required to cope with local differences between images. While the number of parameters is limited to six for rigid transformation and to twelve for affine transformation, nonparametric elastic transformation requires a transformation vector for each voxel; that is, the number of parameters is three times that of image voxels. The huge number of parameters generates two basic problems for elastic transformation. First, the computing time is enormous. Second, the intrinsic image information is in general not sufficient to exactly and independently estimate the transformation vector of each voxel.
Therefore, elastic image registration uses models to limit the number of parameters. However, one can also benefit from the fact that the number of effective transformation parameters is actually much smaller. It is obvious that a tissue voxel in general cannot move independently of a neighboring voxel. Thus elastic registrations exploit the fact that the transformation field should be smooth [7,8]. A smooth transformation field adequately describes the actual dislocation of most voxels. On the other hand, in some instances, the entire inner organ moves, while movement of organ voxels relative to each other is small. It was demonstrated that movement of organs such as the prostate [4], kidney [6], or liver [1] can be estimated, in a first approximation, by a rigid or affine transformation. In consequence, the transformation field for some inner organs is not smooth. The difference between the transformation vectors inside an organ is small, while the difference between the transformation vectors at the organ surface might be much larger.
Most approaches to elastic registration are based on matching signal intensities [9][10][11][12], which is limited to special applications [10,11] or make use only of a smooth transformation field [7-9, 13, 14]. In our paper we introduce a novel geometry-based three-dimensional nonrigid registration approach. The method is based on the assumption that organ movement is primarily effective at the organ surface. We assume a smooth transformation field inside an organ and free transformation at the organ surface. Thus, our registration algorithm is a geometrical method based on segmentation. This kind of registration requires preliminary segmentation of the anatomical structure of interest on the reference or model image. Binary structures such as contours and surfaces can be generated by labeling them manually or (semi)automatically using an advanced algorithm. The basic idea is to use deformable models to guide image registration. With such models, nonrigid volume deformations are inferred from surface deformations. The transformation required for registration is then calculated by minimizing the distance of the contour points. Furthermore, a fusion technique based on the inferred volume deformation is introduced. Factors which may influence the performance of the method are discussed.

Concept.
Our registration algorithm is a geometrical method based on segmentation. This kind of registration requires segmentation of the anatomical structure of interest on the reference or model image to generate a binary structure. Segmentation can be performed manually or (semi)automatically using an advanced algorithm. An active surface mesh is generated from the segmentation of the reference images. Since an active surface mesh is a 3D deformable model which extends active contours [15] to 3D and thus can be adapted to be applied to the edges of an edge map extracted from the complementary image of the registration or fusion, we believe that mesh displacement could indicate movement of the organ. In our approach, surface evolution in the edge map leads to surface deformations described by the displacement of surface mesh points. The resulting nonrigid volume deformation vector field for registration or fusion inside the region of interest (ROI) is predicted by solving the reverse problem of free-form deformation (FFD) [16]. FFD allows to obtain the first experimental result of prediction based only on geometrical knowledge.

Active Surface.
Deformable models can be applied to image edges [17] with large image intensity gradients by using an optimization method. In our approach we use a 3D extension of active contours called active surfaces. While the level set method uses implicit surfaces [18,19], we use explicit surfaces constructed from tetrahedres.
The basic concept of active models in general, and of active surfaces in particular, is to give the models physical characteristics in terms of energy. Internal energy and external energy are two widely used concepts in this context: internal energy int describes the internal deformation characteristics of the model, that is, the smoothness of the surface. External energy ext describes the environmental influence on the model. Here "environment" refers to images or their filtered forms, on which the models are settled and transformed. Basically, external energy is extracted from features such as object boundaries, where the image gradient has its local maximum and is often described as potential energy.
Based on the definition of energy, the segmentation process ideally minimizes the total energy, total , of the model: As a result the active model will be attracted to the object boundary, where total energy is the lowest. To achieve this desired property of the model, the shape of the object of interest is supposed to be regular and smooth so that, at the object boundary, the bending energy, defined by rigid force ⃗ F rigid , and the stretching energy, defined by elastic force ⃗ F elastic , compensate for the potential energy that defines the external energy. This can also be seen as a state of Computational and Mathematical Methods in Medicine 3 force balancing of the internal forces ( ⃗ F rigid , ⃗ F elastic ) and the external force ( ⃗ F ext ): Since the active surface moves during the energy-minimizing procedure, we thus describe the whole process as a surface evolution function of time: where is the active surface defined as a set of surface mesh points at time . Vector ( ) ⃗ F total ( ) refers to the increment of surface movement at time , where the time-dependent scale factor is used to control movement speed, ensuring numerical stability under the Courant-Friedrichs-Lewy (CFL) condition [20]. The steady state, in which total energy is at its minimum, is reached if the increment for optimization approximates zero.

Free-Form Deformation.
Our method uses FFD to describe deformation of the object of interest, which is embedded in a control grid with a given resolution in three dimensions of ( + 1)( + 1)( + 1). Using its local coordinate system defined by unit vectors ⃗ S, ⃗ T, and ⃗ U, a grid point ⃗ p of the FFD can be defined as where ⃗ x 0 , described by its global coordinates, is the origin of the local coordinate system. Using trivariate tensor product Bernstein polynomial, the position in the global coordinate system of any point ⃗ x inside the FFD grid can be interpolated via where , , and are the binomial coefficients in respect of , , and . Furthermore , , and are the local Since all surface mesh points are settled inside the FFD grid, a surface point ⃗ s can be interpolated as well using the above interpolation function (5) by replacing ⃗ x with ⃗ s. Since we always take a set of points, in our case the active surface, into consideration, we use the matrix description of (5) for the whole surface S: where S is an × 3 matrix, B is an × ( + 1)( + 1)( + 1) matrix, and P is an ( + 1)( + 1)( + 1) × 3 matrix for a given number of surface mesh points considered inside the FFD grid. Since B solely describes the deformation for a given set of grid control points P, it is normally referred to as deformation matrix. Based on the above matrix description (7) the following applies: where P * describes the set of displaced control points and S * refers to the deformed surface resulting from the right side of the function, given that B is unchanged. This means, if we know the displacement of P, we can calculate the deformation of S. But note that our goal is to reversely solve the problem; that is, we have a set of deformed surface mesh points S * and wish to calculate P * in order to further use P * for interpolating the deformation of any point ⃗ x of the volume inside the surface as described in (5). Since the number of surface mesh points is much larger than that represented in the FFD control grid, using the pseudoinverse B + of B to solve will provide the solution to an overdetermined system rather than giving us a sufficient solution to the reverse problem in general. In our approach, we therefore solve the problem by using the Levenberg-Marquardt algorithm [21,22], a method of least squares, in order to minimize the squared distance between S and S * :

Numerical Implementation.
In our approach we use an active surface 3D deformable model to guide the registration. The active surface is generated through triangulation from a preliminary segmentation of the object of interest on the model image and later adapted to the edges of the reference image as a deformable model by defining the internal and external forces acting on it. The elastic force acting on a surface mesh point is the sum of tensile forces from its neighboring points. The rigid force is described as a linear prediction of the tensile forces from its neighboring points and their neighbors [23]. The active surface is optimized by applying the finite difference method (FDM) to an inverse edge map [24] of the reference image. After minimizing the total energy a deformed version of the original surface is displayed at the boundary of the object of interest on the reference image. Thus we have a one-toone mapping of the surface mesh points as well. Based on the mapping an FFD control grid wrapping the original active surface can be deformed by solving the inverse problem of the FFD. We then use the computed deformation of the FFD control grid, along with the deformation matrix of the FFD, to transform the voxels inside the original surface of the model image onto the reference image; hence image information such as intensity saved in the object of interest of the model image can be transferred into the deformed object described by the deformed surface on the reference image, which fulfills the registration task.
Conversely the procedure can also begin with a preliminary segmentation on the reference image in order to adapt the initial surface mesh to the reference image on the model image, thereby transferring the volume deformation from the reference to the model image. Using the aforementioned FFD interpolation method, image intensity in the deformed FFD on the model image can be sampled and traced back to the original object of interest in the reference image. In this way, fusion is accomplished.
Our method has been implemented as AMIRA (http:// www.amira.com) modules. AMIRA is an advanced 3D visualization software developed by Konrad-Zuse-Zentrum für Informationstechnik, Berlin (http://www.zib.de/de/ home.html) and distributed by Visage Imaging, Berlin (http://www.visageimaging.com). AMIRA is highly modularized using C++ to offer visualization and image analysis pipelines based on modules. Since our method is a twofold approach consisting of segmentation and subsequent FFD computation, the AMIRA modules are implemented in two packages, hxactcontour and hxffd. Furthermore, an upper level package hxsera (sera stands for segmentation-based elastic registration algorithm.) wraps the two packages for user-friendly access to the entire procedure.
The computational complexity of the segmentation task is linearly dependent on the number of surface mesh points and can be described as ( ) using big-notation [25]. When performing kidney segmentation, for example, we in general had from 10,000 to 20,000 points to process. In comparison the complexity of FFD computation is ( ⋅ ), where is the number of FFD control points and is the number of surface mesh points. The FFD grid we used typically had a resolution of 5 × 5 × 5 to 10 × 10 × 10. Thus the number of FFD control points is considerably smaller than that of surface mesh points.

Image Data.
To test the feasibility of our approach we first applied it to two contrast-enhanced dynamic computed tomography (CT) examinations consisting of a total of 41 3D datasets of the kidneys obtained in two patients who underwent routine clinical evaluation of renal perfusion. Each CT series was acquired using 320 slices with a 512 × 512 voxel inplane resolution. The CT examinations were performed on a TOSHIBA Aquilion ONE with a total acquisition time of 1 min for the complete dataset while the iodine contrast agent was administered as a bolus. To reduce the absorbed dose the tube current was minimized.
The first patient received 90 mL contrast medium and was examined with a CT scanner tube voltage of 120 kV and tube current of 150 mA. 24 3D CT datasets were acquired with a spatial resolution of 0.571 mm × 0.571 mm and a slice thickness of 0.5 mm. The second patient received 120 mL contrast medium and underwent CT scanning with a tube voltage and current of 100 kV and 100 mA, respectively. Seventeen 3D CT datasets were acquired with a spatial resolution of 0.702 mm × 0.702 mm × 0.5 mm. Due to the high signal noise of the low dose scans the effective spatial resolution was lower than the nominal resolution of the scans. Therefore, the 3D CT datasets were resampled to a resolution of 256 × 256 × 160 for the study.
We further used our method for multimodal registration in patients undergoing imaging of the liver, which is a more elastic organ than the kidney. Twenty patients treated by routine clinical brachytherapy [26] were investigated. A 3D CT and a 3D MRI interventional dataset from each patient were acquired no later than 1 hour after brachytherapy catheter positioning. One of the two 3D datasets was used for therapy planning. Furthermore, follow-up MRI performed several months after treatment was available for all patients. Because of the long interval between the intervention and follow-up MRI, there may be considerable liver movement and deformation in the follow-up images compared with the initial planning 3D dataset.
All axial CT scans of the liver were acquired with a resolution of 512 × 512, but the number of slices ranged from 31 to 322, resulting in severe partial volume effects in the CT scans acquired with a lower number of slices. The averaged spatial resolution of the CT scans is 0.743 mm × 0.743 mm × 3.04 mm compared with 1.187 mm × 1.187 mm × 2.50 mm for MRI, where an invariable slice thickness of 2.50 mm applies to all cases. T1-weighted volume-interpolated 3D gradient echo MR images were acquired during catheter positioning on an open bore Philips 1.0 Tesla MR with a more inhomogeneous signal distribution.

Evaluation Methods.
To validate our approach, we used empirical methods, which are subcategorized into discrepancy and goodness methods as described in [27]. Discrepancy methods depend on an optimal reference, which is generally known as the gold standard and has been verified by experts. Goodness methods do not need a reference but rather depend on some preferable characteristics to describe and thus judge the performance of the algorithm.
In our validation we investigated alignment of the volume and misalignment of contours by using two different kinds of discrepancy features. The Dice similarity coefficient (DSC) [28,29] was used to evaluate volume alignment where is the volume of the segment to be validated and the volume of the gold standard. For contour misalignment, we evaluated the Hausdorff distance [30] with and the averaged contour misalignment where and are sets of contour points of the segment to be validated and the gold standard, respectively, and and represent the contour points of sets and . Note that ( , ) and ( , ) are generally not equal [31], so that the combined average of the contour misalignment was used for evaluation.
Furthermore, two different goodness methods were used to validate the segmentation approach: the intraregion uniformity [32] and the gray level contrast [27]. Intraregion uniformity is based on the assumption that the regions that have been segmented should have a uniform distribution of gray levels, which means that variance within each region should be small. is defined by the signal intensity of voxel in region that has been segmented with mean intensity in as where describes the volume of . The variance of is defined as Based on the intensity variance defined previously, uniformity is defined as where all is the sum of all volumes that have been segmented and 2 max is a normalization factor defined as where max and min are the absolute maximum and minimum of the signal intensities from all segmented regions, respectively. In comparison, the signal contrast takes the intensity difference between the segmented region and its background into consideration and assumes that contrast should be large. For the averaged signal intensity of the segmented region, 0 , and the averaged background signal intensity, , signal contrast is defined as Furthermore, to validate our final registration result, we used a goodness method based on our empirical study, assuming that the change in signal intensity should be small within the region of interest (ROI) but large between the ROI and its background. This means that an accurate registration should yield a small variance in intensity, while a poor registration should yield much greater variance. Here we measure the standard deviation of intensity of the registered image within the reference ROI that was segmented by experts as gold standard (see Appendix A).

Evaluation.
As mentioned above, we first applied our approach to dynamically acquired renal CT scans because the kidneys are relatively rigid organs. Segmentation and registration were evaluated separately. The two patients are numbered .#1 and .#2, and their 3D datasets are numbered consecutively beginning with the first acquisition of the dynamic series.
To reduce intra-and interexpert variability gold standards were obtained for the discrepancy methods using an iterative expectation-maximization method [33,34] with an expert performing five segmentations for each 3D dataset. To validate the registration result we used our goodness feature-the standard deviation of intensity-to compare our registration with an intensity-based affine registration using mutual information [35] as the optimization metric.
We further validated our method for multimodal registration of the liver, which is more elastic than the kidney. We compared our registration results with a quasigold standard based on a voxel-based affine registration using mutual information on the same datasets. The quasigold standard was generated using the approach in [1]; here the liver is first segmented by a radiologist, and the segmented images are then registered. Intrahepatic landmarks positioned at vessel bifurcations by an experienced radiologist in both datasets were used to compare registration accuracy by measuring dislocation of the landmarks after registration.

Segmentation of Kidney.
First, we evaluated the performance of our method in the segmentation of the kidney. The kidney moves several centimeters during breathing but is, in a first approximation, a rigid organ. Table 1 presents Dice similarity coefficients, averaged contour misalignment, and Hausdorff distances in relation to the gold standard as well as intraregion uniformity and gray level contrast. Our method  showed subvoxel accuracy with a mean averaged contour misalignment of 0.504 mm (Table 1), which is below the the voxel size after subsampling of over 1.00 mm.
Furthermore, the averaged Hausdorff distance of 3.505 mm (see Table 1) is quite acceptable. To show this, we compared the results achieved with our method with those of manual segmentation by experts in Table 2, where we used the same gold standard to measure the Dice similarity coefficient, averaged contour misalignment, and Hausdorff distance. Our method has a slightly higher misalignment of 4.482 mm compared with the experts average of 4.181 mm. However, this value is well within the subvoxel range.
To test the goodness features, which are independent of the gold standard, we compared the quality of our segmentation method directly with that of experts. The results are presented in Table 3. The paired -test was used to investigate for significant differences. With a value of 0.2466, we found no significant difference in intraregion uniformity. Performance of the program was significantly better ( = 0.0028) using signal contrast for evaluation.

Registration of Kidney.
To compare our registration method with affine registration using mutual information, the paired t-test was used to estimate the level of reduction of standard deviation of image intensity after affine and elastic registration. As can be seen from Table 4 our method achieved a 15.11% better quality with a high confidence of 99%. This value corresponds approximately 1 mm correction of translation (see Figures 1 and 3). : intraregion uniformity of gold standard. prg : intraregion uniformity of program. GC : signal contrast of gold standard. GC prg : signal contrast of program. P: two-sided level of significance of the t-test for prg versus and GC prg versus GC .  : averaged dislocation (affine registration). : standard deviation (affine registration). , : number of patients (elastic registration).
, : number of landmarks (elastic registration). : averaged dislocation (elastic registration). : standard deviation (elastic registration). P: twosided level of significance of t-test for versus in the same column. per liver. Interventional liver MRI and CT examinations were performed within one hour. The paired -test was used to compare landmark dislocation between both registration methods. The first column of Table 5 shows that our method has an average landmark dislocation of 4.32 mm. Affine registration yields a significantly ( = 0.000001) smaller dislocation of 3.26 mm.

Registration of
The difference between the two registration methods was also investigated using the Wilcoxon signed-rank test, where the correlation value between the two dislocation tests was also calculated. No significant correlation between the registration accuracy of both methods was found, with landmarks using the Wilcoxon signed-rank test. The correlation between landmark dislocations of affine and elastic registration was found to be 0.49. Therefore, registration accuracy did not significantly depend on individual image quality. This indicates that the landmarks were well set and are very well suited for our evaluation purpose. Furthermore, the registration methods were compared using interventional and follow-up image datasets acquired 12 weeks later. The results for coregistration of CT to follow-up MRI and interventional MRI to follow-up MRI are presented in Table 5 and in the boxplot in Figure 2, respectively. There is a clear drop in quality for both affine and elastic registrations, which is attributable to greater deformation resulting from the long interval between the two examinations. In comparison to affine registration, our method shows nearly two times greater degradation in each case. Moreover, there are also some cases in which our method did not perform better (compare and in Table 5).

Discussion
This paper presents a novel approach of elastic registration based on the coregistration of surface and volume interpolation. An important feature of our approach is that volume deformation is interpolated solely from geometric changes of the surface. We validated our approach by comparison with affine registration as the gold standard. We performed a series of tests advancing from rigid objects (the kidney), to predominantly affinely deformed objects (liver with interventional images), to elastically deformed objects (liver with follow-up images). In the last test, the liver was definitely strongly deformed due to the localized effects of brachytherapy irradiation.
Applied to the kidney, our registration method significantly improves movement correction compared with affine registration. The kidney is a relatively rigid organ and there is relatively little deformation in the 4D CT time series due to short duration of image acquisition. The result indicates that our elastic registration method performs well based on effective first correction of affine registration and can adequately correct for displacement of rigid organs such as the kidney when there is relatively little deformation inside the organ.
In contrast, for multimodal elastic registration of the liver, which is more elastic than the kidney and was deformed by irradiation in our tests, our method showed poorer performance in terms of quality compared with the twelveparameter affine registration method. Landmark dislocation determined with affine registration seems to be a quasigold standard, which yielded 1.06 mm dislocation for quasi-simultaneously acquired interventional MRI and CT, 5.40 mm for interventional CT and a follow-up MRI, and 4.14 mm for interventional MRI and follow-up MRI. The poorer image quality of interventional MRI (signal-to-noise ratio, signal homogeneity, and spatial resolution) reduces registration accuracy for affine registration and especially for elastic registration. In principle, elastic registration should improve matching of a liver deformed by radiation treatment. The elastic registration method applied in the present study uses only information on the liver surface. In contrast, the affine registration method uses the image information of the entire liver, thus yielding a more accurate registration result. A technique using additional internal information beneath the surface might yield better results [10].
A limitation of the present study as well as of most studies using real medical images is the reliability of the validation method. Often, investigators use artificial image data for validation of their registration method, failing to take real registration problems into account. Several approaches have been proposed to validate registration accuracy; for example, Schnabel et al. validated their elastic registration method using a biomechanical model [36]. In the appendix to this paper, we demonstrate that the standard deviation in an ROI might also be a measure for comparing different coregistration methods, for example, comparison of affine registration with elastic registration as applied in the current study or by Rueckert et al. [13]. Registration accuracy can be estimated absolutely by comparing the discrepancy of the actual registration with that of landmarks defined by experts. To reduce further intra-and interexpert variability, we additionally applied an iterative expectation maximization method using multiple segmentations [33,34].

Conclusions
Existing methods of image registration yield unreliable results when applied to register an organ, such as the liver, that has been transformed, for example, by treatment. A more sophisticated method would be useful in this case. We have tested an approach based on coregistration of organ surfaces and interpolation of internal space. The technique has been shown to work when applied to rigid organs. However, the method was developed for application to serial datasets acquired to monitor the outcome of treatment. Our data show that the method yields unsatisfactory results when used to register an organ, such as the liver, that has been transformed by treatment, for example, radiation therapy.
Obviously, taking into account treatment-related changes on the surface is not sufficient to determine internal changes of the liver. Thus, to monitor radiation therapy of the liver, we need an elastic registration approach that uses surface information as well as information on internal organ structures in order to yield better results than 12-parameter affine registration.