Finite Element Surface Registration Incorporating Curvature, Volume Preservation, and Statistical Model Information

We present a novel method for nonrigid registration of 3D surfaces and images. The method can be used to register surfaces by means of their distance images, or to register medical images directly. It is formulated as a minimization problem of a sum of several terms representing the desired properties of a registration result: smoothness, volume preservation, matching of the surface, its curvature, and possible other feature images, as well as consistency with previous registration results of similar objects, represented by a statistical deformation model. While most of these concepts are already known, we present a coherent continuous formulation of these constraints, including the statistical deformation model. This continuous formulation renders the registration method independent of its discretization. The finite element discretization we present is, while independent of the registration functional, the second main contribution of this paper. The local discontinuous Galerkin method has not previously been used in image registration, and it provides an efficient and general framework to discretize each of the terms of our functional. Computational efficiency and modest memory consumption are achieved thanks to parallelization and locally adaptive mesh refinement. This allows for the first time the use of otherwise prohibitively large 3D statistical deformation models.


Introduction
Nonrigid registration remains one of the largest challenges in medical image analysis and computer vision. The correspondence it seeks to establish between related objects is essential for a large number of applications from shape statistics to model-based segmentation and computational anatomy [1,2].
In medical applications, the objects that need to be brought into correspondence are usually organs which were captured with a medical imaging device, such as CT and MRI. In this paper, we use bones, skulls, and hands as examples. We propose a method to bring them into correspondence by registering a number of feature images. The unique features and contributions of our method are as follows: (1) the representation of surfaces by a distance and a curvature image, (2) the inclusion of a volume preservation term into the registration, (3) the introduction of prior knowledge in form of a statistical deformation model, (4) a continuous formulation of the registration method, including the deformation model, making the registration independent of its discretization, (5) an efficient finite element discretization based on the local discontinuous Galerkin method.
For us, as in many other applications, the ultimate goal is the construction of statistical shape models. Therefore, we are mostly interested in the shape of the organ's surface, which we represent by the two most prominent feature images in our method: a distance and a curvature image of the surface. Together, they provide a good description of the shape of an object. Other possible feature images, which are then simultaneously registered, encode additional information about the organs like the CT or MRI data. Registering all feature images together represents our assumption that a good registration result should bring all of these feature images into correspondence.

Computational and Mathematical Methods in Medicine
In addition, we need to incorporate prior knowledge about the registration result in order to be able to address the inherently ill-posed problem of image registration. The result is given in the form of a vector field, and in the most basic form of our registration method, we simply enforce the smoothness of this vector field by controlling the norm of its first derivative. However, it turns out that this regularizer, which is also found in the Demons or Diffusion registration algorithms [3,4], allows large and unnatural looking volume change. By penalizing volume change, we impose our prior knowledge that the registration result should not compress or expand the objects excessively. This is achieved by penalizing the linearized volume change caused by the vector field.
While this generic knowledge about registration and the resulting regularization applies to most registration tasks, we can go even further and include specific prior knowledge about the objects under consideration. This is done by penalizing deformations that deviate from the space of known deformations for the specific type of object. This space is modeled by a statistical deformation model derived from previous registration results of the same object class, that is, the same organ.
One of the main contributions of this paper is the formulation, discretization, and minimization of the registration problem as a continuous functional, integrating all the different terms described above into a single analytic formula for the deformation field. This formulation allows for the simple enhancement of the scheme through further terms and for a straightforward discretization. In this paper, we present a memory-efficient and flexible schemes using adaptive finite elements; this approach is presented in a general setting. The results shown in this paper are obtained using the local discontinuous Galerkin method. This approach leads to a very simple formulation even of complex regularization terms and is well suited for the use with distributed grids and nonconforming adaptation. The implementation is based on DUNE-FEM [5], which uses the generic grid interface from the Dune library [6,7]. We use the ALUGrid [8] which supports nonconforming local adaptivity and the possibility of domain decomposition and dynamic load balancing for parallel computations. These advances in efficiency and memory consumption enable us to perform registrations that where previously not possible. In particular the use of large 3D statistical deformation models, which would typically require in excess of 700 MB for each mode of variation with a uniform discretization, becomes possible for the first time. Pre-and postprocessing are performed using ITK and VTK [9,10].

Prior Work.
Nonrigid registration is an extremely well researched problem. For an overview of registration methods we refer to the recent survey papers by Zitová and Flusser [11] (image registration), Audette et al. [12] (surface registration), and in particular the book by Modersitzki [4] for a thorough discussion of variational methods for image registration. The most basic form of our method, that is, leaving out all the optional terms, is closely related to Thirion's Demons algorithm [3] and Modersitzki's diffusion registration algorithm [4].
The idea of surface registration using a distance or levelset representation of surfaces has been introduced by Paragios et al. [13], and the inclusion of additional feature images, especially for parametrized surfaces, is used for instance in [14]. The use of curvature images has been presented in our previous paper [15].
Volume preserving image registration was introduced by Rohlfing et al. in [16] and Haber and Modersitzki in [17]. Rohlfing et al. include a term penalizing volume change in a B-spline-based registration framework, while Haber and Modersitzki enforce strict nonlinear volume preservation in a variational formulation. In our approach, we wish to allow a limited amount of volume change and therefore use a soft constraint, that is, an additive penalty term. For efficiency, we penalize only the linear part of the volume change, and we show that this is equivalent to the linear elastic regularization term first introduced by Broit [18] and Christensen et al. in [19], even though our motivation for using this regularizer does not stem from modelling the organs as elastic bodies.
The concept of statistical deformation models and their inclusion into registration algorithms has been researched by several groups [20][21][22][23]. However, these methods either constrain the registration result strictly to the span of the model or use an interlaced algorithm performing the statistical regularization in a separate step, whereas we present an integrated formulation, which builds on our previous publication [24]. Furthermore, all of the previous methods are formulated within the discretization framework preferred by the respective group, whereas our method provides a continuous formulation of the deformation prior and, independently, a finite element discretization of this prior.
The use of finite elements for image registration goes back at least as far as [25], and we published a first finite element registration algorithm in [15]. The final model derived in this paper results in an elliptic problem with a nonlinear forcing term. The finite element discretization for general elliptic problems has now been employed for decades and can be considered standard. A summary of the standard approach of conforming, continuous finite elements can be found in [26]. Since we are using nonconform locally adapted grids with distributed memory parallelization, we employ a discontinuous finite element approach. An overview of this class of schemes can be found in [27]. The method we use is based on the local discontinuous Galerkin scheme introduced in [28]. To compute the second order derivatives in the elliptic regularization operator, the model is rewritten as a system of first-order equations, and these terms are discretized element-wise using intercell fluxes to enforce consistency with the analytical model. This results in a small discretization stencil, which leads to a simple implementation on nonconform distributed grid structures. The flexibility of this approach leads to a very simple formulation in spite of the rather complex elastic regularization terms. Also, the freedom in the choice of the basis functions allows us to use an orthonormal basis resulting in a diagonal mass matrix which simplifies the statistical regularization term considerably.
Computational and Mathematical Methods in Medicine 3

Registration Method
In the following, we describe our registration method. At its core, it is an image registration method and as such can be used directly on images. But as our focus lies in registering the shape or surface of organs, we describe how the method can be used to register two surfaces Γ 0 , Γ 1 ⊂ R . The surfaces can be segmented from medical images or acquired otherwise. We assume that they are already rigidly preregistered, for instance by Procrustes alignment [29], so that our algorithm only needs to recover the nonrigid component of the registration. Moreover, the preregistration enables us to choose a common rectangular image domain Ω ⊂ R which contains both surfaces, so that we can represent each surface Γ by its signed distance function : Ω → R: where dist( , Γ) is the Euclidean distance from to Γ and the inside and outside of Γ have to be assigned in a meaningful way. For open surfaces, for which inside and outside cannot be defined, an unsigned distance function can be used. The aim of a registration algorithm is to find a deformation field : Ω → R such that the target surface's distance function 1 warped with this deformation field, that is, 1 ( + ( )), is as close as possible to the distance function of the reference surface given by 0 . The registration result of the distance functions implies a registration of the surfaces.
We formulate the registration problem as a minimization problem. It is shown in [4] that virtually all registration methods can be interpreted in this way. The deformation field is sought as the minimum of a functional which is the sum of two terms: distance and regularization terms. Thus, the registration problem consists of finding the minimum of the functional with distance term D and regularization term R. The former measures the distance between the reference and the registration target. At its minimum, the warp of the target is as close as possible to the reference image. The regularization term on the other hand measures the smoothness or regularity of the registration result . The smaller it is, the more regular the solution will be. By minimizing both terms simultaneously, we try to bring the reference and target as close together as possible while keeping the deformation field reasonably smooth. We believe that there is no single generic distance or regularization term that guarantees a good registration in every scenario. The notion of correspondence is application specific, and the more knowledge about the registration task at hand we can include into the method, the higher the chances will be to obtain a result that meets our requirements. For example, the most basic regularization term we propose simply penalizes the squared 2 -norm of the gradient. By including additional regularization terms, we can also penalize volume change or the deviation from a statistical model of the object to be registered, thus improving the registration result. The same holds for the distance measure, where the simplest approach measures the 2 difference between the two surfaces. Again considerably better results can be achieved by including additional terms, for example, the curvature of the surfaces.

Distance Term.
The basis for the distance term D is the 2 difference between the warp of the signed distance images 1 and the reference 0 of the two surfaces to be registered: The distance images of two similar surfaces have a similar range of values, especially close to the surfaces, which makes the 2 distance measure an appropriate choice for their comparison. In order to prevent undesirable effects at the boundary, where the distance function of each surface may be cut off at a different value, we bound the distance images at a certain distance from the surface: and register these bounded distance images instead of the original ( ). The bound ∈ R should be chosen so that the level set of each surface we want to register is completely contained inside Ω.

Robust Distance Measures.
For noisy or otherwise difficult feature images it can be advantageous to use a robust distance measure, which dampens the influence of the overly large differences between the images; see [30] for a review of different robust cost functions. We propose using a robust distance measure based on the Geman-McClure estimator [30,31], which has been successfully used for medical image registration in [32]. It can be easily realized by weighting the distance measure (3) with a term ( ): For the Geman-McClure distance measure, we have ( ) = 2 + ( 1 ( + ( )) − 0 ( )) 2 , with a regularization parameter ∈ R which controls the robustness of the measure. A similar term is used in Thirion's Demons algorithm [3], where the norm of the gradient of the image replaces : ( ) = |∇ 1 ( + ( ))| 2 + ( 1 ( + ( )) − 0 ( )) 2 . In our experiments, both weights yielded similar results, which is not surprising as for distance functions |∇ | = 1 almost everywhere. We have found that for distance images that represent surfaces free from artifacts or excessive noise, it is not necessary to use a robust distance measure. But it proved to be of good use for the additional feature images introduced in the following sections, such as the curvature images.

Curvature Guided Registration.
When registering surfaces by means of their distance images, the problem arises that by definition, and the value of the distance function is zero on the whole surface and therefore contains no information on the surface. Therefore, the distance function D is minimized whenever a point on one surface is registered onto a point on the other surface even if it does not correspond functionally or semantically to the given point. In fact, when we try to minimize the registration functional with a gradient descent scheme, the corresponding point is only sought in the direction of the gradient of the distance image, that is, perpendicular to the surface. This effect is somewhat alleviated by the regularization term, but this is often not enough to obtain a sensible registration. See the example in Section 4.3 for instance.
For bone registration, we wish to establish correspondence between points that have a similar anatomical function. So similar bumps, crests, ridges, and so forth should be matched. Such features are well described by the curvature of the surface. In fact, for a large class of objects, corresponding points on two surfaces have similar curvature. Figure 1 illustrates this for the mean curvature of human skulls. Therefore, we use the mean curvature as an additional feature to be matched.
With the surfaces represented by their distance images, the curvature is easily calculated by ( ) = div(∇ /‖∇ ‖).

For each
∈ Ω, ( ) is the mean curvature of the level surface passing through . So if is on the zero level set of , ( ) is the mean curvature of the surface at that point. Since for distance images ‖∇ ‖ = 1 almost everywhere, the curvature image is even more easily computed as = Δ .
The curvature images are included in the registration process with a distance term analogous to that in (5) as follows: The overall distance measure is then given as D [ ] + D [ ] with , ∈ R + controlling the influence of the distance and curvature images.

Additional Feature Images.
In an obvious fashion, any number of additional feature images can be added. If we denote the th pair of feature images by 0 , 1 , the full distance term for our registration method is given as with weighting parameters . In our experiments, we have included the original CT scans from which the bone surfaces were segmented as additional feature images where they were available. 2D projections of two such CT scans can be seen in Figure 2. Additional image modalities or manual annotations could also be included provided that they are available for both surfaces to be registered. Obviously for each of the surfaces, all its feature images need to be in correspondence, which is usually already the case or can be achieved with a rigid registration method. For multimodal image pairs, more sophisticated multimodal distance measures can be employed instead of the 2 distance measures used in (7) (cf. [4,33]).

Regularization Term.
Registration is an ill-posed problem, and any algorithm trying to minimize a distance measure without enforcing some kind of smoothness or regularity on the solution is bound to fail. We begin by introducing a very basic regularization term, which is later enhanced by adding further terms. One of the most basic ways to control the smoothness of the deformation field is through its first derivative , which we will, for simplicity, denote by ∇ . But the reader should bear in mind that : Ω ⊂ R → R , and hence ∇ ∈ R × . We define the basic regularization term as The smaller R [ ] is, the smoother the deformation field .
Computational and Mathematical Methods in Medicine 5 2.3.1. Volume Preservation. While the regularization term (8) forces the deformation field to be smooth, it still allows some quite unnatural deformations. In particular, it allows for excessive expansion or compression of the registered object by the deformation field; see Section 4.4 for an example. The compression or expansion of the warp + ( ) can be measured by the determinant of its first derivative det[ ( + ( ))]. A volume preserving deformation field must satisfy det[ ( + ( ))] ≡ 1. Naturally, as we are mostly interested in registering bones of different individuals, which in general do not have the same volume, we do not wish to enforce a strict incompressibility constraint as in fluid dynamics or other registration approaches [17]. Instead, we add a soft incompressibility constraint to our existing regularization term based on the linearization of det[ ( + ( ))]. Thus, volume change is limited but not completely prohibited.
Since det[ ( + ( ))] = 1 + div + (nonlinear terms), if follows that the smaller the divergence of , the closer the linearization of the determinant is to 1 and therefore the field to being volume preserving, provided the values of are not too large and therefore the linearization justified. Therefore we add the square of the deformation field's divergence to the functional: In a later section, we will see that the Euler-Lagrange equations for the functional (9) correspond to those of the wellknown linear elastic registration methods (see Appendix A).
Although the volume preservation is only approximative, it enhances the registration result visibly; see Section 4.4.

Statistical Deformation Prior.
It is well known that the regularization terms can be interpreted from two different perspectives. On the one hand, they serve as numerical stabilizers and make the numeric treatment of ill-posed problems feasible. On the other hand, the regularization term incorporates prior knowledge about the solution into the problem. In this respect, the regularization terms that we have introduced so far were generic in the sense that they only require the solution to be smooth or volume preserving. They do not take properties of the object to be registered into account. Even though registration is a prerequisite for expressing prior knowledge about a specific class of objects, it can itself benefit from such prior knowledge when the data is noisy. Therefore, we propose including prior knowledge about the specific class of objects by introducing an additional regularization term. This term is based on a probability distribution estimated from previous successful registration results and penalizes unlikely deformations. This type of regularization becomes very natural when we consider the probabilistic interpretation of the regularization approach.  [21] for a detailed discussion of this probabilistic interpretation for variational image registration.
Assume that we already have a set of deformation fields { 1 , . . . , }, mapping from a common reference image to a given set of target images { 1 , . . . , }. We expect that a new registration result is unlikely to be very different from the ones already given, since it registers an object of the same class. Indeed, we can regard the deformation fields 1 , . . . , as training data from which we can estimate a probability distribution over the possible deformations. This is the main idea behind Statistical Shape Models [34,35] and the related Statistical Deformation Models [22], to which our problem belongs. In these approaches, the common assumption is that the objects (i.e., the shapes and the deformation fields) follow a normal distribution. Its parameters are estimated from the training data.
We introduce a continuous formulation for deformation models here, which fits naturally into our continuously defined registration framework and permits a straightforward finite element discretization. The model is characterized by the mean field and the covariance operator C : which acts on a deformation field ∈ 2 (Ω, R ) through This operator is the continuous analogon to the sample covariance matrix. It is a linear integral operator with a symmetric integral kernel given by C is a self-adjoint (i.e., symmetric) compact operator. Thanks to the spectral theorem for compact normal operators (see [36] for instance) it admits an eigenvalue decomposition with positive eigenvalues 2 and corresponding orthonormal eigenfunctions . As C is estimated from examples − , there are at most ≤ nonzero eigenvalues; 2 and C can be represented as Moreover, the spectral theorem guarantees the orthogonal decomposition of 2 (Ω, R ) into the span of the eigenfunctions and the operator's null space N(C): 6 Computational and Mathematical Methods in Medicine The span of the eigenfunctions is the same as the span of the − . That means that it contains all linear combinations of the training examples. On this -dimensional space, which is frequently called the model space of the deformation model, the covariance operator C is invertible. Thanks to the eigenvalue decomposition, the inversion of C on the model space is easily achieved by inverting the eigenvalues and coincides with the formulation of the pseudoinverse C † on the whole space 2 (Ω) as follows: We can therefore formally define a normal distribution N( , C) with mean and covariance C on 2 (Ω): where is a normalization parameter. Technically, while this function exists and is the natural extension of the density function of a statistical shape or deformation model to a continuous function space, it is not a well-defined density function. However, for any finite discretization ℎ , the corresponding density of the multivariate normal C[ ℎ ] exists and is well defined. This distribution can be used as the basis for a statistical regularization term. However, it only represents prior knowledge about the model space. Its complement, N(C), is completely ignored. A statistical regularization based on this distribution would permit deformation fields ∈ 2 (Ω, R ) that are arbitrarily far away from the model space and would regularize only their projection onto the model. An extreme measure taken by other authors is therefore to restrict the registration results strictly to the model space [20,22]. We on the other hand wish to explicitly allow results that lie outside (but still close to) the model space, as only these results can be used as reasonable additions to the statistical model.
We define an additional distribution on N(C), which imposes our prior knowledge that the registration results should not lie far from the model. In the absence of additional example data, we simply assume a normal distribution with mean and uncorrelated uniform variance of 2 ∈ R + , which controls how much the results are allowed to deviate from the model space: While this term could in principle be restricted to N(C), we define it on the whole space 2 (Ω, R ). This corresponds to shrinkage estimation, a technique used in statistics to improve the estimation of the covariance [24,37], which adds the additional variance of also on the model space.
To define the distribution ( ) characterizing our complete statistical model, we assume the two normal distributions to be independent and define ( ) as their product It is then natural to introduce the functional as an additional regularization term, which penalizes unlikely deformation fields. The constant is used as a weighting factor. We thus arrive at the following functional describing our registration model:

Finite Element Discretization
In this section, we describe the discretization and minimization of the functional (21) based on a finite element method and an adaptive multiresolution pseudo-time stepping approach. We start with the description of the spatial discretization focusing on the terms from the deformation prior. This finite element discretization of the continuously defined deformation prior is one of the most important contributions of this paper, and thanks to use of a memoryefficient adaptive finite element basis makes the use of large 3D statistical deformations possible for the first time. , which can be interpreted as a bilinear form, is computed can be found in Appendix A. The three terms, corresponding to the distance measure, regularization term, and the deformation prior, respectively, are given by Computational and Mathematical Methods in Medicine The sum in (22) is over all feature images used for the registration, for example, the distance maps , the curvature , and the CT scans. We employ a finite element discretization to compute the deformation field. Given a function space ℎ ⊂ 2 (Ω, R 3 ) of finite dimension , we seek an approximate deformation field, which we will also denote by ∈ ℎ , satisfying J [ , ] = 0 for all test function ∈ ℎ .
The derivation of a discrete version both for the regularization term R and the distance measure D is straightforward and will be briefly sketched in Section 4.1 together with details on the construction of the discrete function space ℎ used in our tests. In the following, we will concentrate on the term P arising from the statistical deformation prior.
The term P defined in (24) includes the eigenfunctions of the covariance operator C, defined in (12). The eigenfunctions are needed for the efficient (pseudo-) inversion of C according to (16). In order to formulate a discretization and an implementation for P , we need to actually calculate these eigenfunctions. According to the ideas of functional PCA [38], the eigenfunctions can be calculated with help of a finite dimensional eigenvalue decomposition if both the argument and the training examples are given as a linear combination of a set of basis functions.
So to derive a discrete formulation of C, we fix the argument and examples to be from the discrete function space ℎ . They are then given as linear combinations of the basis functions , and we have the coefficient vectors w ℎ and k ℎ, := u ℎ, − u ℎ . The covariance operator C from (12) then takes the form The matrix A = ( ) is given by where we have used to denote the entries of the mass matrix M; that is, = ∫ Ω ( ) ⋅ ( ) .
Equations (12) and (26) show that C is restricted to ℎ maps to ℎ and is represented by the matrix A = , which, resubstituting the definition of v into (26), can be represented as where Σ is the sample covariance matrix of the DOF vectors k ℎ, = u ℎ − u ℎ . The eigenvalue decomposition of this discrete version of C can be achieved by an eigenvalue decomposition of the matrix A. Compared with traditional discrete statistical models as [22,34,35], where only the covariance matrix Σ is used, in our case, the mass matrix M slightly complicates matters as it makes the matrix A nonsymmetric in general. In functional PCA, this problem is solved by first calculating an eigenvalue decomposition of the symmetric matrix: This is a symmetric positive semidefinite ( × )-matrix with rank ≤ ≪ . Remember that is the number of training examples which is usually much smaller than , the number of degrees of freedom of the discrete function space ℎ . The positive eigenvalues 2 1 , . . . , 2 and the matrix S of their corresponding orthonormal right eigenvectors r can be computed efficiently with a singular value decomposition of an ( × ) matrix, without the need to compute a full large ( × ) singular value decomposition. We thus have the eigenvalue decomposition: and consequently the decomposition of A : with eigenvalues 2 and eigenvectors M −1/2 r . The associated functions = ∑ (M −1/2 r ) are the eigenfunctions of the discretized covariance operator from (25). C being a symmetric operator, we expect its eigenfunctions to be orthonormal with respect to the 2 (Ω, R 3 ) scalar product: With the eigenfunctions , we can rewrite (24) with an explicit formulation for the pseudoinverse C † as follows: which can easily be implemented once the eigenfunctions have been calculated.

Computational and Mathematical Methods in Medicine
This formulation takes the same form in the continuous, and the discrete setting and could have been calculated directly using the continuous eigenvalue decomposition Equation (16). The discretization lies only in the restriction of P to the discrete function space ℎ and the calculation of the eigenfunctions via a matrix SVD as described above.

Minimization Strategy.
Registration is achieved by minimizing the functional (21). We introduce an artificial time variable and try to minimize the functional by computing its gradient flow, that is, we solve the time-dependent partial differential equation: for a function ( , ), given an initial solution 0 = ( , 0). Note that the term has to be interpreted in the weak sense, that is, in the form ∫ Ω . A time stepping scheme for (33) can be viewed as a continuous version of gradient descent; indeed, if we use an Euler scheme for the time discretization of (33), we end up with a gradient descent iteration.
Choosing an explicit Euler scheme to solve (33) leads to a severe time step restriction, coupling the time step with the mesh width ℎ via = (ℎ 2 ) due to the regularization term (8). On the other hand, due to the nonlinearity of the distance term D[ ], a purely implicit iteration scheme would require the solution of a large nonlinear system in each iteration. Therefore, we propose to use semi-implicit time stepping scheme. We split up the functional J[ ] into an explicit part J expl [ ], containing all terms with nonlinear derivatives and an implicit part J impl [ ] containing all terms with linear derivatives. The iteration scheme, in its simplest form, is then defined as In our case, this means that J expl = D and J impl = R + P. If additional terms are introduced, they can be added to either J expl or J impl , depending on their linearity. Each iteration step thus requires the solution of a linear system of equations for which we employ an iterative Krylov type solver (in this case the BiCGStab method). To obtain higher order version of this method, IMEX Runge-Kutta schemes are used [39].
For our experiments, we chose a locally adaptive multiresolution strategy; we first minimize the functional on a coarse uniform grid; that is, starting with the initial guess 0 ≡ 0, we solve (33) up to some fixed time 0 . Then the grid is refined around the reference surface, and the coarse solution is interpolated onto the refined grid. This is then used as starting value for solving (33) on the refined grid up to some fixed time 1 . This process is repeated until a sufficiently fine resolution has been reached. The times have been experimentally determined. In each step, all elements closer to the surface than a certain threshold Θ are refined; that is, | 0 | < Θ. This threshold is decreased in each step, so that the final solution is calculated on a grid that offers a very fine resolution close to the reference surface and becomes more and more coarse further out. An example is shown in Figure 3.

Implementational Details.
The scheme is implemented in the Dune framework, a software library allowing the generic implementation of grid based numerical schemes [6,7]. The finite element implementation is based on the DUNE-FEM module [5]. Pre-and post-processing is done using ITK and VTK [9,10].
The spatial discretization of the deformation field is based on a tessellation T ℎ = { } ∈I of the image domain Ω. This tessellation must be nonoverlapping and can be uniform or adaptive and is typically made up of triangles or rectangles in 2D or tetrahedra or hexahedra in 3D. We use the ALUGrid library [8] which supports unstructured meshes in 2D and 3D with nonconforming local adaptivity and the possibility of domain decomposition and dynamic load balancing for parallel computations. Figure 3 shows a visualization of such a nonconforming locally adaptive grid.
The spatial discretization employed is based on the local discontinuous Galerkin (LDG) scheme. Given a tessellation T ℎ , this scheme follows the same ideas as the standard Galerkin method [26] but employs a discontinuous ansatz space: ℎ := {V ℎ :V ∈ [ ( )] for ∈ I}. Here V ≡ V ℎ| and ( ) denote the space of polynomials on the element of order . Note that there is no continuity assumption between elements, so that there is very little restriction on the tessellation. Due to the discontinuous ansatz space, the discretization of the higher order derivatives in the regularization term is slightly more involved than in the standard Galerkin method. We briefly sketch the LDG approach when applied to a partial differential equation of the form − Δ − ]∇ div =̃, wherẽcombines all lower order terms. This equation corresponds to the strong form of the Euler-Lagrange equation (see Appendix A (A.9)). Rewriting the second-order terms as a first-order system for the vector valued functions , 1 , . . . , , we arrive at with = 1, . . . , . This approach leads to a compact discrete form for the elastic regularization term R involving only Computational and Mathematical Methods in Medicine 9 first-order derivatives on each element ∈ T ℎ and numerical fluxes over cell boundaries. Focusing on a single element of the tessellation, the corresponding weak form of (35) takes the following form when we define the vector := ( + ] ) : The fluxeŝand̂can for example be taken as averages of the values on both sides of the boundary of or using suitable one-sided values. For more details on the LDG method see [27,28]. At the boundary we use homogeneous Dirichlet conditions for the deformation field . This is a reasonable assumption if we use bounded distance images as defined in (4). In this case, the images 0 and 1 are both equal to the constant bound near the domain boundary, and the functional only contains the regularization term R in this region; that is, only the regularization operator remains for which zero boundary conditions can be prescribed.
In our implementation, we use orthogonal basis functions { }. These are easily constructed by choosing on each element a polynomial basis A basis function of ℎ is then chosen to coincide with a polynomial basis function on one element and vanish on all other elements. Thus the mass matrix is a diagonal matrix of the form where is the identity matrix with = dim([ (R )] ) and is the number of elements in the grid; that is, the total number of DOFs is = . The diagonal form of makes its storage and the calculation of the weighted sample covariance matrix Σ defined in (27) very efficient. Thus, the LDG method used here not only allows us to efficiently use adaptivity and parallelization but also makes the computation of the PCA simple and efficient with respect to memory consumption and computational cost.

3D Surface Registration.
In this first experiment, we can observe that thanks to the level set representation of the surfaces, our method allows the accurate registration of surfaces. See Figure 3 for an example of the femur bone and Figure 4 for an example with the more complicated surface of the human skull, where our method benefits from the fact that the level set representation is independent of the topology of the surface. The topology of the skull surface is very complicated and, due to acquisition and segmentation artifacts, not necessarily the same for two different segmented skull surfaces.
In our experiments, only the reference was anatomically labeled and hand segmented with high attention to detail.   Figure 5 shows that this works over a very large range of examples from infant skulls to large adult skulls. Even though the concept of correspondence breaks down for different numbers of teeth in the reference and the target, the existing teeth are labelled correctly.
On a standard 3 GHz dual-core desktop PC with two parallel registration processes, a skull registration with 480 000 degrees of freedom (i.e., 160 000 grid points) takes about 10 minutes, a femur registration with 80 000 degrees of freedom (but more iterations) about 6 minutes. These are only indicative times to give the reader a feeling for the run time of our algorithm. Computation times can be further reduced with more parallel processes and more aggressive parameter tuning. Note that a great advantage of our method is that the adaptive discretization requires a much inferior number of degrees of freedom than a uniform discretization. A uniform discretization with a similar resolution around the surface requires about 18 million degrees of freedom, resulting in a memory consumption of over 700 MB per deformation field.

Curvature Term.
The above experiments were performed with all of the terms introduced in Section 2 except the statistical prior. We will now report on a couple of experiments that visualize the benefit of all the additional terms, starting with the curvature term introduced in Section 2.2.1.
In Figure 6(a), a registration of a femur is performed without the curvature term; that is, the surface is only represented by the level set function and the original CT scan. At a first glance, the surfaces are well registered, and the deformation field allows us to deform the reference bone so that it coincides with the target bone. However, on closer inspection, the implied correspondence is faulty. The deformation field matches the top of the trochanter minor (an important anatomical feature of the femur) of the reference to the side of the trochanter minor on the target. Such faulty correspondence causes problems in all subsequent applications such as building of statistical models or transferring anatomical labels. In Figure 6(b), when the curvature term is used, the correspondence is much more sensible, and top is matched to top and sides to sides. The mean curvature proves to be a good description of anatomical features of bones. Its use in the registration method ensures superior correspondence. Figure 7(a) shows the reference bone warped with a deformation field of a registration result.

Volume Preservation.
To amplify the effect for visualization purposes, we have warped the reference bone with the deformation field multiplied by 2. The coloring represents the size of the triangles that make up the surface. We see that in some places the original reference grid is quite unnaturally stretched resulting in very large triangles. This is the effect of large volume expansion in the deformation field. When this volume change is penalized with the volume preservation term, the resulting mesh is much more even, while still allowing an equally good matching of the target surface.
In Figure 7(a), the weighting parameters and ] from (8) have been chosen as = 2, ] = 0, that is, no volume preservation term. In Figure 7(b), as = ] = 2, that is, equal weight of the gradient and the divergence term. Simply augmenting the weight of the gradient term, for example, = 4, ] = 0 also results in less volume change, but still more than in Figure 7(b), while simultaneously decreasing the matching accuracy.

Statistical Deformation Prior.
In these final experiments, we exhibit the use and benefit of the statistical prior. We first performed registrations of 15 intact hands for the 2D example and 50 intact femurs for the 3D example. From these, statistical models are built according to the process described in Section 3.1. They can be used as prior knowledge for registering difficult or damaged data sets. In Figure 8,  Figure 7: Registration of two femurs with and without volume preserving term. When the term is used, the distribution of area over the triangles of the mesh is much more even because the limited volume change prohibits strong expansion or compression of the mesh. this prior knowledge allows the registration of a hand with a missing finger. Without this prior knowledge, the ring finger of the reference is deformed unnaturally in an attempt to match the hand with the missing finger. Note that the use of a robust distance measure and the regularization terms prevent a complete disappearance or distortion of the finger. When the prior knowledge from the statistical deformation model is used, the hand is well matched where intact. But the model enforces a registration with an intact ring finger, as all its training data sets include 5 intact fingers.
In Figure 9, a similar experiment in 3D is shown. The prior knowledge from 50 previous registrations is used in the registration of the reference bone to two damaged bones. The first bone has an artificial hip joint prosthesis and the second one a missing trochanter major. The registration method with statistical regularization registers the complete bones while complementing the missing parts from its prior knowledge. In this way, we can see how the original bone could have looked like. In these experiments, we have used 40 modes of variation (principle components). In a uniform discretization with a similar resolution around the surface, this would have required 28 GB of memory. Of course, the inclusion of the statistical prior introduces additional computational complexity. This results in an increase of computation time from about 5 minutes without to about 7 minutes with statistical prior per femur registration.
These experiments were performed with a robust distance measure (cf. Section 2.1). In the places with missing data, the regular 2 measure has a very large value and would dominate the registration process. As with all regularization techniques, a balance between over-and under-regularization has to be found. If too much weight is placed on the statistical prior, the result is pulled too close to the model.

Discussion
We have presented a registration method which allows the accurate and efficient registration of surfaces by registering their distance images. While this can be achieved with any image registration algorithm, we have shown that the most obvious choices for the distance measure and regularizer have several shortcomings, which can be relieved by adding additional distance and regularization terms to the model. We have seen that by representing the surfaces not by their distance images alone but also by a combination of feature images, most notably the curvature images, and we can obtain superior correspondence information. Concerning the regularizer, we have shown that adding a volume preserving term results in more even and naturally looking deformations and warps. Finally, we have shown that including prior knowledge in form of a statistical deformation model allows us to register considerably damaged data sets. While the statistical prior could be used in any registration task where prior registration results are available, we have found that for intact data sets (a) (b) (c) (d) Figure 9: Registration of a pair of damaged bones, with and without statistical regularization. The first bone has an artificial hip joint, and in the second, the trochanter major is missing. Without statistical regularization, the damaged bones are matched exactly ((a), (b)). With statistical regularization, the method recognizes that the damaged parts do not conform with the prior knowledge and restore them automatically ((c), (d)).
the method works well enough without the prior. We use the prior in cases where the data is too corrupted to be used directly, but where we still wish to add as much as possible of its information into our existing model. Our registration method is formulated continuously and is independent of the discretization method. We have presented a finite element discretization based on the local discontinuous Galerkin method, which allows for local grid adaption and a straightforward implementation of all the terms of our functional. The local grid adaption and parallelization limit the computational complexity and memory consumption, enabling us to perform registrations that were not possible with previous methods. In particular the inclusion of large statistical deformation models was not possible with previous methods based on uniform discretization such as the Demons Algorithm. This will become more and more important as the resolution of medical images increases in the future. The flexibility of the continuous functional and the LDG discretization allows the easy integration of other concepts and terms that are being developed in the field of image registration every day. On the other hand, the ideas and terms introduced here can be used with other discretization schemes or existing registration methods.
In the following, we give details on how to derive the Euler-Lagrange equations, which characterize the minimum of the functional J given in (21). Using the standard methods from the calculus of variations, we compute the Gateaux derivatives for each of the terms that make up our registration functional. This leads to the bilinear forms used in the finite element discretization in Section 3. We conclude this section with the derivation of the partial differential equations, that is, the strong form, which characterize the deformation field.
A.1. Distance Measure. As the individual terms of the combined distance measure (7)  The dependency of the weighting term on is neglected for reasons of efficiency. The other terms are derived analogously. The derivative of the combined distance measure (7) is therefore .

(A.2)
A.2. Regularization Term. We will continue by calculating the first variation of the regularization term R [ ] defined in (8). The divergence regularization term introduced in (9) is differentiated as follows: Statistical Prior. Taking into account that the covariance operator C is linear and self-adjoint and therefore this is also true for its pseudoinverse, it is easy to compute that the first variation of the statistical prior from (12)  This is the well-known equation from linear elasticity with a forcing term on the right hand side and the additional term from the statistical deformation prior.