3-D/2-D Registration of Cardiac Structures by 3-D Contrast Agent Distribution Estimation

For augmented fluoroscopy during cardiac catheter ablation procedures, a preoperatively acquired 3-D model of the left atrium of the patient can be registered to X-ray images. Therefore the 3D-model is matched with the contrast agent based appearance of the left atrium. Commonly, only small amounts of contrast agent (CA) are used to locate the left atrium. This is why we focus on robust registration methods that work also if the structure of interest is only partially contrasted. In particular, we propose two similarity measures for CA-based registration: The first similarity measure, explicit apparent edges, focuses on edges of the patient anatomy made visible by contrast agent and can be computed quickly on the GPU. The second novel similarity measure computes a contrast agent distribution estimate (CADE) inside the 3-D model and rates its consistency with the CA seen in biplane fluoroscopic images. As the CADE computation involves a reconstruction of CA in 3-D using the CA within the fluoroscopic images, it is slower. Using a combination of both methods, our evaluation on 11 well-contrasted clinical datasets yielded an error of 7.9+/-6.3 mm over all frames. For 10 datasets with little CA, we obtained an error of 8.8+/-6.7 mm. Our new methods outperform a registration based on the projected shadow significantly (p<0.05).

Abstract-For augmented fluoroscopy during cardiac catheter ablation procedures, a preoperatively acquired 3-D model of the left atrium of the patient can be registered to X-ray images. Therefore the 3D-model is matched with the contrast agent based appearance of the left atrium. Commonly, only small amounts of contrast agent (CA) are used to locate the left atrium. This is why we focus on robust registration methods that work also if the structure of interest is only partially contrasted. In particular, we propose two similarity measures for CA-based registration: The first similarity measure, explicit apparent edges, focuses on edges of the patient anatomy made visible by contrast agent and can be computed quickly on the GPU. The second novel similarity measure computes a contrast agent distribution estimate (CADE) inside the 3-D model and rates its consistency with the CA seen in biplane fluoroscopic images. As the CADE computation involves a reconstruction of CA in 3-D using the CA within the fluoroscopic images, it is slower. Using a combination of both methods, our evaluation on 11 well-contrasted clinical datasets yielded an error of 7.9±6.3 mm over all frames. For 10 datasets with little CA, we obtained an error of 8.8±6.7 mm. Our new methods outperform a registration based on the projected shadow significantly (p < 0.05).

I. INTRODUCTION
Atrial fibrillation is the most common heart arrhythmia affecting around 2.2 million people in the US. A possible treatment option is catheter ablation, which is a minimally invasive procedure. It is carried out using either electroanatomic mapping systems, a fluoroscopy guided approach or a combination of both. In this paper, we refer to fluoroscopy guided approaches. Unfortunately, X-ray images suffer from poor soft-tissue contrast such that the left atrium (LA) can only be seen if contrast agent (CA) is injected. However, to reduce the risk of contrast-induced nephropathy, physicians try to keep the use of CA to a minimum often highlighting only a part of the left atrium. To provide orientation to the physician when no CA is present, a model of the LA, e.g., generated by a CT or MRI scan of the patient can be overlaid [2]. As the coordinate systems of the preprocedurally acquired 3-D heart model and the patient during the intervention differ, a registration step has to be performed. In clinical practice, this registration is usually carried out manually often involving a CA injection [1]. Unfortunately, manual registration complicates the workflow. It either increases the workload of the treating physician, or it involves a trained assistant. Therefore, automatic registration is preferred.
There has been much research about registration of 3-D objects to 2-D fluoroscopic images, e.g. for bones [3], [4] or implants [7]. An overview is given by Markelj et al. [8]. Compared to implants, a registration of the LA is more complicated for two reasons: First, for implants and bones, usually the whole object is visible under fluoroscopy. During a contrast injection, however, only parts of the left atrium may be visible in the fluoroscopic images. Second, the general visibility of the LA may be poor depending on how much CA is used. As a consequence, further effort is needed to develop robust registration methods that can also be applied if CA is used sparingly.
In a first approach towards automatic LA registration, Thivierge-Gaulin et al. [9] tried to find a 3-D pose of a model such that its projected shadow matches the contrasted area in a selected image, enhanced by digital subtraction angiography (DSA), best. Based on CT images, a second approach by Zhao et al. [10] relied on digitally rendered radiographies of the segmented left atrium. The rendered image was compared to a DSA image using normalized gradient correlation where distinct regions of the atrium were weighted differently.
We propose two new registration techniques for contrast agent-based registration: In our first method, we take explicitly apparent edges extracted from a 3-D model and compare them to LA edges present in the fluoroscopic images as proposed by [3], [4] for automatic registration of bones and by [5] for manual registration of the LA, respectively. This comparison can be carried out quickly on a GPU. Second, we introduce a novel similarity measure for biplane fluoroscopy that is tailored for cases in which only parts of an object are visible. Based on a 3-D model of the LA, our second method estimates the contrast agent distribution inside the 3-D object from a simultaneously acquired pair of fluoroscopic images taken under two different view angles. Then we evaluate how consistent the contrast agent distribution estimate (CADE) is arXiv:1601.06062v1 [cs.CV] 22 Jan 2016 Fig. 1. A correct (red) and wrong (cyan) registration result. In both cases, the contrasted area is fully inside the projection shadow represented by the colored outline. This leads to a similar NCC value when using only an areabased feature for automatic registration. Thanks to the best reference frame selection, motion artifacts could be kept to a minimum. Only some remained in the vicinity of the moving coronary sinus (CS) catheter and the diaphragm, see white arrows.
with the acquired fluoroscopic images. As the CADE depends on the transformation used for registration, the transformation leading to the most plausible CADE is used as final position estimate.

II. REGISTRATION METHOD
For registration, two fluoroscopic sequences showing a CA injection are used. These sequences are acquired simultaneously from two different angles using an angiography biplane system. For each plane, the projection matrix that describes the X-ray camera setup is known. We denote the associated projection operator by P . We also assume that a 3-D model of the patient's LA is available, either as a triangle mesh or a binary volume, as they can be converted into each other.

A. Contrast Agent Extraction
The contrasted area is found based on a difference image (DSA) I DSA = I u − I c , I ∈ R m×n involving a frame I c that contains contrast agent and an uncontrasted frame I u .
To distinguish between contrasted and uncontrasted frames, either manual annotation, a threshold based method e.g. the method described in [10] or an automatic contrast detection [6] can be used. Depending on the chosen contrasted frame I c , I DSA may contain artifacts due to motion of the diaphragm or from catheters if they are at different positions in I u and I c . Such motion artifacts depend, unlike the information about contrast agent, to a large degree on the choice of I u . For example, if the catheters in I u are at the same position as in I c , their intensities cancel out in the subtraction image. Otherwise, I DSA has high positive values at the position of the catheter in I c and high negative values at the position of the catheter in I u . To keep motion artifacts to a minimum, we propose a best reference selection, which chooses an appropriate reference frameÎ u that matches the chosen contrasted frame I c as much as possible. Out of all uncontrasted frames, that frameÎ u is selected which minimizes the L 1 -norm of the resulting DSA imageÎ (1) By following Eq. 1, we get frames for which the catheters and the diaphragm cancel out as much as possible. See Figure 2(b) for an example. In I DSA , only pixels with positive values contain contrast agent. To extract them, we set the intensity of pixels with negative value to 0. Afterwards, we compute a filtered image I f by applying a median filter with a large kernel size. Smaller structures, e.g., caused by motion artifacts that remained despite the optimized choice of I u , do not pass this filter, and the noise in the contrasted area is reduced as well. Finally, a binary image I thr of the filtered image I f is computed using a threshold at µ f + σ f where µ f and σ f denote the mean and standard deviation of I f , respectively. Thus, a contrasted pixel p ∈ R 2 is indicated by I thr (p) = 1.
A previous approach [9] tried to find a transformation T of the 3-D model such that the projected shadows S A T , S B T of the model into the A-plane and the B-plane of a biplane C-arm system fit best to the contrasted region. Using the normalized cross correlation (NCC), denoted as ρ n , of two images I 1 , I 2 with corresponding mean values µ 1 , µ 2 and standard deviations σ 1 , σ 2 ρ n (I 1 , the similarity of the projected shadow and I DSA can be measured. A registration transformation can be estimated by maximizing either one of the two functions

B. Edge Feature
Unfortunately, a registration approach only based on contrasted area has multiple solutions if the amount of CA is so little that it can be located at different positions inside the LA, see Fig. 1. Often, CA is injected against the roof or into the pulmonary veins. This results in perceivable edges of the contrasted area which can be used as registration features as well. Edge-based registration can be done using only the silhouette boundary of the projected object [7], see Fig. 1 or all apparent edges [3], [4], see Fig. 2(f).
We decided to go for the second approach, as the silhouette corresponds to the edges in the image only if the complete atrium is filled with contrast. For a partially contrasted left atrium, internal contours may, however, also appear in the fluoroscopic images. This was already found to be beneficial for manual LA registration [5]. Instead of considering edges implicitly by comparing the DSA image to a DRR using gradient correlation [10], we computed them explicitly. To extract edges in the fluoroscopic images, we used the previously computed filtered image I f . After applying a median filter, edge-like variations inside the contrasted areas may remain. They would trigger a response, if an edge filter was applied. To obtain an edge response only at the boundaries of the contrasted area, the image needs to be homogenized before applying an edge filter. Using a simple threshold method would result in a loss of the intensity drop-off at the boundary which provides important information about the edge intensity. Therefore, we weigh all image pixels by a sigmoid function I sig (x, y) = 1 1 + e −(I f (x,y)+t)·s .
The value of t is set to µ f − σ f , and the parameter s depends on the pixel intensity range of the input image. An example of I sig is given in Figure 2(d). Finally, I sig is filtered using a derivative of Gaussian (DOG) filter to obtain the edge image I DOG . The kernel size of the DOG-filter is set to a large value to get a smooth similarity measure, see Figure 2(e). The projection of the 3-D triangle mesh edges into 2-D is done differently than in [3], [5]. We rendered the whole surface mesh and, depending on the viewing direction d and the surface normal n at a point, we set the opacity of projected triangles to o = 1 − (d • n), see Figure 2(f) for an example. By doing so, areas that are parallel to the imaging plane are rendered transparent while areas with a normal vector orthogonal to the viewing direction are rendered opaque. The similarity between edges extracted from the fluoroscopic images and the edge images E A T , E B T rendered from the 3-D model transformed by T is measured by

C. Contrast Agent Distribution Estimation (CADE)
Previous approaches [9], [10] for LA registration searched for a rigid transformation of the LA such that either its projected shadow or its DRR fit to the contrasted area in both fluoroscopic images. 3-D information was integrated insofar as the resulting projections came from the same 3-D position of the model. Unfortunately, such an approach does not guarantee that corresponding objects in both fluoroscopic images are matched to the same 3-D structure of the LA. More precisely, the registration result could be such that in plane A, the contrast agent is located in a left pulmonary vein (PV) whereas in plane B, the contrasted area corresponds to a right PV. This is possible as for a given 2-D registration in one plane, the 2-D registration in the other plane has one degree of freedom, which corresponds to an out-of-plane motion in the first plane.
To solve this problem, we compute for a given transformation T a CADE inside the LA using binary reconstruction. Then, T is optimized such that the contrast agent distribution estimate is most consistent with the projection images. More precicely, a voxel v is estimated as contrasted if it fulfills all of the following conditions: The voxel v transformed by T is (a) projected on a contrasted pixel in plane A, (b) projected on a contrasted pixel in plane B, and (c) v is part of the left atrium as contrast agent can only be found inside the left atrium. For the computation of the CADE, we define therefore the indicator function Given the binary images I A thr and I B thr with corresponding projection operators P A , P B and the indicator function χ(v ), the CADE C 3−D T can be computed as for a given rigid transformation T . These three factors correspond to the aforementioned conditions. If T is chosen suboptimally, the resulting 3-D CADE will be inconsistent with the CA observed in the 2-D images. I.e. a pixel in the 2-D image is constrasted but no corresponding voxel along its projection ray is estimated as contrasted. This can be due to following reasons as shown in Figure 3: (a) the projection ray from a contrasted pixel does not intersect the left atrium as the LA has not been placed at the proper position yet; (b) the projection ray hits the LA, but all voxels intersected by this ray cannot contain CA because their corresponding pixels in the other plane are uncontrasted. Fig. 3. For the transformation shown, only voxel B is estimated as containing CA as it is inside the LA and contrasted in both planes. Voxel C is only contrasted in Plane A but not in Plane B. Voxel A is filled with contrast in both planes, but it is outside of the LA and therefore considered as uncontrasted. If the LA is moved such that its projections in A and B cover the contrasted area, this voxel will also be estimated as contrasted.
Additional inconsistencies are introduced by pixels which are erroneously labeled as contrasted e.g. due to motion artifacts. To verify the validity of the CADE, we compute 2-D images C A T , C B T by forward projecting all contrasted voxels in C 3−D T using P A , P B . We assess the consistency of the CADE for the given transformation T by computing the similarity between the fluoroscopic images and the projected CADE by

III. EXPERIMENTS AND RESULTS
We evaluated our method on 21 clinical biplane X-ray sequences from 10 different patients. The data set contained 11 sequences showing an initial contrast agent injection where CA was injected into the LA center through a sheath. Besides the sheath, only the coronary sinus catheter was present. There were 10 more sequences showing a secondary injection for re-registration. Here, less CA was injected and additional catheters were present. For all 133 contrasted frames, reference registrations performed by three clinical experts were available. These reference registrations covered only translation as in [1].
As initialization for optimization, the 3-D model was placed at that 3-D position which corresponded to the centers of both 2-D images. In some cases, the initialization was more than 30 mm away from the correct solution and beyond the capture range for gradient-based methods. Therefore we applied an octree-like coarse-to-fine scheme where we evaluated several positions at a coarse resolution. At positions in space that yielded a good similarity value, we performed subsequent evaluations on an increasingly finer resolution. The 3-D translationt = arg max t ρ(t) found by the optimization process of the respective objective function ρ was compared to the mean translation vector t * of the three manual registration results. The distance ||t − t * || 2 was used as error measure. The significance of the results was measured using a Wilcoxon signed-rank test and a significance level of 0.05.
All sequences contained 12-bit images of size 1024 × 1024 pixels, all image processing, including rendering from the 3-D model, was performed on the full image size. The kernel size for median filter was 30 pixels, the value s of Eq. 5 was 0.1. The standard deviation of the DOG filter was 24 pixels.  In the evaluation, we compared the similarity measures ρ thr shad , ρ DSA shad , ρ CADE and ρ edge and the combined similarity measures ρ thr shad +ρ edge , ρ DSA shad +ρ edge and ρ CADE +ρ edge . We investigated different weightings. Giving both terms equal weights turned out to be a good choice. During evaluation, a registration for all frames marked as contrasted was performed.
We computed a registration for each contrasted frames and compared the result to the manual registration of the physicians. These results are clinically relevant if the physician requires a registration for a frame he determines, e.g depending on the breathing phase. As potentially every frame could be selected by the physician, the overall accuracy should be high. The overall accuracy is also of importance if the results are post-processed, e.g. a temporal filtering is applied.
The evaluation results are presented in TABLE I. The overall inter-user-variablity observed in the manual registrations was 3.2±2.3 mm. Considering all sequences, ρ thr shad and ρ DSA shad gave significant better results when they are combined with ρ edge . Also the performance of ρ CADE + ρ edge was significantly better than ρ thr shad , ρ DSA shad and ρ thr shad + ρ edge . Compared to other measures, ρ edge gave significantly worse results. An example for a result is given in Fig. 4.
The image preprocessing takes 0.5 s on an Intel Xeon E3 with 3.4 GHz and 16 GB RAM. The evaluations of the similarity measures were performed completely on the GPU. On an NVIDIA GeForce GTX 660 the evaluation of ρ DSA shad , ρ thr shad and ρ edge took 1.8 ms for a given translation and 13.4±3.7 ms for ρ CADE . The whole registration for a single frame takes 2.9 s for ρ DSA shad and ρ thr shad and 21.5±5.9 s for ρ CADE . For a combination with ρ edge it takes 5.8 s and 27.1±5.9 s, respectively.

IV. DISCUSSION AND CONCLUSIONS
Our novel CADE based method outperformed the shadow based similarity measures ρ DSA shad and ρ thr shad , especially for reregistration sequences where only a small amount of contrast agent was used. For these sequences, registration errors leading to inconsistent results could be avoided by ρ CADE . For well contrasted sequences, the improvement by ρ CADE is less as inconsistencies play a minor role. We found that the similarity measure using explicit apparent edges, ρ edge , yields poor results when used by its own, but can improve results significantly when combined with other similarity measures.
It remains open if the accuracy for all frames, which is between between 7 mm and 9 mm, is sufficient for a clinical application. In a future work, temporal constraints on the heart movement could be included in the registration process or a temporal filtering could be applied afterwards.
Compared to the approach by Zhao et al. [10], the determination of special weightings for different heart regions, and, for ρ thr shad +ρ edge , a time consuming DRR generation was avoided. To summarize, a registration based on a combination of shadow and edge features leads to a registration that can be computed fast. If it is possible to use more time for registration, the novel CADE-based measure, which estimates consistency, should be used as it leads to better results.