Contrast-Based 3D/2D Registration of the Left Atrium: Fast versus Consistent

For augmented fluoroscopy during cardiac ablation, a preoperatively acquired 3D model of a patient's left atrium (LA) can be registered to X-ray images recorded during a contrast agent (CA) injection. An automatic registration method that works also for small amounts of CA is desired. We propose two similarity measures: The first focuses on edges of the patient anatomy. The second computes a contrast agent distribution estimate (CADE) inside the 3D model and rates its consistency with the CA as seen in biplane fluoroscopic images. Moreover, temporal filtering on the obtained registration results of a sequence is applied using a Markov chain framework. Evaluation was performed on 11 well-contrasted clinical angiographic sequences and 10 additional sequences with less CA. For well-contrasted sequences, the error for all 73 frames was 7.9 ± 6.3 mm and it dropped to 4.6 ± 4.0 mm when registering to an automatically selected, well enhanced frame in each sequence. Temporal filtering reduced the error for all frames from 7.9 ± 6.3 mm to 5.7 ± 4.6 mm. The error was typically higher if less CA was used. A combination of both similarity measures outperforms a previously proposed similarity measure. The mean accuracy for well contrasted sequences is in the range of other proposed manual registration methods.


Introduction
Atrial fibrillation is the most common heart arrhythmia affecting around 2.2 million people in the USA. A possible treatment option is catheter ablation, which is a minimally invasive procedure. Depending on the ablation device used, it is carried out using either electroanatomic mapping systems, an X-ray guided approach, or a combination of both. X-ray guidance is, for example, required if a cryoballoon is used for ablation as current cryoballoons cannot be directly located by electroanatomic mapping systems. In this paper, we focus on X-ray guided approaches. One weakness of X-ray imaging is poor soft-tissue contrast. As a consequence, the left atrium (LA) can only be seen if contrast agent (CA) is injected. However, to reduce the risk of contrast-induced nephropathy, physicians try to keep the use of CA to a minimum often highlighting only a part of the left atrium. To provide orientation to the physician when no CA is present, a model of the patient's LA, for example, generated by a preoperative CT or MRI scan of the patient can be overlaid [1]. The use of such an overlay was found to reduce procedure time and fluoroscopy time significantly [2]. Since a preprocedural 3D scan is often acquired to obtain prior knowledge about a patient's anatomy of the left atrium and to rule out unusual pulmonary vein configurations, it is readily available for augmented fluoroscopy applications. In general, however, the coordinate systems of the preprocedurally acquired 3D heart model and the patient during the intervention differ, and a registration step is usually needed. Today, this registration is usually carried out manually. If the 3D heart model was acquired using CT, several authors proposed to use further landmarks in addition to the LA for registration having 2 International Journal of Biomedical Imaging obtained associated 3D models by segmentation. Examples are the carina [3], the coronary sinus (CS) [4,5], the spine [6], or a combination of spine and heart anatomy [7]. For 3D volumes acquired by MRI, for example, MRI angiographies (MRAs), however, a segmentation of the CS or the carina can be very challenging or has not yet been adopted for routine clinical procedures. Fortunately, registration based on contrast injection has been shown to be a fast and accurate alternative [8]. A good moment to obtain contrast-enhanced X-ray images is after the transseptal puncture, in particular if physicians use contrast injections to verify puncture success [9]. Example images of such injections are shown in Figure 1 to illustrate the images available. Since manual registration either needs the attention of the treating physician or requires a trained assistant, more automatic registration methods are desirable.
(1) Related Work. There is a significant body of research on registration of 3D objects to 2D fluoroscopic images, for example, for bones [10][11][12], implants, joints [13], or vessels International Journal of Biomedical Imaging 3 [14]. An overview is given by Markelj et al. [15]. Compared to implants, a registration of the LA is more complicated for at least two reasons: First, for implants and bones, all parts of the object are visible. During a contrast injection, however, only a part of the left atrium may appear under X-ray. Second, depending on the amount of CA injected, the overall LA visibility may be poor. For example, in our case EP physicians use contrast at the beginning of the procedure to verify the success of the transseptal puncture. It may also be used later to enhance the anatomy, for example, to make sure that a catheter, for example, a circumferential mapping catheter or a cryoballoon catheter, was placed correctly. This is different to vessel angiography. In this case, higher amounts of contrast are injected to derive diagnostic information, for example, about a stenosis. For ablation procedures in the LA, the CA density and so the visibility of the LA under X-ray may be poorer as the small amount of contrast is injected into the high blood volume of the LA. As a consequence, further effort is needed to develop robust registration methods that can also be applied if CA is used sparingly.
Currently, very few publications deal with registration based on contrast agent: In a first approach towards automatic LA registration, Thivierge-Gaulin et al. [16] tried to find a 3D pose of a model such that its projected shadow matches the contrasted area in a selected image, enhanced by digital subtraction angiography (DSA), best. However, if only a small amount of contrast agent is injected into a somewhat large chamber such as the left atrium, this approach will not lead to a distinct optimum, because the anatomy may not be fully opacified.
Based on CT images, a second approach by Zhao et al. [17] relied on digitally rendered radiographies (DRRs) of the segmented left atrium. The rendered image was compared to a DSA image using normalized gradient correlation. This approach uses a weighting scheme that puts the focus on the roof of the LA. How well this method performs for injections into other areas of the LA is still unclear.
Besides contrast-based registration, 3D data and 2D data can also be aligned using devices. A feature-based method by Sra et al. [4] uses a segmentation of the coronary sinus (CS) in a CT volume to register a 3D LA model to a single 2D fluoroscopic image. A further approach by Brost et al. [5] uses a segmentation of the CS in an MRI volume and a 3D reconstruction of the CS catheter from two fluoroscopic images. Unfortunately, how well the CS can be extracted from a 3D MRI data set depends very much on the MRI scan protocol. Furthermore, due to the strong motion of the CS, it is difficult to relate the position of the CS catheter to the position of the LA, especially for patients having no sinus rhythm but heart arrhythmia [18,19].
(2) Contributions and Outline of This Work. At the beginning of Section 2, we discuss the use of the normalized crosscorrelation for registration of a 3D object to 2D X-ray projections based on extracted contrast agent. Afterwards, we propose two new registration techniques for contrast agentbased registration. They require a segmented left atrium as input and can, in contrast to [17], accommodate both CT and MRI data. The first method is described in Section 2.2.
Here, we take explicitly apparent edges extracted from a 3D model and compare them to LA edges present in the fluoroscopic images as proposed by [10,11] for automatic registration of bones and by [20] for manual registration of the LA, respectively. Although our method is conceptually similar to a DRR generation followed by gradient correlation, this calculation can be carried out on a GPU much more quickly than a DRR.
The second contribution described in Section 2.3 is the introduction of a novel similarity measure for biplane X-ray that is tailored to cases in which only parts of an object are visible. Based on a 3D model of the LA, our second method estimates the contrast agent distribution inside the 3D object from a simultaneously acquired pair of fluoroscopic images taken under two different view angles. Then we evaluate how consistent the contrast agent distribution estimate (CADE) is with the acquired fluoroscopic images. As the CADE depends on the transformation used for registration, the transformation leading to the most plausible CADE is used as final position estimate. A brief description of both methods was previously published by us [21].
In Section 2.4 we propose to treat the registration results for each frame of the sequence no longer as independent. Based on our previous publication [22], we use a Markov chain approach to exploit the temporal dependency between successive frames instead.
In Section 3, we evaluate a similarity measure based on the projected shadow which is close to the approach by Thivierge-Gaulin et al. [16] and the two new methods as well as combinations of them. The results are discussed in Section 4.

Registration Method
For registration, two X-ray sequences showing a CA injection are used. These sequences are acquired simultaneously from two different angles using a biplane system. For each plane, the projection matrix that describes the X-ray camera setup is known. We denote the associated projection operator by A and B for the A-plane and the B-plane of the system, respectively. We also assume that a 3D model of the patient's LA is available, either as a triangle mesh or as a binary volume, as they can be converted into each other.

Contrast Agent Extraction.
The contrasted area is found based on a difference image (DSA) I DSA = I − I , I ∈ R × involving a frame I that contains contrast agent and an uncontrasted frame I . To distinguish between contrasted and uncontrasted frames, either manual annotation or a learning based method can be used, for example, the method described in [23]. Depending on the chosen contrasted frame I , I DSA may contain artifacts, for example, due to motion of the diaphragm or from catheters, if they are at different positions in I and I . Such motion artifacts depend, unlike the information about contrast agent, to a large degree on the choice of I . For example, if the catheters in I are at the same position as in I , their intensities cancel out. Otherwise, I DSA has high positive values at the position of the catheter in I and high negative values at the position of the catheter in I . To keep motion artifacts to a minimum, we propose a best reference selection, which chooses an appropriate reference frameÎ that matches the chosen contrasted frame I as much as possible. Out of all uncontrasted frames, that frameÎ is selected which minimizes the 1 -norm of the resulting DSA image:Î = arg min By following (1), we get frames for which the catheters and the diaphragm cancel out as much as possible. See Figure 3(b) for an example. In I DSA , only pixels with positive values contain contrast agent. To extract them, we set the intensity of pixels with negative value to 0. Afterwards, we compute a filtered image I by applying a median filter with a large kernel size. Smaller structures, for example, caused by motion artifacts that remained despite the optimized choice of I , do not pass this filter, and the noise in the contrasted area is reduced as well. Finally, a binary image I thr of the filtered image I is computed using a threshold at + , where and denote the mean and standard deviation of I , respectively. Thus, a contrasted pixel p ∈ R 2 is indicated by I thr (p) = 1.
A previous approach [16] tried to find a transformation of the 3D model such that the projected shadows S A , S B of the model into the A-plane and the B-plane of a biplane C-arm system fit best to the contrasted region. Using the normalized cross-correlation (NCC), denoted as , of two images I 1 , I 2 with corresponding mean values 1 , 2 and standard deviations 1 , 2 the similarity of the projected shadow and I DSA can be measured. Instead of I DSA , one can also use the binary version of it, that is, I thr . Then a registration transformation can be estimated by maximizing either one of the two functions (3)

Edge
Feature. Unfortunately, a registration approach only based on contrasted area has multiple solutions if the amount of CA is so little that it does not completely fill the LA; see Figure 2. Fortunately, CA is often injected against the roof or into the pulmonary veins. This results in perceivable edges of the contrasted area which can then be used as registration features. Edge-based registration can be carried out using only the silhouette boundary of the projected object [13] (see Figure 2) or all apparent edges [10,11] (see Figure 3(f)).
We decided to consider all apparent edges to improve robustness against injections of small amounts of CA, as the silhouette-based approach requires that the LA is contrasted in its entirety. For a partially contrasted left atrium, internal Figure 2: A correct (red) and wrong (cyan) registration result. In both cases, the contrasted area is fully inside the projection shadow represented by the colored outlines. Both registration results lead to a similar NCC value when using an area-based feature for automatic registration. Note that motion artifacts could be kept to a minimum thanks to the best reference frame selection. Only some residual artifacts remained in the vicinity of the moving coronary sinus (CS) catheter and the diaphragm; see white arrows.
contours may, however, also appear in the fluoroscopic images. This was already found to be beneficial for manual LA registration [20]. Instead of considering edges implicitly by comparing the DSA image to a DRR using gradient correlation [17], we computed them explicitly. To extract edges in the fluoroscopic images, we used the filtered image I . After applying a median filter, edge-like variations inside the contrasted areas may remain. They, however, correspond rarely to anatomical structures and would trigger a response, if an edge filter was applied. To obtain an edge response only at the boundaries of the contrasted area, the image needs to be homogenized before applying an edge filter. Using a simple threshold method would result in a loss of the intensity dropoff at the boundary which provides important information about the edge intensity. Therefore, we weigh all image pixels by a sigmoid function The value of is set to − , and the parameter depends on the pixel intensity range of the input image. An example of I sig is given in Figure 3(d). Finally, I sig is filtered using a derivative of Gaussian (DOG) filter to obtain the edge image I DOG . The kernel size of the DOG filter is set to a large value to get a smooth similarity measure; see Figure 3(e).
The projection of the 3D triangle mesh edges into 2D is done differently than in [10,20]. We rendered the whole surface mesh and, depending on the viewing direction d and the surface normal n at a point, we set the opacity of projected triangles to = 1 − |(d ∘ n)|; see Figure 3(f) for an example. By doing so, areas that are parallel to the imaging plane are rendered transparent while areas with a normal vector orthogonal to the viewing direction are rendered opaque. The similarity between edges extracted from the International Journal of Biomedical Imaging fluoroscopic images and the edge images E A , E B rendered from the 3D model transformed by is measured by

Contrast Agent Distribution Estimation (CADE).
Previous approaches [16,17] for LA registration searched for a rigid transformation of the LA such that either its projected shadow or its DRR fit to the contrasted area in both fluoroscopic images. 3D information was taken into account insofar as the resulting projections came from the same 3D position of the model. However, such an approach does not necessarily guarantee that corresponding objects in both fluoroscopic images are matched to the same 3D structure of the LA. More precisely, the registration result could be such that, in plane A, the contrast agent is located in a left pulmonary vein (PV) whereas, in plane B, the contrasted area corresponds to a right PV. This is possible as for a given 2D registration in one plane the 2D registration in the other plane has one degree of freedom, which corresponds to an out-of-plane motion in the first plane. An illustration of this problem can be found in Figure 4.
To solve this problem, we compute for a given transformation a CADE inside the LA using binary reconstruction. Then, is optimized such that the contrast agent distribution estimate is most consistent with the projection images. More precisely, a voxel k is estimated as contrasted if it satisfies all of the following conditions: (a) the voxel k transformed by is projected on a contrasted pixel in plane A, (b) k is projected on a contrasted pixel in plane B, and (c) k is part of the left atrium (as contrast agent can only be found inside the LA). To compute the CADE, we define the indicator function Given the binary images I A thr and I B thr with corresponding projection operators A , B and the indicator function (k), the CADE 3D for a transformed voxel, (k), can be computed as for a given rigid transformation . This product is the mathematical equivalent of the three conditions introduced above.
If is chosen suboptimally, the resulting 3D CADE will be inconsistent with the CA observed in the 2D images. That is, a pixel in the 2D image is contrasted but no corresponding voxel along its projection ray is estimated as contrasted. This can be due to the following reasons as shown in Figure 5: (a) the projection ray from a contrasted pixel does not intersect  Considering the LA position in (b), the area, which can contain contrast agent according to the combination of both detectors, is partially outside the LA. For the CADE, however, only contrast agent inside the LA is considered. So the registration in (b) gives rise to contrasted pixels on the A-plane detector that cannot be explained by the CADE. This will result in a low CADE value and indicate misregistration. There exist, however, also misregistrations that can lead to a consistent CADE as shown in (c). This is why a combination of CADE and edge-based methods yields improved results. Figure 5: For the transformation shown, only voxel B is estimated as containing CA as it is inside the LA and contrasted in both planes. Voxel C is only contrasted in plane A but not in plane B. Voxel A is filled with contrast in both planes, but it is outside of the LA and therefore considered as uncontrasted. If the LA is moved such that its projections in A and B cover the contrasted area, this voxel will also be considered as contrasted, hence, increasing the consistency with the CADE.
the left atrium as the LA has not been placed at the proper position yet; (b) the projection ray hits the LA, but all voxels intersected by this ray cannot contain CA because their corresponding pixels in the other plane are uncontrasted. Additional inconsistencies are introduced by pixels which are erroneously labeled as contrasted, for example, due to motion artifacts. To verify the validity of the CADE, we compute binary 2D images C A , C B by forward projecting all contrasted voxels in 3D using A , B . We assess the consistency of the CADE for the given transformation by computing the similarity between the fluoroscopic images and the projected CADE by Alternatively, the number of contrasted voxels inside the volume could be maximized. The set of potentially contrasted International Journal of Biomedical Imaging voxels is defined by the intersection of the projection ray bundles of all contrasted pixels from views A and B, respectively. Maximizing the number of contrasted voxels would lead to a transformation such that as much as possible of this set is contained inside the LA. If, for example, a pulmonary vein was filled with CA, the transformation would be chosen such that the set of possible contrasted pixels was located in the main body of the LA which provides the most space to include most possibly contrasted voxels; see Figure 6. To avoid this registration bias towards large structures inside the LA, we decided to optimize the consistency to the 2D images.

Confidence-Based Temporal Markov Filtering.
To exploit dependencies between successive frames of a sequence, we model the position of the LA model in 3D as a timedependent continuous Markov chain of the first order as proposed by us previously [22]. The states are transformations, that is, a translation and a rotation of the model. The parameter denotes the actual transformation of the LA in the th contrasted frame, refers to the estimate of the transformation for the th frame, and ( = ) is the probability that for frame the transformation is observed. For convenience, we define ( ) = ( = ). The transition probability from one state into another is independent of the frame number . It is denoted as ( → +1 ). Finally, a sequence of transformations 1 , . . . , is to be determined such that the term is maximized.
Transition probabilities and state probabilities control the filtering differently: Due to the transition probabilities, small motions of the LA from a frame to its next frame are preferred. This results in an averaging of the LA transformation over time. The state probability determines the impact of the averaging, that is, how much the registration result is changed by the filtering. We will model the state probability based on a confidence measure such that frames with high-confidence registration results are less subject to temporal filtering. If the confidence value is low, that is, the estimated error of the registration result is higher, we want to rely more on the results of the previous and next frame. Thus, high temporal averaging is performed for frames with a low confidence value. The state probabilities and the transition probability are determined as follows.
The confidence range is incorporated by setting the covariance matrix Σ to 1 ⋅ ( (I A, , I B, , )).

Transition
Probability. The transition probability ( −1 → ) states how likely a movement of the LA is from frame − 1 to . Over multiple breathing cycles, the LA moves about a mean position. So the mean transformation is the identity. The likelihood of a transformation change depends on the magnitude of the change. The larger the change in translation and rotation is, the less likely the transformation transition is. To account for different frame rates, we consider the translational and rotational velocity k ∈ R 6 which comprises three translational velocities and three rotational velocities. In a training step, velocity vectors are computed for annotated data. The covariance matrix Σ k of the velocities gives an estimate for how likely a transition from one transformation to the next is. We model the probability of a transition −1 → as a normal distribution Besides the frame rate , the transition probability depends also on the current breathing phase. Compared to breathing 8 International Journal of Biomedical Imaging motion, cardiac motion can be neglected as it rather deforms the LA. If information on the breathing phase can be estimated [24], superior motion should be, for example, more likely during the inhale phase. During exhalation phase, inferior motion should have a higher probability. To account for breathing motion, for example, the mean value and the covariance matrix could be estimated separately for each stage of the breathing cycle.

Most Probable State
can be found using a log-likelihood method: By applying the logarithm to (9), we get * 1 , . . . , * = arg max This convex optimization problem can be solved using the BFGS method.

Experiments and Results
We retrospectively evaluated our method on 21 clinical biplane X-ray sequences from 10 different patients. All of the patients provided their informed consent for the analysis of their clinical data. For all patients, a segmentation of their left atrium from a preoperatively acquired MRI scan was available as a triangle mesh. When using the CADE measure, this mesh was converted into a binary volume. The data set comprised 11 sequences showing an initial contrast agent injection where 15 mL CA was injected into the LA center through a sheath to verify the success of the transseptal puncture. Besides the sheath, only a coronary sinus catheter was present. There were 10 more sequences showing subsequent injections to verify catheter placement. In these cases, about 10 mL CA was injected and additional catheters were present. All Xray angiography sequences were acquired during normal breathing using a standard acquisition protocol. This resulted in a total number of 133 contrasted frames. For our experiments, the first contrasted frame was determined manually. For 129 frames, reference registrations performed by three clinical experts were available; for the remaining four frames registrations performed by two clinical experts were available. Reference 3D registrations were established by manually shifting the mesh in each of two orthogonal views. The resulting 3D translation was refined this way until a good match to the associated contrast distributions as seen in the two 2D X-ray projections had been found. These reference registrations covered only 3D translation for the following reasons: First, patients were positioned head first, supine, during both preoperative and intraoperative imaging. This patient positioning rules out large degrees of rotation a priori. Furthermore, given the small amounts of contrast injected in our cases, the remaining small rotations were very difficult to detect. This made it practically impossible for our clinical experts to reliably correct them. This is why we decided to evaluate our approach without rotation.
As initialization for optimization, the 3D model was placed at that 3D position which corresponded to the centers of both 2D images. In some cases, the initialization was more than 30 mm away from the correct solution and beyond the capture range for gradient-based methods. Therefore we applied an octree-like coarse-to-fine scheme where we evaluated several positions at a coarse resolution. At positions in space that yielded a good similarity value, we performed subsequent evaluations at an increasingly finer resolution. The 3D translationt = arg max t (t) found by the optimization process of the respective objective function was compared to the mean translation vector t * of the three manual registration results. The distance ‖t − t * ‖ 2 was used as error measure. The significance of the results was measured using a Wilcoxon signed-rank test [25] and a significance level of = 0.05.
All sequences contained 12-bit images of size 1024 × 1024 pixels; all image processing, including rendering from the 3D model, was performed on the full image size. The kernel size for median filter was 30 pixels; the value of (4) was 0.1. The standard deviation of the DOG filter was 24 pixels.
In the evaluation, we compared the similarity measures thr shad , DSA shad , CADE , and edge and the combined similarity measures thr shad + edge , DSA shad + edge , and CADE + edge . We investigated different weightings. Giving both terms equal weights, that is, setting = 1, turned out to be a good choice. We did two types of evaluation: an evaluation on all frames and an evaluation using only a single frame of each sequence, namely, the one which provides the best similarity measure.

Evaluation on All Frames.
We computed a registration for each contrasted frame and compared the result to the manual registration of the physicians. These results are clinically relevant if the physician requires a registration for a desired frame of his choice, for example, depending on the breathing phase. As potentially every frame could be selected by the physician, the overall registration accuracy should be high. The overall accuracy is also of importance if the results are postprocessed by temporal filtering.
International Journal of Biomedical Imaging  The evaluation results are presented in Table 1. The overall interuser variability observed in the manual registrations was 3.2 ± 2.3 mm. Considering all sequences, CADE performed significantly better than thr shad , DSA shad , and the state-of-theart method by Thivierge-Gaulin et al. [16]. Although edge gave significantly worse results than all other measures, a combination with edge could improve the results of thr shad and DSA shad significantly. For subsequent injections, CADE + edge performed significantly better than thr shad , DSA shad , their respective combinations with edge , and the method by Thivierge-Gaulin et al. [16].
For thr shad + edge and CADE + edge , the error distribution for each sequence is shown in Figure 8, first for all initial injections and then for subsequent injections. In Figure 9, the error distribution is given for each frame number as counted after the CA injection. An example for a result is shown in Figure 7. Figure 8 indicates that many sequences have at least one frame with a registration error of less than 5 mm. In fact, there exists such a frame for over 75% of all sequences when using DSA shad or thr shad and for over 85% of the sequences when using CADE , DSA shad + edge , thr shad + edge , or CADE + edge . Zhao et al. [17] proposed to automatically find a good frame in each sequence and consider only these single good frames for evaluation. In other words, from all frames of a sequence, a single frame was automatically chosen for registration. From a clinical point of view, this would, however, only make sense if the physician accepted an automatically selected frame for registration or if further motion compensation steps for the other frames were done, for example, using device tracking [26]. Zhao et al. suggested selecting this single frame as follows: For each frame ∈ [1, ] out of the contrasted frames, the transform t that maximizes the similarity measure for this frame is computed. In a second step, going through all frames of a sequence, the frame with the highest overall similarity  Figure 1(a). Here, contrast agent is very faint and in some frames contrasted is already ejected into the left ventricle. Sequence 5 is shown in Figure 1(d). In this case, thr shad + edge got confused by edges caused by motion artifacts.

Evaluation on a Single Frame.
measure is picked. The associated transform is denoted byt . The error to the average manual registration result t * is then computed aŝ The evaluation results are presented in Table 2. Recalling Section 3.1, where we found that the CADE method yielded good results for all frames, the CADE performance for a single frame was, however, not that good. This is why we investigated if a different single-frame selection strategy would lead to better results. Instead of using CADE both for translation estimation and for best-frame selection, we used CADE only for estimating the translation t for each frame . Among the obtained translations t , we selected the frame with the corresponding translationt such that thr shad was maximized. So (14) was modified tô t − t * 2 witht = arg max thr shad (I A, thr , I B, thr , t ) , t = arg max t CADE (I A, thr , I B, thr , t) .
The results of this frame selection method are denoted by CADE ⊳ thr shad . We did the same also for the combination with edge . Although only 10 sequences were available for initial contrast injections, DSA shad , thr shad + edge , and ( CADE + edge ) ⊳ ( thr shad + edge ) gave significant better results when compared to thr shad . Also the performance of edge was significantly less. For subsequent injections, our proposed method CADE ⊳ thr shad and the corresponding combination with edge outperformed the method by Thivierge-Gaulin et al.

Temporal
Filtering. The error estimate function ( (⋅)) and the transition probability covariance matrix Σ k were trained in a leave-one-patient-out cross-validation. The results for the temporal filtered frames are given in Table 3. For a combination of the CADE and edge similarity measure, Markov filtering reduced the mean error to 5.7 ± 4.6 mm for initial injections and 6.6 ± 3.6 mm for subsequent injections. The respective median errors were 4.0 mm and 5.7 mm.
The results obtained by the Markov filtering were in all cases significantly better than the results without filtering.

Runtime
Performance. The image preprocessing took 0.5 s on an Intel Xeon E3 with 3.4 GHz and 16 GB RAM. The evaluations of the similarity measures were performed completely on the GPU. On an NVIDIA GeForce GTX 660 the evaluation of DSA shad , thr shad , or edge took 1.8 ms for a given translation and 13.4±3.7 ms for CADE . The whole registration for a single frame took 2.9 s for DSA shad or thr shad and 21.5 ± 5.9 s for CADE . For a combination with edge it took 5.8 s and 27.1 ± 5.9 s, respectively. The runtime of the Markov filtering was 173 ± 124 ms per sequence. Number of frames after beginning of contrast injection Figure 9: Average registration errors of CADE + edge depending on the position of the frame after start of the contrast agent injection calculated over all sequences. Frame 1 denotes the first frame which contained contrast agent. The average error at the second frame after the CA injection was 11.8 mm ± 9.5 mm; the average error at the fifth frame was 5.5 mm ± 3.0 mm. The frame rate of all sequences was 7.5 fps. Assuming a heart rate of 60 bpm, one heart cycle takes approximately 7 frames. So a first portion of CA is ejected into the ventricle not later than at the 7th frame. Note that the number of sequences used for calculation was not constant for each frame number as several sequences had fewer than nine contrasted frames.  (14) and (15).

Objective function
Initial injection Subsequent injections a small amount of contrast agent was used. The problem with nondistinct optima for shadow based similarity measures which was already sketched in Figure 2 becomes also apparent in the sections of the optimization function in Figure 10 for thr shad : for a small amount of contrast agent like in Figure 10(a), the optimum is not at the position of the ground truth, but about 15 mm off. Also for secondary injections into small structures such as the pulmonary veins, the objective function for thr shad in Figure 10(c) has a large plateau. In both cases, CADE is more distinct, although clear extrema as for bones [12] are not obtained. Especially for secondary injections where CA was often injected into pulmonary veins, registration errors leading to inconsistent results could be avoided by CADE . For well-contrasted sequences, the improvement by CADE was less as inconsistencies played only a minor role and the shape of the contrasted region in the image was more similar to the projected shadow of the left atrium. This becomes also apparent in Figure 10(b) where the shapes of the objective function of thr shad and CADE look very similar.
We found that the similarity measure using explicit apparent edges, edge , yielded poor results when used on its own. A possible reason is that the objective function of edge has several local optima and also the global optimum does not necessarily correspond to the ground-truth position. However, edge can improve results significantly when combined with other similarity measures. It often has a more distinct local optimum at the position of the ground truth that facilitates fine registration. This is important as the other similarity measures define a rather plateau-shaped optimal region. Figure 9 shows that the accuracy of the registration depends on when the frame was recorded after contrast agent injection. The best results were achieved at the 5th frame. This was in many cases the frame before the ejection into the ventricle; that is, it contains the most contrast agent. Especially for the first frames, where only little CA was present, registration accuracy was lower. The first frame is an exception, as it has a lower mean error than the second frame. This is probably because the previous frame, that is, the last uncontrasted frame, is used as mask frame for DSA computation which leads to low motion artifacts and a better extraction of the contrasted region.

Best-Frame Selection.
We found that, for our registration to work best, a well opacified frame should be selected from the sequence. This strategy was evaluated in the context of the single-frame evaluation by selecting the frame which had the best objective function value. For methods like DSA shad and thr shad , which are based on the projection shadow, the best frame usually corresponded to the most contrasted frame. The measure CADE , however, is not related to the amount of contrast agent, but it depends on consistency. When relying on CADE for best-frame selection, we get the frame yielding the most consistent registration. This is, however, not necessarily the frame with the most contrast agent. And it is usually the frame with the most contrast agent which provides the most information for registration. We believe that this is the reason why the errors in Table 2 for CADE are higher. If, however, from the translations estimated by CADE +    Figure 3), (b) a frame of the same injection but with large parts of the left atrium contrasted, and (c) a secondary injection into a pulmonary vein (see Figure 1(g)). The plots for DSA shad were left out as they look very similar to those for thr shad .
edge , the single frame was selected based on thr shad + edge , better results were achieved as now again the frames with the most contrast agent were selected.
Sometimes, the automatic best-frame selection does not provide a frame with a satisfactory result. Still, 85% of all sequences contain at least one frame that is below a clinical relevant threshold of 5 mm [27,28]. To benefit from the fact that at least one frame with a good registration result is likely to be found, the frame selection could be performed manually by stepping through the frames and the associated registration results. The user can then quickly select the frame with the best registration result. Although this implementation still involves user input, the required user interaction is less than for fully manual registration.
International Journal of Biomedical Imaging 13 For the remaining 15% of the cases or if higher accuracy level, for example, 3 mm [29], is required, small manual adjustment may be necessary in these cases.

Temporal Filtering.
Temporal filtering based on the Markov chain could reduce the error significantly. Particularly high errors for frames at the beginning or at the end of the contrast injection were reduced. If a registration is not desired for an arbitrary frame but rather for a frame determined by the physician, for example, based on the breathing phase, temporal filtering should be applied.

Runtime Performance.
It is notable that the runtime for CADE was considerably higher compared to, for example, thr shad . This is because the contrast agent distribution needs to be updated in each iteration before projecting it. Moreover, this part of the implementation has not yet been optimized. Although the fast runtime of, for example, thr shad will remain out of reach, we are confident that the runtime for CADE could be reduced to a clinically acceptable level.

Impact of Clinical Issues.
In general, the performance for initial registration was better, as the amount of contrast agent was higher and fewer catheter artifacts were present. We found by visual inspection that also the amount of breathing motion had an impact on the registration accuracy as it caused motion artifacts in the DSA computation. As a consequence, if the patient was not anesthetized, acquisition of the CA injection should be performed under breathhold or shallow breathing. For cases with intubation, also jet ventilation [30] or a small period of apnea could be applied to reduce breathing motion. Since our data was not acquired using a dedicated DSA program, the different brightness levels within a sequence changed. This interfered with the subtraction image computation. Although intuitively appealing, our data did not allow us to make a statement if the registration accuracy depends on the part of the LA in which CA was injected.
For initial contrast injections, an average accuracy of 4.6± 4.0 mm and a median error of 3.7 mm could be achieved using CADE + edge together with the best-frame selection approach. This error is in the range of the interuser variability of 3.2 ± 2.3 mm. The faster registration method using thr shad + edge reached an accuracy of 4.8 ± 4.6 mm. These numbers are also similar to the accuracy reported when performing manual registration based on segmentations of the CS, the whole heart, and the spine [7] or a segmentation of the CS when 3D and 2D data are in the same breathing and cardiac phase [31] which was achieved by ventilation and rapid pacing of anesthetized patients. However, our method does not require structures other than the LA to be segmented and requires no anesthesia. Compared to a registration based on the CS alone [32], the error was reduced by over 50%.
The average accuracy for all frames after applying temporal filtering is 5.7 mm for CADE + edge and 6.1 mm for thr shad + edge . Though the results of CADE + edge are close to the 5 mm threshold [27,28], it remains open if this is sufficient for a clinical application, but we believe that these results should at least provide users with an acceptable initial estimate for further manual adjustments. 4.7. Future Work. By now, the registration method was evaluated only for left atria. In a next step, the performance of this approach to contrast-based registration of the right atrium should be evaluated. A registration based on the right atrium would also provide a registration for the LA which is available for transseptal puncture or one of the ventricles could be assessed. Due to the lack of pulmonary veins and arteries, the ventricles have a more simple anatomical structure but are subject to stronger cardiac motion. How strongly this affects the registration accuracy is open.

Conclusions
Compared to the approach by Zhao et al. [17], the use of special weights for different heart regions is not needed for any of the proposed approaches. In addition, for thr shad + edge , a time-consuming DRR generation can be avoided. As a result, a registration approach based on a combination of shadow and edge features can be computed fast. If sufficient computational power was available, the novel CADE-based measure, which takes consistency into account, should be used as it improves results significantly, especially when very small amounts of contrast agent are injected.

Disclosure
The concepts and information presented in this paper are based on research and are not commercially available.