Assessment of Left Ventricular Function in Cardiac MSCT Imaging by a 4D Hierarchical Surface-Volume Matching Process

Multislice computed tomography (MSCT) scanners offer new perspectives for cardiac kinetics evaluation with 4D dynamic sequences of high contrast and spatiotemporal resolutions. A new method is proposed for cardiac motion extraction in multislice CT. Based on a 4D hierarchical surface-volume matching process, it provides the detection of the heart left cavities along the acquired sequence and the estimation of their 3D surface velocity fields. A Markov random field model is defined to find, according to topological descriptors, the best correspondences between a 3D mesh describing the left endocardium at one time and the 3D acquired volume at the following time. The global optimization of the correspondences is realized with a multiresolution process. Results obtained on simulated and real data show the capabilities to extract clinically relevant global and local motion parameters and highlight new perspectives in cardiac computed tomography imaging.


INTRODUCTION
Cardiovascular diseases cause the death of 17 million people every year, representing the major cause of mortality in industrialized countries. Various cardiac functional parameters are used to guide diagnosis, treatment, and followup of these diseases. The most used indicators (left ventricle volume (LVV), left ventricular mass (LVM), ejection fraction (EF)) provide information about the heart global function, but the detection and the treatment of some pathologies, such as atherosclerosis, would need the precise quantification of cardiac motion and deformation. Technological improvements in cardiac imaging provide rich opportunities for such a progress.
The minimally invasive assessment of heart motion has therefore been studied from modalities providing fourdimensional (4D) data sets. Magnetic resonance imaging (MRI), with cine MRI [1] and especially tagged MRI [2][3][4] and phase contrast MRI [5], has been extensively used, giving access to mid-wall deformations. However, its limited spatial resolution and long acquisition time prevent MRI to image both cardiac motion and coronary arteries. Ultrasound images [6,7] providing, with high availability, cardiac sequences of high temporal resolution, are still limited by their low signal-to-noise ratio in spite of the recent advances of real-time 3D echocardiography. ECG-gated single photon emission computed tomography (SPECT) and positron emission tomography (PET) in spite of lower spatial and temporal resolutions have also been used in order to combine perfusion and contractility informations [8,9]. The emergence of electron-beam computed tomography and of the dynamic spatial reconstructor (DSR) has provided opportunities for cardiac motion estimation [10][11][12], but their availability is still very limited. See [13] for an exhaustive review of cardiac image functional analysis methods.
The recent significant advances of multislice computed tomography (MSCT), with the introduction of ultra-fast rotating gantries (0.5 s/tr), multirows detectors, and retrospective ECG-gated reconstructions, provide high contrast and spatiotemporal resolutions and allow a huge progress towards the imaging of moving organs. These advances allow the observation of all cardiac structures simultaneously, International Journal of Biomedical Imaging for successive instants of the cardiac cycle, under one single breathhold. Some studies have been conducted in MSCT for the detection of coronary diseases [14,15], but very few works have been realized for the quantitative 3D cardiac motion estimation [16].
The issue of nonrigid motion estimation from 3D images is one of the most important challenges of computer vision. Methods which have been proposed for this purpose can be classified into three kinds of approaches. In geometric model-based approaches, parametric models [3,17,18] involve the parametric formulation of the object and/or of the movement. This kind of methods is interesting to extract global motion and to represent it with few parameters. Nonparametric models [10,19,20], using mainly mass-spring and finite element methods, extract local motion using differential constraints. Optical flow methods [2,11,21,22] are mostly based on intensity conservation and motion smoothing constraints. That constraint of intensity conservation with time is difficult to advance with MSCT data because of the contrast agent diffusion combined with the retrospective reconstruction of the sequence. Furthermore, these methods providing dense motion fields are difficult to handle with big data volumes in which the study deals with only few objects. Feature matching methods [1,23,24] are based on the search of correspondences between entities (considered at following times) according to descriptive parameters. These methods enable focusing the study on the objects of interest and extracting local motion. However, most of them are highly dependent from the segmentation quality because they need an accurate segmentation for each instant of the studied sequence.
We propose a new method to jointly extract ventricular shapes and their motion from cardiac MSCT images in one unique process. This problem of dual 3D shape and motion estimation is handled by a statistical approach provided by a Markov random field (MRF) [25], associated to a multiresolution process. The MRF theory has been extensively used in computer vision [26,27]. Its application to motion analysis has mostly been done with optical flow estimation [28] and deformable models [29,30].
In this paper, the 3D sparse nonrigid motion field to estimate at each time instant is formulated, in a Bayesian framework, as a Markov random field model under spatiotemporal regularity hypotheses. This motion field is provided by a matching method based on features of different types which are surface 3D mesh nodes at a first time instant and image voxels at the following time instant. These extracted motion fields can then be used for global and local motion quantification and interpretation. Results obtained on simulated and real data give satisfying results.
In the remainder of this paper, we describe in Section 2 the hierarchical surface-volume matching method we have developed including the preprocessing step, the definition of the Markov random field model and its application in a multiscale process, and the optimization stage. In Section 3 we present the results obtained on simulated and real data before concluding in Section 4.

A 4D HIERARCHICAL MOTION ESTIMATION METHOD
From a time sequence of 3D MSCT cardiac images, our approach allows the spatiotemporal detection of the left heart cavities and the quantification of their deformations. This is achieved by a multiscale matching method which provides, along the whole sequence, the correspondences between a 3D surface mesh extracted at one time instant and the 3D volume available at the next time instant. The overall method includes the following steps: (a) a 3D segmentation step and a surface reconstruction process are first applied to only one 3D image of the time sequence (at time t 0 ) to provide the first surface of the sequence; (b) a hierarchical surface-volume matching process is applied to estimate a 3D motion field between the surface at time t 0 and the next volume at time t 1 ; (c) from the surface at time t 0 and the estimated motion field, a new 3D surface can be estimated at time t 1 ; (d) steps (b) and (c) are repeated until all images of the sequence are processed.
In order to obtain the mesh corresponding to the first considered time of the sequence, step (a) is decomposed in this way: a segmentation tool, based on a 3D region growing process bounded by a gradient information, is applied [31]. The segmented surface is then reconstructed using the marching cubes algorithm. Finally, in order to prepare the matching process, the resulting surface mesh is regularized in such a manner that each node coordinates correspond to one volume voxel coordinates. Surface estimation (step (c)) relies on the deformation of the surface corresponding to time t 0 with the motion estimated between times t 0 and t 1 . A regularization step is then performed in order to fill mesh holes and to suppress redundant nodes.
The 3D motion field estimation (step (b)) is performed by a matching process applied between the surface representing the endocardium at time t 0 and the original volume corresponding to the next time t 1 . A hierarchical process is used to gain both in terms of result quality and of computational efficiency. The matching process will firstly be described at one resolution, then the multiresolution scheme will be detailed.

A surface-volume matching process
In order to estimate the motion between one surface corresponding to time t 0 and a volume corresponding to the following time t 1 , a surface-volume matching method has been developed.
A feature matching problem implies choosing the entities to match and to define local energies which can be combined to provide a distance measure between entities. As in almost all motion estimation issues, this measure is not sufficient to deal with a well-posed problem. It is therefore necessary to add contextual constraints. The best correspondences of selected entities can finally be obtained from the minimization of global energy.
One original contribution of this method is to establish correspondences between spatial entities which are not of the same nature: the matching process is conducted between 3D mesh nodes on the one hand and image voxels on the other hand.
The 3D motion field to compute between two successive instants is considered as a realization f = { f i /i = 1, . . . , N S } of a 3D random field F (N S being the number of considered sites in the field). The set of sites S of the field F is given by all the 3D mesh nodes at time t 0 . The labels assigned to these sites, expressed by the f i estimations, are given by the voxels found (at time t 1 ) in best correspondence with the 3D nodes.
According to Bayes' theory, the posterior probability of the realization f according to the observation d (the mesh nodes at time t 0 and the voxels at time t 1 ) is given by with p( f ) the prior probability, p(d | f ) the conditional probability of the observation process d, and p(d) the observation probability which is considered independent of f . According to the maximum a posteriori (MAP) estimator, the most probable realization f is provided by the maximization of the a posteriori probability The mechanical properties of the heart induce a spatiotemporal regularity of the motion field. To capture the spatial regularity of this motion, p( f ), used to model an a priori of the considered random field, is defined considering F as a Markov random field.
This Markov random field (MRF) is defined in relation to a neighborhood system μ. According to this definition, the neighborhood associated to one node s, noted μ s , is given by all nodes which share a common edge with the node s.
The MRF conditional probability models the local properties of the field according to this neighborhood. It is given by From the neighborhood μ, a set of cliques C is defined as including all pairs of neighboring nodes: According to the Hammersley-Clifford theorem [32], the Markov random field F in relation to the neighborhood system μ is also a Gibbs random field in relation to μ. The prior probability distribution function is then given by with Z a normalization constant and U R ( f ) a global energy function defining the interactions between the sites. U R ( f ) represents the internal energy of the random field and has a regularization effect. It is defined as V R (c, f c ) being the local interaction potential defined for each clique c and its associated labels where α R is a weighting factor, constant for every clique, is the motion vector estimated at site s (resp., t), and dist(s, t) is the Euclidean distance between nodes s and t.
The conditional probability density function p(d | f ) is given by the definition of a global data fidelity term which models the error between the realization f and the observation d. It is defined by where U I ( f , d) models the global estimation error. It is defined by V I (s, f s ) being a local correspondence measure to evaluate the matching between one node s at time t 0 (s ∈ S) and its corresponding voxel f s at time t 1 . It provides a data conformity term and, in such a way, a distance between the observation d (surface at time t 0 and 3D image at time t 1 ) and the estimated motion field given by f . For the analysis of correspondence between one node s and one voxel v, this term is defined by the following equation: where with s is the considered node (of coordinates (s x , s y , s z ) t ); v is the considered voxel (of coordinates (v x , v y , v z ) t ); dist(·) is the Euclidean distance function; C(·) is the contour detection function (implemented by a Canny filtering); μ s is the neighborhood of the node s; i is a neighborhing node of s (of coordinates (i x , i y , i z ) t ); and α c , α t , α d are weighting factors. From (1), (4), and (7), we have with a global energy which is to minimize to obtain the most probable correspondences.

Hierarchical scheme and optimization stage
The previously described motion extraction process is applied according to a hierarchical scheme which allows focusing the correspondence research area and reducing computing needs. This multiresolution scheme is considered to preserve local Markov property according to the high spatial resolution provided by MSCT data. The surface mesh at time t 0 and the volume at next time t 1 are defined with decreasing scales as follows: each data set from upper resolution R i (i = 1, . . . , n l , n l being the number of used resolution levels) is restricted in space at a lower level scale R i−1 by the application of a mean filtering (or Gaussian filtering for the volume) and of a subsampling process in order to provide a regular mesh corresponding to volume voxel coordinates at the same level.
The matching process is first applied at the lowest resolution (with an initialization to a null motion) to guide the motion estimation with the coarsest details. The result of that first estimation is used as an initialization, after an interpolation step, for the correspondences computation at the next finer resolution. This motion extraction process is applied iteratively, with an adaptation of energy weighting coefficients, until the estimation is obtained at the desired resolution.
The global optimization of the correspondences is performed with a stochastic relaxation Metropolis algorithm combined with a simulated annealing process at the first lower-resolution level and with an iterated conditional mode (ICM) algorithm at the upper-resolution levels.

RESULTS
In a first part, the method has been applied on simulated data resulting from realistic deformations of a real ventricular shape. This first stage has provided the means to control and define the different parameters included in our approach. In a second part, the method has been applied on real human cardiac data acquired with MSCT. Finally, the means to extract global and local parameters in association to informative visual representation modes have been developed.

Tests on simulated data
Numerical simulations have been used to test the motion extraction process between two successive instants. In the absence of translation motion induced by patient respiration (the MSCT acquisition is realized under one single breathhold) the heart is submitted to three main kinds of motion: radial and longitudinal contraction/expansion and twisting (cf. Figure 1). To simulate data, these three kinds of motion are applied to a previously 3D extracted mesh (corresponding to the first instant of the sequence). These deformations result in the mesh corresponding to the second instant. Then, this deformed mesh is inserted into a volume preprocessed by a Canny filter followed by an endocardial suppression step. The hierarchical matching process is finally applied between the surface before deformation and this volume at three increasing scales.
Using this simulation process, the real correspondences are known. It enables measuring the error of matching of the proposed method and to study the evolution of the matching process along iterations and at each stage of the hierarchical process. The impact of the different parameters involved in the computation of the energies or in the optimization process, as well as meaningful information provided by scale refinement, can also be evaluated.
Different tests have been conducted and combined to find the optimal value for each parameter involved in the matching process. These values have shown a good unicity (tests running with these values on data generated with varied motion parameters have provided optimal results) and a good stability (moderate variations of these values do not affect the quality of the result).
Mireille Garreau et al.  and 128 3 (b, d)) (colours: in blue (resp., red), motion directed outside (resp., inside) the cavity corresponding to positive (resp., negative) displacements), measures in millimeters.  Figure 2 illustrates an example of results obtained at the two lowest-resolution levels (R 1 with volume size 64 3 and R 2 (128 3 )). In color are represented the applied (a, b) and estimated (c, d) motion amplitudes. In red are represented the displacements directed inside the cavity, and in blue the displacements directed outside the cavity. With an initial mean matching error of 6.7 mm at R 1 level, the process converges to a final mean error of 0.8 mm at R 3 level. We have observed that this hierarchical process enables gaining in precision and error deviation.
These results have been confirmed by tests running with different simulated motion parameters.

Results on real data
The algorithm has been applied on real human heart data with a temporal database acquired by a Siemens SOMATOM Sensation 16 with ten volume images representing a whole cardiac cycle. Each volume contains about 300 slices of 512 × 512 pixels, giving a resolution for each voxel of 0.35 × 0.35 × 0.5 mm (cf. Figure 3(a) illustrating one CT axial slice).
The segmentation preprocess has been applied to the first volume of the sequence resulting in the extraction of the heart left cavities and of the beginning of the aorta. To obtain the surface mesh corresponding to time t 0 , the segmented volume has been processed by the marching cubes algorithm (cf. Figure 4).
Using the optimal set of parameters found with the numerical simulations, the motion extraction process has been applied between this mesh (corresponding to time t 0 ) and the volume corresponding to time t 1 after a Canny operator application (cf. Figure 3(b) for an example of the Canny filter output). The algorithm has been iteratively applied to this dynamic sequence considering three resolution levels (from 64 3 to 256 3 ), providing a set of estimated surfaces and motion fields for each instant of the sequence.   Figure 5 illustrates results obtained at the two lowest levels at end-diastolic (a, b) and end-systolic (c, d) instants. The contraction movements (represented with red color or by negative displacements), characteristic of systole, are as well identified as the expansion movements (represented with blue color or by positive displacements) and are coherent with cardiac phases. We can see that the lowest level representation is meaningful and that the displacements obtained at upper levels provide the means to extract local clinical parameters at various scales. Figure 6 shows the estimated motion amplitude during the whole cardiac cycle and using the highest resolution. We can observe that movements extracted on the whole sequence are coherent with cardiac phases. We can particularly remark the progressive expansion of the ventricle, from its basis to the apex, during the ventricle diastole (Figures 6(b)-6(f)), the important expansion occurring during the end-diastolic phase ( Figure 6(f)), and the spatial evolution of contraction during the ventricle systole (Figures 6(g), 6(h)).
This kind of representations enables to highlight functional abnormalities. For instance, Figure 6 highlights the pathological situation where the antero-apical area (bottom left part of the visualized shape) suffers from akinesia.

Extraction of clinical parameters
Firstly, the extraction of global parameters of the ventricular function has been researched. After the interactive selection of three points determining the valvular plane, the 3D detected structure is limited to the left endocardium. Then, the ventricular volume can be computed for each instant, providing the curve of 3D volume variations along the entire temporal sequence analyzed. These measures give access to clinical functional features such as systolic volume or ejection fraction (EF). However, because the proposed method substracts trabeculas from the cavity volume, the measured volumes are lower than the volumes measured by a rough approximation of the endocardium. Obtained ejection fraction is therefore overestimated compared to those obtained by more classical methods (for one case, e.g., 55% was measured with echocardiography against 66% with our method).
Secondly, the extraction and representation of local parameters have been conducted. The anatomical system of reference, defined according to the great axis of the left ventricle, is computed from the patient specific 3D estimated surfaces. In this reference system, the detected movement is then decomposed in longitudinal, radial, and rotational motion components (cf. Figure 1). Cranial/caudal motion as well as inward and outward motion or wall twisting can therefore be measured. These motion components can then be analyzed along the temporal sequence, for a set of points selected on different anatomical parts of the ventricle (cf. Figures 7, 8). For this database, the extraction of radial movements show for example that the base of the heart verifies the largest motion amplitude (in contraction (negative values) and in expansion (positive values)) compared to the mid-cavity and to the apex. By the same way, movements of torsion can be locally enhanced and compared between different parts of the muscle. This kind of representation is of great interest for the clinical part because it allows to quantify how each type of movement (radial, longitudinal, torsion) is affected by specific pathologies.
A synthetic representation of local parameters has been also developed, based on the bull-eye representation, commonly used in echocardiography and MRI [33]. This kind of visualization is based on the anatomical segmentation of the 3D left ventricle in longitudinal and radial sectors and on their integration in a bull-eye scheme. The anatomical segmentation of the 3D surface is here conducted from the interactive selection of three points corresponding to the aortic and tricuspid valves and of the apex. For illustration, Figure 9 shows this anatomical segmentation and the motion amplitude estimated on the overall temporal sequence represented on the bull-eye scheme. This visual representation gives access to the dynamic behavior of each anatomical segment (on the whole sequence or between successive times). It allows to enhance and quantify pathological situations such as akinesia or asynchronism. The results observed on these real data show, for example, a reduced motion in apical septal segments (numbered 7, 8, 13, 14, and 17) which corresponds to the real pathological situation.

CONCLUSION
A new solution of motion extraction combined with surface estimation has been introduced and applied to the left ventricle in 4D cardiac MSCT imaging. It is based on a hierarchical surface-volume feature matching method formulated with a Markov random field and provides, with one unique process, the left cavity surfaces and associated 3D motion vector fields. The algorithm has been tested with simulated and real MSCT dynamic data, highlighting the great potential of MSCT imaging for quantitative clinical measure assessment in cardiac applications. Moreover these performances might be increased with new MSCT systems, combining more detectors (64 rows, or even 128 rows systems) and faster rotations. Indeed, with shorter acquisitions, the acquired data will be less submitted to artefacts, enabling to reconstruct more data volume by retrospective ECG-gating, providing  Figure 5: Estimated motion amplitude at end-diastolic (a, b) and end-systolic (c, d) instants, at two resolutions (levels 64 3 (a, c) and 128 3 (b, d)) (colours: in blue (resp., red), motion directed outside (resp., inside) the cavity corresponding to positive (resp., negative) displacements).  Figure 6: Estimated motion amplitude during the whole cardiac cycle. Time instants illustrated from (b)-(f) correspond to the ventricle diastole, while time instants from (g)-(a) correspond to the ventricle systole (colours: in blue (resp., red), motion directed outside (resp., inside) the cavity corresponding to positive (resp., negative) displacements).  sequences of higher temporal resolution. Also, these new systems will be less sensitive to irregular heart rates. Further works will carry on extensive evaluation with normal and pathological real data, including especially a comparison with motion estimated with other imaging modalities.