Kinect-Based Correction of Overexposure Artifacts in Knee Imaging with C-Arm CT Systems

Objective. To demonstrate a novel approach for compensating overexposure artifacts in CT scans of the knees without attaching any supporting appliances to the patient. C-arm CT systems offer the opportunity to perform weight-bearing knee scans on standing patients to diagnose diseases like osteoarthritis. However, one serious issue is overexposure of the detector in regions close to the patella, which cannot be tackled with common techniques. Methods. A Kinect camera is used to algorithmically remove overexposure artifacts close to the knee surface. Overexposed near-surface knee regions are corrected by extrapolating the absorption values from more reliable projection data. To achieve this, we develop a cross-calibration procedure to transform surface points from the Kinect to CT voxel coordinates. Results. Artifacts at both knee phantoms are reduced significantly in the reconstructed data and a major part of the truncated regions is restored. Conclusion. The results emphasize the feasibility of the proposed approach. The accuracy of the cross-calibration procedure can be increased to further improve correction results. Significance. The correction method can be extended to a multi-Kinect setup for use in real-world scenarios. Using depth cameras does not require prior scans and offers the possibility of a temporally synchronized correction of overexposure artifacts.


Introduction
C-arm CT systems (Figure 1(a)), in contrast to conventional CT systems, have a high mechanical flexibility which gives radiologists the opportunity to perform CT scans in a variety of spatial positions. In particular, it is possible to rotate the CT system around a vertical axis [1]. This enables imaging of patients with knee diseases such as osteoarthritis while they are standing in an upright position, hence while the knee is bearing the weight of the patient [2].
One challenge of imaging relatively thin body parts like the knee is the limited dynamic range of the C-arm CT flat panel detector, leading to overexposure of the exterior regions of the knee. If not avoided or compensated for, overexposure leads to artifacts in the reconstructed volume, as shown in Figure 1(b). The front and back of the knee appear blurry and lack clearly defined outer boundaries. The image quality of important parts of the knee image, such as the patella, is severely affected by these artifacts. This has a negative impact on reliability of the diagnosis.
Using a C-arm CT acquisition protocol with the patient lying in supine position, several approaches are available to avoid or compensate overexposure artifacts. One way to avoid overexposure artifacts during acquisition is by covering the knees with an additional absorber, for example, a rubber belt [2,3]. However, extra weight of the belt can cause great discomfort for an upright patient with pains in the knees.
Different algorithmic methods for truncation correction in C-arm CT systems have been developed in recent years. Truncation artifacts that arise in scans with a small region of interest can be effectively corrected without any explicit extrapolation scheme [4]. If bigger portions of the patient are of diagnostic interest, different correction methods have to be applied. In [5], additional knowledge from a prior low-intensity scan is used for artifact correction. In the case of imaging of standing patients with knee diseases, however, expected patient movement makes the use of a prior scan very difficult. Other methods, which do not use a prior low-intensity scan, correct truncation artifacts through an appropriate extrapolation model such as a water cylinder for the upper body [6,7]. In [8], the model-based extrapolation is extended by an iterative truncation correction algorithm, which is able to handle cases where the water cylinder assumptions are not exactly fulfilled. These model-based methods are not applicable to knee imaging, as the anatomical structure is too complex to be approximated by a single cylindrical or elliptical object. Another approach, which uses a multicylinder extrapolation model [9], yields better results. Similar to the single water cylinder model, however, overexposure correction only works for objects that sufficiently fit the simplified cylindrical knee models.
Hence, in order to bring the novel diagnostic possibility of imaging knees of standing patients into clinical practice, it is highly desirable to develop an imaging solution that avoids these drawbacks.
In this paper, we present a method for correcting overexposure by combining information from a Kinect depth camera with a C-arm system. As a proof of concept, we demonstrate its feasibility for patients in supine position. However, there is no fundamental limitation for applying the same setup to patients in weight-bearing standing position. In such scenarios, multiple Kinect depth cameras, observing the patient from different angles, could be used for artifact correction. The approach has the further advantage that the information used for correction can be acquired simultaneously with the CT scan. Thus, degradation of the correction through patient movement is low in comparison to methods relying on prior information.
The contributions of the paper are as follows:
(i) We introduce a specifically designed, easy-to-reproduce calibration target for cross-calibrating a C-arm CT system with a Kinect depth camera.
(ii) We propose a cross-calibration procedure between the depth camera and the C-arm CT.
(iii) We present a depth-based correction of overexposure artifacts.
Figure 2(a) shows a sketch of the cross-calibration procedure using a calibration phantom. The calibration target is detected by both imaging systems and enables the computation of a transformation of the coordinates from one modality to coordinates of the other modality. Figure 2(b) shows a sketch of the imaging protocol. Once the system is calibrated, a patient is placed into the field of view of both modalities.
When imaging a patient, the Kinect depth data is used to find the points of intersection between the X-ray beam path and the object surface, that is, the points at which the X-rays enter and leave the knee tissue. For each pixel in each projection, the length of the beam path across the knee is calculated. Overexposed pixels are corrected by extrapolating the absorption along the corresponding line integrals.
In Section 2, we describe the phantom and the cross-calibration procedure for transforming points between both imaging modalities. In Section 3, we describe the proposed projection-based artifact correction. In Section 4, the reconstruction of the corrected projections is evaluated and compared with an uncorrected volume and the ground truth. In Section 5, we discuss the correction results and limitations of our proposed method. In Section 6, we discuss possible improvements and future work based on the current correction method.
International Journal of Biomedical Imaging  Figure 2: (a) A Kinect camera is cross-calibrated to the C-arm CT using a phantom on the patient bench. (b) For overexposure correction, the patient is imaged simultaneously by the C-arm and the Kinect.

Kinect to CT System Cross-Calibration
The Microsoft Kinect camera provides a color image and, for each pixel, the 3D distance of the depicted scene point to the camera. To use this distance information in a CT scan, we determine the parameters of a rigid transformation between both imaging systems through a cross-calibration procedure.
A cross-calibration phantom with known geometry is observed by both imaging modalities to determine the relative translation and rotation between both coordinate systems.
The cross-calibration phantom consists of the cylindrical PDS-2 calibration phantom, which is commonly used for C-arm cone-beam CT calibration [10], and an attached depth calibration structure. Figure 3 shows the basic design and geometry of the phantom. The depth calibration structure is a scaffold of orthogonal plastic rods.
Three spheres are attached on each rod. The spheres are particularly suitable for detection and localization with the Kinect camera from a wide range of viewing angles. The goal of the calibration is to identify the three rods with the coordinate axes and their intersection with the coordinate origin.
From solely observing the cylinder surface, only the direction of the axis of the cylinder could be determined. The spheres allow the alignment of the two remaining axes to be determined purely from depth data. Painting the cylinder to indicate the axes directions, for example, would introduce inaccuracies from the Kinect-internal RGB-to-depth calibration into the cross-calibration procedure.
The use of the attachment could prove to be especially advantageous in weight-bearing scanning scenarios, where two or more Kinect cameras are observing the phantom from different angles for the calibration.
For processing the depth data, we use the Range Imaging Toolkit RITK [11]. A visualization of acquired data can be seen in Figure 4. Raw depth images from the Kinect camera are relatively noisy, with a standard deviation of point-to-plane distances of about 25 mm at 1 m distance [12]. To counter this noise, we apply spatial smoothing (Gaussian kernel with $\sigma = 2.5$), temporal averaging (20 frames), and edge-preserving smoothing (guided filter with 4-pixel support [13]).

Sphere Segmentation and Fitting.
First, a user has to mark the spheres in the RGB data. We compute the estimated projected size of the sphere from the depth information at the marked point. Pixels of similar depth around the seed point are recursively added to the sphere area, as long as the distance of newly added pixels does not exceed the sphere size. Spheres belonging to the same axis are fitted to the depth data. We estimate the sphere center for each connected set of sphere surface points (see Figure 5). An initial estimate of the sphere center is made using the $x$ and $y$ coordinates of the initially user-selected spheres. The depth value of the center is approximated by adding half of the sphere radius to the mean depth value of the respective surface points. The best-fitting center point is determined using a least-squares error metric. Let $\mathbf{p}_i$, $1 \le i \le N$, be the $i$th surface point in Kinect 3D coordinates, where $N$ denotes the number of segmented sphere surface pixels, and let $\mathbf{c}^*$ be the unknown Kinect 3D center coordinate of the sphere. Then $\mathbf{c}^*$ is the center that minimizes the least-squares fitting error over all $N$ surface points.
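The center fit can be illustrated with a minimal sketch. It assumes that the sphere radius is known and that the error metric is the sum of squared radial residuals, $\sum_i (\lVert \mathbf{p}_i - \mathbf{c} \rVert_2 - r)^2$; this objective, the Gauss-Newton solver, the function name `fit_sphere_center`, and the synthetic surface points are all illustrative assumptions rather than the original implementation.

```python
import numpy as np

def fit_sphere_center(points, radius, c0, iters=50, tol=1e-10):
    """Gauss-Newton fit of a sphere center for a known radius (assumed objective)."""
    points = np.asarray(points, dtype=float)
    c = np.asarray(c0, dtype=float).copy()
    for _ in range(iters):
        diff = points - c                      # vectors from current center to surface points
        dist = np.linalg.norm(diff, axis=1)
        residual = dist - radius               # radial error of each surface point
        jac = -diff / dist[:, None]            # derivative of the residuals w.r.t. the center
        step, *_ = np.linalg.lstsq(jac, -residual, rcond=None)
        c += step
        if np.linalg.norm(step) < tol:
            break
    return c

# Hypothetical data: the visible cap of a sphere seen by a camera looking along +z.
true_center = np.array([10.0, -5.0, 700.0])
radius = 20.0
theta, phi = np.meshgrid(np.linspace(0.1, 1.2, 8),
                         np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False))
normals = np.stack([np.sin(theta) * np.cos(phi),
                    np.sin(theta) * np.sin(phi),
                    -np.cos(theta)], axis=-1).reshape(-1, 3)
surface = true_center + radius * normals

# Initial guess as in the text: mean surface point, moved back by half the radius.
c0 = surface.mean(axis=0) + np.array([0.0, 0.0, radius / 2.0])
estimated_center = fit_sphere_center(surface, radius, c0)
print(estimated_center)
```

Because only the camera-facing cap of each sphere is observed, fixing the radius keeps the problem well conditioned; with real, noisy depth data the result would depend on the quality of the initial guess.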

Estimation of the Axes Directions.
From the estimated center points, the position and direction of the axes are obtained as follows. Per axis, we use at least two center points (each axis has 2 to 4 spheres). Without loss of generality, we aim to recover one point on the first coordinate axis and its direction, denoted as $\mathbf{c}_1$ and $\mathbf{k}_1$, respectively. Let $M \in \{2, 3, 4\}$ denote the number of segmented spheres on this axis and $\mathbf{c}^*_j$, $1 \le j \le M$, the center of the $j$th segmented sphere. The axis point $\mathbf{c}_1$ is the 3D mean coordinate of all $\mathbf{c}^*_j$:
$$\mathbf{c}_1 = \frac{1}{M} \sum_{j=1}^{M} \mathbf{c}^*_j .$$
The algorithm for finding $\mathbf{k}_1$ is analogous to finding the best-fitting plane to the points. We solve this problem via orthogonal distance regression and singular value decomposition (SVD) [14]. Let $\mathbf{k}_1$ be the unit right singular vector belonging to the largest singular value of the matrix whose rows are the mean-centered sphere centers $\mathbf{c}^*_j - \mathbf{c}_1$. Then
$$\mathbf{a}_1(\lambda) = \mathbf{c}_1 + \lambda\,\mathbf{k}_1$$
is a least-squares estimate of the first coordinate axis in parametric form with scale parameter $\lambda$. Accordingly, we estimate the other coordinate axes $\mathbf{a}_2$ and $\mathbf{a}_3$ from the two remaining sets of sphere centers.
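The axis estimation can be sketched in a few lines. The function name `fit_axis` and the example sphere centers are hypothetical; the sketch only illustrates the generic orthogonal-distance line fit via SVD, not the authors' code.

```python
import numpy as np

def fit_axis(centers):
    """Orthogonal-distance line fit: returns a point on the axis and a unit direction."""
    centers = np.asarray(centers, dtype=float)
    point = centers.mean(axis=0)               # mean of the sphere centers lies on the fitted axis
    _, _, vt = np.linalg.svd(centers - point)  # rows of vt are the right singular vectors
    direction = vt[0]                          # belongs to the largest singular value
    return point, direction

# Hypothetical sphere centers (in mm) lying along one rod.
centers = np.array([[1.0, 2.0, 50.0],
                    [1.0, 2.0, 100.0],
                    [1.0, 2.0, 150.0]])
c1, k1 = fit_axis(centers)
print(c1, k1)
```

The sign of the returned direction is arbitrary, so an additional convention (for example, pointing from the first to the last sphere) would be needed before the axes are used to build a coordinate frame.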

Estimation of the Kinect Coordinate Origin.
We calculate the axis origin as the estimated point of intersection of the rod axes. Due to noise and estimation inaccuracies, the axes are unlikely to intersect in one single point. Therefore, we define the coordinate origin as the closest point to all three axes in a least-squares sense [15]. The formula for calculating the closest point $\mathbf{g}$ to multiple $n$-dimensional lines, here the three axes $m = 1, 2, 3$, is the following (see Appendix A.1):
$$\mathbf{g} = \Big( \sum_{m} \big( \mathbf{I} - \mathbf{k}_m \mathbf{k}_m^T \big) \Big)^{-1} \sum_{m} \big( \mathbf{I} - \mathbf{k}_m \mathbf{k}_m^T \big)\, \mathbf{c}_m .$$
The unit direction vectors $\mathbf{k}_m$ and suspension points $\mathbf{c}_m$ of the axes are already known from the previous estimation of the axes directions.
The solution $\mathbf{g} = (g_x, g_y, g_z)^T$ is the fitted origin of the sphere mount in the Kinect coordinate system. All detected 3D points $\mathbf{p}$ in the Kinect coordinate system are translated to the estimated origin $\mathbf{g}$:
$$\mathbf{p}' = \mathbf{p} - \mathbf{g} .$$
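As an illustration of the least-squares origin estimate, the following sketch accumulates the normal equations of the point-to-line distance and solves them. The function name and the three example axes are hypothetical, chosen so that the axes intersect exactly in a known point.

```python
import numpy as np

def closest_point_to_lines(points, directions):
    """Point minimizing the sum of squared distances to several 3D lines."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, k in zip(np.asarray(points, dtype=float), np.asarray(directions, dtype=float)):
        k = k / np.linalg.norm(k)             # make sure the direction is a unit vector
        P = np.eye(3) - np.outer(k, k)        # projector onto the plane orthogonal to the line
        A += P
        b += P @ c
    return np.linalg.solve(A, b)

# Hypothetical axes: three orthogonal lines through (1, 2, 3), each given by
# a suspension point somewhere on the line and its direction.
origin = np.array([1.0, 2.0, 3.0])
directions = np.eye(3)
suspension_points = origin + np.array([[5.0, 0.0, 0.0],
                                       [0.0, -2.0, 0.0],
                                       [0.0, 0.0, 7.0]])
g = closest_point_to_lines(suspension_points, directions)
print(g)
```

The accumulated matrix is invertible as long as the lines are not all parallel, which holds for the three approximately orthogonal rods.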

Coordinate System Transformation.
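Knowing the position and rotation of the calibration structure relative to the phantom, coordinates can be directly transformed from the Kinect to the C-arm CT (see also Appendix A.2). The coordinate system origin of the C-arm CT lies in the center of the cylinder (cf. Figure 3(a)). Let $\mathbf{W}$ denote the rotation between both coordinate systems and $\mathbf{t}_{\text{Origins}}$ the translation between the fitted origin $\mathbf{g}$ and the center of the cylinder. Then
$$\mathbf{p}_{\text{Zeego}} = \mathbf{W}\,\mathbf{p}_{\text{Kinect}} + \mathbf{t}_{\text{Origins}}$$
transforms a Kinect surface point $\mathbf{p}_{\text{Kinect}}$ into a C-arm CT coordinate $\mathbf{p}_{\text{Zeego}}$.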
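A minimal sketch of this transformation is given below. It assumes that the axis directions are stored as matrix columns, that Kinect points are first shifted to the fitted origin, and, purely for illustration, that the 110 mm offset lies along the third axis; the function names and all numeric values are hypothetical.

```python
import numpy as np

def rotation_from_axes(C, D):
    """Rotation W with D = W C, where columns of C and D are corresponding unit axes."""
    return D @ np.linalg.inv(C)

def kinect_to_carm(p_kinect, g, W, t_origins):
    """Shift a Kinect point to the fitted origin g, rotate it, then translate it."""
    return W @ (np.asarray(p_kinect, dtype=float) - g) + t_origins

# Hypothetical sphere-mount axes seen by the Kinect: the C-arm frame rotated by 90 degrees.
C = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
D = np.eye(3)                               # the same axes expressed in the C-arm frame
W = rotation_from_axes(C, D)

g = np.array([100.0, 50.0, 700.0])          # fitted origin in Kinect coordinates (assumed)
t_origins = np.array([0.0, 0.0, 110.0])     # assumed: offset along the third axis
p_kinect = g + 10.0 * C[:, 0]               # a point 10 mm along the first rod
p_carm = kinect_to_carm(p_kinect, g, W, t_origins)
print(p_carm)
```

With noisy axis estimates, the matrix of Kinect axes is generally not exactly orthonormal, so the resulting matrix is only approximately a rotation; projecting it onto the nearest rotation via SVD would be one possible refinement.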

Overexposure Artifact Correction
The flat panel detector used in C-arm CT imaging has a limited dynamic range. If both knees overlap in a projection, higher X-ray doses are necessary to penetrate both knees. In the exterior regions of the knees the X-rays are only slightly attenuated and the resulting high intensities at the detector cause saturation. Hence, information about these regions is lost and saturation artifacts arise.

Projection-Based Extrapolation.
The correction of the saturation artifacts is performed for every detector line in each projection separately. Joint use of Kinect and CT data allows a straightforward correction of overexposure in three steps: (1) If a detector line in a projection contains overexposed pixels, we determine the 3D points where the X-rays entered and exited the knee. (2) From these points, the length of the beam path through the knee is computed. (3) Overexposed pixels are corrected by extrapolating a smooth absorption fall-off from nonoverexposed pixels.
Note that the extrapolation does not automatically suppress tissue variations at knee boundaries: the angular range in C-arm CT scans usually amounts to 200°. Upon tomographic reconstruction of the knee volume, there exist for each boundary voxel many projection angles where a sufficiently thick portion of the knee is traversed, such that tissue variations at knee boundaries can in principle still be observed. Figure 7(a) shows an X-ray beam hitting an exemplary detector line in one axis-aligned view. We are interested in the length of the beam path through the knees. Figure 7(b) shows the same trajectory in a second axis-aligned view. We are looking at rays on a plane defined by the X-ray source and the currently considered detector line. For each ray, we are seeking the intersection length of the ray with the knees.

Geometric Considerations of Correction.
In our experiments, we simulate the knees with two plastic bottles filled with water (see Figure 6). To simulate the femurs in the legs in the CT images, two dense rods with a density of 1000 g/cm² are placed between the bottles.
In principle, the intersection length can be directly computed from the nearest Kinect surface points at the entrance and exit of the knee. However, to make the results more robust to noise, we first fit a cubic B-spline curve to all points lying on the plane and determine the intersection length from the spline. Note that this computation can be performed in 2D, as all involved points are located on the same plane. Examples of resulting closed cubic B-spline curves are shown in Figure 8. Here, we observe two plastic bottles that represent the knees. The line that passes through the curve represents an example of the X-ray trajectory. In this case, one component of the X-ray direction vector dominates; that is, detector and radiation source lie close to one of the coordinate planes. Note the slight inaccuracies on the right side and the truncated horizontal contours due to limitations in the edge detection of the depth camera. We extrapolated the surface points on the unobserved side of the knee phantoms by mirroring the visible points on a plane parallel to one of the coordinate planes.
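The 2D intersection-length computation can be sketched as follows. For brevity, the closed contour is represented by a polygon (for example, densely sampled points of a fitted curve) instead of a cubic B-spline; this substitution, the function name, and the square test contour are assumptions made for illustration only.

```python
import numpy as np

def intersection_length(polygon, origin, direction):
    """Length of a 2D ray's path inside a closed polygon (vertices given in order)."""
    poly = np.asarray(polygon, dtype=float)
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    hits = []
    for i in range(len(poly)):
        a, b = poly[i], poly[(i + 1) % len(poly)]
        e = b - a
        denom = d[0] * e[1] - d[1] * e[0]      # 2D cross product of ray and edge directions
        if abs(denom) < 1e-12:
            continue                           # ray is parallel to this edge
        w = a - o
        t = (w[0] * e[1] - w[1] * e[0]) / denom   # position of the hit along the ray
        s = (w[0] * d[1] - w[1] * d[0]) / denom   # position of the hit along the edge
        if t >= 0.0 and 0.0 <= s < 1.0:
            hits.append(t)
    hits.sort()
    # Consecutive pairs of hits mark where the ray enters and leaves the contour.
    return float(sum(t_out - t_in for t_in, t_out in zip(hits[0::2], hits[1::2])))

# Hypothetical contour: a 10 x 10 square crossed horizontally by a ray.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
length_through = intersection_length(square, origin=(-5.0, 5.0), direction=(1.0, 0.0))
length_missing = intersection_length(square, origin=(-5.0, 20.0), direction=(1.0, 0.0))
print(length_through, length_missing)
```

Summing entry and exit pairs also handles the case of two separate contours, one per knee, by adding the lengths obtained for each contour. Rays that graze a vertex exactly are not treated specially in this sketch and assume that the source lies outside the contour.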
A schematic explanation of the proposed extrapolation method is shown in Figure 9. The objective is a smooth and reasonable extrapolation of the line integrals at the two transition points to the overexposed regions, labeled 1 and 2.
For a smooth transition, the intersection lengths are normalized to match the value of the line integral at transition points 1 and 2, respectively. To prevent noise-related inaccuracies, we use an average value of the last nonoverexposed points for normalization.
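The normalization step can be illustrated for a single detector line. The sketch assumes one contiguous nonoverexposed interval with overexposed pixels on both sides, and a homogeneous test object; the function name, the averaging window `n_avg`, and the synthetic values are hypothetical.

```python
import numpy as np

def correct_detector_line(line_integrals, overexposed, lengths, n_avg=3):
    """Replace overexposed line integrals by scaled intersection lengths."""
    q = np.asarray(line_integrals, dtype=float).copy()
    overexposed = np.asarray(overexposed, dtype=bool)
    lengths = np.asarray(lengths, dtype=float)
    valid = np.flatnonzero(~overexposed)
    if valid.size == 0:
        return q                                 # nothing reliable to normalize against
    sides = ((valid[:n_avg], np.arange(0, valid[0])),              # transition 1
             (valid[-n_avg:], np.arange(valid[-1] + 1, q.size)))   # transition 2
    for ref, target in sides:
        mean_length = lengths[ref].mean()
        if target.size == 0 or mean_length <= 0.0:
            continue
        # Scale so that the lengths match the averaged last nonoverexposed line integrals.
        scale = q[ref].mean() / mean_length
        q[target] = scale * lengths[target]
    return q

# Hypothetical detector line through a homogeneous cylinder of radius 40 mm.
x = np.linspace(-50.0, 50.0, 101)
lengths = 2.0 * np.sqrt(np.clip(40.0**2 - x**2, 0.0, None))
mu = 0.02                                        # assumed attenuation coefficient per mm
overexposed = lengths < 40.0                     # thin outer parts saturate the detector
measured = np.where(overexposed, 0.0, mu * lengths)
corrected = correct_detector_line(measured, overexposed, lengths)
print(np.abs(corrected - mu * lengths).max())
```

For a homogeneous object the scale factor equals the attenuation coefficient, so the truncated fall-off is recovered exactly; for real tissue the scale only reflects the material near the transition.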
The result of the Kinect-based correction of a CT projection is demonstrated in Figure 10.

Reconstruction Setup.
We use the CONRAD framework for reconstruction [16] after the artifact correction.
The reconstruction pipeline consists of a cosine weighting filter [17], a Parker redundancy weighting filter [18], a Shepp-Logan ramp filter [17], and a GPU-based back projection tool [19]. After the reconstruction, the data is normalized to the Hounsfield scale. In a final step, the reconstructed data is smoothed with a bilateral filter (width: 5, photometric distance: 500). The source-detector and source-to-axis distances are 1200 mm and 600 mm, respectively. We acquire 133 projections in a 200° rotation around the object. The detector size in pixels is 1240 × 960 with a pixel spacing of 0.308 mm in both detector directions. The mean distance of the Kinect camera to the phantom is 700 mm.
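The normalization to the Hounsfield scale follows the standard definition, in which water maps to 0 HU and air to -1000 HU. The sketch below shows this mapping; the function name and the attenuation value used for water are illustrative assumptions and are not taken from the reconstruction framework.

```python
def to_hounsfield(mu, mu_water, mu_air=0.0):
    """Map a linear attenuation coefficient to Hounsfield units."""
    return 1000.0 * (mu - mu_water) / (mu_water - mu_air)

# Assumed attenuation coefficient of water per mm; the real value depends on the spectrum.
mu_water = 0.02
hu_water = to_hounsfield(mu_water, mu_water)
hu_air = to_hounsfield(0.0, mu_water)
print(hu_water, hu_air)
```

Because the water bottles of the phantom should reconstruct close to 0 HU, this scale makes deviations in the truncated exterior regions directly comparable across the reconstructions.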

Results
We evaluate and compare the reconstructions of the four projection data sets which are shown in Figure 10. After a brief description of the reconstruction setup we describe the results for one slice of the reconstructed volumes.
Afterwards, the results are compared quantitatively for five regions of interest in the exterior region of the knee phantoms.

Observations.
We first inspect the reconstruction of the uncorrected projections (see Figure 11(a)). The saturation causes strong artifacts. High-intensity streaks are observed at the onset of the overexposure, and the original shape of the edges on the right side cannot be clearly recognized. The exterior regions on the right side lack a definite outer boundary and are blurred. Figure 11(c) shows the reconstruction of the corrected projections. The overexposure artifacts are significantly reduced for both bottles and the boundaries on the right side of the phantoms are mostly restored. However, the contour of the phantom is still blurred at the outer regions of the bottles in the top right and bottom right. The boundaries of the ground truth and the surface data do not align perfectly (see Figure 12(b)). This problem arises from inaccuracies in the cross-calibration procedure. As these inaccuracies are sufficiently small, we can still achieve good correction results. The outline of the bottles in the left half of the surface data slice lies outside the ground truth boundary. This inaccuracy results from the extrapolation of surface points to the back side of the knees, which was based on mirroring the surface points on a plane placed at an estimated height.
In Figures 11 and 12 we observe that in principle there is sufficient depth information to extrapolate the truncated boundary within the field of view. However, the boundary was not restored completely in the corrected volume. The reason for this is the nonlinear preprocessing by the C-arm CT system. As a result of this preprocessing, the values of the last nonoverexposed pixels can be very low. If the intersection lengths are normalized to these very low values, the extrapolation has almost no effect. This effect can be countered by starting the extrapolation at an earlier point at which the pixel values have not been minimized by preprocessing.

Quantitative
[...] ground truth values. This change occurs because the previously truncated parts of the phantom are now partially restored at the positions of the ROIs. The material of the phantom is now measured more consistently inside the ROIs, instead of air as in the truncated case. Furthermore, the values of the standard deviation are reduced. This shows that the values within the ROIs in the corrected data are more homogeneous and that outliers, which would increase the standard deviation, have been eliminated. The saturation artifacts cause very high maximum values on the truncated edge of the lower bottle. These artifacts are corrected with the Kinect-based correction tool.
The corrected data shows significantly improved reconstruction results. The visualization of the absolute differences between the ground truth and the uncorrected and corrected data, respectively (see Figures 11(e) and 11(f)), supports our measurements. We observe that the differences are lower for almost all regions. Furthermore, the figures show that, apart from artifact correction inside the knee phantoms, artifacts caused by truncation between the two phantoms were also reduced.

Discussion
The results have shown that the Kinect-based correction of saturation in cone-beam CT is a feasible approach for reducing artifacts in saturated scans. Lost surface information, especially at the front side of the knee phantoms, was restored. Furthermore, noise and overexposure artifacts were reduced through the correction of the projections.
Overexposure occurs not only in C-arm CT imaging but also in other systems such as multidetector CT (MDCT). One factor that makes overexposure compensation easier in MDCT is the higher dynamic range of 20 bits [20], which generally leads to less severe artifacts. Furthermore, bowtie filters and tube current modulation can be utilized to reduce radiation dosage in the exterior regions of the scanned object [21-23]. In C-arm CT, overexposure artifacts are mostly tackled after image acquisition, as bowtie filters are linked to reduced detector efficiency [24] and overexposure of the detector is often even intentionally caused to tackle image quality limitations due to the limited dynamic range [9,25,26].
Fully leveraging the 200° raw data acquisition of the C-arm CT around the knees might allow for better correction results in algorithmic approaches than the baseline considered in this paper. ROI reconstruction [27-29] or iterative reconstruction [30] could be utilized for this approach. Severe truncation, however, is still unlikely to be fully corrected [30].
In this context, it should be noted that the proposed method can, in principle, be used in combination with any other correction method. The additional surface information could be used for regularization, which would likely lead to further performance improvements. Additional considerations would have to be made if the overexposure occurred in the bone, for example, the patella. In this case, the normalization factor would be based on the bone density. Instead of the skin tissue, the bone would be extended to the outer surface, which would cause correction errors. The first likely occurrence of overexposure is to be expected in the skin tissue right next to the bones. The extrapolation of the skin tissue based on the values of the neighboring bone tissue can be avoided algorithmically. If the values of the last nonoverexposed pixels are significantly higher than expected for skin tissue, the normalization factor can be adjusted according to nearby or typical skin tissue values.
In the B-spline interpolation we observe inaccuracies of the edge detection of the Kinect camera. For a bigger field of view, problems may arise in the correction of the outer edges. However, saturation artifacts are usually only expected at the front and back sides of the knees in patient scans. For these regions we can acquire reliable information with the Kinect camera.
Choi et al. [31] proposed an approach for motion correction in weight-bearing knee scans. However, it is still necessary to correct for overexposure artifacts. A depth camera-based solution offers the possibility of a temporally synchronized correction of overexposure artifacts, because the depth information is captured in real-time and continuously throughout the complete scanning procedure.

Outlook
The experiments in our research aim to demonstrate the general feasibility of the correction method. For this, we focus on supine scans of the human knees. However, the design of the method is not restricted to supine scans and could in principle also be used for weight-bearing scans of the knees in real-world scenarios. For this, we propose using two Kinect sensors to gather surface information for all relevant angles. The design of the cross-calibration phantom allows the simultaneous cross-calibration of two Kinect sensors with the C-arm CT. By capturing the surface areas close to the patella and popliteus with two separate cameras, closed B-spline curves can directly be calculated from the merged surface data and used for saturation correction. By using this approach, no further estimations for the back side of the object have to be made and more accurate results are to be expected.
In this paper, we analyzed the new correction approach in isolation. The correction method could be combined with other recent algorithmic approaches to leverage their respective benefits. In future experiments, the performance improvements of the artifact correction for combined approaches could therefore be investigated in detail.
In order to use the proposed approach in real-world scenarios, the accuracy of the cross-calibration is of high importance and can be improved through more precise manufacturing. Design improvements could be achieved by evaluating the cross-calibration accuracy for different positions, sizes, and numbers of spheres. Transparent materials are usually not detectable by the depth camera and could be used for the sphere-carrying rods to improve the segmentation accuracy.
Besides qualitative improvements in the phantom design, the procedure could be improved algorithmically. In the experiments, only depth features from the Kinect sensor are used for the calibration. By making use of the additional RGB data gathered by the Kinect, the accuracy of the cross-calibration could be further enhanced.
Big improvements in processing time can be made in the projection correction. The main source of computing time derives from the B-spline curve interpolation and calculation of line integrals along the X-rays through the object. This type of calculation is one of the basic routines on a GPU and could be performed by providing the graphics card with the 3D points and projection geometry [32].
The sphere segmentation was performed semiautomatically by first clicking on the individual spheres in a predefined order. In the future, the spheres could be detected in the RGB image automatically, based on their color.

Summary
When scanning knees, the limited dynamic range of the detector causes saturation artifacts in the reconstructed volumes. As these artifacts affect the surface regions of the scanned object, the idea for the correction method is to additionally use a Kinect camera to locate the surface of the object in 3D.
In order to use these surface points for the correction of CT images, we develop a procedure for cross-calibration between the camera and the C-arm CT. For cross-calibration, we use a PDS-2 calibration phantom and attach a structure that is detectable with the Kinect camera.
After the cross-calibration, a projection-based saturation correction is performed where all detector lines are successively corrected within the projections. With the C-arm geometry, we determine the 3D points where the X-rays entered and exited the knee and calculate the length of the X-ray through the knee with these points. Ultimately, we use these calculated lengths for smooth extrapolation of the boundary of the object in the overexposed regions.
The reconstruction results show that the projection-based correction itself yields clear improvements over the uncorrected data. The boundaries of both knee phantoms are extrapolated to their correct position and overexposure artifacts are significantly reduced.
Potentially arising problems due to limited edge detection and the different tissue densities in the knees are also considered.
Possible future work includes the usage of a second Kinect camera for weight-bearing scans and a GPU-based calculation of the intersection lengths. The sphere segmentation could be automated by identifying the spheres based on their color. Furthermore, a temporally synchronized correction approach could be applied in current research projects.

A. Mathematical Formulas
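A.1. Line-Line Intersection. All direction and normal vectors of lines in the following equations shall be considered as unit vectors. Two-dimensional lines can be represented by a point $\mathbf{p}_{\text{line}}$ on the line and a normal vector $\mathbf{n}_{\text{line}}$ perpendicular to that line. The distance between a point $\mathbf{g}$ and a line defined by $\mathbf{p}_{\text{line}}$ and $\mathbf{n}_{\text{line}}$ is
$$d(\mathbf{g}) = \left| (\mathbf{g} - \mathbf{p}_{\text{line}})^T \mathbf{n}_{\text{line}} \right| .$$
The sum of squared distances to more than one line is
$$E(\mathbf{g}) = \sum_{m} \left( (\mathbf{g} - \mathbf{p}_m)^T \mathbf{n}_m \right)^2 .$$
Minimizing this sum with respect to $\mathbf{g}$ yields the closest point to all lines in a least-squares sense [15]. For a line in three dimensions with suspension point $\mathbf{c}_m$ and unit direction vector $\mathbf{k}_m$, the squared distance to $\mathbf{g}$ is $(\mathbf{g} - \mathbf{c}_m)^T (\mathbf{I} - \mathbf{k}_m \mathbf{k}_m^T)(\mathbf{g} - \mathbf{c}_m)$.

A.2. Coordinate System Transformation. The rotation between both coordinate systems is described by
$$\mathbf{D} = \mathbf{W}\,\mathbf{C}, \tag{A.9}$$
where $\mathbf{W}$ is the 3 × 3 rotation matrix that is used to rotate the orthogonal axis vectors of the Kinect coordinate system onto the corresponding axis vectors of the Zeego C-arm coordinate system. $\mathbf{C}$ is the matrix containing, as columns, the unit direction vectors of the sphere mount axes in the Kinect coordinate system. $\mathbf{D}$ is the matrix containing, as columns, the unit direction vectors of the sphere mount axes in the Zeego coordinate system. Using (A.9), the direction vectors of $\mathbf{C}$ shall be rotated onto the direction vectors of $\mathbf{D}$. The rotation matrix $\mathbf{W}$ can be obtained by calculating the matrix inverse $\mathbf{C}^{-1}$ and rearranging the equation to
$$\mathbf{W} = \mathbf{D}\,\mathbf{C}^{-1} .$$
After rotating the points with the rotation matrix $\mathbf{W}$, the final step of the coordinate system transformation is the translation of the coordinate system origin to the center of the PDS-2 phantom, which amounts to 110 mm along one coordinate axis:
$$\mathbf{p}_{\text{Zeego}} = \mathbf{W}\,\mathbf{p}_{\text{Kinect}} + \mathbf{t}_{\text{Origins}} .$$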