Markerless Lung Tumor Motion Tracking by Dynamic Decomposition of X-Ray Image Intensity

We propose a new markerless tracking technique of lung tumor motion by using an X-ray fluoroscopic image sequence for real-time image-guided radiation therapy (IGRT). A core innovation of the new technique is to extract a moving tumor intensity component from the fluoroscopic image intensity. The fluoroscopic intensity is the superimposition of intensity components of all the structures passed through by the X-ray. The tumor can then be extracted by decomposing the fluoroscopic intensity into the tumor intensity component and the others. The decomposition problem for more than two structures is ill posed, but it can be transformed into a well-posed one by temporally accumulating constraints that must be satisfied by the decomposed moving tumor component and the rest of the intensity components. The extracted tumor image can then be used to achieve accurate tumor motion tracking without implanted markers that are widely used in the current tracking techniques. The performance evaluation showed that the extraction error was sufficiently small and the extracted tumor tracking achieved a high and sufficient accuracy less than 1 mm for clinical datasets. These results clearly demonstrate the usefulness of the proposed method for markerless tumor motion tracking.


Introduction
In radiation therapy, to irradiate sufficient dose to tumors and avoid unnecessary dose to the surrounding healthy tissues are crucial to achieve significant treatment effects and reduce adverse effects. Stereotactic body radiation therapy (SBRT) can satisfy such clinical demand for accurate isocenter positioning to the static center of the target tumor volume [1]. Intrafractional tumor motion can, however, badly affect the accuracy of the irradiating position and additional margins should thus be designed to account for such geometric uncertainties [2,3]. Inevitably, the larger margins cover the wider regions of surrounding healthy tissues. In this sense, motion management is necessary for effective treatment, especially for abdominal and thoracic tumors [2][3][4]. Indeed, such tumors can move several centimeter due to mostly respiratory and cardiac motions [5,6].
To achieve highly accurate irradiation to moving tumors, tumor tracking to measure or monitor the motion can be an ideal direction. Image-guided techniques to capture the tumor motion [7][8][9][10][11] have thus been developed for the tumor tracking. A kV X-ray fluoroscopy is widely used for such image-guided radiation therapy (IGRT) because of its capability of direct position measurement of a target tumor inside the patient's body. However, image quality of the fluoroscopy may not be sufficient for accurate measurement of the tumor position, and thus fiducial gold markers, which form sufficient contrast to the surroundings on fluoroscopic images, are often implanted into or near the tumor [7,12]. The markers position can be measured accurately and regarded as a good fiducial of the tumor position. Although implanted markers are very effective for the accurate monitoring, the implantation of markers is an additional burden in clinical routine [13]. Furthermore, for lung tumor cases, it is a serious problem that the implantation itself runs the risk of pneumothorax as high as 30% of such patients [14].
On the other hand, markerless techniques are fundamentally free from the risk and have thus been developed for "safer" tracking [15][16][17][18][19][20]. Among them, Berbeco et al. [15] and Mostafavi [16] developed a filter to enhance tumor contrast with the surrounding tissues by averaging tumor images at the same respiratory phase over several periods. This technique assumes that the geometric relation between the tumor and the other structures such as bones, blood vessels, and other tissues is the same for anytime at the same respiratory phase. However, the relation can change, and thus the technique often fails to measure the tumor position accurately. Meyer et al. [17] and Wilbert et al. [18] have evaluated a conventional template matching technique for tumor tracking on megavoltage portal images. They compared the tracking performance of the matching technique with several objective functions to be minimized. The technique is based on the nonfluoroscopic image assuming that target image intensities can directly be defined by observing the target itself and may not be affected by the other structures intensity. In contrast, the fluoroscopic intensities of pixels at which the tumor may be located are dependent on not only the tumor itself, but also the other structures passed through by the X-ray [21]. Due to the fluoroscopic characteristics, only insufficient measurements of the tumor motion can be achieved by such conventional techniques.
In this paper, we propose a new markerless technique for accurate measurement of the target position by decomposing the fluoroscopic pixel intensity, which is the sum of intensity components of the tumor and the other structures passed through by the X-ray, into the target intensity component and the others. In other words, the proposed technique can extract the intensity component of the target tumor from the fluoroscopic intensity. The extracted tumor intensity component is independent of the other intensity components and the extracted tumor has sufficient contrast with the surrounding area. Consequently, it can be used to measure the position accurately. The performance evaluation of the technique is conducted by using both phantom and clinical data.
The rest of the paper is organized as follows. In Section 2, a new method for extracting tumor intensity component is proposed for the markerless tracking. Performance analysis to evaluate the markerless technique is given in Section 3. Concluding remarks are described in Section 4.

Methods
Note that the extraction of the tumor component from the fluoroscopic image is generally ill posed. In fact, (1) is an indefinite equation because not only the tumor image but also the background is unknown. Thus, (1) does not have a unique solution of the tumor image . Indeed, it is often very difficult for one to recognize a tumor in an Xray fluoroscopic image. On the other hand, radiotherapists and radiologists can recognize the shape and image intensity component of the moving tumor on a fluoroscopic image sequence. This suggests that there is a mechanism to extract the shape and intensity component of the tumor not from each fluoroscopic image, but a sequence of them. In the following, we will formulate such extraction mechanism of the proposed dynamic decomposition.
It might be worth to mention that the spatial segmentation technique finds pixels or locations belonging to the target, while the intensity of a pixel may belong to more than two structures in a fluoroscopic image. Decomposition aims to extract the intensity component of the target structure from the observed fluoroscopic intensity. This is a main difference between the conventional segmentation and the proposed dynamic decomposition for tumor extraction.
Let us suppose that, for simplicity, tumor or background motion is a translation and consider each pixel of the extracted tumor component at a reference location ( , ) with intensity ( , ) that will move to a new location ( + ( ), + V( )) with intensity ( + ( ), + V( ), ) at time . In this case, the extracted tumor image at time is written by the reference image as where ( ( ), V( )) denotes the displacement vector at time . Equation (2) implies that an extracted tumor image of any frame (time) of the image sequence, , can be represented by the reference and the displacement vector ( ( ), V( )) at each time . The tumor image extraction problem can then be Journal of Medical Engineering 3 solved by finding the reference image common for all the frames and each displacement ( ( ), V( )).
For the background image at time with intensity ( , , ), the same condition using the reference image at the reference position ( , ) with intensity ( , ) can also be applied as where ( ( ), V ( )) denotes the background displacement. Similarly, the reference image is common for any frames.
If the displacements are known, we may formulate simultaneous equations consisting of a set of ill-posed equations of (1) for several frames. Such simultaneous equations accumulate temporal image constrains that must be satisfied by the extracted tumor, the background, and the observed fluoroscopic intensities. Ideally, if the accumulated constrains are sufficient, the simultaneous equations will be solved and the tumor image can be extracted. However, the number of frames required for regularization of the simultaneous equations is depended on the tumor and background motions and the other image configuration, and thus it is unknown in general. Then, we estimate the tumor image and optimize the estimation recursively, instead of solving the equations explicitly. In addition, the displacements are unknown in general, and thus we need to estimate them as well. (2) can be measured by using a template matching technique [17]. Obviously, an ideal template for the tumor is the extracted tumor reference that is unknown. Here the current estimation of the referencêcan be used as an estimation of the template. Then, the displacement ( , V) can be estimated by matching the templatêwith the tumor image estimatê.

Dynamic Decomposition
From (1), an estimation of extracted tumor intensity matrix at time can be given by subtracting the current estimate of the background image matrix from the fluoroscopic image matrix aŝ( where ( ) denotes the estimation error matrix at time and The estimate of the background displacement (̂( ),V ( )) can be given by matching the current background referencê with the fluoroscopic image ( ). This matching would be more accurate if the size of region of interest (ROI) is sufficiently large compared with the tumor size for ignoring a tumor effect on this background matching.

Estimation of Tumor Intensity Component.
A quadratic objective function to be minimized at time , ( ), is given as where tr( ) and denote trace and transposition of a matrix , respectively. For simplicity, a sequential steepest descent method will be formulated for finding the best estimation, but many optimization methods can be applied to minimize various objective functions.
In the steepest descent method, the change of the tumor reference estimation̂, Δ̂, to minimize is given as where is a step size of the optimization. The update of the variable is then represented aŝ where is an iteration number. The background reference estimation can be updated by the same manner, but in this paper it is updated by using the updated estimate of the tumor imagê( ) =̂( −̂( ), −V( ), + 1), so that the error is kept to be 0. Finally, the algorithm is summarized as follows.
Step 5. ← +1. Go back to Step 3 if ≤ , where (= 2 in this paper) is the maximum number of optimizing iterations; otherwise go to next Step.
Step 6. Estimate the motion ( ( ), V( )) as the center of mass of the binarized image of̂(see Figure 5).
Step 7. ← + 1. Go back to Step 2 if ≤ , where is the maximum number of frames; otherwise stop.

Experimental Results
We have evaluated performance of the proposed method by using phantom and clinical data sets. The extraction performance was evaluated only for the phantom data because the ground truth of the extracted 4 Journal of Medical Engineering tumor image for clinical data is unknown. On the other hand, if the extraction is accurate, then the tracking is also accurate. Thus, the tracking performance can be a good index of the extraction performance as well.
For phantom motion tracking evaluation, the following error of Euclidean distance between the measured displacement (̂( ),V( )) and the ground truth ( ( ), V( )) is calculated by averaging the distances over frames of the fluoroscopic image sequence: For clinical data evaluation, three radiologists and medical physicists manually contoured the tumor image, and then three centers of mass of the contoured image were averaged and used as the ground truth of the tumor position ( ( ), V( )). For comparison purpose, we will also show motion tracking results by the same template matching technique [17] without the proposed tumor image extraction.

Phantom Data.
A chest phantom fluoroscopic image without any tumor was taken first, and it was fluoroscopically superimposed by a moving phantom tumor image, which created 100 frames of size 600 × 250 pixels with spatial resolution 0.26 mm/pixel. The phantom tumor image used in this experiment is shown as Tumor (left) in Figure 1. Motion of fiducial markers implanted into a lung cancer patient was used as the phantom tumor motion, which was measured every 0.033 s by using the real-time tumor-tracking (RTRT) system at Hokkaido University Hospital [7]. Figure 2 shows ten phantom frames, , randomly chosen from the total hundred frames. The images shown are cropped to 100 × 100 pixels around the center of the original size. The left-upper image in Figure 2 shows the initial phantom fluoroscopic image, and the tumor location in the initial image is considered as the reference position with zero displacement ( , V) = (0, 0).

Tumor Image Extraction.
In this experiment, the tumor outline was initialized manually on the first frame (1), the left-upper image in Figure 2. Intensities inside the outline were initialized as a constant value 0 = 1. Then the reference imagêwas initialized aŝ where 0 denotes the region inside the initial outline. The background referencêwas initialized by using the initial observation of the fluoroscopy subtracted by the initial tumor image aŝ= Figure 3 shows extracted images of the moving phantom tumor,̂, corresponding to the ten frames of phantom images in Figure 2. As seen in Figure 3, the extracted tumor image was initially different from the ground truth shown in Figure 1 (left), but approaching to the truth gradually. Indeed, an extraction error decreased gradually as shown in Figure 4. The error is defined by using the error between the extracted tumor imagêand the ground truth as where =̂− . Figure 5 shows a comparison between the truth and the extraction results. In this figure, images were binarized for visibility. The initial shape of the tumor shown in Figure 5(b) was obviously bigger than the truth in Figure 5(a). Nevertheless, the extracted tumor intensity component shown in Figure 5(c) seems sufficiently similar to the truth and the error , reflecting the extraction error of the different shape and 2 line motion traces seen in Figure 3, was negligible for accurate tumor tracking (see Section 3.1.3). We may thus conclude that the tumor image can be extracted from the Xray fluoroscopic image sequence.

Motion
Tracking. By using the binarized extracted tumor, the average error of the motion measurement in (11) converged to zero, implying that the tumor motion can be measured within the spatial resolution, for example, 0.26 mm/pixel in this example. On the other hand, the error was more than 2.0 mm by using the same matching technique [17] without the proposed tumor extraction due to the fluoroscopic characteristics especially mismatching with the background structure. The comparison clearly demonstrates effectiveness of tumor extraction from the fluoroscopic images.

Clinical Data Cases.
We also applied the proposed method to three clinical datasets of fluoroscopic image sequences.

Clinical Data.
Fluoroscopic images of size 512 × 512 pixels with spatial resolution 0.42 mm/pixel were taken by the X-ray simulator system (Ximatron CX, Varian Medical Systems, Palo Alto, CA) at Tohoku University Hospital. The sampling interval of the image observation was every 0.5 s (i.e., 2 images/s). The number of image frames was 18 for each case. The less sampling frequency and smaller number of images make it more difficult for the proposed method to extract the tumor image because the number of accumulated constrains described in Section 2.1 is less than the phantom case. However, the number of frames can be larger or even equal to that of phantom case without clinical difficulty before the therapeutic fraction. The peak-to-peak displacements in craniocaudal direction were 9.42 mm, 5.88 mm, and 7.14 mm for cases 1, 2, and 3, respectively. Figures 6 and 7 show a fluoroscopic sequence of 10 frames chosen from tested 18 frames of the clinical case 1 and the corresponding frames of extracted tumor images, respectively. As seen in Figure 6,     the proposed extraction, the motion tracking errors for all cases are less than the conventional method without extraction. For example, the average with standard deviation of the tracking error for the three cases by the proposed method is 0.741 ± 0.230 mm and less than 1.721 ± 0.463 mm by the conventional method. Second, the fact that the error is also less than a minimum clinical requirement within 1.0 mm may be more important. It might be worth to mention that even though the extraction image was blurred and noisy as shown in Figure 7, the method can still achieve a good tracking performance. These clearly demonstrate the clinical usefulness of the extraction for the markerless tumor motion tracking with sufficient accuracy.

Discussions.
In the experiments, we manually initialized the tumor outline and intensities. Indeed, the radiologists can easily draw a rough outline on the fluoroscopy. Note that even started from a rough initial estimation of the outline and intensities, the proposed extraction can achieve a good tracking performance within 1 mm accuracy for both phantom and clinical cases. On the other hand, the more accurate initial estimation gives the more accurate extraction, especially for clinical cases. Such accurate outlines are available by using the X-ray CT image or digitally reconstructed radiography (DRR). Thus, the more accurate tracking can be achieved by the proposed extraction with the more accurate initial outline.
As mentioned earlier, better optimization techniques can improve extraction performance. The objective function ( ) of the sequential optimization formulated in this paper involves only one frame constrain and is good for real-time computation during treatment. On the other hand, constrains from more than 2 frames can simultaneously be incorporated into an objective function of a batch optimization, such as batch = ∑ t ( ). This is good for offline computation and can provide better intensity components before treatment. In this case, larger iterations may be chosen. For further improvement, many tracking techniques other than the simple template matching [17] can also be incorporated into the proposed method. In fact, the phase only correlation [22][23][24][25] for low contrast cases and the particle filters [26][27][28] for noisy and stochastic deformation can be applied to the extracted tumor image.
Although the proposed technique can achieve real-time tracking, the current radiotherapy machine may have latency to control the irradiation position. In this case, motion prediction based on the real-time tracking can be used and such prediction methods have been proposed [4].
The number of clinical data used is not good enough and the image quality of clinical data is different from that of the phantom case, but this is because no new or extra data were taken other than from normal planning routine to avoid any extra radiation dose for the developmental phase. However, a large number of clinical data with the same image quality of the phantom case will be tested for the evaluation phase of the proposed method.

Conclusions
We have developed the dynamic decomposition method to extract the moving lung tumor image component from kV Xray fluoroscopic images and applied it to the tumor tracking. The tracking does not require any fiducial markers implanted into the tumor and thus is fundamentally free from the risk of implantation troubles. Sufficiently high accuracy of the extraction and motion tracking has been demonstrated by using both phantom and clinical datasets. The results suggested that the proposed method is an ideal solution for the implantation risk and can achieve a low-risk and highly accurate tumor motion tracking for the real-time IGRT.