ESTERR-PRO: A Setup Verification Software System Using Electronic Portal Imaging

The purpose of the paper is to present and evaluate the performance of a new software-based registration system for patient setup verification, during radiotherapy, using electronic portal images. The estimation of setup errors, using the proposed system, can be accomplished by means of two alternate registration methods. (a) The portal image of the current fraction of the treatment is registered directly with the reference image (digitally reconstructed radiograph (DRR) or simulator image) using a modified manual technique. (b) The portal image of the current fraction of the treatment is registered with the portal image of the first fraction of the treatment (reference portal image) by applying a nearly automated technique based on self-organizing maps, whereas the reference portal has already been registered with a DRR or a simulator image. The proposed system was tested on phantom data and on data from six patients. The root mean square error (RMSE) of the setup estimates was 0.8 ± 0.3 (mean value ± standard deviation) for the phantom data and 0.3 ± 0.3 for the patient data, respectively, by applying the two methodologies. Furthermore, statistical analysis by means of the Wilcoxon nonparametric signed test showed that the results that were obtained by the two methods did not differ significantly (P value > 0.05).


INTRODUCTION
The effectiveness of radiation therapy depends on the patient setup accuracy at each radiation treatment session. A significant problem is to reproduce the intended position of the part of the patient that is irradiated with respect to the treatment beam(s) at each treatment session. It is a common clinical practice to verify the setup by comparing the portal image with a reference one which records the intended patient position. Typical reference images that can be used are simulator images, digitally reconstructed radiographs (DRRs), or another portal images. The introduction of electronic portal imaging devices (EPIDs) offers the potential for correcting inaccuracies in patient placement in a prospective manner, rather retrospectively as is done with conventional megavoltage films.
In general, setup errors are classified as random (or interfraction) and systematic errors [1]. The random errors are deviations between different fractions, during a treatment series, whereas the systematic errors are deviations between the intended patient position and the average patient posi-tion over a course of fractionated therapy [2]. Furthermore, random patient movement or periodic movements such as breathing can cause the so-called intrafraction error, which is defined as the deviation observed within a single fraction of fractionated therapy. However, these movements during a single fraction are usually insignificant for most patients and treatment sites, with a few exceptions (e.g., the lung).
A number of setup correction strategies aiming at improving target localization during radiation therapy treatments have been proposed. Most of these strategies are based on the matching of common anatomical features of portal images selected either manually or semiautomatically. In [3], the magnitude of the errors introduced into the registration between the rotated and the nonrotated phantom images and the reference DRR image was determined based on "match structures," which include the field edges and at least three anatomical landmarks manually selected on the reference image and matched with the corresponding anatomy in the portal images. An object-based registration method for portal images was developed in [4], which was based on core analysis, a fundamental computer vision method, to define correspondence between common anatomical structures of the images manually selected and a curved-based matching algorithm (chamfer-matching) in order to determine the translation/rotation parameters of the image registration. Similar approaches for registering DRRs with portal images were also proposed using either the Pearson's correlation coefficient as a measure of match on selected anatomical features [5] or the template matching technique [6]. Other strategies for the verification of patient setup were based on the optimization of a similarity measure such as histogram matching technique [7], phase-only correlation [8], the minmax entropy [9], and the mutual information [10]. Furthermore, a comparative study of various similarity measures and optimization procedures had been performed on matching high-quality DRRs against portal images that were acquired right before treatment dose delivery [11].
Setup errors that are caused by out-of plane rotations can be estimated by means of three-dimensional techniques [5,12]. In general, out-of-plane rotational errors that are smaller than 3 • do not affect the projected anatomy in portal images significantly. However, when larger rotational errors are not taken into account, this causes a reduced accuracy in the measurement of the translational error [13]. In some cases, two-dimensional techniques can also provide reliable estimations of the out-of-plane rotational errors; outof-plane rotation for an anterior image can be an in-plane rotation for a lateral image.
In this paper, an extended version of our paper in [14] is presented for the estimation of patient setup errors during radiotherapy treatments. According to the proposed methodology, the verification of patient setup consists of the following steps: (a) delineation of radiation field edges in the portal image in order to verify that the beam has the correct shape as well as to establish a common coordinate system with a previously delineated field edge from the reference image, and (b) matching of common anatomical structures within the two images in order to provide an estimation of the patient setup error relative to the field edges. Two registration methodologies are presented within the paper: (a) the registration of portal image of the current fraction of the treatment with the corresponding DRR image, used as a reference image and (b) the registration of portal images at different treatment sessions using a nearly automatic technique based on self-organizing maps (SOMs) to define automatic correspondence of common anatomical features of the portal images. Both registration methodologies have been incorporated towards the development of a software system, called ESTERR-PRO, for the estimation of patient setup errors as presented in the paper. A detailed description of the system in terms of registration methodologies is provided in Section 2. In Section 3, results of the performance of the system on phantom and real data are presented. Finally, in Sections 4 and 5, discussion of the results and concluding remarks are drawn, respectively.

The proposed software system: "ESTERR-PRO"
The complete system incorporates the following features: (a) friendly user interface, (b) image processing tools, and (c) a database of patient records and images. The user of the system is able to access and display portal images and corresponding reference images (DRRs simulator images) for a patient selected from the database. The image processing tools can then be used to detect and objectively estimate the setup error, if present.
ESTERR-PRO runs on personal computers (PCs) under the Microsoft Windows operating system. The software was developed in the C++ language.
In order to review portal images, the user must first select a patient name from the database. Then all the available reference and portal images, sorted by date, are presented to the user. When the selection of the reference and the portal image is completed, the images are displayed next to each other (see Figure 1).

Patient setup verification
As already mentioned, the patient setup verification comprises two steps: (a) delineation of radiation field edges in the portal image in order to verify that the beam has the correct shape as well as to establish a common coordinate system with a previously delineated field edge from the reference image, and (b) matching of common anatomical structures within the two images.

Verification of field shape
The radiation field edges in the portal image are delineated automatically as follows: a thresholding operation (with threshold level set to 5) of the gray-levels of the image is applied in order to obtain a rough approximation of the field contour. Then, the Canny edge detector [17] using a fast recursive implementation of the Gaussian kernel [18], applied on the original image in a band of width 15 pixels around the position of the initial contour, provides the final form of the field edges. The values for the standard deviation of the Gaussian kernel, the low threshold, and the high threshold for the nonmaximum suppression of the Canny edge detector were set to 2.0, 0.0, and 0.95, respectively. The field edges for each image are stored in the computer memory as a binary image, which is called field edge map and has the same size as the original image. A value 1 (0) in the field edge map indicates that the corresponding pixel belongs to the field contour (background). The verification of the field shape is accomplished automatically as follows: first, the distance transformation of the reference field edge map is calculated [19]. The distance transformation provides the smallest distance of each pixel of the field edge map from the field edge. Then, an optimization process is applied in order to achieve the spatial coincidence of the two radiation field edge maps. The optimization process involves the minimization, with respect to the parameters of a rigid transform (namely displacement and rotation in the image plane) of the distance between the reference edge map and the transformed version of the edge map of the portal image, using the current values of the parameters of the rigid transform. The distance between the two edge maps is calculated by means of the distance transformation of the reference field edge map. In our implementation, the Powell's method [20] is used for the optimization process. After the end of the optimization process, the distance between the reference edge map and the transformed edge map of the portal image should be small enough. If this distance exceeds a predefined value, this means that the two field edges do not have the same shape and a warning is generated, which informs the user about this inconsistency. Additionally, if the reference image is another portal image then it is expected that the parameters of the rigid transform, obtained during the optimization process, to be nearly zero. If this is not the case, then a warning is also generated. It must be noted that the whole process is invoked automatically, immediately after the user selects the pair of the images.

Matching of anatomical structures
The choice of the procedure that is used for the matching of common anatomical structures between the reference and the portal image depends on the type of the reference image: (a) if the reference image is a DRR or a simulator image, a modified manual procedure is applied, (b) if the reference image is a portal image, a semiautomatic approach is followed.
In particular, when the reference image is a DRR or a simulator image, the following procedure is applied. The edges of each image are extracted automatically by the Canny edge detector. Trackbars for the adjustment of the values of the parameters Canny edge detector (namely, the standard deviation of the Gaussian kernel and the high threshold for the nonmaximum suppression) are available (see Figure 2) in order to provide the capability of the user to select only edges that correspond to the anatomical structures of interest.
When the user selects the edges that correspond to the same anatomical structures within the two images, a similar optimization procedure as the one used for the verification of the field shape is invoked. The outcome of the aforementioned procedure is the setup error in terms of horizontal displacement, vertical displacement, and angle of rotation around the axis that is perpendicular on the image plane.
When the reference image is another portal image, the matching procedure involves the definition of fiducial marks (points) on anatomical structures of the reference image that 4 International Journal of Biomedical Imaging do not move with time, or with anatomical processes (the skeletal system is a good example, whereas the bladder or the intestines are counterexamples of placement of fiducial marks). The user-defined fiducial marks can be saved with the reference image to be utilized in subsequent fractions of the radiation treatment series. Let (x i , y i ) (i = 1, 2, . . . , N) be the coordinates (in pixel units) of the user-defined fiducial marks. The next step of the procedure involves the maximization of a properly chosen function, f , with respect to the parameters of the setup error (horizontal displacement (dx), vertical displacement (dy), and angle of rotation (θ)) using self-organizing maps [21]. The self-organizing map (SOM) is a neural network algorithm, which uses a competitive learning technique to train itself in an unsupervised manner. Kohonen first established the relevant theory and explored possible applications [22]. The Kohonen model comprises a layer of neurons m, ordered usually in a one-or twodimensional grid. The training of the network is performed in an iterative way. At each iteration k, a data point x ∈ R n is presented to the network; the neuron j with weight vector w j ∈ R n is declared as the winning neuron, according to the following rule: The winning neuron j and its neighboring neurons i have their weight vectors modified according to the following rule: where h i j (n) = h( r i − r j , n) is a kernel defined on the neural network space as a function of the distance r i − r j between the winning neuron j and its neighboring neurons i, as well as the iteration number n. This kernel has the shape of the "Mexican hat" function, which in its discrete form has maximum value at inter-neuron distance in the case of i = j whereas its value drops in a Gaussian manner as the distance increases. The width of this function decreases monotonically with iteration number. In this way convergence to the global optimum is attempted during the early phases of the self-training process, whereas gradually the convergence becomes more local as the size of the kernel decreases. Prior the description of the proposed method, some notations must be introduced. Let μ A (I) denote the restriction of an image I to the region A ⊂ R 2 and T w (A) ⊂ R 2 is the rigid transformation, with parameters w = (dx, dy, θ), of the region A, where dx, dy, and θ are the horizontal displacement, the vertical displacement, and the angle of rotation, respectively. Furthermore, MoM(I 1 , I 2 ) denotes a measure of match between two images I 1 and I 2 .
If I R , I F are the reference image and the image to be registered, respectively, then the implementation of the SOM network for registering the two images is as follows. The topology of the network is constructed by placing a neuron on each user-defined fiducial mark P i = (x i , y i ) (i = 1, 2, . . . , N) of the reference image. Each neuron is associated with a square area A i = [x i − R, x i + R]×[y i − R, y i +R], of (2R+1) 2 pixels, centered at the position of the neuron. Additionally, a weight vector w i = (dx i , dy i , θ i ), which holds the parameters of a local rigid transformation, is assigned to each neuron.
The SOM network is trained as follows. calculated, the variable MoM best is set to a very large (in magnitude) negative value, and the iteration variable, n, is set to 1. (2) While n is less than n max , (i) if the average value of the MoM i (n − 1), MoM ave (n − 1), is better than MoM best , then MoM best = MoM ave (n − 1) and the current weights are stored as w i ; (ii) an input signal, s(n) = (dx(n), dy(n), θ(n)), is generated randomly; (iii) for every neuron, the quantity MoM i (n) ≡ MoM(μ Ai (I R ), μ Ts(n) (A i )(I F )) is calculated; (iv) the winning neuron, k n , in the current iteration, is defined as under the condition MoM kn (n) > MoM ave (n − 1); (v) the weights of the neurons are updated according to the following equation: where h(k n , n, i) (i = 1, 2, . . . , N) is given by the following equation: L, a, d 0 ∈ R and p ∈ R are parameters to be defined later, denotes the Euclidean norm, and is the floor function; (vi) the iteration variable is increased by one.
(3) When the training is finished, the parameters of the affine transformation between the two portal images are calculated using a least squares method between the point sets {P i } and {T wi (P i )} [20].
The selected measure of match was the gradient correlation coefficient, namely, where The subscript R(F) refers to the reference (to be registered) image; the superscript h(v) refers to the horizontal (vertical) direction in the image plane, and G denotes the first derivative. For example, G h R (x, y) denotes the first derivative of the reference image along the horizontal direction estimated at pixel position (x, y). G refers to the mean value of the first derivative.
The rationale for selecting the aforementioned measure of match was that gradient measures concentrate the contributions on edge information, which intuitively appears sensible.
The following generator of random numbers is used for producing the input signals to the network: where s 1 (n) = dx(n), s 2 (n) = dy(n), s 3 (n) = θ(n), v j is a uniformly distributed random variable in [0, 1], and U j (L j ) denotes the maximum (minimum) allowed value for the corresponding component of the input signal. Although U j and L j are inputs to the matching process, for all pairs of images used in the current study, constant values were used (±50 pixels for the displacement and ±10 • for the angle of rotation). It must be noted that (9) is a slightly modified version of the generator used in the very fast simulated annealing method [23] and provides random signals which in general lie in the range [w kn, j − (U j − L j ), w kn, j + (U j − L j )]. When a generated signal is not in the allowed range [L j , U j ], then it is discarded and a new signal is produced until s j (n) ∈ [L j , U j ]. The parameter TM(n) controls how far from the weights of the current winning neuron the input signal can reach. As the iteration variable evolves, the magnitude of TM(n) falls exponentially and the generated input signals are more localized around the weights of the current winning neuron (see Figure 3). This is a desired property, since as the number of iterations grows, the weights of the current winning neuron get closer to the parameters of the solution of the matching problem.
The parameter d 0 provides the initial radius of a circular region around the winning neuron. Only neurons inside this region are updated. Usually, d 0 is set to the maximum distance between the fiducial marks. As can be seen from (6), this distance is reduced with geometric rate determined by the parameter α (0 < α ≤ 1). A typical value for the parameter α is 0.995. The parameter L acts like a gain constant for the magnitude of the update that is applied to the weights of the neurons. This parameter also decreases geometrically as the iteration variable evolves. The range of values L is between 0.99 and 1.0; a typical value is 0.995. The parameter p is an integer that determines the rate of change of the parameters L and α. Practically, this parameter determines the number of iterations that are executed before an adjustment of the values for the parameters L, α, and TM(n) takes place. A typical value for this parameter is 200. The number of iterations is set to 5000 and the size of the square area associated with each neuron is 19 (R = 9). Finally, since the transformed region T s (n)(A i ) does not have integer coordinates, bilinear interpolation is used in order to calculate MoM i (n).
In Figure 4, the results that were obtained from the application of the proposed methodology on real pelvic data are presented.

Data acquisition and evaluation protocol
The proposed system was tested on phantom and real data. All the portal images were acquired for gantry angle 0 • using a CCD camera-based EPID (Beamview Plus v2.1, Siemens) of "HYGEIA" with a total dose of ∼ 20 MU/image; 6-MV Xrays were used. The size of the pelvic field was 12 × 12 cm 2 at the isocenter. The DRR images were obtained from a CT data set acquired using a Siemens Somaton Plus 4 scanner at 120 keV, with a 2 mm slice thickness and no gap between slices.
An anthropomorphic phantom (Alderson Rando phantom), commercially available system, was used. The evaluation method was as follows.
(a) A DRR and a portal image of the selected region of interest (head, lung, pelvis) of the phantom, with no setup error between them, were acquired. This was achieved by means of three fiducial markers, visible on the CT imaging, adhered on the phantom using laser alignment to define a reference point on both CT scanning and irradiation. This point was used as the isocenter during treatment planning (DRR) and irradiation (portal imaging). These DRR and portal images served as the reference images. Before the acquisition of CT phantom images and phantom irradiation, the accuracy of the lasers alignment in both the CT and the treatment room was checked and found within 1 mm. Moreover, possible introduced inaccuracies due to no-horizontal CT-couch motion or inaccuracies in the stated slice thickness must be excluded since an extensive quality control was performed prior to the use of the CT scanner.
(b) An expert from the "HYGEIA" hospital defined the fiducial marks on anatomical structures, (x i , y i ) (i =  1, 2, . . . , N), on the portal reference image. These points were used for matching each portal image with the reference portal image.
(c) The treatment couch was moved along the horizontal (left-right) and/or vertical (head-foot) direction 2 mm, 4 mm, . . . , 12 mm and a new portal image was acquired. Before phantom irradiation, the treatment couch (ZXT-Siemens) reading positions were checked and their accuracy was found within 1 mm.
(d) The setup verification tool of the system was invoked in order to estimate the setup error between the reference images and each new portal image, using the procedures described in Sections 2.2.1 and 2.2.2. The output of the system was the estimated values of the parameters of the simulated setup error, namely horizontal displacement (mm), vertical displacement (mm), and angle of rotation (degrees).
(e) A set of 50 test points, P i (i = 1, 2, . . . , 50), was defined on the reference DRR. Since the setup error is known, the actual position of these points on the portal images (including the reference portal image) was identified. The position of these points on the portal images was also identified using the estimated values for the setup error. For each portal image, the root mean square error (RMSE) (in millimeters) between the actual and the estimated positions of this set points was calculated by the following equation: where the subscript act (est) refers to the actual (estimated) positions of the test points on the portal images and · denotes the Euclidean distance. For the real data, the aforementioned evaluation procedure was slightly modified, since it is not known the actual setup error. Therefore, a manual registration, to serve as ground truth, was carried out by two experts from "HYGEIA" hospital. The average value of the two obtained estimates was used as ground truth. A total of six subjects were investigated, who were recruited from patients referred to the "HYGEIA" hospital for prostate cancer treatment. For each patient, a DRR and thirty portal images were acquired correspondingly. The proposed evaluation protocol has been approved by the ethical committee of the "HYGEIA" hospital and the subjects gave informed consent to the work.

Statistics
The statistical difference between the actual and estimated positions was assessed by means of the Wilcoxon signed nonparametric test, for both real and phantom data [24]. The null hypothesis was that the two methods (DRR-portal and portal-portal matching) did not differ as per the RMSE.

Phantom data
Setup error estimations (three parameters and RMSE) are shown in Tables 1 and 2, respectively. For the pelvic region, the RMSE was 0.8 ± 0.3 (mean value ± standard deviation) and 0.8 ± 0.4 when the reference image was the DRR and the portal image, respectively. For the cranial region, the RMSE was 0.8 ± 0.3 and 0.6 ± 0.3 when the reference image was the DRR and the portal, respectively. For both cases, the Wilxocon signed test showed that the null hypothesis could not be rejected at the 5% level (P value > 0.05).

Real data
For each patient, the average values of the setup error over the thirty portal images are shown in Table 3. The RMSE over all six patients was 0.3 ± 0.3 and 0.3 ± 0.3 when the reference image was the DRR and the portal corresponding to the first fraction of the treatment, respectively. The statistical analysis of the RMSE measurements of each patient showed that the null hypothesis could not be rejected at the 5% level (P value > 0.05).

DISCUSSION
The design and the development of the proposed system were based on several constraints. It should use the equipment available in the "HYGEIA" hospital: a portal imager and a CT scanner. The human intervention should be minimal, the results should be accurate and the execution time should be kept as low as possible.
In this framework, the estimation of the patient setup can be accomplished using two alternate processes: (a) the portal image corresponding to the current fraction of the treatment is matched directly with the DRR (or the simulator image). (b) The portal image is matched with the portal image acquired during the first fraction of the treatment (reference portal), whereas the reference portal has already been matched with the DRR image.
The rationale underlying the proposed system design was based on the following facts. In general, it is a very difficult task to achieve an accurate matching between a DRR and a portal image automatically, mainly due to the fact that    the two images are acquired at totally different energies. It has been proposed to convert the DRR into a megavoltage DRR prior to the matching [5,25,26]. However, this was not possible in our case, due to software system installation into a dedicated computer platform, output incompatibility of the radiation treatment planning software and local network topology.
On the other hand, automated techniques based on segmentation should be excluded from the design since these methods rely heavily on the success of the segmentation step, which is a very difficult task due to the low inherent contrast of the portal images. Additionally, no segmentation technique can give satisfactory results for every anatomical region of interest. The very difficult task of portal image segmentation justified the development of other methods of research, such as intensity-based methods [5,9,27]. These methods assume there is a statistical relation between the gray level values of the pixels of the images to be matched and that this relation is at maximum when the images are matched. Although these methods seem to be promising, further work is required.
Another solution was the use of some kind of manual technique. However, pure manual methods depend on the accurate determination of homologous fiducial marks between the two images and furthermore are in general time consuming and prone to spatial inaccuracies. Therefore, regarding the match of DRR and a portal image, we have chosen a modified manual technique that automatically identifies candidate pairs of corresponding edges between the two images. Then, the user simply selects the proper pairs of edges that are going to be used for the matching. The results in Tables 1-3 indicate that the proposed methodology provides estimates of the setup error that are close enough to the expected ones. As can be observed, the values of patient displacements along the horizontal and vertical axis are smaller than those of the phantom, due to quality control processes adopted at the Radiotherapy Department of the "HYGEIA" Hospital.
As already mentioned, the estimation of the setup error can be also accomplished by means of a portal-to-portal matching method. This method requires the definition of a small number (four to seven) of fiducial marks only on the reference portal image. These fiducial marks are stored in the database and are automatically retrieved every time the specific patient is selected. The accuracy of the portal-to-portal matching lies within the limits imposed by the clinical routine. This approach introduces an additional error in the estimation of patient setup error (error due to the matching of DRR with the portal image of first fraction and error due to the matching of the portal of the current fraction with the portal image of the first fraction). However, statistical analysis showed that the two methods did not differ significantly as per the RMSE. Additionally, since a nearly automated method is used, the user is provided in return with a fast, reliable, robust, and user-friendly technique, which requires minimal user intervention.