Statistics-Based Prediction Analysis for Head and Neck Cancer Tumor Deformation

Most current radiation therapy planning systems, which are based on pre-treatment computed tomography (CT) images, assume that the tumor geometry does not change during the course of treatment. However, tumor geometry has been shown to change over time. We propose a methodology to monitor and predict daily size changes of head and neck cancer tumors during the entire radiation therapy period. Using collected patients' CT scan data, MATLAB routines are developed to quantify the progressive geometric changes occurring in patients during radiation therapy. Regression analysis is implemented to develop predictive models for tumor size changes throughout the entire treatment period. The generated models are validated using leave-one-out cross validation. The proposed method is intended to increase the accuracy of therapy and improve patient safety and quality of life by reducing the number of harmful, unnecessary CT scans.


INTRODUCTION 1. Head and Neck Cancer
Head and neck (H&N) cancer refers to a variety of malignant tumors that occur in the head and neck region. The term malignant defines these tumors as having the ability to metastasize or spread to other parts of the body. The goal in all cancer treatment is to remove the tumor with as little damage as possible to the important structures in the head and neck.
Most cancers are treated with surgery, radiation therapy (RT), chemotherapy, hormone therapy, or biological therapy. Doctors may decide to use one treatment method or a combination of methods. RT is a very important tool in the treatment of cancer. More than half a million cancer patients receive RT each year in the US, either alone or in conjunction with surgery, chemotherapy, or other forms of therapy. Together with image-guided treatment planning, RT is a powerful tool in the treatment of cancer, particularly when the cancer is detected at an early stage. Chemotherapy and RT may also be used together to effectively treat the cancer. Intensity-modulated radiation therapy (IMRT) and computed tomography (CT) imaging are the most widely used techniques for treatment planning [1].

Objective
The primary objective of radiotherapy is to deliver the correct dose of radiation to the cancerous region with minimum damage to surrounding normal tissues in the head and neck. Based on pre-treatment computed tomography (CT) images, a master plan for the treatment is developed at the beginning of the patient's visit to the hospital. Most types of tumors need a total radiation dose of more than 60 Gy (gray, the unit of absorbed radiation dose). Since human tissue can tolerate only a limited amount of radiation without side effects, the total dose is fractionated into daily doses of 2 Gy, and the entire treatment takes 5 to 7 weeks. Most current RT planning systems assume that the tumor geometry will not change during the course of treatment. Therefore, based on this assumption, the daily planned radiation dose remains unchanged. However, there is a critical flaw in this assumption, because tumor geometry has been shown to change over time. The tumor may shrink, expand, or change shape due to radiation or other unknown reasons. Therefore, there is a critical need to track the changes in tumor geometry over time during radiotherapy.
In the present work, we propose a methodology to monitor and predict daily (fraction day) volume and surface changes of head and neck cancer tumors during the entire RT treatment period. The main goal of the proposed methodology is to increase the accuracy of each therapy session and the quality of life for patients. The objective is to develop a predictive model that captures the changes in the tumor geometry so that physicians can adjust the amount of planned radiation in each fraction day based on the predicted tumor size. The result of this project is expected to have a positive impact on head and neck cancer patients by reducing the inconvenient, costly, and time-consuming repetitive CT scans or magnetic resonance imaging (MRI) during radiation therapy. In other words, using predictive models to monitor tumor deformation on certain fraction days can substitute for CT scans on those days. Doctors can then use a few CT scans during the whole therapy to check for any inaccuracy that may be caused by using the proposed predictive models. Our literature survey found no work that models the geometrical changes a tumor might go through after each radiation treatment.

Background
Modeling and predicting deformations of body organs has gained increasing interest in recent years. These anatomical deformations can be caused by diverse factors such as breathing, tumor growth, etc. [2]. Tumor growth is one of the anatomical deformations and has critical applications in cancer treatment. The first step in the modeling of tumor growth is to detect the location of the tumor using the computed tomography slices [3].
From a modeling point of view, the first step is to acquire the volumetric visualization of the tumor structure. Volumetric objects often come from sampled volume datasets generated with biomedical data acquisition devices such as CT scan, MRI, and confocal microscopy. Volume visualization plays an increasingly important role in biomedical research and applications. The recent model-based volume visualization approach extends modeling-rendering paradigms to a volumetric environment where volumetric modeling and rendering are integrated into one model-based visualization system [4].
Volume visualization can be achieved by image processing techniques. Image data are processed using one or a combination of the different segmentation techniques available. Medical images are of great importance in RT, which has become a privileged application field for image processing techniques. There is currently no single segmentation method that yields acceptable results for every medical image. Selecting an appropriate approach to a segmentation problem can therefore be difficult. Hohne et al. [5] used a ray casting algorithm working on gray scale voxel data. Shen et al. [6] presented a deformable model for automatically segmenting brain structures from volumetric MR images and obtaining point correspondences using geometric and statistical information in a hierarchical scheme. Models proposed by Cootes et al. [7] and Wang and Staib [8] used equal weights in calculating shape statistical parameters. Cootes et al. [7] used two steps for the active shape model (ASM) algorithm, mainly for image data interrogation and shape approximation. Korn et al. [9] investigated the problem of shape retrieval from a large database based on the concept of mathematical morphology. Kervrann and Heitz [10] presented a motion-based segmentation method used for tracking the movement of deformable structures in image sequences. Ferrant et al. [11] modeled biomedical shape changes using a physics-based model of the objects the image represents. They used the shape changes of the surfaces of the objects as boundary conditions for the physics-based finite element (FE) model. Miller and Chinzei [12] proposed a constitutive model for brain tissue for modeling and simulation of surgical procedures using the FE method, based on a strain energy function in polynomial form with time-dependent coefficients. Davatzikos et al. [13] proposed a framework for modeling and predicting anatomical deformations and tested it on simulated images. Some researchers modeled deformation of human organs using biomechanical models. These models are considered to be accurate as they utilize physical knowledge about the deformed anatomy and its properties, assuming that the material properties, the equations governing the deformations, and the boundary conditions are known [14][15][16]. However, these factors are not always known and are very complex to determine.
Most of the reviewed methods are considered complex and require long computation time. Some of the reviewed methods are also based on assumptions, such as available knowledge of the properties of the deformable objects and their boundary conditions, which are not always easy to access. Some of these methods are also implemented on the basis of deformation detection or segmentation using two-dimensional (2D) images. An extended review and comparison of these methods are presented in [17]. We aim to propose a model that remedies these disadvantages, needing shorter computation time and involving simpler assumptions. Our approach is based on three-dimensional geometrical analyses. Moreover, several papers have highlighted the use of specific data mining methods and statistical approaches in providing assistance in the fields of prognosis, diagnosis, and treatment planning decision making, as well as prediction of medical treatment outcome. Delen et al. developed a prediction model for breast cancer survivability using neural network, decision trees, and logistic regression methods to generate the prediction [18]. Oztekin et al. adopted approaches such as decision trees, neural network, logistic regression, and the Cox regression model to predict the heart-lung graft survival time [19]. Kusiak et al. aimed to predict survival time for kidney dialysis patients using rough set (RS) and decision trees [20]. Pendharkar et al. used data envelopment analysis (DEA), a non-linear non-parametric mathematical technique, and artificial neural network for mining breast cancer data [21]. Su et al. attempted to construct a prediction model for Type II diabetes using neural network, decision trees, logistic regression, and rough set in order to select the most effective features to predict diabetes [22]. Chou et al. applied an integration of artificial neural network and multivariate adaptive regression splines (MARS) methods to improve the performance of breast cancer diagnosis [23]. Kuo et al. described a novel computer-aided diagnosis system using a decision trees approach in order to improve the diagnosis of breast cancer and increase the ability of ultrasonographic (US) technology for differential diagnosis [24]. Jilani et al. applied logistic regression to find the significant factors affecting the presence or absence of acute coronary syndrome [25]. Chhatwal et al. applied logistic regression in constructing prediction models for early detection of breast cancer [26]. Chen et al. applied a classification procedure including three classification techniques (discriminant analysis, support vector machine, and artificial neural network) to differentiate innocent musical murmurs from other heart murmur types [27].
Prognostic models using data mining and statistical techniques to predict patients' outcomes of interest can assist experts in evaluating the effectiveness and efficiency of health center services. Hanna and Keizer integrated classification trees and logistic regression methods in order to develop predictive models for patients' mortality in Intensive Care (IC) [28]. Ture et al. applied decision tree algorithms and Cox regression analysis for disease-free survival in breast cancer patients [29]. Osareh and Shadgar utilized support vector machine, k nearest neighbor, neural network, naïve Bayesian, and decision trees to develop models for classification of cancer-related gene expression [30]. Jonsdottir et al. generated a predictive model that classifies a new breast cancer patient into either no-event or recurrence-event of cancer five years after diagnosis; a naïve Bayesian classifier, decision trees, and a wide range of meta algorithms were used in this research [31]. Aragones et al. presented a decision support tool combining neural network and decision trees to identify the most important prognostic factors for breast cancer relapse [32]. Most of the developed predictive models are associated with categorical class labels. No works have been devoted to predicting the deformation of cancer tumors by applying data mining and statistical techniques.

METHOD 2.1. Scope
Figure 1 illustrates the scope of the proposed methodology. The proposed methodology consists of two phases. The first phase constructs the model used for tumor geometrical analysis. The second phase uses proper statistical techniques for deformation prediction of the geometry based on the patient's selected attributes (age, weight, stage, etc.). Finally, based on the prediction model, new treatment plans can be developed. The following is a detailed discussion of each phase.

Data Collection
Clinical patient data were obtained from the University of Texas-MD Anderson Cancer Center (MDACC), Department of Radiation Physics, with the approval of the MDACC Institutional Review Board (IRB). Data of 23 patients were collected. In order to perform the analysis, two types of data are needed. The first type is CT scan data, which come in the format of x, y, z coordinates or point clouds. The second type of data relates to patients' attributes, which are collected from each of the 23 patients' clinical files. For the geometrical analysis, the first type of data, CT scan data, is used. In order to generate the predictive model, a combination of the results of the geometrical analysis and the patients' attributes is used.

Geometric Analysis
The first stage of the proposed methodology is implemented in two phases: (1) geometric modeling and analysis, and (2) validation through rapid prototyping. The goal of the first phase is to use the collected patient data to quantify the progressive geometric/anatomic changes occurring in patients treated with RT for head and neck cancer. The surface calculation method includes the following steps:
• Identify the space between slices/layers.
• Identify the number of curves in each slice. Each slice may contain more than one curve.
• Using the x-, y-, and z-coordinates of each curve, construct the binary mask for each layer by applying the roipoly function. Repeat this step for all curves of the gross tumor volume (GTV) data.
• Trace the region boundary in the binary image and identify the number of pixels in the boundary.
• Using the number of boundary pixels and their physical dimensions, calculate the surrounding surface area of the tumor.
• Using the same procedure, identify the surface areas of the first and the last layers and add them to the surrounding surface.
In order to calculate the tumor volume, the same approach is applied, except that the number and dimensions of voxels are used. The discussed methods are applied to calculate patients' tumor volume and surface values throughout the entire fraction days (33 days).
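The slice-based calculation can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used in this work, and it swaps the pixel-counting approach (binary mask plus boundary tracing) for a polygon-based approximation: the shoelace formula gives each slice area and the summed edge lengths give each slice perimeter. The function names, the assumption of exactly one contour per slice, and the assumption that points within a slice are listed in boundary order are all illustrative and not taken from the original routines.

```python
import math

def polygon_area(points):
    """Area of a closed planar polygon (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def polygon_perimeter(points):
    """Length of the closed boundary through the points, in order."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))

def group_by_slice(point_cloud):
    """Group (x, y, z) points that share a z-value; return slices sorted by z."""
    slices = {}
    for x, y, z in point_cloud:
        slices.setdefault(z, []).append((x, y))
    return [(z, slices[z]) for z in sorted(slices)]

def tumor_volume_and_surface(point_cloud, spacing):
    """Approximate volume and surface, treating each slice as a slab.

    Assumes one contour per slice and a constant slice spacing.
    """
    slices = group_by_slice(point_cloud)
    areas = [polygon_area(contour) for _, contour in slices]
    perimeters = [polygon_perimeter(contour) for _, contour in slices]
    volume = sum(areas) * spacing
    # Surrounding (lateral) surface plus the first and last layers.
    surface = sum(perimeters) * spacing + areas[0] + areas[-1]
    return volume, surface

# Hypothetical test object: a 2 x 2 square contour on three slices.
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
cloud = [(x, y, z) for z in (0.0, 1.0, 2.0) for x, y in square]
volume, surface = tumor_volume_and_surface(cloud, spacing=1.0)
```

Slices containing more than one curve, as noted in the steps above, would need the contours to be separated before the areas and perimeters are summed.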

Phase II: Validation through Rapid Prototyping
The CT images can be exported to build a physical model of the anatomy using the rapid prototyping (RP) process. Rapid prototyping is a manufacturing technique that creates various 3D geometries by means of layer-by-layer construction. Rapid prototyping is impacting the medical sector in several ways. It can be used for surgical planning, prostheses and implants, direct manufacture of biologically active implants, and many other applications. Using the principles of rapid prototyping, CT scan or MRI data of anatomical parts can be translated into physical data and used to manufacture a 1:1 scale, three-dimensional physical model. Medical RP models are physical hard copies of a patient's specific anatomy, visualized by three-dimensional scanning techniques. These medical models provide visual and tactile information for diagnosis and operational planning [33,34]. In the proposed research, RP is employed to build a set of prototype mockups of the tumors. The prototypes are used to inspect and measure the geometry of the tumors as they go through deformation. The tumor prototypes are intended for further validation of the proposed methodology.

Data Preprocessing
In order to prepare the attribute dataset for the statistical analysis phase, each attribute was categorized into different levels and then transformed into dummy or indicator variables. If a categorical variable has n levels, it is possible to define n-1 dummy variables [35]. The implemented method is reference coding, which uses zero and one to code the variable.
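As an illustration of reference coding, the following Python sketch turns a categorical attribute with n levels into n-1 zero/one indicator variables, with the first listed level acting as the reference (all zeros). The function name, the choice of reference level, and the sample stage values are hypothetical and are not taken from the study's dataset.

```python
def reference_code(values, levels):
    """Reference (dummy) coding: n levels -> n-1 zero/one indicators.

    The first entry of `levels` is the reference level, coded as all zeros.
    """
    for value in values:
        if value not in levels:
            raise ValueError(f"unknown level: {value!r}")
    indicator_levels = levels[1:]
    return [[1 if value == level else 0 for level in indicator_levels]
            for value in values]

# Hypothetical T categories for four patients; T1 is the reference level.
stages = ["T1", "T3", "T2", "T1"]
coded = reference_code(stages, ["T1", "T2", "T3"])
```

Each row of `coded` has two columns (an indicator for T2 and one for T3), so a patient at the reference level T1 is represented by two zeros.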
Based on the tumor volume values, tumor surface values, and attribute datasets, two separate datasets were created for the next phase, statistical analysis. For this purpose, the attribute dataset is integrated with the volume and surface datasets separately to form two individual datasets associated with volume and surface, respectively.

Statistical Analysis
In this phase, the aim is to develop a predictive model that can predict tumor deformation during RT for head and neck cancer patients. The preprocessed integrated datasets, consisting of tumor volume/surface values and patients' selected attributes, are treated as training sample data to develop the prediction model for tumor changes. The tumor volume and surface values of the first two fraction days are assumed to be given and are used to predict the volume and surface values of the remaining fraction days, which are continuous. Regression analysis, a standard method for predicting numerical values, was employed to generate the predictive models.

Tumor Volume and Surface Analysis
Using CT scan data in each fraction day, data related to the gross tumor can be extracted and visualized by patching the surface of the tumor. Figure 2 illustrates a sample of tumor visualization created by MATLAB code. Table 1 and Table 2 present the average tumor volume and surface data, respectively, for all of the 23 patients throughout the entire fraction days. Figure 3 shows the changes in tumor volume values during the whole period of therapy for each patient.

Rapid Prototyping Model
The prototypes are used to inspect and measure the geometry of the tumors as they go through deformation. They are intended for further validation of the proposed methodology. A sample prototype generated from CT scan data is illustrated in Figure 5, where the tumor is clearly visible in the head and neck area.

Statistical Analysis
To identify the best combination of predictor variables for each of the 31 daily prediction models, a stepwise procedure was conducted using MATLAB routines. In general, in identifying the best combination of variables for the dependent variable V_i or S_i (i = 3, 4, …, 33), the independent variables or predictors are the patients' selected attributes plus a subset of previous volume or surface values. Considering the variables selected by the stepwise procedure, we find that the most significant and effective predictor of each fraction day's volume/surface value is the previous day's volume/surface value. Therefore, the result strongly supports applying a simple regression model with one predictor variable, namely the previous day's volume/surface value.
For each fraction day's tumor volume/surface, a prediction model is developed using MATLAB software. The main assumption is that the volume/surface values of the first two fraction days for each patient are given. Table 3 and Table 4 summarize the results of regression analysis for tumor volume and surface, respectively. The tables present the regression coefficients, including the intercepts or constant values (α) and variable coefficients (β), for all 31 models developed for volume/surface value prediction. There are different criteria for checking the quality of the model. One of the most common criteria, the coefficient of determination, R², is used in this research. The R² value varies from zero to one. The closer this value is to one, the better the regression model. Table 5 and Table 6 summarize the values of R² for each fraction day's volume and surface prediction model, respectively. For all models, the values of R² are above 0.9, and most of them are above 0.99, indicating good levels of fit.
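The per-day models can be sketched as follows. This is a minimal Python illustration, not the MATLAB implementation used in this work: for each fraction day from day 3 onward, an ordinary least squares line is fitted with the previous day's value as the single predictor, giving an intercept α, a coefficient β, and R². The data layout (one list of daily values per patient) and the toy numbers are assumptions made for the example.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta

def r_squared(x, y, alpha, beta):
    """Coefficient of determination, 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst

def fit_daily_models(values):
    """values[p][d] is patient p's volume (or surface) on fraction day d + 1.

    Returns {day: (alpha, beta, R2)} for day = 3 .. number of days.
    """
    n_days = len(values[0])
    models = {}
    for day in range(3, n_days + 1):
        previous = [row[day - 2] for row in values]  # value on day - 1
        current = [row[day - 1] for row in values]   # value on day
        alpha, beta = fit_simple_regression(previous, current)
        models[day] = (alpha, beta, r_squared(previous, current, alpha, beta))
    return models

# Toy data for four hypothetical patients over four fraction days.
toy = [
    [11.0, 10.0, 9.0, 5.5],
    [22.0, 20.0, 18.0, 10.0],
    [33.0, 30.0, 27.0, 14.5],
    [44.0, 40.0, 36.0, 19.0],
]
models = fit_daily_models(toy)
```

With 33 fraction days per patient, the same loop would yield the 31 models (days 3 to 33) described above.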

Cross Validation
It is necessary to assess how the results will generalize to an independent dataset. An important point in the construction and evaluation of a prediction model is that it should not be built and tested on the same dataset, in order to avoid the problem of overfitting.
In our study, a special case of k-fold cross validation, called leave-one-out, is implemented. This is the same as k-fold cross validation with k equal to the number of observations in the original dataset. As the name suggests, it involves reserving a single observation at a time as a testing set and using the remaining observations from the original dataset as the training set. This procedure is repeated such that each observation is used once as testing data. In this research, the ratio of PRESS (predicted residual sum of squares) to SSE (error sum of squares) is taken as an indicator of the level of fit. Figure 6 and Figure 7 display the graphs of PRESS/SSE values in all fraction days for volume and surface predictions, respectively. The results reveal that for most of the fraction days, this ratio is close to 1.2, indicating a good level of fit. However, as the volume graph (Figure 6) shows, this ratio is higher than the normal value on fraction day 30, and the surface graph (Figure 7) shows that this ratio is higher than the normal value on days 28, 29, and 30. In this case, or in similar cases where the results show some deviations from the accepted range, it is necessary to check for the possible existence of outliers or highly influential observations in the model. The existence of any outliers or highly influential observations can significantly impact the final results. We used studentized residual values as the outlier detector and identified one patient's data as a possible outlier. Different reasons could explain this result, such as erroneous CT scan data in some fraction days of this patient.
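The PRESS/SSE indicator can be illustrated with a short Python sketch, again a simplified stand-in rather than the MATLAB routines used in this work. SSE is computed from a line fitted to all observations, while PRESS refits the line with each observation held out in turn and accumulates the squared error on the held-out point. The function names and the sample numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta

def sse(x, y):
    """Error sum of squares of the model fitted on all observations."""
    alpha, beta = fit_line(x, y)
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

def press(x, y):
    """Predicted residual sum of squares via leave-one-out refitting."""
    total = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        alpha, beta = fit_line(x_train, y_train)
        total += (y[i] - (alpha + beta * x[i])) ** 2
    return total

# Hypothetical previous-day (x) and current-day (y) values for five patients.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
ratio = press(x, y) / sse(x, y)
```

For a least squares fit, PRESS is never smaller than SSE, so the ratio is at least 1; values well above the level seen on most fraction days point to observations that strongly influence the fit.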

Significance
Using the proposed method, we are able to measure tumor size using CT scan data and generate prediction models across the fraction days. Many previous research works were devoted to mathematically modeling cancer tumors while disregarding the stages of deformation that the tumors go through. Moreover, studies that focused on tumor deformation and modeling used the biological condition and information of the organs themselves. For example, Kima et al. proposed and designed patient-specific RT strategies based on the patient's biological responses over several treatment sessions [36]. The proposed mathematical method used the patient's biological condition to model the system state and the beam intensities as controls [36]. Other researchers such as Kim et al. proposed a method to mathematically explore the potential benefit of fractionated RT using a Markov decision process (MDP) model [37], and Ferris and Voelker formulated a model for day-to-day RT planning using heuristics and neurodynamic programming and showed that the results can be achieved by incorporating uncertainty about errors during treatment planning [38]. The tool developed by Swanson uses MRI data to predict where a patient's tumor can grow a few months ahead of time [39]. The software uses MRI data to simulate how fast a tumor is likely to grow and how long a patient is likely to live under different scenarios [39]. These methods, although they may be valid, are mathematically complex and require long computation time. In the proposed method, CT scan data are used to visualize a 3D image of the tumor, and the information from these data is used to calculate the geometry. This process does not require complicated image analysis methods. Statistical and regression analyses are applied for deformation analysis and prediction, which clinicians can use to plan the RT area for better accuracy and safety.

CONCLUSIONS
In this research, a methodology is proposed to monitor and predict daily (fraction day) volume and surface changes of head and neck cancer tumors during RT. The proposed method enables the use of geometrical and statistical analyses rather than repetitive CT scans to predict tumor deformation during RT. Data of real patients were obtained from MD Anderson Cancer Center, including patient clinical attributes and CT scan data of 33 fraction days of RT. MATLAB routines were developed to calculate tumor volume and surface from the CT scan data. Patients' selected clinical attributes were used to build the training dataset for the regression analysis implemented to develop predictive models for tumor volume and surface changes throughout the fraction days. The generated models were validated through leave-one-out cross validation. The predictive models may help physicians to predict tumor size over the entire course of fraction days (generally 33 fraction days) for a new patient, based on the patient's selected attributes, which are usually available after the first patient visit, and the first two fraction days' CT scan data. The proposed methodology may help increase the accuracy of RT and avoid delivery of radiation to surrounding normal tissues in the head and neck area. It may also reduce the number of harmful, repetitive CT scans and thus improve patient safety and quality of life.

Figure 2. Sample of tumor visualization by MATLAB.

Figure 3. Tumor volume value changes during the whole period of therapy for each patient.

Figure 4. Tumor surface value changes during the whole period of therapy for each patient.
Steps in this phase include (1) volume construction and analysis, and (2) surface contouring and construction. The CT scan data, which are in x-, y-, and z-coordinate format, are used in this phase.

2.3.1. Phase I: Geometric Modeling and Analysis
Two major objectives are associated with the geometry modeling and analysis phase: (1) to visualize the tumor contour and patch the surface of the tumor, and (2) to establish the tumor's selected features such as volume and surface values. A series of MATLAB® routines are developed to perform these analyses and the required calculations. The surface calculation method consists of the following steps:
• Read CT scan data of a patient.
• Extract data associated with GTV from the CT scan.
• The processed data related to gross tumors are sorted by the z-coordinate. Points with the same z-value (same layer) are then used to construct the slice contours.
• Find the number of slices/layers in GTV (in point cloud format).
• Identify GTV data points' coordinates in x-, y-, and z-coordinates separately.
• Sort data based on z-coordinate.

The volume/surface database includes volume/surface values' data for 23 patients collected in 33 fraction days. The patients' attributes were collected through the patients' clinical files. The collected attributes can be categorized as characteristic attributes (e.g., sex, marital status, age, etc.), tumor registry attributes (e.g., tumor stage, type of RT, dose, etc.), and patients' personal habits and disease history (e.g., smoking, alcohol, cancer history in family). Efforts were made to select only those attributes that may influence the results. The final selected dataset attributes include the following:
• Stage: The stage of the tumor is identified based on the TNM system, in which each cancer is assigned to a T, N, or M category. The T category describes the primary tumor size and the extent to which the tumor has grown into nearby tissues. It is presented by ordinal numbers of T (T1, T2, …); the higher the T number, the larger the tumor and/or the greater the invasion of nearby tissues. The N category describes the extent to which the cancer has spread to nearby lymph nodes. It is presented by ordinal numbers (N1, N2, …); the higher the N number, the more lymph nodes are involved. The M category describes whether or not the cancer has spread to distant areas of the body. It is presented by M1 or M0, indicating that cancer has or