Development and Evaluation of an Open-Source Software Package “CGITA” for Quantifying Tumor Heterogeneity with Molecular Images

Background. The quantification of tumor heterogeneity with molecular images, by analyzing the local or global variation in the spatial arrangements of pixel intensity with texture analysis, possesses a great clinical potential for treatment planning and prognosis. To address the lack of available software for computing the tumor heterogeneity on the public domain, we develop a software package, namely, Chang-Gung Image Texture Analysis (CGITA) toolbox, and provide it to the research community as a free, open-source project. Methods. With a user-friendly graphical interface, CGITA provides users with an easy way to compute more than seventy heterogeneity indices. To test and demonstrate the usefulness of CGITA, we used a small cohort of eighteen locally advanced oral cavity (ORC) cancer patients treated with definitive radiotherapies. Results. In our case study of ORC data, we found that more than ten of the current implemented heterogeneity indices outperformed SUVmean for outcome prediction in the ROC analysis with a higher area under curve (AUC). Heterogeneity indices provide a better area under the curve up to 0.9 than the SUVmean and TLG (0.6 and 0.52, resp.). Conclusions. CGITA is a free and open-source software package to quantify tumor heterogeneity from molecular images. CGITA is available for free for academic use at http://code.google.com/p/cgita.


Background
Molecular imaging has become a significant component of patient management in clinical oncology.The importance of extracting quantitative measurements from molecular images has been widely embraced.Recently, there is an increasing interest to quantify the "tumor heterogeneity" from molecular images, especially, PET, as the tumor heterogeneity is an important biomarker for aggressiveness and disease outcome [1][2][3].The computation of tumor heterogeneity can be related to texture analysis that refers to numerous mathematical methods to compute quantitative textural features from 2D or 3D images based on the spatial variation of pixel intensity.To properly address the nature of quantification goals of this study, we use the term "heterogeneity index" to denote the calculated tumor heterogeneity in a numerical form.Although there is an emerging enthusiasm in quantification of tumor heterogeneity [4][5][6][7][8][9][10][11][12][13], such techniques remain to be evaluated and tested for clinical applications [13].
One existing challenge for investigators interested in testing the usefulness of heterogeneity indices lies in the lack of software available on the public domain.Because texture analysis is a relatively new concept for PET and nuclear medicine community, most software packages offered by vendors do not include functions for such analysis.Commercial third-party software also lacks these functions in general.To address these challenges, we implemented a software package for computing tumor heterogeneity indices and share it with the research community of molecular imaging.This report aims to describe this open-source project of our software package, namely, Chang-Gung Image Texture Analysis (CGITA) toolbox, for quantifying tumor heterogeneity.We will describe its implementation, data flow, and currently supported functions.To evaluate the usefulness of CGITA, we used a cohort of eighteen advanced oral cavity (ORC) cancer patients that were treated with definitive radiotherapy to demonstrate the use of tumor heterogeneity as a biomarker of prognosis assisted by CGITA.

Tumor-Wise and Voxel-Wise Heterogeneity Quantification.
The calculation of tumor heterogeneity is implemented in CGITA with two different levels: tumor-wise and voxel-wise heterogeneity.For the former level that generates heterogeneity indices based on the whole delineated tumor, we used the same processing scheme described by Tixier et al. [10].In brief, a tumor or target volume of tissue is first delineated from the image volume either manually or with automatic segmentation.The intensities of delineated voxels are then redigitized and carried into the mathematical transformation to compute the heterogeneity indices.The second level, on the other hand, computes the heterogeneity indices on a voxelwise basis.The intensities of surrounding voxels around a specific voxel are used to calculate the heterogeneity indices for this voxel.By repeating the same computation for each voxel, a parametric map can be formed to represent the heterogeneity distribution.

Implementation of CGITA.
CGITA was implemented in MATLAB (version 2012a, MathWorks Inc., Natick, MA, USA).It is now distributed over the Internet as an opensource project with two forms of program distribution.For users with a MATLAB license, the MATLAB codes of CGITA are available for them to download, use, and even modify.CGITA is supported and tested on the Windows and Linux platforms.For users without a MATLAB license, a standalone CGITA executable is available, although this version does not support the user-defined functions in general.All CGITA functions were implemented in native MATLAB without using compiled C++ functions or MEX files so that cross-platform support can be maximized.The only exception is the dependence of some executable functions in DCMTK [14] on DICOM query and retrieval.

Tumor Delineation.
CGITA allows two types of tumor delineation.First, CGITA accepts the volume of interest (VOI) saved as DICOM-RT structures or the VOI drawn and saved with PMOD (PMOD Technologies Ltd, Zurich, Switzerland).Second, the user may use our semiautomatic segmentation functions to delineate the tumor.Currently, the built-in segmentation in CGITA includes a thresholdbased region-growing method and a fuzzy C-means method.CGITA allows users to add new segmentation methods as well.

Computational Methods for Whole-Tumor Heterogeneity
Indices.We begin by defining what a "heterogeneity index" is.Since the term "heterogeneity" is a general description of mixed composition within an object, there is not a single or specific mathematical definition of heterogeneity.This is why we chose to use the term "heterogeneity index." Each heterogeneity index indicates the degree of heterogeneity.However, the exact way for computing its value varies from index to index.From the texture analysis point of view, each heterogeneity index is represented by a "textural feature." We use the term "heterogeneity index" instead of "textural feature" to specifically describe the biological parameter of a tumor that we wish to quantify.
The computation of tumor heterogeneity indices is performed in two steps: the computation of a "parent" matrix and the parameter extraction from this parent matrix.The "parent matrix" refers to a matrix obtained by a numerical transformation that accounts for the spatial arrangement, intensity, and relationship of the voxels contained within the VOI.We have implemented the four parent matrices described in the study by Tixier et al. [10].We also included an additional four parent matrices: the texture spectrum matrix [15], texture feature coding matrix [16], texture feature coding cooccurrence matrix [16,17], and neighborhood gray-level dependence matrix [18].For each of those eight parent matrices, a variable number of heterogeneity indices are calculated.Table 1 summarizes the currently supported indices in CGITA and their references.At present, there are a total of 72 heterogeneity indices included in CGITA.These heterogeneity indices, in brief, all point to the degree of spatial nonuniformity that directly correlates with tumor heterogeneity in tracer uptake.The difference between individual indices lies in the mathematical computation.For example, the indices computed with voxel-alignment matrix are related to the length of "run, " which is defined as the length of voxels aligned on a line that have the same pixel intensity.Among those indices, for example, the heterogeneity index "longrun emphasis" puts a stronger weighting on the intensities of voxels with long runs.Such an index can be used to measure the tumor heterogeneity by examining the voxels that have the similar tracer uptake and align along the same line.
Some conventional image-derived indices, such as SUV mean , SUV max , SUL peak [23], and total lesion glycolysis (TLG), are included in these indices based on literatures [10].

Software Validation.
The software validation is performed in two different levels.First, we validate the computation of heterogeneity indices against other software packages.Since we do not have access to other in-house software packages for computing the tumor heterogeneity, we are only able to validate some of the conventional indices such Texture spectrum [15] Max spectrum, Black-white symmetry Texture feature coding [16] Coarseness, homogeneity, mean convergence Texture feature coding cooccurrence matrix [16] Second angular moment, contrast, entropy, homogeneity, intensity, inverse difference moment, correlation, variance, code similarity Neighborhood gray-level dependence [22] Small-number emphasis, large-number emphasis, number nonuniformity, second moment, entropy as SUV mean and TLG against commercial software PMOD.
Tested with clinical PET images, CGITA is able to obtain nearly identical results as PMOD for SUV mean and TLG.The second level of software validation is the software reliability after each update and revision.In order to ensure the software quality, each time a function is added or modified, a set of clinical PET images and its corresponding VOI are kept internally for testing CGITA.Computed heterogeneity indices are generated and compared to historical results, in order to check if the computation remains consistent after software update.

Parametric Imaging.
For a given image volume, the parametric image of heterogeneity indices is computed by looping through every voxel and repeating the following steps for each voxel.The user must first choose how many voxels should be used to calculate the parametric image.For example, the user may elect to use a 3 × 3 × 3 cube centered at a specific voxel.The choice of cube size would affect the sharpness of the resulted heterogeneity parametric images.A larger cube size includes more voxels for analysis, potentially improving the heterogeneity accuracy but decreasing the spatial resolution.For a 3 × 3 × 3 cube, the intensities of those twenty-seven voxels are then treated as a delineated volume that is carried into the computation of heterogeneity indices as described in the previous sections.As a result, a heterogeneity index will be computed for the specified heterogeneity index at this given voxel.By looping through all of the voxels, except for those on the edges, a parametric image volume can be formed with the voxel-wise heterogeneity indices.) and the failed group ( = 9).In the successful group, 8 lived without evidence of recurrence, but one died of nondisease-related causes at 16 months after completion of treatment.All other deceased patients died of disease-related events.The pretreatment PET images of all eighteen subjects were used for image analysis.Each patient in our cohort received a pretreatment FDG-PET/CT scan for staging.Those pretreatment PET/CT scans provided the image sets with which we tested the texture analysis.Fifty minutes after the 370-MBq FDG injection, a whole-body static PET emission scan was acquired on a GE Discovery ST 16 PET/CT (GE Healthcare, Milwaukee, WI) [24] from the skull base to the mid-thigh, with three minutes per bed position.Images were reconstructed with OSEM (ten subsets, four iterations) with pixel spacing of 4.7 mm and 3.3 mm in the transverse and axial directions, respectively.Quantification of the tumor heterogeneity with PET images was performed as follows.First, the tumor contour is delineated by a board-certified nuclear physician in PMOD with a scheme similar to that of the previously reported head and neck tumor delineation [25].We elected to draw the VOI semiautomatically, since automatic segmentation in ORC patients is generally difficult because some surrounding oral tissues are benign but show a high FDG uptake.After the lesion was first manually outlined from the fused PET/CT images by the nuclear physician, this outlined lesion area was then reviewed to remove benign tissues with high FDG uptake.Once the lesion was outlined, an SUV value of 2.5 was used to delineate the outer contour of the main tumor.Image intensities of the delineated voxels are then used to calculate heterogeneity indices and saved for each patient.Parametric images of heterogeneity indices were calculated for selected subjects.

Performance Evaluation.
After the tumor-wise heterogeneity indices were calculated for every subject, the subjects were divided into two groups based on their outcome, with  = 9 in each group.A receiver operative characteristics (ROC) curve was plotted for each heterogeneity index independently.We calculated the area under the curve (AUC) from the ROC curves and the optimal sensitivity/specificity for each index.In addition, a Kruskal-Wallis test was performed for each index to evaluate the performance of these metrics [26].The AUC and the  value of the Kruskal-Wallis test calculated from the average intratumor SUV (SUV mean ) were compared to the AUC and  value calculated from each of the other indices.

Results
The appearance of CGITA is shown with a screen shot in Figure 1.Through a graphical user interface (GUI), CGITA's image display interface can be used to view the images and confirm whether the imported VOI aligns properly with the target tissue after importing images and VOIs.Table 2 summarizes the currently implemented heterogeneity indices in CGITA.The calculated indices, currently totaling 72, can be exported as spreadsheets.In addition to processing one subject at a time using the CGITA GUI, the user may also elect to use the batch mode by processing all subjects automatically without the user input.We tested the batch function on our ORC patient data.On average, each subject takes approximately 30 seconds to process, including image importation and the computation of all 72 features.CGITA is currently hosted at http://code.google.com/p/cgitawith both the source code and executables available for download.There is also a user manual for CGITA available at its website.Academic research uses are free of charge.As the hosting service also provides a version control system, users may also participate in this open-source project as developers to contribute new functions to CGITA.
The usefulness of heterogeneity quantification was evaluated with our ORC patient cohort.In terms of outcome prediction, the AUC from ROC analysis was calculated for each heterogeneity index, as was the  value for the Kruskal-Wallis test.Out of the total 72 textural features implemented in CGITA, we found that 13 textural features have a higher AUC and 19 have a lower  value than the SUV mean .The heterogeneity indices with the highest AUC are summarized in Table 3 and compared to conventional markers such as SUV mean and TLG.The conventional markers did not provide satisfactory discriminative power with low AUC (0.6 for SUV mean and 0.52 for TLG) and high  value under the Kruskal-Wallis test.On the other hand, some heterogeneity indices stood out as better indicators for prognosis under the current tests.Two indices computed from the intensity-sizezone matrix (ISZ) [21], low-intensity short-zone emphasis (ISZ-LISZE), and short-zone emphasis (ISZ-SZE) showed high AUC (0.9 and 0.81, resp.) and low  values (0.004 and 0.024, resp.).Compared to the SUV mean , LISZE improved the sensitivity and specificity by 11% and 22%, respectively.Including ISZ-LISZE and ISZ-SZE, six indices computed from the intensity-size-zone matrix provided an AUC over 0.7.The ROC curves of ISZ-LISZE and ISZ-SZE are plotted in Figure 2 along with the ROC curves of SUV mean and TLG.Parametric images based on the textural features of one subject are shown in Figure 3, illustrating a pattern not in the original PET images.Visual inspection revealed that parametric images formed with different heterogeneity indices exhibit various textural patterns.

Discussion
The search for image-based biomarkers remains an important but challenging aspect of clinical cancer imaging.As imaging technology continues to improve, information extraction from the reconstructed images becomes very important in maximizing the benefit of imaging studies.Recently, the term "radiomics" is proposed to describe the concept of integrating the information extracted from medical images into the proteogenomic and phenotypic information [7,8].It is apparent that, for such applications, conventional indices like SUV may not provide sufficient information.Advanced image analysis and information extraction methods become an inevitable component for the concept of radiomics, in order to maximize the amount of information that can be extracted from medical images.Quantification of tumor heterogeneity with texture analysis has been regarded as a promising field by several recent review articles [7,8,11,13].Recent reports have shown the application of tumor heterogeneity measured with textural features in nonsmall cell lung cancer [4], nasopharyngeal carcinoma [5], cervical cancer [6], peripheral nerve sheath tumors [9], gastrointestinal stromal tumors [12], and esophageal squamous cell carcinoma [27] based on FDG-PET images.Heterogeneity analysis has also been applied to the molecular image analysis of data from animal studies [28][29][30].
Although texture analysis may be a useful tool to quantify tumor heterogeneity from images, many questions require answers before this concept becomes a clinical standard.The first question arises from the signal and contrast source for different imaging methods.For example, FDG-PET reflects the glucose metabolism of tissues.A tumor that is spatially heterogeneous in cell proliferation may not appear heterogeneous in FDG-PET images, even with perfect resolution, as glucose metabolism may not be directly correlated to the proliferation.On the other hand, the properties of an imaging modality are also critical to texture analysis.The spatial resolution and signal-to-noise ratio (SNR) will both affect the performance of texture analysis.Low spatial resolution degrades the heterogeneity displayed on the acquired image, while high noise will cause a natively homogeneous image to show a high variation in pixel intensity.As a result, the results from texture analysis obtained from clinical PET images should be carefully interpreted and evaluated due to the resolution and noise limitations of PET.Researchers, who believe in the usefulness of quantification for tumor heterogeneity, should take on the responsibility of providing the imaging community with evidence-based studies.
As heterogeneity quantification or texture analysis with molecular images is still a relatively new technology, especially for nuclear medicine community, a free and open-source software package can become a key component for the success of such emerging technologies.Without it, investigators wishing to evaluate heterogeneity indices must develop in-house software, which can be time-consuming and resource limited.A free software package can therefore attract more investigators and allow them to test such new quantification methods on their data with a minimum effort and cost.The other important software characteristic that is much desired is the availability of the source codes to the users.Such efforts for open-source projects in medical imaging have been undertaken by many groups, producing tools such as the kinetic modeling software COMKAT [31,32] and radiation therapy software CERR [33].An opensource project has many benefits.It allows the source code to be examined for programming errors.Users with programming abilities may contribute to new functionalities.Most importantly, once the source code is agreed upon by most of the developers, such a software package may become a standardized platform for different groups of researchers to have a common ground for data comparison.A very successful model is the Statistical Parametric Mapping (SPM) [34], which has now become the standard software tool for neuroimaging research as it continues to expand its functionality and user base.CGITA has a long way to go before achieving such maturity like SPM.But with more users and developers being attracted by CGITA, we believe that it has the potential to become the standard software platform for studying the tumor heterogeneity with molecular images.For vendors who wish to develop and test heterogeneity functions in their commercial software, CGITA may also serve as a reference for software verification, thereby accelerating the research and understanding of quantifying tumor heterogeneity.
In brief, CGITA has the following features that may be attractive to different user groups.First, it is easy to use and has a simple GUI.For users without a MATLAB license, a compiled stand-alone executable is also provided on the website.Users without a programming background can easily apply CGITA to their image data.Second, it is open source and allows users to create new functions.Third, it supports more than seventy textural features and its functionalities continue to expand in this regard.Fourth, it has the unique feature of parametric imaging with heterogeneity indices.Finally, CGITA can be executed under a batch mode.The texture analysis of multiple subjects can be performed automatically without user intervention, which is extremely helpful for processing a large amount of data, an unavoidable task for future studies aiming to evaluate the heterogeneity quantification of molecular images.As a result, we believe CGITA will serve as a useful and practical tool for molecular imaging investigators.
A cohort of eighteen ORC patients was used in our study to test our software and demonstrate the potential application of heterogeneity indices.Because the cohort size is small, our intention is not to perform a theoretical study for establishing a standard to stratify advanced ORC patients with textural features.Instead, this case study is a demonstration of how a software package for heterogeneity quantification could be applied to patient data for research purposes.In our report, we chose the task of using heterogeneity indices for prognosis, aiming to test whether heterogeneity indices computed with CGITA might be better discriminators than conventional metrics, such as SUV.Indeed, the results support the claim that textural features may be more informative than SUV in this small-cohort case study.Judging from the ROC analysis, many of the implemented textural features showed better AUC than the AUC with SUV mean in outcome prediction power.Two of these features (ISZ-LISZE, ISZ-SZE) even improve the AUC from 0.6 to 0.81 and 0.9.The sensitivity and specificity have also been improved by heterogeneity indices.Although this study is not aimed to find the theoretical relationship between the heterogeneity indices and disease outcome, we may still speculate the reasons behind such findings.SUV and heterogeneity indices, by nature, represent different physiological and biological mechanisms.In general, SUV represents the "amount" of tracer present in local areas (e.g.SUV max ) and the whole tumor (SUV mean ), while heterogeneity indices express the "distribution variation" of tracer activities.The more heterogeneous a tumor is, the more likely a tumor is attempting to differentiate and generating different colonies to survive in its environment, especially during therapy.This may be the underlying reason that enables ISZ-LISZE and ISZ-SZE to improve the AUC from SUV mean because they are capable of capturing the tumor's heterogeneity by emphasizing the tumors with many small "zones, " which is defined as the number of interconnected voxels with the same voxel intensity.As a result, the quantified tumor heterogeneity may serve as a better indicator for tumor aggressiveness which has a direct impact on patients' prognosis and survival.This might explain why heterogeneity indices in general show stronger power for outcome prediction, in our data as well as in the literature [35][36][37].However we would also like to point out that, because we have a small patient cohort, the heterogeneity indices that are found with the best performance need to be further validated with a larger amount of data.Further validation is necessary in the future.
The image quality is also an important factor for heterogeneity quantification that requires further study and validation, especially in the spatial resolution and SNR.In our study, we used a PET camera with a spatial resolution of about 6 mm and system sensitivity of about 0.7% [24].Current state-of-art cameras could achieve a higher spatial resolution of about 4 mm and system sensitivity of 0.9% [38].Improvement on the spatial resolution with new cameras and image SNR by a higher tracer dose undoubtedly will increase the accuracy for heterogeneity quantification.However, the minimum requirement of image quality for a specific disease remains to be further studied.In our study, since the tumor size is generally large in our cohort with an average volume of 107 mL, a spatial resolution of 6 mm is probably sufficient.Similarly, since the tracer uptake is fairly high with SUV mean above five, the SNR should not be a concern under the standard injection dose of 370 MBq of FDG.With our current data, it is not feasible to determine the smallest tumor or the worst noise level that can still produce accurate tumor heterogeneity quantification.Such studies may require animal or phantom studies, in which the injected dose and image acquisition mode can be more freely modified and tested.CGITA may facilitate such testing on the software side.The exact determination of the spatial resolution and system sensitivity requirements awaits future investigation.
We are not the first group to propose the parametric imaging of heterogeneity indices.For example, a previous report has demonstrated such techniques applied to MRI for lesion segmentation purposes [39].However, to our knowledge, we are the first group to implement this function as part of an open-source project for quantifying the tumor heterogeneity of medical imaging.We tested this functionality in our patient cohort and were able to obtain heterogeneity parametric images.Unfortunately, we do not possess in vitro images of tumor heterogeneity with which our parametric images are compared because our cohort did not undergo surgical dissection of the tumor.However, it is still quite interesting to examine these images, as shown in Figure 3.For example, in the cooccurrence-contrast images, a hot spot is shown on the bottom right portion of the tumor, indicating a high variation in the voxel intensity in this area.It is easy to see that the hot spots in the original PET images and the heterogeneity images are quite different in terms of both size and location.This is reasonable because the parametric images are calculated based on the heterogeneity indices and therefore represent the spatial variation of tracer uptake.Efforts have been reported to use heterogeneity parametric images for tumor delineation and radiation targeting [40].We believe such heterogeneity images could do much more than simply determining the tumor contour.One potential role for heterogeneity images in particular, we believe, is to serve as the guide for dose-boosting techniques for targeting the radiation dose at the most aggressive areas.Further validation must be done to verify the relationship between heterogeneity images and local aggressiveness of the tumor.This could be achieved by comparing the heterogeneity images to the whole-mount pathology or immunohistochemistry data.Since we do not have appropriate data to study the relationship between heterogeneity images and the distribution of aggressive colonies, we hope that CGITA may encourage investigators who own such data to test this hypothesis for expanding the use of heterogeneity indices.
We would also like to comment that, although CGITA is targeted at oncological applications, it can also be applied in other fields, such as neurology.It is also not limited to PET, as long as the images are stored in the DICOM format that CGITA can import.This makes CGITA also a useful tool for analyzing experimental animal data.We have tested CGITA with CT and MR images for heterogeneity index computation, as shown in Supplemental Figures 1 and 2 (see Supplementary Materials available online at http://dx.doi.org/10.1155/2014/248505).Quite a few CGITA features can be further improved in the future for image display, segmentation, and acceleration for computation.New texture analysis methods, such as those based on wavelets [41], will be investigated and added to CGITA in the future.At this moment, CGITA is solely for research purposes and shall not be used for clinical diagnosis.Interested developers are welcome to join the project to advance the functionalities of CGITA.

Conclusion
We present the CGITA software package for quantifying the tumor heterogeneity with molecular images.As a userfriendly, open-source program that is free for academic use, CGITA could assist investigators to apply heterogeneity analysis to their data.With a pilot cohort of eighteen advanced ORC patients treated with definitive radiotherapy, we found that heterogeneity indices may serve as better prognosis predictors for patient outcome by improving both the sensitivity and specificity.We believe that CGITA will facilitate and accelerate our understanding of the usefulness of heterogeneity quantification and its future clinical role in patient management.Furthermore, we hope an open-source software model like CGITA can facilitate the establishment of clinical standards for heterogeneity analysis in the future to further expand its clinical use.

Figure 1 :
Figure 1: A screen shot of the CGITA program.The CGITA GUI provides users with a simple image display interface that allows users to examine different slices and views.The computation of heterogeneity indices is achieved simply by button clicking.As an open-source project, the current functions and interfaces of CGITA can be customized by users familiar with MATLAB programming.The screen shot here shows a subject with the FDG-PET images fused over CT images.

5 Figure 2 :
Figure 2: ROC curves of the heterogeneity indices, comparing two of the indices to the conventional metrics.The heterogeneity indices show a higher discriminative power than SUV mean and TLG.

Figure 3 :
Figure 3: Parametric images of textural features computed from a single patient compared to the original PET image.The PET image, shown in the top row, is displayed between SUVs of zero and twenty.Heterogeneity indices 1 to 4 represent, respectively, the contrast, dissimilarity, entropy, and inverse difference moment calculated from the cooccurrence matrix.Note that the parametric images appear different according to the spatial variation in voxel intensity.Furthermore, different index images display different tumor heterogeneity patterns.

Table 1 :
Summary of the currently supported heterogeneity indices of CGITA.

Table 2 :
Summary of the software features of CGITA.

Table 3 :
Comparison of AUC, specificity, and sensitivity of heterogeneity indices vs. SUV mean and TLG.
* denotes that the  value is less than 0.05 given the null hypothesis of AUC <0.5.† calculated using the Kruskal-Wallis test (19 indices have a  value greater than 0.453).