Since its introduction into clinical practice, medical imaging has gained a central role in the management of a wide variety of diseases. In oncology in particular, medical imaging has the unique ability to characterize, in vivo and noninvasively, the onset and progression of pathological processes at different stages of disease [
In clinical practice, at a first level, medical images are qualitatively inspected by radiologists or nuclear medicine physicians [
To overcome such limitations, considerable effort has been devoted in recent years to developing quantitative approaches to medical image analysis. These approaches exploit the fact that digital medical images are inherently quantitative and that their quantitative values express several functional characteristics of tissue, such as metabolism or proliferation [
Thanks to advances in such image processing methods, macroscopic indexes that measure global functional properties of an oncological lesion, such as the Standardized Uptake Value (SUV) for Positron Emission Tomography (PET) or the Apparent Diffusion Coefficient (ADC) for Magnetic Resonance Imaging (MRI), have proven to be effective biomarkers for diagnosis or treatment response in oncological clinical studies [
Promising results published in a growing number of papers indicate that radiomic traits reflect tumor heterogeneity, which is correlated with poor prognosis [
Furthermore, from a methodological point of view, one of the key problems that emerges when defining quantitative image features is assessing their reproducibility, that is, the closeness of agreement between the results of successive measurements of the features carried out under the same measurement conditions.
Moreover, different measurement conditions, such as different image reconstruction settings or lesion volume segmentation methods, can strongly affect image feature stability, raising serious concerns about the use of some image features as disease biomarkers [
The main purpose of this work is to evaluate the reproducibility and stability of a set of radiomic features with respect to different volume segmentation methods and reconstruction settings, which are currently the most common variables in retrospective clinical oncological studies. We then assessed how effectively these radiomic features characterize lesion heterogeneity and shape.
These aims were pursued using a realistic dataset of PET images obtained from an anthropomorphic thorax phantom mimicking realistic oncological lesions with irregular shape and heterogeneous radiotracer uptake, for which the gold standards (GSs) were known. Our work helps to determine the limits and the quantitative properties of the radiomics approach for clinical application, with respect to the tested methods and parameters.
The anthropomorphic Alderson Thorax phantom (Radiology Support Devices, Inc.) was used to simulate the male or female thorax and breast regions. Several synthetic lesions of irregular shape, with either homogeneous or heterogeneous uptake, were produced and placed inside the thorax or the breasts of the anthropomorphic phantom within an 18F-FDG radioactive background. To simulate realistic patient PET studies, each phantom compartment was filled with a different background 18F-FDG radioactivity concentration: lungs with 0.004 MBq/cc, liver with 0.013 MBq/cc, myocardial wall with 0.023 MBq/cc, thorax with 0.006–0.007 MBq/cc, and breasts with 0.002–0.009 MBq/cc [
Preparing the phantom before the PET acquisitions took about two hours in total. The 18F-FDG radioactivity concentrations used during phantom preparation took this time frame into account and were recalculated based on the half-life of 18F-FDG.
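The recalculation described above amounts to a standard physical decay correction. The following is a minimal sketch, assuming simple exponential decay with the physical half-life of fluorine-18 (about 109.77 min, a known constant that is not quoted in the text); the function names are illustrative and are not taken from the authors' procedure.

```python
# Physical half-life of fluorine-18 in minutes (a known constant, not a value
# quoted in the text; used here as an assumption of this sketch).
F18_HALF_LIFE_MIN = 109.77


def concentration_at_preparation(target_mbq_per_cc, delay_min,
                                 half_life_min=F18_HALF_LIFE_MIN):
    """Concentration to prepare so that `target_mbq_per_cc` remains after `delay_min`."""
    return target_mbq_per_cc * 2.0 ** (delay_min / half_life_min)


def concentration_after_decay(initial_mbq_per_cc, elapsed_min,
                              half_life_min=F18_HALF_LIFE_MIN):
    """Concentration remaining after `elapsed_min` minutes of physical decay."""
    return initial_mbq_per_cc * 2.0 ** (-elapsed_min / half_life_min)


# Example: liver background of 0.013 MBq/cc wanted two hours after preparation.
prepared = concentration_at_preparation(0.013, 120.0)
print(f"prepare {prepared:.4f} MBq/cc to have 0.013 MBq/cc at scan time")
```

With a two-hour preparation time, a little more than one half-life elapses, so the prepared concentration has to be slightly more than twice the target value.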
A strategy to produce realistic oncological lesions of irregular shape with a homogeneous or a heterogeneous uptake of 18F-FDG [
To obtain realistic oncological lesions with irregular shape, we defined 3D shells by segmenting the lesion volumes of different oncological lesions on 18F-FDG PET/CT images of real patients. The segmented volumes were then processed in order to generate images of 3D surfaces of lesions, saved in digital files. These surfaces were then cut into two parts by image manipulation and 3D printed using a 3D printer (Renkforce RF1000 Single Extruder) equipped with plastic filaments of 3 mm diameter (Renkforce PLA300 Plastic PLA 3 mm), thus manufacturing plastic moulds of patient-derived oncological lesions.
The printed shells provided the gold standard (GS) for shell sphericity, against which the geometrical radiomic features extracted from the PET images of the phantom studies could be compared. In particular, for each printed mould, a sphericity index was defined as the ratio between the surface of the sphere with volume equivalent to the actual mould volume (
This index ranges from 0 to 1, where SGS = 1 expresses a full spherical shape.
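The sphericity index can be illustrated with a short calculation. This is a sketch under an assumption: the denominator of the ratio is cut off in the text, and it is taken here to be the actual surface area of the mould, which is the usual definition and is consistent with the stated range from 0 to 1. The function names are illustrative.

```python
import math


def equivalent_sphere_surface(volume):
    """Surface area of the sphere that has the given volume."""
    return math.pi ** (1.0 / 3.0) * (6.0 * volume) ** (2.0 / 3.0)


def sphericity_index(volume, surface):
    """Equivalent-sphere surface divided by the actual surface (assumed denominator)."""
    return equivalent_sphere_surface(volume) / surface


# A perfect sphere of radius 2 gives an index of 1; a unit cube gives less than 1.
r = 2.0
sphere = sphericity_index(4.0 / 3.0 * math.pi * r ** 3, 4.0 * math.pi * r ** 2)
cube = sphericity_index(1.0, 6.0)
print(round(sphere, 6), round(cube, 3))
```

Volume and surface must be expressed in consistent units (for example cc and cm²), since the index itself is dimensionless.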
For the PET experimental measurements, the shells were filled with a radioactive gel produced with a fast-setting, chromatic, dust-free alginate powder (phase plus, Zhermack Clinical SpA–Badia Polesine (RO), Italy) mixed with a water solution of 18F-FDG [
Seven experimental configurations were studied, with different radioactivity concentrations (
The seven different configurations to obtain lesions with different heterogeneous uptake. C1, C2, and C0 represent areas with lower, higher, and no radioactivity concentration, respectively. (a–c) Strategies for reproducing necrotic tissue; (d, e) heterogeneous (multifocal) uptake; (f, g) heterogeneous uptake and necrotic tissue.
GSs for the lesion volumes (
To obtain GSs for assessing the heterogeneity significance of radiomic features (as extracted from the PET studies of the phantom), two different indices of heterogeneity were considered.
The coefficient of variation of the different gels was measured as an index of heterogeneity in the radioactivity uptake, defined as the percentage ratio between the standard deviation and the mean of the radioactivity concentration within the lesion volume (COVGS).
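As an illustration of this index, the sketch below computes a volume-weighted coefficient of variation for a lesion made of compartments with known radioactivity concentrations. The compartment representation, the use of the population standard deviation, and the numerical values are assumptions of this sketch, not details taken from the text.

```python
import math


def cov_percent(compartments):
    """Volume-weighted COV (%) for a list of (concentration, volume) pairs.

    Assumes a non-zero mean concentration; uses the population standard deviation.
    """
    total_volume = sum(volume for _, volume in compartments)
    mean = sum(conc * volume for conc, volume in compartments) / total_volume
    variance = sum(volume * (conc - mean) ** 2
                   for conc, volume in compartments) / total_volume
    return 100.0 * math.sqrt(variance) / mean


# A uniform lesion has COV = 0; two equal-volume compartments with
# concentrations 1 and 3 (arbitrary units) have mean 2 and SD 1, hence 50%.
print(cov_percent([(0.05, 6.8)]))
print(cov_percent([(1.0, 2.0), (3.0, 2.0)]))
```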
The Gini index [
The product of COVGS and
The GS for lesion-to-background ratio (
18F-FDG PET-CT phantom measurements were performed on a Discovery 690 PET/CT system (General Electric Medical Systems) [
Images were reconstructed with a standard protocol optimized for whole-body clinical oncological studies: ordered subset expectation maximization (OSEM) in 3D mode, including Point Spread Function (PSF) [
PET images of lesions were segmented to obtain the Metabolic Tumor Volume (MTV) from which the radiomic features were extracted. Two segmentation methods were used in this work: an adaptive threshold method and a fixed threshold method. The adaptive method was calibrated and validated on a variety of synthetic lesions mimicking real oncological lesions (i.e., with spherical and nonspherical shape and with homogeneous and nonhomogeneous 18F-FDG uptake), with an accuracy in the MTV measurement of 92% [
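For orientation, a fixed threshold segmentation can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the threshold is assumed to be a fraction of the maximum uptake in the lesion region, the value of that fraction is left as a required parameter because it is not given in the text above, and the voxel list and voxel volume are hypothetical.

```python
def fixed_threshold_mtv(voxels, voxel_volume_cc, fraction):
    """Return (mask, MTV in cc) keeping voxels at or above fraction * maximum uptake.

    `voxels` is a flat list of uptake values inside a region around the lesion;
    `fraction` is a hypothetical parameter chosen by the user.
    """
    threshold = fraction * max(voxels)
    mask = [value >= threshold for value in voxels]
    return mask, sum(mask) * voxel_volume_cc


# Toy example with five voxels of 0.064 cc each and a 50% threshold.
mask, mtv = fixed_threshold_mtv([1.0, 2.0, 5.0, 10.0, 9.0], 0.064, fraction=0.5)
print(mask, round(mtv, 3))
```

An adaptive method differs in that the threshold is not a constant fraction but depends on measured quantities such as the lesion-to-background contrast.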
Since it has been shown that the use of thresholding approaches is appropriate for small lesions only when there is a good
Radiomic imaging features were extracted from each segmented MTV as morphological and statistical imaging features. Morphological imaging features (IFM) were obtained starting from the shape and size characteristics of the segmented MTV [
Statistical analysis of the first-order histogram describing the distribution of voxel intensities in the MTV provided the first-order statistical imaging features (IFHIST).
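Several of the first-order features can be sketched directly from the list of voxel intensities in the MTV. The definitions below are common textbook forms (population moments, base-2 entropy, histogram bins equal to the distinct discretized values) and are assumptions of this sketch; the exact definitions used in this work may differ.

```python
import math
import statistics
from collections import Counter


def first_order_features(values):
    """Common first-order features of already discretized, non-constant voxel values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    energy = sum(v * v for v in values)
    probs = [count / n for count in Counter(values).values()]  # histogram
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": m2,
        "standard_deviation": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "mad": sum(abs(v - mean) for v in values) / n,
        "energy": energy,
        "rms": math.sqrt(energy / n),
        "entropy": -sum(p * math.log2(p) for p in probs),
        "uniformity": sum(p * p for p in probs),
    }


features = first_order_features([1, 2, 3, 4])
print(features["mean"], features["entropy"], features["uniformity"])
```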
Texture analysis provided higher-order statistical imaging features. Images were resampled to an isotropic voxel size, taking the axial image size as the resampled size. The MTV content was then resampled into 64 discrete gray-level values, and the texture analysis was performed with an in-house-developed MATLAB routine (v.2015b, MathWorks, Natick, MA, USA), largely based on publicly available code [
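The two steps described above, gray-level discretization and construction of a texture matrix, can be illustrated with a deliberately simplified sketch. It is written in Python rather than MATLAB, works on a 2D image with a single horizontal offset, and uses equal-width binning between the minimum and maximum value; all of these are assumptions of the sketch and not a description of the routine used in this work, which operates on 3D volumes.

```python
def quantize(values, n_levels=64):
    """Map values to integer gray levels 0..n_levels-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    scale = n_levels / (hi - lo)
    return [min(int((v - lo) * scale), n_levels - 1) for v in values]


def glcm_horizontal(image):
    """Symmetric, normalized co-occurrence matrix of a non-empty 2D list, offset (0, 1)."""
    counts = {}
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
            counts[(b, a)] = counts.get((b, a), 0) + 1
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


def glcm_contrast(glcm):
    """Sum of p(i, j) * (i - j)^2 over all gray-level pairs."""
    return sum(p * (i - j) ** 2 for (i, j), p in glcm.items())


def glcm_energy(glcm):
    """Sum of squared co-occurrence probabilities."""
    return sum(p * p for p in glcm.values())


levels = quantize([0.0, 5.0, 10.0])          # 64 levels by default
glcm = glcm_horizontal([[0, 1], [0, 1]])     # vertical stripes
print(levels, glcm_contrast(glcm), glcm_energy(glcm))
```

A full analysis would typically accumulate the matrix over all 3D directions before computing the features.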
In order to evaluate the impact of lesion volume segmentation (MTV) on the stability of radiomic features, the Friedman test was applied to the values of the radiomic features obtained for the two considered segmentation approaches (adaptive and fixed threshold methods, Section “Image Segmentation”).
To study the impact of reconstruction settings on the stability of radiomic features, PET images were reconstructed with reconstruction algorithms or parameters different from those of the standard reconstruction protocol (section “Phantom setting and PET data acquisition”). For each reconstruction setting, lesion MTVs were extracted with the adaptive threshold segmentation method.
Reconstructions were performed with OSEM with or without PSF modelling and considering or omitting TOF. The impact of the matrix size of reconstructed images was also evaluated. Considering algorithm parameters, the influence of the number of iterations and subsets was assessed fixing a matrix size equal to 256 × 256, because it is the most used size in clinical practice. In order to evaluate the impact of the full width at half maximum (FWHM) of Gaussian filter, matrix size was chosen such that the reconstructed voxel size is within 3.0–4.0 mm in any direction and FWHM not exceeding 7 mm, according to EANM guidelines [
Table
Reconstruction settings.
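Evaluated factor | Reconstruction algorithm | Number of iterations | Number of subsets | FWHM Gaussian filter (mm) | Reconstructed matrix size |
---|---|---|---|---|---|
Impact of reconstruction algorithm | OSEM3D | 3 | 18 | 5 | 256 |
Impact of reconstruction algorithm | OSEM3D + PSF | 3 | 18 | 5 | 256 |
Impact of reconstruction algorithm | OSEM3D + TOF | 3 | 18 | 5 | 256 |
Impact of reconstruction algorithm | OSEM3D + PSF + TOF | 3 | 18 | 5 | 256 |
Impact of number of iterations | OSEM3D + PSF + TOF | 2 | 18 | 5 | 256 |
Impact of number of iterations | OSEM3D + PSF + TOF | 3 | 18 | 5 | 256 |
Impact of number of iterations | OSEM3D + PSF + TOF | 4 | 18 | 5 | 256 |
Impact of number of subsets | OSEM3D + PSF + TOF | 3 | 18 | 5 | 256 |
Impact of number of subsets | OSEM3D + PSF + TOF | 3 | 24 | 5 | 256 |
Impact of reconstructed matrix size | OSEM3D + PSF + TOF | 3 | 18 | 5 | 128 |
Impact of reconstructed matrix size | OSEM3D + PSF + TOF | 3 | 18 | 5 | 192 |
Impact of reconstructed matrix size | OSEM3D + PSF + TOF | 3 | 18 | 5 | 256 |
Impact of FWHM of Gaussian filter | OSEM3D + PSF + TOF | 3 | 18 | 5 | 192 |
Impact of FWHM of Gaussian filter | OSEM3D + PSF + TOF | 3 | 18 | 7 | 192 |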
OSEM = ordered subset expectation maximization; PSF = point spread function; TOF = time of flight; FWHM = full width at half maximum.
For each radiomic feature, the COV over the different reconstruction settings was calculated and averaged over all lesions, thus characterizing feature stability vs. reconstruction.
On the basis of COV results, radiomic features were categorized into 4 groups: stable (COV ≤ 5%), quite stable (5% < COV ≤ 10%), poorly stable (10% < COV ≤ 20%), and unstable (COV > 20%).
For each feature, to provide representative information on its stability with respect to the different reconstruction settings explored, we considered the highest COV value obtained among all the reconstruction settings. A feature was considered quite stable when this COV value was ≤10%.
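The stability assessment described above can be sketched as follows. The thresholds are those given in the text; the use of the sample standard deviation, the data layout, and the feature values in the example are assumptions made only for illustration.

```python
import statistics


def cov_across_settings(values):
    """COV (%) of one feature for one lesion over several reconstruction settings."""
    return 100.0 * statistics.stdev(values) / abs(statistics.mean(values))


def mean_cov_over_lesions(values_by_lesion):
    """Average over lesions of the per-lesion COV."""
    return statistics.mean([cov_across_settings(v) for v in values_by_lesion])


def stability_category(cov):
    """Map a COV (%) to the four stability groups defined in the text."""
    if cov <= 5.0:
        return "stable"
    if cov <= 10.0:
        return "quite stable"
    if cov <= 20.0:
        return "poorly stable"
    return "unstable"


# Hypothetical feature values: one inner list per lesion, one value per setting.
by_algorithm = [[10.0, 10.4, 9.8, 10.1], [20.0, 21.0, 19.5, 20.2]]
by_matrix_size = [[10.0, 11.5, 12.0], [20.0, 23.0, 24.5]]

# The representative COV of the feature is the highest one among the tested groups.
worst = max(mean_cov_over_lesions(by_algorithm), mean_cov_over_lesions(by_matrix_size))
print(round(worst, 1), stability_category(worst))
```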
In order to explore reproducibility of radiomic features, a test-retest setting was used. Two sets of test-retest images were acquired approximately 30 min apart (acquisition time of 180 sec for each bed position). Lesions in the two sets of test-retest images were segmented with the adaptive threshold segmentation method.
For each feature, the pairwise intraclass correlation coefficient (ICC) was calculated [
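As an illustration of a test-retest ICC, the sketch below computes the one-way random-effects, single-measurement form, often written ICC(1,1). The specific ICC form used in this work is not stated in the text above, so this choice is an assumption of the sketch, as are the function name and the example values.

```python
def icc_one_way(test, retest):
    """One-way random-effects ICC(1,1) for paired test-retest measurements.

    Assumes at least two lesions and some variability in the data
    (otherwise the ratio is undefined).
    """
    pairs = list(zip(test, retest))
    n, k = len(pairs), 2
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    lesion_means = [(a + b) / k for a, b in pairs]
    ms_between = k * sum((m - grand_mean) ** 2 for m in lesion_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, lesion_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical feature values for four lesions in the two acquisitions.
icc = icc_one_way([1.0, 2.1, 3.0, 4.2], [1.1, 2.0, 3.2, 4.0])
print(round(icc, 3), "reproducible" if icc >= 0.6 else "not reproducible")
```

Other ICC forms (two-way models, absolute agreement or consistency) can give different values on the same data, so the form should be reported together with the threshold.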
A Mann–Whitney test was used to evaluate significant differences in each feature when calculated from lesions with heterogeneous vs. homogeneous uptake, thus measuring the potential of radiomic features in discriminating heterogeneous from homogeneous lesions.
The significance of each radiomic feature in terms of capturing heterogeneity was also evaluated by testing the correlation of each feature with
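The comparison between the two groups of lesions can be sketched with a minimal Mann–Whitney implementation. The p-value below uses the normal approximation without tie or continuity correction, which is a simplification assumed for this sketch; a statistical package would normally be preferred, especially for small samples. The example values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p-value from the normal approximation)."""
    # U counts the pairs in which the x value exceeds the y value (ties count 0.5).
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


# Hypothetical values of one feature in homogeneous and heterogeneous lesions.
homogeneous = [0.21, 0.25, 0.19, 0.23, 0.22]
heterogeneous = [0.35, 0.41, 0.30, 0.38, 0.33]
u, p = mann_whitney_u(homogeneous, heterogeneous)
print(u, round(p, 4))
```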
The morphological radiomic feature “Sphericity” was evaluated in its ability to reflect geometrical characteristics as defined by
Table
Summary of the 18F-FDG PET/CT acquisitions of the phantom, with gold standard values of each lesion.
PET acquisition number | Lesion number | Shell type | | SGS | | | L/BGS |
---|---|---|---|---|---|---|---|
1 | 1 | A | 6.8 | 0.57 | 0 | 6.8 | 10 |
1 | 2 | B | 10.5 | 0.62 | 0 | 10.5 | 10 |
1 | 3 | C | 8.5 | 0.49 | 0 | 8.5 | 10 |
1 | 4 | D | 12.5 | 0.74 | 0 | 12.5 | 10 |
2 | 5 | A | 6.8 | 0.57 | 0 | 6.8 | 10 |
2 | 6 | B | 10.5 | 0.62 | 0 | 10.5 | 10 |
2 | 7 | C | 8.5 | 0.49 | 0 | 8.5 | 10 |
2 | 8 | D | 12.5 | 0.74 | 0 | 12.5 | 10 |
3 | 9 | A | 6.8 | 0.57 | 0 | 6.8 | 10 |
3 | 10 | B | 10.5 | 0.62 | 0 | 10.5 | 10 |
3 | 11 | C | 8.5 | 0.49 | 0 | 8.5 | 10 |
3 | 12 | D | 12.5 | 0.74 | 0 | 12.5 | 10 |
4 | 13 | A | 6.8 | 0.57 | 0 | 6.8 | 27 |
4 | 14 | B | 10.5 | 0.62 | 0 | 10.5 | 26 |
4 | 15 | C | 8.5 | 0.49 | 21.1 | 7.4 | 9 |
4 | 16 | D | 12.5 | 0.74 | 12.7 | 11.7 | 25 |
5 | 17 | A | 6.8 | 0.57 | 0 | 6.8 | 27 |
5 | 18 | B | 10.5 | 0.62 | 0 | 10.5 | 26 |
5 | 19 | C | 8.5 | 0.49 | 21.1 | 7.4 | 9 |
5 | 20 | D | 12.5 | 0.74 | 12.7 | 11.7 | 25 |
6 | 21 | A | 6.8 | 0.57 | 0 | 6.8 | 27 |
6 | 22 | B | 10.5 | 0.62 | 0 | 10.5 | 26 |
6 | 23 | C | 8.5 | 0.49 | 21.1 | 7.4 | 9 |
6 | 24 | D | 12.5 | 0.74 | 12.7 | 11.7 | 25 |
7 | 25 | A | 6.8 | 0.57 | 0 | 6.8 | 12 |
7 | 26 | B | 10.5 | 0.62 | 0 | 10.5 | 11 |
7 | 27 | C | 8.5 | 0.49 | 21.1 | 7.4 | 4 |
7 | 28 | D | 12.5 | 0.74 | 12.7 | 11.7 | 11 |
8 | 29 | A | 6.8 | 0.57 | 14.9 | 6.4 | 18 |
8 | 30 | B | 10.5 | 0.62 | 26.2 | 10.5 | 10 |
8 | 31 | C | 8.5 | 0.49 | 24.8 | 7.6 | 7 |
8 | 32 | D | 12.5 | 0.74 | 62.2 | 8.7 | 9 |
8 | 33 | E | 32.3 | 0.73 | 16.3 | 29.4 | 25 |
9 | 34 | A | 6.8 | 0.57 | 14.9 | 6.4 | 12 |
9 | 35 | B | 10.5 | 0.62 | 26.2 | 10.5 | 7 |
9 | 36 | C | 8.5 | 0.49 | 24.8 | 7.6 | 5 |
9 | 37 | D | 12.5 | 0.74 | 62.2 | 8.7 | 6 |
9 | 38 | E | 32.3 | 0.73 | 16.3 | 29.4 | 16 |
Nine 18F-FDG PET/CT acquisitions of the phantom were performed, including 38 lesions of different shape, size, radiotracer distribution, and L/B ratio in different locations of the phantom. Five different 3D-printed shells with irregular shape (A-E), obtained from the PET image segmentation of real oncological lesions, were used. Their
Of the 38 lesions, 20 were prepared with a uniform radiotracer uptake and the remaining 18 with a heterogeneous uptake. The
Explored L/BGS ranged from 4 to 27.
Image noise of each PET acquisition was evaluated as the COV of the uptake distribution inside a large region of the liver. The mean COV calculated over the 9 PET acquisitions was <8%.
Figure
Examples of PET images of heterogeneous lesions (a-b, d-e, g-h), with 3D renders of lesions (c, f, i). (1)
Table
Mean percent error on the estimate of MTV of small lesions as a function of
 | Adaptive threshold mean percent error (%) | Fixed threshold mean percent error (%) |
---|---|---|
 | 27 ± 9 | −33 ± 13 |
5 < | 16 ± 30 | −35 ± 26 |
10 < | 17 ± 16 | −31 ± 25 |
The adaptive threshold method presents good results at higher
In general, the results show a tendency of the adaptive threshold method to overestimate the volume, whereas the fixed threshold segmentation method always underestimates it. However, selecting the optimal segmentation method was not the purpose of this paper.
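The sign convention behind these errors can be made explicit with a short sketch: a positive percent error corresponds to an overestimate of the gold standard volume and a negative one to an underestimate. The summary as mean ± sample standard deviation and the example volumes are assumptions of this sketch.

```python
import statistics


def percent_error(measured_mtv, gold_standard_mtv):
    """Signed percent error of a measured MTV with respect to its gold standard."""
    return 100.0 * (measured_mtv - gold_standard_mtv) / gold_standard_mtv


def summarize(errors):
    """Mean and sample standard deviation of a list of percent errors."""
    return statistics.mean(errors), statistics.stdev(errors)


# Hypothetical (measured, gold standard) volumes in cc for three lesions.
measurements = [(8.0, 6.8), (12.6, 10.5), (9.4, 8.5)]
errors = [percent_error(measured, gold) for measured, gold in measurements]
mean_error, sd_error = summarize(errors)
print(f"{mean_error:.0f} ± {sd_error:.0f} %")
```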
Table
The radiomic features considered in the work.
Feature name | Feature group |
---|---|
MTV | IFM |
Surface | IFM |
Spherical disproportion | IFM |
Sphericity | IFM |
Surface-volume ratio (SV) | IFM |
Maximum | IFHIST |
Minimum | IFHIST |
Mean | IFHIST |
Median | IFHIST |
Mean absolute deviation (MAD) | IFHIST |
Root mean square (RMS) | IFHIST |
Energy | IFHIST |
Entropy | IFHIST |
Kurtosis | IFHIST |
Skewness | IFHIST |
Standard deviation | IFHIST |
Uniformity | IFHIST |
Variance | IFHIST |
Energy | IFTX-GLCM |
Contrast | IFTX-GLCM |
Entropy | IFTX-GLCM |
Homogeneity | IFTX-GLCM |
Correlation | IFTX-GLCM |
SumAverage | IFTX-GLCM |
Variance | IFTX-GLCM |
Dissimilarity | IFTX-GLCM |
Autocorrelation | IFTX-GLCM |
Short run emphasis (SRE) | IFTX-GLRLM |
Long run emphasis (LRE) | IFTX-GLRLM |
Gray-level nonuniformity (GLN) | IFTX-GLRLM |
Run-length nonuniformity (RLN) | IFTX-GLRLM |
Run percentage (RP) | IFTX-GLRLM |
Low gray-level run emphasis (LGRE) | IFTX-GLRLM |
High gray-level run emphasis (HGRE) | IFTX-GLRLM |
Short run low gray-level emphasis (SRLGE) | IFTX-GLRLM |
Short run high gray-level emphasis (SRHGE) | IFTX-GLRLM |
Long run low gray-level emphasis (LRLGE) | IFTX-GLRLM |
Long run high gray-level emphasis (LRHGE) | IFTX-GLRLM |
Gray-level variance (GLV) | IFTX-GLRLM |
Run-length variance (RLV) | IFTX-GLRLM |
Small zone emphasis (SZE) | IFTX-GLSZM |
Large zone emphasis (LZE) | IFTX-GLSZM |
Gray-level nonuniformity (GLN) | IFTX-GLSZM |
Zone-size nonuniformity (ZSN) | IFTX-GLSZM |
Zone percentage (ZP) | IFTX-GLSZM |
Low gray-level zone emphasis (LGZE) | IFTX-GLSZM |
High gray-level zone emphasis (HGZE) | IFTX-GLSZM |
Small zone low gray-level emphasis (SZLGE) | IFTX-GLSZM |
Small zone high gray-level emphasis (SZHGE) | IFTX-GLSZM |
Large zone low gray-level emphasis (LZLGE) | IFTX-GLSZM |
Large zone high gray-level emphasis (LZHGE) | IFTX-GLSZM |
Gray-level variance (GLV) | IFTX-GLSZM |
Zone-size variance (ZSV) | IFTX-GLSZM |
Coarseness | IFTX-NGTDM |
Contrast | IFTX-NGTDM |
Busyness | IFTX-NGTDM |
Complexity | IFTX-NGTDM |
Strength | IFTX-NGTDM |
In particular, five morphological features were extracted characterizing the shape and size of each lesion [
Figure
Stability of radiomic features on different segmentations. Friedman test results (
Comparing the values of each feature extracted from the MTV derived with the two segmentation approaches showed that many features vary widely with the applied segmentation method; thus, the choice of the segmentation method has a strong impact on the stability of radiomic features.
In particular, results obtained on the whole dataset of both uniform and nonuniform lesions showed that less than 20% (11/58) of radiomic features can be considered fully stable with respect to the two segmentation methods considered.
In Figure
Uniform lesions. Stability of radiomic features on different segmentations. Friedman test results (
As expected, a larger number of radiomic features were stable (41%, 24/58).
Results obtained considering variations of the reconstruction parameters (i.e., reconstruction type, matrix size, FWHM of Gaussian filter, number of iterations, and number of subsets) are summarized in Figure
Stability of radiomic features on different reconstruction settings. COV results. • indicates COV ≤ 10%.
Thirty-one of the 58 radiomic features (53%) were reproducible in the test-retest datasets (ICC ≥ 0.6), as reported in Figure
Reproducibility of radiomic features on test-retest datasets. ICC results. • indicates ICC ≥ 0.6.
Results from Mann–Whitney test showed that 24/58 (41%) of radiomic features have significantly different values in case of lesions with uniform versus nonuniform uptake (
Mann–Whitney test results (
As shown in Figure
Results of correlation analysis between radiomic features and
Paired
Despite the proven potential impact of radiomics, scientific evidence suggests that radiomic features extracted from PET images of cancer lesions may show large variability, depending in particular on the reconstruction settings and segmentation strategies used prior to the radiomic analysis [
Some published works were devoted to assessing intrapatient reproducibility or feature stability with respect to either segmentation or reconstruction settings [
Consistently with other published studies [
The fixed threshold approach is widely used in the literature [
We found, in agreement with previous reports [
Intrapatient reproducibility can be a serious concern, but it can be properly managed. A good number of features (31) were reproducible in our test-retest setting, which suggests considering this subset for further radiomic analysis. Among these reproducible features we found most of the morphological and histogram-derived features considered in this work, and some textural features from the gray-level co-occurrence matrix and gray-level run-length matrix.
Eleven of the 31 features were found also able to discriminate heterogeneous from uniform radioactivity uptake (
Furthermore, interesting results were obtained when comparing radiomic features with the gold standard indexes of heterogeneity and sphericity. Considering uptake heterogeneity, among the 11 features found above, 3 reproducible features (run-length nonuniformity, run percentage, and large zone emphasis) also proved able to reflect the heterogeneity in the PET uptake (they were strongly correlated with the gold standard heterogeneity index). These findings suggest that these 3 features can be considered as a first choice when testing the hypothesis that PET heterogeneity could reflect real tumor heterogeneity.
In conclusion, in this work we showed some limits and quantitative properties of the radiomics approach (with respect to the tested methods and parameters) that should be addressed before radiomics can be translated into clinical practice. Based on our findings, we suggest, as an optimal strategy for bias-free radiomic analysis, archiving all raw data of the PET acquisitions collected for a clinical study, so that they can then be reconstructed and segmented with standardized reconstruction and segmentation protocols. We found a subset of 31 features that could be preferred for reproducible radiomic PET studies, three of which seem particularly suitable for capturing tumor heterogeneity. However, our results need to be confirmed by other, more extensive studies and cannot be directly transferred to real or more complex clinical conditions.
An image set of our original anthropomorphic phantom is available to researchers, after registration, at
The authors declare that there are no conflicts of interest.
This work was supported by the CNR Research Project “Aging: Molecular and Technological Innovations for Improving the Health of the Elderly” (no. DSB.AD009.001; activity no. DSB.AD009.001.043).