DCE-MRI Pharmacokinetic-Based Phenotyping of Invasive Ductal Carcinoma: A Radiomic Study for Prediction of Histological Outcomes

Breast cancer is a disease affecting an increasing number of women worldwide. Several efforts have been made in the last years to identify imaging biomarker and to develop noninvasive diagnostic tools for breast tumor characterization and monitoring, which could help in patients' stratification, outcome prediction, and treatment personalization. In particular, radiomic approaches have paved the way to the study of the cancer imaging phenotypes. In this work, a group of 49 patients with diagnosis of invasive ductal carcinoma was studied. The purpose of this study was to select radiomic features extracted from a DCE-MRI pharmacokinetic protocol, including quantitative maps of ktrans, kep, ve, iAUC, and R1 and to construct predictive models for the discrimination of molecular receptor status (ER+/ER−, PR+/PR−, and HER2+/HER2−), triple negative (TN)/non-triple negative (NTN), ki67 levels, and tumor grade. A total of 163 features were obtained and, after feature set reduction step, followed by feature selection and prediction performance estimations, the predictive model coefficients were computed for each classification task. The AUC values obtained were 0.826 ± 0.006 for ER+/ER−, 0.875 ± 0.009 for PR+/PR−, 0.838 ± 0.006 for HER2+/HER2−, 0.876 ± 0.007 for TN/NTN, 0.811 ± 0.005 for ki67+/ki67−, and 0.895 ± 0.006 for lowGrade/highGrade. In conclusion, DCE-MRI pharmacokinetic-based phenotyping shows promising for discrimination of the histological outcomes.


Introduction
Breast cancer is the most common malignant tumor that affects women worldwide [1,2]. It is one of the leading cause of cancer death in women, with alarming statistics in the young population (under 40 years) [3].
An early diagnosis and classification of the breast tumor is fundamental in the patient's management: the tumor genotype is often predictive of outcome [4], it is used clinically for the selection of the most appropriate therapy [5][6][7] and has proved valuable for personalized treatments [8][9][10].
According to their gene expression, breast tumors can be classified into four molecular subtypes: luminal A (lumA), luminal B (lumB), human epidermal growth factor receptor 2-(HER2-) like, and basal-like [11]. This classification is based on the expression of estrogen receptor (ER), progesterone receptor (PR), HER2, and ki67, a marker of cellular proliferation. According to St. Gallen 2013 [12], the lumA subtype, defined as positive ER (ER+), positive PR (PR+, with a positive value larger than 20%), negative HER2 (HER2−), and low levels of ki67 (with a cut off of 20% [13]), shows the best survival and the highest probability of being longterm disease-free [14]. LumB subtypes are characterized by two different genotypes: ER+ combined with HER2− and PR < 20% or high levels of ki67 (≥20) and ER+ together with HER2+ with any value of PR and ki67. LumB has higher proliferation and poorer prognosis than lumA [15]. HER2 (negative to ER and PR, positive to HER2) and basal-like (negative for all three receptors and, consequently, also known as triple negative, TN) subtypes have the worst 2 Contrast Media & Molecular Imaging prognosis and the latter is often associated with lymph node involvement [16] and accounts for a large portion of breast cancer deaths after diagnosis [17]. These receptors together with ki67, providing direct observation on the molecular underpinnings of the tumor, have been widely studied and are at the basis of the choose for personalized treatments: for example, patients with HER2+ cancer have been found to be quite effectively treated with trastuzumab and lapatinib [18]; ki67 has been identified as a prognostic and predictive marker in hormone receptor positive breast cancer [19]. Breast cancers overexpressing ER, PR, and/or HER2 can be specifically targeted with hormonal therapies, while TN breast cancers currently have no targeted therapy available and are limited to general cytotoxic chemotherapies [20].
Another important clinical variable for patients' stratification and treatment options is the tumor grade. For breast cancer, it is defined by the Elston-Ellis modification of the Scarff-Bloom-Richardson grading system and it is based on duct structures, size, and shape of nucleus in the tumor cells, and mitotic rate, leading to a final three-grade scale: G1 (low grade), G2 (intermediate grade), and G3 (high grade), with lower grade indicating a better prognosis [21].
The molecular receptor status, ki67 levels, and tumor grade are obtained by immunohistochemical analyses on tissue samples [22] from core needle biopsy (CNB). CNB is widely used as a standard procedure for diagnosis of breast cancer [23], but, although several studies have reported the concordance between preoperative CNB and surgical specimens for molecular determination [24,25], it has two main limits. CNB, in fact, is an invasive procedure and it may not reflect completely the complexity and heterogeneity of tumor lesion, since the information obtained may vary depending on which part of the tumor is sampled [26].
In recent years, an increasing interest has been focused on the identification of imaging surrogates and development of noninvasive diagnostic tools for cancer characterization and monitoring [27]. In fact, the imaging approach, besides its noninvasiveness, can give in vivo information on the entire tumor volume, reducing inaccuracy due to sampling errors in histopathological analyses. In particular, radiomic approaches have proved to be a key way to study the cancer imaging phenotypes, reflecting underlying gene expression patterns [28,29]. Radiomics, in facts, refer to the extraction of a large number of quantitative features from medical images [30], revealing heterogeneous tumor metabolism and anatomy [31,32]. This high-throughput extraction is preparatory to a process of data mining [33] for studies of association with or prediction of different clinical outcomes [34], giving important prognostic information about disease. The potential of radiomics, to extensively characterize the intratumoral heterogeneity, has shown promise in the prediction of treatment response and outcome, differentiating benign and malignant tumors and assessing the relationship with genetics in many cancer types [35], such as non-smallcell lung cancer [36], liver [37], prostate [38] and head and neck [39] tumors, and glioblastoma [40]. In the last years, the most widely used imaging modalities in radiomic research have been positron emission tomography (PET) and computed tomography (CT) [41]; however, an increasing interest is emerging toward Magnetic Resonance Imaging (MRI), which is an extremely versatile imaging technique, as it can provide multiparametric information derived from both morphologic and functional signals [42].
Although all of them are based on the extraction of morphological features and enhancement features from DCE-MRI, no one takes into account the radiomic evaluation of the quantitative DCE pharmacokinetic parameters ( trans , ep, iAUC, and V ) that, in standard correlation analysis with mean values, have shown a good agreement with prognostic factors and TN subtypes [59]. To overcome this approach and take into account lesion heterogeneity, Li et al. [60] performed a histogram-based analysis for differentiating benign and malignant tumors.
The aim of this study was to select radiomic features extracted from a DCE-MRI protocol, including precontrast images, pharmacokinetic parametric maps, the auxiliary 1 map, and delayed postcontrast images, to evaluate their prediction power in the differentiation of molecular receptor status, ki67 levels, and tumor grade obtained by immunohistochemical analyses in a dataset of invasive ductal carcinoma patients.

Patient Cohort.
The study was approved by the Institutional Review Board. A group of 49 patients was enrolled. Inclusion criteria were diagnosis of invasive ductal carcinoma, availability of the core biopsy or mastectomy biopsy reports of the primary breast cancer, and age older than 18 years at the time of the study. Exclusion criteria included pregnancy and inadequate MR images. a dynamic scan with 60 consecutive phases with a VIBE sequence (TR = 5.3 ms, TE = 1.9 ms, FA = 20 ∘ , FOV = 356 × 379 mm 2 , resolution = 1.98 × 1.98 mm 2 , slice thickness = 3.6 mm, and temporal resolution = 9 s/phase); and a delayed 3D postcontrast fat-suppressed T1-weighted gradient echo sequences (TR = 8.4 ms, TE = 2.5 ms, FOV = 370 × 370 mm 2 , resolution = 0.82 × 0.82 mm 2 , and slice thickness = 0.89 mm). Intravenous contrast injection started at the end of the first phase of dynamic scan at a dose of 0.1 mmol/kg of body weight and at the highest rate compatible with patient's age and compliance (up to 5 mL/s) 2.3. Immunohistochemistry. Core needle biopsies were performed under ultrasound guidance by a radiologist with more than 15 years of experience. Biopsies were fixed in 10% neutral buffered formalin at the time of biopsy. Mastectomy specimens, obtained from patients who underwent mastectomy, were sent to the department of pathology immediately after resection. Expression of ER, PR, HER2, and Ki67 was determined by immunohistochemical analysis. Each tumor sample was classified as ER+, PR+, and/or HER2+, or being TN. The cut-off values for receptor and ki67 expression were chosen accordingly to the St Gallen Consensus Meeting 2013 [12]. The histological grade was determined using the method of Elston and Ellis. All pathological diagnoses were rendered by the Breast Pathology Subspecialty Department at Ospedale Moscati (Avellino, Italy).

Tumor
Segmentation. 3D segmentation of the lesion was obtained semiautomatically from the dynamic VIBE sequence. After motion correction of the single phases on the first time point, an experienced radiologist was asked to manually draw a rectangular bounding box containing the tumor region. Successively, the dynamic motion-corrected sequence and the bounding box were given as input to the SegmentCAD module of 3DSlicer [61], which automatically segmented the lesion on the basis of the temporal dynamic of the signal. Voxels that reached a signal increase higher than the 75% of the first time point were selected as tumor. The cutoff of 75% was selected in accordance with a previous study [62] that studied the concordance correlation coefficient between the longest dimensions of the tumor measured on the surgical specimen and on the DCE-MRI segmentation when the cut-off value changed.

Pharmacokinetic Map Calculation.
Pharmacokinetic maps were obtained with the commercial software Tissue 4D (Siemens Healthcare, Erlangen, Germany). After an automated step of motion correction of the VIBE sequences at variable FAs with the dynamic VIBE sequence, the Toft model [63] was chosen for the pharmacokinetic parameters calculation. The arterial input function (AIF) used for the analysis was set to "intermediate," on the basis of populationbased AIFs built in Tissue 4D. Finally, 3D maps of trans , ep , V , and iAUC were obtained.
In addition to these quantitative maps, from the fitting of the VIBE signal at variable FAs also the relaxation rate 1 (inverse of relaxation time 1 , used in the generation of pharmacokinetic parameters) was obtained, by an in-house software developed in Matlab (The MathWorks Inc., Natick, MA) and saved for feature extraction.
2.6. Image Preprocessing. Before feature extraction, some preprocessing steps were performed: for each subject, in order to avoid the presence of spurious points in the tumor masks, possible voxels, disconnected from the biggest connected component, were erased. Then, the TIRM and the delayed 3D postcontrast fat-suppressed 1 -weighted (postC) images were coregistered to the first time point of the dynamic VIBE sequence in order to correct for possible patients' movements and two resampled versions of the tumor mask were generated, in order to match the resolution of TIRM and postC images and to allow feature extraction in the native space, for each acquisition. This step was not required for 1 map, since it shared the same geometry of dynamic VIBE sequence, which was used for lesion segmentation, and consequently of trans , ep , V , and iAUC maps.

Feature Extraction.
Nine shape features (including number of voxels, maximum and minimum diameter, volume, surface area, surface volume ratio, compactness, spherical disproportion, and sphericity) were extracted from the tumor segmentation.
1 , trans , ep , V , and iAUC maps and TIRM and postC were used for first-and second-order feature extraction. They were normalized, limiting their dynamics within the tumor mask to ± 3 [64]; then thirteen first-order features were extracted from the intensity histogram computed on 256 bins: energy, entropy, kurtosis, maximum (Max), mean, mean absolute deviation (Mad), median, minimum (Min), root mean square (Rms), skeweness, standard deviation (Std), uniformity, and variance.
The second-order features chosen for this study were Gray Level Cooccurrence Matrix (GLCM) [65], computed by a 3D analysis of the tumor region with 26-voxel connectivity and simultaneously taking into account the neighboring properties of voxels in all the 3D direction [66], after image quantization on 32 grey levels. The obtained features were energy, contrast, entropy, homogeneity, correlation, sum average, variance, dissimilarity, and auto correlation.
Therefore, considering the first-and second-order features computed for each of the seven images in addition to the shape features, a total of 163 features were obtained.
The multivariable predictive models were obtained following the method described by Vallières et al. [66], using at each step an imbalance-adjusted bootstrap resampling (IABR) on 1000 samples. First, for each task, from the large initial set of 163 features, a reduced feature set of 25 features was computed through a stepwise forward feature selection scheme. The first feature was chosen as the best one (i.e., the one that maximized Spearman's rank correlation with the outcome under investigation). Then, one at a time, features were added (up to 25) that maximized a gain equation, given by the linear combination of Spearman's rank correlation (between the feature and the outcome) and the Maximal Information Coefficient (between the feature that was tested and the ones that were yet included in the reduced set) [67]. Then, from the reduced feature set, logistic regression models of order from 1 to 10 that would best predict the outcome under investigation were obtained with another stepwise forward feature selection that, one by one, added to the th model the feature that maximized the 0.632+bootstrap area under the receiver operating characteristic curve (AUC) [68] of the models of order .
Finally, for each classification task, the prediction model was obtained choosing the order that maximize the AUC and computing the final model logistic regression coefficients for the aforementioned combination of feature using IABR.
Mann-Whitney test was used to study the association between each classification task and both the single features of the respective reduced feature sets and the computed prediction models.

Results
For each of the six classification tasks, the study population, based on the availability of the histological markers under investigation, was reported in Table 1.
The reduced feature sets, one for each classification task, computed according to the stepwise forward feature selection scheme and each composed by the 25 top ranked features in the gain equation, were reported in Table 2, together with the values of the Mann-Whitney test for each feature. At this univariate analysis only median, mean, and energy of trans , together with the mean of ep , resulted to be significantly associated (considering the Bonferroni adjusted value for multiple comparison) with the ki67+/ki67− outcome.
For each reduced feature set, multivariable logistic regression models of order from 1 to 10 were obtained and their prediction performance for the different classification tasks was reported in terms of AUC in Figure 1.
By inspecting the curves in Figure 1, the best prediction results were overall reached for classification of tumor grade. Interestingly, for ki67 level discrimination task, which had individually significant features at the Mann-Whitney test (see Table 2), the AUC values did not show any improvement after order 2. For each task, the best model was chosen using as figure of merit the AUC and the selected features were given as input to the logistic regression. The order of the chosen models and the associated prediction performance were reported in Table 3.
The final computation of the multivariable model coefficients led to the following prediction models for ER, PR, HER2+ expression, TN, ki67, and grade, respectively: The most recurrent features in the models were skeweness and entropy and, to a lesser extent, auto correlation, variance, correlation, sum average, and energy, while no shape feature was included into the models. Looking at the source images, a greater number of occurrences was found for 1 map than TIRM images, while the pharmacokinetic maps and the postcontrast acquisition were equally frequent, with the exception of V that appeared only once in the prediction models.
The Mann-Whitney test revealed a higher discriminative power of the obtained multivariable models compared to the most significant single feature (Table 2) Figure 2, where, for each classification task, the box plot of the multivariable model was reported.

Discussion
In this work a radiomic approach to predict different histological outcomes was developed on the basis of a DCE-MRI protocol including pharmacokinetic parametric maps. Six classification tasks were tested, including the molecular receptor status (ER+/ER−, PR+/PR−, HER2+/HER2−, and TN/NTN), ki67 levels, and tumor grade. The molecular receptors are an immunohistochemistry surrogate for breast cancer subtyping and, together with ki67 levels, allow to differentiate lumA, lumB, HER2, and basal-like. Moreover they are fundamental when choosing personalized treatment or the addition of adjuvant chemotherapy to hormone therapy [15].
The obtained results show that radiomic approaches based on pharmacokinetic maps lead to predictive models with a high discriminative power, reaching AUC values above the 80% and accuracy up to 88%.
In order to assess the added value of the radiomic approach, the discriminative power of the single features of the reduced set has been separately evaluated by means of univariate analysis. When looking at these results ( Table 2), values of the Mann-Whitney test show that several features are associated with the tumor histological outcome under investigation, but only mean, median, and energy of trans , together with mean of ep , were found to be significantly associated with the ki67+/ki67 discrimination task. This may be due to the fact that trans and ep are indeed two quantitative pharmacokinetic parameters related to the tumor permeability and vascularization and to medium contrast wash-out; they are clinically used for the differentiation of breast lesions with nonradiomic approaches, and also previous works [59] demonstrated their utility, correlating them with prognosis and TN subtype.
However, the results obtained with the radiomic approach, which is the high-throughput extraction of features, followed by a learning approach for the construction of a predictive models, led to a higher discriminative power, showing that such methods have a great potential to improve quantitative MRI assessment of the tumors.  Other previous works performed radiomic studies on breast cancer, above all to differentiate between subtypes [14,16,20,34,44,[52][53][54][55][56][57][58] with classification performance lower or similar to our results. However, a direct comparison with them was not directly applicable, considering the different populations and imaging approaches.
Interestingly, in the predictive models obtained in this work, the most recurrent features were skeweness and entropy that were indices of randomness, showing the importance of studying the heterogeneity of the tumors. In particular, entropy, even if computed on the first postcontrast image, was already found to be statistically associated to tumor aggressiveness by Li et al. [34]. Skeweness, instead, was found to be predictive for discriminating molecular subtypes by Fan et al. [16] and Sutton et al. [44]. In particular, in this last work, the authors found the skeweness to be significant at three time points on postcontrast MR images, suggesting the pharmacokinetics as a key component in differentiating the subtype.
Interestingly, all these works found a significant association between the outcome and at least one shape feature. Instead, in our study, since the step of feature reduction, they were excluded with the exception of minimum diameter (in the ki67+/ki67 discrimination task) that, however, did not survive in the predictive model. This may be due to a different feature selection algorithm or to the presence of pharmacokinetic-based feature that may be more strongly associated to the outcome than shape features.
Our study propose, by first, the use of pharmacokinetic and relaxometric maps for the radiomic analyses. In particular, in the obtained predictive models, the pharmacokinetic maps, together with postC, were equally represented (with the exception of V ), proving the added value of multiparametric information. Interestingly, much more instances of 1 features were found compared to the features from TIRM images. 1 is a parametric map that in DCE time resolved studies is usually used as auxiliary to the computation of pharmacokinetic parameters and is seldom studied by itself, although it could give important information regarding increased vascularity, presence of edema, or necrosis [69,70]. Our approach, instead, dealing with this parametric map referring to explicit physiological and structural conditions without the use of contrast media, leads to the generation of more discriminative features, compared to the conventional TIRM sequence. This suggests that 1 map is more suitable to extract textural properties of the tissues.
This study has some limitations: first of all the sample size. A larger study group need to be studied in the feature, to better conduct a radiomic analysis. Moreover, in the computation of pharmacokinetic maps, a population-based AIF was used: this may be a limitation for a quantitative analysis and further evaluation is needed to understand the impact of different AIF on the prediction results. In addition, the diffusion and testing of the obtained models on other populations is limited by the fact that time resolved DCE-MRI protocols for the computation of pharmacokinetic models is not always available in the clinical practice. However, this study paves the way to the study of 1 map as itself and not necessarily related to the computation of trans , ep , V , and iAUC In conclusion, DCE-MRI pharmacokinetic-based analysis along with 1 leads to the creation of predictive model that can help in differentiation between molecular receptor status, ki67 levels, and tumor grade with high accuracy. In this direction, further studies will be conducted on the development of models that differentiates between subtypes and including PET images or other MRI acquisition techniques, such as Diffusion Weighted Imaging, and genomic data.