Computer-Aided Quantification of Interstitial Lung Disease from High Resolution Computed Tomography Images in Systemic Sclerosis: Correlation with Visual Reader-Based Score and Physiologic Tests

Objective. To evaluate the performance of a computer-aided method (CaM) for the quantification of interstitial lung disease (ILD) in patients with systemic sclerosis (SSc) and to determine its correlation with the conventional visual reader-based score (CoVR) and pulmonary function tests (PFTs). Methods. Seventy-nine patients were enrolled. All patients underwent chest high resolution computed tomography (HRCT), which was scored by two radiologists using the CoVR. All HRCT images were then analysed by a CaM using DICOM viewer software. The relationships among the lung segmentation analysis, the readers' scores, and the PFT results were assessed using linear regression analysis and Pearson's correlation. Receiver operating characteristic (ROC) curve analysis was performed to determine the CaM extent threshold. Results. A strong correlation between CaM and CoVR was observed (P < 0.0001). The CaM showed a significant negative correlation with forced vital capacity (FVC) (P < 0.0001) and with the single breath carbon monoxide diffusing capacity of the lung (DLco) (P < 0.0001). A CaM optimal extent threshold of 20% represented the best compromise between sensitivity (75.6%) and specificity (97.4%). Conclusions. CaM quantification of SSc-ILD can be useful in assessing the extent of lung disease and may provide a reliable tool in daily clinical practice and clinical trials.


Introduction
Systemic sclerosis (SSc) is a heterogeneous autoimmune disorder of unknown aetiology characterized by musculoskeletal involvement, vascular dysfunction, and cutaneous and visceral fibrosis [1]. Interstitial lung disease (ILD) has been reported in up to 70% of patients with SSc and is a frequent cause of death in these patients [1][2][3].
High resolution computed tomography (HRCT) is currently the most widely accepted imaging tool for the detection, characterization, and treatment monitoring of ILD [3][4][5][6][7]. Moreover, HRCT findings correlate well with pulmonary function tests (PFTs) and have prognostic value in ILD [4,8].
Despite these strengths, the correct interpretation of HRCT findings is often difficult for inexperienced physicians, and interobserver variability is wide even among expert radiologists [9]. Therefore, a quantitative, noninvasive, and reliable imaging method allowing accurate assessment of ILD in SSc is highly desirable [10,11].
To date, several computerized tools for automatic lung segmentation on HRCT images have been developed [12]. They include image display (e.g., multiplanar reformations, surface shading for three-dimensional display, and volume rendering), anatomic image quantitation (e.g., area and volume of airways and lungs), and regional characterization of lung tissue (analysis of attenuation, changes in attenuation, and texture patterns in the imaged lung) [12][13][14]. They also provide computer-derived measures such as mean lung attenuation (MLA, the average global attenuation value of the pulmonary parenchyma), skewness (the degree of asymmetry of the attenuation histogram), and kurtosis (the degree of "peakedness" of the histogram) [15]. More sophisticated image analyses, including fractal analysis and the adaptive multiple feature method, are also possible [16].
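To make the three histogram descriptors concrete, the following Python sketch computes them from a list of lung voxel attenuation values in Hounsfield units (HU). It assumes the voxels have already been segmented and uses population moments with excess kurtosis; the function name `histogram_descriptors` is hypothetical, and the tools cited above [15] may use slightly different estimator conventions.

```python
from typing import Sequence


def histogram_descriptors(hu_values: Sequence[float]) -> dict:
    """Return MLA, skewness and excess kurtosis of lung voxel values (HU).

    Uses population (biased) central moments; software packages may apply
    sample-size corrections, so absolute values can differ slightly.
    """
    n = len(hu_values)
    if n < 2:
        raise ValueError("need at least two voxels")
    mla = sum(hu_values) / n  # mean lung attenuation
    m2 = sum((x - mla) ** 2 for x in hu_values) / n
    m3 = sum((x - mla) ** 3 for x in hu_values) / n
    m4 = sum((x - mla) ** 4 for x in hu_values) / n
    if m2 == 0:
        raise ValueError("all voxels have the same attenuation")
    return {
        "mla": mla,
        # > 0 when the histogram has a longer tail towards higher (denser) HU
        "skewness": m3 / m2 ** 1.5,
        # excess kurtosis: 0 for a normal distribution, > 0 for a sharper peak
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Toy values, not patient data: mostly aerated lung with a few dense voxels.
print(histogram_descriptors([-880, -870, -860, -850, -600, -500]))
```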
Compared with traditional visual interpretation of HRCT lung findings, automatic computer-based assessment may improve objectivity, sensitivity, and repeatability in measuring quantitative changes in lung features. We recently investigated the utility of the open-source Digital Imaging and Communications in Medicine (DICOM) viewer software OsiriX for assessing ILD in patients with SSc and found a significant association between the quantitative OsiriX assessment and the conventional semiquantitative HRCT analysis. The reliability of the OsiriX-based findings was also acceptable [17].
On this basis, we designed the present study to evaluate the performance of a computer-aided method (CaM) for the quantification of ILD in patients with SSc and to determine its correlation with both the conventional visual reader-based score (CoVR) and PFT findings. The secondary aims were to evaluate the feasibility and interreader reliability of the CaM.

Patients.
Patients with SSc, defined according to the American College of Rheumatology (formerly the American Rheumatism Association) classification criteria [18], were included in the study. Patients were classified as having limited or diffuse cutaneous SSc (lcSSc and dcSSc, respectively). Limited cutaneous SSc was characterized by skin thickening distal to the elbows and knees and proximal to the clavicles (including the face), whereas dcSSc was characterized by skin thickening both proximal and distal to the elbows and knees, including the trunk and the face. Exclusion criteria were recent or current respiratory infection, severe pulmonary hypertension requiring specific treatment, uncontrolled congestive heart failure, a known history of asthma, allergic alveolitis, exposure to organic dusts, and clinically significant abnormalities other than interstitial lung disease on chest radiography or HRCT.

Pulmonary Function Tests.
PFTs were performed within 1 week of the lung HRCT assessment using a flow-sensing spirometer and a body plethysmograph connected to a computer for data analysis, with the patient at rest in a seated position. Tests were performed with a computerised lung analyser (MasterScreen Diffusion, Jaeger GmbH, Höchberg, Germany). Forced vital capacity (FVC), forced expiratory volume in 1 s (FEV1), and the single breath carbon monoxide diffusing capacity of the lung (DLco) were obtained and expressed as percentages of predicted values. At least three measurements were taken for each variable to ensure repeatability.

HRCT Assessment and Visual Reader-Based Disease Quantification.
All HRCT examinations were performed according to a standard protocol using a GE LightSpeed VCT Power 64-slice CT scanner with a tube rotation time of 0.65 s. Scans were obtained at full inspiration from the apex to the lung base with the patient supine, at 120 kV and 300 mAs, with a slice thickness of 1.25 mm and a scan spacing of 7 mm. No contrast medium was administered. The parenchymal abnormalities on HRCT were coded and scored on all images by two independent readers, blinded to the clinical and functional results, according to Warrick et al. [11]. A point value was assigned to each abnormality as follows: ground-glass appearance = 1; irregular pleural margins = 2; septal/subpleural lines = 3; honeycombing = 4; subpleural cysts = 5. In each patient, the "severity of disease" score was obtained by adding the point values of the abnormalities present. The mean of the two independent readers' scores was used as the final reference value. An "extent of disease" score was obtained by counting the number of bronchopulmonary segments involved by each abnormality: one to three segments scored 1; four to nine segments scored 2; more than nine segments scored 3. The severity and extent scores were then summed to give the total HRCT score (range 0 to 30). The HRCT examinations were randomised and reviewed by two radiologists (E.B. and M.C.) with more than 15 years of experience in general and thoracic radiology, who were unaware of clinical or functional findings. Preliminary agreement between the two radiologists on the total HRCT score was good (intraclass correlation coefficient [ICC], 0.81).
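The scoring rules above can be expressed as a short Python sketch. The input format (a dictionary mapping each abnormality to the number of bronchopulmonary segments in which it is seen) and the helper names are illustrative assumptions; an abnormality is treated as present, and contributes its severity points, whenever it involves at least one segment.

```python
# Point values per abnormality, as listed in the text (Warrick et al. [11]).
SEVERITY_POINTS = {
    "ground_glass": 1,
    "irregular_pleural_margins": 2,
    "septal_subpleural_lines": 3,
    "honeycombing": 4,
    "subpleural_cysts": 5,
}


def extent_points(n_segments: int) -> int:
    """Convert the number of involved bronchopulmonary segments to 0-3 points."""
    if n_segments < 0:
        raise ValueError("segment count cannot be negative")
    if n_segments == 0:
        return 0
    if n_segments <= 3:
        return 1
    if n_segments <= 9:
        return 2
    return 3


def warrick_score(segments_involved: dict) -> dict:
    """Compute severity (0-15), extent (0-15) and total (0-30) scores.

    `segments_involved` maps each abnormality name (keys of SEVERITY_POINTS)
    to the number of bronchopulmonary segments in which it is seen.
    """
    unknown = set(segments_involved) - set(SEVERITY_POINTS)
    if unknown:
        raise KeyError(f"unknown abnormalities: {sorted(unknown)}")
    # Severity: add the point value of every abnormality that is present.
    severity = sum(SEVERITY_POINTS[name]
                   for name, n in segments_involved.items() if n > 0)
    # Extent: add 0-3 points per abnormality according to its segment count.
    extent = sum(extent_points(n) for n in segments_involved.values())
    return {"severity": severity, "extent": extent, "total": severity + extent}


def mean_reader_total(reader_a: dict, reader_b: dict) -> float:
    """Average the total scores of two independent readers."""
    return (warrick_score(reader_a)["total"] + warrick_score(reader_b)["total"]) / 2
```

For example, ground-glass opacity in five segments plus septal lines in two segments gives severity 1 + 3 = 4, extent 2 + 1 = 3, and a total of 7, which falls at the upper edge of the "mild" group used later for dichotomization.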

Computer-Aided Scoring Quantification Process.
HRCT images were reconstructed and analysed with OsiriX, a DICOM viewer [19] (OsiriX version 3.9; Apple Computer), on a Mac Mini (2.8 GHz Intel Core 2 Duo desktop computer, 16 GB random-access memory; Apple Computer, Cupertino, CA, USA) running Mac OS X 10.8.5. After the DVD containing the HRCT data was inserted in the drive, OsiriX automatically extracted the DICOM data from the disc. The DICOM data were stored in the OsiriX database using the "Copy linked files to Database folder" option in the OsiriX File menu. The program uses a semiautomated thresholding technique to isolate the lungs from other tissues and structures. Semiautomatic lung parenchymal segmentation was performed on each section so that all images could be analysed (Figure 1). Descriptive parameters of the computer analysis were then calculated. The radiodensity of the lung parenchyma isolated from the mediastinum and the thoracic wall ranges between −1024 and −200 HU. According to Shin et al. [20], the radiodensity range of ILD is −700 to −500 HU. Therefore, in the present study, thresholds of −1024 and −700 HU were used to define the nonfibrotic HRCT lung volume, and the pulmonary fibrosis fraction was calculated from these radiodensity values. Figure 1 illustrates the sequence of the OsiriX segmentation process. Minimal user intervention (by one author) was required to exclude lung structures not relevant to the assessment (i.e., the trachea, blood vessels, and large bronchi near the hilum).
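The density-mask step can be sketched in Python as follows. This is an illustrative reconstruction, not the OsiriX implementation: it assumes the input already contains only segmented lung voxels, keeps voxels inside the −1024 to −200 HU lung window, counts voxels below −700 HU as nonfibrotic, and reports the remaining share of the window as the fibrosis fraction (%). How boundary values such as exactly −700 HU are assigned is also an assumption. Because only a fraction is returned, voxel volume cancels out, provided all voxels have the same size.

```python
from typing import Iterable

LUNG_MIN_HU = -1024        # lower bound of the lung window given in the text
LUNG_MAX_HU = -200         # upper bound of the lung window given in the text
NONFIBROTIC_MAX_HU = -700  # assumption: voxels strictly below this are nonfibrotic


def fibrosis_fraction(lung_voxels_hu: Iterable[float]) -> float:
    """Percentage of lung-window voxels at or above -700 HU.

    Assumes `lung_voxels_hu` contains only segmented lung voxels, with the
    trachea, large bronchi and hilar vessels already removed.
    """
    total = 0
    nonfibrotic = 0
    for hu in lung_voxels_hu:
        if not LUNG_MIN_HU <= hu <= LUNG_MAX_HU:
            continue  # outside the lung window: ignored
        total += 1
        if hu < NONFIBROTIC_MAX_HU:
            nonfibrotic += 1
    if total == 0:
        raise ValueError("no voxels inside the lung window")
    return 100.0 * (total - nonfibrotic) / total
```

With toy input `[-900, -850, -650, -550, -100]`, the −100 HU voxel falls outside the window, two of the remaining four voxels are nonfibrotic, and the function returns 50.0.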

Statistical Analysis.
All data were entered into a Microsoft Access database developed for data management. The data were analysed using SPSS version 11.0 (SPSS Inc., Chicago, IL) and MedCalc version 10.1 (MedCalc Software, Mariakerke, Belgium). Measurement reproducibility of repeated OsiriX-based assessments and interobserver agreement between the two HRCT readers were tested using the ICC, and limits of agreement were defined as the range expected to include 95% of the differences between measurements. Feasibility of the computerized analysis by OsiriX was estimated by comparing the time spent on the quantitative analysis using the CaM with that spent on the semiquantitative HRCT CoVR analysis, using the independent samples t-test. The relationships among the lung segmentation analysis, the readers' scores, and the PFT results were calculated using linear regression analysis and Pearson's product moment correlation (r values). Student's t-test was used to compare two subgroups of the study population for continuous characteristics, and the chi-square test was used for categorical characteristics. Differences with P < 0.05 were considered significant. Receiver operating characteristic (ROC) curve analysis was also performed to determine the optimal CaM extent threshold. The ROC curve was plotted to determine the area under the curve (AUC) and the sensitivity, specificity, positive and negative predictive values, and positive likelihood ratio (LR+). Although there is no official consensus in this regard, fibrotic scores on CoVR, defined according to Warrick et al. [11], were categorised into two groups: ≤7 (mild lung fibrosis) and >7 (severe lung fibrosis). A minimum CoVR score of 7 has been reported to be required for HRCT abnormalities in SSc to be considered predictive of pulmonary disease [21]. We used this cut-off as an external criterion to dichotomize the patients.
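The text does not state which ICC model was used. As an assumption, the Python sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater) from an n × k table of scores, together with Bland-Altman limits of agreement (mean difference ± 1.96 SD of the paired differences), which is the usual meaning of the 95% range described above. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings[i][j]` is the score given to subject i by rater (or session) j.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with at least two raters")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def limits_of_agreement(a: Sequence[float], b: Sequence[float]) -> tuple:
    """Bland-Altman 95% limits: mean difference +/- 1.96 SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)
    return mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d
```

Because ICC(2,1) measures absolute agreement, a constant offset between two readers lowers it: the table `[[1, 2], [2, 3], [3, 4]]` gives 2/3, whereas a consistency-type ICC would give 1.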
The nonparametric Wilcoxon's signed rank test was used for calculation and comparison of the areas under the ROC curves (AUC-ROCs) derived from the sample of patients.
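A nonparametric AUC equals the Mann-Whitney (Wilcoxon rank-sum) statistic divided by the number of severe/mild pairs, that is, the probability that a randomly chosen severe case has a higher CaM extent than a randomly chosen mild case. The Python sketch below computes that AUC and, as an assumption about what "best compromise" means, selects the cut-off that maximizes Youden's index (sensitivity + specificity − 1), reporting the other diagnostic measures listed above at that cut-off. The function names and the toy values in the usage note are invented for illustration and are not study data.

```python
from typing import Sequence


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Nonparametric AUC: probability that a severe case scores higher than a
    mild case, counting ties as one half (normalized Mann-Whitney U)."""
    if not pos or not neg:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def diagnostic_stats(pos, neg, threshold):
    """Call a case positive when its CaM extent is >= threshold."""
    tp = sum(1 for p in pos if p >= threshold)
    fn = len(pos) - tp
    tn = sum(1 for q in neg if q < threshold)
    fp = len(neg) - tn
    sens, spec = tp / len(pos), tn / len(neg)
    return {
        "threshold": threshold,
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
        # LR+ = sensitivity / (1 - specificity); infinite when specificity is 1
        "lr_pos": sens / (1 - spec) if spec < 1 else float("inf"),
    }


def youden_optimal(pos, neg):
    """Pick the observed cut-off maximizing sensitivity + specificity - 1."""
    candidates = sorted(set(pos) | set(neg))
    return max(
        (diagnostic_stats(pos, neg, t) for t in candidates),
        key=lambda s: s["sensitivity"] + s["specificity"] - 1,
    )
```

For example, with made-up extents `severe = [25.0, 30.0, 18.0, 40.0]` and `mild = [5.0, 10.0, 12.0, 22.0]`, `auc_mann_whitney` returns 0.9375 and `youden_optimal` picks 18.0 (sensitivity 1.0, specificity 0.75, LR+ 4.0). When several cut-offs tie on Youden's index, this sketch keeps the lowest one; statistical packages may break ties differently.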

Feasibility.
The mean time spent completing the quantitative evaluation by CaM was 1.3 min (range 1 to 2.1 min), whereas it was 10.9 min (range 5.9 to 14.9 min) with the semiquantitative visual assessment. The difference was highly significant (Student's t-test, P < 0.0001).
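The feasibility comparison uses Student's independent samples t-test. The Python sketch below computes the pooled-variance t statistic and its degrees of freedom; the function name is hypothetical, and converting t to a P value requires the t distribution (for example, `scipy.stats.ttest_ind(a, b)`, which assumes equal variances by default, returns both).

```python
import math
import statistics
from typing import Sequence


def student_t_independent(a: Sequence[float], b: Sequence[float]) -> tuple:
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df
```

Applied to per-patient reading times (CaM versus visual scoring), a large negative t would indicate that the CaM is faster; the pooled-variance form matches the "Student's" test named in the text, whereas Welch's test would drop the equal-variance assumption.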

Discussion
An accurate characterization and quantification of ILD is essential for the correct clinical management of patients with SSc [10,11,22]. Our results indicate that the CaM implemented in OsiriX provides good concurrent validity, reliability, and feasibility for the assessment of ILD in patients with SSc. Given the increasing availability of user-friendly software [19], this approach may be used effectively in both clinical practice and research settings.
To date, different visual semiquantitative approaches to assess ILD have been proposed [11,22]. In most of them, the final score was calculated either by consensus between two reviewers or as the mean of their reading scores. Warrick et al. [11] proposed a semiquantitative method in patients with SSc that evaluates the different patterns of abnormality, rated according to the severity and extent of lung damage, through a total overall HRCT score. Kazerooni et al. [22] divided each lung into three zones (lung apex to aortic arch, aortic arch to inferior pulmonary veins, and inferior pulmonary veins to lung bases) and scored the extent of lung abnormality in each zone on a scale ranging from 0 to 4. More recently, a simplified scoring system based on whether lung involvement is greater or less than 25% has also been suggested [4].
The CoVR method plays an important role in the interpretation of ILD patterns. Moore et al. [23] showed that a simple and quick grading system for the extent of total lung disease on HRCT has prognostic significance in SSc, even after adjustment for other prognostic covariates. Similarly, Goh et al. [4] demonstrated that an easily applicable limited/extensive staging system for SSc-ILD, based on combined evaluation of HRCT and PFTs, provides discriminatory prognostic information. In particular, the risk of death and, separately, of disease progression in SSc-ILD rose strikingly when the overall percentage of lung involved on HRCT exceeded 20% [4]. This threshold value was defined using formal CoVR scoring systems, which are seldom practicable in routine practice.
Although the CoVR is currently the most popular method [24], it has several disadvantages, such as subjectivity and difficulty in accurately estimating the different components of disease (honeycombing, reticular and linear opacities, and ground-glass opacity). A further difficulty is the complex task of integrating the extent of the abnormalities seen on several HRCT slices to derive a quantitative measure of the total extent of abnormality within a lung zone or the whole lung. Finally, CoVR scoring systems lack reproducibility, with large interreader and intrareader variation [9]. Compared with visual assessments, CaM scores have been shown to improve objectivity, sensitivity, and repeatability when measuring quantitative changes in ILD [16,17].
As mentioned above, our preliminary experience with this system in SSc patients showed high agreement with the semiquantitative HRCT analysis performed by experienced radiologists, and a significant association between the descriptive parameters of the quantitative OsiriX assessment and the semiquantitative HRCT analysis [17]. It has previously been shown that lung density varies considerably among normal individuals, and this factor should be taken into account when considering CT lung density mapping for the assessment of pulmonary disease. However, the radiodensity of the lung parenchyma isolated from the mediastinum and the thoracic wall ranges between −1024 and −200 HU. The CT attenuation of normal lung parenchyma is reported to range from −900 to −800 HU, depending on the respiratory phase (inspiration or expiration), the level of inspiration achieved during the scan, and the anatomical location (ventral or dorsal) [25]. Shin et al. [20] defined areas with attenuation between −700 and −500 HU as ILD, including both ground-glass and reticular opacities. In contrast, Yabuuchi et al. [26] used thresholds of −800 and −500 HU to evaluate ground-glass opacity; they also separated the CT attenuation values of consolidation and ground-glass opacity, selecting −500 HU as the threshold between the two. However, a threshold of −800 HU may include small peripheral pulmonary vessels and cause overestimation of interstitial lung disease [27]. In our method, in agreement with Shin et al. [20], −700 HU was selected as the predefined threshold separating normal from diseased lung regions. Our current results confirm that the quantitative OsiriX assessment correlates well with visual scoring techniques for the detection of HRCT extent and severity of disease [17].
The percentage extent of lung disease showed a significant correlation (P < 0.0001) with FVC, FEV1, and DLco.
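The correlation and regression analyses can be illustrated with a short Python sketch computing Pearson's r, the least-squares line, and the t statistic t = r√((n − 2)/(1 − r²)) used to test r against zero with n − 2 degrees of freedom. The function name and the example values are hypothetical, not study data; `scipy.stats.pearsonr` is one library routine that also returns the P value.

```python
import math
from typing import Sequence


def pearson_regression(x: Sequence[float], y: Sequence[float]) -> dict:
    """Pearson's r, least-squares slope/intercept of y on x, and t for r."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    # t statistic for H0: rho = 0; infinite for a perfect linear relation
    t = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
    return {"r": r, "slope": slope, "intercept": intercept, "t": t}
```

With x as CaM extent (%) and y as FVC (% predicted), a negative r and slope would express the inverse relationship reported above.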
In a purely clinical context, we have shown that the CaM scoring system for ILD may have several advantages in the management of SSc patients. First, OsiriX-based measurement uses a continuous scale rather than a categorical Likert scale. Compared with the categorical ratings used in visual (semiquantitative) assessment, continuous computerized scoring provides greater power for detecting a treatment effect within a given sample size, or allows an approximately 50% reduction in sample size [28]. The ability to obtain the percentage extent of total lung disease easily, with a simple and rapid procedure performed by the rheumatologist in an outpatient setting, is a clear advantage for assessing responsiveness and obtaining prognostic information. Second, the OsiriX segmentation algorithm proved to be time-efficient and reproducible, requiring on average less than two minutes for the total lung evaluation. A third advantage is that the OsiriX-based computerized scoring system can be implemented in multicenter trials in SSc-ILD using digitized HRCT images. Finally, OsiriX is user-friendly open-source software [17,19] with which rheumatologists can easily manipulate images, generate 3D reconstructions, and acquire whole images of 3D anatomical structures. Training in the use of OsiriX can be completed easily and quickly [19], providing clinicians with a valuable tool for evaluating disease extent and interpreting patterns of pulmonary function impairment in SSc patients.
We are aware of some limitations of our study. First, the diagnosis of pulmonary fibrosis was based on radiological findings rather than on histological examination. Second, the CaM scoring system used in this study quantifies total disease extent and does not differentiate between radiographic patterns, although the clinical significance of these individual HRCT features is as yet unknown. However, disease extent has been shown to be a strong predictor of functional pulmonary impairment. Furthermore, our quantitative evaluation did not address anatomic compartments of the lung; however, unlike emphysema, ILD tends to be widespread. Formal HRCT scoring, especially in clinical trials, is commonly performed at predefined anatomic levels rather than by pulmonary lobe, because noncontiguous HRCT protocols are still widely used for radiation protection. Third, our density mask method could not accurately discriminate the low attenuation areas of honeycombing from normal lung density when honeycombing cysts are present, which may lead to underestimation of ILD severity. The discrepancy in quantification between the CaM scoring system and the CoVR method may therefore be intrinsic to densitometric analysis. In this regard, the usefulness of automated systems has been criticized because of the frequent coexistence of increased lung density (ground-glass opacities) and decreased lung density (cystic spaces, honeycombing) [29]. Finally, the sample size of our study was limited and the effect of pulmonary hypertension was not assessed, which may limit comparison with measures of disease severity.
In conclusion, our results show that the CaM, using the open-source DICOM application OsiriX, may assist rheumatologists in analysing lung HRCT data and provides an objective method for supplementing subjective visual grading of the extent of ILD, achieving precise and reader-independent quantification. Compared with previous in-house software, OsiriX may enable wider use, facilitating the application of computer-aided techniques in routine practice and communication among different hospitals.
The computer-derived extent of total lung disease appears to be discriminant and can therefore help to provide an objective measure and prognostic information in SSc-ILD. Although these encouraging data require further validation in prospective studies, we believe that the CaM may improve the ability of rheumatologists to quantify the extent of ILD accurately in SSc patients in both daily clinical practice and clinical trials.

Disclosure
All the authors declare that they have not received any financial support or other benefits from commercial sources for the work reported in this paper nor any other financial interests that could create a potential conflict of interest or the appearance of a conflict of interests with regard to the work.