Enhanced Classification of Interstitial Lung Disease Patterns in HRCT Images Using Differential Lacunarity

The analysis and interpretation of high-resolution computed tomography (HRCT) images of the chest in the presence of interstitial lung disease (ILD) is a time-consuming task which requires experience. In this paper, a computer-aided diagnosis (CAD) scheme is proposed to assist radiologists in the differentiation of lung patterns associated with ILD and healthy lung parenchyma. Regions of interest were described by a set of texture attributes extracted using differential lacunarity (DLac) and classical methods of statistical texture analysis. The proposed strategy to compute DLac allowed a multiscale texture analysis, while maintaining sensitivity to small details. Support Vector Machines were employed to distinguish between lung patterns. Training and model selection were performed over a stratified 10-fold cross-validation (CV). Dimensional reduction was made based on stepwise regression (F-test, p value < 0.01) during CV. An accuracy of 95.8 ± 2.2% in the differentiation of normal lung pattern from ILD patterns and an overall accuracy of 94.5 ± 2.1% in a multiclass scenario revealed the potential of the proposed CAD in clinical practice. Experimental results showed that the performance of the CAD was improved by combining multiscale DLac with classical statistical texture analysis.


Introduction
Interstitial lung disease (ILD) is a common name for a heterogeneous group of complex disorders affecting lung parenchyma. The ILD affects similar lung regions and has identical clinical, radiological, and functional tests which hinder the differential diagnosis. However, ILD subtypes have different prognoses and treatments, so a correct diagnosis is essential [1]. High-resolution computed tomography (HRCT) imaging of the chest can offer such good image quality that it has become essential in the detection, diagnosis, and followup of ILD [2]. HRCT images of patients affected with ILD have specific patterns whose distribution and visual content analysis is particularly relevant in elaborating an accurate diagnosis [3].
Multidetector row computed tomography (CT) scanners generate a huge volume of data that must be visually examined by radiologists. This task is very time-consuming and requires experience, especially in the presence of ILD. Computer-aided diagnosis (CAD) for ILD is seen as a necessary tool to reduce interobserver and intraobserver variations, as well as to improve diagnostic accuracy by assisting radiologists in the detection, characterization, and quantification of pathological regions [3][4][5][6][7][8][9][10][11][12][13].
In this paper, a CAD scheme is presented allowing for a classification of regions of interest (ROIs), from HRTC images, in four classes of lung patterns: normal (NOR), ground glass (GG), honeycombing (HC), and emphysema (EMP). A scenario of binary differentiation, NOR class versus pathological class, is also considered. A generic flowchart of the proposed approach is shown in Figure 1. Classical statistical methods were used to extract and quantify texture information. The first-order (FO) analysis, the Spatial Gray Level Dependence Method (SGLDM), and the Gray Level spatial interaction between two or more pixel values. These methods have frequently been used in texture analysis of medical images, namely, in the description of ILD patterns [4,[8][9][10][11][12][13]. Given the heterogeneity of lung parenchyma in healthy subjects or in the presence of pathologies, a multiscale texture analysis was proposed using differential lacunarity (DLac). Lacunarity has been successfully used in analyzing medical images of different organs or structures, acquired by different types of equipment. In [14,15], fractal lacunarity analysis was applied to lumbar vertebra magnetic resonance images in order to extract relevant parameters, allowing for differentiation among three types of trabecular bone structure, from female subjects with different age and physiopathological status. In [16], lacunarity was combined with mean fractal dimension to differentiate between aggressive and nonaggressive malignant lung tumors, in sequence of contrast-enhanced CT images. An 83.3% accuracy can be valuable information in the choice of the appropriate treatment procedure. In [17], lacunarity analysis was applied for discriminating endoscopic images, obtained through a wireless capsule endoscopy technique, related to a common interstitial disease: ulceration. A promising classification accuracy of over 97% was obtained. In [18], lacunarity was applied to HRCT images of the chest to differentiate between normal and emphysematous regions of lung parenchyma. The preliminary results showed the potential of the proposed lacunarity features.
After a feature selection procedure, the obtained features were used to classify each ROI through a Support Vector Machines (SVM) algorithm. This learning algorithm has its origin in statistical learning theory and structural risk minimization [19,20]. It emerged as an efficient technique for solving classification problems. A comparative study between SVM and other popular classifiers was performed by Meyer et al. [21]. The results highlight that SVM classifiers are among the best. In [5], five common classifiers were compared according to their ability to differentiate six lung tissue patterns in HRCT images. The results showed that SVM provides the best trade-off between the error rate and the capacity for generalization, an important aspect to take into consideration given the diversity of pulmonary patterns.

Texture Analysis.
Texture is a major component in the interpretation of HRTC images in the presence of ILD. The most difficult aspect of texture analysis is to define a set of meaningful features that describe the texture associated with different lung patterns. Each ROI of × pixels was represented by a set of features extracted using the methods described in the sections below.

First-Order Statistics
Analysis. The CT attenuation of each ROI was described through FO statistical features extracted from ROI normalized histogram. Considering that is the number of gray levels used in ROI quantization, the normalized histogram ℎ( ), 0 ≤ < , gives the probability of observing the gray level in the ROI. From ℎ( ), six statistics features were computed: the mean, variance, skewness, kurtosis, energy, and entropy [22].

Spatial Gray Level Dependence
Method. The method of texture analysis proposed by Haralick et al. [23] describes the spatial dependence of gray level distribution between neighboring pixels. In the SGLDM, the second-order joint conditional probability distribution ( , | , ) can be estimated for a defined length and along a defined direction given by offsets in and direction: and . So, ( , | , ) is the probability that two pixels at a distance given by ( , ) have the gray levels and . The function ( , | , ) is defined as follows: (1) ROI( , ) is the intensity at pixel ( , ) and ( , ) is the total number of pixels pairs belonging to the ROI in the length and direction given by ( , ). The functions ( , | , ) can be written in matrix form Ω( , ) = ( , | , ), 0 ≤ , < , where is the maximum gray level of the ROI. For each pair ( , ), a different matrix Ω( , ) can be computed. Often, each matrix Ω( , ) is calculated taking into account a given offset and its opposite, giving rise to symmetrical matrices. In this study, from each matrix, six textural measures were extracted: angular second moment, entropy, inverse difference moment, correlation, contrast, and variance [23,24].

Gray Level Run-Length Method.
Run-length primitives were computed by the GLRLM [25]. A run-length primitive is a consecutive and collinear set of pixels with the same gray level. These primitives can be characterized by their length, direction, and gray level. Each chosen direction gives rise to a run-length matrix Ψ( ) whose elements represent the number of runs with gray level intensity and length , along the direction : where is the number of gray levels and is the possible maximum run-length in ROI along direction. From each run-length matrix Ψ( ), eleven features were extracted, listed, and described in [24][25][26][27].

Lacunarity Analysis.
Most of the textures and natural surfaces tend to have a fractal dimension (FD) that can be seen as a measure of irregularity [28]. However, different textures and natural surfaces can share identical FD. In order to differentiate these types of fractal patterns, Mandelbrot [29] proposed lacunarity, a complementary measure of FD that describes the texture of a fractal or their deviation from translational invariance [30]. More recent studies introduced lacunarity analysis as a technique that can be used to describe general spatial patterns, regardless of whether it is a fractal [31]. By using lacunarity, it is possible to distinguish the texture of spatial patterns through the analysis of their distribution gap sizes, at different scales.
Due to the extensive range of gray levels used in CT images acquisition, an appropriate algorithm to calculate lacunarity is that proposed in [32], called DLac. It is based on the gliding box [33] and the differential box counting algorithms [34].
According to DLac algorithm, the ROI is divided into overlapped windows of size × pixels, which scans the entire ROI, and a box of size × pixels, which scans each window ( < ). The box is placed on the left corner of the window and a column of accumulated cubes of size × × is used to cover the ROI intensity surface in the box place ( Figure 2). A sequential number is assigned to each cubic box, from bottom to top. Considering that the maximum and minimum pixel values lie in the cubic box V and , respectively, the differential height of the column is given by ( , ) = V− −1, where ( , ) is the box position. The box mass of the window , at specific coordinates, is obtained gliding the box inside the entire window : Considering ( , ), the number of windows with box mass calculated through a box , the respective probability function ( , ) is obtained by dividing ( , ) by the total number of windows. The DLac of the ROI for a box , given a window , is defined as follows:

Feature Selection.
After performing feature extraction, it is important to proceed with the selection of the most informative features. The resulting set of optimal features improves the classifier performance, while providing a reduction of the general data, as well as a better understanding of the data. The feature selection methods can be divided into two main groups: the filter methods and the wrapper methods. In the filter methods, the features are ordered based on a relevance index. In the wrapper methods, the process of feature selection involves the predictor. In these methods, subsets of features are scored during the learning machine training according to their predictive power [35].
In this work, the reduction of dimensionality was performed using the filter method stepwise regression [36]. In this systematic method, terms are added or removed from the multilinear model based on their statistically significant value of -statistics. The method begins with an initial model to which terms that have values less than an entrance tolerance are added, step by step, and the model terms with values greater than an exit tolerance are removed from the model.

Support Vector
Machines. The reduced number of parameters that need to be tuned as well as the good trade-off between the error rate and the capability of generalization of SVM classifier algorithm was decisive for its choice in the classification of lung patterns [5,21].
The SVM strategy, known as the kernel trick, is to map the input data space into a higher dimension feature space, via a nonlinear function kernel B : R → I, where separability between classes is improved. The distance between the nearest points of the two classes (margin) is maximized, creating an optimal separating hyperplane (OSH).
Considering the training data {x , }, = 1, . . . , , x ∈ R , ∈ {+1, −1}, each instance x is characterized by a vector of features (or attributes) and is associated with a class +1 or −1. The SVM machine learning solves the following quadratic optimization problem: min , , where w is a normal vector to OSH and is the bias. In hard margin SVM all the examples have to stay outside the margin and be well classified. However, in real datasets, it is necessary to deal with outliers that can be inside the margin or on the wrong side of the classification boundary. The solution proposed in [37], as the soft margin SVM, is to introduce constraint slack variables in the optimization problem. Ideally, these variables should be zero or have small values. So, to minimize the contribution of the slack variables, a penalty term is added to the objective function (5). This parameter is a trade-off between the maximization of the margin and the minimization of training errors. For a test example x, the decision function is given by There are several functions kernels (x, x ) = Φ(x)⋅Φ(x ) which can be selected to solve nonlinear problems. In this work, the Gaussian Radial Basis Function (RBF) was used: (x, y) = exp(−‖x − y‖ 2 )/(2 2 ). This function only has one parameter ( ) that has to be tuned during the classifier training and model selection.
As the standard SVM is a binary classifier, several methods were developed to extend SVM to an -class problem. Typically, these methods are based on combinations of binary classifiers such as one-versus-all and one-versus-one. In the one-versus-all approach, binary classifiers are trained. For example, the model of the th classifier is trained using the training instances of the -class as positive and all the instances of the other classes as negative. To classify a new instance, all the classifiers are run on this instance. The assigned class corresponds to the classifier which returns the largest distance from the separating hyperplane. In the oneversus-one approach ( −1)/2 classifiers are trained in a pairwise methodology, where each takes one class as positive and the other class as negative. To classify a new example, each classifier is run and a count is assigned to each class selected by the classifier. The new instance is classified as belonging to the class which obtains the greatest number of wins, such as in a winner-takes-all voting scheme [38]. A user friendly software was developed to visualize CT exams, to outline freehand ROIs (FH-ROIs), and to label and to characterize each FH-ROI [39]. HRCT scans were acquired using multidetector row CT LightSpeed VCT 64, from General Electric Healthcare, with an average voxel size of 0.7 × 0.7 × 1.3 mm 3 , without contrast agent. Each image was stored in a matrix of 512 × 512 pixels, with 16-bit gray level, using DICOM standard. Each image was displayed using a lung window with a centre in −700 Hounsfield Units (HU) and a width of 1500 HU. From CT images of 57 subjects (#29 female; #28 male) with an average age of 61 ± 16 years, radiologists outlined FH-ROIs from patients in different stages of disease.
The area and shape of each FH-ROI depend on the size and localization of the lung patterns. No more than one FH-ROI was selected from each side of the lungs. The lung region of each FH-ROI was sampled and covered with contiguous, nonoverlapping ROIs of 40 × 40 pixels [24]. Each FH-ROI was sequentially numbered and each ROI holds the reference of the FH-ROI from where it was extracted. For example, the FH-ROIxSy corresponds to ROI y extracted from FH-ROI x. Only the ROIs one hundred percent inside the FH-ROI boundary were considered in the train and test of the classifier; all the other ROIs were discarded. For example, in Figure 3 only the ROIs 8, 13, 14, and 18 respect the constraint. Table 1 resumes the dataset used to train and evaluate the proposed CAD system.

Model Selection and Performance Evaluation.
The dataset D (#1261 ROIs) was divided into a training set and a testing set in a proportion of 2/3-1/3, respectively. The samples were randomly selected using a holdout strategy with stratification, which ensures mutually exclusive partitions where the class proportions are roughly the same as those in the original dataset D [40]. The holdout procedure was based on FH-ROIs in order to ensure that ROIs extracted from the same FH-ROI are placed on only one of the sets, train or test.  During SVM training, a search was carried out to find optimal parameters to create the classifier model. In the case of the selected RBF kernel function, the parameters that have to be tuned were and , the regularization parameter that corresponds to a penalty over the training errors. The search for the optimal parameters was done using a grid search methodology in the hyperparameter space. So, for every point of the hyperparameter space, a -fold stratified cross-validation (CV) was performed, with = 10 [40,41]. The train set was randomly split into mutually exclusive folds 1 , 2 , . . . , , with approximately the same proportion of each class as in D. During CV, the classifier was trained and tested times. In each iteration, it was trained on − 1 folds and tested in the remaining fold , with ∈ {1, 2, . . . , }. The average of the -fold accuracy corresponds to CV accuracy. To avoid the model overfitting, the feature selection procedure was included in CV loop [35]. The parameters and features that allow the best CV accuracy were selected and a fine grid search was carried out around the selected parameters, for refinement. The final classifier model was built using all the training set, the selected features, and the optimal parameters previously found. The obtained model was evaluated in the test set, which was not used during classifier training.

⋅ ⋅ ⋅
The performance evaluation of the classifier was performed based on a contingency table, as exemplified in Table 2 for -class. Each matrix element has two indices; the first one corresponds to actual disease, while the second one corresponds to predicted disease. The elements of the main diagonal have equal indices representing correct classifications. All the other elements of the matrix correspond to incorrect classifications. For example, 31 means that a patient with disease 3 was misclassified as having disease 1. In the case of a binary classification, there are only normal and pathological classes, in a one-versus-all configuration.
After classification, the contingency table was filled with the obtained results. From these values, it is possible to compute a set of metrics allowing for the evaluation of the classifier performance. A common performance evaluation is overall accuracy, which measures the proportion of correctly classified instances for all the classes. Sensitivity of class measures the fraction of actual positive instances of that class that are correctly classified, while precision measures the correctness of the predictions for class . Specificity of class measures the fraction of actual negative instances of class that are correctly classified. These metrics can be computed by the following expressions [42]: Precision ( ) = ∑ , 6 BioMed Research International 2.6. Feature Settings. Each ROI was characterized by a feature vector extracted using FO, SGLDM, GLRLM, and DLac. The ROIs were quantized to 32 gray levels before the extraction of FO, SGLDM, and GLRLM features. The minimum and maximum HU values were calculated for all ROIs of D and each ROI was quantized according to this range. In SGLDM and GLRLM, the directions = {0 ∘ , 45 ∘ , 90 ∘ , 135 ∘ } were considered. In GLDM, the distance between neighboring pixels was = 1, in the four directions.
A multiscale approach was required due to the high variability of the appearance of lung patterns, even for the same pattern. The selected approach to calculate DLac allows a texture analysis at different scales by changing the value of , for a box size . The size of the window determines the coarseness of the scale. The size of should be relatively small in order to maintain sensitivity to small details present in the neighboring areas. Equation (8) illustrates the proposed approach to extract DLac features: DLac was computed for every box-window combination, subject to the condition < , in order to evaluate the DLac features that better differentiate the lung patterns. A DLac curve Λ( , = const) can be obtained by keeping the size of the gliding box constant and by changing the size of the window . To assure a common referential, DLac values were normalized in relation to the DLac value corresponding to the smallest window : Λ( min , = const) [43]. In order to take advantage of the extensive scale used in CT images, the curves of normalized DLac were computed using Hounsfield scale [−1000 UH; +1000 UH].

Results and Discussion
Two scenarios were considered in order to evaluate the potential of the proposed CAD and the importance of DLac features in the CAD performance improvement. In the first approach, the differentiation between normal and ILD patterns was considered. The next step was the differentiation of the four classes. In both cases, the feature vector was obtained using two different sets. Set 1 includes the features from FO + SGLDM + GLRLM. Set 2 also englobes DLac features.
The DLac features were extracted from DLac normalized curves. Various experiments have been conducted computing DLac curves for every box-window size for =  pixels and =  pixels. The ability to differentiate the four classes was evaluated. The best results were obtained for DLac normalized curves for = 4 pixels and =  pixels. Figure 4 shows the average of normalized DLac curves for patterns of all the dataset D. The results show that the DLac normalized curves are able to distinguish between lung patterns, being suitable to extract informative features.
The multiclass classification was performed using oneversus-one implementation [44]. In the case of the RBF kernel function, the parameters optimization was performed for the pair ( , ). First, the parameters were evaluated using a coarse grid for   Figure 5 depicts a graphic of contours of CV accuracy, obtained over a 10-fold CV using features of Set 1, for the binary classification scenario. A heuristic analysis of these curves provides a clear understanding of the influence of the parameters in the classifier performance, as well as clues to reduce search space. The results showed the importance of fine tuning the SVM parameters during the classifier training phase to achieve an optimized model. After some experiments, the search grid was reduced to = 2 3 , 2 3.5 , . . . , 2 13 and = 2 −2 , 2 −1.5 , . . . , 2 1 . For every coordinate of the hyperparameter space, a -fold CV was performed, with = 10. In each of the iterations feature selection was performed ( -test, value < 0.01) in the − 1 training folds. If the coordinates (2 , 2 ) generate the best CV accuracy, a finer search was performed around these values with a step of 0.25 upward and downward.
After the classifier training, the selected model was evaluated in the testing set. The training and testing of the classifier were repeated over fifty iterations. So, the training and evaluation of the classifier were performed in fifty different sets.  The presented metrics are the average of the results obtained over all the iterations.
In the binary classification scenario, the ROIs with normal pattern (#253) were considered as negative instances and the other ones as positive instances (#1008). In Table 3, the mean and standard deviation (SD) of overall accuracy, sensitivity, precision, and specificity are shown. The classifier performance obtained using features of Set 1 was 94.4 ± 2.0% for accuracy, 96.7 ± 1.2% for sensitivity, 96.0 ± 2.1% for precision, and 84.8 ± 8.6% for specificity. Using Set 2, the results increased to 95.8 ± 2.2% for accuracy, 97.9 ± 1.1% for sensitivity, 96.9 ± 2.1% for precision, and 88.1 ± 8.0% for specificity. High sensitivity and small SD values showed that the proposed CAD has the ability to signal the presence of abnormal patterns using both sets of features; that is, the number of false negatives is low. The integration of DLac features has primarily increased the specificity value, 3.3% on average, reducing the number of false positives. However, the SD value remains high (8.0). The correct classification of NOR class instances is not easy due the high variability of healthy lung tissue.
The classifier performance in the multiclass scenario was also improved using Set 2 (Tables 4 and 5). The overall accuracy increased from 91.9±1.9% to 94.5±2.1%. Moreover, the class-specific metrics for NOR, GG, and HC improved in a higher or lower percentage. For class EMP sensitivity slightly increased from 96.9% to 97.3%, the precision and the specificity maintained excellent values of 99.9%. Sensitivity of NOR class was the metric that most improved with a mean increase of about 5.3%, changing from 87.2% to 92.5%. In the case of NOR class these results mean that classspecific false negatives decreased; that is, the number of instances of NOR class that were categorized as pathological instances is smaller. In a clinical environment, this means that fewer patients are subjected to the stress of unnecessary additional medical exams. NOR class-specific precision and specificity improved from 89.6% to 92.3% and from 97.3% to 97.9%, respectively. These results mean that the number of NOR classified as GG GG classified as NOR Figure 6: Examples of misclassified ROIs between GG and NOR classes.
false positives for NOR class, that is, pathological instances classified as normal, decreased with Set 2. This type of misclassification has a serious meaning; that is, the CAD system does not signal the presence of a pathological pattern. The number of false negatives and false positives of GG and HC classes also decreased with the presence of DLac features increasing the correct classification of GG and HC instances. SD decreases in all metrics and classes, except for sensitivity of EMP class. So, the DLac features also improved the classifier stability. The highest percentage of misclassified examples occurred among NOR and GG classes. Almost fifty percent (47.8%) of all the classification errors were due to incorrect classifications between these two classes. Figure 6 illustrates some random examples of normal ROIs that were classified as GG, on left column, and examples of ROIs with GG pattern that were classified as NOR, on right column. Although GG opacities are characterized by areas of increased attenuation, sometimes they are not dense enough to "hide" the bronchovascular markings, especially in the initial phases of ILD diseases, associated with the presence of GG patterns.

Conclusions
A CAD scheme applied to HRCT images of the chest was proposed for the classification of healthy lung regions and with the presence of ILD. A texture analysis was performed to describe the lung patterns in study. Texture information of each ROI was represented by features extracted using a multiscale DLac approach combined with features obtained by classical statistical texture analysis methods. Feature selection and SVM training was performed over a 10-fold stratified CV. The performance evaluation of the classifier model was assessed using an independent test set.
Experimental results showed that DLac features improve the performance of the proposed CAD system in both suggested scenarios: normal versus pathological and multiclass. In this case, the number of false negatives and false positives of NOR class decreased, as well as the misclassification between instances of pathological classes. Differentiating the normal pattern from pathological patterns, the classifier accuracy improved with an average of 1.4% when DLac features were considered, resulting in a correct classification of 95.8 ± 2.2% of all instances. In the multiclass scenario the overall accuracy was improved from 91.9 ± 1.9% to 94.5 ± 2.1% due to the presence of DLac features. The performance of the proposed CAD highlights the good discriminatory properties of extracted DLac features, making it suitable to integrate clinical applications for the classification of patterns associated with ILD.