Haralick texture features are a well-known mathematical method for detecting lung abnormalities, and they give the physician the opportunity to localize the type of abnormal tissue, either lung tumor or pulmonary edema. In this paper, the performance of the proposed method is reported through a statistical evaluation of the different features. CT datasets from thirty-seven patients with either lung tumor or pulmonary edema were included in this study. The CT images are first preprocessed for noise reduction and image enhancement, then the lungs are segmented, and finally Haralick texture features are used to detect the type of abnormality within the lungs. Despite the low contrast and high noise in the images, the proposed algorithms give promising results in detecting lung abnormalities in most of the patients in comparison with normal subjects, and they suggest that some of the features are more strongly recommended than others.
The lung is an organ that performs a multitude of vital functions every second of our lives. Lung abnormalities are therefore life-threatening diseases that have high priority in detection, diagnosis, and treatment where possible. This paper focuses on two common abnormalities within the lung: pulmonary edema and lung tumor. Pulmonary edema (water in the lungs) is caused by fluid building up in the air sacs of the lungs [
Computer-aided diagnosis (CAD) schemes for thoracic computed tomography (CT) are widely used to characterize, quantify, and detect numerous lung abnormalities, such as pulmonary edema and lung cancer [
The aim of our work is to develop a novel automated method, based on texture analysis, for segmenting the lungs and detecting abnormalities, whether pulmonary edema or lung tumor. Haralick’s features based on the gray level cooccurrence matrix (GLCM) are applied to capture textural patterns in lung images. The objective is to select the most discriminating, statistically significant texture features that can differentiate between these two types of abnormalities, in comparison to normal lungs.
Haralick features are statistical features that are computed over the entire image. These measurements are utilized to describe the overall texture of the image using measures such as entropy and sum of variance. Chaddad et al. propose an approach, based on Haralick’s features, to detect and classify colon cancer cells. This work aimed to select the most discriminating parameters for cancer cells [
In this paper, CT images are first preprocessed for noise reduction and image enhancement, followed by segmentation techniques, as the tools to segment the lungs, and finally Haralick texture features [
This paper presents a new automatic lung cancer detection system based on Haralick texture features extracted from slices of DICOM lung CT images. The proposed system works in four stages: image preprocessing, lung image segmentation, feature extraction, and classification. Statistical analysis is used to obtain the best features for classification, in order to differentiate between lung cancer patients, ordered edema patients, and control subjects. The following sections describe these stages in detail. All image analyses were performed without any knowledge of patient clinical characteristics or status.
Patients with either a lung cancer tumor or pulmonary edema were included in the study. Two datasets were used. The first was obtained from the Radiology Department at New Elkasr ElAiny teaching hospital, University of Cairo. The other was obtained from The Cancer Imaging Archive (TCIA) sponsored by the SPIE, NCI/NIH, AAPM, and the University of Chicago [
The main goal of preprocessing is to improve the quality of an image and to put it in a form suited for further processing by human or machine [
Figure
(a) The lung CT image; (b) the histogram equalized image; (c) the Wiener filtered output image.
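The two preprocessing steps named in the caption can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names `equalize_histogram` and `preprocess`, the 8-bit input range, the 5×5 filter window, and the order of the two steps (equalization first, then Wiener filtering) are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import wiener


def equalize_histogram(image, levels=256):
    """Histogram-equalize an 8-bit image (integer values in [0, levels - 1])."""
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]               # CDF value at the darkest occupied level
    denom = max(cdf[-1] - cdf_min, 1)       # guard against a constant image
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[image]                       # map every pixel through the lookup table


def preprocess(image, window=5):
    """Contrast enhancement followed by adaptive Wiener noise reduction.

    The step order and the window size are assumptions for this sketch.
    """
    equalized = equalize_histogram(image)
    return wiener(equalized.astype(float), mysize=window)
```

`scipy.signal.wiener` estimates the noise power from the local variance when no `noise` argument is given, so a perfectly constant image should be avoided in this sketch.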
The lung segmentation step aims to extract the voxels corresponding to the lung cavity in the axial CT scan slices from the surrounding anatomy. The segmentation technique proposed in [
(a) The threshold image; (b) the eroded image; (c) the lung mask mirror; (d) the mask projection of the corresponding lung images; (e) the extracted lungs.
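The sequence in the caption (threshold, erosion, mask, projection onto the image) can be sketched roughly as below. This is not the cited segmentation technique, only an illustration of the general idea: the function name `segment_lungs`, the threshold value, the number of erosion iterations, the removal of border-connected air, and the choice of keeping the two largest components are all assumptions.

```python
import numpy as np
from scipy import ndimage


def segment_lungs(image, threshold=100, erosion_iterations=2):
    """Return (mask, extracted_lungs) for a 2D slice; parameters are illustrative."""
    # Lung parenchyma appears darker than the surrounding tissue.
    dark = image < threshold

    # Discard dark regions that touch the image border (air outside the body).
    labels, _ = ndimage.label(dark)
    border_labels = np.unique(np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]))
    candidate = dark & ~np.isin(labels, border_labels)

    # Erode to remove thin structures and small specks.
    eroded = ndimage.binary_erosion(candidate, iterations=erosion_iterations)

    # Keep the two largest remaining components as the left and right lungs.
    labels, count = ndimage.label(eroded)
    if count == 0:
        return np.zeros(image.shape, dtype=bool), np.zeros_like(image)
    sizes = np.bincount(labels.ravel())[1:]
    keep = np.argsort(sizes)[-2:] + 1
    mask = np.isin(labels, keep)

    # Grow the mask back toward the original boundary, then project it on the image.
    mask = ndimage.binary_dilation(mask, iterations=erosion_iterations) & candidate
    return mask, np.where(mask, image, 0)
```

Labeling the final mask with `ndimage.label` gives one component per lung, which is one simple way to process the right and left lungs separately.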
Feature extraction is the process of obtaining higher-level information of an image such as color, shape, and texture. Texture is a key component of human visual perception. Statistical texture methods analyze the spatial distribution of gray values, by computing local features at each point in the image and inferring a set of statistics from the distributions of the local features. Haralick et al. introduced Gray Level Cooccurrence Matrix (GLCM) and texture features back in 1973 [
GLCM shows how often each gray level occurs at a pixel located at a fixed geometric position relative to each other pixel, as a function of the gray level [
Contrast (Moment 2 or standard deviation) is a measure of intensity or gray level variations between the reference pixel and its neighbor. Large contrast reflects large intensity differences in GLCM:
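The equation itself is not reproduced at this point, so the following is only a minimal sketch under the standard GLCM definitions, which may differ in normalization from those used in this study. With a normalized cooccurrence matrix p(i, j), contrast is commonly written as the sum over all i, j of (i − j)² p(i, j). The function names `glcm` and `texture_features`, the single pixel offset, the symmetric counting, and the number of gray levels are assumptions for the example.

```python
import numpy as np


def glcm(image, dx=1, dy=0, levels=8, symmetric=True):
    """Normalized gray level cooccurrence matrix for the offset (dy, dx).

    `image` must hold integers in [0, levels - 1]; dx and dy must be >= 0.
    """
    h, w = image.shape
    ref = image[:h - dy, :w - dx]           # reference pixels
    nbr = image[dy:, dx:]                   # neighbors at the chosen offset
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    if symmetric:
        counts += counts.T                  # count each pair in both directions
    return counts / counts.sum()


def texture_features(p):
    """A few standard features computed from a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]                      # skip empty cells so log2 is defined
    return {
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "asm": float(np.sum(p ** 2)),       # angular second moment
        "homogeneity": float(np.sum(p / (1 + (i - j) ** 2))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }
```

For a perfectly uniform region every pair falls on the diagonal of the matrix, so contrast and entropy are zero and the angular second moment is one; values move away from these extremes as the gray levels of neighboring pixels differ more.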
Moment 1 (
The Haralick texture features were calculated using the previous equations for the CT image volume sequences, separately for each segmented lung (right and left). For each participant the gray level cooccurrence texture features: contrast, homogeneity, entropy, energy, correlation, and
For the purpose of random lung assignment in healthy volunteers, the left lung represented the diseased lung in the same percentage of cases as in the patient population. For the acute data, two single-factor analysis of variance (ANOVA) tests were conducted for each Haralick texture feature measurement between the affected lung (either left or right) and the fellow lung (either left or right), one for each category: cancer patients and edema patients. A single-factor ANOVA was also conducted between patients and controls. Further between-subject single-factor analyses were conducted to find the significant Haralick features that could differentiate cancer from edema.
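The two kinds of comparison described above can be sketched with SciPy as follows. This is an illustration rather than the analysis software used in the study: the function names are hypothetical, and the within-subject case relies on the fact that a one-factor repeated-measures ANOVA with only two levels (affected versus fellow lung) is equivalent to a paired t-test, with F equal to t squared.

```python
from scipy import stats


def between_subject_anova(*groups):
    """One-way ANOVA across independent groups (e.g. cancer vs. edema).

    Each argument is a sequence of per-subject values of one texture feature.
    Returns (F statistic, p value).
    """
    result = stats.f_oneway(*groups)
    return float(result.statistic), float(result.pvalue)


def within_subject_anova_two_levels(affected, fellow):
    """Within-subject comparison of one feature, affected vs. fellow lung.

    `affected[k]` and `fellow[k]` must belong to the same subject.
    Returns (F statistic, p value), with F computed as the squared paired t.
    """
    result = stats.ttest_rel(affected, fellow)
    return float(result.statistic ** 2), float(result.pvalue)
```

Each function would be called once per texture feature, and a feature would be treated as significant when its p value falls below the chosen threshold.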
Two datasets of 532 CT images were included. For each lung CT preprocessed image, we separate the left lung from the right lung automatically as discussed before in Section
ANOVA (1 within-subject factor) results for cancer patients Haralick texture features (comparison between AL and FL). AL: affected lung; FL: fellow lung.
| Feature name | AL (average ± SEM) | FL (average ± SEM) | AL versus FL |
|---|---|---|---|
| Homogeneity | 0.511 ± 0.01 | 0.517 ± 0.01 | |
| Energy | 0.372 ± 0.01 | 0.374 ± 0.01 | |
| Correlation | 0.964 ± 0.001 | 0.965 ± 0.001 | |
| Contrast | 231.98 ± 4.54 | 231.76 ± 4.54 | |
| Entropy | 8.0 ± 0.19 | 7.94 ± 0.19 | |
| | 0.003 ± 0.02 | 0.007 ± 0.02 | |
| | 231.13 ± 4.54 | 231.75 ± 4.01 | |
| | −164 ± 190.79 | −683.99 ± 155.33 | |
| | 1784467 ± 83311 | 1654941 ± 56455 | |
| Diff_ASM | 0.226963389 ± 0.006 | 0.229353096 ± 0.005 | |
| Diff_Mean | 6.195 ± 0.08 | 6.28 ± 0.09 | |
| Diff_Entropy | 3.159 ± 0.03 | 3.55 ± 0.03 | |
ANOVA (1 within-subject factor) results for edema patients Haralick texture features (comparison between AL and FL). AL: affected lung; FL: fellow lung.
| Feature name | AL (average ± SEM) | FL (average ± SEM) | AL versus FL |
|---|---|---|---|
| Homogeneity | 0.64 ± 0.013 | 0.60 ± 0.020 | |
| Energy | 0.428 ± 0.01 | 0.429 ± 0.01 | |
| Correlation | 0.006 ± 0.001 | 0.008 ± 0.001 | |
| Contrast | 177.07 ± 5.89 | 188.58 ± 4.26 | |
| Entropy | 2.10 ± 0.04 | 2.19 ± 0.067 | |
| | 0.52 ± 0.03 | −0.47 ± 0.02 | |
| | 199.975 ± 9.658 | 218.583 ± 10.085 | |
| | 5219 ± 1436 | −7539 ± 885 | |
| | 2854294 ± 208886 | 2382237 ± 263250 | |
| Diff_ASM | 0.377 ± 0.01 | 0.288 ± 0.08 | |
| Diff_Mean | 4.07 ± 0.4379 | 4.89 ± 0.48478 | |
| Diff_Entropy | 2.96 ± 0.05 | 3.29 ± 0.05 | |
ANOVA (1 within-subject factor) results summary of statistics
| Feature name | Diseased versus normal controls | Feature name | Diseased versus normal controls |
|---|---|---|---|
| Homogeneity | | | |
| Energy | | | |
| Correlation | | | |
| Contrast | | Diff_ASM | |
| Entropy | | Diff_Mean | |
| | | Diff_Entropy | |
ANOVA (1 between-subject factor) results summary of statistics
| Feature name | Cancer versus edema patients | Feature name | Cancer versus edema patients |
|---|---|---|---|
| Homogeneity | | | |
| Energy | | | |
| Correlation | | | |
| Contrast | | ASM | |
| Entropy | | Mean | |
| | | Entropy | |
From Table
Table
Considering Tables
Texture feature analyses are well-known approaches for quantifying and expressing heterogeneity that may not be appreciated by the clinician's naked eye, and they have been presented before as good imaging biomarkers for differentiating between diseases. In this paper, the Haralick texture features are evaluated in order to identify the most significant features for detecting and differentiating abnormalities within the lungs, for cancer and edema versus normal. Our results indicate that entropy determined by the gray level cooccurrence matrix and ASM are significantly different in edema patients versus normal, while they are not in cancer patients versus normal. Since entropy is the degree of randomness or disorder in the image, and the angular second moment represents the uniformity in the image, this may be interpreted as cancer causing a localized heterogeneity in a specific diseased area of the lung, while edema causes heterogeneous disorder across the whole lung image. The high entropy values calculated imply an elevated level of disorder and disorganization in the edema-diseased lung versus the cancer-diseased lung. The energy feature, which is derived from the angular second moment and represents the local uniformity of the gray levels, is a good biomarker for differentiating between cancer and edema. From Table
While our results are promising, further work remains in detecting abnormalities within the lungs and determining whether the abnormality is lung cancer or edema. A preliminary investigation has been carried out using statistical analysis (ANOVA) to identify the most useful texture features, which can later be fed to any classification technique. As further work, the selected features can be used for better localization and classification.
The authors declare that there is no conflict of interests regarding the publication of this paper.
The authors acknowledge the SPIE, the NCI, the AAPM, and The University of Chicago for providing public access to the lung cancer dataset.