Refined Color Texture Classification Using CNN and Local Binary Pattern

Representation and classification of color texture generate considerable interest within the field of computer vision. Texture classification is a difficult task that assigns unlabeled images or textures to the correct labeled class. Key factors such as scaling, viewpoint variations, and illumination changes make this task challenging. In this paper, we present a new feature extraction technique for color texture classification and recognition. The presented approach aggregates the features extracted from local binary patterns (LBP) and a convolution neural network (CNN) to provide discriminatory information, leading to better texture classification results. Almost all CNN models classify images based on global features that describe the image as a whole to generalize the entire object. LBP classifies images based on local features that describe the image's key points (image patches). Our analysis shows that adding LBP improves the classification task compared to using CNN only. We test the proposed approach experimentally over three challenging color image datasets (ALOT, CBT, and Outex). The results demonstrate that our approach improves classification accuracy by up to 25% over traditional CNN models. We identify optimal combinations for each dataset and obtain high classification rates. The proposed approach is robust, stable, and discriminatory across the three datasets and shows enhanced classification and recognition compared to state-of-the-art methods.


Introduction
The motivation for using CNN models as feature descriptors is the ability of the deep neural network to capture the high-level features that can be the key to classifying texture images. Texture plays a significant role in distinguishing objects in color images. Texture classification involves a two-phase process. The first phase is extracting the features, which provides a feature-based description for each texture type; this phase tends to select features unaffected by image transformations such as scaling, translation, and rotation. The second phase recognizes the texture from the extracted features. Texture recognition is increasingly seen as a critical issue in computer vision with many applications, such as biomedical image processing [1][2][3][4][5], object detection [6][7][8], and remote sensing [9][10][11]. The scale-invariant feature transform (SIFT), proposed by Lowe [12], is a common sparse descriptor. Mikolajczyk and Schmid changed the SIFT descriptor by altering the gradient location orientation grid and the quantization parameters of the histograms [13]. Chen et al. [14] proposed a robust discriminative descriptor called the Weber local descriptor (WLD) based on human perception. The Gabor wavelet descriptor is considered one of the most widely used dense descriptors [15].
The Gabor wavelet has been used extensively in image recognition applications such as face recognition, scene analysis, and motion tracking [16]. Ershad and Fekri [17] proposed a new approach that analyzes a texture based on its grain components and classifies it from the grain-component histogram and statistical features. Local binary pattern (LBP) is one of the texture descriptor methods used to describe and represent texture. Ojala et al. [18] presented it as an adequate gray-level image feature descriptor. It was then applied to color images by Mäenpää and Pietikäinen [19] and used in many color representation and recognition tasks such as facial expression recognition [20,21], gender classification [22,23], scene classification [24,25], medical image analysis [26][27][28], and 3D face recognition [29]. Ershad and Tajeripour [30] proposed a noise-resistant and multiresolution version of LBP called HCLBP and an approach that adapts to output images acquired with any kind of digital camera. The convolution neural network (CNN) is a deep learning architecture that attracts widespread interest in real-world applications because it automatically extracts and recognizes features, eliminating the need for manual feature extraction and selection techniques.
Lately, convolution neural networks (CNNs) have driven a revolution in color texture recognition applications [14,[31][32][33][34][35][36]. Dabeer et al. [37] introduced a novel automatic technique for breast cancer detection based on CNN. Hosny et al. [38] proposed an automatic skin lesion classification approach based on transfer learning theory and a pretrained deep neural network. Genovese et al. [39] introduced a new technique for fusing palm print and inner finger texture from a single hand acquisition based on CNN. Umer et al. [40] presented a CNN technique that can distinguish COVID-19 patients from healthy people using chest X-rays. Wu and Lee [41] used a CNN model for enhancing sound texture in acoustic scenes. Orii et al. [42] suggested a new recognition approach for tactile texture using CNN. Weskley et al. [43] described a new approach based on CNN for classifying the degree of bread browning during baking. Hosny et al. [44] provided a precise improvement to their skin lesion classification approach centered on the deep neural network AlexNet. They used transfer learning, which can accurately classify seven different types of lesions. Kassem et al. [45] proposed a novel technique using transfer learning and a pretrained deep neural network, GoogLeNet, to accurately classify eight different lesion types. Hosny et al. [46] proposed a new approach for color texture recognition with a significant benefit: it uses LBP and multichannel moments to obtain local and global color texture features, leading to high classification results. This success motivates us to propose a new technique for color texture classification based on the convolution neural network and local binary pattern, extracting global and local features and combining them to obtain high color texture classification results. The rest of the paper is arranged as follows. In Section 2, convolution neural network models and their architecture are presented.
In Section 3, we introduce the local binary patterns descriptor. Section 4 presents the proposed method, and simulation results are shown in Section 5.

Convolution Neural Network
The primary motivation for using CNN features is their capability to capture the high-level features that are key to separating texture images. A convolution neural network (CNN) captures low-level detail features in the first convolution layers and then generates high-level features in the last layers. Selecting the right set of features has always been a challenging task for machine learning algorithms. CNN has numerous extensive applications in different fields. Whereas image classification and recognition are challenging tasks for machine learning algorithms, CNN is highly promising in terms of accuracy [47]. A CNN generally consists of convolution, pooling, and fully connected layers, as shown in Figure 1.

Convolution Layer.
It is used to apply filters to the input feature map and produce convolved features. The following two factors define the convolution layer.
(1) The size of the block extracted from the input feature map.
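To illustrate this filtering step, here is a minimal pure-Python sketch of a "valid" 2-D convolution (strictly, a cross-correlation, which is how most CNN frameworks implement the convolution layer). The function name and kernel are illustrative, not taken from the paper:

```python
def conv2d_valid(image, kernel):
    """Slide a k x k kernel over a 2-D list of pixel values and return
    the 'valid' convolution (cross-correlation, as in CNN libraries)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Multiply the k x k block at (i, j) elementwise with the kernel
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
    return out
```

Each output cell is the weighted sum of one block extracted from the input feature map, which is exactly the "size of the block" factor described above.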

Pooling Layer.
Generally applied after the convolution layer, it performs a downsampling (dimensionality reduction) operation such as average pooling, min pooling, or max pooling.
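As an illustration, a minimal pure-Python max-pooling sketch over 2 × 2 blocks with stride 2 (a common configuration; the paper does not fix these parameters):

```python
def max_pool(feature_map, size=2, stride=2):
    """Downsample a 2-D feature map by taking the maximum of each
    size x size block (non-overlapping when stride == size)."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]
```

Average or min pooling differ only in replacing `max` with a mean or `min` over the same blocks.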

Fully Connected Layer.
It is typically found towards the end of the CNN architecture. It operates on a flattened input, in which each input node is linked to all nodes of the layer.
Each CNN model has its own characteristics. Table 1 summarizes the different CNN architecture models, their image input sizes, accuracies, and numbers of model parameters.

Local Binary Patterns
The local binary patterns (LBP) are considered one of the most effective texture descriptors. LBP was first introduced by Ojala et al. [18] and is used to represent the local features of an image, that is, the key points of an image. The classic LBP operator is defined over a window of 3 × 3 pixels. The center pixel of this window is taken as a threshold; if a neighboring pixel's value is less than the threshold value, that pixel is labeled 0, else it is labeled 1. This generates an 8 bit binary number that is converted to a decimal number, as shown in Figure 2. The classic LBP descriptor can be expressed in the form below:

$$\mathrm{LBP}_{J,R}(m, n) = \sum_{j=0}^{J-1} s\left(P_j - P_c\right) 2^{j}, \quad s(x) = \begin{cases} 1, & x \geq 0, \\ 0, & x < 0. \end{cases} \tag{1}$$

In (1), $P_c = P(m, n)$ is the central pixel at location $(m, n)$, and $P_j = P(m_j, n_j)$ is a neighboring pixel of the central pixel $P_c$, where $J$ is the number of neighbor pixels and $R$ is the distance from $P_c$. Ershad and Tajeripour [56] proposed a new approach for color texture classification based on LBP and Kullback-Leibler divergence, which used different neighborhood sizes to achieve effective classification performance.
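The thresholding rule above can be sketched in pure Python for a single 3 × 3 window. The clockwise bit ordering starting at the top-left neighbor is one common convention; the paper does not fix it:

```python
def lbp_code(window):
    """Classic 3x3 LBP: threshold the 8 neighbours against the centre
    pixel and read them off clockwise as an 8-bit binary number."""
    center = window[1][1]
    # Neighbour coordinates, clockwise starting at the top-left corner
    coords = [(0, 0), (0, 1), (0, 2), (1, 2),
              (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        # Neighbour >= centre -> label 1, else 0 (then weight by 2^bit)
        if window[r][c] >= center:
            code |= 1 << bit
    return code
```

Applying this at every pixel of an image yields one LBP code per pixel; a histogram of these codes is the texture descriptor.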

The Proposed Method
We propose a novel feature extraction approach that aggregates the features extracted from local binary patterns (LBP) and a convolution neural network (CNN) to obtain more accurate and consistent features for representation, leading to higher texture recognition accuracy. The performance of the CNN descriptors is assessed both individually and in combination with LBP. Firstly, in the preprocessing phase, we fix the image size to match the input size of the pretrained deep learning models (224 × 224 or 227 × 227). We equalize each image's histogram to normalize it under different illumination conditions and then smooth it to reduce the noise introduced by the equalization. Feature extraction is performed by the CNN and LBP. The local binary patterns capture an image's local features, which represent particular image portions; the LBP neighborhood radius was 1, and the number of neighbors was 8. The dimensions of the feature vector extracted from LBP are [1, 59]. The convolution neural network captures global features that represent the image as a whole. The dimensions of the feature vector extracted from CNN are [1, 1000]. We aggregate the features of the two extraction approaches (CNN, LBP) into one vector, so the dimensions of the resultant feature vector are [1, 1059]. We find that this aggregation of features enhances efficiency and reliability over using CNN models individually. The flowchart of the proposed approach is shown in Figure 3.
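The aggregation step is a simple concatenation of the two vectors. A minimal sketch, assuming a [1, 1000] CNN vector and a [1, 59] LBP histogram as described above (the function name is illustrative):

```python
def aggregate_features(cnn_features, lbp_histogram):
    """Concatenate the 1 x 1000 global CNN vector with the 1 x 59
    local LBP histogram into a single 1 x 1059 descriptor."""
    if len(cnn_features) != 1000 or len(lbp_histogram) != 59:
        raise ValueError("expected 1000 CNN features and 59 LBP features")
    return list(cnn_features) + list(lbp_histogram)
```

The resulting 1059-dimensional vector is what the classifier sees, so the global and local cues are weighed jointly rather than in separate models.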

Datasets.
Numerical experiments were carried out to validate and analyze the performance of the proposed descriptor. The experiments were executed on a Core i7 machine with 16 GB RAM. Three benchmark color texture datasets (ALOT, CBT, and Outex) were used to perform the numerical experiments.

ALOT Dataset. ALOT dataset [57] is a series of color image textures.
The rotation angle and illumination conditions are varied. More than 27,500 images are included in it. The original size of each image is 768 × 512. We focused on a fixed collection of 1500 texture images covering 15 classes of color textures. This collection was proposed by [58]. Color texture samples of the ALOT database are shown in Figure 4.

CBT Dataset. The second dataset, the colored Brodatz texture (CBT) dataset [58], is a grayscale-to-color extension of the original Brodatz dataset. This dataset has the advantage of keeping the original dataset's rich textural content while also providing a wide range of color content. It contains 112 color texture images with various background intensities. The original size of each image is 512 × 512. We used a subset of 1500 color images from 15 classes. This subset was proposed by Bianconi [59]. Texture samples of the CBT database are shown in Figure 5.

Outex Dataset.
The third dataset, the Outex dataset [60], covers an extensive range of color textures; the textures are captured under varied lighting conditions, resolutions, and rotation angles. It contains 320 surface color textures. The original size of each image is 746 × 538. A subgroup has been identified, containing 1500 images from 15 classes. This subgroup was proposed by [58]. Color texture samples of the Outex database are shown in Figure 6.

CNN Models Descriptors.
To begin, we used CNN models to extract features from the grayscale and color image datasets. The number of features extracted from each image was 1000. In these simulations, the three color image datasets were split into training and testing sets at a ratio of 70 : 30, respectively. The training set is used to train the classifier, while the testing set is used for testing, analysis, and evaluation. Using the linear support vector machine classifier, chosen for its efficiency in solving multiclass classification problems, we checked the performance of the CNN models over the Brodatz grayscale dataset. The results revealed that the performance of CNN has some limitations in terms of accuracy and needs some enhancement. Table 2 details the results of the CNN descriptor over the grayscale Brodatz dataset. Then we checked the performance of the CNN models over the three challenging datasets (ALOT, Outex, and CBT). We computed the accuracy and confusion matrix for the classification over all the given datasets. The CNN models are trained once over these datasets. Our tests showed that their performance is fair but the accuracy results need enhancement. The highest accuracy of the CNN models in the ALOT dataset was 90.89%, using Xception as a descriptor. In the Outex dataset, it was 91.78%, using ResNet-101 as a descriptor. In the CBT dataset, it was 86.45%, using AlexNet as a descriptor. Table 3 details the results (the best performing CNN in each dataset is indicated in bold) of the CNN models using the SVM classifier for the three datasets.
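The 70 : 30 split described above can be sketched as follows; this is a minimal, reproducible pure-Python version (the seed, function name, and shuffling strategy are illustrative assumptions, since the paper does not state how the split was randomized):

```python
import random

def train_test_split(samples, labels, train_ratio=0.7, seed=0):
    """Shuffle the dataset reproducibly and split it into training and
    testing sets at the given ratio (70:30 by default)."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    cut = int(len(indices) * train_ratio)      # 70% boundary
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
            [labels[i] for i in test_idx])
```

The training portion would then be fed to a linear SVM (e.g., scikit-learn's `LinearSVC`, though the paper does not name a specific implementation) and the held-out 30% used for evaluation.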
Figure 7 represents the ROC curves of the CNN models as descriptors using the support vector machine classifier for the ALOT dataset. Figures 8 and 9 represent the corresponding ROC curves for the CBT and Outex datasets, respectively. The ROC analysis revealed that the CNN model descriptors had only a fair AUC score (∼0.8), which needs improvement to achieve the highest accuracy. This observation led the authors to pair the CNN models with a local feature extraction descriptor.

LBP Models Descriptors.
Secondly, we used the traditional LBP descriptor to extract features from the color image datasets. The number of features extracted from each image was 59. Again using the SVM, we checked the consistency of the LBP descriptor over the Brodatz dataset. The result showed that the performance needs some improvement. Table 4 shows the accuracy results of LBP using the SVM classifier for the grayscale Brodatz dataset. Then we checked the consistency of the LBP descriptor over the three challenging datasets (ALOT, Outex, and CBT). Our analysis and tests showed that its performance is fair but the accuracy results need enhancement. The accuracy of LBP in the ALOT dataset was 70.3%, in the CBT dataset 73.45%, and in the Outex dataset 66.2%. Table 5 details the accuracy results of LBP using the SVM classifier for the three datasets.
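The paper does not spell out the binning, but a 59-dimensional LBP feature vector matches the standard "uniform" LBP histogram for 8 neighbors: 58 uniform patterns (at most two 0/1 transitions in the circular code) plus one catch-all bin for everything else. A sketch under that assumption:

```python
def is_uniform(code):
    """A pattern is 'uniform' if its circular 8-bit string has at most
    two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return transitions <= 2

def lbp_histogram(codes):
    """59-bin LBP histogram: one bin per uniform pattern (58 of them)
    plus a single catch-all bin for all non-uniform patterns."""
    uniform_codes = sorted(c for c in range(256) if is_uniform(c))
    bin_of = {c: i for i, c in enumerate(uniform_codes)}
    hist = [0] * 59
    for c in codes:
        hist[bin_of.get(c, 58)] += 1  # bin 58 collects non-uniform codes
    return hist
```

Feeding the per-pixel LBP codes of an image through `lbp_histogram` yields the 59 local features used alongside the 1000 CNN features.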

CNN + LBP Descriptors.
The first set of analyses confirmed that the accuracy results obtained using only CNN models as grayscale and color image feature descriptors have some limitations and need improvement. Therefore, in the second experiment, we used an aggregated feature extraction descriptor (CNN + LBP) for the grayscale and color textures. Convolution neural networks capture global features that describe the image overall; in contrast, local binary patterns capture the local features of an image that describe a particular image part. We checked the performance of our proposed descriptor over the Brodatz grayscale dataset; the result was remarkable and proved the adequate performance of the descriptor over a grayscale dataset. Table 6 details the proposed descriptors' results using the SVM classifier for the grayscale Brodatz dataset.
This performance encouraged the authors to check the descriptor's performance over the color datasets (ALOT, Outex, and CBT). Firstly, the number of features extracted from each image was 2559 (2500 global features from the CNN model and 59 local features from LBP) to be used in the input layer. The three datasets are split into training and testing sets at a ratio of 70 : 30, respectively. Using a routine test for detecting overfitting of our descriptor, monitoring the loss and accuracy on the training and validation sets, our results demonstrated that our model does not overfit. In the three challenging datasets (ALOT, Outex, and CBT), these tests confirm that the proposed descriptors show a clear advantage in recognition accuracy, are effective and discriminant, and outperform the existing state-of-the-art methods. Using the SVM classifier, the highest accuracy of our descriptors in the ALOT dataset was 100%, using the Xception + LBP descriptor; in the Outex dataset, 96.67%, using Xception + LBP; and in the CBT dataset, 100%, using the ResNet-101 + LBP descriptor. The lowest accuracy of our descriptors in the ALOT dataset was 93.12%, using ResNet-18 + LBP; in the Outex dataset, 88.89%, using SqueezeNet + LBP; and in the CBT dataset, 94%, using ResNet-18 + LBP. Table 7 details the proposed descriptors' results using the SVM classifier for the ALOT dataset. Figure 10 represents the ROC curves of the proposed descriptors using the SVM classifier for the ALOT dataset. Table 8 details the results of the proposed descriptors using the SVM classifier for the CBT dataset. Figure 11 represents the corresponding ROC curves for the CBT dataset. Table 9 details the results of the proposed descriptors using the SVM classifier for the Outex dataset. Figure 12 represents the corresponding ROC curves for the Outex dataset. Table 10 reveals the high accuracy obtained by the proposed method for the three datasets.
From Tables 9 and 10, it is clear that the pretrained models Xception + LBP, Xception + LBP, and ResNet-101 + LBP stand out in performance when tested with the ALOT, Outex, and CBT datasets, respectively. The results also demonstrated that ResNet-18 + LBP has the lowest accuracy rate in two datasets (ALOT and CBT), and SqueezeNet + LBP in the Outex dataset. Figure 13 compares our descriptor's promising accuracy with the accuracy obtained by CNN models for the ALOT dataset. Figures 14 and 15 provide the same comparison for the CBT and Outex datasets, respectively.
From the previous results, we can observe that accuracy increased when we used LBP + CNN as a descriptor for the three datasets. To evaluate its performance, Table 11 compares it with other state-of-the-art methods introduced by [61] and by [58]. Then we evaluated our proposed descriptors' performance when provided with a limited training set of textures, to determine whether the approach is robust. Table 12 reveals the accuracy obtained by the proposed method for the three datasets when only 20% of the data is used for training. Table 13 reveals the corresponding accuracy when 50% is used for training.

Conclusion
Overall, this paper presents a new feature extraction approach for color texture recognition that aggregates features extracted from local binary patterns and a convolution neural network. The feature combination provides more reliable, consistent performance. Our studies demonstrated that the suggested descriptors have good properties, such as stability and robustness against different orientations, and deliver remarkably accurate classification results in recognition tasks. The experiments were performed on three benchmark color datasets (ALOT, CBT, and Outex) using a support vector machine as the classifier. The results demonstrate the proposed classification technique's efficiency, robustness, and high accuracy against image transformations such as scaling, rotation, and translation. By analyzing classification and recognition accuracy, the findings are promising and encouraging.
Data Availability

The datasets used in this study are from previously reported studies and datasets, which have been cited.

Conflicts of Interest
The authors declare that they have no conflicts of interest.