Renal Cancer Detection: Fusing Deep and Texture Features from Histopathology Images

Histopathological images contain morphological markers of disease progression that have diagnostic and predictive value, and many computer-aided diagnosis systems based on common deep learning methods have been proposed to save time and labour. Although deep learning methods are end-to-end and perform exceptionally well given a large dataset, they often show relatively inferior results on a small dataset. In contrast, traditional feature extraction methods are more robust and perform well on small and medium datasets. Moreover, a global approach based on texture representation is commonly used to classify histological tissue images rather than relying on explicit segmentation to extract structural properties. Considering the scarcity of medical datasets and the usefulness of texture representation, we aim to combine the advantages of deep learning and traditional machine learning, i.e., texture representation. To this end, we propose a classification model that detects renal cancer in a histopathology dataset by fusing features from a deep learning model with extracted texture feature descriptors. Five texture feature descriptors from three texture feature families were applied to complement Alex-Net, enabling an extensive validation of the fusion of deep features and texture features. The texture features come from (1) the statistical texture feature family: histogram of oriented gradients, gray-level co-occurrence matrix, and local binary pattern; (2) the transform-based texture feature family: Gabor filters; and (3) the model-based texture feature family: Markov random field. In the final experiments, the fused model outperformed both Alex-Net and each single texture descriptor, showing the effectiveness of combining deep features and texture features for renal cancer detection.


Introduction
Histopathology images contain markers of disease progression and morphological information and provide a clear view of fine tissue structures; for this reason, histopathological examination is regarded as the basis for the final diagnosis of cancer subtypes [1,2]. However, most hospitals lack pathologists; in Germany, for example, a large shortage of pathologists could create a bottleneck in the health system [3]. Classifying histopathology images is also very challenging for pathologists. First, there is interobserver discordance between pathologists owing to their different capabilities and experience. Second, the complexity of histopathology images makes diagnosis time-consuming. Many computer-aided diagnosis (CAD) systems have been proposed to overcome these difficulties by extracting features from histopathology images to identify subtle differences between clinical categories [4,5], for example, in breast [2], lung [6], and kidney [7] histopathology images.
Renal cancer (RC) is one of the most serious cancers worldwide. The American Cancer Society estimated that 76,080 new cases and 13,780 deaths would occur in the United States in 2021 [8]. Although renal cancer develops slowly, early treatment can improve the cure rate and survival time. A substantial amount of research has applied deep learning methods to classify renal cancer from histopathology images, and these methods have worked well given large datasets. Tabibu et al. [7] used convolutional neural networks (CNNs) on a whole-slide image dataset of over 1500 samples to classify renal cell carcinoma (RCC) subtypes and predict survival outcomes from digital histopathological images, achieving an accuracy of 94.07%. Fenstermaker et al. [9] randomly selected over 15000 patches of 1024 × 1024 pixels and achieved an accuracy of 99.1% in classifying normal parenchyma versus RCC with a CNN model. Fenstermaker et al. [9] developed deep convolutional neural networks (DCNNs) to diagnose renal cancers using a dataset of about 30000 whole-slide histopathology images from The Cancer Genome Atlas (TCGA), detecting malignancy with an AUC of 0.964-0.985.
However, the studies above show that deep learning methods require a large dataset to reach relatively high accuracy, and such datasets can be difficult to obtain given the scarcity of public medical data. Moreover, owing to the intrinsic complexity of histopathology images, the differences between images of different categories can be very subtle, so relying on deep learning alone is prone to misclassification. At the same time, histopathology images contain repetitive patterns that are particularly suited to texture analysis. Texture is generally characterized by homogeneous areas with properties related to scale and regular patterns, and texture analysis plays an important role in many medical image analysis tasks [10], such as medical image classification [11], segmentation [12], and retrieval [13]. Given a smaller histopathology image dataset, traditional texture feature extractors can achieve reasonable results. Alhindi et al. [14] compared local binary patterns (LBP), histograms of oriented gradients (HOG), and deep features (VGG16) for classifying a small histopathology image dataset of fewer than 1000 samples. LBP achieved the highest accuracy, 90.52%, with a support vector machine (SVM); this is lower than the accuracies obtained with the large datasets mentioned above, but better than that of the deep learning method. In [15], HOG features were extracted from normal, benign, and malignant gastric histopathology images, and the reported accuracy was 100%.
Currently, deep learning methods are the most frequently studied and most successful type of machine learning method, and their adoption for histopathology images [2,6,7,16] has demonstrated their usefulness. While deep learning methods generate an abstract representation learned in the hidden layers of a neural network, traditional texture feature extractors generate mathematically well-defined features that are particularly suitable for histopathology images and can achieve reasonable results without a large dataset. However, few studies have used texture features for renal cancer detection. To combine the advantages of deep learning and traditional methods, we propose the classification model shown in Figure 1 to improve classification accuracy for renal cancer detection. For the deep learning method, we used Alex-Net to extract robust deep features while avoiding overfitting. For the traditional methods, we employed five texture descriptors from three families, listed in Table 1, to complement Alex-Net. The contributions of this work are as follows: (1) the proposed model outperformed both the deep learning method and each single traditional texture feature method. The rest of this paper is organized as follows. Section 2 presents Alex-Net, the five texture feature extraction methods, and the fusion method. Section 3 presents and discusses the experimental results of our model for renal cancer detection and outlines our findings. Section 4 summarizes the research.

Materials and Methods
2.1. Dataset. The dataset used in this paper was provided by The Second Affiliated Hospital of Guangzhou Medical University. It contains 93 patients with RC and 150 patients with healthy kidneys who were enrolled and treated between 2010 and 2019. Each patient has on average two histopathology images of 1024 × 768 pixels, and some images were excluded because of poor quality. These histopathology images were manually diagnosed by multiple doctors. To improve generalizability, we applied rotation and flipping to each image. After preprocessing, the data were split into training and test sets at a ratio of 7 : 3; the statistics are shown in Table 2.
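As a minimal sketch of the augmentation and the 7 : 3 split, assuming 90°, 180°, and 270° rotations plus a horizontal flip (the text does not specify which rotations or flips were used) and a simple random split with a fixed seed; `augment` and `train_test_split_indices` are hypothetical helper names:

```python
import numpy as np

def augment(img):
    """Return rotated and flipped copies of one image (H x W x C array).

    Assumption: rotations by 90/180/270 degrees plus one horizontal flip.
    A 90-degree rotation turns a 1024 x 768 image into 768 x 1024.
    """
    rotations = [np.rot90(img, k) for k in range(1, 4)]  # rotate in the H-W plane
    return rotations + [np.fliplr(img)]                  # mirror left-right

def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Shuffle n sample indices and split them 7:3 into train and test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return idx[:n_train], idx[n_train:]
```

Splitting by patient before augmenting keeps rotated or flipped copies of the same image out of both sets at once; whether the original split was done this way is not stated.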

2.2. Preprocessing.
In a histopathology image, nuclei are stained purple, while other structures are pink, so different structures can be distinguished in manual or automated analysis. However, color variations arising from tissue section preparation, such as differences in staining procedures, make such analyses difficult. To improve the generalizability of the model to data with different color styles, we used Structure-Preserving Color Normalization (SPCN), proposed by Vahadane et al. [17], which controls color variation and enhances contrast while preserving the structure of histopathology images. Stain separation is the key step of color normalization: SPCN casts stain separation as a nonnegative matrix factorization (NMF) problem with an added sparseness constraint, referred to as sparse nonnegative matrix factorization (SNMF), with the cost function shown in Equation (1). With SNMF, for a given source image s and target image t, their color appearances and stain density maps can be estimated by factorizing $V_s$ into $W_s H_s$ and $V_t$ into $W_t H_t$. Then, a scaled version of the source density map $H_s$ is combined with the color appearance of the target $W_t$, instead of that of the source $W_s$, to generate the normalized source image, as described by Equations (2)-(4).
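Following the formulation of Vahadane et al. [17], Equations (1)-(4) can be sketched as below, where $V = \log(I_0 / I)$ is the relative optical density of an RGB image $I$ with background intensity $I_0$, $W$ is the stain color-appearance matrix with $r$ columns (one per stain), $H$ is the $r$-row stain density map, and $\lambda$ weights the sparsity penalty. This is a reconstruction from [17], and the notation may differ slightly from the original equations:

```latex
% Eq. (1): SNMF stain separation
\min_{W,H}\ \tfrac{1}{2}\,\lVert V - WH \rVert_F^2 + \lambda \sum_{j=1}^{r} \lVert H(j,:) \rVert_1
\quad \text{s.t. } W \ge 0,\; H \ge 0,\; \lVert W(:,j) \rVert_2^2 = 1

% Eq. (2): scale each source stain density to the target's range
H_s^{\mathrm{norm}}(j,:) = \frac{H_s(j,:)}{H_s^{RM}(j)}\, H_t^{RM}(j), \qquad j = 1, \dots, r

% Eq. (3): recombine with the target color appearance
V_s^{\mathrm{norm}} = W_t\, H_s^{\mathrm{norm}}

% Eq. (4): convert optical density back to intensity
I_s^{\mathrm{norm}} = I_0 \exp\left(-V_s^{\mathrm{norm}}\right)
```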
where $H^{RM}_i = RM(H_i) \in \mathbb{R}^{r \times 1}$, $i \in \{s, t\}$, and $RM(\cdot)$ computes the robust pseudo-maximum of each row vector at the 99th percentile. Figure 2 shows an example of color variation and color normalization.
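A minimal NumPy sketch of the recombination step (Equations (2)-(4)), assuming the SNMF factors $W_t$, $H_s$, and $H_t$ have already been estimated, that pixels are stored as columns, and that $I_0 = 255$; the function name `spcn_normalize` is hypothetical, and the SNMF solver itself is not shown:

```python
import numpy as np

def spcn_normalize(W_t, H_s, H_t, I0=255.0, pct=99.0):
    """Recombine source stain densities with target stain colors (Eqs. (2)-(4)).

    W_t : (3, r) target color-appearance matrix, one column per stain
    H_s : (r, N) source stain density map, one column per pixel
    H_t : (r, M) target stain density map
    Returns a (3, N) array of normalized RGB intensities.
    """
    # Robust pseudo-maximum of each stain's density: 99th percentile of each row
    # (the source value is clipped away from zero to avoid division by zero)
    H_s_rm = np.maximum(np.percentile(H_s, pct, axis=1, keepdims=True), 1e-8)
    H_t_rm = np.percentile(H_t, pct, axis=1, keepdims=True)
    # Eq. (2): rescale source densities to the target's dynamic range
    H_s_norm = H_s / H_s_rm * H_t_rm
    # Eq. (3): optical density using the target's color appearance W_t
    V_s_norm = W_t @ H_s_norm
    # Eq. (4): convert optical density back to intensity (Beer-Lambert law)
    return I0 * np.exp(-V_s_norm)
```

An H × W × 3 image maps to this column layout via `img.reshape(-1, 3).T`. Only the density maps are rescaled, so the spatial layout of $H_s$ (the tissue structure) is kept while the colors come from $W_t$.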
2.3. Alex-Net. The deep features were extracted with Alex-Net [18], a classical convolutional neural network that has been widely applied in medical image analysis tasks such as cancer detection [19] and lesion segmentation [20]. Nawaz et al. [19] fine-tuned Alex-Net by changing the input layer and inserting convolutional and fully connected layers, achieving relatively modest patch-wise and image-wise accuracies of 75.73% and 81.25%, respectively, on a dataset of 400 images. Titoriya and Sachdeva [21] applied Alex-Net to the BreakHis dataset [22] of about 8000 images and achieved high classification accuracies of between 93.8% and 95.7%.
3 BioMed Research International
The network consists of eight layers: the first five are convolutional layers and the last three are fully connected layers, and the output of the last fully connected layer is passed to a softmax classifier. The simplified architecture is shown in Figure 3. The network has several main characteristics. First, it successfully used rectified linear units (ReLU), shown in Equation (5), as the activation function and demonstrated their advantage over the sigmoid function in deep networks. Second, it used dropout, which randomly ignores some neurons during training, to avoid overfitting; it also used data augmentation, including horizontal reflection, for the same purpose. Third, it used overlapping max pooling to avoid the blurring effect of average pooling. In addition, it introduced local response normalization (LRN), which creates a competition mechanism among the activities of local neurons: neurons with larger responses become relatively larger, while neighboring neurons with smaller responses are inhibited, enhancing the generalization ability of the model. The response-normalized activity $b^i_{x,y}$ is given by Equation (6).
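For reference, Equation (5) is the ReLU $f(x) = \max(0, x)$, and Equation (6) in [18] divides each activity $a^i_{x,y}$ of kernel map $i$ at position $(x, y)$ by a term built from the summed squares of up to $n$ adjacent kernel maps at the same position. A minimal NumPy sketch, using the hyperparameters reported in [18] ($k = 2$, $n = 5$, $\alpha = 10^{-4}$, $\beta = 0.75$) as defaults; the function names are illustrative only:

```python
import numpy as np

def relu(x):
    """Eq. (5): f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Eq. (6): cross-channel local response normalization.

    a has shape (N, H, W): N kernel maps (channels) of size H x W.
    b[i] = a[i] / (k + alpha * sum_{j=max(0, i-n/2)}^{min(N-1, i+n/2)} a[j]**2) ** beta
    """
    N = a.shape[0]
    sq = a.astype(float) ** 2
    b = np.empty(a.shape, dtype=float)
    for i in range(N):
        # neighboring kernel maps that compete with map i
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        b[i] = a[i] / (k + alpha * sq[lo:hi + 1].sum(axis=0)) ** beta
    return b
```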
2.4. Texture Feature Extraction. The five texture extractors are described in this subsection. First, the three methods from the statistical texture feature family (HOG, GLCM, and LBP) are presented. Next, the Gabor filter from the transform-based family is described. Finally, MRF from the model-based family is introduced. Table 1 lists the five methods from the three families.

2.5. Statistical Texture Feature Family. Statistical texture feature descriptors are based on the statistical properties of the spatial distribution of gray levels [23][24][25]. These include first-order (one pixel), second-order (two pixels), and higher-order (three or more pixels) statistics. First-order statistics estimate properties of individual pixel values, whereas second- and higher-order statistics evaluate properties of the spatial interaction between two or more image pixels [24]. To explore statistics of various orders in kidney histopathology images, HOG (first order), GLCM (second order), and LBP (higher order) are used.

2.5.1. Histogram of Oriented Gradients (HOG). HOG is a feature descriptor used for object detection in computer vision and image processing. It builds features by computing and accumulating histograms of gradient directions over local regions of the image. HOG features combined with an SVM classifier have been widely used in image recognition, especially in pedestrian detection [26]. Because HOG operates on local grid cells of the image, it maintains good invariance to geometric and photometric deformations of the image [27]. Since viewing angles vary considerably in the process of creating histopathology images, HOG features are particularly suitable for histopathology images. Figure 4 shows an example of HOG features plotted over the original image. The HOG feature extraction steps are as follows [27]:
(1) Normalize the image: convert the input image to grayscale and apply gamma correction to globally normalize the grayscale image, in order to reduce the influence of noise in the image.
(2) Compute the gradient magnitude and direction of the image to describe its structure and shape and to reduce the interference of noise. The formulas are as follows: $G_x(x,y) = H(x+1,y) - H(x-1,y)$, $G_y(x,y) = H(x,y+1) - H(x,y-1)$, $G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}$, and $\alpha(x,y) = \tan^{-1}\left(G_y(x,y) / G_x(x,y)\right)$, where $G(x,y)$, $G_x(x,y)$, $G_y(x,y)$, and $H(x,y)$ are the gradient magnitude, horizontal gradient, vertical gradient, and pixel value at the current pixel, and $\alpha(x,y)$ is the gradient direction.
(3) Divide the image into cells and construct a gradient-orientation histogram for each cell. The cell size affects the encoding of the feature vector: if the cell is too large, the feature information is encoded incompletely; if it is too small, the time complexity increases.

2.5.2. Gray-Level Co-occurrence Matrix (GLCM). GLCM is a well-known texture analysis method that extracts second-order statistical texture features [28][29][30]. Each element $P(i, j \mid d, \theta)$ of the GLCM is the number of occurrences of the gray-level pair $i$ and $j$ at a distance $d$ apart in the direction $\theta$. Figure 5 shows an example of the GLCM computation [31]. Here, the image has 8 gray levels, so the GLCM is 8 × 8. When $d = 1$ and $\theta = 0$, the gray-level pair (1, 2) appears once, so the element $P(1, 2)$ equals 1, while the pair (5, 6) appears twice, so $P(5, 6)$ is set to 2. Once the matrices are computed, various properties can be extracted to represent the texture of the image. In this paper, four properties are extracted (in what follows, the image has $N$ discrete intensity levels and $p(i,j)$ is the normalized GLCM): Contrast $= \sum_{i,j=1}^{N} |i-j|^2\, p(i,j)$, Correlation $= \sum_{i,j=1}^{N} \frac{(i-\mu_i)(j-\mu_j)\, p(i,j)}{\sigma_i \sigma_j}$, Energy $= \sum_{i,j=1}^{N} p(i,j)^2$, and Homogeneity $= \sum_{i,j=1}^{N} \frac{p(i,j)}{1+|i-j|}$, where $\mu_i, \mu_j$ and $\sigma_i, \sigma_j$ are the means and standard deviations of the row and column marginals of $p$. Contrast measures local variations in the matrix; correlation measures how correlated a pixel's gray level is with that of its paired neighbor; and energy is the sum of squared elements of the matrix and provides information on image homogeneity: a low value means the probabilities of the gray-level pairs are rather similar, and a high value means otherwise. Homogeneity measures how closely the elements of the matrix are distributed around its diagonal. Table 3 lists the four properties for kidney histopathology images from a normal and an RC sample.
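A minimal NumPy sketch of the GLCM counting rule and the four properties, assuming a non-symmetric matrix, a non-negative pixel offset (the default corresponds to $d = 1$, $\theta = 0$, i.e., the right-hand neighbor), and an image already quantized to `levels` gray levels starting at 0; `glcm` and `glcm_props` are hypothetical names, and correlation is returned as NaN for a constant image, where it is undefined:

```python
import numpy as np

def glcm(img, levels, dy=0, dx=1):
    """Count pairs: gray level i at (y, x) and j at (y + dy, x + dx); dy, dx >= 0."""
    img = np.asarray(img, dtype=int)
    H, W = img.shape
    ref = img[:H - dy, :W - dx]   # reference pixels
    nb = img[dy:, dx:]            # neighbors at the given offset
    P = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(P, (ref.ravel(), nb.ravel()), 1)   # unbuffered counting of (i, j) pairs
    return P

def glcm_props(P):
    """Contrast, correlation, energy, and homogeneity of a GLCM."""
    p = P / P.sum()                        # normalized GLCM p(i, j)
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    cov = np.sum((i - mu_i) * (j - mu_j) * p)
    return {
        "contrast": np.sum((i - j) ** 2 * p),
        "correlation": cov / (sd_i * sd_j) if sd_i * sd_j > 0 else np.nan,
        "energy": np.sum(p ** 2),
        "homogeneity": np.sum(p / (1.0 + np.abs(i - j))),
    }
```

Properties from several offsets (for example, several values of $\theta$) are often concatenated; which offsets were used here is not stated.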

2.5.3. Local Binary Pattern (LBP). LBP was introduced in [32] to characterize texture in grayscale images, and it has been widely used in computer vision because of its simple computation and good performance, especially in face recognition [33] and object detection [34]. First, the input image is divided into nonoverlapping cells, and a histogram is extracted from each cell. Taking a 3 × 3 window as shown in Figure 6, the gray value of the center pixel serves as the threshold, and each of its 8 neighbors is compared with it: if a neighbor's value is greater than or equal to the threshold, it is set to "1"; otherwise, it is set to "0." Reading the neighbors from left to right and top to bottom yields an 8-bit binary number, which is converted to decimal as the LBP value of the center pixel. Within each cell, a histogram of these decimal values is computed. The cell histograms are then concatenated into the LBP feature vector of the image, whose length depends on the number of cells and the number of histogram bins. Figure 7 shows an example of extracting LBP features from an image.
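The thresholding and encoding steps can be sketched in NumPy as follows, reading the eight neighbors left to right and top to bottom as described above (treating the first neighbor as the most significant bit is an assumption) and skipping border pixels; `lbp_image` and `lbp_features` are hypothetical names, and each cell contributes a 256-bin histogram:

```python
import numpy as np

# Neighbor offsets (dy, dx), read left to right and top to bottom
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def lbp_image(img):
    """8-bit LBP code for every interior pixel of a grayscale image."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    center = img[1:-1, 1:-1]
    code = np.zeros(center.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbor = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        # neighbor >= center -> 1, else 0; first neighbor is the most significant bit
        code |= (neighbor >= center).astype(np.int32) << (7 - bit)
    return code

def lbp_features(img, cell=16):
    """Concatenate 256-bin LBP histograms of nonoverlapping cell x cell blocks."""
    codes = lbp_image(img)
    hists = []
    for y in range(0, codes.shape[0] - cell + 1, cell):
        for x in range(0, codes.shape[1] - cell + 1, cell):
            hists.append(np.bincount(codes[y:y + cell, x:x + cell].ravel(), minlength=256))
    return np.concatenate(hists)
```

For the 3 × 3 patch [[6, 5, 2], [7, 6, 1], [9, 8, 7]], the comparisons give 10010111 in binary, i.e., an LBP value of 151.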
2.6. Transform-Based Texture Feature Family. Transform-based texture descriptors commonly use linear transforms, filters, or filter banks to map images into another space in which texture can be distinguished more easily [10]. The Gabor filter is a widely used linear filter for texture analysis [35].
2.6.1. Gabor Filter. The frequency and orientation representations of Gabor filters are similar to those of the human visual system, which makes them very useful in image processing, especially in face recognition [36]. A 2-D Gabor filter is defined as Equation (15) [37]. In the spatial domain, a Gabor kernel is a Gaussian kernel modulated by a sinusoidal wave, and images are filtered by the real parts of the Gabor kernels. The mean and variance of the filtered images are then used as texture features for image classification. In this paper, we used various filter sizes to extract texture features from the histopathology images. Figure 8 shows an example of Gabor outputs for healthy and RC kidney histopathology images with a filter size of 24.
$g(x, y; \lambda, \theta, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\, \frac{2\pi x'}{\lambda}\right)$ (15)
where $x' = x\cos\theta + y\sin\theta$, $y' = -x\sin\theta + y\cos\theta$, $\sigma$ is the standard deviation of the Gaussian envelope, $\lambda$ is the wavelength of the sinusoidal factor, $\gamma$ is the spatial aspect ratio, and $\theta$ is the orientation.

2.7. Model-Based Texture Feature Family. In a random field whose elements satisfy the Markov property $P(x_i \mid x_j, j \neq i) = P(x_i \mid x_j, j \in \mathcal{N}_i)$ (16), where $\mathcal{N}_i$ is the neighborhood of element $i$, each element is related only to its neighbors and is not influenced by non-neighboring elements. Markov chains extended to multiple dimensions are called Markov random fields (MRFs) [38]. MRFs have been applied in many image processing tasks, such as segmentation [39] and classification [40]; their main advantage is that they model the interrelationships among related random variables and make full use of the statistical dependence between neighboring pixels.
To reduce the dimensionality of the deep features, the difference $diff_k$ of the kth feature is calculated from the feature values $v_{i,k}$ of the $N_{pos}$ positive and $N_{neg}$ negative images in the training set, where k ranges from 1 to 4096 and $v_{i,k}$ is the kth feature of the ith image. The feature components are then ranked from the largest $diff_k$ to the smallest, and the top 100 components are selected [41]. We terminated the training after 5 epochs when the validation accuracy did not improve.
In this subsection, we propose a model to tackle RC detection; the detailed steps of the framework are shown in Figure 1. After image preprocessing, feature extraction, feature selection, and feature fusion, RC can be distinguished from healthy kidneys in histopathology images.
Algorithm 1: Detailed steps (refer to Figure 1) are as follows.
Step 3. Fuse the deep and texture feature vectors obtained in the previous steps by concatenation.
Step 4. Train the classifiers with the merged features as input.
Step 5. Apply the trained model to the test set to validate it.
Output: the predicted labels.
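A minimal sketch of the selection and fusion steps, assuming $diff_k$ is the absolute difference between the class-wise means of the kth deep feature (the exact formula is given in [41] and may differ) and that fusion is plain concatenation as in Step 3; the function names are hypothetical:

```python
import numpy as np

def select_by_mean_difference(X_train, y_train, top_k=100):
    """Indices of the top_k columns ranked by |mean over positives - mean over negatives|."""
    X_train = np.asarray(X_train, dtype=float)
    pos = np.asarray(y_train).astype(bool)
    diff = np.abs(X_train[pos].mean(axis=0) - X_train[~pos].mean(axis=0))
    # sort from largest to smallest difference and keep the first top_k
    return np.argsort(-diff, kind="stable")[:top_k]

def fuse(deep_feats, texture_feats, selected):
    """Step 3: concatenate the selected deep features with the texture features."""
    return np.hstack([np.asarray(deep_feats)[:, selected], np.asarray(texture_feats)])
```

The selected indices should be computed on the training set only and then reused unchanged on the test set.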

Results and Discussion
In this section, we validate the proposed model on the dataset described in Section 2.1. The experiments were implemented in MATLAB 2020a on a computer with an Intel Core i7 processor, 16 GB of RAM, and Windows 10. Three traditional classifiers, logistic regression (LR), SVM, and random forest (RF), were used to detect RC from the merged features; each experiment was repeated ten times, and the average is reported as the final result. For LR, the penalty is set to "l2" and C = 1.0; for SVM, a linear kernel is used with C = 0.025; for RF, the split criterion is entropy and the maximum tree depth is 3. We adopted accuracy, precision, recall, and F1 score as evaluation metrics, defined as follows: Accuracy = (True Pos + True Neg)/(True Pos + True Neg + False Pos + False Neg), Precision = True Pos/(True Pos + False Pos), Recall = True Pos/(True Pos + False Neg), and F1 = 2 × Precision × Recall/(Precision + Recall), where True Pos is the number of correctly classified normal kidney images, True Neg is the number of correctly classified RC histopathology images, False Pos is the number of images incorrectly classified as normal kidney, and False Neg is the number of images incorrectly classified as RC.
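The four metrics can be computed from the confusion counts as sketched below, following the convention above that a normal kidney image is the positive class; the helper names are illustrative:

```python
def confusion_counts(y_true, y_pred, positive="normal"):
    """Count TP, TN, FP, FN, treating `positive` (normal kidney) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # RC predicted as normal
    fn = sum(t == positive and p != positive for t, p in pairs)  # normal predicted as RC
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

When results are averaged over repeated runs, the mean F1 score generally differs from the F1 score computed from the mean precision and mean recall.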

Deep Feature Results.
For Alex-Net, we tuned the training parameters and fine-tuned an Alex-Net model pretrained on ImageNet. We then extracted features from the histopathology images at the "fc7" layer, obtaining a 4096-dimensional vector for each image [42]. Training was terminated after 20 epochs when the validation accuracy did not improve. Alex-Net achieved an accuracy of 87.72%, a precision of 81.86%, a recall of 98.25%, and an F1 score of 88.89%, as shown in Figure 9.

3.2.1. HOG Results.
For HOG, we analysed a range of cell-size and block-size combinations (see Table 4) for renal cancer detection [43]. Figure 10(a) shows the HOG results using LR: the best accuracy of 83.34%, with a precision of 84.60%, a recall of 91.92%, and an F1 score of 89.95%, was achieved with combination No. 3 (6 × 6 cell size and 4 × 4 block size) and No. 4 (6 × 6 cell size and 5 × 5 block size). Figure 10(b) shows the HOG results using SVM: an accuracy of 88.80% was reached with combination No. 3, with a precision, recall, and F1 score of 87.87%, 92.92%, and 89.85%, respectively. As shown in Figure 10(c), using RF, the highest accuracy of 79.13% was obtained, with a precision of 80.28%, a recall of 81.03%, and an F1 score of 79.09%.
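To make the cell-size and block-size parameters concrete, here is a minimal NumPy sketch of HOG steps (2) and (3) from Section 2.5.1, followed by block normalization. The central-difference gradients, 9 unsigned orientation bins, and L2 normalization of blocks sliding one cell at a time follow the common Dalal and Triggs scheme and are assumptions; the exact settings behind Table 4 may differ, and `hog_features` is a hypothetical name:

```python
import numpy as np

def hog_features(img, cell=6, block=4, nbins=9, eps=1e-6):
    """HOG descriptor of a grayscale image: cell histograms + L2 block normalization."""
    img = np.asarray(img, dtype=float)
    # Step (2): central-difference gradients, magnitude, and unsigned direction
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # orientation in [0, 180)
    bins = np.minimum((ang / (180.0 / nbins)).astype(int), nbins - 1)

    # Step (3): magnitude-weighted orientation histogram for each cell
    ncy, ncx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ncy, ncx, nbins))
    for cy in range(ncy):
        for cx in range(ncx):
            sl = (slice(cy * cell, (cy + 1) * cell), slice(cx * cell, (cx + 1) * cell))
            hist[cy, cx] = np.bincount(bins[sl].ravel(), weights=mag[sl].ravel(),
                                       minlength=nbins)

    # Block normalization: L2-normalize each block x block group of cells (stride 1 cell)
    feats = []
    for by in range(ncy - block + 1):
        for bx in range(ncx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats) if feats else np.zeros(0)
```

With 6 × 6 cells and 4 × 4 blocks on a 48 × 48 patch, there are 8 × 8 cells and 5 × 5 block positions, each contributing 4 × 4 × 9 values, i.e., 3600 features; smaller cells lengthen the vector and raise the cost, as noted in step (3).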

3.2.2. GLCM Results.
For GLCM, four key properties were selected: contrast, correlation, energy, and homogeneity, as described in Subsection 2.5.2. All 15 combinations of these four properties were used to represent the texture features of the histopathology images. The property combinations are listed in Table 5, and the results are illustrated in Figure 11. As seen in Figure 11(a), using LR, the best accuracy of 73.04%, with a precision of 74.38%, a recall of 74.90%, and an F1 score of 73.01%, was obtained with contrast + correlation + energy. Figure 11(b) shows that with SVM, an accuracy of 71.79%, a precision of 67.88%, a recall of 97.04%, and an F1 score of 67.88% were reached using contrast + energy. For RF, the highest accuracy of 82.60%, higher than those of LR and SVM, was obtained using correlation + energy + homogeneity, with a precision of 82.53%, a recall of 83.65%, and an F1 score of 82.44%.
3.2.3. LBP Results. LBP, a higher-order statistical texture feature extraction method, was used as the third extractor for the kidney histopathology images. Uniform LBP with 8 neighbors and radius 1 was used because it has been shown to be compact and powerful [44]. The cell size was varied from 4 to 32. The results of LBP with varying cell sizes using the three traditional classifiers are shown in Figure 12. Using LR, the highest accuracy was 84.46% at a cell size of 16, with a precision of 83.74%, a recall of 84.46%, and an F1 score of 83.99% (refer to Figure 12(a)). As shown in Figure 12(b), using SVM, an accuracy of 81.73%, with a precision of 81.15%, a recall of 81.93%, and an F1 score of 81.37%, was obtained at a cell size of 8. Figure 12(c) presents the results using RF: the best accuracy, 85.21% at a cell size of 32, came with a precision of 85.21%, a recall of 84.46%, and an F1 score of 83.99%.
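With 8 neighbors, "uniform" patterns are the codes with at most two 0/1 transitions when the bits are read circularly; there are 58 of them, and all remaining codes share one extra bin, giving 59 bins per cell instead of 256. A minimal sketch of that mapping, assuming the codes were produced by sampling the neighbors in circular order (uniformity is defined over that order); the function names are hypothetical:

```python
import numpy as np

def uniform_lbp_mapping(P=8):
    """Map each P-bit LBP code to a bin: one bin per uniform pattern, one shared bin otherwise."""
    mapping = [-1] * (1 << P)
    next_bin = 0
    for code in range(1 << P):
        bits = [(code >> k) & 1 for k in range(P)]
        # number of 0/1 changes around the circular bit string
        transitions = sum(bits[k] != bits[(k + 1) % P] for k in range(P))
        if transitions <= 2:
            mapping[code] = next_bin
            next_bin += 1
    # every non-uniform code goes to the last bin
    mapping = [m if m >= 0 else next_bin for m in mapping]
    return mapping, next_bin + 1

def uniform_lbp_histogram(codes, P=8):
    """Histogram of LBP codes after mapping them to uniform-pattern bins."""
    mapping, nbins = uniform_lbp_mapping(P)
    lut = np.asarray(mapping)
    return np.bincount(lut[np.asarray(codes).ravel()], minlength=nbins)
```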

3.3. Transform-Based Texture Feature Family Results
3.3.1. Gabor Filter Results. The Gabor filter, one of the most commonly used filters in pattern recognition, was applied.
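A minimal NumPy sketch of the Gabor features described in Section 2.6.1 (real-part kernels of Equation (15), then the mean and variance of each filtered image). Treating the reported "filter size" as the wavelength $\lambda$ in pixels, setting $\sigma = 0.56\lambda$ (roughly a one-octave bandwidth), $\gamma = 0.5$, and using four orientations are all assumptions; the function names are hypothetical:

```python
import numpy as np

def gabor_kernel(wavelength, theta, gamma=0.5, sigma=None):
    """Real part of a 2-D Gabor kernel, truncated at 3 sigma."""
    sigma = 0.56 * wavelength if sigma is None else sigma   # assumed bandwidth
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # x'
    yr = -x * np.sin(theta) + y * np.cos(theta)    # y'
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * xr / wavelength)

def filter2d_same(img, kernel):
    """2-D convolution via FFT, cropped to the input size ('same' output, zero padding)."""
    H, W = img.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    full = np.fft.irfft2(np.fft.rfft2(img, shape) * np.fft.rfft2(kernel, shape), shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + H, left:left + W]

def gabor_features(img, wavelength, n_orient=4, gamma=0.5):
    """Mean and variance of the image filtered at n_orient orientations."""
    img = np.asarray(img, dtype=float)
    feats = []
    for t in range(n_orient):
        resp = filter2d_same(img, gabor_kernel(wavelength, t * np.pi / n_orient, gamma))
        feats += [resp.mean(), resp.var()]
    return np.array(feats)
```

Calling `gabor_features(img, wavelength=w)` for w = 4, 8, ..., 32 loosely mirrors the filter-size sweep described next.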
Here, we varied the filter size of the Gabor filter from 4 to 32 in steps of 4. The classification results of the different classifiers with increasing filter size are illustrated in Figure 13. The highest accuracy obtained with LR was 88.69%, with a precision of 88.23%, a recall of 88.46%, and an F1 score of 88.34%, at a filter size of 16. As shown in Figure 13(b), using SVM, an accuracy of …
Figure 14 shows the classification results using MRF with a varying number of iterations. As shown in Figure 14(a), using LR with 40 iterations, a relatively low accuracy of 53.91% was obtained, with a precision of 54.65%, a recall of 54.78%, and an F1 score of 53.78%. The highest accuracy of 80.86%, with a precision of 81.12%, a recall of 82.18%, and an F1 score of 80.75%, was obtained using SVM with 50 iterations (refer to Figure 14(b)). Using RF, an accuracy of 73.04% was obtained with 40 iterations (refer to Figure 14(c)).
Due to the lack of equipment, most hospitals can only provide conventional histopathology images captured at low magnification (100x), whose quality is much lower than that of whole-slide images (WSIs). As a result, our classification accuracy is not as high as that of studies using WSIs [45,46]. In the future, we could explore applying the proposed method to WSIs, as in the literature. In [29], GLCM with an SVM classifier achieved an accuracy of 92.8%, and GLCM with k-NN achieved 91.65%. These results are considerably higher than the accuracies of 73.04%, 71.79%, and 82.60% that we obtained using GLCM alone with LR, SVM, and RF, respectively. However, considering the large number of primary hospitals and patients, we could build a much larger dataset to verify the proposed method. Compared with the limited published WSI datasets, conventional histopathology images provided by primary hospitals might be a more promising data source.
In the future, there are several options for improving the accuracy of RC detection with our method. One avenue is to vary the size of the dataset to establish the optimal number of images. The impact of hardware specifications should also be considered, for example, a dedicated machine versus a minimum required specification. Furthermore, additional features such as shape could be used to describe the characteristics of histopathology images more comprehensively and achieve better performance in RC detection, so that RC can be detected and diagnosed early, effectively improving survival and cure rates.

Conclusion
In this study, we proposed a classification model that detects renal cancer in a histopathology dataset by fusing features from a deep learning model with extracted texture feature descriptors. After preprocessing the histopathology images, including geometric transformation and color normalization, we extracted deep features with Alex-Net and texture features with five texture feature descriptors from three families to complement Alex-Net, and then fused the deep and texture features to classify RC. To optimize the performance of the proposed method, we experimented with various parameters of each extractor. The experimental results showed that the proposed model outperformed both the deep learning model and each single texture feature descriptor, and we extensively studied how texture features complement deep features. In future work, we plan to apply the proposed model to other histopathology image datasets to further optimize its performance.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.