Transfer Learning with Feature Extraction Modules for Improved Classifier Performance on Medical Image Data

Transfer learning uses the knowledge learned from one task to improve learning on a separate but similar task. This article evaluates the effectiveness of this technique for classifying images from the medical domain. The article presents a model, TrFEMNet (Transfer Learning with Feature Extraction Modules Network), for classifying medical images. Feature representations from the General Feature Extraction Module (GFEM) and the Specific Feature Extraction Module (SFEM) are input to a projection head and the classification module to learn the target data. The aim is to extract representations at different levels of the hierarchy and use them for the final representation learning. For comparison with TrFEMNet, we trained three other models with transfer learning. Experiments on the COVID-19 dataset, brain MRI binary classification data, and brain MRI multiclass data show that TrFEMNet performs comparably to the other models. ResNet50, pretrained on the large ImageNet image dataset, is used as the base model.


Introduction
Transfer learning is the learning paradigm that aims to transfer knowledge from one task to another, related task [1]. In deep neural networks, the first layers often learn general features, and by the last layers, specific features are learned [2]. For example, different nodes in the last layers of a neural network learn features specific to a particular class. In this research work, we study the transferability of learning as the number of layers trained as feature extraction modules varies. This can be helpful because many of the low-level features learned from a vast amount of readily available data can be reused for another task that has less usable data. Transfer learning is often chosen when the dataset on which the model is to be built is small. A network is trained on data of the same kind as our dataset, such as image data, and the knowledge is transferred to a different task, such as the diagnosis of X-ray images. For this, some layers of the old network are retrained on the new dataset. If we retrain all the parameters in the network, the initial training phase on image recognition is sometimes called pretraining; we then update all the weights by training on the target data, which is called fine-tuning. Figure 1 depicts the general methodology proposed in this study.
In this study, we use brain magnetic resonance images (MRIs) and COVID-19 X-rays to evaluate the effectiveness of transfer learning. Magnetic resonance imaging is the most widely used technique for identifying brain tumors, which are detected from the MRI with the help of an expert's or a doctor's opinion. In this research, we apply transfer learning with feature extraction modules whose representations are input to a projection head. A softmax classifier classifies the learned features. We name our model TrFEMNet; it is effective because features from different levels of the hierarchy contribute to the final classification process. We believe that combining features from different levels of the hierarchy through a projection head is a novel contribution and results in effective outcomes, as seen from the experiments.
This model is evaluated on the identification of tumors from brain MRI images and of COVID-19 and viral pneumonia cases from COVID-19 X-ray images. A ResNet50 model pretrained on the ImageNet dataset is used as the base. The highlights of this article are as follows: (1) preprocessing, data augmentation, and normalization of images; (2) a mathematical formulation for transfer learning; (3) the proposed model TrFEMNet, with feature representation by the GFEM and SFEM and a projection head comprising a two-layer multilayer perceptron with nonlinearity, on the pretrained ResNet50 model; and (4) evaluation of the results of experiments on the brain MRI 2-class, brain MRI multiclass, and COVID-19 X-ray datasets.
Sensitivity, specificity, F1-score, and accuracy are used to evaluate the classifiers' performance. The rest of the article is organized as follows. Section 2 presents the literature review, Section 3 presents the mathematical formulation of transfer learning, Section 4 presents the materials and methods, and Section 5 presents the results. Finally, Sections 6 and 7 present the discussion and conclusion.

Literature Review
Numerous studies on brain MRI classification have been carried out. Varuna Shree et al. [3] presented a model that uses a discrete wavelet transform (DWT) for feature extraction, statistical features to reduce the number of features, and a blended artificial neural network for brain MRI classification, obtaining 98% accuracy. The authors in Ref. [4] present a content-based brain tumor detection system. They design a feature extraction framework using the VGG19 convolutional neural network (CNN) model with closed-form metric learning. The authors in Ref. [5] propose a method of predicting CT images from MRI images and show that their method is very robust. A model for tumor classification and segmentation was presented by Ali and Davut Hanbay [6], yielding a classification accuracy of 97.18%. Sajid et al. [7] applied data augmentation to the original dataset and then processed it with a CNN; the softmax function was employed in the classifier on both the original and the augmented data, and the accuracy was 94.58%. Sachdeva et al. [8] applied the region of interest (ROI) approach to an MRI dataset. The ROI images were then segmented to eliminate tissue and color features. They then chose the most efficient characteristics using genetic algorithms (GA) before the classification procedure. They obtained an accuracy of 94.9% in their study.
Nazir, Wahid, and Ali Khan described a new approach for automated brain tumor MRI identification and classification [9]. The data were divided into two classes: benign and malignant. The presented paradigm has three basic phases. In the first stage, filter methods were used to eliminate image noise. In the second stage, features were extracted by deriving the mean color moment of each image in the dataset. In the last stage, an artificial neural network (ANN) classified the feature set of color moments with 91.8% classification accuracy. In Ref. [10], the author applied various deep learning techniques to classify benign and malignant tumors and achieved 98.49% accuracy using a hybrid approach of CNN and SVM.
Navid et al. [11] proposed a deep learning model with generative adversarial networks (GANs) for the multiclass classification of MRI images. They used the GAN on several MRI datasets, including meningioma, glioma, and pituitary tumors, and then used a six-layer deep learning model to achieve an overall accuracy of 95.60%. Chakraborty [12] has created a dataset on Kaggle, on which several researchers have applied machine learning techniques and built models. Habibzadeh et al. [13] apply pretrained deep learning models to automatic white blood cell classification, and these models perform very well. The authors in Refs. [14,15] present a review of image enhancement techniques and work with limited data. Fayaz et al. [16] incorporated feature extraction methods and converted images to three-channel mode; the final accuracy achieved was 92.5% with the KNN classifier. Tajik et al. [17] proposed a texture overcome matrix model. They used different feature extraction methods such as PCA, index approach, and Gabor filtering to obtain an accuracy of 96.67%. Togacar et al. [18] employed a CNN and an SVM with the masks produced by the hypercolumn approach to achieve 96.77% accuracy.
In Ref. [19], the author used MobileNetV2 to select the features. They generated 1000 features using MobileNetV2 and used iterative neighborhood component analysis to find the most important features. These features were used to train an SVM, which obtained an accuracy of 99.10%. The dataset included 444 images related to three types of tumor diseases, and the rest showed no tumor. The authors of Ref. [20] obtained 95.75% accuracy using a 22-layer CNN model with the transfer learning approach; the dataset used in that study comprised three main kinds of tumors. The authors in Ref. [21] developed a model for segmenting and detecting brain tumors. They used Berkeley's wavelet transformation and a deep learning model for image segmentation, and the GLCM method for extracting features from the segmented images, obtaining an accuracy of 98.5%. The author of Ref. [22] tested many approaches for classification, including support vector machines (SVMs), K-nearest neighbors (KNNs), binary decision trees (BDTs), random forest (RF), and ensemble methods, and obtained a high accuracy of 97% using the SVM classifier. The authors of Ref. [23] employed the ResNet architecture with data suspension encoding and fusion layer to achieve 98.02% accuracy. The dataset used in that study included three types of tumor images. The author of Ref. [24] compared the findings using three distinct datasets. The author proposed a scalable range-based adaptive bilateral filter to reduce noise in images. A fully convolutional network was then used for segmentation, achieving 97% accuracy. Several other researchers have also applied various transformation techniques to improve the quality of images used for classification [25,26]. The authors in Ref. [27] propose a convolutional neural network model with dilations to classify different types of brain tumors and achieve an accuracy of 97%. The authors in Ref. [28] applied several optimizers to CNN models for brain tumor segmentation and found that the Adam optimizer gave the best accuracy, 99.2%, in enhancing the CNN's ability in classification and segmentation. In contrast, others in Ref. [29] aim at improving feature extraction by using a texture amortization map (TAM). Several researchers have also extensively studied the application of deep neural networks such as hybrid networks [30], capsule networks [31], and convolutional networks [32][33][34][35].

Transfer Learning-Overview and Notations
Transfer learning is a very powerful idea in deep learning since we can take the knowledge a neural network has learned from one task and apply that knowledge to a separate task [36][37][38][39].
The transformation of each original feature into a learned representation for knowledge transfer preserves the properties or the potential structures of the data and finds the correspondence between features [40]. We first present the notation used, for a better explanation of the concepts. In this work, we assume that the labels for all data instances are available; hence, supervised classification is performed. Our notation is broadly adapted from Ref. [40]. Consider a scenario where a network learns to recognize objects like cars and trucks and then uses all or part of that knowledge to improve the task of reading X-ray scans. Here, we have a source domain $D_s$ of images and a task $T_s$ of identifying the labels of cars and trucks in the source domain. Thus, the source domain $D_s$ can be presented as follows: $D_s = \{(x_i, y_i) \mid x_i \in X_s,\ y_i \in Y_s,\ i = 1, 2, \ldots, n_s\}$, and let the learned model be $lf_s$. Definition 5. We thus state that $lf_t^{*} = lf_s^{\mathrm{trans}}(D_t, T_t)$, where $lf_t^{*}$ is the model learned by applying knowledge transfer from $lf_s$ on $D_t, T_t$, while $lf_t = L(D_t, T_t)$.
In other words, by transfer learning, one can use the knowledge from the task $T_s$ of identifying the labels cars and trucks in the source domain $D_s$ of images to improve upon the target task $T_t$ of reading X-ray images for the presence of a disease, from the target domain of X-ray images. The scenario addressed in this research work is homogeneous, inductive transfer learning, and we apply the feature-based and parameter-based approaches [41]. We aim at investigating whether $lf_t^{*}$ is an improvement on $lf_t$ (please refer to Definition 5).

Materials and Methods
Three datasets have been used in this study: two datasets for brain MRI classification, namely, 2-class and 4-class data, and one 3-class dataset of COVID X-ray images.

COVID-19 X-Ray 3-Class Dataset.
This dataset is available on Kaggle [42]. It contains 111 samples of the COVID-19 class, 70 samples of the normal class, and 70 samples of the viral pneumonia class, each of size 512 × 512. We generated 500 samples of each class using data augmentation, applied median filters and histogram equalization, and resized the images to 224 × 224. Hence, there were 1500 training samples, with an equal number in each class. For testing, the original dataset provides 26 COVID-19 samples, 20 normal samples, and 20 viral pneumonia samples. The skull boundary technique was not applied to this dataset.

Brain MRI Tumor 2-Class Dataset.
The dataset used in this study is available on the Kaggle repository [12]. The dataset includes normal and tumor samples, with 98 samples belonging to the tumor class and 155 samples to the normal class. A total of 253 images were associated with the patients' MRI brain scans. The image quality is limited because the images come at multiple resolutions. The images are in JPEG format. In this article, the images were converted to grayscale, and the skull's border was found by removing the image's background color. This removes any extra color that may be present outside of the skull and, as a result, provides the contour of the original image. The data were split into 195 samples for training and 60 samples (30 of each class) for testing. Data augmentation techniques [18][19][20] were used to increase the number of training samples to 6412. Median filters and histogram equalization were applied. All images of the original dataset have different resolutions with sizes 512 × 512. We resized all images to 224 × 224 for further processing. All the images are normalized between 0 and 1.

Brain MRI Tumor 4-Class Dataset.
This dataset is available on the Kaggle website [43]. It contains 300 samples of the glioma tumor class, 306 samples of the meningioma tumor class, 405 samples of the no tumor class, and 300 samples of the pituitary tumor class. In addition, the dataset contains 100 samples of glioma tumors, 115 samples of meningioma tumors, 105 samples of no tumors, and 74 samples of pituitary tumors for testing purposes. No data augmentation technique was applied. Median filters and histogram equalization were applied. The original image size was 512 × 512, which we resized to 224 × 224. Table 1 gives the train-test distributions and the class imbalance, if any, in all the datasets.

Preprocessing.
The preprocessing steps applied are as follows: (1) Capturing the skull boundary of brain MRI images. (2) Data augmentation: new data samples were generated to balance the classes, using the Keras module with the following augmentations: a shear range of 20%, a zoom range of 20%, a rotation range of 30 degrees, and fill mode set to nearest. (3) Filters and histogram equalization: the median filter is an effective and extensively used filter for eliminating noise from images. A median filter is applied to the images after data augmentation. Finally, histogram equalization is applied to increase the images' contrast and improve the image quality.
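The filtering, equalization, and normalization steps can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the 3 × 3 window, the edge padding, and the function names median_filter, equalize_histogram, and preprocess are assumptions made for illustration, and skull-boundary capture, augmentation, and resizing are left out.

```python
import numpy as np

def median_filter(image, size=3):
    """Median-filter a 2-D grayscale image with a size x size window (assumed 3 x 3)."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")  # edge padding is an assumption
    # Collect every shifted view of the image, then take the per-pixel median.
    windows = [
        padded[r:r + image.shape[0], c:c + image.shape[1]]
        for r in range(size)
        for c in range(size)
    ]
    return np.median(np.stack(windows, axis=0), axis=0).astype(image.dtype)

def equalize_histogram(image):
    """Histogram-equalize an 8-bit grayscale image to stretch its contrast."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    span = cdf[-1] - cdf_min
    if span == 0:  # constant image: nothing to equalize
        return image.copy()
    lut = np.round((cdf - cdf_min) / span * 255).clip(0, 255).astype(np.uint8)
    return lut[image]

def preprocess(image):
    """Median filter, then histogram equalization, then scaling to [0, 1]."""
    filtered = median_filter(image, size=3)
    equalized = equalize_histogram(filtered)
    return equalized.astype(np.float32) / 255.0
```

Calling preprocess on a uint8 array of shape (height, width) returns a float32 array of the same shape with values between 0 and 1, matching the normalization range stated for the datasets.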

Evaluation Parameters.
The evaluation parameters used in this research work are sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-score, and accuracy, as presented in Table 2.
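For a single class, these parameters can be computed from the confusion-matrix counts, as in the sketch below. It assumes the standard textbook definitions of each measure; the function name binary_metrics and the choice to return 0.0 when a denominator is zero are illustrative assumptions.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the evaluation parameters from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives for one class.
    """
    def ratio(numerator, denominator):
        # Assumption: report 0.0 instead of failing when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    ppv = ratio(tp, tp + fp)          # positive predictive value (precision)
    npv = ratio(tn, tn + fn)          # negative predictive value
    f1 = ratio(2 * ppv * sensitivity, ppv + sensitivity)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "accuracy": accuracy,
    }
```

For the multiclass datasets, one common choice is to compute these values per class in a one-vs-rest fashion and then average them.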

Method Implementation Details.
All our experiments take ResNet50 trained on the ImageNet dataset as the base model. The architecture of ResNet50 is presented in Figure 3. For our proposed architecture, TrFEMNet, the General Feature Extraction Module (GFEM) is implemented by freezing the weights of the lower layers of the ResNet50 model; the upper-level layers are retrained with the target data instances to implement the Specific Feature Extraction Module (SFEM). The outputs of the GFEM and SFEM give the feature representations, which are fed into a projection head (PH) module. The output from the PH module is fed into the softmax classifier for the final output. The PH module comprises a two-layer MLP with ReLU nonlinearity. This module extracts the features at a middle level and a higher level of the hierarchy and gives a better representation for improved classifier performance. A few dense layers have also been added above the convolutional layers. For comparison with TrFEMNet, we have built three other models without the projection head, varying the number of layers with fixed and trainable weights.
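The data flow from the two feature extraction modules through the projection head to the softmax classifier can be sketched as a NumPy forward pass. This is an illustration under stated assumptions, not the trained model: the GFEM and SFEM outputs are assumed to be flattened vectors combined by concatenation, the ReLU is assumed to sit between the two projection layers, and all dimensions, the random initialization, and the names init_params and forward are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def init_params(gfem_dim, sfem_dim, hidden_dim, proj_dim, n_classes, seed=0):
    """Randomly initialize the projection head and classifier (illustrative only)."""
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        weights = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out))
        return weights, np.zeros(n_out)

    in_dim = gfem_dim + sfem_dim  # assumption: features are concatenated
    return {
        "ph1": dense(in_dim, hidden_dim),
        "ph2": dense(hidden_dim, proj_dim),
        "clf": dense(proj_dim, n_classes),
    }

def forward(params, gfem_features, sfem_features):
    """Map GFEM and SFEM feature batches to class probabilities."""
    x = np.concatenate([gfem_features, sfem_features], axis=-1)
    w1, b1 = params["ph1"]
    w2, b2 = params["ph2"]
    wc, bc = params["clf"]
    hidden = relu(x @ w1 + b1)        # first projection layer with ReLU
    projected = hidden @ w2 + b2      # second projection layer
    return softmax(projected @ wc + bc)

# Hypothetical sizes: 4 images, 256 GFEM features, 512 SFEM features, 3 classes.
params = init_params(gfem_dim=256, sfem_dim=512, hidden_dim=128, proj_dim=64, n_classes=3)
rng = np.random.default_rng(1)
probs = forward(params, rng.normal(size=(4, 256)), rng.normal(size=(4, 512)))
```

Each row of probs holds the class probabilities for one image and sums to 1.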
We develop models based on ResNet50, with softmax as the output layer activation function. The models are described as follows. Model 1 consists of one trainable dense layer of 1024 neurons (other than the last classification dense layer). In this model, we freeze all convolutional layers of ResNet50, so only one dense layer is trained. Model 2 is similar to Model 1, except that it has two trainable dense layers of sizes 1024 and 512, respectively. In Model 3, we freeze all convolutional layers except the last one. ResNet50 has 48 convolutional layers, so we freeze 47 convolutional layers and train one convolutional layer and one dense layer of 1024 neurons. For the TrFEMNet model, we keep two convolutional layers and two dense layers with 1024 and 512 neurons trainable for the SFEM, and the frozen layers comprise the GFEM. The outputs from the GFEM and SFEM are fed into the PH and then to the classifier. Figure 4 shows the schematic diagram of transfer learning for the TrFEMNet model. Table 3 shows the parameters used in the training phase for all models.
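The four configurations differ only in how many of the top convolutional layers are trainable, which dense layers are added, and whether a projection head is present. The sketch below records that as plain data and derives a per-layer trainable flag. The count of 48 convolutional layers is taken from the text above; the names ModelConfig, CONFIGS, and trainable_flags are hypothetical, and in a deep learning framework the flags would be applied to the corresponding layers.

```python
from dataclasses import dataclass

TOTAL_CONV_LAYERS = 48  # convolutional layer count as stated in the text

@dataclass(frozen=True)
class ModelConfig:
    name: str
    trainable_conv_layers: int  # counted from the top (output side) of ResNet50
    dense_units: tuple          # trainable dense layers before the classifier
    projection_head: bool

CONFIGS = [
    ModelConfig("Model 1", 0, (1024,), False),
    ModelConfig("Model 2", 0, (1024, 512), False),
    ModelConfig("Model 3", 1, (1024,), False),
    ModelConfig("TrFEMNet", 2, (1024, 512), True),
]

def trainable_flags(config, total=TOTAL_CONV_LAYERS):
    """Return one flag per convolutional layer, ordered from input to output.

    False marks a frozen layer (GFEM side); True marks a retrained layer (SFEM side).
    """
    frozen = total - config.trainable_conv_layers
    return [False] * frozen + [True] * config.trainable_conv_layers

for config in CONFIGS:
    flags = trainable_flags(config)
    print(config.name, "frozen:", flags.count(False), "trainable:", flags.count(True))
```

Keeping the configurations as data makes it straightforward to add further variants, for example with more trainable convolutional layers.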

Experimental Results on All Datasets.
The models were applied to the healthcare datasets, and the results are presented in Tables 4-6. From Tables 4-6, it is seen that TrFEMNet performs comparably to the other models in all cases. For the COVID-19 dataset, TrFEMNet achieves the highest values for all parameters except accuracy. For the brain MRI 2-class dataset, the model gives the second-highest values, within 0.65% of the best. Moreover, for the brain MRI 4-class dataset, the value for specificity is 60.45%, which is 7% higher than the second best; the value for sensitivity is 85.01%, which is 0.74% higher than the second best. Similarly, the values for accuracy, at 78.05%, and F1-score, at 53.09%, are also the highest.

Discussion
The transfer learning approach applied in this research uses ResNet50 as the base model for parameter sharing from the previous training on the ImageNet dataset. The mathematical formulation of the transfer learning concept is given by means of several definitions, and Definition 5 provides the theme of the research conducted. We aim at finding out how well the model learned through transfer learning performs. For this, the concepts of the General Feature Extraction Module and the Specific Feature Extraction Module have been introduced. The results of our experiments on the brain MRI 2-class dataset show that Model 1 and Model 2, which do not have any convolutional layers in their SFEM, do not perform very well. Model 3, with one convolutional layer in the SFEM, performs slightly better and achieves the best values for the brain MRI 2-class dataset. The TrFEMNet model performs comparably to the other models for most of the parameters on all datasets. For the COVID-19 dataset, the micro- and macro-average ROC curve areas are 0.97 and 0.96, respectively. For the brain MRI 2-class dataset, the ROC curve area is 0.99, and for the brain MRI 4-class dataset, both the micro- and macro-average curve areas are 0.71, while for the glioma tumor class, it is 0.86.

Conclusion
Feature representations from the General Feature Extraction Module (GFEM) and the Specific Feature Extraction Module (SFEM) are input to a projection head and the classification module to learn the target data. The aim is to extract representations at different levels of the hierarchy and use them for the final representation learning. In addition, for a detailed understanding of our work, we have described the steps for preprocessing, data augmentation, and normalization of images. A mathematical formulation for transfer learning is given in Section 3. The proposed model, TrFEMNet, obtains the feature representation from the GFEM and SFEM, with a projection head comprising a two-layer multilayer perceptron with nonlinearity, on the pretrained ResNet50 model. The model is evaluated by comparing it with three other models built by transfer learning, on the brain MRI 2-class, brain MRI multiclass, and COVID-19 X-ray datasets. Experiments on the COVID-19 dataset, brain MRI binary classification data, and brain MRI multiclass data show that TrFEMNet performs on par with the other models, and for the most complex dataset, it achieves an improvement of 7% in specificity over the second-best model. The outcome of this research motivates further investigation into transfer learning. As part of our ongoing work, we aim at investigating more architectures for the projection head, with more levels of extraction from the base model.

Data Availability
The datasets used in this study are available at https://www.kaggle.com.

Conflicts of Interest
The authors declare no conflicts of interest.