Remote Diagnosis and Triaging Model for Skin Cancer Using EfficientNet and Extreme Gradient Boosting



Introduction
Skin cancer is one of the most commonly occurring and deadly types of cancer. The estimated number of newly diagnosed skin cancer patients in the USA during 2020 is more than 1.8 million [1]. Skin cells are usually damaged by excessive exposure to ultraviolet (UV) radiation. Skin cancer is a type of cancer caused by damaged skin cells or abnormal growth of skin cells. It can be mainly categorized as basal cell carcinoma (BCC), melanoma (MEL), nonmelanoma skin cancer, and squamous cell carcinoma (SCC). However, some types of skin cancer are very rare, such as Kaposi sarcoma (KS), actinic keratosis (AK, also known as solar keratosis), lymphoma, and keratoacanthoma. Some skin cancer types are lethal and metastatic in nature. Early screening and prognosis of skin cancer increase the chance of recovery and survival; otherwise, the disease leads to grim outcomes. The widespread and deadly nature of the disease demands an effective noninvasive diagnostic mechanism with increased accuracy. Skin cancer is mainly diagnosed via visual examination along with clinical and histological investigations. The clinical information includes demographic information and the location and nature of the skin lesion [2]. Visual examination with the naked eye usually cannot recognize and disclose the details. To overcome this drawback, the dermatoscope, a medical instrument for skin lesion investigation, was introduced. The device greatly enhanced the accuracy of early diagnosis [3]. A dermoscopic image is a magnified, high-resolution image of the skin lesion.
Although dermoscopic images greatly enhanced diagnostic accuracy, diagnosis still depends heavily on the dermatologist's experience and subjective judgment [3]. The high visual similarity among different types of skin cancer sometimes leads to wrong diagnoses. The diagnosis can be further enhanced using the seven-point checklist [4] and the ABCD and ABCDE rules [5]. The seven-point checklist uses seven dermoscopic features for malignant melanoma diagnosis, and the sensitivity of the diagnosis was further improved by integrating dermoscopic images with the seven-point checklist [6]. In the ABCDE rule, A stands for asymmetry, B for border, C for colour, D for differential structure, and E for evolution [7]. The ABCDE rule increases the accuracy of the diagnosis but also requires proper training to apply the criteria. The previously mentioned methods improved the diagnosis process but were limited to melanoma only. Therefore, there is demand for an automated, computationally intelligent method that can further exploit the visual features and aid the dermatologist in the diagnosis. The rest of the paper is organized as follows. Section 2 reviews previous studies on skin cancer diagnosis. Section 3 presents the materials and methods used in the study. Section 4 provides the experimental setup and results. Section 5 contains the conclusion.

Related Studies
Several studies have been conducted to develop a computer-aided diagnosis (CAD) system for skin cancer [8,9]. Initially, studies on skin cancer focused mainly on image processing techniques [10], followed by machine learning techniques (supervised and unsupervised) [11] and, recently, convolutional neural network (CNN) and deep learning models [12]. Deep learning models have produced significant advancements in medical image analysis, particularly for skin cancer [13,14]. Some recent studies using deep learning are discussed below; the studies are organized chronologically.
Esteva et al. used a deep CNN model for the diagnosis of two types of skin cancer, keratinocyte carcinomas and seborrheic keratoses. The diagnostic performance was compared against decisions made by 21 highly qualified and experienced dermatologists [15]. The study proved the significance of AI, and particularly deep learning, in skin cancer diagnosis. The study was performed using the ISIC and Dermofit skin lesion data sets. Another investigation was made by Haenssle et al. [16], who compared the performance of the Google Inception V4 deep learning model with the top five algorithms in the ISIC 2016 challenge and with the diagnostic decisions made by 58 dermatologists. The data set contains both images (dermoscopic and digitalized) and clinical information for 100 patients. Furthermore, Brinker et al. [17] developed an enhanced deep learning model for the diagnosis of melanoma. They compared the performance of the model with the diagnostic decisions made by 145 dermatologists from 12 hospitals in Germany.
Additionally, Pacheco et al. [18] developed a smartphone application using skin lesion images and clinical information for automated diagnosis. The study covers six categories of skin cancer with a total of 1,641 skin lesions.
The study compared various deep learning models such as GoogleNet, VGGNet-13/19-bn, ResNet (50, 101), MobileNet, and a three-layer convolutional neural network. The models were first trained using skin lesion images taken with smartphone cameras and then using both skin lesion images and clinical features. The image-only model achieved an accuracy of 0.69, which improved to an average accuracy of 0.764 with the integration of clinical data. The proposed study attempts to enhance the outcome achieved by Pacheco's study.
Subsequently, Kadampur and Riyaee developed a model-driven architecture for the diagnosis of skin cancer using dermal cell images. Several deep learning models were trained using the HAM10000 data set and achieved an AUC (area under the curve) of 0.99 [19]. Likewise, two CNN models, the region-based convolutional neural network (RCNN) and Faster RCNN, were used to classify skin lesions into benign and malignant tumor images [20]. The outcome of the model was compared with the diagnoses made by 10 certified dermatologists and 10 trainee dermatologists, and it conclusively achieved better classification accuracy than the dermatologists.
Wei et al. [21] used pretrained deep learning models, MobileNet and DenseNet, with ImageNet weights for feature extraction on the ISIC 2016 data set and achieved an accuracy of 0.962. Another study, by Pham et al. [22], developed a CNN model for melanoma classification using the ISIC 2019 and MClass-D dermoscopic skin lesion sets and achieved an AUC of 0.944. The diagnoses of the proposed framework were further verified with 157 certified dermatologists in German hospitals.
Importantly, the integration of skin cancer clinical images and intelligent computation techniques has produced effective outcomes and motivated the exploration of remote triaging for skin cancer diagnosis. Recently, Udrea et al. [23] proposed a smartphone application for identifying patients at risk using skin lesions. The model was trained using skin lesion images from multiple data sets. Initially, lesion segmentation was applied to the image; after segmentation, noise such as hairs and freckles was removed; features such as colour, shape, and texture were then extracted; and finally, all the extracted features were input to a support vector machine (SVM) classifier. The application produced good outcomes in terms of sensitivity (0.951). The transfer learning concept has been widely used for skin cancer detection. One study, performed by Kassem et al. [24] on the ISIC 2019 challenge data set using the GoogleNet pretrained model for eight categories of skin cancer lesions, achieved an accuracy of 0.949. Another extensive study evaluated the performance of the proposed YOLOv2-SqueezeNet for segmentation and several classifiers for classification using four years of ISIC challenge data sets (2017, 2018, 2019, and 2020) [25]. That study achieved a mean average precision of 0.985 using an optimized SVM. Moreover, Gessert et al. [26] proposed an ensemble method integrating gender, anatomy information, and skin lesions for diagnosing skin cancer using multiple data sets. Several image processing techniques were applied for preprocessing. The model was trained using EfficientNet and achieved an accuracy of 0.63 on the ISIC 2019 data set. Recently, a study by Goceri [27] developed a multilayered deep learning model using facial skin lesions. Initially, the images were segmented to identify the facial skin disorder lesions.
Subsequently, these skin lesions were classified by a pretrained DenseNet201. The study achieved an accuracy of 0.95. Furthermore, another study was performed for melanoma diagnosis using dermoscopic images [28]. The ISIC 2020 data set was used for training EfficientNet models (B5 and B6).
The study extensively applied several data augmentation techniques to increase the number of images and better train the deep learning models, achieving an accuracy of 0.9411.
Despite the extensive research on skin cancer diagnosis, most studies use skin lesion images only, and very few use clinical data, although the importance of clinical data in diagnosis cannot be denied [15,16,22]. One recent study, by Pacheco and Krohling, used a data set consisting of digitalized images taken with smartphone cameras along with the patients' clinical data [18]. The data set covers multiple types of skin cancer. Despite the significant results already achieved on this data set, they can be further enhanced, and several techniques can be integrated to better train the model.

Materials and Methods
This section describes the data set (PAD-UFES-20), the data preprocessing, and the classification models used in the study.

Data Set Description.
The PAD-UFES-20 [29] data set was collected under the Dermatological and Surgical Assistance Program (PAD) at the Federal University of Espírito Santo. The data set consists of skin lesions and clinical data, with an average patient age of 60 years. It contains the data of 1,373 patients, 1,641 skin lesions, 2,298 images, and metadata with 26 attributes. Some images were removed because of the low quality of the phone camera used to capture them. Some patients have more than one type of skin cancer lesion. The number of images per category is shown in Figure 1. PAD-UFES-20 suffers from class imbalance; the numbers of images for ACK and BCC are high compared with the other categories. The data set covers six types of skin cancer: actinic keratosis (ACK), basal cell carcinoma (BCC), melanoma (MEL), nevus (NEV), squamous cell carcinoma (SCC), and seborrheic keratosis (SEK).
Moreover, the data set contains metadata, that is, clinical features (26 attributes), in addition to the skin lesions. Some attributes are identifiers and were removed; the remaining 21 features are the clinical data and the class label. Some features are demographic information such as age, smoking and drinking habits, and the father's and mother's backgrounds. Other attributes relate to the lesion, such as whether it itches, bleeds, or hurts. The clinical features are based on questions commonly asked by dermatologists. The description of the vital attributes in the data set is shown in Table 1.
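As an illustration of this preprocessing step, the snippet below builds a tiny stand-in for the metadata table, drops the identifier columns, and separates the clinical features from the class label. The column names and values here are assumptions for illustration, not the exact PAD-UFES-20 schema.

```python
import pandas as pd

# Tiny stand-in for the PAD-UFES-20 metadata table; the real file has
# 26 attributes per lesion, and these column names are assumptions.
meta = pd.DataFrame({
    "patient_id": ["PAT_1", "PAT_1", "PAT_2"],   # identifier
    "lesion_id": [1, 2, 1],                      # identifier
    "img_id": ["a.png", "b.png", "c.png"],       # identifier
    "age": [62, 62, 41],                         # clinical feature
    "itch": [True, False, True],                 # clinical feature
    "grew": [False, False, True],                # clinical feature
    "diagnostic": ["BCC", "ACK", "NEV"],         # class label
})

# Drop the identifier columns, keeping clinical features and the label.
identifiers = ["patient_id", "lesion_id", "img_id"]
clinical = meta.drop(columns=identifiers)

# Separate the clinical features from the class label.
X = clinical.drop(columns=["diagnostic"])
y = clinical["diagnostic"]
```

After this step, `X` holds only the clinical features and `y` the diagnosis, mirroring the 21-feature split described above.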
Some skin cancers have common regions on the human body. For example, SEK lesions are more common in the face region, ACK is common on the forearm, and NEV is more common on the back. The occurrence of skin cancer lesions in different body regions is shown in Figure 2. Similarly, ACK, BCC, and MEL lesions typically do not grow, whereas the SEK category has an equal distribution of lesions that sometimes grow and sometimes do not. Figure 3 shows the distribution of skin lesions per category based on the attribute grew. ACK, BCC, and SCC skin moles are itchy in nature; Figure 4 shows the itchiness of the different types of skin cancer. However, only the ACK type hurts and bleeds when compared with the other five types. Sample images from PAD-UFES-20 for each category of skin cancer are presented in Figure 5.
Most of the features in the data set are categorical, except age, Fitzpatrick, diameter_1, and diameter_2. The statistical description of the numerical features is presented in Table 2.
The mean age for the ACK, BCC, MEL, SCC, and SEK types ranges from 59.9 to 68.86 years; however, the mean (μ) age for the NEV category is 35.64 years. The minimum patient age in the data set is 6 years, and the maximum is 94 years. Similarly, the mean of diameter_1 is similar for BCC, NEV, and SCC, while the mean of diameter_2 is similar for the ACK, BCC, and SCC categories.
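A Table 2-style per-category summary of the numerical features can be produced with a simple pandas aggregation; the miniature data frame below is invented for illustration and does not reproduce the real statistics.

```python
import pandas as pd

# Invented miniature stand-in for the PAD-UFES-20 metadata;
# the real per-category statistics differ.
df = pd.DataFrame({
    "diagnostic": ["NEV", "NEV", "BCC", "BCC", "ACK"],
    "age": [30, 41, 62, 70, 59],
    "diameter_1": [4.0, 5.5, 9.0, 8.0, 7.5],
})

# Mean/min/max of a numerical feature per diagnosis category.
summary = df.groupby("diagnostic")["age"].agg(["mean", "min", "max"])
```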

Data Preprocessing and Augmentation.
For better generalization of the deep learning model and to alleviate the data imbalance problem, data augmentation techniques were applied. Data imbalance usually leads to model overfitting toward the majority class. Augmentation was applied via resizing, flipping, shifting, and rotation. For resizing, a zoom range of 0.1 and a rescale factor of 1.0/255 were set, with an input dimension of 300 × 300 × 3, the recommended input size for EfficientNetB3. Moreover, random horizontal and vertical flipping, along with width and height shifting with a range of 0.1, were used to increase the generalization of the model for all possible locations of the skin cancer in the images. For some images, 360° rotation was performed. The augmentations were applied only to the training data set.
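A minimal NumPy sketch of these transformations is shown below. The study itself presumably used a library pipeline (e.g., Keras' ImageDataGenerator); in particular, the 90°-step rotation here is a simplification of the full 360° rotation, and the zoom step is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img):
    """Apply rescale/flip/shift/rotate augmentations to one
    H x W x 3 uint8 image, returning a float image in [0, 1]."""
    out = img.astype(np.float32) / 255.0            # rescale = 1.0/255
    if rng.random() < 0.5:                          # random horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:                          # random vertical flip
        out = out[::-1, :]
    h, w = out.shape[:2]                            # width/height shift, range 0.1
    dy = int(rng.integers(-h // 10, h // 10 + 1))
    dx = int(rng.integers(-w // 10, w // 10 + 1))
    out = np.roll(out, (dy, dx), axis=(0, 1))
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotation in 90-degree steps
    return out

img = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)  # dummy lesion image
aug = augment(img)
```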

Classification Model.
After the data augmentation, the classification model was developed. The proposed model consists of two classifiers: EfficientNet for the skin lesions and Extreme Gradient Boosting (XGB) for the clinical data. Both models are described below.

EfficientNet Deep Learning Model.
The convolutional neural network (CNN) is a kind of deep learning (DL) model that is widely used for images [30]. Recently, deep learning has been widely used for the diagnosis of various medical diseases, and some studies have addressed the diagnosis of skin diseases using deep learning [12]. A DL model consists of multiple connected layers with various weights and activation functions. A basic deep learning model contains convolutional, pooling, and fully connected layers. Several activation functions are used while adjusting the weights, and each layer produces a feature map that is input to the subsequent layer.

Pooling and convolutional layers are used for feature extraction; these layers extract the visual features and capture the complex nature of the images. Nevertheless, the appearance of skin cancer lesions is very complex, and developing an automated diagnosis system with a deep learning model trained from scratch is challenging. To alleviate this problem, transfer learning is used.
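Conceptually, transfer learning freezes the pretrained feature extractor and trains only a small classification head on the new data. The NumPy sketch below illustrates this with a toy frozen feature matrix and a softmax head trained by gradient descent; it is a conceptual illustration only, not the study's Keras/EfficientNetB3 implementation, and all sizes and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone" features for 20 toy samples (stand-in for pooled
# pretrained-CNN features) and labels for 6 classes
# (ACK, BCC, MEL, NEV, SCC, SEK).
n, d, n_classes = 20, 16, 6
feats = rng.normal(size=(n, d))
labels = rng.integers(0, n_classes, n)

# Trainable softmax head; a single dense layer for brevity.
W = np.zeros((d, n_classes))
b = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr, losses = 0.1, []
for _ in range(50):
    p = softmax(feats @ W + b)
    losses.append(-np.log(p[np.arange(n), labels]).mean())  # cross-entropy
    g = p.copy()
    g[np.arange(n), labels] -= 1.0   # gradient of CE w.r.t. logits
    g /= n
    W -= lr * feats.T @ g            # only the head is updated;
    b -= lr * g.sum(axis=0)          # the backbone stays frozen
```

Only `W` and `b` change during training, mirroring how a pretrained backbone's convolutional weights stay fixed while the dense head adapts to the new classes.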
In our study, EfficientNetB3 is used for skin cancer detection. EfficientNetB3 is an up-to-date, cost-efficient, and robust model developed by scaling three parameters: depth, width, and resolution [31]. The EfficientNetB3 model with noisy-student weights is used in scenarios I and III for the transfer learning process, while "isicall_eff3_weights" weights are used as the pretrained weights for scenarios II and IV. A GlobalAveragePooling2D layer is added in each scenario to generalize the model better and reduce the number of parameters. Furthermore, the ReLU activation function is used with three dense and two dropout layers. The output layer contains multiple output units for multiclass classification using the softmax activation function. Table 3 lists the layers, parameters, weights, and other settings used in the proposed study.

Extreme Gradient Boosting (XGB).
Extreme Gradient Boosting (XGB) is an ensemble-based classification algorithm proposed by Chen in 2015 [32]. XGB uses boosted trees and is applied to both classification and regression. It has been widely used for various prediction tasks and produces significant outcomes due to its efficient learning capability and speed. XGB is an enhanced version of the gradient boosting tree. The main aim of the algorithm is the optimization of an objective function by reducing the loss, the model complexity, and the computational resource utilization. The complexity is reduced using regularization, which also alleviates model overfitting. XGB was chosen for the clinical data because of its innate capability to handle data imbalance. The algorithm works by iteratively adding trees that split on the features; in each iteration, new rules are added and the loss decreases, and the iterations continue until the model achieves the required performance. XGB applies a second-order approximation to the loss function. Assume D is a data set consisting of n instances:

D = {(X_i, Y_i)}, i = 1, …, n. (1)
Y represents the class attribute; Y_i represents the actual value, while Y_i′ represents the predicted value:

Y_i′ = Tree_Ens(X_i) = Σ_{k=1..N} f_k(X_i), f_k ∈ F, (2)

Obj = Σ_{i=1..n} loss(Y_i, Y_i′) + Σ_{k=1..N} Ω(f_k), (3)

where Tree_Ens represents the tree ensemble model, loss represents the loss function (the difference between the predicted and the actual values), N represents the number of trees, F represents the set of trees used in model training, and Ω represents the regularization term.
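The second-order update described above (first and second derivatives g and h of the loss, optimal leaf weight −G/(H + λ)) can be demonstrated from scratch with depth-1 trees. The sketch below is a simplified illustration on synthetic binary data, not the xgboost library the study would use in practice; the candidate thresholds, λ, and the learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary task standing in for the clinical data.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

lam, lr, n_rounds = 1.0, 0.5, 20   # regularization, shrinkage, trees
raw = np.zeros(len(y))             # raw scores (log-odds)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(n_rounds):
    p = sigmoid(raw)
    g = p - y              # first derivative of the logistic loss
    h = p * (1.0 - p)      # second derivative
    best = None
    for j in range(X.shape[1]):                       # greedy split search
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            GL, HL = g[left].sum(), h[left].sum()
            GR, HR = g[~left].sum(), h[~left].sum()
            gain = GL**2 / (HL + lam) + GR**2 / (HR + lam)
            if best is None or gain > best[0]:
                # Optimal leaf weights: -G / (H + lambda).
                best = (gain, j, t, -GL / (HL + lam), -GR / (HR + lam))
    _, j, t, wl, wr = best
    raw += lr * np.where(X[:, j] <= t, wl, wr)        # add the new stump

acc = ((sigmoid(raw) > 0.5) == y).mean()              # training accuracy
```

Each round fits one stump to the current gradients, so the loss shrinks iteratively exactly as described in the text.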

Experimental Setup and Results
The models were implemented using Python 3.8.4, and the experiments were carried out in a Google Colab notebook with the GPU run-time type. Experiments were conducted on the original and augmented data sets. The experimental scenarios are discussed below.

Scenario I. EfficientNet noisy-student weights (PAD-UFES-20 data set): In this scenario, the EfficientNet noisy-student weights were used for training the model. Initially, the weights were computed using the ISIC 2019 data set; later, the model was further trained and tested using PAD-UFES-20 images. Noisy student is a semisupervised technique that enhances the training and refinement of the model [33] and improved performance on ImageNet. The main idea behind noisy student is that the number of student models is equal to or greater than the number of teacher models, with the aim that the larger the number, the better the training. Secondly, noise is added so that the noisy students are pushed to learn harder from the data set.

Scenario II. ISIC 2019 weights (PAD-UFES-20 data set): ISIC 2019 weights were used to initialize the model [34], while PAD-UFES-20 skin lesions were used for training and testing.

Scenario III. EfficientNet noisy-student weights (PAD-UFES-20 data set with clinical data): In this scenario, the EfficientNet noisy-student weights were used, and both skin lesions and clinical data were used.

Scenario IV. ISIC 2019 weights (PAD-UFES-20 data set with clinical data): ISIC 2019 weights were used, and both skin lesions and clinical data were used.

During the experiments, the stratified fivefold cross-validation method was used. Moreover, 30 epochs with 76 steps per epoch and a batch size of 24 were used. The learning rate was set to 0.0001, and the ReduceLROnPlateau method was used to monitor the validation loss, with factor = 0.5, patience = 5, and min_lr = 0.000001.
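The learning-rate schedule can be made concrete with a plain-Python re-implementation of the ReduceLROnPlateau logic with the stated settings (factor = 0.5, patience = 5, min_lr = 10^-6). This is a simplified sketch; Keras' exact bookkeeping (min_delta, cooldown) differs slightly.

```python
# Plain-Python sketch of the ReduceLROnPlateau logic used above.
class ReduceLROnPlateau:
    def __init__(self, lr=1e-4, factor=0.5, patience=5, min_lr=1e-6):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best = float("inf")
        self.wait = 0

    def step(self, val_loss):
        if val_loss < self.best:           # validation loss improved
            self.best, self.wait = val_loss, 0
        else:                              # plateau: count stalled epochs
            self.wait += 1
            if self.wait > self.patience:  # halve the LR, floor at min_lr
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

sched = ReduceLROnPlateau()
# Validation loss improves for 3 epochs, then stalls for 7: the LR is
# halved once the plateau has lasted longer than `patience` epochs.
losses = [1.0, 0.8, 0.7] + [0.7] * 7
lrs = [sched.step(l) for l in losses]
```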
The ADAM optimization method with 0.001 was used as the solver. The model was validated in two different phases. The performance of the proposed model is evaluated in terms of several standard evaluation measures: accuracy, precision, recall, F1 measure, and AUC (area under the curve). Precision is the ratio of lesions predicted as a given skin cancer type that are correct, whereas recall is the ratio of lesions of a given type that are correctly predicted. Similarly, accuracy is the overall proportion of correctly predicted skin lesions, while the F1 measure is the harmonic mean of recall and precision. The results for all four scenarios are presented in Table 4. The highest average accuracy was the same for scenarios III and IV. The ISIC 2019 weights outperformed the noisy-student weights with the combined data, that is, skin lesions and clinical data; however, with skin lesions only, the noisy-student weights produced a better outcome. The reported average accuracy was 0.76, the recall was 0.82 with the same precision, and the F1 measure and AUC were 0.81. The study confirmed the finding of Pacheco's study [18] that the integration of clinical data enhances the diagnosis and triaging performance. The outcome of the proposed study was compared with the literature; it is important to mention that only one study was used for comparison because, so far, only one study has used the PAD-UFES-20 data set. The results were not compared with studies that used the ISIC 2019 data set because the current study's results were achieved using the PAD-UFES-20 data set.
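The evaluation measures defined above can be computed directly from a confusion matrix. The small NumPy example below uses an invented three-class prediction vector (the study itself has six classes) and computes per-class precision, recall, and F1, which would then be macro-averaged.

```python
import numpy as np

# Invented 3-class true/predicted labels for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

K = 3
cm = np.zeros((K, K), dtype=int)       # confusion matrix: rows = actual
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

tp = np.diag(cm).astype(float)
precision = tp / cm.sum(axis=0)        # correct / predicted, per class
recall = tp / cm.sum(axis=1)           # correct / actual, per class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()         # overall proportion correct
```

Macro-averaging (`precision.mean()`, etc.) gives a single score per measure across the classes.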
The PAD-UFES-20 data set was proposed and used by Pacheco. The results presented in Table 5 confirm that the proposed study outperformed [18] in terms of all the specified measures. Despite the significant results achieved by the proposed study, there is still room for improvement. The study used data augmentation to alleviate the data imbalance and improve model generalization; thus, it is recommended to collect more skin lesions for the categories that have far fewer samples than the others. Moreover, some of the clinical features were missing. Similarly, some of the images in the data set were not diagnosed via biopsy; for these, the diagnosis decision was made by the dermatologist. Therefore, there is a need for a data set in which all skin lesion diagnoses are confirmed by biopsy. Overall, however, the proposed study produced better results than the original study that proposed the data set.
Conclusively, the main contributions are as follows:
(1) The study explores the impact of clinical data on the diagnosis of skin cancer using skin lesions and proposes an automated tool for early diagnosis.
(2) For better generalization of the proposed model, data augmentation techniques were applied.
(3) In general, the proposed model outperformed the benchmark study and can serve as an effective tool for the diagnosis and triaging of skin cancer.

Conclusions
This research presents an automated diagnosis and triaging system for skin cancer. The study used the EfficientNetB3 model for analysing the images taken via smartphone cameras, while the Extreme Gradient Boosting (XGB) ensemble classifier was used for the clinical data. The main reason for using the XGB classifier is its better performance on imbalanced data sets. The proposed study confirms the finding of Pacheco's study that the integration of clinical data enhances the diagnosis and triaging performance. The average accuracy reported in the study was 0.76 using skin cancer images alone and 0.78 using both images and clinical data. The proposed study outperformed the benchmark study. Despite the data imbalance limitation, data augmentation techniques were applied to reduce the risk of model overfitting. The outcome was significant, but there is still a need for improvement: the model can be further enhanced by implementing and comparing other deep learning models, and it needs to be tested on multiple data sets. However, no other open-source data set for skin cancer diagnosis is currently available that contains both the skin cancer lesions and the clinical data.

Data Availability
The study was performed using the PAD-UFES-20 data set, which can be accessed at https://data.mendeley.com/datasets/zr7vgbcyr2/1.