Deep Learning-Based Classification for Melanoma Detection Using XceptionNet

Skin cancer is one of the most common types of cancer in the world, accounting for at least 40% of all cancers. Melanoma is the 19th most commonly occurring cancer, with about 300,000 new cases found in 2018. While cancer diagnosis is based on interventional methods such as surgery, radiotherapy, and chemotherapy, studies show that new computer technologies such as image processing can support early diagnosis of this cancer and so help physicians treat it. This paper proposes an automatic method for the diagnosis of skin cancer from dermoscopy images. The proposed model is based on an improved XceptionNet, which uses the Swish activation function and depthwise separable convolutions. This system improves the classification accuracy of the network compared with the original Xception and some other architectures. Simulations of the proposed method are compared with some related state-of-the-art solutions for skin cancer diagnosis, and the results show that the suggested method achieves higher accuracy than the compared methods.


Introduction
The most common cancer in the United States is skin cancer, which occurs in the tissues of the largest organ of the body, i.e., the skin [1]. The skin protects against heat, sunlight, wounds, and infections [2]. The skin has three layers: epidermis, dermis, and hypodermis [3]. The epidermis is the outermost layer of skin, which creates the skin tone and forms a waterproof barrier. The dermis is the second layer, which comprises tough connective tissue, sweat glands, and hair follicles. Finally, the hypodermis, the lowest layer, is made of connective tissue and fat [4]. The main threat to the skin is skin cancer. Skin cancer (like melanoma) is one of the most common types of cancer in the world, accounting for at least 40% of all cancers. It has been estimated that about 9,500 people in the US are diagnosed with skin cancer every day [5].
While cancer diagnosis is based on interventional methods such as surgery, radiotherapy, and chemotherapy, studies show that new computer technologies such as image processing have been applied successfully to the diagnosis and classification of cancers [6]. Among the different kinds of skin cancer, melanoma is the 19th most commonly occurring cancer in men and women [7]. In 2018, about 300,000 new cases were recognized [8]. According to the Cancer Cell Organization, melanoma, with 15,000 cases, is the fourth most common cancer in the world [9]. According to the same organization, melanoma was the 9th most common cause of cancer death in 2019 [10]. Skin cancer diagnosis is a difficult task because of the diversity of skin lesions, especially melanoma and carcinoma [11]. Several noninvasive methods have been proposed to avoid unnecessary biopsies when diagnosing melanoma [12]. Most of these methods contain three main parts: segmentation, feature extraction, and classification [13]. Several works have addressed this problem [1, 14-16]. Bansal et al. [17] proposed a technique for melanoma diagnosis based on deep learning-based image feature extraction. The authors used convolutional neural networks (CNN) with transfer learning to extract the features and several classifiers, including k-nearest neighbor (KNN), AdaBoost, and random forest (RF), for the final classification. The method was applied to the ISIC dataset, and the results reported the accuracy of each classifier. The technique performed well, but because of its complex configuration, it requires more processing time.
Xu et al. [6] presented a method for early detection of melanoma. They used a sequential methodology including image noise reduction, image segmentation, feature extraction, and classification. The segmentation method in the study was based on a convolutional neural network (CNN) optimized using satin bowerbird optimization (SBO). SBO was also used to select only the important features from the segmented images. Finally, a support vector machine (SVM) was used to classify the images based on the selected features. The method was applied to the American Cancer Society database, and the results showed the efficiency of the proposed method. Although the method provided good results, the combination of deep learning and the SBO algorithm made the system complex.
Razmjooy et al. [18] proposed a diagnosis technique for detecting malignant skin cancer. They first eliminated extra scales by smoothing and edge detection. Then, the method segmented the region of interest. Additional information was removed by mathematical morphology. The model used an MLP artificial neural network (ANN) optimized by the World Cup Optimization algorithm to obtain more efficient results. In that study, the authors used the optimized ANN to diagnose skin cancer. Simulations were performed on the Australian Cancer Database (ACD), and the results indicated that the suggested technique improved the performance of the method. However, the method relies on an ANN, which is now regarded as an older and less accurate approach.
Vocaturo and Zumpano [19] used a multi-instance learning (MIL) algorithm to distinguish melanoma from dysplastic nevi. Simulation results showed that the MIL technique can be considered a suitable tool for skin cancer diagnosis. However, MIL is a simple form of weakly supervised classification over sets of instances, which can give weaker results in some cases.
Dey et al. [20] proposed an optimal machine vision technique for the diagnosis of melanoma. The Bat algorithm was used to improve the accuracy of the diagnosis system. The distance-regularized level-set (DRLS) segmentation method was used for efficient segmentation of the melanoma. The results were then verified by evaluating the important image performance metrics (IPM) on the PH2 database to show the accuracy of the method.
The literature review showed that although there are different types of diagnosis systems for melanoma detection, numerous research gaps remain to be addressed, for example, the high complexity of some works, complex configurations, and low accuracy. The shortcomings of each work were noted above after its description. Therefore, this subject is still open and can be developed further. The Xception network builds on the Inception module [21] and blends Inception modules, convolutional layers, residual connections, and depthwise separable convolutions to improve its efficiency. The main target of the present research is to deliver a new improved version of Xception that uses the Swish activation function to diagnose skin cancer and to verify the method on the Skin Cancer MNIST: HAM10000 dataset. The presented system classifies the input images into three classes: normal, carcinoma, and melanoma. Furthermore, the results of the proposed Xception architecture are compared with some well-known methods, including VGG16 [22], InceptionV3 [23], AlexNet [24], and the original Xception [25], to show the superiority of the proposed methodology.
The configurations of all methods can be obtained from their papers for reproduction. The novelty of the presented study therefore lies in proposing an automatic diagnosis method for skin cancer dermoscopy images based on a new configuration of XceptionNet. In this study, an improved version of XceptionNet based on the Swish activation function is presented, and the results show higher accuracy than the original XceptionNet and some other related CNN-based methods for skin cancer diagnosis.

The Modified Xception Network Architecture
The Xception architecture is one of the popular and strong convolutional neural networks, and it is built on several important concepts, such as the convolutional layer, the depthwise separable convolution layer, residual connections, and the Inception module [21]. Furthermore, the choice of activation function is essential to a CNN architecture, and Swish, a newer activation function, has been used to improve on traditional activation functions [26]. In this study, the Swish activation function is used to improve the Xception image classification model for initial melanoma diagnosis [25]. Xception is based on the hypothesis, derived from the Inception module, that cross-channel correlations and spatial correlations within CNN feature maps can be entirely decoupled [27]. Figure 1 shows the overall module of an Inception v3.
As can be observed from Figure 1, the module handles cross-channel correlations by splitting the input data into four branches of 1 × 1 convolutions and average pooling, maps spatial correlations with 3 × 3 convolutions, and finally forwards the results to the concatenation layer. The overall module of the studied Xception module is shown in Figure 2.
As can be seen from Figure 2, in this network the input data passes through a single 1 × 1 convolution that feeds the 3 × 3 convolutions, with no average pooling, so that the 3 × 3 convolutions work on non-overlapping segments of the output channels before they are passed to the concatenation. This module is more consistent, stronger, and more reliable than the Inception module, and it maps cross-channel correlations and spatial correlations in a fully decoupled way. In the following, the stages of the Xception module are explained in detail:

Convolutional Layer.
To generate feature maps, the convolution kernels are applied to local regions of the input data [25]. The different convolution kernels generate the complete feature maps, such that the feature value at position (i, j) in the k-th feature map of the l-th layer is given by

$$S^{l}_{i,j,k} = \left(Wv^{l}_{k}\right)^{T} C^{l}_{i,j} + Bv^{l}_{k},$$

where $Wv^{l}_{k}$ describes the weight vector and $Bv^{l}_{k}$ the bias value of the k-th filter of the l-th layer, and $C^{l}_{i,j}$ describes the input patch centered on position (i, j) of the l-th layer.
The kernel $Wv^{l}_{k}$ is shared when generating the feature map $S^{l}_{i,j,k}$. This sharing reduces the model complexity and makes the network easier to train. Batch normalization is applied to the convolutional layers of the Xception module, and the activation function is ReLU, i.e.,

$$\mathrm{ReLU}(x) = \begin{cases} x, & x \ge 0, \\ 0, & x < 0, \end{cases}$$

where x describes the input data.
The ReLU activation function is mathematically simple, and it provides the nonlinearity that is vital in a convolutional neural network for identifying nonlinear features, which produces faster convergence and better predictions with less overfitting.
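The feature-value computation and the ReLU activation described in this subsection can be illustrated with a minimal NumPy sketch. The function names, the stride of 1, and the absence of padding are assumptions made for illustration; the patch is indexed by its top-left corner rather than its center, which only shifts the output indices.

```python
import numpy as np

def conv_feature_map(layer_input, weights, bias):
    """One filter over one input: S[i, j] = sum(weights * patch) + bias.

    layer_input: (H, W, C) array, weights: (kh, kw, C) array, bias: scalar.
    Assumes stride 1 and no padding (illustrative choices).
    """
    H, W, _ = layer_input.shape
    kh, kw, _ = weights.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = layer_input[i:i + kh, j:j + kw, :]  # input patch C
            out[i, j] = np.sum(weights * patch) + bias  # Wv^T C + Bv
    return out

def relu(x):
    """ReLU(x) = x for x >= 0 and 0 otherwise."""
    return np.maximum(x, 0.0)

image = np.ones((4, 4, 3))
kernel = np.ones((3, 3, 3))
feature_map = relu(conv_feature_map(image, kernel, bias=-30.0))
```

The same kernel is reused at every position, which is the weight sharing mentioned in the text.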

Depthwise Separable Convolution Layer.
The depthwise convolutions form the main part of the Xception modules. They decrease the computation and the number of model parameters by treating the spatial dimensions and the depth (channel) dimension separately. The depthwise convolution applies one filter to each of the M input channels and generates a feature map of size DF × DF × M. The depthwise convolution with one filter per input channel is obtained as follows:

$$G_{k,l,m} = \sum_{i,j} K_{i,j,m} \cdot F_{k+i-1,\,l+j-1,\,m},$$

where G describes the output feature map produced from the input feature map F, and K defines the depthwise convolution kernel. The m-th filter in K is applied to the m-th channel in F to compute the m-th channel of the output feature map. Thus, an image with multiple channels is filtered separately in each color channel. 1 × 1 convolution filters are then used to produce the output that is passed to the next layer. After the depthwise separable convolution layer, batch normalization is applied, and a max-pooling layer then reduces the computational complexity.
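To make the parameter saving concrete, the following NumPy sketch applies a depthwise convolution followed by a 1 × 1 (pointwise) convolution and counts the weights of both variants. The function names, the stride of 1, the lack of padding, and the example sizes (3 × 3 kernels, 128 input and 256 output channels) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def depthwise_conv(F, K):
    """G[k, l, m] = sum_{i,j} K[i, j, m] * F[k + i, l + j, m] (0-based indices).

    F: (H, W, M) input feature map, K: (ks, ks, M), one kernel per channel.
    """
    H, W, M = F.shape
    ks = K.shape[0]
    G = np.zeros((H - ks + 1, W - ks + 1, M))
    for i in range(ks):
        for j in range(ks):
            G += K[i, j, :] * F[i:i + G.shape[0], j:j + G.shape[1], :]
    return G

def pointwise_conv(G, P):
    """1 x 1 convolution: mixes the M channels into N outputs. P: (M, N)."""
    return G @ P

def param_counts(ks, M, N):
    """Weights of a standard convolution vs. a depthwise separable one."""
    standard = ks * ks * M * N
    separable = ks * ks * M + M * N
    return standard, separable

standard, separable = param_counts(3, 128, 256)
```

With these example sizes the standard layer has 294,912 weights and the separable one 33,920, roughly a ninefold reduction.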

Residual Connection.
To implement the residual connection, the ResNet architecture has been employed, where the network uses identity shortcut connections that feed directly into later layers. By considering the parameters as $p_i$, the residual block is explained as follows:

$$v_o = \mathcal{F}\left(v_i, \{p_i\}\right) + v_i,$$

where $v_i$ and $v_o$ describe the input and the output vectors of the layers, respectively, and $\mathcal{F}$ is the residual mapping computed by the stacked layers. The benefit of the residual connection is that it avoids the signal attenuation caused by passing through multiple stacked nonlinear transformations. It also speeds up the training process. Figure 3 shows the residual shortcut connection of ResNet.
Also, the method of using the residual shortcut connection in Xception is shown in Figure 4.
As can be observed from Figure 4, the input X can be passed directly to a later layer through an identity shortcut. In Figure 4, a 1 × 1 convolution with a stride of 2 × 2 carries the data to a later layer. Xception is a network with 36 convolutional layers that are used for feature extraction. These layers are organized into 14 modules, all of which have residual connections around them except the first and the last modules. In the pretrained Xception network, the input image should be of size 299 × 299 × 3.
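The shortcut idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: `residual_block`, `two_layer_mapping`, and the random weights are hypothetical stand-ins for the stacked layers, not the layers of Xception itself.

```python
import numpy as np

def residual_block(v_i, residual_fn, shortcut_fn=None):
    """Return v_o = F(v_i) + shortcut(v_i); the shortcut is the identity by default.

    Passing a shortcut_fn (for example a strided 1 x 1 projection) covers the
    case where the shapes of F(v_i) and v_i differ.
    """
    shortcut = v_i if shortcut_fn is None else shortcut_fn(v_i)
    return residual_fn(v_i) + shortcut

# Hypothetical residual mapping: two small linear layers with a ReLU between them.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(8, 8))
W2 = rng.normal(scale=0.1, size=(8, 8))

def two_layer_mapping(v):
    return np.maximum(v @ W1, 0.0) @ W2

v_in = rng.normal(size=(4, 8))
v_out = residual_block(v_in, two_layer_mapping)
```

If the stacked layers output zero, the block reduces to the identity, which is why the signal is not attenuated as depth grows.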

Swish Activation Function.
According to recent work [28], the Swish activation function gives efficient classification results. In other words, the Swish activation function improves CNN performance compared with the traditional ReLU activation function [25]. The mathematical formulation of the Swish activation function is given as follows:

$$\mathrm{Swish}(m) = m \times \mathrm{sigmoid}(\alpha \times m),$$

where α describes an adjustable per-channel parameter, m defines the input data, and sigmoid(α × m) signifies the evaluation of the sigmoid function. The architecture of the modified Xception network is shown in Figure 5.
As can be observed from Figure 5, the modules are similar to those of the original Xception; only the ReLU function has been replaced with the Swish activation function, which is located after the global average pooling and before the logistic regression.
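A minimal NumPy sketch of the Swish function, with ReLU for comparison, is given below. The function names and the default value α = 1 are assumptions for illustration; the value of α used in the paper is not stated here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(m, alpha=1.0):
    """Swish(m) = m * sigmoid(alpha * m).

    alpha may be a scalar or an array of shape (C,) that broadcasts over the
    last (channel) axis, matching the per-channel parameter in the text.
    """
    return m * sigmoid(alpha * m)

def relu(m):
    return np.maximum(m, 0.0)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
smooth = swish(x)               # small negative outputs for negative inputs
sharp = swish(x, alpha=50.0)    # approaches ReLU as alpha grows
```

Unlike ReLU, Swish is smooth and non-monotonic near zero; for large α it approaches ReLU, and for α = 0 it reduces to the linear function m / 2.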

Dataset.
The skin cancer benchmark data in this study were collected from Skin Cancer MNIST: HAM10000 [29]. This dataset, released under the CC BY-NC-SA 4.0 license, is regarded as a reliable dataset for skin cancer diagnosis techniques. Different techniques have been validated on this dataset [30-33]. In this paper, the data from this benchmark are used to train the proposed Xception network. The data were collected as dermatoscopic images from different populations, acquired and stored by different modalities [34]. 53.3% of lesions were confirmed by histopathology. Figure 6 shows some examples of the HAM10000 dataset.
We applied the proposed Xception network as a complete diagnosis system for the detection of skin cancer. Here, we also employed data augmentation. Data augmentation is performed to increase the number of images available for training the CNNs and thereby to compensate for the small size of the training dataset. In other words, augmentation expands a small dataset by adding supplementary images that are variations of the images already in the dataset. This helps improve the ability and performance of the system. Many kinds of variation can be introduced for augmentation. In this study, rotation, horizontal shifting, and cropping are used.
Journal of Healthcare Engineering
One of the transformations in this study is horizontal shift augmentation. A horizontal shift moves the image pixels horizontally while keeping the image dimensions unchanged. The shift is controlled by a floating-point value between 0 and 1 that gives the step size of the shift. Here, we used a step size of 0.3. Another transformation is rotation. For rotation, the angle by which the image is to be rotated is specified. This study uses the fixed angles [15, 30, 45, 60] rather than picking an angle at random from −90 to 90. Another augmentation method in this study is cropping. Here, a section (the center of the image) is sampled from the original image and then resized to the original image size.
Indeed, the reason for data augmentation here is to increase the quantity of data by adding slightly altered copies of existing data or synthetic data newly created from the existing data. In other words, we used data augmentation (i.e., shifting, rotation, and cropping) as a regularizer that helps decrease overfitting when training the proposed Xception model.
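The three transformations can be sketched with NumPy alone. This is an illustrative sketch under several assumptions: the function names are hypothetical, vacated pixels are filled with zeros, rotation and resizing use nearest-neighbor sampling, and the crop keeps the central 80% of the image (the crop size is not given in the text). The shift of 0.3 and the angles 15, 30, 45, and 60 come from the description above; the paper's own implementation was in MATLAB and may differ.

```python
import numpy as np

def horizontal_shift(img, fraction=0.3):
    """Shift pixels to the right by fraction * width, zero-filling the gap."""
    w = img.shape[1]
    s = int(round(fraction * w))
    out = np.zeros_like(img)
    out[:, s:] = img[:, :w - s]
    return out

def rotate(img, angle_deg):
    """Rotate about the image center, keeping the original size."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: find the source pixel for every output pixel.
    xr = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    yr = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    xi, yi = np.rint(xr).astype(int), np.rint(yr).astype(int)
    valid = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(img)
    out[valid] = img[yi[valid], xi[valid]]
    return out

def center_crop_resize(img, keep=0.8):
    """Crop the central region and resize it back to the original size."""
    h, w = img.shape[:2]
    ch, cw = int(h * keep), int(w * keep)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    rows = np.arange(h) * ch // h  # nearest-neighbor row lookup
    cols = np.arange(w) * cw // w  # nearest-neighbor column lookup
    return crop[rows][:, cols]

def augment(img, angles=(15, 30, 45, 60), shift=0.3, keep=0.8):
    """Return the original image plus its shifted, cropped, and rotated copies."""
    out = [img, horizontal_shift(img, shift), center_crop_resize(img, keep)]
    out += [rotate(img, a) for a in angles]
    return out
```

Each input image yields seven images under these settings: the original, one shifted copy, one cropped copy, and four rotated copies.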

Training and Configuration of the Proposed Xception Network.
The dataset was divided into two groups: 80% for training (8012 images) and 20% for testing (2003 images). In the training procedure, all images were resized to 227 × 227. The CNN model was run 15 times independently in the MATLAB environment, and the average results over these runs are taken as the measurement values of the model. The simulations were performed on a Core i7 CPU 2.00 GHz laptop, with 2.5 GHz, 16 GB RAM, and a 64-bit operating system. The implementation was programmed in MATLAB 2019b on a Windows operating system. Table 1 indicates the specifications of the hardware and the software. The model configuration for the proposed CNN uses a batch size of 12 and an initial learning rate of 2e−2 with the stochastic gradient descent with momentum (SGDM) optimizer. This configuration was obtained by trial and error and is close to that of [35].
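The 80/20 split can be reproduced with a short sketch. The helper name `split_indices`, the shuffling, and the seed are assumptions for illustration; the text does not say how the images were assigned to the two groups. The total of 10,015 images is the sum of the two group sizes given above.

```python
import random

def split_indices(n_images, train_percent=80, seed=0):
    """Shuffle image indices and split them into training and test lists."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding in the split point.
    n_train = n_images * train_percent // 100
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_indices(8012 + 2003)
```

With 10,015 images this gives 8012 training and 2003 test indices, matching the counts in the text.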

Evaluation Criteria.
In this study, four evaluation criteria are used to indicate the capability of the proposed system. The mathematical formulation of these measures is briefly given as follows:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{precision} = \frac{TP}{TP + FP},$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, the accuracy describes the closeness of the measurements to a specific value, and precision is the closeness of the measurements to each other, and

$$\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{F1-score} = \frac{2 \times \text{precision} \times \text{sensitivity}}{\text{precision} + \text{sensitivity}},$$

where sensitivity (true positive rate) defines the proportion of positives that are correctly recognized. The F1-score is obtained from the precision and sensitivity of the test; i.e., the F1-score is the harmonic mean of the precision and the sensitivity.
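The four criteria can be computed directly from the confusion counts of one class. The sketch below uses the standard definitions; the function name and the example counts are made up for illustration and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, sensitivity, f1) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, f1

# Made-up counts: 8 true positives, 9 true negatives, 2 false positives, 1 false negative.
accuracy, precision, sensitivity, f1 = classification_metrics(tp=8, tn=9, fp=2, fn=1)
```

For a three-class problem such as this one, the counts would typically be taken one class against the rest and then averaged, which is an assumption about the protocol rather than something stated in the text.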

Results.
In this section, we investigate the method using several measurement indicators. Table 2 reports the performance analysis of the proposed Xception method compared with the other studied algorithms. As can be observed from the results reported in Table 2, the proposed Xception method offers the highest accuracy on the studied dataset, 100%, and the original Xception, AlexNet, InceptionV3, and VGG16 rank in the next places. The results also show that the sensitivity of the proposed Xception method, 94.05%, is the highest among the compared methods. This shows how well the proposed Xception detects positive melanoma cases in the test. The results also indicate that the proposed Xception method has the highest precision, 97.07%, which shows its higher reliability compared with the other studied methods. Finally, the F1-score of the suggested technique is 0.9553, which is the highest value among the compared methods. The closer the F1-score is to 1, the higher the precision and sensitivity.
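As a quick consistency check, the F1-score can be recomputed from the reported precision (97.07%) and sensitivity (94.05%) using its definition as the harmonic mean of the two. This is only arithmetic on the figures quoted above, not an additional result:

$$\text{F1-score} = \frac{2 \times 0.9707 \times 0.9405}{0.9707 + 0.9405} \approx \frac{1.8259}{1.9112} \approx 0.9554,$$

which agrees with the reported value of 0.9553 to within rounding.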
For further clarification, the Receiver Operating Characteristic (ROC) curve for the three classes, i.e., melanoma, carcinoma (BCC), and normal, is shown in Figure 7. The ROC curve is a graphical profile that indicates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. It indicates the diagnostic ability of the model by measuring the degree of separability among the different classes. A larger area under the ROC curve indicates better results for the model; i.e., the model is ideal if the area under the curve is 1, and it is poor if the area is 0.
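The area under the ROC curve can be computed without drawing the curve, since it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this rank-based form and a one-vs-rest treatment of the classes; the function names and the one-vs-rest protocol are assumptions, as the text does not state how the per-class curves were obtained.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC for a binary problem; ties between scores count as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # every positive against every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def one_vs_rest_auc(class_ids, probabilities):
    """Per-class AUC: class c against all other classes.

    class_ids: (n,) integer labels, probabilities: (n, n_classes) scores.
    """
    class_ids = np.asarray(class_ids)
    probabilities = np.asarray(probabilities, dtype=float)
    return [roc_auc(class_ids == c, probabilities[:, c])
            for c in range(probabilities.shape[1])]

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this small example three of the four positive-negative pairs are ordered correctly, so the area is 0.75.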
As can be observed from Figure 7, the average area under this curve is 1.0 for the melanoma class, 0.98 for the normal class, and 0.98 for the carcinoma class. The main cause of the area for the normal and carcinoma classes being 0.98, as our …
Based on the results, it is clear that both the proposed and the original Xception models provide classification probabilities for the three classes to support image diagnosis for melanoma screening. The classification accuracy of the suggested Xception is better than that of the original Xception model, as determined in the results. To characterize the classification performance, the confusion matrix of true class versus predicted class for the original Xception [25] is shown in Figure 8, and the confusion matrix of the proposed Xception method is shown in Figure 9.
As can be observed from the confusion matrices of the original and the proposed Xception models on the three-class dataset, the proposed method provides high accuracy. Indeed, the original Xception achieved true prediction of melanoma in 42 images, accounting for 33.30%, and true prediction of normal cases in 40 images, or 31.7%, with false prediction of 68.3%; true prediction of carcinoma accounts for 27.8%.
The proposed Xception model achieved true prediction of melanoma in 98 images, accounting for 33.33%, and true prediction of normal cases in 40 images, or 33.33%; true prediction of carcinoma accounts for 33.33%.
Thus, the proposed Xception model based on the Swish activation function provides higher accuracy than the original Xception model.
As can be observed from the results, the proposed method is more effective for skin cancer diagnosis. However, some limitations remain to be resolved: the method needs a large amount of data to deliver better results. Because of the need for complex data, its training is expensive; i.e., it needs an expensive GPU for better performance. Selecting a good topology and its other parameters is hard, and even harder for less experienced users.

Conclusions
Among the different types of cancer, skin cancer is one of the most widespread. Melanoma is one of the most dangerous forms of skin cancer. If this type of cancer is diagnosed early, it can be treated successfully. But if it becomes aggressive and spreads to other tissues in the body, it will not be possible to treat it. Therefore, early detection of melanoma can increase a person's chances of recovery and prevent its spread to other tissues. This study proposed a new architecture of the Xception deep network as a convolutional neural network to provide an efficient diagnosis system for melanoma detection. The two main improvements of this model are the use of the Swish activation function and depthwise separable convolutions to improve the accuracy of the classification stage of the CNN. The proposed Xception method was then applied to the Skin Cancer MNIST: HAM10000 dataset, and the results were compared with some state-of-the-art methods. The results showed that the proposed method, with 100% accuracy, 94.05% sensitivity, 97.07% precision, and 95.53% F1-score, provided the highest performance among the compared methods.