Skin Cancer Detection Using Kernel Fuzzy C-Means and Improved Neural Network Optimization Algorithm

Early diagnosis of malignant skin cancer from images is a significant part of the cancer treatment process. One of the principal purposes of this research is to propose a pipeline methodology for optimal computer-aided diagnosis of skin cancers. The method contains four main stages. The first stage performs preprocessing based on noise reduction and contrast enhancement. The second stage segments the region of interest (ROI) using kernel fuzzy C-means. In the third stage, features are extracted from the ROI, and feature selection is used to choose the best ones. Finally, the selected features are fed into a support vector machine (SVM) for final identification. An important part of the contribution of this study is an improved version of a new metaheuristic, named the neural network optimization algorithm, which is used to optimize both the feature selection and the SVM classifier. Comparison of the method with 5 state-of-the-art methods showed the approach's superiority over the others.


Introduction
Cancer is fundamentally a disease of uncontrolled cell division. Its growth is usually associated with a group of changes in the activity of cell cycle regulators. The inhibitors of the cell cycle stop cell division when conditions are not suitable; therefore, too little activity of these inhibitors can cause cancer. Cancer cells disregard the signals that should cause cells to stop dividing. For example, when normal cells growing in a culture medium are surrounded on all sides by adjacent cells, they no longer divide (contact inhibition). In contrast, cancer cells divide into layers in a mass and stack on top of each other, and contact inhibition does not prevent them from growing. One of the most dangerous cancers around the world is skin cancer. The skin, as the first cell layer of the body that is in contact with the outside environment, can suffer from many injuries and diseases. The most common cancer in the United States is skin cancer, which occurs in the tissues of the skin, the body's largest organ. Skin cancer generally occurs on the outer layer of the skin and may first appear as a swelling, bulge, or abnormal patch of skin [1]. The skin protects against heat, sunlight, sores, and infections; it also warms the body, stores fat and water, and produces vitamin D. The skin has 2 principal layers: the superficial layer (epidermis) and the inner layer (dermis) [2]. The skin's surface layer (epidermis) is mainly composed of flat, scaly cells named squamous cells. Round cells, called basal cells, are located beneath the superficial squamous cells in the surface layer, together with melanocytes [3]. Prolonged exposure to sunlight increases the lifetime risk of melanoma skin cancer. Melanoma is one of the most dangerous kinds of cancer and is caused by damage from overexposure to the sun and some other factors. Melanoma can be diagnosed with skin biopsies.
Figure 1 shows statistics for the leading cancers by number of cases and deaths in 2019 [4].
If melanoma is diagnosed early, it can be treated, whereas late diagnosis can lead to the patient's death. Only experienced physicians can diagnose melanoma on time using appropriate tools and histological reports. One of the tools used to diagnose melanoma is dermatoscopy. With this tool, changes in the pigmentation of skin lesions can be evaluated. With the development of science in recent years, conventional dermatoscopes have been replaced by digital dermatoscopes with the ability to capture and store skin images. With the development of digital dermatoscopes, researchers began to develop algorithms for the diagnosis of melanoma, and other methods were then proposed to enhance the accuracy of skin lesion diagnosis. Some selected studies are described in the following.
Astorino et al. [5] suggested an approach for melanoma diagnosis. The method applies a multiple instance learning (MIL) approach to dermoscopy images to discriminate between melanomas and common nevi. The method was then analyzed by comparing it with some state-of-the-art classification methods, such as the support vector machine. Simulation outcomes demonstrated that the suggested approach achieves better precision. A leave-one-out validation carried out on a dataset showed the method's higher efficiency.
Mohamed et al. [6] suggested an image processing approach for melanoma diagnosis. The method used a simple image preprocessing pipeline including image conversion, noise reduction, and hair removal. The melanoma area was then segmented by thresholding, and feature extraction based on the ABCD rule was used to obtain the main features of the image.
Barros et al. [7] proposed a real-time cancer diagnosis system based on hardware design using field-programmable gate arrays (FPGA). A multilayer perceptron artificial neural network (ANN) is utilized for this purpose. The proposed image processing extracts the characteristics of the skin from the images and classifies them based on these features using the ANN classifier. Simulation results of the proposed approach were validated on an open-access database. The final results of the suggested approach were then compared with the results of another hardware-based technique using an ARM A9 microprocessor to demonstrate the method's proper performance.
Santos and Espitia [8] presented an approach for the diagnosis of uveal melanoma (UM), which is a type of intraocular cancer. The suggested technique combined iris segmentation methods and designed a method for UM diagnosis based on neural networks and fuzzy logic. Simulation results of the study indicated 96.04% accuracy for the artificial neural networks and 76% correct classification for the fuzzy logic system.
Wang [9] proposed a comprehensive method to provide a proper segmentation technique. The method applies deep convolutional networks to hyperspectral pathology images. The study employed a 3D fully convolutional network, called Hyper-net, to obtain the best segmentation performance on the hyperspectral pathology images. The loss function was then modified to improve the model sensitivity. The final outcomes demonstrated that the suggested approach performs better than 2D models for segmentation.
The present research suggests a novel optimized pipeline methodology for automatic detection of skin cancer. The method includes four main phases. The first phase is preprocessing of the input image to prepare it for the main processing. This phase includes two parts: noise reduction and contrast enhancement. The next phase is to segment the region of interest. Afterwards, the features are extracted, and the best ones are selected and fed into a support vector machine (SVM) as the final phase. This investigation employs an improved version of a new metaheuristic, called the neural network optimization algorithm, to optimize both the feature selection and the SVM classifier.
This can improve the accuracy of the system, as can be seen in the next sections. The main contributions of the paper are highlighted as follows:
(i) An early diagnosis system for skin cancer from dermoscopy images is proposed
(ii) Kernel fuzzy C-means is used to segment the region of interest (ROI)
(iii) Some features from the ROI are extracted for the final diagnosis
(iv) An improved neural network algorithm is used for optimal feature selection
(v) Support vector machine (SVM) is used for final classification
(vi) The results are compared with 5 state-of-the-art methods
Figure 2 illustrates the flowchart of the suggested approach.
Computational Intelligence and Neuroscience

Noise Reduction.
Ordinary nonpolarized light is used in medical images. According to the Fresnel relations, the reflection of light from the surface of a material and its underlying layers, as well as scattering from uneven and rough surfaces, depends on the polarization state of the light; it is therefore possible to eliminate some of the disturbing reflected or diffused light by using a suitable filter to achieve a high signal-to-noise ratio. Therefore, the first step in the analysis of the cancer images is to eliminate, or at least reduce, this noise to obtain high-quality images for processing. Different methods have been introduced for this purpose. One simple and popular method for noise reduction is median filtering. Although this filter is nonlinear, it has some significant advantages, such as keeping the main edges of the image after filtering. In the median filter, each pixel is replaced by the median value of its neighbors, i.e.,
$$y(m, n) = \operatorname{median}\{x(i, j) : (i, j) \in \beta\},$$
where x and y denote the input and filtered images and β signifies the neighborhood of the considered pixel in position (m, n). The present study uses a 3 × 3 mask for the median filter. This selection is based on trial and error and may be different for other databases. It should be noted that using a larger mask increases the probability of losing edges. Figure 3 shows an example of using median filtering for noise reduction of the input images with salt-and-pepper noise of density 0.1.
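The 3 × 3 median filter described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' MATLAB implementation; the function name `median_filter` and the choice to repeat border pixels at the image edges are assumptions made here.

```python
import numpy as np

def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighborhood."""
    image = np.asarray(image)
    pad = size // 2
    # Assumption: border pixels are repeated so every pixel has a full neighborhood.
    padded = np.pad(image, pad, mode="edge")
    rows, cols = image.shape
    # One shifted view of the image for each position in the mask.
    shifted = [padded[r:r + rows, c:c + cols]
               for r in range(size) for c in range(size)]
    return np.median(np.stack(shifted), axis=0).astype(image.dtype)

# Usage: a single "salt" pixel on a flat background is removed.
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy)
```

A larger `size` removes denser noise but, as noted in the text, also increases the risk of blurring edges.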

Contrast Enhancement.
In some situations, for reasons such as the user's insufficient experience in imaging or the poor quality of the measurement devices and sensors, the contrast of the images is reduced, such that the image becomes too dark or overexposed. To correct this deficiency, image contrast enhancement is needed. Therefore, in this study, an image contrast enhancement step is applied to the low-quality images to improve their contrast and simplify the segmentation step. The present study uses global contrast enhancement based on a lookup table for this purpose [10]. In the formulation of global contrast enhancement, Out_H signifies the improved output image, and Min_H and Max_H denote the minimum and maximum gray levels of the original image histogram, respectively. We used an 8-bit lookup table for the method. A simple contrast enhancement on a medical image is shown in Figure 4.
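As an illustration of lookup-table contrast enhancement, the sketch below assumes the common linear stretch, in which each gray level g is mapped to 255 · (g − Min_H) / (Max_H − Min_H). The paper's exact formula may differ; the function name `stretch_contrast` is hypothetical.

```python
import numpy as np

def stretch_contrast(image):
    """Linearly stretch gray levels to [0, 255] using an 8-bit lookup table."""
    image = np.asarray(image, dtype=np.uint8)
    min_h, max_h = int(image.min()), int(image.max())
    if max_h == min_h:
        return image.copy()  # flat image: nothing to stretch
    levels = np.arange(256, dtype=np.float64)
    # Assumed mapping: linear stretch between the histogram minimum and maximum.
    lut = np.clip((levels - min_h) * 255.0 / (max_h - min_h), 0, 255)
    lut = np.rint(lut).astype(np.uint8)
    return lut[image]  # apply the table to every pixel at once

# Usage: gray levels 50..200 are spread over the full 0..255 range.
low_contrast = np.array([[50, 100], [150, 200]], dtype=np.uint8)
enhanced = stretch_contrast(low_contrast)
```

Building the table once and indexing it with the image keeps the cost per pixel to a single lookup.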

Segmentation Based on Kernel-Based Fuzzy C-Means.
One of the significant issues in medical image processing is the segmentation of a medical image into its components. Image segmentation determines the eventual success or failure of image analysis methods. However, there is no general method for successful segmentation of all medical images, and it remains an active research area because of its wide application. The accuracy of segmentation is crucial in areas such as medicine, where it helps preserve and protect human life. Because of this wide range of applications, segmentation methods are used in various fields. The importance of this step is especially high in mammographic images, because medical images naturally have low quality, which makes mass segmentation a complicated problem. This research applies an enhanced multistep fuzzy C-means method to the skin cancer images for mass segmentation. In this method, the first step is to generate superpixels using the simple linear iterative clustering (SLIC) algorithm [11]. The SLIC algorithm generates superpixels based on the CIE LAB color space in a 5D space. Afterwards, texture and color properties are employed to separate the superpixels and finally to find the image elements. The color features of the superpixels are then extracted and arranged such that, for all images in HSV space, three separate histograms for the S, H, and V channels are considered. To reduce time complexity, the histograms are quantized into 8, 4, and 2 subintervals. Afterward, the NSCT algorithm is used to find the texture features. The generated data are then clustered by a fuzzy kernel method, and all superpixels with the same cluster tag are placed in one cluster. This study uses the kernel fuzzy C-means (KFCM) algorithm for segmentation. This technique offers a kernel-based version of fuzzy C-means to evaluate the distance of each data point from the cluster centers.
The kernel function in this study is obtained as follows: first, the membership function is evaluated; then, based on a kernel fuzzy similarity measure, the membership of each data sample in the clusters is obtained.
The pseudocode of the algorithm is given as follows:
(1) Apply KFCM to cluster the set of objects and generate a membership matrix U [12]
(2) For all elements x_i, x_j, the t nearest neighbors are found
In the intermediate steps, ⊕ denotes the exclusive OR, which indicates the overlap between two fuzzy sets.
(6) The diagonal matrix D is formulated, and normalization is then applied. The K largest eigenvectors of L are found (starting from the first vector), the matrix P = [p_1, p_2, ..., p_k] is formed, and the rows of P are normalized to form the matrix Y.
(7) Each row of Y is considered a point in the space R^k, and the final clustering is then established by the K-means algorithm.
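Step (1) of the procedure above, the KFCM clustering itself, can be sketched as follows. This is a minimal sketch under stated assumptions: it uses a Gaussian kernel K(x, v) = exp(−‖x − v‖² / σ²) and the usual KFCM update rules, which may differ from the kernel used in the paper. The names `kfcm`, `init_centers`, and `sigma` are hypothetical, and the later spectral steps are not covered.

```python
import numpy as np

def kfcm(data, init_centers, m=2.0, sigma=1.0, n_iter=50):
    """Kernel fuzzy C-means with an (assumed) Gaussian kernel.

    Returns the membership matrix U (one row per sample, one column per
    cluster) and the cluster centers V.
    """
    X = np.asarray(data, dtype=float)
    V = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every center.
        sq_dist = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-sq_dist / sigma ** 2)
        # Kernel-induced distance 1 - K, floored to avoid division by zero.
        kernel_dist = np.maximum(1.0 - K, 1e-12)
        U = kernel_dist ** (-1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)   # memberships of a sample sum to 1
        weights = (U ** m) * K
        V = (weights.T @ X) / weights.sum(axis=0)[:, None]
    return U, V

# Usage: two well-separated groups of 2-D points.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
U, V = kfcm(points, init_centers=points[[0, 3]], sigma=5.0)
labels = U.argmax(axis=1)
```

Passing the initial centers explicitly keeps the sketch deterministic; in practice they would typically be chosen at random or from a coarse pre-clustering.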

Improved Neural Network Algorithm
Optimization.
Generally speaking, optimization is a vital subject in most engineering applications. Optimization means making the best decision to obtain the optimal (minimum or maximum) result for the considered problem. Several methods have been introduced for optimization. Although classic exact methods can solve these problems, they fail in some cases where the problem is nonlinear or complicated. To overcome this problem, another class of techniques, called metaheuristics, has been introduced [13]. Metaheuristics include a set of optimization algorithms that are inspired by nature, human behaviors, animals' competitions, etc. The main advantage of metaheuristic algorithms is that they use a random structure instead of gradient methods. This simplifies the optimization process [14]. Furthermore, the number of highly complicated problems that cannot be solved by the classic methods is increasing day by day, whereas metaheuristics, owing to their stochastic nature, can find a near-optimal result in a reasonable time. There are different types of these algorithms, such as the ant lion optimizer (ALO) algorithm [15], chimp optimization (CO) algorithm [16], Harris Hawks optimization [17], and world cup optimization (WCO) algorithm [18]. Recently, a new metaheuristic algorithm, called the neural network algorithm (NNA), was introduced, which is inspired by the concepts of biological nervous systems and artificial neural networks (ANNs). Based on [19], each ANN includes some artificial neurons, which are inspired by the biological nervous systems. The relationships among units principally specify the network function. The NNA approach is described in the following.

Neural Network Algorithm.
Like any other metaheuristic algorithm, the NNA starts with an initial population, which is called the pattern population. In the NNA, just as in an ANN, the main idea is to update the pattern population to minimize the error between the predicted data and the desired output data. Here, the best solution is the desired output, which can be updated in each iteration to achieve the minimum error by moving the pattern population in the direction of the desired solution. In the following, the algorithm methodology is briefly explained.

Initialization.
The first operation in this algorithm, as in any other metaheuristic, is to generate a random population (decision variables) for initial evaluation. For a D-dimensional problem, the pattern solution vector is an array of size 1 × D, representing input data in the NNA. Moreover, for dimension D and N_pop random candidates, the initial pattern population can be considered as follows:
$$X = \begin{bmatrix} x_{1,1} & \cdots & x_{1,D} \\ \vdots & \ddots & \vdots \\ x_{N_{pop},1} & \cdots & x_{N_{pop},D} \end{bmatrix},$$
where the matrix X is generated randomly between the minimum and maximum limits of the problem. The cost value is obtained as follows:
$$C_i = f(x_{i,1}, x_{i,2}, \ldots, x_{i,D}), \quad i = 1, 2, \ldots, N_{pop},$$
where f describes the objective function.
After evaluating the objective value of each pattern solution, the best one is selected as the best pattern solution.
The NNA has N_pop input data of dimension D with only one target data. Figure 6 shows this structure.
After defining the target solution (X T ) among the other pattern solutions, its weight (W T ) has to be chosen from the population of weight (weight matrix).
In an ANN, the neurons are connected through dendrites using a simple summation. The output is connected to the input layers through weighted (w) interconnections. The initial weights are random values in this algorithm, and they are then updated to provide the minimum network error. The initial weights form an N_pop × N_pop matrix W whose entries are uniformly distributed random values in the range [0, 1]. The first subscript of a weight relates to its own pattern solution, and the second is shared with the other pattern solutions. Every pattern solution has its own weight values, which are used to generate a new candidate solution. These weights are also constrained: for i, j = 1, 2, ..., N_pop, the weights are randomly distributed values between 0 and 1 whose sum for a pattern solution must not exceed one. This constraint controls the bias of the movement when generating new pattern solutions. If the constraint is not considered, the probability of becoming stuck in a local optimum increases. After the weight matrix is formed, the new pattern solutions (X_N) are evaluated. After the new pattern solutions are obtained, the weight matrix is updated using the best weight, which is named the target weight. Another operator for updating in the NNA is the bias operator.
This operator modifies the exploration part of the algorithm by altering a specific percentage of the candidates in the new population (X_i^New(t + 1)) and of the updated weight matrix (W_i^updated(t + 1)). The pseudocode of this operator is as follows:

End for End if End for
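To make the NNA steps concrete, the sketch below strings together initialization, weight generation, pattern and weight updates, the bias operator, and the transfer function operator that the text describes. It follows the commonly cited NNA formulation rather than the equations of this paper, so the update rules, the absolute value used to keep weights nonnegative, the clipping to the bounds, and the 0.99 decay of the modification factor are all assumptions; the function name `nna_minimize` is hypothetical.

```python
import numpy as np

def nna_minimize(f, lower, upper, n_pop=20, n_iter=100, seed=0):
    """Minimal NNA-style minimizer (a sketch under the stated assumptions)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Initial pattern population and its costs.
    X = lower + (upper - lower) * rng.random((n_pop, dim))
    cost = np.array([f(x) for x in X])

    # Random weights; column j holds the weights of pattern j and sums to 1.
    W = rng.random((n_pop, n_pop))
    W /= W.sum(axis=0, keepdims=True)

    best = int(np.argmin(cost))
    x_target, c_target, w_target = X[best].copy(), cost[best], W[:, best].copy()
    gamma = 1.0  # modification factor, starts at 1

    for _ in range(n_iter):
        # New patterns: X_new_j = sum_i w_ij * X_i, then X_j <- X_j + X_new_j.
        X = X + W.T @ X
        # Move every weight vector toward the target weight.
        W = np.abs(W + 2.0 * rng.random(W.shape) * (w_target[:, None] - W))
        W /= W.sum(axis=0, keepdims=True)

        for i in range(n_pop):
            if rng.random() <= gamma:
                # Bias operator: redraw a share of the variables and weights.
                n_b = int(round(gamma * dim))
                idx = rng.integers(0, dim, size=n_b)
                X[i, idx] = lower[idx] + (upper[idx] - lower[idx]) * rng.random(n_b)
                n_wb = int(round(gamma * n_pop))
                widx = rng.integers(0, n_pop, size=n_wb)
                W[widx, i] = rng.random(n_wb)
            else:
                # Transfer function operator: move toward the target solution.
                X[i] += 2.0 * rng.random(dim) * (x_target - X[i])

        X = np.clip(X, lower, upper)           # assumed bound handling
        W /= W.sum(axis=0, keepdims=True)
        cost = np.array([f(x) for x in X])
        best = int(np.argmin(cost))
        if cost[best] < c_target:              # keep the best pattern found so far
            x_target, c_target, w_target = X[best].copy(), cost[best], W[:, best].copy()
        gamma *= 0.99                          # assumed decreasing schedule

    return x_target, float(c_target)

# Usage: minimize the 5-dimensional sphere function on [-5, 5]^5.
sphere = lambda x: float(np.sum(np.asarray(x) ** 2))
x_best, c_best = nna_minimize(sphere, [-5.0] * 5, [5.0] * 5)
```

Because the target solution is replaced only when a better pattern appears, the returned cost can never be worse than the best cost in the initial population.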
Here, LB and UB represent the lower and upper bounds of the problem, respectively. γ describes the modification factor that defines the percentage of candidates to be altered. γ = 1 at first, and it then decreases gradually during the process according to a decreasing schedule. In the NNA, there is also a transfer function operator that moves the new candidates to updated positions closer to the target solution. The solution is improved by moving the new pattern solutions in the direction of the best solution (target solution) through the transfer function operator (TF). In the collaboration between the bias and TF operators, a pattern solution that is not modified by the bias operator has the transfer function operator applied to it.

Improved Neural Network Algorithm.
Although the neural network algorithm provides good results in different optimization applications [20][21][22], it sometimes becomes stuck in premature convergence, which strongly affects the solution accuracy. In this investigation, 2 mechanisms are considered to remedy this drawback. The first mechanism is to employ chaos theory. This mechanism explores new states through a discrete-time dynamical system. In this research, the logistic map has been used as the modification mechanism. This mechanism can be mathematically modeled as follows:
$$\beta_{j,i,q+1} = \rho\,\beta_{j,i,q}\,(1 - \beta_{j,i,q}),$$
where i signifies the population index, β_{j,i,q} defines the value at the q-th chaotic iteration, ρ describes a constant, which is set to 4, q denotes the iteration number, j denotes the generator index in the system, and β_0 describes the initial value of β_i, which is a random value between 0 and 1 [23,24]. Based on the above explanation, the set of initial variables w_ij is rewritten using the chaotic sequence.
This study also used Lévy flight (LF) as a second modification. The LF is a popular technique for improving optimization algorithms [25].
The LF uses random motion to control the local search. In the LF equations, τ represents the Lévy flight index in the range [0, 2] (here, τ = 1.5 [26]), w describes the step size, A, B ∼ N(0, σ²), Γ(·) denotes the Gamma function, and the samples are drawn from a Gaussian distribution with variance σ² and zero mean. With the LF mechanism, the new pattern solutions are updated based on the best weight value.
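The two modifications can be illustrated with a short sketch. The logistic map follows the recurrence with ρ = 4 given in the text. For the Lévy flight, the sketch assumes Mantegna's algorithm, in which the step is A / |B|^(1/τ) with A drawn from N(0, σ²), B from N(0, 1), and σ computed from the Gamma function; whether this matches the paper's exact equations is an assumption, and all function names are hypothetical.

```python
import math
import numpy as np

def logistic_map(beta0, n_steps, rho=4.0):
    """Chaotic sequence beta_{q+1} = rho * beta_q * (1 - beta_q)."""
    values = []
    beta = beta0
    for _ in range(n_steps):
        beta = rho * beta * (1.0 - beta)
        values.append(beta)
    return values

def mantegna_sigma(tau=1.5):
    """Standard deviation of the numerator in Mantegna's algorithm."""
    num = math.gamma(1.0 + tau) * math.sin(math.pi * tau / 2.0)
    den = math.gamma((1.0 + tau) / 2.0) * tau * 2.0 ** ((tau - 1.0) / 2.0)
    return (num / den) ** (1.0 / tau)

def levy_steps(n, tau=1.5, seed=0):
    """Draw n heavy-tailed step sizes (assumed Mantegna formulation)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, mantegna_sigma(tau), size=n)
    b = rng.normal(0.0, 1.0, size=n)
    return a / np.abs(b) ** (1.0 / tau)

# Usage: a short chaotic sequence and a batch of Levy steps.
chaos = logistic_map(0.3, 2)
steps = levy_steps(1000)
```

Most Lévy steps are small, which supports local search, while occasional large steps help the search escape premature convergence.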

Algorithm Validation.
To assess the effectiveness of the proposed improved neural network algorithm (INNA), six standard benchmark functions have been utilized, and the results are compared with some recent state-of-the-art algorithms, namely the multiverse optimizer (MVO) [27], the moth-flame optimization (MFO) algorithm [28], the world cup optimizer (WCO) [19], and the original neural network algorithm (NNA). The parameter settings of the algorithms are as follows. For MVO: wormhole existence probability (WEP), WEP_min = 0.2, WEP_max = 1, and coefficient (P) = 6. For MFO: logarithmic spiral shape (b) = 1. For WCO: α = 0.5 and playoff = 4%. The population size for all algorithms is set to 100, and their maximum number of iterations is set to 100. Table 1 lists the studied benchmark functions for the analysis. The benchmark functions are considered with 30 dimensions.
To validate the algorithms on the studied functions, four statistical indicators, namely the minimum (Min), maximum (Max), mean (Mean), and standard deviation (Std), are employed, and the results are given in Table 2.
According to Table 2, the suggested INNA gives the best outcomes in terms of the minimum, maximum, and mean values. Because it provides the lowest values for these indicators, and minimization is the purpose of these benchmark functions, the proposed INNA has the highest accuracy. Furthermore, the proposed method, with the minimum standard deviation, provides the most reliable solution for the studied benchmark functions.
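The four indicators can be computed from the final costs of repeated independent runs, as in the sketch below. The Rosenbrock function is written in its standard form, which is assumed to match the paper's definition. The random-search optimizer, the search range [−30, 30], and the run counts are placeholders chosen only to make the example self-contained; they are not the algorithms or settings compared in the paper.

```python
import numpy as np

def rosenbrock(x):
    """Standard Rosenbrock function; its minimum is 0 at x = (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

def summarize(final_costs):
    """Min, Max, Mean, and Std of the best costs found over several runs."""
    costs = np.asarray(final_costs, dtype=float)
    return {"Min": float(costs.min()), "Max": float(costs.max()),
            "Mean": float(costs.mean()), "Std": float(costs.std())}

def random_search(f, dim, low, high, n_evals, rng):
    """Placeholder optimizer: best of n_evals uniformly random samples."""
    samples = low + (high - low) * rng.random((n_evals, dim))
    return min(f(s) for s in samples)

# Usage: 10 independent runs on the 30-dimensional Rosenbrock function.
rng = np.random.default_rng(0)
runs = [random_search(rosenbrock, 30, -30.0, 30.0, 200, rng) for _ in range(10)]
stats = summarize(runs)
```

A lower Mean indicates better average accuracy, and a lower Std indicates that the optimizer reaches similar results from run to run.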

Feature Extraction and Selection Based on INNA
Feature selection is one of the most important techniques for data preprocessing and data mining. With the introduction of new applications in large-scale data mining, media information retrieval, and medical data processing, which require the processing of large volumes of data, it is important to limit the number of features. The purpose of feature extraction here is to obtain the important characteristics of the region of interest in skin cancer images to simplify the diagnosis process. In this paper, 19 different features are extracted from the image. Table 3 lists the features used for the extraction. We then utilized a feature selection technique to select the more important features and to eliminate the low-cost features. Different methods have been introduced for this purpose. Metaheuristics are a relatively new technique for feature selection.
Feature selection using metaheuristics is a subset of feature extraction and is discussed in various fields of machine learning and data mining. In general, this problem does not have a definite solution, and so far, no exact method has been proposed to solve it. Various classical approaches have been proposed for these problems, but the quality of their solutions is generally not very good. In contrast, intelligent optimization methods can provide far better solutions to these problems. Therefore, one of the effective and constructive approaches to feature selection problems and related issues is the use of metaheuristic optimization methods and evolutionary algorithms.
Here, we employed a cost function for optimum selection. The idea is to minimize this function while selecting the high-cost, important features. The function is minimized with the designed INNA, and the lower-cost features are eliminated from the processing. The next step is to employ an efficient method for classifying these features.
After performing the binary optimization technique on the features, the following features with higher effectiveness are selected for training: elongation, area, mean, correlation, entropy, and elongation. This selection is performed by a metaheuristic technique with a stochastic nature.

Skin Cancer Image Classification
Classification is the process of assigning an input image to one of several predetermined classes. Image classification is considered the last part of the diagnosis in medical image analysis. This step can decrease the time needed for the cancer diagnosis process by decreasing its search space. A popular and useful classification method is the support vector machine (SVM). The SVM seeks the best hyperplane that operates as a multiclass (here two-class) data separator in the input space.
To satisfy the constraints of the optimization problem, the support vector machine evaluates the normal vector w of the hyperplane, the bias b, and the slack variable η for incorrectly assigned training patterns, which supports generalization. The problem is defined as follows:
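For reference, the standard soft-margin SVM problem written with the symbols w, b, and η named above is commonly stated as shown below. This is the textbook form, given here as an assumption about the formulation intended; the penalty constant C, the feature map φ, the labels y_i, the samples x_i, and the sample count N are symbols introduced for this illustration.

```latex
\min_{w,\,b,\,\eta}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\eta_i
\quad\text{subject to}\quad
y_i\left(w^{\top}\varphi(x_i) + b\right) \ge 1 - \eta_i,\qquad \eta_i \ge 0,
\qquad\text{with separating function}\quad
f(x) = \operatorname{sign}\left(w^{\top}\varphi(x) + b\right).
```

A larger C penalizes misclassified training patterns more heavily, while a smaller C favors a wider margin.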

Table 1: The studied benchmark functions (columns: Function, Formula, Minimum, Limitation; the listed functions include Rosenbrock).
Table 2: The simulation results of the comparative algorithms on the studied benchmark functions.
Table 3: The features used for extraction (recoverable entries: 4π × area/perimeter²; standard deviation = variance^{1/2}; form factor = area/a²). MN describes the image size, B_p defines the external side length for the boundary pixel, p(i, j) defines the pixel intensity at position (i, j), μ and σ describe the mean value and the standard deviation, respectively, and a and b represent the major axis and the minor axis, respectively.
However, some of the above features have a high impact, and some others have a low impact on feature extraction.
The optimization problem in equation (20) is a minimization, and the separating function is given in equation (21). The feature space is highly affected by the separating function in equation (21), so the function must be selected correctly to generate optimal output. Here, the suggested INNA has been utilized to minimize equation (20). Accordingly, the support vector machine uses a kernel to map the data to the feature space. In this research, four different kernels, namely linear, polynomial, radial basis function (RBF), and sigmoid, have been used. Figure 7 shows a typical SVM.
As explained, the SVM technique is a proper tool for classifying images, and medical images are no exception. Therefore, after feature extraction is performed on the images to decrease the operation complexity, the skin cancer features of each image can be fed into the SVM, which presents the final classification as healthy or cancer.
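The four kernel families named in this section can be written out as plain functions, as sketched below. The formulas are the standard textbook definitions; the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions for illustration and are not settings reported in the paper.

```python
import numpy as np

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return float(np.dot(x, z))

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """K(x, z) = (gamma * x . z + coef0) ** degree"""
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    """K(x, z) = tanh(gamma * x . z + coef0)"""
    return float(np.tanh(gamma * np.dot(x, z) + coef0))

# Usage: evaluate each kernel on one pair of feature vectors.
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
values = {
    "linear": linear_kernel(x, z),
    "polynomial": polynomial_kernel(x, z, degree=2),
    "rbf": rbf_kernel(x, z, gamma=0.5),
    "sigmoid": sigmoid_kernel(x, z, gamma=0.1),
}
```

The linear kernel has no extra parameters to tune, which is one reason it is often tried first.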

Implementation Details.
As mentioned before, the proposed skin cancer detection system contains different modules, which are implemented in the MATLAB R2019b environment. The modules include image acquisition, image segmentation, feature extraction and selection, and final classification. In the image acquisition stage for dermoscopic images, the American Cancer Society (ACS) [30] and PH² [31] databases are used. The performance is measured by sensitivity, accuracy, and precision:
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},$$
where TP, TN, FP, and FN denote the true positives, true negatives, false positives, and false negatives, respectively. The efficiency of the INNA-SVM method with different kernels was then analyzed based on the abovementioned indicators.
It can be observed from Table 4 that the linear kernel of the proposed INNA-SVM, in addition to its simplicity, results in the best values for all indicators (sensitivity, accuracy, and precision) for both databases. Therefore, this kernel is utilized here for the classification of the features. For a comprehensive investigation of the proposed method's efficiency, five evaluation measures, namely specificity (SP), positive predictive value (PPV), negative predictive value (NPV), F1 score, and Matthews correlation coefficient (MCC), are considered, which are defined mathematically as follows:
$$SP = \frac{TN}{TN + FP}, \qquad PPV = \frac{TP}{TP + FP}, \qquad NPV = \frac{TN}{TN + FN}, \qquad F_1 = \frac{2\,TP}{2\,TP + FP + FN},$$
$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.$$
The overall proposed method has been compared with five recent methods, and the comparison results are given in Table 5. The compared methods include Astorino's method [5] based on multiple instance learning (MIL), Hassan's method [6] based on a simple image preprocessing pipeline, Barros's method [7] based on hardware design with a multilayer perceptron, Santos's method [8] based on neural networks and fuzzy logic, and Wang's method [9] based on deep convolutional networks. The results are the mean values over both databases.
According to Table 5, the F1 score of the suggested method, 84.39%, which summarizes accuracy in terms of the precision and recall on the test data, is the highest among all of the compared methods. The MCC of the proposed method, 94.21%, is also the highest, and because it uses all four terms of the analysis, it shows the method's higher efficiency than the others. In addition, the higher NPV and PPV values, 95.57% and 85.09%, compared with the other methods indicate that a test result from the method is more likely to correctly reflect the presence or absence of cancer. Finally, the higher SP value of the proposed method indicates better occurrence-independent results of the algorithm.
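All of the indicators used in this section follow directly from the four confusion-matrix counts, as the sketch below shows using their standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper's tables.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification indicators from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),            # identical to precision
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }

# Usage with made-up counts (not results from the paper).
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Unlike accuracy and F1, the MCC uses all four counts symmetrically, so it remains informative when the two classes are imbalanced.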

Conclusions
Skin cancer is one of the most common cancers, and malignant melanoma is the most invasive and deadliest kind of this cancer. Early detection of this cancer by physicians has a strong effect on its treatment. Recently, several machine vision-based techniques have been used to help physicians with accurate early detection. This paper proposed a new approach for optimal skin cancer detection. The first stage of the proposed method was preprocessing, including noise reduction and contrast enhancement. Then, the region of interest (ROI) was segmented by a kernel fuzzy C-means segmentation method. The features of the ROI were then extracted, and the most important features were selected optimally. Afterward, the selected features were fed into an optimized support vector machine (SVM) classifier. The main contribution of this study was to present an improved version of a new metaheuristic, called the neural network optimization algorithm, to optimize both the feature selection and the SVM classifier in the system. The superiority of the proposed method was demonstrated by a comparison between the suggested approach and five state-of-the-art methods.
Data Availability
The data are available at https://www.cancer.org/.

Conflicts of Interest
The authors declare no conflicts of interest.