Medical Image Classification Using Transfer Learning and Chaos Game Optimization on the Internet of Medical Things

The Internet of Medical Things (IoMT) has dramatically benefited medical professionals, since patients and physicians can access it from all regions. Although the automatic detection and prediction of diseases such as melanoma and leukemia is still being investigated in IoMT, existing approaches do not achieve a high degree of efficiency. With a new approach that provides better results, patients could access adequate treatment earlier and the death rate would be reduced. Therefore, this paper introduces an IoMT proposal for medical image classification that may be used anywhere, i.e., it is a ubiquitous approach. It was designed in two stages: first, we employ a transfer learning (TL)-based method for feature extraction, which is carried out using MobileNetV3; second, we use chaos game optimization (CGO) for feature selection, with the aim of excluding unnecessary features and improving performance, which is key in IoMT. Our methodology was evaluated using the ISIC-2016, PH2, and Blood-Cell datasets. The experimental results indicated that the proposed approach obtained an accuracy of 88.39% on ISIC-2016, 97.52% on PH2, and 88.79% on Blood-Cell. Moreover, our approach performed well on the employed metrics compared to other existing methods.


Introduction
The Internet of Things (IoT) has been formulated to define the use of devices that can be controlled remotely [1]. The development of these devices allowed a wide range of uses. Hence, IoT is used in many areas, such as industry [2], smart cities [3], agriculture [4], and the Internet of Medical Things (IoMT) [5]. In particular, IoMT technology has been commonly applied due to its high performance and the time and effort it saves for specialists and patients [6]. Besides, it supports patient care, such as monitoring medications and tracking hospital admission location. IoMT technologies are widely available, especially for diseases with the highest mortality rates globally, such as melanoma [7], leukemia [8], and others. Technology such as mobile devices and wearables can collect information about human health to provide effective hospital care. These technologies could be used in many applications or services, such as obtaining and analyzing data and monitoring the diagnosis of neurological illnesses. As a result of its efficiency and usability, IoMT technology has been broadly accepted and widely used.
Deep learning (DL) models can help diagnose breast cancer [9] and Alzheimer's disease [10] using advanced biomedical imaging methods such as thermal imaging and magnetic resonance imaging (MRI); however, these methods are expensive, require specialized medical imaging equipment, and are not available in many rural areas of developing countries.
Thus, DL has recently been used in IoMT to automate and accurately diagnose a variety of diseases, which helps to facilitate efficient and appropriate healthcare [11]. For instance, an IoMT system for stroke detection using convolutional neural networks (CNN) and transfer learning was shown in [12] to distinguish between a healthy brain and hemorrhagic and ischemic strokes in CT scan images. Although DL models outperformed traditional machine learning [13], less work has been reported on DL-based IoMT for healthcare than on other services available on IoMT devices.
The IoMT system for stroke prevention can capture and maintain the patient's heartbeat, core temperature, and external factors quickly and with the required precision. These factors are essential for stroke diagnosis. DL techniques can help prevent frequent difficulties that take much time to solve. For example, web scraping [14], data mining [15], and sentiment analysis [16] are all areas where TL technology has a broad array of applications.
Moreover, these approaches need a large number of well-labeled training samples. Many transfer learning (TL)-based approaches have been developed in medical image analysis to solve this issue. Due to its capacity to effectively address the shortcomings of reinforcement learning and supervised learning, TL is becoming more widespread in medical image processing [17].
TL aims to train the prediction function in the target domain by utilizing knowledge obtained in the source domain from a vast number of labeled data (e.g., ImageNet). TL is widely recognized in different computer vision domains for helping to enhance learning on sparsely labeled or limited datasets in a particular domain [18]. Unfortunately, for TL in medical imaging, the properties of the training examples (i.e., a massive dataset of natural images) and of the test data (i.e., a small dataset of clinical images) are highly different. Because the domains differ significantly and have various, unconnected classes, as in [19], the functions learned from the source database (training set) may be biased when directly applied to the target database (test set). Consequently, the features of the biased function are unlikely to be the desired ones in the target domain, the medical image field. Moreover, it is vital for TL to have both representative and discriminative capability in the feature extraction process in order to improve classification accuracy [20]. According to the traditional view, the TL model is pretrained and then finetuned for the target task using task-specific data. Unsupervised, inductive, transductive, and negative learning are all types of TL, and it can address these challenges [21].
Hence, we use a TL model to obtain features from medical images.
Many features, such as color, texture, and size, are used in standard medical image categorization methods. When high-dimensional feature vectors are handled through an optimizer algorithm, selecting the optimal features is a way to improve classification efficiency [22]. Finding the optimal representation of the selected subset of features creates additional issues for researchers. Feature selection (FS) approaches have therefore been crucial for automating this step and accurately identifying the essential features.
Therefore, we developed a method to solve the diagnostic image identification challenge and optimize the process, wrapped as an IoMT system, with the aim of reducing morbidity and mortality worldwide. To the best of our knowledge, our approach is the first that tries to improve the efficiency of medical image classification on IoMT by merging deep learning (MobileNetV3) with the chaos game optimization metaheuristic.
In order to improve the performance of medical image classification, the system incorporates both TL and FS optimization techniques. First, a TL architecture analyzes the supplied medical images and develops contextualized representations without human intervention. A finetuned MobileNetV3 is utilized to extract the image embeddings. Next, a novel FS method analyzes each embedding and chooses only the most important features to improve medical image classification performance. The FS method depends on a new metaheuristic strategy known as chaos game optimization (CGO).
The reasons for employing CGO to optimize the FS challenge in this paper are as follows. We want to examine the most recent CGO optimizer. Furthermore, comparing CGO against complex, modern, and high-efficiency algorithms reveals that the CGO optimizer finds the best solutions for the problems examined, typically with better performance (i.e., fewer iterations and less execution time). The contributions of this paper can be summarized as follows:
(i) The proposed IoMT system helps minimize human intervention in medical centers and provides fast diagnosis reports on low-resource embedded systems.
(ii) The transfer learning model (i.e., MobileNetV3) is finetuned on the assessed medical image datasets to extract relevant features.
(iii) A novel feature selection approach is used to select appropriate features and build the IoMT system.
(iv) An extensive evaluation of the proposed system is reported and compared to several state-of-the-art techniques using two real-world datasets.
The remainder of the paper is structured as follows. Section 2 reviews recent work on medical imaging. Section 3 offers a detailed description of our approach. Section 4 analyzes the results of the image classification experiments. Finally, Section 6 gives concluding remarks and future scope.

Related Works
Computational Intelligence and Neuroscience
The usefulness of classification for medical image diagnosis makes it an important area of research. Therefore, this section presents recent work on medical image classification. Recently, researchers have improved the Internet of Medical Things (IoMT) using DL and improved classification performance by applying transfer learning. Due to advances in connectivity among systems, the Internet of Things (IoT) is currently being used in various fields. When used in the medical area, the IoT can support care and monitoring systems that are operated remotely. It is now possible for medical professionals, and sometimes even patients, to remotely access sensor data generated by devices attached to persons who are being monitored or have specific requirements [23]. Computer-aided diagnosis (CAD) technologies can benefit from the IoT by providing an interface that directly connects the terminal to the devices for medical image classification. To put it another way, any person may now operate a technology that formerly required training [24].
DL has become increasingly popular on the Internet of Medical Things (IoMT) in recent years [25]. As a result, the IoMT concept is suitable for building embedded technologies that can diagnose diseases as accurately as professionals do. IoMT innovation, according to [26], has contributed to the establishment of vital healthcare systems. Physicians may now access it in various settings, allowing them to better diagnose patients without being influenced by subjective factors. Another obstacle that is yet to be addressed is the disparity between rare and common diseases regarding the amount of data collected. The authors of [27] introduced a method for the recognition of CT scan images of pulmonary and ischemic stroke on the IoMT.
These researchers employed an IoT device that interacts directly with users to choose the optimum extraction methods and classifiers for a given situation. However, as a result of this design, the system underperformed. A high level of accuracy is required in the medical sector when diagnosing the form of a disease. Previous research has shown that early identification of cancer is vital for patients to receive the best treatment possible. Thus, our goal is to improve medical image diagnosis by increasing the accuracy of the applied algorithms.
In recent decades, metaheuristic optimization algorithms have been combined with convolutional neural networks (CNN) for medical image classification. Transfer learning has become very popular, primarily because it makes the system more powerful, reduces financial costs, and requires fewer inputs, thanks to the initial weights transferred from the earlier training process. The study in [28] examined transfer learning across many cases in medical image processing. The researchers discussed various types of learning and possibilities for future studies. For finetuning with transfer learning, Ayan and Ünver [29] employed the Xception and VGG16 architectures. They substantially modified the Xception architecture by adding two fully connected layers and a multi-class output layer with a softmax activation function. In the VGG16 architecture, the last eight layers were frozen and the fully connected layers were modified. The testing time per image for the VGG16 and Xception networks was 16 and 20 ms, respectively. InceptionV3, ResNet18, and GoogLeNet were among the convolutional models employed in Reference [30]. The authors used these models to test the premise that voting may be used to arrive at a diagnosis. In their study, the outputs of the classifiers were combined by majority voting, so the diagnosis corresponds to the class that receives the largest number of votes. The model's mean testing time per image was 161 ms using this method. In addition, they attained high classification rates for X-ray images. According to this study, pneumonia can be diagnosed using deep convolutional networks. As part of our method, we rely on classical classifiers to minimize the computing cost of classification.
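The voting scheme attributed to Reference [30] can be illustrated with a minimal sketch. The function name, the labels, and the tie-breaking rule are assumptions made for illustration; they are not taken from the cited study.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers.

    `predictions` holds one label per classifier for a single image.
    Ties are broken in favor of the label that appears first in the list
    (an illustrative choice, not one described in the cited work).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical outputs of three classifiers for one X-ray image.
votes = ["pneumonia", "normal", "pneumonia"]
print(majority_vote(votes))
```

With an odd number of binary classifiers a tie cannot occur, which is one reason ensembles of three models are a common choice for this kind of voting.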
As a result of their extensive feature representation capabilities, CNNs have been commonly applied in medical image processing in recent years and have shown substantial gains. Zhang et al. [31] developed a system for target-class lesion identification based on multi-CNN collaboration. Their approach was more reliable in identifying lesions, and its utility was evaluated using necessary details. A strong ensemble structure for cancer detection was created in [32] using dynamic classification techniques.
Therefore, a more distinctive and robust model can be created. To identify skin lesions automatically, the authors of Reference [33] proposed a crossnet-based combination of multiple convolutional networks. For the categorization of melanomas, MobileNet and DenseNet were coupled [34]. This lightweight medical image classification model differed from older systems in that it was designed to improve feature selectivity, computational complexity, and parameter settings, and its categorization strategy worked well.
Currently, metaheuristic optimization algorithms are being used to solve a wide range of complex optimization problems. Rather than a single answer, they maintain a set of candidate solutions, which allows them to navigate the solution space efficiently. As a result, they outperform other optimization approaches. Samala et al. [35] suggested a method of multilayered pathway evolution to identify breast cancer. They used a two-stage method: transfer learning and feature identification, respectively. Regions of interest (ROI) from large lesions were used to train pretrained CNNs. On top of this, a random forest classifier was built using the learned CNN. They evolved pathways using a genetic algorithm (GA) with random selection and crossover operators. Their research found a 34% change in features and a 95% reduction in parameters using their proposed strategy.
Through particle swarm optimization (PSO), da Silva et al. [36] optimized the hyperparameters of a CNN for false-positive reduction in CT lung images, where comparable structures and low density cause false-positive results. They found that optimizing an automatic detection system can improve outcomes and minimize human intervention. In order to obtain the binary threshold value, Vijh et al. [37] adopted Otsu-based adaptive PSO for automatic classification of brain cancers. To reduce noise and improve image quality, noise removal and skull stripping were applied. For feature extraction, GLCM was utilized and 98% of the features were extracted.
Utilizing the grey wolf optimization (GWO) method, Shankar et al. [38] developed a novel approach for Alzheimer's disease diagnosis using brain imaging analysis. As an initial image preprocessing step, undesirable regions are removed. The resulting images are then sent to a CNN for feature extraction, resulting in improved performance. Goel et al. [39] proposed OptCoNet, an optimized CNN architecture for distinguishing COVID-19 patients from normal and pneumonia cases. For hyperparameter tuning of the convolution layers, they employed GWO. Their study found that the proposed approach assisted in the automated examination of patients and reduced the burden on medical systems. In order to enhance architectures for denoising images, Elhoseny et al. [40] employed the dragonfly and improved firefly algorithms (FFA) to categorize the images as normal and abnormal. This adjustment improved the results significantly, as reflected in the peak signal-to-noise ratio (PSNR). Melanoma diagnosis was enhanced utilizing the whale optimization algorithm (WOA) and Lévy flights, as introduced in Reference [41]. Two datasets were analyzed using the developed structure, and the accuracy was 87% on both datasets. Some of these algorithms suffer from premature convergence and local minima, especially when faced with a large solution space [42]. Often, this limitation results in inefficient task scheduling solutions, which hurts system performance. Therefore, a globally optimal solution to the IoMT task scheduling problem is urgently needed.
However, these existing approaches were still unable to achieve a high degree of efficiency. To overcome this problem, this paper aims to find solutions that improve performance. Hence, we combine transfer learning with metaheuristic FS optimization to create an accessible IoMT system. The characteristics of this system allow for outstanding performance and reasonable computing expenses, and address the financial concerns discussed earlier. With the IoMT, infection can be detected and treated inside and outside the clinic. Therefore, Internet-connected devices and a digital copy of the scan are used in the IoMT system.
The main difference between the proposed model and previous approaches is that we combine transfer learning with metaheuristic FS optimization to create an accessible IoMT system. The characteristics of this system allow for outstanding performance and reasonable computing expenses. Hence, this system makes it possible to detect and treat infections and diseases from anywhere.

Methodology
In the field of medical image classification, detecting a user's illness using a medical database is an interesting topic. The present study used three datasets for image recognition analysis, with the major goal of achieving maximal performance in disease diagnosis. The three datasets investigated were ISIC-2016 [43], PH2 [44] (both for melanoma detection), and Blood-Cell classification [45]. Figure 1 depicts the architecture of the proposed IoMT system. Initially, the IoMT devices capture medical images, and if the goal is to train the IoMT system, the image data are sent to a cloud center.
There are three main processes at this level. In the first stage, the features are extracted using the TL architecture, as detailed in Subsection 3.2.2. The next stage is to find the relevant features using CGO. Lastly, the classification is performed, and the results can be distributed across fog computing systems to save on communication costs if desired. If the goal is to diagnose the condition from the collected data, the trained model stored in the fog computing systems is utilized.
3.1. Proposed IoMT System.
Our IoMT system is based on a computational cloud that communicates with a fog layer. Users may easily manipulate the data and parameters required to get the online service's classification results. This system component also handles communication between IoT devices (mobile phones and laptops) and the cloud center. Because patient images are all processed in the same way, the system can be used for various exams, which demonstrates its reliability. Image sizes, formats, and color conversions are adjusted to common standards.
The IoMT system represented in Figure 1 is how we implement our methodology in order to give a quick response and support the physician in making appropriate choices. There are two components in our system: cloud computing and fog computing.
First, a medical image database is sent to the training level in the cloud using IoT technologies. There, the system built from the components described in Subsections 3.2.2 and 3.3.2 can be trained. The pretrained feature extraction technique is deployed on the cloud service and benefits from its light and quick design. The MobileNetV3 structure used to extract the features is known for its interoperability and limited resource use on embedded systems. The CGO algorithm, a lightweight and robust feature selection method, is applied after feature extraction to shrink the feature embedding set and keep only the most essential features of each image. By decreasing the number of features, we can speed up the training process, which allows us to arrive at a classification decision in an acceptable amount of time.
One of the two components included in this IoMT system is fog computing. It allows the approved trained model to make predictions without retraining the system, saving time and reducing network traffic. As a result, fog computing devices can assist the expert in making a judgment on medical image diagnosis faster than waiting for a decision from the cloud centers. In addition, the model in the cloud centers is finetuned regularly, employing images gathered from connected devices and saved in a database. Thus, the quality of the trained system will improve, leading to better, more accurate decisions.
There will also be a web-based application that the sender can use to obtain a rapid prediction using the pretrained model, or to run the finetuning program to refine the system on a batch of new images. The sender will receive the final decision along with measurement metrics such as accuracy to back up the system's predictions.

Feature Extraction Using TL.
This section gives a detailed description of the transfer learning technique used for feature learning and extraction. As mentioned in Section 2, a pretrained model for image classification tasks in computer vision is beneficial in terms of training and inference speed. In addition, only a few parameters need to be finetuned during the training process, rather than training models from scratch. In our system, MobileNetV3 is used as the backbone of the feature extraction process, where the top layers of the model are replaced with new layers and only specific layers are finetuned. MobileNetV3 is an optimized version generated by a network architecture search (NAS) algorithm called NetAdapt. The NetAdapt algorithm uses MobileNetV1 and MobileNetV2 components to search for an optimal network architecture and kernel size that minimize the model size and latency while maximizing its performance.

Efficient Deep Learning.
DL techniques and models have demonstrated success in various tasks, including image classification, image segmentation, and object detection [46][47][48][49]. However, the challenges of these tasks, especially the quality and the impact of the learned representations, remain largely unexplored. Over the past decade, several DL architectures and training techniques have been proposed. For instance, researchers focus on exploiting the power of DL models to improve performance and efficiency in terms of training time, computational resources, and accuracy. One of the most investigated DL models is the convolutional neural network, with different architectures, designs, parameters, and training processes. Depthwise convolutions are DL components designed to exploit the spatial information in the input image and replace the traditional convolution layers, thus facilitating deployment on embedded devices or edge applications. Various DL models have embraced depthwise convolutions to overcome the limitations of traditional convolution layers, including MobileNets [50,51], ShuffleNets [52], NASNet [53], MnasNet [54], and EfficientNet [55]. Unlike traditional convolution layers, depthwise convolution layers are applied separately to each input channel. Thus, the models can be computationally inexpensive and trained with fewer parameters and less training time. In this section, we focus on introducing MobileNetV3 [51] and its core components. More detailed information is given in the following sections, where we describe MobileNetV3 as the feature extractor used in the proposed system.
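To make the parameter savings concrete, the following sketch counts the weights of a standard convolution and of a depthwise convolution followed by a 1 × 1 pointwise convolution. The layer sizes (3 × 3 kernel, 64 input channels, 128 output channels) are illustrative assumptions and are not values from the proposed system; bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes, chosen only for this example.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {std}, depthwise separable: {sep}, ratio: {std / sep:.2f}")
```

For these sizes the separable variant needs 8,768 weights instead of 73,728, roughly an eightfold reduction, which is the kind of saving that makes such layers attractive on embedded devices.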
Howard et al. [51] introduced MobileNetV3 in two versions: MobileNetV3-large and MobileNetV3-small. MobileNetV3 is designed to improve on the latency and accuracy of the previous version, the MobileNetV2 architecture. For instance, MobileNetV3-large improved the accuracy by 3.2% compared to MobileNetV2 while reducing the latency by 20%. MobileNetV3 was designed using a network architecture search (NAS) technique termed the NetAdapt algorithm to search for the optimal network structure and kernel size of the depthwise convolution. As illustrated in Figure 2, the MobileNetV3 components include the following:
(v) The squeeze-and-excite block (SE block) [54], which selects the relevant features on a channelwise basis.
(vi) The h-swish activation function [57,58], which is used interchangeably with the ReLU (rectified linear unit) activation function.

Feature Extraction Module.
Using different image datasets, MobileNetV3 was finetuned to learn and extract feature vectors from input images of size 224 × 224. MobileNetV3 was pretrained on the ImageNet dataset [56]. In our experiments, the MobileNetV3-large pretrained model was employed and finetuned on the datasets containing skin cancer and blood cell images. A 1 × 1 pointwise convolution (Conv) was used to replace the top layers used for classification in the MobileNetV3 model, as shown in Figure 2. The 1 × 1 pointwise convolution can be seen as a multilayer perceptron (MLP) used for classification and feature extraction tasks. Thus, in our implementation, we used two 1 × 1 pointwise convolutions at the top of the model to extract features from the input images and finetune the model on the image classification task. Meanwhile, the MobileNetV3 building block consists of an inverted residual block inspired by the bottleneck blocks.
The inverted residual block contains two important blocks: the depthwise separable convolution block and a squeeze-and-excite block used to link the input and output features on the same channels, thus improving the feature representations with low memory usage.
The depthwise separable convolution block consists of a 3 × 3 depthwise convolution, batch normalization (BN), an activation function, and a 1 × 1 pointwise convolution. In contrast, the squeeze-and-excite block consists of fully connected (FC) layers with a nonlinear transformation for global feature extraction using a global pooling operation, with the following execution order: (Pool) ⟶ (BN) ⟶ (FC1) ⟶ (ReLU) ⟶ (FC2) ⟶ (Sigmoid). Each building block can integrate a depthwise separable convolutional layer with different nonlinearity functions such as ReLU or hard swish (h-swish), which are defined in Equations 1 and 2, respectively.
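\[ \mathrm{ReLU}(x) = \max(0, x), \tag{1} \]
\[ \text{h-swish}(x) = x \cdot \sigma(x), \qquad \sigma(x) = \frac{\mathrm{ReLU6}(x + 3)}{6}, \tag{2} \]
where h-swish is a modified version of the sigmoid-based swish activation function and σ(x) denotes the piecewise linear hard analog of the sigmoid function, as defined for MobileNetV3 in [51].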
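The two nonlinearities can be sketched in a few lines. This is a scalar illustration only, assuming the hard-sigmoid form σ(x) = ReLU6(x + 3)/6 from the MobileNetV3 paper [51]; the function names are chosen for this example.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def relu6(x):
    """ReLU clipped at 6, used to build the hard sigmoid."""
    return min(max(0.0, x), 6.0)


def hard_sigmoid(x):
    """Piecewise linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


def h_swish(x):
    """Hard swish: x multiplied by the hard sigmoid of x."""
    return x * hard_sigmoid(x)


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  h_swish={h_swish(x):.4f}")
```

Because the hard sigmoid uses only additions, comparisons, and one division by a constant, it avoids the exponential needed by the ordinary sigmoid, which matters on low-power hardware.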
To extract the feature vector from each input image, we used the finetuned model generated for each dataset. We flattened the output of the 1 × 1 pointwise convolutional layer (placed before the classification layer) and used it as the feature vector.
The extracted feature vector of size 128 for each image is fed into the feature selection process of the proposed system. The model was finetuned for 100 epochs with a batch size of 32 on each dataset to produce the best classification performance. Meanwhile, to update the model's weight and bias parameters, we used the RMSprop optimizer with a learning rate of 1e−4. To overcome overfitting, we used a dropout layer with a probability of 0.38.
An optimized feature selection process was implemented wherein the most critical features were identified using the optimizer, i.e., chaos game optimization (CGO).
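As a rough illustration of how an optimizer's output can select features from a 128-dimensional vector, the sketch below binarizes a continuous candidate solution and applies the resulting mask. The 0.5 threshold, the helper names, and the random stand-in data are assumptions for this example; this section does not specify how candidates are converted into feature subsets.

```python
import random


def to_mask(candidate, threshold=0.5):
    """Binarize a continuous candidate: 1 keeps a feature, 0 drops it.

    The threshold of 0.5 is an assumed convention, not a value from the paper.
    """
    return [1 if value > threshold else 0 for value in candidate]


def select_features(features, mask):
    """Keep only the entries of `features` whose mask bit is 1."""
    return [f for f, keep in zip(features, mask) if keep]


rng = random.Random(0)                                 # fixed seed for repeatability
features = [rng.gauss(0.0, 1.0) for _ in range(128)]   # stand-in for one feature vector
candidate = [rng.random() for _ in range(128)]         # stand-in for one optimizer solution
mask = to_mask(candidate)
reduced = select_features(features, mask)
print(len(features), "->", len(reduced))
```

In a wrapper-style setup, each candidate's mask would be scored by training a classifier on the reduced features, and the optimizer would search for the mask with the best score.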

Chaos Game Optimization (CGO).
Based on certain principles of chaos theory, the CGO relies on the self-similarity of fractals [59]. According to chaos theory, small changes in the initial conditions of a chaotic system can significantly impact its future because of the system's sensitive dependence on those conditions. Following this theory, the present state of a system determines its future state, while the approximate present state does not approximately determine its future state. In mathematics, the chaos game is a method of constructing fractals by using an initial polygon and a randomly chosen starting point. The main goal is to construct a sequence of points iteratively so as to obtain a shape that looks similar at different scales.
Using the Sierpinski triangle as an example, we may better appreciate the theory of the chaos game. As shown in Figure 3, three points are chosen as the vertices of the main fractal shape, which in this case is a triangle. The selected vertices are highlighted in red, green, and blue. The die used in this situation has two red sides, two blue sides, and two green sides. First, a random point is chosen as the fractal's seed. On each die roll, the seed is moved from its current location halfway toward the vertex corresponding to the rolled color, and its new location is used as the starting point for the next iteration. Finally, after the die has been rolled many times, the Sierpinski triangle appears.
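The chaos game for the Sierpinski triangle can be simulated in a few lines. The sketch below assumes the standard rule in which the seed jumps halfway toward a randomly chosen vertex on every roll; the vertex coordinates, point count, and function name are illustrative choices.

```python
import random


def chaos_game(vertices, n_points, rng):
    """Generate chaos-game points for a triangle given by `vertices`."""
    # Start from a random seed inside the triangle (random convex combination).
    weights = [rng.random() for _ in vertices]
    total = sum(weights)
    x = sum(w * vx for w, (vx, _) in zip(weights, vertices)) / total
    y = sum(w * vy for w, (_, vy) in zip(weights, vertices)) / total

    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(vertices)            # the die roll: pick one vertex
        x, y = (x + vx) / 2.0, (y + vy) / 2.0    # move halfway toward it
        points.append((x, y))
    return points


triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]  # illustrative vertex positions
pts = chaos_game(triangle, 5000, random.Random(42))
print(len(pts), "points generated")
```

Plotting `pts` as a scatter plot reveals the Sierpinski pattern; choosing one of three vertices uniformly is equivalent to rolling the six-sided die with two faces per color.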
As a result of using the chaos game mechanics and fractals, the CGO method has been developed.Many candidate solutions (S) represent a few of Sierpinski's valid points.ere are some choice factors (s j k ) associated with each solution candidate (S k ).ese selection factors reflect the placement of such eligible seeds within the Sierpinski triangle.e triangle can also be used to seek solutions.e primary strategy is to generate new seeds in the search area that could be the newly eligible seeds by generating temporary triangles.Toward achieving this goal, four different approaches are described.ere is an iteration of this technique across all eligible seeds and the k th temporary triangles inside the search domain.e triangle has three nodes inside the search area, including three k th initial points, the blue (S k ), green (G), and red points (M k ).In this temporary triangle, a die is used to create new seeds using the chaotic method.Chaos game principles are used in this temporary triangle, creating new points with a die and three seeds.e three seeds (S k , the G, and M k ) are placed in order of importance, from first to third, respectively.When it comes to S k 's first seed, a die with six faces (i.e., three red and three green) is used.Depending on the color of the die, the point is transformed in S k toward M k (red side) or G (green side).When rolling dice comes up green/red, the point is moved over to either G/M k .It is possible to replicate this feature by using a random number generation function that creates only two values (0 and 1) for the possibility of selecting red or green sides.e green side indicates that the seed placed in S k has moved to the G, while the red side indicates that the seed placed in S k has moved to M k .Unaffected by the fact that both sides of the game are equally likely to emerge, creating two random numbers for both M k and G assumes that perhaps the seed contained in M k is relocated anywhere along connected connections 
between the M k and G.As a result of the chaos game technique that manages this feature, some randomly generated factorials are also used based on the actuality of the seeds' movement inside the search region.e first point has the following mathematical expression: where S k is the solution candidate (k th ) and G refers to the global solution implemented so far.As the name suggests, M k is the average number of beginning points considered three points in the k th temporary triangle.Seed motion limitations are modeled using the randomly generated factorial, where α k is the seed's motion limitations.If there is a desire to represent the likelihood of rolling a dice, β k and c k correspond to random integers of 0 or 1. D is the number of eligible points (solution candidates).Regarding the second point, which is placed in the G, a die with six faces (i.e., three red and three blue) is utilized.
The seed in G is moved toward S_k (blue face) or toward M_k (red face). This can be modeled with a random number generator that produces only two values, 0 and 1, for the choice between the red and blue faces. When the blue face shows, the seed in G is moved toward S_k; when the red face shows, it is moved toward M_k. Although the blue and red faces are equally likely, generating a separate random number for each of S_k and M_k also allows both to be selected, in which case the seed in G is moved along the line connecting M_k and S_k. Following the chaos game technique, randomly generated factors are used to limit this movement inside the search region. The second seed is given by Equation (4). For the third seed, placed in M_k, a die with three blue faces and three green faces is used. The seed in M_k is moved toward S_k (blue face) or toward G (green face), depending on the color that comes up. This can again be modeled with a random integer generator that produces only two values, 0 and 1, for the choice between the green and blue faces. When the blue face shows, the seed in M_k is moved toward S_k; when the green face shows, it is moved toward G. The green and blue faces are equally likely, and a separate random number is generated for each of S_k and G.
When both S_k and G are selected, the seed in M_k is moved along the line connecting G and S_k. Following the chaos game technique, randomly generated factors are used to limit this movement inside the search region. The third seed is given by Equation (5). An additional fourth seed, placed in S_k, is used to carry out a further modification step inside the search range. Its position is updated through random changes in randomly chosen decision variables. The fourth seed is given by Equation (6), where N denotes the dimension of the point, i is an integer in the range [1, N], and rand is a uniform random value in (0, 1). To control the rates of exploration and exploitation in the CGO algorithm, four formulas are used to determine α_k, as shown in Equation (7), which models the mobility limitations of the seeds. One of these four formulas is chosen at random when positioning the first through third seeds.
In Equation (7), R denotes a uniform random value in the range (0, 1), and ϵ and δ are random integers equal to 0 or 1. Following the self-similarity of fractals, the initial eligible seeds and the seeds newly created by the chaos game principle are compared to decide whether the new seeds should be included among the eligible seeds of the search domain. The initial seeds are replaced by new seeds that achieve higher levels of self-similarity, and they are retained if the new seeds achieve lower levels of self-similarity. This replacement is carried out in the mathematical model to reduce its complexity; in principle, because the Sierpinski triangle is a complete shape, all points found so far would be used to complete it. If a solution variable (s_k^j) falls outside the bounds of the search space, it must be handled as soon as it is detected: a flag signals the violation, and the variable is adjusted back to the boundary. The optimization ends after a predefined number of iterations.
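For reference, the four seed updates and the motion-limit factor described in this subsection can be written compactly as below. This is a sketch that assumes the standard CGO formulation and reuses the symbols defined in the text (S_k, G, M_k, α_k, β_k, γ_k, R, ϵ, N, rand); the symbol δ, the complement written as 1 − ϵ, and the exact typesetting are assumptions and may differ from Equations (3)-(7).

```latex
\begin{align}
\mathrm{Seed}_k^{1} &= S_k + \alpha_k \left(\beta_k\, G - \gamma_k\, M_k\right), \\
\mathrm{Seed}_k^{2} &= G + \alpha_k \left(\beta_k\, S_k - \gamma_k\, M_k\right), \\
\mathrm{Seed}_k^{3} &= M_k + \alpha_k \left(\beta_k\, S_k - \gamma_k\, G\right), \\
\mathrm{Seed}_k^{4} &= S_k \quad \text{with} \quad s_k^{i} \leftarrow s_k^{i} + \mathrm{rand}, \qquad i \in [1, N], \\
\alpha_k &\in \left\{\, R,\;\; 2R,\;\; \delta R + 1,\;\; \epsilon R + (1 - \epsilon) \,\right\}.
\end{align}
```

In each of the first three updates, the base vertex is shifted by a combination of the other two vertices; setting β_k or γ_k to 0 corresponds to the die face that was not rolled.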
Algorithm 1 outlines the steps of the CGO algorithm, and Figure 4 depicts its flowchart. First, the initial positions of the solution candidates (X) inside the search region are chosen at random. Second, the objective value of each initial candidate is computed, which reflects the self-similarity of the initial seeds. Then, the global best (G) is set to the seed with the highest eligibility level. Next, for each eligible point (S_k) inside the search area, a mean group (M_k) is generated by random selection, and a temporary triangle with the three vertices S_k, M_k, and G is created. Subsequently, four seeds are generated for each temporary triangle using Equations (3)-(6). Afterward, the variables s_k^j of the new seeds are checked against the boundary conditions. The objective function of the new seeds is then evaluated in terms of self-similarity. Finally, the initial eligible points are replaced with new seeds whose objective values show higher self-similarity levels.
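The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the toy objective `sphere`, the function names, the default population size and iteration count, the clipping of out-of-range variables, and the merge-and-keep-best replacement rule are all assumptions, and the objective is minimized, so a "better" seed is one with a lower value.

```python
import random


def sphere(x):
    """Toy objective (an assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def alpha(rng):
    """Pick one of four motion-limit formulas at random (standard CGO form, assumed)."""
    r = rng.random()
    delta, eps = rng.randint(0, 1), rng.randint(0, 1)
    return rng.choice([r, 2 * r, delta * r + 1, eps * r + (1 - eps)])


def cgo_minimize(objective, dim, lower, upper, n_seeds=10, iters=50, seed=0):
    """Minimize `objective` over [lower, upper]^dim with a basic CGO loop."""
    rng = random.Random(seed)

    def clip(x):
        # Boundary handling: push out-of-range variables back to the limits.
        return [min(max(v, lower), upper) for v in x]

    def move(base, a, b):
        # Seed update: base + alpha * (beta * a - gamma * b), with beta, gamma in {0, 1}.
        ak, beta, gamma = alpha(rng), rng.randint(0, 1), rng.randint(0, 1)
        return [p + ak * (beta * u - gamma * v) for p, u, v in zip(base, a, b)]

    # Random initial eligible seeds and their objective values.
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_seeds)]
    fit = [objective(s) for s in pop]

    for _ in range(iters):
        g = pop[min(range(n_seeds), key=fit.__getitem__)]  # global best G
        candidates = []
        for s in pop:
            # Mean group M_k: mean of a randomly chosen subset of the seeds.
            group = rng.sample(pop, rng.randint(1, n_seeds))
            m = [sum(col) / len(group) for col in zip(*group)]
            # Fourth seed: perturb one randomly chosen variable of S_k.
            seed4 = list(s)
            seed4[rng.randrange(dim)] += rng.random()
            for cand in (move(s, g, m), move(g, s, m), move(m, s, g), seed4):
                candidates.append(clip(cand))
        # Replacement: merge old and new seeds and keep the best n_seeds.
        merged = pop + candidates
        merged_fit = fit + [objective(c) for c in candidates]
        keep = sorted(range(len(merged)), key=merged_fit.__getitem__)[:n_seeds]
        pop = [merged[i] for i in keep]
        fit = [merged_fit[i] for i in keep]

    best = min(range(n_seeds), key=fit.__getitem__)
    return pop[best], fit[best]
```

Calling `cgo_minimize(sphere, dim=3, lower=-5.0, upper=5.0)` returns the best seed and its objective value. Because the replacement step keeps the best seeds of the merged set, the best value can never get worse from one iteration to the next.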

Optimal Feature Selection.
In general, the extracted features are separated into training and test sets, with the training set used to fit the model that identifies the essential features. Figure 4 depicts the stages of the binary CGO optimization technique. First, CGO produces a set of N agents X, each representing a candidate FS solution. These agents are generated using Equation (8), in which Dim denotes the dimension of the problem (i.e., the number of features) and U and L define the limits of the search space. The next step is to obtain the Boolean version BX_i of each X_i. The objective value of each X_i is then computed from the binary vector BX_i and the classification error,
in which (|BX_i|/Dim) represents the ratio of selected features and γ_i denotes the classification error obtained with an SVM. SVM is commonly used because it is more stable than other classification techniques and has fewer parameters. The parameter λ balances the ratio of selected features against the classification error. The next step is to check the stopping criteria; if they are met, the best solution is returned, and otherwise the update steps are repeated. Classification is conducted after the optimal features have been obtained from the CGO algorithm, using a machine learning technique such as stochastic gradient descent (SGD). One of the main objectives in DL is to train deep neural networks with good prediction capabilities by exploring a highly nonconvex cost landscape. A common explanation of why this works is that the cost landscape itself is simple, with no misleading local optima. However, it turns out that the cost landscape of state-of-the-art DL models does have fictitious local (or global) optima, and SGD is capable of finding them [60]. Nevertheless, SGD started from a random initialization generalizes well in practice. A theory that explains this success would have to account for the entire course of the method, which remains challenging even for the most advanced DL models trained on real datasets.
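The binary feature selection step can be illustrated with a short sketch. It assumes a uniform random initialization between L and U, a binarization threshold of 0.5, and a weighted-sum objective of the form λ × error + (1 − λ) × (|BX_i|/Dim); the function names, the threshold, and the default λ = 0.9 are hypothetical choices for illustration, and the classification error is passed in as a plain number rather than computed by an SVM.

```python
import random


def init_agents(n_agents, dim, lower=0.0, upper=1.0, seed=0):
    """Draw each agent uniformly between the lower and upper limits (assumed form)."""
    rng = random.Random(seed)
    return [[lower + rng.random() * (upper - lower) for _ in range(dim)]
            for _ in range(n_agents)]


def binarize(agent, threshold=0.5):
    """Boolean version of an agent: 1 keeps the feature, 0 drops it (threshold assumed)."""
    return [1 if v > threshold else 0 for v in agent]


def fitness(mask, error, lam=0.9):
    """Weighted sum of classification error and ratio of selected features.

    `error` is the classifier's error rate on the selected features, supplied
    by the caller; `lam` trades error against feature count (value assumed).
    """
    ratio = sum(mask) / len(mask)
    return lam * error + (1 - lam) * ratio
```

For instance, a mask that keeps 2 of 4 features with a 10% classification error scores 0.9 × 0.10 + 0.1 × 0.5 = 0.14; lower values are better under this convention.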

Experimental Data.
Three medical image datasets were used for the image classification task in our experiments: PH2 [44], ISIC-2016 [43], and Blood-Cell [45].
(1) PH2: A total of 200 dermoscopic images were included in this dataset, including

Evaluation Metrics.
This research was evaluated using the metrics in Table 2: balanced accuracy, accuracy, recall, precision, and F1 score. Balanced accuracy is the average of the recall obtained on each class. Accuracy is the proportion of all predictions that are correct.
Recall is the proportion of actual positive samples that are correctly predicted as positive. Precision is the proportion of predicted positive samples that are actually positive. Finally, the F1 score is the harmonic mean of recall and precision, which makes it informative under class imbalance. Here, true positives (TP) are nodular samples correctly identified as nodular, and true negatives (TN) are non-nodular samples correctly identified as non-nodular. False positives (FP) are non-nodular samples identified as nodular, and false negatives (FN) are nodular samples that were missed.
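As a quick illustration of how these metrics follow from the four counts, the sketch below computes them for a binary problem. It uses the standard textbook definitions; the function name and the example counts are hypothetical, and the multi-class averaging used for Table 2 may differ.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the five evaluation metrics from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # recall of the positive class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall of the negative class
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    balanced_accuracy = (recall + specificity) / 2    # mean of per-class recall
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": balanced_accuracy,
    }
```

With the hypothetical counts TP = 40, TN = 45, FP = 5, and FN = 10, recall is 40/50 = 0.80, precision is 40/45 ≈ 0.889, and both accuracy and balanced accuracy equal 0.85.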

Experimental Results and Analysis.
This section presents and discusses the experimental results of the proposed approach. First, we compare our approach with various metaheuristic optimization strategies. Afterward, three classifiers are compared, namely, k-nearest neighbor (kNN), support vector machines (SVM), and stochastic gradient descent (SGD).
Finally, we compare our results with those of other published medical image classification techniques.
To objectively examine the effectiveness of our proposed approach, we compared it to nine well-known algorithms.
As seen in Table 3, each optimizer has its own set of parameters. As the number of search agents increases, so does the likelihood of finding the global optimum. The sample size is set to 50 in all experiments. Reducing the number of search agents could lower the computational complexity.
The nine optimization techniques were combined with standard machine learning classifiers, namely KNN, SVM, and SGD, to produce the findings. (a) KNN classifies an unlabeled sample according to the classes of its k nearest training samples. The distance between samples is used to measure their similarity, and the Euclidean distance is a typical choice. (b) SVMs perform classification by mapping the data into a transformed space and separating the classes with a hyperplane, an approach grounded in statistical learning. The hyperplane depends on the kernel used; linear, polynomial, and RBF kernels are among the most common kernel types. (c) The SGD technique has many advantages, notably its good generalization in practice. However, a full explanation of this success would have to account for the entire course of the procedure, which remains challenging even for the most robust DL models trained on real data.
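The KNN rule in item (a) is simple enough to show directly. The following is a minimal sketch with the Euclidean distance and majority voting; the function name, the default k = 3, and the toy points are hypothetical, and it is not the classifier configuration used in the experiments.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query and keep the first k.
    neighbors = sorted(zip(train_x, train_y),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, with three training points near the origin labeled "a" and three near (5, 5) labeled "b", a query at (0.2, 0.1) is assigned to class "a".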

Analysis Results.
Multiple measures are used to evaluate these optimization techniques. Each method was evaluated based on recall, precision, accuracy, and F1 score. The results for the PH2, ISIC-2016, and Blood-Cell datasets are presented in Tables 4, 5, and 6, respectively. In these tables, the best results are shown in bold. According to these tables, the SGD-based CGO beats PSO, MVO, GWO, MFO, WOA, FFA, BAT, and HGS.
On the PH2 dataset, Table 4 shows that CGO plays a significant role in feature selection when an SGD classifier is applied, and this holds across all measures. In terms of accuracy with the SGD classifier, CGO correctly classifies 97.52% of the test set, which is higher than the other optimization algorithms. BAT, HGS, MVO, MFO, and GWO share the second level at 97.50%. PSO and WOA are on par at 97.14%, and FFA has the worst accuracy (96.79%). On the precision metric, CGO achieved 97.54%, the best result with the SGD classifier. BAT, HGS, MFO, and MVO came second with 97.53%, followed by GWO with 97.51%. PSO and WOA both reached 97.19%, and FFA had the lowest precision with 96.84%. The recall with the SGD classifier was 97.51% for CGO; 97.50% for HGS, MVO, MFO, and BAT; 97.49% for GWO; 97.14% for PSO and WOA; and 96.79% for FFA. In terms of F1 score, CGO came out on top with 97.51%, followed by BAT, HGS, MFO, MVO, and GWO with 97.50% and WOA with 97.15%; PSO and FFA performed worst. The balanced accuracy of CGO was 97.93%, followed by BAT, MVO, MFO, GWO, and HGS with 97.92% each and PSO with 97.62%. FFA and WOA had the worst results, with 97.32% and 97.02%, respectively. Combining the nine optimization techniques with the KNN and SVM classifiers produced lower results than combining them with the SGD classifier. On the ISIC-2016 dataset, the proposed CGO algorithm outperformed the other optimization techniques, as seen in Table 5.
The accuracy of the CGO algorithm with the SGD classifier was 88.39%, the best performance. BAT was second with 87.60%, followed by HGS with 87.07%.
PSO comes next with 85.75%, followed by FFA and MVO (84.77% each), GWO (84.43%), WOA (83.91%), and MFO (79.95%). For the precision measure, the proposed CGO approach achieved 87.81%. The next-best values were 87.75% and 86.22%, the latter obtained by HGS. PSO and WOA follow with 84.82% and 83.99%, respectively, and then GWO, MVO, and FFA with 83.92%, 83.99%, and 83.78%, respectively. MFO has the lowest precision, 80.15%. For the recall metric, CGO obtained 88.39%, followed by BAT, HGS, PSO, FFA, MVO, and GWO, while WOA obtained 83.91% and MFO 79.95%. On the F1 score, the proposed CGO outscored the other algorithms with 87.51%, and HGS followed with 86.14%. Next, BAT, MVO, GWO, and WOA obtained 85.79%, 84.18%, 84.14%, and 83.95%, respectively, and MFO had the poorest F1 score, 80.05%. The balanced accuracy of the CGO algorithm was 75.69%, the best performance. WOA and HGS came second and third with 74.90% and 73.86%, respectively, followed by GWO with 73.72%. FFA had the lowest score, 64.85%.
For the Blood-Cell dataset, the results of CGO and the other optimizers are shown in Table 6, in which the SGD, SVM, and KNN classifiers are combined with the nine optimizers. According to the table, CGO combined with SGD achieved the best accuracy, 88.79%. GWO and MFO tie at 88.74%, MVO reaches 88.70%, and BAT and HGS have the worst score, 88.58%. CGO also had the best precision, 91.10%. The second-best result, 91.07%, belongs to MFO and WOA, while BAT and HGS performed worst with 90.92% and 90.83%, respectively. CGO also achieved the best recall. GWO and MFO have the same recall (88.74%), closely followed by MVO; FFA and WOA reached 88.66%, and BAT and HGS had the worst recall, 88.58%. The proposed CGO also outperformed the other algorithms on F1 score, with 89.95%. MFO and GWO came second with 88.98%, followed by MVO, WOA, FFA, PSO, and BAT with 88.95% each, and HGS had the poorest F1 score, 88.82%. For balanced accuracy, CGO attained 88.78%, MFO and GWO ranked second (88.74%), WOA and FFA came next with 88.66%, and BAT and HGS scored 88.58%.
From a different perspective, Figure 6 depicts the average accuracy of each feature selection algorithm over the three datasets with the SGD classifier. The overall average over the three datasets is about 91.57% for CGO, while BAT comes second with 91.23%. HGS follows with 91.05%, ahead of PSO. These are followed by MVO (90.03%), GWO (90.23%), FFA (90.05%), and WOA (89.90%). Finally, MFO has the lowest average accuracy (88.73%).
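The CGO average can be checked directly from the per-dataset accuracies reported above (88.39% on ISIC-2016, 97.52% on PH2, and 88.79% on Blood-Cell). The short sketch below performs that arithmetic; the variable names are illustrative only.

```python
# Per-dataset accuracies (%) of CGO with the SGD classifier, as reported in the text.
cgo_accuracy = {"ISIC-2016": 88.39, "PH2": 97.52, "Blood-Cell": 88.79}

# Unweighted mean over the three datasets.
mean_accuracy = sum(cgo_accuracy.values()) / len(cgo_accuracy)
print(f"{mean_accuracy:.2f}")  # 274.70 / 3, i.e. about 91.57
```

The sum of the three values is 274.70, and dividing by 3 gives approximately 91.57%, which matches the average reported for CGO.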
From the user's point of view, the execution time of the complete method also matters. Figure 7 shows that the proposed CGO and HGS algorithms have average execution times of 0.5672 and 0.5189 seconds over the three datasets, respectively, which are lower than those of the other algorithms. The MFO optimizer took 0.7164 seconds, whereas GWO, WOA, FFA, BAT, and MVO took 0.7169 s, 0.7177 s, 0.7332 s, 0.7644 s, and 0.7723 s, respectively. PSO had the highest (worst) execution time, 1.0576 s.
Figure 8 displays the average balanced accuracy of each feature selection approach over the three datasets, namely, ISIC-2016, PH2, and Blood-Cell. The average accuracy of the SGD, SVM, and KNN classifiers over the three datasets and the nine optimizers introduced earlier is shown in Figure 9. In the figure, SVM outperformed the other classifiers on the accuracy metric: SVM achieved 90.78%, SGD achieved 90.40%, and KNN achieved 90.13%.
Execution time is also relevant for the user. The average execution time for the three datasets is presented in Figure 10. According to the results, the SGD classifier took the least time, followed by the SVM classifier with 0.2767 seconds. KNN took the longest (worst) time, 1.7271 seconds.
To sum it up, the CGO optimization technique paired with the SGD classifier earned the greatest accuracy metric among all combinations for the ISIC-2016, PH2, and Blood-Cell datasets.Moreover, the SGD outperforms other classification algorithms (i.e., KNN and SVM) according to the results.

Comparison with the Literature Studies.
This section compares our approach with other state-of-the-art medical image classification techniques. Developing highly accurate technology for medical image classification is a major undertaking, so it is important to compare our strategy with other models that have been tested on the same datasets. Table 7 reports the performance of several techniques for disease identification on the ISIC-2016, PH2, and Blood-Cell datasets.
For the ISIC-2016 dataset, the following advanced skin cancer identification methods were compared: a method based on segregation and then validation [69], a feature-fusion method [70], a method combining Fisher coding with deep residual networks [71], a multi-CNN interactive learning model [31], an ensemble method [32], a fusion of Fisher vectors and CNNs [33], and a fine-grained classification concept for differentiating characteristics [34].
For the PH2 dataset, the following advanced techniques for diagnosing melanoma were included. The authors of [72] developed a decision-aid system based on an artificial neural network. The authors of [73] proposed sparse kernel models to represent feature data in a high-dimensional feature vector. According to the authors of [74], U-Net can be used to detect malignant tumors automatically. The authors of [75] employed transfer learning and CNNs as part of their IoT system. A hierarchical architecture based on two-dimensional image pixels and ResNet was introduced in [76] for advanced DL.

Computational Intelligence and Neuroscience
In [77], features obtained with a CNN were classified with SVM-based classifiers. Besides, a granularity feature and SVM are used in [78]. For the Blood-Cell dataset, methods for identifying and counting essential blood cells were compared; in particular, CNNs were presented in [79] as a DL method to automate the entire procedure.

Discussion
The bottom line is that superfluous features can be removed from the high-dimensional medical image representations obtained by a CNN (i.e., MobileNetV3). The MobileNetV3 model performed effectively as a feature extractor in our work.
The class activation map for the MobileNetV3 model was prepared, in which the activation provided by the last layer is represented as an overlaid heat map, as shown in Figure 11. In the figure, the red regions represent the most important discriminative regions, while the other colored regions are less important.
To provide a more thorough comparison among the different algorithms, we used the Friedman (FD) test. The FD test is a nonparametric test that ranks the methods and computes a statistical value from those ranks. As in Reference [80], the FD test is used to determine whether there is a significant difference between the methods. Furthermore, Figure 12 compares the CGO method with the other optimization techniques on the three datasets in terms of recall, precision, F1 measure, accuracy, and balanced accuracy. When the CGO's results are analyzed using the five metrics, it is clear that the CGO algorithm surpasses the others. In terms of balanced accuracy, CGO has the best (lowest) mean rank of 1, followed by GWO with a mean rank of 3.50. MFO and MVO have nearly identical mean ranks of about 4, and WOA and HGS have a mean rank of 4.17. Finally, BAT, PSO, and FFA rank lower than the others. These reasons support that our approach obtains the best results.
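The mean ranks that the Friedman test builds on can be illustrated with a small sketch. It ranks three of the optimizers (CGO, BAT, and HGS) on each dataset using the SGD accuracies quoted earlier in this section, with tied values sharing the average of their ranks, and then averages the ranks per method. The function names are hypothetical, and because only three methods and one metric are used, the numbers are illustrative and do not reproduce the ranks in Figure 12.

```python
def average_ranks(scores):
    """Rank scores so the highest gets rank 1; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the block
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def mean_ranks(methods, per_dataset_scores):
    """Average each method's rank over all datasets (lower is better)."""
    totals = [0.0] * len(methods)
    for scores in per_dataset_scores:
        for i, r in enumerate(average_ranks(scores)):
            totals[i] += r
    return {m: t / len(per_dataset_scores) for m, t in zip(methods, totals)}


methods = ["CGO", "BAT", "HGS"]
accuracy = [
    [97.52, 97.50, 97.50],  # PH2
    [88.39, 87.60, 87.07],  # ISIC-2016
    [88.79, 88.58, 88.58],  # Blood-Cell
]
result = mean_ranks(methods, accuracy)
```

In this restricted example, CGO ranks first on every dataset and therefore has a mean rank of 1, while the ties between BAT and HGS on PH2 and Blood-Cell give each of them a rank of 2.5 on those datasets.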
Thus, CGO is an effective search algorithm for tackling complex optimization problems, although its parameters must be chosen carefully. First, when the clusters of the population in CGO were analyzed, CGO worked better when the population was divided into two parts. Second, CGO searched more consistently than other methods, as evidenced by lower standard deviation values in the results. Finally, CGO's exploration and exploitation techniques were applied successfully, since they worked equally well on datasets with a wide variety of dimensions, which makes CGO adaptable to FS problems.
However, our approach also has some limitations, mainly in time and memory complexity. Therefore, we are currently working to improve its efficiency. In particular, we are considering other augmentation procedures, as introduced in [81]. Moreover, we plan to use other deep learning models such as Swin or Vision Transformer, which achieved the best results and have recently been used in different computer vision tasks.

Conclusion and Future Work
The automatic medical image classification task has been expanding rapidly in recent years. However, existing approaches are still incapable of achieving good performance due to the similarity in physical attributes of images, the diversity of medical experience, and the small size of medical image datasets.
Therefore, this paper presents a new method of classifying medical images that uses the IoMT system to help clinicians and patients make a quick and advanced diagnosis of diseases in any area.
The proposed system relies on classification models trained in the cloud center before being used, after features are extracted from the medical images acquired from IoT nodes and passed to fog computing. MobileNetV3 was used to obtain the features.
MobileNetV3 was fine-tuned on medical image datasets to generate more sophisticated and informative representations and to retrieve feature vector representations.
After that, we used a new metaheuristic method in binary form (chaos game optimization) to reduce the feature representation space. This algorithm improves the convergence rate toward the optimal subset of relevant features. Therefore, CGO converges quickly, which indicates that it avoids being trapped in local optima. Thus, it successfully balances the exploration and exploitation phases, as shown by the fast determination of the threshold values and the high accuracy presented in the results.
The learned model's efficiency is evaluated either by transmitting it to a cloud center for testing medical images or by using fog computing with a clone of the learning algorithm. Our experiments were conducted on three datasets: ISIC-2016, PH2, and Blood-Cell.
According to the results, the new CGO optimization method outperforms other existing feature selection methods.
This work evaluated the combinations of nine optimizers with three different classifiers. The best results for the accuracy, F1 score, recall, and precision metrics on these datasets were achieved with the CGO optimizer combined with the SGD classifier. The accuracy was 88.39% for ISIC-2016, 97.52% for PH2, and 88.79% for Blood-Cell. Furthermore, the comparisons with other state-of-the-art medical image classification techniques demonstrated that the developed IoMT methodology is an appropriate mechanism. In the near future, this system could be available in hospitals with the aim of monitoring the patients' condition from home. Patients would automatically send a report to the hospital through the connected devices, with vital information about blood pressure, insulin levels, etc.
Then, professional staff at the hospital would follow up each case and, if needed, would communicate directly with the patient.
However, the proposed model still has some limitations, the most relevant being its requirements for computational resources (more time is needed to obtain the results) and for memory resources.
We are currently working on lowering the complexity and enhancing the efficiency of the proposed system. Also, we plan to propose a CGO-based multiobjective feature selection approach for high-dimensional data with a small number of instances, to simultaneously maximize the classification performance and minimize the number of features, using more efficient classifiers. Additionally, automatic cluster number determination and the application of hyperheuristic approaches in FS could be an exciting line of research. Moreover, a larger volume of medical data will be evaluated in future work. Finally, merging several classification algorithms is an attractive topic of investigation that could allow practitioners to improve the performance of existing methods.

Figure 1: Diagram of the proposed IoMT system.

Figure 2: The building blocks of the proposed network architecture for feature extraction.

Figure 5: Example medical image samples for the classification task from the three selected datasets.

Figure 7: Average execution time of nine FS methods.

Figure 8: Average balanced accuracy of the selected datasets based on nine FS algorithms.

Figure 9: The averaged results of the selected datasets in terms of the accuracy metric using the three classifiers.

Figure 11: Grad-CAM heatmaps on the skin images using the MobileNetV3 model.

ALGORITHM 1: Algorithm of CGO.
Input:
    D: the number of starting eligible seeds.
    Initialize the starting positions (s_k^j) with random values of eligible seeds (S_k).
Output:
    G: the global best eligible seed.
Method:
    Compute the objective function for each eligible seed.
    repeat
        for k = 1 to D do
            Create a mean group (M_k).
            Construct a temporary triangle on the three vertices S_k, G, and M_k.
            Create new seeds by Equations (3) to (6).
            if boundaries are crossed by new seeds then
                Position limitations can be adjusted for new seeds.
            Assess the fitness of new points.
            if new seeds have higher objective function than the last initial eligible seeds then
                Substitute the last points by the new ones.
            if the best solution is achieved then
                Amend G.
    until the iteration criterion has been met.
    Return G.

Optimization. When features are extracted with methods such as MobileNetV3, the extracted features are not transmitted directly to the classification algorithm, since this would require more processing time. Feature selection (FS) techniques remove redundant or unusable features from the retrieved patient data, much like a content decomposition method. The FS process therefore minimizes the quantity of data transferred. As a result, an

Table 3: The parameters of each FS optimizer and their values.

Table 4: Results of each algorithm on the PH2 dataset.

Table 5: Results of each algorithm on the ISIC-2016 dataset.

Table 6: Results of each algorithm on the Blood-Cell dataset.

Table 7: Accuracy results (%) of the existing approaches.
Furthermore, we found that, for the F1-score measure, CGO has the best mean rank of 1, and GWO and MVO have the second and third mean ranks of 3.83 and 4, respectively. BAT and HGS have nearly identical mean ranks (i.e., 5), and MFO and WOA have mean ranks of 5.17 and 6.00, respectively. Finally, PSO and FFA rank lower than the others, with a followed by MVO, which achieved 4.33. BAT has a mean rank of 4.67, whereas MFO and HGS have 5. Lastly, GWO, PSO, FFA, and WOA have the highest mean ranks. As a result of Friedman's test, there is a noticeable difference between the proposed model and the other models (the p value is less than 0.05), as shown in Figure 12.