DCNN-FuzzyWOA: Artificial Intelligence Solution for Automatic Detection of COVID-19 Using X-Ray Images

Artificial intelligence (AI) techniques are considered effective for diagnosing COVID-19 and breaking its transmission chain. Recent research uses the deep convolutional neural network (DCNN) as a detector or classifier of COVID-19 X-ray images. The most challenging part of working with neural networks is training them. Gradient descent-based (GDB) algorithms have long been used to train the fully connected layer (FCL) of a DCNN. Although GDB algorithms run and converge quickly in some applications, they require manual adjustment of many parameters, and they are not easy to parallelize on graphics processing units (GPUs). Therefore, in this paper, the whale optimization algorithm (WOA) improved by a fuzzy system, called FuzzyWOA, is proposed for DCNN training. By tuning WOA's control parameters accurately and appropriately, the fuzzy system defines the boundary between the exploration and exploitation phases in the search space, which improves WOA. To evaluate the performance and capability of the proposed DCNN-FuzzyWOA model, a publicly available database called COVID-Xray-5k is used. DCNN-PSO, DCNN-GA, and LeNet-5 benchmark models are used for fair comparison. The comparison criteria include accuracy, processing time, standard deviation (STD), ROC and precision-recall curves, and F1-Score. The results showed that the FuzzyWOA training algorithm with 20 epochs achieved 100% accuracy at a processing time of 880.44 s, with an F1-Score of 100%. Structurally, the i-6c-2s-12c-2s model achieved better results than the i-8c-2s-16c-2s model. However, the results of FuzzyWOA for both models were very encouraging compared with the particle swarm optimization, genetic algorithm, and LeNet-5 methods.


Introduction
COVID-19 was declared a pandemic by the World Health Organization (WHO) in March 2020 [1]. Owing to the increasing number of deaths, the spread of the disease, and the lack of access to vaccines and specific drugs, rapid diagnosis of the disease to break the transmission chain has become one of the most important research topics. The polymerase chain reaction (PCR) test [2] and X-ray imaging [3] are standard methods for detecting COVID-19. The problems with PCR tests are that not enough kits are available and that results take a relatively long time. X-ray images, in addition to being affordable, are widely available. Reducing the time needed to diagnose and detect positive cases, even in the absence of fever and cough symptoms, is another benefit of using X-ray images [4]. AI tools can reduce processing time and increase accuracy in detecting patients with COVID-19 [5]. Much research has been done on identifying positive cases of COVID-19 [3,6]. However, until COVID-19 is completely eradicated, the need to research and discover new, fast, low-cost, and accurate techniques remains acute. Deep learning (DL) is one of the AI techniques for detecting positive cases of COVID-19 [7]. Training is the most challenging part of DL. Examples of algorithms used for DL training are the conjugate gradient (CG) algorithm [8], the Krylov subspace descent (KSD) algorithm [9], and the Hessian-free optimization (HFO) approach [10].
While stochastic GDB training methods are simple to construct and run quickly for large numbers of training samples, they require extensive manual parameter adjustment for optimal performance. Their structure is sequential, which makes parallelizing them on GPUs challenging. On the other hand, although CG methods are stable for training, they are rather slow and therefore require multiple CPUs and a large amount of RAM [8]. HFO has been used with deep auto-encoders to train the weights of standard CNNs, and it performs better than Hinton and Salakhutdinov's approach for pretraining and fine-tuning deep auto-encoders [11]. In addition, HFO is weaker and more complex than KSD, although it requires less memory. KSD is also faster in optimization and classification [9]. Recent years have seen metaheuristic and evolutionary algorithms employed to solve and optimize real-world problems [12][13][14]. Despite this, research on optimizing DL training needs more attention. A hybrid of the genetic algorithm (GA) and DCNN was the starting point of this line of research [15]. This model determines the DCNN parameters through GA's crossover and mutation processes, with the DCNN structure modeled as a chromosome in GA. Alternatively, only the weights and biases of the first convolution layer (C1) and the third convolution layer (C3) are used as chromosomes during the crossover step. In [16], the authors present an evolutionary method for fine-tuning the parameters of a DCNN by utilizing the Harmony Search (HS) algorithm and several of its improved variants for handwritten digit and fingerprint detection. In [17], researchers developed a hybrid deep neural network (DNN), using computed tomography (CT) and X-ray imaging, to predict the risk of COVID-19-related disease onset.
In [18], a new method of diagnosing COVID-19 based on chest X-ray images using artificial intelligence is proposed. Compared with the state-of-the-art techniques currently in use, the proposed method demonstrates outstanding performance.
In [19], the progressive unsupervised learning (PUL) algorithm is used for DCNN training. PUL is easy to implement and is therefore considered a primary benchmark for unsupervised feature learning. Because clustered data sets can be difficult to categorize, PUL inserts a selection stage between the clustering and fine-tuning stages. In [20], an approach for automatically building DCNN architectures on the basis of GA is suggested for optimizing image classification. The most important feature of this method is that it requires no prior knowledge of the DCNN structure. However, large DCNNs cause the chromosomes to grow, which slows the algorithm down. In view of these shortcomings, our proposed strategy consists of training a DCNN model on the data set to identify positive and negative COVID-19 cases from X-ray images. The FCL of the trained DCNN is then replaced with a new FCL that is tuned using the whale optimization algorithm, whose control parameters are adjusted by fuzzy logic to improve WOA's performance. The proposed algorithm is called FuzzyWOA. Our main motivation in this article is therefore to investigate the impact of FuzzyWOA on DCNN performance. Our main contribution is to improve WOA performance by designing and applying a fuzzy system that balances the exploration and exploitation phases in the search space, for the automatic detection of COVID-19 using X-ray images. For a fairer comparison, in addition to FuzzyWOA, PSO, GA, and LeNet-5 are used for two DCNN models with different structures to automatically detect COVID-19 cases.
It should be noted that various metaheuristic methods have been used to train neural networks, such as the sine-cosine algorithm [21], the salp swarm algorithm [22], the best-mass gravitational search algorithm [23], the particle swarm optimizer [24], biogeography-based optimization [25], the dragonfly algorithm [26], and the chimp optimization algorithm [27]. A common problem of these algorithms, which leads to inefficiency on some problems, is the lack of a clear distinction between the exploration and exploitation phases. One of the advantages of FuzzyWOA is that it establishes a proper trade-off between these two phases in the algorithm's search space. Other disadvantages of some metaheuristic methods include getting stuck in local optima, low convergence speed, high complexity, and a growing number of control parameters. For this reason, an algorithm that performs better in less time is needed. The improvements in FuzzyWOA are designed to address these drawbacks. The remaining connection weights of the pretrained DCNN are kept unchanged in the other layers, so that only a linear structure is trained on the features of the final layer.

Materials and Methods
This section consists of four subsections. The first subsection introduces WOA and then describes the proposed FuzzyWOA algorithm. The second subsection deals with the DCNN model. The third subsection is about the COVID X-ray database, and the fourth subsection describes the methodology.
2.1. FuzzyWOA. First, the WOA mathematical model is explained, and then the use of fuzzy logic to improve the algorithm is described.

WOA.
The WOA optimization algorithm was introduced in 2016 by Mirjalili and Lewis, inspired by the hunting behavior of whales [28]. WOA begins with a set of randomly generated solutions. In each iteration, the search agents update their positions using three operators: encircling prey, bubble-net attack (exploitation phase), and search for prey (exploration phase). Whales locate and encircle the prey. WOA assumes that the current best solution is the prey. Once the best search agent has been identified, all other search agents update their positions toward it. This behavior is expressed by the following equations:

\[ \vec{D} = \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right| \tag{1} \]
\[ \vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D} \tag{2} \]

where $t$ is the current iteration, $\vec{A}$ and $\vec{C}$ are coefficient vectors, $\vec{X}^{*}$ is the position vector of the best solution obtained so far, and $\vec{X}$ is the position vector. In each iteration of the algorithm, $\vec{X}^{*}$ is updated if a better solution is found. The vectors $\vec{A}$ and $\vec{C}$ are obtained using the following equations:

\[ \vec{A} = 2\vec{\alpha} \cdot \vec{r} - \vec{\alpha} \tag{3} \]
\[ \vec{C} = 2 \cdot \vec{r} \tag{4} \]

where $\vec{\alpha}$ decreases linearly from 2 to 0 over the iterations and $\vec{r}$ is a random vector in the interval [0, 1]. In the bubble-net attack strategy, the whale swims around its target along a shrinking circle and a spiral path simultaneously. To describe this concurrent behavior, it is assumed that the whale updates its position during optimization using either the shrinking encircling mechanism or the spiral model, each with a probability of 50%. Equation (5) defines the mathematical model for this phase.

\[ \vec{X}(t+1) = \begin{cases} \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, & p < 0.5 \\ \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t), & p \geq 0.5 \end{cases} \tag{5} \]

where $\vec{D}'$ is obtained from equation (6) and refers to the distance of the $i$-th whale from the prey (the best solution obtained so far).

\[ \vec{D}' = \left| \vec{X}^{*}(t) - \vec{X}(t) \right| \tag{6} \]

The constant $b$ specifies the shape of the logarithmic spiral, $l$ is a random value between −1 and 1, and $p$ is a random number between 0 and 1. Vector $\vec{A}$ is used with random values between −1 and 1 to bring search agents closer to the reference whale. In the search for prey, the position of a search agent is updated using a randomly selected agent instead of the best search agent. The mathematical model takes the form of the following equations:

\[ \begin{aligned} \vec{D} &= \left| \vec{C} \cdot \vec{X}_{rand} - \vec{X} \right| \\ \vec{X}(t+1) &= \vec{X}_{rand} - \vec{A} \cdot \vec{D} \end{aligned} \tag{7} \]

where $\vec{X}_{rand}$ is the position vector of a randomly chosen whale from the current population, and vector $\vec{A}$ is used with random values whose magnitude is greater than or equal to one to drive the search agent away from the reference whale [29].
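The three operators described above can be sketched in a few lines of Python. This is a minimal illustration of the standard WOA update, not the authors' MATLAB implementation: the function names `woa_step` and `minimize`, the scalar (per-whale rather than per-dimension) coefficients A and C, the search bound, and the sphere test function are all assumptions made for the example.

```python
import math
import random


def woa_step(positions, best, t, max_iter, b=1.0, rng=random):
    """Apply one WOA position update to every whale (list of float lists)."""
    alpha = 2.0 * (1.0 - t / max_iter)  # control parameter, falls linearly from 2 to 0
    updated = []
    for x in positions:
        A = 2.0 * alpha * rng.random() - alpha  # scalar simplification of the vector A
        C = 2.0 * rng.random()                  # scalar simplification of the vector C
        p = rng.random()                        # chooses between encircling and spiral
        l = rng.uniform(-1.0, 1.0)              # position along the spiral
        if p < 0.5:
            # |A| < 1: shrink toward the best whale (exploitation);
            # |A| >= 1: move relative to a random whale (exploration).
            ref = best if abs(A) < 1.0 else rng.choice(positions)
            new_x = [r - A * abs(C * r - xi) for r, xi in zip(ref, x)]
        else:
            # Logarithmic spiral around the best whale.
            new_x = [abs(bi - xi) * math.exp(b * l) * math.cos(2.0 * math.pi * l) + bi
                     for bi, xi in zip(best, x)]
        updated.append(new_x)
    return updated


def minimize(fitness, dim, n_whales=15, max_iter=15, bound=5.0, seed=0):
    """Run WOA on a minimization problem and return the best position found."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-bound, bound) for _ in range(dim)]
                 for _ in range(n_whales)]
    best = min(positions, key=fitness)
    for t in range(max_iter):
        positions = woa_step(positions, best, t, max_iter, rng=rng)
        candidate = min(positions, key=fitness)
        if fitness(candidate) < fitness(best):  # keep the best solution so far
            best = candidate
    return best


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


best = minimize(sphere, dim=3)
```

The population size and iteration count of 15 mirror the simulation settings reported later in the paper; everything else is illustrative.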

Proposed Fuzzy Logic for Tuning Control Parameters.
The proposed fuzzy model receives the normalized fitness value (NFV) of each whale in the population and the current values of the parameters $\vec{\alpha}$ and $\vec{C}$. Its outputs are the amounts of change, denoted Δα and ΔC. The NFV of each whale is obtained by equation (8).
The optimization problem in this paper is a minimization problem, in which the fitness of each whale is given directly by the value of the objective function. The parameters $\vec{\alpha}$ and $\vec{C}$ of each whale are updated by equations (9). The fuzzy system is responsible for updating the parameters $\vec{\alpha}$ and $\vec{C}$ of each member of the population (whale), and its three inputs are the current values of $\vec{\alpha}$, $\vec{C}$, and NFV. Initially, these values are fuzzified by membership functions.
Then their membership values are obtained using μ. These values are applied to a set of rules and result in the values Δα and ΔC. Following the determination of these values, defuzzification is used to obtain crisp numerical values of Δα and ΔC. Finally, these values are applied in equations (9) to update the parameters $\vec{\alpha}$ and $\vec{C}$.
The fuzzy system used in this article is of the Mamdani type (see Table 1). The proposed fuzzy model and the membership functions used to update the whale algorithm's control parameters are shown in Figure 1.
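The fuzzify, apply-rules, defuzzify cycle described above can be illustrated with a small sketch for the parameter C, using the nine ΔC rules listed in Table 1. Several things here are assumptions and not taken from the paper: the min-max form of the NFV, the triangular membership functions on a normalized [0, 1] axis, the range [0, 2] for C, the output step of ±0.1, the clipping in `update_c`, and the weighted-average defuzzification, which is a simpler stand-in for the centroid step of a full Mamdani system.

```python
def tri(x, left, peak, right):
    """Triangular membership; left == peak or peak == right gives a shoulder."""
    if x <= left:
        return 1.0 if left == peak else 0.0
    if x >= right:
        return 1.0 if peak == right else 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Assumed membership functions on a normalized [0, 1] axis.
LEVELS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}

# Assumed crisp values for the output labels NE (negative), ZE (zero), PO (positive).
DELTA = {"NE": -0.1, "ZE": 0.0, "PO": 0.1}

# (NFV level, C level) -> output label, copied from the ΔC rules in Table 1.
RULES_C = {
    ("low", "low"): "PO", ("low", "medium"): "PO", ("low", "high"): "ZE",
    ("medium", "low"): "PO", ("medium", "medium"): "ZE", ("medium", "high"): "NE",
    ("high", "low"): "PO", ("high", "medium"): "ZE", ("high", "high"): "NE",
}


def normalized_fitness(fitness, f_min, f_max):
    """Assumed min-max normalization of a whale's fitness to [0, 1]."""
    if f_max == f_min:
        return 0.0
    return (fitness - f_min) / (f_max - f_min)


def delta_c(nfv, c, c_max=2.0):
    """Fire all rules and defuzzify by a weighted average of the outputs."""
    c_norm = min(max(c / c_max, 0.0), 1.0)
    num = den = 0.0
    for (nfv_level, c_level), label in RULES_C.items():
        # Rule strength: fuzzy AND taken as the minimum of the memberships.
        strength = min(tri(nfv, *LEVELS[nfv_level]), tri(c_norm, *LEVELS[c_level]))
        num += strength * DELTA[label]
        den += strength
    return num / den if den > 0.0 else 0.0


def update_c(c, nfv, c_max=2.0):
    """Add the fuzzy correction to C and keep it inside the assumed range."""
    return min(max(c + delta_c(nfv, c, c_max), 0.0), c_max)
```

The same structure would apply to α with its own rule table; it is omitted here because only part of the Δα rules is available in Table 1.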

Convolutional Neural Network.
DCNNs are very similar to multilayer perceptron neural networks [30]. These networks are built on three principles: weight sharing between connections, local receptive fields, and temporal/spatial subsampling [31,32]. These principles are realized in two types of layers: subsampling layers and convolution layers. The processing layers comprise three convolution layers, C1, C3, and C5, interleaved with the subsampling layers S2 and S4, and a final output layer F6 (as shown in Figure 2). The subsampling and convolution layers are organized into feature maps (FMs). In a convolution layer, each neuron is connected to a local receptive field in the previous layer. Thus, neurons in the same FM share identical weights and receive data from different input regions until the whole input has been scanned. The FMs are spatially downsampled by a factor of two in the subsampling layer. For example, in the subsampling layer S4, an FM of size 10 × 10 is subsampled to a corresponding FM of size 5 × 5. The last layer (F6) is responsible for classification. Each FM in this structure is the result of a convolution between the maps of the previous layer and their respective kernel, which acts as a linear filter. With the weights $w^{k}$ and the bias $b_{k}$, the $k$-th FM, $FM^{k}_{ij}$, is produced using the tanh function as in equation (10).

Computational Intelligence and Neuroscience
By lowering the resolution of the FMs, the subsampling layer achieves spatial invariance; each pooled FM corresponds to a single FM in the previous layer. The subsampling function is defined by equation (11).
where $\alpha_{i}^{n \times n}$ denotes the inputs, and β and b denote the trainable scalar and the bias, respectively. After several convolution and subsampling layers, the final layer is a fully connected structure that carries out the classification. Each output class has its own neuron. As a result, for the COVID-19 data set, this layer comprises two neurons, one for each class.
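To make the convolution and subsampling steps concrete, the sketch below computes one feature map with a 5 × 5 kernel followed by tanh, and then downsamples it by a factor of two. It is only an illustration under assumptions: the function names are hypothetical, the convolution is written in cross-correlation form for a single input map, and the subsampling form tanh(β · sum + b) is an assumed reading of the subsampling function described above.

```python
import math


def conv_tanh(image, kernel, bias):
    """Valid 2-D convolution of one input map with one kernel, then tanh."""
    k = len(kernel)
    out = len(image) - k + 1  # a 14 x 14 input and a 5 x 5 kernel give 10 x 10
    return [[math.tanh(bias + sum(kernel[u][v] * image[i + u][j + v]
                                  for u in range(k) for v in range(k)))
             for j in range(out)]
            for i in range(out)]


def subsample(fmap, beta, bias, n=2):
    """Sum each n x n block, scale by beta, add the bias, and apply tanh."""
    size = len(fmap) // n
    return [[math.tanh(beta * sum(fmap[n * i + u][n * j + v]
                                  for u in range(n) for v in range(n)) + bias)
             for j in range(size)]
            for i in range(size)]


# Hypothetical 14 x 14 input map and a constant 5 x 5 kernel.
image = [[1.0] * 14 for _ in range(14)]
kernel = [[0.01] * 5 for _ in range(5)]

feature_map = conv_tanh(image, kernel, bias=0.0)    # 10 x 10 feature map
pooled = subsample(feature_map, beta=0.25, bias=0.0)  # 5 x 5 pooled map
```

With these sizes the example reproduces the 10 × 10 to 5 × 5 reduction mentioned for layer S4.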

Data set.
The database used, COVID-Xray-5k, consists of 2084 training and 3100 test images [33]. Since lateral images are not suitable for identifying the target, and in line with the radiologist's recommendations, anterior-posterior COVID-19 X-ray images were used in this data set. Radiologists evaluated the data set images, and those without clear COVID-19 symptoms were removed. Of 203 images, 19 were removed, leaving 184 images with clear signs of COVID-19. In this way, a more clearly labeled data set was made available to the community. Of these 184 images, 100 were used for network testing and 84 for network training. Using data augmentation, the number of COVID-19 samples was increased to 420. Because of the small number of non-COVID images in the covid-chestxray-dataset [34], the supplemental CheXpert data set [35] was used. This data set contains 224316 chest X-ray images from 65240 individuals. In total, 2000 images from the non-COVID-19 data set are used for the training set, and 3000 images are used for the test set. Table 2 summarizes the number of images used in each class. Figure 3 shows two COVID-19 image samples and four normal image samples randomly picked from the COVID-Xray-5k data set.

Presentation of Whales.
Two fundamental concepts govern the tuning of deep artificial neural networks: first, the structure's parameters must be accurately represented by the whales of FuzzyWOA (candidate solutions); second, the fitness function must be defined in terms of the problem at hand. Representing the network parameters is a distinct step in applying FuzzyWOA to DCNN tuning. Therefore, to achieve the highest detection accuracy, the essential parameters of the DCNN, i.e., the weights and biases of the FCL, must be clearly defined. In general, FuzzyWOA optimizes the weights and biases of the final layer, and the loss function computed from them serves as the fitness function. In other words, each whale in FuzzyWOA holds the weight and bias values of the last layer. Three main ways are available for representing the weights and biases of a DCNN as candidate solutions of a metaheuristic algorithm: vectors, matrices, or binary states [26]. Since FuzzyWOA requires the parameters in vector form, this paper uses equation (12) for the candidate solution.
where n denotes the number of input nodes, $W_{ij}$ denotes the weight of the connection between the $i$-th input node and the $j$-th hidden neuron, $b_{j}$ denotes the bias of the $j$-th hidden neuron, and $M_{jo}$ denotes the weight of the connection between the $j$-th hidden neuron and the $o$-th output neuron. As indicated in Section 2.2, the proposed design is a straightforward LeNet-5 framework. Two structures are used in this section: i-6c-2s-12c-2s and i-8c-2s-16c-2s, where c and s denote convolution and subsampling layers, respectively. All convolution layers have a kernel size of 5 × 5, and the subsampling layers downsample by a factor of two.
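The vector-based representation can be sketched as a pair of helper functions that flatten the weights and biases into one whale and restore them again. The names `encode` and `decode`, the ordering of the entries (input weights, then biases, then output weights), and the toy sizes are assumptions made for illustration; the ordering used in the paper's candidate-solution equation may differ.

```python
def encode(W, b, M):
    """Flatten W (n x h), b (h) and M (h x m) into a single whale vector."""
    return ([w for row in W for w in row]
            + list(b)
            + [m for row in M for m in row])


def decode(whale, n, h, m):
    """Rebuild W, b and M from a whale vector of length n*h + h + h*m."""
    W = [whale[i * h:(i + 1) * h] for i in range(n)]
    offset = n * h
    b = whale[offset:offset + h]
    offset += h
    M = [whale[offset + j * m:offset + (j + 1) * m] for j in range(h)]
    return W, b, M


# Hypothetical layer with 3 inputs, 2 hidden neurons and 2 output neurons.
W = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
b = [0.01, 0.02]
M = [[1.0, -1.0], [0.5, -0.5]]

whale = encode(W, b, M)
W2, b2, M2 = decode(whale, n=3, h=2, m=2)
```

A search agent then only ever sees the flat vector, while the network is evaluated on the decoded matrices.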

Loss Function.
In the proposed DCNN-FuzzyWOA model, FuzzyWOA is responsible for training the DCNN. The purpose of the optimization is to obtain the best accuracy while minimizing classification error and network complexity. This objective may be calculated using either the whales' loss function or the mean square error (MSE) of the classification procedure. Accordingly, the loss function is defined by equation (13).
where o denotes the computed output, d is the desired output, and N denotes the training sample count. Two conditions are defined to terminate FuzzyWOA: reaching the maximum iteration or reaching a predefined loss function value.

Table 1: Applied fuzzy rules.
If (NFV is medium) and ($\vec{\alpha}$ is medium), then (Δα is ZE)
If (NFV is medium) and ($\vec{\alpha}$ is high), then (Δα is NE)
If (NFV is high) and ($\vec{\alpha}$ is low), then (Δα is PO)
If (NFV is high) and ($\vec{\alpha}$ is medium), then (Δα is ZE)
If (NFV is high) and ($\vec{\alpha}$ is high), then (Δα is NE)
If (NFV is low) and ($\vec{C}$ is low), then (ΔC is PO)
If (NFV is low) and ($\vec{C}$ is medium), then (ΔC is PO)
If (NFV is low) and ($\vec{C}$ is high), then (ΔC is ZE)
If (NFV is medium) and ($\vec{C}$ is low), then (ΔC is PO)
If (NFV is medium) and ($\vec{C}$ is medium), then (ΔC is ZE)
If (NFV is medium) and ($\vec{C}$ is high), then (ΔC is NE)
If (NFV is high) and ($\vec{C}$ is low), then (ΔC is PO)
If (NFV is high) and ($\vec{C}$ is medium), then (ΔC is ZE)
If (NFV is high) and ($\vec{C}$ is high), then (ΔC is NE)
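The loss function and the two stopping conditions of FuzzyWOA can be sketched as follows. The plain mean-square-error form (squared error summed over output neurons and averaged over the N samples) is an assumption, since the exact scaling of the paper's loss equation is not reproduced here; the function names, the target loss of 1e-3, and the toy outputs are likewise hypothetical.

```python
def mse_loss(outputs, targets):
    """Mean over N samples of the squared error summed over output neurons."""
    n_samples = len(outputs)
    total = sum((o - d) ** 2
                for o_vec, d_vec in zip(outputs, targets)
                for o, d in zip(o_vec, d_vec))
    return total / n_samples


def should_stop(iteration, loss, max_iter=15, target_loss=1e-3):
    """Stop at the iteration limit or once the loss is low enough."""
    return iteration >= max_iter or loss <= target_loss


# Hypothetical two-neuron outputs (COVID, non-COVID) for two samples.
outputs = [[0.5, 0.5], [1.0, 0.0]]
targets = [[1.0, 0.0], [1.0, 0.0]]
loss = mse_loss(outputs, targets)
```

In a training loop, each whale would be decoded into the final-layer weights, the network outputs computed, and `mse_loss` returned as that whale's fitness.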

Results and Discussion
As mentioned in the previous sections, this paper attempts to improve the accuracy of the classic DCNN classifier by proposing and designing a fuzzy system that adjusts the WOA control parameters. For the DCNN-FuzzyWOA simulation, the population size and the maximum number of iterations are both 15. In the DCNN, the batch size is 100, and the learning rate is 1. Additionally, the number of epochs examined for each assessment ranges between 1 and 20. The test was conducted in MATLAB-R2020a on a PC equipped with an Intel Core i7-2630QM CPU and 6 GB of RAM running Windows 7, with six distinct runtimes. According to reference [20], the accuracy rate alone cannot provide sufficient information about the detector's effectiveness. The effectiveness of the proposed classifier on all samples was therefore shown using receiver operating characteristic (ROC) curves. Each sample image is assigned an estimated probability $P_{T}$. Following that, a threshold value T ∈ [0, 1] is applied, and the detection rate is determined for each threshold value. The values obtained are presented as a ROC curve. In general, the larger the area under the ROC curve (AUC), the greater the probability of detection. Figure 4 shows the ROC curve obtained when DCNN-FuzzyWOA is used to detect COVID-19. Also, to make a fair comparison, a simple DCNN has been used to detect COVID-19. This comparison is fair because the test data set, the initial values of the parameters, and the simple CNN structure, i.e., LeNet-5 DCNN, are entirely the same. Accordingly, the assessment of the competence and efficiency of DCNN-FuzzyWOA can be considered fair. On the test data set, the ROC curves demonstrate that DCNN-FuzzyWOA beats LeNet-5 DCNN considerably (Figure 4). The proposed approach was implemented and executed 10 times, with a total training duration of between 4.5 and 11.5 minutes.
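The threshold sweep behind a ROC curve, as described above, can be sketched in plain Python. The function names, the use of the distinct scores as thresholds, the trapezoidal AUC, and the four toy samples are assumptions for illustration only and are not results from the paper. The sketch assumes both classes are present in the labels.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) points for a score sweep."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold T step by step; a sample is called positive if score >= T.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical probabilities for four images (label 1 = COVID-19).
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

A classifier that ranks every positive above every negative, as in this toy case, reaches the maximum area of 1.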
The proposed classifier (DCNN-FuzzyWOA) has a detection power between 99.01% and 100% on the COVID-19 validation set. Because of the range of possible outcomes, the 10 trained DCNN-FuzzyWOA models are ensembled using weighted averaging, with validation accuracy as the weights. The DCNN-FuzzyWOA classifier obtains a validation accuracy of 99.27%, while the LeNet-5 DCNN classifier achieves a detection accuracy of between 75.08% and 83.98%. The resultant ensemble achieves a detection accuracy of 86.91% on the COVID-19 validation data set. Benchmark models including LeNet-5 DCNN [36], DCNN-GA [20], and DCNN-PSO [37] have been used to demonstrate the efficiency and performance of DCNN-FuzzyWOA in detecting positive and negative cases of COVID-19. The ROC and precision-recall curves for the i-6c-2s-12c-2s and i-8c-2s-16c-2s structures are shown in Figures 5 and 6, respectively. The simulation results show that the DCNN-FuzzyWOA classifier provides better results than the other benchmark models.
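Weighted averaging with validation accuracy as the weights can be written in a few lines. This is a sketch under assumptions: the function name is hypothetical, each model is assumed to output a single probability per image, and the numbers in the example are made up for illustration and are not taken from the paper.

```python
def ensemble_probability(probabilities, val_accuracies):
    """Average per-model probabilities, weighted by each model's validation accuracy."""
    total_weight = sum(val_accuracies)
    return sum(p * w for p, w in zip(probabilities, val_accuracies)) / total_weight


# Hypothetical outputs of three trained models for one image.
model_probs = [0.90, 0.80, 0.70]
model_val_acc = [0.99, 0.98, 0.97]
combined = ensemble_probability(model_probs, model_val_acc)
```

Models with higher validation accuracy pull the combined probability toward their own prediction; with equal weights the result reduces to a plain mean.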
Figure 3: Random images from the COVID-Xray-5k data set [33] (panel labels: Normal, Normal, Covid, Normal, Covid, Normal).

To quantify the ability of DCNN-FuzzyWOA to detect positive and negative cases of COVID-19 more precisely, more than 99.01% of the diagnoses are correct, and the false alarm rate is less than 0.81%. In general, the precision-recall curve shows the trade-off between recall and precision at various threshold levels. A large area under the precision-recall curve indicates that both precision and recall are high. High precision indicates a low false-positive rate, and high recall indicates a low false-negative rate. Figures 5 and 6 show that DCNN-FuzzyWOA has the largest area under the precision-recall curve. It demonstrates a lower rate of false-positive and false-negative classifications than the other benchmark classifiers (see Tables 3-5).
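The relationship between precision, recall, and the F1-Score discussed here follows directly from the confusion counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # few false positives -> high precision
    recall = tp / (tp + fn) if tp + fn else 0.0     # few false negatives -> high recall
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed cases.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

An F1-Score of 100% requires both zero false positives and zero false negatives, which is why it is a stricter summary than accuracy on an imbalanced test set.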
Tables 3-6 report the accuracy and computation time for the i-6c-2s-12c-2s and i-8c-2s-16c-2s structures. Overall, the simulations showed that accuracy improved as the number of epochs increased.

As the number of epochs rises, the time efficiency of FuzzyWOA becomes increasingly apparent, as FuzzyWOA's stochastic structure reduces the complexity of the search space. It should be noted that the findings for the i-8c-2s-16c-2s structure in Tables 5 and 6 corroborate the previous conclusion drawn for the i-6c-2s-12c-2s network. As a result, FuzzyWOA can significantly increase the performance of DCNNs with the i-8c-2s-16c-2s and i-6c-2s-12c-2s structures. Data science experts believe that results are best presented using overall accuracy, the ROC curve, and the F1-Score. Therefore, Table 7 examines the F1-Score for the i-6c-2s-12c-2s and i-8c-2s-16c-2s structures. As shown in Table 7, the results obtained with FuzzyWOA are more encouraging than those of the other methods used. In particular, at the twentieth epoch, the F1-Score for the i-6c-2s-12c-2s structure reaches 100%.

Conclusion
In this paper, an accurate model based on AI tools, i.e., a combination of DCNN, WOA, and fuzzy logic, is designed and proposed to detect positive and negative cases of COVID-19 using X-ray images. In addition to the COVID-Xray-5k benchmark data set, the DCNN-PSO, DCNN-GA, and classic DCNN models were used for a fair comparison with the proposed detector or classifier. Analysis of the simulation results showed significant results for the proposed DCNN-FuzzyWOA model relative to these benchmarks. Experts also confirmed the agreement between the model's results and clinical results. One of the most significant reasons for the good performance of the DCNN-FuzzyWOA model is the adjustment of the WOA control parameters by the fuzzy system, which determines a clear boundary between the exploration and exploitation phases in the search space of the WOA training algorithm. All training algorithms used to train the two convolutional networks were compared in terms of accuracy, processing time, F1-Score, and ROC and precision-recall curves. The results showed that FuzzyWOA performed more encouragingly than the other methods used. In terms of structure, the i-6c-2s-12c-2s architecture was more successful. Despite the good results obtained with DCNN-FuzzyWOA, data sets larger than COVID-Xray-5k are needed to achieve higher accuracy with greater reliability.

Data Availability
The data are available on request by e-mail from the corresponding author (abbas.saffari@birjand.ac.ir).

Conflicts of Interest
The authors declare that they have no conflicts of interest.