Multiobjective Genetic Algorithm and Convolutional Neural Network Based COVID-19 Identification in Chest X-Ray Images

Department of Computer Science and Engineering, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya (Technological University of Madhya Pradesh), Bhopal (MP), India
Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
Department of Science and Technology, Jayoti Vidyapeeth Women’s University, Jaipur, Rajasthan, India
Lovely Professional University, Jalandhar, India
Department of Computer Science and Engineering, Rabindranath Tagore University, Bhopal, India
Department of Computer Science and Engineering, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya (Technological University of Madhya Pradesh), Bhopal (MP), India


Introduction
The initial occurrence of COVID-19 disease was found in Wuhan, China, in December 2019. Since then, it has spread rapidly across the entire world. Testing for COVID-19 is time-consuming, and the results obtained from rapid COVID-19 testing kits are not reliable. Therefore, radiologists and doctors have started using supervised learning techniques to test for COVID-19 disease. The prime objective is to rapidly identify whether a patient is infected with COVID-19 or not [1]. Deep learning techniques may be utilized for COVID-19 patient identification [2]. Figure 1 shows the different chest X-ray images. There is a significant change in the chest X-ray images of COVID-19-infected patients as compared to other images.
Machine learning and deep learning techniques are extensively employed to implement computer-aided identification [1,3]. It has been observed that these techniques can save clinicians and doctors significant time in the examination of medical images such as X-ray and Computed Tomography (CT) scans [3,4]. However, these learning techniques require a significant amount of medical images for training. Also, efficient feature extraction and selection techniques are desirable to achieve significant results [5,6]. Recently, metaheuristic techniques have also been used to tune the hyperparameters of these machine learning models [2,7].
In this paper, a COVID-19 identification model from chest X-ray images is proposed. The main contributions of this work are as follows:
(1) A Convolutional Neural Network (CNN) is used to predict COVID-19 disease from chest X-ray images
(2) A pretrained GoogLeNet is used for implementing the transfer learning (i.e., by replacing some sets of final network CNN layers)
(3) 20-fold validation is considered to overcome the overfitting issue
(4) The multiobjective genetic algorithm is considered for tuning the hyperparameters of the proposed COVID-19 identification model
(5) Extensive experiments show that the proposed COVID-19 identification model achieves remarkably good results and may be utilized for real-time testing of patients
The rest of this paper is organized as follows. Section 2 presents the related work. The proposed COVID-19 identification model is illustrated in Section 3. Performance analysis is presented in Section 4. Conclusions are outlined in Section 5.

Related Work
This section highlights various techniques that are used to diagnose COVID-19-infected patients from chest X-ray images.
Tang et al. [8] employed GoogLeNet to extract the characteristics of the images. Multistage feature fusion is considered to recognize the scene from the output characteristics. Gao et al. [9] classified breast cancer by utilizing shallow deep CNN. Deepak and Ameer [10] presented an identification technique using GoogLeNet and deep transfer learning for brain MRI images. Cinar and Yildirim [11] proposed a technique to diagnose brain tumors using ResNet-50. In this model, the last five layers are removed and eight new layers are appended. Nayak et al. [12] implemented an identification technique through a CNN with five layers. This technique comprised four convolutional layers and one fully connected layer.
Liu et al. [13] implemented a ResNet model with multiscale spatiotemporal characteristics. Hao et al. [14] proposed an optimized CNN based on target region selection for image recognition. Taheri and Toygar [15] proposed a directed acyclic graph-based CNN for identification. It is based on the combination of VGG-16 and GoogLeNet. Ciocca et al. [16] applied CNN to diagnose the images and considered a residual network with 50 layers to extract the characteristics. Liu et al. [17] proposed an identification technique using optimization of ResNet-50 for remote sensing images. Han and Shi [18] presented a multilead residual neural network to extract the characteristics of ECG records. Talo et al. [19] considered the pretrained models VGG-16, AlexNet, ResNet-18, ResNet-34, and ResNet-50 to automatically diagnose MRI images. They found that ResNet-50 has better accuracy than the other pretrained models.
Das et al. [20] designed a novel extreme version of Inception (Xception) based COVID-19 identification model. Liu et al. [21] suggested an identification model based on ResNet and transfer learning. In this model, a new data augmentation technique is considered with the help of a filter for small datasets. Togacar et al. [22] considered a deep learning model to detect COVID-19 using X-ray images. The fuzzy color technique is considered to restructure the data classes. MobileNetV2 and SqueezeNet are applied to build the dataset. Social mimic optimization is considered to obtain the feature sets. Further, a Support Vector Machine (SVM) is used to diagnose efficient characteristics. Pannu et al. [7,23] implemented a swarm intelligence-based Adaptive Neuro-Fuzzy Inference System (ANFIS) to diagnose COVID-19-infected people.
It has been observed that supervised learning algorithms may be used to test for COVID-19 disease from chest X-ray images. Also, the use of pretrained feature extraction models can improve the identification rate [24-27]. The hyperparameter tuning of these models can achieve significant results. The k-fold validation [25] is used to overcome the overfitting problem.

Proposed Deep COVID-19 Classification Model
This work uses CNN and GoogLeNet for the identification of COVID-19 disease. In addition, a multiobjective genetic algorithm is considered to tune the hyperparameters of the proposed COVID-19 identification model. The step-by-step flow of the designed COVID-19 identification model is discussed in Algorithm 1.

Transfer Learning Using a Pretrained GoogLeNet.
In this work, GoogLeNet is considered to extract significant characteristics of chest X-ray images. It is a pretrained model and is used as the transfer source. The characteristics it extracts are transferred to build the CNN-based COVID-19 identification model.

Convolutional Neural Network.
CNN is widely used for identification problems [28]. Figure 2 shows the standard architecture of the CNN model. The subsequent sections discuss various layers of CNN.

Convolutional Layer.
This layer builds feature representations of the input. Various convolution filters are considered to compute the patterns (Figure 3). Each neuron of the convolutional layer is connected with its sibling neurons to process the feature maps [29,30].
Every time a convolutional operator provides a new feature map, the feature value $y_{ij}^{k}$ at location $(i, j)$ in the $k$th feature map is evaluated as follows:
$$y_{ij}^{k} = w_k \odot x_{ij} + b_k, \tag{1}$$
where $w_k$ and $b_k$ represent the weight and bias values of the $k$th mask, $x_{ij}$ represents the input patch centered at $(i, j)$, and $\odot$ is the Hadamard product of two matrices. Weights are shared between sibling nodes to minimize the complexity of the model.
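As a minimal sketch of how one feature value can be computed, the snippet below multiplies a mask element-wise with an input patch and adds a bias. It rests on several assumptions that the text does not state: the entries of the Hadamard product are summed to give a scalar, the stride is 1, no padding is used, and the patch is indexed by its top-left corner rather than its center. The function name `conv2d_valid` and the toy values are illustrative only.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` with stride 1 and no padding (assumed settings)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Hadamard product of the mask with the patch whose top-left
            # corner is (i, j); summing the entries is an assumption.
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total + bias)
        feature_map.append(row)
    return feature_map


# Toy 4x4 input and 2x2 mask (hypothetical values).
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [[1, 0], [0, 1]]
feature_map = conv2d_valid(image, kernel, bias=0.5)
```

Because the same `kernel` and `bias` are reused at every location, the number of parameters does not depend on the image size, which is the weight-sharing property mentioned above.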

Nonlinear Layer.
The nonlinear layer applies an activation function to the entire set of feature maps. It can deal with the nonlinear dependencies of the feature maps. In this paper, the ReLU activation function is considered.
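For illustration, ReLU replaces every negative activation with zero and leaves the rest unchanged. The helper below is a sketch on a plain nested list; the name `relu` and the sample values are hypothetical.

```python
def relu(feature_map):
    """Apply ReLU element-wise: negative values become 0.0."""
    return [[max(0.0, value) for value in row] for row in feature_map]


activated = relu([[-1.5, 2.0], [0.0, -0.25]])
```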

Pooling Layer.
This layer has no trainable weights. It aims to achieve shift invariance by reducing the size of the feature maps and summarizing the activations over local regions. Average and maximum operators are generally considered in the pooling layer. It applies a k × k mask and produces a single value for each window. For an N × N input layer, the output will be an N/k × N/k layer.
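The size reduction from N × N to N/k × N/k can be sketched as follows, assuming non-overlapping windows (stride equal to k) and a square input. The function names `pool2d` and `mean` are illustrative and the input values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def pool2d(feature_map, k, op=max):
    """Reduce each non-overlapping k x k window to one value using `op`."""
    n = len(feature_map)
    usable = n - n % k  # ignore trailing rows/columns that do not fill a window
    pooled = []
    for i in range(0, usable, k):
        row = []
        for j in range(0, usable, k):
            window = [feature_map[i + a][j + b] for a in range(k) for b in range(k)]
            row.append(op(window))
        pooled.append(row)
    return pooled


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
max_pooled = pool2d(fm, k=2)           # 4 x 4 input becomes 2 x 2
avg_pooled = pool2d(fm, k=2, op=mean)  # same shape, average operator
```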

Fully Connected Layer.
This layer performs high-level reasoning. Every input is connected to every output. After this layer, other nonlinear functions are used.

Loss Layer.
Finally, a loss layer is considered to obtain the trained COVID-19 identification model. For COVID-19 identification, a softmax operator is utilized. Assume that $\theta$ denotes the parameters of the CNN, such as bias and kernel values. Given $N$ desired input-output pairs, where $y^{(i)}$ is the target class of the $i$th input and $o^{(i)}$ is the corresponding output of the CNN, the loss of the CNN is computed as follows:
$$L = \frac{1}{N} \sum_{i=1}^{N} \ell\left(\theta; y^{(i)}, o^{(i)}\right), \tag{2}$$
where $\ell$ is the loss for a single input-output pair.
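A minimal sketch of the averaged loss is given below. It assumes that the per-sample loss is the negative log of the softmax probability assigned to the target class (the usual cross-entropy choice); the text only states that a softmax operator is used, so this is an assumption. The function names and the two-class outputs are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_cross_entropy(outputs, targets):
    """Average, over N samples, of -log(softmax probability of the target class)."""
    losses = []
    for o, y in zip(outputs, targets):
        probs = softmax(o)
        losses.append(-math.log(probs[y]))
    return sum(losses) / len(losses)


# Hypothetical outputs for two images; class 0 = not infected, class 1 = infected.
outputs = [[2.0, 0.5], [0.1, 1.9]]
targets = [0, 1]
loss = mean_cross_entropy(outputs, targets)
```

With two classes, an uninformative output (equal scores) gives a loss of ln 2, so values below that indicate the network favors the correct class on average.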

Multiobjective Fitness Function.
The proposed COVID-19 identification model suffers from the hyperparameter tuning problem; therefore, in this paper, a multiobjective genetic algorithm is considered. The performance metrics accuracy ($A_c$) and F-measure ($F_m$) are considered to design a multiobjective fitness function $f(t)$ as
$$f(t) = \{\max(A_c), \max(F_m)\}, \tag{3}$$
where $A_c$ can be evaluated as follows:
$$A_c = \frac{TP + TN}{TP + TN + FP + FN}, \tag{4}$$
where TP, FP, TN, and FN are the true-positive, false-positive, true-negative, and false-negative values, respectively. $F_m$ can be evaluated as follows:
$$F_m = 2 \cdot \frac{p_r \cdot r_c}{p_r + r_c}, \tag{5}$$
where $p_r$ and $r_c$ represent precision and recall values, respectively. $p_r$ and $r_c$ can be evaluated as follows:
$$p_r = \frac{TP}{TP + FP}, \tag{6}$$
$$r_c = \frac{TP}{TP + FN}. \tag{7}$$

Mathematical Problems in Engineering

Multiobjective Genetic Algorithm.
The genetic algorithm for Pareto optimization is discussed in Algorithms 2 and 3.
Algorithm 1
(1) Input: chest X-ray images as a labeled dataset
(2) Initially, GoogLeNet is utilized to evaluate the significant characteristics of COVID-19 dataset images
(3) Further, transfer learning is considered to build a CNN-based COVID-19 identification model
(4) Multiobjective genetic algorithm is utilized to tune the designed model
(5) Implement k-fold validation to overcome overfitting
(6) Return the constructed COVID-19 identification model for chest X-ray images

The genetic algorithm contains a group of operators to optimize the given fitness function [31]. Initially, random solutions are obtained using a normal distribution. These solutions are then applied to CNN for evaluating the multiobjective fitness function (see equation (3)). Based on computed values, solutions are ranked for further processing. Thereafter, mutation and crossover operators are applied to the solutions for obtaining child values. Based upon their fitness values, they are ranked [32]. Finally, the most nondominated solution is returned as initial parameters of CNN.

Input: COVID-19 training dataset, CNN, random population
Output: P_F = {max A_c, max F_m} /* P_F represents the Pareto front. */
begin
(1) Set random solution as hyperparameters of CNN;
(2) Apply CNN on COVID-19 training dataset;
(3) Validate CNN on the same fraction of COVID-19 training dataset;
(4) Evaluate confusion matrix based on the actual and predicted values;
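The fitness evaluation and nondominated ranking described above can be sketched as follows. The snippet computes accuracy and F-measure from confusion-matrix counts and then keeps the candidates that no other candidate dominates. It is only an illustration under stated assumptions: the confusion matrices are hypothetical, the function names are not from the paper, and the rule for picking a single solution from the Pareto front is not specified in the text, so the sketch returns the whole front.

```python
def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fitness(confusion):
    """Return the objective pair (A_c, F_m) for counts given as (TP, FP, TN, FN)."""
    tp, fp, tn, fn = confusion
    return (accuracy(tp, fp, tn, fn), f_measure(tp, fp, fn))


def dominates(a, b):
    """True if `a` is at least as good as `b` in every objective and better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of the solutions that are not dominated by any other solution."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (TP, FP, TN, FN) counts for three candidate hyperparameter sets,
# all evaluated on the same 50 positive and 150 negative images.
candidates = [(30, 2, 148, 20), (48, 24, 126, 2), (25, 25, 125, 25)]
scores = [fitness(c) for c in candidates]
front = pareto_front(scores)
```

Here the first candidate has the highest accuracy and the second has the highest F-measure, so neither dominates the other and both stay on the front, while the third is dominated by both.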

Performance Analysis
This section discusses the performance analysis of the COVID-19 identification model. This work uses 20-fold cross-validation to overcome the overfitting problem. 70% of the entire dataset is considered for training purposes. The hyperparameters of the proposed COVID-19 identification model are obtained using a multiobjective genetic algorithm.
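As a rough sketch of how 20-fold cross-validation partitions the data, the snippet below splits shuffled sample indices into 20 folds and uses each fold once for validation. The function name `k_fold_indices`, the seed, and the shuffling strategy are assumptions; the text does not describe how the folds were formed or how the 70% training split interacts with them. The sample count is the sum of the two image counts reported for the dataset.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


n_images = 1332 + 1421  # COVID-19 (+) images plus normal or pneumonia images
splits = list(k_fold_indices(n_images, k=20))
```

Each image is used for validation exactly once and for training in the other 19 rounds, so the averaged score is less sensitive to a single lucky or unlucky split.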

Chest X-Ray Image Dataset.
Data is the first step in building any identification tool that supports prognostic analysis, triage, and the management of patient care. Therefore, COVID-19 chest X-rays are collected to build COVID-19 identification models. Chest X-ray images of COVID-19-infected patients contain many unique characteristics. Therefore, chest X-ray images may be utilized to diagnose COVID-19-infected patients rapidly.
In this paper, the chest X-ray images are obtained from several datasets, such as those in [2,33]. There are 1332 COVID-19 (+) images and 1421 images of normal or pneumonia-infected patients. Figure 4 shows a partial set of X-ray images of normal persons and COVID-19-infected patients. It clearly shows that there is a significant change in the X-ray images of normal and COVID-19-infected patients.

Comparative Analysis.
The performance of the proposed model is compared to various machine learning and deep learning approaches. The overall objective is to evaluate the improvement of the proposed model in terms of performance metrics such as accuracy, Area Under the Curve (AUC), F-measure, specificity, and sensitivity. The training and validation analysis of the proposed COVID-19 identification model is illustrated in Figure 5. It demonstrates that the proposed COVID-19 identification model achieves significant training and validation accuracy values. It also indicates that the loss of the proposed COVID-19 identification model is minimal. Further, the results improve as the number of epochs increases, but after 3300 epochs they remain nearly constant, i.e., not much further improvement is observed. Tables 1 and 2 show the model building and testing analysis of the proposed and the existing COVID-19 identification models. Table 1 reveals that the proposed model achieves significant performance in terms of accuracy, AUC, F-measure, specificity, and sensitivity as compared to the existing models.

Conclusion
In this paper, a CNN model is used to build a COVID-19 identification model using chest X-ray images. 20-fold cross-validation is used to overcome the overfitting problem. A pretrained GoogLeNet is also considered for implementing the transfer learning (i.e., by replacing some sets of final network CNN layers). Finally, the multiobjective genetic algorithm is used for hyperparameter tuning of the COVID-19 identification model. Performance analysis revealed that the COVID-19 identification model attains significantly better performance than the competitive models. The proposed COVID-19 identification model offered training and testing accuracy up to 98.3827% and 94.9383%, respectively. Thus, the designed identification model can be used for real-time identification of COVID-19 disease.

Data Availability
Data will be made available upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.