Hybrid Deep-Learning and Machine-Learning Models for Predicting COVID-19

The COVID-19 pandemic has had a significant impact on public life and health worldwide, putting the world's healthcare systems at risk. The first step in stopping this outbreak is to detect the infection in its early stages, which will reduce the risk, control the outbreak's spread, and restore full functionality to the world's healthcare systems. Currently, PCR is the most prevalent diagnostic tool for COVID-19. However, chest X-ray images may play an essential role in detecting this disease, as they have been successful for many other viral pneumonia diseases. Unfortunately, COVID-19 shares common radiographic features with other viral pneumonia, so manual differentiation between them is a critical problem that needs the aid of artificial intelligence. This research employs deep- and transfer-learning techniques to develop accurate, general, and robust models for detecting COVID-19. The developed models utilize either convolutional neural networks or transfer-learning models, or hybridize them with powerful machine-learning techniques to exploit their full potential. For experimentation, we applied the proposed models to two data sets: the COVID-19 Radiography Database from Kaggle and a local data set from Asir Hospital, Abha, Saudi Arabia. The proposed models achieved promising results in detecting COVID-19 cases and discriminating them from normal and other viral pneumonia cases with excellent accuracy. The hybrid models extracted features from the flatten layer or the first hidden layer of the neural network and then fed these features into a classification algorithm. This approach further enhanced the results, to full accuracy for binary COVID-19 classification and 97.8% for multiclass classification.


Introduction
COVID-19 was the most challenging health problem in 2020, following its emergence in December 2019 in Wuhan, China. Due to its global impact on populations, the World Health Organization (WHO) declared it a pandemic in March 2020. By November 28, 2020, there were more than 62 million confirmed cases and 1.5 million deaths worldwide. In fact, this virus is like other coronaviruses that have appeared in the past two decades, such as the Middle East respiratory syndrome coronavirus (MERS-CoV) and the severe acute respiratory syndrome coronavirus (SARS-CoV) [1,2]. These infections are dangerous, as they spread very quickly, and hence early detection and diagnosis will hasten the response, alongside proper treatment and care. In fact, viral pneumonia has many other causes, including lung infections by influenza and common-cold viruses. In its early stages, viral pneumonia remains confined to the upper part of the respiratory system. If the infection reaches the lungs, the air sacs start to become infected, inflamed, and filled with fluid, presenting significant health risks, especially for those with comorbidities [3,4].
Hospital staff, doctors, nurses, and clinical facilities have many strategies and tools for diagnosing and reducing the impact of this epidemic. The most widely used technique to detect COVID-19 infection is reverse-transcription polymerase chain reaction (RT-PCR), but it has a low sensitivity of 60%-70%. Another possible diagnosis option is to use radiological images of patients, such as volumetric chest CT and X-ray imaging, which may help doctors analyze and predict the effects of COVID-19 on the human body. CT uses a high radiation dose, which limits its use in children and pregnant women, while X-rays use a low radiation dose at low cost. As such, the X-ray is a good candidate for imaging the lungs and may be an effective method for the early detection of COVID-19, especially in countries that cannot purchase expensive laboratory kits for COVID-19 testing [2,5,6]. However, discriminating between COVID-19 and other viral pneumonia is challenging because the radiographic features are similar. Moreover, the lung has complex morphological patterns that change in extent and appearance over time [4,7,8]. Therefore, designing artificial-intelligence models that detect these patterns with high accuracy is very important to rapidly screen for COVID-19 infections and to help radiologists by providing practical assistant tools [9]. These models use chest X-ray images of healthy lungs and those infected with COVID-19 for early detection of this disease.
For this purpose, we collected X-ray images of healthy and COVID-19-infected patients from different sources to test the effectiveness of the proposed models. In fact, this work focuses mainly on using convolutional neural networks (CNNs) and transfer-learning models for classifying chest X-ray images of coronavirus-infected patients. The scarcity of images of COVID-19 patients has made detailed studies of automatic COVID-19 detection from X-ray (or chest CT) images rare. Moreover, labeling such images for deep-learning (DL) applications is not an easy job and is expensive [10,11]. Small data sets of COVID-19 X-ray images have been released so that AI researchers can train machine-learning (ML) models to perform automatic COVID-19 diagnoses from X-ray images [12]. Recently, Sedik et al. [10] collected a data set of 6,128 X-ray, CT, and ultrasonic lung images, but they were mixed and imbalanced between training and testing. The available data sets are still small, and hence two enhancement strategies were adopted in this paper to address the scarcity of COVID-19 X-ray images: (1) We used data augmentation to create transformed versions of COVID-19 X-ray images (such as flipping, slight rotation, and adding a small amount of distortion) to increase the number of samples by a factor of 5. (2) Instead of training our models from scratch, we fine-tuned the last layer of versions of these models pretrained on ImageNet. This way, it was possible to train the model with fewer labeled samples from each class. The paper is organized as follows: the literature review of previous work on COVID-19 is presented in Section 2. The data set used in this paper and its characteristics are described in Section 3. The proposed models' architectures are defined and discussed in detail in Section 4; this section presents four different models that show promising results for both binary and multiclass classifications.
Simulations and discussions are summarized in Section 5, and finally, the paper is concluded in the last section.

Literature Review
Research on medical image processing using DL started in 1995, classifying lung nodules using X-ray images [13]. Apostolopoulos and Mpesiana [14] collected X-ray images from North America and Italy and fed them into a CNN model for COVID-19 detection. Ozturk et al. [5] proposed the DarkNet model for detecting COVID-19 cases from viral pneumonia images, intended to assist when there is a shortage of radiologists due to the enormous number of patients. Rajpurkar et al. [15] presented the CheXNet model to diagnose lung diseases using DL for processing X-ray images. Karakanis and Leontidis [16] built a network to augment X-ray data with synthetic images and proposed DL models for binary and multiclass classifications of COVID-19. Another work in this direction was done by Das et al. [17], who used DL and transfer learning (TL) to build models tested on Kaggle data sets. Hussain et al. [18] proposed a model, called CoroDet, to automatically detect COVID-19 using CT scan and X-ray images. They argued that their COVID-19 classification model was accurate for both binary and multiclass classifications.
EMCNet was proposed by [19] to detect COVID-19 cases by evaluating chest X-ray images. They used a CNN for extracting deep features of the images and then applied binary-classification techniques, combining the outputs of many classifiers to form an ensemble with better detection capabilities. Sedik et al. [10] used a CNN and convolutional long short-term memory to build a DL model to detect COVID-19 cases. They tested their models on two data sets (X-rays and CT scans) with normal, COVID-19, and pneumonia classes. They added some ultrasound images to their data set and argued that their models could be used for quick detection of COVID-19. Maior et al. [11] discussed the effect of limited X-ray images on COVID-19 detection. They tried to resolve this problem by combining different data sets and used them for testing CNN models. Sedik et al. [20] proposed two models for augmenting images to increase the learning ability of some DL methodologies, which enhances the detection of COVID-19 cases. Many other researchers have investigated techniques based on ML and DL to detect COVID-19 from X-ray and CT images of public data sets, as listed in Table 1.

Data Sets
As discussed above, collecting X-ray images for COVID-19 is still in its early stages. To increase the number of samples for our experiments, we merged two real data sets. The first data set was collected from Kaggle: we downloaded the database of chest X-ray images for positive cases of COVID-19, along with viral pneumonia and normal images. There were 219 COVID-19-positive images, 1,345 viral pneumonia images, and 1,341 normal images in this data set. Downloading viral pneumonia images allowed us to test our model in differentiating COVID-19 from other viral pneumonia infections, since similar radiographic features could otherwise guide the system to the wrong decision.
The second data set, on COVID-19 patients, was collected from Asir Hospital in Saudi Arabia.
Due to the limited number of positive cases, we augmented the images annotated as positive in the Kaggle data set to generate more general and robust models. We applied augmentation techniques, such as random rotation and vertical flip operations, to the assembled data set using the ImageDataGenerator function of the TensorFlow Keras framework. We generated 657 cases and combined all images to obtain a final data set of 4,103 X-ray images. Each image in the data set was resized to 120 × 120 pixels to reduce space and computation time and hence produce consistent data. Additionally, image normalization was applied to scale pixel intensities to the range 0-255. Table 2 describes the counts of X-ray images of each class and the distribution of each data set versus the total number of images. Some samples of the data set for different classes are shown in Figure 1.
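The augmentation step described above can be sketched as follows. This is a minimal illustration, assuming the COVID-19 images are already loaded into a NumPy array; the 10-degree rotation range is an assumed value, since the text specifies only random rotation and vertical flips:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def augment_images(images, factor=5, seed=42):
    """Return (factor - 1) randomly transformed copies of each image."""
    # Random rotation and vertical flips, as described in the text;
    # the rotation range of 10 degrees is an assumption.
    datagen = ImageDataGenerator(rotation_range=10, vertical_flip=True)
    flow = datagen.flow(images, batch_size=len(images),
                        shuffle=False, seed=seed)
    batches = [next(flow) for _ in range(factor - 1)]
    return np.concatenate(batches, axis=0)
```

Because ImageDataGenerator draws a fresh random transform on every pass, repeated passes over the same array yield distinct augmented copies.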

Model Architectures
As an advanced technology, DL tries to simulate the way the human brain's neurons work. DL consists of deep convolutional and deep neural-network layers. In fact, CNNs are preferred for image-processing applications, as they are robust context learners and usually extract powerful features from the data [21,22]. Apart from its outstanding accuracy, DL requires considerable computation, memory, and time to train the model, as it has many layers and thousands, if not millions, of weights to be learned. Therefore, TL can be used to ease such restrictions and enhance accuracy. The following sections discuss different models for detecting COVID-19 infection in X-ray images using CNNs or TL. We need our models to classify the X-ray images into normal, pneumonia, or COVID-19. We used three basic blocks for building variants of our models: (1) a CNN block, (2) a TL block using VGG16 or VGG19, and (3) an ML block. In the following subsections, we discuss each model in detail and illustrate its main blocks.

Model 1: Convolutional Neural Networks.
DL has proven to be a highly accurate technique that guarantees high-level detection and prediction in many medical cases by extracting deep features from the data set at hand [28,29]. We built a CNN model and trained it many times with different parameters to select the best hyperparameters. The final model consisted of four convolutional layers and four dense layers. The convolutional layers were one layer with 16 filters, one layer with 32 filters, and two layers with 64 filters. All filters were of size 3 × 3, and each convolutional layer was followed by 2 × 2 maximum pooling. The four dense layers were three hidden layers with 128, 64, and 10 neurons and one output layer. Details of these configurations are given in Figure 2. This model covers three classification processes: binary classification between COVID-19 and normal cases, binary classification between COVID-19 and viral pneumonia cases, and multiclass classification among normal, viral pneumonia, and COVID-19 cases. As such, it is possible to investigate the ability of our model to differentiate COVID-19 infection from normal and other viral pneumonia infections. Moreover, TL models have already been trained for days or weeks and hence are perfect candidates as starting points for many tasks [30].
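As a sketch, the Model 1 CNN described above could be written in Keras as below; the activation functions, padding choice, and dropout placement are assumptions, since the text specifies only the layer sizes:

```python
from tensorflow.keras import layers, models, regularizers

def build_cnn(num_classes=3, input_shape=(120, 120, 3)):
    """Four conv layers (16/32/64/64 filters, 3x3) each followed by
    2x2 max pooling, then dense layers of 128, 64, and 10 neurons
    plus a softmax output, as described in the text."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),  # dropout after the conv stack (assumption)
        layers.Flatten(),
        layers.Dense(128, activation="relu",
                     kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```

For the two binary scenarios, `num_classes` would be 2 (or a single sigmoid unit with binary cross-entropy).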
Model 2: Transfer Learning with VGG16 and VGG19.
In this paper, we used VGG16 or VGG19 as the TL model for building our model. However, our aim here was to enhance the system's accuracy, not to reduce training time, and thus we trained all models on the same data set without any reductions.
Initially, VGG16 was developed for large-scale image recognition. It was trained on the ImageNet data set, which overcomes long training times and insufficient data. Hussain et al. [18] showed that VGG16 outperformed the other approaches they had tested. VGG16 is a CNN architecture with 16 weight layers: 13 convolutional layers and three dense layers. Our model kept all the convolutional layers with their parameters and reduced the dense layers to two only. The weights of the dense layers were trained using our data set. The convolutional layers of VGG16 are divided into five convolutional phases: two layers with 64 filters, two layers with 128 filters, three layers with 256 filters, and two phases of three layers each with 512 filters. All filters are of size 3 × 3, and each convolutional phase is followed by 2 × 2 maximum pooling. The two dense layers were one hidden layer with 128 neurons and one output layer. VGG16 has approximately 138 million parameters. The details are shown in Figure 3. Another TL model, VGG19, can be used instead of VGG16. VGG19 is a 19-layer model that adds one extra layer to each of the last three convolutional phases of VGG16. We followed the same strategy as for VGG16 in our models by maintaining the convolutional layers and reducing the dense layers to only two.
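A minimal Keras sketch of this setup: the pretrained VGG16 convolutional base is kept frozen, and a reduced two-layer dense head (one 128-neuron hidden layer plus the output layer) is trained on the data set. The `weights` parameter is exposed here for convenience and is not part of the paper's description:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_vgg16_model(num_classes=3, input_shape=(120, 120, 3),
                      weights="imagenet"):
    # Pretrained convolutional base; its weights stay fixed, and only
    # the new dense head is trained, as described in the text.
    base = VGG16(weights=weights, include_top=False,
                 input_shape=input_shape)
    base.trainable = False
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```

The same pattern applies to VGG19 by swapping in `tensorflow.keras.applications.VGG19`.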

Model 3: Hybrid CNN with Machine Learning.
Since ML techniques need extracted features to complete classification tasks, we decided to obtain these features from a robust technique like DL.
This model extracted features using a CNN and then fed these features to one of the ML techniques. It hybridized two blocks, CNN and ML, with one used as a feature-extraction block and the other for the classification process. We examined four supervised classification techniques (naïve Bayes, support vector machine, random forest, and XGBoost). In total, 4,096 features were extracted after the flatten layer (Model 3a), as shown in Figure 4, or 128 features after the first hidden layer of the neural network (Model 3b), as shown in Figure 5. These extracted features were used as inputs for ML. Basically, this method obtains a solid learner with the help of convolutional layers to extract features and ML to classify the results.
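The hybrid scheme can be sketched as follows: a trained Keras CNN is cut at a chosen layer to serve as the feature extractor, and the resulting features are fed to a classical classifier (an SVM here, one of the four techniques examined). The layer name `"flatten"` and the SVM settings are assumptions about the underlying model:

```python
from tensorflow.keras import models
from sklearn.svm import SVC

def extract_features(cnn, images, layer_name="flatten"):
    """Run images through the CNN up to `layer_name` and return the
    activations there (e.g. the flatten layer for Model 3a, the first
    dense layer for Model 3b)."""
    extractor = models.Model(inputs=cnn.inputs,
                             outputs=cnn.get_layer(layer_name).output)
    return extractor.predict(images, verbose=0)

def train_hybrid(cnn, x_train, y_train, layer_name="flatten"):
    """Fit an SVM classifier on CNN-extracted features."""
    feats = extract_features(cnn, x_train, layer_name)
    clf = SVC(kernel="rbf")
    clf.fit(feats, y_train)
    return clf
```

At prediction time, new images go through the same `extract_features` call before `clf.predict`, so the CNN and the ML block stay consistent.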

Model 4: Hybrid VGG16 with ML.
This model extracted features using a pretrained model (VGG16), because it had performed better than VGG19 in Model 2. Features were extracted after the flatten layer (4,608 features, Model 4a) or after the first hidden layer of the neural network (128 features, Model 4b). Then, the extracted features were used as inputs for ML. As in Model 2, we kept all the convolutional layers with their pretrained parameters and fine-tuned the parameters of the dense layers. More details are shown in Figure 6.
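A sketch of the Model 4a feature-extraction path: with a 120 × 120 input, the VGG16 convolutional base produces 3 × 3 × 512 = 4,608 values after flattening, matching the feature count stated above. The random forest settings are illustrative; SVM, naïve Bayes, and XGBoost were also examined:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.ensemble import RandomForestClassifier

def vgg16_features(images, weights="imagenet"):
    """Flattened VGG16 convolutional-base features (n, 4608) for
    120x120 RGB inputs."""
    base = VGG16(weights=weights, include_top=False,
                 input_shape=(120, 120, 3))
    feats = base.predict(preprocess_input(images.astype("float32")),
                         verbose=0)
    return feats.reshape(len(feats), -1)

def train_model4a(x_train, y_train):
    """Fit a classical classifier on the VGG16 features."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(vgg16_features(x_train), y_train)
    return clf
```

Model 4b differs only in cutting the network after the 128-neuron hidden layer of the fine-tuned dense head instead of the flatten layer.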

Simulation and Computational Experiments
This section discusses in detail the data and experiment preparation, evaluation metrics, and finally, the results and discussion. Table 3 lists the complete set of baseline and proposed models, along with descriptions.

Data and Experiment Preparation.
The experiments were applied to one open-source COVID-19 data set, as described in Section 3, and to local COVID-19 patients' X-ray images from Asir Hospital, Abha, Saudi Arabia. The total number of COVID-19 images was 219 before augmentation; augmentation generated a further 643 images. We split the data into training data (3,279 images) to build the model and validation data (820 images: 304 COVID-19, 270 normal, and 246 other viral pneumonia) to tune, monitor, and select the best parameters of the model. We fine-tuned each model for 20 epochs, and the batch size was set to 32. We used a categorical/binary cross-entropy loss function and the Adam optimizer with a learning rate of 0.001. We used l1 and l2 regularizers (l1 = 1e-5, l2 = 1e-4) in the dense layers and dropout (0.2) after the convolutional layers to avoid overfitting during the model's training. All images were downsampled to 120 × 120 before being fed to the models. Overfitting is a general problem in DL and occurs when the model fits the training set too well because of the large number of features compared to the small number of samples. In this study, two approaches were used to mitigate this problem: (1) Dropout regularization was used to reduce overfitting and improve the generalization of deep neural networks. The network becomes less sensitive to the specific weights of neurons, generalizes better, and is less likely to overfit the training data. In our experiments, the dropout parameter was set to 0.2. (2) Regularization in dense layers: regularizers apply penalties on layer parameters or layer activity during optimization. These penalties are added to the loss function that the network optimizes. In our implementation, the L1 regularization penalty was set to 1e-5 and the L2 penalty to 1e-4.
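The training configuration above (Adam at learning rate 0.001, cross-entropy loss, 20 epochs, batch size 32) can be sketched as follows; `model` and the data arrays are placeholders for the models and splits described in the text:

```python
from tensorflow.keras.optimizers import Adam

def compile_and_train(model, x_train, y_train, x_val, y_val,
                      epochs=20, batch_size=32):
    """Compile and fit with the paper's stated training settings;
    for the binary scenarios the loss would be binary cross-entropy."""
    model.compile(optimizer=Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=epochs, batch_size=batch_size, verbose=0)
```

The l1/l2 penalties and dropout are properties of the layers themselves (set at model-construction time), so they need no extra handling here.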

Evaluation Metrics.
The experiments for our models were evaluated using accuracy, precision, recall, and F1 score [31]:

Accuracy = (TP + TN) / (TP + TN + FP + FN),
Precision = TP / (TP + FP),
Recall = TP / (TP + FN),
F1 score = 2 × (Precision × Recall) / (Precision + Recall),

where TP is true positive (the number of correctly classified images of a class), TN true negative (the number of images that did not belong to a class and were not classified as belonging to that class), FP false positive (the number of wrongly classified images of a class), and FN false negative (the number of images of a class detected as another class). With precision and recall defined in this way, the F1 score is their harmonic mean.
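The four metrics follow directly from the confusion counts defined above; a plain-Python sketch for a single class treated as positive:

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For multiclass evaluation these are computed per class and then averaged across classes.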

Experiments Conducted.
We conducted extensive experiments to verify the suitability of our models, as illustrated in Table 4. As a baseline, we implemented ConvNet#4, the best model of Sekeroglu and Ozsahin [6]. The baseline model's accuracy for binary classification was high (>98%), while it was lower for multiclass classification (around 93%). Multiclass classification accuracy was 96.1%, 97.6%, and 96.6% for the proposed CNN, VGG16, and VGG19 models, respectively. These values were higher for the two binary-classification scenarios. Performance in terms of precision, recall, and F1 score was very good, with a lowest value of 95.8%.

Results and Discussion
These results indicated that Models 1 and 2 are efficient in detecting COVID-19 cases compared with either normal or other viral pneumonia cases, and they were better than those of the baseline model by a good margin. The confusion-matrix plots for Models 1, 2a, and 2b are depicted in Figure 7. The rows correspond to the predicted class (output class), and the columns correspond to the true class (target class). The diagonal cells of the confusion matrix correspond to correctly classified observations, and the off-diagonal cells to misclassified observations (FP and FN). The number of observations is shown inside each cell. From these results, the misclassification rate was very low for all models.
To further study the overfitting behavior of our models, we depict the accuracy and loss for training and validation at each epoch in Figure 8. The figures show no overfitting in the models' performance, given the slight differences between the accuracy and loss of the training and validation sets. The results of the hybrid models combining a CNN and ML are listed in Table 6 for binary classification and in Table 7 for multiclass classification. We implemented the model four times for both scenarios, once for each ML technique, taking the features from the flatten layer, and four more times taking the features from the first hidden layer. The results showed accurate model performance, with an overall binary accuracy of 100% for Model 3a with SVM. Model 3b showed 100% accuracy with the SVM, naïve Bayes, and random forest binary classifiers and 99% with the XGBoost binary classifier. Similarly, accuracy was very good for the multiclass classifiers compared with the baseline and previous models, especially for Model 3b, which had only 128 features extracted from the first hidden layer of the neural network. The accuracy of Model 3a for many binary classifiers was 100%.
The results of the hybrid models combining VGG16 and ML are listed in Table 8 for binary classification and Table 9 for multiclass classification. For both scenarios and each classifier, Model 4a was implemented by taking the features from the flatten layer, and Model 4b by taking them from the first hidden layer. The results showed accurate model performance, with an overall accuracy of 100% for Model 4a with the SVM and random forest binary classifiers, while Model 4b showed 100% accuracy with all examined binary classifiers. However, the performance of Models 4a and 4b was lower than that of Models 2a, 3a, and 3b with multiclass classifiers.
To test the generality of the proposed models, we applied them to another binary-class data set, called the "combined COVID-19 data set" [32]. This data set is a mix of X-ray, CT, and ultrasound images. It was augmented to generate 6,128 images and was divided into training and validation data sets as per Table 10. Table 11 lists the accuracy values of the proposed models on the combined COVID-19 data set. The proposed models performed very well on this mixed data set, with Model 4b showing the best result, 99.6% accuracy. This demonstrates the generality of our models and their effectiveness for the correct classification of COVID-19 cases. These results show that the proposed models are general, robust, and accurate for many medical images, revealing excellent and promising results. The models were built using current technology and tested on different general data sets.

Conclusions and Future Work
This paper presents various models for the early diagnosis and classification of COVID-19 patients based on X-ray images. The models were built using CNNs, TL with VGG16 and VGG19, and ML techniques. We identified the best hyperparameters for the proposed models and exploited the power of DL to extract deep features for binary and multiclass classifiers to improve COVID-19 diagnosis accuracy.
For correct prediction, we tested our models on a data set with three classes to ensure that the models accurately differentiate COVID-19 from other viral pneumonia infections, which share many common and ambiguous radiographic features. The proposed models outperformed the baseline and showed promising results, especially the hybrid models, which revealed very good results for both types of classification. For binary classification, they achieved full accuracy in many cases. This illustrates the ability of DL techniques to extract relevant features, which makes the job of ML easier.
In future work, we plan to explore more X-ray data sets from different countries. Other medical images like CT scans and ultrasound images can be used for improving the accuracy of the diagnosis process and early detection models, which in turn can empower the decision-making process regarding COVID-19 patients. Moreover, other TL models can be applied with different configurations to get better results.
Data Availability
The data set used in this research is available on the Mendeley Data website under the name Covid-19.zip.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.