The world is facing a pandemic due to coronavirus disease 2019 (COVID-19), the name designated by the World Health Organization. COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which was first identified in late December 2019 in Wuhan, China; within a few months, the virus had spread throughout the world. COVID-19 has become a global health crisis because millions of people worldwide have been affected by this fatal virus. Fever, dry cough, and gastrointestinal problems are the most common signs of COVID-19. The disease is highly contagious, and affected people can easily transmit the virus to those with whom they have close contact; thus, contact tracing is a suitable strategy for preventing its spread. Contact tracing is the process of identifying all persons with whom a COVID-19-affected patient has come into contact in the last 2 weeks. This study investigates convolutional neural networks (CNNs), which make testing faster and more reliable, for detecting COVID-19 from chest X-ray (CXR) images. Because there are already many studies in this field, the designed model focuses on increasing the accuracy level using both a transfer learning approach and a custom model. Pretrained deep CNN models, namely VGG16, InceptionV3, MobileNetV2, and ResNet50, were used for deep feature extraction. Performance in this study was measured by classification accuracy. The results indicate that deep learning can recognize SARS-CoV-2 from CXR images. The designed model achieved 93% accuracy and 98% validation accuracy, while the customized pretrained models achieved 97% (MobileNetV2), 98% (InceptionV3), and 98% (VGG16) accuracy. Among these models, InceptionV3 recorded the highest accuracy.
The current coronavirus disease 2019 (COVID-19) pandemic is deeply lamentable because the second wave appears to be more dangerous than the first. India is one of the countries most affected by the second wave of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The USA and Brazil are also vulnerable, as they have not yet recuperated from the first wave. On 26 April 2021, the number of infected people in India was 360,960 and was increasing rapidly [
The virus normally attacks the lungs and, in severe cases, causes pneumonia, which rapidly decreases the blood oxygen level. Because the virus has no cure thus far, the only solution before a vaccine is to prevent its spread; therefore, test and trace is the only option available. Normally, the polymerase chain reaction (PCR) test is widely used in medical practice for testing. However, because the number of cases is increasing rapidly, it has become nearly impossible to perform enough PCR tests, as they are time-consuming and costly. Therefore, an alternative testing method is required so that infected people can be identified quickly and quarantined or isolated. To date, some deep learning approaches have been used to identify the virus. However, the results of these deep learning techniques have not been sufficient for a medical diagnosis system.
COVID-Net, a deep CNN architecture built from chest X-ray (CXR) images for the detection of COVID-19, was introduced in [
Most studies obtained an accuracy of approximately 90%. In contrast, the present study used several pretrained models: MobileNetV2 with customization yielded 98% accuracy and 97% validation accuracy; VGG16, 98% accuracy and 98% validation accuracy; ResNet50, 88% accuracy and 91% validation accuracy; and InceptionV3, 98% accuracy and 99% validation accuracy. The designed custom CNN model obtained 97% accuracy and 97% validation accuracy. The accuracy of the models used in this study is thus higher than that of previous studies, making them more reliable, and their robustness was verified through comparisons across multiple models.
This paper describes a deep learning approach for identifying SARS-CoV-2-infected patients. In classification, feature extraction in a CNN model can be achieved with high performance: filter-based feature extraction is effective for classification, CNNs can classify images with complex characteristics, and the CNN architecture greatly reduces the number of weight parameters. Considering these facts, this paper proposes different CNN architectures to detect COVID-19 [
The remainder of the paper is organised as follows. The materials and procedures are covered in Section
The dataset was obtained from the open sources Kaggle and GitHub and then merged to prepare a suitable dataset. It contained CXR images of normal patients and patients with COVID-19. A CNN was used for feature extraction. The model has four Conv2D layers, three MaxPooling2D layers, one Flatten layer, and two dense layers, with the rectified linear unit (ReLU) activation function; in the final dense layer, softmax was used as the activation function.
In this study, transfer learning was also used so that the accuracy of the designed model could be compared with that of pretrained models. MobileNetV2, VGG16, ResNet50, and InceptionV3 were used with some modifications to the final layers, with a head model constructed on top of each base model. The customized final layers are average pooling, flatten, dense, and dropout. The CNN model is suitable for image feature extraction, as it extracts the features of given images and learns to differentiate the images from these features.
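The transfer-learning setup described above can be sketched in Keras. This is a hedged reconstruction, not the study's exact code: the head's layer sizes are illustrative assumptions, and `weights=None` is used so the sketch runs offline, whereas training would load ImageNet weights.

```python
# Sketch of a pretrained backbone (MobileNetV2) with a custom head of
# average pooling, flatten, dense, and dropout layers, as described above.
# Head widths are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights=None,  # "imagenet" in practice; None avoids a download here
)
base.trainable = False  # freeze the pretrained feature extractor

head = models.Sequential([
    base,
    layers.AveragePooling2D(pool_size=(4, 4)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),    # assumed width
    layers.Dropout(0.5),
    layers.Dense(2, activation="softmax"),  # COVID-19 vs. normal
])
head.compile(optimizer="adam", loss="categorical_crossentropy",
             metrics=["accuracy"])
```

Freezing the base model means only the small custom head is trained, which is what makes transfer learning feasible on a modest dataset.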
Python was chosen as the programming language for data analysis; it is particularly effective for deep learning tasks because of its extensive library support. Anaconda Navigator and Jupyter Notebook were used to preprocess the dataset on a personal GPU, and Google Colab was used to handle large datasets and model training online. GitHub was used to store all data, code, and work so that they could be retrieved from any machine; because GitHub provides change tracking and code management, it is also well suited to teamwork.
The dataset comprised CXR images of two classes: one holds CXR images of COVID-19 patients, and the other holds CXR images of normal patients. Each class was divided into two subsets, a training set and a validation set. The dataset contained 2541 images [
X-ray image of a COVID-19 patient.
X-ray image of a normal patient.
Figure
In the block diagram of Figure
Block diagram of the system.
The block diagram presents the overall system in its simplest form. The decision part of this system is crucial and plays a vital role in this study. The decision is mainly based on the model, which is trained with a large amount of data extracted from CXR images.
The system architecture is an overview of the entire system. The input is a CXR image, and the output is a prediction of whether the image is COVID-19 affected. The input shape is 224 × 224 with three channels. In the first two layers of the designed architecture, the filter size is 32 with padding, a kernel size of 3, and ReLU as the activation function. These are followed by the first max-pooling layer, which has a pool size of 2 and a stride of 2. A Flatten layer then converts the pooled features into a single column. Finally, there are two dense layers: the first uses ReLU as its activation function, and the last dense layer's activation function is softmax. After preprocessing, the features enter the network. Figure
System architecture.
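The described architecture (four Conv2D layers, three MaxPooling2D layers, a Flatten layer, and two dense layers with a softmax output) can be sketched in Keras. This is a hedged reconstruction: filter and dense widths beyond the first two layers are assumptions, not the study's exact values.

```python
# Minimal Keras sketch of the custom CNN described in the text.
# The first two Conv2D layers use 32 filters, kernel size 3, padding,
# and ReLU, per the text; later widths are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),  # assumed width
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),  # assumed width
    layers.MaxPooling2D(pool_size=2, strides=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),     # assumed width
    layers.Dense(2, activation="softmax"),   # COVID-19 vs. normal
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```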
The convolutional layer is the basic layer of the CNN and is responsible for extracting the image's features. The input image is passed through a filter in this layer, and the feature map is obtained as the output of the convolution operation applied by these filters.
A convolution operation multiplies sets of weights with the input. A filter consists of a two-dimensional collection of weights that is multiplied with an array of input data: a dot product is applied between a filter-sized patch of the input and the filter, resulting in a single value. The filter is smaller than the input, and the same filter multiplies the input at different positions. The filter is designed as a specialized detector that identifies specific types of features as it systematically covers the entire image.
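The patch-by-patch dot product just described can be shown in a small NumPy sketch. The image and filter values here are illustrative; the filter is a simple vertical-edge detector.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the filter over the image; each output value is the dot
    product of a filter-sized patch of the input with the filter."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # single value per position
    return out

# A tiny 4x4 image with a vertical edge, and a 3x3 vertical-edge filter
image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])
fmap = conv2d_valid(image, kernel)  # the resulting feature map
```

Every position where the filter straddles the edge responds strongly, which is exactly how a learned filter "identifies specific types of features".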
Assume that the NN input is
The pooling layer summarizes the presence of features by downsampling the feature maps. It is normally applied after a convolution layer and provides some spatial invariance. Two popular pooling methods, average pooling and max pooling, summarize the average presence of a feature and the most activated presence of a feature, respectively [
In effect, the pooling layer removes unnecessary features from the images and condenses them. In average pooling, the layer takes the average value of its current view each time; in max pooling, it selects the maximum value from the filter's current view each time. Using the window size specified for each feature map, the max-pooling technique keeps only the maximum value, resulting in fewer output neurons. The image thus becomes much smaller, but its salient content remains the same. A pooling layer is important for reducing the number of feature maps and network parameters, and a dropout layer is used to prevent overfitting.
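The max-pooling operation just described, with the 2 × 2 window and stride 2 used in the designed model, can be sketched in NumPy (the feature-map values are illustrative):

```python
import numpy as np

def max_pool(fmap, pool=2, stride=2):
    """For each pool x pool window, keep only the maximum activation,
    shrinking the feature map while preserving the strongest responses."""
    h = (fmap.shape[0] - pool) // stride + 1
    w = (fmap.shape[1] - pool) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i * stride:i * stride + pool,
                             j * stride:j * stride + pool].max()
    return out

fmap = np.array([[1., 3., 2., 1.],
                 [4., 6., 5., 0.],
                 [7., 2., 9., 8.],
                 [1., 0., 3., 4.]])
pooled = max_pool(fmap)  # 4x4 feature map reduced to 2x2
```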
The activation of max pooling can be calculated as follows:
The Flatten layer converts data from a matrix into a one-dimensional array for use in the fully connected layer, creating a single long feature vector. Finally, this vector is connected to the final classification model, which is also known as a fully connected layer [
Fully connected layers are a core component of CNNs and have proven very useful in computer vision image recognition and classification. Convolution and pooling are the initial stages of the CNN process, which break the image down into features and analyse them separately [
In a fully connected layer, the inputs are flattened, and each input is connected to all neurons. The ReLU activation function is commonly used in fully connected layers, and the softmax activation function was used in the last fully connected layer to predict the output classes. The fully connected layers are the last, and among the most important, layers of the convolutional neural network.
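The forward pass through these final layers (flattened features → ReLU dense layer → softmax output) can be sketched in NumPy. The weights here are random placeholders, purely to show the data flow and the shape of the two-class softmax output.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
flat = rng.normal(size=8)                      # flattened features entering the FC layers
W1, b1 = rng.normal(size=(4, 8)), np.zeros(4)  # placeholder weights
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

hidden = relu(W1 @ flat + b1)       # fully connected hidden layer (ReLU)
probs = softmax(W2 @ hidden + b2)   # two-class output: COVID-19 vs. normal
```

The softmax output is a probability distribution over the two classes, so the larger entry gives the predicted label.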
The scarcity of medical data is one of the greatest challenges for researchers in medical research, and data are one of the most crucial components of deep learning approaches. Data collection and labelling are both costly and time consuming. Transfer learning avoids the requirement for large datasets, and the computations become fewer and less costly. In transfer learning, a pretrained model that has been trained on a large dataset is transferred to a new model that needs to be trained on new data that are comparatively smaller. For a given task, this process initializes the training of the CNN on a small dataset with weights already learned by the pretrained models from a large-scale dataset [
Four CNN-based pretrained models were used to classify CXR images in this investigation: MobileNetV2, VGG16, ResNet50, and InceptionV3. The CXR images were of two classes, one normal and the other from SARS-CoV-2-infected patients. This study used transfer learning, which can perform well with inadequate data by exploiting ImageNet weights and is also efficient in training time. Figure
System architecture of the pretrained model.
As shown in Figure
MobileNetV2 improves the state-of-the-art performance of mobile models on numerous tasks and benchmarks across a range of model sizes. Each block of MobileNetV2 works as a sequence of
After training the model with the train generator, the validation generator, steps per epoch = 8, and 10 epochs, the designed model provided 92% accuracy and 98% validation accuracy at the 10th epoch. In the first few epochs, the training accuracy was quite low, starting at 55%, and after the 10th epoch it reached 92%. The validation accuracy started at 93% and ended at 98.44% after the 10th epoch. VGG16 had a training accuracy of 98% and a validation accuracy of 98%, with a training loss of 4% and a validation loss of 6%. For ResNet50, the training accuracy was 88% and the validation accuracy 91%, with a training loss of 29% and a validation loss of 21%. The accuracy and loss histories of the models are given in Table
Histories of the accuracy and loss of the models.
Model | Accuracy (%) | Validation accuracy (%) | Loss (%) | Validation loss (%) |
---|---|---|---|---|
Custom CNN | 97 | 97 | 6 | 8 |
Modified MobileNetV2 | 98 | 97 | 5 | 6 |
VGG16 | 98 | 98 | 4 | 6 |
ResNet50 | 88 | 91 | 29 | 21 |
InceptionV3 | 98 | 98 | 5 | 5 |
From the accuracy history plot, it can be observed that the training accuracy increased rapidly with every epoch. In the first epoch, the accuracy was 77%, and it increased after every epoch. The validation accuracy started at 94% and also increased until the last epoch. In the model accuracy plot, the training accuracy follows a steadily increasing line, while the test accuracy remains in the region of 94%–98% throughout the epochs. Figures
(a) Model accuracy and (b) model loss.
From the model loss plot, it can be observed that both the training loss and test loss curves decreased gradually. After the first epoch, the training loss was 45%, and after 10 epochs, it reached 7%. The validation loss was 16% and reached 9% after 10 epochs. Figure
In transfer learning based on pretrained models, MobileNetV2 with the modified head model provided even smoother convergence. In the first epoch, the accuracy was 72% and the validation accuracy was 96%. After the 8th epoch, the training accuracy increased to 98% and the validation accuracy to 97%. From Figure
Model accuracy and loss (MobileNetV2).
From Figure
Model accuracy and loss (VGG16).
Model accuracy and loss (ResNet50).
As shown in Figure
Accuracy and loss (InceptionV3).
The system plotted a confusion matrix, with columns representing the true values and rows representing the predicted values. In a classification model, the summary of the prediction results is known as the confusion matrix. In the confusion matrix, correct and incorrect predictions are counted and broken down by class, and for
Figure
Confusion matrix.
Three things are important in error analysis: predictions, data, and features. Prediction-based error analysis can be performed using a confusion matrix, which visualises the percentages of true positives, true negatives, false positives, and false negatives. Data size and nature are also important, and splitting the data appropriately into training and test sets is a further consideration, because the split may affect the result on a large scale. Features also play a vital role; feature engineering and regularisation were performed to reduce errors.
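The prediction-based error analysis above rests on the confusion matrix. A minimal sketch of how such a matrix is assembled for the two-class problem, with rows as predicted labels and columns as true labels as described in the text; the labels used here are hypothetical, not the study's data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix: rows are predicted labels, columns are
    true labels. Class 1 = COVID-19, class 0 = normal."""
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[p, t] += 1
    return m

# Hypothetical labels for illustration only
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred)
# Diagonal entries are correct predictions; off-diagonal entries are
# the false negatives and false positives analysed in error analysis.
```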
The performance of the models is evaluated based on accuracy, precision, recall, and
Metrics can be used to evaluate a system's performance once the model has been developed. Accuracy measures how well a model or system works, that is, the number of times the model correctly predicted the actual outcome. The mathematical formulas for determining the accuracy are expressed in the following equations [
The rate of successfully detecting true positive values among all actual positive values is known as recall, also called sensitivity. Recall can be determined using the following expression [
Precision refers to the proportion of positive identifications that were correct, that is, the number of times the model's positive forecast was right. It can be calculated using the following mathematical formula:
For both recall and precision, a single metric can be used to summarize the classifier's performance: the
In equations (
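All four metrics follow directly from the confusion-matrix counts (TP, TN, FP, FN). A small sketch, using hypothetical counts rather than the study's results:

```python
# Accuracy, recall (sensitivity), precision, and F1 score computed from
# the four confusion-matrix counts, per the standard definitions
# referenced in the text.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # sensitivity
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only
acc, prec, rec, f1 = metrics(tp=48, tn=47, fp=2, fn=3)
```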
Model evaluation.
Model | State | Precision | Recall | F1 score |
---|---|---|---|---|
Custom CNN | COVID-19 | 1.00 | 0.94 | 0.97 |
Custom CNN | Normal | 0.95 | 1.00 | 0.97 |
MobileNetV2 | COVID-19 | 0.99 | 0.96 | 0.98 |
MobileNetV2 | Normal | 0.97 | 0.99 | 0.98 |
VGG16 | COVID-19 | 1.00 | 0.95 | 0.97 |
VGG16 | Normal | 0.96 | 1.00 | 0.98 |
InceptionV3 | COVID-19 | 1.00 | 0.98 | 0.99 |
InceptionV3 | Normal | 0.98 | 1.00 | 0.99 |
This study also included real testing by providing CXR images as input to the models. When training is complete, the model is saved to a file with an .hdf5 extension; four .hdf5 files were created for the four different models. Subsequently, a new notebook file with an .ipynb extension was created for the test. The four models were included in the test file, and individual CXR images were provided as input. In Figure
Screenshot of affected image result.
In Figure
After this test, another CXR image was provided as an input to the model, and the image was normal. Figure
Screenshot of normal image result.
Figure
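The testing workflow above (save the trained model to an .hdf5 file, reload it elsewhere, and predict on a single CXR image) can be sketched as follows. This is a hedged stand-in: a tiny placeholder model and a random image are used so the sketch is self-contained, and the class ordering is an assumption.

```python
# Sketch of the save/reload/predict workflow described in the text.
import os
import tempfile
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Tiny stand-in for a trained model (the real one is the trained CNN)
model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),  # COVID-19 vs. normal
])

path = os.path.join(tempfile.mkdtemp(), "covid_cnn.hdf5")
model.save(path)                             # creates the .hdf5 model file
restored = tf.keras.models.load_model(path)  # reload in the test notebook

image = np.random.rand(1, 224, 224, 3).astype("float32")  # stand-in CXR
probs = restored.predict(image, verbose=0)
label = ["Normal", "COVID-19"][int(np.argmax(probs))]     # assumed class order
```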
The pretrained models of this study (i.e., VGG16, InceptionV3, and MobileNetV2) were compared with models from previous studies. Compared with the models in the referenced studies, InceptionV3, MobileNetV2, and VGG16 in this study provided better results in terms of accuracy and efficiency; with the pretrained models, the accuracy increased to a significant level. Compared with those in previous studies [
Result comparison.
Reference | Model name | Accuracy (%) | Accuracy in this study (%) |
---|---|---|---|
In study [ | InceptionV3 | 95 | 98 |
In study [ | VGG16 | 95.9 | 98 |
In study [ | MobileNetV2 | 97.4 | 98 |
In study [ | ResNet50 | 92.5 | 88 |
In study [ | Custom CNN | 93 | 97 |
In study [ | Custom CNN | 94.5 | 97 |
In Table
In this study, CNN models were presented, namely, a fully custom CNN and the pretrained MobileNetV2, VGG16, and InceptionV3 models modified in their final layers. The models used in this study obtained almost the same accuracy. The dataset contains 2542 SARS-CoV-2-affected and normal CXR images. The accuracy of the pretrained models was 98%, whereas that of the customized CNN model was 97%. Further work will be performed on a larger dataset and with other pretrained models. These models obtained excellent results on the dataset; MobileNetV2 and VGG16 performed well, classification and feature extraction were effective, and the model checks provided the correct results. These models can detect SARS-CoV-2 from a simple CXR image in the shortest possible time. X-ray technology is widely available and cost friendly; thus, it can be a very efficient method for detecting COVID-19-affected patients. For testing and tracing the virus, this method is quick and removes the risk of spreading the virus while waiting for a COVID-19 test result.
This innovation could greatly change the medical sector. Using this technique, COVID-19 patients can be identified quickly, which may help address the current pandemic situation. Chest radiography is comparatively safer than collecting a sample from a patient's nose. In the future, this type of technique will aid diagnosis. Several deep learning techniques can be used to optimise the parameters to create a robust model, which can help mankind. A metaheuristic-based deep COVID-19 model could also be a good technique to explore in the future [
The data used to support the findings of this study are freely available at
The authors would like to confirm that there are no conflicts of interest regarding the study.
The authors are thankful for the support from Taif University Researchers Supporting Project (TURSP-2020/114), Taif University, Taif, Saudi Arabia.