An Efficient CNN Model for COVID-19 Disease Detection Based on X-Ray Image Classification

Artificial intelligence (AI) techniques in general, and convolutional neural networks (CNNs) in particular, have attained successful results in medical image analysis and classification. A deep CNN architecture is proposed in this paper for the diagnosis of COVID-19 based on chest X-ray image classification. Because a sufficiently large, good-quality chest X-ray image dataset was not available, effective and accurate CNN classification was a challenge. To deal with the very small, imbalanced dataset and its image-quality issues, the dataset was preprocessed in different phases using different techniques to obtain an effective training dataset for the proposed CNN model. The preprocessing stages performed in this study include dataset balancing, image analysis by medical experts, and data augmentation. The experimental results show an overall accuracy as high as 99.5%, which demonstrates the capability of the proposed CNN model in the current application domain. The CNN model was tested in two scenarios. In the first scenario, the model was tested using 100 X-ray images of the original processed dataset and achieved an accuracy of 100%. In the second scenario, the model was tested using an independent dataset of COVID-19 X-ray images, where the accuracy was as high as 99.5%. To further assess the proposed model against other models, a comparative analysis was done with several machine learning algorithms. The proposed model outperformed all of these models in general, and specifically when testing was done using an independent testing set.


Introduction
The virus called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was discovered in late 2019. The virus, which originated in China, causes a disease known as coronavirus disease 2019, or COVID-19. The World Health Organization (WHO) declared the disease a pandemic in March 2020 [1,2]. According to the reports issued and updated by global healthcare authorities and state governments, the pandemic has affected millions of people globally. The most serious illness caused by COVID-19 is related to the lungs, such as pneumonia. The symptoms of the disease can vary and include dyspnea, high fever, runny nose, and cough. These cases can most commonly be diagnosed using chest X-ray imaging analysis for abnormalities [3].
X-radiation, or X-ray, is an electromagnetic form of penetrating radiation. The radiation is passed through the desired human body parts to create images of the internal details of those parts. The X-ray image is a representation of the internal body parts in black and white shades. X-ray is one of the oldest and most commonly used medical diagnostic tests. Chest X-ray is used to diagnose chest-related diseases like pneumonia and other lung diseases [4], as it provides an image of the thoracic cavity, consisting of the chest and spine bones along with the soft organs, including the lungs, blood vessels, and airways. The X-ray imaging technique provides numerous advantages as an alternative diagnosis procedure for COVID-19 over other testing procedures.
These benefits include its low cost, the wide availability of X-ray facilities, noninvasiveness, low time consumption, and device affordability. Thus, X-ray imaging may be considered a better candidate for a mass, easy, and quick diagnosis procedure for a pandemic like COVID-19, considering the current global healthcare crisis.
Deep learning and artificial neural networks (ANNs) have attracted rapidly growing research attention over the last decade. Deep ANNs have outperformed other conventional models on many essential benchmarks. Thus, ANNs have generally proved to be the state-of-the-art technology across a wide range of application areas, including natural language processing (NLP), speech recognition, image processing, biological sciences, and other commercial as well as academic areas.
The advancement of ANNs has massive potential in healthcare applications, specifically in medical data analysis and in diagnosis through medical image processing and analysis. As seen in recent times, various parts of the world face a healthcare crisis in terms of both the needed number of healthcare professionals and testing equipment. Considering the present pandemic situation, there is a pertinent relationship between the detection of COVID-19 cases and chest X-ray image analysis and classification. In this work, an automatic diagnostic system has been developed using a CNN, which uses chest X-ray analysis results to diagnose whether a person is COVID-19-affected or normal. Preliminary analysis in this study has shown promising results in terms of accuracy and other performance parameters for diagnosing the disease in a cost-effective and time-efficient manner.
This study used a CNN with extra layers to improve the COVID-19 X-ray image classification accuracy. Among neural networks, the CNN structure is specially designed to process two-dimensional image tasks, although it can also be used on one- and three-dimensional data. A CNN is a type of deep neural network (DNN), inspired by the visual system of the human brain, and is most commonly used in the analysis of visual imagery. To train the CNN model, the dataset was first obtained from GitHub [5]. Since this dataset was very small and imbalanced, it was extended using data augmentation techniques to increase its size and to make the model training feature rich. Image flipping and rotation at different angles were used to generate more data. To balance the proportion of images with different class labels, the dataset was further extended with more image instances of the minority class. After data augmentation and dataset balancing, the CNN model was trained using a total of 800 images (400 COVID-19 and 400 normal) and then tested using a test set.
The performance of the CNN model was then evaluated using different performance metrics.
These metrics include accuracy, precision, sensitivity, specificity, ROC AUC, and F1 score. Later, the proposed CNN model was also tested using an independent dataset obtained from IEEE DataPort [6] for independent validation. Various machine learning models were also used for a comparative performance analysis against the proposed CNN model to show its significance over these models.
The following are some of the key findings of this study:
(i) CNN with extra convolutional layers (e.g., six layers have been used in the CNN proposed in this study) performs best in COVID-19 diagnosis
(ii) CNN models require a sufficient number of images for efficient and more accurate image classification
(iii) Data augmentation techniques are very effective in improving CNN model performance by generating more data from an existing limited-size dataset
(iv) Data augmentation is also effective in image classification as it gives CNNs the ability of invariance
(v) The performance of the proposed CNN model has been shown to be statistically significant over the performance of other ML models
(vi) CNN-based diagnosis using X-ray imaging can be very effective for the medical sector in handling mass testing situations in pandemics like COVID-19
The rest of the paper is divided into the following sections. Section 2 covers the related work. Section 3 presents the workflow. Section 4 contains the materials and methods used. Section 5 describes the results of the study and discussion, and Section 6 presents the conclusion.

Related Work
Deep learning has shown a dramatic increase in medical applications in general, and specifically in medical image-based diagnosis. Deep learning models have performed prominently in computer vision problems related to medical image analysis.
ANNs have outperformed other conventional models and methods of image analysis [7,8]. Due to the very promising results provided by CNNs in medical image analysis and classification, they are considered the de facto standard in this domain [9,10]. CNNs have been used for a variety of classification tasks related to medical diagnosis, such as lung disease [10], detection of malarial parasites in images of thin blood smears [11], breast cancer detection [12], wireless endoscopy images [13], interstitial lung disease [14], CAD-based diagnosis in chest radiography [15], diagnosis of skin cancer by classification [16], and automatic diagnosis of various chest diseases using chest X-ray image classification [17]. Since the emergence of COVID-19 in December 2019, numerous researchers have been engaged in experimentation and research activities related to the diagnosis, treatment, and management of COVID-19.
Researchers in [18] have reported the significance of applying AI methods in image analysis for the detection and management of COVID-19 cases. COVID-19 detection can be done accurately using deep learning models' analysis of pulmonary CT [18]. Researchers in [19] have designed an open-source COVID-19 diagnosis system based on a deep CNN. In that study, a tailored deep CNN design has been reported for the detection of COVID-19 patients using X-ray images. Another significant study has reported on an X-ray dataset comprising X-ray images belonging to common pneumonia patients, COVID-19 patients, and people with no disease [20]. The study uses state-of-the-art CNN architectures for the automatic detection of patients with COVID-19. Transfer learning achieved a promising accuracy of 97.82% in COVID-19 detection in that study. Another recent and relevant study has been conducted on the validation and adaptability of a Decompose-, Transfer-, and Compose-type deep CNN for COVID-19 detection using chest X-ray image classification [21]. The authors reported an accuracy of 95.12%, a sensitivity of 97.91%, and a specificity of 91.87%.
A review of the relevant and recent research on the design, development, and possible applicability of CNNs in COVID-19 detection using medical images, particularly X-ray images, shows that the accuracy of the models was affected by the very small number of available X-ray images of COVID-19 patients and by the poor quality of some images in the datasets. This study is particularly focused on dataset preprocessing to fine-tune the dataset, data augmentation, and the design of a CNN with extra layers to further increase the performance of COVID-19 diagnosis using CNNs, as described in subsequent sections.

Workflow
As illustrated in Figure 1, the workflow of this study begins with the collection of a primary dataset containing two image classes: one class contains chest X-rays of COVID-19-confirmed cases, and the other contains chest X-rays of normal people without the disease. In the next phase of the study, medical professionals analysed the dataset and removed the X-ray images that were not clear in terms of quality and diagnostic parameters. Hence, the resulting dataset was very clean, as each X-ray image was of good quality and clear in terms of significant diagnostic parameters according to the experts. In the third phase, the dataset was augmented using standard augmentation techniques to increase its size. The resulting dataset was used to train the model in the next phase. After training, the model was tested for its performance in disease detection. The proposed CNN model was tested using a test dataset held out from the primary dataset as well as an independent validation dataset. Table 1 contains the details of the datasets, including the total number of X-ray images in the training set, testing set, and validation set, and the proportion of X-ray images in the two prediction classes.

Dataset.
In the experiments of this study, a primary dataset containing 178 X-ray images has been used as the base dataset. Of the 178 images, 136 X-ray images belonged to confirmed COVID-19 patients and the other 42 images belonged to normal people or people with other diseases like pneumonia. The dataset used is available on GitHub [5]. The base dataset thus consists of two classes: COVID-19 with 136 samples and others with 42 samples. The dataset was therefore imbalanced and needed preprocessing to achieve promising results. As a first attempt, the CNN was trained on the original dataset and around 54% accuracy was achieved, which was not adequate for the current application domain.
The main dataset sources used in this study are listed as follows:
(i) Primary chest X-ray image dataset of COVID-19 patients collected from GitHub. The dataset has been collected by the University of Montreal's Ethics Committee no. CERSES-20-058-D from different hospitals and clinics [5].
(ii) For dataset balancing, a collection of chest X-ray images was collected from Kaggle [22].
(iii) Independent validation dataset containing a collection of 100 COVID-19 X-ray images for the real-world testing of the proposed CNN, collected from IEEE DataPort [6].
The experiments have been conducted using a Core i7 7th-generation machine with 8 GB RAM, on the Microsoft Windows 10 platform, using the Python language with Anaconda 3 and Jupyter Notebook.

Balancing Dataset Classes.
To balance the given dataset, and thereby improve the performance of the proposed CNN model in the detection of COVID-19 cases, 136 normal chest X-ray images have been used. These additional X-ray images were downloaded from Kaggle [22] and concatenated with the dataset. After balancing, the model was trained again on the resulting dataset, and its accuracy improved to 69%. Still, the performance of the model in terms of accuracy and other measures was not sufficient for an effective COVID-19 detection system.

Analysis of X-Ray Images by Medical Experts.
A deep analysis of the X-ray images was done by medical specialists. Out of the 136 X-ray images of confirmed COVID-19 patients, only a set of 90 X-ray images was selected as suitable for training the models. The resulting dataset was thus reduced to 90 COVID-19-confirmed cases and 90 normal X-ray images. This dataset was again used to train the proposed CNN model, and there was again an improvement in the performance of the model. Specifically, the accuracy increased to 72% in this scenario. Still, because the dataset did not contain a sufficient number of images for effective training, there was no significant increase in accuracy and other performance metrics.

Data Augmentation.
Data augmentation is a technique that can significantly increase the number of data instances available to train a model [23]. In the case of image datasets, the technique uses basic image processing operations, such as flipping, rotating, cropping, or padding.
The dataset is then extended with the transformed images derived from the existing image set, which increases the size of the dataset available to train the neural network [24]. To solve the problem of the small dataset that was affecting the performance of the proposed CNN, the data augmentation method has been used in this study. This technique increased the size of the dataset and, in addition, provided more learning features to the model. Two image processing operations, flipping and rotation, have been used in this study for data augmentation. In the first phase of data augmentation, the 90 X-ray images were flipped to get 90 extra images.
The resulting dataset contained 180 images after this operation. In the second phase, the original 90 images were rotated by 90° to get 90 more images, then rotated by 180° to get 90 more images, and finally rotated by 270° to get 90 more images. These operations resulted in a dataset containing 450 COVID-19 X-ray images. Table 2 shows the image processing operations performed on the image types and the corresponding count of images resulting from each operation. Figure 2 shows the effect of the augmentation techniques applied to an original sample image of the dataset used in this study.
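The two augmentation phases described above can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: the function name `augment`, the horizontal flip direction, and the random 150-by-150 arrays standing in for the 90 selected X-ray images are all assumptions.

```python
import numpy as np

def augment(images):
    """Return each original image plus one flip and three rotations."""
    out = []
    for img in images:
        out.append(img)                 # original image
        out.append(np.fliplr(img))      # flip (horizontal direction is an assumption)
        for k in (1, 2, 3):             # 90, 180, and 270 degree rotations
            out.append(np.rot90(img, k=k))
    return out

# 90 hypothetical square images stand in for the selected COVID-19 X-rays.
rng = np.random.default_rng(0)
originals = [rng.random((150, 150)) for _ in range(90)]
augmented = augment(originals)
print(len(augmented))  # 90 originals x 5 variants
```

Each original contributes five images (itself, one flip, three rotations), which matches the count of 450 reported for the COVID-19 class.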

Convolutional Neural Networks (CNNs).
CNNs are inspired by the visual system of the human brain. The idea behind CNNs is thus to make computers capable of viewing the world as humans view it. In this way, CNNs can be used in the fields of image recognition and analysis, image classification, and natural language processing [25]. A CNN is a type of deep neural network that contains convolutional, max pooling, and nonlinear activation layers.
The convolutional layer, considered the main layer of a CNN, performs the operation called "convolution" that gives the CNN its name. Kernels in the convolutional layer are applied to the layer inputs, and the outputs of the convolutional layer are collected as a feature map. In this study, the Rectified Linear Unit (ReLU) has been used as the activation function with each convolutional layer, which helps to introduce nonlinearity, as images are fundamentally nonlinear in nature. Thus, training the CNN with ReLU in the current scenario is easier and faster. Since the ReLU is zero for all negative inputs, it can be defined as

$$ z = \max(0, x). $$

Here, the function implies that the output z is zero for all negative values of the input x, while positive values remain unchanged, as shown in Figure 3.
The pooling layer, or subsampling layer, is also an important building block of a CNN. The pooling layer operates independently on each feature map extracted by the convolutional layer. To minimize overfitting and the number of extracted features, it decreases the spatial size of the feature map and returns the important features. Pooling in a CNN model can be max, average, or sum pooling. In this study, max pooling has been used because the other variants may not identify sharp features as easily as max pooling. In addition, the batch normalization layer has been used in this study, as it involved the training of a very deep neural network. The technique scales and shifts the activations to normalize the inputs of a layer, which speeds up learning between hidden units. A dropout layer with a 20% dropout rate has also been used, which drops randomly chosen neurons during training to reduce the overfitting problem. Towards the last stage of the CNN used in the study, there is a flattening layer that converts the output of the convolutional layers into a single-dimensional feature vector. In other words, the flattening layer arranges all the pixel data produced by the convolutional layers in one vector. After flattening, the vector is given as input to the next layers of the CNN, called fully connected layers or dense layers. In a fully connected layer, each neuron of the previous layer is directly connected to each of the neurons in the next layer.
The main function of the dense layers is to take the flattened output of the convolution and pooling layers as input and classify the image into a specific class label. Each value of the flattened feature set represents the probability of a feature belonging to a specific class. Thus, on the basis of these probabilities, the fully connected network with dense layers finally drives the classification decision.
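As an illustration of the max pooling step described above, the following NumPy sketch applies 2 × 2 max pooling with stride 2 to a single feature map. The function name `max_pool_2x2` and the small example array are hypothetical and only meant to show how the spatial size is halved while the strongest responses are kept.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D feature map (odd edges are cropped)."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    cropped = feature_map[:h2 * 2, :w2 * 2]
    # Split the map into non-overlapping 2x2 blocks and keep each block's maximum.
    blocks = cropped.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map.
fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]])
pooled = max_pool_2x2(fm)
print(pooled)  # 2x2 map holding the maximum of each block
```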

The Proposed CNN Architecture.
The proposed CNN model consists of 38 layers, of which 6 are convolutional (Conv2D) layers, 6 max pooling layers, 6 dropout layers, 8 activation function layers, 8 batch normalization layers, 1 flatten layer, and 3 fully connected layers. The CNN model input image shape is (150, 150, 3), i.e., a 150-by-150 RGB image. In all Conv2D layers, a 3 × 3 kernel has been used, but the number of filters increases after every two Conv2D layers. The 1st and 2nd Conv2D layers use 64 filters to learn from the input, the 3rd and 4th Conv2D layers use 128 filters, and the 5th and 6th layers use 256 filters. After each Conv2D layer, a max pooling layer with 2 × 2 pooling size has been used, a batch normalization layer has been used with the axis = −1 argument, an activation layer has been used with the ReLU function, and a dropout layer has been used with a 20% dropout rate. The output of the 256 filters of the final Conv2D layer is likewise followed by max pooling, batch normalization, activation, and dropout layers. Since the final convolutional and pooling stage gives a three-dimensional matrix as output, a flattening layer has been used to convert it into a vector that is the input for the 3 dense layers.
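The layer bookkeeping above can be checked with a short, framework-free Python sketch. The layer counts and filter numbers are taken from the text; the padding mode is not stated there, so the sketch assumes "same" padding for the 3 × 3 convolutions (with unpadded convolutions, a 150-pixel input would shrink below the kernel size before the sixth block). The function name `trace_shapes` is hypothetical.

```python
# Layer counts as stated in the text.
layer_counts = {
    "Conv2D": 6, "MaxPooling2D": 6, "Dropout": 6, "Activation": 8,
    "BatchNormalization": 8, "Flatten": 1, "Dense": 3,
}
total_layers = sum(layer_counts.values())

def trace_shapes(size=150, filters=(64, 64, 128, 128, 256, 256)):
    """Feature-map shape after each Conv2D + 2x2 max pooling block.

    Assumes 'same' padding for the convolutions (not stated in the text),
    so only the pooling step halves the spatial size (floor division).
    """
    shapes = []
    for f in filters:
        size = size // 2
        shapes.append((size, size, f))
    return shapes

shapes = trace_shapes()
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(total_layers)  # 38
print(shapes)        # spatial size shrinks 75, 37, 18, 9, 4, 2
print(flattened)     # length of the vector passed to the dense layers
```

Under this padding assumption, the flatten layer would hand a vector of 2 × 2 × 256 = 1024 values to the dense layers; a different padding choice would change that number.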
This study uses the CNN for binary classification, which is the reason for using the binary cross-entropy (BCE) loss function. Since binary classification needs only one output node to assign the data to one of the two given classes, the output value is passed to a sigmoid activation function when the BCE loss is used. The output of the sigmoid activation function lies between 0 and 1. The BCE loss measures the error between the predicted class and the actual class.
The "Adam" optimizer has been used, which adjusts the weights and learning rate to reduce the loss of the learning model. The model parameter values are given in Table 3, and the model architecture is given in Figure 4. During the initial experiments, the CNN was used with different configurations in terms of the number of convolutional layers in the model. The decision on how many convolutional layers to use was made with an incremental approach. First, the CNN was tested using only one convolutional layer and the results were analysed. Then, the CNN was built with two layers and the results were analysed, and so on. The approach was continued until the results provided by the model were accurate and effective. The final model, which proved feasible according to its results, consisted of six convolutional layers. The results of each increment of the model are reported in the Results section.
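The relationship between the single output node, the sigmoid activation, and the BCE loss can be sketched in NumPy as follows. The logits and labels are hypothetical values chosen for illustration, and the clipping constant `eps` is an assumption added for numerical safety.

```python
import numpy as np

def sigmoid(logit):
    """Map a raw output value to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-logit))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean BCE between 0/1 labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Hypothetical raw outputs of the single output node and their true labels.
logits = np.array([2.0, -1.5, 0.0])
labels = np.array([1.0, 0.0, 1.0])
probs = sigmoid(logits)
loss = binary_cross_entropy(labels, probs)
print(probs, loss)
```

The loss is small when the predicted probability is close to the true label and grows as the prediction moves towards the wrong class.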

Results and Discussion
After preprocessing, the final dataset consisted of a total of 900 X-ray images. For training and testing the proposed CNN, the dataset was partitioned into two subsets.
The training dataset contained 400 COVID-19 X-ray images and 400 normal X-ray images, making a total of 800 X-ray images. The testing dataset contained 100 X-ray images, with 50 X-ray images from each of the COVID-19-positive and normal classes. Then, the training subset containing 800 X-ray images was passed to the model with a 25% validation size. So, out of 800 X-ray images, in each epoch, 600 X-ray images train the model and 200 X-ray images validate the model. As mentioned in the proposed CNN architecture section, the model parameter values are given in Table 3. To evaluate the overall performance, in addition to accuracy, other important metrics have been adopted in this study, including F1 score, precision, sensitivity, specificity, and ROC AUC. The scores of these metrics are reported in Table 4. The confusion matrix of the model is shown in Figure 5. Figures 6 and 7 show the curves of accuracy and loss for training and testing, respectively. According to the confusion matrix, the CNN model was tested using the 100 X-ray images from the GitHub dataset, where 50 images belong to the COVID-19 class and 50 to the normal class. The CNN model predicts all 100 test images correctly, with a 0% error rate, as reported in the confusion matrix of Figure 5. Figure 6 shows the training and testing accuracy achieved by the proposed CNN model, and Figure 7 shows the corresponding training and testing loss. As can be observed in Figure 7, the proposed CNN does not take long to converge: in the first epoch the training loss is 31, after 5 epochs it drops to 0.9, after 23 epochs it drops to 0.0011, and at the last epoch the total loss is 0.000058.
In addition to the above performance measurements, k-fold cross-validation has been applied to further test the skill of the proposed model. In this study, 10-fold cross-validation has been used. The results are very good, as the average score over the 10 iterations is 99.67% (±0.15%).
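The 10-fold procedure mentioned above can be sketched as an index-splitting routine in NumPy. This is only an illustration of how the folds are formed: the sample count of 900 is an assumption (the text does not state which set was cross-validated), and the helper name `k_fold_indices` and the shuffling seed are hypothetical. Model training on each fold is omitted.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled, roughly equal folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# Assumed dataset size; each sample is used for validation exactly once.
splits = list(k_fold_indices(900, k=10))
for train_idx, val_idx in splits[:2]:
    print(len(train_idx), len(val_idx))
```

In such a scheme, the model would be trained and scored once per fold, and the mean and spread of the ten scores would be reported, as in the averaged figure quoted above.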

Testing of the CNN Using COVID-19 Independent Validation Data.
As proof of the significance of the proposed CNN model in detecting COVID-19 from X-ray images, the trained model has been tested using an independent dataset obtained from IEEE DataPort [6]. The independent test dataset contained 100 COVID-19 X-rays.
This dataset was then extended by adding 100 normal images for testing. Using the same settings, the model performs very well, with an accuracy of 0.995 and a precision of 1.000, along with the other performance parameters reported in Table 5.
Figure 8 shows the confusion matrix of the CNN model when tested on the independent test dataset obtained from IEEE DataPort. The CNN model also performed very efficiently on the independent test data, giving 198 [...] The falsely classified image was examined by the experts. It was noted that the X-ray image belonged to a person at an early stage of COVID-19. As a result, the image does not contain the prominent patterns by which it could have been differentiated from the normal X-ray image class. Figure 9 shows the actual X-ray image of the COVID-19-positive case that was falsely classified by the model. The comparison of the CNN model on test data and independent validation data in terms of different performance metrics is shown in Figure 10.
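The metrics used in this evaluation can be derived from the four confusion-matrix counts, as the following sketch shows. The counts used here (99 true positives, 1 false negative, 100 true negatives, 0 false positives) are an assumption inferred from the reported accuracy of 0.995, the precision of 1.000, and the single misclassified COVID-19 image on the 200-image set; they are not read from Figure 8 or Table 5.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Assumed counts, consistent with the reported accuracy and precision.
metrics = binary_metrics(tp=99, fp=0, tn=100, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```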
As mentioned in the proposed CNN model architecture section, the proposed model was constructed with an incremental approach. Starting with a single-convolutional-layer model, a convolutional layer was added in each following increment and the results were analysed. Table 6 reports the accuracy of the model after every increment, from a single convolutional layer to a stable model consisting of six convolutional layers. The incremental approach comparison is shown in Figure 11.

Performance Comparison of Machine Learning Models with the CNN Model.
In this study, experiments have also been conducted on some relevant machine learning models, namely Random Forest (RF) [26,27], gradient boosting machine (GBM) [28,29], support vector classifier (SVC) [30], logistic regression (LR) [31], and k-nearest neighbor (KNN) [32], for a comparative analysis of the CNN with these models. These models have been used with their best parameter settings, as shown in Table 7.
RF has been used with two hyperparameters, as shown in the table. The n_estimators parameter defines how many decision trees are generated under RF to make a prediction. The max_depth parameter defines the maximum depth of each decision tree in RF; in this setting, it restricts each decision tree to a maximum depth of 300 levels.
GBM has been used with three parameters: two are the same as in RF, and one is learning_rate, which is a tuning [...] In contrast, even if the hyperplane misclassifies certain points, a very small value of C would cause the optimizer to search for a wider-margin separating hyperplane. LR has been used with the "liblinear" solver because it is preferred for small datasets, and the second parameter is C, as used in SVC. KNN has been used with all default parameter settings. Thus, the CNN and the machine learning models have been trained using the original dataset of this study. Then, both the CNN and each of the machine learning models have been tested using the test subset of the original dataset. In this scenario, three machine learning models, SVC, LR, and KNN, performed almost as well as the CNN, as reported in Table 8. On the other hand, when the same machine learning models with the same settings were tested on the independent test set mentioned earlier, their performance degraded, while the proposed CNN maintained its performance metrics in this scenario as well. The results of the second scenario, in which the models were tested using the independent test set, are reported in Table 9.
This comparative analysis and also comparing the overall performance results achieved by the proposed CNN model [...] Figure 12 shows the comparisons of the performances given by the different machine learning models and the CNN. The bar graph labeled as test data is the result of the first test scenario, where each of the models was tested on the test subset extracted from the original dataset.
The second bar graph, labeled as independent validation data, is the result of the second test scenario, where each of the models was tested on the independent test set.
To compare the performance results of the proposed CNN-based methodology for the current application domain, the results of other recent studies by different researchers have been collected and compared. The comparison of these study results, along with the methodology used, is shown in Table 10.

Statistical Significance of the Proposed CNN Model.
In order to test whether the proposed CNN model has a statistical significance over the other models, a t-test [36] has been performed. To determine the significance, the alternate hypothesis H_a and the null hypothesis H_o have been established as follows: (i) Alternate Hypothesis (H_a). There is a statistical significance in the performance given by the proposed CNN model over the performance of other models.
First, the significance of the proposed CNN model has been tested against all models when these models were tested on the test dataset derived from the primary dataset. H_o was rejected in the case of RF and GBM. That means the proposed CNN has a statistical significance over RF and GBM only in this testing scenario. H_o was not rejected in the case of SVC, LR, and KNN, which means the proposed CNN has no statistical significance over these models.
Second, the t-test was performed on the performance results of all the models when these models were tested on the independent validation dataset. In this scenario, H_o was rejected in comparison to all the other models. This implies that the CNN is statistically significant over all the given models in this testing scenario.
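The decision rule used in such a test can be sketched with the Python standard library. The text does not state which t-test variant or which score samples were used, so this sketch assumes a paired two-sided t-test on ten per-run accuracy scores; all score values below are hypothetical and serve only to show the mechanics. The critical value 2.262 is the two-sided 5% threshold of the t distribution with 9 degrees of freedom.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic and degrees of freedom for two score lists."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical accuracy scores from ten runs of two models.
cnn_scores = [0.99, 1.00, 0.99, 1.00, 0.98, 0.99, 1.00, 0.99, 1.00, 0.99]
other_scores = [0.90, 0.93, 0.91, 0.92, 0.89, 0.94, 0.90, 0.92, 0.91, 0.93]

T_CRITICAL_DF9 = 2.262  # two-sided, alpha = 0.05, 9 degrees of freedom
t_stat, dof = paired_t_statistic(cnn_scores, other_scores)
reject_null = abs(t_stat) > T_CRITICAL_DF9
print(f"t = {t_stat:.2f}, df = {dof}, reject H_o: {reject_null}")
```

When the absolute t statistic exceeds the critical value, the null hypothesis of no performance difference is rejected; otherwise it is not rejected.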

Conclusion
This study has been conducted to demonstrate the effective and accurate diagnosis of COVID-19 using a CNN trained on chest X-ray image datasets. The model training was performed incrementally with different datasets to attain the maximum accuracy and performance. The primary dataset was very limited in size and also imbalanced in terms of class distribution. These two issues with the primary dataset badly affected the performance of the models. To overcome these issues, the dataset was preprocessed using different techniques, including dataset balancing, manual analysis of X-ray images by medical experts, and data augmentation. To balance the dataset for model training and also to test its performance parameters, an ample number of chest X-rays were collected from different available sources. After training and testing the CNN model on the fully processed dataset, the performance results have been reported. In addition, to further test the model performance, particularly the accuracy, the proposed CNN model has been tested using an independent dataset obtained from IEEE DataPort [6] as an independent validation and real-world test. As reported in the results, the proposed CNN model has shown highly promising results in both testing scenarios. Since this study uses an incremental approach in training the model using different sizes and types of datasets, the approach confirmed that CNN models require an ample amount of image data for efficient and more accurate classification. Data augmentation techniques are very effective in significantly improving CNN model performance by generating more data from an existing limited-size dataset and also by giving the CNN the ability of invariance. The number of convolutional layers of the proposed CNN model was also decided with an incremental approach; that is, in the first increment, only one convolutional layer was used and then, on the basis of the model performance metrics, one layer was added in each increment until the model reached a stable and efficient stage in terms of its performance. The final version of the CNN consisted of six convolutional layers. A comparative analysis has also been done to further test the scope of the proposed CNN model through performance comparisons with some prominent machine learning models, namely RF, GBM, SVC, LR, and KNN.
The results show that the proposed CNN outperformed all the models, particularly when each model was tested on the independent validation dataset. Considering the significant effect of data augmentation techniques on model performance, the authors are currently working on the application of other state-of-the-art data augmentation algorithms and techniques. In the future, the results obtained from the study of the applicability of these modern data augmentation techniques in different application domains will be published.

Data Availability
The data supporting this study are from previously reported studies and datasets, which have been cited.
Figure 6 shows the model accuracy during training and validation as a graph, where the curve drawn in blue shows the training accuracy of the CNN and the curve in orange shows the validation accuracy. The training accuracy of the CNN according to Figure 6 remains consistent after 5 epochs, and the CNN also shows a consistent validation accuracy after 25 epochs. The plot in Figure 7 shows the loss during the training and validation of the CNN. The training loss of the CNN is minimum and consistent from the 1st epoch, while the validation loss becomes minimum after 5 epochs and remains consistent until the last epoch. The above results show the efficiency of the CNN model proposed in this study.

Figure 5: Confusion matrix of CNN model for primary/GitHub dataset. Here, 0 represents the normal class and 1 represents the COVID-19 class.

Figure 7: Training and testing loss plot by the CNN model.

Figure 11: Accuracy score comparison with different numbers of CNN layers.

Figure 12: Accuracy of different machine learning models along with the proposed CNN model. (a) Test data. (b) Independent validation data.

Table 1: Dataset image count for training and testing.
Normal X-ray images for testing without labels
As mentioned in the proposed architecture of the CNN model, it consisted of 38 layers, of which 6 are convolutional layers, 6 max pooling layers, 6 dropout layers, 8 activation function layers, 8 batch normalization layers, 1 flattening layer, and 3 fully connected layers. The CNN model thus achieved an extraordinary performance, with an accuracy of 100% and a precision of 1.0 on the test data subset taken from the processed dataset of this study, with the model parameter values given in Table 3.

Table 4: Model performance on test data.

Table 5: Model performance on independent validation data.

Table 6: Accuracy score with different numbers of CNN layers.

Table 8: Models' performance on test data.

Table 9: Models' performance on independent validation data.

Null Hypothesis (H_o). There is no statistical significance in the performance given by the proposed CNN model over the performance of other models.

Table 10: Comparison with other studies performed on the same dataset.