SARS-CoV-2 Prediction Strategy Based on Classification Algorithms from a Full Blood Examination

A fast and efficient diagnosis of serious infectious diseases, such as the recent SARS-CoV-2, is necessary in order to curb both the spread of existing variants and the emergence of new ones. In this regard, and recognizing the shortcomings of the reverse transcription-polymerase chain reaction (RT-PCR) test and the rapid diagnostic test (RDT), strategic planning in the public health system is required. In particular, researchers need help developing more accurate diagnostic means of distinguishing symptomatic COVID-19 patients from patients with other common infections. The aim of this study was to train and optimize the support vector machine (SVM) and K-nearest neighbors (KNN) classifiers to rapidly identify SARS-CoV-2 (positive/negative) patients through a simple complete blood test without any prior knowledge of the patient's health state or symptoms. After applying both models to a sample of patients at Israelita Albert Einstein Hospital in São Paulo, Brazil (solely for two examined groups of patients' data: “regular ward” and “not admitted to the hospital”), it was found that both provided early and accurate detection, based only on a blood profile selected via the statistical test of dependence (ANOVA test). The best performance was achieved by the improved SVM technique on nonhospitalized patients, with precision, recall, accuracy, and AUC values reaching 94%, 96%, 95%, and 99%, respectively, which supports the potential of this strategy to significantly improve initial screening.


Introduction
The World Health Organization designated SARS-CoV-2 (COVID-19) as a pandemic on March 11, 2020 [1]. The rapid propagation of the disease around the world has increased the need to apply health protection measures. These measures were aimed at solving the problem of overburdened intensive care units, as well as at strengthening and preserving the capacity of hospitals. As a result, many countries have adopted new health approaches and diverse perspectives to prevent the excess spread of the virus in terms of virus vitality within a specific political-economic territory. Examples include the closure of borders and the cancellation of sporting and cultural events. Unfortunately, these decisions have caused economic, social, and environmental disruptions. In addition, they have brought uncertainties and fears to the world economy, education, health, and the fundamental rights of the population.
As of September 30, 2022 [2], more than 622,585,710 cumulative cases of SARS-CoV-2 have been confirmed worldwide, along with more than 6,547,814 deaths in 228 countries and territories. Approximately 40% of cases present with mild disease (cough and fever), 40% with moderate disease (bilateral pneumonia), 15% with severe disease, and 5% with critical disease [3]. The severe consequences of COVID-19 are due to its rapid spread, the inability to make a quick and accurate diagnosis, and the inability to perform large-scale testing of patients. It is therefore crucial to establish rapid and reliable diagnostic methods to detect the disease in real time [4].
Indeed, healthcare is a vast sector that requires the collection, analysis, and processing of medical data, which have recently become impossible due to several factors, such as massive data volumes, the inadequacy of wireless network applications, and security issues [5]. Hence, it is essential to use data mining to find and extract rich information for classification. Medical datasets can be used to precisely detect SARS-CoV-2 infections [6]. However, the primary limiting factor is data processing, which necessitates real-time data collection and the provision of data to researchers for immediate medical response.
In the same vein, artificial intelligence (AI) promises to transform the healthcare sector [7]. Machine learning (ML) and deep-learning algorithms are capable of detecting COVID-19 [8]. In fact, classification is one process by which COVID-19 patients are assigned to their corresponding classes [9]. There are many classification methods, such as the Bayesian method, AdaBoost, random forest, artificial neural networks, and K-nearest neighbors [9].
Much COVID-19 research has focused on how AI can be deployed to detect, confirm, and make forecasts at early stages. The authors in [10] used regression models (CUBIST, RF, RIDGE, SVR, and stacking-ensemble learning); the ARIMA statistical model has also been used in some cases to make similar predictions. The authors in [11] adopted BRNN, KNN, QRF, and SVR as well as the VDM approach coupled with exogenous climate variables to predict confirmed cumulative cases in ten Brazilian states. These predictions were made one, three, and six days in advance. As reported in [12], the SVM model detected and discriminated patients with severe COVID-19 from those with mild symptoms using 28 features based on clinical information and blood/urine test data, with an overall accuracy of 0.8148.
Previously, an efficient scheme was proposed in [13] that uses the available, relevant X-ray images to train a deep neural network and then uses the trained parameters to detect COVID-19 cases even with a very small sample of COVID-19 X-rays. The proposed method provided a very satisfactory detection performance at 97.4% accuracy.
A case report in [14] emphasized the importance of full autopsy in understanding the disease process and identifying potential targets for therapeutic interventions. The authors of the aforementioned study conducted a full autopsy on a confirmed COVID-19 patient in Lagos, Nigeria, providing valuable insights into the pathological features of the disease.
Two studies proposed different machine-learning approaches for addressing COVID-19 challenges. Ribeiro et al. [15] proposed the use of ensemble-learning models coupled with urban mobility information to predict COVID-19 incidence cases. This approach leverages the relationship between human mobility patterns and the spread of the virus to achieve accurate predictions. Another study [16] introduced an equilibrium-based COVID-19 diagnostic method using routine blood tests and a sparse deep convolutional model. This method provides a noninvasive, low-cost, and potentially more accurate alternative to existing diagnostic methods. Da Silva et al. [11] focused on using climatic exogenous variables to forecast COVID-19 cases. This study proposed a novel approach for forecasting Brazilian and American COVID-19 cases based on artificial intelligence coupled with climatic exogenous variables, providing a more holistic approach to COVID-19 prediction.
In [17], the researchers applied AI to identify commercially available medicines that may be effective in treating patients with COVID-19. At the core of their proposed model, they implemented the bidirectional encoder representations from transformers (BERT) framework.
COVID-19 primarily affects the respiratory system. Thus, in [18], the authors presented a fine-tuned model based on a generative adversarial network to detect one of the symptoms of COVID-19 infection from chest X-ray scans. Gunraj et al. [19] applied a convolutional neural network model to detect COVID-19 in patients using chest X-ray images. They used ImageNet pretraining and trained the model on an open-source dataset of X-ray images. Aggarwal et al. [20] reviewed and summarized a number of important research papers on deep learning-based classification of COVID-19 across CXR and CT images. The authors in [21] proposed classifying COVID-19 infection from lung CT slices using a deep learning-based P-shot N-ways Siamese network together with prototypical nearest neighbor classifiers. Another approach, proposed in [22], classifies COVID-19 chest X-ray images from two different datasets (small and large) using a tunable Q-factor wavelet transform (TQWT) based on a memristive crossbar array (MCA). The average accuracy values obtained for the proposed method are 98.82% and 94.64%, respectively.
Together, these studies highlight the potential of different approaches; therefore, it is necessary to construct prediction techniques and innovative applications for frequent diseases, as well as to further expand prediction methodologies. The objective of the present study is to address this need. The main contributions of the proposed work that have not been addressed in the prior art are as follows: (i) We showed that it is possible to predict whether a person is positive or negative for COVID-19 infection in the early stage of the disease, using anonymized data from Israelita Albert Einstein Hospital [23]. The data analysis process consists of two stages: statistical analysis followed by data processing with machine-learning algorithms using SVM and KNN.
This article is structured as follows: Section 2 includes a description of the dataset, the data analysis and data preprocessing for the classification algorithms used, and a detailed description of the materials and methodology. The results of the experiment are presented in Section 3. Section 4 discusses the results, and Section 5 offers conclusions on the prospects for use of this analysis procedure to detect COVID-19.

Dataset.
The data used in this study were obtained from the Kaggle website [23]. Information was retrieved for patients treated at Israelita Albert Einstein Hospital in São Paulo, Brazil, who had samples collected to perform the SARS-CoV-2 RT-PCR and additional laboratory tests between March 28, 2020, and April 3, 2020 [24]. Following international best practices, all data were anonymized. The normalization process resulted in a mean of 0 and a unit standard deviation for all clinical data.
The hospital data consisted of 5,644 individual patients and 111 variables, as presented in Table 1. The patients were classified into four groups: community (not admitted to hospital), regular ward, semi-intensive unit, and intensive care unit (see Table 2).

Data Analysis.
In the dataset provided, the extracted information is divided into columns and rows. Rows are referred to as observations. Each column shows some information about the observations, such as hematocrit, hemoglobin, or platelets. These columns are the features, or predictor variables, of our dataset. The "SARS-CoV-2 exam result" column labels each observation, indicating whether or not the individual is infected with COVID-19; consequently, it is considered the target variable.
The degree of influence that the variables in the dataset have over the target value can be determined by their correlation with the target. As a result, we were able to pinpoint the features that can distinguish an infected patient from a noninfected patient.
During our data mining and analysis of the "blood/target and hospitalization/blood" visualization graphs, we noticed that the monocyte, platelet, leukocyte, and eosinophil levels for infected and noninfected individuals were significantly different (see Figure 1). In addition, the relationship between a patient's hospitalization status and their blood characteristics differed for each hospitalization category (community, regular ward, semi-intensive unit, or intensive care unit) (see Figure 2), which presents the possibility that these variables are related to positive COVID-19 infection. Testing this hypothesis through Student's t-test allowed us to verify that the means of the two distributions (positive versus negative COVID-19 test result) are significantly different for these variables.
Student's t-test results (see Table 3) support our hypothesis that the levels of platelets, monocytes, eosinophils, and leukocytes are significant for predicting SARS-CoV-2 and, therefore, can assist in decision-making.
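As an illustration of the test used above, the following minimal sketch computes the pooled-variance (Student) two-sample t statistic with the Python standard library only. The function name `student_t` and the sample values are hypothetical and are not taken from the hospital dataset.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # The pooled variance assumes both groups share one population variance.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof


# Made-up standardized leukocyte values, for illustration only.
positive = [-0.9, -0.7, -1.1, -0.5, -0.8]
negative = [0.2, 0.4, -0.1, 0.6, 0.3]

t_stat, dof = student_t(positive, negative)
# With 8 degrees of freedom, |t| > 2.306 is significant at the 5% level (two-sided).
print(f"t = {t_stat:.2f}, dof = {dof}")
```

A large absolute t value relative to the critical value for the given degrees of freedom indicates that the two group means differ significantly, which is the criterion applied to each blood variable.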

Data Preprocessing.
Data preprocessing consists of treating, fixing, and preparing data before inputting it for machine learning. The goal is to transform the raw data into a format conducive to the development of a machine-learning model and to clean the dataset as much as possible to improve the performance of the model. For our data, we followed a simple and efficient approach. The dataset consists of columns with continuous and categorical variables. Since the machine-learning model requires that all input data be in numeric form, we coded the target value "SARS-CoV-2 exam result" by assigning 0 for "negative" and 1 for "positive." The hospital data contain 111 columns with 90% missing values (5,046 of the 5,644 results). The dataset is also challenging because no information is provided about the patients except their ages, which makes it difficult to fill in the missing data using precise extrapolation methods. Recovering missing data with the mean value is effective in some cases, but not for a set of medical exam results (sensitive data). For all these reasons, 5,046 of the 5,644 results were excluded from analysis, leaving only 598 cases (81 positives and 517 negatives) containing complete variables for use in the study.
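The two preprocessing steps described above (encoding the target and discarding incomplete records) can be sketched as follows. This is a simplified illustration in plain Python; the rows, the numeric values, and the helper name `preprocess` are hypothetical, and only the target column name and its 0/1 coding come from the text.

```python
TARGET = "SARS-CoV-2 exam result"
LABELS = {"negative": 0, "positive": 1}  # coding described in the text

# Hypothetical records; None marks a missing laboratory value.
rows = [
    {TARGET: "negative", "Hematocrit": 0.24, "Platelets": -0.52},
    {TARGET: "positive", "Hematocrit": None, "Platelets": None},
    {TARGET: "positive", "Hematocrit": -1.57, "Platelets": -1.29},
]


def preprocess(records):
    """Drop records with any missing value, then split features and target."""
    complete = [r for r in records if all(v is not None for v in r.values())]
    features = [[v for name, v in r.items() if name != TARGET] for r in complete]
    target = [LABELS[r[TARGET]] for r in complete]
    return features, target


X, y = preprocess(rows)
print(X, y)
```

Dropping incomplete records rather than imputing them mirrors the choice made in the study, at the cost of a much smaller sample.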
The analysis was performed based on patients' severity according to their hospitalization status. Blood counts were obtained for the community, regular ward, semi-intensive unit, and intensive care unit cohorts (see Table 2). Only patients with a full blood examination and RT-PCR SARS-CoV-2 outcome were included.
To ensure that our prediction is based on early indicators, patients in the semi-intensive and intensive care units were removed from our analysis. In addition, we excluded pathogenic (viral) factors and age from our study.
In this work, we performed feature selection using the SelectKBest transformer together with polynomial features in both groups of the dataset to find the most important variables. Given the result of our statistical test (Table 3), we examine only the blood variables. These variables are used to detect the presence of SARS-CoV-2.
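To make the selection criterion concrete, the sketch below computes a one-way ANOVA F statistic per feature and keeps the top-scoring ones, which is the kind of univariate ranking that scikit-learn's `SelectKBest` performs with its `f_classif` score function. It is a standard-library illustration under stated assumptions: the helper names `anova_f` and `select_k_best` and all values are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def select_k_best(features, k):
    """Rank feature names by F score (highest first) and keep the first k."""
    ranked = sorted(features, key=lambda name: anova_f(features[name]), reverse=True)
    return ranked[:k]


# Made-up standardized values: (negative group, positive group) per feature.
features = {
    "Eosinophils": ([0.5, 0.7, 0.6], [-0.6, -0.5, -0.7]),
    "Hematocrit": ([0.1, -0.2, 0.3], [0.0, 0.2, -0.1]),
}

selected = select_k_best(features, k=1)
print(selected, anova_f(features["Eosinophils"]))
```

With only two classes, the F statistic equals the square of the pooled-variance t statistic, so this ranking is consistent with the t-test used in the data analysis.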

Evaluation Metrics.
The purpose of this study was to accurately predict whether an individual is infected with COVID-19 based on available clinical data. The main issue in this study is the unbalanced classes. Since this is a very sensitive prediction, accuracy alone is not sufficient in the absence of other performance measures. We therefore used a confusion matrix to evaluate the performance of the classification models. Four indicators are derived from the confusion matrix: accuracy, recall, precision, and F1 score (see Table 4) [25]. These indicators are defined as follows.
The terms used in the equations are a, true positive; d, true negative; b, false positive; c, false negative; r, recall; and p, precision.
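The terms listed above map directly onto code. The following minimal sketch uses the standard definitions of the four indicators; the function name `classification_metrics` and the example counts are hypothetical and do not reproduce any table from this study.

```python
def classification_metrics(a, b, c, d):
    """Compute the four indicators from confusion-matrix counts.

    a: true positives, b: false positives, c: false negatives, d: true negatives.
    Assumes at least one actual positive and one predicted positive,
    otherwise recall or precision would divide by zero.
    """
    accuracy = (a + d) / (a + b + c + d)
    r = a / (a + c)  # recall: share of actual positives that were found
    p = a / (a + b)  # precision: share of predicted positives that were right
    f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall
    return {"accuracy": accuracy, "recall": r, "precision": p, "f1": f1}


# Hypothetical counts for one test fold.
metrics = classification_metrics(a=3, b=0, c=1, d=4)
print(metrics)
```

In this example a classifier with no false positives but one missed positive case reaches a precision of 1.0 and a recall of 0.75, which shows why recall is reported separately when missed infections matter.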

Accuracy.
Accuracy is the percentage of all predictions that were accurate.
The formula is

$$\text{Accuracy} = \frac{a + d}{a + b + c + d}.$$

AUC.
AUC is the area beneath the ROC curve, which is a plot of the true positive rate versus the false positive rate. The greater the area under the plotted line, the better the algorithm performs due to its higher sensitivity and specificity. The commonly used metric known as the "area under the ROC curve," or "AUROC," offers an easy approach to compare algorithms.
SVM [27] involves finding a hyperplane whose ideal location is midway between the two classes. The best hyperplane is the one that maximizes the margin between the two classes [28]. The choice of the kernel function is an essential component because a suitable kernel function is imperative for the SVM to acquire learning capability. Therefore, we employ SVM with the radial basis kernel function. Meanwhile, the KNN algorithm is considered a type of lazy learning since it does not require preparation or a training cycle. Because of its simplicity, KNN is one of the ten best-known data-mining algorithms [29]. KNN is highly effective at difficult classification problems and is generally a useful and quick procedure [30], which lends itself to our purpose of saving valuable time for health experts.
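The "lazy learning" character of KNN mentioned above can be seen in a few lines: nothing is fitted in advance, and all computation happens when a query arrives. This is a minimal standard-library sketch with Euclidean distance and majority voting; the function name `knn_predict` and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # No training step: distances to every stored point are computed on demand.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data: class 0 near the origin, class 1 near (5, 5).
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [0.2, 0.1]))
print(knn_predict(train_X, train_y, [5.5, 5.5]))
```

Because the prediction depends on raw distances, features are normally standardized before KNN is applied.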
Therefore, we implemented and regularized the two models as follows (see Figure 3). First, we created a list of models that included SVM and KNN and then submitted all the models to the same evaluation procedure. The algorithms in the list are introduced through a pipeline that includes the steps completed in the preprocessing phase. We named the pipeline containing the polynomial features and SelectKBest transformers the "preprocessor." This preprocessor pipeline is placed upstream of both models. Specifically, we created a pipeline for the SVM model that contains the preprocessor followed by a standardization operation (with the StandardScaler function) and an SVC classifier. We applied the same process to KNN, giving a pipeline containing the preprocessor, StandardScaler, and KNeighborsClassifier. We trained and evaluated both models with their default hyperparameters using our evaluation procedure. (The evaluation function provides training and testing of the models, as well as visualization of the confusion matrix and the learning curve.) Our goal was then to improve the performance of these models by tuning these hyperparameters.
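The pipeline structure described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' code: the data are synthetic (generated with `make_classification`), and the polynomial degree of 2, `k=10`, and the helper name `make_preprocessor` are placeholders chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the blood-count features (NOT the hospital data).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)


def make_preprocessor():
    """Polynomial expansion followed by ANOVA-based univariate selection."""
    return make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),  # assumed degree
        SelectKBest(f_classif, k=10),                      # assumed k
    )


# Each model is the preprocessor, then standardization, then the classifier.
models = {
    "SVM": make_pipeline(make_preprocessor(), StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(make_preprocessor(), StandardScaler(),
                         KNeighborsClassifier()),
}

# The same evaluation procedure for every model: 10-fold stratified CV on recall.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="recall").mean()
    for name, model in models.items()
}
print(scores)
```

Wrapping the preprocessing steps inside each pipeline ensures that feature selection and scaling are refitted on the training folds only, which avoids leaking information from the validation folds.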
Hyperparameter-Tuning Techniques.
The random search technique via the RandomizedSearchCV function enables identification of the best hyperparameters by comparing the performance of each combination using the cross-validation technique. We created a dictionary containing the different hyperparameters (penalty coefficient C, gamma, polynomial feature degree, and SelectKBest k) to be tuned. We passed the SVM model, which is a pipeline, together with the dictionary of hyperparameters to the function RandomizedSearchCV, with recall as the scoring metric and 10-fold cross-validation (cv = 10). We applied the same process to KNN using the RandomizedSearchCV function. It included the KNN model and a dictionary of hyperparameters ("neighbors classifier weights," "neighbors classifier neighbors," "polynomial features degree," and SelectKBest k), again with recall as the scoring metric, cross-validation equal to 10, and the number of iterations fixed at 100.
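A random search of the kind described above might look like the following sketch. It assumes synthetic data and illustrative candidate values, since the ranges actually explored are not listed in the text; the pipeline is written flat (without a nested preprocessor) only to keep the parameter names short, and `n_iter=20` is used here to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT the hospital data).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)

svm = make_pipeline(
    PolynomialFeatures(include_bias=False),
    SelectKBest(f_classif),
    StandardScaler(),
    SVC(kernel="rbf"),
)

# make_pipeline names each step after its lowercased class name, so
# hyperparameters are addressed as "<step>__<parameter>".
# The candidate values below are assumptions made for this example.
param_distributions = {
    "polynomialfeatures__degree": [2, 3],
    "selectkbest__k": [5, 10, 14],
    "svc__C": [0.1, 1, 10, 100, 1000],
    "svc__gamma": [1e-4, 1e-3, 1e-2, 1e-1],
}

search = RandomizedSearchCV(
    svm,
    param_distributions,
    n_iter=20,         # number of sampled combinations (assumed for the example)
    scoring="recall",  # recall is the scoring metric named in the text
    cv=10,
    random_state=0,
)
search.fit(X, y)
best_params, best_recall = search.best_params_, search.best_score_
print(best_params, round(best_recall, 3))
```

The same pattern applies to the KNN pipeline by swapping the classifier and using keys such as `kneighborsclassifier__n_neighbors` and `kneighborsclassifier__weights`.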

Application of SMOTE to the Imbalanced Dataset (Community).
The community dataset included 39 positive and 431 negative patients. Therefore, the data are characterized by a class distribution that is very far from uniform (that is, unbalanced classes), which is a relatively frequent situation in some classifications. More concretely, unbalanced classes generally refer to a classification problem where the classes are not equally distributed. The difficulty of working with unbalanced data classes (defined as positive/negative = 0.09) is that the KNN and SVM models tend to ignore the minority class. A class imbalance increases the difficulty of learning via the classification algorithm. Indeed, the algorithm has few examples of the minority class to learn from. It is therefore biased and produces potentially less robust predictions than if the data were balanced.
The imbalance between the two classes in the community dataset is significant (positive/negative = 0.09), thereby degrading the performance of the defined ML model. Thus, the SMOTE technique [31], a type of data augmentation for the minority class (1) designed to make it similar in size to the majority class (0), is adopted to balance the two classes in the dataset. We used the implementation of SMOTE provided by the Python library imbalanced-learn with its default parameters (k_neighbors = 5, ...); this object is an implementation of [31]. The 10-fold stratified cross-validation technique is again applied and repeated recursively for 10 classes.
Next, we divided the dataset, designating 85% of the data points for training and 15% for testing. Finally, we implemented and evaluated the SVM and KNN models, as shown in Figure 3.
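The study relied on the SMOTE implementation in imbalanced-learn. To show the underlying idea without that dependency, the sketch below is a simplified standard-library stand-in, not that library's implementation: each synthetic sample is placed at a random position on the segment between a minority point and one of its k nearest minority neighbours. The function name `smote_like` and the toy points are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    Assumes `minority` holds at least two points of equal dimension.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6, k=2)
print(new_points)

# For the community group (39 positive, 431 negative), balancing the classes
# would require 431 - 39 = 392 synthetic positive samples.
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class.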

Results
In this section, the effectiveness of the proposed SARS-CoV-2 prediction strategy is evaluated. The proposed SVM and KNN classifiers are evaluated for both the regular ward and community groups to accurately detect SARS-CoV-2 patients. The performance of each implemented model is presented in terms of AUC, accuracy, precision, recall, and F1 score.

Statistical Analysis.
Polynomial features and SelectKBest provide more information about the most important variables. SelectKBest selects the 14 variables with the highest statistical test scores of dependence (ANOVA test) with the target. These variables are the most significant for predictive purposes. The dependency test analysis of the main variables according to the SelectKBest transformer for the patients in the regular ward (see Table 5) and the community group revealed a recognizable ANOVA test score. It should be noted that, for the regular ward data, the four best scores were obtained for eosinophils, followed by red blood cells, hemoglobin, and leukocytes (see Table 6). Community patients with SARS-CoV-2 have high scores on leukocyte, monocyte, platelet, and eosinophil parameters.

Results for Patients Admitted to the Regular Ward.
SVM, run using the default settings, yields an accuracy of 89%, a recall of 75%, a precision of 1 (100%), and an F1 score of 86% for class 1 (patients testing positive) on 10-fold stratified cross-validation, as shown in Table 7. Receiver operating characteristic (ROC) curves were plotted for the 10 folds, along with area under the curve (AUC) values for all folds (Figure 4). After the model was improved through optimization of its parameters via a random search, we obtained almost the same values for the model metrics.
The metric evaluations of the model are presented in Figure 5. The confusion matrix and the learning and validation curve are illustrated in Figures 6 and 7.
The implementation of KNN with default parameters yields an average AUC of 84% on 10-fold stratified cross-validation, an accuracy of 78%, a recall of 75%, a precision of 75%, and an F1 score of 75% for the class 1 patients. After tuning of the hyperparameters, the AUC improved remarkably, from 84% to 91%, while the other metrics remained almost the same (Figure 8); the AUC score is equal to 0.91 ± 0.11 (see Figure 9). The results of the two classifiers are summarized in Tables 7 and 8.

Implementation Results for the Community Dataset.
The SVM model defined for the data on patients not admitted to hospital, using default parameters, yields an average accuracy of 85%, a recall of 81%, a precision of 89%, an F1 score of 84%, and an AUC of 99% for class 1 on 10-fold stratified cross-validation (see Table 9). The ROC curves for all 10 folds produce an average AUC of 0.99 ± 0.00. Meanwhile, the optimized SVM results yield an average AUC of 99% (see Figure 10), an accuracy of 95%, a recall of 96%, a precision of 94%, and an F1 score of 95% (see Table 10 and Figure 11). The confusion matrix, learning curve, and validation curve are illustrated in Figures 12 and 13.
For the KNN model, after parameter tuning and 10-fold stratified cross-validation are applied, the mean AUC score is 0.99 ± 0.10 (see Figure 14), accuracy is 90%, recall is 91%, precision is 89%, and the F1 score is 91% (see Table 10 and Figure 15). The results of the two classifiers are summarized in Tables 9 and 10.

Discussion
The complete dataset included 5,644 patients tested between March 28, 2020, and April 3, 2020, of which 598 complete blood count results were used for statistical analysis. The remaining 5,046 results were omitted because of incomplete blood count data. Despite the constraint of the small sample size, our goal of identifying patients with COVID-19 infection was achieved with an accuracy of 95% using an SVM classifier. By developing different classification models (SVM and KNN) with 598 patients, we predicted SARS-CoV-2 infection with an AUC of up to 0.99 for nonhospitalized patients and 0.92 for regular ward patients, using only standardized complete blood count data. The finding that SARS-CoV-2-positive and negative cases can be classified using biological features at an early stage of the disease has important implications. The study was performed on a dataset organized by the patients' hospitalization status. We excluded patients in the semi-intensive and intensive care units from our analysis in order to base predictions of COVID-19 test results on indicators of the disease in its early phase.
The symptoms of COVID-19 are often accompanied by an immune response [32]. Therefore, hyperactivity of blood parameters is noticed in all stages whenever an infection exists [33]. Indeed, several scientific reports have confirmed this hypothesis. The researchers in these works used similar predictive models based on blood parameters, suggesting elevated levels of some of these parameters; for example, an elevated level of eosinophilia could be a potential diagnostic indicator [34]. Indeed, the value of this marker has been identified in cerebrovascular pathologies and during coronary bypass surgery. The previous finding of an elevated neutrophil/lymphocyte count seems to be a relevant marker in the diagnosis of COVID-19 [35], as is the case in our study (see the tables of the predictive variables for each dataset).
In this study, we examined the evolution of blood parameters in all the patients in a particular unit as exploratory analyses using ratios of diagnostic blood characteristics. However, the fact that these characteristics may be related to other pathogens and viral diseases is a potential limitation of the proposed method. Indeed, previous studies have shown that MERS increases monocytes [36]. SARS also directly infects monocytes, which produce cytokines that directly affect neutrophils [37]. Both infections, then, produce similar effects on blood activity related to humanitarian reactions. At the same time, this study indicates the immediate relation of the pathogenesis of COVID-19 to monocytes and neutrophils as shown in the dependency test score results (ANOVA test) (see Tables 5 and 6).
However, these parameters are often significant depending on the results of the statistical test performed, and we have initiated research to identify feature scores that distinguish SARS-CoV-2 with a preprocessor that embraces both polynomial features and the SelectKBest transformer in both groups of the dataset to find the most significant variables (see score table). The variable selection method used shows, on the one hand, the utility of data mining in extracting altered information from key features for classification should a future strain of coronavirus emerge, which remains a risk and danger facing humanity. On the other hand, the collection, analysis, and processing of medical data must be of interest to the health sector.

Conclusion
Our model and all artificial intelligence-based predictive models related to the healthcare sector rely on medical data. The use of machine learning (ML) is important for processing patient data to guide effective control and treatment strategies for the pandemic. The main element in constructing an AI-based predictive model is information. Therefore, the availability of and access to such data are crucial for the development of similar studies. The study will also be further adapted to address the lack of information and collected data in the medical field to facilitate the task of direct detection of SARS-CoV-2 in hospitals and medical testing laboratories. An automated medical diagnosis that reduces costs for healthcare institutions is very important, especially when quick decisions are necessary to isolate infected patients and provide prompt treatment. Direct contact with infected patients may threaten doctors and caregivers with illness or even death. To overcome this global and dangerous challenge, it is essential to analyze patient data at health facilities and detect the disease immediately, with accuracy, and within the shortest possible time frame. Future work will focus on creating a pipeline that combines AI-based predictive models with these types of complete blood counts and healthcare data processing models. These models will then be included in applications that will help in the development of mobile healthcare. Therefore, ML can provide a step toward a semiautonomous, expeditious diagnostic system that would be useful in combating a future pandemic situation and would offer tremendous opportunities to harmonize with sustainable development goals.

Figure 1: Visualization of blood/target. The plots show the variation curves of individual parameters categorized according to whether the patient tested positive (blue curve) or negative (yellow curve) on the RT-PCR test for SARS-CoV-2. These plots indicate a statistically significant difference between the two curves (positive-negative). In particular, leukocytes, eosinophils, monocytes, and platelets seem to have different variability across the two classes (negative-positive).

Figure 2: Visualization of hospitalization/blood. Monocytes, eosinophils, leukocytes, and platelets seem to have different variability between COVID-19-positive and negative patients. In addition, the levels of these parameters vary according to patients' hospitalization status.

Figure 4: The ROC curve and AUC values of the SVM model for regular ward patients. The ROC curve shows the true-positive rate versus the false-positive rate. Comparing AUC values reveals that the ROC curve has greater AUC and thus indicates better overall performance. Generally, the higher the AUC, the better the model performance.

Figure 7: Learning and validation curve of the SVM model for regular ward patients.

Figure 8: The metric evaluations of the KNN model for regular ward patients.
SVM and KNN are very robust in analyzing data with two classes (positive or negative).

Figure 10: The ROC curve and AUC values of the SVM model for community patients. Comparing AUC values for algorithm simulation cases (Figures 6 and 7) shows that the ROC curve for the SVM model with community patients has greater AUC and, thus, indicates better model performance.

Figure 11: The metrics of the SVM model for community patients.

Figure 13: Learning and validation curves of the SVM model for community patients.

Figure 14: The ROC curve and AUC values of the KNN model for community patients.

Figure 15: The metric evaluations of the KNN model for community patients.

Table 2: Albert Einstein hospital dataset groups.

Table 3: Student t-test results.
The purpose of this work was to predict COVID-19 infections from blood features using SVM and KNN classifiers. After preprocessing and feature selection, we applied the two models to the bloodwork results for patients with and without COVID-19 who either were not hospitalized (39 tested positive for SARS-CoV-2 and 431 negative) or were admitted to the regular ward (26 tested positive for SARS-CoV-2 and 31 negative). A common supervised-learning technique used in regression and classification is SVM.

Table 5: The 14 variables (14 different blood counts) in the construction of the predictive model (regular ward data), ranked according to the best dependency test scores (ANOVA test) with the target (SARS-CoV-2).

Table 6: The 14 variables (14 different blood counts) in the construction of the predictive model (community data), ranked according to the best dependency test scores (ANOVA test) with the target (SARS-CoV-2).

Table 7: Evaluation results for model predictions with the regular ward data group (patients testing positive for SARS-CoV-2) using default parameters.
Age and pathogenic (viral) variables were excluded from our study so that predictions rely on indicators of the disease in its early phase. The strategy we have developed can provide a reliable and rapid SARS-CoV-2 diagnosis. Applying the KNN and SVM algorithms to both groups of patient data has shown that the SVM algorithm applied to community patients with optimization and the SMOTE technique offers the most accurate predictions. This enhanced SVM technique provides precision, recall, accuracy, and AUC values that reach 0.94, 0.96, 0.95, and 0.99, respectively. KNN optimized over community patient data after the SMOTE technique is applied also produces accurate results. However, for the regular ward data, both classifiers retain almost identical metrics regardless of the optimization

Table 8: Evaluation results for model predictions with the regular ward data group (patients testing positive for SARS-CoV-2) after tuning parameters.

Table 9: Evaluation results for model predictions with the community data group (patients testing positive for SARS-CoV-2) using default parameters.
formation for the significant variables of the regular ward patients as well (see scoring table). Overall, this underscores the difficulty of interpreting standardized data of low registration. Both classifiers can be used as an improved algorithm to perform SARS-CoV-2 prediction for new data.

Table 10: Evaluation results for model predictions with the community data group (patients testing positive for SARS-CoV-2) after tuning parameters.