Machine Learning Approaches to Predict Patient’s Length of Stay in Emergency Department



Introduction
In healthcare systems, the emergency department (ED) plays a vital role, as it provides emergency services to patients who report to this department during their stay. According to [1], length of stay (LOS) can be defined as the time interval from a patient's arrival at the ED until the patient leaves the ED (the total hospitalization time). The waiting time includes all times for triage, testing, obtaining test results, and waiting for the doctor and nursing assessment. The pandemic significantly affected the number of emergency cases, both for COVID-19 and for other medical reasons. A mathematical model for estimating the probable outbreak size of COVID-19 clusters as a function of time was presented by [2]. This surge exhausts hospital resources such as staff members, medical equipment, and beds [3], significantly affecting patients' waiting time to receive the required medical assistance. It also increases the risk to patients' lives when healthcare systems lack the capacity to handle the growing number of cases. Infection risk rises with lengthy waiting times and with clustering inside a closed environment such as the ED. [4] studied the effect of nonpharmaceutical interventions and clustering on the number of infections inside the ED using agent-based simulation. Therefore, classifying and then correctly predicting a patient's length of stay would enable hospital officials to manage the resources of their departments more effectively.
This research aims to determine the essential factors, represented by predictor variables, influencing patients' LOS in the ED during the COVID-19 outbreak. Accurate prediction of ED LOS is crucial for several reasons. Firstly, ED LOS is a critical metric in healthcare, as it directly impacts patient care and resource management. Excessive LOS can lead to delays in treatment, potentially compromising patient outcomes. It also affects the overall efficiency of the ED, as prolonged stays can lead to overcrowding and strain on resources. Secondly, the definition of excessive LOS may vary for different patients and conditions. Understanding what constitutes excessive LOS for specific cases is vital for timely and effective care.
Furthermore, prolonged LOS can have significant implications for patients and the department. For patients, it may result in increased discomfort, stress, and dissatisfaction with their healthcare experience. For the ED, it can lead to decreased throughput, increased operational costs, and challenges in managing patient flow.
Currently, ED LOS is used as a critical performance indicator for ED management and resource allocation. Hospitals rely on this metric to assess their ability to meet patient demand and make informed decisions about staffing, bed availability, and resource distribution.
Therefore, this research aims to determine the critical factors, i.e., the predictor variables, that predict the outcome, the length of stay. To this end, patients' length of stay in EDs will be categorized by duration (low, medium, and high) using supervised machine learning (ML) approaches. The purpose is to determine the significant factors in predicting ED LOS accurately, enabling healthcare systems to proactively address the factors contributing to prolonged LOS and thus design interventions that could reduce LOS, enhance the overall patient experience, and optimize resource allocation based on those significant factors. By doing so, we contribute to more effective healthcare service delivery, particularly during pandemics such as the COVID-19 outbreak, when the demands on the ED are especially pronounced.
The rest of this paper is organized as follows: the first section presents related studies and discusses the knowledge gap covered in this work. The primary AI framework, with details about the ML model architecture, is described in the methodology section. Then, a case study based in Jordan is presented. After that, the results and discussion section discusses the study results and comparisons. Finally, conclusions and future work are given.

Literature Review
This section reviews the applications of machine learning approaches in predicting patients' LOS in hospitals, especially in EDs, before and after the COVID-19 outbreak.
The work in [5] focused on the effect of prolonged LOS in hospitals on poor functional outcomes and hospital-acquired infections. Thus, it is critical to focus on predicting and reducing LOS in hospitals, specifically in the ED. For example, [6] studied the impact of delirium on patients' LOS in the ICU and hospital. A prediction model based on a light gradient boosting machine for indoor patients was developed by [7]. The work in [8] addressed the idea that healthcare services might benefit from new technologies like artificial intelligence (AI), big data and machine learning, and the Internet of Things (IoT) to fight COVID-19 (coronavirus) and other pandemics. The authors in [9] highlighted how AI and other factors can be incorporated into a model to predict patients' length of stay. These improved information systems will facilitate hospital EDs' services and reduce the overcrowding of patients in these departments.
The authors in [10] applied AI algorithms and data mining tools, including logistic regression (LR), decision trees (DT), and gradient boosted machines (GBM), to predict hospital admissions with patient data collected from the ED. To reduce hospital LOS, automated patient discharge predictions were presented and incorporated by [11], yielding a reduction of over 12 hours in the LOS of some hospital units. The authors in [12] built an artificial neural network (ANN) to predict the length of stay and need for postacute care for coronary syndrome patients. The proposed ANN consists of four layers: an input layer, two hidden layers, and one output layer. Because LOS affects hospital resources and staffing, accurately predicting it is an essential step for healthcare providers, insurance companies, and medical teams. The authors in [13] used general admission features to predict LOS accurately. Several ML models were used: neural networks (NN), classification trees (CT), tree bagger (TB), Random Forest (RF), fuzzy logic (FL), support vector machine (SVM), K-nearest neighbor (KNN), regression tree (RT), and Naive Bayes (NB). The best result, 90.04% accuracy, was obtained with the CT model.
The authors in [14] investigated the feasibility of using artificial neural network ensembles to predict ED disposition for infants and toddlers with bronchiolitis and their length of stay. The authors in [15] adopted artificial neural networks and genetic algorithms to predict renal colic in EDs. Machine learning classification techniques were of high interest to researchers during the COVID-19 pandemic; for example, [16] implemented machine learning classifiers to classify the mortality of people with underlying health conditions. The authors in [17] aimed at forecasting patients' length of stay using an artificial neural network (ANN) with predictive input factors such as patient age, gender, mode of arrival, treatment unit, medical tests, and the needed inspection in the ED. This method can also provide insights that help ED medical staff estimate the patient's length of stay. The authors in [18] applied an established Random Forest (RF) algorithm to rank variables according to the power of AI and machine learning over clinical scores in predicting inpatient mortality for ED sepsis patients. The authors in [19] examined the factors that might influence the ED length of stay for older patients. Factors that affect LOS in the ICU were investigated by [20].

Applied Computational Intelligence and Soft Computing

A study by [21], conducted in a diverse urban hospital, found that a machine learning model, gradient boosting, accurately predicted the length of stay in the ED for COVID-19 patients based on clinical factors, aiding resource planning and informing patients about expected waiting times. Another work, [22], analyzed electronic health records (EHR) of COVID-19 patients to predict infection severity based on the length of stay, using oversampled data and an artificial neural network (ANN) with optimized hyperparameters, and selected the model with the highest F1 score for evaluation and discussion. The authors in [23] developed and validated a decision tree model to predict which patients would have an ED LOS of more than 4 hours. The model identified key risk factors, such as waiting for specific consultations, that give health managers insight for targeted interventions, and the authors suggested the potential utility of displaying risk in real time at the point of care.
Although the above studies investigated how to estimate patients' length of stay in EDs, the impact of the COVID-19 outbreak and other patients on the LOS of patients in EDs has not yet been investigated. In addition, this work focuses on the critical factors affecting the LOS. Machine learning algorithms were used to identify the predictor variables that are crucial in determining and classifying the LOS of patients in the ED. These algorithms were selected because the input variables are of different natures (gender, insurance, triage level, etc.) and the type of relationship between these variables and the LOS is not known in advance. As the literature above shows, machine learning has proved efficient in handling the complexity inherent in this kind of problem.

The Proposed Prediction-Classification Framework.
This section presents the prediction model development framework. A conceptual overview is given in Figure 1. The first step is to determine the input attributes and collect related data. Details of the input attributes, definitions, and types of each attribute are summarized in Table 1.
Figure 1 shows that an unsupervised algorithm was applied first, followed by supervised algorithms. The unsupervised algorithm clustered LOS times into range categories; once the categories had been generated, the supervised algorithms were trained to predict the correct range category. The labeled data (inputs and output) were used to learn the pattern and classify the LOS.

Data Acquisition and Analysis.
The LOS of patients at the ED represents the total time a patient spends in the ED before going home or being admitted to further healthcare services in other hospital departments. The ED process starts with patients' arrival and ends with their departure. The patient might need to go through several activities, each consuming a specific amount of time that contributes to the total LOS at the ED. The LOS is depicted schematically in Figure 2.
Figure 2 shows that the time spent in the ED starts with the patient's arrival, either by ambulance or as an ambulatory case. Then, the patient is checked in at the reception by providing information, including the mode of arrival, date, day, gender, insurance, and age. After check-in, medical care starts with immediate treatment for urgent cases. Depending on case urgency, the triage level is determined to assess the next level of needed care. The medical staff then decide on all required tests and imaging, which proceed in parallel with the medication. The final step is the consultation before leaving the ED. The workload (staff) is assumed to be constant.

Data Collection.
The dataset used in this study was collected from the hospital's records. The data covers two years, 2019 and 2020. A sample of data for the busiest days of each month was collected. These days are the 1st, 2nd, 9th, 12th, 13th, 15th, 18th, 22nd, 23rd, and 28th of each month from January 2019 until October 2020. The final dataset contains a total of 400 randomly selected patients' records. Patient privacy is critical, so we obtained consent to collect raw data without patient identification information. Patients were not interviewed or asked about this data; the research team reviewed historical records from the hospital database under the supervision of the records officer, with patient identification information masked. The hospital management granted the research team access to the data with consent to use the anonymous records solely for research purposes. Data were collected from emergency forms, with categorical and numerical types, for input to the ML model. Forty-two attribute data points were collected for the randomly selected 400 patients. Table 1 shows dataset definitions and details. The last attribute (LOS) is the response (output) we want to estimate.
The ED process starts with patients' arrival and ends with their departure. After the patient has arrived, check-in is carried out. In this process, the receptionist gives the patient an ID number and records the date, day, arrival time, gender, insurance information, and patient age. Immediately after check-in, treatment begins with a nurse assessing the urgency of the patient's case to assign the right triage level. Then, a physician cares for the patient: starting the medication process, prescribing all tests required for a correct diagnosis, and giving the proper medication and consultation. When the medication process ends, the patient leaves the ED or is admitted to the hospital; the LOS is calculated at this point. The data and inputs handled in this research are shown in detail, with definitions, in Table 1.

Data Preprocessing and Transformation.
Data preprocessing is an essential step in machine learning. Real-world data is often incomplete, inconsistent, or lacking in specific ways and is likely to contain many errors; resolving these issues is the researcher's role. The phrase "garbage in, garbage out" is particularly apt for data mining and machine learning projects, and it underlines the importance of data preprocessing. Preprocessing is a data mining technique that cleans and transforms raw data into a form the model can accept to produce its output. It includes cleaning, instance selection, normalization, transformation, feature extraction, encoding of categorical values, sampling, etc. [24]. Cleaned data were divided into training and testing sets.
Samples in the raw data that were considered outliers were removed, including patients who died in the ED, infants less than 1 year old (because they follow different procedures), patients who left without being seen, and incomplete records. Also, qualitative attributes were encoded as quantitative data. Tables 2 and 3 show some descriptive measures of the numerical and categorical variables, respectively.
The LOS is, on average, 68.1 minutes with a standard deviation of 49.6 minutes (see Table 2); this indicates a significant number of cases that take more than 100 minutes. This time is considered high for two main reasons. First, patients visiting the ED are, in most cases, in need of immediate service, even if the case is not life-threatening. Second, in pandemics like COVID-19, a high waiting time means a large queue, and, as is already well established, crowding is the primary factor for virus transmission and, thus, infection [4].
From a simple management perspective, the data in Table 3 can be divided into two main categories: controlled and uncontrolled. The controlled variables are those we can decide in advance, while the others are collected and observed as a consequence of the controlled ones. We tried to distribute the controlled data uniformly. The output is reported for the two categorization schemes tested; this will be discussed later.
In this research, a LOS prediction model was developed to determine the appropriate LOS time range using unsupervised machine learning techniques. Specifically, the data was clustered into five categories using the EM (Expectation-Maximization) algorithm implemented in Weka. The EM algorithm applied unsupervised clustering to group the data based on similarities or patterns. The resulting five categories were defined as follows: Category 1 represented LOS times ranging from 0 to 60 minutes, Category 2 from 61 to 120 minutes, Category 3 from 121 to 180 minutes, Category 4 from 181 to 240 minutes, and Category 5 from 241 to 300 minutes.
By leveraging unsupervised machine learning, this LOS prediction model enabled the classification of data points into the appropriate time ranges. Such an approach provides valuable insights into LOS patterns and facilitates decision-making in various domains, allowing for more effective resource allocation and patient management.

Attribute Correlation Analysis.
Feature selection, also known as variable selection, attribute selection, or variable subset selection, is the process of choosing a subset of relevant features (variables and predictors) for use in model building. We used Correlation Attribute Evaluation to assess the worth of an attribute by measuring the (Pearson's) correlation between it and the class. Nominal attributes are considered on a value-by-value basis by treating each value as an indicator.
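As a rough illustration of this ranking step, the sketch below computes Pearson's correlation between one attribute and the class and shows how a nominal attribute can be expanded into 0/1 indicators, one per value. It is a minimal plain-Python sketch rather than Weka's own implementation; the function names and the toy arrival-mode and LOS values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant attribute carries no linear information
    return cov / math.sqrt(var_x * var_y)


def indicator(values, target):
    """Encode a nominal attribute as 1.0 where it equals `target`, else 0.0."""
    return [1.0 if v == target else 0.0 for v in values]


# Hypothetical toy data: arrival mode (nominal) and LOS category (class).
arrival_mode = ["ambulance", "walk-in", "walk-in", "ambulance", "walk-in"]
los_class = [3, 1, 1, 2, 1]

# One correlation per nominal value, each value treated as an indicator.
for mode in sorted(set(arrival_mode)):
    r = pearson(indicator(arrival_mode, mode), los_class)
    print(f"{mode}: r = {r:+.3f}")
```

Attributes whose coefficients are largest in absolute value would be ranked highest under this scheme.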
Classification.
Artificial intelligence (AI) is the intelligence demonstrated by machines. We used machine learning (ML), a branch of artificial intelligence that enables a model to learn from past data or experiences without being explicitly programmed. Machine learning uses a massive amount of structured and semistructured data, so a machine learning model can generate accurate results or give predictions based on that data. It can be divided into three types: supervised learning, reinforcement learning, and unsupervised learning. We use supervised learning, the machine learning task of learning a function that maps an input to an output based on previous cases (input-output pairs).
Unsupervised learning is a type of machine learning that looks for previously undetected patterns in a dataset with no preexisting labels and with a minimum of human supervision. Two main methods used in unsupervised learning are principal component analysis and cluster analysis. Cluster analysis groups or segments datasets with shared attributes to extrapolate algorithmic relationships; it is the branch of machine learning that groups data that has not been labeled, classified, or categorized [25]. In this research, the LOS categories are formed using an unsupervised procedure (clustering analysis), which groups data into categories based on inherent similarity or a distance measure. Unsupervised learning allows the system maximum flexibility in creating its own classification rules and, ideally, in finding hidden patterns unknown to humans (Ethem Alpaydin, 2014). Our work mainly uses clustering analysis to determine the number of categories. Expectation-Maximization (EM) clustering assigns a probability distribution to each instance, indicating the probability of it belonging to each cluster. EM can decide how many clusters to create by cross-validation, or the number of clusters to generate can be specified (Frank et al., 2017).
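To make the idea of per-instance cluster probabilities concrete, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture in plain Python. It is not Weka's EM implementation: the quantile-based initialization, the fixed iteration count, the variance floor, and all names are assumptions chosen for illustration, and the LOS values are made up.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_cluster_1d(values, k, iterations=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, responsibilities), where
    responsibilities[i][j] is the probability that values[i] belongs
    to cluster j.
    """
    n = len(values)
    ordered = sorted(values)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(values) / n
    overall_var = max(sum((v - overall_mean) ** 2 for v in values) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in values]

    for _ in range(iterations):
        # E-step: probability of each cluster given each observation.
        for i, x in enumerate(values):
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            if total > 0.0:
                resp[i] = [d / total for d in dens]
        # M-step: re-estimate each cluster from its weighted members.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty cluster: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(spread, min_var)
    return weights, means, variances, resp


# Made-up LOS values in minutes, forming two obvious groups.
los_minutes = [10, 12, 11, 13, 200, 205, 198, 202]
weights, means, variances, resp = em_cluster_1d(los_minutes, k=2)
print([round(m, 2) for m in means])
```

Each row of `resp` sums to one, which mirrors the soft assignment described above; a hard category can be read off as the cluster with the largest probability.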
The next step is to implement one of the supervised learning algorithms to predict the right LOS category from the attributes mentioned earlier. Finally, the performance evaluation and validation of the model are illustrated. The details of this step are given in the results and discussion section. First, the main algorithms used are briefly explained:
(i) Logistic Regression (logistic function): A supervised classification algorithm used when the target variable is categorical. In a classification problem, the target variable (or output), y, can take only discrete values for a given set of features (or inputs), X. The model maps a linear combination z of the inputs to a probability using the sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$.
(ii) Naïve Bayes: A classification algorithm for binary (two-class) and multiclass classification problems. The technique is easiest to understand when described using binary or categorical input values. In machine learning, we are often interested in selecting the best hypothesis (h) given data (d). In a classification problem, our hypothesis (h) may be the class to assign for a new data instance (d).
(iii) Random Forest: A machine learning classifier based on choosing random subsets of variables for each tree and using the most frequent tree output as the overall classification. It consists of many individual decision trees that operate as an ensemble. Each tree in the random forest outputs a class prediction, and the class with the most votes becomes the model's prediction.
(iv) Decision Stump: A machine learning model consisting of a one-level decision tree, that is, a decision tree with one internal node (the root) immediately connected to the terminal nodes (its leaves). A decision stump makes a prediction based on the value of just a single input feature. Decision stumps are sometimes also called 1-rules. They are often used as components (called "weak learners" or "base learners") in machine learning ensemble techniques such as bagging and boosting.
The confusion matrix (the primary evaluation method used in this research), also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically for supervised learning. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. It uses two-word terminology: the first word indicates the correctness of the decision, while the second indicates the prediction result. Regarding the model validation, the general layout for a three-category model is summarized in Table 4. The main-diagonal elements are the desired ones. For example, True A means that the model correctly classified A as A, while False A means that the model classified B or C as A, which is incorrect output.
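The confusion-matrix layout described above can be sketched in a few lines of Python. The example follows the convention stated in the text (rows are predicted classes, columns are actual classes); many tools print the transpose, so the orientation should be checked before reading counts. The helper names and the six toy labels are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """Count matrix with one row per predicted class, one column per actual class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


def true_count(matrix, i):
    """Instances of class i that were classified as class i (diagonal cell)."""
    return matrix[i][i]


def false_count(matrix, i):
    """Instances classified as class i that actually belong to another class."""
    return sum(matrix[i]) - matrix[i][i]


# Toy three-category example.
labels = ["A", "B", "C"]
actual = ["A", "A", "B", "B", "C", "C"]
predicted = ["A", "B", "B", "B", "C", "A"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("True A:", true_count(matrix, 0), "False A:", false_count(matrix, 0))
```

In this toy case one instance of C is classified as A, so it is counted under False A, while the single correctly classified A sits on the diagonal as True A.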

Five Categories vs. Three Categories.
The 5-CAT results were unsatisfactory; the best accuracy was 65.75%, obtained with the Naïve Bayes algorithm. Thus, after trying a few other category scenarios, the researchers adopted a 3-CAT classification. The time intervals are divided into three categories labeled 1, 2, and 3, as follows (1: 0-100, 2: 101-200, 3: 201-300 minutes). These correspond to the commonly used human classification of low, medium, and high.
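Both schemes use equal-width ranges (60-minute steps for the 5-CAT, 100-minute steps for the 3-CAT), so mapping a LOS value to its category can be sketched with one small function. This is only an illustration of the stated boundaries; the function name and its parameters are hypothetical.

```python
import math


def los_category(minutes, width, n_categories):
    """Map a LOS in minutes to a 1-based category with equal-width ranges.

    Category 1 covers 0..width, category 2 covers (width, 2*width], and so on.
    """
    if minutes < 0 or minutes > width * n_categories:
        raise ValueError("LOS outside the range covered by the categories")
    if minutes == 0:
        return 1  # zero belongs to the first range
    return math.ceil(minutes / width)


# The mean LOS reported in the text (68.1 minutes) under both schemes.
print("5-CAT:", los_category(68.1, width=60, n_categories=5))
print("3-CAT:", los_category(68.1, width=100, n_categories=3))
```

The same patient can land in different categories under the two schemes, which is one reason the schemes are evaluated separately.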

Validation.
The most critical validation procedure to consider in machine learning is cross-validation, a resampling procedure used to evaluate machine learning models on limited data samples. The procedure has a single parameter, k, which refers to the number of groups into which a given data sample is split. In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset). In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k−1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimate. The advantage of this method over repeated random subsampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, and we use it in our models, but in general, k remains an unfixed parameter [27]. Figure 3 illustrates the concept using k = 5.
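The partitioning described above can be sketched as follows. This is a minimal plain-Python illustration of k-fold index splitting, not the routine used by any particular toolkit; the function name and the fixed seed are assumptions, and the split is not stratified by class.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition of the sample
    folds = [indices[i::k] for i in range(k)]  # k near-equal subsamples
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 400 records and k = 10, as in this study: 360 training and 40 validation per fold.
for fold_number, (training, validation) in enumerate(k_fold_indices(400, 10), start=1):
    print(fold_number, len(training), len(validation))
```

A model would be trained and scored once per fold, and the k scores averaged to obtain the single estimate mentioned above.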

Case Study
To validate the ML-based classification model developed to predict LOS classes before and after the COVID-19 pandemic, a case study was selected and run. This case study was carried out in one of the local hospitals in Jordan. Established in 1994, it has positioned itself among the top medical destinations and leading referral hospitals for local, regional, and international patients. The hospital strives to provide high-quality healthcare to all of its patients.
This hospital includes 14 specialized medical units consisting of 145 inpatient beds and 89 beds covering emergency, resuscitation, operations, newborns, and other outpatient services (a total capacity of 234 beds). It offers a full range of medical and surgical services, covering all specialties. Its ED contains rooms dedicated to pediatric cases and to patients with infectious diseases. The department treats an average of 200 patients per day. There are doctors on call, covering all subspecialties 24/7. This section includes data collection, preparation, and descriptive statistical analysis and presents datasets and definitions.
We used the top-ranked classification algorithms mentioned in the methodology to compare the three-category model with the five-category model using 400 patient records.
Regarding the attribute correlation analysis, attributes with high correlation significantly affect the LOS of the patients. Those attributes with a high correlation will be given more weight in the model. The ED management should focus on these variables when seeking to reduce the LOS. This becomes more important when fighting the spread of viruses, as in the recent COVID-19 pandemic. Table 5 summarizes the correlation analysis.
One of the main levers for reducing LOS is increasing the staff and equipment assigned to the tests most often requested by the doctors. As a result, each patient's time spent in the ED decreases.
Regarding the model evaluation, Table 6 shows the output confusion matrix for the 5-CAT classification, with results for all algorithms implemented in this work for comparison purposes. For example, the logistic algorithm classified 3 data points (LOS of 3 patients) as "a," and they are actually in category "a." On the other hand, it classified 15 as "a," and they are actually "b." The actual number of data points in class "a" = 3 + 16 + 7 + 2 + 0 = 28, of which 3 were correctly predicted. In other words, the diagonal of the confusion matrix represents the correctly predicted classes. For the logistic algorithm, 3 + 56 + 168 + 4 + 3 = 234 out of the 400 records were correctly predicted, resulting in a percentage of 234/400 = 58.5%. More discussion will be given later, when the 3-CAT is introduced.
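The arithmetic above can be reproduced with a short sketch that uses only the counts quoted in the text (the diagonal of the logistic model's matrix and the five counts summed for class "a"). The function and variable names are hypothetical, and the per-class rate is an additional illustration rather than a figure reported in the tables.

```python
def accuracy(diagonal, total):
    """Share of records on the confusion-matrix diagonal (correct predictions)."""
    return sum(diagonal) / total


# Diagonal counts and record total quoted in the text for the logistic model.
logistic_diagonal = [3, 56, 168, 4, 3]
total_records = 400
print(f"Overall accuracy: {accuracy(logistic_diagonal, total_records):.1%}")

# Counts quoted for class "a": 3 correct out of 3 + 16 + 7 + 2 + 0 records.
class_a_counts = [3, 16, 7, 2, 0]
class_a_rate = class_a_counts[0] / sum(class_a_counts)
print(f"Correctly predicted share of class a: {class_a_rate:.1%}")
```

The overall figure hides large differences between classes: only 3 of the 28 class "a" records are predicted correctly, even though the overall accuracy is 58.5%.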
Table 7 compares the five-category and three-category analyses for all algorithms, showing the percentage of correctly classified records for each algorithm under both classification schemes. The REP tree algorithm gave the best performance, with an accuracy of 86.3%, followed by the decision stump with 85.8% accuracy. The main reason the 3-CAT is better than the 5-CAT is the general effect of widening the scoring scale in decision problems: it becomes more difficult to distinguish between categories when their number increases. Thus, accuracy will increase for fewer categories, especially with a small sample size (400 is considered small in these models). The tradeoff between accuracy and informative classification is the primary criterion for selecting three categories and not two. We also compare the model's performance before and after COVID-19. The before and after results of the spread of COVID-19 in Jordan are summarized in Figure 4. The pandemic reduced the model quality. This is mainly due to unusual situations that caused interruptions in healthcare services.

Before and after
Figure 4 shows that all algorithms performed better on the data available before the COVID-19 pandemic. The reduced accuracy of the LOS prediction model after the spread of COVID-19 can be attributed to several factors. One significant factor is the limited data availability during the pandemic compared to the prepandemic period. Most of the data used for training and testing the model was collected before the outbreak. Consequently, the model's performance may have been adversely affected, as it was not explicitly trained on postpandemic patterns. Moreover, there was a notable decline in the number of visits to the ED following the implementation of lockdown measures and mobility restrictions during the pandemic. This decrease in patient volume resulted in a shift in the types and distribution of medical cases encountered in the ED. The reduced likelihood of accidents and infections due to restricted movement further influenced the accuracy of the model's predictions.
This work has some limitations, which might hinder the accuracy of the results if they are not addressed in the future. Among these is the limited data availability during the pandemic compared to the prepandemic period, which might degrade the model's performance if it is not well trained on postpandemic patterns. In addition, this is a single-site study, so its results reflect one hospital's case rather than a general study covering several hospitals. Finally, the small sample size is another limitation that affected the validation and accuracy of the study results.

Conclusion and Future Work
This study aimed to determine the critical factors, i.e., the predictor variables, that predict the length of stay in the ED across three predetermined time range categories (low, medium, and high), utilizing ML algorithms. These categories were determined using unsupervised algorithms and took into account the impact of COVID-19 and various factors associated with the ED process. A case study was conducted in a local healthcare facility. Regression predictive modeling was initially utilized; however, it failed in our case due to the small size of the available data. Thus, classification algorithms were used, which showed high performance in predicting the best LOS category at the ED. The best performance was achieved using tree algorithms (decision stump, REP tree, and Random Forest) and the multilayer perceptron (with batch size 50 and 0.001 learning rate). Two scenarios were tested: the five categories and the three categories. The main reason the 3-CAT is better than the 5-CAT is the general effect of widening the scoring scale in decision problems: it becomes more difficult to distinguish between categories when their number increases. Thus, accuracy will increase for fewer categories, especially with a small sample size (400 is considered small in these models). The tradeoff between accuracy and informative classification is the main criterion for selecting three categories and not two.
As future work, the model can be expanded to include more than one facility and a larger dataset. Other factors might be considered to capture unusual situations and crises like pandemics for more accurate prediction. It is recommended to incorporate pandemic-related factors, such as mobility measures and healthcare service interruptions, into the training and evaluation processes to improve the model's accuracy in the postpandemic period. In addition, considering staff capacity, particularly the impact on nurses, during the initial stages of the pandemic can help mitigate the effects of clustering and improve the model's performance. By accounting for these pandemic-specific factors, the LOS prediction model can better adapt to the changing healthcare landscape and provide more reliable and accurate predictions.

Figure 2: Length of stay breakdown in the emergency department.

Table 1: Input data set attributes, types, and definitions.

Table 2: Descriptive statistical results of numeric variables.

Table 3: Descriptive statistical results of categorical variables.

Table 3: Continued.

ability to predict new data not used in the model development. In this model, 90% of the data were used in training the model, which comes from the pre-COVID dataset, while the rest (10%) of the data were used for testing the model, which includes COVID-19 data in addition to part of the pre-COVID dataset. This is because the main aim is to investigate the critical factors in predicting LOS, and COVID-19 is a temporary issue that is not considered a fundamental element in the model.

Table
Correlation coefficients of the attributes.

Table 6: Confusion matrix for 5-CAT for each algorithm.

Table 7: Comparison between the 3-CAT and 5-CAT for all algorithms.