Machine Learning-Based Models for Prediction of Critical Illness at Community, Paramedic, and Hospital Stages

Overcrowding of emergency departments (EDs) has strained national healthcare systems and adversely affected the clinical outcomes of critically ill patients. Early identification of critically ill patients before ED visits can help optimize patient flow and allocate medical resources effectively. This study aimed to develop machine learning (ML)-based models for predicting critical illness at the community, paramedic, and hospital stages using data from the Korean National Emergency Department Information System (NEDIS). Random forest and light gradient boosting machine (LightGBM) were applied to develop the predictive models. The area under the receiver operating characteristic curve (AUROC) at the community, paramedic, and hospital stages was 0.870 (95% CI: 0.869–0.871), 0.897 (95% CI: 0.896–0.898), and 0.950 (95% CI: 0.949–0.950), respectively, for random forest and 0.877 (95% CI: 0.876–0.878), 0.899 (95% CI: 0.898–0.900), and 0.950 (95% CI: 0.950–0.951), respectively, for LightGBM. The ML models showed high performance in predicting critical illness using the variables available at each stage, which can help guide patients to appropriate hospitals according to the severity of their illness. Furthermore, a simulation model could be developed for the proper allocation of limited medical resources.


Introduction
Emergency department (ED) overcrowding continues to be a problem for hospitals around the world [1][2][3][4][5]. This issue has strained national healthcare systems and ED healthcare workers. Furthermore, ED overcrowding adversely affects the clinical outcomes of critically ill patients [4,5]. When the ED is overcrowded, critically and noncritically ill patients compete for limited medical resources, and as a result, some critically ill patients may not receive proper medical care.
In South Korea, emergency medical institutes are organized into a three-level system: regional emergency medical centers (EMCs), local EMCs, and local emergency medical institutions. Regional and local EMCs provide 24-hour emergency care for critically ill patients, whereas local emergency medical institutions primarily care for noncritically ill patients. Reportedly, more than 75% of annual ED visits in South Korea are made by means other than ambulance (e.g., private vehicle or walk-in), and regardless of the severity of their illness, patients tend to choose a higher-level hospital, such as an EMC, or a nearby hospital [6,7]. Such tendencies and uninformed choices can exacerbate the imbalance between the limited supply of and the overwhelming demand for medical resources.
Therefore, early identification of critically ill patients before ED visits can support the effective allocation of ED resources and prevent negative effects on patient outcomes [2][3][4]. If critical illness can be predicted at the community or paramedic stage, these predictions can help guide patients or paramedics to the appropriate level of emergency medical institute according to the severity of illness [8,9]. Moreover, predicting and monitoring critical illness at the community or paramedic stage is crucial for establishing resource management policies, especially as medical resource requirements continue to increase. Accordingly, this study aimed to develop machine learning-based models to predict critical illness at the community, paramedic, and hospital stages using the variables available at each stage.

Study Population and Data Sources.
This study was conducted using administrative data from the Korean National Emergency Department Information System (NEDIS) (N20190320311). The NEDIS is a nationwide registry launched by the Ministry of Health and Welfare of Korea in 2003. In 2019, a total of 414 EDs throughout Korea participated in the NEDIS: 36 regional EMCs, 118 local EMCs, and 260 local emergency medical institutions. Demographic and clinical information on patients visiting EDs is transmitted to the NEDIS in real time. Because patient information was anonymized and deidentified, the requirement for consent was waived. This study was approved by the Korea University Anam Hospital Institutional Review Board (No. 2019AN0263). Because the study focused on critically ill patients, adult patients (age ≥15 years) who visited EMCs from January 1, 2016 to December 31, 2017 were selected from the NEDIS data as the study population. Patients who were dead on arrival (DOA), those with a Korea Triage and Acuity Scale (KTAS) level of 8 or 9 (others or unknown), and those with missing or invalid values (e.g., vital signs inconsistent with the NEDIS guideline) were excluded.
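A minimal sketch of how the inclusion and exclusion criteria above could be applied to individual visit records. The `EDVisit` record type, its field names, and the systolic blood pressure plausibility bounds are hypothetical placeholders; the NEDIS schema and its validity rules are not described in this paper.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class EDVisit:
    """Hypothetical visit record; field names are illustrative, not NEDIS columns."""
    age: int
    hospital_level: str        # "regional_emc", "local_emc", or "local_institution"
    visit_date: date
    dead_on_arrival: bool
    ktas: Optional[int]        # 1-5 valid; 8 or 9 = others/unknown
    sbp: Optional[float]       # one example vital sign; None if missing

EMC_LEVELS = {"regional_emc", "local_emc"}

def eligible(v: EDVisit, start: date = date(2016, 1, 1), end: date = date(2017, 12, 31)) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    # Inclusion: adults (>= 15 years) visiting regional or local EMCs in the study period.
    if v.age < 15 or v.hospital_level not in EMC_LEVELS:
        return False
    if not start <= v.visit_date <= end:
        return False
    # Exclusion: dead on arrival, or KTAS level 8/9 (others or unknown).
    if v.dead_on_arrival or v.ktas in (8, 9):
        return False
    # Exclusion: missing or invalid values. These bounds are placeholders,
    # not the NEDIS guideline.
    if v.ktas not in (1, 2, 3, 4, 5) or v.sbp is None or not 0 < v.sbp < 300:
        return False
    return True
```

Passing `start=date(2018, 1, 1)` and `end=date(2018, 12, 31)` applies the same criteria to the external validation period.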

Variables and Endpoint.
The study used data from a public database in South Korea, and the variables included in the analysis were basic items collected in the NEDIS data. These variables included patient age, gender, triage level measured by the KTAS, mode of arrival, ED visit date, ED visit time, chief complaint, time from symptom onset to ED arrival, systolic and diastolic blood pressure, pulse rate, respiratory rate, body temperature, mental status (AVPU), and number of diagnostic codes. Variables were selected based on their availability at each stage and their relevance in predicting critical illness. Age, a consistent risk factor for critical illness, is associated with an increased need for ICU admission and higher mortality [10]. Vital signs, such as respiratory rate, systolic blood pressure, and heart rate, are predictive of critical illness and are used in clinical prediction models and triage tools such as the Emergency Severity Index (ESI) [11][12][13][14]. Mental status is a simple assessment with prognostic value in predicting critical illness [15]. Chief complaints such as chest pain, dyspnea, mental change, and hematemesis are associated with poor clinical outcomes [15,16]. Although the date and time of ED visits may not directly affect the severity of illness, they provide contextual information on ED visit patterns that can improve the performance of predictive models [17][18][19].
The emergency medical process was divided into three stages: (1) community, (2) paramedic, and (3) hospital. Variables were assigned to each stage according to their availability. The community stage included age, gender, ED visit date, ED visit time, symptom onset time, chief complaint, and mental status. The paramedic stage additionally included vital signs, because paramedics can measure them. The hospital stage encompassed all variables available in the NEDIS data, adding mode of arrival, KTAS level, and number of diagnostic codes.
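The cumulative stage design can be sketched as nested variable sets. The labels below are illustrative, not NEDIS column names, and in the study the single `chief_complaint` entry actually expands into the 100 binary complaint-group indicators described in the next paragraph.

```python
from typing import Dict, List

# Illustrative variable labels, not NEDIS column names.
COMMUNITY: List[str] = [
    "age", "gender", "visit_date", "visit_time",
    "symptom_onset_to_arrival", "chief_complaint", "mental_status_avpu",
]
VITAL_SIGNS: List[str] = ["sbp", "dbp", "pulse_rate", "respiratory_rate", "body_temperature"]
HOSPITAL_ONLY: List[str] = ["arrival_mode", "ktas_level", "n_diagnostic_codes"]

# Each stage adds the variables that first become available at that stage.
STAGE_FEATURES: Dict[str, List[str]] = {
    "community": COMMUNITY,
    "paramedic": COMMUNITY + VITAL_SIGNS,
    "hospital": COMMUNITY + VITAL_SIGNS + HOSPITAL_ONLY,
}

def select_features(record: Dict[str, object], stage: str) -> Dict[str, object]:
    """Keep only the variables that are available at the given stage."""
    return {k: record[k] for k in STAGE_FEATURES[stage] if k in record}
```

For example, `select_features({"age": 70, "sbp": 90}, "community")` drops the blood pressure and keeps only the age.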
In the NEDIS data, chief complaints are recorded as Unified Medical Language System (UMLS) codes using Korean medical terminology, so that each chief complaint can be mapped to the UMLS Metathesaurus [20]. Because of the wide variety of UMLS codes, a panel of emergency medicine specialists categorized chief complaints into 100 groups. For each ED visit record, 100 binary variables were then defined, each indicating whether the patient's chief complaint belonged to the corresponding group. For diagnostic codes, we used International Classification of Diseases, Tenth Revision (ICD-10) codes from the NEDIS database.
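The grouping step amounts to a multi-hot encoding. This sketch uses a tiny hypothetical term-to-group dictionary built from the "abdominal pain" example in the Discussion; the study's actual mapping from UMLS codes to 100 expert-defined groups is not reproduced here.

```python
from typing import Dict, List

# Hypothetical term-to-group mapping for illustration only. The study mapped
# UMLS-coded chief complaints to 100 groups defined by an expert panel.
TERM_TO_GROUP: Dict[str, str] = {
    "stomach-ache": "abdominal_pain",
    "bellyache": "abdominal_pain",
    "abdominal pain": "abdominal_pain",
    "shortness of breath": "dyspnea",
    "chest pain": "chest_pain",
}
GROUPS: List[str] = sorted(set(TERM_TO_GROUP.values()))  # 100 groups in the study

def encode_complaints(terms: List[str]) -> Dict[str, int]:
    """One binary indicator per complaint group: 1 if any recorded term maps to it."""
    present = {TERM_TO_GROUP[t] for t in terms if t in TERM_TO_GROUP}
    return {g: int(g in present) for g in GROUPS}
```

Synonyms collapse into one indicator: `encode_complaints(["bellyache", "stomach-ache"])` sets only `abdominal_pain` to 1.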
The primary endpoint was critical illness, defined as ICU admission, transfer out due to lack of ICU capacity, death, or hopeless discharge at any point during hospitalization.
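The endpoint is a composite "any of" label. A minimal sketch, assuming hypothetical per-visit outcome flags (the corresponding NEDIS fields are not named in the text):

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    """Hypothetical per-visit outcome flags; field names are illustrative."""
    icu_admission: bool
    transfer_out_no_icu: bool  # transferred out because ICU capacity was lacking
    death: bool
    hopeless_discharge: bool

def critical_illness(o: Outcome) -> int:
    """Composite endpoint: 1 if any component occurred during hospitalization, else 0."""
    return int(o.icu_admission or o.transfer_out_no_icu or o.death or o.hopeless_discharge)
```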

Development of Prediction Models for Critical Illness.
Data preprocessing was performed with R 3.4.1. This study applied two tree-based ensemble algorithms: random forest and light gradient boosting machine (LightGBM). Ensemble algorithms can improve the stability and accuracy of predictions by reducing underfitting (high bias) or overfitting (high variance) on the training data. Ensemble learning methods include bagging (an acronym for bootstrap aggregating) and boosting. Bagging reduces variance by training models on subsets generated by random sampling of the dataset with replacement (i.e., bootstrapping) and aggregating the trained decision trees. Boosting trains hundreds or even thousands of weak trees sequentially, each correcting the errors of the ensemble built so far; it primarily reduces bias and can also reduce variance. Random forest is a classic example of a bagging-based ensemble model: it trains decision trees on bootstrapped datasets and makes final predictions by voting. LightGBM is a boosting-based framework best known for its high performance.
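To make bagging and voting concrete, here is a toy sketch in plain Python that bootstraps a one-feature dataset, fits a decision stump to each resample, and aggregates by majority vote. It is a simplified illustration under those assumptions, not the scikit-learn random forest used in the study (which grows full trees and also subsamples features at each split), and it does not show boosting.

```python
import random
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, int]  # (feature value, binary label)
Stump = Tuple[float, int]  # (threshold, label predicted at or above the threshold)

def fit_stump(data: List[Point]) -> Stump:
    """Choose the threshold and orientation with the fewest training errors."""
    best_errors, best = float("inf"), (0.0, 1)
    for thr, _ in data:
        for above in (0, 1):
            errors = sum((above if x >= thr else 1 - above) != y for x, y in data)
            if errors < best_errors:
                best_errors, best = errors, (thr, above)
    return best

def predict_stump(stump: Stump, x: float) -> int:
    thr, above = stump
    return above if x >= thr else 1 - above

def bagging(data: List[Point], n_models: int = 25, seed: int = 0) -> List[Stump]:
    """Bootstrap aggregating: fit each stump on a resample drawn with replacement."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]

def vote(stumps: List[Stump], x: float) -> int:
    """Aggregate the ensemble's predictions by majority vote."""
    return Counter(predict_stump(s, x) for s in stumps).most_common(1)[0][0]

# Toy data: the label is 1 when x >= 0.5.
data = [(i / 10, int(i >= 5)) for i in range(10)]
ensemble = bagging(data)
print(vote(ensemble, 0.9), vote(ensemble, 0.1))
```

Each stump sees a slightly different resample, so individual thresholds vary; the vote smooths that variation, which is the variance-reduction effect attributed to bagging above.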
Random forest and LightGBM models were built with the scikit-learn (sklearn) and LightGBM packages in Python. The study population was split into a development dataset (70%) and a validation dataset (30%) using stratified random sampling. Performance was measured with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). External validation of the prediction models was performed on ED visits registered in the NEDIS from January 1, 2018 to December 31, 2018 that satisfied the same inclusion and exclusion criteria.
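The split and the two performance measures can be sketched in plain Python, assuming binary labels (1 = critical illness) and model scores as inputs. The AUROC uses its rank interpretation (the probability that a random positive outranks a random negative), and the AUPRC is estimated as average precision, one common step-wise estimator, with tied scores broken by input order. These are illustrations only; scikit-learn's `roc_auc_score` and `average_precision_score` are the usual tools at this data size.

```python
import random
from typing import List, Sequence, Tuple

def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx), preserving the class ratio in both parts."""
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train += idx[:cut]
        test += idx[cut:]
    return train, test

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores higher than random negative); ties count 1/2. O(n^2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits
```

With about 5.77% positives, a random classifier's AUPRC is close to 0.0577 rather than 0.5, which is why AUPRC is a useful companion to AUROC on imbalanced data.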

Variable Importance.
Variable importance was calculated for the random forest models to gain insight into the contribution of each variable to the prediction of critical illness. For each tree, the values of a single variable in the out-of-bag samples are randomly permuted, and the permuted samples are passed down the tree. This process is repeated for every variable of the tree and then for all trees in the forest. The importance of a variable is estimated from how much the model's prediction accuracy decreases when its values are permuted.
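The permutation idea can be sketched in plain Python as below. Two simplifications are assumed: the sketch scores one held-out dataset rather than each tree's out-of-bag samples, and it uses a hypothetical toy model. scikit-learn provides the held-out variant as `sklearn.inspection.permutation_importance`.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], int]

def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    return sum(model(row) == t for row, t in zip(X, y)) / len(y)

def permutation_importance(model: Model, X: List[List[float]], y: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between column j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A variable the model never uses gets an importance of exactly zero, because shuffling it cannot change any prediction.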

Results
A total of 18,217,034 ED visits were recorded in the NEDIS during the study period. Among them, 6,104,816 adult ED visits to regional and local EMCs were identified. Visits with DOA (n = 17,010), KTAS level 8 or 9 (n = 3,490), and missing or invalid values (n = 102,591) were excluded. Overall, 5,981,725 ED visits were included in the study population (Figure 1). Critical illness during hospitalization occurred in approximately 5.77% of ED visits. Patients with critical illness were older, were more likely to be transported by ambulance, had a lower level of mental status, and were assigned more urgent (lower-numbered) KTAS levels (Table 1). The medians and interquartile ranges of vital signs were similar between patients with and without critical illness, whereas the time from symptom onset to ED arrival was shorter in critically ill patients.
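The exclusion counts reported above can be checked with simple arithmetic. The number of critically ill visits computed here is only approximate, because the text gives the rate rounded to 5.77%.

```python
# Cohort flow using the counts reported in the Results.
adult_emc_visits = 6_104_816
excluded = {
    "dead on arrival": 17_010,
    "KTAS level 8 or 9": 3_490,
    "missing or invalid values": 102_591,
}
study_population = adult_emc_visits - sum(excluded.values())
print(study_population)  # 5981725, matching the reported study population

# The rate is reported as "approximately 5.77%", so this count is approximate.
approx_critical = round(study_population * 0.0577)
print(approx_critical)
```

This gives roughly 345,000 critically ill visits in the study population.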
To gain insight into the relevance of each variable, the most important variables of the random forest model at each stage are summarized in Figure 2. In the community stage, age, mental status (AVPU), dyspnea, mental change, chest pain, hematemesis, symptom onset to ED arrival time, abdominal pain, gender, and paralysis were the top 10 most important variables. In the paramedic stage, vital signs ranked near the top and showed higher importance than the chief complaint and symptom onset time variables. In the hospital stage, the number of diagnostic codes was the most important predictor, followed by KTAS level, arrival mode, age, and vital signs such as systolic BP and heart rate.

Table 3 shows recent machine learning studies predicting critical illness in the field of medical triage. All five studies used predictor variables similar to ours, such as age, sex, chief complaints, vital signs, and comorbidities. The choice of models and their performance varied across these studies. Kang … Table S1).

The probability distribution of critical illness and the cumulative number of patients by probability were also analyzed and are shown in Figure 3. The probability distribution of critical illness at the community stage was right-skewed and showed a mixed form of step and linear functions, whereas at the paramedic stage, the linear form was more prominent.

Discussion
In this study, ML models were developed to predict critical illness at the community, paramedic, and hospital stages using a national database. The models demonstrated high predictive power at all stages, even in the community stage, where vital signs and triage scores were not available. The top-ranked variables, such as age, mental status, vital signs, chief complaints, and symptom onset, are consistent with clinical reasoning. For example, in the community stage, chief complaints such as dyspnea, mental change, chest pain, and hematemesis ranked high in importance, and these symptoms are recognized as severe by existing triage tools such as the ESI [12,14]. In the paramedic stage, vital signs were among the top 10 important variables, reflecting their clinical significance. In the hospital stage, the additional factors (number of diagnostic codes, triage level, and arrival mode) are commonly used in risk stratification and clinical decision-making [11][12][13][14]. Age had the highest variable importance in the community and paramedic stages and ranked fourth in the hospital stage, indicating its significance at all stages. Geriatric patients frequently use critical care, and the increasing use of ICU services by geriatric patients in many countries [10] is consistent with our findings. The predictive ML models and variable importance analysis can assist healthcare providers in several ways. First, by continually updating and refining triage protocols based on these insights, healthcare providers can make more accurate and efficient assessments, leading to better patient outcomes. Second, the predictive models can help guide patients to appropriate facilities based on their assessed risk of critical illness, relieving ED overcrowding and optimizing resource allocation.
Third, stage-specific predictions can improve communication among healthcare providers across stages, leading to more effective patient handoffs and care coordination.
A previous study by Christopher et al. predicted critical illness using out-of-hospital variables (e.g., age, sex, RR, SBP, HR, pulse oximetry, mental status, and nursing home location), and their model demonstrated good discriminative capacity (AUROC 0.77 (95% CI: 0.76–0.78)) [23]. However, the model showed substantial calibration errors, overidentifying critical illness among patients judged to be at high risk and underidentifying it among those judged to be at low risk. These errors may stem from limitations of traditional methods such as logistic regression, which, unless interaction terms are explicitly specified, assume that the effect of one predictor does not depend on the value of another. When this assumption does not hold and the value of one predictor alters the effect of another, there is said to be an "interaction" between the two predictors, and such interactions can affect study results or model performance [24,25]. Tree-based ML models can capture interactions between variables without specifying them in advance, and our models performed well at all stages and in external validation. In addition, we categorized chief complaints into 100 groups under the supervision of emergency medicine specialists (e.g., "stomach-ache," "bellyache," and "abdominal pain" were all grouped as abdominal pain) and used these variables in the ML models, with the expectation of improving model performance. Table 3 summarizes relevant machine learning studies predicting critical illness in prehospital settings. Compared with these studies, our ML-based prediction model performed better in the paramedic stage, with an AUC of 0.899 (0.898–0.900), surpassing the best-performing model among the other studies. Furthermore, our model's performance in the community stage, with an AUC of 0.877 (0.876–0.878), was similar to or slightly higher than the AUCs of the other models.
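The interaction argument can be illustrated on synthetic data where the outcome depends only on whether two predictors share a sign. This sketch fits logistic regression by plain gradient descent, once with main effects only and once with an explicit product term; it is an illustration of the concept, not a tree model and not the study's data. A tree ensemble would reach the same result without the product term being specified, through successive splits on each variable.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Logistic regression by batch gradient descent on the mean log-loss.
    Returns weights; the last entry is the intercept."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, row)) + w[-1]
            err = 1.0 / (1.0 + math.exp(-z)) - t
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[-1] += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def linear_score(w, row):
    return sum(wi * xi for wi, xi in zip(w, row)) + w[-1]

def auroc(labels, scores):
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

# Synthetic data with a pure interaction: the outcome depends on whether
# x1 and x2 share a sign, not on either variable alone.
rng = random.Random(1)
pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(300)]
y = [int(a * b > 0) for a, b in pts]

X_main = [[a, b] for a, b in pts]        # main effects only
X_int = [[a, b, a * b] for a, b in pts]  # plus an explicit product term

auc_main = auroc(y, [linear_score(fit_logistic(X_main, y), r) for r in X_main])
auc_int = auroc(y, [linear_score(fit_logistic(X_int, y), r) for r in X_int])
print(f"main effects only: {auc_main:.2f}, with interaction term: {auc_int:.2f}")
```

The main-effects model stays near chance (AUROC around 0.5) because neither predictor is informative on its own, while adding the product term makes the outcome almost perfectly separable.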
Our findings suggest that the model can accurately predict critical illness in the community stage, where vital signs are not available, and for ED visits by nonambulance patients, who constitute 75% of annual ED visits. Our study therefore highlights the value of our models in effectively predicting critical illness in both the paramedic and community stages compared with other studies.

Figure 3 displays the probability distribution of critical illness in the community and paramedic stages, as well as the cumulative number of patients by probability of critical illness. By predicting and monitoring these probabilities and patient numbers in the prehospital stage, healthcare providers can allocate patients to suitable hospitals according to illness severity. A simulation model could be developed and applied to help balance the demand for and supply of medical resources, using a national monitoring system for health resources and service availability. For example, in a society with a probability distribution of critical illness as shown in Figure 3(b), if the … With the increase in critically ill patients from 4% to 6%, hospitals can anticipate a rise of 2 percentage points and proactively prepare the necessary medical equipment, personnel, and ICU beds. If expanding medical resources is not feasible, EMCs might explore alternative strategies. One approach is to accommodate patients with a probability of critical illness of 0.7 or higher, who represent 4% of inpatients, equivalent to the current ICU capacity, while transferring patients with a probability between 0.6 and 0.7 to EMCs in other regions or to lower-level facilities. Another approach is to raise the ICU admission criteria so that patients with a probability of critical illness of 0.7 or higher are admitted to the ICU, while those with a probability below 0.7 are admitted to acute care or general wards.
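The threshold-based strategies described above reduce to counting patients in probability bands. This sketch assumes a list of predicted probabilities (one per patient) and uses the 0.6 and 0.7 cut-offs from the example; the routing tier names are hypothetical, and the probabilities in the usage lines are synthetic, not taken from Figure 3.

```python
from typing import Dict, Iterable, Sequence

def allocation_summary(probs: Sequence[float], icu_cutoff: float = 0.7,
                       lower_cutoff: float = 0.6) -> Dict[str, int]:
    """Count patients in each hypothetical routing tier from predicted probabilities."""
    icu = sum(p >= icu_cutoff for p in probs)
    transfer = sum(lower_cutoff <= p < icu_cutoff for p in probs)
    return {"icu": icu, "transfer_or_ward": transfer, "other": len(probs) - icu - transfer}

def cumulative_at_or_above(probs: Sequence[float], cutoffs: Iterable[float]) -> Dict[float, int]:
    """Cumulative number of patients whose predicted probability is at least each cut-off."""
    return {c: sum(p >= c for p in probs) for c in cutoffs}

# Synthetic probabilities for 100 patients (not study data).
probs = [0.05] * 94 + [0.65] * 2 + [0.75] * 4
print(allocation_summary(probs))                  # {'icu': 4, 'transfer_or_ward': 2, 'other': 94}
print(cumulative_at_or_above(probs, [0.6, 0.7]))  # {0.6: 6, 0.7: 4}
```

In this synthetic case, 6% of patients exceed 0.6 but only 4% exceed 0.7, mirroring the example in which ICU capacity covers the patients above 0.7 and the 0.6 to 0.7 band is redirected.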
Using the simulation model, EMCs can estimate the number of critically ill patients at the prehospital stage and respond with specific figures and goals when an increased demand for medical resources is expected. When the number of critically ill patients suddenly increases because of infectious diseases, as in the COVID-19 pandemic, rapid estimation of the anticipated medical resource demand is essential for hospital preparedness. The simulation model could anticipate such situations at the prehospital stage, enabling emergency medical systems and hospitals to adapt quickly by implementing suitable strategies at each stage [26,27]. At the community level, an effective approach is to reduce medical resource use by noncritically ill patients through temporary screening clinics or residential treatment centers [28,29]. At the paramedic stage, constant communication between hospitals and paramedics about the probability of critical illness and available resources can help optimize patient flow [30]. At the hospital level, surge capacity is crucial for preparedness, and early estimation of increased medical resource demand facilitates effective capacity expansion. Strategies may include increasing hospital beds and expanding ward spaces [31,32], converting general wards to ICUs [33,34], reducing bed occupancy rates by discharging selective admissions and noncritically ill patients in the ED [31,35], and establishing designated hospitals and alternative medical facilities for efficient use of resources and personnel [36,37].

Limitations.
This study has several limitations. First, because we used a national administrative data source, detailed clinical information such as free-text nursing notes, laboratory and ambulatory examinations, patient comorbidities, and relevant patient/family medical history could not be used to develop the predictive models. With high-dimensional or time-series electronic health record (EHR) data, natural language processing (NLP) methods could be explored to extract meaningful information and further improve predictive accuracy [38]. However, existing NLP methods have known limitations, such as transcription inaccuracies (i.e., misinterpreted spoken words) and speaker assignment errors (i.e., diarization errors) [39]. Chief complaint concepts can instead be handled with UMLS codes, which contain a variety of information in a "source of knowledge" format. Thus, machine learning classification using UMLS-coded chief complaints may allow predictive models to achieve high performance [40].
Second, ICU admission was included in the definition of critical illness, but indications for ICU admission may differ between hospitals, even those of the same class. However, since many other studies predicting critical illness also use ICU admission as part of their definition, we consider this definition academically acceptable.
Lastly, our study population exhibited class imbalance, with critically ill patients constituting only 5.77%. Class imbalance can skew the performance of predictive models, as machine learning algorithms tend to favour the majority class. To address this issue, we selected machine learning algorithms such as random forest and LightGBM, which have been reported to handle imbalanced datasets well [41,42], and used ensemble learning techniques such as bagging and boosting to enhance overall model performance [43,44]. It is also important to note that the actual distribution of patients in the ED is inherently imbalanced, and our dataset reflects this distribution. Although techniques such as oversampling or undersampling can mitigate the effects of imbalanced data [45], these methods have limitations and may not always be feasible in real-world settings.
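For reference, random undersampling (which this study did not use) can be sketched as below: it keeps every positive and a random subset of negatives. Two caveats apply. First, resampling should be applied to the development set only, never to validation data. Second, models trained on undersampled data produce inflated predicted probabilities unless they are recalibrated, which matters for probability-based uses such as the simulation discussed above. The function name and the `ratio` parameter are illustrative.

```python
import random
from typing import List, Sequence

def random_undersample(labels: Sequence[int], ratio: float = 1.0, seed: int = 0) -> List[int]:
    """Return sorted indices keeping all positives and about ratio x positives negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(neg), round(ratio * len(pos)))
    return sorted(pos + rng.sample(neg, k))  # sample negatives without replacement
```

With 6 positives among 100 records, `random_undersample(labels)` returns 12 indices (all 6 positives plus 6 negatives), raising the positive share from 6% to 50%.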

Conclusion
The ML models showed high performance in predicting critical illness using the variables available at the community and paramedic stages, which can help guide patients to appropriate hospitals according to the severity of their illness. A simulation model based on monitoring the probability of critical illness and the cumulative number of patients could help healthcare providers respond more efficiently in allocating limited medical resources.

Data Availability
The National Emergency Medical Center (NEMC) of Korea has administrative control over the NEDIS (National Emergency Department Information System) data underlying this study. The NEMC review committee approves research support proposed by researchers and provides deidentified NEDIS data for nonprofit academic research. Any researcher who submits a study objective and plan using the standardized proposal form and is approved by the NEMC review committee on research support can access the raw data. Detailed information on the approval process is available on the NEMC website (https://dw.nemc.or.kr) or by contacting the NEMC review committee (skko@nmc.or.kr). The authors accessed the data used in this study in the same manner expected of other researchers and did not receive special access rights from the NEMC of Korea.

Ethical Approval
In accordance with the policies of the Institutional Review Board of Korea University, ethical approval was not required for this study. The analyses were performed with deidentified data.

Disclosure
The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Conflicts of Interest
The authors declare that they have no conflicts of interest. Chulung Lee and Su Jin Kim received funding from Korea University.