A Machine Learning-Based Study of the Effects of Air Pollution and Weather in Respiratory Disease Patients Visiting Emergency Departments

Background To date, research on respiratory disease patients visiting emergency departments in relation to fine dust has been limited. This study aimed to analyze the effects of two groups of variables, weather and air pollution, on respiratory disease patients who visited emergency departments. Methods This study utilized the National Emergency Department Information System (NEDIS) database. The meteorological data were obtained from the National Climate Data Service. Each weather factor reflected the accumulated data of 4 days: a patient's visit day and 3 days before the visit day. We utilized the RandomForestRegressor of scikit-learn for data analysis. Results The study included 525,579 participants. This study found that multiple variables of weather and air pollution influenced the respiratory diseases of patients who visited emergency departments. Most of the respiratory disease patients had acute upper respiratory infections [J00–J06], influenza [J09–J11], and pneumonia [J12–J18], for which PM10 was the most influential variable after temperature and steam pressure. The top three leading causes of admission to the emergency department, pneumonia [J12–J18], acute upper respiratory infections [J00–J06], and chronic lower respiratory diseases [J40–J47], were highly influenced by PM10. Conclusion Most of the respiratory patients visiting EDs were diagnosed with acute upper respiratory infections, influenza, and pneumonia. Following temperature and steam pressure, PM10 had an influential relation with these diseases. It is expected that the number of respiratory disease patients visiting the emergency departments will increase by day 3 when the steam pressure and temperature values are low, and the variables of air pollution are high.


Introduction
Because of the exacerbation of air pollution, interest in the health effects of fine dust has increased. Fine dust is well known as a group 1 carcinogen. In addition, there have been reports of fine dust-related deaths, paralysis, neuropathy, high blood pressure, and cardiovascular and respiratory diseases [1][2][3][4][5]. According to recent studies, it causes depression and anxiety [1], neurodegenerative diseases including dementia or Parkinson's disease, and skin diseases, and it increases the risk of childhood disorders, such as autism spectrum disorder, developmental disorders [6], asthma, respiratory tract infections, and atopic dermatitis [2][7][8][9]. Nevertheless, there is no relevant research investigating patients visiting the emergency departments (EDs).
Overcrowding in EDs is a global problem and has been addressed as a national crisis in some countries [10]. The medical resources needed in the ED vary according to the severity, type of visit, and the patient's disease. Forecasting emergency medical demand can be a good way to efficiently allocate limited resources [11]. A variety of studies have evaluated the factors influencing the demands for emergency medical service [12]. In particular, previous studies have reported the characteristics of patients visiting EDs and the number of patients according to seasons and weather conditions [13].
Some diseases are sensitive to climate change. Studies have been conducted on the characteristics and number of patients visiting the ED depending on the season and climate. In addition, numerous studies have revealed that weather and air pollution are closely correlated with the development of cardiovascular and respiratory diseases. However, existing studies rarely consider multiple factors together, and studies on the combined impact of weather and air pollution on the demand for respiratory emergency medical resources remain insufficient. Therefore, the data of respiratory disease patients who visited EDs over 3 years were extracted from the national database of EDs and analyzed with a machine learning technique for the complex effect of air pollution, weather, and the characteristics of respiratory disease patients visiting the ED. Based on the analyzed general characteristics (age, gender, diagnosis), the day of ED use and the use of hospital resources were examined. This study will help provide fundamental data for a prediction model of emergency respiratory patient visits related to weather, including air pollution, for patient treatment and the efficient management of limited medical resources.

Materials and Method
This study utilized the National Emergency Department Information System (NEDIS) database; a secondary data analysis was conducted using random forest (RF), a machine learning technique. NEDIS, an ED information network operated by the Ministry of Health and Welfare, is managed by the National Emergency Medical Center [14]. Since the system was launched in 2003, it has collected clinical and administrative data of all patients who visited EDs nationwide. Korea provides national medical insurance, which covers 98% of the Korean population [15]. Therefore, the data collected are highly representative of the population. Emergency medical centers in the country undergo evaluation once a year in order to be approved as official organizations and, as a principle, automatically transmit all the digitalized data for the items requested by the NEDIS. Therefore, the data utilized in this study included all the data from the EDs in Seoul, Korea.

Study Design and Statistical Analysis
Each weather factor reflected the accumulated data of 4 days: a patient's visit day and 3 days before the visit day. The number of explanatory variables corresponding to the response variable Y is therefore 48 (4 × 12). Using weather and air pollution variables (X) such as temperature, the amount of precipitation, and PM2.5, the number (Y) of ED patients who had a particular disease code was estimated. An RF regression model, which can select important variables, was applied.
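The construction of the 48 explanatory variables (12 daily variables, each taken on the visit day and on the 3 preceding days) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the function name `add_lag_features`, the placeholder column names `var0` to `var11`, and the random values are all hypothetical, and the `<variable>_<lag>` naming simply mirrors the 0 to 3 suffix convention used for the variables in this paper.

```python
import numpy as np
import pandas as pd


def add_lag_features(daily: pd.DataFrame, max_lag: int = 3) -> pd.DataFrame:
    """Return one column per (variable, lag) pair, named '<variable>_<lag>'.

    `daily` is assumed to hold one row per consecutive calendar day.
    """
    lagged = {
        f"{name}_{lag}": daily[name].shift(lag)  # lag 0 is the visit day itself
        for name in daily.columns
        for lag in range(max_lag + 1)
    }
    # The first `max_lag` days have no complete history, so they are dropped.
    return pd.DataFrame(lagged, index=daily.index).dropna()


# Hypothetical stand-in for the 12 daily weather and air pollution series.
rng = np.random.default_rng(0)
days = pd.date_range("2014-12-27", "2015-01-31", freq="D")
names = [f"var{i}" for i in range(12)]
daily = pd.DataFrame(rng.normal(size=(len(days), 12)), index=days, columns=names)

X = add_lag_features(daily)
print(X.shape)  # 12 variables x 4 days = 48 columns
```

Starting the weather series a few days before the first visit date, as the study does, ensures that every visit day in the analysis window has a full 3-day history.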
The importance of an explanatory variable that influences a dependent variable was extracted by calculating impurity-based feature importance. We used the code available in the RandomForestRegressor of the scikit-learn package. The Pandas package (version 1.0.0; NumFOCUS, Austin, TX, USA) and the Dask package were used mainly for data preprocessing.
RF, a supervised machine learning technique, combines multiple decision trees. In a conventional decision tree technique, if the number of explanatory variables is large, the number of branches in one decision tree is also large. As a result, overfitting (in which the model fits only the training data well) occurs. To prevent such overfitting, the RF randomly samples a part of the explanatory variables when one decision tree is generated and creates multiple decision trees by sampling the data with replacement. In regression, the values predicted by the individual trees are averaged to give the final prediction value (in classification, the most frequently predicted value is used). In this study, the number of explanatory variables is large, and multicollinearity exists (Figure 1). For this reason, RF was applied rather than a conventional decision tree technique. To evaluate the performance of RF, the out-of-bag (OOB) estimate, which evaluates performance with the roughly 1/3 of the data not drawn at the time of sampling with replacement, was used. The importance of an explanatory variable that influences a dependent variable was extracted. The most predictive features are reported for regressors whose models showed an R² over 0.5.
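The modelling steps described above (bootstrap-sampled trees, out-of-bag evaluation, impurity-based importance, and the R² > 0.5 filter) can be sketched with scikit-learn as follows. This is only an illustration on synthetic data: the data-generating formula, the number of trees, and the random seed are assumptions, since the paper does not report its hyperparameters. Note that scikit-learn's regressor averages the trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (an assumption, not the study's NEDIS data):
# 400 days, 48 lagged features, with the daily count driven mostly by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 48))
y = 50.0 + 10.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=400)

# bootstrap=True (the default) draws each tree's training set with replacement;
# oob_score=True scores every sample using only the trees that did not draw it.
model = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
model.fit(X, y)

oob_r2 = model.oob_score_                 # out-of-bag R^2
importances = model.feature_importances_  # impurity-based, normalized to sum to 1

# Report the ten most important features only if the model clears R^2 > 0.5.
top10 = np.argsort(importances)[::-1][:10]
if oob_r2 > 0.5:
    for idx in top10:
        print(f"feature {idx}: {importances[idx]:.3f}")
```

Impurity-based importances are computed on the training data and can be spread across correlated predictors, so with multicollinear inputs such as lagged weather series the ranking is best read as indicative rather than exact.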

ER Visit Data
Among the patients who had visited emergency medical centers in Seoul within the 36-month period from January 1, 2015, to December 31, 2017, those whose disease classification code (J code; J00-J99) at the time they left the ED was related to respiratory diseases according to the Korean Standard Classification of Diseases (KCD) (based on ICD-10) were selected. The analysis was performed using the first primary diagnosis in the emergency centers. Local emergency medical centers that failed to transmit KTAS (Korean Triage and Acuity Scale) data were excluded from the analysis. The patients whose visit date and time were not recorded were excluded as well.
The age, gender, disease name, and date and time of visit of study patients were utilized. The names of diseases are provided in Appendix 1.

Air Pollution and Weather Data
Fine dust contains many kinds of air pollutants, including heavy metals, ions, organic carbons, and black carbons. According to particle size, particulate matter whose diameter is 10 µm or less is known as PM10, and particulate matter whose diameter is 2.5 µm or less is known as PM2.5 or ultra-fine particulate [16]. In this study, carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), PM10, PM2.5, and sulfur dioxide (SO2) were used as variables.
The corresponding meteorological data were obtained from the National Climate Data Service System as weather variables. Both the automated synoptic observing system (ASOS) data provided by the "meteorological data open portal" of the Korea Meteorological Administration and the fine dust measurement data provided by Air Korea were combined and used based on region [17], date, and time. The weather data of Seoul City were used as reference data, and the maximum number of influence days of disease occurrence was assumed to be 3. Data on the average temperature, amount of precipitation, relative humidity, steam pressure, wind speed, and wind direction provided by the Korea Meteorological Administration were set as weather factors. The distance between a regional emergency medical center in Seoul and an observatory was calculated. The five observatories with the smallest distances were selected. The mean of the values measured in the five observatories was calculated every hour. The mean of all the observatories in the region was also calculated. In this way, the mean value in the region was defined. A missing value was not processed and was left empty. The weather data from December 27, 2014, to December 31, 2017, were obtained. Seasons were classified as spring (March, April, and May); summer (June, July, and August); fall (September, October, and November); and winter (December, January, and February).
(Table 3).
The correlations between six air pollution variables and six weather factors were analyzed, and whether multicollinearity existed was examined (Figure 1). Blue color indicated a negative correlation, while red color indicated a positive correlation. A darker color denoted more correlation between variables. Air pollution variables had positive correlations with each other, while O3 had a negative correlation. Air pollution variables had negative correlations with weather factors (except for O3).
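A correlation screen of the kind described above can be sketched as follows. The paper does not state which correlation coefficient was used, so the Pearson coefficient, the 0.8 flagging threshold, the four column names, and the synthetic relationships between them are all assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in series (assumed relationships, not the study's measurements).
rng = np.random.default_rng(0)
n = 500
temperature = rng.normal(size=n)
pm10 = rng.normal(size=n)
frame = pd.DataFrame(
    {
        "temperature": temperature,
        "steam_pressure": 0.9 * temperature + 0.1 * rng.normal(size=n),
        "pm10": pm10,
        "pm25": 0.8 * pm10 + 0.2 * rng.normal(size=n),
    }
)

# Pairwise Pearson correlation matrix; this is the table a heatmap would color.
corr = frame.corr()

# Flag variable pairs whose absolute correlation exceeds the assumed threshold.
threshold = 0.8
cols = list(corr.columns)
flagged = [
    (a, b, float(corr.loc[a, b]))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f}")
```

Strongly correlated pairs flagged in this way are the reason a tree ensemble was preferred here over a single model that assumes independent predictors.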
Figure 1: The correlation between six air pollution variables and six meteorological factors was compared. Blue color indicated a negative correlation, while red color indicated a positive correlation. A darker color denoted more correlation among the variables.
Figures 2-5 illustrate the graphs of 20 weather conditions and air pollution variables, which are highly related to the patients' visits to EDs because of each disease. Table 4 presents the top 10 variables. The number ranging from 0 to 3 after each variable denotes the interval between a patient's visit date and the variable measurement date. In other words, "0" indicates a value measured on the day of the visit; "1" indicates a value measured on the day before the visit; "2" indicates a value measured 2 days prior to the visit; and "3" indicates a value measured 3 days prior to the visit. The "mean" is the mean value, while the "std" is the standard deviation, which represents the changes in a variable on a certain day. Figure 2 illustrates the weather and air pollution variables on the day of a visit that have high correlations with ED visits according to the patient's disease. Influenza, pneumonia, and other acute lower respiratory infections [J09-J11] were highly related to temperature and steam pressure (4B-D). Lung diseases due to external agents [J60-J70] were highly related to CO and NO2 as air pollution variables and to the amount of precipitation (4G). Figure 3 shows the correlations between the weather and air pollution variables on the day of a visit and the day before the visit and the ED visit. Figure 4 presents the correlations between the variables on the day of a visit and 2 days before the visit.
Figure 5 illustrates the correlations between the weather and air pollution variables on the day of a visit, 2 days before the visit, and 3 days before the visit and the ED visit. A. Acute upper respiratory infection [J00-J06] was mainly related to NO2 on the day of a visit and to PM10 on the day of a visit and the day before the visit. B. Influenza was related to the temperature and steam pressure 3 days before a visit and was slightly influenced by PM10 3 days before a visit. C. Pneumonia [J12-J18] was influenced by temperature and steam pressure 2-3 days before a visit, rather than on the day of the visit, and was influenced by PM10 as well. Figure 6 shows the result for total respiratory disease (Table 4).

Results of Random Forest Based Analysis (Figures 2-6 and Table 4)
Among the climate factors, steam pressure had an effect on 0, 1, 2, 3 days, and among air pollution, NO 2

Discussion
Based on the consistently registered and systemized data registry of national emergency medical centers, this study analyzed the correlations between weather and air pollution variables and respiratory disease patients visiting EDs by applying a machine learning approach. Previous studies have focused on the simple relationship between a single disease and one air factor. The present study considered all respiratory diseases and a variety of air pollution and weather variables. Unlike previous studies, it examined the effects of weather and air pollution variables up to 3 days before a visit. For air pollution, data from the five observatories nearest to each ED were used. Unlike previous studies that used the daily average data of air pollution variables [18,19], this study utilized the data of 3 days before a visit, the daily temperature difference, and other data to determine the values of weather conditions in detail and identify their level of influence.
As a result, ED visits due to respiratory diseases were correlated with weather and air pollution variables on the day of the visit and 1-3 days before the visit. Of the air pollution variables, PM10 and PM2.5, which have recently drawn a lot of attention, influenced patients' ED visits.
In this study, not only the effects of weather and air pollution variables on each disease but also their level of influence were analyzed. Many air pollution variables had high correlations with acute upper respiratory infections [J00-J06], chronic lower respiratory diseases [J40-J47], and suppurative and necrotic conditions of the lower respiratory tract [J85-J86]. In cases of diseases that were highly influenced by air pollutants, steam pressure was not influential. This is consistent with the negative correlation between steam pressure and air pollution variables. In the case of acute upper respiratory infections [J00-J06], air pollution variables were highly influential; therefore, they had high correlations. Influenza and pneumonia were influenced by weather factors such as steam pressure; lower respiratory infections were influenced by weather factors, and upper respiratory diseases by air pollution variables.
In the case of several diseases, PM10 had a greater influence than PM2.5 on patients' visits to the ED. However, this does not mean that PM2.5 has little influence on the incidence of respiratory diseases. Rather, it is reasonable to conclude that PM10 (larger particle size) is more influential on acute diseases that trigger a patient's visit to the ED during a short-term period (from the day of the visit to 3 days before the visit). More studies should be conducted to determine the long-term effects of PM2.5 [20], which is known to persist and affect the human body. PM10 influenced the respiratory disease patients' visits to the emergency departments.
In the case of influenza, the temperature and steam pressure on the day of a visit were most influential. Pneumonia, which accounted for a majority of the respiratory disease patients visiting EDs, was influenced more by steam pressure and temperature. The group of diseases including asthma (J40-J47) was influenced by PM10 following steam pressure. Acute upper respiratory infections were mostly influenced by air pollution variables, especially NO2 and PM10.
Interestingly, acute upper respiratory infections [J00-J06], influenza [J09-J11], and pneumonia [J12-J18], which account for a majority of the respiratory diseases of patients visiting EDs, were highly influenced by PM10 following temperature and steam pressure, and PM10 was also highly influential in the top three diseases prompting visits to the ED: pneumonia [J12-J18], acute upper respiratory infections [J00-J06], and chronic lower respiratory diseases [J40-J47]. Therefore, of the air pollution variables, PM10 most influenced respiratory disease patients' visits to EDs.
Donaldson et al. reported that asthma symptoms were worsened by the influence of PM10. This finding is consistent with the results of the present study [21]. PM exposure can trigger an asthmatic response through multiple paths. Presumably, it is related to airway inflammation, increased smooth muscle constriction, direct stimulation of lipid mediators, additional oxidative stress, and proinflammatory burden [21,22]. Other studies have also reported that an increase in PM10 is related to an increase in the use of asthma drugs [23,24]. According to a recent study conducted by Sohn et al. [25] in Korea, a daily temperature change influenced the pneumonia patients' visits to EDs in Seoul. Choi et al. [26] reported that maximum temperature, rainfall, relative humidity, and PM10 had correlations with community-acquired pneumonia. This study also revealed that pneumonia patients' visits to EDs were influenced by weather and air pollution variables, such as steam pressure, temperature, CO, PM10, and O3 (Figure 2(c)). Arbex et al. (Brazil) [27] reported the correlations between acute upper respiratory infections [J00-J06] and air pollution variables. According to their report, the diseases were related to lag 0 of NO2, SO2, O3, and PM10. In this study, acute upper respiratory infections were also influenced by lag 0 in the order of NO2, PM10, and SO3 (Figure 2(a)). Patients with acute upper respiratory infections accounted for 52.2% of the total respiratory disease patients visiting EDs and 12.8% of hospitalized patients. As such, the high number of patients with these diseases visiting the EDs was directly influenced by air pollution variables.
According to the research by Wanka et al. in Germany [28], weather and air pollution variables influenced respiratory diseases in a complex way. This study also revealed that a variety of variables were related to each other and influenced diverse disease groups in complex ways.
Zhang et al. [29] reported that a low concentration of PM2.5 was related to acute respiratory infections 3 days before a visit, while a high concentration of PM2.5 was related to the infections on the day before a visit. In this study, PM2.5 influenced acute respiratory infections in lag 0 and lag 2. Respiratory diseases were more directly influenced by weather and air pollution variables than other disease groups. A similar result was found for all the disease groups [30]. The number of respiratory disease patients will increase by day 3 when the values of steam pressure and temperature are low, and the values of air pollution variables are high. The weather-related health index for predicting respiratory disease patients visiting EDs is yet to be developed. If a prediction model is additionally developed based on the study results, it could provide fundamental material for preventing respiratory diseases related to weather changes and help medical institutions utilize their facilities and manpower efficiently to manage patients with respiratory infections.
This study has the following limitations. First, the analysis was conducted with data that were already codified and collected; therefore, it was impossible to determine the clinical characteristics, prognosis, sources of infection, and underlying diseases of each patient. The primary outcome of this study was the assessment of trends using large data. Therefore, it is necessary to analyze the clinical data of individual disease groups. Second, the study covered only 3 years. As described in this study, a group of chronic diseases and a group of acute diseases were included in the analysis. In particular, air pollution variables call for an analysis of long-term influence. However, the ED patient data system provided only 3 years of data. Therefore, it is necessary to analyze the long-term influence of the study variables.
Third, this study set the time lag to 3 days. If a general incubation period is taken into account, a lag of 14 days could be set. However, given the large number of variables, the time lag was set within a short-term period. Lastly, the data from the observatory near the hospital were used, not the data from the observatory near the patient's house. The reason is that using the observatory data near the patient's address would require personal information (address), which cannot be obtained; it therefore has to be assumed that the patient visited a nearby hospital.
In this study, the effects of weather and air pollution variables on respiratory disease patients' visits to EDs were analyzed. Most of the respiratory patients visiting EDs were diagnosed with acute upper respiratory infections [J00-J06], influenza [J09-J11], and pneumonia [J12-J18]. PM10, following temperature and steam pressure, had influential relations with these diseases. In patients with pneumonia [J12-J18], acute upper respiratory infections [J00-J06], and chronic lower respiratory diseases [J40-J47], the top three diseases managed in EDs, PM10 was highly influential. As a result, among air pollution variables, PM10 was found to influence the respiratory disease patients' visits to EDs. The number of respiratory disease patients visiting the ED is expected to increase by day 3 when the values of steam pressure and temperature are low, and the variables of air pollution are high. Additionally, a respiratory disease prediction index must be established using a prediction model.