Statistical Analysis Aiming at Predicting Respiratory Tract Disease Hospital Admissions from Environmental Variables in the City of São Paulo

This study aims to create a stochastic model, named the Brazilian Climate and Health Model (BCHM), using Poisson regression to predict hospital admissions for respiratory diseases in children under thirteen years of age as a function of air pollutants, meteorological variables, and thermal comfort indices (effective temperatures, ET). The data used in this study were obtained from the city of São Paulo, Brazil, between 1997 and 2000. The respiratory tract diseases were divided into three categories: URI (Upper Respiratory tract diseases), LRI (Lower Respiratory tract diseases), and IP (Influenza and Pneumonia). Overall, URI, LRI, and IP admissions show clear correlations with SO2 and CO, with PM10 and O3, and with PM10, respectively, and all three disease groups correlate with ETw4 (an effective temperature index). It is extremely important to inform the government of Brazil's most populous city of the outcome of this study, as it provides valuable information to help it better manage its resources on behalf of the whole population of São Paulo, especially low-income residents.


Introduction
The environmental impact on human health has been observed since Hippocrates, around 400 BC, who in his treatise Airs, Waters, and Places emphasized the importance of environmental conditions, such as atmospheric variables, for disease and general human health. Nowadays, however, there is another influence, caused directly by humans: air pollution.
The Earth's atmosphere has been contaminated by toxic substances emitted by anthropogenic sources such as vehicles, industries, mining and others since the Industrial Revolution. Such contamination is clearly apparent in large urban centres, like México City and the city of São Paulo, among others.
São Paulo is one of the most polluted areas in the world. According to Böhm et al. [1], Saldiva et al. [2], and Braga [3], air pollution is one of the major sources of public health problems, being responsible for a considerable number of hospital admissions for respiratory, cardiovascular, ophthalmic, dermatological, and haematological diseases, as well as for stillbirths [2][3][4][5][6][7][8][9]. These studies also highlight the association of respiratory morbidity with environmental conditions, such as meteorological variables and, mainly, air pollutants.
This study aims to quantify the impact of environmental variables on respiratory morbidity in the city of São Paulo in order to help prevent it. It should also be noted that respiratory diseases account for around 30% of total morbidity in Brazil, mainly in the southern and southeastern regions, which underscores the importance of such research (http://www.datasus.gov.br).
Many studies over the last sixty years have evaluated the impact of air pollutants and atmospheric conditions on human health, without, however, producing hospital admission forecasts. The contribution of this research is therefore to predict respiratory hospital admissions and to consider the effects of Thermal Comfort Indices (TCIs) on them, in addition to the effects of air pollutants and meteorological variables.

Materials and Methods
2.1. Data Set. This is an ecological time-series study in the MRSP (Metropolitan Region of São Paulo) covering a four-year period (1997 to 2000). The dependent, discrete variable of the model was the respiratory morbidity of children under thirteen years of age, and the independent variables were environmental: air pollutants, meteorological variables, and thermal comfort indices, all of which are continuous. Months, days of the week, and holidays were entered as dummy variables. Covariate patterns were not needed because the data were selected by age and sex before modeling.

Respiratory Morbidity Data.
Daily records of hospital admissions during the study period for children under thirteen years of age were obtained from Brazil's Ministry of Health. These records cover about 80 hospitals, both public and private, spread over the city of São Paulo, which receive support from the public health system (http://www.datasus.gov.br). Thus, our sample is probably representative of the poorest segment of the population, since most people in this category lack private medical insurance. The classification of the diseases under analysis was based on the ICD (International Classification of Diseases, 9th and 10th revisions), and the diseases were grouped into three categories: URI, LRI, and IP.
Thermal comfort indices were computed and added to the meteorological data set, as done by Gonçalves et al. [8]. Cold and heat stress associated with low and high humidity can be strong stressors on heart patients. Thermal comfort indices combine two or more meteorological variables (temperature, humidity, and wind, for instance), given that the human body reacts to all environmental variables at the same time. They are therefore a useful tool for measuring thermal stress, mainly in people who are ill, hence their use in this research. Historically, thermal comfort analysis was based on two specific biometeorological indices, according to Fanger [10], and their impact on human physiology was evaluated as described by Kalkstein and Valimont [11]. In a thermally comfortable environment, no thermal stress should be experienced. This situation is called thermal neutrality, and no action needs to be taken to maintain the body's heat balance, according to Fanger [10], who discusses this topic broadly in his book and defines thermal comfort as "that condition of mind which expresses satisfaction with the thermal environment." In the present study, the main meteorological variables studied were air temperature, relative humidity, and wind velocity.
Several indices, based on the work of different authors, were used to determine the thermal classification. Ono and Kawamura [12] developed the effective temperature concept, based on individual sensitivity/sensitization, combining air temperature and relative humidity. The effective temperature ET, in °C, was determined from T, the (mean, maximum, or minimum) air temperature in °C, and RH, the (mean, maximum, or minimum) relative humidity in %. Another index, from Suping et al. [13], is the effective temperature with wind, ETw, in °C, which additionally uses v, the mean wind velocity in m/s. These indices were calculated from data collected at the Meteorological Station at Parque Estadual das Fontes do Ipiranga. To verify whether there was a correlation between an increase in hospital admissions and thermal stress, the following combinations of temperature (mean, maximum, and minimum), relative humidity (mean, maximum, and minimum), and mean wind data were employed: hot and dry (ET1), hot and wet (ET2), mean (ET3), cold and wet (ET4), and cold and dry (ET5). These combinations were computed for both ET and ETw.
In the absence of empirical outdoor thermal comfort studies, it has been widely assumed that indoor thermal comfort theory applies to outdoor settings without modification [14]. Moreover, because no Brazilian thermal comfort index exists, the indices above, which Maia and Gonçalves [15] consider adequate for Brazil, were adopted in this study.
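As a sketch only: the equations for ET and ETw are not reproduced in this text, so the code below assumes the forms commonly cited for these two indices in the biometeorology literature, ET = T − 0.4(T − 10)(1 − RH/100) and ETw = 37 − (37 − T)/[0.68 − 0.0014RH + 1/(1.76 + 1.4v^0.75)] − 0.29T(1 − RH/100); they should be checked against [12] and [13] before use. It also assumes that "hot" and "cold" mean the daily maximum and minimum temperature and that "wet" and "dry" mean the daily maximum and minimum relative humidity. The function names and sample inputs are hypothetical.

```python
"""Hypothetical sketch: daily ET1..ET5 and ETw1..ETw5 indices.

ASSUMPTION: the index formulas are the commonly cited literature forms,
not equations confirmed by this article.
"""


def effective_temperature(t_c: float, rh_pct: float) -> float:
    """ET (deg C) from air temperature (deg C) and relative humidity (%)."""
    return t_c - 0.4 * (t_c - 10.0) * (1.0 - rh_pct / 100.0)


def effective_temperature_wind(t_c: float, rh_pct: float, v_ms: float) -> float:
    """ETw (deg C); v_ms is the mean wind speed in m/s."""
    denom = 0.68 - 0.0014 * rh_pct + 1.0 / (1.76 + 1.4 * v_ms ** 0.75)
    return 37.0 - (37.0 - t_c) / denom - 0.29 * t_c * (1.0 - rh_pct / 100.0)


# Assumed mapping of the five pairings to (temperature, humidity) statistics.
COMBINATIONS = {
    "1": ("max", "min"),    # hot and dry
    "2": ("max", "max"),    # hot and wet
    "3": ("mean", "mean"),  # mean
    "4": ("min", "max"),    # cold and wet
    "5": ("min", "min"),    # cold and dry
}


def daily_indices(temp: dict, rh: dict, v_mean: float) -> dict:
    """Return ET1..ET5 and ETw1..ETw5 for one day.

    temp and rh map "mean", "max", and "min" to that day's values.
    """
    out = {}
    for k, (t_key, rh_key) in COMBINATIONS.items():
        out[f"ET{k}"] = effective_temperature(temp[t_key], rh[rh_key])
        out[f"ETw{k}"] = effective_temperature_wind(temp[t_key], rh[rh_key], v_mean)
    return out


# Hypothetical day: not data from the study.
day = daily_indices({"mean": 19.0, "max": 25.0, "min": 15.0},
                    {"mean": 75.0, "max": 95.0, "min": 50.0}, v_mean=2.0)
print(round(day["ETw4"], 2))
```

At 100% relative humidity both humidity terms vanish, so ET equals the air temperature; the wind term only enters ETw through the denominator.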

Statistical Tools.
This research used the Poisson Regression Multiple Model (PRMM) to model daily admissions, as this is the statistical approach recommended for this kind of study [2,3,9]. First, a lag structure of up to seven days (synoptic scale) was applied to all variables (thermal comfort indices, pollutants, and meteorological variables) to determine which lag had the most statistically significant correlation with hospital admissions. The lag structure was used because, for example, air temperature can affect respiratory hospital admissions both on the day it is measured and on several subsequent days. Finally, the Poisson regression model was fitted.
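The lag screening described above can be sketched as follows: for each lag k from 0 to 7 days, correlate the exposure on day t − k with admissions on day t and keep the lag with the largest absolute Pearson coefficient. This is a minimal illustration, not the SPSS procedure used in the study; the function names and the synthetic series (admissions built to respond to the index three days later) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(exposure, admissions, max_lag=7):
    """r between exposure on day t - k and admissions on day t, k = 0..max_lag."""
    n = len(exposure)
    return {k: pearson(exposure[: n - k], admissions[k:]) for k in range(max_lag + 1)}


def best_lag(exposure, admissions, max_lag=7):
    """Lag with the largest |r|; |r| lets negative associations (e.g., ETw4) win."""
    corr = lagged_correlations(exposure, admissions, max_lag)
    lag = max(corr, key=lambda k: abs(corr[k]))
    return lag, corr[lag]


# Hypothetical synthetic series: admissions fall as the index rises 3 days earlier.
rng = random.Random(0)
etw4 = [rng.gauss(12.0, 3.0) for _ in range(365)]
admissions = [50.0 - 2.0 * (etw4[t - 3] if t >= 3 else 12.0) + rng.gauss(0.0, 1.0)
              for t in range(365)]
lag, r = best_lag(etw4, admissions)
print(lag, round(r, 3))
```

Selecting on |r| rather than r matters here because the thermal comfort indices are expected to correlate negatively with morbidity.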
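The model was estimated and adjusted for seasonality (a nonparametric function), as well as for season of the year, holidays, days of the week, months, and morbidity from nonrespiratory diseases [16]. Assuming that the daily number of admissions $Y$ follows a Poisson distribution, the model is

$$\ln(Y) = \alpha + \sum_{p} \beta_{p} X_{p} + \sum_{tci} \beta_{tci} X_{tci} + \sum_{met} \beta_{met} X_{met} + \varepsilon,$$

where $p$, $tci$, and $met$ index the air pollutants, thermal comfort indices, and meteorological variables, respectively; $\alpha$ and $\beta$ are the equation's coefficients; $X_p$ is the pollutant variable, $X_{tci}$ is the thermal comfort index variable, and $X_{met}$ is the meteorological variable; and $\varepsilon$ is the error. The Relative Risk (RR) and Admission Increase (AI), with their Confidence Intervals (CI), were also estimated as

$$RR = e^{\beta X}, \qquad AI = \left(e^{\beta X} - 1\right) \times 100\%,$$

where $X$ is the threshold (increment) of the independent variable used in the estimate and $\beta$ is the Poisson regression parameter, and

$$CI_{95\%} = e^{\left(\beta \pm 1.96\, se\right) X},$$

where $se$ is the standard error of $\beta$.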
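The log-linear Poisson model and the Relative Risk calculation can be sketched with NumPy as below. This is a minimal Newton-Raphson maximum-likelihood fit, not the SPSS 10.0 procedure used in the study; for brevity it includes only day-of-week dummies (months, holidays, seasons, and the nonparametric seasonal term would enter as further columns). The functions `fit_poisson` and `relative_risk` and all synthetic values are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood fit of ln E[y] = X @ beta by Newton-Raphson.

    X must include an intercept column. Returns the coefficients and their
    standard errors from the inverse Fisher information.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the mean daily count
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                 # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])         # Fisher information
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    info = X.T @ (X * mu[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def relative_risk(beta, se, x, z=1.96):
    """RR = exp(beta*x) and its 95% CI exp((beta -/+ z*se)*x), for x >= 0."""
    return np.exp(beta * x), np.exp((beta - z * se) * x), np.exp((beta + z * se) * x)


# Synthetic daily series (hypothetical values, NOT the study data), 4 years.
rng = np.random.default_rng(0)
n = 1461
so2 = rng.gamma(4.0, 5.0, n)              # pollutant, ug/m3
etw4 = rng.normal(12.0, 3.0, n)           # thermal comfort index, deg C
dow = np.eye(7)[np.arange(n) % 7][:, 1:]  # day-of-week dummies, first level = baseline
X = np.column_stack([np.ones(n), so2, etw4, dow])
true_beta = np.array([3.0, 0.013, -0.01, 0.10, 0.05, 0.0, -0.05, -0.20, -0.30])
y = rng.poisson(np.exp(X @ true_beta))

beta, se = fit_poisson(X, y)
rr, lo, hi = relative_risk(beta[1], se[1], 80.0)  # RR for SO2 at 80 ug/m3
print(f"beta_SO2 = {beta[1]:.4f} (se {se[1]:.4f}); RR(80) = {rr:.2f} [{lo:.2f}, {hi:.2f}]")
```

In practice a GLM routine from a statistics package would be used; the explicit Newton step is shown only to make the Poisson likelihood and the source of the standard errors visible.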
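To check the fit of the model, the deviance statistic and the Pearson statistic were used, respectively:

$$D = 2\sum_{i=1}^{n}\left[y_i \ln\left(\frac{y_i}{\hat{y}_i}\right) - \left(y_i - \hat{y}_i\right)\right], \qquad \chi^2 = \sum_{i=1}^{n} \frac{\left(y_i - \hat{y}_i\right)^2}{\hat{y}_i}, \tag{5}$$

where $y_i$ is the observed value of the discrete variable and $\hat{y}_i$ is the model estimate.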
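Both goodness-of-fit statistics can be computed directly from observed and fitted counts, as in the minimal sketch below. It uses the standard convention that y ln(y/ŷ) = 0 when y = 0; dividing either statistic by the residual degrees of freedom is a common rule of thumb (values near 1 suggest adequate Poisson dispersion), not a procedure stated in the text. The function names are hypothetical.

```python
import math


def poisson_deviance(y, mu):
    """D = 2 * sum[y ln(y/mu) - (y - mu)], with y ln(y/mu) taken as 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def pearson_chi2(y, mu):
    """Pearson statistic: sum of squared residuals scaled by the fitted mean."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))


def dispersion(stat, n_obs, n_params):
    """Statistic per residual degree of freedom; values well above 1 suggest overdispersion."""
    return stat / (n_obs - n_params)


# Tiny hypothetical example: observed counts and fitted means.
observed = [2, 0, 4]
fitted = [1.0, 1.0, 4.0]
print(poisson_deviance(observed, fitted), pearson_chi2(observed, fitted))
```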
To check how well the model predicts hospital admissions, the Mean Standard Error (MSE) was computed after the equations were linearized, from $P_i$, the estimated admissions, $O_i$, the admissions that actually occurred, and $n$, the sample size. All analyses were carried out using SPSS 10.0 at a 5% significance level.
Table 1 presents the Pearson correlation matrix for URI, LRI, and IP and the meteorological, air pollutant, and thermal comfort index variables. These hospital admissions accounted for 7.6% of total hospital admissions (1.1% for LRI, 10% for URI, and 5.9% for IP, respectively).

Correlation Matrix.
The variables chosen (shown in bold) were those with the highest statistical significance and correlation coefficients. Based on these results, the PRMM was deployed, as shown in Table 2 and described in Section 3.2.
To build the URI, LRI, and IP models (Section 3.2), air temperature and relative humidity should not be used separately, because the correlation coefficient of the index that combines them (ETw) is higher than the coefficients obtained for these meteorological variables separately. Table 2 shows the Poisson coefficients for the three groups: LRI, URI, and IP. In this model, and in accordance with Mazumdar et al. [16], the control variables were months, days of the week, seasons of the year, and holidays, as is usual. The variables in this table were selected from the correlation matrix (Table 1) as those presenting the highest correlation and statistical significance.

Poisson Regression Analysis.
The air pollutants PM10 and SO2 have similar weights in the PRMM. Because of the collinearity between the two variables, when both are included at the same time, the significance of one reduces that of the other. Thus, the selection criterion was to choose the variable with the highest correlation coefficient.
The pressure and precipitation variables do not show a statistically significant correlation with URI, LRI, or IP. Using (5) (Section 2.2), the Pearson statistic was obtained for all three disease categories to evaluate the model's fit (Figure 1). Using (6) (Section 2.2), the Mean Standard Error (MSE) for 2001 was obtained for all three disease categories to evaluate the model's skill (Figure 2).
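The selection rule for collinear pollutants can be sketched as: rank candidates by their absolute correlation with admissions and drop any candidate that is strongly correlated with one already kept. The 0.7 cutoff used here to flag collinearity is a hypothetical choice; in the study, collinearity was judged by the loss of significance in the PRMM. All names and data below are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_among_collinear(candidates, outcome, collinearity_threshold=0.7):
    """Keep candidates in order of |r| with the outcome, skipping any whose
    |r| with an already kept candidate exceeds the (hypothetical) threshold."""
    ranked = sorted(candidates,
                    key=lambda name: abs(pearson(candidates[name], outcome)),
                    reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(candidates[name], candidates[k])) <= collinearity_threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical series: SO2 tracks PM10 closely; O3 is unrelated to both.
pollutants = {
    "PM10": [10, 20, 30, 40, 50, 60],
    "SO2": [6, 9, 16, 19, 26, 29],
    "O3": [5, 1, 4, 2, 6, 3],
}
admissions = [12, 22, 32, 42, 52, 62]
print(select_among_collinear(pollutants, admissions))
```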
For URI morbidity, the most significant variables were SO2 and CO with no time lag and ETw4 with a four-day lag (lag4). URI shows the least seasonal variability of all the disease groups, with similar numbers throughout the year and a slight increase in fall/autumn and winter. Although SO2 concentrations were far below the CETESB safe standard (80 μg/m³), SO2 is related to an increase in URI morbidity (see Table 1). CO also shows a significant correlation coefficient with URI (0.274; Table 1), despite CETESB's controls and the steady decrease in CO concentrations in recent years.
Among the TCIs, ETw4 shows the strongest negative correlation with URI (−0.136), at a lag of four days (lag4); as expected, lower TCI values are associated with higher respiratory morbidity, and vice versa. This result indicates that an incoming cold air mass affects children four days after its arrival, which agrees with hospital observations. The PRMM for URI yields robust results, with a mean error of around 15%.
For LRI morbidity, the most significant variables were PM10 with no time lag, and O3 and ETw4, both with a lag of three days (lag3). This disease group shows clear seasonality, with higher numbers in winter and fall/autumn; admissions also occur in the other seasons, although less frequently.
In this analysis, PM10 and SO2 were the most significant air pollutants (0.175 and 0.154, respectively). The two pollutants could not be included together in the PRMM, because SO2 lost its significance while PM10 retained it, owing to their collinearity. Besides PM10, O3 also shows a significant correlation coefficient (0.093) and may be an LRI predictor. Similar results were found in the MRSP by other authors, such as Braga et al. (2002) [17] and Gonçalves et al. (2007). Ozone in the MRSP is generated from vehicular emissions and the presence of VOCs (volatile organic compounds) (Andrade, 1993 [18]).
In the MRSP, ozone behaves differently from the other pollutants: its concentrations are higher in springtime, whereas those of the other pollutants are higher in fall/autumn and winter. Ozone frequently exceeds the CETESB safe standard (160 μg/m³).
The PRMM for LRI also yields excellent results, with a mean error of less than 30%. Therefore, the LRI model may be used to forecast morbidity for the MRSP in the same way as explained for URI.
With respect to IP, the most significant air pollutants were PM10 and SO2 (0.321 and 0.354, respectively). However, when both were included in the PRMM, SO2 lost significance while PM10 retained it.
Among the TCIs, ETw4 again shows the strongest negative correlation with IP (−0.496), which is fairly strong, at a lag of three days (lag3).
The PRMM for IP has a mean error of 40%, the worst value among the disease groups, which suggests that a different modeling approach may be needed. Nevertheless, estimates can still be made despite the model's limitations.

Relative Risk and Admission Increase
3.3.1. URI.
The model behaves satisfactorily, based on the URI morbidity estimated with the PRMM equation. These results show that the increase in respiratory morbidity is associated with an increase in SO2 or CO or a decrease in ETw4 (see Table 1). The increase in URI morbidity was calculated from the PRMM coefficients: in steps of 10 μg/m³ of SO2, the increase is nonlinear, ranging from 13.9% at low SO2 values to 182.9% at high SO2 values. CO also shows nonlinear behavior: in steps of 2 ppm, the increase ranges from 9% at low CO values to 99% at high CO values. ETw4, on the other hand, shows approximately linear behavior: for every 2 °C increase in the TCI, URI morbidity decreases by approximately 2.0% (see Table 3). Therefore, colder temperatures imply an increase in URI morbidity.
The Relative Risk rises from 1 to 2.8 (95% CI ranging from −1 to +1) as SO2 varies from 0 to 80 μg/m³ (see Figure 3(a)). For CO, the Relative Risk rises from 1 to 1.9 (95% CI ranging from −0.6 to +1.0) as CO varies from 0 to 16 ppm (see Figure 3(b)). For ETw4, the Relative Risk decreases from 1 to 0.8 (95% CI ranging from −1.0 to +0.8) as ETw4 varies from 0 to 16 °C (see Figure 3(c)).
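The reported URI percentages are consistent with the exponential relation AI = (exp(βX) − 1) × 100 evaluated at increasing pollutant levels X, which is why the increase grows nonlinearly. As a check under that assumption, the sketch back-calculates β from the reported 182.9% at 80 μg/m³ of SO2 (a derived value, not the coefficient listed in Table 3) and recovers the 13.9% reported for 10 μg/m³; the same check for CO (99% at 16 ppm) gives about 9% at 2 ppm. The helper name is hypothetical.

```python
import math


def admission_increase(beta, x):
    """AI (%) = (exp(beta * x) - 1) * 100 for a predictor level x."""
    return (math.exp(beta * x) - 1.0) * 100.0


# Back-calculated coefficients (derived from reported AIs, not from Table 3).
beta_so2 = math.log(1.0 + 182.9 / 100.0) / 80.0   # per ug/m3 of SO2
beta_co = math.log(1.0 + 99.0 / 100.0) / 16.0     # per ppm of CO

for x in (10, 20, 40, 80):
    print(f"SO2 {x} ug/m3: AI = {admission_increase(beta_so2, x):.1f}%")
print(f"CO 2 ppm: AI = {admission_increase(beta_co, 2):.1f}%")
```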

LRI.
The model yields robust results, forecasting morbidity with an MSE below 25%. In this case, the variables used in the analysis were PM10, O3, and ETw4, as shown in Table 4. Each 20 μg/m³ increase in PM10 corresponds to a quasilinear increase in LRI morbidity of approximately 2.2% on average. Several other authors, such as Zanobetti et al. [19], present similar results. For ozone, the air pollutant that has most frequently exceeded safe thresholds in the MRSP [20], increases of 40 μg/m³ were not associated with a linear increase in LRI morbidity. For ETw4, each 2 °C increase is associated with a linear decrease of approximately 1.5% in LRI morbidity, highlighting the protective effect of more comfortable (warmer and drier) weather.
The Relative Risk rises from 1 to 1.2 (95% CI ranging from −1 to +1) as PM10 varies from 0 to 160 μg/m³ (see Figure 4(a)). For O3, the Relative Risk rises from 1 to 2 (95% CI ranging from −1 to +1.0) as O3 varies from 0 to 320 μg/m³ (see Figure 4(b)). For ETw4, the Relative Risk decreases from 1 to 0.8 (95% CI ranging from 1.0 to −0.8) as ETw4 varies from 0 to 16 °C (see Figure 4(c)).

IP.
For IP, the results are similar to those of the other disease groups: IP morbidity decreased in response to an increase in ETw4, as expected. A 2 °C decrease corresponds to an average increase of 3.3% in IP morbidity, which is considerably larger than the corresponding changes for URI and LRI (see Table 4).
On the other hand, an increase in PM10 is associated with an increase in IP morbidity (see Table 5). A 20 μg/m³ increment corresponds to an average increase of approximately 5% in IP morbidity, more than twice that for LRI (see Table 4). The Relative Risk rises from 1 to 1.5 (95% CI ranging from −1 to +1) as PM10 varies from 0 to 160 μg/m³ (see Figure 5(a)). For ETw4, the Relative Risk decreases from 1 to 0.6 (95% CI ranging from −1.0 to +0.8) as ETw4 varies from 0 to 16 °C (see Figure 5(b)).

URI, LRI, and IP Summary.
In summary, with respect to air pollutants, URI is affected by SO2 and CO, LRI by PM10 and ozone, and IP by PM10. Thus, as expected, each disease group is affected by different air pollutants. With respect to TCI (an index obtained from meteorological variables), ETw4 affects all three disease groups, most markedly IP. Also unsurprisingly, cold and wet weather causes more discomfort and is associated with higher morbidity rates. From the coefficients estimated in the URI, LRI, and IP models, it became possible to predict variations in hospital admissions and their relative risks according to the individual variation of each associated variable.
An increase in SO2 of around 80 μg/m³, which is a feasible variation in São Paulo, may raise URI hospital admissions by up to 182.9%. This result is worrying because, according to CETESB, the maximum acceptable level of this air pollutant is 365 μg/m³, which is considerably higher. For CO, a 16 ppm increase is expected to raise hospital admissions by 99.0%.
For LRI, an increase in PM10 of up to 160 μg/m³, which is considered acceptable, would raise admissions by 17.4%, and the Relative Risk would rise from 1 to 1.2. For O3, an increase of up to 160 μg/m³, which CETESB considers acceptable, would nevertheless raise admissions by 37.7%, and the Relative Risk would rise from 1.0 to 2.0. This pollutant has frequently exceeded safe air quality levels, reaching peaks of 283.4 μg/m³, corresponding to a 75.0% rise in hospital admissions. For IP, a positive TCI variation acts as a protective factor, since the higher the index, the more comfortable (less cold) the sensation. Note, however, that this index naturally tends toward uncomfortable values, because it is calculated from minimum temperatures. If ETw4 rises from 0 °C to 20.0 °C, the associated admissions will decrease by 16.1%, 14.8%, and 33% for URI, LRI, and IP, respectively.

Conclusions
After deploying the PRMM, a hybrid model named the Brazilian Climate and Health Model (BCHM) was built. It can predict hospital admissions from environmental variables whose values may be either measured or obtained from a meteorological forecasting model. This use of variables produced by a meteorological forecasting model is similar to the approach adopted by Model Output Statistics (Karl, 1979 [21]).
The analysis in this article shows that air pollutants, thermal comfort indices, and meteorological variables have a statistically significant correlation with LRI, URI, and IP hospital admissions.
The models for LRI and URI show robust overall results, with an MSE below 25%, whereas the IP model was less accurate, with an MSE of approximately 40%. With errors below 30%, URI and LRI morbidity can be forecast from measured thermal comfort indices and air pollutant concentrations, as well as from forecasts of meteorological variables.
The Relative Risk results associate URI with CO and SO2, LRI with PM10 and ozone, and IP with PM10. ETw4 was associated with all three disease groups.
On the whole, the models were satisfactory given the objective of this paper, which was to show that environmental variables can be used to estimate hospital admissions. However, health depends not only on environmental factors but also on several others, such as hereditary, nutritional, and economic factors, which may explain part of the models' error. Nevertheless, besides estimating hospital admissions from real data, the BCHM can also be used to predict scenarios resulting from climate change and extreme weather events (cold and hot air masses).