A Simple Tool to Identify Representative Wind Sites for Air Pollution Modelling Applications

1 Faculty of Engineering, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
2 Faculty of Engineering, University of Peradeniya, 20400 Peradeniya, Sri Lanka
3 Faculty of Medical and Health Sciences, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
4 Faculty of Science, University of Auckland, Private Bag 92019, Auckland 1142, New Zealand
5 National Institute of Water and Atmospheric Research (NIWA) Ltd., 41 Market Place, Viaduct Harbour, Auckland Central 1010, New Zealand
6 44B Kelvin Road, Remuera, Auckland 1050, New Zealand


Introduction
The success of a meteorological air pollution model depends to a large extent on the "representativeness" of its input data in describing the meteorological characteristics of the atmosphere which govern the dispersion of pollutants [1,2]. For the purpose of assessing air quality, the World Meteorological Organization (WMO) recommends that wind measurements should be made at a height of ten metres, or at 1.5 times the mean height of roughness elements in densely built-up areas, and that the measurements should be made in an unobstructed area, with an "unobstructed area" defined as "an area where the distance between the instrument and any obstruction is at least ten times the height of that obstruction" [3][4][5]. Siting and exposure requirements of meteorological monitoring stations specifically for the purpose of air quality modelling are also documented by the US Environmental Protection Agency (USEPA) [6]. Colocated meteorological measurements (measurements made in close proximity to an air quality monitoring site and specifically for the purpose of studying air quality) are considered to be preferable to national weather station data for use in air quality modelling, provided that appropriate instrumentation is used and that suitable quality assurance procedures are followed [7]. In the absence of site-specific measurements, national weather station data are considered acceptable by the USEPA [7], even though the prime focus of such stations is often on monitoring and predicting severe and adverse weather, while the focus of air quality studies is on mild weather conditions [8].
In accordance with the above guidelines, most urban air pollution modelling analyses [9][10][11][12] use meteorological data that are either colocated or located as close as possible to the air pollution monitor in question. When colocated wind measurements are not available, meteorological data from weather stations located outside of the urban areas (e.g., at an airport site) are sometimes used [13][14][15][16]. Considering the comprehensive and long-term meteorological data availability of weather stations, some studies have resorted to using wind data from stations located 20-50 km away [1,15,16]. Practical constraints over the choice of sites, monitoring expense, and the desirability of colocating meteorological measurements with air quality instruments (where site selection is often constrained by other variables) mean that, in many cases, compromises have to be made in terms of meeting the siting requirements. Moreover, since topography, land use, and roughness length are inherently variable in urban areas, it may not be possible to identify any sites that comply with all of the requirements, as specified by the USEPA or WMO [2].
At the present time, the choice of the most suitable meteorological site for a particular air pollution modelling application remains somewhat arbitrary and is based on a limited understanding of the impact of the various meteorological site characteristics on air quality model performance. Moreover, there is no generally accepted analytical or statistical technique specified to determine the representativeness of meteorological data or monitoring sites for the purpose of air quality modelling [6]. A few studies have evaluated the effect of the representativeness of wind data on air quality model performance [11,17]. One study carried out in the city of Florence compared the accuracy of an air quality model driven by wind data from meteorological observations with that of the same model driven by numerical weather prediction [11]. A similar study carried out in Italy compared the performance of a model using observed urban meteorological data with its performance using meteorological data provided by the CALMET preprocessor [17]. However, as far as the authors are aware, there are no tools available to evaluate model performance using different observational data in relation to the "representativeness" of the meteorological sites on which the model outputs are based. The purpose of this paper is therefore to use a simple Site-Optimized Semiempirical (SOSE) model as a tool for identifying the relative importance of meteorological site characteristics (such as the proximity of meteorological monitoring to an air quality monitoring site, the height of the anemometer, and whether or not the site is located in an "open" area) in relation to air pollution model performance.

Site Selection.
The study was carried out in Auckland, the largest city in New Zealand, located along a narrow isthmus with a complex coastline. The topography is low-lying undulating terrain. Due to the limited influence of industrial emissions, vehicular emissions and home heating are the major sources of air pollution. The key pollutants that impact air quality in the Auckland region are therefore CO, NO$_x$, and PM$_{10}$ [18].
Air pollution concentrations used for this study are from the ROADSIDE field campaign [19], which covers a period of four months, between 2 April 2010 and 1 August 2010, at a site focusing on Auckland's Southern Motorway. The site is located in Otahuhu East, a suburban residential neighbourhood in Auckland with generally flat terrain which is bisected by Auckland's Southern Motorway (annual average daily traffic volumes of ∼120,000). Air quality data and colocated meteorological data were collected from three monitoring sites located within 440 m of each other in close proximity to the Southern Motorway (see Figure 1). The locations of each of the air quality monitoring stations are shown in Figure 1(b). Luke Street is located to the west of the motorway, set back by 240 m, in open grassy terrain relatively free of any buildings or natural structures, and so, to a large extent, complies with the WMO siting requirements. The 25 Deas Place site is also located 140 m from the motorway but to the east. It is located within a suburban neighbourhood consisting of single-story residential housing with modest amounts of vegetation (small trees and shrubs). The Deas Place Reserve site is located in very close proximity (approximately 5 m from the motorway) to a slip road that is an exit of [...]

The meteorological dataset was supplemented by data from nine other meteorological monitoring sites across the region (see Figure 1(a)), downloaded from the National Climate Database maintained by NIWA (http://cliflo.niwa.co.nz/). The twelve meteorological monitoring sites differed in their site characteristics, including their measurement heights, their distance to the air quality monitoring sites, and the land use characteristics of the surrounding area (building height, density, etc.). The site-specific characteristics of the twelve sites considered for the study are given in Table 1. The abbreviations used in Table 1 for the twelve sites are used from here onwards.

The Model.
The air pollution model chosen for this study is a simple Site-Optimized Semiempirical model (SOSE) [9,10]. This model was developed and tested in New Zealand [9,10] and shown to be effective in predicting ambient concentrations of a range of pollutants associated with road traffic [20]. It has been found to be useful in practical applications such as interpolating for missing data and looking at "what if" scenarios associated with changes in traffic patterns and surface meteorology [9,10,20,21]. It has also been shown to be effective in conditions of complex terrain, as found in the Aosta Valley of Italy [20]. An advantage of this model is that it can be trained exclusively using a set of wind and concentration data, and it can be expected to perform well if the wind data used as input are representative of the area in terms of its dispersion characteristics. Carbon monoxide (CO) and oxides of nitrogen (NO$_x$) are chosen as the pollutants of interest as they are strongly associated with road traffic, as discussed above.
SOSE assumes that the concentration, $C$, is inversely related to the wind speed, $u$ (as with the box model), but with a wind speed offset $u_0$ (m s$^{-1}$) included to avoid severe overpredictions in very light wind speed conditions, as suggested by Chock [22]. The model becomes

$$C = \frac{Q}{\Delta z\,(u + u_0)} + C_b,$$

where $Q$ is the emission rate, $\Delta z$ is the mixing height (or box height), and $C_b$ is the background concentration of the pollutant. This equation is applied separately when the receptor is windward and leeward of the road. For leeward (downwind) conditions, the equation becomes

$$C = \frac{Q_l}{\Delta z\,(u + u_0)} + C_{bl}.$$

The emission term, $Q_l$, for leeward conditions incorporates both emissions from the road adjacent to the monitor and emissions from other roads in the vicinity, and $C_{bl}$ is the background concentration for leeward conditions.

For windward (upwind) conditions, the model becomes

$$C = \frac{Q_w}{\Delta z\,(u + u_0)} + C_{bw}.$$

The emission term, $Q_w$, is the emission component from other roads in the vicinity, and $C_{bw}$ is the background concentration for windward conditions. With the data sorted by time of day, linear regressions of $C$ (mg m$^{-3}$) on $(u + u_0)^{-1}$ are performed, giving values of the regression coefficients $Q_l \Delta z^{-1}$ and $C_{bl}$ for leeward conditions and $Q_w \Delta z^{-1}$ and $C_{bw}$ for windward conditions [9] for each time of day. If the daily distribution of emissions differs between weekdays and weekend days, the dataset may be partitioned so that regression parameters are obtained for weekdays and weekend days separately.
Based on the three-month ROADSIDE dataset, the optimized model parameters, namely, $Q_l \Delta z^{-1}$, $C_{bl}$, $Q_w \Delta z^{-1}$, and $C_{bw}$, were calculated for weekdays and weekend days for each 10-minute period throughout the day, with the optimum parameters being constrained to avoid negative concentration predictions.
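The regression step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names (`fit_sose`, `predict_sose`, `fit_by_group`), the placeholder offset of 0.5 m/s, and the use of an unconstrained least-squares fit are all assumptions. In particular, the constraint that avoids negative concentration predictions is not reproduced here, because the text does not say how it was imposed.

```python
import numpy as np


def fit_sose(u, c, u0=0.5):
    """Least-squares fit of C = slope / (u + u0) + intercept.

    slope estimates Q / dz and intercept estimates the background
    concentration. u0 = 0.5 m/s is a placeholder, not a value from the paper.
    """
    x = 1.0 / (np.asarray(u, dtype=float) + u0)
    slope, intercept = np.polyfit(x, np.asarray(c, dtype=float), 1)
    return slope, intercept


def predict_sose(u, slope, intercept, u0=0.5):
    """Predicted concentration for wind speed u given fitted parameters."""
    return slope / (np.asarray(u, dtype=float) + u0) + intercept


def fit_by_group(u, c, time_bin, leeward, u0=0.5):
    """Fit one (slope, intercept) pair per (time-of-day bin, leeward flag).

    time_bin holds an integer bin index per record (e.g. the 10-minute
    slot of the day); leeward is True when the receptor is downwind.
    """
    u = np.asarray(u, dtype=float)
    c = np.asarray(c, dtype=float)
    time_bin = np.asarray(time_bin)
    leeward = np.asarray(leeward, dtype=bool)
    params = {}
    for b in np.unique(time_bin):
        for flag in (True, False):
            mask = (time_bin == b) & (leeward == flag)
            if mask.sum() >= 2:  # a straight line needs at least two points
                params[(int(b), flag)] = fit_sose(u[mask], c[mask], u0)
    return params
```

Weekday and weekend records could be handled by calling `fit_by_group` once on each subset, which mirrors the partitioning mentioned in the text.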

The Procedure.
The concentrations of CO and NO$_x$ at the three air pollution monitoring locations were modelled using wind field observations from each of the twelve wind sites separately (three site-specific wind observational sites and nine from the Auckland climate network for the same time period). This resulted in 36 combinations of model results for each of the pollutants.
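The procedure of trying each candidate wind site in turn and comparing the resulting model skill can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: it fits a single regression per wind site (no split by time of day, wind direction, or day type), scores the fit in-sample with the index of agreement, and uses a placeholder offset of 0.5 m/s. The function names and the site labels in the usage example are hypothetical.

```python
import numpy as np


def index_of_agreement(obs, pred):
    """Index of agreement between observed and predicted series."""
    o = np.asarray(obs, dtype=float)
    p = np.asarray(pred, dtype=float)
    denom = np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
    return 1.0 - np.sum((p - o) ** 2) / denom


def rank_wind_sites(conc, wind_by_site, u0=0.5):
    """Return [(site, IA), ...] sorted from highest to lowest IA.

    conc is the concentration series at one air quality monitor and
    wind_by_site maps a site label to its concurrent wind speed series.
    """
    conc = np.asarray(conc, dtype=float)
    scores = {}
    for site, u in wind_by_site.items():
        x = 1.0 / (np.asarray(u, dtype=float) + u0)
        slope, intercept = np.polyfit(x, conc, 1)
        scores[site] = index_of_agreement(conc, slope * x + intercept)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Usage with synthetic data: the "near" series drives the concentrations,
# while the "far" series is the same values in shuffled order.
rng = np.random.default_rng(0)
u_near = rng.uniform(0.5, 5.0, 200)
conc = 2.0 / (u_near + 0.5) + 0.1
ranking = rank_wind_sites(conc, {"near": u_near, "far": rng.permutation(u_near)})
```

Repeating the call for each of the three monitors and each pollutant would give the full set of combinations described in the text.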

Model Evaluation Statistics.
Model performance was evaluated using standard model evaluation statistics recommended in the literature for estimating the uncertainty in air quality model predictions [23,24]. The statistics used were the normalized root mean squared error (NRMSE), the index of agreement (IA), the correlation coefficient (COR), the fractional bias (FB), and the fraction within a factor of two (FAC2). These statistics also have the advantage that they are dimensionless, allowing for easy comparison between pollutants. Their definitions are given in Table 2.
NRMSE and IA indicate the degree of agreement between observed and predicted time series data, and FB is a measure of agreement with the mean concentration; a positive FB will result if the model is overpredicting the mean concentration and a negative FB will result if mean concentrations are underpredicted by the model. A perfect model performance will result in NRMSE and FB scores of zero and an IA of unity. FAC2 is a measure of the proportion of predictions within a factor of two of the observed concentration.
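The five statistics can be computed as in the sketch below. The formulas are the commonly used forms and are assumed, not confirmed, to match the definitions in Table 2. NRMSE is normalized by the observed range, and FB is signed so that overprediction gives a positive value, as described in the text. The function name and the choice to skip non-positive observations when computing FAC2 are assumptions.

```python
import numpy as np


def evaluation_stats(obs, pred):
    """Return NRMSE, IA, COR, FB, and FAC2 for paired series."""
    o = np.asarray(obs, dtype=float)
    p = np.asarray(pred, dtype=float)

    # Root mean squared error normalized by the observed range.
    nrmse = np.sqrt(np.mean((p - o) ** 2)) / (o.max() - o.min())

    # Index of agreement (1 means perfect agreement).
    denom = np.sum((np.abs(p - o.mean()) + np.abs(o - o.mean())) ** 2)
    ia = 1.0 - np.sum((p - o) ** 2) / denom

    # Pearson correlation coefficient.
    cor = np.corrcoef(o, p)[0, 1]

    # Fractional bias, positive when the mean is overpredicted.
    fb = 2.0 * (p.mean() - o.mean()) / (p.mean() + o.mean())

    # Fraction of predictions within a factor of two of the observation,
    # evaluated only where the observation is positive.
    valid = o > 0
    ratio = p[valid] / o[valid]
    fac2 = np.mean((ratio >= 0.5) & (ratio <= 2.0))

    return {"NRMSE": nrmse, "IA": ia, "COR": cor, "FB": fb, "FAC2": fac2}
```

Note that some references define FB with the opposite sign, so the convention should be checked before comparing values across studies.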

Results and Discussion
Meteorological Data.
Wind rose diagrams were constructed for the twelve wind sites for the period of April 1 to August 1, 2010. A selection of these diagrams, showing wind roses both similar to and different from that of the air pollution measurement site of interest, is presented in Figure 2. The dominant wind flows for most of the sites during the observational period are from the south-west and north-east directions. Observations at LS, PA, and M were similar in terms of wind direction (Figures 2(a), 2(b), and 2(c)). However, relative weakening of intensities was observed at LS (moderate) and PA (weak). Based on the aerial view of the sites, the degree of surface cover around the site increases in the same order (M, LS, and PA). From this, we speculate that the weakening of winds is caused by increased surface roughness. The wind rose patterns at the Wiri (W) site, situated 7 km away, and Pukekohe (PU), the furthest site situated 30 km away, show strong north-westerly wind components that are not observed at other sites (Figures 2(d) and 2(e)). At Onehunga (O), located 5 km away from the study site, wind components from the east and south-east were observed, components that were not found at any of the other closer sites (Figure 2(f)). All of the other wind roses showed significant differences from those of the LS, PA, and M sites (not shown here).

Figure 3(a) shows the normalized time variation of CO, NO$_x$, and wind speed at 25DP averaged for each hour of the day for the study period. Also shown on the plots is the 95% confidence interval of the means calculated through bootstrap resampling. The distributions of CO and NO$_x$ are consistent with peaks during the morning and evening rush hours, with concentrations persisting into the early night. An inverse relationship between pollutant concentrations and wind speed is also evident in Figure 3(a). Hourly average concentrations of CO and NO$_x$ are calculated separately for the events when the receptor site 25DP is upwind or downwind with respect to the motorway and are presented in Figures 3(b) and 3(c), respectively. Both CO and NO$_x$ concentration averages are higher when the site is downwind relative to upwind of the motorway. This is consistent with the presence of a significant line source and highlights the need to treat upwind and downwind time periods separately when modelling pollutants in the presence of such a source.
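A 95% confidence interval of a mean obtained by bootstrap resampling, as used for the bands in Figure 3, can be sketched as below. This is a percentile bootstrap under assumed settings: the function name, the number of resamples, and the fixed seed are illustrative choices, and the paper does not state which bootstrap variant was used.

```python
import numpy as np


def bootstrap_ci_mean(values, n_boot=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean.

    Returns (sample mean, lower bound, upper bound).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    # Draw n_boot resamples (with replacement), each the size of the sample.
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    means = x[idx].mean(axis=1)
    alpha = (1.0 - level) / 2.0
    lower, upper = np.quantile(means, [alpha, 1.0 - alpha])
    return x.mean(), lower, upper
```

Applied to all observations falling in a given hour of the day, separately for upwind and downwind periods, this yields one band per hour.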

Model Performance.
Examples of SOSE modelling results for the fourth week of the four-month campaign, using ten-minute averages, based on colocated wind data from 25DP, LS, and DPR for observed CO and NO$_x$ concentrations at 25DP are presented in Figure 4. Similar results were obtained for the remaining weeks of observation. Model statistics for the four-month campaign are presented in Tables 3 and 4. Figures 4(a) and 4(b) illustrate the modelling results using wind measurements at LS and highlight the enhanced model performance achieved when meteorological data from this same site are used to train the model. Figures 4(c) and 4(d) show the results for 25DP using data from colocated instruments at 25DP, and Figures 4(e) and 4(f) illustrate the results using data from DPR. In each of these latter four cases, model performance is poor, indicating that the meteorological data used were not representative of the domain.
Figure 5 presents the time series of observed and predicted concentrations for one week using wind data from five weather stations from across the rest of the Auckland region (the locations of the weather stations are marked in Figure 1) using ten-minute averages. Figure 5(a) provides an example of the modelling results achieved using LS winds for the same week for comparison. The weather station at Mangere (M) appears to be the most representative of the nine noncolocated sites considered in this study (Figure 5(b)). Table 3 presents the model evaluation statistics for NO$_x$ concentrations at 25DP using wind data from the twelve meteorological stations. The results show that the best performance (NRMSE = 0.08, IA = 0.86, COR = 0.78, FB = 0.03, and FAC2 = 0.6) is obtained using wind data from LS, located approximately 300 m away from the air quality monitoring site 25DP. The second best performance is obtained using wind data from M and PA, the two sites where similar wind patterns to those at LS are observed. The worst model performance (NRMSE = 0.14, IA = 0.46, COR = 0.28, FB = −0.77, and FAC2 = 0.46) was obtained using wind data from PU, the site located the furthest away and displaying a noticeably different wind rose pattern. The agreement between observed and predicted concentrations reduced by 46% when using wind data from the PU site compared with data from LS. All of the other sites (W, MP, H, and O), which showed markedly different wind roses to that at LS, resulted in poor model performance. According to the FB values, MP, H, and PU tended to result in underprediction of concentrations compared to the mean. The effectiveness of model predictions (as shown by FAC2) also declined in a similar order from LS (best wind site) to PU (worst wind site).
The same statistical analysis was repeated, this time using CO as the pollutant. The results are presented in Table 4. In terms of the ranking of the sites for model performance, the results are similar to those found for NO$_x$. Wind data from LS were found to produce the best model results (NRMSE = 0.06, IA = 0.82, COR = 0.74, FB = 0.04, and FAC2 = 0.39), while the results using PU data were found to be the worst (NRMSE = 0.09, IA = 0.44, COR = 0.32, FB = −0.88, and FAC2 = 0.2). The agreement between observed and predicted concentrations reduced by 46% when using wind data from the PU site compared with LS. The fractional bias takes a large negative value with PU winds, which is a result of the underpredicted events. FAC2 values reduced from the best wind site to the worst wind site, showing a similar pattern to NO$_x$.
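The quoted 46% reductions appear to correspond to the relative change in IA between the LS and PU wind sites. This reading is an inference from the reported values rather than something the text states explicitly, but the arithmetic is consistent for both pollutants:

$$\text{NO}_x:\quad \frac{\mathrm{IA}_{\mathrm{LS}} - \mathrm{IA}_{\mathrm{PU}}}{\mathrm{IA}_{\mathrm{LS}}} = \frac{0.86 - 0.46}{0.86} \approx 0.465,$$

$$\text{CO}:\quad \frac{\mathrm{IA}_{\mathrm{LS}} - \mathrm{IA}_{\mathrm{PU}}}{\mathrm{IA}_{\mathrm{LS}}} = \frac{0.82 - 0.44}{0.82} \approx 0.463.$$

Both ratios are approximately 46%.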
These results show a similar level of model improvement to that achieved by Giambini et al. [11] using the model SIRANE when they replaced numerically predicted wind data with observed wind data (COR improved from 0.69 to 0.73), and to the improvement to the ADMS-Urban model obtained by Righi et al. [17] when wind data from the centre of the study area were used rather than those from a slightly remote station (COR improved from 0.2 to 0.45).

Model Performance in relation to Site Characteristics
Distance to Wind Site.
The effect of distance between the anemometer and the air pollution monitor was evaluated by plotting the IA values of modelled results as a function of the distance (Figure 6(a)). The linear relationship between distance to wind site and IA values showed a correlation coefficient of 0.6 ($p$ = 0.03). The results suggest that the representativeness of wind data is reduced with the distance to the wind site. However, the weak correlation is a result of wind data from some closer sites such as 25DP, DPR (on site), and PE (5 km away) that have their anemometers located in obstructive environments (with trees and buildings in close [...]).

Height of Anemometer and Openness of the Site.
When the IA values of modelled results at 25DP using three on-site meteorological stations were analyzed (Figure 6(b)), LS, the most open site having an anemometer height of 10 m, gave the best model performance. Prediction results for the 25DP site using its colocated wind data were worse than those with LS wind data since the anemometer at 25DP was at 8 m and in an area covered with buildings. DPR wind data also showed less representativeness as a result of low anemometer height (6 m) and turbulence from the surrounding environment. This suggests that if the air pollution monitor is located in an area where it is not possible to erect an anemometer at 10 m in open grounds, it would be preferable to choose a wind monitoring site away from the air pollution monitoring site where it can be erected at a height of 10 m while satisfying the "open grounds" criteria, rather than compromising on anemometer height for the sake of closeness.

Conclusions
This study explores the capability of a simple modelling tool, SOSE, to identify the most representative wind site to be used for local-scale air pollution modelling applications when a choice of the wind site has to be made from a set of available wind sites. The results not only support important aspects stated in the WMO and EPA meteorological monitoring guidance but also provide quantitative model performance statistics for all available wind sites.
According to the results of this study, selecting wind data that are measured in close proximity is very important in reducing the prediction error of an air quality model, especially in situations where there is a great deal of spatial variability in the surface wind flows. However, colocation should not take priority over the choice of an open area, free from local obstacles such as trees and buildings, consistent with the WMO guidelines for wind data collection sites. If an open area cannot be found in close proximity for installing the anemometer, national weather station data can be used with good accuracy. However, the appropriate weather station should be chosen carefully. According to the results of this analysis, sites within 10 km gave reasonable model accuracy, although not all the sites within a 10 km radius performed well. The wind data that showed similar wind directions, and a similar fraction of data from each direction, to the most representative on-site wind data (as can be seen from the wind roses) gave good predictions.
The intensity variations due to weakening of wind speeds do not make a significant difference to the prediction accuracy of the SOSE model as long as the consistency of the wind pattern is maintained (e.g., the LS and M sites). A wind site identified as a representative wind site by SOSE will provide representative wind data for any air quality model. If multiple wind sites are available, it is worth experimenting with wind data from all available sites to identify the most representative wind data that characterise the dispersion of the pollutants to the site of interest. This may be particularly important in areas of complex terrain, as the winds are likely to be spatially inhomogeneous. The SOSE model is ideal for such an exercise as it is entirely dependent on representative wind data for good model performance. It is also computationally simple, is easy to implement, has no input requirements beyond wind speed and direction, and has been shown to be reliable for use in topographically complex areas. As such, the approach presented here can be expected to be transferable to any urban area, irrespective of the topographical complexity or the nature of the road layout.
An analysis of wind data and air pollution model performance of this nature, using a simple tool such as SOSE, gives a clear basis for selecting representative wind data for air quality models. A valuable future exercise would be to verify these results using another statistical method such as artificial neural networks.

Figure 1: Map of monitoring locations used in this study. (a) Map showing twelve wind sites; (b) enlarged view of the three air pollution measurement sites with on-site anemometers for wind measurements.
Figure 3(a)  shows the normalized time variation of CO, NO  , and wind speed at 25DP averaged for each hour of the day for the study period.Also shown

Figure 3: (a) Time variation of hourly average concentration of CO, NO$_x$, and wind speed at 25DP (normalized values are presented). (b) Upwind and downwind concentrations of CO at 25DP. (c) Upwind and downwind concentrations of NO$_x$ at 25DP. Note. The band shows the 95% confidence interval obtained through bootstrap resampling.

Figure 4: Examples of one-week time series of observed and predicted ten-minute average CO and NO$_x$ concentrations at 25DP using wind data from 25DP, LS, and DPR, respectively. The results suggest that the best model performance is achieved using data from LS (the "open" site).

Table 1: Air pollution and meteorological data. (a) Sites with colocated wind data and air pollution data; (b) anemometer height.

Table 2: Model evaluation statistics and their definitions. Note. $P_i$ is the $i$th predicted value, $O_i$ is the $i$th observed value, $O_{\max}$ is the maximum observed value, $O_{\min}$ is the minimum observed value, $n$ is the number of observed and predicted pairs, $\bar{P}$ is the mean predicted, $\bar{O}$ is the mean observed, $\sigma_O$ is the standard deviation over the observed data set, and $\sigma_P$ is the standard deviation over the predicted data set.

Table 3: Statistical analysis of the predicted and measured ten-minute average NO$_x$ concentration at 25DP from April to July 2010.

Table 4: Statistical analysis of the predicted and measured ten-minute average CO concentration at 25DP from April to July 2010.