Bayesian Spatial and Trend Analysis on Ozone Extreme Data in South Korea: 1991–2015

Background. Extreme events like flooding, extreme temperature, and ozone depletion are happening in every corner of the world. (us, the need to model such rare events having enormous damage has been getting priorities in most countries of the world. Methods. (e dataset contains the ozone data from 29 representative air monitoring sites in South Korea collected from 1991 to 2015. Spatial generalized extreme value (GEV) using maximum likelihood estimation (MLE) and two max-stable and Bayesian kriging models are the statistical models used for analysis. Moreover, predictive performances of these statistical models are compared using measures like root-mean-squared error (RMSE), mean absolute error (MAE), relative bias (rBIAS), and relative mean separation (rMSEP) have been utilized. Results. From the time plot of ozone data, extreme ozone concentration is increasing linearly within the specified period. (e return level of ozone concentration after 10, 25, 50, and 100 years have been forecasted and showed that there was an increasing trend in ozone extremes. High spatial variability of ozone extreme was observed, and those areas around the territories were having extreme ozone concentration than the centers. Moreover, Bayesian Kriging brought about relatively the minimum RMSE compared to the other models. Conclusion. (e extreme ozone concentration has clearly showed a positive trend and spatial variation. Moreover, among the models considered in the paper, the Bayesian Kriging has been chosen as the better model.


Background
Extreme events like flooding, extreme temperature, and ozone depletion are happening in every corner of the world.
us, the need to model such rare events is being getting prioritized in most countries of the world. ese days, depletion of the stratospheric ozone layer by halogenated chemicals is a global problem. Since the issue was primarily a matter of scientific curiosity during the 1970s and early 1980s, recently, it became an urgent policy question for governments in both developed and developing countries.
ere are various effects of ozone starting from human health to increasing the chance of existence of natural catastrophe. Some of them are the effects on human and animal health, terrestrial plants, aquatic ecosystems, biogeochemical cycles, air quality, materials, climate change, and ultraviolet radiation. According to Stedman et al., the effect on human health like eye diseases, skin cancer, and infectious diseases is due to increased penetration of solar UV-B radiation [1]. e changes in plant form and secondary metabolism due to UV-B could have an important implication for plant competitive balance, plant pathogens, and biogeochemical cycles. e climate and environmental applications used multivariate extreme distributions in order to take into account the extremal dependence [2][3][4][5]. For the study of extremes in a spatial context, max-stable processes are a relevant framework for modeling and inference. Different parametric models have been later proposed by various authors [6][7][8].
e extremal t process is a model of that family that extends the Schlather model [9,10]. It has a supplementary parameter that lets the extremal coefficient function vary in a wider range. Max-stable processes have been used in applications for modeling air pollution using ozone data, rainfall, temperature, snowfall, and snow depth [11][12][13][14].
Designing the methodology that brings accurate risk measures of these extreme events needed to minimize the casualties on human being and reduce the damage on infrastructure is indispensable.
us, the importance of modeling such events and getting ready to take precaution measures will have great importance. e main goal of this paper is to model and compare the predictive performance of the extreme values like spatial GEV with MLE estimation, two max-stable spatial models, and the other recent model Bayesian kriging. In most papers, Bayesian kriging has not been considered for comparison. e comparative criterion considered for comparing the predictive performance of Schlather's, extremal t, and Bayesian kriging is the 5-fold cross validation. e paper considers 29 air monitoring sites found South Korea, and the daily ozone data have been collected from the year 1991 to 2015. 8 hours' maximum ozone concentration has been selected from the daily maximum ozone concentration.
en, using this daily maximum ozone, the monthly maximum has been selected. From the monthly maximum, annual maximum have been selected. Actually, only ozone season months from May to September have been utilized for extracting data for the model fitting process.

Basics of Extreme Value eory (EVT).
ese days, the statistical modeling of univariate extremes has been well assessed. However, these tools to model spatial extreme data have been active research areas of the time.

Univariate Extreme Value Models.
For each station in this study, the univariate extreme value model has been fitted using generalized extreme value (GEV) distribution using Bayesian perspective. e GEV defined as on z: 1 + ξ((z − μ)/σ) > 0 and with parameters satisfying − ∞ < μ < ∞, σ > 0, and − ∞ < ξ < ∞. e maximum value of ozone per site (Z t,i ) ∼ GEV(μ t,i , σ, ξ) and the parameters of the model are location (μ), scale (σ), and shape (ξ) for each 29 air monitoring sites. Block maxima first approved in work by Fisher and Tippett approach has been utilized [15]. Time series plot of the ozone concentration versus time has been plotted. Trend has been observed. us, the location parameter has been expressed as a linear function of time: (2) As the data contain ozone concentration in different sites, spatial dependence is inevitable.

Generalized Extreme Value Distribution.
e generalized extreme value (GEV) distribution has a location parameter μ and a scale parameter σ and combines the three types of extreme value distributions by use of a shape parameter ξ. It is usually represented by GEV(μ, σ, ξ). e distribution function is as follows: the case ξ < 0 corresponds to the Weibull family; for the case ξ � 0, the limit of the GEV as ξ ⟶ 0 is used, which leads to the Gumbel family: erefore, we now have a single formulation for modeling block maxima data which overcomes the issue of distribution choice from the three types by allowing the data to choose the appropriate tail behavior for the model. e uncertainty for this choice is now incorporated into the model within the uncertainty for the parameter estimate of ξ: defined on z: 1 + ξ((z − μ)/σ) > 0 and with parameters satisfying − ∞ < μ < ∞, σ > 0 and − ∞ < ξ < ∞. e case ξ > 0 corresponds to the Fréchet family. e yearly maximum values of the ozone concentration per geographic zone i, Z 1,i , . . . , Z T,i follows generalized extreme value (GEV) distribution of the following form: with the location parameter, μ t,i � β 0,i + β 1,i (t), for t � 1, . . . , 25 and i � 1, 2, . . . , 29, where β 0,i is the intercept and β 1,i is the trend in t.

Return Level Estimation for Block Maxima Approach.
Finding the probability of occurrence for large events is the major motivation for modeling extreme data. is demands us to be able to predict the level of the process that we expect to exceed only once in the time period required, which is known as a return level. For yearly maximum data, the value of the return level can be estimated from the GEV distribution function directly. If we let Z p be the (1/P) year return level of the process, then where this expression comes from requiring the exceedance rate of z p to average once every (1/P) years and considering a constant rate of exceedance. Accordingly, the approximate value of the (1/P) year return level for the GEV distribution is given as

Bayesian Kriging Model Fitting for Annual Maximum
Ozone. A spatial model with a simpler version that is considered in this paper utilizes a model with only one latent spatial process and no measurement errors: e likelihood function is given by

Posterior for the Mean Parameter β.
e conjugate prior for the mean parameter β has been considered. Assuming a normal conjugate prior for the mean parameter β, which means e posterior distribution is given by us, normal distribution is a conjugate prior.

2.3.2.
Posterior for the Model Parameter for σ 2 . e posterior distribution for σ 2 is obtained by Pr σ 2 |y, β * , ∅ * ∝ Pr σ 2 Pr y|β * t, nσ 2 q, h∅ * , (13) where the prior distribution is the first term and the second is the likelihood. e posterior distribution for all the three prior distributions is a scaled-inverse-χ 2 of the form and is given by e choice of prior distribution determines the parameters (υ, Q). In a particular case of the inverse-gamma distribution, the scaled-inverse χ 2 is the conjugate prior for σ 2 .
is conjugate distribution is specified by two hyperparameters: which corresponds to ((n σ S 2 σ )/σ 2 ) ∼ χ 2 (n σ ) e posterior distribution is where is the maximum likelihood estimator for σ 2

Max-Stable Models.
A general representation of maxstable processes can be described by two components, a stochastic process X(s) { } and a Poisson process Π with the intensity dζ/ζ 2 on (0, ∞). Let {X i (s) i∈N } X(s) with E[X(s)] � 1 and let ζ i ∈ Π be points of the Poisson process. en, the spectral representation of Schlater's model is given by is is called Schlater's spectral representation. A more flexible class of max-stable processes by taking X i (s) to be any stationary Gaussian random field with finite expectation was suggested by Schlather [7]. A stationary max-stable process with unit Fréchet margins can be obtained by where μ � E max(0, X i (s)) < ∞ and ζ i denote the points of a Poisson process on (0, ∞) with intensity measure is used if the random process is specified for a stationary isotropic Gaussian random field X i (s) with unit variance. is process Z(s) is called an extremal Gaussian process, and the bivariate marginal distributions are given by

Advances in Meteorology 3
where h is the Euclidean distance between station s 1 and s 2 . For this paper, the power exponential correlation function has been utilized. Extremal dependence was discussed by various scholars in different disciplines [2][3][4]16]. Various parametric models have been suggested by different experts [6][7][8]17]. e extremal t process is a model of that family that extends the Schlather model [9, 10].

Specification of Parameters of Schlater's
Model. e covariate specification for the location and scale parameters for fitting Schlater's model is
ese validation criteria are defined as where m is the total number of observations that we need to validate, z i is the data indexed by i, z i is the prediction value, and Z i and Z p are the arithmetic mean of the observations and predictions, respectively.

Air Monitoring Sites Selected for the Study.
Twenty-five years' (from 1991 to 2015) extreme ozone data have been collected from the 29 sites. e reason for considering 1991 as the start of data collection was the existence of large number of missing data in almost all 29 sites prior to 1991. e distribution of air monitoring sites in South Korea is displayed in Figure 1.

Descriptive Outputs of Maximum Ozone
Concentration. e minimum, first quartile, median, third quartile, and maximum are the descriptive measures used to portray the characteristics of extreme ozone concentrations in each 29 sites, and they are depicted in Figure 2.
Relative to the standard ozone concentration in South Korea which is 0.06 ppm, the distribution of maximum ozone concentration in all air monitoring sites was above the standards.
Time series plot of the maximum ozone concentration of all stations is displayed in Figure 3. As seen in Figure 3, we can clearly observe a positive trend in the ozone concentration with respect to time. is indicates that the location parameter of the generalized extreme value (GEV) distribution can be expressed as a linear combination of time.

Station by Station Analysis by GEV.
We assumed that the maximum values of O 3 per monitoring station, Z 1 , . . . , Z T , are independent and follow a GEV distribution of the following form: where β 0 is an intercept parameter while β 1 represents a trend in t for the location parameter and t denotes the order in which the measurements were obtained. Moreover, σ and ξ are constants in time and denote the scale and shape parameters of the GEV distribution. e time series plot of all sites in Figure 3 can be taken as a justification to use a linear trend for the location parameter of generalized extreme value distribution of maximum ozone concentration.

Prior Distribution of Parameter of GEV.
e prior is defined by the marginal distributions μ ∼ N(0, 10 2 ), σ ∼ log N(0, 10 2 ), and ξ ∼ N(0, 10 2 ) and the initial value for the prior of trend parameter is set to be 0.728. With the above prior specification, a Markov chain θ 0 , . . . , θ n with target distribution π(θ|x) can be generated. en, chains of length 10,000 have been generated using the initial values θ 0 � (4, 1, 0.1, 0.1) and the proposal standard deviation s � (0.02, 0.1, 0.1, 0.1). Convergence of the posteriors of all parameters has been checked using trace plots and Geweke test.
As it is depicted in Table 1, all the values of β 1 for all the stations in the country are positive indicating that there is a positive trend in the maximum ozone concentration with time.

Spatial Generalized Extreme Value Analysis Assuming
No Spatial Dependence. In this perspective, the spatial extreme value analysis does not assume spatial dependence among stations in the study areas. Using this specification, the following likelihood is used for this analysis:

Exploratory Plot of Location Parameter versus Geographic Coordinates.
We used the exploratory plot in Figure 4 to show the relation between the dependent variable ozone extreme value with two covariates latitude and longitude. e plot shows approximately linear relation. As seen in the above plot, there seems to be a linear trend between the covariates latitude and longitude and the ozone concentration.

Exploratory Plot of Scale Parameter versus Geographic
Coordinates. Similarly, the relationship between the scale parameter of the spatial GEV model and the geographic covariates latitude and longitude has been displayed in Figure 5.

Model Selection Criterion Using Takeuche Information
Criteria. Based on the linear trend relation between the ozone concentration with that of latitude and longitude, the spatial generalized extreme value analysis has been fitted. erefore, a various combinations of covariates have been used to model a spatial GEV model. Using this specification, 9 models have been fitted. Takeuche information criteria (TIC) were used for selection. Based on this criterion, the model having small TIC would be preferable. TIC of the combinations of covariates is displayed in Table 2.
e second combination has been selected as good model for both the location and scale parameters with the consideration of the type of relationships displayed in Figures 4 and 5 and actually the smaller TIC value as well. TIC has been used as model selection criteria since Takeuchi [21] proposed Takeuchi information criterion (TIC) as a robust AIC in the spatial GEV model.

Specification of Location and Scale Parameter.
e response surface with the covariates latitude and longitude has been fitted as follows:

Advances in Meteorology
Referring the results in Table 3, the model coefficient for latitude covariate is negative indicating that there is an inverse relation between ozone concentration and latitude. e negative relation between ozone concentration dispersion and latitude covariate has been observed.

Interpolation of Return Level for Spatial GEV (No Spatial Dependence)
(1) Interpolation of 5-Year Return Level. For annual maxima data, the value of the return level can be estimated from the GEV distribution function directly.
As depicted in Figure 6, different colors show the intensity of the extreme ozone concentrations in various air monitoring sites of South Korea. e air monitoring sites found in territories having higher longitude have higher 5year return level compared to other areas. e 5-year return level is the ozone concentration that is expected to be exceeded once in 5 years.
(2) Interpolation of 10 Years' Return Level. e 10 years' return levels (Figure 7) have been interpolated for the study areas using the spatial generalized extreme value analysis. A similar pattern has been observed as the 5 years' return level.
ose areas having smaller longitude have a lower return level. e higher return periods have higher return levels. To confirm this relation between return periods and return level, we interpolated the 25, 50, and 100 years' return levels.
ese return levels are the magnitude of the extreme ozone concentrations that are expected to be exceeded once in 25, 50, and 100 years, respectively.   Figures 9 and 10, respectively. As seen in the 5, 10, and 25 years' return levels, the same pattern of interpolated return levels for ozone concentration has been observed for both 50 and 100 years' return periods.

Model Parameter Estimation of Schlater's Model.
e estimated parameter values based on Schlater's model is displayed in Table 4. ese values of model parameters are used for interpolation of return levels of extreme ozone in the specified period of time. Moreover, the standard error of each parameter of Schlater's model depicts the level of dispersion of the possible model parameters in sampling distributions.

Model Parameter Estimation of Extremal t Model.
e estimated model parameter and the corresponding standard error for each parameters value based on the extremal t model specifications have been displayed in Table 5.

Model Comparison Using 5-Fold Cross Validation.
Random classification of these 29 stations in to 5 folds has been done. Four of the folds have 6 air monitoring sites and the last fold has 5 sites. Using 5 years' return levels as actual ozone concentration and the interpolated ozone concentration as predicted values, the 7 performance measures have been computed for each fold, and we took the average of it (Table 6).
We used the root-mean-squared error (RMSE) as a measure of predictive performance. In Table 6, using the 5 years' return level, the RMSE for the Bayesian kriging model brought the minimum RMSE. is indicates that it has good prediction performance relative to the other models. Similar results have been found for the 10, 25, 50, and 100 years' return levels. Table 1: e slope (β 1 ) and its standard deviation (SD) of the trend line fitted for Bayesian extreme value analysis at 5% level of significance in all 29 air monitoring sites. Quentin conducted a study with the title "Comparison of spatial extreme value models application to precipitation data" [22]. is paper compares six models from max-stable process and other hierarchical models using the return level. But, the uniqueness of paper mainly lies on the consideration of other means of model comparison called k-fold cross validation in addition to return level measure.

Advances in Meteorology
When using the 5 years' return level as an observed value, the RMSE for the Bayesian kriging models brought about the RMSE of 0.063, and it is relatively the minimum compared to 0.077 for Schlater's model, 0.118 for the spatial GEV with MLE, and 0.083 for the extremal t model. Similarly, using the 25 years' return level as observed value, the RMSE for Schlater's, spatial GEV (MLE), Bayesian kriging, and extremal t are 0.083, 0.130, 0.071, and 0.089, respectively. e last return level considered as an observed value was the 50 years' return level. Using this value, it has brought about an RMSE of 0.089, 0.137, 0.077, and 0.091 for Schlater's, spatial GEV (MLE), Bayesian kriging, and extremal t models, respectively. Zahidetal conducted a study on spatial prediction and concluded that Bayesian universal kriging fits better than universal kriging [23]. Based on the result from the study by Cui, Bayesian kriging was more precise as compared to those obtained with ordinary kriging [24]. A simulation study    highlighted that the methodology of conditional simulations for the extremal t process was effective for a large range of α ∈ [1,6] values which characterizes the spatial asymptotic dependence [25]. e finding from the analysis indicated that the Bayesian kriging model brought about the minimum predictive performance measure compared to other extreme value models considered in this research. For researchers interested to extend this research, I would suggest to consider the criterion of amount of time consumed for estimating model parameters and the CPU used for model comparison.

Conclusion
In this paper, spatial GEV using MLE estimation, two models from max-stable process, Schlather's and extremal t, and the Bayesian kriging have been put in comparison with respect to the predictive performance measures. e station by station analysis using the Bayesian approach depicted that all the 29 stations considered in this paper brought about a positive trend in the annual extreme ozone concentration. is would indicate that that the ozone concentration of all these air monitoring sites has an increasing trend with time. From the interpolation of the 5, 10, 25, 50, and 100 years' return level, most stations found in the territories will have a higher ozone concentration.

EVT:
Extreme value theory GEV: Generalized extreme value MAE: Mean absolute error MLE: Maximum likelihood estimation rBIAS: Relative bias rMSEP: Relative mean separation RMSE: Root-mean-squared error SD: Standard deviation TIC: Takeuche information criteria UV-B: Ultraviolet B.

Data Availability
All the data used in this study are available from the corresponding author upon request.
Additional Points e use of national data that can represent South Korea and the utilization of the long period (25 years) extreme ozone concentration data are among the strengths of this study. e use of national data can help us to make inference to the national level. Although the data source where I extracted the data is a reliable source, the use of secondary data by itself has some limitation.

Conflicts of Interest
e author declares no conflicts of interest.