The Advantage of Using International Multimodel Ensemble for Seasonal Precipitation Forecast over Israel

This study analyzes the results of monthly and seasonal precipitation forecasting from seven different global climate forecast models for major basins in Israel for October–April of 1982–2010. The six North American Multimodel Ensemble (NMME) models and the ECMWF seasonal model were used to calculate an International Multimodel Ensemble (IMME). The study presents the performance of both monthly predictions and seasonal predictions of precipitation accumulated over three months, with respect to different lead times, using the ensemble mean values (one per individual model). Additionally, we analyzed the performance of different combinations of models. We present verification of seasonal forecasting using real forecasts, focusing on a small domain characterized by complex terrain, high annual precipitation variability, and a sharp precipitation gradient from west to east as well as from south to north. The results show that, in general, the monthly analysis does not provide very accurate results, even when using the IMME for one-month lead time. We found that the IMME outperformed any single model prediction. Our analysis indicates that the optimal combinations with the high correlation values contain at least three models. Moreover, a larger number of models in the ensemble produces more robust predictions. The results obtained in this study highlight the advantages of using an ensemble of global models over single models for a small domain.


Introduction
Accurate prediction of precipitation amounts and their spatial distribution is vital for regional and local-scale hydrological applications. This is especially true for arid and semiarid regions such as the Middle East, where estimations and predictions of the highly variable precipitation amounts during the rainy season are critical for water resources planning and management. Therefore, weekly, monthly, and seasonal forecasts are highly desired by regional policymakers, water authorities, and climate-sensitive businesses. They are especially crucial for the early detection of oncoming droughts [1]. Seasonal forecasting has made progress in recent years [2], and the climate models provide increasingly accurate and reliable seasonal forecasts with up to 6-9 months' lead time [2,3]. The accuracy of such forecasts over land surfaces, however, is still limited [4-6].
Previous studies have applied statistical downscaling methods for seasonal forecasting in the Middle East [7,8]. The analysis, however, was based only on the Climate Forecast System (CFS) model reanalysis data and not on real reforecasts, so they did not examine the skill of the seasonal forecasts for the various meteorological variables and for different lead times. Global dynamical climate models provide forecasts 6-9 months in advance at 80-100 km grid resolution. Due to the chaotic nature of the atmosphere and a limited physical understanding of it, the accuracy of seasonal precipitation forecasting on land is limited unless performed during a period with strong oceanic anomalies, such as El Niño [4-6]. An intermediate solution is the ensemble forecasting technique. This includes ensembles of different initial conditions, generated by perturbing sea surface temperature (SST) and wind stress [9], as well as ensembles from multiple climate forecast models [10]. Ensembles of initial conditions based on a single model do not necessarily sample the forecast space completely and usually result in underdispersion errors. Therefore, multimodel ensemble forecasts are receiving more attention from a variety of perspectives, including applications in hydrological forecasting [11-14].
Recently, the North American Multimodel Ensemble (NMME) has been launched in the United States [15], with real-time experimental operational forecasts out of the National Oceanic and Atmospheric Administration (NOAA)/National Centers for Environmental Prediction (NCEP). Becker et al. [19] examined the NMME's skill and verified it against observations globally. They found that, for precipitation rate and sea surface temperature, the NMME skill is higher than that for any single model, although there may be many regional and seasonal variations. The NMME usually has better predictions than most, if not all, individual models. However, both potential predictability and real forecast skill vary depending on geographical region and season. DelSole et al. [16] explored the skill of a combination of forecasts and whether the improvement is dominated by reduction of noise associated with ensemble averaging or by addition of new predictable signals. They revealed that, for the El Niño-Southern Oscillation hindcasts, the skill of the NMME compared to individual models is substantially greater than that expected from increased ensemble size alone; thus one can conclude that the improvement is due to the addition of new signals. Shirvani and Landman [17] examined the skill of seasonal precipitation forecasts over Iran using the NMME models and two other coupled ocean-atmosphere models. Retroactive validations for lead times of up to three months were performed over a 15-year test period from 1995/1996 to 2009/2010. They found that downscaling forecasts from all NMME models generally produces the highest skill forecast at lead times of up to three months. Thober et al. [18] investigated the performance of a seasonal hydrologic prediction system for soil moisture drought forecasting over Europe based on the NMME forecast. They showed that using the NMME yields better results than using climatology values for hydrological forecasting. Moreover, they found that the NMME based forecasts with the full ensemble outperform even the single best-performing model (CFSv2 in that study).
The objective of this paper is to examine the performance and the accuracy of the International Multimodel Ensemble (IMME, a combination of the six North American models and the ECMWF seasonal model) with respect to the individual models in the ensemble. The study deals only with ensemble mean values (one per individual model that is a member of the multimodel ensemble, and one for the multimodel ensemble as a whole).
We examined whether the multimodel ensemble was able to improve the monthly and seasonal forecasts for a relatively small domain, in a country such as Israel, for different lead times (1 to 5 months). We also examined which combination of the models achieves the best results. Until now, examination of global models was done solely for global and large scale domains. Here, we present results for a multimodel ensemble (the NMME and the ECMWF models) for a small scale domain (100 × 300 km) that contains various climate conditions: a transition from Mediterranean climate to semiarid conditions.

Methodology and the Study Areas
2.1. Analysis of Various Seasonal Forecast Models. In this study, we used seven precipitation forecast models: the six NMME models (CFSV2, CMC1, CMC2, GFDL, NASA, and NCAR) and the ECMWF model. The NMME data are available on a 1° × 1° grid (around 100 km resolution), while the ECMWF model runs at 80 km grid resolution. Hindcast data are available for all models for the period 1982-2010. We analyzed the rainy season months in Israel, which are October through April. Two types of forecasts were examined: (a) monthly forecasts with one to five months' lead time and (b) seasonal forecasts (three months' accumulated precipitation). Each forecast value from each model was statistically downscaled using regression versus the observed monthly precipitation, because global models are too coarse to predict the local precipitation.
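The text states only that each forecast was downscaled by regression against the observed monthly values and does not give the regression form. The following is a minimal sketch under the assumption of an ordinary least-squares linear fit at a single grid point; the function names, the clipping of negative values, and the sample numbers are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def fit_linear_downscaling(model_precip, observed_precip):
    """Fit observed = slope * model + intercept by ordinary least squares."""
    slope, intercept = np.polyfit(model_precip, observed_precip, deg=1)
    return float(slope), float(intercept)

def apply_downscaling(model_precip, slope, intercept):
    """Map raw model precipitation to the local scale.

    Clipping at zero is an added assumption so that the corrected
    precipitation can never become negative.
    """
    corrected = slope * np.asarray(model_precip, dtype=float) + intercept
    return np.clip(corrected, 0.0, None)

# Invented values in mm/month, chosen so that gauge = 2 * model + 10.
raw_model = np.array([20.0, 40.0, 60.0, 80.0])
gauge_obs = np.array([50.0, 90.0, 130.0, 170.0])

slope, intercept = fit_linear_downscaling(raw_model, gauge_obs)
downscaled = apply_downscaling(raw_model, slope, intercept)
```

In practice the fit would be made on the hindcast period and, to avoid inflating skill, applied to years that were withheld from the fit; whether the study did so is not stated in the text.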
A one-month lead means, for example, a forecast that was issued in October for the month of November, and so on. A three-month accumulated precipitation forecast is a forecast for the coming three months (e.g., a forecast that was issued in October for the accumulated period of November-January).
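The lead-time convention can be made concrete with a small helper. This is only an illustrative sketch: the function `target_months` and its `span` argument are hypothetical names introduced here, and the convention that lead 1 means "the month after issue" follows the October/November example in the text.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def target_months(issue_month, lead, span=1):
    """Return the calendar months a forecast is valid for.

    issue_month: month the forecast is issued in (1 = January ... 12 = December)
    lead:        lead time in months (1 = the month right after issue)
    span:        number of accumulated months (1 = monthly, 3 = seasonal)
    """
    start = issue_month + lead  # 1-based index of the first target month
    return [MONTHS[(start - 1 + k) % 12] for k in range(span)]

monthly = target_months(10, lead=1)           # issued in October, monthly forecast
seasonal = target_months(10, lead=1, span=3)  # issued in October, seasonal forecast
```

With these definitions, `monthly` is `['Nov']` and `seasonal` is `['Nov', 'Dec', 'Jan']`, matching the two examples given in the text.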
The NMME ensemble, as considered in Becker et al. [19], is the simple average of all six models. In this study, we calculated a multimodel ensemble composed of the average of all the NMME individual models and the ECMWF model. The monthly precipitation from the models was compared to observed rain gauge values from the Israeli Meteorological Service (IMS) database. The analysis was done for grid locations in both Northern and Central Israel (Figure 1). The models' forecast point 33N/35E was used to represent the northern part of the domain and 32N/35E was used to represent the central part. The models' hindcasts in the north and the center were verified against an areal average calculated from a cluster of rain gauges close to the forecast grid point in each area. These clusters include three rain gauges in the north and four in the center. The rain gauge locations are provided in Figure 1(a) and the rain gauge details are provided in Table 1. Figure 1(b) presents the NMME and the ECMWF forecast grids, which outline the domain (circles for the NMME and triangles for the ECMWF locations).
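As a rough illustration of the two averages described above, the sketch below forms an equal-weight multimodel mean from seven per-model ensemble means and an areal average from a cluster of three gauges. All numbers are invented for the example, and the variable names are hypothetical.

```python
import numpy as np

# Invented downscaled hindcasts (mm/month) for three target months.
# Each entry stands for one model's ensemble mean at a single grid point.
model_forecasts = {
    "CFSV2": [90.0, 70.0, 50.0],
    "CMC1":  [100.0, 80.0, 60.0],
    "CMC2":  [110.0, 90.0, 70.0],
    "GFDL":  [95.0, 75.0, 55.0],
    "NASA":  [105.0, 85.0, 65.0],
    "NCAR":  [100.0, 80.0, 60.0],
    "ECMWF": [100.0, 80.0, 60.0],
}

# IMME: simple (equal-weight) average over the seven models.
imme = np.mean(list(model_forecasts.values()), axis=0)

# Areal average of a gauge cluster (rows = gauges, columns = months).
north_gauges = np.array([
    [110.0, 90.0, 70.0],
    [100.0, 80.0, 60.0],
    [90.0, 70.0, 50.0],
])
areal_obs = north_gauges.mean(axis=0)
```

Equal weighting is the simplest choice and is what the text describes; skill-based weights would be a different method and are not implied here.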
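The model performance was validated using the Pearson correlation, the root mean square error (RMSE), and the Nash-Sutcliffe Efficiency (NSE), a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance. Additionally, we tested each model and the IMME scores against the climatology (the average observed values considering the seasonality) in order to quantify their added value over the naïve climatology prediction.
2.2. The Study Area. The study area covers the northern and central parts of Israel. Israel is located in the eastern part of the Mediterranean, between about 30° and 33° north and 34° and 36° east. The northern part of the country is relatively wet, with annual precipitation around 800-1000 mm/y in the mountain regions, which provide the major water resources for the country. The central parts are drier, with annual precipitation of around 550 to 700 mm/y. Four major water resources are responsible for the majority of the water supply in the country (see their locations in Figure 1(a)): Lake Kinneret (Sea of Galilee), the Western Galilee Aquifer, the Coastal Aquifer, and the Mountain Aquifer. As described in the previous section, seven rain gauges were chosen to represent the rainfall amounts in those basins. These rain gauges were also used for the seasonal analysis by Wu et al. [7] and Rostkier-Edelstein et al. [8].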

Monthly Precipitation.
Tables 2 and 3 and Figures 2(a)-2(f) present the correlation, RMSE, and NSE for the observed monthly precipitation at the northern (Table 2) and central (Table 3) forecast points versus each single model and the IMME for hindcasts with one to five months' lead time. The results demonstrate the spread between the seven models' performances. For example, as can be seen in Table 2, for the northern domain with one-month lead time, the NCAR model's results were correlation = 0.58, RMSE = 74, and NSE = 0.38. For the CFSV2 and ECMWF models, correlation = 0.64, RMSE = 65, and NSE = 0.43. It should be noted that, as expected, the models' performances worsen as the lead time increases (Figures 3(a) and 3(b)).
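The three scores reported in this section follow standard definitions. A minimal sketch of how they could be computed from paired observed and predicted series is given below; the function names and the sample values are illustrative assumptions, not the study's code or data.

```python
import numpy as np

def pearson_r(obs, pred):
    """Pearson correlation between observations and predictions."""
    return float(np.corrcoef(obs, pred)[0, 1])

def rmse(obs, pred):
    """Root mean square error, in the units of the data."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def nse(obs, pred):
    """Nash-Sutcliffe Efficiency: 1 minus the ratio of the squared model
    error to the squared deviation of the observations from their mean."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Invented example values (mm/month).
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]

scores = {"r": pearson_r(obs, pred), "rmse": rmse(obs, pred), "nse": nse(obs, pred)}
```

For this toy series the squared errors sum to 26 and the squared deviations from the mean sum to 500, so the NSE is 1 - 26/500 = 0.948 and the RMSE is the square root of 6.5.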
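Our results indicate that the IMME, quantified by the three performance measures, outperforms any individual model for the lead times considered in the analysis. The IMME had the highest correlation, lowest RMSE, and highest NSE. Comparing the northern and the central parts of the domain shows that the performance of both the individual models and the IMME is inferior in the central part of the domain; however, the IMME still outperforms any individual model in the ensemble.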

Seasonal Precipitation (Three Months Accumulated).
Tables 4 and 5 and Figures 4(a)-4(f) present the correlation, RMSE, and NSE for the observed versus model precipitation at the northern (Table 4) and central (Table 5) areas for three months' accumulated precipitation. All the performance measures for the accumulated three-month forecasts showed better performance for all models compared to the monthly analysis. Comparing the performances of the IMME leads to conclusions similar to those of the monthly analysis. That is, the score of the IMME is better than that of all individual models in the three performance measures over the entire domain. It should also be noted that, as in the monthly analysis, the model performances for the northern part of the domain are better than for the central part.

Models Performance with respect to Climatology.
Up to this point, we have used the correlation, the RMSE, and the NSE to quantify the performance of the different models. These measures, while convenient and well known by most researchers in the field, do not provide a reliable basis for comparing the results with the naïve climatology prediction. For example, the NSE is a normalized measure (ranging from −∞ to 1) that compares the mean square error generated by the model to the variance of the observations. As such, it represents a form of noise-to-signal ratio, comparing the average variability of the model residuals to the variability of the observations. It implicitly compares the performance of the particular model to that of a simple model that uses the mean of all the observations as its prediction. This means that if this simple mean is a very bad predictor, it is easy for the particular model to perform well compared to it. We argue that the reference model hidden in the NSE value (i.e., the simple constant mean) is not the best simple model to use as a reference. The mean observed value is a very poor predictor for a set of observations that contains strong seasonal patterns, as is the case in the analysis herein. Since our performance analysis is organized by lead time, every set of observations contains different months. For example, the observation set for lead one contains values from the various rainy months in Israel (October to April). As such, the constant mean of all of these months is not a good reference model. One can show that a naïve model that simply uses the mean of the observations in each calendar month (i.e., climatology) already yields higher NSE values. To deal with this challenge, we adopt the Benchmark Efficiency (BE) measure suggested by Schaefli and Gupta [20]. In this measure, we use the climatology as the simplest reference model instead of the constant mean in the NSE calculation. The BE measure is defined as
\[ \mathrm{BE} = 1 - \frac{\sum_{i=1}^{N} \left(P_i^{\mathrm{obs}} - P_i^{\mathrm{pred}}\right)^2}{\sum_{i=1}^{N} \left(P_i^{\mathrm{obs}} - P_i^{\mathrm{ref}}\right)^2} \]
where $P_i^{\mathrm{obs}}$ is the observed precipitation, $P_i^{\mathrm{pred}}$ is the predicted precipitation, $P_i^{\mathrm{ref}}$ is the precipitation from the best benchmark, and $N$ is the number of observations.
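To illustrate why a seasonal benchmark is a stricter reference than the constant mean, the sketch below computes the Benchmark Efficiency (one minus the ratio of the model's squared error to the benchmark's squared error) with a monthly climatology as the benchmark. The function names and the toy data are assumptions made for this example; in particular, the climatology here is computed from the same sample it is evaluated on, which may differ from how the study built its benchmark.

```python
import numpy as np

def benchmark_efficiency(obs, pred, ref):
    """BE = 1 - sum((obs - pred)^2) / sum((obs - ref)^2)."""
    obs, pred, ref = (np.asarray(x, float) for x in (obs, pred, ref))
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - ref) ** 2))

def monthly_climatology(obs, months):
    """Benchmark series: each sample is replaced by the mean of all
    observations that fall in the same calendar month."""
    obs = np.asarray(obs, float)
    months = np.asarray(months)
    ref = np.empty_like(obs)
    for m in np.unique(months):
        ref[months == m] = obs[months == m].mean()
    return ref

# Invented data: two Novembers and two Decembers (mm/month).
months = [11, 11, 12, 12]
obs = [40.0, 60.0, 90.0, 110.0]
pred = [45.0, 55.0, 95.0, 105.0]

clim = monthly_climatology(obs, months)            # [50, 50, 100, 100]
be_model = benchmark_efficiency(obs, pred, clim)   # 1 - 100/400
const_mean = np.full(len(obs), np.mean(obs))
nse_clim = benchmark_efficiency(obs, clim, const_mean)  # NSE of the climatology
```

In this toy case the climatology alone reaches an NSE of about 0.86 against the constant mean, while the model's BE against the climatology is 0.75. A negative BE would mean the model is worse than the climatology benchmark.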
Table 6 provides the correlation, the NSE, and the BE between the models and the observations for one month of lead time, as well as for the seasonal prediction, at the northern part of the domain. As is evident from the results, the climatology obtained very good results for the reasons explained above. A negative BE indicates that the model is worse than the climatology prediction. It can be seen that the correlation between the climatology and the observations is relatively high (correlation = 0.653, NSE = 0.42), which is higher than the correlation for four of the models (CMC1, CMC2, NASA, and NCAR) in the monthly analysis. Thus the BE for these models is negative. In the monthly analysis, the IMME obtained the best BE of 0.052. The second best are the CFSV2 and GFDL models, which obtained the same BE of 0.036. In the seasonal prediction, the climatology outperformed all single models (as indicated by the negative BE) except for the CMC2 model. The seasonal IMME obtained the highest BE of 0.094, which is double the CMC2 performance.

Combination Analysis of All Models.
Up to this point, we have presented the IMME performance as compared to the individual models. In this section we provide a comprehensive analysis that explores the possible combinations of the different models. Using the seven individual models, one can construct 127 possible combinations (every nonempty subset of seven models, 2^7 − 1). The prediction of a combination is the simple mean of the selected models. The correlations of these combinations range between 0.57 and 0.68. Table 7 presents the rank and correlation of each combination for the northern part of the domain and the lead one monthly forecast. The results show that the combinations with the high correlation values contain at least three models. Interestingly, the combination that uses every model is not even in the top 10 combinations. However, its correlation is very close to that of the best combination (0.67 instead of 0.68). Further analysis shows that, unlike the top 10 combinations in Table 7 (which are optimal for the northern part and lead one), the combination that uses every model always produces high correlation independently of the lead and the location, and, as such, it is a more robust predictor.
Figure 5 summarizes the combination analysis in Table 7 in terms of the average, maximum, and minimum correlation for one-month lead time in the northern part of the domain as a function of the number of models in the ensemble. It can be seen that the average, maximum, and minimum correlations increase as a function of the number of models. It is interesting to note the range, that is, the difference between the maximum (best case) and the minimum (worst case) correlation, as a function of the number of models in the ensemble. The range decreases rapidly with the number of models, indicating that the prediction is becoming less uncertain. Using only one model results in a range of correlations between 0.57 and 0.66, with an average correlation of 0.63, while using the seven-model ensemble provides a correlation of 0.67. This highlights the advantage of using IMME predictions.
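The combination analysis can be sketched as an exhaustive loop over all nonempty subsets of the seven models. The code below is an illustrative reconstruction on synthetic data, not the study's implementation: the function names, the generic model labels `M1` to `M7`, and the random series are all assumptions, so the resulting correlations bear no relation to the values in Table 7.

```python
from itertools import combinations

import numpy as np

def combination_scores(forecasts, obs):
    """Correlation of the equal-weight mean of every nonempty model subset.

    forecasts: dict mapping model name -> 1-D array of forecasts
    Returns a list of (subset, correlation) sorted from best to worst.
    """
    names = sorted(forecasts)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            mean_fc = np.mean([forecasts[n] for n in subset], axis=0)
            r = float(np.corrcoef(obs, mean_fc)[0, 1])
            results.append((subset, r))
    return sorted(results, key=lambda item: item[1], reverse=True)

def summary_by_size(results):
    """(min, mean, max) correlation for each ensemble size."""
    by_size = {}
    for subset, r in results:
        by_size.setdefault(len(subset), []).append(r)
    return {k: (min(v), float(np.mean(v)), max(v)) for k, v in sorted(by_size.items())}

# Synthetic demo: seven noisy copies of a made-up observed series.
rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 40.0, size=200)
forecasts = {f"M{i}": obs + rng.normal(0.0, 60.0, size=200) for i in range(1, 8)}

results = combination_scores(forecasts, obs)
summary = summary_by_size(results)
```

Here `results` has 127 entries, and `summary[7]` has identical minimum and maximum because only one seven-model combination exists, which is why the range in Figure 5 collapses to a single value at seven models.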
The monthly analysis shows a decrease in model performance with an increase in lead time. While there is only a slight difference between one and two months' lead time, going further to four or five months' lead time results in decreased accuracy (correlation = 0.50 for most of the models). In general, the monthly analysis does not provide very high skill, even when using the IMME for one-month lead time. It has added value with respect to climatology, but this added value is limited to a 3% increase in the correlation values. The scores for the three months' accumulated precipitation are higher, and it is recommended to use it over the monthly forecasts. In the combination analysis, using the mean of a combination of models outperformed any single model prediction. The results show that the optimal combinations with the high correlation values contain at least three models. Our analysis indicates that although the IMME prediction (i.e., the combination that contains all models) was not the optimal combination for the lead one prediction in the northern part of the domain, it is preferable to combinations that use fewer models. This is because the difference between the optimal combination and the IMME performance is very small, while the IMME produces high correlation independently of the lead and the location.
It is interesting to note that the models' performances are better for the northern part of the domain than for the central part. For the IMME, the correlation for one-month lead time is 0.67 for the northern domain compared to 0.63 in the center. For three months' accumulated precipitation, the correlation decreases from 0.71 to 0.66 from the north to the center. The same trend is also noted for the RMSE and NSE. To expand the analysis, we tested the model results in the southern domain (31N/35E). The results showed a decreasing trend of the models' performance toward the south. In the south, the correlation for the IMME is 0.55 for one-month lead time and 0.58 for the three months' accumulated precipitation.
A possible explanation for this trend could be the climate characteristics of the Eastern Mediterranean. As was described in Section 2.2, Israel is located in the eastern part of the Mediterranean, between about 30° and 33° north. The northern part of the country is relatively wet, while the central parts are drier. As we go south to around latitude 31, the climate becomes semiarid with annual precipitation around 200 to 300 mm. According to Goldreich [21], more than 90% of the rainfall in Central and Northern Israel is caused by cold fronts, and the air masses that follow these fronts are associated with extratropical cyclones that pass through the northeastern corner of the Mediterranean Sea (Cyprus low). There are also differences between Northern and Central Israel in the cloud systems contributing to precipitation. In the north (Galilee plain), 40% of the precipitation originates from stationary and cold fronts, compared to 19% in the central plains. On the other hand, Bénard cells and coastal fronts contribute only 19% in the north compared to 41% in the center [22].
Due to the shape of Israel's coastline (see Figure 1), the air mass coming from the west during Cyprus low synoptic systems affects mostly the northern part of the country, while the center and the southern arid parts of the country are more influenced by moisture coming from the south and the east (smaller scale, convective systems like the Red Sea Trough, and the jet stream). The standard deviation of monthly precipitation relative to the average increases strongly as we go from north to south: 84 mm standard deviation with respect to an average of 100 mm in the north, 78 mm with respect to 84 mm in the center, and 78 mm with respect to 70 mm in the south, where the standard deviation is actually higher than the average.
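The north-to-south increase in relative variability can be expressed as a coefficient of variation (standard deviation divided by mean). The short calculation below uses only the three pairs of values quoted in the text; the variable names are illustrative.

```python
# (mean, standard deviation) of monthly precipitation in mm, as quoted in the text.
stats = {
    "north":  (100.0, 84.0),
    "center": (84.0, 78.0),
    "south":  (70.0, 78.0),
}

# Coefficient of variation: standard deviation relative to the mean.
cv = {region: std / mean for region, (mean, std) in stats.items()}

for region, value in cv.items():
    print(f"{region}: CV = {value:.2f}")
```

The ratios are about 0.84 in the north, 0.93 in the center, and 1.11 in the south, so only in the south does the standard deviation exceed the mean.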
Numerical models with a coarse resolution of ∼100 km may resolve large scale weather systems such as cold fronts better than local convergence systems like Bénard cells and coastal fronts [23]. Therefore, the lower skill in the south may be due to the current coarse model resolution.
Israel and the surrounding countries are experiencing a significant decrease in water availability. Better precipitation and hydrological forecasts can help decision-makers in the region better plan and manage their water resources systems. This would lead to more informed decisions on matters such as allocation amounts for agriculture, withdrawal amounts from aquifers and lakes, reservoir operations, and the management of desalination and reclaimed water facilities. In this study, we quantified the accuracy of the seasonal and the monthly forecasting in Israel and showed the advantage of using an ensemble of global models. Water related decision-makers, such as the Israeli Water Authority (IWA), will be able to decide whether to take action or not, knowing the forecast skill for the different lead times. These methodologies can also work for other countries that use an integrated water resources management approach, which requires precipitation forecasting in order to derive the optimal management policy.

Figure 1: (a) Rain gauge locations and the major aquifers in Israel: Western Galilee, Kinneret, Mountain Aquifer (Yarqon-Taninim), and Coastal Aquifer; (b) NMME and ECMWF grid points in the study area.

Figure 2: The correlation, RMSE, and NSE for 1-5 months' lead times for each individual model and the IMME ensemble at the northern (a, c, e) and central (b, d, f) parts of the domain.

Figure 5: Combination analysis for average, maximum, and minimum correlation for 1-month lead time precipitation between the models and the reforecast, as a function of the number of models in the ensemble in the northern part of the domain.

Table 1: The rain gauges details.

Table 2: Correlation, RMSE, and NSE for observed monthly precipitation versus the model predictions: one to five months' lead times, Northern Israel, Latitude: 33.0, Longitude: 35.0.

Table 3: Correlation, RMSE, and NSE for observed monthly precipitation versus the model predictions: one to five months' lead times, Central Israel, Latitude: 32.0, Longitude: 35.0.

Table 4: Correlation, RMSE, and NSE for observed three months' precipitation versus the model predictions: Northern Israel, Latitude: 33.0, Longitude: 35.0.

Table 6: Models' results with respect to the observed climatology quantified by the Benchmark Efficiency (BE), Northern Israel, Latitude: 33.0, Longitude: 35.0.

Table 7: Possible model combinations and their rank in the northern part of the domain for the lead one forecast.