Initialized Decadal Predictions by LASG/IAP Climate System Model FGOALS-s2: Evaluations of Strengths and Weaknesses

Decadal prediction experiments are conducted by using the coupled global climate model FGOALS-s2, following the CMIP5 protocol. The paper documents the initialization procedures for the decadal prediction experiments and summarizes the predictive skills of the experiments, which are assessed through indicators adopted by the IPCC AR5. The observational anomalies of surface and subsurface ocean temperature and salinity are assimilated through a modified incremental analysis update (IAU) scheme. Three sets of 10-year-long hindcast and forecast runs were started every five years in the period of 1960–2005, with the initial conditions taken from the assimilation runs. The decadal prediction experiment by FGOALS-s2 shows significantly high predictive skills in the Indian Ocean, tropical western Pacific, and Atlantic, similar to the results of the CMIP5 multimodel ensemble. The predictive skills in the Indian Ocean and tropical western Pacific are primarily attributed to the model response to the external radiative forcing associated with the change of atmospheric compositions. In contrast, the high skills in the Atlantic are attributed, at least partly, to the improvements in the prediction of the Atlantic multidecadal variability coming from the initialization.


Introduction
In recent years, near-term climate prediction for the next 10-30 years has drawn increasing attention from the climate modeling community and policy makers because of its potential value in dealing with the economic and social problems associated with climate change (e.g., [1]). The pioneering decadal prediction studies based on climate models were published during 2007-2009 (e.g., [2][3][4][5]). Extensive cooperative research on decadal prediction followed, including the ENSEMBLES project [6] and a coordinated decadal prediction experiment under the framework of the CMIP5 [7,8]. Sixteen modeling centers submitted their decadal prediction experiment results to the CMIP5, and these results were used in the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC AR5 [9]).
As noted in Meehl et al. [7], decadal prediction is a combination of an initial value problem and a forced boundary condition problem, because decadal prediction encompasses the climate system changes due to internally generated variability as well as externally forced variability. The externally forced variability is driven by external forcing factors, such as changes of atmospheric compositions associated with human activity or volcanic eruption, solar variations, and others, which can be considered as specified external forcing in climate models, as done by historical simulations or RCP projections [8].
The predictive skill for internal variability that comes from initialization is the primary added value of the decadal prediction experiments relative to the historical simulations and RCP projections. As an initial condition problem, the prediction of internal variability depends on an accurate estimate of the initial climate state, which is also the most challenging problem in decadal prediction. Different institutions have their own distinctive initialization schemes, which are briefly introduced in Kirtman et al. [9] and Meehl et al. [1].
For the CMIP5, most institutions initialized their models by assimilating only oceanic surface and subsurface temperature and salinity. However, there were also some explorations that involved observational atmosphere and sea ice data in the assimilation processes (Table 11.1 in Kirtman et al. [9]).
In terms of how they deal with model drift in the forecast, the initialization schemes can be classified into two types, full-field initialization and anomaly initialization [10]. In full-field initialization, model biases are largely removed during initialization, but the model inevitably drifts back towards its preferred state during the hindcast/forecast because of its inherent biases. Therefore, forecast results must be corrected through posterior bias adjustments [10][11][12][13][14]. In anomaly initialization, the model is constrained by observational anomalies added to the model mean state [15]. The model therefore stays close to its preferred state after the initialization, which minimizes the drift during the hindcast/forecast. So far, it is not clear which approach is better for decadal prediction [10]. In the CMIP5, about two-thirds of the models use full-field initialization, while the other third use anomaly initialization [9].
The motivation of this study is to systematically assess the predictive skills of the decadal prediction experiments conducted with a coupled global climate model, FGOALS-s2, whose results have been submitted to the CMIP5. To allow direct comparison with the results of other models, we measure the prediction quality with the indicators proposed by Doblas-Reyes et al. [16], which have been adopted as key indicators by the IPCC AR5 [9].
The rest of the paper is organized as follows. The model FGOALS-s2, experiment designs, observation data, and analysis methods are introduced in Section 2. The skills of the decadal prediction experiments are assessed in Section 3. Finally, Section 4 summarizes the major content.

Model, Experiment Design, Observational Data, and Analysis Method
[19,20]. The land and ice components are the Community Land Model version 3 (CLM3) [21] and the Community Sea Ice Model version 5 (CSIM5) [17], respectively. A detailed description of FGOALS-s2 and its general performance can be found in Bao et al. [18].

Decadal Prediction Experiments.
The decadal prediction experiments include the following two steps.
(a) Initialization. The model was initialized by assimilating observational oceanic temperature and salinity over the upper 1000 m for the period of 1955-2005 (hereafter ASSIM run). The observational oceanic data were derived from EN3 v2a, which is a gridded objective analysis dataset with a horizontal resolution of 1° × 1° and 42 levels in the vertical direction [22]. Only the anomalies relative to the climatology during 1961-1990 were assimilated (the anomaly initialization approach noted in the introduction). The assimilation was confined to the zone of 70°S-70°N, with 60°-70°N and 60°-70°S set as transitional zones. The observational information was introduced into the model integration through a method similar to the incremental analysis update (IAU) method. The IAU technique was designed for meteorological data assimilation systems [23] and was then applied to ocean assimilation [24] and coupled model initialization [25]. Its major advantage over the nudging approach is that it keeps the analysis increment constant in the model's prognostic equations and thus effectively suppresses short-wave noise in the assimilation processes [23].
Because the ocean objective analysis data, EN3 v2a, are monthly means, the analysis interval in the assimilation processes was specified as one month ($\tau = 1$ month). In one assimilation cycle from $t$ to $t + \tau$, the model was first integrated freely, which produced the first guess for the assimilation. The analysis increments ($\Delta X$) were calculated as
$$\Delta X = X^{o} - X^{f},$$
in which $X^{f}$ and $X^{o}$ represent monthly mean anomalous ocean states (temperature and salinity) derived from the free integration and the observation, respectively. Then the model was restarted from $t$ and integrated to $t + \tau$, with the analysis increments introduced in the following way:
$$\frac{\partial X}{\partial t} = F(X) + \frac{\Delta X}{\tau}.$$
The left-hand-side term is the time tendency term. The first term on the right-hand side represents the forcing and dissipation terms calculated by the model. The last term is the correction term, which remains constant over the integration interval. The modified IAU scheme has also been used in the decadal prediction experiments with FGOALS-gl [26].
(b) Hindcast/Forecast. The 10-year-long hindcasts/forecasts were started every five years over the period of 1960-2005. Initial conditions were obtained from the ASSIM runs. In the hindcast and forecast stages (before and after 2005), the model was driven by the time-varying radiative forcing consistent with the historical and representative concentration pathways 4.5 (RCP4.5) simulations, respectively. The second step was conducted in strict accordance with the standard experiment design of the CMIP5 [8].
To estimate the uncertainties of the prediction, we performed three ASSIM runs with different initial conditions, which in turn provided the initial conditions for three sets of hindcast/forecast runs.
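The two-pass assimilation cycle in step (a) can be illustrated with a toy scalar model. This is only a minimal sketch under stated assumptions, not the FGOALS-s2 implementation: the function name `iau_cycle`, the forward-Euler time stepping, and the scalar state are all hypothetical choices made for illustration. The free run gives the first guess, the increment is the observed mean minus the first-guess mean, and the restarted run adds that increment at a constant rate over the cycle.

```python
def iau_cycle(x0, tendency, obs_mean, n_steps, dt):
    """One assimilation cycle of a constant-increment (IAU-like) scheme.

    Hypothetical toy setup: scalar state, forward-Euler stepping.
    Returns the state at the end of the cycle and the analysis increment.
    """
    tau = n_steps * dt  # length of the analysis interval

    # Pass 1: free integration, used as the first guess.
    x = x0
    free_states = []
    for _ in range(n_steps):
        x = x + dt * tendency(x)
        free_states.append(x)
    free_mean = sum(free_states) / n_steps

    # Analysis increment: observed mean minus first-guess mean.
    increment = obs_mean - free_mean

    # Pass 2: restart from x0; the correction term increment / tau
    # stays constant over the whole interval.
    x = x0
    for _ in range(n_steps):
        x = x + dt * (tendency(x) + increment / tau)
    return x, increment


# Usage: a linearly damped "model" pulled towards an observed mean of 2.0.
x_end, inc = iau_cycle(x0=1.0, tendency=lambda x: -0.1 * x,
                       obs_mean=2.0, n_steps=30, dt=1.0)
print(x_end, inc)
```

Because the correction is spread evenly over the interval rather than applied as a single jump, the state is adjusted gradually, which is the property the text credits with suppressing short-wave noise.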

Historical and RCP4.5 Simulations.
For the historical simulation, the FGOALS-s2 was integrated from 1850 to 2005 under the various historical forcing agents, including the concentrations of greenhouse gases and sulfate aerosols, solar cycle variations, and major volcanic eruptions [27]. After 2005, the model was driven by projected radiative forcing under the RCP4.5 scenario, which is referred to as the RCP4.5 simulation. The historical and RCP4.5 simulations are repeated three times with different initial conditions, following the recommendation of the CMIP5 [8].

Observational Data.
The following two datasets are used as observational references to assess the predictive skills of the decadal prediction experiments: (1) the HadCRUT3 combined global land and ocean gridded (5° × 5°) surface temperature dataset for the period of 1850 to present [28] and (2) the Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset (2.5° × 2.5°) from 1901 to present, which is gridded from global station data [29].

Analysis Method.
As noted in Section 2.2, the anomaly initialization approach was used in the study, which effectively inhibits model drift during the hindcast/forecast [10]. Thus the bias correction required for full-field initializations was not applied [10]. However, to prevent any slight residual model drift during the hindcast/forecast from affecting the skill evaluations, we calculated anomalies as follows [16]:
$$X'_{jl} = X_{jl} - \frac{1}{N}\sum_{k=1}^{N} X_{kl},$$
where $X'_{jl}$ and $X_{jl}$ are the anomalous and raw fields, respectively, for hindcast/forecast $j$ at lead time $l$, and $N$ denotes the ensemble size. The observational anomalies were also calculated by using the corresponding years. Then, to filter out interannual variability, the annual values were smoothed by a 4-year running average. In the study, we analyzed the predictions averaged over the hindcast/forecast years 2-5, 3-6, 4-7, 5-8, and 6-9.
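The anomaly calculation and the 4-year averaging can be sketched as follows. This is a minimal illustration, assuming that the reference subtracted at each lead time is the mean over all hindcasts at that same lead time; the function names and the nested-list layout `raw[j][l]` (hindcast `j`, lead year `l`) are hypothetical.

```python
def lead_time_anomalies(raw):
    """Subtract, at each lead time, the mean over all hindcasts.

    raw[j][l] is the raw value of hindcast j at lead time l (assumed layout).
    """
    n = len(raw)
    n_lead = len(raw[0])
    reference = [sum(raw[j][l] for j in range(n)) / n for l in range(n_lead)]
    return [[raw[j][l] - reference[l] for l in range(n_lead)] for j in range(n)]


def window_means(series, width=4):
    """Running means of `width` consecutive years (years 1-4, 2-5, ...)."""
    return [sum(series[i:i + width]) / width
            for i in range(len(series) - width + 1)]


# Usage: for a 10-year hindcast, entries 1 to 5 of window_means are the
# averages over hindcast years 2-5, 3-6, 4-7, 5-8, and 6-9.
raw = [[14.0 + 0.1 * l + 0.5 * j for l in range(10)] for j in range(3)]
anomalies = lead_time_anomalies(raw)
years_2_to_5 = [window_means(a)[1] for a in anomalies]
print(years_2_to_5)
```

Removing a lead-time-dependent reference means that any drift shared by all hindcasts at a given lead time cancels out of the anomalies.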
The main strategy for evaluating the skills of the decadal prediction experiments is to compare them with the corresponding historical simulations. In terms of the experiment designs (Section 2.2), their only difference is that the former are started from initialized states every five years, while the latter are continuous integrations. Thus the predictions by the two experiments are referred to as INIT and NoINIT predictions, respectively. The comparisons between INIT and NoINIT demonstrate the change in the decadal predictions due to the initialization.
In this study, the skills are quantified by correlation, root mean square error (RMSE), and root mean square skill score (RMSSS), which generally follow Doblas-Reyes et al. [16]. The RMSSS is defined as
$$\mathrm{RMSSS} = 1 - \frac{\mathrm{RMSE}(\text{prediction})}{\mathrm{RMSE}(\text{climatology})},$$
in which RMSE (climatology) represents the no-skill baseline. The climatology is equivalent to persistent zero anomalies. Thus high positive values of RMSSS represent high skills, while negative values represent no skills. The statistical significance of the correlation is tested by a one-sided Student's t-test. The significance of the ratio in RMSE between the INIT and NoINIT predictions is tested by a two-sided test. The significance of the RMSSS is assessed by using a one-sided test.
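The three skill measures can be written compactly for a pair of anomaly time series. The sketch below is illustrative only: the function names are hypothetical, the inputs are plain lists of anomalies, and the climatological baseline is taken as a zero-anomaly prediction, as described above. Significance testing is not included.

```python
import math


def rmse(pred, obs):
    """Root mean square error between predicted and observed anomalies."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def correlation(pred, obs):
    """Pearson correlation coefficient."""
    mp = sum(pred) / len(pred)
    mo = sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def rmsss(pred, obs):
    """Skill score relative to a zero-anomaly (climatology) prediction."""
    rmse_clim = rmse([0.0] * len(obs), obs)
    return 1.0 - rmse(pred, obs) / rmse_clim


# Usage with made-up anomaly series for two experiments.
obs = [0.1, -0.2, 0.3, 0.4, -0.1]
init = [0.2, -0.1, 0.2, 0.5, 0.0]
noinit = [0.0, 0.1, 0.1, 0.2, 0.2]
print(rmsss(init, obs), correlation(init, obs))
print(rmse(init, obs) / rmse(noinit, obs))  # ratio below 1 favors INIT
```

A perfect prediction gives an RMSSS of 1, a zero-anomaly prediction gives 0, and predictions worse than climatology give negative values.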

Results
We first assess the spatial distributions of the predictive skills for near-surface air temperature and land precipitation. Then we turn to the predictive skills for the global mean near-surface air temperature and for two dominant modes on interdecadal time scales, the Atlantic multidecadal variability and the Pacific interdecadal variability.

Spatial Distributions of Predictive Skills.
Figure 1 shows the global distributions of the RMSSS, which quantify the skills of the ensemble mean of the INIT runs in predicting near-surface air temperature. For the prediction averaged over hindcast years 2-5, the forecast system has positive skills over much of the Atlantic and Indian Ocean and some areas of the Eurasian continent at the 15% significance level, while the system shows low skills over most of the Pacific, except for the tropical western Pacific. These regions with significantly high skills are generally consistent with the CMIP5 multimodel ensemble (MME) mean (Figure 11.4a in Kirtman et al. [9]). The major disadvantage of FGOALS-s2 relative to the MME mean is that its skills in the midlatitude North Atlantic are lower. Many previous studies have noted that the Indian Ocean and western Pacific are primarily dominated by a warming trend associated with anthropogenic forcing, which can be reasonably reproduced by historical simulations [30,31]. To estimate the added skills coming from the initialization, the ratios of the RMSEs between the ensemble mean of the INIT runs and that of the NoINIT runs are calculated. The RMSE of the INIT is less than that of the NoINIT over the majority of the globe. However, it is only over some areas of the Atlantic and Indian Ocean that the skill improvements pass the 15% significance level.
For the prediction averaged over the hindcast years 6-9, the spatial distribution of the RMSSS generally resembles that for the hindcast years 2-5. An obvious increase of the INIT skills relative to the NoINIT is seen only over the midlatitude North Atlantic and southern Indian Ocean. The ratio of the RMSEs is also less than 1 over the majority of the globe. However, almost no area passes the significance test.
The results indicate that the prediction information due to the initialization decreases as the prediction time increases [1]. Compared with the near-surface air temperature, the ensemble mean of the INIT runs shows very low skills in the prediction of land precipitation for both the hindcast years 2-5 and 6-9. The global distribution of the RMSSS indicates that there are only sporadic regions with positive skills. These positive skills do not pass the 15% significance test (Figures 2(a) and 2(b)). Meanwhile, the ratio of RMSEs between the INIT and NoINIT runs indicates that the skill improvement due to the initialization is very limited (Figures 2(c) and 2(d)). The results are consistent with the CMIP5 MME (Figure 11.5 in Kirtman et al. [9]).

Global Mean Near-Surface Temperature.
Predictive skills of the area-weighted global mean near-surface air temperature (GMST) are quantified by correlation and RMSE (Figures 3(a) and 3(b)). The GMSTs simulated by all the individual members and ensemble means of the INIT and NoINIT runs for the different hindcast ranges are highly correlated with the corresponding observational references at the 5% significance level. In terms of the correlations, the skills of the INIT runs are somewhat lower than those of the NoINIT runs, especially in the early prediction time. In contrast, in terms of the RMSEs, the skills of the INIT runs are higher than those of the NoINIT runs. However, the skill differences between the INIT and NoINIT become smaller as the prediction time increases, in terms of both the correlations and the RMSEs. This indicates that the prediction information coming from the initialization gradually decreases with increasing prediction time, and the evolution of the GMST is increasingly dominated by the external forcing associated with atmospheric composition.
The temporal evolutions of the GMST for the hindcast/forecast years 6-9 and the corresponding observational reference are shown in Figure 3(c). During the latter half of the 20th century, the GMST is dominated by a significant warming trend. However, during the early 21st century, a hiatus of the GMST rise is observed (e.g., [32][33][34][35]; Figure 3(c)). Neither the ensemble mean of the INIT runs averaged over hindcast years 6-9 nor that of the NoINIT runs simulates the hiatus of the GMST rise; however, the warming trend in the INIT runs is much smaller than that in the NoINIT runs and closer to the observation, especially after 2000 (Figure 3(c)). This feature can be seen more clearly in the hindcast years 1-4 (Figure 3(d)). The result is robust among different models (e.g., [3,9,36]).
It is interesting to further investigate what causes the correlation skills of the INIT runs to be lower than those of the NoINIT runs in the early prediction time. Compared with the hindcast years 6-9, the GMSTs over the hindcast years 1-4 are closer to the corresponding observational references, except for the prediction started from 1985 (Figure 3(d)). The predicted GMST over 1986-1989 (hindcast years 1-4) is much higher than the observation. In the observation, 1986 and 1987 are dominated by a strong El Niño event, while 1988 and 1989 are dominated by a strong La Niña event (Figures 4(a)-4(d)). For the four-year average, the GMST is nearly in a neutral state. However, the El Niño simulated in the INIT runs persists longer than that in the observation and evolves to a neutral state rather than a strong La Niña as in the observation (Figures 4(h) and 4(i)). Therefore, the predicted GMST averaged over the four years is dominated by an El Niño-like pattern (Figure 4(j)). For the NoINIT runs, though the ENSO evolution is completely different from that in the observation (Figures 4(k)-4(n)), the simulated GMST averaged over the four years is in the neutral state (Figure 4(o)). The results indicate that the decadal prediction is sometimes influenced by the interannual variability, especially in the early prediction time. The negative impacts may be partly overcome by increasing the number of hindcasts, for example, by starting hindcasts every year instead of every five years.
Figure 2: As in Figure 1, but for the land precipitation.

Atlantic Multidecadal Variability.
The spatial distributions of the RMSSS for the near-surface temperature (Figure 1(a)) indicate that the ensemble mean of the INIT runs shows high skills in the Atlantic. In this subsection, we assess the skills of the individual members of the INIT runs and their ensemble mean in predicting the Atlantic multidecadal variability (AMV). The AMV is depicted by an index defined as the area-averaged SST anomalies in the region 0°-60°N, 80°-0°W minus the area-averaged near-global SST anomalies in 60°S-60°N [37]. The predictive skills are measured by correlation along the hindcast time for 4-year averages (Figure 5(a)). Only one INIT member reproduces an AMV index that is highly correlated with the observational reference at the 5% significance level over the hindcast years 5-8 and 6-9. The skills of the ensemble mean are higher than those of any individual member over the hindcast years 5-8 and 6-9. The AMV in the ensemble mean is highly correlated with the observational reference at most hindcast ranges. The highest correlation is reached in the hindcast years 6-9. In contrast, the NoINIT runs do not have any significant correlations with the observational references.
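The AMV index definition above translates directly into two area averages. The sketch below is a minimal illustration under stated assumptions: the SST anomalies sit on a regular latitude-longitude grid stored as nested lists, longitudes run from -180 to 180 (so 80°W is -80), grid cells are weighted by the cosine of latitude, and land or missing cells are marked with `None`. The function names are hypothetical.

```python
import math


def area_mean(field, lats, lons, lat_range, lon_range):
    """Cosine-latitude weighted mean of field[i][j] inside a lat/lon box."""
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            value = field[i][j]
            if value is None or not lon_range[0] <= lon <= lon_range[1]:
                continue
            total += weight * value
            weight_sum += weight
    return total / weight_sum


def amv_index(sst_anom, lats, lons):
    """North Atlantic mean (0-60N, 80W-0) minus near-global mean (60S-60N)."""
    north_atlantic = area_mean(sst_anom, lats, lons, (0.0, 60.0), (-80.0, 0.0))
    near_global = area_mean(sst_anom, lats, lons, (-60.0, 60.0), (-180.0, 180.0))
    return north_atlantic - near_global


# Usage on a tiny made-up grid: a warm anomaly only in the North Atlantic cell.
lats = [-30.0, 30.0]
lons = [-40.0, 100.0]
sst_anom = [[0.0, 0.0], [1.0, 0.0]]
print(amv_index(sst_anom, lats, lons))
```

Subtracting the near-global mean removes a spatially uniform warming signal, so a field that warms equally everywhere yields an index of zero.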
The skills of the INIT runs are further quantified by RMSE (Figure 5(b)). The skills of the ensemble mean of the INIT runs are also higher than those of all the members. For the ensemble mean, the smallest RMSE is reached in the forecast years 6-9. It is significantly smaller than the counterpart of the ensemble mean of the NoINIT runs, indicating that the added skill coming from the initialization is significant in the prediction of the AMV. The high skills of the ensemble mean of the INIT runs are more clearly demonstrated by the high consistency between the predicted time series of the AMV index for the hindcast years 6-9 and the corresponding observational reference (Figure 5(c)). The enhanced predictive skills of the AMV due to the initialization stand out in most decadal prediction experiments as the major added value relative to the historical and RCP simulations (e.g., [1,3,[38][39][40] and many others). However, the performance of FGOALS-s2 is somewhat different from that in previous studies. Kim et al. [41] showed that the correlation skills of the AMV of seven models from the CMIP5 MME generally decrease as the prediction time moves away from the initial time. In contrast, the predictive skills of FGOALS-s2 change little across the various hindcast ranges, and the skills in the late prediction time (hindcast years 5-8 and 6-9) are even higher than those in the early time (Figures 5(a) and 5(b)).
Figure 8 (caption, partial): Lag 0 represents simultaneous correlation. Lags 2 and 5 represent that the AMV index lags the heat transport anomalies by 2 and 5 years, respectively. That is, the heat transport anomalies are averaged over the hindcast years 6-9 (lag 0), 4-7 (lag 2), and 1-4 (lag 5), respectively.
Previous studies proposed that the AMV is closely associated with the low-frequency fluctuation of the Atlantic Meridional Overturning Circulation (AMOC) [42,43]. Hence we further investigate whether the predictive skills of the AMV depend on the prediction of the AMOC variations. Figure 6 shows the climatological AMOC simulated by the ensemble mean of the NoINIT runs. The strongest overturning is about 19 Sv and is located at about 25°-35°N, between 800 and 1200 m. The maximum value is very close to the observed value (18.5 Sv) at 26.5°N [44]. The major discrepancy of the simulated AMOC is that the northward mass flux does not reach high latitudes and the strongest downwelling is located at about 35°N. As a result, the northward heat transport does not reach the high latitudes either, which has some impact on the prediction of the AMV, as we will see below.
The fluctuation of the AMOC influences the SST anomalies over the North Atlantic by modulating the northward oceanic heat transport [40,42,43]. Hence, we investigate the skills of the INIT runs in predicting the northward oceanic heat transport in the North Atlantic by comparing them with the results from the ASSIM runs, which assimilate observational oceanic temperature and salinity and thus are taken as observational references here. Since the northward mass flux associated with the AMOC does not reach high latitudes, the northward heat transport is averaged over 0°-40°N. The ensemble mean of the INIT runs shows high skills in all the hindcast ranges. The highest skill is reached in the hindcast years 6-9 (Figures 7(a) and 7(b)). The temporal evolution of the anomalous heat transport for the hindcast years 6-9 predicted by the ensemble mean of the INIT runs is very consistent with the corresponding results from the ensemble mean of the ASSIM runs (Figure 7(c)). In contrast, the skills of the heat transport simulated by the NoINIT runs are significantly lower than the counterparts of the INIT runs in the last three hindcast ranges. For the first three hindcast ranges, the correlation skills of the ensemble mean of the NoINIT runs are close to the counterparts of the ensemble mean of the INIT runs. However, the three NoINIT members show a large spread, and only one member shows skills similar to those of the INIT runs. This indicates that the skills of the ensemble mean of the NoINIT runs may be overestimated because of the small sample size.
To test the relationship between the AMV and the preceding anomalous northward heat transport associated with the AMOC variations in the decadal predictions, we calculate the lag correlations between the AMV indices averaged over the hindcast years 6-9 and the heat transports averaged over the hindcast years 1-4, 4-7, and 6-9, respectively (Figure 8). The AMV is highly correlated with the preceding northward heat transport anomalies from the South Atlantic to about 45°N. The correlation coefficients decrease drastically to the north of 45°N, which is consistent with the location of the edge of the AMOC. In contrast, their simultaneous correlations are much lower in the North Atlantic (Figure 8). The results indicate that the predicted AMV is mainly induced by the northward heat transport anomalies associated with the preceding AMOC fluctuations. Correspondingly, the high predictive skills of the AMV in the hindcast years 6-9 mainly come from the accurate predictions of the preceding AMOC fluctuations and associated heat transport anomalies.
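The pairing of averaging windows behind these lag correlations can be made explicit with a short sketch. It is illustrative only and rests on assumptions: each hindcast is stored as a list of ten annual values, the correlation is taken across start dates, and the function names are hypothetical. A lag of 2 years pairs the AMV averaged over hindcast years 6-9 with the heat transport averaged over years 4-7, and a lag of 5 pairs it with years 1-4.

```python
import math


def window_mean(yearly, first_year, last_year):
    """Mean over hindcast years first_year..last_year (1-based, inclusive)."""
    values = yearly[first_year - 1:last_year]
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))


def lag_correlation(amv_by_start, transport_by_start, lag, amv_window=(6, 9)):
    """Correlate the AMV window mean with the transport mean `lag` years earlier."""
    first, last = amv_window
    amv = [window_mean(s, first, last) for s in amv_by_start]
    transport = [window_mean(s, first - lag, last - lag)
                 for s in transport_by_start]
    return pearson(amv, transport)


# Usage with made-up data in which the AMV follows the transport by 2 years.
transport = [[float(s * year) for year in range(1, 11)] for s in (1, 2, 3)]
amv = [[0.0, 0.0] + t[:8] for t in transport]
for lag in (0, 2, 5):
    print(lag, lag_correlation(amv, transport, lag))
```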

Interdecadal Variability in the Pacific.
The interdecadal variability of the Pacific is dominated by the Pacific decadal oscillation (PDO) [45] or interdecadal Pacific oscillation (IPO) [46,47], both of which are extracted through EOF analysis. Because the INIT runs are not continuous integrations, EOF analysis applied to the artificially linked predicted fields may yield false modes.
Wang et al. [48] defined a Mega-ENSO index to represent interdecadal variability in the Pacific. The index can be calculated easily and is highly correlated with the PDO or IPO indices. Therefore, we use the index as a substitute to assess the decadal predictive skills in the Pacific. Following Wang et al. [48], 4-year, instead of 13-year, weighted running averages are applied to the observed SST anomalies from 1960 to 2005, due to the short data length (in Wang et al. [48], 3-year running average was conducted). The results are not sensitive to the choice of the window length. Then EOF analysis is conducted by using the 4-year running-averaged SST in the region 50°S-50°N. The spatial pattern of the second EOF mode shows typical characteristics of the IPO (Figure 9(a)). The principal component time series of the second EOF mode is defined as the IPO index. Based on the spatial pattern, the Mega-ENSO index is defined as the difference in the area-averaged SST anomalies between the western K-shape zone and the eastern triangle zone in Figure 9(a). The Mega-ENSO index is highly consistent with the IPO index, with the correlation reaching 0.97 (Figure 9(b)). The same analysis is applied to the first NoINIT run. The spatial pattern of the IPO simulated by FGOALS-s2 resembles that in the observation (Figure 9(c)). Thus the same definition of the Mega-ENSO index is used for FGOALS-s2 and for the observation. The Mega-ENSO index in the NoINIT run is also highly correlated with the corresponding IPO index (Figure 9(d)).
The skills of the INIT and NoINIT runs in predicting the Mega-ENSO index are measured by correlation and RMSE (Figure 10). Unfortunately, in terms of both measures, none of the simulations reproduces the temporal evolution of the Mega-ENSO with significant skill, and the skills of the INIT runs are even lower than those of the NoINIT runs. The low skills are consistent with Figure 1(a), in which the ensemble mean of the INIT runs shows negative skills in most areas of the Pacific. The results indicate that the initialization does not significantly enhance the predictive skills of the interdecadal variability in the Pacific, which is also indicated by the CMIP5 MME results [9].

Summary
In the paper, the procedures of the decadal prediction experiments by the coupled global climate model FGOALS-s2, which participated in the CMIP5, are introduced. Then the predictive skills of the experiments are assessed based on the indicators adopted by the IPCC AR5 [9,16]. The main content is summarized as follows.
(1) The decadal prediction experiments involve two steps, initialization and hindcast/forecast. The initialization was performed by assimilating observational ocean temperature and salinity over the upper 1000 m through a modified incremental analysis update (IAU) scheme. In this scheme, the analysis increment remains constant within one assimilation cycle (1 month), which effectively suppresses the growth of short-wave noise in the integration. Meanwhile, only observational anomalous fields are assimilated in the initialization, to avoid model drift in the hindcast and forecast runs. Starting from the initial conditions derived from the initialization runs, three sets of 10-year-long hindcast/forecast runs are conducted with 5-year intervals between start dates from 1960 to 2005, following the CMIP5 protocol.
(2) The overall predictive skills of the decadal prediction in near-surface air temperature (TAS) and land precipitation are measured by the global distribution of the RMSSS. For the TAS, the model shows significantly high skills in the Indian Ocean, tropical western Pacific, and Atlantic. However, compared with the historical simulations, the decadal prediction experiments do not show significant skill improvements, except for the Atlantic. The results indicate that the skills of the decadal prediction experiments in the Indian Ocean and tropical western Pacific are primarily attributed to the specified external radiative forcing, while the skills in the Atlantic are attributed to the initialization. For the land precipitation, the decadal prediction experiments do not show significant skill improvements relative to the historical simulations.
(3) On the interdecadal time scales, the dominant variability modes are the IPO/PDO/Mega-ENSO in the Pacific and the AMV in the Atlantic, which are the major forecast targets of the decadal prediction experiments. The prediction system based on FGOALS-s2 shows high predictive skills for the AMV but low skills for the IPO/PDO/Mega-ENSO, which is similar to the CMIP5 MME. An interesting point is that the predictive skills of the AMV in FGOALS-s2 change little as the prediction time increases and even reach their highest level in the hindcast years 6-9, rather than decreasing as in many CMIP5 models [41]. Further investigations indicate that the predictive skills of the AMV in the hindcast years 6-9 mainly come from the accurate predictions of the northward heat transport anomalies associated with the preceding AMOC fluctuations.
(4) Historical and RCP simulations cannot capture the global warming hiatus during the early 2000s. With the introduction of the initialization, the rise of the globally averaged surface air temperature predicted by the decadal prediction experiments of the FGOALS-s2 significantly weakens, which is consistent with CMIP5 MME [9].