Statistical Prediction of the South China Sea Surface Height Anomaly

Based on the simple ocean data assimilation (SODA) data, this study analyzes and forecasts themonthly sea surface height anomaly (SSHA) averaged over South China Sea (SCS). The approach to perform the analysis is a time series decomposition method, which decomposes monthly SSHAs in SCS to the following three parts: interannual, seasonal, and residual terms. Analysis results demonstrate that the SODA SSHA time series are significantly correlated to the AVISO SSHA time series in SCS. To investigate the predictability of SCS SSHA, an exponential smoothing approach and an autoregressive integrated moving average approach are first used to fit the interannual and residual terms of SCS SSHAwhile keeping the seasonal part invariant.Then, an array of forecast experiments with the start time spanning from June 1977 to June 2007 is performed based on the predictionmodel which integrates the above two models and the time-independent seasonal term. Results indicate that the valid forecast time of SCS SSHA of the statistical model is about 7 months, and the predictability of SCS SSHA in Spring and Autumn is stronger than that in Summer and Winter. In addition, the prediction skill of SCS SSHA has remarkable decadal variability, with better phase forecast in 1997–2007.


Introduction
Study on the variation of the sea surface height anomaly (SSHA) is an important issue in physical oceanography and meteorological science.Changes in SSHA will influence the frequency and impact of extreme sea level events which engender lots of negative impact [1,2].Previous studies found that the regional SSHA variability at the interannual time scale dominated by ocean variability is much larger than the global SSHA [3][4][5].As the largest marginal sea in Western Pacific, South China Sea (SCS) is a crucial link between Pacific and Indian Ocean, and it has significant impacts on human activities and the sustainable development of coastal economy and society.
Analysis and forecast are two crucial aspects of SCS SSHA.As far, most attentions have been paid to the analysis of the SSHA in SCS, like the annual variability [6,7], interannual variability [8,9], seasonal variability [10], the rising rate [11,12], the forcing of the variations [13][14][15][16], and the variability associated with El Niño and Southern Oscillation (ENSO) [8,9,16,17].For the forecast of SCS SSHA, the forecast model of SCS SSHA, mainly, includes the dynamical models and statistical methods.Now the 3D dynamical prediction system of the SCS SSHA has been improved [18,19].For example, Wei et al. [18] use a fine-grid dynamical model covering SCS to produce monthly and annual mean SSHA of the SCS from 1992 to 2000, and the model-produced data is in good agreement with altimeter measurements.While the dynamical approaches have been studied widely [1,20,21], little attentions have been paid to the statistical forecast of SSHA in SCS, which is simple and feasible in practice.The Pacific ENSO Applications Climate (PEAC) Centre at the National Oceanographic and Atmospheric Administration (NOAA) uses a statistical model to calculate site-specific seasonal sea level outlooks, and the results indicated that the statistical model is potentially useful in predicting seasonal sea level variations in the U.S.-affiliated Pacific Islands (USAPI) [22].Imani et al. [23] used the Holt-Winters exponential smoothing technique to analyze and forecast Caspian Sea level anomalies from 15-year altimetry 2 Advances in Meteorology data from 1993 to 2008 and found that the modeling results of a 3-year forecasting time span (2005-2008) agree well with the observed time series.As far as we know, no literature on the statistical prediction of SCS SSHA has been documented.Also it is feasible to use the statistical models to implement the forecast of the SCS SSHA compared with the complicated dynamical models.In this study, based on the long-term monthly time series of SCS SSHA derived from the Simple Ocean Data Analysis (SODA), we first use a time series decomposition method to analyze the variability of monthly SCS SSHA.Then, a statistical model is constructed to fit the monthly SCS SSHA and used to perform an array of forecast experiments.Finally, the statistical prediction skill of SCS SSHA is investigated.
The remainder of this study is organized as follows: the data, the time series decomposition method, and the statistical model construction are introduced in Section 2. Section 3 presents the analysis results of SSHA in SCS while the forecast results are shown in Section 4. Summary and discussion are given in Section 5.

Methodology
2.1.Data 2.1.1.SODA.An ocean reanalysis product, namely, simple ocean data assimilation (SODA), is used in this study, which is based on the Parallel Ocean Program ocean model with an average horizontal resolution of 0.5 ∘ × 0.5 ∘ and with 40 vertical levels during January 1948 to December 2007 [24].SODA data is downloaded from web site at http://dsrs.atmos.umd.edu/DATA/soda2.2.4/.For our study, we first calculate the monthly mean values of SSH so as to derive the monthly SSHA.Then, we average the monthly SSHA over the South China Sea (5 ∘ N-25 ∘ N, 105 ∘ E-121 ∘ E) to form the basic datasets used in this study.

AVISO. A merged gridded product of Maps of Sea
Level Anomaly produced by AVISO (Archivage Validation et Interpretation des donnees des Satellites Oceanographiques) based on TOPEN/Poseidon, Janson 1, ERS-1, and ERS-2 satellite data is used for evaluating the correctness of SODA data.This product provides SSHAs from January 1993 to December 2007, which consists of maps produced every day on a 1/8 ∘ × 1/8 ∘ Cartesian grid.The monthly SSHA in AVISO is first computed and then used to derive the monthly SCS SSHA.

Decomposition Method.
In this section, we briefly introduce the time series decomposition method, that is, the centralized moving average scheme.This method partitions a monthly time series into the following three components: the interannual component   , the seasonal component   , and the residual component   as follows: where   represents the monthly value at time .
According to the time scales of   and   , it is easy to derive that   reflects the variability of   whose time scale is interseasonal or smaller.As the resolution of   is monthly, it is expected that   is noise-dominant.In addition, the summation of   and   represents the anomaly of   .Due to the static nature of   , the forecasting of   mainly depends on the prediction of the anomaly of   which consists of   and   .Because of the different time scales of   and   , different statistical models are used to fit and forecast   and   .

Models for Fitting and Forecasting.
To investigate the statistical predictability of SSHA in SCS, the statistical forecast model is first constructed in this section.
(a) The Model for Interannual Term.Due to the simplicity and robustness [23,25], Holt's linear exponential smoothing technique is widely used to handle nonseasonal time series by introducing smoothing parameters [26], like the precipitation prediction [27] and the maxima and minima air temperature prediction [28].In this study, we also use Holt's linear exponential smoothing technique to fit and forecast   .The fit formula of   is where Here  and  denote the smoothing parameters.The initial values of  0 and  0 are usually set to 0. Given preset values (i.e., first guesses) of  and , the fitness of   (denoted as T ,  = 8,  − 6) can be calculated sequentially with known T−1 .Note that 7 is the starting index of   (see (2)) and T7 =  7 .Then a cost function between   and T is established.Taking  and  as the control variables to be optimized, an optimization algorithm, like the L-BFGS-B algorithm [29], is used to obtain the optimum  and .In addition, to avoid the overflow of  and , the [0, 1] bound is applied to these two parameters during the optimizing process.
For the forecast of   , a linear function of the lead time is constructed as follows: Here  is the lead time in month.
(b) The Model for Residual Term.Due to the noise-dominant nature of   , we use an auto-regression integrated moving average model (denoted as ARIMA(, , ) [30,31]) to construct the fit model of   .Here  represents the order of the autoregression model (i.e., AR());  is the order of time differential of   ;  indicates the order of the moving average model.The model can be formulated as where   represents the time series of -order time differentiate of   ,  is the time,  1 ,  2 , . . .,   are coefficients of the autoregressive integrated model,  1 ,  2 , . . .,   are coefficients of the moving average model, and   is a white noise time series [32].
For the fitness model, the determination process of the parameters is described as follows.First, the values of , , and  are automatically determined by auto.arimalibiary in R software, according to statistic tests.The value of  is selected based on successive KPSS unit-root tests and then  and  are chosen based on the approach of Akaike's Information Criterion (AIC) [33].Then there are a total of + parameters to be determined.Taking  = 1 as an example, according to (7), given the previous fitted  1 ,  2 , . . .,   ,  1 ,  2 , . . .,   can be computed accordingly.Then a cost function is established between   and   to determine the optimized parameters with an optimization algorithm.
Given that the fitness model has been determined, the R is integrated by   , and the same values of parameters are consistently used to perform the forecast experiment.

Modeling Construction
Affected by various factors, such as solar radiation, evaporation and precipitation, monsoon, and El Niño and Southern Oscillation (ENSO), the SCS SSHA has significant characteristics on different time scales.In this section, we roughly analyze the time variability of SCS SSHA with the time series decomposition method (Section 2.2.1) based on the SODA product.
3.1.Verification of SODA SSHA.Before applying the time series decomposition to the SCS SSHA derived from the SODA product, the correctness of this dataset should be first verified.Figure 1 shows the time series of the monthly SCS SSHA derived from SODA (red) and AVISO (black).It can be seen that the most prominent signal is the seasonality in both products.The correlation coefficient between two time series reaches 0.89 with the significance level above 95%.The root-mean-square error between two time series is 2.56 cm.The SODA product can well capture the temporal variability of SCS SSHA.
To detect the significant periods of these two datasets, we perform the power spectrum analysis.Figure 2 presents the power spectrum of SCS SSHA for AVISO (blue) and SODA (black), where the dashed curves represent the 95% confidence upper limits.Obviously, both time series have 1year and half-year significant periods.
Due to the limited length of AVISO SSHA data, we cannot systematically verify the correctness of the whole time series of SCS SSHA derived from SODA data.However, based on the above analysis, it is reasonable to assume that the quality of the time series of SCS SSHA from SODA data is reliable.

Results of Time Series Decomposition.
Given that the correctness of the SODA SSHA data has been validated by AVISO data, we apply the time series composition to the monthly SCS SSHA from SODA with a 60-year length.Figure 3 shows the time series of three decomposed terms of SSHA in SCS, where the interannual term, residual term, and seasonal term are represented by black, red, and blue curves.For the interannual term, it has significant interannual variability.During most of El Niño years, SCS SSHA is smaller than that during other years.Moreover, there are some inflexions most of which correspond to El Niño or  Thus, SSHA variations in SCS are strongly modulated by ENSO.For the seasonal term, the minima (maxima) occur in Summer (Winter) which may have some connection with the SCS monsoon [10,13].For the residual term that oscillates around 0, its amplitude is smaller than the seasonal term and the interannual term.

Model Construction.
Before performing the forecast experiments of SSHA, the forecast model of SCS SSHA should be first established.Note that the SSHAs from now on indicate the SSHA anomalies.Due to the invariant property of the seasonal term, the forecast model is divided into two parts: the Holt-Winters model for the interannual term and the ARIMA model for the residual term.Given the time series of SCS SSHA, the fitness models for the interannual term and the residual term are first, respectively, constructed and then used to forecast these two quantities.Finally the forecasted interannual term and the residual term are added together to form the forecasted anomaly of SCS SSHA.In this section, we examine the qualities of fitness for the interannual and residual terms.
Figures 4(a) and 4(b) show the fitted values (red curve) and SODA values (blue curve) of the interannual and residual terms of SSHA in SCS.Note that as the time series composition needs to remove the first and last 6 months' SSHAs, the actual fitting model uses the time series between July 1948 and June 2007.For the interannual term, the Holt-Winters model can exactly track the trajectory of the SODA data with little biases in the points of minimum and maximum.For the residual term, the ARIMA model can well simulate the phase of the SODA data, although the simulation of the amplitude is not so good, which may be attributed to the nearly white noise nature of the residual term.Figure 4(c) plots the fitted and SODA values of total SSHA in SCS.The fitted curve is strongly correlated with the SODA time series.Figure 4(d) presents the time series of the biases (computed as the difference between the fitted value and the observed value) of the fitted interannual term (black curve), residual term (blue dashed curve), and total SCS SSHA (red curve).The bias of the fitted total SCS SSHA oscillates around 0 and mainly consists of residual biases.The analysis indicates that the added statistical model is effective and flexible, and then we use the added model to do forecast experiments as follows.

Forecast Results
The last 30 years' analysis results are used to perform the forecast experiment which is similar to the dynamical forecast of physical processes (like ENSO [34,35]).For each month from December 1977 to December 2007, we first decompose the time series (from January 1948 to the current month) of SSHA in SCS into three parts and then use Holt-Winters and ARIMA models to fit the interannual and residual terms, respectively.Then, both models are integrated forward up to 2 years to forecast the interannual and residual terms of SSHA in SCS.Thus, 361 forecast experiments are conducted.Note that as the time series composition needs to remove the SSHAs during the first and last 6 months, the actual initial forecast months for 361 forecast experiments correspond to June 1977 and June 2007.The forecast SSHA anomaly (interannual term plus residual term) is compared to the SODA data with seasonal cycle removed.Figure 5 gives a sketch map of the forecast experiments.
Based on the forecast results, we investigate the predictability of SCS SSHA from the following three aspects.1949 1955 1961 1967 1973 1979 1985 1991 1997 2003  4.1.Forecast Skill.We first totally examine the prediction skill of SSHA in SCS with the anomaly correlation coefficient (ACC) and the root-mean-square error (RMSE) relative to the SODA data.The ACC and RMSE are defined as follows: where  and  index the lead time and the forecast experiment;   and   represent the forecasted and SODA anomaly of SCS SSHA while   and   denote the averages of   and   over all forecast experiments;   indicates the climatological standard deviation of anomaly of SCS SSHA.Figures 6(a) and 6(b) show the variation of RMSE (Figure 6(a)) and ACC (Figure 6(b)) of forecasted SSHA anomaly in SCS with respect to lead time (in months) and start month.The solid curves in Figures 6(a) and 6(b) indicate the 1.0-contour and 0.6-contour, respectively.According to the results of RMSE, we find that the RMSEs forecasted from Spring (like April and May) and Autumn (like October and November) are smaller than the other seasons.The minima of the RMSE happen in May and November, and the maxima occur in June.If an ad hoc 0.6 value of ACC is used to define the valid lead time, the valid forecast time of SCS SSHA anomaly is 9 (7) months in Spring (other seasons).
To further investigate the contributions of the interannual term and residual term to the total quantity, we separately analyze the predictabilities of the interannual term and the residual term.Figures 6(c), 6(d), 6(e), and 6(f) present the same results as Figures 6(a) and 6(b) but for the interannual term and the residual term, respectively.We can see that the valid forecast time of the interannual term is 12 (9) months in Summer (other seasons).In contrast, the prediction skill of the residual term is much worse than the total quantity, especially for the ACC and the RMSEs forecasted from Spring and Autumn.This may be caused by the white noise in the forecast model (i.e., the ARIMA model) of the residual term, which increases the uncertainty of the forecasted residual term.Therefore, the forecast skill of the anomaly of SCS SSHA is mainly contributed by the counterpart of the interannual term and more or less reduced by the bad prediction skill of the residual term.
To figure out the possible reason causing the prediction skill of the interannual term, we calculate the number of extreme (including maximum and minimum) values of the interannual term (decomposed from the SODA SCS SSHA time series between July 1977 and June 2007) that happens in each month (Figure 7).Due to the linear property of the forecast model (i.e., the Holter-Winter model) of the interannual term, the linear trend between the maximum and the minimum can enhance the prediction skill of the interannual term.Thus, if a forecast is started from a month where the extreme value happens, the prediction skill is likely to be improved.From Figure 7, we can see that most extreme values happen in July which is consistent with the results in Figure 6(d) with an exception of the case of June.

Seasonal Forecast Skill.
According to the analysis in last section, the valid forecast time of SCS SSHA is about 7 months.We now assess the seasonal forecasts in this section.Figure 8 plots the SODA data (blue) and 6-month (red in Figure 8(a)) or 9-month (red in Figure 8(b)) lead SCS SSHA anomalies.We can see that, in the seasonal forecast, the statistical model can capture the main variability of SSHA anomaly in SCS, especially for the 6-month lead.The correlation coefficient between the SODA data and the 6month (9-month) lead SCS SSHA anomaly is 0.77 (0.45) with 95% significance level.

Decadal Variability of the Forecast Skill.
To investigate the decadal variability of the prediction skill of SCS SSHA, we divide 361 forecast experiments into three arrays, corresponding to three decades.Figure 9 shows the variations of ACC and RMSE of the predicted SCS SSHA anomaly with respect to the lead time (month) for three decades (green for 1977.6-1987.5;blue for 1987.6-1997.5;red for 1997.6-2007.5)and total thirty years (black).Note that the RMSE here is not normalized by the climatological standard deviation.The SSHA in SCS can be forecasted in 6-month lead for all three decades.As the previous studies indicated, the SCS is modulated by PDO, and the SSH variability in SCS is coherent with PDO [16].We also find that the ACCs for the last decade and the total thirty years with the 7 lead months are higher than those for other two decades, when the two decades  are in warm Pacific Decadal Oscillation (PDO) phase [36].In addition, when we construct the forecast model, the SODA SSHA time series is not long enough.Thus, the forecast model may not catch both the cold and the warm phases, and the forecast in cold phase decades is better than in warm.Conversely, the RMSEs for the last decade are the worst, which may be related to the quality of SODA.From the results of the total years, the valid forecast time of SCS SSHA is about 7 months.Obviously, the prediction skill of SCS SSHA anomaly has apparent decadal variability.

Summary and Discussion
In this study, we first simply contrast the time series of monthly SSHAs calculated from SODA and AVISO datasets in SCS to verify the correctness of the SODA SCS SSHA which is used to construct the forecast model of SCS SSHA.
Results show that SODA well correlates with AVISO data.Afterwards, we use the long-term time series from 1948 to 2007 of SODA SCS SSHA to establish the statistical forecast model.A time series decomposition method is used to decompose the monthly time series of SCS SSHA into the following three parts: interannual term, seasonal term, and

Figure 2 :
Figure2: The power spectrums of SCS SSHA for the AVISO (blue) and SODA (black).The dashed curves represent the 95% confidence upper limit.

Figure 4 :
Figure 4: Time series of fitted (red curve) and SODA's (blue curve) interannual term (a) and residual term (b) of SSHA in South China Sea as well as the total SSHA (c).(d) presents the time series of the biases (computed as the difference between the fitted value and the observed value) of the fitted interannual term (black curve), residual term (blue dashed curve), and total SCS SSHA (red curve).

Figure 5 :
Figure 5: The schematic diagram of the forecast experiments.

Figure 6 :Figure 7 :
Figure 6: Variations of RMSE (a, c, and e) and ACC (b, d, and f) of forecasted SCS SSHA anomaly (a, b), forecasted interannual term (c, d), and forecasted residual term (e, f) with respect to lead time (in month) and start month.The solid curve in (a, c, and e) [b, d, f] is (1.0-contour) [0.6-contour].Note that RMSEs in (a, c, and e) have been normalized by the climatological standard deviations of SCS SSHA anomaly, interannual term, and residual term, respectively.

Figure 8 :
Figure 8: Time series of SODA data (blue curve) and forecasted (red curve) SCS SSHA anomaly with 6 (a) and 9 (b) lead months.
The computational processes of   ,   , and   are listed as follows.{  } filters the short time scale (less than 12 months) information of {  } and keeps the interannual variations.Thus, {  } describes the interannual variabilities of {  }.Note that the valid period of {  } is from July 1948 to June 2007.The following decomposition and analysis as well as the forecast experiments will be also based on {  } during the same period.After the interannual term of {  } has been computed with the above method, we first subtract   from {  } to get the remaining term.In this study, the seasonal term (  ) corresponds to the climatology of the remaining term.Thus, although the length of   is the same as   , the period of   is 12 months.(c) Residual Component.With the interannual term (  ) and the seasonal term (  ) being determined, the residual term   of   is directly computed by   =   −   −   .