Prediction of North Atlantic Oscillation Index Associated with the Sea Level Pressure Using DWT-LSTM and DWT-ConvLSTM Networks

,


Introduction
e North Atlantic Oscillation (NAO) is the most prominent mode that can reflect the climate variability of the North Atlantic region [1]. Quite a few extreme climate events are associated with the behaviour of the winter NAO [2] and would have a profound impact on several areas [3]. e NAO is a complex nonlinear process with unexplored influence factors, which are difficult to represent in kinetic equations. Besides the discovered factors, such as the geomagnetic activity intensity [4], stratospheric events [5], and sea surface temperature (SST) forcing [6], human influence is also detected as one of the causes of NAO events occurrence [7]. ese uncertainties, along with the uncovered mechanism, make the reliable prediction hard to accomplish.
A number of studies have performed the NAO prediction based on numerical models. e winter-mean NAO index (NAOI) has been predicted using the multisystem ensemble combining the operational seasonal prediction systems (UKMO, CFSv2, and CMCC) [8].
e ASF-20C seasonal hindcast ensemble has been proposed to predict the multidecadal variability of the winter NAO [9]. e predicted winter NAOI is obtained via the dynamical seasonal prediction system forced by MPI-ESM-MR ensemble members [10]. However, the prediction accuracy of numerical models would be influenced by errors in initial conditions and model parameters. In addition to oceanatmosphere models and ensemble products, statistical approaches have also been put into the prediction of climate events gradually. e multiple linear regression (MLR) technique provided a robust prediction of the winter mean NAOI [11]. e k-nearest neighbor, which is based on the local linear regression, was proposed to estimate the lowfrequency winter NAOI [12]. Hall et al. presented the interdecadal evaluation on the basis of the linear regression model [13]. Although the statistical outcome can roughly track the tendency of the NAO variation with averting the initial uncertainties, it is still hard to reflect the characteristics of the NAO via linear models. Currently, machine learning approaches have become alternative techniques in researches of geoscience. In terms of identification in meteorology, a convolutional neural network called CloudNet was proposed for meteorological cloud classification with high precision [14]. e reliable classification of ice crystal habits has been performed using a deep CNN model named TL-ResNet152, and an ice crystal dataset has been built [15]. is kind of data-driven solution has also made significant progress in weather forecasting and climate prediction recently [16][17][18][19]. ereinto, the long short-term memory (LSTM) and the convolutional LSTM (ConvLSTM) exhibit prominent performance on climate prediction since the climatic data is the typical time series. e structure of cell gates in LSTM helps to detect the inner connection between sequence points, and the convolutional layer is conducive to extract features from field data. e 6-24 h nowcasting of typhoon tracks with an improved precision was performed by the LSTM network, combining with data mining technologies [16]. e amount of rainfall was predicted by the deep learning model with ConvLSTM layers, and the two-stacked ConvLSTM network reduced RMSE by 23.0% compared to linear regression [18].
e Oceanic Nino Index (ONI) and sea surface temperature, which are closely relevant to ENSO forecasting, were trained by the ConvLSTM-RM model to provide a relatively reliable prediction [20]. As for the NAO, ConvLSTM network combined with ensemble empirical mode decomposition (EEMD) was adopted to forecast the NAOI sequence, and the result surpassed several state-of-the-art machine learning models [21]. Researchers have proved that the temporal or spatial dependencies among the climatic variable data can be identified by LSTM and ConvLSTM effectively.
In this paper, the NAO prediction is conducted with two schemes. One of them is to predict NAOI via the LSTM network, which is trained by the daily NAOI historical data derived from the Climate Prediction Center (CPC) website. e NAOI sequence is organized into training pairs with the rolling mechanism, and each piece of training input is handled with a smooth strategy of the grey model (GM). e best lead time of training pairs is determined by autocorrelation (ACF) and partial autocorrelation (PACF) within the e-folding time scale of the NAO. Since the typical NAOI is defined from sea level pressure (SLP) [22], the other scheme is to perform NAO prediction using SLP grid data, which is provided by National Oceanic and Atmospheric Administration (NOAA). e SLP frames are grouped in the same way as the NAOI series and fed into the ConvLSTM network.
e spatial characteristics and temporal dependencies of the SLP series can be simultaneously captured by the ConvLSTM layers.
en the predicted NAOI can be calculated by the projection of normalized SLP between the Azores and Iceland on the NAO anomaly pattern. However, the NAO prediction is more concerned with extreme events with extraordinarily high or low NAOI. Here, we adopt the discrete wavelet transform (DWT) method as the preprocessing operation of our models. e DWT is commonly used in singular signal analysis, noise elimination, and signal compression [23]. It can effectively improve the accuracy of the prediction model by decomposing the original sequence into several components, which are relatively stationary compared to the original time series [24]. e performances of our models are validated in the comparison of multiple statistics or machine learning models, including Holt-Winters, support vector regression (SVR), and gated recurrent unit (GRU). By reference to the observation, our models show less error and higher stability compared with the above methods for the NAO cases from 2006 to 2015. And more notably, our models behave especially well in peak values, and the multistep forecasting is more reliable than ensemble forecast products named Global Forecast System (GFS) and the ensemble forecasts (ENSM). e rest of this paper is structured as follows. Section 2 introduces the dataset and describes the technical details of the proposed DWT-LSTM and DWT-ConvLSTM. e experiment procedures and results are presented in Section 3. In the last section, we conclude with a summary and discuss future work.

Dataset and Study Area.
e daily NAOI data is provided by the Climate Prediction Center (CPC) of National Weather Service, with 25504 days from 1950-01-01 to 2019-10-31 (removing the missing data). e SLP daily grid data can be downloaded from the website of the Physical Sciences Division of National Oceanic and Atmospheric Administration (NOAA) and is from 1981-09-01 to 2019-10-31. e brief information about these two datasets is displayed in Table 1.
e region we mainly focus on is located in the North Atlantic region between 20°N and 80°N and between 90°W and 40°E. e resolution of the grid is 2.5°× 2.5°with 25 × 53 grid points in each pattern. Each frame is truncated into a matrix with a size of 24 × 52, which has an even number of rows and columns, to facilitate decomposition.

DWT.
Wavelet transform is an effective tool of spectrum analysis, and it can extract features in both frequency domain and time domain during a local time scale. e lowfrequency part of climate data often carries the principal features of the signal, which would be helpful in researching for interdecadal variation. Capturing its trend is relatively easy. However, short-term prediction using low-frequency data always cannot suit the requirement of accuracy. e reason is that some climate data like NAOI fluctuates wildly within a short period, and the high-frequency component plays an important role on it. In fact, the NAO events with abnormal NAOI (<− 1 or >1) may not be detected because of the forecasting failure on high-frequency data. e significance of the DWT is to separate index sequence or field data with different scales, and different weights are trained and fixed for these components, respectively. DWT would be conducive to preserve the feature of each component and avoid highfrequency signals vanish during the regression process. e steps of the DWT can be described as follows. Firstly, take samples at the discrete points (x � · · · , − (1/2 j ), 0, (1/2 j ), · · ·), and the sampled signal can be regarded as the j-order scaling function: where the discontinuity point is at (k/2 j ) and a k � f(k/2 j ). j and k stand for resolution and translation, and ϕ(x) denotes the scaling function. In this work, we test the performance of multiple mother wavelets, including Haar, db, sym, and coif.
Finally, the Haar wavelet is selected to perform the DWT. ϕ(x) and wavelet function ψ(x) in Haar wavelet are defined as follows: We can furthermore find that the relationship between scaling function and wavelet function can be written as follows: (3) en we decompose the scaling function in formula (1) into even and odd parts (see formula (3)). e even (odd) component is acquired by high (low) pass filter and downsampling filter: e approximate sequence and the detail sequence d are obtained via the above processes: e procedure of the DWT on the index sequence is presented in Figure 1. Based on the result of preliminary experiments, the raw data is set to decompose with only one level. As for the two-dimensional DWT, the operations of high-pass filtering, low-pass filtering, and underclocking need to be conducted in two directions successively, as shown in Figure 2. e original grid data is broken up into approximation (LL), horizontal detail (LH), vertical detail (HL), and diagonal detail (HH). e approximation is the low-frequency component in both horizontal direction and vertical direction, containing the bulk of information in the raw patterns. Analogously, horizontal detail, vertical detail, and diagonal detail are the highfrequency data in the horizontal direction, vertical direction, and both directions, respectively.
After training and forecasting, these components need to be reconstructed to obtain the predicted sequence. e discrete function for decomposition can be described as the following sum form: e output is reestablished by fusing the parameters of wavelet (b 0 ) and scaling (a 0 ), upsampling operator U, and tower-type inverse composition (L and H):

LSTM and ConvLSTM.
e architectures of the NAO prediction using LSTM is shown in Figure 3. e subdivisions with different properties are handled separately with decomposition using DWT. e two sets of weights in neural networks are trained to fit the approximate components and     Figure 1, but for the SLP grid data using two-dimensional DWT. e sample grid data is filtered in the vertical direction at first, and then the acquired parts are decomposed in the horizontal direction, respectively. After these steps, we obtain four components: LL (approximation in two directions), LH (vertical approximation and horizontal detail), HL (vertical detail and horizontal approximation), and HH (detail in both directions).
. (9) en, the normalized data is organized with a rolling form, which incorporates data with the fixed time horizon into the network, with the following expression: where k is the lagged coefficient, which denotes the scale of the time window. Since the NAO can be regarded as the nonlinear process with an e-folding time scale in about two weeks [25], we adopt ACF and PACF to determine the optimal lagged coefficient within 14 days. ACF describes the correlation of a given time series and a lag version within a continuous time interval, while PACF reports the correlation of two independent points, like X n− k and X n , ruling out the effects caused by k-1 points between them: where Figure 4 illustrates the ACF and PACF of the NAOI sequence in 14 days, and the confidence interval is set to 95%. We can see that the dependency reflected by the ACF becomes weaker as the lagged coefficient k increases, containing the synthesis relation of the previous sequence. e PACF reveals the direct association between observation and the lagged version. When k is larger than 5, the PACF is trending to zero. us, k is set to 5 as the optimal lagged coefficient.
en, the sequence is grouped into training pairs with highlighting the features by generation method in the grey model. Following this, these components are fed into our model, which has the bidirectional LSTM layer, dropout layers, and fully connected layers. e LSTM layer adopted in this paper is based on the structure with forget gates: where i t , f t , c t , o t , and h t are the vectors of input gates, forget gates, cell activation, output gates, and hidden layer at time t, and W x , W h , and W c stand for the weights from cells, hidden vectors, and cell activation to another component. σ and tanh denote the logistic sigmoid and hyperbolic tangent, respectively. e rectified linear unit (ReLU) is adopted as the activation function in the LSTM layer. e function of the dropout layer is to avoid overfitting, and the drop rate is set to 0.2. e linear transfer operation is performed by the fully connected layer, and its activation function is sigmoid. Mean square error is selected as the loss function of this model, and a nadam optimizer with a learning rate of 0.002 is used to minimize the cross-entropy loss.
As for the SLP prediction in Figure 5, the preprocessing operation is similar to the NAOI sequence. Since the spatiotemporal signals are mutually dependent, we adopt the convolution and recurrent network to extract the intercoupling features.
e model for SLP data consists of ConvLSTM layers, BatchNormalization layers, MaxPooling layers, UpSampling layers, and the Conv3D layer.
e ConvLSTM layer is able to detect both the temporal and spatial characteristics of the seasonal pressure field. It adopts convolution operation as the feedforward method instead of the matrix multiplication in LSTM. e updates of internal gates and outputs in ConvLSTM are shown as follows: where i t , f t , and o t are the gates of ConvLSTM represented by 3D tensors. * denotes the convolution operator and ∘ denotes the Hadamard product. e kernel size is set to 3 × 3, which is the appropriate size. e role of the BatchNormalization layer is to accelerate the training process in deep neural networks. It can also alleviate the problems of gradient vanishing and gradient exploding.
e MaxPooling layer is applied to simplify the calculation, while the UpSampling layer is responsible for the retrieval of dimensions. e nadam optimizer is also selected in this model. e prediction SLP, which is reconstructed using the inverse transform of the DWT2D, can be utilized to calculate the NAOI. NAOI has multiple definitions related to SLP [26][27][28], and we select the block index as the NAOI in this work [29]: where SLP NAO is the anomaly pattern, which is the first mode decomposed by EOF analysis on 30-year historical SLP data. As shown in Figure 6, the NAO anomaly mode  Figure 5: Similar to Figure 3, the difference is that the raw data is SLP grid data located in the North Atlantic sector, and the prediction model is DWT-ConvLSTM.  layers, and two fully connected layers, totaling 69121 trainable parameters. e DWT-ConvLSTM is composed of 14 layers with 406681 tunable parameters, as reported in Table 2. It contains 4 ConvLSTM layers and adopts the Maxpooling layer to enhance the stability of feature recognition.
We conduct experiments on the Tianhe-2 supercomputer, which is located in the National Supercomputer Center in Guangzhou, China. NVIDIA Tesla K80 GPUs are applied for training acceleration in this work.

NAO Prediction Using NAOI Sequence.
To visualize the performance of DWT-LSTM, we compare its prediction result with multiple models, which contain GRU, near neighbour regression (NNR), LSTM, SVR, and Holt-Winters. As with DWT-LSTM and LSTM, GRU is a kind of recurrent neural network for solving long-term memory and backpropagation. NNR and Holt-Winters are statistical models; thereinto, NNR is a nonparametric method based on the feature space, while Holt-Winters takes the concept of exponential smoothing estimation. SVR transforms machine learning to solve a convex quadratic programming problem by the optimization theory. Using these approaches, a 100-day prediction is performed during the wintertime from Oct 2018 to Feb 2019, which is illustrated in Figure 7. To simplify the problem, assume that an NAO event occurs when NAOI . Although the tendency of LSTM, GRU, and SVR fits the observation data closely, they overestimate (underestimate) the peak value distinctly. For instance, the NAOI provided by the above-mentioned models surrounding Jan 11, 2019, is heavily skewed, and the error is even greater than 1.5. However, the DWT-LSTM behaves better, especially at the peak points enlarged in round boxes.
We select the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R 2 ) as the evaluation indicators. e formulas are given as follows: where n denotes the number of samples, and the X p stands for the predicted values. X o is the observation, and X o is the mean value of the observation data. To give a visual representation, we plot the indicator values corresponding to the predicted flows in Figure 7, as shown in Figure 8. e RMSE and MAE of the DWT-LSTM are slightly lower than other models, and the R 2 is higher. Since the closer the R 2 is to 1, the more reliable the model is, the result of DWT-LSTM is more similar to the ground truth as a whole. Notably, the DWT-LSTM's RMSE (0.2669), MAE (0.2016), and R 2 (0.8051) improve by around 21.3%, 5.0%, and 17.5%, respectively, in comparison to the pure LSTM. Combining the trend of flows in Figure 7, it shows that errors of other models are mainly concentrated at peaks, and the DWT-LSTM can alleviate the problem to some extent. e previous research counted the NAO events of two phases (NAO + and NAO − ) from 2006 to 2015, totaling 31 cases, 22 for NAO + (red), and 9 for NAO − (blue) [30]. e distribution of these cases is shown in Figure 9, with aggregate persistent periods. We can see that the cases are mainly concentrated in wintertime (DJF), and a case may span two months. e durations of these cases vary from 3 to 33 days, and some of them even last for the best part of a month. For instance, the NAO − event that started in Feb 2010 carried over into the next month, and it would deeply affect adjacent weather.
We also select some metrics to evaluate these cases. In addition to RMSE and MAE, other error evaluation indices are adopted to verify the performance of our model, such as mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), variance of absolute percentage error (VAPE), and root mean square percentage error (RMSPE), which are defined as (20) Figure 10 presents the distribution of indices for these cases predicted by multiple models. From formulas (17) and (18), the MAPE and SMAPE would increase sharply when the model produces false negatives or false positives. e statistical methods have comparatively larger deviation at extreme points and result in larger MAPE and SMAPE. Among them, the DWT-LSTM has relatively smaller steadystate errors and better stability with a smaller range of metric values. Besides, lower outliers also display occurrences of a higher positive correlation between the DWT-LSTM's result and the observation.

NAO Prediction Using SLP Grid Data.
As for SLP prediction, we perform a one-step prediction from 2019 to 01-10 to 2019-01-16, as illustrated in Figure 11. From the previous subsection, we know that these frames are in a state of the event with a negative phase. Due to dramatic changes in the NAOI, it is inherently difficult to predict. However, these problems are focal points of the NAO prediction and have been paid much attention. In Figure 11, the first row displays observed patterns of daily SLP, which are rather changeable and have the property of stochasticity. erefore, it is relatively hard to fit the sequential variation. e rows below illustrate the results of ConvLSTM and DWT-ConvLSTM, and the difference sequences between each of them and the ground truth are also displayed below. We can find that the pressure structures predicted by both ConvLSTM and DWT-ConvLSTM are very close to the observation data. Especially for the DWT-ConvLSTM, differences of grids are mainly distributed around 0 in a light green color. As for the ConvLSTM, the errors focus on the pressure centers, indicating the underestimation of extreme values.  8

Mathematical Problems in Engineering
To observe the accuracy directly, we plot some quantitative metrics for prediction results in Figure 11. Despite RMSE, MAE, and R 2 , we also select several indicators related to pattern similarity, such as the universal quality index (UQI) and anomaly correlation coefficient (ACC). UQI has been widely used to measure the image quality in the domain of computer vision, and ACC refers to the spatial correlation between the verifying analysis anomaly and forecast anomaly. UQI and ACC can be formulated as where M jc � (1/n) n i�1 (X p − X m ), M ac � (1/n) n i�1 (X o − X m ), and σ op denotes the covariance of the predicted data and the observation, σ o (σ p ) stands for the standard deviation of the observation (prediction), and X m is the climatological mean. e closer UQI and ACC are to 1, the higher reliability the model has. In Figure 12, bars report the mean values of the corresponding indicators in Figure 11, and error bars (standard deviation) describe the stability of these evaluation values. e DWT-ConvLSTM's errors are significantly lower than ConvLSTM, and its RMSE and MAE reduce by 43.7% and 48.1%. e R 2 , UQI, and ACC of the DWT-ConvLSTM are all greater than 0.9 and are improved by 24.4%, 4.6%, and 40.7% due to the application of the DWT. e result of our model has stronger relevance with actual values. High robustness is also reflected from the shorter error bars of the DWT-ConvLSTM. Taking errors in Figure 12 as a reference, other extreme events achieve similar outcomes to the NAO − event in Figure 11. e DWT-ConvLSTM is also tested with NAO events shown in Figure 9, with 31 cases and 255 frames. Figure 13 displays the distribution of RMSE using DWT-ConvLSTM and ConvLSTM, and RMSE in most of the cases are within 10.
e RMSE of events predicted by DWT-ConvLSTM chiefly centers on 1 to 4, while the distribution is more dispersed in the RMSE of ConvLSTM, which is generally larger than 4. e better result is obtained by the DWT-ConvLSTM, suggesting an essential effect of decomposition and respective training on the forecasting performance. Each training frame is divided into four parts with a size of 12 × 26, and these parts can be fitted accurately with tunable parameters. is ensures that the characteristics of the abnormal signal we care about can be well preserved. Despite the use of the dropout layer, overfitting is inevitable in some cases whose intensity is exorbitant.

Multistep Prediction for NAOI Comparison against Ensemble Forecast.
In general, ensemble forecast products always provide multistep NAOI predictions to meet the practical requirement. GFS and ENSM, which are authoritative products of NAOI prediction, present the 7-day, 10day, and 14-day forecasts on the CPC website (http://www. cpc.ncep.noaa.gov), as shown in Figure 14. e first row is the ground truth, and the following lines show the predicted flows. Overall, the predicted NAOI sequences are basically consistent with the ground truth. However, we can find that the accuracy decreases markedly when the lead time increases, and the 14-day forecasting cannot even identify the probable NAO events in the next two weeks.
To make our prediction have more practical implications, we perform the multistep prediction with a rolling mechanism. e principle of rolling forecasting is presented in Figure 15. e predicted value X 0 is obtained by feeding the input sequence with lagged length, which is set to 5 in this work. en, we stitch it behind the previous input sequence and remove the first element of the sequence, resulting in generating a new input. After that, the prediction is conducted again. After the loop is repeated n times, a prediction sequence with a length of n can be acquired. However, the error would be accumulated if the rolling forecasting has been performed too many times. erefore, we apply this method to implement short-term forecasts within two weeks. Figure 16 plots the multistep prediction result along with the observation data. According to the situation described in Figure 14, the 7-day predictions of both GFS and ENSM achieve better compared with the longer lead time. In the early part, almost all methods can attain relatively high accuracy, while the 14-day predictions of GFS and ENSM always maintain a state of underestimation. In contrast with ENSM, the prediction flow provided by GFS has a similar trend with the actual values. e proposed model, DWT-LSTM and DWT-ConvLSTM, also obtain a conviction result. Notably, our models are more accurate at some points with high NAOI, like 2020-01-05 and 2020-01-08. When the prediction time becomes longer, it has a significant deviation between our predictions and observation, and even a false positive is caused at 2020-01-12. Furthermore, a NAO + event appears during the presented period, and GFS_14, ENSM_10, and ENSM_14 do not discern the occurrence of the event.
e results indicate that proposed models can simulate the main trends of the NAOI variation within 14 days. Table 3 reports the error corresponding to the prediction in Figure 16. e longer the lead time that the ensemble forecast has, the worse the forecasting performance, except the GFS_10. e models of GFS are generally reliable than that of the ENSM, consistent with the description for Figure 16. As for neural networks, DWT-LSTM and DWT-ConvLSTM have slightly smaller errors than other models in this case. It is proved that our models have the potential capacity for the NAOI short-term prediction.   Figure 13: e frequency of RMSE for predicting NAO cases in Figure 9 with DWT-ConvLSTM and ConvLSTM. 31 NAO events in Figure 9 consist of 255 days in total, and the RMSE is calculated for each day.

Conclusions
e NAO is the most dominant atmospheric circulation in the North Hemisphere during the wintertime, and it has a profound effect on the weather of Eurasia, even the global climate. Owing to the drastic change of climate and human activities in the recent two decades, the cause of the climate phenomenon becomes unstable and complicated. In addition, the undetected physical mechanism also makes it more difficult to conduct a reliable prediction. As the alternative solution for time series prediction, the neural network displays its tremendous potential on climate forecasting.
In this paper, we adopt the DWT-LSTM and the DWT-ConvLSTM to perform the short-term NAO prediction. e DWT-LSTM is proposed to predict the daily NAOI sequence directly, while the predicted NAOI can also be obtained by the DWT-ConvLSTM via SLP prediction. e DWT is used as the preprocessing method to enhance the prediction skill of peak values. By decomposing the signal into components with different frequencies, the features of the anomaly signal that we care about can be retained successfully. Compared with other effective models and ensemble forecast products, both of the DWT-LSTM and the DWT-ConvLSTM achieve higher reliability. Besides, our models have smaller errors and yield better results in the prediction of NAO events from 2006 to 2015.
In the future, we tend to predict the NAO with other related meteorological variables, such as zonal wind and sea surface temperature (SST). Moreover, overfitting and error accumulation need to be improved in the following work.
Rolling predicted sequence Predict Predict Lagged length X 0 X 0 X 0 X 1 X 1 X n Figure 15: e sketch map of the rolling forecasting, which is applied in multistep prediction. Each piece of training data's length is set to the lagged coefficient determining in Section 2.3 (k � 5). Suppose that we put the first piece of training data into the model and obtain the predicted value X 0 ; then the next input is generated by removing the first frame and appending X 0 to the end of input. Putting the next input into the model, we obtain a new predicted value X 1 . Repeating the above steps, the predicted sequence is X 0 , X 1 , . . . , X n .

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.