Predicting the Water Level Fluctuation in an Alpine Lake Using Physically Based, Artificial Neural Network, and Time Series Forecasting Models

Accurate prediction of water level fluctuation is important in lake management because of its significant impacts in various aspects. This study utilizes four modelling approaches to predict water levels in the Yuan-Yang Lake (YYL) in Taiwan: a three-dimensional hydrodynamic model, an artificial neural network (ANN) model (back propagation neural network, BPNN), a time series forecasting (autoregressive moving average with exogenous inputs, ARMAX) model, and a combined hydrodynamic and ANN model. In particular, the black-box ANN model and the physically based hydrodynamic model are coupled to predict water level fluctuation more accurately. Hourly water level data (a total of 7296 observations) were collected for model calibration (training) and validation. Three statistical indicators (mean absolute error, root mean square error, and coefficient of correlation) were adopted to evaluate model performances. Overall, the results demonstrate that the hydrodynamic model can satisfactorily predict hourly water level changes during the calibration stage but not during the validation stage. The ANN and ARMAX models predict the water level better than the hydrodynamic model does. Meanwhile, the results from the ANN model are superior to those from the ARMAX model in both training and validation phases. The proposed concept of using a three-dimensional hydrodynamic model in conjunction with an ANN model clearly improves the prediction accuracy for water level fluctuation.


Introduction
Accurate predictions of water level fluctuation that results from hydrometeorological variations and anthropogenic disturbances [1] are needed for sustainable development and management of lake water usage [2][3][4][5][6][7][8]. As a complicated mathematical problem in water resources planning and engineering, water level variations in a lake represent the balance of water components such as direct precipitation or evaporation at the lake's surface, groundwater exchange, and incoming or outgoing river flows. In addition, water level can be influenced by the near-surface wind field [9]. In particular, during severe rainstorms (or floods), the water level of a thermally stratified lake can rise sharply with vertical mixing, leaving significant socioeconomic and ecological impacts [10].
To date, lake level variations at various time intervals have drawn increasing research attention. In general, lake water level changes seasonally (e.g., high in the wet summer and low in the dry winter) with sharp rising/falling limbs during typhoon events, but not in a simple periodic mode (except for the seiche oscillation that occurs mainly in large lakes). Effective prediction tools play an important role in the studies of lakes. They can be used to simulate lake water level variations based upon the available measured data and to predict the possible responses under different scenarios, supporting management decisions for valuable water resources. In the past decades, several types of methodology have been developed. For example, physically based numerical models are constructed using a set of governing equations that address conservation laws of mass/momentum and transport processes. In contrast to a simple zero-dimensional mass-balance computation for elevation and storage, these models can provide more detailed spatial-temporal patterns of water level, circulation, vertical stratification/mixing, and other aspects of lake physics [10][11][12][13][14].
Time series forecasting methods that define the trend or stochastic processes of variables have also been applied to predict water level fluctuation in lakes. One of the most commonly used approaches is the autoregressive integrated moving average (ARIMA) model, where the autoregressive and moving average filters account for systematic effects and internal shock effects in the endogenous variable, respectively. ARIMA models require a stationary time series without missing data [15]. Based on the observed time series, an identified underlying process can be constructed under the assumptions of linearity, normality, and homoscedasticity. For better prediction accuracy and variability in endogenous variables, an autoregressive moving average with exogenous inputs (ARMAX) model [16] further includes explanatory variables in a cause-effect framework.
Alternatively, artificial neural networks (ANNs) provide a novel computational approach for one-step-ahead forecasting. ANNs are widely accepted across disciplines and are suitable for hydrological applications because of their information-processing characteristics, for example, nonlinearity, parallelism, and noise tolerance, as well as learning and generalization capabilities [17][18][19][20][21][22][23]. Altunkaynak [19] used an artificial neural network model to accurately predict dynamic changes of surface water level in Lake Van. Yarar et al. [20] estimated level changes of Lake Beysehir in Turkey using the adaptive neurofuzzy inference system (ANFIS), ANN, and seasonal autoregressive integrated moving average approaches. Kisi et al. [22] forecasted daily lake levels by artificial intelligence approaches. However, a detailed comparison of the time series forecasting and ANN models with physically based models has not yet been carried out for lakes.
The purpose of this study is to establish a physically based three-dimensional hydrodynamic model to simulate the water level fluctuation in the Yuan-Yang Lake (YYL) in Taiwan. An artificial neural network model is adopted to predict the lake water level and to improve the calculations by the hydrodynamic model. Compared with increasingly applied data assimilation methodologies [24,25], the proposed concept offers another way to improve accuracy in lake water level prediction. A time series forecasting (ARMAX) model has also been developed. Four quantitative statistical indices, namely, mean absolute error, root mean square error, correlation coefficient, and Nash-Sutcliffe efficiency/skill score, are used to evaluate the prediction performance. Model accuracy is investigated and carefully discussed.

Description of Study Site and Data Collection
Yuan-Yang Lake (YYL, 24°34′60.00″ N, 121°24′0.00″ E, area = 3.7 × 10⁴ m², and maximum depth = 4.5 m), located in north-central Taiwan, is a subtropical, subalpine, humic lake in the Cilan National Forest Preserve (Figure 1(a)). The lake is located 1670 m above mean sea level. The geography of the drainage basin allows large quantities of terrestrial runoff from the surrounding mountains to enter YYL. The mean annual air temperature is approximately 13°C (monthly averages range from −5 to 15°C), and annual precipitation can exceed 4000 mm. The depth of the euphotic layer is approximately 1.5 m [26]. The lake experiences three to seven typhoons annually during the summer and fall; more than 40% of all annual precipitation can occur during the typhoons. Wind speeds over the lake are relatively weak. The dominant wind directions are from the east and southwest because the V-shaped valley faces east to west. Kimura et al. [10] developed a nowcasting system to predict the spatial distributions and temporal variations in the dynamics of YYL when subjected to severe rainstorms. They found that the lake water temperature dynamics were subject to meteorological conditions during normal conditions, while strong winds and high inflows caused high water levels and affected vertical mixing during typhoon events.
A gauge station was established in April 2004 to collect the water level data hourly; these data are accessible from the Global Lake Ecological Observatory Network (GLEON) website. All data from the gauge station (Figure 1(a)) and inflow and outflow rates were downloaded from the GLEON publicly accessible database (https://lter.limnology.wisc.edu/project/global-lakes-ecological-observatory-network-gleon). The hourly water level data collected from August 1 to December 31, 2009, and from January 1 to May 31, 2010, were utilized to calibrate/train and validate the hydrodynamic, ANN, and time series forecasting models, respectively. The measured time-series data of inflow discharge, outflow discharge, precipitation, and water level in the YYL for the model calibration and validation phases are shown in Figure 3. There are two water level peaks (Figure 3(d)) during the periods of August 5 to 10 and October 3 to 6 in 2009, corresponding to Typhoon Morakot and Typhoon Parma, respectively. Note that the calibration samples, which include both smooth variations and typhoon-induced peaks, help ensure the prediction/generalization capabilities of these models in the validation phase.
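The sample sizes implied by the two periods above can be checked with a few lines of Python. This is only a sketch: it assumes one record per hour with no gaps, and the function name `hourly_records` is illustrative rather than taken from the study.

```python
from datetime import date


def hourly_records(start: date, end: date) -> int:
    """Count hourly records from 00:00 on `start` through 23:00 on `end`.

    Assumes a gap-free hourly series (an assumption made for this sketch).
    """
    days = (end - start).days + 1  # both end dates are included
    return days * 24


# Calibration (training) and validation periods stated in the text.
calibration = hourly_records(date(2009, 8, 1), date(2009, 12, 31))
validation = hourly_records(date(2010, 1, 1), date(2010, 5, 31))
total = calibration + validation
print(calibration, validation, total)
```

Under the gap-free assumption, the two counts are 3672 and 3624, which sum to the 7296 observations reported for this study.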

Hydrodynamic Model Description and Model Setup.
In this study, we adopted the well-known three-dimensional hydrodynamic model EFDC (Environmental Fluid Dynamics Code) [27,28], which has been widely used for describing flow and transport processes in surface water systems, for example, rivers, lakes, estuaries, reservoirs, wetlands, and coastal regions [29][30][31][32][33]. This model solves the 3D equations of motion and the continuity equation with the Mellor-Yamada level 2.5 turbulence closure scheme [34][35][36] under stretched (or sigma) vertical coordinates and Cartesian (or curvilinear) orthogonal horizontal coordinates. Numerically, it employs a second-order accurate, three-time-level finite difference scheme with an internal-external mode splitting procedure to separate the internal baroclinic mode from the external free surface gravity wave [37]. Further details of the EFDC model are given in Hamrick [27,28].
In this study, EFDC for lake water level prediction was driven by freshwater flow and atmospheric forcing (wind shear stress, precipitation/evaporation, and heat flux). An orthogonal boundary-fitted computational grid (containing 700 horizontal cells and five vertical layers) was developed to represent the irregular shorelines and bottoms in the YYL study area (Figure 1(b)) based on the topography data measured in August 2007. The nonuniform grid sizes are approximately 6 m × 7 m along the horizontal directions. The simulations were carried out using a time step of Δt = 1 second without any sign of numerical instability.

Artificial Neural Networks (ANNs).
An ANN model uses a multilayered technique to approximate complex mathematical functions or data. In this study, the back propagation neural network (BPNN), a learning algorithm based on back propagation of the error gradient, was utilized to amend the water level results from the hydrodynamic model for better prediction accuracy.
The BPNN proposed by Rumelhart et al. [38] is a multiple-layer network consisting of an input layer, a hidden layer, and an output layer (Figure 2). Each layer contains a number of neurons with nonlinear differentiable transfer functions.

Figure 2: Input layer (m), hidden layer (n), and output layer (l) of the BPNN.

Each neuron receives inputs from neurons in the previous layer or from external inputs, and the input is converted to either an output signal or another input signal to be used by neurons in the next layer. Weighted connections between neurons in successive layers represent their relative importance in the network. The neurons execute a weighted summation of all inputs and further assess the weighted sum by the activation function $f$:

$$ h_j = f\Big(\sum_{i} w_{ij}\, x_i\Big), \qquad y_k = f\Big(\sum_{j} w_{jk}\, h_j\Big), \tag{1} $$

where $h_j$ is the output of hidden-layer neuron $j$, $y_k$ is the output of neuron $k$, $x_i$ is the input to neuron $i$, and $w_{ij}$ and $w_{jk}$ are the synaptic weights.
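A hyperbolic tangent sigmoid transfer function in (2) is used in the hidden layer:

$$ f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{2} $$

A linear transfer function in (3) is applied in the output layer:

$$ f(x) = x. \tag{3} $$

To scale the inputs and the targets, normalization equation (4) is often used, thus forcing the data to fall within a specified range:

$$ x_n = (y_{\max} - y_{\min})\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} + y_{\min}, \tag{4} $$

where $x_n$ is the value of a data point $x$ after normalization, $x_{\min}$ and $x_{\max}$ are the minimum and maximum data values, respectively, and $y_{\min}$ and $y_{\max}$ are −1 and 1, respectively.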
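The scaling and the two transfer functions described above can be illustrated with a minimal pure-Python sketch. The function names, the example weights, and the water level values are hypothetical and chosen only for illustration; the bias terms follow the later statement that weights and biases are tuned during training.

```python
import math


def normalize(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Min-max scaling of x from [x_min, x_max] to [y_min, y_max]."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min


def denormalize(y, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Inverse of normalize: map a scaled value back to data units."""
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical water levels (m) scaled to the range [-1, 1].
levels = [2.0, 3.0, 4.0]
lo, hi = min(levels), max(levels)
scaled = [normalize(v, lo, hi) for v in levels]

# Hypothetical two-input network with two hidden neurons.
output = forward(scaled[:2], [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1], [1.0, -1.0], 0.2)
print(scaled, denormalize(output, lo, hi))
```

The network output lives in the scaled range, so it is mapped back to physical units with `denormalize` before being compared with measurements.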
The training process of a neural network involves tuning the network weight and bias values through minimization of the cost function $E$:

$$ E = \frac{1}{2P}\sum_{p=1}^{P}\sum_{k=1}^{K} e_k^{2}(p), \tag{5} $$

where $P$ is the number of input-output training patterns, $e_k(p)$ is the difference between the output and target of neuron $k$ for the $p$th input-output pattern, and $K$ is the number of neurons in the output layer. The back propagation method was accomplished using the Levenberg-Marquardt (LM) algorithm [39], which optimizes the network parameters with a nonlinear least squares error minimization technique. The LM algorithm combines the strengths of the gradient descent method and the Gauss-Newton method in terms of convergence and speed. The ANN model together with the LM technique was implemented in the Neural Network Toolbox in MATLAB (e.g., 2009 or a later version).
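For reference, the Levenberg-Marquardt step is commonly written as shown below. This is the standard textbook form rather than an expression taken from this paper; the symbols $\mathbf{w}$ (all weights and biases collected in one vector), $\mathbf{e}$ (vector of output errors over all training patterns), $\mathbf{J}$ (Jacobian of the errors with respect to the parameters), and $\mu$ (damping factor) are notation assumed here.

```latex
\mathbf{w}^{\text{new}} = \mathbf{w}^{\text{old}}
  - \left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}
    \mathbf{J}^{\mathsf{T}}\mathbf{e}
```

When $\mu$ is close to zero the step approaches the Gauss-Newton update, and when $\mu$ is large it approaches a small gradient descent step, which is the sense in which the algorithm blends the two methods.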

Time Series Forecasting Model.
ARMAX models have been used extensively to represent the relationship between system output and input in the presence of noise in many linear dynamic systems [16]. In the case of lake water balance, a simple ARMAX model is specified by three orders, as in the ARMAX (2, 1, 1) model reported in the results.
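For orientation, a generic ARMAX structure is commonly written as below. This is the standard textbook form, not the equation from this paper, which did not survive in the text. The symbols are assumptions: $y(t)$ is the output (here it would be the water level), $u(t)$ is the exogenous input (for example, a net volume flux), $e(t)$ is white noise, $n_a$, $n_b$, and $n_c$ are the autoregressive, exogenous, and moving average orders, and $n_k$ is the input delay. How the three numbers in a label such as ARMAX (2, 1, 1) map onto these orders is not stated in the surviving text.

```latex
y(t) = -\sum_{i=1}^{n_a} a_i\, y(t-i)
       + \sum_{j=1}^{n_b} b_j\, u(t - n_k - j + 1)
       + e(t) + \sum_{m=1}^{n_c} c_m\, e(t-m)
```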

Indices of Simulation Performance.
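To evaluate the performances of the three-dimensional hydrodynamic model, the BPNN model, and the time series forecasting model, four different criteria were used to compare the predicted results with the observed data: mean absolute error (MAE), root mean square error (RMSE), coefficient of correlation ($R$), and the skill score (SS), based on the following equations:

$$ \text{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| (Y_p)_i - (Y_o)_i \right|, $$

$$ \text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left[ (Y_p)_i - (Y_o)_i \right]^2}, $$

$$ R = \frac{\sum_{i=1}^{N} \left[ (Y_p)_i - \bar{Y}_p \right]\left[ (Y_o)_i - \bar{Y}_o \right]}{\sqrt{\sum_{i=1}^{N} \left[ (Y_p)_i - \bar{Y}_p \right]^2 \sum_{i=1}^{N} \left[ (Y_o)_i - \bar{Y}_o \right]^2}}, $$

$$ \text{SS} = 1 - \frac{\sum_{i=1}^{N} \left[ (Y_p)_i - (Y_o)_i \right]^2}{\sum_{i=1}^{N} \left[ (Y_o)_i - \bar{Y}_o \right]^2}, $$

where $N$ is the total number of data, $Y_p$ is the predicted water level, $Y_o$ is the observed water level, $\bar{Y}_p = \big(\sum_{i=1}^{N} (Y_p)_i\big)/N$, and $\bar{Y}_o = \big(\sum_{i=1}^{N} (Y_o)_i\big)/N$.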
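The four indices can be computed as in the following pure-Python sketch. It assumes the usual definitions (Pearson correlation for the coefficient of correlation and the Nash-Sutcliffe form for the skill score); the function names and the sample values are illustrative and not data from the study.

```python
import math


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def correlation(pred, obs):
    """Pearson coefficient of correlation between predictions and observations."""
    mean_p = sum(pred) / len(pred)
    mean_o = sum(obs) / len(obs)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


def skill_score(pred, obs):
    """Nash-Sutcliffe type skill score; 1 means a perfect match."""
    mean_o = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for p, o in zip(pred, obs))
    return 1.0 - sse / sum((o - mean_o) ** 2 for o in obs)


# Hypothetical water levels: the prediction has a constant 0.5 offset.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(mae(predicted, observed), rmse(predicted, observed),
      correlation(predicted, observed), skill_score(predicted, observed))
```

The example shows why several indices are reported together: a constant bias leaves the correlation at 1 while MAE, RMSE, and the skill score all register the error.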

Calibration and Validation of the Three-Dimensional Hydrodynamic Model.
Figure 4 shows the calibration and validation results for the three-dimensional hydrodynamic model. In the calibration phase, the model captures the observed water level fluctuation with satisfactory predictions of peak water levels associated with the two typhoon events. Interestingly, both the observation and simulation results show high-frequency oscillation, which might be attributed to the diurnal cycles of the atmospheric forcing. Further studies are required to clarify the mechanisms for the temporal variability of water level fluctuations at different scales and will be reported separately in the future. For the validation phase, however, the model somewhat fails to mimic the observed water levels (see Figure 4(b) with a zoomed-in vertical axis).

ANN Training and Validation.
To forecast the water level, three cases were established using two to four input nodes, that is, the net volume flux $Q_{\text{net}}(t)$ together with the measured water levels at 1 to 3 hr lag time, $h(t)$, $h(t-1)$, and $h(t-2)$, for cases 1, 2, and 3, respectively. The output of these three cases is the predicted water level $h(t+1)$. For the hidden layer, the selection of an appropriate number of nodes is quite important because an excessively large number of nodes may result in overfitting, while an insufficient number of nodes may not capture the information adequately [17][18][19][20][21][22][23]. The best network architecture (number of hidden nodes, number of iterations, learning rate, and momentum coefficient) was obtained by trial and error based on RMSE in the training and validation phases. Figure 5 shows the RMSE trend with the increasing number of hidden nodes in the training and validation phases. In this study, twenty hidden nodes in the BPNN were selected to yield the optimal performance. Other ANN training parameters include a learning rate of 0.01, a momentum coefficient of 0.3, and a maximum of 400 iterations.
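The input-output pairs for the three cases can be assembled as in the sketch below. It assumes that case 1, 2, and 3 use one, two, and three lagged water levels, respectively, in addition to the net volume flux; the function name `build_samples` and the numbers are hypothetical.

```python
def build_samples(level, q_net, n_lags):
    """Build one-step-ahead training pairs from hourly series.

    Each input row is [q_net[t], level[t], level[t-1], ...] with n_lags
    water levels, and the matching target is level[t+1].
    """
    inputs, targets = [], []
    # Start where enough history exists; stop one step before the end
    # so that the target level[t+1] is available.
    for t in range(n_lags - 1, len(level) - 1):
        lags = [level[t - k] for k in range(n_lags)]
        inputs.append([q_net[t]] + lags)
        targets.append(level[t + 1])
    return inputs, targets


# Hypothetical hourly water levels and net volume fluxes.
level = [10.0, 11.0, 12.0, 13.0, 14.0]
q_net = [0.1, 0.2, 0.3, 0.4, 0.5]

case1_inputs, case1_targets = build_samples(level, q_net, n_lags=1)  # two input nodes
case3_inputs, case3_targets = build_samples(level, q_net, n_lags=3)  # four input nodes
print(case3_inputs, case3_targets)
```

Using more lags shortens the usable series by the same number of hours at the start, which is negligible for several thousand hourly records.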
A comparison of water level between the observation data and simulated results from ANN case 3 is shown in Figure 6. In contrast to the hydrodynamic model, the ANN is able to forecast the measured water levels in both training and validation phases. Based on Table 1, case 3 gives the best performance among these three model settings with MAE, RMSE, R, and SS equal to 0.77 cm, 1.17 cm, 0.999, and 0.997 (or 0.77 cm, 1.12 cm, 0.988, and 0.977) in the training (or validation) phase, respectively.
A comparison of water level between the measured data and predicted results from case 5 is shown in Figure 7. According to the performance assessment in Table 1, ARMAX (2, 1, 1) yields the best water level prediction with MAE, RMSE, R, and SS equal to 0.96 cm, 1.64 cm, 0.997, and 0.995 (or 0.83 cm, 1.18 cm, 0.987, and 0.974) in the training (or validation) phase, respectively.

Water Level Predictions Using the Combination of Hydrodynamic and ANN Models.
To improve the water level prediction quality, an ANN model was utilized to amend the prediction results from the three-dimensional hydrodynamic model. The proposed approach offers another way to improve accuracy, in comparison to widely used data assimilation [24,25]. Besides the original inputs, we similarly constructed cases 7, 8, and 9 with an additional input $h_{\text{EFDC}}(t+1)$ that represents the water level predicted by the EFDC hydrodynamic model at hour $t+1$. Exactly the same learning parameters (i.e., a learning rate of 0.01, a momentum coefficient of 0.3, and a maximum of 400 iterations) and twenty hidden nodes were chosen for the optimal performance (see the RMSE trend in Figure 8). Figure 9 presents a comparison of water level between the measured data and case 9 results obtained by the combination model for the training and validation phases. Table 1 shows that case 9 gives the best performance assessment with MAE, RMSE, R, and SS equal to 0.72 cm, 1.05 cm, 0.999, and 0.998 (or 0.80 cm, 1.17 cm, 0.989, and 0.975) in the training (or validation) phase, respectively.
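The key detail in the combined model is time alignment: the extra input is the hydrodynamic forecast for hour t + 1, not the value at hour t. A minimal sketch of that step follows, assuming the ANN input rows already exist and that the EFDC output is available as an hourly list on the same time axis; the function name, the `first_t` argument, and all numbers are hypothetical.

```python
def add_efdc_input(inputs, efdc_level, first_t):
    """Append the EFDC water level at hour t+1 to every ANN input row.

    Row r of `inputs` is assumed to correspond to hour t = first_t + r.
    """
    return [row + [efdc_level[first_t + r + 1]] for r, row in enumerate(inputs)]


# Hypothetical four-node input rows for hours t = 2 and t = 3.
ann_inputs = [[0.3, 12.0, 11.0, 10.0], [0.4, 13.0, 12.0, 11.0]]
# Hypothetical EFDC water levels for hours 0 to 4.
efdc_level = [10.2, 11.1, 12.3, 12.8, 14.1]

hybrid_inputs = add_efdc_input(ann_inputs, efdc_level, first_t=2)
print(hybrid_inputs)
```

Each row now has five entries, matching the five input nodes described for the best combined setting.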

Comparison of the Predicted Water Levels from All Models.
To evaluate the model performance in predicting hourly water levels, the statistical indices for the three-dimensional hydrodynamic model, ANN model, ARMAX model, and the combination model were compared. Figures 10 and 11 further present scatter plots for the measurement-prediction pairs of the hourly water level in the calibration (training) and validation phases. The three-dimensional hydrodynamic model did not show good predictions for lake water level, compared to the other prediction approaches (Figures 10(a) and 11(a)). The ANN model predicts water level variations better than the ARMAX model. Overall, the combination model provides further accuracy for hourly water level predictions. The five input nodes consisting of $h(t-2)$, $h(t-1)$, $h(t)$, $Q_{\text{net}}(t)$, and $h_{\text{EFDC}}(t+1)$ are the best model setting to predict hourly water levels $h(t+1)$ in this study.

Discussions
In terms of the computational expense for a one-month simulation, the three-dimensional hydrodynamic model requires about 3.5 hours of CPU time on an Intel Core i5 PC, while the ANN and ARMAX models only need 1.5 and 1.2 minutes, respectively. The hydrodynamic model therefore takes much longer computational time than the ANN and ARMAX models. In terms of simulation, the ARMAX model uses a linear stochastic approach to represent the relationship between the system input and output. Consequently, the nonlinear ANN model performs better than the ARMAX model. In addition, the predictability of the data-driven ANN model could be increased by providing a large number of appropriate input-output data sets during the training and validation phases [40]. Nevertheless, the black-box features would hinder the understanding of physical processes in a lake, for example, conservation of mass/momentum. Simulation of physical processes is of critical importance for lake management. The hydrodynamic model can be used to physically describe the water level fluctuation in response to the high inflow discharge during typhoon events with certain accuracy. For lake water level prediction, this study proposed a novel and accurate approach which combines the three-dimensional hydrodynamic model (i.e., a physically based model) with the ANN model (i.e., a black-box data-driven model), providing an alternative way to improve accuracy in addition to the data assimilation methodology [24,25]. The comparison between the present hybrid approach and the data assimilation method deserves further discussion and remains an interesting topic for future research.

Conclusions
Four modelling approaches (the three-dimensional hydrodynamic model, ANN model, ARMAX model, and the combination model) have been implemented to predict water level fluctuation of YYL in north-central Taiwan. The measured data from August 1 to December 31, 2009 (a total of 3672 data sets), were used for calibration (training), while the data from January 1 to May 31, 2010 (a total of 3624 data sets), were adopted for validation. The performance of these models was comprehensively evaluated by various statistical indices (i.e., MAE, RMSE, R, and SS).
The three-dimensional hydrodynamic model satisfactorily represents the temporal variability of water level fluctuation in the calibration phase but somewhat fails to mimic the water level in the validation phase. The ANN model, which preserves nonlinear characteristics between input and output variables, can predict water level fluctuation in both training and validation phases. The ARMAX model predicts the water levels using linear stochastic characteristics. Overall, the accuracy of the three-dimensional hydrodynamic model and the ARMAX model is lower than that of the ANN model. The development of an ANN model simply requires the time series of field measurement data, and the implementation of an ANN model does not require iterative computation. Therefore, the ANN model can be accomplished with much less effort in comparison to the three-dimensional hydrodynamic model for a lake. However, the ANN is a black-box model, while the hydrodynamic model is a physically based model. One novel idea is proposed to combine the three-dimensional hydrodynamic and ANN models to predict water level fluctuation in the lake. The water level results predicted by the three-dimensional hydrodynamic model were amended using the ANN model for further accuracy (rather than the widely used data assimilation methods [24,25]). The overall results show that the combination model yields better prediction results for water level fluctuation. Currently, a systematic study of more detailed internal physics (circulation and vertical mixing/stratification), water quality, and ecological processes in YYL is ongoing based upon this modelling methodology and will be reported in the near future.

Figure 1: (a) Location of Yuan-Yang Lake in Taiwan, bathymetry (contours in m), and gauge location and (b) horizontal grid of Yuan-Yang Lake for three-dimensional hydrodynamic model.

Figure 4: Comparison of predicted hourly water levels with three-dimensional hydrodynamic model and measured data for (a) model calibration and (b) model validation.

Figure 5: The effect of number of nodes in the hidden layer on the root mean square error (RMSE) for the (a) ANN training phase and (b) ANN validation phase. Note that the triangle, square, and circle marks represent case 1, case 2, and case 3, respectively, for the ANN model.

Figure 6: Comparison of predicted hourly water levels with the ANN model and measured data for (a) training phase and (b) validation phase.

Figure 9: Comparison of predicted hourly water levels with the combination of the three-dimensional hydrodynamic and ANN models and measured data for the (a) training phase and (b) validation phase.

Figure 11: Scatter plots of predicted and measured water levels with (a) three-dimensional hydrodynamic model validation, (b) ANN model validation, (c) ARMAX model validation, and (d) combination of three-dimensional hydrodynamic and ANN models validation. The solid lines represent the parity lines.

Table 1: Assessment of water level prediction performance for three-dimensional hydrodynamic model (EFDC), artificial neural network model (ANN), time series forecasting model (ARMAX), and the combination model (EFDC + ANN).