Bayesian Regularized Neural Network Model Development for Predicting Daily Rainfall from Sea Level Pressure Data: Investigation on Solving Complex Hydrology Problem

School of Computer Science, Baoji University of Arts and Sciences, Baoji 721007, China College of Education for Human Science-ibn Rushed, University of Baghdad, Baghdad, Iraq Computer Techniques Engineering Department, Al-Mustaqbal University College, 51001 Hillah, Babil, Iraq Electrical Engineering Department, College of Engineering, University of Babylon, Hilla, Babil, Iraq Geoinformatic Unit, Geography Section, School of Humanities, Universiti Sains Malaysia, 11800 USM, Pulau Pinang, Malaysia


Introduction
Rainfall decides agricultural activities, ecology, and environment of a region. erefore, it is considered as the key factor of social and economic development of any region [1]. At the same time, rainfall is the most influencing factor for different kinds of natural hazards [2]. Excess rainfall and floods are the most common natural hazards all over the earth [3]. Rainfall deficit and droughts are the most devastating disasters in terms of economic damages [4]. Rainfall is also the defining factor of many other kinds of natural hazards like landslides, soil erosion, and river bank subsidence [5,6]. erefore, the forecasting of rainfall is the major topic of interest to hydrologists and disaster management scientists for many decades [7,8].
Numerous attempts have been made for forecasting rainfall using various physical, empirical, and physio-empirical models. Such models have been used for forecasting rainfall at annual, seasonal, monthly, and daily scale in different regions of the world [9,10]. Physical models are developed considering the synoptic climate and physical mechanism of interactions of different climatic variables causing rainfall in a region [11]. Mathematical models are developed to represent those physical processes to forecast rainfall in physical models. Such models are generally complex and need data and information about many land-ocean-atmospheric variables. Still, the physical models are often not very successful in reliable forecasting of rainfall as it often makes the simple approximation of complex physical phenomena for model development [12]. Empirical models based on statistical methods have been employed as an alternative to physical models [13]. Such models are generally developed focusing a certain climate variable only like forecasting rainfall based on the statistical relationship of rainfall with its antecedent values or with other atmospheric variables related to it. e statistical model is always regionspecific which means the model developed based on the statistical relationship of the regional rainfall with other atmospheric variables is only applicable for the region [14]. erefore, such models are also known as empirical models. e advantages of empirical models are easy development and implementation and better accuracy compared with physical models in most of the cases. However, the major drawback of the empirical model is their complete dependency on historical data used for their development. erefore, they cannot simulate rainfall for an unknown situation [15,16]. For example, the rapid changes in rainfall that earth experienced in recent years and projected for the future cannot be forecasted by empirical models accurately as the models were not developed with such large variability in data [17]. Physical models are the option for forecasting in such situation. To tradeoff the disadvantages of both types of the modeling approach, recent focus is to develop physicalempirical models where empirical models are developed based on the variables physically responsible for the climate of the region. erefore, more emphasis has been given on the development of the physical-empirical model for forecasting rainfall [18,19]. Yim et al. [20] developed a physicalempirical model for the prediction of summer rainfall in southern China using several ocean-atmospheric variables. Yim et al. [9] used sea surface temperature in a regression model for the development of a physical-empirical model for the prediction of rainfall in Taiwan. Chen and Sun [21] used the physical-empirical model for performance improvement of a dynamical model in seasonal rainfall forecasting using sea level pressure (SLP) data. Pour et al. [22] employed SLP in development of physical-empirical models for the development of seasonal rainfall in Peninsular Malaysia.
Generally, statistical models are developed using different forms of regressions ranging from linear to higherorder polynomial (nonlinear) regressions. However, the relationship of rainfall with atmospheric variables responsible for rainfall is often highly intriguing and cannot be represented using generally used statistical methods. Machine learning (ML) algorithms are often used for developing complex regression analysis [23][24][25]. erefore, the development of ML-based physical-empirical forecasting models has grown very fast in recent years.
Forecasting daily rainfall in tropical region is much different compared with other regions due to the presence of highly extreme values and year-round rainfall. Use of more efficient ML algorithms is required for forecasting daily rainfall in such region. Some of the ML models have shown their superiority over others in solving environmental prediction problems. ough there is no single method that shows consistent excellence in solving all kinds of environmental problems in many regions, overall it has been noticed that some models often perform better in solving complex prediction problems which include extreme learning machine (ELM), Bayesian regularized neural networks (BRNNs), Bayesian additive regression trees (BART), extreme gradient boosting (xgBoost), and hybrid neural fuzzy inference system (HNFIS) [26][27][28][29][30][31]. erefore, the comparative performance of these efficient algorithms in forecasting daily rainfall can provide the best forecasting model. e main research objective is to develop a robust machine learning model for forecasting daily rainfall of East Coast of Peninsular Malaysia (ECPM). e ECPM is considered as the most vulnerable region of the peninsula to different kinds of rainfall-driven hazards [32]. Hence, a reliable rainfall pattern forecasting is highly essential for climate sustainable development of the region. Across large number of studies have been conducted for rainfall forecasting in Peninsular Malaysia, only one attempt was made for forecasting rainfall in Peninsular Malaysia (PM) using physical-empirical models [22]. However, the previous study of Pour et al. [1] was limited to the seasonal forecasting only. To the best knowledge of the current study, this is the first attempt to use an array of sophistical ML algorithms for the development of physical-empirical models for forecasting daily rainfall of PM.

East Coast of Peninsular Malaysia (ECPM)
e ECPM (Figure 1) is the most interesting region of PM in terms of rainfall and rainfall-driven hydrological hazards [33]. e area has an undulating topography which varies from 4.0 m in the coastline to 2270.0 m in the interior mountainous region. A considerable part of the interior of the ECPM is mountainous, which influences the climate of the region. e region is characterized by high uniform temperature, high humidity, and copious rainfall [34]. e area receives a significant amount of rainfall even in the driest month. e study area experiences two monsoon seasons, namely, the northeast (NE) monsoon from the mid of October to end of February and the southwest (SW) monsoon from April to September [35]. e months of March and October are intermonsoonal transitional periods. e eastern coastal region is wetter compared with other parts of PM. It receives an annual average rainfall of 2800 mm. Monthly variations in rainfall in ECPM are shown in Figure 2. Heavy rainfall in the region occurs during NE monsoon. Maximum rainfall in a year is usually recorded in November or December. Some parts of the study area suffer flooding at this time of year [36]. On the other hand, the longest dry spell in a year is observed during SW monsoon, particularly in June and July. e mean temperature in ECPM is found more or less uniform throughout the year. Seasonal variation of mean temperature (27.0°C) is always less than 2.0°C. e daily maximum temperature in the study area varies from 29.0°C in January to 33.0°C in April. e minimum temperature varies from 23.0°C in January-February to 25°C in May [37].

Complexity
Several rivers originate in the mountainous interior region of ECPM. Due to high variation in topography and a high amount of rainfall, the region has a dense river network.

Description of the Data Used
e APHRODITE gridded dataset was used as a proxy to observed data. APHRODITE rainfall was developed using gauge precipitation data obtained from the Global Telecommunication System (GTS) network and other hydrometeorological in situ records [38]. It has been developed based on a new interpolation technique with accurate longterm gridded orographic precipitation for Asia [38]. e data are available at http://aphrodite.st.hirosakiu.ac.jp/ product/APHRO_V1101EX_R1/APHRO_MA/025deg_nc/. e recently released NCEP/NCAR ERA2 reanalysis data of SLP with a resolution of 0.25°× 0.25°were used as predictors for the prediction of rainfall. e NCEP/NCAR reanalysis provides datasets of various atmospheric variables at 17 different pressure levels [39]. e NCEP dataset has been utilized for the development of prediction models [40]. Rainfall in ECPM mostly occurs due to moist air from the ocean. e wind circulation in the region depends on SLP.
erefore, it was considered in the present study for the development of the rainfall prediction model. e climate domain consisted of 2730 grid points for the extraction of probable predictors. e climate domain and the grid points from where SLP data were collected are shown in Figure 3.
APHRODITE rainfall data are available for the period 1951−2015 while the NCEP ERA2 reanalysis SLP data are available for the period 1948 to present. erefore, a common period of 1951−2015 was considered in the present study for model development and validation. APHRODITE daily rainfall and NCEP ERA daily SLP data for the period 1951−2015 were collected from the corresponding websites.

Description of Employed Machine Learning Algorithms
Five advanced AI algorithms such as ELM, BRNN, BART, EGB, and HNFIS were used considering the complex relationship of rainfall with SLP. e SLPs over the climate domain having a significant correlation with daily rainfall of ECPM were used to compute their principal components and selection of inputs. e APHRODITE rainfall of all the grid points over the ECPM was average to prepare the daily rainfall of the area. e ML models were used for the prediction of areal averaged daily rainfall of the region. e description of the ML models used in this study is given in the following subsections.

Extreme Learning Machine (ELM).
e ELM model was developed following the structure of a feedforward neural network (FNN) [41]. Basic difference of ELM and FFN is that ELM uses the Moore-Penrose generalized inverse algorithm to calculate the weights of hidden neurons in contrast to gradient algorithms of FFN [42]. A general structure of ELM is shown in Figure 4. e algorithm of ELM is very simple. It takes the predictors as inputs (x) and multiplies them with the estimated weights (w), adds bias (b), applies activation function (g), and repeats the steps for a time equal to the number of layers used to get the output as follows: in which L represents the number of hidden layers, N is training data size, and β is the weight multiplied with the output of hidden layer to get the output, which is calculated as the ratio of hidden layer output matrix H and target matrix T. e H and T can be expressed as follows: (2) e obtained output using equation (1) is backpropagated to the network to repeat the process until the desired level of error in prediction achieved. ELM has a good nonlinear adaptive capability to overcome the limitations of many other ML algorithms [43].

Bayesian Regularized Neural Network (BRNN).
BRNN is a version of an artificial neural network (ANN), which is much robust compared with conventional ANN [44]. e robustness in BRNN is achieved due to Bayesian regularization of ANN parameters. A common error function (E D ) of ANN using early stopping can be expressed as follows: where w is the weight, M is the ANN structure, n is the training data size, and t i is i-th target while t i is the output. Immature convergence of ANN causes overfitting of the model. A regularization of ANN using the Bayesian method helps optimization of ANN parameters using prior values of ANN parameters. For this purpose, an additional term (E w ) is included in objective function in BRNN as follows: where E w is used to penalize the unrealistic weights to have a better generalization and gradual conversion. A gradientbased optimization method is used to minimize the function: in which E w (w | M) is the sum of the square of ANN architecture and α and β represent the hyperparameters to be optimized. BRNN can reveal theoretically complex input-output relationship and thus considered as an efficient predictive model [28,29].

Bayesian Additive Regression Trees (BART).
BART is an ensemble of easily adjustable regression models [26]. It uses the basic structure of random forest (RF) [45], but the outcomes of the trees are estimated using a Bayesian inferential system. is allows estimation of uncertainty and regularization of model parameters to enhance the capability of the conventional tree-based regression method. e procedure used by BART to improve the performance of RF is like the gradient boosting (GB) where many weak models are used to learn the problem [46]. In RF, the outcomes of all trees are averaged whereas in GB, the outcomes are adjusted using constants [47]. e novelty of BART is the use of Bayesian framing. In BART, priors are used to achieve the posterior of the outcomes of the trees.
is provides a higher performance of the model. Besides, a sophisticated penalty system is used which helps in parameter regularization efficiently [26].
is improves or boosts the prediction capability of weak models, and thus it is called GB. e xgBoost is an advanced version of GB where model parameters are regularized properly to enhance its performance [48]. e major advantage of xgBoost is the combination of both high speed and good efficiency. It is generally found to outperform many conventional models when the prediction is done using large datasets [49]. e basic structure of xgBoost is like a random forest where multiple decision trees are generated using data samples following either bagging or boosting method [50]. However, the main difference is in the learning process. In xgBoost, the model tries to learn from the mistake in the previous step. It means, in each step, it focuses on the part which is hard to learn or the model failed to predict properly. It tries to correct the mistake in the previous step by an attempt to solve the hard part of problem. A simple diagram of xgBoost is shown in Figure 5. An ensemble of models is used in this process where different models are used to solve different parts. e more error by a model indicates that the model is weak or the problem is hard enough to solve by this model. e xgBoost tried to improve this weak model to improve the total prediction capability of the model. In each step, it estimates the error and focuses on the part which generates a higher error and try to solve the problem until the desired level of error is achieved [51].

Hybrid Neuro Fuzzy Inference System (HNFIS).
e HNFIS is developed by hybridization of the fuzzy inference system (FIS) with artificial neural network (ANN) [30], where the parameters of FIS were optimized using ANN which enables the system to learn the problem systematically and helps to enhance the predictive performance of the model. ANN has a good capacity to learn automatically. However, it cannot acquire outcomes efficiently. e FIS cannot learn automatically like ANN, but it can acquire the outcomes efficiently through fuzzy logic. erefore, the hybridization of ANN with FIS can improve the learning capability of the nonlinear problem [52]. A typical HNFIS system is presented using the diagram in Figure 6. It is very similar to the adaptive neuro fuzzy inference system (ANFIS) where a multilayer feedforward ANN is the core [53]. e AND operation is employed first, followed by product, normalization, and sum operations, and finally centroid to get the output. However, the weights connecting the nodes are defined using fuzzy sets. e leastsquares approach is used to decide the set of weights. e weights are adjusted using fuzzy rules to reduce model error [54].

Generation of Model Inputs.
e SLP of all the NCEP grid points over the selected climate domain was correlated with the areal averaged daily APHRODITE rainfall over the ECPM. Spearman rank correlation (Spearman, 1904) was used considering high skewness in daily rainfall data [55]. e significance of the correlation was estimated at a significance level of 0.5 (p < 0.05). e SLP data which found significantly correlated with the areal average rainfall of ECPM were used to estimate their principal components (PCs). e principal component analysis (PCA) allows reduction of dimensionality in data [56]. e PCs which are found to represent the maximum variability in SLP data were used as input. In this study, the above procedure was conducted for SLPs for 1-to 3-day lags and the PCs of each lag were used as model input. e PCA has been found as an efficient method for the selection of inputs from a large dataset in previous studies [57,58].

Development of Models and Assessment of Model
Performance. APHRODITE daily rainfall and PCs of daily SLP data for the period 1951−2015 were used for the development of the model. ere is no universal rule of using the proportion of data for training and validation. However, conventionally 70% of available data are used for model training. Following the generally adopted division rule, 70% of time series data (01 January 1951 to 30 June 1995) were used to train the models while the data for the period July 1995 to December 2015 were used for model validation.
e performance of the ML model depends on optimum values of model parameters. In this study, a k-fold validation was used for the optimization of ML model parameters, where k was considered as 10. Data were shuffled in each iteration so that all the data are used for model validation.
e parameter values estimated in each iteration were averaged to get the optimum parameters. In this study, two parameters for ELM (number of hidden units and activation function), three parameters for BRNN (α, β, and the number of neurons), five parameters for BART (number of trees, prior boundary, base terminal not, power terminal node, and degrees of freedom), four parameters for xgBoost (number Complexity 5 of boosting iterations, two regularization parameters, and learning rate), and two HNFIS parameters (number of fuzzy terms and maximum iterations) were optimized. e caret package of statistical software R was used for this purpose. All the models were also developed using different packages available in R.
Two error metrics, namely, mean absolute error (MAE) and normalized root mean square error (NRMSE) and two association metrics such as Wilmott's modified index of agreement (WI) and Kling-Gupta efficiency (KGE) were used to assess the performance of the models statistically [59,60]. e model performance was evaluated during the validation period to show the relative performance of the models in predicting an areal average of daily rainfall in ECPM with a lag time of one day. e mathematical formulas are as follows: where N is the number of observations, y f is the forecasted values, y o is the observed values, y f and y o are the mean values of the forecasted and observed, β is the bias, c is the fraction of the coefficient of variation, and r is the linear correlation.

Analysis of SLP Data for Generation of Inputs.
e areal average rainfall of ECPM was first correlated with different lags of the SLP data at different NCEP grid points over the climate domain. Obtained results are presented in Figure 7. e correlation of rainfall with 1-to 3-lag days is presented in three maps in the figure. e colour ramp in the maps is used to show the positive (negative) correlation. e significant correlation (p < 0.05) is presented with dots. e figure shows that rainfall in ECPM is positively correlated with SLP in the north and negatively correlated with SLP in the south. e differences in SLP cause movement of air over the region. e moist air from the sea when enters the land causes rainfall. erefore, SLP data can be used for prediction of rainfall in ECPM. e SLP data of nearly 2321 to 2486 are found to correlate with rainfall for different lags. Such a big number of variables cannot be used for the development of the model. erefore, PCA was used to reduce the dimensionality of data. e significantly correlated SLP data were used to estimate their PCs using PCA. Obtained PCs for one-day lag SLP are shown in Figure 8. e scree plot of the first 10 PCs for one-day lag SLP is shown in Figure 8 e results showed that the first 5 PCs represent 99% of the variance in the SLP data over the climate domain. e first PC covered 91.7% of variability, the first 2 PCs covered 95.4% of variability, the first 3 PCs covered 98.3% of the variability, and so on. e results indicated increasing of PCs after the first PC does not increase the explained variability by the PCs significantly. Similar results were obtained for SLP with 2-and 3-day lags. e first PC along was able to explain more than 91% of the variability in data for all the three lags. e higher number of PCs as inputs increases the complexity of the model. erefore, to make a parsimonious model, only the first PC of each lag was selected. ese three PCs were finally used as input to train the models.

Comparison of Model Performance Using Statistical
Metrics.
e performance of the models was first evaluated using statistical metrics. Obtained results are given in Table 1. Only two error and association metrics were used as they can assess the capability of the models in replicating all the aspects including bias in mean and variability and association in rainfall amount and volume. For a better comparison of the relative performance of the model based on all the metrics together, the results are also presented using a radar chart in Figure 9.
e results showed that none of the models was able to attain very high metric values such as KGE or WI above 0.8.

Complexity
However, metric values obtained by the models should be judged considering the amount of sample size of data. e obtained statistics were estimated for the validation period July 1995 to December 2015 which means the sample size of 7120. Computation of any statistical metrics depends on sample size (n), where metric value decreases with the increase in n, even when the model behave the same. erefore, the KGE or WI values more than 0.5 should be considered very good for this high amount of data size. is can be justified with a small amount of error in terms of MAE and NRMSE.
Overall, all the models were found to perform similarly with a slightly better or less performance in terms of different indices. However, BRNN was found to perform best in term of all metrics. Figure 9 shows lower values of BRNN in terms of error metrics and high values in terms of association-based metrics. e HNFIS can be considered as the secondbest model followed by BART and xgBoost, while ELM showed the least performance.
It is a well-established fact that the performance of ML models depends on the data used. ere is no universal ML model which shows good performance for all kinds of data. It is not possible to justify the better performance of BRNN and the least performance of ELM. Only it can be remarked that the present study revealed that BRNN is more suitable for prediction of rainfall in ECPM using SLP data over the climate domain of PM.

Comparison of Model Performance through Graphical
Presentations.
e performance of the five ML models used in this study was evaluated using graphical presentation of

Input
Hidden Output Figure 4: Schematic diagram showing the basic structure of an extreme learning machine.
Tree N Y n = f n (X n ) Complexity observed and model data in three different ways. Considering the large volume of data, different innovative approaches were employed for the presentation of data. First, the density-scatter plots were prepared. is type of plot is suitable for comparison of a large amount of data.
As the data size was 7120, it was not possible to compare effectively using a simple scatter plot. erefore, densityscatter plots are used in this study. Obtained results are presented in Figure 10. e red colour in the plot represents a higher density of data, while the blue represents a lower 8 Complexity density of data. As the days with zero or low rainfall is very high compared with days with higher rainfall, the red zones (high density of data) were noticed only for the lower rainfall. If the diagonal line of the plot bisects the red and other higher rainfall zones symmetrically, the model is considered better. e results presented in Figure 10 reveal that high-density data were more above the bisecting line for all the models which indicates over prediction of low rainfall values by all the models. However, the BRNN was most perfect followed by HNFIS and BART in symmetrically bisecting the high-density rainfall data. e results also showed that none of the models was capable to simulate the high rainfall of values. All the model underpredicted the high rainfall amount. Overall, the underprediction was less for BRNN followed by BART. Daily rainfall data are highly skewed. It follows a gamma distribution. e number of no rainfall days is very high compared with days having different amounts of rainfall. Again, the days with low rainfall values are always very high compared with the days having high rainfall values. Another major problem is the outliers in rainfall data. Among thousands of rainfall data, only a few are very high and exceed the range of three standard deviations from the mean. Prediction of such outliers by training a model with all rainfall data is still a challenge. Overprediction of low rainfall values and underprediction of high rainfall values in modeling daily or hourly rainfall are still a subject of investigation for both physical and empirical modelers.
e capability of the models in estimating rainfall of all the months and the seasonal variability was evaluated using side-by-side bar plots. Obtained results for the five models are presented in Figure 11. e results revealed that all the models were able to replicate the seasonal variability in rainfall (high rainfall during November-December and low rainfall during June−August). However, it was also noticed that all the models overpredicted rainfall during low rainfall months and underpredicted rainfall during high rainfall months. Almost no difference among the models was noticed based on the monthly presentation of data. However, close observation of results revealed a bit less underprediction and overprediction by BRNN.
Finally, the performance of the models was visually compared by plotting the first two PCs of the data. For this purpose, the PCs of observed and model outputs were estimated. e PCs were then plotted (first PC in the x-axis and the second PC in the y-axis). is provides a visualization of how the data are related to each other in PCs. It gives an assessment of which samples are similar and which are not. Obtained results are presented in Figure 12. e results showed similarity in results of BRNN, xgBoost, and  ELM while the results of HNFIS and BART were different from the other three model outputs. However, when compared with the observed data, the BRNN was found closer to the observed line in the plot followed by HNFIS and BART. e ELM is found to locate most apart from the observed line and also from the other models. e results indicate the best performance of BRNN in replicating observed rainfall followed by HNFIS and BART.

Conclusion
Five advanced AI algorithms, namely, ELM, BRNN, BART, xgBoost, and HNFIS were employed for the developed of the physical-empirical models for the prediction of rainfall in the eastern coastal region of PM from SLP data collected from the climate domain of PM. e parameters of the models were optimized using a k-fold validation approach. e performance of the models was evaluated using various innovative visuals presented along with statistical metrics. e results showed better performance of BRNN compared with other models in predicting daily rainfall one-day ahead. Prediction of daily rainfall is an extremely challenging task for the tropical region where daily rainfall data can be compared with a high fluctuating extremely noisy data without any exportable temporal pattern. e capability of the physical-empirical model of incorporation of physical mechanism and use of advanced ML algorithms has made the models developed in this study efficient in rainfall prediction. e Bayesian regularization of neural network parameters makes it more capable to reveal theoretically complex input-output relationship. It might be concluded that the complexity due to a very chaotic relationship of SLP and rainfall in the ECPM was able to capture through Bayesian regularization of the ANN model. e results revealed the potential of the model to be employed for the development of early warning of rainfall. However, it should be noted that the model was not capable to replicate very high rainfall which means it cannot be employed for prediction of extreme rainfall and probable flash flood. Besides, the model overpredicted low rainfall and thus more rainfall days which restricted this applicability in predicting dry spells. In the future, more attention should be given to improve the performance of the models in simulating extreme rainfall for expanding their applicability in prediction of extreme rainfall.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.