Long-Term Rainfall Information Forecast by Utilizing Constrained Amount of Observation through Artificial Neural Network Approach

Estimation models are becoming increasingly important for capturing the nonlinear relationships in large volumes of raw data with chaotic components. This study presents an approach based on a purpose-built artificial neural network (ANN) that may serve as an alternative to conventional statistical procedures for improving rainfall estimation performance. A case study is presented for the neighboring provinces of Düzce and Bolu in Turkey, located on the southern coast of the Black Sea. The study's primary aim is to create an ANN model, specific to the research field, that generates satisfactory results even with limited data. The proposed technique is used to estimate missing rainfall records and to make predictions regarding future precipitation. Bolu daily average rainfall by month data and a limited number of Düzce rainfall data were used. Both the estimation of missing records and the projection of potential future rainfall are examined. The research further focuses on ANN computational concepts and develops a neural network for rainfall time series forecasting. The emphasis of the study is a feed-forward backpropagation network. The Levenberg–Marquardt algorithm (LMA) was implemented to train a two-layer feed-forward ANN for the missing rainfall data prediction part of this research. The unavailable rainfall values for Düzce were determined for the years 1995 to 2009. For 2010 to 2020, a two-layer feed-forward ANN was trained using the gradient descent algorithm to forecast daily average rainfall data by month. The findings reported in this study can guide researchers interested in implementing an ANN forecast model for an extended period of missing rainfall data.


Introduction
Data-driven approaches are extensively employed in many fields, including meteorological studies and environmental engineering, and a particularly favored data-driven modeling method is the artificial neural network (ANN) [1,2]. The use of mathematical models to achieve findings with limited data advances scientific analysis [3]. Artificial neural networks have many benefits, such as eliminating the difficulties of modeling nonlinear mathematical systems [4,5]. The architecture that gives the best prediction has been determined by adjusting the number of intermediate (hidden) neurons and the activation functions [6,7]. According to Rao et al. [8], the trial and error approach is the best method for estimating the layer and neuron configuration in the artificial neural network. The intermediate layer change [...] and particulate matter (PM10) to predict surface ozone (O₃) gas concentrations in Gangetic West Bengal. Al Omar et al. [13] used a multilayer perceptron (MLP) based predictive model to find multihour-ahead surface ozone (O₃) concentrations in Ontario, Canada.
A lack of correlation between the observed input and target data means that the ANN analysis may fail to produce the desired results. Regression analysis has been used in a variety of other studies and disciplines. The regression model's primary goal is to find the trend line that best represents the data set. Park et al. [14] implemented regression analysis and ANN models to forecast the concentration of particulate matter smaller than 10 µm (PM10) in ambient air. Gualtieri et al. [15] also created regression and ANN models to predict fine particulate matter concentration and traffic-oriented nitrogen oxides (NOx). Compared to regression analysis, ANN analysis produced more successful outcomes in both studies [14,15].
Many studies have been conducted to predict missing and potential hydrological data values in environmental and civil engineering [16]. Pakdaman et al. [17] and Kashiwao et al. [18] stated that short-term rain forecasts are required for specific aspects of environmental water resources, such as evaluating the potential for flash flooding and real-time management of urban runoff systems. Hossain et al. [9] conducted research in Western Australia to forecast long-term seasonal rainfall. Rain, air temperature, specific humidity, relative vorticity, the vertical component of the wind, and moisture divergence flux were chosen as input data for the trained ANN to generate the rainfall forecast [9]. The only output was rainfall in Western Australia based on the selected input data, and good results were obtained. Akıner and Akıner [19] conducted a water quality simulation in Lake Sapanca by adopting an ANN technique to point out the main threats for surface water quality deterioration.
It is well known that the ANN methodology produces excellent results with a large number of data points, but it is unknown whether it will produce acceptable, reliable results when projecting the past and future with limited observed data. This study's main purpose is to determine missing temporal data from the past and then make numerical predictions for the future using the ANN technique's cognitive skills and strong learning ability, even with a limited amount of rainfall data. The ANN model's performance was assessed using measured rainfall data from a neighboring area, Bolu, which has a more complete temporal rainfall record, together with Düzce's measured rainfall values. The results show that satisfactory results can be obtained if the ANN architecture is correctly configured. Accordingly, it was assumed that only monthly observed data from January 2008 to November 2009 were available for Düzce, and the network was trained using data from the neighboring city of Bolu. Finally, the ANN results were evaluated against sufficient observed rainfall data accurately recorded for Düzce between 2009 and 2020.

Research Area.
This research was carried out for the Black Sea cities of Düzce and Bolu in Turkey. Düzce City is a neighboring town to the City of Bolu. Düzce's meteorological station is located at an elevation of 150 meters at coordinates 40°50′ N and 31°8′ E. The meteorology station in Bolu City is situated at a height of 740 meters; its coordinates are 40°44′ N and 31°36′ E. There is a 41-kilometer distance between the two weather stations. Figure 1 depicts the research area's map, as well as the Düzce and Bolu meteorology stations.
This analysis's primary goal is to generate regional quantitative estimates of daily average rainfall per month, using the ANN approach to approximate missing rainfall values. Potential future rainfall quantities for the same city can be predicted using the same technique.

Data Collection.
The Turkish State Meteorological Service (TSMS) provided rainfall data between January 1995 and December 2020 [20]. The measurements were made daily, and the weighted average of the measured values for each month was used in the analysis as a daily average rainfall per month in mm/day. However, there are some missing values in the meteorological data, especially for Düzce. Bolu, on the other hand, has far better measurement data and far fewer missing records than Düzce. The ANN model was developed using extensive monthly data from the Bolu meteorology station from January 1995 to December 2009 and limited monthly data from the Düzce meteorology station from January 2008 to November 2009. Unfortunately, no measurements or data for Düzce were taken or recorded before 2008.
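The aggregation from daily measurements to a daily average per month can be sketched as follows. This is a minimal illustration, not the study's procedure: the source does not state how the monthly average was weighted, so the sketch assumes that days without a measurement are skipped and the remaining days count equally. The function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_daily_average(records):
    """Return {(year, month): mean daily rainfall in mm/day}.

    records: iterable of (date, rainfall_mm) pairs; rainfall_mm may be None
    for a missing measurement. Assumption: missing days are skipped and the
    remaining days are weighted equally.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, rain_mm in records:
        if rain_mm is None:
            continue  # no measurement on this day
        key = (day.year, day.month)
        totals[key] += rain_mm
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical sample values, not taken from the TSMS data set.
sample = [
    (date(2008, 1, 1), 2.0),
    (date(2008, 1, 2), None),
    (date(2008, 1, 3), 4.0),
    (date(2008, 2, 1), 1.5),
]
monthly = monthly_daily_average(sample)
```

A month in which every day is missing simply does not appear in the result, which makes the gaps that the ANN has to fill explicit.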
This study focuses on Düzce as the research area. Consequently, it examines whether the ANN method is advantageous for supplying Düzce's missing data.

Scenario-Based Approach.
Typical applications envision training the network with large data sets and then making a forward projection. Is the artificial neural network still a reliable methodology when data sets are limited? The primary goal of the study is to find an answer to this question. Under the scheme devised, the network was trained using small data sets from the past, and the ANN was then used to forecast the future. A scenario was developed to achieve this goal, and it was assumed that we were still in 2009. It has been assumed that Düzce will require 26 years of rainfall data between 1995 and 2020.
The question is then whether this 26-year rainfall data set can still be obtained with minimal error. The goal is to achieve a high correlation with the created ANN model using only 23 months of rainfall data for Düzce. For this purpose, the rainfall data available from the neighboring city of Bolu between 1995 and 2009 were used to perform the training, validation, and test phases of the ANN methodology. Because the City of Bolu is older and far more urbanized than Düzce City, older meteorological data are accessible for it. The meteorology station in Bolu provided fifteen years of monthly data from January 1995 to December 2009. However, based on the meteorological data available until December 2009, rainfall data for the City of Düzce exist only for the 23 months between January 2008 and November 2009.
Indeed, the idea of using data from the city of Bolu to create an extensive temporal meteorological data set for the town of Düzce was motivated by the acceptable correlation value obtained from the linear regression analysis of the two cities' rainfall data. Nevertheless, rainfall values from Bolu cannot simply be used in place of Düzce rainfall data. Furthermore, the function resulting from the linear regression analysis, when fitted to only 23 months of data, does not permit the creation of a 26-year rainfall data set without the use of any other independent variable. Düzce and Bolu's rainfall data are not interchangeable, but the rainfall characteristics of these two cities, which are 41 kilometers apart, are similar. Thus, the ANN methodology is regarded as a technique that can generate the Düzce rainfall data set between January 1995 and December 2020.
Missing rainfall values in the City of Düzce for 1995 to 2007 and for December 2009 were calculated using the artificial neural network (ANN) model. Furthermore, the potential rainfall values in the City of Düzce from 2009 to 2020 were projected using the same method. The ANN model was built using publicly available daily-average-by-month rainfall data from Düzce and the corresponding temporal data from neighboring Bolu. Before training the network and conducting the ANN analysis, regression analysis should show a significant relationship between the input and target data. Hence, both data sets were statistically tested to see whether the correlation is sufficient to conduct the ANN analysis.
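The preliminary correlation check described above can be sketched with a plain Pearson coefficient over the overlapping months of the two stations. This is an illustrative sketch only: the source does not say which test statistic or threshold was applied, and the function name and the sample values are hypothetical rather than measured data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly values in mm/day for the overlapping period.
bolu = [1.0, 2.0, 3.0, 4.0]
duzce = [1.5, 2.5, 3.5, 4.5]
r = pearson_r(bolu, duzce)
```

A value of r near zero would indicate that the Bolu series carries little information about Düzce, in which case training the network on it would be hard to justify.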

Artificial Neural Network (ANN) Model Configuration.
The network setup was initiated after the data were selected and prepared for the artificial neural network. A multilayer perceptron (MLP), a class of feed-forward neural network, can solve various data-based engineering problems. An MLP is a type of ANN with input, hidden, and output layers and is frequently used in time series forecasting [21,22]. A feed-forward network's weights and biases must be initialized to small random values before training [23,24]. The training data set is used to reduce the error on the neural network output. All of the training algorithms rely on backpropagation: the input is fed forward, and the weight and bias values are updated [25]. The Levenberg-Marquardt algorithm (LMA) is a combination of the steepest descent and Gauss-Newton algorithms. LMA was implemented to train a two-layer feed-forward ANN. LMA is a popular algorithm because it has a high success rate among approaches based on first-order derivatives. It is widely used in artificial neural networks with backpropagation architecture [26]. The two-layer feed-forward neural network for the future rainfall forecasting phase was trained using a gradient descent algorithm.
The Levenberg-Marquardt algorithm (LMA) blends the speed of Newton's method with the stability of the gradient descent method, whereas standard backpropagation is a gradient descent algorithm. At each iteration of the LM learning algorithm, the error surface is approximated by a parabola, and the step is taken toward the minimum of that parabola. There are two implementations of the gradient descent algorithm: incremental mode and batch mode. During ANN training, weights and biases were adjusted in search of the network's global minimum of error.
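As a hedged illustration of this contrast, the two weight updates can be written in their standard textbook forms. The symbols are assumptions introduced here and are not notation from the study: w_k is the weight vector at iteration k, e the vector of output errors, E the sum-of-squares error, J the Jacobian of e with respect to w, η the learning rate, and μ the damping factor.

```latex
\begin{align}
E(\mathbf{w}) &= \tfrac{1}{2}\,\mathbf{e}(\mathbf{w})^{\top}\mathbf{e}(\mathbf{w}),
  \qquad \nabla E(\mathbf{w}_k) = \mathbf{J}_k^{\top}\mathbf{e}_k \\
\text{gradient descent:}\quad
  \mathbf{w}_{k+1} &= \mathbf{w}_k - \eta\,\mathbf{J}_k^{\top}\mathbf{e}_k \\
\text{Levenberg-Marquardt:}\quad
  \mathbf{w}_{k+1} &= \mathbf{w}_k
  - \left(\mathbf{J}_k^{\top}\mathbf{J}_k + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}_k^{\top}\mathbf{e}_k
\end{align}
```

For large μ the Levenberg-Marquardt step approaches a short gradient descent step, and for small μ it approaches the Gauss-Newton step, which is the blend described in the text.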
In this study, the ANN architecture was selected for the best performance in error reduction. The network's optimal size was determined by adding and removing hidden layer neurons until the neuron number satisfied the target training error tolerance. Previous researchers devised equations to decide the neuron number in the hidden layer. In terms of the input neuron number (n) and output neuron number (m), the proposed number of neurons in the hidden layer varies between (2n + 1) and (2√n + m) [27,28]. The best network architecture and the optimal neuron number were determined using a comprehensive trial and error stage. It is vital to choose an appropriate activation function so that the neurons in the neural network achieve the desired effects. The type of data used as input and the neural network's purpose should be considered when selecting activation functions. When solving a nonlinear problem, using nonlinear activation functions produces better results [29]. The nonlinear model was used to forecast future rainfall. The neurons' nonlinear transfer functions are sigmoidal functions that increase monotonically and are continuously differentiable.
For the missing records, on the other hand, the linear transfer function was used. The weights were adjusted iteratively based on the training set to minimize the error between the network output and the observed values. The intrinsic nonlinearity of ANN explains dynamic meteorological phenomena better than linear methods [30]. Overfitting, in which an excessive number of parameters or weights is fitted too closely to the training data, can occur even while performance on the training data set appears satisfactory [31]. A second data set, the validation set, controls the learning stage, and a third, independent test set provides an unbiased estimate of the generalization error to guard against overfitting [32,33]. Increasing the number of hidden neurons allows the fitted function to fluctuate more, which lets the model cope with the data's volatility. Rainfall patterns are frequently influenced by seasonal variation. The learning rate is vital in MLP network training because it controls the weight changes at each iteration. According to Adamuthe and Vhatkar [34], a learning rate of 0.05 to 0.5 produces satisfactory results. As a result, the structure shown in Table 1 was chosen for the neural network used in this study. Figure 2 depicts a scatterplot with a trend line relating the predicted and observed values of missing rainfall for the training, validation, and test data sets used during ANN training.
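The hidden-layer sizing rule cited from [27,28] in this section can be turned into a list of candidate sizes for the trial and error stage. The sketch below is an illustration under two assumptions: the second bound is read as 2·√n + m, and the candidates are the integers lying between the two bounds. The function names and the example with five inputs and one output are hypothetical.

```python
import math


def hidden_neuron_bounds(n_inputs, n_outputs):
    """Return (low, high) from the two rules 2n + 1 and 2*sqrt(n) + m."""
    rule_a = 2 * n_inputs + 1
    rule_b = 2 * math.sqrt(n_inputs) + n_outputs  # assumed reading of 2√n + m
    low, high = sorted((rule_a, rule_b))
    return low, high


def candidate_sizes(n_inputs, n_outputs):
    """Integer hidden-layer sizes to try, taken between the two bounds."""
    low, high = hidden_neuron_bounds(n_inputs, n_outputs)
    return list(range(math.ceil(low), math.floor(high) + 1))


# Hypothetical example: five inputs and one output.
sizes = candidate_sizes(5, 1)
```

Each candidate size would then be trained and compared on the validation error, which keeps the search bounded instead of open-ended.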
The MLP network's input matrix for further rainfall forecasting consists of five vectors with twelve elements each, corresponding to five years and twelve months (see Table 2). For example, R05 in the matrix represents rainfall data from 2005; the subscript denotes the year. During the preliminary stage, observed rainfall data from 2009 (R09) were set aside as the target output for network training. At each subsequent step, the input matrix was shifted forward by one year: its first vector was dropped, and the output vector produced at the previous stage was placed at the end of the next input matrix.
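The shifting input matrix described above amounts to a recursive, sliding-window forecast. The following sketch shows only the data flow, under stated assumptions: `mean_predictor` is a hypothetical stand-in for a trained network, and the five constant-valued years are made-up data rather than rainfall records.

```python
from collections import deque


def rolling_forecast(history, predict_year, n_steps, window=5):
    """Forecast n_steps years, feeding each output back into the inputs.

    history: list of yearly vectors (12 monthly values each), oldest first.
    predict_year: callable mapping a list of `window` yearly vectors to the
    next year's 12-element vector (a trained network in the study; a
    placeholder here).
    """
    if len(history) < window:
        raise ValueError("not enough years of history for the window")
    inputs = deque(history[-window:], maxlen=window)
    forecasts = []
    for _ in range(n_steps):
        next_year = predict_year(list(inputs))
        forecasts.append(next_year)
        inputs.append(next_year)  # maxlen drops the oldest year automatically
    return forecasts


def mean_predictor(input_matrix):
    """Placeholder model: month-wise mean of the input years."""
    return [sum(vals) / len(vals) for vals in zip(*input_matrix)]


# Hypothetical data: five years, each with a constant monthly value.
history = [[float(year)] * 12 for year in range(1, 6)]
forecasts = rolling_forecast(history, mean_predictor, n_steps=2)
```

Because every forecast year becomes an input for the next one, errors can accumulate over the horizon, which is one reason to evaluate the later forecast years separately.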
Rather than using a single network that included all of the data from 2010 to 2020, smaller samples of the time series were used to generate more accurate neural networks [35]. Eleven networks were used, each operating on a small sample of 12 monthly elements. For this forecasting task, the gradient descent algorithm performed far better than the Levenberg-Marquardt algorithm (LMA). For ANN training in incremental mode, both linear and nonlinear transformations were used (see Table 3). Because its outputs lie between −1 and 1, the MLP network's hyperbolic tangent function expedites weight learning more than the logistic function [36]. Since the linear-fit correlation coefficient is 0.72, reasonably close to 1, the ANN model produces reliable predictions with a high correlation even with such limited data. According to several studies in the literature, this method remains applicable, in comparison with conventional statistical approaches, even in more extreme cases where the rainfall amount is much higher and varies more from month to month [37,38]. For the examined area of Düzce, Turkey, the rainfall amount is moderate, with a reasonable variation of rainfall values from month to month.
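The difference in output range between the two sigmoidal transfer functions mentioned in this section can be checked numerically. This is a small illustrative sketch; the sample points are arbitrary, and the identity in the last comment is a standard mathematical relation rather than a result of the study.

```python
import math


def logistic(x):
    """Logistic sigmoid, with outputs in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Arbitrary sample points for illustration.
xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
logistic_values = [logistic(x) for x in xs]
tanh_values = [math.tanh(x) for x in xs]  # outputs in (-1, 1), centered on 0

# tanh is a rescaled logistic function: tanh(x) = 2 * logistic(2x) - 1.
rescaled = [2.0 * logistic(2.0 * x) - 1.0 for x in xs]
```

The zero-centered output of the hyperbolic tangent is the property usually credited with faster weight learning in hidden layers.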

Results and Discussion
This situation may be favorable in terms of the ANN analysis's success. However, the main objective of this study is to create an artificial neural network (ANN) model that is specific to the research field and gives successful results with limited data; such a model can be used for rainfall forecasting and can provide a prediction of the future rainfall situation. Compared with traditional statistical techniques, the ANN methodology also provides more reliable numerical results for Düzce, where the precipitation rate is moderate and the variation of rainfall values from month to month is low. Under these conditions, successful results are expected in reaching the main objective, but it is vital to determine the appropriate network architecture and training algorithm for the ANN model, as in previous successful works [39].
MATLAB R2018b Deep Learning Toolbox [40] was used for neural network analysis, and graphical results were also presented for physical interpretation and discussion. For this study, two distinct ANN configurations were used. One was for predicting missing records, and the other was for forecasting future values. In some cases, the network is overburdened with inputs, and the training process takes a long time. By removing data that do not contribute to network training, principal component analysis (PCA) can reduce the input data [41,42]. In this way, correlation among the input data is avoided.
However, the results did not change with or without PCA, so PCA was not required in this study; the best possible outcome could be obtained without it. The ANN analysis reveals a linear relationship between rainfall amounts in Düzce and Bolu. When estimated Düzce rainfall records and observed Bolu rainfall records are scatter plotted, the fit to the generated data is a linear polynomial, and there is a high correlation between them, as shown in Figure 4.
Throughout the first part of this study, the missing monthly rainfall amounts for Düzce City from January 1995 to December 2009 were predicted using the ANN model, and the results are shown in Figure 5. The stars represent the model's forecasts, and the circles represent Düzce's daily average rainfall records by month. Both are expressed in millimeters per day. During the ANN study, linear and nonlinear transfer functions were evaluated together, and the linear transfer function was favored on the basis of the mean square error (MSE). As a result, the linear function was deemed preferable for estimating missing rainfall data using ANN. The ANN model was used in the second phase of this research to predict potential rainfall records for the City of Düzce from 2010 to 2020.
Predicted values from the first phase of this research were used for this purpose. Figure 6 depicts Düzce's observed and ANN-predicted daily average by month rainfall records between December 2009 and December 2020. Figure 7 illustrates the correlation between ANN model outputs and observed Düzce rainfall records. The coefficients of determination (R²) and correlation (R) were calculated to be 0.62 and 0.79, respectively. The model's performance in estimating missing values can be used to gauge the study's success. The correlation coefficients for the training, validation, and test results are 0.87, 0.92, and 0.93. In addition, the mean square error was calculated to be 0.053 mm² d⁻². The applied model's level of adequacy can be judged by comparing these results to other studies [43,44]. Results from similar papers [45,46] were examined, and it is clear that the performance of the model used in this study is highly reliable given the minimal data.
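The error measure quoted above can be illustrated with a short calculation. This sketch is an assumption-laden illustration rather than the study's evaluation code: the observed and predicted values are hypothetical, and only the figures marked as reported (0.79 and 0.053) come from the text. The root mean square error is derived here for interpretation and is not a quantity reported by the study.

```python
import math


def mean_squared_error(observed, predicted):
    """Mean of squared differences; units are the squared data units."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical monthly values in mm/day, not taken from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mse = mean_squared_error(observed, predicted)  # in mm^2 d^-2
rmse = math.sqrt(mse)                          # back in mm/day

# Reported values from the text, used only for a consistency check.
reported_r = 0.79
implied_r_squared = round(reported_r ** 2, 2)  # agrees with the reported R^2
reported_mse = 0.053
reported_rmse = math.sqrt(reported_mse)        # about 0.23 mm/day
```

Taking the square root puts the error back in mm/day, so the reported MSE corresponds to a typical deviation of roughly 0.23 mm/day.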

Conclusions
ANN has been used to simulate dynamic hydrological processes as an essential alternative method and is commonly used for forecasting. This research's aim has two components. The first is to create an ANN model specifically for predicting missing rainfall records in Düzce using rainfall data from the neighboring city of Bolu. Before beginning ANN analysis, a relationship between meteorological events in both cities should be established. Statistical approaches such as regression analysis are the most straightforward and widely used methods for this purpose. The ANN model may not produce the desired results if there is no relationship between the dependent and independent variables. A strong correlation between rainfall records from Düzce and Bolu was found using regression analysis. The best network architecture for missing value estimation was formed after a lengthy trial and error stage.
It was impossible to proceed to the second phase of this research, forecasting rainfall data from 2010 to 2020, without completing the first phase, since the first phase's output is an input for the second. The input parameters were five consecutive years of observed and forecasted daily average by month (mm/day) rainfall data.
The year following those five successive years of input was the expected output. Projections were implemented using both linear and nonlinear transformation models. Throughout the first phase, the linear model yielded the better results.
Because linear and nonlinear transfer functions are effective in avoiding local minima, using them together prevents the projected data from becoming trapped at the minimum peak values. According to the findings, Düzce's monthly average rainfall level ranges from 0 to 4.5 mm/day. This study demonstrates that ANN is an effective method for estimating long-term rainfall data even with few measurements. The ANN model's successful application for rainfall forecasting indicates that the methodology used in this study can also be used for future studies on extreme rainfall events and flood analysis predictions. In this manner, people can be better prepared for potential meteorological extreme events. The findings suggest that this paper can be a valuable guide for evaluating the efficacy and reliability of rainfall prediction using appropriately developed ANN models with respect to network architecture and the case-specific algorithms implemented.

Conflicts of Interest
The author declares that he has no conflicts of interest.