Improving the Forecasting Accuracy Based on the Lunar Calendar in Modeling Rainfall Levels Using the Bi-LSTM Method through the Grid Search Approach

Rainfall is a climatic factor that influences many human activities and everyday decision making. High-intensity rainfall can become a threat and trigger natural disasters. Rainfall forecasting is therefore essential for anticipating such events, enabling preventive action, and supporting decisions that improve the productivity and mobility of human activities. The aim of this study is to compare rainfall forecasting accuracy between the Gregorian and the lunar calendars using the bidirectional long short-term memory (Bi-LSTM) machine learning model tuned through the grid search approach. This method was used because, with suitably chosen parameters, it can capture patterns arising from the simultaneous effects of the two asynchronous calendars, Gregorian and lunar, used in this study. Monthly rainfall data from Bogor City, Indonesia, for the period 2001 to 2022 were used. The results show that the MAPE for the lunar calendar, 14.82%, is smaller than that for the Gregorian calendar, 35.12%, which indicates better forecasting ability.


Introduction
Most rainfall forecasting is based on the Gregorian calendar [1,2], but many rainfall phenomena are closely linked to the lunar calendar [3]. Earth's climate, including variations in rainfall and tides, is influenced by the phases of the moon, which are the cornerstone of the lunar calendar (see [4]). Rainfall forecasting has attracted significant research attention in recent times owing to its complexity and ongoing applications. Hence, methods employing machine learning algorithms in conjunction with time series data are being investigated as viable alternatives for handling this complexity (see [5,6]).
In recent years, machine learning algorithms have been widely employed for time series data predictions, yielding highly accurate results. Machine learning enables the resolution of prediction problems in time series data with a wide range of values. Numerous studies have been conducted on rainfall forecasting using machine learning algorithms, including the support vector machine (SVM) method (see [7,8]), the deep neural network (DNN) method (see [9,10]), and the long short-term memory (LSTM) method (see [11][12][13]).
Bidirectional long short-term memory (Bi-LSTM) is a machine learning method suitable for time series data prediction. It is an extension of LSTM that can retain information from both the forward and backward directions. This capability enhances the learning process by offering additional neural networks, leading to more comprehensive results. To obtain the best forecasting model using the Bi-LSTM method, it is essential to determine the optimal parameters for the learning algorithm. Parameter setting and tuning play a significant role in improving forecasting accuracy. One effective approach to determining the optimal parameter setting is the grid search algorithm. The grid search algorithm works by systematically combining the various parameters used in the model creation process. It divides the parameter range into a grid and explores different combinations of parameter settings to identify the best parameters for the model. According to [14,15], the grid search technique improved model performance.
This study covered three aspects of analysis: (1) rainfall data conversion from the Gregorian calendar to the lunar calendar, (2) rainfall data modeling and forecasting based on the lunar calendar, and (3) comparison of rainfall forecasting accuracy between the Gregorian calendar and the lunar calendar. The purpose of this study is to forecast rainfall using the bidirectional long short-term memory (Bi-LSTM) model with the grid search approach. This research is expected to yield an efficient calendar conversion algorithm that can serve as the basis for further research on automating calendar conversion. Previous works on calendar conversion include [16][17][18][19][20][21].

Bidirectional Long Short-Term Memory (Bi-LSTM).
LSTM was specifically designed to address the vanishing gradient problem. LSTM units consist of forget gates, input gates, and output gates, which control whether information is stored or discarded. This method has been used in various cases such as sentiment analysis [22], COVID-19 vaccination responses [23], and smartphone data sensors [24]. LSTM usually involves quite complex calculations and high computation in its application. Therefore, this study examines a method with a simpler level of computation but with comparable performance, Bi-LSTM.
Bi-LSTM was proposed by Graves and Schmidhuber to solve a flaw in the recurrent neural network (RNN) and the LSTM model. In both models, information can only be propagated forward, meaning that the state at time t depends only on the information before time t [25]. Bi-LSTM, on the other hand, involves two LSTM networks: one processes the input sequence in the forward direction and the other processes it in the reverse (backward) direction. This method can store time series information in two directions and can provide additional training processes. The additional training processes and two-way feature extraction give Bi-LSTM better performance [26]. In addition, the outputs of the forward and backward LSTM networks are combined at each time step.
The Bi-LSTM model can learn past and future information for each input sequence. In addition, Bi-LSTM has two layers of data input that run opposite to each other, which help the model retain information from a long sequence of data during the training process [27]. Therefore, the theoretical prediction performance of Bi-LSTM is better than that of LSTM [28]. The architecture of Bi-LSTM [29] is provided in Figure 1.
Figure 1 shows that the forward layer processes the sequence in the same order as a regular LSTM network, from $t-1$ to $t$ and then $t+1$, whereas the backward layer iterates its hidden layer and output from $t+1$ to $t$ and then $t-1$. Here $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the forward and backward layers, respectively. According to [30], the process of forward LSTM and backward LSTM can be written as follows:
As described in Figure 1, the forward and backward hidden layers are connected and together form an output value. The calculation of the output value is shown in the following equation [31]:
with $y_t$ as the final output value and $U_y$ and $W_y$ as the weight values for the output gate on $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, respectively.
Several studies using the Bi-LSTM method have been conducted: [32] on wastewater flow rate prediction, [33] on tropical cyclone prediction, [34] on groundwater content and soil prediction, and [35] on water content and river water flow prediction.
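For orientation, a commonly used way to write the two recurrences and the combined output is sketched below. The symbols $U_y$ and $W_y$ follow the text; the input $x_t$, the bias $b_y$, the output activation $g$, and the shorthand $\mathrm{LSTM}(\cdot)$ for one LSTM cell update are assumptions introduced here for illustration and may differ from the notation used in [30,31].

```latex
\begin{align}
\overrightarrow{h}_t &= \mathrm{LSTM}\bigl(x_t,\ \overrightarrow{h}_{t-1}\bigr), \\
\overleftarrow{h}_t  &= \mathrm{LSTM}\bigl(x_t,\ \overleftarrow{h}_{t+1}\bigr), \\
y_t &= g\bigl(U_y\,\overrightarrow{h}_t + W_y\,\overleftarrow{h}_t + b_y\bigr).
\end{align}
```

In this form the forward state at time $t$ depends on the state at $t-1$, the backward state depends on the state at $t+1$, and the output combines both at each time step.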

The Grid Search Method.
Grid search is a method for finding appropriate parameters to improve model performance by trying all combinations of candidate parameter values. In practice, the grid search algorithm is usually combined with cross-validation to form a model evaluation index, which evaluates model performance across different splits of the data.
In this paper, the grid search algorithm is evaluated by a cross-validation (CV) test. A common form of cross-validation is k-fold, which is used to estimate prediction errors when evaluating model performance. It divides the dataset into k groups of equal size. Each group in turn is used as test data while the remaining groups are used as training data. The parameter combination with the smallest average error across the folds is the best parameter combination. This combination is then used to build the model for later testing and evaluation.
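The selection rule described above (average the error over the k folds, keep the combination with the smallest average) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the toy "shrunken mean" predictor that stands in for the Bi-LSTM, and the sample data are all hypothetical and are not taken from the study.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(data, grid, fit, error, k=5):
    """Return (best_params, best_mean_error) over every grid combination."""
    folds = k_fold_indices(len(data), k)
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_errors = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [x for i, x in enumerate(data) if i not in held_out]
            test = [data[i] for i in test_idx]
            model = fit(train, **params)
            fold_errors.append(error(model, test))
        score = mean(fold_errors)  # average error across the k folds
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for the Bi-LSTM: predict a scaled mean of the training data.
def fit_shrunken_mean(train, shrink):
    return shrink * mean(train)


def mse(prediction, test):
    return mean((x - prediction) ** 2 for x in test)


data = [0.2, 0.4, 0.6, 0.8, 0.3, 0.5, 0.7, 0.9, 0.1, 0.5]  # made-up values
best, score = grid_search_cv(
    data, {"shrink": [0.5, 1.0, 1.5]}, fit_shrunken_mean, mse, k=5
)
print(best, score)
```

Replacing the toy `fit` and `error` callables with a real training routine leaves the selection logic unchanged.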

Calendar Conversion.
This study uses an algorithm to convert daily data into monthly data based on the lunar calendar. The month names and the number of days in each month are taken from the islamicfinder.com website. Converting daily data from the Gregorian calendar to the lunar calendar requires dividing the data into three segments, as illustrated in Figure 2. The conversion process, which uses the "TS" package in R software [36], consists of the following steps:
(1) Determine the initial and end dates of the lunar calendar
(2) Partition the time interval into three calendar parts
(3) Convert boundary points from the Gregorian calendar to the lunar calendar
(4) Turn these three segments into vectors of Gregorian and lunar calendar dates, as seen in Figure 2
(5) Include calendar attributes, i.e., lunar number and date, e.g., Attr <- c(no, dates)
(6) Create a data frame that combines the three vectors
(7) Input the daily rainfall data based on the Gregorian calendar in a separate column/vector
(8) Merge the column vector of daily data with the lunar number and dates into the data frame from Step 6
(9) Shift the daily data according to the calendar converter and change the names of the Gregorian months to the corresponding lunar months
The third stage of data preprocessing is a scaling or mapping technique used to normalize the data. The normalization uses the min-max method, which yields values ranging from 0 to 1. According to [28], the equation used in data normalization is as follows:
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
and the denormalization [37] of the data is
$$x = x_{\mathrm{norm}}\,(x_{\max} - x_{\min}) + x_{\min},$$
where $x$ is the original value, $x_{\mathrm{norm}}$ is the normalised value, $x_{\max}$ is the maximum value of the entire data, and $x_{\min}$ is the minimum value of the entire data.
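The min-max scaling and its inverse described above can be illustrated with a short Python sketch. The function names and the sample rainfall values are hypothetical; only the min-max rule itself comes from the text.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return (x_min, x_max) needed to undo the scaling."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("min-max scaling is undefined for a constant series")
    scaled = [(x - x_min) / (x_max - x_min) for x in values]
    return scaled, (x_min, x_max)


def min_max_denormalize(scaled, bounds):
    """Map values in [0, 1] back to the original scale."""
    x_min, x_max = bounds
    return [s * (x_max - x_min) + x_min for s in scaled]


rainfall_mm = [120.0, 480.0, 300.0, 30.0]  # made-up monthly totals
scaled, bounds = min_max_normalize(rainfall_mm)
restored = min_max_denormalize(scaled, bounds)
print(scaled)
print(restored)
```

Keeping the minimum and maximum from the normalization step is what allows forecasts produced on the 0 to 1 scale to be mapped back to millimetres.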

Bi-LSTM-Grid Search Modeling.
When building a machine learning model such as Bi-LSTM, it is important to select optimal parameters. Parameter tuning is used to control the model so that it achieves better performance [38]. This study proposes the use of the Bi-LSTM-grid search model (see Figure 3). The model consists of an input layer, a Bi-LSTM layer, a dropout layer, a dense layer, and an output layer, with the grid search algorithm added to determine the best parameters for the Bi-LSTM learning process. The input layer receives the input data, while the Bi-LSTM layer performs the learning. The dropout layer is used to prevent overfitting of the model during learning. The dense layer is a neural network layer that converts the output of the previous layer into predicted values. The output layer produces the final values of the learning process. The grid search algorithm divides the range of each parameter into a grid and evaluates all grid points in order to obtain the optimal parameters for the Bi-LSTM learning process. The parameters used in this study are one hidden layer, the number of hidden neurons, the batch size, and the number of epochs. In addition, the dropout regularization technique is used to avoid overfitting. The Adam optimizer is used to determine the optimal weights and reduce errors during model formation, thereby maximizing model accuracy. The parameter values set to build the prediction model are given in Table 1.
The number of neurons in the hidden layer is tuned to obtain the optimal number of hidden neurons. An epoch is one complete pass of all training data through the network. Each epoch can be partitioned into batches. The batch size is the number of samples processed before the model weights are updated.
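The relationship between epochs, batches, and weight updates can be made concrete with a little arithmetic. In the sketch below, the batch size of 4 and the 200 epochs match values reported in this paper, while the training set size of 200 samples is a hypothetical number chosen only for illustration.

```python
import math


def updates_per_epoch(n_train, batch_size):
    """Number of weight updates in one epoch (the last batch may be smaller)."""
    return math.ceil(n_train / batch_size)


n_train = 200    # hypothetical number of training samples
batch_size = 4   # batch size reported for the best models
epochs = 200     # epoch count reported for the best models

per_epoch = updates_per_epoch(n_train, batch_size)
total_updates = per_epoch * epochs
print(per_epoch, total_updates)
```

Under these assumptions, a smaller batch size means more updates per epoch, which is one reason small batches lengthen training time.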
The best parameter values among all combinations can be determined using a grid search with the help of cross-validation, which allows each model to be evaluated over various combinations of predefined parameter values. A common form of cross-validation is k-fold cross-validation. In this study, the grid search algorithm used 5-fold cross-validation.
The monthly rainfall data based on the Gregorian calendar cover 273 months, as seen in Table 2. The results of converting the rainfall data from the Gregorian calendar to the lunar calendar, which cover 286 months, can be seen in Table 3.

Bi-LSTM Prediction Results with Grid Search.
The data used in Bi-LSTM modeling were those that had passed the data preprocessing stage, and parameter tuning was conducted using the grid search algorithm. Based on the predetermined parameter values, 375 models were obtained from the parameter combinations. Of the 375 models, the optimal combination of parameters was selected based on the minimum MSE (mean squared error) for each forecasting length in the Gregorian and the lunar calendars, and this combination was carried forward to the testing process. See Table 4 for the complete results.
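To show how a grid of candidate values produces a model count such as 375, the sketch below enumerates a hypothetical grid. The candidate lists are placeholders, since the values of Table 1 are not reproduced in this text; they were chosen only so that the grid has 5 x 5 x 5 x 3 = 375 points and contains the settings reported as best.

```python
from itertools import product

# Hypothetical candidate values (NOT the values of Table 1): placeholders
# chosen so that the number of combinations equals 375.
param_grid = {
    "neurons": [10, 20, 30, 40, 50],
    "batch": [4, 8, 16, 32, 64],
    "epoch": [50, 100, 150, 200, 250],
    "dropout": [0.1, 0.2, 0.3],
}

# Every grid point is one dictionary of settings, i.e. one model to train.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
print(len(combinations))
```

Because the count is the product of the list lengths, adding one more candidate value to any parameter multiplies the number of models to train.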
Based on the results in Table 4, the smallest MSE for the Gregorian calendar was 0.01882, for a 12-month forecasting length, with the optimal combination of parameters found by the grid search being 20 neurons, a batch size of 4, 200 epochs, and a dropout of 0.2. The model with the best combination of parameters obtained from the training data using the grid search was then applied to the testing data. The optimal combination of parameters for rainfall data based on the lunar calendar is shown in Table 5.
According to Table 5, the smallest MSE was 0.01891, with a forecasting length of 3 months, 20 neurons, a batch size of 4, 200 epochs, and a dropout of 0.1. As with the Gregorian rainfall data, the model with the best combination of parameters obtained from the training data was then applied to the testing data.
The Bi-LSTM model with the best parameters was evaluated on the testing data. The results for each forecasting length, evaluated using MAPE for the lunar and Gregorian calendars, are provided in Table 6.
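The two error measures used here, MSE for parameter selection and MAPE for the final evaluation, can be computed as in the following sketch. The function names and the sample values are hypothetical and are not data from this study.

```python
def mse(actual, forecast):
    """Mean squared error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(actual)


actual = [200.0, 400.0, 250.0]    # made-up monthly rainfall (mm)
forecast = [180.0, 440.0, 250.0]  # made-up forecasts (mm)
print(mse(actual, forecast), mape(actual, forecast))
```

Because MAPE divides by the actual value, months with zero rainfall would need separate handling; the sketch simply raises an error in that case.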
Table 6 shows the results of model evaluation using MAPE on rainfall data with various forecasting lengths. The MAPEs are computed using the best combination of parameters. The lowest MAPE is obtained for the lunar calendar-based rainfall data with a forecasting length of 3 months. The results in Table 6 indicate that, in general, the longer the forecasting length, the larger the MAPE. The comparison of actual data and forecasting results for rainfall data based on the Gregorian and lunar calendars, using the model with the best parameters for each forecasting length, is reported in Figures 4(a) and 4(b).
According to Figures 4(a) and 4(b), the Bi-LSTM model built with the best parameters selected by the grid search algorithm on lunar calendar-based rainfall data produces predictions that are quite similar to the test data pattern. The MAPE obtained from rainfall data based on the lunar calendar was lower than that from the Gregorian calendar. The lowest MAPE, 14.82%, was obtained from the lunar calendar-based rainfall data with the best parameter values found by the grid search algorithm: 20 neurons, a batch size of 4, 200 epochs, and a dropout of 0.1. The MAPE criteria confirm the accuracy of the model. In conclusion, the best model obtained from the Bi-LSTM grid search provided better results in modeling rainfall data based on the lunar calendar than based on the Gregorian calendar.

Discussion
Converting the Gregorian-based rainfall data to the lunar calendar gives an advantage in time series analysis. The conversion of daily data to monthly data based on the lunar calendar increases the length of the data series, which provides more information and gives a better forecast. The increase in time series length is an effect of the different number of days in a year. In particular, a Gregorian year has 365 or 366 days, while a lunar year has 354 or 355 days. In general, the longer the forecasting horizon, the larger the mean absolute percentage error (MAPE) for both the Gregorian and lunar calendars.
Lunar calendar-based forecasting is used for rainfall because the moon's gravity affects the earth's climate. Furthermore, the moon exerts a stronger tidal influence on earth than the sun because it is much closer. The moon's position not only determines its phases but also has a gravitational impact on earth's weather. The effect of the moon's gravitational force on earth's rainfall needs further study involving collaboration with astronomers. Lunar calendars are commonly used in countries with majority Muslim populations, such as Indonesia, Saudi Arabia, and other countries in the Middle East. The Islamic calendar, known as the Hijri calendar, is derived from the lunar calendar. Religious holidays often prompt large movements of residents who visit places of worship or return to their hometowns for family reunions, to visit parents and relatives, and so on. As a result, the lunar calendar has a notable impact on transportation and the economy, particularly in countries with large Muslim communities.
The grid search algorithm used in this paper remains time-consuming. Each parameter combination takes approximately two hours to complete, which is inefficient. Hence, it is advisable to explore alternative algorithms, such as the genetic algorithm.

Conclusions
In this paper, we employed the machine learning time series method with the Bi-LSTM model and the grid search approach. The accuracy of the forecasting results for rainfall data based on the lunar calendar was evaluated using the mean absolute percentage error (MAPE). We also compared the MAPE of the Gregorian calendar-based rainfall data model with that of the lunar calendar-based rainfall data model.
The lowest MAPE for the Gregorian calendar-based model was 35.12%, while the lowest MAPE for the lunar calendar-based model was 14.82%, with a forecasting length of 3 months. The smaller MAPE for the lunar calendar-based model suggests a superior forecasting ability compared to the Gregorian calendar-based model. According to the MAPE criteria, the forecasting model based on the lunar calendar can be considered highly accurate. The optimal combination of parameters for rainfall data based on the lunar calendar, as determined through the grid search algorithm, comprises 20 neurons, 200 epochs, a batch size of 4, and a dropout value of 0.1.
Preprocessing Data.
The data preprocessing stage is carried out to improve data processing performance and to prevent errors in the data, so that the data used for prediction are of high quality.
The first stage of data preprocessing is data cleaning, in which missing values are handled. Missing values can be handled using mean imputation, in which they are filled with the average of all known values of the variable. The second stage is data splitting, i.e., the data are divided into training and testing data. Training data are used to train the models, while testing data are used to evaluate the model architecture selected with the best parameters. The data used in this study comprised 286 monthly observations for the lunar calendar and 273 for the Gregorian calendar. These data were divided according to several forecasting lengths: 3, 6, 12, 18, and 24 months.
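The cleaning and splitting stages of data preprocessing can be sketched as follows. This is an illustration only: the function names and sample values are hypothetical, and holding out the final months of the series as testing data is an assumption about how a forecasting length maps to a split, not a detail confirmed by the text.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a series with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def split_last(values, horizon):
    """Hold out the final `horizon` observations as testing data (assumed scheme)."""
    if not 0 < horizon < len(values):
        raise ValueError("horizon must be between 1 and len(values) - 1")
    return values[:-horizon], values[-horizon:]


series = [100.0, None, 300.0, 200.0, None, 400.0]  # made-up monthly rainfall (mm)
clean = impute_mean(series)
train, test = split_last(clean, 2)
print(clean, train, test)
```

Splitting chronologically keeps the testing months strictly after the training months, which mirrors how a forecast would be used in practice.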

Table 2: Monthly rainfall data (mm) based on the Gregorian calendar.

Table 3: Conversion of monthly rainfall (mm) data based on the lunar calendar.

Table 4: Results of Bi-LSTM-grid search tuning parameters on the Gregorian calendar.

Table 5: Bi-LSTM-grid search tuning parameter results on the lunar calendar.

Table 6: MAPE value in the Gregorian and lunar calendars.