Hybrid Models for Weather Parameter Forecasting



Introduction
The forecasting of weather parameters is an important task, as the economy of a country, especially one that is largely agriculture-based, depends on the agricultural yield, which in turn depends on weather parameters.
There have been constant efforts in the area of weather forecasting to increase prediction accuracy. The basic schematic of the application of simple and hybrid models is explained through a flowchart in Figure 1.
In the recent past, significant research has been carried out on the time-series analysis of weather parameters. Bhardwaj and Duhoon [1] studied the persistence behavior of the time series of monthly rainfall and temperature for India by applying dispersion analysis on a weather dataset spanning 1901 to 2015. Similarly, for ecological data such as wildlife poaching records, the persistence behavior of the studied time series was assessed using dispersion analysis in [2]. Furthermore, in [3], time-series analysis was used to study the change in the predictability of temperature and the factors which may lead to unpredictable or antipersistent behavior in the series, behavior associated with increased death rates from random temperature fluctuations under heatstroke. Antipersistent behavior was observed by the authors in the studied data, which is not a good indicator for the lowering of temperature levels despite global initiatives such as the Paris Climate Agreement 2015 [3].
Different soft computing techniques to forecast weather parameters have been studied in [4]. In [5], it has been demonstrated that, on integrating ANFIS (Adaptive Neuro-Fuzzy Inference System) with the Sugeno model and applying the resultant ANFIS-Sugeno model on weather time-series data such as daily temperature, the error in the output is reduced. Further, in [6], the single NARX (Nonlinear Autoregressive Exogenous) model and the conjunction model W-NARX (Wavelet-NARX) were studied by Bhardwaj and Duhoon for temperature time-series data, and on comparing the outputs, it was concluded that W-NARX learned the behavior of the input parameters more efficiently, with the least MSE (Mean Square Error) for training, validation, and testing of 8.12376e-1, 4.86326e-1, and 4.79787e-1, respectively. The R values obtained for training, validation, and testing were 9.88728e-1, 9.26185e-1, and 9.94526e-1, respectively.
Various machine learning strategies for temperature forecasting were studied by Cifuentes et al. [7], where the machine learning techniques proved better for accurate prediction; in comparison with traditional artificial neural networks, the deep learning strategies reported smaller errors. Both clustering and classification techniques using weather parameters have been studied. Naive Bayes provides better results compared to the other classifiers on the basis of the statistical outcomes of Kappa statistics and the estimated errors. In [8], K-means clustering, EM clustering, and hierarchical clustering were used as the clustering methods, and on the basis of the least time taken, it was concluded that K-means is the most efficient clustering technique.
For monthly forecasts of precipitation, Kalteh [9] studied artificial neural networks and a conjunction of ANN (Artificial Neural Network) and singular spectrum analysis models. The conjunction model so formed was compared with the single ANN model on the basis of error estimation, calculating the RMSE (Root Mean Square Error) and the coefficient of efficiency. The results showed that the conjunction model performed considerably better than the single ANN model. A further conjunction model of wavelet and SVM (Support Vector Machine) was studied by Kisi and Cimen [10] for forecasting daily precipitation, where the discrete wavelet transform and SVM were combined to form a conjunction model. A single SVM was compared against it, and the conjunction model was observed to give the best results on the basis of error calculation.
On the other hand, Oana and Spataru [11] studied the application of genetic algorithms in conjunction with the WRF numerical weather prediction system, optimizing and forecasting using GA. It was observed that the conjunction model was efficient and showed less error. A comparison of ARIMA (Autoregressive Integrated Moving Average) with exponential smoothing models, namely the Holt-Winters model and the ETS model (i.e., Error, Trend, and Seasonality model), was carried out by Guizzi et al. [12] in order to forecast the weather parameters temperature, pressure, and humidity for Italy over one month. The Holt-Winters model gave better results compared to the other two models. For weather parameter data of Saudi Arabia, individual MLP (Multiple Layer Perceptron) and RBF (Radial Basis Function) models were studied by Saba et al. [13]. They compared the obtained results with the results of a hybrid neural model, a combination of MLP and RBF, in order to improve the forecast. The RMSE, correlation coefficient, and scatter index were compared in order to see which model performed better with less error. It was observed that the hybrid model showed better results compared to the individual models [24].
Based on the above-mentioned studies [25], in this paper we attempt to form a new model which can forecast weather parameters while taking less time in forecasting. The daily data of the weather parameters maximum temperature, minimum temperature, rainfall, and wind speed for Delhi and Mumbai from January 1, 2017, up to May 30, 2018, have been considered, and for Chennai, the daily data from January 1, 2020, up to February 28, 2021, have been studied. The objective is to cover metropolitan cities at different locations on the map of the country. Delhi, the capital of India, faces all kinds of seasons. Mumbai is near the sea, has a lot of moisture, and has limited seasons. Chennai is on an altogether different side of the country and also faces limited seasons. These cities are densely populated, as many people from rural and urban areas come together there for their living. Hence, in order to study independent data sets for daily weather parameters, these cities have been taken.
To the considered data, plain LibSVM (Library Support Vector Machine), RBF, and SMO (Sequential Minimal Optimization) methods are applied. Next, different hybrid conjunction models are studied under different schemes, in which the time series of weather parameters are denoised using the Haar wavelet and trained using NARX before obtaining the forecasts for the forecasting period. The study includes (1) wavelet-RBF, wavelet-SMO, and wavelet-LibSVM, (2) neuro-RBF, neuro-SMO, and neuro-LibSVM, and (3) wavelet-neuro-RBF, wavelet-neuro-SMO, and wavelet-neuro-LibSVM. The outputs are compared, and on the basis of the errors and the time taken in seconds by the models, the most suitable model is chosen. Weka software has been used for the study. The climate of Chennai is a tropical wet and dry climate. Chennai lies on the thermal equator and is also coastal, which prevents extreme variation in seasonal temperature. The weather in Chennai is hot and humid. The hottest part of the year is May and early June, with a maximum temperature of 38-42°C. The coolest part of the year is January, with a minimum temperature of 18-20°C. The extreme temperature ranges from 13.9 to 45°C.

Data
The model developed in the study is validated on the data points given in Table 1.

Wavelet Method.
Wavelet transformation is used for denoising signals, that is, for regenerating or reconstructing a signal from a noisy one. Wavelet transformation analyzes the signal at multiple scales in order to match the output with the input signal. If the energy of a signal is concentrated in a small number of wavelet coefficients, those coefficients will be comparatively large in comparison to the disturbance, whose energy spreads over a large number of coefficients. Thresholding the wavelet coefficients therefore eliminates the low-amplitude disturbances, i.e., the signal components which are not required, in the wavelet domain.
The Haar wavelet's mother wavelet function ∅(t) can be written as

∅(t) = 1 for 0 ≤ t < 1/2, −1 for 1/2 ≤ t < 1, and 0 otherwise. (1)

Its scaling function λ(t) can be written as

λ(t) = 1 for 0 ≤ t < 1, and 0 otherwise. (2)

The steps of the denoising scheme are as follows: Step 1: decompose the input time-series signal using the DWT (Discrete Wavelet Transform) with a selected wavelet. In the study, the Haar wavelet is used, whose functions are given in equations (1) and (2).
Step 2: by acting on the detail wavelet coefficients, a threshold function is used to reduce the noise in the signal processed in Step 1. The coefficients are scaled or shrunk depending on the chosen threshold function. In the study, a soft thresholding technique has been used, for which the function is

T(s, λ) = sign(s)(|s| − λ) if |s| ≥ λ, and 0 otherwise, (3)

where s and λ represent the wavelet coefficient and the threshold value, respectively.
Step 3: extract the denoised time series by the inverse wavelet transformation of the output signal in Step 2.
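The three-step denoising scheme above can be sketched in a minimal, self-contained Python form. This is an illustration only (a single-level Haar transform written by hand, with hypothetical function names; the study's own experiments use Weka, and multi-level decomposition is omitted):

```python
import math

def haar_dwt(signal):
    """Step 1: single-level Haar DWT.
    Returns (approximation, detail) coefficients; assumes even length."""
    a = [(signal[2*i] + signal[2*i + 1]) / math.sqrt(2) for i in range(len(signal) // 2)]
    d = [(signal[2*i] - signal[2*i + 1]) / math.sqrt(2) for i in range(len(signal) // 2)]
    return a, d

def soft_threshold(coeffs, lam):
    """Step 2: soft thresholding, shrinking each coefficient toward zero by lam."""
    return [math.copysign(max(abs(s) - lam, 0.0), s) for s in coeffs]

def haar_idwt(a, d):
    """Step 3: inverse single-level Haar DWT."""
    out = []
    for ai, di in zip(a, d):
        out.append((ai + di) / math.sqrt(2))
        out.append((ai - di) / math.sqrt(2))
    return out

def denoise(signal, lam):
    a, d = haar_dwt(signal)        # decompose
    d = soft_threshold(d, lam)     # threshold only the detail coefficients
    return haar_idwt(a, d)         # reconstruct the denoised series
```

With lam = 0 the reconstruction is exact; a large lam zeroes the detail coefficients, leaving only the smooth (pairwise-averaged) approximation.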

NARX Method.
NARX is a recurrent neural network with feedback connections enclosing several layers of the network. It is used mostly in the modeling of time series and is based on the linear ARX (Autoregressive Exogenous) model. It is defined by the following equation:

Y(t) = f(Y(t − 1), Y(t − 2), …, Y(t − n), X(t − 1), X(t − 2), …, X(t − m)), (4)

where the next values of the output signal Y(t) are regressed on the past values of the output signal and of the exogenous independent input signal X(t). The NARX model is implemented by using a feed-forward neural network to approximate the function f. Besides being used as a predictor, NARX is used for nonlinear filtering, where the target output is the noise-free version of the input signal. The steps of the NARX scheme are as follows: Step 1: load the decomposed output time-series signal denoised using the DWT (Discrete Wavelet Transform).
Step 2: choose the time delay and the number of hidden neurons. Then, use the Levenberg-Marquardt method to train the time series.
Step 3: train the time series and obtain the output.
In NARX, the training of the data is done using the Levenberg-Marquardt backpropagation method, which reduces the error to a minimum. The algorithm is used for training the time series for the purpose of error reduction after the series has been denoised using the wavelet.
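The regression structure of equation (4) amounts to building, for each time step, a feature vector of lagged outputs and lagged exogenous inputs. The sketch below shows only this data-preparation step (the helper name and lag orders are illustrative; the actual network training via Levenberg-Marquardt is not reproduced here):

```python
def narx_regressors(y, x, n_y, n_x):
    """Build NARX training pairs: for each time t, the input row holds the
    past n_y output values and past n_x exogenous values, and the target
    is Y(t), matching Y(t) = f(Y(t-1),...,Y(t-n_y), X(t-1),...,X(t-n_x))."""
    start = max(n_y, n_x)
    rows, targets = [], []
    for t in range(start, len(y)):
        rows.append(y[t - n_y:t] + x[t - n_x:t])  # lagged Y then lagged X
        targets.append(y[t])
    return rows, targets
```

A feed-forward network approximating f would then be fit on `rows` against `targets`.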

Soft Computing Techniques.
In the study, the kernel-based soft computing methods which have been used for forecasting and compared are described as follows.

RBF (Radial Basis Function) Method.
The RBF kernels are among the most widely used kernels because of their similarity to the Gaussian distribution. The RBF kernel computes the similarity or closeness between two data points and can be mathematically presented as

RBF(y_1, y_2) = exp(−‖y_1 − y_2‖² / (2σ²)), (5)

where σ is the variance and the hyperparameter, and ‖y_1 − y_2‖ is the Euclidean (L2 norm) distance between the two data points y_1 and y_2. Linear regression is now applied on the RBF-transformed data, giving

Y_n = [RBF(y_n, y_1), RBF(y_n, y_2), RBF(y_n, y_3), …, RBF(y_n, y_N)], (6)

where N is the total number of data points. On the wavelet-denoised and NARX-trained data, the regression-based RBF conjunction model is applied to obtain the forecasted RBF data Y_n. The steps of the RBF scheme are as follows: Step 1: load the decomposed output time-series signal denoised using the DWT (Discrete Wavelet Transform) and then train the time series using NARX. Step 2: choose the machine learning method RBF for training the time series using the algorithm.
Step 3: obtain the desired output.
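Equations (5) and (6) can be sketched directly in Python (function names are illustrative; in the study itself this transformation is handled inside Weka):

```python
import math

def rbf(y1, y2, sigma=1.0):
    """Equation (5): Gaussian RBF similarity between two points.
    Equals 1 for identical points and decays with squared distance."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(y1, y2))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

def rbf_features(y_n, data, sigma=1.0):
    """Equation (6): map y_n to its similarities with all N data points,
    producing the feature vector that linear regression is applied to."""
    return [rbf(y_n, y_i, sigma) for y_i in data]
```
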

LibSVM Method.
As an optimization algorithm, SVM (Support Vector Machine) finds the hyperplane with the maximum discriminating margin between two classes of data, as discussed by Shirmohammadi [16]. To evaluate an input data sample y_t, the decision function used has the general form

f(y_t) = Σ_{i=1}^{N} α_i z_i K(y_t, y_i) + b, (7)

where {(y_i, z_i)}_{i=1,…,N} is the training sample, y_i ∈ R^d is the input data from sample i of size d, and z_i takes the two possible values 1 or −1. The kernel function K(y_t, y_i) estimates the similarity between the i-th and t-th samples in the feature space, with b as the threshold constant of the function. The coefficients α_i correspond to the Lagrange multipliers of a quadratic programming (QP) problem, obtained by minimizing the objective function

(1/2)‖w‖² + C Σ_{i=1}^{N} ξ_i, (8)

where the ξ_i are the slack variables and the compromise between training error minimization and margin maximization is given by the constant C; a large C implies high error penalization. The QP problem is divided into QP subproblems under the technique of chunking, where the QP subproblems use a subset of the nonzero α_i's and the training samples (M) that violate the Karush-Kuhn-Tucker (KKT) conditions. There are several SVM variants, such as one-class SVM, nu-SVM, and R-SVM. LibSVM is integrated software for SV classification, regression, and distribution estimation that includes different SVM formulations. The steps of the LibSVM scheme are as follows: Step 1: load the decomposed output time-series signal denoised using the DWT (Discrete Wavelet Transform) and then train the time series using NARX.
Step 2: choose the machine learning method LibSVM for training the time series using the algorithm.
Step 3: obtain the desired output.
LibSVM supports multiclass classification and provides a simple interface through which users can easily link it with their own programs. Weka software, with its LibSVM wrapper, has been used for the study.
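The decision function of equation (7) is a simple weighted kernel sum; a minimal sketch (function name illustrative, shown here with a caller-supplied kernel rather than LibSVM's own API):

```python
def svm_decision(y_t, support, alphas, labels, kernel, b):
    """Equation (7): f(y_t) = sum_i alpha_i * z_i * K(y_t, y_i) + b.
    The predicted class is the sign of the returned value."""
    return sum(a * z * kernel(y_t, y_i)
               for a, z, y_i in zip(alphas, labels, support)) + b
```

For example, with a linear kernel, two support vectors at +1 and −1 with labels +1 and −1, unit multipliers, and b = 0, points on the positive side score positive and points on the negative side score negative.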

SMO (Sequential Minimal Optimization) Method.
In each iteration, by making use of only two Lagrange multipliers, the SMO algorithm pushes the chunking method to its smallest possible expression.
The optimal values of the multipliers are determined, and the SVM framework is updated until the QP problem is solved. The optimization subproblem can be solved analytically for two Lagrange multipliers, which is what makes SMO advantageous. In the methodology, the algorithm chooses two training samples (y_1, z_1) and (y_2, z_2), whose associated Lagrange multipliers are α_1 and α_2. The optimization of the objective function in equation (8) for the variables α_1 and α_2 thus reduces to a two-variable QP subproblem with an analytic solution. The main routine initializes the SVM algorithm for the two-class classification problem and evaluates all the samples and the associated Lagrange multipliers α_i. When, over a whole iteration, the values of the Lagrange multipliers no longer change, the routine finishes. The output of the procedure is the bias b and a list of multipliers α_i, and any input data time series can then be evaluated and forecasted using equation (7) of the SVM. The steps of the SMO scheme are as follows: Step 1: load the decomposed output time-series signal denoised using the DWT (Discrete Wavelet Transform) and then train the time series using NARX.
Step 2: choose the machine learning method SMO for training the time series using the algorithm.
Step 3: obtain the desired output.
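One concrete piece of the two-multiplier SMO step is the box-constraint clipping: given the pair (α_1, α_2), the new value of α_2 must lie in an interval [L, H] that keeps both multipliers in [0, C] while preserving the equality constraint. A sketch following Platt's standard SMO description (the full update, including the error cache and bias recomputation, is omitted):

```python
def smo_bounds(alpha1, alpha2, z1, z2, C):
    """Clipping interval [L, H] for the second Lagrange multiplier in an
    SMO step: keeps 0 <= alpha <= C while preserving the linear equality
    constraint alpha1*z1 + alpha2*z2 = const."""
    if z1 != z2:
        L = max(0.0, alpha2 - alpha1)
        H = min(C, C + alpha2 - alpha1)
    else:
        L = max(0.0, alpha1 + alpha2 - C)
        H = min(C, alpha1 + alpha2)
    return L, H
```

The analytically optimized α_2 is then clipped into [L, H] before α_1 is adjusted to restore the constraint.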
Following the methodology, the generated results are denoised by the wavelet, trained by neural networks, and forecasted using the kernel-based soft computing techniques RBF, LibSVM, and SMO along with their conjunctions. The following error measures are used to compare the models:
(a) Mean Absolute Error (MAE): the average of the absolute differences between predictions and actual values,

MAE = (1/N) Σ |X_i − X|,

where X_i is the predicted value, X is the actual value, and N is the number of terms.
(b) Relative Absolute Error (RAE): the total absolute error normalized by the total absolute error of a simple mean predictor; it is calculated to compare the models and see which has the better performance,

RAE = Σ |X_i − X| / Σ |Y_i − X|,

where X_i is the predicted value, Y_i is the previous target (mean) value, and X is the actual value.
(c) Root Mean Squared Error (RMSE): it measures the average magnitude of the errors. The errors are squared before they are averaged, so the RMSE gives relatively high weight to large errors,

RMSE = sqrt((1/N) Σ (X_i − X)²),

where X is the actual value, X_i is the predicted value, and N is the number of observations.
(d) Root Relative Squared Error (RRSE): the squared error of the model normalized by the squared error of the mean predictor, with a square root taken,

RRSE = sqrt(Σ (β − α)² / Σ (c − α)²),

where β is the predicted value, c is the previous target (mean) value, α is the actual value, and N is the number of observations.
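The four error measures can be computed directly from the predicted and actual series; a minimal sketch (function names illustrative, RAE and RRSE returned as ratios rather than percentages):

```python
import math

def mae(pred, actual):
    """Mean Absolute Error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def rmse(pred, actual):
    """Root Mean Squared Error: squaring gives high weight to large errors."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def rae(pred, actual):
    """Relative Absolute Error: model's absolute error relative to the
    absolute error of a naive predictor that always outputs the mean."""
    mean = sum(actual) / len(actual)
    return (sum(abs(p - a) for p, a in zip(pred, actual))
            / sum(abs(mean - a) for a in actual))

def rrse(pred, actual):
    """Root Relative Squared Error: squared-error analogue of RAE, rooted."""
    mean = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for p, a in zip(pred, actual))
    den = sum((mean - a) ** 2 for a in actual)
    return math.sqrt(num / den)
```

A value of RAE or RRSE below 1 (100%) means the model beats the naive mean predictor.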

Results and Discussion
The data has been split into 70% for training, 15% for testing, and 15% for validation while using the NARX method to study the pattern of the time series. The parameters studied are maximum temperature, minimum temperature, and wind speed. It is observed that the denoised plus trained signal deviates least from the original data trajectory. For the forecasting periods June 1, 2018, to June 7, 2018, for Delhi and Mumbai and March 1, 2021, to March 7, 2021, for Chennai, the original data signal, the denoised data signal, the trained data signal, and the denoised plus trained data signal are fed to the RBF, SMO, and LibSVM models to get the simple model forecasts under Scheme 1 and the hybrid conjunction model forecasts under Schemes 2, 3, and 4, respectively. The comparison plot of simple and hybrid forecasts with the original data for the forecasting periods is shown in Figures 2(a)-2(l). For the different simple and wavelet conjunction hybrid models, the error and time taken are compared in Tables 1-7 for the metropolitan cities, and the corresponding figures are used to select the most efficient model. The above error estimations and the time taken by the simple and conjunction models show the efficiency of the hybrid models, and hence it can be concluded from Figures 6-8 and Tables 1-8 that the Wavelet + Neuro + RBF model shows the best results for the forecasting of weather parameters. The RBF method has the advantages of easy design, good generalization, and strong tolerance to input noise, and its learning ability makes it very suitable for designing flexible control systems, whereas SMO is an iterative algorithm: it breaks the problem into a series of the smallest possible subproblems, which are then solved analytically.
Though LibSVM works stepwise, first training the data to obtain a model and then using the model to predict the testing data set, RBF is the better method in comparison, as RBF works faster by calculating the distance between the data values and then studying the pattern of the data to make the forecasts accordingly. SMO and LibSVM are methods that take more time to study the pattern of the time series. In the case of SMO, it takes more time to break the data set into smaller and smaller sets, due to which some points are left out and, hence, the chances of error increase. Similarly, for LibSVM, as it includes multiple algorithms, it takes time to understand the pattern and then choose the correct model. RBF predictions are thus observed to be more accurate than those of the LibSVM and SMO methods. In the case of LibSVM and SMO, the forecasts and errors from the hybrid model are not much different from the conventional model forecasts and errors, even though the time taken (TT) is significantly reduced using the hybrid scheme. But applying wavelet and NARX and then feeding the denoised plus trained data to the RBF yields better predictions for the forecasting period compared to the predictions obtained by directly feeding the data to the RBF. It is observed that when the data input signal is first denoised using the Haar wavelet and then trained using the neural network (NARX), the double processing of the data reduces the error in the forecasts obtained after feeding the denoised plus trained data signal to the plain RBF model. The hybrid or conjunction model Wavelet + Neuro + RBF provides better predictions with the least error and time taken to build the model [15,24] for the considered weather parameter data time series.
Time taken is in seconds, RRSE and RAE are given in percentage, and MAE and RMSE are given in the units of the parameter. Figure 6 shows the error plots of the statistical calculations reported in Tables 2-5 for comparison of the models. The time-series plots of the maximum temperature, minimum temperature, and wind speed of Delhi, Mumbai, and Chennai are as follows.
In order to validate the observations based on the initial study of weather parameters on the daily data set for the Delhi region from January 1, 2017, up to May 30, 2018, as obtained from IARI (Indian Agricultural Research Institute, Delhi), the study is further applied to independent data sets of daily weather parameters. These confirm RBF producing better predictions than LibSVM and SMO, and Wavelet + Neuro + RBF to be the most efficient method in comparison to the rest of the hybrid and conventional schemes. The bias has been calculated for Delhi, Mumbai, and Chennai; the histogram of the bias is shown in Figure 9.

Conclusion
In this paper, in the first phase of the study, the daily data of weather parameters have been considered for Delhi (maximum temperature, minimum temperature, evaporation, and wind speed) and Mumbai (maximum temperature, minimum temperature, and wind speed) from January 1, 2017, up to May 30, 2018, and for Chennai (maximum temperature, minimum temperature, and wind speed) from January 1, 2020, up to February 28, 2021. In the second phase, the wavelet-neuro conjunction models are studied. Using wavelet transformation, the time series of the weather parameters were denoised, after which the denoised data time-series signal was trained using NARX, and finally the denoised plus trained time series was fed to LibSVM, RBF, and SMO to obtain novel hybrid forecasts for the forecasting period. These forecast outputs have been compared for the time taken by the models, and the errors, namely, MAE, RMSE, RAE, and RRSE, have been calculated. In the case of LibSVM and SMO, the forecasts and errors from the hybrid model are not much different from the conventional model forecasts and errors, even though the time taken (TT) is significantly reduced using the hybrid scheme. RBF predictions are observed to be more accurate than those of the LibSVM and SMO methods. Based on the error calculation and the time taken, the hybrid scheme using the wavelet-neuro conjunction model gives better output as compared to the simple application of the conventional RBF method. Hybrid models, due to proper denoising and training of the time-series signal using wavelet and NARX, respectively, yield better results compared to the forecasts obtained by directly feeding the data to the RBF model, thus verifying the study on independent data sets of weather parameters for Mumbai from January 1, 2017, up to May 30, 2018, and Chennai from January 1, 2020, up to February 28, 2021.
It is concluded that the Wavelet + Neuro + RBF model shows better results for the forecasting of weather parameters in comparison to all the other hybrid and conventional models, and that it does so for different data values and for different time periods as well. The study will help the concerned authorities in future planning and in taking preventive steps against any coming calamities. It will also help the government to make effective policies.

Conflicts of Interest
The authors declare that they have no conflicts of interest.