A Novel Model Based on Square Root Elastic Net and Artificial Neural Network for Forecasting Global Solar Radiation

In recent years, solar energy has attracted a great deal of attention from scientific researchers because it is a clean and renewable form of energy. To make good use of solar energy, an effective way to forecast solar radiation is essential to guarantee the reliability of grid-connected photovoltaic installations. Although artificial neural networks (ANNs) are of great importance for this task, they often use irrelevant variables, which results in a complex model and an intractable computation cost. To remove these irrelevant variables, variable selection methods are combined with ANNs. However, selecting the regularization parameters in these techniques is challenging. This paper investigates a square root elastic net (SREN)-based approach to tackle this challenge and select the important variables. An Elman neural network (ENN) is constructed with the important variables selected by SREN as inputs. Based on meteorological data, the resulting model, SRENENN, has been developed for a 1-year period in the Xinjiang area of China. The model delivers a closer agreement between the estimated and measured values than the compared methods.


Introduction
Owing to the rapid development of the global economy, the energy crisis and environmental pollution have threatened sustainable development and human health. More and more countries pay attention to green and renewable sources of energy, so it is essential to utilize sources of clean energy instead of fossil fuels [1,2]. In fact, all kinds of energy sources derive from the sun, which has a diameter of $1.39 \times 10^{9}$ m and emits energy at a rate of $3.8 \times 10^{20}$ MW, of which the earth obtains only a small fraction, $1.7 \times 10^{14}$ kW [3]. As one of the most significant forms of green energy, solar energy has been used since prehistoric times because it can be captured anywhere. Solar energy is a renewable and clean alternative for solving the worldwide energy shortage and environmental problems [4]. It can be applied in several fields, including locating photovoltaic power plants, scheduling electrical load, and developing a low-carbon economy [5]. Reliable global solar radiation data are important for investigating, assessing, and utilizing solar energy resources. Although ground-based measurements provide accurate global solar radiation, they are not available at all locations [6]. In recent years, geostationary weather satellites have been applied to estimate global solar radiation at ground level, but this indirect approach is less accurate than forecasting models [7]. In addition, the weather is intrinsically chaotic and unstable, which greatly affects the global solar radiation. This volatility threatens the stability and quality of the whole power system [8]. Therefore, it is vital to develop models that improve the forecasting accuracy of global solar radiation using several atmospheric factors.
Many researchers have studied soft computing methods to forecast solar radiation and evaluate the potential of solar energy. These models include time series regression models (ARMA, ARIMA, and GARCH), empirical models, and machine learning techniques (artificial neural networks, support vector machines, etc.) [9]. Sun et al. proposed an ARMAX-GARCH model to forecast daily global solar radiation using several meteorological variables. The experimental results showed that global solar radiation depends more on sunshine duration than on temperature difference at certain stations [10]. David et al. applied a combination of ARMA and GARCH models to provide probabilistic forecasts of solar irradiance. The parameters of the proposed recursive ARMA-GARCH model were easier to estimate, and the model achieved good accuracy [11]. Quej et al. developed a new empirical model to predict hourly global solar radiation using meteorological factors such as rainfall, temperature, and humidity at six sites in Mexico. Compared with other models, the proposed model had the best forecasting precision [12]. Ouderni et al. utilized several empirical models, including the Benson model, the Page model, and the Angstrom-Prescott-Page model, to assess the solar potential in the Gulf of Tunis [13].
Machine learning techniques, including artificial neural networks (ANNs), intelligent optimization algorithms, and support vector machines (SVMs), are among the most popular forecasting models. They are self-adaptive and robust and have already been successfully applied to forecast global solar radiation. ANN techniques include backpropagation (BP), radial basis function (RBF), multilayer perceptron (MLP), and extreme learning machine (ELM) [14][15][16][17]. Benmouiza and Cheknane used the k-means method to find the input samples and took advantage of nonlinear autoregressive (NAR) neural networks to forecast hourly global horizontal solar radiation [18]. Chen et al. presented a model based on fuzzy rules and a neural network to forecast solar radiation; the case study revealed that the proposed technique achieved excellent forecasting accuracy [19]. Renno et al. developed two ANN models to estimate hourly direct normal irradiance and global radiation [5]. Salcedo-Sanz et al. proposed a novel approach, Coral Reefs Optimization-Extreme Learning Machine (CRO-ELM), to predict daily global solar radiation and achieved satisfactory results [20]. Gairaa et al. adopted a new hybrid technique combining the linear ARMA and the nonlinear ANN to forecast daily global solar radiation in Algeria. The experimental results revealed that the hybrid model is superior to the single ones [21].
Although ANNs have been widely exploited because of their nonlinear mapping ability, prediction capabilities, and robustness, the optimal parameters of the network, such as the weights, the biases, and the number of hidden layer nodes, are not easy to determine, and the training of the network is likely to converge to a local minimum [22]. Furthermore, the structure of the network becomes quite intricate if all the variables are applied as inputs. This causes the following two problems: (1) the complex structure seriously hampers forecasting and selection performance and (2) the complex structure needs much computation time. The weights between the nodes in an ANN have to be estimated, which takes a long time if the ANN has an excessive number of nodes. Based on the above discussion, it is essential to investigate an effective method for establishing a simple neural network. Since the structure of the network relies heavily on the number of inputs, variable selection techniques are needed to choose the significant variables that serve as inputs of an ANN.
Some researchers focus on selecting important variables as inputs of forecasting models, including but not limited to ANN and SVM. Benghanem et al. applied the Levenberg-Marquardt learning algorithm to construct an ANN for studying the daily global irradiation of Saudi Arabia. Air temperature, sunshine duration, relative humidity, and day of the year were used as the input variables, which achieved good forecasting accuracy [23]. Rahimikhoob used temperature, including the highest and the lowest temperature, to forecast global solar radiation in the southwest of Iran [24]. Qing and Niu developed a new technique based on long short-term memory (LSTM) networks to predict hourly solar irradiance and used weather data (temperature, wind speed, dew point, etc.) as network inputs [25]. Vakili et al. established an MLP neural network to estimate daily solar irradiance using temperature, wind speed, relative humidity, and particulate matter 10 [26]. Rohani et al. proposed a Gaussian process model with K-fold cross-validation to forecast daily and monthly solar radiation using temperature, humidity, pressure, and sunshine hours as input variables [27]. It is found that the above hybrid approaches combine the advantages of several single models and perform better. Variable selection algorithms can be used to reduce high-dimensional data by selecting the optimal input variables or model [28,29]. Jović et al. studied solar radiation and used an adaptive neuro-fuzzy inference system (ANFIS) to select the most relevant factors from temperature, mean sea level, and relative humidity as the predictors [30]. Almaraashi applied four different feature selection methods to determine the input space and forecast daily solar radiation in Saudi Arabia based on a multilayer neural network [31]. Aybar-Ruiz et al. adopted a grouping genetic method to select the relevant atmospheric features in an extreme learning machine model for predicting global solar radiation [32]. Mori chose meteorological variables using graphical modelling to estimate solar radiation [33]. Jiang and Dong developed penalized kernel SVM approaches to select structural variables and forecast global horizontal radiation [34].
As far as we know, current research papers select variables by trying some specific combinations or groups. However, there is no theoretical guarantee for the way these combinations are determined, and considering all the possible combinations of variables is time-consuming. Penalized variable selection methods are advocated because they select the important variables directly without trying possible combinations, and they are more straightforward to use. Furthermore, compared with conventional ANNs and SVMs, the Elman neural network (ENN) is a local recurrent neural network with a single hidden layer, which offers a fast learning rate, good dynamic characteristics, and high global stability [35,36]. In this paper, an ENN structure is therefore selected as the forecasting technique for global solar radiation. This work advocates the square root elastic net variable selection procedure in the Elman neural network (SRENENN) approach to forecast the global solar radiation in the Xinjiang area of China. The primary novelty and contributions of this study are provided in the following list:
(1) An ENN is applied to forecast global solar radiation with meteorological variables.
(2) Square root elastic net is used to effectively extract the meteorological variables which are applied as inputs of ENN, and the optimal model is determined by the 10-fold cross-validation to improve forecasting precision.
(3) A novel square root elastic net variable selection procedure in the Elman neural network (SRENENN) algorithm is proposed, and the corresponding forecasting results are compared systematically using Wilcoxon signed-rank test and Friedman test.
The structure of this study is as follows: Section 2 describes the square root elastic net variable selection procedure and the Elman neural network; Section 3 investigates the case study based on real data analysis; Section 4 provides the forecasting accuracies and the corresponding experimental results; the conclusions are presented in Section 5. The schematic overview of the whole paper is given in Figure 1.

Materials and Methods
2.1. Square Root Elastic Net. Based on the dataset $Z = (X, y)$, the following linear regression model is considered after centering X and y:

$$y = X\beta^{*} + \varepsilon, \quad (1)$$

where $y \in \mathbb{R}^{n}$ denotes the target response to be studied, $X \in \mathbb{R}^{n \times p}$ represents the data matrix with n samples and p variables, and $\beta^{*} \in \mathbb{R}^{p}$ is the coefficient vector of the true model. Let I be the identity matrix; the error term $\varepsilon$ follows the Gaussian distribution $N(0, \sigma^{2} I)$ with $\sigma^{2} > 0$.
To obtain an interpretable model, the following optimization problem is considered:

$$\min_{\beta} \; \|y - X\beta\|_2^2 + \Omega(\beta; \lambda), \quad (2)$$

where $\Omega(\beta; \lambda)$ denotes the penalty function with $\lambda$ representing the tuning parameter. When $\Omega(\beta; \lambda) = \lambda \|\beta\|_1$, which is a convex penalty, (2) becomes the well-known LASSO [37] problem given in (3):

$$\text{LASSO} = \min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1. \quad (3)$$

LASSO is easier to compute for big data because of its convex form. In addition to convex penalty functions, nonconvex penalty functions have also been proposed to perform variable selection. For instance, [38] investigates the SCAD penalty, which is given below:

$$\Omega(\beta; \lambda) = \sum_{j=1}^{p} \omega(|\beta_j|; \lambda), \quad (4)$$

where $\omega(t; \lambda) = \int_0^t \left[ \lambda 1_{z \le \lambda} + \frac{(a\lambda - z)_+}{a - 1} 1_{z > \lambda} \right] dz$ and $a = 3.7$ is selected by generalized cross-validation. The elastic net (EN) [39] penalty is given as

$$\Omega(\beta; \lambda, \eta) = \lambda \|\beta\|_1 + \frac{\eta}{2} \|\beta\|_2^2. \quad (5)$$

In this paper, we aim to fulfill the following two goals: (G1) model interpretation and (G2) forecasting accuracy. The elastic net can be used to achieve these goals because its penalty function consists of both the LASSO and the ridge penalty. However, its forecasting performance is still affected negatively by the noise level, which is difficult to estimate. To solve this problem, square root regularization is considered in our work by using the square root error loss function $\|y - X\beta\|_2$ instead of the square error loss function $\|y - X\beta\|_2^2$. Therefore, we combine the benefits of the square root error loss and the EN penalty by proposing the square root elastic net (SREN), which considers the following optimization problem:

$$\text{SREN} = \min_{\beta} \; \|y - X\beta\|_2 + \lambda \|\beta\|_1 + \frac{\eta}{2} \|\beta\|_2^2. \quad (6)$$
Compared with EN, which takes (5) into account, SREN has the following advantage: its two tuning parameters ($\lambda$ and $\eta$), which are determined by $\log p / n$, can be selected properly since they do not involve $\sigma$, which is extremely difficult to estimate in data analysis. Specifically, it is known that $\sigma^{2} = \text{RSS}/(n - p)$, where RSS represents the residual sum of squares. When $p > n$, $\sigma$ cannot be estimated. Even when $n > p$, high coherence between the variables causes a large RSS value, which results in a large estimate of $\sigma$. SREN avoids estimating $\sigma$ during parameter tuning, which boosts the forecasting accuracy of the model.
Square Root LASSO (SRL) [40] considers the optimization problem as follows:

$$\text{SRL} = \min_{\beta} \; \|y - X\beta\|_2 + \lambda \|\beta\|_1. \quad (7)$$

Although both SRL and LASSO use the same $L_1$ penalty, SRL applies the square root error loss function, which facilitates parameter tuning. Compared with SRL, SREN adds the ridge penalty $(\eta/2)\|\beta\|_2^2$, which is an $L_2$-type penalty, to handle the high coherence between variables and enforce more shrinkage on the model. Although both methods apply the square root error loss function, SREN is able to obtain more accurate results in a model with high coherence. Furthermore, SREN applies two tuning parameters ($\lambda$ and $\eta$) to adjust the model performance, while SRL uses only one tuning parameter $\lambda$.
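As a small illustration of how the SREN and SRL criteria relate, the following Python sketch evaluates both objectives for a given coefficient vector. The function names `sren_objective` and `srl_objective` and the toy data are hypothetical and introduced only for this example; the sketch assumes the square root loss is the unscaled norm $\|y - X\beta\|_2$, as in the text.

```python
import numpy as np

def sren_objective(X, y, beta, lam, eta):
    """Square root error loss plus L1 and ridge penalties (illustrative only)."""
    residual_norm = np.linalg.norm(y - X @ beta)   # ||y - X beta||_2
    l1_penalty = lam * np.sum(np.abs(beta))        # lambda * ||beta||_1
    ridge_penalty = 0.5 * eta * np.sum(beta ** 2)  # (eta / 2) * ||beta||_2^2
    return residual_norm + l1_penalty + ridge_penalty

def srl_objective(X, y, beta, lam):
    """Square root LASSO objective: the SREN objective with eta = 0."""
    return sren_objective(X, y, beta, lam, eta=0.0)

# Toy data (assumed): beta fits y exactly, so only the penalties remain.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])
print(sren_objective(X, y, beta, lam=0.5, eta=0.2))
print(srl_objective(X, y, beta, lam=0.5))
```

Setting `eta=0` makes the ridge term vanish, which is the sense in which SRL is a special case of SREN.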
Two novel plans, denoted by Plan A and Plan B, respectively, are proposed to design the algorithm for solving (6).
(i) Plan A: using the quantities denoted in (8), the algorithm is based on the iterations SREN-A in (9), in which $\beta_{j+1}$ is obtained as the minimizer (arg min) of a subproblem. Notice that the soft-thresholding operator can be applied to solve (9).
(ii) Plan B: the algorithm is based on the iterations in (10). To solve (9) and (10), threshold functions, the so-called $\Theta$-estimators [41], are applied in our work. The definition of a thresholding rule is given below.
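Definition 1. A thresholding function is a real-valued function $\Theta(t; \lambda)$ defined for $-\infty < t < \infty$ and $0 \le \lambda < \infty$ such that (i) $\Theta(-t; \lambda) = -\Theta(t; \lambda)$; (ii) $\Theta(t; \lambda) \le \Theta(t'; \lambda)$ for $t \le t'$; (iii) $\lim_{t \to \infty} \Theta(t; \lambda) = \infty$; and (iv) $0 \le \Theta(t; \lambda) \le t$ for $0 \le t < \infty$.
It follows from Definition 1 that $\Theta(\cdot; \lambda)$ is an odd, monotone, unbounded shrinkage rule for t at any $\lambda$. $\Theta$ can be applied componentwise if either t or $\lambda$ is given as a vector. The LASSO, SCAD, and EN thresholding functions are instances of such rules, where $\eta > 0$ and $\lambda > 0$ are the two regularization parameters.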
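For concreteness, the three thresholding rules named above can be sketched in Python as follows. These are the textbook forms of the soft (LASSO), elastic net, and SCAD thresholding functions, assumed here because the displayed formulas are not preserved in this copy; the function names are hypothetical, and the EN rule assumes the common convention of soft thresholding followed by division by $1 + \eta$.

```python
import numpy as np

def soft_threshold(t, lam):
    """LASSO (soft) thresholding: sign(t) * max(|t| - lam, 0)."""
    t = np.asarray(t, dtype=float)
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

def en_threshold(t, lam, eta):
    """Elastic net thresholding: soft thresholding followed by ridge shrinkage."""
    return soft_threshold(t, lam) / (1.0 + eta)

def scad_threshold(t, lam, a=3.7):
    """SCAD thresholding in its standard three-piece form (requires a > 2)."""
    t = np.asarray(t, dtype=float)
    abs_t = np.abs(t)
    # Middle piece, used when 2 * lam < |t| <= a * lam.
    middle = ((a - 1.0) * t - np.sign(t) * a * lam) / (a - 2.0)
    return np.where(abs_t <= 2.0 * lam, soft_threshold(t, lam),
                    np.where(abs_t <= a * lam, middle, t))

values = np.array([-3.0, -0.5, 0.0, 2.0])
print(soft_threshold(values, 1.0))
print(en_threshold(values, 1.0, 1.0))
print(scad_threshold(values, 1.0))
```

All three functions are odd, nondecreasing, and shrink toward zero, which matches the properties listed in Definition 1; the soft and SCAD rules are also unbounded in t.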

Parameter Tuning.
Parameter tuning is of great importance in assuring the performance of forecasting methods. Notice that there are two tuning parameters, $\lambda$ and $\eta$, in the proposed method. Cross-validation (CV) is a well-known data-driven method that has been widely applied in the machine learning community. Given fixed values for $\lambda$ and $\eta$, the in-sample data are randomly partitioned into K pieces of roughly equal size. The forecasting model is trained using K − 1 pieces of the in-sample data, and the test error is computed using the remaining piece. CV repeats this procedure K times, so that each piece is used once for testing. The CV error is obtained by adding the test errors, and the optimal regularization parameters are those with the smallest CV error.
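The K-fold procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`kfold_indices`, `cv_error`, `select_by_cv`) are hypothetical, and a closed-form ridge regression (`fit_ridge`) stands in for the forecasting model so that the example stays self-contained.

```python
import numpy as np

def kfold_indices(n, K, seed=0):
    """Randomly partition the indices 0..n-1 into K pieces of roughly equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), K)

def cv_error(X, y, fit, K=10, seed=0, **params):
    """Sum of squared test errors over the K folds for fixed tuning parameters."""
    folds = kfold_indices(len(y), K, seed)
    total = 0.0
    for k, test in enumerate(folds):
        train = np.concatenate([f for i, f in enumerate(folds) if i != k])
        predict = fit(X[train], y[train], **params)   # train on K - 1 pieces
        total += float(np.sum((y[test] - predict(X[test])) ** 2))
    return total

def select_by_cv(X, y, fit, grid, K=10, seed=0):
    """Return the grid entry with the smallest CV error, plus all CV errors."""
    errors = [cv_error(X, y, fit, K=K, seed=seed, **params) for params in grid]
    return grid[int(np.argmin(errors))], errors

def fit_ridge(X, y, eta):
    """Stand-in model (assumption): ridge regression solved in closed form."""
    beta = np.linalg.solve(X.T @ X + eta * np.eye(X.shape[1]), X.T @ y)
    return lambda X_new: X_new @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])          # noiseless toy response
grid = [{"eta": 1e-8}, {"eta": 10.0}, {"eta": 1000.0}]
best, errors = select_by_cv(X, y, fit_ridge, grid, K=10)
print(best, errors)
```

The same fold assignment is reused for every grid point (fixed `seed`), so the CV errors of different parameter values are compared on identical splits.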

Elman Neural Network.
The Elman neural network (ENN) was first proposed by Elman in 1990 to solve speech recognition problems. It is a typical globally feed-forward, locally recurrent network. Its main structure consists of an input layer, a hidden layer, and an output layer, which is also the structure of the three-layer feed-forward neural network [42] and the backpropagation network [43].
The weights between different layers are trained based on a learning rule. The feedback connection has sets of neurons that record the output, and its weights are fixed. There is also a context layer in the ENN, which stores the output of the hidden layer at the previous time point. Compared with the multilayer perceptron, the ENN has a short-term memory and performs sequence prediction tasks, which allows it to adapt to time-varying characteristics. The scheme of the ENN can be described in the following way:

$$h_t = T_h(W_h x_t + U_h h_{t-1} + b_h), \qquad y_t = T_y(W_y h_t + b_y),$$

where $x_t$ is the input vector, $h_t$ is the hidden layer vector, $y_t$ is the output vector, W, U, and b are the weights and biases of the ENN, and $T_h$ and $T_y$ are activation functions.
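To make the role of the context layer concrete, the following Python sketch runs a forward pass of an Elman-type network over a short sequence. It assumes the usual Elman recursion, in which the hidden state depends on the current input and the previous hidden state; the function name `elman_forward`, the weight names, and the toy weights are hypothetical.

```python
import numpy as np

def elman_forward(x_seq, W_h, U_h, b_h, W_y, b_y, T_h=np.tanh, T_y=np.tanh):
    """Forward pass of an Elman-type network over a sequence of input vectors."""
    h = np.zeros(U_h.shape[0])   # context layer: previous hidden state, initially zero
    outputs = []
    for x_t in np.asarray(x_seq, dtype=float):
        h = T_h(W_h @ x_t + U_h @ h + b_h)   # hidden state uses the stored context
        outputs.append(T_y(W_y @ h + b_y))   # output layer
    return np.array(outputs)

# Toy network (assumed): one input, one hidden node, one output, identity output map.
W_h = np.array([[1.0]])
U_h = np.array([[1.0]])
b_h = np.array([0.0])
W_y = np.array([[1.0]])
b_y = np.array([0.0])
out = elman_forward([[1.0], [0.0]], W_h, U_h, b_h, W_y, b_y, T_y=lambda z: z)
print(out)
```

In the toy run the second input is zero, yet the second output is nonzero because the context layer carries the first hidden state forward; this is the short-term memory that distinguishes the ENN from a multilayer perceptron.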
The weights in the network are trained by gradient-based backpropagation through time (BPTT). To reduce the model complexity of the neural network, $L_2$ regularization is often applied, and the following optimization problem is considered:

$$\min_{w} \; l(y, \hat{y}; X, w) + \eta_o \|w\|_2^2,$$

where $l(y, \hat{y}; X, w) = \frac{1}{n} \|y - \hat{y}(X, w)\|_2^2$ denotes the mean square error of the forecasting model, w denotes the network weights to be estimated, $\hat{y}$ represents the estimated forecasting value, and $\eta_o$ is the tuning parameter.
Notice that the complexity of the neural network depends on the number of input nodes and hidden nodes. If a variable selection method is applied appropriately, the number of inputs is reduced so that a simple neural network can be constructed.

Square Root Elastic Net Elman Neural Network Model.
This paper combines the advantages of SREN and ENN and proposes a novel forecasting model called SRENENN, which is designed in the following five main steps:
Step 1: Split the original global solar radiation dataset into a training dataset and a test dataset (cf. Section 3 for more details).
Step 2: Apply the CV procedure to the training data to select the optimal regularization parameters.
Step 3: Use SREN with the selected regularization parameters to select the important variables.
Step 4: Establish the Elman neural network with the variables selected by SREN.
Step 5: Evaluate the forecasting performance using the test dataset.
The inputs of the algorithm are X (centered and scaled), y (centered), the maximum number of iterations M, the tuning parameters $\lambda$ and $\eta$, and the error tolerance tol. The algorithm loops over v = 1 to m, and the momentum weight is updated as $\omega_{j+1} = \left(1 + \sqrt{1 + 4\omega_j^2}\right)/2$.
Algorithm 1 shows the SRENENN algorithm with $\omega_{-1} = 0$, $\omega_0 = 1$, and $\beta_0 = \beta_{-1}$ defined. When $\tau \ge \|X\|_2/\sqrt{2}$, the SRENENN algorithm converges. However, to reduce the computation time, there is no need to run the algorithm until convergence. The stopping criterion of the SRENENN algorithm is determined by trial and error: the convergence tolerance tol is set to 1e-4, and the maximum number of iterations M is set to 100. The SRENENN algorithm uses the accelerated gradient method (AGM) [44] to reduce the number of iterations so that convergence can be achieved in less computation time. AGM has three advantages: (i) it does not involve any matrix inversion; (ii) the selection of the unknown parameters and the computation of the gradient can be carried out in parallel; and (iii) it makes use of momentum to increase the convergence speed. The convergence of the SRENENN algorithm is guaranteed theoretically by Theorem 1, whose proof is given in the Appendix.
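The flavor of such an accelerated, thresholding-based iteration can be sketched in Python as below. This is not the authors' Plan A or Plan B, whose exact iterations are not reproduced in this copy; it is a generic accelerated proximal gradient scheme for the SREN objective under stated assumptions: the surrogate curvature is taken as $\|X\|_2^2 / \|y - X\beta_j\|_2$, the momentum weights follow the update with $\omega_{-1} = 0$ and $\omega_0 = 1$, and a simple safeguard falls back to a plain step when the extrapolated step increases the objective. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sren_objective(X, y, beta, lam, eta):
    return (np.linalg.norm(y - X @ beta) + lam * np.sum(np.abs(beta))
            + 0.5 * eta * np.sum(beta ** 2))

def prox_step(X, y, point, lam, eta, spec_sq):
    """One thresholded gradient step on a quadratic surrogate built at `point`."""
    r = y - X @ point
    r_norm = max(np.linalg.norm(r), 1e-12)    # guard against a zero residual
    L = spec_sq / r_norm                      # assumed surrogate curvature
    z = point + X.T @ r / (r_norm * L)        # gradient step on ||y - X beta||_2
    return soft_threshold(z, lam / L) / (1.0 + eta / L)   # EN-type thresholding

def sren_agm(X, y, lam, eta, max_iter=100, tol=1e-4):
    spec_sq = np.linalg.norm(X, 2) ** 2       # squared spectral norm of X
    beta_prev = beta = np.zeros(X.shape[1])
    omega_prev, omega = 0.0, 1.0              # omega_{-1} = 0, omega_0 = 1
    for _ in range(max_iter):
        # Momentum: extrapolate from the last two iterates.
        point = beta + (omega_prev - 1.0) / omega * (beta - beta_prev)
        new = prox_step(X, y, point, lam, eta, spec_sq)
        if sren_objective(X, y, new, lam, eta) > sren_objective(X, y, beta, lam, eta):
            new = prox_step(X, y, beta, lam, eta, spec_sq)  # safeguard: plain step
            omega_prev, omega = 0.0, 1.0                    # restart the momentum
        converged = np.linalg.norm(new - beta) <= tol
        beta_prev, beta = beta, new
        omega_prev, omega = omega, (1.0 + np.sqrt(1.0 + 4.0 * omega ** 2)) / 2.0
        if converged:
            break
    return beta

# Synthetic data (assumed): two relevant variables out of five.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)
beta_hat = sren_agm(X, y, lam=1.0, eta=1e-3)
print(np.round(beta_hat, 3))
```

No matrix inverse is computed at any point; each iteration needs only matrix-vector products and a componentwise thresholding, which is the practical appeal of this family of methods.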
Theorem 1. Let $\tau$ be the step size of the SRENENN algorithm with $\tau \ge \|X\|_2/\sqrt{2}$, and assume that the regularity condition $\inf_{\xi \in A} \|X\xi - y\|_2 > 0$ holds, where A collects all the linear combinations of $\beta_j$ and $\beta_{j+1}$. Then the following inequality holds for some $c > 0$.

Case Studies
For the real data application, six sites in the Xinjiang area of China are considered to demonstrate the advantages of the proposed SRENENN model via comparisons with traditional methods. The forecasting performance of the different models is tested over four seasons (see Table 1), and the forecasting samples in each season account for approximately 24% of the seasonal samples, which is a reasonable proportion.

Evaluation Criterion.
To evaluate the forecasting performance of the proposed method and the competing approaches, several criteria are applied, including the mean absolute percentage error (MAPE), mean absolute error (MAE), root mean square error (RMSE), and Theil inequality coefficient (TIC) [45]. Let $y_i$ be the true value, $\hat{y}_i$ the estimated value, and N the sample size of the test data. The best forecasting model provides the lowest MAPE, MAE, RMSE, and TIC.

For the Wilcoxon signed-rank test, let N be the sample size, that is, the number of pairs. The prediction sample size of each method was 216 (6 days × 9 hours × 4 seasons = 216), and thus N = 216. For i = 1, 2, …, N, let $y_{1,i}$ and $y_{2,i}$ be the forecasting values of two different approaches and consider the following hypotheses:
Null hypothesis: $H_0: \mu_1 = \mu_2$.
Alternative hypothesis: $H_1: \mu_1 \neq \mu_2$,
where $\mu_1$ and $\mu_2$ are the medians of the sequences $\{y_{1,i}\}$ and $\{y_{2,i}\}$. The Wilcoxon signed-rank test proceeds as follows [46][47][48]:
Step 1: For i = 1, 2, …, N, calculate $|y_{2,i} - y_{1,i}|$ and $\text{sgn}(y_{2,i} - y_{1,i})$, where $\text{sgn}(\cdot)$ is the sign function.
Step 2: Exclude the pairs with $|y_{2,i} - y_{1,i}| = 0$.
Step 3: Let $N_r$ be the reduced sample size. Order the remaining $N_r$ pairs from the smallest absolute difference to the largest absolute difference $|y_{2,i} - y_{1,i}|$. Rank the pairs, starting with the smallest as rank 1.
Step 4: Calculate the sum $W^{+}$ of the positive ranks and the sum $W^{-}$ of the negative ranks.
Step 5: As $N_r$ increases, the sampling distribution of W converges to a normal distribution. Thus, for larger samples, the Z statistic can be calculated as $Z = \left(W - N_r(N_r + 1)/4\right) / \sqrt{N_r(N_r + 1)(2N_r + 1)/6}$. If $|Z| > Z_{critical}$, then reject $H_0$. For small samples, W can be calculated as $W = \min(W^{+}, W^{-})$. If $W \ge W_{critical, N_r}$, then reject $H_0$. Alternatively, a p value can be calculated by enumerating all possible combinations of Z or W given $N_r$.
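The evaluation criteria and the signed-rank procedure of this subsection can be sketched in Python as follows. The formulas for MAE, RMSE, MAPE, and TIC are the commonly used definitions, assumed here because the displayed equations are not preserved in this copy. For the Z statistic, the sketch uses the textbook normal approximation for the sum of positive ranks, with mean $N_r(N_r+1)/4$ and variance $N_r(N_r+1)(2N_r+1)/24$; this constant is an assumption of the sketch and may differ from the normalization printed in Step 5. All function names are hypothetical, and inputs are expected to be NumPy arrays.

```python
import numpy as np

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mape(y, y_hat):
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))  # assumes y has no zeros

def tic(y, y_hat):
    return rmse(y, y_hat) / float(np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(y_hat ** 2)))

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks

def wilcoxon_signed_rank(y1, y2):
    """Return W+, W-, and the normal-approximation Z for the sum of positive ranks."""
    d = np.asarray(y2, dtype=float) - np.asarray(y1, dtype=float)
    d = d[d != 0]                               # Step 2: drop zero differences
    n_r = d.size
    ranks = average_ranks(np.abs(d))            # Step 3: rank absolute differences
    w_plus = float(ranks[d > 0].sum())          # Step 4
    w_minus = float(ranks[d < 0].sum())
    mean = n_r * (n_r + 1) / 4.0
    sd = np.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 24.0)
    return w_plus, w_minus, (w_plus - mean) / sd

y = np.array([2.0, 4.0])
y_hat = np.array([1.0, 5.0])
print(mae(y, y_hat), rmse(y, y_hat), mape(y, y_hat), tic(y, y_hat))
print(wilcoxon_signed_rank(np.zeros(4), np.array([1.0, 2.0, 3.0, 4.0])))
```

In practice a library routine with exact small-sample p values would normally be preferred; the hand-written version is shown only to mirror Steps 1-5.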

Friedman Test.
The Friedman test is a nonparametric statistical test that can be applied to evaluate the performance of forecasting methods based on different criteria on multiple datasets [49]. Its null hypothesis $H_0$ states that the compared algorithms perform equally, that is, that their average ranks are equal, where $R_j = \frac{1}{B} \sum_i r_i^j$ is the average rank of the jth algorithm and $r_i^j$ represents the rank of the jth of k algorithms on the ith of B datasets. The test is based on the Friedman statistic [50]. If the null hypothesis $H_0$ is rejected, which means that there exist significant differences between the compared algorithms, a post hoc test is carried out based on the critical difference (CD).
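A minimal Python sketch of the rank-based computation is given below. It assumes the standard formulation used in the classifier-comparison literature: the Friedman chi-square statistic, its F-distributed correction with $(k - 1)$ and $(k - 1)(B - 1)$ degrees of freedom, and a critical difference of the form $q_\alpha \sqrt{k(k+1)/(6B)}$. The function names are hypothetical, ties within a dataset are assumed absent, and the critical value $q_\alpha$ must be supplied from a table.

```python
import numpy as np

def friedman_statistics(errors):
    """errors: B x k array (datasets x algorithms); a lower error gets a lower rank."""
    errors = np.asarray(errors, dtype=float)
    B, k = errors.shape
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1.0   # ranks 1..k per dataset
    R = ranks.mean(axis=0)                                  # average rank R_j
    chi2 = 12.0 * B / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    F = (B - 1) * chi2 / (B * (k - 1) - chi2)               # F-distributed version
    return R, chi2, F

def f_degrees_of_freedom(k, B):
    return k - 1, (k - 1) * (B - 1)

def critical_difference(q_alpha, k, B):
    """Post hoc critical difference; q_alpha comes from a table (assumed given)."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * B))

# Toy errors (assumed): 4 datasets, 3 algorithms.
errors = [[1, 2, 3], [1, 2, 3], [1, 3, 2], [2, 1, 3]]
R, chi2, F = friedman_statistics(errors)
print(R, chi2, F)
print(f_degrees_of_freedom(8, 6))
```

With k = 8 compared methods and B = 6 sites, the degrees of freedom evaluate to (7, 35), matching the F distribution quoted in the results.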

Statistical Analysis of Selected Variables.
To test whether the selected variables are significant, the following F test statistic is considered:

$$F = \frac{(\text{RSS}_0 - \text{RSS}_1)/(p_1 - p_0)}{\text{RSS}_1/(n - p_1)},$$

where $\text{RSS}_1$ is the residual sum of squares for the least squares fit of the full model with $p_1$ variables, and $\text{RSS}_0$ is the same for the smaller, reduced model with $p_0$ variables. Under the Gaussian assumption and the null hypothesis that the smaller model is correct, the F statistic has an $F_{p_1 - p_0,\, n - p_1}$ distribution. If the F value is larger than the critical value $F_{\alpha,\, p_1 - p_0,\, n - p_1}$, then the selected variables are determined to be significant.
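The nested-model comparison can be sketched in Python as below. The sketch assumes the usual form of the statistic, in which the drop in residual sum of squares per added variable is divided by the residual variance estimate of the larger model; the function names and the toy data are hypothetical, and the critical value is left to a table or a statistics library.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def nested_f_statistic(X_full, y, reduced_cols):
    """F statistic comparing the full model with the model using reduced_cols only."""
    n, p1 = X_full.shape
    p0 = len(reduced_cols)
    rss1 = rss(X_full, y)                    # larger model
    rss0 = rss(X_full[:, reduced_cols], y)   # smaller, nested model
    return ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1))

# Toy data (assumed): two orthogonal columns plus a residual orthogonal to both.
x1 = np.array([1.0, -1.0, 1.0, -1.0])
x2 = np.array([1.0, 1.0, -1.0, -1.0])
X_full = np.column_stack([x1, x2])
y = 2.0 * x1 + 1.0 * x2 + 0.5
print(nested_f_statistic(X_full, y, reduced_cols=[0]))
```

In the toy example the full model leaves a residual sum of squares of 1, dropping the second column raises it to 5, and the statistic is therefore (5 − 1)/1 divided by 1/(4 − 2).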

Results and Discussion
In this paper, SREN is combined with the Elman neural network (ENN) to select the important variables and forecast the global solar radiation. SREN is a penalized variable selection method that uses a convex penalty function and is computationally efficient. Compared with subset variable selection, which considers all the possible combinations of the variables, SREN selects the important variables directly.
Many approaches have been considered for global solar radiation, such as SVM, ENN, LASSOENN, SCADENN, SRLENN, and PCAENN [50]. Comparisons between these methods and SRENENN are presented in this part.
Table 2 shows the parameters applied in establishing the compared forecasting models. The regularization parameters $\lambda$ and $\eta$ are selected as 0.0625 and 5e-5 in the SRENENN methods using 10-fold cross-validation. In LASSOENN and SCADENN, the regularization parameters are chosen as 4 and 1. N represents the maximum number of iterations for establishing the neural network and is set to 2000. The activation function from the input layer to the hidden layer (Func1) is set to the Tansig transfer function based on trial and error. Similarly, Func2 denotes the activation function from the hidden layer to the output layer, and it is also set to the Tansig transfer function. The number of hidden neurons $N_h$, which determines the model complexity, is important in constructing an ENN. The best value for $N_h$ is selected from the grid {5, 10, 15, 20}. Backpropagation through time (BPTT) is applied to construct the ENN, with the weights initialized using random values from the uniform distribution $U(0, 1)$. Based on trial and error, the momentum and the learning rate of gradient descent with momentum and adaptive learning rate are set to 0.9 and 0.01. SVM is implemented using the R package "e1071" with the RBF kernel function, whose two unknown parameters $\gamma$ and C are selected from the two grids $\{2^{-5}, 2^{-4}, \ldots, 2^{-1}\}$ and $\{2^{2}, 2^{3}, 2^{4}\}$ using 10-fold CV. The parameters of all the models are selected by proper tuning.
The results presented in Table 3 reveal that SRENENN-B achieves the best average forecasting accuracy at all the sites except Site 3, where SRENENN-A has the best performance. Significant differences are observed between SRENENN-B and the SVM and ENN methods, which do not involve any dimension reduction. For instance, the MAE obtained by SRENENN-B is much lower than that of ENN at Site 2: the error is reduced by about 37.23% while using fewer variables. Compared with SVM, which also performs well, SRENENN-B improves the forecasting accuracy by 17.89%. At Site 4, SRENENN-B reduces the RMSE of ENN and SVM by 47.87% and 39.36%, respectively. Compared with PCAENN, LASSOENN, and SCADENN, which perform better than ENN, SRENENN-B is still the winner in terms of MAE, RMSE, MAPE, and TIC. PCAENN, LASSOENN, and SCADENN provide similar performance, but LASSOENN delivers better results than PCAENN, SCADENN, and SRLENN at almost all the sites. In terms of MAPE, LASSOENN provides better results at all sites except Site 1 and Site 2. Further, the performance of SVM is better than that of SRLENN in terms of MAE at Sites 2-6, and SRENENN-A outperforms SRLENN except at Site 2. The computation time of the different forecasting methods is shown in the last column of Table 3. SRENENN-B takes less computation time than the other approaches. Both SVM and ENN, which take all the variables as inputs, use more computation time than the penalized ENN methods and PCAENN. The computation time of the other forecasting methods is comparable. The corresponding plot is shown in Figure 4. Table 4 depicts the scores of the compared models. The best model gives the lowest total score. SRENENN-B provides the lowest score among all the compared methods (see the last column), followed by LASSOENN, SRENENN-A, SRLENN, SVM, PCAENN, SCADENN, and ENN. Tables 5 and 6 show the performance of the compared forecasting approaches for the four seasons. SRENENN-B gives the highest forecasting accuracy. SVM provides better results than the remaining methods in spring, autumn, and winter, while ENN gives the worst results in all four seasons. The results are quite similar to those observed in Table 3.
The results of the Wilcoxon signed-rank test between SRENENN-B and the other forecasting approaches are summarized in Tables 7 and 8, which show the Z statistic values and p values. In this study, the significance level is set to 0.05, so the critical Z value is 1.96. All of the Z statistic values in the tables are larger than 1.96, and the p values are much smaller than 0.05. Thus, the null hypothesis is rejected, and we conclude that the proposed SRENENN-B model is significantly different from the other models. Since SRENENN-B has provided the smallest errors at all sites, it is concluded that SRENENN-B is superior to the other models in terms of forecasting accuracy. The results of the Friedman test show that the test statistic follows the $F(7, 35)$ distribution, whose critical value is 0.39. Thus, the null hypothesis that the ranks of the compared methods are equal to each other is rejected. This means that a post hoc test based on the Bonferroni-Dunn test is needed to make further comparisons. The CD value is calculated as 3.59 based on [49]. Therefore, SRENENN-B performs significantly better than ENN, PCAENN, SCADENN, LASSOENN, and SVM for RMSE and TIC, because the differences in average rank between SRENENN-B and these competitors are larger than 3.59. On the other hand, SRENENN-B does not show a great improvement over SRENENN-A and SRLENN in terms of the evaluation criteria.
Figure 5 summarizes the estimated values against the true values. The estimated values of SRENENN-B are closer to the true values than those of the other compared approaches. ENN provides the worst results because all the variables are employed as inputs, so some redundant features must be contained in the ENN. SRENENN-B gives good forecasting results at all the sites, which demonstrates that the selected variables (temperature, pressure, solar zenith angle, wind direction, and wind speed) are important inputs of the ENN.

Conclusions
Global solar radiation is a vital and active research topic, and finding a way to predict it accurately is crucial. A number of methods have been derived to achieve this goal. Our work investigated and studied the SRENENN model for this purpose.

Appendix
Combining with the iterates defined by (9) or (10) and using a Taylor expansion, we obtain an expression that holds for some $\xi_j = \nu_j \beta_j + (1 - \nu_j)\beta_{j+1}$ with $\nu_j \in [0, 1]$, which is then reformulated via (A.7). Under the regularity condition $\inf_{\xi \in A} \|X\xi - y\|_2 > \varepsilon$ and for $\tau$ large enough, $F(\beta_j)$ is monotone decreasing. Define $M := F(\beta_0)$ and let $\|X\|_2^2 < 2\varepsilon/M$; then we have $C = 2/\|X\beta_j - y\|_2 - \|X\|_2^2/\|X\xi_j - y\|_2$. Thus, using the optimality conditions, $\beta_j$ has a unique limit point $\beta^{*}$. Furthermore, $\beta^{*}$ satisfies the KKT condition, which means that it is a global minimum. This completes our proof.

Figure 1: A schematic overview of the whole paper.

Figure 3: Location description about global solar radiation in the Xinjiang area.

Algorithm 1: The SRENENN algorithm.
Obtain the solution path $B = [b_{uv}]$ and the corresponding sparsity pattern $G = [g_{uv}]$ using the CV training data T.
Calculate CV errors:
  Calculate the CV errors using F, B, and G.
  Find the optimal tuning parameters $\lambda_{opt}$ and $\eta_{opt}$ with respect to the smallest CV error.
Step 4. Establish the Elman neural network:
  Determine the optimal model parameters using the training data $D_{trn}$, with the selected variables considered as inputs.
Step 5. Evaluate the forecasting performance:
  Calculate the test error using the test data $D_{tst}$.
End for

The samples of this dataset are collected from the global solar radiation between 11:00 and 19:00 in 2014 because solar resources are very abundant during this time interval. The main purpose of this paper is to choose the important variables from seven meteorological variables to perform the forecasting task. The data are split into training data and test data as follows: 19 days in each season are randomly selected as the training data to establish the forecasting model, and the test data consist of 6 days that are also selected randomly from the remaining days in every season. Thus, the size of the training data is 684 (19 days × 9 hours × 4 seasons = 684), and the size of the test data is 216 (6 days × 9 hours × 4 seasons = 216). Furthermore, experiments on each season are also implemented based on the training samples (19 days × 9 hours = 171) and forecasting samples (6 days × 9 hours = 54) presented in Table 1.

Figure 4: The computation time (in seconds) of eight models.

Figure 6 shows boxplots of the daily RMSEs of the compared forecasting models in order to reveal the benefits of SRENENN. In Figures 6(a)-6(f), panels (A), (C), (E), (G), (I), and (K) show the RMSEs of each model including all the outliers. The median is used for the comparisons because it is less sensitive to outliers. The RMSEs of ENN are far larger than those of the other forecasting approaches at all the sites. Panels (B), (D), (F), (H), (J), and (L) of Figures 6(a)-6(f) show that SRENENN-B gives the lowest RMSE values. Therefore, based on boxplots (A)-(L) in Figures 6(a)-6(f), SRENENN-B delivers better forecasting performance.

Figure 5: The scatter plots of actual and forecast global solar radiation by seven models; the bold black dashed line represents the perfect fit.

F β j+1 + 1 2 β j+1 − β j T I Xβ j − y 2 − X T X Xξ j − y 2 β 1 2j − y 2 β= F β j − 1 2j − y 2 β
Abbreviations
ENN: Elman neural network
EN: Elastic net
MAE: Mean absolute error
MAPE: Mean absolute percentage error
RMSE: Root mean square error
SRL: Square root LASSO
SREN: Square root elastic net
TIC: Theil inequality coefficient
CD: Critical difference

English Symbols
h: Number of nodes in the hidden layer
n: Sample size
N: Maximum number of iterations in ENN
p: Number of variables

Table 2: Parameter values of seven forecasting models at six sites.

Table 3: Mean forecasting errors and computation time (in seconds) of forecasting models at six sites.

Table 4: The total scores of compared methods for evaluation criteria.

Table 5: Mean forecasting errors of forecasting models for each season at Site 1, Site 2, and Site 3.

Table 6: Mean forecasting errors of forecasting models for each season at Site 4, Site 5, and Site 6.

Table 9 reports the MAE, RMSE, MAPE, and TIC values of SRENENN and the other forecasting approaches.

Table 7: The results of Wilcoxon signed-rank tests between SRENENN-B and other competitors (p values are in parentheses).

Table 8: The results of Wilcoxon signed-rank tests between SRENENN-B and other competitors for each season at Sites 1-6 (p values are in parentheses).
Table 10 reports the statistical analysis of the selected variables. All the selected variables are significant because their F values are much greater than the critical value $F_{crit}$.

Table 9: The results based on Friedman test for compared methods.
Furthermore, the coefficients of determination $R^2$ are approximately one, which indicates that the established model is trustworthy.

Table 10: The results of statistical analysis of selected variables by SREN at Sites 1-6.