A Comparison of Hour-Ahead Solar Irradiance Forecasting Models Based on LSTM Network

The intermittent and fluctuating character of solar irradiance places severe limitations on most of its applications. Precise forecasting of solar irradiance is the critical factor in predicting the output power of a photovoltaic power generation system. In the present study, Model I-A and Model I-B based on the traditional long short-term memory (LSTM) network are discussed, and the effects of different parameters are investigated; meanwhile, Model II-AC, Model II-AD, Model II-BC, and Model II-BD based on a novel LSTM-MLP structure with two-branch input are proposed for hour-ahead solar irradiance prediction. Different lagging time parameters and different main input and auxiliary input parameters are discussed and analyzed. The proposed method is verified on real data covering 5 years. The experimental results demonstrate that Model II-BD shows the best performance because it considers the weather information of the next moment: the root mean square error (RMSE) is 62.1618 W/m², the normalized root mean square error (nRMSE) is 32.2702%, and the forecast skill (FS) is 0.4477. In terms of RMSE, the proposed algorithm is 19.19% more accurate than the backpropagation neural network (BPNN).


Introduction
Along with the rapid increase of solar power generation, more and more solar power is being connected to the grid, with substantial economic impact. According to the statistics of the International Renewable Energy Agency (IRENA), the total installed PV capacity in China reached 205.493 GW at the end of 2019 [1]. However, power generation from photovoltaic systems is highly variable because it depends on meteorological conditions. The fluctuation of solar power poses a severe challenge to the security of the power grid. Therefore, an effective method of solar irradiance forecasting can mitigate the effects of intermittency, as it gives information about future trends and allows users to make decisions beforehand. Solar forecasting is a timely topic, and several short-term solar irradiance forecasting approaches have been presented recently. Broadly, forecasting methods can be divided into five categories [2]: (1) time series; (2) regression; (3) numerical weather prediction; (4) image-based forecasting; and (5) machine learning. A time series is a sequence of observations taken sequentially in time. Time series forecasting models are divided into stationary and nonstationary models. Autoregressive (AR), moving average (MA), and autoregressive moving average (ARMA) models are commonly used to forecast stationary trends; integrated moving average (IMA), autoregressive integrated moving average (ARIMA), seasonal autoregressive integrated moving average (SARIMA), and other models are used to forecast nonstationary trends [3][4][5][6]. Regression is a statistical process for estimating the relationships among variables; it is a handy tool for describing the relationship between solar irradiance and exogenous variables [7,8]. Numerical weather prediction (NWP) models directly simulate the irradiance fluxes at multiple levels in the atmosphere, separately considering the shortwave and longwave parts of the solar spectrum [9,10].
Image-based forecasting methods use satellite cloud images and all-sky images as main or auxiliary data sources to forecast irradiance. This can effectively increase forecasting skill, as it provides warning of approaching clouds at a lead time of several minutes to hours [11][12][13]. The machine learning method, as a branch of artificial intelligence, can learn from datasets and construct a nonlinear mapping between input and output data. Nowadays, machine learning (ML) is perhaps the most popular approach in solar forecasting and load forecasting [2]. Although artificial neural networks (ANNs) and support vector machines (SVMs) are still the basis of machine learning methods in solar irradiance prediction, many other approaches have been used recently, such as k-nearest neighbors (kNN), random forest (RF), gradient boosted regression (GBR), hidden Markov models (HMMs), fuzzy logic (FL), wavelet networks (WNN), and long short-term memory networks (LSTM) [14][15][16][17][18][19][20][21][22]. Meanwhile, some hybrid algorithms are used to improve the prediction accuracy. For example, metaheuristic algorithms, such as the cuckoo search (CS) algorithm, the krill herd (KH) algorithm, and the chaotic immune algorithm, are combined with a support vector regression (SVR) model to predict electric load [19][23][24][25][26]. Some signal preprocessing methods, such as variational mode decomposition (VMD) and empirical mode decomposition (EMD), are also used in hybrid models [24,25]. The methods listed above are not exhaustive; many other applications of machine learning algorithms in solar radiation prediction can be found in the recent literature [27].
As a novel machine learning tool, LSTM has been applied successfully in solar irradiance forecasting [28][29][30]. Owing to its memory cell structure, it can preserve the important features that should be remembered during the learning process and thereby improve performance. Therefore, using LSTM to predict irradiance can not only capture the correlation between consecutive hours but also extract long-term (e.g., seasonal) behavior trends [30]. Yu et al. [29] proposed an LSTM-based approach for short-term global horizontal irradiance (GHI) prediction under complicated weather conditions; the results indicated that LSTM outperforms ARIMA, SVR, and NN models, especially on cloudy days and mixed days. Qing and Niu [30] proposed a novel hourly day-ahead solar irradiance prediction method using weather forecasts based on LSTM networks. The proposed algorithm uses the hourly weather forecasts of the same day and data information at the predicted time as the input variables, and the hourly irradiance values of the same anticipated day are taken as the output variable. Experimental results show that the proposed learning algorithm is more accurate than persistence, the linear least squares regression method (LR), and BPNN because it takes time dependence into account. Srivastava and Lessmann [28] studied the ability of LSTM to predict solar irradiance, demonstrated the robustness of LSTM, and showed that an optimally configured LSTM model outperforms GBR and FFNN for day-ahead GHI forecasting. Abdel-Nasser and Mahmoud [31] proposed a method based on LSTM to forecast the output power of PV systems accurately. Liu et al. [32][33][34] proposed a new hybrid approach for high-accuracy wind speed prediction based on decomposition algorithms (such as the secondary decomposition algorithm (SDA), empirical wavelet transform (EWT), and VMD) and LSTM networks.
However, the LSTM methods mentioned above do not study in depth the effects of different parameters and structures on the experimental results, although these factors affect the prediction accuracy. In this paper, two different models based on the traditional LSTM network are applied, and the effects of various parameters are investigated; meanwhile, four models based on a novel LSTM-MLP structure with two-branch input are proposed. For the new LSTM-MLP model, we use historical irradiance (or historical irradiance and meteorological parameters) as the main input and the meteorological parameters at the current time or the next time as the auxiliary input to predict the irradiance at the next time through the multilayer LSTM-MLP network. Experimental results show that the proposed model can achieve better prediction results.
The main innovations of this study are as follows: (1) An LSTM-MLP structure with two branches, including a main input and an auxiliary input, is proposed, which can provide a reference for similar models. (2) It is confirmed that the lagging time plays an important role when the LSTM model has few input variables; with more input information, however, a longer lagging window does not necessarily give higher accuracy. (3) The meteorological parameters at the next moment, which can be obtained from the weather forecast, play a vital role in the prediction accuracy. The organization of this paper is as follows: the methodology is described in detail in Section 2. Section 3 provides information about the dataset. Experimental results and discussion are presented in Section 4. Finally, conclusions are given in Section 5.

Long Short-Term Memory Network.
In the learning phase, a traditional neural network cannot use the information learned in the previous time step to model the data of the current step. This is the main shortcoming of conventional neural networks. RNNs attempt to solve this problem by using loops that pass information from one step of the network to the next, ensuring the persistence of the information. In other words, RNNs connect previous information to the current task; using previous sequence samples may help to understand the current sample. The LSTM network, which has time-varying inputs and targets, is a special RNN and was initially introduced by Hochreiter and Schmidhuber [35]. Owing to its excellent ability to handle both long-term and short-term dependencies, the LSTM network often performs well in processing time series. A general architecture is composed of a cell (the memory part of the LSTM unit) and three "regulators" (usually called gates) of the flow of information inside the LSTM unit: an input gate, an output gate, and a forget gate. The memory cell is an essential component of the LSTM network and can store information over an arbitrary time. The input gate, forget gate, and output gate control the signal by adding information to or deleting information from the cell state.
A schematic of the LSTM block can be seen in Figure 1. Every time a new input comes, its information will be accumulated to the cell if the input gate is activated. The prior cell status could be forgotten in this process if the forget gate is activated. Whether the latest cell output will be propagated to the final state is further controlled by the output gate.
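The gate behavior described above can be written compactly. The block below is a sketch of one common LSTM formulation and is an assumption, not necessarily the exact set of equations used in this study: $x_t$ is the input at time $t$, $c_t$ the cell state, $m_t$ the output of the memory block, $i_t$, $f_t$, and $o_t$ the input, forget, and output gates, and $\odot$ the element-wise product. The weight subscripts ($w_{ix}$, $w_{im}$, and so on) and the ranges chosen for the centered sigmoids $g$ and $h$ are illustrative conventions, and peephole connections are omitted.

```latex
\begin{align}
i_t &= \sigma\left(w_{ix} x_t + w_{im} m_{t-1} + b_i\right), \\
f_t &= \sigma\left(w_{fx} x_t + w_{fm} m_{t-1} + b_f\right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot g\left(w_{cx} x_t + w_{cm} m_{t-1} + b_c\right), \\
o_t &= \sigma\left(w_{ox} x_t + w_{om} m_{t-1} + b_o\right), \\
m_t &= o_t \odot h\left(c_t\right), \\
\sigma(x) &= \frac{1}{1 + e^{-x}}, \qquad
g(x) = \frac{4}{1 + e^{-x}} - 2, \qquad
h(x) = \frac{2}{1 + e^{-x}} - 1 .
\end{align}
```

In this form, a forget gate close to 0 discards the prior cell state, an input gate close to 1 accumulates the new input into the cell, and the output gate scales how much of the cell state reaches $m_t$.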
The model input is denoted as $x = (x_1, x_2, \ldots, x_T)$, and the output sequence is denoted as $y = (y_1, y_2, \ldots, y_T)$, where $T$ is the prediction period. In the context of solar irradiance forecasting, $x$ can be considered as historical input data (e.g., irradiance and meteorological parameters), and $y$ is the forecasting data. The predicted irradiance is calculated iteratively by the LSTM recursion given in [36], in which $i_t$ denotes the input gate, $f_t$ the forget gate, $c_t$ the activation vector of each cell, $o_t$ the output gate, and $m_t$ the activation vector of each memory block; $w$ denotes the weight matrices, $b$ the bias vectors, and "⊙" the element-wise product of two vectors. $\sigma(\cdot)$ denotes the standard logistic sigmoid function, $\sigma(x) = 1/(1 + e^{-x})$, and $g(\cdot)$ and $h(\cdot)$ are centered logistic sigmoid functions.

2.2. Model Development.
As previously mentioned, the primary objective of this study is to examine the feasibility of the LSTM network for short-term solar irradiance forecasting and to find the optimal LSTM structure for the forecast. In this section, the standard LSTM solar irradiance forecasting pipeline is introduced first. Then, a classical LSTM model with two input structures and a novel model with four different input structures are developed to examine the performance of the LSTM network.

Figure 2 presents a standard pipeline for solar irradiance forecasting with LSTM. The data are divided into training, validation, and test sets. Feed-forward and feed-backward computations are then used to process the data and train the network. The error is calculated during model development; it describes the training accuracy and drives the feed-backward step. At the final stage, the best-performing model is selected for prediction.
The structure of the conventional LSTM model (referred to as Model I) for solar irradiance forecasting is shown in Figure 3. The network contains one input layer, one or two LSTM layers, and one output layer. Two input structures are considered: input A is the historical irradiance, and input B is the historical irradiance together with meteorological parameters. The resulting models are denoted I-A and I-B. For input A (I-A), the historical irradiance at times $t-1, t-2, \ldots, t-m$ is fed to LSTM layer 1; for input B (I-B), the historical irradiance and meteorological parameters at times $t-1, t-2, \ldots, t-m$ are fed to LSTM layer 1, where $m$ is the length of the lagging window.
Meanwhile, the novel LSTM-MLP structure (named Model II) is shown in Figure 4. A two-branch structure is designed, including one main input, one auxiliary input, one main output, and one auxiliary output. The historical irradiance (or irradiance and meteorological parameters) serves as the main input and is fed to the LSTM layers. At the output of the LSTM layer, one part is emitted as the auxiliary output, and the other part is first combined with the meteorological parameters (auxiliary input) at the current or next time and then sent to an MLP structure. After several hidden layers of the MLP, the final output is the main output, which is the predicted irradiance at the next time. In simplified form, the operation can be expressed as

$$h_{\mathrm{LSTM}} = F_{\mathrm{LSTM}}\left(x_{\mathrm{main\_input}}\right),$$
$$y_{\mathrm{aux}} = \mathrm{dense}\left(h_{\mathrm{LSTM}}\right),$$
$$y_{\mathrm{main}} = F_{\mathrm{MLP}}\left(h_{\mathrm{LSTM}} \oplus x_{\mathrm{aux\_input}}\right),$$

where $x_{\mathrm{main\_input}}$ represents the main input, which is the time series of historical irradiance (alone or together with historical meteorological parameters), $F_{\mathrm{LSTM}}$ denotes the LSTM layer, $h_{\mathrm{LSTM}}$ represents the output of the LSTM layer, $x_{\mathrm{aux\_input}}$ denotes the auxiliary input described in Figure 4, $\oplus$ is the concatenation operator, dense is the fully connected layer, $F_{\mathrm{MLP}}$ denotes the MLP layers, and $y_{\mathrm{aux}}$ and $y_{\mathrm{main}}$ denote the auxiliary output and main output, respectively.

As can be seen in Figure 4, there are two options each for the main input and the auxiliary input. According to the combinations of main inputs A and B and auxiliary inputs C and D in Figure 4, the models are denoted II-AC, II-AD, II-BC, and II-BD. To find better network parameters, six experiments are designed with the two models mentioned above: (1) Model I-A; (2) Model I-B; (3) Model II-AC; (4) Model II-AD; (5) Model II-BC; and (6) Model II-BD. In each, the influence of different lagging time parameters (from Lagging 1 to Lagging 12) is discussed. Figure 5 shows the time series structure of the input (or main input) of the training samples.
$S(t)$ is the current data, $n$ is the number of training samples, and $m$ is the number of input data points in each group, that is, the lagging time or the length of the lagging window. For example, $S(t-m), \ldots, S(t-2)$, and $S(t-1)$ are used as the training input and $S(t)$ as the training output. The window is then shifted: the input becomes $S(t-m-1)$ to $S(t-2)$, the output is $S(t-1)$, and so on.
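The sliding-window construction described above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration; the function name `make_lagged_samples` and the toy irradiance values are hypothetical and are not taken from the study's code.

```python
def make_lagged_samples(series, m):
    """Split a series into (input window, target) pairs with lagging window m.

    Each input is [S(t-m), ..., S(t-1)] and the matching target is S(t).
    """
    if m < 1:
        raise ValueError("lagging window m must be at least 1")
    inputs, targets = [], []
    for t in range(m, len(series)):
        inputs.append(list(series[t - m:t]))  # the m values preceding S(t)
        targets.append(series[t])             # the value to be predicted
    return inputs, targets


# Toy hourly irradiance values in W/m^2 (made up for illustration).
ghi = [0.0, 50.0, 180.0, 420.0, 610.0, 700.0]
X, y = make_lagged_samples(ghi, m=3)
print(X[0], "->", y[0])
print(len(X), "samples")
```

A series of length L yields L − m samples, so a longer lagging window slightly reduces the number of training samples while increasing the dimension of each input.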

Forecasting Accuracy Evaluation.
To assess the prediction performance of the models, five measures are used in the forecasting experiments: the root mean square error (RMSE), the normalized root mean square error (nRMSE), the mean absolute error (MAE), the mean bias error (MBE), and Pearson's correlation coefficient (R).
These indexes are computed over the test set, where $N$ denotes the number of testing instances, $Y_i'$ denotes the value predicted by the model, $\bar{Y}'$ denotes the mean value of $Y_i'$, $Y_i$ denotes the measured value, and $\bar{Y}$ denotes the mean value of $Y_i$.
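For reference, the five indexes can be written out in their commonly used forms, using the symbols defined above. This is a sketch under assumptions: normalizing nRMSE by the mean measured irradiance $\bar{Y}$ and taking MBE as prediction minus measurement are conventions assumed here, and other normalizations (for example by the maximum or the range) are also in use.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i' - Y_i\right)^2}, \\
\mathrm{nRMSE} &= \frac{\mathrm{RMSE}}{\bar{Y}} \times 100\%, \\
\mathrm{MAE} &= \frac{1}{N}\sum_{i=1}^{N}\left|Y_i' - Y_i\right|, \\
\mathrm{MBE} &= \frac{1}{N}\sum_{i=1}^{N}\left(Y_i' - Y_i\right), \\
R &= \frac{\sum_{i=1}^{N}\left(Y_i' - \bar{Y}'\right)\left(Y_i - \bar{Y}\right)}
          {\sqrt{\sum_{i=1}^{N}\left(Y_i' - \bar{Y}'\right)^2}\,
           \sqrt{\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2}} .
\end{align}
```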
In addition, forecast skill (FS) compares a selected model with a reference model (usually the persistence model), regardless of the prediction horizon and location [37,38]. It is a fair approach to evaluating performance in solar irradiance prediction and is defined as [2] $\mathrm{FS} = 1 - \mathrm{RMSE}_{\mathrm{model}} / \mathrm{RMSE}_{\mathrm{persistence}}$. The persistence model is one of the most basic prediction models and is often used as a baseline against which other prediction models are compared. Its definition varies; this paper adopts the most basic one, which assumes that the predicted value at the next time is the same as the present value [39,40]: $Y'(t+1) = Y(t)$. To further compare the adopted model with a benchmark model, the promoting percentage of RMSE (P) is employed:

$$P = \frac{\mathrm{RMSE}_{\mathrm{benchmark}} - \mathrm{RMSE}_{\mathrm{comparison}}}{\mathrm{RMSE}_{\mathrm{benchmark}}} \times 100\%$$

Mathematical Problems in Engineering
where P is the promoting percentage of RMSE, and $\mathrm{RMSE}_{\mathrm{benchmark}}$ and $\mathrm{RMSE}_{\mathrm{comparison}}$ are the root mean square errors of the benchmark model and the comparison model, respectively.
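The relationship between the persistence baseline, forecast skill, and the promoting percentage can be illustrated with a short Python sketch. The definitions coded here (FS as one minus the RMSE ratio, P as the relative RMSE reduction in percent) are the commonly used forms and are stated as assumptions; the function names and all numbers are hypothetical.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - y) ** 2 for p, y in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_forecast(measured):
    """Persistence baseline: the forecast for step t+1 is the value at step t."""
    return list(measured[:-1])


def forecast_skill(rmse_model, rmse_persistence):
    """FS = 1 - RMSE_model / RMSE_persistence (assumed definition)."""
    return 1.0 - rmse_model / rmse_persistence


def promoting_percentage(rmse_benchmark, rmse_comparison):
    """P = (RMSE_benchmark - RMSE_comparison) / RMSE_benchmark * 100 (assumed)."""
    return (rmse_benchmark - rmse_comparison) / rmse_benchmark * 100.0


# Toy measured irradiance in W/m^2 and a hypothetical model forecast for hours 1..4.
measured = [100.0, 200.0, 400.0, 300.0, 100.0]
model_forecast = [190.0, 380.0, 320.0, 130.0]

targets = measured[1:]                     # the values being forecast
baseline = persistence_forecast(measured)  # previous hour carried forward
rmse_p = rmse(baseline, targets)
rmse_m = rmse(model_forecast, targets)
print(round(rmse_p, 2), round(rmse_m, 2), round(forecast_skill(rmse_m, rmse_p), 3))
```

A positive FS means the model beats persistence, FS = 0 means it is no better, and a negative FS means it is worse; P expresses the same kind of comparison against any benchmark as a percentage.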

Data and Analysis
The data used in this study come from a solar power plant in Denver, Colorado, USA. Average global horizontal irradiance (GHI; in this paper, solar irradiance means GHI) and meteorological data (such as ambient temperature, relative humidity, wind velocity, atmospheric pressure, and precipitation) were collected at one-hour resolution from January 1, 2012, to December 31, 2016, from the NREL Solar Radiation Research Laboratory [41]. The data from 2012 to 2015 are used for training and validation; the data from 2016 are used for testing. The main statistical characteristics of solar irradiance in this dataset are shown in Table 1.
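The chronological split described above (earlier years for training and validation, the final year for testing) can be sketched as follows. The function name `split_by_year` and the synthetic records are hypothetical; how the 2012 to 2015 portion is further divided between training and validation is not stated and is left out of the sketch.

```python
from datetime import datetime, timedelta


def split_by_year(records, test_year):
    """Chronological split: records from test_year go to the test set.

    `records` is a list of (timestamp, value) pairs; earlier years are kept
    for training and validation so that no future data leaks into training.
    """
    train_val = [r for r in records if r[0].year < test_year]
    test = [r for r in records if r[0].year == test_year]
    return train_val, test


# Synthetic hourly records around a year boundary (values are placeholders).
start = datetime(2015, 12, 31, 22)
records = [(start + timedelta(hours=k), float(k)) for k in range(4)]
train_val, test = split_by_year(records, test_year=2016)
print(len(train_val), len(test))
```

Splitting by time instead of at random keeps the test year strictly after the training data, which matches how a forecasting model would be used in practice.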
Pearson's correlation coefficient is a statistic that measures the linear association between two continuous variables. The relationship between irradiance and wind speed, atmospheric pressure, air temperature, and relative air humidity was analyzed to determine which of these variables should be included as inputs to the network. Table 2 shows the Pearson correlation coefficient between the five weather variables and the solar irradiance in the dataset. Only temperature and humidity are highly correlated with irradiance; irradiance is not correlated with wind speed, precipitation, or pressure, so these three meteorological parameters are excluded. Figure 6 shows the average hourly irradiance distribution for different months in 2016. There is a strong relationship between the hour of the day and solar irradiance: the irradiance is low at the beginning of the day, increases to its peak at noon, and then gradually decreases in the afternoon. The peak irradiance also differs from month to month; the highest peak occurs between June and July, and the lowest between December and January. Consequently, time must be used as an input variable.
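The correlation-based screening of input variables can be sketched in Python as follows. This is an illustration only: the helper names, the threshold of 0.5, and the readings are hypothetical, since the study reports the correlations in Table 2 without stating a numerical cutoff.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_inputs(target, candidates, threshold):
    """Keep candidate variables whose |r| with the target reaches threshold."""
    return [name for name, values in candidates.items()
            if abs(pearson_r(values, target)) >= threshold]


# Made-up readings, for illustration only.
irradiance = [0.0, 150.0, 420.0, 650.0, 480.0, 120.0]
candidates = {
    "temperature": [5.0, 8.0, 13.0, 17.0, 15.0, 9.0],
    "wind_speed": [3.0, 1.0, 3.0, 1.0, 3.0, 1.0],
}
print(select_inputs(irradiance, candidates, threshold=0.5))
```

The absolute value is used so that strongly negative correlations, such as the one typically seen between humidity and irradiance, are retained as informative inputs.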
The autocorrelation function (ACF) describes the degree of similarity between a time series and lagged copies of itself. Since irradiance is time-series data, it can be characterized by its ACF. Let $X_t$ be a time series of length $T$, and denote by $X_{t-h}$ the series lagged by $h$ periods. The autocorrelation of $X_t$ at lag $h$ is given by

$$\rho_X(h) = \frac{c_X(h)}{c_X(0)}, \qquad c_X(h) = E\left[\left(X_t - \mu_X\right)\left(X_{t-h} - \mu_X\right)\right],$$

where $c_X(h)$ is the autocovariance of $X_t$ at lag $h$, $c_X(0)$ is the autocovariance of $X_t$ at lag 0, and $\mu_X$ is the expected value of $X_t$. From the ACF plot in Figure 7, we can see that the daily period consists of 24 time steps (where the ACF has its second-largest positive peak). As expected from the diurnal cycle, Figure 7 also shows that the interval between the maximum positive and the maximum negative correlation is 12 hours. Moreover, in the actual model calculations, the performance is very similar for lagging times between 12 and 24. Therefore, in this paper, we choose a 12-hour lagging time.
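A sample version of the autocorrelation at lag $h$ can be computed directly from its definition as the ratio of the lag-$h$ autocovariance to the lag-0 autocovariance. The sketch below is an illustration with a synthetic 24-step periodic signal standing in for hourly irradiance; the function name `acf` and the choice of the biased (divide by n) estimator are assumptions.

```python
import math


def acf(series, h):
    """Sample autocorrelation of `series` at lag h: c(h) / c(0)."""
    n = len(series)
    if not 0 <= h < n:
        raise ValueError("lag h must satisfy 0 <= h < len(series)")
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series) / n
    ch = sum((series[t] - mean) * (series[t - h] - mean) for t in range(h, n)) / n
    return ch / c0


# Synthetic signal with a 24-step period standing in for hourly irradiance.
signal = [math.sin(2 * math.pi * t / 24) for t in range(24 * 20)]

# Skip the shortest lags, where neighbouring hours are trivially similar.
lags = range(6, 37)
values = {h: acf(signal, h) for h in lags}
strongest_positive = max(values, key=values.get)
strongest_negative = min(values, key=values.get)
print(strongest_positive, strongest_negative)
```

For such a daily-periodic signal the strongest positive correlation beyond the first few lags falls at the period length and the strongest negative correlation at half of it, which is the pattern the text reads from Figure 7.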
All models are trained with the Adam optimization algorithm, and the sigmoid function is used in the output layer. The program code of this paper is run on an Intel® Core™ I7-8600 CPU using Python 3.7.

Results and Discussion
In this section, the six models above are simulated to verify the performance of the proposed method. We discuss the effect of the input length.

(1) For Model I, since it has only a single-branch input, the number of input variables directly affects the prediction accuracy. As can be seen in Table 3, the RMSE and nRMSE decrease continuously as the lagging time increases. This implies that, for this case study, data from previous points in time are vital for forecasting, especially when only historical irradiance is used for prediction.

(2) However, when the historical irradiance and meteorological parameters are input to the LSTM network at the same time, the influence of the lagging time on forecasting accuracy decreases significantly. When the lagging time is only one hour, the RMSE of Model I-A is 110.64 W/m², and the RMSE of Model I-B is 75.4654 W/m², which shows that, for a fixed lagging time, the meteorological parameters help the prediction of irradiance considerably.

(3) As can be seen in Tables 3 and 4, the prediction accuracy generally increases with lagging time in the range of 1 to 12 hours. However, a longer lagging time means more input variables and therefore a longer computation time. Considering these factors, a more reasonable lagging time needs to be chosen. In this case, although the best lagging time is 10 hours for Model I-A and 11 hours for Model I-B, we consider a lagging time of 8 hours reasonable. The best lagging time may of course differ between datasets.

(4) Comparing Models I-A and II-AC in Tables 3 and 5, the prediction accuracy differs noticeably at the same lagging time, and the difference is most prominent when the lagging time is small. For instance, when the lagging time is 1 hour, the RMSE is 110.64 W/m² for Model I-A but 73.2477 W/m² for Model II-AC. The best RMSE of the two models is 75.22 W/m² and 71.0791 W/m², respectively, which shows that the proposed new branch can improve the prediction accuracy. Meanwhile, Table 5 also shows that when historical irradiance is the main input, the prediction accuracy is the same whether the auxiliary input is the meteorological parameter at the current time or at the next time.

(5) Comparing Tables 5 and 6, we find that using the meteorological parameters of the next moment takes better advantage of the proposed new branch structure. As shown for Model II-BD in Table 6, when historical irradiance and meteorological parameters are the main input and the meteorological parameters at the next moment are the auxiliary input, the prediction is the best; the RMSE and nRMSE are 62.1618 W/m² and 32.2702%, respectively. For Model II-BC, because the current meteorological parameters in the auxiliary input already exist in the main input, the accuracy improvement is not apparent.
The best parameters and architecture of the LSTM network for 1-hour-ahead forecasting with the six models are shown in Table 7. In Model I, two LSTM layers with 100 and 40 neurons (100-40) are used with lagging times of 10 and 11, respectively; in Model II, a 64-32 MLP hidden layer is added, and most of the models use only one LSTM layer. The performance of the six models with the optimal parameters and structure can be seen in Table 8 and Figure 8. Compared with the persistence model, the forecast skill (FS) and the promoting percentage of RMSE (P) of each model are significantly improved. Compared with BPNN, the P of each model has also improved, and the advantage of Model II-BD is the most visible, reaching 19.19%. The RMSE and time cost curves of the different models with different lagging times are shown in Figure 9. As the lagging time increases (that is, as the dimension of the input variable increases), the time cost increases approximately linearly; in Figure 9(f), there is a sudden change in time cost because the number of LSTM layers increases from 1 to 2. This is because the increase in input variables leads to an increase in the amount of computation. Meanwhile, except for Model I-A, the RMSE of the models does not decrease linearly as the lagging time rises; it shows only a general downward trend, and the curve fluctuates. This indicates that the optimal lagging time is not the maximum lagging time; the appropriate lagging time needs to be chosen according to the actual dataset and the required accuracy. The one-hour-ahead irradiance forecasts of the proposed Model II-BD with the best parameters and architecture are shown in Figure 10. In Figure 10(a), the blue circle (O) represents the measured value and the red asterisk (*) denotes the forecasted value; the predicted and measured values agree most of the time. The locally enlarged view shows more clearly that the difference between measured and forecasted values is small. Figure 10(b) shows that the predicted values are strongly correlated with the measured solar irradiance data, with a linear regression coefficient of 0.9642. In summary, the forecasted values of the solar irradiance are in good agreement with the measured values.
From the above experimental results, we found that the Model II-BD structure of the LSTM-MLP model has the best prediction accuracy. In what follows, "the LSTM-MLP model" refers specifically to the LSTM-MLP model with the Model II-BD structure.
Six experimental simulations were performed to verify the performance of the proposed LSTM-MLP model, including BP network, general RNN network, random forest ... but the error is more prominent. The rapid change of the cloud layer on a cloudy day makes irradiance prediction very difficult. In contrast, the proposed model shows a good prediction effect; the nRMSE is 35.89%. On a rainy day (May 17, 2016), the measured irradiance is low, as can be seen from the solid red line in Figure 11, but the predicted value (red dotted line) follows the changes in the measured value well. This indicates that the proposed LSTM-MLP model performs better on rainy days. All related results are reported in Table 10.
To place this work in the context of other published studies, the results of the proposed approach are compared with results from other studies in Table 11. The results are similar.

Conclusions
In this work, a novel LSTM-MLP structure with two-branch input is proposed. The proposed LSTM-MLP includes one main input, one auxiliary input, one main output, and one auxiliary output. The historical irradiance (or irradiance and meteorological parameters) serves as the main input and is fed to the LSTM layers. One part of the LSTM output is emitted as the auxiliary output, and the other part is first combined with the meteorological parameters (auxiliary input) and then sent to an MLP structure. The output of the MLP hidden layers is the main output, which is the final irradiance prediction. Four network structures based on LSTM-MLP and two network structures based on the traditional LSTM are designed and developed. A real-world test case in Denver, consisting of 5 years of data, is used to verify and discuss the potential of each model. The experimental results demonstrate that the proposed Model II-BD, which uses historical irradiance and meteorological parameters as the main input and the meteorological parameters of the next moment as the auxiliary input, significantly outperforms the other models in terms of three widely used evaluation criteria. The RMSE is 62.1618 W/m², the nRMSE is 32.2702%, and the FS is 0.4477. Compared with BPNN, the promoting percentage of RMSE (P) of Model II-BD is 19.19%. The meteorological parameters at the next moment, which can be obtained from the weather forecast, play a vital role in the prediction accuracy. The lagging time is a significant variable for the input of the LSTM, especially when only historical irradiance is used as input (e.g., Model I-A).

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.