Learning Processes to Predict the Hourly Global, Direct, and Diffuse Solar Irradiance from Daily Global Radiation with Artificial Neural Networks

This paper presents three different topologies of feed forward neural network (FFNN) models for generating global, direct, and diffuse hourly solar irradiance in the city of Fez (Morocco). Results from this analysis are crucial for the design of any solar energy system, especially concentrating ones, as the direct component is seldom measured. For the three models, the main input was the daily global irradiation, together with other radiometric and meteorological parameters. Three years of hourly data were available for this study. For the prediction of each solar component, different combinations of inputs as well as different numbers of hidden neurons were considered. To evaluate these models, the regression coefficient (R) and the normalized root mean square error (nRMSE) were used. Tests of these models over unseen data showed good accuracy and demonstrated their generalization capability (nRMSE = 13.1%, 9.5%, and 8.05% and R = 0.98, 0.98, and 0.99 for hourly global, hourly direct, and daily direct radiation, respectively). Different comparison analyses confirmed that the FFNN models surpass other estimation methods. As such, the proposed models showed a good ability to generate different solar components from daily global radiation, which is recorded at most radiometric stations.


Introduction
Morocco is a fossil-energy-deficient country (95% imports) but has impressive environmental wealth. It is characterised by intense solar irradiation, as it lies in a sunny belt, which favours the utilization of solar energy.
Morocco has launched one of the biggest solar projects: the world's largest concentrated solar power plant (Figure 1), costing about nine billion dollars, with a combined capacity of approximately 2 GW, to be completed by 2020.
The installation of any solar power system requires high-quality solar radiation measurements in order to size the system and simulate its operation. A lack of long data series, or data series of poor quality, can lead to errors in plant design, sizing, and performance forecasting, which negatively impacts the investment. Unfortunately, solar radiation measurements are usually inaccurate and scarce worldwide [1], especially in Morocco. Because of the cost of the measuring devices, there are only a small number of solar stations. These stations, in addition to being insufficient, generally measure only global radiation. However, knowledge of the direct solar component is necessary for sizing concentrating solar power (CSP) plants or concentrated photovoltaic (CPV) systems. This component is difficult to measure because it requires a pyrheliometer equipped with a solar tracking system, which is very expensive.
On the other hand, correctly sizing any type of solar system or simulating its performance requires at least daily or, even better, hourly values of the different solar radiation components. Hence, establishing relationships between the available daily global data and the direct and diffuse solar irradiation at different time steps can be beneficial.
Previous studies have stated that artificial neural networks (ANNs) are particularly suitable for this goal. Indeed, ANNs are powerful when applied to problems whose solutions require knowledge that is difficult to specify but for which a large number of examples is available. The neural network approach does not need any information about the process that generates the data. Recently, ANN models have been used in solar radiation modelling for many locations with different climates. Relevant research has been carried out in countries such as Greece, Saudi Arabia, Turkey, Egypt, Cyprus, Spain, India, Oman, Algeria, the UK, and Malaysia. However, no such work seems to exist for Morocco. Thus, the purpose of this paper is to generate horizontal hourly global and direct solar irradiation and daily direct radiation using ANN models. Daily global solar radiation, which is the most widely available component, and different astronomical variables are used as inputs to the ANN models.
The paper is organized as follows. In the first section, we present a bibliographical review that illustrates the ability of ANNs to establish nonlinear relationships between input and output data. Such relations have been developed between meteorological parameters and global solar irradiation for different time scales, especially monthly, daily, or hourly mean values, but rarely for the direct solar component. In the second section, we present the proposed feed forward neural networks (FFNNs) and the statistical parameters used to evaluate their performance.
In the third part, we describe our measurement station, where different meteorological variables are measured, and define the criteria for building the database. Then, for each solar component, the methodology developed to address the different issues is applied and the results are discussed. Finally, conclusions and perspectives for further studies are presented.

Bibliographical Review
Methods for modelling or predicting solar irradiation can be classified into three categories:
(i) Methods that use empirical relations between the clearness index (K_t) and the sunshine fraction (S_f); K_t is defined as the ratio of horizontal global solar irradiation to extraterrestrial irradiation, while S_f is the sunshine duration divided by the theoretical day length. Generally, most of these methods have not been very accurate, as they used large time steps or averaged data [26,34,35]. Stochastic models have also been applied at different time scales [36-38].
(ii) The second category concerns models that consider radiative transfer, that is, the exchanges between solar radiation and the earth's atmosphere, such as Rayleigh scattering and absorption by ozone, aerosols, and water vapour [26,39-41]. Besides being complex, these models only estimate solar irradiation in clear sky conditions.
(iii) The most recent category is based on artificial intelligence methods. Different models based on artificial neural networks (ANNs) have been introduced in the literature for modelling and predicting solar radiation data from meteorological and geographical parameters. ANNs are powerful when applied to problems whose solutions require knowledge that is difficult to specify but for which there is an abundance of examples. Indeed, an ANN approach needs long-term data in order to obtain a better model. The developed model can be used for forecasting data series, for estimating solar irradiation from exogenous meteorological data, or for extrapolating solar irradiation from data measured at other sites [14,42-44]. In fact, FFNNs can be used for any kind of input-to-output mapping. A feed forward network with one hidden layer and enough neurons in that layer can fit any finite input-output mapping problem [45].
This study belongs to the last category. While many ANNs have been developed in the literature for the prediction of global radiation, the estimation of direct irradiation has been less investigated [46]. Thus, different architectures of feed forward neural network (FFNN) models are proposed to predict, from daily global solar radiation, the direct and diffuse solar radiation at different time steps. All components are considered on a horizontal surface.

Conception of the Artificial Neural Network (ANN) Model
3.1. Artificial Neural Networks. Artificial neural networks are information processing systems that are nonalgorithmic and massively parallel. They are composed of layers of parallel units called neurons. These neurons are connected by links called synapses. Each neuron has its own weights, computed by learning from data. Neurons receive inputs over their incoming connections, generally perform nonlinear operations, and output the final results. ANNs have been applied in various areas of science and engineering [47,48]. There are two major categories of ANN: feed forward and feedback (recurrent) networks. The main difference between these categories is the existence of one or more loops in recurrent models, while feed forward networks are organized into layers connected strictly in one direction from the first layer to the last one [12].
International Journal of Photoenergy
Each neuron in the hidden layer sums up its inputs x_k after weighting them with the strengths of the respective connections w_ik from the input layer and calculates its output y_i as follows:
$$y_i = f\left(\sum_k w_{ik}\, x_k\right), \qquad (1)$$
where f is a transfer function that can be a sigmoid, hyperbolic tangent, or radial basis function. The final output in the last layer is computed similarly. The neural network adopted in our study is a multilayer feed forward backpropagation network. Backpropagation is the workhorse of learning with ANNs. It consists in adjusting the weights of the neurons by minimizing a measure of the difference between the measured data and the predicted data obtained by (1). During training, the neurons' weights are updated by feeding the error back through the network.
Our network was designed and trained using MATLAB code and the MATLAB neural network toolbox. A simplified schematic diagram of this network is shown in Figure 2(c). The main characteristics of this model are as follows:
(i) There is one hidden layer (the user can change the number of hidden neurons).
(ii) The transfer function of the hidden layer is sigmoidal, while the output node has a linear activation function.
(iii) The training algorithm is backpropagation based on the Levenberg-Marquardt (LM) minimization method, which is the most commonly used [12,48].
(iv) The learning procedure is controlled by a cross-validation technique based on a random division of the initial data set into three subsets (training, validation for process control, and testing).
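As an illustration of equation (1) and of the random three-way division of the data, the following Python sketch evaluates a one-hidden-layer network with sigmoid hidden units and a linear output node. It is not the MATLAB code used in this study: the function names, the absence of bias terms (which follows (1) as written), and the 70/15/15 split ratios are assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Sigmoidal transfer function f used in the hidden layer."""
    return 1.0 / (1.0 + np.exp(-z))


def ffnn_forward(x, w_hidden, w_out):
    """Forward pass of a one-hidden-layer FFNN.

    x        : input vector, shape (n_inputs,)
    w_hidden : hidden-layer weights w_ik, shape (n_hidden, n_inputs)
    w_out    : output-node weights, shape (n_hidden,)
    """
    y_hidden = sigmoid(w_hidden @ x)   # equation (1) for every hidden neuron
    return float(w_out @ y_hidden)     # linear activation at the output node


def random_split(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly divide sample indices into training, validation, and test sets.

    The default fractions are an assumption, not values reported in the paper.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

With all hidden weights set to zero, every hidden neuron outputs 0.5, so a network with four hidden neurons and unit output weights returns 2.0 whatever the input; this gives a quick sanity check of the implementation.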

Model Evaluation.
To evaluate the quality of the estimation and the ANN performance, several statistical parameters can be used [49]: the root mean square error (RMSE) and normalized RMSE (nRMSE), the determination coefficient R², and the mean absolute error (MAE) and normalized mean absolute error (nMAE).
(i) The root mean square error (RMSE) measures the difference between the measured values and the predicted ones; it indicates the scattering of the data around the fitted line. The approximation is better when the RMSE is minimal (tends to 0).
(ii) The determination coefficient (R²) expresses the correlation between the real values and the estimated ones; the best approximation corresponds to the highest R² (closest to 1).
Our station measures global irradiance (G_h), direct normal irradiance, diffuse irradiance (Df_h), wind speed (WS_h), wind direction (Wd_h), air temperature (T_h), and relative humidity (Hr_h). The station also has a pluviometer to measure precipitation and several radiometers to measure spectral components of solar radiation, in particular UV and photosynthetically active radiation (PAR). All measuring instruments are connected to a data acquisition board (CR10X) with a storage module. Data are measured and recorded every 5 seconds and then converted to hourly averages.

Meteorological Data and Database Development
Hence, three years of hourly data, recorded from 1 January 2010 to 31 December 2012, are available, giving 26,304 data records for this study. Each parameter was examined in order to remove outlier values. Data recorded before sunrise and after sunset were also deleted to avoid the mask effect or an unreliable response of the pyranometers at high zenith angles [50]. After this filtering, 13,085 records remain for each variable mentioned above. All these variables are used as input parameters to the ANN models. Figure 3 presents the variation of horizontal direct (HDI)_h, diffuse (Df_h), and global (G_h) solar radiation versus time over two years.
In addition to the measured variables, we calculated the sunshine duration (SD), defined by the World Meteorological Organization as the time interval during which direct solar radiation exceeds 120 W m⁻². It is a radiometric parameter that gives information about the site's cloudiness [51]. Besides, we considered some astronomical variables: the declination angle (δ) and the daylight hours (DH); the first designates a specific day, and the second indicates the period of the year considered. We also calculated the sunset hour angle (W_ss), the sun hour angle (HA), the extraterrestrial solar radiation on a horizontal plane (ER), and the sunshine fraction (S_f), defined as the ratio of sunshine duration to daylight hours. All these astronomical variables were computed using the analytic expressions detailed in the Appendix.
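The calculated variables listed above can be sketched in Python as follows. The expressions used in this study are those of the Appendix, which may differ in detail; the sketch below assumes common textbook forms (Cooper's declination formula, a solar constant of 1367 W/m², and an eccentricity correction of 1 + 0.033 cos(2πn/365)), and all function names are illustrative.

```python
import math

SOLAR_CONSTANT = 1367.0  # W/m^2, a commonly used value (assumption)


def declination_deg(day_of_year):
    """Solar declination (degrees), Cooper's approximation."""
    return 23.45 * math.sin(2 * math.pi * (284 + day_of_year) / 365)


def sunset_hour_angle_deg(lat_deg, decl_deg):
    """Sunset hour angle W_ss (degrees) on a horizontal surface."""
    lat, decl = math.radians(lat_deg), math.radians(decl_deg)
    # Clamp so that polar day or night does not raise a domain error.
    cos_ws = max(-1.0, min(1.0, -math.tan(lat) * math.tan(decl)))
    return math.degrees(math.acos(cos_ws))


def daylight_hours(lat_deg, decl_deg):
    """Theoretical day length DH (hours); the sun moves 15 degrees per hour."""
    return 2 * sunset_hour_angle_deg(lat_deg, decl_deg) / 15


def hour_angle_deg(solar_time_h):
    """Sun hour angle HA (degrees), zero at solar noon."""
    return 15 * (solar_time_h - 12)


def extraterrestrial_daily_wh(lat_deg, day_of_year):
    """Daily extraterrestrial irradiation ER on a horizontal plane (Wh/m^2)."""
    decl_deg = declination_deg(day_of_year)
    lat, decl = math.radians(lat_deg), math.radians(decl_deg)
    ws = math.radians(sunset_hour_angle_deg(lat_deg, decl_deg))
    e0 = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    return (24 / math.pi) * SOLAR_CONSTANT * e0 * (
        math.cos(lat) * math.cos(decl) * math.sin(ws)
        + ws * math.sin(lat) * math.sin(decl)
    )


def sunshine_duration_h(hourly_direct_w_m2, threshold=120.0):
    """Sunshine duration SD: number of hours with direct irradiance above 120 W/m^2."""
    return sum(1 for g in hourly_direct_w_m2 if g > threshold)


def sunshine_fraction(sd_h, dh_h):
    """Sunshine fraction S_f = SD / DH."""
    return sd_h / dh_h
```

Records before sunrise and after sunset can then be discarded by keeping only the hours whose absolute hour angle is smaller than the sunset hour angle of the day.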
4.2. Database Elaboration Criteria. The performance of the ANN models depends on the choice of the best combination of weather variables as inputs, the training algorithm, and the ANN architecture design. The key task in time series prediction is the selection of the input variables. It is, in fact, a prerequisite stage, as there is no systematic approach to adopt for nonlinear ANN models [45]. However, some criteria must be taken into account, such as the following:
(i) Parsimony, which consists in developing the simplest ANN architecture, with a minimum of inputs, hidden layers, and hidden neurons, while keeping high performance.
(ii) Avoiding redundant inputs (those that contain the same information) and choosing the variables best correlated with solar irradiation. Indeed, too many inputs can reduce the model efficiency [51].
To deal with this issue, we either compute the correlation between the different input variables and the target variable and then decide on the best combination, or consider many combinations of variables during the training phase and choose the one that gives the best results.
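The first option, ranking candidate inputs by their correlation with the target and discarding redundant ones, might be sketched as follows. This is an illustrative Python helper, not the procedure actually coded for this study; the function names and the 0.95 redundancy threshold are assumptions.

```python
import numpy as np


def rank_by_correlation(candidates, target):
    """Rank candidate inputs by |Pearson correlation| with the target.

    candidates : dict mapping a variable name to a 1-D array of values
    target     : 1-D array of the quantity to predict
    """
    target = np.asarray(target, dtype=float)
    scores = {
        name: float(np.corrcoef(np.asarray(values, dtype=float), target)[0, 1])
        for name, values in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


def drop_redundant(candidates, ranked, max_mutual_corr=0.95):
    """Keep inputs in ranked order, skipping any input that is almost
    perfectly correlated with an input already kept (same information)."""
    kept = []
    for name, _ in ranked:
        redundant = any(
            abs(np.corrcoef(candidates[name], candidates[k])[0, 1]) > max_mutual_corr
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept
```

A linear correlation only captures part of the dependence between an input and the target, which is why the second option (trying several combinations during training) remains useful for nonlinear models.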

Prediction of Hourly Global Solar Irradiation from Daily Radiation. As mentioned before, daily global solar radiation, when measurements exist, is the most commonly available quantity, with relatively long data series. However, accurate solar system sizing and evaluation require hourly data. In this section, an FFNN model is trained to generate hourly horizontal global solar radiation. The daily global radiation (G_D) is the main input, in addition to some calculated variables, namely declination, hour angle, sunset hour angle, and extraterrestrial solar radiation on a horizontal plane.
To implement the network, different combinations of these parameters were proposed as the input matrix (stimuli) to FFNN models with the hourly global radiation (G_h) as target. For each combination, different configurations of the neural network were tested by changing the number of hidden neurons. Training was repeated up to 10 times for each number of hidden neurons in order to choose the best configuration. Table 1 shows the best training results in terms of the correlation coefficient R and RMSE. The training was done over 2 years (8714 records).
According to these results, the best FFNN model has three nodes in the input layer (HA, W_ss, and G_D) and 10 neurons in the hidden layer. To check the ability of the developed network to generate synthetic hourly data, it was tested over unseen data (1 year, 4371 records).
Figure 4 shows a regression plot of the hourly global solar radiation values predicted by the proposed model for the year 2012 versus the measured ones. There is clearly a good correlation: R = 0.98 and RMSE = 0.061 kWh/m².
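The selection procedure described above (several hidden-layer sizes, repeated training runs, best configuration retained) can be sketched in Python as below. This is only a schematic stand-in for the MATLAB toolbox workflow: plain batch gradient descent replaces the Levenberg-Marquardt algorithm, bias terms are included, and the function names, learning rate, epoch count, and candidate sizes are all assumptions.

```python
import numpy as np


def forward(p, X):
    """Sigmoid hidden layer followed by a linear output node."""
    H = 1.0 / (1.0 + np.exp(-(X @ p["W1"].T + p["b1"])))
    return H @ p["W2"] + p["b2"], H


def train(X, y, n_hidden, rng, epochs=2000, lr=0.1):
    """Fit one network by batch gradient descent on the mean squared error
    (a simple substitute for Levenberg-Marquardt)."""
    p = {
        "W1": rng.normal(0.0, 0.5, (n_hidden, X.shape[1])),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, n_hidden),
        "b2": 0.0,
    }
    n = len(y)
    for _ in range(epochs):
        y_hat, H = forward(p, X)
        err = y_hat - y
        # Error propagated back to the hidden-layer pre-activations.
        dZ = np.outer(err, p["W2"]) * H * (1.0 - H)
        p["W2"] -= lr * (H.T @ err) / n
        p["b2"] -= lr * err.mean()
        p["W1"] -= lr * (dZ.T @ X) / n
        p["b1"] -= lr * dZ.mean(axis=0)
    return p


def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))


def select_best(X_tr, y_tr, X_val, y_val, hidden_sizes=(2, 5, 10),
                n_restarts=10, epochs=2000, seed=0):
    """Train n_restarts networks for each hidden-layer size and keep the one
    with the lowest validation RMSE. Returns (score, n_hidden, parameters)."""
    rng = np.random.default_rng(seed)
    best = None
    for n_hidden in hidden_sizes:
        for _ in range(n_restarts):
            p = train(X_tr, y_tr, n_hidden, rng, epochs=epochs)
            score = rmse(y_val, forward(p, X_val)[0])
            if best is None or score < best[0]:
                best = (score, n_hidden, p)
    return best
```

Repeating the training matters because each run starts from different random weights and may end in a different local minimum; scaling the inputs to a common range before training is advisable with this kind of optimizer.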
To discuss the validity of these results, the proposed model was compared with widely used empirical models, namely the Liu-Jordan model and the Collares-Pereira and Rabl (C-P&R) model. A brief description of these models is given in the Appendix. Figure 5 shows a sample (8 solar days) to illustrate this comparison. The three models predict hourly solar radiation accurately on clear-sky days. Nevertheless, the Liu-Jordan model sometimes tends to underestimate values. As can be seen, the proposed model is more accurate than the other two models. However, for the three models, the prediction accuracy is lower on cloudy days than on sunny days, but it remains acceptable, as the accuracy of most solar radiation prediction models degrades on cloudy days [52-54].
Table 2 reports a more detailed evaluation of the three models using nMAE, nRMSE, and R². To this end, one year of hourly data was generated using each of the three models. The results show that the proposed model outperforms the other models. Based on the RMSE value, it has a better ability to predict future data; moreover, according to the nMAE value, it is more powerful in predicting hourly solar radiation.
Consequently, the proposed model outperforms the empirical models considered. In addition, it has the capacity to learn from and handle huge data sets with nonlinear behaviour and a stochastic nature. Even so, empirical models can outperform our FFNN model when the long data series necessary for the learning process are lacking.
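The statistics used in this comparison can be computed as in the following Python sketch. The normalization of nRMSE and nMAE by the mean of the measured values is an assumption (other conventions, such as normalization by the range, exist), and the function name is illustrative.

```python
import numpy as np


def evaluation_metrics(measured, predicted):
    """Return RMSE, nRMSE, MAE, nMAE, R, and R2 for two series of equal length."""
    y = np.asarray(measured, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    err = y_hat - y
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mean_measured = float(np.mean(y))  # normalization constant (assumption)
    return {
        "RMSE": rmse,
        "nRMSE": rmse / mean_measured,
        "MAE": mae,
        "nMAE": mae / mean_measured,
        # Pearson correlation coefficient between measured and predicted values.
        "R": float(np.corrcoef(y, y_hat)[0, 1]),
        # Determination coefficient: 1 - residual sum of squares / total sum of squares.
        "R2": 1.0 - float(np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
    }
```

Note that R is insensitive to a constant bias, whereas RMSE, MAE, and R2 are not: a prediction shifted by a constant offset still has R = 1 but a nonzero RMSE.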

Prediction of Hourly Horizontal Direct Solar Irradiation from Daily Radiation. While most applications rely on global solar radiation, concentrating solar plants (CSP) generally require an accurate estimation of the direct solar irradiation at different time scales [55]. Forecasting the direct irradiation is an important aspect of a full evaluation of the solar energy potential. Global radiation measurements are available at most radiometric stations, whereas data on the direct component are more limited. Moreover, when the direct component is measured, no extensive data series exist. Models that generate the direct solar resource are therefore needed in order to establish its typical behaviour for energy applications.
In this paper, we consider the horizontal direct solar irradiation (HDI); the normal component (NDI), if needed, can be calculated using the following equation:
$$\mathrm{NDI} = \frac{\mathrm{HDI}}{\cos A_z},$$
where A_z denotes the zenith angle.
In particular, in this section we deal with the hourly horizontal direct irradiation (HDI)_h as target. To implement a suitable FFNN model, we first defined the input variables to the model. To this end, a set of heterogeneous parameters was considered, namely the daily global radiation (G_D) combined with astronomical, climatic, and radiometric variables (δ, SD, HA, K_t, W_ss, and S_f).
G_D is the main variable, as it provides information about the climatic and meteorological patterns. The hour angle and the declination angle give the FFNN information about the sunlight duration and the day considered. In particular, HA has an effect on the optical path length through the atmosphere; it can therefore replace the relative air mass. The clearness index K_t is the most relevant factor in the (HDI)_h prediction. In fact, it is an indirect measure of the filtering action of the atmosphere.
Table 3 shows the best performance of the different FFNN architectures for the training process. It is worth recalling that the learning process uses the data for 2010 and 2011, and that training is repeated up to 10 times for each number of hidden neurons with the Levenberg-Marquardt algorithm to fit the hourly horizontal direct solar radiation.
From Table 3, we notice that models 1 and 7 (model 7 has the same variables as model 1 plus δ) give the best results. However, model 7 slightly surpasses model 1, especially in terms of RMSE. Thus, model 7, with five variables, was adopted for the hourly (HDI) generation, as δ is a costless parameter. The generalisation of the model was evaluated by testing it over the data for 2012. The results are satisfactory, as can be seen in the regression plot of Figure 6, which represents the predicted (HDI)_h versus the measured values: R = 0.982 and RMSE = 0.041 kWh/m² for the test process. This high correlation value implies that the proposed model makes accurate predictions.
In addition, both predicted and measured hourly (HDI) values were plotted versus time for the whole year. We also plotted the error, defined as the difference between predicted and measured values. For illustration, Figure 7 presents one month from each season. Generally, there is a good agreement. However, the quality of the fit differs from one season to another and, as can be seen, the error scatter gets closer and closer to zero from winter to summer. Concerning the diffuse component, once the direct radiation is predicted, the diffuse component can be determined by subtracting the direct radiation from the global radiation on a horizontal surface.
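The two conversions mentioned in this section, from horizontal to normal direct irradiation and from global and direct to diffuse, can be written as a short Python sketch. The function names, the 85° zenith cutoff, and the clipping of negative diffuse values to zero are assumptions added to keep the example numerically safe; they are not part of the method described here.

```python
import math


def normal_from_horizontal(hdi, zenith_deg, max_zenith_deg=85.0):
    """Normal direct irradiation NDI = HDI / cos(A_z).

    Near the horizon cos(A_z) tends to zero and the ratio becomes unreliable,
    so NaN is returned above the cutoff angle (assumed value).
    """
    if zenith_deg >= max_zenith_deg:
        return float("nan")
    return hdi / math.cos(math.radians(zenith_deg))


def diffuse_from_global_and_direct(global_h, hdi):
    """Horizontal diffuse component = global - direct, clipped at zero."""
    return max(global_h - hdi, 0.0)
```

For instance, a horizontal direct value of 500 with a zenith angle of 60° corresponds to a normal direct value of about 1000 in the same units.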

Prediction of Daily Horizontal Direct Solar Irradiation from Daily Radiation. In some meteorological stations, only daily values of global solar radiation, temperature, and relative humidity are available over relatively long series. The aim of this section is to show how these data can be used to predict the daily HDI, (HDI)_D, using FFNNs.
For this analysis, we computed daily values of all variables from the recorded hourly ones in order to implement the FFNN models. We kept the same symbols for the variables and simply replaced the subscript "h" by "D." A total of 1095 data records are available; 730 are used for the learning process, and the rest (365 records) are used to test the model. As mentioned before, a correlation analysis was carried out between the input variables and (HDI)_D. In particular, a set of nine heterogeneous parameters was considered: horizontal extraterrestrial solar radiation (ER_D), global solar radiation (G_D), mean temperature (T_D), sunshine duration (SD), sunshine fraction (S_f), wind speed (WS_D), declination angle (δ), daylight hours (DH), and relative humidity (Hr_D).
The results, reported in Table 4, show that the most effective input parameters are global solar radiation and sunshine duration. Once the most effective input variables were identified, different combinations of these parameters were proposed as the input matrix to FFNN models with the horizontal (HDI)_D as target. Table 5 shows the best training results for all the analyzed combinations in terms of nRMSE and R.
The network with three input variables (sunshine duration (SD), global solar radiation (G_D), and declination angle (δ)) gives the best results for the (HDI)_D prediction.
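The conversion of hourly records into daily values can be sketched in Python as follows. Because night-time records are removed, the number of records per day varies, so the sketch groups values by a day index instead of reshaping into 24-hour blocks. The function name and the choice between sum and mean are assumptions for illustration.

```python
import numpy as np


def daily_from_hourly(day_index, hourly_values, how="sum"):
    """Aggregate hourly records into one value per day.

    day_index     : non-negative integer day number of each hourly record
    hourly_values : hourly values (same length as day_index)
    how           : "sum" for irradiation totals, "mean" for quantities such
                    as temperature or relative humidity
    """
    day_index = np.asarray(day_index)
    values = np.asarray(hourly_values, dtype=float)
    n_days = int(day_index.max()) + 1
    totals = np.bincount(day_index, weights=values, minlength=n_days)
    if how == "sum":
        return totals
    counts = np.bincount(day_index, minlength=n_days)
    # Days without any record get a count of 1 to avoid division by zero.
    return totals / np.maximum(counts, 1)
```

Summing hourly mean irradiance values in W/m² over a day gives the daily irradiation in Wh/m².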
This configuration was used to test the generalisation capability of the model. The model was tested over unseen data corresponding to 2012 (not used in the learning process).
Figure 8 shows a regression plot of the predicted daily direct solar radiation versus the corresponding measured values in the test phase. This plot shows a good agreement between the predicted and measured values, which is confirmed by R = 0.99 and nRMSE = 8.05%.
In Figure 9, we plot both measured and predicted daily direct solar radiation versus time for the test period. Overall, there is no significant difference between the two curves.
For further illustration, and to give an idea of the monthly direct solar irradiation in Fez, Figure 10 shows the monthly averages of predicted and measured direct solar radiation. There is a clear improvement compared to Figure 9, as daily fluctuations are attenuated in monthly averages. Moreover, even for months with low solar radiation, the ANN model gives relatively accurate predictions. Most of the monthly averages of (HDI)_D lie in the range of 2.5-6 kWh/m².
To evaluate our model, we compare its predictions with an estimation of (HDI)_D obtained with another approach. For a specific location, the direct solar radiation is often calculated from the recorded global irradiance data by means of a decomposition model [56].
To estimate (HDI)_D using G_D, we begin by estimating the diffuse component Df_D using empirical statistical relations between the diffuse fraction (K_d), defined as the ratio of diffuse radiation to global radiation, and the clearness index K_t.
Generally, such relations are location dependent, even though many studies claim to develop relations valid for different locations. Thus, to deal with this issue, we used our database to plot K_d versus K_t. The resulting scatter points were fitted by an empirical relation K_d = f(K_t), referred to as equation (6). The fit is good, since the correlation coefficient is 95% and the RMSE equals 0.07.
Hence, the diffuse Df_D can be calculated from (6), and (HDI)_D can be estimated using the following equation:
$$(\mathrm{HDI})_{D,\mathrm{est}} = G_D - Df_D = G_D\,(1 - K_d).$$
In order to evaluate the results of this procedure, we first plotted the estimated daily direct solar radiation (HDI)_D,est versus the real values of (HDI)_D (Figure 11). The quality of the fit is clearly not as good as in Figure 8; this degradation is reflected in R, which drops to 0.93, and in nRMSE = 17.5% instead of 8.6% for the FFNN prediction.
Moreover, considering the data for 2012, we plotted simultaneously the direct solar radiation estimated using (6), the values predicted with the proposed FFNN, and the corresponding measured data. To better illustrate the difference between the two approaches, we zoom in on zones of good agreement (Figure 12(a)) and bad agreement (Figure 12(b)).
From these figures, it clearly appears that the prediction with the proposed model is more accurate than the estimation with the statistical relation (6).
Finally, we plotted the cumulative distribution functions of both the measured and predicted series for each solar component studied (Figures 13(a), 13(b), and 13(c)); as can be seen, the predicted distributions are almost indistinguishable from those of the measured data.
Therefore, from the point of view of a statistical goodness-of-fit test, we can conclude that the solar irradiance components predicted with the different FFNN topologies are satisfactory.
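The decomposition approach used for this comparison can be sketched in Python as follows. The functional form and coefficients of relation (6) are not reproduced here; the sketch assumes a polynomial fit of K_d against K_t obtained with `numpy.polyfit`, and the function names and polynomial degree are illustrative choices.

```python
import numpy as np


def fit_diffuse_fraction(kt, kd, degree=3):
    """Fit K_d = f(K_t) with a polynomial (assumed form of the empirical relation)."""
    return np.poly1d(np.polyfit(kt, kd, degree))


def estimate_daily_direct(g_daily, er_daily, kd_of_kt):
    """Estimate (HDI)_D from daily global radiation G_D.

    g_daily  : daily global radiation G_D
    er_daily : daily extraterrestrial radiation ER_D (same units as G_D)
    kd_of_kt : callable returning the diffuse fraction K_d for a given K_t
    """
    g = np.asarray(g_daily, dtype=float)
    kt = g / np.asarray(er_daily, dtype=float)   # clearness index
    kd = np.clip(kd_of_kt(kt), 0.0, 1.0)         # diffuse fraction kept in [0, 1]
    return g * (1.0 - kd)                        # direct = global - diffuse
```

A typical use is to fit the relation on the training years and then apply `estimate_daily_direct` to the test year, so that the comparison with the FFNN is made on the same unseen data.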
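A comparison of cumulative distribution functions such as the one described above can be sketched in Python as follows. The specific goodness-of-fit statistic is not named in the text; the sketch assumes the maximum vertical distance between the two empirical CDFs (the Kolmogorov-Smirnov distance), and the function names are illustrative.

```python
import numpy as np


def empirical_cdf(values):
    """Return the sorted values and their cumulative probabilities."""
    x = np.sort(np.asarray(values, dtype=float))
    return x, np.arange(1, x.size + 1) / x.size


def ks_distance(measured, predicted):
    """Maximum absolute difference between the two empirical CDFs."""
    a = np.sort(np.asarray(measured, dtype=float))
    b = np.sort(np.asarray(predicted, dtype=float))
    grid = np.concatenate([a, b])
    # Fraction of samples less than or equal to each grid point.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A distance close to zero means that the predicted series reproduces the statistical distribution of the measured one, even if individual values differ.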

Conclusion
To overcome the lack of accurate long series of solar data, needed for the optimal sizing of solar systems, especially concentrating solar plants, a set of FFNN models trained to predict different components of solar radiation was presented in this paper. These components are the hourly global, hourly direct, and daily direct solar radiation. Three different architectures of FFNN models were used. When long series of daily global radiation are available, our model can predict the hourly global radiation. The generated data are not only satisfactory (R = 0.98, RMSE = 0.061 kWh/m²) but also surpass those estimated with empirical methods. For concentrating solar plant design, the developed FFNN models can provide long series of either (HDI)_h or (HDI)_D with good accuracy (R = 0.98, nRMSE = 9.5% and R = 0.99, nRMSE = 8.05%, respectively). The horizontal diffuse component is determined by subtracting the direct radiation from the global radiation on a horizontal surface.
Additionally, (HDI)_D series were estimated using an equation developed for Fez. The accuracy of the results shows the superiority of the FFNN model.
The cumulative distribution functions of the generated and measured data were computed for each studied component, and the results confirm the good performance of the FFNN models. Finally, our models can provide synthetic series of the different solar radiation components to be used in the optimal sizing and planning of solar energy systems, especially concentrating solar plants.
We look forward to applying this approach in further studies, using data from other locations, to develop a model that represents all Moroccan locations.

Figure 1: First part of the Moroccan Solar Plan "Noor1" inaugurated in February 2017.

3.2. Feed Forward Neural Network (FFNN). FFNNs are the most commonly used type of multilayer neural networks. A schematic diagram of the basic architecture is shown in Figure 2(a), and the structure of a single neuron of the hidden layer is shown in Figure 2(b).

Figure 2: (a) Feed forward neural network.(b) Architecture of an artificial neuron.(c) Diagram of the used FFNN in MATLAB.

4.1. Meteorological Data. The data used in this paper were measured at a radiometric station supervised by our laboratory and installed at the Faculty of Science and Technology in Fez (Morocco) (latitude: 33°56′ N; longitude: 4°99′ W) at an altitude of 579 m. The site's climate is characterized by dry, hot summers and cold winters.

Figure 4: Regression plot of measured and predicted G h .

Figure 5: Comparison between ANN results and other procedures of estimation.

Figure 7: Measured and predicted hourly (HDI) for four months representing different seasons.

Figure 9: Measured and predicted daily horizontal direct irradiation for test process.

Figure 10: Comparison between the monthly averages of measured and predicted horizontal direct irradiation (HDI).

Figure 11: Regression plot of measured and estimated horizontal direct solar radiation.

Figure 12: Comparison between estimated, predicted, and measured daily direct solar radiation.

Table 1: Best learning performance with G_h as target.

Table 2: Statistics for different methods of estimation of G_h.

Table 3: Best performance of learning processes for different FFNN configurations.

Table 4: Correlation of different parameters to (HDI)_D.

Table 5: Best performance of FFNN architectures.