Solar Energy Prediction for Malaysia Using Artificial Neural Networks

This paper presents a solar energy prediction method using artificial neural networks (ANNs). An ANN predicts a clearness index that is used to calculate global and diffuse solar irradiations. The ANN model is based on the feed forward multilayer perceptron model with four inputs and one output. The inputs are latitude, longitude, day number, and sunshine ratio; the output is the clearness index. Data from 28 weather stations were used in this research: 23 stations were used to train the network, and 5 stations were used to test it. In addition, the measured solar irradiations from the sites were used to derive an equation to calculate the diffused solar irradiation as a function of the global solar irradiation and the clearness index. The proposed equation reduces the mean absolute percentage error (MAPE) in estimating the diffused solar irradiation compared with the conventional equation. Based on the results, the average MAPE, mean bias error, and root mean square error for the predicted global solar irradiation are 5.92%, 1.46%, and 7.96%, respectively. The MAPE in estimating the diffused solar irradiation is 9.8%. A comparison with previous work was done, and the proposed approach was found to be more efficient and accurate than previous methods.


Introduction
Solar energy is the portion of the sun's energy available at the earth's surface for useful applications, such as raising the temperature of water or exciting electrons in a photovoltaic cell, in addition to supplying energy to natural processes like photosynthesis. This energy is free, clean, and abundant in most places throughout the year. Its effective harnessing and use are important to the world, especially at a time of high fossil fuel costs and degradation of the atmosphere caused by burning these fuels. Solar radiation data provide information on how much of the sun's energy strikes a surface at a location on the earth during a particular time period. These data are needed for effective research into solar energy utilization. Because solar radiation measurements are costly and difficult, these data are not readily available; therefore, alternative ways of generating them are needed. A comprehensive solar radiation database is an integral part of an energy efficiency policy [1,2]. In Malaysia, there are cities and regions that do not have measured solar radiation data; therefore, a prediction tool should be developed to estimate the potential of solar energy based on location coordinates.
In recent years, ANNs have been used in solar radiation modeling work for locations with different latitudes and climates, such as Saudi Arabia, Oman, Spain, Turkey, China, Egypt, Cyprus, Greece, India, Algeria, and the UK. Little work regarding solar energy prediction has been done for Malaysia. The only significant prediction methods were proposed in [35,36] in 1982 and 1992. The authors in [35] only provided solar radiation data for three locations without any prediction algorithm, while the authors in [36] proposed a prediction algorithm for monthly solar radiation based on least-squares linear regression analysis using data from eight locations. Consequently, an ANN model for solar energy prediction should be developed to provide a comprehensive database for the solar energy potential in Malaysia. Moreover, the proposed ANN model is expected to be more accurate than the methods in [35,36], and it will provide hourly, daily, and monthly solar radiation predictions for many different locations in Malaysia because the location coordinates are used as inputs.
The main objective of this paper is divided into two subobjectives: to develop a feed forward ANN model that predicts the clearness index ($K_T$) based on the number of sunshine hours, day number, and location coordinates, and to calculate the global ($E_T$) and diffused ($E_D$) solar irradiations based on the formulas developed for Malaysia. This work is based on long-term solar irradiation data taken from 28 sites in Malaysia. These data were provided by the Solar Energy Research Institute (SERI) of Universiti Kebangsaan Malaysia (UKM). Figure 1 shows a political map of Malaysia. Malaysia is an Asian country located in the Far East and consists of eleven states on two separate landmasses.

Malaysia's Climate Profile
Malaysia has a hot, humid tropical climate with two monsoon seasons: one between October and February and the other from April to October; the latter is characterized by thunderstorms. Temperatures and humidity are high year round, but the mountains are slightly cooler. Table 1 shows the climate profile for 10 sites in Malaysia. In this table, three meteorological variables are used: ambient temperature, rain precipitation, and sunshine hours.
From Table 1, it is clear that Malaysia has a cloudy sky and a stable climate during the year; therefore, a low average solar irradiation with a low deviation is expected for all months. Long-term meteorological data containing the global irradiation, diffused irradiation, clearness index, sunshine hours, humidity, ambient temperature, rainfall, and air pressure have been taken from these sites to develop and test the proposed ANN model.

Solar Energy Prediction Model
Solar radiation is classified in two main parts: the extraterrestrial solar irradiation ($E_{extra}$) and the global solar irradiation ($E_T$). The variable $E_{extra}$ stands for the total solar energy above the atmosphere, while $E_T$ is the total solar energy under the atmosphere. The value for $E_{extra}$ is given by [1,2]
$$E_{extra} = I_o \left[ 1 + 0.034 \cos\left( \frac{2\pi N}{365} \right) \right] \times \text{Day length}, \tag{1}$$
where $I_o$ is the solar constant, 1,367 W/m², and $N$ is the number of the day. The day length is calculated by [1,2]
$$\text{Day length} = \frac{2}{15} \cos^{-1}\left( -\tan L \tan \delta \right), \tag{2}$$
where $L$ is the latitude and $\delta$ is the angle of declination, given by (3). The global solar irradiation ($E_T$) on a tilted surface consists of three parts:
$$E_T = E_B + E_D + E_R, \tag{4}$$
where $E_B$, $E_D$, and $E_R$ are the beam (direct), diffused, and reflected solar irradiation, respectively. On a horizontal surface, $E_R$ is equal to zero; therefore, $E_T$ on a horizontal surface is given by
$$E_T = E_B + E_D. \tag{5}$$
The global irradiation ($E_T$) can be calculated using $E_{extra}$ as follows:
$$E_T = K_T E_{extra}, \tag{6}$$
where $K_T$ is the sky clearness index. After finding $E_T$, $E_D$ can be calculated using
$$\frac{E_D}{E_T} = f(K_T). \tag{7}$$
To find an appropriate function $f$ that describes (7) accurately, the right and left sides of (7) should be drawn in an x-y plot, and the most suitable function that fits the drawn data should be found. Figure 2 shows regression curves of the daily diffuse ratio for four individual international sites.
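As an illustration, the chain from day number and latitude to the global irradiation in (1), (2), and (6) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the declination is computed with Cooper's approximation (an assumption, since the declination formula is not reproduced in this text), and the units follow (1) literally (W/m² multiplied by hours).

```python
import math

SOLAR_CONSTANT = 1367.0  # I_o in W/m^2, as stated in the text


def declination_deg(day_number: int) -> float:
    """Declination angle in degrees (ASSUMPTION: Cooper's approximation)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_number) / 365.0))


def day_length_hours(latitude_deg: float, day_number: int) -> float:
    """Day length per (2): (2/15) * arccos(-tan L * tan delta), arccos in degrees."""
    lat = math.radians(latitude_deg)
    dec = math.radians(declination_deg(day_number))
    # Clamp to [-1, 1] so rounding (or polar day/night) cannot break acos.
    cos_arg = max(-1.0, min(1.0, -math.tan(lat) * math.tan(dec)))
    return (2.0 / 15.0) * math.degrees(math.acos(cos_arg))


def extraterrestrial_irradiation(latitude_deg: float, day_number: int) -> float:
    """E_extra per (1), in Wh/m^2 when the day length is in hours."""
    factor = 1.0 + 0.034 * math.cos(2.0 * math.pi * day_number / 365.0)
    return SOLAR_CONSTANT * factor * day_length_hours(latitude_deg, day_number)


def global_irradiation(clearness_index: float, latitude_deg: float,
                       day_number: int) -> float:
    """E_T per (6): the clearness index K_T scales E_extra."""
    return clearness_index * extraterrestrial_irradiation(latitude_deg, day_number)
```

For a low-latitude site such as Kuala Lumpur (roughly 3° N, an illustrative value), the computed day length stays close to 12 hours all year, which is consistent with the stable climate described in the text.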
The recommended equation that fits the relations shown in Figure 2 is equation (8), proposed in [37]. In this paper, for accuracy purposes, a new equation (9) for calculating $E_D$ is derived from the collected data instead of using (8). Figure 3 shows regression curves of the daily diffuse ratio for five different locations in Malaysia. Equation (9) was derived from Figure 3, and using (9) to calculate $E_D$ in Malaysia, or in nearby locations, is more accurate than using (8). After finding $E_T$ and $E_D$, $E_B$ can be calculated using (5). Based on the mentioned equations, the calculation of solar radiation starts with predicting the clearness index ($K_T$); therefore, a prediction for $K_T$ using ANNs was completed, as detailed in the next section.
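Once a diffuse-ratio function $f(K_T)$ has been fitted, splitting $E_T$ into $E_D$ and $E_B$ via (7) and (5) might look like the sketch below. The fitted coefficients of (8) and (9) are not reproduced in this text, so `placeholder_ratio` is a purely hypothetical stand-in and is neither of those equations; the other names are illustrative as well.

```python
from typing import Callable, Tuple


def split_global_irradiation(
    e_t: float, k_t: float, diffuse_ratio: Callable[[float], float]
) -> Tuple[float, float]:
    """Return (E_D, E_B) for a horizontal surface.

    E_D follows (7): E_D = f(K_T) * E_T.
    E_B follows (5): E_B = E_T - E_D.
    """
    # Keep the ratio physically meaningful (between 0 and 1).
    ratio = min(1.0, max(0.0, diffuse_ratio(k_t)))
    e_d = ratio * e_t
    e_b = e_t - e_d
    return e_d, e_b


def placeholder_ratio(k_t: float) -> float:
    """HYPOTHETICAL diffuse-ratio curve, used only to exercise the code."""
    return 1.0 - k_t
```

Passing the fitted function as an argument keeps the decomposition independent of which regression curve is used, so (8) and (9) could be compared by swapping a single argument.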

Artificial Neural Network for Clearness Index Prediction
Artificial neural networks (ANNs) are information processing systems that are nonalgorithmic, nondigital, and intensely parallel [38]. They learn the relationship between the input and output variables by studying previously recorded data. An ANN resembles a biological neural system, composed of layers of parallel elemental units called neurons. The neurons are connected by a large number of weighted links, over which signals or information can pass. A neuron receives inputs over its incoming connections, combines the inputs, generally performs a nonlinear operation, and outputs the final result.
MATLAB was used to train and develop the ANNs for clearness index prediction. The neural network adopted was a feed forward multilayer perceptron (FFMLP) network, among the most commonly used neural networks that learn from examples. A schematic diagram of the basic architecture is shown in Figure 4. The network has three layers: the input, hidden, and output layers. The layers are interconnected by connection strengths, called weights. Four geographical and climatic variables were used as input parameters for the input nodes of the input layer. These variables were the day number, latitude, longitude, and daily sunshine hours ratio (i.e., measured sunshine duration over daily maximum possible sunshine duration). A single node was at the output layer, with the estimated daily clearness index as the output. The transfer function adopted for the neurons was a logistic sigmoid function $f(z_i)$:
$$f(z_i) = \frac{1}{1 + e^{-z_i}}, \qquad z_i = \sum_{j} w_{ij} x_j + \beta_i,$$
where $z_i$ is the weighted sum of the inputs, $x_j$ is the incoming signal from the $j$th neuron (at the input layer), $w_{ij}$ is the weight on the connection directed from neuron $j$ to neuron $i$ (at the hidden layer), and $\beta_i$ is the bias of neuron $i$.
Neural networks learn to solve a problem rather than being programmed to do so. Learning is achieved through training. In other words, training is the procedure by which the networks learn, and learning is the end result.
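The forward pass of such a three-layer network can be sketched in a few lines of Python. This is a minimal illustration rather than the MATLAB model used in this work: the function names, the hidden layer size, the assumption that the inputs are already scaled to a comparable range, and the use of a sigmoid on the output node are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Logistic sigmoid transfer function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(inputs: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> List[float]:
    """Each neuron i computes f(sum_j w_ij * x_j + beta_i)."""
    return [
        logistic(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def predict_clearness_index(inputs: Sequence[float],
                            hidden_weights: Sequence[Sequence[float]],
                            hidden_biases: Sequence[float],
                            output_weights: Sequence[float],
                            output_bias: float) -> float:
    """Forward pass: 4 inputs -> hidden layer -> single output node.

    `inputs` holds (day number, latitude, longitude, sunshine ratio),
    ASSUMED here to be scaled beforehand.
    """
    hidden = layer_output(inputs, hidden_weights, hidden_biases)
    return layer_output(hidden, [output_weights], [output_bias])[0]
```

With all weights and biases set to zero, every neuron outputs 0.5, which is a convenient sanity check before training assigns meaningful weights.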
The most common methodology, supervised training, was used. Measured daily clearness index data were given, and the network learned by comparing the measured data with the estimated output. The difference (i.e., the error) is propagated backward from the output layer using a backpropagation training algorithm, and the connection weights are adjusted to reduce it.
International Journal of Photoenergy

Results and Discussion
To ensure the efficacy of the developed network, five main sites were chosen out of the 28 sites in Malaysia. The chosen sites are Kuala Lumpur, Ipoh, Alor Setar, Kuching, and Johor Bahru. These sites span Malaysia and have been chosen to check the efficacy of the developed network over all of Malaysia.
The developed software first predicts the daily clearness index, then calculates the predicted global radiation based on (6), and finally calculates the diffused radiation by (9). Figure 5 shows the predicted clearness indexes compared with the measured values for the five chosen stations. The figure shows good agreement between the measurements and the predictions.

Daily Solar Radiation Prediction.
The best fit appears in the Johor Bahru and Kuching stations, while the worst is in the Alor Setar station. The fittings are all acceptable due to the low calculated error, as will be discussed in Section 5.3.
To evaluate the developed network, the measured values of the sunshine ratio for the year 2000 in each of the chosen sites have been used to predict the global and diffused solar radiation for this year. The predicted and estimated data were then compared with the measured data, which were also taken from the chosen sites for the same year. Figure 6 shows a comparison between the measured and predicted daily global solar radiation of the chosen sites.
In general, the prediction of the global radiation was acceptable and accurate. Based on the results, it is clear that Malaysia has a stable climate throughout the year. Cloud cover generally reduces the radiation by 50%, so the global irradiation fluctuated in the range of 2 to 6 kWh/m². The second part of the year (October to February) saw more cloud cover and, consequently, poorer solar potential compared with the first part of the year (March to October). Table 2 shows the yearly average global solar irradiation for the five sites. From the table, the best prediction is at the Kuala Lumpur station, while the worst is at Alor Setar. The Kuala Lumpur region has the highest solar potential. Figure 7 shows a comparison between the measured and predicted diffused solar irradiation.
The estimation of the diffused solar irradiation was evaluated using (9). In general, the estimation was acceptable. Table 4 shows the yearly average diffused solar irradiation for the five sites. Table 3 shows a comparison of the annual average of the diffused solar radiation estimated by the proposed equation (9) and estimated by equation (8), proposed in [37]. The proposed equation showed better accuracy in the estimation.

Monthly Solar Irradiation.
To get an idea of the monthly solar irradiation profile in Malaysia, the weather data of the five chosen sites were used again to predict the daily global and diffused solar irradiations at the five sites for five years (1999-2004). The monthly average global and diffused solar irradiations were then calculated and compared with the monthly averages of the measured data. Figure 8 shows the monthly average of the predicted global solar irradiations compared with the measured values.
As mentioned previously and as shown in Figure 8, the global solar irradiation values clearly decreased in the wet season (October to February) due to the heavy cloud cover and rains; however, most of the monthly averages of $E_T$ at all stations fluctuated in the range of 3.5-5.5 kWh/m². Figure 9 shows the monthly average of the estimated diffused solar irradiation compared with the measured values. Figure 9 also compares the diffused solar radiation values estimated by (8) with those estimated by (9).
From Figure 9, using the proposed equation (9) to estimate $E_D$ is better than using (8), proposed in [37]. The estimation of $E_D$ was not as accurate as the prediction of $E_T$, but knowing $E_T$ accurately is more important for the design of photovoltaic power systems because these systems are designed based on global radiation, not diffused radiation. In the case of solar water heaters (SWH), however, obtaining an accurate estimation of the diffused irradiation is important because SWH systems can only work by direct irradiation.
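The monthly averaging step described above can be sketched as follows. This is an illustrative helper under stated assumptions: the function name is hypothetical, and the daily values are assumed to be ordered by day number starting on 1 January of the given year.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Sequence


def monthly_means(year: int, daily_values: Sequence[float]) -> Dict[int, float]:
    """Average daily irradiation values per calendar month.

    daily_values[0] is ASSUMED to correspond to 1 January of `year`.
    Returns a mapping {month number: mean of that month's daily values}.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    start = date(year, 1, 1)
    for offset, value in enumerate(daily_values):
        month = (start + timedelta(days=offset)).month
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sorted(sums)}
```

Applying the same function to the measured and the predicted daily series gives two monthly profiles that can be compared month by month.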

Developed ANN Evaluation.
As mentioned above, predicted values (daily global and diffused irradiations) have been compared with measured values to calculate the mean absolute percentage error (MAPE). The MAPE is defined as
$$\text{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{I_i - I_{Pi}}{I_i} \right|,$$
where $I_{Pi}$ is the predicted value, $I_i$ is the measured value, and $n$ is the number of observations. The MAPE values of the chosen sites are listed in Table 4. The average error in predicting the global solar irradiation was 5.86%. As for the error in estimating the diffused solar irradiation, the proposed equation has an average error of 9.8%, while the error in estimating the diffused solar irradiation using (8) was 10.1%. Thus, a slight error reduction is achieved by using (9) instead of (8) for the estimation of diffused solar irradiation in Malaysia and nearby regions.
Figure 6: Comparison between the measured and predicted daily global radiation for the chosen five sites.
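For reference, the MAPE in its usual form can be computed as in the sketch below. The function name and the input checks are illustrative choices and are not taken from the original software.

```python
from typing import Sequence


def mape(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Each term is |measured - predicted| / |measured|, so measured values
    must be nonzero.
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((m - p) / m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)
```

Because each error is divided by the measured value, days with very low measured irradiation weigh heavily in the average, which is worth keeping in mind when comparing stations.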
Additionally, most authors who have worked in this field evaluated the performance of their ANN models quantitatively, and ascertained whether there is any underlying trend in the performance of the models in different climates, using statistical analysis involving the mean bias error (MBE) and root mean square error (RMSE). These statistics were determined as
$$\text{MBE} = \frac{1}{n} \sum_{i=1}^{n} \left( I_{Pi} - I_i \right), \qquad \text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( I_{Pi} - I_i \right)^2 },$$
where $I_{Pi}$ is the predicted daily global irradiation on a horizontal surface, $I_i$ is the measured daily global irradiation on a horizontal surface, and $n$ is the number of observations. MBE is an indication of the average deviation of the predicted values from the corresponding measured data and can provide information on the long-term performance of the models. A positive MBE value indicates the amount of overestimation in the predicted global solar radiation, and a negative value indicates underestimation. RMSE provides information on the short-term performance and is a measure of the variation of the predicted values around the measured data, indicated by the scattering of data around the linear lines shown in Figure 5. Table 5 shows the MBE and RMSE values for the chosen sites.
From Table 5, the MBE of the Kuala Lumpur station was −0.18%, meaning that the predicted values are underestimated by 0.018%, while every other station showed a slight overestimation. The average MBE for the developed network is 0.673 kWh/m², meaning the predicted values were overestimated by 1.46%.
The RMSE shows the efficiency of the developed network in predicting future individual values. A large RMSE means a large deviation of the predicted value from the real value. The average RMSE for the developed network is 0.3684 kWh/m², meaning a deviation of 7.96% is possible in a predicted individual value. Table 6 shows a comparison between the proposed network and other proposed networks; the comparison is based on the MAPE, MBE, RMSE, number of network inputs, and the type of network.
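The two statistics can be computed with a short sketch such as the one below, using the sign convention stated in the text (predicted minus measured, so a positive MBE means overestimation). The function names are illustrative.

```python
import math
from typing import Sequence


def mean_bias_error(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """MBE: mean of (predicted - measured); positive means overestimation."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - m for m, p in zip(measured, predicted)) / len(measured)


def root_mean_square_error(measured: Sequence[float],
                           predicted: Sequence[float]) -> float:
    """RMSE: square root of the mean squared (predicted - measured)."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - m) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))
```

Errors of opposite sign cancel in the MBE but not in the RMSE, which is why a model can show a small bias and still have a noticeable day-to-day scatter.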

Conclusion
A prediction of global solar irradiation using an ANN is developed. This prediction was based on data collected from 28 sites in Malaysia. The average MAPE, MBE, and RMSE for the predicted global solar irradiation are 5.92%, 1.46%, and 7.96%, respectively. The MAPE in estimating the diffused solar irradiation is 9.8%.
Figure 7: Comparison between the measured and predicted diffused solar irradiation (horizontal axis: day number).