Estimation of Monthly Sunshine Duration in Turkey Using Artificial Neural Networks

This paper introduces an artificial neural network (ANN) approach for estimating monthly mean daily values of global sunshine duration (SD) for Turkey. Three different ANN models, namely, GRNN, MLP, and RBF, were used in the estimation process. One climatic variable (cloud cover) and two geographical variables (day length and month) were used as input parameters to obtain monthly mean SD as output. The datasets of 34 stations spread across Turkey were split into two parts: the first part, covering 21 years (1980–2000), was used for training, and the second part, covering the last six years (2001–2006), was used for testing. Statistical indicators showed that the GRNN and MLP models produced better results than the RBF model and can be used safely for the estimation of monthly mean SD.


Introduction
The importance of solar energy is increasing because the demand for energy grows from day to day while the world's energy resources are limited. For this reason, a large number of studies have been carried out, and will continue to be carried out, to benefit from solar energy, which is free and clean. On the other hand, significant long-term variations of solar radiation cause important climatic changes that affect life on Earth in many ways. For example, a decrease in the amount of solar radiation was observed between the 1950s and the early 1980s, known as global dimming, and this phenomenon reversed from the second half of the 1980s, known as global brightening [1][2][3][4][5][6]. It is therefore crucial to know and analyze the variations of solar radiation in time and space. Global solar radiation is generally measured with actinographs, which are often unreliable because the thermal sensitivity of the mechanical components of their sensors requires routine calibration. More accurate measurements can be made with networks of calibrated modern pyranometers, but many countries lack such networks because these instruments are expensive. Solar radiation has been shown to be highly correlated with sunshine duration (SD) [7][8][9][10]. This means that information about SD over an area also provides information about solar radiation over that area, and vice versa. Hence, long-term accurate observations of global SD are very important and necessary for climatological and other applications [1,6,7,[11][12][13]. In fact, SD has been measured at many locations around the world for long periods, more accurately than solar radiation and with cheaper instruments. For instance, it has been measured at Ankara station in Turkey since 1928 [14]. SD measurements have been made in the Netherlands since 1901 [9] and at eight meteorological stations in Taiwan since 1910 or earlier [15]. Records of sunshine hours have been available from four stations in Ireland since the late 19th century [16]. Measurements of sunshine duration in the United States began in 1891 at 20 U.S. Weather Bureau (USWB) stations [17]. Although SD can be measured at almost all meteorological stations, the networks of meteorological stations are still insufficient or limited in some parts of the world because of geographical and sometimes financial problems. As a result, some countries have limited or unreliable SD databases and maps. To overcome this problem, estimation techniques have been developed for areas where SD measurements are unavailable or unreliable, especially remote and inaccessible regions. Up to now, many researchers have concentrated on the estimation of global solar radiation, so numerous such studies can be found in the literature. In contrast, only a limited number of studies have addressed the estimation of SD, its relation to geographic and climatological variables, and its variation in time and space.
Reddy developed an empirical model to calculate sunshine from total cloud amount [18]. The relationship between point cloudiness and sunshine-derived cloud cover was investigated by Raju and Kumar using data recorded at 34 stations in India [19]. Rangarajan et al. proposed a cubic regression equation between monthly mean values of the fraction of the sky covered by clouds and the duration of bright sunshine, and used this relationship to compute SD from cloud cover data [20]. Harrison and Coombes derived a second-order regression equation describing the statistical relationship between long-term averages of monthly cloud shade and point cloudiness using data from 43 Canadian weather stations [21]. Yin used monthly data from 729 stations worldwide to find a generic algorithm that captures the global variation of bright SD in relation to temperature, precipitation, and geographic location [22]. El-Metwally proposed a nonlinear relationship based on maximum and minimum air temperatures and cloud cover fraction for estimating relative SD [23]. Matzarakis and Katsoulis estimated the mean annual and seasonal duration of bright sunshine with an empirical formula that depends on each station's distance from the nearest coast, its height above sea level, the percentage of land cover around it, its latitude, and its longitude [24]. Kandirmaz proposed a simple model for estimating daily global SD and constructed a spatially continuous map of SD over Turkey using data from a meteorological geostationary satellite [25]. Robaa established a simple model using observed cloud data and estimated SD for any region in Egypt [26]. Kandirmaz and Kaba showed that the statistical relation between the daily satellite-derived cloud cover index and measured SD is quadratic rather than linear [27].
Recently, artificial neural networks (ANNs) have been used extensively to solve problems in many areas, such as optimization, prediction, and modeling in engineering, climate science, pattern classification, and economics. ANNs are effective tools for modeling nonlinear systems [28][29][30][31]. Many previous studies have shown that ANN techniques are a strong alternative to classical regression models for predicting global solar radiation. In ANN-based solar radiation studies, meteorological, geographical, and climatological variables such as SD, temperature, cloud cover, relative humidity, wind speed, vapor pressure, precipitation, elevation, latitude, longitude, month, and satellite-recorded or satellite-derived variables have been used as inputs to obtain solar radiation as the output [32][33][34][35][36][37][38][39][40][41][42][43][44][45]. Although many ANN-based studies of solar radiation estimation exist in the literature, there have been only two such studies for the estimation of SD. Jervase et al. generated contour maps of sunshine hours and sunshine ratios for Oman using a radial basis function (RBF) neural network model [46]; in their study, latitude, longitude, altitude, and month of the year were used as input parameters. Mohandes and Rehman estimated SD over Saudi Arabia using two neural network algorithms with maximum possible day length, extraterrestrial solar radiation at the location, latitude, longitude, altitude, and month number as input parameters [47]. Only a few studies have been devoted to the estimation, determination, and distribution of SD for Turkey. Aksoy analyzed the changes in SD at Ankara station [48]. Kandirmaz [25] and Kandirmaz and Kaba [27] used satellite data to predict SD in Turkey. Şahin developed a simple model and estimated SD for several stations in Turkey [49]. Yildirim et al. analyzed trends in the measured SD of 36 stations [50]. The present work is the first study in which an ANN approach is proposed for estimating SD in Turkey. To obtain the best possible results, we constructed three different ANN models using cloud cover, day length, and month as input parameters.

Data Sources and Methodology
This research was conducted for Turkey, which is located between latitudes 36°N and 42°N and longitudes 26°E and 45°E and has seven different climatic zones, namely, Black Sea, Marmara, Eastern Anatolian, Southeastern Anatolian, Mediterranean, Aegean, and Central Anatolian [51]. Its average annual solar radiation was determined as 1311 kWh/m²·year (3.6 kWh/m²·day) and its annual average total SD as 2640 hours (7.2 hours/day) for the period from 1966 to 1982 [52]. Ground-measured daily SD and cloud fraction data were collected from 34 selected stations of the Turkish State Meteorological Service (TSMS). These stations cover almost the whole country and thereby reflect all of its climatic properties. The geographical positions of these stations are shown in Figure 1.
Previous studies have shown that SD can be related to cloud cover, air temperature, precipitation, relative humidity, wind speed, and geographical variables, or to combinations of these variables [53][54][55][56][57][58]. Clouds, consisting of liquid water droplets or ice particles, reduce incoming solar radiation in many ways before it reaches the Earth's surface. A physical description of the interactions between clouds and solar radiation is rather difficult, because these interactions depend on the size and shape of the droplets or particles, the total mass of water, and its spatial distribution. Other meteorological variables may also affect incoming solar radiation, but the main and greatest effect comes from clouds [10,27].
Day length, which is a function of the latitude of the site and the solar declination, gives the maximum possible duration of sunshine in a day. Since cloud cover and day length are the main factors determining daily bright sunshine hours, they were used as input data in our ANN models. The daily cloud coverage and bright sunshine hours used in this study were made available by TSMS. Cloud coverage is observed three times a day by trained meteorologists, and the average of the three observations is assigned to the day. Bright sunshine hours are recorded with a Campbell-Stokes sunshine recorder, in which a special lens concentrates the direct solar beam onto a dark paper card; sunshine is registered whenever the beam is intense enough to burn the card. The World Meteorological Organization (WMO) accepts a mean direct solar irradiance of 120 W m⁻² as the threshold value. SD should be measured with an uncertainty of 0.1 h and a resolution of 0.1 h. The sunshine recorders are routinely adjusted and calibrated by TSMS. Day length is calculated for the station of interest using the formula given in [59], in which φ is the latitude of the location, δ is the solar declination, and n is the day number of the year counted from 1 January.
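As a sketch, assuming the commonly used form S₀ = (2/15)·arccos(−tan φ tan δ), with the hour angle in degrees, together with Cooper's declination approximation δ = 23.45° sin[360°(284 + n)/365] (either of which may differ in detail from the expression in [59]), day length can be computed as below. The block also applies the WMO criterion (direct normal irradiance above 120 W m⁻²) to a series of irradiance samples; the function names and the sampling interval are hypothetical.

```python
import math

WMO_SUNSHINE_THRESHOLD = 120.0  # W m^-2, WMO threshold for direct normal irradiance


def solar_declination_deg(n: int) -> float:
    """Cooper's approximation of solar declination (degrees); n = 1 on 1 January.

    Assumption: reference [59] may use a different declination formula.
    """
    return 23.45 * math.sin(math.radians(360.0 * (284 + n) / 365.0))


def day_length_hours(latitude_deg: float, n: int) -> float:
    """Maximum possible sunshine duration S0 (hours) for a latitude and day of year."""
    phi = math.radians(latitude_deg)
    delta = math.radians(solar_declination_deg(n))
    # Clamp to [-1, 1] so polar day or night does not raise a math domain error.
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    ws_deg = math.degrees(math.acos(cos_ws))  # sunset hour angle in degrees
    return 2.0 * ws_deg / 15.0  # the Earth rotates 15 degrees per hour


def sunshine_hours(dni_samples, minutes_per_sample: float = 1.0) -> float:
    """Hypothetical helper: hours during which DNI exceeds the WMO threshold.

    A Campbell-Stokes recorder approximates this criterion by burning a card.
    """
    sunny = sum(1 for g in dni_samples if g > WMO_SUNSHINE_THRESHOLD)
    return sunny * minutes_per_sample / 60.0


if __name__ == "__main__":
    # Ankara lies at roughly 39.95 N; n = 172 is about 21 June, n = 355 about 21 December.
    print(round(day_length_hours(39.95, 172), 2))
    print(round(day_length_hours(39.95, 355), 2))
```

Dividing measured SD by S₀ gives the relative sunshine, which is bounded by 1 and removes most of the purely astronomical seasonal cycle.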
The model outputs were validated using five statistical indices: root mean square error (RMSE), percentage root mean square error (%RMSE), mean bias error (MBE), mean absolute error (MAE), and percentage mean absolute error (%MAE), all computed from the differences between the estimated and measured values over the dataset. MBE indicates whether a model tends to overestimate or underestimate the measured values and generally provides useful information on long-term performance. RMSE, on the other hand, generally provides valuable information on short-term performance, since it measures the magnitude of the individual differences between measured and estimated values. The closer RMSE, %RMSE, MAE, %MAE, and the absolute value of MBE are to zero, the better the performance of a model.
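The five indices can be computed as in the sketch below. It assumes that errors are taken as estimated minus measured (so a positive MBE indicates overestimation) and that the percentage forms are normalized by the mean measured value; the paper may normalize %RMSE and %MAE differently, so those two definitions are assumptions.

```python
import math
from typing import Dict, Sequence


def error_indices(estimated: Sequence[float], measured: Sequence[float]) -> Dict[str, float]:
    """RMSE, %RMSE, MBE, MAE and %MAE for paired estimated/measured values.

    Assumption: percentage indices are normalized by the mean measured value.
    """
    if len(estimated) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(measured)
    diffs = [e - m for e, m in zip(estimated, measured)]  # positive = overestimate
    mean_measured = sum(measured) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mbe = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    return {
        "RMSE": rmse,
        "%RMSE": 100.0 * rmse / mean_measured,
        "MBE": mbe,
        "MAE": mae,
        "%MAE": 100.0 * mae / mean_measured,
    }
```

Because MBE keeps the sign of each error, positive and negative deviations can cancel, which is why it is read together with MAE and RMSE; RMSE squares the errors and is therefore more sensitive than MAE to occasional large deviations.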

International Journal of Photoenergy

Artificial Neural Network.
An ANN consists of processing units (nodes), modeled loosely on biological neurons, that are linked together according to a specific architecture. ANNs generally have one input layer, one output layer, and one or more hidden layers. The hidden and output layers contain activation functions, and neurons in adjacent layers are interconnected. Each input is multiplied by its connection weight, and the sum of these products plus a bias is passed through a transfer (activation) function. The output layer thus reflects the combined effect of all the neurons in the network [34,60].

Generalized Regression Neural Network (GRNN).
The input layer holds the corresponding input without any operation and supplies the input vector to the pattern layer. The pattern layer contains one neuron for each training datum. In this layer, the weighted squared Euclidean distance is calculated as shown in (5): any test input applied to the network is first subtracted from the values stored in the pattern-layer neurons, and then, depending on the distance function, either the squares or the absolute values of the differences are summed and passed to the activation function. The exponential function is the most popular activation function. The results are transferred to the summation layer, whose neurons compute the dot product of the pattern-layer outputs and the weights. The GRNN is also known as a normalized RBF-NN; its RBF units are the probability density functions shown in (7). In the GRNN structure, only the smoothing parameter, also known as the bandwidth, is updated during the training phase [66,73]. Its value is important for network performance: a smaller smoothing parameter limits the number of effective samples, whereas a larger one extends the radius of effective neighbors [62][63][64][65][66][67][68].
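The GRNN computation just described (one Gaussian pattern neuron per training sample, two summation neurons, and a division at the output) can be sketched in NumPy as follows. This is a minimal illustration of the standard GRNN estimator, not the authors' MATLAB model: it assumes an unweighted squared Euclidean distance, a Gaussian kernel, and a single smoothing parameter called `sigma` here, whose default value is arbitrary.

```python
import numpy as np


class GRNN:
    """Generalized regression neural network (Nadaraya-Watson kernel regression).

    Only the smoothing parameter sigma is tuned; "training" just stores the data.
    """

    def __init__(self, sigma: float = 0.5):
        self.sigma = sigma

    def fit(self, X, y) -> "GRNN":
        # Pattern layer: one neuron per training sample.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y, dtype=float)
        return self

    def predict(self, X) -> np.ndarray:
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance between every query and every stored pattern.
        d2 = ((X[:, None, :] - self.X_train[None, :, :]) ** 2).sum(axis=2)
        # Pattern-layer outputs: exponential (Gaussian) activation.
        k = np.exp(-d2 / (2.0 * self.sigma ** 2))
        # Summation layer: plain sum of kernels and target-weighted sum.
        # Note: a very small sigma can make every kernel underflow to 0.
        denominator = k.sum(axis=1)
        numerator = k @ self.y_train
        # Output layer: weighted sum divided by plain sum.
        return numerator / denominator
```

With a small `sigma` the prediction approaches the target of the nearest training sample, and with a large `sigma` it approaches the mean of all targets, which is the trade-off between few and many effective neighbors described above.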

Multilayer Perceptron (MLP).
Unlike the single perceptron, the multilayer perceptron (MLP) contains an input layer, an output layer, and one or more hidden layers of computation nodes [73]. Figure 3 shows an example of an MLP architecture. The MLP overcomes the limitations of the single perceptron. The input layer passes the inputs to the first hidden layer without any operation, the main computations are carried out in the hidden layers, and the last hidden layer transmits its results to the output layer. The model of each neuron includes a differentiable nonlinear activation function, and connectivity is provided by the synaptic weights of the network. The standard training method for the MLP is the error backpropagation algorithm, a generalization of the least-mean-square (LMS) algorithm; gradient descent is the most popular way of performing the weight updates. Training consists of a forward step and a backward step. In the forward step, the synaptic weights are held fixed and the input signal is propagated through the network: each hidden and output neuron computes a continuous nonlinear function of its inputs and of the weights associated with that neuron. The activation functions typically used in hidden neurons are the hyperbolic tangent sigmoid and the logarithmic sigmoid functions given in (8) and (9), respectively:

$$f(x) = \frac{2}{1 + e^{-2x}} - 1, \qquad (8)$$

$$f(x) = \frac{1}{1 + e^{-x}}. \qquad (9)$$

The forward-step calculations are summarized in (10). Each unit j in layer l receives the activations a_i^{l-1} of the previous layer and sends its activation a_j^l to the next layer of units:

$$a_j^{l} = f\Big(\sum_i w_{ji}^{l}\, a_i^{l-1} + b_j^{l}\Big). \qquad (10)$$

In the backward step, the error is computed as the difference between the network output and the desired output. This error is propagated backward through the network, and the weights are updated so as to minimize the squared error; to do so, each hidden and output neuron computes the gradient of the error with respect to its weights. The weight update is given in (11), where y_k^p denotes the output of the kth output-layer neuron for the pth data, η denotes the learning rate, err is the error function, t_k^p denotes the target value of the pth data for the kth output-layer neuron, and w_{ji}^l denotes the weight between the jth neuron of layer l and the ith neuron of layer (l − 1):

$$err = \frac{1}{2}\sum_k \big(t_k^{p} - y_k^{p}\big)^2, \qquad w_{ji}^{l} \leftarrow w_{ji}^{l} - \eta\,\frac{\partial\, err}{\partial w_{ji}^{l}}. \qquad (11)$$

In our study, the best results for the MLP were obtained with three inputs, 13 neurons in a single hidden layer, and a single neuron in the output layer. The tansig and purelin functions were used as the activation functions of the hidden and output layers, respectively.
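To illustrate the 3-13-1 MLP described above (tansig hidden layer, purelin output), the sketch below implements the forward step and one gradient-descent backpropagation update for a single sample. The random initialization, the learning rate, and the squared-error loss with a factor 1/2 are assumptions; the text does not state which MATLAB training function was used, so this is not a reproduction of the authors' training run.

```python
import numpy as np


def tansig(x):
    # Hyperbolic tangent sigmoid, 2 / (1 + exp(-2x)) - 1, which equals tanh(x).
    return np.tanh(x)


class MLP:
    """3-13-1 perceptron: tansig hidden layer, linear (purelin) output layer."""

    def __init__(self, n_in=3, n_hidden=13, n_out=1, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed initialization: small Gaussian weights, zero biases.
        self.W1 = rng.normal(0.0, 0.5, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, size=(n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        # Forward step: weights are fixed; activations flow layer by layer.
        a1 = tansig(self.W1 @ x + self.b1)
        y = self.W2 @ a1 + self.b2  # purelin: identity activation
        return a1, y

    def train_step(self, x, t, lr=0.01):
        """One gradient-descent update of err = 0.5 * sum((y - t)**2).

        Returns the error measured before the update.
        """
        a1, y = self.forward(x)
        e = y - t
        # Backward step: the output delta equals e for a linear output;
        # the hidden delta uses the tanh derivative 1 - a1**2.
        delta2 = e
        delta1 = (self.W2.T @ delta2) * (1.0 - a1 ** 2)
        # w <- w - lr * d(err)/dw for every layer.
        self.W2 -= lr * np.outer(delta2, a1)
        self.b2 -= lr * delta2
        self.W1 -= lr * np.outer(delta1, x)
        self.b1 -= lr * delta1
        return 0.5 * float(e @ e)
```

Here the three inputs would correspond to cloud cover, day length, and month; in practice they are usually scaled to comparable ranges before training, a preprocessing step not described in the text and omitted here.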

Radial Basis Function Network.
The radial basis function neural network (RBF-NN) is a three-layer feed-forward network applicable to various regression and classification problems. The structure of the RBF network is given in Figure 4.
The input layer has d neurons for d-dimensional inputs. The hidden layer contains as many neurons as there are training samples, N. The activation function of the hidden neurons is the radial basis function given in (12), where the ith data point c_i is the center of the RBF, x is the input, and σ_i is the width of the ith Gaussian function:

$$\varphi_i(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{c}_i \rVert^2}{2\sigma_i^2}\right). \qquad (12)$$

The output layer consists of neurons that compute the weighted sum of the hidden-layer outputs, as given in (13). Although there is no limit on the size of the output layer, it is typically smaller than the hidden layer:

$$y_k(\mathbf{x}) = \sum_{i=1}^{N} w_{ki}\,\varphi_i(\mathbf{x}). \qquad (13)$$
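With one Gaussian hidden neuron centered on every training sample, as described above, the output weights can be obtained by solving a linear system (an exact-interpolation design). The sketch below assumes a single common width `sigma` for all neurons and adds a small `ridge` term for numerical stability; both are assumptions, since the text allows a separate width for each neuron and does not state how the output weights were fitted.

```python
import numpy as np


def gaussian_design(X, centers, sigma):
    # Hidden-layer outputs phi_i(x) = exp(-||x - c_i||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


class RBFNet:
    """Exact-design RBF network: one hidden neuron per training sample."""

    def __init__(self, sigma=1.0, ridge=1e-8):
        self.sigma = sigma
        self.ridge = ridge  # assumption: tiny regularization for conditioning

    def fit(self, X, y):
        self.centers = np.asarray(X, dtype=float)
        Phi = gaussian_design(self.centers, self.centers, self.sigma)  # N x N
        # Output weights: solve (Phi + ridge * I) w = y.
        A = Phi + self.ridge * np.eye(len(Phi))
        self.w = np.linalg.solve(A, np.asarray(y, dtype=float))
        return self

    def predict(self, X):
        # Output layer: weighted sum of hidden-layer activations.
        return gaussian_design(np.asarray(X, dtype=float), self.centers, self.sigma) @ self.w
```

Because the network interpolates the training targets almost exactly, its accuracy on unseen data depends strongly on the width; a width that is too small tends to overfit noisy station records.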

Results and Discussion
In this study, three different ANN models, namely, GRNN, MLP, and RBF, were employed to estimate the monthly mean daily SD for 34 stations in Turkey. The inputs of the networks were the month and the monthly mean values of cloud coverage and day length, and the output was the monthly mean daily SD. The data of the 34 selected stations were subdivided into two separate datasets. Figure 5(b) shows that the GRNN model produced the closest results to the observed values at 14 stations (Rize, Trabzon, Bursa, Erzincan, Erzurum, Sivas, Ankara, Kirikkale, Denizli, Adana, Elazig, Mersin, Izmir, and Antalya). The MLP model produced the closest results at 11 stations (Kocaeli, Samsun, Kastamonu, Zonguldak, Tokat, Afyon, Istanbul, Yozgat, Iskenderun, Adiyaman, and Van), and the RBF model gave the best predictions for the remaining nine stations (Bolu, Sakarya, Kutahya, Kahramanmaras, Gaziantep, Kayseri, Konya, Diyarbakir, and Sanliurfa). All three models gave low MAE values (less than 0.2 h) for the Bolu, Zonguldak, Tokat, Ankara, Konya, Izmir, Van, and Antalya stations and high MAE values (greater than 0.4 h) for the Bursa, Kahramanmaras, and Sanliurfa stations. On the other hand, the MLP model produced lower RMSE values for 17 stations (Bolu, Kocaeli, Samsun, Kutahya, Kastamonu, Afyon, Istanbul, Ankara, Iskenderun, Kirikkale, Denizli, Yozgat, Adana, predicted values was found to be 0.9530, 0.9563, and 0.8902, respectively.
The monthly mean daily SD values of the stations span a range of about 4.0 h (from a minimum of about 4.4 h to a maximum of about 8.4 h), which reflects the geographically and climatologically diverse zones of the country. In general, lower SD values were observed at stations in the Black Sea Region (Rize, Trabzon, Bolu, Samsun, Kastamonu, and Zonguldak), the Marmara Region (Sakarya, Istanbul, Kocaeli, and Bursa), which is strongly affected by the Black Sea climate, the northern part of Eastern Anatolia (Erzincan and Erzurum), and Central Anatolia (Kutahya and Tokat). This was expected, because the Black Sea coast has many more cloudy days than the other regions and receives the greatest amount of rainfall. On the other hand, higher SD values were observed at stations in Southeastern Anatolia (Sanliurfa, Diyarbakir, Adiyaman, and Gaziantep), the Mediterranean (Antalya, Mersin, Adana, Iskenderun, and Kahramanmaras), the Aegean (Izmir and Denizli), Central Anatolia (Ankara, Kirikkale, Konya, Kayseri, and Yozgat), and the southern part of Eastern Anatolia (Van, at latitude 38.47°N). In general, cloudiness increases from lower latitudes (the south of the country) to higher latitudes (the north of the country).
The results of the current study are comparable with those of previous studies in which neural network approaches and other methods were used for other geographical locations. Jervase et al. used an RBF neural network model and found that the estimated values deviated from the measured values by between 0 and 1 hour for the stations in Oman [46]. Mohandes and Rehman obtained minimum mean absolute percentage errors (MAPE) of 2.3% and 2.7% for Al-Madina station and maximum values of 22.9% and 16.7% for Al-Numas station in Saudi Arabia using the PSO and SVM methods [47]. The daily duration of sunshine varies between 7.4 h and 9.4 h, and its average daily value is approximately 8.89 h for Saudi Arabia [47] and 9.5 h for Oman [74]. Because of their locations, both countries have fewer cloudy days, more solar energy, and longer SD than Turkey. This is probably why Mohandes and Rehman [47] and Jervase et al. [46] did not use cloudiness as an input parameter in their studies. In Turkey, by contrast, the effect of cloudiness on SD is much more pronounced, so cloudiness was used as an input parameter in the present study.
El-Metwally estimated relative SD for six sites in Egypt; the MBE% and RMSE% values ranged from −0.2% to −13.3% and from 2.3% to 14.5%, respectively [23]. Matzarakis and Katsoulis estimated the temporal and spatial distribution of bright sunshine hours over Greece; the correlation coefficient and RMSE were 0.87 and 9.90 h, 0.58 and 6.15 h, 0.89 and 4.69 h, 0.86 and 6.22 h, and 0.84 and 5.33 h for annual sunshine, winter, spring, summer, and autumn, respectively [24]. Robaa estimated relative SD from cloud data for Egypt using three empirical formulae; the relative percentage error, mean percentage error (MPE), MBE, and RMSE ranged from −7.2698% to +3.7908%, −0.6240% to +0.8069%, −0.0053 to +0.0070, and 0.0046 to 0.0160, respectively [26]. Stanghellini [75] set up a simple model to evaluate monthly SD for various sites in Italy on the basis of the mean daily cloudiness; the monthly MBE was found to range from −17.3 h to 14.9 h.

Conclusions
This paper presents a study on the estimation of monthly mean daily SD using three ANN methods, GRNN, MLP, and RBF, applied to 34 stations in Turkey. Month, day length, and cloud coverage were selected as the input parameters of the constructed models. Since day length can be calculated from astronomical factors and cloud cover can be obtained visually, monthly mean SD can be determined accurately for any region, without any sunshine-measuring instrument, by choosing an appropriate ANN model, provided that a sufficient historical database of SD and cloud cover exists. The statistical indicators showed that the GRNN and MLP models work better than the RBF model. The results are satisfactory considering that Turkey has geographically and climatologically diverse zones, so that SD is far from homogeneously distributed over the country. It should also be noted that the visual determination of cloud coverage is subjective, which may introduce errors and affect the accuracy of the models used here.

Figure 1: The geographical distribution of the 34 meteorological stations in Turkey.

Figure 2
The summation-layer weights are determined by the target values of the training data stored in the pattern layer. One summation neuron forms the sum of the pattern-layer outputs, which depend on a Parzen-window smoothing constant, and the other forms the sum of the pattern-layer outputs multiplied by the training target values. At the output layer, the second sum is divided by the first to estimate the desired output, as given in (6) and (7) [61, 71, 72].

Figure 3: Structure of the MLP.
The first part, covering the first 21 years (1980–2000), was used in the training process, and the second part, covering the last six years (2001–2006), was used in the testing process. The models were constructed and optimized by varying the number of neurons in the hidden layer. The Matlab ANN toolbox was used for all models. Monthly mean daily SD values were estimated for each station by the proposed ANNs, and their averages were compared with the values recorded at the meteorological stations from 2001 to 2006. Figure 5(a) compares the observed and estimated values for each station over this period. In general, the simulated values were very close to those observed at the meteorological stations. To further clarify these results, MAE and RMSE were calculated for each model and station and are shown in Figures 5(b) and 5(c), respectively.
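The data preparation described above (averaging daily records into monthly mean daily values and splitting them chronologically into 1980–2000 for training and 2001–2006 for testing) can be sketched as below. The record layout of (year, month, day, value) tuples and all function names are hypothetical, and averaging the test-period values per calendar month is only one possible reading of how the averages were compared with the station records.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (year, month, day, daily sunshine hours).
DailyRecord = Tuple[int, int, int, float]
MonthlyMeans = Dict[Tuple[int, int], float]


def monthly_mean_daily(records: Iterable[DailyRecord]) -> MonthlyMeans:
    """Average the daily values within each (year, month)."""
    groups: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for year, month, _day, value in records:
        groups[(year, month)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}


def chronological_split(monthly: MonthlyMeans,
                        train_years=(1980, 2000),
                        test_years=(2001, 2006)):
    """Split monthly means by year, without shuffling, into training and test sets."""
    train = {k: v for k, v in monthly.items() if train_years[0] <= k[0] <= train_years[1]}
    test = {k: v for k, v in monthly.items() if test_years[0] <= k[0] <= test_years[1]}
    return train, test


def average_by_calendar_month(monthly: MonthlyMeans) -> Dict[int, float]:
    """Average the monthly means over all years, giving one value per calendar month."""
    by_month: Dict[int, List[float]] = defaultdict(list)
    for (_year, month), value in monthly.items():
        by_month[month].append(value)
    return {m: mean(v) for m, v in sorted(by_month.items())}
```

Splitting by year rather than at random keeps the test period strictly later than the training period, so the test scores reflect how well the models generalize to new years rather than to months interleaved with the training data.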