Regression Model to Predict Global Solar Irradiance in Malaysia

A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed through regression analysis of different available meteorological parameters, including temperature, cloud cover, rain precipitation, relative humidity, wind speed, pressure, and gust speed. This paper reports the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared with other models available from the literature in terms of the root mean square error (RMSE), mean bias error (MBE), and the coefficient of determination ($R^2$). Seven models based on single parameters (PM1 to PM7) and five multiple-parameter models (PM8 to PM12) are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, $R^2$ ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included in multiple-parameter models, although it performs fairly well in single-parameter prediction models.


Introduction
Solar power is one of the major renewable energy sources and is both unlimited and free. Solar energy helps support the significantly increasing global energy demand. Given its availability in nearly all parts of the world, solar power is now widely accepted as one of the major underutilized energy sources. Fossil fuels are nonrenewable and depleting; these energy sources will eventually run out because of the immense consumption needed to support daily energy requirements worldwide. Thus, solar energy may be used to replace fossil fuels. In addition, solar energy has the advantage of being environmentally friendly because it does not pollute the surroundings or produce any hazardous waste [1].
To date, the total amount of power produced by solar power plants in Malaysia is approximately 20,493 MW; this value is estimated to increase by up to 23,099 MW in 2020 [2]. Thermal power plants and hydropower plants produced 7,103 MW and 1,911 MW, respectively, in 2013 [2]. These figures indicate that the energy harnessed by solar power is considerably higher than that harnessed by hydropower and thermal power plants. Therefore, generating energy from solar power is a good alternative for Malaysia because it is a renewable and infinite source.
Solar energy is widely used, particularly in the agricultural, architectural, and biological industries; the two most common technologies used to generate solar power are photovoltaic cells and solar thermal energy [3]. Photovoltaic cells convert solar energy into electricity, whereas solar thermal energy implements the concept of heating and cooling through the absorption and emission of solar radiation. Various advanced technologies have been reported recently, such as solar-powered unmanned aerial vehicles that use solar energy to operate [4-6]. Therefore, the precise prediction of available solar irradiance is essential in several fields, particularly for developing highly efficient solar energy systems.
Solar radiation comprises three main elements, namely, global, direct, and diffuse solar radiation. Global solar radiation is the total amount of undisturbed irradiance emitted by the sun. Direct solar radiation is the amount of radiation from direct solar beams that fall onto a unit area perpendicular to the beam at the surface of the Earth. Diffuse solar irradiance is the portion of irradiance that is scattered and reflected by different atmospheric components, such as the surfaces of clouds, particulate matter, pollution, water vapor, aerosols, and other components in the atmosphere that may block direct solar radiation [7-9]. This scattering may reduce the amount of solar radiation hitting the surface of the Earth, which directly reduces the amount of solar energy that may be harnessed.
International Journal of Photoenergy
Solar radiation data are collected by three methods: direct estimation through (1) in situ measurements and (2) satellite data, and indirect estimation by (3) statistical techniques. Among these three methods, in situ measurement is the most challenging because it requires pyranometers and other sensors, which are highly expensive and difficult to maintain and calibrate. Moreover, this technology is only available in some countries. Satellite data modeling techniques may provide precise and reliable global solar irradiance estimates. However, these techniques are relatively expensive and difficult to maintain because they can only be sustained through communication between satellites and ground stations [1].
Given the limitations of in situ and satellite measurements, statistical modeling is used in this study as a reliable alternative for estimating solar radiation. In general, modeling is performed by fitting available empirical data to sets of equations. This method is known as regression model estimation. The accuracy of the model output is then compared with observed data through statistical analysis, which objectively evaluates the agreement between observed and estimated data.
Empirical relationships may be estimated for hourly, daily, and monthly global solar irradiance by considering local atmospheric conditions and the climate of the location of interest. Some existing models may be modified to suit the climate of a certain place by changing the regression coefficients [1, 3, 7-9]. However, studies on the modeling of global solar irradiance in Malaysia remain limited. Therefore, the current study aims to estimate global solar irradiance in Malaysia, focusing on monthly solar irradiance. This focus is mainly due to Malaysia's nonseasonal weather, with consistent temperature and rain conditions throughout the year. Thus, the monthly average is deemed more suitable.

Existing Statistical Global Solar Irradiance Models
El-Metwally [10] proposed a nonlinear equation to estimate monthly global solar irradiance from relative sunshine values. Another nonlinear equation was proposed to estimate the relative sunshine in a region given the unavailability of sunshine duration data. These equations include the temperature and cloud cover fraction as input parameters. Badescu [11] developed a novel model that associated the mean clearness index with the relative sunshine hours. That study proposed three equations that included point cloudiness as an input parameter and concluded that the model with relative sunshine hours had higher accuracy than the point cloudiness models.
Almorox et al. [12] developed a linear temperature-based model to estimate global solar irradiance. In their model, the coefficients for five previously available models were modified to match the local environment. Temperature-based models are likely to be subject to errors caused by weather conditions, such as cloud movements and wind speed. Thus, these models are recommended for use with longer time steps to reduce the effect of errors. Zhao et al. [7] included the effect of the Air Pollution Index (API) to generate linear, exponential, and logarithmic models; their work modified the model of Angstrom [13] to estimate daily solar radiation. The aerosol effect on solar radiation is significant in polluted areas [7].
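For orientation, the Angstrom-type relation mentioned above is commonly written in the Angstrom-Prescott form sketched below. The symbols are the conventional ones and are an assumption here, not notation taken from [7] or [13]; the coefficients $a$ and $b$ are fitted per site.

```latex
\[
\frac{H}{H_0} = a + b\,\frac{S}{S_0}
\]
% H   : global solar radiation on a horizontal surface
% H_0 : extraterrestrial radiation on a horizontal surface
% S   : measured sunshine duration
% S_0 : maximum possible sunshine duration (day length)
```

Site-specific variants of this relation keep the same structure and either refit $a$ and $b$ or add further predictors to the right-hand side.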
Liu and Scott [14] estimated solar irradiance through rainfall and temperature observations in areas without data or with limited available data. Khorasanizadeh and Mohammadi [15] studied 11 available models from previous studies and selected the best model for each city in Iran. The models were classified into three categories based on the included parameter, namely, sunshine duration, relative humidity, and ambient temperature.
A few models have been developed for Malaysia, namely, those by Shavalipour et al. [16], Daut et al. [17], and Masral et al. [18]. Shavalipour et al. [16] discussed three available models that included the Paltridge and Proctor [19], Daneshyar [20], and modified Daneshyar models. A new method to estimate solar irradiance in Perlis, Northern Malaysia, by combining the Hargreaves and Samani model [21] with linear regression was suggested by Daut et al. [17]. Masral et al. [18] developed a model by including the month of the year as the only input factor for regions without available meteorological data. The same method was applied by Li et al. [22], but they used the day of the year as the input factor.
Pandey and Soupir [23] also developed a new model from the transmission function; their model incorporated the hour of day, Julian day, solar constant, optimized parameter value, latitude, and longitude. The developed model could be used to examine the effect of the time step on the accuracy of solar irradiance estimates on an hourly, daily, and monthly basis. Vakili et al. [24] utilized an artificial neural network (ANN) method to estimate global solar irradiance. Parameters such as particulate matter were used as inputs along with temperature, relative humidity, and wind speed. Their results showed that adding particulate matter substantially improved accuracy.
Koca et al. [25] also used an ANN method to study the effect of the number of parameters on global solar irradiance estimation. The number and combination of parameters varied between models and included the latitude, longitude, altitude, month, average cloudiness, average temperature, humidity, wind velocity, and sunshine duration. In addition, Almorox et al. [26] calibrated seven existing models and proposed one new model. Their proposed model (PM) incorporated daily air temperature and saturation vapor pressure. Another method to obtain global solar irradiance data is using satellite imagery, as implemented by Polo [27]. This method processes an image captured by a satellite over a region of interest.
Nonetheless, irrespective of their location, currently available global solar irradiance estimation models exhibit inadequate performance in terms of the root mean square error (RMSE), mean bias error (MBE), and the coefficient of determination ($R^2$). Additional research on global solar irradiance prediction accuracy is required to explore and exploit solar energy utilization further, particularly in Malaysia. Thus, a novel regression model was developed to estimate the monthly global solar irradiance in Malaysia based on different available meteorological parameters, including temperature, cloud cover, rain precipitation, relative humidity, wind speed, pressure, and gust speed.

Site and Data Set
The data used in this study are collected from three reliable sources: the "Malaysian Meteorological Department" (MMD) [28], "Soda, Solar Energy Service for Professionals" [29] (an open source of satellite data that is available online), and World Weather Online, "Armines" [30]. MMD provides daily global solar irradiance data in 2013 from five meteorology stations: Kota Kinabalu, Kuching, Ipoh, Alor Setar, and Kuantan. The details of the five meteorology stations are listed in Table 1 and their respective locations are illustrated in Figure 1.
"Soda, Solar Energy Service for Professionals" provides open source solar, astronomy, climate, energy, geography, meteorology, and solar radiation data. Another open source website used for data collection in this study is World Weather Online [30]. This website provides a complete set of weather data, including temperature, feels-like temperature, rain precipitation, cloud cover percentage, wind speed, gust, humidity, and pressure, in hourly time steps, which increases data accuracy. Therefore, atmospheric data for temperature, cloud index, rain precipitation, humidity, wind speed, pressure, and gust are quasi-validated prior to predicting or estimating global solar irradiance.

Statistical Performance Evaluation
The performance of the global solar irradiance model developed for Malaysia is assessed via regression analysis, which compares the values predicted by the model with the observed data.
In addition, MBE also determines the error of the PM. The smaller the magnitude of the MBE, the better the model represents the observed values. MBE in percentage is determined by using (2). $R^2$ is used to determine the performance of a model in terms of its goodness of fit. This value is one of the most significant indicators for comparing models because it is dimensionless and easily calculated. Ideally, a model is considered to be perfect if $R^2 = 1$. This value indicates that the estimated values match perfectly with the observed values. The formula used to determine $R^2$ is given in (3).
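The three indicators can be computed in a few lines. The sketch below is illustrative only: it assumes the usual definitions (errors normalized by the mean observed value to obtain percentages, and $R^2$ as one minus the ratio of residual to total sum of squares), which may differ in detail from (1) to (3). The function names and sample numbers are hypothetical and are not data from the study.

```python
import math


def rmse_percent(est, obs):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mse = sum((e - o) ** 2 for e, o in zip(est, obs)) / n
    return 100.0 * math.sqrt(mse) / mean_obs


def mbe_percent(est, obs):
    """Mean bias error as a percentage of the mean observed value."""
    n = len(obs)
    mean_obs = sum(obs) / n
    bias = sum(e - o for e, o in zip(est, obs)) / n
    return 100.0 * bias / mean_obs


def r_squared(est, obs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for e, o in zip(est, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
observed = [4.0, 5.0, 6.0, 5.0]
estimated = [4.1, 4.9, 6.2, 5.0]

print(rmse_percent(estimated, observed))
print(mbe_percent(estimated, observed))
print(r_squared(estimated, observed))
```

A positive MBE under this sign convention means the model overestimates on average; a model can have an MBE near zero and still a large RMSE if its errors cancel out.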

Model Development
The proposed new models are generated through regression analysis to estimate the monthly average global solar irradiance in Malaysia. This section describes the method used to develop these models and investigates the relationship of global solar irradiance with other parameters: temperature, cloud cover, rain, humidity, wind speed, pressure, and gust speed. The solar irradiance measurement system is only available at five stations: Kota Kinabalu, Kuching, Ipoh, Alor Setar, and Kuantan.
A simple regression analysis method is initially used to determine the dependence of global solar irradiance on each parameter. Based on the effect of each parameter, combinations of additional parameters are attempted. Consequently, several possible combinations of variables are used to estimate global solar irradiance. All possible combinations of parameters undergo an iteration process. Each possible combination is iterated until the solutions converge, which indicates that the optimized solution is achieved with high accuracy.
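The single-parameter screening step can be sketched as follows. This is a minimal illustration that assumes each single-parameter model is a straight line fitted by ordinary least squares; the functional forms actually used are those of Table 2. The helper names and all numbers below are hypothetical and were invented for the example.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(est, obs):
    """Coefficient of determination of estimates against observations."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - e) ** 2 for e, o in zip(est, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical monthly means, invented for illustration (not data from the study).
irradiance = [4.2, 4.6, 4.9, 5.1, 4.8, 4.5]
predictors = {
    "humidity": [86.0, 82.5, 79.0, 76.5, 80.0, 83.5],
    "wind_speed": [2.1, 1.8, 2.4, 2.0, 1.9, 2.3],
}

scores = {}
for name, x in predictors.items():
    intercept, slope = fit_line(x, irradiance)
    estimates = [intercept + slope * xi for xi in x]
    scores[name] = r_squared(estimates, irradiance)

# Parameters ordered from most to least explanatory under this linear form.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Ranking by $R^2$ alone is a simplification; the same loop could collect RMSE and MBE as well so that all three indicators inform the choice of parameters to combine.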
Although simplicity is desirable when modeling global solar irradiance, introducing new prediction parameters, rather than using only conventional parameters, has also been attempted. The nonclassical parameters are wind speed and gust speed, whereas the classical and most common parameters used in other studies are ambient temperature, cloud cover, relative humidity, rain precipitation, and pressure. This approach ensures that the global solar irradiance model being developed will consider the effect of wind in shifting clouds, particularly in countries surrounded by seas, such as Malaysia.

Results and Discussion
6.1. Single-Parameter Modeling. In this section, the effect of each parameter on predicting global solar irradiance is discussed based on RMSE, MBE, and $R^2$. The seven single-parameter PMs established in this study are presented in Table 2. Meanwhile, Table 3 lists the performance of each single-parameter PM in modeling global solar irradiance in Malaysia. The results show that humidity exhibits the best RMSE, MBE, and $R^2$ values, which suggests its strong applicability for modeling global solar irradiance. This result agrees with common modeling practice, wherein humidity is one of the most conventional and practical parameters used. By contrast, selecting wind as a parameter in estimating global solar irradiance provides the least favorable RMSE, MBE, and $R^2$ values. This result is understandable because no obvious correlation exists between wind and global solar irradiance. Moreover, wind speed is only expected to affect cloud movement at a certain altitude. Cloud cover, pressure, gust, and rain provide average values of RMSE, MBE, and $R^2$, which implies that these parameters may add to the accuracy of the global solar irradiance prediction model. Therefore, combining these parameters may aid in increasing the accuracy of the proposed model.

6.2. Multiple-Parameter Modeling. Table 4 shows the statistical analysis of four proposed multiple-parameter models. PM10 and PM12 both exhibit better results than PM8 and PM9, which include cloud cover. Based on the $R^2$ and RMSE values listed in Table 5, PM12 presents the best fit, with an $R^2$ of 0.9884, RMSE of 0.8561%, and MBE of 0.2822%. Other models also appear to be acceptable in terms of their $R^2$ values. PM12 combines five parameters: temperature, rain, humidity, wind, and pressure. Although cloud cover is shown to be a relevant parameter in estimating global solar irradiance when used as a single parameter, it is not as significant when used in combination with multiple parameters. Moreover, the inclusion of temperature when predicting global solar irradiance may also be considered an indicator of cloud cover. When cloud cover is low, more solar irradiance reaches the surface of the Earth and supplies additional heat, which directly increases ambient temperature.

PM10 also provides a reliable result, with an $R^2$ of 0.9768 and RMSE and MBE of 1.2130% and 0.2679%, respectively. In addition, this model performs better than PM8 and PM9, even without the cloud cover and pressure data sets. Two significant parameters are considered in all the multiple-parameter models, namely, temperature and rain. The results signify that these two parameters exhibit a strong correlation with the estimated global solar irradiance in Malaysia. This trend is mainly attributed to the climate of Malaysia, which only has two different weather patterns: a dry season that runs from May to September and a rainy season from the middle of November to March.
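The search over parameter combinations can be illustrated with a small least-squares sketch. It assumes a linear multiple-parameter form fitted through the normal equations, which is a stand-in and not necessarily the form of the PM equations in Table 4. The data are synthetic and constructed so that irradiance depends on temperature and rain only; every name and number is hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_and_score(columns, y):
    """Fit y = b0 + b1*x1 + ... by least squares and return R^2."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    est = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - e) ** 2 for yi, e in zip(y, est))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic monthly values, invented for illustration (not data from the study).
data = {
    "temperature": [26.0, 27.0, 28.0, 27.5, 26.5, 27.0, 28.5, 26.0],
    "rain": [10.0, 6.0, 3.0, 5.0, 9.0, 7.0, 2.0, 12.0],
    "wind": [2.0, 2.4, 1.9, 2.2, 2.1, 1.8, 2.3, 2.0],
}
irradiance = [2.0 + 0.1 * t - 0.05 * r
              for t, r in zip(data["temperature"], data["rain"])]

# Score every two-parameter combination and keep the best one.
results = {}
for combo in combinations(data, 2):
    results[combo] = fit_and_score([data[name] for name in combo], irradiance)

best = max(results, key=results.get)
print(best, results[best])
```

Because $R^2$ on the fitting data never decreases when a parameter is added, comparing combinations of different sizes on this score alone tends to favor the largest model; RMSE and MBE on held-out months would give a fairer comparison.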

6.3. Comparison of PM Performance with Existing Global Solar Irradiance Models.
A comparison between the PMs in this study and other available global solar irradiance models is presented in Table 6. Most new PMs exhibit good performance in RMSE, MBE, and $R^2$. Both the single- and multiple-parameter PMs have an $R^2$ greater than 0.95 and an RMSE less than 2%. Among the single-parameter models, PM4 demonstrates the best performance in predicting global solar irradiance with an RMSE of 0.421%, MBE of 0.001%, and an $R^2$ of 0.997.
Nevertheless, the multiple-parameter model PM12 is the best model for predicting global solar irradiance with an RMSE of 0.856%, MBE of 0.2822%, and an $R^2$ of 0.988. PM4 and PM12 are recommended for estimating global solar irradiance in Malaysia.

Conclusion
Reliable and accurate global solar irradiance data are vital to the developing application of solar energy in Malaysia. Thus, the development of a specific model to aid in providing global solar irradiance data is crucial for the advancement of solar energy systems in Malaysia. In this study, seven single-parameter models and four multiple-parameter models are proposed and evaluated based on RMSE, MBE, and $R^2$. Relative humidity is identified as the best parameter for predicting global solar irradiance, followed by cloud cover, pressure, gust, rain, temperature, and wind speed. Wind speed also exhibits the least correlation among these parameters. Novel multiple-parameter models are also studied to estimate global solar irradiance in Malaysia. The five multiple-parameter models are compared based on RMSE, MBE, and $R^2$. The model that includes temperature, rain, humidity, pressure, and wind speed is determined to be the best model because of its excellent RMSE of 0.856%, MBE of 0.2822%, and $R^2$ of 0.988. The PM developed in this study can reasonably estimate monthly global solar irradiance in Malaysia.

Variables and units: temperature (°C), cloud cover, rain precipitation (m), relative humidity (RH), wind speed (m/s), and pressure (Pa).
The three performance indicators used to determine the accuracy and reliability of the PM are RMSE, MBE, and $R^2$. These indicators are widely applied by researchers to test the performance of a regression model. RMSE indicates the error of a model by determining the deviations between observed and estimated values. Low RMSE values indicate that the model accurately represents the observed global solar irradiance. RMSE is measured in percentage to make it dimensionless and independent of the study location. RMSE in percentage is defined in (1) in terms of the estimated global solar irradiance, the observed global solar irradiance obtained from meteorological stations, the averaged observed global solar irradiance, and the number of days of estimated global solar irradiance.
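For reference, conventional percentage forms of the three indicators that are consistent with the quantities named above are sketched below. The symbols $H_{\mathrm{est}}$, $H_{\mathrm{obs}}$, $\bar{H}_{\mathrm{obs}}$, and $N$, and the normalization by the mean observed value, are assumptions; the study's own (1) to (3) may differ in notation or detail.

```latex
\begin{align}
\mathrm{RMSE}\,(\%) &= \frac{100}{\bar{H}_{\mathrm{obs}}}
  \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(H_{\mathrm{est},i}-H_{\mathrm{obs},i}\right)^{2}} \\
\mathrm{MBE}\,(\%) &= \frac{100}{\bar{H}_{\mathrm{obs}}}\,
  \frac{1}{N}\sum_{i=1}^{N}\left(H_{\mathrm{est},i}-H_{\mathrm{obs},i}\right) \\
R^{2} &= 1-\frac{\sum_{i=1}^{N}\left(H_{\mathrm{obs},i}-H_{\mathrm{est},i}\right)^{2}}
  {\sum_{i=1}^{N}\left(H_{\mathrm{obs},i}-\bar{H}_{\mathrm{obs}}\right)^{2}}
\end{align}
```

Under these forms RMSE is always nonnegative, MBE keeps its sign (positive for overestimation), and $R^2$ equals 1 only when every estimate matches its observation.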

Table 1: Locations of meteorological stations in Malaysia.
Figure 1: Meteorological Department stations.

Table 2: Single-parameter PM equations.

All the available data are in the form of daily time steps, which means that 365 sets of global solar irradiance data are available for 2013, with all 7 atmospheric parameters: temperature, wind speed, gust, rain precipitation, cloud cover index, humidity, and pressure. The data for Malaysia are averaged per month to obtain monthly time-step data as 12 complete data sets from January to December 2013.
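The reduction from 365 daily records to 12 monthly means can be sketched as below. The record layout (a date paired with one value) and the placeholder series are assumptions made for illustration; the real inputs would be the daily irradiance and atmospheric readings.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_means(daily):
    """Average (date, value) pairs per calendar month of a single year."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[day.month].append(value)
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


# Placeholder series for 2013: each day simply carries its month number.
start = date(2013, 1, 1)
days = [start + timedelta(days=i) for i in range(365)]
daily = [(d, float(d.month)) for d in days]

means = monthly_means(daily)
print(len(daily), len(means))
```

The same function would be applied once per variable, so that each month ends up with one averaged value for irradiance and for each of the seven atmospheric parameters.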

Table 4: Equations for multiple-parameter PMs.

Table 6: Comparison of PM performance with existing global solar irradiance models.