Prediction of Rooftop Photovoltaic Solar Potential Using Machine Learning

Accurate solar energy forecasting is essential for increasing the share of renewable energy that can be integrated into existing electrical grid control systems. The availability of data at unprecedented levels of granularity enables data-driven algorithms that improve the estimation of solar energy generation and production. In this paper, we develop a prediction of solar potential across large rooftop photovoltaic panels using a machine learning method, the Restricted Boltzmann Machine (RBM), which is used to forecast the solar potential of rooftops. The machine learning model is trained on a training dataset and then evaluated on a held-out test dataset to validate it. Simulations are conducted in R using various packages to predict rooftop solar potential. The simulation results show that the proposed method achieves a higher prediction accuracy, 99%, than the other methods.


Introduction
Photovoltaic (PV) panels have been developed as part of the global transition away from fossil fuels and toward renewable energy sources (RES) [1]. For example, the cost of producing electricity from solar panels has dropped substantially, while the efficiency of energy conversion has also increased [2]. The cost of power from large-scale photovoltaic (PV) plants decreased by 73% between 2010 and 2017. As a result of their decreasing cost and increasing efficiency, PV panels have emerged as a viable renewable energy source in many countries [3].
However, the energy production of PV panels is highly variable as a function of meteorological conditions such as cloud cover and solar irradiation [4]. The capacity to better assess and manage production unpredictability is of particular relevance to a number of energy market participants [5]. Over- and under-production of electricity may result in penalty fees, which is why a transmission system operator is interested in the energy output from PV panels in the near future (0-5 hours) [6]. In contrast, electricity traders are more concerned with short-term forecasts, such as those for the following day, because the vast majority of electricity is traded on that horizon. The profitability of these activities therefore depends on the ability to accurately predict the varied output of solar PV panels [7].
As more countries decide to increase their renewable energy investment, it is expected that the use of solar PV panels will increase. As a result, there will be an increase in the demand for accurate solar PV energy output projections [8]. PV panel energy output projections necessitate accurate and efficient forecasting, but the solution is anything but straightforward. There is a wide range of challenges being addressed by the current research in the field [9]. Because of the inherent instability of weather, it is difficult to forecast specific weather conditions [10].
Recent years have seen an increase in the popularity of machine learning (ML) approaches to forecast PV power, as opposed to traditional time-series prediction models, which have been in use for decades [11]. Despite the fact that machine learning (ML) technologies have been around for a while, increased processing power and better data availability have made them more useful for predicting the future [12].
The major objective of the study is to examine and contrast numerous methodologies for predicting the energy output of solar photovoltaic (PV) panels. With the help of machine learning and time series approaches, it is possible to dynamically determine the relationship between solar PV system output and various weather conditions. Four machine learning algorithms are evaluated in the context of real-world solar power plant installations, and their performance is compared to more traditional time series methods. Several approaches to feature engineering were also studied in order to increase overall forecast accuracy.
In this paper, we develop a deep learning model using the Restricted Boltzmann Machine (RBM), the machine learning method used in this study to forecast the solar potential of rooftops.
The main contributions of the work are the following: (i) the authors develop a prediction of solar potential across large rooftop photovoltaic panels using a machine learning method; (ii) the machine learning model is trained on a training dataset and then tested with a test dataset to validate the model.

Background
The method in [13] found that the RBM was more accurate than the ARIMA in short-term forecasting, indicating that the RBM is superior to the ARIMA for this task. Notably, Coimbra and Pedro predicted the power output of solar PV panels rather than the amount of solar irradiation. No exogenous NWP data was used in this research; instead, the panels' historical output levels were used. The authors further increased RBM accuracy through the use of genetic algorithms, an optimization technique based on the natural selection process. Different weather regimes [14] yield varying degrees of forecasting accuracy, so the authors advocate splitting data into subsets based on weather regime for weather modelling. According to the authors of [15], fitting a distinct model to each weather-regime subset may result in better predictions than using a single model fitted to all possible weather regimes in one dataset. The method in [16] examined machine learning approaches and evaluated them in the context of producing features expected to improve system performance. The two main approaches employed in the research project were principal component analysis (PCA) and a feature engineering methodology in combination with a gradient boosting tree model. In addition, the authors used different smoothing methods to extract features from their NWP data [17], which they shared with the community. A grid of NWP data was used to calculate the average of, and changes in, meteorological factors surrounding the PV installation site. Along with building features from local grid points, the authors also calculated variances for various predictors at varying lead times.
The major motivation [18] for constructing variance characteristics based on lead times was to capture the variability of the weather.
It is undeniable that PCA and feature extraction can lead to more desirable results. According to the researchers, there is a two-fold gap in knowledge that will need to be overcome in the future [19]. In order to improve forecast accuracy, it is necessary to understand how to manage features (through feature engineering and feature selection). The second component of this research investigates machine learning modelling strategies that can be utilized in conjunction with informative characteristics. Their conclusion is that the combination of deep learning and feature management will be a fascinating road to travel down in the next few years [20].
The method in [21] predicted solar irradiance using PCA, RBM, and the Analog Ensemble (AnEn), among other techniques. When trying to reduce the dimensionality of a dataset, a technique known as principal component analysis (PCA) was applied. Over the course of eight years, the total daily energy production from solar radiation was gathered and combined into a single dataset for analysis. According to the findings of a study, using PCA in conjunction with RBM and AnEn improves prediction accuracy.
At 15-minute intervals, the method in [22] presents a technique for weather classification and SVM to anticipate PV power output for the next forecast interval. The daily weather conditions include clear skies, overcast skies, foggy skies, and wet skies, to name a few examples. The classification of sites is based on an analysis of local weather forecasts and PV electricity generation. Once the data has been normalized, it is cleaned in order to remove unwanted noise and improve precision while also maintaining data correlation. The four distinct weather classes are then trained with four SVM models with radial basis function kernels, one for each weather type. The authors conclude that SVM models can be trained on specific meteorological circumstances. According to practically every research study, lagged energy output estimates are significant for short-term forecasting, and they become even more important as forecasting horizons increase. Studies both supporting and rejecting the usage of time series models found them to be superior to machine learning in the short run. Continuing the comparison between time series models and machine learning approaches is, in our opinion, necessary and worthwhile [23].
When using a regression-based model, forecasting accuracy can be increased by using a suitable optimization strategy, as indicated in [24]. Because the purpose of this study is to give a generic comparison of different machine learning techniques, we will not investigate the performance of the RBM across multiple variants. Several studies have found that errors in NWP data have a considerable impact on the accuracy of forecasts [25]. Any model may benefit from the use of NWP data from a number of sources, which may help to ease this difficulty. This requires data from a large number of credible predictors, which may not always be available. The overall error in the incoming NWP data may be reduced even further if the NWP data were derived from a variety of physical models.

Proposed Method
In this section, we present a prediction of solar potential across large rooftop photovoltaic panels using a machine learning method. The Restricted Boltzmann Machine (RBM) is the machine learning method used in the study to forecast the solar potential of rooftops.
In the beginning, data on solar generation is represented as an m × n matrix, where m signifies the number of features and n denotes the total number of observations. During this stage, the data is sorted into two categories, training data and testing data, as shown in the diagram. The RBM is then trained using cross-validation over different folds of the same training data. The k-fold cross-validation approach also reduces the bias of the resulting performance estimate.
For solar photovoltaic (PV) data, the weather has a considerable impact on seasonal changes. Because k-fold cross-validation averages the metric over k held-out folds, the variance of the performance estimate is lower than that of a single hold-out model. The k-fold method divides the data into k equal-sized parts, and the models are trained and tested k times in total, once for each fold.
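The fold-splitting and train/test rotation described above can be sketched as follows. The paper's experiments were run in R with caret; this is an illustrative Python/numpy equivalent, with the `fit` and `score` callables standing in for any model and metric (both names are hypothetical, not from the paper):

```python
import numpy as np

def kfold_indices(n_obs, k=5, seed=0):
    """Randomly partition observation indices into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_obs), k)

def cross_validate(X, y, fit, score, k=5):
    """Train on k-1 folds, test on the held-out fold, average the metric."""
    folds = kfold_indices(len(y), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```

For example, `cross_validate(X, y, fit, score, k=5)` with a mean-predictor baseline gives a cross-validated estimate of the baseline's error.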
The models are validated using a five-fold cross-validation procedure. The first iteration employs the first four folds to train the models, while the last fold is used to test the models trained in that iteration. In the subsequent iteration, the model is tested on the second-to-last fold and trained on the remaining folds. This cycle is repeated until predictions have been obtained for all five test folds, at which point the process is completed, as in Figure 1. Given the model parameters $\theta$, an RBM uses an energy function $E(v, h; \theta)$ to establish a joint distribution $p(v, h; \theta)$ across the visible units $v$ and hidden units $h$:
$$p(v, h; \theta) = \frac{\exp(-E(v, h; \theta))}{Z},$$
where $Z$ is the normalization factor. The marginal probability of $v$ is given as
$$p(v; \theta) = \frac{\sum_h \exp(-E(v, h; \theta))}{Z}.$$
For a Bernoulli-Bernoulli RBM, the energy function is
$$E(v, h; \theta) = -\sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij} v_i h_j - \sum_{i=1}^{I} b_i v_i - \sum_{j=1}^{J} a_j h_j,$$
where $w_{ij}$ is the symmetric interaction term between $v_i$ and $h_j$, $b_i$ and $a_j$ are the bias terms, $I$ is the number of visible units, and $J$ is the number of hidden units.
The conditional probabilities are estimated as
$$p(h_j = 1 \mid v; \theta) = \sigma\Big(\sum_{i=1}^{I} w_{ij} v_i + a_j\Big), \qquad p(v_i = 1 \mid h; \theta) = \sigma\Big(\sum_{j=1}^{J} w_{ij} h_j + b_i\Big),$$
where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function. For a Gaussian (visible)-Bernoulli (hidden) RBM, the energy is formulated as
$$E(v, h; \theta) = \frac{1}{2}\sum_{i=1}^{I} (v_i - b_i)^2 - \sum_{i=1}^{I}\sum_{j=1}^{J} w_{ij} v_i h_j - \sum_{j=1}^{J} a_j h_j,$$
and the conditional probabilities are hence modelled as
$$p(h_j = 1 \mid v; \theta) = \sigma\Big(\sum_{i=1}^{I} w_{ij} v_i + a_j\Big),$$
where $v_i$ follows a Gaussian with mean $\sum_{j=1}^{J} w_{ij} h_j + b_i$ and unit variance.
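The conditional probabilities above can be turned into a working model. The following is a minimal sketch of a Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1, the standard training procedure for RBMs; the paper does not detail its training routine, so this is an assumption), written in Python/numpy rather than the paper's R:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BernoulliRBM:
    """Minimal Bernoulli-Bernoulli RBM trained with CD-1."""
    def __init__(self, n_vis, n_hid, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(n_vis, n_hid))  # w_ij
        self.b = np.zeros(n_vis)   # visible biases b_i
        self.a = np.zeros(n_hid)   # hidden biases a_j
        self.lr = lr

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigma(sum_i w_ij v_i + a_j)
        return sigmoid(v @ self.W + self.a)

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigma(sum_j w_ij h_j + b_i)
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        """One contrastive-divergence update on a batch of binary vectors."""
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample hidden
        v1 = self.visible_probs(h0)      # reconstruction (probabilities)
        ph1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.a += self.lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error
```

Repeated calls to `cd1_step` on a batch of binarized training vectors drive the reconstruction error down as the weights learn the data distribution.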
Initially, RBMs can be applied to real-valued stochastic variables, after which the Bernoulli-Bernoulli RBMs, a subset of the RBMs, can be used. The Gaussian and binomial conditional distributions are the two most common RBM conditional distributions used in this discussion. An RBM's performance could be improved by boosting or bagging the RBM, or by better initial starting weights. The literature review also discussed how particular optimization tactics can influence the network's ability to converge, and thus the outcomes. We did not apply the genetic optimization strategy to our RBM, despite the fact that it has been shown to improve the performance of RBMs in previous research. For the RBM, it is critical to compare performance against computation time, and a feasible trade-off between the two must be found.

Results and Discussions
This section of the paper continues the discussion of data processing and gives a full treatment of the machine learning methodologies. Throughout the research, R version 3.4.3 and RStudio have been used for data processing and mathematical computations. Caret, the most widely used R package for this purpose, is a wrapper containing tools for speeding up the construction of predictive models and preparing data. K-fold cross-validation is a technique that can be used to accomplish cross-validation when completing a statistical analysis. The collection is randomly divided into k equal parts. Following the validation principle, one of the k folds is chosen as the test set, and the remaining observations are chosen as the training set. In the next step, the test set moves to a new fold, while the remaining dataset is used as training data. An individual performance metric is calculated for each of the k folds, and once each metric has been obtained, the average performance across folds is computed. This can be done for a variety of hyperparameters in order to identify the best performance from the cross-validation. The number k usually falls within the range of 5 to 10.
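The hyperparameter-selection step described above, choosing the setting with the best cross-validated average metric, can be sketched as follows. The paper does this with caret in R; here is a Python/numpy illustration using ridge regression as a hypothetical stand-in model (the penalty `lam` plays the role of the hyperparameter being tuned; none of these names come from the paper):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_rmse(X, y, lam, k=5, seed=0):
    """Average root-mean-squared error over k folds for one hyperparameter."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errs = []
    for i in range(k):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[tr], y[tr], lam)
        errs.append(np.sqrt(np.mean((X[te] @ w - y[te]) ** 2)))
    return float(np.mean(errs))

# Select the penalty whose cross-validated average error is lowest:
# best_lam = min(candidate_lams, key=lambda lam: cv_rmse(X, y, lam))
```

The same pattern applies to any model: compute the per-fold metric, average it, and keep the hyperparameter configuration that minimizes the average.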
It is crucial to get the initial weights right. If the weights are initialized near zero, the network starts out close to a linear model and becomes increasingly nonlinear as the weights grow during training. In most cases, the initial weights are drawn at random from small values around zero. As a general rule, smaller values are preferable to larger ones, and initial weights containing both positive and negative values are typically preferred because they provide greater flexibility in the analysis.
The learning rate used in backpropagation will have an effect on the final result. Training can be time-consuming when the learning rate is either excessively high or excessively low. For example, instead of employing a sophisticated technique, one might simply set a relatively high learning rate and then progressively lower it, stopping once the accuracy of the model degrades within a reasonable amount of computation time.
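The simple strategy above, start high and keep lowering the rate until accuracy stops improving, can be sketched as a loop. The `evaluate` callable (a hypothetical name) would train a model at the given rate and return its validation score:

```python
def tune_learning_rate(evaluate, lr0=1.0, factor=0.5, min_lr=1e-4):
    """Start from a high learning rate and keep lowering it while the
    validation score keeps improving; stop once it degrades or stalls."""
    best_lr, best_score = lr0, evaluate(lr0)
    lr = lr0 * factor
    while lr >= min_lr:
        score = evaluate(lr)
        if score <= best_score:
            break                         # accuracy degraded: stop here
        best_lr, best_score = lr, score   # still improving: keep lowering
        lr *= factor
    return best_lr
```

This avoids a sophisticated schedule at the cost of a few extra training runs.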
Applying penalty terms to the minimization problem, as in the Lasso, is a prominent remedy for overfitting in the optimization context. The scale of the inputs can influence the weights, which in turn can influence the output results. As a result, it is common practice to standardize all data inputs to ensure consistency, typically to zero mean and a standard deviation of one.
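The standardization step above is a per-column transformation; a minimal numpy sketch:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Scale each input column to zero mean and unit standard deviation.
    `eps` guards against division by zero for constant columns."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / np.maximum(sd, eps)
```

Note that the mean and standard deviation should be computed on the training folds only and reused on the test fold, so that no test-set information leaks into training.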
The peak power of these installations ranges between 20 kW and 150 kW, and the data on their energy output was gathered from these installations as well. The energy production data was obtained from five different locations and sampled every 15 minutes.
Certain aspects of access to the NWP data have been restricted. In the first instance, because forecasts are created every six hours and cover a time span of six hours in advance, it is not possible to produce longer-term estimates. Because one cannot make use of weather forecasts that are no longer accessible, the training set is reduced for longer time horizons. Furthermore, the precise alignment of the solar PV panels is not represented in the data. When the forecasts (Figures 1-5) are almost identical, the resulting levels of actual energy output can be dramatically different, showing the underlying ambiguity of the data used to make the projections in the first place. In some cases, small local errors in the input data can combine with small local flaws in the forecasting algorithms to produce large global defects in the model, which can be very difficult to detect. Figure 6 reveals the MAPE.
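The MAPE reported in Figure 6 is a standard metric; a small sketch follows. Because the paper notes long stretches of zero winter output, this version masks out zero-valued actuals, where the percentage error is undefined (the masking choice is an assumption, not stated in the paper):

```python
import numpy as np

def mape(actual, forecast, eps=1e-9):
    """Mean absolute percentage error in %, skipping zero-output periods
    where the relative error is undefined."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = np.abs(actual) > eps
    return 100.0 * float(np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask])))
```

For example, `mape([100.0, 200.0], [110.0, 180.0])` evaluates the relative errors 10% and 10% and returns their mean, 10%.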
The inherent patterns in the data may help explain the situation from a data perspective. During the winter months, for example, there was no reported energy generation at the vast majority of the power stations. This occurred occasionally as a result of snow accumulating on the panels for an extended length of time. If the series has a large number of consecutive days with zero output, the series becomes nonstationary for the remainder of the time period. Note that because the clear-sky normalization does not affect zero-valued observations, stationarity is difficult to maintain in these periods.
The different RBM parameter configurations did not significantly outperform one another. Because the initial weights were picked at random and only a small number of trials were completed for each configuration, additional testing should be conducted. Although the RBM outperforms the other machine learning models, there is still room for improvement given the model's inherent complexity.

Conclusions
In this study, we employed a machine learning method to generate a prediction of solar potential over a large number of photovoltaic panels installed on rooftops. The RBM is used to forecast rooftop solar potential. The machine learning model is trained using the training dataset and then tested with the test dataset to validate it. Simulations assessing the rooftop solar potential were run and analyzed using R and a number of packages. The simulations indicate that the proposed strategy achieves higher prediction accuracy than previous approaches. In the future, large-scale real-time PV installations can be connected to this model to obtain outcomes using machine learning.

Data Availability
The data used to support the findings of this study are included within the article. Further data or information is available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.