Short-Term Power Prediction of Building Integrated Photovoltaic (BIPV) System Based on Machine Learning Algorithms

One of the biggest challenges is ensuring large-scale integration of photovoltaic systems into buildings. This work presents a power prediction for building-integrated photovoltaic systems across the building's various orientations, based on machine learning tools. The proposed prediction methodology comprises a data quality stage, a machine learning algorithm, a weather clustering assessment, and an accuracy assessment. The results showed that applying linear regression coefficients to the forecast outputs of the developed photovoltaic power generation neural network improved the PV power generation forecast. The final models produced accurate forecasts, with a root mean square error of 4.42% for NN, 16.86% for QSVM, and 8.76% for TREE. The results are presented for building façade and roof applications: flat roof, south façade, east façade, and west façade.


Introduction
Economic growth has given rise to increasing global demand for electrical energy production and consumption. Solar power plants are among the most common renewable energy sources [1][2][3][4]. Satellite technology allows us to fly around the world. In addition to being easily installed on the roof of a building, PV modules can act as stand-alone solar power generators [5][6][7]. The installation of photovoltaic panels has increased every year in recent years. Globally, 117 gigawatts of solar PV energy were generated in 2019 [8]. Traditional grid-based power distribution operates on stable power supply lines and a consistent load [9]. Grid efficiency can be enhanced by controlling both suppliers and customers. Solar PV power could interfere with conventional power generation, making conventional generation difficult or even unworkable [10][11][12][13][14].
Machine learning has become more common in forecasting and classification because it reliably handles complex or nonlinear problems. Such methods can identify the relationship between input and output variables, even when an explicit representation is unlikely [15]. The most common are artificial neural network (ANN) [16], fuzzy logic (FL) [17], support vector machine (SVM) [18], K-nearest neighbor (kNN) [19], and decision tree (DT) [20] based techniques, including random forest (RF) [21]. Specifically, artificial intelligence approaches are discussed in detail to improve photovoltaic performance forecasting models [22]. Among image processing-based approaches, IRT is the most commonly accepted technique for categorising light poles [23]; it relies on image processing to distinguish between healthy and defective panels. Various patterns, challenges, and opportunities for the implementation of ANN light poles are highlighted [24]. Random forests were the most reliable among the various forecasting techniques used by the site and regional forecasters [25]. Several mathematical models were developed to increase the accuracy of diagnoses [26]. Also, the use of PHANN for clear days of the sky resulted     [28]. In the extreme learning machine (ELM), the input weights and the hidden layer biases are allocated randomly, and the output weights are then obtained by least squares instead of iterative approaches. The ELM will help to gain information and better transition between situations. Many ELM-based models have been presented, and their capacity for predicting PV power production has been verified [29]. The results were obtained using an ensemble method combining lower upper bound estimation (LUBE) with ensemble learning.
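The ELM training idea described above (random hidden-layer parameters, output weights from a single least-squares solve) can be sketched as follows. This is a minimal illustration and not the implementation used in the cited works: the tanh activation, the small ridge term added for numerical stability, the function names, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def hidden_features(x, weights, biases):
    # The leading 1.0 acts as an output bias; tanh is an assumed activation.
    return [1.0] + [math.tanh(w * x + b) for w, b in zip(weights, biases)]


def train_elm(xs, ys, n_hidden=5, ridge=1e-3, seed=0):
    rng = random.Random(seed)
    # Hidden weights and biases are drawn once at random and never updated.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    h = [hidden_features(x, weights, biases) for x in xs]
    k = n_hidden + 1
    # Regularised normal equations: (H^T H + ridge * I) beta = H^T y.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
    beta = solve_linear_system(hth, hty)
    return weights, biases, beta


def predict_elm(model, x):
    weights, biases, beta = model
    return sum(b * f for b, f in zip(beta, hidden_features(x, weights, biases)))


# Synthetic, illustrative data only: normalised irradiance -> normalised power.
toy_irradiance = [i / 19 for i in range(20)]
toy_power = [x * x for x in toy_irradiance]
elm_model = train_elm(toy_irradiance, toy_power)
elm_predictions = [predict_elm(elm_model, x) for x in toy_irradiance]
```

Because only the output weights are fitted, training reduces to one linear solve, which is the source of the rapid training speed usually attributed to ELM.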
Considering the degree of convergence and prediction precision [30], ELM has been combined with the entropy method to build a hybrid forecast method for short-term PV power production, which is preferable to the radial basis function neural network and the generalised Deep Learning Network [31]. An alternative multimodel approach based on ELM has also been used for PV power prediction. Forecasting is essential for operating power plants and other utilities [32]. The feasibility-prospective forecasting model has also been developed and proved effective in predicting the short-term power production of PV systems. In traditional artificial neural networks (ANNs), certain parameters are allocated at random, causing a certain degree of error and uncertainty in the prediction performance. Several ANNs have been integrated with the genetic algorithm (GA) to solve this problem. A convolutional neural network system is reported to be effective in predicting solar irradiance, where the GA was applied to optimise the associated hyperparameters. However, in conventional ANNs, many parameters must be carefully optimised to establish learning strategies [33].
Algorithms have been developed to predict PV power generation [34]. The time series is decomposed into high- and low-frequency components. The DBN model is then used to provide the high-frequency trends. Finally, the forecasted trend components are combined into the final results. GA is the algorithm most commonly used to solve nonlinear optimisation problems. The genetic algorithm (GA) is grounded in the theory of evolution and relies on the calculation of individual fitness functions. The GA involves the iterative selection of elite individuals, crossover, and mutation [35]. A support vector machine (SVM) has been used to predict short-term solar PV power, with the SVM parameters optimised using the Meta-SVM Optimizer [36].
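The GA loop described above (fitness evaluation, selection of elite individuals, crossover, mutation) can be sketched on a toy problem. The objective function, population size, blend crossover, Gaussian mutation, and every name below are assumptions chosen for illustration; the cited works apply the GA to model hyperparameters, not to this function.

```python
import random


def fitness(x):
    # Toy objective (assumption): a single maximum at x = 3.
    return -(x - 3.0) ** 2


def run_ga(pop_size=20, generations=30, elite=2, mutation_scale=0.3, seed=1):
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 10.0) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elites = population[:elite]            # elites survive unchanged
        parents = population[: pop_size // 2]  # fitter half may reproduce
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            child = alpha * a + (1.0 - alpha) * b    # blend crossover
            child += rng.gauss(0.0, mutation_scale)  # Gaussian mutation
            children.append(min(10.0, max(0.0, child)))
        population = elites + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best_x, best_history = run_ga()
```

Keeping the elites unchanged guarantees that the best fitness never decreases from one generation to the next, which is why `best_history` is non-decreasing.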
Since SVM uses quadratic programming, SVM training takes a long time when the number of samples is large. Energy resource forecasting based on neural networks is very good at predicting solar power due to its strong task scheduling (-threshold) and outstanding mapping capability. A combined method for predicting PV power consumption with ANN and analogue integration is investigated [37]. ELM is built on a feed-forward neural network (FNN). ELM can train without iteratively adjusting its hidden-layer weights and thresholds. It is characterised by rapid training speed, strong generalisation capability, and wide applicability [38]. The ELM model can effectively address complex nonlinear regression problems. This model has already been used to predict the irradiance and power output of PV systems. The specific objective of explaining PV power fluctuations using a graphical method based on the ELM model has been reported [39]. A similarity-based photovoltaic module power prediction model is developed using the available historical data [40]. An ANN model is used to predict monthly global solar radiation for power prediction based on the geographical location [41]. The performance of the photovoltaic module varies with the geographical location. The PV system's prediction is based on the machine learning algorithm developed, and the stability of the model is validated [42]. An algorithm is developed to predict the load dispatch of a grid-connected photovoltaic system for the microgrid [43].
International Journal of Photoenergy
The PV output power is determined using various algorithms, and their prediction accuracy is compared. Short-term and long-term day-ahead power predictions are analysed with respect to climate conditions. A study on the building-integrated PV system is still required. In this study, the PV output is normalised based on experimental studies. Machine learning algorithms are used to predict the efficiency of the building-integrated photovoltaic system for the various orientations. The installed systems comprise a flat roof and south-, east-, and west-oriented façades. The artificial neural network, decision tree, and quadratic support vector machine algorithms are used for short-term power prediction of the BIPV system.

Machine Learning Algorithms
State-of-the-art solar power technology will only be established if forecasters can predict how much solar power will be available at a specific location at a given time. The built model can be replicated since it includes only environmental data without regard to geographical locations. Depending on the nature of the design, the machine learning models are developed with three data sets: training, validation, and test. The workflow is shown in Figure 1.
2.1. Artificial Neural Network. Artificial neural networks (ANNs) can describe nonlinear, complex, and incremental behaviours through input-output training patterns. An ANN is characterised by its architecture, which shows the connections between nodes, by the method used to determine the weights, and by the activation function. The ability of artificial neural networks to learn from large samples makes it possible to solve several major and complex problems [44]. The most common network structure is the feed-forward structure. A typical neural network is made up of computational components called neurons. The weights and biases of the input and output layers are mutually optimised until the errors of the output neuron values fall within an acceptable range. This approach has been successfully applied to regression problems [45]. The feed-forward network model is presented in Figures 2 and 3. The network has a hidden layer containing many nodes, while the user-defined function types are shown in Table 1. ANN methods can handle nonlinear systems. Their limitations are overfitting, local minima, random initial data, intensive training data requirements, and increased complexity due to multilayered architectures [46,47].
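The forward pass of a feed-forward network with one hidden layer, as described above, can be sketched as follows. The weights are hypothetical and untrained, and the tanh hidden activation with a linear output neuron is an assumption; the paper's own function types are those listed in its Table 1. The two inputs stand for normalised irradiance and normalised ambient temperature.

```python
import math


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One hidden tanh layer followed by a single linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical, untrained parameters for a 2-3-1 network (illustration only).
W_HIDDEN = [[0.8, -0.2], [0.5, 0.1], [-0.3, 0.4]]
B_HIDDEN = [0.0, -0.1, 0.2]
W_OUT = [0.6, 0.3, -0.2]
B_OUT = 0.05

# Inputs: normalised irradiance and normalised ambient temperature.
predicted_power = forward([0.7, 0.4], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

Training would adjust `W_HIDDEN`, `B_HIDDEN`, `W_OUT`, and `B_OUT` to reduce the error between `predicted_power` and the measured output on the training set, with the validation set used to detect overfitting.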
The support vector machine (SVM) has become popular in several areas because it uses soft computing. The fundamental concept in SVM is to apply optimisation in certain areas through nonlinear optimisation. For regression data, support vector regression (SVR) has been developed. Kernel functions are a prominent feature of the SVM: they project data from a lower-dimensional input space into a higher-dimensional feature space. Several nonlinear kernel functions can be used in regression applications. As a kernel function, the radial basis function (RBF) is very efficient. The support vector machine is used for classification and regression problems, including both linear and nonlinear regression. SVM is aimed at fitting as many data points as possible between the bounding planes. The SVM is based on several different methods. The quadratic kernel trick has been implemented in this model, which is characterised by low interpretability and medium model flexibility. QSVM is more sensitive, which is a major limitation of the proposed model [46]. The quadratic SVM algorithm is shown in Figure 4(a).
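The kernel idea above can be made concrete with a quadratic kernel. The sketch assumes that "quadratic" means the degree-2 polynomial kernel K(x, z) = (x·z + 1)^2, which the paper does not state explicitly, and shows that evaluating the kernel in the two-dimensional input space gives the same value as an inner product in a six-dimensional feature space. The sample points are arbitrary.

```python
import math


def quadratic_kernel(x, z, c=1.0):
    """Degree-2 polynomial kernel evaluated directly in the input space."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + c) ** 2


def quadratic_features(x):
    """Explicit feature map for 2-D inputs that matches the kernel with c = 1."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


# Arbitrary sample points, e.g. (normalised irradiance, normalised temperature).
sample_a = [0.7, 0.4]
sample_b = [0.2, 0.9]

kernel_value = quadratic_kernel(sample_a, sample_b)
explicit_value = sum(
    p * q for p, q in zip(quadratic_features(sample_a), quadratic_features(sample_b))
)
```

The two values agree, so the SVM can work with the cheap kernel evaluation without ever building the higher-dimensional features.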

Decision Tree. The decision tree creates regression or classification models in the form of a tree hierarchy. First, the data are broken down into smaller and smaller subsets, while at the same time an associated decision tree is incrementally developed, which in turn allows new subsets. The result is a tree with decision nodes and leaf nodes. The decision tree can suffer from overfitting and memory constraints, and calculations become more difficult as the tree grows deeper [46]. The decision tree algorithm is shown in Figure 4(b).
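The way a regression tree breaks the data into subsets can be sketched by the search for a single split. This is a minimal illustration under stated assumptions: one input feature, a squared-error criterion, and synthetic values; it is not the tree configuration used in the study.

```python
def sum_squared_error(values):
    """Squared error of a leaf that predicts the mean of its values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, cost) of the split that minimises the total SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best


# Synthetic data (illustration only): irradiance in W/m^2 -> normalised power.
irradiance = [100, 200, 300, 700, 800, 900]
power = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
split_threshold, split_cost = best_split(irradiance, power)
```

A full tree repeats this search inside each resulting subset until a stopping rule is met; the deeper the recursion, the higher the risk of overfitting noted above.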

Energy Performance Metrics
To evaluate the results of our NN-QSVM-TREE models, we need to compare the predicted values with the actual experimental results. The performance of the models is, therefore, assessed by three different metrics [48].

RMSE (Root Mean Square Error).
The root mean square error is the square root of the mean of the squared errors. RMSE is a good measure of accuracy, but it is only applicable to comparing model predictions with data for the same variable and not between variables.
3.2. R² (R-Squared). This is a statistical indicator that describes the proportion of the variance in the dependent variable that is explained by the independent variables.

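\[ \mathrm{MAPE} = \frac{100\%}{M} \sum_{i=1}^{M} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \]
where M is the total number of predictions, y_i are the actual test values, and \hat{y}_i are the predicted values of the NN-QSVM-TREE models.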
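The three metrics can be computed as sketched below using their standard textbook definitions; whether the paper normalises RMSE before reporting it as a percentage is not stated, so no normalisation is applied here. The function names and the four sample values are illustrative assumptions.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean of the squared errors."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """One minus the ratio of residual to total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Illustrative values only (not measurements from the study).
actual_power = [2.0, 4.0, 6.0, 8.0]
predicted_power_values = [2.0, 4.0, 6.0, 10.0]

example_rmse = rmse(actual_power, predicted_power_values)
example_r2 = r_squared(actual_power, predicted_power_values)
example_mape = mape(actual_power, predicted_power_values)
```

For PV data, MAPE needs care around sunrise, sunset, and night, when the actual power approaches zero and the percentage error becomes unbounded.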

Result and Discussion
The efficiency of this method is modelled and evaluated using machine learning algorithms. The algorithms considered are the decision tree, the artificial neural network, and the quadratic SVM; details of the three algorithms can be found in Section 2. The experimental data provided in [49][50][51] are used for this purpose. The power output of the PV panels varies with the changing environmental conditions of the location. The data are recorded every five minutes. The input parameters considered in the study are the incident solar radiation and the ambient temperature, shown in Figure 5. The correlation plot is shown in Figure 6. PV power is then predicted with the NN-QSVM-Tree models. Prediction models are graded based on the mean absolute percentage error (MAPE) and the root mean square error (RMSE). The R² is widely used in statistics. A model explains none of the variation when R² is zero and fits the data perfectly when R² = 1. Firstly, the models are developed using the publicly available experimental data from [49][50][51]. Figure 5 presents the prediction curves of solar radiation. Figure 6 shows stage 1 of our statistical learning algorithm: the correlation plot of the selected features for the PV power prediction. In the processing stage of prediction, the other two models' errors are controlled within the specified range [-8%, 8%]; the NN-QSVM-Tree models' errors vary greatly, and the SVM model's error is almost 25%. The errors of the NN-QSVM-Tree model appear to fall within a ±3 range. NN-QSVM has a relative error between -3% and 3% for 50% of the samples. The prediction results of the NN-QSVM-Tree models for a flat roof are shown in Figure 7. The prediction curves of the solar power output of the NN-QSVM-Tree models for the first sunny day are also presented in Figure 7. All models essentially show similar patterns of the PV power curve.
Figure 11: Mean absolute percentage error of predicted models.
Also, the error of the tree model in the early stage of observation is estimated at 30%. The prediction errors of the NN-QSVM-Tree model fall mostly within a ±2 range. Relative errors of the NN-QSVM-Tree models within [-1%, 1%] account for 97.18% and 93% of the total test samples. The NN-QSVM-Tree model is evaluated using the data from typical summer days. The power output is considered as the training data of the model, considered as the test data. The test results of the New Rating scale are shown in Figure 8. This research shows that the NN-QSVM-Tree models display significant prediction errors at the beginning of the prediction process. The fitting effect of the tree and SVM models is lower in the late stage of prediction. By analysing the network, it is found that the procedure would still be a lot better than the anticipated prediction errors in Figure 9. The errors of the NN-QSVM-Tree models within the [-2%, 2%] range account for 39.44%, 17.43%, and 10.09% of the total evaluation samples. This happens because cloudy weather causes low power production. The predictions are shown in Figure 10. The PV production curve fluctuates greatly for west-oriented façades. All three power models roughly mirror the fluctuation pattern of the output power curve. One significant explanation for the high prediction errors is the uncertainty about power production on the east-oriented façade. On an east- or west-oriented façade, the prediction error is more pronounced. In Figures 9 and 10, the three models' prediction stability is lower during the initial stage of prediction and higher in the middle and late stages. According to the results in Figure 10, there are significant differences between the three models' accuracy. Relative error values of the NN-QSVM-TREE models in the [-3%, 3%] range account for 17.78%, 11.87%, and 13.88% of the total test samples.
Therefore, the prediction errors of the three models also increased. Evaluation results are shown in Table 2, which presents the outcomes of the models for different façade orientations and horizontal surface conditions. The RMSE values and the MAPE values of the NN-QSVM-TREE model are always the smallest under different conditions. For three different weather conditions, the NN-QSVM-TREE models' average RMSE values are 6.51%, 5.54%, 10.83%, 13.10%, and 14.35%. The NN-QSVM-TREE model has the smallest mean squared error, which means the model's predictive stability is optimum. Also, because the PV power output curve has variability and randomness in façade and roof installations, the model's output has a greater prediction error. The average MAPE values are 3.67%, 7.70%, and 8.45%. It can be noticed that the MAPE value of the NN-QSVM-TREE model is lower. The MAPE is shown in Figure 11, supporting the hypothesis that an NN-QSVM-TREE model is useful under different orientations of the building installation. For this assessment index, the R² in sunny weather is substantially higher than in façade installations. Under three situations, the average R² values of the NN-QSVM-TREE models are 88.3%, 88.43%, and 88.230%. The NN-QSVM-TREE model has a superior fitting effect. The RMSE values are compared with those of earlier developed models in Table 3.

Conclusion
State-of-the-art solar power technology will only be established if forecasters can reliably predict how much solar power will be available at a specific place at a particular time (forecasted global horizontal irradiance and ambient temperature and the calculated elevation and azimuth angle of the sun). The built model is replicable since it includes only environmental data without regard to geographic locations. Depending on the nature of the design, the machine learning models are developed with three data sets: training, validation, and test. The results show that the model operates consistently for a specific time and place. For neural network forecasting, applying linear regression coefficients to the forecast outputs improves the forecast accuracy. The final models produced accurate forecasts, with a root mean square error of 4.42% for NN, 16.86% for QSVM, and 8.76% for TREE. These aspects of the prediction process show that the model is position and configuration independent. Finally, a supportive solar power system is disconnected at night with a stable power supply. More computerised management of power grids is expected to increase the performance and reliability of power grid management processes. The new regulations would also help increase the involvement of renewable energy producers and aggregators in the electricity market.
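The post-processing step mentioned above, applying linear regression coefficients to the neural network's forecast outputs, can be sketched as an ordinary least-squares correction. How the coefficients were actually obtained in the study is not described, so the fitting procedure, the function names, and the synthetic numbers below are all assumptions for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def root_mean_square_error(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


# Synthetic values (illustration only): raw network forecasts that run low.
raw_forecast = [0.10, 0.30, 0.50, 0.70, 0.90]
measured_power = [0.14, 0.33, 0.58, 0.75, 0.98]

slope, intercept = fit_line(raw_forecast, measured_power)
corrected_forecast = [slope * f + intercept for f in raw_forecast]

rmse_raw = root_mean_square_error(measured_power, raw_forecast)
rmse_corrected = root_mean_square_error(measured_power, corrected_forecast)
```

On the data used to fit the line, the corrected RMSE cannot exceed the raw RMSE, because leaving the forecast unchanged (slope 1, intercept 0) is one of the candidate lines; any real benefit has to be confirmed on a held-out test set.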

Data Availability
The data used to support the findings of this study are included within the article.