Short-Term Prediction Method of Solar Photovoltaic Power Generation Based on Machine Learning in Smart Grid

To improve the accuracy of ultra-short-term power prediction for photovoltaic power generation systems, a short-term photovoltaic power prediction method based on adaptive k-means and a GRU machine learning model is proposed. The method first introduces the construction process of the model and then builds a short-term photovoltaic power generation prediction model based on adaptive k-means and GRU. Then, the network structure and key parameters are determined through experiments, and the initial training set of the prediction model is selected according to the characteristics of short-term photovoltaic power generation. Adaptive k-means is used to cluster the initial training set and the photovoltaic power on the forecast day. A GRU model is trained on the initial training set data of each category, and the generated power is predicted with the trained GRU model. Finally, considering three typical weather types, the proposed method is used for simulation analysis and compared with three traditional single prediction models for photovoltaic power generation. The comparison shows that the proposed short-term prediction method based on adaptive k-means and a GRU network achieves better prediction performance, better robustness, and lower error.


Introduction
In the 21st century, with increasingly fierce global economic competition, the demand for energy in all countries is rising, and the energy problem has become a key factor affecting the international influence of countries [1]. With the continuous growth of social and economic demand, the use and consumption of traditional fossil energy are also increasing, which not only causes a global shortage of traditional fossil energy but also causes heavy environmental pollution that is serious and difficult to control. There is no doubt that this runs counter to China's proposal to build a resource-saving and environment-friendly society [2][3][4].
There are many renewable energy sources on Earth, including solar energy, wind energy, and others, but many of them share a common drawback: they are not easy to collect and are inconvenient to use. Compared with other energy sources, solar energy is easy to obtain, its conditions of use are unrestricted, and there are no major technical barriers to its application. It is also inexhaustible. It is a new type of green energy that is widely used in the world at present. The most direct use of solar energy is photovoltaic power generation [5][6][7].
In China, the largest developing country, the annual total solar radiation is between 928 kWh/m² and 2333 kWh/m², and the annual average solar radiation is 1626 kWh/m². In most regions of China, the daily average radiation can reach 4 kWh/m², which shows that China has superior natural conditions for photovoltaic power generation [8][9][10][11]. Globally, the newly added PV installed capacity increased from 39.6 GW in 2010 to 580.16 GW in 2019. The specific growth of each year is shown in Figure 1.
With policy support encouraging the development of photovoltaic power generation, China has made breakthroughs in photovoltaic power generation technology in recent years. Some central and western regions of China in particular, constrained by their existing energy structures, have had to transform in this direction. At the same time, more and more difficulties are being encountered, and research on the prediction of photovoltaic power generation is increasing year by year.
Photovoltaic power generation is currently the most important way for humans to use solar energy. It does not harm the environment during operation, and it has the advantages of a short construction period, mature technology, and suitability for large-scale, sustainable development. It has broad development prospects and is increasingly valued [12]. Because photovoltaic power generation requires sunshine, and because of the alternation of day and night and weather changes such as sunny, cloudy, rainy, and snowy conditions, a photovoltaic power station can only generate electric energy during the day. It is a typical intermittent power supply. Its output is affected by meteorological conditions such as solar irradiation intensity and ambient temperature and shows great volatility and randomness [13][14][15][16]. When large-scale photovoltaic power generation is connected to the grid, these characteristics severely affect the stable operation of the power grid and of the entire power system, which is one of the important challenges faced by large-scale photovoltaic power generation. Relevant research shows that when the grid-connected capacity of photovoltaic power generation accounts for more than 15% of the total power generation of the power system, its fluctuation may cause the paralysis of the power system [3]. If photovoltaic power can be predicted promptly and accurately, the impact of its fluctuation on the power grid will be greatly reduced, which is of great significance to power grid dispatching and the operation of photovoltaic power stations [17,18].
Using energy transfer equations or mathematical statistical algorithms to predict future photovoltaic power, so that the output of photovoltaic power stations is known in advance and can serve as a reference for power grid dispatching and power station operation, is an important topic for researchers in relevant fields at home and abroad [19][20][21][22][23][24]. However, errors inevitably occur in the prediction of photovoltaic power generation. If the error of the prediction method is too large, it seriously threatens the reliability of the resulting photovoltaic output prediction system, and the system cannot be widely adopted. As the proportion of photovoltaic capacity in the total generation capacity of the power system increases, the impact of these prediction errors on the stability of the power system also increases. Therefore, to increase the proportion of photovoltaic capacity in the power system and to expand the scale of photovoltaic power generation and its economic and social benefits, short-term prediction of photovoltaic power generation is of great significance [25][26][27]. Existing short-term photovoltaic power prediction methods can be roughly divided into two kinds: physical methods and statistical methods. Physical methods generally establish the prediction model through the solar radiation transfer equation and the operating equations of the photovoltaic equipment; statistical methods use the relationships within historical operation data to establish prediction models [28]. Because a physical prediction model needs the specific parameters of the solar radiation equation and of the photovoltaic modules, its generalization ability is weak and its prediction accuracy is significantly lower than that of statistical methods, so it has been used less.
With continuous breakthroughs in intelligent technology and sustained investment in research, scholars from various countries have extended research on photovoltaic power prediction methods from traditional statistical methods to artificial intelligence algorithms and have achieved good results in photovoltaic power prediction based on neural networks. Intelligent technology is now widely embraced in China and abroad, and in every field people expect to realize it as soon as possible. It is also an important field of current research, and its performance is superior to that of other methods. However, a traditional artificial neural network (ANN) easily falls into a local extremum during training, and its accuracy still has considerable room for improvement. Moreover, existing high-precision ANN prediction methods need high-precision input data, the factors that strongly affect photovoltaic power must be selected manually as input samples, and a lot of preprocessing must be performed on the sample data before the neural network is trained. When the samples are complex and the feature density is low, a shallow ANN may not be able to effectively learn the internal relationship between the input data and the output data, resulting in low prediction accuracy [29].
A comprehensive analysis of the research status of the two directions above, point prediction and interval prediction, shows that research on photovoltaic power prediction is growing with the increasing worldwide demand for photovoltaic power and that research results are emerging continuously. Among them, models built with deep learning methods have high prediction ability, and deep learning is one of the main directions for photovoltaic generation point prediction in the future.

Adaptive k-Means Theory.
For the three typical weather types, clustering the photovoltaic output power separately makes it possible to build a more targeted prediction model and further improve the prediction accuracy. Traditional k-means cannot determine the number of clusters without prior knowledge of the data set, so it is not suitable for clustering the input data set directly. Therefore, this paper introduces an adaptive k-means that automatically sets the number of clusters according to the input data set. Its main idea is an iterative process based on distance [30]. The steps of the k-means algorithm are as follows:
(1) First, according to the prediction needs, a targeted model is built, mainly by determining the number of inputs, outputs, and hidden neurons.
(2) Second, k samples are randomly selected from the constructed data set as the initial cluster centers, recorded as $\lambda_1, \lambda_2, \ldots, \lambda_k$.
(3) Third, the Euclidean distance between each remaining sample and every cluster center is calculated, and each sample is assigned to the nearest cluster center, forming k clusters. For a sample $x_i$ with $M$ features, the distance to center $\lambda_j$ is
$$d(x_i, \lambda_j) = \sqrt{\sum_{m=1}^{M} \left(x_{im} - \lambda_{jm}\right)^2} \qquad (1)$$
(4) Fourth, each cluster center is updated to the mean value of all the samples belonging to that cluster.
(5) The four steps above complete one basic iteration. After that, only the assignment and update steps, (3) and (4), need to be repeated until the algorithm converges.
In addition, to reduce unnecessary errors introduced by manual choices, this method is improved by using quantitative search criteria over the samples to optimize the clustering automatically.
The adaptive k-means selected in this paper is one of these methods, and its specific definition is given by the following formula. To avoid generating too many clusters, a threshold, recorded as $k_{\max}$, limits the number of clusters. The adaptive k-means clustering process is shown in Figure 2.
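The assignment and update steps of k-means described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation; the function names `euclidean` and `kmeans`, the `max_iter` cap, and the fixed random `seed` are assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(samples, k, max_iter=100, seed=0):
    """Plain k-means: random initial centers, then assign/update until stable."""
    rng = random.Random(seed)
    # Step (2): pick k samples at random as the initial cluster centers.
    centers = [list(c) for c in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Step (3): assign every sample to its nearest center.
        labels = [
            min(range(k), key=lambda j: euclidean(s, centers[j]))
            for s in samples
        ]
        # Step (4): move each center to the mean of its member samples.
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # Step (5): stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Calling `kmeans(samples, 2)` on two well-separated groups of points returns one label per sample together with the final centers. An adaptive variant would call this for each candidate k up to the threshold and keep the best-scoring result.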
The strong randomness and large fluctuation of photovoltaic power generation are largely related to meteorological and environmental factors. Temperature, humidity, wind speed, total radiation, air pressure, and other factors influence photovoltaic power generation to varying degrees. Therefore, it is important to choose variables that correctly express the influence of these factors on the short-term prediction of photovoltaic power generation. According to reference [28], the influencing factors are numerous, vary greatly, and depend on conditions such as region. The most influential factor is climate, reflected mainly in the sunshine of the region: if sunshine is sufficient and long, photovoltaic power generation efficiency is high, and otherwise it is low. The selection of variables in this paper relies mainly on these observations.

GRU.
GRU is a neural network model developed to address the deficiencies of LSTM. LSTM was proposed to solve the problems of the RNN model. The gated recurrent unit (GRU) network is another recurrent neural network that also solves the problem that an RNN cannot handle dependencies across large time-step distances. GRU is a variant of LSTM and can also capture long-distance dependencies well. The difference between GRU and LSTM lies in the number of gate units: the GRU network simplifies the three gates of LSTM into two gate units, the update gate and the reset gate. The reset gate is a combination of a memory cell and a hidden layer. Its function is to control how much of the hidden state information of the previous time step is passed to the candidate hidden state of the current time step, and thereby to reset the candidate hidden state information at the current time step. The update gate is a combination of the forget gate and the input gate. Its function is to control how much of the previous hidden state is carried over to the current hidden state and how much is replaced by the candidate hidden state. By simplifying the gate structure, the number of network parameters is reduced, the computation becomes simpler, and the performance of the network model is improved. The GRU network structure is shown in Figure 3.
As shown in Figure 3, the GRU network consists of the reset gate $r_t$, the update gate $z_t$, the candidate hidden state $\tilde{H}_t$, and the hidden state $H_t$. The calculation expressions for the update gate, the reset gate, and the hidden states are as follows.

Update Gate.
The inputs are the input $X_t$ at the current time step and the hidden state $H_{t-1}$ at the previous time step, and the output of the update gate is $z_t$. $z_t$ is obtained by passing a linear combination of $X_t$ and $H_{t-1}$ through the sigmoid function, which gives a value between 0 and 1.

Reset Gate.
The inputs are again the input $X_t$ at the current time step and the hidden state $H_{t-1}$ at the previous time step, and the output of the reset gate is $r_t$. $r_t$ is obtained by passing a linear combination of $X_t$ and $H_{t-1}$ through the sigmoid function, which gives a value between 0 and 1.

Candidate Hidden State.
The reset gate output $r_t$ is multiplied element by element with the hidden state $H_{t-1}$ at the previous time step, the result is linearly combined with the current input $X_t$, and the combination is passed through the tanh function, so that the value of the candidate hidden state $\tilde{H}_t$ lies between −1 and 1.
This shows the function of the reset gate $r_t$. When $r_t = 0$, the element-wise product of the reset gate output $r_t$ and the previous hidden state $H_{t-1}$ is 0. This means that $H_{t-1}$ has no effect on $\tilde{H}_t$, which is equivalent to discarding the information in $H_{t-1}$. The candidate hidden state $\tilde{H}_t$ at the current time step then depends only on the current input $X_t$, which resets the hidden state. For this reason, the reset gate helps to capture short-term dependencies.

Hidden State.
The update gate output $z_t$ linearly combines the hidden state $H_{t-1}$ at the previous time step with the candidate hidden state $\tilde{H}_t$ at the current time step to give the hidden state $H_t$.
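For reference, the four quantities described in this section can be written in a commonly used GRU formulation. This is a sketch rather than the paper's own numbered equations: the weight matrices $W_{xz}, W_{hz}, W_{xr}, W_{hr}, W_{xh}, W_{hh}$ and the biases $b_z, b_r, b_h$ are notation assumed here, $\sigma$ is the sigmoid function, and $\odot$ is the element-wise product. The last line uses the convention that matches the text, in which $z_t = 1$ keeps the previous hidden state.

```latex
\begin{aligned}
z_t &= \sigma\left(X_t W_{xz} + H_{t-1} W_{hz} + b_z\right) \\
r_t &= \sigma\left(X_t W_{xr} + H_{t-1} W_{hr} + b_r\right) \\
\tilde{H}_t &= \tanh\left(X_t W_{xh} + \left(r_t \odot H_{t-1}\right) W_{hh} + b_h\right) \\
H_t &= z_t \odot H_{t-1} + \left(1 - z_t\right) \odot \tilde{H}_t
\end{aligned}
```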
This shows the function of the update gate $z_t$. When $z_t = 1$, then $1 - z_t = 0$. In this case, the hidden state $H_{t-1}$ of the previous time step is passed unchanged to the hidden state $H_t$ of the current time step, so the previous hidden state is completely retained. If there is a long-term dependency, the hidden state information can therefore be transmitted and retained across many time steps. This is why GRU can capture long-term dependencies, and it is the most critical part of the GRU network: the update gate helps capture long-term dependencies.
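The two limiting cases discussed in this section ($r_t = 0$ and $z_t = 1$) can be checked numerically with a toy GRU step. The sketch below assumes scalar input and scalar hidden state to keep it short; the function `gru_step` and the parameter names in the dictionary `p` are hypothetical and chosen only for this illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU step for a scalar input and a scalar hidden state.

    p holds hypothetical scalar weights and biases:
    w_xz, w_hz, b_z (update gate), w_xr, w_hr, b_r (reset gate),
    w_xh, w_hh, b_h (candidate hidden state).
    """
    z_t = sigmoid(p["w_xz"] * x_t + p["w_hz"] * h_prev + p["b_z"])  # update gate
    r_t = sigmoid(p["w_xr"] * x_t + p["w_hr"] * h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much of h_prev is visible.
    h_cand = math.tanh(p["w_xh"] * x_t + p["w_hh"] * (r_t * h_prev) + p["b_h"])
    # New state: z_t close to 1 keeps h_prev, z_t close to 0 takes the candidate.
    h_t = z_t * h_prev + (1.0 - z_t) * h_cand
    return h_t, z_t, r_t, h_cand
```

Setting a large positive `b_z` drives $z_t$ toward 1, so `h_t` stays at `h_prev`; setting a large negative `b_r` drives $r_t$ toward 0, so `h_cand` depends only on `x_t`.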

Construction of the Joint Prediction Framework Based on Adaptive K-Means and GRU
The joint framework forecasts the photovoltaic output power under three typical weather conditions. The main steps of model construction are as follows:
Step 1: according to the input and output of the model, determine the basic input and output units of the model, as well as the number of neurons and the type of activation function;
Step 2: considering the weather conditions of the day to be predicted, select historical similar-day data as the initial training set;
Step 3: normalize all data to obtain the training set and the test set;
Step 4: use adaptive k-means to cluster the historical data and the forecast-day data under each weather type, and combine the clusters with GRU for training and prediction;
Step 5: obtain the results of the photovoltaic power generation prediction model through model training.
The specific process of the model is shown in Figure 4.
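The idea behind Step 4, one model per cluster with each forecast-day sample routed to the model of its own cluster, can be sketched as follows. This is only a structural illustration: `MeanModel` is a deliberately trivial stand-in for a trained GRU, and all class and function names are hypothetical.

```python
from collections import defaultdict


class MeanModel:
    """Stand-in for a trained GRU: predicts the mean target seen in training."""

    def fit(self, targets):
        self.mean = sum(targets) / len(targets)
        return self

    def predict(self):
        return self.mean


def train_per_cluster(train_labels, train_targets):
    """Group training targets by cluster label and fit one model per cluster."""
    grouped = defaultdict(list)
    for label, target in zip(train_labels, train_targets):
        grouped[label].append(target)
    return {label: MeanModel().fit(targets) for label, targets in grouped.items()}


def predict_forecast_day(models, forecast_labels):
    """Route each forecast-day sample to the model of its own cluster."""
    return [models[label].predict() for label in forecast_labels]
```

Because the training set and the forecast day are clustered together, every forecast-day sample carries a cluster label. A label with no training samples would raise a `KeyError` here, so a real pipeline would need a fallback for that case.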

Introduction to Original Data.
The experimental data set in this paper is a photovoltaic power generation data set from a certain region. It records the power generation data of more than 100 users equipped with solar power generation devices in 2020, with a sampling interval of 1 hour. That is, the power and meteorological data are recorded separately for each power station. The fields used in the power generation data table are shown in Table 1. The relationship between the three power fields is use = gen + grid. Use is the total power consumption of the photovoltaic power station; gen is the power generated by the solar photovoltaic system itself; grid is the power exchanged between the power station and the power grid. A positive grid value indicates power drawn from the grid, and a negative value indicates power output to the grid. The other two powers are always positive. The format of the data set differs between households because of the different types of sensors installed, and the format of the power consumption field also differs. The recorded data sets fall into three formats: type I contains a gen (generated power) field, which can be used directly in the photovoltaic output prediction experiment; type II does not contain a gen field, so the generated power must be obtained by subtracting the grid field from the use field; type III accumulates values automatically during recording, so the generated power of a period is obtained by subtracting the readings of adjacent periods. The unit of all three is kW, so the unit of photovoltaic output power throughout the paper is also kW. The weather data matches the power generation data, and its description is given in Table 2. Figure 5 shows the photovoltaic output power over one week.
Photovoltaic power generation clearly shows diurnal periodicity, volatility, and randomness. Therefore, this paper relies on the strong feature learning ability of deep learning algorithms to analyze and predict it and to capture its characteristics effectively.
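The three record formats described above each lead to the generated power in a different way. A minimal sketch, assuming each field is available as a list of hourly readings and that the accumulated quantity in type III is the generated power itself (the text does not say which field is accumulated); the function names are hypothetical:

```python
def gen_type_one(gen):
    """Type I: the gen field is recorded directly."""
    return list(gen)


def gen_type_two(use, grid):
    """Type II: use = gen + grid, so gen = use - grid for each period."""
    return [u - g for u, g in zip(use, grid)]


def gen_type_three(cumulative):
    """Type III: readings accumulate, so difference adjacent periods."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
```

For example, `gen_type_two([3.0, 2.0], [1.0, -0.5])` gives `[2.0, 2.5]`: in the second period the station consumes 2.0 while exporting 0.5 to the grid, so it must have generated 2.5. Note that differencing n cumulative readings yields n − 1 period values.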

Data Preprocessing.
Figure 5 shows that the power generation data generally has a stable trend and relatively concentrated values. However, at certain times and stages, part of the data is incomplete (lacking some attribute values of interest), inconsistent (containing differences in codes or names), or affected by noise (errors or abnormal values). In addition, when the database is very large, the data often comes from multiple heterogeneous sources and is of low quality, so the results obtained from network training are often poor. Therefore, the original power generation data must be preprocessed and standardized to eliminate the influence of the dimensions of the data and of other useless features and to ensure the effectiveness of training on the sample data.
At present, min-max normalization based on a linear transformation of the values is often adopted at home and abroad, which restricts the data to the interval [0, 1]:
$$X^* = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of the photovoltaic data of the overall data set, and $X^*$ is the normalized value obtained from this formula, which is not greater than 1 and not less than 0.
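Min-max normalization is simple to apply in code. The sketch below is a plain-Python illustration; the function name and the handling of a constant series (mapped to zeros to avoid division by zero) are assumptions made for the example.

```python
def min_max_normalize(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no range, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, `min_max_normalize([2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`. In a prediction setting, the minimum and maximum would normally be taken from the training data and reused for the test data.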

Selection of Prediction and Evaluation Indicators for Photovoltaic Power Generation.
The main evaluation metrics for prediction algorithms include the root mean square error (RMSE), the mean absolute percentage error (MAPE), the sum of squared errors (SSE), the mean square error (MSE), and the mean absolute error (MAE). In this paper, RMSE and MAPE are used to evaluate the prediction results.
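(1) The root mean square error RMSE is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(P'_{f,i} - P'_{a,i}\right)^2}$$
where $N$ is the number of predictions; $P'_{f,i}$ is the predicted value; $P'_{a,i}$ is the actual value; and $i$ is the prediction time.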
(2) The mean absolute percentage error MAPE is defined as follows:
$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{P'_{f,i} - P'_{a,i}}{P'_{a,i}}\right| \times 100\%$$
where $N$ is the total number of samples; $P'_{f,i}$ is the $i$th predicted value; and $P'_{a,i}$ is the $i$th actual value. The MAPE criteria for evaluating the prediction accuracy are shown in Table 3.
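Both metrics take only a few lines to compute. The following sketch uses the standard definitions of RMSE and MAPE; the function names are hypothetical, MAPE is returned in percent, and it assumes every actual value is nonzero (which is one reason to restrict evaluation to daytime hours, when the output is not zero).

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def mape(predicted, actual):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((p - a) / a) for p, a in zip(predicted, actual)) / n
```

RMSE is expressed in the unit of the data (kW here) and penalizes large individual errors more heavily, while MAPE is scale-free, which makes it easier to compare across weather types with different output levels.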

Model Input Feature Selection.
This paper selects the photovoltaic power data of a photovoltaic power generation system over a certain interval, with 24 sample points collected every day. Only the period of stable photovoltaic output is selected for analysis. The photovoltaic power under different weather conditions is shown in Figure 6. When the weather is relatively stable, the photovoltaic power is highest in sunny weather, followed in turn by cloudy, overcast and rainy, and snowy weather. In stable weather, the power fluctuation of the photovoltaic system is small and the output is relatively steady, with an overall shape close to a parabola. In sudden-change weather, photovoltaic power fluctuates greatly, which has a great impact on the stable operation of the entire power grid. Therefore, distinguishing weather types is of great significance for the prediction of photovoltaic power generation.
In summary, many factors can affect photovoltaic power, but the main ones are weather and time period. Therefore, the input features for photovoltaic power prediction in this paper mainly consist of these two kinds.

Prediction Steps of Photovoltaic Short-Term Power Generation.
The combined neural network prediction model established in this paper mainly highlights the impact of day type on the output power of the photovoltaic power generation system. When the model is established, the weight of the day type index is therefore increased, and the impact of weather factors does not need to be considered separately during prediction. The model can adapt to all kinds of weather and has strong adaptability and good prediction ability.

Determining Input Data.
The number of nodes in the input layer corresponds to the input variables of the prediction model. The prediction model in this paper has a total of 12 input variables. Given the geographical location of the photovoltaic power station selected in this paper, and according to historical data, the power generation between 18:00 on one day and 8:00 on the next day is almost 0, so the 10 hourly power generation values from 9:00 to 18:00 on the day before the prediction day are selected as inputs to the prediction model.
In the prediction of photovoltaic power generation output, it is usually necessary to divide the submodels carefully according to the type of day; otherwise the prediction model may fail. Because the type of day has a large influence among the factors that affect photovoltaic power generation output, this paper further increases the weight of the day type index when establishing the prediction model.
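Extracting the 10 daytime power inputs from a day of hourly data is a simple slice. The sketch below assumes a list of 24 hourly readings in which index h holds the sample for h:00; the function name and that indexing convention are assumptions, and the remaining two of the 12 input variables are not covered because this passage does not spell them out.

```python
def daytime_window(hourly_power, start_hour=9, end_hour=18):
    """Return the hourly samples from start_hour to end_hour, both inclusive."""
    if len(hourly_power) != 24:
        raise ValueError("expected 24 hourly samples, one per hour of the day")
    return hourly_power[start_hour:end_hour + 1]
```

With hourly sampling, the window from 9:00 to 18:00 inclusive contains exactly 10 values, matching the 10 power inputs described in the text.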

Determining the Output Layer.
The prediction result is the generation output power in each period of the day, so there are n nodes at the output end, corresponding to the hourly output power between 9:00 and 18:00.

Sample Data Selection.
Considering the weather type, the photovoltaic power on sunny, cloudy, and rainy days is predicted with the prediction model in this paper. The data comes from field measurements in a city in China and includes total radiation, humidity, temperature, and photovoltaic power generation. The data is real data from 2020. The selected time period is between 9:00 and 18:00, and the sampling interval is 15 minutes, that is, the number of sampling points in a day is 40. The forecast day in the first period is sunny, the forecast day in the second period is cloudy, and the forecast day in the third period is rainy. Because snow data is lacking, snowy weather is not included in the analysis and prediction. The power fluctuation caused by meteorological factors in adjacent periods is small. Therefore, data from days with the same weather type as the forecast day is selected as the initial training set. According to the type of the forecast day, 10 similar days are selected as the initial training set.

Cluster Analysis.
Taking the total radiation, humidity, and temperature of the similar days and the forecast day as inputs, the photovoltaic power under each of the three weather types is clustered separately. The initial training set is 400 time sampling points, plus the data of the forecast day, a total of 520 data. According to the previous analysis, adaptive k-means is used for clustering. Considering the length of the data, $k_{\max}$ is set to 10. Time period 1 (sunny) is taken as an example for analysis. The value range of k is [2, 10]. The DBI for different values of k is shown in Figure 7.
According to Figure 7, DBI is largest, at 1.61, when k is 6, and smallest, at 1.28, when k is 4 or 8, where the clustering effect is best. Therefore, the data of the initial training set and the forecast day on sunny days are divided into three categories, called class 1, class 2, and class 3 here. Similarly, adaptive k-means is used to cluster the data of the initial training set and the forecast day for cloudy and rainy days. In cloudy weather, DBI is smallest, at 1.12, when k is 4. In rainy weather, DBI is smallest, at 1.46, when k is 4.
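Choosing k by the smallest DBI can be sketched as below, assuming that DBI here denotes the Davies-Bouldin index (the text does not expand the abbreviation). The function names are hypothetical, and breaking ties toward the smaller k is an assumption, since the text does not say how ties such as k = 4 and k = 8 are resolved.

```python
import math


def davies_bouldin(samples, labels):
    """Davies-Bouldin index: average over clusters of the worst similarity ratio.

    Lower is better. Requires at least two clusters with distinct centers.
    """
    clusters = {}
    for sample, label in zip(samples, labels):
        clusters.setdefault(label, []).append(sample)
    # Center and mean within-cluster distance (scatter) for every cluster.
    centers = {
        lab: [sum(col) / len(pts) for col in zip(*pts)]
        for lab, pts in clusters.items()
    }
    scatter = {
        lab: sum(math.dist(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


def select_k(dbi_by_k):
    """Return the k with the smallest DBI; ties go to the smaller k."""
    return min(sorted(dbi_by_k), key=lambda k: dbi_by_k[k])
```

Using only the sunny-day values quoted in the text, `select_k({4: 1.28, 6: 1.61, 8: 1.28})` returns 4 under the assumed tie-break.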

Analysis of Prediction Results.
In order to compare the advantages of the model constructed in this paper, in addition to the training of the constructed model, this paper also selects two other commonly used short-term prediction models (LSTM and BP) and a single GRU model for comparative analysis.
The photovoltaic power prediction results in sunny weather are shown in Figure 8, and the prediction errors are listed in Table 4. In sunny weather, the fluctuation of the photovoltaic output curve is small, and the power change has a certain regularity. All four models show good prediction results. The MAPE and RMSE values show that the error of the proposed adaptive k-means GRU prediction model is smaller than that of the single GRU model and the other two models, and that the accuracy of the single models drops slightly when the power curve fluctuates slightly at noon. Compared with the GRU, LSTM, and BP models, the proposed prediction model based on adaptive k-means GRU follows the actual power curve more closely overall and gives the best fit. The photovoltaic output power prediction results and the RMSE and MAPE results under cloudy weather conditions are shown in Figure 9 and Table 5, respectively. In cloudy weather, sunshine is clearly lower than in sunny weather. Because of such complex and changing factors, it is difficult for each model to account for the variability, which easily leads to prediction errors. The prediction results and the RMSE and MAPE results are prone to large deviations, which makes the differences in prediction accuracy between the models apparent. The proposed model reduces the influence of power data fluctuation on accuracy and improves the prediction performance during periods when weather conditions fluctuate. In cloudy weather, the MAPE value of the k-means GRU model is lower than that of GRU, LSTM, and BP; the values of the four models are 0.04, 0.13, 0.21, and 0.37, respectively. The error of the k-means GRU model is therefore significantly lower than that of the other models and is 10.81% of that of the traditional BP neural network model.
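The 10.81% figure follows directly from the cloudy-weather MAPE values quoted above: 0.04 / 0.37 ≈ 0.1081. A short sketch of the same arithmetic for all four models, using only the numbers given in the text (the variable and function names are hypothetical):

```python
# Cloudy-weather MAPE values as quoted in the text.
mape_cloudy = {"k-means GRU": 0.04, "GRU": 0.13, "LSTM": 0.21, "BP": 0.37}


def relative_to(baseline, values):
    """Express each model's error as a percentage of the baseline model's error."""
    return {
        name: round(100.0 * value / values[baseline], 2)
        for name, value in values.items()
    }


ratios = relative_to("BP", mape_cloudy)
```

Here `ratios["k-means GRU"]` evaluates to 10.81, matching the percentage stated in the text, and `ratios["BP"]` is 100.0 by construction.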
Therefore, it can be concluded that the k-means GRU model has better prediction performance for photovoltaic power and the lowest prediction deviation. The prediction results of photovoltaic output power and the RMSE and MAPE results under rainy weather conditions are shown in Figure 10 and Table 6. In rainy weather, the influence of the weather is more pronounced than in cloudy weather: sunshine is significantly lower, and rain adds further interference. With these more complex factors, the variability is harder to account for, which leads to more prediction errors, and the prediction results and the RMSE and MAPE results are prone to large deviations. The differences in prediction accuracy between the models are therefore easier to see. Weather whose conditions fluctuate from time to time is also called sudden-change weather. In the sudden-change weather of rainy days, the MAPE value of the k-means GRU model is lower than that of GRU, LSTM, and BP; the values of the four models are 0.08, 0.17, 0.27, and 0.42, respectively. The other models fall outside the predictable range and the acceptable error range. Only the error of the model constructed in this paper remains within the controllable range, and the adaptive k-means GRU model has better prediction performance for photovoltaic power, with the lowest prediction deviation.
Mathematical Problems in Engineering
The short-term prediction results of the four models for photovoltaic power generation show that the adaptive k-means GRU model constructed in this paper has advantages over the other models on sunny, cloudy, and rainy days, including in sudden-change weather.
Its errors are lower than those of the other models, and its accuracy is higher, which verifies the effectiveness of the adaptive k-means GRU short-term photovoltaic power generation prediction model constructed in this paper.

Conclusion
To improve the short-term prediction accuracy of photovoltaic power, this paper distinguishes weather types and proposes an ultra-short-term photovoltaic power prediction model based on the adaptive k-means GRU method. Adaptive k-means is used directly to cluster the initial training set and the photovoltaic power of the prediction day, to find the local characteristics of the data, and to predict the power. Three single models are established for comparison with the proposed model, and the prediction error is evaluated with MAPE and RMSE. The proposed model addresses the low accuracy of traditional prediction methods under power fluctuations. The main conclusions are as follows:
(1) Photovoltaic output power is highly random and has many influencing factors, including irradiation time and weather; the weather type in particular has a great impact on photovoltaic output power.
(2) With the combination of adaptive k-means and GRU models, data preprocessing and cluster analysis of the initial training set and the photovoltaic power generation on the forecast day can significantly improve the prediction accuracy of the model.
(3) Comparison with other models shows that the prediction performance and stability of the proposed model are better on sunny and cloudy days and that the prediction performance is improved on rainy days. The prediction error of the model is significantly lower than that of the other neural network models and is only 10.81% of that of the BP neural network, which indicates the reliability of the model.
(4) According to the prediction results for sunny, cloudy, and rainy days, the prediction accuracy of the combined adaptive k-means and GRU model meets the ultra-short-term prediction requirements for the output power of a photovoltaic power generation system. The difference between predicted power and actual power is small, and the prediction error does not affect the normal operation of the system.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.