Short-Term Master-Slave Forecast Method for Distributed Photovoltaic Plants Based on the Spatial Correlation

With the large-scale integration of distributed photovoltaic (DPV) power plants, the uncertainty of photovoltaic generation increasingly affects the secure operation of power systems, and improving the forecast capability of DPV plants has become an urgent problem. However, most DPV plants cannot produce generation forecasts on their own because of constraints on investment cost and data storage and the influence of the microscale environment. Therefore, this paper proposes a master-slave forecast method that predicts the power of target plants without forecast ability from the power of DPV plants with a comprehensive forecast system and the spatial correlation between these two kinds of plants. First, a characteristic pattern library of DPV plants is established with the K-means clustering algorithm, considering the time difference. Next, the pattern most spatially correlated to the target plant is determined through online matching. The corresponding spatial correlation mapping relationship is obtained by numerical fitting using the least squares support vector machine (LS-SVM), and the short-term generation forecast for target plants is obtained from the forecast of the reference plants and the mapping relationship. Simulation results demonstrate that the proposed method could improve the overall forecast accuracy by more than 52% for univariate prediction and by more than 22% for multivariate prediction and obtain short-term generation forecasts for DPV or newly built DPV plants with low investment.


Introduction
Electric power consumption has increased drastically in recent years, and with the decreasing supply of fossil fuels, renewable generation, especially photovoltaic (PV) generation, has developed rapidly as well [1]. Against the background of green energy strategies, the global installed PV capacity has reached 300 GW. However, large utility-scale generation is typically deployed in rural areas far from the load centers, so the generated power is not used efficiently. Integrating distributed PV generation into the distribution network could help resolve this mismatch between the locations of generation and consumption [2]. However, the distribution network is the terminal end of the power system, with weak infrastructure and low reserve capacity. The increasing amount of highly intermittent and variable DPV generation will greatly affect the stability of power systems [3,4]. Therefore, accurate generation forecasting for DPV is important for the scheduling and stable operation of power systems. Generation forecasts can be categorized into short-term forecasts (0-72 hours ahead of the next day) and ultra-short-term forecasts (15 minutes-4 hours ahead) [5,6]. Short-term generation forecasts provide supporting data for power system scheduling decisions and help improve operational reliability.

Literature Review.
There is a considerable amount of scientific literature on renewable energy forecasting, and current research [7][8][9][10] on generation forecasting for intermittent renewable energy has made great achievements, but the forecast methods mostly focus on large-capacity wind and solar power plants, in which a single generating unit has an installed capacity at the MW level. Renewable energy forecast methods can be classified into two major categories: time series forecast methods and spatial distribution forecast methods.
Time series forecast methods analyze past trends to predict future events, on the assumption that future trends will resemble historical ones. In [11], two numerical weather prediction models are used to forecast the weather variables, which a third module then uses to predict the hourly energy production of the PV plant. A weather status pattern recognition model for short-term PV forecasting, based on solar irradiance feature extraction and a support vector machine, is presented in [12]. These references perform time series forecasting to predict the weather and then obtain the short-term PV power forecast. Other references on time series forecasting focus on different algorithms, e.g., traditional physical model prediction [13], BP artificial neural network (ANN) prediction with accurate numerical weather forecasts [14], extreme learning machine (ELM) [15], and support vector machine (SVM) [16,17]. In [18], a new model that combines two well-known methods, the seasonal autoregressive integrated moving average method and the support vector machine method, is proposed for short-term power forecasting of a grid-connected photovoltaic plant. A short-term forecasting method for large-scale grid-connected PV plants using ANN is presented in [19]. A genetic algorithm-based SVM model for short-term power forecasting of a residential-scale PV system is proposed in [20]. Reference [21] reviews the methods used to predict PV power, with the main focus on metaheuristic and machine learning methods. In general, these time series methods rely on a large amount of historical generation data and numerical weather forecasts and could obtain high forecast accuracy. However, they do not consider the spatial characteristics of distributed PV systems. This paper focuses on distributed PV generation, which suffers from deficient historical data, and considers its spatial distribution characteristics to realize short-term power prediction.
Spatial distribution forecast methods consider the geographic information and the spatial distribution characteristics of PV systems. The effect of spatially and spectrally nonuniform irradiance distribution on multijunction solar cell performance is analyzed using an integrated approach [22], and the spatial dependence of variations in the power output of small residential PV systems is investigated, indicating that the fluctuations are correlated up to a certain decorrelation length [23]. In [24], Karakaya applies the finite element method to forecast the diffusion of solar PV systems in time and space, in which the time-varying parameters are arduous to determine. Spatial clustering of PV systems and quantitative analysis of PV adoption drivers in the time dimension are investigated to propose a data-driven approach to forecasting PV diffusion in [25]. These references verify the spatial distribution characteristics or forecast the diffusion of PV systems, whereas our research utilizes the spatial distribution characteristics for DPV power prediction. However, the methods reviewed above are not suitable for DPV prediction because of the data constraints and distributed characteristics of DPV [26]. In terms of data constraints, in actual DPV projects, most DPV plants are not equipped with their own forecasting module and, because of limited investment, cannot store a large amount of historical data or obtain weather forecast data.

Explanation of Spatial Correlation.
In terms of distributed characteristics, the factors affecting generation include not only natural factors such as radiation and temperature but also the installed tilt angle, construction layout, vegetation, and microscale weather, which could vary widely even within a small range [27]. Figure 1 illustrates the spatiotemporal distribution characteristics of DPV. The DPV plants are distributed in 6 areas across 3 time zones. The microscale environments in the areas differ from each other, and the generation of a DPV plant may be more closely related to its immediate surroundings than to the area it is in. For example, the generation pattern of a DPV plant in area A may be similar to that of a plant in area F, even though it is located far away and in a different time zone, because the microscale environments (shadow of obstacles, moisture, and building height) are similar. The installation details, such as the tilt angle and direction, also vary. This similarity, regardless of time-space continuity, is revealed in data correlation instead of physical connections [25]. We define this correlation as a spatial correlation as follows: spatial correlation refers to the numerical correlation of DPV generation at different locations. When the spatial correlation is analyzed, the time difference between generation curves is eliminated through data processing.

Contribution.
The current technical bottleneck of DPV generation forecasting is caused by data deficiency and complex influencing factors, which make traditional mathematical modelling infeasible for DPV forecasting. A new method that accounts for the data deficiency and the spatiotemporal distribution characteristics is required. In current installations, a few DPV plants have a functional forecast system; these are used as reference plants in the remainder of this paper. Meanwhile, most DPV plants cannot make generation forecasts on their own because of economic and technological constraints. These plants are referred to as target plants. Accordingly, this paper takes advantage of big data methodology and proposes a master-slave forecast technique based on the spatial correlation between reference plants and target plants, considering multiple affecting factors including radiation, temperature, time zone, etc., which were not studied before. The technique uses a master-slave forecast framework: it matches the generation characteristics of target plants to reference plants using data correlation, forecasts the generation of the slave target plants from the forecast data of the spatially correlated master reference plants, and thus realizes DPV generation forecasting through the data correlation relationship.
Based on the bottleneck analysis of DPV generation forecasting and the characteristics of DPV, the main contributions of this paper are as follows: (1) A spatial correlation matching method is proposed to obtain the data correlation relationship across time and space between target plants and reference plants, in which the K-means clustering algorithm is used to cluster reference plants into groups with individual patterns on the basis of their generation characteristics. The clustering could reduce the computation time of online matching and improve the matching accuracy.
(2) A master-slave forecast method is presented to make short-term generation forecasts for a large number of target plants, in which the LS-SVM algorithm is used to obtain the spatial correlation mapping relationship. The power of the target plants (slaves) could therefore be predicted from the power of the reference plants (masters) and the spatial correlation between these two kinds of plants.

Article Organization.
The remainder of this paper is organized as follows. Section 2 introduces the master-slave forecast framework. Section 3 describes the matching method for the spatial correlation relationship and studies the time difference characteristics of DPV generation curves. Section 4 conducts a case study that validates the advantages of the proposed technique. Finally, Section 5 concludes the paper.

Framework of Short-Term Master-Slave Forecast Technique Based on Spatial Correlation
In this section, the framework of the proposed master-slave forecast technique is illustrated and explained. The forecast method is based on the spatial correlation between the generation characteristics of different DPV plants. Data mining shows that the generation trajectories of different DPV plants in the same time dimension have a certain numerical correlation; that is, two or more numerical trajectories approximately fit some correlation relationship. For example, Figure 2 shows the generation curves of some randomly chosen DPV plants in 3 different areas and a comparison of selected curves from all areas. The generation curves within the same area have different shapes, and a curve might be more similar to curves from other areas than to curves within its own area, although the DPV plants within one area are geographically closer. Therefore, the spatial correlation is defined as a numerical correlation between the generation data of different DPV plants, and the geographical relationship is ignored. The master-slave forecast technique based on spatial correlation uses the short-term forecast of the reference DPV plants (master plants) and the spatial correlation relationship to forecast the short-term generation of the target DPV plants (slave plants) indirectly. The master-slave DPV generation forecast framework based on spatial correlation is shown in Figure 3.
As shown in Figure 3, the framework of the master-slave prediction method consists of three parts, namely, the left part, the middle part, and the right part. The left part is the forecast results of the reference DPV plants. Based on the generation trajectory, historical meteorology, and other related information, the power of the reference DPV plants is predicted to support the prediction of the target DPV plants. The middle part is the offline clustering of reference plants on the basis of their historical generation trajectories. Because there are a large number of reference plants, many of which are spatially correlated, a pattern library is established using offline K-means clustering to reduce the search time for online matching [28]. The right part is the online matching process for target plants, which establishes the spatial correlation mapping relationship between target plants and the patterns in the library. If the matching is successful, the forecast results for the target plants are obtained from the forecast results of the correlated pattern and the correlation relationship. If the matching fails, other forecast methods should be adopted.

Spatial Correlation Matching with K-Means Clustering
To utilize the spatial correlation between reference plants and target plants, this section gives the pattern matching method that finds the reference master plants correlated with the target slave plants. The K-means clustering algorithm is used to cluster the master plants into groups with individual patterns according to their generation characteristics, thus constructing the standard pattern library. Next, the clustering significance index (CSI) is defined to set the cluster number, and the standardized Euclidean distance (SED) is used to match the standardized DPV generation data to the data patterns in the pattern library and so establish the spatial correlation mapping. The spatial correlation matching process using K-means clustering is shown in Figure 4.
The corresponding algorithms are described in detail in Sections 3.1-3.4.

Data Standardization.
Data standardization is the process of scaling and nondimensionalizing data so that they can be compared. In this paper, the raw DPV generation data A are normalized row by row using the following equation:

$$B_{ij} = \frac{A_{ij} - \operatorname{mean}(A_j)}{\operatorname{std}(A_j)}, \tag{1}$$

where $A_j$ is the jth row of the matrix A, $A_{ij}$ is the ith element in $A_j$, $\operatorname{mean}(A_j)$ denotes the mean value of vector $A_j$, $\operatorname{std}(A_j)$ denotes the standard deviation of vector $A_j$, and $B_{ij}$ is the ith element of the jth row of matrix B. After standardization, $A_j$ is converted to $B_j$. The mean of vector $B_j$ is 0, and the variance of vector $B_j$ is 1. Vector $B_j$ is called a standard vector, and matrix B is the standardized matrix of A.
Data standardization could reduce the influence of differences in DPV installed capacity on the spatial correlation while preserving the trend characteristics of the historical DPV generation data.
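The row-wise standardization described in this subsection can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `standardize_rows` and the toy generation curves are hypothetical, and the population standard deviation is assumed because the text does not say which estimator is used.

```python
from statistics import mean, pstdev

def standardize_rows(a):
    """Z-score each row of a (a list of rows): zero mean, unit variance."""
    b = []
    for row in a:
        mu = mean(row)
        sigma = pstdev(row)  # assumption: population standard deviation
        if sigma == 0:
            # A constant row carries no trend information; map it to zeros.
            b.append([0.0] * len(row))
        else:
            b.append([(x - mu) / sigma for x in row])
    return b

# Two made-up plants with the same daily shape but a 10x capacity difference.
small = [0.0, 1.0, 3.0, 1.0, 0.0]
large = [10 * x for x in small]
b = standardize_rows([small, large])
```

After standardization the two rows coincide, which is the sense in which the capacity difference is removed while the trend is preserved.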

K-Means Clustering of Reference Plants.
PV generation is uncertain and fluctuating, and the generation curve of a single arbitrary day could not represent the general generation pattern of a plant. Therefore, the average of several days of generation data is used to establish the pattern library of reference plants.
In this paper, the K-means algorithm is adopted for clustering, and the plants are divided into several groups such that the plants within each group share a similar generation pattern. The cluster number must be supplied as an input to the K-means algorithm. The clustering significance index (CSI), given in equation (2), is used to determine the group number, where N is the number of clustering groups, n j is the number of plants in the jth group, X cj is the eigenvector of the jth group, and X ji is the vector of the ith plant in the jth group. The number of groups is determined through iteration: different values of N are selected, and the CSI is calculated for each N. The value of N with the largest CSI is chosen as the group number input to the K-means algorithm, and the PV generation patterns are obtained subsequently.
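As an illustration of how a pattern library could be built from averaged daily curves, the following Python sketch averages several days per plant and clusters the results with plain Lloyd's K-means. The helper names (`average_day`, `kmeans`), the deterministic farthest-point initialisation, and the toy data are all assumptions made for this example; the CSI-based choice of N is not reproduced, and the group number is simply passed in.

```python
def average_day(days):
    """Average several daily generation curves point by point."""
    return [sum(col) / len(days) for col in zip(*days)]

def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(vectors, k, max_iter=100):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    # Assumption: farthest-point initialisation, chosen only for repeatability.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Toy data: two days per plant, four monitoring points per day (made-up numbers).
plants = [
    average_day([[0.0, 2.0, 1.0, 0.0], [0.0, 2.2, 0.8, 0.0]]),  # morning-peaked
    average_day([[0.0, 2.1, 1.1, 0.0], [0.0, 1.9, 0.9, 0.0]]),  # morning-peaked
    average_day([[0.0, 1.0, 2.0, 0.0], [0.0, 0.8, 2.2, 0.0]]),  # afternoon-peaked
    average_day([[0.0, 1.1, 2.1, 0.0], [0.0, 0.9, 1.9, 0.0]]),  # afternoon-peaked
]
pattern_library, labels = kmeans(plants, k=2)
```

Each centroid in `pattern_library` plays the role of one generation pattern; in practice the curves would be standardized first, and the group number would be chosen by comparing a clustering quality index over several candidate values.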

Online Matching of Spatial Correlation.
The online pattern matching process is as follows. Extract n monitoring points from the recent historical data of the target DPV plant, counting backward from the forecast point, as the prediction window vector. Standardize the prediction window vector, add it to the pattern library as the (T + 1)th cluster, and perform a clustering process. If the current window vector falls into the same cluster as the ith pattern vector, the target DPV plant is determined to be spatially correlated with the ith type of reference DPV plants.
The standardized Euclidean distance (SED) is commonly adopted in practical applications as the criterion of correlation. Therefore, in this paper, the SED is used to quantify the correlation, and the optimal delay value Δt is determined by searching for the minimum SED. Equation (3) gives the SED, where a [i] and b [i] are the ith elements of vector A and vector B, respectively. In theory, the probability of successfully matching the spatial correlation is higher if the reference PV power plants are more numerous and more evenly distributed. For target DPV plants that fail to match any reference DPV plant, a forecast based on temporal correlation or another forecast method is recommended.
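The matching step can be sketched in Python as a nearest-pattern search with a rejection threshold. This is a hedged illustration only: the distance is taken to be the ordinary Euclidean distance between already standardized vectors, which is an assumption, and `match_pattern`, the toy patterns, and the threshold value used here are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long standardized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_pattern(window, patterns, threshold):
    """Return (index, distance) of the closest pattern.

    The index is None when even the closest pattern is farther away than
    the threshold, i.e. the target plant stays unmatched.
    """
    dists = [euclidean(window, p) for p in patterns]
    best = min(range(len(patterns)), key=dists.__getitem__)
    if dists[best] > threshold:
        return None, dists[best]
    return best, dists[best]

# Made-up standardized patterns and prediction windows.
patterns = [[-1.0, 0.0, 1.0], [1.0, 0.0, -1.0]]
matched = match_pattern([-0.9, 0.0, 0.9], patterns, threshold=1.4)
unmatched = match_pattern([0.0, 5.0, 0.0], patterns, threshold=1.4)
```

An unmatched plant (index `None`) would fall back to a temporal correlation forecast or another method, as the text recommends.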

Numerical Fitting Using LS-SVM.
After spatial correlation matching, one or more reference DPV plants are chosen from the spatially correlated reference plant group. The spatial correlation model is obtained by numerically fitting the prediction window historical data of the reference plants and the target plant. Next, the short-term generation of the target plant could be calculated from the short-term forecast of the reference DPV plants and the spatial correlation relationship.
Least squares support vector machine (LS-SVM) regression is applied to perform the numerical fitting, as it could achieve better results for multivariate regression. The regression function is given in equation (4), where α i and b are the coefficients to be determined and K (X i , X j ) is the kernel function. The radial basis function (RBF), given in equation (5), is often used as the kernel function for regression problems, where σ is called the extension constant of the RBF and reflects the width of the function. The smaller the width σ is, the more selective the function is.
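To make the fitting step concrete, the sketch below implements the textbook LS-SVM regression, in which the prediction is f(x) = Σ α_i K(x, x_i) + b and the coefficients come from one linear system. Several choices are assumptions and not taken from the paper: the kernel form exp(-‖u - v‖² / (2σ²)), the regularization parameter `gamma`, the function names, and the toy data. A small pure-Python solver is used only to keep the example self-contained; a numerical library would normally perform this step.

```python
import math

def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rbf(u, v, sigma):
    # Assumed kernel form: exp(-||u - v||^2 / (2 sigma^2)).
    return math.exp(-sum((p - q) ** 2 for p, q in zip(u, v)) / (2 * sigma ** 2))

def lssvm_fit(xs, ys, gamma=1000.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(xs)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(xs[i], xs[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + list(ys))
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(x, xs, alpha, b, sigma=1.0):
    return b + sum(al * rbf(x, xi, sigma) for al, xi in zip(alpha, xs))

# Toy mapping: the target plant produces half the reference plant's power.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0]]   # reference power (made up)
ys = [0.0, 0.5, 1.0, 1.5, 2.0]            # target power (made up)
b, alpha = lssvm_fit(xs, ys)
fitted = [lssvm_predict(x, xs, alpha, b) for x in xs]
```

With several matched reference plants, each input `x` would simply hold one entry per reference plant, which is how the multivariate case fits the same code.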

Forecast Performance Evaluation.
Although a prediction graph shows the results of all forecasting methods intuitively, it is arduous to judge the pros and cons of each prediction method quantitatively and objectively from the graph alone. Therefore, this paper applies the root mean square error (RMSE) and the mean absolute error (MAE) to compensate for the shortcomings of the prediction graph. The two error formulas are given in equations (6) and (7):

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(P_{p,i} - P_{r,i}\right)^{2}}, \tag{6}$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|P_{p,i} - P_{r,i}\right|, \tag{7}$$

where $P_{p,i}$ is the predicted PV power at the ith prediction point, $P_{r,i}$ is the corresponding actual power, and m is the total number of prediction points.
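The two error measures can be computed with a few lines of Python. The sketch uses the standard, unnormalized definitions of RMSE and MAE, which is an assumption about the exact form used in the paper, and the sample values are made up.

```python
import math

def rmse(pred, real):
    """Root mean square error over m prediction points."""
    m = len(pred)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, real)) / m)

def mae(pred, real):
    """Mean absolute error over m prediction points."""
    m = len(pred)
    return sum(abs(p - r) for p, r in zip(pred, real)) / m

predicted = [1.0, 2.0, 3.0]   # made-up forecast values
actual = [1.0, 2.0, 5.0]      # made-up measured values
errors = (rmse(predicted, actual), mae(predicted, actual))
```

Because RMSE squares the deviations, the single large miss in this example weighs more heavily on RMSE than on MAE.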

Influence of Time Difference Characteristics on Spatial Correlation Matching.
Considering the widespread distribution of DPV, the correlation relationship between the reference plant X and the target plant Y may show some time and space difference characteristics; that is, Y (t) is more correlated to X (t + Δt). This characteristic is referred to as the time difference characteristic in the remainder of this paper. As shown in Figure 5, curves A and B have a similar trend, but their starting and ending points are different. By moving curve B to the right by a period of Δt, the distance between the curves is reduced, and the similarity of the trend is highlighted. In references [10,11], the Pearson product-moment correlation coefficient (PPMCC) is used to describe the correlation between vectors, and the optimal value of Δt is determined by maximizing the PPMCC. Equation (8) gives the PPMCC:

$$r = \frac{\sum_{i}\left(x_i - x_{av}\right)\left(y_i - y_{av}\right)}{\sqrt{\sum_{i}\left(x_i - x_{av}\right)^{2}}\,\sqrt{\sum_{i}\left(y_i - y_{av}\right)^{2}}}, \tag{8}$$

where $x_i$ and $y_i$ are the ith elements of vector X and vector Y, and $x_{av}$ and $y_{av}$ represent the arithmetic means of vector X and vector Y, respectively. A PPMCC value close to 1 denotes strong correlation, while a value close to 0 denotes weak correlation.
Considering that the DPV plants may be distributed across different time zones, a time shift method is given as follows to improve the pattern matching. Set a unified reference time as the zero point, search backward and forward from the zero point with a time shift of Δt, and obtain the monitoring points within the range [−p, p].
In every iteration of spatial correlation matching, move the target plant vector backward or forward by one monitoring point and keep the other vectors in the pattern matrix unchanged. Calculate the correlation between the target plant and all other patterns and find the pattern with the minimum SED. In this way, the most spatially correlated DPV plant is found globally, with the time difference characteristics taken into account.
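The time shift search can be sketched in Python as a scan over integer shifts in [−p, p] that keeps the shift with the smallest distance. The details are assumptions made for this example: the target and reference vectors have equal length, the overlapping segments are re-standardized for each shift, and the distance is divided by the overlap length so that different shifts are comparable. The function names and toy curves are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(v):
    mu, sigma = mean(v), pstdev(v)
    return [0.0] * len(v) if sigma == 0 else [(x - mu) / sigma for x in v]

def rms_distance(a, b):
    """Euclidean distance normalized by the number of compared points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def best_shift(target, reference, p):
    """Find the shift in [-p, p] minimizing the distance between
    target[t] and reference[t + shift] over their overlap."""
    n = len(target)
    best = None
    for shift in range(-p, p + 1):
        if shift >= 0:
            t, r = target[:n - shift], reference[shift:]
        else:
            t, r = target[-shift:], reference[:n + shift]
        d = rms_distance(standardize(t), standardize(r))
        if best is None or d < best[1]:
            best = (shift, d)
    return best

# Made-up curves: the target shows the reference's shape two points earlier.
reference = [0.0, 0.0, 1.0, 3.0, 5.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
target = reference[2:] + [0.0, 0.0]
shift, dist = best_shift(target, reference, p=3)
```

Running the same search against every pattern in the library and keeping the overall minimum corresponds to the global search described above.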
In summary, the advantages of considering Δt include the following: 6 Mathematical Problems in Engineering (1) Increasing the probability of successfully matching a target plant to the reference DPV power plants. (2) Searching for the most correlated reference PV power plant globally; i.e., the global minimum is achieved rather than the local minimum, which could improve the forecast accuracy.
Therefore, the reference PV power plants matched with target PV plants in this paper are the most correlated plants globally considering the time difference characteristics.

Case Studies
The case used in this paper to demonstrate the forecast method is the actual historical generation data of 5166 DPV plants in the USA [29]. The DPV plants are located from 73° to 125°W and 25° to 49°N, as shown in Figure 6. The prediction window is set to 3 days before the target forecast day, and the number of monitoring points is 288. The preparation for online forecasting is offline clustering. The 3616 reference plants are clustered into 50 spatially correlated groups; i.e., 50 patterns are generated. Based on equation (2), the largest CSI, 1.15485, is obtained when N equals 50. The clustering results are shown in Section 4.1.
Next, the online forecast process of the master-slave short-term DPV generation forecast method is presented. In Sections 4.2 and 4.3, two target plants, T1 (3903#) and T2 (1346#), are chosen to show the forecast process. In Section 4.4, the forecasts of 1550 target plants are obtained, and the statistical errors are compared. Section 4.5 discusses the situation in which multiple reference plants are used to make forecasts. Section 4.6 discusses the choice of prediction window size and its influence on forecast accuracy.

Clustering of Reference DPV Plants.
A total of 1000 DPV plants with forecast ability are chosen as reference plants to generate a pattern library. Using the clustering method in Section 3, a prediction window composed of the average of 10 days of generation data before the forecast target day is chosen as the pattern mining and clustering data, and the group number is set to 50. Figure 7 shows the curves of 4 typical generation patterns in the pattern library. The plants in the same pattern group share similar generation characteristics, and the generation patterns differ greatly between groups. Therefore, the K-means clustering method could put reference plants with similar generation patterns into the same group and form a pattern library. The choice of the clustering group number should consider not only the CSI, which affects the clustering performance, but also the time consumed in matching target plants to reference plants. In short-term generation forecasting, the forecast interval is 15 minutes. If the number of groups is too large, the matching process will be extremely time-consuming, and the timeliness of the forecast could not be guaranteed.

Searching for Most Correlated Plants considering Time Difference.
This example shows the effect of the time shift. The matching results for different time shifts are listed in Table 1, and the search process is illustrated in Figure 8.
Several randomly chosen target plants are simulated, and the results show saddle-shaped curves similar to those in Figure 8; the spatially correlated reference plants are usually located in time zones close to those of the target plants. There exists a minimum among the SED values achieved with different time shifts, and the most spatially correlated reference plant may not be synchronous with the target plant.
Therefore, the most spatially correlated reference plants could be found globally with the time shift method, which considers the time difference. In addition, the results in Table 1 show that different reference plants are matched when different time shifts are applied, which means that considering the time difference could affect the matching results and, in turn, the forecast performance.

Spatial Correlation Matching considering Time Difference.
Target plants T1 and T2 are added as two new patterns (patterns 51 and 52) to the pattern library. The clustering threshold is set to 1.40. K-means clustering is performed on the new library, and the results show that T1 is most strongly correlated with pattern 48. The reference plant R1 (785#), which has the highest correlation in that pattern group, is chosen as the master station, and the SED between the standard vectors of T1 and R1 is 1.1221. The spatial correlation results are shown in Figure 9.
In Figure 9, curves a and b are the real power of the reference PV plant and the target PV plant, respectively, and c and d are the standard vectors of a and b, respectively. The magnitudes of the generation outputs of the two plants are quite different, owing to differences in installed capacity, conversion efficiency, etc., but the overall trends are similar. This verifies that the standardized trajectory curves preserve the similarity of the trends and reveal the significant numerical correlation. However, the SED between T2's standard vector and the closest pattern's vector is 1.6794, which is higher than the clustering threshold.
Thus, T2 is regarded as a new pattern, and no match is found in the reference plant groups. Other forecast methods should be adopted for unmatched DPV plants.

Univariate Prediction Based on Spatial Correlation.
The LS-SVM regression method is used to numerically fit the prediction window generation data of R1 and T1, and the correlation relationship model is obtained. Considering that the actual generation at night is 0, the following modification is made to the correlation relationship model to avoid artificially introduced error: if the reference plant generation is 0, the target plant generation is also set to 0.
As the main purpose of the case study is to examine the forecast performance of the spatial correlation based method, the actual generation data of reference plants is utilized as the short-term forecast results to avoid the forecast errors of the reference plants.
The short-term forecast generation is used as the input of the correlation relationship model, and the entire day-ahead generation trajectory of target plant T1 is obtained by rolling calculation. Figure 10 shows the day-ahead forecast generation curve (green dotted line) and the actual generation curve (black line), together with the forecast results of the temporal correlation method (blue broken line) for comparison.
As shown in Figure 10, the predicted power of the target PV plant based on spatial correlation (green dotted line) is basically consistent with the predicted power of the reference plant (red dotted line) and is closer to the real power of the target plant (black line) than the predicted power based on temporal correlation (blue broken line). This indicates that the proposed spatial correlation method is effective and has high precision. The forecast performance is evaluated with the forecast errors defined in Section 3, which are given in Table 2. Table 2 shows that both RMSE and MAE are smaller for the spatial correlation forecast method than for the temporal correlation forecast method, which signifies that the proposed spatial correlation method achieves higher forecast accuracy.

Multivariate Prediction Based on Spatial Correlation.
The spatial correlation matching is performed for 1550 target plants randomly chosen from all target plants, and 493 of them, or 31.8% of these target plants, fail to find a matching correlated pattern group. The short-term generation forecast for these plants should use a temporal correlation forecast or another forecast method. Among the remaining 1057 target plants, which are matched to reference plant groups, 583 have 4 or more reference plants. Using the multivariate prediction function of LS-SVM, the generation forecasts for these 583 plants are obtained.
The mean statistical forecast errors are shown in Table 3. The longitudinal comparison in Table 3 shows that the more reference plants are matched, the more reference information is available and the smaller the forecast error is. Therefore, when more than one reference plant is matched, the result of multivariate prediction is better than that of univariate prediction. The horizontal comparison in Table 3 shows that the forecast based on spatial correlation is more accurate than the forecast based on temporal correlation. The reason is that the temporal forecast method only uses the historical generation data, and no information about future change is involved. The spatial correlation forecast method, on the other hand, uses both the numerical weather forecast data (through the generation forecast of the reference plants) and the historical generation data, hence achieving higher forecast accuracy.

Influence of Prediction Window Size on Spatial Correlation Forecast.
This part discusses the choice of prediction window size and its influence on forecast performance. Considering the limited data storage capability of target plants, we assume that only ten days of historical generation data are available. The first nine days of data are used to generate the prediction window data and make a forecast, and the tenth day's data are used to examine the forecast performance. Figure 11 shows the forecast error of a randomly chosen target plant (plant 1000#) with different prediction window sizes, from 1 day to 9 days. It can be seen that, for example, the forecast error of the spatial correlation method is larger than that of the temporal correlation method, using MAE as the criterion, if the prediction window size is 3 days or 4 days. The forecast results of the spatial correlation method with other prediction window sizes are better than those of the temporal correlation method. The optimal prediction window size is 2 days.
Next, 641 target plants are randomly chosen, and their optimal prediction window sizes are counted. As shown in Figure 12, the optimal prediction window size differs from plant to plant, as it is influenced by the characteristics of the plant and the surrounding environment. The majority of the plants could achieve good forecast performance with a prediction window of 3-7 days. Therefore, in practical application, the forecast scheme should be customized for each target plant according to its historical data, the prediction window size should be appropriately selected, and its value should be updated as time goes by. Reproduction of the cases is subject to four limitations: the data source, the number of reference and target DPV plants, the prediction window size, and the offline clustering threshold setting.

Conclusions
To address the technical bottleneck of small-capacity DPV generation forecasting caused by data deficiency and complex influencing factors, this paper proposes an indirect forecast method based on spatial correlation, which uses a master-slave structure to map target plants that cannot make a forecast on their own to reference plants that can forecast with sophisticated methods. The following conclusions are drawn: (1) The historical generation data contain the complete background information, such as meteorological data, so the spatial correlation forecast method for DPV generation could make full use of historical data and achieve accurate short-term forecasts.
(2) Adopting LS-SVM regression for numerical fitting of the spatial correlation relationship could improve the overall forecast accuracy, compared to prediction methods based on temporal correlation and least squares linear regression.
(3) The proposed spatial correlation forecast method could use the DPV plants that are already equipped with forecast systems and obtain short-term generation forecasts for DPV or newly built DPV plants with low investment.

Data Availability
The data used to support the findings of this work are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.