Optimal Parameter Selection for Support Vector Machine Based on Artificial Bee Colony Algorithm: A Case Study of Grid-Connected PV System Power Prediction

To address the nonstationarity and randomness of photovoltaic (PV) system output power, an output power prediction model for grid-connected PV systems is proposed based on empirical mode decomposition (EMD) and a support vector machine (SVM) optimized with an artificial bee colony (ABC) algorithm. First, according to the weather forecast data for the prediction date, the time series of output power on similar days is built at 15-minute intervals. Second, the output power time series is decomposed using EMD into a series of components at different scales, including several intrinsic mode function components (IMFn) and a trend component (Res). A corresponding SVM prediction model is established for each IMF component and the trend component, and the SVM model parameters are optimized with the artificial bee colony algorithm. Finally, the prediction results of the individual models are reconstructed to obtain the predicted output power of the grid-connected PV system. The prediction model is tested with actual data, and the results show that the power prediction model based on EMD and ABC-SVM has a faster calculation speed and higher prediction accuracy than the single SVM prediction model and the EMD-SVM prediction model without optimization.


Introduction
With the increasing scale of grid-connected PV systems, the adverse effects of the intermittent and uncertain characteristics of PV systems on the public grid are becoming increasingly significant [1][2][3]. If the changes in PV system power generation can be accurately predicted, reasonable power grid scheduling and a balanced power load configuration can be achieved to protect the security and stability of the public grid system. Currently, there are two methods to predict the output power of PV systems: the indirect and direct prediction methods. The crux of the indirect method is to predict the solar radiation intensity at the PV installation site at a certain time and substitute it into the corresponding output model, thus obtaining the predicted output power of the PV system [4]. Direct prediction methods do not require solar irradiance data and can predict the power output of PV power generation systems in the next time period by using only the historical PV system data and public weather information [5][6][7][8]. Some studies have shown that the influence of meteorological factors on the output power of PV systems is significant. If the meteorological conditions are similar in two time periods, the power output curves will be very similar. Therefore, it is possible to predict the output power of the grid-connected PV system by selecting a date with similar data [9].
The direct prediction method predicts the future power output from the historical output power data, based on statistical prediction theory and methods. The indirect prediction method is also called the step-by-step prediction method. In this method, the solar irradiance is forecasted first, and the output power is then calculated based on the photoelectric conversion model. Because this method does not obtain the output power directly, it is called the indirect prediction method.
Computational Intelligence and Neuroscience
The above PV power prediction methods have their own characteristics and associated limitations in their application [10]. At present, the PV power prediction error resulting from a single prediction method is large, generally 15% to 30%, because the output power of a PV system is largely affected by meteorological factors. Furthermore, there are intermittent problems and uncertainties in photovoltaic power generation systems. The limitations of the prediction methods are also a key factor that causes a relatively large error. Numerous studies have shown that the accuracy of a single prediction method cannot meet the prediction accuracy requirements for the power generation of PV systems. Combined prediction methods can synthesize the advantages of multiple prediction approaches and improve the prediction accuracy of generated power of the PV system.
Empirical mode decomposition (EMD) has been widely and successfully combined with other methods to predict nonlinear stochastic time series. This prediction approach first decomposes the time series into multiple series of different frequencies, establishes prediction models for the different series to reduce the interaction among information with different characteristics, and finally reconstructs the prediction results to obtain the predicted value of the original series.
In this paper, a combined prediction model is introduced and applied to PV power prediction. The advantages of different algorithms are combined to establish and test a prediction model for the output power of grid-connected PV systems based on EMD and ABC-SVM. This model overcomes the defects observed when a single model is adopted, such as poor generalization performance, low prediction accuracy, and unstable prediction results, and applies the artificial bee colony optimization algorithm and the EMD method to predict the output power of grid-connected PV systems. The support vector machine (SVM) is a machine learning method based on statistical learning theory and the structural risk minimization principle. It has been successfully applied to nonlinear regression prediction in various fields, such as wind speed forecasting, short-term load forecasting, and tourist flow forecasting [11][12][13]. These results show that the SVM can solve prediction problems with small samples, nonlinearity, and high dimensionality. However, the parameters of the SVM play a crucial role in its prediction accuracy and stability, so it is vital to select the most appropriate parameter values for the SVM.
The artificial bee colony (ABC) technique is an optimization algorithm based on the intelligent foraging behavior of a honey bee swarm. The division of labor and collaboration in the ABC algorithm lets bees following different search strategies work together to find the optimum, which gives the algorithm a strong global optimization ability. In related research, the algorithm has been shown to outperform the genetic algorithm, the ant colony algorithm, and the particle swarm algorithm [14][15][16][17][18]. Therefore, the artificial bee colony technique is used to search for the optimal parameters of the SVM in this paper.

Clustering Selection Method of Similar Days for the Output Power of a PV System
Meteorological factors have a significant influence on the output power of a photovoltaic power generation system. Under otherwise identical conditions, the magnitude and the changing tendency of the output power differ with the weather type. Figure 1 shows the change in the 15-minute output power of a 10 kW PV system under different weather types (sunny, sunny to cloudy (cloudy to sunny), cloudy, and overcast) in August 2014. Figures 2 and 3 show the 15-minute average irradiance and temperature of the solar panels under the weather conditions corresponding to Figure 1. The 10 kW PV array consists of 51 photovoltaic modules (PLUTO195-ade). These modules are connected both in series and in parallel to obtain a larger output power: seventeen modules are connected in series, and 3 such strings are connected in parallel (17 × 3 × 195 W = 9.945 kW).
The manufacturer specifications for one module are as follows: the open-circuit voltage ($V_{oc}$) is 45.4 V, the short-circuit current ($I_{sc}$) is 5.52 A, and the voltage and current at maximum power ($V_{mp}$ and $I_{mp}$) are 37.6 V and 5.19 A, respectively.
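As a quick consistency check on the figures above, the module and array ratings can be worked out from the quoted maximum-power voltage and current. The symbols $P_{mp}$ and $P_{array}$ are introduced here only for illustration, and the array-level values simply assume ideal series and parallel addition with no mismatch or wiring losses.

```latex
\begin{aligned}
P_{mp} &= V_{mp} I_{mp} = 37.6\,\mathrm{V} \times 5.19\,\mathrm{A} \approx 195.1\,\mathrm{W},\\
P_{array} &= 17 \times 3 \times 195\,\mathrm{W} = 9945\,\mathrm{W} = 9.945\,\mathrm{kW},\\
V_{string} &= 17 \times 37.6\,\mathrm{V} = 639.2\,\mathrm{V}, \qquad
I_{array} = 3 \times 5.19\,\mathrm{A} = 15.57\,\mathrm{A}.
\end{aligned}
```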
When the output power of the grid-connected PV system is predicted, finding reasonably similar days from the actual historical data can greatly improve the prediction accuracy of the output power of the PV system.
To find days that have weather and seasonal types similar to those of the prediction day, and thereby similar temperature and humidity, we first classify the weather and seasonal types. The weather types are classified as sunny, sunny to cloudy (cloudy to sunny), cloudy, overcast, and rainy (snowy), which are represented by 1, 2, 3, 4, and 5, respectively. The seasonal types are classified as follows: March, April, and May for spring; June, July, and August for summer; September, October, and November for fall; and December, January, and February for winter. Then, the classification is combined with the historical data according to the seasonal type of the prediction day, and fuzzy clustering is carried out using the fuzzy c-means algorithm. Finally, days similar to the prediction day are found. The general steps are as follows.
(1) Identify the clustering indexes. The weather type is $W$, the daily highest temperature is $T_h$ and the lowest is $T_l$, and the daily maximum humidity is $H_h$ and the minimum is $H_l$. To increase the comprehensiveness of the clustering samples, we take the historical meteorological data of the same season in the two preceding years as the clustering sample set. Assuming that the data set has $n$ days, the sample set can be written as $X = \{x_j \mid j = 1, 2, \ldots, n\}$, where $x_j = \{W_j, T_{hj}, T_{lj}, H_{hj}, H_{lj}\}$. At the same time, the number of clusters, $c$, the fuzzy weighting parameter, $m$, the threshold value, $\varepsilon$, and the initial iteration step, $l$, are determined.
(2) Initialize the clustering centers $V^{(l)} = \{v_1, v_2, \ldots, v_c\}$.
(3) Calculate the membership matrix $U = [u_{ij}]_{c \times n}$, where $u_{ij}$ represents the membership of vector $x_j$ in class $v_i$ and satisfies the standard fuzzy c-means conditions
$$\sum_{i=1}^{c} u_{ij} = 1, \qquad u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}},$$
where $1 \le i \le c$, $1 \le j \le n$, and $d_{ij} = \| x_j - v_i \|$ represents the similarity (distance) between $x_j$ and the clustering center $v_i$; the vector dimension is 5 for the indexes chosen in step (1).
(4) Calculate the clustering center $V^{(l+1)}$ according to the following:
$$v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}.$$
(5) Determine whether the termination condition is satisfied. If $\| V^{(l+1)} - V^{(l)} \| \le \varepsilon$, the algorithm is terminated and the partition matrix, $U$, and the clustering center, $V$, are obtained. Otherwise, set $l = l + 1$ and return to step (3) to continue the calculation until the termination condition is satisfied.
After the clustering is complete, the historical dates that fall in the same class as the prediction day, that is, its similar days, can be obtained according to the partition matrix $U$.
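The clustering steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the Euclidean distance, the default values of `m`, `eps`, and `max_iter`, the `init` argument, and the six sample days are all hypothetical choices made for the example, and in practice the five indexes would normally be scaled to comparable ranges first.

```python
import math
import random

def fuzzy_c_means(samples, c, m=2.0, eps=1e-5, max_iter=100, init=None, seed=0):
    """Fuzzy c-means sketch; returns (membership matrix u[i][j], centers)."""
    n, dim = len(samples), len(samples[0])
    rng = random.Random(seed)
    # Initial centers: caller-supplied, or c samples drawn at random (assumption).
    centers = [list(v) for v in (init or rng.sample(samples, c))]
    u = []
    for _ in range(max_iter):
        # Membership update from the current centers.
        u = [[0.0] * n for _ in range(c)]
        for j, x in enumerate(samples):
            d = [math.dist(x, v) for v in centers]
            zero = [i for i in range(c) if d[i] == 0.0]
            if zero:  # sample sits exactly on a center: give it full membership there
                for i in zero:
                    u[i][j] = 1.0 / len(zero)
            else:
                for i in range(c):
                    u[i][j] = 1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                        for k in range(c))
        # Center update: membership-weighted mean of the samples.
        new_centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            new_centers.append([sum(w[j] * samples[j][k] for j in range(n)) / total
                                for k in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= eps:  # termination condition on the movement of the centers
            break
    return u, centers

# Hypothetical days: (weather code, T_high, T_low, H_high, H_low).
days = [(1, 34, 24, 60, 35), (1, 33, 23, 62, 36), (1, 35, 25, 58, 33),
        (4, 26, 20, 90, 70), (4, 25, 19, 92, 72), (4, 27, 21, 88, 68)]
u, centers = fuzzy_c_means(days, c=2, init=[days[0], days[3]])
# Assign each day to the class with the largest membership.
labels = [max(range(2), key=lambda i: u[i][j]) for j in range(len(days))]
```

Days that share a label with the prediction day would then be treated as its similar days.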

Decomposing the Output Power Signal of the Grid-Connected PV System by EMD
The essence of EMD is to smooth nonlinear and nonstationary signals based on their local characteristic scale. EMD separates fluctuations or trends of different scales step by step from the original complex signal to form a series of intrinsic mode function (IMF) components with different characteristic scales and a trend component [19][20][21][22][23]. Compared with the wavelet transform, EMD not only has the characteristic of multiresolution but also avoids the difficulty of determining the decomposition scale and selecting the wavelet base. The empirical mode decomposition of $x(t)$, the time series of the PV output power, is performed as follows.
(1) Identify all maximum points in the signal sequence composed of the PV output power on multiple similar days, and fit the upper envelope line $e_h(t)$ of the sequence by cubic spline interpolation. Identify all minimum points in the output power time series, and fit the lower envelope line $e_l(t)$ of the sequence by cubic spline interpolation.
(2) The average of the upper and lower envelope lines is calculated as the average envelope line $e_{av}(t)$; a new data sequence $h_1(t) = x(t) - e_{av}(t)$ is obtained by subtracting the average envelope line from the original output power time series. In general, $h_1(t)$ is still a nonstationary time series and should be processed again in the same way. Assuming that $h_1(t)$ satisfies the IMF conditions after this treatment, the first IMF component $\mathrm{imf}_1(t) = h_1(t)$ is obtained; it contains the shortest-period component of the original output power time series.
(3) An output power time series $r_1(t) = x(t) - \mathrm{imf}_1(t)$, from which the high-frequency component has been removed, is obtained by subtracting the first IMF component from the original output power time series. Continuing the above smoothing treatment on $r_1(t)$ yields all of the IMF components and a trend component Res, as shown in the following formula.
Finally, the decomposed form of the PV output power time series $x(t)$ is obtained:
$$x(t) = \sum_{i=1}^{n} \mathrm{imf}_i(t) + \mathrm{Res}(t),$$
where $\mathrm{imf}_i(t)$ represents the $i$th intrinsic mode function component of the PV output power time series $x(t)$ and $\mathrm{Res}(t)$ represents the average trend component of the original signal sequence. Thus, the original PV output power time series can be decomposed into the sum of a series of intrinsic mode function components and an average trend component. Taking the actual output power of the 10 kW grid-connected PV system of the new energy grid-connected PV power generation engineering technology center at a university in Henan Province as an example, the EMD of the output power time series with 15-minute intervals on 50 similar days is performed, and the results are shown in Figure 6. The output power time series is decomposed into seven IMF components and a trend component. IMF1 and IMF2 are the high-frequency components and have strong nonlinear and random change characteristics caused by abrupt changes in the weather. The frequencies of IMF3-IMF7 are significantly lower and show a strong periodicity, which is affected by meteorological factors; these are the main components of the output power. The residual component Res changes relatively gently, has a small amplitude, and is a minor component of the output power.
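The sifting procedure above can be illustrated with a small pure-Python sketch. It is only an approximation of a full EMD under several assumptions that are not taken from the paper: a fixed number of sifting passes (`sift_iters`) replaces a proper IMF stopping criterion, the first and last samples are added as envelope knots to limit end effects, and the synthetic test signal is hypothetical. The function names are likewise illustrative.

```python
import bisect
import math

def natural_cubic_spline(xs, ys, ts):
    """Evaluate the natural cubic spline through knots (xs, ys) at points ts."""
    n = len(xs) - 1                              # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)                          # second derivatives; end values stay 0
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for k in range(1, n - 1):                # forward elimination (Thomas algorithm)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):           # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]
    out = []
    for t in ts:
        i = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        out.append((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                   + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                   + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return out

def local_extrema(x):
    """Indices of interior local maxima and minima."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def sift_once(x):
    """Subtract the mean of the upper and lower spline envelopes from x."""
    maxima, minima = local_extrema(x)
    t = list(range(len(x)))
    ends = [0, len(x) - 1]                       # end samples as extra knots (assumption)
    up_k = sorted(set(maxima + ends))
    lo_k = sorted(set(minima + ends))
    upper = natural_cubic_spline(up_k, [x[i] for i in up_k], t)
    lower = natural_cubic_spline(lo_k, [x[i] for i in lo_k], t)
    return [xi - 0.5 * (u + l) for xi, u, l in zip(x, upper, lower)]

def emd(x, max_imfs=8, sift_iters=10):
    """Return (list of IMF-like components, residual trend)."""
    residual, imfs = list(x), []
    while len(imfs) < max_imfs:
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) < 3:        # too few extrema left: treat as trend
            break
        h = residual
        for _ in range(sift_iters):              # fixed sifting count (assumption)
            h = sift_once(h)
        imfs.append(h)
        residual = [r - c for r, c in zip(residual, h)]
    return imfs, residual

# Synthetic signal: fast oscillation + slow oscillation + linear trend.
signal = [math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 40) + 0.01 * t
          for t in range(200)]
imfs, res = emd(signal)
reconstructed = [sum(c[t] for c in imfs) + res[t] for t in range(len(signal))]
```

By construction, the components and the residual sum back to the original series, which mirrors the reconstruction step used later when the per-component predictions are combined.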

Artificial Bee Colony Algorithm and Support Vector Machine
Artificial Bee Colony Algorithm. The artificial bee colony (ABC) algorithm is a swarm intelligence optimization algorithm that simulates the process of bee colonies gathering honey [24]. The bees in the artificial bee colony algorithm can be divided into three types: employed bees, scout bees, and onlooker bees. The three types of bees cooperate with each other to complete different stages of the nectar-collecting process and identify the position of the optimal nectar source by collecting and sharing nectar source information. In the artificial bee colony algorithm, the optimal nectar source position corresponds to the optimal solution of a problem, and the amount of nectar contained in the nectar source corresponds to the fitness value of the solution.
After the initialization is complete, the employed bees search the neighborhood of the corresponding known nectar source (the original solution to a problem) and find a new nectar source (a new solution to the problem). The position of the new nectar source (the parameter value of the optimized problem) is determined according to the following:
$$v_{ij} = x_{ij} + \varphi_{ij} (x_{ij} - x_{kj}), \quad (5)$$
where $\varphi_{ij}$ is a random number within [−1, 1] that controls the generation range of the neighborhood, $k \in \{1, 2, \ldots, SN\}$ and $j \in \{1, 2, \ldots, D\}$ are randomly selected subscripts, and $k$ is not equal to $i$.
The SN employed bees return to the hive after completing the search task and share the nectar source information with the onlooker bees. The onlooker bees select a nectar source based on the amount of nectar in each source (the fitness function value of a solution) and in accordance with the following:
$$p_i = \frac{fit_i}{\sum_{n=1}^{SN} fit_n}. \quad (6)$$
Subsequently, the onlooker bees search near the selected nectar source and determine the position of a new nectar source per formula (5). They use the same selection method as the employed bees to determine whether to replace the old nectar source with the new one. If a nectar source cannot be improved after it has been updated limit times, it is discarded. The corresponding employed bee then becomes a scout bee, which identifies a new nectar source in accordance with the following:
$$x_{ij} = x_{\min,j} + \mathrm{rand}(0, 1) (x_{\max,j} - x_{\min,j}). \quad (7)$$
The division of labor and collaboration in the ABC algorithm lets bees with different search strategies work together to find the optimized solution, which gives the algorithm a strong global optimization ability.
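To make the three phases concrete, the following is a minimal Python sketch of an ABC loop that minimizes a generic cost function. It is an illustration under assumptions, not the authors' code: the fitness mapping `1 / (1 + cost)` (valid for nonnegative costs), the clamping of candidates to the bounds, the default values of `sn`, `limit`, and `max_cycles`, and the sphere test function are all choices made for the example.

```python
import random

def abc_minimize(cost, bounds, sn=20, limit=30, max_cycles=200, seed=0):
    """Artificial bee colony sketch; returns (best position, best cost)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        # Random position inside the bounds (scout-style re-initialization).
        return [lo + rng.random() * (hi - lo) for lo, hi in bounds]

    sources = [random_source() for _ in range(sn)]
    costs = [cost(s) for s in sources]
    trials = [0] * sn                       # failed-improvement counter per source
    b = min(range(sn), key=costs.__getitem__)
    best, best_cost = list(sources[b]), costs[b]

    def try_neighbor(i):
        # Neighborhood search: perturb one coordinate relative to a random partner k != i.
        k = rng.choice([s for s in range(sn) if s != i])
        j = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        cand = list(sources[i])
        cand[j] = sources[i][j] + phi * (sources[i][j] - sources[k][j])
        lo, hi = bounds[j]
        cand[j] = min(max(cand[j], lo), hi)  # keep the candidate inside the bounds
        c = cost(cand)
        if c < costs[i]:                     # keep the better of old and new source
            sources[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        for i in range(sn):                  # employed-bee phase
            try_neighbor(i)
        fits = [1.0 / (1.0 + c) for c in costs]   # assumes cost >= 0
        for i in rng.choices(range(sn), weights=fits, k=sn):  # onlooker phase (roulette)
            try_neighbor(i)
        b = min(range(sn), key=costs.__getitem__)
        if costs[b] < best_cost:             # memorize the best source so far
            best, best_cost = list(sources[b]), costs[b]
        for i in range(sn):                  # scout phase: abandon exhausted sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = cost(sources[i])
                trials[i] = 0
    return best, best_cost

# Hypothetical usage on a 2-D sphere function.
best, best_cost = abc_minimize(lambda p: sum(v * v for v in p), [(-5.0, 5.0)] * 2)
```

For SVM tuning, each position would hold the three SVM parameters and the cost would be a validation error; that wiring is omitted here.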

Regression Model of SVM. The support vector machine can minimize the expected error and mitigate the problem of overfitting because it is based on the structural risk minimization principle. According to previous research, SVM provides good solutions for both classification and regression in different fields, such as fault classification, electricity load forecasting, wind speed forecasting, and air quality prediction. The basic principle of SVM for regression prediction is described as follows. The sample set is normally denoted as $\{(x_i, y_i)\}_{i=1}^{n}$. The regression model defines the functional relationship between $x$ and $f(x)$ as
$$f(x) = w \cdot \varphi(x) + b,$$
where $w$ and $b$ are the weight vector and threshold, respectively. The coefficients $w$ and $b$ can be found by solving the following convex quadratic programming problem:
$$\min \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad y_i - w \cdot \varphi(x_i) - b \le \varepsilon + \xi_i, \;\; w \cdot \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \;\; \xi_i, \xi_i^* \ge 0,$$
where $C$ is the penalty coefficient, $\xi_i^{(*)}$ are slack variables, and $\varepsilon$ is the insensitivity coefficient. The slack variables $\xi_i^{(*)}$ guarantee that the constraints can be satisfied; $C$ controls the balance between the complexity of the model and the training error; $\varepsilon$ is a preset constant that controls the tube size. If $\varepsilon$ is set too small, overfitting results; if it is set too large, underfitting easily results.
In this study, we chose the Gaussian radial basis function as the kernel function:
$$K(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right),$$
where $\sigma^2$ is the kernel parameter, which defines the structure of the high-dimensional feature space. The penalty coefficient $C$, the insensitivity coefficient $\varepsilon$, and the kernel function parameter $\sigma^2$ in the SVM determine the accuracy and generalization performance of the algorithm.
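The roles of the kernel parameter and the insensitivity coefficient can be seen in a tiny numerical sketch. The function names are illustrative, and the kernel is written in the common form with $2\sigma^2$ in the denominator, which is an assumption about the exact convention used.

```python
import math

def rbf_kernel(x, z, sigma2):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma2)); the form is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))

def eps_insensitive_loss(y, f, eps):
    """Zero inside the epsilon tube, linear in the error outside it."""
    return max(0.0, abs(y - f) - eps)

# A larger sigma2 makes distant points look more similar.
k_narrow = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma2=0.5)
k_wide = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma2=5.0)

# Errors smaller than eps are ignored; larger errors are penalized linearly.
inside = eps_insensitive_loss(1.00, 1.05, eps=0.1)
outside = eps_insensitive_loss(1.00, 1.30, eps=0.1)
```

A wider tube tolerates more training error and gives a smoother model, which is the trade-off between underfitting and overfitting described above.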

Constructing a PV Power Prediction Model Based on EMD and ABC-SVM
The strongly nonlinear and nonstationary power sequence signals of the grid-connected PV system are decomposed by EMD to obtain several basic modal components that have little influence on each other. This reduces the interference or coupling among the characteristic information in the signal sequence and reduces the nonstationarity of the signal. On this basis, this paper proposes an output power prediction model for a grid-connected PV system in which the support vector machine (SVM) is optimized with the artificial bee colony algorithm.

Optimal Parameter Selection for the SVM Model Based on ABC Algorithm. The penalty coefficient $C$, the insensitivity coefficient $\varepsilon$, and the kernel function parameter $\sigma^2$ in the SVM determine the accuracy and generalization performance of the algorithm. However, there is still no generally effective way to select these three parameters. To address this problem, this paper adopts the artificial bee colony algorithm to optimize the selection of the SVM parameters. The flow diagram of this method is shown in Figure 4. Several tests show that, in predicting the PV system output power, the EMD-SVM prediction model achieves good prediction accuracy and generalization ability when the parameters of the ABC algorithm are initialized as follows. The colony size is 160, the numbers of nectar-gathering (employed) and observing (onlooker) bees are both 80, the number of initial nectar sources (the initial solutions to the optimized problem) is 80, the maximum number of updates of a nectar source is 90, and the maximum number of algorithm loops is 150.
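For reference, the ABC settings listed above can be collected in a small configuration object. The class and field names are hypothetical; only the numeric values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ABCConfig:
    """ABC settings quoted in the text; field names are illustrative."""
    colony_size: int = 160   # total number of bees
    employed: int = 80       # nectar-gathering (employed) bees
    onlookers: int = 80      # observing (onlooker) bees
    food_sources: int = 80   # initial nectar sources (candidate solutions)
    limit: int = 90          # max updates of a source before it is abandoned
    max_cycles: int = 150    # maximum number of algorithm loops

cfg = ABCConfig()
# Upper bound on neighborhood searches per run: one per employed and onlooker bee per cycle.
max_neighbor_searches = (cfg.employed + cfg.onlookers) * cfg.max_cycles
```

Each candidate solution in this setting would be one triple of SVM parameters (penalty coefficient, insensitivity coefficient, kernel parameter).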

Constructing a Power Prediction Model for a Grid-Connected PV System. First, the 15-minute output power time series of similar days is constructed based on the weather forecast data of the prediction day. Then, the output power time series is decomposed using EMD to obtain the intrinsic mode components IMF and the trend component Res at different scales. A corresponding support vector machine prediction model is established for each IMF component and for the trend component. The input of the model includes the weather type $W$, the maximum temperature $T_h$, the minimum temperature $T_l$, the maximum humidity $H_h$, and the minimum humidity $H_l$ of the prediction day as well as the historical data of the similar days.
Figure 4: Flow diagram of the SVM parameter optimization based on the ABC algorithm. The sample data are normalized and the EMD-SVM model is constructed; if a solution is not improved when the number of cycles is greater than the limit, the solution is abandoned, the employed bee for this solution turns into a scout bee, and a new solution is randomly generated to replace the abandoned solution according to formula (7); the value of the optimal solution in the search process is preserved until the termination condition is reached.

Data Preprocessing.
To verify the performance of the output power prediction model for a PV system based on EMD and ABC-SVM, MATLAB is used to construct the model. The prediction model is tested and analyzed on a test platform of the 10 kW grid-connected PV system operating in the engineering technology center of a university in Henan Province. The sample data sets used in this study are the measured values of the output power of the grid-connected PV system and the local weather data records. In the test, the actual power data for the whole year of 2014 are selected as the research object. The data are recorded every 15 minutes and are classified into five categories according to the weather types: sunny, sunny to cloudy, cloudy, overcast, and rainy (snowy). In this example, one day is one period. According to the local sunshine characteristics of Anyang, the PV system outputs power for approximately 11 hours per day on average in summer. In this paper, 11 h is chosen for each period, the generated power data are sampled once every 15 min, the meteorological parameters for each period are the temperature and weather type, and the input variable of the model is $X = (x_1, x_2, \ldots, x_n)$, where $n$ is 228. Here $x_1$ to $x_{220}$ represent the 220 power values sampled once every 15 minutes on the 5 similar days closest to the prediction day (44 values per day), $x_{221}$ to $x_{224}$ represent the maximum temperature, minimum temperature, maximum humidity, and minimum humidity of the similar day, and $x_{225}$ to $x_{228}$ represent the maximum temperature, minimum temperature, maximum humidity, and minimum humidity of the prediction day. The output variables of the model are the 44 output power values within the prediction day.
According to the method described above, the clustering analysis of similar days is carried out for the weather types in 2014. The days are divided into five typical weather types and four seasonal types. In our study, there are 340 sets of sample data (there are 365 sets of data in total, but 15 of them are bad data), including 208 sets for sunny days, and we focus on these 208 sets of data first. Of the data with a sunny weather type, 8 sets are used as test data (February 24, February 27, May 26, May 28, August 30, August 31, November 25, and November 29), and the remaining sets are taken as the training data set. We establish 4 forecasting models for the 4 seasonal types separately and take the summer type as an example to introduce the establishment and forecasting process of the model in detail. The model is built according to the construction method for the EMD-ABC-SVM power prediction model in the previous section. First, the EMD is conducted for the output power sequence of 50 similar days of the summer type to obtain seven IMF components and one Res component, as shown in Figure 6.
An SVM power prediction model is constructed for each component, and the parameters of each SVM model are optimized using the artificial bee colony algorithm. The steps are shown in Figure 4. The performance of the SVM model after parameter optimization is evaluated on the test set. The prediction results of the three models are shown in Figures 7 and 8. Due to limited space, we give detailed data for only two days, August 30 and August 31, as shown in Tables 1 and 2. Of the three models, the EMD-ABC-SVM consists of several SVM prediction models, one for each of the IMF components and the Res component obtained through EMD of the original signal, each optimized by the ABC algorithm. To verify the effectiveness of the ABC algorithm, we also established an EMD-SVM model whose SVM parameters are not optimized. The single SVM model predicts the original sequence directly.
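The layout of the 228-element input vector described above can be sketched as follows. The function name and the placeholder values are hypothetical; only the sizes (5 similar days, 44 samples per day, and two groups of 4 meteorological values) follow the text.

```python
def build_input_vector(similar_days_power, similar_day_weather, prediction_day_weather):
    """Concatenate 5 x 44 power samples with two groups of 4 weather values."""
    if len(similar_days_power) != 5 or any(len(day) != 44 for day in similar_days_power):
        raise ValueError("expected 5 similar days with 44 samples each")
    if len(similar_day_weather) != 4 or len(prediction_day_weather) != 4:
        raise ValueError("expected (T_max, T_min, H_max, H_min) for each day")
    x = [p for day in similar_days_power for p in day]   # elements 1 to 220
    x += list(similar_day_weather)                        # elements 221 to 224
    x += list(prediction_day_weather)                     # elements 225 to 228
    return x

# Placeholder data: 11 h of output at 15-minute resolution gives 44 samples per day.
power = [[0.1 * k for k in range(44)] for _ in range(5)]
x = build_input_vector(power, (34.0, 24.0, 60.0, 35.0), (33.0, 23.0, 62.0, 36.0))
```

In practice, the values would be normalized before training, as the flow diagram in Figure 4 indicates.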

Experiment Results and
By comparing the three types of prediction models, we can see that the EMD-ABC-SVM has the highest accuracy, which indicates that the EMD reduces the influence of nonstationarity and randomness on the SVM models and that the parameter optimization by the ABC algorithm gives the best performance for the SVM models. The prediction error in the two periods 7:00 to 9:00 and 16:00 to 18:00 is relatively large, but the electricity generated in the morning and evening is only a small part of the total electricity generated throughout the day, so these errors do not affect the practical application of the prediction model.
To fully verify the performance of the model, we predict the output power of the grid-connected PV system under four different weather types: sunny, sunny to cloudy, cloudy, and overcast. The performance of the models is compared and judged with the MAPE and RMSE. The comparison results of the MAPE and RMSE for the different prediction models are shown in Table 3.
The data given in Table 3 show that the weather types have different effects on the various prediction models. For sunny days, the three models all perform well, but the EMD-ABC-SVM has the best prediction effect, with an MAPE and RMSE of only 6.35% and 7.59%, respectively. In addition, the single SVM model without the EMD has a prediction error below 15%. For cloudy and sunny to cloudy days, the prediction effects are not ideal. The RMSE of the EMD-ABC-SVM model is up to 14.16%, and the maximum error of the single SVM model without the EMD is 21.27%. These results arise mainly because cloudy and sunny to cloudy weather conditions change frequently and increase the randomness of the data. The three prediction models perform differently under the same weather type. The prediction error of the EMD-ABC-SVM model is the smallest under the various weather types. The fundamental reason is that a separate ABC-SVM model is established for each component of the original output power sequence after the EMD, which reduces the random interference in the power signal and the mutual influence of the characteristic information in the power signal. Additionally, the parameters of the SVM models are optimized with the ABC algorithm to achieve the best working conditions. Therefore, even in cloudy and sunny to cloudy weather conditions with strong randomness, its performance is better than that of the EMD-SVM with nonoptimized parameters and the single SVM without EMD.
Grid search and cross validation are usually adopted to optimize the parameters of the SVM and EMD-SVM models. However, grid search is an exhaustive search method and takes a long time when the range of the parameters is large. Optimal parameter selection for the EMD-ABC-SVM uses the artificial bee colony algorithm together with cross validation. The artificial bee colony algorithm is a heuristic search algorithm, so it does not need to traverse all parameter values within the search range. Furthermore, during optimization, the bees can use their own individual experience or exchanged experience to change the search strategy, which saves a lot of time.
To apply the PV power prediction method to a practical photovoltaic power generation system, we have developed a PV power forecasting system. The system is developed on the Eclipse platform as a Web application, using the Struts2 framework based on the MVC model, the data persistence framework Hibernate, and Apache-Tomcat5.5 as the Web server. The system mainly includes a system management module, a data management module, and a power prediction module based on EMD-ABC-SVM. The system management module is mainly responsible for managing the basic user information of the system, such as modifying user information, adding or deleting users, and managing user rights. The data management module manages the power and weather data of the system, including data import and export, data query, and display. The power prediction module based on EMD-ABC-SVM performs the short-term prediction for the photovoltaic power generation system and saves the prediction results into the database.
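For readers who want to reproduce the comparison, the two error measures can be computed as in the sketch below. These are the textbook definitions; since the RMSE values are reported in percent, some normalization is presumably applied, but its exact form is not stated, so the plain RMSE here is an assumption. Skipping zero actual values in the MAPE is also an illustrative choice.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error in percent; zero actual values are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

def rmse(actual, predicted):
    """Root mean square error in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical power values in watts.
measured = [100.0, 200.0]
forecast = [110.0, 180.0]
err_mape = mape(measured, forecast)   # both points are off by 10 percent
err_rmse = rmse(measured, forecast)   # square root of (10^2 + 20^2) / 2
```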
At present, the system is running normally, and the practical application proves that the system is practical and the prediction accuracy can meet the actual demand.
The prediction system has played an important role in application. First, accurate power forecasting helps the power dispatching department make overall arrangements for the optimal combination of conventional and photovoltaic power generation, effectively mitigates the adverse impact of PV power fluctuations on the power grid, and ensures the security and stability of the public grid system. Second, the prediction system can increase the utilization efficiency of the grid-connected photovoltaic system, help reduce the spinning reserve capacity of thermal power plants, and reduce fuel consumption. Third, the prediction data provide a reference for properly arranging maintenance and overhaul of the photovoltaic array and inverter and improve the economic benefits of the photovoltaic power station.

Conclusions
In this paper, the artificial bee colony optimization algorithm and the empirical mode decomposition method are combined and successfully applied to the short-term prediction of the output power of a grid-connected PV system. Similar days of the same season are selected with the fuzzy c-means algorithm.
The EMD method is used to decompose the output power series, producing the intrinsic mode components IMF at different scales and one trend component Res. A corresponding SVM prediction model is established for each component, and the SVM model parameters are optimized using the artificial bee colony algorithm. Finally, the results of the individual prediction models are integrated and reconstructed to obtain the predicted values of the output power of the PV system. The results of the test using measured data show that the EMD-ABC-SVM prediction model is superior to the single SVM prediction model and the unoptimized EMD-SVM prediction model. The proposed method improves the prediction accuracy of the output power of the grid-connected PV system, reduces the influence of the randomness of PV generated power on the safe and reliable operation of the public power grid, and provides an effective method for the optimal scheduling of the grid power output.

Conflicts of Interest
The authors declare that they have no conflicts of interest.