Prediction Technology of Power Transmission and Transformation Project Cost Based on the Decomposition-Integration

Through the analysis of power transmission and transformation project cost, the total cost can be decomposed into construction cost, equipment purchase cost, installation cost, and other costs. This paper proposes a decomposition-integration cost prediction model, taking a substation project as an example and fully considering the cost characteristics. In the decomposition module, the total cost is decomposed into four expenses. In the prediction module, a different forecasting model is selected for each expense. In the integration module, different integration methods are applied to obtain the predicted total cost. The empirical results show that the decomposition-integration prediction algorithm can effectively predict the cost of power transmission and transformation projects and has practical application and popularization value.


Introduction
As infrastructure of the national economy, the sound construction and stable operation of power transmission and transformation projects not only directly affect the development of the electric power industry but also have a large ripple effect on related industries. With the high-speed development of the national economy and rapid growth in electricity demand, electric power construction has accelerated [1]. However, electric power construction investment is limited. If project cost control is inadequate, rising cost levels will affect the economic benefits and the construction level of power grid projects. How to determine and control the cost of power transmission and transformation projects is therefore the main problem to be solved in power engineering cost management. At present, there is a large body of research on electric power engineering cost prediction. The common cost prediction models are multiple regression, nonlinear regression, linear network prediction, BP network prediction, fuzzy neural network, and so forth. The literature [2] used the gray GM (1,1) method to establish two calculation models, which were applied to the budget estimation of power engineering. The literature [3] established a Markov chain model to forecast the power transmission project cost index. The literature [4,5] elaborated the key influence factors of the power transmission and transformation project cost, using the multiple linear regression method to establish a comprehensive cost prediction model. The literature [6,7] proposed a transmission line engineering cost prediction method based on the BP neural network. The literature [8] estimated the project cost by calculating the similarity degree between completed projects and the project to be estimated, based on fuzzy mathematics theory. The literature [9,10] constructed an engineering cost estimation model based on the support vector machine. Any single forecast model has its own advantages and disadvantages. In order to improve the accuracy of cost forecasts, some mixed models have appeared in the field of power engineering cost prediction. The literature [11] proposed a cost estimation method based on gray relational analysis and neural network.
The literature [12] established a project cost prediction model based on chaos SVM and ARIMA models. As a whole, domestic and foreign engineering cost prediction methods can be summarized as the following kinds. The first is the quota method, whose disadvantages are a long budgeting time and complex budget work. The second is the engineering analogy method, which has poor accuracy. The third is the fuzzy mathematics method, which estimates the project cost by the fuzzy number conversion of similar projects, but the determination of feature membership and coefficient adjustment is too difficult. The fourth is regression analysis; its disadvantage is that it cannot take enough factors, especially uncertainty factors, into consideration [13]. The fifth is the artificial neural network algorithm, which is intelligent and has a strong ability to find rules, but whose training process may fail to converge, fall into a local minimum, or be unable to reach the global optimum. The sixth is the combination forecast method; at present, research on applying this model to electric power engineering cost is rare.
Based on the above literature analysis, combined with the idea of "decomposition-integration" and the perspective of combination cost forecasting, this paper builds a decomposition-integration power project cost forecasting model to overcome the shortcomings of the various single cost prediction models. In the decomposition module, the power transmission and transformation project cost is decomposed into itemized expenses that can be simply described. In the prediction module, the main influence factors are identified and a suitable forecasting model is selected according to the characteristics of each itemized expense. In the integration module, an effective integration algorithm is chosen to restore the prediction of the whole system. At last, this paper uses the cost data of power transmission and transformation projects in Zhejiang province for the example simulation.
The results show that the decomposition-integration method can predict the power project cost effectively.

Decomposition-Integration Prediction Technology Process
The decomposition-integration cost prediction technology for power transmission and transformation projects can be divided into three modules: decomposition module, prediction module, and integration module. The prediction technology is discussed based on the example of a substation project.
Decomposition Module. According to "Grid Construction Budgeting and Calculating Standard," the static investment of new substation engineering includes primary and secondary production engineering costs, individual project costs associated with the site, compilation-year price differences, and other expenses.
The main production engineering costs are divided into construction project cost, installation cost, and equipment purchase expense according to the cost types. Therefore, in terms of cost types, substation project costs can be decomposed into construction project cost, installation cost, equipment purchase cost, and other costs [14].
Prediction Module. There are numerous cost prediction methods, including multiple linear regression, artificial neural network, and support vector machine. On the basis of cost decomposition, the influencing factors of each itemized expense are identified, analyzed, and filtered; then, a suitable method is selected from the prediction method library to forecast each itemized expense.
Integration Module. Two methods are applied to integrate the costs. The first is simple summation, namely, adding the construction costs, installation costs, equipment purchase costs, and other expenses to get the final prediction. The other is weighted summation, namely, working out the weights of the costs by analyzing the proportions of the four cost items and then calculating the weighted sum.
Feedback Regulation. Compare the prediction results with the actual values using the mean absolute error (MAE) and mean square error (MSE), and constantly optimize the analysis and prediction methods according to the feedback on the fitting effect. The training and optimization can be stopped only when the forecast cost data achieve the required prediction accuracy. The decomposition-integration prediction process is shown in Figure 1.
In mining the main factors of construction cost, more than 20 factors have been identified: location, endurance, pollution level, altitude, landform, the number of main transformer sets in current period, substation capacity, total land requisition area of the site, total land requisition area within the wall, total construction area, main control building area, cable length, excavation of field leveling, fill volume, foundation treatment method, path length of entering the substation, cable channel volume, volumes of well, ditch, and pond, weight of steel, weights of steel frame and other material, weight of cement, and so forth [15].

Prediction Model
(1) Factors Screening Based on MIV. The original input set has more than 20 factors, which is too many; besides, the information carried by some factors overlaps. In order to optimize the original variable set, the mean impact value (MIV), which is regarded as one of the best indicators for evaluating variable correlation in a neural network, was used to select the key factors in this paper. MIV measures the influence of an input index on the output index. The sign of the MIV indicates the direction of the influence on the output, and its absolute value indicates the relative degree of influence.
The calculation steps are as follows. Generate two samples by increasing and decreasing one index by 10% (the percentage is customizable) on the basis of the original training samples. Then, calculate the network outputs for these two samples separately; the difference between the outputs can be regarded as the impact of changing that index on the output value. Calculate the MIV of the independent variable several times and average the results in order to reduce the training error. Then, repeat the process for each indicator to calculate the MIV values of all input nodes. At last, sort the MIV values, and screen the key factors in accordance with the requirements [16].
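The MIV steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB implementation: `mean_impact_values`, `rank_factors`, and the fixed linear `toy_model` that stands in for a trained network are all hypothetical names and data, and the averaging over repeated training runs mentioned in the text is not shown.

```python
from statistics import mean


def mean_impact_values(predict, samples, delta=0.10):
    """Return one MIV per input factor for an already trained model.

    predict : callable mapping one sample (list of floats) to a float
    samples : training samples, a list of equal-length lists
    delta   : relative perturbation (10% in the text, customizable)
    """
    mivs = []
    for j in range(len(samples[0])):
        diffs = []
        for row in samples:
            up, down = list(row), list(row)
            up[j] = row[j] * (1 + delta)    # factor j increased by delta
            down[j] = row[j] * (1 - delta)  # factor j decreased by delta
            diffs.append(predict(up) - predict(down))
        mivs.append(mean(diffs))            # sign gives direction of influence
    return mivs


def rank_factors(mivs):
    """Factor indices sorted by decreasing absolute MIV."""
    return sorted(range(len(mivs)), key=lambda j: abs(mivs[j]), reverse=True)


# Hypothetical stand-in for a trained network: a fixed linear map.
def toy_model(x):
    return 2.0 * x[0] - 3.0 * x[1] + 0.1 * x[2]


toy_samples = [[1.0, 1.0, 1.0], [2.0, 1.0, 3.0], [3.0, 1.0, 2.0]]
toy_mivs = mean_impact_values(toy_model, toy_samples)
toy_ranking = rank_factors(toy_mivs)
```

Keeping the first few indices of `toy_ranking` corresponds to the screening step; how many to keep is left to the analyst, as in the text.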
(2) Data Preprocessing. In practical engineering projects, the values of different indicators may differ by orders of magnitude because the original indexes have different meanings and units. So the data must be normalized with a normalization formula before training.
(3) The BP Neural Network Prediction. Build a three-layer BP neural network, setting the screened variables as network input and the project construction cost as network output. Determine the number of hidden-layer nodes according to the numbers of input and output nodes. Calculate the predicted output of the neural network by training the network with sample data. Then, compute the network prediction error from the predicted and desired outputs. Constantly update the weights and thresholds until the results meet the error precision. At last, save the final weights and thresholds and use them to predict the construction cost of the project [17,18].
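To make the training loop described above concrete, the following is a small pure-Python sketch of a three-layer network trained by error backpropagation. It is an illustration under assumptions, not the model used in the paper: the tanh hidden activation, linear output node, learning rate, stopping tolerance, and the toy data are all hypothetical choices.

```python
import math
import random


def train_bp(inputs, targets, n_hidden=4, lr=0.05, max_epochs=2000,
             tol=1e-4, seed=0):
    """Train a three-layer network (tanh hidden layer, linear output node)
    by per-sample gradient descent. Returns (predict, mse_history)."""
    rng = random.Random(seed)
    n_in = len(inputs[0])
    # Weights and thresholds (biases) start from small random values.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = [0.0]  # one-element list so the nested functions see updates

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(w1[k], x)) + b1[k])
                  for k in range(n_hidden)]
        return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2[0]

    def mse():
        return sum((forward(x)[1] - t) ** 2
                   for x, t in zip(inputs, targets)) / len(inputs)

    history = [mse()]
    for _ in range(max_epochs):
        for x, t in zip(inputs, targets):
            hidden, y = forward(x)
            err = y - t                              # prediction error
            for k in range(n_hidden):
                # Gradient w.r.t. hidden pre-activation, using w2 before update.
                grad_h = err * w2[k] * (1.0 - hidden[k] ** 2)
                w2[k] -= lr * err * hidden[k]
                for i in range(n_in):
                    w1[k][i] -= lr * grad_h * x[i]
                b1[k] -= lr * grad_h
            b2[0] -= lr * err
        history.append(mse())
        if history[-1] < tol:                        # error precision reached
            break
    return (lambda x: forward(x)[1]), history


# Hypothetical normalized inputs and targets, for illustration only.
toy_x = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.5], [0.5, 0.0]]
toy_t = [0.5 * a - 0.3 * b for a, b in toy_x]
bp_predict, bp_history = train_bp(toy_x, toy_t)
```

The returned `bp_predict` closes over the final weights and thresholds, which mirrors the step of saving them for later prediction.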

Factors Identification.
Equipment cost mainly refers to the purchase of various kinds of electrical equipment in a substation project. According to the statistical data, equipment purchase expense accounts for the largest proportion of substation engineering cost, nearly 50 percent, so the entire project cost fluctuates with the equipment price. The main cause of the variance in equipment purchase expense is the localization degree of the equipment. In recent years, with the development of domestic technology, the steadily improving localization rate has greatly reduced the purchase expense of major equipment. But equipment purchase is still the key link of cost management. In this paper, combined with the statistical data, the equipment purchase expense is analyzed based on the ABC classification method.
Combined with the secondary indexes raised by the experts, major cost in Figure 2 can be decomposed into the main equipment under each secondary index.The factor set of equipment purchase expense generated according to the quantities and price of main equipment is shown in Table 1.
The sum of squared residuals is $Q = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. Calculating with the matrix method, the least-squares estimate of the coefficient vector is $\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$.
(2) Model Test. As the fitting effect of the obtained multivariate linear regression model may not be good enough, the equation and parameters must be examined to determine whether the equation is usable.
(a) Test of Goodness of Fit. In multiple linear regression, the multiple determination coefficient $R^2$ measures the degree to which the explanatory variables in the model explain the dependent variable. The value of $R^2$ lies between 0 and 1. The closer $R^2$ is to 1, the better the fit of the regression equation and the closer the relationship between the independent variables and the dependent variable.
Table 2 shows the two-dimensional division of the technical factors of production engineering. According to the features of project installation costs, natural factors that affect the construction cost level include endurance and altitude. In summary, the installation cost prediction model should take all the technical conditions and natural factors into consideration as a factor library.

Prediction Model
(1) Data Preprocessing.Just like the neural network prediction, the original data have to be normalized to make different dimensional data comparable before SVM training.
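As an illustration of the normalization step, the sketch below rescales one indicator column linearly. The exact formula used in the paper is not reproduced here; min-max scaling to [−1, 1] is an assumption, chosen because the empirical section reports normalized data within (−1, 1). The function name and the sample values are hypothetical.

```python
def scale_to_range(column, lo=-1.0, hi=1.0):
    """Linearly rescale a list of numbers so that min -> lo and max -> hi."""
    x_min, x_max = min(column), max(column)
    if x_max == x_min:
        # A constant column carries no information; map it to the midpoint.
        return [(lo + hi) / 2.0 for _ in column]
    span = x_max - x_min
    return [lo + (hi - lo) * (x - x_min) / span for x in column]


# Hypothetical raw values of one indicator (for illustration only).
toy_capacity = [31.5, 40.0, 50.0, 63.0]
toy_scaled = scale_to_range(toy_capacity)
```

In practice the minimum and maximum found on the training samples would be reused to scale the test samples, so that both sets share one mapping.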
(2) The Support Vector Machine (SVM). The support vector machine (SVM) is a machine learning method based on the principle of structural risk minimization. The basic idea is as follows: if the samples are linearly separable, the optimal classification hyperplane of the two categories of samples can be found in the original space; if the samples are linearly inseparable, a slack variable is introduced and, by a nonlinear mapping, the samples in the lower dimensional input space are mapped to a higher dimensional space in which they become linearly separable. Through these processes, nonlinear samples can be analyzed in the higher dimensional space with a linear algorithm [22].
Most cases are nonlinear, so the lower dimensional space must be mapped to a higher dimensional space with a nonlinear mapping function $x \to \varphi(x)$ so that linear SVR can be used in the higher dimensional space. The kernel function $K(x_i, x_j)$ is in effect such a mapping. Different kernel functions correspond to different algorithms; thus the samples will be mapped to different higher dimensional spaces [23]. In the higher dimensional space, applying the linear regression method turns the nonlinear regression problem into maximizing the function $W(\alpha, \alpha^*)$ under constraints, where $\alpha$ and $\alpha^*$ are the Lagrange multipliers. The regression function is finally obtained as $f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b$.
(3) The Grid Search Method. The grid search method is mainly used to optimize the SVM parameters $c$ and $g$. SVM parameter selection is actually an optimization search process. Grid search means to split a fixed range into a grid and exhaustively evaluate every grid point. Among the candidate parameter pairs $c$ and $g$, the pair with the highest accuracy in the sense of cross validation is taken as the optimal parameters. This method is simple and intuitive, but the amount of calculation increases with the enlargement of the parameter search range, and the calculation time is extended [24].
The natural environment factors and technical factors should be mainly considered when predicting the other costs of substation engineering. Among them, the natural environment includes location, endurance, pollution level, and elevation. Technical factors can be summarized as the number of main transformers in current period, the capacity of a single main transformer, total land requisition area, land requisition area within the walls, total construction area, main control building area, cable length, the excavation of earth leveling, fill volume, cable channel volume, volume of well, ditch, and pool, weight of steel, weights of steel frame and other material, weight of cement, construction site requisition and compensation cost, the preparatory work cost of the project, pile foundation inspection cost, large transport measures cost, design cost, inspection cost, supervision cost, and production preparation cost.

Prediction Model.
There are too many factors affecting other costs, the relationships between the factors are complicated, and the effect of individual factors on the cost trend is not obvious. So we use principal component analysis (PCA) to reduce the dimension of the factors before calculating, and then predict the cost with the support vector machine, which has strong generalization ability. Besides, we use the particle swarm optimization (PSO) algorithm to optimize the parameters.
(1) Data Preprocessing. After the normalization preprocessing, use PCA to reduce the dimension of the original data. The goal of PCA is to reduce the data dimension as much as possible while preserving the information in the data. The analysis process is as follows: first calculate the covariance matrix of the factor set, and work out the eigenvectors and eigenvalues of the covariance matrix; then compute the variance (information) contribution rate of each principal component, and select the principal components once the cumulative contribution rate meets the required condition [25,26].
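The selection rule at the end of the PCA step can be sketched as follows, assuming the eigenvalues of the covariance matrix have already been computed. The function names, the toy eigenvalues, and the 85% threshold are hypothetical; the text does not state which threshold was used.

```python
def contribution_rates(eigenvalues):
    """Variance contribution rate of each component, largest first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [value / total for value in ordered]


def select_components(eigenvalues, threshold):
    """Return (number of components kept, cumulative contribution rate)."""
    cumulative = 0.0
    count = 0
    for rate in contribution_rates(eigenvalues):
        cumulative += rate
        count += 1
        if cumulative >= threshold:
            break
    return count, cumulative


# Hypothetical eigenvalues of a 5-factor covariance matrix.
toy_eigenvalues = [0.5, 4.0, 1.0, 2.0, 0.5]
toy_count, toy_cumulative = select_components(toy_eigenvalues, threshold=0.85)
```

With these toy values the cumulative rates are 0.5, 0.75, 0.875, and so on, so three components are kept.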
(2) Parameters Optimization Based on PSO Algorithm.PSO is a kind of swarm intelligence optimization algorithm in the field of computational intelligence which is applied to solve optimization problems.The basic thought is to initialize a random group of particles and then update the iteration to find the optimal parameters.Specific steps are as follows:

Empirical Research
In order to verify the effectiveness of the decomposition-integration prediction technology in power transmission and transformation project cost prediction, this paper takes the transmission and substation engineering cost of Zhejiang province as the research object. The historical data table contains 57 new 110 kV substation projects. Fifty samples are randomly chosen as training samples, and the remaining 7 samples are used as test samples. The unit capacity cost is set as the predicted output. The steps of decomposition-integration, carried out with SPSS and MATLAB, are shown in Figure 3.

Prediction of Construction Cost
(1) The Analysis of the Main Influence Factors Based on MIV. In order to reduce the MIV calculation error, this paper calculated the average absolute value of 10 calculation results to get the relative degree of influence of the various factors on construction cost. The calculation results are shown in Table 3.
As can be seen in Table 3, the MIV values of the first nine characteristics are larger, and that indicates the first nine characteristic parameters are the main characteristic parameter values that influence the output structure.So the first nine indicators can be set as new input variables: Input set = {the number of main transformer sets in current, construction area (main control building), land requisition area (total station), the cable, main transformer capacity, location, steel frame and other materials, fill volume of earth leveling, foundation treatment method}.
(2) Data Preprocessing. The preprocessing results of the 9 indexes selected by MIV, obtained with the normalization formula, are shown in Table 4. The results show that the normalized data lie within (−1, 1).
(3) The Forecast Analysis. Build a three-layer neural network with nine inputs and one output, and dynamically adjust the number of hidden nodes. The prediction error results are shown in Table 5.
(c) Forecast Analysis. According to the trained model, compute the test set with the regression equation to calculate the prediction error; the results are shown in Table 6.

Prediction of Installation Cost
(1) Data Preprocessing. In order to unify the dimensions and eliminate the influence of numerical magnitude and units, the historical data need to be preprocessed with the normalization formula before SVM model training.
The parameter optimization process can be observed through the 3D map generated by MATLAB. The highest point represents the optimal parameter combination, as shown in Figure 4.
After the parameter optimization based on the grid search method, the optimal values are obtained: $c = 32$ and $g = 0.0039063$.
(3) The Forecast Analysis. Predict the cost with the trained SVM model after getting the optimal values of the penalty parameter $c$ and the RBF kernel parameter $g$. The installation cost forecast error is shown in Table 7.

Prediction of Other Costs
(1) Data Preprocessing. (a) Normalization Process. In order to make the data processing more convenient, normalize the more than 30 original indexes according to the normalization formula.
(b) PCA Dimensionality Reduction. Use PCA to reduce the dimension of the input parameters, as there are too many original indexes. The calculation process generates 8 principal components, and the cumulative contribution rate reaches 97.8175%.
The PSO optimization process is shown in Figure 5.
After the PSO parameter optimization, the optimal parameter values are $c = 2.9811$ and $g = 0.01$.
(3) Forecast and Analysis. Through PSO parameter optimization, work out the optimal parameters $c$ and $g$. Then use the trained SVM model to predict the cost. The prediction error is shown in Table 8.

Prediction of Total Cost
(1) Simple Summation.The substation project unit capacity predicted cost generated by module integrating can be obtained by adding the predicted data of test set.The calculated error is shown in Table 9.
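The simple summation and the error measures used for feedback (MAE and MSE) can be sketched as below. All numbers are hypothetical placeholders rather than values from the paper's tables, and the function names are illustrative.

```python
def simple_summation(construction, equipment, installation, other):
    """Total predicted cost per sample as the sum of the four itemized costs."""
    return [sum(parts) for parts in zip(construction, equipment, installation, other)]


def relative_errors(predicted, actual):
    """Signed relative error of each prediction."""
    return [(p - a) / a for p, a in zip(predicted, actual)]


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_square_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical itemized predictions for two test samples (unit capacity cost).
toy_total = simple_summation(
    construction=[100.0, 120.0],
    equipment=[150.0, 160.0],
    installation=[25.0, 30.0],
    other=[60.0, 50.0],
)
toy_actual = [340.0, 350.0]
toy_rel = relative_errors(toy_total, toy_actual)
toy_mae = mean_absolute_error(toy_total, toy_actual)
toy_mse = mean_square_error(toy_total, toy_actual)
```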
(2) Weighted Summation. The statistical analysis of the four charges of substation engineering cost shows that the proportions of construction engineering cost, equipment purchase cost, installation cost, and other costs are 30.96%, 43.37%, 7.66%, and 43.37%. The weighted summation is then calculated from these proportions with the weighted summation formula. The weighted summation errors are shown in Table 10. Comparing the two methods shows that the prediction error of the weighted summation is slightly larger than that of the simple summation only in samples 2 and 3, which suggests that weighted integration reduces the prediction error and improves the prediction accuracy.
(3) Comparative Analysis of Models. Although the neural network and the support vector machine (SVM) both belong to the artificial intelligence algorithms, the two models differ clearly in prediction accuracy and stability. The experiments show that the generalization ability of SVM is better than that of the BP network; the errors of the BP network vary considerably between runs, so it is difficult for the BP network to reach the global optimum, and its training results are not stable.
The traditional linear regression analysis is applicable to the models with obvious linear relationship like equipment purchase model and so forth.The experimental results show that, compared with artificial intelligence algorithms such as SVM, linear regression model is simpler and easier to operate, but the error precision depends on the selection of the independent variables.In the model validation, the errors of equipment purchase expense of two samples are higher than 9%, and the fluctuation is bigger than other three charges.
In order to further verify the application effect of decomposition-integration model in cost prediction, BP network and PSO-SVM are compared to the weighted integration model in terms of the total cost prediction error; the results are shown in Table 11.
It can be seen from Table 11 that the ranking of the three kinds of models by prediction accuracy is decomposition-integration > PSO-SVM > BP network. The decomposition-integration model considers the problem from multiple perspectives, and it combines the advantages of the other models to improve on the prediction accuracy of a single model.

Conclusion
From the perspective of cost decomposition, the total cost is broken into the project construction cost, equipment purchase cost, installation engineering cost, and other costs. The influence factors of each cost are mined and identified separately, a suitable forecast model is established for each cost, and finally the four charges are integrated. Through the analysis of the actual forecast for substation construction cost in Zhejiang province, the following conclusions can be drawn.
(1) The decomposition-integration model is proposed from the angle of multicomponent forecasting. It converts the total cost prediction problem into the prediction of several child costs. Although the forecasting principle is simple, the advantages of this model are obvious.
From the engineering point of view, this model can dig out the main influencing factors and choose a suitable prediction model for every child cost separately, which improves the prediction accuracy of each cost in a targeted way. From the theoretical point of view, this model combines various forecasting methods to overcome the disadvantages of a single prediction model. Through the feedback regulation module, it can choose the series of forecasting models with the highest accuracy, which improves the prediction precision.
(2) In the prediction module, a suitable prediction model is selected according to the characteristics of each of the four child costs. For equipment purchase expense, there is a significant linear relationship between factors and costs, so a multivariate linear regression model can be built. As the factors of the other three costs are too many and the relationships are complex, models with strong generalization ability, such as neural network and SVM, can be chosen. When optimizing the prediction model, the SVM parameters are optimized by grid search and PSO. The experiments indicate that PSO has an advantage in efficiency, speed, and accuracy.
(3) In the integration module, the experiments show that the prediction error of the weighted summation is smaller than that of the simple summation. The reason is that the weighted summation is based on the statistical analysis of the proportions of the four costs in the total cost. Compared with the simple summation, it adds a fixed link. Thus, the prediction accuracy can be improved.
In summary, the proposed decomposition-integrated model can improve the prediction accuracy effectively.This model is of a certain significance and good application effect in the power transmission project cost prediction.

Figure 1: Technique flowchart of power transmission and transformation project cost prediction technology based on decomposition-integration.

Figure 2: Equipment purchase cost analysis based on ABC.
3.4.1. Factors Identification. In other costs, the most uncontrollable factors are the site acquisition costs and clean-up costs at the early stage of the project. As the land acquisition costs, construction costs, relocation compensation costs, and other costs differ greatly with the land prices of different regions, the key to cost control is to analyze the station location. Other costs mentioned above include the following six subcosts: the construction site acquisition cost and clearing costs, management costs, the technical service costs, production preparation cost, and other basic reserve funds cost.
(a) divide the historical samples into a training set and a test set;
(b) set the number of folds of the cross validation, the number of generations, the population size, and the ranges of parameters $c$ and $g$;
(c) calculate the fitness value for randomly generated parameters $c$ and $g$;
(d) get the position of the best fitness;
(e) update the search velocity and cycle to search for other candidate parameters $c$ and $g$;
(f) repeat the calculation process and stop after reaching the maximum number of generations [27-29].
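Steps (b) to (f) above can be sketched as a basic particle swarm loop. This is a generic PSO under stated assumptions, not the authors' MATLAB code: the inertia and acceleration coefficients, the swarm size, and the quadratic `toy_cv_error` that stands in for the cross-validated SVM error are all hypothetical.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=50,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize objective(position) inside box bounds with a basic PSO.

    Returns (best_position, best_value, history of best values).
    """
    rng = random.Random(seed)
    # (c) random initial parameters and their fitness values
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * len(bounds) for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_val = [objective(p) for p in pos]
    # (d) position of the best fitness found so far
    best_i = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[best_i][:], p_val[best_i]
    history = [g_val]
    for _ in range(n_iter):                      # (f) fixed number of generations
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # (e) velocity update toward personal and global bests
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (p_best[i][d] - pos[i][d])
                             + social * r2 * (g_best[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < p_val[i]:
                p_val[i], p_best[i] = val, pos[i][:]
            if val < g_val:
                g_val, g_best = val, pos[i][:]
        history.append(g_val)
    return g_best, g_val, history


# Hypothetical stand-in for the cross-validated SVM error as a function of (c, g).
def toy_cv_error(params):
    c, g = params
    return (c - 3.0) ** 2 + (g - 0.5) ** 2


toy_bounds = [(0.1, 100.0), (0.01, 10.0)]
pso_best, pso_value, pso_history = pso_minimize(toy_cv_error, toy_bounds)
```

In a real run, `objective` would train the SVM with the candidate parameters and return the cross-validation error on the training set, which makes each evaluation far more expensive than this toy function.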

Table 1: The main factors that influence the equipment purchase expense.
(b) Significance Test of Regression Equation. The significance test of the regression equation uses the $F$ test to check whether the parameters $\beta_j$ are significantly different from 0. That is, if $F > F_\alpha$, the linear relationship is significant; if $F < F_\alpha$, the linear relationship is not significant. The critical value $F_\alpha(k, n-k-1)$ is determined by the given significance level $\alpha$; the value of $F$ is calculated from the sample statistics [21].
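The two test quantities can be computed from observed and fitted values as in the sketch below. It assumes a least-squares fit with an intercept, for which the explained sum of squares equals the total minus the residual sum of squares; the function names and the toy data are hypothetical, and the critical value still has to be read from an F table for the chosen significance level.

```python
def sums_of_squares(y_true, y_fit):
    """Return (residual sum of squares, total sum of squares)."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((y - f) ** 2 for y, f in zip(y_true, y_fit))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return rss, tss


def r_squared(y_true, y_fit):
    rss, tss = sums_of_squares(y_true, y_fit)
    return 1.0 - rss / tss


def f_statistic(y_true, y_fit, k):
    """F = (ESS / k) / (RSS / (n - k - 1)) with k explanatory variables."""
    n = len(y_true)
    rss, tss = sums_of_squares(y_true, y_fit)
    ess = tss - rss  # valid for a least-squares fit with intercept
    return (ess / k) / (rss / (n - k - 1))


def is_significant(f_value, f_critical):
    """The linear relationship is judged significant when F exceeds the critical value."""
    return f_value > f_critical


# Hypothetical observed and fitted values for a one-variable model.
toy_y = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_fit = [1.1, 1.9, 3.2, 3.8, 5.0]
toy_r2 = r_squared(toy_y, toy_fit)
toy_f = f_statistic(toy_y, toy_fit, k=1)
```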
3.3.1. Factors Identification. In the statistical sample, installation cost accounts for about 7% of the total investment and, compared with the other costs, its volatility is relatively low. The influence factors of installation cost can be identified from the natural factors and the technical conditions.

Table 2: Identification of technical conditions of installation cost factors.

Table 3: The average MIV values of the factors.

Table 4: Part of index sets after being normalized.

Table 5: The construction cost prediction error.

Table 6: Equipment cost prediction error.
Build the multivariate linear regression model of equipment purchase expense by setting, as the independent variables, the unit prices of the main transformer system as $x_1$, the high side power distribution device as $x_2$, the low voltage side power distribution device as $x_3$, capacitor banks as $x_4$, the computer monitoring system as $x_5$, the station transformer as $x_6$, station power distribution equipment as $x_7$, and the communication system as $x_8$.
(a) After calculating, the multiple coefficient of determination is $R^2 = 0.9843$, indicating that the model fits well. (b) The statistic worked out from the sample is $F = 8.17$. From the table, $F_{0.05}(8, 41) = 2.18$, so $F > F_{0.05}(8, 41)$, which shows that the linear relation of the model is significant.
(2) Parameters Optimization Based on Grid Search Method. (a) Firstly, set the variation ranges of the penalty parameter $c$ and the RBF kernel parameter $g$. For example, set $c_{\min} = -8$ and $c_{\max} = 8$, which is to find the best parameter $c$ in $[2^{c_{\min}}, 2^{c_{\max}}]$; set $g_{\min} = -8$ and $g_{\max} = 8$, so that the variation range of the RBF kernel parameter $g$ is $[2^{g_{\min}}, 2^{g_{\max}}]$, namely, to find the RBF kernel parameter $g$ in this range.
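The exhaustive search over the base-2 exponent grid described in step (a) can be sketched as follows. The `toy_cv_score` function is a hypothetical stand-in for the cross-validated accuracy of an SVM trained with the given pair of parameters, and its peak location is arbitrary; the step size of one exponent unit is also an assumption.

```python
import math


def grid_search(score, c_exp_range=(-8, 8), g_exp_range=(-8, 8), step=1):
    """Evaluate score(c, g) on every grid point c = 2**i, g = 2**j and
    return (best_score, best_c, best_g); higher scores are better."""
    best = None
    for c_exp in range(c_exp_range[0], c_exp_range[1] + 1, step):
        for g_exp in range(g_exp_range[0], g_exp_range[1] + 1, step):
            c, g = 2.0 ** c_exp, 2.0 ** g_exp
            value = score(c, g)
            if best is None or value > best[0]:
                best = (value, c, g)
    return best


# Hypothetical score surface with a single peak at c = 2**3, g = 2**-2.
def toy_cv_score(c, g):
    return -((math.log2(c) - 3.0) ** 2 + (math.log2(g) + 2.0) ** 2)


grid_best_score, grid_best_c, grid_best_g = grid_search(toy_cv_score)
```

With a range of −8 to 8 for both exponents this evaluates 17 × 17 = 289 parameter pairs, which illustrates why the cost grows quickly when the search range is enlarged or the step is refined.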

Table 7: Installation costs forecast error.

Table 8: Other expenses prediction error.

Table 10: The error of weighted summation.

Table 11: Error analysis of the models.