Intelligent Prediction of Transmission Line Project Cost Based on Least Squares Support Vector Machine Optimized by Particle Swarm Optimization

To meet growing power demand, the construction of transmission line projects continues to advance and the standard of cost control continues to rise, which places higher requirements on the accuracy of cost prediction. This paper proposes an intelligent cost prediction model based on a least squares support vector machine (LSSVM) optimized by particle swarm optimization (PSO). Natural, technological, and economic indexes are first extracted from the perspective of cost composition, and principal component analysis (PCA) is used to reduce the dimension of the indexes. PSO is then introduced to optimize the parameters of the LSSVM model. The resulting principal component data are fed into the empirical-parameter LSSVM prediction model and the optimized-parameter PSO-LSSVM prediction model, respectively, for modeling and prediction, and the prediction results are compared to analyze the effect of model optimization. The results show that the absolute deviation of the optimized-parameter prediction model is less than 9% and that its prediction accuracy is better than that of the empirical-parameter model, so it can provide a reliable basis for investment decision-making on transmission line projects.


Introduction
With rapid economic development, electricity demand has increased significantly, so the pace of power transmission and transformation project construction has accelerated and the scale of investment has expanded. However, if the cost of power transmission and transformation projects cannot be effectively controlled, rising cost levels will directly affect the economic benefits and construction quality of power grid projects. The first task in controlling the cost of a transmission line project effectively is to determine the project cost quickly and accurately. Because transmission line projects span large areas, are built in complex environments, involve many unpredictable factors, and behave nonlinearly, traditional forecasting methods are no longer suitable. Therefore, in order to make reasonable investment decisions, it is necessary to study intelligent cost prediction models in depth to improve the speed and accuracy of transmission line project cost prediction.
In traditional statistical analysis, multivariate regression is used to predict cost by establishing a regression model. Elmousalami HH et al. established a quadratic regression model and determined its key parameters to obtain a reliable conceptual cost estimation model for field canal improvement projects (FCIP) [1]. Zhang HB analyzed cost and elements through regression analysis of the design parameters, revealed the degree to which design parameters influence the cost and elements of building structures, and obtained a prediction model for the cost and elements of building structure projects [2]. However, regression analysis performs poorly when there are many uncertain factors and few sample data. Time series analysis needs relatively few samples, but it cannot fully account for accidental factors, which are numerous and difficult to estimate. Moon S et al. established an autoregressive fractionally integrated moving average (ARFIMA) time series model with long memory characteristics [3]. Moon T et al. established an interrupted time series forecasting model to predict the construction cost index (CCI) [4]. Based on fuzzy logic, Zhang R et al. improved the prediction accuracy of time series [5].
Mathematical Problems in Engineering
With the development of artificial intelligence, machine learning and intelligent algorithms have gradually become popular for project cost prediction. Juszczyk M introduced principal component analysis into a neural network to predict the cost of residential buildings at the design stage [6]. Cheng MY et al. used an artificial intelligence method, the evolutionary fuzzy neural inference model (EFNIM), to improve the accuracy of cost estimation [7]. Sonmez R proposed a neural network model with bootstrap prediction intervals for range estimation of construction costs [8]. Kim M et al. proposed an approximate cost estimating model for irrigation-type river facility construction at the planning stage, based on case-based reasoning (CBR) with genetic algorithms (GA) [9]. Kim S used a hybrid of the analytic hierarchy process (AHP) and case-based reasoning (CBR) to study a cost estimation model for the early stage of Korean highway projects [10]. Lesniak A et al. argued that design according to the principle of sustainable development affects construction cost and proposed an approach to estimating the costs of sports field construction using case-based reasoning [11]. Rafiei MH et al. proposed a cost estimation model that includes an unsupervised deep Boltzmann machine (DBM) learning approach and a three-level back propagation neural network (BPNN) [12]. Juszczyk M et al. plotted the relationship between the total cost of a construction project and the cost prediction factors describing the characteristics of the sports field, and proposed a neural-network-based method for estimating the construction cost of sports fields [13].
As forecasting technology has matured, research on power engineering cost prediction has gradually deepened. Wang FP combined an intelligent learning algorithm with the actual situation of power transmission and transformation projects and proposed a method for refining the control line of power transmission and transformation project cost [14]. Based on the Chebyshev inequality, Zhang GL et al. established a prediction model of transmission and transformation project cost, which expanded the dimension of cost analysis [15]. Wang X et al. predicted the static investment of power transmission and transformation projects based on a Bayesian network, gave the range of the probabilistic interval estimate, and provided a reasonable reference for project decision-making [16]. Ling YP et al. proposed a method to predict the construction cost of transmission lines based on a BP neural network and established a three-layer BP neural network prediction model [17]. Lu Y et al. proposed a decomposition-integration cost prediction model and selected different forecasting models for different costs according to their characteristics [18].
The support vector machine (SVM) is based on the statistical learning theory (SLT) established by Vapnik et al. [19]. It has strong generalization ability and outstanding performance in small-sample learning and nonlinear problems [20,21], and it is therefore suitable for transmission line projects, which have many unpredictable factors and nonlinear characteristics. The least squares support vector machine (LSSVM) is an improved SVM algorithm that can effectively improve the efficiency of machine learning [22]. LSSVM has been successfully applied to many forecasting problems. Cheng MY et al. established the ELSVM hybrid forecasting model, which uses differential evolution to optimize the core parameters of LSSVM, and simulated the fluctuation of the construction cost index (CCI) that reflects price changes in construction projects [23]. Cong YL et al. proposed a traffic flow forecasting model based on LSSVM with two parameters automatically determined by the fruit fly optimization algorithm (FOA) [24].
The fitting accuracy and generalization ability of LSSVM depend mainly on the selection of its two parameters, C and $\sigma^2$. Therefore, it is very important to use an appropriate heuristic algorithm to determine the parameter values. General optimization algorithms show low computational efficiency and poor accuracy [25,26], whereas the particle swarm optimization (PSO) algorithm has faster convergence and higher accuracy [27], so this paper uses PSO to optimize the parameters of LSSVM.
This paper is organized as follows: Section 2 briefly introduces PCA, LSSVM, and PSO. Section 3 gives the screening and calculation methods for the factors affecting cost. Section 4 conducts an empirical study to verify the proposed model, and Section 5 presents the conclusions.

Model Construction
This paper proposes an approach to predicting transmission line project cost based on LSSVM. PCA is used to reduce the dimension of the original indexes, and the resulting principal components serve as the input indexes of the prediction model. The PSO algorithm is used to optimize the parameters of the least squares support vector machine automatically. The process of model building is shown in Figure 1.
As shown in Figure 1, the construction of the prediction model mainly includes the following four steps:
(1) Select the cost impact indexes and collect the original data of the impact indexes and cost.
(2) Use PCA to reduce the dimension of the original data and output new samples. First, the original data are standardized; then, the correlation coefficient matrix and its eigenvalues and eigenvectors are calculated from the standardized data; finally, the variance contribution rate of each principal component is calculated, and the first n principal components whose cumulative variance contribution rate reaches the preset value are selected as the new sample output.
(3) Given the empirical parameters of the LSSVM model, construct an empirical-parameter prediction model.
(4) Use PSO to optimize the parameters of LSSVM. First, the particle velocities and positions are initialized, and the fitness of each particle is calculated from its current position. The position with the minimum fitness found so far by each particle is taken as that particle's individual extreme value, and the position with the minimum fitness across all particles is taken as the global extreme value. Then, the velocity and position of each particle are updated and the fitness is recalculated. These steps are repeated until the number of iterations reaches the maximum or the precision reaches the preset accuracy, at which point the global optimal position is output as the optimized parameters of the least squares support vector machine. Finally, an optimized-parameter prediction model is constructed from the optimized parameters.

Principal Component Analysis (PCA).
The large number of variables in a transmission line project and the complex relationships between them significantly increase the complexity of the analysis. Therefore, this paper uses principal component analysis to reduce the dimension of the variables, so that the principal components represent the main information of the original variables while being uncorrelated with each other, which helps both the establishment of the transmission line project cost prediction model and the analysis of the problem. Principal component analysis (PCA) recombines the P original indicators, which are correlated to some degree, into a new set of mutually uncorrelated comprehensive indicators that replace the original indicators. The main calculation steps of PCA are as follows:
(1) Standardize the raw data. Each indicator is centered by its sample mean and scaled by its sample standard deviation.
(2) Calculate the sample correlation coefficient matrix. Assuming that the standardized data are still denoted by X, the correlation coefficient between each pair of indicators is calculated from the standardized data.
(3) Calculate the eigenvalues and eigenvectors of the correlation coefficient matrix.
(4) Calculate the contribution rate of each principal component and obtain the expression of each principal component.
Principal component analysis yields as many principal components as there are original indicators. However, the variance of successive principal components decreases, and the amount of information they contain decreases accordingly. Therefore, in practical analysis, the first k principal components are usually selected according to the cumulative contribution rate, where the contribution rate of a principal component is its share of the total variance. Generally, the cumulative contribution rate of the first k principal components is required to be more than 85%.
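For reference, the four PCA steps correspond to the following standard formulation. The notation ($n$ samples, $p$ indicators, $x_{ij}$, $\bar{x}_j$, $s_j$, $r_{jk}$, $\lambda_j$, $u_j$, $F_j$) is assumed here for illustration and may differ from the authors' own, in particular in the use of $n-1$ in the sample standard deviation.

```latex
% (1) Standardization of indicator j for sample i
x^{*}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}

% (2) Correlation coefficient between indicators j and k
r_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} x^{*}_{ij}\, x^{*}_{ik}

% (3) Eigenvalues and unit eigenvectors of R = (r_{jk})_{p \times p}
R\,u_j = \lambda_j u_j, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0

% (4) Principal component j, its contribution rate, and the selection rule for k
F_j = u_j^{\top} x^{*}, \qquad
\eta_j = \frac{\lambda_j}{\sum_{l=1}^{p}\lambda_l}, \qquad
\sum_{j=1}^{k}\eta_j \ge 85\%
```

Because $R$ has unit diagonal, the eigenvalues sum to $p$, so each contribution rate is simply $\lambda_j / p$.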
Least Squares Support Vector Machine.
The least squares support vector machine (LSSVM), an improved version of the standard support vector machine, was first proposed by Suykens and Vandewalle [22]. LSSVM inherits a series of strengths of SVM, such as the kernel function, the structural risk minimization principle, and good performance on small samples. Based on regularization theory, the loss function in SVM is replaced by a least squares loss function, and equality constraints are used instead of the inequality constraints of the standard SVM. Through this change to the SVM objective function, the time-consuming quadratic programming (QP) problem is transformed into the problem of solving a set of linear equations, which significantly reduces the complexity of the model, reduces memory consumption during training, and greatly improves the solving speed. The following is a brief introduction to the mathematical model of the least squares support vector machine.
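Given a set of $N$ samples with $n$-dimensional input and one-dimensional output, the samples under a single technical scheme can be expressed as $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$. The samples are mapped from the original space to a high-dimensional feature space by a nonlinear mapping $\varphi(x)$, so that the nonlinear estimation function becomes the linear estimation function $f(x) = w \cdot \varphi(x) + b$ in the high-dimensional feature space, where $w$ is the weight vector and $b$ is the offset of the regression function. According to the structural risk minimization principle, $w$ and $b$ are found by minimizing

$$ R = \frac{1}{2}\|w\|^2 + C \cdot R_{emp}, $$

where $\|w\|^2$ controls the complexity of the model, $C$ is a regularization parameter that controls the penalty on samples exceeding the error, and $R_{emp}$ is the error control function, that is, the insensitive loss function. LSSVM takes the squared error $\xi_i^2$ as the loss function, so the optimization problem is

$$ \min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 \quad \text{s.t.} \quad y_i = w \cdot \varphi(x_i) + b + \xi_i, \quad i = 1, 2, \cdots, N, $$

where $\xi_i$ is the relaxation factor. To solve this problem, the Lagrange function is established as follows:

$$ L(w, b, \xi, \alpha) = \frac{1}{2}\|w\|^2 + \frac{C}{2}\sum_{i=1}^{N}\xi_i^2 - \sum_{i=1}^{N}\alpha_i\left(w \cdot \varphi(x_i) + b + \xi_i - y_i\right), $$

where $\alpha_i$ $(i = 1, 2, \cdots, N)$ are the Lagrange multipliers. According to the KKT optimality conditions, the partial derivatives of $L$ with respect to $w$, $b$, $\xi_i$, and $\alpha_i$ are set equal to 0, which gives

$$ w = \sum_{i=1}^{N}\alpha_i\varphi(x_i), \qquad \sum_{i=1}^{N}\alpha_i = 0, \qquad \alpha_i = C\xi_i, \qquad w \cdot \varphi(x_i) + b + \xi_i - y_i = 0. $$

Then, a symmetric function satisfying Mercer's condition is introduced as the kernel function, $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$. With the Gaussian radial basis function (RBF) kernel, $K(x, x_i) = \exp\left(-\|x - x_i\|^2 / (2\sigma^2)\right)$, the kernel parameter is $\sigma^2$. The parameters of the kernel function determine the complexity of the spatial distribution of the sample data and have a great influence on the performance of the model. $\alpha$ and $b$ are obtained by the least squares method. Finally, the decision function of LSSVM regression is obtained as follows:

$$ f(x) = \sum_{i=1}^{N}\alpha_i K(x, x_i) + b. $$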
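The training step can be illustrated with a minimal sketch. Eliminating $w$ and $\xi_i$ from the KKT conditions leaves a linear system in $b$ and $\alpha$, with first row $\sum_i \alpha_i = 0$ and remaining rows $b + \sum_j \alpha_j K(x_i, x_j) + \alpha_i / C = y_i$. The code below assumes an RBF kernel and uses a hand-written Gaussian elimination so that it needs only the standard library; the function names are hypothetical and this is not the authors' implementation.

```python
import math

def rbf_kernel(a, b, sigma2):
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 * sigma2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma2))

def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def lssvm_fit(X, y, C, sigma2):
    """Return (alpha, b) solving [[0, 1^T], [1, K + I/C]] [b, alpha]^T = [0, y]^T."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # first row encodes sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / C  # regularization term on the diagonal
        A.append(row)
    solution = solve_linear_system(A, [0.0] + list(y))
    return solution[1:], solution[0]

def lssvm_predict(X_train, alpha, b, sigma2, x):
    """Decision function f(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * rbf_kernel(x, xi, sigma2) for a, xi in zip(alpha, X_train)) + b
```

On each training point the fitted value equals $y_i - \alpha_i / C$, so a larger $C$ forces a closer fit to the training data. In practice a library solver such as `numpy.linalg.solve` would replace `solve_linear_system`.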
Particle Swarm Optimization Algorithm.
The particle swarm optimization (PSO) algorithm is a global search algorithm, originally inspired by the foraging behavior of birds, that is used to find the global optimum of optimization problems [28]. The main idea of PSO is to initialize the positions and velocities of a group of random particles and to search for the optimal solution iteratively under given conditions. The best position found by each particle during the search is defined as the individual extreme value $p_{best}$, and the best position found by the current population is defined as the global extreme value $g_{best}$. In each iteration, every particle updates its velocity and position for the next iteration according to these two values.
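In a $d$-dimensional search space, there are $m$ particles representing possible solutions to the problem, $X = \{X_1, X_2, \cdots, X_m\}$, where $X_i = \{x_{i1}, x_{i2}, \cdots, x_{id}\}$ represents the position of the $i$-th particle and $d$ is the number of LSSVM parameters, here $d = 2$. Individual fitness is expressed by the mean square error over the training samples in LSSVM training, and the fitness function is constructed as follows:

$$ \mathrm{MSE} = \frac{1}{N}\sum_{s=1}^{N}\left(y_s - \hat{y}_s\right)^2, $$

where $N$ is the number of training samples and $y_s$ and $\hat{y}_s$ are the predicted and actual values of sample $s$. The velocity of the $i$-th particle in the $d$-dimensional space is defined as $V_i = \{v_{i1}, v_{i2}, \cdots, v_{id}\}$. $P_i = \{p_{i1}, p_{i2}, \cdots, p_{id}\}$ is the best position $p_{best}$ (the position of minimum fitness) that the particle itself has found, and $P_g = \{p_{g1}, p_{g2}, \cdots, p_{gd}\}$ is the best position $g_{best}$ found by the entire population. The velocity and position of the $i$-th particle are updated according to the following formulas:

$$ v_{ij}^{k+1} = \omega v_{ij}^{k} + c_1\,\mathrm{rand}()\left(p_{ij} - x_{ij}^{k}\right) + c_2\,\mathrm{rand}()\left(p_{gj} - x_{ij}^{k}\right), $$

$$ x_{ij}^{k+1} = x_{ij}^{k} + v_{ij}^{k+1}, \qquad j = 1, 2, \cdots, d, $$

where $\omega$ is the inertia weight; $k$ is the iteration number; $c_1$ and $c_2$ are acceleration factors, representing the step lengths with which the particle flies towards its own best position and the overall best position; and $\mathrm{rand}()$ is a random number uniformly distributed in the interval [0, 1].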
If the number of iterations reaches the maximum or the precision reaches the preset accuracy, the iteration terminates and the global optimal parameters are returned. PSO can be used to optimize the kernel function parameter and the penalty coefficient of the LSSVM model, so that manual exhaustive search is avoided and a better fit is obtained.
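The update rules and stopping conditions described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pso_minimize`, the default values of `w`, `c1`, and `c2`, the zero initial velocities, and the clamping of positions to the search bounds are all assumptions made for the example. For LSSVM tuning, `fitness` would return the training MSE for a candidate pair (C, σ²) and `bounds` would hold the two search intervals.

```python
import random

def pso_minimize(fitness, bounds, n_particles=20, max_iter=100,
                 w=0.7, c1=1.5, c2=1.5, tol=1e-8, seed=0):
    """Minimize `fitness` over the box `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize positions uniformly inside the bounds and velocities at zero.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Individual best positions (pbest) and the global best position (gbest).
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(max_iter):
        for i in range(n_particles):
            for j in range(dim):
                # Velocity update: inertia + pull towards pbest + pull towards gbest.
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # Position update, clamped to the search bounds (an assumption).
                lo, hi = bounds[j]
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        # Stop early once the preset accuracy is reached.
        if gbest_fit <= tol:
            break
    return gbest, gbest_fit
```

The fixed `seed` makes runs repeatable, which is convenient when comparing parameter settings.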

Selection and Measurement of Cost Prediction Indexes
Analysis of Factors Affecting Cost and Selection of Indexes.
The cost of a transmission line project is composed of two parts: the ontological project cost and other costs. Other costs account for only about 20% of the project cost, and more than 60% of them are concentrated in construction site requisition and clearance fees, which vary greatly by region, follow no regular pattern, and are difficult to predict accurately. Therefore, this paper studies only the ontological cost of transmission line projects and does not discuss changes in other costs. The ontological project cost can be divided into six parts: foundation project cost, tower project cost, grounding project cost, erecting project cost, annex project cost, and auxiliary project cost. In addition, the voltage level has a great impact on cost, and the costs of transmission line projects at different voltage levels differ significantly. Therefore, this paper analyzes transmission line projects at a single voltage level only.
A transmission line project consists of six unit projects: foundation project, tower project, grounding project, erecting project, annex project, and auxiliary project. Therefore, this paper identifies the factors influencing cost from the perspective of the costs of these six unit projects. First, the influencing indexes of each ontological cost module are determined separately, and then the influencing indexes of the overall cost are summarized. The process of identifying the indexes is shown in Figure 2.
The impact indexes identified in Figure 2 can be classified into three categories: natural, technological, and economic indexes. Among the technological indexes, the separated number of conductors is eliminated, because this paper studies transmission line project cost at a single voltage level and the separated number of conductors is matched to the voltage level. Among the economic indexes, the price of earthwork and the price of concrete are relatively stable and have little impact on the cost of transmission line projects, so they are also eliminated. The resulting indexes affecting transmission line project cost are shown in Table 1.
Index Measurement Method.
(1) Unitization of Gross Index. In analyzing the indexes that affect the unit length cost of a transmission line project, some gross indexes cannot be compared directly because the projects differ in construction scale. In order to analyze the impact of these indexes properly, some of the gross indexes must be converted to unit values, as shown in Table 2.
(2) Quantification of Qualitative Index. In order to make qualitative indexes usable as inputs to the prediction model, they must be quantified according to the factor level of each index. If a project contains multiple levels, a weighted average is taken in proportion to each level's share. The results of index quantification are shown in Table 3.
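The two measurement steps can be illustrated with a short sketch. The helper names, the terrain levels, and their scores below are hypothetical and chosen only for illustration; they are not the unit definitions of Table 2 or the quantification results of Table 3, and dividing by line length is an assumption based on the paper's focus on unit length cost.

```python
def per_unit_length(gross_value, line_length_km):
    """Convert a gross index to a per-kilometre value (assumed unitization)."""
    if line_length_km <= 0:
        raise ValueError("line length must be positive")
    return gross_value / line_length_km

def quantify(level_shares, level_scores):
    """Weighted average of level scores, weighted by each level's share of the project."""
    total_share = sum(level_shares.values())
    if total_share <= 0:
        raise ValueError("shares must sum to a positive value")
    weighted = sum(level_scores[level] * share for level, share in level_shares.items())
    return weighted / total_share

# Hypothetical example: 300 units of a gross quantity on a 20 km line.
unit_value = per_unit_length(300.0, 20.0)

# Hypothetical scores for a qualitative terrain index.
terrain_scores = {"plain": 1, "hill": 2, "mountain": 3}
# A project that runs 60% through plain and 40% through hills.
terrain_value = quantify({"plain": 0.6, "hill": 0.4}, terrain_scores)
```

Normalizing by the sum of the shares means the shares may be given either as fractions or as lengths in kilometres.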

Optimization Model Construction and Experiment Study
Project Samples and Data Statistics.
The original data of 78 sets of 110 kV transmission lines settled in a certain area of China in 2017 are selected as samples. The initial indexes of the model are obtained by quantifying and unitizing the original data, as shown in Table 4. In order to eliminate the interaction between indexes and reduce the number of indexes, principal component analysis (PCA) is carried out on the index data of the 78 sample projects. Each sample has 15 variables, which together constitute a 78 × 15 matrix. SPSS software was used for the calculation, and the KMO suitability test and Bartlett's sphericity test were performed on the standardized data. The results showed that the KMO value was 0.639 and the Bartlett sphericity test value was 622.87, which is significant (p = 0.000 < 0.001), indicating that the data are suitable for principal component analysis.
On the basis of the standardized data, the indexes influencing transmission line project cost are analyzed by principal component analysis. The cumulative contribution of the principal components is shown in Table 5. The cumulative variance contribution of the first seven principal components is 86.530%, which is more than 85%. That is to say, these seven principal components summarize most of the information. Therefore, the first seven principal components are selected as the input set of the transmission line project cost prediction model.
According to the component score coefficient matrix obtained by principal component analysis and the eigenvalues of variables, the principal component coefficient matrix is calculated as shown in Table 6.
The principal components can be calculated from the principal component coefficient matrix and the standardized variable data. The first principal component, for example, is the linear combination of the standardized variables weighted by its coefficients in Table 6.
The iteration process of PSO is shown in Figure 3, and the result of parameter optimization is shown in Figure 4, where the lowest point represents the optimal parameter combination. The optimal LSSVM parameters found by the PSO algorithm are C = 276.8476 and $\sigma^2$ = 0.3412, and the minimum fitness is MSEmin = 0.0188.
Analysis of Prediction Results.
The other 18 sets of data are selected as the test set and fed into the model trained above, and the cost prediction analysis of the transmission line projects is carried out to further verify the rationality of the prediction model. The test samples are substituted into the empirical-parameter model and the optimized-parameter model, respectively, and the predicted values of unit length cost are compared with the actual values in Figure 5. As shown in Figure 5, the average absolute deviation rate between the predicted cost and the actual cost is 9.74% for the LSSVM empirical-parameter model and 6.73% for the PSO-LSSVM optimized-parameter model. The prediction accuracy of the optimized-parameter model is higher than that of the empirical-parameter model, and it can meet the accuracy requirement of project investment decisions.
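Statistically, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are used to measure the accuracy of the prediction model. The calculation formulas are as follows:

$$ \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{\hat{y}_i}\right| \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, $$

where $y_i$ represents the predicted value, $\hat{y}_i$ represents the actual value, and $n$ is the number of test samples. The MAPE and RMSE values of the LSSVM model and the PSO-LSSVM model are calculated, respectively, as shown in Table 7.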
Table 7 shows that the MAPE of PSO-LSSVM is 30.90% lower than that of LSSVM, which is consistent with the reduction of the average absolute deviation rate from 9.74% to 6.73% and shows that the optimized-parameter model substantially improves the accuracy of transmission line cost prediction.
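The two error measures and the relative improvement can be checked with a short sketch. The function names are hypothetical, and the example assumes that the MAPE values in Table 7 equal the average absolute deviation rates of 9.74% and 6.73% reported above, since the table itself is not reproduced here.

```python
import math

def mape(predicted, actual):
    """Mean absolute percentage error, in percent, relative to the actual values."""
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# Reduction in MAPE from the empirical-parameter model to the PSO-optimized model.
improvement = relative_reduction(9.74, 6.73)
```

Here `improvement` evaluates to about 30.90, that is, (9.74 - 6.73) / 9.74 expressed in percent.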

Conclusions
This paper proposes an intelligent cost prediction model based on LSSVM optimized by PSO. First, PCA is used to screen the transmission line project cost data and reduce their dimension, and the principal components that capture most of the factors affecting cost are obtained as the input set of the prediction model. Second, empirical parameters are assigned to the theoretically mature LSSVM model to determine the nonlinear mapping between transmission line project characteristics and cost. Then, the parameters of the model are optimized by PSO to obtain the optimized-parameter model, and the model is trained. Finally, the trained model is used to predict the cost of transmission line projects, and its accuracy is verified by comparing the predictions of the empirical-parameter model and the optimized-parameter model with the actual cost.
Cost control of transmission line projects directly affects the economic benefits of power grid projects, so predicting cost with high accuracy at the early stage of a project is an urgent problem. Although cost prediction models based on various mathematical methods have been proposed in the literature, they are difficult to apply to transmission line projects, and their errors are generally high. For example, neural network models and regression prediction models cannot cope with the small sample sizes typical of transmission line engineering, and prediction models based on grey theory cannot cope with its many influencing factors, so their errors are rarely less than 10%. The average absolute deviation rate of the PSO-LSSVM optimized-parameter model relative to the actual cost is 6.73%, a substantial improvement in accuracy over these other prediction models.
In summary, the PSO-LSSVM cost prediction model proposed in this paper can effectively improve the accuracy of transmission line project cost prediction. The model has practical significance and performs well in transmission line project cost prediction.

Figure 1: Flowchart of transmission line engineering cost prediction model.

Figure 2: Selection of influencing indexes of transmission line project cost.

Figure 3: Iterative process of parameter optimization, showing the minimum MSE at each iteration.

Table 1: Influencing indexes of transmission line project cost.

Table 4: Original data of transmission line project cost.

Table 6: The coefficient between the principal component and the canonical variables.

Figure 5: Comparison of prediction results of different models.

Table 7: Statistical error measures of prediction methods. $y_i$ represents the predicted value and $\hat{y}_i$ represents the actual value.