Application of Data Mining in the Settlement Management of Distribution Network Projects

With the development of intelligent technology, the distribution network is undergoing a new round of upgrading and transformation, in which new technologies, equipment, and platforms are being introduced. This paper proposes a data mining method for the settlement management of distribution network projects. An improved K-means method is used to estimate the coefficient of each link of the project investment. When a large deviation is found in a coefficient, the investment is adjusted immediately to avoid affecting the overall investment. The method can save infrastructure construction costs. Through deeper data mining, it forms a standard, intelligent analysis method, achieves rapid conversion from basic data collection to the application of output results, supports the forecasting of material and equipment prices and the fast estimation of project construction costs, and makes project decision-making more scientific and whole-process cost management more accurate.


Introduction
Since the "Thirteenth Five-Year Plan," with the implementation and promotion of policies such as poverty alleviation and rural revitalization, the prominent contradiction in power construction has changed from "having access to electricity" to "making good use of electricity." The power distribution network is responsible for the distribution of electricity. High hopes have been placed on it, and its engineering investment has increased substantially. Faced with a large variety of distribution network construction data, the traditional manual management and control methods, which lack an overall "digital" analysis, are inefficient, and decision-makers have insufficient analysis and response capabilities, making it difficult to achieve the goals of distribution network construction during the 14th Five-Year Plan period [1][2][3][4]. In recent years, with the rapid development of data science, the application of data mining technology in the power grid has also flourished. Data mining technology can extract value from massive data and provide strong support for the construction and development of electric power enterprises [5]. Therefore, in order to meet the power grid construction tasks and the lean management and control requirements of the new era, it is urgent to integrate data mining technology into the management and control of distribution network engineering to achieve "digital" management and scientific decision-making [6,7].
With the continuous deepening of the power system reform and the continuous and rapid growth of the national economy, China's power grid construction has made remarkable achievements. However, as the scale of investment increases, the impact of the project cost level on the development of the power grid has become more and more significant and has therefore attracted much attention. Whole-process cost management has demonstrated the importance of the preproject cost to the entire project implementation. Scientific and accurate determination of the preproject cost, together with its effective control and management, is the key to achieving construction project management goals [8]. In practice, however, the cost in the early and middle stages of a project has received little attention. The distribution network distributes electric energy and stabilizes the power supply in the power system. Its safe and reliable operation is a prerequisite for the stable development of society, and the key to ensuring the safe operation of the power grid lies in the construction quality of the distribution network. Therefore, it is an inevitable trend to deepen the research on key management and control technologies for the entire process of distribution network engineering, to stabilize the construction and implementation quality of distribution network engineering through scientific means, to lay a solid foundation for the safe and reliable operation of the distribution network, to deliver safe and reliable electric energy to the people, and to provide a strong guarantee for the stable development of society [9][10][11][12].
The distribution network project is characterized by multiple sites, wide areas, and an extremely complex external environment, and its management and control elements have multilevel logical relationships. These intricate relationships make the identification, analysis, and management of key control elements in distribution network construction both the top priority and the greatest difficulty of overall project control. In recent years, the concepts of the "Internet of Things" and "cloud computing" have come to the fore, and big data and mobile Internet technologies have grown vigorously. Massive amounts of data are contained in the information systems of the power industry, and the value of these data is increasingly being tapped. The abundant data obtained can reflect the status of all aspects of a distribution network construction project in real time, and data mining technology makes the transformation of "data-information-knowledge-wisdom" possible [13,14].
Therefore, it is imperative to integrate data mining technology into the management of the whole process of distribution network construction. It is an important cornerstone for the better, stronger, and more precise development of whole-process distribution network management and control [15].
With the popularization and application of new technologies and new equipment in power grid engineering, the proportion of equipment and material purchase costs in project investment is increasing, and so is the impact of equipment and material prices on power grid engineering investment. Collecting the equipment and material price information released by the State Grid helps power grid companies analyze the trend of price fluctuations and improve the accuracy of project estimates and estimated investment. A basic data set of equipment and material prices is created, a neural network analysis method is used to build an intelligent analysis data model, and iterative analysis and calculation are performed through the data model and analysis method to calculate and predict equipment and material prices. The predicted material prices are fed back to the historical engineering cost data set for further adjustment, and the established rapid evaluation algorithm and model are used to make more accurate cost estimates for the projects to be invested in. The cost of the proposed project is estimated using fuzzy mathematics: the degree of closeness is calculated through the fuzzy relationship, so that the similarity between each completed project and the project to be estimated is quantified. The nearest-selection principle is then adopted, and the completed project with the largest degree of closeness is used to estimate the cost of the proposed project.
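The nearest-selection principle described above can be illustrated with a minimal sketch. The source does not state which closeness measure is used, so the min-max closeness degree below is an assumption, and the project names, membership degrees, and unit costs are hypothetical values chosen only for illustration.

```python
def closeness(a, b):
    """Min-max closeness degree of two fuzzy membership vectors (assumed measure)."""
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    return numerator / denominator if denominator else 1.0


def nearest_project(target_features, completed_projects):
    """Return (name, closeness, unit_cost) of the completed project closest to the target."""
    best = max(completed_projects,
               key=lambda p: closeness(target_features, p["features"]))
    return best["name"], closeness(target_features, best["features"]), best["unit_cost"]


# Hypothetical membership degrees in [0, 1] for three engineering features.
completed = [
    {"name": "A", "features": [0.9, 0.4, 0.7], "unit_cost": 120.0},
    {"name": "B", "features": [0.5, 0.8, 0.3], "unit_cost": 95.0},
    {"name": "C", "features": [0.8, 0.5, 0.6], "unit_cost": 110.0},
]
target = [0.8, 0.5, 0.7]

name, degree, unit_cost = nearest_project(target, completed)
print(name, round(degree, 3), unit_cost)
```

In this sketch the unit cost of the closest completed project is reused directly; a practical estimate would presumably also adjust for the predicted material prices.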
With the development of intelligent technology, the distribution network is undergoing a new round of upgrading and transformation, which will introduce new technologies, new equipment, and new platforms [16]. Distribution network projects are characterized by large investment amounts, multiple nodes, small unit scale, and a long fund recovery period, and they involve many uncertain factors. Throughout project implementation, changes in certain factors often cause changes in funding. These changes have led to a large gap between the budget and the actual project investment, sometimes up to 40%, of which government-related factors may account for about 10%. Such a high deviation wastes a great deal of power grid investment and prevents construction funds from being used rationally [17][18][19][20].
This article aims to analyze the problems and causes of cost control in each stage of the distribution network project and to explore solutions and management measures that, while ensuring the quality of project construction, control costs reasonably and effectively, save funds and resources, and improve corporate investment efficiency. The deviation is to be controlled within 20% (the target is 10%). To this end, this article draws on the massive power grid engineering cost data generated within the jurisdiction during historical project cost data management, technical and economic analysis, and the application of technical and economic results, including the cost data held in infrastructure information systems. These data are managed in a unified management system and mined more deeply to form standard and intelligent analysis methods, to achieve rapid conversion from basic data collection to the application of output results, and to forecast material and equipment prices and estimate project engineering costs quickly and accurately. This provides more scientific and accurate support for project decisions and makes whole-process cost management more accurate.
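As a small worked illustration of the deviation thresholds mentioned above, the sketch below computes the relative deviation between budget and actual investment and compares it with the 20% limit and the 10% target. The formula (absolute difference divided by the budget), the function names, and the sample amounts are assumptions made for illustration; the source does not define how the deviation is calculated.

```python
def budget_deviation(budget, actual):
    """Relative deviation of actual investment from the budget (assumed definition)."""
    return abs(actual - budget) / budget


def classify_deviation(deviation, limit=0.20, target=0.10):
    """Compare a deviation with the 20% control limit and the 10% target."""
    if deviation <= target:
        return "within target"
    if deviation <= limit:
        return "within limit"
    return "exceeds limit"


# Hypothetical budget and settlement amounts (arbitrary currency units).
for budget, actual in [(1000.0, 1050.0), (1000.0, 1150.0), (1000.0, 1400.0)]:
    deviation = budget_deviation(budget, actual)
    print(f"{deviation:.0%}", classify_deviation(deviation))
```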

Analysis on Influencing Factors of Distribution Network Project Settlement
Environmental Impact.
The most unpredictable factors are natural environmental factors, such as weather, natural disasters, and other force majeure events, which can directly interrupt construction and affect the construction progress. A prolonged disaster seriously affects the construction period, and operation and maintenance costs also rise. Natural disasters can severely damage the construction site and the construction property. Sometimes, plots and routes planned in advance must be reselected because the geographical environment has been damaged, and different construction methods must be adopted to continue. All of these can lead to serious budget overruns [21][22][23].

Human Factors.
The technical ability of personnel is the core of project implementation and runs through the entire project cycle. Starting from design consultation, the design plan should fully consider the rationality of the scheme and include an emergency mechanism, so that when a change occurs, construction can continue in the most economical way and with the least difficulty. In the procurement stage, personnel rely on their professional ability to select cost-effective equipment and materials and to control materials accurately, so that later construction is not affected. In the implementation stage, the construction technology and contractor capabilities can be evaluated effectively, and bad behaviors, such as violations, can be found and avoided as early as possible.

Policy Factors.
The unified promotion and construction of the distribution network is a project activity arranged by the State Grid. It is mandatory under certain industry needs, but this mandate only binds units inside the State Grid system, whereas the construction of the distribution network falls within the scope of national infrastructure projects. Construction site selection and the construction period require documents issued by government departments, and the approval period can seriously affect the project implementation period [24]. Moreover, national policies take precedence over the development requirements of the State Grid, and sudden higher-level policies can directly overturn the early-stage work of a project, which is a heavy blow to the project. It is necessary to conduct thorough investigations in the early stage of the project, gather information from all aspects, and reduce the losses caused by policy changes.

Social Factors.
The distribution network is a "last mile" network, so it is necessary to communicate directly with the owner before construction. However, disputes over compensation for land acquisition often occur, such as obstruction by villagers, interference by surrounding enterprises, and government intervention, which hinder the smooth progress of construction. Such social events delay the implementation cycle. Negotiation time should be strictly controlled, reasonable negotiation terms should be adopted, and coordination should be carried out as far as possible.

Management Norm Factors.
The implementation of distribution network projects follows the same process as ordinary projects, and each stage involves investment. Therefore, investment funds must be strictly controlled at each stage. At present, funds are wasted at every stage because of negligence or accidents, and timely remedial measures are lacking. Strict management specifications should be formulated to avoid incomplete design considerations, deviations in project costs, and unreasonable project implementation cycles and to keep equipment and construction costs within the lowest acceptable range.

Analysis of the Effect of the Application of the Distribution Network Planning Data
The project cost of the distribution network runs through the entire life cycle of the distribution network project. Each stage involves substantial costs, and effective cost control methods should be used to strictly control every investment detail. The project cycle includes planning and project approval, design consultation, bidding and procurement, installation and construction, completion acceptance, and project maintenance, and investment changes in each of these stages affect the total investment. The project investment details include installation fees, bank loan interest, equipment purchase fees, and equipment testing fees. For the project cost, the most important element is the real-time price mechanism. Therefore, the distribution network project must formulate a market-based price mechanism under the national macropolicy to inform macroinvestment decision-making and improve the project economy to a certain extent. This paper adopts a cost control method for distribution network engineering that takes price control and the whole life cycle as its main line [25]. The proportion of distribution network investment in total power grid investment has gradually increased, from 10% in the past to 45% now, which shows the importance the State Grid attaches to the distribution network. From the perspective of technological development, the construction of backbone power grids has reached a bottleneck, and breakthroughs may take some time to accumulate in operation. Given its polymorphism and diversity, the distribution network is destined to develop much faster. At the end of the power grid, it integrates and develops with multiple departments to form a platform for the integration of business and distribution data. Some of these efforts involve repeated investment.
The distribution network investment data should be sent to the grid big data platform as sample data, and its impact on each application system should be calculated separately, so that the data can penetrate into each business system and the investment scale of each system can be adjusted to form a virtuous cycle with optimal use of capital. The comprehensive project cost of the cross-business platform is shown in Figure 1. The big data platform stores the investment data of each business department of the power grid. The investment data of each link of the distribution network are input as different sample sets. Data mining algorithms can then be selected, such as support vector machines, Apriori, expectation maximization, and decision tree algorithms. The distribution network data corresponding to each business platform are selected for training, and auxiliary factors are formed to further optimize the engineering investment parameters of each business department. The goal is to maximize the intensive investment of each department.

Data Mining Concepts and Methods
Data mining consists of several steps, such as data cleaning, integration, selection, transformation, mining, model evaluation, and knowledge representation. Through refining, analyzing, and transforming a large amount of data, the key target value is finally obtained. Its value here lies in using data mining to improve the forecasting model.
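The first steps of this pipeline (cleaning, selection, and transformation) can be sketched as follows. This is a minimal illustration under stated assumptions: the field names `line_km` and `cost`, the sample records, and the choice of min-max normalization as the transformation are all hypothetical and are not taken from the source.

```python
def clean(records, fields):
    """Data cleaning: drop records with missing or non-positive values."""
    return [r for r in records
            if all(r.get(f) is not None and r[f] > 0 for f in fields)]


def min_max(values):
    """Transformation: rescale a column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def transform(records, fields):
    """Selection and transformation: keep chosen fields and normalize each column."""
    columns = {f: min_max([r[f] for r in records]) for f in fields}
    return [[columns[f][i] for f in fields] for i in range(len(records))]


# Hypothetical settlement records; the second one has a missing cost.
records = [
    {"line_km": 2.0, "cost": 100.0},
    {"line_km": 4.0, "cost": None},
    {"line_km": 6.0, "cost": 300.0},
    {"line_km": 4.0, "cost": 200.0},
]
fields = ["line_km", "cost"]

cleaned = clean(records, fields)
samples = transform(cleaned, fields)
print(samples)
```

The normalized rows in `samples` are in the form that a clustering or prediction step would typically consume.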

Research on Data Mining Technology.
Knowledge discovery in databases (KDD) is a process in which raw data are turned, through a series of processing steps, into useful information that people can visualize. As the core technology of KDD, data mining (DM) was born in the second half of the twentieth century. Its essence is filtering and mining, that is, screening massive data, removing superficial information, and extracting high-value information. Broadly speaking, data mining is a data processing method that takes big data as its object of analysis and integrates artificial intelligence, statistics, and other theories in order to discover internal rules and explore the internal value of massive, irregular data. In a narrow sense, data mining is a data processing method that arises from its combination with the database. Data mining frequently appears in the Internet of Things and financial fields, and in recent years it has been widely used in the power industry. Data mining integrates ideas and theories from various disciplines, such as statistics and AI, as shown in Figure 2.
Since data mining integrates multiple disciplines, its methods are numerous. Common methods include regression analysis, trend prediction, feature recognition, association analysis, cluster analysis, and anomaly detection, which mine data from different perspectives.
There are many links and factors that affect the cost of distribution network projects, and a categorical statistical method should be selected to analyze and evaluate the factors of each link separately. This article mainly uses the clustering method.

Clustering Method.
Clustering is one of the most commonly used techniques in the field of data mining. The main idea is to use similarity measures to group similar samples together.
There are many clustering algorithms, such as partition-based clustering (K-means), hierarchical clustering (BIRCH), density-based clustering (DBSCAN), and grid-based clustering (STING). The K-means algorithm is simple and fast, its clustering results are satisfactory, and it has therefore been widely used. Cluster analysis is well suited to this type of project evaluation. An adaptive classification method can be used to understand the project cost in real time, and when a large deviation occurs in a certain link, remedial measures can be taken immediately. Cluster analysis is a mature statistical method for analyzing engineering data and has a large family of algorithms. Different subalgorithms have been applied to various engineering projects with good practical results. Clustering algorithms include K-means, factor-based, density-based, and hierarchical analysis. Each algorithm has its particularities, and the appropriate algorithm needs to be selected according to the application conditions. The K-means algorithm is relatively simple and fast, although less accurate in prediction. Because the cost accuracy requirement of a large investment project such as the distribution network is relatively low, K-means is suitable for distribution network project costs. However, the K-means algorithm is sensitive to its random initialization, and it is difficult to determine the number of clusters. Therefore, the K-means algorithm needs to be slightly improved.
This paper proposes a K-means clustering algorithm based on contour coefficients (commonly called silhouette coefficients). When determining the classification of each point, the traditional K-means algorithm has to repeatedly compare the point with the points in each cluster to determine its position, so the calculation efficiency is low. The contour coefficient method determines the position by comparing nodes against the contour coefficient, which greatly improves the speed. The contour coefficient method is an optimization coefficient proposed on the basis of the hierarchical clustering algorithm. It uses similarity to determine the number of clusters and improves the overall efficiency of the algorithm. The contour coefficient quantifies the similarity between an object in the data set and the other objects in its own cluster, as well as the similarity between the object and the objects in other clusters, and combines the two quantified similarities in a certain form to obtain an evaluation criterion for the clustering result. The detailed process of the improved K-means clustering algorithm is as follows.
(1) Initialization: randomly select k objects as the initial cluster centers. Let n be the total number of objects, let the distribution network engineering cost data sample set be $T = (x_1, x_2, \ldots, x_n)$, and let t be the number of iterations, with $t < n$.
(2) Calculation of the contour coefficient: for the i-th object, calculate the average distance from this object to all other objects in its cluster, denoted as $a_i$. For the i-th object and any cluster that does not contain it, calculate the average distance from the object to all objects in that cluster, find the minimum value over all such clusters, and record this value as $b_i$. For the i-th object, the contour coefficient $y_i$ is calculated as follows:

$$y_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

The value of the contour coefficient varies between −1 and 1. When $a_i < b_i$, the contour coefficient of the i-th object is positive; when $a_i > b_i$, it is negative. After averaging over all objects, the best cluster contour coefficient y is obtained, and the number of clusters k is inferred.
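The two steps above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the contour coefficient is computed from $a_i$ and $b_i$ as defined in step (2), and the number of clusters is taken as the candidate k with the highest average coefficient. To keep the example reproducible, a deterministic farthest-point seeding replaces the random initialization of step (1); that substitution, the function names, and the sample points are all assumptions.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (assumption): start at points[0], then add the
    point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain K-means iterations; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster happens to be empty.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


def contour_coefficient(points, labels):
    """Average of y_i = (b_i - a_i) / max(a_i, b_i) over all objects."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Score each candidate k and return the best one with all scores."""
    scores = {k: contour_coefficient(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical normalized cost coefficients forming three groups.
points = [(0.10, 0.12), (0.12, 0.10), (0.11, 0.14),
          (0.50, 0.52), (0.52, 0.50), (0.49, 0.54),
          (0.90, 0.88), (0.88, 0.91), (0.91, 0.90)]
best_k, scores = choose_k(points, range(2, 6))
print(best_k)
```

Because each object is compared with every other object, this direct computation grows quadratically with the number of samples; for the roughly 3000 records mentioned later in the paper that is still manageable.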

Project Settlement Forecast of Distribution Network Based on Data Mining
Following settlement data preparation, classification, index establishment, and data conversion, intelligent algorithms are used to establish a prediction model. The key indicators of the preliminary design of a new project are used as input to obtain the estimate as output, which provides a reference for the preparation and review of the estimate and promotes more reasonable budget estimates. Data mining is applied to the settlement data to estimate and forecast power grid construction projects in the following order: settlement data collection, data classification, data conversion, data processing, intelligent algorithms, model establishment, engineering indicators, budget estimates, and forecasts. A large number of settlement data for power grid construction projects are analyzed to mine the internal laws of settlement changes. With settlement data as the research object, data preprocessing techniques such as statistical analysis, data conversion, and data denoising are combined with forecasting methods such as neural networks, fuzzy mathematics, genetic algorithms, and support vector machines to establish effective forecasting models for the estimates. The effectiveness of the evaluation algorithm is verified with experimental simulation data. The investment data of the entire life cycle of a city power distribution project are selected as the simulation sample, with more than 3000 data points in total.
The simulation is carried out in MATLAB 7.0. The project cycle data are divided into 6 stages for verification, and the squared error is calculated for each stage. The statistical results are shown in Table 1.
It can be seen from Table 1 that the average error of project construction cost = 0.235, the average error of consulting design = 0.052, the average error of bidding and procurement = 0.042, the average error of engineering construction = 0.209, the average error of acceptance and completion = 0.075, and the average error of operation and maintenance = 0.098. The higher error value of the project cost in the construction phase and the operation and maintenance phase indicates that unexpected situations are more frequent in these phases, and the probability of occurrence is the highest in phase 3.
Cost control should therefore be strengthened in this part, and the technical ability of employees should be improved. A variety of alternatives should be provided as emergency measures in the engineering design, and the investment of each option should be listed and analyzed in more detail as the basis for selection during later on-site construction.
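The stage-level comparison above can be reproduced with a short sketch that ranks the six reported average errors and flags the stages that call for tighter cost control. The error values are those quoted from Table 1 in the text; using the mean error as the flagging threshold is an assumption made for illustration, not a rule stated in the source.

```python
# Average errors per stage as reported in the text (Table 1).
stage_errors = {
    "project construction cost": 0.235,
    "consulting design": 0.052,
    "bidding and procurement": 0.042,
    "engineering construction": 0.209,
    "acceptance and completion": 0.075,
    "operation and maintenance": 0.098,
}


def flag_stages(errors, threshold):
    """Return stage names whose error exceeds the threshold, largest first."""
    ranked = sorted(errors.items(), key=lambda item: item[1], reverse=True)
    return [name for name, error in ranked if error > threshold]


# Assumed threshold: the mean of the six stage errors.
mean_error = sum(stage_errors.values()) / len(stage_errors)
flagged = flag_stages(stage_errors, mean_error)
print(round(mean_error, 4), flagged)
```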

Conclusion
This paper proposes a data processing method using data mining and classification calculations. An improved K-means method is used to estimate the coefficient of each link of the project investment. When a large deviation is found in a coefficient, the investment is adjusted immediately to avoid affecting the overall investment. The method saves infrastructure construction costs, carries out deeper data mining, forms a standard and intelligent analysis method, achieves rapid conversion from basic data collection to the application of output results, and supports the rapid and accurate estimation of material and equipment prices and project engineering costs. It provides more scientific and accurate support for project decision-making and makes whole-process cost management more accurate. The life-cycle management of the distribution network project cost is a dynamic process that runs through project planning, design and bidding, construction implementation, completion acceptance, and later maintenance. It is professional, technical, and comprehensive in management. The project management unit should do a good job of precontrol, process management, and postevent closed-loop management. It should strictly control every risk point that may affect the project cost in the project cycle, enhance the awareness of prevention, and take timely countermeasures when the cost changes, so as to keep the project cost within a reasonable range and improve the lean management of the distribution network project cost. This promotes the transformation and upgrading of power distribution network construction to meet the modernization of power enterprises and better serve society and people's livelihood. In the study of budget estimates and forecasts for power grid construction projects, estimating and forecasting are difficult because of the large number of engineering indicators and the complex relationships between them.
Through data mining on the settlement data of power grid construction projects, the original indicators are merged and reduced in dimensionality to obtain key indicators, singular noise data are removed, data are cleaned, intelligent algorithms are used to establish predictive models, and reasonable estimates are obtained. In this way, project investment is controlled within a reasonable range, and the rate by which the final settlement falls below the estimate is also kept within a reasonable range.
Data Availability
The dataset can be accessed upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.