Construction of Unbalanced Engineering Cost Management and Multivariate Statistics System Based on Random Matrix Weight Algorithm

The informatization of engineering cost plays an important role in cost management and is also one of the important subjects of informatization in the engineering field; it is the direction of future project cost management. Information about project cost is growing exponentially, which brings a great challenge to cost informatization. On the basis of the construction quantity and cost composition, analyzed against the present situation of project cost, an overall index of construction quantity and cost is constructed. To achieve reasonable indexes for bridges, tunnels, and roadbeds, and considering projects at different stages and the demands of cost management personnel at different levels, statistical analysis and neural network methods are adopted, respectively, to set up a fast project cost estimation method and two cost prediction models: an intelligent prediction model and an index system forecasting model. The error analysis and checking calculation of the two models make it possible to estimate the engineering quantity and engineering cost correctly in the preliminary design stage. Aiming at the low accuracy of traditional unbalanced cost data classification algorithms, the dimension of random matrix weight is introduced into cost data classification. On the basis of analyzing the spatial and temporal characteristics of WAMS measurement data, a high-dimensional random matrix model of WAMS measurement big data is constructed according to high-dimensional random matrix theory, and a nonequilibrium cost data classification model based on an improved fuzzy rule weighting algorithm is established. The model is applied to the cost data classification of a unit project, which verifies the operability of the model and provides a new idea for other units carrying out similar work.
Finally, other common problems that deserve attention when using multivariate statistical analysis methods are put forward.


Introduction
With the rapid development of the construction industry, people pay more and more attention to the management of construction engineering projects. Engineering cost management is an important part of construction engineering management, and improving its level can promote the better development of construction engineering projects. Modern information technology develops quickly and is applied ever more widely in all aspects of society. Information technology has the characteristics of advancement, accuracy, and humanization; applied to the cost management of construction engineering [1], it can effectively manage the project cost and improve the efficiency of construction engineering cost management. At present, the application of information technology in construction cost management in China is not mature enough, and it needs continuous improvement and development to promote the progress of China's construction industry.
In traditional construction project cost management, the information related to the investment budget and cost budget plays a crucial role in the information management of the whole project cost and facilitates the acquisition and sorting of other information. Taking working-drawing estimate information as an example, full and effective information can be used, together with information technology, to establish a construction engineering cost management database; the data in the database can then be analyzed to study the cost changes that occur during construction and to provide an effective reference for construction project cost management, so that the cost of construction projects can actually be controlled [2,3]. Influenced by economic development, the construction industry is mainly concentrated in regions with rapid economic growth, such as the southeast coastal areas, and the subordinate status of economic development zones is easily ignored. People are more willing to pay attention to the surface state and thus omit the focus on design; however, because the project cost will directly affect the budget [4], the design of the project is often ignored, and cost-related personnel mainly consider cost through engineering planning rather than design [5]. Therefore, attention should be paid to design, and design remains a top priority throughout highway project cost management. Due to the lack of cost control in the design stage, engineering design has not been implemented as a focus. The relevant technical personnel should therefore communicate with the person in charge of the economy, requiring the cost personnel to consider the implementation of the project and provide a more realistic cost plan [6].
In the process of project implementation, the cost personnel and the person in charge of the economy should keep in close contact to ensure that the most accurate solution can be provided in time when problems occur during construction.
In recent years, cost data has shown a trend of large volume, high velocity, and high complexity. How to accurately extract massive cost data and correctly classify unbalanced cost data, so as to achieve effective data classification and reuse, has become a hot topic in both academic and practical circles [7]. Whole-life-cycle cost management of construction projects needs a large amount of supporting data, whether initial estimate and budget data or later budget and final account data, and all of it involves huge unbalanced data processing needs. Many scholars at home and abroad have studied the classification and application of unbalanced data. Generally, the data samples that account for a relatively small share of the data set but have important roles and value, and attract the most attention, are defined as minority samples [8]. For example, starting from the phenomenon of fuzziness, one line of work studies fuzzy modeling of data processing, using the computer to imitate the brain's way of thinking and directly inputting part of human language into the computer program, and applies it to unbalanced data classification [9]. Through a comparative study of the traditional SVM (Support Vector Machine) and SPCC unbalanced data classification methods, it has been shown that the data identification performance of these methods is poor: they easily increase the probability of misclassifying minority samples, and accurate classification results are difficult to obtain [10]. To solve these problems, the w-FSSPC unbalanced data classification method was proposed, which introduces a class weight factor and fuzzy rule membership degrees to effectively reduce the influence of outlier points on classification performance [11] and improve the classification accuracy of unbalanced data to a certain extent.
Through the introduction of multidimensional fuzzy rules, methods of fuzzy recognition and representation in data classification have been studied [12], with an empirical analysis of the placement sequencing of dam concrete construction. An Ada-SVM-OBMS algorithm (an unbalanced data classification algorithm based on oversampling) was proposed to synthesize new samples from misclassified samples; by using misclassified samples to guide the generation of new samples, the misclassified samples become easier to identify [13], which effectively improves the prediction performance of minority classes, as verified by data mining on call cost data from the telecom industry.
Random matrix theory originated in the development of quantum physics, where the spectral analysis of random matrix limits attracted the attention of many mathematicians and statisticians worldwide. Random matrices were first combined with quantum physics [14] and later introduced into multivariate mathematical statistics, and over the past few decades random matrix theory has been studied extensively and in depth. In recent years, the study of random matrices has concentrated mainly on second-order limit theory, including the extreme value distributions of eigenvalues [15]. Compared with traditional mathematical tools, random matrix theory performs well in big data matrix reconstruction [16] and in anomaly target detection and location [17].
To summarize, project cost management is the use of economic science and operational methods, based on the objective laws of development, to deal with specific problems of engineering construction practice, such as economy, technology, and management, making the best use of people and money and improving the relevant benefits [18].
This also reflects that the specific object of project cost management is project construction, a specific activity involving technical economy, operation, and management. In analyzing the present situation of project cost, this paper constructs an overall index of project quantity and cost based on the composition of project quantity and cost. To achieve reasonable indexes for bridges, tunnels, and roadbeds, and considering the cost management requirements of different stages and different levels of personnel, statistical analysis and neural network methods were used, respectively, to set up a highway engineering cost estimation method and cost prediction models: an intelligent prediction model and an index system forecasting model. Based on the error analysis and checking calculation of the two models, it is possible to estimate the engineering quantity and engineering cost correctly in the preliminary design stage.

Big Data Modeling of Unbalanced Project Cost Based on High-Dimensional Random Matrix

The data imbalance of the cost industry mainly takes two forms: imbalance between sample classes and imbalance of elements within sample classes.
(1) Imbalance among sample classes, that is, the number of samples of different classes in the data set varies greatly. In this case, traditional classification algorithms that aim at overall classification accuracy pay more attention to the majority samples in order to obtain the highest accuracy, which reduces the classification performance on minority samples. Suppose there is 1 : 99 imbalanced data and the classifier misclassifies the minority category, which accounts for 1%, into the majority category; the overall classification accuracy is then 99%, while the misclassification rate of the significant minority samples is 100%. For example, the fire resistance grade of high-rise buildings is divided into grade I and grade II, and raising the fire resistance grade greatly increases the total cost of the project. If minority samples in the historical data of the cost industry are misclassified, the cost guidance for similar new projects in the bidding process will be seriously affected, resulting in economic losses. The technical route of unbalanced engineering cost analysis based on random matrices is shown in Figure 1.
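The accuracy paradox described above can be reproduced in a few lines. This is an illustrative sketch with made-up labels (the 1 : 99 split from the text), not the paper's data:

```python
import numpy as np

def class_metrics(y_true, y_pred, minority_label=1):
    """Overall accuracy and minority-class error rate for a labelled set."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    minority = y_true == minority_label
    minority_error = float(np.mean(y_pred[minority] != minority_label))
    return accuracy, minority_error

# Hypothetical 1:99 data set: label 1 is the significant minority class
# (e.g. grade-I fire-resistance records); the classifier predicts the
# majority class 0 for every sample.
y_true = np.array([1] + [0] * 99)
y_pred = np.zeros(100, dtype=int)
accuracy, minority_error = class_metrics(y_true, y_pred)
```

A 99% overall accuracy coexists with a 100% error rate on exactly the samples that matter, which is why overall accuracy alone is a misleading objective for unbalanced cost data.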
(2) Imbalance of constituent elements within the same sample category, that is, other forms of imbalance generated within the same sample, in addition to the imbalance between sample categories. Relevant studies show that not only does the degree of intersample imbalance reduce classification accuracy, but the imbalance of elements within the same sample is also an important factor in the deterioration of classification performance. Taking a rail transit project data set as an example, unforeseen risk factors are a minority class with a large impact on cost relative to the full cost data; among these risk factors, elements such as the shield machine encountering underground pipe piles, flowing sand, backfill soil, and confined water substantially increase the total project cost and constitute unbalanced data. If the unbalanced elements of similar samples are not classified accurately, the classification of the whole data set will be affected, as shown in Figure 2, where minority class samples and majority class samples are shown in blue and yellow, respectively.
There are multiple different subsets and overlapping sample elements in Figure 2, and there is a close relationship between the separated items in the data set and the within-class imbalance. The majority classes produce small disjuncts, which increases the learning difficulty of the classifier; moreover, because minority samples are few, the classifier cannot effectively distinguish subsets of minority samples from minority-class noise points.
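A common data-level remedy for such imbalance is to oversample the minority class before training. The sketch below is a generic random-oversampling illustration under assumed toy data, not the paper's own preprocessing step; a production approach would sample per risk-factor subset to respect the within-class imbalance described above:

```python
import random

def random_oversample(samples, labels, minority_label, seed=0):
    """Duplicate minority-class records until both classes are the same size.

    A minimal data-level rebalancing sketch: samples and labels are
    parallel sequences; records with `minority_label` are duplicated at
    random until their count matches the majority class.
    """
    rng = random.Random(seed)
    minority = [(s, l) for s, l in zip(samples, labels) if l == minority_label]
    majority = [(s, l) for s, l in zip(samples, labels) if l != minority_label]
    # Draw extra copies of minority records to close the gap.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    merged = majority + minority + extra
    rng.shuffle(merged)
    new_samples, new_labels = zip(*merged)
    return list(new_samples), list(new_labels)

balanced_x, balanced_y = random_oversample(
    samples=list(range(10)),
    labels=[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    minority_label=1,
)
```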

Improved Random Matrix Fuzzy Rule Weight Classification Algorithm.
The traditional weight calculation method cannot meet the demands of data processing in the cost industry: project cost data is abundant, and the conditional attributes (rules) obtained from analyzing a large amount of historical data cannot fully reflect the index attributes of the sample data. Cost engineers in the industry usually have prior knowledge of the classification of training samples. Through the explicit expression of expert tacit knowledge, a new improved comprehensive weight calculation method is adopted to reduce the influence of the decision-making environment on the output of rule decisions. The idea of the algorithm model is to calculate the weight value by integrating, through a weight function, the matching degree of conditional attributes obtained from historical data sets with the importance of attributes obtained from experts' prior knowledge.
The main calculation steps of the comprehensive weight algorithm are as follows: (1) Calculate the weighted condition attribute matching degree. A weighted index is introduced to control the compatibility degree of different samples between fuzzy classes, and the weighted condition attribute matching degree is calculated on this basis. (2) Calculate the average weighted condition attribute matching degree, taking the mean of the matching degrees of the training samples on rule j (condition attribute), which prevents the classification accuracy from being reduced by different degrees of dispersion. (3) Find the maximum value of the weighted average conditional attribute matching degree of class h.
(4) Determine the weight of objective rule p. In the engineering cost system, different important line nodes continuously transmit different characteristic quantities carrying GPS-unified time stamps to the dispatching center. These characteristic quantities come from PMUs at different locations and thus have spatial characteristics; in time, they are data sampled in sequence, that is, time series data. See Table 1.
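Steps (1)-(4) can be sketched as follows. Since the original formulas are not reproduced in the text, the weighted geometric mean used for the matching degree and the final normalization are assumptions made for illustration only:

```python
import numpy as np

def rule_weights(memberships, labels, attr_weights):
    """Sketch of the four comprehensive-weight steps for one fuzzy rule set.

    memberships  : (n_samples, n_attrs) membership degree of each training
                   sample in each condition attribute (from historical data).
    labels       : class label of each training sample.
    attr_weights : expert-assigned importance of each condition attribute.
    Returns the classes and one normalised weight per class.
    """
    M = np.asarray(memberships, float)
    w = np.asarray(attr_weights, float)
    # Step 1: weighted condition-attribute matching degree per sample,
    # here assumed to be a weighted geometric mean of the memberships.
    match = np.prod(M ** w, axis=1)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Step 2: average matching degree of the training samples of each class.
    avg = np.array([match[labels == c].mean() for c in classes])
    # Steps 3-4: normalise by the maximum so the best-matching class
    # receives weight 1 and the others proportionally less.
    return classes, avg / avg.max()

classes, weights = rule_weights(
    memberships=[[0.9, 0.8], [0.8, 0.9], [0.2, 0.3], [0.3, 0.2]],
    labels=[0, 0, 1, 1],
    attr_weights=[1.0, 1.0],
)
```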
The flowchart of the unbalanced cost model based on random matrix weights is shown in Figure 3. The calculation steps of the prediction model are as follows: (2) List fuzzy subsets: the similarity between typical projects and the predicted project is calculated, and the typical projects are listed according to their similarity degree; the training samples are typical projects with high membership degree.
(3) Use the selected samples independently to train the network; the estimation needs to reach reasonable accuracy.
(4) The accuracy is determined by examining the prediction model with the test samples. The model can be used only when the error, that is, the difference between the forecast and the actual value, is less than 10%.
(5) If the accuracy is not well controlled, the number of samples is increased and the network is trained again. If the required accuracy still cannot be met, it is necessary to reexamine all samples in detail, identify the sample data that needs to be adjusted or deleted, and then reorganize the data. The fitted functional relations between different spans of rigid frame box girder bridges and the quantities of main projects, such as concrete, steel wire, ordinary steel bar, and fine rolled threaded bar, are summarized in Table 2. By putting these functional relations into the prediction model program, the main project quantities of a rigid frame box girder bridge can be obtained by inputting the span.

Each department of railway administration is connected with the multieconomy development system of the Railway Ministry through dial-up access, and data transmission is completed through a special communication program. The multichannel server uses an SQL Server 7.0 database; the server applications and communication programs, and the client applications and communication programs, are installed at the same time. The application software of the statistical system includes the client application of the multiservice statistical system, the special data transmission programs of the client and server, and the server application of the multiservice statistical system. The client program can complete data entry, historical data query, form printing, and other functions, and generates 6 kinds of statistical reports. The server application realizes the functions of establishing the database, collecting files, data statistics, report output, historical data query, and printing, and generates 8 kinds of statistical analysis reports. The dedicated data transmission software ensures the correct transmission of data, with a resumable-transfer ("breakpoint resume") function. See Figure 4.
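The span-to-quantity fitting behind Table 2 and the 10% acceptance check of step (4) can be sketched as below. The span/quantity pairs are hypothetical placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical (span, concrete quantity) pairs read off design drawings;
# the actual fitted relations for concrete, steel wire, etc. are the
# ones summarized in Table 2.
spans = np.array([30.0, 40.0, 50.0, 60.0])       # span in metres
concrete = np.array([310.0, 520.0, 780.0, 1090.0])  # e.g. m^3 of concrete

def fit_quantity(spans, quantities, degree=2):
    """Fit quantity = f(span) as a polynomial and return a predictor."""
    return np.poly1d(np.polyfit(spans, quantities, degree))

def acceptable(predicted, actual, tolerance=0.10):
    """Step (4): accept the model only if the relative error is < 10%."""
    return abs(predicted - actual) / actual < tolerance

predict = fit_quantity(spans, concrete)
```

With the fitted predictor, the main project quantity for a new span is a single function call, and the acceptance check decides whether the model may be used or the sample set must be enlarged.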
Among multivariate statistical analysis methods, some require correlation in the original variables, while others do not. For example, in Q-type hierarchical cluster analysis, when Euclidean distance, Minkowski distance, or Lance distance is selected, the original variables are required to be uncorrelated; only after the correlation in the raw data has been removed can these distances be used. If the original variables are correlated, the Mahalanobis distance is appropriate. By contrast, principal component analysis and factor analysis require a certain degree of correlation in the original variables; otherwise they cannot be meaningfully applied. Canonical correlation analysis requires correlation in the original variables, but not a high degree of multicollinearity; otherwise it cannot be analyzed.
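The point about correlated variables can be illustrated numerically: for two strongly correlated indicators, the Mahalanobis distance discounts the shared variation that the Euclidean distance counts in full. The covariance matrix below is an assumed example:

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance between points x and y under covariance cov."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

# Assumed covariance of two strongly correlated cost indicators.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
x, y = [0.0, 0.0], [1.0, 1.0]
euclid = float(np.linalg.norm(np.subtract(x, y)))  # treats axes as independent
maha = mahalanobis(x, y, cov)                      # discounts the correlation
```

Moving along the direction in which the two indicators vary together is "cheap" under the Mahalanobis metric, so the Mahalanobis distance comes out smaller than the Euclidean one here.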
In addition, nonlinear relationships between the original variables need attention. For example, when principal component analysis, factor analysis, and canonical correlation analysis are computed from a correlation matrix, that matrix is actually Pearson's product-moment correlation, which reflects only the linear correlation between variables. If the relationship between variables is not linear but a nonlinear correlation, then Pearson's product-moment correlation matrix can no longer accurately express the relationship between the original variables, and the analysis and its conclusions lose their significance.
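A minimal numerical illustration of this limitation, using an assumed deterministic relation y = x²:

```python
import numpy as np

# A deterministic but purely nonlinear relation y = x**2, sampled on a
# symmetric grid: Pearson's product-moment correlation is ~0, so an
# analysis built on the correlation matrix would report "no relationship"
# even though y is completely determined by x.
x = np.linspace(-1.0, 1.0, 101)
y = x ** 2
pearson_r = float(np.corrcoef(x, y)[0, 1])
```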

Experimental Design
In this paper, the project cost data set from the actual bidding documents of a unit project with large variation is selected for algorithm verification. There were 46,958 data records in the initial data set, of which 12.88% were risk points. There are 192 fields in total, including 34 basic attribute fields, as shown in Table 3. The project is carried out in a complex and changeable environment with design changes, accelerated progress, improved standards, changing construction conditions, material price changes, and so on. Although these are important factors that affect the construction period and cost, there will still be some falsely low-priced bidding, in which the construction unit adds work, raises prices, and changes the bidding documents through design changes to make profits; such methods should not be adopted merely to build an enterprise's image, since they are likely to end in deficit. These unfair competition practices bring great risks to project investment control.
The case approach is based on the bidding documents; the project list is the budget data that has been organized into the project database. Data mining technology is applied to the bidding documents of new projects, according to the content and classification of the data, to realize risk analysis for the bids of new project units. From all the attributes of the original data set, the 23 and 35 attributes that are important for classification are selected to form the new data sets Proj1 and Proj2.
The new data sets are classified using the improved fuzzy rule weight algorithm, as shown in Table 4.

Analysis of Experimental Results
To evaluate the performance of the proposed method in classifying unbalanced data, and to measure its risk identification performance on engineering cost data, the original data set was classified using Chi, CVM, AdaBoost-SVM, and the proposed method. Then, the effectiveness of the online monitoring method based on local data information increments was verified. Based on the partitioned modes, the results are shown in Figures 5 and 6.
From the point of view of the project life cycle, using cost management theory for different stages and combining it with the stability structure principle of the cobweb model, a cobweb model for performance evaluation of project cost management is put forward. In cobweb evaluation, the fuzzy evaluation values of the indicators at each index layer are plotted on the cobweb model, forming a cobweb evaluation map for each stage. Taking the cobweb evaluation map of a certain stage of engineering cost management as an example, as shown in Figure 6, a model similar in shape to a cobweb is constructed. The lines extending from the center represent the values of the five index layers in different stages of cost management, and each of the five lines is divided into five parts, representing, from the center to the periphery, different levels of engineering cost management performance.
The five levels are distributed from inside to outside, and the area contained by each level vividly shows the result: the larger the area, the higher the level. The main engineering quantities of box girders with different spans are obtained in the prediction model, which calculates them from the functional relation between engineering quantity and span. According to the collected design drawings of whole rigid frame bridges, the main engineering quantities of the bridge superstructure, such as concrete, steel strand, ordinary steel bar, and fine rolled rebar, are calculated from the design drawings of each box girder section with different spans. On the basis of these quantities, the relations between span and the quantities of concrete, steel strand, and ordinary steel bar are analyzed with emphasis. See Figure 7.
The experiment shows that the classification accuracy of the proposed algorithm is higher than that of the other algorithms, indicating that the method has good classification performance on the engineering cost data set and achieves higher accuracy on the unbalanced cost data. The F-value obtained in the experiment is also higher than that of the other methods, demonstrating the proposed method's ability to identify risk factors in the engineering cost data set. In the bidding stage, the cost data set of each bidding unit is classified and tested, and the matching degree between the bidding unit and the preset risk rule base is obtained from the matching of the risk rules: the higher the matching value, the higher the risk, and vice versa. This provides a new way to judge the risk identification ability of each bidder in the bid evaluation stage and enhances risk early warning and analysis in the bidding stage, as shown in Table 5.
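The F-value referred to above is the standard F-measure computed from minority-class confusion counts. A minimal sketch follows; the counts in the usage example are illustrative, not the experimental results:

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-measure from minority-class confusion counts.

    tp/fp/fn are the true-positive, false-positive, and false-negative
    counts for the minority (risk) class; beta = 1 gives the usual F1.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Illustrative counts only: 8 risk points found, 2 false alarms, 2 missed.
example_f1 = f_score(tp=8, fp=2, fn=2)
```

Unlike overall accuracy, the F-measure is driven entirely by minority-class performance, which is why it is the appropriate comparison metric for unbalanced cost data.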
When the system runs normally and trouble-free, the inner ring area is almost constant, with only a small number of points distributed on the edge of the inner ring. Calculation and analysis show that error points account for about 2.5% of the total; the main cause is systematic error factors, which do not interfere with the anomaly detection judgment. The simulation results are shown in Figure 8.
According to the above analysis, the abnormal data generated by different disturbances of the system can be detected by spectral analysis in the unbalanced engineering cost and multivariate system analysis after high-dimensional random matrix modeling, and its change rule can be observed. The results of other scenarios are analyzed in Figure 9. It can be concluded that the proportion of unbalanced engineering abnormal data detected by this method changes significantly between stable operation and failure of the system: as the severity of system faults increases, the proportion of abnormal data increases and the order of the data distribution decreases.
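A common form of the spectral test mentioned above compares the largest covariance eigenvalue of a standardized measurement window with the Marchenko-Pastur upper edge; eigenvalues beyond the edge indicate a disturbance. This is a generic sketch, not necessarily the exact random-matrix statistic used in the paper:

```python
import numpy as np

def mp_upper_edge(n, t):
    """Marchenko-Pastur upper edge for the sample covariance spectrum of
    an n x t standardized data window (n signals, t samples)."""
    return (1.0 + np.sqrt(n / t)) ** 2

def spectral_anomaly(window):
    """Flag a measurement window as anomalous when the largest eigenvalue
    of its sample covariance exceeds the Marchenko-Pastur upper edge."""
    X = np.asarray(window, float)
    # Standardize each row (signal) to zero mean and unit variance.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    n, t = X.shape
    lam_max = float(np.linalg.eigvalsh(X @ X.T / t).max())
    return lam_max, lam_max > mp_upper_edge(n, t)

rng = np.random.default_rng(0)
noise = rng.standard_normal((20, 400))          # 20 signals, 400 samples
lam_noise, _ = spectral_anomaly(noise)          # spectrum stays near the edge
disturbance = np.sin(np.linspace(0.0, 8 * np.pi, 400))
lam_spiked, flagged = spectral_anomaly(noise + 3.0 * disturbance)
```

A common disturbance shared by all measurement channels pushes the top eigenvalue far beyond the Marchenko-Pastur bound, while pure noise stays near it, which matches the qualitative behaviour described for stable operation versus fault.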

Conclusion
Aiming at the problem of unbalanced cost data classification, this paper analyzes and compares the shortcomings and advantages of various traditional algorithms, absorbs their strong points, and optimizes the classification calculation method for unbalanced cost data from the perspective of experts' prior knowledge. At the data level, the unbalanced cost data set is preprocessed to make the original data relatively balanced, and the calculation method of rule weights is improved; the fuzzy rules and the inference model together form the classification algorithm. Performance testing on an example validates that the method improves the classification accuracy of unbalanced data to a certain extent. The fuzzy comprehensive evaluation method is combined with the cobweb model to obtain the performance grade of the evaluated object, which makes the model more intuitive and operable in practical application. At the same time, the index system and model for project cost management performance evaluation are applied in practice. Through analysis and verification by the comprehensive evaluation method, it is concluded that the cost management performance is at a good level and the project is rated as an excellent project, which confirms the conclusions of the paper.

Data Availability
The data used to support the findings of this study can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest or personal relationships that could have appeared to influence the work reported in this paper.