The construction industry is the largest data industry, yet it has the lowest degree of digitization. With the development and maturity of BIM information integration technology, this backward situation is expected to change: business data from the design phase through the construction phase and the operation and maintenance phase can be collected and integrated to add value to the data. Because BIM integrates massive, repetitive, and unordered text data, we first use the integrated BIM data as a basis to perform data cleansing and text segmentation, turning the integrated data into “clean and orderly” data of value. Then, with the aid of word cloud visualization and cluster analysis, the associations between data structures are uncovered, and the integrated unstructured data are converted into structured data. Finally, an RNN-LSTM network is used to predict quality problems of steel bars, formworks, concrete, cast-in-place structures, and masonry in a construction project and to pinpoint where quality problems occur during project implementation. Verification on an example project shows that the proposed algorithm can effectively reduce the incidence of construction project quality problems and is suitable for wider application. The work is of practical significance for improving the quality management of construction projects and provides new ideas and methods for future research on construction project quality problems.
As large-scale, group-oriented, and complex construction projects, especially large-scale cluster projects, are built, traditional project management theories, methods, and models can no longer fully meet the needs of actual management. Engineering quality management is one of the key topics of research in China and abroad, because it relates not only to the project itself but also to the safety of people’s lives and property. At the same time, although the level of construction project quality management in China is continuously improving, quality problems are also on the increase.
The 21st century is an era of data explosion. All walks of life are flooded with massive amounts of data, which contain enormous commercial value. Therefore, big data has become the focus of attention. Since the reform and opening up, China’s construction industry has rapidly grown in size, promoted economic and social development, and continued to expand the market capacity of the industry [
Research results on quality issues of engineering projects in China and abroad are of high reference value for resolving project quality problems. However, they also have shortcomings and are insufficient to support the quality management requirements of modern engineering projects, so further research is needed. At present, quality issues are mostly solved by remediation after the fact, when quality problems have already caused losses to the construction and use of the project and may even have led to quality and safety accidents that threaten people’s lives and property. This is inconsistent with the quality management concept of “prevention first.” Therefore, a new perspective is needed to find practical methods for solving quality issues. In modern engineering project management, the quality of construction products should be ensured through management in advance, implementing the philosophy of “prevention first.” This paper takes the quality of construction projects as the research object, uses the big data generated by the construction engineering process as the data source of enterprise-level BIM, extracts the historical data of many project cases for data mining, explores the root causes of quality problems, and discusses the loopholes in current project quality management in China so as to optimize the engineering quality management system. The research results have practical significance for improving the quality management level of construction projects and provide new ideas and methods for future research on construction project quality issues.
This paper takes the big data generated by the construction engineering process as the data source of enterprise-level BIM, extracts the historical data of many project cases and conducts data mining, discusses the loopholes in the construction quality management of Chinese construction companies at this stage, optimizes the project quality management system, and provides a certain reference for the further development of China’s construction industry.
The construction industry is the industry with the largest scale and the largest amount of data. A construction project has a relatively long life cycle, generally divided into the design phase, construction preparation phase, construction phase, completion phase, and operation and maintenance phase. Each phase generates a large amount of data, such as the many engineering drawings of the design and construction phases and information on raw materials, basic components, cost, quality, security, and materials. The entire project therefore produces a large amount of data from the start of construction to completion. These data can be divided into structured and unstructured types and are stored in the form of digital statements and text files [
Big data has long been applied in physics, biology, environmental ecology, the military, finance, and the communication industry. However, the construction industry, despite its huge amounts of data, still does not have its own enterprise-level and project-level databases. Construction projects have been isolated from the Internet and big data, so the industry is weak in management, innovation, transformation, and updating. Traditional construction quality management theories, methods, and thinking patterns can no longer meet the requirements of the big data era. To solve quality problems, we must break out of the traditional mindset and look for new theories and methods suited to the new environment, technologies, and situations, so as to guide construction project quality management activities and resolve construction project quality problems. Big data provides new ideas for optimizing construction project quality management. In the era of big data, we can analyze more data, extract project quality data from multiple dimensions, and explore the root causes of quality issues with data mining methods. Moreover, it enables us to predict the key points of quality management in the implementation of new projects and to overcome the drawback of traditional quality management, which relies on postevent inspections to control product quality, so as to prevent quality problems from occurring and to implement the quality management concept of “prevention first.”
Considering that these building data are scattered in the localized management in China, the classification, preservation, query, and update of data are very difficult, and a systematic data management and utilization system is not formed. A large amount of data still needs a data manager to handle manually every day, while the user of the data—the project manager or technician—only depends on paper materials or personal network communication to transmit the project information [
Building information modeling (BIM) is based on the various related data generated in the implementation of a construction project. It constructs a database from the data collected over the entire life cycle of a building project and breaks down the single-line links between the participants of the project. The model changes the passive situation in which traditional projects rely on paper materials or personal network communication to deliver project information, enabling participants to follow the progress of the project in real time and to use Internet technology to search for the latest, most accurate, and most complete project data and information. It reduces quality problems caused by low collaboration efficiency and is an important way to realize refined, information-based management in the construction industry.
The birth and development of BIM break the single-line contact model between participants of the project. The era of relying on paper materials or personalized network communication to deliver project information is gone forever, enabling participants to understand the progress and profile of the project in a timely and comprehensive manner which reduces many unnecessary quality issues. The emergence of the BIM data integration platform has brought great advantages to data continuity and consistency. Project management in all stages of the life cycle is based on the 3D solid model [
Changes in the mode of information exchange.
Big data is a collection of data that cannot be captured, managed, and processed using conventional software tools within a certain time frame. It is a vast amount of data that requires a new processing model to have greater decision-making power, insight, and process optimization capabilities. Big data is a diversified information asset with a high growth rate. The core of big data technology lies in the specialized processing of data, which searches for data information and adds value to data by increasing the “processing ability” of data. BIM has a powerful back-end storage system, including data layer, model layer, and information application layer, which creates an efficient platform for information integration. Based on the information data of the construction project, it defines basic data such as collection attributes, physical structure attributes, and functional attributes of the components and builds a 3D building information model based on these data. BIM can realize dynamic, integrated, and visual information management. Model objects are related to attribute information and report data [
The data source of the BIM database.
Data is the support of management. The foundation of engineering project management is the management of engineering data. BIM technology accelerates the informationization of the construction industry. BIM technology can record all the data of the entire project lifecycle and create a project database. The accumulation of multiproject data based on BIM will form an enterprise database that internally stores massive amounts of data. Therefore, BIM can be regarded as the carrier and foundation of the construction industry database. It can be called the “source code of construction industry big data” [
BIM technology accelerates the informatization of the construction industry. BIM, which is known as the “source code of construction industry big data,” can record all historical data of the entire project lifecycle. The cycle-based accumulation of multiitem data based on BIM will form a huge database and store a large amount of data internally. Therefore, BIM can be regarded as a stable and reliable database. Based on the enterprise-level BIM database, this paper abstracts the multidimensional data from multiple cases and extracts the quality information data. Through data mining methods, it explores the root causes of many quality problems during project development and focuses on the key points in the project quality management process when the new project is launched and strengthens the ability to control the project management, which is important for the fundamental improvement of project quality management.
On the basis of using BIM to generate, extract, and mine quality management data and to convert BIM model data into BIM big data, we should pay attention to the value of big data and become its owner and beneficiary. Text mining of construction engineering quality management data is a branch of data mining based on knowledge discovery in text documents. Mining hidden, valuable, and previously unknown information from a large-scale text collection requires preprocessing the unstructured data in the collection, for example through text cleaning, text segmentation, text clustering, and semantic network analysis [
Analyze a large amount of unstructured text data, formulate data cleaning rules, and perform text segmentation to obtain a standardized data source.
Extract the links between keywords by text clustering, semantic network analysis, and similar methods to achieve dimensionality reduction and text representation, and visualize the data with the aid of tools.
According to the text content, further analyze the visual data to obtain useful knowledge information in the text.
The process of text mining.
In data warehouses, unstructured data are described in natural language and in various forms. Because of recording habits, recording errors, and incomplete records, problems such as incomplete, incorrect, and inconsistent data are inevitable. Through the cleaning process, duplicate data, missing values, entry errors, and meaningless values are processed, and abnormal data are detected and adjusted as early as possible to provide standardized, high-quality data for the subsequent mining work. The following measures are mainly adopted to achieve data normalization.
In natural language, different expressions of the same object form different texts, and the segmentation process then produces different nodes. Such duplicate nodes degrade the text mining results. To avoid them, the original text is first presegmented and the different representations of the same object are screened out; then merger rules are formulated and the duplicate expressions are merged. By deduplicating the data (see Table
Mapping of synonyms for architectural engineering.
The group of words with the same meaning | Normalization |
---|---|
Mutual anchoring length; anchorage length; bending anchor length; elbow length; hook length | Anchorage length |
Concrete; concrete; commercial concrete | Concrete |
Standard curing chamber; concrete curing room | Standard curing chamber |
Aseismic reinforcement; rebar with E | Aseismic reinforcement |
— | — |
Design alteration; altered design; change | Design alteration |
Treatment of general quality diseases; requirements for prevention and treatment of general quality diseases; prevention and control measures for general quality diseases of residential engineering | Prevention and control measures of general quality diseases |
Levelness; planeness | Planeness |
Out of plumb; unstraight; not square | Out of plumb |
Acceptance specification; acceptance specification of quality; acceptance specification of construction quality | Acceptance specification of construction quality |
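The merger rules in the synonym table above can be pictured as a simple lookup from each variant to its normalized term. The following is a minimal Python sketch under that assumption; the names `SYNONYM_GROUPS`, `build_lookup`, and `normalize_terms` are hypothetical, only a few rows of the table are included, and the paper itself does not describe its implementation.

```python
# Hypothetical synonym groups taken from a few rows of the table above:
# normalized term -> list of variants that should be merged into it.
SYNONYM_GROUPS = {
    "anchorage length": ["mutual anchoring length", "anchorage length",
                         "bending anchor length", "elbow length", "hook length"],
    "concrete": ["concrete", "commercial concrete"],
    "standard curing chamber": ["standard curing chamber", "concrete curing room"],
    "design alteration": ["design alteration", "altered design"],
}

def build_lookup(groups):
    """Invert {normalized: [variants]} into {variant: normalized}."""
    lookup = {}
    for normalized, variants in groups.items():
        for variant in variants:
            lookup[variant.lower()] = normalized
    return lookup

def normalize_terms(terms, lookup):
    """Replace each term by its normalized form; unknown terms pass through lowercased."""
    return [lookup.get(term.lower(), term.lower()) for term in terms]

LOOKUP = build_lookup(SYNONYM_GROUPS)
normalized = normalize_terms(["Hook length", "Commercial concrete", "rebar"], LOOKUP)
print(normalized)  # ['anchorage length', 'concrete', 'rebar']
```

Keeping the rules as data rather than code means new synonym groups can be added without touching the normalization logic.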
At present, automatic data acquisition is limited and most data are entered manually, so input errors may occur. By comparing the existing input data with the contents of a standard query table and adding manual semantic identification, erroneous information is corrected. Afterwards, the standard query table established from the initial requirements can be consulted during entry, and data can be selected directly from the table to reduce input errors (see Table
Error correction records.
Original record | Error correction |
---|---|
The cement should be fed on demand; pay attention to the waterproof; the good processing has just covered to prevent the corrosion. | The cement should be fed on demand; pay attention to the waterproof; the processed steel is covered to prevent the corrosion. |
The length of the connecting beam of the steel bar anchored to the bottom of the two floors of the number 2 floor is insufficient. | The length of the steel bar anchored to the connecting beam of the two stories suspended the beam of the number 2 building. |
Dismantling the floor of the floor is not in place. | The protection of the dismantling floor slab is not in place. |
Stirrup | Horse stool |
Silk mouth | Thread |
Number 18 windowsill beam is not normally set. | Number 18 window beam Weitong long set |
25-layer ALC cutting quality did not meet the requirements. | 25-layer ALC block quality did not meet the requirements. |
For partially missing data, records are inferred and supplemented based on the contextual semantic environment (see Table
Missing value processing records.
Original record | Processing method | Processing results |
---|---|---|
Sand filling, ash adding | Complete records | Sand aerated block, ash adding block |
GZ1 | Complete records | Steel column 1 |
Short sleeve | Filter | Semantic ambiguity and inability to divide subdivisional projects |
ApproachΦ8, Φ10 bale bar | Filter | The semantics is not clear, and the quality problem cannot be judged |
The two commonly used families of text segmentation methods (word segmentation based on dictionary matching and word segmentation based on statistics, that is, probabilistic segmentation) both rely on a large accumulated corpus and existing dictionaries. Chinese expressions are organized from content words and function words according to certain grammatical rules. There is no explicit delimiter between words in Chinese text; in addition, different combinations of characters form different words with different meanings. For a large amount of text data, text segmentation is therefore needed to parse text semantics quickly. ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) 3.0, developed by the Institute of Computing Technology, Chinese Academy of Sciences, is regarded as one of the best Chinese lexical analyzers. The R language supports word processing algorithms based on word segmentation, and this approach is much more accurate and efficient than other word segmentation algorithms.
First, we segmented the collected texts describing quality issues of the main structure of construction projects. At the same time, phrases that merely annotate the text and have no significant effect on its content, such as “existence,” “partiality,” “one layer,” “parts,” “individual,” “serious,” and “phenomenon,” were filtered out. The phrases with a frequency of 50 or more (see Table
The text keyword table of the quality problem of the main body structure.
Words | Freq | Words | Freq | Words | Freq | Words | Freq |
---|---|---|---|---|---|---|---|
Rebar | 711 | Stirrup | 139 | Frame column | 84 | Honeycomb | 73 |
Concrete | 361 | Anchorage | 120 | Drag hook | 83 | Tie bar | 71 |
Construction | 244 | Shear wall | 112 | Design requirement | 83 | Wall column | 700 |
Wall | 226 | Spacing | 107 | Set up | 82 | Exterior wall | 69 |
Pouring | 208 | Roof | 98 | Design requirement | 80 | Drawing | 68 |
Lashing | 197 | Welding | 97 | Electroslag pressure welding | 77 | Block | 68 |
Tectonic column | 190 | Main reinforcement | 96 | Bonded rebars | 76 | Dismantle | 68 |
Structural column | 188 | Install | 95 | Joint | 75 | Mortar | 64 |
Deficiency | 166 | Stairs | 91 | Postpouring belt | 73 | Lack of length | 64 |
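The filtering and frequency threshold that produce the keyword table above could be sketched as follows. This is an illustrative Python sketch, not the R and ROSTCM workflow used in the paper; `STOP_PHRASES` holds the filler phrases quoted in the text, while `keyword_table` and the toy `records` are hypothetical.

```python
from collections import Counter

# Filler phrases quoted in the text that carry no quality information.
STOP_PHRASES = {"existence", "partiality", "one layer", "parts",
                "individual", "serious", "phenomenon"}

def keyword_table(segmented_records, min_freq=50):
    """Count phrases over all records, dropping fillers and phrases below min_freq.

    segmented_records: iterable of phrase lists, one list per quality record.
    Returns (phrase, frequency) pairs sorted by descending frequency.
    """
    counts = Counter(
        phrase
        for record in segmented_records
        for phrase in record
        if phrase not in STOP_PHRASES
    )
    return [(phrase, n) for phrase, n in counts.most_common() if n >= min_freq]

# Toy data standing in for segmented quality records.
records = [["rebar", "spacing", "serious"],
           ["rebar", "concrete"],
           ["rebar", "phenomenon"]]
table = keyword_table(records, min_freq=2)
print(table)  # [('rebar', 3)]
```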
Loading a custom dictionary into the R language enables text segmentation based on dictionary matching. The word segmentation results are imported into the ROSTCM 6.0 software for word frequency statistics, and the text of the main structure quality problems is finally visualized in the form of a word cloud. For all phrases in the entire text set, a word cloud can display rich text information [
Word cloud of the quality problems of the main structure in building engineering.
The word cloud gives a summary of the key quality problems in construction project quality management. It mainly reflects two aspects. First, the distribution of high-frequency phrases identifies the hot spots of quality issues. By tracking how high-frequency words change over time, we can identify trends in quality, assess how well existing quality defects have been remedied, and detect newly emerging quality problems. This plays an important role in guiding the development of construction project quality management. Second, monitoring low-frequency phrases can reveal emerging construction technologies and the quality problems that are gradually exposed during their application. This is of great benefit for the early detection of new quality problems and for preventing quality problems in new technologies.
It can be seen from Figure
The text content is converted into a matrix, and a corresponding weight is assigned to each phrase. TF-IDF is used to evaluate the importance of a phrase for a text set or for one text file in a corpus. A large TF-IDF value means that the phrase appears with high frequency (TF) in one text but rarely occurs in other texts; such a phrase has good category-distinguishing ability and is used as a keyword for text clustering. Furthermore, since the description texts of the quality problems studied in this paper are short texts, the probability that the same keyword occurs repeatedly in any one text is extremely low, and the calculated TF-IDF is more accurate. Therefore, in this text mining, the weight of each phrase is calculated using the TF-IDF value, and the TDM matrix (TF-IDF value of the text
The TF (term frequency) in TF-IDF represents the frequency of occurrence of a certain phrase
In the formula,
After consulting the relevant literature, we adjusted the number of sparse entries to determine the number of categories. To reduce the dimensionality of the high-dimensional texts and to make the classification results analyzable, we finally chose to remove 3% of the sparse entries and to divide the text into 13 categories.
The ward.D method clusters by the sum of squared deviations and builds an index of the corpus after text segmentation. The index is organized hierarchically: an upper class includes the content index of its lower classes, and the lowest classes have the clearest classification. The vertical coordinate (height) represents the sum of squared deviations between classes. The greater the increase in the sum of squared deviations when two classes are merged, the more the two classes differ. The results of text clustering are shown in Figure
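The TF-IDF weighting described above can be illustrated with a small sketch. Because the paper’s own formula did not survive in this text, the code assumes the common textbook variant, with TF as the relative frequency of a phrase within one text and IDF as the logarithm of the number of texts divided by the number of texts containing the phrase. The function name `tfidf_matrix` and the toy documents are hypothetical, and the result is laid out with one row per text (the transpose of a term-document matrix).

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the TF-IDF of vocab[j] in docs[i].

    docs: list of non-empty phrase lists, one per text.
    Assumed weighting: tf = count / len(doc), idf = log(N / df).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of texts in which each phrase occurs.
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([
            (counts[term] / len(doc)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, rows

# Toy short texts standing in for segmented quality records.
docs = [["rebar", "spacing"], ["rebar", "anchorage"], ["concrete", "honeycomb"]]
vocab, rows = tfidf_matrix(docs)
# "spacing" occurs in one text only, so it outweighs "rebar" in the first text.
print(rows[0][vocab.index("spacing")] > rows[0][vocab.index("rebar")])  # True
```

A phrase that appears in every text gets an IDF of zero under this variant, which is one way of seeing why such phrases do not help distinguish categories.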
Text clustering results.
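To make the merging rule behind the dendrogram concrete, the sketch below implements the textbook Ward criterion in plain Python: at each step it merges the two clusters whose union causes the smallest increase in the within-cluster sum of squared deviations. This is an assumption-laden illustration rather than the paper’s procedure; R’s ward.D option may differ in detail from this criterion, and the function name `ward_clusters` and the toy points are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerate points with Ward's criterion until n_clusters remain.

    points: list of equal-length numeric lists (e.g. TF-IDF rows).
    Returns the clusters as sorted lists of point indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in the within-cluster sum of squares if a and b are merged.
        (ia, ca), (ib, cb) = a, b
        dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ia) * len(ib) / (len(ia) + len(ib)) * dist2

    while len(clusters) > n_clusters:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        (ia, ca), (ib, cb) = clusters[i], clusters[j]
        na, nb = len(ia), len(ib)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [(ia + ib, centroid)]
    return sorted(sorted(members) for members, _ in clusters)

# Toy 2D points: two tight pairs and one isolated point.
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
result = ward_clusters(points, 3)
print(result)  # [[0, 1], [2, 3], [4]]
```

The exhaustive pair search is cubic in the number of points, which is fine for illustration but too slow for a large corpus.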
The class or cluster in the clustered tree is the center of the class or the boundary point of the class or the logical representation of the sample attributes. Therefore, the results of the clustering need to be conceptualized. The ontologies studied in this paper are the ontologies of the field of construction engineering. Ontologies in the field of quality of construction engineering follow the general definition of domain ontology, that is, research on individual (entity) or individual collections (concepts) through specific rules in the domain. The relationship between entity and concept is described by adopting some related characteristics and parameters of building engineering. Ontology in the field of architectural engineering quality links textual data of architectural engineering through certain logical relationships, allowing this disorderly information to follow certain rules and forming an information network with rich relationships, clear relationships, and large volumes through the description of ontology.
From Figure
Construction projects are large in scale, proceed in distinct stages, and involve many participants and disciplines, which makes refined and comprehensive quality management difficult to implement. Cluster analysis is used to correlate the quality descriptions of the different engineering fields of the construction project ontology in different engineering phases, and word frequency analysis is performed. However, cluster analysis can only support a subjective judgment of quality problems; it cannot quantitatively and accurately determine the probability that a quality problem occurs. This uncertainty greatly increases the construction cost. In this paper, an improved recurrent neural network is introduced to predict, in real time, the probability of quality problems in different engineering areas under field construction conditions, so that corresponding management measures can be taken to reduce the probability of quality problems and improve construction efficiency.
Recurrent neural networks (RNNs) have achieved good results in sequence modeling, and many real-life tasks involve time series, such as natural language processing (NLP), dialogue systems, and machine translation. The architectural text information collected during the construction period also has time series features. Therefore, the recurrent neural network is suited to the real-time forecasting tasks of construction project quality management. A large amount of textual data in construction projects lies largely idle because it is difficult to process. Establishing a recurrent neural network model for quality prediction makes efficient use of architectural text data and optimizes the industrial structure, which is of great significance for accelerating the intelligent development of the construction industry.
Figure
Traditional recurrent neural network.
Figure
Recurrent neural network extends on time domain.
Equations ( Forward propagation:
Backward propagation:
The recurrent neural network multiplies the output of the previous moment by the corresponding weight, passes it through the activation function, and uses the result as an input at the current moment. Therefore, for the recurrent neural network, the characteristics of the current moment often include the characteristics of the first
The traditional recurrent neural network adds the concept of time to a multilayer feed-forward neural network, which gives the network a memory function and enables it to model time series data well. However, the network becomes very deep along the time dimension, and the most immediate problem caused by this depth is that the gradient vanishes or explodes. The long short-term memory neural network (LSTM) proposed by Hochreiter and Schmidhuber controls this problem well [
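As a point of reference for the gradient problem mentioned above, the recurrence of a traditional recurrent network is commonly written as follows. The notation is an assumption, since the paper’s own equations did not survive in this text: $x_t$ is the input, $h_t$ the hidden state, $y_t$ the output, $a_t$ the pre-activation, $W_{xh}$, $W_{hh}$, $W_{hy}$ the weight matrices, $b_h$, $b_y$ the biases, and $f$, $g$ the activation functions.

```latex
a_t = W_{xh} x_t + W_{hh} h_{t-1} + b_h, \qquad
h_t = f(a_t), \qquad
y_t = g\left(W_{hy} h_t + b_y\right)

\frac{\partial h_t}{\partial h_k}
  = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}
  = \prod_{i=k+1}^{t} \operatorname{diag}\!\left(f'(a_i)\right) W_{hh}
```

The second line shows why depth in time is a problem: the gradient between two distant moments is a product of $t-k$ similar factors, so it tends to shrink toward zero or grow without bound as the gap increases.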
The LSTM unit is shown in Figure
LSTM optimization of a recurrent neural network unit.
The LSTM unit is computed in the following steps:
First, we calculate the candidate memory cell value at the current moment.
We calculate the value of the input gate.
We calculate the value of the forget gate.
We calculate the current memory cell status value.
We calculate the output gate.
Finally, we calculate the output of the LSTM unit.
The general logistic sigmoid function in the above formula is the range of values
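The sequence of LSTM computations listed above can be sketched in plain Python. Since the paper’s formulas did not survive in this text, the sketch assumes the widely used formulation without peephole connections; the parameter names (`Wc`, `Uc`, `bc`, and so on), the helper `affine`, and the all-zero toy parameters are hypothetical and chosen only to make the steps easy to follow.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, U, h, b):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h))
        + bias
        for W_row, U_row, bias in zip(W, U, b)
    ]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; returns the new output h and memory cell state c."""
    # 1. Candidate memory cell value at the current moment.
    c_tilde = [math.tanh(z) for z in affine(p["Wc"], x, p["Uc"], h_prev, p["bc"])]
    # 2. Input gate.
    i = [sigmoid(z) for z in affine(p["Wi"], x, p["Ui"], h_prev, p["bi"])]
    # 3. Forget gate.
    f = [sigmoid(z) for z in affine(p["Wf"], x, p["Uf"], h_prev, p["bf"])]
    # 4. Current memory cell state: gated mix of candidate and previous state.
    c = [ik * ck + fk * pk for ik, ck, fk, pk in zip(i, c_tilde, f, c_prev)]
    # 5. Output gate.
    o = [sigmoid(z) for z in affine(p["Wo"], x, p["Uo"], h_prev, p["bo"])]
    # 6. Output of the LSTM unit.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def zero_params(n_in, n_hidden):
    """All-zero toy parameters, so every gate evaluates to 0.5."""
    p = {}
    for gate in "cifo":
        p["W" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["U" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b" + gate] = [0.0] * n_hidden
    return p

h, c = lstm_step([1.0, 2.0], [0.0], [1.0], zero_params(2, 1))
print(c)  # [0.5]: the forget gate keeps half of the previous cell state
```

The additive update in step 4 is what lets the cell state carry information across many time steps without being repeatedly squashed by an activation function.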
Through the cluster analysis, the steel reinforcement project, the formwork project, the concrete project, and the masonry project are treated as separate construction project problems, and the relevant text data are analyzed and forecasted separately. The BIM data platform assists and optimizes a full-process quality management workflow in which all employees participate, and it integrates heterogeneous data from all stages, participants, and disciplines into a project quality sequence. This sequence is temporal: quality issues have contextual relevance throughout the project construction process. That is, taking historical engineering data as an input sequence requires a model that continuously learns the quality characteristics of the project before and after each point. The recurrent neural network (RNN-LSTM) has a memory function through its feedback connections. The project quality sequence forecasted in this paper is closely related to the quality of historical projects. For example, under similar construction preparation and construction conditions, similar quality problems have occurred in the past. RNN-LSTM can be trained for sequence generation by processing real data sequences one time step at a time and predicting what will happen next. The model is adjusted by iteratively calculating the conditional probability between the input samples and the predicted results. Taking the engineering quality of steel reinforcement as an example, the RNN-LSTM rebar engineering quality forecast was established (see Table
The parameter of RNN-LSTM model.
Input vector | | | | | | … | |
---|---|---|---|---|---|---|---|
Materials | | | | | | | |
Engineering parameters | Subdivision project | Material name | Material properties | Project implementation stage | Scene description | … | Build name |
Index text | Reinforcement engineering | Reinforced template | Length | Construction stage | According to the design | … | Construction column, wall |
Output vector | | | | | … | |
---|---|---|---|---|---|---|
Quality problem node | Reinforcement processing | Reinforcing bar connection | | | … | Install |
Index text | Reinforced length | Lashing | Welding | Threaded connection | … | Anchorage |
The vector sequence
The model framework in this article involves only a single LSTM hidden layer; the number of layers can, of course, be adjusted as a parameter, where
After the clustering, the structured BIM platform big data is trained through the RNN-LSTM network. When the overall error rate is below 20%, the training is stopped.
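The stopping rule above (stop once the overall error rate falls below 20%) can be sketched as a small training loop. This is only an illustration: `step_fn` is a hypothetical callable standing in for one training epoch of the network, the `max_epochs` cap is an added safeguard not mentioned in the paper, and the geometrically decaying error is toy data.

```python
def train_until(step_fn, error_threshold=0.20, max_epochs=1000):
    """Run step_fn() once per epoch until the error rate drops below the threshold.

    step_fn: hypothetical callable that trains for one epoch and returns
             the overall error rate as a float in [0, 1].
    Returns (epochs_run, last_error).
    """
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = step_fn()
        if error < error_threshold:
            return epoch, error
    # Threshold never reached within the epoch budget.
    return max_epochs, error

# Toy stand-in for real training: the error decays by 20% per epoch.
errors = iter(0.9 * 0.8 ** k for k in range(100))
epochs, final_error = train_until(lambda: next(errors))
print(epochs)  # 8
```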
As shown in Figures
Network training of reinforcement engineering.
Network training of template engineering.
Network training of concrete engineering.
Network training of masonry engineering.
The case study is a new project (X) of a selected construction company: the design and construction of a two-story villa with a total construction area of 821 square meters, an investment of about 10 million yuan, and a duration of 85 calendar days. The project completion plan and the three-dimensional model diagram are presented in Figure
Examples of X engineering design.
The schedule is tight and the task is heavy. The total construction area is 821 square meters, and the project duration is 85 calendar days. To ensure that the project is completed and delivered for use within the contract period, participants must work together efficiently, maximizing production quality and guaranteeing completion on schedule. Given the large project volume, traditional quality management methods are prone to quality problems caused by gaps in management, which in turn affect project quality and duration.
In this paper, we mine the historical data of similar cases in the enterprise’s BIM database to discover the quality problems over the entire building lifecycle in the company’s past cases (see Table
Frequent-point statistics of quality problems.
Subproject | Frequent points of quality problems |
---|---|
Reinforcement engineering | Stirrups, main ribs, pull hooks, pull tabs, concrete works |
Concrete engineering | Pouring and appearance (honeycomb, matte, exposed bar, hole) |
Template engineering | Installation of support systems (racks, poles, joints) and demolition |
Masonry engineering | Grey joints, verticality control, retention of horse teeth and holes during masonry |
After the neural network model had been trained, quality-related data generated in real time during construction were used as input to achieve real-time forecasting of project quality. Based on data mining and the engineering quality prediction model, construction projects can be completed on time with quality assurance, and the frequency of quality problems over the entire project term is greatly reduced (see Figures
Prediction of engineering quality problems.
Prediction of quality problems after revision.
Based on the historical data mining results of previous cases and the established construction project quality forecast model, which provide a starting point for optimizing the quality management system, the following requirements can be made for the five stages of a construction project:
In the design stage, when preparing the design drawings, we should accurately check the properties, quantities, and use of subitems such as steel bars, concrete, formwork, and masonry.
In the construction preparation stage, using BIM’s 3D modeling tools, we need to perform a collision check on drawings and design plans, focus on model comparison of subengineering parts, and reduce the quality problems caused by drawing design deviations.
In the construction stage, text mining shows that quality problems in the construction process are mostly located at nodes; therefore, enterprises should perform node management and node visualization for complex nodes. In addition, a project quality rescue team should be set up to solve quality problems in a timely and effective manner based on the engineering quality prediction model.
In the completion stage, an authoritative quality inspection agency should be invited to carry out a quality inspection of the project.
In the operation and maintenance stage, the quality of the steel bars, concrete, formwork, and masonry structure should be checked regularly.
This research takes the quality of construction projects as its subject and BIM-integrated construction engineering big data as its foundation. First, text data related to project quality are extracted from the BIM-integrated construction data sources, and data cleaning and text segmentation are carried out. Then, with the help of word cloud visualization and cluster analysis, we mine the links between data structures, identify the frequent points of quality problems throughout the building’s lifecycle, and turn the integrated unstructured data into structured data. Finally, RNN-LSTM is used to predict the quality problems of subdivision projects such as steel bars, formworks, concrete, cast-in-place structures, and masonry, locating more precisely the nodes where quality problems occur during project implementation. The example verification shows that the method is feasible and reasonable and can effectively reduce the incidence of quality problems in construction projects.
The data used to support the findings of this study are currently under embargo while the research findings in this paper are confidential. Requests for data, after the publication of this article, will be accepted by the corresponding author.
The authors declare that there is no conflict of interest regarding the publication of this paper.