Safety Early Warning Research for Highway Construction Based on Case-Based Reasoning and Variable Fuzzy Sets

As a high-risk subindustry of construction, highway construction has suffered major safety losses over the past 20 years, mainly due to the lack of safe early warnings in Chinese construction projects. By combining the current state of early warning technology with the requirements of the State Administration of Work Safety, this paper expounds on the concept and flow of highway construction safety early warnings based on case-based reasoning (CBR). The study provides solutions to three key issues (index selection, accident cause association analysis, and warning degree forecasting) through the use of association rule mining, support vector machine classifiers, and variable fuzzy qualitative and quantitative change criterion modes, which together cover the needs of safe early warning systems. By describing the principles and advantages of each method and demonstrating on cases that the methods are effective and can act together in safe early warning applications, the paper establishes effective means and intelligent technology for a highway construction safe early warning system.


Introduction
China is currently one of the top infrastructure investors in the world. From its first highway in 1988 to the 74,100 kilometers of highway in service by the end of 2010, the second largest highway network in the world, China has achieved in only 22 years a level of development that took western countries over 40 years, realizing a historic breakthrough in highway construction. In keeping with an overall construction plan for an 850,000-kilometer highway road network [1], an increasing number of highway construction projects will come into operation over the next 10 years, at an unprecedented rate of growth. While highways have generated significant economic benefits during the rapid development of the last 20 years, safety issues have also caused billions of RMB in economic losses, highlighting the severe safety concerns in this industry.
According to the construction project safety accident statistics issued by the Ministry of Construction and shown in Figures 1 and 2, the numbers of accidents and fatalities have decreased annually over the past three years because China's relevant departments strengthened management and raised the educational level of managerial staff. Nevertheless, the total number of accidents and deaths remains relatively large: the number of people who have died in construction safety accidents in China is 1.5 times the combined death toll of 50 other developed countries, including the United States, the United Kingdom, Germany, and Japan. Road construction, a high-risk subindustry of construction, accounts for 34% of all construction project accidents and approximately 31% of all construction project fatalities. These fatalities are caused by five types of accidents: height crashes (falls from height), construction collapses, object strikes, electric shocks, and machinery injuries. The safety conditions in this subindustry are not satisfactory.
According to static investment estimates, the future capital required for national highway network construction is approximately 200 billion RMB. National highway construction will proceed fairly rapidly until 2020: the annual investment was approximately 140 billion RMB until 2010 and will be approximately 100 billion RMB from 2010 to 2020. However, the direct and indirect losses caused by safety issues account for 2% of the total annual investment, a large figure that greatly hinders the development of road construction. At first, the industry regarded safety accidents as purely incidental or unexplainable, and concern for safety was limited to fatalities and property loss. As knowledge of safety issues improved, the industry began to see that accidents, although partly incidental, follow their own laws and features. Because of its late start, the study of safety management in China is still at an initial stage in both theory and practice: on-site safety management records are incomplete, the effects of safety measures are indirect and lag behind the measures themselves, and uncertainties are widespread in construction projects. As a result, the importance of construction safety work in China has been neglected for decades. There is therefore an urgent need for construction safety work to switch from handling accidents after they occur to forecasting them at the initial stage, from handling accidents to predicting and preventing them, and from traditional management to modern scientific management. The key link in realizing this transition is construction safety early warning technology, whose essence lies in precontrol and early-stage management: discovering and addressing potential risks at any time and eliminating accidents in the early stages of a project.
Therefore, early warning is one of the most effective methods of curbing accidents and reducing safety losses. In April 2011, Wang [2] noted at the 14th session of national construction safety officer working meetings that construction enterprises should establish and perfect safe production dynamic monitoring and early warning systems in addition to analyzing and auditing the hidden dangers and risks of their construction projects at regular intervals.
Further studies on early warning management models exist abroad and are focused mainly on macroeconomic premonitoring and microenterprise crises, such as an early warning study on financial crises [3,4], computer network crises [5], and natural disasters, such as tsunamis and earthquakes [6,7]. However, the study of industry production safe early warning, particularly early warning in construction projects [8], is relatively rare. A theoretical study of early warning in China must commence with the circular fluctuation of the economy in the middle of the 1980s [9] and then transition to the noneconomic early warning that has occurred in recent years, beginning with early warning management studies in the field of construction projects, such as coal mining [10,11], bridge construction monitoring [12,13], and deep excavation [14,15]. Although the phrase "early warning" has been mentioned very frequently in other countries, systematic and in-depth studies are still rare and mostly focus on the computer technology involved in early warning management information systems. The accident losses during highway construction in China over the past 20 years have been caused mostly by the lagging study and practice of safe early warning; thus, improving early warning abilities and preventing safety issues are now the industry's most challenging tasks.

Cumulative counts by period from the construction accident statistics (see Figures 1 and 2):

Period          2008   2009   2010
Jan.              35     29     60
Jan. to Feb.      51     56     68
Jan. to Mar.     141    123    135
Jan. to Apr.     232    215    191
Jan. to May      318    294    270
Jan. to Jun.     391    364    338
Jan. to Jul.     480    456    412
Jan. to Aug.     566    535    521
Jan. to Sept.    649    608    592
Jan. to Oct.     768    690    654
Jan. to Nov.     878    739    718
Jan. to Dec.     964    802      -

Key Technology for Safe Early Warning Systems
Safety and risk are mutually contradictory and interdependent in major construction projects. On a microscale, safety risks do not exist alone, and safety issues are not induced by a single risk element. In essence, safety is a systematic project containing subunits such as safety risk forecasting, safety risk identification, risk association, risk element importance ranking, safety investment and effectiveness, safe early warning, safety evaluation, and emergency response planning. On a macroscale, safety is related to construction progress, project quality, investment cost, and effectiveness; these factors are interrelated and interact with one another, forming an external action mechanism for safety issues. Based on the current knowledge of safety, the monitoring indices selected for early warning should be hierarchical. This paper divides early warning monitoring indices into a compulsory index hierarchy and a dynamic index hierarchy. The compulsory indices include average safety training time, safety education coverage rate, licensed personnel rate, site safety officer rate, safety sign installation rate, temporary electricity usage management standard rate, reasonableness of machinery and material management, fire protection management standard rate, safety hazard patrols, safe production meeting frequency, employment injury insurance coverage rate, work-at-height workload, and ecological conditions. The dynamic indices include deviations in project progress and investment costs, soil stress changes and deformations, and variations in water level and environment. In the early warning process, the compulsory indices are addressed by improving safety awareness and safety standards, while the dynamic indices are addressed by such methods as reinforcing the monitoring of dangerous areas and time periods, monitoring qualitative and quantitative changes in the index values, and dividing the index range into different warning districts.
Once the analyzed data enter the warning districts, safety issues can be effectively curbed by control measures matched to the severity level of the warning. A complete and scientific safe early warning process includes the selection of monitoring indices, the association analysis of accident causes, and the forecasting of warning degrees. Because a highway project is a one-off, unique undertaking with a high level of uncertainty, the early warning indices, accident associations, and warning degree forecasts should be based specifically on the features of the project. Therefore, this paper introduces case-based reasoning (CBR) technology to highway construction safe early warning systems to increase their accuracy and effectiveness. CBR is an important branch of artificial intelligence that originated in 1982 with "Dynamic Memory," a book by Yale University Professor R. Schank that laid out the basic theory of case-based reasoning. CBR is a form of similarity-based or analogical reasoning that uses existing experience and cases to solve new problems and explain new situations. By accessing a knowledge base of similar problems solved in the past, it infers a solution to the current problem; old cases or experiences are thus used to solve new problems, evaluate new issues, explain atypical circumstances, or understand new situations. Because CBR solves a problem directly from the knowledge in previous examples, it can be effective in difficult problem areas whose knowledge cannot otherwise be expressed. The self-learning function of CBR ensures the continuous enhancement of its reasoning, and it efficiently handles new items that are close or similar to stored cases [16]. However, papers involving both construction project safety and CBR are very rare: there are dozens abroad and fewer than 10 from China. Moreover, these papers focus mainly on safety diagnosis, quality control, slope stabilization, and accident emergency response, and none are deep or thorough enough.
Based on the advantages of CBR in project applications and its high accountability and communicability, a 2010 key scientific project regarding major accident prevention and solution technologies for safe production, issued by China's national safety supervision bureau, discusses CBR and shows us that CBR technology is increasingly used in construction safety studies.
Timely and accurate early warning systems can effectively reduce the occurrence of accidents and eliminate safety losses while maximizing the effectiveness of safety investments. This study is based on case-based reasoning technology and examines three key links of early warning systems, as shown in Figure 3, which form a virtuous cycle of self-learning. Because CBR relies on analogical reasoning, the key to its application is searching for former cases that are similar to the new project: experience from similar cases is more thorough and reliable, and more severe or latent risks can be mined and identified from it. Therefore, we first search for existing cases whose control properties match those of the new project; control properties can be set as indices such as project type, construction technology, geological conditions, and methods of precipitation and water drainage. Next, for the cases found, we calculate the similarity of comparative properties, such as construction costs and project kilometers, and retain the finished projects whose similarity surpasses a minimum threshold. The risks and accidents experienced by these similar projects are summarized and used as keywords to search the case library and mine the risk associations that lead to accidents. Strongly correlated associations that exceed the minimum thresholds of support and confidence are treated as accident-prone frequent itemsets and set for reinforced monitoring. This study uses the association degree to determine index weight: the stronger the relationship of an index to risk accidents, the heavier its weight. Because the indices differ in type and association, this paper uses a support vector machine, which has a strong generalization capacity, and a variable fuzzy set approach to forecast the warning degree and assure the accuracy of the warnings.
These two methods are theoretically strong, but their application has lagged, so this paper uses cases to analyze and verify their effectiveness on the basis of their principles and advantages.
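The two-stage case retrieval described above (exact matching on control properties, then thresholded similarity on comparative properties) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the `Case` structure, the normalized-distance similarity measure, the value ranges, the threshold of 0.8, and all case data are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class Case:
    name: str
    control: dict      # categorical control properties; must match exactly
    comparative: dict  # numeric comparative properties

def similarity(new, old, ranges):
    """Mean per-attribute similarity, 1 - |a - b| / range (assumed measure)."""
    scores = []
    for key, span in ranges.items():
        diff = abs(new.comparative[key] - old.comparative[key])
        scores.append(max(0.0, 1.0 - diff / span))
    return sum(scores) / len(scores)

def retrieve(new, library, ranges, threshold):
    """Keep cases with identical control properties and similarity >= threshold."""
    matches = []
    for old in library:
        if old.control != new.control:
            continue  # stage 1: control properties must agree
        score = similarity(new, old, ranges)
        if score >= threshold:  # stage 2: comparative similarity filter
            matches.append((old.name, round(score, 3)))
    return sorted(matches, key=lambda m: m[1], reverse=True)

# Hypothetical data: cost in million RMB, length in kilometers.
soft = {"type": "highway", "geology": "soft soil"}
rock = {"type": "highway", "geology": "rock"}
new_project = Case("new", soft, {"cost": 500, "km": 50})
library = [
    Case("A", soft, {"cost": 450, "km": 55}),
    Case("B", soft, {"cost": 200, "km": 20}),
    Case("C", rock, {"cost": 500, "km": 50}),
]
ranges = {"cost": 1000, "km": 100}  # assumed spread of each attribute
similar_cases = retrieve(new_project, library, ranges, threshold=0.8)
```

Here case C is rejected in the first stage despite identical cost and length, which reflects the text's ordering: control properties act as a hard filter before any similarity is computed.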

Association Rules
Association rule mining is one of the most active directions of study in data mining, an important Knowledge Discovery in Databases (KDD) research subject initially proposed by Ramakrishnan et al. [17]. It reveals interesting or relevant associations among items in a large database. With the increasing scale of data collected and stored in data libraries, people are becoming more interested in mining association knowledge from these data.
There are two important concepts in association rule algorithms: support and confidence. If the proportion of records in data library $D$ that contain both objects $A$ and $B$ is $s$, then the support of the association rule $A \to B$ in $D$ is $s$: $\mathrm{support}(A \to B) = \mathrm{support}(A \cup B) = P(A, B)$. If the proportion of the records containing $A$ that also contain $B$ is $c$, then the confidence of the association rule $A \to B$ is $c$: $\mathrm{confidence}(A \to B) = \mathrm{support}(A \cup B)/\mathrm{support}(A) \times 100\%$, or $P(B \mid A)$. The support reflects the importance of an association rule in the data library, and the confidence measures its reliability. Using association analyses from previous construction projects, Chen [18] applied the grey association analysis approach to distinguish between the association elements affecting safety preevaluation systems and to rank the primary and secondary danger levels of dangerous substances, thus addressing the uncertainty and accountability issues in safety accidents. Sawacha et al. [19] analyzed numerous accident samples and summarized the top 5 elements associated with on-site safe production. Siu and his colleagues [20] made a comparative analysis of the associations between personal elements and accident rates, while Halperin and McCann [21] determined relevant elements from the study of frequent accident locations. Case-based reasoning association rules differ from the association analysis performed in this literature because the references provided by similar cases can more accurately reflect the dependence and association between a monitoring index and risk events. In the mining process for early warning rules, we first set the minimum thresholds for the support and confidence of the association rules. Then, we search all of the high-frequency risk sets related to safety issues in the case library and generate strongly correlated rules from these cases.
This study uses relational algebra theory-based association rules to perform risk association mining, and the algorithm only needs to scan the data library once (overcoming the classic Apriori algorithm's weakness of needing to scan a data library multiple times) and has good concurrency and scalability. Assuming that $D$ is the case library and $C = \{c_1, c_2, c_3, \ldots, c_m\}$ and $R = \{r_1, r_2, r_3, \ldots, r_n\}$ are the case set and risk itemset, respectively, the matrix is as follows:

$$A = (a_{ij})_{m \times n} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},$$

which stands for the binary relation from $C$ to $R$. In the formula, the value of $a_{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) is 1 or 0, representing whether case $c_i$ includes risk element $r_j$. $\sum_{i=1}^{m} a_{ij}/m$ is the support of property $r_j$ as a 1-itemset. If the support is bigger than the minimum threshold, then the risk element $r_j$ is a large 1-itemset. If an itemset is not large, then any set including this itemset can never be large. Therefore, large 2-itemsets must be searched based on the large 1-itemsets. Assume that $r_j$ is a large 1-itemset, $L_1$ stores the large 1-itemsets, and $j = 1, 2, \ldots, p$, so $L_1$ has $p$ elements. $\sum_{i=1}^{m} (a_{ij} \wedge a_{iq})/m$ is the support of the 2-itemset $\{r_j, r_q\}$, so this support must be larger than the minimum threshold for $\{r_j, r_q\}$ to be a large 2-itemset. These conditions apply to all itemsets. Let $l_{i,k-1}$ indicate whether case $c_i$ contains every element of the large $(k-1)$-itemset $\{r_1, r_2, \ldots, r_{k-1}\}$. If there exists an item $r_v$ that makes $\sum_{i=1}^{m} (l_{i,k-1} \wedge a_{iv})/m$ larger than the minimum support threshold, then $\{r_1, r_2, \ldots, r_{k-1}, r_v\}$ is a large $k$-itemset.
This paper considers height crash accidents, which have the highest occurrence and number of fatalities, as an example. Table 1 lists the 12 similar height crash cases obtained from the case library and their risk associations. The risk items in Table 1 represent separate risk elements, such as safety belt failure or lack of safety belt use, strut damage, loss of body control, safety facility failure, and safety net damage.
The algorithm is described in MATLAB R2007a as in Algorithm 1.
Set the minimum support threshold of this early warning association rule to 40%. The large itemsets then obtained from this algorithm are the same as the results gained from the classic Apriori algorithm. A large itemset typically represents a mechanism of action, so curbing the occurrence of large itemsets is key to safe early warnings. The rules produced by this association algorithm are fixed; if they are used as a base and combined with quantitative data, such as the probabilities of basic events and accidents, the sensitivity of basic events, safety thresholds, or safety investment effectiveness, the risk element association rules can be deduced further.
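The single-scan idea can be illustrated with a short Python sketch. It is not the paper's Algorithm 1 (which is written in MATLAB): it assumes that each risk element's column of the 0/1 case matrix is packed into an integer bitmask during one pass over the cases, after which the support of any itemset is obtained by a bitwise AND of columns. The five cases and three risk element names are hypothetical and do not reproduce Table 1, and the comparison uses "greater than or equal to" the threshold, which is an assumption.

```python
def large_itemsets(matrix, items, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    matrix[i][j] is 1 if case i contains risk element items[j], else 0.
    """
    m = len(matrix)
    columns = [0] * len(items)
    for i, row in enumerate(matrix):      # the only pass over the case library
        for j, flag in enumerate(row):
            if flag:
                columns[j] |= 1 << i      # bit i marks that case i has item j

    def support(mask):
        return bin(mask).count("1") / m

    large1 = {(j,): col for j, col in enumerate(columns)
              if support(col) >= min_support}
    result = {}
    current = large1
    while current:
        nxt = {}
        for idx, mask in current.items():
            result[tuple(items[j] for j in idx)] = support(mask)
            # Extend only with large 1-itemsets of higher index, so each
            # candidate is generated once from its large prefix.
            for (j,), col in large1.items():
                if j > idx[-1] and support(mask & col) >= min_support:
                    nxt[idx + (j,)] = mask & col
        current = nxt
    return result

# Hypothetical case matrix: rows are cases, columns are risk elements.
items = ["belt", "strut", "net"]
cases = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
frequent = large_itemsets(cases, items, min_support=0.4)
# Confidence of the rule belt -> strut = support(belt, strut) / support(belt).
confidence = frequent[("belt", "strut")] / frequent[("belt",)]
```

With these hypothetical data, "net" appears in only one of five cases and is pruned at the first level, so no candidate containing it is ever evaluated.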

Support Vector Machine
The accuracy of a warning degree forecast determines how well targeted the safety precontrol measures are and how effective the safety investments are. Different warning degrees call for different measures and investment costs. Therefore, safe early warning systems place strict requirements on the classification method in order to make full use of investment costs, effectively control risks, and avoid accidents. Neural networks are prone to overlearning, which easily leads to weak generalization. To counter this tendency, this study introduces the most successful method in statistical learning theory, the support vector machine (SVM). The SVM has many unique advantages in small-sample, nonlinear, and high-dimensional pattern recognition. Cortes and Vapnik [22] first proposed the SVM in 1995. It is built on the VC dimension theory of statistical learning and the structural risk minimization principle: according to limited sample information, it seeks the best compromise between model complexity (the learning accuracy on the particular training samples) and learning ability (the capacity to identify arbitrary samples without error) to obtain the best generalization capability [23]. The main advantages of SVM technology are that it can solve machine learning problems with small samples, improve generalization performance, solve high-dimensional and nonlinear problems, and avoid the structure selection and local minimum problems of neural networks.
Experiments have shown that, in noisy conditions, fitting a low-order function gives better results than fitting a higher-order function, even if the true model is of higher order [24]. Thus, attempting to fit a limited sample with a very complicated model, even with the "optimal" function, results in the loss of generalization ability in low-dimensional space.
Unlike traditional statistical methods, the SVM takes structural risk minimization as its goal. Using a nonlinear transformation defined by a preselected kernel function, it maps low-dimensional input vectors into a high-dimensional feature space, in which an optimal separating hyperplane can be constructed. In other words, lifting the problem from the low-dimensional space into a high-dimensional space yields a more powerful class of functions, as shown in Figure 4. The two-dimensional case in Figure 5 can be used to explain the SVM. The solid and hollow points represent the two classes of samples. $H$ is the classification line, and $H_1$ and $H_2$ are the lines that pass through the samples of each class nearest to the classification line and are parallel to it; the distance between $H_1$ and $H_2$ is called the classification interval (margin). The so-called optimal separating line is required not only to separate the two classes correctly (a training error rate of 0) but also to maximize the classification interval, that is, to control the generalization capacity, which is one of the core concepts of the SVM.
The equation of the classification line $H$ is $w \cdot x + b = 0$, where $H_1$ and $H_2$ correspond to classes 1 and −1, respectively, and the equations of $H_1$ and $H_2$ are $w \cdot x + b = y$ with $y = 1$ and $w \cdot x + b = y$ with $y = -1$, respectively. The determination of whether sample $x_i$ belongs to class 1 or class −1 can be summarized as

$$y_i (w \cdot x_i + b) - 1 \ge 0, \quad i = 1, 2, \ldots, n. \tag{2}$$

The classification interval is equal to $2/\|w\|$, so maximizing the interval is equivalent to minimizing $\|w\|^2$. Therefore, the classification surface that satisfies constraint (2) and minimizes $\|w\|^2/2$ is called the optimal separating surface, and the training samples on $H_1$ and $H_2$ are called support vectors.
Because noise makes some samples inseparable even after the low-dimensional vectors are mapped into a high-dimensional feature space, slack variables $\xi_i$ and a penalty factor $C$ are introduced to give the SVM tolerance to noisy data and achieve better classification results. Their purpose is to allow a portion of the points that do not meet the requirement, the outliers, to be given up. The resulting generalized optimal separating line model is

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, n.$$

This problem can be transformed into its dual for solution, and because it is a convex quadratic programming problem, a global optimal solution exists.
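For reference, the dual problem mentioned above is usually written in the following textbook form. It is given here as a standard result rather than as a formula taken from this paper; the symbols are assumptions of this note: $\alpha_i$ are the Lagrange multipliers, $y_i \in \{1, -1\}$ are the class labels of the $n$ training samples $x_i$, $C$ is the penalty factor, $b$ is the offset of the separating surface, and $K$ is the kernel function.

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, n.$$

The resulting classifier is $f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right)$. Only the samples with $\alpha_i > 0$ contribute to the sum, and these are the support vectors.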
In summary, the SVM finds the best compromise between training error and generalization according to the limited sample information, which allows it to solve small-sample, nonlinear, high-dimensional problems such as pattern recognition. Although the SVM method has many unique advantages, its application is still relatively lagging. This paper introduces the SVM method into case-based reasoning for construction safety warning degree forecasts, preserving the objectivity of actual risk elements while maintaining the forecast accuracy of warning degrees, so that precontrol measures can be targeted and accidents avoided. Taking the historical data from [25] (Table 2) as an example, this paper presents a case study of the application of support vector machines in safe early warning systems.
The process is implemented in MATLAB R2007a. This case takes the warning degree as its object: levels 1-3 represent slight, moderate, and severe warnings, respectively, and the other 7 elements represent the risk properties, from which the support vector machine classifier forecasts the warning degree of a case. The first 31 cases are set as training samples, and the final 5 are set as testing samples. In the classification setting, a radial basis function (RBF) is selected as the kernel function, and its parameters are optimized by cross-validation. The training samples are randomly divided into 5 groups. When several parameter combinations reach the maximum cross-validation accuracy, the combination with the smallest penalty parameter $C$ is selected, because an excessively high penalty parameter causes overlearning and is not conducive to the generalization of the results. On this basis, the best penalty parameter is $C = 181.0193$ and the best RBF kernel parameter is $\gamma = 0.03125$, for which the highest cross-validation accuracy rate is 74.1935%, as shown in Figure 6.
On the training set, the classifier trained in this way has a classification accuracy of 90.3226%, and on the testing set samples it has an accuracy of 100%, with the entire classification process lasting only 3.96 seconds. By comparison, the accuracy of the BP neural network approach on this testing set is only 60%, and it takes 22.48 seconds to finish the classification, which indicates the strength of the SVM's generalization capacity. The SVM has major advantages over neural networks in terms of forecast accuracy and efficiency and can efficiently improve the pertinence of precontrol measures and the effectiveness of safety investments.
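The parameter selection rule used in this case study (take the highest cross-validation accuracy and, among ties, prefer the smallest penalty parameter) can be sketched in Python as follows. This is an illustration only and not the MATLAB code of the study: the function names are introduced here, and the candidate grid and its accuracies are hypothetical, apart from the reported optimum. The reported optimum values coincide numerically with $2^{7.5}$ and $2^{-5}$, which suggests a power-of-two search grid, although the text does not state this.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def select_parameters(cv_results):
    """Pick (C, gamma) with the highest CV accuracy; break ties by smallest C."""
    best_acc = max(acc for _, _, acc in cv_results)
    tied = [(c, g) for c, g, acc in cv_results if acc == best_acc]
    return min(tied, key=lambda cg: cg[0])

# Hypothetical grid entries: (C, gamma, cross-validation accuracy).
# Accuracies are written as fractions of the 31 training cases.
cv_results = [
    (2 ** 5, 2 ** -5, 21 / 31),
    (2 ** 7.5, 2 ** -5, 23 / 31),
    (2 ** 9, 2 ** -5, 23 / 31),   # same accuracy, larger penalty: not chosen
    (2 ** 9, 2 ** -3, 22 / 31),
]
best_C, best_gamma = select_parameters(cv_results)
```

Preferring the smaller penalty among equally accurate candidates keeps the margin wider, which is the generalization argument made in the text.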

Variable Fuzzy Qualitative Change Criterion Mode
With the existence of dynamic indices, safe early warning must be a process of dynamic monitoring. As the project progresses, the index values will change dynamically among warning districts, with some changing across warning districts and some changing only within a warning district. Therefore, distinguishing whether an index change is quantitative or qualitative is a critical test of warning accuracy. Professor Chen proposed variable fuzzy sets based on the relative difference function [26][27][28], together with criterion modes for quantitative and qualitative (i.e., gradual and abrupt) change [29]. Assume that $A$ is a fuzzy concept in the universe of discourse $U$ and that, for any element $u$ ($u \in U$), the relative membership function takes a value at some point on the reference continuum axis. The relative membership degree of $u$ to $A$ is $\mu_A(u)$, and that of $u$ to $A^c$, the opposite concept of $A$, is $\mu_{A^c}(u)$, with $\mu_A(u) + \mu_{A^c}(u) = 1$, $0 \le \mu_A(u) \le 1$, and $0 \le \mu_{A^c}(u) \le 1$. Let $D_A(u) = \mu_A(u) - \mu_{A^c}(u)$ be the relative difference of $u$ to $A$. As shown in the figure,

$$V = \{(u, D) \mid u \in U,\ D_A(u) = \mu_A(u) - \mu_{A^c}(u),\ D \in [-1, 1]\},$$

$$A_+ = \{u \mid u \in U,\ 0 < D_A(u) \le 1\}, \quad A_- = \{u \mid u \in U,\ -1 \le D_A(u) < 0\}, \quad A_0 = \{u \mid u \in U,\ D_A(u) = 0\},$$

where $V$ is called the fuzzy variable set and $A_+$, $A_-$, and $A_0$ are called the attraction basin (main), rejection basin (main), and gradual qualitative change boundary, respectively.
Assume that $C$ is the variable element set of $V$ and $C = \{C_A, C_B, C_C\}$, where $C_A$ is a variable model set, $C_B$ is a variable model parameter set, and $C_C$ is the set of other variable elements excluding the model and its parameters.
In summary, the criterion modes of the variable fuzzy quantitative and qualitative changes are as follows, where $D_A(C(u))$ denotes the relative difference after the variable element set $C$ acts on $u$.
(1) If $D_A(u)$ and $D_A(C(u))$ have the same sign, so that $D_A(u) \cdot D_A(C(u)) > 0$, the change is a quantitative change.
(2) If $D_A(u) > 0$ and $D_A(C(u)) < 0$, then $D_A(u) \cdot D_A(C(u)) < 0$ is a gradual qualitative change (through $D_A(u) = 0$).
(3) The values $D_A = 1$ and $D_A = -1$ are the abrupt qualitative change boundaries. If $D_A(u) > 0$ and $D_A$ changes from a certain positive value to "1" (abrupt change), then $D_A(u) \cdot D_A(C(u)) = |D_A(u)|$ is an abrupt qualitative change (without passing through $D_A(u) = 0$). If $D_A(u) < 0$ and $D_A$ changes from a certain negative value to "−1" (abrupt change), then $-|D_A(u)| \cdot D_A(C(u)) = |D_A(u)|$ is also an abrupt qualitative change (without passing through $D_A(u) = 0$). Therefore, the criterion mode for abrupt qualitative change that does not pass through $D_A(u) = 0$ can be summarized as $D_A(u) \cdot D_A(C(u)) = |D_A(u)|$. By the same reasoning, the criterion mode for abrupt qualitative change that passes through $D_A(u) = 0$ can be summarized as $D_A(u) \cdot D_A(C(u)) = -|D_A(u)|$.
Current construction projects tend to excessively favor internal indices in dynamic index monitoring for safe early warning systems, but the abnormal state of external indices, such as cost and progress, can also have a negative effect on safety. This study uses the changes in construction progress and investment for a road section of National Highway 210 as an example to verify the effectiveness of this qualitative and quantitative change model [30]. This road starts at the Qiujiahe River on the boundary between Sichuan and Chongqing and ends at Heishizi in the Jiangbei District of Chongqing, connecting with the Yuchang highway. The highway has a total length of 53.108 kilometers. Tables 3 and 4 summarize the progress and investment costs, respectively, for Contract F of this project. The relative difference is calculated with the linear formula in the application document [31], where $x_1$ is the investment cost deviation rate and $x_2$ is the progress deviation rate. The comprehensive evaluation of risk is based on the two deviation rates, and the relative differences of the investment cost and progress from February 2003 to September 2003 are calculated separately according to the $[a, b]$ and $[c, d]$ interval eigenvalues in Table 5.
Assume that the weight vector of the two indices is $w = (0.5, 0.5)$. The monthly relative difference in risk is then obtained by weighting the relative differences of the two indices, and Table 6 shows the resulting monthly comprehensive relative difference. The closer the value of $D_A(u)$ is to −1, the greater the risk and the pressure on safety; the closer the value is to 1, the safer the project. Based on the results in Table 6, we can see the tendency of the risk. The safety level experiences a gradual qualitative change from April to May, while the changes from February to April and from May to September are both quantitative. The change from February to April is negative, indicating that the deviations in progress and investment costs during this period have a negative effect on safe production, and the change from May to September is positive, indicating that the deviations during this period do not have a negative external effect on safety. The results are simple and intuitive. Moreover, the existence of association rules means that safety issues arise from the combined action of multiple risk elements, so when monitoring dynamic indices it is far from sufficient to set a warning degree threshold for individual risk elements. This method can also be applied to the dynamic monitoring of multiple indices and intervals. Corresponding control measures can be created based on the results, thereby curbing safety issues.
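The way the criterion modes are applied to a sequence of monthly values can be sketched in Python. The sketch rests on two assumptions that are not taken from the source: the comprehensive relative difference is computed as a weighted sum of the two indices' relative differences, and a month-to-month change is classified from the sign and magnitude of the product of the relative differences before and after the change. The function names and the monthly values are hypothetical and do not reproduce Table 6.

```python
def combine(diffs, weights):
    """Weighted sum of per-index relative differences (each in [-1, 1])."""
    return sum(w * d for w, d in zip(weights, diffs))

def classify_change(d_before, d_after):
    """Classify a change in relative difference using the criterion modes."""
    product = d_before * d_after
    if abs(d_after) == 1.0 and abs(d_before) < 1.0:
        # A jump onto the boundary value +1 or -1 is an abrupt change:
        # product == |d_before| if same side, -|d_before| if D = 0 is crossed.
        if product > 0:
            return "abrupt qualitative change (without passing D = 0)"
        if product < 0:
            return "abrupt qualitative change (passing D = 0)"
    if product > 0:
        return "quantitative change"
    if product < 0:
        return "gradual qualitative change"
    return "on the boundary D = 0"

weights = (0.5, 0.5)
# Hypothetical (cost, progress) relative differences for three months.
monthly = {
    "Feb": (-0.25, -0.5),
    "Apr": (-0.5, -0.75),
    "May": (0.25, 0.5),
}
combined = {month: combine(d, weights) for month, d in monthly.items()}
feb_to_apr = classify_change(combined["Feb"], combined["Apr"])
apr_to_may = classify_change(combined["Apr"], combined["May"])
```

In this hypothetical series the combined value stays negative from February to April (a quantitative change toward greater risk) and changes sign from April to May (a gradual qualitative change), which is the same kind of pattern the case study reads from its monthly results.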

Conclusions
Early warning technologies are used to determine both safety situations and safety losses. Good early warning technologies can not only reduce losses by limiting the available accident sources but can also indirectly lower investment costs by guiding safety input benefit maximization. Different from other existing research, the following conclusions and recommendations are made based on this research.
(1) By using analogical reasoning-based CBR, this paper gives a basic schematic for safe early warning technology by practically solving the three key issues of index selection, accident cause association analysis, and warning degree forecasting, which run through the whole process of safety management.
(2) In view of the characteristics of highway projects and the problems that may arise in data processing, this paper introduces association rule mining, support vector machine classifiers, and variable fuzzy qualitative and quantitative change criterion modes in order to keep the fidelity of the data high. Together with the case experiments demonstrating the effectiveness of the methods, the proposed approach is a feasible and effective means of improving China's early warning technologies.
(3) With the gradual application of artificial intelligence to the safety of construction projects, CBR technology can be applied to safe early warning systems for construction projects in China. However, research shows that the lack of existing cases and the complexity of the data are the biggest bottlenecks in the application of CBR. Therefore, further study of CBR technology and the resolution of data processing, case statistics, and case retrieval for highway construction safe early warning systems are key to improving the practicability of safe early warning systems and safety management.