Prediction Model of Collapse Risk Based on Information Entropy and Distance Discriminant Analysis Method

1 School of Earth Science and Resources, Chang’an University, Xi’an 710054, China
2 Key Laboratory of Western Mineral Resources and Geological Engineering, Ministry of Education, Xi’an 710054, China
3 State Key Laboratory of Water Resource Protection and Utilization in Coal Mining, Shenhua Group Co., Ltd., Beijing 100011, China
4 Geological Survey Institute Co., Ltd., Sino Shaanxi Nuclear Industry Group, Xi’an 710100, China


Introduction
Collapse is a geological phenomenon whereby rock and soil on a steep slope suddenly fail, move downslope, and accumulate at the foot of the slope, as a result of gravity and other external forces [1-8]. With the ongoing development of railways, highways, and other projects in western China, slope collapses and landslides are becoming more frequent. During and following earthquakes, high and steep slopes are prone to collapse. The Wenchuan earthquake of 12 May 2008 (the "5.12" earthquake), for example, triggered subsequent collapses [9] that caused serious damage to highways and other transportation infrastructure. Therefore, a quantitative assessment of damage to highways caused by collapse would provide strong support for a risk assessment of regional geological disasters and lay the foundation for the sustainable development of the region of interest.
Many methods of disaster risk assessment relating to collapse have been developed. For example, the analytic hierarchy process (AHP) and the fuzzy comprehensive evaluation method were applied to evaluate the risk of collapse by Liu [10] and Xue [11]. These authors proposed a risk evaluation model of collapse hazard based on a comprehensive integration of extenics and fuzzy theory. Gao et al. [12] developed a model based on a geographic information system. He et al. [13] established a comprehensive evaluation model of collapse risk based on uncertainty measurement models, and Liu [14] used the probability method and a Newmark displacement calculation model to evaluate landslide hazards in the Changbai Mountains area. Traditional evaluation methods are simple and fast, but the choice of factors in each case is subjective. In some cases substantial calculation is required, which limits the application of the approach. Because geological conditions vary by location, evaluation methods built on different factors and criteria cannot be substituted for one another [15].
The discriminant analysis method, also known as the "resolution method," is a multivariate statistical method that determines the category to which a sample belongs on the basis of its characteristic values. It has been widely used in many fields of the natural and social sciences since its development. In China, the distance discriminant analysis model was first introduced into practical geotechnical engineering by Gong and Li [15,16], with good results. Here we consider the factors affecting collapse activity during construction and develop a risk prediction and evaluation model of collapse hazard based on the distance discriminant analysis method. This approach provides a new method of risk prediction and evaluation of collapse hazard.
In practice, the impact index of collapse activity is more complex than can be modelled, owing to variations in both the internal characteristics of the collapse body (internal causes), such as elevation, slope, lithology, soil type, and land utilization, and the exterior factors that induce collapse disasters (external causes), such as groundwater, precipitation, vibration, and human activities. Some of these factors have no influence on the results of the present study. Therefore, we applied the entropy reduction method when analyzing the collapses on both sides of the Yingxiu-Wolong highway in Wenchuan County, Sichuan Province, after the Wenchuan earthquake of 12 May 2008. To do so, we reduced the initial indexes, excluded uncorrelated indexes, and obtained the major evaluation index system affecting collapse activity. On this basis, we collected further data and used the distance discriminant model to perform a comprehensive evaluation of the collapse risk.

Entropy Measurement
We obtain an evaluation matrix $R = (r_{ij})_{m \times n}$ by evaluating $n$ indexes of $m$ schemes, where $r_{ij}$ is the assessment value of index $j$ for scheme $i$. For a given $j$, the greater the differences among the $r_{ij}$ ($i = 1, 2, \ldots, m$), the greater the role of index $j$ in comparing the schemes, and the more decision-making information it contains and transmits. Information entropy can measure this information content; that is,
$$E(r_j) = -k \sum_{i=1}^{m} \frac{r_{ij}}{r_j} \ln \frac{r_{ij}}{r_j}, \qquad r_j = \sum_{i=1}^{m} r_{ij},$$
where $k > 0$, $\ln$ is the natural logarithm, and $E(r_j) \geq 0$. This formula shows that, for a given $j$, the greater the differences among the $r_{ij}$, the smaller $E(r_j)$. When all $r_{ij}$ are equal, $r_{ij}/r_j = 1/m$ and $E(r_j)$ is at its maximum value, $E_{\max} = k \ln m$; if $k = 1/\ln m$, then $E_{\max} = 1$. In this case index $j$ has no distinguishing ability for the purpose of scheme comparison.
Conversely, the more the evaluation values of the schemes differ, the greater the role of the index in comparing the schemes and the stronger its distinguishing ability.
The total entropy of the evaluation matrix is defined as the sum of the index entropies, $E = \sum_{j=1}^{n} E(r_j)$, and a quantity derived from the index entropies is used as a measure of the ability of each index to distinguish among schemes [17,18].
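The entropy calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the small score matrix are hypothetical, $k = 1/\ln m$ is chosen so that the maximum entropy is 1, and the normalisation used for the distinguishing ability, $(1 - E_j) / \sum_j (1 - E_j)$, is the common entropy-weight form and is an assumption here.

```python
import math

def index_entropies(matrix):
    """Return E(r_j) for every column j of an m x n matrix of positive scores."""
    m = len(matrix)
    n = len(matrix[0])
    k = 1.0 / math.log(m)  # assumed choice of k, so that the maximum entropy is 1
    entropies = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        e = 0.0
        for value in column:
            p = value / total
            if p > 0:  # the term 0 * ln 0 is taken as 0
                e -= k * p * math.log(p)
        entropies.append(e)
    return entropies

def distinguishing_weights(entropies):
    """Assumed normalisation: (1 - E_j) / sum over j of (1 - E_j)."""
    deviations = [1.0 - e for e in entropies]
    total = sum(deviations)  # zero if no index distinguishes the schemes
    return [d / total for d in deviations]

# Hypothetical data: 3 schemes (rows) scored on 2 indexes (columns).
scores = [[2.0, 1.0],
          [2.0, 2.0],
          [2.0, 4.0]]
E = index_entropies(scores)
w = distinguishing_weights(E)
```

In this toy matrix the first index takes the same value for every scheme, so its entropy reaches the maximum and its weight is essentially zero, while the second index carries all of the distinguishing ability.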

Classification Model of the Distance Discriminant Analysis Method
The discriminant analysis method is typically based on sample data of each category collected in the past, from which it summarizes the rule of objective classification. This establishes specific criteria for determining which population a new sample belongs to. In discriminant analysis, the Euclidean distance does not account for the dispersion characteristics of the population distribution. Mahalanobis [15,16,19-21] first suggested the concept of the Mahalanobis distance in 1936. The basic principle is to compare the Mahalanobis distances from the sample to each population and to assign the sample to the nearest population.

Mahalanobis Distance.
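Let the population be $G = \{G_1, G_2, \ldots, G_m\}^T$, where $m$ is the dimension of the population, let a sample be $X = \{x_1, x_2, \ldots, x_m\}^T$, and set $\mu_i = E(x_i)$ ($i = 1, 2, \ldots, m$). The population mean vector is $\mu = \{\mu_1, \mu_2, \ldots, \mu_m\}^T$. The covariance matrix of the population $G$ is
$$\Sigma = E\left[(X - \mu)(X - \mu)^T\right].$$
Let $X$ and $Y$ be two samples from the population $G$. The square of the Mahalanobis distance between $X$ and $Y$ is
$$d^2(X, Y) = (X - Y)^T \Sigma^{-1} (X - Y).$$
The square of the Mahalanobis distance between the sample $X$ and the population $G$ is
$$d^2(X, G) = (X - \mu)^T \Sigma^{-1} (X - \mu).$$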
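As a rough illustration of the squared Mahalanobis distance from a sample to a population, $(X - \mu)^T \Sigma^{-1} (X - \mu)$, the following pure-Python sketch solves the linear system instead of forming the inverse explicitly. The helper names `solve` and `mahalanobis_sq`, and the mean and covariance values, are hypothetical and chosen only for the example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so that the inputs are left untouched.
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(sigma, diff)  # z = Sigma^{-1} (x - mu)
    return sum(d * zi for d, zi in zip(diff, z))

# Hypothetical population: variance 4 along the first axis, 1 along the second.
mu = [0.0, 0.0]
sigma = [[4.0, 0.0], [0.0, 1.0]]
d_a = mahalanobis_sq([2.0, 0.0], mu, sigma)  # point along the high-variance axis
d_b = mahalanobis_sq([0.0, 2.0], mu, sigma)  # point along the low-variance axis
```

Both test points lie at Euclidean distance 2 from the mean, yet the point along the low-variance axis is four times farther in squared Mahalanobis distance, which is the dispersion effect that the Euclidean distance ignores.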

Distance Discriminant of Two Populations.
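Let the means of the two populations $G_1$ and $G_2$ be $\mu_1$ and $\mu_2$, and their covariance matrices be $\Sigma_1$ and $\Sigma_2$, respectively ($\Sigma_1, \Sigma_2 > 0$). $X_{m \times 1}$ is a new sample whose population is to be determined. The squared distances from $X$ to $G_1$ and $G_2$ are $d^2(X, G_1)$ and $d^2(X, G_2)$, and $X$ is assigned according to the following criteria:
$$X \in G_1 \ \text{if} \ d^2(X, G_1) \leq d^2(X, G_2); \qquad X \in G_2 \ \text{if} \ d^2(X, G_1) > d^2(X, G_2).$$
When $\Sigma_1 = \Sigma_2 = \Sigma$, the discriminant can be simplified as follows:
$$d^2(X, G_2) - d^2(X, G_1) = 2 (X - \bar{\mu})^T a,$$
where $\bar{\mu} = (1/2)(\mu_1 + \mu_2)$ and $a = \Sigma^{-1}(\mu_1 - \mu_2)$. Note that a real number is equal to its transpose, so $(X - \bar{\mu})^T a = a^T (X - \bar{\mu})$. Set $W(X) = a^T (X - \bar{\mu})$; the discriminant rule is then
$$X \in G_1 \ \text{if} \ W(X) \geq 0; \qquad X \in G_2 \ \text{if} \ W(X) < 0.$$
In practice, the population means and covariance matrix are typically unknown. When training samples from the two populations are available, the sample means and sample covariance matrix are used in place of the population means and covariance.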
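For readers who want the intermediate step, the simplification for equal covariance matrices can be written out as follows. The notation ($d^2$ for the squared Mahalanobis distance, $\bar{\mu}$ for the midpoint of the two means, $a$ for the coefficient vector, $W$ for the discriminant function) is assumed for this sketch; the only property used is that $\Sigma^{-1}$ is symmetric.

```latex
\begin{aligned}
d^2(X, G_2) - d^2(X, G_1)
  &= (X - \mu_2)^T \Sigma^{-1} (X - \mu_2) - (X - \mu_1)^T \Sigma^{-1} (X - \mu_1) \\
  &= 2 X^T \Sigma^{-1} (\mu_1 - \mu_2) - (\mu_1 + \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) \\
  &= 2 \left( X - \tfrac{1}{2}(\mu_1 + \mu_2) \right)^T \Sigma^{-1} (\mu_1 - \mu_2) \\
  &= 2 (X - \bar{\mu})^T a = 2\, W(X).
\end{aligned}
```

The quadratic term $X^T \Sigma^{-1} X$ cancels, which is why the rule reduces to the sign of a function that is linear in $X$.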
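From the training data, only the two sample covariance matrices, $S_1$ and $S_2$, can be determined. When the two population covariance matrices are equal, the common covariance matrix is therefore estimated by the pooled matrix
$$S = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2},$$
where $n_1$ and $n_2$ are the sizes of the two samples. When $\Sigma_1 \neq \Sigma_2$, the discriminant function is taken as
$$W(X) = d^2(X, G_2) - d^2(X, G_1),$$
and the discriminant rule is $X \in G_1$ if $W(X) \geq 0$ and $X \in G_2$ if $W(X) < 0$. For $k$ populations $G_1, G_2, \ldots, G_k$, the discriminant rule is $X \in G_i$ if $d^2(X, G_i) = \min_{1 \leq j \leq k} d^2(X, G_j)$.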
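A minimal sketch of the two-population rule with equal covariance matrices is given below, assuming that the common covariance is estimated by pooling the two unbiased sample covariance matrices weighted by $n_1 - 1$ and $n_2 - 1$. All function names and the two small training sets are hypothetical; the code only illustrates the linear discriminant function, built from the sample means and the pooled covariance, whose sign decides the population.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def mean_vector(samples):
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]

def covariance(samples):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n = len(samples)
    mu = mean_vector(samples)
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for x in samples:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (x[i] - mu[i]) * (x[j] - mu[j]) / (n - 1)
    return cov

def pooled_covariance(s1, s2, n1, n2):
    """Assumed pooling: ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2)."""
    dim = len(s1)
    return [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
             for j in range(dim)] for i in range(dim)]

def make_discriminant(train1, train2):
    """Return W(x) = a^T (x - mu_bar) built from two training samples."""
    mu1, mu2 = mean_vector(train1), mean_vector(train2)
    s = pooled_covariance(covariance(train1), covariance(train2),
                          len(train1), len(train2))
    a = solve(s, [p - q for p, q in zip(mu1, mu2)])  # a = S^{-1} (mu1 - mu2)
    mu_bar = [(p + q) / 2 for p, q in zip(mu1, mu2)]

    def w(x):
        return sum(ai * (xi - mi) for ai, xi, mi in zip(a, x, mu_bar))
    return w

# Hypothetical training samples for two populations in two dimensions.
g1 = [[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [2.0, 2.0]]
g2 = [[5.0, 5.0], [6.0, 7.0], [7.0, 6.0], [6.0, 6.0]]
w = make_discriminant(g1, g2)
label_a = "G1" if w([2.5, 2.0]) >= 0 else "G2"
label_b = "G1" if w([5.0, 6.0]) >= 0 else "G2"
```

A new sample is assigned to the first population when the discriminant value is non-negative and to the second otherwise; the midpoint of the two sample means lies exactly on the boundary.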

Evaluation of Discrimination Criteria.
To investigate the properties of the above-mentioned criterion, the back substitution estimation method, based on the training samples, is used to estimate the misjudgment error.
Two populations, $G_1$ and $G_2$, are used, and the training sample from population $G_i$ ($i = 1, 2$) is
$$X_1^{(i)}, X_2^{(i)}, \ldots, X_{n_i}^{(i)},$$
where $n_i$ is the number of samples taken from $G_i$, so that the sizes of the two training samples are $n_1$ and $n_2$.

Evaluation Index System of Highway Collapse Risk
The factors contributing to the risk of collapse are varied and complex [22-28]. We propose here a highway collapse hazard evaluation system based on 15 indexes. Each index is graded on a four-level scale: I (extremely high risk), II (high risk), III (moderate risk), and IV (low risk). The specific grading standards and descriptions are listed in Table 1, and the classification of series is listed in Table 2.

Typical Examples of Collapse along the Yingxiu-Wolong Highway following an Earthquake
The Yingxiu-Wolong highway (highway S303) is located in Wenchuan County, Sichuan Province, along the northwest margin of the Sichuan Basin. The highway is 45.5 km long and is an important trunk road. It was fully paved prior to the Wenchuan earthquake of 12 May 2008. The highway follows the Longmenshan tectonic belt from the Beichuan-Yingxiu Fault to the Houshan Fault, and the geological conditions are complex. It is the nearest highway to the epicenter in the Wenchuan disaster area, and seismic hazards in the region can cause serious damage. Many rock mass collapses caused by the earthquake buried and damaged the road itself, bridges, and a tunnel entrance and exit. Because research after the earthquake focused on disaster assessment and the prevention of further slope collapses along the highway, a wealth of data regarding collapse risk has been accumulated. The collapse data used here were collected along the Yingxiu-Wolong highway after the earthquake [13,23]. Fifteen collapses were chosen and assigned values for each influencing factor. Each qualitative index is valued by the classification standard quantitative method: the index is divided into four categories, I, II, III, and IV, indicating extremely high, high, moderate, and low risk, and these four grades are assigned the numerical values 4, 3, 2, and 1, respectively. Each quantitative index is valued by its measured value. The basic evaluation data of each collapse are listed in Table 3.
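The valuation of the indexes described above (grades I to IV mapped to 4, 3, 2, and 1, with quantitative indexes kept at their measured values) could be encoded along the following lines. The field names and the sample record are hypothetical and are not taken from Table 3.

```python
# Grade-to-value mapping as stated in the text: I -> 4, II -> 3, III -> 2, IV -> 1.
GRADE_TO_VALUE = {"I": 4, "II": 3, "III": 2, "IV": 1}

def encode_collapse(record):
    """Replace grade labels by their numerical values; keep measured values."""
    encoded = {}
    for name, value in record.items():
        if isinstance(value, str):
            encoded[name] = GRADE_TO_VALUE[value]  # qualitative index
        else:
            encoded[name] = float(value)           # quantitative index
    return encoded

# Hypothetical record: field names and values are illustrative only.
sample = {"strata_lithology": "II", "weathering_degree": "I", "slope_height_m": 35}
encoded = encode_collapse(sample)
```

Keeping the mapping in a single dictionary makes it harder for the grading direction (a larger value means a higher risk) to be reversed by accident for individual indexes.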
The collapse hazard distinguishing abilities of indexes 5, 8, 10, 11, 14, and 15 (Table 5) are small, so these indexes can be considered for exclusion. These six indexes are therefore removed from the evaluation model. Because they are no longer among the most important indexes at the time of evaluation, they can instead be used as "threshold" or "critical value" conditions for the preliminary screening of alternatives. Based on the above analysis, the nine remaining indexes form the basis of the evaluation index system of highway collapse risk. The investigation statistics data of the index system after reduction are listed in Table 6.
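The index reduction step can be pictured as a simple filter on the distinguishing ability of each index. The sketch below is only illustrative: the ability values and the cutoff of 0.05 are invented for the example and do not reproduce Table 5, and the text does not state that a fixed numerical cutoff was used.

```python
def reduce_indexes(abilities, threshold):
    """Split index numbers into kept and excluded by distinguishing ability."""
    kept = sorted(i for i, v in abilities.items() if v >= threshold)
    excluded = sorted(i for i, v in abilities.items() if v < threshold)
    return kept, excluded

# Hypothetical distinguishing abilities for five indexes (not the paper's data).
abilities = {1: 0.31, 2: 0.04, 3: 0.27, 4: 0.02, 5: 0.36}
kept, excluded = reduce_indexes(abilities, threshold=0.05)
```

The excluded indexes are returned rather than discarded so that they remain available for use as screening conditions.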

Distance Discriminant Analysis Model for Collapse Hazard
Level Discriminant. Of the 15 collapse records, 10 were used as training samples and the remaining 5 were treated as unknown samples to be classified. Nine factors (slope shape, aspect, gradient, and height, exposed structural face, strata lithology, relationship between weakness face and free face, rainfall erosion, and weathering degree of rock) were taken as the discrimination factors for the distance discriminant analysis model. The collapse hazard was divided into four levels (I, II, III, and IV), indicating extremely high, high, moderate, and low risk, respectively.
The I, II, III, and IV collapse risk categories were treated as four different populations.
The ten training samples from the study area were classified by back substitution and compared with the actual collapse risk types. In all ten cases the discriminated type matched the actual type, so the back substitution error rate was zero. These results support the reliability of the distance discriminant model after training.
The five groups of measured data held out as discriminant samples were then classified by distance discriminant prediction using statistical software, and the predictions were compared with the actual situation (Table 7), obtained through field investigation and comprehensive analysis. The predictions agree with the observed results, giving an accuracy rate of 100%. The results indicate that the risk level prediction and identification method of collapse hazard established in this study is effective.

Conclusions
(1) This distance discriminant analysis model for collapse hazard prediction is based on results from previous works and considers factors influencing hazard uncertainty and information entropy.
(2) We have used entropy measurement to reduce the number of indexes by excluding irrelevant indexes, thereby optimizing the index system. The nine principal indexes affecting collapse activity were extracted as the discriminant factors of the distance discriminant analysis model. These indexes are slope shape, aspect, gradient, and height, along with exposed structural face, strata lithology, relationship between weakness face and free face, vegetation cover rate, and weathering degree of rock.
(3) Results show that the model achieves the aim of index optimization and has good learning performance, a zero error rate on the training samples, and high prediction accuracy. It is an effective method for the prediction of collapse hazard and provides a new way to evaluate the risk of collapse.

3.3. Multipopulation Distance Discrimination. Let the $m$-dimensional populations $G_1, G_2, \ldots, G_k$ have mean vectors $\mu_1, \mu_2, \ldots, \mu_k$ and covariance matrices $\Sigma_1, \Sigma_2, \ldots, \Sigma_k$. The square of the Mahalanobis distance from sample $X$ to each population is then
$$d^2(X, G_i) = (X - \mu_i)^T \Sigma_i^{-1} (X - \mu_i), \qquad i = 1, 2, \ldots, k.$$
In the back substitution method for two populations with training sample sizes $n_1$ and $n_2$, all $n_1 + n_2$ training samples are treated as new samples and substituted into the established discriminant criteria to determine their membership, a process known as back discrimination. $n_{12}$ denotes the number of samples from population $G_1$ that are misjudged as belonging to $G_2$, and $n_{21}$ denotes the number of samples from population $G_2$ that are misjudged as belonging to $G_1$. The back substitution estimate of the error rate is
$$\hat{\alpha} = \frac{n_{12} + n_{21}}{n_1 + n_2}.$$
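The nearest-population rule and the back substitution error estimate can be combined in a short sketch. To keep the example self-contained it uses one-dimensional populations, for which the squared Mahalanobis distance reduces to the squared deviation from the mean divided by the variance. The population parameters, the training values, and the function names are hypothetical, and the parameters are fixed by hand rather than estimated from the training data.

```python
def nearest_population(sq_distances):
    """Return the label of the population with the smallest squared distance."""
    return min(sq_distances, key=sq_distances.get)

def back_substitution_error(training, classify):
    """Fraction of training samples that the rule assigns to the wrong population."""
    misjudged = 0
    total = 0
    for label, samples in training.items():
        for x in samples:
            total += 1
            if classify(x) != label:
                misjudged += 1
    return misjudged / total

# Hypothetical one-dimensional populations given as (mean, variance).
params = {"G1": (1.0, 1.0), "G2": (5.0, 1.0)}

def classify(x):
    # In one dimension the squared Mahalanobis distance is (x - mean)^2 / variance.
    distances = {name: (x - mu) ** 2 / var for name, (mu, var) in params.items()}
    return nearest_population(distances)

# Hypothetical training samples; the value 3.5 in G1 lies closer to G2.
training = {"G1": [0.5, 1.0, 1.5, 3.5], "G2": [4.5, 5.0, 5.5, 6.0]}
error = back_substitution_error(training, classify)
```

Here one of the eight training samples is misjudged, so the back substitution estimate is 1/8. Because the same samples are used for fitting and for checking, this estimate tends to be optimistic compared with the error on new data.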

Table 1: Evaluation factors and grading standard of highway collapse risk.

Table 3: Evaluation index values.

Table 5: Distinguishing ability of index systems.

Table 6: Survey statistics data of the index system after reduction.

Table 7: Results of the distance discriminant analysis method.
5.3. Test of the Discriminant Model. Ten groups of measured data were used as training samples and five groups as discriminant samples, so that 15 samples in total were analyzed with the distance discriminant model. The ten training samples were classified and compared with the actual results (Table 7).