A Combined Prediction Model for Subgrade Settlement Based on Improved Set Pair Analysis

Prediction of subgrade settlement is a complex problem involving various uncertainty factors. To overcome the defects and limitations of single prediction models, a combined prediction model based on improved set pair analysis was proposed. The model takes into account both the uncertainty and the certainty of each single prediction model, makes the combined prediction based on the certainty degree, and optimizes the criterion of set pair relationship. In the model, set pairs were first constructed to express the relationship between predicted and measured values. The risk of set pair relationship identification was then expressed using Bayesian decision theory, and the optimal criterion of set pair relationship was obtained with an adaptive search algorithm. Next, the relationship between each prediction model and the measured data was analyzed to obtain its certainty degree, from which the weight coefficient was derived. Finally, the single prediction models were combined. A case study and a comparison with other methods were conducted to confirm the reliability and validity of the proposed model. The results show that the model fully considers the uncertainty and certainty of the single prediction models and extends the method of determining the criterion of set pair relationship, which provides ideas for other combined prediction and evaluation problems.


Introduction
China's railway network is entering a new stage. Whether in high-speed railways, which aim to raise operating speed, or in heavy-haul railways, which aim to raise the axle load of locomotives and vehicles, stringent requirements are imposed on subgrade stability. The post-construction settlement of railway subgrade directly affects the safety and stability of train operation and the service life of the line; accurate settlement prediction is therefore important for the safety evaluation, reinforcement, and maintenance of the subgrade, and it has long been a major concern in engineering research.
Current prediction models for subgrade settlement are mainly divided into single prediction models and combined prediction models. Single prediction models fall into three categories: (1) curve fitting methods, such as the exponential curve method, hyperbolic method, Asaoka method, and Poisson method [1][2][3][4][5]; (2) back-analysis methods, such as layer-by-layer iterative back analysis, the modified layer-summation method based on elastoplastic theory, and the viscoelastic Biot consolidation FEM based on the Merchant model [6][7][8][9]; and (3) system theory methods, such as time series analysis, grey theory, and neural networks [10][11][12][13]. However, single prediction methods are often unsatisfactory. The parameters in curve fitting and system theory methods have no definite physical meaning, and both methods analyze the settlement law of the subgrade from only one aspect, so part of the information contained in the data is inevitably ignored or lost [14]. For back-analysis methods, solving for the constitutive model parameters requires a large amount of experimental and measured data, and the validity of these data strongly influences the accuracy of the method, which limits its application. In summary, single prediction models have certain defects and limitations.
To analyze subgrade settlement comprehensively and combine the advantages of single prediction models, combined prediction models were introduced [15][16][17]. Combined prediction reduces predictive risk and improves accuracy, and it has been widely applied and developed in settlement prediction. Leng [18] and Wu [19] proposed combined prediction models based on the correlation coefficient, the reciprocal of variance, and the fitting error, and introduced the MSE, MAE, and MAP indicators to evaluate prediction performance. Zheng [20] and Zhao [21] proposed combined prediction models based on residuals and discussed criteria for selecting the basic prediction models. Unlike these methods, Li [22,23] argued that a combination based on squared errors or absolute deviations is not necessarily optimal and proposed a superior combined prediction model with maximum efficiency.
The weight coefficients are the key to a combined prediction model [16]. In the existing literature, the weights are mostly based on the squares of residuals and dispersion, the validity, or the fitting error, and there is no uniform method for determining them. Moreover, these combined prediction models often neglect the relationship between the information contained in the single prediction models during combination, which limits in-depth mining of the data and prevents full use of the available information in the combined prediction.
Subgrade settlement and its prediction form a complex, uncertain system problem affected by many factors. Set pair analysis is a novel method for analyzing systematic problems of uncertainty [24]. It treats certainty and uncertainty dialectically as a whole through identity-discrepancy-contrary analysis, which allows it to describe complex and uncertain problems comprehensively. The theory has been applied to rating, risk assessment, and correlation analysis in civil engineering [25][26][27], and it also offers new methods and ideas for studying the uncertain system problem of subgrade settlement and its prediction.
The set pair relationship is the key to set pair analysis. Among existing methods for determining the identity-discrepancy-contrary relationship, the subjective experience method lacks a scientific basis, and the association membership method is difficult to apply. The mean standard deviation method and the trisection method are commonly used to determine the criterion of set pair relationship [28][29][30], but they only segment the sample numerically and do not adequately reflect the information and internal relationships in the data; as a result, the information contained in the single models is not fully utilized or optimally combined.
The main objective of this paper is to introduce a combined prediction model for subgrade settlement based on set pair analysis and decision theory. In the proposed model, the relationship between the information from each prediction model is fully considered through set pair analysis, and the criterion of set pair relationship is determined according to Bayesian decision theory and obtained using a self-adaption algorithm. The weight coefficients are obtained from the certainty degrees, and the information from each prediction model is optimally combined. The feasibility and validity of the proposed method are then examined in a case study of the subgrade settlement of the Shuohuang heavy-haul railway, followed by a comparison with single prediction models and another combined prediction model. The proposed model also provides new ideas for other combined prediction and evaluation problems.

Theory
2.1. Set Pair Analysis.
Set pair analysis is a theory proposed by Zhao for systematic problems of uncertainty [24,27,31]. The theory uses the certainty degree $\mu$ to describe and analyze quantitatively the relationship between the certainty and uncertainty of systematic problems. The mathematical model of the certainty degree $\mu$ is

$$\mu = a + b\,i + c\,j \tag{1}$$

where $\mu$ is the quantitative expression of the set pair relationship, and $a$, $b$, and $c$ are the identity degree, discrepancy degree, and contrary degree, respectively, with $a + b + c = 1$. The parameter $i$ is the discrepancy coefficient, which takes values in $[-1, 1]$, and the parameter $j$ is the contrary coefficient, which is generally specified as $-1$.
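As a quick numerical illustration of formula (1), take hypothetical degrees $a = 0.6$, $b = 0.3$, and $c = 0.1$ (so that $a + b + c = 1$), together with $i = 0.5$ and $j = -1$; these numbers are chosen only for the example:

$$\mu = 0.6 + 0.3 \times 0.5 + 0.1 \times (-1) = 0.6 + 0.15 - 0.1 = 0.65$$

A larger share of identity pushes $\mu$ toward 1, while a larger share of contrary pushes it toward $-1$.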

Bayesian Decision Theory.
Bayesian decision theory provides a framework for making decisions with the least expected risk or cost under probabilistic uncertainty. It has been applied in many research fields, such as image processing and machine learning [32,33]. For a given object $x$, let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_s\}$ be a finite set of $s$ states, and let $A = \{a_1, a_2, \ldots, a_m\}$ be a finite set of $m$ possible actions. Let $P(\omega_j \mid x)$ be the conditional probability of an object being in state $\omega_j$ given that it is described by $x$, and let $\lambda(a_i \mid \omega_j)$ denote the loss, or cost, of taking action $a_i$ when the state is $\omega_j$. For an object $x$, suppose action $a_i$ is taken. Since $P(\omega_j \mid x)$ is the probability that the true state is $\omega_j$ given $x$, the expected loss associated with taking action $a_i$ is

$$R(a_i \mid x) = \sum_{j=1}^{s} \lambda(a_i \mid \omega_j)\, P(\omega_j \mid x) \tag{2}$$
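To make formula (2) concrete, the following minimal sketch computes the expected loss of each action and picks the one with the smallest value. The dictionary layout, the function names, and the two-state, two-action loss table are hypothetical and serve only as an illustration of the general minimum-risk rule, not of the paper's own implementation.

```python
def expected_loss(action, loss, posterior):
    """Formula (2): R(a|x) = sum over states w of loss(a|w) * P(w|x)."""
    return sum(loss[action][state] * prob for state, prob in posterior.items())


def min_risk_action(loss, posterior):
    """Return the action with the smallest expected loss, and that loss."""
    risks = {action: expected_loss(action, loss, posterior) for action in loss}
    best = min(risks, key=risks.get)
    return best, risks[best]


if __name__ == "__main__":
    # Hypothetical problem: loss[action][state], posterior[state] = P(state | x).
    loss = {
        "a1": {"w1": 0.0, "w2": 4.0},
        "a2": {"w1": 1.0, "w2": 0.0},
    }
    posterior = {"w1": 0.7, "w2": 0.3}
    print(min_risk_action(loss, posterior))  # ('a2', 0.7)
```

Here $R(a_1 \mid x) = 4.0 \times 0.3 = 1.2$ and $R(a_2 \mid x) = 1.0 \times 0.7 = 0.7$, so $a_2$ is the minimum-risk action even though $a_1$ costs nothing when the more probable state $\omega_1$ occurs.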

Combined Prediction Model Based on Set Pair Analysis
The combined prediction of subgrade settlement is essentially an analysis of the relationship between each prediction model and the measured data. The proposed model uses the certainty degree $\mu$ to express the certainty and uncertainty of the prediction information of each model, and the certainty degree is obtained from the set pairs and the criterion of set pair relationship. First, set pairs are constructed using the absolute error $\Delta_k$, as shown below:

$$\Delta_k = \{\Delta_{k1}, \Delta_{k2}, \ldots, \Delta_{kn}\}, \qquad \Delta_{kt} = \lvert y_{kt} - x_t \rvert, \quad t = 1, 2, \ldots, n \tag{3}$$

where $X = \{x_1, x_2, \ldots, x_n\}$ is the measured settlement of the monitoring points and $Y_k = \{y_{k1}, y_{k2}, \ldots, y_{kn}\}$ is the predicted settlement of the $k$th prediction model. Set pairs are constructed from the absolute errors between the predicted and measured values, and the absolute errors are then normalized for subsequent analysis; the normalized value of $\Delta_{kt}$ is denoted $p_{kt}$.
The absolute error $\Delta_{kt}$ and the normalized value $p_{kt}$ express the accuracy and validity of the predicted value, as well as the set pair relationship between the predicted and measured values. If the error is less than a limit value $\theta_1$, the prediction model is considered to predict well at this point, and the predicted value is in an identity relationship with the measured value. If the error is greater than a limit value $\theta_2$, the prediction is poor, and the predicted value is in a contrary relationship with the measured value. Otherwise, the prediction is fair, and the predicted and measured values are in a discrepancy relationship. Let the total number of samples be $N$, and let $S_k$, $F_k$, and $P_k$ be the numbers of samples in which the relationship between the predicted values of model $k$ and the measured values is identity, discrepancy, and contrary, respectively ($S_k + F_k + P_k = N$). The set pair relationship between the measured values and the predicted values of model $k$ can then be expressed by the certainty degree $\mu_k$:

$$\mu_k = \frac{S_k}{N} + \frac{F_k}{N}\,i + \frac{P_k}{N}\,j \tag{5}$$

The certainty degree $\mu_k$ comprehensively reflects the accuracy and effectiveness of prediction model $k$: the larger $\mu_k$, the better the prediction and the larger the weight coefficient of the model. The certainty degree can therefore serve as the basis for determining the weights. Normalizing the certainty degrees gives the weights $w_k$, and the combination of the $K$ single prediction models is

$$w_k = \frac{\mu_k}{\sum_{k=1}^{K} \mu_k} \tag{6}$$

$$\hat{y}_t = \sum_{k=1}^{K} w_k\, y_{kt}, \quad t = 1, 2, \ldots, n \tag{7}$$
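The pipeline of formulas (3) and (5) to (7) can be sketched end to end as follows. Two points are assumptions rather than details taken from this paper: the normalization step is implemented as min-max scaling over the errors of all models together (the exact normalization formula is not reproduced here), and the thresholds are applied to the normalized error in the form of criterion (15), defaulting to the values $\beta = 0.385$ and $\alpha = 0.633$ found in the case study, with $i = 0.5$ and $j = -1$. The function names and the three toy models are hypothetical.

```python
def absolute_errors(measured, predicted):
    """Formula (3): absolute error of each predicted value against the measured value."""
    return [abs(y - x) for x, y in zip(measured, predicted)]


def normalize(all_errors):
    """ASSUMPTION: min-max scaling over the errors of all models together."""
    flat = [e for errors in all_errors for e in errors]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # guard against all errors being equal
    return [[(e - lo) / span for e in errors] for errors in all_errors]


def relation(p, beta, alpha):
    """Identity if p <= beta, contrary if p >= alpha, otherwise discrepancy."""
    if p <= beta:
        return "identity"
    if p >= alpha:
        return "contrary"
    return "discrepancy"


def certainty_degree(ps, beta, alpha, i=0.5, j=-1.0):
    """Formula (5): mu = S/N + (F/N) i + (P/N) j."""
    relations = [relation(p, beta, alpha) for p in ps]
    n = len(relations)
    s = relations.count("identity")
    f = relations.count("discrepancy")
    c = relations.count("contrary")
    return s / n + f / n * i + c / n * j


def combine(measured, predictions, beta=0.385, alpha=0.633):
    """Certainty degrees, weights (formula (6)), and combined prediction (formula (7))."""
    errors = [absolute_errors(measured, pred) for pred in predictions]
    ps = normalize(errors)
    mus = [certainty_degree(p, beta, alpha) for p in ps]
    total = sum(mus)  # assumes the certainty degrees sum to a positive value
    weights = [mu / total for mu in mus]
    combined = [sum(w * pred[t] for w, pred in zip(weights, predictions))
                for t in range(len(measured))]
    return mus, weights, combined


if __name__ == "__main__":
    measured = [10.0, 12.0, 14.0, 16.0]
    predictions = [
        [10.0, 12.0, 14.0, 16.5],  # hypothetical model A
        [11.0, 12.0, 14.0, 16.0],  # hypothetical model B
        [10.0, 14.0, 14.0, 16.0],  # hypothetical model C
    ]
    mus, weights, combined = combine(measured, predictions)
    print(mus)  # [1.0, 0.875, 0.5]
```

In this toy example, model A has only identity samples, model B has one discrepancy sample, and model C has one contrary sample, so their certainty degrees, and therefore their weights, decrease in that order. Note that formula (6) yields meaningful weights only when the certainty degrees sum to a positive value.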

Criterion of Set Pair Relationship Based on Bayesian Decision Theory
As the set pair analysis above shows, the set pair relationship is the key. For the settlement problem in this paper, however, the criterion of set pair relationship is unknown. The mean standard deviation method and the trisection method are commonly used for systematic problems of uncertainty in which the criterion of set pair relationship is unknown, but both determine the criterion numerically and subjectively and ignore the distribution and intrinsic relationships of the sample data. This paper therefore introduces a new method that determines the optimal criterion of set pair relationship using minimum-risk Bayesian decision theory.
To apply Bayesian decision theory, the meaning of the normalized absolute error $p$ is explained here. The normalized absolute error $p$ has the form and meaning of a probability in Bayesian theory. The smaller the normalized absolute error, the better the prediction and the greater the probability that the predicted value is in an identity relationship with the measured value. Conversely, the greater the error, the worse the prediction and the greater the probability that the predicted value is in a contrary relationship with the measured value. Between these two cases, the relationship between the predicted and measured values is discrepancy. The normalized absolute error $p$ can therefore be used as the subjective probability in Bayesian decision theory for risk assessment and parameter decision-making, and ultimately to determine the criterion of set pair relationship.
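For a given object $x$, let $\Omega = \{X, \neg X\}$ be the set of two states, and let $A = \{a_I, a_D, a_C\}$ be the set of three possible actions, where $a_I$, $a_D$, and $a_C$ represent classifying the object $x$ into the identity, discrepancy, and contrary relations, respectively. Let $p$ be the conditional probability that the object $x$ is in state $X$; the conditional probability that $x$ is in state $\neg X$ is then $1 - p$. The expected loss of each individual action and the expected loss $R$ of taking actions on all samples can be written as

$$\begin{aligned} R(a_I \mid x) &= \lambda_{IP}\,p + \lambda_{IN}\,(1 - p) \\ R(a_D \mid x) &= \lambda_{DP}\,p + \lambda_{DN}\,(1 - p) \\ R(a_C \mid x) &= \lambda_{CP}\,p + \lambda_{CN}\,(1 - p) \end{aligned} \tag{8}$$

$$R = \sum_{x} R\big(a(x) \mid x\big) \tag{9}$$

where $a(x) \in A$ is the action taken on sample $x$; $R(a_I \mid x)$, $R(a_D \mid x)$, and $R(a_C \mid x)$ are the expected losses of classifying the object $x$ into the identity, discrepancy, and contrary relations; $\lambda_{IP}$, $\lambda_{DP}$, and $\lambda_{CP}$ denote the losses of taking actions $a_I$, $a_D$, and $a_C$, respectively, when the object $x$ in fact belongs to $X$; and $\lambda_{IN}$, $\lambda_{DN}$, and $\lambda_{CN}$ denote the losses of taking the same actions when the object $x$ does not belong to $X$.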
When the action taken on each object matches its actual state, the expected loss is minimal and the criterion of set pair relationship is the most reasonable. The principle for determining the set pair relationship can therefore be expressed as the following minimum-risk decision rules:

$$\begin{aligned} &\text{(I) if } R(a_I \mid x) \le R(a_D \mid x) \text{ and } R(a_I \mid x) \le R(a_C \mid x), \text{ classify } x \text{ as identity;} \\ &\text{(D) if } R(a_D \mid x) \le R(a_I \mid x) \text{ and } R(a_D \mid x) \le R(a_C \mid x), \text{ classify } x \text{ as discrepancy;} \\ &\text{(C) if } R(a_C \mid x) \le R(a_I \mid x) \text{ and } R(a_C \mid x) \le R(a_D \mid x), \text{ classify } x \text{ as contrary.} \end{aligned} \tag{10}$$

To simplify the decision rules, we examine the relationship between the loss functions. Consider a special kind of loss function with $\lambda_{IP} < \lambda_{DP} < \lambda_{CP}$ and $\lambda_{CN} < \lambda_{DN} < \lambda_{IN}$. That is, the loss of classifying an object $x$ that belongs to $X$ into the identity relation is less than the loss of classifying it into the discrepancy relation, and both losses are strictly less than the loss of classifying $x$ into the contrary relation; the reverse order of losses applies to an object $x$ that does not belong to $X$. Substituting formula (8) into formula (10) then yields the principle for determining the set pair relationship expressed in $p$, principle (11), which can be written in the simpler threshold form (12), with the thresholds $\alpha$, $\beta$, and $\gamma$ defined by the loss functions in formula (13). Furthermore, assume that the cost of classifying an object $x$ into the discrepancy relation is closer to the cost of a correct classification than to the cost of an incorrect one:

$$(\lambda_{IN} - \lambda_{DN})(\lambda_{CP} - \lambda_{DP}) > (\lambda_{DP} - \lambda_{IP})(\lambda_{DN} - \lambda_{CN})$$

Here, $(\lambda_{IN} - \lambda_{DN})(\lambda_{CP} - \lambda_{DP})$ is the product of the differences between the cost of an incorrect action and the cost of classifying an object into the discrepancy relation, and $(\lambda_{DP} - \lambda_{IP})(\lambda_{DN} - \lambda_{CN})$ is the product of the differences between the cost of classifying an object into the discrepancy relation and the cost of a correct action. From these relationships between the loss functions, it follows that $0 \le \beta < \gamma < \alpha \le 1$, and principle (12) simplifies to the principle below. Here, a sample at a critical point is classified into the identity or contrary relation, not the discrepancy relation.

$$\begin{cases} p \le \beta, & \text{identity} \\ \beta < p < \alpha, & \text{discrepancy} \\ p \ge \alpha, & \text{contrary} \end{cases} \tag{15}$$
Minimizing the expected loss $R$ involves the six loss functions and the parameters $\alpha$, $\beta$, and $\gamma$, and the six loss functions can be expressed in terms of $\alpha$, $\beta$, and $\gamma$. We therefore make the following assumptions and simplifications. Among the six loss functions, the loss $\lambda_{IP}$ of classifying an object $x$ that belongs to $X$ into the identity relation should be 0, and the loss $\lambda_{CN}$ of classifying an object $x$ that does not belong to $X$ into the contrary relation should also be 0. Setting one of the remaining losses equal to 1, the six loss functions reduce to three, as given in formula (16). Assume that the numbers of samples classified into the identity, discrepancy, and contrary relations are $N_I$, $N_D$, and $N_C$, respectively. Substituting formula (16) into formulas (8) and (9) expresses the expected loss $R$ caused by the actions on all objects as formula (17). When $R$ is minimal, the set pair relationship assigned to every sample is the most appropriate, and $(\alpha, \beta, \gamma)$ is the optimal threshold for determining the criterion of set pair relationship with formula (15).

Solution of Optimal Threshold Based on Self-Adaption Algorithm
Minimizing the expected loss function $R$ involves three parameters: $\alpha$, $\beta$, and $\gamma$. A traditional direct solution involves a large solution space and heavy computation. Self-adaption algorithms automatically adjust processing parameters and constraints according to the characteristics of the data [34][35][36]; they are widely used to find optimal values and determine parameters, and they offer a way to solve for the criterion of set pair relationship. Because the parameter $\gamma$ does not enter the criterion, it appears only in the expected loss function.
To simplify the algorithm, the parameter $\gamma$ is fixed in advance. In the absence of domain-specific knowledge, the loss of classifying an object $x$ that actually belongs to $X$ as not belonging to $X$ is taken to be equal to the loss of classifying an object $x$ that does not belong to $X$ as belonging to $X$, that is, $\lambda_{CP} = \lambda_{IN}$. Then $\gamma = 0.5$ follows from formula (16). The algorithm is briefly described as follows.
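The value $\gamma = 0.5$ can also be checked directly from the expected-loss form of formula (8), assuming $\gamma$ plays the role it has in standard three-way decision theory, namely the value of $p$ at which classifying a sample as identity and classifying it as contrary carry equal expected loss. Write $\lambda_{IN}$ for the loss of classifying as identity a sample that does not belong to $X$ and $\lambda_{CP}$ for the loss of classifying as contrary a sample that belongs to $X$, and take the two correct-classification losses $\lambda_{IP}$ and $\lambda_{CN}$ as 0, as in the simplification above. Then

$$R(a_I \mid x) = \lambda_{IN}\,(1 - p), \qquad R(a_C \mid x) = \lambda_{CP}\,p$$

$$\lambda_{IN}\,(1 - \gamma) = \lambda_{CP}\,\gamma \;\Longrightarrow\; \gamma = \frac{\lambda_{IN}}{\lambda_{IN} + \lambda_{CP}}$$

which equals $1/2$ when $\lambda_{CP} = \lambda_{IN}$.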
Step 3. On the premise that $\alpha \ge \beta$, replace the parameters $\alpha$ and $\beta$ in turn with $p_1$, and calculate the corresponding expected loss $R$. Compare the expected losses obtained for the different $(\alpha, \beta)$ pairs, and take the pair $(\alpha, \beta)$ that gives the minimum expected loss $R$ as the optimal parameter values.
Step 4. Repeat Steps 2 and 3 for $p_2$, and so on, until every element of the set $\mathcal{P}$ of normalized absolute errors has been processed, and select the final $(\alpha, \beta)$ to determine the criterion of set pair relationship.
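Because Steps 1 and 2 and the closed form of formula (17) are not reproduced here, the sketch below substitutes a plain exhaustive search for the self-adaption algorithm: every pair $\beta \le \alpha$ drawn from the normalized errors themselves is tried, each sample is classified with criterion (15), and the pair with the smallest total expected loss (in the form of formulas (8) and (9)) is kept. The loss table is hypothetical, not taken from the paper, and the sketch reads $p$ as the probability that a sample is a poor fit, which is the reading under which criterion (15) assigns small errors to the identity relation.

```python
from itertools import combinations_with_replacement

# HYPOTHETICAL loss table (not from the paper). p is read as the probability
# that a sample is a poor fit; each entry is
# LOSS[action] = (loss if the sample is a poor fit, loss if it fits well).
LOSS = {
    "identity": (1.0, 0.0),
    "discrepancy": (0.4, 0.3),
    "contrary": (0.0, 1.0),
}


def relation(p, beta, alpha):
    """Criterion (15): identity if p <= beta, contrary if p >= alpha."""
    if p <= beta:
        return "identity"
    if p >= alpha:
        return "contrary"
    return "discrepancy"


def expected_loss(p, action, loss=LOSS):
    """Expected loss of one action for one sample, in the form of formula (8)."""
    if_poor, if_good = loss[action]
    return if_poor * p + if_good * (1.0 - p)


def total_loss(ps, beta, alpha, loss=LOSS):
    """Total expected loss over all samples, in the form of formula (9)."""
    return sum(expected_loss(p, relation(p, beta, alpha), loss) for p in ps)


def search_thresholds(ps, loss=LOSS):
    """Exhaustively try every candidate pair beta <= alpha drawn from the samples."""
    candidates = sorted(set(ps))
    best = None
    # On a sorted list, each generated pair (beta, alpha) satisfies beta <= alpha.
    for beta, alpha in combinations_with_replacement(candidates, 2):
        r = total_loss(ps, beta, alpha, loss)
        if best is None or r < best[0]:
            best = (r, beta, alpha)
    return best  # (minimum total expected loss, beta, alpha)


if __name__ == "__main__":
    ps = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.95]  # hypothetical
    r, beta, alpha = search_thresholds(ps)
    print(f"beta = {beta}, alpha = {alpha}, total loss = {r:.2f}")
```

With these hypothetical losses and samples, the search returns $\beta = 0.3$ and $\alpha = 0.7$ with a total expected loss of 2.25. The brute-force loop evaluates $O(n^2)$ pairs at $O(n)$ cost each, which is acceptable for a few dozen samples; an adaptive algorithm becomes worthwhile only for much larger sample sets.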

Case Study
To verify the validity and feasibility of the proposed model, settlement data from the subgrade of the Shuohuang heavy-haul railway were analyzed. The monitored subgrade is located in a bridge-subgrade transition section. The ballast thickness on the bridge and in the transition section is about 1.45 m and 0.5 m, respectively. The subgrade filler is a silty clay with low liquid limit. The profile of the transition section is shown in Figure 1.
The settlement data were collected along the longitudinal direction of the transition section. To prevent the monitoring data from being affected by the vibration of passing trains, only data recorded during the maintenance "skylight" (the period when no trains run) were used. The trains passing the monitoring point were also recorded.
Because the settlement at different monitoring points follows similar patterns, one monitoring point (EL2) was selected to demonstrate the model calculation and verification. The subgrade settlement and the cumulative load of passing trains are shown in Table 1.
For an existing heavy-haul railway that has been in operation for several years, the instantaneous settlement and the primary consolidation settlement are essentially complete, and the current settlement consists of secondary consolidation settlement and permanent deformation of the subgrade. The settlement is therefore very small, which limits the prediction models applicable to this situation. In this paper, the hyperbolic model, exponential curve model, GM(1,1) model, Poisson model, and neural network model, which give better predictions in this situation, were selected as the basic single prediction models for the combined prediction of subgrade settlement under cumulative train load. The basic prediction models are shown in Table 2; because the neural network model was implemented in numerical calculation software, its detailed formula is not given there.
Based on the basic prediction models, the predicted results shown in Table 4 were obtained. The absolute errors of the predicted results were then calculated and normalized, yielding the set $\mathcal{P}$ of normalized absolute errors, which carries both the set pair relationship and its probability interpretation. The expected loss function and the criterion of set pair relationship were constructed using the method in Section 4, and the parameters $(\alpha, \beta)$ were solved using the algorithm in Section 5. The solution of the parameters is shown in Figure 2, and the criterion of set pair relationship is determined as formula (18):

$$\begin{cases} p \le 0.385, & \text{identity} \\ 0.385 < p < 0.633, & \text{discrepancy} \\ p \ge 0.633, & \text{contrary} \end{cases} \tag{18}$$

The criterion shows that the prediction results of the single prediction models are mostly in an identity relationship with the measured values, indicating good prediction. A small number of prediction results are in a discrepancy or contrary relationship with the measured values, where the prediction is unsatisfactory; these cases account for the occasional risk and volatility of the single prediction models.
Next, the certainty degree is determined to describe quantitatively the prediction performance and the relationship between each prediction model and the measured data. The numbers of samples in which the relationship between the predicted and measured values is identity, discrepancy, and contrary are counted, giving the certainty degrees $\mu_k$ of the five single models in the form of formula (5). Following the general practice in existing research [24][25][26][27][28][29][30], the discrepancy coefficient $i$ and the contrary coefficient $j$ are set to 0.5 and $-1$, respectively. The certainty degrees are then $\mu_1 = 0.679$, $\mu_2 = 0.679$, $\mu_3 = 0.643$, $\mu_4 = 0.393$, and $\mu_5 = 0.643$. Normalizing the certainty degrees gives the weight coefficients and the combined prediction of subgrade settlement through formulas (6) and (7). The weight coefficients and combined prediction results are shown in Tables 3 and 4.
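The normalization in formula (6) can be reproduced from the five certainty degrees quoted above. This short sketch recomputes the weights from those rounded three-decimal values; it is not copied from Table 3, and the model indices simply follow the order $\mu_1, \ldots, \mu_5$.

```python
# Certainty degrees of the five single models, as quoted in the case study.
certainty = [0.679, 0.679, 0.643, 0.393, 0.643]

# Formula (6): normalize the certainty degrees to obtain the weights.
total = sum(certainty)
weights = [mu / total for mu in certainty]

for k, w in enumerate(weights, start=1):
    print(f"w{k} = {w:.4f}")
```

The weights come out at about 0.224 for models 1 and 2, 0.212 for models 3 and 5, and 0.129 for model 4, so the model with the lowest certainty degree contributes least to the combined prediction.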
To compare the prediction performance of the proposed model with other models, the prediction errors were analyzed. The standard error (SE) was selected as the accuracy index and the relative standard error (RSE) as the stability index. The prediction performance of the proposed model and its comparison with other prediction models are shown in Figure 3 and Table 5.
The single prediction models predict most samples well but show some volatility overall. By contrast, the two combined prediction models perform better, with clear advantages in the comprehensive use of information and in prediction stability and accuracy. Furthermore, compared with the combined prediction model in [18], the proposed model achieves higher stability and accuracy and a better overall prediction.

Conclusions
Current combined prediction models for subgrade settlement neglect the relationship between the information contained in the single prediction models and fail to make full use of the available information. Subgrade settlement and its prediction form a complex, uncertain system problem affected by many factors, and set pair theory is a novel method for analyzing systematic problems of uncertainty. This paper therefore proposed a combined prediction model based on set pair analysis. Bayesian decision theory was integrated into the set pair analysis to determine the criterion of set pair relationship intelligently, so that the information contained in the single prediction models is optimally utilized and combined. The case study and the analysis of prediction performance lead to the following conclusions.
Determining the criterion of set pair relationship with Bayesian decision theory overcomes the subjectivity of current methods. Solving for the optimal threshold with the self-adaption algorithm improves the recognition of boundary samples and the accuracy of the set pair relationship. The certainty degree expresses the certainty and uncertainty relationships between predicted and measured values and describes the accuracy and reliability of each prediction model; using it as the basis for the weight coefficients therefore provides a new standard for determining weights. The case study and the comparative analysis with other models show that the combined prediction model based on set pair analysis is effective and feasible for subgrade settlement. In addition, determining the criterion of set pair relationship with a decision-making method provides a new approach to the intelligent identification of set pair relations and theoretical support for the application of set pair analysis.

Figure 2: Sample information and threshold results.

Table 1: Settlement and cumulative load of EL2.

Table 4: Prediction results of the proposed model and comparison with those of other models (mm).

Table 5: Prediction effects of the models.