Sandy Soil Liquefaction Prediction Based on Clustering-Binary Tree Neural Network Algorithm Model

The neural network algorithm is a small-sample machine learning method built on statistical learning theory and the structural risk minimization principle. Classical neural network algorithms mainly aim at solving two-class problems, which makes them unsuitable for the multiclass problems encountered in engineering practice. According to the main factors affecting sand liquefaction, a sand liquefaction discriminant model based on a clustering-binary tree multiclass neural network algorithm is established using the class distance idea in cluster analysis. The model can establish the nonlinear relationship between sand liquefaction and various influencing factors by learning from limited samples. The research results show that the hierarchical structure based on the clustering-binary tree neural network algorithm is reasonable, and the sand liquefaction level can be categorized accurately.


Introduction
Earthquake-induced sand liquefaction is a highly destructive geological hazard that can damage farmland, roads, bridges, civil buildings, and water conservancy facilities, causing great harm to the economy and to human life. Therefore, sand liquefaction under earthquake action has always been one of the central issues in geotechnical engineering and disaster geology. As earthquakes are inevitable, the assessment and prediction of the seismic liquefaction potential of a site are the focus of research on this issue [1][2][3][4]. The current discrimination methods for seismic liquefaction are mainly divided into two categories: the empirical analysis method based on the investigation data of sand liquefaction disasters at earthquake sites and the experimental analysis method based on field or indoor tests. The commonly used empirical analysis methods mainly include the standard penetration method, the critical void ratio method, and the energy discrimination method [5]. The experimental analysis methods mainly include the Seed-Idriss simplified method [6], shear wave velocity method [7], static cone penetration method [8], dynamic simple shear test, shaking table test [9,10], and dynamic triaxial test [11,12].
In recent years, the application of machine learning-based research methods in the field of earth sciences has gained a lot of attention [13][14][15][16][17][18][19][20]. Traditional geological research is gradually moving closer to the fields of deep data mining, big data, and artificial intelligence, for example in coal mine water inrush prediction [21] and research on landslides [22,23].
In the study of seismic liquefaction, Xue and Yang examined the potential of support vector machines (SVMs) for assessing liquefaction potential based on cone penetration test (CPT) field data [24]. A hybrid model combining SVMs and particle swarm optimization (PSO) was proposed in that study to improve the forecasting performance. Chou and Thedja proposed a novel classification system integrating swarm and metaheuristic intelligence, i.e., a smart firefly algorithm (SFA), with a least-squares support vector machine (LSSVM), to provide decision-makers with timely warnings of geotechnical hazards [25]. Xue and Liu presented two optimization techniques, genetic algorithm (GA) and particle swarm optimization (PSO), to improve the efficiency of the backpropagation (BP) neural network model for predicting the liquefaction susceptibility of soil [26]. Yang et al. established a new liquefaction evaluation formula based on 156 SPT data records from liquefaction investigations in mainland China [27]. Ali and Jahanpour applied the evolutionary polynomial regression technique and provided a new model for predicting the liquefaction potential [28]. Rahbarzare and Azadi used hybrid particle swarm optimization and genetic algorithms with a fuzzy support vector machine (FSVM) as the classifier for the soil liquefaction prediction problem [29]. Hu and Liu attempted to reduce the parameter and model uncertainties of the Bayesian network model for predicting earthquake-induced soil liquefaction. In their work, 31 candidate intensity measures were investigated through analyses of correlation, efficiency, proficiency, and sufficiency based on a large database of historical ground motion records [30]. Hu and Liu also constructed two new Bayesian network (BN) models for predicting the probability of occurrence of soil liquefaction and compared them with four simplified procedures and a Bayes classifier for soil liquefaction evaluation [31].
At present, in the field of geotechnical engineering, machine learning methods are mostly applied to binary classification problems. In contrast, the problems encountered in geotechnical practice are often multiclass problems. To date, multiclass research is still in its infancy.
The application of neural network algorithms in geotechnical engineering is rarely reported in the literature. This paper introduces a multiclass neural network into the evaluation of sand liquefaction for the first time, analyses the factors affecting sand liquefaction, extracts characteristic parameters, and adopts the class distance method of cluster analysis to establish a sand liquefaction discriminant model based on the clustering-binary tree multiclass neural network algorithm. The model has achieved satisfactory evaluation results.

Introduction to Neural Network Algorithm.
The neural network algorithm is a typical supervised machine learning algorithm. It is composed of a large number of interconnected nodes and imitates the transmission of electrical signals between biological neurons. Among such networks, the multilayer perceptron (MLP) is one of the most popular feedforward artificial neural networks. A typical MLP neural network contains an input layer, one or more hidden layers, and an output layer. The input layer has the same number of nodes as there are input variables; the hidden layer contains the weight values of the connections between neurons and the corresponding threshold function, which are used to transmit and process the signal; and the output value is the final data to be obtained.
In the mathematical representation of the feedforward neural network, x is the input parameter, ω_ih and ω_ho are the input layer-hidden layer weight and the hidden layer-output layer weight, respectively, b_ih and b_ho are the bias parameters, M is the number of nodes in the hidden layer, d is the number of nodes in the input layer, and T_r(z) is the transfer function that performs a nonlinear transformation on the summed input.
The goal of the algorithm is to reduce the error between the calculated value and the true value through training. The error E is defined over all training patterns, where P is the total number of training patterns and E_p is the error of the p-th training pattern. E_p is in turn computed over the output nodes, where N is the total number of output nodes, o_k is the output of the k-th output node, and t_k is the target output of the k-th output node.
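The displayed equations of this subsection are not present in the text. One form that is commonly used for an MLP with a single hidden layer, and that agrees with the symbol definitions given here, is sketched below. The index names i and h, the output symbol y, and the normalization (the factor 1/2, and E as a plain sum rather than a mean over patterns) are assumptions and may differ from the original formulas.

```latex
y = T_r\!\left( \sum_{h=1}^{M} \omega_{ho}\, T_r\!\left( \sum_{i=1}^{d} \omega_{ih}\, x_i + b_{ih} \right) + b_{ho} \right)

E = \sum_{p=1}^{P} E_p ,
\qquad
E_p = \frac{1}{2} \sum_{k=1}^{N} \left( o_k - t_k \right)^2
```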

Neural Network Algorithm Method of Multiclassification Based on Clustering-Binary Tree.
The basic neural network algorithm is aimed at two-class problems, whereas multiclass problems are more common in engineering practice. The multiclass neural network algorithm based on the binary tree is one of the effective methods for solving the multiclass problem. As shown in Figure 1, the binary tree structure classifier decomposes a complex multiclass problem into multiple two-class problems. A multiclass problem can be transformed into multiple two-class problems in various ways, each with a corresponding binary tree structure. For a k-category problem, more than one binary tree can generally be constructed, and the number of possible trees increases with k.
As shown in Figure 1, the classification performance of the neural network subclassifier at an upper node has a greater impact on the generalization of the entire classification model. If a classification error occurs at a node, the error is passed on to the nodes below it, resulting in "error accumulation" and making the next-level nodes lose their classification meaning. Constructing a reasonable binary tree structure is therefore essential to the correct classification of the model. To give the binary decision tree the best possible performance and a reasonable hierarchy, the samples should be divided into two groups at each decision node in an approximately optimal way, so that the separability between the two groups of samples at the upper layers is as strong as possible. In so doing, the distance between the cluster centers of the two groups of samples is the largest, and the divergence within each group of samples is the smallest.

Definition of Distance.
For two samples x_a and x_b from the given sample set, the distance between them can be expressed as d(x_a, x_b), and it is defined separately for the linear case and the nonlinear case. This paper adopts the idea of class distance in cluster analysis as the algorithm for generating binary trees. First, the class farthest from the others is separated, and the optimal hyperplane constructed at this point has good generalization.
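The two distance formulas themselves are not present in the text. The sketch below shows one common choice for each case, offered as an assumption rather than as the paper's definition: the Euclidean distance in the input space for the linear case, and the distance in the feature space induced by a kernel function for the nonlinear case. The function names are hypothetical.

```python
import math


def linear_distance(xa, xb):
    """Euclidean distance between two samples in the input space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))


def kernel_distance(xa, xb, kernel):
    """Distance between two samples in the feature space induced by `kernel`.

    Uses d^2 = K(xa, xa) - 2 K(xa, xb) + K(xb, xb).
    """
    squared = kernel(xa, xa) - 2.0 * kernel(xa, xb) + kernel(xb, xb)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding errors


def dot(xa, xb):
    """Linear kernel: the ordinary inner product."""
    return sum(a * b for a, b in zip(xa, xb))
```

With the linear kernel `dot`, the kernel distance reduces to the Euclidean distance, which is a quick consistency check; any other kernel function of two samples can be passed in the same way.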
The algorithm flow of the multiclass neural network based on the clustering-binary tree is as follows:
(1) Calculate the distance between classes according to formula (4) or (5).
(2) Sort the categories according to the class distances, placing the p-th category in front of the q-th category when it differs more from the remaining categories, so that the larger the difference, the earlier the category is partitioned. This gives the arrangement of all categories n_1, n_2, ..., n_k.
(3) According to the generated binary tree, use a binary neural network training algorithm to construct the optimal hyperplane at each binary node. Take the samples of category n_1 as the positive sample set and the samples of all other categories as the negative sample set, and train the binary neural network subclassifier at the root node. The i-th subclassifier takes the samples of the i-th category as positive samples and the samples of the (i + 1)-th, (i + 2)-th, ..., k-th categories as negative samples, until the (k − 1)-th subclassifier separates the (k − 1)-th category from the k-th category. In this way, the recognition of multiclass problems can be completed.
(4) The multiclass classification ends and the result is output.
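The ordering and task-generation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the separation of a class is taken here to be the mean Euclidean distance from its center to the centers of the other classes, and all function names are hypothetical. At least two classes are required.

```python
import math


def class_centers(samples, labels):
    """Mean vector of each class, keyed by class label."""
    groups = {}
    for x, c in zip(samples, labels):
        groups.setdefault(c, []).append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def order_classes(samples, labels):
    """Sort classes so that the one farthest from the others comes first."""
    centers = class_centers(samples, labels)

    def separation(c):
        # Assumption: mean Euclidean distance to the other class centers.
        others = [o for o in centers if o != c]
        return sum(math.dist(centers[c], centers[o]) for o in others) / len(others)

    return sorted(centers, key=separation, reverse=True)


def training_tasks(order):
    """The k-1 binary tasks: class i (positive) versus all later classes."""
    return [(order[i], order[i + 1:]) for i in range(len(order) - 1)]
```

Each task returned by `training_tasks` names the positive category of one subclassifier and the categories that form its negative set, so a k-category problem yields k − 1 binary training problems.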

Analysis of Influencing Factors and Model Establishment.
Sand liquefaction is a form of foundation failure caused by earthquakes, and its evaluation is an important part of engineering geological assessment in earthquake areas. The liquefaction of saturated sand is a very complicated process with many influencing factors, which can be roughly divided into three categories:
(i) The nature of the sand layer, such as the type of soil, particle composition, and degree of compaction
(ii) The buried depth of the soil layer and the groundwater level
(iii) Seismic factors, such as earthquake intensity, epicenter distance, and earthquake duration
From these influencing factors, this paper selects representative parameters as the discriminant indices of sand liquefaction: the seismic intensity I, epicenter distance R, sand buried depth d, groundwater level d_w, standard penetration number N, average particle size D_50, nonuniformity coefficient C_u, and the dynamic shear stress ratio τ_d/σ_v′. These eight variables are used as classification indicators.
First, each attribute of the sample data is normalized and linearly scaled to [−1, +1], which prevents attributes with a large value range from dominating those with a small value range. The sand liquefaction evaluation is divided into 4 grades: severe liquefaction (I), moderate liquefaction (II), slight liquefaction (III), and no liquefaction (IV). Using neural network algorithms to discriminate sand liquefaction amounts to finding the nonlinear relationship between the influencing factors and sand liquefaction. Assuming that n samples are collected, the problem is abstracted as a mapping from the attribute set X to the classification set Y; that is, the index matrix is X_{n×8} and Y = {I, II, III, IV}. This paper collects 40 typical cases from the literature on the Wenchuan Earthquake and divides them into a training sample set and a test sample set. Of these, 30 samples are used for training, as shown in Table 1, and the other 10 samples are used to test the model performance.
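The linear adjustment of each attribute to [−1, +1] can be sketched as a column-wise min-max scaling. The function name and the handling of a constant column are assumptions made for illustration; the numbers in the usage note are hypothetical, not taken from Table 1.

```python
def scale_columns(rows, lo=-1.0, hi=1.0):
    """Linearly rescale every column of `rows` to the interval [lo, hi]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        new_row = []
        for value, cmin, cmax in zip(row, mins, maxs):
            if cmax == cmin:
                # Constant column: map to the midpoint (an assumption).
                new_row.append((lo + hi) / 2.0)
            else:
                new_row.append(lo + (hi - lo) * (value - cmin) / (cmax - cmin))
        scaled.append(new_row)
    return scaled
```

For example, two hypothetical attributes with very different ranges, such as an intensity in 7 to 9 and a distance in 10 to 50, both end up in [−1, +1], so neither dominates a distance computation merely because of its units.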

Sample Learning.
There are 4 levels of sand liquefaction, so this is a 4-class problem. A binary decision tree with 3 classifiers needs to be established to divide the samples into 4 subsets. Through learning from the 30 samples, the classification functions are obtained, and the sand liquefaction discriminant model is established based on the clustering-binary tree neural network algorithm. The parameters of the neural network algorithm are mainly the kernel function parameters and the penalty factor C. The research shows that the learning efficiency and generalization ability of the neural network algorithm differ for different training parameters. In this paper, the radial basis function is selected as the kernel function through trial calculation, where x_i and x_j are training inputs and σ is the width of the kernel function. The training parameters of the support vector machine in this paper are therefore σ and C. The size of σ affects the output response interval of a sample: when σ is small, the response interval is narrow. The penalty factor C represents the degree of punishment for errors and, by controlling the training accuracy, sets the compromise between training error and generalization ability.
The larger C is, the greater the penalty for misclassification. The values of σ and C can be determined from the samples during the training process.
This paper uses the variable increment iteration method based on the number of misclassified samples to obtain σ = 0.28 and C = 300.
Through learning from the 30 samples, all training samples are classified without error, as shown in Table 1.
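The kernel formula is not present in the text. A minimal sketch of a radial basis function kernel is given below, assuming the Gaussian form exp(−‖x_i − x_j‖² / (2σ²)); another common convention omits the factor 2, and the paper may have used either. The function name is hypothetical, and the default width is the value σ = 0.28 reported above.

```python
import math


def rbf_kernel(xi, xj, sigma=0.28):
    """Gaussian RBF kernel; `sigma` is the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

Under this form the kernel equals 1 for identical inputs and decays with distance; a smaller σ makes the decay faster, which corresponds to the narrow response interval described above.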

Sample Prediction.
When the clustering-binary tree neural network algorithm is used for classification, the attributes of an unknown sample are substituted into the classification functions one by one, starting from the root classifier.
When the test value is 1, the sample has reached a leaf node and stops moving forward, indicating that it belongs to the class represented by that leaf node. When the test value is −1, the sample moves down the branch until it reaches a leaf node. The other 10 samples are input as test samples into the binary decision tree, and the classification of each sample is shown in Table 2. The classification results are consistent with the actual ones. Therefore, the method in this paper can correctly distinguish multiclass problems, indicating that the generalization performance of the support vector machine is good.
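The traversal described above can be sketched as a walk along the chain of subclassifiers. Each subclassifier is assumed to be a callable returning +1 or −1; the function name and the toy threshold rules in the usage note are hypothetical and stand in for the trained classification functions.

```python
def classify(sample, subclassifiers, ordered_classes):
    """Walk the chain of binary subclassifiers from the root to a leaf.

    `ordered_classes` has k entries and `subclassifiers` has k - 1 entries;
    subclassifier i separates class i from all later classes.
    """
    for decide, label in zip(subclassifiers, ordered_classes):
        if decide(sample) == 1:
            return label          # +1: the leaf for this class is reached
    return ordered_classes[-1]    # every test gave -1: the last class remains
```

For instance, with the four grades ordered as I, II, III, IV and three toy rules that threshold a single scalar at 0.5, 0, and −0.5, a sample of 0.2 fails the first test, passes the second, and is assigned grade II.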

Conclusion
In this paper, a multiclass neural network is introduced into the discrimination of sand liquefaction. During the modelling process, it was found that the generalization performance of the multiclass neural network model based on the binary tree is related to the binary tree structure. The learning efficiency and generalization performance vary with the binary tree structure and the model parameters. This paper adopts the idea of class distance in cluster analysis as the algorithm for generating the binary tree and establishes a multiclass neural network based on the clustering-binary tree to predict sand liquefaction. The model establishes the nonlinear relationship between sand liquefaction and various influencing factors by learning from limited samples. The research results show that the number of classifiers required in the clustering-binary tree neural network is small (only 3 classifiers are needed for a 4-class problem), few training samples are reused, the binary tree hierarchy is reasonable, the classification accuracy is high, and the generalization is good. Thus, the proposed model predicts the liquefaction level accurately and can be used for sandy soil. The neural network algorithm is based on the VC theory of statistical learning theory and the structural risk minimization principle. It seeks the best compromise between model complexity and learning ability based on limited sample information and has good generalization ability. As a new machine learning method, the neural network algorithm has broad application prospects in geotechnical engineering and other fields because of its unique advantages.
Data Availability
The data are generated from the field and are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.