Prediction of Seepage Pressure Based on Memory Cells and Significance Analysis of Influencing Factors

School of Water Conservancy, North China University of Water Resources and Electric Power, Zhengzhou 450045, China; Henan Key Laboratory of Water Environment Simulation and Treatment, Zhengzhou 450045, China; Collaborative Innovation Center of Water Resources Efficient Utilization and Protection Engineering, Zhengzhou 450045, China; Henan Key Laboratory of Water Resources Conservation and Intensive Utilization in the Yellow River Basin, Zhengzhou 450046, China


Introduction
Reservoir dams are important infrastructure that plays a key role in economic and social development. However, with the recent rapid increase in the construction of reservoir dams, many safety accidents have followed [1]. From 1954 to 2006, 3498 dams broke in China alone, and more than 93% of them were earth-rock dams [2]. Based on the material characteristics and operation mechanism of earth-rock dams, the primary failure modes are overtopping, landslide instability, and seepage damage [3]. According to relevant data, seepage failure accounts for approximately 39% of all dam failures in China [4]. Similarly, international survey data demonstrate that approximately 46% of earth-rock dam failures are caused by seepage damage, second only to overtopping failure (approximately 48%) [5][6][7][8]. Therefore, seepage failure has an evident and major influence on the safety of earth-rock dams.
Seepage failure primarily refers to the reduction of effective soil stress and the seepage deformation caused by the movement of soil particles inside the dam under seepage pressure, which decreases the stability of the dam or even destroys it. Therefore, adequate seepage safety monitoring is an essential prerequisite for seepage safety: prototype observation data are analyzed, and a seepage monitoring model is developed to monitor and evaluate the operational conditions of water conservancy projects.
For seepage instability, the influencing factors primarily include water level change, rainfall, temperature change, and duration. For example, Xu et al. [9] considered that the seepage flow of dam bodies is primarily caused by the superposition of reservoir water level and rainfall, and that the seepage head and reservoir water level tend to be positively correlated. Guangjin et al. [10] and Ashraf et al. [11] studied variations in the seepage head of a soil slope due to changing rainfall intensity and duration. Their tests indicated that as rainfall intensity and duration increase, rainfall affects the soil to a greater depth, and the changes in the seepage head and the seepage pressure become larger. Xu and Chai [12] investigated the influence of changes in temperature on the permeability coefficient and the calculated water heads in seepage fields. Their findings show that accounting for temperature change yields a more accurate distribution of water heads in the seepage field than calculations that neglect it. However, most studies to date have focused on how seepage responds to changes in a single factor or a combination of factors. This only verifies that there is a significant correlation between these factors and the seepage field (namely, the measured seepage pressure value) and makes it difficult to distinguish the relative correlation or importance of the individual factors. Based on existing monitoring results, the change in the measured seepage pressure over a long time period is typically complex and variable; thus, it is difficult to directly determine how seepage pressure changes over time. There also tends to be a long time delay in its response to changes in external conditions. Therefore, to analyze the change law of a long sequence of dam seepage pressure values and to predict them, it is important to determine the effect of the various factors and to consider the dynamic change characteristics of the monitored seepage pressure.
Currently, many methods exist for establishing mathematical models to predict seepage monitoring data. Data fitting and analysis can be achieved via multiple regression; however, the accuracy of this method is restricted by collinearity between input factors [13]. Predictions based on neural network models have also been reported; for example, Zhang et al. [14] used a genetic algorithm (GA) to optimize the weights and thresholds of a backpropagation neural network (BPNN), thus establishing the BPNN-GA seepage prediction model. Shi et al. [15], Rankovic et al. [16], and others used a variety of neural network models, including radial basis neural network models, to achieve accurate predictions of dam seepage. These models provide various options for predicting the measured seepage pressure; however, such static models are primarily based on correlations between the seepage pressure and associated factors and describe the lag in seepage pressure in response to changes in the factors only by averaging existing data. These static models also cannot describe the time dependence of the seepage pressure of a dam; that is, they cannot retain the historical information hidden in the data and pass it to the current neuron for calculation. Thus, a predictive model developed based on dynamic changes in relevant factors will be useful for practical engineering applications.
A predictive model of seepage pressure that considers the effects of multiple factors and dynamic changes is established in this paper. The model integrates rough set theory and dynamic neural network technology, and the advantages of the dynamic model are examined based on the distribution of error and the determination coefficient $R^2$ [17]. Model performance is characterized by comparing its structural characteristics and predictive results with those of other models on an engineering example, to provide a reference for the prediction of seepage pressure and for engineering safety assessments.

Project Overview.
A concrete-faced rockfill dam is investigated in this paper, with a project scale of Class A Large (1). The elevation of the dam crest is 413.8 m, with a maximum dam height of 99.8 m, a dam crest width of 10 m, and a dam crest length of 540.46 m. An L-shaped reinforced concrete wave wall with a height of 4 m is set at the upstream side of the dam crest. The normal water level of the reservoir is 410 m, and the storage capacity is 11,076,000 m³. To monitor dam safety, a comprehensive prototype dam monitoring system composed of deformation monitoring, seepage monitoring, environmental monitoring (e.g., rainfall and reservoir water level), and other equipment is installed at the reservoir. The seepage monitoring equipment consists of buried osmometers; four osmometers are installed in total, with a data-collection interval of 1-3 d, as shown in Figure 1.
Considering osmometer PB4 as an example, changes in the data over the 8 years from 2008 to 2015 are given in Figure 2. The measured seepage pressure fluctuates within a relatively stable range and varies with changes in water level, rainfall, temperature, and other factors; thus, the monitoring results are reasonable. The rainy season is concentrated in July and August every year and exhibits an annual cycle. The maximum rainfall during the monitoring period was 95.50 mm on July 8, 2012. The temperature is relatively stable from year to year; however, the change within a given year is large, with a high of approximately 37.8°C and a low of −9.70°C. During the monitoring period, the changes in the upstream and downstream water levels are minor, and the upstream water level varies between 385 and 410 m. Compared to the periodic changes in rainfall and temperature, the change in the seepage pressure is more moderate and lags, owing to the influence of the seepage pressure level at the front of the dam. For example, from January 2011 to March 2011, the water level of the reservoir gradually decreased by 4 to 5 m; however, the seepage pressure only started to change at the end of March 2011 and gradually decreased by approximately 0.1 to 0.3 m (see Figure 2(a)).

Rough Set Theory.
Rough set (RS) theory is a method that preserves the classification ability of a knowledge base while reducing its attributes, so that problems can be solved or classification rules applied [18,19]. It is typically represented by a decision information table $S = \langle U, C \cup D, V, f \rangle$, where $U$ is the domain, a nonempty finite set of objects; $C \cup D$ is the attribute set; $C$ is the conditional attribute set; $D$ is the decision attribute set, with $C \cap D = \emptyset$; $V$ is the set of all attribute value domains; and $f: U \times R \to V$ gives the value of each object in the domain on the corresponding attribute (here $R = C \cup D$). The reduction process for the influencing factors of the measured seepage pressure is as follows:
(1) The seepage pressure value domain $U$, which is composed of the seepage pressure and the related influencing factor data, is partitioned by the Pawlak attribute importance method into equivalence classes with respect to $C$ and $D$, respectively, giving $\mathrm{IND}(C)$ and $\mathrm{IND}(D)$.
(2) The positive region $\mathrm{pos}_C(D)$ is determined, and the dependence of the seepage pressure $D$ on the influencing factor set $C$ is calculated as $\gamma_C(D) = |\mathrm{pos}_C(D)| / |U|$.
(3) The dependence after removing a given influencing factor $c_i$ is calculated in the same way: $\gamma_{C - \{c_i\}}(D) = |\mathrm{pos}_{C - \{c_i\}}(D)| / |U|$.
(4) The attribute importance of the influencing factor is $\sigma_{CD}(c_i) = \gamma_C(D) - \gamma_{C - \{c_i\}}(D)$.
Formula (4) represents the degree to which the single factor $c_i$ affects the classification of the seepage pressure after the factor is removed: the larger the value of $\sigma_{CD}(c_i)$, the greater the importance of the influencing factor, and vice versa. The seepage characteristics of dams are affected by multiple factors. However, because the factors differ in their importance to seepage, factors with little effect or that are redundant can be screened out by this method to facilitate prediction with dynamic neural networks.
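The reduction steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the standard Pawlak definitions (dependence $\gamma_C(D) = |\mathrm{pos}_C(D)|/|U|$ and importance as the drop in dependence when one factor is removed), and the function names, attribute names, and toy decision table are all hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices into equivalence classes on the given attributes."""
    classes = defaultdict(set)
    for index, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(index)
    return list(classes.values())


def dependency(rows, cond_attrs, dec_attr):
    """Assumed Pawlak dependence: |pos_C(D)| / |U|."""
    decision_classes = partition(rows, [dec_attr])
    positive_region = set()
    for block in partition(rows, cond_attrs):
        # A condition class is in the positive region if it lies
        # entirely inside one decision class.
        if any(block <= dec_block for dec_block in decision_classes):
            positive_region |= block
    return len(positive_region) / len(rows)


def importance(rows, cond_attrs, dec_attr, attr):
    """Drop in dependence when `attr` is removed from the condition set."""
    reduced = [a for a in cond_attrs if a != attr]
    return dependency(rows, cond_attrs, dec_attr) - dependency(rows, reduced, dec_attr)


# Hypothetical discretized decision table: three factors, one decision "d".
rows = [
    {"level": 0, "rain": 0, "time": 0, "d": 0},
    {"level": 0, "rain": 1, "time": 0, "d": 0},
    {"level": 1, "rain": 0, "time": 0, "d": 1},
    {"level": 1, "rain": 1, "time": 1, "d": 1},
    {"level": 0, "rain": 1, "time": 1, "d": 0},
]
cond = ["level", "rain", "time"]
scores = {a: importance(rows, cond, "d", a) for a in cond}
print(scores)
```

In this toy table only `level` has a nonzero score, so `rain` and `time` would be treated as redundant, in the same way that a factor with importance 0 is dropped before the network is trained.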

Neural Network Dynamic Model/Long- and Short-Term Memory Network Model.
The RS method can be employed to screen the influencing factors of seepage pressure; however, it cannot produce accurate predictions. Considering the time dependence of changes in seepage pressure and the lag relative to the influencing factors, the long- and short-term memory network model (LSTM) is introduced [20]. The LSTM model, which is an improved form of recurrent neural networks (RNNs), is a deep learning model that can process time series data [21]. Unlike in the BP neural network model (BP model), adjacent nodes of the hidden layers in LSTM are connected to each other [15], which allows the model to describe the memory function, retain hidden information in the historical data, and pass it to the current neuron for calculation. These nodes are also constantly updated with input data (see Figure 3). Compared to RNN, LSTM includes a memory unit that addresses the problem of vanishing or exploding gradients [22] and expresses the hidden historical information in the data more accurately. The memory unit of the model consists of an input gate, forget gate, and output gate, which control information transmission at different times. The input gate controls the strength of the new input entering the memory unit and determines how many new memories will be merged with previous memories. The forget gate controls how strongly the memory unit maintains its value from the last moment (i.e., it selects or rejects historical information). If the forget gate is closed, no memory can pass; if it is fully open, all memories can pass. The output gate controls the strength of the output of the memory unit and determines the LSTM response to the outside world [14]. Details are shown in Figure 4.
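The gate mechanism described above can be illustrated with a single forward step of a standard LSTM cell. This is a hedged sketch in plain Python, not the authors' code: the per-gate weight matrices `W_I`, `W_f`, `W_o`, `W_S`, the layer sizes, and the random inputs are assumptions made for illustration, and the standard textbook update is used for the memory state.

```python
import math
import random


def sigmoid(x):
    """Gate activation: maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, z):
    """Return weights * z + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, bias)]


def lstm_step(c_t, d_prev, s_prev, p):
    """One forward step of a standard LSTM cell (illustrative only).

    c_t    : influencing factors at time t
    d_prev : hidden output at time t - 1
    s_prev : memory-unit state at time t - 1
    """
    z = d_prev + c_t  # concatenate previous output and current input
    i_t = [sigmoid(a) for a in affine(p["W_I"], p["b_I"], z)]  # input gate
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]  # forget gate
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]  # output gate
    s_cand = [math.tanh(a) for a in affine(p["W_S"], p["b_S"], z)]  # candidate
    # Keep part of the old memory and merge in part of the new candidate.
    s_t = [f * s + i * c for f, s, i, c in zip(f_t, s_prev, i_t, s_cand)]
    d_t = [o * math.tanh(s) for o, s in zip(o_t, s_t)]
    return d_t, s_t


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases; the sizes are hypothetical."""
    rng = random.Random(seed)
    p = {}
    for gate in ("I", "f", "o", "S"):
        p["W_" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_in)]
            for _ in range(n_hidden)
        ]
        p["b_" + gate] = [0.0] * n_hidden
    return p


params = init_params(n_in=6, n_hidden=10)
d, s = [0.0] * 10, [0.0] * 10
data_rng = random.Random(1)
for _ in range(5):
    factors = [data_rng.random() for _ in range(6)]  # made-up normalized inputs
    d, s = lstm_step(factors, d, s, params)
```

Setting the forget-gate bias to a large negative value drives `f_t` toward 0, which reproduces the "closed gate" case in which no earlier memory passes into `s_t`.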
The propagation through each of these mechanisms during LSTM training follows the formulation in [23][24][25], where $I_t$, $f_t$, and $O_t$ are vectors describing the input gate, forget gate, and output gate of the model at time $t$, respectively; $C_t^{LSTM}$ is the set of influencing factors of the seepage pressure after RS reduction at time $t$; $D_{t-1}^{LSTM}$ is the seepage pressure at time $t - 1$, representing the hidden historical information of LSTM; $W_1$ is the connection weight between the input layer and the hidden layer; $W_2$ is the connection weight between the hidden layer and the output layer; $b_I$, $b_f$, $b_o$, and $b_S$ are the bias terms of the corresponding structures; $S_t$ is the vector of the memory unit at time $t$, which the memory unit uses only to forget old information and add new information; $\tilde{S}_t$ is the new candidate vector created by tanh, the hyperbolic tangent function; and $g$ is an activation function that maps real numbers to [0, 1], where 1 indicates that all information in the unit at the previous time is retained and 0 indicates that all of it is discarded.

RS-LSTM Model
Given the superior performance of the two methods in data analysis, RS reduction is applied before the LSTM prediction and analysis, and the RS-LSTM model structure is constructed as illustrated in Figure 5 to analyze and predict dam seepage during the operating period under the influence of multiple factors. Based on the RS analysis results, an LSTM prediction model that considers various influencing factors is built to predict the seepage pressure. Two hidden layers are selected, and the number of hidden layer neurons is determined by the empirical formula $l = \sqrt{n + m} + \alpha$, where $l$ is the number of nodes in the hidden layers; $m$ is the number of nodes in the output layers; $n$ is the number of nodes in the input layers; and $\alpha$ is an adjustment constant between 1 and 10. After multiple rounds of calculation and training to determine the best model structure, training was found to be most effective when the number of nodes in the hidden layers is 10; thus, the neural network topology is 6-10-1, as displayed in Figure 6. Considering PB4 as an example, the sample data contain water level, temperature, rainfall, duration, and monitoring values and comprise a total of 8,673 groups, including 1,239 groups of seepage pressure monitoring values. A total of 196 groups of data (2% of the total data set) were selected as prediction samples, 28 groups of data were selected as output samples for prediction, and the remainder were selected as training samples. The number of iterations was set to 5,000, the error range was set to 0.001, and the correction factor was set to 0.02.
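As a quick check on the hidden-layer size, the sketch below enumerates candidate node counts for the 6-input, 1-output network. It assumes the widely used empirical rule $l = \sqrt{n + m} + \alpha$ with integer $\alpha$ from 1 to 10 and rounds to the nearest integer; both the rule and the rounding are assumptions made here for illustration, and the function name is hypothetical.

```python
import math


def hidden_node_candidates(n_inputs, n_outputs, alphas=range(1, 11)):
    """Candidate hidden-layer sizes from l = sqrt(n + m) + alpha (assumed rule)."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + alpha) for alpha in alphas]


candidates = hidden_node_candidates(n_inputs=6, n_outputs=1)
print(candidates)
```

Under these assumptions the candidates run from 4 to 13, so the reported choice of 10 hidden nodes falls inside the range that would be searched by trial training.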

Application Results and Analysis of RS.
Based on the existing literature [9][10][11][12], seven influencing factors are considered for prediction and analysis in this study: the upstream water level, downstream water level, and temperature on the observation day; the mean rainfall over the preceding four days; and the time-dependent components $\theta$, $e^{1+\theta}$, and $\ln(1 + \theta)$. These factors are recorded as $X_1$-$X_7$, respectively, in the influencing factor set $C$, and the subsets obtained by removing one factor at a time are $C_1$: $X_2$-$X_7$; $C_2$: $X_1$, $X_3$-$X_7$; $C_3$: $X_1$-$X_2$, $X_4$-$X_7$; $C_4$: $X_1$-$X_3$, $X_5$-$X_7$; $C_5$: $X_1$-$X_4$, $X_6$-$X_7$; $C_6$: $X_1$-$X_5$, $X_7$; and $C_7$: $X_1$-$X_6$. The seepage pressure is set as decision attribute $D$. After the normalization and discretization of $C$ and $D$, the decision sequence is developed. Discretization uses the equidistance method: after normalization, the decision sequence values are discretized into five levels (see Figure 7(a) for details; according to the decision attribute $D$, the data can be divided into normal data and abnormal data). These groups of data can be reduced using formulae (1)-(3), and the results are presented in Figure 7(b), which shows that the importance of $X_6$ is 0, whereas $X_1$, $X_2$, $X_3$, $X_4$, $X_5$, and $X_7$ are all necessary attributes whose importance order can be obtained. The importance of the upstream and downstream water levels is the highest, scoring 0.31 and 0.29, respectively; the influence of the rainfall and temperature components follows closely, scoring 0.25 and 0.24; and the influence of the time-dependent components $\theta$ and $\ln(1 + \theta)$ is small, scoring only 0.12 and 0.14.
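The normalization and equidistant discretization step can be sketched as follows. This is a minimal illustration under stated assumptions: min-max scaling is assumed for the normalization, the level boundaries are five equal-width bins on [0, 1], and the sample readings are made up apart from the 385 m and 410 m extremes of the upstream water level.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_levels(normalized, n_levels=5):
    """Map values in [0, 1] to integer levels 1..n_levels of equal width."""
    return [min(int(v * n_levels), n_levels - 1) + 1 for v in normalized]


# Hypothetical upstream water levels in metres.
upstream_level = [385.0, 392.0, 397.5, 403.0, 410.0]
normalized = min_max_normalize(upstream_level)
levels = equal_width_levels(normalized)
print(levels)
```

Each factor and the seepage pressure would be discretized in the same way before the equivalence classes are formed.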
Considering $X_1$ as an example to describe the decision reduction process, the equivalence classes of the decision attribute $D$ and the conditional attributes $C$ are $\mathrm{IND}(D) = \{\{U_1, U_2, \ldots, U_{37}\}, \{U_{38}, U_{39}, \ldots, U_{100}\}\}$ and $\mathrm{IND}(C) = \{U_1, U_2, \ldots, U_{100}\}$, respectively. Because $\mathrm{IND}(C) \subseteq \mathrm{IND}(D)$, the positive region is $\mathrm{pos}_C(D) = U$. For $\mathrm{IND}(C_1)$, a total of 31 sets, including $U_{32}$, $U_{60}$, $U_{95}$, $U_7$, $U_{55}$, $U_{37}$, and $U_{98}$, are not subsets of the classes in $\mathrm{IND}(D)$ (in Figure 7(b), the horizontal axis is the number of sets that do not belong to $\mathrm{IND}(D)$); therefore, $\mathrm{pos}_C(D) \neq \mathrm{pos}_{C_1}(D)$. $X_1$ is thus necessary in $C$ relative to $D$, and the importance of $X_1$ to $C$ relative to $D$ can be determined from equation (4), yielding a result of 0.31.
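The value 0.31 can be traced with a short worked calculation. It assumes the standard Pawlak dependence $\gamma$ (positive-region size divided by $|U|$, a notation introduced here for illustration), $|U| = 100$, and that each of the 31 sets excluded from the positive region is a single object:

```latex
\gamma_{C}(D) = \frac{|\mathrm{pos}_{C}(D)|}{|U|} = \frac{100}{100} = 1,
\qquad
\gamma_{C_1}(D) = \frac{|\mathrm{pos}_{C_1}(D)|}{|U|} = \frac{100 - 31}{100} = 0.69,
\qquad
\sigma_{CD}(X_1) = \gamma_{C}(D) - \gamma_{C_1}(D) = 1 - 0.69 = 0.31 .
```

The same calculation with $C_6$ would leave the positive region unchanged, which corresponds to the importance of 0 reported for $X_6$.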
Based on this analysis, the factors that affect seepage pressure do not include $X_6$ because its importance is 0; thus, the relative reduction of the decision sequence can be obtained. Figure 8 shows the prediction results of the seepage pressure, and Table 1 compares the prediction results of the models. The results show that the neural network method predicts the seepage pressure accurately. The determination coefficients $R^2$ of the RS-LSTM, RNN, and BP models are 0.97, 0.89, and 0.83, respectively, and the mean relative errors are 3.00%, 6.08%, and 9.85%, respectively, all of which are within 10%; thus, all three models satisfy the prediction accuracy requirements. Comparing the three models, the RS-LSTM model has the highest prediction accuracy, followed by the RNN model and the BP model. Based on the mean relative error, the accuracy of the RS-LSTM model is 2.03 times and 3.28 times that of the RNN and BP models, respectively.
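The two accuracy measures and the quoted accuracy ratios can be reproduced with a short sketch. The function names and the three-point sample series are hypothetical; only the mean relative errors of 3.00%, 6.08%, and 9.85% come from the text, and the usual definitions of $R^2$ and mean relative error are assumed.

```python
def r_squared(observed, predicted):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_relative_error(observed, predicted):
    """Average of |observed - predicted| / |observed|."""
    total = sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted))
    return total / len(observed)


# Made-up series, only to exercise the two functions.
observed = [1.0, 2.0, 3.0]
predicted = [1.1, 1.9, 3.2]
print(r_squared(observed, predicted), mean_relative_error(observed, predicted))

# Mean relative errors (%) reported for the three models.
reported_mre = {"RS-LSTM": 3.00, "RNN": 6.08, "BP": 9.85}
ratios = {
    name: round(value / reported_mre["RS-LSTM"], 2)
    for name, value in reported_mre.items()
    if name != "RS-LSTM"
}
print(ratios)
```

The ratios of the mean relative errors give the factors 2.03 and 3.28 quoted for the RNN and BP models.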

Prediction Results and Analysis.
During the operation of the RS-LSTM model, the importance ranking of influencing factors is determined by RS calculation, which provides decision support for the selection of the input layer and weight setting of LSTM, eliminates the interference of redundant factors, and markedly improves the operation efficiency. In this case, prediction can be completed in 6.37 s and 278 iterations; the training error convergence process is shown in Figure 9, and the average operation efficiency is 41% and 59% higher than that of the RNN model and BP model, respectively. Additionally, both the LSTM and RNN models consider historical information hidden in the data: they maintain correlations between the seepage pressure at different times and use them to describe the dynamic characteristics of the seepage pressure, which the BP model cannot describe. The realization process considers historical seepage pressure values. The output of the hidden layer in the model has a connection weight from the previous hidden layer to the current layer. When the seepage pressure is output, the current upstream water level, rainfall, and other factors affect the current output through the memory unit together with the output layer status, and they affect the output of the seepage pressure at the next time step through the memory unit together with the current output layer status. This cycle then repeats.

Compared to the RNN model, the LSTM model comprises an input gate, forget gate, and output gate in the memory unit. These gates select the current input $x_t$ (namely, the influencing factors at the current moment), forget part of the hidden status $h_{t-1}$ from the last moment (namely, the seepage pressure values of the hidden layer at the last moment), and retain and output the important hidden information, respectively, which markedly improves prediction accuracy. The specific process is as follows. When the data $C_t^{LSTM}$ and $D_{t-1}^{LSTM}$ pass through the forget gate, the information to be discarded is determined on the model's range [0, 1] and expressed as the information retention level $f_t$. The input gate in this study consists of two parts: the sigmoid layer primarily determines which values are updated, and the tanh layer creates the new candidate vector $\tilde{S}_t$. These layers are combined with the forget gate to generate the new memory vector $S_t$. For the next time step, the seepage pressure is predicted from the memory cell, and the output includes $S_t$. The output value at the current time is then determined. The primary advantages of the RS-LSTM include the following three points. First, the RS-LSTM considers historical information. By storing, writing, or reading information through the memory unit, each unit can determine through its gates which information is transmitted and when it may be read, written, or cleared, and can therefore describe time series data more intelligently for prediction and analysis. Second, the RS-LSTM yields high prediction accuracy and fast convergence with fewer iterations and avoids local minima.

Finally, the RS-LSTM mitigates the dependence on correlations between a single factor and the seepage pressure during seepage, such as the rise in seepage pressure with increasing water level.

Conclusion
In this paper, the RS-LSTM model structure is constructed using an engineering example, the significance of different influencing factors for the seepage pressure is investigated using rough set theory, and the prediction performance of different models is compared using average relative errors. The specific conclusions of this study are as follows: (1) RS theory and the LSTM model are integrated and implemented in Python to develop the RS-LSTM model. The example shows that this model considers changes in the measured seepage pressure during seepage and their correlations with external factors and at the same time ranks the importance of the influencing factors of seepage. The model can also eliminate the influence of redundant factors and predict dam seepage monitoring values accurately and effectively. It can provide theoretical support for seepage safety during the future operation of dam projects.
(2) A practical example demonstrates that the primary influencing factors of seepage pressure include the upstream water level, the downstream water level, the temperature, the average rainfall over the preceding four days, and the time-dependent components $\theta$ and $\ln(1 + \theta)$. The importance scores of these factors are 0.31, 0.29, 0.25, 0.24, 0.12, and 0.14, respectively, and the time-dependent component $e^{1+\theta}$ is redundant.
(3) The prediction results demonstrate that neural networks can be used to predict seepage pressures in dams, and the determination coefficient $R^2$ of the RS-LSTM model proposed in this paper can reach 0.96. Compared to the traditional neural network models, the RS-LSTM model yields a higher prediction accuracy: based on the mean relative error, its accuracy is 2.03 and 3.28 times that of the RNN model and BP model, respectively.
In future work, the proposed model could be applied to datasets from other dams. To further enhance its performance, data from multiple seepage pressure gauges could be integrated to achieve multi-point and multi-factor pressure prediction.

Data Availability
The data used to support the findings of this study are included within the article.