Detection Model for Seepage Behavior of Earth Dams Based on Data Mining

Seepage behavior detection is an important tool for ensuring the safety of earth dams. However, traditional seepage behavior detection methods have used insufficient monitoring data and have mainly focused on single-point measures and local seepage behavior. The seepage behavior of dams is not quantitatively detected based on the monitoring data with multiple measuring points. Therefore, this study uses data mining techniques to analyze the monitoring data and overcome the above-mentioned shortcomings. The massive seepage monitoring data with multiple points are used as the research object. The key information on seepage behavior is extracted using principal component analysis. The correlation between seepage behavior and upstream water level is described using mutual information. A detection model for overall seepage behavior is established. Results show that the model can extract the key information from the seepage monitoring data with multiple points and quantitatively detect the overall seepage behavior of earth dams. The proposed method can provide a new and reasonable means of quantitatively detecting the overall seepage behavior of earth dams.


Introduction
Seepage is an important factor that affects the safety of earth dams. Based on the statistical data of the International Commission on Large Dams, approximately 52.2% of earth dam failures are caused by seepage damage [1].
The upstream water level under normal service conditions causes earth dams to form a stable seepage field in the dam body and foundation, thereby indicating a stable seepage behavior. However, excessive seepage gradient, excessive seepage pressure, and other abnormal seepage phenomena may occur in earth dams due to construction defects and material aging. These phenomena can cause seepage damage, increase the instability of a dam's slope, and lead to dam breakage.
However, dam safety can be controlled. Safety detection provides the basis for dam safety control. Several osmometers are typically placed at key points in the dam to detect seepage behavior. The measured values of the osmometers fluctuate within a reasonable range when the seepage behavior of the earth dam is normal and exhibit sudden changes or trends when it is abnormal. Therefore, the data of these osmometers should be analyzed to detect the seepage behavior of earth dams. Mathematical and mechanical methods are used to analyze the data and detect seepage behavior. When an abnormal seepage phenomenon is detected, appropriate reinforcement measures are taken to reduce the risk of dam breakage and provide technical assurance of the service safety of earth dams.
Currently, the detection methods for the seepage behavior of earth dams are divided into three types. The first type uses a statistical regression method for analyzing the monitoring data. The factors that influence the seepage behavior of earth dams are summarized as water level, rainfall, temperature, and aging. A seepage monitoring model for a single point is established. Abnormal seepage behavior is detected by analyzing the trend of different factors [2]. Si et al. [3] used a support vector machine to train original monitoring data. This method improved the precision and detection accuracy [...] different points should be fused if the overall seepage behavior must be detected; this technique is called the multiple-point detection method. (2) The overall seepage behavior is qualitatively detected based on experts' experiences and detection results of local seepage behavior. This method is strongly subjective, and the experts' experiences affect the detection results. Therefore, a quantitative detection method for the overall seepage behavior of dams should be investigated.
In summary, given the increase in monitoring data, new research methods should be developed to explore the potential information in massive monitoring data and establish an efficient and accurate model for seepage behavior detection. Therefore, this study utilizes principal component analysis (PCA) and mutual information (MI) from data mining technology to extract information from massive seepage monitoring data with multiple points. PCA is used for information extraction, and MI is used to describe the correlation between the principal component (PC) and upstream water level. The detection model for seepage behavior is established based on the MI distribution, thus providing a new means of accurately detecting the overall seepage behavior of earth dams.

Method for Establishing the Model
2.1. Modeling Process. Data mining [13,14] refers to the process of discovering hidden information from massive amounts of data. Given the increasing volume of monitoring data, studying the correlation in the data and extracting key information from massive monitoring data are the primary tasks in establishing a seepage detection model.
PCA is an important data mining algorithm [15] that exploits the correlation between variables to extract one or a few PCs that replace the original variables while minimizing the loss of information. Currently, PCA is extensively applied in data analysis [16,17]. Several seepage monitoring cross sections are arranged in the dam, and osmometers are arranged in the sections to monitor the seepage behavior of dams. The monitoring data are called the water level of osmometers. The data from multiple osmometers should be fused to detect the overall seepage behavior quantitatively. The locations and working conditions of several sections are similar. Therefore, the data from these osmometers are similar and correlated. In this study, each osmometer is considered a primitive variable, and PCA is used to reconstruct one or a few integrated variables (PCs) that reflect the basic characteristics of the primitive variables. PCs contain key information from primitive variables and provide the basis for quantitative detection of the overall seepage behavior.
The upstream water level is an important factor that affects the seepage of earth dams [18]. For earth dams, the earth that is used to fill the dam does not prevent seepage, and the upstream water enters the earth. To prevent seepage damage in the dam, a core wall made from impervious materials is constructed in the dam to block the seepage and ensure its safety. Therefore, the water level of osmometers arranged in front of the core wall is close to the upstream water level considering the poor antipermeability of the earth. The PCs of these osmometers are also close to the upstream water level, and the correlation between the PC and upstream water level is strong. The water level of osmometers arranged behind the core wall is significantly reduced given the antipermeability of the core wall. The PCs of these osmometers are also significantly reduced. Therefore, the correlation between the PC and upstream water level is low. MI [19] is used to quantitatively describe the correlation between the upstream water level and PC after extracting the PC of osmometers. A large MI value indicates a strong correlation between upstream water level and PC. Compared with the traditional correlation coefficient, MI simultaneously describes the linear and nonlinear relationships between the variables. In addition, MI is extensively used to describe the correlation of variables [20,21]. MI between the upstream water level and PC should fluctuate within a reasonable range when the core wall is intact, thereby indicating that the seepage behavior of the dam is normal. When the core wall is damaged, the seepage quantity in the dam increases under the effect of the upstream water level and the PC becomes abnormal, thus leading to an abnormal MI. This condition indicates that the seepage behavior of the dam is abnormal. The MI fluctuation range, that is, the detection model, can be obtained by analyzing the MI distribution of historical data. The seepage behavior of the dam is normal if the MI falls within this range and abnormal if the MI falls outside this range.
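The claim that MI captures nonlinear relationships that the correlation coefficient misses can be illustrated with a minimal sketch. The synthetic data, the helper name `histogram_mi`, and the histogram-based plug-in estimator are assumptions made only for this illustration; they are not the KDE-based estimator used in this study.

```python
import numpy as np

def histogram_mi(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = p_xy > 0                         # empty cells contribute nothing
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 5000)
y_nonlinear = x**2 + 0.05 * rng.normal(size=x.size)  # purely nonlinear link
y_independent = rng.normal(size=x.size)              # no link at all

pearson = np.corrcoef(x, y_nonlinear)[0, 1]
mi_nonlinear = histogram_mi(x, y_nonlinear)
mi_independent = histogram_mi(x, y_independent)
print(f"Pearson r = {pearson:.3f}")
print(f"MI (nonlinear) = {mi_nonlinear:.3f}, MI (independent) = {mi_independent:.3f}")
```

In this synthetic case the correlation coefficient is close to zero although the two variables are strongly dependent, whereas the MI estimate clearly separates the dependent pair from the independent pair.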
The advantage of MI in detecting seepage behavior is that MI can be used to eliminate the interference of osmometer failure. In general, several abnormal values are found in the data of osmometers. The PC fuses the data from different osmometers. Therefore, the PC also contains abnormal data. These abnormal values may reflect abnormal seepage behavior. However, they may also be caused by osmometer failure. The abnormal values caused by osmometer failure may interfere with the detection of seepage behavior and lead to misdiagnosis. The MI represents the correlation between PC and upstream water level. If the abnormal data are caused by osmometer failure, then the MI will not be abnormal because the abnormal data are not caused by the upstream water level. Therefore, MI eliminates the interference from osmometer failure and improves the detection accuracy.
In summary, this study uses PCA to extract PCs from massive seepage measurements and MI to describe the correlation between the PC and the upstream water level. The detection model for seepage behavior is constructed based on the MI distribution. A flowchart that illustrates the modeling process is depicted in Figure 1.

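Extracting PCs of Effect Variables. The number of seepage monitoring points is assumed as $n$; that is, the number of primitive variables is $n$, and each point contains $m$ observed values. Therefore, these observed values can form the following $n \times m$ matrix:

$$\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_n \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}, \tag{1}$$

where $\mathbf{X}_i$ ($i = 1, 2, \ldots, n$) is the row vector that denotes the monitoring data sequence of the $i$th monitoring point and $x_{ij}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, m$) denotes the $j$th monitoring data of the $i$th monitoring point.

In matrix $\mathbf{X}$, the working environments of these primitive variables (seepage monitoring points) are similar. Therefore, the measured data of these points $(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n)^T$ exhibit a strong correlation. PCA is used to reconstruct $n$ uncorrelated integrated variables (PCs) when the number of primitive variables is $n$. Score matrix $\mathbf{F}$ can be expressed as

$$\mathbf{F} = (\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_n)^T = \mathbf{L}\mathbf{X}, \qquad \mathbf{F}_i = l_{i1}\mathbf{X}_1 + l_{i2}\mathbf{X}_2 + \cdots + l_{in}\mathbf{X}_n, \tag{2}$$

where $\mathbf{F}_i$ is the $i$th PC and $\mathbf{L}$ is the score coefficient matrix. $l_{ij}$ is the coefficient of the $j$th primitive variable in the $i$th PC and reflects the relevance between the $j$th primitive variable $\mathbf{X}_j$ and the $i$th PC, $\mathbf{F}_i$. A large absolute value of $l_{ij}$ indicates a high correlation between $\mathbf{F}_i$ and $\mathbf{X}_j$. Hence, considerable information on $\mathbf{X}_j$ can be explained by $\mathbf{F}_i$. If $l_{ij}$ is positive, then the correlation between $\mathbf{F}_i$ and $\mathbf{X}_j$ is positive. If $l_{ij}$ is negative, then the correlation between $\mathbf{F}_i$ and $\mathbf{X}_j$ is negative.

Equation (2) shows that calculating $\mathbf{L}$ is the key step, because the PCs are obtained directly from $\mathbf{L}$. Assume that the covariance matrix of primitive variables is expressed as

$$\mathbf{C} = \operatorname{cov}(\mathbf{X}) = E\left\{[\mathbf{X} - E(\mathbf{X})][\mathbf{X} - E(\mathbf{X})]^T\right\}. \tag{3}$$

The eigenvalue decomposition of $\mathbf{C}$ can be expressed as

$$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T, \tag{4}$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix; that is, $\boldsymbol{\Lambda} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$, where $\lambda_i$ is the $i$th eigenvalue of $\mathbf{C}$, which is the variance of the $i$th PC. $\mathbf{U}$ is the eigenvector matrix, which can be written as $\mathbf{U} = (\mathbf{u}_1, \ldots, \mathbf{u}_i, \ldots, \mathbf{u}_n)$. $\mathbf{u}_i$ can be written as $(u_{i1}, u_{i2}, \ldots, u_{in})^T$ ($i = 1, 2, \ldots, n$). Thus, $\mathbf{L} = \mathbf{U}^T$ can be confirmed.

Assume that $\mathbf{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in})^T$ is an orthogonal vector, which makes the $i$th PC $\mathbf{F}_i = \mathbf{a}_i^T\mathbf{X}$. The second property of $\mathbf{F}_i$ indicates that $\mathbf{F}_i$ has the largest variance among all linear combinations of $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ that are uncorrelated with $\mathbf{F}_1, \mathbf{F}_2, \ldots, \mathbf{F}_{i-1}$. Therefore, $\operatorname{var}(\mathbf{F}_i)$ reaches the largest variance when $\mathbf{a}_i = \mathbf{u}_i$; that is,

$$\operatorname{var}(\mathbf{F}_i) = \operatorname{var}(\mathbf{u}_i^T\mathbf{X}) = \mathbf{u}_i^T\mathbf{C}\mathbf{u}_i = \lambda_i. \tag{5}$$

If the number of primitive variables is $n$, then at most $n$ PCs can be reconstructed. The ability of these $n$ PCs to explain primitive variables is different. Therefore, $z$ (where $z < n$) PCs that best describe the properties of primitive variables should be extracted from the $n$ PCs. The values of $\lambda_i$ are sorted from large to small, and the value of $z$ is typically determined based on the cumulative variance contribution rate $\eta$, which is calculated as follows:

$$\eta = \frac{\sum_{i=1}^{z}\lambda_i}{\sum_{i=1}^{n}\lambda_i}. \tag{6}$$

In general, if $\eta$ is greater than 95%, then over 95% of the original information can be explained by the first $z$ PCs. Therefore, $\eta \ge 95\%$ is set as the discriminant index for extracting $z$ PCs from $n$ PCs. In engineering applications, the number of PCs can be properly adjusted according to the specific circumstances.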
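The extraction steps above (covariance matrix, eigenvalue decomposition, score coefficients, and selection of the number of PCs by the cumulative variance contribution rate) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `extract_pcs`, the synthetic four-osmometer data, and the centering of each series before projection are choices made for this example and are not taken from the study.

```python
import numpy as np

def extract_pcs(X, threshold=0.95):
    """PCA by eigendecomposition of the covariance matrix.

    X has one row per monitoring point and one column per observation.
    Returns the PC scores, the retained score coefficients, all eigenvalues
    (sorted from large to small), and the cumulative contribution rates.
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # center each series (assumption)
    C = np.cov(Xc)                              # covariance matrix of the rows
    eigvals, U = np.linalg.eigh(C)              # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, U = eigvals[order], U[:, order]
    eta = np.cumsum(eigvals) / eigvals.sum()    # cumulative contribution rate
    z = min(int(np.searchsorted(eta, threshold)) + 1, len(eigvals))
    L = U.T                                     # score coefficient matrix
    F = L[:z] @ Xc                              # scores of the first z PCs
    return F, L[:z], eigvals, eta[:z]

# Synthetic stand-in: four osmometers that follow one common water level.
rng = np.random.default_rng(1)
level = np.cumsum(rng.normal(size=500))
X = np.vstack([level + 0.1 * rng.normal(size=500) for _ in range(4)])

F, L_z, eigvals, eta = extract_pcs(X)
print("retained PCs:", F.shape[0])
print("coefficients of the first PC:", np.round(L_z[0], 3))
print("cumulative contribution rate:", np.round(eta, 4))
```

Because the four synthetic series are almost identical, a single PC is retained and its four coefficients are nearly equal in magnitude. The sign of an eigenvector is arbitrary, so the coefficients may all be negative.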

MI Calculation.
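Assume that the $i$th PC of seepage is $\mathbf{F}_i$ and the upstream water level is $\mathbf{Y}$. Then, the MI $I_i$ between $\mathbf{F}_i$ and $\mathbf{Y}$ is calculated as

$$I_i = \iint p(f_i, y)\,\log\frac{p(f_i, y)}{p(f_i)\,p(y)}\,\mathrm{d}f_i\,\mathrm{d}y, \tag{7}$$

where $p(f_i)$ and $p(y)$ are the probability density functions of $\mathbf{F}_i$ and $\mathbf{Y}$, respectively, and $p(f_i, y)$ is the joint probability density function of $\mathbf{F}_i$ and $\mathbf{Y}$. If the correlation between $\mathbf{F}_i$ and $\mathbf{Y}$ is high, then $I_i$ will be large. Moreover, if $\mathbf{F}_i$ and $\mathbf{Y}$ are not related, then $I_i$ will be zero. $\mathbf{F}_i$ and $\mathbf{Y}$ may not follow a distribution of fixed form. Hence, the kernel density estimation (KDE) method [22] is used to estimate the probability density functions of $\mathbf{F}_i$ and $\mathbf{Y}$. In this method, $p(f_i)$ and $p(y)$ are expressed as

$$p(f_i) = \frac{1}{mh}\sum_{k=1}^{m}K\!\left(\frac{f_i - f_{ik}}{h}\right), \qquad p(y) = \frac{1}{mh}\sum_{k=1}^{m}K\!\left(\frac{y - y_k}{h}\right), \tag{8}$$

where $m$ is the number of measurements, $f_{ik}$ is the $k$th measured value of the $i$th PC, $y_k$ is the $k$th measured value of the upstream water level, and $K$ is the kernel function. The Gaussian kernel function [22] is generally used and expressed as

$$K(u) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\frac{u^2}{2}\right). \tag{9}$$

The joint probability density function $p(f_i, y)$ is expressed as

$$p(f_i, y) = \frac{1}{mh^2}\sum_{k=1}^{m}K\!\left(\frac{f_i - f_{ik}}{h}\right)K\!\left(\frac{y - y_k}{h}\right). \tag{10}$$

In (8)-(10), $h$ is the bandwidth used to control the smoothness and fitting accuracy of the probability density curve.

If the value of $h$ is large, then the probability density curve is smooth but the fitting precision is low. If the value of $h$ is small, then the smoothness of the probability density curve decreases but the fitting precision increases. In general, the value of $h$ is determined through a comprehensive analysis of smoothness and fitting accuracy.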
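A minimal numerical sketch of the MI calculation with Gaussian KDE is given below. It evaluates the marginal and joint densities on a regular grid and approximates the double integral by a Riemann sum. The function names, the grid size, the use of the natural logarithm, the same bandwidth for both variables, and the standardization of the synthetic inputs are assumptions made for this illustration.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde_mutual_information(f, y, h=0.1, grid=150):
    """MI (in nats) between two samples using product-Gaussian KDE on a grid."""
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    m = f.size
    gf = np.linspace(f.min() - 4 * h, f.max() + 4 * h, grid)
    gy = np.linspace(y.min() - 4 * h, y.max() + 4 * h, grid)
    Kf = gaussian_kernel((gf[:, None] - f[None, :]) / h)   # shape (grid, m)
    Ky = gaussian_kernel((gy[:, None] - y[None, :]) / h)   # shape (grid, m)
    p_f = Kf.sum(axis=1) / (m * h)            # marginal density of the PC
    p_y = Ky.sum(axis=1) / (m * h)            # marginal density of the level
    p_fy = Kf @ Ky.T / (m * h * h)            # joint density, shape (grid, grid)
    prod = p_f[:, None] * p_y[None, :]
    mask = (p_fy > 1e-12) & (prod > 1e-12)    # avoid log(0) in empty regions
    integrand = np.zeros_like(p_fy)
    integrand[mask] = p_fy[mask] * np.log(p_fy[mask] / prod[mask])
    return float(integrand.sum() * (gf[1] - gf[0]) * (gy[1] - gy[0]))

def standardize(v):
    return (v - v.mean()) / v.std()

# Synthetic stand-ins for a PC and the upstream water level.
rng = np.random.default_rng(2)
y = rng.normal(size=400)
f_dependent = y + 0.3 * rng.normal(size=400)   # follows the water level
f_independent = rng.normal(size=400)           # unrelated to the water level

mi_dep = kde_mutual_information(standardize(f_dependent), standardize(y))
mi_ind = kde_mutual_information(standardize(f_independent), standardize(y))
print(f"MI dependent = {mi_dep:.3f}, MI independent = {mi_ind:.3f}")
```

With a small bandwidth and a finite sample, the estimate for the unrelated pair is generally small but not exactly zero, so MI values are better compared with each other or with their historical distribution than with zero.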

Detection Model of Seepage.
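Assume that the length of the dam safety monitoring data is $N$ years. If the unit is years, then the detection model is constructed based on the annual variation of MI. Using (7), the MI between the PCs and the upstream water level can be obtained. These MI values form the following MI matrix:

$$\mathbf{I} = \begin{bmatrix} I_{1,1} & \cdots & I_{1,j} & \cdots & I_{1,N} \\ \vdots & & \vdots & & \vdots \\ I_{i,1} & \cdots & I_{i,j} & \cdots & I_{i,N} \\ \vdots & & \vdots & & \vdots \\ I_{z,1} & \cdots & I_{z,j} & \cdots & I_{z,N} \end{bmatrix}, \tag{11}$$

where $I_{i,j}$ is the MI between the $i$th PC and the upstream water level in the $j$th year. Let $\mathbf{I}_j = [I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ denote the MI value in the $j$th year and $\bar{\mathbf{I}} = \frac{1}{N}\sum_{j=1}^{N}\mathbf{I}_j$ denote the mean vector. The deviation of $\mathbf{I}_j$ from $\bar{\mathbf{I}}$ is measured by

$$d_j^2 = (\mathbf{I}_j - \bar{\mathbf{I}})^T\mathbf{S}^{-1}(\mathbf{I}_j - \bar{\mathbf{I}}), \tag{12}$$

where $\mathbf{S}$ is the following covariance matrix:

$$\mathbf{S} = \frac{1}{N-1}\sum_{j=1}^{N}(\mathbf{I}_j - \bar{\mathbf{I}})(\mathbf{I}_j - \bar{\mathbf{I}})^T. \tag{13}$$

Based on statistical theory [15], the distribution of $d_j^2$ is known, and its critical value at significance level $\alpha$ is denoted by $c_\alpha^2$:

$$P\{d_j^2 > c_\alpha^2\} = \alpha. \tag{14}$$

Therefore, the probability $P$ that the MI value $[I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ in the $j$th year falls into the $100(1-\alpha)\%$ confidence region satisfies the following equation:

$$P\left\{(\mathbf{I}_j - \bar{\mathbf{I}})^T\mathbf{S}^{-1}(\mathbf{I}_j - \bar{\mathbf{I}}) \le c_\alpha^2\right\} = 1 - \alpha, \tag{15}$$

and the confidence region satisfies the following inequality:

$$(\mathbf{I}_j - \bar{\mathbf{I}})^T\mathbf{S}^{-1}(\mathbf{I}_j - \bar{\mathbf{I}}) \le c_\alpha^2. \tag{16}$$

The region is a confidence interval when $z = 1$, a confidence ellipse when $z = 2$, a confidence ellipsoid when $z = 3$, and a hyperellipsoid when $z > 3$.

The range of the confidence region is determined by the eigenvalues of covariance matrix $\mathbf{S}$ and significance level $\alpha$. $\mathbf{S}$ is symmetric and positive definite and has $z$ real eigenvalues that are greater than zero. The eigenvalues of $\mathbf{S}$ are expressed as

$$\lambda_{S,1} \ge \lambda_{S,2} \ge \cdots \ge \lambda_{S,z} > 0. \tag{17}$$

The $100(1-\alpha)\%$ confidence region of the MI value $[I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ in the $j$th year is centered on mean vector $\bar{\mathbf{I}}$. The lengths of the half axes are expressed as

$$c_\alpha\sqrt{\lambda_{S,i}}, \qquad i = 1, 2, \ldots, z. \tag{18}$$

From the statistical theory [15], significance level $\alpha$ is typically set as 0.05 and 0.01. Therefore, the distribution of $[I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ satisfies the following equations:

$$(\mathbf{I}_j - \bar{\mathbf{I}})^T\mathbf{S}^{-1}(\mathbf{I}_j - \bar{\mathbf{I}}) \le c_{0.05}^2, \tag{19}$$

$$(\mathbf{I}_j - \bar{\mathbf{I}})^T\mathbf{S}^{-1}(\mathbf{I}_j - \bar{\mathbf{I}}) \le c_{0.01}^2. \tag{20}$$

Equations (19) and (20) are considered the detection models for seepage behavior of earth dams. For the MI value $[I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ in the $j$th year, the probability of falling in the range of (19) is 0.95, and the probability of falling outside the range of (20) is 0.01. Based on the small probability principle, an event is considered a small probability event when its probability is less than 0.01. If a small probability event occurs, then appropriate attention must be paid. When the preceding theories are combined with engineering experience in seepage monitoring, the seepage behavior of earth dams is divided into three states, namely, normal, early warning, and abnormal; these states are described as follows:

(1) $[I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ falls within the range of (19) ($P = 0.95$); that is, the seepage behavior of the earth dam is normal.

(2) $[I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ falls outside the range of (19) but within the range of (20); that is, the seepage behavior of the earth dam signals an early warning.

(3) $[I_{1,j}, I_{2,j}, \ldots, I_{i,j}, \ldots, I_{z,j}]^T$ falls outside the range of (20); that is, the seepage behavior of the earth dam is abnormal.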
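The three-state classification can be sketched in a few lines. The critical value used below is an assumption: it is the F-distribution constant of the prediction region for a new observation under multivariate normality, and the constant used in this study may differ. The function names and the synthetic 20-year history (independent normal draws with means and standard deviations similar to those reported in the case study) are also hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

def fit_region(I_hist):
    """Mean vector and covariance matrix of a (z, N) matrix of annual MI values."""
    return I_hist.mean(axis=1), np.atleast_2d(np.cov(I_hist))

def critical_value(alpha, z, N):
    """Squared radius of the region (assumed prediction-region constant)."""
    scale = z * (N - 1) * (N + 1) / (N * (N - z))
    return scale * f_dist.ppf(1.0 - alpha, z, N - z)

def half_axes(S, alpha, N):
    """Half-axis lengths: sqrt(eigenvalue of S) times the region radius."""
    eigvals = np.linalg.eigvalsh(S)[::-1]
    return np.sqrt(eigvals * critical_value(alpha, S.shape[0], N))

def classify(I_new, mean, S, N):
    """Return 'normal', 'early warning', or 'abnormal' for one MI vector."""
    d = np.asarray(I_new, dtype=float) - mean
    d2 = float(d @ np.linalg.solve(S, d))       # squared Mahalanobis distance
    z = mean.size
    if d2 <= critical_value(0.05, z, N):
        return "normal"
    if d2 <= critical_value(0.01, z, N):
        return "early warning"
    return "abnormal"

# Synthetic history with z = 2 PCs and N = 20 years (illustrative only).
rng = np.random.default_rng(4)
history = np.vstack([rng.normal(1.86, 0.31, 20), rng.normal(0.31, 0.11, 20)])
mean, S = fit_region(history)

state_typical = classify(mean, mean, S, N=20)
state_extreme = classify(mean + np.array([3.1, 0.0]), mean, S, N=20)
print(state_typical, "/", state_extreme)
print("half axes at alpha = 0.05:", np.round(half_axes(S, 0.05, 20), 3))
```

A point at the center of the region is classified as normal, and a point displaced by roughly ten standard deviations along the first axis is classified as abnormal. Points whose distance lies between the two critical values fall into the early warning state.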

Case Study
3.1. Description of the Project. The Shenzhen Reservoir (Figure 2) is located downstream of Shawan River in Shenzhen City, Guangdong Province, China. This reservoir is a water conservation project with functions of flood control, water supply, and power generation. The main structures include the main dam, the left auxiliary dam, and the right auxiliary dam. The main dam is an earth dam with a core wall; its shell material is gravelly, silty, and clayey, and the antiseepage core wall is made of concrete. Four seepage monitoring cross sections (MXF, MXG, MXS, and MXL) are arranged to monitor the dam seepage behavior and the antiseepage effect of the core wall. A total of 20 osmometers are placed on the cross sections, with five osmometers in each cross section. The osmometers in front of the core wall are called prewall osmometers and are numbered MXF1, MXG1, MXS1, and MXL1 for ease of identification. The osmometers behind the core wall are called back-wall osmometers and are numbered MXF2-MXF5, MXG2-MXG5, MXS2-MXS5, and MXL2-MXL5. The locations of the osmometers are shown in Figure 3.
The current study uses the prewall osmometers (MXF1, MXG1, MXS1, and MXL1) and the first back-wall osmometers (MXF2, MXG2, MXS2, and MXL2) as representative monitoring points. Seepage behavior is detected through a data mining method using the monitoring data obtained from these osmometers from January 1, 1995, to December 31, 2014. The process lines for prewall osmometers MXF1, MXG1, MXS1, and MXL1 are shown in Figure 4, and those for back-wall osmometers MXF2, MXG2, MXS2, and MXL2 are displayed in Figure 5.
The qualitative analysis in Figure 4 shows that the measured values of the prewall osmometers and their variations are similar. The qualitative analysis presented in Figure 5 indicates that the measured values of MXF2, MXG2, and MXS2 among the first back-wall osmometers are similar. However, the fluctuation of the MXL2 value increases significantly from 2005 to 2010, which is an abnormal phenomenon in which the measured values of MXL2 are inconsistent with those of MXF2, MXG2, and MXS2.
The possible causes of the abnormal measured values of MXL2 include the following: (1) damage to the core wall in the MXL monitoring section, which results in abnormal seepage behavior; (2) osmometer failures, such as external water infiltration into the MXL2 osmometer and abnormal operation of the MXL2 osmometer. This study uses the data mining method to establish the detection model for seepage behavior. The seepage behavior is detected, and the causes of the abnormal MXL2 data are then inferred.

Extraction of PCs in the Prewall Osmometers. The covariance matrix $\mathbf{C}_p$ for prewall osmometers MXF1, MXG1, MXS1, and MXL1 is calculated by using (3), as displayed in Table 1. The covariance matrix $\mathbf{C}_b$ for back-wall osmometers MXF2, MXG2, MXS2, and MXL2 is also calculated, as presented in Table 2.
From Tables 1 and 2, the covariance of the prewall osmometers lies between 0.91 and 0.98, which indicates a high correlation among the values. The covariance among MXF2, MXG2, and MXS2 is between 0.70 and 0.85, and the correlation is also high. However, the covariance between MXL2 and each of MXF2, MXG2, and MXS2 is between −0.01 and 0.21, thereby indicating that MXL2 is weakly correlated with the first back-wall osmometers on the other monitored sections.
The eigenvalues, their variance contribution rates, and the cumulative variance contribution rates of $\mathbf{C}_p$ and $\mathbf{C}_b$ can be calculated by using (4) and (6), as summarized in Table 3.
Table 3 indicates that the eigenvalue of $F_{p1}$ in the prewall osmometers is considerably larger than the eigenvalues of the other PCs. In addition, the variance contribution rate of $F_{p1}$ reaches 96.59%, which is higher than the threshold of 95.00%, thereby denoting that the main information in the original measurements of MXF1, MXG1, MXS1, and MXL1 can be explained using $F_{p1}$. Hence, $F_{p1}$ can be used to describe the seepage characteristics of MXF1, MXG1, MXS1, and MXL1. Let $X_{p1}$, $X_{p2}$, $X_{p3}$, and $X_{p4}$ represent MXF1, MXG1, MXS1, and MXL1, respectively. According to (2), $F_{p1}$ is expressed as a linear combination of $X_{p1}$, $X_{p2}$, $X_{p3}$, and $X_{p4}$, which is denoted as (21). In (21), the coefficients of $X_{p1}$, $X_{p2}$, $X_{p3}$, and $X_{p4}$ are extremely close, which indicates that the measured data of MXF1, MXG1, MXS1, and MXL1 are similar. Therefore, $F_{p1}$ can represent MXF1, MXG1, MXS1, and MXL1.
Table 3 also indicates that 63.85% of the original measured information can be explained by the first PC $F_{b1}$ of the first back-wall osmometers, whereas 25.55% can be explained by the second PC $F_{b2}$. The cumulative variance contribution rate of $F_{b1}$ and $F_{b2}$ reaches 89.64%. Although this cumulative rate is below the threshold, the information carried by $F_{b3}$ and $F_{b4}$ is significantly smaller. Therefore, the original measured information of MXF2, MXG2, MXS2, and MXL2 can be represented by $F_{b1}$ and $F_{b2}$. Let $X_{b1}$, $X_{b2}$, $X_{b3}$, and $X_{b4}$ be the measured data of MXF2, MXG2, MXS2, and MXL2, respectively. The expressions of $F_{b1}$ and $F_{b2}$, denoted as (22) and (23), are obtained by using (2). In (22), the coefficients of MXF2, MXG2, and MXS2 are close and significantly higher than the coefficient of MXL2, thereby indicating that $F_{b1}$ mainly explains the original measured information of MXF2, MXG2, and MXS2. In (23), the coefficient of MXL2 is higher than the absolute value of the coefficients of MXF2, MXG2, and MXS2, thus denoting that $F_{b2}$ mainly explains the original measured information of MXL2.
The values of $F_{p1}$ during the monitoring period from January 1, 1995, to December 31, 2014, are calculated using (21), whereas those of $F_{b1}$ and $F_{b2}$ are calculated using (22) and (23), respectively. The process lines for $F_{p1}$, $F_{b1}$, and $F_{b2}$ and the upstream water level are illustrated in Figure 6.
In this figure, the process lines for $F_{p1}$ and the upstream water level coincide, thereby showing a strong correlation between $F_{p1}$ and the upstream water level. The correlations between $F_{b1}$ and the upstream water level and between $F_{b2}$ and the upstream water level are not evident. In addition, $F_{b1}$ mainly describes the seepage characteristics of MXF2, MXG2, and MXS2, and its process line therefore exhibits strong regularity. By contrast, $F_{b2}$ mainly describes the seepage characteristics of MXL2. Therefore, a significant fluctuation in its process line is observed during the period of 2005-2010.
The PCA in data mining combines highly correlated information and separates anomalous data. The key information of the original osmometers is expressed by $F_{p1}$, $F_{b1}$, and $F_{b2}$, thereby reducing the number of original variables and providing the basis for quantitative detection.

MI between PCs and Upstream Water Level. Additional analysis is conducted by calculating the MI between each PC and the upstream water level to establish the detection model for seepage behavior and determine the cause of the abnormal measurements of MXL2. Let $I_{p1}$, $I_{b1}$, and $I_{b2}$ denote the MI values between $F_{p1}$ and the upstream water level, between $F_{b1}$ and the upstream water level, and between $F_{b2}$ and the upstream water level, respectively, during the period of 1995-2014. Using (8), the probability density function of each PC and of the upstream water level can be obtained from the KDE when the bandwidth is set to 1.0, 0.5, and 0.1. The results are displayed in Figure 7.
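In this figure, the probability density function can accurately simulate the distribution of the PC and the upstream water level when the bandwidth is set to 0.1. The MI values $I_{p1}$, $I_{b1}$, and $I_{b2}$ under this bandwidth during the period of 1995-2014 (i.e., 20 years) are calculated using (7). The matrix of $\mathbf{I}$ is expressed as

$$\mathbf{I} = \begin{bmatrix} I_{p1,1} & \cdots & I_{p1,j} & \cdots & I_{p1,20} \\ I_{b1,1} & \cdots & I_{b1,j} & \cdots & I_{b1,20} \\ I_{b2,1} & \cdots & I_{b2,j} & \cdots & I_{b2,20} \end{bmatrix} \quad (j = 1, 2, \ldots, 20). \tag{24}$$

The process lines for the MI values are depicted in Figure 8.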
MI reflects the correlation among variables, and MXF1, MXG1, MXS1, and MXL1 are placed in front of the core wall. Therefore, a high correlation theoretically exists between $F_{p1}$ and the upstream water level, thus indicating that $I_{p1}$ is large. However, the core wall plays the main role in antiseepage. If the seepage behavior of the earth dam is normal, then the correlations between $F_{b1}$ and the upstream water level and between $F_{b2}$ and the upstream water level should be significantly reduced; these conditions indicate that $I_{b1}$ and $I_{b2}$ are small. If the seepage behavior is abnormal, then the correlations between $F_{b1}$ and the upstream water level and between $F_{b2}$ and the upstream water level will increase, thereby indicating that $I_{b1}$ and $I_{b2}$ will exhibit a significant increase. In Figure 8, $I_{p1}$ varies within the range of [1.17, 2.44], and $I_{b1}$ and $I_{b2}$ vary within the range of $[1.30 \times 10^{-1}, 6.32 \times 10^{-1}]$ during the period of 1995-2014. $I_{b1}$ and $I_{b2}$ are significantly lower than $I_{p1}$. Therefore, we can qualitatively consider that the seepage behavior is normal.

Detection Model of Seepage Behavior. The result of the Kolmogorov-Smirnov [23] analysis shows that $I_{p1}$ follows a normal distribution $N(1.86, 0.31^2)$, $I_{b1}$ follows a normal distribution $N(0.31, 0.11^2)$, and $I_{b2}$ follows a normal distribution $N(0.27, 0.10^2)$. The detection model is established using the distribution of MI values to quantitatively analyze the seepage behavior. The measured value of the MXL2 osmometer in 2005-2010 is evidently abnormal; that is, $I_{b2}$ may not reflect the real MI between MXL2 ($F_{b2}$) and the upstream water level. Therefore, the detection model is established based on the distribution of $I_{p1}$ and $I_{b1}$, which reflects the real MI of the measured value and the upstream water level.
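A normality check of this kind can be sketched with the one-sample Kolmogorov-Smirnov test in SciPy. The 20 values below are synthetic draws, not the MI values of this study, and the helper name `looks_normal` is hypothetical. Because the mean and standard deviation are estimated from the same sample, the reported p-value is conservative.

```python
import numpy as np
from scipy import stats

def looks_normal(sample, alpha=0.05):
    """K-S test of a sample against a normal law fitted to that sample."""
    sample = np.asarray(sample, dtype=float)
    mu, sigma = sample.mean(), sample.std(ddof=1)
    stat, p_value = stats.kstest(sample, "norm", args=(mu, sigma))
    return stat, p_value, bool(p_value > alpha)

# Synthetic stand-in for 20 annual MI values (illustrative only).
rng = np.random.default_rng(3)
mi_values = rng.normal(1.86, 0.31, size=20)

stat, p_value, ok = looks_normal(mi_values)
print(f"D = {stat:.3f}, p = {p_value:.3f}, normality not rejected: {ok}")
```

Normality is not rejected when the p-value exceeds the chosen significance level, in which case the sample mean and standard deviation can be used as the parameters of the fitted normal distribution.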
In (15) and (16), the confidence region is an ellipse when the number of PCs is $z = 2$. The means of $I_{p1}$ and $I_{b1}$ are 1.86 and 0.31, respectively. Significance level $\alpha$ is set to 0.05 and 0.01. Then, the two confidence ellipses, (25) and (26), can be obtained using (19) and (20). Equations (25) and (26) are considered the detection model for seepage behavior, and their images are exhibited in Figure 9. In this model, the seepage behavior can be determined based on the positions of $I_{p1,j}$, $I_{b1,j}$, and $I_{b2,j}$ ($j = 1, 2, \ldots, 20$) in the ellipses. The MI values $(I_{p1,j}, I_{b1,j})$ and $(I_{p1,j}, I_{b2,j})$ from 1995 to 2014 are plotted in Figure 9. In this figure, all values of $(I_{p1,j}, I_{b1,j})$ and $(I_{p1,j}, I_{b2,j})$ are in the normal state, except for the value of $(I_{p1,j}, I_{b1,j})$ in 2004, which is in the early warning state. This result indicates that the seepage behavior is normal. Therefore, the significant fluctuation of MXL2 in 2005-2010 may be caused by equipment failure.

3.4. Verifying the Speculation. The MXL2 osmometer was tested and analyzed through an engineering method to verify the speculation.
(1) The technical performance of the MXL2 osmometer was tested, and the results showed that the current service status of the MXL2 osmometer is qualified.
(2) The piezometer sensitivity in the MXL2 osmometer was also tested, and the results showed that the piezometer sensitivity in the MXL2 osmometer is unqualified. A certain degree of clogging occurred in the piezometer.
(3) The working records of the MXL2 piezometer were investigated and analyzed. The results showed that the dam surface was transformed in 2004. However, the piezometer in the MXL2 osmometer was poorly maintained, thereby causing rainfall infiltration. The piezometer was punched and cleaned at the beginning of 2011, and piezometer maintenance was conducted. Thus, the measured results of MXL2 after 2011 returned to normal.

Conclusion
Seepage behavior is an important factor that affects the safety of earth dams. In this study, the PCA and MI methods are combined to detect the overall seepage behavior of earth dams. The monitoring data from different monitoring sections are effectively synthesized and mined. The detection model can eliminate the interference of osmometer failure and improve the accuracy of the detection, thereby providing a new method for detecting the overall seepage behavior of earth dams.
The main contributions of this paper are as follows: (1) The PCA method is applied to fuse the data of correlated osmometers, thus promoting the development of seepage detection from a single point to multiple points. (2) The detection model is established from the MI distribution, which advances seepage detection from a qualitative method to a quantitative method. In particular, the method can be extended to detect the behavior of concealed structures such as core walls, foundations, and steel structures.

Figure 1: Modeling process for seepage detection of earth dams.

Figure 5: Process lines of the back-wall osmometers.

Figure 6: Process lines for $F_{p1}$, $F_{b1}$, and $F_{b2}$ and upstream water level.

Figure 7: Probability density functions of the PC and upstream water level by KDE.
(1) If the MI falls within the range of (25), then the seepage behavior is normal. (2) If the MI falls outside the range of (25) but within the range of (26), then the seepage behavior signals an early warning. (3) If the MI falls outside the range of (26), then the seepage behavior is abnormal.

Table 2: Covariance matrix of the first back-wall osmometers.

Table 3: Eigenvalues and the variance contribution rates of $\mathbf{C}_p$ and $\mathbf{C}_b$.