Application of Clustering and Stepwise Discriminant Analysis Based on Hydrochemical Characteristics in Determining the Source of Mine Water Inrush

In order to explore the law of groundwater evolution, the hydraulic connection between faults and aquifers, and the main sources of mine water inrush in the deep mining area of Yangcheng Coal Mine in Jining City, 40 groups of hydrochemical samples were collected and analyzed with the Piper diagram and the Durov diagram. The results showed that groundwater fluidity became weaker and the total dissolved solids (TDS) became larger with increasing depth, and that the water quality types of the coal seam roof and floor were more similar because of conduction by faults. Principal component analysis (PCA) was applied to the raw data and two principal components were extracted; the principal component scores were then used as clustering variables for hierarchical cluster analysis (HCA). Five groups of abnormal water samples were eliminated, and 3 clustering groups, M1, M2 and M3, were obtained from the remaining water samples on the tree diagram. The results showed that the combination of HCA and hydrochemical analysis was more effective in screening water samples, and that the 3 clustering groups could serve as qualified samples representing the 3 major aquifers (Taiyuan Formation limestone aquifer, Shanxi Formation sandstone aquifer and Ordovician limestone aquifer). Finally, taking M1, M2 and M3 as grouping variables, the discriminant functions $f_1$, $f_2$ and $f_3$ of the 3 aquifers were obtained. The results of stepwise discriminant analysis (SDA) showed that the discrimination model established with 25 groups of standard water samples discriminated the known water samples with a correct rate of 96%. The 10 groups of unknown water samples collected at the fault were identified as Taiyuan Formation limestone water, which was consistent with the classification results of HCA, proving that the water inrush at fault DF53 came from the Taiyuan Formation limestone aquifer, while the fault had little influence on the Ordovician limestone aquifer.


Introduction
Mine water inrush has long been a serious constraint on the construction and development of coal mines. If no measures are taken to prevent and control water-abundant aquifers at the initial stage of mining, mining may unknowingly connect with an aquifer or generate large fractures, causing confined water in the water-abundant aquifer to flow into the working face. In mild cases this stops work or production, and in severe cases it causes heavy casualties [1]. Therefore, it is of great significance to judge the causes of water inrush in time, find the source of mine water, and then accurately and efficiently implement measures for each aquifer and water-conducting channel to prevent and control mine water inrush.
In recent years, water source identification methods have developed rapidly. Scholars have gradually extended single identification methods into interdisciplinary, multi-theoretical comprehensive methods [2][3][4]. Traditional water quality and hydrochemical analysis have evolved into more advanced techniques for identifying water sources. For example, the isotope method can be used to trace the source of mine water and determine the connectivity of groundwater aquifers based on the composition of mine water and measured isotope ratios [5,6]. Nonlinear mathematical methods, such as grey system theory [7,8], artificial neural networks [9,10] and fuzzy mathematics [11,12], have been used to calculate the correlation between sample indicators and data and to obtain the corresponding discriminant models and evaluation results, and they perform well in distinguishing the source of mine water.
Multivariate statistical analysis is often combined with hydrochemical analysis on the basis of hydrochemical data. Factor analysis and weighting have been used to handle multidimensional complex data and reflect the proportion of the major ions causing groundwater pollution, and the distribution of polluting ions was obtained to evaluate the degree of groundwater pollution [13]. PCA and HCA have also been combined to establish a model for analyzing and evaluating surface water quality changes and influencing factors, thereby identifying the main types of pollution sources [14]. Multivariate statistical analysis can also be used to identify the source of mine water inrush and the connection between aquifers and water-conducting channels. Liu et al. [15] established a water sample database at the -375 m elevation of the Shandong coastal gold mine, used PCA and cluster analysis (CA) to identify the source of mine water, established a Bayes discriminant model to test the sample groupings, and finally identified the faults passing through the workings as potential water-conducting channels. Wang et al. [16], based on aquifer water sample data from Jiaojia Gold Mine, weighted multidimensional indexes with EWM, removed the indexes with lower weights, and used PCA and HCA to handle the remaining 10 sample indexes, finally showing that the main source of mine water was the fault hanging wall, which provided a reference for mine water prevention. Sun and Gui [17] used PCA, CA and discriminant analysis (DA) to build a model based on water sample data from Renlou coal mine, which characterized the main aquifers from the perspective of ions and allowed the source of mine water to be judged from the concentrations of the main ions.
At present, mines entering the stage of deep mining face a more complex water inrush environment. Various geological structures connect different aquifers with each other, resulting in similar hydrochemical characteristics among the collected water samples and superposition of hydrochemical information, which makes it extremely difficult to identify water inrush sources.
For this reason, this paper puts forward a PCA + HCA + SDA method to discriminate the source of mine water. The method can discriminate mine water sources under complex water source and structural conditions with high accuracy; the flow chart of the research methods is shown in Figure 1. The HCA tree diagram is used not only for data classification but also, combined with the Piper diagram, to screen and eliminate data, which makes the screening of data more detailed and the result of the discriminant equations more accurate. In this paper, the spatial position relationship between groundwater bodies is fully considered, the optimal combination of several multivariate statistical analysis methods is studied, and the PCA + HCA + SDA model for distinguishing water inrush sources is proposed. While improving the correct rate of discrimination, the method standardizes the experimental data and clarifies the connection between water sources, providing a new solution for water source identification.

Study Area
2.1. Geological Conditions. Yangcheng Coal Mine is located in Guolou Town, Wenshang County, Jining City, Shandong Province, as shown in Figure 2. The area has a temperate semihumid monsoon climate, with an annual average temperature of 13.5°C and an annual average precipitation of 664.7 mm, mostly concentrated in June to September. The mining area lies in the alluvial plain of the Yellow River; the terrain is flat, high in the south and low in the north, with a ground elevation of +38.80 m~+41.60 m and an area of 42.63 km².
The strata in the mining area form a monoclinal structure striking NE and dipping SE, with a dip angle of 11°~25° and a local dip angle of 50°. The coal-bearing strata are mainly Carboniferous and Permian, with a total thickness of 257 m and 18 coal-bearing layers. Coal seam 3 is a thick coal seam sandwiched between sandstones of the Shanxi Formation and is the main seam mined in the mining area, with an average thickness of 7.2 m and an inclination angle of 26°~30°. The No.3 mining area, which mainly mines coal seam 3, is taken as the main area for water sample collection; its elevation is -870 m~-1050 m, so it belongs to deep mining.
Fault structures are well developed in this area. The east, south and north sides are cut off by the major faults F1, F2, F3 and F4 and the Wensi branch faults. The main strikes of the faults are NE, NW and EW, of which the NE direction is the most developed. Geophysical exploration shows that there are 124 large and small faults in the area, all of which are normal faults. The F1, Jizhuang and Yangchengba faults are large in scale, with large throws and long extents, and form the main body of the faults in the area. Folds are distributed in the northern part of the study area and mainly include the Houcang syncline, Wanglou anticline and Qulou anticline. The fault structure of the study area is shown in Figure 2.

2.2. Hydrogeological Conditions. The aquifers that have a great influence on mining at the working face can be divided into three types. (1) The first type is the sandstone aquifer of the Shanxi Formation, distributed in the roof and floor of coal seam 3. It is composed of sandstone, siltstone and mudstone, and the average thickness of the roof sandstone section is 31.69 m. Geophysical exploration shows that the roof sandstone section has certain static reserves. According to production requirements, the roof sandstone water is drained through boreholes during mining, and the accumulated drained water has reached 95000 m³. The working face is currently less threatened by sandstone water, but roof gushing still exists. (2) The second type is the Taiyuan Formation limestone aquifer, which is 54 m away from the coal seam floor and contains 8 limestone layers alternating with fine sandstone, medium sandstone, siltstone and mudstone. Among them, 3 limestone layers have a great influence on the working face; they are 49 m~58 m away from the coal seam and generally about 4 m thick. The water pressure of these 3 limestone layers is relatively high. In the horizontal boreholes at -312 m and -650 m in the mine, the water pressure is greater than 1 MPa, in some cases reaching 3.5 MPa, and the maximum water inflow is 45 m³/h. In the initial stage of mining and infrastructure construction, the limestone water of the Taiyuan Formation readily threatens the working face. Figure 3 shows the relative positions of fault DF53 and the 3 main aquifers.

Materials and Methods
A total of 30 groups of water samples were collected from the 3 main aquifers: the Shanxi Formation sandstone aquifer (I), the Taiyuan Formation limestone aquifer (II) and the Ordovician limestone aquifer (III). They came from several main mining faces (3308, 3310, etc.) in the No.3 mining area of Yangcheng Coal Mine. In addition, 10 groups of fault water samples from the 3305 working face were collected as the water samples to be tested. The collected water samples were sent to the Testing and Analysis Center of Shandong coalfield geology bureau of the Fifth Exploration Team for chemical testing, and a water quality testing report was obtained. The main ions Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻ and the total hardness (TH) of the water were determined by ion chromatography. HCO₃⁻ and the total alkalinity (TA) of the water were determined by titration with dilute sulfuric acid and methyl orange. Total dissolved solids (TDS) were obtained by filtering, drying and weighing the water samples. The pH value was determined with a pH tester. Finally, the ten discriminant indexes of the mine water were compiled in Table 1.
3.1. Hydrogeochemistry. The groundwater of different aquifers usually differs in chemical composition, so each aquifer has its own hydrochemical characteristics [18,19]. The Piper diagram and the Durov diagram are commonly used in hydrochemical analysis. The ion composition and proportion of each water sample can be read from the Piper diagram.
The relative contents of cations and anions in a water sample can be read from the two lower triangles, respectively. The upper rhombus is used to read the overall ion proportions and chemical properties of the water sample. Water samples from the same aquifer concentrate in one part of the rhombus, while individual water samples with different compositions deviate from the majority. These nonstandard water samples are eliminated, so that the "standard" water samples conforming to the composition characteristics of the aquifer are retained by this screening [20,21].
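As a rough illustration of how the relative ion contents plotted on the two triangles can be obtained, the sketch below converts concentrations from mg/L to meq/L and then to percentages of total cations and total anions. It is only a sketch under stated assumptions: the sample values are hypothetical, the combined Na⁺ + K⁺ value is treated as Na⁺, and the paper does not state which units were used for its diagrams.

```python
# Equivalent weights in mg per meq (molar mass divided by ionic charge).
EQUIVALENT_WEIGHT = {
    "Na+K": 22.99,  # assumption: the combined Na+ + K+ value is treated as Na+
    "Ca": 20.04,
    "Mg": 12.15,
    "Cl": 35.45,
    "SO4": 48.03,
    "HCO3": 61.02,
}
CATIONS = ("Na+K", "Ca", "Mg")
ANIONS = ("Cl", "SO4", "HCO3")


def to_meq(sample_mg_l):
    """Convert ion concentrations from mg/L to meq/L."""
    return {ion: value / EQUIVALENT_WEIGHT[ion] for ion, value in sample_mg_l.items()}


def percentages(meq, ions):
    """Express each ion as a percentage of the total of its group."""
    total = sum(meq[ion] for ion in ions)
    return {ion: 100.0 * meq[ion] / total for ion in ions}


# Hypothetical concentrations in mg/L (not taken from Table 1).
sample = {"Na+K": 1150.0, "Ca": 40.0, "Mg": 12.0,
          "Cl": 1000.0, "SO4": 200.0, "HCO3": 600.0}

meq = to_meq(sample)
cation_pct = percentages(meq, CATIONS)  # position in the cation triangle
anion_pct = percentages(meq, ANIONS)    # position in the anion triangle
print(cation_pct)
print(anion_pct)
```

Each group of three percentages sums to 100, which is what allows a sample to be placed as a single point in each triangle.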

3.2. Principal Component Analysis. PCA is a basic multivariate statistical analysis method. Its main idea is to construct linear combinations of the raw variables and compress multiple groups of complex variables into a few comprehensive variables, so as to compress the data and extract the main information [20][21][22][23][24]. To avoid the influence of different data units on the calculation results, the raw data were first standardized by z-scores. With $n$ samples $x_i = (x_{i1}, x_{i2}, \cdots, x_{ip})$, the formula was as follows [25,26]:

$$Z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_j^2}}$$

Geofluids
$Z_{ij}$ is the standardized data, and $\bar{x}_j$ and $s_j^2$ are the mean and variance of the $j$-th column of the raw data, respectively.
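The standardization step can be sketched in a few lines of plain Python. This is a minimal sketch, assuming the sample variance uses $n - 1$ in the denominator (the paper does not state the denominator); the input values are hypothetical and are not taken from Table 1.

```python
import math


def zscore_columns(rows):
    """Standardize each column to zero mean and unit standard deviation."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    # Assumption: sample variance with n - 1 in the denominator.
    variances = [
        sum((row[j] - means[j]) ** 2 for row in rows) / (n - 1) for j in range(p)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j]) for j in range(p)]
        for row in rows
    ]


# Hypothetical two-variable data set with very different units.
samples = [[1200.0, 35.0], [1500.0, 50.0], [1800.0, 65.0]]
z = zscore_columns(samples)
print(z)
```

After standardization both columns are on the same scale, so a variable measured in thousands of mg/L no longer dominates one measured in tens.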
Secondly, principal components are chosen to reflect as much of the variable information as possible. When the first principal component is not enough to represent the information of the raw variables, the second principal component is considered, and so on, until most of the information in the raw variables is represented by a few principal components.
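According to this idea, the relationship between the principal components and the raw variables was [27,28]:

$$F_k = a_{k1} Z_1 + a_{k2} Z_2 + \cdots + a_{kp} Z_p$$

where $F_k$ is the $k$-th principal component, $Z_1, \cdots, Z_p$ are the standardized variables and $a_{kj}$ are the component coefficients.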
In SPSS, the principal components are considered to express most of the information of the raw variables when their eigenvalues are greater than 1 or when their cumulative contribution rate is greater than 85%. SPSS is a software package, now distributed by IBM, for the statistical analysis of data [15].
3.3. Hierarchical Cluster Analysis. Hierarchical cluster analysis (HCA) clusters samples with a higher degree of similarity into one category based on their features, then re-aggregates the resulting subcategories according to their degree of similarity, until all subcategories are aggregated into one large category. The process can be represented as a tree diagram, in which it can be clearly seen which samples are more similar to each other [29][30][31].
HCA treats the $p$ variables of a water sample as a point in $p$-dimensional space, so similarity can be expressed by the distance between classes and samples. The smaller the distance, the higher the similarity between two categories. Different clustering methods, such as the single linkage method, the complete linkage method, the between-groups linkage method and Ward's method, may give different clustering results. The method used in this paper was the between-groups linkage method. Let the two groups be $G_p$ and $G_q$; the distance between the two classes was expressed as follows [4,29]:

$$D_{pq} = \frac{1}{n_p n_q} \sum_{x_i \in G_p} \sum_{x_j \in G_q} d_{ij}$$

where $x_i$ and $x_j$ are the $i$-th and $j$-th samples of $G_p$ and $G_q$, respectively; $d_{ij}$ is the distance between samples of $G_p$ and $G_q$; and $n_p$ and $n_q$ are the numbers of samples contained in the two groups. The distance $d_{ij}$ between samples used the squared Euclidean distance:

$$d_{ij} = \sum_{k} (x_{ik} - x_{jk})^2$$

where $x_{ik}$ and $x_{jk}$ represent the $k$-th variable of samples $x_i$ and $x_j$.
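The between-groups distance described here can be sketched directly from its definition: the squared Euclidean distance is computed for every pair of samples drawn from the two groups and then averaged. The function names and the sample points below are illustrative assumptions, not part of the original study.

```python
def squared_euclidean(x, y):
    """Squared Euclidean distance between two samples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def between_groups_distance(group_p, group_q):
    """Average pairwise distance between the samples of two groups."""
    total = sum(squared_euclidean(x, y) for x in group_p for y in group_q)
    return total / (len(group_p) * len(group_q))


# Hypothetical principal component scores for two small groups.
g_p = [[0.0, 0.0], [1.0, 0.0]]
g_q = [[3.0, 4.0]]

distance = between_groups_distance(g_p, g_q)
print(distance)
```

At each clustering step the two groups with the smallest such distance are merged, which is how the tree diagram is built from the bottom up.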

3.4. Stepwise Discriminant Analysis. When the categories of most water samples are known, a discriminant function is needed to determine the type of an unknown water sample or to obtain the correct rate of discrimination. There are many variables, and if all of them were introduced into the discriminant function, the calculation would be complicated [32]. This paper adopted stepwise discriminant analysis (SDA), which introduces only the variables that contribute more to the discriminant result, greatly reducing the amount of calculation. Wilks' Lambda was used as the criterion for stepwise discrimination: at each step, the variable that minimized the overall Wilks' Lambda was selected to enter the discriminant function, and the F value was used to decide whether to keep or delete variables. After $p$ deletions, $r$ variables were finally selected and used as the variables of the Bayesian discriminant functions. The ion concentrations were then substituted into the discriminant functions to obtain the function values, and the category with the largest function value was the final grouping of the sample [33][34][35][36].

Results and Discussion
It could be seen that the main water type of the Shanxi Formation sandstone water was HCO₃·Cl-Na, with HCO₃⁻ accounting for 30~40% of the Shanxi Formation sandstone water. The formation of HCO₃⁻ was initially due to the hydrolysis of potassium feldspar and albite in the deep coal measure strata, which increased the Na⁺ + K⁺ content in the groundwater. In addition, Na⁺ and Ca²⁺ can react in the anoxic environment at depth, and SO₄²⁻ can then be reduced to H₂S through the action of bacteria, thus increasing the proportion of HCO₃⁻. The chemical reaction is as follows:

2Na⁺ + CaSO₄ ⟶ Ca²⁺ + Na₂SO₄

In Figure 4(a), the cation points of the Shanxi Formation sandstone water (I) were concentrated while the anion points were dispersed, because Na⁺ + K⁺ accounted for more than 80% of the total cations. Among the anions, the concentrations of HCO₃⁻ and Cl⁻ showed opposite trends: HCO₃⁻ accounted for 30~40% in water samples No.1-6, decreased to 10~20% in sandstone water samples No.6-10, and accounted for only 3~4% in samples No.14-15, which were also collected from aquifer I, whereas the proportion and concentration of Cl⁻ both increased. As the characteristic ion HCO₃⁻ of the sandstone water decreased and the concentrations of Cl⁻ and SO₄²⁻ rose sharply, the water type of the Shanxi Formation sandstone water changed from HCO₃·Cl-Na to Cl·SO₄-Na.
In Figure 4(b), it could be seen that the TDS of the sandstone water samples was low, within 0~4000 mg/L, so the water belonged to medium soluble solid mine water. (According to the classification standard for coal mine water promulgated by China in 2015, mine water with a soluble solid content between 1000 mg/L and 6000 mg/L is medium soluble solid mine water, and mine water with a soluble solid content >6000 mg/L is high soluble solid mine water [37].) The change in the anion SO₄²⁻ could be analyzed from the change in sandstone water runoff. The disturbance caused by coal seam mining produced fractures and fissures in the surrounding rock on the upper wall of the coal seam. Sandstone water could flow in these fissures, which accelerated groundwater runoff, and sulfides in the coal strata were oxidized as the dissolved oxygen in the water increased. The following is the chemical equation:

Therefore, the SO₄²⁻ content in the groundwater increased from less than 7% to about 15%. The flowing mine water also dissolved Cl⁻-containing minerals, which increased the Cl⁻ content from less than 20% to more than 30%, and the TDS of the water samples showed a gradual upward trend.
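The two TDS classes quoted from the 2015 classification standard can be written as a small helper. This is a sketch limited to the thresholds given in the text; the class below 1000 mg/L is not named in the text, so the function deliberately leaves it unnamed, and the function name is an illustrative assumption.

```python
def classify_tds(tds_mg_l):
    """Return the soluble-solid class of mine water for a TDS value in mg/L."""
    if tds_mg_l > 6000:
        return "high soluble solid"
    if tds_mg_l >= 1000:
        return "medium soluble solid"
    # The text does not name the class below 1000 mg/L, so none is assigned.
    return "below the medium range (class not named in the text)"


# Hypothetical TDS values in mg/L.
for value in (3500, 8000, 10500):
    print(value, classify_tds(value))
```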
The anions and cations in the Taiyuan Formation limestone water were mainly Cl⁻ and Na⁺ + K⁺. It could be seen from Figure 4(c) that Na⁺ + K⁺ accounted for more than 30% of the total ion content and Cl⁻ for more than 55%. The Taiyuan Formation limestone water was a typical chloride water in which sodium chloride occupied a large proportion, and the water type was Cl-Na. The TDS of the aquifer II water samples was between 3000 mg/L and 8000 mg/L, so the mine water could be divided into medium soluble solid and high soluble solid mine water. As could be seen from Figure 4(a), the aquifer II water samples overlapped with the aquifer I water samples and the unknown water samples, although their TDS and other ion contents were not completely consistent. Considering that mining operations might affect the Taiyuan Formation limestone, it was speculated that some channels allowed Taiyuan Formation limestone water to affect the other aquifers.

According to Figure 4(a), water sample No.5 should be excluded. The Ordovician limestone water samples contained large amounts of SO₄²⁻ and Ca²⁺ (Ca²⁺ accounted for 15% of the total ion content and SO₄²⁻ for more than 25%), which was an important feature of the Ordovician limestone aquifer and the main difference between it and the other two aquifers (I and II). SO₄²⁻, Ca²⁺ and Mg²⁺ originated from the dissolution of gypsum, dolomite and other halides and sulphates of calcium and magnesium in the underground strata. Similarly, Cl⁻ and Na⁺ + K⁺ accounted for a relatively high proportion in the Ordovician limestone water, and these ions came from the more easily dissolved halides of potassium and sodium. The increased content of these ions made the TDS very high, and the TDS of water samples No.27-28 even exceeded 10000 mg/L.
Aquifer III was deeply buried, poorly recharged by other water sources, and subject to strong interaction between groundwater and rock. The influence of fault DF53 on aquifer III was not significant.

Spatial Distribution Characteristics of Hydrochemical Elements. Figures 5(a)-5(g) compare the various elements in aquifer water and fault water at three different groundwater heads. Since the uncertain water samples were obtained from boreholes drilled near faults, water samples No.31-40 were used to represent fault water, which intuitively reflects the evolution characteristics of groundwater and the connection between aquifers and faults.

Firstly, with increasing water head depth, the contents of the various ions changed markedly, although not all in the same way. Na⁺ + K⁺, Ca²⁺, Mg²⁺, Cl⁻ and SO₄²⁻ all increased markedly, resulting in a significant increase in TDS with depth. In deep mines, differences in groundwater TDS are often related to the runoff velocity of groundwater. Under the action of coal mining, the coal seam roof was affected by fissures and boreholes and groundwater runoff increased, which was an important factor affecting groundwater level and flow velocity. Secondly, the changes in these ions also revealed some characteristics of groundwater at different water heads. For example, the concentration of SO₄²⁻ was very low at the -960 m water head but increased significantly at the -1120 m water head; the contents of Ca²⁺ and Mg²⁺ increased significantly at the -1120 m water head, where their proportion was larger than at the other water heads; and the proportions of Na⁺ + K⁺ and Cl⁻ in mine groundwater were relatively high and varied similarly, indicating that sodium chloride was one of the main components of mine groundwater.
From the boxplots, it could be seen that some indicators were also significantly correlated. For example, with increasing depth, pH and TA both showed a downward trend, because both are related to the concentration of HCO₃⁻, which is weakly alkaline in water. As depth increased, the concentration of HCO₃⁻ decreased, so the alkalinity decreased and the pH decreased accordingly, finally approaching 7, indicating that the mine water was weakly alkaline.
Figures 6(a)-6(d) show the spatial distribution characteristics of Ca²⁺, TDS, Na⁺ + K⁺ and HCO₃⁻ in the mining area. High Ca²⁺ concentrations were distributed in the southeast of the mining area, and the concentrations differed considerably between the south and the north. The Ca²⁺ concentration near the Yangchengba fault and the Jizhuang fault was generally low, below 660 mg/L, because the groundwater near the faults could flow freely and Ca²⁺ in the water reacted more readily with HCO₃⁻ and CO₃²⁻ to precipitate, whereas Ca²⁺ could be preserved in closed groundwater with weak runoff. High TDS appeared in the south of the mining area. The faults in the north of the mining area affected groundwater runoff, and the TDS near the faults was relatively low. For example, the TDS in most areas near the Yangchengba fault was in the range of 500~2800 mg/L, and the TDS in most areas near the Jizhuang fault was in the range of 2000~4100 mg/L; only a few areas showed a sudden increase in TDS, and their extent was small. Comparing Figures 6(b) and 6(c), the distributions of Na⁺ + K⁺ and TDS were similar: both had their minimum near the Yangchengba fault, and high values were concentrated in the south and northeast of the mining area. This pattern was closely related to the distribution of the faults, showing that the conduction state of the faults affected both groundwater flow and the distribution of ion concentrations in groundwater. The distribution of HCO₃⁻ was contrary to that of the other ions. The highest HCO₃⁻ concentration occurred near the Yangchengba fault in the north of the mining area, while the HCO₃⁻ concentration decreased progressively toward the south. This was due to the high concentration of CO₂ in the groundwater near the fault, which could react with water to form HCO₃⁻, whereas the concentration of CO₂ in the deep strata was low and some cations would react with HCO₃⁻ and precipitate.

The above results show directly that the ion concentrations in the mining area not only vary systematically with depth but also vary spatially, and that this variation is closely related to the geological structure of the region. Faults and fissures enable groundwater to flow freely, while runoff changes the ion concentrations in groundwater and promotes the interaction between water and rock. Therefore, it is generally scientific and reliable to use the hydrochemical characteristics of groundwater to reflect the internal relations between aquifers.

Multivariate Statistical Analysis. For the data in Table 1, the Kaiser-Meyer-Olkin value was 0.731 and the significance value of the Bartlett test was 0 (less than 0.05). The raw data were therefore correlated and the data analysis was easily affected by potential factors [38], so PCA was suitable before further analysis. In Table 2, the correlation coefficients between variables were generally large (values >0.7 are marked in bold), indicating high correlation between variables, which PCA can eliminate [39,40]. As can be seen from Table 2, the correlation coefficient of Na⁺ + K⁺ and Cl⁻ is 0.954 and that of Ca²⁺ and SO₄²⁻ is 0.861, both high positive correlations, while HCO₃⁻ is negatively correlated with the other ions, indicating that HCO₃⁻ gradually decreases as the other ions in the water samples increase, which is consistent with the results of the hydrochemical analysis.
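For readers unfamiliar with the Bartlett test, the sketch below computes its chi-square statistic from a correlation matrix using the usual formula $\chi^2 = -\left[(n-1) - (2p+5)/6\right]\ln|R|$ with $p(p-1)/2$ degrees of freedom. It is an illustration only: it uses a two-variable matrix built from the Na⁺ + K⁺ and Cl⁻ correlation quoted above and assumes $n = 40$ samples, whereas the actual test used all 10 variables of Table 2. The KMO value and the p-value are not computed here.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_samples):
    """Return the chi-square statistic and degrees of freedom."""
    p = len(corr)
    statistic = -(n_samples - 1 - (2 * p + 5) / 6.0) * math.log(determinant(corr))
    dof = p * (p - 1) // 2
    return statistic, dof


# Illustrative 2 x 2 correlation matrix; n = 40 is an assumption.
corr = [[1.0, 0.954], [0.954, 1.0]]
stat, dof = bartlett_sphericity(corr, 40)
print(stat, dof)
```

A large statistic relative to its degrees of freedom corresponds to a small significance value, which is the condition under which the variables are considered correlated enough for PCA.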
PCA was performed on the original data in SPSS, and two principal components were finally selected (see Table 3). Their eigenvalues were 7.076 and 1.514, respectively, and their cumulative variance contribution rate was 85.9%, meeting the conditions that the eigenvalue is greater than 1 and the cumulative variance contribution rate is greater than 85%. $F_1$ and $F_2$ accounted for 70.762% and 15.138% of the variance of the original information, respectively. Each principal component was a linear combination of the 10 raw variables, and the score formulas of the principal components were as follows:

The principal component scores were calculated with the score formulas and saved as variables in SPSS as the basic data for HCA, and Q-type cluster analysis was carried out. Q-type cluster analysis groups samples on the basis of a comprehensive comparison of their parameters and thereby shows the relationships between samples [41]. Except for water samples No.4 and No.5, which were eliminated by hydrochemical analysis, the tree diagram of the remaining 38 groups of water samples is shown in Figure 7. In Figure 7, the connection distance between groups (X axis) is defined as $D_{link}/D_{max}$, the quotient of the connection distance and the maximum distance, which represents the distance between samples [38,42]. When $D_{link}/D_{max} \le 15$, the water samples can be clearly divided into 3 groups, named M1, M2 and M3 from top to bottom. The grouping shows that M1, M2 and M3 correspond to aquifers II, I and III, respectively, and that most water samples were correctly assigned to their aquifers, which indicated that HCA could reliably group the original water samples.
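The variance contribution rates reported for the two principal components can be cross-checked from the eigenvalues. For standardized data the eigenvalues of the correlation matrix sum to the number of variables (10 here), so each contribution rate is the eigenvalue divided by 10. The short sketch below repeats that arithmetic; small differences from the reported 70.762% and 15.138% come from rounding of the eigenvalues.

```python
eigenvalues = [7.076, 1.514]  # eigenvalues reported in the text
n_variables = 10              # number of standardized variables

# Contribution rate of each component, in percent.
contributions = [100.0 * ev / n_variables for ev in eigenvalues]
cumulative = sum(contributions)

# Retention conditions stated in the text.
meets_conditions = all(ev > 1.0 for ev in eigenvalues) and cumulative > 85.0

print([round(c, 2) for c in contributions], round(cumulative, 2), meets_conditions)
```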
Water samples 11, 14 and 15 were grouped differently from their original aquifers (marked with red rectangles in Figure 7) and were eliminated from the database. Water sample 11 had a relatively low SO₄²⁻ concentration, and its HCO₃⁻ and Cl⁻ concentrations were similar to those of sandstone water, so this aquifer II sample was wrongly classified into aquifer I. Water samples 14 and 15 belonged to aquifer I but had higher Cl⁻, Na⁺ and SO₄²⁻ contents, so they were wrongly classified into aquifer II. It could be inferred that there was a hydraulic connection between the Shanxi Formation sandstone aquifer and the Taiyuan Formation limestone aquifer, which is why water samples from the two aquifers were misjudged as each other. At the ion level, Taiyuan Formation limestone water with high Cl⁻ and Na⁺ concentrations and high TDS mixed with Shanxi Formation sandstone water, which gradually increased the Cl⁻ and Na⁺ concentrations in the sandstone water. Combined with engineering practice, it was believed that the water conduction channels between the two aquifers were the exploration and drainage boreholes of the sandstone aquifer and fault DF53.
The Ordovician limestone water samples were relatively concentrated. Apart from water sample No.5, which was removed by hydrochemical analysis, none of them was misjudged. This showed that the Ordovician limestone aquifer under the No.3 mining area was relatively independent, that the Benxi Formation mudstone had a good water-blocking effect, and that there was no interference from faults and no large-scale water gushing. However, under the influence of mining disturbance, the threat of water gushing from the Ordovician limestone still exists, and waterproof coal pillars still need to be retained near faults to ensure mine safety.
The 25 correctly grouped water samples were introduced into the discriminant analysis. According to the requirement of the normality test, Wilks' Lambda was used to select variables: a variable was retained when its F value was greater than the specified "Enter" value of 3.84 and deleted when its F value was less than the specified "Delete" value of 2.71. Finally, 3 variables satisfying the conditions were retained: Na⁺ + K⁺, Ca²⁺ and TA. The value of TA was highly correlated with the content of HCO₃⁻. In this paper, TA was introduced into the discriminant functions; if TA has not been measured, the concentration of HCO₃⁻ can be introduced instead, with little influence on the result of the stepwise discriminant analysis. Taking the HCA groups M1, M2 and M3 as grouping variables, the discriminant functions of the 3 aquifers established by stepwise discriminant analysis were as follows:

In the formulas, $f_1$, $f_2$ and $f_3$ represent the discriminant functions of the 3 groups M1, M2 and M3, corresponding to aquifers II, I and III, respectively, and $w(\cdot)$ is the corresponding ion concentration value. The model was used to discriminate the 25 groups of water samples, of which water sample No.9 from the sandstone aquifer was misjudged as Taiyuan Formation limestone water, while the remaining water samples were all correctly discriminated, giving a discrimination accuracy of 96%. The three ion concentrations of each uncertain water sample were substituted into the formulas to obtain the three function values $f_1$, $f_2$ and $f_3$, and the sample was assigned to the group whose function value was the largest. Comparing the 3 function values showed that $f_1$ was the largest, so the 10 groups of water samples to be tested were classified as M1 by discriminant analysis (see Table 4).
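The decision rule described above, evaluating one linear function per group and assigning the sample to the group with the largest value, can be sketched as follows. The coefficients and the sample concentrations are hypothetical placeholders chosen only for illustration; they are not the fitted functions of this study. The last line repeats the accuracy arithmetic from the text (24 of 25 samples correctly discriminated).

```python
# Hypothetical coefficients for illustration only; NOT the fitted functions.
FUNCTIONS = {
    "M1": {"const": -10.0, "Na+K": 0.010, "Ca": 0.002, "TA": 0.010},
    "M2": {"const": -5.0, "Na+K": 0.004, "Ca": 0.001, "TA": 0.030},
    "M3": {"const": -20.0, "Na+K": 0.008, "Ca": 0.020, "TA": 0.005},
}


def discriminant_scores(sample):
    """Evaluate every group's linear discriminant function for one sample."""
    scores = {}
    for group, coef in FUNCTIONS.items():
        scores[group] = coef["const"] + sum(coef[k] * sample[k] for k in sample)
    return scores


def classify(sample):
    """Assign the sample to the group with the largest function value."""
    scores = discriminant_scores(sample)
    return max(scores, key=scores.get)


# Hypothetical water sample (Na+ + K+ and Ca2+ in mg/L, TA in its reported unit).
unknown = {"Na+K": 2000.0, "Ca": 100.0, "TA": 150.0}
group = classify(unknown)

# Accuracy reported in the text: 24 of 25 samples correctly discriminated.
accuracy = 24 / 25
print(group, accuracy)
```

With the fitted coefficients in place of the placeholders, the same rule reproduces the grouping step applied to the 10 fault water samples.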
The scatter plot of the canonical discriminant functions (see Figure 8) showed the relationship between the 3 types of water samples more clearly. Finally, the water samples to be tested were identified as limestone water of Taiyuan Formation. This result and the tree diagram result of HCA verified each other. In Figure 8, the water samples to be tested plotted closer to the limestone water samples of Taiyuan Formation, and the No. 9 water sample from group M2 plotted closer to the center point of M1, which explains its misjudgment. These discrimination results showed that fault DF53 was the water conduction channel of the Taiyuan Formation limestone aquifer and connected the sandstone aquifer of Shanxi Formation with the Taiyuan Formation limestone aquifer. The mixing of water from the two aquifers increased the TDS of the sandstone water and changed its ionic composition.
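The effect of mixing on sandstone water can be illustrated with a simple two-end-member mass balance. The sketch below is an assumption-laden illustration rather than a calculation from the paper: it treats the solute as conservative (a common simplification for Cl⁻, and only approximate for TDS), and the concentrations and function names are hypothetical.

```python
# Two-end-member conservative mixing (illustrative sketch).
# All concentrations used in the example are HYPOTHETICAL, not measured values.


def mix(c_a, c_b, fraction_a):
    """Concentration of a blend containing `fraction_a` of end member A."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    return fraction_a * c_a + (1.0 - fraction_a) * c_b


def fraction_from_tracer(c_mix, c_a, c_b):
    """Invert the mixing equation to estimate the share of end member A."""
    if c_a == c_b:
        raise ValueError("end members must differ in concentration")
    return (c_mix - c_b) / (c_a - c_b)


if __name__ == "__main__":
    cl_limestone = 500.0  # hypothetical Cl- in limestone water, mg/L
    cl_sandstone = 100.0  # hypothetical Cl- in sandstone water, mg/L
    blended = mix(cl_limestone, cl_sandstone, 0.25)
    print(blended)
    print(fraction_from_tracer(blended, cl_limestone, cl_sandstone))
```

Under these assumptions, any admixture of the more mineralized limestone water raises the Cl⁻ concentration of the sandstone water in proportion to the mixing fraction, which is the direction of change described above.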

Geofluids
The Ordovician limestone aquifer had little connection with other aquifers, and there was no misjudgment of water sources, which indicated that the Ordovician limestone water was relatively independent and less affected by faults.
The above analysis indicated that the limestone water of Taiyuan Formation gushed out under the conduction of fault DF53. Mining expanded the rock fissures, which activated the originally closed aquifer and caused ion exchange in the groundwater. The dynamic changes of groundwater were remarkable. The sandstone of Shanxi Formation and the limestone of Taiyuan Formation were located on the upper wall and the lower wall of the coal seam, respectively, and under mining disturbance, Taiyuan Formation limestone water gushed out to the working face along fault DF53. As Taiyuan Formation limestone water contained a large amount of Na⁺ + K⁺ and Cl⁻, these ions also affected the quality of sandstone water in the upper wall. Sandstone water with low TDS and high HCO₃⁻ content was mixed with Taiyuan Formation limestone water, which increased the Cl⁻ concentration and TDS of the mixed sandstone water samples. Combining hydrochemical analysis and multivariate statistical analysis, it could be understood that faults and fractures were the main influencing factors of groundwater

Conclusions
Based on 40 groups of water samples collected from the working face in Yangcheng Coal Mine, this paper studied the evolution law of groundwater, analyzed the water quality characteristics of each aquifer, and screened the water samples by hydrochemical analysis. The 10 variables were compressed by PCA, and the principal component scores were substituted into HCA to obtain the tree diagram classification results. The water samples were then screened again, and 3 groups M1, M2 and M3 were obtained, representing Taiyuan Formation limestone water, Shanxi Formation sandstone water and Ordovician limestone water, respectively. Finally, M1, M2 and M3 were taken as grouping variables, the remaining water samples were used as input for stepwise discriminant analysis, and discriminant functions f₁, f₂ and f₃ were obtained. The discriminant results were compared with the tree diagram for mutual verification. The following conclusions were obtained by combining the various analysis methods:
(1) As groundwater in Yangcheng Coal Mine develops toward the deep, its fluidity becomes weaker and TDS increases significantly. Sandstone water of Shanxi Formation was affected by fault DF53 and could be connected with limestone water of Taiyuan Formation. Na⁺, Cl⁻ and TDS in the groundwater increased markedly, so that more water samples from these two aquifers were misjudged.
(2) Combining hydrochemical analysis and the HCA tree diagram to screen water samples is more accurate than using the Piper diagram alone, and the accuracy of the obtained results is higher.
(3) SDA retained 3 variables (Na⁺ + K⁺, Ca²⁺, TA) that had great influence on the results and introduced them into the discriminant functions, which correctly discriminated 96% of the known water samples. Ten groups of unknown water samples were classified as Taiyuan Formation limestone water samples, and this was verified against the HCA results, proving that there was a potential connection between fault DF53 and the limestone aquifer of Taiyuan Formation, while the fault had little influence on the Ordovician limestone aquifer.
(4) The combination of PCA, HCA and SDA based on hydrochemical characteristics could identify water sources in mines quickly and with high accuracy, which could meet the actual requirements of mine engineering and provides guidance for the identification of mine water inrush sources.

Data Availability
The data used in this article come from the comprehensive evaluation report on the prevention and control of water hazards at the Yangcheng Coal Mine face of Shandong Jinan Luneng Coal and Electricity Co., Ltd. The authors confirm that the data used are true and reliable, have not been modified, and come entirely from data collected at the mine face.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.