Quantitative Inversion Model Design of Mine Water Characteristic Ions Based on Hyperspectral Technology

To address the low measurement accuracy and repeated calibration that affect coal mine water quality analysis, a noncontact measurement technique based on hyperspectral reflection is proposed. In the laboratory, KCl, NaCl, pH buffer, NaHCO₃, and CaCl₂ standard solutions were used to indicate the characteristic ion information of mine water (Na⁺, K⁺, Ca²⁺, Cl⁻, HCO₃⁻, and pH), and 2220 spectral data were obtained by spectral determination. Savitzky-Golay convolution smoothing was used to smooth and denoise the original spectral data of each ion, and the relationship between the spectrum and the concentration of each reagent was obvious after this pretreatment. The principal component regression method was used to build the inversion model of each ion content. For the KCl prediction set, R² reaches 0.907 and RPD is up to 2.7; for NaCl, R² reaches 0.957 and RPD is up to 3.1; for pH, R² reaches 0.785 and RPD is up to 2.1; for NaHCO₃, R² reaches 0.137 and RPD is up to 1.2; for CaCl₂, R² reaches 0.622 and RPD is up to 1.7. The results show that the hyperspectral method can play a better role in the extraction of K⁺, Cl⁻, Na⁺, Ca²⁺, and pH, whereas HCO₃⁻ ions are difficult to extract.


Introduction
Water hazards are one of the main threats to the safety of coal mine production and cause serious loss of life and property. The prevention and control of water disasters in coal mines take the water-filling channel, water-filling source, and water-filling intensity as the main objects and take exploration, prevention, blocking, dredging, drainage, interception, and monitoring as the main means. Water samples are collected after water inrush or water gushing occurs in a mine, and the source of the water is judged from its chemical composition. This method is widely used by technicians of geological survey and water control engineering in coal mines.
In foreign countries, the rock mass structure of the coal seam floor and the prevention and drainage technology have been studied in depth, and a lot of experience has been accumulated in the mechanism of water inrush and the identification of water hazards. In the book Hydrogeochemistry written by Clevers et al., the application of groundwater pollution and chemical evaluation in hydrochemical analysis is systematically discussed from the perspective of hydrogeochemistry [1][2][3][4]. Clevers et al. obtained it by using the 3D edge detection seismic attribute method [1][2][3][4]. Clevers et al. used hydrological observation and a tracer test to test the effect of the tunnel drainage system [1][2][3][4]. However, there is little research work on the application of mine water chemistry and the identification of mine water inrush sources. The main method of discriminating the source of water inrush in coal mines in China is the conventional hydrochemical discrimination method, which measures the eight most widely distributed ions in groundwater (Ca²⁺, Mg²⁺, K⁺, Na⁺, CO₃²⁻, HCO₃⁻, SO₄²⁻, and Cl⁻), whose concentration accounts for more than 90% of the total ion concentration in groundwater, as well as the characteristic ion ratio, hardness, temperature, TDS index, and pH value [5-10]. The mine water chemical data of Taoyuan Coal Mine was processed by using Piper's three-line diagram [5][6][7]. The hydrochemical characteristics of each aquifer in the Xuzhou mining area were introduced [8,9]. Conventional hydrochemical methods were used to carry out hydrogeochemical analysis of underground aquifers in a mine in Xuzhou [10][11][12]. The conventional hydrochemistry of four water-bearing subsystems in Yaoqiao Mine, Xuzhou, was studied [13][14][15][16]. A systematic study on the hydrochemical characteristics of groundwater in the Ordovician karst aquifer in the middle part of the Taihang Mountains was made [17][18][19][20].
However, there are still some problems in the current underground ion electrode monitoring, such as inaccurate measurement and repeated calibration during use, so it cannot meet the needs of online identification of water sources. It is urgent to develop a new type of online water quality analysis sensor.

Hyperspectral Experimental Determination of Common Ions in Mine Water
The purpose of the experimental test is to find the hyperspectral characteristic bands of liquids related to the coal mine. The experimental spectral acquisition equipment is a self-made spectral probe, and the experimental measurement process consists of spectrometer calibration, standard solution production, spectral measurement, and accuracy evaluation [30].
Five reagents, NaCl, KCl, CaCl₂, NaHCO₃, and pH buffer, were measured to indicate Na⁺, K⁺, Ca²⁺, Cl⁻, HCO₃⁻, and pH information, with the potassium ion and the chloride ion both indicated by one set of KCl standard solution data (see Table 1 for details) [31][32][33]. Before measurement, the mother liquor is diluted with deionized water. According to the test requirements, the sodium ion, potassium ion, chloride ion, and calcium ion dilution levels are 10, 50, 100, 500, 1000, and 10000 mg/L, the bicarbonate dilution levels are 0.44, 2.2, 4.4, 22, 44, and 440 mg/L, and the pH levels are 4, 6.86, and 9.18. Eight kinds of targets were measured in the order KCl, NaCl, pH, NaHCO₃, CaCl₂, pure water, empty barrel, and green plants, totaling 2220 hyperspectral data. Figure 1 shows the number of spectra of various standard solutions.

Ion Hyperspectral Data Preprocessing and Sensitive Band Selection
We carry out spectral quality evaluation on all obtained spectral data and select qualified spectral data [34][35][36][37]. At the same time, due to the influence of the external environment, there are many "burr" noises on the spectral curve, so it is necessary to reduce the noise on the spectral curve by smoothing and filtering. In this study, Savitzky-Golay convolution smoothing was used to smooth and denoise the original spectral data of each ion. The value of the spectrum after Savitzky-Golay smoothing at wavelength i is

$$x_{i,\text{Savitzky-Golay}} = \frac{1}{N}\sum_{j=-m}^{m} c_j\, x_{i+j} \qquad (1)$$

In the formula, $x_{i,\text{Savitzky-Golay}}$ is the smoothed value at wavelength i, x is the value before smoothing, m is the number of smoothing window points on each side of the wavelength, N is the normalization index, and $c_j$ ($j = -m, \dots, m$) are the smoothing coefficients, which can be obtained by polynomial fitting.
After smoothing and denoising pretreatment, the relationship between the spectrum and the concentration of each reagent is evident. Compared with the spectral data of "pure water + gradient" concentration, KCl, NaCl, pH, NaHCO₃, and CaCl₂ have obvious sensitive bands and rules. The higher the concentration of KCl, NaCl, and CaCl₂, the lower the overall reflectivity, which is presumably due to the action of Cl⁻. The pH data show that the reflectivity of pure water and acidic liquid is in the middle, the reflectivity of neutral liquid is low, and that of alkaline liquid is the highest. As a whole, the higher the concentration of NaHCO₃, the higher the reflectivity. Figure 2 shows the comparison of the KCl, NaCl, and pH spectral data before and after denoising, while Figure 3 shows the comparison of the spectral data of NaHCO₃, CaCl₂, and pure water before and after denoising.
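As an illustration of Savitzky-Golay convolution smoothing, the following minimal sketch applies a fixed 5-point window. The window size (m = 2), the quadratic-fit coefficients (-3, 12, 17, 12, -3) with N = 35, the function name, and the sample values are assumptions chosen for illustration; the window and polynomial order used in this study are not stated.

```python
def savgol_smooth_5pt(spectrum):
    """Smooth a 1-D spectrum with a 5-point quadratic Savitzky-Golay filter."""
    coeffs = (-3, 12, 17, 12, -3)  # c_j for j = -2..2 (assumed window, m = 2)
    norm = 35                      # N, the sum of the coefficients
    m = len(coeffs) // 2
    smoothed = list(spectrum)      # the m points at each edge are left unsmoothed
    for i in range(m, len(spectrum) - m):
        acc = 0.0
        for j in range(-m, m + 1):
            acc += coeffs[j + m] * spectrum[i + j]
        smoothed[i] = acc / norm
    return smoothed


# Hypothetical reflectance values with small "burr" noise.
noisy = [0.50, 0.52, 0.49, 0.55, 0.51, 0.53, 0.50]
print(savgol_smooth_5pt(noisy))
```

Because the coefficients come from a local polynomial fit, a spectrum that is already a smooth quadratic passes through the filter unchanged, while isolated spikes are damped.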

Establishment of the Quantitative Inversion Prediction Model for Ion Hyperspectral Data
Mine water is a complex system composed of various chemical ions. In this study, the principal component regression (PCR) method, which is based on principal component analysis (PCA), is used to establish the quantitative inversion model [38][39][40][41][42][43][44][45][46]. PCR is a regression analysis method for multicollinear data. The principle is that, after the multicollinearity in the regression model is eliminated by principal component analysis, the principal component variables are used as independent variables for regression analysis, and the original variables are then substituted back into the new model according to the score coefficient matrix. The basic steps of PCR are as follows: (1) The principal components of the independent variable data are acquired through principal component analysis, and a principal component subset is selected through standardized classification. For model evaluation, cross-validation was used, and the determination coefficient (R²), root mean squared error (RMSE), and relative percent deviation (RPD) were selected as evaluation indexes. When the R² value of the calculated validation set is closer to 1, the RMSE value is lower, and the RPD value is closer to 2, the model is more stable, the accuracy is higher, and the model is better. When R² is less than 0.50 and RPD is less than 1.40, the estimation ability of the model to the sample is poor, and the model is not available; when 0.50 < R² < 0.75 and 1.40 < RPD < 2.00, the estimation ability of the model to the sample is improved, but only rough estimation can be made, and the model is available.
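When R² > 0.75 and RPD > 2.00, the model accuracy is high and the model is good. The calculation formulas are

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}}$$

In the formulas, $y_i$ represents the measured value of sample i, $\hat{y}_i$ represents the predicted value of sample i, $\bar{y}$ represents the mean of all samples, n is the number of samples, and SD is the standard deviation of the measured values of the validation set samples.
Scientific Programming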
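The three evaluation indexes can be computed with a few lines of code. This is a minimal sketch assuming the common definitions R² = 1 - SS_res/SS_tot, RMSE as the root of the mean squared residual, and RPD = SD/RMSE. The use of the sample standard deviation (n - 1 denominator), the function names, and the example values are assumptions, not taken from the study.

```python
import math


def r2_score(measured, predicted):
    """Determination coefficient: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


def rpd(measured, predicted):
    """Relative percent deviation: SD of the measured values divided by RMSE."""
    n = len(measured)
    mean_y = sum(measured) / n
    # Sample standard deviation (n - 1); this choice is an assumption.
    sd = math.sqrt(sum((y - mean_y) ** 2 for y in measured) / (n - 1))
    return sd / rmse(measured, predicted)


# Hypothetical concentrations (mg/L) and predictions, for illustration only.
measured = [10.0, 50.0, 100.0, 500.0, 1000.0]
predicted = [12.0, 47.0, 108.0, 490.0, 1005.0]
print(r2_score(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```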

KCl Content Spectral Prediction Modeling.
A total of 382 standard solution spectral data were selected, the seven largest principal components were selected, and the weights were set equally [47][48][49][50][51]. The principal component analysis model was established with CV prediction detection and cross-validation when the proportion of the validation set and modeling set was 0.70. The first three principal components can represent more than 80% of the content information. R² reaches 0.908 in the modeling set and 0.907 in the prediction set, and RPD is up to 2.7. In the process of computational modeling, among all sample points, the samples collected in the middle section play a greater role. Figure 4 shows the KCl principal component results, Figure 5 shows the comparison between the KCl actual measurement set and prediction sets, and Figure 6 shows the role of sample points in the calculation of KCl content.
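The modeling procedure can be sketched as principal component regression with seven retained components and a 70/30 split. This is a minimal illustration, not the authors' implementation: it assumes NumPy, reads 0.70 as the modeling-set share, and uses synthetic stand-in spectra in which reflectivity falls as concentration rises. The function names `fit_pcr` and `predict_pcr` and all data values are hypothetical.

```python
import numpy as np


def fit_pcr(X, y, n_components=7):
    """Fit PCR: PCA on the centered spectra, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # (n_bands, n_components)
    scores = Xc @ V                          # principal component scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma                         # coefficients mapped back to the bands
    return x_mean, y_mean, beta


def predict_pcr(model, X):
    """Predict concentrations for new spectra from a fitted PCR model."""
    x_mean, y_mean, beta = model
    return (X - x_mean) @ beta + y_mean


# Synthetic stand-in data (not the measured spectra): 382 samples, 50 bands.
rng = np.random.default_rng(0)
conc = rng.uniform(10, 1000, size=382)
bands = np.linspace(0.0, 1.0, 50)
X = 0.8 - 1e-4 * np.outer(conc, np.exp(-bands)) + rng.normal(0, 0.002, (382, 50))

idx = rng.permutation(382)
n_train = int(0.70 * 382)                    # assumes 0.70 is the modeling-set share
train, test = idx[:n_train], idx[n_train:]
model = fit_pcr(X[train], conc[train])
pred = predict_pcr(model, X[test])
ss_res = np.sum((conc[test] - pred) ** 2)
ss_tot = np.sum((conc[test] - conc[test].mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print("prediction-set R2:", r2)
```

Regressing on a few component scores instead of all bands is what removes the multicollinearity between neighboring wavelengths.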

NaCl Content Spectral Prediction Modeling.
Three hundred and ninety-nine standard solution spectral data were selected, the seven largest principal components were selected, and the weights were set equally [47][48][49][50][51]. The principal component analysis model was established with CV prediction detection and cross-validation when the proportion of the validation set and the modeling set was 0.70. The first three principal components can represent more than 90% of the content information. R² reaches 0.958 in the modeling set and 0.957 in the prediction set, and RPD is up to 3.1. In the process of computational modeling, among all sample points, the samples collected in the middle section play a greater role. Figure 7 shows the NaCl principal component results, Figure 8 shows the comparison between measured and predicted NaCl sets, and Figure 9 shows the role of sample points in the calculation of NaCl content.
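The share of information represented by the first few principal components corresponds to a cumulative explained-variance ratio. The sketch below shows one way to compute it, assuming NumPy; the function name and the synthetic three-factor data are hypothetical and stand in for the measured spectra.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative share of total variance captured by the leading principal components."""
    Xc = X - X.mean(axis=0)                   # center each band
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, in descending order
    var = s ** 2                              # proportional to the component variances
    return np.cumsum(var) / var.sum()


# Synthetic stand-in spectra: 399 samples, 20 bands, three latent factors plus noise.
rng = np.random.default_rng(1)
factors = rng.normal(size=(399, 3)) * np.array([3.0, 2.0, 1.0])
loadings = rng.normal(size=(3, 20))
X = factors @ loadings + 0.1 * rng.normal(size=(399, 20))

cum = cumulative_explained_variance(X)
print("share captured by the first three components:", round(float(cum[2]), 3))
```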

pH Content Spectral Prediction Modeling.
A total of 240 standard solution spectral data were selected, the seven largest principal components were selected, and the weights were set equally [47][48][49][50][51]. The principal component analysis model was established with CV prediction detection and cross-validation when the proportion of the validation set and the modeling set was 0.70. The first three principal components can represent more than 85% of the content information. R² reaches 0.791 in the modeling set and 0.785 in the prediction set, and RPD is up to 2.1. In the process of calculation and modeling, among all sample points, the samples collected in the front section play a greater role. Figure 10 shows the pH principal component results, Figure 11 shows the comparison between measured and predicted pH sets, and Figure 12 shows the role of sample points in the calculation of pH content.

NaHCO₃ Content Spectral Prediction Modeling.
A total of 404 standard solution spectral data were selected, the seven largest principal components were selected, and the weights were set equally [47][48][49][50][51]. The principal component analysis model was established with CV prediction detection and cross-validation when the proportion of the validation set and the modeling set was 0.70. The first three principal components can represent more than 75% of the content information. R² reaches 0.162 in the modeling set and 0.137 in the prediction set, and RPD is up to 1.2. In the process of computational modeling, among all sample points, the samples collected in the middle and back end play a greater role. Figure 13 shows the NaHCO₃ principal component results, Figure 14 shows the comparison between measured and predicted NaHCO₃ sets, and Figure 15 shows the role of sample points in the calculation of NaHCO₃ content.

CaCl₂ Content Spectral Prediction Modeling.
Four hundred and seventeen standard solution spectral data were selected, the seven largest principal components were selected, and the weights were set equally [47][48][49][50][51]. The principal component analysis model was established with CV prediction detection and cross-validation when the proportion of the validation set and the modeling set was 0.70. The first three principal components can represent more than 55% of the content information. R² reaches 0.630 in the modeling set and 0.622 in the prediction set, and RPD is up to 1.7. Figure 16 shows the CaCl₂ principal component results, Figure 17 shows the comparison between measured and predicted sets of CaCl₂, Figure 18 shows the role of sample points in the calculation of CaCl₂ content, and Figure 19 shows the comparison of extraction accuracy of various ions.
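To compare the five models against the evaluation thresholds, the following sketch grades each reagent from the prediction-set R² and RPD values reported in the abstract. The function name `grade_model`, the label strings, and the "unclassified" fallback for mixed cases are assumptions added for illustration.

```python
def grade_model(r2, rpd):
    """Grade a model with the R2/RPD thresholds used for model evaluation."""
    if r2 > 0.75 and rpd > 2.00:
        return "good"
    if r2 < 0.50 and rpd < 1.40:
        return "not available"
    if 0.50 < r2 < 0.75 and 1.40 < rpd < 2.00:
        return "rough estimate"
    return "unclassified"  # mixed cases that the thresholds do not cover


# Prediction-set (R2, RPD) pairs as reported in the abstract.
reported = {
    "KCl": (0.907, 2.7),
    "NaCl": (0.957, 3.1),
    "pH": (0.785, 2.1),
    "NaHCO3": (0.137, 1.2),
    "CaCl2": (0.622, 1.7),
}

grades = {name: grade_model(r2, rpd) for name, (r2, rpd) in reported.items()}
for name, grade in grades.items():
    print(name, grade)
```

Under these thresholds the KCl, NaCl, and pH models fall in the high-accuracy class, CaCl₂ allows only rough estimation, and NaHCO₃ falls in the class where the model is not available.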

Conclusion
Through the spectrum analysis of the characteristic ions of the mine water, the principal component regression method is used to carry out the quantitative inversion modeling of various ions, and the five standard solutions of KCl, NaCl, pH, NaHCO₃, and CaCl₂ indicate six ions (KCl includes K ions and Cl ions). The extraction precision (modeling-set R²) of KCl and NaCl is higher than 0.9, followed by pH and CaCl₂, whose precision is more than 0.6. The extraction precision of HCO₃⁻ is the lowest, only 0.162. The results show that the hyperspectral method can play a better role in the extraction of K⁺, Cl⁻, Na⁺, Ca²⁺, and pH. It is difficult to extract HCO₃⁻ ions.
Data Availability
The dataset can be accessed from the corresponding author upon request.