Short-Term Coalmine Gas Concentration Prediction Based on Wavelet Transform and Extreme Learning Machine

1 School of Information and Electrical Engineering, China University of Mining & Technology, Xuzhou, Jiangsu 221116, China
2 School of Medical Informatics, Xuzhou Medical College, Science and Technology Building E206, Dong Dian Zi Campus, Xuzhou, Jiangsu 220009, China
3 Northern Nenghua Company of Wanbei Coal-Electricity Group, Huaibei, Anhui 235000, China
4 School of Medical Imaging, Xuzhou Medical College, Xuzhou, Jiangsu 221009, China


Introduction
It is well known that coalmine gas is one of the most important factors affecting coalmine production safety [1]. Accurate forecasting of coalmine gas concentration is the basis of gas outburst prediction, gas explosion prevention, ventilation design, and so on [2]. Therefore, research on reliable methods for coalmine gas prediction is of real significance for coalmine safety [3]. However, coalmine gas is influenced by geological conditions, the occurrence of the coal seam, the gas content of coal and rock, the permeability coefficient of coal and rock, the depth of the coal, the mining process, and so on, and there are dynamic nonlinear relationships among these factors [4][5][6]. In addition, these factors are difficult to measure in the coalmine, which makes forecasting coalmine gas very difficult.
In response, many researchers have turned their attention to gas time series prediction, and many methods have been proposed in the gas forecasting field. Models of gas concentration forecasting are largely based on chaotic time series [7][8][9], grey theory [10,11], fuzzy mathematics [12,13], neural networks [14,15], intelligent algorithms [16][17][18], support vector machines [19], Gaussian process regression [20], and other mathematical or statistical methods [21,22]. These methods share one characteristic: the original observed values collected by gas sensors are usually used directly to build the gas concentration forecasting models [23][24][25][26].
However, owing to the high-frequency, nonstationary fluctuations and chaotic properties of the gas concentration time series, a forecasting model built on the raw data often fails to provide satisfactory forecasts. To solve this problem, many studies first apply information extraction techniques to extract the features contained in the data and then use these extracted features to construct independent forecasting models [27][28][29][30].

Mathematical Problems in Engineering
Useful or interesting information may not be observable directly in the original data but can be revealed in series extracted through suitable signal processing methods. Wavelet decomposition and reconstruction can decompose multicomponent signal information into a low-frequency approximate signal and a set of high-frequency detail signals. The low-frequency signal reflects the inherent variation trend of the information, while the high-frequency signals reflect its stochastic disturbances. Because these two types of signals follow different rules, different models and parameters can be used to predict them independently [31]. In this study, the accuracy of the forecasting model is improved by a wavelet-based transform. First, we decompose the sample gas concentration time series into several components in different time-frequency domains by wavelet analysis; then we use an extreme learning machine (ELM) established for each component to forecast that component; finally, we take the algebraic sum of the component forecasts. In this way, a relatively accurate forecast of mine gas concentration can be achieved by combining ELM with wavelet analysis. An application to the II826 Coal Face of Luling Coal Mine of Huaibei Mining Group Company in Anhui Province, China, shows that this method can exploit the different features contained in the data and effectively predict the gas concentration.
The rest of this paper is organized as follows. Section 2 analyzes the corresponding basic theories and methods. The proposed hybrid method based on wavelet transform and ELM is described in Section 3. The numerical results and discussions are presented in Section 4. Section 5 concludes the paper.

Wavelet Decomposition and Reconstruction.
The essence of wavelet decomposition and reconstruction is to divide a primitive sequence containing comprehensive information into several groups of sequences with different characteristics by a group of band-pass filters [32]. In this paper, the Mallat algorithm for the discrete wavelet transform (DWT) is adopted as the wavelet decomposition and reconstruction method. Let $X = \{x_1, x_2, \ldots, x_n\}$ be the original sequence, where $n$ is the sequence length. The algorithm can be described as follows:

$$a_{j+1} = H(a_j), \qquad d_{j+1} = G(a_j), \qquad j = 0, 1, \ldots, J-1, \tag{1}$$

where $a_0 = X$, $H(\cdot)$ and $G(\cdot)$ represent the low-pass filter and high-pass filter, and $a_{j+1}$ and $d_{j+1}$ are the components of the original signal in adjacent frequency bands under the resolution $2^{-(j+1)}$: $a_{j+1}$ is the low-frequency approximate component and $d_{j+1}$ is the high-frequency detail component. Let $J$ be the decomposition level. We obtain $J$ detail components $d_1, d_2, \ldots, d_J$ and an approximate component $a_J$. Because each decomposed sequence is half the length of the sequence it was obtained from, binary interpolation is adopted in the reconstruction [33], which applies the operators $H^{*}$ and $G^{*}$:

$$D_j = (H^{*})^{j-1} G^{*} d_j, \quad j = 1, 2, \ldots, J, \qquad A_J = (H^{*})^{J} a_J. \tag{2}$$

It should be emphasized that the stationary wavelet transform (SWT) can also be used for frequency division. However, since the SWT is a nonorthogonal decomposition, there will be cross correlations among the resulting components. By contrast, the components are independent when the DWT is used, which is convenient for obtaining the distribution of the original time series from the forecasted distributions of the components [34].
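As a minimal sketch of the decomposition and reconstruction described above, the following Python code applies the Mallat scheme with the Haar wavelet (equivalent to db1) using only NumPy. The Haar filter and all function names are assumptions chosen for brevity, not the implementation used in this paper, and the sequence length is assumed to be divisible by $2^{J}$.

```python
import numpy as np

def haar_step(a):
    """One analysis step: low-pass (H) and high-pass (G) filtering plus downsampling by 2."""
    a = np.asarray(a, dtype=float)
    if len(a) % 2:
        raise ValueError("sequence length must be even at every level")
    approx = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    detail = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_inverse_step(approx, detail):
    """One synthesis step: upsampling by 2 and applying the dual operators H* and G*."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def decompose(x, levels):
    """Return the approximation a_J and the list of details [d_1, ..., d_J]."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details

def reconstruct_components(a_J, details):
    """Return ([D_1, ..., D_J], A_J), each with the length of the original sequence."""
    J = len(details)
    D = []
    for j in range(1, J + 1):
        # D_j = (H*)^(j-1) G* d_j: one G* step, then (j - 1) H* steps.
        s = haar_inverse_step(np.zeros_like(details[j - 1]), details[j - 1])
        for _ in range(j - 1):
            s = haar_inverse_step(s, np.zeros_like(s))
        D.append(s)
    # A_J = (H*)^J a_J: J successive H* steps.
    A = np.asarray(a_J, dtype=float)
    for _ in range(J):
        A = haar_inverse_step(A, np.zeros_like(A))
    return D, A
```

Because every synthesis step is linear, the reconstructed components add up to the original sequence. For higher-order Daubechies wavelets such as db3, a wavelet library (for example PyWavelets, with `pywt.wavedec` and `pywt.waverec`) would normally be used instead of hand-written filters.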

Extreme Learning Machine.
The ELM learning algorithm trains a feedforward neural network with a single hidden layer. It addresses problems found in most neural network learning algorithms, including slow convergence and a tendency to fall into local minima [35]. Both theoretical analysis and numerous experimental results indicate that, in most cases, the ELM performs better than the general back propagation neural network (BPNN) learning algorithm [36]. Besides, the ELM learning algorithm can achieve almost the same effect as the support vector machine (SVM) algorithm [37] with far less learning time [38]. The ELM learning algorithm is therefore suitable for practical application, and this paper chooses the ELM as the base predictor.
Let the training samples be $\{(\mathbf{x}_k, t_k)\}_{k=1}^{N}$, where $\mathbf{x}_k = [x_{k1}, x_{k2}, \ldots, x_{kn}]^{T} \in \mathbb{R}^{n}$ is the $k$th input vector and $t_k \in \mathbb{R}$ is the output variable corresponding to $\mathbf{x}_k$. The mathematical model of a standard single-hidden-layer feedforward network with $L$ hidden layer nodes can be described as follows:

$$\sum_{i=1}^{L} \beta_i \, g(\mathbf{w}_i \cdot \mathbf{x}_k + b_i) = o_k, \quad k = 1, 2, \ldots, N, \tag{4}$$

where $o_k$ is the output for the $k$th sample, $\mathbf{w}_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^{T}$ is the input weight vector of the $i$th hidden layer node, $\beta = [\beta_1, \beta_2, \ldots, \beta_L]^{T}$ is the output weight vector of the hidden layer nodes, $b_i$ is the bias of the $i$th hidden neuron, $g(\cdot)$ is the activation function of the hidden layer, and $\mathbf{w}_i \cdot \mathbf{x}_k$ is the inner product of $\mathbf{w}_i$ and $\mathbf{x}_k$.
For $N$ training samples, zero-error learning requires $\sum_{k=1}^{N} \| o_k - t_k \| = 0$; that is, (5) must hold:

$$\sum_{i=1}^{L} \beta_i \, g(\mathbf{w}_i \cdot \mathbf{x}_k + b_i) = t_k, \quad k = 1, 2, \ldots, N. \tag{5}$$

Equation (5) can be written in matrix form as

$$\mathbf{H} \beta = \mathbf{T}. \tag{6}$$

In (6),

$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_L \cdot \mathbf{x}_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = [\beta_1, \ldots, \beta_L]^{T}, \quad \mathbf{T} = [t_1, \ldots, t_N]^{T}. \tag{7}$$

The distinguishing feature of the ELM is that the input weights $\mathbf{w}_i$ and the hidden biases $b_i$ are randomly assigned, so the hidden layer output matrix $\mathbf{H}$ can be calculated directly. Therefore, training an ELM is equivalent to obtaining the least-squares solution $\hat{\beta}$ of the linear system $\mathbf{H}\beta = \mathbf{T}$, which can be written as

$$\hat{\beta} = \mathbf{H}^{+} \mathbf{T}, \tag{8}$$

where $\mathbf{H}^{+}$ is the Moore-Penrose generalized inverse of the matrix $\mathbf{H}$ [38][39][40]. Since the output layer weights are obtained directly, the ELM learns quickly. At the same time, it avoids the problem of falling into local minima caused by the repeated iterations used by general neural network learning algorithms. In summary, the ELM algorithm can be divided into the following steps [41].
Step 1. Assign random values to the input weight vectors $\mathbf{w}_i$ and the thresholds $b_i$ of the hidden layer. According to Bartlett's theory [42], smaller weights give better generalization performance, so in practice we use random values between 0 and 1.
Step 2. Calculate the hidden layer output matrix $\mathbf{H}$.
Step 3. Calculate the output weight vector $\beta$ based on (8) and establish the ELM model.
Step 4. Obtain the predicted values based on the input variables.
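The four steps above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the sigmoid activation, the class name `ELM`, and its parameters are assumptions, since the activation function used for the ELM is not stated here; the random weights are drawn from [0, 1] as in Step 1.

```python
import numpy as np

class ELM:
    """Single-hidden-layer ELM regressor (illustrative sketch, sigmoid activation assumed)."""

    def __init__(self, n_hidden=20, seed=None):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden layer output matrix H, with g taken to be the sigmoid function.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, t):
        X = np.asarray(X, dtype=float)
        t = np.asarray(t, dtype=float)
        # Step 1: random input weights and biases in [0, 1].
        self.W = self.rng.uniform(0.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(0.0, 1.0, size=self.n_hidden)
        # Step 2: hidden layer output matrix.
        H = self._hidden(X)
        # Step 3: least-squares output weights via the Moore-Penrose inverse.
        self.beta = np.linalg.pinv(H) @ t
        return self

    def predict(self, X):
        # Step 4: predicted values for new inputs.
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

No iterative training loop is involved: the only fitted quantity is `beta`, obtained in a single linear solve, which is why training is fast.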

The Proposed Wavelet-ELM Gas Concentration Forecasting Method
First, we use the Mallat algorithm to decompose and reconstruct the original gas time series. Then, different prediction models are established for the low-frequency approximate sequence and the high-frequency detail sequences. Finally, the predicted value is calculated as the sum of the results of all prediction models. The flowchart of the multistep-ahead prediction framework is depicted in Figure 1. The forecasting procedure is described as follows.
Step 1. Decompose and reconstruct the gas concentration time series into different component series (the detail sequences $D_1, D_2, \ldots, D_J$ and an approximate sequence $A_J$). Two parameters must be determined in this process: the basic wavelet and the decomposition level. Daubechies wavelet families are most appropriate for treating a nonstationary series and have been chosen as the basic wavelet in this paper [43]; the selection of the Daubechies wavelet order is discussed in Section 4. The decomposition level has a significant effect on the results obtained, and its selection is also discussed in Section 4.
Step 3. Input and output vectors for the ELM model are obtained through phase space reconstruction with time delay $\tau$ and embedding dimension $m$. The training process for each component is described in the previous section, and the only parameter that must be determined is the number of hidden layer nodes.
Step 4. The one-step-ahead predicted value of each component series is obtained from the trained ELM model.
Step 5. The predicted value of the gas concentration time series is obtained by superimposing the predicted values of all components.
Step 6. Check whether the required number of look-ahead steps has been reached. If so, the current predicted value is the final multistep forecast. Otherwise, append the predicted value to the time series and go to Step 1.
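Two parts of this procedure can be sketched generically: the phase space reconstruction that builds the input and output vectors, and the recursive loop of Step 6. In the sketch below, `embed`, `recursive_forecast`, and the `one_step` callable are hypothetical names; `one_step` stands in for Steps 1 to 5 (decompose, train one ELM per component, sum the component forecasts) and is assumed to map the history so far to a single predicted value.

```python
import numpy as np

def embed(series, m, tau=1):
    """Delay embedding: row i is [s[i], s[i+tau], ..., s[i+(m-1)tau]], target is the next value."""
    s = np.asarray(series, dtype=float)
    span = (m - 1) * tau
    n_rows = len(s) - span - 1
    if n_rows <= 0:
        raise ValueError("series too short for this m and tau")
    X = np.stack([s[i : i + span + 1 : tau] for i in range(n_rows)])
    y = s[span + 1 : span + 1 + n_rows]
    return X, y

def recursive_forecast(series, one_step, steps):
    """Multistep forecast: predict one step, append it to the history, and repeat."""
    history = list(series)
    forecasts = []
    for _ in range(steps):
        y_next = float(one_step(np.asarray(history)))
        forecasts.append(y_next)
        history.append(y_next)  # the forecast is fed back as if it were an observation
    return forecasts
```

Because each forecast is fed back as an input, any error made at one step propagates into all later steps.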

Experimental Results
In this section, the effectiveness of the proposed method is evaluated by experiments.

In the evaluation criteria, $y$ and $\hat{y}$ represent the actual and predicted values, respectively; $\bar{y}$ is the mean value; and $n$ is the total number of data points in the test set. A smaller indicator means higher forecast accuracy. The original data are first decomposed and reconstructed to 3 levels by a wavelet transform based on db3 (db is the abbreviation of Daubechies, and db$N$ denotes the Daubechies wavelet of order $N$). The original gas concentration time series and the time spectra of the subbands at the 3rd level are shown in Figure 2. Figure 2 shows that the low-frequency signal embodies the overall trend of the original gas concentration, while the other subsignals represent the uncertain disturbances. The wavelet decomposition can thus identify the different characteristics of the original data, which benefits gas concentration prediction through different ELM models.
Then, ELM models are established to obtain the one-step-ahead prediction of each subsignal, and the sum of these predictions gives the final short-term gas concentration prediction. In the ELM models, the number of hidden layer nodes is first set to 20, and the average of ten independent predictions is taken as the final predicted value. Table 1 gives the prediction performance for each component.
Table 1 shows that the components rank A3, D3, D2, and D1 in order of prediction performance, from best to worst. This is because A3 is the smoothed low-frequency approximate component of the original gas concentration series, which reflects the inherent variation trend and is therefore easy to fit well, while D3, D2, and D1 are the high-frequency components, which reflect the stochastic disturbances; the higher the frequency band of a detail component (that is, the lower its decomposition level), the stronger its randomness and the lower its prediction accuracy. This result agrees with the theoretical analysis. Figure 3 shows the final predicted value, obtained by superimposing the predictions of all subsequences, together with the actual data. Figure 3 shows that the final prediction of the proposed method fits the actual gas concentration data well.
To verify the effectiveness of the proposed method, routine methods are used to predict the same gas concentration samples for comparison. These methods are the classification and regression trees prediction model (CART) [45], the back propagation neural network prediction model (BPNN) [36], the support vector machine prediction model (SVM) [37], and the extreme learning machine prediction model (ELM) [35]. In the BPNN prediction model, the average of ten independent predictions is taken as the final predicted value, the hidden layer transfer function is the Sigmoid function, the output layer transfer function is the Purelin function, the training algorithm is gradient descent with momentum and a variable learning rate, and the learning rate is set to 0.1. In the SVM model, the radial basis function is chosen as the kernel function, and the particle swarm optimization (PSO) algorithm is used to optimize the SVM parameters [46]. The optimized parameters are the penalty parameter, the insensitive loss parameter, and the kernel parameter, whose initialization ranges are set to [1, 1000], [0.001, 0.1], and [0, 10], respectively. The number of particles is initialized to 30, the acceleration coefficients are $c_1 = c_2 = 2$, the inertia weight is 0.7, and the number of iterations is set to 1000. The comparison of the predicted results is shown in Table 2.
Table 2 shows that the forecasting accuracy of the wavelet-ELM (WELM) model is better than that of the comparison models. The improvement in MAE of the proposed approach over the four comparison approaches (CART, BPNN, SVM, and ELM) is 80.32%, 74.74%, 74.61%, and 74.07%, respectively. The improvement in MAPE is 80.23%, 74.43%, 74.32%, and 73.72%, respectively. The improvement in RMSE is 80.57%, 74.69%, 74.48%, and 74.26%, respectively. The improvement in NRMSE is 80.64%, 74.79%, 74.58%, and 74.29%, respectively. The training time and testing time columns show that the WELM method spent only 0.0155 s of CPU time on training and 0.0073 s on testing. This is much less than the CART, BPNN, and SVM algorithms and slightly more than the ELM method because of the extra processing time of the wavelet transform. The time is far less than the sampling interval, so the model can be retrained every time new data arrive; the WELM model is therefore suitable for automatic adjustment over time, while the other models are not, because of their long training times. Note that the SVM algorithm requires a parameter optimization process that is very time-consuming, so its measured training time is not meaningful for comparison and we did not list it.
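How the improvement percentages are computed is not stated here. A common convention, assumed in the sketch below, is the relative reduction of an error indicator with respect to the comparison model. The function name `improvement` and the numbers in the usage line are hypothetical and are not taken from Table 2.

```python
def improvement(baseline_error, proposed_error):
    """Relative error reduction in percent (assumed definition, not confirmed by the paper)."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Hypothetical example: a baseline error of 0.5 reduced to 0.1.
example = improvement(0.5, 0.1)
```

Under this convention, an improvement of about 80% means that the proposed model's error is about one fifth of the comparison model's error.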
Figure 4 shows the multistep-ahead (from 1 to 24 steps ahead) forecast accuracy, measured by the MAPE, of the proposed model and of the CART, BPNN, SVM, and ELM models used for comparison.
For multistep-ahead forecasting, the forecast is carried out recursively by feeding back the previous forecast values, as described in Section 3. The error therefore accumulates from step to step and grows with the look-ahead step. According to Figure 4, BPNN, SVM, and ELM have clearly better forecast accuracy than the CART model. Compared with the other four models, the proposed WELM model improves the forecast accuracy significantly.
To illustrate the influence of the number of ELM hidden layer nodes, the MAPE of WELM with 1 to 50 hidden layer nodes is shown in Figure 5. The MAPE values are comparatively high for a small number of hidden nodes (from 1 to 5) and then flatten out, which shows that the model performs equally well for different numbers of hidden nodes once the number is large enough. This is consistent with [35], which reports that ELM generalization performance is insensitive to the number of hidden nodes when that number is sufficiently large. In practice, therefore, more than 5 hidden nodes should be used.
To illustrate the influence of the order of the Daubechies mother wavelet, the MAPE of WELM using Daubechies wavelets of orders from 1 to 45 is shown in Figure 6. In the figure, db is the abbreviation of Daubechies, and db$N$ denotes the Daubechies wavelet of order $N$. The figure shows that the MAPE decreases sharply as $N$ increases from 1 to 25, and the remaining mother wavelets have almost the same performance, particularly in the tail; in practice, it is therefore better to choose Daubechies orders from 25 to 45. It should be noted that the MAPE of db1, which has the worst forecasting accuracy of all orders, is only 2.03%; compared with Table 2, this is still better than the other methods (CART, BPNN, SVM, and ELM) without wavelet transforms. The results in Table 2 were obtained with db3, whose MAPE is higher than that of Daubechies orders from 4 to 45, so there is still considerable room for improvement when using other Daubechies orders.
Furthermore, the wavelet decomposition level influences the prediction results. Specifically, the higher the wavelet decomposition level, the smoother and more stable the approximation signal, and the higher its prediction accuracy. However, as the number of decomposition levels increases, the number of detail signals also increases, and their prediction errors are superimposed. More decomposition levels thus bring more prediction errors, so the forecast accuracy does not keep increasing with the decomposition level. As a result, the prediction accuracy fluctuates within a certain range.
Figure 7 shows the prediction errors of WELM using db3 wavelet decomposition for decomposition levels from 1 to 50. The MAPE decreases sharply as the decomposition level increases from 1 to 3. The reason is that the performance gain comes from extracting the stochastic disturbances, but at a low decomposition level a significant part of the random component remains in the approximate component. For decomposition levels higher than 3, the MAPE fluctuates around 1.21%. Based on these results, 3 to 5 decomposition levels can be used in practical applications.

Conclusions
The focus of this paper is the combination of the wavelet transform and the extreme learning machine for predicting coalmine gas concentration. The coalmine gas time series is influenced by geological conditions, the occurrence of the coal seam, the gas content of coal and rock, the mining process, and many other factors. As a result, it shows strongly nonstationary and stochastic characteristics. Using a single model to forecast gas concentration amounts to forecasting a mixed signal with one set of methods and parameters, and the random factors in the gas concentration sequence affect both the determination of the model parameters and the final prediction results. Wavelet decomposition and reconstruction can decompose the multicomponent signal information into a low-frequency approximate signal, which reflects the inherent variation trend, and a set of high-frequency detail signals, which reflect the stochastic disturbances. Different ELM models with different parameters can be used to predict these new signals independently. The proposed model is compared with CART, BPNN, SVM, and ELM for one-step and multistep prediction. Simulation results show that the ELM model with wavelet-based preprocessing greatly outperforms the other four models. We also discuss the selection principles for the number of ELM hidden layer nodes, the order of the Daubechies mother wavelet, and the wavelet decomposition level. For coalmine gas concentration time series, more than 5 hidden nodes should be used, Daubechies orders from 25 to 45 are preferable, and 3 to 5 decomposition levels can be used in practical applications for good performance and accuracy.

Figure 1: Gas concentration multistep ahead prediction framework based on WELM.

Figure 2: Wavelet decomposition and reconstruction of the original gas concentration time series.

Figure 3: Actual value (the solid line) and one-step ahead WELM model output (the dashed line).

Figure 4: Multistep ahead forecast accuracy comparison with different methods.

Figure 5: Performance comparison with different ELM hidden layer nodes.

Figure 6: Performance comparison with different orders of the Daubechies wavelets.

Figure 7: Performance comparison with different wavelet decomposition levels.
$H^{*}$ and $G^{*}$ are the dual operators of $H$ and $G$. The detail sequences $D_1, D_2, \ldots, D_J$ and the approximate sequence $A_J$ are the reconstruction sequences of $d_1, d_2, \ldots, d_J$ and $a_J$; they have the same length as the original sequence, and the original sequence can be represented as the sum of the reconstruction sequences:

$$X = D_1 + D_2 + \cdots + D_J + A_J. \tag{3}$$

The dataset used in our experiment was collected from the coalmine security monitoring system KJ98 in the II826 Coal Face of Luling Coal Mine of Huaibei Mining Group Company in Anhui Province, China. We test the proposed model with 1000 gas concentration samples: the first 800 data points are used as the training sample and the remaining 200 data points as the testing sample, and the data points are 10 seconds apart. The prediction performance is evaluated by the mean absolute error (MAE), the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the normalized root mean square error (NRMSE). The definitions of these criteria are as follows:
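A minimal sketch of these four criteria in their commonly used forms is given below. MAE, MAPE, and RMSE follow their standard definitions. The NRMSE normalization is an assumption: the squared error is divided by the spread of the actual values around their mean, which is one common choice and may differ from the definition used in this paper. The function name `forecast_errors` is hypothetical.

```python
import numpy as np

def forecast_errors(y, y_hat):
    """Return MAE, MAPE (in percent), RMSE, and NRMSE for actual y and predicted y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y))  # requires nonzero actual values
    rmse = np.sqrt(np.mean(err ** 2))
    # Assumed normalization: error energy relative to the variance of the actual values.
    nrmse = np.sqrt(np.sum(err ** 2) / np.sum((y - y.mean()) ** 2))
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "NRMSE": nrmse}
```

All four indicators are zero for a perfect forecast, and smaller values indicate higher accuracy.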

Table 1: Prediction error of wavelet decomposition sequence.

Table 2: Comparison of different prediction methods.