Coal-Rock Recognition in Top Coal Caving Using Bimodal Deep Learning and Hilbert-Huang Transform

This study employs the mechanical vibration and acoustic waves of a hydraulic support tail beam for accurate and fast coal-rock recognition. The study proposes a diagnosis method based on bimodal deep learning and the Hilbert-Huang transform. The bimodal deep neural networks (DNN) adopt bimodal learning and transfer learning. The bimodal learning method attempts to learn a joint representation from the acceleration and sound pressure modalities, both of which contribute to coal-rock recognition. The transfer learning method addresses the problem that a DNN needs a large number of labeled training samples to optimize its parameters, whereas labeled training samples are limited. A suitable installation location for the sensors is determined for coal-rock recognition. The features extracted from the acceleration and sound pressure signals are combined, and effective feature combinations are selected. The bimodal DNN consists of two deep belief networks (DBN); each DBN model is trained with related samples, and the parameters of the pretrained DBNs are transferred to the final recognition model. The parameters of the proposed model are then further optimized by pretraining and fine-tuning. Finally, the comparison of experimental results demonstrates the superiority of the proposed method in terms of recognition accuracy.


Introduction
Coal is an important source of energy, accounting for approximately 29.21% of the world's primary energy consumption in 2015 according to the BP Statistical Review of World Energy (June 2016). China both produces and consumes large amounts of coal, accounting for 47.70% and 50.01% of global coal production and consumption in the past year, respectively. Approximately 12.84% of coal reserves in the world are distributed in China, of which 44.8% are thick coal seams. Therefore, safe and efficient mining of thick coal seams is considerably important. Fully mechanized top coal caving has been widely applied in the mining of thick coal seams due to its safety, high efficiency, high yield, and low production cost.
However, low levels of automation and intelligence have always been problems in fully mechanized top coal caving. In particular, judging the degree of caving, which is one of the key technologies, relies completely on human judgment.
Relying on human vision and hearing to determine the degree of caving is prone to overcaving and undercaving because of the harsh environment, including poor light, coal dust, noise, and narrow space. Overcaving and undercaving can lead to a low recovery rate, a decline in coal quality, and an increase in cost. In addition, the safety and health of operators are often threatened because they are relatively close to coal-falling areas. Therefore, an accurate and rapid approach to identifying coal-rock is considerably important for controlling the coal-falling time precisely and improving the automation and intelligence of fully mechanized top coal caving. These objectives are important in ameliorating the working environment of coal miners, improving the coal recovery rate, and reducing production cost.
Since the 1960s, more than 20 types of coal-rock interface identification methods and coal-rock recognition methods have been proposed by researchers. The representative coal-rock interface identification methods include detection through artificial γ-ray, natural γ-ray [1][2][3], and radar [4]. The artificial γ-ray detection method uses γ-ray backscattering to detect the thickness of the top coal and identify the coal-rock interface. However, artificial γ-ray is harmful to human beings and its penetration ability is limited, so it has been gradually abandoned. Natural γ-ray detection measures the intensity of the gamma rays from the roof that pass through the remaining coal seam and determines the thickness of the top coal according to their attenuation law to identify the coal-rock interface. Natural γ-ray detection is relatively mature and has been commercialized for coal-rock interface recognition. However, when the top coal contains no or low radioactive elements or contains excessive gangue, natural γ-ray detection is no longer applicable. In addition, the cost of this detection is high. Radar detection utilizes the reflection of electromagnetic waves at the coal-gangue interface to detect the thickness of the top coal. Its advantages include a long resolution distance and an extensive application range. However, radar detection is no longer applicable when the top coal is extremely thick.
The coal-rock recognition methods distinguish coal from rock to determine whether the coal-rock interface has been reached. The representative methods include detection through cutting force response [5], image [6,7], vibration and acoustics [8][9][10][11][12][13][14][15], and infrared [16]. Cutting force response detection exploits the different responses of drum picks when cutting coal or rock to identify coal and rock. This method has good adaptability, but it is inapplicable to top coal caving. Image detection uses the different colors, hardness, gloss, and other information of coal and gangue to identify coal and rock. However, this method is sensitive to dust and light and needs further research. Infrared detection exploits the different temperatures produced when a cutting machine cuts coal or rock to determine whether the cut material is coal or rock. This method has a quick response and is ideal for real-time application. However, this detection is still immature because of its sensitivity to the ambient temperature.
In recent years, the coal-rock recognition method based on vibration technology has been widely used. Proposed by the US Bureau of Mines, the vibration method mainly comprises the acoustic method, the slot wave seismic method, and the mechanical vibration method. Its principle is that coal and gangue produce vibration signals with different frequency characteristics. In [10], the vibration of the tail beam was analyzed to recognize coal and rock in the top coal caving process. In [11,12], vibration signals were used to determine whether a cutting machine is cutting coal or rock. In [13], the cutting acoustic signal was used to recognize the cutting pattern. In [15], acoustic signals of coal and rock in a fully mechanized face were analyzed for coal-rock recognition in top coal caving.
Although significant progress has been made on coal-rock recognition in the past few decades, challenges still remain. The majority of the previous works based on vibration technology treated the acoustic and mechanical vibration methods independently, wherein only the acoustic or the mechanical vibration is employed for recognition. Each single modality has been demonstrated to be useful for recognition. However, a single modality alone cannot provide sufficient information on the differences between coal and rock. Therefore, the method of integrating two modalities to improve the recognition accuracy of rock and coal still needs further investigation.
Several algorithms were proposed to address representation learning for multiple modalities. In [17], video and audio modality inputs were employed to learn bimodal deep belief networks (DBN). In [18], multimodal deep neural networks (DNN) were proposed to study the correlation between texture and landmark modalities for facial expression recognition, wherein several stacked autoencoders (AE) were used. In [19], bimodal DNN were used to determine driver fatigue expression. In this study, acoustic and mechanical vibrations are integrated for the first time, and feature statistics and classification are assembled together for coal-rock recognition. Acceleration and sound pressure sensors are employed to detect the vibration and acoustic wave signals.
A joint representation layer for recognition is learned from the acoustic and mechanical vibration modalities.
In addition, the transfer learning method is adopted, because transferring knowledge from a general object classification task to a recognition task has been found to be successful in visually similar categories [20]. In the present study, the bimodal transfer DNN (BT-DNN) is initially pretrained with simulated samples (coal and rock hitting the tail beam in the laboratory environment) and irrelevant samples (hand knocking the table), which addresses the problem that massive samples are necessary to train a DNN. Then, the proposed model is pretrained with real training samples. Last, supervised fine-tuning is performed. With the bimodal learning and transfer learning processes, the proposed method not only has high recognition accuracy but also has the advantages of low cost, extensive adaptability, insensitivity to the environment, and reduced technical difficulty.
Another important step in the recognition is extracting effective features from the measured signals. The most commonly applied time-frequency analysis methods include the Fourier transform (FT) [21][22][23], wavelet transform (WT) [24][25][26][27], and Hilbert-Huang transform (HHT) [28][29][30][31]. These methods have their own advantages and application areas. In this work, HHT is utilized to process the signals. In addition, five statistical characteristics are computed, which include four characteristics based on intrinsic mode functions (IMFs) and one characteristic based on the Hilbert marginal energy spectrum. However, some extracted features may contribute little to recognition. Therefore, different features are combined as input, and the most effective combination of features is selected.
The rest of this paper is organized as follows. Section 2 introduces the methods used. Section 3 describes the recognition system of coal-rock in top coal caving based on the proposed method, including signal acquisition, feature extraction, and experiments. Section 4 summarizes this paper and proposes future work.

Deep Belief Network.
A deep belief network (DBN) is a neural network that contains several layers of restricted Boltzmann machines (RBMs), in which the input layer of the next RBM is the output layer of the previous RBM. RBM was proposed by Smolensky in 1986 [32]; it is a probabilistic graphical model that can be interpreted as a stochastic neural network. RBM is a bipartite graph in which visible units are connected to hidden units; however, no connections exist between visible-visible units or hidden-hidden units.
The visible layer represents observations, and the hidden layer learns features. RBM is applied in numerous machine learning methods due to its desirable properties. In particular, after Hinton et al. proposed DBN based on RBM as a basic component [33], RBM has been successfully implemented in dimensionality reduction [34], feature learning [35], and classification [36]. The schematic of RBM is shown in Figure 1.
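RBM is an energy-based model, and its energy function is as follows:

$$E(v, h) = -\sum_{i=1}^{n_v} a_i v_i - \sum_{j=1}^{n_h} b_j h_j - \sum_{i=1}^{n_v} \sum_{j=1}^{n_h} v_i w_{i,j} h_j,$$

where $n_v$ and $n_h$ denote the number of neuron units in the visible and hidden layers, respectively. $v_i$ and $h_j$ denote the states of the $i$th neuron in the visible layer and the $j$th neuron in the hidden layer, respectively. $a_i$ and $b_j$ denote the bias of the $i$th neuron in the visible layer and the $j$th neuron in the hidden layer, respectively. $w_{i,j}$ is the weight associated with the connection between unit $i$ in the visible layer and unit $j$ in the hidden layer. Given the conditional independence of the activations of hidden and visible units, the individual activation probabilities are as follows:

$$P(h_j = 1 \mid v) = \operatorname{sigm}\Big(b_j + \sum_{i=1}^{n_v} v_i w_{i,j}\Big), \qquad P(v_i = 1 \mid h) = \operatorname{sigm}\Big(a_i + \sum_{j=1}^{n_h} w_{i,j} h_j\Big),$$

where $\operatorname{sigm}(x) = 1/(1 + e^{-x})$ is the logistic function. In this study, Gaussian-Bernoulli RBM was used, and its energy of joint configuration is, in the standard formulation,

$$E(v, h) = \sum_{i=1}^{n_v} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_{j=1}^{n_h} b_j h_j - \sum_{i=1}^{n_v} \sum_{j=1}^{n_h} \frac{v_i}{\sigma_i} w_{i,j} h_j,$$

where $\sigma_i$ is the standard deviation of the Gaussian noise of visible unit $i$. Meanwhile, the conditional distribution is

$$P(h_j = 1 \mid v) = \operatorname{sigm}\Big(b_j + \sum_{i=1}^{n_v} \frac{v_i}{\sigma_i} w_{i,j}\Big), \qquad P(v_i \mid h) = \mathcal{N}\Big(a_i + \sigma_i \sum_{j=1}^{n_h} w_{i,j} h_j,\; \sigma_i^2\Big).$$

Apart from the preceding differences, Gaussian-Bernoulli RBM is the same as binary RBM.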
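As a rough numerical illustration of the binary RBM described above, the following NumPy sketch evaluates the energy $-a \cdot v - b \cdot h - v^{T} W h$ and the two conditional activation probabilities for a toy model. The function names, layer sizes, and parameter values are illustrative assumptions and are not taken from the study.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used for the activation probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def rbm_energy(v, h, a, b, W):
    """Energy of a binary RBM: -a.v - b.h - v^T W h.

    v: visible states, shape (n_v,); h: hidden states, shape (n_h,)
    a: visible biases; b: hidden biases; W: weights, shape (n_v, n_h)
    """
    return -(a @ v) - (b @ h) - (v @ W @ h)


def p_hidden_given_visible(v, b, W):
    """P(h_j = 1 | v) for every hidden unit j."""
    return sigmoid(b + v @ W)


def p_visible_given_hidden(h, a, W):
    """P(v_i = 1 | h) for every visible unit i."""
    return sigmoid(a + W @ h)


# Toy configuration (all values are hypothetical).
v = np.array([1.0, 0.0, 1.0])
h = np.array([1.0, 0.0])
a = np.zeros(3)
b = np.zeros(2)
W = np.array([[0.5, -0.2],
              [0.1, 0.3],
              [-0.4, 0.8]])

energy = rbm_energy(v, h, a, b, W)
p_h = p_hidden_given_visible(v, b, W)
p_v = p_visible_given_hidden(h, a, W)
```

Because no visible-visible or hidden-hidden connections exist, each conditional is a single matrix-vector product followed by an element-wise logistic function, which is what makes layer-wise sampling in an RBM inexpensive.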
DBN was pretrained layer by layer by RBM, and the backpropagation (BP) algorithm was adopted to fine-tune the entire network for the optimization of all network parameters.

Bimodal Learning. Multimodal learning was proposed by Ngiam et al. in 2011 [17] to learn features over multiple modalities (image and audio modalities). The authors showed that better features for one modality can be learned if multiple modalities are present at feature learning time. Since the idea of multimodal learning was proposed, many researchers have applied this idea and achieved good results [18,19].
As shown in Figure 2, the proposed bimodal coal-rock recognition system in this study included two DBNs, a joint representation layer, and an output layer. Each DBN had two RBMs, and the architecture of each DBN was determined after testing 100 different architectures. The details on the determination of each DBN and the joint representation layer architecture are provided in Section 3.3.1 and Section 3.3.2, respectively.
The training process of this model could be performed in four steps.
First, two DBNs were trained for acceleration and sound pressure, respectively.
Second, two DBNs and a joint representation layer were combined as a joint RBM, which was trained afterward.
Third, the two DBNs, the joint RBM, and the output layer were combined to form a DNN. The DNN was subjected to greedy layer-wise training. This training process was still unsupervised, which is also called pretraining.
Fourth, this network was fine-tuned to strengthen the recognition capability; this process was a supervised training.
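The data flow through the resulting network can be sketched as follows. This is a minimal NumPy forward pass only, not the training procedure: the layer sizes are the ones reported later in the text ([42, 100, 250] for acceleration, [8, 150, 100] for sound pressure, 150 joint units), whereas the random weights, the two-class softmax output, and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_layers(sizes):
    """Random (W, b) pairs; a stand-in for parameters learned by RBM pretraining."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, layers):
    """Propagate a batch of inputs through a stack of sigmoid layers."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x


acc_dbn = init_layers([42, 100, 250])   # acceleration branch
snd_dbn = init_layers([8, 150, 100])    # sound pressure branch
joint = init_layers([250 + 100, 150])   # joint representation layer
output = init_layers([150, 2])          # assumed classes: coal, rock


def bimodal_forward(x_acc, x_snd):
    """Run both branches, fuse them in the joint layer, and return class probabilities."""
    h_acc = forward(x_acc, acc_dbn)
    h_snd = forward(x_snd, snd_dbn)
    h_joint = forward(np.concatenate([h_acc, h_snd], axis=1), joint)
    W_out, b_out = output[0]
    logits = h_joint @ W_out + b_out
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


batch = 4
probs = bimodal_forward(rng.normal(size=(batch, 42)), rng.normal(size=(batch, 8)))
```

Each modality keeps its own feature hierarchy until the joint layer, so the fusion happens on learned representations rather than on raw concatenated features.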

Transfer Learning. Deep networks usually need a large number of training samples to optimize their parameters.
The limited number of labeled samples is therefore a weakness of deep learning. To overcome this difficulty, transfer learning is employed to transfer knowledge from related data. In transfer learning, knowledge obtained from different but related tasks with sufficient samples is reused. Moreover, this method has already achieved some success in the identification field [17,34].
In this study, whether this transfer learning property of DNN could be generalized to coal-rock recognition was explored. First, samples of simulated coal and rock hitting the tail beam were used to pretrain the DNN model. Second, samples of a hand knocking a table were used to pretrain the DNN model. Because these samples are also acceleration and sound pressure signals, and thus belong to the same category as the training samples, they had a pretraining effect and optimized the parameters. The bimodal learning method was used in both of the preceding processes. After the unsupervised training, the parameters of the DNN were transferred to BT-DNN (Figure 3).
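The parameter transfer step can be pictured as initializing the target network with a copy of the source network's parameters and then continuing training on the real samples. The sketch below shows only this mechanism; the random initial parameters stand in for a pretrained source model, `dummy_update` is a placeholder for real training, and all names are hypothetical.

```python
import copy

import numpy as np

rng = np.random.default_rng(0)


def init_params(sizes):
    """Random layer parameters; a stand-in for parameters learned during pretraining."""
    return [{"W": rng.normal(0.0, 0.01, size=(m, n)), "b": np.zeros(n)}
            for m, n in zip(sizes[:-1], sizes[1:])]


def transfer_parameters(source_params):
    """Return an independent copy of the source parameters to initialize the target model."""
    return copy.deepcopy(source_params)


def dummy_update(params, step=0.1):
    """Placeholder for further training on real samples: shifts every weight by `step`."""
    for layer in params:
        layer["W"] += step


source_model = init_params([42, 100, 250])        # pretrained on simulated or unrelated samples
target_model = transfer_parameters(source_model)  # starting point for the target model
initial_match = all(np.array_equal(s["W"], t["W"])
                    for s, t in zip(source_model, target_model))
dummy_update(target_model)                        # the source model is left untouched
```

Copying rather than sharing the arrays keeps the source parameters available for reuse, for example when several target models are initialized from the same pretrained network.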

Hilbert-Huang Transform.
At present, the most commonly applied time-frequency analysis methods are FT, WT, and Hilbert-Huang transform (HHT). HHT comprises two parts: ensemble empirical mode decomposition (EEMD) and Hilbert spectral analysis (HSA). EEMD was proposed in 2004 [37]. Different from EMD, EEMD solves the mode mixing problem by adding a certain amount of Gaussian white noise to the original signal each time before decomposing [38]. Moreover, EEMD has better performance on nonlinear and nonstationary signals than FT and WT, which are extensively applied in many industries. EEMD was selected in this study due to the strong background noise of the caving environment and the nonlinear and nonstationary nature of the measured signals. By using EEMD, several IMFs were obtained:

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t),$$

where $x(t)$ is the original signal, $c_i(t)$ is the $i$th IMF, and $r_n(t)$ is the trend item.
The obtained IMFs were transformed by the Hilbert transform, and Hilbert energy spectra were obtained:

$$H(c_i(t)) = \frac{1}{\pi}\, \mathrm{P} \int_{-\infty}^{\infty} \frac{c_i(\tau)}{t - \tau}\, d\tau,$$

$$z_i(t) = c_i(t) + jH(c_i(t)) = a_i(t)\, e^{j\varphi_i(t)},$$

where $H(c_i(t))$ is the Hilbert transform of $c_i(t)$, $\mathrm{P}$ denotes the Cauchy principal value, $a_i(t) = \sqrt{c_i^2(t) + H^2(c_i(t))}$ is the amplitude function of $c_i(t)$, and $\varphi_i(t) = \arctan\big(H(c_i(t))/c_i(t)\big)$ is the phase function of $c_i(t)$.
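The amplitude and phase functions of an IMF can be computed numerically from its analytic signal. The sketch below builds the analytic signal with the FFT (zeroing negative frequencies and doubling positive ones) and differentiates the unwrapped phase to obtain the instantaneous frequency. The test tone standing in for an IMF and the function name are assumptions; the sampling frequency and sample length are the values reported in this study.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal x + j*H[x], obtained by keeping only non-negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


fs = 65536                      # sampling frequency (Hz)
n = 4096                        # points per sample (62.5 ms)
t = np.arange(n) / fs
f0 = 1024.0                     # hypothetical tone: exactly 64 cycles in the window
c = np.cos(2 * np.pi * f0 * t)  # stands in for a single IMF

z = analytic_signal(c)
amplitude = np.abs(z)                            # amplitude function
phase = np.unwrap(np.angle(z))                   # phase function
inst_freq = np.diff(phase) * fs / (2 * np.pi)    # instantaneous frequency in Hz
```

For this stationary tone the amplitude is constant and the instantaneous frequency equals the tone frequency; for a real IMF both quantities vary with time, which is the information the Hilbert spectrum captures.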
Hilbert time-frequency spectra of the signal can be obtained as

$$H(\omega, t) = \operatorname{Re} \sum_{i=1}^{n} a_i(t)\, e^{j \int \omega_i(t)\, dt},$$

where $a_i(t)$ and $\varphi_i(t)$ are the amplitude and phase functions of the $i$th IMF and $\omega_i(t) = d\varphi_i(t)/dt$ is its instantaneous frequency. Finally, Hilbert marginal energy spectra were obtained:

$$E(\omega) = \int_{0}^{T} H^2(\omega, t)\, dt.$$

3.1. Signal Acquisition.
The top coal caving working site and the self-designed experimental system are shown in Figures 5(a) and 5(b), respectively. The acceleration and sound pressure varied when coal and rock hit the tail beam of a hydraulic support. Thus, identifying coal-rock by detecting the tail beam acceleration and sound pressure is feasible.

The sensors were installed on the back of the tail beam for the following reasons. First, the main impact locations of falling coal and rock are the tail chute and the tail beam. However, if the sensors are installed on the tail chute, then they will be easily buried by falling coal and gangue. In addition, the sensors may be damaged by the continuous spraying of water, which is used for dust suppression. Moreover, the running conveyor chute generates considerable noise that adds to the difficulty of data processing and analysis. Therefore, the ideal location of the sensors is at the back of the tail beam. Two sensors were employed, an acceleration sensor and a sound pressure sensor; both were mounted with magnetic bases. The sensors and the specific installation location are shown in Figures 6(a) and 6(b), respectively. Their parameters are shown in Table 1.

The data acquisition system is composed of a data acquisition front end, a wireless router, and a software system. The data collection process is as follows. The acceleration and sound pressure of the tail beam were detected by the sensors, obtained by the data acquisition front end, and then transmitted to a notebook computer by the wireless router. The data were obtained in the No. 1306 working platform in Xinglongzhuang Coal Mine. The thickness of the coal is 7.34 m to 8.90 m. The caving method is one cut, one caving; the caving step distance is 0.8 m. The sampling frequency is set at 65536 Hz; the sampling time of each pattern is 52 s. The sampling time of each sample is determined in Section 3.3.6. The signals are shown in Figure 7.

Feature Extraction.
Feature extraction is important in pattern recognition, because selecting effective features contributes significantly to the improvement of recognition accuracy. By using EEMD, several IMFs were obtained, as shown in Figure 8.
The selection of statistics is important for feature extraction. In this study, five statistical characteristics were computed. Four characteristics were based on IMFs (skewness, kurtosis, energy, and variance), and one characteristic was based on the Hilbert marginal energy spectrum (the energy statistics of frequency division). The characteristics were calculated as follows.
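In the expressions below, which are written in their standard forms, $c_i(k)$ is the $k$th sample of the $i$th IMF, $N$ is the number of samples, and $\mu_i$ and $\sigma_i$ are the mean and standard deviation of $c_i$.

(a) Energy based on IMFs is

$$E_i = \sum_{k=1}^{N} c_i^2(k).$$

(b) Skewness based on IMFs is

$$S_i = \frac{1}{N} \sum_{k=1}^{N} \left( \frac{c_i(k) - \mu_i}{\sigma_i} \right)^3.$$

(c) Kurtosis based on IMFs is

$$K_i = \frac{1}{N} \sum_{k=1}^{N} \left( \frac{c_i(k) - \mu_i}{\sigma_i} \right)^4.$$

(d) Variance based on IMFs is

$$\sigma_i^2 = \frac{1}{N} \sum_{k=1}^{N} \big( c_i(k) - \mu_i \big)^2.$$

(e) Segmented energy ratio based on Hilbert marginal energy spectrum: the frequency range was divided into 10 bands, and the ratio of the energy of each frequency band to the total energy was calculated:

$$p_m = \frac{E_m}{\sum_{l=1}^{10} E_l}, \quad m = 1, \ldots, 10,$$

where $E_m$ is the energy of the marginal energy spectrum in the $m$th band.

When using acceleration as the algorithm's only input, we chose $k$ statistics from the 5 statistics as feature input; $k$ varied from 1 to 5. After $\sum_{k=1}^{5} C_5^k = 31$ experiments, the most effective combination of statistics for the acceleration input was identified. In the same way, we determined the features when the sound pressure is the algorithm's only input. The corresponding combination of statistics is kurtosis, variance, and energy statistics of frequency division (82.37%).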
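The five statistics can be computed in a few lines of NumPy. The sketch below uses the population (divide-by-N) convention for variance, skewness, and kurtosis, which is an assumption, as are the function names and the random placeholder data. It also assumes 8 IMFs per signal; that number is not stated in the text, but it is consistent with the 42-dimensional acceleration input reported later (4 statistics times 8 IMFs plus 10 band ratios).

```python
import numpy as np


def imf_statistics(imfs):
    """Energy, skewness, kurtosis, and variance of each IMF.

    imfs: array of shape (n_imfs, n_samples); returns four arrays of shape (n_imfs,).
    """
    centered = imfs - imfs.mean(axis=1, keepdims=True)
    variance = (centered ** 2).mean(axis=1)
    std = np.sqrt(variance)
    energy = (imfs ** 2).sum(axis=1)
    skewness = (centered ** 3).mean(axis=1) / std ** 3
    kurtosis = (centered ** 4).mean(axis=1) / std ** 4
    return energy, skewness, kurtosis, variance


def band_energy_ratio(marginal_energy, n_bands=10):
    """Share of the total energy that falls in each of n_bands contiguous frequency bands."""
    bands = np.array_split(marginal_energy, n_bands)
    band_energy = np.array([band.sum() for band in bands])
    return band_energy / band_energy.sum()


rng = np.random.default_rng(0)
imfs = rng.normal(size=(8, 4096))   # placeholder for 8 IMFs of one 4096-point sample
marginal = rng.random(2048)         # placeholder Hilbert marginal energy spectrum

energy, skewness, kurtosis, variance = imf_statistics(imfs)
ratios = band_energy_ratio(marginal)
features = np.concatenate([energy, skewness, kurtosis, variance, ratios])
```

Under these assumptions, using only the skewness of the sound pressure IMFs would give an 8-dimensional vector, which matches the input size of the sound pressure DBN.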
When the acceleration and sound pressure are used together as input, we chose $m$ statistics from the 5 acceleration statistics and $n$ statistics from the 5 sound pressure statistics as feature input; the total number of combinations is $\sum_{m=1}^{5} C_5^m \times \sum_{n=1}^{5} C_5^n = 961$. After 961 experiments, it was found that the recognition rate reached its maximum value (99.13%) when all 5 acceleration statistics were combined with 1 sound pressure statistic, namely, skewness. The other comparison experiments therefore use all 5 statistics of acceleration and the skewness of sound pressure as feature input.
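The experiment counts quoted above follow from counting the nonempty subsets of five statistics. A short check, using only the Python standard library (variable names are illustrative):

```python
from itertools import combinations
from math import comb

STATISTICS = ["skewness", "kurtosis", "energy", "variance", "band energy ratio"]

# Nonempty subsets of the five statistics for a single modality.
single_modality = sum(comb(len(STATISTICS), k) for k in range(1, len(STATISTICS) + 1))

# Every acceleration subset paired with every sound pressure subset.
both_modalities = single_modality ** 2

# Explicit enumeration of the subsets, e.g. to drive the experiments.
subsets = [subset
           for k in range(1, len(STATISTICS) + 1)
           for subset in combinations(STATISTICS, k)]
```

The enumeration yields one entry per experiment in the unimodal case, and the Cartesian product of two such lists gives the bimodal experiments.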

Comparison among Algorithms with Different DBN Structures. The network structure, which includes the unit numbers of the visible and hidden layers, has a profound influence on network performance. A total of 100 combinations were tested for each DBN to seek the optimal structure. The numbers of units in the first and second hidden layers of the acceleration and sound pressure DBNs were both varied from 50 to 500 at intervals of 50. The result for the acceleration DBN is shown in Figure 9. Good network performance was observed when the first hidden layer was set from 50 to 150 units and the second hidden layer was set at more than 150 units. The structure of the acceleration DBN was set to [42, 100, 250] to build the best recognition deep network. The structure of the sound pressure DBN was determined in the same manner, with its optimal structure set to [8, 150, 100].
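The 100 candidate structures per DBN correspond to a 10 by 10 grid over the two hidden layer sizes. The sketch below enumerates the grid for the acceleration DBN and selects the best candidate under a scoring function. The function `dummy_score` is a placeholder chosen only so that the example returns the structure reported above; it is not measured data, and in practice it would be replaced by training and validating each candidate network.

```python
from itertools import product

N_INPUT = 42                        # input size of the acceleration DBN
hidden_sizes = range(50, 501, 50)   # 50, 100, ..., 500

candidates = [(N_INPUT, h1, h2) for h1, h2 in product(hidden_sizes, repeat=2)]


def grid_search(architectures, evaluate):
    """Return the architecture with the highest score under `evaluate`."""
    return max(architectures, key=evaluate)


def dummy_score(architecture):
    """Placeholder score peaking at hidden sizes (100, 250); for demonstration only."""
    _, h1, h2 = architecture
    return -abs(h1 - 100) - abs(h2 - 250)


best = grid_search(candidates, dummy_score)
```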

Comparison among Algorithms with Different Numbers of Joint Representative Units. The number of joint representative RBM units is a hyperparameter of the network and is considerably important for the performance of the proposed model. The number of joint representative units was varied from 25 to 1000 to obtain a better performance; the results are shown in Figure 10.
As the number of joint representative units increases, the performance improves rapidly at first, then stabilizes, and finally decreases. More units provide more parameters for learning more information regarding the inputs. However, excessive parameters could not be learned, because the number of training samples was limited. If the number of units was too small, then learning the information would be difficult. The network performed best when the unit number was set to 150.

Comparison of the Algorithms Using Unimodality and Bimodality. One of the key ideas of this study is integrating acceleration and sound pressure for coal-rock recognition. Thus, algorithms with unimodality and bimodality were compared. Figure 11 shows the comparison of algorithms using acceleration and sound pressure, respectively. The method using acceleration (90.73%) performs better than the method using sound pressure (82.37%), but both are inferior to the combination of the two modalities (99.13%).
Another issue is that the number of modalities may affect the performance of the network; therefore, algorithms with different input methods were compared. One input method treated the acceleration and sound pressure features as one vector fed into the recognition network. The recognition result is shown as the green bar in Figure 12. The other input method treated the acceleration and sound pressure features separately as two modalities fed into the recognition network. The recognition result is shown as the yellow bar. Treating the two modalities separately evidently helped the algorithm obtain a better performance (an improvement of 3.63%) than combining the features together.

Comparison among Algorithms with and without RBMs.
Pretraining plays an important role in the recognition of coal and rock. Figure 13 shows the comparison of algorithms with and without pretraining. The comparison shows that the model with unsupervised pretrained RBMs (99.13%) achieves a 4.83% improvement over the model without pretraining (94.30%). The BP algorithm uses gradient descent to search for the optimum, and pretraining allows the network to be fine-tuned from a relatively good initial value rather than from random initial points, thereby potentially avoiding the risk of falling into poor local optima. Therefore, pretraining is necessary for model training.

Comparison among Algorithms with and without Transfer Learning.
In this experiment, algorithms with and without transfer learning were compared (Table 3). Herein, the algorithm is pretrained first with the data obtained in the laboratory environment by simulating coal and rock hitting the tail beam and then with the irrelevant data (hand knocking the table). The result shows that the model with transfer learning achieves a 0.70% improvement over the model without transfer learning. This improvement arises because transfer learning, similar to pretraining, optimizes the initial values for formal training, which makes it easier for the network to reach a good optimum.

Comparison of Algorithms Using Different Lengths of Time.
In this experiment, algorithms with different lengths of time per sample, from 7.8125 ms (each sample contains 512 sampling points) to 125 ms (each sample contains 8192 sampling points), were compared. Figure 14 shows that as the sample time length increases, recognition accuracy improves rapidly at first, then stabilizes, and finally decreases slightly. The algorithm obtained the best performance when the sample time length was set to 62.5 ms (each sample comprised 4096 sampling points). A longer sample time includes more information, so performance gradually improved with the increasing length of time. However, when the time length of the sample was extremely long, a few samples may contain two states; in addition, a longer time would increase the data processing time, thereby affecting the real-time performance.
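The sample durations follow directly from the sampling frequency of 65536 Hz. The short sketch below reproduces the conversion and, assuming non-overlapping segmentation of each 52 s pattern (an assumption, since the segmentation scheme is not stated), the number of samples obtained per pattern. The intermediate sample sizes of 1024 and 2048 points are likewise assumed for illustration.

```python
FS_HZ = 65536           # sampling frequency reported in the study
PATTERN_SECONDS = 52    # recording time of each pattern


def sample_duration_ms(n_points, fs_hz=FS_HZ):
    """Duration in milliseconds of a sample with n_points sampling points."""
    return 1000.0 * n_points / fs_hz


def samples_per_pattern(n_points, fs_hz=FS_HZ, seconds=PATTERN_SECONDS):
    """Number of non-overlapping samples that fit into one recorded pattern."""
    return (fs_hz * seconds) // n_points


durations = {n: sample_duration_ms(n) for n in (512, 1024, 2048, 4096, 8192)}
n_samples_best = samples_per_pattern(4096)
```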

Comparison among Algorithms with Different Classifiers.
In this experiment, to demonstrate the superiority of the proposed method, five other algorithms were compared, including k-nearest neighbor (KNN), support vector machine (SVM), Naïve Bayes, decision tree, and DBN. As shown in Figure 15, the decision tree is inferior to the other algorithms in this experiment. Naïve Bayes performs better than the other methods in rock recognition, except for DBN and the proposed method. SVM and KNN perform fairly well, and KNN performs better than SVM, especially in rock recognition. DBN performs well in rock recognition; however, in coal recognition, DBN is inferior to the proposed method. Evidently, the proposed method performs best in this comparison experiment.

Real-Time Comparison of Algorithms.
In this experiment, the real-time performance of the methods was compared, because fast and real-time identification of coal or rock during the process of top coal caving is very important for achieving automated mining. The processing time was measured with the computer system clock using Matlab R2017a on an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz with 8 GB RAM running the Windows 10 operating system. The average data processing time and recognition time of each algorithm per sample are shown in Table 4. Data processing time accounts for the vast majority of the total time (more than 99.9%). Because the total time is less than 343 ms, recognition can be performed twice per second in actual application; all these methods can therefore meet the real-time requirement.

Conclusions
A novel method based on bimodal deep learning and the Hilbert-Huang transform is proposed in this study for the identification of coal-rock in top coal caving. This study presents four main innovations. First, the bimodal learning method is adopted to enable the DNN to learn the characteristics of coal and rock more completely. Second, the transfer learning method is adopted to address the problem that a large number of samples are necessary to train the DNN. Third, the features extracted from the acceleration and sound pressure signals are combined to select the most effective features. Fourth, the most suitable installation location for the sensors is selected. In future work, the authors plan to obtain substantial coal-rock data from different coal mines to improve the applicability of the proposed method, use more effective feature extraction methods, improve the real-time performance, and strengthen the stability and robustness. Moreover, developing practical products for intelligent identification with high precision, high speed, and good adaptability is the aim of future work.

Figure 3: The architecture of transfer learning.

Figure 6: (a) Acceleration sensor and sound pressure sensor; (b) specific installation location.

Figure 10: The recognition results with different numbers of joint units.

Figure 13: Comparison of algorithms with and without pretraining.

Figure 14: Comparison among algorithms with different frame rates.

Figure 15: Proposed method versus other methods.

Table 1: Parameters of acceleration sensors.

Table 2: Recognition rates with different feature combinations.

Table 3: Comparison among algorithms with and without transfer learning.

Table 4: Comparison of the recognition time.