Discrimination of Rock Fracture and Blast Events Based on Signal Complexity and Machine Learning

The automatic discrimination of rock fracture and blast events is complex and challenging because their waveform characteristics are similar. To solve this problem, a new method based on signal complexity analysis and machine learning is proposed in this paper. First, the permutation entropy values of signals at different scale factors are calculated to reflect the complexity of the signals and are assembled into a feature vector set. Second, based on the feature vector set, a back-propagation neural network (BPNN) is applied as the machine learning tool to establish a discriminator for rock fracture and blast events. Then, to evaluate the classification performance of the new method, the classification accuracies of support vector machine (SVM), naive Bayes classifier, and the new method are compared, and the receiver operating characteristic (ROC) curves are also analyzed. The results show that the new method obtains the best classification performance. In addition, the influence of the scale factor q and the number of training samples n on the discrimination results is discussed. It is found that the classification accuracy of the new method reaches its highest value when q = 8–15 or 8–20 and n = 140.


Introduction
In laboratory rock tests, in situ rock excavation, and many other rock engineering activities, signals of rock fracture events are often mixed with other signals such as environmental noise, impact and vibration, and blast signals. When these signals are monitored by microseismic or acoustic emission equipment [1][2][3][4], the presence of interfering signals, especially blast signals, may lead to wrong interpretation, for example, erroneous state evaluation and disaster prediction [5,6]. Consequently, it is necessary to ensure a clean database of rock fracture signals. Although rock fracture and blast events can be discriminated by experts, manual discrimination is time-consuming and subjective because it depends on experience. Therefore, discriminating rock fracture and blast signals, particularly in large quantities, requires a reliable and automatic method.
In recent years, machine learning has been widely applied to the automatic identification and classification of signals. Machine learning [7][8][9][10] includes many methods, such as neural networks [11][12][13], support vector machines [14,15], and naive Bayes classifiers [16,17]. Several recognition methods for rock fracture or similar signals have been proposed. For example, Shang et al. [18] classified microseismic events and quarry blasts using artificial neural networks (ANN) based on principal component analysis. Yildirim et al. [12] used the extracted peak amplitude ratio (/ ratio) of quarry blasts and earthquakes to compare the classification accuracies of FFNN, PNN, and ANFIS. Liu et al. [19] proposed a method combining wavelet transform and ANN to recognize acoustic emission signals from different rocks. Del Pezzo et al. [20] used ANN based on seismogram signatures to classify earthquakes and underwater explosions. Peng et al. [21] used an improved BPNN combined with a feature extraction method to recognize seismic signals.
All the aforementioned methods conduct feature extraction before feature recognition. Waveform parameters of signals, such as amplitude, frequency, and total radiated energy, are extracted as eigenvectors. However, those waveform parameters sometimes cannot fully reflect the characteristics of the whole waveform. In addition, the process of extracting parameters consumes much time and effort. In order to classify signals more precisely and easily, it is vital to find a classification method that does not depend on waveform parameters of rock fracture and blast signals.
In this study, a new method based on signal complexity analysis and machine learning is proposed to achieve automatic identification of rock fracture and blast signals without waveform parameters. The method calculates signal complexity based on multiscale permutation entropy (MPE) and uses a back-propagation neural network (BPNN) as the machine learning tool. To calibrate and validate the proposed method, the signal complexity values of predetected events were also input into a support vector machine (SVM) and a naive Bayes classifier to classify the signal category. In addition, the influence of the scale factors and the number of training samples on the classification accuracy was analyzed for the new method.

Signal Complexity Analysis with Multiscale Permutation Entropy.
Feature extraction of signals is usually required before signal discrimination. Almost all previous studies used waveform parameters as discrimination features. For example, Vallejos and McKinnon [22] used 13 parameters of the seismic full waveform as discrimination feature vectors of blast and microseismic events. Mousavi et al. [23] extracted 40 features from the time, frequency, and time-frequency domains to classify deep and shallow microearthquakes. However, the commonly used characteristic parameters are difficult to obtain automatically, which limits the automatic identification of rock fracture events. Furthermore, the above waveform parameters are obtained from single-scale analysis, which captures less information about the signals. To solve these problems, this paper extracts feature vectors from the standpoint of signal complexity. Signal complexity is expressed primarily by the degree of correlation and randomness of the time series of a signal, which reflects the overall feature of the signal. The complexity of a signal can be described by many methods, such as permutation entropy (PE) [24,25], multiscale permutation entropy (MPE) [26,27], Lempel-Ziv complexity [28], and multiscale Lempel-Ziv complexity [29]. MPE is more robust because it uses only the order of the time series values; meanwhile, MPE captures multiscale signal information. This paper thus applies MPE to calculate signal complexity as the signal recognition feature. The basic principles are introduced as follows.
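A one-dimensional time series is given as follows:
$$X = \{x(i),\ i = 1, 2, \ldots, N\}. \tag{1}$$
Coarse graining of the above time series can be expressed by
$$y^{q}(j) = \frac{1}{q} \sum_{i=(j-1)q+1}^{jq} x(i), \quad 1 \le j \le \lfloor N/q \rfloor, \tag{2}$$
where $q$ stands for the scale factor and $y^{q}(j)$ stands for the multiscale time series. When $q = 1$, the coarse-grained time series is the original time series.
Phase space reconstruction of the coarse-grained series is performed:
$$Y^{q}(l) = \{y^{q}(l),\ y^{q}(l+\tau),\ \ldots,\ y^{q}(l+(m-1)\tau)\}, \tag{3}$$
where $m$ is the embedded dimension and $\tau$ is the time delay.
The $m$ real values contained in each $Y^{q}(l)$ can be arranged in ascending order as
$$y^{q}(l+(j_1-1)\tau) \le y^{q}(l+(j_2-1)\tau) \le \cdots \le y^{q}(l+(j_m-1)\tau), \tag{4}$$
and if two or more elements in $Y^{q}(l)$ have the same value, for example, $y^{q}(l+(j_1-1)\tau) = y^{q}(l+(j_2-1)\tau)$, they are ordered by their original positions such that, for $j_1 < j_2$,
$$y^{q}(l+(j_1-1)\tau) \le y^{q}(l+(j_2-1)\tau). \tag{5}$$
Accordingly, any vector $Y^{q}(l)$ can be mapped onto a group of symbols as
$$S(r) = (j_1, j_2, \ldots, j_m), \tag{6}$$
where $r = 1, 2, \ldots, R$ and $R \le m!$; $m!$ is the largest number of permutations. The permutation entropy of the time series at scale $q$ is expressed as follows:
$$H_{p}^{q}(m) = -\sum_{r=1}^{R} P^{q}(r) \ln P^{q}(r), \tag{7}$$
where $P^{q}(r)$ is the relative frequency of the $r$th symbol sequence. If $P^{q}(r) = 1/m!$, $H_{p}^{q}(m)$ reaches its maximum $\ln(m!)$, so $H_{p}^{q}(m)$ can be normalized as
$$\mathrm{PE}^{q} = \frac{H_{p}^{q}(m)}{\ln(m!)}. \tag{8}$$
Then
$$\mathrm{MPE} = \{\mathrm{PE}^{1}, \mathrm{PE}^{2}, \ldots, \mathrm{PE}^{q}\}, \tag{9}$$
where $\mathrm{PE}^{q}$ in MPE represents the signal complexity when the scale factor is equal to $q$. The size of the $\mathrm{PE}^{q}$ value indicates the degree of randomness of the time series. The smaller the value of $\mathrm{PE}^{q}$ is, the more regular the time series is. The greater the value of $\mathrm{PE}^{q}$ is, the more random the time series is.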
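The coarse graining, ordinal-pattern counting, and normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `coarse_grain`, `permutation_entropy`, and `mpe_features` are hypothetical, and the embedding dimension `m=3` and time delay `tau=1` are assumed defaults, since the values used in the paper are not recoverable from the text. The sine signal in the usage lines is synthetic.

```python
import math
from collections import Counter


def coarse_grain(x, q):
    """Average consecutive, non-overlapping windows of length q (scale factor)."""
    n = len(x) // q
    return [sum(x[j * q:(j + 1) * q]) / q for j in range(n)]


def permutation_entropy(y, m=3, tau=1):
    """Normalized permutation entropy in [0, 1]; m and tau are assumed defaults."""
    n_vectors = len(y) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen m and tau")
    patterns = Counter()
    for l in range(n_vectors):
        window = [y[l + k * tau] for k in range(m)]
        # sorted() is stable, so equal values keep their original order,
        # which matches the tie rule for equal elements.
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        patterns[pattern] += 1
    h = -sum((c / n_vectors) * math.log(c / n_vectors) for c in patterns.values())
    return h / math.log(math.factorial(m))


def mpe_features(x, scales, m=3, tau=1):
    """One normalized permutation entropy value per scale factor."""
    return [permutation_entropy(coarse_grain(x, q), m, tau) for q in scales]


# Usage on a synthetic signal: scales 8..15 give an eight-dimensional vector.
signal = [math.sin(0.05 * i) for i in range(2000)]
features = mpe_features(signal, scales=range(8, 16))
print(len(features), features)
```

Values near 0 indicate a regular series and values near 1 a random one, so a strictly increasing ramp yields 0 at every scale.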

Signal Identification with Back-Propagation Neural Network.
After signal features are extracted by signal complexity analysis, rock fracture and blast signals are discriminated by feature recognition. However, manual identification is time-consuming and easily influenced by individual factors. In order to discriminate rock fracture and blast signals reliably and automatically, a BP neural network [30] is applied as the identification tool. It is made up of an input layer, a hidden layer, and an output layer.
Two kinds of signals flow between the layers of a BP neural network. The working signals propagate forward, and the error signals, that is, the differences between actual and expected outputs, are back-propagated. The basic process is as follows.
The hidden layer input of the $i$th node is
$$\mathrm{net}_i = \sum_{j} w_{ij} x_j + \theta_i, \tag{10}$$
where $\mathrm{net}_i$ represents the hidden layer input of the $i$th node, $w_{ij}$ stands for the weight between the $i$th node of the hidden layer and the $j$th node of the input layer, $x_j$ is the $j$th input of the input layer, and $\theta_i$ is the $i$th threshold of the hidden layer. The hidden layer output of the $i$th node is
$$h_i = \phi(\mathrm{net}_i), \tag{11}$$
where $h_i$ is the hidden layer output of the $i$th node and $\phi$ stands for the activation function of the hidden layer. The output layer input of the $k$th node is
$$\mathrm{net}_k = \sum_{i} w_{ki} h_i + a_k, \tag{12}$$
where $\mathrm{net}_k$ represents the output layer input of the $k$th node, $w_{ki}$ stands for the weight between the $k$th node of the output layer and the $i$th node of the hidden layer, and $a_k$ is the $k$th threshold of the output layer. The output of the $k$th node of the output layer is
$$o_k = \psi(\mathrm{net}_k), \tag{13}$$
where $o_k$ is the output of the $k$th node of the output layer and $\psi$ stands for the activation function of the output layer.
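The error function $E$ is given by (14), and the BP network stops training when $E < \varepsilon$ is satisfied, where $\varepsilon$ is a given precision:
$$E = \frac{1}{2} \sum_{k} (T_k - o_k)^2, \tag{14}$$
where $T_k$ is the expected value of output node $k$ and $o_k$ is its actual output.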
A learning process updates the weights $w_{ij}$ for each neuron based on the following equation:
$$w_{ij}(t+1) = w_{ij}(t) - \eta \frac{\partial E}{\partial w_{ij}}, \tag{15}$$
where $t$ is the iteration index and $\eta$ is the learning rate, $\eta \in (0, 1)$.

Discrimination Process and Performance of the New Method
Step 3 (train machine learning tools). Input the MPE feature vectors of the training samples to train the BPNN, which adjusts its weights repeatedly until the error falls below the set error value.
Step 4 (classification of test and validation data). Input the MPE feature vectors of the test and validation samples to the trained BP neural network. The network outputs then give the classification accuracies for the test and validation data.
According to the above steps, the classification results are derived. The whole process is sketched in Figure 1.
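The training and classification steps can be illustrated with a small back-propagation network written in plain Python. This is a sketch under stated assumptions rather than the network used in the paper: the sigmoid activation, the single output node, the batch-averaged gradient step, the default learning rate, and the names `forward` and `train_bpnn` are all hypothetical choices, and the two clusters of eight-dimensional vectors are synthetic stand-ins for MPE features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass: hidden outputs h and the single network output o."""
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xj for w, xj in zip(row, x)) + b) for row, b in zip(W1, b1)]
    o = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, o


def train_bpnn(X, T, n_hidden, eta=0.5, epochs=200, tol=1e-3, seed=0):
    """Batch gradient descent on E = 1/2 * sum (T - o)^2; stops when E < tol."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        error = 0.0
        for x, t in zip(X, T):
            h, o = forward((W1, b1, W2, b2), x)
            error += 0.5 * (t - o) ** 2
            delta_o = (o - t) * o * (1.0 - o)  # error signal at the output node
            gb2 += delta_o
            for i in range(n_hidden):
                # error signal propagated back to hidden node i
                delta_h = delta_o * W2[i] * h[i] * (1.0 - h[i])
                gW2[i] += delta_o * h[i]
                gb1[i] += delta_h
                for j in range(n_in):
                    gW1[i][j] += delta_h * x[j]
        history.append(error)
        if error < tol:
            break
        step = eta / len(X)  # assumption: gradient averaged over the batch
        b2 -= step * gb2
        for i in range(n_hidden):
            W2[i] -= step * gW2[i]
            b1[i] -= step * gb1[i]
            for j in range(n_in):
                W1[i][j] -= step * gW1[i][j]
    return (W1, b1, W2, b2), history


# Synthetic stand-in data: two clusters of 8-dimensional feature vectors.
rng = random.Random(1)
X = [[0.9 + rng.uniform(-0.05, 0.05) for _ in range(8)] for _ in range(20)]
X += [[0.6 + rng.uniform(-0.05, 0.05) for _ in range(8)] for _ in range(20)]
T = [1.0] * 20 + [0.0] * 20
params, history = train_bpnn(X, T, n_hidden=2 * 8 + 1)
print(history[0], history[-1])
```

Thresholding the output `o` at 0.5 would turn the network into a two-class discriminator; the threshold value is likewise an assumption.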

Discrimination Performance Evaluation.
In order to evaluate the performance of the new method, the receiver operating characteristic (ROC) curve is applied. ROC is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (TPR = true-positive rate) versus the fraction of false positives out of the negatives (FPR = false-positive rate) at various threshold settings. In this study, the discrimination of rock fracture and blast events is considered as a two-class prediction problem; there are four possible outcomes from a binary classifier, as shown in Table 1. A true positive (TP) means that a rock fracture event has been identified as a rock fracture event, and a false negative (FN) means that a rock fracture event has been identified as a blast event. A true negative (TN) means that a blast event has been identified as a blast event, and a false positive (FP) means that a blast event has been identified as a rock fracture event. Then
$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \tag{16}$$
$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}}. \tag{17}$$
The accuracy (ACC) can be expressed as
$$\mathrm{ACC} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{FP} + \mathrm{TN}}. \tag{18}$$
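As a worked check of these definitions, the snippet below computes TPR, FPR, and ACC from the four outcome counts. The function name `rates` is hypothetical. The counts are those reported later in the paper for the full set of 100 rock fracture and 100 blast signals, where five rock fracture events and seven blast events are misjudged.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR, ACC) from the counts of a two-class contingency matrix."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return tpr, fpr, acc


# Rock fracture is the positive class: 95 of 100 detected, 7 of 100 blasts mislabeled.
tpr, fpr, acc = rates(tp=95, fn=5, fp=7, tn=93)
print(f"TPR={tpr:.1%}  FPR={fpr:.1%}  ACC={acc:.1%}")
```

These values agree with the 95.0%, 7%, and 94% reported for the total data set.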

MPE-BPNN Analysis.
Before MPE values are calculated, the coefficients of the MPE method itself need to be chosen. The coefficients include the embedding dimension $m$, the time delay $\tau$, and the scale factor $q$. Following the suggestions of Bandt and Pompe [31], the value of the scale factor $q$ is chosen from 8 to 15. That is, an eight-dimensional vector can be constructed for each signal to describe its characteristics.
Then the MPE values of 140 waveforms, which include rock fracture and blast signals, are chosen to train the BPNN. The remaining data are used as test data and validation data to evaluate the performance of the MPE-BPNN method. For a BPNN with the typical three-layer structure, Reyes et al. [32] stated that there should be $2k + 1$ neurons in the hidden layer, where $k$ is the number of input neurons. Since the scale factors are $q$ = 8-15, there are 8 input nodes, so the hidden layer has 17 neurons, and the BPNN is then trained.
The BPNN is trained for 71 iterations on 140 groups of data. The cross entropies of the training, test, and validation data are shown in Figure 5. From Figure 5, the best validation performance is 0.01458 at the 65th iteration. The error of each sample is calculated and shown in Figure 6. From Figure 6, the errors of the 200 eight-dimensional permutation entropy vectors are mostly within −0.02664 to 0.02664; a few individual samples have larger errors, which indicates that only a small number of events are misclassified.
The classification results for the three data sets and the total data set are shown in Table 2. From Table 2, three rock fracture events are falsely regarded as blast events and four blast events are misjudged as rock fracture events in the training data. TPR, FPR, and ACC of the training data are 95.8%, 5.9%, and 95%, respectively. In the validation data, one blast event is falsely regarded as a rock fracture event and one rock fracture event is falsely regarded as a blast event; TPR, FPR, and ACC of the validation data are 92.9%, 6.3%, and 93.3%, respectively. Meanwhile, one rock fracture event and two blast events are misjudged in the test data. Their TPR and FPR are 92.9% and 12.5%, respectively, and ACC reaches 90%. Overall, five rock fracture events are falsely regarded as blast events and seven blast events are misjudged as rock fracture events in the total data. TPR and FPR of the total data are 95.0% and 7%, respectively, and ACC reaches 94%.
In order to display the classification performance of the new method more intuitively, ROC curves of the different data sets are shown in Figure 7. From Figure 7, the corner point of the training set is closest to the top left corner, which means that TPR reaches its maximum rapidly while the related FPR is low and indicates accurate classification of the two signal types in training. As a result of training, the corner points of the test set and validation set are both close to the top left corner, which means that rock fracture and blast events are classified accurately in the test and validation sets. The ROC curve of the total set is also drawn in Figure 7. The corner point of the total set is also close to the top left corner. This illustrates that the proposed method discriminates rock fracture and blast signals with high accuracy.
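The ROC reading used above can be made concrete with a short threshold sweep. This is an illustrative sketch only: the names `roc_points` and `corner_point` are hypothetical, the scores and labels are invented toy data, and taking the corner point as the ROC point with the smallest Euclidean distance to the top left corner (FPR = 0, TPR = 1) is an assumption, since the paper does not define it formally.

```python
import math


def roc_points(scores, labels):
    """(FPR, TPR) pairs for each threshold; label 1 = rock fracture, 0 = blast."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def corner_point(points):
    """Assumed definition: the ROC point closest to the top left corner (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[0], 1.0 - p[1]))


# Toy classifier outputs (higher score = more likely rock fracture).
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
points = roc_points(scores, labels)
print(corner_point(points))
```

The closer this point lies to (0, 1), the higher the TPR that is reached at a low FPR.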

Comparison of Discrimination Performance for SVM, Naive Bayes, and the Proposed Method.
To evaluate the performance of the proposed method, naive Bayes and support vector machine [33] classifiers have also been implemented for comparison. The first 70% of the rock fracture and blast events have been used as training samples and the remaining 30% of the data have been used as test samples. The numbers of false positives (FP), that is, blasts mistakenly regarded as rock fracture events, are 11, 5, and 3, respectively, for SVM, naive Bayes, and the new method. The numbers of false negatives (FN), that is, rock fracture events falsely regarded as blasts, are 3, 6, and 2, respectively, for SVM, naive Bayes, and the new method. FP and FN show that the proposed method produces fewer misclassifications than the others. The accuracies (ACC) are 76.67%, 81.67%, and 91.67%, respectively, for SVM, naive Bayes, and the new method, which illustrates that the proposed method obtains the best overall classification accuracy and shows its highly nonlinear mapping ability. From Figure 10, the corner point of the new method is closest to the top left corner of the ROC curve, which shows that the new method has a high TPR when FPR is low. This indicates that the new method performs better than the other classifiers considered in this paper. As Table 3 shows, TPR are 90%, 80%, and 92.86% for SVM, naive Bayes, and the new method, respectively, when FPR are 36.67%, 16.67%, and 9.38%.
In conclusion, the new method obtains the best classification results.
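The reported accuracies can be cross-checked against the misclassification counts. The sketch below assumes a test set of 60 signals, which is inferred here from 30% of the 200 signals and is not stated explicitly in the text, and takes the FP counts as 11, 5, and 3 and the FN counts as 3, 6, and 2 for SVM, naive Bayes, and the new method. The variable names are hypothetical.

```python
N_TEST = 60  # assumption: 30% of the 200 signals

# (FP, FN) per classifier, as read from the comparison above.
errors = {"SVM": (11, 3), "naive Bayes": (5, 6), "new method": (3, 2)}

# ACC = correctly classified test samples / all test samples, in percent.
accuracy = {name: 100.0 * (N_TEST - fp - fn) / N_TEST for name, (fp, fn) in errors.items()}

for name, acc in accuracy.items():
    print(f"{name}: {acc:.2f}%")
```

Under these assumptions the computed values reproduce the reported 76.67%, 81.67%, and 91.67%.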

Discussion
In order to further evaluate the performance of the proposed method, the influence of different scale factors and training sample numbers is discussed. From Section 4, an eight-dimensional vector of permutation entropy values at scales 8 to 15, $[\mathrm{PE}^{8}, \mathrm{PE}^{9}, \ldots, \mathrm{PE}^{15}]$, is selected to express the characteristics of each waveform. In order to further analyze the influence of the scale factors on the identification results, the new method is run with $q$ ranging from 8 to 10, 15, 20, 25, and 30, respectively. The changes in the classification accuracies as the number of scale factors increases are shown in Table 4.
As shown in Table 4, when the feature vector is three-dimensional, the total classification accuracy of the new method is 90.5%, and the total classification accuracy increases as the range of scale factors is extended. When $q$ is 8-15, the rock fracture accuracy reaches its highest value. When $q$ is 8-20, the blast classification accuracy shows only modest growth. When the upper scale factor is greater than 20, the classification accuracy declines and then tends to be stable. The reason is that increasing the scale factor can make it more difficult to express the complexity of the signal. Meanwhile, increasing the number of scale factors increases the calculation time. Therefore, according to this experiment, the best choices for the scale factors are 8-15 or 8-20.

The Influence of Training Sample Numbers on the Identification Results.
As shown in Table 5, as the number of training samples increases, the classification accuracies of all events first remain unchanged, then rise, and finally decline. Because rock fracture and blast signals have complex waveform features, an excessive number of training samples may lead to overfitting, which decreases the classification accuracy. According to the above analysis, it is appropriate to select 140 training samples.

Conclusion
In this paper, a new method has been proposed for distinguishing rock fracture and blast events. The new method has two main advantages. First, the method is fast: it does not require waveform parameters of the detected signals, only the signal time series, which is more convenient because the time series are already recorded by the on-site equipment. Second, relying on the self-learning capacity of the BPNN, it can classify rock fracture and blast signals automatically, which addresses the time-consuming and subjective nature of manual discrimination.
In this study, multiscale permutation entropy (MPE) is applied to calculate complexity values for 200 signals, including 100 rock fracture and 100 blast signals. The calculated MPE values indicate the complexity and characteristics of the signals and are used as feature vectors of rock fracture and blast signals. Then a back-propagation neural network is used as the machine learning tool to construct a discriminator for rock fracture and blast signals based on the feature vectors. The accuracies of the training, validation, and test sets from the 200 data sets reach 95%, 93.3%, and 90%, respectively. The accuracy for all data reaches 94%. The TPR of the training, validation, and test sets and of all data reaches its maximum rapidly while the related FPR is low, which indicates good accuracy and sensitivity in classifying the two signal types.
To evaluate the performance of the new method, the classification performances of SVM, naive Bayes, and the new method are compared. The accuracies of the three methods are 76.67%, 81.67%, and 91.67%, respectively. The results show that the new method obtains the best classification accuracy. The ROC curves of the three methods are also compared. The corner point of the new method is closest to the top left corner of the ROC curve, which illustrates that the new method has the best specificity and sensitivity.
It is noted that the scale factors of MPE and the number of training samples for the BPNN are very important for the identification results. For the 200 data sets, the best scale factors are 8-15 or 8-20 and the best number of training samples is 140. An excessive number of training samples may lead to overfitting, which would reduce the classification accuracy.

Figure 1: Sketch of the whole discrimination process.

Data Set.
The experimental data sets were collected from Hubei Province and include 100 rock fracture signals and 100 blast signals. Some of the signals are shown in Figure 2.

Figure 2: Partial signal data from sample set.

Figure 8: Outcomes of classification of different methods.
Identification Results. An appropriate number of training samples is vital for the classification accuracy of the proposed method. Here, 100, 120, 140, 160, and 180 samples of the 200 groups of data are chosen, respectively, as training sets. The accuracies for the different training sample numbers are shown in Table 5.
This section describes the whole process of discriminating rock fracture and blast signals based on the proposed method. The process divides the signal waveform data into training, test, and validation sets. The specific steps of the new method are as follows.

Table 1: Contingency matrix for two-class prediction problem.

Table 2: Classification results for different sets.

Table 3: Classification results for different methods.
The classification results are shown in Table 3 and Figures 8-10. As shown in Table 3 and Figures 8 and 9, FP indicates blasts that are mistakenly regarded as rock fracture events.

Table 4: Classification accuracy with different scale factors.

Table 5: Classification accuracies with different training numbers.