Ensemble Prediction Algorithm of Anomaly Monitoring Based on Big Data Analysis Platform of Open-Pit Mine Slope

With the diversification of open-pit mine slope monitoring and the development of new technologies such as multisource data-flow monitoring, conventional alert-log processing systems cannot meet log-analysis expectations at the scale of big data. To address this shortcoming, this research provides an ensemble prediction algorithm for anomalous system data based on time series, together with an evaluation system for the algorithm. The algorithm integrates multiple classifier prediction algorithms and performs classified forecasting on the collected data, which optimizes the accuracy of predicting anomalous data in the system. The algorithm and evaluation system are tested using the microseismic monitoring data of an open-pit mine slope over 6 months. The results illustrate that the prediction algorithm provided by this research can successfully integrate the advantages of multiple algorithms to increase prediction accuracy. In addition, the evaluation system strongly supports the algorithm, which enhances the stability of the log analysis platform.


Introduction
Landslides are among the most widespread and dangerous natural disasters, posing a serious threat to the lives of people working at open-pit mine slopes. Because of the complexity and uncertainty of landslide formation and its causal factors, landslide surveillance and prediction have always received close attention from international researchers. With the wide application of systems science and nonlinear science, researchers have realized that a landslide body is an open and complex system. Using sensor anomaly monitoring technology, a massive number of alert logs is generated in the process of microseismic monitoring. These logs contain the time, place, alert level, and other relevant data of anomalous incidents. Many researchers have investigated microseismic monitoring models in recent years [1][2][3]. Anomaly monitoring techniques vary with the type of analysis subject; among them, monitoring methods based on anomaly history and system status are the most widely applied. Previously, researchers have focused on factors such as intrusion detection systems [4,5], prediction models [6,7], and system calls [8,9]. Nohra et al. [10] suggest that near-continuous monitoring via infrared spectroscopy is safe and accurate for use in critically ill surgical and trauma patients. Vinoth et al. [11] employ a PC-based microseismic network consisting of geophones, data loggers, GPS synchronization, and Ethernet antennas for wireless communication to study the impact of induced seismicity on slope failures in real time.
Monitoring methods based on anomaly history mainly collect the anomalous data of complex systems (e.g., slope microseismic monitoring alert logs). Machine learning and data mining are used in anomaly analysis to discover potential correlations between historical anomalies and future anomalies, which helps the prediction of future landslides. Many methods have been proposed in the literature, such as neural networks [12,13]. For example, based on a modified genetic algorithm, a back-propagation neural network classification algorithm, and landslide disaster prediction theory, and taking into account rainfall and other uncertainties in landslides, one study proposes the concept of separating uncertain data, elaborates processing methods for uncertain property data, and builds an uncertain genetic neural network and landslide hazard prediction model. However, this method often suffers from problems such as local minima and slow convergence. Most previous research employs only the dichotomous SVM algorithm [14,15], which is far from sufficient for anomaly monitoring systems in open-pit mines. Other methods used in this area, such as grey prediction models [16,17], fuzzy clustering [18,19], and numerical simulation [20][21][22], also have their respective disadvantages, which will not be listed one by one. With the coming of the big data age, collected data grows at an exponential rate. In the area of big data analysis, Jayasena et al. [23] propose the ant colony optimization (ACO) algorithm for efficient resource allocation in the infrastructure layer. Pop et al. [24] orient on computer and information advances, aiming to develop and optimize advanced system software, networking, and data management components to cope with big data processing and to introduce autonomic computing capabilities for supporting large-scale platforms. Traditional alert log analysis can no longer fulfil the need to discover relationships between anomalous data and landslides. Thus, new alert log analysis platforms are invented for alert log data mining at the scale of big data.

Alert Log Analysis System Based on Big Data Platforms
Individual-system log analysis not only requires distributed data mining techniques but also suffers from a low recall rate and ineffective predictions. A single system uses only one processor to process data, while a cluster system is a distributed or concurrent system composed of computers connected together. Currently, increasing attention is paid to cluster systems rather than single systems [25][26][27]. Some scholars have focused on distributed storage, arguing that the MapReduce computing model is effective and convenient for processing massive data in distributed computing [28][29][30]. Vahidipour et al. [31] compare several weighted combination methods in which hierarchical clustering results are used to derive a consensus hierarchical clustering. Song et al. [32] propose a two-tiered on-demand resource allocation mechanism consisting of local and global resource allocation with feedback to provide on-demand capacities to concurrent applications. Other scholars are interested in the combination of transfer learning and class-imbalance learning; for example, Ryu et al. [33], through comparative experiments with transfer learning and class-imbalance learning techniques, show that their proposed model provides significantly higher defect detection accuracy while retaining better overall performance. The system in this article employs distributed processing to achieve real-time monitoring of system logs and uses the distributed storage and the data calculation and analysis capacity of a big data platform to realize distributed storage of network data.
The system includes a network data collector, distributed real-time data transmission channels, a distributed log processing platform, a network data protocol feature library, and a big data platform. The structure of the system is shown in Figure 1.
The main functions and process flow are presented below:
(1) Data is sent through distributed real-time data transmission channels. Network data collectors send the data packets on each device to the distributed log processing platforms.
(2) The distributed log processing platforms process acquired packets in real time and match data characteristics using the network protocol characteristics library.
(3) The big data platforms perform cluster analysis and classification training on the stored log data and dynamically update the network protocol characteristics library.

Complexity
The current cloud computing model is MapReduce, provided by the Google laboratory. Besides Hadoop MapReduce and Hive, Map-Only and Iterative patterns are also applied in the MapReduce computing model, as discussed by Sobreira et al. [34]. In addition, based on Hadoop and MapReduce, some researchers have developed PDMiner, an effective new programming pattern and parallel distributed data mining system [35]. The system in this article uses Cloudera CDH as the supporting software for the big data platform and Spark Streaming as the distributed stream processing platform to analyze, match, and count network data packets. Such a trunking system unit is composed of 5 nodes, each equipped with 24 CPUs, 127 GB of RAM, and 10 hard drives. Data is collected via TcpDump, using switches to mirror data from one or multiple ports to other ports for network monitoring without interfering with other programs. Apache Kafka is used for real-time data transmission, sending data to the Spark Streaming distributed streaming platform for real-time analysis. Results are stored on the big data platform and provided for systematic training and network protocol data library updates using support vector machine (SVM) and Bayesian structure classifiers with big data modules such as MapReduce, Hive, and Mahout. Log analysis experiments on this platform indicate that the log analysis system has acceptable monitoring accuracy and time consumption. However, under anomalous network traffic, the statistics of traffic attributes show a decreasing trend. In order to improve system log analysis and achieve anomaly detection, this article presents EPABT (ensemble prediction algorithm based on time series) for big data analysis platforms. This system can predict the number of anomalies in the system within a certain period of time.

Theoretical Background
This research selects multiple prediction algorithms in data mining to establish the EPABT prediction model and uses two methods for testing. The first method separates the data set into a training set and a testing set. Specifically, the data collected in the last month is used as the testing set and the rest as the training set. In this way, the method can achieve anomaly detection via training and testing [36][37][38]. The second method uses alert logs based on time windows to predict anomalies. Using two-stage time windows, statistics of alert incidents are used to describe network running characteristics, which form the basis of the decision-making features for landslide prediction. Two evaluation systems are correspondingly adopted. The first uses the error cost functions of traditional time series, including mean square error (MSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The other evaluation system uses prediction precision, recall rate, and F-measure as evaluation criteria. The new ensemble prediction system structure is depicted in Figure 2.
The workflow of this system includes the 3 steps below:
(1) Analyze, transform, and preprocess historical network system data. Then filter and remove irrelevant data and attributes and transform the raw data into a usable format for the prediction model.
(2) Perform model selection and training based on data characteristics. Because of the dynamic attributes of grid systems, model selection and training are repeated on a regular basis with the latest data. The system saves all relevant parameters and updates the existing model after training is completed.
(3) Use each individual trained model to predict future capacity and requests. Ensemble learning is used to reduce the variance of predicted results: it uses every available model to predict and determines the weight of each model based on its prediction history. In addition, ensemble learning will also use non-time-series methods for prediction corrections.
The second testing method needs to set up a time axis and divide it into time windows of different scales. For online microseismic monitoring systems, system logs and real-time data are considered as a time series. A sliding time window Δ is set up, and the prediction is finished with the alert information in the time window and the relationship between historical alert logs and landslides. Time axis divisions are shown in Figure 3.
The time windows in this research are defined as follows. The prediction time window is the time window in which the existence of an anomaly is predicted, with a size of Δ. The current time window is the time window immediately before the prediction time window, with a size of Δ. The observing time window comprises the n time windows before the prediction time window, with a total size of Δ × n. The sample window is a smaller division of the unit time window, with a size of δ; the number of sample windows in a unit time window is Δ/δ. Typical time series analysis prediction methods include sliding window mean prediction, autoregression, artificial neural networks, support vector regression, and gene expression programming.
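The window bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the helper names `window_index` and `count_alerts_per_window` are hypothetical.

```python
from datetime import datetime, timedelta

def window_index(event_time, origin, delta):
    """Map an alert timestamp to the index of the unit time window it falls in."""
    return int((event_time - origin) / delta)

def count_alerts_per_window(alert_times, origin, delta, n_windows):
    """Count alert events in each of n_windows consecutive windows of size delta."""
    counts = [0] * n_windows
    for t in alert_times:
        i = window_index(t, origin, delta)
        if 0 <= i < n_windows:
            counts[i] += 1
    return counts

origin = datetime(2017, 4, 1)
delta = timedelta(hours=24)  # one unit time window per day
alerts = [datetime(2017, 4, 1, 5), datetime(2017, 4, 1, 18), datetime(2017, 4, 2, 3)]
print(count_alerts_per_window(alerts, origin, delta, 3))  # [2, 1, 0]
```

The same function applied with a smaller `delta` yields the sample-window counts of size δ.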

Sliding Window Mean Prediction.
Sliding window mean prediction is the simplest method in time series prediction. It requires a sliding window with a fixed length n [39]. If the prediction time is t, the predicted value at time t is the moving average of the previous n observations,

MA_{t-1} = (1/n) Σ_{i=t-n}^{t-1} y_i,

where MA_{t-1} denotes the moving average at time t − 1. Using the incremental representation, the sliding window average can be written as

MA_t = MA_{t-1} + (y_t − y_{t-n}) / n.

The advantage of sliding window mean prediction is its low computational complexity. However, if the time series is unstable, the prediction will be much less accurate. It is also difficult to set a proper sliding window length: if n is too large, the fluctuation of the predicted value will be underestimated; if n is too small, the historical data cannot be sufficiently used.
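The incremental update above can be sketched directly; `SlidingWindowMean` is a hypothetical helper name used only for illustration.

```python
from collections import deque

class SlidingWindowMean:
    """Incremental moving-average predictor: MA_t = MA_{t-1} + (y_t - y_{t-n}) / n."""
    def __init__(self, n):
        self.n = n
        self.window = deque(maxlen=n)
        self.ma = 0.0

    def update(self, y):
        if len(self.window) == self.n:
            # window full: add the newest value, drop the oldest, in O(1)
            self.ma += (y - self.window[0]) / self.n
        else:
            # window not yet full: plain running mean
            self.ma = (self.ma * len(self.window) + y) / (len(self.window) + 1)
        self.window.append(y)
        return self.ma  # prediction for the next time step

p = SlidingWindowMean(3)
for y in [1, 2, 3, 4, 5]:
    pred = p.update(y)
print(pred)  # mean of [3, 4, 5] -> 4.0
```

The O(1) update per observation is what gives the method its low computational complexity.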

Autoregression.

The autoregressive (AR) model [40,41] is another statistical method for predicting time series. In essence, it is a stochastic process. A basic hypothesis of the autoregressive model is that the output is a linear combination of historical values, which can be presented as

y_t = c + Σ_{i=1}^{p} ϕ_i y_{t-i} + ε_t,

where c indicates a constant, ε_t is a random error, and ϕ_i are the parameters of the autoregressive model. If the current value is correlated with historical values, the current value can be predicted from the historical data acquired. Thus, sliding window mean prediction can be considered a special case of the autoregressive model with all parameters equal to 1/n. As an improvement of the autoregressive model, the autoregressive moving average model (ARMA) is also commonly used. An ARMA model is written as ARMA(p, q), in which p is the number of parameters in the AR part and q is the number of parameters in the MA part. For example, ARMA(n, m) can be presented as

y_t = c + ε_t + Σ_{i=1}^{n} ϕ_i y_{t-i} + Σ_{i=1}^{m} θ_i ε_{t-i},

where ϕ_i are the parameters of the AR part (n terms in total) and θ_i are the parameters of the MA part (m terms in total).
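An AR(1) fit can be sketched with ordinary least squares on consecutive pairs; this is a toy illustration with hypothetical function names, not the paper's estimator, and the example series is constructed to satisfy the model exactly.

```python
def fit_ar1(series):
    """Least-squares fit of an AR(1) model y_t = c + phi * y_{t-1} + e_t."""
    x = series[:-1]  # y_{t-1}
    y = series[1:]   # y_t
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # ordinary least squares slope and intercept on (y_{t-1}, y_t) pairs
    phi = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
          sum((xi - mx) ** 2 for xi in x)
    c = my - phi * mx
    return c, phi

def predict_next(series, c, phi):
    """One-step-ahead AR(1) prediction."""
    return c + phi * series[-1]

# a noise-free series generated exactly by y_t = 1 + 0.5 * y_{t-1}
series = [4.0]
for _ in range(10):
    series.append(1 + 0.5 * series[-1])
c, phi = fit_ar1(series)
print(round(c, 6), round(phi, 6))  # 1.0 0.5
```

With noise-free data the fit recovers c and ϕ exactly; real alert-count series would of course include the error term ε_t.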

Artificial Neural Network.
If more complex nonlinear relationships exist in the data, it is extremely difficult for autoregressive prediction to match them. In order to efficiently predict nonlinear data relationships, an artificial neural network (ANN) can be introduced [42,43]. A typical neural network has 3 layers: an input layer, a hidden layer, and an output layer; in general, an artificial neural network can have zero to multiple hidden layers. An artificial neural network with 3 layers, including one hidden layer, is presented in Figure 4.
As illustrated in the figure, the numbers of neurons in each layer are 4, 4, and 1. Each input and hidden layer has one bias neuron with a fixed value of 1. θ^k_{ij} is the weight from neuron j in layer k to neuron i in layer k + 1, and a^k_i denotes the value of the ith neuron in layer k. If layer k is the input layer, the neuron values can be acquired directly; otherwise they are calculated from the previous layer (except for bias neurons). Writing the values of the neurons in layer k + 1 as the vector a^{k+1} = (a^{k+1}_1, …, a^{k+1}_m)^T, forward propagation can be expressed as

a^{k+1} = g(θ^k a^k),

where θ^k is the weight matrix from layer k to layer k + 1, a^k is the vector of all neuron values in layer k, and g is the activation function. For example, the hidden-layer neurons in Figure 4 are computed in this way, and the output layer then takes the hidden-layer output as its input. The common learning method for neural networks is the gradient descent algorithm. Let J(θ) be the error function of the network,

J(θ) = (1/2) Σ (y − ŷ)²,

where θ indicates the weight matrices and y − ŷ denotes the difference between the actual value and the predicted value.
Because batch gradient descent updates the weights only after processing the entire training set, and the network has a complex topology, training is inefficient and time-consuming. To overcome this disadvantage and increase training speed, stochastic gradient descent, a more practical method, updates the weights after processing each individual sample.
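The per-sample update can be sketched with a tiny 2-2-1 sigmoid network. This is a minimal illustration of stochastic gradient descent on a toy OR task, not the paper's network; all names and sizes here are assumptions for the example.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# a minimal 2-2-1 network; weights include a bias term in position 0
random.seed(0)
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # hidden layer
w_o = [random.uniform(-1, 1) for _ in range(3)]                      # output layer

def forward(x):
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in w_h]
    y = sigmoid(w_o[0] + w_o[1] * h[0] + w_o[2] * h[1])
    return h, y

def sgd_step(x, t, lr=0.5):
    """Update ALL weights from this single sample (stochastic gradient descent)."""
    h, y = forward(x)
    d_o = (y - t) * y * (1 - y)  # squared-error gradient at the output
    for j, w in enumerate(w_h):
        d_h = d_o * w_o[j + 1] * h[j] * (1 - h[j])  # backpropagated gradient
        w[0] -= lr * d_h
        w[1] -= lr * d_h * x[0]
        w[2] -= lr * d_h * x[1]
    w_o[0] -= lr * d_o
    w_o[1] -= lr * d_o * h[0]
    w_o[2] -= lr * d_o * h[1]

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # simple OR task

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

before = loss()
for _ in range(2000):
    for x, t in data:
        sgd_step(x, t)  # weight update after every sample, not every epoch
after = loss()
print(after < before)  # True
```

The contrast with batch gradient descent is exactly where `sgd_step` is called: inside the inner loop over samples rather than once per pass over the data.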
3.4. Support Vector Machine. The support vector machine (SVM) is a supervised learning algorithm for solving classification problems with high accuracy [44]. Because the algorithm always prefers the decision hyperplane with the largest margin, it generalizes well to testing samples. The support vector regressor is a variation of the support vector machine. Its basic principle is that the features of the training data are mapped from an n-dimensional space into a higher m-dimensional space through a nonlinear transformation Φ (also called a kernel function), and then linear regression is performed in the higher space. The basic idea of the regressor can be illustrated as

f(x) = ω^T Φ(x) + b,

where ω = (ω_1, ω_2, …, ω_m)^T denotes the weight of each characteristic of the original samples in the m (m > n) dimensions after mapping, and b is a threshold value. If the mapping function Φ is removed, this reduces to a common expression of linear regression.
With a trained support vector regressor, a predicted value can be calculated in O(m) time. Training minimizes a regularized empirical risk of the form

min_{ω,b} (1/n) Σ_{i=1}^{n} C(f(x_i) − y_i) + λ‖ω‖²,

where x_i denotes a training sample, y_i denotes its actual value in the training set, C(·) is the cost function, and λ‖ω‖² is used to control the complexity of the model: weight values that increase model complexity are penalized. Quadratic programming can be used to optimize most cost functions C(·); the ε-insensitive and Huber cost functions are the common candidates.
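The two cost functions named above, and the regularized objective they plug into, can be sketched as follows; `svr_objective` and the default parameter values are illustrative assumptions, not the paper's settings.

```python
def eps_insensitive(r, eps=0.1):
    """Epsilon-insensitive cost: zero inside the eps tube, linear outside."""
    return max(0.0, abs(r) - eps)

def huber(r, delta=1.0):
    """Huber cost: quadratic for small residuals, linear for large ones."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def svr_objective(residuals, weights, lam=0.01, cost=eps_insensitive):
    """(1/n) * sum C(r_i) + lam * ||w||^2 -- the regularized empirical risk."""
    n = len(residuals)
    data_term = sum(cost(r) for r in residuals) / n
    reg_term = lam * sum(w * w for w in weights)
    return data_term + reg_term

print(eps_insensitive(0.05))  # 0.0 (residual inside the tube costs nothing)
print(eps_insensitive(0.5))   # 0.4
print(huber(2.0))             # 1.5
```

The ε tube is what makes the SVR solution sparse: samples whose residuals fall inside it contribute no cost and hence no support vector.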
3.5. Gene Expression Programming. Gene expression programming (GEP) is a new type of bionic algorithm originating from the biological field [45]. It inherits the advantages of traditional genetic algorithms and genetic programming. It can express arbitrarily complicated expressions because it encodes them as chromosomes: in terms of structure, a chromosome is expressed as an array, but in terms of logic, a chromosome embodies an expression tree.
As shown in Figure 5, a simple expression a + b + c * d has a corresponding array and expression tree. For model training, gene expression programming optimizes the model through a population-evolution pattern. A population usually has 40-100 chromosomes whose expressions have strong diversity. Chromosomes in the population evolve through replication, mutation, recombination, inversion, and other operations under the guidance of a fitness function, as shown in Figure 6.
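The array-to-tree correspondence can be sketched with standard breadth-first (Karva-style) decoding; this is an illustrative toy, and the particular chromosome string is an assumption chosen to encode the example expression a + b + c * d.

```python
ARITY = {'+': 2, '-': 2, '*': 2, '/': 2}  # terminals (a, b, c, d) have arity 0

def karva_to_tree(genes):
    """Decode a linear chromosome into an expression tree, breadth-first."""
    nodes = [[g, []] for g in genes]  # each node: [symbol, children]
    queue = [nodes[0]]
    i = 1
    while queue:
        node = queue.pop(0)
        for _ in range(ARITY.get(node[0], 0)):
            child = nodes[i]
            i += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0]

def evaluate(node, env):
    """Recursively evaluate the expression tree against variable bindings."""
    sym, children = node
    if sym in ARITY:
        a, b = (evaluate(c, env) for c in children)
        return {'+': a + b, '-': a - b, '*': a * b, '/': a / b}[sym]
    return env[sym]

tree = karva_to_tree('+a+b*cd')  # level-order encoding of a + (b + c * d)
print(evaluate(tree, {'a': 1, 'b': 2, 'c': 3, 'd': 4}))  # 15
```

GEP's genetic operators (mutation, recombination, inversion) act on the flat string, while fitness is evaluated on the decoded tree, which is what makes the representation convenient.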

Ensemble Algorithm Based on Time Series and Assessment Standards
4.1. Ensemble Algorithm. Since it is difficult for a single prediction model to predict precisely, this article employs the strategy of ensemble learning, combining the prediction abilities of different models into an ensemble prediction algorithm based on time series. The strategy of ensemble learning has already been widely used in the classification problems of data mining, where data samples are regarded as independent and identically distributed. However, in time series analysis, strong connections along the time dimension exist between samples. Moreover, since the predicted values of time series analysis are continuous, a classification voting mechanism cannot be used to obtain the final result. For the ensemble learning of time series analysis, the main strategy is to update each predictor's weight according to the prediction assessment standards. The name and description of each algorithm are shown in Table 1.
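The weight-update strategy described here can be sketched as follows. The exponential down-weighting by relative error is an assumption for illustration (the paper does not fully specify its update rule), and all function names are hypothetical.

```python
import math

def ensemble_predict(preds, weights):
    """Weighted linear combination of the individual predictors' outputs."""
    return sum(w * v for w, v in zip(weights, preds))

def update_weights(weights, preds, actual, eta=1.0):
    """Shrink each predictor's weight according to its relative error, then renormalize."""
    errors = [abs(v - actual) / max(actual, 1e-9) for v in preds]
    raw = [w * math.exp(-eta * e) for w, e in zip(weights, errors)]
    total = sum(raw)
    return [r / total for r in raw]

predictors = 3
weights = [1.0 / predictors] * predictors  # equal weights at t = 0
preds_t = [10.0, 14.0, 30.0]               # individual predictions at time t
actual_t = 12.0
print(ensemble_predict(preds_t, weights))  # approximately 18.0
weights = update_weights(weights, preds_t, actual_t)
print(weights[2] < weights[0])             # True: the worst predictor lost weight
```

Because weights are renormalized after each observation, the ensemble output stays a convex combination of the individual predictions while experience gradually concentrates weight on the best-performing model.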
In order to achieve the predicted result mentioned above, this article proposes a strategy of weighted linear combination. Assume the predicted result of prediction algorithm p ∈ P at moment t is v_t^p and its weight at moment t is w_t^p; then at moment t the ensemble predicted value for a given log quantity is

v̂_t = Σ_{p∈P} w_t^p v_t^p.

At the initial status t = 0, all prediction algorithms contribute equally to the predicted result, i.e., w_0^p = 1/|P|. The weight-update strategy of the ensemble algorithm differs from the basic strategy of traditional classification: in a classification scene, results can only be "right" or "wrong," and updating the weights simply strengthens the predictors with the right result, whereas in a prediction scene, predicted results are continuous values and the weight of each prediction algorithm directly affects the result of the ensemble. In order to update the ensemble prediction algorithm's weights, using the difference between each predicted value v_t^p and the actual value v_t, the relative error of prediction algorithm p at moment t can be expressed as

e_t^p = |v_t^p − v_t| / v_t.

Let v̂_s denote the predicted number of anomalies at a future moment s and v_s the actual number. The entire cost function can then be expressed as

AEC = β · P(v̂_s, v_s) + (1 − β) · R(v̂_s, v_s),

where β is a parameter for adjusting the weights of the two cost types. By changing the value of β, the cost of over-prediction or under-prediction can be adjusted manually. AEC is a generalized error cost function that satisfies the needs of the application scenario in this article; its exact expression varies with the specific application. The defined P function and R function must satisfy two properties: nonnegativity and consistency.
(1) Nonnegativity: for any nonnegative v̂_s and v_s, both P(v̂_s, v_s) ≥ 0 and R(v̂_s, v_s) ≥ 0. (2) Consistency: the sign of the prediction error (over- or under-prediction) must be respected as well.
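An asymmetric error cost of this general form can be sketched as below. The piecewise-linear P and R functions and the particular cost values are assumptions for illustration; the paper only fixes the general structure and the requirement that under-prediction costs far more than precise prediction.

```python
def aec(predicted, actual, c_over=1.0, c_under=10.0, beta=0.5):
    """Asymmetric error cost: under-prediction (missed anomalies) is penalized
    much more heavily than over-prediction, reflecting C_under >> C_normal."""
    if predicted >= actual:
        p = c_over * (predicted - actual)   # cost of predicting too high
        r = 0.0
    else:
        p = 0.0
        r = c_under * (actual - predicted)  # cost of predicting too low
    return beta * p + (1 - beta) * r        # AEC = beta * P + (1 - beta) * R

print(aec(12, 10))  # over-predict by 2  -> 1.0
print(aec(8, 10))   # under-predict by 2 -> 10.0
```

Both branches are nonnegative, and the same absolute error costs ten times more when the anomaly count is under-predicted, which is the behaviour wanted in slope monitoring.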
In the prediction of outliers in the microseismic monitoring of an open-pit mine slope, assume the cost of a precise prediction is C_normal, the cost of under-prediction is C_under, and the cost of over-prediction is C_over. Often C_under is uncertain, but what is certain is that C_under ≫ C_normal, which determines the form of the P function; the AEP cost function is then described accordingly. Combining (15) and (16), the specific measure of the AEC cost function is obtained.

4.2. Experimental Data. The experiment uses the microseismic monitoring of an open-pit mine slope [46,47], collecting monitoring data from Apr. 2017 to Oct. 2017, as shown in Figure 7. The microseismic monitoring uses the ARAMIS M/E microseismic monitoring system from the Polish company EMAG, as shown in Figure 8. Supporting software generates log documents according to shock events. Assume the sliding window is Δ = 72 h and the window slides forward 3 min at each step, so the system generates 20 groups of alarm values per hour. Each rating value is calculated from the monitoring data of the most recent 72 hours. Through sifting and pretreatment, the effective amount of data is 92,752 records, comprising the data of the slope monitoring system over a span of 6 months. Alarm log data mainly include: (1) Level: the status of the slope anomaly information, with three statuses in total: NOR, DAN, and CRI.
(2) Alarm source: the number of the specific monitoring sensor that emits the alarm information, starting from 0 and numbered 0, 1, …, 100.
(3) Location information: it describes the location information of the sensor which emits alarm information.
(4) Alarm time: the time at which the alarm information is recorded, with a precision of seconds, such as 02/16/2016 07:35:06. Through analysis of the raw exception logs, a huge number of illegal and redundant records, derivative alarms, and noise data such as flickering alarms are detected. Before prediction, corresponding filtering methods are designed to effectively sift the noise data out of the raw alarm logs. The first step is to analyze the characteristics of the prediction targets and the alarm logs. The current time window best represents the current status of the system and is closest to the prediction time window. Therefore, the alarm events of each level and of each type in the current time window are regarded as features. The extracted features include the number of hint-level alarm events in the current time window, the number of subordinate-level alarm events, the number of important-level alarm events, the number of emergency-level alarms and failure events, and the number of every type of alarm event in the current windows.

Assessment Standards of the Experiment.
Besides the asymmetric error cost function this article proposes as an assessment standard, in order to prove the utility of the ensemble prediction algorithm and the accuracy of the assessment standards, this article also adopts other traditional assessment standards [48]. The first prediction pattern adopts normal error cost functions as assessment standards. The error cost functions traditionally used in time series analysis are MSE, MAE, MAPE, and so on. MSE evaluates the variation of the data: the smaller the MSE, the more accurately the prediction model describes the experimental data. The mean absolute error, MAE = (1/n) Σ_{i=1}^{n} |f_i − y_i|, is the average of the absolute errors and better reflects the real magnitude of the prediction error, where f_i is the predicted value and y_i is the actual value. The greater the MAPE, the greater the difference between predicted and original values and the worse the prediction effect; the smaller its absolute value, the higher the accuracy.
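The three traditional error cost functions can be written out directly; the example values are arbitrary illustrations.

```python
def mse(pred, actual):
    """Mean square error."""
    return sum((f - y) ** 2 for f, y in zip(pred, actual)) / len(actual)

def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(f - y) for f, y in zip(pred, actual)) / len(actual)

def mape(pred, actual):
    """Mean absolute percentage error (actual values must be nonzero)."""
    return 100.0 / len(actual) * sum(abs((f - y) / y) for f, y in zip(pred, actual))

pred = [10.0, 20.0, 30.0]
actual = [12.0, 20.0, 24.0]
print(mse(pred, actual))          # (4 + 0 + 36) / 3, about 13.33
print(mae(pred, actual))          # (2 + 0 + 6) / 3, about 2.67
print(round(mape(pred, actual), 2))
```

Note that MAPE is undefined when an actual value is zero, which matters for anomaly counts; windows with zero anomalies have to be excluded or smoothed before applying it.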
The assessment standards the second test mode adopts are Precision, Recall, and F-Measure:

Precision = TP / (TP + FP), Recall = TP / (TP + FN), F-Measure = 2 · Precision · Recall / (Precision + Recall),

where TP is the number of correctly predicted outliers, FP is the number of normal points predicted as outliers (false positives), and FN is the number of outliers that are not predicted (false negatives).
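These three criteria follow directly from the counts; the example counts are arbitrary illustrations.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-measure from outlier-prediction counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# e.g. 8 outliers correctly predicted, 2 false positives, 4 outliers missed
p, r, f = precision_recall_f(8, 2, 4)
print(round(p, 4), round(r, 4), round(f, 4))  # p = 0.8, r ≈ 0.6667, f ≈ 0.7273
```

F-Measure is the harmonic mean of precision and recall, so it stays low unless both are high, which is why it is used as the single summary criterion.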

4.4. Selection of Temporal Granularity. Since this experiment is based on time series prediction, the selection of temporal granularity is one of the major factors affecting the predicted result [49]. A time series is obtained by clustering the requests of the original data set according to a certain temporal granularity, in time order. Different choices of temporal granularity bring different difficulties to time series prediction. Figure 8 shows the time series obtained by clustering at different temporal granularities, where the x-axis represents the temporal granularity and the y-axis represents the number of outliers. We can see from Figure 8 that the greater the temporal granularity, the greater the number of outliers at each moment.
For large temporal granularity, owing to the large base value, the absolute prediction error might be large even when the relative bias is small, causing greater prediction error. Selecting an overly small temporal granularity is not suitable for time series analysis either: overly small granularity makes the data at each moment lack statistical significance. In addition, the clustered time series shows irregularity in multiple indexes. Table 2 provides the coefficient of variation, skewness, and kurtosis of the time series at each temporal granularity.
The coefficient of variation measures the mutability of the time series: the greater the value, the more easily the series changes with time. Skewness measures the asymmetry of the time series: the greater the value, the more asymmetrical the series. Kurtosis measures the peakedness of the distribution of the series' values: the greater it is, the more irregular the distribution. We can see from Table 2 that when the temporal granularity is in hours, the irregularity of the time series is much greater than that of the series with granularity in days or weeks. The authors also adjusted the parameters; the results show that the best temporal granularity for the time series is by days. Overall, according to observation and initial analysis, this case adopts the time series with temporal granularity in days.
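The three statistics in Table 2 can be computed as below; the two example series are invented to illustrate the hourly-vs-daily contrast, not taken from the paper's data.

```python
import math

def moments(xs):
    """Coefficient of variation, skewness, and kurtosis of a series."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(var)
    cv = sd / mean                                        # mutability
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)  # asymmetry
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4)  # peakedness
    return cv, skew, kurt

hourly = [0, 0, 5, 1, 0, 12, 0, 2]        # spiky, fine-grained counts
daily = [18, 22, 20, 21, 19, 20, 20, 20]  # the same events aggregated: smoother
cv_h, sk_h, _ = moments(hourly)
cv_d, sk_d, _ = moments(daily)
print(cv_h > cv_d, abs(sk_h) > abs(sk_d))  # True True
```

Aggregating to coarser granularity lowers both the coefficient of variation and the skewness, which is the regularity argument behind choosing a daily granularity.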

Results and Analysis
The first type of monitoring pattern splits the data set into a training data set and a prediction data set. The last month of data is used as the test set, and the rest as the training set. The contrast experiment compares multiple prediction methods, including random prediction, NARMAX, autoregression, neural network, gene expression programming, SVR, and ensemble learning. We can see from Figure 9 that the ensemble learning method is always optimal or close to optimal. Based on the predicted results for the data from months 1 to 5, the winning prediction model differs between data sets. However, the ensemble learning method is always close to the optimal model, because its prediction strategy tends to make the entire ensemble model approach the optimal model by continually increasing, with experience, the contribution ratio of the prediction model that performs best.
Figures 10-12 show the evaluation results calculated with mean absolute error, mean square error, and mean absolute percentage error, each covering sliding mean, the autoregression model, neural network, gene expression programming, support vector machine, and the ensemble learning algorithm. From these figures, it is obvious that, overall, the ensemble learning algorithm combines the advantages of the other algorithms and processes the data in the most accurate way. The second monitoring and verification technique uses time windows as the basic units for error prediction. In the experiment, the time window unit is divided into 1 h, 6 h, 12 h, and 24 h. Using these different divisions of the time window, different experimental data sets can be constructed as further bases. Thus, the efficiency of the multiple prediction models can be evaluated under different time windows, as presented in Figure 13.
The experimental results illustrate that as the time window increases, the accuracy of all prediction algorithms shows an ascending trend. In order to test the ensemble algorithm and the evaluation system provided in this article, the actual anomalies and predicted results over a certain period of time are compared, as presented in Figure 14. From Figure 14, it is not difficult to conclude that gene expression programming has the best outcome among the individual methods, followed by the neural network. However, compared with the ensemble algorithm, neither can optimize the anomaly prediction; the ensemble algorithm integrates the advantages of all the other algorithms.

Discussion and Conclusion
This article designs a log analysis system based on a big data platform for the characteristics of an open-pit mine's monitoring system. In the analysis of the monitoring system, the disadvantages of log prediction and the need to predict anomaly numbers are found, and an ensemble prediction algorithm based on time series is proposed. By integrating multiple types of classified prediction algorithms and making classified predictions on the collected log data, this algorithm achieves comprehensively optimal accuracy in predicting the system's anomaly numbers. After timely adjustment of parameters, such as the hierarchical structure of the neural network and the kernel function of the SVM, the results show that the ensemble algorithm has higher accuracy. The specific conclusions are as follows: (1) A single local log analysis system not only involves complicated calculation but also takes a long time. The proposed platform transmits data packets in real time through distributed transmission channels to the cloud service platform with real-time queuing and adopts the Spark Streaming platform for data analysis, feature matching, and access statistics of network packets.
The results show that the log analysis experiments run on this platform achieve good improvements in accuracy, time consumption, and other indexes.
(2) To address the disadvantages of a single prediction algorithm, integrating multiple types of prediction algorithms is proposed. The model is tested on the training and test sets using 6 months of monitoring data from the open-pit mine slope.
The results show that the ensemble prediction algorithm gains better prediction accuracy and effect, with a calculation result closest to the actual result, and has favourable generalization in dealing with homogeneous problems.
(3) In order to examine the experiment result, two different evaluation systems are compared.All the results show that ensemble algorithm has greater accuracy, which proves more strictly the value of ensemble algorithm mentioned in this article.
In this paper, the ensemble algorithm model assumes an ideal weighted-mean setting. However, in reality, many algorithms cannot be weighted in this way. In future work, more effort will be made to study the integrated prediction problem in these contexts.

g(z) = 1/(1 + e^(−z)) denotes the sigmoid function of variable z. The purpose of this function is to map z from the entire real number line to the open interval (0, 1). If Figure 4 is a trained neural network (all neural weights θ_ij acquired from the training set), the output ŷ can be computed layer by layer.

Figure 4: Topology structure of artificial neural network.

Figure 5: Array of gene expression programming and the expression tree.

Figure 9: Average errors of each prediction method.

Figure 14: The comparison between each algorithm and the actual result.

Table 1: Characteristics of each time series prediction algorithm.

Table 2: Metric statistics of different temporal granularity clusterings.
Figure 11: Evaluation result of MSE.
Figure 12: Evaluation result of MAPE.