A Cycle Deep Belief Network Model for Multivariate Time Series Classification

Multivariate time series (MTS) data are an important class of temporal data objects and are easy to obtain. However, MTS classification is difficult because of the complexity of the data type. In this paper, we propose a Cycle Deep Belief Network (Cycle DBN) model to classify MTS and compare its performance with DBN and KNN. The model exploits both the representation learning ability of DBN and the correlation between time series data. The experimental results show that the model outperforms four other algorithms: DBN, KNN ED, KNN DTW, and RNN.


Introduction
Time series data are sequences of real-valued signals that are measured at successive time intervals. They can be divided into two kinds: univariate time series and multivariate time series (MTS). Univariate time series contain one variable, while MTS have two or more variables. MTS is a more important data type of time series because it is widely used in many areas such as speech recognition, medicine and biology measurement, financial and market data analysis, telecommunication and telemetry, sensor networking, motion tracking, and meteorology.
As the availability of MTS data increases, the problem of MTS classification has recently attracted great interest in the literature [1]. MTS classification is a supervised learning procedure aimed at labeling a new multivariate series instance according to the classification function learned from the training set [2]. However, the features in traditional classification problems are independent of their relative positions, while the features in time series are highly correlated. As a result, important information is lost if traditional classification algorithms are used for MTS, since they treat each feature as an independent attribute. Many techniques have been proposed for time series classification. A method based on boosting is presented for multivariate time series classification in [3]. In [4], the authors proposed a DTW-based decision tree to classify time series, with an error rate of 4.9%. In [5], the authors applied a multilayer perceptron neural network to the control chart problem, and the best performance achieved was a 1.9% error rate. Hidden Markov Models are used on the PCV-ECG classification problem and achieve 98% accuracy [6]. A support vector machine combined with a Gaussian Elastic Metric Kernel is used for time series classification in [7]. The dynamics of recurrent neural networks (RNNs) for the classification of time series are presented in [8]. However, the simple combination of one-nearest-neighbor with DTW distance is claimed to be exceptionally difficult to beat [9].
The Deep Belief Network (DBN) is a type of deep neural network with multiple hidden layers, introduced by Hinton et al. [10] along with a greedy layer-wise learning algorithm. The Restricted Boltzmann Machine (RBM), a probabilistic model, is the building block of the DBN. DBN and RBM have attracted increasing attention from researchers. They have already been applied to many problems with excellent performance, such as classification [11], dimensionality reduction [12], and information retrieval [13]. Taylor et al. [14] proposed the conditional RBM, an extension of the RBM, which is applied to human motion sequences. Chao et al. [15] evaluated the performance of DBN as a forecasting tool for predicting exchange rates. Längkvist et al. [16] applied DBN to sleep stage classification and evaluated its performance. The result illustrated that DBN, either with features (feat-DBN) or using the raw data (raw-DBN), performed better than the feat-GOHMM. The feat-DBN achieved 72.2% and the raw-DBN achieved 67.4%, while the feat-GOHMM achieved only 63.9%.
Raw-DBN does not need to extract features before classifying the sleep data, and the algorithm is easy to implement. However, it neglects the important information in time series data and its performance is not satisfactory. This paper proposes a Cycle DBN model for time series classification. The model possesses the ability of feature learning since it is developed on the basis of DBN. Meanwhile, the characteristics of time series data are taken into consideration in the model.
The remainder of the paper is organized as follows. Section 2 reviews the background material. In Section 3, we detail the Cycle DBN model for multivariate time series. Section 4 evaluates the performance of our Cycle DBN on real data sets. Section 5 concludes the paper.

Background Material
A time series is a sequence of observations over a period of time. Formally, a univariate time series $x = \{x(i) \in \mathbb{R} : i = 1, 2, \ldots, n\}$ is an ordered set of real-valued numbers, and $n$ is called the length of the time series $x$. Multivariate time series are more common in real life and are more complex since they have two or more variables. An MTS is defined as a finite sequence of univariate time series
$$\mathbf{X} = (x_1, x_2, \ldots, x_m). \tag{1}$$
The MTS $\mathbf{X}$ has $m$ variables, and the component corresponding to the $j$th variable, $x_j$, is a univariate time series of length $n$:
$$x_j = \{x_j(i) \in \mathbb{R} : i = 1, 2, \ldots, n\} \quad (j = 1, 2, \ldots, m). \tag{2}$$
In this paper, we use boldface characters for MTS and regular fonts for univariate time series. The time series classification problem is a supervised learning procedure. First we learn a function $f : \mathbf{X} \to y$ from the given training set $D = \{(\mathbf{X}^{(k)}, y^{(k)})\}$, $k = 1, 2, \ldots, N$. The training set includes $N$ samples, and each sample consists of an input $\mathbf{X}^{(k)}$ paired with its corresponding label $y^{(k)}$. Then we can assign a label to a new time series instance based on the function learned from the training set.
A Deep Belief Network (DBN) consists of an input layer, a number of hidden layers, and finally an output layer. The top two layers have undirected, symmetric connections between them. The lower layers receive top-down, directed connections from the layer above.
The process of training DBNs includes two phases. Every two consecutive layers in the DBN are treated as a Restricted Boltzmann Machine with visible units $v$ and hidden units $h$. There are full connections between the visible layer and the hidden layer, but no visible-to-visible or hidden-to-hidden connections (see Figure 1). The visible and hidden units are connected by a weight matrix $W$ and have a visible bias vector $b$ and a hidden bias vector $c$, respectively. In the first phase, we train each RBM independently, one after another, and then stack them on top of each other. This procedure is also called pretraining. In the second phase, a back propagation (BP) network is set up at the last level of the DBN, and it receives the output of the highest RBM as its input. We can then perform supervised learning in this phase. This procedure is called fine-tuning since the parameters in the DBN are tuned using the error back propagation algorithm.
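The pretraining phase described above can be sketched as a loop that trains one RBM, pushes the data through it, and feeds the resulting hidden activations to the next RBM. This is only an illustrative sketch: the function `pretrain_stack` and the callable `train_rbm(data, n_hidden)`, assumed to return a weight matrix and a hidden bias vector, are hypothetical names that do not come from the paper, and the fine-tuning phase is not shown.

```python
import numpy as np


def sigmoid(z):
    """Logistic function applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def pretrain_stack(X, hidden_sizes, train_rbm):
    """Greedy layer-wise pretraining of a stack of RBMs.

    X            : array of shape (n_samples, n_visible)
    hidden_sizes : number of hidden units of each RBM, bottom to top
    train_rbm    : hypothetical trainer, train_rbm(data, n_hidden) -> (W, c)
                   with W of shape (n_in, n_hidden) and c of shape (n_hidden,)
    Returns the list of (W, c) pairs and the top-layer activations.
    """
    data = np.asarray(X, dtype=float)
    layers = []
    for n_hidden in hidden_sizes:
        # Train the current RBM on the output of the layer below.
        W, c = train_rbm(data, n_hidden)
        layers.append((W, c))
        # Hidden activation probabilities become the next RBM's input.
        data = sigmoid(data @ W + c)
    return layers, data
```

The top-layer activations returned here would serve as the input to the supervised network that is added for fine-tuning.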
The procedure of training DBN is shown by Algorithm 1 and the corresponding flowchart is given by Figure 2.
From the above analysis, we can conclude that the most important part of training a DBN is the training of each RBM.
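Since there are no hidden-hidden or visible-visible connections in the RBM, the probability that hidden unit $h_j$ is activated given the visible vector $v$ and the probability that visible unit $v_i$ is activated given the hidden vector $h$ are given by
$$P(h_j = 1 \mid v) = \sigma\Big(c_j + \sum_i W_{ij} v_i\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_j W_{ij} h_j\Big),$$
where $\sigma(z) = 1/(1 + e^{-z})$ is the logistic function, $W_{ij}$ is the weight between visible unit $i$ and hidden unit $j$, and $b_i$ and $c_j$ are the visible and hidden biases. The Contrastive Divergence (CD) approximation is used to train the parameters by minimizing the reconstruction error, and the learning rule is given by
$$\Delta W_{ij} = \epsilon \big(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}}\big),$$
where $\epsilon$ is the learning rate, $\langle v_i h_j \rangle_{\text{data}}$ is the expectation over the training set, and $\langle v_i h_j \rangle_{\text{recon}}$ is the expectation under the distribution of reconstructions. The procedure of training an RBM is shown in Algorithm 2 and the corresponding flowchart is given in Figure 3.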
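As an illustration of the CD learning rule, the following is a minimal sketch of a single CD-1 update for a binary RBM. It is not a transcription of Algorithm 2: the function name `cd1_step`, the use of one Gibbs step, the mini-batch averaging, and the bias updates are assumptions made for the example. Here `W` has shape (visible, hidden), `b` is the visible bias, and `c` is the hidden bias.

```python
import numpy as np


def sigmoid(z):
    """Logistic function applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def cd1_step(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 update on a batch v0 of shape (batch, n_visible).

    Returns the updated (W, b, c) and the mean squared reconstruction error.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visible units, then the hidden units.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    n = v0.shape[0]
    # <v h>_data - <v h>_recon, averaged over the batch.
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b = b + lr * (v0 - pv1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    err = float(np.mean((v0 - pv1) ** 2))
    return W, b, c, err
```

Calling `cd1_step` repeatedly on mini-batches of the training set gives one possible realization of RBM training; the reconstruction error it returns can be monitored to decide when to stop.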

Cycle_DBN for Time Series Classification
Längkvist et al. [16] applied DBN to time series classification and obtained a remarkable result. The standard DBN optimizes the posterior probability $P(y_t \mid x_t)$ of the class label $y_t$ given the current input $x_t$. However, time series data are different from other kinds of data, and there are correlations between time series data. It is unsuitable to apply DBN to time series classification without any modification because it neglects the important information in time series data.
Based on the above discussion, this paper proposes a Cycle DBN model for time series classification. In this model, $x_t$ is the input at time step $t$ and $y_t$ is the corresponding output of the DBN. Since our purpose is classification, we add a softmax function on the top layer, so that $y_t$ is the corresponding label. After training the DBN and obtaining the label $y_t$, $y_t$ is then treated as one of the inputs of the DBN. At time $t$, the inputs of the DBN include not only $x_t$ but also $y_{t-1}$, the output of the DBN at time $t-1$.
The training procedure of the Cycle DBN is similar to that of the traditional DBN and includes two steps. The only difference is that the output at time $t-1$ is fed back to the Cycle DBN as one of the inputs at time $t$. The first step is unsupervised training to initialize the parameters of the DBN. After unsupervised learning, we add a softmax function on the top layer and carry out supervised training.
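The feedback of the previous output can be illustrated with a short sketch of the prediction loop. It makes several assumptions that the paper does not state: the trained network is represented by a hypothetical callable `classify` that maps the concatenated input to class probabilities, the fed-back output is the softmax probability vector, and the previous output at the first time step is a zero vector.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def cycle_predict(x_seq, classify, n_classes):
    """Predict one label per time step, feeding back the previous output.

    x_seq    : array of shape (T, n_features), one input vector per time step
    classify : hypothetical stand-in for the trained DBN with a softmax top
               layer; maps a vector of length n_features + n_classes to a
               vector of n_classes probabilities
    """
    y_prev = np.zeros(n_classes)  # assumption: no previous output at t = 1
    labels = []
    for x_t in np.asarray(x_seq, dtype=float):
        z_t = np.concatenate([x_t, y_prev])  # input is [x_t, y_{t-1}]
        y_prev = classify(z_t)
        labels.append(int(np.argmax(y_prev)))
    return labels
```

The only change relative to a plain DBN classifier is the widened input vector, which is why the two training steps can otherwise stay the same.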

Experimental Evaluation
In this section, we conduct extensive experiments to evaluate the classification performance of the proposed model Cycle DBN and compare it against traditional DBN, NN ED, NN DTW, and recurrent neural networks (RNN).
The $k$-NN algorithm is one of the most well-known classification algorithms; it is very simple to understand but performs well in practice. An object in the testing set is classified according to its distances to the objects in the training set, and it is assigned to the class to which its $k$ nearest neighbors belong. We choose $k = 1$ in our experiment, and the algorithm is then simply called the nearest neighbor (NN) algorithm. In NN ED, we use the Euclidean distance to measure the similarity between two instances.
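A minimal sketch of the nearest neighbor rule with Euclidean distance is given below. The function name `nn_ed_predict` is hypothetical, and flattening each multivariate instance into a single vector before computing the distance is an assumption of this example.

```python
import numpy as np


def nn_ed_predict(train_X, train_y, test_X):
    """1-NN classification with Euclidean distance.

    train_X, test_X : arrays whose first axis indexes instances; each
                      instance is flattened to a vector (assumption)
    train_y         : labels of the training instances
    """
    A = np.asarray(train_X, dtype=float)
    B = np.asarray(test_X, dtype=float)
    A = A.reshape(len(A), -1)
    B = B.reshape(len(B), -1)
    # Squared Euclidean distance between every test and training instance;
    # the square root is omitted because it does not change the argmin.
    d2 = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    return np.asarray(train_y)[nearest]
```

Because every prediction compares the test instance with all training instances, the cost of this lazy classifier grows with the size of the training set.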
Dynamic Time Warping (DTW) [17] is another distance measure for time series, and it was originally designed for univariate time series. However, the time series handled in this paper are multidimensional, so a multidimensional version of DTW is needed. Fortunately, ten Holt et al. [18] proposed a multidimensional DTW that utilizes all dimensions to find the best synchronization. In standard DTW, the distance is usually calculated by taking the squared distance between the feature values of each combination of points: $d(q_i, c_j) = (q_i - c_j)^2$. In multidimensional DTW, a distance measure for two $K$-dimensional points must be calculated: $d(q_i, c_j) = \sum_{k=1}^{K} (q_{i,k} - c_{j,k})^2$. In NN DTW, we use the multidimensional DTW distance to measure the similarity between two instances. RNN allows the identification of a dynamic system with an explicit model of time and memory, which makes it well suited to time series classification. In this paper, we choose Elman's architecture, which consists of a context layer, an input layer, one hidden layer, and an output layer.
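The multidimensional DTW distance can be sketched with the usual dynamic programming recursion, using the summed squared difference over all dimensions as the local cost. The function name `md_dtw` is hypothetical, and two details are assumptions of this example: no warping window is applied, and the accumulated cost is returned without taking a square root.

```python
import numpy as np


def md_dtw(Q, C):
    """Multidimensional DTW cost between Q (n, K) and C (m, K).

    1-D inputs are treated as univariate series with K = 1.
    """
    Q = np.asarray(Q, dtype=float)
    C = np.asarray(C, dtype=float)
    if Q.ndim == 1:
        Q = Q[:, None]
    if C.ndim == 1:
        C = C[:, None]
    n, m = len(Q), len(C)
    # Local cost: squared differences summed over the K dimensions.
    d = ((Q[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # D[i, j] is the minimal accumulated cost of aligning Q[:i] with C[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```

Replacing the Euclidean distance in a nearest neighbor classifier with this function gives one possible form of NN DTW.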
To evaluate the performance of these methods, we test them on real-world time series datasets, including a sleep dataset, the PAMAP2 dataset, and the UCR Time Series Classification Archive.
The performance of each classifier is reported using the error rate, which is defined as
$$\text{error rate} = \frac{\text{total number of misclassified data}}{\text{total number of testing data}}.$$
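For concreteness, the error rate can be computed as in the following short sketch; the function name `error_rate` is hypothetical.

```python
def error_rate(y_true, y_pred):
    """Fraction of test instances whose predicted label is wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("the test set must not be empty")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)
```

For example, one wrong prediction among four test instances gives an error rate of 0.25, which corresponds to a classification accuracy of 75%.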
According to Rechtschaffen and Kales (R&K) [19], sleep recordings can be divided into the following five stages: awake, rapid eye movement (REM), stage 1, stage 2, and slow wave sleep (SWS). Our goal is to find a mapping function $f$ that correctly predicts the corresponding sleep stage $y_t$ from the time series $\mathbf{X}_t$: $y_t = f(\mathbf{X}_t)$.

Experiment Setup.
The raw signals of all subjects are lightly preprocessed by notch filtering at 50 Hz to cancel out power line disturbances and are then prefiltered with a bandpass filter of 0.3 to 32 Hz for EEG and EOG and 10 to 32 Hz for EMG. After that they are downsampled to 64 Hz. Since the sample rate is 64 samples per second and we set the window width to 1 second of data, each time series X has length 64 and therefore has 64 corresponding labels. The last label is selected as the label of the time series X. In our study, we use the recordings of five people as the training set. In order to balance the samples, we randomly select 6000 records from every category. We thus have 30000 recordings, which we divide into 25000 training samples and 5000 validation samples. The recordings of the other six people are used as test data. The distribution of the dataset is listed in Table 1.

Table 1: Distribution of the dataset (total number of samples, followed by the number of samples in each of the five sleep stage classes).

Dataset        Total   Class 1   Class 2   Class 3   Class 4   Class 5
trainSamples   25000   5017      5005      4993      4986      4999
Val Samples    5000    983       995       1007      1014      1001
Ucdbb009       20000   4230      3720      6420      740       4890
Ucdbb010       20000   1980      11040     2610      2480      1890
Ucdbb011       20000   3270      6170      2430      390       7740
Ucdbb012       20000   4380      7710      930       3660      3320
Ucdbb013       20000   2670      4040      3660      2010      7620
Ucdbb014       20000   0         6780      7380      1190      4650
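The windowing and class balancing steps can be sketched as follows. The filtering and downsampling steps are not shown. The function names `make_windows` and `balanced_indices` are hypothetical, and the use of non-overlapping windows is an assumption, since the text does not state the stride between consecutive windows.

```python
import numpy as np


def make_windows(signal, labels, width=64):
    """Cut a recording into windows and label each with its last label.

    signal : array of shape (n_samples, n_channels)
    labels : array of shape (n_samples,), one label per sample
    Returns X of shape (n_windows, width, n_channels) and y of shape
    (n_windows,). Windows do not overlap (assumption); a trailing
    remainder shorter than `width` is dropped.
    """
    signal = np.asarray(signal)
    labels = np.asarray(labels)
    n = (len(signal) // width) * width
    X = signal[:n].reshape(-1, width, signal.shape[1])
    y = labels[:n].reshape(-1, width)[:, -1]
    return X, y


def balanced_indices(y, per_class, rng):
    """Randomly pick `per_class` window indices from every class.

    Every class must contain at least `per_class` windows.
    """
    y = np.asarray(y)
    picked = [
        rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
        for c in np.unique(y)
    ]
    return rng.permutation(np.concatenate(picked))
```

With `width=64` and `per_class=6000`, these two helpers would correspond to the 1-second windows and the 6000 records per category described above.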

Experiment Result.
Our goal is to compare the performance of Cycle DBN with the original DBN, NN ED, NN DTW, and RNN for time series classification. The error rate of each model is given in Table 2, where the best results are shown in boldface.
Compared with the other four algorithms, the proposed algorithm has the best performance. The classification accuracies of Cycle DBN reach at least 90% on all the test data, and most of them exceed 99%. Standard DBN has a higher rate of correct classification than NN ED, NN DTW, and RNN. RNN performs poorly, with an error rate of about 50%.

Activity Classification.
Our second experiment is on the PAMAP2 dataset for activity classification. This dataset can be downloaded at http://archive.ics.uci.edu/ml/datasets/PAMAP2+Physical+Activity+Monitoring.

Experiment Setup.
To improve the performance of the proposed approach, we carry out data preprocessing at the beginning of the experiment. Each dimension of the time series is normalized through
$$x' = \frac{x - \operatorname{mean}(x)}{\operatorname{std}(x)},$$
where $\operatorname{mean}(x)$ and $\operatorname{std}(x)$ are the mean and standard deviation of the variable $x$, computed over the samples belonging to the same column rather than over all samples. For each of the seven subjects, we randomly select 1/2 of the data as the training set, 1/6 as the validation set, and the rest as the test set.
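The normalization and the random split can be sketched as below. The function names `zscore_columns` and `split_half_sixth_rest` are hypothetical, and replacing a zero standard deviation by 1 to avoid division by zero is an assumption of this example rather than part of the described procedure.

```python
import numpy as np


def zscore_columns(X):
    """Normalize every column of X (n_samples, n_columns) to zero mean and unit std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # assumption: guard constant columns
    return (X - mean) / std


def split_half_sixth_rest(n, rng):
    """Randomly split n sample indices into 1/2 train, 1/6 validation, rest test."""
    perm = rng.permutation(n)
    n_train = n // 2
    n_val = n // 6
    train = perm[:n_train]
    val = perm[n_train:n_train + n_val]
    test = perm[n_train + n_val:]
    return train, val, test
```

Applying `split_half_sixth_rest` separately to each subject gives one possible realization of the per-subject protocol described above.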

Experiment Result.
We evaluate the classification accuracy of each model on these seven subjects. Table 3 shows a detailed comparison of the error rates for each subject. From Table 3 we can see that the classification accuracies of all five models on the seven subjects are more than 90%. Moreover, for each subject our Cycle DBN model either achieves the lowest error rate or is very close to it.
NN ED also shows excellent performance, but we should note that NN is a lazy classification model rather than a feature-based model.
It is well known that feature-based models have an advantage over lazy classification models such as NN in efficiency. Although NN has high classification accuracy, the prediction time of NN will increase dramatically when the size of training data set grows. The prediction time of DBN and Cycle DBN will not increase no matter how large the training data is. Therefore, Cycle DBN shows excellent

UCR Time Series Classification.
Besides the above two data sets, we also test our Cycle DBN on ten distinct time series datasets from the UCR time series archive [20]. All the datasets have been split into training and testing sets by default. The only preprocessing in our experiment is normalization and division of the data into training, validation, and testing sets. Table 4 shows the test error rates and a comprehensive comparison of NN ED, NN DTW, RNN, DBN, and Cycle DBN.
Cycle DBN outperforms the other four methods on five of the ten datasets; NN ED and NN DTW achieve the best performance on the same two datasets. DBN achieves the best performance on two datasets. Although the performance of RNN is not outstanding, its results are still acceptable.

Conclusion
Time series classification is becoming more and more important in a broad range of real-world applications. However, most existing methods have lower classification accuracy or need domain knowledge to identify representative features in data. In this paper, we proposed a Cycle DBN for the classification of multivariate time series data in general. Like DBN, Cycle DBN is an unsupervised learning algorithm that can discover the structure hidden in the data and learn representations that are more suitable as input to a supervised machine than the raw input. Compared with DBN, the new Cycle DBN model predicts the label at time $t$ based not only on the current input but also on the label at the previous time $t-1$. We evaluated our Cycle DBN model on twelve real-world datasets, and the experimental results show that our model outperforms DBN, NN ED, NN DTW, and RNN on most datasets.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.