Incomplete Time Series Prediction Using Max-Margin Classification of Data with Absent Features

This paper addresses the prediction of time series with missing data. A novel forecasting model is proposed based on max-margin classification of data with absent features. The problem of modeling an incomplete time series is treated as the classification of data with absent features, and the optimal separating hyperplane is used to predict future values. Compared with the traditional prediction process for incomplete time series, our method solves the problem directly rather than filling in the missing data in advance. In addition, we introduce an imputation method to estimate the missing data in the history series. Experimental results validate the effectiveness of our model in both prediction and imputation.


Introduction
Time series prediction has sparked considerable research activity, ranging from short-range-dependent series to long-range-dependent series [1-4] and from conventional time series to fractal time series [5, 6]. Traditional prediction techniques, such as neural networks (NNs) [7] and support vector regression (SVR) [8], are designed for complete time series. However, the time series encountered in real life often contain missing data due to malfunctioning sensors, human factors, and other causes. When dealing with the prediction of incomplete time series, the traditional process consists of two steps. The first step is to recover the incomplete time series with an imputation model, and the second step is to estimate the prediction model as for a complete time series. This process is shown in Figure 1(a). It may require a large amount of computation, and inaccurate imputation may introduce deviations into the prediction. In this paper, we propose a novel prediction model built directly from the incomplete time series, as shown in Figure 1(b). The problem of modeling an incomplete time series is interpreted as the classification of data with missing features. We use the optimal separating hyperplane of the classification to determine the prediction values. A similar approach has been applied to the prediction of complete data [9]. In addition, our model is also used as an imputation method, that is, for estimating the missing data of the history series. Several works have been carried out on the imputation of missing data. In the following, these methods are separated into two groups. Compared with traditional imputation methods, our model can select different samples to calculate the absent data in the history series, which improves the accuracy of the imputed values.
The rest of this paper is organized as follows. Section 2 introduces the establishment of our model. The theory of max-margin classification of data with absent features is reviewed briefly in Section 3. The solution and algorithm of our model are discussed in Section 4. Section 5 follows with the experiments, in which the prediction and imputation performance of our model is tested in detail. Finally, conclusions are presented in Section 6.

Presentation of Our Model
We start by formalizing the problem of incomplete time series. Assume that a time series with missing data is given as $x_1, x_2, ?_3, \ldots, x_d, x_{d+1}, ?_{d+2}, \ldots, x_{q-1}, ?_q, ?_{q+1}, \ldots$, where "?" represents a missing value. The sample set $X = \{X_1, \ldots, X_d \mid X_{d+1}\}$ of the incomplete time series is built from windows of $d+1$ consecutive values, where $d$ denotes the embedding dimension. Prediction techniques usually establish regression models from $X$, with $X_{d+1}$ acting as the prediction target. In order to predict the value of $x_{k+d}$, $\{x_k, x_{k+1}, \ldots, x_{k+d-1}\}$ must be the input data of the model.
The implementation process of our model starts by dividing the sample set $X$ into two parts: the training set $X_t$ and the imputing set $X_m$. The prediction targets of the training samples in $X_t$ are existing values, while those of the imputing samples in $X_m$ are missing values. The training set is used to construct our prediction model. The role of the imputing set is to estimate the missing values.
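The construction of the sample set and its division can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `build_samples` and `split_samples` are hypothetical, and representing a missing value by NaN is an assumption made here for convenience.

```python
import numpy as np

def build_samples(series, d):
    """Stack windows of d inputs plus one target; NaN marks a missing value."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - d
    if n_samples <= 0:
        raise ValueError("series must be longer than the embedding dimension d")
    # Row k holds (x_k, ..., x_{k+d-1} | x_{k+d}).
    return np.array([series[k:k + d + 1] for k in range(n_samples)])

def split_samples(samples):
    """Split by whether the target (last column) is observed or missing."""
    target_missing = np.isnan(samples[:, -1])
    return samples[~target_missing], samples[target_missing]

series = [0.1, 0.2, np.nan, 0.4, 0.5, np.nan, 0.7]
X = build_samples(series, d=3)
X_t, X_m = split_samples(X)   # training set, imputing set
print(X_t.shape, X_m.shape)   # (3, 4) (1, 4)
```

Note that training samples may still contain missing inputs; only the target column decides which set a sample belongs to.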
We construct two classes of incomplete data, $C_1$ and $C_2$, from $X_t$; their definition involves a fitting error $\varepsilon$. The optimal separating hyperplane $H$ between $C_1$ and $C_2$ is obtained by the theory of max-margin classification of data with missing features. For a small $\varepsilon$, predicting samples must fall on the hyperplane determined by the training set; thus the prediction values can be calculated from $H$. This model can also be used to estimate the missing data of an incomplete time series. The imputing samples, taken from the sample set, also fall on the hyperplane. Therefore the missing values can be estimated in the same way as the prediction values. The implementation process of our model is shown in Figure 2.
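The explicit definition of the two classes is not reproduced in this text. A common construction in classification-based regression, which we assume here purely for illustration, shifts the target of every training sample up by $\varepsilon$ to form one class and down by $\varepsilon$ to form the other, so that the true samples lie between the two classes and close to the separating hyperplane. The function name `make_classes` is hypothetical.

```python
import numpy as np

def make_classes(train, eps):
    """Assumed construction: C1 = target + eps (label +1), C2 = target - eps (label -1).

    `train` has shape (n, d + 1); the last column is the observed target.
    Missing inputs (NaN) are left untouched.
    """
    train = np.asarray(train, dtype=float)
    upper = train.copy()
    upper[:, -1] += eps
    lower = train.copy()
    lower[:, -1] -= eps
    data = np.vstack([upper, lower])
    labels = np.concatenate([np.ones(len(upper)), -np.ones(len(lower))])
    return data, labels

train = np.array([[0.1, np.nan, 0.3],
                  [0.2, 0.3, 0.4]])
data, labels = make_classes(train, eps=0.05)
```

With this construction a smaller $\varepsilon$ places the two classes closer together, which is consistent with the statement that predicting samples fall on the hyperplane for a small $\varepsilon$.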

Mathematical Problems in Engineering
In the process of imputation, each missing value can be estimated from all the samples in $X$ that contain it, not just from the imputing sample in $X_m$. Assume that $x_i$ is absent; the number of different samples that can be used to compute the value of $x_i$ is determined by $N_i$, the frequency of $x_i$ in $X$.
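The frequency $N_i$ can be counted directly from the window structure. The sketch below assumes a series of length `q`, zero-based indices, and samples formed from windows of $d+1$ consecutive values; the function name `frequency_in_samples` is hypothetical, and the exact expression used in the paper for the number of usable samples is not reproduced here.

```python
def frequency_in_samples(i, q, d):
    """Number of length-(d + 1) windows of a length-q series that contain index i."""
    if not 0 <= i < q:
        raise IndexError("i must be a valid index of the series")
    n_samples = q - d
    if n_samples <= 0:
        raise ValueError("series must be longer than the embedding dimension d")
    first = max(0, i - d)            # earliest window that still covers i
    last = min(i, n_samples - 1)     # latest window that starts at or before i
    return last - first + 1

# For q = 7 and d = 3 there are four windows; interior points appear in more of them.
counts = [frequency_in_samples(i, q=7, d=3) for i in range(7)]
print(counts)  # [1, 2, 3, 4, 3, 2, 1]
```

Values near the ends of the series appear in fewer samples, so fewer independent estimates are available for them.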

Max-Margin Classification of Data with Missing Features
In the previous discussion, the problem of modeling incomplete time series is interpreted as the classification of data with missing features. In this section, we review the theory of max-margin classification of data with missing features proposed by Chechik [16]. Assume a set of samples $x_1, \ldots, x_n$ with missing features. Let $y_i \in \{-1, 1\}$ denote the binary class label of $x_i$. Each sample is characterized by a subset of features from a full set $F$.
The classification problem can be interpreted as finding an optimal hyperplane within the max-margin framework. In the case of incomplete data, the instance margin, which measures the margin of each instance in its own relevant subspace, is defined in terms of $w^{(i)}$, the vector obtained by taking the entries of $w$ that are relevant to $x_i$. Taking the geometric margin to be the minimum over all instance margins leads to a max-margin optimization problem in which the inner product $\langle \cdot, \cdot \rangle$ is taken only over features that are valid for both $x_i$ and $x_j$. Nonlinear classification is handled by using kernels. We thus obtain the optimal separating hyperplane for the classification of data with missing features, expressed as (3.5), where $b$ is set as in Support Vector Machines [17, 18].
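To make the notion of an instance margin concrete, the cited max-margin formulation for absent features is usually written as $\rho_i(w) = y_i \, w^{(i)} \cdot x_i / \lVert w^{(i)} \rVert$, with the geometric margin taken as $\min_i \rho_i(w)$. The sketch below evaluates these quantities for a given $w$; it does not solve the optimization problem. The NaN encoding of absent features and the function names are assumptions of this illustration.

```python
import numpy as np

def masked_dot(a, b):
    """Inner product over the features that are valid (non-NaN) in both vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    valid = ~np.isnan(a) & ~np.isnan(b)
    return float(np.dot(a[valid], b[valid]))

def instance_margin(w, x, y):
    """Margin of one instance, measured in the subspace of its valid features."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    w_i = w[valid]                      # entries of w relevant to this instance
    norm = np.linalg.norm(w_i)
    if norm == 0.0:
        raise ValueError("w has no weight on the valid features of x")
    return float(y * np.dot(w_i, x[valid]) / norm)

def geometric_margin(w, samples, labels):
    """Minimum instance margin over all samples."""
    return min(instance_margin(w, x, y) for x, y in zip(samples, labels))

w = np.array([3.0, 4.0, 0.0])
samples = np.array([[1.0, 1.0, np.nan], [-1.0, np.nan, 2.0]])
labels = [1, -1]
print(instance_margin(w, samples[0], labels[0]))  # 1.4
```

Because each margin is normalized by the norm of $w^{(i)}$ rather than of $w$, an instance is not penalized for features it does not have.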

Solution and Algorithm
In our model, the separating hyperplane for the classification of data with missing features is used to compute the estimation values. Both predicting samples and imputing samples satisfy (3.5). In this section we introduce the solution and algorithm of our model.

Analytical Solution
Suppose a test sample $x = \{x_1, x_2, \ldots, x_d, x_{d+1}\}$, where $x_{d+1}$ is the value to be estimated. In this paper we use the kernel function $K(x_i, x_j) = (\langle x_i, x_j \rangle_F + 1)^2$. Substituting this kernel into (3.5) gives (4.1), which simplifies to (4.2), where the product operator "·" is taken only over features that are valid for both $x_i$ and $x$. Equation (4.2) is a quadratic equation in $x_{d+1}$ and can be solved easily.
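The reduction to a quadratic can be sketched as follows. Since the explicit form of (3.5) is not reproduced in this text, we assume a decision function of the generic kernel form $\sum_i c_i K(x_i, x) + b = 0$, where the coefficients $c_i$ are hypothetical placeholders for whatever multipliers the trained classifier provides. Writing each masked inner product as $a_i + t_i z$, with $z = x_{d+1}$ and $t_i$ the target entry of $x_i$, gives $A z^2 + B z + C = 0$.

```python
import numpy as np

def solve_target(support, coef, b, inputs):
    """Real roots z of sum_i c_i * (<x_i, [inputs, z]>_F + 1)**2 + b = 0.

    support: (n, d + 1) array, NaN = absent feature; coef: (n,) assumed coefficients;
    inputs: (d,) known part of the test sample, NaN = absent feature.
    """
    support = np.asarray(support, dtype=float)
    coef = np.asarray(coef, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    feats, targ = support[:, :-1], support[:, -1]
    valid = ~np.isnan(feats) & ~np.isnan(inputs)
    # a_i: masked inner product over the first d features, plus the kernel offset 1.
    a = np.where(valid, feats * inputs, 0.0).sum(axis=1) + 1.0
    # t_i: target entry of x_i; an absent entry drops out of the product.
    t = np.where(np.isnan(targ), 0.0, targ)
    A = float(np.sum(coef * t ** 2))
    B = float(2.0 * np.sum(coef * t * a))
    C = float(np.sum(coef * a ** 2) + b)
    if abs(A) < 1e-12:                      # degenerate case: linear equation
        return [] if abs(B) < 1e-12 else [-C / B]
    disc = B * B - 4.0 * A * C
    if disc < 0.0:                          # no real solution exists
        return []
    r = disc ** 0.5
    return sorted([(-B - r) / (2.0 * A), (-B + r) / (2.0 * A)])

roots = solve_target([[1.0, 2.0]], [1.0], -9.0, [1.0])
print(roots)  # [-2.5, 0.5]
```

The quadratic generally has two roots, and it may have none; how a root is selected is not specified in this text, so the sketch simply returns all real roots.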

Numerical Solution
Sometimes the analytical solution is meaningless or nonexistent. In that case we need a numerical solution, obtained by iterative algorithms [19, 20]. Taking $x$ again as an example and supposing we use Newton's method, the objective function of our model is written as a function of $x_{d+1}$, and the iterative equation for $x_{d+1}$ is given by (4.4). The estimation values can thus be calculated by our model effectively. The numerical solution is more complicated but applicable in every case.
In conclusion, we have introduced the establishment and solution of our model. The key idea is first to identify, from the incomplete time series, a separating hyperplane for the classification of data with missing features. The hyperplane is then used to calculate the estimation values in predicting and imputing samples. Figure 3 provides the algorithms of our model for prediction and imputation.
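A generic Newton iteration for this step might look as follows. The paper's own objective function and update (4.4) are not reproduced in this text, so the sketch applies the standard update $z \leftarrow z - f(z)/f'(z)$ to an assumed decision function $f(z) = \sum_i c_i K(x_i, [\text{inputs}, z]) + b$ with hypothetical coefficients $c_i$, and it approximates the derivative by a central difference.

```python
import numpy as np

def masked_poly_kernel(u, v):
    """(<u, v>_F + 1)**2, with the inner product over features valid in both."""
    valid = ~np.isnan(u) & ~np.isnan(v)
    return (np.dot(u[valid], v[valid]) + 1.0) ** 2

def decision(support, coef, b, x):
    """Assumed decision function: sum_i c_i * K(x_i, x) + b."""
    return sum(c * masked_poly_kernel(s, x) for c, s in zip(coef, support)) + b

def newton_target(support, coef, b, inputs, z0=0.0, tol=1e-10, max_iter=50, h=1e-6):
    """Newton iteration for the last feature z such that decision(...) = 0."""
    support = np.asarray(support, dtype=float)
    inputs = np.asarray(inputs, dtype=float)
    z = float(z0)
    for _ in range(max_iter):
        f = decision(support, coef, b, np.append(inputs, z))
        f_plus = decision(support, coef, b, np.append(inputs, z + h))
        f_minus = decision(support, coef, b, np.append(inputs, z - h))
        slope = (f_plus - f_minus) / (2.0 * h)   # central-difference derivative
        if abs(slope) < 1e-14:                   # flat spot: stop rather than divide
            break
        step = f / slope
        z -= step
        if abs(step) < tol:
            break
    return z

z = newton_target([[1.0, 2.0]], [1.0], -9.0, [1.0], z0=0.0)
```

The starting point `z0` determines which root is reached; a natural choice, assumed here rather than taken from the paper, is the most recent observed value of the series.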

Experiments
To check the validity of our model, four experiments are conducted in this section. First, the prediction performance of our model is evaluated in test A. Given that conventional imputation methods usually perform differently when data are missing discretely and when they are missing continuously, we examine the imputation performance of our model in these two missing modes in test B and test C, respectively. The performance of our model is compared with that of RGGB and two classical imputation methods: Mean and KNN. Finally, in test D we verify the prediction performance on incomplete time series imputed by different models.
The time series used in the experiments are the Mackey-Glass time series and the Henon time series. The Mackey-Glass time series is generated by a chaotic delay differential equation.
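The generating equations and parameter values used in the experiments are not reproduced in this text. For readers who wish to generate comparable data, the sketch below uses the commonly cited forms: the Mackey-Glass equation $\dot{x}(t) = \beta x(t-\tau)/(1 + x(t-\tau)^{n}) - \gamma x(t)$ and the Henon map $x_{k+1} = 1 - a x_k^2 + y_k$, $y_{k+1} = b x_k$. All parameter values, the initial conditions, and the unit-step Euler discretization are assumptions of this illustration.

```python
import numpy as np

def mackey_glass(n, tau=17, beta=0.2, gamma=0.1, power=10, x0=1.2):
    """Euler discretization (unit step) of the Mackey-Glass delay equation."""
    x = np.full(n + tau, float(x0))          # constant history for t <= 0
    for t in range(tau, n + tau - 1):
        delayed = x[t - tau]
        x[t + 1] = x[t] + beta * delayed / (1.0 + delayed ** power) - gamma * x[t]
    return x[tau:]

def henon(n, a=1.4, b=0.3, x0=0.0, y0=0.0):
    """First coordinate of the Henon map."""
    xs = np.empty(n)
    x, y = float(x0), float(y0)
    for k in range(n):
        xs[k] = x
        x, y = 1.0 - a * x * x + y, b * x
    return xs

mg = mackey_glass(180)   # e.g. 115 points for modeling plus 65 for testing
hn = henon(180)
```

In practice an initial transient is often discarded before the series is used; that step is omitted here for brevity.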

Prediction of Incomplete Time Series
In this test, 115 consecutive data points of the Mackey-Glass time series, with missing levels from 3% to 18%, are used to construct the initial sample set, and the next 65 data points are used to test the prediction performance of our model. The prediction results are shown in Figure 4.
From Figure 4 we can see that, as the missing level increases, larger deviations in the prediction results inevitably occur because the number of training samples decreases. In practice, however, we can use an acceptable error limit as the basis for judgment. For example, set the acceptable error limit to $\mathrm{MAE}_l = 0.1$, which equals the minimum scale of the Mackey-Glass time series. By this criterion our model performs well even when the missing level reaches 17%. Figure 5 shows an example of the prediction performance of our model on the Mackey-Glass time series at missing levels of 3% and 17%. As Figure 5 shows, our model still predicts the future time series roughly when 17% of the history data are absent. Compared with the performance at a missing level of 3%, only some details are lost. Similar results are obtained for the Henon time series, as shown in Figures 6 and 7.

Imputation of Incomplete Time Series with Discrete Missing Data
In this test, 115 consecutive data points of the Mackey-Glass time series and of the Henon time series, with discrete missing data, are used as the experimental data. The imputation results of the different models are shown in Tables 1 and 2.
From Tables 1 and 2 we can see that the imputation performance of our model is similar to that of KNN and RGGB on the Mackey-Glass time series. On the Henon time series, however, our model outperforms the other three methods at every missing level. An example of the imputation performance of our model on the Henon time series at a missing level of 10% is shown in Figure 8.
Figure 8 shows that our model imputes most of the missing data in the Henon time series effectively. Compared with the other methods, the performance of our model is not sensitive to the fluctuation of the time series.

Imputation of Incomplete Time Series with Continuous Missing Data
We evaluate the performance of the different imputation methods on incomplete time series with continuous missing data in the same way. The maximum length of a run of continuous missing data is set to $l = 5$. The imputation results are shown in Tables 3 and 4.
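The two missing modes can be simulated along the following lines. How the missing positions were actually chosen in the experiments is not stated in this text, so the procedure below is an assumption: discrete missing positions are drawn uniformly at random, and continuous missing data are placed as non-touching runs of random length up to `max_run`. The function names are hypothetical.

```python
import numpy as np

def discrete_mask(n, level, rng):
    """Mark round(level * n) positions, drawn uniformly without replacement."""
    mask = np.zeros(n, dtype=bool)
    k = int(round(level * n))
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask

def continuous_mask(n, level, max_run, rng, max_attempts=10000):
    """Mark round(level * n) positions as runs of length at most max_run."""
    mask = np.zeros(n, dtype=bool)
    remaining = int(round(level * n))
    attempts = 0
    while remaining > 0 and attempts < max_attempts:
        attempts += 1
        run = min(int(rng.integers(1, max_run + 1)), remaining)
        start = int(rng.integers(0, n - run + 1))
        lo, hi = max(start - 1, 0), min(start + run + 1, n)
        if mask[lo:hi].any():        # would overlap or touch an existing run
            continue
        mask[start:start + run] = True
        remaining -= run
    return mask

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0.0, 10.0, 115))     # stand-in for a real series
gaps = continuous_mask(len(series), level=0.10, max_run=5, rng=rng)
incomplete = np.where(gaps, np.nan, series)
```

Rejecting runs that touch an existing run keeps every gap at or below the stated maximum length.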
Tables 3 and 4 indicate that our model outperforms the other three methods on both the Mackey-Glass and Henon time series when the data are missing continuously. Compared with Tables 1 and 2, no significant difference is observed between the two missing modes for our model, while the other methods perform better in the discrete mode. Figure 9 shows an example of the imputation performance of our model on the Mackey-Glass time series at a missing level of 10%.
There are three sets of continuous missing data in Figure 9. Our method imputes the first two effectively. Based on the above observations, we conclude that our method performs better than the other traditional imputation techniques.

Prediction after Imputation
In this test, we evaluate the prediction performance on the incomplete time series imputed by the different models in test B and test C. We again use the next 65 data points to test the prediction performance. From Figures 10 and 11 we can see that, for both the Mackey-Glass and Henon time series, the prediction performance on the series imputed by our model is superior to that on the series imputed by the other imputation algorithms.

Conclusions
Learning from and predicting incomplete data remain pervasive problems, although extensive studies have been conducted to improve the efficiency of data acquisition and transmission [21, 22]. We have proposed a new prediction model for incomplete time series. The experiments conducted in this paper confirm that our model can be successfully applied to the prediction of incomplete time series as long as the missing level stays below the level at which the acceptable error limit is exceeded. In addition, the imputation performance of our model is superior to that of the other imputation methods and is insensitive to the fluctuation of the time series. Future work may focus on applications of the model in relevant fields [23, 24] and to real-life problems.

Figure 1: Two distinct processes of incomplete time series prediction.

Figure 2: The implementation process of our model.

Figure 3: The algorithms of our model for prediction and imputation.

Figure 4: Prediction results of our model in Mackey-Glass time series.

Figure 6: Prediction results of our model in Henon time series.

Figure 7: Prediction performance of our model in Henon time series.

Figure 8: Imputation performance of our model in Henon time series with the missing level of 10%.

Table 1: Imputation results of Mackey-Glass time series with discrete missing data.

Table 2: Imputation results of Henon time series with discrete missing data.

Table 3: Imputation results of Mackey-Glass time series with continuous missing data.

Table 4: Imputation results of Henon time series with continuous missing data.