Mathematical Problems in Engineering, Volume 2010, Article ID 513810, Hindawi Publishing Corporation. doi:10.1155/2010/513810

Research Article

Incomplete Time Series Prediction Using Max-Margin Classification of Data with Absent Features

Zhaowei Shang¹, Lingfeng Zhang¹, Shangjun Ma², Bin Fang¹, Taiping Zhang¹, and Ming Li¹

¹ College of Computer Science, Chongqing University, Chongqing 400030, China
² School of Mechatronic Engineering, Northwestern Polytechnical University, Xi'an 710072, China

Copyright © 2010. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper discusses the prediction of time series with missing data. A novel forecast model is proposed, based on max-margin classification of data with absent features. The problem of modeling incomplete time series is treated as classification of data with absent features, and the optimal hyperplane of the classification is employed to predict future values. Compared with the traditional process of predicting from incomplete time series, our method solves the problem directly instead of filling in the missing data in advance. In addition, we introduce an imputation method to estimate the missing data in the history series. Experimental results validate the effectiveness of our model in both prediction and imputation.

1. Introduction

The subject of time series prediction has sparked considerable research activity, ranging from short-range-dependent series to long-range-dependent series [1–4], and from conventional time series to fractal time series [5, 6]. Traditional prediction technologies target complete time series, such as neural networks (NNs) [7], Support Vector Regression (SVR) [8, 9], and so forth. However, the time series we encounter in real life often contain missing data due to malfunctioning sensors, human factors, and other reasons. When dealing with the prediction of an incomplete time series, the traditional process consists of two steps. The first step is to recover the incomplete time series with an imputation model, and the second step is to estimate the predicting model as for a complete time series. This process is shown in Figure 1(a). It may require a large amount of computation and introduce deviations caused by inaccurate imputation. In this paper, we propose a novel predicting model built directly from the incomplete time series, as shown in Figure 1(b).

Two distinct processes of incomplete time series prediction.

Our process

The issue of modeling incomplete time series is interpreted in this paper as classification of data with missing features. We use the optimal hyperplane of the classification to determine the prediction values. A similar approach has been applied to the prediction of complete data [9]. In addition, our model can also be used as an imputation method, that is, to estimate the missing data of the history series. Several works have been carried out on the imputation of missing data; in the following, these methods are separated into two groups.

Statistical Methods. Examples include the maximum likelihood (ML) algorithm [10], the expectation-maximization (EM) algorithm [11], Multiple Imputation [12], and so forth. Based on statistical theory, the ML and EM algorithms need to know the distribution model of the data and often have high computational complexity. Multiple Imputation, which imputes the missing data M times, assumes that the missing data are Missing At Random (MAR).

Machine Learning Methods. Examples include Mean, K-Nearest Neighbor (KNN) [13], the Varied-Window Similarity Measure (VWSM) [14], Regional Gradient Guided Bootstrapping (RGGB) [15], and so forth. The KNN algorithm finds a plausible value for a missing datum by measuring distances. The VWSM algorithm fills the missing data using complete subsequences from similar cycles. The RGGB algorithm imputes the missing data by estimating the slopes of the boundary regions around the gap. According to the results of [15], RGGB outperforms other traditional imputation methods, such as MI and VWSM; however, it cannot be used to analyze datasets with high fluctuations.

Compared with traditional imputation methods, our model can select different samples to calculate each absent datum in the history series, which improves the imputing accuracy of the missing data.

The rest of this paper is organized as follows. Section 2 introduces the establishment of our model. The theory of max-margin classification of data with absent features is reviewed briefly in Section 3. The solution and algorithm of our model are discussed in Section 4. Section 5 follows with the experiments, in which the prediction and imputation performance of our model are tested in detail. Finally, conclusions are presented in Section 6.

2. Presentation of Our Model

We start by formalizing the problem of incomplete time series. Assume that a time series with missing data is given as \(\{x_1, x_2, ?_3, \ldots, x_d, x_{d+1}, ?_{d+2}, \ldots, x_{q-1}, ?_q, ?_{q+1}, \ldots, x_n\}\), where "?" represents a missing value. The sample set \(X = \{X_1, \ldots, X_d \mid X_{d+1}\}\) of the incomplete time series can be formulated as
\[
X = \left(\begin{array}{cccc|c}
x_1 & x_2 & \cdots & x_d & x_{d+1} \\
x_2 & ?_3 & \cdots & x_{d+1} & ?_{d+2} \\
?_3 & x_4 & \cdots & ?_{d+2} & x_{d+3} \\
\vdots & \vdots & & \vdots & \vdots \\
x_{n-d} & x_{n-d+1} & \cdots & x_{n-1} & x_n
\end{array}\right),
\tag{2.1}
\]
where \(d\) denotes the embedding dimension.
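The construction of the sample set can be sketched in Python, with NaN standing in for the "?" entries (the helper name `build_samples` and the toy series are illustrative, not from the paper):

```python
# Sketch: build the sample matrix X from a series with missing entries.
# NaN stands in for the "?" values; build_samples and the toy series
# are illustrative names/data.
import numpy as np

def build_samples(series, d):
    """Row k is (x_k, ..., x_{k+d-1} | x_{k+d}); the last column is the target."""
    n = len(series)
    return np.array([series[k:k + d + 1] for k in range(n - d)])

series = np.array([1.0, 2.0, np.nan, 4.0, 5.0, np.nan, 7.0])
X = build_samples(series, d=2)
print(X.shape)  # (5, 3): n - d rows of d inputs plus one target column
```

Each missing value then appears in several rows of X, which is what later allows it to be estimated from more than one sample.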

Predicting technologies usually establish regression models on X, where \(X_{d+1}\) acts as the predicting target. In order to predict the value of \(x_{k+d}\), \(\{x_k, x_{k+1}, \ldots, x_{k+d-1}\}\) must be the input data of the model.

The implementation process of our model starts by dividing the sample set X into two parts: the training set \(X_t\) and the imputing set \(X_m\). The predicting targets of the training samples in \(X_t\) are existing values, while those of the imputing samples in \(X_m\) are missing values. The training set is used to construct our predicting model; the role of the imputing set is to estimate the missing values.

We construct two classes of incomplete data \(C_1\) and \(C_2\) from \(X_t\), which can be expressed as
\[
C_1 = \left\{X_t^{1}, \ldots, X_t^{d},\; X_t^{d+1} + \varepsilon\right\},\qquad
C_2 = \left\{X_t^{1}, \ldots, X_t^{d},\; X_t^{d+1} - \varepsilon\right\},
\tag{2.2}
\]
where \(\varepsilon\) is the fitting error. The optimal hyperplane \(H\) separating \(C_1\) and \(C_2\) is obtained by the theory of max-margin classification of data with missing features. For a small \(\varepsilon\), predicting samples must fall on the hyperplane determined by the training set; thus the prediction values can be calculated from \(H\).
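A minimal sketch of this class construction, assuming the training samples are rows whose last column is the predicting target (`make_classes` and `eps` are illustrative names):

```python
# A minimal sketch of building the two classes: each training sample is
# duplicated with its target column shifted by +eps (label +1) and by
# -eps (label -1). make_classes and eps are illustrative names.
import numpy as np

def make_classes(X_train, eps=0.001):
    C1, C2 = X_train.copy(), X_train.copy()
    C1[:, -1] += eps                       # targets shifted up   -> class +1
    C2[:, -1] -= eps                       # targets shifted down -> class -1
    X = np.vstack([C1, C2])
    y = np.concatenate([np.ones(len(C1)), -np.ones(len(C2))])
    return X, y

X_aug, y = make_classes(np.array([[1.0, 2.0], [3.0, 4.0]]), eps=0.5)
print(X_aug[:, -1])  # [2.5 4.5 1.5 3.5]
```

The max-margin classifier trained on (X_aug, y) then yields the separating hyperplane used for prediction.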

This model can also be used to estimate the missing data of the incomplete time series. The imputing samples, taken from the sample set, also fall on the hyperplane; therefore the missing values can be estimated in the same way as the prediction values. The implementation process of our model is shown in Figure 2.

The implementation process of our model.

In the process of imputation, each missing datum can be estimated from all the samples in \(X\) that contain it, not just from the imputing sample in \(X_m\). Assume that \(x_i\) is absent; the number of different samples that can be used to compute the value of \(x_i\) is
\[
N_i =
\begin{cases}
i, & 1 \le i \le d,\\[2pt]
d+1, & d+1 \le i \le n-d,\\[2pt]
n-i+1, & n-d+1 \le i \le n,
\end{cases}
\tag{2.3}
\]
where \(N_i\) equals the frequency of \(x_i\) in \(X\).
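The piecewise count above can be checked with a small sketch (the helper name is hypothetical; indices are 1-based as in the text):

```python
# Sketch of the sample count N_i above: how many length-(d+1) windows
# of the series contain x_i. Hypothetical helper; indices are 1-based.
def num_samples_containing(i, n, d):
    if 1 <= i <= d:
        return i
    if d + 1 <= i <= n - d:
        return d + 1
    return n - i + 1        # case n - d + 1 <= i <= n

# With n = 10 points and embedding dimension d = 3:
print([num_samples_containing(i, 10, 3) for i in range(1, 11)])
# [1, 2, 3, 4, 4, 4, 4, 3, 2, 1]
```

Points near either end of the series fall into fewer windows, so they admit fewer independent estimates.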

3. Max-Margin Classification of Data with Missing Features

In the previous discussion, the issue of modeling incomplete time series is interpreted as classification of data with missing features. In this section, we review the theory of max-margin classification of data with missing features proposed by Chechik .

Assume a set of samples \(x_1, \ldots, x_n\) with missing features, where \(y_i \in \{-1, 1\}\) denotes the binary class label of \(x_i\). Each sample is characterized by a subset of features from a full set \(\mathbb{F}\).

The classification problem can be interpreted as finding an optimal hyperplane within the max-margin framework. For classification of incomplete data, the instance margin, which treats the margin of each instance in its own relevant subspace, is defined as
\[
\rho_i(\mathbf{w}) = \frac{y_i\left(\mathbf{w}^{(i)}\cdot\mathbf{x}_i + b\right)}{\left\|\mathbf{w}^{(i)}\right\|},
\tag{3.1}
\]
where \(\mathbf{w}^{(i)}\) is the vector obtained by taking the entries of \(\mathbf{w}\) that correspond to the valid features of \(\mathbf{x}_i\). Taking the geometric margin to be the minimum over all instance margins leads to the optimization problem
\[
\max_{\mathbf{w}}\;\min_i\;\frac{y_i\left(\mathbf{w}^{(i)}\cdot\mathbf{x}_i + b\right)}{\left\|\mathbf{w}^{(i)}\right\|}.
\tag{3.2}
\]
Defining the scaling coefficients \(s_i = \|\mathbf{w}^{(i)}\|/\|\mathbf{w}\|\), (3.2) can be rewritten as
\[
\max_{\mathbf{w}}\;\min_i\;\frac{y_i\left(\mathbf{w}^{(i)}\cdot\mathbf{x}_i + b\right)}{\left\|\mathbf{w}^{(i)}\right\|}
= \max_{\mathbf{w}}\;\frac{1}{\|\mathbf{w}\|}\;\min_i\;\frac{y_i\left(\mathbf{w}^{(i)}\cdot\mathbf{x}_i + b\right)}{s_i},
\qquad s_i = \frac{\|\mathbf{w}^{(i)}\|}{\|\mathbf{w}\|}.
\tag{3.3}
\]
For a given set of \(s_i\), we can solve the constrained optimization problem
\[
\max_{\boldsymbol{\alpha}\in\mathbb{R}^n}\;\sum_{i=1}^{n}\alpha_i
- \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i y_i s_i\,\langle\mathbf{x}_i,\mathbf{x}_j\rangle\,y_j s_j \alpha_j
\quad\text{s.t.}\quad 0\le\alpha_i\le C,\; i=1,\ldots,n,\qquad \sum_{i=1}^{n}\alpha_i y_i s_i = 0,
\tag{3.4}
\]
where the inner product \(\langle\cdot,\cdot\rangle\) is taken only over features that are valid for both \(\mathbf{x}_i\) and \(\mathbf{x}_j\). Nonlinear classification is handled by using kernels. We thus obtain the optimal separating hyperplane for classification of data with missing features, expressed as
\[
\sum_{i=1}^{n}\alpha_i y_i s_i K(\mathbf{x},\mathbf{x}_i) + b = 0,
\tag{3.5}
\]
where \(b\) is set as in Support Vector Machines [17, 18].
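The inner product restricted to jointly valid features, used in the dual problem above, can be sketched as follows, with NaN marking an absent feature (`shared_dot` is an illustrative name):

```python
# Sketch: the inner product <x_i, x_j> taken only over features valid in
# both samples, as used in the dual problem above; NaN marks an absent
# feature. shared_dot is an illustrative name.
import numpy as np

def shared_dot(xi, xj):
    valid = ~np.isnan(xi) & ~np.isnan(xj)   # features present in both samples
    return float(np.dot(xi[valid], xj[valid]))

xi = np.array([1.0, np.nan, 3.0])
xj = np.array([2.0, 5.0, np.nan])
print(shared_dot(xi, xj))  # 2.0 -- only the first feature is valid in both
```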

4. Solution and Algorithm

In our model, the hyperplane of classification of data with missing features is used to compute the estimation values. Both predicting samples and imputing samples satisfy (3.5). In this section we introduce the solution and algorithm of our model.

4.1. Analytical Solution

Suppose a test sample \(x = \{x_1, x_2, \ldots, x_d, x_{d+1}\}\), where \(x_{d+1}\) is the value to be estimated. In this paper we use the kernel function \(K(\mathbf{x}_i, \mathbf{x}_j) = \left(\langle\mathbf{x}_i, \mathbf{x}_j\rangle_{\mathbb{F}} + 1\right)^2\). Substituting this kernel into (3.5), we obtain
\[
\sum_{i=1}^{n}\alpha_i y_i s_i\left(\langle\mathbf{x},\mathbf{x}_i\rangle_{\mathbb{F}} + 1\right)^2 + b = 0.
\tag{4.1}
\]

Equation (4.1) simplifies to
\[
\sum_{i=1}^{n}\alpha_i y_i s_i\,x_{i,d+1}^2\cdot x_{d+1}^2
+ 2\sum_{i=1}^{n}\alpha_i y_i s_i\Bigl(\sum_{j=1}^{d}x_{i,j}\cdot x_j + 1\Bigr)x_{i,d+1}\cdot x_{d+1}
+ \sum_{i=1}^{n}\alpha_i y_i s_i\Bigl(\sum_{j=1}^{d}x_{i,j}\cdot x_j + 1\Bigr)^2 + b = 0,
\tag{4.2}
\]
where the product "\(\cdot\)" is taken only over features that are valid for both \(\mathbf{x}_i\) and \(\mathbf{x}\). Equation (4.2) is a quadratic equation in \(x_{d+1}\) and can be solved easily.
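Assuming the multipliers, labels, scalings, and offset come from a trained classifier, the quadratic in x_{d+1} can be solved with `np.roots`; this is a sketch with illustrative names and toy values:

```python
# Sketch: solve the quadratic (4.2) in x_{d+1} via np.roots.
# alpha, y, s, b, and X_train would come from the trained classifier;
# all names here are illustrative.
import numpy as np

def predict_target(alpha, y, s, b, X_train, x_known):
    # g_i = <x_i, x>_F + 1 over the d known features (missing terms dropped)
    g = np.array([np.nansum(xi[:-1] * x_known) + 1.0 for xi in X_train])
    c = alpha * y * s
    A = np.sum(c * X_train[:, -1] ** 2)        # coefficient of x_{d+1}^2
    B = 2.0 * np.sum(c * g * X_train[:, -1])   # coefficient of x_{d+1}
    C = np.sum(c * g ** 2) + b                 # constant term
    return np.roots([A, B, C])                 # the two candidate roots

# Toy values: the equation reduces to 3x^2 + 2x - 5 = 0, roots 1 and -5/3.
roots = predict_target(np.array([1.0, 1.0]), np.array([1.0, -1.0]),
                       np.array([1.0, 1.0]), 0.0,
                       np.array([[1.0, 2.0], [2.0, 1.0]]), np.array([1.0]))
```

Of the two roots, the one consistent with the local behavior of the series would be kept as the estimate.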

4.2. Numerical Solution

Sometimes an analytical solution is meaningless or nonexistent, and a numerical solution must be obtained with iterative algorithms [19, 20]. Taking \(x\) as an example again and supposing we use Newton's method, the objective function of our model can be represented as
\[
F(x_{d+1}) = \sum_{i=1}^{n}\alpha_i y_i s_i K(\mathbf{x},\mathbf{x}_i) + b.
\tag{4.3}
\]

The iterative equation for \(x_{d+1}\) can be expressed as
\[
x_{d+1}^{(i+1)} = x_{d+1}^{(i)} - \frac{F\bigl(x_{d+1}^{(i)}\bigr)}{F'\bigl(x_{d+1}^{(i)}\bigr)}.
\tag{4.4}
\]
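A minimal sketch of this Newton iteration, using a central-difference approximation for F′ (all names are illustrative):

```python
# Sketch of the Newton iteration above, with a central-difference
# approximation of F'. All names are illustrative.
def newton(F, x0, tol=1e-8, max_iter=100, h=1e-6):
    x = x0
    for _ in range(max_iter):
        dF = (F(x + h) - F(x - h)) / (2 * h)   # numerical F'(x)
        step = F(x) / dF
        x -= step
        if abs(step) < tol:
            return x
    return x

# e.g. the root of F(x) = x^2 - 2 starting near 1:
print(round(newton(lambda x: x * x - 2, 1.0), 6))  # 1.414214
```

In practice F would be the hyperplane function of x_{d+1}, with a starting point taken from neighboring values of the series.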

Therefore, the estimation values are calculated effectively by our model. The numerical solution is more complicated, but applicable in every case.

In conclusion, we have introduced the establishment and solution of our model. The key idea is to first identify a hyperplane of classification of data with missing features by incomplete time series. Then, the hyperplane is used to calculate the estimation values in predicting and imputing samples. Figure 3 provides the algorithms of our model for prediction and imputation.

The algorithms of our model for prediction and imputation.

Incomplete time series prediction algorithm

Incomplete time series imputation algorithm

5. Experiments

To check the validity of our model, four experiments are conducted in this section. First, the prediction performance of our model is evaluated in test A. Given that conventional imputation methods usually perform differently when incomplete time series are missing data discretely and continuously, we examine the imputation performance of our model in these two missing modes in tests B and C, respectively. The performance of our model is compared with that of RGGB and two other classical imputation methods: Mean and KNN. Finally, we verify the prediction performance on incomplete time series imputed by the different models in test D.

The time series used in the experiments are the Mackey-Glass time series and the Henon time series. The Mackey-Glass time series is generated by the chaotic delay-differential equation
\[
\frac{dx}{dt} = -b\,x(t) + \frac{a\,x(t-\tau)}{1 + x^{10}(t-\tau)},
\tag{5.1}
\]
where \(\tau = 17\), \(a = 0.2\), and \(b = 0.1\); the sampling interval of \(t\) is 5. The Henon time series is generated by the nonlinear equation
\[
x(n+1) = 1 - 1.4\,x(n)^2 + 0.3\,x(n-1).
\tag{5.2}
\]
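Both benchmark series can be generated as follows; the Euler step size, initial values, and constant warm-up history for the Mackey-Glass delay equation are assumptions for illustration:

```python
# Sketch: generating the two benchmark series. The Euler step size dt,
# initial values, and constant delay history are illustrative assumptions.
import numpy as np

def mackey_glass(n, tau=17, a=0.2, b=0.1, dt=0.1, x0=1.2):
    steps = int(tau / dt)                 # delay expressed in Euler steps
    x = [x0] * (steps + 1)                # constant history over [-tau, 0]
    while len(x) < n * 50 + steps:        # sampling interval 5 = 50 steps
        xt, xlag = x[-1], x[-steps - 1]
        x.append(xt + dt * (-b * xt + a * xlag / (1 + xlag ** 10)))
    return np.array(x[steps::50])[:n]

def henon(n, x0=0.1, x1=0.1):
    x = [x0, x1]                          # x(n+1) = 1 - 1.4 x(n)^2 + 0.3 x(n-1)
    for _ in range(n - 2):
        x.append(1 - 1.4 * x[-1] ** 2 + 0.3 * x[-2])
    return np.array(x)

series = henon(180)                       # 115 for the sample set + 65 for testing
```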

By contrast, the Henon time series has higher volatility. The embedding dimension of the sample set is d = 15. The parameters of our model are ε = 0.001 and C = 100. The value of K in KNN is set to 5.

MSE (Mean Squared Error) and MAE (Mean Absolute Error) are used to evaluate the performance in the experiments. All results are obtained by repeating the algorithms 10 times.
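The two error measures can be computed over the estimated positions as (a minimal sketch):

```python
# The two error measures used in the experiments (a minimal sketch).
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

print(mse([1.0, 2.0], [1.5, 2.5]), mae([1.0, 2.0], [1.5, 2.5]))  # 0.25 0.5
```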

5.1. Prediction of Incomplete Time Series

In this test, 115 consecutive data points of the Mackey-Glass time series, with missing levels from 3% to 18%, are used to construct the initial sample set, and the next 65 points are used to test the prediction performance of our model. The prediction results are shown in Figure 4.

Prediction results of our model in Mackey-Glass time series.

From Figure 4 we can see that, as the missing level increases, larger deviations in the prediction results inevitably occur because the number of training samples decreases. In practice, however, an acceptable error limit can be used as the basis for judgment. For example, set the acceptable error limit MAE_l = 0.1, which equals the minimum scale of the Mackey-Glass time series. By this criterion, our model performs well even when the missing level reaches 17%. Figure 5 shows examples of the prediction performance of our model on the Mackey-Glass time series at missing levels of 3% and 17%. From Figure 5, our model roughly predicts the future time series even when 17% of the history data are absent; compared with the 3% missing level, only some details are lost. Similar results are obtained on the Henon time series, as shown in Figures 6 and 7.

Prediction performance of our model in Mackey-Glass time series.

Prediction results of our model in Henon time series.

Prediction performance of our model in Henon time series.

5.2. Imputation of Incomplete Time Series with Discrete Missing Data

The 115 consecutive data points of the Mackey-Glass and Henon time series with discrete missing data are used as the experimental data in this test. The imputation results of the different models are shown in Tables 1 and 2.

Imputation results of Mackey-Glass time series with discrete missing data.

Missing level | Our (MSE / MAE) | Mean (MSE / MAE) | KNN (MSE / MAE) | RGGB (MSE / MAE)
4–6%   | 0.0716 / 0.0626 | 0.0628 / 0.0545 | 0.0728 / 0.0602 | 0.0692 / 0.0594
7–9%   | 0.0783 / 0.0744 | 0.0809 / 0.0657 | 0.0823 / 0.0612 | 0.0840 / 0.0771
10–12% | 0.0912 / 0.0865 | 0.0969 / 0.0833 | 0.0795 / 0.0669 | 0.0979 / 0.0857
13–15% | 0.1064 / 0.0933 | 0.1010 / 0.0836 | 0.1086 / 0.1050 | 0.1242 / 0.1098

Imputation results of Henon time series with discrete missing data.

Missing level | Our (MSE / MAE) | Mean (MSE / MAE) | KNN (MSE / MAE) | RGGB (MSE / MAE)
4–6%   | 0.2385 / 0.1999 | 0.8352 / 0.6791 | 0.4624 / 0.4602 | 0.6683 / 0.5592
7–9%   | 0.4223 / 0.3120 | 0.9954 / 0.8148 | 0.6707 / 0.5534 | 0.7888 / 0.6156
10–12% | 0.6461 / 0.4422 | 1.3951 / 1.2638 | 0.6746 / 0.6193 | 0.8321 / 0.6378
13–15% | 0.6787 / 0.5261 | 1.4655 / 1.3114 | 0.7868 / 0.6467 | 1.2035 / 0.7639

From Tables 1 and 2 we can see that the imputation performance of our model is similar to that of KNN and RGGB on the Mackey-Glass time series. On the Henon time series, however, our model outperforms the other three methods at every missing level. An example of the imputation performance of our model on the Henon time series at a missing level of 10% is shown in Figure 8.

Imputation performance of our model in Henon time series with the missing level of 10%.

Figure 8 shows that our model effectively imputes most of the missing data in the Henon time series. Compared with the other methods, its performance is not sensitive to the fluctuation of the time series.

5.3. Imputation of Incomplete Time Series with Continuous Missing Data

We evaluate the performance of the different imputation methods on incomplete time series with continuous missing data in the same way, setting the maximum length of continuous missing data to l = 5. The imputation results are shown in Tables 3 and 4.

Imputation results of Mackey-Glass time series with continuous missing data.

Missing level | Our (MSE / MAE) | Mean (MSE / MAE) | KNN (MSE / MAE) | RGGB (MSE / MAE)
4–6%   | 0.0741 / 0.0604 | 0.1044 / 0.0941 | 0.0846 / 0.0676 | 0.1296 / 0.1013
7–9%   | 0.0831 / 0.0691 | 0.1718 / 0.1374 | 0.1299 / 0.1081 | 0.1531 / 0.1320
10–12% | 0.0832 / 0.0748 | 0.1791 / 0.1508 | 0.1091 / 0.0853 | 0.2020 / 0.1553
13–15% | 0.1082 / 0.0814 | 0.1951 / 0.1526 | 0.1347 / 0.1119 | 0.2220 / 0.1843

Imputation results of Henon time series with continuous missing data.

Missing level | Our (MSE / MAE) | Mean (MSE / MAE) | KNN (MSE / MAE) | RGGB (MSE / MAE)
4–6%   | 0.2409 / 0.1966 | 0.9633 / 0.8316 | 0.5449 / 0.4247 | 0.7294 / 0.5216
7–9%   | 0.3945 / 0.3391 | 1.0269 / 0.8957 | 0.5450 / 0.5022 | 0.8275 / 0.5777
10–12% | 0.5472 / 0.4904 | 1.1343 / 0.9802 | 0.6777 / 0.5297 | 1.1462 / 0.7090
13–15% | 0.8340 / 0.6190 | 1.3937 / 1.3101 | 0.7895 / 0.6711 | 1.2137 / 0.7657

Tables 3 and 4 indicate that our model outperforms the other three methods on both the Mackey-Glass and Henon time series when the data are missing continuously. Compared with Tables 1 and 2, no significant difference is observed between the two missing modes for our model, while the other methods perform better in the former mode. Figure 9 shows an example of the imputation performance of our model on the Mackey-Glass time series at a missing level of 10%.

Imputation performance of our model in Mackey-Glass time series with the missing level of 10%.

There are three runs of continuous missing data in Figure 9, and our method imputes the first two effectively. Based on the above observations, we conclude that our method performs better than the other traditional imputation technologies.

5.4. Prediction after Imputation

This test evaluates the prediction performance on the incomplete time series imputed by the different models in tests B and C. We again use the next 65 data points to test prediction performance. The error-tolerant BP algorithm is used to build the predicting model. The prediction results are shown in Figures 10 and 11.

Prediction results in Mackey-Glass time series imputed by different methods.

Prediction results in Henon time series imputed by different methods.

From Figures 10 and 11 we can see that the prediction performance on the Mackey-Glass and Henon time series imputed by our model is superior to that obtained with the other imputation algorithms.

6. Conclusions

Learning and prediction with incomplete data are still pervasive problems, although extensive studies have been conducted to improve the efficiency of data acquisition and transmission [21, 22]. We have proposed a new prediction model for incomplete time series. The experiments conducted in this paper confirm that our model can be successfully applied to the prediction of incomplete time series with a missing level below the acceptable error limit. In addition, the imputation performance of our model is superior to that of other imputation methods and is insensitive to the fluctuation of the time series. Future work may focus on applications of the model in relevant fields [23, 24] and real-life problems.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under the project Grants nos. 60573125, 90820306, and 60873264. The authors would like to thank the anonymous reviewers in MPE for helpful suggestions and corrections.

References

[1] R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications, Springer Texts in Statistics, Springer-Verlag, New York, NY, USA, 2000.
[2] M. Li and J.-Y. Li, "On the predictability of long-range dependent series," Mathematical Problems in Engineering, vol. 2010, Article ID 397454, 2010.
[3] E. G. Bakhoum and C. Toma, "Dynamical aspects of macroscopic and quantum transitions due to coherence function and time series events," Mathematical Problems in Engineering, vol. 2010, Article ID 428903, 2010.
[4] M. Li and W. Zhao, "Variance bound of ACF estimation of one block of fGn with LRD," Mathematical Problems in Engineering, vol. 2010, Article ID 560429, 2010.
[5] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control, 3rd edition, Prentice Hall, Englewood Cliffs, NJ, USA, 1994.
[6] M. Li, "Fractal time series—a tutorial review," Mathematical Problems in Engineering, vol. 2010, Article ID 157264, 2010.
[7] V. R. Vemuri and R. D. Rogers, Artificial Neural Networks: Forecasting Time Series, IEEE Computer Society Press, Los Alamitos, Calif, USA, 1993.
[8] L. J. Cao and F. E. H. Tay, "Support vector machine with adaptive parameters in financial time series forecasting," IEEE Transactions on Neural Networks, vol. 14, no. 6, pp. 1506–1518, 2003.
[9] Y. Ning, L. Zuopeng, D. Yisheng, and W. Huoli, "SVM nonlinear regression algorithm," Computer Engineering, vol. 31, no. 10, pp. 19–21, 2005.
[10] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.
[11] Z. Ghahramani and M. I. Jordan, "Supervised learning from incomplete data via an EM approach," in Advances in Neural Information Processing Systems 6, pp. 120–127, Morgan Kaufmann, San Francisco, Calif, USA, 1994.
[12] D. B. Rubin, "Multiple imputation after 18+ years," Journal of the American Statistical Association, vol. 91, no. 434, pp. 473–489, 1996.
[13] I. Wasito and B. Mirkin, "Nearest neighbour approach in the least-squares data imputation algorithms," Information Sciences, vol. 169, no. 1-2, pp. 1–25, 2005.
[14] S. Chiewchanwattana, C. Lursinsap, and C.-H. H. Chu, "Imputing incomplete time-series data based on varied-window similarity measure of data sequences," Pattern Recognition Letters, vol. 28, no. 9, pp. 1091–1103, 2007.
[15] S. Prasomphan, C. Lursinsap, and S. Chiewchanwattana, "Imputing time series data by regional-gradient-guided bootstrapping algorithm," in Proceedings of the 9th International Symposium on Communications and Information Technology (ISCIT '09), pp. 163–168, Incheon, South Korea, September 2009.
[16] G. Chechik, G. Heitz, G. Elidan, P. Abbeel, and D. Koller, "Max-margin classification of data with absent features," Journal of Machine Learning Research, vol. 9, pp. 1–21, 2008.
[17] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, NY, USA, 1995.
[18] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, Mass, USA, 2002.
[19] W.-S. Chen, B. Pan, B. Fang, M. Li, and J. Tang, "Incremental nonnegative matrix factorization for face recognition," Mathematical Problems in Engineering, vol. 2008, Article ID 410674, 2008.
[20] L. Q. Qi and J. Sun, "A nonsmooth version of Newton's method," Mathematical Programming, vol. 58, no. 1–3, pp. 353–367, 1993.
[21] S. Y. Chen, Y. F. Li, and J. Zhang, "Vision processing for realtime 3-D data acquisition based on coded structured light," IEEE Transactions on Image Processing, vol. 17, no. 2, pp. 167–176, 2008.
[22] J. A. C. Bingham, "Multicarrier modulation for data transmission: an idea whose time has come," IEEE Communications Magazine, vol. 28, no. 5, pp. 5–14, 1990.
[23] M. Li and W. Zhao, "Representation of a stochastic traffic bound," IEEE Transactions on Parallel and Distributed Systems, in press.
[24] G. Mattioli, M. Scalia, and C. Cattani, "Analysis of large amplitude pulses in short time intervals: application to neuron interactions," Mathematical Problems in Engineering, in press.