Detection of Outliers and Patches in Bilinear Time Series Models

We propose a Gibbs sampling algorithm to detect additive outliers and patches of outliers in bilinear time series models from a Bayesian viewpoint. We first derive the conditional posterior distributions of the parameters, and then use the results of a first Gibbs run to start a second, adaptive Gibbs sampling. It is shown that our procedure can reduce possible masking and swamping effects. Finally, Monte Carlo simulations are performed to demonstrate the efficacy of the detection and estimation procedures.


Introduction
Dynamic systems or engineering time series observations [1] are often perturbed by interrupting phenomena that generate aberrant data. These may be due to unusual events, such as sudden disturbances, noise, or even recording errors. Such values are usually referred to as outliers, which may be isolated or occur in patches. Because outliers and patches in a time series can have adverse effects on data analysis, possibly making the resulting inference unreliable or even invalid, it is important to detect and remove their effects. Several authors have considered the detection problem in linear and nonlinear models. Abraham and Box [2] analyzed outlier problems in time series based on Bayesian methods. Tsay [3] considered time series model specification in the presence of outliers. Ljung [4] presented outlier detection in time series. Justel et al. [5] studied detection of outlier patches in autoregressive time series. Battaglia and Orfei [6] considered outlier detection and estimation in nonlinear time series. Chen et al. [7] studied a similar problem in ARMAX time series models. P. Chen and Y. Chen [8] gave an identification of outliers in ARMAX models via a genetic algorithm.

Mathematical Problems in Engineering
Based on different prior distributions, a new Gibbs sampling algorithm is proposed for identifying additive isolated outliers and patches of outliers in bilinear time series. This paper considers the detection of outliers and patches in bilinear time series models, which are one of the fractal time series models [9]. Wang and Wei [10] systematically analyzed the probabilistic structure of a rather general class of bilinear models. Wang [11] considered parameter estimation and subset selection for separable lower triangular bilinear models. In particular, Chen [12] proposed a procedure for detecting additive outliers in bilinear time series, which also discusses major problems encountered in practice, such as how one can distinguish between an ARMA model with outliers and a bilinear model without outliers. Special cases of bilinear models, such as the diagonal bilinear model and the vector bilinear model, have been studied by many authors, covering both probabilistic structure and statistical inference; see, for instance, Subba Rao [13], Subba Rao and Gabr [14], Kim et al. [15], and Sesay and Subba Rao [16], among others.
The paper is organized as follows. Section 2 gives the outlier models for bilinear series. Section 3 presents a method to detect outliers in the bilinear model via standard Gibbs sampling. In Section 4, we propose an adaptive Gibbs sampling method to detect patches of outliers in the bilinear model. Section 5 gives simulated examples and some conclusions.

Outlier Models of Bilinear Time Series
The time series {y_t} is called a bilinear process if it satisfies the model

y_t = Σ_{i=1}^{p} φ_i y_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + Σ_{i=1}^{r} Σ_{j=1}^{k} η_{ij} y_{t−i} ε_{t−j} + ε_t,

where ε_t ~ IID N(0, σ²), and φ_i, θ_j, and η_{ij} are unknown parameters. This model is called a bilinear model of order (p, q, r, k), written BL(p, q, r, k). Bilinear time series models were introduced by Granger and Andersen. When additive outliers are present, the observed series is z_t = y_t + δ_t β_t, where δ_t ∈ {0, 1} is the outlier indicator and β_t is the outlier magnitude.
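To make the model concrete, the following sketch simulates a BL(p, q, r, k) process directly from the recursion above. The function name and the coefficient values are illustrative choices for this note, not taken from the paper; the burn-in mirrors the paper's practice of discarding early draws to approach stationarity.

```python
import numpy as np

def simulate_bilinear(n, phi, theta, eta, sigma=1.0, burn=500, seed=0):
    """Simulate a BL(p, q, r, k) bilinear series
    y_t = sum_i phi_i y_{t-i} + sum_j theta_j e_{t-j}
          + sum_{i,j} eta_{ij} y_{t-i} e_{t-j} + e_t,
    with e_t ~ N(0, sigma^2). The first `burn` draws are discarded."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    r, k = eta.shape
    m = max(p, q, r, k)          # longest lag needed by the recursion
    total = n + burn
    y = np.zeros(total + m)
    e = rng.normal(0.0, sigma, total + m)
    for t in range(m, total + m):
        ar = sum(phi[i] * y[t - i - 1] for i in range(p))
        ma = sum(theta[j] * e[t - j - 1] for j in range(q))
        bl = sum(eta[i, j] * y[t - i - 1] * e[t - j - 1]
                 for i in range(r) for j in range(k))
        y[t] = ar + ma + bl + e[t]
    return y[-n:]

# Example: a BL(2,2,1,1) model with small (illustrative) coefficients
y = simulate_bilinear(100, phi=[0.4, -0.2], theta=[0.3, 0.1],
                      eta=np.array([[0.2]]), sigma=1.0)
```

With coefficients this small the recursion stays stable; large η values can make the bilinear term explode, which is why the paper's simulation discards a long initial segment.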

Outlier Detection via Standard Gibbs Sampling
In order to detect outliers in the bilinear model, it is essential to derive the conditional posterior distributions of the parameters Θ, β, δ, α, and σ², together with the conditional probabilities P(δ_j = 1) and the conditional distributions of the β_j.
For computational reasons, we impose the following conditions:
(1) Conjugate prior distributions are used for Θ and σ², which are distributed as a multidimensional uniform distribution on the (0, 1) region and an inverted-Gamma distribution IG(ν/2, νλ/2), respectively.
(2) The outlier indicator δ_t and the outlier magnitude β_t are independent and distributed as Bernoulli(α) and N(0, τ²), respectively, for all t.
(3) The prior distribution of the contamination parameter α is Beta(γ₁, γ₂), and the β_t are i.i.d. for all t.
(4) The hyperparameters of the model are λ, ν, γ₁, γ₂, and τ², all of which are assumed known.
Note that the prior probability of being contaminated by an outlier is the same for all observations, namely, P(δ_t = 1) = α for t = s+1, …, n. Then, under these conditions and using standard Bayesian methods, we obtain the following results.
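The prior specification above can be sketched as a single draw from each prior. The hyperparameter values are illustrative (they follow the choices made later in Example 5.1 where possible); the IG(ν/2, νλ/2) draw uses the standard fact that if X ~ Gamma(a, rate 1) then b/X ~ IG(a, b).

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 100, 5                  # sample size; first s points assumed outlier-free
gamma1, gamma2 = 5.0, 95.0     # Beta hyperparameters for alpha
nu, lam, tau2 = 3.0, 0.5, 5.0  # IG and N(0, tau^2) hyperparameters (illustrative)

alpha = rng.beta(gamma1, gamma2)                 # contamination probability
sigma2 = (nu * lam / 2.0) / rng.gamma(nu / 2.0)  # IG(nu/2, nu*lam/2) via 1/Gamma
delta = np.zeros(n, dtype=int)
delta[s:] = rng.binomial(1, alpha, n - s)        # indicators for t = s+1, ..., n
beta = rng.normal(0.0, np.sqrt(tau2), n)         # outlier magnitudes ~ N(0, tau^2)
```

Note that δ_t is fixed at 0 for the first s observations, matching the assumption that the initial stretch of the series is outlier-free.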
Theorem 3.1. The conditional distribution of σ² given the sample and the other parameters is inverted-Gamma, IG((ν + n − s)/2, (νλ + Σ_{t=s+1}^{n} a_t²)/2), where a_t denotes the residual at time t. Proof. Under the conditions above, this follows directly from the conjugacy of the inverted-Gamma prior with the normal likelihood. This completes the proof of the theorem.

(2) The conditional distribution of α given the sample and the other parameters is Beta(γ₁ + k, γ₂ + n − s − k), where k denotes the number of indices j with δ_j = 1, j = s+1, …, n.
(3) When δ_j = 0, the sample provides no new information about the posterior distribution of β_j; that is, β_j remains distributed as N(0, τ²). When δ_j = 1, z_t contains information about β_j, and the conditional posterior of β_j given the sample and the other parameters is normal,

β_j | z, δ, β_{−j}, Θ, σ², α ~ N( Aτ² / (σ² + Bτ²), σ²τ² / (σ² + Bτ²) ),

where β_{−j} is β with the element β_j removed, A = −Σ_{t=j}^{T_j} a_t^{(1)} ϕ_{t−j}, and B = Σ_{t=j}^{T_j} ϕ²_{t−j}. The proof of this result is tedious and similar to that in McCulloch and Tsay [18]; we omit it here.

Theorem 3.3. The conditional posterior distribution of δ_j given the sample and the other parameters is Bernoulli, where δ_{−j} is δ with the element δ_j removed, T_j = min{n, j + s}, and a_t^{(0)} = a_t^{(1)} + ϕ_{t−j} β_j. When s = 0, then ϕ_i = 0 for all i. (3.6)

Proof. The conditional posterior distribution of δ_j, j = s+1, …, n, is Bernoulli with probability

P(δ_j = 1 | z, δ_{−j}, β, Θ, σ², α) = α f(z | δ_j = 1, ·) / [ α f(z | δ_j = 1, ·) + (1 − α) f(z | δ_j = 0, ·) ], (3.7)

where f denotes the conditional density of the sample. So the conclusion is obtained.
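The Bernoulli update for δ_j can be sketched numerically. The sketch below is a generic implementation of the Bayes-ratio form in (3.7) for Gaussian residuals, computed on the log scale for stability; it is not the paper's closed-form expression, and the residual inputs are assumed to be supplied by the surrounding Gibbs sampler.

```python
import numpy as np

def delta_posterior_prob(resid1, resid0, alpha, sigma2):
    """P(delta_j = 1 | ...) from the residuals of the affected observations,
    computed under delta_j = 1 (resid1) and delta_j = 0 (resid0).
    Generic Gaussian Bayes-ratio sketch of the Bernoulli update."""
    ll1 = -0.5 * np.sum(np.square(resid1)) / sigma2  # log-likelihood, delta_j = 1
    ll0 = -0.5 * np.sum(np.square(resid0)) / sigma2  # log-likelihood, delta_j = 0
    # log-odds of delta_j = 1 versus delta_j = 0
    log_odds = np.log(alpha) - np.log1p(-alpha) + ll1 - ll0
    return 1.0 / (1.0 + np.exp(-log_odds))

p = delta_posterior_prob(resid1=np.array([0.1, -0.2]),
                         resid0=np.array([5.0, -4.0]),
                         alpha=0.05, sigma2=1.0)
```

When removing the candidate outlier shrinks the residuals sharply, as in the example call, the posterior probability of δ_j = 1 is driven close to one even under a small prior contamination probability α.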

Detection of Outlier Patches via Adaptive Gibbs Sampling
Similar to Justel et al. [5], our procedure consists of two Gibbs runs. In the first run, the standard Gibbs sampling based on the results of Section 3 is carried out. The results of this run are then used to implement a second Gibbs sampling that is adaptive in treating identified outliers and that uses block interpolation to reduce possible masking and swamping effects. Let Θ^(m), σ^(m), β^(m), and α^(m) be the posterior means of Θ, σ², β, and α, respectively, based on the m iterations of the first Gibbs run. First, we select an appropriate critical value c₁ to identify potential outliers: an observation z_j is identified as an outlier if its posterior probability p_j^(m) > c₁. Let {t₁, …, t_g} be the collection of time indices of outliers identified by the first Gibbs run. Second, let c₂ (c₂ ≤ c₁) be another critical value used to specify the beginning and end points of a potential outlier patch. We select a window of length 2h around each identified outlier and search for the boundary points of a possible outlier patch by a forward-backward method. Consider an identified outlier z_{t_i}: among the h observations before z_{t_i}, those with posterior probabilities p_j^(m) > c₂ are regarded as part of a possible outlier patch associated with z_{t_i}, and we select the point farthest from z_{t_i} as the beginning of the patch; denote this point by z_{t_i − k_i}. We do the same for the h observations after z_{t_i} and select the farthest point with p_j^(m) > c₂ as the end of the patch; denote it by z_{t_i + v_i}. Combining the two blocks gives a possible outlier patch associated with z_{t_i}, namely z_{t_i − k_i}, …, z_{t_i + v_i}. Consecutive or overlapping patches are merged to form a larger patch. Lastly, Gibbs samples are drawn jointly within each patch. Suppose that a patch of d outliers starting at time index j is specified. Denote the vectors of outlier indicators and magnitudes associated with the patch by δ_{j,d} = (δ_j, …, δ_{j+d−1}) and β_{j,d} = (β_j, …, β_{j+d−1}), respectively. We give the conditional posterior distributions of δ_{j,d} and β_{j,d} in the next proposition; its derivation is similar to Theorem 1 of Justel et al. [5].
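The forward-backward boundary search just described can be sketched as follows. The function name and the example probabilities are illustrative; the posterior probabilities would come from the first Gibbs run.

```python
def find_patches(prob, outlier_idx, c2, h):
    """Given posterior outlier probabilities `prob` from a first Gibbs run,
    the indices of identified outliers, a threshold c2 and half-window h,
    return merged candidate patches [start, end] (forward-backward sketch)."""
    n = len(prob)
    patches = []
    for t in outlier_idx:
        start, end = t, t
        # backward: farthest point within h steps before t with prob > c2
        for j in range(max(0, t - h), t):
            if prob[j] > c2:
                start = j
                break
        # forward: farthest point within h steps after t with prob > c2
        for j in range(min(n - 1, t + h), t, -1):
            if prob[j] > c2:
                end = j
                break
        patches.append([start, end])
    # merge consecutive or overlapping patches into larger patches
    merged = []
    for p in sorted(patches):
        if merged and p[0] <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], p[1])
        else:
            merged.append(p)
    return merged

# Illustrative probabilities: a patch around index 30 and an isolated point at 49
prob = [0.05] * 100
prob[30:35] = [0.8, 0.3, 0.25, 0.3, 0.2]
prob[49] = 0.9
patches = find_patches(prob, outlier_idx=[30, 49], c2=0.15, h=4)
```

In this toy input the border points near index 30 exceed c₂ = 0.15 even though they fall below c₁, so the search recovers the full block, while the isolated point at 49 remains a patch of length one.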
Proposition 4.1. Let z = (z₁, z₂, …, z_n) be a vector of observations following (2.3), with no outliers in the first s data points. Assume δ_j ~ Bernoulli(α) for j = s+1, s+2, …, n, and that the parameters γ₁, γ₂, ν, λ, and τ² are known. Let e_t^(0) = y_t − Σ_{i=1}^{p} φ_i y_{t−i} − Σ_{j=1}^{q} θ_j ε_{t−j} − Σ_{i=1}^{r} Σ_{l=1}^{k} η_{il} y_{t−i} ε_{t−l} be the residual at time t when the series is adjusted for all identified outliers not in the interval [j, j+d−1] together with the outliers identified in δ_{j,d}; let T_{j,d} = min{n, j + d + s − 1}; and let π₀ = −1, π_i = φ_i if i = 1, …, s, and π_i = 0 if i ≥ s+1. Then we have the following.
(a) The conditional posterior probability of a block configuration δ_{j,d}, given the sample and the other parameters, is proportional to α^{s_{j,d}} (1 − α)^{d − s_{j,d}} times the conditional likelihood of the sample, where s_{j,d} = Σ_{t=j}^{j+d−1} δ_t, θ_{δ_{j,d}} denotes all the other parameters except δ_{j,d}, and C is a normalization constant such that the total probability of the 2^d possible configurations of δ_{j,d} is one.
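Because the block posterior is known only up to the constant C, drawing δ_{j,d} in practice means enumerating all 2^d configurations and normalizing. The sketch below is generic: `loglik` stands for whatever conditional log-likelihood the surrounding sampler supplies, and the function name is an assumption of this note, not the paper's.

```python
import itertools
import numpy as np

def block_configuration_probs(loglik, alpha, d):
    """Posterior probabilities of all 2^d configurations of delta_{j,d}.
    `loglik(cfg)` returns the conditional log-likelihood of the sample for a
    0/1 tuple cfg; the Bernoulli(alpha) prior and the normalization constant
    are handled here (a generic sketch of Proposition 4.1(a))."""
    configs = list(itertools.product([0, 1], repeat=d))
    logp = []
    for cfg in configs:
        s = sum(cfg)  # s_{j,d}: number of outliers in the block
        logp.append(s * np.log(alpha) + (d - s) * np.log1p(-alpha) + loglik(cfg))
    logp = np.array(logp)
    w = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return configs, w / w.sum()    # probabilities summing to one

# Sanity check: with a flat likelihood and alpha = 0.5, all 2^d
# configurations are equally likely a posteriori.
configs, probs = block_configuration_probs(lambda cfg: 0.0, 0.5, 3)
```

Enumeration is feasible because patches are short (d is small); for long patches the 2^d cost would motivate a different sampling scheme.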
For the second, adaptive Gibbs sampling, we use the results of the first Gibbs run to start the sampler and to specify the prior distributions of the parameters. For each outlier patch, we use the results of Proposition 4.1 to draw δ_{j,d} and β_{j,d} in the second Gibbs sampling, which is also run for m iterations.
The starting values of δ_t are taken from the first Gibbs run. The prior distributions of β_t are then as follows.
(a) If z_t is identified as an isolated outlier, then the prior distribution of β_t is centered at the Gibbs estimate of β_t from the first Gibbs run.
(b) If z_t belongs to an outlier patch, then the prior distribution of β_t is centered at the conditional posterior mean given in (4.5).
(c) If z_t does not belong to any outlier patch and is not an isolated outlier, then the prior distribution of β_t is N(0, τ²).

Simulated Example and Conclusions
Example 5.1. In the simulations, the designed bilinear model is the BL(2,2,1,1) model (5.1). We produce a set of observations of the bilinear model (5.1) by simulation. In order to obtain stationary data, we first generate 1000 observations of y_t and then use the last 100 observations as our simulated sample, into which a patch of five consecutive additive outliers has been introduced from t = 31 to t = 35 and a single AO has been added at t = 50, with outlier magnitudes β₃₁ = 5, β₃₂ = −4, β₃₃ = 5, β₃₄ = −4, β₃₅ = 3, and β₅₀ = −6, respectively. Figure 1 shows the simulated series {z_t}. The observations clearly exhibit large volatility; it would be very difficult to distinguish between outliers and normal points of the nonlinear model.
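The contamination step of the example can be sketched as follows. A plain Gaussian series stands in for the simulated bilinear sample, since the paper's BL(2,2,1,1) coefficients in (5.1) are not reproduced here; the outlier positions and magnitudes are those of Example 5.1 (1-based time t maps to 0-based index t−1).

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(0.0, 1.0, 100)  # stand-in for the simulated bilinear series y_t

# Additive-outlier contamination z_t = y_t + delta_t * beta_t:
# a patch of five outliers at t = 31..35 and an isolated AO at t = 50.
beta = np.zeros(100)
beta[30:35] = [5.0, -4.0, 5.0, -4.0, 3.0]  # beta_31, ..., beta_35
beta[49] = -6.0                            # beta_50
z = y + beta
```

The contaminated series z_t then serves as the input to both the standard and the adaptive Gibbs samplers.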
Let γ₁ = 5, γ₂ = 95, ν = 3, λ = σ²/2, α = 0.5, τ² = √5 σ, and c₁ = 0.5. Here γ₁ = 5 means that the prior probability that each point is an outlier is approximately 0.05. First, we detect the outliers in {z_t} using the standard Gibbs sampling. The posterior probability that each observation is an outlier, obtained by the methods of Section 3, is shown in Figure 2. Only at t = 31 and t = 50 are the posterior probabilities of being an outlier larger than 0.5. The outlying posterior probabilities of the other observations are lower; at t = 35 the posterior probability is even smaller than 0.2. Thus the standard Gibbs sampling fails to detect the inner and border points at t = 32, 33, 35, resulting in a masking effect. Second, to avoid masking and swamping problems, we apply the method of Section 4 and run 900 iterations of the adaptive Gibbs algorithm. Figure 3 plots the estimated posterior probability that each point is an outlier under adaptive Gibbs sampling, with a search window of width 6. The posterior probabilities obtained by the adaptive Gibbs algorithm at t = 31, …, 35 and t = 50 are all larger than 0.5, so these points are flagged as possible outliers, while the outlying posterior probabilities of the other observations are very small. Indeed, with critical values c₁ = 0.5 and c₂ = 0.15 we can identify the whole patch. On the other hand, many normal points that look like outliers in Figure 1 are not misspecified as outliers, because their outlying posterior probabilities are smaller than 0.2; this shows that the adaptive Gibbs sampling is more effective than the standard Gibbs sampling in detecting the additive outlier patch in a bilinear time series. Furthermore, we carried out 200 replications of the above simulation. Table 1 gives the average values of the posterior means of β₃₁, …, β₃₅ and β₅₀ and their standard deviations. In these replications, the isolated outliers were identified by both the standard and the adaptive Gibbs sampling procedures. The standard Gibbs sampling failed to detect all of the outlier patches in the 200 replications, whereas in 179 replications the adaptive Gibbs sampling successfully identified all outliers and patches. On the other hand, we note that the last three parameters are not estimated well by the adaptive Gibbs algorithm in many replications; the reason may be the influence of nonlinearity or of the leading outliers of the patch. This situation deserves careful investigation in the future.
Since a bilinear series will by itself produce what appear to be outliers, detecting the locations of outliers and patches is more difficult, and estimating their magnitudes more complex, than for autoregressive series. From a number of simulations of outlier-patch detection in bilinear models by adaptive Gibbs sampling, we found that the critical value c₂ should be chosen smaller than in the ARMA case, presumably because the variation of a bilinear series is larger than that of an ARMA series. In addition, the Gibbs sampler was repeated several times with different hyperparameters and different numbers of iterations to reanalyze the data. The results show that the locations of possible outliers and patches are stable, even though the estimated outlying probabilities may vary slightly between Gibbs runs. Other case studies also show that our procedure can reduce possible masking and swamping effects, which is an improvement and extension of existing outlier detection methods to bilinear time series models.

Figure 1: The curve of the simulated observations {z_t}.

Figure 2: The posterior probability that each point is an outlier via standard Gibbs sampling.

Figure 3: The posterior probability that each point is an outlier via adaptive Gibbs algorithm.