Linear Estimation of Stationary Autoregressive Processes

Consider a sequence of an mth-order Autoregressive (AR) stationary discrete-time process and assume that at least m − 1 consecutive neighboring samples of an unknown sample are available. It is not important that the neighbors are from one side or are from both the left and right sides. In this paper, we find explicit solutions for the optimal linear estimation of the unknown sample in terms of the neighbors. We write the estimation errors as the linear combination of innovation noises. We also calculate the corresponding mean square errors (MSE). To the best of our knowledge, there is no explicit solution for this problem. The known solutions are the implicit ones through orthogonality equations. Also, there are no explicit solutions when fewer than m−1 samples are available. The order of the process (m) and the feedback coefficients are assumed to be known.


Introduction
Estimation has many applications in different areas including compression and equalization [1,2].The linear estimation is more common due to its mathematical simplicity.The optimal linear estimation of a random variable x in terms of y 1 , y 2 , and y n is the following linear combination where the coefficients A i must be chosen to minimize the MSE E{(x − x) 2 } and E{•} stands for the expected value.
To minimize the MSE, we must choose A i 's to satisfy the orthogonality principle as follows: We also write the above condition as Therefore, in the optimal linear estimation, we search for the coefficients such that the error is orthogonal to the data.A common model for many signals including image, speech, and biological signals is the AR model [1,[3][4][5].
This model has applications in different areas including detection [6,7], traffic modeling [8], channel modeling [9], and forecasting [10].An AR process is the output of an all-pole causal filter whose input is a white sequence called innovation noise [11].We introduce another model for the process using an all-pole anticausal filter as well.The optimal linear estimation of an AR process is accomplished through the recursive solution of Yule-Walker (YW) equations using Levinson-Durbin algorithm [12].This solution is recursive and implicit.As we will see in some cases the equation coefficients do not form a Toeplitz matrix and we cannot enjoy the complexity reduction advantage of Levinson algorithm.
To the best of our knowledge, there is no explicit solution for YW equations.Most of the focus of researchers is on model parameters estimation from observations.When researchers arrive at YW equations, they stop, since they consider the solution as known through Levinson recursion.Broersen in his method for autocorrelation function estimation form observations points to YW equations and mainly concentrates on bias reduction in estimation using finite set of observations [13,14].He does not attempt to find the solution for YW equations.Fattah et al. try to estimate the autocorrelation function of an ARMA model from noisy data; they again refer to YW equation set and its solution using matrix inversion and no explicit solution is proposed [15].Xia and Kamel propose an optimization method to estimate AR model parameters from noisy data [16].Noise is not necessarily Gaussian.The method finds a minimum for a cost function and exploits a neural-network algorithm.Again, the explicit solution of the orthogonality equations is not the goal of the paper.Hsiao proposes an algorithm to estimate the parameters of a time-varying AR system [17].He considers the feedback coefficients of a time-varying AR process as random variables.The proposed algorithm maximizes a posteriori probabilities conditioned on the data.The recursive algorithm is compared to Monte Carlo simulation in terms of accuracy and complexity.In this paper, the aim is parameter estimation from data and not the analytic solution of orthogonality equations.In [18], a sequence of Gaussian AR vector is considered.As the sequence elements are vectors rather than scalars, the AR model is defined by matrix feedback coefficients rather than scalar feedback coefficients.The estimation here is more complex, and some independence conditions are assumed.The method is based on convex optimization, and no exact answer can be provided.Mahmoudi and Karimi propose an LS-based method to estimate AR parameters from noisy data [19].The method exploites YW equations, but this method also does not provide the explicit solution to the equations.Another LS-based estimation method can be seen in [20].
As mentioned above, we could not see the final solution to YW orthogonality equations in the literature.In this work we have derived explicit solutions for orthogonality equations for different cases.Consider a stationary mth order AR process.The order and the feedback coefficients of the process are assumed to be known, and the model parameter estimation is out of the scope of this paper.The main goal of this paper is finding the solution for the orthogonality equations.We will find the the optimal linear estimation of a sample in terms of the neighbors where at least m − 1 consecutive neighbors are available.The consecutive neighbors include the situations where all the m − 1 or more neighbors are in one side, or some of them are left neighbors and the others are right neighbors.We will show that no more that m consecutive neighbors in each side are needed.Our approach is to find orthogonal estimation errors that are linear combinations of data.We use the well-known causal LTI AR model as well as our anticausal model to form orthogonal errors.The errors are formed as a linear combination of causal and anticausal innovation (process) noises.Beginning from suitable errors that are both orthogonal to the data and are linear combination of data, we arrive at linear estimations.We seek LTI system approach rather than trying to directly solve the orthogonality equations.The results of this paper for different cases can be important in situations where the equation matrices are ill-posed and the matrix inversion and other recursive algorithms become unstable.
This paper is organized as follows.In Section 2, the causal model is reviewed and the anticausal model is introduced.In Section 3, we review the forward prediction problem.We state the problem symmetries in Section 4. We see how we can use the similarities between two problems to exploit the solution of one problem to find the solution of the other problem.In Section 5, we extract a number of relations for cross-correlation functions that will be used later.We find the interpolation formulae when infinite data are available in Section 6.We find the prediction and interpolation with finite data in Sections 7 and 8, respectively.In Section 9, we present a detailed example to show that our relations and the matrix solution of the orthogonality principle result in the same coefficients.Finally, we conclude the work in Section 10.

Causal and Anticausal Models
A discrete-time stationary AR process s n of order m is modeled as follows.
The above equation is meant for a causal LTI system.I n , the input of the system, is called the innovation noise and is a stationary white sequence with the zero expected value, that is, and E{I n } = 0, where σ is a positive constant.δ[0] = 1 and δ[i] = 0 elsewhere.The system is causal.Therefore s n , the output of the system in the time index n, is a linear combination of the inputs in the time index n and before.So, we can write In the above equation, h n is the impulse response of the system.Assuming the causal system model, we have h n = 0 for n < 0. Paying attention to the whiteness of the sequence {I n } and from (5) we get the following result.
Figure 1 is the causal model of the AR process.H(z) is the Z-transform of h n , which is defined as For the system defined by (4), we have Assuming a stable causal system, we conclude that the roots of A(z) = 0 must be inside the unit circle |z| = 1.The power spectral density function (PSDF) of a process is the Ztransform of its autocorrelation function.The PSDF of s n is [11] In the above equation S s (z) is the PSDF of s n and S I (z) is the PSDF of I n .
We now present the anticausal model.If we apply the sequence s n to an LTI system with the transfer function H −1 (z −1 ), we get another innovation noise called I n .Figure 2 demonstrates the generation of the new innovation noise.To see the whiteness of the sequence I n , note that the PSDF of I n by using Figure 2 and ( 9) is as follows.
Equivalently we can apply I n to the inverse system with the transfer function H (z) = H(z −1 ) to get s n .The generation of s n from I n is depicted in Figure 3.
We have Therefore h n = h −n .Noting that h n = 0 for n < 0, we see that h n = 0 for n > 0. Also note that the roots of A(z −1 ) = 0 are outside the unit circle, as we had the roots of A(z) = 0 inside the unit circle.Regarding these points, we know that the system with the transfer function H (z) is stable and anticausal.We have Using the above equation and Figure 3, we get Also, note that From ( 14) and Figure 3, we see that s n is a linear combination of I n and the inputs after that.The whiteness of the sequence {I n } gives then

Forward Prediction
Forward prediction can be accomplished by using the whitening filter [11].The data are whitened, and we use the equivalent white data to achieve the prediction.As an example, consider the 1-step forward prediction of s n .It is seen that s n is estimated as It can bee seen from ( 4) that the error s n − s n is equal to I n and therefore, from (6), it is orthogonal to s n−k for k > 0. It proves the optimality of ( 16).
The 2-step prediction can be done as [11] In the above equation, s n−1 is the prediction of s n−1 from its previous data (1-step prediction) and is obtained by replacing n by n − 1 in (16).
From ( 17), (18), and (4), the estimation error is From ( 6), it is clear that I n and I n−1 are orthogonal to s n−k for k ≥ 2. It proves the optimality of ( 17).The higher-order predictions can be obtained in the same manner.As the final example of this section, consider the 3step forward prediction that is accomplished as follows.
In the above equation, s n−1 and s n−2 are the 2-step and 1-step predictions of s n−1 and s n−2 , respectively, and are obtained from (17) and (16).The error is (21)

The Problem Symmetries
Consider the following linear interpolation of s n from the data around it: The orthogonality principle gives The above equations become In the above equations, R s [i] = E{s n s n−i }.Now, consider the following estimation.
The orthogonality of error to the data gives Regarding that the R s [•] is an even function, we notice that the set of equations ( 24) and the set of equations ( 27) are exactly the same.Therefore, As an example, consider the following backward prediction.
Using (16) and the symmetry, we get The validity of the solution can also be confirmed as from (13), it is seen that the estimation error is Using (15), it is clear that the error is orthogonal to the data.It proves the optimality of (30).

Cross-Correlation Functions
In this section, we derive a number of properties for the cross-correlations between innovation noises and the AR process.We will exploit these properties to prove our solutions.
We define The first simple property follows from ( 6) and (15) as follows.
Now, consider Figure 1.In this figure I n is the input and s n is the output.The impulse response of system is In this equation, R I [k] = E{I n I n−k } and the " * " operator is the discrete convolution.Taking the Z-transform from both sides of (33) and using (8), we get Or equivalently Taking inverse Z-transform from this equation, we have The right side of (36) is zero for k / = 0. Referring to Figure 3, we have [11] (37) Again, we conclude that (38)

Interpolation Using an Infinite Set of Data
In this section, we assume that infinite number of data are available.However, we will see that only a finite number of data are sufficient.
6.1.Infinite Data on the Left Side.We want to obtain the following estimation.
k 1 is a positive integer constant not greater than m.There are k 1 data available on the right side of s n and infinite data on the left side.Define a 0 1.We are going to prove the following: Observe from (40) that although there are infinite data on the left side of s n , only m data s n−1 to s n−m participate in the estimation.Indeed, (40) is the optimal linear estimation solution for where k 2 can be any integer greater than or equal to m.
To prove the optimality of (40), we must show that the estimation error is orthogonal to the data.Firstly the estimation error can be calculated by inserting s n from ( 40) in e n = s n − s n .Secondly, by extending the innovation noises using (4) we can confirm that Indeed, we have obtained (40) from (41).The motivation is that the estimation error has to possess two essential conditions: (1) it must be orthogonal to the data and (2) it must be only a linear combination of the data and the variable to be estimated.It remains to prove that the right side of (41) is orthogonal to the data.Using (6), it is quite clear that I n to I n+k1 are orthogonal to s n−k for k > 0, and so is e n in (41).Further, we have The last equation of ( 42) is justified as we have R sI [k] = 0 for k < 0 from (32).Using (32), (36), (41), and (42) it is seen that This completes the proof.
The MSE is (44) Therefore,  ( Again, only m data s n+1 to s n+m on the right side of s n participate in the interpolation, and the data after them are not needed.Therefore, (46) is the solution for all the optimal linear interpolations s where k 2 can be any integer greater than or equal to m.
The validity of (46) can also be proved as follows.The error is calculated as e n = s n − s n , where s n is from (46).By extending the innovation noises from ( 13), it can be verified that Using (15), it is quite clear that I n−k1 to I n are orthogonal to s n+k for k > 0, and so is e n in (47).Further, we have The last equation of ( 48) is justified as we have R I s [k] = 0 for k < 0 from (32).Using (32), ( 38), (47), and (48), it is seen that E{e n s n−i } = 0 for 1 ≤ i ≤ k 1 .This completes the proof.
The MSE is the same as in (45).
ISRN Signal Processing

Infinite Data on Both
Sides.Now, we want to estimate s n from all the data around it.We will see that only m data on each side are needed and as is expected, the data with the same distance from s n participate with the same weight.We have This estimation can also be obtained by letting k 1 = m in (40) or (46).Again, note that (49) is the optimal solution for the problems where k 1 and k 2 can be any integer greater than or equal to m.
The validity of (49) can also be proved as follows.The error is calculated as e n = s n − s n , where s n is from (49).By extending the innovation noises from ( 4), it can be verified that Using ( 6) it is quite clear that I n to I n+m are orthogonal to s n−k for k > 0, and so is e n in (50).Further, we have Using ( 32), ( 36), (50), and (51), it is seen that E{e n s n+i } = 0 for i > 0. This completes the proof.The MSE is (52)

Prediction with Finite Data
Assume that only m − 1 consecutive data s n−1 to s n−m+1 are available.We want to prove the following.
The above estimation can be obtained as follows.Since s n−m is not available we can estimate it from data s n−1 to s n−m+1 .The estimated value can be now used to predict s n using (16).
On the other hand, s n−m can be backward predicted using (30) as Now we have two equations ( 54) and ( 55) with two unknowns s n and s n−m .Solving these equations, we get (53).The optimality of (53) can also be proved by seeing that the estimation error is equal to To derive the above equation, we has used (4) and (13).It is easily seen from ( 6) and ( 15) that I n and I n−m are orthogonal to data s n−1 to s n−m+1 .This proves the optimality of (53).To calculate the MSE, we note that The last equation is justified, as the error is orthogonal to the data and to the estimation which is a linear combination of the data.Inserting (56) in (57), we get Finally, using (58), (32), and (36), we have Higher-order predictions with m−1 data can be obtained from (53).As an example, we have where s n−1 is derived by replacing n by n − 1 in (53).
We could not derive a simple general form for the estimation with less than m − 1 data.

Interpolation with Finite Data
We now derive the linear interpolation with less than m data on each side.More clearly we allege (61) In (61) we must have It means that the distance between s n and the farthest data on the right side is less than the distance between s n and the farthest data on the left side.The optimality of (61) can be seen as we can verify that from (61), (4), and (13) the estimation error is (62) It remains to prove that (62) is orthogonal to the data.
(1) It is clear from (6) and ( 15) that I n to I n+k1 and I n−m to I n−k2−1 are orthogonal to the data s n−1 to s n−k2 .Therefore the error in (62) is orthogonal to (2) Further from (43) and regarding that I n−m to I n−k2−1 are orthogonal to the data s n+1 to s n+k1 according to (15), we see that the error in ( 62) is orthogonal to Therefore the error is orthogonal to the data and the proof is completed.
From (32), (36), and (62), the MSE is For the case we can replace k 2 by k 1 in (61) to achieve the following.
As expected, we see that the data with the same distance from s n participate with the same weight.Now, consider the case that the distance between s n and the farthest data on the right side is more than the distance between s n and the farthest data on the left side.It can be handled by the symmetry of the problem.More clearly, if we replace s n−k by s n+k and vice versa in (61), we get the following.( The estimation error in this case is The MSE is the same as (63).We could not find a simple general form for the case k 1 + k 2 < m − 1.

A Detailed Example
In this section we deal with a detailed example.The optimal linear estimation of the following process is desired.
I n is the innovation noise with the unit variance σ = 1.We have a 1 = 0.8, a 2 = 0.3 and a 3 = −0.1.The process is the response of the following 3rd order (m = 3) all-pole filter to the innovation noise.
The poles of this system are p 1 = 0. 9.1.Prediction with Finite Data.We want to derive the following optimal linear prediction.
Using (53), we have If we want to verify the solution using the orthogonality equations, we have Expanding (72), we get where r k 's come from (69).Replacing r k 's from (69) in (73), we get Solving (74), we get the same result as (71).

Interpolation with Finite Data.
Consider the following problem.
It is the symmetric case of k 1 = k 2 = 1 and we have 2k 1 = 2 = m − 1.Using (64), we have Let us rederive the solution of (75) using the orthogonality conditions.We have Expanding (77), we get the following.
Solving (78), we get the same answer as (76).Now, consider the nonsymmetric following problem.(80) Now, we want to obtain the solution of (79) using the matrix equations and we expect the same answer as (80).The orthogonality condition is It follows that (82) The result of (82) is the same as (80).(84) Now we verity (84) using the orthogonality conditions.The following set of equations is obtained (86) Note that the coefficient matrix of (86) is not Toeplitz.The result of (86) is the same as (84).

Conclusion
We introduced anticausal LTI model besides the known causal LTI model for AR processes.Using these models and the related innovation noises, we achieved the optimal linear interpolations for different cases.Specifically, we extracted the formulae when there are infinite data on the right, or the left sides of the variable to be estimated.We also obtained the linear prediction or interpolation with finite data.The number of data must be at least the order of the process minus one.We could not find a general simple form when fewer data are available.For the proofs of our solutions, the innovation noises and the orthogonality principle are essential.