A Bayesian Analysis of Spectral ARMA Model

Bezerra et al. 2008 proposed a new method, based on Yule-Walker equations, to estimate the ARMA spectral model. In this paper, a Bayesian approach is developed for this model by using the noninformative prior proposed by Jeffreys 1967 . The Bayesian computations, simulation via MarkovMonte Carlo MCMC is carried out and characteristics of marginal posterior distributions such as Bayes estimator and confidence interval for the parameters of the ARMA model are derived. Both methods are also compared with the traditional least squares and maximum likelihood approaches and a numerical illustration with two examples of the ARMA model is presented to evaluate the performance of the procedures.


Introduction
The spectral estimation of autoregressive moving average ARMA is considered a topic of interest in several applied areas of knowledge, for example, engineering, econometrics, and so forth 1-4 .Different methods have been studied, comprising two focuses, among which, we can point out: i the optimal methods, such as the maximum likelihood estimates, which are computational difficult to deal 1, 3, 4 ; ii the suboptimal methods based on the Yule-Walker equations, which employ linear equations in the parameters estimation 2-8 .Another type of suboptimal method, proposed by 9 , makes a simultaneous estimate of both parameters AR and MA using the reformulated Steiglitz-McBride least-squares algorithm.
The estimation methods for the ARMA process have been widely studied, especially for the separate estimation of parameters, where Yule-Walker equations are used to estimate the AR process parameters, and following this, the parameters MA process are estimated through the Durbin method 3, 4 .Among these methods, the best known are the modified

The ARMA Spectral Model
Consider x t an ARMA stationary random process of order p, q , defined by the following difference equation: p i 0 φ i x t−i q j 0 θ j ε t−j , assuming a 0 1, 1 ≤ t ≤ N, 2.1 where φ i , i 0, 1, . . ., p; θ j , j 0, 1, . . ., q, are the parameters of the process and ε t is a white noise with mean zero and variance σ 2 .Thus, the vector of parameters to be estimated is defined by β φ 1 , . . ., φ p , θ 1 , . . ., θ q , σ 2 .

2.2
We define the power spectral density of ARMA process, in plane-z, from the system function, by where A z p i 1 1 − p i z −1 i , A * 1/z * p i 1 1 − p * i z * i , p i , i 1, . . ., p, are the poles, and B z p i 1 1 − q i z −1 i , B * 1/z * q i 1 1 − q * i z * i and q i , i 1, . . ., q, are the zeros.As usual, the stars denote complex conjugation.
The ARMA power spectral density on frequency domain is obtained by substituting z exp j2πf into 2.3 resulting in

2.4
The function S f is the Fourier transform of the autocorrelation function r k E x t x t−k , presumably nonnegative and periodical in frequency with period 1 Hz.It is assumed to be band limited to ±0.5 Hz 1-3 .
The problem of ARMA spectral estimation consists initially on selecting the suitable model such that we can estimate the vector of parameters of the process and then replace the estimates value in the spectral density function.This parametrization of the function S f is obtained by using the vector of parameters β.

Estimation Methods Using Yule-Walker Equations
There are several methods which use Yule-Walker equations in separate spectral estimation for the ARMA model, as previously mentioned in the introduction.The methods the best known are modified Yule-Walker equations MYWEs and the least squares modified Yule-Walker LSMYW .This uses more than p linear equations in the parameters estimation.In both methods, the errors can be weighted in order to stabilize the variance.Moreover, these methods provide similar results for ARMA spectral estimation.

Least-Squares Yule-Walker Method
This method is an attempt to reduce the variance of the MYWE estimator and improve the quality of their estimates.It is known that the autocorrelation function of the ARMA process is defined as 6 3.1 From the MYWE, initially we will describe the least squares method of the modified Yule-Walker equations LSYW for the estimation of the parameters of the AR process, given that the expanded equations of Yule-Walker for the ARMA process can be represented as 2-4, 6 where R is the expanded autocorrelation matrix M − q × p, which is given by and r is the autocorrelations vector M − q × 1 given by r r q 1 , r q 2 , . . ., r M T .

3.4
The consistent estimates of AR vector parameters Φ φ 1 , . . ., φ p T can be obtained by expression 3.2 .As M − q > p, there are more equations than unknowns.One error e of size M − q × 1 in order to estimate the autocorrelations should be introduced given by r − RΦ e, 3.5 where r and R correspond to r and R estimators, respectively.It is convenient to use the unbiased autocorrelations r k to estimate r and R, so that the bias of approximation error is null.Thus, the method of least squares can be applied to find the vector which minimizes the sum of squares of errors, that is, 2, 6 or, in the matrix form: S e T e.

3.7
Substituting the equation of the errors e r RΦ 3.8 in 3.7 , we obtain S r RΦ T r RΦ .

3.9
The development of the products above in 3.9 one has the following expression results: Taking the partial derivatives ∂S/∂Φ 0 in expression 3.10 , We can obtain the Φ estimator of the AR process, which is given by

3.12
The matrix R T R is usually positive definite, hermitian, and Φ estimator cannot be a minimum-phase estimator 2, 3 .
The estimation of AR parameters of the ARMA model through LSYW method can be interpreted as an application of the covariance method considering the set of values { r q−p 1 , r q−p 2 , . . ., r M } 2, 3 .Moreover, the estimated process parameters MA can be obtained applying the method of Durbin 3 .
The performance of the LSYW, p q < M < N, is usually better than the modified Yule-Walker equations method MYWE , where M p q number of equations , using only p equations, but the results will also depend on the kind of spectrum to be estimated 2, 3, 5 .

The Method Using AR Filter in MA Estimates
The new method for separate estimation consists of making a new AR filtering of the signal x t by using the estimates of the MA parameters obtained with the Durbin method.Thus determining a new signal x t , proposed by 8 , as shown in the Figure 1 and according to the expression: 3.13 In Figure 1, the H z 1/Y z is the system transfer function, and q .This procedure is like repeating the second step of the Durbin method.This is the key idea of the separate estimation method 3, 6 .
It is known that an AR ∞ process large order can be an approach for the MA process 1-4 .In this method, a new signal estimate x t will be used to obtain a new AR estimate through the LSYW method in order to subsequently determine a new MA estimate through the Durbin method.The key idea of the method proposed is to consist of obtain a new signal, from the AR filtering, which will provide better estimates when the parameters of the ARMA process are reestimated.

Bayesian Estimation
In a Bayesian framework, the inference is based on the posterior distribution of parameters β, denoted by p β | x , which, in turn, is used for inferences and decisions involving β.
The posterior distribution p β | x is obtained from the combination of the information provided by a prior distribution f β and the information supplied by the data through the likelihood L β | x .Thus, using Bayes' theorem, the posterior distribution is given by 4.1 The prior distribution represents the knowledge or uncertainty state about the parameter β before the experiment is run and the posterior distribution describes the updated information about β after the data x is observed.
For an ARMA p, q model, we need to estimate the parameters β and σ where β φ 1 , . . ., φ p , θ 1 , . . ., θ q .The likelihood function proposed in 11 for the observations X given the vector parameters β, σ can be written as follows: where

4.3
The likelihood above combined with the prior distribution results in the posterior distribution given by The values of the variables x i−p , i 1, . . ., p and error terms ε i−q , i 1, . . ., q are arbitrarily fixed.
To progress with the Bayesian analysis, it is necessary to specify a prior distribution over the parameter space.Different prior distributions can be used in our study according to all currently available information.
If prior information on study parameters is unavailable or does not exist for a device, then initial uncertainty about the parameters can be quantified with a noninformative prior distribution.This is the same to include in the analysis just the information provided by the data.
Therefore, for the ARMA coefficients, we assume that little is known about these parameters such that an uniform distribution can be used as a prior distribution.We also assume a prior that the components of β are independent and hence the overall prior for β is the product of the priors for its components.Besides, a conventional noninformative prior distribution for the model variance σ 2 can be represented by Therefore, the joint prior distribution of β and σ 2 has the form: Other prior specifications also could be considered, as independent informative normal distributions for the components of β, that is, β j ∼ N μ,σ 2 β , j 1, . ..,k, with mean μ and variance σ 2 β specified a prior and an informative prior as the inverse Gamma distribution for the parameter σ 2 , that is, σ 2 ∼ IG a σ , b σ with hyperparameters a σ and b σ known.Thus, the joint prior for β and σ 2 would be given by To obtain the marginal posterior distribution for each parameter from the model, one needs to solve integrals involving the joint posterior density that are not analytically tractable and which standard integral approximations can perform poorly.In this case, we use of Markov chain Monte Carlo MCMC approaches to carry out the Bayesian posterior inference.Specifically, we can run an algorithm for simulating a long chain of draws from the posterior distribution, and base inferences on posterior summaries of the parameters or functional of the parameters calculated from the samples.MCMC is essentially Monte Carlo integration using Markov chains.
Constructing such a Markov chain is not difficult.We first describe the Metropolis-Hastings algorithm.This algorithm is due to 12 , which is a generalization of the method first proposed by 13 .
Let g γ be the distribution of interest.Suppose at time t, γ t 1 is chosen by first sampling a candidate point η from a proposal distribution q • | γ t .
The candidate η is then accepted with probability: If the candidate point is accepted, the next state becomes γ t 1 η.If it is rejected, then γ t 1 γ t and the chain does not move.The proposal distribution is arbitrary and, provided the chain is irreducible and aperiodic, the equilibrium distribution of the chain will be g γ .
After obtaining a random sample from the MCMC algorithm for each component of β, it is important to investigate issues such as convergence and mixing, to determine whether the sample can reasonably be treated as a set of random realizations from the target posterior distribution.Looking at marginal trace plots is the simplest way to examine the output where V is a diagonal matrix; iii the candidate for β i 1 , denoted by β prop , will be accepted with a probability given by the Metropolis ratio: v because this proposal is not symmetric, we find The proposal distribution parameters V and d were chosen to obtain good mixing of the chains.

Numerical Illustration
In this section, we describe the Monte Carlo simulations 17 , in order to obtain the estimates of the parameters and the spectral power density corresponding to each of the cited below ARMA processes: 1 ARMA 4,2 18 and 2 ARMA 4,4 19 .The simulation and the programs were implemented in the MATLAB and R softwares.
The methods used on the simulations are 1 modified Yule-Walker equations MYWE ; 2 least squares modified Yule-Walker equations LSYW ; 3 least squares modified Yule-Walker equations with AR filtering LSYWS-proposed method ; 4 maximum likelihood estimator MLE ; 5 Bayesian estimator considering the noninformative priors as Jeffreys and independence normal priors for the parameters.
In order to evaluate the performance of the methods, the random data have been generated from the same seed and that change it with the sequences, i 1, . . ., B, and B is the replications number in each simulation.We carried out this using N 256 observations and B 30 replications.The random numbers are generated from a standard Gaussian distribution white noise with mean equal zero and variance equal one, then generated a signal of the process ARMA.
The modified covariance was the criterion in the Durbin method 3 , and the large AR process order was chosen using the criterion described by 20 and the number of equations, M, was chosen in accordance with the spectral characteristics to be analyzed.On the other hand, for the spectrum with poles far from the unit circle UC , as it was given in Example 5.1, a small number was used for M, which has one pole near and other far from the unit cycle UC .5.2

Discussion
Firstly, we consider a comparative analysis between the classical methods applied to the ARMA 4,2 model fitted by the data of Example 5.1.Thus, Table 1 shows that the LSYWS method provided the best averages for estimate parameters than other methods, but with a slightly larger variability than the LSYW method for the AR model.The LSYW method produces intermediate estimates but with a superior performance than the MYWE and MLE methods.These facts can be observed through the average estimates of the spectra power in  Figures 2 and 3, which shows that the best estimate is really provided by LSYWS method proposed by 8 .
Comparing the classical methods and the Bayesian methods studied in this paper, Table 2 shows that the estimates of parameters from both Bayesian methods are similar to the LSYWS method, but with a smaller variability see Figure 4 .
By using the noninformative Jeffreys prior, we obtained a slightly better estimate than the empirical prior, most likely due to the mean prior given by LME does not produce so good results.Tables 3 and 4 present the results of estimation methods MEYW, LSYW, LSYWS, and MLE applied to the ARMA 4,4 model from Example 5.2 see Figures 5 and 6 .In this case, the average estimates of the parameters were approximately equal for all considered methods.Just the standard deviations of the mean estimates showed a difference with the Bayesian standard deviation, see Tables 5 and 6.The standard deviation, with Jeffreys prior, had lower value than the other methods, but with a value slightly smaller than the standard deviation obtained by the MLE method.This analysis can also be confirmed by plots of the power spectra estimates 2 see Figures 6 and 7 .In this case, this similarity can be due to the ARMA 4,4 has a stable power spectrum and with more parameters to be estimated than case previous.

Conclusions
In this article, we describe the Bayesian approach by using the noninformative prior distribution proposed by Jeffreys 10 to estimate the density spectral of the ARMA model.This Bayesian approach is compared with non-Bayesian approaches LSYWS method and LSYW, both based on Yule-Walker equations and least-squares method.Other methods proposed in the literature as MLE and MYWE Mod-fitted Yule-Walker Equations are also compared.
Two ARMA models with different orders were introduced to illustrate the proposed methodologies and to examine their performances.
The study showed that the Bayesian approaches produced more accurate estimates than the other approaches by considering ARMA 4,2 model, but similar results are found for ARMA 4,4 .This way, we conclude that Bayesian approach for less stable ARMA models is justified, and more stable power spectrums any one procedures could be chosen.The Bayesian approach provides best fitting of spectral applicable for any order of the ARMA model than the other approaches.Finally, the choice of the approaches becomes irrelevant for a moderate large size.In addition, generally speaking, it was seen that the results depend on the spectrum to be estimated.

Figure 1 :
Figure 1: Filtering of the signal through an AR filter so as to obtain a new estimate.

Table 1 :
Estimation classic AR and MA parameters standard deviation .

Table 2 :
Estimation Bayesian AR and MA parameters standard deviation .

Table 3 :
Estimation classic AR parameters standard deviation .

Table 4 :
Estimation classic MA parameters standard deviation .

Table 5 :
Estimation Bayesian AR parameters standard deviation .In the first example, the ARMA 4,2 model was used with M 10, L 85, to N 256, and in the second example, we used ARMA 4,4 model with M 20, L 125 to N 256.The dashed curve is theoretical and solid curve is the average of estimates.

Table 6 :
Estimation Bayesian MA parameters standard deviation .