The Failure of Orthogonality under Nonstationarity: Should We Care About It?

We consider two well-known facts in econometrics: (i) the failure of the orthogonality assumption (i.e., the lack of independence between the regressors and the error term), which implies biased and inconsistent Least Squares (LS) estimates, and (ii) the consequences of using nonstationary variables, acknowledged since the seventies: LS might yield spurious estimates when the variables have a trend component, whether stochastic or deterministic. In this work, an optimistic corollary is provided: it is proven that the LS regression, applied to nonstationary and cointegrated variables where the orthogonality assumption is not satisfied, provides estimates that converge to their true values. Monte Carlo evidence suggests that this property is maintained in samples of a practical size.


Introduction
Two well-known facts lie behind this work: (i) the behavior of LS estimates whenever variables are nonstationary and (ii) the failure of the orthogonality assumption between independent variables and the error term, also in an LS regression.

The DGPs considered for nonstationary and cointegrated variables are the following (DGP (2.1) is included because it eases the comprehension of the paper):

$x_t = \mu_x + u_{x,t}$, (2.1)
$x_t = \mu_x + x_{t-1} + u_{x,t}$, (2.2)
$x_t = X_0 + (\mu_x + \rho_{y1})\,t + \xi_{x,t-1} + \rho_{y2}\,\xi_{y,t-1}$, (2.3)
$y_t = \mu_y + \beta_y x_t + \underbrace{u_{y,t} + \rho_x u_{x,t}}_{\text{innovations}}$, (2.4)

where $u_{z,t}$, for $z = x, y$, are independent white noises with zero mean and constant variance $\sigma_z^2$, $\xi_{z,t} = \sum_{i=0}^{t} u_{z,i}$, and $Z_0$ is an initial condition. We may relax the assumptions made for the innovations; for example, we could force them to obey the general level conditions in [5, Assumption 1]. Nevertheless, although the asymptotic results would still hold in this case, our primary target concerns the problem of orthogonality between the regressor and the error term, not those of autocorrelation or heteroskedasticity.

These DGPs allow for an interesting variety of cases (note that the asymptotics of the LS estimates when $x$ and $y$ have been independently generated by any of the first three DGPs can be found, e.g., in [13]; notwithstanding, the authors can provide these cases as Mathematica code upon request):
(1) Bookcase no. 1: DGP of $x$ is (2.1) and DGP of $y$ is (2.4) with $\rho_x = 0$. When the variables are generated in this manner, the classical assumptions made in most basic econometrics textbooks are fulfilled: the variables are stationary, the innovations are homoskedastic and independent, and so forth. It is straightforward to show that $\hat{\alpha} \xrightarrow{p} \mu_y$, $\hat{\beta} \xrightarrow{p} \beta_y$, and $\hat{\sigma}^2 \xrightarrow{p} \sigma_y^2$.

(2) Bookcase no. 2: DGP of $x$ is (2.1) and DGP of $y$ is (2.4) with $\rho_x \neq 0$. These DGPs also represent a typical example of an orthogonality problem in most basic econometrics textbooks. Although the variables are stationary and the innovations are homoskedastic and independent, the explanatory variable is related to the innovations of $y$. It is well known that the estimates do not converge to their true values. In particular, it is straightforward to show that $\hat{\alpha} \xrightarrow{p} \mu_y - \rho_x \mu_x$, $\hat{\beta} \xrightarrow{p} \beta_y + \rho_x$, and $\hat{\sigma}^2 \xrightarrow{p} \sigma_y^2$.

(3) Bookcase no. 3: DGP of $x$ is (2.2) and DGP of $y$ is (2.4) with $\rho_x = 0$. These DGPs allow the relationship between $x$ and $y$ to be cointegrated à la [14]. Once again, the asymptotic results have been known for a long time, and obtaining them does not entail any particular difficulty.

(4) Nonstationarity and non-orthogonality case no. 1: DGP of $x$ is (2.2) and DGP of $y$ is (2.4). Notwithstanding the obvious problem of orthogonality between $x$ and the error term, the variables remain cointegrated. The artifact employed to induce the orthogonality problem can be interpreted as, for example, a measurement error in the explanatory variable. One should expect that, in the presence of this problem, the estimates would not converge to their true values. We prove below that, contrary to expectations, this is not the case.

(5) Nonstationarity and non-orthogonality case no. 2: DGP of $x$ is (2.3) and DGP of $y$ is (2.4). Cointegration holds between $x$ and $y$; only in this case, the problem of orthogonality between the regressor and the error term is even more explicit. The artifact employed to induce the orthogonality problem can be related to the typical simultaneous-equations case. We also prove below that LS provides consistent estimates.
The common belief as regards the last two cases is that the failure of the orthogonality assumption induces LS to generate inconsistent estimates, even in a cointegrated relationship.
In fact, when the variables are generated as in (2.2)-(2.4), the estimates of the parameters converge to their true values (note that we did not consider the case where the orthogonality assumption is not satisfied because of the omission of a relevant variable; [15] studied the latter case and proved that the LS estimates do not converge to their true values). This is proven in Theorem 2.1.
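The contrast between bookcase no. 2 and the first nonstationary case can be illustrated numerically. The following is a minimal sketch, not the author's code: it assumes that regression (1.1) is the simple regression of $y_t$ on a constant and $x_t$, uses standard normal innovations, sets $X_0 = 0$, and picks hypothetical values for `mu_x` and `beta_y` (only $\rho_x = 4$ and $\mu_y = 4.20$ echo values quoted in the Monte Carlo section). The helper names `ls_slope` and `simulate_slope` are introduced here for illustration.

```python
import numpy as np


def ls_slope(x, y):
    """Slope of the LS regression of y on a constant and x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate_slope(T, nonstationary, rng, mu_x=0.5, mu_y=4.2,
                   beta_y=2.0, rho_x=4.0):
    """Draw one sample of size T and return the LS slope estimate.

    mu_x and beta_y are hypothetical illustration values.
    """
    u_x = rng.standard_normal(T)
    u_y = rng.standard_normal(T)
    if nonstationary:
        x = np.cumsum(mu_x + u_x)   # DGP (2.2) with X_0 = 0 (assumption)
    else:
        x = mu_x + u_x              # DGP (2.1)
    # DGP (2.4): the error term u_y + rho_x * u_x is correlated with x.
    y = mu_y + beta_y * x + u_y + rho_x * u_x
    return ls_slope(x, y)


if __name__ == "__main__":
    rng = np.random.default_rng(12345)
    for T in (100, 1000, 10000):
        err_stationary = simulate_slope(T, False, rng) - 2.0
        err_nonstationary = simulate_slope(T, True, rng) - 2.0
        print(T, round(err_stationary, 4), round(err_nonstationary, 4))
```

Under these assumptions the estimation error in the stationary design should stay close to $\rho_x$, whereas in the random-walk-with-drift design it should shrink towards zero as $T$ grows.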
(ii) Let $x_t$ be generated by (2.3). The innovations of both DGPs, $u_{z,t}$, for $z = y, x$, are independent white noises with zero mean and constant variance $\sigma_z^2$; use $y_t$ and $x_t$ to estimate regression (1.1) by LS. Hence, as $T \to \infty$,

Proof. See Appendix A.
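A heuristic order-of-magnitude argument may help to see why consistency survives the failure of orthogonality. It is sketched here for DGPs (2.2) and (2.4) with $\mu_x \neq 0$, assuming that regression (1.1) is the simple regression of $y_t$ on a constant and $x_t$, and writing $\xi_{x,t}$ for the accumulated innovations of $x_t$. This is a sketch under those assumptions, not the proof given in Appendix A.

```latex
\begin{align*}
\hat{\beta} - \beta_y
  &= \frac{\sum_{t=1}^{T} (x_t - \bar{x})\,(u_{y,t} + \rho_x u_{x,t})}
          {\sum_{t=1}^{T} (x_t - \bar{x})^2},
  \qquad x_t = X_0 + \mu_x t + \xi_{x,t}, \\[4pt]
\sum_{t=1}^{T} (x_t - \bar{x})^2
  &= \mu_x^2\,\frac{T(T^2 - 1)}{12} + O_p\!\left(T^{5/2}\right), \\[4pt]
\sum_{t=1}^{T} (x_t - \bar{x})\,(u_{y,t} + \rho_x u_{x,t})
  &= \mu_x \sum_{t=1}^{T} (t - \bar{t})\,(u_{y,t} + \rho_x u_{x,t}) + O_p(T)
   = O_p\!\left(T^{3/2}\right), \\[4pt]
\hat{\beta} - \beta_y
  &= O_p\!\left(T^{-3/2}\right) \xrightarrow{p} 0 .
\end{align*}
```

In bookcase no. 2, by contrast, numerator and denominator are both of order $T$, so their ratio converges to $\rho_x \sigma_x^2 / \sigma_x^2 = \rho_x$ rather than to zero. The trend in the regressor makes the denominator grow faster than the correlation-induced term in the numerator.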
These asymptotic results show that a relationship between the innovations of $y_t$ and $x_t$, as stated by DGPs (2.2), (2.3), and (2.4), does not obstruct the consistency of LS estimates when the variables are nonstationary and cointegrated (our results are in line with those of [8]). In other words, the failure of the orthogonality assumption does not preclude adequate asymptotic properties of LS. Furthermore, it can be said that $x_t$ is weakly exogenous for the estimation of $\mu_y$ and $\beta_y$ but not for the estimation of $\sigma^2$. The formula of the variance is noteworthy, and the asymptotic expression of $t_{\beta}$ depends on the values of $\sigma_x^2$, $\sigma_y^2$, and $\rho_x$.

In order to emphasize the relevance of this result, we modified the DGPs of the variables in an effort to strengthen the link between the DGPs and the literature on simultaneous equations. The modifications are twofold and appear in the following propositions. As in Theorem 2.1, the results in the propositions are derived under the assumption that the innovations are i.i.d. processes.
Proposition 2.2. Let $y_t$ and $x_t$ be generated by system (2.5), where $u_{z,t}$, for $z = x, y$, are independent white noises with zero mean and variance $\sigma_z^2$. Let these variables be used to estimate regression (1.1) by LS. Hence, as $T \to \infty$,

Proof. See Appendix A.
Proposition 2.3. Let $y_t$ and $x_t$ be generated by system (2.6), where $u_{z,t}$, for $z = x, y$, are independent white noises with zero mean and variance $\sigma_z^2$, and $\xi_{x,t} = \sum_{i=0}^{t} u_{x,i}$. Let these variables be used to estimate regression (1.1) by LS. Hence, as $T \to \infty$,

Proof. See Appendix A.
The two systems, represented in (2.5) and (2.6), bear a striking resemblance to classical examples of simultaneous equations in econometrics. The fundamental variations are (i) a deterministic trend in the variable $x_t$ in system (2.5) and (ii) a stochastic as well as a deterministic trend in system (2.6). The asymptotics of the LS estimates do not show significant differences from those in Theorem 2.1. Note, however, that $x_t$ is weakly exogenous for the estimation of $\mu_y$, $\beta_y$, and $\sigma_y^2$. The main result is in fact identical: the failure of orthogonality between $x_t$ and the error term does not preclude the estimates from converging to their true values.
The asymptotic properties of the LS estimators clearly provide an encouraging perspective in time-series econometrics. Notwithstanding, we should bear in mind that asymptotic properties may be a poor finite-sample approximation. In order to observe the behavior of the LS estimates in finite samples, we present two Monte Carlo experiments.

Firstly, we represent graphically the convergence of $\hat{\beta}$ towards its true value, $\beta_y$. In accordance with the asymptotic results, $\hat{\beta} - \beta_y \xrightarrow{p} 0$ as $T \to \infty$. We reproduce the behavior of the latter difference in Figure 1. The variables $x$ and $y$ are generated according to (2.3) and (2.4), respectively. The sample size varies from 50 to 700 whilst $\beta_y$ goes from −5 to 5. The remaining parameters appear below the figure. A brief glance at Figure 1 reveals that the asymptotic results stated in Theorem 2.1 approximate the finite-sample results well for $T > 150$. For smaller sample sizes, the difference between the parameter and its estimate usually corresponds to approximately 1.5% or less of the value of the former (we tried different variables in the y axis, $\rho_x$, $\rho_{y1}$, $\rho_{y2}$, $\sigma_x^2$, $\sigma_y^2$, ...; all of these trials produced similar figures).

The second Monte Carlo experiment is built upon the same basis. In Table 1, each cell indicates the sample mean of $\hat{\beta} - \beta_y$ and, below it, its estimated standard deviation (in parentheses). The number of replications is 10,000. The parameter values used in the simulation are explicit within the table.

The variables, $x$ and $y$, are generated according to (2.3) and (2.4), respectively. Sample size ranges from $T = 50$ to $700$; $\rho_{y1} = -0.15$; $\rho_x = 4$; $\sigma_y^2 = \sigma_x^2 = 1$; $\mu_y = 4.20$; the error term is a white noise with variance $\sigma^2 = 1$.
Table 1 shows that the LS estimates of a nonstationary relationship with a nonorthogonality problem quickly converge to their true values. With a sample size as small as 50 observations, the difference between $\beta_y$ and its estimate averages, at most, 0.015, which represents a deviation from the true value of 1.5%; in many other cases, the deviation is even smaller, of order $10^{-3}$ to $10^{-4}$. These differences tend to diminish further as the sample size grows. In fact, when there are 700 observations, the order of magnitude of such differences oscillates between $10^{-5}$ and $10^{-8}$. We performed the same experiment with autocorrelated disturbances (AR(1) with $\phi = 0.7$; data available upon request). Using such disturbances severely deteriorates the efficiency of the LS estimates, although $\hat{\beta} - \beta_y$ still converges to zero. We do not focus on this issue because, as mentioned earlier, neither autocorrelation nor heteroskedasticity is under scrutiny in this work.
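The structure of such an experiment can be sketched in a few lines. The code below is a simplified illustration and does not reproduce Table 1: it generates $x$ from DGP (2.2) rather than (2.3), uses 2,000 replications instead of 10,000, assumes $X_0 = 0$ and standard normal innovations, and picks hypothetical values for `mu_x` and `beta_y`; only $\rho_x = 4$, $\mu_y = 4.20$, and the unit variances follow the values quoted above. The function name `mc_bias` is introduced here for illustration.

```python
import numpy as np


def mc_bias(T, reps=2000, mu_x=0.5, mu_y=4.20, beta_y=1.0, rho_x=4.0, seed=0):
    """Mean and standard deviation of (beta_hat - beta_y) over `reps` samples.

    mu_x and beta_y are hypothetical illustration values.
    """
    rng = np.random.default_rng(seed)
    u_x = rng.standard_normal((reps, T))
    u_y = rng.standard_normal((reps, T))
    # Each row is one replication: random walk with drift, X_0 = 0 (assumption).
    x = np.cumsum(mu_x + u_x, axis=1)
    # DGP (2.4): the regressor is correlated with the error term through u_x.
    y = mu_y + beta_y * x + u_y + rho_x * u_x
    # LS slope of y on a constant and x, computed row by row.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    beta_hat = (xc * yc).sum(axis=1) / (xc * xc).sum(axis=1)
    diff = beta_hat - beta_y
    return float(diff.mean()), float(diff.std(ddof=1))


if __name__ == "__main__":
    for T in (50, 100, 300, 700):
        mean_diff, sd_diff = mc_bias(T)
        print(f"T={T:4d}  mean={mean_diff: .6f}  sd=({sd_diff:.6f})")
```

Each printed row mirrors the layout of a table cell: the sample mean of the difference followed by its standard deviation in parentheses. Both quantities should shrink as $T$ increases if the consistency result carries over to this simplified design.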

Concluding Remarks
Using cointegrated variables in an LS regression where the regressor is not independent of the error term does not preclude the method from yielding consistent estimates. In other words, it is proven that, under these circumstances, the regressor remains weakly exogenous (as defined by [12]) for the estimation of $\mu_y$ and $\beta_y$, and also for that of $\sigma_y^2$ in systems (2.5) and (2.6). Furthermore, the finite-sample evidence indicates that LS provides good estimates even in samples of a practical size.
Notwithstanding, one should note the striking resemblance between the properties of the DGPs used in the propositions and those of variables belonging to a classical simultaneous-equation model. It may be possible that the estimation of such models, even if the macroeconomic variables they are fed with are not stationary, would yield correct estimates. Of course, such a possibility presupposes the absence of structural shifts, parameter instability, omission of a relevant variable, or any other major assumption failure.
Here, $(X'X)^{-1}_{22}$ is the element in row 2, column 2, of the $(X'X)^{-1}$ matrix.
To obtain the asymptotics of $\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}^2$, $t_{\beta}$, and $R^2$, we need to ascertain the behavior of the following expressions when $T \to \infty$: $\sum x_t$, $\sum y_t$, $\sum x_t^2$, $\sum y_t^2$, and $\sum x_t y_t$. The behavior of these expressions varies depending on the DGP of the variables $x_t$ and $y_t$. We present such behavior for the DGPs underlying Theorem 2.1 and Propositions 2.2 and 2.3. All of the orders in probability stated in the underbraced sums can be found in [5, 13, 16-18]. It is important to clarify that the computation of the asymptotics follows [5] and was assisted by Mathematica; we thus reproduce below the expressions as Mathematica code.
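The orders in probability of the building blocks can be checked numerically. The sketch below is an illustration only, written in Python rather than Mathematica: it assumes standard normal white noise $u_t$ and the partial sum $\xi_{t-1} = u_1 + \dots + u_{t-1}$, and it verifies that three sums, once divided by the standard rates $T^{3/2}$, $T^2$, and $T$, have distributions that do not drift with $T$. The function name `normalized_sums` and the dictionary keys are introduced here for illustration.

```python
import numpy as np


def normalized_sums(T, reps=4000, seed=0):
    """Simulate sums of trending terms, each scaled by its order in T."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((reps, T))     # white noise, unit variance
    t = np.arange(1, T + 1)                # deterministic trend 1, ..., T
    xi_lag = np.cumsum(u, axis=1) - u      # xi_{t-1} = u_1 + ... + u_{t-1}
    return {
        "t_u": (u * t).sum(axis=1) / T**1.5,        # sum t*u_t        = O_p(T^{3/2})
        "xi2": (xi_lag**2).sum(axis=1) / T**2,      # sum xi_{t-1}^2   = O_p(T^2)
        "xi_u": (xi_lag * u).sum(axis=1) / T,       # sum xi_{t-1}*u_t = O_p(T)
    }


if __name__ == "__main__":
    for T in (200, 800):
        sums = normalized_sums(T)
        summary = {k: (round(float(v.mean()), 3), round(float(v.std()), 3))
                   for k, v in sums.items()}
        print(T, summary)
```

If the rates are right, the means and standard deviations printed for the two sample sizes should be close to each other; a wrong normalization would make them explode or collapse as $T$ grows.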

A.1. Theorem 2.1: First Result
The expressions needed to compute the asymptotic values of $\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}^2$, and $R^2$ involve $\xi_{y,t} = \sum_{i=1}^{t} u_{y,i}$, where $Y_0$ is an initial condition. The sums including solely the deterministic trend component are $\sum_{t=1}^{T} t = T(T+1)/2$ and $\sum_{t=1}^{T} t^2 = T(T+1)(2T+1)/6$. The code in this case is represented below. To understand it, a brief glossary is required and appears in Table 2. These expressions were written as Mathematica 7.0 code.

A.2

The code in this case is represented below.
Clear All, S t 1 2

Table 1: Finite-sample behavior of $\hat{\beta} - \beta_y$: mean and standard deviation.

Table 2: Glossary of the Mathematica code.
The expressions needed to compute the asymptotic values of $\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}^2$, and $R^2$ appear below. Note that $\sum y_t$, $\sum y_t^2$, and $\sum y_t x_t$ are identical to the ones presented in the previous appendix.

The constants are $C_0 = 1 - \beta_x \beta_y$, $C_1 = (\mu_x + \mu_y \beta_x)/C_0$, and $C_2 = \gamma_x / C_0$. The expressions needed to compute the asymptotic values of $\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}^2$, and $R^2$ are

$S_x = \cdots\, T^{1/2}$,
$S_y = \mu_y T + \beta_y S_x + S_{uy} T^{1/2}$,
$S_{xuy} = C_1 S_{uy} T^{1/2} + C_2 S_{uyt} T^{3/2} + \beta_x \cdots$,
$S_{y2} = \cdots\, S_{x2} + S_{uy2} T + 2 \mu_y \beta_y S_x + 2 \mu_y S_{uy} T^{1/2} + 2 \beta_y S_{xuy}$,
$S_{xy} = \mu_y S_x + \beta_y S_{x2} + S_{xuy}$.

As in the previous appendix, first rewrite DGP (2.5) with $D_0 = 1 - \beta_x \beta_y$, $D_1 = (X_0 + \mu_y \beta_x)/D_0$, and $D_2 = \mu_x / D_0$. The expressions needed to compute the asymptotic values of $\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}^2$, and $R^2$ appear below. Note that $\sum y_t$, $\sum y_t^2$, and $\sum y_t x_t$ are identical to the ones presented in the previous appendix and have therefore been omitted.

Rcden = Factor[Limit[Expand[P52den/T^K14], T → ∞]];
Rc = Factor[Expand[Rcnum/Rcden * T^K13/T^K14]];
(* t β *)
tβ = FullSimplify[Bpar/(Vpar * R42)^(−1/2)]
(* 1 − R^2 *)
P70 = FullSimplify[Rc]