Wavelet-M-Estimation for Time-Varying Coefficient Time Series Models

This paper proposes wavelet-M-estimation for time-varying coefficient time series models by using a robust-type wavelet technique, which can adapt to local features of the time-varying coefficients and does not require smoothness of the unknown coefficient functions. The wavelet-M-estimation has the desired asymptotic properties and can be used to estimate conditional quantiles and to robustify the usual mean regression. Under mild assumptions, the Bahadur representation and the asymptotic normality of the wavelet-M-estimation are established.


Introduction
The analysis of nonlinear and nonstationary time series, particularly with a time trend, has been very popular over the last two decades, because most time series data, notably in economics and finance, are nonlinear, nonstationary, or trending. Many nonlinear and nonstationary parametric, semiparametric, and nonparametric time series models have been proposed in the econometrics and statistics literature; see, for example, [1][2][3][4][5][6] and the references therein. One of the most attractive is the time-varying coefficient time series model, formulated as

Y_i = X_i^T β(t_i) + ε_i,  t_i = i/n,  i = 1, . . . , n,  (1)

where Y_i is the response, β(·) = (β_1(·), . . . , β_p(·))^T is a p-dimensional vector of unspecified coefficient functions defined on [0, 1], X_i = (X_{i1}, . . . , X_{ip})^T is a p-dimensional random vector, and ε_i is the random error.

Many smoothers have been proposed to estimate the time-varying coefficient β(·) in model (1), and the resulting estimators have been analyzed in large-sample theory. Robinson [4] developed the Nadaraya-Watson method and showed the consistency and asymptotic normality of the local constant estimator under the assumptions that the time series X_i is stationary α-mixing and the errors ε_i are i.i.d. and independent of X_i. Cai [1] proposed the local linear approach and established asymptotic properties of the proposed estimators under α-mixing conditions and without specifying the error distribution. Hoover et al. [7] gave smoothing spline and locally weighted polynomial methods for longitudinal data and presented asymptotic properties. Li et al. [8] and Fan et al. [9] made statistical inference for the partially time-varying coefficient (errors-in-variables) model, respectively. These estimators are all based on local least squares, which is efficient for Gaussian errors but may perform poorly in the presence of extreme outliers. In addition, these methods all rest on the important assumption that β(·) is highly smooth.
In reality, this assumption may not be satisfied. In some practical areas, such as signal and image processing, objects are frequently inhomogeneous. More robust estimation methods are therefore required.
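To fix ideas, model (1) can be simulated in a few lines; the coefficient functions and error law below are hypothetical choices for illustration, not taken from the paper. A jump in β_2(·) and heavy-tailed errors depict exactly the setting that motivates a robust, wavelet-based estimator:

```python
import numpy as np

# Hypothetical coefficient functions on [0, 1]: beta_1 is smooth,
# beta_2 has a jump at t = 0.5 (a local feature wavelets can adapt to).
def beta(t):
    return np.array([np.sin(2 * np.pi * t), 1.0 if t > 0.5 else -1.0])

rng = np.random.default_rng(0)
n, p = 500, 2
t = np.arange(1, n + 1) / n           # design points t_i = i/n
X = rng.normal(size=(n, p))           # stationary regressors X_i
eps = rng.standard_t(df=3, size=n)    # heavy-tailed errors motivate M-estimation
Y = np.array([X[i] @ beta(t[i]) for i in range(n)]) + eps  # Y_i = X_i^T beta(t_i) + eps_i
print(Y.shape)  # (500,)
```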
In this paper, we propose an M-type regression based on the wavelet technique, called wavelet-M-estimation (WME), for the time-varying coefficient time series model (1). There is considerable literature devoted to M-estimation for nonparametric regression models. Fan et al. [10] obtained asymptotic normality of the M-estimator for the local linear fit under independent observations. Hong [11] established a Bahadur representation for local polynomial estimates in nonparametric M-regression under i.i.d. random errors. Jiang and Mack [12] and Cai and Ould-Saïd [13] considered the local polynomial M-estimator and the local linear M-estimator for dependent observations and showed some asymptotic theory for the proposed estimators. For varying coefficient models, Tang and Cheng [14] showed asymptotic normality of local M-estimators for longitudinal data, among others. However, the above works require smoothness of the function being estimated, for example, a continuous first or second derivative. With wavelets, such assumptions are relaxed considerably. Because wavelet bases can adapt to local features of curves in both the time and frequency domains, wavelets provide a technique for analyzing functions with discontinuities or sharp spikes, and therefore often yield better estimators than local kernel methods. Great achievements have been made with wavelets in nonparametric models; see, for example, Antoniadis et al. [15]; Donoho and Johnstone [16]; Hall and Patil [17]; Härdle et al. [18]; Vidakovic [19]; Zhou and You [20]; Lu and Li [21]; and Zhou et al. [22]. To the best of our knowledge, however, M-type estimation based on the wavelet technique has not been developed for time-varying coefficient models. We use a general formulation to treat mean regression, median regression, quantile regression, and robust mean regression in one setting via WME. The article is organized as follows.
Section 2 describes wavelet analysis, α-mixing sequences, and the wavelet-M-estimation for time-varying coefficients. Section 3 presents the Bahadur representation and asymptotic normality of the WME under an α-mixing stationary time series sequence and states some applications of the main results. Technical lemmas and the proofs of the main results are given in Section 4.

Wavelet-M-Estimation
As a central notion in wavelet analysis, multiresolution analysis (MRA) plays an important role in constructing a wavelet basis. An MRA is a sequence of closed subspaces V_n, n ∈ Z, of the square integrable function space L^2(R) satisfying the following properties:

(i) V_n ⊂ V_{n+1} for all n ∈ Z.
(ii) ∩_{n ∈ Z} V_n = {0} and ∪_{n ∈ Z} V_n is dense in L^2(R).
(iii) f(·) ∈ V_n if and only if f(2·) ∈ V_{n+1}.
(iv) There exists a scaling function ϕ ∈ V_0 whose integer translates span the space V_0 and for which the set {ϕ(· − k), k ∈ Z} is an orthonormal basis of V_0.

By dilation and translation of ϕ(·), we have ϕ_{m,k}(t) = 2^{m/2} ϕ(2^m t − k), m, k ∈ Z. From (iii) and (iv), {ϕ_{m,k}, k ∈ Z} is an orthonormal basis of V_m. According to the Moore-Aronszajn theorem [23], V_m is a reproducing kernel Hilbert space with kernel

K_m(t, s) = Σ_k ϕ_{m,k}(t) ϕ_{m,k}(s).  (2)

For any function f ∈ V_m, f(t) = ∫ K_m(t, s) f(s) ds. Denote the kernel of V_0 by K_0(t, s) = Σ_k ϕ(t − k) ϕ(s − k); then K_m(t, s) = 2^m K_0(2^m t, 2^m s). For more details, we refer to Vidakovic [19].
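As a minimal sketch, the reproducing kernel K_m can be evaluated for the Haar scaling function ϕ = 1_[0,1); this choice is for illustration only, since the conditions later in the paper require a smoother ϕ:

```python
import numpy as np

# Haar scaling function phi = indicator of [0, 1) (illustrative choice only).
def phi(u):
    return np.where((u >= 0) & (u < 1), 1.0, 0.0)

# K_m(t, s) = 2^m * sum_k phi(2^m t - k) * phi(2^m s - k).
# For t, s in [0, 1], only k = 0, ..., 2^m - 1 contribute.
def K_m(t, s, m):
    k = np.arange(2**m)
    return 2**m * np.sum(phi(2**m * t - k) * phi(2**m * s - k))

# With Haar, K_m(t, s) = 2^m when t and s share a dyadic bin of width 2^-m, else 0.
m = 4
print(K_m(0.10, 0.11, m))  # same bin -> 16.0
print(K_m(0.10, 0.20, m))  # different bins -> 0.0
```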
This motivates us to define a wavelet-M-estimator of β(t) by

β̂(t) = argmin_b Σ_{i=1}^n ρ(Y_i − X_i^T b) ∫_{A_i} K_m(t, s) ds,  (4)

where ρ(·) is a given convex function and the A_i are intervals that partition [0, 1] with t_i ∈ A_i. One way of defining the A_i is to take A_i = [s_{i−1}, s_i), where s_0 = 0, s_n = 1, and s_i = (t_i + t_{i+1})/2 for 1 ≤ i ≤ n − 1.

As an alternative to (4), the following equation is also used to define the WME of β(t), that is, to find b satisfying

Σ_{i=1}^n ψ(Y_i − X_i^T b) X_i ∫_{A_i} K_m(t, s) ds = 0,  (5)

where 0 is a p-dimensional zero vector. Equation (5) arises naturally from (4) by taking the partial derivatives of (4) with respect to b and setting them to zero when ρ(·) is continuously differentiable, i.e., ψ(·) = ρ′(·). In this paper, we apply suitably chosen functions ψ(·) in (5), which covers many interesting cases such as least-squares estimation, least absolute deviation estimation, and quantile regression. See the monographs of Huber and Ronchetti [24] and Koenker [25] for more details on the robustness of M-estimation and on quantile regression, respectively.

Before stating the main results, we give the definition of α-mixing dependence, which is needed to establish our asymptotic theory for trending time-varying coefficient time series models. Throughout, we assume that {(X_i, ε_i)} is a stationary α-mixing sequence. Recall that a sequence {ζ_k, k ≥ 1} is said to be α-mixing (or strong mixing) if the mixing coefficients

α(m) = sup_{k ≥ 1} sup{|P(A ∩ B) − P(A)P(B)| : A ∈ F_1^k, B ∈ F_{k+m}^∞}

converge to zero as m → ∞, where F_l^k denotes the σ-field generated by {ζ_i, l ≤ i ≤ k}. The notion of α-mixing is widely adopted in the study of nonparametric regression models. It is reasonably weak and is known to be fulfilled by many stochastic processes, including many familiar linear and nonlinear time series models. We refer to the monographs of Doukhan [26] and Fan and Yao [27] for properties and further mixing conditions.

Discrete Dynamics in Nature and Society
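A hedged numerical sketch of the pointwise estimator in (4): minimize Σ_i ρ(Y_i − X_i^T b) w_i(t) over b, where the wavelet weight w_i(t) approximates ∫_{A_i} K_m(t, s) ds. The Haar kernel, the Huber loss with c = 1.345, and the IRLS solver below are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

def huber_psi(r, c=1.345):
    # psi = rho' for the Huber loss (c = 1.345 is a conventional tuning constant)
    return np.clip(r, -c, c)

def wme_at(t, Y, X, t_grid, m, n_iter=50):
    # Haar-kernel weights: w_i ~ K_m(t, t_i)/n, i.e. 2^m/n on the dyadic
    # bin of width 2^-m containing t, and 0 elsewhere.
    w = np.where(np.floor(2**m * t_grid) == np.floor(2**m * t), 2.0**m, 0.0) / len(Y)
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = Y - X @ b
        # iteratively reweighted least squares: weights psi(r)/r, guarded at r = 0
        u = np.where(np.abs(r) > 1e-12, huber_psi(r) / r, 1.0) * w
        WX = X * u[:, None]
        b = np.linalg.solve(X.T @ WX + 1e-10 * np.eye(X.shape[1]), WX.T @ Y)
    return b

# Simulated data: intercept coefficient sin(2*pi*t), constant slope 1 (hypothetical).
rng = np.random.default_rng(1)
n = 2000
t_grid = np.arange(1, n + 1) / n
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B = np.array([[np.sin(2 * np.pi * ti), 1.0] for ti in t_grid])
Y = np.einsum('ij,ij->i', X, B) + 0.2 * rng.standard_t(df=3, size=n)
b_hat = wme_at(0.25, Y, X, t_grid, m=4)
print(b_hat.shape)  # (2,)
```

The weight here uses the rectangle-rule approximation K_m(t, t_i)/n to the integral over A_i; any other ψ satisfying the conditions below could be substituted for `huber_psi`.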

Asymptotic Theory
We first list the regularity conditions needed in the proofs of the theorems, although some of them might not be the weakest possible.
(A2) ρ(·) is a convex function, and ψ(·) is assumed to be any choice of the derivative of ρ(·). Denote by D the set of discontinuity points of ψ(·). The common distribution function F of ε_i satisfies F(D) = 0.

(A3) ψ(·) satisfies the following conditions, uniformly for z in a neighborhood of X: (i) E[ψ(ε_1) | X_1 = z] = 0; (ii) E[ψ^2(ε_1) | X_1 = z] is finite and continuous in z; (iii) E[ψ(ε_1 + u) | X_1 = z] is differentiable at u = 0 with positive derivative κ(z).

(A4) The time-varying coefficients β_j(·), j = 1, . . . , p, and the scaling function ϕ(·) in the wavelet kernel satisfy the following conditions: (i) β_j(·) belongs to the Sobolev space of order ν > 1/2; (ii) β_j(·) satisfies a Lipschitz condition of order γ > 0; (iii) ϕ(·) has compact support, is in the Schwarz space of order l > ν, and satisfies a Lipschitz condition of order l.
(A6) The tuning parameter m grows with n at a suitable rate; in particular, 2^m/n → 0 as n → ∞.

Some remarks on the conditions are in order.
Remark 1. Condition (A1) contains standard requirements on the moments and the mixing coefficient of an α-mixing sequence. It is well known that, among various mixing conditions, for example, α-, ρ-, and φ-mixing, α-mixing is reasonably weak and holds for many stochastic processes, including many familiar linear and nonlinear time series models. (A1)(i) is a very common condition; see Cai et al. [28]; Cai and Ould-Saïd [13]; and Fan and Yao [27]; among others.
Remark 2. Conditions (A2) and (A3) are often imposed to establish the large-sample theory of M-estimation in parametric or nonparametric models; see, for example, Bai et al. [29]; Cai and Ould-Saïd [13]; and Lin et al. [30]. They are mild and cover some well-known special cases, such as least-squares estimation, the Huber loss, and the quantile loss. Some special examples are given as follows.
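For concreteness, the loss/score pairs (ρ, ψ) just mentioned can be written out; the tuning constants τ = 0.9 and c = 1.345 are illustrative choices, and at kinks ψ is taken as one member of the subdifferential:

```python
import numpy as np

# (rho, psi) pairs: rho convex, psi a choice of its derivative.
losses = {
    # least squares: mean regression
    "ls":    (lambda u: 0.5 * u**2, lambda u: u),
    # least absolute deviation: median regression
    "lad":   (lambda u: np.abs(u), lambda u: np.sign(u)),
    # check loss: tau-th quantile regression (Koenker)
    "quant": (lambda u, tau=0.9: u * (tau - (u < 0)),
              lambda u, tau=0.9: tau - (u < 0) * 1.0),
    # Huber loss: robustified mean regression
    "huber": (lambda u, c=1.345: np.where(np.abs(u) <= c, 0.5 * u**2,
                                          c * np.abs(u) - 0.5 * c**2),
              lambda u, c=1.345: np.clip(u, -c, c)),
}

u = np.array([-2.0, -0.5, 0.5, 2.0])
for name, (rho, psi) in losses.items():
    print(name, rho(u), psi(u))
```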
We can write the objective function in equation (4) in a convenient reparametrized form and introduce some notation. The first theorem is crucial for establishing the asymptotic properties of the WME.
With the help of Theorem 1, we can establish the Bahadur representation of the WME.
With the help of Theorem 2, we can establish the asymptotic normality of the WME.

Remark 5.
To obtain an asymptotic expansion of the variance and asymptotic normality, we need to consider an approximation to β(t) based on its values at dyadic points of order m, as Antoniadis et al. [15] have done. The main reason is that the variance of β̂(t) as a function of t is unstable; this can be avoided by using the dyadic points t^(m). See Antoniadis et al. [15] for details. From Theorems 2 and 3, we obtain the uniform weak consistency of the WME. Next, we give some special cases as corollaries of Theorem 3.
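The dyadic approximation in question replaces t by the left endpoint t^(m) = ⌊2^m t⌋/2^m of the dyadic interval of width 2^−m containing it; a two-line sketch:

```python
import numpy as np

# Dyadic point of order m: left endpoint of the bin [k/2^m, (k+1)/2^m) containing t.
def dyadic(t, m):
    return np.floor(2.0**m * t) / 2.0**m

print(dyadic(0.3, 4))  # 0.25
print(dyadic(0.3, 8))  # 0.296875
```

Note that t^(m) differs from t by at most 2^−m, so the approximation sharpens as m grows.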

Technical Lemmas and Proofs
In the following, C denotes a positive constant that may change from line to line in the proofs.

Lemma 1 (see Antoniadis et al. [15] and Walter [31]). Suppose that (A4) holds. We have

Lemma 4 (see Pollard [33]). Let {λ_n(θ), θ ∈ Θ} be a sequence of random convex functions defined on a convex, open subset Θ of R^d. Suppose λ(·) is a real-valued function on Θ for which λ_n(θ) → λ(θ) in probability for each fixed θ in Θ. Then, for each compact subset K of Θ,

sup_{θ ∈ K} |λ_n(θ) − λ(θ)| → 0 in probability.

For simplicity, we introduce some notation before the proofs. We are interested in the asymptotic behavior of θ̂_n, which can be obtained from the reparametrized optimization objective G_n(θ; t); furthermore, denote its centered remainder by R_n(θ; t) and its linear part by W_n(t). Lemmas 5-7 below give the asymptotic behaviors of R_n(θ; t), E[G_n(θ; t)], and W_n(t) uniformly in t ∈ [0, 1], respectively.

Lemma 5. Under the assumptions of Theorem 1, for fixed θ, it holds that
where R n (θ; t) is defined by (28).
Proof of Lemma 5. From the definition of R_n(θ; t), we obtain a decomposition into two terms, which we bound by the convexity of ρ(·). Let a_n = (2^m/n)(n^{−γ} + η_m) + (2^m/n)^{5/2}. From conditions A1(ii), A3(ii), and A4(ii), and Lemma 1, we bound the first term. To obtain an upper bound for the second term on the right-hand side of (34), we split it into two terms as follows.
where d_n is a sequence of positive integers such that d_n = O(log n) as n → ∞.
Proof of Lemma 6. By condition A2(i), Lemma 2, and Lemma 1 in Bai et al. [29], we obtain the stated bound. This finishes the proof of Lemma 6.

Lemma 7. Under the assumptions of Theorem 3, it holds that
where W_n^d(t) = W_n(t^(m)) with t^(m) = ⌊2^m t⌋/2^m, and W_n(t) is as defined above.
First, we calculate its variance-covariance structure. Note that E(ψ(ε_1) | X_1) = o(1) by condition A3(i). We have

Var(W_n(t)) = Σ_i Var(ξ_i) + Σ_{i ≠ j} Cov(ξ_i, ξ_j).  (46)

By the same argument as in the proof of Lemma 5, we bound the covariance terms. Now, we treat the first term on the right-hand side of (46). Using the compact support and Lipschitz properties of ϕ, one can show that K_0(t, ·) is Lipschitz uniformly in t, and the Lipschitz property of κ(·) yields a corresponding bound, where u_i and v_i belong to A_i. By condition A3(iii), we obtain a similar bound, again with u_i and v_i belonging to A_i. By (48) and (49) and a standard calculation, the first term can be evaluated.

Second, we shall show (44). Note that E(ψ(ε_1) | X_1) = o(1), which reduces (44) to a statement about ξ_i^d, where ξ_i^d is defined like ξ_i with t^(m) in place of t. As ψ(·) is not necessarily bounded, we employ a truncation method. Denote ψ_L(·) = ψ(·)I(|ψ(·)| ≤ L), define ξ_i^L as before with ψ replaced by ψ_L, and let Γ_x be defined as before with c(·) replaced by E(ψ_L^2(ε_1) | X_1 = ·). Similar to the proof of Theorem 2 in Cai et al. [28], by using Doob's large-block and small-block technique, we can show the stated convergence by letting first n → ∞ and then L → ∞. Let Γ_x be defined as before with c(·) replaced by E[ψ^2(ε_1)I(ψ(ε_1) > L) | X_1 = ·].
From Lemma 7, it is easy to see that W_n(t) is stochastically bounded. Since the convex function λ_n(θ) = G_n(θ; t) − θ^T W_n(t) converges in probability to the convex function λ(θ) = (1/2) θ^T Ω_x θ, it follows from the convexity Lemma 4 that, for any compact set K,

sup_{θ ∈ K} |λ_n(θ) − λ(θ)| → 0 in probability.  (60)

Notice that the convexity lemma strengthens the pointwise result to uniform convergence on compact subsets of R^p. This completes the proof of Theorem 1. □

Proof of Theorem 2. To obtain the Bahadur representation of the WME, the idea behind the proof is to approximate G_n(θ; t) by a quadratic function whose minimizing value has a known asymptotic behavior, and then to show that θ̂_n lies close enough to this minimizing value to share its asymptotic behavior. We have done the first step, namely, Theorem 1 and Lemma 7. Let θ̄_n = Ω_x^{−1} W_n(t) and c_n^2 = b_n √(log n). Now, we prove the second step. The argument will be complete if we can show, for each ε > 0, that

P(‖θ̂_n − θ̄_n‖ > c_n ε) = o(1).
The argument is similar to the proof of Theorem 1 in Pollard [33], whose method we extend to obtain the Bahadur representation of the WME. From Theorem 1, the compact set K can be chosen to contain a closed ball B(n) with center θ̄_n and radius c_n ε, with probability arbitrarily close to one. Thereby, it implies that