Large-Deviation Results for Discriminant Statistics of Gaussian Locally Stationary Processes

This paper establishes large-deviation principles for discriminant statistics of Gaussian locally stationary processes. First, large-deviation theorems for quadratic forms and for the log-likelihood ratio of a Gaussian locally stationary process with a mean function are proved, and their asymptotics are described by large-deviation rate functions. Second, we consider situations where the processes are misspecified as stationary. In these misspecified cases, we formally construct log-likelihood ratio discriminant statistics and derive large-deviation theorems for them. Since the resulting rate functions are complicated, they are evaluated and illustrated by numerical examples. We find that misspecifying the process as stationary seriously affects the discrimination.


Introduction
Consider a sequence of random variables S_1, S_2, ... converging in probability to a real constant c. By this we mean that Pr{|S_T − c| > ε} → 0 as T → ∞ for all ε > 0. The simplest setting in which to obtain large-deviation results is that of sums of independent identically distributed (i.i.d.) random variables on the real line. For example, we would like to consider the large excursion probabilities of the sample average

S_T = T^{−1} Σ_{i=1}^{T} X_i,

where the X_i, i = 1, 2, ..., are i.i.d. and T approaches infinity. Suppose that E(X_i) = m exists and is finite. By the law of large numbers, S_T converges to m; hence c is merely the expected value of the random process. It is often the case that not only does Pr{|S_T − c| > ε} go to zero, but it does so exponentially fast, that is,

Pr{|S_T − c| > ε} = K(ε, c, T) exp{−T I(ε, c)},

where K(ε, c, T) is a slowly varying function of T relative to the exponential, and I(ε, c) is a positive quantity. Loosely, if such a relationship is satisfied, we say that the sequence {S_T} satisfies a large-deviation principle. Large-deviation theory is concerned primarily with determining the quantity I(ε, c) and, to a lesser extent, K(ε, c, T). The reason for the nomenclature is that, for a fixed ε > 0 and a large index T, a large deviation from the nominal value occurs if |S_T − c| > ε. Large-deviation theory can rightly be considered a generalization or extension of the law of large numbers: the law of large numbers says that certain probabilities converge to zero, while large-deviation theory is concerned with the rate of convergence. Bucklew [1] describes the history of large-deviation results in detail.
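As a hedged illustration of this exponential decay (our own sketch, not part of the paper's development), the probabilities Pr{|S_T − c| > ε} can be estimated by Monte Carlo for i.i.d. N(0, 1) summands, for which m = 0 and Cramér's rate function is I(x) = x²/2; the estimated decay rate −log(p)/T should then approach ε²/2:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, reps = 0.4, 1_000_000
probs, rates = [], []

# For i.i.d. N(0, 1) summands, m = 0 and Cramer's rate function is
# I(x) = x**2 / 2, so Pr{|S_T - 0| > eps} should decay roughly like
# exp(-T * eps**2 / 2) = exp(-0.08 * T).
for T in (25, 50, 100):
    # the sample mean of T i.i.d. N(0, 1) variables is exactly N(0, 1/T)
    s = rng.normal(0.0, 1.0 / np.sqrt(T), size=reps)
    p = np.mean(np.abs(s) > eps)
    probs.append(p)
    rates.append(-np.log(p) / T)
    print(T, p, rates[-1])
```

The estimated rates approach ε²/2 = 0.08 only slowly, reflecting the slowly varying prefactor K(ε, c, T) in front of the exponential.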
There have been a few works on large-deviation theory for time series data. Sato et al. [2] discussed the large-deviation theory of several statistics for short- and long-memory stationary processes. However, large-deviation results for nonstationary processes are still hard to find. Recently, Dahlhaus [3, 4] formulated an important class of nonstationary processes with a rigorous asymptotic theory, which he calls locally stationary. A locally stationary process has a time-varying spectral density whose spectral structure changes smoothly with time. Several papers discuss discriminant analysis for locally stationary processes, e.g., Chandler and Polonik [5], Sakiyama and Taniguchi [6], and Hirukawa [7]. In this paper, we discuss the large-deviation theory of discriminant statistics of Gaussian locally stationary processes. In Section 2 we present the Gärtner-Ellis theorem, which establishes a large-deviation principle for random variables based only upon convergence properties of the associated sequence of cumulant generating functions. Since no assumptions are made about the dependency structure of the random variables, we can apply this theorem to nonstationary time series data. In Section 3, we deal with a Gaussian locally stationary process with a mean function. First, we prove the large-deviation principle for a general quadratic form of the observed stretch. We also give the large-deviation principle for the log-likelihood ratio and the misspecified log-likelihood ratio between two hypotheses. These fundamental statistics are important not only in statistical estimation and testing theory but also in discriminant problems. The above asymptotics are described by the large-deviation rate functions. In our stochastic models, the rate functions are very complicated; thus, in Section 4, we evaluate them numerically. The results demonstrate that misspecification of nonstationarity has serious effects. All the proofs of the theorems presented in Section 3 are given in the Appendix.

Gärtner-Ellis Theorem
Cramér's theorem (e.g., Bucklew [1]) is usually credited as the first large-deviation result. It gives the large-deviation principle for sums of independent identically distributed random variables. One of the most useful and surprising generalizations of this theorem is due to Gärtner [8] and, more recently, Ellis [9]. These authors established a large-deviation principle for random variables based only upon convergence properties of the associated sequence of moment generating functions Φ(ω). Their methods thus allow large-deviation results to be derived for dependent random processes such as Markov chains and functionals of Gaussian random processes. Gärtner [8] assumed throughout that Φ(ω) < ∞ for all ω. By extensive use of convexity theory, Ellis [9] relaxed this fairly stringent condition.
Suppose that we are given an infinite sequence of random variables {Y_T, T ∈ N}. No assumptions are made about the dependency structure of this sequence. Define the normalized cumulant generating functions

ψ_T(ω) ≡ T^{−1} log E[exp(ω Y_T)].    (2.1)

Now let us list two assumptions.

Assumption 2.1. ψ(ω) ≡ lim_{T→∞} ψ_T(ω) exists for all ω ∈ R, where we allow ∞ both as a limit value and as an element of the sequence {ψ_T(ω)}.
Assumption 2.2. ψ(ω) is differentiable on D_ψ ≡ {ω : ψ(ω) < ∞}.

Define the large-deviation rate function by

I(x) ≡ sup_{ω∈R} {ωx − ψ(ω)};

this function plays a crucial role in the development of the theory. Furthermore, define

ψ'(D_ψ) ≡ {ψ'(ω) : ω ∈ D_ψ},

where ψ' denotes the derivative of ψ. Before proceeding to the main theorem, we first state some properties of this rate function.
Property 1. I(x) is convex.
We remark that a convex function I(·) on the real line is continuous everywhere on D_I ≡ {x : I(x) < ∞}, the domain of I(·).

Property 2. I(x) attains its minimum value at m = lim_{T→∞} T^{−1} E(Y_T), and I(m) = 0.
We now state a simple form of a general large-deviation theorem known as the Gärtner-Ellis theorem (e.g., Bucklew [1]): if Assumptions 2.1 and 2.2 hold and (a, b) ⊂ ψ'(D_ψ), then

lim inf_{T→∞} T^{−1} log Pr{Y_T/T ∈ (a, b)} ≥ −inf_{x∈(a,b)} I(x).

Large-deviation theorems are usually expressed as two separate limit theorems: an upper bound for closed sets and a lower bound for open sets. In the case of interval subsets of R, the continuity of I(·) guarantees that the upper bound equals the lower bound. For the applications that we have in mind, interval subsets will be sufficient.
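As a small numerical sketch (our own illustration, not from the paper), the Legendre-transform rate function I(x) = sup_ω{ωx − ψ(ω)} can be approximated on a grid of ω values; for the Gaussian case ψ(ω) = ω²/2 this recovers I(x) = x²/2, with minimum I(m) = 0 at m = 0 as in Property 2:

```python
import numpy as np

def rate_function(psi, x, w_grid):
    # I(x) = sup_w { w*x - psi(w) }, approximated over a finite grid of w values
    return np.max(w_grid[:, None] * x[None, :] - psi(w_grid)[:, None], axis=0)

# Gaussian case: psi(w) = w**2 / 2, whose Legendre transform is I(x) = x**2 / 2
w = np.linspace(-10.0, 10.0, 20001)
x = np.array([-1.0, 0.0, 0.5, 2.0])
I = rate_function(lambda w: w**2 / 2, x, w)
print(I)  # approximately [0.5, 0.0, 0.125, 2.0]
```

The grid supremum is exact here because the maximizing ω = x lies on the grid; in general the grid must cover the region where ωx − ψ(ω) peaks.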

Large-Deviation Results for Locally Stationary Processes
In this section, using the Gärtner-Ellis theorem, we develop the large-deviation principle for some nonstationary time series statistics. When we deal with nonstationary processes, one of the difficult problems is how to set up an adequate asymptotic theory. To overcome this problem, Dahlhaus [3, 4] formulated an important class of nonstationary processes, called locally stationary processes, within a rigorous asymptotic framework. Locally stationary processes have time-varying spectral densities whose spectral structures change smoothly in time. We give the precise definition of locally stationary processes, which is due to Dahlhaus [3, 4].

Definition 3.1. A sequence of stochastic processes X_{t,T} (t = 1, ..., T; T ≥ 1) is called locally stationary with transfer function A°(·) and trend μ(·) if there exists a representation

X_{t,T} = μ(t/T) + ∫_{−π}^{π} exp(iλt) A°_{t,T}(λ) dξ(λ),

where

(i) ξ(λ) is a stochastic process on [−π, π] with cumulants

cum{dξ(λ_1), ..., dξ(λ_k)} = η(λ_1 + ⋯ + λ_k) ν_k(λ_1, ..., λ_{k−1}) dλ_1 ⋯ dλ_k,

where η(λ) = Σ_{j=−∞}^{∞} δ(λ + 2πj) is the period-2π extension of the Dirac delta function. To simplify the problem, we assume in this paper that the process X_{t,T} is Gaussian, namely, that ν_k(λ) = 0 for all k ≥ 3;

(ii) there exist a constant K and a 2π-periodic (in λ) function A : [0, 1] × R → C with A(u, −λ) = conj{A(u, λ)} such that

sup_{t,λ} |A°_{t,T}(λ) − A(t/T, λ)| ≤ K T^{−1}

for all T. A(u, λ) and μ(u) are assumed to be continuous in u.
The function f(u, λ) := |A(u, λ)|² is called the time-varying spectral density of the process. In the following, we always denote by s and t time points in the interval [1, T], while u and v denote time points in the rescaled interval [0, 1]; that is, u = t/T.
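To make the definition concrete, here is a hedged toy sketch (our own example, not the paper's model): a Gaussian time-varying AR(1) process X_{t,T} = a(t/T) X_{t−1,T} + σ(t/T) ε_t, which is locally stationary with time-varying spectral density f(u, λ) = σ(u)² / (2π |1 − a(u) e^{−iλ}|²):

```python
import numpy as np

def simulate_tvar1(T, a, sigma, rng):
    """Simulate a time-varying AR(1): X_t = a(t/T) X_{t-1} + sigma(t/T) eps_t."""
    x = np.zeros(T)
    for t in range(1, T):
        u = t / T  # rescaled time in [0, 1]
        x[t] = a(u) * x[t - 1] + sigma(u) * rng.standard_normal()
    return x

def tv_spectral_density(u, lam, a, sigma):
    """Time-varying AR(1) spectral density f(u, lambda)."""
    return sigma(u) ** 2 / (2 * np.pi * np.abs(1 - a(u) * np.exp(-1j * lam)) ** 2)

rng = np.random.default_rng(1)
a = lambda u: 0.8 * u   # smooth time-varying AR coefficient
sigma = lambda u: 1.0   # constant innovation scale
x = simulate_tvar1(1000, a, sigma, rng)
print(round(tv_spectral_density(0.5, 0.0, a, sigma), 4))  # 1/(2*pi*0.36), about 0.4421
```

The spectral structure at rescaled time u is that of a stationary AR(1) with coefficient a(u), which changes smoothly as u runs over [0, 1].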
We discuss the asymptotics, away from the expectation, of some statistics used for the problem of discriminating between two Gaussian locally stationary processes with specified mean functions. Suppose that {X_{t,T}, t = 1, ..., T; T ≥ 1} is a Gaussian locally stationary process which, under the hypothesis Π_j, has mean function μ_j(u) and time-varying spectral density f_j(u, λ), j = 1, 2. Let X_T = (X_{1,T}, ..., X_{T,T})′ be a stretch of the series {X_{t,T}}, and let p_j(·) be the probability density function of X_T under Π_j (j = 1, 2). The problem is to classify X_T into one of the two categories Π_1 and Π_2 when we do not have any information on the prior probabilities of Π_1 and Π_2.

Initially, we make the following assumption.

Assumption 3.2. (i) We observe a realisation X_{1,T}, ..., X_{T,T} of a Gaussian locally stationary process with mean function μ_j(·) and transfer function A_j°(·) under Π_j, j = 1, 2; (ii) the A_j(u, λ) are uniformly bounded from above and below and are differentiable in u and λ with uniformly continuous derivatives (∂/∂u)(∂/∂λ)A_j; (iii) the μ_j(u) are differentiable in u with uniformly continuous derivatives.
In time series analysis, the class of statistics which are quadratic forms of X_T is fundamental and important. This class includes the first-order terms in the expansion, with respect to T, of the quasi-Gaussian maximum likelihood estimator (QMLE), test statistics, discriminant statistics, and so forth. Assume that G°(·) is the transfer function of a locally stationary process, where the corresponding G satisfies Assumption 3.2(ii), and that g(u) is a continuous function of u which satisfies Assumption 3.2(iii) if we replace A_j by G and μ_j(u) by g(u), respectively, and set Q_T to be the corresponding quadratic form of X_T. Henceforth, E_j(·) stands for the expectation with respect to p_j(·). We first prove the large-deviation theorem for this quadratic form Q_T of X_T. All the proofs of the theorems are in the Appendix.

Next, we consider the log-likelihood ratio statistic. It is well known that the log-likelihood ratio criterion

Λ_T ≡ log{p_2(X_T)/p_1(X_T)}

gives the optimal discrimination rule in the sense that it minimizes the probability of misdiscrimination (Anderson [10]). Set S_T^{(j)}(Λ) ≡ Λ_T − E_j(Λ_T) for j = 1, 2. For the discrimination problem, we give the large-deviation principle for Λ_T.
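As a hedged finite-T illustration of this criterion (our own sketch, with generic mean vectors and covariance matrices standing in for the paper's μ_j and the covariance of X_T under Π_j):

```python
import numpy as np

def log_lik(x, mu, S):
    """Gaussian log-density log p(x) with mean vector mu and covariance S."""
    d = x - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))

def llr(x, mu1, S1, mu2, S2):
    """Lambda_T = log p_2(x) - log p_1(x); positive values favour Pi_2."""
    return log_lik(x, mu2, S2) - log_lik(x, mu1, S1)

rng = np.random.default_rng(2)
T = 50
mu1, mu2 = np.zeros(T), 0.5 * np.ones(T)
S1 = S2 = np.eye(T)

# Fraction of Pi_2-generated stretches that the rule Lambda_T > 0 assigns to Pi_2
hits = sum(llr(rng.multivariate_normal(mu2, S2), mu1, S1, mu2, S2) > 0
           for _ in range(200))
print(hits / 200)  # close to 1
```

With a mean separation of 0.5 per coordinate and T = 50, the rule classifies the vast majority of Π_2 stretches correctly, and the error probability shrinks as T grows.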

A similar result holds under Π_2. In practice, misspecification occurs in many statistical problems. We consider the following three situations. Although {X_{t,T}} actually has the time-varying mean functions μ_j(u) and the time-varying spectral densities f_j(u, λ) under Π_j, j = 1, 2, respectively,

(i) the mean functions are misspecified as μ_j(u) ≡ 0, j = 1, 2;
(ii) the spectral densities are misspecified as f_j(u, λ) ≡ f_j(0, λ), j = 1, 2;
(iii) both the mean functions and the spectral densities are misspecified as μ_j(u) ≡ 0 and f_j(u, λ) ≡ f_j(0, λ), j = 1, 2; namely, X_T is misspecified as stationary.
In each misspecified case, one can formally construct a log-likelihood ratio M_{k,T}, where k = 1, 2, 3 corresponds to the cases (i)-(iii) above.

Set S_T^{(j)}(M_k) ≡ M_{k,T} − E_j(M_{k,T}) for j = 1, 2 and k = 1, 2, 3. The next result is a large-deviation theorem for the misspecified log-likelihood ratios M_{k,T}. It is useful in investigating the effect of misspecification.
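A hedged toy sketch of case (i) (our own example, not the paper's locally stationary model): when the two hypotheses share the same covariance, replacing the true means by zero in the discriminant removes all of the discriminating information, so the misspecified rule never crosses its threshold:

```python
import numpy as np

def log_lik(x, mu, S):
    d = x - mu
    _, logdet = np.linalg.slogdet(S)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))

rng = np.random.default_rng(5)
T = 50
mu1, mu2 = np.zeros(T), 0.6 * np.ones(T)
S = np.eye(T)  # common covariance under both hypotheses

correct = misspec = 0
for _ in range(300):
    x = rng.multivariate_normal(mu2, S)  # data generated under Pi_2
    # Lambda_T with the true means, versus an M_{1,T}-type statistic with means set to 0
    correct += (log_lik(x, mu2, S) - log_lik(x, mu1, S)) > 0
    misspec += (log_lik(x, np.zeros(T), S) - log_lik(x, np.zeros(T), S)) > 0
print(correct / 300, misspec / 300)
```

The correctly specified rule classifies almost all Π_2 stretches correctly, while the mean-misspecified statistic is identically zero here and never assigns a stretch to Π_2.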


Now, we turn to the discussion of our discriminant problem of classifying X_T into one of the two categories described by the hypotheses

Π_1 : μ_1(u), f_1(u, λ)  versus  Π_2 : μ_2(u), f_2(u, λ).    (3.19)
We use Λ_T as the discriminant statistic for the problem (3.19); namely, if Λ_T > 0 we assign X_T to Π_2, and otherwise to Π_1. Taking x = −lim_{T→∞} T^{−1} E_1(Λ_T) in (3.9), we can evaluate the probability of misdiscriminating X_T from Π_1 into Π_2 as

lim_{T→∞} T^{−1} log Pr{Λ_T > 0 | Π_1} = −I(x),    (3.20)

where I(·) is the rate function of (3.9). Thus, we see that the rate functions play an important role in the discriminant problem.
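The exponential decay of the misdiscrimination probability can be checked by Monte Carlo in a hedged stand-in model (our own i.i.d. mean-shift pair, not the paper's locally stationary hypotheses), where the rate is known in closed form:

```python
import numpy as np

# For i.i.d. N(0,1) (Pi_1) versus i.i.d. N(delta,1) (Pi_2), the log-likelihood
# ratio is Lambda_T = delta*sum(x) - T*delta**2/2, and Pr_1{Lambda_T > 0}
# decays at the Chernoff rate delta**2/8 (= 0.125 for delta = 1).
rng = np.random.default_rng(3)
delta, reps = 1.0, 400_000
rates = []
for T in (10, 20, 40):
    # under Pi_1 the sample sum is exactly N(0, T)
    lam = delta * rng.normal(0.0, np.sqrt(T), size=reps) - T * delta ** 2 / 2
    p = np.mean(lam > 0)
    rates.append(-np.log(p) / T)
    print(T, p, rates[-1])
```

The estimated rates decrease toward 0.125 from above as T grows, again illustrating the slowly varying prefactor in front of the exponential decay.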

Numerical Illustration for Nonstationary Processes
We illustrate the implications of Theorems 3.4 and 3.5 by numerically evaluating the large-deviation probabilities of the statistics Λ_T and M_{k,T}, k = 1, 2, 3, for the following hypotheses:
From these figures, we see that the magnitude of the mean function is large for u close to 0, while the magnitude of the time-varying spectral density is large for u close to 1.
Specifically, we use the formulae in those theorems concerning Π_2 to evaluate the limits of the large-deviation probabilities.
Though the result is an asymptotic theory, we perform the simulation with a limited sample size. For a fair comparison we therefore consider several levels of x, namely x = −0.1, −1, −10.
The results are listed in Table 1.
Table 1: The limits of the large-deviation probabilities of Λ_T and M_{k,T}, k = 1, 2, 3.

For each value of x, the large-deviation rate of Λ_T is the largest and that of M_{3,T} is the smallest; that is, the correctly specified case is the best, and the case misspecified as stationary is the worst. Furthermore, the large-deviation rates −LDP(M_2) and −LDP(M_3) are significantly small compared with −LDP(M_1). This fact implies that misspecifying the spectral density as constant in time seriously affects the large-deviation rate. Figures 3, 4, 5, and 6 show the large-deviation probabilities of Λ_T and M_{k,T}, k = 1, 2, 3, for x = −1, at each time u and frequency λ.
We see that the large-deviation rate of Λ_T remains almost constant over all time points u and frequencies λ. On the other hand, that of M_{1,T} is small for u close to 0, and those of M_{2,T} and M_{3,T} are small for u close to 1 and λ close to 0. That is, the large-deviation probability of M_{1,T} is degraded by the large magnitude of the mean function, while those of M_{2,T} and M_{3,T} are degraded by that of the time-varying spectral density. Hence, we conclude that the misspecifications seriously affect our discrimination.

Appendix
We sketch the proofs of Theorems 3.3-3.5.First, we summarize the assumptions used in this paper.
Here W_T(f_A) and W_T({4π² f_A}^{−1}) are the approximations of Σ_T(A, A) and Σ_T(A, A)^{−1}, respectively. We need the following lemmata, which are due to Dahlhaus [3, 4]. We also remark on a matrix inequality that holds when U_T and V_T are real nonnegative definite symmetric matrices.

Proof of Theorems 3.3-3.5. We need the cumulant generating function of the quadratic form in normal variables, given in (A.9). We prove Theorem 3.5 for M_{3,T} under Π_1 only; Theorems 3.3 and 3.4 are obtained similarly.
In order to use the Gärtner-Ellis theorem, consider the normalized cumulant generating function of S_T^{(1)}(M_3). Setting H_T = H_T(M_3) and h_T = h_T(M_3) in (A.8), we have, under Π_1, the following.
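The closed-form cumulant generating function used here can be sanity-checked numerically. A hedged sketch of the Mathai-Provost-type identity in the zero-mean case (our own simplification of (A.9)): for X ~ N(0, Σ) and symmetric H, log E exp(ω X′HX) = −(1/2) log det(I − 2ω Σ H), valid while the eigenvalues of 2ωΣH stay below 1:

```python
import numpy as np

rng = np.random.default_rng(4)
n, w = 4, 0.1

A = rng.standard_normal((n, n))
Sigma = A @ A.T + np.eye(n)
Sigma /= np.linalg.eigvalsh(Sigma).max()  # scale so eigenvalues of 2w*Sigma stay < 1
H = np.eye(n)                             # simple quadratic form X'X

# Closed form: log E exp(w X'HX) = -0.5 * log det(I - 2w Sigma H) for X ~ N(0, Sigma)
closed = -0.5 * np.log(np.linalg.det(np.eye(n) - 2 * w * Sigma @ H))

# Monte Carlo check of the same cumulant generating function
x = rng.multivariate_normal(np.zeros(n), Sigma, size=400_000)
mc = np.log(np.mean(np.exp(w * np.einsum('ij,ij->i', x, x @ H))))
print(round(closed, 3), round(mc, 3))  # the two values should roughly agree
```

The Monte Carlo estimate agrees with the determinant formula to a few decimal places; the nonzero-mean version used in the paper adds mean-dependent terms to this expression.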

Lemma 2.3 (Gärtner-Ellis). Let (a, b) be an interval with (a, b) ∩ D_I ≠ ∅. If Assumption 2.1 holds and a < b, then

lim sup_{T→∞} T^{−1} log Pr{Y_T/T ∈ [a, b]} ≤ −inf_{x∈[a,b]} I(x).

Figure 1: The mean function μ(u) (solid line) and the coefficient functions σ(u) (dashed line) and a(u) (dotted line).

Figure 2: The time-varying spectral density.

Figure 3: The time-frequency plot of the large-deviation probabilities of Λ_T.

Figure 4: The time-frequency plot of the large-deviation probabilities of M_{1,T}.

Figure 5: The time-frequency plot of the large-deviation probabilities of M_{2,T}.

Figure 6: The time-frequency plot of the large-deviation probabilities of M_{3,T}.
Lemma A.3. Let D°(·) be the transfer function of a locally stationary process {Z_{t,T}}, where the corresponding D is bounded from below and has uniformly bounded derivative (∂/∂u)(∂/∂λ)D, and let f_D(u, λ) ≡ |D(u, λ)|² denote the time-varying spectral density of Z_{t,T}. Then, for Σ_T(d) ≡ Σ_T(D, D), we have

lim_{T→∞} T^{−1} log det Σ_T(d) = (2π)^{−1} ∫_0^1 ∫_{−π}^{π} log{2π f_D(u, λ)} dλ du.

For the cumulant generating function of a quadratic form in normal variables, see Mathai and Provost [11], Theorem 3.2a.3. Theorems 3.3, 3.4, and 3.5 correspond to the respective cases.