Distributions Escaping to Infinity and the Limiting Power of the Cliff-Ord Test for Autocorrelation

We consider a family of proper random variables which converges to an improper random variable. The limit in distribution is found and applied to obtain a closed-form expression for the limiting power of the Cliff-Ord test for autocorrelation. The applications include the theory of characteristic functions of proper random variables, the theory of almost periodic functions, and the test for spatial correlation in a linear regression model.


Introduction
Improper random variables do not satisfy the condition $P(X \in \mathbb{R}) = 1$; that is, they may take values outside the real line $\mathbb{R}$. They are not used much by themselves, but there are situations when they arise as limits of proper random variables. In such cases, we say that a distribution escapes to infinity. The main problem considered in this paper is illustrated below in a simplified situation.
Throughout the paper we denote by $\chi_A$ the indicator of a set $A$. Let $g = \frac{1}{2}\chi_{[-1,1]}$ be the density of the uniform distribution on the segment $[-1, 1]$. Consider the family of densities $g_\lambda(t) = \lambda g(\lambda t)$, where $\lambda > 0$. The total mass is constant: $\int_{\mathbb{R}} g_\lambda(t)\,dt = 1$ for any $\lambda > 0$. In Bayesian estimation, improper priors are obtained by letting $\lambda \to 0$. In this case, there are two effects at work: the support $\operatorname{supp} g_\lambda = [-1/\lambda, 1/\lambda]$ stretches out indefinitely and the height of the density $\max g_\lambda = \lambda/2$ goes to zero. One might be tempted to think of the limit $G$ of $\{g_\lambda\}$, as $\lambda \to 0$, as an infinitesimally thin layer smeared over the whole real line. This notion would be wrong because

$$\left|\int_{\mathbb{R}} g_\lambda(t)\varphi(t)\,dt\right| \le \frac{\lambda}{2}\int_{\operatorname{supp}\varphi} |\varphi(t)|\,dt \longrightarrow 0, \quad \lambda \to 0, \tag{1.1}$$

ISRN Probability and Statistics
for any $\varphi \in C_0(\mathbb{R})$ (the set of continuous functions on $\mathbb{R}$ with bounded support $\operatorname{supp}\varphi$). Thus, if the limit in distribution $G$ of $\{g_\lambda\}$ exists, it vanishes on all elements of $C_0(\mathbb{R})$. By the definition of the support of a distribution [1, Chapter 1, Section 13], the support of $G$ does not contain $\mathbb{R}$. So instead of spreading the mass over $\mathbb{R}$, it is more correct to say that the mass escapes to infinity.
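The escape of mass can be checked numerically. The following is a minimal sketch, not part of the original argument: the tent function `phi` and the midpoint-rule helper `integrate` are illustrative assumptions. It shows that the total mass of $g_\lambda$ stays equal to 1 while its pairing with a compactly supported test function shrinks like $\lambda/2$.

```python
# Sketch: g_lam(t) = lam * g(lam * t) with g uniform on [-1, 1].
# phi and integrate are hypothetical helpers chosen for illustration only.

def g(t):
    """Uniform density (1/2) * indicator of [-1, 1]."""
    return 0.5 if -1.0 <= t <= 1.0 else 0.0

def g_lam(t, lam):
    """Stretched density lam * g(lam * t)."""
    return lam * g(lam * t)

def phi(t):
    """Tent test function supported on [-1, 1]; its integral equals 1."""
    return max(0.0, 1.0 - abs(t))

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def total_mass(lam):
    """Integral of g_lam over its support [-1/lam, 1/lam]."""
    return integrate(lambda t: g_lam(t, lam), -1.0 / lam, 1.0 / lam)

def pairing(lam):
    """Integral of g_lam * phi; phi vanishes outside [-1, 1]."""
    return integrate(lambda t: g_lam(t, lam) * phi(t), -1.0, 1.0)

for lam in (1.0, 0.1, 0.01):
    # columns: lam, total mass, pairing with phi, bound (lam/2) * integral of |phi|
    print(lam, total_mass(lam), pairing(lam), lam / 2)
```

The mass stays constant, yet no bounded window of the real line retains any of it in the limit.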
In this paper we provide a rigorous framework for treating more complex situations. To illustrate the arising complexities, let us look at the standard normal variable $X$ with the density $g(x) = (2\pi)^{-1/2}\exp(-x^2/2)$. Let $X_\lambda$ have the density $g_\lambda(x) = \lambda g(\lambda x) = \lambda(2\pi)^{-1/2}e^{-(\lambda x)^2/2}$. Its characteristic function is

$$E e^{itX_\lambda} = \int_{\mathbb{R}} e^{i(t/\lambda)x} g(x)\,dx = E e^{i(t/\lambda)X} = e^{-(t/\lambda)^2/2}. \tag{1.2}$$

The moment generating function of $X_\lambda$ is $E e^{tX_\lambda} = e^{(t/\lambda)^2/2}$. None of these expressions is good for characterizing the limit as $\lambda \to 0$. Further, let $\varphi$ denote an arbitrary continuous and bounded function on $\mathbb{R}$. In the expression

$$\int_{\mathbb{R}} g_\lambda(t)\varphi(t)\,dt = \lambda\int_{\mathbb{R}} g(\lambda t)\varphi(t)\,dt, \tag{1.3}$$

the height of the density goes to zero, so the integrand converges to zero everywhere. However, the graph of the density stretches out from the origin. Therefore, the best majorant for the integrand is $|g(0)\varphi(t)|$, which is generally not integrable. Thus, the dominated convergence theorem cannot be used to obtain convergence in distribution. While the case of collapsing density ($\lambda \to \infty$) is easily handled by the existing theory, the case of stretching out ($\lambda \to 0$) requires new tools, which we develop here.

Problem 1. Describe the limit in distribution when the stretching-out is applied to a density along all or some variables.

This problem is solved in Theorem 2.2 in Section 2 (case of all variables) and Theorem 2.6 (case of some variables). In their simplest form, those results reveal the main ideas. Let $g$ be any summable even function on $\mathbb{R}$ (it may change sign and not integrate to unity). Let $M\varphi$ denote the generalized mean of $\varphi$ over $\mathbb{R}$:

$$M\varphi = \lim_{r\to\infty}\frac{1}{2r}\int_{-r}^{r}\varphi(t)\,dt. \tag{1.4}$$

Now suppose that the function $g$ depends on two variables and the stretching-out is applied with respect to the first of them. Let $\varphi$ be continuous and bounded on $\mathbb{R}^2$. Denote by $M_1\varphi$ the result of application of (1.4) to $\varphi$ with respect to the first argument, and let $g_1(t_2) = \int_{\mathbb{R}} g(t_1, t_2)\,dt_1$. In cases when $g$ is not a density, we call this function a marginal "density." Then, (1.5) can be used to prove the limit relation (1.7) if $g$ is integrable and spherically symmetric. As we show, the right sides of (1.5) and (1.7) determine distributions supported at infinity.

Theorem 2.2 is followed by two applications. One generalizes results from [2] on the link between jumps of a distribution function of a proper random variable and its characteristic function. Another application is to the fundamental theorem of H. Bohr on the Parseval identity for almost-periodic functions; see [3]. Theorem 2.6 is applied to the limiting power of the Cliff-Ord test for autocorrelation. To formulate the related problems, we need some notation.
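As a numerical illustration of how the generalized mean enters, the sketch below compares the integral in (1.3) for the standard normal $g$ with the average of $\varphi$ over a long segment. The test functions $\cos t$ and $\cos^2 t$, the truncation `cutoff`, and the helper names are assumptions made for this example. The closed forms used for comparison follow from (1.2): $\int g_\lambda(t)\cos t\,dt = e^{-1/(2\lambda^2)}$ and $\int g_\lambda(t)\cos^2 t\,dt = (1 + e^{-2/\lambda^2})/2$.

```python
import math

def generalized_mean(phi, r, n=100000):
    """Midpoint approximation of (1/(2r)) * integral of phi over [-r, r] for one large r."""
    h = 2.0 * r / n
    return sum(phi(-r + (k + 0.5) * h) for k in range(n)) * h / (2.0 * r)

def normal_pairing(phi, lam, cutoff=10.0, n=100000):
    """Midpoint approximation of the integral of lam * g(lam * t) * phi(t),
    g being the standard normal density. Assumption: the integral is truncated
    to |lam * t| <= cutoff, where the Gaussian tail is negligible."""
    a = cutoff / lam
    h = 2.0 * a / n
    c = lam / math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(n):
        t = -a + (k + 0.5) * h
        total += c * math.exp(-0.5 * (lam * t) ** 2) * phi(t)
    return total * h

def cos2(t):
    """Bounded continuous test function with average value 1/2."""
    return math.cos(t) ** 2

if __name__ == "__main__":
    print("segment average of cos^2:", generalized_mean(cos2, 1000.0))
    for lam in (1.0, 0.5, 0.1):
        print(lam, normal_pairing(math.cos, lam), normal_pairing(cos2, lam))
```

As $\lambda$ decreases, the two pairings approach 0 and 1/2, which are the segment averages of $\cos t$ and $\cos^2 t$ over long segments.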
Consider a linear regression model

$$y = X\beta + u, \quad Eu = 0, \quad \operatorname{var}(u) = \sigma^2\Sigma(\rho), \tag{1.8}$$

where $X$ is an $n \times k$ matrix of rank $k < n$; a vector $\beta \in \mathbb{R}^k$ and a number $\sigma^2 > 0$ are unknown parameters. The matrix $\Sigma(\rho)$ is assumed to be a nonnegative function of the parameter $\rho \in [0, a)$, with some $a > 0$. In applications, $\rho$ characterizes the degree of autocorrelation. Testing for autocorrelation takes the form $H_0: \rho = 0$ versus $H_a: \rho > 0$. The case $\rho \to a$ is of special interest for determining the limiting power of tests.

This problem, as an intermediate step in the proof, was considered by Martellosio [4], and his answer is reproduced as follows. Let $N(A)$ denote the null space of a matrix $A$ and denote $A(a) = \Sigma^{-1}(a)$. Let $X\beta + N(A(a))$ be the translation by $X\beta$ of the null space of $A(a)$. Reference [4] states that in case (1.10),

$$\text{as } \rho \to a,\ f_\rho \text{ tends to a degenerate density supported on } X\beta + N(A(a)), \tag{1.11}$$

see page 182. Statement (1.11) arises from the confusion between the stretching-out and collapsing. On page 159, rows 10 and 11, he remarks that (1.11) can be extended to the case $\operatorname{rank}\Sigma^{-1}(a) < n - 1$. He does not specify the mode of convergence, but, as we argue in Section 2, the convergence in distribution is the right one for the problem of the behavior of the limiting power of tests for autocorrelation. Unfortunately, (1.11) does not hold under the convergence in distribution. We prove this and solve Problem 2 in Theorem 2.7 for the general case $0 < \operatorname{rank}\Sigma^{-1}(a) < n$.
Let $\Phi$ be a critical region for rejecting $H_0$ in favor of $H_a$. Denote by

$$\beta_\rho(\Phi) = \int_\Phi f_\rho(y)\,dy \tag{1.12}$$

the probability content of $\Phi$ under density (1.9) and define the limiting power $\beta(\Phi)$ as the limit of this probability content: $\beta(\Phi) = \lim_{\rho\to a}\beta_\rho(\Phi)$. Specifically, we consider critical regions that arise from the Cliff-Ord test, to be described now. With the regressor matrix $X$ from (1.8) denote $L_X = X(X'X)^{-1}X'$, $M_X = I - L_X$. Under the spatial autocorrelation assumption, the regression disturbances follow

$$u = \rho W u + \varepsilon, \tag{1.13}$$

where $\varepsilon$ is a new disturbance and $W$ is some known $n \times n$ matrix. The scalar $\rho$, which is unknown, determines the degree of correlation among the components of $u$. For testing the null $H_0: \rho = 0$ against the alternative $H_a: \rho > 0$, Cliff and Ord [5] proposed a test that rejects the null when $y$ falls into the region

$$\Phi = \left\{y \in \mathbb{R}^n : \frac{y'M_XWM_Xy}{y'M_Xy} > c\right\}. \tag{1.14}$$

Problem 3. Obtain a closed-form expression for $\beta(\Phi)$.

Theorem 2.7 is applied in Theorem 2.9 to solve this problem in case (1.10).
Problem 4. Describe the cases when the limiting power disappears, that is, when $\beta(\Phi) = 0$.

This problem has been the main motivation for this paper. Spatial models in general are peculiar in many respects, and the possibility of the limiting power to disappear is one of those peculiarities that has been attracting researchers' attention lately. Krämer [6] was the first to suggest that the limiting power of tests for spatial autocorrelation may vanish for some combinations of the regressor matrix and the spatial matrix. Unfortunately, the terms in which Krämer expressed his results do not adequately reflect all the possibilities, and his proof contains an incorrect argument; see [7, Footnote 5]. Martellosio [4] was the first to suggest that the answer is better described in terms of the geometrical relationship between the eigenvectors of $\Sigma^{-1}(a)$ and the critical region $\Phi$. However, both his main result [4, Theorem 1] and its proof contain errors (see Remark 2.12 in Section 2). Our answer to Problem 4 is given in Theorem 2.11 and corrects [4, Theorem 1] in that, where he excludes the extreme values 0 and 1 for the limiting power, we give examples showing that the extreme values are possible. In the light of our result, several statements in [4, 7] have to be reconsidered. The analysis for the Cliff-Ord test is very involved, and it is not feasible to indicate corrections for all of Martellosio's main results that depend on the wrong intermediate statement or his Theorem 2.9. The complexity of our analysis necessitates a more detailed notation than that used by Martellosio. In particular, some of his verbal definitions and statements are cast in a more formal way. As a result, our citations are not word for word.
Despite the fact that the convergence in distribution is the right one for Problem 4, it would be interesting to know what kind of convergence produces the limit indicated by Martellosio, as is stated next.

Problem 5. Design the mode of convergence that leads to (1.11).
To this end, we introduce a new convergence concept which, given its purpose, could be called a retrofit convergence. It may not look intuitively appealing, but it allows us to prove (1.11) in Theorem 2.13. Under this alternative convergence, there is no analog of (1.12). Therefore, we did not consider Problems 3 and 4 for this convergence.
The previous plan will be implemented under conditions much more restrictive than suggested by Martellosio. All main results are stated in Section 2, and Section 3 contains all proofs.

Main Statements
In the multidimensional version of the generalized mean (1.4), instead of averaging over segments $[-r, r]$, we have to average over balls. The shape of those balls depends on the norm of $\mathbb{R}^n$. Let $\|\cdot\|_a$ be an arbitrary norm in $\mathbb{R}^n$. The balls are defined by $B_{a,n}(x, r) = \{y \in \mathbb{R}^n : \|x - y\|_a \le r\}$, where the indication of the space dimension will be important when dealing with more than one space. As an example, one can think of the $l_p$-norm defined by

$$\|x\|_p = \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p},\ 1 \le p < \infty; \qquad \|x\|_\infty = \max_{1\le i\le n}|x_i|. \tag{2.1}$$
In case of the Euclidean norm $\|\cdot\|_2$, we obtain usual balls; in cases $p = 1$ and $p = \infty$, the balls $B_{p,n}(x, r)$ are cubes. Another useful example is $\|x\|_A = (x'Ax)^{1/2}$, where $A$ is a symmetric, positive definite matrix. We say that a function $g$ on $\mathbb{R}^n$ is $\|\cdot\|_a$-spherically symmetric if $g(x) = p(\|x\|_a)$ with some function $p$ defined on the half-axis $[0, \infty)$. Conditions involving spherical symmetry in what follows are similar to Conditions 2.1 and 2.2 from [8].
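The norms, balls, and spherically symmetric functions just introduced can be written down in a few lines. The sketch below is illustrative only; the function names `lp_norm`, `a_norm`, `in_ball`, and `spherically_symmetric` are assumptions of this example and do not come from the paper.

```python
def lp_norm(x, p):
    """l_p norm of a vector x (list of floats); p may be float('inf')."""
    if p == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def a_norm(x, A):
    """(x' A x)^(1/2) for a symmetric positive definite matrix A given as a list of rows."""
    n = len(x)
    q = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return q ** 0.5

def in_ball(y, center, r, norm):
    """Membership of y in the closed ball {y : norm(center - y) <= r}."""
    return norm([ci - yi for ci, yi in zip(center, y)]) <= r

def spherically_symmetric(p, norm):
    """Build g(x) = p(norm(x)) from a profile p on [0, infinity)."""
    return lambda x: p(norm(x))

# Example: the same point lies inside the l_inf ball of radius 1 but outside the l_1 ball.
point = [0.8, 0.8]
print(in_ball(point, [0.0, 0.0], 1.0, lambda v: lp_norm(v, float("inf"))))
print(in_ball(point, [0.0, 0.0], 1.0, lambda v: lp_norm(v, 1)))
```

Passing the norm as a function argument mirrors the way the text keeps the norm $\|\cdot\|_a$ arbitrary.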

Let $\sigma$ denote an element of the unit sphere $\{\sigma \in \mathbb{R}^n : \|\sigma\|_a = 1\}$ and let $(\sigma, \rho)$ be the representation of a point $x \in \mathbb{R}^n$ in the polar system of coordinates such that $\sigma = x/\|x\|_a$ and $\rho = \|x\|_a$ for $x \ne 0$.
In the one-dimensional case all balls are segments, and we write simply $M\varphi$ instead of $M_{a,1}\varphi$. One of the basic properties of generalized means is that they do not depend on the behavior of $\varphi$ in any fixed ball $B_{a,n}(0, r_0)$:

$$M_{a,n}\varphi = \lim_{r\to\infty}\frac{1}{\operatorname{mes} B_{a,n}(0, r)}\int_{B_{a,n}(0,r)\setminus B_{a,n}(0,r_0)}\varphi(t)\,dt. \tag{2.5}$$
Other useful properties are (3.16), (3.37), and (3.42). $CL(\mathbb{R}^n)$ denotes the set of continuous bounded functions on $\mathbb{R}^n$ that satisfy the Lipschitz condition (2.6). Regarding (2.7), [10, Theorem 2.1] asserts that here $CL(\mathbb{R}^n)$ can be replaced by the set of bounded uniformly continuous functions. $L_p(\mathbb{R}^n)$ is the space of $p$-summable functions on $\mathbb{R}^n$ provided with the usual norm.

For our applications, in the multidimensional version of (1.5), we need to allow $\varphi$ to depend on the parameter $\lambda$, as in (2.8). Here $\varphi \in CL(\mathbb{R}^n)$, $\varphi_\lambda(t) = \varphi(F_\lambda t + G_\lambda)$. $F_\lambda$, $\lambda \ge 0$, are $n \times d$ matrices and $G_\lambda$, $\lambda \ge 0$, are $n \times 1$ vectors such that $F_\lambda$ and $G_\lambda$ tend to $F_0$ and $G_0$, respectively, sufficiently quickly, as stipulated in the next assumption.

The first application of Theorem 2.2 is to the theory of characteristic functions. Let $F$ be the distribution function of a proper, real-valued random variable $X$. Denote by $j(x)$ the jump of $F$ at point $x$ and let $\varphi(t) = Ee^{itX}$ be the characteristic function.

2.10
where the sum on the right is over all jump points $x_k$ of $F$.
Theorems 3.2.3 and 3.3.4 in [2] are a special case of this corollary with $g = \frac{1}{2}\chi_{[-1,1]}$. The proof of Corollary 2.4 is obtained by combining those theorems with our Theorem 2.2.
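The special case with the uniform $g$ corresponds to the classical formulas that recover a jump $j(x)$ as the limit of $(2T)^{-1}\int_{-T}^{T}e^{-itx}\varphi(t)\,dt$ and the sum of squared jumps as the limit of $(2T)^{-1}\int_{-T}^{T}|\varphi(t)|^2\,dt$. A minimal numerical sketch follows; the two-atom distribution, the horizon `T`, and the function names are assumptions chosen only for illustration.

```python
import cmath

# Hypothetical discrete distribution: atom -> probability (jump of F at that atom).
ATOMS = {0.0: 0.3, 1.0: 0.7}

def char_fn(t):
    """Characteristic function E exp(itX) of the discrete distribution ATOMS."""
    return sum(p * cmath.exp(1j * t * x) for x, p in ATOMS.items())

def mean_over_segment(f, T, n=100000):
    """Midpoint approximation of (1/(2T)) * integral of f over [-T, T]."""
    h = 2.0 * T / n
    return sum(f(-T + (k + 0.5) * h) for k in range(n)) * h / (2.0 * T)

def jump_estimate(x, T=2000.0):
    """Approximate the jump of F at x by averaging exp(-itx) * char_fn(t)."""
    return mean_over_segment(lambda t: cmath.exp(-1j * t * x) * char_fn(t), T).real

def squared_jumps_estimate(T=2000.0):
    """Approximate the sum of squared jumps by averaging |char_fn(t)|^2."""
    return mean_over_segment(lambda t: abs(char_fn(t)) ** 2, T)

if __name__ == "__main__":
    print(jump_estimate(1.0))        # close to the jump 0.7 at x = 1
    print(jump_estimate(0.5))        # close to 0: F has no jump at x = 0.5
    print(squared_jumps_estimate())  # close to 0.3**2 + 0.7**2 = 0.58
```

The averages are taken for one large but finite `T`, so the printed values agree with the jumps only up to an error of order $1/T$.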
The second application is to the theory of almost-periodic functions. A complex-valued continuous function $f$ on $\mathbb{R}$ is called almost-periodic if for each $\varepsilon > 0$ there exists $l(\varepsilon) > 0$ such that each interval $(a, a + l(\varepsilon))$ of length $l(\varepsilon)$ contains at least one number $\tau$ for which $\sup_{t\in\mathbb{R}}|f(t + \tau) - f(t)| < \varepsilon$. In the space of almost-periodic functions, the formula $(\varphi, \psi) = M\varphi\bar{\psi}$ defines a scalar product ($\bar{\psi}$ is the complex conjugate of $\psi$), and the numbers $\alpha_\lambda(\varphi)$

Note that the first part of this corollary applies to characteristic functions of purely discrete distribution functions because, by [2, Corollary 2 of Theorem 3.2.3], any such characteristic function is almost periodic.

Now, we turn to the multidimensional version of (1.7). For the density, we assume a stronger condition than Condition 1(a).
This assumption allows us to show that when some coordinates of $x$ are fixed, $g(x)$ as a function of the remaining coordinates satisfies Condition 1(a). Now, we provide the intuition for the next condition. The stretching-out applied in (1.7) is described by the transformation $s = A(\lambda)t$, where

$$A(\lambda) = \begin{pmatrix}\lambda & 0\\ 0 & 1\end{pmatrix}.$$

Here, the matrix $A(\lambda)$ has two eigenvalues. The limit $\lim_{\lambda\to 0}A(\lambda) = A(0)$ is a singular matrix because one of these eigenvalues tends to zero as $\lambda \to 0$. Generalizing upon this situation and also thinking of applications to invariant tests, in the $n$-dimensional case we consider a symmetric nonnegative matrix $A(\rho)$ of size $n \times n$, where the parameter $\rho$ belongs to the segment $[0, a]$. Denote its eigenvalues $0 \le \lambda_1(\rho) \le \cdots \le \lambda_n(\rho)$ and let $A(\rho)$ be diagonalized as $A(\rho) = P(\rho)\Lambda(\rho)P(\rho)'$, where $\Lambda(\rho) = \operatorname{diag}[\lambda_1(\rho), \ldots, \lambda_n(\rho)]$ and $P(\rho)$ is an orthogonal matrix. $A(\rho)$ degenerates at the right end of the segment $[0, a]$, owing to the following assumption.
The matrices $P(\rho)$ and $\Lambda(\rho)$ converge sufficiently quickly as $\rho \to a$. Namely, with the matrix

Note that $\Lambda^{-1}(a)$ does not exist because of part (b), but part (c) allows us to set $\Lambda^{-1}(a)\Lambda(a) = I$ by continuity. In line with part (b), we use the partitions (2.14). Also, partition $P(a) = (P_1(a), P_2(a))$ conformably with (2.14). $T\varphi(t_2)$ denotes the result of application of the generalized mean operator with respect to $t_1$, keeping $t_2$ fixed, as in (2.16), and $g_1$ denotes a marginal "density":

2.17
Theorem 2.6. Let Conditions 2 and 3 hold and let $\varphi \in CL(\mathbb{R}^n)$ be such that the limit (2.16) exists for almost all $t_2 \in \mathbb{R}^{n-d}$. Then,

2.18
Next, we turn to the solution of Problem 2. Denoting $A(\rho) = \Sigma^{-1/2}(\rho)$ and assuming that $A(\rho)$ is positive definite for $\rho \in [0, a)$, we see that condition (1.10) corresponds to the case $d = 1$ of our Condition 3. The density (1.9) fits the framework of our Theorem 2.6 because the stretching-out is applied along one variable. Thus, in Theorem 2.7 we apply Theorem 2.6 to (a) characterize the limit in distribution of $f_\rho$ in case $1 \le \operatorname{rank}\Sigma^{-1}(a) \le n - 1$ and (b) show that (1.11) does not hold under the weak convergence. By implication, (1.11) is wrong if any convergence stronger than the weak one is considered (e.g., uniform, almost sure, in probability, and in $L_p$). Even though we use Theorem 2.6, the assumption on the density in part (b) of the next theorem is weaker than that in Theorem 2.6.

Condition 4. $g$ is a density on $\mathbb{R}^n$ bounded by an integrable, $\|\cdot\|_2$-spherically symmetric function, $g(x) \le p(\|x\|_2)$.
An example of such a density is $g(x) = p(\|x\|_a)$, where $p$ is a nonnegative and nonincreasing function on $[0, \infty)$, which decays at infinity sufficiently quickly for $g$ to be integrable. By monotonicity of $p$, inequality (3.14) below implies that $g(x) = p(\|x\|_a) \le p(\|x\|_2/c_3)$.

Theorem 2.7. Let Condition 3 hold in which

(a) Denote by $Q = P_2(a)P_2(a)'$ the projector onto the subspace spanned by the last $n - d$ eigenvectors of $A(a)$ and let $T_1$ be defined by with $g_1$ defined in (2.17)

Before giving the solution to Problem 3, we need more notation and definitions. Obtaining a closed-form formula for $\beta(\Phi)$ involves a meticulous analysis of $\Phi$ based on the representation of $\Phi$ given below.

Let $\operatorname{Im}X = \{X\beta : \beta \in \mathbb{R}^k\}$ In particular, a cone-like set with each of its elements $s$ contains its opposite $-s$. The next representation is a stronger statement than saying that $\Phi$ defined in (1.14) is invariant with respect to transformations $y$

Representation of Φ
The rejection region $\Phi$ for the Cliff-Ord test is a cylinder with a cone-like base $M_X\Phi = \Gamma S$, where

$$S = \{s \in \operatorname{Im}M_X : \|s\|_2 = 1,\ s'Ws > c\}. \tag{2.21}$$
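The representation suggests a direct membership test: project $y$ onto $\operatorname{Im}M_X$, normalize, and check whether the resulting direction lies in $S$. The sketch below makes several assumptions that are not in the text: the regressor matrix has a single column `x` (so that $M_X$ has an explicit form), the weight matrix `W` is the adjacency matrix of a path graph, and points with $M_Xy = 0$ are treated as lying outside the region.

```python
import math

def residual(y, x):
    """M_X y for a single-column regressor x (assumption: k = 1)."""
    coef = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [yi - coef * xi for yi, xi in zip(y, x)]

def quad_form(s, W):
    """s' W s for a matrix W given as a list of rows."""
    n = len(s)
    return sum(s[i] * W[i][j] * s[j] for i in range(n) for j in range(n))

def in_rejection_region(y, x, W, c, tol=1e-12):
    """True if the normalized projection s = M_X y / ||M_X y||_2 satisfies s' W s > c."""
    v = residual(y, x)
    norm = math.sqrt(sum(a * a for a in v))
    if norm <= tol:
        return False  # assumption: M_X y = 0 defines no direction, so it is excluded
    s = [a / norm for a in v]
    return quad_form(s, W) > c

# Hypothetical data: intercept-only regressor and a path-graph weight matrix.
x = [1.0, 1.0, 1.0, 1.0]
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
y_smooth = [1.0, 2.0, 3.0, 4.0]          # neighbouring residuals have equal signs
y_alternating = [1.0, -1.0, 1.0, -1.0]   # neighbouring residuals alternate in sign
print(in_rejection_region(y_smooth, x, W, c=0.2))
print(in_rejection_region(y_alternating, x, W, c=0.2))
```

Because the decision depends on $y$ only through the direction of $M_Xy$, and $s'Ws$ is unchanged when $s$ is replaced by $-s$, replacing $y$ by $\alpha y + X\beta$ with $\alpha \ne 0$ does not change the outcome, which is the cylinder and cone structure described above.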
It is convenient to call the set $S$ in representation (2.21) an aperture. The function $s'Ws$ is continuous on the unit sphere of $\operatorname{Im}M_X$ and the set $(c, \infty)$ is open. By the general property of continuous mappings [11, Chapter 2, §5, Section 5, Theorem 6], the preimage of an open set under a continuous mapping is open. Thus, the aperture is an open set in the relative topology of the unit sphere of $\operatorname{Im}M_X$. We need the notions of the interior $\operatorname{int}\Phi$ (defined as the set of points of $\Phi$ that belong to $\Phi$ with some neighborhood), closure $\operatorname{cl}\Phi$ (the set of all limit points of $\Phi$), and boundary $\operatorname{bd}\Phi = \operatorname{cl}\Phi \setminus \Phi$. Writing (1.12) in the form $\beta_\rho(\Phi) = \int_{\mathbb{R}^n}\chi_\Phi(y)f_\rho(y)\,dy$, we see that Theorem 2.7(a) will be applicable if we manage to extend it from continuous Lipschitz functions to discontinuous functions of type $\chi_\Phi$. This is done in Theorem 2.9. In Theorems 2.9 and 2.11, we assume that $d = 1$. Up to the notation, this is the same assumption as (1.10). At least within our method, generalizations of the results below to the case $d > 1$ are hard to obtain.

Let Condition 3 hold for $A(\rho) = \Sigma^{-1/2}(\rho)$ and let $d = 1$. Then, $\lambda_1(\rho) \to 0$ as $\rho \to a$ and all other eigenvalues have positive limits. Denote by $f_1, \ldots, f_n$ the orthonormal eigenvectors of $A(a) = \Sigma^{-1/2}(a)$ corresponding to the eigenvalues $\lambda_1(a) = 0, \lambda_2(a), \ldots, \lambda_n(a)$. The partitions (2.14) become

$QX\beta$ will be called a shift because its role is to shift the line

In the next theorem, we extend (2.20) to $\varphi = \chi_\Phi$ with $\Phi$ described by (2.21). In the notation of marginal densities, the subscript will indicate the number of integrated-out variables. For example, where

To avoid triviality, in the following theorem we assume that the inclusion $\varnothing \subset \Phi \subset \mathbb{R}^n$ is strict. This implies that in representation (2.21) the set $\Gamma S$ is a nonempty proper subset of $\operatorname{Im}M_X$.
Theorem 2.11. Let conditions of Theorem 2.9 hold, with (2.23) accompanying $f_1 \in \operatorname{Im}X$ and (2.25) accompanying $f_1 \notin \operatorname{Im}X$. Besides, we require the function $p$ from Condition 2 to be positive on $[0, \infty)$.
(1) Our conditions on the density and critical region are much more restrictive. Our proof reveals the distinction between the cases $f_1 \in \operatorname{Im}X$ and $f_1 \notin \operatorname{Im}X$. In particular, in case 2.2, Martellosio excludes the extreme values $\beta(\Phi) = 0$ and $\beta(\Phi) = 1$, while we provide counterexamples showing that they are possible. Note also that we do not impose any conditions on the structure of $W$. What happens to the limiting power in case of $W$ that arises in practice needs additional investigation.

(2) Martellosio's proof is based on (1.11), which we disprove in Theorem 2.7(b). A series of other propositions from the same paper (see Lemmas D.2, D.3, and E.4, Corollary 1, and Propositions 1, 2, and 5), as well as from [7] (see Lemma 3.2, Theorems 3.3, 3.5, and 4.1, and Proposition 3.6), depend on (1.11) and need a revision. In particular, his claim that his results are true for any invariant critical region and any continuous density $u$ that is unimodal at the origin is unwarranted.
(3) Even if statement (1.11) were right, the proof of [4, Theorem 1] would be incomplete because it incorrectly uses the $n$-dimensional Lebesgue measure. Its use is inappropriate because for a degenerate density even one point may carry a positive mass. In our proof, we justify the use of the $(n-1)$-dimensional Lebesgue measure.

Now, we turn to the description of the alternative approach (solution to Problem 5). A mode is not a very good characteristic of a distribution when two unimodal densities with the same modes have very different spreads. A set of points where the density is close to its maximum might be a better characteristic in this case, at least for bell-shaped distributions. Let $0 < \varepsilon < 1$ and let $m = \max f(x)$ be the maximum of a continuous density $f(x)$. We call $M_\varepsilon = \{x : f(x) \ge (1 - \varepsilon)m\}$ an $\varepsilon$-maximizing set of $f$. The idea of this density-maximizing approach is close to the maximum likelihood principle.
Suppose a decision $d$ is taken if the statistic $x$ belongs to a set $D$. In case of a favorable decision, $D$ is chosen in such a way that the probability $P(x \in D)$ is high. The use of probability in this decision rule presumes that the statistic can be calculated repeatedly. However, in practice, especially in economics, the decision is based on just one value of the statistic in question. In such a case it may be preferable to choose $D$ so that $\min_{x\in D}f(x)$ is sufficiently close to $m = \max f(x)$. For the $\varepsilon$-maximizing set, we have $\min_{x\in M_\varepsilon}f(x) \ge (1 - \varepsilon)m$. Requiring $1 - \varepsilon$ to be close to 1 in the density-maximizing approach is similar to requiring $P(x \in D)$ to be close to 1 in the probability-maximizing approach (although normally $P(x \in M_\varepsilon) \to 0$ as $\varepsilon \to 0$).
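For a concrete picture, consider the one-dimensional standard normal density, which is an assumption made only for this illustration. Its $\varepsilon$-maximizing set is a segment $[-r, r]$ whose half-width solves $f(r) = (1 - \varepsilon)m$, and its probability content shrinks to zero together with $\varepsilon$, as remarked above. The function names are hypothetical.

```python
import math

def density(x):
    """Standard normal density; its maximum m is attained at 0."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def max_set_radius(eps):
    """Half-width r of M_eps = [-r, r]: the solution of density(r) = (1 - eps) * density(0)."""
    return math.sqrt(-2.0 * math.log(1.0 - eps))

def max_set_probability(eps):
    """P(x in M_eps) for a standard normal x, computed via the error function."""
    return math.erf(max_set_radius(eps) / math.sqrt(2.0))

for eps in (0.5, 0.1, 0.01, 0.001):
    print(eps, max_set_radius(eps), max_set_probability(eps))
```

The printed probabilities decrease with $\varepsilon$, which shows why the density-maximizing and probability-maximizing criteria pull in different directions.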
As before, we assume that the matrix $A(\rho) = \Sigma^{-1/2}(\rho)$ of size $n \times n$ is symmetric for $0 \le \rho \le a$ and positive definite for $0 \le \rho < a$. The idea is to impose conditions ensuring that $\varepsilon$-maximizing sets are ellipsoids which in the limit give a set of the desired shape. This idea is realized through a delicate balance of the limit behavior of the eigenvalues and density contained in Conditions 5 through 7 and (2.32).
Condition 5. In the diagonal representation of $A(\rho)$, the orthogonal matrix $P$ satisfies the first $d$ eigenvalues vanish as power functions with positive constants $c_\lambda$, $\alpha_\lambda$; the remaining eigenvalues tend to positive constants where $\lambda_j(a) > 0$.

Condition 6. The function $\varepsilon(\rho)$ in the definition of the set $M_{\varepsilon(\rho)}$ vanishes as a power function where $c_\varepsilon$, $\alpha_\varepsilon$ are positive constants.

Condition 7. The density $g$ is $\|\cdot\|_2$-spherically symmetric, $g(x) = p(\|x\|_2)$, where $p$ is continuous and monotonically decreasing on $[0, \infty)$ and such that where $c_p$, $\alpha_p > 0$.

This assumption implies that $m = \max g(x) = p(0)$, that the inverse function $p^{-1}$ is continuous and monotonically decreasing on $(0, m]$, and that

2.31
Theorem 2.13. Denote by $N(A(a))$ the null subspace of $A(a)$ (spanned by the eigenvectors corresponding to its zero eigenvalues). If Conditions 5-7 hold and

Proofs
The idea of the proof is to approximate $g$ with a step function $h$, prove the statement for $h$, and then pass to the limit to obtain the statement for $g$. A step function, by definition, is a finite linear combination of indicators of measurable sets. Due to the spherical symmetry of $g$, these sets turn out to be balls. For the method to work, the radii of the balls should be positive and finite. Approximation of $p$ by a continuous function in Step 1 is a trick to make sure that $h$ takes a finite number of values.
Step 1 ($p$ can be assumed continuous on $[0, \infty)$). Let $\Omega = (0, \infty)$ and denote by $Z(\Omega)$ the space of measurable functions

The first three properties are standard facts of the theory of $L_p$ spaces; the last one follows from the fact that if $\operatorname{supp}f$ is a compact subset of $\Omega$, then the weight $\rho^{d-1}$ in the definition of the norm of $Z(\Omega)$ satisfies $c_1 \le \rho^{d-1} \le c_2$ on $\operatorname{supp}f$ and therefore for such an $f$ the norms $\|f\|_{Z(\Omega)}$ and $\|f\|_{L_1}$ are equivalent. Functions from $L_1$ are known to be translation-continuous. By Burenkov's theorem there exists a sequence

(3.2) then (2.8) will follow.
Step 2 (approximating $g$ with a step function). Take an arbitrary $\varepsilon > 0$. We can assume that $p$ is continuous on $[0, \infty)$.
(a) Suppose $\|g\|_{L_1} \ge \varepsilon$. We approximate $g(t)$ by a step function $h(t)$, which will vanish where $\|t\|_a$ is large or small. By summability of $g$, there exist $0 < N_1 < N_2 < \infty$ for which (3.3) holds. $g$ is uniformly continuous on the ring $\{N_1 \le \|t\|_a \le N_2\}$. By Condition 1(a), for any natural $m$ we can find $\delta > 0$ and split this ring into smaller rings in such a way that $g$ in each ring is close to its value on the inner boundary:

3.5
Put $h(t) = 0$ if $\|t\|_a < N_1$ or $\|t\|_a > N_2$; $h(t) = p(l/m)$ for $t \in A_{m,l}$, $l = 0, \ldots, L$. Combination of (3.3), (3.4), and (3.5) leads to

3.6
Consequently, we can fix $m = m(\varepsilon)$ so that

Step 3 (replacing rings by balls in the representation of $h$). Suppose $h$ is not identically zero. The sets $A_{m,l}$ in (3.4) are concentric rings with finite positive radii of the inner and outer boundaries, and by construction, $h$ is a finite linear combination of indicators of such rings.

Hence, it can be written as $h = \sum_{l=0}^{L} a_l\big(\chi_{B_{a,d}(0,r_l)} - \chi_{B_{a,d}(0,r_{l-1})}\big)$, where $a_l$ are some real numbers and the radii satisfy (3.8). Therefore, with some new constants $b_l$, we can write $h$ as a linear combination of indicators of balls: $h = \sum_l h_l$, where $h_l = b_l\chi_{B_{a,d}(0,r_l)}$.

3.9
If $h \equiv 0$, we put formally $r_0 = 1$, $b_0 = 0$, $L = 0$. Note that $L$, $b_l$, and $r_l$ all depend on $\varepsilon$ and that the constants $b_l$ may deviate from the values of $h$ significantly.
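The passage from rings to balls in Step 3 is a regrouping of terms, which the following sketch makes explicit for a radial step function of one variable. The indexing of radii used here (an increasing list `rho`, with ring `k` lying between `rho[k]` and `rho[k + 1]`) is an assumption of this example and need not match the indexing in (3.8) and (3.9).

```python
def rings_to_balls(values):
    """values[k] is the value of h on the ring rho[k] < |t| <= rho[k + 1].
    Returns b such that h(t) = sum over j of b[j] * indicator(|t| <= rho[j])."""
    b = [0.0] * (len(values) + 1)
    for k, a in enumerate(values):
        b[k + 1] += a  # + a * indicator of the outer ball of ring k
        b[k] -= a      # - a * indicator of the inner ball of ring k
    return b

def h_from_rings(t, rho, values):
    """Evaluate h directly from its ring representation."""
    for k, a in enumerate(values):
        if rho[k] < abs(t) <= rho[k + 1]:
            return a
    return 0.0

def h_from_balls(t, rho, b):
    """Evaluate h from its ball representation."""
    return sum(bj for rj, bj in zip(rho, b) if abs(t) <= rj)

rho = [1.0, 2.0, 3.0, 4.0]      # hypothetical radii
values = [0.5, 0.25, 0.125]     # hypothetical values of h on the three rings
b = rings_to_balls(values)
print(b)
print([h_from_balls(t, rho, b) for t in (0.5, 1.5, 2.5, 3.5, 5.0)])
```

In this example the coefficient of the smallest ball equals $-0.5$ although $h$ vanishes there, which illustrates the remark that the constants $b_l$ may deviate significantly from the values of $h$.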
Step 4 (introducing residuals for generalized means). Define the residual $R_1(r)$ by Then, and in both cases $\lim_{r\to\infty}R_1(r) = 0$.

3.12
By the Lipschitz condition 2.6 and Condition 1 b ,

3.13
Here, we have used the fact that on $\mathbb{R}^d$ any two norms are equivalent, so with some $c_2, c_3 > 0$. Equations (2.3) and (3.13) imply that where $o(\lambda)$ and $o(1)$ do not depend on $r$. From this bound and (3.11), it follows that where $R_2(\lambda, r)$ is a new residual satisfying

$$R_2(\lambda, r) = o(\lambda)r + o(1) \quad \forall r > 0, \text{ as } \lambda \to 0. \tag{3.17}$$

Step 5 (proving (2.8) for the approximating function). Let $M_{a,d}\varphi_0 \ne 0$. For one term in (3.9), by (2.2) and the first equation in (3.16), we have

3.18
Summation of these equations produces

3.20
In case $M_{a,d}\varphi_0 = 0$, the only difference consists in application of the second equation in (3.16). The conclusion is (3.20) with $M_{a,d}\varphi_0 = 0$.

Step 6 (proving (2.8)). Using (3.7) and (3.20), we have

Proof of Theorem 2.6
Denote $J_\rho(\varphi) = \int_{\mathbb{R}^n}\varphi(t)\det A(\rho)\,g(A(\rho)t)\,dt$.

3.24
From 3.23 and 3.24 we deduce

3.25
Step 2 changing the limit variable .Let us have a closer look at the expression in the square brackets in 3.25 :

3.26
Since $\Lambda^{-1}(a)\Lambda(a) = I$, here $H(a) = P(a)$, and by Condition 3(c),

3.27
Considering $\lambda = \lambda_1$ the independent variable, take $\rho = \rho(\lambda)$ as its function. The limit relation $\rho \to a$ translates to $\lambda \to 0$, and (3.26) and (3.27) become,

Step 3 (proving a preliminary version of (2.18)). Now we check that Theorem 2.2 applies to (3.28) for any fixed $t_2$. Letting

3.32
where $M_{2,d}\varphi(t_2)$ is obtained by applying $M_{2,d}$ to $\varphi(t) \equiv \varphi(P_1(a)t_1 + P_2(a)t_2)$ with respect to $t_1$. Functions (3.28) are bounded by an integrable function of $t_2$:

3.33
By the dominated convergence theorem, (3.32) and (3.33) imply that Now, we combine (3.25), (3.28), (3.32), and (3.34) to get the limit relation (3.36).

Part (a)
Step

3.38
It remains to let r → ∞ to prove 3.37 .

3.39
Using $I = P(a)P(a)' = P_1(a)P_1(a)' + Q$, we see that

Part (b)
Step 1 (the generalized mean vanishes on $p$-integrable functions). If $\varphi \in L_p(\mathbb{R}^d)$ with some $p \in [1, \infty)$, then by Hölder's inequality

3.42
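The mechanism behind Step 1 can be seen in one dimension, where Hölder's inequality gives $(2r)^{-1}\int_{-r}^{r}|\varphi(t)|\,dt \le (2r)^{-1/p}\|\varphi\|_p$, and the right side tends to zero as $r \to \infty$. The sketch below checks this for the hypothetical function $\varphi(t) = 1/(1 + |t|)$ with $p = 2$, for which $\|\varphi\|_2 = \sqrt{2}$; the function and the helper names are assumptions of this example, and the exact form of the bound in the text may differ.

```python
import math

def phi(t):
    """Square-integrable but not integrable test function; its L_2 norm is sqrt(2)."""
    return 1.0 / (1.0 + abs(t))

def segment_mean_abs(f, r, n=100000):
    """Midpoint approximation of (1/(2r)) * integral of |f| over [-r, r]."""
    h = 2.0 * r / n
    return sum(abs(f(-r + (k + 0.5) * h)) for k in range(n)) * h / (2.0 * r)

def holder_bound(r, p, norm_p):
    """Upper bound (2r)^(-1/p) * ||f||_p for the mean of |f| over [-r, r]."""
    return (2.0 * r) ** (-1.0 / p) * norm_p

for r in (10.0, 100.0, 1000.0):
    # columns: r, mean of |phi| over [-r, r], Hoelder bound
    print(r, segment_mean_abs(phi, r), holder_bound(r, 2, math.sqrt(2.0)))
```

Both columns decrease to zero, the mean like $\ln(1 + r)/r$ and the bound like $r^{-1/2}$.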
Step 2 ($\lim J_\rho(\varphi)$ vanishes on functions with compact support). By Condition 4, (3.44)

Suppose (1.11) is true in the weak sense or, more generally, the family $\{f_\rho : 0 \le \rho < a\}$ converges in distribution, as $\rho \to a$, to a probability measure $\mu$ supported on some set $S \subset \mathbb{R}^n$. Then by (2.7) $\int_{\mathbb{R}^n}\varphi(t)f_\rho(t)\,dt \to \int_S\varphi\,d\mu$, $\varphi \in CL(\mathbb{R}^n)$. This equation and (3.44) imply $\int_S\varphi\,d\mu = 0$, $\varphi \in CL_0(\mathbb{R}^n)$. Taking here $\varphi \in CL_0(\mathbb{R}^n)$ such that $\varphi \ge 0$ and $\varphi = 1$ on $B_{2,n}(0, r)$, we have $\int_{S\cap B_{2,n}(0,r)}d\mu = 0$ for any $r > 0$. Thus, $\mu(S) = 0$, which is impossible.

Proof of Theorem 2.9
The proof of Theorem 2.9 will be preceded by a series of lemmas.The main difficulty consists in estimating integrals of the marginal density g 1 over cylindrical sets of an infinite Lebesgue measure.
Proof. Since $L_X$, $M_X$ are complementary orthoprojectors, the images $\operatorname{Im}M_X$ and $\operatorname{Im}L_X = \operatorname{Im}X$ are orthogonal to one another and where we have denoted $v = M_Xu$. Hence, only the component $v \in M_X\Phi$ determines whether $u$ belongs to $\Phi$ or not, while the component $L_Xu \in \operatorname{Im}X$ is arbitrary. This proves the equation $\Phi = M_X\Phi + \operatorname{Im}X$. Finally, the equation $M_X\Phi = \Gamma S$ follows by homogeneity from (3.45).

Now we review a few facts about the Moore-Penrose inverse in a form adapted to our needs. Let $A$ be a matrix of size $m \times n$, not necessarily square, and denote its rank $r = \operatorname{rank}A$. Its Moore-Penrose inverse $A^+$ is of size $n \times m$.

Lemma 3.2 (see [13, pages 24-25]). $A^+$ is unique and has the following properties.

3.47
Here and in the sequel, $\langle\cdot,\cdot\rangle$ stands for the scalar product.

(ii) Instead of writing the second term in the form $(I - H)z$, $z \in \mathbb{R}^n$, we can equivalently write it as $v \in \operatorname{Im}(I - H)$.
(iii) $A^+y = 0$ implies that $y = 0$. This is because if $A^+y = 0$ and $y = Ax$, then by (a) $y = Ax = AA^+Ax = AA^+y = 0$.
Recalling that a general solution of the equation Ax y is, by definition, the preimage A −1 y, we state our conclusion in a lemma.
2 a and let A and H be the accompanying matrices from Lemma 3.2.Then, the set T 2 defined by

3.49
In part (a) replace $Y$, $\tau$, $A$ by $Y \subseteq \operatorname{Im}(M_X - pp')$, $\tau \in \operatorname{Im}(M_X - pp')$, and $A = (M_X - pp')P_2(a)\Lambda_2^{-1}(a)$, respectively. Then the set $T_2$ has representation (3.48).
Proof. Let us show that $f_1 \in \operatorname{Im}X$ implies that

$$\operatorname{Im}A = \operatorname{Im}M_X. \tag{3.50}$$
The inclusion $\operatorname{Im}A \subset \operatorname{Im}M_X$ is obvious. Conversely, let $x \in \operatorname{Im}M_X$. Then, $x = M_Xx$. Since $f_1, \ldots, f_n$ is an orthonormal basis, we have

3.51
We have proved (3.50), which implies $Y - \tau \subset \operatorname{Im}A$. Application of Lemma 3.3 proves (3.48).
It is easy to check that $M_X - pp'$ is an orthoprojector and its image satisfies (3.49). From (3.49), we conclude that $M_X$ and $M_X - pp'$ commute and (3.52) holds (see [14, VI, Section 3, Lemma 2(a)]). Applying $M_X - pp'$ to both sides of (3.50) and using the last equation, we get

3.53
We have proved an analog of (3.50) for the matrix under consideration:

$$\operatorname{Im}A = \operatorname{Im}(M_X - pp'). \tag{3.54}$$
As previously mentioned, application of Lemma 3.3 proves 3.48 .
Lemma 3.5. Let $f_1$, $\tau$, $A$, $A^+$, and $H$ be the same as in Lemma 3.4(a).

Proof. Part a
Step 1 proving an auxiliary bound .Denote

3.57
By Lemma 3.2, (3.50) implies that

$$r \equiv \operatorname{rank}A = n - k. \tag{3.58}$$
By Lemma 3.3, $A^+$ is a one-to-one mapping from $\operatorname{Im}A$ to $\operatorname{Im}A^+$, both of which are of dimension $r$. We use this fact twice.
Firstly, selecting bases, one in $\operatorname{Im}A$ and another in $\operatorname{Im}A^+$, we can identify $A^+$ with its matrix representation in these bases. This matrix is square of order $r$ and rank $r$, so the Jacobian $c_1$ of the transformation $u = A^+v$, $v \in \Gamma\Delta - \tau$, is positive. Therefore,

3.59
Secondly, $\|A^+v\|_2 = 0$ implies that $A^+v = 0$, which for $v \in \operatorname{Im}A$ implies that $v = 0$. Thus, $\|A^+v\|_2$ is a norm on $\operatorname{Im}A$ and by (3.14),

3.60
As $g$ is assumed nonincreasing, $g_k$ is also nonincreasing, and (3.59) leads to

3.61
We bound this using polar coordinates: Here, the quantity is finite because by condition (2.23), $g_k$ is integrable in the neighborhood of zero and $(\rho/c_2 + \|\tau\|_2)^{r-1}$ is of order $\rho^{r-1}$ at infinity.
Step 2 (proving the main bound). The matrix $A$ from Lemma 3.4(a) is of size $n \times (n-1)$ and $A^+$ is of size $(n-1) \times n$. Equation (3.50) and here are orthogonal, we have where the right side defines the function $h$. The transition from $t_2$ to the coordinates $u$, $v$ is described by an orthogonal matrix. This can be shown as follows. Take in $\operatorname{Im}H$ an orthonormal basis and identify $u$ with the set of coordinates in that basis. Do the same with $\operatorname{Im}(I - H)$ and $v$. Join the two bases. The equation $t_2 = u + v$ induces a transformation between the original coordinates of $t_2$ and the ones in the joint basis. Since $\operatorname{Im}H \perp \operatorname{Im}(I - H)$, the Jacobian of the resulting transformation is unity. Hence, by (3.48), (3.64), (3.65), and (3.57),

3.66
Part (b). In the bound (see (3.59) and (3.60)), (3.68)

Lemma 3.6. Let $f_1$, $p$, $\tau$, $A$, $A^+$, and $H$ be the same as in Lemma 3.4(b).
Proof. The proof essentially follows that of Lemma 3.5(a). From (3.49) and (3.54), we see that instead of (3.58) we have $r = n - k - 1$. Instead of (3.64) we have $\operatorname{rank}(I - H) = k$. This leads to the definition of the marginal density used in (2.25). With these changes, (3.62) and (3.66) are true and prove part (a).
With the same changes, the proof of Lemma 3.5(b) goes through.
In the course of the proof, we approximate $\chi_\Phi$ by functions $\varphi_\varepsilon^\pm \in CL(\mathbb{R}^n)$ from below and above. Then, we show that the rate of approximation improves as $\varepsilon \to 0$ and $\rho \to a$.
Step 1 (constructing the approximating functions). Denote by $\rho(x, A) = \inf_{y\in A}\|x - y\|_2$ the distance from a point $x$ to a set $A$. For a given $\varepsilon > 0$, set

Obviously, $\Psi_\varepsilon^- \subset \Phi \subset \Psi_\varepsilon^+$. As in [10, page 12] (proof of (ii) ⇒ (iii)), we can construct functions $\varphi_\varepsilon^\pm$ that are uniformly continuous on $\mathbb{R}^n$ and satisfy

3.69
Step 2 reducing the convergence statement to a special case .Denote

3.70
Here, we show that lim

3.74
By (3.73), for the same $\delta$, there is $\rho_\delta \in (0, a)$ such that

3.78
Since each of the square brackets on the right is nonnegative, (3.77) gives $\lim_{\varepsilon \to 0} I(\chi_{\Psi_\varepsilon^{\pm}}) = I(\chi_\Phi)$. Now, it suffices to take into account the inequality
Step 4 (geometric description of the task). Denote by $l(s, i) = \{st + i : t \in \mathbb{R}\}$ a straight line with slope $s \in \mathbb{R}^n$ and intercept $i \in \mathbb{R}^n$, and put

3.79
This expression shows that we need to study the set $\{t_1 \in \mathbb{R} : f_1 t_1 + z(t_2) \in B_\varepsilon\}$ of points common to the straight line $l(f_1, z(t_2))$ and the band $B_\varepsilon$. The cylindrical nature of the sets considered implies that the band admits an equivalent definition as a cylinder with the base and element $\operatorname{Im} X$. From this definition, we see that the condition

3.81
We have to sort the points $t_2 \in \mathbb{R}^{n-1}$ into (a) those for which the set of points $t_1$ satisfying (3.81) is so small that (3.79) is zero and (b) those for which that set is large enough for (3.79) to be positive.
Step 5 (case $f_1 \in \operatorname{Im} X$). In this case, $M_X f_1 = 0$, and (3.81) reduces to a condition, (3.82), that does not contain $t_1$. Therefore, if it is not satisfied, the integrand in (3.79) is identically zero and $b_\varepsilon(t_2) = 0$ for such $t_2$. If it is satisfied, the integrand in (3.79) is identically 1 and $b_\varepsilon(t_2) = 1$ for such $t_2$. Hence,

3.83
The set of $t_2$ satisfying (3.82) has infinite Lebesgue measure, which precludes us from applying the continuity property of the Lebesgue measure for an integrable $h$ ($\int_A h(t)\,dt \to 0$ as $\operatorname{mes} A \to 0$). As one can see from Figure 1, $M_X B_\varepsilon$ is not a cone-like set. We divide it into two parts: $\alpha_\varepsilon^{\mathrm{in}} = \{s \in M_X B_\varepsilon : \|s\|_2 \le 1\}$ includes the points inside the unit ball and on the unit sphere, and $\alpha_\varepsilon^{\mathrm{out}} = \{s \in M_X B_\varepsilon : \|s\|_2 > 1\}$ includes the points of $M_X B_\varepsilon$ outside the unit ball. Put $\Delta_\varepsilon = \{s \in \operatorname{Im} M_X : \|s\|_2 = 1,\ \rho(s, \operatorname{bd}(M_X \Phi)) \le \varepsilon\}$ and let us show that

3.84
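Suppose $x \in \alpha_\varepsilon^{\mathrm{out}}$. Then, $x \in \operatorname{Im} M_X$, $\|x\|_2 > 1$ and, by (3.80), there exists a sequence $\{y_k\} \subset \operatorname{bd}(M_X \Phi)$ such that $\limsup \|x - y_k\|_2 \le \varepsilon$. The sequence $\{y_k\}$ is bounded and therefore precompact, while $\operatorname{bd}(M_X \Phi) = M_X(\operatorname{cl} \Phi \setminus \Phi)$ is closed by construction. Hence, we can assume that $y_k \to y \in \operatorname{bd}(M_X \Phi)$ (passing to a subsequence, if necessary). Then, $\|x - y\|_2 \le \varepsilon$. This implies $\|x/\|x\|_2 - y/\|x\|_2\|_2 \le \varepsilon/\|x\|_2 < \varepsilon$. Let $s = x/\|x\|_2$. Since $\operatorname{bd}(M_X \Phi)$ is cone-like, it follows that $y/\|x\|_2 \in \operatorname{bd}(M_X \Phi)$ and $\rho(s, \operatorname{bd}(M_X \Phi)) < \varepsilon$. This proves that $s \in \Delta_\varepsilon$, $x = \|x\|_2 s \in \Gamma\Delta_\varepsilon$, and (3.84) is true. From (3.84), we conclude that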

3.85
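The normalization argument used to prove (3.84) rests on the fact that the distance to a cone-like set scales with the point: dividing $x$ by $\|x\|_2 > 1$ divides its distance to the set by the same factor. The following sketch checks this numerically for a hypothetical cone-like boundary in the plane made of two rays from the origin; the ray angles, the sample points, and the function names are assumptions chosen only for illustration.

```python
import math

# Hypothetical cone-like boundary: two rays from the origin in the plane.
RAY_DIRECTIONS = [(math.cos(a), math.sin(a)) for a in (0.3, 1.2)]

def dist_to_ray(x, d):
    """Distance from x to the ray {t d : t >= 0}, d a unit vector."""
    t = max(0.0, x[0] * d[0] + x[1] * d[1])
    return math.hypot(x[0] - t * d[0], x[1] - t * d[1])

def dist_to_boundary(x):
    """Distance from x to the union of the rays."""
    return min(dist_to_ray(x, d) for d in RAY_DIRECTIONS)

def normalize(x):
    """Project a nonzero point onto the unit circle."""
    norm = math.hypot(x[0], x[1])
    return (x[0] / norm, x[1] / norm)

# Sample points outside the unit ball.
samples = [(2.0, 0.8), (3.0, 7.5), (-1.5, 0.1), (10.0, 3.2)]
for x in samples:
    norm = math.hypot(*x)
    d_x = dist_to_boundary(x)
    d_s = dist_to_boundary(normalize(x))
    # d_s should equal d_x / norm, hence be smaller than d_x.
    print(x, d_x, d_s, d_x / norm)
```

In particular, any point outside the unit ball that is within $\varepsilon$ of the boundary is mapped to a point of the unit sphere that is strictly closer than $\varepsilon$ to it.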
This means that the set of solutions of (3.82) is contained in the union of two sets:

3.87
Equations (3.86) and (3.87) prove that $I(\chi_{B_\varepsilon}) \to 0$ as $\varepsilon \to 0$.
Step 6 (case $f_1 \notin \operatorname{Im} X$). Equation (2.5) implies that if the straight line $l(M_X f_1, M_X z(t_2))$ has a bounded intersection with the band $M_X B_\varepsilon$, then $M T_1 \chi_{B_\varepsilon}(t_2) = 0$. Therefore, we are interested in the set of slopes:
Let us prove that

3.89
Suppose $M_X f_1 \in \mathrm{Sl}$. Then, there exists a sequence of points $p_k$ which lie on the straight line,
Fix $M_X f_1 \in \mathrm{Sl}$ and denote $p = M_X f_1/\|M_X f_1\|_2 \in \operatorname{bd} S \subset \operatorname{Im} M_X$. The matrix $pp'$ projects $\operatorname{Im} M_X$ onto the straight line $l(M_X f_1, 0)$. The projector $M_X - pp'$ projects $\operatorname{Im} M_X$ onto $\operatorname{Im}(M_X - pp')$. Condition (3.81) implies two equations in projections:

3.91
The first one can be safely omitted because the shift $pp' M_X z(t_2)$ along a line does not change the property of the line to have an unbounded intersection with $M_X B_\varepsilon$. Denoting $A = (M_X - pp')P_{2a}\Lambda_{2a}^{-1}$, $\tau = (M_X - pp')QX\beta$, and using (3.52), we write the second equation as
Proof. If $A(t_2)$ is bounded, then $\chi_\Phi(f_1 t_1 + z(t_2)) = 0$ outside a bounded set, and by (2.5),
Obviously, the integral is null if $G = \mathbb{R}^{n-1}$. By (3.97), condition (3.95) implies that

3.99
Since mes G > 0, we have

3.100
In case $G = \mathbb{R}^{n-1}$, the last integral is 1.
Consider, for example, the case of the right half-axis:
Obviously, $M_X f_1 \in \Gamma S$ and $p \in S$. In the equation

3.106
Since $f_1$ is a limit point of $\Phi$, there is a sequence
Thus, $p'Wp > c$ is impossible and (3.106) is true.
In the three examples that follow, we assume that $n > k + 1$. Then, $\operatorname{Im} Q \not\subseteq \operatorname{Im} X$, and by Lemma 3.7(b), $M_X z(t_2) \ne 0$ for almost all $t_2 \in \mathbb{R}^{n-1}$.
Let $k = i_0 - 1$ and suppose that

3.112
Because of (3.111), here $M_X f_1 = f_1$. Besides, since $z(t_2)$ is orthogonal to $f_1$, the decomposition does not contain $f_1$. Hence, (3.112) becomes

3.114
Remembering that the eigenvectors of $\Sigma_a^{-1/2}$ are just reordered eigenvectors of $W$, we get

3.120
for all $t_1 \in \mathbb{R}$ and almost all $t_2 \in \mathbb{R}^{n-1}$. Hence, the set (3.94) is empty for almost all $t_2 \in \mathbb{R}^{n-1}$ and, by Lemma 3.8(a), $\beta_\Phi = 0$.

Figure 2: The base of $\Phi$ contains the area between two solid lines $L_a$ and $L_b$. The dashed line is $l(M_X f_1, M_X z(t_2)) = L_a + M_X z(t_2)$.

Sub-sub-case (c) (Example of $\beta_\Phi = 1/2$). $M_X z(t_2)$ runs over $\operatorname{Im} M_X P_{2a}$. If $k > 1$, then $n - 1 > n - k$, and we can choose the eigenvectors $f_2, \ldots, f_n$ of $\Sigma_a^{-1/2}$ so that $\operatorname{Im} Q \supset \operatorname{Im} M_X$, and then $M_X z(t_2)$ runs over $\operatorname{Im} M_X = \operatorname{Im} M_X P_{2a}$. Let $\Gamma S$ be a union of two cones, as in Figure 2. The line $l(M_X f_1, 0)$ lies on the boundary of the cones. In Figure 2, $L_a = l(M_X f_1, 0)$ and $L_b = l(b(t_2), 0)$. For any nonzero shift $M_X z(t_2)$, half of the shifted line is contained in $M_X \Phi$ because $L_a$ and $L_b$ are not parallel. Here, "half" means "either for all large positive $t_1$ or for all large negative $t_1$." By Lemma 3.8(c), $\beta_\Phi = 1/2$.
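The value $1/2$ in this example comes from averaging an indicator over $t_1$ when only one half of the line lies in the set. As a hedged illustration, the sketch below computes the average of the indicator of the right half-axis $[0, \infty)$ over the interval $[c - r, c + r]$, assuming that in dimension one the generalized mean is the limit of such interval averages as $r \to \infty$. The function name and the centers tried are hypothetical.

```python
def mean_of_half_axis_indicator(center, r):
    """Average of the indicator of [0, inf) over [center - r, center + r]."""
    lo, hi = center - r, center + r
    overlap = max(0.0, hi - max(lo, 0.0))  # length of the part of the interval in [0, inf)
    return overlap / (2.0 * r)

# The average approaches 1/2 as r grows, whatever the center of the interval.
for center in (0.0, 3.0, -5.0):
    for r in (10.0, 1e3, 1e6):
        print(center, r, mean_of_half_axis_indicator(center, r))
```

For $r > |c|$ the average equals $(c + r)/(2r)$, so the shift $c$ only contributes a term of order $1/r$; this is the one-dimensional version of the statement that the generalized mean does not depend on the center of the balls.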

3.122
In $z$ coordinates, this is an ellipsoid with half-axes $h(\rho)/\lambda_j(\rho)$.
Step 2 (characterizing the rate of decay of $h(\rho)$). First we determine the rate with which $p^{-1}(x)$ approaches 0 when $x \to m$. To approximately invert $p$, in the equation $x = p(r)$, we use

3.126
Step 3 (behavior of half-axes). We intend to show that for $j = d+1, \ldots, n$ the half-axes tend to zero, so that in the limit $z_{d+1} = \cdots = z_n = 0$ (3.127), whereas for $j = 1, \ldots, d$, they tend to $\infty$, and in the limit, $z_1, \ldots, z_d$ can be any real numbers. For $j = d+1, \ldots, n$, from (2.28), we see that the condition $h(\rho)/\lambda_j(\rho) \to 0$ is equivalent to $h(\rho) \to 0$. Because of (2.31), this condition is the same as $(1 - \varepsilon(\rho))m \to m$, which holds by (2.29).
For the remaining half-axes, we get by (2.27), (3.126), and (2.32)

3.128
We have proved our statement on convergence of the half-axes. According to (2.26), (3.127) rewrites as $(P_a(y - X\beta))_j = 0$, $j = d+1, \ldots, n$. Since the rows of $P_a$ are eigenvectors of $A_a$, the last equation means that $y - X\beta$ is orthogonal to eigenvectors numbered $j = d+1, \ldots, n$. Hence, $y - X\beta$ belongs to the null space of $A_a$, and the set of all such $y$'s is $X\beta + N(A_a)$. In this proof, the set $X\beta + N(A_a)$ is obtained as the limit of the ridge of $f_\rho$. In general, this ridge can have any shape and the limit will not be $X\beta + N(A_a)$.

Summary
Theorem 2.2 is the key result in this paper. We tried to obtain it under minimal conditions on $g$. It would be interesting to relax our conditions. In particular, densities that are not spherically symmetric are of interest in applications. Note that when the dependence on $\lambda$ is more complex than in $g_\lambda(x) = \lambda g(\lambda x)$, one needs very different techniques and more detailed assumptions about $g$ to obtain an analog of Theorem 2.2. Such results can be found in [15]. Another promising direction of research is to consider tests other than the Cliff-Ord one.
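For readers who want a numerical feel for the scaling $g_\lambda(x) = \lambda g(\lambda x)$, the following sketch revisits the uniform example from the Introduction: the total mass of $g_\lambda$ stays equal to 1, while its pairing with a fixed compactly supported test function tends to zero as $\lambda \to 0$. The test function `phi`, the midpoint-rule integrator, and the grid size are assumptions made for this illustration and do not come from the paper.

```python
def g(t):
    """Uniform density on [-1, 1]."""
    return 0.5 if -1.0 <= t <= 1.0 else 0.0

def g_lambda(t, lam):
    """Rescaled density g_lambda(t) = lam * g(lam * t), supported on [-1/lam, 1/lam]."""
    return lam * g(lam * t)

def phi(t):
    """Hypothetical test function: continuous, supported on [-1, 1], integral 1."""
    return max(0.0, 1.0 - abs(t))

def integrate(f, a, b, n=20_000):
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

for lam in (1.0, 0.1, 0.01):
    a, b = -1.0 / lam, 1.0 / lam
    mass = integrate(lambda t: g_lambda(t, lam), a, b)
    pairing = integrate(lambda t: g_lambda(t, lam) * phi(t), a, b)
    # The pairing is bounded by (lam / 2) times the integral of |phi|.
    print(lam, mass, pairing, lam / 2)
```

The mass column stays at 1 while the pairing shrinks in proportion to $\lambda$, which is the sense in which the mass escapes to infinity rather than spreading over the real line.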

Remark 2.8. If Condition 4 holds, then (1.11) cannot be true if the convergence in distribution is understood. Because of the identity $X\beta + N(A_a) = (I - Q)X\beta + QX\beta + N(A_a) = QX\beta + N(A_a)$, (1.11) correctly captures one feature of the limit distribution: it depends on $X\beta$ only through $QX\beta$.

Theorem 2 . 9
representation of β Φ .Let Conditions 2 and 3 hold with d 1. a If f 1 ∈ Im X and 1 0

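Step 1 ($M_{a,d}\varphi$ does not depend on the center of balls). Let us make definition (2.4) more precise: $M_{a,d}\varphi(x) = \lim_{r \to \infty} m(\varphi, B_{a,d}(x, r))$, $x \in \mathbb{R}^d$. Here, we prove that $M_{a,d}\varphi(x) = M_{a,d}\varphi(0)$ for any $x \in \mathbb{R}^d$. (3.37) For $r \ge \|x\|_a$, one has implications $\|y\|_a \le r - \|x\|_a \Rightarrow \|y - x\|_a \le r \Rightarrow \|y\|_a \le \|y - x\|_a + \|x\|_a \le r + \|x\|_a$. Geometrically, they mean the inclusion $B_{a,d}(0, r - \|x\|_a) \subset B_{a,d}(x, r) \subset B_{a,d}(0, r + \|x\|_a)$. This inclusion implies that
$$\left(\frac{r - \|x\|_a}{r}\right)^{d} m(\varphi, B_{a,d}(0, r - \|x\|_a)) = \left(\frac{r - \|x\|_a}{r}\right)^{d} \frac{1}{v_a (r - \|x\|_a)^d} \int_{B_{a,d}(0,\, r - \|x\|_a)} \varphi(y)\,dy \le \frac{1}{v_a r^d} \int_{B_{a,d}(x, r)} \varphi(y)\,dy = m(\varphi, B_{a,d}(x, r))$$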

Regarding (3.46), we can remark the following. (i) The two terms on the right are orthogonal because, by (b) and (c), $(Ay, (I - H)z) = (HAy, (I - H)z) = (Ay, (H - H^2)z) = 0$.

Figure 1: $M_X \Phi$ consists of two cones bounded by solid lines. $M_X B_\varepsilon$ is bounded by dashed lines.

Case 3 .Step 1 , we have f ρ y n j 1 λ 1 λ j ρ m. 3 . 121 Since p − 1
Suppose f 1 ∈ Φ − .Then, as in Case 1, M X f 1 / 0 and p is defined.It satisfies p Wp < c.The term in the square brackets in 3.102 for all large |t 1 | belongs to an open set M X Φ − .Since this set is cone-like, the left side of 3.102 belongs to M X Φ − for all large |t 1 |.The set A t 2 is bounded for all t 2 ∈ R n−1 , and by Lemma 3.8 a , β Φ 0. the ε ρ -maximizing set is an ellipsoid .For y ∈ M ε ρ by orthogonality of P and Condition 7j ρ p Λ ρ P ρ y − Xβ 2 ≥ 1 − ε ρ n j is decreasing, this is equivalent to Λ ρ P ρ y − Xβ 2 ≤ p −1 1 − ε ρ m .Denoting h ρ p −1 1 − ε ρ m , z P ρ y − Xβ , we rewrite the resulting inequality as

38ISRN Probability and Statistics representation 2 .
30 : m 1 − c p r α p 1 o r α ε .This is equivalent to x/m 1 o r α ε 1 − c p r α p .Solving this equation for the r involved in the right-hand side produces p ρ → a, we can put r a − ρ in the last equation to get p and the generalized mean is applied over t 1 .Remark 2.10.a As can be seen from the proof, the theorem holds for any test with the critical region satisfying 2.21 .b Conditions 2.23 and 2.25 are technical assumptions that provide integrability in the neighborhood of the origin of marginal densities that arise in the course of the proof.
Using the definition of $T_1$ and (3.37), we see that this is the same as $M_{2,d}T_1\varphi(t_2)$. Hence, the existence of $M_{2,d}T\varphi(\cdot + X\beta)(t_2)$ is equivalent to that of $M_{2,d}T_1\varphi(t_2)$. Application of Theorem 2.6 proves (2.20).
1Xβ P 1 a t 1 P 1 a P 1 a Xβ QXβ P 1 a t 1 P 1 a Xβ QXβ.
The function ϕ satisfies the Lipschitz condition, | ϕ x − ϕ y | ||ϕ x | − |ϕ y || ≤ c 1 x − y 2 , and g satisfies Condition 2. Part a of this theorem allows us to use 2.20 with ϕ in place of ϕ and g instead of g whenever M 2,d T 1 ϕ t 2 exists for almost all t 2 ∈ R n−d .Let CL 0 R n denote the set of functions ϕ ∈ CL R n with compact support supp ϕ.Suppose ϕ ∈ CL 0 R n .By Step 1, M 2,d T 1 ϕ t 2 ≡ 0. Equations 2.20 and 3.43 imply that lim dt,3.43where ϕ t |ϕ t | and f ρ is obtained from 1.9 by replacing g x with g x p x 2 .
where |t k 1 | → ∞, and which are 2ε-close to bd M X Φ : there exists a sequence y k ∈ bd M X Φ such that Letting k → ∞, we get M X f 1 δ s ∈ bd S , and 3.89 is true.
for all large |t 1 |.Hence, the set A t 2 contains the neighborhood of infinity more precisely, all t 1 with |t 1 | > N where N depends on t 2 .By Lemma 3.8 b , whereG R n−1 , then β Φ 1. Suppose f 1 ∈ bd Φ .Subcase 2.1.Let f 1 ∈ Im X or, equivalently, M X f 1 0. Condition 3.93 reduces to Im I − H where A is the Moore-Penrose inverse of A M X P 2 a Λ −1 2 a ,τ M X QXβ, and H is the projector from Lemma 3.2.Using Lemma 3.8 b , 3.66 , and 3.59 we get ≥ t, which is impossible by assumption.A is one-to-one on Im A Im M X , so A ΓS − τ is a nonempty proper subset of Im A .Therefore, 0 < β Φ < 1.
for all large |t 1 |, the term in the square brackets is close to p and, because M X Φ is open in Im M X , this term belongs to M X Φ.Since M X Φ is cone-like, the left side of 3.102 belongs to M X ∈ Im X.As in Case 1 , p is defined.Let us show that it satisfies p Wp c.
Hence, normalizing x, s x/ x 2 , and assuming that μ > c, we get from 3.115 ∈ R and almost all t 2 ∈ R n−1 .This shows that the set 3.94 coincides with R for almost all t 2 ∈ R n−1 and by Lemma 3.8 b β Φ 1. Sub-sub-case b Example of β Φ 0. We change 3.111 so as to obtain in 3.115 μ − in place of μ .Let k n − i 0 and suppose thatf i 0 1 , . . ., f n span Im X, f 1 , f − 2 , . . ., f − i 0 span Im M X .