Asymptotic Normality of a Nonparametric Conditional Quantile Estimator for Random Fields

Given a stationary multidimensional spatial process (𝑍i=(𝑋i,𝑌i)∈ℝ𝑑×ℝ,i∈ℤ𝑁), we investigate a kernel estimate of the spatial conditional quantile function of the response variable 𝑌i given the explicative variable 𝑋i. Asymptotic normality of the kernel estimate is obtained when the sample considered is an 𝛼-mixing sequence.


Introduction
In this paper, we are interested in nonparametric conditional quantile estimation for spatial data. Spatial data are modeled as finite realizations of random fields that is stochastic processes indexed in Z N , the integer lattice points in the N-dimensional Euclidean space N ≥ 1 . Such data are collected from different locations on the earth and arise in a variety of fields, including soil science, geology, oceanography, econometrics, epidemiology, environmental science, forestry, and many others; see Chilès and Delfiner 1 , Guyon 2 , Anselin and Florax 3 , Cressie 4 , or Ripley 5 . In the context of spatial data, the analysis of the influence of some covariates on a response variable is particularly difficult, due to the possibly highly complex spatial dependence among the various sites. This dependence cannot be typically modeled in any adequate way.
Conditional quantile analysis is of wide interest in modeling of spatial dependence and in the construction of confidence predictive intervals. There exist an extensive literature and various nonparametric approaches in conditional quantile estimation in the nonspatial case N 1 for independent samples and time-dependent observations; see, for example, 2 Advances in Decision Sciences Stute 6 , Samanta 7 , Portnoy 8 , Koul and Mukherjee 9 , Honda 10 , Cai 11 ,Gannoun et al. 12 , and Yu et al. 13 . Extending classical nonparametric conditional quantile estimation for dependent random variables to spatial quantile regression is far from being trivial. This is due to the absence of any canonical ordering in the space and of obvious definition of tail sigma-fields.
Although potential applications of conditional spatial quantile regressions are without number, only the papers of Koenker and Mizera 14 , Hallin et al. 15 , Abdi et al. 16 , and Dabo-Niang and Thiam 17 have paid attention to study these regression methods. Hallin et al. 15 gave a Bahadur representation and an asymptotic normality results of a local linear conditional quantile estimator. The method of Koenker and Mizera 14 is a spatial smoothing technique rather than a spatial auto regression one and they do not take into account the spatial dependency structure of the data. The work of Abdi et al. 16 deals with 2r-mean r ∈ N * and almost complete consistencies of a kernel estimate of conditional quantiles. The paper of Dabo-Niang and Thiam 17 gives the L 1 norm consistency and asymptotic normality of a kernel estimate of the spatial conditional quantile, but this estimate is less general than the one considered here.
However, conditional mean regression estimation for spatial data has been considered in several papers; some key references are Carbon  The rest of the paper is organized as follows. In Section 2, we provided the notations and the kernel quantile estimate. Section 3 is devoted to assumptions. The asymptotic normality of the kernel estimate is stated in Section 4. Section 5 contains a prediction application based on quantile regression and applied to simulated data. Proofs and preliminary lemmas are given in the last section.

Kernel Conditional Quantile Estimator
Let Z i X i , Y i , i ∈ Z N N ≥ 1 be an R d × R-valued measurable strictly stationary spatial process d ≥ 1 , with same distribution as the vector of variables X, Y and defined on a probability space Ω, A, P . A point i i 1 , . . . , i N in Z N will be referred to as a site and may also include a time component.
We assume that the process under study Z i is observed over a rectangular domain I n {i i 1 , . . . , i N ∈ Z N , 1 ≤ i k ≤ n k , k 1, . . . , N}, n n 1 , . . . , n N ∈ Z N . We will write n → ∞ if min{n k } → ∞ and |n j /n k | < C for a constant C such that 0 < C < ∞ for all j, k such that 1 ≤ j, k ≤ N. In the sequel, all the limits are considered when n → ∞. For n n 1 , . . . , n N ∈ Z N , we set n n 1 , . . . , n N . Let S be a set of sites. B S will denote in what follows, the Borel σ-field generated by Z i , i ∈ S. We assume that the regular version of the conditional probability of Y given X exists and has a bounded density with respect to Lebesgue's measure over R. For all x ∈ R d , we denote by F x resp., f x the conditional distribution function resp., the conditional density of Y given X x. In the following, x is a fixed point in R d and we denote by V x a neighborhood of this point. For x ∈ R d , we denote by x the Euclidian norm of x. We suppose that the marginal and joint densities f X and f X, Y of, respectively, X and X, Y exist with respect to Lebesgue measures on R d and R d 1 .
For α ∈ 0, 1 , the conditional quantile of order α of F x , denoted by q α x , can be written as a solution of the equation F x q α x α.

3
To insure that q α x exists and is unique, we assume that F x is strictly increasing. Let The conditional distribution F x and the corresponding density f x can be estimated by the following respective estimators: where K 1 is a kernel density, K 2 is a distribution function, K is the first derivative of K 2 , and h 1 h 1, n resp., h 2 h 2, n are sequences of positive real numbers tending to 0 when n → ∞.
Remark that we can write F x y g n x, y / f x , where g n x, y Then, one can consider the alternative local constant estimator see Hallin et al. 15 defined by Let us mention that it can be shown that 2.4 is equivalent to where F x · is the same estimator of F x · as F x · except that H i y 1 {Y i ≤ y} . In this paper, we will focus on the study of the asymptotic behavior of q α x , since in practice some simulations permit to remark that the differences between this estimator and the local linear one are too small to affect any interpretations; see also Dabo-Niang and Thiam 17 and Gannoun et al. 12 .

Assumptions
We denote by g j the derivative of order j of a function g. In what follows, C and C will denote any positive constant.

General Assumptions
We will use the following four regularity conditions see, e.g., Gannoun  If f x q α x 0, then one can use a condition like F x · is of class C j , F x k q α x 0 for 1 ≤ k < j, and 0 < C < |F x j y | < C < ∞ as in Ferraty et al. 28 .
H 3 We assume that the conditional density f X i ,X j of Y i , Y j given X i , X j exists and is uniformly bounded in i, j.
For simplicity, we assume the following condition on the kernel K 1 see, e.g., Devroye 29 .
H 4 There exist C 1 and C 2 , 0 < C 1 < C 2 < ∞ such that 3.2 H 5 K 2 is of class C 1 with a symmetric, Lipschitz, bounded, and compact support density K. In addition, we assume that the restriction of K 2 on {t ∈ R, K 2 t ∈ 0, 1 } is a strictly increasing function.
Assumption H 5 is classical in nonparametric estimation and is satisfied by usual kernels such as Epanechnikov and Biweight, whereas the Gaussian density K is also possible; it suffices to replace the compact support assumption by R d |t| b 2 K t dt < ∞. Assumption H 5 ensures the existence and the uniqueness of the quantile estimate q α x .

Dependency Conditions
In spatial dependent data analysis, the dependence of the observations has to be measured. Here we will consider the following two dependence measures.

Mixing Condition
The spatial dependence of the process will be measured by means of α-mixing. Then, we consider the α-mixing coefficients of the field Z i , i ∈ Z N , defined by the following: there exists a function ϕ t ↓ 0 as t → ∞, such that E and E subsets of Z N with finite cardinals are where Card E resp., Card E is the cardinality of E resp., E , dist E, E the Euclidean distance between E and E , and ψ : N 2 → R is a symmetric positive function nondecreasing in each variable. Throughout the paper, it will be assumed that ψ satisfies either ψ n, m ≤ C min n, m , ∀n, m ∈ N 3.5 or ψ n, m ≤ C n m 1 β , ∀n, m ∈ N 3.6 6 Advances in Decision Sciences for some β ≥ 1 and some C > 0. We assume also that the process satisfies a polynomial mixing condition: If ψ ≡ 1, then Z i is called strongly mixing. Many stochastic processes, among them various useful time series models, satisfy strong mixing properties, which are relatively easy to check. Conditions 3.5 -3.6 are used in Tran

Local Dependence Condition
Since we aim to get the same rate of convergence as in the i.i.d. case, we need some local dependency assumptions. Then, we assume the following local dependency condition used in Tran 30 .
The joint probability density f X i ,X j of X i , X j exists and satisfies for some constant C and for all u, v, i, j.
In addition, let the density f Z i , Z j of Z i , Z j exist and where C is a positive constant.
In the following, the notations D → and P → mean, respectively, convergences in distribution and in probability.

Consistency Results
This section contains results on asymptotic normality of the conditional quantile estimate. The main result of this paper is given by the following theorem.
q α x , with g n and f being the estimates defined in Section 2.
The proof of this theorem is based on the following three lemmas.
, and 3.5 or 3.6 , one has

Lemma 4.4. Under assumptions of Theorem 4.1, one has:
where q * α x is an element of the interval of extremities q α x and q α x .
Proof of Theorem 4.1. By assumption H 5 , g n x, · is of class C 1 . Then a Taylor expansion on a neighborhood of q α x gives where q * α x is an element of the interval of extremities q α x and q α x . It follows that

4.6
Then, we have

4.7
Lemmas 4.2 and 4.4 and Slutsky's theorem imply that the first term of the right-hand side of the last equality above tends in distribution to N 0, 1 . In addition, Lemma 4.3 permits to write This last tends to 0 by H 8 . Thus, the second term goes to zero in probability. This yields the proof.
Before going further, it should be interesting to give examples where all our conditions on the bandwidths are satisfied. That is done through the two following remarks.
In this case, for any positive real number τ such that Lastly, the condition H 9 iv is h −d 1−2/r q −θ 1−2/r N → 0. It can be written equivalently Then, to satisfy the condition H 9 iv , it is enough to Advances in Decision Sciences Thus, H 6 , H 8 , and H 9 resp., H 7 -H 9 are satisfied when 2 < r < 2Nd/ Nd − 1 and 4.9

Application
In this section, we present a quantile prediction procedure and then apply it to simulated data.

Prediction Procedure
An application where a multidimensional stationary spatial process may be observed is the case of prediction of a strictly stationary R-valued random field ξ i , i ∈ Z N at a given fixed point i 0 when observations are taken from a subset of I n , not containing i 0 , see Biau and Cadre 20 or Dabo-Niang and Yao 27 .
Assume that the value of the field at a given location depends on the values taken by the field in a vicinity U i 0 of i 0 i 0 / ∈ U i 0 ; then the random variables whose components are the {ξ i , i ∈ U i 0 } are an R d -valued spatial process, where d is the cardinal of U i 0 . In other words, we expect that the process ξ i satisfies a Markov property; see, for example, Biau and Cadre 20 or Dabo-Niang and Yao 27 . Moreover, we assume that U i 0 U i 0 , where U is a fixed bounded set of sites that does not contain 0.
Suppose that ξ i is bounded and observed over a subset O n of I n . The aim of this section is to predict ξ i 0 , at a given fixed point i 0 not in O n ⊂ N N . It is well known that the best predictor of ξ i 0 given the data in U i 0 in the sense of mean-square error is To define a predictor of ξ i 0 , let us consider the R d -valued random variables ξ i {ξ u , u ∈ U i ⊂ O n }. The notation of the previous sections is used by setting As a predictor of ξ i 0 , one can take the conditional median estimate ξ i 0 q 0.5 ξ i 0 . We deduce from the previous consistency results the following corollary that gives the convergence of the predictor ξ i 0 .

Advances in Decision Sciences
This consistency result permits to have an approximation of a confidence interval and predictive intervals that consists of the α 2 −α 1 100% confidence intervals with bounds q α 1 ξ i 0 and q α 2 ξ i 0 , α 1 < α 2 .

Numerical Properties
In this section, we study the performance of the conditional quantile predictor introduced in the previous section towards some simulations. Let us denote by GRF m, σ 2 , s a Gaussian random field with mean m and covariance function defined by We consider a random field ξ i i ∈ N 2 from the three following models.

Model 1. One has
Model 2. One has Model 3. One has The choice of U i in the models is motivated by a reinforcement of the spatial local dependency. Set We supposed that the field ξ i , i ∈ N 2 is observable over the rectangular region I n , observed over a subset O n and nonobserved in a subset V n . A sample of size n 61 * 61 3721 obtained from each model is plotted in Figure 1.
For the prediction purpose, subsets O n of size M 441 and V n different with respect to the model and the quantiles of order 0.5, 0.025, 0.975 have been considered.
We want to predict the values ξ i 1 , . . . , ξ i m at given fixed sites i 1 , . . . , i m in V n , with m 10. We provide the plots of the kernel densities estimators of the field of each model in Figure 2. The distributions of the models look asymmetric and highly heteroskedastic in Model 3. These graphics exhibit a strongly bimodal profile of Model 3. That means that a simple study of conditional mean or conditional median can miss some of the essential features of the dataset.
As explained above, for any k ∈ {1, . . . , m}, we take the conditional median estimate ξ i k , as a predictor of ξ i k . We compute these predictors with the vicinity U {−1, 0, 1} × {−1, 0, 1} \ { 0, 0 } and select the standard normal density as kernel K 1 and the Epanechnikov kernel as K. For the bandwidth selection, we use two bandwidth choices.

5.9
where h mean is the bandwidth for kernel smoothing estimation of the regression mean obtained by cross-validation, and φ and Φ are, respectively, the standard normal density and distribution function.
2 The second rule is to choose first h 1 h n , and once h 1 is computed, use it and compute h 2 by cross-validation using the conditional distribution function estimate F x · . where 11 m x and σ x are, respectively, the conditional mean and conditional variance, and C is a constant. One can also choose the bandwidths using the following mean square error result of the conditional quantile estimate: In the case of different bandwidths h 1 and h 2 , we may still use the same rule as above for h 1 h 1 h n . Therefore, only the choice of h 2 seems to deserve more future theoretical investigation since the simulations see Tables 4 and 5 suggest that different choices of these two bandwidths are appreciable. This theoretical choice of h 2 by, e.g., cross-validation is beyond the scope of this paper and deserves future investigations.
To evaluate the performance of the quantile predictor ξ i k with h 1 h 2 or ξ 2 i k with h 1 / h 2 and compare it to the mean regression predictor see Biau and Cadre 20 :

5.13
we compute the mean absolute errors: Tables 1-6 give the quantile estimates for ξ i k , k 1, . . . , m, p ∈ {0.025, 0.5, 0.975}, U {−1, 0, 1} × {−1, 0, 1} \ { 0, 0 }, and the quantile and mean regression prediction errors of the predictors ξ i k and ξ i k . In each table, Inf resp., Sup is the lower resp., upper bound of the confidence interval see above of the median estimator. The bandwidth choice considered in these first three tables is the first choice h 1 h 2 h n . Clearly, the predictors give good results.  The median prediction errors are rather smaller than those of mean regression for the first two models less than 0.038 resp., 0.012 for Model 1 resp., Model 2 . Notice that the quantile and mean regression prediction results of Model 3 are similar and rather worse than those of Models 1 and 2. This can be explained by the fact that multimodality of the field of Model 3 cannot be captured by conditional median and the conditional mean. We derive from the results of the tables 95% confidence and predictive intervals where the extremities are the 2.5% and 97.5% quantiles estimates, for each of the 10 prediction sites. Note that the confidence interval is generally more precise than the predictive one. Tables 4-6 give the same estimates for Models 1-3 as the first three tables but using both the same and the different choices of h 1 and h 2 described the aftermentioned rules. As proved by the numerical experiments all the results are not presented here for seek of simplicity , the prediction results are improved when the sample size increases but notice that the improvement depends on the number of sites which are dependent on the prediction site in interest. Depending on the position of the sites, the conditional mean regression gives sometimes better prediction than the conditional median and vice versa. A next step would be to apply the predictor to a spatial real data that deserves future investigations.

Appendix
This section is devoted to proofs of lemmas used to establish the main result of Section 4. To this end, some needed additional lemmas and results will be stated and proved in this section.
Before giving the proof of Lemma 4.2, we introduce the following notation and establish the following preliminary lemma.
Let L i be the random variable defined by Lemma A.1. Under assumptions H 2 , H 4 , and H 5 , 3.4 , 3.7 (with θ > N 1), and 3.5 or 3.6 , one has for all i and j where V x is defined in Theorem 4.1.
Proof of Lemma A.1. Note that we can deduce easily from assumption H 4 the existence of two positive constants C and C such that for all i and any given integer s.

17
Now, let us calculate the variance term. We have Let us first consider A 2 . By taking the conditional expectation with respect to X i , we get From an integration by parts and an usual change of variables, we obtain A.5 Then, using, respectively, H 4 , H 2 , and H 5 , we have A.6 Thus, we can write A.7 Using A.2 and the fact that the bandwidths tend to 0, we get Then, we can write A.9 18

Advances in Decision Sciences
The conditional expectation with respect to X i permits to write A.10 For the second term of the right-hand side of A.10 , the same argument as that used above and A.2 permit to obtain For the first term, an integration by parts, assumptions H 2 , H 4 , and H 5 lead to A.12 Thus, we have A.14 Therefore, the term A 1 has the same limit as the last term of the right-hand side of A.10 which goes directly by Bochner's lemma to V x α 1 − α f X x R d K 1 t 2 dt. Consequently, we have var L i → V x . This gives the part i of the lemma. Let us now focus on the covariance term. Let where c n is a sequence of integers that converges to infinity and will be given later. We have A.16 Taking the conditional expectation with respect to X i and using the convexity of the function x → |x|, we have Since K 2 is a distribution function, we have clearly

s dt ds.
A.18 Remark that by condition 3.8 , the density f X i , X j is bounded. Then, we have A.19 Hence, A.20 20

Advances in Decision Sciences
where η > 0 is a given sufficiently small real number. Using A.20 and A.21 , we have Let A.23 Since θ > N 1 and c n → ∞, if η < θ − N 1 , we obtain i,j∈I n ,i / j |Cov L i , L j | o n . This implies i,j∈I n ,i / j Cov L i , L j o n and then gives the part ii . For iii , we have 1/ n var i ∈ I n L i var L i 1/ n i,j∈I n ,i / j Cov L i , L j → V x . This completes the proof.
Proof of Lemma 4.2. The proof is similar to that of Lemma 3.2 of Tran 30 . For the sake of brevity, we detail the main parts. We have

A.27
This last term tends to 0 when n → ∞. Thus, it can be assumed without loss of generality that q < p. Suppose that there exist integers r 1 , . . . , r N such that n 1 r 1 p q , . . . , n N r N p q .
As in Tran 30 , the random variables L i are set into large blocks and small blocks. Let Then, we have S n 2 N i 1 T n, x, i . Note that T n, x, 1 is the sum of variables L i in big blocks. The variables T n, x, i , 2 ≤ i ≤ 2 N are variables in small blocks.
As raised by Biau and Cadre 20 , if one does not have the equalities n i r i p q , the term say T n, x, 2 N 1 which contains the L i 's at the ends not included in the blocks above can be added. This will not change the proof much. Thus, to prove the lemma it suffices to prove that The general approach is to show that as n → ∞, This can be easily done as in Tran 30 .
The following two lemmas will be used to prove Lemma 4.4.
and thus A.42 So, to prove the weak convergence of the quantile estimate q α x to q α x it suffices to prove that the conditional distribution estimate F x q α x converges in probability to F x q α x . That is done in the following lemma.

Lemma A.5. Under conditions of Lemma A.3, one has
In order to establish Lemma A.4, we introduce the following notations and state the following three technical lemmas. For y ∈ R, let A.44 and S n x, y i∈I n Δ i x, y f n x, y − Ef n x, y , Lemmas A.7 and A. 6 give, respectively, the weak convergence of f x to f X x and f n x, y to f X, Y x, y . This finishes the proof since, by H 1 , f X x > 0 and f x y is bounded.
In order to prove Lemma A.5, we introduce the following notations and state the following three lemmas. For y ∈ R, let A.51 Proof of Lemma A.5. Let y q α x ; it is easy to see that Lemmas A.7 and A.9 give, respectively, the weak convergence of f x to f X x and g n x, y to g x, y and yield the proof.
Proof of Lemma A. 6. Remark that

Advances in Decision Sciences
The asymptotic behavior of the bias term is standard, in the sense that it is not affected by the dependence structure of the data. We have For the sake of completeness, we present it entirely. Let us now introduce a spatial block decomposition that has been used by Carbon et al. 19 .
Without loss of generality, assume that n i 2pq i for 1 ≤ i ≤ N. The random variables Δ i x, y can be grouped into 2 N q 1 . . . q N cubic blocks U 1, n, j of side p as previously . Then, for each integer 1 ≤ i ≤ 2 N , define T n, i q k −1 j k 0, k 1,...,N U i, n, j and get the following decomposition S n x, y 2 N i 1 T n, i . Observe that, for any ε > 0, We enumerate in an arbitrary way the q q 1 . . . q N terms U 1, n, j of the sum T n, 1 that we call W 1 , . . . , W q . Note that U 1, n, j is measurable with respect to the σ-field generated by V i x, y , with i such that 2j k p 1 ≤ i k ≤ 2j k 1 p, k 1, . . . , N. These sets of sites are separated by a distance at least p, and since K 1 and K 2 are bounded, then we have for all i 1, . . . , q,  Thus, for the case i of the theorem a simple computation shows that for sufficiently large n, P S n x, y > λε n ≤ 2 N 1 exp −λ 2 log n 2 2N 4 where b > 0 and depends on λ. For case ii , we obtain P S n x, y > λε n ≤ 2 N 1 exp −λ 2 log n 2 2N 4 C 2 N 2 Cλ C2 N 1 n β h −d 1 h −1 2 λ −1 ε −1 n ϕ p ≤ C n −b C2 N 1 n β h −d 1 h −1 2 λ −1 ε −1 n ϕ p .

A.64
Proof of Lemma A.9. We have Eg n x, y − g x, y 1 nh d 1 i∈I n R d 1 The sequel of the proof uses the same decomposition and similar lines as in the proof of Lemma A.6. Taking ε n λ log n/ nh d 1 1/2 , p nh d 1 / log n 1/2N and making use of Lemma A.10, we obtain, for λ > 0, the existence of b > 0 such that, for n being large enough, P g n x, y − Eg n x, y > ε n ≤ A.86 To prove the convergence of the lemma, it suffices to show, respectively, for i and ii that