We propose a cross-validation method for smoothing kernel quantile estimators. The method selects the bandwidth parameter, which plays a crucial role in kernel smoothing, by unbiased estimation of a mean integrated squared error curve whose minimiser determines an optimal bandwidth. We show that the method leads to an asymptotically optimal bandwidth choice, and we also provide some general theory on the performance of optimal, data-based methods of bandwidth choice. The numerical performance of the bandwidth selection methods is compared in simulations, and the new bandwidth selector is shown to work very well.

The estimation of population quantiles is of great interest when one is not prepared to assume a parametric form for the underlying distribution. In addition, due to their robust nature, quantiles often arise as natural quantities to estimate when the underlying distribution is skewed [

Let

In this paper, we propose a cross-validation method for smoothing kernel quantile estimators. The method selects the bandwidth parameter, which plays a crucial role in kernel smoothing, by unbiased estimation of a mean integrated squared error curve whose minimiser determines an optimal bandwidth. We show that the method leads to an asymptotically optimal bandwidth choice, and we also provide some general theory on the performance of optimal, data-based methods of bandwidth choice. The numerical performance of the bandwidth selection methods is compared in simulations, and the new bandwidth selector is shown to work very well.
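To make the role of the bandwidth concrete, the following minimal sketch implements one common form of kernel quantile estimator: a kernel-weighted average of order statistics, of the kind studied by Sheather and Marron. It is an illustration under stated assumptions rather than the estimator as defined in this paper: the Gaussian kernel, the renormalisation of the weights, and the names `normal_cdf` and `kernel_quantile` are all choices made here for the example.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kernel_quantile(sample, p, h):
    """Kernel-weighted average of order statistics at level p.

    The weight on the i-th order statistic is the Gaussian kernel mass
    (bandwidth h, centred at p) falling in the interval ((i-1)/n, i/n].
    Assumption for this sketch: the weights are renormalised to sum to 1,
    which matters only when p is close to 0 or 1.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if h <= 0.0:
        raise ValueError("bandwidth h must be positive")
    xs = sorted(sample)
    n = len(xs)
    weights = [
        normal_cdf((i / n - p) / h) - normal_cdf(((i - 1) / n - p) / h)
        for i in range(1, n + 1)
    ]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, xs)) / total


# Illustrative data only; a larger h averages over more order statistics.
data = [2.3, -0.7, 0.4, 1.1, -1.5, 0.0, 0.9, -0.2, 1.8]
for h in (0.02, 0.1, 0.3):
    print(h, kernel_quantile(data, 0.25, h))
```

A small `h` makes the estimate follow individual order statistics closely, while a large `h` pulls it towards an average of the whole sample, which is the smoothness versus fidelity tradeoff that a bandwidth selector has to resolve.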

Bandwidth plays a critical role in practical estimation. Specifically, the choice of the smoothing parameter determines the tradeoff between the smoothness obtained and the closeness of the estimate to the true distribution [

Several data-based methods can be used to find the asymptotically optimal bandwidth

Building on Falk [

If

There is no single optimal bandwidth minimizing the

In order to obtain

When measuring the closeness of an estimated function to the true function, the mean integrated squared error (MISE), defined as

The value which minimises

The unknown

The general approach of cross-validation is to compare each observation with a value predicted by the model based on the remainder of the data. A method for density estimation was proposed by Rudemo [

when

The kernel method for distribution function

where

where

Now, from (

From (

Sheather and Marron [

On combining the expressions for bias and variance, we can express the mean integrated squared error as

and for

We can see from (

Suppose that

(An outline of the proof of the above theorem is given in the appendix.)

From the above theorem, we can conclude that minimisation of

Suppose that the conditions of the previous theorem hold. If

A numerical study was conducted to compare the performances of the two bandwidth selection methods. Namely, the method presented by Sheather and Marron [

In order to account for different shapes for our simulation study we consider a standard normal, Exp

Further, for comparison purposes we refer to our proposed method and that of Sheather and Marron [

(a) Standard normal distribution (see Table

Mean squared errors results for bandwidth selection methods for different sample sizes and for data from a normal distribution.

Quantile level | Method | n = 100 | n = 200 | n = 500 |
---|---|---|---|---|
0.05 | method 1 | 0.34841956 | 0.321073870 | 0.298936771 |
0.05 | method 2 | 0.29636758 | 0.164364738 | 0.090598082 |
0.10 | method 1 | 0.07645956 | 0.065440575 | 0.054697205 |
0.10 | method 2 | 0.04947745 | 0.022846355 | 0.015566907 |
0.15 | method 1 | 0.02291501 | 0.013920668 | 0.007384189 |
0.15 | method 2 | 0.02939708 | 0.013386234 | 0.005005849 |
0.20 | method 1 | 0.01891919 | 0.009273746 | 0.003152866 |
0.20 | method 2 | 0.02228828 | 0.010094172 | 0.003812209 |
0.25 | method 1 | 0.01596948 | 0.008581398 | 0.003000777 |
0.25 | method 2 | 0.01835912 | 0.008880639 | 0.003568772 |
0.30 | method 1 | 0.01614981 | 0.008035667 | 0.003208531 |
0.30 | method 2 | 0.01639148 | 0.008299838 | 0.003375445 |
0.35 | method 1 | 0.01461880 | 0.007677567 | 0.003534028 |
0.35 | method 2 | 0.01544790 | 0.007763629 | 0.003012045 |
0.40 | method 1 | 0.01279474 | 0.007375428 | 0.002899081 |
0.40 | method 2 | 0.01494506 | 0.007248497 | 0.002661230 |
0.45 | method 1 | 0.01224268 | 0.006128817 | 0.002183302 |
0.45 | method 2 | 0.01444153 | 0.006790490 | 0.002295830 |
0.55 | method 1 | 0.01414050 | 0.006348893 | 0.001922013 |
0.55 | method 2 | 0.01373258 | 0.006702430 | 0.002099446 |
0.60 | method 1 | 0.01375373 | 0.006392721 | 0.002007274 |
0.60 | method 2 | 0.01341763 | 0.006762798 | 0.002254869 |
0.65 | method 1 | 0.01344773 | 0.006063502 | 0.002589679 |
0.65 | method 2 | 0.01290569 | 0.006801901 | 0.002507202 |
0.70 | method 1 | 0.01320832 | 0.006394102 | 0.002456085 |
0.70 | method 2 | 0.01233948 | 0.007001064 | 0.002691678 |
0.75 | method 1 | 0.01503264 | 0.007011867 | 0.002789939 |
0.75 | method 2 | 0.01219829 | 0.007216326 | 0.002679609 |
0.80 | method 1 | 0.01604847 | 0.007246605 | 0.002715445 |
0.80 | method 2 | 0.01327836 | 0.007602346 | 0.002791240 |
0.85 | method 1 | 0.01757171 | 0.009239589 | 0.004770755 |
0.85 | method 2 | 0.01740931 | 0.009522181 | 0.003848474 |
0.90 | method 1 | 0.03192379 | 0.023292975 | 0.019942754 |
0.90 | method 2 | 0.03702774 | 0.018053976 | 0.012250413 |
0.95 | method 1 | 0.15323893 | 0.147773963 | 0.150811561 |
0.95 | method 2 | 0.24825188 | 0.146840177 | 0.092517440 |

Left panel: plots of the quantile estimators for method 1 (solid line), method 2 (dotted line), and true quantile (dashed line) for different sample sizes and for data from a normal distribution. Right panel: box plots of mean squared errors for the quantile estimators for method 1 and method 2 for different sample sizes.
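Per-quantile mean squared errors of the kind tabulated for the normal case could be estimated by Monte Carlo along the following lines. This is a sketch under assumptions, not the simulation code used for the paper: the number of replications (`reps=200`), the seed, and the function names are hypothetical, and the empirical quantile is used as a stand-in estimator, so the numbers it produces are not comparable with those reported for method 1 or method 2.

```python
import math
import random
from statistics import NormalDist


def empirical_quantile(sample, p):
    """Stand-in estimator: the order statistic X_(ceil(n * p))."""
    xs = sorted(sample)
    k = max(1, math.ceil(len(xs) * p))
    return xs[k - 1]


def monte_carlo_mse(estimator, p, n, reps=200, seed=0):
    """Average squared error of estimator(sample, p) over simulated samples.

    Each replication draws n standard normal observations and compares the
    estimate with the true standard normal quantile at level p.
    """
    rng = random.Random(seed)
    truth = NormalDist().inv_cdf(p)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += (estimator(sample, p) - truth) ** 2
    return total / reps


# Estimated MSE of the stand-in estimator at the median, by sample size.
mse_by_n = {n: monte_carlo_mse(empirical_quantile, 0.5, n) for n in (100, 200, 500)}
for n, mse in mse_by_n.items():
    print(n, mse)
```

Because the estimator is passed in as a function, any quantile estimator with the signature `estimator(sample, p)` can be substituted, including a kernel estimator with a data-driven bandwidth.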

(b) Exponential distribution (see Table

Mean squared errors results for bandwidth selection methods for different sample sizes and for data from an exponential distribution.

Quantile level | Method | n = 100 | n = 200 | n = 500 |
---|---|---|---|---|
0.05 | method 1 | 0.001687025 | 0.0014699990 | 0.0014107454 |
0.05 | method 2 | 0.0006023236 | 0.0002476745 | |
0.10 | method 1 | 0.001306211 | 0.0009229338 | 0.0007744410 |
0.10 | method 2 | 0.0008225254 | 0.0004075822 | |
0.15 | method 1 | 0.001589646 | 0.0008940486 | 0.0006237375 |
0.15 | method 2 | 0.0012963576 | 0.0006938287 | |
0.20 | method 1 | 0.002187990 | 0.0011477063 | 0.0006801504 |
0.20 | method 2 | 0.0019188172 | 0.0010358272 | |
0.25 | method 1 | 0.002916417 | 0.0015805678 | 0.0008156225 |
0.25 | method 2 | 0.0026838659 | 0.0014096523 | |
0.30 | method 1 | 0.003827511 | 0.0019724207 | 0.0010289166 |
0.30 | method 2 | 0.0036542688 | 0.0018358956 | |
0.35 | method 1 | 0.004919618 | 0.0025540323 | 0.0012720751 |
0.35 | method 2 | 0.0048301657 | 0.0023318358 | |
0.40 | method 1 | 0.005868113 | 0.0031932355 | 0.0016253398 |
0.40 | method 2 | 0.0060092243 | 0.0028998751 | |
0.45 | method 1 | 0.007267783 | 0.0039962426 | 0.0021094081 |
0.45 | method 2 | 0.0072785641 | 0.0035363816 | |
0.55 | method 1 | 0.011776976 | 0.0065148222 | 0.0039208447 |
0.55 | method 2 | 0.0110599156 | 0.0055548552 | |
0.60 | method 1 | 0.012864521 | 0.0070366699 | 0.0026965785 |
0.60 | method 2 | 0.0138585365 | 0.0070359561 | |
0.65 | method 1 | 0.018173097 | 0.0086476349 | 0.0031472559 |
0.65 | method 2 | 0.0169709413 | 0.0088832263 | |
0.70 | method 1 | 0.021125532 | 0.0111607501 | 0.0041235720 |
0.70 | method 2 | 0.0201049720 | 0.0114703180 | |
0.75 | method 1 | 0.024025836 | 0.0150785289 | 0.0057215181 |
0.75 | method 2 | 0.0229763952 | 0.0149490250 | |
0.80 | method 1 | 0.037367344 | 0.0204676368 | 0.0081595071 |
0.80 | method 2 | 0.0407106885 | 0.0181647976 | |
0.85 | method 1 | 0.057785539 | 0.0317404871 | 0.0098128398 |
0.85 | method 2 | 0.0838657681 | 0.0300656149 | |
0.90 | method 1 | 0.078797379 | 0.0426418410 | 0.0152139697 |
0.90 | method 2 | 0.1878456852 | 0.1117820016 | |
0.95 | method 1 | 0.121239102 | 0.0810135450 | 0.0284524316 |
0.95 | method 2 | 0.6668323836 | 0.4923732684 | |

Left panel: plots of the quantile estimators for method 1 (solid line), method 2 (dotted line) and true quantile (dashed line) for different sample sizes and for data from an exponential distribution. Right panel: box plots of mean squared errors for the quantile estimators for method 1 and method 2 for different sample sizes.

(c) Log-normal distribution (see Table

Mean squared errors results for bandwidth selection methods for different sample sizes and for data from a Log-normal distribution.

Quantile level | Method | n = 100 | n = 200 | n = 500 |
---|---|---|---|---|
0.05 | method 1 | 0.001663032 | 0.0010098573 | 0.0006568989 |
0.05 | method 2 | 0.002384136 | 0.0007270441 | 0.0003613541 |
0.10 | method 1 | 0.001863141 | 0.0008438333 | 0.0002915013 |
0.10 | method 2 | 0.002601994 | 0.0008361475 | 0.0002981938 |
0.15 | method 1 | 0.002633153 | 0.0013492870 | 0.0004451506 |
0.15 | method 2 | 0.002623552 | 0.0011943144 | 0.0003738508 |
0.20 | method 1 | 0.003753458 | 0.0019922356 | 0.0006866399 |
0.20 | method 2 | 0.003107351 | 0.0014724525 | 0.0005685022 |
0.25 | method 1 | 0.004956635 | 0.0027140878 | 0.0009886053 |
0.25 | method 2 | 0.004564382 | 0.0022952079 | 0.0008557756 |
0.30 | method 1 | 0.006480195 | 0.0035603171 | 0.0015897314 |
0.30 | method 2 | 0.006436967 | 0.0031574264 | 0.0011938924 |
0.35 | method 1 | 0.008858850 | 0.0047972372 | 0.0023446072 |
0.35 | method 2 | 0.008443129 | 0.0038626105 | 0.0015443970 |
0.40 | method 1 | 0.010053969 | 0.0055989143 | 0.0022496198 |
0.40 | method 2 | 0.010893398 | 0.0051735721 | 0.0017579579 |
0.45 | method 1 | 0.012998940 | 0.0069058362 | 0.0030102466 |
0.45 | method 2 | 0.013607931 | 0.0063606758 | 0.0019799551 |
0.55 | method 1 | 0.019687850 | 0.0115431473 | 0.0051386226 |
0.55 | method 2 | 0.020581110 | 0.0100828810 | 0.0029554466 |
0.60 | method 1 | 0.023881883 | 0.0129227902 | 0.0046644050 |
0.60 | method 2 | 0.025845419 | 0.0129081138 | 0.0040301844 |
0.65 | method 1 | 0.032155537 | 0.0160476126 | 0.0056732073 |
0.65 | method 2 | 0.035737008 | 0.0167147469 | 0.0056528658 |
0.70 | method 1 | 0.045027965 | 0.0249576836 | 0.0077709058 |
0.70 | method 2 | 0.042681315 | 0.0223936302 | 0.0077616346 |
0.75 | method 1 | 0.060715676 | 0.0318891176 | 0.0121926243 |
0.75 | method 2 | 0.059276198 | 0.0323738749 | 0.0104119217 |
0.80 | method 1 | 0.087694754 | 0.0450814911 | 0.0165993582 |
0.80 | method 2 | 0.090704630 | 0.0530374710 | 0.0168162426 |
0.85 | method 1 | 0.140537374 | 0.0840290373 | 0.0311728395 |
0.85 | method 2 | 0.193857196 | 0.1131949907 | 0.0350218855 |
0.90 | method 1 | 0.289944417 | 0.1642236062 | 0.0679038026 |
0.90 | method 2 | 0.552092689 | 0.2763301818 | 0.1112433633 |
0.95 | method 1 | 1.119717137 | 0.4764026616 | 0.1984216218 |
0.95 | method 2 | 2.306672668 | 1.3159008668 | 0.2217620895 |

Left panel: plots of the quantile estimators for method 1 (solid line), method 2 (dotted line) and true quantile (dashed line) for different sample sizes and for data from a Log-normal distribution. Right panel: box plots of mean squared errors for the quantile estimators for method 1 and method 2 for different sample sizes.

(d) Double exponential distribution (see Table

Mean squared errors results for bandwidth selection methods for different sample sizes and for data from a double exponential distribution.

Quantile level | Method | n = 100 | n = 200 | n = 500 |
---|---|---|---|---|
0.05 | method 1 | 0.35372420 | 0.288207742 | 0.251339747 |
0.05 | method 2 | 0.45458819 | 0.315704320 | 0.051385372 |
0.10 | method 1 | 0.07123072 | 0.043684160 | 0.029307307 |
0.10 | method 2 | 0.14868952 | 0.097871072 | 0.023601368 |
0.15 | method 1 | 0.05081769 | 0.025358946 | 0.009241326 |
0.15 | method 2 | 0.09377244 | 0.035207151 | 0.010910214 |
0.20 | method 1 | 0.02489079 | 0.015360242 | 0.007647199 |
0.20 | method 2 | 0.04997348 | 0.024864359 | 0.008013159 |
0.25 | method 1 | 0.01863802 | 0.012204904 | 0.004401402 |
0.25 | method 2 | 0.03117942 | 0.019101033 | 0.006247279 |
0.30 | method 1 | 0.01869611 | 0.012031162 | 0.004145965 |
0.30 | method 2 | 0.02516932 | 0.014680335 | 0.004847191 |
0.35 | method 1 | 0.01562279 | 0.009560873 | 0.003235724 |
0.35 | method 2 | 0.02017404 | 0.011355808 | 0.003513386 |
0.40 | method 1 | 0.01430068 | 0.007860775 | 0.002493813 |
0.40 | method 2 | 0.01669505 | 0.009165203 | 0.002621345 |
0.45 | method 1 | 0.01386331 | 0.007587705 | 0.002485022 |
0.45 | method 2 | 0.01529664 | 0.008221501 | 0.002104265 |
0.55 | method 1 | 0.01501458 | 0.007801051 | 0.002013993 |
0.55 | method 2 | 0.01280613 | 0.007796411 | 0.002227569 |
0.60 | method 1 | 0.01712203 | 0.009076922 | 0.002233672 |
0.60 | method 2 | 0.01394454 | 0.009475605 | 0.002791236 |
0.65 | method 1 | 0.01946241 | 0.011129870 | 0.003521070 |
0.65 | method 2 | 0.01840894 | 0.012558998 | 0.003628169 |
0.70 | method 1 | 0.02098394 | 0.011997405 | 0.003255335 |
0.70 | method 2 | 0.02333092 | 0.015792466 | 0.004534060 |
0.75 | method 1 | 0.02791943 | 0.016885471 | 0.004419826 |
0.75 | method 2 | 0.02937457 | 0.019852122 | 0.005469359 |
0.80 | method 1 | 0.03532806 | 0.021319714 | 0.005471649 |
0.80 | method 2 | 0.04294634 | 0.024757804 | 0.007270187 |
0.85 | method 1 | 0.05463890 | 0.030489951 | 0.011338629 |
0.85 | method 2 | 0.08441144 | 0.035306415 | 0.012054182 |
0.90 | method 1 | 0.09188621 | 0.058587164 | 0.030485192 |
0.90 | method 2 | 0.14755444 | 0.083844232 | 0.024399440 |
0.95 | method 1 | 0.28184945 | 0.224432372 | 0.180645893 |
0.95 | method 2 | 0.51462209 | 0.319147435 | 0.076406491 |

Left panel: plots of the quantile estimators for method 1 (solid line), method 2 (dotted line) and true quantile (dashed line) for different sample sizes and for data from a double exponential distribution. Right panel: box plots of mean squared error for the quantile estimators for method 1 and method 2 for different sample sizes.

We can compute and summarize the relative efficiency of

The relative efficiency (R.E) of

Sample size n | Standard normal dist. | Exponential dist. | Log-normal dist. | Double exponential dist. |
---|---|---|---|---|
100 | 1.037276 | 2.636250 | 1.806082 | 1.520903 |
200 | 0.6986324 | 2.952808 | 2.096307 | 1.308667 |
500 | 0.4455828 | 2.324423 | 1.173547 | 0.4519134 |
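The tabulated relative efficiencies can be related back to the mean squared error tables. The sketch below assumes, since the defining formula is not reproduced here, that the relative efficiency is the ratio of the mean squared errors summed over the 18 quantile levels, method 2 over method 1. Under that assumption, the first numeric column of the standard normal table (taken here to be n = 100) gives a value that agrees closely with the entry 1.037276. The function name `relative_efficiency` is hypothetical.

```python
# Mean squared errors for the standard normal case, first numeric column,
# at quantile levels 0.05, 0.10, ..., 0.45, 0.55, ..., 0.95.
mse_method1 = [
    0.34841956, 0.07645956, 0.02291501, 0.01891919, 0.01596948, 0.01614981,
    0.01461880, 0.01279474, 0.01224268, 0.01414050, 0.01375373, 0.01344773,
    0.01320832, 0.01503264, 0.01604847, 0.01757171, 0.03192379, 0.15323893,
]
mse_method2 = [
    0.29636758, 0.04947745, 0.02939708, 0.02228828, 0.01835912, 0.01639148,
    0.01544790, 0.01494506, 0.01444153, 0.01373258, 0.01341763, 0.01290569,
    0.01233948, 0.01219829, 0.01327836, 0.01740931, 0.03702774, 0.24825188,
]


def relative_efficiency(mse_reference, mse_other):
    """Assumed definition: summed MSE of the other method over the reference.

    A value above 1 means the reference method has the smaller total error.
    """
    return sum(mse_other) / sum(mse_reference)


re_normal_100 = relative_efficiency(mse_method1, mse_method2)
print(re_normal_100)
```

Read this way, values above 1 favour method 1 and values below 1 favour method 2.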

From Tables

Also, from Table

So, we may conclude that, in terms of MISE, our bandwidth selection method is more efficient than that of Sheather and Marron for skewed distributions but not for symmetric distributions.

In this paper we have proposed a cross-validation-based rule for selecting the bandwidth of quantile functions estimated by a kernel procedure. The bandwidth selected by our proposed method is shown to be asymptotically unbiased and, in order to assess its numerical performance, we conducted a simulation study and compared it with the bandwidth proposed by Sheather and Marron [

Let

With

This step combines Steps

This step establishes that

This step combines Steps

Let

This step notes that

This step shows that

This step combines the results of Steps

This step notes that

This step combines Steps