Choosing the Right Spatial Weighting Matrix in a Quantile Regression Model

This paper proposes computationally tractable methods for selecting the appropriate spatial weighting matrix in the context of a spatial quantile regression model. This selection is a notoriously difficult problem even in linear spatial models and is even more difficult in a quantile regression setup. The proposal is illustrated by an empirical example and manages to produce tractable models. One important feature of the proposed methodology is that, by allowing different degrees and forms of spatial dependence across quantiles, it further relaxes the usual quantile restriction attributable to the linear quantile regression. In this way we can obtain a model that is more robust to potential functional misspecification, while nevertheless preserving the parametric rate of convergence and the established inferential apparatus associated with the linear quantile regression approach.


The Spatial Quantile Regression Model
The spatial quantile regression model [1] is a straightforward quantile regression generalisation of the linear spatial lag model popular in spatial econometrics. More specifically, it can be written as

y = λ(τ)Wy + Xβ(τ) + u,

where Wy is the spatially lagged dependent variable, specified via a predetermined spatial weighting matrix W, X is the design matrix containing the independent variables (covariates), and u is a residuals vector. Here we only have one spatially lagged dependent variable, but this is not an essential assumption, and more than one spatial weighting matrix can easily be incorporated. This representation is similar to the linear spatial lag regression model, but here the coefficients λ(τ) and β(τ) are allowed to vary with the quantile τ, rather than being assumed fixed. This model has some attractive properties. First, the original motivation for Kostov's [1] proposal is to alleviate the potential bias arising from inappropriate functional form assumptions in a spatial model. In simple terms the underlying logic is as follows. When the wrong functional form specification is employed in the presence of spatial lag dependence, estimation is typically biased. Hence a natural way to circumvent the problem is to estimate the underlying function nonparametrically. The sample sizes used in many empirical studies are, however, often too small for efficient application of nonparametric methods. Semiparametric methods could then be used to alleviate the problem. The linear quantile regression is such a semiparametric method. Although it cannot be guaranteed to entirely eliminate the adverse effects of functional form assumptions, such methods can greatly reduce them. In particular, Kostov [1] argues that for a typical hedonic model the (linear) quantile restriction is appropriate.
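To make the role of W concrete, the sketch below builds a row-standardised k-nearest-neighbour weighting matrix and the corresponding spatial lag Wy. This is purely illustrative (the random coordinates, the choice k = 5, and the function name are our own), and the k-nearest-neighbour definition is only one of many candidate neighbourhood specifications:

```python
import numpy as np

def knn_weight_matrix(coords, k):
    """Row-standardised k-nearest-neighbour spatial weighting matrix.

    coords : (n, 2) array of point locations
    k      : number of neighbours defining the neighbourhood
    """
    n = coords.shape[0]
    # pairwise Euclidean distances between units
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a unit is not its own neighbour
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]      # indices of the k closest units
        W[i, nbrs] = 1.0 / k             # row-standardisation: each row sums to one
    return W

rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
W = knn_weight_matrix(coords, k=5)
y = rng.normal(size=50)
Wy = W @ y                               # the spatially lagged dependent variable
```

Row-standardisation makes Wy a local average of neighbouring values of y, which is the usual convention in spatial lag models.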
A major advantage of the quantile regression approach is the opportunity to estimate a flexible semiparametric model, which is nevertheless characterised by a parametric rate of convergence, thus making it suitable for empirical analysis in small sample cases. Furthermore, a well-developed set of tools for efficient inference is available (see [1] for details).
Spatial modelling has, however, been focused mostly on estimation issues. For example, Kostov [1] assumes that the exact form of the process generating the spatial dependence is given. This is a typical assumption of an "estimation focused" approach to spatial modelling, in that the spatial weighting matrix used to specify the model is known. The spatial weighting matrix is, however, a part of the specification process. It needs to be prespecified. There could be cases where the underlying theoretical model provides some guidance, but more often than not this is not the case. Consequently, in empirical applications of spatial models the selection of spatial weight matrices is characterised by a great deal of arbitrariness. This arbitrariness presents a serious problem for inference in such models, since estimation results have been shown to depend critically on the choice of spatial weighting matrix [2][3][4]. Even more importantly, there is an interplay between spatial weighting matrix and functional form choice. Using the wrong spatial weighting matrix has, broadly speaking, the same implications as ignoring existing spatial dependence. Therefore functional form and spatial weighting matrix specification have to be considered simultaneously. The problem is not as severe in nonparametric models, because most nonparametric estimation methods are typically consistent even in the presence of spatial dependence. The wrong spatial weighting matrix would, however, still introduce inefficiency in the nonparametric estimates, which with smaller samples can seriously impede inference. In a parametric setup, the wrong spatial weighting matrix introduces bias even when the right functional form is used.

Selection of Spatial Weighting Matrix
Owing to these considerations it would be advantageous to have methods to choose an appropriate spatial weighting matrix. Selecting the "right" spatial weighting matrix can serve a twofold purpose. First, it will increase the efficiency of the model estimates, as discussed previously. Second, when the nature of the process generating spatial dependence is of particular interest (e.g., in social interaction models), the form of the spatial weighting matrices consistent with that data generation process becomes a major inferential problem. In such cases we need to find the appropriate spatial weighting matrix, since this is the explicit subject of the research problem. In this paper we consider the issue in a spatial quantile regression framework.
In the following we will briefly review some approaches designed to reduce the arbitrariness of spatial weighting matrix choice (mostly) in linear models. Then we will discuss the possible extensions to the spatial quantile regression. The approach taken in this paper falls in the framework of selecting the spatial weighting matrix either implicitly or explicitly from a predefined set of candidates.
Holloway and Lapar [5] used a Bayesian marginal likelihood approach to select a neighbourhood definition (cutoff points for the neighbourhood), but one can consider their approach as a general model selection approach, which could be applied to any other set of competing models. A particularly active strand of research is concerned with Bayesian model averaging (BMA) approaches. LeSage and Parent [6] proposed a BMA procedure for spatial models which incorporates the uncertainty about the correct spatial weighting matrix. LeSage and Fischer [7] extended the latter approach into an MC3 (Markov Chain Monte Carlo Model Composition) method to select an inverse distance nearest neighbour type of spatial weighting matrix for the linear spatial model. Crespo-Cuaresma and Feldkircher [8] further extend this procedure to deal with different types of spatial weighting matrices by introducing Bayesian model averaging inference conditional on a given spatial weighting matrix. Crespo-Cuaresma and Feldkircher [8] use spatial filtering to resolve the endogeneity issue and in this way focus on the regression part of the model rather than on the spatial dependence itself. The approach above implicitly assumes that the spatial dependence can be characterised by a single spatial weighting matrix. This assumption can be relaxed, but at a considerable computational cost. Eicher et al. [9] proposed an instrumental variables Bayesian model averaging procedure, which is essentially a hierarchical Bayesian counterpart to the frequentist two-step estimation that accounts for model uncertainty in both steps. Although Eicher et al. [9] deal only with the more general issue of endogeneity rather than with spatial dependence, spatial lag dependence is a particular type of endogeneity, so their approach can be readily applied to spatial lag models.
Finally, from a non-Bayesian point of view, Kostov [10] suggested a two-step procedure for selecting a spatial weighting matrix that is applicable to a wide range of prespecified candidates. This procedure is motivated by considerations specific to spatial models (and the proposed computational algorithms are tuned for this purpose), but otherwise it deals with the endogeneity problem in the same way as Eicher et al. [9].

Proposal Outline
This paper proposes extending the methodology adopted in Kostov [10] to a quantile regression setting. In what follows we will first briefly explain the previously mentioned approach. We will then highlight the particularities of the extension of this procedure to quantile regression models. Furthermore, we will briefly comment on the different alternative options and the reasons for the specific choices we adopt. Our contribution is twofold. First, we adapt the approach of Kostov [10] to a (linear) quantile regression model. Second, since, as we will explain later, the original approach has a prediction focus, we further expand it to focus on structure discovery (i.e., identifying the "true sparsity pattern").
Kostov's [10] approach is based on Kelejian and Prucha's [11] two-stage least squares method for estimating spatial models. In this method, spatially lagged independent variables are used as instruments for the spatially lagged dependent variable. The first step (instrumentation) is a least squares regression of the lagged dependent variable on the lagged independent variables. In the second step, the fitted values from the first stage regression replace the original endogenous variable in the estimation of the model's coefficients. Kostov [10] retains the first step of this procedure (which projects the spatially lagged dependent variable onto the vector space of the instruments). He, however, suggests implementing this first step for a number of different spatial weighting matrices, resulting in an augmented second stage model that, instead of the original spatial weighting matrices, includes a large number of variables transformed in the first step. In this way the problem of choosing a spatial weighting matrix becomes a variable selection problem (amongst the previously mentioned transformed variables). The other interesting feature of Kostov's [10] paper is the application of a component-wise boosting algorithm as a variable selection method in the second step. Any other variable selection method could be used, but Kostov's [10] choice is mainly motivated by computational considerations in dealing with a large number of potential alternatives.
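The two-stage logic can be sketched on simulated data. Everything below is an invented illustration (a single known circular two-neighbour W, small noise, arbitrary coefficients), not the paper's empirical setup: stage one projects the endogenous lag Wy onto the instruments [X, WX], and stage two replaces Wy with its fitted values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))

# a simple row-standardised "one neighbour ahead, one behind" weighting matrix
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5

beta, lam = np.array([1.0, -0.5, 0.25]), 0.4
# generate y from the spatial lag model y = lam*W*y + X*beta + u,
# i.e. y = (I - lam*W)^{-1} (X*beta + u)
A = np.eye(n) - lam * W
y = np.linalg.solve(A, X @ beta + rng.normal(scale=0.1, size=n))

# stage 1: project the endogenous lag Wy onto the instruments [X, WX]
Z = np.column_stack([X, W @ X])
Wy = W @ y
Wy_hat = Z @ np.linalg.lstsq(Z, Wy, rcond=None)[0]

# stage 2: replace Wy with its fitted values and run ordinary least squares
D = np.column_stack([Wy_hat, X])
coef = np.linalg.lstsq(D, y, rcond=None)[0]   # coef[0] estimates lambda
```

With many candidate weighting matrices, stage one is repeated for each candidate, and all the resulting fitted lags enter the second stage together, where variable selection decides among them.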
In a nutshell, the approach of Kostov [10] amounts to transforming the spatial weighting matrix selection problem into a high-dimensional (due to the potentially large number of alternatives) variable selection problem, to which "standard" methods can be applied. The crucial point in Kostov's [10] approach is establishing the equivalence between the two-stage spatial least squares method and the proposed component-wise boosting alternative. Therefore, in order to extend the same logic to a spatial quantile regression model, we need to find a variable selection equivalent to a quantile regression estimation method. We will deal with these two issues in turn.
The first issue is the estimation method for spatial quantile regression. We are aware of two main approaches able to consistently estimate such models. The first is the application of Zietz et al. [12], who use the results of Kim and Muller [13] for quantile regression estimation under endogeneity. The other approach is presented in Kostov [1], who builds upon the methods developed by Chernozhukov and Hansen [14,15]. In Kostov's [1] application one minimises a matrix norm over a range of values for the spatial dependence parameter. This is convenient when there is a single spatial weighting matrix. With many candidates, however, this would involve minimisation over a multidimensional grid, which makes such an approach prohibitively expensive in terms of computational requirements, particularly when the number of potential spatial weighting matrices is large. Alternatively, the methods developed in Chernozhukov and Hong [16] could be used to estimate such a model, but this would still involve considerable computational costs, and we will not pursue this option here. Furthermore, the main appeal of this procedure over the two-stage quantile regression is the availability of robust inference tools, although it is computationally more demanding (see [1] for a detailed comparison). Here we are interested in selecting the model specification, rather than estimating a prespecified model. With a view to this, simpler methods are preferable. Once the final model specification is established and inference is the main focus, any estimation method could be applied, depending on the purpose of the analysis.
The Zietz et al. [12] approach, on the other hand, represents a simple two-stage quantile regression. As such it is very similar to the spatial two-stage least squares approach of Kelejian and Prucha [11], which is used in Kostov [10]. Therefore, using the theoretical results of Kim and Muller [13], we can extend their two-stage quantile regression estimator to include variable selection, using essentially the same arguments as Kostov [10]. Such an extension, however, comes at a cost. This approach uses two consecutive quantile regression estimators defined at the same quantiles in both steps. In the context of selecting spatial weighting matrices, the first step would carry a considerable computational burden, mainly because of the large number of alternatives to be considered. The computational burden is further increased because a separate first step estimation needs to be carried out for each quantile that is to be considered. It would therefore have been very useful if one could replace the first step with, for example, least squares estimation, because this would then only need to be carried out once. There have been empirical applications of two-stage estimation where the estimators used in the first and the second stage are different. For example, Arias et al. [17] and Garcia et al. [18] used least squares in the first step followed by quantile regression in the second. Unfortunately, in general settings such an approach could induce asymptotic bias in the overall estimator (see [13] for details). In simple terms, the robustness of two-stage estimators could be lost when the first stage applies an estimator that is not robust. Owing to this, we consider here only estimators that employ the same type of estimator in both steps. This means that we will have to use quantile regression in both steps. The use of quantile regression for each estimated quantile greatly increases the computational costs of the method compared to the linear model.
The proposal of Kostov [10] translates into using a variable selection algorithm in the second stage estimation. As discussed previously, this variable selection algorithm needs to be of the same type as the one in the original two-stage estimator. Therefore we need a quantile regression variable selection method. There are several possibilities for the latter. First, the component-wise boosting approach used in Kostov [10] can be adapted to perform variable selection in a quantile regression setting. To this end, Fenske et al. [19] demonstrated that using the check function that defines the quantile regression as an empirical loss function leads to an alternative quantile regression estimator. Using this approach looks like a natural extension of the logic of Kostov [10], particularly since he does mention the potential use of alternative empirical risk functions.
Another option is to use regularised (i.e., penalised) quantile regression to select covariates. Two of the most popular regularisation approaches, namely the least absolute shrinkage and selection operator (lasso) of Tibshirani [20] and the smoothly clipped absolute deviation (SCAD) method of Fan and Li [21], have already been considered in a quantile regression setting (see [22][23][24]). In general these papers have established the consistency of such regularised estimators for quantile regression problems, subject to appropriately chosen "optimal" penalty parameter(s).
So, a straightforward generalisation of the approach of Kostov [10] to quantile regression involves a similar two-step procedure. In the first step a number of quantile regressions are implemented (one for each candidate spatial weighting matrix), regressing the spatially lagged dependent variable on the spatially lagged independent variables. The fitted values from the first step are then used as additional explanatory variables (thus augmenting the original set of covariates). This second step is estimated using variable selection methods to effectively select the appropriate spatial weighting matrix.
There are several important features of such an implementation. First, since it is based on a consistent two-stage estimator (the two-stage quantile regression estimator of Kim and Muller [13]), it should retain the consistency properties of the original estimator as long as the second step is also consistent. As already discussed, the price we have to pay for maintaining such consistency is the need to estimate a separate first step quantile regression for each quantile considered. Second, similarly to other two-step procedures, standard errors, or indeed any inference based solely on the second step estimation, would be invalid. One could consider asymptotic inference based on the results of Kim and Muller [13]. Alternatively, the overall (two-step) estimator could be bootstrapped. Note, however, that due to the computational costs of the first step (details of which we present later on) such an implementation would be prohibitively expensive. The best option is to follow the suggestion of Kostov [10] and only use the proposed estimator to select the structure of the model, which can then be estimated using standard methods.

Variable Selection Step
From now on we will take the first (instrumentation) step as given and will focus entirely on the variable selection step. We will argue that, in order to obtain efficient inference, it is desirable that the second step implements a variable selection procedure characterised by the so-called oracle property. In simple words, if an estimator possesses the oracle property, the asymptotic distribution of the obtained estimates is the same as that of the "oracle estimator," that is, an estimator constructed from a priori knowledge of which coefficients should be zero. Therefore estimators possessing the oracle property can be used for both variable selection and inference. Here we deviate considerably from Kostov [10], who claimed that since the proposed procedure is only to be used for selecting the model structure, the oracle property is not essential. Actually, the brief discussion provided in Kostov [10] implies (without explicitly mentioning it) that instead of consistency, the weaker condition of persistence [25] would be sufficient. While the oracle property aims at minimising prediction error, persistence tries to avoid wrongly excluding significant variables.
Therefore using a persistent estimator implicitly includes a measure of uncertainty, very much in the spirit of Bayesian methods. The actual aim in many typical applications, however, would be to discover the "true sparsity pattern." For such purposes a combination of consistent and oracle estimators has been shown to be able to discover the underlying structure and retain the oracle property. This idea has been formalised and theoretically developed in Fan and Lv [26]. Their methodology consists of a screening step (using a consistent variable selection method) followed by an oracle method (estimation step) to produce the final model. Even if both methods used in such a combination do not possess the oracle property, the overall procedure will gain from improved convergence rates and can still be consistent subject to some additional conditions (see, e.g., [27] for a detailed discussion and simulation evidence). Here, however, we prefer to avoid imposing such additional conditions and would rather apply a method possessing the oracle property in the estimation step.
An additional advantage of combining screening and estimation steps is the reduction in computational requirements and improved convergence rates. The convergence rates of estimators possessing the oracle property depend on the sample size relative to the complexity of the employed model. Owing to this, it would be desirable if the size of the initial model were reduced. Applying an estimator possessing the oracle property to such a reduced model will improve this estimator's efficiency (compared to the case when it is applied directly to the larger, unrestricted model). In addition to the theoretical efficiency gain, this could bring considerable practical gains by greatly reducing the computational requirements of the selection algorithm(s) involved. Such a reduced model can be produced by using any consistent estimator, that is, an estimator that (asymptotically) retains the important variables (those with nonzero coefficients). In simple terms, the combination of screening and estimation steps reduces the false positive discovery rate (i.e., falsely retaining unimportant variables) and hence is tuned to structure discovery. Retaining such unimportant variables often improves prediction accuracy or uncertainty measures and hence can result in larger models (see [27] for a detailed discussion).
So we propose applying a combination of screening and estimation steps to the already transformed model. Such a proposal can be viewed as an unnecessary complication of an already involved procedure. Nevertheless, it has significant advantages. First, as we will show, it nests within itself the straightforward implementation of Kostov's [10] proposal. Second, since the combination of screening and estimation steps is equivalent to a single step estimation, but has better convergence rates, one can potentially further reduce the set of potential spatial weighting matrices while maintaining the consistency of the overall estimation procedure. The previously mentioned equivalence means that the overall proposed spatial model estimator, which comprises three distinct steps (instrumentation, screening, and estimation), is still equivalent to the two-step method used to motivate it (i.e., the two-step quantile regression).
As discussed previously, using either a boosting or a regularisation approach can be viewed as different implementations of the same idea, namely, implementing a variable selection step in a two-stage quantile regression estimator. In order to ascertain the relative merits of these two alternatives, let us first consider their relative computational requirements. The boosting approach is considerably less intensive in terms of computation. It has another advantage over the regularisation approach that is important in the context of spatial weighting matrix selection. Since the component-wise boosting approach processes the candidate variables one by one (see the next section for a description of the component-wise boosting algorithm), high degrees of correlation amongst variables (and therefore singularity issues due to a highly nonorthogonal design) do not present a significant problem to effectively reducing the set of alternatives. The nature of the spatial weighting matrix selection problem could involve simultaneous consideration of numerically very similar alternatives, which could be infeasible in the regularisation approach. Furthermore, although extensively studied and shown to be consistent, it is unclear whether the boosting approach possesses the oracle property. It is therefore desirable to implement the component-wise boosting as a screening method.
Then a regularisation approach possessing the oracle property can be implemented in the estimation step. Note that if we stop after the screening step, we obtain a straightforward quantile regression generalisation of the approach of Kostov [10].
Because component-wise boosting is much faster than direct implementation of any regularisation approach, the previous strategy achieves a considerable reduction in computational requirements and makes the overall approach computationally feasible. Note that, in addition to the computational requirements, direct application of a regularisation estimator could be infeasible in many spatial problems simply because of the nature of the spatial weighting matrices to be considered. When a large number of such matrices is considered (as in [10]), the resulting transformed variables could be quite similar numerically. This could result in singularities that would prevent direct application of a regularised quantile regression estimation of the transformed problem.
In addition to the approach outlined previously, we will also consider adapting the stability selection approach of Meinshausen and Bühlmann [28] to the boosting estimation. Strictly speaking, stability selection is not an estimator per se, but the application of a combination of subsampling (although other forms of the bootstrap could be used) and a variable selection algorithm. It provides a measure of how often a variable is selected, and therefore, by using a threshold, only persistent variables can be selected.
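The stability selection idea can be sketched as follows: repeatedly subsample the data, apply some base variable selector, record how often each variable is chosen, and keep only those exceeding a frequency threshold. The toy selector below (the two covariates most correlated with the response) is a hypothetical stand-in for the boosting selector; the data, the 80% threshold, and all names are our own illustration:

```python
import numpy as np

def stability_selection(X, y, select, n_subsamples=100, frac=0.5, seed=0):
    """Selection frequencies over random subsamples (Meinshausen-Buhlmann style sketch).

    select : callable taking (X_sub, y_sub) and returning a boolean mask over columns
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        counts += select(X[idx], y[idx])     # boolean mask adds as 0/1
    return counts / n_subsamples

def top2_by_correlation(X, y):
    """Toy base selector: keep the two covariates most correlated with y."""
    score = np.abs(X.T @ (y - y.mean()))
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[np.argsort(score)[-2:]] = True
    return mask

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 1.5 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

freq = stability_selection(X, y, top2_by_correlation)
stable = freq >= 0.8     # keep variables selected in at least 80% of subsamples
```

Variables with genuine signal are selected in almost every subsample, while noise variables rarely clear the threshold, which is exactly the persistence property exploited here.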

Technical Implementation Details
The screening step will use component-wise boosting estimation of quantile regression, following Fenske et al. [19]. Consider the general linear quantile regression model

y_i = x_i′β(τ) + ε_i,

where the τth conditional quantile of ε_i given x_i is zero, y_i and x_i are the dependent and independent variables (the latter collected in the matrix X), and τ ∈ (0, 1) is the quantile of interest.
Boosting can be viewed as a functional gradient descent method that minimises the empirical risk function (1/n) Σ_{i=1}^n ρ(y_i, η(x_i)), where ρ(⋅) is some suitable loss function. The τth quantile regression is obtained when the so-called check function is used as the empirical risk:

ρ_τ(y, η) = (y − η)(τ − I(y − η < 0)),

where I(⋅) is the indicator function. In the notation above we intentionally use the general additive predictor η(⋅), since it allows for generalisation of the approach to nonlinear and indeed nonparametric versions of the quantile regression problem. Since the check function is used to define the conventional linear quantile regression estimator of Koenker and Bassett [29], using it as an empirical risk function solves an equivalent optimisation problem.
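The check function translates directly into code; the short sketch below is our own illustration of its asymmetry across quantiles:

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

u = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
# at tau = 0.5 the loss is half the absolute error, so minimising
# it over a constant recovers the sample median
loss_median = check_loss(u, 0.5)
# at tau = 0.9 positive residuals are penalised nine times more
# heavily than negative ones, pulling the fit towards the upper tail
loss_upper = check_loss(u, 0.9)
```

Minimising the average check loss over a constant yields the unconditional τth sample quantile, which is why the same loss defines quantile regression when the constant is replaced by a linear predictor.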
The boosting algorithm is initialised with a starting value for η, for example, η_0. This implies an initial evaluation of the underlying function, f̂_0. In this case all underlying functions will be linear. Typically one starts with an offset set to the unconditional mean of the response variable, but in the quantile regression the unconditional median is used instead (see [19] for details and justification of this choice).
Let ĝ_{j,m} and f̂_{j,m} denote the evaluations of the corresponding base learners (in this case linear functions) for component j at iteration m. ĝ_{j,m} represents the learner (i.e., linear function) fitted to the current "residuals," while f̂_{j,m} is the "global" evaluation of the same function (see the following algorithm).
Then the component-wise boosting algorithm iteratively goes through the following steps.
(1) Compute the negative gradient of the empirical risk function evaluated at the current function estimate f̂_{m−1}(x_i), for every iteration m = 1, 2, ...:

u_i = −∂ρ_τ(y_i, η)/∂η evaluated at η = f̂_{m−1}(x_i), which equals τ if y_i − f̂_{m−1}(x_i) > 0 and τ − 1 otherwise.

(2) Fit the base learner ĝ_{j,m}(⋅) to the negative gradients u_i separately for each independent variable (component) j, and determine the best-fitting component according to the chosen goodness-of-fit criterion.
(3) Update the estimate for the selected component j* by adding a fraction ν of its fitted base learner, f̂_m = f̂_{m−1} + ν ĝ_{j*,m}, leaving all other components unchanged.
The algorithm iterates through steps (1)–(3) until a maximum number of iterations is reached. The algorithm described above needs an updating step length ν. In this application we will use ν = 0.3. See Kostov [10] and references therein for a discussion of this choice and a demonstration that the final results are insensitive to a wide range of choices. The other element of interest is the criterion used to decide which is the "best fitting" component in step (2). Here we use the L2 norm (see above), but other choices are also possible. The greatest advantage of the L2 norm is that the base learners can be updated by simple least squares fitting, which is computationally fast and convenient (see [19]). In this particular case, since we use linear quantile regression, updating the base learners amounts to applying univariate least squares.
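The boosting loop described above can be sketched in a few lines, with ν = 0.3, an L2 best-fit criterion, and univariate least squares base learners. The offset, the simulated data, and the stopping iteration are illustrative choices of our own, not the paper's implementation:

```python
import numpy as np

def check_gradient(y, f, tau):
    """Negative gradient of the check loss at the current fit f:
    tau where the residual is positive, tau - 1 otherwise."""
    return np.where(y - f > 0, tau, tau - 1.0)

def componentwise_qboost(X, y, tau=0.5, nu=0.3, m_stop=200):
    """Component-wise boosting for linear quantile regression (sketch after Fenske et al.)."""
    n, p = X.shape
    f = np.full(n, np.quantile(y, tau))   # offset (illustrative: unconditional tau-quantile)
    beta = np.zeros(p)
    for _ in range(m_stop):
        u = check_gradient(y, f, tau)
        # step (2): univariate least squares fit of each component to the gradients
        b = X.T @ u / np.sum(X**2, axis=0)
        rss = np.sum((u[:, None] - X * b) ** 2, axis=0)
        j = int(np.argmin(rss))           # best-fitting component by the L2 criterion
        # step (3): update only the selected component by a fraction nu
        beta[j] += nu * b[j]
        f += nu * b[j] * X[:, j]
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=300)   # only the first covariate matters
beta = componentwise_qboost(X, y, tau=0.5)
```

Because only one component is updated per iteration, components that are never selected keep a coefficient of exactly zero until chosen, which is what makes the algorithm usable as a variable (and hence weighting matrix) selector.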
A regularised linear quantile regression estimator can be formally defined as

min_{β_τ} Σ_{i=1}^n ρ_τ(y_i − x_i′β_τ) + Σ_{j=1}^p p_λ(|β_j|),

where β_τ is the vector of the linear coefficients pertaining to the covariates, that is, β_τ = (β_1, β_2, ..., β_p)′, and p_λ(⋅) is a given penalty function.
The shrinkage effect is determined by the positive penalty parameter λ, which needs to be chosen according to some criterion (typically an information criterion or cross-validation).
The SCAD penalty is symmetric around the origin (i.e., around θ = 0). It is defined as follows:

p_λ(θ) = λ|θ| if |θ| ≤ λ,
p_λ(θ) = (2aλ|θ| − θ² − λ²) / (2(a − 1)) if λ < |θ| ≤ aλ,
p_λ(θ) = λ²(a + 1)/2 if |θ| > aλ,

where a > 2 and λ > 0 are tuning parameters. In this paper we will set a = 3.7, following Zou and Yuan [30], which helps us avoid searching for optimal tuning parameters over a two-dimensional grid; for this reason we suppress a in the notation that follows.
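The three-piece SCAD penalty translates directly into code. The sketch below (our own illustration, with a = 3.7) highlights its defining feature: near zero it grows linearly like the lasso, but it flattens to a constant beyond aλ, so large coefficients are not shrunk:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li; a = 3.7 following Zou and Yuan."""
    t = np.abs(theta)
    linear = lam * t                                          # |theta| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    const = lam**2 * (a + 1) / 2.0                            # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))
```

The quadratic middle piece smoothly connects the lasso-like part to the constant part: the penalty is continuous at |θ| = λ (where it equals λ²) and at |θ| = aλ (where it equals λ²(a + 1)/2), which is what removes the lasso's bias on large coefficients.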
The SCAD estimator can then be formally defined as

min_{β_τ} Σ_{i=1}^n ρ_τ(y_i − x_i′β_τ) + Σ_{j=1}^p p_λ(|β_j|),

with p_λ(⋅) the SCAD penalty defined above. Straightforward implementation of regularised estimators is, however, computationally demanding. The main issue is that expensive repeated optimisation calls are needed to select the regularisation parameter(s), typically via some form of cross-validation. Furthermore, the nonconvex nature of the SCAD optimisation problem can lead to a considerable increase in computation time at some quantiles, particularly when a larger number of spatial weighting matrices is retained by the screening step, which is consistent with the results of Wu and Liu [23]. In order to select the optimal amount of regularisation we need some criterion. Given the computational costs of SCAD estimation, information criteria are preferable. Here we will employ the g-prior Minimum Description Length (gMDL) criterion used in Kostov's [10] boosting application. This choice is, however, dictated mostly by computational reasons, and to the best of our knowledge there is no evidence (such as simulation studies) to ascertain the performance of this criterion in empirical studies of nonlinear models.
The adaptive lasso estimator for the linear quantile regression can be defined as a weighted lasso problem in the following way:

min_{β_τ} Σ_{i=1}^n ρ_τ(y_i − x_i′β_τ) + λ Σ_{j=1}^p w_j |β_j|,

where the weights are given by w_j = 1/|β̃_j|^γ for some γ > 0, with β̃ being initial estimates of the parameters. In this case β̃ will be obtained by an unpenalised quantile regression. The conventional lasso estimator is the particular case in which all weights are equal, rather than adaptively chosen.
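The effect of the adaptive weights is easiest to see in a simplified setting: for an orthonormal design in the mean-regression analogue, the weighted lasso solution is componentwise soft-thresholding of the unpenalised estimates with coefficient-specific thresholds λw_j. The sketch below is that simplified analogue (our own illustration with invented numbers), not the quantile regression estimator itself:

```python
import numpy as np

def soft_threshold(b, t):
    """Componentwise soft-thresholding operator."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

def adaptive_lasso_orthonormal(beta_init, lam, gamma=1.0):
    """Adaptive lasso for an orthonormal design (mean-regression analogue).

    With X'X = I the penalised solution is soft-thresholding of the
    unpenalised estimates, with thresholds lam * w_j, w_j = 1/|beta_init_j|**gamma.
    """
    w = 1.0 / np.abs(beta_init) ** gamma
    return soft_threshold(beta_init, lam * w)

beta_init = np.array([3.0, 0.1, -2.0, 0.05])   # hypothetical unpenalised estimates
beta = adaptive_lasso_orthonormal(beta_init, lam=0.2)
```

Large initial estimates receive small weights and are barely shrunk, while small ones receive large weights and are set exactly to zero; this differential treatment is what delivers the oracle property that the equal-weight lasso lacks.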
The adaptive lasso, when implemented in a quantile regression setting, retains the oracle property [30], similarly to the mean regression case. Therefore the adaptive lasso estimator is a reasonable choice in this setting, particularly bearing in mind the computational cost associated with the transformation step. Furthermore, L1-norm estimators are by far the most widely studied regularisation estimators for quantile regression (see, e.g., [23,24,30] for variable selection applications).
Li and Zhu [22] proposed an algorithm to estimate the whole regularisation path for lasso-type quantile regression problems. Their proposal is potentially valuable since it can be applied to non- (or semi-) parametric additive quantile regression models and therefore results in a much more general approach, intrinsically immune to functional form misspecification. The advantage of such algorithms is that, since they exploit the piecewise linear property of the regularisation path, the latter can be obtained at a fraction of the computational cost of the overall regularised estimator. This facilitates implementation of cross-validation and/or information criteria.
The elastic net [31] penalty is a combination of the L1 and L2 norms, and for the quantile regression the resulting estimator can be written as

min_{β_τ} Σ_{i=1}^n ρ_τ(y_i − x_i′β_τ) + λ_1 Σ_{j=1}^p |β_j| + λ_2 Σ_{j=1}^p β_j².

An important property of the elastic net penalty is that the inclusion of the L2 norm induces a grouping effect, in that correlated variables are grouped together. This helps avoid spuriously selecting only one variable from a group of highly correlated variables. Given that in many empirical problems the spatial weighting matrices considered can lead to highly correlated designs, it would be desirable to avoid such a pitfall. One should note, however, that elastic net penalisation could be expected to retain more variables compared to the other approaches.
The least squares approximation (LSA) estimator [32] is given by

$$\min_{\beta}\left(\beta-\tilde{\beta}\right)^{\prime}\hat{\Sigma}^{-1}\left(\beta-\tilde{\beta}\right)+\lambda\sum_{j}\left|\beta_{j}\right|,\qquad(11)$$

where $\hat{\Sigma}^{-1}=n^{-1}\,\partial^{2}\ell(\tilde{\beta})/\partial\beta\,\partial\beta^{\prime}$ is the second derivative of the unpenalised loss function $\ell(\cdot)$, evaluated at the unregularised estimates $\tilde{\beta}$. It is technically obtained as an approximation based on a Taylor series expansion (see [32]).
In the case of quantile regression, the respective loss function (i.e., the check function $\rho_{\tau}(\cdot)$) is not sufficiently smooth. Nevertheless, as long as $\hat{\Sigma}$, which is in principle any consistent estimate of the covariance matrix pertaining to the unpenalised problem, can be obtained, the corresponding LSA estimator defined in (11) exists. Furthermore, when the regularisation parameters are chosen optimally it possesses the oracle property (see [32] for a formal proof). Since (11) is essentially a linear lasso-type problem, it can be estimated using standard methods. In particular, the computationally efficient least angle regression (LARS) algorithm of Efron et al. [33] can be used to compute the regularisation path. Here we will apply the BIC-type tuning parameter selector of Wang et al. [34] to select the optimal amount of shrinkage. Application of the LSA to a quantile regression requires a covariance matrix estimator for the latter. Any consistent estimator would be appropriate. In this paper we will use the kernel-based covariance estimator proposed in Newey and Powell [35].
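Since (11) is a quadratic form plus an $\ell_{1}$ penalty, it reduces to an ordinary lasso: factorising $\hat{\Sigma}^{-1}=L^{\prime}L$ by Cholesky decomposition turns the quadratic term into $\|L\tilde{\beta}-L\beta\|_{2}^{2}$. A minimal coordinate-descent sketch of this reduction (the function names and the solver are our illustration, not the implementation of [32]):

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lsa_lasso(beta_tilde, Sigma_inv, lam, n_iter=500):
    """LSA lasso: min_b (b - bt)' Sigma_inv (b - bt) + lam * ||b||_1,
    solved by coordinate descent on the Cholesky-transformed problem
    ||L bt - L b||^2 + lam * ||b||_1 with Sigma_inv = L'L."""
    L = np.linalg.cholesky(Sigma_inv).T   # upper factor: Sigma_inv = L'L
    y = L @ beta_tilde                    # pseudo-response
    p = len(beta_tilde)
    beta = beta_tilde.astype(float).copy()
    col_sq = (L ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with coordinate j removed from the fit
            r_j = y - L @ beta + L[:, j] * beta[j]
            beta[j] = soft(L[:, j] @ r_j, lam / 2) / col_sq[j]
    return beta
```

With $\hat{\Sigma}^{-1}=I$ this collapses to soft-thresholding of $\tilde{\beta}$ at $\lambda/2$, which is a convenient sanity check.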

Study Design and Implementation Details
For comparative purposes we follow closely the design outlined in Kostov [10]. This involves using the same dataset and model specification, as well as the same set of competing alternative spatial weighting matrices. Since all of these are discussed in some detail in Kostov [10], we only briefly sketch them here.
The corrected version of the popular Boston housing dataset [36] is used. It consists of 506 observations and incorporates some corrections and additional latitude and longitude information, due to Gilley and Pace [37]. This dataset contains one observation for each census tract in the Boston Standard Metropolitan Statistical Area. The variables comprise proxies for pollution, crime, distance to employment centres, geographical features, accessibility, housing size, age, race, status, tax burden, educational quality, zoning, and industrial externalities. A detailed description of the variables used in this study is presented in Table 1.
The basic model is implemented as in Kostov [10]. This basic specification is augmented with alternative candidate spatial weighting matrices, constructed using the longitude and latitude information. The set of alternative spatial weighting matrices is constructed using an inverse-distance-raised-to-a-power weighting specification and a nearest-neighbours definition of the neighbourhood scheme. We adopt the naming conventions used in Kostov [10], combining the codes for the neighbourhood definition and the weighting scheme to refer to the corresponding spatial weighting matrix and the resulting additional variables to be included in the boosting model. All these variables are named using the following convention: nxwy, where x is the number of neighbours and y is the weighting parameter (the inverse power of the weight decay). For example, the spatial weighting matrix with the nearest 50 observations as neighbours and inverse squared distance weights, as well as the resulting transformed variable, is denoted n50w2. We employ all values for the number of neighbours from 1 to 50 and evaluate w over the interval [0.4, 4] in increments of 0.1. In simple terms, this means we combine 50 possible neighbourhood definitions with 37 alternatives for the weighting parameter, resulting in 1,850 alternative spatial weighting matrices to be considered simultaneously. Kostov [10] projects the spatially weighted dependent variable onto the column space of the spatially weighted independent variables, taking the fitted values from a least-squares regression to obtain the transformed variables, named according to the previous convention. As discussed above, here we replace this first step with a quantile regression defined over a predetermined quantile to obtain a model augmented with the alternative spatial weighting matrices. The second stage is then implemented in two consecutive steps. First we apply a component-wise boosting quantile regression, defined over the same quantile as in the first stage, to the augmented model. This is the screening step that reduces the set of variables to be considered in the model. Then a regularised quantile regression (defined over the same quantile) is applied to the screened dataset. These three steps (transformation, screening and estimation) can be run over any prespecified quantile, and their consecutive implementation defines our estimator.
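For concreteness, one nxwy matrix might be constructed as follows (the function name is hypothetical, and plain Euclidean distance is a simplifying assumption; great-circle distances would be more faithful for longitude/latitude coordinates):

```python
import numpy as np

def make_weight_matrix(coords, k, w_power):
    """Row-standardised spatial weighting matrix 'n{k}w{w_power}':
    k nearest neighbours, weights proportional to 1/distance^w_power.
    coords is an (n, 2) array of point locations."""
    n = coords.shape[0]
    # pairwise Euclidean distances
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d[i])
        nbrs = order[1:k + 1]              # skip self (distance zero)
        W[i, nbrs] = 1.0 / d[i, nbrs] ** w_power
        W[i] /= W[i].sum()                 # row-standardise
    return W
```

Looping k over 1 to 50 and w_power over 0.4 to 4 in steps of 0.1 would generate the full set of 1,850 candidate matrices described above.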
In the present setting some caution should be exercised in applying the estimation step. Note that in conditionally parametric models there is a certain trade-off between variables and spatial dependence: the spatial dependence structure can approximate the effect of missing variables, provided these are spatially correlated. Simultaneously shrinking the coefficients of both variables and spatial lags is therefore a manifestation of this trade-off. Whenever the model contains related terms in both the spatial part (i.e., the spatial weighting matrices) and the regression part (variables whose effect could be approximated by these spatial weighting matrices), simultaneous shrinkage is undesirable: the danger is that one can spuriously exclude important variables and approximate their effect by additional spatial terms. Note, however, that if the regression part is taken as given, this trade-off disappears. Ideally one would want to eliminate the trade-off altogether. In order to avoid its impact we suggest a two-step implementation of the estimation step. In the initial step only the spatial lag coefficients are penalised, while in the final step all coefficients are penalised. In this way the initial step should select the appropriate spatial dependence structure, while the final step performs the final variable selection. Hence the initial step makes structural inference about the spatial part conditional on the regression part of the model. If the screening step has produced a model that is reasonably close to the true one, the proposed approach should be able to recover the true underlying structure. Alternatively, one may implement an iterative estimation in which the estimator alternates between steps in which only the spatial structure is penalised and steps in which only the regression part is penalised, until convergence (defined as obtaining a stable structure in which no more terms are eliminated). Such steps can be viewed as conditioning one part of the model (spatial or regression) on the other, hence avoiding the trade-off. The latter approach would, however, be computationally more expensive.
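The initial step, in which only the spatial lag coefficients are penalised, can be sketched by giving the unpenalised coefficients a zero weight in the $\ell_{1}$ term. A hypothetical illustration (the function name and LP formulation are our assumptions), again exploiting the linear-programming form of the check-function problem:

```python
import numpy as np
from scipy.optimize import linprog

def qr_partial_lasso(X, y, tau, lam, penalise):
    """Quantile regression in which only the coefficients flagged in
    `penalise` (a boolean vector) receive an L1 penalty; the remaining
    coefficients are left unpenalised.  Flagging only the spatial-lag
    columns mimics the initial step described above."""
    n, p = X.shape
    pen = lam * np.asarray(penalise, dtype=float)   # 0 = unpenalised
    # LP variables: beta+ (p), beta- (p), u (n), v (n), all >= 0
    c = np.concatenate([pen, pen,
                        tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * p + 2 * n), method="highs")
    return res.x[:p] - res.x[p:2 * p]
```

The final step would then call the same routine with all entries of `penalise` set to True; the iterative variant alternates the two masks until the selected structure stabilises.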
Another issue is the highly correlated design of the spatial quantile regression model when a large number of potential spatial weighting matrices is considered. Since in principle the variable selection methods rely upon marginal correlations, they could perform poorly in such highly correlated designs.
For the mean regression model, recent contributions by Wang [38] and Cho and Fryzlewicz [39] have suggested alternative methods that overcome such reliance on marginal correlations and hence are applicable to highly correlated designs. It is, however, unclear how such methods can be extended to the quantile regression case. The two-step approach adopted in this paper conditions the selection of the spatial and of the hedonic variables each on the other part of the model, and hence reduces this trade-off. Such an approach is justified if the regression part of the model is correctly specified, but could be suboptimal if this is not the case. This is of course an area that deserves further investigation.

Results
We implement the proposed estimator for the 0.1 to 0.9 quantiles with a step of 0.1 (i.e., 9 different quantile regressions). Table 2 presents comparative computational time details for the different procedures. All of these are calculated for the first of the considered quantiles (i.e., the 0.1 quantile) and are given as guidance only, since the actual computational time can vary according to the nature of the optimisation problem, which can change across quantiles. All computations are undertaken using the statistical programming language R [40] on an Intel Core 2 2.13 GHz processor with 2 GB of RAM, without any parallel computation. Parallelising some of the more computationally demanding tasks and/or using compiled code could considerably reduce the computational time. Furthermore, it cannot be claimed that the actual implementation of these procedures is optimised in terms of computational time. The instrumentation step is the most time-consuming task. In our implementation it takes over 30 minutes for 1,850 spatial weighting matrices. In many empirical problems one would probably consider a much smaller number of alternative spatial weighting matrices. Furthermore, most of the time in this step is spent on creating the spatially weighted dependent and independent variables, rather than on fitting the actual quantile regressions.
The actual boosting procedure requires running the boosting algorithm for a large number of iterations and then calculating a stopping criterion to decide upon the estimated structure. The boosting algorithm itself is very efficient computationally. The stopping criterion calculation, however, takes considerable time. Efficient parallel implementations for the latter exist, and these can considerably reduce the computation time.
The time needed to calculate the stopping criterion is directly proportional to the number of boosting steps (which is effectively the number of alternative "models" for which it is calculated). Since in this case, at all considered quantiles, we need at least three times fewer iterations than the 5,000 used here, a practical implementation would have taken 6-7 minutes rather than the 18 reported in Table 2.
We apply the stability selection to the dataset already reduced in the instrumentation step. Yet again this is a relatively time-consuming procedure, but it can be parallelised for further computational gains.
One has to be careful in directly comparing these implementations of the estimation step. As the instrumentation step discussed previously demonstrates, calculating the stopping criterion (i.e., the optimal penalty parameters) is by far the most computationally demanding part of these procedures, and the reported implementations use different methods for this. With regard to the estimation methods, we report separately the computation times for step one (where only the spatial weighting matrix coefficients are penalised) and for the consecutive second step, where all the coefficients are penalised. As is to be expected, the LSA is the fastest method. This is due to two underlying facts: first, it uses the efficient least angle regression algorithm [33]; second, it uses the BIC-type tuning parameter selector of Wang et al. [34], which is easy to compute.
The full path estimation for the adaptive lasso, accompanied by cross-validation to choose the optimal amount of regularisation, appears to be the most computationally demanding estimation method. Most of the computational cost, however, comes from the use of cross-validation. Furthermore, this is the most universally applicable method, in the sense that many of the other methods can run into difficulties during the optimisation (at different quantiles), which can considerably inflate their computational costs.
We present computational details for implementing SCAD with gMDL over a predefined grid of 50 penalty values. Although the computational times appear acceptable, one has to take some caveats into account. The nonconvex nature of the SCAD optimisation problem means that in some cases the actual computation time can increase considerably (by a factor of over 100 in some cases). Furthermore, we have opted to fix one of the regularisation parameters, which artificially reduces the computational time. Another important point is that no set of penalisation parameters is ex ante guaranteed to span the whole regularisation path. In our implementation we run a preliminary SCAD estimation over a range of such values, designed to identify a feasible set that spans most of the regularisation path, and then manually select the grid of values. In cases where the optimisation is difficult, this can lead to a considerable increase in computational time. Therefore a path estimation algorithm for the SCAD estimator for quantile regression is essential if a reliable implementation of this method is to be designed. The use of gMDL as an optimality criterion is also somewhat ad hoc, in that there is no firm evidence on its performance for this type of problem, and it is mostly dictated by computational considerations (since cross-validation, for example, would be very costly).
The elastic net implementation is reasonably efficient. Both the BIC and the generalised approximate cross-validation yield the same models. The reported computational costs refer to routines that compute both of the above criteria internally, but this only marginally increases the costs. Most of the computational load comes from the double regularisation needed to solve for the two underlying penalties.
The component-wise boosting algorithm manages to achieve a considerable reduction in the model space. It retains between three and eleven spatial weighting matrices across the different quantiles. We do not present these intermediate results here for brevity, but details are available upon request. This intermediate step yields a reduced model space that can be explored for the underlying structure, as discussed in the methodology section. Table 3 presents the results from the stability selection applied to the prescreened model (i.e., after the boosting application). Typically, stability selection applies a prespecified probability threshold to select variables. Here, instead of proper stability selection, we present the corresponding inclusion probabilities for the spatial weighting matrices. We omit spatial weighting matrices with inclusion probability of less than 10%. Full results are available upon request. Table 3 provides a background against which the actual estimation results can be evaluated. If one were to use a threshold of 0.6, most quantiles would have resulted in a single spatial weighting matrix being selected. Such a choice would, however, have been based solely on the component-wise boosting algorithm, which, as already discussed, may perform poorly in highly correlated designs.

Table 1: Description of variables.

Table 2: Typical computational details for different procedures.

Table 3: Stability selection-derived inclusion probabilities for spatial weighting matrices.