New Link Functions for Distribution–Specific Quantile Regression Based on Vector Generalized Linear and Additive Models

. In the usual quantile regression setting, the distribution of the response given the explanatory variables is unspecified. In this work, the distribution is specified and we introduce new link functions to directly model specified quantiles of seven 1–parameter continuous distributions. Using the vector generalized linear and additive model (VGLM/VGAM) framework, we transform certain prespecified quantiles to become linear or additive predictors. Our parametric quantile regression approach adopts VGLMs/VGAMs because they can handle multiple linear predictors and encompass many distributions beyond the exponential family. Coupled with the ability to fit smoothers, the underlying strong assumption of the distribution can be relaxed so as to offer a semiparametric–type analysis. By allowing multiple linear and additive predictors simultaneously, the quantile crossing problem can be avoided by enforcing parallelism constraint matrices. This article gives details of a software implementation called the VGAMextra package for R . Both the data and recently developed software used in this paper are freely downloadable from the internet.


Introduction
1.1.Background.Much of modern regression analysis for estimating conditional quantile functions may be viewed as starting from Koenker and Bassett [1], who offered a systematic strategy for examining how covariates influence the entire response distribution.The fundamental idea is based on the linear specification of the th quantile function  푦 ( | ) =  푇 휏  and finding  휏 ∈ R 푝 that solves the optimization problem for independent and identically distributed (i.i.d.) observations from a family of linear quantile regression models, say  푖 =  푇   +  푖, ,  = 1, . . ., .Equation (1) can be reformulated as a linear programming problem using the piecewise linear function  휏 () =  ⋅ [ − ( < 0)] for  ∈ (0, 1).More details can be found in Koenker [2].
In the spirit of quantile regression, the conditional distribution  |  is usually unspecified, although it relies on normal-based asymptotic theory that is used for inference, whilst the assumption of homoskedasticity of the error terms  푖, is dropped.In this paper we use an alternative approach of conditional-quantile regression based on assuming a prespecified distribution for the response.Parametric quantile regression has some advantages over many nonparametric approaches, including overcoming the quantile crossing problem.Two examples are Noufaily and Jones [3] which is based on the generalized gamma distribution and generalized additive models for location, scale, and shape (GAMLSS; [4]).Further examples are the LMS-BCN method involving the standard normal distribution and a three-parameter Box-Cox transformation [5] and the classical method of rhobit ‡ This is the VGLM-link for the mean function of the logarithmic distribution.
quantile regression based on the asymmetric Laplace distribution (ALD).Our approach uses the vector generalized linear and additive model (VGLM/VGAM; [6,7]) framework.We develop new link functions, G, for the quantile regression model for a vector of quantiles  = ( 1 , . . .,  퐿 ) 푇 .Our methodology relies on the prespecification of the distribution F. We will also show that the quantile crossing problem can be overcome by this modelling framework.Equations ( 2)-( 3) state that the conditional distribution of the response at a given value of  has a distribution involving a parameter  and that the transformed quantile of the distribution becomes a linear predictor of the form (5).This can be achieved by defining link functions that connect (3) to (5).The reason for the linear predictors is that generalized linear modelling [8] is a very well-established method for regression modelling.GLMs are estimated by iteratively reweighted least squares (IRLS) and Fisher scoring, and this algorithm is also adopted by VGLMs and VGAMs.The method presented in this paper differs from conventional quantile regression [1] in that we assume F is known whereas the conventional case does not but use an empirical method instead to obtain the quantiles  휏 : the expectation of the check function  휏 () results in the property  = ( 휏 ) which defines the -quantile ( is the cumulative distribution function (CDF) of F).In this paper we consider the Fs listed in Table 2.

VGLMs and VGAMs.
VGLMs/VGAMs provide the engine and overall modelling framework in this workthe VGAM R package described below fits over 150 models and distributions-therefore we only sketch the details here.VGLMs are defined in terms of  linear predictors,  = ( 1 , . . .,  푀 ) 푇 , as any statistical model for which the conditional density of  given a -dimensional vector of explanatory variables,  = ( 1 ,  2 , . . .,  푑 ) 푇 has the form for some known function ℎ(⋅), with B = ( 1  2 ⋅ ⋅ ⋅  푀 ), a  ×  matrix of unknown regression coefficients.Ordinarily,  1 ≡ 1 for an intercept.
In general, the  푗 of VGLMs may be applied directly to the  parameters,  푗 , of any distribution, transformed if necessary, as the th linear predictor where  푗 is a VGLM-parameter link function, as in Table 1 (see [6] for further choices) and  (푗)푘 is the th element of  푗 .Prior to this work the  푗 were 'raw' parameters such as location, scale, and shape parameters; however, in this present work we define them to be quantiles or a very simple function of quantiles.
1.3.Estimation.VGLMs are estimated by maximum likelihood performed by IRLS using the expected information.The VGLM log-likelihood is given by for known fixed positive prior weights  푖 , and a Newton-like algorithm for maximizing (9) has the form ), where I is the overall expected information matrix (EIM),  is the score vector, and  is the iteration number.The vector  (푎) is obtained as the solution of the generalized least squares problem  (푎) = arg min  RSS, where the quantity minimized at each IRLS iteration is the weighted (or residual) sum of squares, RSS = The ( × )  푖 are known as the working weight matrices and they have (, )th element given by The use of individual EIMs instead of observed information matrices means that Fisher scoring is used rather than the Newton-Raphson algorithm.
VGAMs are also estimated by IRLS, where the difference with respect to VGLMs is that a vector additive model is now fitted to the pseudo-response  푖 with explanatory variables  푖 and working weight matrices  푖 at each IRLS iteration.Two approaches are currently used by VGAM to estimate the component functions  * : regression splines and vector smoothing methods with vector backfitting.Rudimentary Psplines [9] are almost operational, albeit this work is not yet complete.Compared to VGLMs, the VGAM log-likelihood includes a penalty if used with vector smoothing splines.In VGAM the objective function maximized is Here, the  (푗)푘 are nonnegative smoothing parameters, and  푘 ≤  푖푘 ≤  푘 are endpoints covering the values of each covariate.The basic penalty approach adopted here is described in Green and Silverman Green and Silverman [10].

Methodology
Let F(; , ) be a 1-parameter statistical model as in (4) parametrized by  ∈ Θ ⊂ R for some parameter space Θ residing in (−∞, ∞).Also let  푦 ( | ) be the corresponding quantile function with  ∈ (0, 1).Crucially, note that (5) handles suitable transformations of  in the linear predictor by parameter link functions.In contrast our proposal focusses on directly modelling  푦 ( | , ) via a smooth and one-to-one function G, in the form of which is to be incorporated in the VGLM/VGAM loglikelihood, namely, ( 9) and ( 12).Here,  = ( 1 , . . .,  퐿 ) 푇 is a prespecified vector of quantiles of interest.Examples of ( 13) are log  휏 , logit  휏 , and  휏 .Equation ( 13) is central to this work.It allows modelling choices via G for the quantile function  푦 , and it represents a new modification to the VGLM/VGAM framework.Note that G resembles a link function within the VGLM/VGAM framework as in Table 1.Two notes: first, without any loss of generality, (13) can be seen (strictly) as a function of  since the quantiles  and the covariates  are known.Secondly, G * is monotonic and one-to-one, as a result of the composite of G and  푦 which also hold such properties.However, during the fitting process, the IRLS algorithm internally requires the inverse of G * .Working with 1-parameter distribution at this stage eases the implementation via Fisher scoring because the inverse (G * ) −1 can be derived manually and then incorporated in the IRLS algorithm.In a few cases, the inverse of G * does not have a closed form, such as the 1-parameter gamma distribution, and an alternative iterative method is employed to approximate (G * ) −1 .To achieve this efficiently, two choices are available.These are (a) newtonRaphson.basic() from VGAMextra and (b) VGAM::bisection.basic(),two vectorized implementations of the well-known Newton-Raphson and bisection algorithms, to solve the roots of a real-valued function in a given interval (, ).Further details are given in Section 2.2.
One advantage of this work is that the VGLM/VGAM framework can circumvent the quantile crossing problem (e.g., [2,11], Sect.2.5) by choosing H 1 = I 푀 and H 2 = H 3 = ⋅ ⋅ ⋅ = H 푑 = 1 푀 (an -vector of ones).Under this parallelism assumption the method borrows strength across the entire data set so that the additive models for  푗 with respect to  2 , . . .,  푑 are parallel.Each family function in Tables 2 and  3 has a parallel argument which is FALSE by default.Using the syntax of VGAM based on Chambers and Hastie [12], setting parallel = TRUE (or parallel = FALSE ∼ 1) results in H 1 = I 푀 and H 2 = H 3 = ⋅ ⋅ ⋅ = 1 푀 ; i.e., it is false for only the intercept.
It is noted that for some distributions such as the exponential and Maxwell the  푗 are naturally parallel with respect to  2 , . . .,  푑 because log  휏 has the form ℎ 1 () + ℎ 2 ( 2 , . . .,  푑 ).If this is so then only the intercepts will change as a function of  and the MLEs for ℎ 2 are the same.Other distributions such as the 1-parameter gamma do not possess this property, and then it is necessary to constrain H 2 = ⋅ ⋅ ⋅ H 푑 = 1 푀 to avoid the quantile crossing problem.

Two Derivations.
Ideally the link transforms the support of  to R because  푗 should be unbounded.The three most common cases are as follows.For  ∈ (0, ∞) a log link is recommended, for  ∈ (0, 1) a logit link is a good choice, and  ∈ (−∞, ∞) means that an identity link is natural.These cases have been implemented for seven 1-parameter distributions.The selection of the function G for each  is shown in the 5th column of Table 2, whilst the resulting quantile links as functions of  are shown in the last column.
We now describe the quantile links for the exponential and the Topp-Leone distributions as examples.Firstly, for  ∼ exponential(), with a rate parameter  > 0, the density and CDF are given by (; ) =  −휃푦 and (; ) = 1 −  −휃푦 .With a slight change in notation, the quantile function is given by  −1 , i.e., which lies in (0, ∞) regardless of the values of  and .Given that values of  are known (prespecified by the user), ( 14) becomes a function of .Thus, the new quantile link for the exponential distribution as shown in Table 2 is simply obtained by taking G as the logarithmic transformation, as follows: This quantile link has been implemented in VGAMextra via the function expQlink(), as shown in Table 3.Its inverse (denoted as (; )) can be manually obtained from the inverse of (15).Note that the corresponding family function (exponential()) implemented in VGAM includes a (known) location parameter , which gives the density (; ) =  −휃(푦−퐴) .By default  = 0, and it is handled by the argument location.

Software Implementation.
For practical use by others, we have implemented seven VGLM-quantile links,  휏 in the R package VGAMextra.They are summarized in Table 2.The package VGAM is a requirement of VGAMextra because the modelling functions vglm() and vgam(), and all but the last family function of Table 2, reside there.For this paper VGAMextra 0.0-2 and VGAM 1.1-0 or later are required; they are available at www.stat.auckland.ac.nz/∼vmir178 and www.stat.auckland.ac.nz/∼yee/VGAM/prerelease/ whilst older versions of both are available on CRAN (http://CRAN.R-project.org).
whose primary arguments are  and .Its inverse (Table 3) does not admit a closed form and it is approximated by the function VGAM::newtonRaphson.basic(), a vectorized implementation of the Newton-Raphson algorithm.Almost all implementations elsewhere of this are for a scalar argument, but we operate on vectors of length .It works as follows.Our data is effectively  푖 = { 푖 ,  푖,푑 },  = 1, . . ., , whilst the quantiles of interest,  or , must be entered by the user.The shape parameter  is estimated by IRLS and therefore it is available at each iteration.Thus, for each (; ) 0 , the 'inverse' is given by the root, , of the function  (; , ) =  (; ) 0 − log gamma (, shape = ) .(19) Finally, the inverse of all the VGLM-quantile links is shown in Table 3, as well as the name of the corresponding implementation in VGAMextra.The inverse-links are required at different stages of the IRLS by Fisher scoring, which internally switches between (; ) (namely, Table 2) and (; ) (namely, Table 3).Specifically, the algorithm requires the score vector and the EIMs at each IRLS iteration, which are given by the following chain-rule formulas: Internally, the functions utilized to compute the inverse are VGAM::eta2theta() or VGAM::theta2eta().The VGAMextra Manual and Miranda-Soberanis [13] give further details about the derivation of the quantile links, whilst Yee [6] describes in the IRLS and Fisher scoring algorithms for estimating VGLMs and VGAMs.Complements at the second author's homepage give additional details on link functions.

Software Use.
For the user, this methodology runs as usual by calling the modelling functions VGAM::vglm() and VGAM::vgam(), except for two modifications that are described below.
To start with, we give the following output that shows the central arguments handled by VGAM::vglm(): The first adjustment takes place with the argument formula, a symbolic description of the model to be fit.Usually, an expression like y ∼ x2 + x3 should suffice for a response y and covariates x2 and x3.This effectively works for univariate and even for multiple responses say y1, y2, and y3, where the only change is to set cbind(y1, y2, y3) ∼ x2 + x3.Here, the right hand side (RHS) of the formula is applied to each linear predictor.For quantile modelling using VGLMs and VGAMs, Q.reg() must be incorporated in the formula, whose arguments are shown in Table 4.For a given set of quantiles of interest, entered through  = ( 1 , . . .,  퐿 ) 푇 , Q.reg() replicates the response matrix Y into  ⋅ dim() columns, where  denotes the number of columns of Y.Then, the RHS of the formula applies to every set of columns according to the number of quantiles of interest.Ordinarily the response is a vector so that  = 1 and  = .
As an example, suppose that we have two responses  1 and  2 sampled from a prespecified distribution F, as per Table 3, and the quantiles of interest are  = (0.25, 0.50, 0.75) 푇 .Then Q.reg(cbind(Y1, Y2), pvector = p) will return a matrix with six columns, with the first three columns being  1 , one for each quantile, and similarly the last three columns equal  2 .Thus vglm() handles this model as a multiple responses fit.
The second adjustment is related to the argument family, a function that describes the statistical model to be fitted.Each family has at least one argument for the link functions to be used in the fitting process (the name changes from family to family).For example, for VGAM::exponential() this is called link, whilst for the family function VGAM::benini1() (see the third column of Table 3), it is called lshape.When VGLM-quantile modelling is to be performed, the corresponding link (last column of Table 3) must be entered into the family accordingly.All the quantile links manage the same arguments, including p, the vector of quantiles, except by benini1Qlink() which has the additional argument y0.
With both modifications, a typical call has the following form: Further fitting variants can be incorporated here, e.g., categorical covariates and the use of smoothers such as regression splines.These and a few other features are illustrated in the following section.

Examples
3.1.Maxwell Data.We use simulation to generate  = 200 random variates from a Maxwell distribution whose rate parameter is a function of a single covariate  2 .To account for a nonlinear trend in the dataset, additive models with cubic smoothing splines appear to be a better choice over linear schemes such as with VGLMs.In this example we perform the following steps to confirm the performance of the methodology.
(1) Generate random deviates from the Maxwell distribution.
(3) Perform ordinary quantile regression using VGAM::alaplace1() that estimates the 1-parameter ALD by Fisher scoring.Here, the special argument tau will be employed.
Figure 1 shows the simulated data, the estimated quantile functions, and the fitted quantile curves from fit.Qmodelling and fit.Qregression, obtained from vector smoothing spline fits [14].The results are similar for  2 > 0.3, but our present work performs better at the bottom LHS tail.The data coverage from each modelling framework is summarized in Table 5.Once again our work outperforms the ALD method.
We conclude with a few remarks.
(1) The argument p is available for all quantile links in Table 3 and not only for maxwellQlink().
It can be assigned any vector of percentile values.
(2) Under the conditional VGAM-quantile modelling framework, the arguments to handle the parallelism assumption such as the arguments parallel.locatand parallel.scale in family functions are no longer required.This is internally managed by the new quantile links rather than being managed by the family function.
(3) If fit is a Qlink fit, then fitted(fit) returns the fitted quantiles.This is in the form of a  × ( ⋅ ) matrix.Similarly, predict(fit) returns a  ×  matrix where the th row is  푇 푖 .The results should be the similar to Section 3.1 because the ALD and the classical quantile regression method are essentially the same.It can be seen that the bottom LHS corner is not modelled well with quantreg either.Once again our method performs best, which is not surprising given the strong distributional assumption.[15] fit an exponential distribution to a data set comprising the time to death (in weeks) and white blood cell counts for two groups of leukaemia patients, and a binary variable for AG-positive and AG-negative.The two groups were not created by random allocation.The variable AG is the morphological variable, the AG factor; a numeric vector where 1 means AG-positive and 2 means AG-negative.We create AG01 which is AG -1.We take the log of the white blood cell count (WBC) because it is very highly skewed.The data are found in GLMsData on CRAN, which supports Dunn and Smyth [16].

Exponential Data. Feigl and Zelen
One benefit of quantile modelling with VGLMs is that it easily allows comparisons of the effect of AG01 or any other indicator variable, at different quantiles.First note that, for AG-positive patients with logWBC = 9, the 25% percentile for For further illustration's sake, we fit a 1-parameter gamma distribution to these data and interpret the results.Unlike the Maxwell and exponential distributions, where simple mathematics shows that different quantiles are parallel because their logarithm is additive with respect to , the 1-parameter gamma does not possess this property.
Here, the level of WBC constant, for patients at the 25% percentile, the time to death for AG-negatives compared to AG-positives is multiplicative by a factor of exp(−1.46)≈ 0.232, i.e., a 76.8% reduction.In comparison, for patients at the 75% percentile, the time to death for AG-negatives compared to AG-positives is multiplicative by a factor of exp(−1.27)≈ 0.281, i.e., a 71.9% reduction.This suggests that the effect of AG is greater for more severe cases than those who live longer in general.
Finally, just to check, we obtain the constraint matrices for each predictor: There is a parallelism assumption made for logWBC but not for any of the other explanatory variables.

Discussion and Future Work
This work in parametric quantile regression is blighted by the strong assumption of the assumed distribution.In theory, this might be ameliorated somewhat by implementing as many distributions as possible.Some of the distributions listed in Table 2 have real applications, for example, in kinetic-molecular theory the speed of individual molecules of idealized gases follows the Maxwell distribution and the average kinetic speed is directly related to Kelvin temperature.In experiments that do not satisfy the various postulates made (such as the effects of the container) one might model the median particle speed with  comprising temperature and other covariates such as volume of the container and density of the container walls.Different forms of gases, such as plasmas and rarefied gases, could be modelled as such too.Another example is the Rayleigh distribution which is similar to the Maxwell distribution.In two-dimensions, and in applications of magnetic resonance imaging (MRI), complex images are often viewed in terms of the background data, which is Rayleigh distributed.Nonstandard background information could be included in  and their effects on the distribution examined.
In the current software implementation there are limitations due to its internal design.For example, it would be good if worked like many other VGAM models.The difficulty here is that the @linkinv S4 slot of a VGAM family function has eta as an argument, and in our implementation this could only possibly be created by supplying the new percentiles to predict() beforehand.
Another minor deficiency in our software implementation is that the response vector is replicated dim() times so that is a form of recycling.Possibly this could be avoided because the memory requirement might be excessive when either dim() or  are very large.
At present, the VGAM framework has infrastructure to afford 1-parameter quantile links.For quantile functions depending on 2 or more parameter, such as the twoparameter gamma distribution, the quantiles will be bivariate functions whose inverse would probably not admit a closed form.Nevertheless, future work includes being able to write links for two-parameter distributions, of which the normal distribution would be the most important.For this, the methodology behind Yee and Miranda-Soberanis [17] could be employed; they solve a decades-old problem implementing the two-parameter canonical link function log(/( + )) of the negative binomial distribution.We have already commenced work in this direction, e.g., with the 2-parameter gamma distribution.
Quantreg Package.For checking purposes, the results are compared with quantreg too.Figure2gives the results based on the following code.

Table 2 :
New link functions for the quantiles of some 1-parameter distributions.The selected G function is also shown.

Table 3 :
Inverse of the quantile links and names in VGAMextra."Approximate" means that Newton-Raphson or bisection is used to approximate the inverse.All family functions except for normal1sdff(), which is in VGAMextra, are in VGAM.