Efficient Basis Change for Sparse-Grid Interpolating Polynomials with Application to T-Cell Sensitivity Analysis

Sparse-grid interpolation provides good approximations to smooth functions in high dimensions based on relatively few function evaluations, but in standard form it is expressed in Lagrange polynomials. Here, we give a block-diagonal factorization of the basis-change matrix to give an efficient conversion of a sparse-grid interpolant to a tensored orthogonal polynomial (or gPC) representation. We describe how to use this representation to give an efficient method for estimating Sobol’ sensitivity coefficients and apply this method to analyze and efficiently approximate a complex model of T-cell signaling events.


Introduction
A common problem in many areas of computational mathematics is to approximate a given function based on a small number of functional evaluations or observations. This problem arises in numerical methods for PDE [1,2], sensitivity analysis [3][4][5], uncertainty quantification [2], many areas of modeling [6,7], and other settings. As a result, there are a large number of approaches to this problem, and the literature is large and growing quickly. In settings in which the points of evaluation may be chosen at will, two common approaches are generalized polynomial chaos (gPC) using cubature and sparse grid collocation [8]. In other settings in which the points of evaluation are given, common approaches include RS-HDMR, cut-HDMR, ANOVA decomposition, kriging, and moving least squares; see, for example, [9,Chapter 5] for a discussion of such methods and further references.
Sparse grid collocation has been used widely in recent years as a means of providing a reasonable approximation to a smooth function, , defined on a hypercube in R , based on relatively few function evaluations [2]. This method produces a polynomial interpolant using Lagrange interpolating polynomials based on function values at points in a union of product grids of small dimension [10,11]. Using barycentric interpolation to evaluate the resulting polynomial [12], this method is a viable alternative to an expansion of in terms of a sum of products of one-dimensional orthogonal polynomials. This latter approach is known as generalized polynomial chaos (gPC) or spectral decomposition and is obtained via standard weighted 2 techniques. However, the orthogonality implicit in the gPC representation often provides many advantages over the Lagrange representation, particularly in applications to differential equations, in which the gPC representation is closely related to spectral methods. Other advantages of the gPC representation include the ability to estimate convergence as more points are added to the sparse grid, and the ability to estimate variance-based sensitivity coefficients quickly and accurately [4]. A common approach to obtain the gPC coefficients is to use numerical integration by applying a cubature rule. However, cubature rules to integrate the product of two polynomials up to the degree in the sparse grid interpolant typically require either more or different points than found in the sparse grid itself.
In this paper, we provide an efficient algorithm for converting from the Lagrange interpolating polynomial to an equivalent gPC polynomial using only the function values at the sparse grid points. The foundation of this algorithm is a matrix factorization based on the fact that the sparse grid is a union of small product grids. More precisely, let Φ be the matrix obtained by evaluating each of the gPC basis functions (one per column) at each of the sparse grid points (one per row). This matrix produces the gPC coefficients, 2 Computational Biology Journal , from the function values, , by solving Φ = for . We show below that Φ −1 factors into a product of block diagonal matrices in which each block corresponds to one of the small product grids composing the full sparse grid. We also demonstrate the utility of this method for calculating variance-based sensitivity coefficients on a model of T-cell signaling.
Methods for changing basis between sets of orthogonal polynomials are described in many places, including [13][14][15]. Most of these results focus on the case of polynomials of one variable, and none focuses on the case of polynomial bases for sparse grid interpolants. For further background, Boyd [16,Chapter 5] discusses the matrix multiplication transform for converting between Lagrange and spectral representations, while Chapter 10 discusses the special case of the Chebyshev polynomial expansion using the FFT.
Organization. In Section 2, we provide some background into Smolyak's algorithm for sparse grid interpolation and the calculation of Sobol' sensitivity coefficients. In Section 3.1, we give the factorization of Φ −1 and in Section 3.2 we describe the algorithm to produce the gPC representation based on function values at the sparse grid points and give an upper bound on the complexity of the algorithm. In Section 4.1, we give numerical results on running time and accuracy of the algorithm for a simple test function. In Section 4.2, we describe the application of this algorithm to compute Sobol' sensitivity coefficients for a model of T-cell signaling.

Sparse Grid Interpolation.
In this section, we provide some background on Smolyak's algorithm for sparse grid interpolation. The discussion here is based on [10]. The foundation for sparse grid interpolation is interpolation in one dimension using Lagrange interpolation as follows: where ∈ N, ∈ [−1, 1], and is the Lagrange polynomial satisfying ( ) = . A common choice for sparse grid interpolation is to use the Chebyshev-Gauss-Lobatto (CGL) points, in which case = 2 −1 + 1 and = − cos(( − 1) /2 −1 ) for > 1, since this choice provides a nesting property that is vital for efficient interpolation in higher dimensions. Moreover, 1 = 1 and 1 1 = 0 for this choice. Other choices, including Gauss-Patterson nodes and nodes on a (semi)infinite interval as in [17], are also possible; the methods below apply for these choices as well. These onedimensional formulas may be tensored in dimension > 1 to yield where i and j are multi-indices with componentwise partial order, 1 is the multi-index of all 1s, i j = ( 1 1 , . . . , ), and i j ( ) = 1 1 ( 1 ) ⋅ ⋅ ⋅ ( ). This formula requires 1 ⋅ ⋅ ⋅ function values sampled on a product grid. Note, however, that when some of the are 1, then this grid is of dimension less than (since 1 = 1).
Linear combinations of these formulas produce the Smolyak formulas. Let 0 = 0 and Δ = − −1 for ∈ N, and define |i| = 1 + ⋅ ⋅ ⋅ + . Then for ≥ , we have where Δ i is the tensor product of the Δ . A multinomial expansion and = + produce This formula is known as the combination technique, which first appeared in connection with sparse grids in [18]. When > , as is common for large , we can replace + 1 ≤ |i| by ≤ |i|. An anisotropic version of this formula is described in [19]. The formula for this version is largely the same as (3) with the index set ≤ |i| ≤ replaced by a more general index set , characterized by the property that if i ∈ and > 1, then i − e ∈ , where e has 1 in the th position and zeros elsewhere. Important results from [10] include that ( + , )( ) = for all polynomials of degree at most , and that if the derivatives are continuous for all multi-indices with ≤ for all , then where = ( , ) is the number of points in the sparse grid, ( , ), for ( , ). There is a similar error estimate for a weighted 2 norm of the Chebyshev expansion of − ( , )( ). They note also that ( + , ) is asymptotic to 2 / ! as tends to ∞, in the sense that the ratio tends to 1. More precise estimates on convergence may be found in [20].

Global Sensitivity Analysis.
A common problem in analyzing complex biological models with unknown parameters is identifying which parameters play an important role in modulating the response of the model. The variance-based sensitivity coefficients of Sobol' [21] can provide a great deal of insight into this problem.
The calculation of these coefficients for a function ( ) = ( 1 , . . . , ) defined on a hypercube in R starts by writing as a normalized sum of functions that depend on a specified subset of the following variables: Computational Biology Journal 3 with a weighted orthogonality condition imposed on pairs of functions in this decomposition and a 0 mean condition in each variable separately imposed on each function individually. Depending on the context, this decomposition is known as the Sobol' or ANOVA decomposition of or the HDMR representation of . One method for obtaining this decomposition is by calculating the gPC expansion of [4,7]. In fact, the gPC expansion is essentially a refinement of the above decomposition in which each component function in the sum above is written as a sum of a product of onedimensional orthogonal polynomials.
The Sobol' or variance-based sensitivity coefficients for are then defined in terms of the variance of the functions in the Sobol' decomposition. More precisely, let denote the hypercube [−1, 1] and assume that the decomposition above takes place on this set. Let = 2 be the volume of , and for a function on this set, define the variance of to be In terms of the Sobol' decomposition given above, the main effect sensitivity coefficient for is = ( )/ ( ). This is generalized to interaction coefficients, by replacing by , where is any subset of {1, . . . , }. Also, the total effect sensitivity coefficient is the sum of over all containing .
As shown in [4], the gPC expansion of leads to an efficient method for calculating these sensitivity coefficients: ( ) is simply the sum of squares of coefficients of the gPC polynomials in the expansion of . Hence, the change of basis algorithm described above provides an efficient way to estimate Sobol' sensitivity coefficients based on function values at sparse grid nodes. We simply evaluate the function at sparse grid nodes, change to a gPC basis using Legendre polynomials, and find the appropriate sum of squares of coefficients to calculate the and .

Change of Basis Matrix and Factorization.
In this section, we describe the matrix factorization that is central to the efficient change to a gPC basis.
Let , ≥ 0 be a set of polynomials, each of degree , orthogonal on the interval [−1, 1] using inner product ∫ 1 −1 , where is a nonnegative weight function that is positive in the interior of the interval (straightforward modifications allow for unbounded intervals, but for clarity we restrict to the finite interval).
The expansion in (4) together with (1) gives a decomposition of ( , ) into a sum of products of Lagrange polynomials, with each i corresponding to a full product grid, i (although usually i is less than -dimensional since each entry of i, that is 1, produces only a single point for the corresponding dimension). On this grid, each nontrivial coordinate direction has = > 1, and the number of points in this direction is = 2 −1 + 1. Hence, the degree of each is − 1, so we write each as a linear combination of 0 , . . . , −1 . For a fixed value of , we have Taking tensor products as in (1), we obtain a basis in terms of tensor products of for interpolation on the product grid i . Taking a union over the i that appear implicitly in (4) via , we obtain a basis for interpolation on the entire sparse grid. The problem then is to determine the (gPC) coefficients in this basis for specified function values, f, at the sparse grid nodes.
Conceptually, the simplest approach to finding the gPC coefficients starts by evaluating each basis function, , at each point in = ( , ). These values form a matrix, Φ, with each basis function corresponding to one column and each point in corresponding to one row. With this definition, and a gPC coefficient vector, , the product Φ gives the value at each point in of the polynomial determined by the coefficients in . Hence, one can solve the interpolation problem by solving Φ = f for . Of course, simply constructing this matrix takes ( 2 ) operations, while solving the system in this form using elimination takes ( 3 ) operations.
The alternative presented here is to give a factorization of Φ −1 based on (4).

Theorem 2. The change-of-basis matrix Φ −1 has a factorization
whereΦ is block-diagonal, with each block corresponding to a subgrid, i ;̂is diagonal; is a matrix that includes R into R for some > .
Proof. To start, we order the columns of Φ so that for each multi-index, i, the set of column indices of basis functions for interpolation on i from (8) are the same as the set of row indices for points in i . Let be the number of points in and i the number in i . Let i be the projection from R to R i obtained by restricting to the indices corresponding to i . In this case, Φ i = i Φ i is the matrix obtained by evaluating basis functions at points, with the basis functions and points restricted to those associated with i . Hence, given f = ( ), we can interpolate on i by solving 4 Computational Biology Journal Since the gPC basis functions associated with i were obtained by changing basis from the Lagrange representation of i , we see that i ( ) and the polynomial with gPC coefficients i i are the same polynomial. Hence, we may extend the equality in (10) to all of by replacing i by on the left and dropping i on the right to obtain Let i denote the weight for i in (4). Using (11) with (4), we have Recalling Let i 1 , . . . , i be an enumeration of the indices in , let be the matrix obtained by vertical concatenation of i 1 , . . . , i , letΦ be the block diagonal matrix with Φ i 1 , . . . , Φ i as blocks, and let̂be the diagonal matrix with diagonal entries i 1 , . . . , i , each repeated according to the size of the corresponding block. Then (13) gives the factorization 3.2. Algorithm and Complexity. Based on the previous section, the algorithm to find the coefficient vector, , to represent the sparse-grid interpolating polynomial in gPC basis is simply to use (14) to find = Φ −1 f. We avoid the matrix inverse by definingf = f, solvingΦ̂=f for̂, and then using =̂̂.
In this section, we show that for fixed in (4) and large dimension, , the complexity of this algorithm is linear in the number of points of evaluation. We frame this result as a corollary to the following proposition, which gives a bound on the total number of operations needed to compute on each subgrid, i , in turn, as needed to solveΦ̂=f for̂.
In both of the following results, ( + , ) is the number of points in the sparse grid associated with ( + , ).
In this proposition, we use the fact mentioned above that when > , then the sparse grid is a union of i over i as indicated in the sum given here. Given this result, the following theorem is nearly immediate.

Theorem 4.
For fixed , there is > 0 so that for > , the coefficients in the gPC expansion of ( + , )( ) may be found using (14) with computation time bounded by ( + , ); that is, for fixed , the running time is linear in the number of grid points.
We have not attempted to calculate the best possible in the proof of this result. In fact, numerical results given below show that in some cases, the coefficient actually decreases as increases due to the spread of fixed overhead costs over a small number of points when and are small. Additionally, the numerical results show that the time per point evaluated is roughly constant through = 8.
Note that this algorithm computes the gPC coefficients of the interpolating polynomial rather than the original function itself. However, for a function, , with ≥ 1, the error estimate from (5) implies that the interpolating function converges uniformly to as the depth, , increases. Since the gPC coefficients are obtained by weighted integration of against the orthogonal polynomials, the gPC coefficients for the interpolating polynomials converge to the gPC coefficients for , and (5) provides a means to estimate the error in these coefficients based on the set of orthogonal polynomials and their corresponding weights.
Proof of Proposition 3. Given |i| = + between and + , consider the number of points in the corresponding grid i . Let = (i) be the number of nontrivial entries in i (i.e., entries larger than 1). Suppose without loss that the first entries of i are nontrivial with entries = + 1 ≥ 2. Then, the number of points in this grid is ∏ =1 (2 + 1). If 1 is the only entry larger than 1, then 1 = − ( − 1), and so the number of points is exactly 3 −1 (2 − +1 +1). A simple counting argument implies that any other distribution of nontrivial entries produces no more grid points, so To construct i with |i| = + and nontrivial entries, we select out of nontrivial entries, then distribute elements to these entries and so that each entry gets at least one element. There are ( ) ( −1 −1 ) ways to do this. Summing from = 1 to , and using ( ) ≤ , we have .
The summation on the right hand side is a binomial expansion of a power of − 1, and distributing a factor of (2 ) −1 Computational Biology Journal 5 into this power converts the right hand side to 4 (3 + 2 ) −1 . Using this in place of the sum over i and summing over gives As noted above, ( + , ) is asymptotically equal to 2 / ! for large in the sense that for fixed , lim → ∞ ( + , )/(2 / !) = 1. In particular, there is > 0 so that ≤ ( + , ) for all ≥ 1. Hence, the results just given imply Hence, taking , = 2 4 (2 ⋅ 3 ) −1 , we obtain the desired result.
Proof of Theorem 4. Given the sparse grid, , associated with ( + , ) and the vector of values, f, we need first to form the matrices ,̂, andΦ as in (13). From the construction of , it has exactly one nonzero entry per row and is × , where = ( , ) is asymptotic to 2 / ! as increases [10], and = ∑ ≤|i|≤ + | i |. Hence, and the diagonal matrix, , may be constructed and applied in time ( ). A given block ofΦ is obtained by evaluating products of one-dimensional polynomials on the points of a product grid, i . For a nontrivial entry i = 1 + , the polynomials in have degree at most 2 . Using a three-term recurrence, these polynomials may be evaluated at a given point in (2 ). Since the projection of i to the coordinate has 2 + 1 points, these polynomials are evaluated in time (2 2 ). This is repeated for each nontrivial for a total of ( 2 2 ), which is ( | i | 2 ). Then for each point in i , we multiply at most polynomials, which are ( | i |). Summing over i and applying Proposition 3, we see thatΦ may be constructed in time ( ( + , )).
As noted above, to find the gPC coefficients, , we definê f = f, solveΦ̂=f for̂, and then use =̂̂. Sincê Φ has blocks of size | i | × | i |, we may solve each block in time at most (| i | 3 ). Combining this with Proposition 3 and the estimates ( ) and ( ) above, we obtain the theorem.
A practical point is that many of the blocks inΦ are identical up to a permutation of the rows. Hence, for each of these blocks, we may use a single LU decomposition and appropriate permutations of the entries in f to reduce the total running time (although not, of course, the linear dependence described in the theorem).
We note also that the algorithm given here is compatible with the anisotropic adaptive sparse grids of [19]. The analysis given here shows that the time per point depends only on the maximum size of a block in the factorization of Φ −1 , and this size is bounded by the maximum depth, , in both the isotropic and the anisotropic cases.
Finally, the analysis for fixed and increasing is relevant in that as models become more complex, increases, while often the important behavior of a model can be captured by a relatively low-order approximation. An example of this is seen in the T-cell signaling model in Section 4.2.

Efficiency of Conversion.
In this section, we provide results of numerical experiments using the algorithm described in the previous sections. All computations were performed in MATLAB 7.7.0 on a Dell Precision PWS690 with an Intel Xeon running at 3 GHz with 3 GB of RAM.
To evaluate the running time of the conversion algorithm, we used the test function labeled oscillatory in [10] as where and 1 are chosen at random as indicated in [22]. The domain of definition is [0, 1]. The sparse grid for a given dimension and depth was created using the MATLAB package spinterp, version 5.1.1 [23]. The resulting functional values were then used as input to the revised conversion algorithm using the Legendre polynomials as basis.
In Figure 1, we plot the running time of the conversion algorithm for fixed order of accuracy, , and increasing dimension. Figure 4(a) clearly shows the linear dependence on the number of points evaluated. Figure 4(b) shows the same data as a function of the number of dimensions. Here, the nonlinear increase for bigger than 1 is due to the nonlinear dependence of the number of points of evaluation on dimension. Nevertheless, for = 2, the algorithm is reasonably fast even up to 100 dimensions.
In Figure 2, we plot the running time of the conversion algorithm for fixed dimension and increasing order of accuracy, . Here, the bound in (18) implies that the running time is bounded by 1 2 per point for some constants 1 and 2 . However, Figure 4(a) shows that the deviation from nonlinear is relatively small for small . This is made more precise in Figure 4(b), which shows the time per point as a function of . Perfect linear dependence would imply that the traces for different dimensions would coincide. While this is essentially true for ≥ 5 in the data presented, there are significant deviations for small and . This is due to fixed overhead time that must average over fewer points in these cases, which gives rise to the decrease in time per point as dimension increases when ≤ 4. Somewhat more surprisingly, the time per point (after discounting the fixed overhead) is essentially constant up to = 8.

Sensitivity Analysis for T-Cell Signaling Model.
To illustrate the calculation of global sensitivity coefficients as described in Section 2.2, we consider a mathematical model in [24] and examined further in [25]. This model describes the effect on T-cell activation of agonist and antagonist binding of major histocompatibility complex molecules, with both positive and negative feedback present in the signaling network. We use a form of this model that is a standard ODE system with 19 parameters, 37 state variables, and fixed initial conditions. We focus on the dynamics of phosphorylated ZAP, which plays an important role in positive feedback through the MEK/ERK kinase cascade. We take the uncertain parameter space to be a hypercube with sides given by [ /5, 5 ], where is the nominal th parameter value. For purposes of constructing the gPC expansion and calculating sensitivity coefficients, we transform the parameters to log space. Hence, the function that we evaluate has the form ( ; ), where is a vector of parameters given by = 5 , with each ∈ [−1, 1], and ∈ [0, 2000] is time.
More precisely, for this example, the nodes of a sparse grid are vectors in [−1, 1] 19 , and each such node gives a corresponding parameter vector using the transformation just described. For each such , we evaluate the model at 200 evenly spaced time points between 0 and 2000. For one of these fixed , say , the model evaluations ( ; ) over all the given by nodes of a sparse grid allow us to construct an interpolating polynomial, which can then be evaluated at any in the parameter space, even those that do not arise from nodes of the sparse grid. Of course the resulting interpolation is only an approximation to the true model, but the evaluation of the interpolating polynomial is in this case on the order of 20 times faster than evaluation by integrating the system of ODEs directly, and after change of basis to gPC form, the interpolating polynomial can be used to compute sensitivity coefficients very quickly. Moreover, the remarks in Section 2 quantify the relationship between the number of nodes in the sparse grid and the accuracy of the approximation.
Over the selected parameter space, this model has dynamics that can be difficult to capture using interpolation methods. Depending on parameters, the dynamics can exhibit dramatically different time scales, and, moreover, there is a division of parameter space into regions that produce trajectories with relatively high terminal concentrations of pZAP and other regions with relatively low terminal concentrations; very few trajectories have terminal concentrations in an intermediate range. These features can be difficult to capture accurately using polynomial interpolation in the same way that step functions can be difficult to approximate with Fourier series. Figure 3 shows a set of sample trajectories of the full model along with trajectories obtained by interpolation using an increasing number of model evaluations. The parameter vectors for these trajectories were chosen randomly from the full parameter space and thus are not sparse grid nodes. These plots show both the difficulty in capturing the dynamics accurately (e.g., the overshoot in the trajectory that rises most quickly) and the fact that interpolation captures the main features of the dynamics even in this difficult case and with accuracy increasing with the number of points.
After evaluating the function on a sparse grid, we next computed the Sobol' sensitivity coefficients for each of the 19 parameters as a function of time. As described in Section 2.2, this is accomplished by first using the function values ( ; ) for fixed to change to a gPC basis and thereby obtain an interpolating polynomial for ( ; ⋅); the sensitivity coefficients for this time point are then obtained by summing the squares of coefficients of appropriate terms in this polynomial. This is done for each to obtain the sensitivity coefficients as a function of time. The resulting curves are shown in Figure 4, using interpolating polynomials based on 100 function values (a) and 10000 function values (b). As seen there, the estimates of the sensitivity coefficients change in magnitude between the two panels, but the ranking of sensitive versus insensitive is quite consistent between the two. Moreover, this ranking is nearly identical to that given by an independent estimation using a quasi-Monte Carlo method.
To demonstrate the utility of this sensitivity analysis, we note that all but 4 of the parameters have sensitivity coefficients that are below 0.1 for the entire time course. The 4 remaining parameters have 3 distinct patterns of influence on the sensitivity. The first pattern is due to the ZAP dephosphorylation rate, which gives rise to a sharp spike in sensitivity near the beginning of the time course (in black in Figure 4). This sensitivity at the start of the time course is related to the way in which the model departs from the fixed initial conditions; a larger dephosphorylation leads to a steeper initial decline in the level of pZAP. However, examination of trajectories arising from a range of values for this parameter shows that there is less than 2% variation in these trajectories over the part of the curve corresponding to the peak of sensitivity. Hence, although the sensitivity to this parameter is high in this region, the reason for this sensitivity is not that the trajectories vary greatly in response to changes in this parameter, but rather that changes to the other parameters lead to essentially no change to the output in the early part of the trajectory.
The next sensitivity pattern is a moderate hump in the time period of about 50 to about 500 (shown in green and red in Figure 4). These are due to the T-cell receptor (TCR) phosphorylation rate (the upper, green curve) and the ZAP phosphorylation rate (the lower, red curve). Not surprisingly, with the remaining parameters held fixed, increased values for either of these rates generally lead to significantly increased levels of pZAP from about time 50 onward.
The final sensitivity pattern is shown in Figure 4 in the blue curve that rises around time 500, which is due to the SHP phosphorylation rate. SHP plays a role in a negative feedback loop, and this nonlinear dependence gives rise to complex and often long-lasting influences on the trajectories.
To indicate the utility of the sensitivity analysis in capturing the dynamics of this model, we took the parameter values used in Figure 3, changed all but the 4 parameters discussed as most sensitive back to the nominal values of the original model, and then resimulated. The resulting curves in Figure 5(a) indicate that this simple procedure captures the majority of the dynamical features of the original trajectories of Figure 3(a). We then used sparse grid interpolation on the 4-dimensional grid corresponding to these 4 parameters and produced interpolated curves as in Figure 3. The smaller  dimension here means that far fewer points are needed to accurately capture the trajectories relative to the 19 dimensional cases examined earlier.

Conclusions
The algorithm described here provides an efficient method to convert function values to a gPC approximation to a given function and thereby to estimate global Sobol' sensitivity coefficients. The method relies on a block-diagonal factorization of the matrix for changing basis from Lagrange polynomials to orthogonal polynomials based on function values at a Smolyak sparse grid. For a fixed degree of accuracy and increasing dimension, this algorithm is linear in the number of points of evaluation. Moreover, for fixed dimension, the time per point of evaluation is nearly constant as the degree of accuracy increases up to about = 8.
We applied this method to a time-varying model of T-cell signaling and showed that the majority of the dynamics of this model could be explained by variation in only 4 parameters: the ZAP dephosphorylation rate, and the ZAP, TCR, and SHP phosphorylation rates. Taken together, the combination of efficient sensitivity analysis and interpolation provides an easily applied and efficient method for understanding the main dynamics of a complex biological model.