Determinant Efficiencies in Ill-Conditioned Models

The canonical correlations between subsets of OLS estimators are identified with design linkage parameters between their regressors. Known collinearity indices are extended to encompass angles between each regressor vector and the remaining vectors. One such angle quantifies the collinearity of regressors with the intercept, of concern in the corruption of all estimates due to ill-conditioning. Matrix identities factorize a determinant in terms of principal subdeterminants and the canonical Vector Alienation Coefficients between subset estimators; by duality, these are the Alienation Coefficients between subsets of regressors. These identities figure in the study of D and $D_s$ as determinant efficiencies for estimators and their subsets, specifically, $D_s$-efficiencies for the constant, linear, pure quadratic, and interactive coefficients in eight known small second-order designs. Studies on D- and $D_s$-efficiencies confirm that designs are seldom efficient for both. Determinant identities demonstrate the propensity for $D_s$-inefficient subsets to be masked through near collinearities in overall D-efficient designs.


Introduction
Given $\{Y = X\beta + \epsilon\}$ of full rank with homogeneous, uncorrelated errors, the OLS estimators $\hat\beta$ are unbiased with second-moment matrix $V(\hat\beta) = \sigma^2 (X'X)^{-1}$. Such moment matrices pervade experimental design, to include determinants as gauges of D- and $D_s$-efficiencies for estimators and their subsets. Early references trace to [1-4], and more recently to [5-10] and others. Finding $D_s$-efficient designs for polynomial models is considered in [11-20], for example. Studies examining the $D_s$-efficiencies of D-efficient designs confirm that designs are seldom efficient for both; see [13, 21-23]. From those beginnings, the study of D- and $D_s$-efficiencies continues apace. To wit, a recent key-word search in the Current Index to Statistics shows in excess of 60 listings from 2006 to 2010, and more than 100 from 2001 to 2010. Moreover, these ideas bear fruit in a widening diversity of applications as evidenced in the following.
To fix ideas, let D correspond to a polynomial $P_c$ of degree $c$, namely, $g(\mu) = \sum_{i=0}^{c} \beta_i t^i$. In toxicology studies, a two-stage experiment is proffered in [24], seeking D-efficiency in estimating $k = c + 1$ overall parameters at the first stage, then $D_1$-efficiency at the second stage in estimating a critical "threshold parameter," using quasilikelihood in nonlinear models. Coupled with this is the $D_{k-1}$-efficiency for the remaining $k - 1$ parameters at the second stage. In related work [25], experiments with $c$ chemicals in combination are to be examined along fixed-ratio rays. When restricted to a specified ray, the fundamental hypothesis of noninteracting factors can be rejected when higher-order polynomial terms are required in the total dose-response model $g(\mu) = \beta_0 + \beta_1 t + \sum_{i=2}^{c} \beta_i t^i$ in the linear predictor $t$. Here $D_2$ refers to $(\beta_0, \beta_1)$ and $D_{c-1}$ to $D_s$-efficiency in the critical estimation of $(\beta_2, \ldots, \beta_c)$, which vanish under the conjectured additivity. Moreover, in [11, 12, 21] D refers to $P_{k+1}$, $D_k$ to $P_k$, and $D_1$ to $\beta_{k+1}$, the highest-order term in $P_{k+1}$, for example. In short, users often are properly concerned with both D- and $D_s$-efficiencies, and connections between these basic criteria deserve further study, to be undertaken here.
Ill-conditioning, as near-collinearity among the columns of $X$, "causes crucial elements of $X'X$ to be large and unstable," "creating inflated variances," and estimates that are "very sensitive to small changes in $X$," having "degraded numerical accuracy"; see [26-28], for example. Diagnostics include the condition number $c_1(X'X)$, the ratio of largest to smallest eigenvalues; and the Variance Inflation Factors $\{\mathrm{VIF}(\hat\beta_j) = v_{jj} w_{jj};\ 1 \le j \le p\}$ with $W = X'X$ and $V = (X'X)^{-1}$, that is, ratios of actual ($v_{jj}$) to "ideal" ($1/w_{jj}$) variances had the columns of $X$ been orthogonal. In models with intercept, "collinearity with the intercept can quite generally corrupt the estimates of all parameters in the model whether or not the intercept is itself of interest and whether or not the data have been mean centered," as noted in [29].
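As a minimal sketch of these two diagnostics, the following NumPy code computes $c_1(X'X)$ and the VIFs directly from the definitions above. The function names and the two small example designs are hypothetical, chosen only for illustration; they are not taken from the cited references.

```python
import numpy as np

def condition_number_c1(X):
    """c1(X'X): ratio of the largest to the smallest eigenvalue of X'X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)  # ascending order for symmetric input
    return eigvals[-1] / eigvals[0]

def vifs(X):
    """VIF_j = v_jj * w_jj, with W = X'X and V = (X'X)^{-1}."""
    W = X.T @ X
    V = np.linalg.inv(W)
    return np.diag(V) * np.diag(W)

# Hypothetical designs (illustration only): an orthogonal 2^2 factorial with
# intercept, and the same design with one added run that breaks orthogonality.
X_orth = np.array([[1.0, -1.0, -1.0],
                   [1.0,  1.0, -1.0],
                   [1.0, -1.0,  1.0],
                   [1.0,  1.0,  1.0]])
X_skew = np.vstack([X_orth, [1.0, 1.0, 2.0]])

print(condition_number_c1(X_orth), vifs(X_orth))
print(condition_number_c1(X_skew), vifs(X_skew))
```

For orthogonal columns every VIF equals one and $c_1$ equals one; any departure from orthogonality pushes both above one.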
To the foregoing list of ills from ill-conditioning, we add that not only are designs seldom efficient for both, but $D_s$-inefficient estimators may be masked in overall D-efficient designs, and conversely. This masking may be quantified in terms of structural dependencies, specifically, through determinant identities linking D- and $D_s$-efficiencies to various gauges of nonorthogonality of the data. The latter include nonvanishing inner products between columns of regressors, Hotelling's [30] canonical correlations among OLS solutions, and VIFs. An outline follows.
Section 2 contains supporting material. Details surrounding collinearity diagnostics are topics in Section 3, to include duality of angles between subspaces of the design and parameter spaces, and their connections to VIFs. Section 4 develops basic determinant identities and inequalities of independent interest. Section 5 revisits eight small second-order designs with regard to $D_s$-efficiencies in estimating the constant, linear, pure quadratic, and interactive coefficients, to include the masking of inefficient estimators. Though in wide usage, with no apparent accounting for collinearity, these designs are seen to exhibit varying degrees of collinearity of regressors with the constant. Since computations proceed from the design matrix itself, prospective designs can be evaluated with regard to the issues studied here before committing to an actual experiment. Section 6 concludes with a brief summary.

Notation
Spaces of note include $\mathbb{R}^k$ as Euclidean $k$-space; $\mathbb{R}^k_+$ as its positive orthant; $\mathbb{F}_{n\times k}$ as the real matrices of order $(n \times k)$; $\mathbb{S}_k$ as the $(k \times k)$ real symmetric matrices; and $\mathbb{S}_k^+$ as their positive definite varieties. The transpose, inverse, trace, and determinant of $A \in \mathbb{S}_k$ are $A'$, $A^{-1}$, $\mathrm{tr}(A)$, and $|A|$; and $A^{1/2}$ is its spectral square root. Special arrays include the unit vector $1_n = [1, 1, \ldots, 1]' \in \mathbb{R}^n$, the identity $I_n$ of order $(n \times n)$, the block-diagonal matrix $\mathrm{Diag}(A_1, A_2) \in \mathbb{S}_k$, the idempotent form $B_n = I_n - n^{-1} 1_n 1_n'$, and $O(k)$ as the real orthogonal group of $(k \times k)$ matrices. For $X$ $(n \times p)$ of rank $p \le n$, designate a pseudoinverse as $X^\dagger$, its ordered singular values as $\sigma(X) = \{\xi_1 \ge \xi_2 \ge \cdots \ge \xi_p > 0\}$, and by $S_p(X) \subset \mathbb{R}^n$ the linear span of columns of $X$. Its condition number is $c_2(X) = \xi_1/\xi_p$, specifically, $c_2(X) = [c_1(X'X)]^{1/2}$. The mean, dispersion matrix, and generalized variance for a random $U \in \mathbb{R}^k$ are designated as $E(U) = \mu$, $V(U) = \Sigma$, and $GV(U) = |\Sigma|$, respectively. To account for dimension, consider $G(U) = [GV(U)]^{1/k} = |\Sigma|^{1/k}$ as a function homogeneous of unit degree. The class $M_0: \{Y = \beta_0 1_n + X\beta + \epsilon\}$, comprising models with intercept and dispersion $V(\epsilon) = \sigma^2 I_n$, is our principal focus. Unless stated otherwise, we take $\sigma^2 = 1.0$, since variance ratios are scale-invariant. A distinction is drawn between centered and uncentered VIFs, namely, $\mathrm{VIF}_c$s and $\mathrm{VIF}_u$s, the former from columns of $X$ centered to their means. The latter, designated as $\{\mathrm{VIF}_u(\hat\beta_j);\ j = 0, 1, \ldots, k\}$, are diagonal elements of $(X_0'X_0)^{-1}$ divided by reciprocals of diagonals of $X_0'X_0$ itself, with $X_0 = [1_n, X]$. These are of subsequent interest. Special distributions on $\mathbb{R}^1$ include the Snedecor-Fisher distribution $F(\cdot;\ \nu_1, \nu_2, \lambda)$ having $(\nu_1, \nu_2)$ degrees of freedom and noncentrality $\lambda$.

Collinearity Diagnostics
Ill-conditioned models $\{Y = X\beta + \epsilon\}$, burdened with difficulties as cited, trace to nonorthogonality among columns of $X$. To examine aspects of near collinearity, we first establish duality between design linkage parameters among columns of $X$, and collinearity among the OLS solutions as quantified by Hotelling's [30] canonical correlations.

Duality Results
Partition a generic $X \in \mathbb{F}_{n\times p}$ as $X = [X_1, X_2]$ with $\{X, X_1, X_2\}$ of orders $\{(n \times p), (n \times r), (n \times s)\}$, respectively, having ranks $\{p, r, s\}$ such that $r \le s$ and $r + s = p < n$. Accordingly, write $\{Y = X_1\beta_1 + X_2\beta_2 + \epsilon\}$, taking $\beta' = [\beta_1', \beta_2']$, and denoting by $S_p(X_1)$ and $S_p(X_2)$ the subspaces of $\mathbb{R}^n$ spanned by columns of $X_1$ and $X_2$. We seek a canonical form preserving these subspaces and linkage between $(X_1, X_2)$, a geometric concept independent of bases for representing $S_p(X_1)$ and $S_p(X_2)$. Accordingly, let $R = (X_1'X_1)^{-1/2} X_1'X_2 (X_2'X_2)^{-1/2}$. Following [31], cosines of angles between $S_p(X_1)$ and $S_p(X_2)$ are found as singular values generated by $(X_1, X_2)$, to be designated as design linkage parameters $\{\delta_1, \ldots, \delta_r\}$. To these ends, observe that $X'X$ in partitioned form transitions into $Z'Z$ through

$$X'X = \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix} \to \begin{bmatrix} I_r & R \\ R' & I_s \end{bmatrix} \to \begin{bmatrix} I_r & D \\ D' & I_s \end{bmatrix} = Z'Z; \qquad (3.1)$$

the singular decomposition of $R$ is $R = PDQ'$, where $D = [D_\delta, 0]$; and elements of $D_\delta = \mathrm{Diag}(\delta_1, \ldots, \delta_r)$ comprise the singular values of $R$. In particular, $\{\phi_j = \arccos \delta_j;\ 1 \le j \le r\}$ defines the design linkage angles between $S_p(X_1)$ and $S_p(X_2)$ as subspaces of $\mathbb{R}^n$.
To continue, partition $V(\hat\beta) = \Sigma = [\Sigma_{ij}]$ conformably with $\hat\beta' = [\hat\beta_1', \hat\beta_2']$, $\hat\beta_1 \in \mathbb{R}^r$, $\hat\beta_2 \in \mathbb{R}^s$; designate their inner product space as $(\mathbb{R}^r \oplus \mathbb{R}^s, \langle\cdot,\cdot\rangle_\Sigma)$, where $\mathbb{R}^r \oplus \mathbb{R}^s$ is the direct sum and $\langle\cdot,\cdot\rangle_\Sigma$ their inner product, as in Eaton [32, page 409]. Denote by $\{\rho_1, \ldots, \rho_r\}$ Hotelling's [30] canonical correlations. Then by Proposition 10.2 of [32], $\{\rho_1, \ldots, \rho_r\}$ are cosines of angles between $(\mathbb{R}^r, \mathbb{R}^s)$ as subspaces of $(\mathbb{R}^r \oplus \mathbb{R}^s, \langle\cdot,\cdot\rangle_\Sigma)$. In keeping with earlier usage, identify $\{S_p(\hat\beta_1), S_p(\hat\beta_2)\}$ with $\{\mathbb{R}^r, \mathbb{R}^s\}$. As Hotelling's canonical correlations are invariant under affine transformations, parameters may be redefined linearly, preserving subspaces, thus leaving the canonical correlations invariant. Retracing steps leading to the canonical design model embodied in (3.1), but now to preserve $\{S_p(\hat\beta_1), S_p(\hat\beta_2)\}$, gives the canonical model $\{Y = Z\alpha + \epsilon\}$ with $Z'Z$ as the rightmost matrix of (3.1). We next establish connections between the design linkage parameters $D_\delta$ from (3.1), and the corresponding canonical correlations $D_\rho = \mathrm{Diag}(\rho_1, \ldots, \rho_r)$, as derived eventually from $\Sigma = (X'X)^{-1}$. A critical duality result is encoded in the following.
Theorem 3.1. Consider the design linkage parameters $D_\delta$ between $\{S_p(X_1), S_p(X_2)\}$ as subspaces of $\mathbb{R}^n$ and Hotelling's [30] canonical correlations $D_\rho$ between $\{\hat\beta_1, \hat\beta_2\}$. Then $D_\rho = D_\delta$.
Proof. In view of invariance of $\{\rho_1, \ldots, \rho_r\}$ under nonsingular linear transformations of $\hat\beta_1 \in \mathbb{R}^r$ and of $\hat\beta_2 \in \mathbb{R}^s$, canonical correlations between $(\hat\beta_1, \hat\beta_2)$ proceed as in expression (3.1), but beginning instead on the left with $V(\hat\alpha) = (Z'Z)^{-1}$ in lieu of $X'X$. Specifically, with $D = [D_\delta, 0]$, and using rules for block-partitioned inverses, we obtain expression (3.2), whose reduction holds since diagonal matrices commute. But the off-diagonal block is precisely $[D_\rho, 0]$, the canonical correlations between $(\hat\beta_1, \hat\beta_2)$, to complete our proof.
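The duality can be checked numerically. The sketch below, a hedged illustration rather than the author's computation, takes the linkage parameters as singular values of $(X_1'X_1)^{-1/2} X_1'X_2 (X_2'X_2)^{-1/2}$ and the canonical correlations as singular values of $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}$ with $\Sigma = (X'X)^{-1}$. The function names and the randomly generated design are assumptions made for illustration.

```python
import numpy as np

def inv_sqrt(A):
    """Inverse of the spectral square root of a symmetric positive definite A."""
    w, U = np.linalg.eigh(A)
    return U @ np.diag(w ** -0.5) @ U.T

def linkage_parameters(X1, X2):
    """Cosines of principal angles between span(X1) and span(X2)."""
    R = inv_sqrt(X1.T @ X1) @ (X1.T @ X2) @ inv_sqrt(X2.T @ X2)
    return np.linalg.svd(R, compute_uv=False)  # descending order

def canonical_correlations(Sigma, r):
    """Canonical correlations between the first r and the remaining coordinates."""
    S11, S12, S22 = Sigma[:r, :r], Sigma[:r, r:], Sigma[r:, r:]
    C = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(C, compute_uv=False)  # descending order

# Hypothetical full-rank design with n = 12, r = 2, s = 3 (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 5))
X1, X2 = X[:, :2], X[:, 2:]

delta = linkage_parameters(X1, X2)                      # from the design
rho = canonical_correlations(np.linalg.inv(X.T @ X), 2) # from V(beta-hat)
print(delta, rho)
```

The two printed vectors agree to rounding error, so the angles $\arccos\delta_j$ in the design space match the angles $\arccos\rho_j$ in the parameter space.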

Corollary 3.2. (i) Consider the design linkage parameters $\{\cos\phi_j = \delta_j;\ 1 \le j \le r\}$, gauging collinearity between $\{S_p(X_1), S_p(X_2)\}$ as subspaces of $\mathbb{R}^n$, and the canonical correlations $\{\cos\phi_j = \rho_j;\ 1 \le j \le r\}$ between $\{S_p(\hat\beta_1), S_p(\hat\beta_2)\}$ as subspaces of $(\mathbb{R}^r \oplus \mathbb{R}^s, \langle\cdot,\cdot\rangle_\Sigma)$. Then angles between these pairs of subspaces correspond one-to-one, that is, $\{\phi_j = \arccos\delta_j = \arccos\rho_j;\ 1 \le j \le r\}$. (ii) For models $X_0 = [1_n, X]$ in $M_0$, the element $\delta(1_n : X) = \delta_1$ generates the angle $\cos\phi_1 = \delta_1$ between the regressor vectors and the constant vector. Equivalently, this is given by $\cos\phi_1 = \rho_1 = \rho(\hat\beta_0 : \hat\beta)$ from duality.

Collinearity Indices
Stewart [33] reexamined numerical aspects of ill-conditioning, to the following effects for $X_0 = [1_n, X]$. Taking $X_0^\dagger = (X_0'X_0)^{-1}X_0'$ as the pseudoinverse of note, and letting $x_j^\dagger$ be its $j$th row, each collinearity index in the collection $\{\kappa_j = \|x_j\| \cdot \|x_j^\dagger\|;\ j = 0, 1, \ldots, k\}$ is constructed to be scale-invariant. Clearly $\|x_j^\dagger\|^2$ is found along the principal diagonal of $X_0^\dagger (X_0^\dagger)' = (X_0'X_0)^{-1}$. In addition, the conventional $\mathrm{VIF}_u$s are squares of the collinearity indices, that is, $\{\mathrm{VIF}_u(\hat\beta_j) = \kappa_j^2;\ j = 0, 1, \ldots, k\}$. In particular, since $x_0 = 1_n$ in $X_0$, we have $\kappa_0^2 = n\|x_0^\dagger\|^2$. Transcending Stewart's analysis, we connect his collinearity indices to angles between subspaces as follows. Choose a typical $x_j$ in $X_0$; rearrange $X_0$ as $[x_j, X_{[j]}]$ and similarly $\hat\beta$ as $[\hat\beta_j, \hat\beta_{[j]}']'$; and seek elements of $(X_0'X_0)^{-1}$ as reordered by each permutation matrix $Q_j$. From the clockwise rule, the $(1, 1)$ element of each inverse is $[x_j'(I_n - P_j)x_j]^{-1}$, where $P_j = X_{[j]}(X_{[j]}'X_{[j]})^{-1}X_{[j]}'$ is the projection operator onto the subspace $S_p(X_{[j]}) \subset \mathbb{R}^n$.
These relationships in turn enable us to connect $\{\kappa_j^2;\ j = 0, 1, \ldots, k\}$ to the geometry of ill-conditioning as follows.
Proof. From the geometry of the right triangle formed by $(x_j, P_j x_j)$, the squared lengths satisfy $\|x_j\|^2 = \|P_j x_j\|^2 + RS_j$, where $RS_j = \|x_j - P_j x_j\|^2$ is the residual sum of squares from the projection. Accordingly, the principal angle between $(x_j, P_j x_j)$ is given by $\cos\phi_j = x_j'P_j x_j / (\|x_j\| \cdot \|P_j x_j\|) = \|P_j x_j\| / \|x_j\|$ for $\{j = 0, 1, \ldots, k\}$, to give conclusion (i) and conclusion (ii) by duality. Conclusion (iii) follows on specializing $(x_0, P_0 x_0)$ with $x_0 = 1_n$ and $P_0 = X(X'X)^{-1}X'$, to complete our proof.
Remark 3.4. The foregoing developments specialize from Section 3.1 in that the partition $[x_j, X_{[j]}]$ always has $r = 1$ and $s = k$, giving a single angle $\phi_j$. Rules-of-thumb in common use for problematic VIFs include those exceeding 10, as in [34], or even 4 as in [35], for example. In angular measure, these correspond respectively to $\phi_j < 18.435$ deg and $\phi_j < 30.0$ deg.
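The angular thresholds in Remark 3.4 can be reproduced in a few lines. The sketch assumes the relation $\mathrm{VIF}_u = \|x_j\|^2 / RS_j = 1/\sin^2\phi_j$, which follows from the right-triangle argument in the proof above; the function names are illustrative only.

```python
import math

def vif_to_angle_deg(vif):
    """Angle (deg) between x_j and the span of the remaining columns,
    assuming VIF_u = 1 / sin^2(phi_j)."""
    return math.degrees(math.asin(1.0 / math.sqrt(vif)))

def angle_deg_to_vif(phi_deg):
    """Inverse map: VIF_u implied by an angle given in degrees."""
    return 1.0 / math.sin(math.radians(phi_deg)) ** 2

phi_vif10 = vif_to_angle_deg(10.0)  # threshold for the rule VIF > 10
phi_vif4 = vif_to_angle_deg(4.0)    # threshold for the rule VIF > 4
print(round(phi_vif10, 3), round(phi_vif4, 3))
```

Smaller angles mean larger VIFs, which is why the rules read as upper bounds on $\phi_j$.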

Case Study 1
Consider the model $M_0: \{Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i\}$, the design $X_0 = [1_5, X_1, X_2]$ of order $(5 \times 3)$, and $X_0'X_0$ and its inverse as in [...], thereby preempting the need to undertake singular decompositions as required heretofore.

Background
The generalized variance, as a design criterion for $\{Y = X\beta + \epsilon\}$, rests in part on the geometry of ellipsoids of the type $R(\hat\beta) = \{\beta \in \mathbb{R}^k : (\beta - \hat\beta)'X'X(\beta - \hat\beta) \le c^2\}$. Choices for $c^2$ in common usage give first (i) a confidence region for $\beta$, whose normal-theory confidence coefficient is $1 - \alpha$ on taking $c^2 = S^2 c_\alpha^2$, with $S^2$ as the residual mean square and $c_\alpha^2$ the $100(1 - \alpha)$ percentage point of $F(\cdot;\ k, n - k)$; and otherwise admitting a lower Chebychev bound as in [36, page 92]. The alternative choice $c^2 = k + 2$ gives (ii) Cramér's [37] ellipsoid of concentration for $\hat\beta$, that is, the measure uniform over $R(\hat\beta)$ having the same mean and dispersion matrix as $\hat\beta$. The generalized variance $GV(\hat\beta) = |V(\hat\beta)|$ is proportional to the squared volumes of these ellipsoids, smaller volumes reflecting tighter concentrations.

Factorizations
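To continue, let some $T(Y) = \hat\theta \in \mathbb{R}^k$ be random having $E(\hat\theta) = \theta$ and $V(\hat\theta) = \Sigma \in \mathbb{S}_k^+$; partition $\hat\theta' = [\hat\theta_1', \hat\theta_2']$ and $\Sigma = [\Sigma_{ij}]$ conformably, with $\hat\theta_1 \in \mathbb{R}^r$ and $\hat\theta_2 \in \mathbb{R}^s$ such that $r \le s$ and $r + s = k$; and let $G(\hat\theta) = |\Sigma|^{1/k}$. The canonical correlations [30], as singular values of $\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}$, enter through the Vector Alienation Coefficient $\gamma_{1:2} = \prod_{i=1}^{r}(1 - \rho_i^2)$.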

Conclusion (ii) follows directly from $G(\hat\theta) = [GV(\hat\theta)]^{1/k}$, and conclusion (iii) on combining (i) and (ii). Conclusion (iv) now follows on applying (iii) twice, first on partitioning $\hat\theta$ into $\{\hat\theta_1, (\hat\theta_2, \hat\theta_3)\}$, whose canonical correlations are $\rho_{1:23}$, then $(\hat\theta_2, \hat\theta_3)$ into $\{\hat\theta_2, \hat\theta_3\}$ having canonical correlations $\rho_{2:3}$, to complete our proof.
Remark 4.2. In short, Theorem 4.1 links determinants and principal subdeterminants precisely through angles between subspaces. Moreover, arguments leading to conclusion (iv) may be iterated recursively to achieve a hierarchical decomposition for four or more factors, as in the following with $k = r + s + t + v$, namely,

$$G(\hat\theta) = [G(\hat\theta_1)]^{r/k}[G(\hat\theta_2)]^{s/k}[G(\hat\theta_3)]^{t/k}[G(\hat\theta_4)]^{v/k}\,[\gamma_{1:234}\,\gamma_{2:34}\,\gamma_{3:4}]^{1/k}. \qquad (4.6)$$

Equivalently, duality asserts that $\gamma_{1:2} = \prod_{i=1}^{r}(1 - \delta_i^2)$ is the identical composite index of linkage between $\{S_p(X_1), S_p(X_2)\}$ as subspaces of $\mathbb{R}^n$. Theorem 4.1 anticipates that $D_s$-inefficient subset estimators may be masked in a design exhibiting good overall D-efficiency. Conversely, a $D_s$-inefficient subset may contraindicate, incorrectly, the overall D-efficiency of a design. Details are provided in case studies to follow.
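The two-block factorization underlying these conclusions can be verified numerically. The sketch below assumes the standard identity $|\Sigma| = |\Sigma_{11}|\,|\Sigma_{22}|\,\prod_i (1 - \rho_i^2)$ and its dimension-adjusted form $G(\hat\theta) = G(\hat\theta_1)^{r/k} G(\hat\theta_2)^{s/k} \gamma_{1:2}^{1/k}$; the function names and the randomly generated dispersion matrix are hypothetical.

```python
import numpy as np

def alienation_coefficient(Sigma, r):
    """gamma_{1:2} = prod(1 - rho_i^2), computed as
    det(I_r - S11^{-1} S12 S22^{-1} S21)."""
    S11, S12, S22 = Sigma[:r, :r], Sigma[:r, r:], Sigma[r:, r:]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    return np.linalg.det(np.eye(r) - M)

def G(S):
    """Dimension-adjusted generalized variance |S|^(1/dim)."""
    return np.linalg.det(S) ** (1.0 / S.shape[0])

# Hypothetical positive definite dispersion matrix with k = 5, r = 2, s = 3.
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 5))
Sigma = A.T @ A
r, k = 2, 5
S11, S22 = Sigma[:r, :r], Sigma[r:, r:]
gamma = alienation_coefficient(Sigma, r)

gv_joint = np.linalg.det(Sigma)
gv_factored = np.linalg.det(S11) * np.linalg.det(S22) * gamma
g_factored = G(S11) ** (r / k) * G(S22) ** ((k - r) / k) * gamma ** (1.0 / k)
print(gv_joint, gv_factored, G(Sigma), g_factored)
```

Because $0 < \gamma_{1:2} \le 1$, strong linkage between the subsets shrinks the joint determinant relative to the product of the subset determinants, which is the mechanism behind masking.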

The Setting
Our tools are informative in input-output studies. In particular, specify $\{Y = X_0\beta + \epsilon\}$ as a second-order model

$$Y(x_1, x_2, x_3) = \beta_0 + \sum_{i=1}^{3}\beta_i x_i + \sum_{i=1}^{3}\beta_{ii} x_i^2 + \sum_{i<j}\beta_{ij} x_i x_j + \epsilon \qquad (5.1)$$

in three regressors and $p = 10$ parameters, namely, $[\beta_L', \beta_Q', \beta_I']$ exclusive of $\beta_0$, the latter a base line for $Y(0, 0, 0)$. We proceed under conventional homogeneous and uncorrelated errors, with the minimizing solution $\hat\beta = (X_0'X_0)^{-1}X_0'Y$ having $V(\hat\beta) = \sigma^2(X_0'X_0)^{-1}$. We take $\sigma^2$, although unknown, to reflect natural variability in experimental materials and protocol, and thus applicable in a given setting independently of the choice of design. Accordingly, for present purposes we may standardize to $\sigma^2 = 1.0$ for reasons cited earlier.

The Designs
Early polynomial response designs made use of factorial experiments, setting levels as needed to meet the required degree. For example, the second-order model (5.1) in three regressors would require $3^3 = 27$ runs. However, in the early 1950s such designs were seen to be excessive, in carrying redundant interactions beyond the pairs required in the model (5.1). In industrial and other settings where parsimony is desired, several small second-order designs have evolved, often on appending a few additional runs to two-level factorials or fractions thereof. Eight such small designs of note here are the hybrids H310, H311B of [38], the small composite SCD [39], the BBD [40], the central composite rotatable design CCD [41], and designs ND [42], HD [43], and BDD [44]. The designs {H310, H311B, SCD, BBD, CCD, ND, HD, BDD} have numbers of runs as {11, 11, 11, 13, 15, 11, 11, 11}, respectively. These follow on adding a center run to all but design ND, rendering all as unsaturated having at least one degree of freedom for error. Specifically, the design ND of [42] already has 11 runs and is unsaturated. All designs have been scaled to span the same range for each regressor; and none strictly dominates another under the positive definite dispersion ordering. All determinants as listed derive from the respective $V(\hat\beta) = (X_0'X_0)^{-1}$ and its submatrices. Subset efficiencies for $\{\beta_L, \beta_Q, \beta_I\}$ were examined in [45] for selected designs using criteria other than D- and $D_s$-efficiencies. Our usage here, as elsewhere in the literature, considers $GV(\hat\beta)$ and $G(\hat\beta)$ to be efficiency indices for $\hat\beta$ specific to a particular design, to include subsets $\{\hat\beta_i;\ i \in I\}$, and smaller values reflect greater efficiencies through smaller volumes of concentration ellipsoids. On the other hand, the comparative efficiencies of two designs for estimating $\beta$ or $\{\beta_i;\ i \in I\}$ are found as ratios of these quantities.

Numerical Studies
Details for these designs are listed in the accompanying tables. Table 1 gives values $G(\cdot) = [GV(\cdot)]^{1/\dim}$ for $\hat\beta$ and selected subsets, with dim as the order of the determinant. Also listed are angles $\phi(1_n : X)$ (deg) between regressors and the constant, to be noted subsequently. Table 2 displays the squared canonical correlations $\rho^2(\hat\beta_i : \hat\beta_j)$ between designated subsets, and Table 3 the corresponding Vector Alienation Coefficient $\gamma(\hat\beta_i : \hat\beta_j)$, for specified pairs. Here {0L, QI} refers to the pair $\{(\hat\beta_0, \hat\beta_L), (\hat\beta_Q, \hat\beta_I)\}$, for example. Moreover, values of the composite indices $\gamma(\hat\beta_i : \hat\beta_j) = \gamma(X_i : X_j)$, if much less than unity, serve to alert the user as to potential problems with ill-conditioning.

An Overview
To fix ideas, observe for the CCD that $G(\hat\beta_Q, \hat\beta_I) = 1.13633$, $G(\hat\beta_Q) = 1.03300$, and $G(\hat\beta_I) = 1.25000$ from Table 1. These not only are comparable in magnitude, but are commensurate, in having been adjusted for dimensions and thus homogeneous of unit degree, as are all entries in Table 1. Moreover, since $(\hat\beta_Q, \hat\beta_I)$ are uncorrelated and $\gamma(\hat\beta_Q, \hat\beta_I) = 1.0$ for the CCD from Table 3, $G(\hat\beta_Q, \hat\beta_I)$ is the geometric mean $1.13633 = (1.03300)^{3/6}(1.25000)^{3/6}$ from Theorem 4.1(ii). A further rough spot check of Table 1 may be summarized as follows.
(P5) The designs {ND, HD, BDD} are considerably less D-efficient, with their generalized variances $GV(\hat\beta)$ being {1192.09, 4768.37, 2886.03}, respectively, in comparison with {57.342, 11.852, 74.422, 2.722, 0.523} for the remaining designs; and each of the former is burdened by unequivocal $D_s$-inefficiency for $\beta_Q$, to be treated subsequently.

Further Details
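We next examine Hartley's [39] SCD in some detail, first in terms of generalized variances. With subsets $\{\hat\beta_0, \hat\beta_L, \hat\beta_Q, \hat\beta_I\}$ of dimensions $\{1, 3, 3, 3\}$, the hierarchical factorization of Theorem 4.1 takes the form

$$G(\hat\beta) = [G(\hat\beta_0)]^{1/10}[G(\hat\beta_L)]^{3/10}[G(\hat\beta_Q)]^{3/10}[G(\hat\beta_I)]^{3/10}\,[\gamma(0 : LQI)\,\gamma(L : QI)\,\gamma(Q : I)]^{1/10}. \qquad (5.2)$$

Corresponding factorizations proceed similarly for other designs. Details are left to the reader, but values for $[\gamma(0 : LQI)\,\gamma(L : QI)\,\gamma(Q : I)]^{1/10}$, the rightmost factor of (5.2), are supplied for each design as the final row of Table 3. Although the tables, together with Theorem 4.1, support other factorizations, the one featured here seems most natural in terms of the parameters $\{\beta_0, \beta_L, \beta_Q, \beta_I\}$, together with their central roles in identifying noteworthy treatment effects in second-order models.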

Masking
The D-efficiency index of the SCD, at $GV(\hat\beta) = 74.4216$, is larger but roughly comparable to that of H310 at $GV(\hat\beta) = 57.3418$. What cannot be anticipated from these facts alone, however, is that the $(3 \times 3)$ determinant $GV(\hat\beta_I) = 72.3379$ for the SCD is comparable to its $(10 \times 10)$ determinant $GV(\hat\beta) = 74.4216$, despite their disparate dimensions. Adjusting for dimensions gives $G(\hat\beta) = (74.4216)^{1/10} = 1.53876$ and $G(\hat\beta_I) = (72.3379)^{1/3} = 4.16667$ for the SCD. This illustrates the masking of a remarkably inefficient estimator for $\beta_I$, despite the value $G(\hat\beta) = 1.53876$ in estimating all parameters. This masking stems from the nonorthogonality of subset estimators as reflected in their canonical correlations and Vector Alienation Coefficients. In contrast are the corresponding commensurate values for the H310 design, namely, $G(\hat\beta) = (57.3418)^{1/10} = 1.49916$ and $G(\hat\beta_I) = (4.1768)^{1/3} = 1.61045$. It may be noted that the condition number $c_1(X_0'X_0)$ is 21.59 for H310, with the somewhat larger value 54.01 for the SCD.
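The dimension adjustments quoted above are simple roots of the reported generalized variances. The following sketch recomputes them; the helper name is illustrative, and the inputs are the values given in the text.

```python
def dimension_adjusted(gv, dim):
    """G = GV**(1/dim): generalized variance made homogeneous of unit degree."""
    return gv ** (1.0 / dim)

# Values of GV quoted in the text for the SCD and H310 designs.
scd_all = dimension_adjusted(74.4216, 10)   # all 10 parameters, SCD
scd_int = dimension_adjusted(72.3379, 3)    # interaction coefficients, SCD
h310_all = dimension_adjusted(57.3418, 10)  # all 10 parameters, H310
h310_int = dimension_adjusted(4.1768, 3)    # interaction coefficients, H310

print(round(scd_all, 5), round(scd_int, 5))
print(round(h310_all, 5), round(h310_int, 5))
```

On this common scale the overall values for the two designs are nearly equal, while the interaction subset for the SCD stands out.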
We next examine the $D_s$-inefficiencies of ND and HD for $\beta_Q$ as noted earlier, with $G(\hat\beta_Q)$ taking values 7.80031 and 6.61313, respectively. Our reference for masking is $G(\hat\beta_L, \hat\beta_Q)$. These values are not listed in Table 1, but may be recovered from Tables 2 and 3 using Theorem 4.1. For ND the steps parallel those for HD, which give the factorization $(1.91189)^{3/6}(6.61313)^{3/6}(0.78514)^{1/6} = 3.41528$, with similar conclusions in regard to masking.
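The factorization for HD, and the geometric-mean check quoted earlier for the CCD, are instances of one rule: $G$ for a joint vector is the product of subset values raised to their dimension shares, times the product of alienation coefficients raised to $1/k$. A minimal sketch, with a hypothetical helper name and the numbers taken from the text:

```python
import math

def composite_G(parts, gammas):
    """Joint G from subset values.

    parts  : list of (G_i, dim_i) for the subsets
    gammas : Vector Alienation Coefficients linking the subsets
    """
    k = sum(dim for _, dim in parts)
    value = math.prod(g ** (dim / k) for g, dim in parts)
    return value * math.prod(gammas) ** (1.0 / k)

# HD: G(beta_L) = 1.91189, G(beta_Q) = 6.61313, gamma(L : Q) = 0.78514.
hd_LQ = composite_G([(1.91189, 3), (6.61313, 3)], [0.78514])

# CCD: G(beta_Q) = 1.03300, G(beta_I) = 1.25000, gamma(Q : I) = 1.0.
ccd_QI = composite_G([(1.03300, 3), (1.25000, 3)], [1.0])

print(round(hd_LQ, 5), round(ccd_QI, 5))
```

For HD the joint value sits well below $G(\hat\beta_Q)$, so inspecting the joint value alone would understate the inefficiency for $\beta_Q$.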

Collinearity with the Constant
Advocates for these and other small designs have focused on D, $D_s$, and other efficiency criteria, as well as the parsimony of small designs and their advantage in industrial experiments. To the knowledge of this observer, none has considered prospects for ill-conditioning and its consequences, despite the fact that columns of $X$ are necessarily interlinked, since second-order terms are formed from first-order effects. Nonetheless, from Section 3.1 and Corollary 3.2, we may compute angles between the constant vector and the span of the regressors using duality together with the information at hand. This may prove to be critical in view of the admonition [29] that "collinearity with the intercept can quite generally corrupt the estimates of all parameters in the model." As noted in Remark 3.4, rules-of-thumb for problematic VIFs include those exceeding 10 or 4 or, in angular measure, $\phi^{**} < 18.435$ deg and $\phi^{*} < 30.00$ deg. From the last row of Table 2, the angles $\phi(1_n : X)$ have been computed for each of the eight designs, as listed in the final column of Table 1. For example, $\arccos(\sqrt{0.81990}) = 25.1116$ deg for H310. It is seen that all designs are flagged as potentially problematic using rules-of-thumb as cited. This adds yet another layer of concerns, heretofore unrecognized, in seeking further to implement these designs already in wide usage.
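The angle quoted for H310 follows directly from its squared canonical correlation. The sketch below reproduces it and compares it with the two thresholds; the implied VIF uses the relation $1/(1 - \rho^2)$, an assumption that follows from duality and the right-triangle geometry of Section 3. Function names are illustrative.

```python
import math

def angle_with_constant_deg(rho_sq):
    """phi = arccos(rho), where rho_sq is the squared canonical
    correlation between the intercept and the remaining estimators."""
    return math.degrees(math.acos(math.sqrt(rho_sq)))

def implied_vif(rho_sq):
    """VIF_u for the intercept implied by the angle: 1 / (1 - rho^2)."""
    return 1.0 / (1.0 - rho_sq)

phi_h310 = angle_with_constant_deg(0.81990)  # value quoted for H310
vif_h310 = implied_vif(0.81990)
flagged_at_30 = phi_h310 < 30.0      # rule corresponding to VIF > 4
flagged_at_18 = phi_h310 < 18.435    # rule corresponding to VIF > 10

print(round(phi_h310, 4), round(vif_h310, 3), flagged_at_30, flagged_at_18)
```

For H310 the angle falls below the 30 deg threshold but not the stricter 18.435 deg threshold.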
Matrix identities factorize a determinant in terms of principal subdeterminants and the Vector Alienation Coefficients of [30] between $\{\hat\beta_1, \hat\beta_2\}$. By duality, the latter also are Alienation Coefficients between $\{X_1, X_2\}$. These identities in turn are applied in the study of $D_s$-efficiencies for the parameters $\{\beta_0, \beta_L, \beta_Q, \beta_I\}$ in eight small second-order designs from the literature. Studies on $D_s$- and D-efficiencies, as cited in our opening paragraph, confirm that designs are seldom efficient for both. Our determinant identities support a rational explanation. In particular, these identities expose the propensity for $D_s$-inefficient subset estimators to be masked through near collinearities in overall D-efficient designs.
Finally, the evidence suggests that all eight designs are vulnerable, to varying degrees, to the corruption of all estimates due to ill-conditioning. In short, we have exposed quantitatively the structural origins of masking through Hotelling's [30] canonical correlations, and their equivalent design linkage parameters. This analysis in turn proceeds from the design matrix itself rather than empirical estimates, so that any design can be evaluated beforehand with regard to masking and possible subset inefficiencies, rather than retrospectively after having committed to a given design in a particular experiment.

In expression (3.2), equality at the first step follows using $DD' = D_\delta^2$ and $D'D = \mathrm{Diag}(D_\delta^2, 0) = D_0$. The succeeding step utilizes the factors $(I_r - D_\delta^2)^{1/2}$ and $(I_s - D_0)^{1/2}$, taking the principal diagonal blocks of $(Z'Z)^{-1}$ into $(I_r, I_s)$ as in the rightmost matrix of (3.2), and standardizing its off-diagonal block likewise.

(P1) Compared with $G(\hat\beta)$, values for $G(\hat\beta_0)$ appear excessive throughout.
(P2) Values for $G(\hat\beta_L)$ are roughly comparable across designs.
(P3) The eight designs sort essentially into two groups.
(P4) Designs {H310, H311B, SCD, BBD, CCD} overall are comparatively D- and $D_s$-efficient, with the noted exception being $G(\hat\beta_I) = 4.16667$ for the SCD.

Table 3 .
Theorem 4.1(i) again recovers $GV(\hat\beta_M)$ as $81.8638 = (4.6296)(81.8638)(0.21600)$, since $GV(\hat\beta_L)$ and $\gamma(L : QI)$ are reciprocals in this instance.

Table 2: Squared canonical correlations between designated subsets $(\hat\beta_i, \hat\beta_j)$ of estimators for eight small second-order designs.

Table 4: Generalized variances and Vector Alienation Coefficients between designated subsets for Hartley's [39] SCD in $k = 3$ regressors.

$\{\hat\beta_Q, \hat\beta_I\}$ are mutually uncorrelated from Table 2. In summary, the value $G(\hat\beta)$ for the SCD admits the factorization of (4.6), on identifying $\{\hat\theta_1, \hat\theta_2, \hat\theta_3, \hat\theta_4\}$ with $\{\hat\beta_0, \hat\beta_L, \hat\beta_Q, \hat\beta_I\}$, respectively, given numerically from the tables.

That $G(\hat\beta_Q) = 7.80031$ is excessive would be masked on examining $G(\hat\beta_L)$ and $G(\hat\beta_L, \hat\beta_Q)$ only.