Revision: Variance Inflation in Regression


Users deserve to be apprised not only that data are ill-conditioned, but also of the workings of the diagnostics themselves. Accordingly, we undertake here to rectify longstanding misconceptions in the use and properties of VIFs, necessarily retracing several decades to their beginnings.
It is seen here that (i) centered and uncentered diagnostics differ profoundly in regard to their respective concepts of orthogonality; (ii) objectives and meanings differ accordingly; (iii) sharp divisions trace to muddling these concepts; and (iv) this distinction assumes a pivotal role here. Over time a large body of widely held beliefs, conjectures, intrinsic propositions, and conventional wisdom has accumulated, much from flawed logic, some to be dispelled here. The key to our studies is that VIFs, to be meaningful, must compare actual variances to those of an "ideal" second-moment matrix as reference, the latter embodying the conjectured type of orthogonality. This ideal differs between centered and uncentered diagnostics, and for both types the reference matrix must be feasible. An outline follows.
Our undertaking consists essentially of four parts. The first is a literature survey of some essentials of ill-conditioning, to include the divide between centered and uncentered diagnostics and conventional VIFs. The structure of orthogonality is examined next. Anomalies in the usage, meaning, and interpretation of conventional VIFs are exposed analytically and through elementary and transparent case studies. Long-standing but ostensible foundations in turn are reassessed and corrected through the construction of "Reference models." These are moment arrays constrained to encode orthogonalities of the types considered. Neither array returns the conventional VIF_c's nor VIF_u's. Direct rules for finding the amended Reference models are given, preempting the need for constrained numerical algorithms. Finally, studies of ill-conditioned data from the literature are reexamined in light of these findings.

Case Study 1:
A First Look. That anomalies pervade conventional VIF_c's and VIF_u's may be seen as follows. Given the design X_0 = [1_5, X_1, X_2] of order (5 × 3), U = X_0'X_0, and its inverse V = (X_0'X_0)^{-1} as in expressions (1), conventional centered and uncentered VIFs are listed respectively at (2) and (3), the former for slopes only and the latter taking reciprocals of the diagonals of X_0'X_0 as reference. A Critique. The following points are basic. Note first that model (1) is nonorthogonal in both the centered and uncentered regressors.
Remark 1. The VIF_u's are not ratios of variances and thus fail to gage relative increases in variances owing to nonorthogonal columns of X_0. This follows since the first row and column of the second-moment matrix X_0'X_0 = U are fixed and nonzero by design, so that taking X_0'X_0 to be diagonal as reference cannot be feasible.
Remark 1 runs contrary to assertions throughout the literature. In consequence, for models in M_0 the mainstay VIF_u's in recent vogue are largely devoid of meaning. Subsequently these are identified instead with angles quantifying degrees of multicollinearity among the regressors.
On the other hand, feasible Reference models for all parameters, as identified later for centered and uncentered data in Definition 13, Section 4.2, give [VF_c(β̂_0), VF_c(β̂_1), VF_c(β̂_2)] = [0.9913, 1.0889, 1.0889] and [VF_u(β̂_0), VF_u(β̂_1), VF_u(β̂_2)] = [0.7704, 0.8889, 0.8889] in lieu of conventional VIF_c's and VIF_u's, respectively. The former comprise corrected VIF_c's, extended to include the intercept. Both sets in fact are genuine variance inflation factors, as ratios of variances in the model (1) relative to those in Reference models feasible for centered and for uncentered regressors, respectively. This example flagrantly contravenes conventional wisdom: (i) variances for slopes are inflated in (4), but for the intercept deflated, in comparison with the feasible centered reference. Specifically, β_0 is estimated here with greater efficiency, VF_c(β̂_0) = 0.9913 < 1, in the initial design (1), despite nonorthogonality of its centered regressors. (ii) Variances are uniformly smaller in the model (1) than in its feasible uncentered reference from (5), thus exhibiting Variance Deflation, despite nonorthogonality of the uncentered regressors. A full explication of the anatomy of this study appears later. In support, we next examine technical details needed in subsequent developments.
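The arithmetic behind both conventional diagnostics can be sketched in a few lines. The design below is an illustrative stand-in (the paper's actual design (1) is not reproduced here); `vif_u` and `vif_c` follow the definitions above: diagonals of (X_0'X_0)^{-1} against reciprocal diagonals of X_0'X_0, and diagonals of the inverse correlation matrix of the regressors, respectively.

```python
import numpy as np

# Illustrative 5x3 design X0 = [1_5, X1, X2]; these values are NOT the
# paper's design (1), which is not reproduced here.
X0 = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
])

U = X0.T @ X0                     # second-moment matrix
V = np.linalg.inv(U)

# Conventional uncentered VIF_u's: diagonals of V against reciprocals
# of the diagonals of U, intercept included.
vif_u = np.diag(V) * np.diag(U)

# Conventional centered VIF_c's (slopes only): diagonals of the inverse
# of the correlation matrix of the regressors.
R = np.corrcoef(X0[:, 1:], rowvar=False)
vif_c = np.diag(np.linalg.inv(R))

print(vif_u)  # one factor per column of X0, intercept first
print(vif_c)  # one factor per slope
```

Note that neither set is computed against a feasible reference matrix, which is precisely the objection raised in Remark 1.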

Types of Orthogonality.
The ongoing divergence between centered and uncentered diagnostics traces in part to the meaning ascribed to orthogonality. Specifically, the orthogonality of columns of X in M refers unambiguously to the vector-space concept X_i ⊥ X_j, that is, X_i'X_j = 0, as does the notion of collinearity of regressors with the constant vector in M_0. We refer to this as V-orthogonality, in short V⊥. In contrast, nonorthogonality in M_0 often refers instead to the statistical concept of correlation among columns of X when scaled and centered to their means. We refer to its negation as C-orthogonality, or C⊥. Distinguishing between these notions is fundamental, as confusion otherwise is evident. For example, it is asserted in [11, p. 125] that "the simple correlation coefficient r_ij does measure linear dependency between x_i and x_j in the data."
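The distinction between V⊥ and C⊥ is easy to exhibit numerically. A minimal sketch with made-up vectors: the raw inner product x_1'x_2 vanishes (V-orthogonal), yet the mean-centered inner product does not, so the pair is correlated and hence not C-orthogonal.

```python
import numpy as np

# Toy vectors (illustrative only): V-orthogonal yet not C-orthogonal.
x1 = np.array([1.0, -1.0, 1.0, 1.0])
x2 = np.array([1.0, 1.0, -1.0, 1.0])

v_inner = float(x1 @ x2)                                # x1'x2
c_inner = float((x1 - x1.mean()) @ (x2 - x2.mean()))    # centered inner product

print(v_inner)  # 0.0 -> V-orthogonal
print(c_inner)  # nonzero -> correlated, hence not C-orthogonal
```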

The Models M
where the centering matrix is B_n = I_n − n^{-1}1_n1_n'. In later sections we often will denote the matrix nX̄X̄' by M. In particular, the centered form {X → Z = B_nX} arises exclusively in models with intercept, with or without reparametrizing (β_0, β) → (α, β) in the mean-centered regressors. Scaling Z to unit column lengths gives Z'Z in correlation form with unit diagonals.

Historical Perspective
Our objective is an overdue revision of the tenets of variance inflation in regression. To provide context, we next survey attendant issues from the literature. Direct quotations are used so as not to subvert stances taken by the cited authors. Models in M_0 are at issue since, as noted in [10], centered diagnostics have no place in M.
3.1. Background. Aspects of ill-conditioning span a considerable literature over many decades. Regarding {Y = Xβ + ε}, scaling columns of X to equal lengths approximately minimizes the condition number κ_2(X) [12, p. 120], based on [13]. Nonetheless, κ_2(X) is cast in [9] as a blunt instrument for ill-conditioning, prompting the need for VIFs and other local constructs. Stewart [9] credits VIFs in concept to Daniel and in name to Marquardt. Ill-conditioning points beyond OLS in view of difficulties cited earlier. Remedies proffered in [14, 15] include transforming variables, adding new data, and deleting variable(s) after checking critical features of the reduced model. Other intended palliatives include Ridge and Partial Least Squares, as compared in [16]; Principal Components regression; and Surrogate models as in [17]. All are intended to return reduced standard errors at the expense of bias. Moreover, Surrogate solutions more closely resemble those from an orthogonal system than do Ridge solutions [18]. Together the foregoing and other options comprise a substantial literature collateral to, but apart from, the present study.
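The effect of column scaling on κ_2(X) noted above can be checked directly; the matrix here is illustrative, not data from the cited studies.

```python
import numpy as np

# Illustrative regressor matrix with a remote origin in the second column.
X = np.array([[1.0, 100.0],
              [1.0, 101.0],
              [1.0, 102.0]])

kappa_raw = np.linalg.cond(X)           # kappa_2 of the raw columns
Xs = X / np.linalg.norm(X, axis=0)      # scale columns to unit length
kappa_scaled = np.linalg.cond(Xs)

print(kappa_raw, kappa_scaled)          # scaling sharply reduces kappa_2
```

Even after scaling, κ_2 remains well above 1 here, because the columns stay nearly collinear: scaling mitigates, but does not repair, the conditioning of the data.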
3.2. To Center. Advocacy for centering includes the following.
(i) VIFs often are defined categorically as the diagonals of the inverse of the correlation matrix of scaled and centered regressors; see [4, 9, 11, 19] for example. These are VIF_c's, widely adopted without justification as default, to the exclusion of VIF_u's.
(ii) It is asserted in [4] that "centering removes the nonessential ill-conditioning, thus reducing the variance inflation in the coefficient estimates."
(iii) Centering is advocated when predictor variables are far removed from origins on the basic data scales [10, 11].

3.3. Not to Center. Advocacy for uncentered diagnostics includes the following caveats from proponents of centering.
(i) Uncentered data should be examined only if an estimate of the intercept is of interest [9,10,20].
(ii) "If the domain of prediction includes the full range from the natural origin through the range of data, the collinearity diagnostics should not be mean centered" [10, p.84].
Other issues against centering derive in part from numerical analysis and work by Belsley.
(i) Belsley [1] identifies κ_2(X) for a system {Y = Xβ + ε} as "the potential relative change in the LS solution b that can result from a small relative change in the data."
(ii) These analyses require structurally interpretable variables, namely, "ones whose numerical values and (relative) differences derive meaning and interpretability from knowledge of the structure of the underlying 'reality' being modeled" [1, p. 75].
(iii) "There is no such thing as 'nonessential' ill-conditioning," and "mean-centering can remove from the data the information needed to assess conditioning correctly" [1, p. 74].
(iv) "Collinearity with the intercept can quite generally corrupt the estimates of all parameters in the model whether or not the intercept is itself of interest and whether or not the data have been centered" [21, p. 90].
(v) An example [22, p. 121] gives severely ill-conditioned data that are perfectly conditioned in centered form: "centering alters neither" inflated variances nor extremely sensitive parameter estimates in the basic data; moreover, "diagnosing the conditioning of the centered data (which are perfectly conditioned) would completely overlook this situation, whereas diagnostics based on the basic data would not."
(vi) To continue from [22], ill-conditioning persists in the propagation of disturbances, in that "a 1 percent relative change in the X_i's results in over a 40% relative change in the estimates," despite perfect conditioning in centered form, and "knowledge of the effect of small relative changes in the centered data is not meaningful for assessing the sensitivity of the basic LS problem," since relative changes and their meanings in centered and uncentered data often differ markedly.
(vii) Regarding choice of origin, "(1) the investigator must be able to pick an origin against which small relative changes can be appropriately assessed and (2) it is the data measured relative to this origin that are relevant to diagnosing the conditioning of the LS problem" [22, p.126].
(i) "Because rewriting the model (in standardized variables) does not affect any of the implicit estimates, it has no effect on the amount of information contained in the data" [23, p.76].
(ii) Consequences of ill-advised diagnostics can be severe. Degraded numerical accuracy traces to near collinearity of regressors with the constant vector.
In short, centering fails to prevent a loss in numerical accuracy; centered diagnostics are unable to discern these potential accuracy problems, whereas uncentered diagnostics are seen to work well. Two widely used statistical packages, SAS and SPSS-X, fail to detect this type of ill-conditioning through use of centered diagnostics and thus return highly inaccurate coefficient estimates. For further details see [3].
On balance, for models in M_0 the jury is out regarding the use of centered or uncentered diagnostics, to include VIFs. Even Belsley [1] (and elsewhere) concedes circumstances where centering does achieve structurally interpretable models. Of note is that the foregoing citations to Belsley apply strictly to condition numbers κ_2(X); other purveyors of ill-conditioning, specifically VIFs, are not treated explicitly.

A Synthesis.
It bears notice that (i) the origin, however remote from the cluster of regressors, is essential for prediction, and (ii) the prediction variance is invariant to parametrizing in centered or uncentered forms.Additional remarks are codified next for subsequent referral.
Remark 2. Typically Y represents response to input variables {X_1, X_2, ..., X_k}. In a controlled experiment, levels are determined beforehand by subject-matter considerations extraneous to the experiment, to include minimal values. However remote the origin on the basic data scales, it seems informative in such circumstances to identify the origin with these minima. In such cases the intercept is often of singular interest, since β_0 is then the standard against which changes in Y are to be gaged as regressors vary. We adopt this convention in subsequent studies from the archival literature.
Remark 3. In summary, the divergence in views, whether to center or not, appears to be that critical aspects of ill-conditioning, known and widely accepted for models in M, have been expropriated over decades, mindlessly and without verification, to apply point-by-point for models in M_0.

The Structure of Orthogonality
This section develops the foundations for Reference models capturing orthogonalities of types V⊥ and C⊥. Essential collateral results are given in support as well.

Collinearity Indices.
Stewart [9] reexamines ill-conditioning from the perspective of numerical analysis. Details follow, where X is a generic matrix of regressors having columns {x_j; 1 ≤ j ≤ k} and X† = (X'X)^{-1}X' is the generalized inverse of note, having {x_j†; 1 ≤ j ≤ k} as its typical rows. Each corresponding collinearity index, defined in [9, p. 72] as κ_j = ||x_j|| ⋅ ||x_j†||, is constructed so as to be scale invariant. When X is centered and scaled, Section 3 of [9] shows that the centered collinearity indices κ̃_j satisfy analogous relations. Moreover, in M_0 it follows that the uncentered VIF_u's are squares of the collinearity indices, that is, {VIF_u(β̂_j) = κ_j²; j = 0, 1, ..., k}. Note the asymmetry that VIF_c's exclude the intercept, in contrast to the inclusive VIF_u's. That the label Variance Inflation Factors for the latter is a misnomer is covered in Remark 1. Nonetheless, we continue the familiar notation {VIF_u(β̂_j) = κ_j²; j = 0, 1, ..., k}. Transcending the essential developments of [9] are connections between collinearity indices and angles between subspaces. To these ends choose a typical x_j in X_0, and rearrange X_0 as [x_j, X_[j]]. We next seek elements of the inverse of Q_j'X_0'X_0Q_j as reordered by the permutation matrix Q_j. From the clockwise rule the (1, 1) element of the inverse is (x_j'x_j − x_j'P_jx_j)^{-1} in succession for each {j = 0, 1, ..., k}, where P_j is the projection operator onto the subspace S_j(X_[j]) ⊂ R^n spanned by the columns of X_[j]. These in turn enable us to connect {κ_j²; j = 0, 1, ..., k}, and similarly the centered values {κ̃_j²; j = 1, ..., k}, to the geometry of ill-conditioning as follows.
Theorem 4. (i) The angles satisfy {cos θ_j = (1 − 1/κ_j²)^{1/2}; j = 0, 1, ..., k}, with θ_j the principal angle between x_j and its projection P_jx_j onto the span of the remaining columns of X_0. (ii) In particular, θ_0 quantifies the degree of collinearity between the regressors and the constant vector. (iii) Corresponding conclusions hold for the centered indices {κ̃_j; j = 1, ..., k} in terms of angles between the mean-centered regressors.
Proof. From the geometry of the right triangle formed by (x_j, P_jx_j), the squared lengths satisfy ||x_j||² = ||P_jx_j||² + ||x_j − P_jx_j||², where ||x_j − P_jx_j||² = RSS_j is the residual sum of squares from the projection. It follows that the principal angle between (x_j, P_jx_j) is given by cos θ_j = ||P_jx_j|| / ||x_j|| as at (10) for {j = 0, 1, ..., k}, to give conclusion (i). Conclusion (ii) follows on specializing (x_0, P_0x_0) with x_0 = 1_n and P_0 = X(X'X)^{-1}X'. Conclusion (iii) follows similarly from the geometry of the right triangle formed by (z_j, P_jz_j), where P_j now is the projection operator onto the subspace S_j(Z_[j]) ⊂ R^n spanned by the columns of Z_[j], to complete our proof.
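The right-triangle geometry of the proof translates directly into a computation. A sketch (`principal_angle_deg` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def principal_angle_deg(xj, X_rest):
    """Angle between x_j and span(X_rest), via cos(theta) = ||P x_j|| / ||x_j||
    with P the orthogonal projector onto span(X_rest)."""
    P = X_rest @ np.linalg.pinv(X_rest)   # projector onto the column span
    cos_t = np.linalg.norm(P @ xj) / np.linalg.norm(xj)
    return float(np.degrees(np.arccos(np.clip(cos_t, 0.0, 1.0))))

# Toy check: angle between the constant vector and a single regressor.
ones = np.ones(3)
x = np.array([[1.0], [0.0], [0.0]])
print(principal_angle_deg(ones, x))
```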

Remark 5. Rules of thumb in common use for problematic VIFs include those exceeding 10 or even 4; see [11, 24] for example. In angular measure these correspond, respectively, to θ_j < 18.435 deg and θ_j < 30.0 deg.
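These angular equivalents follow from inverting cos θ = (1 − 1/VIF)^{1/2} at expression (10); a quick check:

```python
import numpy as np

def vif_to_angle_deg(vif):
    """Angle implied by a VIF through cos(theta) = sqrt(1 - 1/VIF),
    per Theorem 4 and expression (10)."""
    return float(np.degrees(np.arccos(np.sqrt(1.0 - 1.0 / vif))))

print(vif_to_angle_deg(10.0))  # ~18.435 deg
print(vif_to_angle_deg(4.0))   # ~30.0 deg
```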

Reference Models.
We seek as Reference feasible models encoding orthogonalities of types V⊥ and C⊥. The keys are as follows: (i) to retain essentials of the experimental structure and (ii) to alter what may be changed to achieve orthogonality. For a model in M with moment matrix X'X, our opening paragraph prescribes as reference the model R_V = D = Diag(d_11, ..., d_kk), comprising the diagonal elements of X'X, for assessing V⊥-orthogonality. Moreover, on scaling columns of X to equal lengths, R_V is perfectly conditioned with κ_1(R_V) = 1.0. In addition, every model in M clearly conforms with its Reference, in the sense that R_V is positive definite, as distinct from models in M_0 to follow. Consider again models in M_0 as in (6); let C = (X'X − M), with M = nX̄X̄' as the mean-centering matrix; and again let D = Diag(d_11, ..., d_kk) comprise the diagonal elements of X'X.
(i) V⊥ Reference Model. The uncentered VIF_u's in M_0, defined as ratios of diagonal elements of (X_0'X_0)^{-1} to reciprocals of diagonal elements of X_0'X_0, appear to have seen exclusive usage, apparently in keeping with Remark 3. However, the following disclaimer must be registered as the formal equivalent of Remark 1.
Theorem 6. Statements that conventional VIF_u's quantify variance inflation owing to nondiagonal X_0'X_0 are false for models in M_0 having X̄ ≠ 0.
Proof. Since the Reference variances are reciprocals of diagonal elements of X_0'X_0, this usage is predicated on the false postulate that X_0'X_0 can be diagonal for X̄ ≠ 0. Specifically, [1_n, X] are mutually orthogonal, that is, 1_n'X = 0', if and only if X has been mean-centered beforehand.
To the contrary, Gunst [25] purports to show that VIF_u(β̂_0) registers genuine variance inflation, namely, the price to be paid in variance for designing an experiment having X̄ ≠ 0, as opposed to X̄ = 0. Since variances for intercepts are Var(⋅ | X̄ = 0) = σ²/n and Var(⋅ | X̄ ≠ 0) = σ²v_00 ≥ σ²/n from (6), their ratio is shown in [25] to be κ_0² = nv_00 ≥ 1 in the parlance of Section 2.3. We concede this to be a ratio of variances but, to the contrary, not a VIF, since the parameters differ. In particular, Var(β̂_0 | X̄ ≠ 0) = σ²v_00, whereas Var(α̂ | X̄ ≠ 0) = σ²/n, with α = β_0 + β_1X̄_1 + ⋅⋅⋅ + β_kX̄_k in centered regressors. Nonetheless, we still find the conventional VIF_u(β̂_0) to be useful for purposes to follow.
Remark 7. Section 3 highlights continuing concern in regard to collinearity of regressors with the constant vector. Theorem 4(ii) and expression (10) support the use of θ_0 as an informative gage on the extent of this occurrence. Specifically, the smaller the angle, the greater the extent of such collinearity. Instead of conventional VIF_u's, given the foregoing disclaimer, we have the following amended version as Reference for uncentered diagnostics, altering what may be changed but leaving X̄ ≠ 0 intact.
Definition 8. Given a model in M_0 with second-moment matrix X_0'X_0 as in (6), the amended Reference model for assessing V⊥-orthogonality replaces the lower right block X'X by D = Diag(d_11, ..., d_kk), the diagonal elements of X'X, leaving the first row and column intact, provided that the resulting R_V is positive definite. We identify a model to be V⊥-orthogonal when X_0'X_0 = R_V. As anticipated, a prospective R_V fails to conform to experimental data if not positive definite. These and further prospects are covered in the following, where θ_i designates the angle between {(1_n, X_i); i = 1, ..., k}.
Lemma 9. Take R_V as a prospective Reference for V⊥-orthogonality.
(iv) The Reference variances for slopes are given in terms of D and X̄ as follows.
Proof. The clockwise rule for determinants gives conclusion (i). The computation in parallel with (10) gives conclusion (ii). Using the clockwise rule for block-partitioned inverses, the (1, 1) element of R_V^{-1} gives conclusion (iii). Similarly, the lower right block of R_V^{-1}, of order (k × k), is the inverse of H = D − nX̄X̄'. On identifying a = b = X̄ in a rank-one update and inverting H, conclusion (iv) follows on extracting its diagonal elements, to complete our proof.
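Definition 8's construction, as read here, can be sketched directly; `reference_RV` is a hypothetical helper name, and the positive-definiteness check mirrors the feasibility proviso (cf. Corollary 10).

```python
import numpy as np

def reference_RV(X0):
    """Sketch of the amended V-orthogonal Reference (Definition 8, as read
    here): keep the first row and column of X0'X0 intact (so Xbar != 0 is
    preserved) and replace the lower-right block X'X by its diagonal D.
    Returns None when the result is not positive definite (infeasible)."""
    U = X0.T @ X0
    R = np.diag(np.diag(U)).astype(float)
    R[0, :] = U[0, :]
    R[:, 0] = U[:, 0]
    try:
        np.linalg.cholesky(R)   # feasibility: positive definiteness
        return R
    except np.linalg.LinAlgError:
        return None
```

For a design whose regressors lean too far toward the constant vector, the prospective R_V fails the check, which is exactly the infeasibility the text emphasizes.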
Corollary 10. For the case k = 2, in order that R_V may be positive definite, it is necessary that θ_1 + θ_2 > 90 deg.
Proof. Beginning with Lemma 9(ii), compute the determinant of R_V directly. Moreover, the matrix R_V itself is intrinsically ill-conditioned owing to X̄ ≠ 0, its condition number depending on X̄. To quantify this dependence we have the following, where columns of X have been standardized to common lengths: the roots are positive, and R_V is positive definite, if and only if the corresponding determinantal condition is positive.
(ii) C⊥ Reference Model. As noted, the notion of C⊥-orthogonality applies exclusively for models in M_0. Accordingly, as Reference we seek to alter X'X so that the matrix comprising sums of squares and sums of products of deviations from means, thus altered, is diagonal. To achieve this canon of C⊥-orthogonality, and to anticipate notation for subsequent use, we have the following.
Recall that conventional VIF_c's for [β̂_1, β̂_2, ..., β̂_k] are ratios of diagonal elements of (Z'Z)^{-1} to reciprocals of the diagonal elements of the centered Z'Z. Apparently this rests on the logic that C⊥-orthogonality in M_0 implies that Z'Z is diagonal. However, the converse fails and instead is embodied in R_C ∈ M_0. In consequence, conventional VIF_c's are deficient in applying only to slopes, whereas the VF_c's resulting from Definition 12(ii) apply informatively for all of [β̂_0, β̂_1, β̂_2, ..., β̂_k].
thus preserving slopes; here X_0'X_0 and its inverse are in the form of (6). This pertains to subsequent developments, and basic invariance results emerge as follows.
Proof. Rules for block-partitioned inverses again assert that G of (16) is the inverse as claimed, since B_n'L = 0, to complete our proof.
These facts in turn support subsequent claims that centered VIF_c's are translation and scale invariant for slopes β̂ = [β̂_1, β̂_2, ..., β̂_k], apart from the intercept β̂_0.
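The C⊥ Reference of Definition 12(ii), as described above, replaces off-diagonal cross-products by n·X̄_iX̄_j so that the centered matrix C = X'X − nX̄X̄' becomes diagonal while first and second moments are preserved. A sketch under that reading (`reference_RC` is a hypothetical name):

```python
import numpy as np

def reference_RC(X0):
    """Sketch of the C-orthogonal Reference: replace each off-diagonal
    entry of the lower-right block of X0'X0 by n * xbar_i * xbar_j, so
    the centered cross-product matrix C = X'X - n*xbar*xbar' becomes
    diagonal; first moments and diagonal second moments are preserved."""
    n = X0.shape[0]
    xbar = X0[:, 1:].mean(axis=0)
    R = (X0.T @ X0).astype(float)
    k = xbar.size
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i != j:
                R[i, j] = n * xbar[i - 1] * xbar[j - 1]
    return R

# The centered block of the Reference is then diagonal (toy design):
X0 = np.array([[1.0, 0.0, 1.0],
               [1.0, 1.0, 0.0],
               [1.0, 1.0, 1.0],
               [1.0, 0.0, 0.0],
               [1.0, 1.0, 1.0]])
R = reference_RC(X0)
n, xbar = X0.shape[0], X0[:, 1:].mean(axis=0)
C = R[1:, 1:] - n * np.outer(xbar, xbar)
print(C)  # off-diagonal entries are zero
```

This is the direct rule alluded to in the abstract: no constrained numerical search is needed to obtain the amended Reference.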

A Critique.
Again we distinguish VIF_c's and VIF_u's from centered and uncentered regressors. The following comments apply.
(C1) A design is either V⊥- or C⊥-orthogonal, respectively, according as the lower right (k × k) block X'X of X_0'X_0, or C = (X'X − nX̄X̄') from expression (6), is diagonal.
Orthogonalities of types V⊥ and C⊥ are exclusive and hence work at crossed purposes.
(C2) In particular, V⊥-orthogonality holds if and only if the columns of X_0 are mutually orthogonal.
(C3) Conventional VIF_u's, with X̄ ≠ 0, do not gage variance inflation as claimed, founded on the false tenet that X_0'X_0 can be diagonal.
(C4) To detect influential observations and to classify high leverage points, case-influence diagnostics, namely the ratios {VIF_(−I)/VIF_j; 1 ≤ j ≤ k}, are studied in [27] for assessing the impact of subsets on variance inflation. Here VIF_j is from the full data and VIF_(−I) on deleting observations in the index set I. Similarly, [28] proposes using VIF_(i) on deleting the ith observation.
In the present context these would gain substance on modifying VF_c and VF_u accordingly.

Case Study 1: Continued
5.1. The Setting. We continue an elementary and transparent example to illustrate essentials. Recall M_0: {Y_i = β_0 + β_1X_i1 + β_2X_i2 + ε_i} of Section 2.2, the design X_0 = [1_5, X_1, X_2] of order (5 × 3), and U = X_0'X_0 and its inverse V = (X_0'X_0)^{-1}, as in expressions (1). The design is neither V⊥- nor C⊥-orthogonal since neither the lower right (2 × 2) block of X_0'X_0 nor the centered Z'Z is diagonal. Moreover, the uncentered VIF_u's as listed in (3) are not the vaunted relative increases in variances owing to nonorthogonal columns of X_0. Indeed, the only opportunity for V⊥-orthogonality here is between columns X_1 and X_2. Nonetheless, from Section 4.1 we utilize Theorem 4(i) and (10) to connect the collinearity indices of [9] to angles between subspaces. Specifically, Minitab recovered values for the residuals {RSS_j = ||x_j − P_jx_j||²; j = 0, 1, 2}; further computations proceed routinely as listed in Table 1. In particular, the principal angle θ_0 between the constant vector and the span of the regressor vectors is θ_0 = 31.751 deg, as anticipated in Remark 7.
Accordingly, R_V = U(0) is "as V⊥-orthogonal as it can get," given the lengths and sums of (X_1, X_2) as prescribed in the experiment and as preserved in the Reference model.

Turning to C⊥-orthogonality and invoking Definition 12(ii), we seek a C⊥-orthogonal Reference having the matrix C_W as diagonal. This is identified as U(0.6) in Table 2. From this the matrix R_C and its inverse follow. The variance factors are listed in (4) where, for example, VF_c(β̂_2) = 0.3889/0.3571 = 1.0889. As distinct from conventional VIF_c's for β̂_1 and β̂_2 only, our VF_c(β̂_0) here reflects Variance Deflation, wherein β_0 is estimated more precisely in the initial C⊥-nonorthogonal design.
A further note on orthogonality is germane. Suppose that the actual experiment is U(0) yet, towards a thorough diagnosis, the user evaluates the conventional VIF_c(β̂_1) = 1.2250 = VIF_c(β̂_2) as ratios of variances of U(0) to U(0.6) from Table 2. Unfortunately, their meaning is obscured since a design cannot at once be V⊥- and C⊥-orthogonal.
As in Remark 2, we next alter the basic design X_0 at (1) on shifting columns of the measurements to [0, 0] as minima and scaling to have squared lengths equal to n = 5. The resulting X_0 follows directly from (1). The new matrix U = X_0'X_0 and its inverse V demonstrate, in comparison with (4), that VF_c's, apart from β̂_0, are invariant under translation and scaling, a consequence of Lemma 14.
We further seek to compare variances for the shifted and scaled (X_1, X_2) against V⊥-orthogonality as Reference. However, from U = X_0'X_0 at (21) we determine that cos θ_1 = 0.84853 = cos θ_2, so that θ_1 + θ_2 < 90 deg and the prospective R_V is not positive definite by Corollary 10. This appears to be anomalous, until we recall that V⊥-orthogonality is invariant under rescaling, but not under recentering, the regressors.

A Critique.
We reiterate the apparent ambiguity ascribed in the literature to orthogonality. A number of widely held guiding precepts has evolved, as enumerated in part in our opening paragraph and in Section 3. As in Remark 3, these clearly have been taken to apply verbatim for X_0 and X_0'X_0 in M_0, to be paraphrased in part as follows.
(P1) Ill-conditioning espouses inflated variances; that is, VIFs necessarily equal or exceed unity.
(P2) C⊥-orthogonal designs are "ideal" in that VIF_c's for such designs are all unity; see [11] for example.
We next reassess these precepts as they play out under V⊥- and C⊥-orthogonality in connection with Table 2.
(C5) VFs for β̂_0 are critical in prediction, where prediction variances necessarily depend on Var(β̂_0), especially for predicting near the origin of the system of coordinates.
(C6) Dissonance between V⊥ and C⊥ is seen in Table 2, where U(0.6), as C⊥-orthogonal with X_1'X_2 = 0.6, is the antithesis of V⊥-orthogonality at U(0), where X_1'X_2 = 0.
(C7) In short, these transparent examples serve to dispel the decades-old mantra that ill-conditioning necessarily spawns inflated variances for models in M_0, and they serve to illuminate the contributing structures.
6.1. The Model M_0. We consider in turn the orthogonal and the linked series.
6.1.1. Orthogonal Data. Matrices X_0'X_0 and (X_0'X_0)^{-1} for the orthogonal data under model M_0 are listed in Table 4, where variances occupy the diagonals of (X_0'X_0)^{-1}. The conventional uncentered VIF_u's are [1.20296, 1.02144, 1.12256, 1.05888]. Since X̄ ≠ 0, we find the angle between the constant and the span of the regressor vectors to be θ_0 = 65.748 deg as in Theorem 4(ii). Moreover, the angle between X_1 and the span of [1_n, X_2, X_3], namely, θ_1 = 81.670 deg, is not 90 deg because of collinearity with the constant vector, despite the mutual orthogonality of [X_1, X_2, X_3].
In view of the dissonance between V⊥- and C⊥-orthogonality, the sums of squares and products for the mean-centered X reflect nonnegligible negative dependencies, distinguishing the V⊥-orthogonal X from the C⊥-nonorthogonal matrix Z = B_nX of deviations.
To continue, we suppose instead that the V⊥-orthogonal model was to be recast as C⊥-orthogonal. The Reference model and its inverse are listed in Table 5. Variance factors, as ratios of diagonal elements of (X_0'X_0)^{-1} in Table 4 to those of R_C^{-1} in Table 5, reflect negligible differences in precision. This parallels results reported in Table 2 comparing U(0) to U(0.6) as Reference, except for larger differences in Table 2, namely, [1.2868, 1.2250, 1.2250].

6.1.2. Linked Data.
For the Linked Data under model M_0, the matrix Z_0'Z_0 follows routinely from Table 3; its inverse (Z_0'Z_0)^{-1} is listed in Table 6. The conventional uncentered VIF_u's are now [1.20320, 1.03408, 1.13760, 1.05480]. These again fail to gage variance inflation since Z_0'Z_0 cannot be diagonal. From (10) the principal angle between the constant vector and the span of the regressor vectors is θ_0 = 65.735 degrees.
Remark 15. Observe that the choice for R_C may be posed as seeking R_C = U(c) such that R_C^{-1}(2, 3) = 0.0, then solving numerically for c using Maple, for example. However, the algorithm in Definition 12(ii) affords a direct solution: that C_W should be diagonal stipulates its off-diagonal element as X_1'X_2 − 3/5 = 0, so that X_1'X_2 = 0.6 in R_C at expression (20).
To illustrate Theorem 4(iii), we compute cos(θ) = (1 − 1/1.0889)^{1/2} = 0.2857 and θ = 73.398 deg as the angle between the vectors [X_1, X_2] of (1) when centered to their means. For slopes in M_0, [9] shows that {VIF_c(β̂_j) < VIF_u(β̂_j); 1 ≤ j ≤ k}. This follows since their numerators are equal, but denominators are reciprocals of lengths of the centered and uncentered regressor vectors. To illustrate, numerators are equal for VIF_u(β̂_2) = 0.3889/(1/3) = 1.1667 and for VIF_c(β̂_2) = 0.3889/(1/2.8) = 1.0889, but denominators are reciprocals of X_2'X_2 = 3 and Σ_{i=1}^5 (X_i2 − X̄_2)² = 2.8.
V⊥ Reference Model. To rectify this malapropism, we seek in Definition 8 a model R_V as Reference. This is found on setting all off-diagonal elements of Z_0'Z_0 to zero, excluding the first row and column. Its inverse R_V^{-1} is listed in Table 6. The VF_u's are ratios of diagonal elements of (Z_0'Z_0)^{-1} on the left to diagonal elements of R_V^{-1} on the right in Table 6. Against conventional wisdom, it is counterintuitive that [β_0, β_1, β_2, β_3] are all estimated with greater precision in the V⊥-nonorthogonal model Z_0'Z_0 than in the V⊥-orthogonal model R_V of Definition 8. As in Section 5.4, this serves again to refute the tenet that ill-conditioning espouses inflated variances. Centered variances appear in Table 6, and Reference values appear as diagonal elements of R_C^{-1} on the right in Table 7; their ratios give the centered VF_c's. We infer that changes in precision would be negligible if instead the Linked Data experiment was to be recast as a C⊥-orthogonal experiment.
As in Remark 15, the reference matrix R_C is constrained by the first and second moments from X_0'X_0 and has free parameters {c_1, c_2, c_3} for the {(2, 3), (2, 4), (3, 4)} entries. The constraints for C⊥-orthogonality follow as before. Although this system of equations can be solved with numerical software such as Maple, the algorithm stated in Definition 12 easily yields the direct solution given here.
6.2. The Model M. For the Linked Data take {Y_i = β_1Z_i1 + β_2Z_i2 + β_3Z_i3 + ε_i; 1 ≤ i ≤ 8} as {Y = Zβ + ε}, where the expected response at {Z_1 = Z_2 = Z_3 = 0} is E(Y) = 0.0, and there is no occasion to translate the regressors. The lower right (3 × 3) submatrix of Z_0'Z_0 (4 × 4) from M_0 is Z'Z; its inverse is listed in Table 8. The ratios of diagonal elements of (Z'Z)^{-1} to reciprocals of diagonal elements of Z'Z are the conventional uncentered VIF_u's, namely, [1.0123, 1.0247, 1.0123]. Equivalently, scaling Z'Z to have unit diagonals and inverting gives C^{-1} as in Table 8; its diagonal elements are the VIF_u's. Throughout Section 6, these comprise the only correct interpretation of conventional VIF_u's as genuine variance ratios. Against complaints that regressors are remote from their natural origins, we center regressors to [0, 0, 0] as origin on subtracting the minimal element from each column as in Remark 2, and scale these to {||X_j|| = 5.0; j = 1, 2, 3}. This is motivated on grounds that centered diagnostics, apart from VIF_c(β̂_0), are invariant under shifts and scalings from Lemma 14. In addition, under the new origin [0, 0, 0], β_0 assumes prominence as the baseline response against which changes due to regressors are to be gaged. The original data are available on the publisher's web page for [29].
Taking these shifted and scaled vectors as columns of X_0, values for X_0'X_0 and its inverse are listed in Table 9. The condition number of X_0'X_0 is now 113.6969 and its VIF_u's are [6.7756, 17.9987, 4.2782, 17.4484]. Additionally, from Theorem 4(ii) we recover the angle θ_0 = 22.592 deg to quantify collinearity of regressors with the constant vector.
In addition, the scaled and centered correlation matrix for X indicates strong linkage between mean-centered versions of (X_1, X_3). Moreover, the experimental design is neither V⊥- nor C⊥-orthogonal, since neither the lower right (3 × 3) block of X_0'X_0 nor the centered C is diagonal. To check against C⊥-orthogonality, we obtain the matrix W of Definition 12(ii) on replacing off-diagonal elements of the lower right (3 × 3) submatrix of X_0'X_0 by corresponding off-diagonal elements of nX̄X̄'. The result is the matrix R_C and its inverse as given in Table 10, where diagonal elements of R_C^{-1} are the Reference variances. The VF_c's, found as ratios of diagonal elements of (X_0'X_0)^{-1} relative to those of R_C^{-1}, are listed in Table 12 under G = [0, 0, 0], the indicator of Definition 12(i) for this case.

Variance Factors and Linkage.
Traditional gages of ill-conditioning are patently absurd on occasion. By convention VIFs are "all or none," wherein Reference models entail strictly diagonal components. In practice some regressors are inextricably linked: to get, or even to visualize, orthogonal regressors may go beyond feasible experimental ranges, require extrapolation beyond practical limits, and challenge credulity. Response functions so constrained are described in [30] as "picket fences." In such cases, taking C to be diagonal as its C⊥ Reference is moot, at best an academic folly, abetted in turn by default in standard software packages. In short, conventional diagnostics here offer answers to irrelevant questions. Given pairs of regressors irrevocably linked, we seek instead to assess effects on variances for other pairs that could be unlinked by design. These comments apply especially to entries under G = [0, 0, 0] in Table 12.
To proceed, we define ill-conditioning as essential when regressors are inextricably linked, necessarily to remain so, and as nonessential when regressors could be unlinked by design. As a follow-up to Section 3.2, for x̄ ≠ 0 we infer that the constant vector is inextricably linked with the regressors, thus accounting for essential ill-conditioning not removed by centering, contrary to claims in [4]. Indeed, this is the essence of Definition 8.
Fortunately, these limitations may be remedied through Reference models adapted to this purpose, exploiting the special notation and trappings of Definition 12(i) as follows. In addition to G = [0, 0, 0], we iterate for the remaining indicators G = [δ₁₂, δ₁₃, δ₂₃]; values of VF_C s thus obtained are listed in Table 12. To fix ideas, the Reference R_C for G = [1, 0, 0], that is, for the constraint R_C(2, 3) = X′₀X₀(2, 3) = 19.4533, is given in Table 11, together with its inverse. Found as ratios of diagonal elements of (X′₀X₀)⁻¹ relative to those of R_C⁻¹ in Table 11, the VF_C s are listed in Table 12 under G = [1, 0, 0]. In short, R_C is "as C⊥-orthogonal as it could be" given the constraint. Other VF_C s in Table 12 proceed similarly. For all cases in Table 12, excluding G = [0, 1, 1], β₀ is estimated with greater efficiency in the original model than in any of the postulated Reference models.
As in Remark 15, the reference matrix R_C for G = [1, 0, 0] is constrained by X′₁X₂ = 19.4533, to be retained, whereas X′₁X₃ and X′₂X₃ are unknowns to be determined from {R_C⁻¹(2, 4) = R_C⁻¹(3, 4) = 0}. Although this system can be solved numerically as noted, the algorithm stated in Definition 12 easily yields the direct solution given here.
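The constrained system {R⁻¹(2, 4) = R⁻¹(3, 4) = 0} can be sketched as follows. The matrix below is illustrative, not the body fat X′₀X₀ of Table 9, and the closed-form step shown is a standard positive-definite completion argument (the free entries follow by conditioning on the constant column); it is offered as a stand-in, since the rules of Definition 12 are not restated in this section.

```python
import numpy as np

# Illustrative symmetric moment matrix for columns [1, X1, X2, X3].
M0 = np.array([[20.0,  8.0,  9.0, 10.0],
               [ 8.0, 25.0,  6.0,  7.0],
               [ 9.0,  6.0, 25.0,  5.0],
               [10.0,  7.0,  5.0, 25.0]])

R = M0.copy()          # X1'X2 = 6.0 is retained, as under G = [1, 0, 0]
# Free entries X1'X3 and X2'X3, chosen so that R^-1(2,4) = R^-1(3,4) = 0
# (1-indexed): with X3 then tied only to the constant column, the
# positive-definite completion gives them in closed form.
R[1, 3] = R[3, 1] = M0[1, 0] * M0[0, 3] / M0[0, 0]    # 8 * 10 / 20 = 4.0
R[2, 3] = R[3, 2] = M0[2, 0] * M0[0, 3] / M0[0, 0]    # 9 * 10 / 20 = 4.5

Rinv = np.linalg.inv(R)
# The targeted off-diagonal elements of the inverse vanish as required.
assert abs(Rinv[1, 3]) < 1e-12 and abs(Rinv[2, 3]) < 1e-12
```

A general-purpose root finder over the two free entries reaches the same point, but the direct rule avoids constrained numerical iteration, as the text remarks.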
On the other hand, VF_C s for G ∈ {[0, 1, 0], [1, 1, 0], [0, 1, 1]} in Table 12 are likewise comparable, where (X₁, X₃) are allowed to remain linked at X′₁X₃ = 24.2362. Here the VF_C s reflect negligible changes in efficiency of the estimates in comparison with the original X′₀X₀, where [X₁, X₂, X₃] are all linked. In summary, negligible changes in overall efficiency would accrue on recasting the experiment so that (X₁, X₂) and (X₂, X₃) were pairwise uncorrelated.

Table 9: Matrices X′₀X₀ and (X′₀X₀)⁻¹ for the body fat data under model M₀, centered to minima and scaled to equal lengths.
Table 12 conveys further useful information on noting that VFs, as ratios, are multiplicative and divisible. For example, take the model coded G = [1, 1, 0], where X′₁X₂ and X′₁X₃ are retained at their initial values under X′₀X₀, namely, 19.4533 and 24.2362 from Table 9, with (X₂, X₃) unlinked. Now taking G = [0, 1, 0] as reference, we extract the VFs on dividing entries of Table 12 under G = [1, 1, 0] by the corresponding entries under G = [0, 1, 0].
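Because each VF is a ratio of variances against its own Reference, a change of reference is effected by elementwise division and requires no further matrix inversions. A sketch with illustrative numbers (not values from Table 12):

```python
import numpy as np

# Hypothetical variances, standing in for diagonals of inverse moment
# matrices; these are NOT entries of Table 12.
var_model = np.array([0.40, 0.90, 0.55])   # under the actual X0'X0
var_ref_a = np.array([0.50, 0.60, 0.50])   # under a reference coded, say, G = [0, 1, 0]
var_ref_b = np.array([0.45, 0.75, 0.52])   # under a reference coded, say, G = [1, 1, 0]

VF_a = var_model / var_ref_a               # VFs against reference A
VF_b = var_model / var_ref_b               # VFs against reference B

# Judging reference B against reference A by division alone:
VF_b_vs_a = VF_a / VF_b
assert np.allclose(VF_b_vs_a, var_ref_b / var_ref_a)
```

The common numerator cancels, which is precisely why VFs are "multiplicative and divisible."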

Conclusions
Our goal is clarity in the actual workings of VIFs as ill-conditioning diagnostics, thus to identify and rectify misconceptions regarding the use of VIFs for models X₀ = [1ₙ, X] with intercept. Would-be arbiters for "correct" usage have divided between VIF_c s for mean-centered regressors and VIF_u s for uncentered data. We take issue with both. Conventional but clouded vision holds that (i) VIF_c s gage variance inflation owing to nonorthogonal columns of X₀ as vectors in ℝⁿ; (ii) VIFs ≥ 1.0; and (iii) models having orthogonal mean-centered regressors are "ideal" in possessing VIF_c s of value unity. Reprising Remark 3, these properties, widely and correctly understood for models in M, appear to have been expropriated without verification for models in M₀, hence the anomalies. Accordingly, we have distinguished vector-space orthogonality (V⊥) of columns of X₀, in contrast to orthogonality (C⊥) of the mean-centered columns of X. A key feature is the construction of second-moment arrays as Reference models (R_V, R_C), encoding orthogonalities of these types, and their Variance Factors as VF_V's and VF_C's, respectively. Contrary to convention, we demonstrate analytically and through case studies that (i′) VIF_c s do not gage variance inflation; (ii′) VFs ≥ 1.0 need not hold, whereas VF < 1.0 represents Variance Deflation; and (iii′) models having orthogonal centered regressors are not necessarily "ideal." In particular, variance deflation occurs when ill-conditioned data yield smaller variances than corresponding orthogonal surrogates.
In short, our studies undertake to adjudicate the prolonged dispute regarding centered and uncentered diagnostics for models in M₀. Our summary conclusions follow.
(S1) The choice of a model, with or without intercept, is a substantive matter in the context of each experimental paradigm. That choice is fixed beforehand in a particular setting, is structural, and is beyond the scope of the present study. Snee and Marquardt [10] correctly assert that VIFs are fully credited and remain undisputed in models without intercept: these quantities unambiguously retain the meanings originally intended, namely, as ratios of variances to gage effects of nonorthogonal regressors. For models with intercept the choice is between centered and uncentered diagnostics, to be weighed as follows.

(S2) Conventional VIF_c s apply incompletely to slopes only, not necessarily grasping fully that a system is ill conditioned. Of particular concern is missing evidence on collinearity of regressors with the intercept. On the other hand, if V⊥-orthogonality pertains, then VF_V s may supplant VIF_c s as applicable to all of [β₀, β₁, . . ., β_k]. Moreover, the latter may reveal that VF_V(β̂₀) < 1.0, as in cited examples and of special import in predicting near the origin of the coordinate system.

(S3) Our VF_C s capture correctly the concept apparently intended by conventional VIF_c s. Specifically, if C⊥-orthogonality pertains, then VF_C s, as genuine ratios of variances, may supplant VIF_c s as now discredited gages for variance inflation.

(S4) Returning to Section 3.2, we subscribe fully, but for different reasons, to Belsley's and others' contention that uncentered VIF_u s are essential in assessing ill-conditioned data. In the geometry of ill-conditioning, these should be retained in the pantheon of regression diagnostics, not as now debunked gages of variance inflation, but as more compelling angular measures of the collinearity of each column of X₀ = [1ₙ, X₁, X₂, . . ., X_k] with the subspace spanned by the remaining columns.
(S5) Specifically, the degree of collinearity of regressors with the intercept is quantified by θ₀ deg which, if small, is known to "corrupt the estimates of all parameters in the model whether or not the intercept is itself of interest and whether or not the data have been centered" [21, p. 90]. This in turn would appear to elevate θ₀ in importance as a critical ill-conditioning diagnostic.
(S6) In contrast, Theorem 4(iii) identifies conventional VIF_c s with angles between a centered regressor and the subspace spanned by the remaining centered regressors. For example, the angle for X₁ is that between the centered vector Z₁ = B_nX₁ and the span of the remaining centered vectors. These lie in the linear subspace of ℝⁿ comprising the orthogonal complement of 1ₙ ∈ ℝⁿ and thus bear on the geometry of ill-conditioning in this subspace.
(S7) Whereas conventional VIFs are "all or none," to be gaged against strictly diagonal components, our Reference models enable unlinking what may be unlinked, while leaving other pairs of regressors linked. This enables us to assess effects on variances attributable to allowable partial dependencies among some but not other regressors.
In conclusion, the foregoing summary points to limitations in modern software packages. Minitab v.15, SPSS v.19, and SAS v.9.2 are standard packages which compute collinearity diagnostics, returning with the default options the centered VIFs {VIF_c(β̂ⱼ); 1 ≤ j ≤ k}. To compute the uncentered VIF_u s, one needs to define a variable for the constant term and perform a linear regression using the option of fitting without an intercept. On the other hand, computations for VF_V s and VF_C s, as introduced here, require additional if straightforward programming, for example, in Maple or in PROC IML of the SAS System, as utilized here.
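Both conventional quantities are short computations in a general-purpose language as well. The numpy sketch below uses simulated data, and the helper names are ours rather than library routines:

```python
import numpy as np

def centered_vifs(X):
    # Diagonal of the inverse correlation matrix of the regressors.
    Rcorr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(Rcorr))

def uncentered_vifs(X):
    # Append the constant column, scale X0'X0 to unit diagonals, invert.
    X0 = np.hstack([np.ones((X.shape[0], 1)), X])
    M = X0.T @ X0
    d = 1.0 / np.sqrt(np.diag(M))
    return np.diag(np.linalg.inv(np.outer(d, d) * M))

rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(20, 3))   # illustrative data only
vif_c = centered_vifs(X)      # 3 values, slopes only
vif_u = uncentered_vifs(X)    # 4 values, intercept included
```

The contrast in lengths (3 versus 4) restates point (S2): the centered diagnostics are silent on the constant column.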

Table 5: Matrices R_V and R_V⁻¹ for checking V⊥-orthogonality in the orthogonal data X₀ under model M₀.
C⊥ Reference Model. Further consider variances, were the Linked Data to be centered. As in Definition 12(ii), we seek a Reference model R_C such that C = (W − M) is diagonal. The required R_C and its inverse are listed in Table 7. The variances in the Linked Data are [0.15040, 0.12926, 0.14220, 0.13185].

7. Case Study: Body Fat Data

7.1. The Setting. Body fat and morphogenic measurements were reported for n = 20 human subjects in Table D.11 of [29, p. 717]. Response and regressor variables are Y: amount of body fat; X₁: triceps skinfold thickness; X₂: thigh circumference; and X₃: mid-arm circumference, under M₀: {Yᵢ = β₀ + β₁Xᵢ₁ + β₂Xᵢ₂ + β₃Xᵢ₃ + εᵢ}. From the original data as reported, the condition number of X′₀X₀ is 1,039,931, and the VIF_u s are [271.4800, 446.9419, 63.0998, 77.8703]. On scaling columns of X to equal column lengths {‖Xⱼ‖ = 5.0; j = 1, 2, 3}, the resulting condition number of the revised X′₀X₀ is 2,969.6518, and the uncentered VIF_u s are as before from invariance under scaling.

Table 7: Matrices R_C and R_C⁻¹ for checking C⊥-orthogonality in the linked data under model M₀.

Table 8: Matrices (Z′Z)⁻¹ and C⁻¹ for the linked data under model M.

V⊥ Reference Model. To compare this with a V⊥-orthogonal design, we turn again to Definition 8. However, evaluating the test criterion in Lemma 9 shows that this configuration is not commensurate with V⊥-orthogonality, so that the VF_V s are undefined. Comparisons with C⊥-orthogonal models are addressed next.

Table 10: Matrices R_C and R_C⁻¹ for the body fat data, adjusted to give off-diagonal elements the value 0.0, with code G = [0, 0, 0].

Table 11: Matrices R_C and R_C⁻¹ for the body fat data, adjusted to give off-diagonal elements the value 0.0 except X′₁X₂; code G = [1, 0, 0].

Table 12: Summary VF_C s for the body fat data centered to minima and scaled to common lengths, identified with varying G = [δ₁₂, δ₁₃, δ₂₃] from Definition 12.