Tolerance Intervals in a Heteroscedastic Linear Regression Context with Applications to Aerospace Equipment Surveillance

A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The inherited motivation for the related weighted least squares analysis of the model is an essential and attractive selling point to engineers with interest in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests that were designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application where hundreds of electronic components are continuously monitored by an automated system that flags components that are suspected of unusual degradation patterns.


Introduction
1.1.Background.The model and analyses developed in this paper address a problem encountered when analyzing data from service life tests of aerospace hardware packages.Data for as many as 700 performance metrics per part type are automatically stored during surveillance testing and subsequently input into a software program where, up to this point, an ordinary least squares line based on normaltheory has been routinely fit to the data using time-inservice as the explanatory variable.The software program also outputs tolerance intervals based on the ordinary least squares analysis.Engineers monitoring this process are alerted only to those cases where observations in the scatter plot fall outside the tolerance intervals or the tolerance interval crosses a given limit within some specified future time interval (e.g., 60 months).In cases where the alert suggests an increasing accelerated degradation, a proactive corrective action (e.g., part replacement) may be initiated.In cases where the alert suggests less than expected degradation, a cursory investigation to determine if the part is being utilized properly is initiated.Tolerance intervals are often similarly used for monitoring environmental applications [1].
Compared to having engineers individually examine the data from all the combinations of metrics and part types, the automated monitoring and flagging process is quite costeffective.However, when the variance of the metric increases with time, the tolerance intervals that result from the ordinary least squares analyses fail in the sense that the intervals become too narrow as time increases.Figure 1 illustrates this point, where the 111 observations are the performance metric for one type of part.It is evident from the figure that the random errors associated with the regression model are heteroscedastic.Heteroscedasticity is, of course, not unique to our application.It arises routinely in other applications such as economics, behavioral sciences, social sciences, environmental science, and computer vision.The works in [2][3][4][5][6] provide examples within these disciplines, respectively.
Figure 1 also shows two sets of pointwise 95%-content tolerance intervals constructed at the 90% confidence level.The dashed lines represent the tolerance intervals derived from an ordinary least squares analysis, while the solid lines represent the tolerance intervals derived from an estimated weighted least squares analysis that is described in Section 1.2.The bold line in the figure is the estimated weighted least squares regression line.It is evident in Figure 1 that the ordinary least squares tolerance intervals are inadequate in the sense that they are too wide for small time values and too narrow for large time values.On the other hand, the more pronounced curvature associated with the tolerance intervals derived from the estimated weighted least squares analysis more adequately captures the range of the observations at both ends of the time spectrum.

Model for Heteroscedasticity.
Examination of aerospace data for many part types and many performance metrics led us to propose the following model for heteroscedasticity.Let X(t) denote the metric level at time t for a particular part, and for motivational purposes, assume that the underlying process is a discrete-time valued process (t = 0, 1, . ..).Assume that the initial value of the process is X(0) = β 0 + e, where β 0 is an unknown constant and e is a normally distributed random variable with zero mean and unknown variance σ 2 e > 0. If the degradation process has both a constant deterministic trend, say β 1 , and also a random stochastic perturbation, it would follow that X(t) = X(t − 1)+β 1 +s t , where s t is a normally distributed random variable with zero mean and unknown variance σ 2 s ≥ 0. It follows from successive substitutions that X(t) = β 0 +β 1 t+e+ t i=1 s i , or equivalently, X(t) = β 0 +β 1 t +e +δ t , where δ t is a normally distributed random variable with zero mean and unknown variance σ 2 s t.The model for X(t) is (drifting) Brownian motion, except for the fact that the variance function is displaced from the origin by σ 2 e .In our application, the process X(t) is measured one time on each unit; so the data are a collection of independent observations Y i = X i (t i ), where X i (t) is the process associated with the ith unit.Thus, we have the model Equivalently, the model implies that observations are independent and normally distributed with means β 0 + β 1 t i and variances σ 2 e + σ 2 s t i .The fact that a variance component, σ 2 s , is responsible for the heteroscedasticity is somewhat unique to our model since the large literature pertaining to heteroscedasticity models usually assumes variances of the form h(σ 2 e + λt i ), where h(•) is an arbitrary function (e.g., a polynomial or the exponential function); see [2,7] and references therein.While these types of models have seen many successful applications, the lucid interpretation of the heteroscedastic variance function σ 2 e + σ 2 s t was crucial when eliciting buy-in from our aerospace engineering project stakeholders.
A pointwise 100γ%-content tolerance interval with confidence level 100(1 − α)% for a regression line in the context of homoscedastic normal errors was derived in [10].The following theorem, which is proved in the Appendix A, extends that result to our context.Theorem 1.If ρ is known, a pointwise 100γ%-content tolerance interval with confidence level 100(1 − α)% for a normal distribution that has mean β 0 + β 1 t and variance σ 2 e (1 + ρt) is where χ 2 n−2, 1−α is the upper 1 − α percentile of a chi-square distribution with n − 2 degrees of freedom and r(ρ) is the solution to (2π The pointwise tolerance intervals shown in Figure 1 corresponding to the estimated weighted least squares analysis were drawn from (3) with ρ replaced by its MLE ρ, and with α = 0.1 and γ = 0.95.

Testing for Heteroscedasticity.
Of special importance when analyzing the data for our application is the test of H 0 that the data are homoscedastic.If the hypothesis is not rejected, all the pertinent inference can be drawn from an efficient ordinary least squares analyses.On the other hand, rejecting H 0 protects a practitioner from the pitfalls of using an inappropriate least squares analysis which include a higher than expected false alert rate.Deriving a test for H 0 is particularly interesting from the standpoint that there are alternative tests available in the literature and in subsequent sections in this paper we present a unifying framework for relating alternative tests to each other.

Overview of Remaining Sections.
In Section 2, we position the proposed heteroscedastic model as a special case of a general mixed linear model.We then propose a test of H 0 that is motivated by informally combining the root arguments used when deriving locally most powerful tests and score tests.We show that the test statistic has a distribution that only depends on the underlying variance ratio ρ and thus is amenable to a significance test of H 0 : ρ = 0.A Monte Carlo algorithm for computing the required critical values is outlined.In Section 3, we relate our proposed test to some common tests for heteroscedasticity, namely, the Breusch and Pagan, White, and likelihood ratio tests.In Section 4, we show additional examples of our test for heteroscedasticity and the use of tolerance intervals within the context of aerospace engineering case study.We also report results of a simulation study that examined the power of all the tests that are discussed in this paper.We close the paper with a short summary in Section 5.

Proposed Test for Heteroscedasticity
The heteroscedastic regression model developed in Section 1 is a special case of a more general mixed linear model defined by where X is an n × p matrix (we assume without loss of generality that X has a full column rank) of known constants, β is p × 1 vector of fixed-effects, Z is n × m matrix of known constants, s is a m × 1 vector of unobservable random-effects that follow an MVN m (0, σ 2 s I m ) distribution, and e is an n × 1 vector of unobservable random error terms, independent of s, and having a MVN n (0, σ 2 e I n ) distribution.The special case considered in Section 1 is simply where p = 2 and m = n with Z a diagonal matrix with the values equal to { √ t i } n i=1 .We begin the derivation of our proposed test for heteroscedasticity by writing the log-likelihood of the general mixed linear model, based on (β, σ 2 e , ρ), as follows: Next, there exists an orthogonal matrix Q such that where B is a diagonal matrix whose elements i=1 are the eigenvalues of ZZ .It follows that ZZ = QBQ , and hence Combining this result with the definitions v = Q y and μ = Q Xβ, the log-likelihood can be rewritten as If β and σ 2 e were known, the locally most powerful test statistic would be To obtain a useable test statistic we follow the procedure used when deriving the score test (e.g., see [2]) by replacing the unknown β and σ 2 e by their conditional MLEs given H 0 , which are β 0 (0) = (X X) −1 X y and σ 2 e (0) = r r/n, where r = (I − P X )y and P X = X(X X) −1 X .The resulting statistic is The distribution of r is MVN n {0, σ 2 e [I − P X + ρ(I − P X )ZZ (I − P X )]} and by the nature of R it is clear that its distribution only depends on ρ.A simple approach for finding critical values to test H 0 is to simulate r vectors from an MVN n (0, I − P X ), compute the corresponding values of R from (8), and estimate the critical value from the resulting empirical distribution (edf) function of R values.Denoting the upper-α percentile of the edf by R a , the R test rejects H 0 if R > R α .We emphasize the computational simplicity of R, and the ease in which critical values can be obtained.It is also possible to reexpress the null distribution of (8) as a ratio of quadratic forms in independent standard normal random variables and then the saddlepoint approximation described by [11] can be employed to approximate the Pvalue associated with the test.
A locally most powerful invariant (LMPI) test of H 0 was derived in [12][13][14].The derivation of this test is sketched in Appendix B. It can be seen there that the complexity of computing LMPI test is nontrivial.The key result in Appendix B is the demonstration that the LMPI test and the R test are equivalent under the general mixed linear model.The practical significance of this finding is that the computation of the LMPI test, as originally proposed, involves a complicated eigenvalue and eigenvector analysis, whereas computing the equivalent form R test only requires fitting the model with ordinary least squares tools.To characterize this computational contrast another way, the computations associated with the equivalent form R test can be done with software as simple as Excel, whereas the computations associated with the originally proposed form LMPI test will require a rich matrix processing software package.
From a conceptual perspective, the equivalence between the LMPI test and the R test is interesting in the sense that the original complicated derivation of the LMPI test can be replaced by the simpler derivation of the R test.In addition, the demonstrated equivalence also provides further motivation for a procedure introduced in [15] where a test statistic is derived by replacing nuisance parameters in the locally most powerful test statistic by their conditional MLEs.Results in [15] show this procedure produces an asymptotically uniformly most powerful test.Our equivalence finding shows that when we used this approach to derive the R test, in a finite sample context, we arrive at a locally most powerful invariant test.

Other Tests for Heteroscedasticity
The Breusch and Pagan test for homogeneity is a partial score test derived for a general setting where the observations are independent and normally distributed with means x i β and variances σ 2 i = h(z i ξ), where x i is a p × 1 vector of known covariates, β is a p × 1 vector of unknown parameters, z i is a q × 1 vector of known covariates (whose components may or may not overlap with the components of x i ), ξ is a q × 1 vector of unknown parameters, and h(•) is an arbitrary function that is only assumed to possess a first derivative.By setting the first component of z i to unity, a test of H * 0 : is a test of heteroscedasticity.The Breusch and Pagan test [16] is the partial score test of H * 0 .For application to our problem, we have e , σ 2 s ) and h(z i ξ) = z i ξ and the test statistic simplifies to where A is an n × n diagonal matrix whose ith diagonal element is equal to n(t i − t)/ 2 n i=1 (t i − t) 2 .The White test [17] is positioned to be robust to the normality assumption.For our application, the statistic can be obtained by first computing the vector of squared residuals, say u where u i = r 2 i , and then computing the test statistic as where J n is an n × n matrix of ones,

and
V is a matrix whose first column is all ones, whose second column is the values {t i } n i=1 and whose third column is the values {t 2 i } n i=1 .Note that the ratio term in the definition of W is simply the R-square value of the regression of u on the V matrix.The distributions of both BP and W depend on ρ and the same Monte Carlo method for approximating the null distribution of R can be used to approximate the null distributions of BP and W.
The log-likelihood function for the general mixed linear model was given in ( 5) and computation of the MLE of (β, σ 2 e , ρ) and the conditional MLE of (β, σ 2 e ) was discussed in Section 2. It follows that the likelihood ratio test statistic is The LRT requires relatively significant computations.For example, computations of the full model MLEs needed by the LRT are not readily available unless software such as PROC MIXED in SAS is available.In standard situations the asymptotic null distribution of LRT is chi-square with degrees of freedom equal to the difference in the dimension of the unconstrained and constrained parameter spaces.However, our situation is not standard since the reduced parameter space lies on the boundary of the unconstrained parameter space.Results in [18] show that the asymptotic distribution of LRT in ( 11) is a 50: 50 mixture of zero and a chi-square distribution with one degree of freedom.A size-α LRT therefore rejects H 0 : denotes the upper-2α percentile of a chi-square distribution with one degree of freedom.

Application to Aerospace Case Study
4.1.Illustrative Examples.Figure 1 was used in Section 1 as illustrative of a data set that exhibits heteroscedasticity.Figure 2 shows a second illustrative data set (109 observations) which does not exhibit evidence of heteroscedasticity.As in Figure 1, two sets of pointwise 95%-content tolerance intervals constructed at the 90% confidence level are shown corresponding to an ordinary least squares analysis and the estimated weighted least squares analysis described in Section 1.2.In contrast to Figure 1, the two sets of tolerance intervals are nearly identical.Table 1 shows the computed test statistics R, BP, W, and LRT for the two data sets shown in Figures 1 and 2, along with their P-values for testing H 0 : ρ = 0.The tests all agree that heteroscedasticity appears to exist in Data Set 1  but not in Data Set 2. We note that of all the tests considered, R is by far the simplest to compute with BP and W only slightly more complicated.As described in the introduction, our application context has hundreds of metrics that need to be monitored with this type of analysis methodology.Figures 3, 4, 5, and 6 provide four additional illustrative examples, including the computed R statistic and associated P-value for H 0 : ρ = 0.There is evidence of heteroscedasticity in all four of these data sets.

Power Comparison of Alternative Tests.
It is easy to show that the distributions of the R, BP, W, and LRT test statistics depend only on ρ.It follows that the power functions (i.e., the probability of rejecting H 0 : ρ = 0 as function of ρ) of the tests can be evaluated using simulated data sets from the mixed model in (3) using the parameters (β 0 , β 1 , σ 2 e , ρ) = (0, 0, 1, ρ).We implemented our simulation study using the S-Plus package.For each simulated data set, each of the test statistics can be evaluated and compared to their respective critical values corresponding to nominal size α significance tests.The power of the tests is estimated by the fraction of data sets for which they reject H 0 .
Column 2 of Table 2 shows the estimated power of the 10% nominal size R test based on this procedure for ρ ∈ [0, 0.25], using 5000 simulated data sets and the same set of {t i } n i=1 values associated with Data Set 1, which is shown in Figure 1.Columns 3-5 of Table 2 give the ratio of the power for the LRT, BP, and W tests, relative to the R test.The results show that the R test and the LRT have nearly identical power, though the R test has slightly better power for small ρ.The R test has uniformly better power than BP and W tests.A somewhat surprising observation is the pronounced dominance of R with respect to BP and W, which are perhaps the two most widely known tests in the economics literature.Table 3 shows similar power comparisons using the {t i } n i=1 associated with Data Set 2, that is shown in Figure 2, and the conclusions are consistent with those drawn from Table 2.

A. Tolerance Intervals for Heteroscedastic Model
Using observations that follow a simple linear regression model Y i = β 0 + β 1 t i + e i , (i = 1, . . ., n), Wallis [10] derived a tolerance interval for a normal distribution that has mean β 0 + β 1 t and variance σ 2 e .The Wallis interval easily extends to the more general case where the observations are of the form Y i = β 0 t 0i + β 1 t 1i + e i , where the {t 0i } n i=1 are not necessarily equal to unity.In particular, a 100(1 − α)% tolerance interval with content γ for the normal distribution with mean β 0 t 0 + β 1 t 1 (where t 0 and t 1 are arbitrary) and variance σ 2 e is where ( β 0 , β 1 ) and s 2 denote the ordinary least squares estimates of (β 0 , β 1 ) and σ 2 e using an X matrix that has the values {t 0i } n i=1 in its first column, rather than unity, and moreover, when computing the r value from where N = [(t 0 , t 1 )(X X) −1 (t 0 , t 1 ) ] −1 .The heteroscedastic regression model described in Section 1.2 generalizes the problem solved by Wallis by having {e i } n i=1 independent and normally distributed with zero means and variances equal to σ 2 e (1+ρt i ), where ρ is a new parameter.The derivation in this Appendix is for the case where ρ is known so that the observations can equivalently be expressed as where { e i } n i=1 are independent and normally distributed with zero means and variances equal to σ 2 e .Defining Y i = Y i / 1 + ρt i , t 0i = 1/ 1 + ρt i and t 1i = t i / 1 + ρt i , it follows that a 100(1 − α)% tolerance interval with content γ for a normal distribution with mean β 0 t 0 + β 1 t 1 and variance σ 2 e is where ( β 0 , β 1 ) = ( X X) −1 X Y with X being the n × 2 matrix whose first column is the values {t 0i } n i=1 and whose second column is the values , and where r(ρ) is the solution r to (A.2) but with N replaced by N(ρ) = [(t 0 , t 1 )( X X) −1 (t 0 , t 1 ) ] −1 .

B. Equivalence of R Test and LMPI Test
Following [19], where P XZ = [X : Z]([X : Z] [X : Z]) −1 [X : Z] , an appropriate ANOVA table for the mixed model is based on the sum-of-squares decomposition y y = y P X y + y (P XZ − P X )y + y (I − P XZ )y.The first two terms on the right-hand-side of this decomposition, which we denote as S β and S s , respectively, correspond to fitting the fixed-effects first and then fitting the random-effects after the fixed-effects.The final term, denoted as S e , corresponds to the residual error sum-of-squares.It can be shown that the degrees of freedom associated with these three quadratic forms are p, r = rank(X : Z)−rank(X) and f = n−rank(X :Z), respectively.
Let the nonzero eigenvalues of C = Z (I − P X )Z be denoted by Δ 1 ≤ Δ 2 ≤ • • • ≤ Δ r , and define D = diag(Δ i ).Let d denote the number of distinct eigenvalues of C, and denote their values by Δ * 1 < Δ * 2 < • • • < Δ * d and their multiplicities by {r i } d i=1 .Define Q to be the m × r matrix of orthornormal eigenvectors corresponding to the {Δ i } r i=1 .Finally, let q = Z (I − P X )y and define s to be a solution to C s = q.The following results are proved in [19].e (I + ρD)], (b) t is independent of S e .(c) For J i = { j : Δ j = Δ * i }(note that the cardinality of J i is r i ) and S s (i) = j∈Ji t 2 j , S s (i) are independent with S s (i)/σ 2 e (1 + ρΔ * i ) ∼ χ 2 ri and S s = d i=1 S s (i).It is shown in [12] [14,15].
To prove the equivalence of the LMPI test based on (B.1) and the R test based on (8), we first note that S e + S s = y (I − P XZ )y + y (P XZ − P X )y = y (I − P X )y = r r, which establishes the denominator of the two test statistics coincide.Next, we observe that the numerator of LMPI test statistic, d i=1 Δ * i S s (i), can be reexpressed as t Dt = s QD 2 Q s.Since C = QDQ implies C 2 = QD 2 Q , we have t Dt = s CC s = q q = y (I − P X )ZZ (I − P X )y = r ZZ r, which establishes that the numerators of the two statistics are also identical.

Figure 1 :
Figure 1: (Data Set 1) Pointwise 95%-content tolerance intervals with 90% confidence.Dashed lines correspond to an ordinary least squares analysis and solid lines correspond to the estimated weighted least squares analysis.

Figure 2 :
Figure 2: (Data Set 2) Pointwise 95%-content tolerance intervals with 90% confidence.Dashed lines correspond to an ordinary least squares analysis and solid lines correspond to the estimated weighted least squares analysis.

8 International
Journal of Quality, Statistics, and Reliability (a) t = D 1/2 Q s has a MVN r [0, σ 2

Table 1 :
Tests for Heteroscedasticity in Figures1 and 2.

Table 2 :
Power comparison of the tests using {t i } n i=1 values from Data Set 1.

Table 3 :
Power comparison of the tests using {t i } n i=1 values from Data Set 2.Our aerospace case study, which pertains to the time evolution of performance metrics for electronic equipment, motivated the derivation of a model for heteroscedastic regression errors.A test for homoscedasticity was proposed and compared with various other tests that are prevalent in the literature.The proposed R test was shown to be equivalent to an LMPI test associated with a general mixed linear model of which this paper's heteroscedastic regression model can be regarded as a special case.Aside from what we judge to be a simpler intuitive motivation associated with the R test formulation, it has the practical benefit of being much simpler to compute than the formula that results from the classical LMPI derivation.The power of the R test was compared to some popular alternative tests, and only the considerably more difficult to compute likelihood ratio test has power that is comparable to the R test.We extended the classical application of tolerance intervals for regression to our heteroscedastic case and illustrated their use with six data sets from our aerospace case study.
that the locally most powerful invariant test statistic for testing H 0 isAlthough S e explicitly appears in the denominator of (B.1), use of the LMPI test does not require f > 0. For the cases where f = 0 (S e = 0), such as what will occur with our application since Z is a diagonal matrix of rank n, the LMPI test is still useable with the test statistic simply reducing to