One of the key issues in robust parameter design is to configure the controllable factors to minimize the
variance due to noise variables. However, it can sometimes happen that the number of control variables is
greater than the number of noise variables. When this occurs, two important situations arise. One is that
the variance due to noise variables can be brought down to zero. The second is that multiple optimal control
variable settings become available to the experimenter. A simultaneous confidence region for such a locus
of points not only provides a region of uncertainty about such a solution, but also provides a statistical test
of whether or not such points lie within the region of experimentation or a feasible region of operation.
However, this situation requires a confidence region for the multiple-solution factor levels that provides
proper simultaneous coverage. This requirement has not been previously recognized in the literature. In
the case where the number of control variables is greater than the number of noise variables, we show how
to construct critical values needed to maintain the simultaneous coverage rate. Two examples are provided
as a demonstration of the practical need to adjust the critical values for simultaneous coverage.
1. Introduction
Robust Parameter Design (RPD) is also called Robust Design or Parameter Design in the literature [1, 2]. The concept of RPD was introduced in the United States by Genichi Taguchi in the early 1980s. It is a methodology that takes both the mean and variance into consideration for product or process optimization. Taguchi [3] divided the predictor variables into two categories: control variables and noise variables. Control variables are easy to control while noise variables are either difficult to control or uncontrollable at a large scale. In practice, we would like to find a range of control variables such that (1) the variance caused by the change of noise variables is minimized and (2) the mean response is close to target. Multiple optimization design and analysis methods have been developed to achieve these two goals simultaneously, ranging from the traditional Taguchi methods to the more sophisticated response surface alternatives. (See [2, 4, 5] for detailed reviews on these methods.)
Under certain conditions, Myers et al. [6] proposed a way to construct a confidence region of the control variables where the variability transmission by the noise variables is minimized to zero. Although they only focused on the variance part, there are many such situations in which focus is placed entirely on the process variance (see p506 in [5]). For instance, if the process mean can be optimized using certain control factors that do not interact with noise variables or impact the noise variance (i.e., “tuning factors”), then one can seek other control factors which can drive the noise variance to zero. Even if the process mean cannot be driven to the target by tuning factors alone, it is nonetheless illuminating to consider both the confidence region for the minimum process variance and the confidence region for the optimal process mean [5, 6]. If the number of noise variables is not greater than the number of control factors (for examples, see [7–9] and see pages 491–492, 499 in [5]), then the conditions for a minimum process variance and a zero-gradient solution will coincide.
Furthermore, a zero-gradient solution is quite useful in that, even for very large noise variation, the transmission of that noise to the output of the process can be made negligible (or very small) by utilization of zero-gradient (or near-zero-gradient) operating conditions. Therefore, it is desirable to have a way to statistically test for the existence of such a solution within the experimental region or a region of feasible operation (see p506 in [5]). A simultaneous confidence region for the locus of points forming a zero-gradient solution forms such a test and also provides a graphical measure of uncertainty about the zero-gradient solution.
The existence of a zero-gradient solution is especially interesting for mixture experiments with control factor ingredients and noisy process variables. If a zero-gradient solution for the ingredient mixture can be found within the experimental region, and the confidence region for the zero-gradient solution intersects one or more of the mixture simplex boundaries, then it may be possible to remove one or more mixture ingredients and still maintain very low noise variance. Removal of one or more mixture ingredients may help to reduce production cost [10].
Myers et al. [6] proposed a confidence region in control variables based upon the standard response surface model for incorporating noise variables, shown in (1) (see, for example, [4, 11, 12]):
$$y(x,z)=\beta_0+x'\beta+x'Bx+z'\gamma+x'\Delta z+\varepsilon, \tag{1}$$
where x is a vector of k control variables, z is a vector of h noise variables, β0 is the intercept, β is a k×1 vector of coefficients for the main effects of control variables, γ is an h×1 vector of coefficients for the main effects of noise variables, B is a k×k matrix whose diagonal elements are the coefficients for the quadratic terms of control variables and whose off-diagonal elements are one-half of the control variable interaction effects, and Δ is a k×h matrix of control-by-noise variable interaction effects. ε is the random error term, and it is assumed here that ε~N(0,σ²).
Assuming that noise variables have mean zero and variance-covariance matrix Vz, the variance of the response in Model (1) is
$$\sigma^2[y(x,z)]=(\gamma+\Delta'x)'V_z(\gamma+\Delta'x)+\sigma^2. \tag{2}$$
Here the variance of the response is divided into two parts: the variance transmitted by the noise variables (represented by the first term) and the constant variance (σ²) due to modeling error and other factors not considered in the model. In other words, it is the noise variables that lead to the variance heterogeneity in the response. Since the changes of noise variables are inevitable in practice, Myers et al. [6] proposed that the “minimum process variance” can be reached by setting the slope of the noise variables, γ+Δ′x, equal to zero and, therefore, eliminating the noise variance part from the response variance. A confidence region for such control variable values can be constructed by inverting a hypothesis test of the form:
$$H_0:\gamma+\Delta'x=0, \tag{3}$$
for each x-point. To simplify the notation, let ψ be a p×1 vector that contains all the elements of the noise variable’s main effect vector γ and the interaction matrix Δ, that is, ψ=(γ1,δ11,δ12,…,δ1k,γ2,δ21,δ22,…,δ2k,…,γh,δh1,δh2,…,δhk)′, where γi,i=1,2,3,…h, is the ith element of vector γ and δij,i=1,2,3,…,h,j=1,2,3,…k, is the element of matrix Δ′ in the ith row and jth column. Let M(x)=[Ih⊗(1,x′)], where Ih is an identity matrix of dimension h, x is a k×1 vector of the corresponding control variables that interact with noise variables, and ⊗ denotes the Kronecker product. The null hypothesis in (3) can then be written as
$$H_0:M(x)\psi=0, \tag{4}$$
where M(x) is an h×p matrix, with p=h(k+1), and M(x)ψ=γ+Δ′x. Now the 1-α confidence region in control variables for zero variance due to noise variables can be defined as
$$\{x: M(x)\psi=0 \text{ is not rejected at level } \alpha\}. \tag{5}$$
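The Kronecker construction of M(x) and the variance decomposition in (2) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the values of `gamma`, `Delta`, `Vz`, and `x` are hypothetical and chosen only for illustration, and the helper name `M` is ours.

```python
import numpy as np


def M(x, h):
    """Return M(x) = I_h (Kronecker) (1, x'), an h x h(k+1) matrix."""
    row = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return np.kron(np.eye(h), row[None, :])


# Hypothetical coefficients (not from the paper): h = 2 noise, k = 3 control variables.
h, k = 2, 3
gamma = np.array([1.0, -2.0])                 # noise main effects, h x 1
Delta = np.array([[0.5, 1.0],
                  [-1.0, 0.3],
                  [2.0, -0.7]])               # control-by-noise interactions, k x h
x = np.array([0.2, -0.4, 0.9])                # one control setting

# psi stacks (gamma_i, row i of Delta') for i = 1, ..., h.
psi = np.concatenate([np.concatenate(([gamma[i]], Delta[:, i])) for i in range(h)])

slope = gamma + Delta.T @ x                   # slope of the noise variables at x
assert np.allclose(M(x, h) @ psi, slope)      # M(x) psi reproduces gamma + Delta' x

Vz = np.eye(h)                                # assumed noise covariance matrix
transmitted = float(slope @ Vz @ slope)       # first term of (2)
print(M(x, h).shape, slope, transmitted)
```

The transmitted variance vanishes exactly when `slope` is the zero vector, which is the hypothesis tested in (3) and (4).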
Furthermore, let Q(x;ψ̂) denote the test statistic for H0 which is
$$(M(x)\hat\psi)'[M(x)\hat V_{\hat\psi}M(x)']^{-1}(M(x)\hat\psi), \tag{6}$$
where ψ̂ is the estimate of ψ; V̂ψ̂ is the usual unbiased estimate of the variance of ψ̂, Vψ̂.
Let Ψ̂~N(ψ,Vψ̂). Myers et al. [6] have shown that, for each fixed x, Q(x;Ψ̂)~hF(h,v), where Ψ̂ is a vector of random variables (i.e., Ψ̂ is an estimator of ψ, whereas ψ̂ is an estimate from the actual data), v is the residual degrees of freedom (df), and F(h,v) is the F distribution with numerator df equal to h and denominator df equal to v. They then conclude that the 100(1-α) percent confidence region in (5) is
$$C_x(c_\alpha)=\{x:Q(x;\hat\psi)\le c_\alpha\}, \tag{7}$$
where the critical value cα=hF(1-α,h,v).
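As an illustration of (6) and (7), the sketch below evaluates the test statistic at a point and compares it with the critical value hF(1-α,h,v). It assumes NumPy and SciPy; the estimates `psi_hat` and `V_hat` are hypothetical, and the function names are ours rather than anything defined by Myers et al. [6].

```python
import numpy as np
from scipy.stats import f


def M(x, h):
    """M(x) = I_h (Kronecker) (1, x')."""
    row = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return np.kron(np.eye(h), row[None, :])


def Q(x, psi_hat, V_hat, h):
    """Test statistic (6): (M psi_hat)' [M V_hat M']^{-1} (M psi_hat)."""
    Mx = M(x, h)
    r = Mx @ psi_hat
    return float(r @ np.linalg.solve(Mx @ V_hat @ Mx.T, r))


def mkg_critical_value(h, v, alpha=0.05):
    """c_alpha = h * F(1 - alpha, h, v), valid for the single-solution case."""
    return h * f.ppf(1 - alpha, h, v)


# Hypothetical estimates: h = 2 noise variables, k = 2 control variables, v = 20.
h, v = 2, 20
psi_hat = np.array([1.0, -1.0, 0.5, 0.2, 0.4, -1.0])
V_hat = 0.05 * np.eye(6)

c_alpha = mkg_critical_value(h, v)
x_root = np.array([1.375, 0.75])     # solves gamma_hat + Delta_hat' x = 0 here
x_far = np.array([0.0, 0.0])

for x in (x_root, x_far):
    q = Q(x, psi_hat, V_hat, h)
    print(x, q, q <= c_alpha)        # True means x lies in the region (7)
```

Here h=k, so the zero-gradient solution is a single point and the MKG critical value applies.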
Note that the confidence region in (7) (called the “MKG confidence region” from now on) and the critical value cα=hF(1-α,h,v) were derived based on two critical assumptions: (1) the minimum of the variance due to noise variables is zero; (2) the solution to the zero-gradient equation in (4) is unique. There are situations where the first assumption cannot be met due to the fact that the solution to (4) is either outside the experimental region or does not exist. (However, the approach proposed in this article may provide the determination of such existence in a statistically significant way.) Assuming that the first assumption is met (like the two examples in Section 3), the second assumption is only true when h≥k. Notice that (4) represents a series of h equations with k unknown control variables. As recognized by Myers et al. (see page 506 in [5]), the equation will result in a single point solution when h=k, a line or hyperplane when h<k, and a single point solution or no solution when h>k. In other words, the MKG confidence region provides the correct critical value for the zero-gradient solution (if it exists) only when there are at least as many noise variables as control variables.
However, when the number of noise variables is less than the number of control variables, multiple solutions can exist to the zero-gradient equation in (4). In such situations, use of the MKG region will provide below nominal simultaneous coverage. As such, a confidence region which covers all the solutions simultaneously needs to be developed. In practice, statistical inference for the multiple-solution problems is important as this gives the experimenter more options with regard to finding the zero-gradient factor settings. The objective of this paper is to generalize the MKG confidence region such that it will provide the adequate coverage for both the single-solution case (where h≥k) and multiple-solution case (where h<k).
The rest of this paper will be organized as follows. Section 2 will focus on the derivation of such a generalized confidence region and the corresponding critical value required for inverting the associated null hypothesis. In Section 3, we give two examples to demonstrate the difference in simultaneous coverage between the MKG confidence region and our proposed confidence region when the number of control factors exceeds the number of noise variables. Section 4 provides a summary of the results.
2. A Generalized Confidence Region Approach
2.1. The Multiple Zero-Gradient Solution Problem
To address the multiple solution situation, the hypothesis in (4) is generalized to H0: M(x)ψ=0 for all x∈ℒ, where ℒ is the linear subspace representing either a unique single solution (i.e., a point) or multiple solutions (i.e., a line or a hyperplane). In other words, the confidence region could be a collection of either points or linear subspaces (of dimension ≥1) depending on whether the solution to the equation M(x)ψ=0 is unique or not. Therefore, we propose to generalize the MKG confidence region to
$$C_{\mathcal{L}}(c_\alpha)=\{\mathcal{L}:Q(x;\hat\psi)\le c_\alpha\ \ \forall x\in\mathcal{L}\}, \tag{8}$$
where ℒ represents the linear subspace of the space defined by the elements of x, which are solutions to γ+Δ′x=0. Here x is a k×1 vector. ℒ has dimension d, where d=k-h if h<k and d=0 otherwise. Therefore, when the solution to γ+Δ′x=0 is a single point, d=0; otherwise, d>0. In this section, we derive values for cα in (8). When d>0 and ℒ is, therefore, not a point, computation of the confidence region in (8) may appear difficult due to the replacement of x-points by ℒ subspaces. However, in this section, we will also show that the confidence region in (8) is equivalent to one based on pointwise gridding (of the type done for the MKG confidence region computations).
We call the confidence region given in (8) the generalized zero-gradient (GZG) confidence region. As indicated by the definition, the MKG confidence region is a special case of the GZG confidence region where ℒ is a point and d=0. The MKG confidence region is correct, and the critical value is cα=hF(1-α,h,v) for h≥k, if a solution exists. (If h≥k and the MKG region is the null set, then there is statistically significant evidence that a solution does not exist.) The next question is what value cα should take when d>0. For d>0, note that a 100(1-α)% GZG confidence region should contain the zero-gradient solution set ℒ with probability 1-α before the experiment is performed. It is worth pointing out that when h<k, the GZG confidence region in (8) is a simultaneous confidence region problem in that a line or hyperplane will be included in the confidence region only if all the points on the line or hyperplane satisfy the criterion in (8). Therefore, the GZG confidence region in (8) can also be expressed as
$$\{\mathcal{L}:Q_{\mathcal{L}}\le c_\alpha\}, \tag{9}$$
where Qℒ=max_{x∈ℒ}Q(x;ψ̂) and ℒ={x:M(x)ψ=0}. To find the critical value cα for h<k, we need to first investigate the distribution of the test statistic Qℒ when H0:M(x)ψ=0 is true.
When h=1, M(x)=(1,x′), which is a 1×p vector. Based on Miller’s theorem (see p65 and p113 in [13]), the critical value should be cα=(d+1)F(1-α,(d+1),v), where d is the dimension of the solution set, ℒ, for the equation M(x)ψ=0. Here, v is the degrees of freedom of the residuals. Note that when h=k=1, d=0 means that the solution is a point, which is a linear space of dimension zero. The critical value (d+1)F(1-α,(d+1),v) then becomes the MKG critical value F(1-α,1,v) because d=0.
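One way to see where the factor d+1 comes from is the following sketch, offered as intuition rather than as a substitute for the cited theorem. The notation m(x)=(1,x′)′ and 𝒮 is introduced here for illustration only. Under H0 with h=1, m(x)′ψ=0 for every x∈ℒ, so m(x)′ψ̂=m(x)′(ψ̂-ψ) on ℒ, and the vectors m(x), x∈ℒ, span a linear subspace 𝒮 of dimension d+1. The statistic is therefore bounded by a Scheffé-type supremum over 𝒮:

$$Q_{\mathcal{L}}=\max_{x\in\mathcal{L}}\frac{[m(x)'(\hat\psi-\psi)]^2}{m(x)'\hat V_{\hat\psi}\,m(x)}\;\le\;\sup_{a\in\mathcal{S},\,a\neq 0}\frac{[a'(\hat\psi-\psi)]^2}{a'\hat V_{\hat\psi}\,a}\;\sim\;(d+1)F(d+1,v).$$

Because the ratio is invariant to rescaling of a, and the directions of m(x) for x∈ℒ cover all directions in 𝒮 except those with first coordinate zero, the bound is essentially attained, which is consistent with the critical value (d+1)F(1-α,(d+1),v).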
For the k>h>1 case, the distribution of Qℒ is more complex. Section 2.2 addresses the full model in (1) where the experimental design is completely orthogonal or partially orthogonal so that Vψ̂=cσ²I for some positive constant c, residual variance σ², and an identity matrix I of dimension p. Here, an exact simultaneous confidence region is derived. For the general case, Section 2.3 proposes a simulation method based upon the multivariate t-distribution to find approximate critical values with which to construct the confidence region.
2.2. Full Model-Orthogonal Case
Here, we assume that the data are generated from an orthogonal design or partially orthogonal design such that Vψ̂=cσ²I. Furthermore, it is assumed that we have a full noise-control variable interaction model, meaning that each noise variable interacts with the same set of control variables, that is, each element of the interaction matrix Δ is nonzero. If Vψ̂=cσ²I, c>0, and all the elements of the interaction matrix Δ are nonzero, then, for k>h>1, the test statistic Qℒ has the same distribution as a function of a chi-square random variable and a random Wishart matrix (which are stochastically independent) as shown below:
$$Q_{\mathcal{L}}\sim\frac{\lambda_{\max}(A)}{U/v}, \tag{10}$$
where U~χ²(v), A~Wishart(I_{d+1},h), and v is the residual degrees of freedom. Here, I_{d+1} is a (d+1)×(d+1) identity matrix, where d is the same as defined in Section 2.1. The degrees of freedom of the Wishart distribution is h, and λmax(A) is the maximum eigenvalue of the matrix A. The proof of this result is provided in Appendix A.
Using (10), the critical value, cα, can then be computed as the 100(1-α)th percentile of the distribution of Qℒ, which can be obtained by simple Monte Carlo simulation from the χ² and Wishart distributions. Some limited tables of critical values are given in Appendix B based on (10), although computation of the critical value for any specific case using the random variable in (10) is easily accomplished. Note that the critical value determined by (10) becomes the MKG critical value, hF(1-α,h,v), when d=0, that is, h=k. This is because when d=0, A has a Wishart(1,h) distribution, which is χ²(h). In other words, A is no longer a matrix but a χ²(h) random variable. Hence, λmax(A)=A. So Qℒ can then be written as
$$Q_{\mathcal{L}}=h\,\frac{A/h}{U/v}\sim hF(h,v). \tag{11}$$
Furthermore, the same result in (10) holds when [γ,Δ] has an h×(k+1) matrix normal distribution. (See p90-91 in [14].) (For details, see [15].)
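The Monte Carlo computation of cα from (10) can be sketched as follows, assuming NumPy. The function name, the default number of draws, and the seed are illustrative choices, not part of the original procedure. A Wishart(I_{d+1},h) matrix is generated as GG′, where G is a (d+1)×h matrix of independent standard normal entries.

```python
import numpy as np


def gzg_critical_value(h, d, v, alpha=0.05, n_sim=200_000, seed=0):
    """Estimate the 100(1-alpha)th percentile of lambda_max(A) / (U / v),
    with A ~ Wishart(I_{d+1}, h) and U ~ chi-square(v), independent."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_sim, d + 1, h))
    A = G @ G.transpose(0, 2, 1)                 # n_sim Wishart draws, each (d+1) x (d+1)
    lam_max = np.linalg.eigvalsh(A)[:, -1]       # eigenvalues are returned in ascending order
    U = rng.chisquare(v, size=n_sim)
    return float(np.quantile(lam_max / (U / v), 1 - alpha))


# d = 0 should approximately reproduce the MKG value h * F(1 - alpha, h, v);
# d = 1 with h = 2 and v = 9 matches the dimensions of the example in Section 3.2.
print(gzg_critical_value(h=2, d=0, v=9))
print(gzg_critical_value(h=2, d=1, v=9))
```

When h=1 the matrix A has rank one and λmax(A) is a χ²(d+1) variable, so the same function should also approximate the critical value (d+1)F(1-α,(d+1),v) quoted in Section 2.1.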
2.3. The General Case
In some cases, the experimental design may be such that Var(ψ̂) does not have the orthogonal cσ2I form, or we may wish to use a model with some “control × noise variable” interaction terms deleted, that is, Δ has some zero elements. In such situations, when k>h>1, the distribution of Qℒ does not have a simple form and may depend upon ψ even under H0. Nonetheless, in such situations, it is still possible to obtain approximately conservative simultaneous confidence regions for control variables associated with zero-gradient solutions. We provide such a construction as follows.
Recall that Qℒ in (9) is a function of ψ, and consider
$$Q_{\max}=\max_{\psi\in C_\psi(b_\alpha)}Q_{\mathcal{L}}=\max_{\psi\in C_\psi(b_\alpha)}\ \max_{M(x)\psi=0}Q(x;\hat\Psi), \tag{12}$$
where Cψ(bα)={ψ:(ψ-ψ̂)′Vψ̂⁻¹(ψ-ψ̂)≤bα} and bα=F(1-α,1,n-p). Note that by using bα=F(1-α,1,n-p), Qmax is an approximate upper confidence bound for the scalar-valued quantity Qℒ. (See Clarke [16] for a discussion of confidence bounds on nonlinear functions of model parameters constructed from confidence regions.) Let cα* denote the 100(1−α)th percentile of the distribution of Qmax under H0. Consider the confidence region defined by Cx^max={x:Q(x;ψ̂)≤cα*}. This confidence region should provide (at least approximately) a conservative simultaneous confidence region for the zero-gradient solutions. However, computation of cα* (using (12)) and the associated confidence region is numerically difficult due to the complex constraints associated with the definition of Qmax. Fortunately, it can be shown that
$$Q_{\max}=\max_{x\in C_x(b_\alpha)}Q(x;\hat\Psi), \tag{13}$$
where Cx(bα)={x:Q(x;ψ̂)≤bα}. A proof is given in Appendix C. The expression for Qmax in (13) allows for much easier computation of the cα* critical value. The actual construction of the GZG confidence region from the relevant critical value will be outlined in Section 2.4.
2.3.1. The Critical Value Computation for the General Case
Note that under H0:M(x)ψ=0 we can express Q(x;Ψ̂) as
$$(M(x)t)'(M(x)\Omega M(x)')^{-1}(M(x)t), \tag{14}$$
where V̂ψ̂=s²Ω, s² is the mean squared error, Ω is a known matrix computed from the design matrix, and t=(Ψ̂-ψ)/s. Here, t follows the multivariate t distribution with location parameter equal to zero, scale matrix Ω, and degrees of freedom v. Using (14), we can then compute the critical value, cα*, by Monte Carlo simulation as follows.
Step 1.
Compute Cx(bα)={x:Q(x;ψ̂)≤bα}, where bα=F(1-α,1,n-p).
Step 2.
Simulate a multivariate t random vector (rv) with scale matrix Ω and ν df. (This can be done by simulation of a multivariate normal rv with mean vector 0 and variance-covariance matrix Ω and a chi-square random variable with ν df. See [17] for details.)
Step 3.
Compute Qmax using the expressions in (13) and (14). (For practical reasons, computation of Qmax can be done by maximization of Q(x;Ψ̂) over Cx(bα)∩R instead, where R is a prespecified, bounded region. This will calibrate the coverage to be simultaneous only over ℒ0∩R, where ℒ0 is the true linear subspace such that M(x)ψ=0.)
Step 4.
Do Steps 2-3 a large number of times to estimate the 100(1-α)th percentile of the Monte Carlo distribution of Qmax. This 100(1-α)th percentile is then a Monte Carlo estimate of cα*.
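Steps 1 to 4 can be sketched as follows, assuming NumPy and SciPy. This sketch maximizes over a finite grid covering R rather than using a Nelder-Mead search, takes n-p equal to the residual df v, and uses hypothetical inputs (`psi_hat`, `Omega`, `s2`); the function names are ours.

```python
import numpy as np
from itertools import product
from scipy.stats import f


def M(x, h):
    """M(x) = I_h (Kronecker) (1, x')."""
    row = np.concatenate(([1.0], np.asarray(x, dtype=float)))
    return np.kron(np.eye(h), row[None, :])


def quad_forms(Ms, Ws, vec):
    """(M(x) vec)' W(x) (M(x) vec) for every grid point x at once."""
    r = Ms @ vec                                   # shape (n_points, h)
    return np.einsum("gi,gij,gj->g", r, Ws, r)


def conservative_critical_value(psi_hat, Omega, s2, v, h, grid,
                                alpha=0.05, n_sim=2000, seed=0):
    rng = np.random.default_rng(seed)
    Ms = np.stack([M(x, h) for x in grid])
    Ws = np.linalg.inv(Ms @ Omega @ Ms.transpose(0, 2, 1))   # (M Omega M')^{-1}

    # Step 1: grid points of C_x(b_alpha) within R, using V_hat = s2 * Omega.
    b_alpha = f.ppf(1 - alpha, 1, v)
    keep = quad_forms(Ms, Ws, psi_hat) / s2 <= b_alpha
    if not keep.any():
        raise ValueError("no grid point of R lies in C_x(b_alpha)")
    Ms, Ws = Ms[keep], Ws[keep]

    chol = np.linalg.cholesky(Omega)
    q_max = np.empty(n_sim)
    for i in range(n_sim):
        # Step 2: multivariate t draw = normal draw / sqrt(chi-square / v).
        z = chol @ rng.standard_normal(len(psi_hat))
        t = z / np.sqrt(rng.chisquare(v) / v)
        # Step 3: Q_max from (13) and (14), maximized over the retained grid points.
        q_max[i] = quad_forms(Ms, Ws, t).max()

    # Step 4: Monte Carlo estimate of the 100(1 - alpha)th percentile.
    return float(np.quantile(q_max, 1 - alpha))


# Hypothetical inputs: h = 1 noise variable, k = 2 control variables, v = 9.
psi_hat = np.array([1.0, -1.0, 1.0])
Omega = np.eye(3) / 16.0
axis = np.linspace(-1.0, 1.0, 21)
grid = [np.array(pt) for pt in product(axis, repeat=2)]
c_star = conservative_critical_value(psi_hat, Omega, s2=1.0, v=9, h=1, grid=grid)
print(c_star)
```

A finer grid tightens the approximation to the maximum at the cost of more computation; replacing the grid in Step 1 by the estimated zero-gradient set {x:M(x)ψ̂=0} gives a less conservative variant of the same calculation.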
2.3.2. The Coverage Rate of the Critical Value
In order to check the accuracy of cα* as a critical value, we have done some Monte Carlo simulations of the above four-step procedure using three different noise variable models in conjunction with both orthogonal and nonorthogonal designs. The statistical models used are summarized in Table 1. These models are constructed so that the zero-gradient solution exists in the experimental region. Three partially orthogonal, face-centered central composite experimental designs were assessed, with associated statistical models 1, 2, and 3, respectively. These designs employed a coded factor space with factor levels equal to ±1 (except for the center points). The axial points in noise variables are deleted to maintain partial orthogonality. The factorial part of the designs is either full factorial (e.g., model 1 and model 2) or half factorial (e.g., model 3). The nonorthogonal designs are constructed by changing the (one) factorial point (comprised of all −1s) from (-1,-1,…,-1,-1) to (-1,-1,…,-1,0). The resultant sample size (n) and the residual df (ν) of each composite design are both listed in Table 2. For demonstration purposes, we simply chose model parameters to be either 1 or −1, with residual error variance equal to 1.
Table 2: The coverage rates for the GZG confidence region using the approximate and conservative critical values, c̃α and cα*, respectively. The nominal coverage rate is 95%. (Here k = no. of control factors, h = no. of noise variables, n = sample size, and v = residual df.)

| Model no. | k | h | n | v | Approximate coverage rate, orthogonal design | Approximate coverage rate, nonorthogonal design | Conservative coverage rate, orthogonal design | Conservative coverage rate, nonorthogonal design |
|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 2 | 40 | 28 | 96.1% | 96.1% | 96.9% | 96.9% |
| 2 | 4 | 2 | 74 | 60 | 96.3% | 96.3% | 98.0% | 98.0% |
| 3 | 4 | 3 | 72 | 57 | 96.1% | 96.1% | 97.2% | 97.2% |
The results of these coverage rate simulations are summarized in Table 2. The coverage rates were computed as follows. The models in Table 1 were used to compute the true ℒ space, ℒ0. For each of the three models, the region, R, was a hypercube constructed from the Cartesian product of intervals of the form [-10,10]. For each simulation, a dataset was generated based on the model and the corresponding central composite design, a critical value cα* was computed based on the simulated dataset, then a check was done to see if the event
$$\max_{x\in\mathcal{L}_0\cap E}Q(x;\hat\Psi)\le c_\alpha^* \tag{15}$$
occurred, where E is the convex hull formed by the ±1 factor levels. For each simulated dataset, the critical value of cα* was computed using 1000 Monte Carlo simulations. 5000 simulations were done to assess the simultaneous coverage of the GZG confidence region for the set ℒ0∩E. In an attempt to reduce the conservatism of the above approach for computing cα*, we also considered the approximate approach obtained by maximizing Q(x;Ψ̂) over ℒ0(ψ̂)∩R, where ℒ0(ψ̂)={x:M(x)ψ̂=0}=Cx(0). We denote this approximate critical value by c̃α and use it in place of cα* to reduce conservatism.
Remark 1.
Because Cx(bα) is a function of the data, the relatively large region, R, was chosen for these simulations so that Cx(bα)∩R would be extremely unlikely to be empty for any simulated dataset. In addition, we did not want to rule out situations where the confidence region was outside the experimental region. While, in practice, such extrapolated inferences must be treated with caution, nonetheless it may be desired to compute such a confidence region. Such a confidence region outside the experimental region suggests that it may not be possible to obtain a “zero-gradient” solution for noise transmission, at least within the current experimental region. However, such a confidence region just outside the experimental region may offer hope that resetting process control conditions may allow for a more robust process. Of course, additional experiments outside the current experimental region would be needed to confirm this.
Remark 2.
Maximization of Q(x;Ψ̂) over Cx(bα)∩R, to compute the Monte Carlo critical value, cα*, was accomplished by using the SAS/IML Nelder-Mead simplex algorithm, nlpnms. This was done to make the Monte Carlo simulations of this Monte Carlo procedure tractable. Some limited simulations were also done whereby the maximization of Q(x;Ψ̂) over Cx(bα)∩R was computed by gridding instead. This was done to make sure that the Nelder-Mead algorithm did not stop its maximization prematurely. In all cases, each critical value, cα*, computed using nlpnms, was slightly larger than that obtained using gridding. (Random number seeds were aligned to avoid the Monte Carlo differences in the comparisons between gridding and the use of the Nelder-Mead simplex algorithm.) For the approximate approach, maximization over ℒ0(ψ̂)∩R was done by gridding as this was easier to accomplish with finer gridding.
Table 2 displays the percentage of simulations in which the event in (15) occurred for each of the three models with and without an orthogonal design. If the event in (15) occurs, then that portion of the true linear subspace, ℒ0 (within E), is entirely covered by the GZG confidence region; otherwise it is not.
Table 2 indicates that the simultaneous coverage rate of the GZG confidence region using the conservative critical value, cα*, produces reasonably conservative results, while the approximate approach (that maximizes over ℒ0(ψ̂)∩R, instead) achieves closer to nominal (yet slightly conservative) coverage rates. It is interesting to note that for each approach the coverage rate appears to be insensitive to the minor departure from orthogonality that was induced by changing the (one) factorial point (comprised of all −1s) from (-1,-1,…,-1,-1) to (-1,-1,…,-1,0). Such a departure from design orthogonality could happen due to a design execution error or a process restriction.
2.3.3. The Full Model Nonorthogonal Case
Because computation of cα* and c̃α requires maximization within a Monte Carlo calculation, it would be useful to assess whether this can be eliminated when a full model is employed. We, therefore, conduct another simulation study to see if the critical value based upon the random variable in (10) can be used as an approximate critical value for mild departures from orthogonality. We use the same nonorthogonal designs as used in Table 2. The corresponding full-interaction models are listed in Table 3. This time the cα critical value was used with these nonorthogonal designs to assess the simultaneous coverage rate. The results are shown in Table 4. In order to assess the coverage rate, gridding had to be done over a subset of ℒ0. As a fairer comparison with the theoretical cα critical value, gridding was done over ℒ0∩R (where, as before, R is a hypercube region composed of the Cartesian product of the intervals [-10,10], instead of [-1,1]). This is because the cα critical value associated with the random variable in (10) is computed by maximization over the whole linear subspace, ℒ0.
Table 4: The simultaneous coverage rate of the critical value for the full model nonorthogonal designs (no. of simulations = 100,000, and nominal coverage rate = 95%. k: no. of control factors, h: no. of noise variables, n: sample size, and v: residual df).

| Model no. | k | h | n | v | Coverage rate |
|---|---|---|---|---|---|
| 4 | 3 | 2 | 40 | 24 | 95.2% |
| 5 | 4 | 2 | 74 | 56 | 95.0% |
| 6 | 4 | 3 | 72 | 49 | 95.1% |
Table 4 indicates that this minor departure from orthogonality has virtually no effect on the coverage rate of the GZG confidence region when the more convenient cα critical value is used. For more radical departures from orthogonality, it may be safer to use the conservative cα* critical value, but further robustness studies are needed to ascertain how well the more convenient cα critical value works under departures from its assumptions.
2.4. Computation of Simultaneous Confidence Region
For the k>h case, once we have computed the critical value, the confidence region can be computed by searching linear subspaces, ℒ, that satisfy the condition as defined in (9). However, searching over various lines or hyperplanes that span an experimental region is more computationally difficult than searching the same experimental region in a pointwise fashion. Fortunately, it can be shown that, for any given critical value, the GZG confidence region can be computed by pointwise gridding. This is because for Cx(cα) in (7) and Cℒ(cα) in (8), with the same critical value, Cx(cα)=Cℒ(cα). A proof is provided in Appendix D. This equivalency shows that one can construct the GZG confidence region by simply gridding over the experimental region in a pointwise fashion.
3. Examples
3.1. One Noise Variable
This example is from Myers et al. [6]. It was originally taken from Montgomery [18] (2009, page 231). The data were generated from a 2⁴ factorial experiment with a total of 16 observations from a pilot plant to explore the factors that could affect the filtration rate of a chemical bonding substance. The goal is to maximize the filtration rate, y.
As in Myers et al. [6], one of the four factors, temperature, is assumed difficult to control at large scale and, therefore, treated as a noise variable z. The rest of the factors are control variables: x1: pressure, x2: concentration, x3: stirring rate. The fitted model is
$$\hat y=70.06+10.81z+4.94x_2+7.31x_3-9.06x_2z+8.31x_3z-0.56x_2x_3, \tag{16}$$
with mean square error equal to 21.12 and residual df equal to 9.
The estimated slope of noise variable is
$$\hat\gamma+\hat\Delta'x=10.81-9.06x_2+8.31x_3. \tag{17}$$
Therefore, k=2 and h=1. The solution to the null hypothesis H0:γ+Δ′x=0 is a line. Then the general critical value 2F(1-α,2,v) (based on Miller’s Theorem (1981) [13]) should be used to calculate the GZG confidence region (as shown in Figure 1). The GZG confidence region in Figure 1 is clearly wider than the MKG confidence region in Myers et al. (1997, Figure 2 in [6]) where F(1-α,1,v) is used as the critical value. It is clear from Figure 1 that we are at least 95% confident that the zero-gradient locus of points passes through the experimental region.
Figure 1: The 95% GZG confidence region for the one-noise-variable and two-control-variable case.
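The pointwise gridding behind a region such as the one in Figure 1 can be sketched as follows, assuming NumPy and SciPy. The covariance of the estimated coefficients is not given in the text; the sketch assumes V̂ψ̂=(MSE/16)I, which is what an orthogonal 2⁴ design in ±1 coding with 16 runs would give. The grid resolution is an arbitrary choice.

```python
import numpy as np
from scipy.stats import f

# Estimates from (17): (gamma_hat, delta_hat for x2, delta_hat for x3).
psi_hat = np.array([10.81, -9.06, 8.31])
mse, v, n_runs = 21.12, 9, 16
var_coef = mse / n_runs            # assumed common variance of each coefficient estimate

# Pointwise grid over the experimental region [-1, 1]^2 in (x2, x3).
axis = np.linspace(-1.0, 1.0, 201)
X2, X3 = np.meshgrid(axis, axis, indexing="ij")
slope = psi_hat[0] + psi_hat[1] * X2 + psi_hat[2] * X3

# With h = 1 and V_hat = var_coef * I, (6) reduces to slope^2 / (var_coef * (1 + x2^2 + x3^2)).
Q = slope**2 / (var_coef * (1.0 + X2**2 + X3**2))

c_gzg = 2 * f.ppf(0.95, 2, v)      # (d + 1) F(1 - alpha, d + 1, v) with d = 1
c_mkg = f.ppf(0.95, 1, v)          # h F(1 - alpha, h, v) with h = 1

in_gzg = Q <= c_gzg
in_mkg = Q <= c_mkg
print(c_gzg, c_mkg, in_gzg.mean(), in_mkg.mean())
```

Because both regions threshold the same statistic, every grid point in the MKG region also lies in the GZG region; only the critical value differs.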
Next, we do some simulations to compare the coverage rates of the GZG and MKG confidence regions. Since the true optimum is not known in practice, we calculate the coverage rate for the solution to γ̂+Δ̂′x=10.81-9.06x2+8.31x3=0, using a simulation model equal to the fitted model in (16) with σ²=21.12. Note that the true solution in this example is a line of infinite length, but the simulation is done only for the portion of the line within the experimental region, that is, within [-1,1] for each control variable.
Using 100,000 Monte Carlo simulations, the simultaneous coverage rate of the GZG confidence region for all of the zero-gradient solutions in the experimental region is 97%, while the MKG confidence region has only a 92% coverage rate. The MKG confidence region has a lower coverage rate because it was designed to contain the true optimum only when the optimum is a point. Although the GZG confidence region is designed to contain all the true solutions (which could be a point, a line, or a hyperplane), the simulated coverage rate tends to exceed the nominal coverage rate because the simulation is done within a finite range of the control variables while the line or hyperplane has an infinite range in theory.
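A coverage comparison of this kind can be sketched as follows, assuming NumPy and SciPy. Instead of regenerating and refitting the 16 observations, the sketch draws ψ̂ and s² directly from their sampling distributions under an assumed orthogonal design (Var(ψ̂)=(σ²/16)I, with s² independent of ψ̂). The number of replications, the seed, and the discretization of the line are arbitrary choices, so the resulting rates need not match the reported figures exactly.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
psi = np.array([10.81, -9.06, 8.31])      # fitted slope coefficients treated as the truth
sigma2, v, n_runs, n_sim = 21.12, 9, 16, 20_000

# Discretize the true zero-gradient line 10.81 - 9.06*x2 + 8.31*x3 = 0 inside [-1, 1]^2.
x3 = np.linspace(-1.0, 1.0, 401)
x2 = -(psi[0] + psi[2] * x3) / psi[1]
keep = np.abs(x2) <= 1.0
rows = np.column_stack([np.ones(keep.sum()), x2[keep], x3[keep]])   # rows are (1, x2, x3)
norm2 = (rows**2).sum(axis=1)

# Simulated estimates: psi_hat ~ N(psi, sigma2/16 * I), s2 ~ sigma2 * chi2(v) / v.
psi_hat = psi + np.sqrt(sigma2 / n_runs) * rng.standard_normal((n_sim, 3))
s2 = sigma2 * rng.chisquare(v, size=n_sim) / v

# Q(x; psi_hat) at every point of the true segment, then its maximum per replication.
Q = (psi_hat @ rows.T) ** 2 / ((s2[:, None] / n_runs) * norm2[None, :])
Q_max = Q.max(axis=1)

cov_gzg = float(np.mean(Q_max <= 2 * f.ppf(0.95, 2, v)))
cov_mkg = float(np.mean(Q_max <= f.ppf(0.95, 1, v)))
print(cov_gzg, cov_mkg)
```

A replication counts as covered only if every discretized point of the true segment passes the test, which is the simultaneous criterion in (8).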
3.2. Two Noise Variables
This example comes from a face-centered central composite design with the factorial part being a half-fractional factorial design (see details in [19]). The objective of this study is to find the optimized condition that maximizes the yield of diacylglycerol oil, which is a natural component of various edible oils and has shown some beneficial effects as compared to the traditional triacylglycerol oil.
Five factors were studied in this experiment: reaction time (RTIME), enzyme load (ENZL), reaction temperature (RTEMP), water content (WATC), and substrate molar ratio (SUBR). Water content (WATC) is difficult to control at large scale [19] and, therefore, treated as a noise variable. For illustration purposes, substrate molar ratio is also treated as a noise variable, and the axial points corresponding to the noise variables are excluded from the analysis to obtain partial design orthogonality with respect to the noise variables (i.e., to ensure Vψ̂=cσ²I). The final model in coded factor values is as follows:
$$\hat y=57.58+9.12x_1+4.78x_2+11.01x_3-4.69x_1^2-9.47x_2^2-7.37x_3^2-1.61x_2x_3-2.05z_1+4.83z_2-2.92z_1x_1-2.07z_1x_2-2.12z_1x_3-3.17z_2x_1+4.90z_2x_2-2.41z_2x_3, \tag{18}$$
where x1: RTIME, x2: ENZL, x3: RTEMP, z1: WATC, z2: SUBR. Here, the residual mean squared error is equal to 2.56 with 25 observations and residual df equal to 9.
Since k=3 and h=2, the solution to the null hypothesis is a line in a 3-dimensional space determined by the control variables x1, x2, and x3. Therefore, the confidence region for this line is a tube in this 3-dimensional space. A 95% GZG confidence region is shown in Figure 2. Based on (10), the GZG critical value is obtained via the χ² and Wishart distributions. From Figure 2, we can see that while this confidence region does not provide statistically significant evidence that the zero-gradient locus of points passes through the experimental region, it does appear that a good portion of the confidence region is within the experimental region, and hence, attainment of near-zero-gradient conditions should be feasible for this process.
The 95% GZG confidence region for two-noise-variable and three-control-variable case.
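To make the "line in 3-dimensional space" concrete, the following minimal sketch computes the estimated zero-gradient line from the coefficients of the fitted model. It assumes only that the estimated noise gradient is obtained by differentiating the fitted model with respect to $z_1$ and $z_2$; the function and variable names (`noise_gradient`, `zero_gradient_line`, `gamma`, `delta_t`) are illustrative and do not come from the paper. It yields a point estimate of the line only, not the confidence region.

```python
# Coefficients of the noise variables, read off the fitted model:
# d(y_hat)/dz_i = gamma[i] + sum_j delta_t[i][j] * x_j.
gamma = [-2.05, 4.83]                      # main effects of z1, z2
delta_t = [[-2.92, -2.07, -2.12],          # z1*x1, z1*x2, z1*x3
           [-3.17,  4.90, -2.41]]          # z2*x1, z2*x2, z2*x3


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def noise_gradient(x):
    """Estimated gradient of y_hat with respect to (z1, z2) at control point x."""
    return [g + dot(row, x) for g, row in zip(gamma, delta_t)]


def zero_gradient_line():
    """Return (x0, direction) so that x0 + t * direction has zero noise gradient."""
    r1, r2 = delta_t
    direction = cross(r1, r2)              # orthogonal to both rows of delta_t
    # Minimum-norm particular solution: x0 = r1*u0 + r2*u1 with
    # [[a, b], [b, c]] u = -gamma, solved with the explicit 2x2 inverse.
    a, b, c = dot(r1, r1), dot(r1, r2), dot(r2, r2)
    det = a * c - b * b
    rhs = [-gamma[0], -gamma[1]]
    u = [(c * rhs[0] - b * rhs[1]) / det,
         (-b * rhs[0] + a * rhs[1]) / det]
    x0 = [u[0] * r1[j] + u[1] * r2[j] for j in range(3)]
    return x0, direction


if __name__ == "__main__":
    x0, d = zero_gradient_line()
    for t in (-0.05, 0.0, 0.05):
        x = [x0[j] + t * d[j] for j in range(3)]
        print(x, noise_gradient(x))
```

Points on this line can then be compared with the coded experimental region $[-1,1]^3$ to see where the estimated zero-gradient conditions fall.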
As with the previous example, we compare the coverage rates for the GZG and MKG confidence regions using the fitted model as the true population model. Using 100,000 Monte Carlo simulations (based upon the fitted model in (18) with $\sigma^2 = 2.56$), the simultaneous coverage rate of the nominal 95% GZG confidence region is 96%, while that of the nominal 95% MKG confidence region is only 90%. (Here, gridding was done over the cube formed by the Cartesian product of the intervals $[-1,1]$ associated with each $x_i$.)
4. Summary
This paper shows that when the number of control variables does not exceed the number of noise variables, the MKG approach provides a confidence region for the control variables associated with a zero gradient for noise transmission. Otherwise, the MKG approach results in a confidence region that is too small for simultaneous coverage of the linear subspace of zero-gradient solutions. It is important to recognize that, when $h<k$, the true optimal condition for the control variables is a line or a hyperplane rather than a single point. In this situation, constructing a simultaneous confidence region about the linear subspace of solutions is desirable because a subspace of solutions gives the investigator many options for setting the zero-gradient control levels. A confidence region also provides the experimenter with a measure of uncertainty for the optimal solution. If the confidence region is too large, further experimental runs may be needed to make more accurate inferences. If the current manufacturing set point is outside the confidence region, this provides statistically significant evidence that reconfiguring the set point may help improve process variability by lowering the transmission of noise through the system. The GZG confidence region for the zero-gradient conditions is proposed and is shown to provide nominal or reasonably conservative coverage rates for many noise-variable experiments that occur in practice.
When there are many noise variables, it may be costly or difficult to study all the noise effects. One way to deal with this problem is to combine the multiple noise factors into one compound noise factor with two extreme conditions as its two levels (provided certain assumptions can be satisfied). See [1, 20, 21] for discussion. If the noise factors can be combined into one compound noise factor, then the number of control variables may well be greater than the number of noise variables, and the GZG approach is directly applicable. In some cases, however, one may wish to create a compound noise factor with more than two levels [22], or two or more compound noise factors. In either case, as long as the predictive model is in the form of (1), the GZG approach is applicable.
The GZG confidence region provides inferences about the optimal control point or points that yield a zero-gradient for the transmission of variability from the noise variables. However, there are situations where the control points corresponding to zero noise variance are either outside the experimental region or simply do not exist. In this case, it would still be useful to find the constrained optimal control point for minimum noise variance over the experimental region. A method to further generalize the confidence region for the constrained optimal point for minimum noise variance is needed and is currently under development.
Appendices
A. Proof of (10)
We will prove the result for the no-intercept case where γ=0 in the hypothesis γ+Δ′x=0. It then follows that the result can be generalized to the intercept case where γ≠0.
Part 1.
For the no-intercept case:
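When $k>h$, the solution, $x$, to $\Delta' x = 0$ can be expressed as a linear combination of the basis vectors for the linear subspace $\mathcal{L}$, that is, $x' = w'L$, where $w$ is a $d\times 1$ coefficient vector and $L$ is a $d\times k$ matrix whose rows consist of the basis vectors of the linear subspace $\mathcal{L}$. (Here, $w\in\mathbb{R}^d$ and only $L$ is a function of $\Delta$.) Then $M(x) = I_h\otimes w'L$. Let $z = (\hat{\psi}-\psi)/(\sigma\sqrt{c})$; then $z\sim N(0, I_p)$ and $M(x)z = G(z)x$, where $G(z)$ is defined as

$$G(z) = \begin{pmatrix} z_{11} & \cdots & z_{1k} \\ \vdots & \ddots & \vdots \\ z_{h1} & \cdots & z_{hk} \end{pmatrix}, \tag{A.1}$$

where $z_{ij} = (\hat{\delta}_{ij}-\delta_{ij})/(\sigma\sqrt{c})$, $i = 1,2,\ldots,h$, $j = 1,2,\ldots,k$, and $\delta_{ij}$ is the $j$th element in the $i$th row of the matrix $\Delta'$. Since the vector $z\sim N(0, I_p)$, $G(z)'G(z)\sim\text{Wishart}(I_k, h)$ by the definition of the Wishart distribution (see p. 92 in [23]). Let $s^2$ be the sample estimate of the residual variance $\sigma^2$; then the test statistic $Q_{\mathcal{L}}$ becomes

$$Q_{\mathcal{L}} = \max_{x\in\mathcal{L}} \left(M(x)(\hat{\psi}-\psi)\right)' \left(c s^2 M(x) I M(x)'\right)^{-1} \left(M(x)(\hat{\psi}-\psi)\right) = \frac{\sigma^2}{s^2} \max_{w} \left((I_h\otimes w'L)\frac{\hat{\psi}-\psi}{\sigma\sqrt{c}}\right)' \left((I_h\otimes w'L)(I_h\otimes w'L)'\right)^{-1} \left((I_h\otimes w'L)\frac{\hat{\psi}-\psi}{\sigma\sqrt{c}}\right). \tag{A.2}$$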
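By replacing $(\hat{\psi}-\psi)/(\sigma\sqrt{c})$ by $z$ in (A.2), $Q_{\mathcal{L}}$ becomes

$$Q_{\mathcal{L}} = \frac{\sigma^2}{s^2} \max_{w} \frac{w'LG(z)'G(z)L'w}{w'LL'w} = \frac{\sigma^2}{s^2}\lambda_{\max}(D), \tag{A.3}$$

where $D = (LL')^{-1}LG(z)'G(z)L'$ and $w'LL'w$ is a scalar. (For a proof of (A.3), see the result in Problem 22.1 in Rao 1973 [24, p. 74].)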
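The step from (A.2) to (A.3) uses two Kronecker-product identities implicitly. As a sketch, writing $x = L'w$ and assuming that $z$ stacks the rows of $G(z)$ (which is how the relation $M(x)z = G(z)x$ is used in the text), the intermediate algebra is:

$$
\begin{aligned}
(I_h\otimes w'L)(I_h\otimes w'L)' &= I_h\otimes (w'LL'w) = (w'LL'w)\, I_h, \\
(I_h\otimes w'L)\, z &= G(z)L'w, \\
\left(G(z)L'w\right)' \left((w'LL'w)\, I_h\right)^{-1} \left(G(z)L'w\right) &= \frac{w'LG(z)'G(z)L'w}{w'LL'w}.
\end{aligned}
$$

The last expression is a generalized Rayleigh quotient in $w$, which is why its maximum is the largest eigenvalue of $(LL')^{-1}LG(z)'G(z)L'$.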
By the definition of an eigenvalue, it follows that
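$$\left|(LL')^{-1}LG(z)'G(z)L' - \lambda_{\max} I_d\right| = 0. \tag{A.4}$$

Note that $LL'$ is positive definite and symmetric; therefore, $(LL')^{-1}$ is positive definite and symmetric as well, and both $(LL')^{1/2}$ and $(LL')^{-1/2}$ exist.

Now pre-multiply the matrix inside the determinant in (A.4) by $(LL')^{1/2}$ and post-multiply it by $(LL')^{-1/2}$. Because the determinants of these two factors are reciprocals of each other, the determinant remains zero, and we get

$$\left|(LL')^{-1/2}LG(z)'G(z)L'(LL')^{-1/2} - \lambda_{\max} I_d\right| = 0.$$

Let $A = (LL')^{-1/2}LG(z)'G(z)L'(LL')^{-1/2}$; then $\lambda_{\max}(D) = \lambda_{\max}(A)$. Therefore,

$$Q_{\mathcal{L}} = \frac{\sigma^2}{s^2}\lambda_{\max}(A) \sim \frac{\lambda_{\max}(A)}{U/v},$$

where $U = vs^2/\sigma^2$ follows a $\chi^2(v)$ distribution.

Next, we show that $A\sim\text{Wishart}(I_d, h)$. Because $G(z)'G(z)\sim\text{Wishart}(I_k, h)$ and $L$ is a $d\times k$ matrix with rank $d$, by the property of the Wishart distribution (see the theorem in Rencher 1998 [23, page 56]), $LG(z)'G(z)L'\sim\text{Wishart}(LL', h)$. Note that $\text{rank}(LL') = \text{rank}((LL')^{-1}) = \text{rank}(L) = d$. Since $(LL')^{-1}$ is symmetric and positive definite, $(LL')^{-1/2}$ is also symmetric and positive definite, and its rank is $d$. Applying the same property once more with $(LL')^{-1/2}$ in place of $L$ gives the scale matrix $(LL')^{-1/2}LL'(LL')^{-1/2} = I_d$. Hence, $A\sim\text{Wishart}(I_d, h)$.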
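As an informal numerical sanity check of the eigenvalue argument in Part 1 (not part of the proof), the sketch below draws a random basis matrix and a random Gaussian matrix and compares the largest eigenvalue of $D$, the largest eigenvalue of $A$, and a brute-force maximum of the ratio in (A.3) over random $w$. It assumes NumPy is available; the function name `check_eigen_identity` and the randomly generated `L` and `G` are hypothetical stand-ins, not quantities from the paper's examples.

```python
import numpy as np


def check_eigen_identity(h=2, k=4, seed=0, n_w=20000):
    """Return (lambda_max(D), lambda_max(A), max ratio over random w)."""
    rng = np.random.default_rng(seed)
    d = k - h
    L = rng.standard_normal((d, k))        # stand-in basis matrix, rank d almost surely
    G = rng.standard_normal((h, k))        # plays the role of G(z)
    S = L @ G.T @ G @ L.T                  # L G'G L' (d x d, symmetric)
    LLt = L @ L.T

    # D = (LL')^{-1} L G'G L'; it is not symmetric, so use a general eigen-solver.
    D = np.linalg.solve(LLt, S)
    lam_D = float(np.max(np.linalg.eigvals(D).real))

    # Symmetric inverse square root of LL' from its eigendecomposition.
    evals, evecs = np.linalg.eigh(LLt)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    A = inv_sqrt @ S @ inv_sqrt
    lam_A = float(np.linalg.eigvalsh(A)[-1])   # eigvalsh returns ascending order

    # Ratio w'Sw / w'LL'w evaluated at random w: can never exceed lambda_max.
    W = rng.standard_normal((n_w, d))
    num = np.einsum("ij,jk,ik->i", W, S, W)
    den = np.einsum("ij,jk,ik->i", W, LLt, W)
    return lam_D, lam_A, float(np.max(num / den))


if __name__ == "__main__":
    print(check_eigen_identity())
```

The first two returned values should agree up to floating-point error, and the third should approach them from below as the number of random directions grows.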
Part 2.
For the intercept case:
The intercept case can be proved using the same arguments, as indicated by Miller [13, page 113]. Note that, in the intercept case, the rank of $(LL')^{-1/2}$ is $d+1$. Hence, $A\sim\text{Wishart}(I_{d+1}, h)$.
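The distributional result above suggests a simple way to approximate critical values by simulation. The sketch below is a minimal illustration, not the authors' code: it assumes that the critical value is the upper-$\alpha$ quantile of $\lambda_{\max}(A)/(U/v)$ with $A\sim\text{Wishart}(I_{d+1}, h)$ in the intercept case (or $I_d$ without an intercept), $d = k-h$, and that $A$ and $U$ are independent (the usual independence of the coefficient estimates and $s^2$ under normal errors). The function name `gzg_critical_value` and its parameters are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def gzg_critical_value(v, h, k, alpha=0.05, n_sim=100_000, intercept=True, seed=0):
    """Monte Carlo upper-alpha quantile of lambda_max(A) / (U / v)."""
    rng = np.random.default_rng(seed)
    m = (k - h) + (1 if intercept else 0)      # dimension of A: d + 1 or d
    G = rng.standard_normal((n_sim, h, m))     # each G'G is a Wishart(I_m, h) draw
    A = np.transpose(G, (0, 2, 1)) @ G         # batch of m x m matrices
    lam_max = np.linalg.eigvalsh(A)[:, -1]     # eigenvalues are returned in ascending order
    U = rng.chisquare(v, size=n_sim)           # chi-square(v), drawn independently of A
    return float(np.quantile(lam_max / (U / v), 1.0 - alpha))


if __name__ == "__main__":
    # Settings matching one row of Table 5 (v = 100, h = 2, k = 3).
    print(gzg_critical_value(v=100, h=2, k=3, alpha=0.05))
```

Under these assumptions, the output for a given $(v, h, k, \alpha)$ should be comparable to the corresponding entry of Table 5 up to Monte Carlo error, which is larger for small $v$ and small $\alpha$ because the ratio is then heavy-tailed.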
B. Critical Values Based on (10)
See Table 5.
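Table 5: Each critical value is generated by 100,000 simulations.

| v | h | k | α = 0.01 | α = 0.05 | α = 0.1 |
|---|---|---|---|---|---|
| 2 | 2 | 3 | 361.2 | 69.40 | 33.57 |
| 3 | 2 | 3 | 101.7 | 32.31 | 18.99 |
| 4 | 2 | 3 | 58.37 | 23.10 | 14.80 |
| 5 | 2 | 3 | 40.90 | 18.71 | 12.71 |
| 6 | 2 | 3 | 33.96 | 16.22 | 11.47 |
| 7 | 2 | 3 | 27.87 | 14.77 | 10.58 |
| 8 | 2 | 3 | 25.05 | 13.72 | 10.01 |
| 9 | 2 | 3 | 23.13 | 13.04 | 9.65 |
| 10 | 2 | 3 | 21.18 | 12.45 | 9.37 |
| 20 | 2 | 3 | 17.14 | 10.53 | 8.18 |
| 30 | 2 | 3 | 15.04 | 10.06 | 7.75 |
| 40 | 2 | 3 | 14.16 | 9.26 | 7.28 |
| 50 | 2 | 3 | 13.78 | 9.18 | 7.53 |
| 60 | 2 | 3 | 13.33 | 9.09 | 7.28 |
| 70 | 2 | 3 | 13.42 | 9.09 | 7.29 |
| 80 | 2 | 3 | 12.84 | 8.95 | 7.22 |
| 90 | 2 | 3 | 12.90 | 8.98 | 7.18 |
| 100 | 2 | 3 | 12.76 | 9.00 | 7.29 |
| 2 | 2 | 4 | 473.9 | 95.48 | 46.42 |
| 3 | 2 | 4 | 142.5 | 45.26 | 26.65 |
| 4 | 2 | 4 | 74.17 | 30.73 | 20.11 |
| 5 | 2 | 4 | 53.50 | 24.96 | 17.17 |
| 6 | 2 | 4 | 43.52 | 21.75 | 15.41 |
| 7 | 2 | 4 | 35.82 | 19.17 | 14.10 |
| 8 | 2 | 4 | 31.84 | 18.08 | 13.40 |
| 9 | 2 | 4 | 29.91 | 17.17 | 12.88 |
| 10 | 2 | 4 | 25.82 | 15.81 | 12.02 |
| 20 | 2 | 4 | 19.22 | 13.10 | 10.54 |
| 30 | 2 | 4 | 18.27 | 12.30 | 10.06 |
| 40 | 2 | 4 | 16.58 | 11.87 | 9.75 |
| 50 | 2 | 4 | 16.23 | 11.62 | 9.54 |
| 60 | 2 | 4 | 16.01 | 11.54 | 9.48 |
| 70 | 2 | 4 | 15.44 | 11.44 | 9.49 |
| 80 | 2 | 4 | 16.05 | 11.50 | 9.43 |
| 90 | 2 | 4 | 15.24 | 11.26 | 9.28 |
| 100 | 2 | 4 | 15.61 | 11.15 | 9.39 |
| 2 | 3 | 4 | 513.8 | 98.84 | 46.93 |
| 3 | 3 | 4 | 139.7 | 44.78 | 26.45 |
| 4 | 3 | 4 | 76.92 | 31.03 | 20.09 |
| 5 | 3 | 4 | 53.05 | 24.64 | 16.92 |
| 6 | 3 | 4 | 42.91 | 21.71 | 15.38 |
| 7 | 3 | 4 | 36.41 | 19.55 | 14.20 |
| 8 | 3 | 4 | 32.46 | 18.06 | 13.42 |
| 9 | 3 | 4 | 29.49 | 16.94 | 12.84 |
| 10 | 3 | 4 | 26.53 | 16.05 | 12.28 |
| 20 | 3 | 4 | 20.14 | 13.28 | 10.70 |
| 30 | 3 | 4 | 17.63 | 12.28 | 9.96 |
| 40 | 3 | 4 | 16.88 | 12.06 | 9.94 |
| 50 | 3 | 4 | 16.85 | 11.87 | 9.56 |
| 60 | 3 | 4 | 15.92 | 11.39 | 9.35 |
| 70 | 3 | 4 | 16.00 | 11.66 | 9.50 |
| 80 | 3 | 4 | 16.16 | 11.33 | 9.55 |
| 90 | 3 | 4 | 15.70 | 11.24 | 9.37 |
| 100 | 3 | 4 | 15.07 | 11.09 | 9.17 |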
C. Proof of (13)
Note that
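$$\max_{\psi\in C_\psi}\ \max_{\{x:\, M(x)\psi = 0\}} Q(x;\hat{\psi}) = \max_{\{(x,\psi):\, M(x)\psi = 0,\ \psi\in C_\psi\}} Q(x;\hat{\psi}). \tag{C.1}$$

Adapting the proof of Theorem 2.1 in Peterson et al. [25], it follows that for any critical value, $b_\alpha$,

$$C_x(b_\alpha) = \{x : M(x)\psi = 0,\ \psi\in C_\psi(b_\alpha)\}. \tag{C.2}$$

Since $Q(x;\hat{\psi})$ is not a function of $\psi$, it follows directly that

$$\max_{\{(x,\psi):\, M(x)\psi = 0,\ \psi\in C_\psi\}} Q(x;\hat{\psi}) = \max_{x\in C_x(b_\alpha)} Q(x;\hat{\psi}). \tag{C.3}$$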
So (13) follows directly from (C.1) and (C.3).
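D. The Proof That $C_x(c_\alpha) = C_{\mathcal{L}}(c_\alpha)$
Recall that $C_x(c_\alpha) = \{x : Q(x;\hat{\psi})\le c_\alpha\}$ and $C_{\mathcal{L}}(c_\alpha) = \{\mathcal{L} : Q(x;\hat{\psi})\le c_\alpha \text{ for all } x\in\mathcal{L}\}$. By definition, $x\in C_{\mathcal{L}}(c_\alpha)$ implies that $x\in C_x(c_\alpha)$. Next, we show that if $x\in C_x(c_\alpha)$, then $x\in C_{\mathcal{L}}(c_\alpha)$. This can be proved by contradiction. Suppose that there exists some $x$ such that $x\in C_x(c_\alpha)$ but $x\notin C_{\mathcal{L}}(c_\alpha)$. Then there exists at least one point, say $x^*$, in $C_x(c_\alpha)$ for which no linear subspace satisfies both of the following conditions: (1) it contains $x^*$; (2) $Q(x;\hat{\psi})\le c_\alpha$ for every point $x$ in the subspace. Define $C_\psi(c_\alpha) = \{\psi : (\hat{\psi}-\psi)'\hat{V}_{\hat{\psi}}^{-1}(\hat{\psi}-\psi)\le c_\alpha\}$ and consider the set $C_x^* = \{x : M(x)\psi = 0 \text{ for some } \psi\in C_\psi(c_\alpha)\}$. It follows, using a proof analogous to that of Theorem 2.1 in Peterson et al. [25], that $C_x(c_\alpha) = C_x^*$. Therefore, $x^*\in C_x^*$. This implies that there exists a $\psi^*\in C_\psi(c_\alpha)$ such that $M(x^*)\psi^* = 0$. If $h<k$, then there exists a line or hyperplane $\mathcal{L}^*$ such that all the points $x$ that satisfy $M(x)\psi^* = 0$ are in $\mathcal{L}^*$. Hence, $x^*\in\mathcal{L}^*$. Again using the proof of Theorem 2.1 in Peterson et al. [25], it follows that for any given point $x$, $Q(x;\hat{\psi})\le c_\alpha$ if and only if $M(x)\psi = 0$ for some $\psi\in C_\psi(c_\alpha)$. Therefore, $Q(x;\hat{\psi})\le c_\alpha$ for all the points $x$ in $\mathcal{L}^*$. In other words, $\mathcal{L}^*$ satisfies the above two conditions, which is a contradiction.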
References
[1] Nair V. N. Taguchi's parameter design: a panel discussion. 1992, vol. 34, no. 2, pp. 127-161.
[2] Robinson T., Borror C., Myers R. H. Robust parameter design: a review. 2004, vol. 20, no. 1, pp. 81-101.
[3] Taguchi G. White Plains, NY, USA: UNIPUB/Kraus International, 1986.
[4] Myers R. H., Khuri A. I., Vining G. G. Response surface alternatives to the Taguchi robust parameter design approach. 1992, vol. 46, no. 2, pp. 131-139.
[5] Myers R. H., Montgomery D. C., Anderson-Cook C. M. 3rd ed. New York, NY, USA: John Wiley & Sons, 2009.
[6] Myers R. H., Kim Y., Griffiths K. L. Response surface methods and the use of noise variables. 1997, vol. 29, no. 4, pp. 429-440.
[7] Kunert J., Auer C., Erdbrügge M., Ewers R. An experiment to compare Taguchi's product array and the combined array. 2007, vol. 39, no. 1, pp. 17-34.
[8] Robinson T. J., Wulff S. S., Montgomery D. C., Khuri A. I. Robust parameter design using generalized linear mixed models. 2006, vol. 38, no. 1, pp. 65-75.
[9] Goldfarb H. B., Borror C. M., Montgomery D. C., Anderson-Cook C. M. Using genetic algorithms to generate mixture-process experimental designs involving control and noise variables. 2005, vol. 37, no. 1, pp. 60-74.
[10] Del Castillo E., Cahya S. A tool for computing confidence regions on the stationary point of a response surface. 2001, vol. 55, no. 4, pp. 358-365. doi: 10.1198/000313001753272349.
[11] Box G. E. P., Jones S. Design products that are robust to environment. 1990, no. 56. Madison, WI, USA: Center for Quality and Productivity Improvement, University of Wisconsin.
[12] Lucas J. M. How to achieve a robust process using response surface methodology. 1994, vol. 26, no. 4, pp. 248-260.
[13] Miller R. G. 2nd ed. New York, NY, USA: Springer, 1981.
[14] Timm N. H. New York, NY, USA: Springer, 2002.
[15] Cheng A. Confidence Regions for Optimal Control Variables for Robust Parameter Design Experiments. Temple University, Department of Statistics, 2011.
[16] Clarke G. P. Y. Approximate confidence limits for a parameter function in nonlinear regression. 1987, vol. 82, no. 397, pp. 221-230.
[17] Kotz S., Johnson N. L. Multivariate t-distribution. 1982, vol. 6, pp. 129-130. Hoboken, NJ, USA: John Wiley & Sons.
[18] Montgomery D. C. 7th ed. New York, NY, USA: John Wiley & Sons, 2009.
[19] Kristensen J. B., Xu X., Mu H. Process optimization using response surface design and pilot plant production of dietary diacylglycerols by lipase-catalyzed glycerolysis. 2005, vol. 53, no. 18, pp. 7059-7066. doi: 10.1021/jf0507745.
[20] Wu J., Hamada M. New York, NY, USA: John Wiley & Sons, 2000.
[21] Singh J., Frey D. D., Soderborg N., Jugulum R. Compound noise: evaluation as a robust design method. 2007, vol. 23, no. 3, pp. 387-398. doi: 10.1002/qre.812.
[22] Hou X. S. On the use of compound noise factor in parameter design experiments. 2002, vol. 18, no. 3, pp. 225-243. doi: 10.1002/asmb.475.
[23] Rencher A. C. New York, NY, USA: John Wiley & Sons, 1998.
[24] Rao C. R. 2nd ed. New York, NY, USA: John Wiley & Sons, 1973.
[25] Peterson J. J., Cahya S., Del Castillo E. A general approach to confidence regions for optimal factor levels of response surfaces. 2002, vol. 58, no. 2, pp. 422-431.