A Confidence Region for Zero-Gradient Solutions for Robust Parameter Design Experiments

One of the key issues in robust parameter design is to configure the controllable factors to minimize the variance due to noise variables. However, it can sometimes happen that the number of control variables is greater than the number of noise variables. When this occurs, two important situations arise. One is that the variance due to noise variables can be brought down to zero. The second is that multiple optimal control variable settings become available to the experimenter. A simultaneous confidence region for such a locus of points not only provides a region of uncertainty about such a solution, but also provides a statistical test of whether or not such points lie within the region of experimentation or a feasible region of operation. However, this situation requires a confidence region for the multiple-solution factor levels that provides proper simultaneous coverage. This requirement has not been previously recognized in the literature. In the case where the number of control variables is greater than the number of noise variables, we show how to construct critical values needed to maintain the simultaneous coverage rate. Two examples are provided as a demonstration of the practical need to adjust the critical values for simultaneous coverage.


Introduction
Robust Parameter Design (RPD) is also called Robust Design or Parameter Design in the literature [1,2]. The concept of RPD was introduced in the United States by Genichi Taguchi in the early 1980s. It is a methodology that takes both the mean and variance into consideration for product or process optimization. Taguchi [3] divided the predictor variables into two categories: control variables and noise variables. Control variables are easy to control while noise variables are either difficult to control or uncontrollable at a large scale. In practice, we would like to find a range of control variables such that (1) the variance caused by the change of noise variables is minimized and (2) the mean response is close to target. Multiple optimization design and analysis methods have been developed to achieve these two goals simultaneously, ranging from the traditional Taguchi methods to the more sophisticated response surface alternatives. (See [2,4,5] for detailed reviews on these methods.) Under certain conditions, Myers et al. [6] proposed a way to construct a confidence region of the control variables where the variability transmission by the noise variables is minimized to zero. Although they only focused on the variance part, there are many such situations in which focus is placed entirely on the process variance (see p506 in [5]). For instance, if the process mean can be optimized using certain control factors that do not interact with noise variables or impact the noise variance (i.e., "tuning factors"), then one can seek other control factors which can drive the noise variance to zero. Even if the process mean cannot be driven to the target by tuning factors alone, it is nonetheless illuminating to consider both the confidence region for the minimum process variance and the confidence region for the optimal process mean [5,6]. 
If the number of noise variables is not greater than the number of control factors (for examples, see [7][8][9] and see pages 491-492, 499 in [5]), then the conditions for a minimum process variance and a zero-gradient solution will coincide.
International Journal of Quality, Statistics, and Reliability
Furthermore, a zero-gradient solution is quite useful in that, even for very large noise variation, the transmission of that noise to the output of the process can be made negligible (or very small) by utilization of zero-gradient (or near-zero-gradient) operating conditions. Therefore, it is desirable to have a way to statistically test for the existence of such a solution within the experimental region or a region of feasible operation (see p506 in [5]). A simultaneous confidence region for the locus of points forming a zero-gradient solution forms such a test and also provides a graphical measure of uncertainty about the zero-gradient solution.
The existence of a zero-gradient solution is especially interesting for mixture experiments with control factor ingredients and noisy process variables. If a zero-gradient solution for the ingredient mixture can be found, that is, within the experimental region, and the confidence region for the zero-gradient solution intersects one or more of the mixture simplex boundaries, then this implies that it may be possible to remove one or more mixture ingredients and still maintain very low noise variance. Removal of one or more mixture ingredients may help to reduce production cost [10].
Myers et al. [6] proposed a confidence region in control variables based upon the standard response surface model for incorporating noise variables (see, for example, [4,11,12]):
$$y = \beta_0 + x'\beta + x'Bx + z'\gamma + x'\Delta z + \varepsilon, \tag{1}$$
where $x$ is a vector of $k$ control variables, $z$ is a vector of $h$ noise factors, $\beta_0$ is the intercept, $\beta$ is a $k \times 1$ vector of coefficients for the main effects of control variables, $\gamma$ is an $h \times 1$ vector of coefficients for the main effects of noise variables, $B$ is a $k \times k$ matrix whose diagonals are the coefficients for the quadratic terms of control variables and whose off-diagonals are one-half of the control variable interaction effects, and $\Delta$ is a $k \times h$ matrix of control by noise variable interaction effects. $\varepsilon$ is the random error term. It is assumed here that $\varepsilon \sim N(0, \sigma^2)$. Assuming that noise variables have mean zero and variance-covariance matrix $V_z$, the variance of the response in Model (1) is
$$\operatorname{Var}(y) = (\gamma + \Delta'x)'\,V_z\,(\gamma + \Delta'x) + \sigma^2. \tag{2}$$
Here the variance of the response is divided into two parts: the variance transmitted by the noise variables (represented by the first term) and the constant variance ($\sigma^2$) due to modeling error and other factors not considered in the model. In other words, it is the noise variables that lead to the variance heterogeneity in the response. Since the changes of noise variables are inevitable in practice, Myers et al. [6] proposed that the "minimum process variance" can be reached by setting the slope of the noise variables, $\gamma + \Delta'x$, equal to zero and, therefore, eliminating the noise variance part from the response variance. A confidence region for such control variable values can be constructed by inverting a hypothesis test of the form
$$H_0\colon \gamma + \Delta'x = 0 \tag{3}$$
for each x-point. To simplify the notation, let $\psi$ be a $p \times 1$ vector that contains all the elements of the noise variable's main effect vector $\gamma$ and the interaction matrix $\Delta$, that is, $\psi = (\gamma_1, \delta_{11}, \ldots, \delta_{1k}, \ldots, \gamma_h, \delta_{h1}, \ldots, \delta_{hk})'$, so that $p = h(k+1)$. The hypothesis in (3) can then be written as
$$H_0\colon M(x)\psi = 0, \tag{4}$$
where $M(x)$ is an $h \times p$ matrix and $M(x)\psi = \gamma + \Delta'x$.
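As a small illustration of the $\psi$ and $M(x)$ notation (a constructed case, not taken from the paper), consider $k = 2$ control variables and $h = 2$ noise variables, with $\delta_{ij}$ taken to be the coefficient linking noise variable $i$ to control variable $j$. Under that assumed ordering, $p = 6$ and the pieces fit together as follows:

```latex
\psi = (\gamma_1, \delta_{11}, \delta_{12}, \gamma_2, \delta_{21}, \delta_{22})',
\qquad
M(x) =
\begin{pmatrix}
1 & x_1 & x_2 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & x_1 & x_2
\end{pmatrix},
\qquad
M(x)\psi =
\begin{pmatrix}
\gamma_1 + \delta_{11}x_1 + \delta_{12}x_2 \\
\gamma_2 + \delta_{21}x_1 + \delta_{22}x_2
\end{pmatrix}
= \gamma + \Delta' x .
```

Each row of $M(x)$ carries the block $(1, x_1, x_2)$ in the columns belonging to one noise variable, so the zero-gradient condition is two linear equations in two unknowns and, when the equations are independent, has a single solution point.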
Now the $1 - \alpha$ confidence region in control variables for zero variance due to noise variables can be defined as
$$C_x = \{x : H_0\colon M(x)\psi = 0 \text{ is not rejected at level } \alpha\}. \tag{5}$$
Furthermore, let $Q(x; \hat\psi)$ denote the test statistic for $H_0$, which is
$$Q(x; \hat\psi) = \hat\psi' M(x)' \left[M(x)\hat V_\psi M(x)'\right]^{-1} M(x)\hat\psi, \tag{6}$$
where $\hat\psi$ is the estimate of $\psi$ and $\hat V_\psi$ is the usual unbiased estimate of $V_\psi$, the variance of $\hat\psi$. Let $\hat\Psi \sim N(\psi, V_\psi)$. Myers et al. [6] have shown that, for each fixed $x$, $Q(x; \hat\Psi) \sim hF(h, v)$, where $\hat\Psi$ is a vector of random variables (i.e., $\hat\Psi$ is an estimator of $\psi$, whereas $\hat\psi$ is an estimate from the actual data), $v$ is the residual degrees of freedom (df), and $F(h, v)$ is the F distribution with numerator df equal to $h$ and denominator df equal to $v$. They then conclude that the $100(1 - \alpha)$ percent confidence region in (5) is
$$C_x(c_\alpha) = \{x : Q(x; \hat\psi) \le c_\alpha\}, \tag{7}$$
where the critical value is $c_\alpha = hF(1 - \alpha, h, v)$. Note that the confidence region in (7) (called the "MKG confidence region" from now on) and the critical value $c_\alpha = hF(1 - \alpha, h, v)$ were derived based on two critical assumptions: (1) the minimum of the variance due to noise variables is zero; (2) the solution to the zero-gradient equation in (4) is unique. There are situations where the first assumption cannot be met because the solution to (4) is either outside the experimental region or does not exist. (However, the approach proposed in this article may allow the existence of such a solution to be assessed in a statistically rigorous way.) Assuming that the first assumption is met (as in the two examples in Section 3), the second assumption is only true when $h \ge k$. Notice that (4) represents a system of $h$ equations with $k$ unknown control variables. As recognized by Myers et al. (see page 506 in [5]), the equation will result in a single point solution when $h = k$, a line or hyperplane when $h < k$, and a single point solution or no solution when $h > k$. In other words, the MKG confidence region provides the correct critical value for the zero-gradient solution (if it exists) only when there are at least as many noise variables as control variables.
However, when the number of noise variables is less than the number of control variables, multiple solutions can exist to the zero-gradient equation in (4). In such situations, use of the MKG region will provide below nominal simultaneous coverage. As such, a confidence region which covers all the solutions simultaneously needs to be developed. In practice, statistical inference for the multiple-solution problems is important as this gives the experimenter more options with regard to finding the zero-gradient factor settings. The objective of this paper is to generalize the MKG confidence region such that it will provide adequate coverage for both the single-solution case (where h ≥ k) and the multiple-solution case (where h < k).
The rest of this paper will be organized as follows. Section 2 will focus on the derivation of such a generalized confidence region and the corresponding critical value required for inverting the associated null hypothesis. In Section 3, we give two examples to demonstrate the difference in simultaneous coverage between the MKG confidence region and our proposed confidence region when the number of control factors exceeds the number of noise variables. Section 4 provides a summary of the results.

The Multiple Zero-Gradient Solution Problem.
To address the multiple-solution situation, the hypothesis in (4) is generalized to $H_0\colon M(x)\psi = 0$ for all $x \in L$, where $L$ is the linear subspace representing either a unique single solution (i.e., a point) or multiple solutions (i.e., a line or a hyperplane). In other words, the confidence region could be a collection of either points or linear subspaces (of dimension ≥ 1) depending on whether the solution to the equation $M(x)\psi = 0$ is unique or not. Therefore, we propose to generalize the MKG confidence region to
$$C_L(c_\alpha) = \{L : Q(x; \hat\psi) \le c_\alpha \text{ for all } x \in L\}, \tag{8}$$
where $L$ represents the linear subspace of the space defined by the elements of $x$, which are solutions to $\gamma + \Delta'x = 0$, and $d$ denotes the dimension of $L$.
In this section, we derive values for $c_\alpha$ in (8). When $d > 0$ and $L$ is, therefore, not a point, computation of the confidence region in (8) may appear difficult due to the replacement of x-points by $L$ subspaces. However, in this section, we will also show that the confidence region in (8) is equivalent to one based on pointwise gridding (of the type done for the MKG confidence region computations). We call the confidence region given in (8) the generalized zero-gradient (GZG) confidence region. As indicated by the definition, the MKG confidence region is a special case of the GZG confidence region where $L$ is a point and $d = 0$. In that case the MKG confidence region is correct, and the critical value is $c_\alpha = hF(1 - \alpha, h, v)$ if a solution exists. (If $h \ge k$ and the MKG region is the null set, then there is statistically significant evidence that a solution does not exist.) The next question is what value $c_\alpha$ should take when $d > 0$. For $d > 0$, note that a $100(1 - \alpha)\%$ GZG confidence region should contain the zero-gradient solution set $L$ with probability $1 - \alpha$ before the experiment is performed. It is worth pointing out that when $h < k$, the GZG confidence region in (8) is a simultaneous confidence region problem in that a line or hyperplane will be included in the confidence region only if all the points on the line or hyperplane satisfy the criterion in (8).
Therefore, the GZG confidence region in (8) can also be expressed as
$$C_L(c_\alpha) = \{L : Q_L \le c_\alpha\}, \tag{9}$$
where $Q_L = \max_{x \in L} Q(x; \hat\psi)$ and $L = \{x : M(x)\psi = 0\}$. To find the critical value $c_\alpha$ for $h < k$, we need to first investigate the distribution of the test statistic $Q_L$ when $H_0$ is true. For the $h = 1$ case, $M(x) = (1, x')$, which is a $1 \times p$ vector. Based on Miller's theorem (see p65 and p113 in [13]), the critical value should be $c_\alpha = kF(1 - \alpha, k, v)$ (for example, $2F(1 - \alpha, 2, v)$ when $k = 2$). For the $k > h > 1$ case, the distribution of $Q_L$ is more complex. Section 2.2 addresses the full model in (1) where the experimental design is completely orthogonal or partially orthogonal so that $V_\psi = c\sigma^2 I$ for some positive constant $c$, residual variance $\sigma^2$, and an identity matrix $I$ of dimension $p$. Here, an exact simultaneous confidence region is derived. For the general case, Section 2.3 proposes a simulation method based upon the multivariate t-distribution to find approximate critical values with which to construct the confidence region.

Full Model-Orthogonal Case.
Here, we assume that the data are generated from an orthogonal design or partially orthogonal design such that $V_\psi = c\sigma^2 I$. Furthermore, it is assumed that we have a full noise-control variable interaction model, meaning that each noise variable interacts with the same set of control variables, that is, each element of the interaction matrix $\Delta$ is nonzero. If $V_\psi = c\sigma^2 I$, $c > 0$, and all the elements of the interaction matrix $\Delta$ are nonzero, then, for $k > h > 1$, the test statistic $Q_L$ has the same distribution as a function of a chi-square random variable and a random Wishart matrix (which are stochastically independent) as shown below:
$$Q_L \sim \frac{\lambda_{\max}(A)}{U/v}, \tag{10}$$
where $A \sim \text{Wishart}(I_{d+1}, h)$, $U \sim \chi^2(v)$, and $v$ is the residual degrees of freedom. Here, $I_{d+1}$ is a $(d + 1) \times (d + 1)$ identity matrix, where $d$ is the same as defined in Section 2.1. The degrees of freedom of the Wishart distribution is $h$, and $\lambda_{\max}(A)$ is the maximum eigenvalue of the matrix $A$. The proof of this result is provided in Appendix A. Using (10), the critical value, $c_\alpha$, can then be computed as the $100(1-\alpha)$th percentile of the distribution of $Q_L$, which can be obtained by simple Monte Carlo simulation from the $\chi^2$ and Wishart distributions. Some limited tables of critical values are given in Appendix B based on (10), although computation of the critical value for any specific case using the random variable in (10) is easily accomplished. Note that the critical value determined by (10) reduces to the MKG critical value when $d = 0$. In other words, $A$ is not a matrix anymore but a $\chi^2(h)$ random variable. Hence, $\lambda_{\max}(A) = A$. So $Q_L$ can then be written as
$$Q_L = \frac{A}{U/v} \sim hF(h, v). \tag{11}$$
Furthermore, the same result in (10) holds when $[\hat\gamma, \hat\Delta']$ has an $h \times (k + 1)$ matrix normal distribution. (See p90-91 in [14].) (For details, see [15].)
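The Monte Carlo computation described above can be sketched in a few lines. The sketch below is only an illustration, not the authors' code: it assumes the random variable in (10) has the form v·λmax(A)/U with A ∼ Wishart(I_{d+1}, h) and U ∼ χ²(v), and it is restricted to d = 1 so that the largest eigenvalue of the 2 × 2 matrix A has a closed form. The function names `simulate_critical_value` and `two_f_quantile` are hypothetical.

```python
import math
import random

def simulate_critical_value(h, v, alpha=0.05, n_draws=100_000, seed=1):
    """Monte Carlo percentile of v * lambda_max(A) / U for the d = 1 case.

    A = G'G with G an h x 2 matrix of independent N(0, 1) entries, so that
    A ~ Wishart(I_2, h); U ~ chi-square(v) is drawn independently of A.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        a = b = c = 0.0  # entries of the symmetric matrix A = [[a, b], [b, c]]
        for _ in range(h):
            g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            a += g1 * g1
            b += g1 * g2
            c += g2 * g2
        # Closed-form largest eigenvalue of a symmetric 2 x 2 matrix.
        lam_max = 0.5 * (a + c) + math.sqrt(0.25 * (a - c) ** 2 + b * b)
        u = rng.gammavariate(v / 2.0, 2.0)  # chi-square(v) = Gamma(v/2, scale 2)
        draws.append(v * lam_max / u)
    draws.sort()
    return draws[int(math.ceil((1.0 - alpha) * n_draws)) - 1]

def two_f_quantile(p, v):
    """Exact p-quantile of 2 * F(2, v), available in closed form."""
    return v * ((1.0 - p) ** (-2.0 / v) - 1.0)
```

With h = 1 the matrix A has rank one and its largest eigenvalue is a χ²(2) variable, so the simulated percentile should agree with `two_f_quantile(1 - alpha, v)` up to Monte Carlo error; this gives a convenient sanity check before the routine is used with h = 2.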

The General Case.
In some cases, the experimental design may be such that $\operatorname{Var}(\hat\psi)$ does not have the orthogonal $c\sigma^2 I$ form, or we may wish to use a model with some "control × noise variable" interaction terms deleted, that is, $\Delta$ has some zero elements. In such situations, when $k > h > 1$, the distribution of $Q_L$ does not have a simple form and may depend upon $\psi$ even under $H_0$. Nonetheless, in such situations, it is still possible to obtain approximately conservative simultaneous confidence regions for control variables associated with zero-gradient solutions. We provide such a construction as follows.
Recall that $Q_L$ in (9) is a function of $\psi$, and consider the quantity $Q_{\max}$ defined in (12). $Q_{\max}$ is an approximate upper confidence bound for the scalar-valued quantity $Q_L$. (See Clarke [16] for a discussion of confidence bounds on nonlinear functions of model parameters constructed from confidence regions.) Let $c^*_\alpha$ denote the $100(1-\alpha)$th percentile of the distribution of $Q_{\max}$ under $H_0$. Consider the confidence region, $C_{\max}$, defined by this critical value. This confidence region should provide (at least approximately) a conservative simultaneous confidence region for the zero-gradient solutions. However, computation of $c^*_\alpha$ (using (12)) and the associated confidence region is numerically difficult due to the complex constraints associated with the definition of $Q_{\max}$. Fortunately, it can be shown that
$$Q_{\max} = \max_{x \in C_x(b_\alpha)} Q(x; \hat\Psi), \tag{13}$$
where $C_x(b_\alpha) = \{x : Q(x; \hat\psi) \le b_\alpha\}$. A proof is given in Appendix C. The expression for $Q_{\max}$ in (13) allows for much easier computation of the $c^*_\alpha$ critical value. The actual construction of the GZG confidence region from the relevant critical value will be outlined in Section 2.4.

The Critical Value Computation for the General Case.
Note that under $H_0\colon M(x)\psi = 0$ we can express $Q(x; \hat\Psi)$ as
$$Q(x; \hat\Psi) = t' M(x)' \left[M(x)\,\Omega\, M(x)'\right]^{-1} M(x)\, t, \tag{14}$$
where $\hat V_\psi = s^2\Omega$, $s^2$ is the mean squared error, $\Omega$ is a known matrix computed from the design matrix, and $t = (\hat\Psi - \psi)/s$. Here, $t$ follows the multivariate t distribution with location parameter equal to zero, scale matrix $\Omega$, and degrees of freedom $v$. Using (14), we can then compute the critical value, $c^*_\alpha$, by Monte Carlo simulation as follows.
Step 1.
Step 2. Simulate a multivariate t random vector (rv) with scale matrix $\Omega$ and $v$ df. (This can be done by simulation of a multivariate normal rv with mean vector 0 and variance-covariance matrix $\Omega$ and a chi-square random variable with $v$ df. See [17] for details.)
Step 3. Compute $Q_{\max}$ using the expressions in (13) and (14). (For practical reasons, computation of $Q_{\max}$ can be done by maximizing $Q(x; \hat\Psi)$ over $C_x(b_\alpha) \cap R$, where $R$ is a prespecified, bounded region. This will calibrate the coverage to be simultaneous only over $L_0 \cap R$, where $L_0$ is the true linear subspace such that $M(x)\psi = 0$.)
Step 4. Do Steps 2-3 a large number of times to estimate the $100(1-\alpha)$th percentile of the Monte Carlo distribution of $Q_{\max}$. This $100(1-\alpha)$th percentile is then a Monte Carlo estimate of $c^*_\alpha$.
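Steps 2 to 4 can be sketched as follows for the simplest setting. This is a minimal illustration under stated assumptions, not the authors' implementation: it takes h = 1, so that under the null hypothesis the statistic for a given x reduces to (M(x)t)² / (M(x)ΩM(x)′), it assumes a diagonal Ω, and it receives the set of x-points to maximize over as a precomputed grid of M(x) rows (standing in for whatever bounded set is chosen in Step 3). The function name `mc_critical_value` and its arguments are hypothetical.

```python
import math
import random

def mc_critical_value(rows, omega_diag, v, alpha=0.05, n_draws=4000, seed=1):
    """Estimate the 100(1 - alpha)th percentile of max_x Q(x; t) when h = 1.

    rows       : list of M(x) rows (each of length p), one per grid point
    omega_diag : diagonal of the scale matrix Omega (assumed diagonal here)
    v          : residual degrees of freedom
    """
    rng = random.Random(seed)
    # Precompute M(x) Omega M(x)' for every grid point.
    denoms = [sum(w * m * m for w, m in zip(omega_diag, row)) for row in rows]
    sds = [math.sqrt(w) for w in omega_diag]
    maxima = []
    for _ in range(n_draws):
        # Step 2: multivariate t = N(0, Omega) / sqrt(chi-square(v) / v).
        scale = math.sqrt(rng.gammavariate(v / 2.0, 2.0) / v)
        t = [rng.gauss(0.0, sd) / scale for sd in sds]
        # Step 3: maximize Q(x; t) = (M(x) t)^2 / (M(x) Omega M(x)') over the grid.
        q_max = max(
            sum(m * ti for m, ti in zip(row, t)) ** 2 / d
            for row, d in zip(rows, denoms)
        )
        maxima.append(q_max)
    # Step 4: empirical percentile of the simulated maxima.
    maxima.sort()
    return maxima[int(math.ceil((1.0 - alpha) * n_draws)) - 1]
```

Passing a single row reproduces the pointwise case, where the percentile should be close to F(1 − α, 1, v); passing many rows along a line gives a larger value, which is the adjustment for simultaneous coverage that this section is concerned with.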

The Coverage Rate of the Critical Value.
In order to check the accuracy of c * α as a critical value, we have done some Monte Carlo simulations of the above four-step procedure using three different noise variable models in conjunction with both orthogonal and nonorthogonal designs. The statistical models used are summarized in Table 1. These models are constructed so that the zero-gradient solution exists in the experimental region. Three partially orthogonal, face-centered central composite experimental designs were assessed, with associated statistical models 1, 2, and 3, respectively. These designs employed a coded factor space with factor levels equal to ±1 (except for the center points). The axial points in noise variables are deleted to maintain partial orthogonality. The factorial part of the designs is either full factorial (e.g., model 1 and model 2) or half factorial (e.g., model 3). The nonorthogonal designs are constructed by changing the (one) factorial point (comprised of all −1s) from (−1, −1, . . . , −1, −1) to (−1, −1, . . . , −1, 0). The resultant sample size (n) and the residual df (ν) of each composite design are both listed in Table 2. For demonstration purposes, we simply chose model parameters to be either 1 or −1, with residual error variance equal to 1.
The results of these coverage rate simulations are summarized in Table 2. The coverage rates were computed as follows. The models in Table 1 were used to simulate datasets and to compute the percentage of times that the event
$$\max_{x \in L_0 \cap E} Q(x; \hat\psi) \le c^*_\alpha \tag{15}$$
occurred, where $E$ is the convex hull formed by the ±1 factor levels. For each simulated dataset, the critical value $c^*_\alpha$ was computed using 1000 Monte Carlo simulations. 5000 simulations were done to assess the simultaneous coverage of the GZG confidence region for the set $L_0 \cap E$. In an attempt to reduce the conservatism of the above approach for computing $c^*_\alpha$, we also considered the approximate approach obtained by maximizing $Q(x; \hat\Psi)$ over $L_0(\hat\psi) \cap R$, where $L_0(\hat\psi) = \{x : M(x)\hat\psi = 0\} = C_x(0)$. We denote this approximate critical value by c α and use it in place of $c^*_\alpha$ to reduce conservatism.

Remark 1.
Because C x (b α ) is a function of the data, the relatively large region, R, was chosen for these simulations so that C x (b α ) ∩ R would be extremely unlikely to be empty for any simulated dataset. In addition, we did not want to rule out situations where the confidence region was outside the experimental region. While, in practice, such extrapolated inferences must be treated with caution, nonetheless it may be desired to compute such a confidence region. Such a confidence region outside the experimental region suggests that it may not be possible to obtain a "zero-gradient" solution for noise transmission, at least within the current experimental region. However, such a confidence region just outside the experimental region may offer hope that resetting process control conditions may allow for a more robust process. Of course, additional experiments outside the current experimental region would be needed to confirm this.

Remark 2.
Maximization of Q(x; Ψ) over C x (b α )∩R, to compute the Monte Carlo critical value, c * α , was accomplished by using the SAS/IML Nelder-Mead simplex algorithm, nlpnms. This was done to make the Monte Carlo simulations of this Monte Carlo procedure tractable. Some limited simulations were also done whereby the maximization of Q(x; Ψ) over C x (b α ) ∩ R was computed by gridding instead. This was done to make sure that the Nelder-Mead algorithm did not stop its maximization prematurely. In all cases, each critical value, c * α , computed using nlpnms, was slightly larger than that obtained using gridding. (Random number seeds were aligned to avoid the Monte Carlo differences in the comparisons between gridding and the use of the Nelder-Mead simplex algorithm.) For the approximate approach, maximization over L 0 ( ψ) ∩ R was done by gridding as this was easier to accomplish with finer gridding. Table 2 below displays the percent of times the event in (15) occurred for each of the three models with and without an orthogonal design. If the event in (15) occurs, then that portion of the true linear subspace, L 0 (within E), is entirely covered by the GZG confidence region; otherwise it is not. Table 2 indicates that the simultaneous coverage rate of the GZG confidence region using the conservative critical value, c * α , produces reasonably conservative results, while the approximate approach (that maximizes over L 0 ( ψ) ∩ R, instead) achieves closer to nominal (yet slightly conservative) coverage rates. It is interesting to note that for each approach the coverage rate appears to be insensitive to the minor departure from orthogonality that was induced by changing the (one) factorial point (comprised of all −1s) from (−1, −1, . . . , −1, −1) to (−1, −1, . . . , −1, 0). Such a departure from design orthogonality could happen due to a design execution error or a process restriction.

The Full Model Nonorthogonal Case.
Because computation of c * α and the approximate critical value c α requires maximization within a Monte Carlo calculation, it would be useful to assess if this can be eliminated when a full model is employed. We, therefore, conduct another simulation study to see if the critical value based upon the random variable in (10) can be used as an approximate critical value for mild departures from orthogonality. We use the same nonorthogonal designs as used in Table 2. The corresponding full-interaction models are listed in Table 3.
Table 3 (surviving row): k = 4, h = 2; y = 1 + x1 + x2 + x3 + x4 + x1² + x2² + x2×x3 + z1 + x1×z1 + x2×z1 + x3×z1 + x4×z1 + z2 − x1×z2 + x2×z2 − x3×z2 + x4×z2.
This time the c α critical value based on (10) was used with these nonorthogonal designs to assess the simultaneous coverage rate. The results are shown in Table 4. In order to assess the coverage rate, gridding had to be done over a subset of L 0 . As a fairer comparison with the theoretical c α critical value, gridding was done over L 0 ∩ R (where, as before, R is a hypercube region composed of the Cartesian product of the intervals [−10, 10], instead of [−1, 1]). This is because the c α critical value associated with the random variable in (10) is computed by maximization over the whole linear subspace, L 0 . Table 4 indicates that this minor departure from orthogonality has virtually no effect on the coverage rate of the GZG confidence region when the more convenient c α critical value is used. For more radical departures from orthogonality, it may be safer to use the conservative c * α critical value. But further robustness studies are needed to ascertain how well the more convenient c α critical value works under departures from its assumptions.

Computation of Simultaneous Confidence Region.
For the k > h case, once we have computed the critical value, the confidence region can be computed by searching linear subspaces, L, that satisfy the condition as defined in (9). However, searching over various lines or hyperplanes that span an experimental region is more computationally difficult than searching the same experimental region in a pointwise fashion. Fortunately, it can be shown that, for any given critical value, the GZG confidence region can be computed by pointwise gridding. This is because for C x (c α ) in (7) and C L (c α ) in (8), with the same critical value, C x (c α ) = C L (c α ). A proof is provided in Appendix D. This equivalency shows that one can construct the GZG confidence region by simply gridding over the experimental region in a pointwise fashion.
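Pointwise gridding of this kind can be sketched as below. The numbers are borrowed from the one-noise-variable example of Section 3 (estimated noise gradient 10.81 − 9.06x2 + 8.31x3, mean square error 21.12, residual df 9). Two things are assumptions of this sketch rather than statements from the paper: the coefficient variance is taken as MSE/16, which is what an orthogonal 16-run design in ±1 coding would give, and the value 5.117 for F(0.95, 1, 9) is read from standard F tables. The helper names `q_statistic` and `grid_region` are hypothetical.

```python
def q_statistic(x2, x3, psi_hat, var_coef):
    """Q(x; psi_hat) for one noise variable, with V_hat = var_coef * I."""
    m = (1.0, x2, x3)  # the single row of M(x)
    gradient = sum(mi * pi for mi, pi in zip(m, psi_hat))
    return gradient ** 2 / (var_coef * sum(mi * mi for mi in m))

def grid_region(psi_hat, var_coef, crit, n=41, lo=-1.0, hi=1.0):
    """Return the set of grid points (x2, x3) with Q(x; psi_hat) <= crit."""
    step = (hi - lo) / (n - 1)
    axis = [lo + i * step for i in range(n)]
    return {(a, b) for a in axis for b in axis
            if q_statistic(a, b, psi_hat, var_coef) <= crit}

PSI_HAT = (10.81, -9.06, 8.31)          # noise slope and its x2, x3 interactions
VAR_COEF = 21.12 / 16                   # assumption: MSE / 16 (orthogonal coding)
GZG_CRIT = 9 * (0.05 ** (-2 / 9) - 1)   # 2F(0.95, 2, 9) in closed form
MKG_CRIT = 5.117                        # F(0.95, 1, 9) from standard tables

gzg = grid_region(PSI_HAT, VAR_COEF, GZG_CRIT)
mkg = grid_region(PSI_HAT, VAR_COEF, MKG_CRIT)
```

Because both regions are built from the same pointwise statistic and differ only in the critical value, the smaller critical value always yields a subset of the region obtained with the larger one; plotting `gzg` and `mkg` over the square shows the band around the estimated zero-gradient line widening accordingly.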

Examples
One Noise Variable.
This example is from Myers et al. [6]. It was originally taken from Montgomery [18] (2009, page 231). The data were generated from a 2^4 factorial experiment with a total of 16 observations from a pilot plant to explore the factors that could affect the filtration rate of a chemical bonding substance. The goal is to maximize the filtration rate, y.
As in Myers et al. [6], one of the four factors, temperature, is assumed difficult to control at large scale and, therefore, treated as a noise variable z. The rest of the factors are control variables: x 1 : pressure, x 2 : concentration, x 3 : stirring rate. The fitted model, given in (16), has a mean square error equal to 21.12 and residual df equal to 9.
Only x2 and x3 enter the estimated noise gradient, so k = 2 and h = 1. The solution to the null hypothesis H0: γ + Δ′x = 0 is a line. The general critical value 2F(1 − α, 2, v) (based on Miller's Theorem (1981) [13]) should therefore be used to calculate the GZG confidence region (as shown in Figure 1). The GZG confidence region in Figure 1 is clearly wider than the MKG confidence region in Myers et al. (1997, Figure 2 in [6]), where F(1 − α, 1, v) is used as the critical value. It is clear from Figure 1 that we are at least 95% confident that the zero-gradient locus of points passes through the experimental region. Next, we do some simulations to compare the coverage rates of the GZG and MKG confidence regions. Since the true optimum is not known in practice, we calculate the coverage rate for the solution to γ + Δ′x = 10.81 − 9.06x2 + 8.31x3 = 0, using a simulation model equal to the fitted model in (16) with σ² = 21.12. Note that the true solution in this example is a line with infinite length. But the simulation is done only for the part of the line within the experimental region, that is, within [−1, 1] for each control variable.
Using 100,000 Monte Carlo simulations, the simultaneous coverage rate of the GZG confidence region for all of the zero-gradient solutions in the experimental region is 97% while the MKG confidence region only has a 92% coverage rate. The MKG confidence region has a lower coverage rate because it was designed to contain the true optimum only when the optimum is a point. Although the GZG confidence region is designed to contain all the true solutions (which could be a point, a line, or a hyperplane), the simulated coverage rate tends to exceed the nominal coverage rate because the simulation is done within a finite range of the control variables while the line or hyperplane has an infinite range in theory.
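A coverage comparison of this kind can be sketched as follows. This is a simplified illustration and not the simulation code used for the figures quoted above: it treats 10.81, −9.06, and 8.31 as the true gradient parameters, assumes each coefficient estimate has variance σ²/16 (orthogonal 16-run design in ±1 coding), approximates the maximum over the line segment by a grid, and uses far fewer replications by default. The names `segment_rows` and `coverage` are hypothetical, and 5.117 is the tabulated value of F(0.95, 1, 9).

```python
import math
import random

PSI = (10.81, -9.06, 8.31)      # gamma, delta_2, delta_3 treated as true values
SIGMA2, N_RUNS, V = 21.12, 16, 9

def segment_rows(n=101):
    """Rows M(x) = (1, x2, x3) on the zero-gradient line inside [-1, 1]^2."""
    rows = []
    for i in range(n):
        x2 = -1.0 + 2.0 * i / (n - 1)
        x3 = -(PSI[0] + PSI[1] * x2) / PSI[2]
        if -1.0 <= x3 <= 1.0:
            rows.append((1.0, x2, x3))
    return rows

def coverage(crit, n_sims=10_000, seed=1):
    """Fraction of simulated datasets whose region covers the whole segment."""
    rng = random.Random(seed)
    rows = segment_rows()
    norms = [sum(m * m for m in row) for row in rows]
    sd = math.sqrt(SIGMA2 / N_RUNS)
    covered = 0
    for _ in range(n_sims):
        err = [rng.gauss(0.0, sd) for _ in PSI]            # psi_hat - psi
        s2 = SIGMA2 * rng.gammavariate(V / 2.0, 2.0) / V   # simulated MSE
        # On the true line M(x) psi = 0, so M(x) psi_hat = M(x) err.
        q_max = max(
            sum(m * e for m, e in zip(row, err)) ** 2 / (s2 / N_RUNS * nrm)
            for row, nrm in zip(rows, norms)
        )
        covered += q_max <= crit
    return covered / n_sims

GZG_CRIT = V * (0.05 ** (-2.0 / V) - 1.0)   # 2F(0.95, 2, 9) in closed form
MKG_CRIT = 5.117                            # F(0.95, 1, 9) from standard tables
```

Calling `coverage(GZG_CRIT)` and `coverage(MKG_CRIT)` with the same seed uses common random numbers, so the difference between the two estimated rates reflects the critical values rather than simulation noise.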

Two Noise Variables.
This example comes from a face-centered central composite design with the factorial part being a half-fractional factorial design (see details in [19]). The objective of this study is to find the optimized condition that maximizes the yield of diacylglycerol oil, which is a natural component of various edible oils and has shown some beneficial effects as compared to the traditional triacylglycerol oil. Five factors were studied in this experiment: reaction time (RTIME), enzyme load (ENZL), reaction temperature (RTEMP), water content (WATC), and substrate molar ratio (SUBR). Water content (WATC) is difficult to control at large scale [19] and, therefore, treated as a noise variable. For illustration purposes, substrate molar ratio is also treated as a noise variable, and the axial points corresponding to the noise variables are excluded from the analysis to obtain partial design orthogonality with respect to the noise variables (i.e., to ensure $V_\psi = c\sigma^2 I$). The final model in coded factor values is as follows:
$$y = 57.58 + 9.12x_1 + 4.78x_2 + 11.01x_3 - 4.69x_1^2 - 9.47x_2^2 - 7.37x_3^2 - 1.61x_2x_3 - 2.05z_1 + 4.83z_2 - 2.\ldots \tag{18}$$
where x 1 : RTIME, x 2 : ENZL, x 3 : RTEMP, z 1 : WATC, z 2 : SUBR. Here, the residual mean squared error is equal to 2.56 with 25 observations and residual df equal to 9.
Since k = 3, h = 2, the solution to the null hypothesis is a line in a 3-dimensional space determined by control variables x 1 , x 2 , and x 3 . Therefore, the confidence region for this line is a tube in this 3-dimensional space. A 95% GZG confidence region is shown in Figure 2. Based on (10), the GZG critical value is obtained via the χ 2 and Wishart distributions. From Figure 2, we can see that while this confidence region does not provide statistically significant evidence that the zero-gradient locus of points passes through the experimental region, it does appear that a good portion of the confidence region is within the experimental region, and hence, attainment of near-zero-gradient conditions should be feasible for this process.
As with the previous example, we compare the coverage rates for the GZG and MKG confidence regions using the fitted models as true population models. Using 100,000 Monte Carlo simulations (based upon the fitted model in (18)

Summary
This paper shows that when the number of control variables does not exceed the number of noise variables, the MKG approach provides a confidence region for control variables associated with a zero-gradient for noise transmission. Otherwise, the MKG approach results in a confidence region that is too small for simultaneous coverage of the linear subspace of zero-gradient solutions. It is important to know that the true optimal condition represented by control variables is either a line or a hyperplane instead of a single point when h < k. In this situation, constructing a simultaneous confidence region about the linear subspace solution is desirable in that a subspace of solutions provides the investigator with many options for setting the zero-gradient control level. Of course, a confidence region also provides the experimenter with a measure of uncertainty for the optimal solution. If the confidence region is too large, further experimental runs may be needed to make more accurate inferences. If the current manufacturing set point is outside of the confidence region, this provides statistically significant evidence that reconfiguration of the set point may help improve process variability by lowering the transmission of noise through the system. The GZG confidence region for the zero-gradient conditions is proposed and is shown to provide nominal or reasonably conservative coverage rates for many noise variable experiments that occur in practice.
In the situation where there are many noise variables, it may be either costly or difficult to study all the noise effects. One way to deal with this problem is to combine the multiple noise factors into one compound noise factor with two extreme conditions as its two levels (provided certain assumptions can be satisfied). See [1,20,21] for discussion. If the noise factors can be combined into one compound noise factor, then we could have a situation where the number of control variables is greater than the number of noise variables. The GZG approach is directly applicable for this situation. In some cases, however, one may desire to create a compound noise factor, with more than two levels [22], or two or more compound noise factors. In either case, as long as the predictive model is in the form of (1), the GZG approach is applicable.
The GZG confidence region provides inferences about the optimal control point or points that yield a zero-gradient for the transmission of variability from the noise variables. However, there are situations where the control points corresponding to zero noise variance are either outside the experimental region or simply do not exist. In this case, it would still be useful to find the constrained optimal control point for minimum noise variance over the experimental region. A method to further generalize the confidence region for the constrained optimal point for minimum noise variance is needed and is currently under development.

Appendices
A. Proof of (10)
We will prove the result for the no-intercept case where $\gamma = 0$ in the hypothesis $\gamma + \Delta'x = 0$. It then follows that the result can be generalized to the intercept case where $\gamma \ne 0$. Here $i = 1, 2, 3, \ldots, h$, $j = 1, 2, 3, \ldots, k$, and $\delta_{ij}$ is the $j$th element on the $i$th row of the matrix $\Delta'$. Since the vector $z \sim N(0, I_p)$, $G(z)'G(z) \sim \text{Wishart}(I_k, h)$ by the definition of the Wishart distribution (see p92 in [23]). Let $s^2$ be the sample estimate of the residual variance $\sigma^2$; then the test statistic $Q_L$ becomes
$$Q_L = \frac{\lambda_{\max}(A)}{U/v},$$
where $U = vs^2/\sigma^2$ follows a $\chi^2(v)$ distribution. Next, we show that $A \sim \text{Wishart}(I_d, h)$. Because $G(z)'G(z) \sim \text{Wishart}(I_k, h)$ and $L$ is a $d \times k$ matrix with rank $d$, by the property of the Wishart distribution $LG(z)'G(z)L' \sim \text{Wishart}(LL', h)$. Note that $\operatorname{rank}(LL') = \operatorname{rank}((LL')^{-1}) = \operatorname{rank}(L) = d$. Since $(LL')^{-1}$ is symmetric and positive definite, $(LL')^{-1/2}$ is also symmetric and positive definite, and its rank is $d$. Hence, $A \sim \text{Wishart}(I_d, h)$.

Part 2. For the intercept case:
The intercept case can be proved using the same arguments as indicated by Miller ([13, page 113]). Note that, in the intercept case, the rank of $(LL')^{-1/2}$ is $d + 1$. Hence, $A \sim \text{Wishart}(I_{d+1}, h)$.

B. Critical Values Based on (10)
See Table 5.

C. Proof of (13)

D. Proof of $C_x(c_\alpha) = C_L(c_\alpha)$
Note that $C_L(c_\alpha)$ consists of the points $x$ that lie in some linear subspace $L$ with $Q(x; \hat\psi) \le c_\alpha$ for all $x \in L$. By definition, $x \in C_L(c_\alpha)$ implies that $x \in C_x(c_\alpha)$. Next, we will show that if $x \in C_x(c_\alpha)$, then $x \in C_L(c_\alpha)$. This can be proved by contradiction. Suppose that there exist some $x$'s such that $x \in C_x(c_\alpha)$, but $x \notin C_L(c_\alpha)$. Then there exists at least one x-point, say $x^*$, in $C_x(c_\alpha)$ such that there is no linear subspace that satisfies the two conditions: (1) it contains $x^*$; (2) for every point $x$ in this subspace, $Q(x; \hat\psi) \le c_\alpha$. Define $C_\psi(c_\alpha) = \{\psi : (\hat\psi - \psi)'\hat V_\psi^{-1}(\hat\psi - \psi) \le c_\alpha\}$ and consider the set $C^*_x = \{x : M(x)\psi = 0 \text{ for some } \psi \in C_\psi(c_\alpha)\}$. It follows, using a proof analogous to that of Theorem 2.1 in Peterson et al. [25], that $C_x(c_\alpha) = C^*_x$. Therefore, $x^* \in C^*_x$. This then implies that there exists a $\psi^* \in C_\psi(c_\alpha)$ such that $M(x^*)\psi^* = 0$. If $h < k$, then there exists a line or hyperplane $L^*$ such that all the $x$'s that satisfy $M(x)\psi^* = 0$ are in $L^*$. Hence, $x^* \in L^*$. Again using the proof of Theorem 2.1 in Peterson et al. [25], it follows that for any given point $x$, $Q(x; \hat\psi) \le c_\alpha$ if and only if $M(x)\psi = 0$ for some $\psi \in C_\psi(c_\alpha)$. Therefore, $Q(x; \hat\psi) \le c_\alpha$ for all the $x$'s in $L^*$. In other words, $L^*$ satisfies the above two conditions, which is a contradiction.