The ANOVA gauge repeatability and reproducibility (GRR) study is the most popular tool for measurement system analysis. Two experimental designs can be applied, depending on the durability of the measured objects: if repeated measurements are possible, or sufficiently homogeneous nonrepeatable samples are available, the crossed design is appropriate; otherwise, the nested design should be used. In this paper, we investigate the adequacy of the ANOVA gauge repeatability and reproducibility study from the perspective of practitioners. We propose a Monte Carlo simulation that is close to the realistic procedure for evaluating the adequacy of both structures. During the evaluation, we consider the average performance metrics, the percentage of correct decision, histogram shape, and the symmetric mean absolute percentage error for four popular performance metrics, namely, % Study Variation, % Contribution, % Tolerance, and the number of distinct categories. The experimental results show that the nested design fails to judge the precision of the gauge while the crossed design succeeds.
Funding: National Research Foundation of Korea (NRF-2013R1A1A2006947).

1. Introduction
The gauge repeatability and reproducibility (GRR) study is a representative measurement system analysis (MSA) tool [1]. Two factors determine the adequacy of a measurement system: accuracy, such as bias, linearity, stability, and correlation, and precision, such as repeatability and reproducibility. The main concern of the GRR is whether a measurement system has sufficient precision to measure the variation of the manufactured products or the manufacturing process under consideration. There are three conventional GRR methods: the range method, the average and range method using control charts, and the analysis of variance (ANOVA) GRR (AGRR) [1]. After the AGRR was introduced by Montgomery and Runger [2, 3], it became the most popular tool for MSA because it considers the interaction effects and provides interval estimates for the variance components and the performance metrics [4]. The ANOVA in AGRR measures the variability of observations and estimates variance components. The performance metrics, which are composed of sums or ratios of the estimated variance components, provide the criteria used to analyze the precision of the measurement system. Crossed designs are the standard experimental layouts for AGRR. Nested designs, or hierarchical designs, are used for nonrepeatable measurements such as destructive tests. Even though the measured object is nonrepeatable, if sufficiently homogeneous samples can be obtained, the crossed design is still appropriate [5].
For the last two decades, numerous studies have been conducted on AGRR. Previous studies mainly concentrated on providing theoretical backgrounds and introductions to the AGRR, efficient approximations for narrower confidence intervals of variance components and performance metrics, and variations of AGRR for special experimental structures. In fact, AGRR is a popular tool in the industrial field; QS-9000, a quality standard of the American automotive industry, even provides a guideline for AGRR [6]. From the practitioners' perspective, however, previous theoretical studies are of limited value. The main reason for this is the wide confidence intervals. Theoretical studies have mainly focused on developing efficient approximations to narrow the wide confidence intervals. Since the AGRR is based on sampling, it is reasonable that confidence intervals give clearer evidence than point estimates for arriving at the correct conclusions. However, as Burdick and Larsen [4] noted, the estimated confidence intervals are often too wide to be used. In many cases, the confidence intervals overlap with the decision criteria of AGRR in Table 3, making them unsuitable for assessing the adequacy of the gauge. Therefore, practitioners choose the point estimates of the performance metrics.
In this situation, it is imperative to verify the adequacy of AGRR with the point estimates. In particular, as Bergeret et al. [7] mentioned, the adequacy of the nested design is quite doubtful. Though the theoretical basis of AGRR is firm, there are several possible sources that can harm its adequacy. The fundamental assumption of AGRR is that all effects, including the interaction, follow a normal distribution and are independent. If we filter outliers during inspection or select samples arbitrarily to secure a sufficient range of variability, the normality or independence assumption breaks. The nested design itself is another source of decreased adequacy: the nested effect interferes with the clear separation of the variance components, and, consequently, the accuracy of the performance metrics decreases.
Practitioners are not concerned with theoretical derivations or proofs. Their primary concern with the AGRR is whether the tool works properly to determine the precision of the gauge and, if possible, how to improve the adequacy of the AGRR within budget constraints. Neither theoretical nor practical studies have dealt with these issues; existing practical studies only focused on offering user guidelines or providing case studies on various applications. The purpose of this paper is to evaluate the adequacy of the AGRR for both the crossed and nested designs and to investigate the causes of any inadequacies. To accomplish this, we constructed a series of Monte Carlo simulations and verified the adequacy via four popular performance metrics: % Study Variation, % Contribution, % Tolerance, and the number of distinct categories (NDC) [8].
Section 2 introduces the conventional AGRR process for both crossed and nested designs and compares the differences in the formulas for the performance metrics. In Section 3, we briefly review existing references on the AGRR. The proposed Monte Carlo simulation method and the experimental environments are described in detail in Section 4. We summarize the simulation results from various perspectives, show why the nested design AGRR is unsuitable for MSA, and reveal the cause of the inadequacy in Section 5. Finally, Section 6 provides conclusions and further discussion.
2. ANOVA Gauge Repeatability and Reproducibility Study
A standard AGRR uses the crossed design. The two-way random effects model is as follows:

(1) y_ijk = μ + O_i + S_j + (SO)_ij + E_ijk,  i ∈ {1, 2, …, o}, j ∈ {1, 2, …, s}, k ∈ {1, 2, …, r},

where y_ijk is an observation; μ is the unknown overall mean; S, O, SO, and E are random variables that represent the effects of the sample, the operator, the interaction between the sample and the operator, and the replicate, respectively; and o, s, and r are the numbers of operators, samples, and replicates, respectively. It is generally assumed that O_i ~ N(0, σ_O²), S_j ~ N(0, σ_S²), (SO)_ij ~ N(0, σ_SO²), and E_ijk ~ N(0, σ_E²), and that these are independent of each other. The left side of Figure 1 shows an experimental structure of the crossed design with 2 operators, 4 parts, and 2 replicates; the subscript of each observation in the figure indicates ijk. In the crossed design, two operators measure four distinct samples twice.
Examples of experimental designs for a crossed design and a nested design.
Table 1 shows the resulting ANOVA table. If the interaction effect is not significant, that is, if the F-statistic V_SO/V_E does not exceed its critical value at the chosen significance level, the interaction is pooled into the error term E and the table changes accordingly. For a detailed explanation, refer to Montgomery and Runger [2, 3].
ANOVA table of the crossed design under the two-way random effects model.

Source of variability | Sum of squares (S) | Degrees of freedom (d) | Mean square (V) | Expected mean square | F
Operator (O)          | S_O  | d_O = o − 1           | V_O = S_O/d_O    | θ_O = σ_E² + rσ_SO² + srσ_O² | V_O/V_SO
Sample (S)            | S_S  | d_S = s − 1           | V_S = S_S/d_S    | θ_S = σ_E² + rσ_SO² + orσ_S² | V_S/V_SO
Interaction (S × O)   | S_SO | d_SO = (s − 1)(o − 1) | V_SO = S_SO/d_SO | θ_SO = σ_E² + rσ_SO²         | V_SO/V_E
Replicate (E)         | S_E  | d_E = so(r − 1)       | V_E = S_E/d_E    | θ_E = σ_E²                   |
Total                 | S_T  | d_T = sor − 1         |                  |                              |
We can estimate the variance components by the method of moments as follows:

(2) σ̂_E² = V_E,  σ̂_SO² = (V_SO − V_E)/r,  σ̂_S² = (V_S − V_SO)/(or),  σ̂_O² = (V_O − V_SO)/(sr).
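For concreteness, Equation (2) can be computed directly from a balanced data array. The following Python sketch is our own illustration, not code from the paper or any AGRR package; it assumes the observations are stored as an o × s × r array and, as is common practice, truncates negative method-of-moments estimates at zero.

```python
import numpy as np

def crossed_anova_components(y):
    """Method-of-moments variance component estimates for a balanced
    crossed design (Equation (2)).  y has shape (o, s, r):
    operators x samples x replicates."""
    o, s, r = y.shape
    grand = y.mean()
    y_o = y.mean(axis=(1, 2))      # operator means
    y_s = y.mean(axis=(0, 2))      # sample means
    y_os = y.mean(axis=2)          # cell (operator x sample) means

    # Mean squares of Table 1
    V_O = s * r * np.sum((y_o - grand) ** 2) / (o - 1)
    V_S = o * r * np.sum((y_s - grand) ** 2) / (s - 1)
    V_SO = (r * np.sum((y_os - y_o[:, None] - y_s[None, :] + grand) ** 2)
            / ((o - 1) * (s - 1)))
    V_E = np.sum((y - y_os[:, :, None]) ** 2) / (o * s * (r - 1))

    # Equation (2); negative estimates truncated at zero
    sigma2_E = V_E
    sigma2_SO = max((V_SO - V_E) / r, 0.0)
    sigma2_S = max((V_S - V_SO) / (o * r), 0.0)
    sigma2_O = max((V_O - V_SO) / (s * r), 0.0)
    return sigma2_E, sigma2_SO, sigma2_S, sigma2_O
```

Applied to data simulated from model (1), the returned estimates approach the generating variances as o, s, and r grow.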
If the samples are destructive, we must apply the nested design instead of the standard crossed design. The right side of Figure 1 shows a nested experimental structure with two operators, four parts (batches), and four replicates; this experiment is the counterpart of the crossed design on the left side, with the same number of observations. In the experiment, two operators measure sixteen distinct samples drawn from four batches (or lots). If the samples in a batch are homogeneous and enough samples are available, the crossed design can still be effective. The two-way random effects model for the nested design is as follows:

(3) y_ijk = μ + O_i + S(O)_j(i) + E_ijk,  i ∈ {1, 2, …, o}, j ∈ {1, 2, …, s}, k ∈ {1, 2, …, r},

where y_ijk is an observation and μ is the unknown overall mean; O, S(O), and E are random variables that represent the effects of the operator, the sample nested within the operator, and the replicate, respectively; and o, s, and r are the numbers of operators, samples per operator, and replicates, respectively. It is also assumed that O_i ~ N(0, σ_O²), S(O)_j(i) ~ N(0, σ_S(O)²), and E_ijk ~ N(0, σ_E²), and that these are independent of each other. Table 2 shows the ANOVA table of the nested design.
ANOVA table of the nested design under the two-way random effects model.

Source of variability    | Sum of squares (S) | Degrees of freedom (d) | Mean square (V)        | Expected mean square           | F
Operator (O)             | S_O     | d_O = o − 1        | V_O = S_O/d_O          | θ_O = σ_E² + rσ_S(O)² + srσ_O² | V_O/V_S(O)
Sample (operator) (S(O)) | S_S(O)  | d_S(O) = o(s − 1)  | V_S(O) = S_S(O)/d_S(O) | θ_S(O) = σ_E² + rσ_S(O)²       | V_S(O)/V_E
Replicate (E)            | S_E     | d_E = so(r − 1)    | V_E = S_E/d_E          | θ_E = σ_E²                     |
Total                    | S_T     | d_T = sor − 1      |                        |                                |
Variance components, performance metrics, and their decision criteria for the crossed design and the nested design.

Variance components:
Component       | Equation                    | Crossed design               | Nested design
Repeatability   | σ̂_RPT²                     | σ̂_E²                        | σ̂_E²
Reproducibility | σ̂_RPD²                     | σ̂_O² + σ̂_SO²               | σ̂_O²
R&R             | σ̂_R&R² = σ̂_RPT² + σ̂_RPD² | σ̂_E² + σ̂_O² + σ̂_SO²       | σ̂_E² + σ̂_O²
Sample          | σ̂_S² or σ̂_S(O)²           | σ̂_S²                        | σ̂_S(O)² = σ̂_S² + σ̂_SO²
Total           | σ̂_T²                       | σ̂_S² + σ̂_E² + σ̂_O² + σ̂_SO² | σ̂_S² + σ̂_E² + σ̂_O² + σ̂_SO²

Performance metrics:
Metric            | Equation                         | Crossed design | Nested design | Criteria
% Study Variation | (σ̂_R&R/σ̂_T) × 100               | √((σ̂_E² + σ̂_O² + σ̂_SO²)/(σ̂_S² + σ̂_E² + σ̂_O² + σ̂_SO²)) × 100 | √((σ̂_E² + σ̂_O²)/(σ̂_S² + σ̂_E² + σ̂_O² + σ̂_SO²)) × 100 | ≤10% acceptable; 10–30% pending; >30% unacceptable
% Contribution    | (σ̂_R&R²/σ̂_T²) × 100             | ((σ̂_E² + σ̂_O² + σ̂_SO²)/(σ̂_S² + σ̂_E² + σ̂_O² + σ̂_SO²)) × 100 | ((σ̂_E² + σ̂_O²)/(σ̂_S² + σ̂_E² + σ̂_O² + σ̂_SO²)) × 100 | ≤1% acceptable; 1–9% pending; >9% unacceptable
% Tolerance       | (5.15 × σ̂_R&R/Tolerance) × 100   | (5.15 × √(σ̂_E² + σ̂_O² + σ̂_SO²)/Tolerance) × 100 | (5.15 × √(σ̂_E² + σ̂_O²)/Tolerance) × 100 | ≤10% acceptable; 10–30% pending; >30% unacceptable
Number of distinct categories (NDC) | 1.41 × (σ̂_S/σ̂_R&R) | 1.41 × √(σ̂_S²/(σ̂_E² + σ̂_O² + σ̂_SO²)) | 1.41 × √((σ̂_S² + σ̂_SO²)/(σ̂_E² + σ̂_O²)) | ≥5 acceptable; <5 unacceptable
The estimates of the variance components are as follows:

(4) σ̂_E² = V_E,  σ̂_S(O)² = (V_S(O) − V_E)/r,  σ̂_O² = (V_O − V_S(O))/(sr).
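Analogously to the crossed case, Equation (4) can be computed from a balanced data array. The sketch below is our own illustration (not from the paper); it assumes an o × s × r array where each (operator, sample-within-operator) cell is a distinct physical batch.

```python
import numpy as np

def nested_anova_components(y):
    """Method-of-moments variance component estimates for a balanced
    nested design (Equation (4)).  y has shape (o, s, r): operators x
    samples-within-operator x replicates."""
    o, s, r = y.shape
    grand = y.mean()
    y_o = y.mean(axis=(1, 2))      # operator means
    y_os = y.mean(axis=2)          # sample-within-operator means

    # Mean squares of Table 2
    V_O = s * r * np.sum((y_o - grand) ** 2) / (o - 1)
    V_SinO = r * np.sum((y_os - y_o[:, None]) ** 2) / (o * (s - 1))
    V_E = np.sum((y - y_os[:, :, None]) ** 2) / (o * s * (r - 1))

    # Equation (4); negative estimates truncated at zero
    sigma2_E = V_E                                   # repeatability
    sigma2_SinO = max((V_SinO - V_E) / r, 0.0)       # sample(operator)
    sigma2_O = max((V_O - V_SinO) / (s * r), 0.0)    # reproducibility
    return sigma2_E, sigma2_SinO, sigma2_O
```

Note that, unlike the crossed design, no sample mean is shared across operators, which is the structural feature examined in Section 5.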
The goal of AGRR is to determine whether the measurement system can properly distinguish the variation of products or processes. To do that, the AGRR extracts the gauge error (repeatability) and the operator error (reproducibility) from the observed measurements and judges adequacy via performance metrics. The most popular performance metrics in practice are % Study Variation, % Contribution, % Tolerance, and the number of distinct categories (NDC); Minitab, a popular software package in the quality field, provides all four metrics for AGRR [8]. We summarize the calculation formulas and the relevant decision criteria of the metrics for the crossed design and the nested design in Table 3.
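The Table 3 formulas share a common shape once the R&R and sample variances have been assembled for either design. A minimal Python sketch (ours, not from any vendor's implementation; truncating NDC to an integer is an assumption that follows common software practice):

```python
import math

def grr_metrics(s2_rr, s2_sample, tolerance):
    """The four AGRR performance metrics of Table 3, given the
    measurement-system variance s2_rr (repeatability + reproducibility),
    the sample variance s2_sample, and the tolerance width."""
    s2_total = s2_rr + s2_sample
    pct_sv = math.sqrt(s2_rr / s2_total) * 100.0           # % Study Variation
    pct_contrib = (s2_rr / s2_total) * 100.0               # % Contribution
    pct_tol = 5.15 * math.sqrt(s2_rr) / tolerance * 100.0  # % Tolerance
    ndc = math.floor(1.41 * math.sqrt(s2_sample / s2_rr))  # NDC, truncated
    return pct_sv, pct_contrib, pct_tol, ndc
```

For the crossed design one would pass s2_rr = σ̂_E² + σ̂_O² + σ̂_SO² and s2_sample = σ̂_S²; for the nested design, s2_rr = σ̂_E² + σ̂_O² and s2_sample = σ̂_S(O)².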
According to the formulas in Table 3, under the assumption of identical estimates of the variance components, we can surmise that the values of all performance metrics of the nested design are superior to those of the crossed design. In the crossed design, the measurement variation σ̂_R&R² includes the interaction effect σ̂_SO², which makes the values worse. However, practical results oppose this theoretical analysis. Bergeret et al. [7] claimed that the % Contribution and % Tolerance of the nested design are overestimated when compared with the crossed design. They investigated three case studies and argued that improper estimation of repeatability results in the overestimation of the performance metrics. In this situation, it is valuable to determine whether the AGRR, especially the nested design, is indeed an appropriate tool for determining the precision of the gauge and to investigate how accurate the AGRR is.
3. Previous Studies
In this section, we briefly review previous studies on AGRR. The mainstream theoretical developments on AGRR include accurate and efficient approximation approaches for narrower confidence intervals of the variance components and performance metrics, methods for improving the accuracy of the AGRR, and methods for nonrepeatable measurements.
Montgomery and Runger [2, 3] introduced AGRR as an alternative to the conventional GRRs such as the range method and the average and range method. They also suggested a proper experimental design for AGRR and the Satterthwaite approximation for confidence intervals of the variance components. Borror et al. [9] compared two approximations, restricted maximum likelihood estimation (REML) using SAS PROC MIXED and a modified large sample (MLS) method, for estimating the confidence interval of σ_R&R² in the two-way random effects model. They claimed that REML is superior to the MLS due to its narrower confidence interval. Burdick and Larsen [4] compared five approximation approaches, namely, the MLS [10], the Satterthwaite approximation [3], AIAG [1], REML [11], and the Milliken and Johnson method [12], for five quantities: σ_E², σ_O², σ_RPD², σ_R&R², and σ_S²/σ_R&R². Their simulation results showed that the MLS is superior to the others since it satisfies the confidence coefficient in spite of a wider confidence interval. Dolezal et al. [13] investigated the confidence interval of σ_R&R² for a two-way mixed effects model with fixed operators. They suggested the mixed effects model for a limited number of operators because its interval length is shorter than that of the conventional random effects model. Hamada and Weerahandi [14] proposed a modified generalized inference approximation for σ_R&R² and argued that it provides a shorter confidence interval than the MLS [4]. Chiang [15] also proposed an approximation for σ_R&R² using surrogate variables; he compared its confidence interval to the MLS and insisted that it is an effective general method for the balanced random effects model. Daniels et al. [16] employed the generalized confidence interval approach using a generalized pivotal quantity. They stated that the approach is superior to the MLS if σ_O²/σ_R&R² is less than or equal to 0.2.
Wang and Li [17] proposed a bootstrapping method that can estimate the confidence interval when the control chart GRR is applied.
As for the performance metrics and their confidence intervals, Burdick et al. [18] stated that the confidence interval of σ_S²/σ_R&R² is too wide to be used; hence, they recommended the Cochran method based on the Satterthwaite approximation [19]. Chiang [20] argued that the confidence coefficients of the MLS and Satterthwaite approximation for σ_S²/σ_R&R² become low when σ_S² is less than 0.5. To overcome this phenomenon, he suggested the F-screened MLS, which applies the MLS only when σ_S² is statistically significant. Burdick et al. [21] reviewed previous research on AGRR and stated that the precision-to-tolerance ratio (% Tolerance), signal-to-noise ratio (SNR), and discrimination ratio (DR) are popular performance metrics for AGRR. Adamec and Burdick [22] compared the performance of the MLS and the generalized inference procedure for the DR in a three-way random effects model. Burdick et al. [23] proposed the generalized inference procedure for the misclassification rate. Woodall and Borror [24] reviewed and analyzed the relationships among popular performance metrics: % Study Variation [1, 8], NDC [1, 8], SNR [1, 21], DR [1], and misclassification rates.
There have been a few studies on improving the accuracy of AGRR. Pan [6] calculated the optimal set of (o, s, r) that provides the shortest confidence interval of σ_R&R² for various combinations of σ_O², σ_SO², and σ_E² under the same number of observations. Browne et al. [25] proposed a two-staged AGRR to increase the adequacy of AGRR: at a baseline stage, a number of operators measure samples to obtain an appropriate range of samples; then, at the second stage, a standard AGRR is conducted with those samples. They argued that the approach provides shorter standard deviations for σ_R&R/σ_T and σ_O/σ_R&R than the normal one-staged AGRR. Pan et al. [26] suggested a revised % Tolerance for the multivariate GRR that provides smaller mean squared error and mean absolute percentage error than the conventional % Tolerance. They also calculated the optimal (o, s, r) in terms of the new performance metric using principal component analysis.
Research on the nested design is rare. Bergeret et al. [7] applied the nested design to three case studies with destructive samples and argued that the nested design overestimates % Study Variation and % Tolerance. Mast and Trip [27] introduced four assumptions for AGRR: the consistency of bias, homogeneity of measurement errors, temporal stability of objects, and robustness against measurement. They defined a nonrepeatable measurement as one for which the last two assumptions are not satisfied. Furthermore, they proposed several alternative AGRRs that are suitable for various homogeneity assumptions. Van Der Meulen et al. [28] developed a compensation method to improve the overestimation of the nested design for nonrepeatable measurements.
There are many practical studies on AGRR, but they mainly focused on introducing basic theoretical knowledge, suggesting systematic user guidelines, and providing case studies for various applications; we omit a detailed review of these references.
4. Experimental Setup

4.1. Procedure for Simulation Experiment
Adequacy implies the ability to perform the desired goal. The purpose of AGRR is to determine, by statistics, whether the precision of a measurement system is sufficient. To verify the adequacy of AGRR, we require information on the population; in other words, the true values must be known. However, it is impossible to obtain complete data for a population. One alternative for overcoming this problem is to use simulation. Most existing studies applied a Monte Carlo simulation for verification: if an effect Q ~ N(0, σ_Q²), then d_Q V_Q/θ_Q ~ χ²(d_Q), or equivalently V_Q ~ θ_Q χ²(d_Q)/d_Q, where d_Q is the degrees of freedom of Q and V_Q is the mean square of Q. From this relationship, the population of V_Q can be generated using the chi-squared distribution, and subsequently the estimates of the variance components and performance metrics can be calculated. This simulation approach, however, has two weaknesses. First, the normality assumption of the effects must be satisfied. If we limit the random sampling by inspection processes or select samples or operators arbitrarily to obtain better results, we cannot generate the population because the normality condition is broken. Second, the true values of the variance components and performance metrics are still unknown; therefore, the adequacy of AGRR cannot be judged. We therefore propose a new Monte Carlo simulation for verifying the adequacy of AGRR. This approach generates populations of all effects instead of populations of the mean squares; an observation then consists of effects sampled from these populations. Since the true variance components can be calculated from the populations, an intensive evaluation is possible. In addition, during the procedure, it is possible to employ various realistic constraints, such as inspection, without loss of generality. The detailed procedure of the proposed Monte Carlo simulation is given in Algorithm 1.
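The conventional chi-square-based simulation criticized above can be sketched in a few lines. This is our own illustration of the V_Q ~ θ_Q χ²(d_Q)/d_Q relationship, with illustrative parameter values:

```python
import numpy as np

def sample_mean_squares(theta, d, n, rng):
    """Conventional Monte Carlo sampling of a mean square V_Q:
    since d_Q * V_Q / theta_Q ~ chi-square(d_Q) under normality,
    V_Q can be drawn as theta_Q * chi2(d_Q) / d_Q."""
    return theta * rng.chisquare(d, size=n) / d

rng = np.random.default_rng(1)
# e.g., the replicate mean square V_E with theta_E = 1 and d_E = 36
v = sample_mean_squares(1.0, 36, 100_000, rng)
```

The draws have mean θ_Q and variance 2θ_Q²/d_Q, but, as noted above, this shortcut breaks down when the normality of the effects is violated and it never exposes the true variance components.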
Algorithm 1

For 1 to N_scenario Do:
  Step 1. Assign a set of variance components (σ̃_O², σ̃_S², σ̃_SO², σ̃_E²) according to the experimental design in Tables 4 and 5.
  Step 2. Generate populations of effects of size S_pop from TN(0, σ̃_O²), TN(0, σ̃_S²), TN(0, σ̃_SO²), and TN(0, σ̃_E²), where TN denotes a truncated normal distribution bounded by [−3σ̃_Q, +3σ̃_Q] for Q ∈ {O, S, SO, E}.
  Step 3. Calculate the true variance components (σ_O², σ_S², σ_SO², σ_E²) and the true values of the performance metrics from the populations of effects.
  For 1 to N_repeat Do:
    For 1 to N_experimental design Do:
      Step 4. Generate observations.
        Step 4-1. Select samples from each population of effects according to the experimental design in Table 6.
        Step 4-2. Generate observations by the structural models of Equations (1) and (3), respectively.
      Step 5. Calculate the performance metrics for the crossed design and the nested design, respectively.
    End
    Step 6. Calculate the means of the performance metrics for the crossed design and the nested design.
  End
End
Levels of σ̃_S² and σ̃_O² + σ̃_SO² + σ̃_E² for population generation.

σ̃_S²        | σ̃_O² + σ̃_SO² + σ̃_E²
8192 (2^13)  | 3
4096 (2^12)  | 3
2048 (2^11)  | 3
1024 (2^10)  | 3
512 (2^9)    | 3
256 (2^8)    | 3
128 (2^7)    | 3
64 (2^6)     | 3
32 (2^5)     | 3
16 (2^4)     | 3
8 (2^3)      | 3
4 (2^2)      | 3
2 (2^1)      | 3
1 (2^0)      | 3
Levels of σ̃_O², σ̃_SO², and σ̃_E² for population generation.

σ̃_O² | σ̃_SO² | σ̃_E²
1.0  | 1.0  | 1.0
0.8  | 1.4  | 0.8
1.4  | 0.8  | 0.8
0.8  | 0.8  | 1.4
1.0  | 1.4  | 0.6
0.6  | 1.4  | 1.0
1.0  | 0.6  | 1.4
1.4  | 0.6  | 1.0
1.4  | 1.0  | 0.6
0.6  | 1.0  | 1.4
Levels of the factors of the experimental design (levels: low, high).

Factor                         | Crossed design | Nested design
Number of operators            | o_C ∈ {3, 5}   | o_N = o_C
Number of samples per operator | s_C = o_C·s_N  | s_N ∈ {3, 5}
Number of replicates           | r_C ∈ {2, 4}   | r_N = o_N·r_C
Number of observations         | o_C·s_C·r_C    | o_N·s_N·r_N
At the beginning of each scenario, the levels of σ̃_S², σ̃_O², σ̃_SO², and σ̃_E² are assigned. The total number of scenarios, N_scenario, is 140, since there are 14 levels of σ̃_S² in Table 4 and 10 combinations of (σ̃_O², σ̃_SO², σ̃_E²) at each level of σ̃_S² in Table 5. The population of each effect is generated from a truncated normal distribution bounded by ±3σ̃_Q. In practice, an inspection process may filter outliers of the products, so the truncated normal assumption is reasonable for actual samples; an untruncated normal distribution can be used if this effect is insignificant. The size of each population, S_pop, is 10,000, except for the interaction, which is 10,000 by 10,000. The true values of the performance metrics are computed by the crossed design formulas in Table 3 with the population parameters σ_S², σ_O², σ_SO², and σ_E². A set of observations is generated by Equations (1) and (3) with the effects sampled from the populations; the structure of the observations follows the experimental design in Table 6 (Section 4.3 explains the experimental design in more detail). In the next step, AGRRs using the crossed and nested designs estimate the variance components, where the significance level for deciding whether to pool the interaction is 0.05. For every AGRR, the performance metrics % Study Variation, % Contribution, % Tolerance, and NDC are calculated by the formulas in Table 3, where the Tolerance is 6σ̃_S. These steps are repeated N_repeat (= 100) times.
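The population-generation and observation-assembly steps above can be sketched as follows. This is our simplified illustration (function names are our own): it draws the interaction effects from a single one-dimensional pool rather than the paper's 10,000 × 10,000 interaction population.

```python
import numpy as np

def truncated_normal_population(sigma2, size, rng):
    """Population of effects drawn from TN(0, sigma2), a normal
    distribution truncated at +/- 3 sigma (rejection sampling)."""
    sigma = np.sqrt(sigma2)
    pop = np.empty(0)
    while pop.size < size:
        draw = rng.normal(0.0, sigma, size)
        pop = np.concatenate([pop, draw[np.abs(draw) <= 3.0 * sigma]])
    return pop[:size]

def crossed_observations(pops, o, s, r, rng, mu=0.0):
    """Step 4 of Algorithm 1 for the crossed design: assemble
    y_ijk = mu + O_i + S_j + SO_ij + E_ijk from sampled effects."""
    O = rng.choice(pops["O"], o)
    S = rng.choice(pops["S"], s)
    SO = rng.choice(pops["SO"], (o, s))
    E = rng.choice(pops["E"], (o, s, r))
    return mu + O[:, None, None] + S[None, :, None] + SO[:, :, None] + E

rng = np.random.default_rng(42)
pops = {q: truncated_normal_population(v, 10_000, rng)
        for q, v in {"O": 1.0, "S": 16.0, "SO": 1.0, "E": 1.0}.items()}
y = crossed_observations(pops, 3, 9, 2, rng)   # o_C, s_C, r_C = 3, 9, 2
```

Because the populations are held explicitly, the true variance components (e.g., pops["E"].var()) are known exactly, which is what enables the adequacy evaluation in Section 5; note that truncation at ±3σ shrinks the variance of a unit normal to roughly 0.973.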
4.2. Simulation Parameters
The sample variance σ_S² affects the % Study Variation, % Contribution, and NDC among the four performance metrics. To investigate the effect of σ_S², we set σ̃_S² from 2^0 (= 1) to 2^13 (= 8192) in Table 4, while σ̃_O² + σ̃_SO² + σ̃_E² is fixed at three. The variances σ_O², σ_SO², and σ_E² affect all the performance metrics. To analyze the effects of the variance components, ten orthogonal sets of (σ̃_O², σ̃_SO², σ̃_E²) are designed as shown in Table 5. Since σ̃_O² + σ̃_SO² + σ̃_E² is fixed at three, the experimental design may seem limited. However, except for % Tolerance, the performance metrics depend not on the magnitudes but on the ratios among the variance components. From this perspective, our experimental design covers a total of 140 scenarios, and this number is sufficient.
4.3. Experimental Design for the Crossed and Nested Designs
A direct comparison of the performance metrics between the crossed design and the nested design is meaningless because they are used in different environments. However, it is valuable to evaluate and compare their adequacy levels. To do that, we designed two structures for the experiment that use the same observations; for example, in Figure 1, the same observations are used in both designs. The experimental design that maintains the same number of observations for the crossed and the nested design is as follows.
The numbers of operators, samples, and replicates also affect the performance metrics. To investigate their effects, we set up a 2³ factorial design over the number of operators in the crossed design (o_C), the number of samples per operator in the nested design (s_N), and the number of replicates in the crossed design (r_C), as shown in Table 6. In order to match the number of observations, the number of operators in the nested design (o_N), the number of samples in the crossed design (s_C), and the number of replicates in the nested design (r_N) are assigned o_C, o_C·s_N, and o_N·r_C, respectively.
5. Experimental Results

5.1. Performance Metrics
Figure 2 shows the trajectories of the averages of the four performance metrics over σ̃_S² for the crossed design, the nested design, and the population. The two dashed horizontal lines indicate the rule-of-thumb decision criteria of each performance metric, and regions I, II, and III represent the acceptable, pending, and unacceptable regions, respectively, based on the performance metrics of the population. Since σ̃_O² + σ̃_SO² + σ̃_E² and the tolerance are constants, the performance metrics of the crossed design are functions of σ̂_S². For the nested design, the performance metrics are also functions of σ̂_S², because averaging and the orthogonal design compensate for the individual effects of σ̂_O², σ̂_SO², and σ̂_E². In Figure 2, all the average performance metrics of the crossed design are very close to the population values, which are the true values; this implies that the adequacy level of the crossed design is very high. However, the metrics of the nested design differ from the population values. In particular, the trajectory of the nested design for % Tolerance differs from that of the population, which is caused by the overestimation of σ̂_O² (Section 5.4 will elaborate on this). Moreover, the population shares the formulas for the performance metrics with the crossed design: σ_SO² is nested into σ_S² in the nested design, but it is added to σ_R&R² in the crossed design and the population. In theory, the gap between the nested design and the population should therefore decrease in region III, since σ̂_S² is much bigger than σ̂_SO². Region III is important because the metrics of a good measurement system are positioned there. Therefore, we can conclude that the nested design does not provide the correct result on gauge precision.
Average performance metrics over the sample variance.
5.2. Percentage of Correct Decision
To investigate the nested design further, we employ a new metric, the percentage of correct decision (PCD). The PCD is the percentage of runs in which the crossed or nested design makes the same decision as the population. Figure 3 shows the PCD over σ̃_S² for each performance metric; the vertical dashed lines and the regions correspond to those in Figure 2. The PCD of the crossed design decreases to about 50% around the borderlines of the decision criteria and is close to 100% over the rest of the range of σ̃_S². This is reasonable because, at the borderlines, even a small variation of the performance metrics results in a different decision. If the AGRR works correctly, the PCD must be close to 100% except at the borderlines. However, the PCD trajectories of the nested design drop rapidly to almost zero and do not recover toward 100% over a wide range of σ̃_S². In region III, they are about 60% (about 70% in the case of NDC), which implies that the decision of the nested design is almost random.
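The PCD is straightforward to compute once a decision rule is fixed. A minimal sketch (our own helper names; the rule shown is the % Study Variation criterion from Table 3, and any of the other criteria can be substituted):

```python
def decide_pct_sv(value):
    """Map a % Study Variation value onto the Table 3 regions."""
    if value <= 10.0:
        return "acceptable"
    if value <= 30.0:
        return "pending"
    return "unacceptable"

def pcd(estimates, true_value, decide=decide_pct_sv):
    """Percentage of correct decision: the share of simulation runs
    whose decision agrees with the decision implied by the true
    (population) value of the metric."""
    truth = decide(true_value)
    hits = sum(decide(v) == truth for v in estimates)
    return 100.0 * hits / len(estimates)
```

For example, with a true % Study Variation of 7% (acceptable), estimated values of 5, 8, 12, and 9 yield a PCD of 75%.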
Percentage of correct decision by performance index for all populations.
5.3. Effects of the Allocation of the Variance Components and the Experimental Design
In this subsection, we investigate the causes of the poor adequacy of the nested design from the perspective of the allocation of the variance components and the experimental design. Figure 4 shows the % Study Variation for the allocations of the variance components (σ̃_O², σ̃_SO², σ̃_E²) in Table 5, and Figure 5 shows the % Study Variation for the experimental designs (o_C, s_C, r_C) and (o_N, s_N, r_N) in Table 6. As mentioned in Section 5.1, the metric should approach the population value as σ̃_S² increases. However, the gap remains large in region III irrespective of the allocation of σ̂_O², σ̂_SO², and σ̂_E², while the gap of the crossed design is close to zero. The situation is very similar for the experimental design. In general, increasing the degrees of freedom improves the estimation quality of the variance components in AGRR and thus decreases the gap to the true value. However, in Figure 5, changing the experimental design does not significantly reduce the gap of the nested design; its adequacy remains very low in all regions. From the above results, we can conclude that the allocation of the variance components and the experimental design are not the causes of the poor adequacy of the nested design.
Average % Study Variation for the initial variance components.
Average % Study Variation for the experimental designs.
Histogram of sample repeatability from four different populations.
5.4. Robustness of AGRR
Next, we draw histograms of the repeatability and reproducibility at four distinct values of σ̃_S², as shown in Figures 6 and 7, respectively. We fixed the other parameters to eliminate side effects: (σ̃_O², σ̃_SO², σ̃_E²) = (1, 1, 1), (o_C, s_C, r_C) = (3, 9, 2), and (o_N, s_N, r_N) = (3, 3, 6). As for repeatability, as shown in Figure 6, both designs have histograms similar to a chi-squared distribution at every σ̃_S², which coincides with the theoretical result. This is not so for reproducibility. In the case of the crossed design, the shape of the histogram still resembles a chi-squared distribution. However, as shown in Figure 7, the histograms of the nested design differ: they have many zeros and spread widely over the whole range. The zeros of reproducibility make the effect of the operator statistically insignificant; hence, the results of AGRR are unstable. This implies that the estimation of the reproducibility of the nested design is inadequate, and it could be the main reason for the inadequacy of the design.
Histogram of sample reproducibility from four different populations.
Table 7 shows the estimates of the variance components at the fourteen values of σ̃_S² for the population, the crossed design, and the nested design. All estimates of the crossed design are very close to those of the population. On the other hand, the AGRR of the nested design overestimates the reproducibility σ̂_O², while it properly estimates the repeatability and σ̂_S² + σ̂_SO². The overestimation of σ̂_O² increases with σ_S². Since several estimates of σ̂_O² are zero in Figure 7, this overestimation implies that the nonzero estimates are very large. In the nested design, an operator does not share samples with other operators; therefore, it is hard for the ANOVA to separate the variability of the operator from that of the sample. That is, the variability of σ_S² leaks into the estimate σ̂_O², which inflates σ̂_O² and lowers the adequacy.
Average estimates of variance components of the population, the crossed design, and the nested design.

Population: σ_E² | σ_O² | σ_SO² | σ_S² || Crossed design: repeatability σ̂_E² | reproducibility σ̂_O² + σ̂_SO² | σ̂_S² || Nested design: repeatability σ̂_E² | reproducibility σ̂_O² | σ̂_S² + σ̂_SO²

0.978323 | 0.983632 | 0.973362 | 0.977343 || 0.991771 | 1.948983 | 0.992134 || 0.977858 | 1.066857 | 1.971704
0.967414 | 0.966841 | 0.973394 | 1.946821 || 0.981132 | 1.919912 | 1.94997  || 0.968178 | 1.109688 | 2.912493
0.975347 | 0.976984 | 0.973246 | 3.874717 || 0.987337 | 1.925811 | 3.895092 || 0.97622  | 1.266511 | 4.876734
0.970432 | 0.970866 | 0.973523 | 7.757004 || 0.984564 | 1.944843 | 7.806784 || 0.971788 | 1.63028  | 8.772708
0.97772  | 0.971914 | 0.973365 | 15.48564 || 0.995433 | 1.942836 | 15.483   || 0.982204 | 2.356923 | 16.44207
0.969531 | 0.972552 | 0.973272 | 30.93152 || 0.981883 | 1.940877 | 31.04326 || 0.970869 | 3.81615  | 31.86321
0.973194 | 0.97107  | 0.973456 | 62.02678 || 0.986751 | 1.937034 | 61.96536 || 0.975096 | 6.844492 | 62.9203
0.969944 | 0.973756 | 0.973213 | 124.708  || 0.980781 | 1.950806 | 124.9287 || 0.968457 | 12.68279 | 126.2736
0.968533 | 0.972231 | 0.973198 | 249.9384 || 0.982749 | 1.956499 | 251.2949 || 0.969171 | 25.92697 | 251.5965
0.969467 | 0.96702  | 0.973414 | 497.5466 || 0.983512 | 1.931515 | 496.9132 || 0.968052 | 50.91509 | 495.8767
0.972785 | 0.971262 | 0.973209 | 994.6194 || 0.98618  | 1.940338 | 1000.073 || 0.975168 | 99.09859 | 997.4504
0.9712   | 0.970518 | 0.973416 | 1999.386 || 0.984346 | 1.936489 | 1987.74  || 0.973287 | 189.851  | 1992.567
0.97488  | 0.967591 | 0.97352  | 3993.367 || 0.987316 | 1.933699 | 4007.995 || 0.974775 | 383.4533 | 4018.364
0.97684  | 0.975879 | 0.973405 | 7910.442 || 0.991661 | 1.943502 | 7960.486 || 0.979389 | 781.8932 | 7959.333
5.5. Evaluation of Robustness
The AGRR is based on sampling statistics: even though the population is the same, each run of the AGRR can yield different results. If an AGRR method is reliable and robust, the variance of its results should be small. To investigate the robustness of the AGRR, we employed the symmetric mean absolute percentage error (sMAPE) as follows [29]:

(5) sMAPEk = (1/n) · Σ_{i=1}^{n} |σ^Q,i2 − σQ,p2| / ((σ^Q,i2 + σQ,p2)/2) · 100,

where σ^Q,i2 denotes the estimate of Q from the ith set of observations, σQ,p2 is the true value of Q in the population, n is the number of repetitions, and k is the index of the scenario. sMAPE is a scale-independent variability measure; in general its range is from −200% to 200%, but since the values of σ^Q,i2 and σQ,p2 are all nonnegative, in our case sMAPEk lies between 0% and 200%. The smaller the sMAPEk, the lower the variability. Figure 8 shows the averaged sMAPEk of repeatability, reproducibility, R&R, and σ^S2 for the crossed design (σ^S2+σ^SO2 for the nested design). For the crossed design, all averaged sMAPEk values are very stable. On the other hand, the averaged sMAPEk values of the nested design are unstable except for repeatability. The errors of the reproducibility, and consequently of the R&R, are critical in regions II and III, where the values exceed 150%. This result implies that the reproducibility estimate of the nested design is not robust.
Averaged sMAPEk of the crossed design and the nested design.
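The sMAPE of Eq. (5) is straightforward to compute; the following sketch (our own, with assumed variable names, not the paper's code) evaluates it for a list of estimates against a true value:

```python
# Sketch of Eq. (5): symmetric mean absolute percentage error of n estimates
# of a variance component against its true population value.
import numpy as np

def smape(estimates, true_value):
    """sMAPE in percent; lies in [0%, 200%] since variance estimates
    and true variances are nonnegative."""
    est = np.asarray(estimates, dtype=float)
    return 100.0 * np.mean(np.abs(est - true_value)
                           / ((est + true_value) / 2.0))

# Example: four estimates scattered around a true variance of 2.0
print(smape([1.8, 2.1, 2.4, 1.9], 2.0))  # → about 9.68
```

Per scenario k, the paper averages this quantity over repeated simulation runs; a small averaged sMAPEk indicates a robust estimator.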
6. Conclusions
In this paper, we evaluated the adequacy of the AGRR from the perspective of practitioners. To this end, we designed a Monte Carlo simulation that differs from conventional approaches but is close to the actual AGRR procedure. We considered and compared the two main experimental structures, the crossed and nested designs, with respect to four popular performance metrics over various combinations of σS2, σO2, σSO2, and σE2. The experimental results show that the crossed design is adequate from all evaluation perspectives: the average performance metrics, PCD, histogram shape, and sMAPE. However, the adequacy of the nested design is very low on all evaluation terms. We showed that the inadequacy comes from the overestimation of σ^O2. We attempted to resolve this problem by increasing the number of operators, but it remained unsolved. In conclusion, we strongly recommend against applying the nested design as a tool of AGRR unless a solution to this problem is found. Such a solution could be a compensation coefficient for σ^O2 or an alternative experimental design; this could be a topic for further research.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (NRF-2013R1A1A2006947).
References
[1] AIAG.
[2] Montgomery, D. C., Runger, G. C. Gauge capability and designed experiments. Part I: basic methods.
[3] Montgomery, D. C., Runger, G. C. Gauge capability analysis and designed experiments. Part II: experimental design models and variance component estimation.
[4] Burdick, R. K., Larsen, G. Confidence intervals on measures of variability in R&R studies.
[5] Gorman, D., Bower, K. M. Measurement system analysis and destructive testing.
[6] Pan, J.-N. Determination of the optimal allocation of parameters for gauge repeatability and reproducibility study.
[7] Bergeret, F., Maubert, S., Sourd, P., Puel, F. Improving and applying destructive gauge capability.
[8] Minitab.
[9] Borror, C. M., Montgomery, D. C., Runger, G. C. Confidence intervals for variance components from gauge capability studies.
[10] Graybill, F. A., Wang, C. M. Confidence intervals of nonnegative linear combinations of variances.
[11] SAS Institute. SAS Technical Report P-229, SAS/STAT Software: Changes and Enhancements, Release 6.07. SAS Publishing, 1992. http://www.barnesandnoble.com/w/sas-technical-report-p-229-sas-stat-software-s-a-s-institute-incorporated/1014558798?ean=9781555444730
[12] Milliken, G. A., Johnson, D. E.
[13] Dolezal, K. K., Burdick, R. K., Birch, N. J. Analysis of a two-factor R&R study with fixed operators.
[14] Hamada, M., Weerahandi, S. Measurement system assessment via generalized inference.
[15] Chiang, A. K. A simple general method for constructing confidence intervals for functions of variance components.
[16] Daniels, L., Burdick, R. K., Quiroz, J. Confidence intervals in a gauge R&R study with fixed operators.
[17] Wang, F.-K., Li, E. Y. Confidence intervals in repeatability and reproducibility using the bootstrap method.
[18] Burdick, R. K., Allen, E. A., Larsen, G. A. Comparing variability of two measurement processes using R&R studies.
[19] Cochran, W. G. Testing a linear relation among variances.
[20] Chiang, A. Improved confidence intervals for a ratio in an R&R study.
[21] Burdick, R. K., Borror, C. M., Montgomery, D. C. A review of methods for measurement systems capability analysis.
[22] Adamec, E., Burdick, R. K. Confidence intervals for a discrimination ratio in a gauge R&R study with three random factors.
[23] Burdick, R. K., Park, Y.-J., Montgomery, D. C. Confidence intervals for misclassification rates in a gauge R&R study.
[24] Woodall, W. H., Borror, C. M. Some relationships between gage R&R criteria.
[25] Browne, R., MacKay, J., Steiner, S.
[26] Pan, J.-N., Li, C.-I., Ou, S.-C. Determining the optimal allocation of parameters for multivariate measurement system analysis.
[27] De Mast, J., Trip, A. Gauge R&R studies for destructive measurements.
[28] Van Der Meulen, F., De Koning, H., De Mast, J. Nonrepeatable gauge R&R studies assuming temporal or patterned object variation.
[29] Wallström, P., Segerstedt, A. Evaluation of forecasting error measurements and techniques for intermittent demand.