A Robust Control Chart for Monitoring Dispersion

Most robust control charts in the literature are for monitoring process location parameters, such as the mean or median, rather than process dispersion parameters. This paper develops a new robust control chart by integrating a two-sample nonparametric test into the effective change-point model. Our proposed chart is easy to compute, convenient to use, and very powerful in detecting process dispersion shifts.


Introduction
Statistical process control (SPC) has been widely used in various industrial processes. Most SPC applications assume that the quality of a process can be adequately represented by the distribution of a quality characteristic, and that the in-control (IC) and out-of-control (OC) distributions belong to the same family and differ only in their parameters.
Parametric methods are useful only when the process distribution is known, and in practice there is often not enough knowledge about it. For example, univariate process data are often assumed to be normally distributed, although it is well recognized that in many applications, particularly in start-up situations, the underlying process distribution is unknown and not normal, so the statistical properties of commonly used charts, which are designed to perform best under the normal distribution, can be strongly affected. Robust charts are needed in such situations. A chart is called robust or distribution-free if its IC run-length distribution is nearly the same for every continuous distribution [1].
In the last several years, robust control charts have attracted much attention. For example, Bakir and Reynolds [2] proposed a cumulative sum (CUSUM) chart for group observations based on the Wilcoxon signed-rank statistic. McDonald [3] considered a CUSUM procedure for individual observations based on statistics called "sequential ranks." An exponentially weighted moving average (EWMA) chart for individual observations proposed by Hackl and Ledolter [4] is constructed from the "standardized ranks" of observations, which are determined by the IC distribution. If the distribution is not available, they recommended using the ranks in collected reference data instead. The robust charts considered by Chakraborti et al. [5,6] are based on the precedence test. Recently, a Shewhart-type chart and a scheme using a change-point formulation based on the Mann-Whitney test statistic were investigated by Chakraborti and van de Wiel [7], Zhou et al. [8], and Hawkins and Deng [9]. Jones-Farmer et al. [10] developed a rank-based robust Phase I control scheme for subgroup location. Other developments include Albers and Kallenberg [11] and Bakir [12,13]. A nice overview of univariate robust control charts was presented by Chakraborti et al. [1]. In addition, robust control charts in multivariate cases have been discussed by Liu [14], Qiu and Hawkins [15], and Qiu [16].
Most of the robust charts mentioned above focus on monitoring the process median, but monitoring the process dispersion is also highly desirable. However, there are far fewer robust control charts which can monitor process dispersion. Zou and Tsung [17] proposed a chart which incorporates a powerful goodness-of-fit (GOF) test [18] using the nonparametric likelihood ratio into an EWMA chart. It can detect more general changes than location shifts and is also very easy to compute, but it leaves a tuning parameter $\lambda$ to choose. This paper develops a new robust control chart by integrating a two-sample nonparametric test [19] into the effective change-point model. Simulation studies show that the proposed method is superior to other robust schemes in monitoring dispersion. Because it avoids the need for a lengthy data-gathering step before charting (although it is generally necessary and advisable to have at least about 20 warm-up samples) and does not require knowledge of the underlying distribution, the proposed chart is particularly useful in start-up or short-run situations.
The rest of this paper is organized as follows. The control chart for Phase I is given in Section 2. The control chart for Phase II is derived in Section 3. The performance comparisons with two other robust control charts are discussed in Section 4. The conclusion is given in Section 5.

The Control Chart for Phase I
We begin by considering the Phase I problem of detecting a change point in a fixed-size sequence of observations. We denote the observations by $\{x_1, \ldots, x_n\}$, and the goal is to test whether they have all been generated by the same probability distribution. We assume that no prior knowledge is available regarding this distribution other than that it is continuous. Using the language of statistical hypothesis testing, the null hypothesis is that there is no change point and all the observations come from the same distribution, while the alternative hypothesis is that there exists a single change point $\tau$ in the sequence which partitions them into two sets, with $x_1, \ldots, x_\tau$ coming from the prechange distribution $F_0$ and $x_{\tau+1}, \ldots, x_n$ coming from a different postchange distribution $F_1$:

$$H_0: x_i \sim F_0, \quad i = 1, \ldots, n, \qquad \text{versus} \qquad H_1: x_i \sim \begin{cases} F_0, & i = 1, \ldots, \tau, \\ F_1, & i = \tau + 1, \ldots, n. \end{cases}$$

We can test for a change point immediately following any observation $x_k$ by partitioning the observations into two samples $S_1 = \{x_1, \ldots, x_k\}$ and $S_2 = \{x_{k+1}, \ldots, x_n\}$ of sizes $n_1 = k$ and $n_2 = n - k$, respectively, and then performing an appropriate two-sample hypothesis test. For example, to detect a change in a location parameter without making assumptions about the distribution, the Mann-Whitney statistic would be a proper test statistic [9]. In order to monitor the process dispersion, we consider the Mood test. The Mood test uses the statistic

$$M'_{k,n} = \sum_{i=1}^{k} \left( R_{1i} - \frac{n+1}{2} \right)^2,$$

where $R_{1i}$ is the rank of the $i$th observation $x_i$ in the pooled sample. $R_{1i}$ can be computed as

$$R_{1i} = \sum_{j=1}^{n} I(x_j \le x_i).$$

The mean and variance of the Mood test statistic are

$$\mu_{M'_{k,n}} = \frac{n_1 (n^2 - 1)}{12}, \qquad \sigma^2_{M'_{k,n}} = \frac{n_1 n_2 (n+1)(n^2 - 4)}{180},$$

and the standardized statistic is $M_{k,n} = \left| (M'_{k,n} - \mu_{M'_{k,n}}) / \sigma_{M'_{k,n}} \right|$. We reject the null hypothesis that no change occurs at $k$ if $M_{k,n} > h_{k,n}$ for some appropriately chosen value of $h_{k,n}$. The statistic can be integrated into the change-point model and is easy to compute.

Since we do not know in advance where the change point is located, we do not know which value of $k$ to use for partitioning. We therefore specify a more general null hypothesis that there is no change at any point in the sequence. The alternative hypothesis is then that there exists a change point for some unspecified value of $k$. We can perform this test by computing $M_{k,n}$ at every value $0 < k < n$ and taking the maximum value. This leads to the maximized test statistic

$$M_n = \max_{0 < k < n} M_{k,n}.$$

If $M_n > h_n$ for some suitably chosen threshold $h_n$, then the null hypothesis is rejected, and we conclude that a change occurred at some point in the data. In this case, the best estimate $\hat{\tau}$ of the location of the change point is the value of $k$ which maximizes $M_{k,n}$. If $M_n \le h_n$, then we do not reject the null hypothesis and hence conclude that no change has occurred. The choice of this threshold is discussed further in the following section.
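As a rough illustration of the Phase I procedure, the following Python sketch computes a standardized Mood statistic at every split point and returns the maximum together with the maximizing split. The function names, the use of NumPy, and the absolute-value standardization are assumptions made for this sketch, not details taken from the authors' implementation. Ties are assumed absent, in line with the continuity assumption, and the threshold used in the example is an arbitrary placeholder rather than a value from Table 1.

```python
import numpy as np

def mood_split_statistics(x):
    """Standardized Mood statistic for every split k = 1, ..., n-1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 observations")
    # Ranks 1..n in the pooled sample (assumes no ties, i.e. continuous data).
    ranks = np.argsort(np.argsort(x)) + 1
    sq_dev = (ranks - (n + 1) / 2.0) ** 2
    k = np.arange(1, n)                       # candidate change points
    raw = np.cumsum(sq_dev)[:-1]              # raw Mood statistic of the first k points
    mean = k * (n**2 - 1) / 12.0              # null mean with n1 = k
    var = k * (n - k) * (n + 1) * (n**2 - 4) / 180.0   # null variance with n2 = n - k
    return np.abs(raw - mean) / np.sqrt(var)

def phase_one_test(x, threshold):
    """Return (maximized statistic, estimated change point, reject flag)."""
    stats = mood_split_statistics(x)
    k_hat = int(np.argmax(stats)) + 1         # split that maximizes the statistic
    m_n = float(stats[k_hat - 1])
    return m_n, k_hat, m_n > threshold

# Example: 60 standard normal points followed by 60 points with tripled scale.
rng = np.random.default_rng(0)
series = np.concatenate([rng.standard_normal(60), 3.0 * rng.standard_normal(60)])
print(phase_one_test(series, threshold=3.0))  # threshold is a placeholder value
```

Using a cumulative sum of squared rank deviations gives all $n - 1$ split statistics from a single ranking of the pooled sample, which is one way to keep the Phase I scan cheap.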

The Control Chart for Phase II
Having considered the problem of detecting changes in a fixed-size sample, we now turn to sequential Phase II monitoring, where new observations are received over time. Let $x_t$ denote the $t$th observation, where $t$ increases over time.

First, because there are only a finite number of ways to assign ranks to a set of $t$ points, the $M_t$ statistic can only take a discrete set of values. This creates a problem for threshold choice when $t$ is small, since it may not be possible to find a value for $h_t$ which gives the exact ARL$_0$ required; this is a general problem when dealing with discrete-valued test statistics. Therefore, we recommend that Phase II monitoring begin only after the first 20 observations have been received, which gives sufficient possibilities for rank assignments to make most ARL$_0$ values achievable. This seems a reasonable compromise, since in practice it would be very difficult to detect a change that occurred during the first 20 observations.

Second, we make some modifications to the $M_t$ statistic. Suppose there are $n_0$ warm-up data. Because it is impossible to have a change point in these warm-up data, we set the $M_t$ statistic as follows:

$$M_t = \max_{n_0 \le k < t} M_{k,t}.$$

Once a new observation $x_t$ is received, we regard $\{x_1, \ldots, x_t\}$ as a fixed-size sample and employ our proposed method based on the above modified $M_t$ statistic to test whether a change point has occurred. The problem of sequential monitoring is then reduced to performing a sequence of fixed-size tests. Suppose it is desired to have an IC average run length (ARL$_0$) of $\gamma$. This can be achieved if we choose the $h_t$ values so that the probability of incurring a false alarm at the $t$th observation, given that no false alarm has occurred earlier, equals $1/\gamma$. We hence require that, for all $t > n_0$,

$$P\left( M_t > h_t \mid M_{t-1} \le h_{t-1}, \ldots, M_{n_0+1} \le h_{n_0+1} \right) = \frac{1}{\gamma},$$

with no conditioning at the first monitored observation.

It is not trivial to find a sequence of $h_t$ values which satisfies this property. The approach in Hawkins and Deng [9] is to use Monte Carlo simulation, and we follow the same approach. One million realizations of the sequence $\{x_1, \ldots, x_{1000}\}$ were generated. Because the distribution of $M_t$ does not depend on the distribution of the observations $x_t$, these values can be sampled from any continuous distribution so long as they are independent and identically distributed. Then, for each value of $t$, $M_t$ is computed for each of the million realizations. The values of $h_t$ corresponding to the desired ARL$_0$ can then be read off from them. Table 1 shows the values of $h_t$ which give various commonly used values of the ARL$_0$. Note that these values appear to have converged by the 1000th observation, so if the stream contains more than 1000 observations it is reasonable to let $h_t = h_{1000}$ for $t > 1000$.

We denote our chart by ROBUSTD, for robust control chart for monitoring process dispersion. To be used in practice, our approach requires a computationally efficient method for computing the ROBUSTD statistic $M_t$. We denote by $R^{t+1}_{1i}$ the rank of the $i$th observation $x_i$ among all $(t+1)$ observations. Although computing these $R^{t+1}_{1i}$ values may seem computationally expensive, the cost can be greatly reduced by noting that the arrival of a new observation $x_{t+1}$ has only a small effect on the ranks. It can easily be shown that

$$R^{t+1}_{1i} = R^{t}_{1i} + I(x_{t+1} \le x_i), \quad i = 1, \ldots, k,$$

where $k$ denotes the possible change point. Therefore, we can compute $M_{k,t+1}$ based on these $R^{t+1}_{1i}$ values and ultimately obtain the value of $M_{t+1}$.
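The threshold simulation and the incremental rank update described above can be sketched together in Python. This is a small-scale illustration under stated assumptions, not the authors' code: the function names, the conditional-quantile rule for reading off each threshold, the absolute-value standardization of the Mood statistic, and the default run counts are all choices made for this sketch. With only a few thousand short runs the resulting values are noisy and should not be compared with Table 1, which is based on one million realizations of length 1000.

```python
import numpy as np

def update_ranks(ranks, x_old, x_new):
    """Row-wise rank update when one new observation arrives (assumes no ties).

    ranks and x_old have shape (runs, t); x_new has shape (runs,).
    """
    new = x_new[:, None]
    bumped = ranks + (x_old >= new)             # old points above the newcomer move up by one
    new_rank = 1 + (x_old <= new).sum(axis=1)   # newcomer ranks above every smaller old point
    return np.column_stack([bumped, new_rank])

def max_mood_from_ranks(ranks, n0):
    """Row-wise maximum over n0 <= k < t of the standardized Mood statistic."""
    t = ranks.shape[1]
    k = np.arange(1, t)
    raw = np.cumsum((ranks - (t + 1) / 2.0) ** 2, axis=1)[:, :-1]
    mean = k * (t**2 - 1) / 12.0
    sd = np.sqrt(k * (t - k) * (t + 1) * (t**2 - 4) / 180.0)
    return (np.abs(raw - mean) / sd)[:, n0 - 1:].max(axis=1)

def simulate_thresholds(arl0, n0=20, t_max=60, n_runs=2000, seed=0):
    """Estimate thresholds for t = n0+1, ..., t_max by Monte Carlo (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    # Any continuous i.i.d. distribution works because only ranks are used.
    data = rng.standard_normal((n_runs, t_max))
    ranks = np.argsort(np.argsort(data[:, :n0], axis=1), axis=1) + 1
    alive = np.ones(n_runs, dtype=bool)         # runs with no false alarm so far
    thresholds = {}
    for t in range(n0 + 1, t_max + 1):
        ranks = update_ranks(ranks, data[:, :t - 1], data[:, t - 1])
        stats = max_mood_from_ranks(ranks, n0)
        # Conditional false-alarm rate 1/arl0 among runs that are still alive.
        h_t = float(np.quantile(stats[alive], 1.0 - 1.0 / arl0))
        thresholds[t] = h_t
        alive &= stats <= h_t
    return thresholds

thresholds = simulate_thresholds(arl0=200)
print(thresholds[21], thresholds[60])
```

Discarding the runs that exceed the threshold at each step is what makes the false-alarm probability conditional on no earlier alarm, which is the property the threshold sequence is required to satisfy.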

Performance Comparisons
We now evaluate the performance of our chart. As is standard in the quality control literature, we measure performance as the average time taken to detect a change of magnitude $\delta$, which we denote by ARL$_1(\delta)$. We consider changes which affect the process dispersion. Three different process distributions are considered: the standard normal distribution $N(0, 1)$, the Student $t$ distribution with 3 degrees of freedom $t(3)$, and the chi-square distribution with 3 degrees of freedom $\chi^2_3$. The latter two correspond to heavy-tailed and skewed distributions, respectively.
Because our chart can be treated as a self-starting chart, the number of observations available before the change may have a large impact on its performance. We consider changes which occur after 50 and after 100 observations, that is, $\tau \in \{50, 100\}$. We compare our ROBUSTD chart to two other change-point detection algorithms. The first is the method described in Hawkins and Deng [9] for location shifts, which we denote by MWCPM. It uses a change-point model similar to ours, but its test statistic is the Mann-Whitney statistic. Second, we compare our ROBUSTD chart to that of Zou and Tsung [17], which integrates the nonparametric likelihood ratio test framework into the EWMA chart. Their chart contains a tuning parameter $\lambda$ used in the EWMA scheme. Large values of $\lambda$ produce a chart which is more efficient at detecting large changes, while small values of $\lambda$ produce a chart which is sensitive to small changes. We use $\lambda = 0.1$, which is a value considered in their paper, and we denote their chart by NLREWMA. To allow fair comparisons, we set the ARL$_0$ of every chart at 500. Similar results hold for other values of ARL$_0$, but we omit them for space reasons. For each of the three distributions, 10000 sequences were generated, and the change consists of multiplying all postchange observations by $\delta$. The average time taken to detect the change is then recorded for each chart.
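The data-generating step of this comparison can be sketched as follows. This is a minimal illustration under assumptions: the function name, the distribution labels, and the fixed number of postchange observations are hypothetical choices for the sketch, and the code only produces one sequence with a scale change at a given change time; it does not run any of the three charts.

```python
import numpy as np

def simulate_shifted_sequence(dist, tau, delta, n_post, rng):
    """One sequence with tau in-control points, then n_post points multiplied by delta."""
    samplers = {
        "normal": lambda size: rng.standard_normal(size),  # N(0, 1)
        "t3": lambda size: rng.standard_t(3, size),        # heavy-tailed case
        "chi2_3": lambda size: rng.chisquare(3, size),     # skewed case
    }
    draw = samplers[dist]
    pre = draw(tau)                   # prechange observations
    post = delta * draw(n_post)       # postchange observations, scaled by delta
    return np.concatenate([pre, post])

# Example: change after 50 observations, dispersion inflated by a factor of 2.
rng = np.random.default_rng(0)
x = simulate_shifted_sequence("t3", tau=50, delta=2.0, n_post=500, rng=rng)
print(x[:50].std(), x[50:].std())
```

In a full study, each chart would be run on many such sequences, and the detection delay (alarm time minus the change time) would be averaged over the sequences in which no alarm occurred before the change.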
Tables 2, 3, and 4 show the average time required to detect shifts in dispersion, from which we draw the following conclusions.
(i) Our chart is much better than the MWCPM in all cases of dispersion shifts.
(ii) Our chart is much better than the NLREWMA in most cases of dispersion shifts.
Based on these comparisons, we conclude that, among the charts compared, our chart is the best choice for monitoring dispersion shifts, since it performs well across all magnitudes of shifts considered.

Conclusions
We proposed a new robust and self-starting control chart to detect dispersion shifts by integrating a two-sample nonparametric test [19] into the effective change-point model.
In most cases of dispersion shifts, our chart performs much better than the other nonparametric methods compared. Because it avoids the need for a lengthy data-gathering step before charting (although it is generally necessary and advisable to have about 20 warm-up samples) and does not require knowledge of the underlying distribution, the proposed chart is particularly useful in start-up or short-run situations.

Table 1: Values of the threshold sequence $h_t$ corresponding to ARL$_0$ of 200, 500, and 1000.

Table 2: ARL$_1(\delta)$ for dispersion shifts in the $N(0, 1)$ distribution, for several values of the change time $\tau$.

Table 3: ARL$_1(\delta)$ for dispersion shifts in the heavy-tailed distribution $t(3)$, for several values of the change time $\tau$.

Table 4: ARL$_1(\delta)$ for dispersion shifts in the skewed distribution $\chi^2_3$, for several values of the change time $\tau$.