Robust Hotelling $T^2$ Control Chart with Consistent Minimum Vector Variance

The Hotelling $T^2$ control chart based on the minimum vector variance (MVV) estimators, $T^2_{MVV}$, was introduced for Phase II monitoring. The $T^2_{MVV}$ chart was able to detect out-of-control signals and simultaneously control the false alarm rate even as the dimension increased. However, the estimated upper control limits (UCLs) of $T^2_{MVV}$ are large compared to those of the traditional chart. In this study, we improve the MVV estimators in terms of consistency and bias. The results show a great improvement in the control limit values, while the chart maintains its good performance in terms of false alarm rate and probability of detection.


Introduction
The Hotelling $T^2$ statistic was the first statistic known to be used in a multivariate control chart; the resulting chart is referred to as the Hotelling $T^2$ control chart. This statistic measures the significance of the shift from the out-of-control mean vector, $\mu_1$, to the nominal mean vector, $\mu_0$, under the assumption that the covariance matrix remains constant at $\Sigma_0$. The purpose of the control chart is to monitor the stability of a multivariate process in Phases I and II. Analysis in Phase I seeks to identify a stable historical data set (HDS). From this data set, the in-control mean vector and the in-control variance-covariance matrix are estimated for later use in the Phase II analysis. Successful process monitoring in Phase II depends entirely on the parameter estimates obtained from a stable HDS. However, the estimators are easily affected by an unstable process, that is, by multivariate outliers. The existence of outliers can violate the normality assumption. This violation may inflate the control limits and reduce the probability of detection in Phase I, which in turn distorts the false alarm level and reduces the power to detect changes in Phase II [1]. The false alarm rate is the probability of an out-of-control signal when the process is in control. The value becomes large if the process is unstable due to an increase in variability. An inflated false alarm rate can lead to unnecessary process adjustments and loss of confidence in the control chart as a monitoring tool [2]. Hence, a method that can hold the false alarm rate at the desired (nominal) level is necessary.
However, the traditional Hotelling $T^2$ control chart is only effective in eliminating extreme outliers in small samples; it fails to detect moderate outliers, particularly when the number of variables is large [3][4][5]. To overcome the problem, alternative estimation methods have been proposed in the literature. One approach is to calculate the $T^2$ statistic based on the successive differences variance-covariance matrix estimator [6][7][8][9]. Though this approach is effective in detecting shifts in the mean vector, it fails to detect other outliers, as shown in Vargas [3]. Another approach is to use robust estimators in place of the classical estimators ($\bar{x}$ and $S$). Robust estimators are known to be more effective than the classical estimators in detecting deviations in the data, or outliers [10]. A wide range of robust estimators of multivariate location and scatter is available; see [11,12] for a review. However, the MCD estimator is more attractive than the others because it has good theoretical properties: affine equivariance, high breakdown value, bounded influence function, and a better convergence rate [13,14].
Studies on the role of MCD estimators in the construction of robust Hotelling $T^2$ charts are readily found in the literature. Vargas [3] and Jensen et al. [4] introduced robust control charts based on the MCD estimator for multivariate individual observations. They identified and removed the outliers in the Phase I analysis by using the robust estimator and then calculated the classical estimator from the remaining good data points for the Phase II analysis. They noticed some drawbacks when MCD was used in Phase I. The Hotelling $T^2$ chart based on MCD needed a larger sample size when a large number of outliers was suspected, to ensure that the MCD estimator did not break down and lose its ability, especially when monitoring a larger number of quality characteristics ($p$). To abate the problems, Chenouri et al. [5] proposed a robust Hotelling $T^2$ chart based on the reweighted MCD (RMCD) estimator. Besides possessing the nice properties of the MCD estimator, this estimator was not unduly influenced by outliers and was more efficient than MCD. However, their approach differed from that of Vargas [3] and Jensen et al. [4] in that they used the RMCD estimator in place of the classical estimators to construct the Hotelling $T^2$ chart for Phase II data directly. Using the same approach as Chenouri et al. [5], Alfaro and Ortega [15] compared the performance of the Hotelling $T^2$ control chart in Phase II based on the MCD, MVE, reweighted MCD, and trimmed estimators. Their findings showed a conflict between the percentage of outliers detected and the ability of the robust control charts to control the overall false alarm rate under certain conditions. To alleviate this conflict, Yahaya et al. [16] introduced the MVV estimator in the Hotelling $T^2$ chart ($T^2_{MVV}$) in Phase II. In general, the results showed that the $T^2_{MVV}$ chart was able to increase the detection of out-of-control signals and simultaneously control false alarm rates even with a large number of quality characteristics. In contrast, the MCD charts performed well in detecting out-of-control signals but failed to control false alarm rates. The traditional chart, however, was able to control false alarm rates but was not effective in detecting out-of-control signals. Despite the good performance of $T^2_{MVV}$, the estimated UCLs for the Hotelling $T^2$ chart based on MVV estimators were large compared to those of the traditional and MCD charts. Thus, this study attempts to improve the MVV estimators in achieving the desired UCLs by making the estimators consistent at the normal model. Since in practice we always deal with finite samples, the issue of bias in a finite sample is also considered in this study. The advantage of having an unbiased estimator for a finite sample is that the estimator remains unbiased as the sample size becomes larger [17]. With respect to the latter issue, this paper also seeks to improve the performance of MVV by making it unbiased for finite samples.
The remainder of this paper is organized as follows. The formal definition of the MVV estimator and the adjustment made to the MVV scatter estimator to ensure that it is consistent and unbiased are discussed next, followed by an investigation of the improved MVV estimator through a simulation study. The discussion continues with the computation of control limits for the traditional, MCD, and improved Hotelling $T^2_{MVV}$ charts, where the improvement of the proposed chart is shown. A real data analysis from the aircraft industry is presented to illustrate the applicability of the proposed charts before the conclusion in the last section.

Minimum Vector Variance (MVV) Estimator
Herwindiati et al. [18] proved that MVV estimators possess three major properties of a good robust estimator, that is, high breakdown point, affine equivariance, and computational efficiency. The estimation of MVV relies mainly on the Mahalanobis squared distances (MSDs). Let $X = \{x_1, x_2, \ldots, x_n\}$ be a data set of $p$-variate observations. Denote the MVV estimators of location and scatter by $\bar{x}_{MVV}$ and $S_{MVV}$, respectively. Now let $H \subseteq X$; $\bar{x}_{MVV}$ and $S_{MVV}$ are determined from the set $H$ consisting of $h = \lfloor (n + p + 1)/2 \rfloor$ data points such that $S_{MVV}$ has the minimum trace of $S^2_{MVV}$, denoted $\mathrm{Tr}(S^2_{MVV})$, among all possible sets of $h$ data points. To compute the MVV estimates, we used the MVV algorithm proposed in Yahaya et al. [16]. The location and scatter estimators are defined as

$$\bar{x}_{MVV} = \frac{1}{h} \sum_{x_i \in H} x_i, \tag{1}$$

$$S_{MVV} = \frac{1}{h} \sum_{x_i \in H} (x_i - \bar{x}_{MVV})(x_i - \bar{x}_{MVV})^T. \tag{2}$$

Consistency Factor.
The aim of the Hotelling $T^2$ chart in Phase I is to estimate the in-control parameters of location, $\mu$, and scatter, $\Sigma$. The usual estimators for these parameters are the normal maximum likelihood estimators (MLE). The estimation of the parameters is based on the data set $X = \{x_1, x_2, \ldots, x_n\}$ from a multivariate normal distribution with density

$$f(x) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right), \tag{3}$$

with $\mu \in \mathbb{R}^p$ and $\Sigma$ a positive definite symmetric $p \times p$ matrix. However, the distribution in (3) is only an approximation because a portion of the data may be contaminated by outliers [19]. When outliers are present, the MLE, which are known to be sensitive to outliers, cannot estimate the parameters precisely. To address this problem, we propose to replace the MLE with the MVV estimators, that is, the robust estimators with the highest breakdown point (50%) proposed by Herwindiati [20]. We compute the MVV estimators on the Phase I data sets, with location and scatter estimators as defined in (1) and (2), respectively. The MVV estimator has a fixed integer $h$ such that

$$\left\lfloor \frac{n + p + 1}{2} \right\rfloor \le h \le n. \tag{4}$$

The preferred choice of $h$ for outlier detection is its lower bound, which yields the breakdown value $\mathrm{BP} = (n - 2(p - 1))/2n$. Let $\bar{x}_{MVV}$ and $S_{MVV}$ be the mean and the scatter matrix calculated from the $h$ observations out of $n$ whose classical scatter matrix has the lowest vector variance, resulting from the $h$ smallest MSDs. $S_{MVV}$ is a $p \times p$ scatter matrix which is positive definite, symmetric (PDS), and affine equivariant [20]. However, this estimator is not consistent under the normal model. A robust scatter estimator is typically calibrated to be consistent for the normal model. Known as Fisher consistency, this is a standard concept in robust statistics which means that the functional evaluated at the model distribution returns the true parameter value, $\Sigma$ [19]. To achieve consistency under the normal model, $S_{MVV}$ in (2) is multiplied by a consistency factor, $c(h)$, as follows:

$$c(h)\, S_{MVV}. \tag{5}$$

An approximation of the consistency factor can be obtained from elliptical truncation in the multivariate normal distribution based on the squared distance. If $x_i \sim N_p(\mu, \Sigma)$, $c(h)$ is defined as

$$c(h) = \frac{h/n}{P\left(\chi^2_{p+2} \le \chi^2_{p,\,h/n}\right)}, \tag{6}$$

where $\chi^2_{p,\,h/n}$ is the $h/n$-quantile of the $\chi^2_p$ distribution. This formula was derived by Butler et al. [13] and further discussed in Croux and Haesbroeck [14] based on the functional form of the MCD estimator. Since the MVV estimator has the same functional form as the MCD estimator, we used (6) as the consistency factor for $S_{MVV}$. Although this guarantees consistency under normality, Pison et al. [17] cautioned that MCD estimators are biased for small sample sizes. Thus, the consistency factor in (6) alone might not be sufficient to make the MVV estimator unbiased for small sample sizes. For that reason, we also compute a correction factor for any sample size $n$ and dimension $p$.
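As a rough illustration of (6), the consistency factor can be evaluated numerically. The sketch below uses only the Python standard library; the helper names `chi2_cdf`, `chi2_quantile`, and `consistency_factor` are hypothetical and are not taken from the paper or from any library.

```python
import math

def chi2_cdf(x, df):
    """P(chi2_df <= x), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_quantile(q, df):
    """Invert chi2_cdf by bisection; requires 0 < q < 1."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def consistency_factor(n, p, h=None):
    """c(h) = (h/n) / P(chi2_{p+2} <= chi2_{p, h/n}), as in (6)."""
    if h is None:
        h = (n + p + 1) // 2          # lower bound of h
    if h == n:
        return 1.0                    # no truncation, so no rescaling
    alpha = h / n
    q = chi2_quantile(alpha, p)       # h/n-quantile of chi2_p
    return alpha / chi2_cdf(q, p + 2)

print(consistency_factor(100, 2, h=50))
```

For $p = 2$ and $h/n = 0.5$ the quantile is $2\ln 2$ and the factor works out to approximately 3.26, which shows how strongly the raw half-sample scatter underestimates $\Sigma$.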

Correction Factor.
A simulation study on the effect of the correction factor on the MVV estimator is carried out for several sample sizes $n$ and dimensions $p$ = 2, 5, 10, 15, and 20. We generated data sets $X^{(j)} \in \mathbb{R}^{n \times p}$ from the standard multivariate normal distribution. For each data set $X^{(j)}$, $j = 1, \ldots, m$, we then determine the $c(h)\,S_{MVV}$ estimate, from which the correction factor $\kappa_{n,p}$ in (7) is obtained. The computed values are displayed in Table 1. Then, using $\kappa_{n,p}$ in (7) as the correction factor for $c(h)\,S_{MVV}$, we obtain

$$\kappa_{n,p}\, c(h)\, S_{MVV}. \tag{8}$$

Since $\kappa_{n,p}\,c(h)\,S_{MVV}$ can be considered consistent and unbiased, its determinant should approach 1.
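The simulation described above can be sketched as follows. This is a minimal illustration under two stated assumptions: the correction factor is taken as the reciprocal of the average $p$-th root of the determinant, in the spirit of Pison et al. [17], and the MVV search is a simplified random-start routine with concentration steps rather than the algorithm of Yahaya et al. [16]. The function names and default values are hypothetical.

```python
import numpy as np

def mvv_scatter(X, h, n_starts=20, n_steps=10, rng=None):
    """Simplified MVV search (assumption, not the algorithm of [16]):
    random h-subsets refined by concentration steps, keeping the
    subset whose covariance S has the smallest Tr(S^2)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_S, best_vv = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=h, replace=False)
        for _ in range(n_steps):
            mean = X[idx].mean(axis=0)
            S = np.cov(X[idx], rowvar=False, bias=True)   # divisor h, as in (2)
            d = X - mean
            msd = np.einsum("ij,ij->i", d @ np.linalg.inv(S), d)
            idx = np.argsort(msd)[:h]                     # h smallest MSDs
        S = np.cov(X[idx], rowvar=False, bias=True)
        vv = np.trace(S @ S)                              # vector variance
        if vv < best_vv:
            best_S, best_vv = S, vv
    return best_S

def correction_factor(n, p, c_h, m=200, seed=0):
    """Assumed form: 1 / mean_j det(c(h) S_MVV^(j))^(1/p) over m normal samples."""
    rng = np.random.default_rng(seed)
    h = (n + p + 1) // 2
    roots = []
    for _ in range(m):
        X = rng.standard_normal((n, p))                   # X^(j) from N_p(0, I_p)
        S = c_h * mvv_scatter(X, h, rng=rng)
        roots.append(np.linalg.det(S) ** (1.0 / p))
    return 1.0 / np.mean(roots)

# c_h = 1.0 is a placeholder; in practice it would come from (6).
print(correction_factor(n=30, p=2, c_h=1.0, m=50))
```

By construction, the average of $\det(\kappa_{n,p}\,c(h)\,S_{MVV})^{1/p}$ over the simulated data sets equals 1 under this assumed form, which matches the requirement that the determinant of the corrected estimator should approach 1.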

Investigation through Simulation Experiment
In line with Gather and Becker [21], the improved $T^2_{MVV}$ statistic for $x_i$ can be constructed in the following manner:

$$T^2_{MVV(I)}(x_i) = (x_i - \bar{x}_{MVV})^T \left(\kappa_{n,p}\, c(h)\, S_{MVV}\right)^{-1} (x_i - \bar{x}_{MVV}). \tag{9}$$

To check the distribution of the improved $T^2_{MVV}$, we employed QQ plots and evaluated the goodness of fit on those plots based on the slope and the R-square of the fitted straight line, as shown in Table 2. The hypothetical distribution represents the $\chi^2_p$ without error if all points lie on a straight line with slope equal to 1 and R-square also equal to 1 [22]. Random data were generated from the multivariate standard normal distribution $MVN_p(0, I_p)$. The study is carried out for a sample size of $n$ = 10,000 with dimensions $p$ = 2, 5, 10, 15, and 20. From this table we observe that the R-square values for all $p$ are 0.999. With regard to the slopes, there is a considerable difference between Hotelling's $T^2$ with the original MVV ($T^2_{MVV(O)}$) and Hotelling's $T^2$ with the improved MVV ($T^2_{MVV(I)}$), especially when $p = 2$. The slopes for $T^2_{MVV(I)}$ are consistent and approximately equal to 1 regardless of the dimension ($p$). In contrast, the slopes for $T^2_{MVV(O)}$ are quite far from 1, even though they decline towards 1 as $p$ increases. For $T^2_{MVV(I)}$, the two measurements ($R^2$ and slope) are very close to their ideal values, which signifies that the $\chi^2_p$ distribution fits the simulated $T^2_{MVV(I)}$ values well. The result implies that the constant $\kappa_{n,p}\,c(h)$ fulfills the condition of a multiplicative factor that makes the $S_{MVV}$ estimator consistent and unbiased for $\Sigma$.
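The QQ-plot check above can be illustrated with a small sketch. It assumes that the slope and R-square are taken from an ordinary least-squares line of the sorted statistics against the theoretical $\chi^2_p$ quantiles, and it is restricted to $p = 2$, where the $\chi^2_2$ quantile has the closed form $-2\ln(1 - q)$, so that no external library is needed. The simulated values are exact squared distances of standard normal vectors, not MVV-based statistics, and the function name `qq_slope_r2` is hypothetical.

```python
import math
import random

def qq_slope_r2(t2_values):
    """Slope and R^2 of the least-squares QQ line against chi-square(2) quantiles."""
    n = len(t2_values)
    y = sorted(t2_values)
    # Closed-form chi-square(2) quantile at plotting position (i + 0.5) / n
    x = [-2.0 * math.log(1.0 - (i + 0.5) / n) for i in range(n)]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

random.seed(1)
# Squared distances of N_2(0, I_2) vectors, which follow chi-square(2) exactly
t2 = [random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 for _ in range(10000)]
slope, r2 = qq_slope_r2(t2)
print(slope, r2)
```

A statistic whose scatter estimate is too small would produce squared distances that are too large, so its QQ slope would sit above 1, which is the pattern reported for $T^2_{MVV(O)}$.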

MVV Hotelling $T^2$ Control Chart
Let $X_1 = \{x_1, x_2, \ldots, x_n\}$ be the $p$-variate random sample of $n$ observations that forms the preliminary data set in Phase I. Calculate the $\bar{x}_{MVV}$ and $\kappa_{n,p}\,c(h)\,S_{MVV}$ estimators. Since the estimators are known to be free from outliers due to their estimation process, they can readily be used as in-control estimators in Phase II. Using these estimates, the $T^2_{MVV(I)}$ statistic in (9) is computed for each Phase II observation in $X_2 = \{x_{n+1}, x_{n+2}, \ldots\}$, where each Phase II observation $x_g \notin X_1$.
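The Phase II step can be sketched as a quadratic form followed by a comparison with the UCL. The function names and all numbers below are illustrative assumptions; in practice the location and scatter would be the Phase I estimates $\bar{x}_{MVV}$ and $\kappa_{n,p}\,c(h)\,S_{MVV}$.

```python
import numpy as np

def t2_statistic(x_new, location, scatter):
    """Row-wise (x - location)^T scatter^{-1} (x - location)."""
    d = np.atleast_2d(x_new) - location
    return np.einsum("ij,ij->i", d, np.linalg.solve(scatter, d.T).T)

def signals(x_new, location, scatter, ucl):
    """True where a Phase II observation exceeds the upper control limit."""
    return t2_statistic(x_new, location, scatter) > ucl

# Illustrative values only (not the paper's estimates)
location = np.zeros(2)
scatter = np.eye(2)
phase2 = np.array([[3.0, 4.0], [0.0, 1.0]])
print(t2_statistic(phase2, location, scatter))
print(signals(phase2, location, scatter, ucl=6.0))
```

Solving the linear system instead of forming the explicit inverse is a common numerical choice; it gives the same statistic for a symmetric positive definite scatter matrix.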

Estimation of Control Limits.
In this section, we present the control limit of the improved $T^2_{MVV(I)}$ control chart. To demonstrate the performance of the control charts, we need to identify the distribution of each method in order to obtain appropriate control limits, that is, the UCLs. Since the exact distribution of $T^2_{MVV(I)}$ is unknown, we apply the Monte Carlo method to estimate the quantiles of $T^2_{MVV(O)}$ and $T^2_{MVV(I)}$ for several combinations of sample sizes and dimensions. To estimate the 95% quantile of $T^2_{MVV(I)}$ for a given Phase I sample size $n$ and dimension $p$, we generate $K = 5000$ samples of size $n$ from a standard multivariate normal distribution, $MVN_p(0, I_p)$. For each data set of size $n$, we compute the MVV mean vector and the modified covariance matrix estimates, $\bar{x}_{MVV}(k)$ and $\kappa_{n,p}\,c(h)\,S_{MVV}(k)$, respectively, for $k = 1, \ldots, K$. In addition, for each data set, we randomly generate a new observation $x_{g,k}$, treated as a Phase II observation, from $MVN_p(0, I_p)$ and calculate the corresponding $T^2_{MVV(I)}(x_{g,k}, k)$ value. The empirical distribution function of $T^2_{MVV(I)}$ is based on the simulated values

$$T^2_{MVV(I)}(x_{g,k}, k) = \left(x_{g,k} - \bar{x}_{MVV}(k)\right)^T \left(\kappa_{n,p}\, c(h)\, S_{MVV}(k)\right)^{-1} \left(x_{g,k} - \bar{x}_{MVV}(k)\right), \quad k = 1, \ldots, K. \tag{10}$$

We sort the $T^2_{MVV(I)}(x_{g,k}, k)$ values in ascending order, and the UCL is the 95% quantile of the 5000 statistics. The results of the investigation are presented in Table 3. We observe that the estimated UCLs for $T^2_{MVV(O)}$ are large compared to those of the control charts $T^2_0$, $T^2_{MCD}$, and $T^2_{RMCD}$. However, after making the MVV scatter estimator consistent and unbiased as shown in (8), the results improved immensely: the UCLs are closer to the traditional UCLs.
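The Monte Carlo procedure above can be sketched as follows. To keep the example short, the sample mean and covariance are used as a stand-in estimator; this is an assumption made for illustration, and an MVV routine returning a (location, scatter) pair could be passed through the hypothetical `estimator` argument instead.

```python
import numpy as np

def classical_estimator(X):
    """Stand-in estimator: sample mean and covariance (not MVV)."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def monte_carlo_ucl(n, p, estimator=classical_estimator,
                    K=5000, level=0.95, seed=0):
    """Empirical `level` quantile of T^2 for one new in-control observation."""
    rng = np.random.default_rng(seed)
    t2 = np.empty(K)
    for k in range(K):
        X = rng.standard_normal((n, p))      # Phase I sample from MVN_p(0, I_p)
        loc, scat = estimator(X)             # estimates for data set k
        x_new = rng.standard_normal(p)       # one Phase II observation
        d = x_new - loc
        t2[k] = d @ np.linalg.solve(scat, d) # quadratic form as in (10)
    return np.quantile(t2, level)

print(monte_carlo_ucl(n=50, p=2, K=2000))
```

Because each replicate draws a fresh Phase I sample, the resulting quantile accounts for the variability of the estimators as well as that of the new observation, which is why the limit depends on both $n$ and $p$.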

Real Data Analysis.
The application of the improved method $T^2_{MVV(I)}$ on real data is illustrated using data furnished by Asian Composites Manufacturing Sdn. Bhd. (ACM), which is involved in the production of advanced [...]. The $T^2_{MVV(O)}$, $T^2_{MVV(I)}$, and $T^2_{MCD}$ statistics signal observations 20, 22, and 25 as out-of-control, but $T^2_0$ fails to signal observation 22 and considers only observations 20 and 25 as out-of-control. This is expected due to the low probability of detection of the traditional control chart [16]. For a clearer view of the performance of the control charts in detecting out-of-control observations, the corresponding control charts are presented in Figure 2.

Conclusion
The UCL values for the Hotelling $T^2$ control chart using consistent and unbiased MVV estimators improved markedly over those of the Hotelling $T^2$ control chart based on the original MVV estimators. The improved control chart ($T^2_{MVV(I)}$) was tested on real data. Even though the performance of the improved $T^2_{MVV}$ control chart was on par with the original $T^2_{MVV}$ chart, the improved estimators successfully reduced the inflated UCL of the original $T^2_{MVV}$ to a value close to the UCL of the traditional Hotelling $T^2$ ($T^2_0$) control chart. Moreover, when the improved control chart was compared with the traditional chart on the basis of their almost equal UCLs, the improved control chart performed better in detecting out-of-control observations. With these good properties and performance, the improved MVV estimators should be considered as alternative estimators to replace the usual mean vector and covariance matrix in the construction of the robust Hotelling $T^2$ control chart as well as in other multivariate statistical procedures.
Gather and Becker [21] have emphasized that robust estimators used in outlier detection methods should have a sufficient rate of convergence to the true underlying model parameter, for consistency and unbiasedness. A sequence of asymptotically unbiased estimators $\hat{\theta}_n$ for a parameter $\theta$ is called consistent if $\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| \ge \varepsilon) = 0$ for every $\varepsilon > 0$. To illustrate the consistency of the MVV estimator at the multivariate normal, data are randomly generated from $N_p(0, I_p)$. An experiment is carried out for increasing sample sizes $n$ until convergence, for a fixed moderate dimension, $p = 10$. Figure 1 shows the determinants of $\kappa_{n,p}\,c(h)\,S_{MVV}$ corresponding to the sample size, $n$. As the value of $n$ increases, the determinant approaches 1, which implies that $\kappa_{n,p}\,c(h)\,S_{MVV}$ is consistent. Next, the investigation by simulation experiment continues to show that $\bar{x}_{MVV}$ and $\kappa_{n,p}\,c(h)\,S_{MVV}$, which replace the MLE of $\mu$ and $\Sigma$ in Hotelling $T^2$, are consistent and unbiased. The squared distances based on any affine-equivariant robust location and scatter estimators which are consistent and unbiased under the normal model are asymptotically $\chi^2_p$ distributed [21]. Therefore, if $\bar{x}_{MVV}$ and $\kappa_{n,p}\,c(h)\,S_{MVV}$ are consistent and unbiased estimators for $\mu$ and $\Sigma$, then with observations $x_i$ i.i.d. in $\mathbb{R}^p \sim N_p(\mu, \Sigma)$, it follows that

$$T^2 = (x_i - \bar{x}_{MVV})^T \left(\kappa_{n,p}\, c(h)\, S_{MVV}\right)^{-1} (x_i - \bar{x}_{MVV}) \xrightarrow{d} \chi^2_p.$$

If we consider a sample of $p$ quality characteristics such that $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, where $i = 1, 2, \ldots, n$, as a Phase I data set, then the improved $T^2_{MVV}$ statistic for $x_i$ is constructed as in (9).

Table 2: The slope and R-square for $T^2_{MVV(O)}$ and $T^2_{MVV(I)}$.

The control limit of the $T^2_{MVV(I)}$ chart is obtained for each sample size, $n$, and number of dimensions, $p$. It is then compared with the control limit of the $T^2_{MVV(O)}$ chart, the robust Hotelling $T^2$ chart using MCD ($T^2_{MCD}$), and the traditional Hotelling $T^2$ chart. The application of robust estimators in place of the mean and covariance structure in the traditional Phase II Hotelling $T^2$ statistic changes the distributional properties of the traditional chart [9].

Table 3: Control limits of the investigated control charts for various combinations of sample sizes and dimensions.

The Phase I and Phase II data are listed in Table 4. For the purpose of this study, a sample of 47 spoilers ($n = 47$), with measurements on several features, namely, trim edge ($x_1$), trim edge spar ($x_2$), and drill hole ($x_3$), was furnished to us by the company. Of the total, 21 spoilers were collected in 2009, while the rest were from 2010. Hence, we used the 2009 spoilers as Phase I historical data and treated the spoilers from 2010 as future data. Estimates of the location vector ($\bar{x}$) and scatter matrix ($S$) are presented in Table 5. In the last column of Table 5, we can clearly observe that the upper control limit (UCL) for $T^2_{MVV(I)}$ is the closest to that of the traditional Hotelling $T^2$, with values of 11.5513 and 11.035, respectively, whereas the other control charts produce large UCL values, especially the original $T^2_{MVV}$ ($T^2_{MVV(O)}$). When we compare the improved with the original $T^2_{MVV}$, we observe a large disparity between the two UCL values: 41.298 for $T^2_{MVV(O)}$ and 11.5513 for $T^2_{MVV(I)}$. The result indicates a great improvement in the UCL values from $T^2_{MVV(O)}$ to $T^2_{MVV(I)}$. Table 6 identifies the out-of-control data (bold font) using the different $T^2$ statistics. The four statistics compared are $T^2_{MVV(O)}$, $T^2_{MVV(I)}$, $T^2_{MCD}$, and the traditional $T^2_0$.

Table 4: List of Phase I and Phase II real data.

Table 5: Estimates of location vector, covariance matrix, and UCL.

Table 6: The Hotelling $T^2$ values for the future (Phase II) data.