Fault Identification in Industrial Processes Using an Integrated Approach of Neural Network and Analysis of Variance

Because of its importance in process improvement, the problem of determining exactly when a fault occurs has attracted considerable attention in recent years. Most related studies have used the maximum likelihood estimator (MLE) method to determine the fault time in univariate processes, which requires the underlying process distribution to be known in advance. In addition, most studies have been devoted to identifying faults caused by process mean shifts. Unlike most current research, the present study proposes an effective approach to identify faults caused by variance shifts in a multivariate process. The proposed mechanism comprises the analysis of variance (ANOVA) approach, a neural network (NN) classifier, and an identification strategy. To demonstrate the effectiveness of the proposed approach, a series of simulated experiments is conducted, and the results obtained with the proposed approach are reported.


Introduction
Process personnel have always sought to identify process faults in real time in order to improve the underlying process significantly. Statistical process control (SPC) charts have been used successfully to detect process faults for several decades. Because technological progress allows more and more advanced sensors to be used in a process, it has become common to monitor multiple quality characteristics during a process. A multivariate process is simply defined as a process with two or more quality characteristics to be monitored. Because multiple quality characteristics are involved, it is much more difficult to determine when a fault occurs in a multivariate process than in a univariate process.
Multivariate statistical process control (MSPC) charts have been studied and developed [1][2][3]; however, their major function is basically to generate an out-of-control signal when process faults occur. It is extremely difficult to estimate or identify the beginning time of a fault using only MSPC charts.
In most situations, the beginning time of a fault contains most of the information behind the causes of the process fault. Rapidly and accurately estimating the beginning time of a fault would contribute much to the identification of the associated root causes of the fault and would significantly improve the process.
Many studies have investigated the beginning time of a process fault; however, most of them have focused on univariate processes [4][5][6][7][8][9][10][11][12][13][14]. In addition, most related studies have used the maximum likelihood estimator (MLE) method [3][4][5][6][7][8][9][10][11][12]. However, the MLE method relies on a strict assumption: the underlying process distribution is known. Because the real-world process distribution is typically unknown, this assumption seriously restricts the applicability of the MLE method. Another problem is that a considerable number of explanatory variables arise when modeling a multivariate process with many quality characteristics. To overcome the limitations of the MLE method and the difficulties caused by too many explanatory variables, this study focuses on a multivariate process with ten quality characteristics and considers process variance shifts as the underlying process faults. Additionally, this study assumes that the process covariance matrix shifts from $\Sigma_0$ to $\Sigma_1$ when the process fault occurs. There are 56 input variables considered in this study. It is not practical to use all 56 variables as inputs into the proposed neural network (NN) classifier. Consequently, this study uses a hybrid technique to select fewer but more significant explanatory variables. This is the first stage of building the proposed scheme. The chosen significant variables are then used as inputs into the proposed NN models. This modeling is the second stage of creating the scheme. After the NN classification is conducted, an identification strategy is combined with the scheme to estimate the beginning time of a process fault.
This study is organized as follows. Section 2 addresses the problems with previous studies. The research gaps and the proposed methodologies are discussed in Section 3. Section 4 discusses the experimental simulations, where the results and analysis for the typical and the proposed approaches are reported. The final section concludes this study.

Problems and Process Models
In this section, we discuss the difficulties that can be encountered in practice. Several research studies that have investigated determining the beginning times of faults are addressed. In addition, this section presents the models of a generalized multivariate process and the process fault.

Problem Statement.
In typical MSPC applications, an out-of-control signal indicates that a process fault has occurred in the underlying process. At that moment, although we have evidence regarding the status of the underlying process, we would have difficulty determining the beginning time of the fault. In particular, if the effects of the underlying process faults are minor, the probability of triggering a signal at the beginning time of the fault would be extremely low. As a result, it is almost impossible to determine the beginning time of a process fault by using only the MSPC chart. For example, consider a multivariate process with ten quality characteristics monitored by the |S| chart, a type of MSPC chart. A process fault in the form of a process variance shift occurs at time 201. Because of the small magnitude of the fault, it is not detected until time 230. As Figure 1 shows, the beginning time of the fault does not coincide with the time of the MSPC signal. The gap between the two widens as the magnitude of the fault decreases.
Several studies have been conducted to address the difficulty of determining the beginning time of a fault [13,14]. Process faults have typically been divided into two types: process mean shifts and process variance shifts [15,16]. These studies propose the MLE approach to estimate the beginning time of a fault when the process mean or variance has shifted in a univariate process. The MLE approach with the use of EWMA charts has been reported for a univariate process [17]. Whereas most of the existing MLE approaches have focused on univariate processes, the study in [3] derived an MLE for a multivariate process. However, the performance of this MLE was not stable when the number of quality variables became large.
The MLE is criticized for its strict assumption that the underlying process distribution must be known. This assumption is not feasible for practical processes. As a result, machine learning (ML) methods have been used to determine the beginning time of a fault [13,14]. However, the number of input variables in those studies is extremely small because of the simplicity of the process structure. Few studies have investigated how to identify the beginning time of a fault when a considerable number of input variables are involved. Too many input variables generally result in a time-consuming training stage with the ML approach. The study in [18] considered a large number of inputs in its experiments, but it also assumed that all the quality variables were at fault in the process, which is a rare case in industrial applications. Accordingly, this study proposes an effective hybrid scheme that integrates ANOVA, NN techniques, and an identification strategy to overcome the aforementioned difficulties.

The Multivariate Process and the Fault Models.
In contrast to the traditional multivariate normal distribution assumption for a process, this study considers a multivariate process that follows an unknown multivariate distribution. Assume that the multivariate process is initially in control and the sample observations are from an unknown distribution $F(\tilde{\mu}, \Sigma)$ with a known mean vector $\tilde{\mu}$ and covariance matrix $\Sigma_0$. After an unknown time $\tau + 1$, we assume that the process covariance matrix changes from $\Sigma_0$ to $\Sigma_1$. Let $X_{ij}$ be a $p \times 1$ vector that represents the $p$ characteristics on the $j$th observation in subgroup $i$ with the unknown distribution function $F(\tilde{\mu}, \Sigma)$. Accordingly, we have
\[ X_{ij} \overset{\text{iid}}{\sim} \begin{cases} F(\tilde{\mu}, \Sigma_0), & i = 1, 2, \ldots, \tau, \\ F(\tilde{\mu}, \Sigma_1), & i = \tau + 1, \ldots, T, \end{cases} \qquad j = 1, 2, \ldots, n, \]
where $n$ is the sample size, $\tau + 1$ is the change point, $T$ is the signal time at which a subgroup covariance matrix exceeds the limits of the $|S|$ control chart, "$\overset{\text{iid}}{\sim}$" means "independent and identically distributed," and $\Sigma_0$ is the in-control covariance matrix. Following the suggestion of [19], this study considers a variance shift from $\Sigma_0$ to $\Sigma_1$ as the process fault, where $\theta$ denotes the inflated ratio. Let the sample variance-covariance matrix in subgroup $i$ be defined as
\[ S_i = \frac{1}{n - 1} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)^{\top}, \]
where $\bar{X}_i$ is the sample mean vector of subgroup $i$. To monitor a multivariate process variance shift, the sample generalized variances $|S_i|$, $i = 1, 2, \ldots$, are plotted against the control limits of the $|S|$ chart [2], which depend on $|\Sigma_0|$, the determinant of $\Sigma_0$.
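As a rough illustration of how the limits of a generalized-variance chart can be computed, the following Python sketch assumes that [2] refers to the standard three-sigma |S| chart limits found in SPC textbooks. The function name `s_chart_limits` and the coefficient names `b1` and `b2` are labels chosen for this sketch and do not come from the text above.

```python
from math import prod, sqrt


def s_chart_limits(det_sigma0, n, p):
    """Return (LCL, CL, UCL) for a generalized-variance |S| chart.

    det_sigma0: determinant of the in-control covariance matrix
    n: subgroup (sample) size, p: number of quality characteristics
    Assumption: textbook three-sigma limits based on E|S| and Var|S|.
    """
    if n <= p:
        raise ValueError("subgroup size n must exceed p")
    prod_n_minus_i = prod(n - i for i in range(1, p + 1))
    # b1 * |Sigma_0| is the mean of |S|; b2 * |Sigma_0|^2 is its variance.
    b1 = prod_n_minus_i / (n - 1) ** p
    b2 = (prod_n_minus_i / (n - 1) ** (2 * p)) * (
        prod(n - j + 2 for j in range(1, p + 1)) - prod_n_minus_i
    )
    center = b1 * det_sigma0
    half_width = 3.0 * sqrt(b2) * det_sigma0
    # A negative lower limit is replaced by zero, since |S| cannot be negative.
    return max(0.0, center - half_width), center, center + half_width
```

For the setting studied later (ten characteristics, subgroups of size 12), a call such as `s_chart_limits(det_sigma0, n=12, p=10)` would give the limits once the determinant of the in-control covariance matrix is known.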

The Proposed Scheme
In recent years, intelligent approaches, such as neural networks and support vector machines, have had an important role in the development of industrial technologies [20][21][22]. Although acceptable results may be obtained using traditional intelligent approaches, these approaches may not fulfill the particular needs of industrial applications. Recent studies have shown that hybrid intelligent approaches can help achieve a better performance for particular applications [18,19,23,24]. In this study, we develop a hybrid scheme to effectively determine the change point of a multivariate process. The proposed scheme includes the ANOVA, an NN, and the identification strategy. The scheme can be used when the multivariate process distribution is unknown and when there are a large number of input variables. The following sections address these components.

ANOVA.
The proposed hybrid two-stage method integrates the framework of ANOVA and an NN. In stage I, a one-way ANOVA test is applied to select important, influential variables. In stage II, the selected significant variables are taken as the input variables into the NN.
To identify significant variables, an F-test statistic is used to test the differences between the in-control and out-of-control groups. The significant variables selected in this stage are then substituted into the NN to construct a two-stage model.
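The stage I screening can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `one_way_f` and `select_variables` are hypothetical, and the critical value `f_critical` is assumed to be looked up from an F table (or a statistics library) for the chosen significance level and degrees of freedom.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def select_variables(in_control, out_of_control, f_critical):
    """Return the column indices whose F statistic exceeds f_critical.

    in_control / out_of_control: lists of data vectors (rows) with the
    same number of candidate input variables (columns).
    """
    selected = []
    for v in range(len(in_control[0])):
        groups = [[row[v] for row in in_control],
                  [row[v] for row in out_of_control]]
        if one_way_f(groups) > f_critical:
            selected.append(v)
    return selected
```

Only the columns returned by `select_variables` would then be passed to the NN in stage II.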

Neural Network.
The purpose of using an NN is to classify the process output as either an in-control or out-of-control process. The identification strategy uses this information to activate its function. Accordingly, the beginning time of a process fault can be estimated in real time.
The structure of the NN can be briefly described as follows. The NN nodes are divided into three layers: the input, hidden, and output layers. The nodes in the input layer receive input signals from an external source, and the nodes in the output layer provide the target output signals. The output of each neuron in the input layer is the same as the input to that neuron. For each neuron $j$ in the hidden layer and neuron $k$ in the output layer, the net inputs are given by
\[ \mathrm{net}_j = \sum_i w_{ji}\, o_i, \qquad \mathrm{net}_k = \sum_j w_{kj}\, o_j, \]
where $i$ ($j$) is a neuron in the previous layer, $o_i$ ($o_j$) is the output of node $i$ ($j$), and $w_{ji}$ ($w_{kj}$) is the connection weight from neuron $i$ ($j$) to neuron $j$ ($k$). The neuron outputs are given as
\[ o_i = \mathrm{net}_i, \qquad o_j = f(\mathrm{net}_j + b_j), \qquad o_k = f(\mathrm{net}_k + b_k), \qquad f(x) = \frac{1}{1 + e^{-x}}, \]
where $\mathrm{net}_i$ is the input signal from the external source to node $i$ in the input layer and $b_j$ ($b_k$) is a bias. The transformation function $f$ is called the sigmoid function and is the most commonly used transformation function. Accordingly, this study uses the sigmoid function.
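The generalized delta rule is the conventional technique used to derive the connection weights of the feedforward network. First, a set of random numbers is assigned to the connection weights. Then, for a pattern $p$ with target output vector $t_p = [t_{p1}, t_{p2}, \ldots, t_{pM}]^{\top}$, the sum of the squared error to be minimized is given as
\[ E_p = \frac{1}{2} \sum_{j=1}^{M} (t_{pj} - o_{pj})^2, \]
where $M$ is the number of output nodes. By minimizing the error $E_p$ using the gradient descent technique, the connection weights can be updated using the following equations:
\[ \Delta w_{ji}(p) = \eta\, \delta_{pj}\, o_{pi}, \]
where $\eta$ is the learning rate; for the output nodes,
\[ \delta_{pj} = (t_{pj} - o_{pj})\, o_{pj}\, (1 - o_{pj}), \]
and for other nodes,
\[ \delta_{pj} = \Bigl( \sum_k \delta_{pk}\, w_{kj} \Bigr)\, o_{pj}\, (1 - o_{pj}). \]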
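To make the training procedure concrete, here is a small pure-Python sketch of a three-layer network with sigmoid units and a single output node, trained pattern by pattern with the generalized delta rule. The class name `TinyNN`, the learning rate `eta=0.5`, the weight range, and the number of hidden nodes are all assumptions made for illustration; the paper does not state its network settings here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyNN:
    """One hidden layer and one sigmoid output (hypothetical sketch)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Random initial weights; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # zip stops at the shorter sequence, so the bias is added separately.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, eta=0.5):
        """Update the weights for one pattern and return its squared error."""
        hidden, out = self.forward(x)
        delta_out = (target - out) * out * (1.0 - out)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] += eta * delta_out * h
        self.w_out[-1] += eta * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] += eta * delta_hidden[j] * v
            row[-1] += eta * delta_hidden[j]
        return 0.5 * (target - out) ** 2

    def predict(self, x):
        # Label 1 means out of control, label 0 means in control.
        return 1 if self.forward(x)[1] >= 0.5 else 0
```

In use, `train_step` would be called repeatedly over the training vectors, and `predict` would supply the 0/1 labels consumed by the identification strategy.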

An Identification Strategy.
This study uses an NN to classify the status of a process at a certain time $t$. When the output of the NN is "0", the process fault has not occurred. When the output of the NN is "1", a process fault has intruded into the underlying process. When an SPC chart is triggered at time $T$, we know that a fault has intruded into the underlying process. The identification component is then activated, and the NN begins to classify the status of the process from time $T - 1$ to 1 in a backward sequence. If the NN output is "1" at time $T - 1$, we may conclude that the fault had already begun at time $T - 1$ rather than at time $T$. Then, we can proceed to time $T - 2$. If the NN output is "1" again at time $T - 2$, we could conclude that the fault had already begun at time $T - 2$ rather than at time $T - 1$. However, because no classifier is perfect, misclassifications can occur. For example, the NN output may be 0 at time $T - 1$ while the outputs are all 1s from time $T - 2$ to $T - k$ (where $1 \le k \le T - 1$), and it is then unclear what the subsequent decision should be. The decision on the beginning time of a fault therefore cannot be made definitively by observing only a single outcome.
In this study, because the NN outputs are either 1 or 0, we can consider them as the success or failure of a Binomial experiment, respectively. Accordingly, we can use the cumulative probability distribution of a Binomial experiment to determine the beginning time of a fault. If the NN has a good classification capability, most of the output values from time $T$ to $T - k$ should be classified as 1, which implies that the cumulative probability of the Binomial distribution is near 1. Because no classifier is perfect in reality, several misclassifications of NN outputs must be tolerated. Therefore, the cumulative probability is compared with a threshold value that is less than 1. That is, if the value of the cumulative probability is greater than the threshold at a time $T - k$, we can conclude that the fault began at time $T - k$. However, there is no theoretical threshold value. On the basis of our experience and the results of numerous simulations, we estimate the threshold value as follows.
(1) Determining the Threshold. During the training and testing of the NN model, referred to as phase I, we can obtain an accurate identification rate (AIR) for the classification tasks. The AIR is equivalent to the probability of success ($p_s$) in the Binomial experiments. Because the number of successes must be an integer, the following relationship should be used:
\[ [y_s] = \lceil m \times p_s \rceil, \]
where $y_s$ is the number of successes in $m$ Binomial experiments and $[y_s]$ is the smallest integer that is greater than or equal to the value of $m \times p_s$. The integer $[y_s]$ is used as a standard, and the corresponding cumulative probability is considered to be the threshold. As a result, the threshold is calculated as follows:
\[ P(Y \le [y_s]) = \sum_{y=0}^{[y_s]} \binom{m}{y}\, p_s^{\,y}\, (1 - p_s)^{m - y}, \]
where $Y$ is the accumulation of the Binomial trial outputs.
(2) Performing the Confirmation Test. To perform the confirmation test, new process data vectors were generated. The phase I NN model was used to classify each confirmation data vector. This confirmation test is referred to as phase II. The accumulation of the NN outputs in phase II is denoted as $Y_{\mathrm{NN}}$. The number of successes of the NN outputs in phase II is denoted as $y_{\mathrm{NN}}$. At time $T - k$, the value of the cumulative probability can be calculated as follows:
\[ P(Y_{\mathrm{NN}} \le y_{\mathrm{NN}}) = \sum_{y=0}^{y_{\mathrm{NN}}} \binom{m}{y}\, p_s^{\,y}\, (1 - p_s)^{m - y}. \]
(3) Conducting the Decision Rule. After performing steps (1) and (2), the decision rule can be set up as follows: if $P(Y_{\mathrm{NN}} \le y_{\mathrm{NN}})$ is greater than the threshold $P(Y \le [y_s])$, then time $T - k$ is the beginning time of a process fault.
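The three steps above can be sketched in Python as follows. This is one possible reading of the strategy, under several assumptions that the text does not settle: the window at step k holds the k NN outputs from time T - 1 back to time T - k, the comparison uses "at least" rather than strictly "greater than" the threshold, and the estimate is the earliest time at which the rule is satisfied. The function names are hypothetical.

```python
from math import ceil, comb


def binomial_cdf(y, m, p):
    """P(Y <= y) for Y ~ Binomial(m, p)."""
    return sum(comb(m, i) * p ** i * (1.0 - p) ** (m - i) for i in range(y + 1))


def threshold(m, air):
    """Cumulative probability at the smallest integer >= m * air."""
    y_s = min(m, ceil(m * air))
    return binomial_cdf(y_s, m, air)


def estimate_beginning_time(nn_outputs, signal_time, air):
    """Scan backward from the signal time and return the estimated start.

    nn_outputs[t - 1] is the NN label (0 or 1) at time t, for
    t = 1, ..., signal_time - 1. If the rule is never satisfied,
    the signal time itself is returned.
    """
    estimate = signal_time
    successes = 0
    for k in range(1, signal_time):
        t = signal_time - k
        successes += nn_outputs[t - 1]
        # Compare the observed cumulative probability with the threshold
        # computed for a window of k outputs.
        if binomial_cdf(successes, k, air) >= threshold(k, air):
            estimate = t
    return estimate
```

Because the Binomial cumulative probability is nondecreasing in the number of successes, the comparison in the loop amounts to checking whether the count of 1s in the window reaches the rounded-up expected count, which is what lets the rule tolerate an isolated 0 near the signal time.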

Simulated Examples
This study performs a series of simulations to compare the existing single-stage NN method with the hybrid scheme proposed in Section 3. The corresponding estimators of $\tau$ for these two methods are denoted as $\hat{\tau}_{\mathrm{ANN}}$ and $\hat{\tau}_{\mathrm{AA}}$, respectively.

Assumptions.
Without loss of generality, we assume that each quality characteristic is sampled from a normal distribution with zero mean and unit standard deviation. In addition, we assume that ten quality characteristics are monitored simultaneously (i.e., $p = 10$), with in-control covariance matrix $\Sigma_0$. For the out-of-control covariance structure, without loss of generality, we assume that a variance shift occurs at the first quality characteristic, which determines the out-of-control covariance matrix $\Sigma_1$. In this study, the training data sets include 1000 data vectors for every possible parameter setting. The first 500 data vectors are all from an in-control state, whereas the last 500 data vectors are from an out-of-control state. The testing data sets have the same structure as the training data sets; that is, each involves 1000 data vectors, of which the first 500 are from an in-control state and the last 500 are from an out-of-control state.
This study considers four values of the inflated ratio $\theta$: 1.1, 1.2, 1.3, and 1.4. In our proposed two-stage model, the ANOVA-NN models have 7, 10, 10, and 10 input nodes for $\theta = 1.1$, $\theta = 1.2$, $\theta = 1.3$, and $\theta = 1.4$, respectively. For all the models, there is only one output node. This output node indicates the classification result for the process status, where a value of 0 indicates that the process is in control and a value of 1 implies that the process is out of control. Furthermore, the change point of the process is assumed to be 201 ($\tau + 1 = 201$). For each data structure, we use a sample size ($n$) of 12 and repeat the simulation 5 times. The average of the estimates from each approach over the 5 simulation replicates was then recorded along with the standard error.
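A simulated data stream of this kind might be generated as in the sketch below. It rests on assumptions that go beyond the text: the quality characteristics are taken to be independent (an identity in-control covariance matrix), and the fault is modeled by multiplying the variance of the first characteristic by the inflated ratio. The function names and the seed are hypothetical.

```python
import random


def simulate_subgroups(n_subgroups, change_point, theta, p=10, n=12, seed=1):
    """Return a list of subgroups, each a list of n observations of length p.

    Subgroups with index >= change_point (1-based) have the variance of the
    first characteristic inflated by theta; all others are standard normal.
    """
    rng = random.Random(seed)
    data = []
    for i in range(1, n_subgroups + 1):
        sd_first = theta ** 0.5 if i >= change_point else 1.0
        subgroup = [[rng.gauss(0.0, sd_first)]
                    + [rng.gauss(0.0, 1.0) for _ in range(p - 1)]
                    for _ in range(n)]
        data.append(subgroup)
    return data


def sample_variance(values):
    """Unbiased sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)
```

With `change_point=201`, the first 200 subgroups are in control and the fault starts at subgroup 201, matching the change point assumed in the experiments.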

Modeling Results and Analysis.
In stage I, we use a significance level of 0.05 and apply a one-way ANOVA test to select the important, influential variables. The results are given in Table 1. The significant variables selected in this stage are then used as the input variables into the NN. In addition, Table 2 shows that, of the two methods discussed above, the two-stage ANOVA-NN scheme tended to perform better than the existing single-stage NN method.
To evaluate the performance of the two estimators discussed above, the bias and the mean squared error (MSE) were used in this study. The bias of an estimator $\hat{\tau}$ is the distance between the expected value of the estimator and the parameter being estimated. It is used to indicate the accuracy of the estimator and is defined as follows: $\mathrm{bias}(\hat{\tau}) = E(\hat{\tau}) - \tau$.
The MSE is the expected value of the squared errors and is defined as follows: $\mathrm{MSE}(\hat{\tau}) = E[(\hat{\tau} - \tau)^2]$. It is used to indicate how far, on average, the collection of estimates is from the parameter being estimated. The effects of the inflated ratio $\theta$ on the biases and the MSE of the two estimators are shown in Figures 2 and 3, respectively. Figure 2 shows that the biases of the two estimators decrease as $\theta$ increases and that the bias of the two-stage scheme appears to be smaller than that of the other method. Similarly, Figure 3 shows that the mean squared error of the two-stage scheme tends to be smaller than that of the single-stage NN method. Consequently, it seems that the proposed two-stage ANOVA-NN scheme is more efficient in detecting the actual change point than the existing single-stage NN method.
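As a small worked illustration of these two measures, the following Python sketch computes the empirical bias and MSE from a set of replicate estimates. The function name `bias_and_mse` is hypothetical, and the five estimates in the usage note are made-up numbers chosen only to show the arithmetic; they are not results from this study.

```python
def bias_and_mse(estimates, true_value):
    """Empirical bias and mean squared error of a list of estimates."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, mse
```

For example, with hypothetical estimates 203, 205, 199, 201, and 202 of a true change point of 201, the mean estimate is 202, so the bias is 1, and the squared errors 4, 16, 4, 0, and 1 average to an MSE of 5.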

Conclusions
The objective of this work is to develop an effective scheme to identify the beginning time of a fault, specifically for a process variance shift in a multivariate process with a general distribution. On the basis of our numerical study, the two-stage procedure introduced here was generally more efficient in detecting the beginning time of a fault than the single-stage NN method. This work could be a useful guide to engineers attempting to search for the root cause of a process disturbance.
The results suggest several directions for further study. For example, the proposed two-stage procedure could be extended to discrete multivariate processes or combined with other statistical techniques. Such work deserves further research and is our future concern.

Table 2: Average estimate of the beginning time of a fault and standard error for the two estimators.