Change Point Determination for an Attribute Process Using an Artificial Neural Network-Based Approach

Change point identification plays a vital role in process improvement for an attribute process. It helps process personnel to quickly determine the corresponding root causes and thereby improve the underlying process. Although many studies have focused on identifying the change point of a process, a generic identification approach has not been developed. The typical maximum likelihood estimator (MLE) approach has two limitations: it requires the process distribution to be known in advance, and it can be mathematically difficult to derive. Both limitations are commonly encountered in practice. Accordingly, this study proposes an artificial neural network (ANN) mechanism to overcome the difficulties of the typical MLE approach in determining the change point of an attribute process. Specifically, the performance of the statistical process control (SPC) chart alone, the typical MLE approach, and the proposed ANN mechanism is investigated for the following cases: (1) a known attribute process distribution for which the associated MLE is available, (2) an unknown attribute process distribution for which the MLE cannot be used, and (3) an unknown attribute process distribution for which the MLE is misused. The superior results and performance of the proposed approach are reported and discussed.


Introduction
Statistical process control (SPC) charts have been widely and successfully used to monitor manufacturing processes. An SPC signal is triggered when evidence suggests that a disturbance has intruded into the underlying process. The signal implies that process personnel need to search for the root causes of the disturbance. The sooner the root causes are correctly identified, the greater the process improvement that can be achieved. Typically, the search for the root causes depends mainly on identifying the change point, that is, the starting time of the disturbance. The change point carries the most relevant information about the disturbance, and process personnel can determine the root causes much more easily when this information is available. As a consequence, change point identification has become a promising research topic.
In recent years, many studies on change point determination have been reported [1][2][3][4][5][6][7][8]. For example, the MLE approach has been combined with $\bar{X}$ and $S$ control charts, respectively, to monitor normal processes [1,2]. The combination of EWMA and Cusum control charts with the MLE has also been investigated for determining the change point of a normal process [3]. In addition to these univariate applications, change point determination for multivariate processes has been reported [4,5]. Those studies share the same assumption: the process distribution is known. Although they have reported the effectiveness of the MLE approach [1][2][3][4], the MLE has one major drawback for estimating the change point [5]: the distribution of the process must be assumed in advance. If the process distribution cannot be confirmed in advance, which is typical in practice, the MLE approach tends to underestimate the true change point.
While most research has investigated the MLE approach for determining the change point of variable processes [1][2][3][4], fewer studies have focused on the MLE for attribute processes [6][7][8]. In addition, a generic identification approach for change point determination has not been developed. Accordingly, we propose a generic approach to overcome the difficulties of the MLE in determining the change point of an attribute process. The proposed approach integrates an artificial neural network (ANN) with the binomial cumulative probability. Using the proposed approach, the change point of an attribute process can be determined accurately and reliably.
This study considers three general cases to evaluate the performance of the proposed and typical approaches. Case 1 assumes that the process distribution is known and the corresponding MLE can be derived. Case 2 concerns the situation where the process distribution may be either known or unknown and the corresponding MLE cannot be derived. Case 3 considers the situation where the process distribution is unknown, but the MLE is misused.
This study is organized as follows. The following section presents the proposed generic approach for determining the change point of an attribute process. Section 3 discusses three cases in which the typical and the proposed approaches are used, and their performance is demonstrated and compared. The final section concludes this study.

The Proposed Approach
In contrast to typical change point applications, this study proposes a generic approach to change point determination for an attribute process. Since the MLE method may not be available for estimating the change point in this setting, the underlying process is treated as distribution-free. As a consequence, the proposed approach is applicable to any type of attribute process. The approach integrates the ANN with the binomial cumulative probability.

The Identification Strategy.
Suppose that an attribute process is monitored by an attribute control chart and an out-of-control signal is triggered at time $T$. Unless it is a false alarm, this signal implies that a disturbance intruded into the process at (or before) time $T$. Typically, process personnel may conclude from the SPC signal alone that the change point occurred at time $T$. However, a disturbance would typically have intruded into the process at time $T-1$, $T-2$, $T-3$, or much earlier. As a consequence, the change point should not be judged from the SPC signal alone. Some researchers use the MLE method to derive an estimate of the change point. For example, consider an attribute process that follows a binomial distribution. Assume that the binomial process is initially in control and that the observations come from a binomial distribution with known parameters $n$ and $p_0$, where $n$ is the sample size and $p_0$ is the probability of obtaining a nonconforming product in a state of statistical control. After an unknown time $\tau$, that is, from time $\tau+1$ onward, the process parameter changes from $p_0$ to $p_1 = \delta p_0$, where $\delta$ is the unknown magnitude of the change and $p_1$ is the probability of obtaining a nonconforming product in an out-of-control state. Let $X_i$ be the observation (i.e., the number of nonconforming products) at time $i$, with binomial distribution function $B(\cdot,\cdot)$. We then have
$$X_i \sim B(n, p_0), \quad i = 1, 2, \ldots, \tau; \qquad X_i \sim B(n, p_1), \quad i = \tau+1, \ldots, T, \tag{1}$$
where the notation "$\sim B(n, p_0)$" stands for "has a binomial distribution $B(n, p_0)$," $\tau+1$ is the change point of the process, $T$ is the signal time at which a sample point exceeds the limits of the attribute control chart, and $\tau \le T$.
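When a disturbance occurs in the process after time $\tau$, the in-control binomial process, which followed a $B(n, p_0)$ distribution, changes to an out-of-control state (i.e., $B(n, p_1)$). As a consequence, the likelihood function is
$$L(\tau, p_1 \mid x_1, \ldots, x_T) = \prod_{i=1}^{\tau} \binom{n}{x_i} p_0^{x_i} (1-p_0)^{n-x_i} \prod_{i=\tau+1}^{T} \binom{n}{x_i} p_1^{x_i} (1-p_1)^{n-x_i}. \tag{2}$$
It can be shown that an MLE of the true change point can be obtained as follows [6]:
$$\hat{\tau} = \arg\max_{0 \le t < T} \left\{ \sum_{i=t+1}^{T} x_i \ln\frac{\hat{p}_{1,t}}{p_0} + \sum_{i=t+1}^{T} (n - x_i) \ln\frac{1-\hat{p}_{1,t}}{1-p_0} \right\}, \tag{3}$$
where
$$\hat{p}_{1,t} = \frac{\sum_{i=t+1}^{T} x_i}{n\,(T-t)} \tag{4}$$
is the estimate of $p_1$ when $t$ is taken as the last in-control time. Although the performance of the MLE is acceptable, the difficulty is that the MLE cannot be obtained when the underlying process distribution is unknown, which is common in practice. Therefore, a generic approach that does not require the process distribution is developed.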
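As a rough illustration of how a binomial change point MLE of this kind can be computed, the following Python sketch scans every candidate last in-control period, estimates the out-of-control fraction from the samples after it, and keeps the candidate with the largest log-likelihood ratio. The function names, the argument names, and the profile-likelihood form used here are assumptions made for illustration; the estimator in [6] may be stated in a different but equivalent way.

```python
import math


def _xlogy(a, b):
    """Return a * ln(b), using the convention 0 * ln(b) = 0."""
    return 0.0 if a == 0 else a * math.log(b)


def binomial_change_point_mle(x, n, p0):
    """Estimate tau, the last in-control period; the change point is tau + 1.

    x  : nonconforming counts for periods 1..T (x[-1] is the signalling sample)
    n  : sample size per period (assumed constant)
    p0 : known in-control fraction nonconforming, with 0 < p0 < 1
    """
    T = len(x)
    best_t, best_value = 0, -math.inf
    tail = 0  # running sum of the counts in periods t+1..T
    for t in range(T - 1, -1, -1):        # candidate tau = T-1, ..., 0
        tail += x[t]                       # x[t] holds period t+1 (0-based list)
        trials = n * (T - t)               # items inspected after period t
        p1_hat = tail / trials             # estimated out-of-control fraction
        value = (_xlogy(tail, p1_hat / p0)
                 + _xlogy(trials - tail, (1.0 - p1_hat) / (1.0 - p0)))
        if value > best_value:
            best_t, best_value = t, value
    return best_t


# Hypothetical data: 20 in-control periods followed by 5 shifted periods.
counts = [10] * 20 + [30] * 5
tau_hat = binomial_change_point_mle(counts, n=100, p0=0.1)
print("estimated change point:", tau_hat + 1)
```

The sketch only works because the binomial form is assumed to be correct; it has no counterpart when the process distribution is unknown.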
In this study, we first apply the ANN classifier to predict the values of the output variable from time $T-1$ back to time 1, in a backward sequence. The output variable is binary, either 1 or 0. When the output is predicted as 0, we assume that the process is in control; that is, the disturbance has not yet been introduced. When the output is classified as 1, the process is taken to be out of control and the disturbance has already intruded. Thus, if the output equals 1 consecutively from time $T-1$ back to time $T-m$ (where $1 \le m \le T-1$), we can conclude that the change point is time $T-m$.
However, what should we conclude when the output is 0 at time $T-1$ and 1 at time $T-2$? The answer is not straightforward. The values 1 and 0 can be viewed as the success and failure, respectively, of a binomial experiment, which has the following properties.
(1) There are two types of outcomes, success or failure, in each Bernoulli trial.
(2) The success rate of each trial is $p$ and the failure rate of each trial is $1 - p$.
(3) The trials are mutually independent; that is, the outcome of one trial does not influence the outcome of any other trial.
Since the change point should not be decided from a single outcome of the output variable, we use the cumulative probability distribution of a binomial experiment to determine it. Suppose that an SPC signal is triggered at time $T$ and the change point occurred at time $\tau+1$. If the ANN had perfect classification capability (i.e., a 100% accurate identification rate (AIR)), the output variable would be classified as 0 from time 1 to $\tau$ and as 1 from time $\tau+1$ to $T$. Accordingly, if we treat the accumulation of the perfect ANN output values, in a backward sequence, as a binomial random variable $Y_n$, the corresponding cumulative probability would equal 1. In general, if the proposed ANN has good classification capability, most of the output values from time $T$ back to time $T-m$ would be classified as 1, which is equivalent to saying that the cumulative probability of the binomial distribution is near 1. Since no classifier is perfect in practice, some ANN outputs will be misclassified. Consequently, the cumulative probability cannot be expected to reach 1, and a threshold value less than 1 must be used instead. That is, if the cumulative probability exceeds the threshold at a certain time $T-m$, we conclude that the change point occurred at time $T-m$.
However, there appears to be no theoretically derived threshold value. Based on our experience and numerous simulation results, we determine the threshold through the following steps.
(1) During the training and testing of the ANN in the modeling phase (referred to as phase I), we obtain an accurate identification rate (AIR) for the classification task. This AIR is treated as the success probability ($p_s$) of the binomial experiment. Since the number of successes must be an integer, the following relationship holds:
$$k_n = [n p_s], \tag{6}$$
where $k_n$ stands for the number of successes in $n$ binomial trials and $[n p_s]$ is the smallest integer that is greater than or equal to $n \times p_s$. This $[n p_s]$ serves as a standard, and the corresponding cumulative probability is taken as the threshold. Accordingly, the threshold is calculated as
$$P(Y_n \le k_n) = \sum_{y=0}^{k_n} \binom{n}{y} p_s^{y} (1-p_s)^{n-y}, \tag{7}$$
where $Y_n$ is the accumulation of the binomial trial outputs.
(2) When phase I is completed, the ANN parameters are fixed. To perform a confirmation test, we simulate new process data vectors and use the phase I ANN model to classify them. This confirmation test is referred to as phase II. The accumulation of the ANN outputs in phase II, denoted by $Y_{\text{ann}}$, also follows a binomial distribution. This study denotes the number of successes among the ANN outputs in phase II by $k_{\text{ann}}$. At time $t_n$, the time corresponding to the $n$th trial in the backward sequence, we can compute the cumulative probability for this binomial random variable; that is,
$$P(Y_{\text{ann}} \le k_{\text{ann}}) = \sum_{y=0}^{k_{\text{ann}}} \binom{n}{y} p_s^{y} (1-p_s)^{n-y}. \tag{8}$$
(3) As a consequence, our decision rule is as follows:
$$\text{if } P(Y_{\text{ann}} \le k_{\text{ann}}) \ge P(Y_n \le k_n), \text{ then time } t_n \text{ is recognized as the change point.} \tag{9}$$
Notice that several time periods may be identified as the change point. Since the change point occurs at a single time, we take the first appearance as the estimate of the change point.
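The three steps can be sketched in Python as follows. This is one possible reading of the procedure rather than the authors' implementation. It assumes that the success probability equals the phase I AIR, that the standard count is the ceiling of the number of backward trials times the AIR, that both cumulative probabilities use the same binomial distribution, and that the "first appearance" means the earliest flagged time. The function and variable names are hypothetical.

```python
import math


def binomial_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, y) * p**y * (1.0 - p)**(n - y) for y in range(k + 1))


def estimate_change_point(outputs, air):
    """Return the estimated change point (1-based time), or None if no time is flagged.

    outputs : ANN class labels (0 or 1) for times 1..T, where T is the signal time
    air     : phase I accurate identification rate, used as the success probability
    """
    T = len(outputs)
    flagged = []
    ones = 0  # number of 1s among the last n outputs
    for n in range(1, T + 1):               # n backward trials starting at time T
        t = T - n + 1                        # time reached after n backward trials
        ones += outputs[t - 1]
        k_n = math.ceil(round(n * air, 9))   # round() guards against float noise
        threshold = binomial_cdf(k_n, n, air)
        if binomial_cdf(ones, n, air) >= threshold:
            flagged.append(t)
    # Assumption: "first appearance" is read as the earliest flagged time.
    return min(flagged) if flagged else None


# Hypothetical ANN outputs: ten 0s followed by five 1s, with an assumed AIR of 0.9.
labels = [0] * 10 + [1] * 5
print(estimate_change_point(labels, air=0.9))
```

Under these assumptions the comparison of the two cumulative probabilities reduces to checking whether the observed number of 1s reaches the standard count, so the cumulative probabilities are computed here only to mirror the description in the text.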

The ANN Modeling.
In recent years, a large number of ANN applications have been reported [9][10][11][12][13][14][15][16]. An ANN is a massively parallel system comprising highly interconnected, interacting processing elements, or units, that are based on neurobiological models. ANNs process information through the interactions of a large number of simple processing elements or units, also known as neurons. Knowledge is not stored within individual processing units but is represented by the strength of the connections between units [9].
To use the ANN, we need to design its structure. Each training data set includes 1000 data vectors: the first 500 data vectors are all from an in-control state (i.e., no disturbance is involved), and the last 500 data vectors are from an out-of-control state. The testing data sets have the same structure as the training data sets; that is, each involves 1000 data vectors, of which the first 500 are from an in-control state and the last 500 are from an out-of-control state.
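The layout of such a data set can be sketched as follows for a binomial process. The sketch assumes that each data vector is a single count of nonconforming items, which the text does not specify, and the function names, default seed, and parameter values are hypothetical.

```python
import random


def binomial_draw(rng, n, p):
    """One Binomial(n, p) count, drawn as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def make_data_set(n, p0, p1, size_each=500, seed=0):
    """Return (inputs, targets): in-control vectors first, then out-of-control ones.

    Targets are 0 for the in-control half and 1 for the out-of-control half.
    """
    rng = random.Random(seed)
    in_control = [binomial_draw(rng, n, p0) for _ in range(size_each)]
    out_of_control = [binomial_draw(rng, n, p1) for _ in range(size_each)]
    inputs = in_control + out_of_control
    targets = [0] * size_each + [1] * size_each
    return inputs, targets


# A training set and a testing set with the same structure but different seeds.
train_x, train_y = make_data_set(n=100, p0=0.1, p1=0.2, seed=1)
test_x, test_y = make_data_set(n=100, p0=0.1, p1=0.2, seed=2)
print(len(train_x), sum(train_y), len(test_x), sum(test_y))
```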
The ANN nodes are organized into three kinds of layers: the input layer, the output layer, and one or more hidden layers. The nodes in the input layer receive input signals from an external source, and the nodes in the output layer provide the target output signals. The output of each neuron in the input layer is the same as the input to that neuron. For each neuron $j$ in the hidden layer and neuron $k$ in the output layer, the net inputs are given by [17]
$$\text{net}_j = \sum_{i} w_{ji}\, o_i, \qquad \text{net}_k = \sum_{j} w_{kj}\, o_j,$$
where $i$ ($j$) is a neuron in the previous layer, $o_i$ ($o_j$) is the output of node $i$ ($j$), and $w_{ji}$ ($w_{kj}$) is the connection weight from neuron $i$ ($j$) to neuron $j$ ($k$). The neuron outputs are given by
$$o_i = \text{net}_i,$$
$$o_j = \frac{1}{1 + \exp\left(-(\text{net}_j + \theta_j)\right)}, \qquad o_k = \frac{1}{1 + \exp\left(-(\text{net}_k + \theta_k)\right)}, \tag{12}$$
where $\text{net}_i$ is the input signal from the external source to node $i$ in the input layer and $\theta_j$ ($\theta_k$) is a bias. The transformation function shown in (12) is called the sigmoid function and is the one most commonly used to date. Consequently, the sigmoid function is used in this study.
The generalized delta rule is the conventional technique for deriving the connection weights of a feedforward network. Initially, a set of random numbers is assigned to the connection weights. Then, for a presentation of pattern $p$ with target output vector $t_p = [t_{p1}, t_{p2}, \ldots, t_{pK}]^{T}$, the sum of squared errors to be minimized is given by
$$E_p = \frac{1}{2} \sum_{k=1}^{K} (t_{pk} - o_{pk})^2,$$
where $K$ is the number of output nodes. By minimizing the error $E_p$ using gradient descent, the connection weights can be updated by using the following equations:
$$\Delta w_{ji} = \eta\, \delta_{pj}\, o_{pi},$$
where $\eta$ is the learning rate, for output nodes
$$\delta_{pk} = (t_{pk} - o_{pk})\, o_{pk}\, (1 - o_{pk}),$$
and for other nodes
$$\delta_{pj} = o_{pj}\, (1 - o_{pj}) \sum_{k} \delta_{pk}\, w_{kj}.$$
Note that the learning rate greatly affects both the network's generalization and its learning speed.
The input to the ANN consists of the values of the process outputs. The ANN output consists of one node, which indicates the classification of the process status: a value of 0 indicates that the process is in control, and a value of 1 indicates that the process is out of control.
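To make the forward pass and the weight update concrete, the following Python sketch implements a network with one hidden layer, sigmoid units, and a single output node, and applies one gradient descent step for one pattern. It is a minimal illustration under assumptions: the network size, the learning rate, the treatment of each bias as a weight on a constant input of 1, and all function names are choices made here, not details taken from the study.

```python
import math


def sigmoid(net, theta=0.0):
    """Sigmoid transformation of the net input plus bias."""
    return 1.0 / (1.0 + math.exp(-(net + theta)))


def forward(x, w_hidden, theta_hidden, w_out, theta_out):
    """Return (hidden outputs, output); w_hidden[j][i] links input i to hidden j."""
    o_hidden = [sigmoid(sum(w * xi for w, xi in zip(w_j, x)), th)
                for w_j, th in zip(w_hidden, theta_hidden)]
    o_out = sigmoid(sum(w * o for w, o in zip(w_out, o_hidden)), theta_out)
    return o_hidden, o_out


def output_delta(target, o_k):
    """Delta for an output node with a sigmoid transformation."""
    return (target - o_k) * o_k * (1.0 - o_k)


def hidden_delta(o_j, deltas_k, weights_kj):
    """Delta for a hidden node, back-propagated from the output deltas."""
    return o_j * (1.0 - o_j) * sum(d * w for d, w in zip(deltas_k, weights_kj))


def train_step(x, target, w_hidden, theta_hidden, w_out, theta_out, eta=0.5):
    """One generalized delta rule update for a single pattern; returns new parameters."""
    o_hidden, o_out = forward(x, w_hidden, theta_hidden, w_out, theta_out)
    d_out = output_delta(target, o_out)
    # All deltas are computed from the old weights before any weight is changed.
    d_hidden = [hidden_delta(o_j, [d_out], [w_out[j]])
                for j, o_j in enumerate(o_hidden)]
    new_w_out = [w + eta * d_out * o_j for w, o_j in zip(w_out, o_hidden)]
    new_theta_out = theta_out + eta * d_out
    new_w_hidden = [[w + eta * d_j * xi for w, xi in zip(w_j, x)]
                    for w_j, d_j in zip(w_hidden, d_hidden)]
    new_theta_hidden = [th + eta * d_j for th, d_j in zip(theta_hidden, d_hidden)]
    return new_w_hidden, new_theta_hidden, new_w_out, new_theta_out


# One update on a toy pattern, starting from all-zero parameters.
params = ([[0.0]], [0.0], [0.0], 0.0)
before = forward([1.0], *params)[1]
params = train_step([1.0], 1.0, *params, eta=1.0)
after = forward([1.0], *params)[1]
print(before, after)
```

Starting from zero weights is used here only so that the step can be checked by hand; the text assigns random initial weights.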

The Experimental Examples
In this section, we consider three general cases to compare the performance of the SPC chart alone, the MLE method, and our proposed approach. Case 1 involves a known binomial process distribution for which the corresponding MLE can be obtained, as shown in (3). This study uses a typical np control chart to monitor this process. In case 2, this study assumes that the underlying process distribution is known but the MLE is not available. In this case, we apply the control chart of [18] to monitor a negative binomial process. In case 3, we assume that the underlying process distribution is not known, but process personnel apply the binomial MLE to the underlying process. Specifically, we assume that the underlying process follows a discrete uniform distribution, while process personnel mistake it for a binomial distribution. Thus, they use the binomial MLE to estimate the change point of a uniform process.

Case 1: A Known Process Distribution with a Known MLE.
Suppose that an attribute process follows the binomial distribution with parameters $(n, p_0)$. This binomial process operates in statistical control during periods 1 to 100. The in-control parameters are arbitrarily chosen as $n = 100$ and $p_0 = 0.1$. After sample period 100, the parameter shifts to $p_1$ = 0.12, 0.14, 0.16, 0.18, and 0.20, respectively. The change point is therefore sample period 101.
Three methods, the np control chart alone, the MLE, and the proposed approach, are employed to determine the change point. Since no extra information is available to assist the np control chart alone, we can only take the SPC signal time as the change point. Consider a simulation with the following conditions. Suppose that an in-control binomial process has the initial parameter setting $B(n = 100, p_0 = 0.1)$. A disturbance occurs at time 101, and the out-of-control binomial process is then denoted by $B(n = 100, p_1 = 0.12)$. Figure 1 depicts the np control chart for monitoring this binomial process. The upper control limit (UCL), center line (CL), and lower control limit (LCL) are computed as follows:
$$\text{UCL} = n p_0 + 3\sqrt{n p_0 (1 - p_0)} = 10 + 3\sqrt{9} = 19,$$
$$\text{CL} = n p_0 = 10,$$
$$\text{LCL} = n p_0 - 3\sqrt{n p_0 (1 - p_0)} = 10 - 3\sqrt{9} = 1.$$
Therefore, UCL = 19, CL = 10, and LCL = 1. From Figure 1, it is apparent that the change point (i.e., time 101) is not equal to the signal time of the np chart (i.e., time 138).
For the MLE approach, we use (3) to determine the change point. We also use the proposed generic approach, with the help of (9), to determine the change point.
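The control limits and the signal time can be reproduced with a few lines of Python. This sketch uses the standard three-sigma np chart formulas; the function names are hypothetical, and clamping the lower limit at zero is an added convention that has no effect for the parameters used here.

```python
import math


def np_chart_limits(n, p0):
    """Return (LCL, CL, UCL) of a three-sigma np control chart."""
    center = n * p0
    half_width = 3.0 * math.sqrt(n * p0 * (1.0 - p0))
    return max(0.0, center - half_width), center, center + half_width


def first_signal(counts, lcl, ucl):
    """Return the 1-based time of the first point outside the limits, or None."""
    for t, x in enumerate(counts, start=1):
        if x > ucl or x < lcl:
            return t
    return None


lcl, cl, ucl = np_chart_limits(n=100, p0=0.1)
print(lcl, cl, ucl)

# Hypothetical counts: the third sample exceeds the upper control limit.
print(first_signal([10, 12, 20, 9], lcl, ucl))
```

A point lying exactly on a limit is not treated as a signal in this sketch, following the wording that a sample point must exceed the limits.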
In our proposed generic approach, five ANN models are developed in phase I. They represent five out-of-control conditions: $p_1$ = 0.12, 0.14, 0.16, 0.18, and 0.20, respectively. After training and testing, the AIRs of the five ANN models are 63%, 76%, 85%, 90%, and 94%, respectively. Using those AIRs, we calculate $k_n$ as given in (6) and then compute the threshold, based on (7), for each binomial trial. This is step 1 of the proposed generic approach. In step 2 (phase II), for each of the five values of $p_1$, we simulate 150 data sets of binomial process outputs. In each data set, the first 100 observations are distributed as $B(n = 100, p_0 = 0.10)$, and from time 101 onward the data come from an out-of-control state (i.e., a disturbance has intruded into the process after time 100). These remaining data are distributed as $B(n, p_1)$, where $n = 100$ and $p_1$ = 0.12, 0.14, 0.16, 0.18, and 0.20, respectively. For each data set, the last observation collected in a simulation run is the one at which the SPC signal was triggered. The five models obtained in phase I are used to perform the confirmation for these 150 data sets. After performing the ANN classification, we obtain $k_{\text{ann}}$ and the corresponding cumulative probability, computed using (8). In the last step of our generic approach, we simply compare $P(Y_{\text{ann}} \le k_{\text{ann}})$ with $P(Y_n \le k_n)$. If $P(Y_{\text{ann}} \le k_{\text{ann}}) \ge P(Y_n \le k_n)$ holds, we conclude that time $t_n$ is the change point.
Table 1 displays the results when the np control chart alone, the MLE, and the proposed approach are applied to determine the change point. The values listed in Table 1 are the averaged estimates of the change point with the associated standard errors (in parentheses). For example, the top-left value of 135.44 (2.68) stands for the averaged estimate of the change point, 135.44, and its standard error, 2.68, obtained from 150 simulation runs with the SPC chart alone. Since the true change point is 101, an averaged estimate performs better the closer it is to 101. Also, a smaller standard error implies a more precise estimate. In the case of $p_1 = 0.12$, the averaged estimates of the change point are 135.44, 103.91, and 117.91 for the SPC chart alone, the MLE, and the proposed approach, respectively. The MLE method performs best and the SPC chart alone performs worst. In the cases of $p_1 = 0.14$ and $p_1 = 0.16$, the MLE and the proposed approach perform almost equally well, and both still outperform the SPC chart alone. However, when $p_1$ gets larger ($p_1 = 0.18$ and $p_1 = 0.20$), the MLE method tends to underestimate the change point: the MLE estimates are 100 and 99 when $p_1 = 0.18$ and $p_1 = 0.20$, respectively. In contrast, our proposed generic approach has the best performance and the smallest standard errors among the three approaches.
In conclusion, in case 1, the performance of both the MLE and the proposed approach is satisfactory, and the SPC chart alone has the poorest performance. Notice that in this case, even though the process distribution is known and the MLE is derived, the MLE method does not clearly outperform the proposed generic approach.
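Summary values of the kind reported in Table 1 can be computed from the per-run estimates as sketched below. The sketch assumes that the reported standard error is the sample standard deviation divided by the square root of the number of runs, which the text does not state explicitly; the function name and the example values are hypothetical.

```python
import math
import statistics


def mean_and_standard_error(estimates):
    """Return (mean, standard error of the mean) of change point estimates."""
    mean = statistics.fmean(estimates)
    se = statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean, se


# Hypothetical estimates from three simulation runs (the study uses 150 runs).
avg, se = mean_and_standard_error([99, 101, 103])
print(f"{avg:.2f} ({se:.2f})")
```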

Case 2: An Unknown Process Distribution and without MLE.
Because MLEs have been derived for the Poisson, geometric, and binomial attribute processes, this study does not use these distributions to represent the attribute process in cases 2 and 3. Instead, two other common distributions, the negative binomial and the discrete uniform, are arbitrarily chosen for cases 2 and 3, respectively. In this case, a negative binomial process represents the underlying process. Following the suggestion in [18], the in-control process parameters are set as NB($r = 1$, $p_0 = 0.2$). As in case 1, the process operates in statistical control during periods 1 to 100. Again following [18], the out-of-control $p_1$ values are set to 0.18, 0.16, 0.14, 0.12, and 0.1, respectively. Therefore, after sample period 100, a disturbance is assumed to intrude into the process, and the negative binomial parameter shifts from $p_0$ to $p_1$. The change point is sample period 101.
We apply the control chart of [18] to monitor this negative binomial process, with the UCL, CL, and LCL computed as in [18]. Since there is no MLE for the negative binomial process, we do not employ the MLE method to estimate the change point. We are, however, able to apply our generic approach to determine the change point, with the use of (9). As a consequence, this study employs two approaches, the control chart alone and the proposed approach, to determine the change point of the attribute process.
After phase I, the AIRs of the five ANN models are 55%, 61%, 70%, 75%, and 79%, respectively. Table 2 displays the simulation results when the two approaches are used to determine the change point. The proposed generic approach clearly performs much better than the SPC chart alone: its estimates of the change point are closer to the true change point, 101, and its standard errors are smaller for every value of $p_1$.

Case 3: An Unknown Process Distribution with Misuse of MLE.
Case 3 involves an unknown process distribution. Here, the unknown process is assumed to follow a discrete uniform distribution. However, suppose that process personnel mistake this unknown process for a binomial process and therefore apply the binomial MLE to estimate the change point. Accordingly, the MLE is misused. The unknown uniform process is arbitrarily assumed to follow $U(1, 10)$ when the process is in control. After time 100, a disturbance intrudes into the process, and the process parameters are arbitrarily shifted to $U(8, 20)$.
As in case 1, this study employs three approaches, each run on 150 simulated confirmation data sets, to determine the change point. Since process personnel are assumed to mistake the process for a binomial distribution, an np control chart is used to monitor the process. The parameters are arbitrarily chosen as $n = 100$ and $p_0 = 0.1$. Process personnel are also assumed to misuse the binomial MLE, obtained from (3), to estimate the change point of the unknown process. The third approach is our generic mechanism. Table 3 shows the averaged estimates of the change point with the associated standard errors (in parentheses) when the SPC chart alone, the misused MLE, and the proposed approach are used, respectively. Because the MLE method is misused in this case, its performance is the poorest. The proposed generic approach is the most robust mechanism, since its averaged estimate is the nearest to the true change point and it also has the smallest standard error.
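One simulation run of this setting can be sketched as follows. The sketch assumes that the two uniform distributions are discrete uniform over the integers 1 to 10 and 8 to 20, inclusive, and that the run stops at the first point outside the np chart limits computed in case 1 (UCL = 19, LCL = 1). The function name, the seed, and the safety cap on the run length are hypothetical.

```python
import random


def simulate_case3_run(seed=0, change_after=100, ucl=19, lcl=1, max_len=100_000):
    """Simulate observations until the (mis-specified) np chart signals.

    Returns the list of observations; its length is the signal time.
    """
    rng = random.Random(seed)
    series = []
    while len(series) < max_len:          # safety cap, not part of the study
        t = len(series) + 1
        if t <= change_after:
            x = rng.randint(1, 10)        # in control: integers 1..10
        else:
            x = rng.randint(8, 20)        # out of control: integers 8..20
        series.append(x)
        if x > ucl or x < lcl:
            break                         # the chart signals; the run ends
    return series


run = simulate_case3_run(seed=1)
print("signal time:", len(run), "last observation:", run[-1])
```

With these limits, the in-control process can never signal, and after the shift only an observation of 20 exceeds the upper limit, so the signal time typically lags the true change point of 101.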

Conclusions
Change point determination is an important and challenging task for industrial processes. Most studies have focused on MLE methods for identifying the change point of a variable process. Although certain MLEs have been developed, each must be applied to a specific known process distribution. When the process distribution is unknown, which is common in real applications, the MLE can seriously misestimate the change point.
In contrast to the typical research, this study proposes a generic mechanism to effectively determine the change point of an attribute process. The proposed approach does not require prior information about the underlying process distribution, and it can be applied to any type of attribute process. This study considers three practical cases to compare the performance of the SPC chart alone, the MLE, and the proposed generic approach.
The experimental simulations show that the proposed generic approach is able to effectively assist in determining the change point in the three cases considered. The simulation results indicate that the proposed generic approach estimates the change point more accurately than the other two typical methods in most of the settings considered.
The ability of the proposed approach to determine the change point accurately can be used to substantially improve process control in industry.
The proposed approach is simple to use, and its practical usefulness is clearly observed. As a result, the proposed generic approach should improve the effectiveness of process personnel in isolating and correcting the root causes of a disturbance, resulting in enhanced overall process performance. Nevertheless, this study only discusses the case of a univariate process, and we plan to apply the generic approach to multivariate attribute processes in the future.

Figure 1: An np control chart for monitoring a binomial process.

Table 1: Performance comparisons among three approaches in the case of a known binomial process with a known MLE.

Table 2: Performance comparisons between two approaches in the case of an unknown negative binomial process without MLE.

Table 3: Performance comparisons among three approaches in the case of an unknown process with the misuse of MLE.