A Bayesian Approach to Control Loop Performance Diagnosis Incorporating Background Knowledge of Response Information

To isolate the problem source degrading control loop performance, this work focuses on how to incorporate background knowledge into Bayesian inference. In an effort to reduce dependence on the amount of historical data available, we consider a general kind of background knowledge that appears in many applications. This knowledge, known as response information, describes which faults can possibly affect each of the monitors. We show how it can be translated into constraints on the underlying probability distributions and introduced into the Bayesian diagnosis. In this way, the dimensionality of the observation space is reduced and the diagnosis becomes more reliable. Furthermore, to keep the judgments consistent, the posterior probabilities of each possible abnormality computed from different observation subspaces are synthesized to obtain partially ordered posteriors, using an eigenvalue formulation of the pairwise comparison matrix. The proposed approach is applied to a diagnosis problem on an oil sand solids handling system, where it is shown how the combination of background knowledge and data enhances control performance diagnosis even when abnormality data are sparse in the historical database.


Introduction
Fault diagnosis is a topic of practical significance in process industries. In complex control loop systems, the control performance can be degraded for various reasons [1]. Sensor and instrumentation problems are usually revealed as systematic errors in mass or energy balance equations [2]. Poor control loop performance might also be due to changes in the model or modeling error [3].
One challenge of control loop diagnosis is that similar evidence can be shared among different faults [4]. In a complex industrial control loop system, there may be a large number of observations; for example, a large-scale industrial process can include thousands of process measurements [5]. Many diagnostic algorithms are designed to identify specific components, while faults may propagate and influence other components that are not being monitored [6]; thus, these methods can be affected by possible faults in other components [7]. Moreover, all processes run subject to uncertainty due to missing information, noise, and so on. Therefore, the occurrence of one fault may lead to a flood of abnormal measurements and alarms and make it difficult to distinguish the true underlying source.
Fault diagnosis methods for control loop systems can be classified into three categories: qualitative model-based methods, quantitative model-based methods, and data-driven methods [8-13]. However, these methods run into problems when multiple abnormalities have the same influence on the measurements. To deal with this, the methods were extended with qualitative information about signs, magnitudes, and so on, to consider the direction and the magnitude of change [14, 15]. Bayesian methods have also been proposed; for example, a Bayesian network was constructed from expert knowledge [16]. However, in these methods the models are assumed to be known. In addition, fuzzy logic methods were proposed [17]. All of these previous works rely on prior knowledge only.
To overcome the drawbacks of both quantitative and qualitative model-based diagnosis approaches, data-driven methods have been developed [18]. For example, support vector machine (SVM) methods were proposed that treat the diagnosis problem as a classification one [19]. Multivariate statistical process monitoring methods were also suggested [20]. Nevertheless, the problem sources of an abnormality may not be explicitly identified by means of variable contribution methods [21].
Later, a systematic probabilistic approach based on Bayesian inference was proposed that considers all possible abnormal observations. A Bayesian framework for control loop performance diagnosis was developed in [4], in which the measurements from many monitors are synthesized into a probabilistic result to diagnose the fault. Pernestal [22] proposed using Bayesian approaches to isolate faults in diesel engines. In a similar way, a data-based Bayesian approach was suggested in [23] to diagnose underlying sources of control performance degradation.
The main disadvantage of data-driven or statistical approaches to diagnosis is that their performance relies heavily on the amount of available historical data, and the requirement of sufficient training data is rarely met in diagnosis applications since faults are rare in normal processes. On the one hand, in their general form, these approaches require sufficient historical samples from all faulty cases, whereas in practical applications only a limited amount of data is available. On the other hand, the large number of monitors is a principal challenge for Bayesian diagnosis applied in industry: Bayesian inference requires estimating the joint likelihood probability density of the observations from all monitors, and previous work has shown that the computational effort in estimating the probabilities grows exponentially with the number of monitors [24, 25]. This phenomenon is also known as the curse of dimensionality, and it makes it difficult to estimate the likelihood probability correctly in more than five dimensions with practical sample sizes. These Bayesian methods are based on training data only, and no explicit background knowledge about the process under diagnosis is integrated.
There is, however, a general type of background knowledge that is often available. In this paper, we consider incorporating such background knowledge together with the training data under the Bayesian framework in order to improve the diagnosis even when the historical data are insufficient relative to the number of monitors. From process knowledge, it may be known that a given measurement in the observation vector is identically distributed under different abnormalities. This type of knowledge is very general, can be formulated as constraints on the underlying likelihood probability distributions [22, 27], and arises naturally in many diagnosis applications.
In this paper, the background information is expressed in terms of a response signature matrix (RSM). By translating the RSM into constraints on the marginal probabilities of the likelihoods, the background knowledge is explicitly taken into account in Bayesian control loop diagnosis. Moreover, we also suggest using a moving window method that considers a sequence of observations rather than a single observation in the diagnosis. To evaluate the proposed approach, we applied it to an oil sand solids handling system in a case where only a few samples from abnormalities are available.
The rest of this paper is organized as follows. A description of the Bayesian control performance diagnosis problem is introduced first, and in Section 2 some terminology is reviewed. In Section 3, the problem studied in this paper is stated formally, and in Section 4 Bayesian diagnosis evaluating multiple consecutive observations is presented; the computation of the posterior probabilities for different modes considering historical data only is presented first, and the approach is then extended to incorporate process knowledge in Section 5. In Section 6, the proposed approach is evaluated on the diagnosis problem for the oil sand solids handling system, using training data and background knowledge. The conclusion is given in Section 7.

Preliminaries: Bayesian Diagnosis for Control Loop Systems
Before going into the details of the Bayesian diagnosis, some terminology is introduced based on the definitions proposed in [23].
Component. Assume that the process under diagnosis consists of a number of components of interest, each of which may fail or not fail. In a typical control loop, the components can be sensors, actuators, controllers, process models, and so on [28]. Each component may have several different states. For example, a sensor may have three states: unbiased, moderately biased, and severely biased.
Mode. A mode of the process is defined as an assignment of the states of all the components; it indicates the state of the system, for example, all components normal except one biased sensor [23].

The Bayesian diagnosis approach of [23] is briefly reviewed in this section. Each component may suffer from abnormal operating conditions that degrade the control performance, and any fault in one component may influence the monitors for the other components [4], so there are probabilistic interconnections between problem causes and monitor outputs [4]. Bayesian inference is applied to compute the probability of the mode variable M given a current observation O and the training observation dataset D. The posterior probability of every operating mode can be computed by Bayes' rule:

p(m_j | O, D) = p(O | m_j, D) p(m_j) / Σ_i p(O | m_i, D) p(m_i), (1)

where p(O | m_j, D) is the likelihood probability and p(m_j) is the prior probability of mode m_j, typically specified from a priori knowledge. The mode with the highest posterior probability is determined to be the underlying mode by the maximum a posteriori (MAP) principle, and the related abnormality is generally regarded as the fault source. Thus, the main issue in constructing a Bayesian diagnostic system is estimating the likelihood from the training observation data D. Following the results of [22, 23], a Bayesian algorithm for likelihood estimation for control loop diagnosis is presented below.
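The MAP rule in (1) can be illustrated with a minimal numerical sketch; the mode names, priors, and likelihood values below are made up for illustration and are not taken from the paper's case study.

```python
# Minimal sketch of the MAP diagnosis rule (Bayes' rule) with
# hypothetical likelihoods p(O | m_j, D) and priors p(m_j).
modes = ["no_fault", "sensor_bias", "valve_stiction"]
prior = {"no_fault": 0.90, "sensor_bias": 0.05, "valve_stiction": 0.05}
# Hypothetical likelihood of the current observation under each mode.
likelihood = {"no_fault": 0.01, "sensor_bias": 0.30, "valve_stiction": 0.10}

# Posterior via Bayes' rule: p(m_j | O, D) is proportional to
# p(O | m_j, D) * p(m_j); normalize by the evidence term.
unnorm = {m: likelihood[m] * prior[m] for m in modes}
evidence = sum(unnorm.values())
posterior = {m: unnorm[m] / evidence for m in modes}

# MAP principle: the mode with the highest posterior is the fault source.
map_mode = max(posterior, key=posterior.get)
```

Note that a high prior on the no-fault mode can still be overturned by a sufficiently informative observation, as here.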

Formal Problem Formulation
Consider that a general type of background information and multiple consecutive observations are available. The task is to determine which fault(s) caused the measurements, given the consecutive observations O_1:T, the training data D, and the background knowledge I described as follows.
Background Knowledge in Terms of Probability Constraints. Background information usually comes from expert or process knowledge; it is denoted by I and can be considered to consist of two parts. One part specifies the prior probabilities of the modes, and the other states that certain elements of the observation vector, representing monitor outputs, are identically distributed under different modes.
In addition, rather than considering a single observation as in [23], assume that T consecutive observations are recorded and that the same fault is present while these observations are collected. The fault diagnosis problem studied in this work can now be stated formally as computing

p(M = m_j | O_1:T, D, I), (2)

that is, the probability that each mode m_j is present, given the training dataset D, the background knowledge I, and the consecutive observation vectors O_1:T = (O_1, ..., O_T) from the control loop process under diagnosis. The subscript t on O_t indexes consecutive time instants, while the subscript i on a value o_i enumerates the possible observation values.
In the following, the posterior probabilities of each mode given consecutive observations are calculated with the training data only.

Bayesian Inference Using Training Data Only
To solve the stated problem (2), a new method is proposed for learning the likelihood probability distribution. Before going into the details, let us first present a previous result on inference based on training data only.
According to Bayes' rule, to compute (2), the likelihood probability

p(O_1:T | m_j, D_j) (3)

needs to be calculated, where D_j denotes the subset of training data collected under mode M = m_j.
Assume that the likelihood of all possible observation values under mode m_j is parameterized by

Θ_j = (θ_1|j, ..., θ_N|j), θ_i|j = p(O = o_i | m_j, Θ_j), (4)

and let Ω_j denote the space of all likelihood parameters under mode M = m_j. The prior probability of these parameters is taken to be Dirichlet distributed:

f(Θ_j | I) ∝ ∏_{i=1}^{N} θ_i|j^{α_i|j − 1}. (5)

It can be shown that the Dirichlet distribution is the only possible choice for f(Θ_j | I) under certain, not very restrictive assumptions [29]. One attractive property of the Dirichlet distribution is that it is conjugate to the multinomial distribution [30], and the distribution of the training samples is proportional to the multinomial distribution; this makes the computations particularly simple. The distribution is specified by the parameters α_1|j, ..., α_N|j. Γ(·) denotes the gamma function; for a positive integer n, Γ(n) = (n − 1)!.
By marginalizing over all the likelihood parameters, we have

p(O = o_i | m_j, D_j) = ∫_{Ω_j} p(O = o_i | Θ_j, m_j) f(Θ_j | D_j, I) dΘ_j. (6)

For the first factor of the integral (6), given the likelihood parameters Θ_j,

p(O = o_i | Θ_j, m_j) = θ_i|j, (7)

and for the second factor, following the derivation of [23], we can write

f(Θ_j | D_j, I) ∝ p(D_j | Θ_j, m_j) f(Θ_j | I). (8)

Further, assuming the observations from mode m_j are independent, the likelihood of the training data subset D_j related to operating mode m_j can be calculated as

p(D_j | Θ_j, m_j) = ∏_{i=1}^{N} θ_i|j^{n_i|j}, (9)

where n_i|j is the number of training data samples with observation value o_i under mode m_j. Combining (7)-(9) and substituting into (6), the single-observation likelihood is obtained in closed form as p(O = o_i | m_j, D_j) = (n_i|j + α_i|j) / (N_j + A_j).
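The conjugate-prior computation above collapses to a simple counting rule: each value's probability is its training count plus its Dirichlet pseudo-count, normalized. A minimal sketch with made-up counts checks this; the numbers are illustrative only.

```python
# Dirichlet-posterior predictive for one discrete observation value
# under a given mode: p(O = o_i | m_j, D_j) = (n_i + a_i) / (N + A),
# where n_i are training counts and a_i are Dirichlet pseudo-counts.
def predictive(counts, alphas):
    N, A = sum(counts), sum(alphas)
    return [(n + a) / (N + A) for n, a in zip(counts, alphas)]

# Hypothetical example: three possible observation values under mode m_j,
# with 10 training samples and a uniform prior of one pseudo-count each.
counts = [6, 3, 1]
alphas = [1.0, 1.0, 1.0]
probs = predictive(counts, alphas)
```

The pseudo-counts keep every value's probability strictly positive, so an observation value never seen in the (sparse) training data does not zero out the whole likelihood.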
To handle consecutive observations, some notation is needed. Let O_obs ⊂ X denote the set of distinct values present in the consecutive observations O_1:T = (O_1, ..., O_T), and let T_i be the number of observations in O_1:T with value o_i. Following [31], the likelihood probability is given by the expression

p(O_1:T | m_j, D_j) = [∏_{o_i ∈ O_obs} Γ(α_i|j + n_i|j + T_i) / Γ(α_i|j + n_i|j)] · Γ(A_j + N_j) / Γ(A_j + N_j + T), (10)

where A_j = Σ_{i=1}^{N} α_i|j is the count of hypothetical samples and N_j = Σ_{i=1}^{N} n_i|j is the count of training samples. Theorem A.1 in the appendix gives the derivation of (10).
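The sequence likelihood (10) is the standard Dirichlet-multinomial (Polya) marginal and can be evaluated stably in log space. The sketch below uses the same made-up counts as before; for T = 1 it must reduce to the single-observation predictive (n_i + a_i)/(N + A), which serves as a sanity check.

```python
import math

# Marginal likelihood of T consecutive observations under mode m_j,
# in the form of equation (10): a product of gamma-function ratios.
# counts[i]: training count n_i|j; alphas[i]: pseudo-count a_i|j;
# seq_counts[i]: number T_i of the new observations equal to value o_i.
def sequence_likelihood(counts, alphas, seq_counts):
    N, A, T = sum(counts), sum(alphas), sum(seq_counts)
    log_p = math.lgamma(A + N) - math.lgamma(A + N + T)
    for n, a, t in zip(counts, alphas, seq_counts):
        log_p += math.lgamma(a + n + t) - math.lgamma(a + n)
    return math.exp(log_p)

# Sanity check: a single observation of the first value.
p = sequence_likelihood([6, 3, 1], [1.0, 1.0, 1.0], [1, 0, 0])
```

Working with lgamma avoids overflow when the observation window T or the training counts are large.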

Bayesian Diagnosis Incorporating RSM and Data
To combine the background knowledge with the training data, the dimensionality of the problem is first reduced using the probability constraints implied by the background information, and the likelihoods are then estimated by Bayesian inference in the dimension-reduced subspaces. In this way, the estimation accuracy can be improved when only a small number of historical samples is available, which is the common case in real applications since abnormalities are rare in normal process operation. Then, from the set of posterior probabilities, which may be mutually inconsistent because they are computed in different subspaces, partially ordered posteriors that are consistent in the original probability space are derived.

Background Knowledge Expressed as RSM.
In many applications, only a few historical samples are available; therefore, process knowledge should be exploited explicitly.
We consider a general type of process knowledge about which abnormalities can possibly affect each of the monitors. It can be expressed in statements of the form "observation O[i] has the same, but possibly unknown, probability distribution under M = m_j and M = m_k." Table 1 gives an example of such knowledge. A check mark at the j-th row and the i-th column represents a response signature, meaning that the i-th element of the observation, which comes from the i-th monitor, is affected under abnormal mode m_j compared with the normal mode: the likelihood distribution of the i-th observation element given M = m_j differs from that under the normal mode. In other words, the i-th monitor output responds when the operating mode turns into the j-th abnormal mode. A "0" in the table indicates that the likelihood probability distribution is the same as under the fault-free mode, that is, the i-th monitor measurement shows zero response to the j-th abnormal mode.
The matrix corresponding to the response information table is the Response Signature Matrix (RSM), denoted by R = (r_ji)_{K×ℓ}, with "1" substituted for each check mark. Equation (11) gives the RSM corresponding to Table 1.
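An RSM is just a binary mode-by-monitor matrix, so a row lookup immediately gives the monitors that respond to a given abnormality. The matrix below is a hypothetical three-mode, three-monitor example, not the actual RSM of Table 1.

```python
# A hypothetical Response Signature Matrix (RSM): rows are operating
# modes (row 0 = normal mode), columns are monitors; R[j][i] = 1 means
# the i-th monitor responds when abnormal mode j is active. Entries are
# illustrative only, not taken from Table 1.
R = [
    [0, 0, 0],  # m1: normal mode, no monitor responds
    [0, 1, 0],  # m2: only the second monitor responds
    [1, 1, 0],  # m3: the first and second monitors respond
]

# Monitors (0-based indices) affected by abnormal mode m2.
affected_by_m2 = [i for i, r in enumerate(R[1]) if r == 1]
```

Note the third monitor's column is all zeros for m2 and m3, which is exactly the situation the dimensionality reduction of the next subsection exploits.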

Dimensionality Reduction.
Using training data only, Bayesian diagnosis suffers from the curse of dimensionality. In statistics, the phrase refers to the sparsity of data in high-dimensional spaces, and the phenomenon is unavoidable in Bayesian diagnosis. For instance, when a process employs 20 monitors, each with the same three states (low, medium, and high), the total number of possible observation values is 3^20 ≈ 3.487 × 10^9. Such a large observation space requires substantially more data to estimate. In the following, consider any two operating modes m_j and m_k, and let the domain of discourse consist only of these two modes.

In this two-mode setting, the posterior ratio of the two modes follows from Bayes' rule:

p(m_j | O, D, I) / p(m_k | O, D, I) = [p(O | m_j, D, I) p(m_j)] / [p(O | m_k, D, I) p(m_k)]. (12)

For the i-th monitor, r_ji = r_ki = 0 indicates that the marginal probability distributions of the i-th monitor output under the two modes are equal; that is,

p(O[i] | m_j, D, I) = p(O[i] | m_k, D, I). (13)

Therefore, the i-th monitor reading can be ignored, reducing the dimension by one. If instead r_ji = 1 or r_ki = 1, (13) does not hold, and this measurement must be taken into account in the probability computation. Define

I_jk = {i ∈ {1, ..., ℓ} : r_ji = 1 or r_ki = 1} (14)

as the set of indices of monitors whose readings are affected by the j-th or the k-th abnormality, and let d_jk be the dimension of I_jk, that is, the number of elements of the set I_jk. Given background knowledge in terms of response information, d_jk is usually smaller than ℓ. Taking the response information in Table 1 as an example, we obtain I_12 = {2}; the monitors indexed by I_jk span a d_jk-dimensional observation space Z_jk, a subspace of X. Also, define O_jk− as the observation subvector whose probability is unaffected by the two modes; for instance, O_12 = (O[2]) and O_12− = (O[1], O[3]). From (13), we have

p(O_jk− | m_j, D, I) = p(O_jk− | m_k, D, I). (15)

Combining (12), it follows that, when only the two modes m_j and m_k are considered instead of all K modes, O_jk− is independent of the mode variable (16). It is then easy to prove that

p(O | m_j, D, I) / p(O | m_k, D, I) = p(O_jk | m_j, D, I) / p(O_jk | m_k, D, I). (17)

Therefore, when comparing the likelihoods of an observation under two modes, the monitor outputs corresponding to O_jk− can be ignored, and only those related to O_jk need to enter the probability computation. In this way, given the background information, the dimension of the observation space is reduced from ℓ to d_jk. Let N_jk denote the total number of distinct observation values in Z_jk; in the following, N_jk is written as N for simplicity. In the d_jk-dimensional subspace Z_jk, the likelihood can again be obtained by Bayesian inference, as follows.
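Computing the index sets I_jk from an RSM is a one-line column scan. The sketch below, using a hypothetical three-mode, three-monitor RSM (0-based indices, so the paper's I_12 = {2} appears as [1]), shows the reduction of equation (14).

```python
# For a pair of modes (j, k), the set I_jk collects the monitors whose
# readings respond to either abnormality (equation (14)); all other
# monitor outputs can be dropped from the likelihood comparison.
def reduced_index_set(R, j, k):
    ell = len(R[0])  # number of monitors
    return [i for i in range(ell) if R[j][i] == 1 or R[k][i] == 1]

# Hypothetical three-mode, three-monitor RSM (row 0 = normal mode).
R = [[0, 0, 0],
     [0, 1, 0],
     [1, 1, 0]]

I_12 = reduced_index_set(R, 0, 1)  # modes m1 vs m2: only one monitor kept
I_23 = reduced_index_set(R, 1, 2)  # modes m2 vs m3: two monitors kept
d_23 = len(I_23)                   # reduced dimension d_23
```

With discrete monitors, reducing from ℓ to d_jk dimensions shrinks the number of cells whose probabilities must be estimated exponentially, which is the whole point of the construction.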
First, consider the likelihood estimation given one observation. We want to compute

p(Z_jk = z_i | m_j, D, I). (18)

Assume that the likelihood of all possible values of the d_jk-dimensional observation under mode m_j is parameterized by a set of parameters Θ_j,

Θ_j = (θ_1|j, ..., θ_N|j), θ_i|j = p(Z_jk = z_i | m_j, Θ_j), (19)

and the prior probability of these parameters is again assumed to be Dirichlet distributed.

f(Θ_j | I) ∝ ∏_{i=1}^{N} θ_i|j^{α_i|j − 1}. (20)

Then, applying Bayesian inference as in Section 4, the likelihood (18) is obtained as

p(Z_jk = z_i | m_j, D, I) = (n_i|j + α_i|j) / (N_j + A_j), (21)

where A_j = Σ_{i=1}^{N} α_i|j and N_j = Σ_{i=1}^{N} n_i|j are the counts of hypothetical samples and training samples in the subspace, respectively. Now consider T consecutive observations z_1, ..., z_T ∈ Z_jk; define Z_obs ⊂ Z_jk as the set of distinct values present in the consecutive d_jk-dimensional observations Z_1:T = (z_1, ..., z_T), and let T_i be the number of observations in Z_1:T with value z_i. The sought likelihood is then obtained, analogously to (10), as

p(Z_1:T | m_j, D, I) = [∏_{z_i ∈ Z_obs} Γ(α_i|j + n_i|j + T_i) / Γ(α_i|j + n_i|j)] · Γ(A_j + N_j) / Γ(A_j + N_j + T). (22)

Given the prior probabilities of the modes, the posterior probability in the subspace can then be computed.

Given the mode priors, the posterior of mode m_j in the subspace Z_jk follows from Bayes' rule:

p(m_j | Z_1:T, D, I) ∝ p(Z_1:T | m_j, D, I) p(m_j). (23)

For each pair of modes, define

c_jk = p(m_j | Z_1:T, D, I) / p(m_k | Z_1:T, D, I), (24)

where each entry c_jk is the ratio of the posteriors of modes m_j and m_k. Therefore, the pairwise comparison matrix is of the form

C = (c_jk)_{K×K}, (25)

where c_kj = 1/c_jk, c_jj = 1, and c_jk > 0 for j, k ∈ {1, ..., K}. This comparison matrix consists of paired reciprocal comparisons based on (17), and by definition (25), C is a positive reciprocal matrix. Combining (22) and (23), we have

c_jk = [p(Z_1:T | m_j, D, I) p(m_j)] / [p(Z_1:T | m_k, D, I) p(m_k)], (26)

or an equivalent form (27). What, then, are the priorities of the modes with respect to the posterior probability? Consider the consistency of the matrix C: C is consistent if c_jk c_kl = c_jl for all j, k, l. The original matrix C may itself be inconsistent, since its entries are computed in different subspaces. In order to determine which mode has the maximum probability, we need to derive a consistent, partially ordered relationship among all the modes {m_1, ..., m_K} from the possibly inconsistent paired comparisons of the posteriors given in C.
There are a number of ways to obtain the vector of priorities. With emphasis on consistency, we suggest adopting an eigenvalue formulation [32]. Using this formulation, our problem becomes

C w = λ_max w, (28)

where λ_max is the principal (largest) eigenvalue of C. The principal eigenvector w = (w_1, ..., w_K), ordered so that w_1 > ⋯ > w_K, is the partially ordered vector of all K modes with respect to their posterior probabilities. It is easy to prove that, for any initial estimate x,

lim_{n→∞} (1/λ_max^n) C^n x = μ w, (29)

where μ > 0 is a constant and w is the principal eigenvector of C. The formula can be interpreted roughly as follows: if we begin with an estimate and operate on it successively by C/λ_max to obtain new estimates, the result converges to a constant multiple of the principal eigenvector. Therefore, the mode corresponding to the largest element w_1 is the sought operating mode under the MAP principle.
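The convergence property (29) suggests computing w by power iteration. The sketch below applies it to a hypothetical consistent 3×3 comparison matrix built from posterior ratios 4:2:1, so the principal eigenvector should recover exactly those proportions; the matrix entries are illustrative, not taken from the case study.

```python
# Power iteration on a positive reciprocal pairwise comparison matrix C:
# repeated multiplication by C (with normalization) converges to the
# principal eigenvector w, whose largest element marks the MAP mode.
def principal_eigenvector(C, iters=200):
    n = len(C)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)  # normalize so the entries sum to one
        w = [x / s for x in v]
    return w

# A consistent example with c_jk = p_j / p_k for posteriors 4 : 2 : 1.
C = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w = principal_eigenvector(C)
map_index = w.index(max(w))  # index of the MAP mode
```

For an inconsistent C, as produced by ratios from different subspaces, the same procedure still converges and yields the consistent priority ordering the text asks for.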
To sum up, the algorithm of the proposed diagnosis method for complex control loops, incorporating training data and background knowledge of response information, is as follows.

(a) From the process knowledge expressed as an RSM, for each pair of modes m_j and m_k, obtain I_jk according to (14).

(b) For each pair of modes, in the observation subspace Z_jk with respect to I_jk, compute the likelihood of each possible observation under m_j and m_k, respectively, with (22).

(c) Construct the pairwise comparison matrix C with (24)-(27).

(d) Compute the eigenvector w using (28); the mode corresponding to the largest element w_1 is the sought operating mode.

Evaluation Results for Oil Sand Solids Handling System Diagnosis
6.1. Diagnostic Settings. We now consider a solids handling system for evaluation. This system is the first stage of the oil sands process and is a typical setup used in oil sands mining operations. The flowchart, based on the industrial application [26], is presented in Figure 1. As shown in the flowchart, the mass of each truckload of oil sand solids and the time when it was dumped into the dump hopper are available in a database. After being crushed, the solids are transported to the surge pile on a conveyor belt.
A level indicator gives a reading, from 0% to 100%, of the relative level of the surge pile. A weightometer sits on the mixer feed conveyor, which feeds oil sand from the surge pile to the slurry mixer. The slurry is prepared in this mixer by adding water to the oil sand; the amount of water is controlled by a slurry density controller whose output is the volumetric flow rate of water. A slurry flow meter and a density meter give readings of the volumetric flow rate and the density of the effluent slurry, respectively. In our simulation, four instruments (the database record, the weightometer, the slurry flow meter, and the density meter) are subject to possible bias. The control valve used to manipulate the water flow may suffer from stiction, and, due to linearization, the model for the slurry density controller is subject to error.
The system is designed to run under seven modes, as shown in Table 2. The first mode m_1 is the No Fault mode, and each of the other six modes has a fault in one component: mode m_2 represents the density model error due to linearization; m_3, m_4, m_5, and m_6 consider bias in each of the four instruments (the database record, the weightometer, the slurry flow meter, and the density meter); and m_7 considers stiction of the water valve. In the table, a "-" denotes that the corresponding component is fault free and a "*" denotes that the component is faulty. Nine monitors are available for diagnosis, as shown in Table 3. For each case, 200 simulation runs were performed for training, and another 60 runs were used for validation. Since there are nine monitors in total, generic Bayesian diagnosis using training data only is a 9-dimensional problem; that is, each likelihood probability needed in the inference is a 9-dimensional joint probability. It is obvious that the available historical samples are far from sufficient to generate an accurate likelihood estimate.
The process knowledge of response information is given in Table 3. From this table, the corresponding RSM representing the implied probability constraints can be written down. According to (14), the sets I_jk for all pairs j, k ∈ {1, 2, 3, 4, 5, 6, 7} can be obtained; for instance, I_23 = {2, 4, 5, 6, 7}, with reduced dimension d_23 = 5. Then, for each pair of modes, the likelihood of each possible observation under each mode is computed in the subspace with respect to I_jk. Finally, through the pairwise comparison matrix C, the underlying operating mode is determined using the eigenvalue formulation.

Bayesian Diagnosis Using RSM and Historical Data.
To evaluate diagnosis performance, two criteria are used. The first is the misdiagnosis rate, obtained from the simple quotient

misdiagnosis rate = N_inc / (N_inc + N_cor), (30)

where N_inc is the number of validating samples that are incorrectly diagnosed and N_cor is the number that are correctly diagnosed. This misdiagnosis rate depends on the number of modes. To exclude the influence of the mode number, we define a relative misdiagnosis rate (RMR) and refer to the rate above as the absolute misdiagnosis rate (AMR). Assume that the underlying mode of a validating sample is m_u, and calculate the posteriors of all K modes. If q of these posteriors are less than the posterior p(m_u | ·) of the underlying mode, the correct diagnosis count for this sample is q/(K − 1) and the incorrect diagnosis count is 1 − q/(K − 1). N_cor and N_inc are then obtained by summing the correct and incorrect counts over all validating samples, and the RMR is obtained from the same quotient (30). By this definition, when p(m_u | ·) is larger than the posteriors of all other modes, the sample counts as one correct diagnosis; when p(m_u | ·) is larger than only some of the other posteriors, a positive fraction is still added to N_cor. In order to mimic background knowledge of response information, we obtained 10000 samples from 10000 simulation runs and established the response information table based on the distribution of these samples.
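The RMR definition above reduces to simple rank counting per validating sample. A minimal sketch with two hypothetical posterior vectors (K = 3, purely illustrative values) makes the partial-credit bookkeeping concrete.

```python
# Relative misdiagnosis rate (RMR): for each validating sample, if q of
# the other K-1 posteriors fall below the posterior of the true mode,
# credit q/(K-1) as correct and 1 - q/(K-1) as incorrect, then form the
# quotient N_inc / (N_inc + N_cor) over all samples.
def relative_misdiagnosis_rate(samples, K):
    # samples: list of (posteriors, true_mode_index) pairs
    n_cor = n_inc = 0.0
    for posteriors, true_idx in samples:
        q = sum(1 for j, p in enumerate(posteriors)
                if j != true_idx and p < posteriors[true_idx])
        n_cor += q / (K - 1)
        n_inc += 1 - q / (K - 1)
    return n_inc / (n_inc + n_cor)

# Hypothetical example with K = 3 modes and two validating samples:
# the true mode is ranked first in one sample and last in the other.
samples = [([0.6, 0.3, 0.1], 0),
           ([0.2, 0.5, 0.3], 0)]
rmr = relative_misdiagnosis_rate(samples, K=3)
```

Unlike the AMR, this measure rewards a near-miss (true mode ranked second of seven) more than a gross miss, which is why it is less sensitive to the number of modes.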
The diagnosis performance of the proposed approach is evaluated in comparison with diagnosis using training data only.
In Figure 2, the horizontal axis represents the sample number. The underlying modes of samples 1-60, 61-120, 121-180, 181-240, 241-300, 301-360, and 361-420 are m_1, m_2, m_3, m_4, m_5, m_6, and m_7, respectively. Figures 2(b) and 2(c) show the modes diagnosed using training data only and those diagnosed incorporating response information, respectively; green points represent samples that are correctly diagnosed, and pink points represent those incorrectly diagnosed. Not surprisingly, the proposed approach incorporating background knowledge expressed as response information performs better, while without the background knowledge the percentage of correctly diagnosed modes is lower.
The average AMR and RMR from diagnosis with and without response information are shown in Figure 3. It is clearly observed that combining the background knowledge with the training data gives much better diagnosis results than using the training data alone.

Conclusions
The objective of this work is to isolate the problem source that is degrading the control performance. To reduce dependence on the amount of data available, our approach emphasizes the use of background information and incorporates background knowledge of response information into the diagnosis. This knowledge, expressed in terms of the RSM, can be translated into constraints on the underlying probability distributions. We introduce the constraints into the Bayesian inference such that the dimensionality of the observation space is reduced and the diagnosis is thereby enhanced. Moreover, to keep the comparative judgments consistent, the set of posterior probabilities computed from different observation subspaces is synthesized using the eigenvalue formulation on the pairwise comparison matrix; this yields the partially ordered posteriors, from which the state of the process under diagnosis is determined. The approach is applied to a diagnosis problem on an oil sand solids handling system, and the advantage of combining background knowledge and data is demonstrated even when the amount of training data is limited. In summary, training data and background knowledge address different parts of the control performance diagnosis problem, and the best diagnosis performance is achieved when both are used.

Figure 2: The underlying and diagnosed modes for each validating sample. (a) Underlying mode; (b) modes diagnosed using training data only; (c) modes diagnosed incorporating training data and response information.

Figure 3: Average AMR and RMR from diagnosis with or without incorporating background knowledge.
Observation. The observation vector is denoted by O = (O[1], ..., O[ℓ]) with domain X = X_1 × ⋯ × X_ℓ. An assignment of the observation vector is denoted by O = o_i, i = 1, ..., N, where N is the number of different observation values. If the i-th monitor output has S_i discrete values, then N = ∏_{i=1}^{ℓ} S_i. Each value o_i (i = 1, ..., N) is an ℓ-dimensional vector, and we write o_i = (o_i[1], ..., o_i[ℓ]) to denote its elements explicitly. The observation is considered a random variable.

Training Data. A training sample at time t consists of simultaneous values of the mode variable M and the observation vector O at that time, denoted by d_t = (m_t, o_t). All training samples collected from the different modes of the system form the training dataset; a realization of the training data is denoted by D, and D_j denotes the subset of training data entries whose underlying mode is m_j.

Table 1: A priori response information.

Table 3: Monitors and a priori response information.