Semiadaptive Fault Diagnosis via Variational Bayesian Mixture Factor Analysis with Application to Wastewater Treatment

Mainly due to the hostile environment in wastewater plants (WWTPs), the reliability of sensors with respect to important qualities is often poor. In this work, we present the design of a semiadaptive fault diagnosis method based on the variational Bayesian mixture factor analysis (VBMFA) to support process monitoring.The proposed method is capable of capturing strong nonlinearity and the significant dynamic feature of WWTPs that seriously limit the application of conventional multivariate statistical methods for fault diagnosis implementation. The performance of proposed method is validated through a simulation study of a wastewater plant. Results have demonstrated that the proposed strategy can significantly improve the ability of fault diagnosis under fault-free scenario, accurately detect the abrupt change and drift fault, and even localize the root cause of corresponding fault properly.


Introduction
Due to stringent environmental regulations, wastewater treatment plants (WWTPs) are always challenged to meet new effluent quality requirements [1].In such context, the use of online sensors for control and optimization in the wastewater treatment plants is indispensable.Measurements from instrumentation shall be available 24 hours a day and 7 days a week.However, even reliable instrumentation can fail during operation, which can have serious consequences if the instrumentation is used in the closed form.The huge amount of correlated data adds more complexity to this problem, thus making manual expert-based quality evaluation infeasible [2].The development of methods that allow us to extract useful information for automatic detection of faults and identifying their root cause is urgently needed.
This vision has motivated inspiring efforts toward using multivariate statistical process monitoring (MSPM) methods, such as principal component analysis (PCA) [3,4] and partial least-squares (PLS) [5,6], for fault diagnosis in the wastewater treatment process, even though they had been used for many years in the chemical process industries [3,7,8].The goal of such methods is to project a high dimensional measurement space onto a space with significantly fewer dimensions.In PCA, Hotelling statistics,  2 , and the sum of squared residuals, SPE, or  statistic are formulated as tools to identify the deviations of normal process.The  2 statistic is a measure of the variation in the PCA model, whereas the  statistic is a measure of the amount of variation not captured by the PCA model [9].However, these methods adhere to an assumption that process noises for each variable are normally distributed.The presence of variations such as shifting operational conditions, feed flow fluctuations, control strategy changes, and non-Gaussian disturbances in the WWTPs often deviates from such an assumption.Independent component analysis (ICA) provides an alternative to make data more comprehensible and useful for non-Gaussian fault diagnosis [10][11][12].Other complementary MSPM approaches include Fisher discriminant analysis (FDA) [13], canonical variable analysis (CVA) [14], and support vector machines (SVM) [15].Nevertheless, WWTPs pose an additional challenge to the development and application of these methods.The problem is highly uncertain environment.To alleviate such problem, a probabilistic PCA based method was proposed with the assumption that both of process variables and noises follow Gaussian distribution [16,17].The issue of PPCA is that it assumes the same noise level for all measurement variables with potentially underfitting the fault diagnosis model.Factor analysis provides a potential way to address such a problem using different noise variance for each variable [18].Unfortunately, parameter identification of FA using the least-squares method or maximum likelihood for optimization potentially results in an overfitting problem.Therefore, variational Bayesian learning was introduced to identify corresponding parameters for FA or PCA and further used to process monitoring [19].These methods work reasonably well with synthetic data sets but they have difficulties in handling wastewater plant dynamics (e.g., seasonal or diurnal changes) and nonlinearity (e.g., temperature-dependent kinetics).One of common and plausible ways is to extend the conventional methods to a mixture form with the capability of approaching global nonlinearity by some local linearity models.However, there are several drawbacks of the mixture model based methods [20]: (1) they cannot determine the effective factor number for each factor analyzer automatically through the modeling process; (2) the importance of each retained factor in the model structure cannot be easily differentiated; (3) singularity and overfitting problems maybe caused when the number of modeling data samples is limited.Variational Bayesian mixture factor analysis offers a good alternative to overcome these problems.Additionally, the VBMFA method is not only able to determine the dimensionality automatically, but also capable of avoiding overfitting problem [21].
It is common sense that batch processes are in fact a special type of nonlinear processes.Therefore, in this paper, unlike other mixture models for batch multimode process monitoring in general [22][23][24][25], the VBMFA-based process monitoring method is proposed to address the significant nonlinear problems in the WWTPs.This is more practical because systems are always nonclassifiable clearly for most cases.By doing so, the corresponding  2 and SPE statistics are formulated as tools for fault diagnosis and identification. 2 and SPE statistics from local factor analyzers are assimilated by weights, respectively.In order to account for changing process behavior, such weights can change over time, thereby offering more reliable way for fault diagnosis and identification.Therefore, the implementation of adaptive mechanism for the proposed method is not by their mean and variance but rather depending on their weight update.In other words, the resulting online VBMFA model is obtained by weighting adaptively several factor analyzers that are already trained offline.Therefore, the resulting method is termed as a semiadaptive fault diagnosis method.In summary, the contributions of this paper is given as follows: (1) typical variables are generated to account for the fluctuations of diurnal wastewater changes in workdays and weekends, serving as data pretreatment and an initial step to break up dynamics of the wastewater process; (2) the VBMFA is introduced for fault diagnosis; (3) an adaptive probabilistic weighted strategy is adapted to mix different local factor analyzers; (4) an adaptive control limit is proposed to monitor a WWTP with strong nonlinearity and dynamic feature.
The rest of this paper is organized as follows.In Section 2, an overview of issues in the WWTPs is given, together with a description of the simulation systems involved in this work.Section 3 will deal with the description of the FA and VBMFA.Section 4 provides a detailed demonstration for the development of the VBMFA-based fault diagnosis strategy, which is followed by the case study of the corresponding process monitoring scheme in the next section.Finally, some conclusions are made.organic load varies during the day according to the level of human activity.Also, they themselves are strongly affected by weather conditions and seasonal change.Such fluctuations often result in the degradation of the performance or even plant failure.

BSM1 (Benchmark Simulation Model 1) Platform.
The simulation platform used is the BSM1 developed by the second IWA Respirometry Task Group, offering an unbiased benchmarking system for comparing various control strategies (http://www.benchmarkwwtp.org/)[28].In this study, a relatively simple plant layout, which is most commonly used for removal of organic matter, nitrification, and denitrification, is selected in the simulation benchmark (see Figure 1).The plant consists of five compartment biological tanks (5999 m 3 ) and a secondary settler (6000 m 3 ), designing to treat an average flow of 20,000 m 3 /day with an average biodegradable chemical oxygen demand (COD) concentration of 300 mg/L.The last three compartments of the bioreactor are aerated, whereas the others are not.The nonreactive secondary settler is modeled with a series of 10 layers with an area of 1500 m 2 and a depth of 4 m.There are two internal recycles: the nitrate internal recycle from the last tank to the first tank and the activated sludge recycle from the underflow of the secondary settler to the front end of the plant.The feed from the final biological reactor is connected to the sixth layer from the bottom of the secondary settler.The scenario with dry weather condition is tested in this case study.The simulated BSM1 comprises 14 days of influent data recorded at 15-minute intervals from a real plant.In order to make sure that the methodology is able to translate to a real WWTP easily in the future, the BSM1 model and proposed method were run in separate computer and connected by two I/O cards.

Factor Analysis and Mixture Factor Analysis Model Based on Variational Bayesian Learning
3.1.Factor Analysis.Factor analysis is a statistical method with the capability to describe variability among observed correlated variables in terms of a potentially lower number of unobserved variables called factors.The original observed measurement variables  ∈   are treated as linear combination of factors  plus the noise  ∈   and mean vector  ∈   ; that is, where Λ ∈  (×) is the linear transformation known as the factor loading matrix and e is the zero-mean Gaussian noise with diagonal covariance matrix Ψ.Given the observed variables  ∈   , the parameters  = {, Λ, Ψ} in the model are needed to be identified.One of common ways is to use the expectation and maximization (EM) algorithm, also termed as the iterative likelihood maximization algorithm.

Mixture Factor Analysis Model Based on Variational
Bayesian Learning (VBMFA).It is a common sense that high dimensional data in fact lie on a low dimensional manifold generally [29].In factor analysis, each factor dictates the amount of each linear transformation on the observed variables.However, such transformation can only well explain a small region of the manifold in which it is locally linear, even though the manifold is globally nonlinear.
One way to deal with this problem is to use mixture models to tile the data manifold.A mixture of factor analyzers models the density for a data point   as a weighted average of factor analyzer densities as follows: where  is the number of mixture components in the model,  is the vector of mixing proportions,   is the discrete indicator variable for the mixture component selected for modeling the data point , and Λ = {Λ  }  =1 is a set of factor loadings with the factor matrix Λ  for analyzer .
Due to the complexity of integrating out all those parameters, it is necessary to choose priors that the likelihood terms greatly simplify inference and interpretability.Therefore, the Dirichlet prior for mixing proportion , with strength  * , is chosen as follows: with  * = [1/, . . ., 1/].Thus, only a single hyperparameter is needed regardless of the dimensionality of .The column of each factor loading matrix has a Gaussian prior with zero mean and a different precision parameter (drawn from a gamma distribution with fixed hyperparameters): where Λ  .denotes the vector of entries in the th column of the th analyzer in the mixture and V   is the same scalar precision for each entry in the corresponding column.
Since the number of hyperparameters in } increases with the number of analyzers, a hyperprior on every element of each V  precision vector is as follows: where  * and  * are shape and inverse-scale hyperparameters for a gamma distribution.Finally, the means of each analyzer in the mixture need to be integrated out.A Gaussian prior with mean  * and axis-aligned precision diag(V * ) is placed on each mean  * : The directed acyclic graph for the generative model for VBMFA is shown graphically in Figure 2.
The aim of VBMFA is to optimize An arbitrary distribution (, V, Λ, ) is introduced to lower bound.By using the Kullback-Leibler divergence [30],  can be reformulated as follows: where Θ = ( *  * ,  * ,  * ,  * , V * , Ψ).Thus, the lower bound is a sum of a foundational of variational posterior distribution over the parameters, denoted by (), and a foundational of variational posterior distribution over the hidden variables of each data point.
In order to optimize the lower bound, functional derivatives with respect to each (⋅) distribution are taken and are equal to zero to find the distributions.By doing so, the corresponding (⋅) can be obtained as follows: where each element of the variational parameter  is given by with  =  * + ,  *  = 1/, and ∑  =1   = 1: where Λ  denotes the column vector associated with the th row of Λ , which has   + 1 dimensions.These variational poster parameters are given by with with By using a result of Dirichlet distributions in [21], (  ) is computed by where   is a normalization constant for each data point, such that ∑    =1 (  ) = 1: The solution for  * and V * can be found by maximizing (8): with where the update for V * uses the already updated  * .

Units Process Monitoring Based on VBMFA
4.1.Fault Detection.In order to perform process monitoring, monitoring statistic  2 and SPE is constructed first of all.Similar to PCA method, corresponding monitoring statistics for each local FA can be represented as follows: with Similarly, the associated SPE statistic can be given as follows: The control limits of the  2 and SPE statistic can be both determined by the  2 distribution; thus, where  = 1, 2, . . ., .  is the number of retained factors in each local FA model,  is the number of process variables, and is the significance level of both monitoring statists.As for a new coming sample data  new , both of statists are updated as follows: However, it is cumbersome to monitor the process with all of the local monitoring charts, which also potentially frustrates the decision making for fault diagnosis with such many monitoring charts.Therefore, coordination of all the local monitoring charts is imperative.Different from the hard assignment method employed by [31], an adaptive method is implemented in this section.As the monitoring results of the new sample  new in each local subspace have been identified, we can combine them through the estimated posterior probabilities, which are given as follows: which represents the responsibility of the components for the new coming data point  new .Both of synthetic monitor statists  2 and SPE can be obtained as follows: Since the control limit of the local  2 is correlated with the number of factors   in each FA model, there is a need to combine the control limit as well.Thus,  ( new ) is used to weight associated  2 control limit as follows: Note that, due to constant control limit of each local SPE  2  (), it is not necessary to calculate the combined SPE control limit any more.
Even though the structure for each local FA model is unchanged after model training, our process monitoring model can be adjusted by updating mixture proportions in an adaptive way.Hence, process behaviors can be judged by simply checking both of statistics accordingly if they exceed predefined control limits.

Fault Identification.
Once a fault is detected, more information is necessarily obtained to find the root cause of the corresponding faults.A contribution plot is a common way to fill this gap, which is able to show the contribution of each process variable to the statistic calculated.A high contribution of a process variable usually indicates a problem with this specific variable.Plots of the contribution to the associated statistic are similar to standard plots of squared residuals obtained, but the contribution plots presented here also have control limits.These control limits are used for comparing the residuals of the new values for the present to the residuals of the fault-free data.If in the fault-free data a certain process variable had high residuals, it can also be expected to have high residuals in the present.However, if there are high residuals for a certain process variable that had low residuals in the fault-free data; this probably is due to a special event in the new process.If the contributions of a large group of process variables are studied, it is usually found that several process variables have high contributions.Using the control limits, it is easier to find those process variables that are really different, compared to the fault-free data.In this paper, a control limit for the contribution for each process variable is calculated as the mean of the contributions plus three times the standard deviation of the contributions for each process variable at each time period.That means contribution of each variable follows Gaussian distribution in a long period, but the matching mean and variance will  [32,33].Here, the control limits are determined according to the 3 method [32].The reason why we used the 3 method is because of its efficiency and simplicity.

Experimental Study
5.1.Data Pretreatment.In this case, 16 important variables shown in Table 1 were selected for monitoring, covering almost all the wastewater plant qualities required by engineers and the government.As depicted in Figures 3(a) and 3(b), the historical data sets reveal that, during dry weather periods, daily SS and S NH concentration follow a very similar pattern, which is also observed in other variables.In this respect, the typical data profiles for each variable under dry weather conditions were generated.This was done by stacking all the historical data in one set (from 0 to 1 day), and from it calculating the average flow at each time step.It is worth noting that daily data from Monday to Friday are very similar, whereas a slightly different pattern (shift in amplitude) is observed on the weekend profiles.Therefore, two typical data profiles were generated for each variable: one for weekdays and one for weekends.These patterns are consistent with the water consumptions by inhabitants during daily life.Results about SS and S NH concentrations are presented in Figures 3(c) and 3(d).By subtracting typical data from historical data, dynamic behaviors during the wastewater plant dataset are capable of being broken up, thereby providing better data for fault detection, identification, and even prediction.Moreover, to make sure all data from different variables are consistent mutually, they were centered and scaled accordingly.We denote all these procedures as "data whitening."Finally, the whitening data were further used for ensuing VBMFA model building.

5.2.1.
Fault Detection for the Fault-Free Scenario.1344 samples in two weeks under normal operation condition were collected to assess the proposed methodology.900 data points were used for VBMFA model training; the remaining was for testing.As depicted in Figures 4(a) and 4(c), the FA model performed well with most of points under the 95% control limit in terms of  2 and SPE statistics.However, it is worthy to emphasize that unacceptable numbers of false alarm occurred during this process.This is due to the fact that, when the measurements reach the peck or the valley, FA model is incapable of capturing such strong nonlinearity properly, thus leading to too many false alarms frequently.On the contrary, better performance was achieved by VBMFA with almost all of test points under 95% control limit.It is obvious that the control limit changed along with new data points coming, resulting in a more compact control limit and less probability of false alarms.
Table 2 depicts the performance of four different monitor strategies, including PCA, FA, VBPCA, and VBMFA, under fault-free scenario in dry weather conditions.The frequency of false alarm for PCA and FA was very similar in terms of  2 , but the SPE for both of strategies were quite different.The FA relying on different variance for different variables achieved less false alarms in comparison to the PCA.Furthermore, variational Bayesian learning is incorporated into PCA to learn its parameters.The number of false alarms was reduced as expected.Since variational Bayesian learning is the sensitive posterior probability mass rather than the posterior probability density, thus making it more resistant against overfitting compared to other estimation methods (leastsquares, point estimation).Furthermore, because uncertainty information for unknown quantities can be obtained by this variational Bayesian learning, it is useful to detect unreliable results.However, due to strong nonlinearity and dynamics,  VBPCA failed to capture all the characteristics properly.Obviously, VBMFA achieved the best performance.This is an important improvement as the peaks or the valleys that frequently occurred in the concentrations of different variables were described accurately, thus leading to better monitor effectiveness.It is necessary to emphasize that all model parameters excluding VBMFA were obtained somehow with the help of manual operation.This would potentially frustrate the use of these methods in practice.Conversely, the number of factors for each factor analyzer in VBMFA was determined automatically.

Fault Detection for an Abrupt Change and a Drift Fault.
Abrupt change and drift fault are two types of common faults in the process control.Thus, to access the fault detection ability of VBMFA, two cases with different faults were generated during the testing data on the second sensor (TCOD) and the fourth sensor (SS), respectively, tabulated as Table 3. Due to cost saving demand and stricter regulations from governments, TCOD and SS sensors are fairly important for WWTP monitoring.Therefore, it is monitored not only by the engineers from WWTP, but also by the government using a remote   5(a) between data points 200 and 300, which potentially leads to a real fault being failed to be identified, VBMFA was able to adjust its control limit to adapt to such variation.To further illustrate the efficiency of VBMFA, the second scenario with a drift fault exerting on the fourth sensor was also simulated, showing in Figures 5(b) and 5(d) that it achieved a good performance with accurate detection when the drift fault was happening.It is important to notice that the control limit of S 2 decreased to lower values once it recognized significant changes in the updating data, thereby providing more potentials to identify the drift fault earlier.

Fault Identification.
In order to find the root cause of the corresponding fault, the contribution plot was employed in this paper.The contribution plot of SSPE for the abrupt change can discriminate between the faulty sensor and the normal sensors (Figure 6(a)).It is also obvious in Figure 6(a) that the faulty sensor (TCOD) deviated from the major pattern and exceeded its control limit due to the negative influence of abrupt changes.Similarly, the contribution plot for the drift fault was shown in Figure 6(b).The result clearly shows that the contribution plot of the VBMFA strategy successfully separated the out of control data from normal operation ones.

Parameters Analysis.
When optimizing (8), some (  ) could be zero.This can be attributed to the fact that local data is insufficient to support the dimensional complexity on the factor loading matrices.Such redundant components should be removed to increase , because they are no longer needed.This is the component death.
Component birth does not happen spontaneously; thus, a heuristic is introduced.A parent-component stochastically has a birth with probability proportional to  − , and we attempt to split it into two.The parameter distributions of the two Gaussians created from the split are initialized by partitioning the responsibilities for the data, (  ).This usually causes  to decrease, so by monitoring the progress of , this attempted birth can be rejected if  does not recover.
Furthermore, in the VBMFA model, the dimensionality of each factor analyzer is determined automatically by automatic relevance determination (ARD) with the potentials to improve VBMFA model performance dramatically [21,34].As such, the corresponding weight of each factor analyzer can be obtained in an automatic manner accordingly without necessarily resorting to trial and error method.The S 2 and SSPE index can be weighted and updated adaptively.
As for the maximum factors for each factor analyzer, it represents the freedom of parameters the ARD method is able to learn.Thus, this parameter is just set to  − 1 or  − 2.
Each of the mixture components takes for a particular data point  new and on the parameters  of the mixture components, not on the values of other data points.Also, due to the parameters  having been identified in the training stage already, the update of (  ) is only related to the data point  new in fact.

Discussions.
The present work describes the development of a VBMFA model to monitor a WWTP.This is an attempt to provide water utilities with a tool to improve the process monitor ability and formulate a guideline for system operation.
The VBMFA model has been successfully applied to a simulated WWTP with strong nonlinearity and dynamics in the dry weather.Results showed that false alarms can be avoided significantly under fault-free scenario.Also, accurate fault detection and identification can be achieved if an abrupt change or a drift fault occurred.Due to the mixture of factor analyzers for modeling, VBMFA is able to describe a small region of process in a locally linear way and the entire process in a globally nonlinear manner and therefore facilitate catching the characteristic of process even in the peaks or the valleys in terms of results in the previous sections.
In this paper, we show the benefits of VBMFA model for a WWTP process monitor.The results clearly demonstrate the superior performance of VBMFA model compared with other methods.However, future work should update not only the weights of each factor analyzer and corresponding monitor indexes, but also the associated variance and mean.Also, future work should aim to test and validate this methodology in a real WWTP system, which incurs us to implement our work in separate computer as shown in Figure 2. Additionally, due to the demands of cost saving in the water communities, it is imperative to assimilate economic index into the process monitoring in our future work.

Additional Points
A methodology for fault diagnosis in WWTPs has been successfully developed based on VBMFA, where estimated posterior probabilities have been used to weight different local factor analyzers adaptively.Resulting monitoring statistics as well as associated control limit are reformulated as an adaptive manner.The results show that the proposed method achieves the best performance under fault-free scenario.This study further demonstrates the effectiveness of detecting two typical faults and localizing the associated faulty sensor.

Figure 1 :
Figure 1: The schematic of wastewater plant for BSM1.

Figure 3 :
Figure 3: Typical variables for workdays and weekdays.

Figure 4 :
Figure 4: Fault diagnosis under normal condition using FA and VBMFA.

Figure 5 :
Figure 5: Fault detection for an abrupt change and a drift fault using VBMFA.

FaultFigure 6 :
Figure 6: Fault identification for an abrupt change and a drift fault using VBMFA.

Table 1 :
Selected sensors for process monitoring.

Table 2 :
Selected sensors for process monitoring.

Table 3 :
Faults generated on two important sensors.