A Denoising Based Autoassociative Model for Robust Sensor Monitoring in Nuclear Power Plants

1University of Science and Technology Beijing (USTB), 30 Xueyuan Road, Haidian District, Beijing 100083, China 2Institute of Automation, Chinese Academy of Sciences, University of Science and Technology Beijing, 95 Zhongguancun East Road, Beijing 100190, China 3Department of Electrical Engineering, COMSATS Institute of Information Technology, University Road, Abbottabad 22060, Pakistan 4COMSATS Institute of Information Technology near Officers Colony, Kamra Road, Attock 43600, Pakistan


Introduction
From a safety and reliability standpoint, sensors are among the critical infrastructures in modern automatically controlled nuclear power plants [1]. Decisions on control actions, whether taken by the operator or by an automatic controller, depend on the correct plant state being reflected by its sensors. The "defense in depth" safety concept (which requires mission critical systems to be redundant and diverse in implementation to avoid single-mode failure scenarios) essentially requires a sensor health monitoring system for such mission critical processes. Such a sensor health monitoring system has multifaceted benefits, not limited to process safety, reliability, and availability, but also extending to cost benefits from a condition based maintenance approach [2,3]. A typical sensor health monitoring system may include the tasks of sensor fault detection, isolation, and value estimation [4]. The basic sensor monitoring architecture comprises two modules, as depicted in Figure 1. The first module implements a correlated sensor model which provides analytical estimates of the monitored sensors' values. Residuals are evaluated by differencing the observed and estimated sensor values and are supplied to a residual analysis module for fault hypothesis testing. These correlated sensor models are based either on first principles models (e.g., energy conservation and material balance) or on history based data driven models [5]. Sensor modeling using empirical techniques from statistics and artificial intelligence remains an active area of research [6,7].
In order to model the complex nonlinearity of physical process sensors, autoassociative neural network based sensor models have been widely used and reported for calibration monitoring in chemical processes [8][9][10][11] and nuclear power plants [12][13][14][15]. Data driven training procedures for such neural network based sensor models discover the underlying statistical regularities among input sensors from history data and model them by adjusting network parameters. The five-layer AANN is one of the earliest autoassociative architectures proposed for sensor and process modeling [8].
In contrast to shallow single layered architectures, these multilayered neural architectures have the flexibility to model complex nonlinear functions [16,17]. However, harnessing the complexity offered by these deep NN models without overfitting requires effective regularization techniques. Several heuristic regularization methods are suggested and exercised in the literature [18,19], such as training with jitter (noise), Levenberg-Marquardt training, weight decay, neuron pruning, cross validation, and Bayesian regularization. Despite all these regularization heuristics, the joint learning of multiple hidden layers via backpropagation of the error gradient inherently suffers from the vanishing gradient problem at the earlier layers [20]. This gradient instability restricts the very first hidden layer (closest to the input) from fully exploiting the underlying structure of the original data distribution. The result is poor generalization and prediction inconsistency. The problem becomes even harder due to the inherent noise and collinearity in sensor data.
Considering the complexity and training difficulty due to gradient instability in the five-layer AANN topology, Tan and Mayrovouniotis proposed a shallow three-layer network topology known as the input trained neural network (ITN-network) [21]. However, modeling flexibility is compromised by the shallow architecture of the ITN-network.
The regularization and robustness issues associated with these traditional learning procedures motivate the need for complementary approaches. Contrary to the shallow architecture approach of Tan and Mayrovouniotis [21], here we are interested in preserving the modeling flexibility offered by many layered architectures without compromising the generalization and robustness of the sensor model. Recent research on greedy layerwise learning approaches [22,23] has proved successful for efficient learning in deep multilayered neural architectures for image, speech, and natural language processing [24]. So, for a multilayered DAASM, we propose to address poor regularization through the deep learning framework. Contrary to the joint multilayer learning methods of traditional AANN models, the deep learning framework employs a greedy layerwise pretraining approach. Following this framework, each layer in the proposed DAASM is regularized individually through unsupervised pretraining under a denoising based learning objective. This denoising based learning is conducted under autoencoder architectures, as elaborated in Section 3. It essentially serves two purposes: (1) it helps deep models capture robust statistical regularities among input sensors; (2) it implicitly addresses the model's robustness by learning hidden layer mappings which are stable and invariant to perturbations caused by failed sensor states.
Moreover, robustness to failed sensor states is not an automatic property of AANN based sensor models, yet it is essential for fault detection. Consequently, a traditional AANN based sensor model requires explicit treatment for robustness against failed sensor states. For the DAASM, by contrast, an explicit data corruption process is exercised during the denoising based unsupervised pretraining phase. The proposed corruption process is derived from drift, additive, and gross type failure scenarios, as elaborated in Section 4.1, so robustness to faulty sensor conditions arises implicitly from the denoising based pretraining. Robustness of the proposed DAASM against different sensor failure scenarios is rigorously studied and demonstrated through invariance measurements at multiple hidden layers of the DAASM network (see Section 7). The full DAASM architecture and layerwise pretraining are detailed in Section 4. We compare the proposed DAASM with the extensively reported five-layer AANN based sensor model by Kramer. Both sensor models are trained on sensor data sampled from full power steady operation of a pressurized water reactor. Finally, performance assessment with respect to accuracy, autosensitivity, cross-sensitivity, and fault detectability metrics is conducted in Section 8.

Problem Formulation
In the context of sensor fault detection applications, the purpose of a typical sensor reconstruction model is to estimate the correct sensor value from its corrupted observation. The objective is to model relationships among input sensors which are invariant and robust against sensor faults. So, empirical learning of robust sensor relationships can be formulated as a sensor denoising problem. Here, contrary to superimposed channel/acquisition noise, the term "denoising" specifically refers to the corruption caused by gross, offset, and drift type sensor failures. Under such a denoising based learning objective, the empirical sensor model can be forced to learn a function that captures the robust relationships among correlated sensors and is capable of restoring the true sensor value from a corrupted version of it.
Let $S_{\mathrm{True}}$ and $S_{\mathrm{Obs}}$ be the normal and corrupted sensor states related by some corruption process $\eta(\cdot)$ as follows:

$$S_{\mathrm{Obs}} = \eta\left(S_{\mathrm{True}}\right), \qquad (1)$$

where $\eta: \mathbb{R}^{D} \to \mathbb{R}^{D}$ is a stochastic corruption caused by an arbitrary type of sensor failure. The learning objective for the denoising task can be formulated as

$$\min_{f} \; \mathbb{E}\left[\left\| f\left(S_{\mathrm{Obs}}\right) - S_{\mathrm{True}} \right\|^{2}\right]. \qquad (2)$$

Under minimization of the above formulation, the objective of empirical learning is to search for an $f$ that best approximates $S_{\mathrm{True}}$ from its corrupted observation. The greedy layerwise strategy adopted for this purpose rests on three principles:

(1) Pretraining one layer at a time in a greedy way.
(2) Using unsupervised learning at each layer in a way that preserves information from the input and disentangles factors of variation.
(3) Fine-tuning the whole network with respect to the ultimate criterion of interest.
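The corrupt-then-reconstruct objective above can be sketched in a few lines of NumPy. The corruption fraction, the sensor count, and the gross-failure form of `eta` are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def eta(s, frac=0.3):
    """Hypothetical stochastic corruption eta(.): drive a random fraction
    of the span-normalized sensors to their range extremes (0 or 1),
    mimicking gross-type failures. Illustrative only."""
    x = s.copy()
    mask = rng.random(s.shape) < frac
    x[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return x

def denoising_loss(f, s_true):
    """Empirical form of the learning objective: mean || f(eta(s)) - s ||^2."""
    s_obs = eta(s_true)
    return float(np.mean((f(s_obs) - s_true) ** 2))

# The identity map cannot undo the corruption, so its loss stays positive;
# a model that restores the clean state would drive the loss towards zero.
s = rng.uniform(0.2, 0.8, size=(100, 13))  # 13 span-normalized sensors
loss_identity = denoising_loss(lambda x: x, s)
```

A trained model is then simply any `f` that pushes this loss towards zero on held-out data.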

Building Block for DAASM
In relation to the empirical modeling approach formulated in Section 2, the denoising autoencoder (DAE) [26] is the most promising building block for pretraining and composition of a deep autoassociative sensor model. The DAE is a variant of the traditional autoencoder neural network, where the learning objective is to reconstruct the original uncorrupted input $x$ from a partially corrupted or missing version $\tilde{x}$. Under the training criterion of reconstruction error minimization, the DAE is forced to conserve information about the input in its hidden layer mappings. The regularization effect of the denoising based learning objective pushes the DAE network towards the true manifold underlying the high dimensional input data, as depicted in Figure 2; hence it implicitly captures the underlying data generating distribution by exploiting robust statistical regularities in the input data. A typical DAE architecture, as depicted in Figure 3, comprises an input, an output, and a hidden layer. An empty circle depicts a neuron unit. The input layer acts as a proxy for the original clean input, while the red filled units in the input layer are proxies for clean input units randomly selected for corruption under an artificial noise process. $L(x, \hat{x})$ is the empirical loss function optimized during training. Let $x$ be the original data vector with $d = 1, 2, \ldots, D$ elements, while $\tilde{x}$ represents its partially corrupted version obtained through the corruption process $\eta$. The encoder and decoder functions corresponding to the DAE in Figure 3 are defined as

$$h = f_{\theta}(\tilde{x}) = \sigma\left(W\tilde{x} + b\right), \qquad (3)$$
$$\hat{x} = g_{\theta'}(h) = \sigma\left(W'h + b'\right). \qquad (4)$$

The encoder function $f_{\theta}(\tilde{x})$ transforms the input data to the hidden mapping $h(\tilde{x})$ through a sigmoid type activation function $\sigma(a) = (1 + e^{-a})^{-1}$ at the hidden layer neurons. $\hat{x}$ is an approximate reconstruction of $x$ obtained through the decoder function $g_{\theta'}(h)$ via reverse mapping followed by sigmoid activation at the output layer. Meanwhile, $\theta = \{W, b\}$ and $\theta' = \{W', b'\}$ are the weight and bias parameters corresponding to the encoder and decoder functions.
In relation to the sensor reconstruction model formulated in Section 2, the above-described DAE can be reinterpreted with sensor vectors in place of generic inputs: the corrupted sensor state plays the role of $\tilde{x}$, and the pretraining objective becomes

$$\min_{\theta, \theta'} \; \frac{1}{N} \sum_{i=1}^{N} \left\| g_{\theta'}\left(f_{\theta}\left(\tilde{s}^{(i)}\right)\right) - s^{(i)} \right\|^{2}. \qquad (5)$$
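A minimal sketch of the encoder/decoder pair described above, with untied encoder and decoder weights; the layer sizes and initialization scale are assumptions for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DAE:
    """Minimal denoising autoencoder sketch: h = sigma(W x~ + b) for the
    encoder and x^ = sigma(W' h + b') for the decoder. The untied-weights
    choice and the initialization are illustrative assumptions."""
    def __init__(self, n_in, n_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.W  = rng.normal(0.0, 0.1, (n_hid, n_in))   # encoder weights
        self.b  = np.zeros(n_hid)
        self.Wp = rng.normal(0.0, 0.1, (n_in, n_hid))   # decoder weights
        self.bp = np.zeros(n_in)

    def encode(self, x):        # f_theta
        return sigmoid(self.W @ x + self.b)

    def decode(self, h):        # g_theta'
        return sigmoid(self.Wp @ h + self.bp)

    def reconstruct(self, x):
        return self.decode(self.encode(x))

dae = DAE(n_in=13, n_hid=20)    # overcomplete hidden layer (more units than inputs)
x_hat = dae.reconstruct(np.random.default_rng(1).uniform(0, 1, 13))
```

Training would adjust `W`, `b`, `Wp`, `bp` to minimize the reconstruction error between `x_hat` and the clean input.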

DAASM Architecture and Regularization
In order to capture complex nonlinear relationships among input sensors, a multilayered architecture is proposed for the denoised autoassociative sensor model (DAASM). Individual layers in the network hierarchy are pretrained successively from bottom to top. For a well regularized sensor model, the structure and optimization objective used in greedy layerwise pretraining play a crucial role. Two heuristics are applied for robust learning in the DAASM: (1) each successive layer in the multilayered DAASM assembly is pretrained in an unsupervised fashion under a denoising autoencoder (DAE), as elaborated in Section 3;
(2) to address robustness, the data corruption processes for the denoising based pretraining task incorporate domain specific failure scenarios derived from different types of sensor faults. These heuristics serve several purposes: (i) forcing the DAE output to match the original uncorrupted input acts as a strong regularizer and helps avoid trivial identity learning, especially under an overcomplete hidden layer setting; (ii) the denoising procedure during pretraining leads to latent representations that are robust to input perturbations; (iii) the addition of corrupted data increases the training set size and is thus useful in alleviating the overfitting problem.
The full DAASM is learnt in two stages: (1) an unsupervised pretraining phase and (2) a supervised fine-tuning phase. As shown in Figure 4, the pretraining phase follows a hierarchical learning process in which successive DAEs in the stack are defined and trained, in an unsupervised fashion, on the preceding hidden layer's activations. The full sensor model is constructed by stacking the hidden layers of the unsupervised pretrained DAEs, followed by a supervised fine-tuning phase. For each DAE in the stack hierarchy, the optimization objective for unsupervised pretraining remains the same as in relation (5); however, a weight decay regularization term is added to the loss function, which constrains network complexity by penalizing large weight values:

$$\min_{\theta, \theta'} \; \frac{1}{N} \sum_{i=1}^{N} \left\| g_{\theta'}\left(f_{\theta}\left(\tilde{s}^{(i)}\right)\right) - s^{(i)} \right\|^{2} + \lambda \left(\|W\|_{2}^{2} + \|W'\|_{2}^{2}\right). \qquad (6)$$

In relation (6), $\{W, W'\}$ are the network weight parameters corresponding to the encoder and decoder functions, while $\lambda$ is the weight decay hyperparameter. In a typical DAE architecture, the number of input and output layer neurons is fixed by the input data dimension $D$; however, the middle layer neuron count $M$ can be adjusted according to problem complexity. The deep learning literature suggests that an undercomplete middle layer ($M < D$) in a DAE architecture results in a dense, compressed representation at the middle layer. Such a compressed representation has a tendency to entangle information (a change in a single aspect of the input translates into significant changes in all components of the hidden representation) [27]. This entangling tendency directly degrades the cross-sensitivity of a sensor reconstruction model, especially in the case of gross type sensor failures. Considering that, we opt here for an overcomplete hidden layer setting ($M > D$). Under the overcomplete setting, the denoising based optimization objective acts as a strong regularizer and inherently prevents the DAE from learning the identity function.
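The weight-decay-regularized pretraining objective of relation (6) might be sketched as follows; the batched array layout and the default value of the decay hyperparameter are assumptions:

```python
import numpy as np

def dae_pretrain_loss(W, b, Wp, bp, S_clean, S_noisy, lam=1e-4):
    """Denoising reconstruction error plus L2 weight decay, per relation (6).
    Rows of S_clean / S_noisy are training samples; `lam` is the weight
    decay hyperparameter. A sketch of the objective, not the trainer."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    H = sigmoid(S_noisy @ W.T + b)          # encoder applied to corrupted input
    S_hat = sigmoid(H @ Wp.T + bp)          # decoder reconstruction
    recon = np.mean(np.sum((S_hat - S_clean) ** 2, axis=1))
    decay = lam * (np.sum(W ** 2) + np.sum(Wp ** 2))
    return recon + decay

rng = np.random.default_rng(0)
W, Wp = rng.normal(0, 0.1, (20, 13)), rng.normal(0, 0.1, (13, 20))
b, bp = np.zeros(20), np.zeros(13)
S = rng.uniform(0, 1, (50, 13))
loss = dae_pretrain_loss(W, b, Wp, bp, S, S)   # here: no corruption applied
```

Gradient descent on this scalar with respect to `W`, `b`, `Wp`, `bp` is what the layerwise pretraining step performs.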

The clean and noisy training pairs for each DAE in the stack can be represented as $(h_{\ell}, \tilde{h}_{\ell})$, where $\tilde{h}_{\ell}$ is the corrupted counterpart of the clean mapping $h_{\ell}$ and $\langle W_{\ell}, b_{\ell} \rangle$ are the network weights corresponding to the encoder part of each DAE. The noise process $\eta_{s}(\tilde{S} \mid S)$ for DAE-1 corresponds to a salt-and-pepper (SPN) type corruption process, in which a fraction of the input sensor set $S$ (chosen at random for each example) is set to the minimum or maximum possible value (typically 0 or 1). This noise process models gross type failure scenarios and drives the DAE-1 network to learn invariance against such sensor failures. The noise function $\eta_{h_{2}}(\tilde{h}_{1} \mid h_{1}(S))$ for DAE-2 employs a corruption process in which $h_{1}(S)$ and $h_{1}(\tilde{S})$ from the pretrained DAE-1 are used as the clean and noisy inputs for DAE-2 pretraining. Finally, an additive Gaussian type corruption process (AGN) is employed for DAE-3. All these corruption processes are mathematically formulated and discussed in detail in Section 4.1.
These pretrained layers initialize the DAASM network parameters in basins of attraction with good generalization and robustness properties. In order to generate a sensor model that is fairly dependent on all inputs, the "Dropout" [28] heuristic is applied to the $h_{3}$ hidden units during DAE-3 pretraining. Random dropouts make it hard for latent representations at $h_{3}$ to specialize on particular sensors in the input set. Finally, the pretrained DAEs are unfolded into a deep autoassociative network with a cascade of $L$ encoders and $L-1$ decoders, as shown in the fine-tuning phase of Figure 4. The final network comprises one input layer, one output layer, and $2L-1$ hidden layers. The input sensor values flow through the encoder cascade $f = f_{L} \circ f_{L-1} \circ \cdots \circ f_{1}$ using the recursive expression in (7) and through the decoder cascade $g = g_{1} \circ \cdots \circ g_{L-1}$ as follows:

$$h_{\ell} = \sigma\left(W_{\ell} h_{\ell-1} + b_{\ell}\right), \quad h_{0} = s, \qquad (7)$$
$$\hat{h}_{\ell-1} = \sigma\left(W'_{\ell} \hat{h}_{\ell} + b'_{\ell}\right), \qquad (8)$$

where $\langle W'_{\ell}, b'_{\ell} \rangle$ are the network weights and biases of the decoder part of each DAE. The entire network is fine-tuned using the semiheuristic "Augmented Efficient Backpropagation Algorithm," proposed by Embrechts et al. [29], with the following minimization objective:

$$\min_{\Theta} \; \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{s}^{(i)} - s^{(i)} \right\|^{2} + \lambda \left\|\Theta\right\|_{2}^{2}. \qquad (9)$$

An $L_{2}$ weight decay term is added to the above loss function for network regularization during the fine-tuning phase.
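The unfolding of pretrained DAEs into an encoder cascade (recursive expression (7)) followed by the mirrored decoder cascade can be sketched as a plain forward pass. The `(W, b, Wp, bp)` tuple layout and the hidden widths below are illustrative assumptions:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def daasm_forward(layers, s):
    """Forward pass through the unfolded DAASM: encoder cascade bottom-up,
    then the mirrored decoder cascade top-down. `layers` holds one
    (W, b, Wp, bp) tuple per pretrained DAE; this parameterization is an
    illustrative assumption, not the paper's exact one."""
    h = s
    for W, b, _, _ in layers:               # encoders f_1 ... f_L
        h = sigmoid(W @ h + b)
    for _, _, Wp, bp in reversed(layers):   # decoders g_L ... g_1
        h = sigmoid(Wp @ h + bp)
    return h                                # s_hat, same dimension as s

rng = np.random.default_rng(0)
sizes = [13, 20, 24, 28]                    # input width + 3 hidden widths (assumed)
layers = [(rng.normal(0, 0.1, (m, n)), np.zeros(m),
           rng.normal(0, 0.1, (n, m)), np.zeros(n))
          for n, m in zip(sizes[:-1], sizes[1:])]
s_hat = daasm_forward(layers, rng.uniform(0, 1, 13))
```

Fine-tuning then trains all of these parameters jointly against the clean sensor targets.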
To circumvent overfitting, an early stopping procedure, which uses the validation error as a proxy for generalization performance, is used during the fine-tuning phase.

(i) Gross sensor failure: this includes catastrophic sensor failures. A salt-and-pepper type corruption process, in which a fraction $\nu$ of the input sensor set $S$ (chosen at random for each example) is set to the minimum or maximum possible value (typically 0 or 1), is selected for modeling gross type failure scenarios.
(ii) Miscalibration sensor failure: this includes drift, multiplicative, and outlier type sensor failures and is modeled through isotropic Gaussian noise (GS): $\tilde{x} \mid x \sim \mathcal{N}(x, \sigma^{2}I)$. Instead of selecting an arbitrarily simple noise distribution, we estimated the distribution of each sensor's natural noise and exaggerated it to generate noisy training data.

We propose to distribute the denoising based invariance learning task across multiple hidden layers in the DAASM network. Both gross and miscalibration noise types are equally likely to occur in the input space. A Gaussian type corruption process is not suitable for the input data space because of its low denoising efficiency against gross type sensor failures. Contrarily, a salt-and-pepper type corruption process covers the two extremes of the sensor failure range and hence provides an upper bound on perturbation due to minor offset and miscalibration type sensor failures. So, a salt-and-pepper corruption process is devised for DAE-1 pretraining as follows:

$$\tilde{S} = \eta_{s}\left(\tilde{S} \mid S\right) = \mathrm{SPN}\left(S, \nu\right). \qquad (10)$$

Gross type sensor failures usually have a high impact on cross-sensitivity and can trigger false alarms in other sensors. Such a high cross-sensitivity effect may hinder isolation of miscalibration type secondary failures in other sensors. In order to minimize this effect, a corruption procedure is proposed in which $h_{1}(S)$ and $h_{1}(\tilde{S})$ from the pretrained DAE-1 serve as the clean and noisy inputs for DAE-2 pretraining. This corruption method is more natural, since it causes the next hidden layer mappings to become invariant to cross-sensitivity effects and network aberrations from the previous layer. The corruption process is expected to improve invariance of the $h_{2}$ layer mappings against cross-sensitivity effects from gross type sensor failures:

$$\tilde{h}_{1} = h_{1}\left(\tilde{S}\right), \quad \text{where } \tilde{S} = \eta_{s}\left(\tilde{S} \mid S\right) = \mathrm{SPN}\left(S, \nu\right). \qquad (11)$$

Here $h_{1}(S)$ corresponds to the hidden layer activations for clean sensors at the input layer, while $h_{1}(\tilde{S})$ corresponds to the hidden layer activations for the partially faulted sensor set.
Finally, to add robustness against small offset and miscalibration type sensor failures, an isotropic Gaussian type corruption process is devised for DAE-3 pretraining. The corruption procedure corrupts the $h_{2}$ hidden layer mappings obtained from clean sensors, $h_{2}(h_{1}(S))$, by employing isotropic Gaussian noise as follows:

$$\tilde{h}_{2} = h_{2}\left(h_{1}(S)\right) + \epsilon, \quad \epsilon \sim \mathcal{N}\left(0, \sigma^{2}I\right). \qquad (12)$$

Finally, clean input is used for the supervised fine-tuning phase in Figure 4.
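The two corruption processes described above (salt-and-pepper for DAE-1, additive Gaussian for the higher layer) can be sketched as follows; the corruption fraction and noise scale are assumed hyperparameters, not values reported by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def spn(S, frac=0.2):
    """Salt-and-pepper corruption: a fraction `frac` of entries per sample
    is forced to the span minimum (0) or maximum (1), modeling gross-type
    sensor failures."""
    X = S.copy()
    mask = rng.random(S.shape) < frac
    X[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return X

def agn(H, sigma=0.05):
    """Additive isotropic Gaussian corruption, modeling small offset /
    miscalibration failures on hidden layer mappings."""
    return H + rng.normal(0.0, sigma, H.shape)

S = rng.uniform(0.3, 0.7, size=(5, 13))
S_spn = spn(S)      # clean/noisy pair for DAE-1 pretraining
# For DAE-2, h1(S) and h1(S_spn) from the pretrained DAE-1 would serve as
# the clean and noisy inputs; agn(.) would corrupt h2 for DAE-3.
```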

Data Set Description
Intentionally, for study purposes, we limited the modeling scope of the DAASM to the full power steady operational state. It is the common state in which an NPP operates from one refueling to the next. In practice, however, it is not possible for NPP systems to be in perfect steady state. Reactivity induced power perturbations, natural process fluctuations, sensor and controller noises, and so forth are some of the evident causes of NPP parameter fluctuations and are responsible for steady state dynamics. Considering that the collected data set should be fairly representative of all possible steady state dynamics and noise, the selected sensors were sampled during different time spans of one complete operating cycle. The training data set consists of 6104 samples collected during the first two months of full power reactor operation after the refueling cycle. Meanwhile, 3260 and 2616 samples are reserved for the validation and test data sets, respectively. Five test data sets are used for the model's performance evaluation; each consists of 4360 samples collected during the eight-month period after the refueling operation. In order to account for the fault propagation phenomenon in large signal groups, a sensor subset is selected for this study. An engineering sense selection based on physical proximity and functional correlation is used to define the sensor subset. Thirteen transmitters, as listed in Table 1, are selected from various services in the nuclear steam supply system of a real PWR type NPP. Figure 5 shows the spatial distribution of the selected sensors. Starting from the postrefueling full power startup, the data set covers approximately one year of the selected sensors' values. The selected sensors are sampled every 10 seconds over consecutive 12-hour time windows. Figure 6 shows data plots from a few selected sensors.

Model Training
DAASM training employs two learning stages: an unsupervised pretraining phase and a supervised fine-tuning phase. DAE based greedy layerwise pretraining of each hidden layer, as described in Section 4, is performed using minibatches from the training data set. A stochastic gradient descent based learning algorithm is employed, following the practical training recommendations in [30]. Finally, the standard backpropagation algorithm is employed for supervised fine-tuning of the fully stacked DAASM in Figure 4. Supervised training is performed using clean sensor input only. The model hyperparameters are set by random grid search [31]. A summary of the training hyperparameters corresponding to the optimum DAASM is shown in Table 2.

Invariance Test for Robustness
A layer by layer invariance study is conducted to test the robustness of the fully trained DAASM against failed sensor states. The data corruption processes applied during pretraining are essentially meant to learn hidden layer mappings which are stable and invariant to faulty sensor conditions. The following invariance test, for successive hidden layers in the final DAASM stack, provides insight into the effectiveness of the data corruption processes exercised during the denoising based pretraining phase. Invariance of the hidden layer mappings $h_{\ell}$ is quantified through the mean square error (MSE) between the Euclidean ($L_{2}$) normalized hidden layer activations $\langle h_{\ell} \rangle / \|\langle h_{\ell} \rangle\|_{2}$ and $\langle \tilde{h}_{\ell} \rangle / \|\langle \tilde{h}_{\ell} \rangle\|_{2}$ for clean and faulty sensors, respectively. Invariance test samples are generated by corrupting randomly selected sensors in the input set with varying levels of offset failure [5%-50%]. The MSE for each offset level is normalized across the hidden layer dimension $D_{h}$ and the number of test samples $N_{s}$:

$$\mathrm{MSE}_{\ell} = \frac{1}{N_{s} D_{h}} \sum_{i=1}^{N_{s}} \left\| \frac{h_{\ell}^{(i)}}{\left\|h_{\ell}^{(i)}\right\|_{2}} - \frac{\tilde{h}_{\ell}^{(i)}}{\left\|\tilde{h}_{\ell}^{(i)}\right\|_{2}} \right\|^{2}. \qquad (13)$$

Finally, these MSE values are normalized by the maximal MSE value:

$$\overline{\mathrm{MSE}}_{\ell} = \frac{\mathrm{MSE}_{\ell}}{\max \mathrm{MSE}}. \qquad (14)$$

Normalized MSE curves for each successive hidden layer are plotted in Figure 7. The layerwise MSE plots clearly show that invariance to faulty sensor conditions increases towards the higher layers in the network hierarchy; in these plots, lower curves indicate a higher level of invariance. To further investigate the effect of increasing invariance on reconstructed sensor values, a sensor model corresponding to each hidden layer level $\ell$ is assembled from the encoder and decoder cascade. Robustness of these partial models is quantified through $(1 - S^{\mathrm{Auto}})$.
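The invariance measure of (13)-(14) can be computed directly from clean and faulted hidden activations; the function names below are ours:

```python
import numpy as np

def layer_invariance_mse(H_clean, H_faulty):
    """MSE between L2-normalized hidden activations for clean vs. faulted
    inputs, averaged over samples and hidden dimension as in (13).
    Rows are samples; lower values indicate a more invariant layer."""
    Hc = H_clean / np.linalg.norm(H_clean, axis=1, keepdims=True)
    Hf = H_faulty / np.linalg.norm(H_faulty, axis=1, keepdims=True)
    n_samples, d_hidden = Hc.shape
    return float(np.sum((Hc - Hf) ** 2) / (n_samples * d_hidden))

def normalize_curves(mse_values):
    """Rescale a set of MSE values by the maximal value, as in (14),
    so curves for different layers are directly comparable."""
    mse = np.asarray(mse_values, dtype=float)
    return mse / mse.max()
```

Running `layer_invariance_mse` per layer and per offset level, then `normalize_curves`, reproduces the kind of curves plotted in Figure 7.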

Autosensitivity values $S_{i}^{\mathrm{Auto}}$ (see Section 8.2) are calculated against varying offset failure levels. In Figure 8, the layerwise increase in robustness confirms that increased invariance helps improve the overall model's robustness.

DAASM versus K-AANN Performance Analysis
Here we will assess and compare the performance of the DAASM against the K-AANN model. The MSE values of all sensors are normalized to their respective spans and presented as percent span in Figure 9. Being an error measure, the lower MSE values of the DAASM signify its higher prediction accuracy.

Robustness.
Robustness is quantified through autosensitivity, as defined in [32,33]. It measures the model's ability to predict correct sensor values under missing or corrupted sensor states. The measure is averaged over an operating region defined by $N$ samples from the test data set as follows:

$$S_{i}^{\mathrm{Auto}} = \frac{1}{N} \sum_{k=1}^{N} \left| \frac{\hat{s}_{ik}^{\mathrm{drift}} - \hat{s}_{ik}}{s_{ik}^{\mathrm{drift}} - s_{ik}} \right|, \qquad (15)$$

where $i$ and $k$ are indexes corresponding to sensors and their respective test samples; $s_{ik}$ is the original sensor value without fault; $\hat{s}_{ik}$ is the model estimated sensor value against $s_{ik}$; $s_{ik}^{\mathrm{drift}}$ is the drifted/faulted sensor value; and $\hat{s}_{ik}^{\mathrm{drift}}$ is the model estimated sensor value against the drifted value $s_{ik}^{\mathrm{drift}}$. The autosensitivity metric lies in the $[0, 1]$ range. For an autosensitivity value of one, the model predictions follow the fault with zero residuals, so no fault can be detected. Smaller autosensitivity values are preferred, as they indicate decreased sensitivity towards small perturbations; large autosensitivity values may lead to missed alarms due to underestimation of the fault size caused by small residual values. Compared to the K-AANN model, a significant decrease in autosensitivity values is observed for all sensors in the DAASM. The plot in Figure 10 shows that the DAASM is more robust to failed sensor inputs.
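A sketch of the autosensitivity computation for a single sensor; the array names are illustrative:

```python
import numpy as np

def autosensitivity(s, s_hat, s_drift, s_hat_drift):
    """Autosensitivity of one sensor over N test samples: the fraction of
    an injected drift that leaks into the model's own estimate.
    0 => fully robust (estimate ignores the fault);
    1 => estimate follows the fault, making it undetectable."""
    return float(np.mean(np.abs(s_hat_drift - s_hat) /
                         np.abs(s_drift - s)))

# Example: a +5% offset injected on a span-normalized sensor reading
s = np.full(100, 0.50)
s_drift = s + 0.05
```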
To further investigate robustness against large offset failures, both models are evaluated against offset failures in the [5%-50%] range. For each sensor, samples from the test data are corrupted with a specific offset level, and the corresponding autosensitivities are averaged over the whole sensor set. Autosensitivity values less than 0.2 are considered robust; the maximum autosensitivity value of 0.187 is observed in the steam flow sensor. The plot in Figure 11 shows that the average autosensitivity for both models increases with increasing offset failure level; however, the autosensitivity curve for the DAASM stays well below the corresponding K-AANN curve.

The effect of a faulty sensor on the predictions of other sensors is termed in the literature the "spillover effect" and is quantified through the "cross-sensitivity" metric [32]. It quantifies the influence of a faulty sensor $j$ on the predictions of sensor $k$ as follows:

$$S_{jk}^{\mathrm{Cross}} = \frac{1}{N} \sum_{n=1}^{N} \left| \frac{\hat{s}_{kn}^{\mathrm{drift}(j)} - \hat{s}_{kn}}{s_{jn}^{\mathrm{drift}} - s_{jn}} \right|, \qquad (16)$$

where the $j$ and $k$ indexes refer to faulty and nonfaulty sensors, respectively, and $n$ is the index of the corresponding test samples. $S_{jk}^{\mathrm{Cross}}$ is the cross-sensitivity of sensor $k$ with respect to a drift in the $j$th sensor; $s_{kn}$ is the value of the $k$th sensor without any fault; $\hat{s}_{kn}$ is the model estimated value of the $k$th sensor against $s_{kn}$; $s_{jn}^{\mathrm{drift}}$ is the drifted/faulted value of the $j$th sensor; and $\hat{s}_{kn}^{\mathrm{drift}(j)}$ is the model estimated value of the $k$th sensor when the $j$th sensor is drifted. The highly distributed representation of the input in neural network based sensor models has a pronounced effect on cross-sensitivity performance. The cross-sensitivity metric lies in the $[0, 1]$ range. A high cross-sensitivity value may set off false alarms in other sensors, provided the residual values overshoot the fault detectability threshold in those sensors; so, a minimum cross-sensitivity value is desired for a robust model. The plot in Figure 12 shows that the cross-sensitivity for the DAASM is reduced by a large factor compared with the K-AANN model.
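The cross-sensitivity metric admits the same one-line estimate as autosensitivity, now relating the change in one sensor's estimate to the drift injected into another; argument names are ours:

```python
import numpy as np

def cross_sensitivity(sk_hat, sk_hat_drift_j, sj, sj_drift):
    """Cross-sensitivity of sensor k w.r.t. a drift in sensor j: the
    change in k's estimate per unit of actual drift injected into j,
    averaged over test samples. 0 => no spillover into sensor k."""
    return float(np.mean(np.abs(sk_hat_drift_j - sk_hat) /
                         np.abs(sj_drift - sj)))

# Example: sensor j drifts by +5% while sensor k's estimate is observed
sk_hat = np.full(100, 0.40)            # k's estimate, j healthy
sj = np.full(100, 0.60)
sj_drift = sj + 0.05
```

Averaging this quantity over all faulty/nonfaulty sensor pairs gives the aggregate spillover figures reported for each offset level.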
The spillover effect, for a particular offset failure level $m$ in the [5%-50%] range, is averaged over all sensors as follows:

$$S_{\mathrm{cross}(m)} = \frac{1}{n(n-1)} \sum_{j=1}^{n} \sum_{k \neq j} S_{jk}^{\mathrm{Cross}}. \qquad (18)$$

The cross-sensitivity values $S_{\mathrm{cross}(m)}$, against each percent offset failure level, are calculated using (18). Figure 13 shows the average cross-sensitivity plot for both models. Small cross-sensitivities are observed in the DAASM, which effectively avoided false alarms in other channels, without relaxing the SPRT faulted mean value, up to an offset failure of 35-40% in any channel. For offset noise larger than 35%, however, the SPRT mean needs to be relaxed to avoid false alarms and isolate the faulty sensor. In contrast, robustness of the K-AANN model deteriorates significantly, due to the spillover effect, beyond a 15% offset failure.
Similarly, gross failure scenarios corresponding to the two extremities of the sensor range can cause a severe spillover effect. To study robustness against gross type failure scenarios, a subset of the input sensors is simultaneously failed with gross high or low values, and the average cross-sensitivity of the remaining sensor set is calculated using relation (19). The plot in Figure 14 shows that the average cross-sensitivity of the K-AANN model increases drastically beyond a 10% gross failure. The DAASM, however, exhibits only nominal spillover, even in the case of multiple sensor failures: it effectively managed simultaneous gross high or low failures in 25% of the total sensor set, compared with 10% for the K-AANN.
Under normal mode, the residuals from the observed and model estimated sensor values behave as white Gaussian noise with mean $\mu_{0} = 0$. The residual variance $\sigma^{2}$ is estimated for each sensor under normal operating conditions and remains fixed. The false alarm ($\alpha$) and missed alarm ($\beta$) probabilities are set to 0.001 and 0.01, respectively. In order to determine the minimum fault detectability limit, a numerical procedure is adopted which searches for the minimum expected offset $\mu_{1}$ in the interval $\{\mu_{1}: [\sigma$-$3\sigma]\}$, provided the constraints on missed and false alarm rates hold; $\sigma$ is the standard deviation corresponding to the residual variance of the particular sensor. The plot in Figure 15 shows the detectability metric for each sensor: the DAASM can detect faults which are two times smaller in magnitude than those detectable by the K-AANN model. The improvement in the fault detectability metric for the DAASM can be attributed to the observed improvement in model robustness, as suggested by the following relation:

$$\frac{r_{i}}{\Delta s_{i}^{\mathrm{drift}}} \approx 1 - S_{i}^{\mathrm{Auto}}. \qquad (20)$$

The term $r_{i}/\Delta s_{i}^{\mathrm{drift}}$ measures the ratio of the observed residual to the actual sensor drift in terms of autosensitivity. For a highly robust model, this ratio approaches one, meaning the residual reflects the actual drift, which results in high fault detectability. Contrarily, a ratio close to zero means the prediction is following the input, which results in poor fault detectability.

SPRT Based Fault Detectability
Test. A sequential probability ratio [34,36] based fault hypothesis test is applied to the residual sequence $\{r_{k}\} = r_{1}(t_{1}), r_{2}(t_{2}), \ldots, r_{n}(t_{n})$ generated by the relation $r_{n}(t_{n}) = s_{\mathrm{Obs}}(t_{n}) - s_{\mathrm{Est}}(t_{n})$ at time $t_{n}$, where $s_{\mathrm{Obs}}(t_{n})$ and $s_{\mathrm{Est}}(t_{n})$ are the actual and model predicted sensor values, respectively. The SPRT procedure analyzes whether the residual sequence is more likely to be generated from a probability distribution belonging to the normal mode hypothesis $H_{0}$ or to the abnormal mode hypothesis $H_{1}$, using the likelihood ratio

$$\Lambda_{n} = \frac{P\left(\{r_{k}\} \mid H_{1}\right)}{P\left(\{r_{k}\} \mid H_{0}\right)}. \qquad (21)$$

For fault free sensor values, the normal mode hypothesis $H_{0}$ is approximated by a Gaussian distribution with mean $\mu_{0} = 0$ and variance $\sigma^{2}$. The abnormal mode hypothesis $H_{1}$ is approximated with mean $\mu_{1} > \mu_{0}$ and the same variance $\sigma^{2}$. The SPRT index for the positive mean test is finally obtained by taking the logarithm of the likelihood ratio in (21) as follows [35]:

$$\mathrm{SPRT}_{n} = \sum_{k=1}^{n} \frac{\mu_{1}}{\sigma^{2}} \left( r_{k} - \frac{\mu_{1}}{2} \right). \qquad (22)$$

The pressurizer pressure sensor, sampled every 10 seconds, is used as a test signal to validate fault detectability performance. Two drift faults, at rates of +0.01%/hour and -0.01%/hour, are introduced into the test signal for the DAASM and K-AANN model assessments, respectively. The first and second plots in Figure 16 show the drifted and estimated pressure signals from the DAASM and K-AANN models, respectively. The third plot shows the residual values generated by differencing the drifted and estimated signals from both models. The final plot shows the SPRT index values against the residuals from the K-AANN model and the DAASM. The hypotheses $H_{1}$ and $H_{0}$ correspond to positive and negative fault acceptance, respectively. From the SPRT index plot, successful early detection of the sensor drift at the 2200th sample, with a lag of 6.11 hours since drift inception, shows that the DAASM is more sensitive to small drifts. The SPRT index on the K-AANN based sensor estimates registered the same drift at the 3800th sample, with a lag of almost 10.55 hours. The result shows that the DAASM achieves earlier fault detection with low false and missed alarm rates.

Finally, both models are tested against five test data sets, each consisting of 3630 samples corresponding to different months of full power reactor operation. Both models successfully detected an offset failure of 0.12-0.3 BARG in all steam pressure channels and a drift type failure of up to 2.85% in steam generator level (Figure 22). The K-AANN model failed to register a very small drift of up to 0.1% in the steam flow (STM flow 1) channel. A small drift of up to 0.1 BARG is detected in test set 5 of the pressurizer pressure channel. Moreover, in the case of drift type sensor failures, the fault detection lag for the DAASM was on average about half that of the K-AANN model. The plots in Figures 17-21 show the estimated sensor values from both models on the five test data sets for a few selected channels.
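The SPRT recursion behind the likelihood ratio (21) can be sketched as follows. Wald's standard threshold approximations are used for the accept/reject boundaries, and the reset-after-decision behavior is a common convention assumed here rather than taken from the paper:

```python
import numpy as np

def sprt_positive_mean(residuals, mu1, sigma, alpha=0.001, beta=0.01):
    """Wald SPRT for a positive shift in residual mean.
    H0: r ~ N(0, sigma^2); H1: r ~ N(mu1, sigma^2). The log-likelihood
    ratio increments by (mu1/sigma^2) * (r - mu1/2) per sample and is
    reset after each decision. Thresholds follow Wald's approximations:
    upper = ln((1-beta)/alpha), lower = ln(beta/(1-alpha))."""
    upper = np.log((1 - beta) / alpha)    # crossing => accept H1 (fault)
    lower = np.log(beta / (1 - alpha))    # crossing => accept H0 (healthy)
    index, decisions = 0.0, []
    for r in residuals:
        index += (mu1 / sigma**2) * (r - mu1 / 2.0)
        if index >= upper:
            decisions.append("H1"); index = 0.0
        elif index <= lower:
            decisions.append("H0"); index = 0.0
        else:
            decisions.append(None)        # keep sampling
    return decisions

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, 500)
drifted = healthy + np.linspace(0.0, 3.0, 500)   # slow drift on the residuals
dec = sprt_positive_mean(drifted, mu1=1.0, sigma=1.0)
```

On the drifted sequence the index eventually crosses the upper boundary and declares a fault, while healthy residuals keep resolving to H0.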

Conclusion
This paper presented a neural network based denoised autoassociative sensor model (DAASM) for empirical sensor modeling. The proposed sensor model is trained to serve a monitoring system for sensor fault detection in nuclear power plants. Multilayer AANN based sensor models may result in suboptimal solutions due to poor regularization by traditional backpropagation based joint multilayer learning procedures. So, a complementary deep learning approach, based on greedy layerwise unsupervised pretraining, is employed for effective regularization of the proposed multilayer DAASM. An autoencoder architecture is used for denoising based unsupervised pretraining and regularization of the individual layers in the network hierarchy. To address robustness against perturbations in the input sensors, the data corruption processes exercised during unsupervised pretraining are derived from domain specific sensor failure scenarios.

Figure 1: Integrated sensor estimation and fault detection architecture.

Figure 4: DAASM architecture and greedy learning procedure. The greedy layerwise pretraining procedure is depicted by the counterclockwise flow in the figure.

Figure 5: Spatial distribution of selected sensor set.

Figure 10: Autosensitivity values of individual sensors in both models.

Figure 13: Comparison of spillover effects against increasing offset failure.
4.1. Corruption Process $\eta(\cdot)$ for Invariance. For the case of calibration monitoring, an ideal DAASM should learn encoder and decoder functions which are invariant to failed sensor states. So, during the DAE based pretraining phase, engineered transformations derived from prior knowledge about the involved failure types are imposed on the clean input. Different data corruption processes $\eta(\cdot)$ are devised for the learning of each successive hidden layer. The denoising based learning objective drives the hidden layer mappings to become invariant to such engineered transformations of the input data. It is important to understand that the denoising based learning approach does not correct the faulty signal explicitly; rather, it seeks to extract statistical structure among the input signals which is stable and invariant under faults, and hence implicitly estimates the correct value for the faulty signal. Two failure types are identified and defined as follows:

Table 1: List of NPP sensors.
The fault detectability metric measures the smallest fault that can be detected by the integrated sensor estimation and fault detection module shown in Figure 1 [32]. The detectability metric is measured as a percentage of sensor span, $D = M/\mathrm{Span}$, where $M$ corresponds to the minimum detectable fault. The minimum fault detectability limit for each sensor is quantified through the statistics based sequential probability ratio test (SPRT) by Wald [34]. The SPRT test is carried out to detect whether the residual is generated from the normal distribution $\mathcal{N}(\mu_{1}, \sigma^{2})$ or $\mathcal{N}(\mu_{0}, \sigma^{2})$, as defined for faulty and fault free sensor operations, respectively [35].
$\mu_{0}$: normal mode residual mean. $\sigma^{2}$: normal mode residual variance. $\mu_{1}$: expected offset in residual mean in abnormal mode.