Stochastic DES Fault Diagnosis with Coloured Interpreted Petri Nets

This proposal presents an online method to detect and isolate faults in stochastic discrete event systems (DES) without a prior model. After an identification stage, a coloured timed interpreted Petri Net generates the normal behavior language. Fault detection is then carried out by comparing the observed event sequences with the expected event sequences. Once a new fault is detected, a learning algorithm changes the structure of the diagnoser, so that it is able to learn new fault languages. Moreover, the diagnoser includes timed events to represent and diagnose stochastic languages. Finally, this paper proposes a detectability condition for stochastic DES and proves the corresponding necessary and sufficient conditions.


Introduction
Fault diagnosis plays a major role in industrial systems, since detecting faults as early as possible avoids serious damage to the system or injury to an operator. Fault diagnosis of Discrete Event Systems (DES) has been addressed from different approaches. A fault is a deviation from the normal or required behavior. Fault diagnosis is the process of detecting and identifying such deviations by using the information available on system variables [1].
According to [2], fault diagnosis aims to achieve three complementary tasks: fault detection, fault isolation, and fault identification. Fault detection decides whether the system works in normal conditions or whether a fault has occurred. If a fault has occurred, fault isolation aims to locate the component(s) causing the fault. Fault identification is concerned with identifying the specific nature of the fault (its size, criticality, importance, etc.). Many researchers have addressed this problem by developing new models, new properties, new algorithms, and efficient solutions for fault diagnosis of DES. Model based diagnosis techniques can be divided into two groups. The first group uses models which include fault-free and faulty behaviors. The second group only uses fault-free models.
The work of [3,4] has provided a formal foundation of fault diagnosis and diagnosability analysis of DES that has been the basis for many diagnosis approaches. They use an automaton which generates all the possible event sequences in nominal and faulty operation.
Petri Nets (PNs) have been recognized as a suitable model to describe DES, particularly when a system is asynchronous [5,6]. PNs have been used for fault diagnosis since [7-9], which presented diagnosis proposals based on estimating faulty states. In [10] a net unfolding approach to online asynchronous diagnosis is presented. This proposal avoids the state explosion problem that typically results from having concurrent components interacting asynchronously in a distributed system, but the computing cost of performing the online diagnosis increases compared with offline diagnosis. In [11], the authors extend the proposal of [3] to online fault diagnosis of systems modeled by PNs. Some years later, the same authors present in [12] two new algorithms to deal with the case of multiple modules and real-time communication requirements. In [13] the authors not only model faults by unobservable transitions but also include other transitions representing legal unobservable behaviors. They prove that all possible firing sequences corresponding to a given observation can be characterized based on the notions of basis markings and justifications. The authors use a basis reachability tree to compute the set of basis markings; [6] changes the concept of basis marking and enumerates only a subset of the reachability space. This approach includes a different characterization in terms of new notions such as justifications and minimal explanations. The work of [14] considers the system modeled as an interpreted PN (IPN) with partially observable states and events; the model includes the possible faults that may happen. Reference [15] proposes an online fault detection technique that avoids the redesign and redefinition of the diagnoser when the structure of the system changes. The diagnoser waits for an observable event, and an algorithm decides whether the system behavior is normal or may exhibit some possible faults. The solution of an integer linear programming (ILP) problem provides a sequence of unobservable transitions containing the faults that may have occurred. The system is modeled by an IPN in which fault events are modeled as unobservable transitions. It associates a different label to each transition, so it models the regular behavior. In [16] the authors start from the results of [15]. They extend the work by considering a new source of nondeterminism (different observable transitions sharing the same label) and by considering distributed systems. Finally, [17] builds an online diagnoser based on a PN approach, using the ILP definition and resolution.
The advantage of this class of methods lies in the possibility of giving guarantees about the diagnosability of faults; moreover, if certain conditions hold, modeled faults can be precisely localized. An inherent disadvantage is that only faults explicitly considered in the system model can be detected and localized.
Diagnosis methods without a fault model avoid this disadvantage; moreover, they build straightforward models, since no special knowledge of the system fault behavior is necessary. Nevertheless, the main drawback of these approaches is how to locate the fault, since the models contain less knowledge. Moreover, diagnosability of a given set of faults usually cannot be guaranteed. These methods are based on comparing the system outputs with the nominal outputs of the model. In [18,19] the proposed method compares the observed and the expected behavior, so that a fault can be detected and a set of fault candidates determined. Inspired by residuals known from diagnosis in continuous systems, different set operations are introduced to generate the fault candidate set. After fault detection and a first fault localization, a procedure is given to locate the fault more precisely by analyzing the further observed system behavior.
Coloured Petri Nets (CPNs) have also been applied to fault diagnosis. Some approaches deal with combinatorial explosion, so they can be used to diagnose large systems. In [20], the authors present a method for modeling flexible manufacturing systems including fault models (based on fault trees). In [21], the authors present a method for modeling and diagnosing an orchestrated complex Web service. This approach is restricted to Web service models. In [22] the author presents a method for including the fault diagnosis within an embedded controller. The integration of diagnosis and controller in a reduced CPN model is suitable, since it allows merging information about device states in a single token. Nevertheless, this approach has the same weak point as other model based approaches: it needs a fault model.
Reference [23] presented a new diagnosis method based on CPN called the Latent Nestling Method. The initial model of the normal behavior of the system is built with modeling techniques based on generalized PNs. However, for complex systems the synthesis capabilities of CPNs can be used in these modeling steps. The set of faults to be diagnosed is defined and assigned to a subset of coloured tokens. A faulty event is defined by establishing dynamic conditions in every marking, and subsequently in every state reached by the system, together with the set of unexpected sensor readings. Next, the coloured tokens of faulty events are allocated in appropriate places called places of latent nestling faults. These tokens can be fired from such a place by the activation of an event sequence associated with an abnormal sensor reading.
Regarding diagnosability, [3] defines diagnosability in the framework of formal languages and presents necessary and sufficient conditions for diagnosability of DES. The authors of [24] focused on diagnosability of IPNs. They defined and characterized the property of input-output diagnosability in IPN models, thereby avoiding the reachability analysis. In their next work [25], they presented a polynomial algorithm to decide whether an IPN is diagnosable. Reference [26] provides a necessary and sufficient condition for diagnosability of bounded PNs, namely, PNs whose set of reachable markings is finite. The effectiveness of the proposed procedure was illustrated in [27]. They showed that, under certain conditions, the number of basis markings is always smaller than the number of reachable markings, which increases exponentially with the size of the net. A basis marking is a marking reached from the initial marking $M_0$ by firing an observable sequence whose label matches the observation, together with all the unobservable transitions whose firing is strictly necessary to enable it.
Approach of the Work.
In [2], the authors classify diagnosis methods with respect to a number of criteria such as fault compilation (offline or online), modeling tools (automaton, PN, and state machines), fault representation (fault model: event-based or state-based, fault-free model), and decision structure or architecture (centralized, decentralized, and distributed). According to this classification, the diagnoser presented in this proposal can be classified as online fault diagnosis, based on PN without a previous model and under a centralized structure.
In general terms the proposal is based on language theory and on stochastic timed interpreted Coloured Petri Nets (st-ICPN) as structures to generate DES languages. The diagnosis process starts by identifying the fault-free model from the observed language. As a result, a st-ICPN language generator is built. The generator of the fault-free language, together with the concepts of Coloured Petri Nets, is the basis for building the diagnoser. A learning algorithm modifies the net structure each time a new fault is detected, so the diagnoser is able to learn fault languages. The modifications are as follows: addition of a token in the fault transition, modification of the arcs linking that transition, and addition of a specific fault token to the initial marking.
The main advantages of this proposal over other diagnosis methodologies are the generation of deterministic models and the absence of previous fault models. The learning algorithm allows the diagnosis of faults not included in a predefined fault set.
This paper is organized as follows: Section 2 describes the background on PN; Section 3 describes the fault behavior; Section 4 presents the diagnosis method; Section 5 shows an application case; and finally, the concluding remarks and discussion are shown in Section 6.

Background
This section introduces the formalism and definitions used in the paper.

System under Study.
In real systems, there are various stochastic disturbances such as sensor noise, faults, or random variation of parameters. Thus, the system representation should be based on stochastic models [28]. The system to be diagnosed is therefore a stochastic DES whose dynamics can be described by the interrelation of I/O signals and whose behavior can be described with formal languages based on event sequences. The system is split into subsystems; a subsystem is a part of the system with a particular behavior. This structure can be seen in Figure 1.
In a closed loop system, there exist two kinds of inputs. Some inputs are external observable SCADA commands or operator requirements that modify the operation mode of the controller. The other kind of inputs are external events affecting the plant; these inputs can be either observable or unobservable, and they include disturbances, interaction with other systems, and faults. Moreover, control commands are plant inputs and can be considered as internal signals of the closed loop model. When the system can be split into subsystems, control commands can be considered as local or global. A control command is considered global if it is applied to more than one subsystem, and it is considered local otherwise.
System outputs are sensor readings; each sensor reading belongs to one subsystem, so the set of sensor readings is the union of the sensor reading sets of the subsystems. An input symbol for a subsystem is composed of its global and local control commands; it is represented by a symbol whose index is the binary representation of the command vector, and it stands for the input at a given time. An output symbol for a subsystem is the vector of its sensor readings; it is likewise represented by a symbol indexed by the binary representation of that vector, and it stands for the output at a given time. An operation mode is a combination of external input signals; it is represented by a symbol indexed by the binary representation of the combination, and it stands for the external event at a given time. For example, given a set of sensor readings for subsystem 1:

Events and Languages.
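Let $\Omega_1$, $\Omega$ be two event sets, such that $\Omega_1 \subset \Omega$. A language, $L$, defined over $\Omega$ is a set of finite-length strings formed from events in $\Omega$; that is, $L \subset \Omega^*$. The projection operation, $P_{\Omega_1} : \Omega^* \to \Omega_1^*$, erases from a string the events that do not belong to $\Omega_1$.
Definition 1 (compound event). Given two event sets $\Omega_a$, $\Omega_b$ and given two events $e_a$ and $e_b$, such that $e_a \in \Omega_a$ and $e_b \in \Omega_b$, a compound event $e$ is the concatenation of $e_a$ and $e_b$; $e = e_a e_b$. $\Omega = \Omega_a \Omega_b$ is a compound event set ($\Omega$ over $\Omega_a \Omega_b$) and a language defined over $\Omega$ will be $L \subset (\Omega_a \Omega_b)^*$.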
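As an illustration of these two notions, the following minimal sketch assumes the usual natural projection (events outside the sub-alphabet are erased) and stores a compound event as a pair of component events. The alphabets, the event names, and the helper names `project` and `compound_event_set` are hypothetical and chosen only for this example.

```python
from itertools import product


def project(string, sub_alphabet):
    """Natural projection: erase every event that is not in sub_alphabet."""
    return tuple(event for event in string if event in sub_alphabet)


def compound_event_set(first_set, second_set):
    """All compound events, each stored as a (first, second) pair."""
    return set(product(first_set, second_set))


# Hypothetical alphabets, chosen only for illustration.
omega = {"a", "b", "c"}
omega_1 = {"a", "c"}
string = ("a", "b", "c", "b", "a")
projected = project(string, omega_1)

inputs = {"u0", "u1"}    # plays the role of the first event set
outputs = {"y0", "y1"}   # plays the role of the second event set
compound = compound_event_set(inputs, outputs)

# A string over the compound event set: every letter is an (input, output) pair.
word = (("u0", "y0"), ("u1", "y0"), ("u1", "y1"))

print(projected)
print(all(event in compound for event in word))
```

In this example the projection keeps only the occurrences of a and c, in their original order, and every letter of `word` belongs to the compound event set.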
For example, given two event sets
Definition 2 (projection operation over compound event sets). Given a compound event set $\Omega$ over $\Omega_a \Omega_b$, the projection operation over compound events, used on the previous example, will give
Definition 3 (timed event and stochastic timed event). A timed event is a composition of an event and the elapsed time between two consecutive events. Then $e_k = e \cdot \tau_{ev}$ at time $t_k$, where $\tau_{ev} = t_k - t_{k-1}$. A language, $L$, defined over a timed event set $\Omega$ is a set of finite-length strings formed from timed events in $\Omega$; that is, $L \subset \Omega^*$, where each $\sigma \in L$ is a timed event sequence, at times $t_0 \le \cdots \le t_k$. In real systems, it is nearly impossible to repeat the same event sequence with the same times between events. In that case, if times fit a probability density function, $\tau_{ev} \sim f(\tau_{ev})$, timed events are called stochastic timed events.
Definition 4 (projection operation of timed event sequences). Given two timed event sets $\Omega_1$, $\Omega$, such that $\Omega_1 \subseteq \Omega$, let $\sigma = e_0 \cdots e_k$ be a timed event sequence with $e_k = e \cdot \tau_{ev}$; the projection operation of timed event sequences, $P_{\Omega_1}$, uses an accumulated time variable $\tau_{proy}$: if the event belongs to $\Omega_1$, then $P_{\Omega_1}(e_k) = e \cdot (\tau_{proy} + \tau_{ev})$ and the variable is set back to zero, $\tau_{proy} = 0$; else $P_{\Omega_1}(e_k) = \varepsilon$ and $\tau_{proy} = \tau_{proy} + \tau_{ev}$.
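The timed projection can be sketched as follows. This is a minimal sketch under one assumption that is not spelled out in the surviving text: when an event is kept, the time accumulated over the erased events is added to its own elapsed time before the accumulator is reset. The function name `project_timed` and the sample sequence are hypothetical.

```python
def project_timed(sequence, omega_1):
    """Project a timed event sequence onto the event set omega_1.

    sequence is a list of (event, tau_ev) pairs, where tau_ev is the time
    elapsed since the previous event. Erased events contribute their elapsed
    time to tau_proy, which is carried over to the next kept event.
    """
    projected = []
    tau_proy = 0.0
    for event, tau_ev in sequence:
        if event in omega_1:
            # Keep the event with the accumulated time and reset the accumulator.
            projected.append((event, tau_proy + tau_ev))
            tau_proy = 0.0
        else:
            # Erase the event but remember how much time it consumed.
            tau_proy += tau_ev
    return projected


# Hypothetical timed sequence: (event, time since the previous event).
sigma = [("a", 2.0), ("b", 1.5), ("c", 0.5), ("b", 3.0), ("a", 1.0)]
result = project_timed(sigma, {"a", "c"})
print(result)
```

With this convention the total elapsed time of the projected sequence equals that of the original sequence whenever the last event is kept.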
For example (see Figure 2), given a timed set

Petri Nets.
Petri Nets (PNs) are widely used for modeling DES [29]. A PN, $N$, is a bipartite digraph represented by the five-tuple $N = (P, T, Pre, Post, M_0)$, where $P$ is a set of places with cardinality $|P|$ and $T$ is a set of transitions with cardinality $|T|$; $Pre : P \times T \to \mathbb{N}$ and $Post : P \times T \to \mathbb{N}$ are the Pre and Post incidence matrices ($C = Post - Pre$). The marking function $M : P \to \mathbb{N}$ represents the number of tokens residing inside each place; $M_0$ is the initial marking [30,31]. For the Pre and Post sets, the dot notation is used: ${}^{\bullet}t = \{p \in P : Pre(p, t) > 0\}$ [31].
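The following minimal sketch illustrates the Pre and Post matrices, the incidence matrix, the preset of a transition, and the standard firing rule on a small net. The three-place net and the function names are hypothetical; matrices are stored as nested lists indexed as `pre[place][transition]`.

```python
def enabled(marking, pre, t):
    """A transition is enabled when every place holds at least Pre(p, t) tokens."""
    return all(marking[p] >= pre[p][t] for p in range(len(marking)))


def fire(marking, pre, post, t):
    """Fire transition t: M' = M - Pre(., t) + Post(., t)."""
    if not enabled(marking, pre, t):
        raise ValueError(f"transition {t} is not enabled at marking {marking}")
    return [marking[p] - pre[p][t] + post[p][t] for p in range(len(marking))]


def preset(pre, t):
    """Dot notation: the input places of t, i.e. those with Pre(p, t) > 0."""
    return {p for p in range(len(pre)) if pre[p][t] > 0}


# Hypothetical net: p0 -> t0 -> p1 -> t1 -> p2.
pre = [[1, 0], [0, 1], [0, 0]]
post = [[0, 0], [1, 0], [0, 1]]
m0 = [1, 0, 0]

# Incidence matrix C = Post - Pre.
incidence = [[post[p][t] - pre[p][t] for t in range(2)] for p in range(3)]

m1 = fire(m0, pre, post, 0)
m2 = fire(m1, pre, post, 1)
print(incidence, m1, m2, preset(pre, 1))
```

Firing `t0` and then `t1` moves the single token from `p0` to `p2`; attempting to fire `t1` at the initial marking raises an error because its input place is empty.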
IPN is an extension of PN allowing the association of input and output signals to models [32].

Definition 5 (interpreted Petri Net). An IPN is a tuple composed of a PN, an observable input set, an output set, a transition labeling function $\lambda$, and an output function $\varphi$. The observable input set contains one input symbol for each binary combination of the observable inputs, that is, $2^{n}$ symbols indexed from $0$ to $2^{n} - 1$ for $n$ observable inputs; the output set is built in the same way from the outputs. The function $\lambda$ assigns an input symbol to each transition, and the function $\varphi$ assigns an output symbol to each reachable marking. The differential of output symbols is introduced in [33] to avoid IPN nondeterminism.
Definition 6 (differential of output symbol). Given two output symbols at times $t_k$ and $t_{k-1}$, respectively, the differential of the output symbol is defined as
A st-IPN is defined as follows.
If a CPN has output and transition labeling functions, it can be considered as an ICPN. Therefore, a PN including the characteristics of CPN and st-IPN can be defined as follows.
Definition 11 (firing language of a st-ICPN). Let $\sigma = t_1 \cdots t_k$ be a firing sequence for a colour class $c$ of a st-ICPN $Q$, such that $M_0 [\sigma\rangle M_k$. The set of all firing sequences for the colour class $c$ is called the firing language $L^{c}_{f}(Q)$ for $c$. The transition and output labeling sequences generated by $\sigma$ allow the definition of the generated languages as follows.
Definition 12 (input and output languages of a st-ICPN). Let $\sigma = t_1 \cdots t_k$ be a firing sequence of a st-ICPN $Q$ such that $\sigma \in L^{c}_{f}(Q)$, with transition labeling function $\lambda$ and output function $\varphi$; the input language for $c$ is defined as the sequences of labels of the $\sigma \in L^{c}_{f}(Q)$; that is, $L^{c}_{in}(Q) = \{w \mid w = \lambda(t_1) \cdots \lambda(t_k)\}$, and the output language for $c$ is defined as the sequences of outputs of the markings reached by the firing of $\sigma$; that is, $L^{c}_{out}(Q) = \{\varphi(M_0) \cdots \varphi(M_k)\}$. The st-ICPN language is $L(Q) = \{L^{c}(Q)\}$.
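To make the input and output languages concrete, the sketch below computes the input word (the labels of the fired transitions) and the output word (the output symbols of the visited markings) generated by one firing sequence. It ignores colour classes and timing, and the two-place net, the labels, and the output function are hypothetical.

```python
def generated_words(m0, pre, post, labels, output, firing_sequence):
    """Return the input word and the output word produced by a firing sequence.

    pre[p][t] and post[p][t] are arc weights, labels[t] is the input symbol
    attached to transition t, and output(marking) is the output symbol of a
    marking.
    """
    marking = list(m0)
    input_word = []
    output_word = [output(marking)]
    for t in firing_sequence:
        if any(marking[p] < pre[p][t] for p in range(len(marking))):
            raise ValueError(f"transition {t} is not enabled at {marking}")
        marking = [marking[p] - pre[p][t] + post[p][t] for p in range(len(marking))]
        input_word.append(labels[t])
        output_word.append(output(marking))
    return input_word, output_word


def output(marking):
    """Hypothetical output function: one binary sensor per place."""
    return tuple(int(tokens > 0) for tokens in marking)


# Hypothetical cyclic net: p0 -> t0 -> p1 -> t1 -> p0.
pre = [[1, 0], [0, 1]]
post = [[0, 1], [1, 0]]
labels = ["u1", "u0"]

input_word, output_word = generated_words([1, 0], pre, post, labels, output, [0, 1, 0])
print(input_word)
print(output_word)
```

The output word has one more symbol than the input word because it also records the output of the initial marking.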

Fault Behavior
The system generates a language that can be split into sublanguages, taking into account whether a fault has occurred or not. Let $L$ denote the language generated by a subsystem.
The set of timed compound events is partitioned into observable and unobservable events, $\Omega = \Omega^{o} \cup \Omega^{uo}$, where $\Omega^{uo}$ includes two subsets, the fault and the regular unobservable event subsets, $\Omega^{uo} = \Omega^{f} \cup \Omega^{reg}$ (adapted from [36]). An event of $\Omega$ is a triple formed by an input symbol, an output symbol, and an operation mode, composed with the elapsed time $\tau_{ev}$ (see (1)). The set of normal timed compound events $\Omega^{N}$ is a subset of $\Omega$ such that $L^{N} \subset (\Omega^{N})^*$. A timed fault event can be defined as follows.
Definition 13 (timed fault event). Given a timed event sequence and an event of the form
In cases (i), (ii), and (vi) the fault event is observable and in case (iii) the fault event is unobservable. Fault language can be defined as follows.
Definition 14 (fault language, $L^{F}$). Given a timed event sequence, the fault language is the set of all timed event sequences with at least one timed fault event in the postlanguage of a normal sequence.

Diagnosis Method
The diagnoser proposed in this paper works without any previous knowledge of the system language. The diagnoser construction starts with the identification of the normal behavior, which results in a set of st-IPNs that generate the observed normal language. Let $L^{N}$ denote the language generated by the identified st-IPN of a subsystem. The diagnosis task is carried out by comparing the current event trace $\sigma$ with $L^{N}$. If $\sigma \notin L^{N}$, a timed fault event has been detected. The algorithm creates a language model recognizer for this new situation, and $\sigma$ is considered as part of a fault language, $L^{F}$.
Once a fault has been detected, a fault filtering algorithm allows the full diagnosis. This algorithm takes into account flow sharing (data, materials, or energy) between subsystems. The consequence of flow sharing is that a subsystem that operates without fault could reach an erroneous state, not described in its normal language. This problem happens when the subsystem does not receive the expected service (the flow) from another subsystem linked to it. Flow stopping could be due to failures in another subsystem up or down the line [37]. In order to include this fact in the diagnoser, the notion of shared flow sensor (SFS) is introduced.
When a fault has been detected and it has not been eliminated by the filtering algorithm, the structure of the proposed diagnoser allows isolating and identifying the fault; at this time a new fault trace of the fault language is learned by the diagnoser. So, the diagnosis skills of the diagnoser grow over time.
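The detection test can be sketched as a membership check of the current trace in the normal language. This minimal sketch makes two simplifying assumptions that are not taken from the paper: the normal language is stored as the prefix closure of a finite set of observed traces, and event times are ignored. The event names and function names are hypothetical.

```python
def prefix_closure(traces):
    """All prefixes (including the empty one) of the observed normal traces."""
    return {tuple(trace[:i]) for trace in traces for i in range(len(trace) + 1)}


def first_fault_index(trace, normal_prefixes):
    """Index of the first event that takes the trace out of the normal language.

    Returns None when the whole trace is a prefix of some normal trace.
    """
    for i in range(1, len(trace) + 1):
        if tuple(trace[:i]) not in normal_prefixes:
            return i - 1
    return None


# Hypothetical normal traces learned during the identification stage.
normal_traces = [("start", "heat", "stop"), ("start", "stop")]
normal = prefix_closure(normal_traces)

ok = first_fault_index(("start", "heat"), normal)
fault = first_fault_index(("start", "heat", "heat"), normal)
print(ok, fault)
```

The first trace stays inside the normal language, while the second one leaves it at its third event, which would be reported as a detected fault event.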

Architecture for the Diagnostic Method.
As mentioned in the previous section, the diagnoser is based on a set of identified st-IPNs. The diagnoser also includes colour in order to compare languages and detect faults. So the diagnoser is a stochastic timed interpreted Coloured Petri Net for diagnosis (st-DICPN), which is shown in Figure 3. The set of places is partitioned into $P = P^{LN} \cup P^{VF} \cup P^{IF}$, where $P^{LN}$ represents the set of latent nestling places, that is, places with nominal behavior in which a fault can happen; $P^{VF}$ is the set of places that verify the detected fault kind; $P^{IF}$ is a place that counts the identified faults.
Variable: df; type: integer; description: detected faults.
The set of transitions is partitioned into $T = T^{N} \cup T^{F}$, where $T^{N}$ represents the set of normal transitions, which fire following the normal language, and $T^{F}$ represents the set of fault transitions, whose size can increase each time a fault event is detected; they fire when a fault $f_i$ is detected.
The set of colour classes is {Behavior, VF, IF, Mode}, where Behavior = $\{\langle N \rangle, \langle (j, p)_{f_i} \rangle\}$: $\langle N \rangle$ is the normal token, $\langle (j, p)_{f_i} \rangle$ is the generic fault token, $j$ stands for the subsystem, $p$ stands for the place, and the subscript $f_i$ is a fault identification index; VF = $\{\langle (j, p)_{f_i} \rangle\}$; IF = $\{\langle \text{integer} \rangle\}$ is a variable token that depends on the variable df; and the Mode colour class includes the colours assigned to transitions, where Mode = $\{N, f_i\}$: $N$ is "normal mode" and $f_i$ is "detected fault mode."
In Figure 4, the part of the net in green represents the normal behavior, which is identified online from the observed legal sequences. The part of the net in black represents the fault behavior identified by applying the diagnostic algorithm. This net has a variable structure because the diagnosis process learns the fault languages.
In Figure 4, when a fault is detected on place $p_{j,0}$, $j = 1$, the transition $t_{1,3}$ fires in $f_1$ mode; then a colour token $\langle (j, 0)_1 \rangle$ is reached in $p_{j,2}$ (VF place) and an integer $\langle 1 \rangle$ is reached in $p_{j,3}$ (IF place). At this moment, the fut function is executed, the structure of arcs and transitions grows, and the colour of transition $t_{j,3}$ now includes the fault mode $f_1$.
The incidence matrix entries are represented by vectors [35]. If a fault $f_1$ is detected in place $p_{j,0}$, then the Pre and Post matrices will be updated as follows.

This structure of incidence matrices allows the net evolution under normal conditions or nonpredefined event traces.

Online Diagnosis Process.
This section proposes a procedure that specifies the online operation of the diagnoser. This process has five steps and is shown in Figure 5.
The first step is the configuration of the system. The system has to be split into subsystems, the I/O signals must be defined, the starting event of each subsystem is given by the starting values of its control commands and sensor readings at time $t_0$, and the set of operation modes has to be stated.
The second step is the observation and learning of the normal behavior. The identification algorithm [33] builds a set of st-IPNs generating the observed language.
The third step is the computation of the initial diagnoser, which transforms the st-IPNs into st-DICPNs. The proposed algorithm modifies the net structure as follows.
(i) A normal token $\langle N \rangle$ will be added to each initial place.
(ii) The places that verify the detected fault kind, the place that counts the identified faults, and the fault transitions will be added, as well as the arcs required to complete the diagnoser architecture (as the one shown in Figure 4).
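The third step can be sketched as a transformation of a plain data structure. This is only an illustrative sketch: the dictionary layout, the place names `"VF"` and `"IF"`, and the choice of starting with an empty list of fault transitions that grows as faults are learned are assumptions, not the paper's algorithm.

```python
def build_initial_diagnoser(st_ipn):
    """Turn an identified net description into an initial diagnoser description.

    st_ipn is a dict with the keys 'places', 'transitions', and
    'initial_places' (a hypothetical layout used only in this sketch).
    """
    marking = {place: [] for place in st_ipn["places"]}
    # (i) Add a normal token to each initial place.
    for place in st_ipn["initial_places"]:
        marking[place].append("N")
    # (ii) Add the verification place and the fault-counting place.
    marking["VF"] = []
    marking["IF"] = [0]
    return {
        "latent_nestling_places": list(st_ipn["places"]),
        "verification_places": ["VF"],
        "fault_counter_place": "IF",
        "normal_transitions": list(st_ipn["transitions"]),
        "fault_transitions": [],  # assumed to grow as faults are learned
        "marking": marking,
    }


# Hypothetical identified net for one subsystem.
st_ipn = {
    "places": ["p0", "p1"],
    "transitions": ["t0", "t1"],
    "initial_places": ["p0"],
}
diagnoser = build_initial_diagnoser(st_ipn)
print(diagnoser["marking"])
```

The identified places become latent nestling places, so a fault can later be attached to any of them without rebuilding the normal part of the net.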
The fourth step is the online fault detection and fault isolation. More precisely the process consists of the following.
The fifth step eliminates the fault if it has been generated by a false alarm caused by coupling. Coupling and fault propagation [38] are generated by interactions among subsystems. Therefore, a fault filtering algorithm is proposed, which is described below.
The algorithm analyzes the set of shared flow sensor readings, which is the union of the shared flow sensor readings of all subsystems. Given an identified fault event at the current time and the output symbol associated with it, that is, the vector of sensor readings of the subsystem, the algorithm compares the values of the shared flow sensors to decide whether the fault is due to propagation (Algorithm 2).
Once Algorithm 2 has filtered a fault, the fut function updates the st-DICPN architecture as follows.
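One possible filtering rule is sketched below. The decision rule is an assumption made for illustration and is not Algorithm 2 itself: a detected deviation is attributed to propagation when every deviating sensor reading is a shared flow sensor. The function name, the sensor indices, and the reading vectors are hypothetical.

```python
def is_propagated_fault(expected, observed, shared_flow_sensors):
    """Decide whether a deviation can be explained by a missing shared flow.

    expected and observed are equal-length vectors of binary sensor readings;
    shared_flow_sensors is the set of indices of the shared flow sensors.
    Assumed rule: the fault is treated as propagated when all deviating
    readings come from shared flow sensors.
    """
    deviating = {
        index
        for index, (exp, obs) in enumerate(zip(expected, observed))
        if exp != obs
    }
    return bool(deviating) and deviating.issubset(shared_flow_sensors)


# Hypothetical subsystem with three sensors; sensor 2 is a shared flow sensor.
expected = (1, 0, 1)
shared = {2}

only_flow_lost = is_propagated_fault(expected, (1, 0, 0), shared)
local_fault = is_propagated_fault(expected, (1, 1, 0), shared)
print(only_flow_lost, local_fault)
```

Under this assumed rule, the first observation would be filtered as a false alarm caused by another subsystem, while the second would be kept as a fault of the subsystem itself.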
(i) Detected fault colour tokens are added in the verification place.
(ii) The integer colour token is updated.
(iii) A colour token for the detected fault mode is added in the fired fault transition, in the same way as the fault colour token is added in the involved arcs.
(iv) The fault has been isolated, and the fault event sequence is then included in the set of identified fault traces, which is a subset of the fault language. Therefore, the fault trace is learned by the st-DICPN and the fault is identified.
This trace contains information about the faulty subsystem as well as the unexpected behavior, so it is possible to distinguish the subsystem that caused the fault as well as the faulty signals.
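The update steps (i) to (iv) can be sketched on a simple dictionary. The data layout, the key names, and the naming of fault modes as `f1`, `f2`, and so on are assumptions made for this sketch; in particular, returning the stored index for an already learned trace is one way to read "the fault is identified."

```python
def learn_fault(diagnoser, subsystem, place, fault_trace):
    """Record a filtered fault trace and return its fault index.

    A trace that was already learned is identified (its stored index is
    returned) without changing the structure.
    """
    trace = tuple(fault_trace)
    if trace in diagnoser["fault_traces"]:
        return diagnoser["fault_traces"][trace]
    diagnoser["df"] += 1
    index = diagnoser["df"]
    # (i) Add the detected fault colour token to the verification place.
    diagnoser["verification_place"].append((subsystem, place, index))
    # (ii) Update the integer colour token that counts identified faults.
    diagnoser["counter_place"] = index
    # (iii) Add the new fault mode to the colours of the fault transition.
    diagnoser["fault_modes"].append(f"f{index}")
    # (iv) Store the trace in the set of identified fault traces.
    diagnoser["fault_traces"][trace] = index
    return index


diagnoser = {
    "df": 0,
    "verification_place": [],
    "counter_place": 0,
    "fault_modes": [],
    "fault_traces": {},
}
first = learn_fault(diagnoser, 1, 0, ["start", "heat", "heat"])
again = learn_fault(diagnoser, 1, 0, ["start", "heat", "heat"])
second = learn_fault(diagnoser, 1, 2, ["start", "stop", "stop"])
print(first, again, second, diagnoser["fault_modes"])
```

Learning the same trace twice leaves the structure unchanged, whereas a new trace increases the fault counter and adds a new fault mode.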

Properties.
The structure of a st-DICPN is variable because the number of coloured tokens, as well as the number of transitions, grows with each new fault trace. Nevertheless, the size of each st-DICPN is bounded by the number of sensors.
Let a transition firing sequence be such that, for every fault transition in it, the sequence returns a normal token to a latent nestling place, from which the system can evolve when the fault has been repaired.

Detectability.
The analysis of detectability presented in this proposal is based on language theory and prior works on temporal observability. Detectability establishes whether the occurrence of a fault can be detected within a finite number of observable events.
Based on Definition 13, $k$-detectability is defined as follows.
Definition 16 ($k$-detectable). Let $\sigma^{*}$ be an event sequence ending with a fault event and let $s$ be an observable event sequence after $\sigma^{*}$, with $k = |s|$.

Application Case
The application case is a centralized air heating system (AHS), which was identified as a set of st-IPNs in [33].
The system includes three heating subsystems. Each heating subsystem has a fan creating an air flow that is heated with hot water. The water flow is controlled by pump-valve systems. Moreover, a central heater provides hot water to each heating subsystem, and two valves (V1 and V2) control the water flow through the whole system. The system can be split into five subsystems (1, 2, 3, 4, and 5). Subsystems 3, 4, and 5 are the local heaters, subsystem 2 is the distribution subsystem (V1 and V2), and subsystem 1 is the main heating subsystem (heater, main pump, and reflux valve). The heater works in three modes (0, 1, 2); each mode is defined by the number of resistances it has activated, so modes 0, 1, and 2 represent no activation of resistance, activation of h1, and activation of both h1 and h2, respectively.
The system has a set of sensors. Each subsystem includes a flow sensor that measures the presence or absence of flow. Nevertheless, the flow level is affected when other subsystems are activated. So, a software sensor is designed to measure the deviation from the normal operation flow, taking into account the activation of other subsystems. The system also includes binary temperature sensors, pressure sensors, and a position sensor for valve V1.
The initial conditions in each subsystem are
The system globally starts with the external event "Son"; the heating subsystems are locally started with events "Ca3," "Ca4," and "Ca5." These events are external events that change the controller strategy. Each combination of external events generates a system operation mode. For example, operation mode 12 corresponds to the combination in which "Son" and "Ca3" are active and "Ca4" and "Ca5" are not.
We have simulated the system, including some changes in the operation modes, following the mode sequence 0-8-12-8-0. That sequence means that "Son" is active from time 1 to 85 and "Ca3" is active from time 15 to 75.
A generic fault is assumed in all places of all the subsystems involved in the identified operation mode sequence.
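The relation between external events and operation mode indices can be checked with a short sketch. It assumes that the mode index is the binary number formed by the external events in the order "Son", "Ca3", "Ca4", "Ca5", most significant bit first; this ordering is an assumption, chosen because it reproduces the mode sequence 0-8-12-8-0 quoted in the text. The function names and the simulation horizon are hypothetical.

```python
SIGNALS = ("Son", "Ca3", "Ca4", "Ca5")  # assumed bit order, most significant first


def mode_index(active):
    """Binary-coded operation mode for a set of active external events."""
    return sum(
        1 << (len(SIGNALS) - 1 - position)
        for position, signal in enumerate(SIGNALS)
        if signal in active
    )


def mode_sequence(intervals, horizon):
    """Sequence of distinct consecutive modes over the times 0..horizon.

    intervals maps each external event to a (start, end) pair; the event is
    taken as active for start <= t < end.
    """
    sequence = []
    for t in range(horizon + 1):
        active = {s for s, (start, end) in intervals.items() if start <= t < end}
        mode = mode_index(active)
        if not sequence or sequence[-1] != mode:
            sequence.append(mode)
    return sequence


# Activation intervals taken from the simulated scenario described in the text.
intervals = {"Son": (1, 85), "Ca3": (15, 75)}
modes = mode_sequence(intervals, 90)
print(modes)
```

With these intervals the sketch yields the mode sequence 0, 8, 12, 8, 0, where mode 8 has only "Son" active and mode 12 has "Son" and "Ca3" active.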
Step 3. Assuming a fault in the main pump at time 60 min and a fault in the heater at time 70 min at subsystem 1, the
A st-IPN is a structure composed of an IPN as in Definition 5, the system alphabet $\Omega$, a transition firing time density function, and the set of operation modes. The PN, the input set, and the output set of the IPN have the same meaning as in Definition 5. The labeling function assigns an input symbol and a time density function to each transition; the output function is defined over the reachable markings and is isomorphic over its codomain. A transition firing time density function is defined for each transition, and the set of operation modes contains one mode for each binary combination of the external inputs. The system alphabet $\Omega$, built from the input set and the output set, relates signals. A letter of $\Omega$ is a symbol that concatenates input signals and output signals at every instant (I/O symbol); this symbol is a compound event as in Definition 1 but is a timed event (see Definition 3); therefore $\Omega$ is a timed compound event set over the input and output sets.

Table 1: Identified density functions for the AHS system.