Failure Diagnosis for Distributed Stochastic Discrete Event Systems

Because of the complexity of failure diagnosis for large-scale discrete event systems (DESs), DESs with decentralized information have received a lot of attention. DESs with communication events are defined as distributed DESs. Stochastic discrete event systems (SDESs) are DESs with a probabilistic structure. A-diagnosability is an important property in failure diagnosis of SDESs. In this paper, we investigate A-diagnosability in distributed SDESs. We define a local model and a global model. Moreover, we construct a synchronized stochastic diagnoser to check A-diagnosability in distributed SDESs. We also propose a necessary and sufficient condition for a distributed SDES to be A-diagnosable. Some examples are described to illustrate our algorithms.


Introduction
A discrete event system (DES) is a discrete-state, event-driven system whose state evolution depends entirely on the occurrence of asynchronous discrete events over time. Failure diagnosis of DESs has received considerable attention as a means of guaranteeing the performance of a reliable system [1][2][3][4][5]. Most previous work treated the DES as a global system, in which a single site collects all the information about the system [1,2]. However, in many complex systems, such as communication networks, power systems, and manufacturing systems, information is decentralized among many physically separated sites [6]. According to the decentralized information, the global system can be partitioned into a set of local models.
DESs with decentralized information can be classified into distributed DESs [3,7,8] and decentralized DESs [4,9]. Correspondingly, the methods of diagnosis are classified into distributed diagnosis and decentralized diagnosis. The previous literature does not distinguish precisely between distributed DESs and decentralized DESs. In this paper, the difference is summarized as follows: in distributed DESs, the local models communicate with each other through the communication events between them; in decentralized DESs, by contrast, no communication events exist between the local models, and a coordinator is constructed to exchange the local diagnosis information. Briefly speaking, the diagnosis is performed locally in distributed DESs. Figures 1 and 2 depict the procedures for verifying diagnosability in distributed DESs (Algorithm 1) and decentralized DESs, respectively.
To deal with the diagnosis problem of DESs more precisely, stochastic discrete event systems (SDESs) were proposed by Lunze and Schroder [10]. SDESs extend DESs with probabilistic transitions. A-diagnosability is an important property in failure diagnosis of SDESs, and it has been investigated in [11][12][13]. In some complex SDESs, the information is also decentralized. Inspired by DESs with decentralized information, SDESs with decentralized information are partitioned into decentralized SDESs and distributed SDESs. Failure diagnosis in decentralized SDESs was investigated in [14,15]. However, the previous literature focused only on decentralized SDESs. The approach for diagnosis in decentralized SDESs cannot be applied directly to distributed SDESs, because the structure of distributed SDESs differs from that of decentralized SDESs.
Based on the synchronized stochastic diagnoser, we describe a necessary and sufficient condition for a distributed SDES to be A-diagnosable.
This paper is organized as follows. Some definitions and frequently used terms are introduced in Section 2. In Section 3, A-diagnosability in distributed SDESs is presented. In Section 4, we construct a synchronized stochastic diagnoser and propose a necessary and sufficient condition for a distributed SDES to be A-diagnosable; an example illustrating the condition is given in the same section. In Section 5, we analyze the complexity of the algorithm. Section 6 describes the related work. Finally, Section 7 summarizes the results of the paper and gives concluding remarks.

Preliminaries
In this section, we review some definitions and frequently used terms of SDES, A-diagnosability, and stochastic diagnoser.

SDES.
We first introduce some basic concepts of SDESs. An SDES is usually modeled as a stochastic automaton (SA), which is a finite state machine with a probabilistic structure [12].
Definition 1 (SA). An SA is defined as a tuple $G = (X, \Sigma, p, x_0)$, where $X$ is the state space, $\Sigma$ is the set of events, $p : X \times \Sigma \times X \to [0, 1]$ is the partial state transition probability function, and $x_0$ is the initial state of the SDES.
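The event set $\Sigma$ is partitioned as $\Sigma = \Sigma_o \cup \Sigma_{uo}$, where $\Sigma_o$ and $\Sigma_{uo}$ denote the sets of observable and unobservable events, respectively. The set $\Sigma_f \subseteq \Sigma_{uo} \subseteq \Sigma$ denotes the set of failure events to be diagnosed. Let $\Sigma^*$ denote the set of all sequences formed by events in $\Sigma$, including $\varepsilon$ (the empty event). The behavior of the system is described by the prefix-closed language $L$, where $L$ is a subset of $\Sigma^*$. A path denotes an arbitrary element of $L$. Suppose $s$ is a path of $L$; the projection $\mathrm{Pj}(s)$ removes the unobservable events from $s$. Formally, projection is defined as follows [1]:
$$\mathrm{Pj}(\varepsilon) = \varepsilon, \qquad \mathrm{Pj}(\sigma) = \begin{cases} \sigma & \text{if } \sigma \in \Sigma_o, \\ \varepsilon & \text{if } \sigma \in \Sigma_{uo}, \end{cases} \qquad \mathrm{Pj}(s\sigma) = \mathrm{Pj}(s)\,\mathrm{Pj}(\sigma), \quad s \in \Sigma^*,\ \sigma \in \Sigma.$$
The inverse projection of an observed sequence $y$ is $\mathrm{Pj}^{-1}(y) = \{ s \in L : \mathrm{Pj}(s) = y \}$.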
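As a small illustration of the projection and its inverse, the following Python sketch filters a path down to its observable events and collects the paths of a finite language that share a given observation. The event names (`o1`, `o2`, `f1`, `u`) and the toy language are hypothetical and are not taken from the figures of this paper.

```python
from typing import Iterable, List, Sequence, Set


def project(path: Iterable[str], observable: Set[str]) -> List[str]:
    """Natural projection: erase every event that is not observable."""
    return [event for event in path if event in observable]


def inverse_projection(
    language: Sequence[Sequence[str]],
    observation: Sequence[str],
    observable: Set[str],
) -> List[List[str]]:
    """All paths of a finite language whose projection equals `observation`."""
    target = list(observation)
    return [list(path) for path in language if project(path, observable) == target]


# Hypothetical event names: o1, o2 observable; f1 (failure) and u unobservable.
observable_events = {"o1", "o2"}
toy_language = [["o1"], ["f1", "o1"], ["u", "o1"], ["o2"], ["o1", "o2"]]

projected = project(["f1", "o1", "u", "o2"], observable_events)
same_observation = inverse_projection(toy_language, ["o1"], observable_events)
```

Here `same_observation` contains both a path with the failure event and paths without it, which is exactly the ambiguity that a diagnoser has to resolve.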
According to the different failure types, the set of failure events can be partitioned into disjoint sets; that is, $\Sigma_f = \Sigma_{f_1} \cup \cdots \cup \Sigma_{f_m}$. Let $s_f$ denote the final event of a path $s$. Define
$$\Psi(\Sigma_{f_i}) = \{ s \in L : s_f \in \Sigma_{f_i} \}.$$
The result of function $\Psi(\Sigma_{f_i})$ represents the set of paths whose final event is a failure event of a specific type. Hereafter, $F_i$ denotes the failure events whose type is $\Sigma_{f_i}$. For the sake of simplicity, we introduce our algorithms for systems with only a single failure type. In Section 4, we extend our algorithms to multiple failure types.
In an SA, $p(x, \sigma, x')$ is the probability that the system evolves from $x$ to $x'$ driven by event $\sigma$, where $x, x' \in X$ and $\sigma \in \Sigma$. To facilitate the solution of the diagnosis problems, we make three assumptions about the transition probability [1,12]:
(A1) At most one $x' \in X$ exists such that $p(x, \sigma, x') > 0$ for a given $x \in X$ and a given $\sigma \in \Sigma$.
(A2) For every state in $X$, the probability of a transition occurring from that state is one or, equivalently, $\forall x \in X$, $\sum_{\sigma \in \Sigma} \sum_{x' \in X} p(x, \sigma, x') = 1$.
(A3) There does not exist any cycle of unobservable events; that is, $\exists n_0 \in \mathbb{N}$ such that $\forall u s v \in L$, $s \in \Sigma_{uo}^* \Rightarrow \|s\| \le n_0$.
Intuitively, assumption (A1) makes the successor state unique for a given state and event, and assumption (A2) indicates that transitions will continue to occur in any state. Assumption (A3) ensures that the DES does not exhibit an arbitrarily long path of unobservable events.
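The three assumptions can be checked mechanically on a finite model. The sketch below is only an illustration under stated assumptions: the stochastic automaton is stored as a dictionary from (state, event, next state) triples to probabilities, and the function names and the toy automaton are hypothetical, not part of the paper.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Transition = Tuple[str, str, str]  # (state, event, next state)


def check_a1(p: Dict[Transition, float]) -> bool:
    """(A1): each (state, event) pair has at most one successor."""
    successors = defaultdict(set)
    for (x, event, x_next), prob in p.items():
        if prob > 0:
            successors[(x, event)].add(x_next)
    return all(len(s) <= 1 for s in successors.values())


def check_a2(p: Dict[Transition, float], states: Set[str], tol: float = 1e-9) -> bool:
    """(A2): outgoing probabilities of every state sum to one."""
    total = defaultdict(float)
    for (x, _event, _x_next), prob in p.items():
        total[x] += prob
    return all(abs(total[x] - 1.0) <= tol for x in states)


def check_a3(p: Dict[Transition, float], unobservable: Set[str]) -> bool:
    """(A3): the graph of unobservable transitions has no cycle (DFS)."""
    graph = defaultdict(set)
    for (x, event, x_next), prob in p.items():
        if prob > 0 and event in unobservable:
            graph[x].add(x_next)
    white, grey, black = 0, 1, 2
    color = defaultdict(int)

    def has_cycle(x: str) -> bool:
        color[x] = grey
        for y in graph[x]:
            if color[y] == grey or (color[y] == white and has_cycle(y)):
                return True
        color[x] = black
        return False

    return not any(color[x] == white and has_cycle(x) for x in list(graph))


# Hypothetical automaton: f1 is an unobservable failure, o1 and o2 are observable.
toy_p = {
    ("x0", "f1", "x1"): 0.2,
    ("x0", "o1", "x2"): 0.8,
    ("x1", "o1", "x1"): 1.0,
    ("x2", "o2", "x2"): 1.0,
}
toy_states = {"x0", "x1", "x2"}
toy_ok = (
    check_a1(toy_p)
    and check_a2(toy_p, toy_states)
    and check_a3(toy_p, {"f1"})
)
```

The depth-first search in `check_a3` is recursive, so for very large models an iterative variant would be the safer choice.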

A-Diagnosability and Stochastic Diagnoser of SDES.
The approach for distributed diagnosis is based on the related definitions in [12].In this subsection, we review the definitions of A-diagnosability and stochastic diagnoser of SDES in [12].
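Assessing the diagnosability of a system is crucial in diagnosis. The definition of diagnosability in DESs was proposed in [1]. However, diagnosability cannot distinguish between highly probable and less probable paths. Therefore, A-diagnosability was proposed in [12]. A-diagnosability requires that, for any error bound $\varepsilon$, there is a delay bound $N$ such that, for any failure path, those of its extensions that are longer than $N$ and still do not allow the failure to be diagnosed occur with a probability smaller than $\varepsilon$ [12].
Definition 4 (A-diagnosability). A live, prefix-closed language $L$ is $F_i$-A-diagnosable with respect to a projection $\mathrm{Pj}$ and a set of transition probabilities $p$ if
$$(\forall \varepsilon > 0)(\exists N \in \mathbb{N})(\forall s \in \Psi(\Sigma_{f_i}) \wedge n \ge N)\ \Pr\bigl(t : D(st) = 0 \mid t \in L/s \wedge \|t\| = n\bigr) < \varepsilon,$$
where the diagnosability condition function $D$ is as follows:
$$D(st) = \begin{cases} 1 & \text{if } \omega \in \mathrm{Pj}^{-1}[\mathrm{Pj}(st)] \Rightarrow \Sigma_{f_i} \in \omega, \\ 0 & \text{otherwise.} \end{cases}$$
In this definition, path $s$ ends with a failure event whose type is $\Sigma_{f_i}$, and $t$ is an arbitrary sufficiently long continuation of $s$. A continuation $t$ with $D(st) = 0$ is not logically diagnosable (logical diagnosability is referred to as diagnosability in [1]). $L$ is $F_i$-A-diagnosable if and only if (iff) the probability of such continuations is smaller than $\varepsilon$.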
An SA is A-diagnosable iff every failure event  in SA is A-diagnosable.
Example 5. Consider the SDES  in Figure 3 as an example; the path  =  ∈ Ψ(Σ  1 ). We take  = 0.1. The continuation For both continuations, we have Pj() =   1 . However, the inverse projection Pj
A necessary and sufficient condition was proposed in [12] to check the A-diagnosability of the failure events. The condition is based on the stochastic diagnoser. The stochastic diagnoser, which is used either online or offline to describe the behavior of the stochastic system $G$, is constructed as follows [12].
Definition 6 (stochastic diagnoser). A stochastic diagnoser is defined as a tuple $G_d = (Q_d, \Sigma_o, \delta_d, q_0, \Phi, \phi_0)$, where $Q_d$ is the set of logical elements with the initial logical element $q_0 = \{(x_0, N)\}$, $\Sigma_o$ is the set of observable events, $\delta_d$ is the transition function of the stochastic diagnoser, $\Phi$ is the set of probability transition matrices, and $\phi_0$ is the initial probability mass function on $q_0$.
More details of the stochastic diagnoser can be found in [12]. Figure 4 shows the stochastic diagnoser $G_d$ of the SDES $G$ in Figure 3. Two logical elements exist in $G_d$ (see Figure 4).
The condition of A-diagnosability is based on the theory of Markov chains. Let $x$ and $y$ be states of a Markov chain, and let $\rho_{xy}$ denote the probability that the chain, starting in state $x$, visits state $y$ at some point in the future. In particular, $\rho_{xx}$ ($0 \le \rho_{xx} \le 1$) is the probability that, if the Markov chain is in state $x$, it will go back to state $x$ at some point in the future. If $\rho_{xx} = 1$, then $x$ is called a recurrent state. Otherwise, $x$ is called a transient state.
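For a finite Markov chain, recurrence can be decided from the transition graph alone: a state is recurrent exactly when every state reachable from it can reach it back, that is, when its communicating class is closed. The following Python sketch uses this standard fact; the function names and the 3-state transition matrix are hypothetical and serve only as an illustration, and each row of the matrix is assumed to sum to one.

```python
from typing import List, Set


def reachable(P: List[List[float]], start: int) -> Set[int]:
    """States reachable from `start` through positive-probability transitions."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, prob in enumerate(P[i]):
            if prob > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def recurrent_states(P: List[List[float]]) -> Set[int]:
    """State i is recurrent iff every state reachable from i can return to i."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    return {i for i in range(n) if all(i in reach[j] for j in reach[i])}


# Hypothetical chain: state 0 may leave for the closed class {1, 2} and never return.
toy_chain = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
toy_recurrent = recurrent_states(toy_chain)
toy_transient = set(range(len(toy_chain))) - toy_recurrent
```

In this toy chain, state 0 is transient and states 1 and 2 are recurrent, which is the kind of classification the A-diagnosability condition relies on.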
The condition for a language  to be A-diagnosable is described as follows.
Theorem 7 (see [12]). A language $L$ generated by an SA $G$ is $F_i$-A-diagnosable iff every logical element of its stochastic diagnoser $G_d$ containing a recurrent component bearing the label $F_i$ is $F_i$-certain.

A-Diagnosability for Distributed SDESs
In Section 2, we have already introduced A-diagnosability in SDESs. In this section, the A-diagnosability property is considered when the SDES is modeled as a distributed SDES. Assumption (A4) implies that if there exists a communication event from state $x$, then all the other events from $x$ belong to $\Sigma_c$. Assumption (A5) avoids deadlock states during synchronization.

SDES with
After defining the local models, we introduce some properties of distributed SDESs. We define $X_{o_i} = \{x_{0i}\} \cup \{x \in X_i : x \text{ has an observable event or a communication event into it}\}$.
Let (  ,   ) denote the set of all paths that originate from state   of   .We define For the sake of simplicity, we first illustrate a distributed SDES with two local models as an example.It is not difficult to extend to the case of a finite number of local models.
Example 9. A distributed stochastic system  composed of local models { 1 ,  2 } is shown in Figure 5.In the system, { 1 ,  2 } is the set of communication events.Event  1 is observable in  1 .Events  2 and  3 are observable in  2 . 1 is a failure event.( 1 ,  1 ,  2 ) = 0.2 is a probability transition in  1 .
A local model is an SA. Therefore, based on Theorem 7, we can similarly obtain the A-diagnosability of the failure events in local models. The proof of Theorem 10 is the same as that of Theorem 7.

Theorem 10. A language $L$ generated by a local model is $F_i$-A-diagnosable iff every logical element of its diagnoser containing a recurrent component bearing the label $F_i$ is $F_i$-certain.
A local model is locally A-diagnosable iff every failure event occurring on that local model is locally A-diagnosable. Before introducing the global model, we make the following assumption:

Global
(A6) The delay among the observable events in different local models can be omitted.
Assumption (A6) ensures that the local states from different local models can be triggered together by the local events.
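(2) $\Sigma_{gl} \subseteq \prod_{i=1}^{n} \Sigma_i^{\varepsilon}$ is the set of global events, where $\Sigma_i^{\varepsilon} = \Sigma_i \cup \{\varepsilon\}$. An element of $\Sigma_{gl}$ is of the form $\sigma_{gl} = (\sigma_1, \ldots, \sigma_n)$, where, for each $i = 1, 2, \ldots, n$, $\sigma_i$ represents an event from the local event set $\Sigma_i^{\varepsilon}$. If no observable event exists after a state, then we use the event $\varepsilon$ to guarantee the synchronization of the local models.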
Then, we verify the A-diagnosability of the failure event in the global model. The global event $\sigma_{gl}$ can be seen as an ordinary event in an SDES. Similarly, the global state $x_{gl}$ can be seen as an ordinary state in an SDES. Thus, the global model is equivalent in formalism to an SA. According to Definition 4, we can obtain the A-diagnosability of the failure event in the global model.

Conditions of A-Diagnosability for Distributed SDESs
In this section, we present the conditions for a distributed SDES to be A-diagnosable.

A-Diagnosable
We determine the A-diagnosability through the global model.The probability of the transition is computed by (15).Suppose   ∈ Ψ(Σ   ) is the path ended with failure event in the global model.There exists   ∈ /  .According to (15), we have Pr(  ) ≤ Pr() < .By contrast, assume  is not A-diagnosable in the global system.Then, Because Pr() ≥ Pr(  ), there exists Pr() ≥ Pr(  ) > .Therefore, there exists  ∈ Ψ(Σ   ) and  ∈ / (where  ≥ ), such that The assumption violates the known conditions.Therefore, if  is locally A-diagnosable in a local model, then  is Adiagnosable in the global system.
In Example 9, the failure event is A-diagnosable in the local model $G_1$. Moreover, we have already shown that it is A-diagnosable in the global model shown in Figure 6. This result is consistent with Theorem 12.
However, if a failure event is not A-diagnosable in the local model, we cannot conclude from the local model alone whether it is A-diagnosable in the global model. We therefore construct a synchronized stochastic diagnoser to test the A-diagnosability of failure events that are not locally A-diagnosable.

Construction of Synchronized Stochastic Diagnoser.
In this subsection, we describe the construction of a synchronized stochastic diagnoser used to state the condition that ensures A-diagnosability.We need to first define the set of possible failure labels, which are similar to the failure labels in stochastic diagnoser in [12].
Definition 13 (local failure label).Δ  = {} ∪ 2 Σ  is a set of possible local failure labels, where Σ  is the set of failure events in the local model   and label {} represents the normal behavior of   .
(3) tran  :   × Σ  →   is the transition function of the prediagnoser (to be defined later).
(2) Φ  is the set of probability transition matrices.A set of probability transition matrices is defined as where  ∈ Σ uo and   ∈   .
Example 16.We take the prediagnoser in Figure 7 as an example.Figure 8 shows the synchronized stochastic diagnoser of local models  1 and  2 . 1 is not A-diagnosable and  2 is A-diagnosable.Therefore, the distributed SDES   is not A-diagnosable.

Evaluations
Let $G = (X, \Sigma, p, x_0)$ be a distributed stochastic system composed of $n$ local models $\{G_1, \ldots, G_n\}$. If the local models are not A-diagnosable, the synchronized stochastic diagnosers must be constructed to verify the A-diagnosability of the distributed SDES. The worst case is that we verify the A-diagnosability through the global model.

Related Work
Since Sampath et al. proposed diagnosability in DESs [1], many algorithms for failure diagnosis have been proposed. In order to reduce the complexity of centralized diagnosis, DESs with decentralized information were investigated. References [4,9] described diagnosis in decentralized DESs and [3,7,8] described diagnosis in distributed DESs. SDESs can represent the system more precisely. In the context of testing diagnosability of SDESs, a number of approaches have been proposed, including [11][12][13][14][15][16][17]. Reference [16] defined safe diagnosability for SDESs, in which failure detection occurs before any given forbidden string in the failed mode of the system is executed. References [11][12][13]17] presented algorithms for testing diagnosability of SDESs. The methods in [12] constructed a diagnoser and used a Markov matrix to test diagnosability of the SDES. The complexity of the methods in [12] is $O(2 \times 2^{2|X|} \times |X|^2 \times |\Sigma| + 2^{2|X|} \times |X|^3)$, which is exponential in the number of states of the system. To improve the efficiency, [11,17] proposed a polynomial test to verify the diagnosability.
Reference [11] is based on the twin-plant structure and does not construct a diagnoser. Reference [13] used probabilistic logic to diagnose SDESs. The complexity of the algorithm in [13] is also polynomial, of degree 4. Because of the large number of states in a global SDES, SDESs with decentralized information were proposed. Similarly, SDESs with decentralized information are separated into decentralized SDESs and distributed SDESs. A-diagnosability of decentralized SDESs was presented in [15]. In order to reduce the complexity of diagnosis in decentralized SDESs, [14] proposed a polynomial algorithm to check the A-diagnosability of decentralized SDESs.
To the best of our knowledge, there is no work on A-diagnosability in distributed SDESs. Therefore, the question of A-diagnosability in a distributed SDES is investigated in this paper.

Conclusion
A-diagnosability is an important property of SDESs. SDESs with decentralized information are partitioned into decentralized SDESs and distributed SDESs. In this paper, we investigate A-diagnosability in distributed SDESs. We introduce the local model and the global model of a distributed SDES. In order to verify the A-diagnosability of the global model, the A-diagnosability of every local model should be verified first. For the local models which are not A-diagnosable, we have proposed a necessary and sufficient condition to ensure A-diagnosability of the distributed SDES. A synchronized stochastic diagnoser has been constructed to determine the condition.
Incremental diagnosis is another approach to diagnosing the system locally. In the future, we intend to investigate incremental diagnosis in SDESs.
Model. Suppose a failure event occurs in a local model $G_i$. The purpose of failure diagnosis in a distributed SDES is to verify the A-diagnosability of this failure event in the global model. Given a set of local models, the global model can be obtained by the synchronization of transitions among the local models. The synchronization is based on the communication events.
Decentralized Information. DESs with decentralized information are partitioned into decentralized DESs and distributed DESs. Similarly, we separate the SDESs with decentralized information into decentralized SDESs and distributed SDESs. In distributed SDESs, the local models communicate with each other through the communication events. The main task of the communication events is to deliver the diagnosis information. A distributed stochastic system $G$ is composed of interacting SDESs $G = \{G_1, G_2, \ldots, G_n\}$. $G$ is called the global model. $G$ is obtained by the synchronization of the local models $G_i$ ($i = 1, \ldots, n$), which are defined as follows.
Definition 8 (local model). A local model is defined as a tuple $G_i = (X_i, \Sigma_i, p_i, x_{0i})$, where $X_i$ is the local state space, $\Sigma_i$ is the set of local events, $p_i : X_i \times \Sigma_i \times X_i \to [0, 1]$ is a partial local state transition probability function, and $x_{0i}$ is the initial state of the local model. The local event set comprises the following kinds of events.
(1) $\Sigma_o$ are the observable events. If an event in $\Sigma_o$ occurs on the local model, then this event cannot be observed by other local models. Therefore, suppose $G_i$ and $G_j$ are two arbitrary local models of $G$, and let $\Sigma_{o_i}$ and $\Sigma_{o_j}$ represent the observable event sets of $G_i$ and $G_j$, respectively. We have $\Sigma_{o_i} \cap \Sigma_{o_j} = \emptyset$.
(2) $\Sigma_{uo}$ are the unobservable events. If a failure event belongs to $\Sigma_f \subseteq \Sigma_{uo}$, then it can only occur on this local model.
(3) $\Sigma_c$ are the communication events. If $c \in \Sigma_c$ is a communication event, then $c$ also occurs on at least one other local model. Note that communication events are unobservable. The communication events are used to exchange the local diagnosability information. Therefore, if a communication event $c$ has been triggered in a local model, then $c$ should be triggered in the other local models at the same time.
In order to guarantee assumption (A2) and avoid deadlock states, we make the following assumptions:
(A5) If there is a local model $G_i = (X_i, \Sigma_i, p_i, x_{0i})$ such that $p_i(x_{0i}, \sigma, x') = 1$ and $\sigma \in \Sigma_c$, then, $\forall j \in \{1, \ldots, n\} \setminus \{i\}$, every event $\sigma'$ from the initial state $x_{0j}$ of local model $G_j$ satisfies $\sigma' \notin \Sigma_o$ or $\sigma' = \sigma$.
(A4) If there exists $p(x, \sigma, x') > 0$ such that $\sigma \in \Sigma_c$, then, $\forall p(x, \sigma', x'') > 0$, we have $\sigma' \in \Sigma_c$.
Definition 11 (global model). Given a set of local models $\{G_1, \ldots, G_n\}$, the global model of $\{G_1, \ldots, G_n\}$ is obtained by synchronizing their transitions as follows.
Based on the definition of the local model, only communication events can occur in different local models. Therefore, if an event $\sigma$ is a communication event, then all the transitions including $\sigma$ should be triggered. Meanwhile, when a new transition $\mathrm{tran}_{gl}$ is generated, its probability is calculated according to different cases. Let $p_{gl}(x_{gl}, \sigma_{gl}, x'_{gl})$ denote the probability of transition $\mathrm{tran}_{gl}$ from global state $x_{gl}$ to $x'_{gl}$ driven by global event $\sigma_{gl} = (\sigma_1, \ldots, \sigma_n)$.
(1) If there exists a communication event among $(\sigma_1, \ldots, \sigma_n)$, then all the events in $(\sigma_1, \ldots, \sigma_n)$ which are communication events are triggered.
(2) If there does not exist a communication event among $(\sigma_1, \ldots, \sigma_n)$, then all the events in $(\sigma_1, \ldots, \sigma_n)$ which are not $\varepsilon$ are triggered.
Here $p_i$ denotes the probability of the triggered transition in local model $G_i$. Moreover, we suppose $p(x, \varepsilon, x) = 1$. Based on the definition of the global model, the event $\sigma_{gl}$ is the Cartesian product of the local events. Therefore, a temporary probability of $p_{gl}(x_{gl}, \sigma_{gl}, x'_{gl})$ can be calculated as the product $\prod_{i=1}^{n} p_i$. For example, we choose two transitions $p_1(x_4, o_1, x_4) = 1$ in $G_1$ and $p_2(x_2, o_3, x_4) = 0.1$ in $G_2$, and then the temporary probability of transition $((x_4, x_2), (o_1, o_3), (x_4, x_4))$ is $p_1(x_4, o_1, x_4) \times p_2(x_2, o_3, x_4) = 1 \times 0.1 = 0.1$. However, some results of the Cartesian product are redundant (suppose two outgoing transitions from $x_1$ labeled by $\sigma_1$ and $\sigma_2$ exist in $G_1$; then the results $(\sigma_1, \sigma_2)$ and $(\sigma_2, \sigma_1)$ of the Cartesian product are redundant). Let $\Pr_{temp}(x_{gl})$ represent the sum of the probabilities of the transitions which are not redundant. In order to guarantee that the sum of the probabilities is equal to 1, we need to divide $\prod_{i=1}^{n} p_i$ by $\Pr_{temp}(x_{gl})$. Figure 6 presents the global model $G_{gl}$ of the local models $\{G_1, G_2\}$ shown in Figure 5.
Figure 6: The global model $G_{gl}$.
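The product-then-normalize step described above can be sketched in Python as follows. This is a simplified illustration, not the construction of the paper: the function `synchronize`, the predicate `communication_synchronized`, the event name `c1`, the state `x3`, and the probability 0.9 are all hypothetical. In particular, the predicate is only a stand-in for the redundancy and communication rules, which are reduced here to "a communication event must be taken by every local model at once". Only the values 1 and 0.1 come from the example in the text.

```python
from itertools import product
from math import prod
from typing import Callable, Dict, List, Tuple

LocalTransition = Tuple[str, str, float]  # (event, next state, probability)
GlobalKey = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (events, next states)


def synchronize(
    local_choices: List[List[LocalTransition]],
    keep: Callable[[Tuple[str, ...]], bool],
) -> Dict[GlobalKey, float]:
    """Multiply local probabilities, drop rejected tuples, then renormalize."""
    temp: Dict[GlobalKey, float] = {}
    for combo in product(*local_choices):
        events = tuple(event for event, _, _ in combo)
        targets = tuple(target for _, target, _ in combo)
        if keep(events):
            temp[(events, targets)] = prod(prob for _, _, prob in combo)
    pr_temp = sum(temp.values())  # plays the role of Pr_temp
    if pr_temp == 0:
        raise ValueError("no admissible global transition")
    return {key: value / pr_temp for key, value in temp.items()}


COMMUNICATION = {"c1"}  # hypothetical communication event


def communication_synchronized(events: Tuple[str, ...]) -> bool:
    """Assumed rule: a communication event must occur in all local models."""
    return all(
        e not in COMMUNICATION or events.count(e) == len(events) for e in events
    )


# Transitions enabled at the current local states; the c1 branch is invented.
g1_choices = [("o1", "x4", 1.0)]
g2_choices = [("o3", "x4", 0.1), ("c1", "x3", 0.9)]
global_probs = synchronize([g1_choices, g2_choices], communication_synchronized)
```

With these inputs, the only admissible global event is `("o1", "o3")`, whose temporary probability 0.1 is rescaled by the sum of the retained probabilities.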

Table 1: Complexity of verifying A-diagnosability in distributed SDES.