Failure Propagation Modeling and Analysis via System Interfaces

Safety-critical systems must be shown to be acceptably safe to deploy and use in their operational environment. A key concern in developing such systems is understanding how the system behaves in the presence of failures, whether a failure is triggered by the external environment or caused by internal errors. Safety assessment at the early stages of system development involves analysis of potential failures and their consequences, and model-based safety assessment is increasingly used for complex systems. In this paper we propose an approach to safety analysis based on system interface models. By extending interaction models at the system interface level with failure modes, as well as with relevant portions of the physical system to be controlled, automated support can be provided for much of the failure analysis. We focus on fault modeling and on how to compute minimal cut sets. In particular, we explore a state space reconstruction strategy and a bounded searching technique that reduce the number of states to be analyzed, which improves the efficiency of the cut set search algorithm.


Introduction
Our society relies more and more on the safety of computer-based systems, for example, the control systems that manage air traffic or operate a nuclear power plant. These are usually called safety-critical systems: a class of engineered systems that may pose catastrophic risks to their operators, the public, and the environment. The development of these systems demands a rigorous system engineering process to ensure that the safety risks of the system, even if some of its components fail, are mitigated to an acceptable level. System safety analysis techniques are well established and are used extensively during the design of safety-critical systems.
The size, scale, heterogeneity, and distributed nature of current (and likely future) systems make them difficult to verify and to analyze, particularly for nonfunctional properties including availability, performance, and security, as well as safety. Because the traditional safety analysis process is manual, informal, and error-prone, the use of models and automatic analysis techniques to support safety-related activities in the development process has attracted increasing interest. Model-based safety analysis (MBSA), where the analysis is carried out on formal system models that take into account system behaviors in the presence of faults, has been proposed to address some of the issues specific to safety assessment. Recent work in this area has demonstrated some advantages of this methodology over traditional approaches, for example, the automatic generation of safety artifacts, and has shown that it is a promising way to reduce costs while improving the efficiency and quality of the safety analysis process.
The existing approaches to MBSA, for example, ESACS/ISAAC [1,2], AltaRica [3][4][5], Failure Propagation and Transformation Notation (FPTN) [6,7], Hierarchically Performed Hazard Origin and Propagation Studies (HiP-HOPS) [8], and the AADL with its error annex [9], can be classified into two groups: (a) failure logic based or (b) system states based. Original MBSA techniques, such as FPTN and HiP-HOPS, have sought to unify classical safety analysis methods such as Fault Tree Analysis (FTA) and Failure Modes and Effects Analysis (FMEA) and to provide a formalism for capturing a single authoritative safety model of the system. These approaches emphasize the modeling of failure propagation logic. The second group of MBSA approaches analyzes the transitions between system states [10][11][12] in order to identify the routes by which a system moves from a safe state to a hazardous state. Since these search-based techniques normally require exhaustive enumeration of all reachable states, they do not fully exploit the internal structure of the state space or the domain knowledge of safety analysis.
Safety is an emergent property of a system that can only be determined in the context of the whole. As an emergent property, safety arises only when the system components interact with each other in an environment. Such a property is controlled or enforced by a set of constraints on the behaviors of system components, and accidents often result from interactions among components that violate these constraints. The term interaction is conceptually simple: it is a kind of action that occurs as two or more objects have an effect upon one another. In practice, however, interactions among the components dramatically increase the complexity of the overall system, and growing interaction complexity poses a great challenge to engineering the safety of the system. In some cases, although hazard identification and safety assessment have been undertaken for the system components, hazards are still missed, at least in part because they arise out of complex and indirect interactions, especially when the components of the system are independently developed or operated. The new challenge that system complexity poses to MBSA is that it is very hard to analyze all possible dysfunctional interactions in the system, and thus to identify the hazardous states that reflect the effects of dysfunctional interactions and inadequate enforcement of safety constraints.
Using interface models to capture these interactions offers a twofold benefit. First, interface information can be abstracted conveniently from the existing system design models, which helps to integrate the systems and safety engineering processes tightly. Second, interface models are more abstract and contain far fewer implementation details, which helps to combat the state space explosion problem in the subsequent automatic analysis.
In this paper, we propose an approach to model-based safety analysis which uses extended interface automata [13] to model the nominal behaviors as well as the fault behaviors of the system. To avoid exploring the entire reachability set, we present a structural analysis strategy which takes into account the inner structure of the state space. This makes it possible to develop efficient algorithms for safety analysis. By applying state space reduction and heuristic search, a much smaller reachable space needs to be explored, and thus the efficiency of the proposed minimal cut sets algorithm is improved.
The rest of the paper is organized as follows. In Section 2, we introduce interface automaton as a formal model for safety analysis. In Section 3, we show how to use domain knowledge for efficient state space reduction and minimal cut sets generation. Section 4 mainly demonstrates our approach on a small yet realistic safety-related example where minimal cut sets are generated and analyzed. Conclusions and outlooks for future work are presented in Section 5.

Interfaces and Fault Modeling
A trace on an interface automaton $P$ is an alternating sequence of states and actions, such as $v_0, a_0, v_1, a_1, \ldots, a_{n-1}, v_n$, where $v_i \in V_P$ and $a_j \in A_P$ ($i \in \{0, \ldots, n\}$ and $j \in \{0, \ldots, n-1\}$). If an action $a \in A^I_P$ (resp., $a \in A^O_P$, $a \in A^H_P$), then $(v, a, v') \in T_P$ is called an input (resp., output, internal) transition. We denote by $T^I_P$ (resp., $T^O_P$, $T^H_P$) the set of input (resp., output, internal) transitions. An action $a \in A_P$ is enabled at a state $v \in V_P$ if there is a transition $(v, a, v') \in T_P$ for some $v' \in V_P$. We denote by $A^I_P(v)$, $A^O_P(v)$, and $A^H_P(v)$ the subsets of input, output, and internal actions that are enabled at the state $v$.
We illustrate the basic features of interface automata by applying them to the modeling of a railroad crossing control system. Figure 1 depicts the interfaces of three components modeling the train, controller, and gate, respectively. Two sensors are used to detect the approach and exit of the train. The state changes of the controller stand for handshaking with the train (via the actions Approach and Exit) and the gate (via the actions Lower and Raise by which the controller commands the gate to close or to open). When everything is ready, a signal Enter is sent to authorize the entrance of the train.
In the graphic representation, each automaton is enclosed in a box whose ports correspond to the input and output actions. The symbols ? and ! are appended to the name of an action to denote that it is an input or an output action, respectively. An arrow without a source denotes the initial state of the automaton.
The parallel composition of interface automata shows how they relate and work together. The original paper on interface automata by de Alfaro and Henzinger [13] provides a particular form of parallel composition aimed mainly at analyzing the compatibility of components. In this paper, compatibility is not our concern. Therefore, we abandon this kind of parallel composition and use a more traditional one that is common in automata theory. Two interface automata $P$ and $Q$ are composable if $A^I_P \cap A^I_Q = \emptyset = A^O_P \cap A^O_Q$; that is, they have neither common inputs nor common outputs. We let $shared(P, Q) = A_P \cap A_Q$. In a composition, the two automata synchronize on all common actions and asynchronously interleave all other actions.
Definition 2 (parallel composition). If $P$ and $Q$ are composable interface automata, the parallel composition $P \times Q$ is the interface automaton in which $P$ and $Q$ synchronize on the actions in $shared(P, Q)$ and interleave all other actions.
The parallel composition of train and controller is shown in Figure 2. The composition of the train, controller, and gate automata in Figure 2(b), where all the actions have been hidden as internal ones after synchronization, describes the system function in an orderly and concise manner.
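The composition rule described above (synchronize on shared actions, interleave the rest, then hide the synchronized actions as internal) can be illustrated with a minimal Python sketch. The class `InterfaceAutomaton`, the function `compose`, and the two toy automata are hypothetical names and data introduced only for illustration; they are not the models of Figure 1, and the sketch assumes that transitions are stored as (source, action, target) triples.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class InterfaceAutomaton:
    states: frozenset
    init: frozenset
    inputs: frozenset
    outputs: frozenset
    internal: frozenset
    transitions: frozenset  # set of (source, action, target) triples

    @property
    def actions(self):
        return self.inputs | self.outputs | self.internal


def compose(p, q):
    """Product automaton: synchronize on shared actions, interleave the others."""
    if p.inputs & q.inputs or p.outputs & q.outputs:
        raise ValueError("automata are not composable")
    shared = p.actions & q.actions
    trans = set()
    for (v, a, v2) in p.transitions:
        if a in shared:
            # Both components move together on a shared action.
            for (u, b, u2) in q.transitions:
                if b == a:
                    trans.add(((v, u), a, (v2, u2)))
        else:
            # Only p moves; q stays in any of its states.
            for u in q.states:
                trans.add(((v, u), a, (v2, u)))
    for (u, b, u2) in q.transitions:
        if b not in shared:
            for v in p.states:
                trans.add(((v, u), b, (v, u2)))
    return InterfaceAutomaton(
        states=frozenset(product(p.states, q.states)),
        init=frozenset(product(p.init, q.init)),
        inputs=(p.inputs | q.inputs) - shared,
        outputs=(p.outputs | q.outputs) - shared,
        internal=p.internal | q.internal | shared,  # synchronized actions are hidden
        transitions=frozenset(trans),
    )


# Toy example (hypothetical, much smaller than Figure 1).
train = InterfaceAutomaton(
    states=frozenset({"t0", "t1"}), init=frozenset({"t0"}),
    inputs=frozenset(), outputs=frozenset({"Approach"}), internal=frozenset(),
    transitions=frozenset({("t0", "Approach", "t1")}),
)
controller = InterfaceAutomaton(
    states=frozenset({"c0", "c1"}), init=frozenset({"c0"}),
    inputs=frozenset({"Approach"}), outputs=frozenset({"Lower"}), internal=frozenset(),
    transitions=frozenset({("c0", "Approach", "c1"), ("c1", "Lower", "c0")}),
)
rc = compose(train, controller)
```

In this toy composition, Approach becomes an internal action of `rc`, while Lower remains an output because only the controller uses it.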

Fault Propagation Modeling on Interface Automata.
For model-based safety analysis, failure modes must be explicitly modeled. Our approach to modeling fault behaviors is to specify them using the interface automata notation itself. The incorporation of the fault behaviors directly on the system interface models will promote ease of specification of complex fault behaviors for both the system design and safety engineers, allowing them to create simple but realistic models for precise safety analysis.
Fault modeling aims to specify the direct effects of failure modes. In our approach, this is done by adding new actions, states, and transitions to the existing models. There are two types of faults in interface automata: basic faults and propagating faults, which differ in their activation condition. Basic faults are intrinsic to a component and originate within the component boundary. Their activation occurs independently of other component failures and can be modeled using an independent input action.
Faults that are activated by interaction or interference due to error propagation are considered propagating faults. In interface automata, propagating faults are synchronized during the composition of two components and then hidden as internal actions. We denote by $E^{bf}_P$ and $E^{pf}_P$, respectively, the mutually disjoint sets of basic faults and propagating faults. Consider the cooling water supply system in Figure 3. This system consists of an electric pump and a water tank. The two components synchronize on the action Water, which means that there is water in the tank and the pump will start working (action Pumping). However, the tank may be broken or empty, denoted by the input actions Broken and Empty. Here, Broken and Empty are basic faults since they originate within the tank component. NoWater is defined as a propagating fault to model the failure propagation from the water tank to the pump. There are also other propagating faults, such as power failure (action Pow F) and the stopping of the pump (action Stop), between the pump and other devices not shown in this example.
Our extension of interface automata has two aspects. First, as shown in Figure 3, the extended definition of an interface automaton can be regarded as an 8-tuple $P = \langle V_P, V^{Init}_P, A^I_P, A^O_P, A^H_P, T_P, E^{bf}_P, E^{pf}_P \rangle$. Since the elements of $E^{bf}_P$ and $E^{pf}_P$ are themselves input or output actions, they do not need special treatment during composition. Second, solid lines in the figure depict the nominal system interfaces, while dashed lines show the fault behaviors of each component. Because it is based on the real system interfaces, this kind of extension is easy to perform and to understand, and it provides useful system insights and formal models shared between the design and safety analysis stages.

Algorithms Assist in Failure Analysis
A minimal cut set is a combination of basic faults which guarantees the occurrence of a top-level event (TLE), that is, a set of undesired states, and which contains no smaller combination with the same effect. The key problem investigated in this paper is how to efficiently produce minimal cut sets through exhaustive state space exploration. We first give the following definitions of (minimal) cut sets.
Definition 3 (cut sets). Let $P$ be an interface automaton and TLE a top-level event. A set of basic faults $cs \subseteq E^{bf}_P$ is a cut set with respect to TLE if there exists a trace $v_0, a_0, v_1, a_1, \ldots, a_{n-1}, v_n$ on $P$ such that (1) $v_0 \in V^{Init}_P$, $v_i \notin V^{Init}_P$ ($i \in \{1, \ldots, n\}$), and $v_n \in TLE$; (2) $\forall j \in \{0, \ldots, n-1\}$ ($a_j \in E^{bf}_P \rightarrow a_j \in cs$).
Intuitively, a cut set is a combination of basic faults which can lead to the occurrence of the given top-level event; that is, the set of all basic faults contained in a trace from the initial states $V^{Init}_P$ to the top-level event (TLE) is a cut set with respect to TLE. We use $CS^{TLE}_P$ to represent all cut sets on automaton $P$ with respect to TLE. Minimal cut sets are formally defined as follows.
Definition 4 (minimal cut sets). Let $CS^{TLE}_P$ be the set of all cut sets on automaton $P$ with respect to TLE. The set of all minimal cut sets of TLE on automaton $P$ consists of those cut sets $cs \in CS^{TLE}_P$ for which no proper subset of $cs$ is itself in $CS^{TLE}_P$.
Based on the previous definitions, the computation of minimal cut sets amounts to finding all traces leading to the TLE, that is, all cut sets $CS^{TLE}_P$, and then minimizing these sets.
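One way to write Definition 4 in symbols is sketched below. The name $MCS^{TLE}_P$ is a hypothetical label introduced here for illustration, and the three-element family of cut sets in the second line is an invented example, not a result from this paper.

```latex
\[
  MCS^{TLE}_{P} \;=\; \bigl\{\, cs \in CS^{TLE}_{P} \;\bigm|\; \neg\exists\, cs' \in CS^{TLE}_{P} :\; cs' \subset cs \,\bigr\}
\]
% Illustrative (hypothetical) instance:
\[
  CS^{TLE}_{P} = \bigl\{ \{e_1\},\ \{e_2\},\ \{e_1, e_3\} \bigr\}
  \quad\Longrightarrow\quad
  MCS^{TLE}_{P} = \bigl\{ \{e_1\},\ \{e_2\} \bigr\}
\]
```

In the instance, $\{e_1, e_3\}$ is discarded because its proper subset $\{e_1\}$ is already a cut set.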

State Space Reconstruction.
Several automatic analysis techniques for minimal cut sets generation have been developed on a variety of models, for example, Petri net, finite state machine, NuSMV model, and AltaRica model. The main difficulty in this kind of search-based minimal cut sets generation is state space reduction, because in general the complexities of searching algorithms depend on the size of the state space.
We observed that, for safety analysis on interface models, only those actions that contribute to the occurrence of the predefined TLE need to be analyzed. During the state exploration, noncontributing actions could be ruled out as far as possible. This means that a majority of transitions relevant to internal and output actions could be peripheral to our core searching algorithm. Based on this observation, we develop a procedure of state space reduction.
To reconstruct the state space of the given interface models, our approach is to cluster states that do not contribute to the occurrence of the TLE into equivalence classes and to eliminate the relevant transitions. The numbers of states and transitions are reduced using a restricted forward and backward reachability analysis from the initial states and from the TLE, respectively. The result is a representation of the state space that is compact, minimal in some sense, and keeps all necessary information about fault propagation.
The state space is partitioned into three areas, the third being the complement of the other two: (iii) $TriggeringArea_P = V_P \setminus (SafetyArea_P \cup HazardCore_P)$.
Definition 5 divides the state space of an interface automaton into three separate areas based on the top-level event (TLE). Intuitively, the set SafetyArea contains the states reachable from the initial states by taking only internal or output transitions. In this area, the system is running safely and no basic fault has occurred. HazardCore consists of all states that can reach the TLE through a sequence of internal or output transitions only. States within HazardCore can evolve into the TLE without any external stimulation, that is, without the occurrence of any further basic fault. TriggeringArea is the complement, containing all states of $V_P$ except those in SafetyArea ∪ HazardCore. According to this definition, all of the basic faults are contained in the TriggeringArea, which is the focus of our state exploration algorithm for minimal cut sets generation.
We use a directed graph $DG_P$, consisting of vertices and labeled edges, to denote the underlying transition diagram of an interface automaton $P$.
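The three areas can be computed with two restricted reachability passes, as the following Python sketch suggests. It follows the intuitive description above rather than the formal definition; the function names, the edge-list encoding, and the five-state toy graph are assumptions made for illustration, with `quiet` standing for the set of internal and output actions.

```python
from collections import deque


def closure(start, edges, quiet, forward=True):
    """States reachable from `start` via quiet actions only (forward),
    or states that can reach `start` via quiet actions only (backward)."""
    step = {}
    for (src, act, dst) in edges:
        if act in quiet:
            a, b = (src, dst) if forward else (dst, src)
            step.setdefault(a, set()).add(b)
    seen, todo = set(start), deque(start)
    while todo:
        s = todo.popleft()
        for t in step.get(s, ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def partition(states, init, tle, edges, quiet):
    safety = closure(init, edges, quiet, forward=True)   # SafetyArea
    hazard = closure(tle, edges, quiet, forward=False)   # HazardCore
    triggering = set(states) - (safety | hazard)         # TriggeringArea
    return safety, hazard, triggering


# Hypothetical toy graph: Broken and Empty are input (fault) actions,
# tick and spill are internal or output actions.
states = {"s0", "s1", "s2", "s3", "s4"}
edges = [
    ("s0", "tick", "s1"), ("s1", "tick", "s0"),
    ("s1", "Broken", "s2"), ("s2", "Empty", "s3"),
    ("s3", "spill", "s4"),
]
safety, hazard, triggering = partition(
    states, init={"s0"}, tle={"s4"}, edges=edges, quiet={"tick", "spill"}
)
```

Here `s3` falls into the HazardCore because it reaches the TLE state `s4` without any further fault, and only `s2` remains in the TriggeringArea.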

Definition 6 (state space reconstruction). Given an interface automaton $P$, the reduced graph $(DG_P)$ is the reduced form of its original transition diagram, obtained by applying the following steps to $DG_P$:
(1) remove all transitions from $TriggeringArea_P$ to $SafetyArea_P$;
(2) remove all transitions from $HazardCore_P$ to $SafetyArea_P$ or $TriggeringArea_P$;
(3) combine all states of $SafetyArea_P$ into a new state Init;
(4) combine all states of $HazardCore_P$ into a new state Top.
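The four steps of Definition 6 can be sketched in Python as follows. The function `reconstruct` and the toy edge list are hypothetical. The sketch assumes that SafetyArea and HazardCore are disjoint and have already been computed, and it additionally drops the self-loops that merging creates on Init and Top, a simplification that is harmless when only simple paths are searched afterwards.

```python
def reconstruct(edges, safety, hazard):
    """Apply steps (1)-(4) to a list of (source, action, target) edges."""
    def merged(state):
        if state in safety:
            return "Init"   # step (3)
        if state in hazard:
            return "Top"    # step (4)
        return state

    reduced = set()
    for (src, act, dst) in edges:
        src_triggering = src not in safety and src not in hazard
        if src_triggering and dst in safety:
            continue        # step (1): TriggeringArea -> SafetyArea
        if src in hazard and dst not in hazard:
            continue        # step (2): HazardCore -> SafetyArea or TriggeringArea
        new_src, new_dst = merged(src), merged(dst)
        if new_src == new_dst and new_src in ("Init", "Top"):
            continue        # assumption: drop self-loops created by merging
        reduced.add((new_src, act, new_dst))
    return reduced


# Hypothetical toy graph with areas given directly.
safety = {"s0", "s1"}
hazard = {"s3", "s4"}
edges = [
    ("s0", "tick", "s1"), ("s1", "tick", "s0"),
    ("s1", "Broken", "s2"), ("s2", "Empty", "s3"),
    ("s3", "spill", "s4"),
    ("s2", "Repair", "s0"),   # removed by step (1)
    ("s4", "reset", "s0"),    # removed by step (2)
]
reduced = reconstruct(edges, safety, hazard)
```

Seven edges over five states shrink to two edges over three states (Init, `s2`, Top), while both fault actions survive.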
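Definition 6 and Figure 4 show the process of our state space reduction strategy. All states in the set SafetyArea are merged into the single state Init, and all states in HazardCore into the single state Top, as in steps (3) and (4).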

Theorem 7. Let $P$ be an interface automaton. If $cs$ is a cut set with respect to a top-level event (TLE), then there exists a trace in $(DG_P)$ from Init to Top containing all elements of $cs$.
Proof. Since the set of basic faults $cs$ is a cut set of TLE, according to Definition 3 there is a trace $v_0, a_0, v_1, a_1, \ldots, a_{n-1}, v_n$ in $DG_P$ from $V^{Init}_P$ to TLE that contains every element of $cs$. By applying steps (3) and (4) of Definition 6 at the two ends of this trace, respectively, we get a new path Init, $a_i$, $v_{i+1}$, $\ldots$, $v_j$, $a_j$, Top. Obviously, this path is in $(DG_P)$. During this process, only internal and output transitions in SafetyArea and HazardCore are removed. Because all basic faults are defined as input actions, no basic fault is eliminated from the trace; that is, the new path still contains all elements of $cs$.
The essence of search-based minimal cut sets generation is to find all combinations of basic faults that contribute to the top-level event in $DG_P$. Theorem 7 shows that $DG_P$ and the reduced graph $(DG_P)$ are equivalent for this purpose, whereas the latter contains far fewer states and transitions.
We use an example to illustrate the effectiveness of this approach. Reconsider the railroad crossing control system in Figure 1, which describes an ideal world where no errors occur. The next step is to extend these models so that failure modes are also correctly described. The following three failure modes are taken into account in this example:
(i) Failure of the sensors (two fault actions, one for each sensor), which prevents the sending of signals (actions Approach and Exit) when the train is approaching or exiting.
(ii) Failure of the brake (action Bra), which leads to unauthorized entering of the crossing, that is, bypassing the action Enter.
(iii) Failure of the barrier (action Stuck), which results in the barrier being stuck at any position; a new state 3 is added to the gate automaton to represent the stuck state of the barrier.
These failure modes are integrated into the formal interface models, as shown in Figure 5. This model extension provides a failure propagation map on the nominal system model, reflecting both normal interactions and fault propagation. The safety goal of this system is clear: SR 1: it must never happen that the train is on the crossing (state 2 of the train automaton) and the crossing is not secured (state 0 or 3 of the gate automaton). RC is the parallel composition of the extended interface models of the train, controller, and gate. There are 30 states and 63 transitions in the state space $DG_{RC}$. Using the state space reduction technique of Definition 6, we obtain the reduced state space $(DG_{RC})$ shown in Figure 6, which contains only 17 states and 31 transitions. For brevity, Figure 6 uses three separate subgraphs to represent the entire state space; these subgraphs share the endpoints Init and Top.

Figure 6: The reshaped state space $(DG_{RC})$.

Minimal Cut Sets Generation.
Here, we discuss the basic searching algorithm for cut sets generation using forward reachability analysis. The first step is to find all simple paths (paths without cycles) between two vertices, namely Init and Top, in the reduced graph $(DG_P)$. To solve this problem, the traditional depth-first search algorithm can be adjusted in the following manner:
(1) Start at the source vertex Init and perform a depth-first walk. All nodes on the path are pushed onto a stack and marked as visited.
(2) When the top element of the stack is the target node Top, a path has been found. Record this path, pop Top, and mark it as unvisited.
(3) For the current top of the stack, find a successor that is unvisited and push this node onto the stack. If no such successor exists, pop the top element and mark it and its successors as unvisited.
(4) Go back to step (2) until the stack is empty.
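The steps above can be sketched in Python as follows. This is a common stack-based variant that tracks the nodes on the current path instead of resetting visited marks on successors, so it is an approximation of the listed procedure rather than a transcription of it. The function `simple_paths`, the adjacency-list encoding, and the toy graph are hypothetical.

```python
def simple_paths(graph, source="Init", target="Top"):
    """Yield every cycle-free path from source to target
    as a list of (action, node) steps."""
    path = []                    # steps taken so far
    on_path = {source}           # nodes currently on the path
    stack = [iter(graph.get(source, ()))]
    while stack:
        step = next(stack[-1], None)
        if step is None:         # successors exhausted: backtrack
            stack.pop()
            if path:
                on_path.discard(path.pop()[1])
            continue
        action, node = step
        if node in on_path:      # would close a cycle
            continue
        if node == target:       # path found: record it, do not extend
            yield path + [step]
            continue
        path.append(step)
        on_path.add(node)
        stack.append(iter(graph.get(node, ())))


# Hypothetical reduced graph: node -> list of (action, successor).
graph = {
    "Init": [("Stuck", "Top"), ("S1", "a")],
    "a": [("Bra", "Top"), ("tick", "Init")],
}
faults = {"Stuck", "Bra", "S1"}

paths = list(simple_paths(graph))
# A cut set keeps only the basic faults that occur along a path.
cut_sets = [frozenset(a for a, _ in p if a in faults) for p in paths]
```

The edge labelled tick is never followed because it leads back to Init, which is already on the path.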
To better visualize this process, one can think of a search tree rooted at the vertex Init, in which all simple paths leading to the node Top constitute the body of the tree. As an illustration, consider the search of Figure 6(b). The tree structure in Figure 7(a) depicts all simple paths between Init and Top generated by the above algorithm. Since a cut set consists only of basic faults, we get four cut sets from this tree by removing extra actions and duplicate paths.
After finding all possible cut sets in $(DG_P)$, it is easy to identify the minimal ones. According to Definition 4, given two cut sets $cs_1$ and $cs_2$, if $cs_1 \subset cs_2$, then $cs_2$ cannot be a minimal cut set. This fact can be used to design a simple filter based on pairwise comparison of the cut sets. Sorting the cut sets by size in advance makes the comparison more efficient.
So far, we have presented a simple search-based algorithm for minimal cut sets generation. Unfortunately, this naive algorithm has a major drawback: it needs to traverse all possible simple paths between Init and Top during the search. From a practical perspective, however, some branches of the search tree can be pruned. Since the ultimate goal is to get minimal cut sets, according to Definition 4, no further extension of the current path is necessary if it already contains a cut set found earlier in the exploration. Using this observation, we present a heuristic searching strategy based on a bounded breadth-first search to improve the performance of state space exploration.
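The pairwise filter with size-based presorting can be sketched as below. The function name `minimal_cut_sets` and the input cut sets are hypothetical; the inputs are not the four cut sets obtained from Figure 7(a), and the labels S1 and S2 are placeholders for sensor faults.

```python
def minimal_cut_sets(cut_sets):
    """Keep only the cut sets that have no proper subset among the cut sets."""
    # Sorting by size guarantees that any proper subset of a cut set
    # has been examined before the cut set itself.
    ordered = sorted({frozenset(cs) for cs in cut_sets}, key=len)
    minimal = []
    for cs in ordered:
        if not any(m < cs for m in minimal):   # m < cs: proper subset
            minimal.append(cs)
    return minimal


# Hypothetical input.
found = [{"Stuck"}, {"Bra"}, {"Stuck", "S1"}, {"Bra", "S1", "S2"}, {"Bra"}]
mcs = minimal_cut_sets(found)
```

Because candidates are compared only against the minimal sets collected so far, the number of subset tests stays small when most cut sets are supersets of a few short ones.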
The basic idea is to search first for all simple paths between Init and Top whose lengths are bounded by some integer, the search bound. This problem can be solved efficiently in $(DG_P)$ via a breadth-first search limited to that bound. The result is the collection of cut sets found on paths no longer than the bound, denoted by sets, which can then be used to guide branch pruning during the rest of the search. Figure 7 shows a clear optimization effect on the graph $(DG_{RC})$. First, we perform a bounded breadth-first search (with the bound set to 1) on $(DG_{RC})$. With depth 1, the search finds two traces from Init to Top, which are exactly those shown in Figure 6(a). Thus, we get sets = {{Stuck}, {Bra}}.
We then use part of the state space (i.e., Figure 6(b)) to demonstrate the branch pruning effect of sets. Figure 7(a) shows the paths generated by the naive algorithm, while Figure 7(b) shows the result of the optimized algorithm, which uses sets to prune superfluous paths. The comparison indicates that over half of the vertices and edges are excluded from the search. Unnecessary branches are trimmed according to the results of the bounded search in order to shrink the search space and speed up the convergence of the algorithm.
Algorithm 1 implements the techniques discussed above. It takes as input the directed graph $(DG_P)$, a set of basic faults BasicFaults, and the bounded searching result sets. The output MCSList returns all minimal cut sets as a list, which is initialized with sets. The set cs records all basic faults on the current path, and it is added to the end of MCSList once the node Top in $(DG_P)$ is reached. All nodes on the current searching path are pushed onto a stack. If the top element of the stack is the node Top, or if cs is already in sets, the algorithm backtracks and continues the search for a new path by popping the old stack top and trying an unvisited neighbor of the new stack top. This procedure is repeated until the stack is empty. The procedure Filter(MCSList) carries out the pairwise comparison of all elements in MCSList to obtain the minimal cut sets, as mentioned before.
For $(DG_{RC})$ in Figure 6, first computing sets = {{Stuck}, {Bra}} with the bound set to 1 and then using Algorithm 1 to perform a full search of the state space, we get all minimal cut sets = {{Stuck}, {Bra}}. This result shows that the necessary and sufficient condition for the occurrence of the top-level event (i.e., a train is on the crossing while the crossing is not secured) is that the barrier is stuck or the brake fails, whereas sensor failure alone does not lead to a dangerous situation.
In this approach, the choice of the bound in the bounded search is relatively flexible and depends on the size of $(DG_P)$. The role of sets changes gradually as the bound increases. If the bound is large enough, all simple paths between Init and Top are found in a breadth-first way, which is inefficient. Therefore, providing an appropriate value for the bound is key to the solution. For large models, we recommend starting with a relatively small bound. If Algorithm 1 cannot terminate within a reasonable amount of time, the bound is increased gradually until enough search branches are cut for the algorithm to terminate.
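The two-phase search described above (a bounded breadth-first pass followed by a pruned depth-first pass) might look as follows in Python. This is a sketch under stated assumptions, not a transcription of Algorithm 1: it uses recursion instead of an explicit stack, it prunes a branch as soon as the current fault set contains a previously found cut set (a subset test rather than a membership test), and all function names and the toy graph are hypothetical.

```python
from collections import deque


def bounded_cut_sets(graph, faults, bound, source="Init", target="Top"):
    """Cut sets of simple paths with at most `bound` edges (breadth-first)."""
    found = set()
    queue = deque([(source, frozenset({source}), frozenset(), 0)])
    while queue:
        node, seen, cs, depth = queue.popleft()
        if depth == bound:
            continue
        for action, nxt in graph.get(node, ()):
            if nxt in seen:
                continue
            new_cs = cs | {action} if action in faults else cs
            if nxt == target:
                found.add(new_cs)
            else:
                queue.append((nxt, seen | {nxt}, new_cs, depth + 1))
    return found


def filter_minimal(cut_sets):
    minimal = []
    for cs in sorted(cut_sets, key=len):
        if not any(m < cs for m in minimal):
            minimal.append(cs)
    return minimal


def pruned_search(graph, faults, known, source="Init", target="Top"):
    """Depth-first search that stops extending a path once its fault set
    contains a cut set from `known`."""
    results = set(known)

    def walk(node, on_path, cs):
        for action, nxt in graph.get(node, ()):
            if nxt in on_path:
                continue
            new_cs = cs | {action} if action in faults else cs
            if nxt == target:
                results.add(new_cs)
            elif not any(k <= new_cs for k in known):   # prune otherwise
                walk(nxt, on_path | {nxt}, new_cs)

    walk(source, frozenset({source}), frozenset())
    return filter_minimal(results)


# Hypothetical reduced graph: node -> list of (action, successor).
graph = {
    "Init": [("Stuck", "Top"), ("Bra", "Top"), ("S1", "a")],
    "a": [("Stuck", "b"), ("tick", "c")],
    "b": [("go", "Top")],
    "c": [("Bra", "Top")],
}
faults = {"Stuck", "Bra", "S1"}

known = bounded_cut_sets(graph, faults, bound=1)
mcs = pruned_search(graph, faults, known)
```

In this toy graph the branch through node `b` is never explored, because its fault set {S1, Stuck} already contains the known cut set {Stuck}.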

Fuel Supply System Example
In this section, we exemplify our approach with a fuel supply system for small aircraft. Figure 8 is the schematic diagram of this system. The engine is supplied with fuel pumped at high pressure from a collector tank, a small tank located close to the engine. This demonstration is not concerned with the high-pressure fuel system. The main fuel storage in the aircraft is in the left and right main tanks. Each tank contains a low-pressure pump which transfers fuel to the collector tank via a valve as required. The aircraft also has a smaller reserve tank, fitted with its own low-pressure pump. All pumps are protected by nonreturn valves. Two further valves (normally closed and opened when required) allow fuel to be transferred from the reserve tank to either wing tank as necessary. The pumps have built-in overpressure protection; in the event of attempting to pump into a closed or blocked pipe, the overpressure relief system simply returns fuel to the tank. Component labels are given in Figure 8.
We model this system at the interface level by three components interacting with each other, as shown in Figure 9. The automaton at the top left, denoted by Left, describes the interface behavior of the left tank together with its pump and valves. The top right model, Right, consists of the right tank with its pump and valves. The reserve tank with its pump and valves is modeled as Bottom at the bottom. The solid part of the figure depicts the nominal interactions among these components.
Algorithm 1 (fragment recovered from the listing): if the action is in BasicFaults then (8) add the basic fault into the set cs; (9) if cs is not in sets then (10) push the successor onto the stack; (11) else (12) remove the basic fault from cs.
In order to analyze the system behavior in the presence of faults, we extend this nominal system model with the given failure modes. In Table 1, we list all faults under consideration, defined as input or output actions, together with their intuitive meaning. The dashed lines and the newly added actions in Figure 9 show our model extension. The parallel composition FSS = Left × Bottom × Right describes the behavior of this fuel supply system in the presence of faults.
There are 140 states and 560 transitions in the state space DG FSS .
For this example, assume that the safety requirement is as follows: SR 2: simultaneous loss of fuel supply from the left and right main tank will not occur.
That is to say, it must never happen that Left is at its state 2 and at the same time Right is at its state 2. Thus, the top-level event is TLE = 2 * 2, where * is used as a wildcard that stands for any state of the automaton Bottom. The set of all basic faults can be obtained from Table 1. According to Definition 5, $DG_{FSS}$ can be divided into three parts: $SafetyArea_{FSS}$, $TriggeringArea_{FSS}$, and $HazardCore_{FSS}$, of which $TriggeringArea_{FSS}$ is the focus of our attention. Using the reconstruction approach given in Definition 6, we get a reduced state space $(DG_{FSS})$. Compared with the 140 states and 560 transitions of $DG_{FSS}$, the new state space contains only 88 states and 442 transitions.
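A wildcard pattern such as 2 * 2 can be expanded into a concrete set of composite states as sketched below. The function `matches` is hypothetical, composite states are assumed to be (Left, Bottom, Right) tuples following the order of the composition, and the numbers of local states per component are invented for illustration; they are not taken from Figure 9.

```python
from itertools import product


def matches(state, pattern):
    """True if every component equals the pattern entry or the entry is '*'."""
    return all(p == "*" or p == s for s, p in zip(state, pattern))


# Assumed local state indices (illustrative sizes only).
left_states = range(3)
bottom_states = range(4)
right_states = range(3)

composite = list(product(left_states, bottom_states, right_states))
tle = [s for s in composite if matches(s, (2, "*", 2))]
```

With these assumed sizes, the pattern selects one composite state per state of Bottom.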