Architecture Level Safety Analyses for Safety-Critical Systems



Introduction
Systematic analysis of architectural models built with Model-Based Engineering (MBE) [1] practices, performed early and at every abstraction level, instills greater confidence in the integration of the system. Creating and analysing architectural models of a system supports prediction and understanding of the system's capabilities and its operational quality attributes, which include performance, reliability, reusability, safety, and security. Throughout the development lifecycle, faults, their failure modes, and their system-level propagation effects can be predicted. Without such analysis, these issues remain unnoticed until system integration and testing, and the resulting rework adds unplanned project time, cost, and maintenance effort.
For safety-critical, complex embedded systems, design and development must comply with safety standards and follow the engineering practices specified by MIL-STD-882 [2], SAE ARP-4761 [3], and DO-178B/C [4]. Developing, managing, and controlling these systems in conformance with the safety practices has an impact on the system requirements, post-system integration, and test. As the system evolves, the availability and reliability models must remain consistent with it, which poses a great challenge.
These safety practices include various availability and reliability predictions made with the help of system architectural models. Model-Based Engineering approaches to safety analysis address these issues and provide consolidated information about the informal requirements and the architecture model of the system. The safety analyses performed on a system also take into account the physical environment in which it is deployed and operates. Because support for formal languages is insufficient, the trend is to use architecture description languages such as the Architecture Analysis & Design Language (AADL), a Society of Automotive Engineers (SAE) standard.
International Journal of Aerospace Engineering
AADL, a high-level architecture description language, provides a platform for the overall integration of the recommended system components via formal semantics and syntax. This component-based modeling language is extended through sublanguages known as Annexes. AADL is packaged with multiple standard Annex sublanguages, such as the Error Model Annex (EAnnex) and the Behaviour Annex (BAnnex). The EAnnex standard is augmented with safety semantics and an ontology of fault propagation, and it supports error annotations on the architectural models [5]. This enables the component error models and their interactions to be considered in the context of the system architecture modeled in AADL.
This paper presents our contribution as a case study implementation (the Speed Control Unit of a Power-Boat Autopilot) that illustrates the application of the standard approach. The paper is organized as follows. First, we summarize the concepts of the Architecture Analysis & Design Language (SAE AADL) and the Error Model Annex (EAnnex/EMV2). Next, we illustrate the architecture fault model specification for the Speed Control Unit of a Power-Boat Autopilot (PBA). We also discuss the safety analysis methods involved in the MIL-STD-882 safety practice. Finally, we conclude the paper with an assessment of these safety analyses based on the architecture fault models.

Error Model Annex in Architecture Analysis & Design Language (AADL)
Architecture Analysis & Design Language (AADL), an SAE International standard, is a unified framework that provides extensive formal foundations for Model-Based Engineering (MBE) practices. These practices extend throughout system design, integration, and assurance against safety standards. AADL distinctly represents a system's hardware and software components and their interactions via interfaces. Critical real-time quality factors such as performance, dependability, safety, security, and data integrity can be rigorously analysed with AADL. AADL also integrates custom analyses and specification techniques during the engineering process, which allows a single, unified system architectural model to be developed and analysed. AADL can be extended with specialized language constructs, referred to as Annex languages, that are attached to the components of the architectural model and reinforce them with additional characteristics and requirements. The architectural model components are annotated with these properties and Annex language clauses for functional and nonfunctional analyses. The Error Model Annex (EMV2), an extension of AADL, describes failure conditions and fault propagations in terms of error events, error propagations, and their occurrence and distribution properties. With these constructs integrated into the AADL model, as shown in Figure 1 [6], the existing components are extended into models suitable for safety evaluation and analysis. This can be done with the algorithms in OSATE or with third-party tools. EMV2 focuses on a standard set of error types and error propagations, which AADL defines as standard syntactic constructs through Annex libraries. These Annex libraries provide an overview of the formally specified error propagation behaviours [10,11]. Some of the common error types are as follows [9].
(2) Timing Errors. They represent errors in arrival rate, service delivered too early or too late, and unsynchronized rates.
(3) Value Errors. They represent an error in an individual service item or errors in a sequence of values.
(4) Replication Errors. They represent errors among replicates of states or services being communicated.
(5) Concurrency Errors. They represent errors in accessing shared logical or physical resources.
Along with these, the error model types can be referenced in the Error Model Annex subclause. The constructs of EMV2 follow the syntax and style defined for AADL. An exception is that any set of textual language constructs can be included within an Annex, for example the Object Constraint Language (OCL) [12] or a temporal logic notation [13].

Implementation of Proposed Research
In this section we demonstrate architecture fault modeling in AADL, extended with EMV2, at three levels of abstraction using a case study, the Speed Control Unit of a Power-Boat Autopilot (PBA). This unit is a simplified speed control model comprising a pilot interface unit for entering the relevant Power-Boat Autopilot information, a speed sensor that sends speed data to the PBA, the PBA controller, a throttle actuator that responds to the commands issued by the PBA controller, and a display unit. The type definitions of the components, their names, runtime categories, and interfaces are identified and defined. The speed sensor, pilot interface, throttle actuator, and display unit are modeled as devices, while the PBA control functions are represented as a process, as shown in Figure 2. On this basis we perform the safety analyses by specifying the sources of error and their propagation across the system and its components. This is carried out by defining the error states and their corresponding composite fault behaviour. The fault logic is then expanded with the error behaviour of each component of the system and its response to failures. The component error behaviour is also defined for the system components, corresponding to the faults that may occur. In this system, a NoValue error caused by a failure propagates from the pilot interface unit to the throttle actuator. The same error is conveyed to the status feature of the display unit. In addition to this fault, a second error, NoService, is propagated; it results in the Failed state of the system. The specification is automatically inherited by the instances of each component and their interacting neighbors.
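The propagation described above can be pictured as reachability over the component connections. The following Python sketch is illustrative only: the connection topology is an assumption (Figure 2 is not reproduced here), and the names `CONNECTIONS` and `affected_components` are hypothetical helpers, not part of AADL, EMV2, or OSATE.

```python
from collections import deque

# Assumed connection topology for the speed control unit. The edges below are
# a guess based on the component descriptions, not taken from Figure 2.
CONNECTIONS = {
    "pilot_interface": ["pba_controller"],
    "speed_sensor": ["pba_controller"],
    "pba_controller": ["throttle_actuator", "display_unit"],
    "throttle_actuator": [],
    "display_unit": [],
}


def affected_components(source, connections):
    """Return the components reachable from `source`, in breadth-first order."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for successor in connections.get(node, []):
            if successor not in seen:
                seen.add(successor)
                order.append(successor)
                queue.append(successor)
    return order


# A NoValue error originating in the pilot interface reaches these components.
impact = affected_components("pilot_interface", CONNECTIONS)
print(impact)
```

Under the assumed topology, an error that originates in the pilot interface reaches both the throttle actuator and the display unit through the controller, which matches the propagation described in the text.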
The error propagation paths inherent in such AADL system architecture models form the basis for Failure Mode and Effect Analysis (FMEA) and Common Cause Analysis (CCA).

(iii) Composite Error
[throttle.Failed and display unit inter .Failed] -> Failed.
We assume that the system fails if either of the devices, that is, the throttle actuator or the display unit, is in the Failed state. However, the system tends to recover from the Failed state and remains Operational even if the display unit fails, because the speed control unit depends mainly on the throttle command to maintain and control the speed of the PBA.
This provides scope for redundancy management as part of the fault management capability of the system, and for seeking more extensive solutions for reliability and availability analyses across the hierarchical levels of the system architecture. This methodology is not advisable for Markov chains, because the model grows quickly with the dependencies among components as the number of components in a system increases.
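As a minimal sketch, the composite expression quoted in this section can be evaluated over all combinations of subcomponent states, and the growth of a flat state space can be counted directly. The code takes the expression at face value (both subcomponents Failed); the two-state model and the names `system_state` and `global_state_count` are assumptions made for illustration.

```python
from itertools import product

# Assumed two-state model per subcomponent.
STATES = ("Operational", "Failed")


def system_state(throttle, display):
    """Composite rule: [throttle.Failed and display.Failed] -> Failed."""
    if throttle == "Failed" and display == "Failed":
        return "Failed"
    return "Operational"


# Enumerate every combination of (throttle, display) states.
table = {combo: system_state(*combo) for combo in product(STATES, repeat=2)}


def global_state_count(states_per_component, n_components):
    """Number of global states a flat Markov model would have to enumerate."""
    return states_per_component ** n_components


for combo, result in table.items():
    print(combo, "->", result)
print(global_state_count(2, 10))
```

The count grows exponentially with the number of components, which is the reason given above for avoiding a flat Markov chain formulation on larger architectures.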
(iv) Component Error Behaviour. The modeler has the flexibility to analyse the possible error behaviour of the individual components of a system. This also provides insight into the internal failures of a component and the factors that may lead to a failure mode, which in turn has an impact on other components. The case study in this paper specifies multiple failure modes, namely Failure and Failed. In the Failed mode the entire component is assumed to be redundant, while in the Failure mode the component is working but produces erroneous outputs or output states, as shown in Box 3. The failure modes are represented using error states together with the most likely coupled error behaviour of the subsystem or component. The consistency checker associated with the Error Model Annex abstracts the propagation specification to introduce unique and distinctive error types. While the modeling tool associated with the Error Model Annex validates the organization of the component error behaviour, along with the propagation specification of each component in the system architecture, the actual system architecture must include the Safety System components that regulate fault management and aid in safety analyses.
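The component error behaviour can be read as a small state machine. The sketch below is an assumption-laden illustration: the state names Operational, Failure, and Failed come from the text, but the event names (`degrade`, `fail`, `recover`) and the transition table are hypothetical and are not taken from Box 3.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("Operational", "degrade"): "Failure",   # still working, erroneous outputs
    ("Operational", "fail"): "Failed",       # component no longer provides service
    ("Failure", "fail"): "Failed",
    ("Failure", "recover"): "Operational",
}


def run(events, state="Operational"):
    """Apply a sequence of error events; events with no matching transition are ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


print(run(["degrade"]))
print(run(["degrade", "fail"]))
```

In this sketch Failed is absorbing, so a later `recover` event has no effect; a model with repair would add a transition out of Failed.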

Safety Analyses
Safety analyses involve various analytical processes applied to the architectural model, such as consistency checks, Fault Tree Analysis (FTA), Failure Modes and Effect Analysis (FMEA), Functional Hazard Assessment (FHA), and Common Mode Assessment (CMA). The architecture model and its associated fault model are designed and developed in the Open Source AADL Tool Environment (OSATE) [14], an Eclipse-based AADL modeling framework. A safety analysis tool such as OpenFTA [15] is also needed. OpenFTA, an open source tool for FTA, is integrated into the Eclipse environment to assist in generating the fault tree and its related documents, while the CMA, FMEA, and FHA reports are generated by built-in features of OSATE.
(i) Consistency Checks. The consistency checks at the system integration level examine the consistency of the functionality and of the interfaces between the various models and components, as shown in the "Consistency Report" section. This strengthens the virtual integration and analysis of the architecture model of the system. Consistency across models concerns the feasibility of integrating them, while consistency of the internal components of a model concerns propagation capabilities, redundancies, and so on. With the Error Model Annex, consistency across the error models is checked with respect to the component error behaviour together with the composite error behaviour of the system. This helps establish the correctness of the error states for the components specified in the architectural model. It may be further substantiated by including the Behaviour Annex (BAnnex) [16] along with the Error Model Annex. The consistency report generated by the OSATE plugin for the case study is as follows.

Consistency Report
Warning

(ii) Fault Tree Analysis (FTA). It is a safety and reliability analysis [17] widely used in the aerospace, medical electronics, and industrial automation industries [18]. The analysis focuses on the top-level event and on its minimal cut sets, that is, the minimal combinations of basic events (faults) that cause it. It provides a hierarchical, tree-shaped representation of how the errors of the system (the top-level event) arise from the basic events associated with the components, as specified in the component error behaviour. OSATE derives this composite error behaviour of the system from the underlying component error behaviours as a fault tree that represents a specific error state of the system. OSATE produces two files to represent the fault tree: the database of primary events (.ped) that cause the top-level error event, shown in Figure 4, and the Fault Tree Analysis file (.fta). These files are viewed using OpenFTA, as shown in Figure 3. The FTA conforms to the MIL-STD-882 standard, and the generated fault tree is validated, as shown in Figure 5.
The FTA artifacts specified by MIL-STD-882 deal with error composites and error events. FTA is a top-down approach to analysis. The minimal cut set is evaluated in the OpenFTA tool and is shown in Figure 6.
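To make the notion of a minimal cut set concrete, the following Python sketch expands a small fault tree of AND and OR gates into cut sets and discards the non-minimal ones. It is not the algorithm used by OpenFTA or OSATE; the tree encoding, the basic event names, and the functions `cut_sets` and `minimal` are all assumptions chosen for illustration.

```python
from itertools import product


def cut_sets(node):
    """Expand a fault tree into cut sets.

    A node is either a basic event (str) or a tuple (gate, children),
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child causes the gate output.
        return [cs for sets in child_sets for cs in sets]
    if gate == "AND":
        # One cut set from every child must occur together.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate: {gate}")


def minimal(sets):
    """Keep only cut sets that have no proper subset among the candidates."""
    unique = set(sets)
    kept = [s for s in unique if not any(other < s for other in unique)]
    return sorted(kept, key=lambda s: (len(s), sorted(s)))


# Hypothetical tree: the system is Failed when throttle and display are both Failed,
# and each of them fails on an internal fault or on a missing controller value.
TOP = ("AND", [
    ("OR", ["throttle_internal_fault", "no_value_from_controller"]),
    ("OR", ["display_internal_fault", "no_value_from_controller"]),
])

mcs = [sorted(s) for s in minimal(cut_sets(TOP))]
print(mcs)
```

In this hypothetical tree the shared event `no_value_from_controller` forms a cut set of size one, which is the kind of common cause that the analysis is meant to expose.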
(iii) Failure Modes and Effects Analysis (FMEA) and Functional Hazard Assessment (FHA). FMEA is the analysis of the failure modes associated with the system and the determination of their effects across the system hierarchy, performed systematically with a bottom-up approach. With respect to the errors of the system, FMEA provides information about the deficient components or models and their related effects. It also provides an overview of the failing component, such as its phase of failure, its severity or impact, and so on. FMEA is based on artifacts that include the error propagation paths (error source, error path, and error sink). FHA provides the list of possible errors derived from the architectural model of the system. The major artifacts from FHA comprise the source of the error and the error events, as shown in Table 1. The FHA details are produced by the OSATE tool after the model is instantiated and the relevant error information is extracted from the architecture models. The report takes the form of an Excel spreadsheet that specifies the error event details.
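The shape of such a report can be sketched as follows. This is not the OSATE export: the column names and the assignment of values to columns are assumptions, and `fha_csv` is a hypothetical helper. The severity and likelihood vocabularies follow the MIL-STD-882 category names, and the example row reuses the display unit entry reported for the case study.

```python
import csv
import io

# Assumed column layout; the real OSATE FHA spreadsheet has its own headers.
FIELDS = ["description", "cause", "severity", "likelihood", "comment"]

# MIL-STD-882 severity categories and probability levels.
SEVERITIES = ("Catastrophic", "Critical", "Marginal", "Negligible")
LIKELIHOODS = ("Frequent", "Probable", "Occasional", "Remote", "Improbable")

ROW = {
    "description": "Display unit not working properly",
    "cause": "Faulty display unit",
    "severity": "Marginal",
    "likelihood": "Remote",
    "comment": "Not a major hazard",
}


def fha_csv(rows):
    """Validate hazard rows and render them as CSV text."""
    for row in rows:
        if row["severity"] not in SEVERITIES:
            raise ValueError(f"unknown severity: {row['severity']}")
        if row["likelihood"] not in LIKELIHOODS:
            raise ValueError(f"unknown likelihood: {row['likelihood']}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


report = fha_csv([ROW])
print(report)
```

Validating the severity and likelihood values before writing keeps the report within the vocabulary of the safety standard, so that entries remain comparable across components.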

Conclusion
In this paper, we have proposed a novel approach to the safety analysis of safety-critical systems using AADL and the related Error Model Annex. Despite the extensive effort that safety analyses involve, such approaches prove to be necessary. This is demonstrated through the implementation of a suitable case study, the Speed Control Unit of a Power-Boat Autopilot. Analysis techniques such as Fault Tree Analysis (FTA), Functional Hazard Assessment (FHA), and model consistency checks, together with the qualitative and quantitative reliability analyses conducted as part of these techniques, can assess the system hazards and faults. The assessment covers the generation of suitable reports that justify the analyses. These techniques allow potential problems to be identified early and their probability of occurrence to be estimated. They also provide a perspective for exploring additional architectural properties. With this approach, the evolved models can be reused and analysed with limited effort, given suitable extensions. The overall effect is greater confidence in the abstracted stages of development and in the safety analyses of the architectural models of the system. In addition, analysing the system against its safety-critical requirements, in anticipation of exceptional conditions and hazards, expedites the development of safety system architecture models, which in turn has an impact on their certification. It also avoids unnecessary certification costs, because the impact of changes and of exceptional causes is understood during system engineering.

Table 1 (FHA report entry): "Display unit not working properly" | "Faulty display unit" | "Output" | Marginal | Remote | "Not a major hazard"