Integrated Deterministic-Probabilistic Safety Assessment (IDPSA) combines a deterministic model of a nuclear power plant with a method for exploring the uncertainty space. Such exploration generates a huge amount of data. It is very difficult to process these data “manually” and extract information that a decision maker can use for risk-informed characterization, understanding, and eventually decision making on improvement of the system safety and performance. Such understanding requires an approach for interpretation, grouping of similar scenario evolutions, and classification of the principal characteristics of the events that contribute to the risk. In this work, we develop an approach for classification and characterization of failure domains. The method is based on scenario grouping, clustering, and application of decision trees for characterization of the influence of timing and order of events. We demonstrate how the proposed approach is used to separate scenarios that are amenable to treatment with Boolean logic in classical Probabilistic Safety Assessment (PSA) from those where timing and order of events determine process evolution and eventually violation of safety criteria. The efficiency of the approach has been verified with application to the SARNET benchmark exercise on the effectiveness of hydrogen management in the containment.
Development of Deterministic Safety Analysis (DSA) and Probabilistic Safety Analysis (PSA) was a crucial step in establishing the state of the art in nuclear power safety design and licensing. However, in order to avoid stagnation, it is important to recognize the inherent limitations of the classical approaches and the new opportunities provided by the overall progress of risk analysis science and computational technologies. For instance, an advantage of DSA is that it can model the dynamics of the plant systems driven by physical phenomena and their response to failures of the equipment or operator actions. If the “worst” scenarios can be clearly identified, then conservative treatment of uncertainties in DSA can be employed to estimate safety margins. The number of scenarios considered in DSA is usually small with respect to the actual set of possible accident scenarios; thus the outcomes of DSA are largely affected by expert judgment. However, obtaining a priori knowledge about “worst” case scenarios and “conservative” assumptions about uncertain parameters for complex systems is not a trivial task. PSA attempts to cover all possible risk-significant scenarios. However, it is not easy to model an a priori unknown dependency of the accident scenario outcome on the order and timing of the events (e.g., due to the temporal evolution of the system parameters driven by complex physical processes and interactions) using the Boolean logic of classical PSA, where the result is unambiguously determined by a simple set of events. A robust safety justification must be based on both deterministic and probabilistic considerations to address the effects of the dynamic nature of mutual interactions between (i) stochastic disturbances (e.g., failures of the equipment), (ii) deterministic response of the plant (i.e., transients), (iii) control logic, and (iv) operator actions.
Passive safety systems, severe accidents, and containment phenomena are examples of cases where such dependencies of the accident progression on timing and order of events are especially important. Integrated use of deterministic and probabilistic safety analysis is a means to enable risk-informed decision making based on consistent evaluation of both the uncertainties arising from the stochastic nature of events (aleatory uncertainties) and those arising from lack of knowledge about the processes relevant to the system (epistemic uncertainties) [
Integrated Deterministic-Probabilistic Safety Assessment (IDPSA) methodologies aim to achieve completeness and consistency of the analysis through systematic consideration of different sources of uncertainties including physical processes, failures of hardware and software, and human actions. IDPSA tools usually employ (i) system simulation codes and models with explicit consideration of the effect of timing on the interactions between epistemic (modeling) and aleatory (scenario) uncertainties and (ii) a method for exploration of the uncertainty space. A review of the IDPSA methods for nuclear power plant applications can be found in [
For decision making, however, it is often insufficient to merely calculate a quantitative measure for the risk and respective uncertainties [
The goal of this work is to develop methods that will enable understanding of the outcomes of IDPSA analysis while maintaining completeness. To achieve that, the methods should reduce the volume of the data generated by IDPSA tools without loss of information that is important for decision making. The strategy for the reduction of the data volume is based on (i) grouping of scenarios into “classes” according to their failure modes and (ii) identification of the scenarios that have “similar” behavior (clustering) within each class. The condensed information should provide useful insights into the complex accident progression and understanding of possible mitigation strategies.
In this work we develop an approach for classification and characterization of failure domains. A failure domain is a domain in the space of uncertain parameters where critical system parameters exceed safety thresholds. The approach is based on scenario grouping and clustering with application of decision trees for characterization of the influence of timing and order of the events. In this approach decision trees are constructed to represent the failure domain as a set of leaf nodes and the corresponding classification rules that lead to each node. The approach was applied to classification of the simulated transients and to failure domain identification and characterization in the SARNET benchmark exercise [
In this paper we extend our previous work [
In Section
Methodologies that take into account uncertainty in the timing of events can produce a potentially unlimited number of transient scenarios for a single initiating event. For decision making, handling such a huge amount of data is a challenge. The development of insights and understanding requires interpretation of the scenario evolutions in order to identify the principal characteristics of the events that contribute to the risk. To solve this problem we develop an approach based on clustering and decision trees for explaining the structure of the clustered data (see Figure
Grouping and classification approach.
The main steps of this approach are briefly explained below. Firstly, the scenario grouping is performed (see Section
Next, Principal Component Analysis (PCA) [
System codes are used in IDPSA to evaluate the temporal evolution of the accident progression for different time-dependent sequences of events such as activation or failure of safety systems (e.g., reactor protection system and emergency core cooling system). The main purpose of scenario grouping is to identify and separate sequences of events that can be treated in classical PSA, that is, those where order and timing of events have no effect on the outcome (safe or failure end state). The approach is represented in Figure
Scenario grouping algorithm.
The numeric algorithm used in scenario grouping is similar to those used in sequence pattern analysis [
The sets of events that always lead to either failure or safe condition are identified for further treatment in PSA. If the same set of events can lead to both failure and safe states it means that timing and/or order of events can be important. Such sets of events are treated further in Steps (2) and (3).
The sequences of events which always lead to either failure or safe condition are identified. If the same sequence of the events can lead to both failure and safe conditions it is a sign that the influence of timing of the events is important.
The sequences of events whose outcome depends on the timing of the events and on parameter uncertainty require dynamic treatment and are considered further in the following steps of the analysis, that is, PCA and data transformation, scenario clustering, and so forth (see also Figure
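The grouping steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `group_scenarios`, the (events, outcome) data layout, and the toy scenarios are hypothetical; only the event names (INI, SIS, CHRS, IGNI) come from the benchmark description.

```python
from collections import defaultdict


def group_scenarios(scenarios):
    """Classify event sets and event sequences by the outcomes they produce.

    `scenarios` is an iterable of (events, outcome) pairs, where `events` is a
    time-ordered tuple of event names and `outcome` is "safe" or "failure".
    """
    by_set = defaultdict(set)       # unordered set of events -> outcomes seen
    by_sequence = defaultdict(set)  # ordered sequence of events -> outcomes seen
    for events, outcome in scenarios:
        by_set[frozenset(events)].add(outcome)
        by_sequence[tuple(events)].add(outcome)

    def label(outcomes):
        # A single observed outcome means the group is unambiguous;
        # otherwise order and/or timing may matter ("mixed").
        return next(iter(outcomes)) if len(outcomes) == 1 else "mixed"

    set_labels = {key: label(val) for key, val in by_set.items()}
    sequence_labels = {key: label(val) for key, val in by_sequence.items()}
    return set_labels, sequence_labels


# Hypothetical toy data, invented for illustration only.
scenarios = [
    (("INI", "IGNI"), "failure"),
    (("INI", "IGNI"), "failure"),
    (("INI", "SIS", "IGNI"), "failure"),
    (("INI", "SIS", "IGNI"), "safe"),
    (("INI", "IGNI", "SIS"), "failure"),
    (("INI", "CHRS"), "safe"),
]
set_labels, sequence_labels = group_scenarios(scenarios)
```

In this toy data the set {INI, SIS, IGNI} is "mixed", so it would be passed on to the sequence level; there the ordered sequence (INI, SIS, IGNI) is still "mixed", which is the signal that timing matters and that the dynamic steps are needed.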
Principal Component Analysis (PCA) is a technique for revealing the relationships between variables in a data set by identifying and quantifying a group of principal components. These principal components are composed of transformations of specific combinations of input variables that relate to a given output (or target) variable [
The main purpose of application of PCA in the classification approach is to transform the data without rescaling into a new orthogonal coordinate system that optimally describes the variance in a single dataset. The data transformation is defined by
Clustering analysis is the task of grouping a set of objects such that members of one group (cluster) are more similar to each other, according to specific criteria, than to members of other groups. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. There are several clustering algorithms that methodologically can be separated into connectivity models (hierarchical clustering [
Grid-based clustering methods partition the space into a finite number of cells that form a grid structure on which all of the operations for clustering are carried out. The main advantage of the approach is its computational efficiency [
Given a set of
Once the grid is defined, the algorithm looks for clusters of cells that contain failure scenarios of the same failure mode. Two cells can form a cluster if they have a common face. The algorithm thus presents a large number of scenarios with different failure modes as a finite number of cells grouped into clusters corresponding to the same failure mode.
A grid-based clustering algorithm performs orthogonal partitioning of the uncertainty space, similar to the partitioning of the learning data set in a decision tree. Therefore, the complexity of the decision trees can be reduced significantly by using the clustering results rather than the raw scenario data.
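The face-adjacency rule can be illustrated with a small connected-component search. The sketch below assumes a uniform grid in which each cell is identified by a tuple of integer indices; it does not reproduce adaptive mesh refinement, and the function name `cluster_cells` and the example cells are hypothetical.

```python
from collections import deque


def cluster_cells(cell_modes):
    """Group face-adjacent cells with the same failure mode into clusters.

    `cell_modes` maps an integer index tuple (i, j, ...) to a failure-mode
    label. Returns a list of (mode, set_of_cells) clusters.
    """
    unvisited = set(cell_modes)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        mode = cell_modes[seed]
        members = {seed}
        queue = deque([seed])
        while queue:
            cell = queue.popleft()
            # Face neighbours differ by +/-1 along exactly one axis.
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in unvisited and cell_modes[nb] == mode:
                        unvisited.remove(nb)
                        members.add(nb)
                        queue.append(nb)
        clusters.append((mode, members))
    return clusters


# Hypothetical 2D example with two failure modes "A" and "B".
cells = {
    (0, 0): "A", (0, 1): "A", (1, 1): "A",  # one cluster of three cells
    (2, 2): "A",                            # touches (1, 1) only at a corner
    (1, 0): "B", (2, 0): "B",               # one cluster of two cells
}
clusters = cluster_cells(cells)
```

Cells that touch only at a corner or edge are deliberately kept apart, which matches the common-face criterion; the same loop works unchanged for three or more uncertain parameters because the neighbour search iterates over all axes.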
A decision tree is a classification and data-mining tool for extraction of useful information contained in large data sets. An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, and then moving down the tree branch corresponding to the value of the attribute in the given example. This process is then repeated recursively for the subtree rooted at the new node until no further branching in the tree can be made or some preset stopping conditions are met [
Most algorithms that have been developed for learning decision trees are variations on a core algorithm that employs a top-down, greedy search through the space of possible decision trees [
The Gini impurity index (commonly used in CART) at node
The Gini criterion for split at
The split
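As an illustration of how a Gini-based split is chosen, the sketch below uses the textbook CART definition of Gini impurity, 1 - sum_k p_k^2, and scores a candidate cut by the size-weighted impurity of the two child nodes. The notation and exact criterion used in this paper may differ; the function names and the toy data (hypothetical water injection times with invented outcomes) are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best cut `value <= t`.

    The threshold is None if no cut improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: event times in seconds and invented outcomes.
times = [6000, 7000, 8000, 12000, 13000, 14000]
outcomes = ["failure", "failure", "failure", "safe", "safe", "safe"]
threshold, impurity = best_split(times, outcomes)
```

A greedy tree builder would apply `best_split` to every attribute at a node, keep the attribute and threshold with the lowest weighted impurity, and recurse on the two resulting subsets.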
The failure domain is represented by agglomerations (clusters) of nonoverlapping grid cells in the uncertainty space. If all points in the uncertainty space are equally probable, then the probability of the failure domain is the ratio of the volume of the failure domain to the total volume of the uncertainty space.
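For the equally probable case, the volume ratio can be computed directly from the cell boundaries. The following is a minimal sketch assuming axis-aligned, nonoverlapping cells described by (low, high) intervals; the function name and the numbers are hypothetical.

```python
from math import prod


def failure_probability(failure_cells, bounds):
    """Volume of the failure cells divided by the volume of the whole space.

    Each cell and `bounds` are sequences of (low, high) intervals, one per
    uncertain parameter; cells are assumed not to overlap.
    """
    def volume(box):
        return prod(high - low for low, high in box)

    return sum(volume(cell) for cell in failure_cells) / volume(bounds)


# Hypothetical 2D uncertainty space of size 4 x 2 with three unit failure cells.
bounds = [(0.0, 4.0), (0.0, 2.0)]
failure_cells = [
    [(0.0, 1.0), (0.0, 1.0)],
    [(1.0, 2.0), (0.0, 1.0)],
    [(2.0, 3.0), (1.0, 2.0)],
]
p_fail = failure_probability(failure_cells, bounds)
```

Because the cells may have different sizes, the same function also applies to a refined grid, as long as the cells do not overlap.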
A decision tree represents the failure domain by the final nodes in the tree and the respective classification rules that lead to these nodes. The probability of each cell can be obtained as the average probability of the scenarios contained in the corresponding cell:
In order to illustrate the proposed approach we chose a benchmark exercise developed in the framework of the SARNET [
The exercise is based on a hypothetical accident transient in a typical French 900 MWe PWR (3 loops, with Passive Autocatalytic Recombiners, PAR).
The transient description is as follows:
Loss of coolant accident (LOCA) with a 3″ break size on the cold leg of the Reactor Coolant System (RCS) (INI, initiating event).
The Safety Injection System (SIS) and Containment Heat Removal System (CHRS or spray system) are not available until the beginning of core dewatering.
The steam generators are available but not used by the operators.
No water injection (SIS) occurring before core dewatering.
The reactor operating at nominal power before the initiating event.
The calculated core dewatering occurring at 4080 s (1 h 08 min); the vessel rupture occurring at 14220 s (3 h 57 min) if no action is undertaken.
A water injection (SIS) means is available (with an “average” flow rate) and can be used by the operators.
The spray system (CHRS) is available and can be used by the operators.
Water injection after the beginning of clad oxidation causes an increase of the hydrogen flow rate towards containment.
Hydrogen combustions (hereafter called IGNI event) can occur if the containment gas mixture is flammable; recombiners, because of their high temperature, can initiate a combustion; such combustions can be total (all the hydrogen in the containment is burnt) or not.
Shapiro diagram [
Table
Limit for inflammability.
Molar fraction of H_{2}O, %  Inflammability limit for H_{2} molar fraction, % 

0  4 
10  4.5 
20  5.5 
30  6.7 
40  8.1 
50  10.1 
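The tabulated limits can be turned into a simple flammability check. The sketch below copies the values from the table and interpolates linearly between them; the linear interpolation, the use of `>=` at the limit, and the function names are assumptions made for illustration and are not stated in the benchmark specification.

```python
from bisect import bisect_right

# (H2O molar fraction %, H2 flammability limit %) pairs copied from the table.
LIMITS = [(0, 4.0), (10, 4.5), (20, 5.5), (30, 6.7), (40, 8.1), (50, 10.1)]


def h2_limit(h2o_percent):
    """Piecewise-linear H2 flammability limit (%) for a steam fraction (%)."""
    xs = [x for x, _ in LIMITS]
    if not xs[0] <= h2o_percent <= xs[-1]:
        raise ValueError("steam fraction outside the tabulated range")
    # Index of the right end of the interval that contains h2o_percent.
    i = min(bisect_right(xs, h2o_percent), len(xs) - 1)
    (x0, y0), (x1, y1) = LIMITS[i - 1], LIMITS[i]
    return y0 + (y1 - y0) * (h2o_percent - x0) / (x1 - x0)


def is_flammable(h2_percent, h2o_percent):
    """True if the H2 molar fraction reaches the interpolated limit."""
    return h2_percent >= h2_limit(h2o_percent)
```

For example, at a steam fraction of 25% the interpolated limit lies halfway between 5.5% and 6.7%, so a mixture with 7% hydrogen would be classified as flammable and one with 5% would not. Steam fractions above 50% are rejected rather than extrapolated, because the table gives no data there.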
The probability that water injection is available between total core uncovery (5875 s) and vessel rupture (14220 s) is 0.5. The probability of water injection initiation timing is uniformly distributed in the time interval between total core uncovery and vessel rupture.
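The water injection assumption can be sampled as follows. This sketch reads the statement above as: with probability 0.5 the injection is available, and if it is, its start time is uniform between total core uncovery and vessel rupture. That conditional reading, the function name, and the sample size are assumptions.

```python
import random

T_UNCOVERY = 5875.0   # s, total core uncovery (from the benchmark description)
T_RUPTURE = 14220.0   # s, vessel rupture if no action is undertaken


def sample_sis_time(rng):
    """Return the SIS start time in seconds, or None if SIS is unavailable."""
    if rng.random() >= 0.5:  # availability probability is 0.5
        return None
    return rng.uniform(T_UNCOVERY, T_RUPTURE)


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
samples = [sample_sis_time(rng) for _ in range(10_000)]
available = [t for t in samples if t is not None]
fraction_available = len(available) / len(samples)
```

Each sampled time (or `None`) would then be passed to the deterministic model as one point of the uncertainty space.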
If hydrogen concentration (
If
If
Performing grouping analysis we identified the following possible sequences of the events:
Containment failure probabilities.
Sequence  Containment failure probability 


0.51379 

0.07221 

0.00189 
Scenario Grouping.
The advantage of using PCA and the coordinate system defined by the principal components of the failure domain is that it significantly reduces the complexity of the decision tree. In the transformed coordinate system the decision tree was able to characterize almost 50% of the data set, separating the major part of the failure scenarios from the safe scenarios in only 2 cuts. The results can be transferred back into the original coordinate system simply by inverting (
Figures
Cluster representation of the failure domain (red) and safety domain (green) for the sequence
Cluster representation of the failure domain (red) and safety domain (green) for the sequence
Containment failure probability distribution for sequence
Different values of probabilities in the different parts of the failure domain correspond to different H_{2} concentrations and respective probability distributions for the time delays of ignition event [
H_{2} molar fraction (%) and H_{2} inflammability and ignition limits (%).
H_{2} molar fraction (%) and H_{2} inflammability and ignition limits (%).
Failure domain structure can be represented using clustering data and decision tree. To illustrate the approach and to provide a possibility to compare failure domains, presented in Figures
After computing an exhaustive tree, the algorithm eliminates nodes that do not contribute to the overall prediction, decided by another essential ingredient, the cost of complexity. This measure is similar to other cost statistics, such as Mallows’
Decision tree results for the sequence
The pruning (cutting) of the decision trees is done at the point where further refinement would not improve the results but would increase the complexity of the decision tree. Decision trees (Figures
Decision tree fitted into clustering results data for the sequence
Decision tree fitted into clustering results data for the sequence
Let us consider as an example of sequence
Cluster representation of the failure domain (red) and safety domain (green) for the sequence
When it comes to decision support, the H_{2} ignition event (IGNI) in this sequence is an entirely stochastic event; that is, the operator has no control over it. On the contrary, the water injection (SIS) and containment spray (CHRS) systems can be actuated by the operator at a specified moment in time and, therefore, they are controllable. Decision trees can be used to build a decision support model based on the controllable events; that is, decision trees can help us to answer the question “what can be done in case of a LOCA initiating event to avoid containment failure?”. Figure
Cluster representation of the failure domain (red) and safety domain (green) for the sequence
Decision tree fitted into clustering results data for the sequence
Decision tree fitted into clustering results data for the sequence
In this work we present an approach for grouping and classification of typical “failure/safe” scenarios identified using IDPSA methods. This approach separates scenarios that are directly amenable to treatment in classical PSA from scenarios where order of events, timing, and parameter uncertainty affect the system evolution and determine violation of safety criteria.
We use grid-based clustering with AMR and decision trees for characterization of the failure domain. Clustering analysis is used to represent the failure domain as a finite set of representative scenarios. Decision trees are used to visualize the structure of the failure domain. Decision trees can be applied to cases where four or more uncertain parameters are included in the analysis and it is difficult to visualize the results in three-dimensional space.
The proposed approach helps to present the results of IDPSA analysis in a transparent and comprehensible form, amenable to consideration in the decision-making process. Useful insights into the complex accident progression logic can be obtained and used for development of understanding and mitigation strategies for plant accidents, including severe accidents. The insights can be employed to reduce unnecessary conservatism and to point out areas with insufficient conservatism in deterministic analysis. Results of the analysis can also be used to facilitate the connection between classical PSA and IDPSA analysis.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This study was supported by the Swedish Radiation Safety Authority (SSM). The authors are grateful to Dr. Wiktor Frid (SSM) for very useful discussions.