Reliability Assessment for a Safety-Related Digital Reactor Protection System Using Event-Tree/Fault-Tree (ET/FT) Method

The aim of this study is to verify whether the reliability of a digital four-channel reactor protection system (RPS) in the design phase satisfies the specified target, and to identify weaknesses in the system design and potential solutions for improving system reliability. The event-tree/fault-tree (ET/FT) method, which is used in the current probabilistic safety assessment (PSA) framework of nuclear power plants (NPPs), was adopted to develop reliability models of the RPS, with the Top Events defined as the system failing to generate a reactor trip signal and the system generating a spurious trip signal. The evaluation results indicate that the probability of system failure on demand is 1.47 × 10⁻⁶ with a 95% upper bound of 4.63 × 10⁻⁶, and the frequency of spurious trip signal generation is 7.94 × 10⁻⁴/year with a 95% upper bound of 2.50 × 10⁻³/year. Importance and sensitivity analyses were conducted, and it was found that undetected unsafe common cause failures (CCFs) of signal conditioning modules (SCMs) dominate the system reliability. Two preliminary optimization schemes, reducing the periodic test interval and adopting two kinds of diverse SCMs, were proposed. Quantitative evaluation of the schemes shows that neither of them could definitively improve the system reliability to the target level. In the future, more detailed optimization analysis will be required to determine a feasible system design optimization scheme.


Introduction
The reactor protection system (RPS) is one of the most important safety-related systems in nuclear power plants (NPPs). It protects the integrity of the safety barriers of NPPs by generating signals to scram the reactor or actuate engineered safety features when necessary. Clearly, the reliability of the RPS has an important impact on plant safety and should be demonstrated to satisfy a certain level. With the rapid development of computer technology, digital technologies, which can potentially improve system reliability through features such as online self-diagnosis, are gradually being adopted in the RPS [1]. It is therefore necessary to develop reliability models for digital RPSs and integrate them into the probabilistic safety assessment (PSA) of NPPs.
So far there is no consensus on methods for reliability modeling of digital systems in NPPs. Although some dynamic methods with great potential, for example, the dynamic flowgraph methodology, have been proposed, they are still in the trial phase [2,3]. Furthermore, applying a dynamic method requires substantial effort, and such methods generally suffer from incompatibility with the existing PSA framework. From this viewpoint, the ET/FT method, which has a mature theory and is easy to use, has attracted much attention, has been used in research on reliability assessment of digital systems in NPPs, and has yielded satisfactory results [4][5][6][7].
In this paper, the ET/FT method was used to perform a reliability assessment of a digital four-channel RPS in the design phase; the main contributors to system risk were identified by importance and sensitivity analyses, and two preliminary schemes for system design optimization were proposed and quantitatively evaluated.

Target System Description
The present paper estimates the reliability of a digital four-channel RPS during the design phase, with the intention of verifying whether it satisfies the specified reliability goal and of obtaining meaningful risk information about the system for design improvement. The reliability goal for the RPS specified by the system requirement specification is as follows: (i) the probability of failure to generate a reactor trip signal should be equal to or less than 10⁻⁷ per demand; (ii) the frequency of generation of a spurious trip signal should be equal to or less than 0.1/year. The schematic diagram of the four-channel RPS is provided in Figure 1. The system includes four channels (i.e., IP, IIP, IIIP, and IVP). Each channel consists of two functionally diverse subchannels (subchannel-1 and subchannel-2), and the eight subchannels constitute subsystem-1 and subsystem-2. Each subchannel (see Figure 2) contains three types of signal conditioning modules (SCMs), namely analog signal conditioning modules (ACM), digital signal conditioning modules (DCM), and thermocouple signal conditioning modules (TCM); two types of input modules, namely analog input modules (AI) and digital input modules (DI); input/output extended modules (EXT); digital output modules (DO); processor modules (CPU); and communication modules (COM). Among them, the CPU and COM modules are configured as hot-standby redundant pairs. Conditioning modules condition, isolate, and distribute signals from sensors. The AI and DI modules convert the signals into numerical format and then transmit them to the CPUs through the EXT. In a subchannel, the CPU compares the signals with the predefined setpoint values and generates a local coincidence signal (LCS) if a threshold is reached; using the threshold judgment results of the other three subchannels transmitted by the COM, the CPU performs two-out-of-four voting logic and generates a trip signal when there are two or more LCSs.
The output signals of subchannel-1 and subchannel-2 of each channel are combined through an OR gate and then open one pair of reactor trip breakers. If two of the four pairs of reactor trip breakers open, the reactor is shut down.
For most design basis accidents, two diverse kinds of sensor signals are used to generate shutdown signals, and signals without diversity are transmitted to both subchannels through the SCMs.
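The trip logic described above (two-out-of-four voting in each subchannel CPU, an OR of the two subchannels per channel, and two-out-of-four on the breaker pairs) can be sketched as a small Boolean model. This is a minimal illustration only, assuming ideal communication so that every subchannel of a subsystem votes on the same four LCSs; the function names are hypothetical and are not part of the described system.

```python
from typing import List, Sequence


def two_out_of_four(inputs: Sequence[bool]) -> bool:
    """Return True when at least two of exactly four Boolean inputs are True."""
    if len(inputs) != 4:
        raise ValueError("expected exactly four inputs")
    return sum(inputs) >= 2


def subsystem_trips(lcs: Sequence[bool]) -> List[bool]:
    """Trip outputs of the four subchannels of one subsystem.

    Assumption (hypothetical simplification): communication is healthy, so each
    subchannel CPU votes on the same four LCSs (its own plus three via COM).
    """
    vote = two_out_of_four(lcs)
    return [vote] * 4


def reactor_trip(lcs_subsystem1: Sequence[bool], lcs_subsystem2: Sequence[bool]) -> bool:
    """Return True if at least two of the four breaker pairs open."""
    sub1 = subsystem_trips(lcs_subsystem1)
    sub2 = subsystem_trips(lcs_subsystem2)
    # Each channel ORs the outputs of its subchannel-1 and subchannel-2.
    breaker_pair_open = [s1 or s2 for s1, s2 in zip(sub1, sub2)]
    # Reactor shutdown requires two of the four breaker pairs to open.
    return two_out_of_four(breaker_pair_open)


if __name__ == "__main__":
    # Two LCSs in subsystem-1 suffice, even if subsystem-2 sees none.
    print(reactor_trip([True, True, False, False], [False] * 4))  # True
    # A single LCS in each subsystem does not satisfy 2oo4 voting.
    print(reactor_trip([True, False, False, False], [True, False, False, False]))  # False
```

The functional diversity between the subsystems shows up in the OR stage: either subsystem alone can open all four breaker pairs.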

Model Development.
The present paper focuses on the safety function of the RPS to generate a reactor trip signal. Two failure modes of the system are considered: failing to generate a reactor trip signal and generating a spurious trip signal.
To envelop situations with different numbers of acquired signals and to obtain conservative calculation results, the Top Events are defined as follows, based on the principle of function allocation in the system: (i) Failing to generate a reactor trip signal on demand with three sensor signals without diversity (RT 3IN FD).
(ii) Generating a spurious trip signal with one sensor signal with diversity (RT 1IN ST).
Since the component configurations for different types of measurement signals differ only in the conditioning and input modules, the analog signal is selected to develop a case model, and only simple modifications are needed for digital or thermocouple signals. The analysis is based on the following assumptions: (i) The analysis places emphasis on the digital system itself; failures of sensors, reactor trip breakers, and their associated relays are not considered.
(ii) Loss of power supply to functional modules would cause their unavailability and would thus hinder the implementation of the preset safety functions of the system. However, since not enough information about the power supply system was available at the time of this analysis, and its complete failure probability is believed to be very low because the power supply of a cabinet generally has a triple-redundant configuration, the power supply system is excluded from the model in the present paper. (iv) Faults in different software modules of the digital system may result in different failures. From the point of view of modeling convenience, however, software failures can be divided into two categories, depending on whether the failure affects a single module or causes the simultaneous failure of multiple modules, in the same way as a CCF.
Examples of these failure categories may be faults in application software and faults in the software functional requirements specification. Because of ongoing debates on the applicability of current quantitative software reliability methods and the lack of data and information on the system software, software was not modeled in the current study. (v) Human errors are assumed to have no effect on the generation of the automatic signal, and human reliability analysis is out of scope. (vi) To be conservative in terms of reliability, it is assumed that once the failure of a module is detected, repair activity occurs and results in the unavailability of the module. (vii) According to the maintainability and availability requirements of the RPS specified in the system requirement specification, the mean time to repair (MTTR) and periodic test interval (TI) for modules are assumed to be four hours and six months, respectively.

Quantitative Analysis.
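The reliability models used for basic events in the quantitative analysis include a repairable component model for detected failures and a periodically tested component model for undetected failures. The unavailability $Q(t)$ of the repairable component is modeled by

$$Q(t) = \frac{\lambda}{\lambda + \mu}\left[1 - e^{-(\lambda + \mu)t}\right],$$

where $\lambda$ and $\mu$ ($= 1/\mathrm{MTTR}$) are the failure rate and repair rate of the component, respectively. The long-term unavailability of the component is $Q = \lambda/(\lambda + \mu)$. The unavailability $Q(t)$ of the periodically tested component is modeled by

$$Q(t) = 1 - e^{-\lambda t}, \quad 0 \le t < \mathrm{TI},$$

where $\lambda$ and $\mathrm{TI}$ are the failure rate and test interval of the component, respectively, and $t$ is measured from the most recent test. The mean unavailability of the periodically tested component over one test interval is

$$\bar{Q} = 1 - \frac{1 - e^{-\lambda \mathrm{TI}}}{\lambda \mathrm{TI}} \approx \frac{\lambda \mathrm{TI}}{2}.$$

The failure data of the modules constituting the system were derived from the results of failure modes, effects, and diagnostic analysis (FMEDA) of the modules. The parameters mainly include the detected failure rate ($\lambda_{D}$), undetected safe failure rate ($\lambda_{US}$), and undetected unsafe failure rate ($\lambda_{UU}$) of the modules, as shown in Table 1.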
Two types of CCFs of the modules were considered: (1) CCFs of modules with hot-standby configuration in the same subchannel and (2) CCFs of identical modules of the four channels in the same subsystem. They are modeled by the beta-factor model and the Multiple Greek Letter (MGL) model, respectively [9]. The parameters of the CCF models used in this analysis are shown in Table 2.
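The two CCF parameterizations can be sketched with the standard beta-factor and MGL relations, where $Q_t$ is the total failure probability of one component and $Q_k$ is the probability that a specific set of $k$ components in a group of $m$ fails from a common cause. The parameter values in the example are hypothetical and are not those of Table 2.

```python
import math
from typing import Dict, Tuple


def beta_factor(q_total: float, beta: float) -> Tuple[float, float]:
    """Split Q_t into (independent, common-cause) parts for a redundant pair."""
    return (1.0 - beta) * q_total, beta * q_total


def mgl_probabilities(q_total: float, beta: float, gamma: float,
                      delta: float, m: int = 4) -> Dict[int, float]:
    """MGL model: Q_k = (rho_1*...*rho_k)(1 - rho_{k+1}) Q_t / C(m-1, k-1).

    rho_1 = 1, rho_2 = beta, rho_3 = gamma, rho_4 = delta, rho_{m+1} = 0.
    """
    if not 2 <= m <= 4:
        raise ValueError("this sketch supports group sizes 2 to 4")
    rho = [1.0, beta, gamma, delta][:m] + [0.0]
    q = {}
    for k in range(1, m + 1):
        prod = 1.0
        for i in range(k):  # rho_1 * ... * rho_k
            prod *= rho[i]
        q[k] = prod * (1.0 - rho[k]) * q_total / math.comb(m - 1, k - 1)
    return q


if __name__ == "__main__":
    # Hypothetical values: Q_t = 1e-4, beta = 0.05, gamma = 0.5, delta = 0.8
    q = mgl_probabilities(1.0e-4, 0.05, 0.5, 0.8)
    print(q[4])  # simultaneous failure of all four channels' modules: beta*gamma*delta*Q_t
```

A useful check is that, summed over all failure combinations involving a given component, the $Q_k$ recover $Q_t$; for $m = 2$ the MGL model reduces to the beta-factor model.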
Parameter uncertainty was considered in the analysis. Because of recognized weaknesses in the data, large error factors (EFs) were assumed for the parameters, that is, 5 for failure rates and 3 for β, γ, and δ [10]. In addition, the parameters were assumed to be lognormally distributed. The propagation of parameter uncertainties to the system failure probability was evaluated. The calculation results for the three types of signals are shown in Tables 3 and 4. The results indicate that when the input scram parameters are thermocouple signals, the probability of the RPS failing to generate a trip signal on demand is 1.47 × 10⁻⁶ with a 95% upper bound of 4.63 × 10⁻⁶ when CCFs are considered, which is larger than for the other two types of signals. If the contributions of CCFs are ignored, this value is 2.12 × 10⁻¹¹ with a 95% upper bound of 3.83 × 10⁻¹⁰. For the same signal type, the frequency of the system generating a spurious trip signal is 7.94 × 10⁻⁴/year with a 95% upper bound of 5.71 × 10⁻³/year when the FT model includes CCFs, which is also larger than for the other two types of signals. When CCFs are excluded from the system reliability model, the frequency is 2.70 × 10⁻⁵/year with a 95% upper bound of 1.41 × 10⁻⁴/year. Taking CCFs into account, the system reliability does not fulfill the specified reliability goal (see section 2) with regard to the probability of failure on demand of the system function. The results make it clear that CCFs of modules are the main contributors to system failure; this is consistent with the consensus that redundant multichannel safety-critical protection systems are remarkably affected by CCFs [4,11].
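The lognormal treatment can be illustrated with a small Monte Carlo sketch. For a lognormal distribution the EF is the ratio of the 95th percentile to the median, so $\sigma = \ln(\mathrm{EF})/1.645$. The sketch assumes, hypothetically, that the point values are the medians (the text does not say whether they are means or medians), and it propagates uncertainty through a single hypothetical CCF basic event $Q = \beta \lambda_{UU} \mathrm{TI}/2$ rather than through the full fault tree.

```python
import math
import random
import statistics

Z95 = statistics.NormalDist().inv_cdf(0.95)  # standard normal 95th percentile, about 1.645


def sample_lognormal(median: float, ef: float, rng: random.Random) -> float:
    """Draw from a lognormal with the given median and error factor EF = x95/x50."""
    sigma = math.log(ef) / Z95
    return rng.lognormvariate(math.log(median), sigma)


def propagate(n: int = 20000, seed: int = 1):
    """Monte Carlo propagation for one hypothetical event Q = beta * lam_UU * TI / 2.

    Returns (median, mean, 95th percentile) of the sampled Q.
    """
    lam_median, beta_median = 1.0e-7, 0.05  # hypothetical medians, not study data
    ti = 4380.0                              # six months, in hours
    rng = random.Random(seed)
    qs = []
    for _ in range(n):
        lam = sample_lognormal(lam_median, 5.0, rng)               # EF = 5 for failure rates
        beta = min(sample_lognormal(beta_median, 3.0, rng), 1.0)   # EF = 3 for beta, capped at 1
        qs.append(beta * lam * ti / 2.0)
    p95 = statistics.quantiles(qs, n=20)[-1]  # last of 19 cut points = 95th percentile
    return statistics.median(qs), statistics.fmean(qs), p95


if __name__ == "__main__":
    median, mean, p95 = propagate()
    print(f"median={median:.3e}  mean={mean:.3e}  95th={p95:.3e}")
```

Because the product of independent lognormals is lognormal, the median of $Q$ equals the product of the medians, while the mean and the 95th percentile are pushed upward by the combined spread; the same mechanism separates the point estimates and 95% upper bounds reported above.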

Importance and Sensitivity Analysis
From the perspective of safety, the probability of system failure on demand to generate a trip signal is of greater concern in PSA. Importance and sensitivity analyses were therefore performed to identify the significant factors that contribute to the failure on demand of the RPS (with the analog signal selected as the case study). The factors include individual basic events (BEs), input parameters (e.g., failure rates), and components (modules of the RPS). Commonly used importance measures include Fussell-Vesely (FV), risk decrease factor (RDF), and risk increase factor (RIF).
Science and Technology of Nuclear Installations
The FV importance of factor i (related to an individual BE or to multiple BEs constituting a group) represents the contribution of the factor to the system risk and is defined as

$$FV_i = \frac{Q_{\mathrm{Top},i}}{Q_{\mathrm{Top}}},$$

where $Q_{\mathrm{Top}}$ is the probability of the Top Event and $Q_{\mathrm{Top},i}$ is the probability of the Top Event calculated based only on all minimal cut sets that include BEs related to factor i. The RDF of factor i indicates the decrease in system risk assuming the nonoccurrence of the BEs related to the factor. Mathematically, it is calculated as

$$RDF_i = \frac{Q_{\mathrm{Top}}}{Q_{\mathrm{Top},p(i)=0}},$$

where $Q_{\mathrm{Top},p(i)=0}$ is the probability of the Top Event assuming that the probabilities of the BEs related to factor i are zero.
RIF is the opposite of RDF; it expresses the increase in system risk assuming that the BEs related to the factor certainly occur. It is expressed as

$$RIF_i = \frac{Q_{\mathrm{Top},p(i)=1}}{Q_{\mathrm{Top}}},$$

where $Q_{\mathrm{Top},p(i)=1}$ is the probability of the Top Event assuming that the probabilities of the BEs related to factor i are one. The sensitivity of the Top Event probability to factor i (related to an individual BE or multiple BEs) is defined as

$$S_i = \frac{Q_{\mathrm{Top},U}}{Q_{\mathrm{Top},L}},$$

where $Q_{\mathrm{Top},U}$ and $Q_{\mathrm{Top},L}$ are the probabilities of the Top Event obtained when the probabilities of the BEs related to factor i are multiplied and divided by a sensitivity factor (SF), respectively. When the analysis object is an input parameter, these two quantities represent the probabilities of the Top Event when the parameter itself is multiplied and divided by SF, respectively. In this analysis, SF is set to 10. The importance and sensitivity results for the selected BEs, parameters, and components are shown in Tables 5-7. Undetected unsafe CCFs of ACMs have significant effects on system reliability. TI and $\lambda_{UU}$ of the ACM, which determine the probabilities of undetected unsafe failures of ACMs, are decisive parameters for the system risk. The results show that the ACMs are the critical components of the system.
Schemes for system design optimization should therefore focus on reducing the unavailability of ACMs caused by CCFs, which is determined by TI, $\lambda_{UU}$ of the ACM, and the CCF parameters. From the perspective of feasibility, reducing TI might be more appropriate. In addition, enhancing the ability of the ACMs to defend against CCFs, for example by applying diversity, is also an effective approach.
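The four measures defined above can be computed from a list of minimal cut sets, as in the following sketch. It uses the rare-event approximation (the Top Event probability is the sum of cut-set probabilities), which is an assumption of this sketch rather than the quantification method stated in the text; with group probabilities set to one (for RIF) the approximation can exceed one, so the values are illustrative only. The cut sets and probabilities in the example are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def top_probability(cut_sets: Sequence[Sequence[str]], p: Dict[str, float]) -> float:
    """Rare-event approximation: sum over minimal cut sets of the product of BE probabilities."""
    return sum(math.prod(p[e] for e in cs) for cs in cut_sets)


def importance(cut_sets: Sequence[Sequence[str]], p: Dict[str, float],
               group: Iterable[str], sf: float = 10.0) -> Dict[str, float]:
    """FV, RDF, RIF, and sensitivity S for a factor represented by a group of BEs."""
    group = set(group)
    q_top = top_probability(cut_sets, p)
    # Q_Top,i: only cut sets containing at least one BE of the group
    q_i = sum(math.prod(p[e] for e in cs) for cs in cut_sets if group & set(cs))

    def scaled(f) -> float:
        # Top Event probability with each group BE probability v replaced by f(v)
        return top_probability(cut_sets, {e: (f(v) if e in group else v) for e, v in p.items()})

    q0 = scaled(lambda v: 0.0)
    q1 = scaled(lambda v: 1.0)
    q_u = scaled(lambda v: min(v * sf, 1.0))
    q_l = scaled(lambda v: v / sf)
    return {
        "FV": q_i / q_top,
        "RDF": q_top / q0 if q0 > 0 else math.inf,
        "RIF": q1 / q_top,
        "S": q_u / q_l,
    }


if __name__ == "__main__":
    # Hypothetical cut sets: one ACM CCF, one CPU CCF, and a pair of independent ACM failures.
    cut_sets = [("ACM_CCF_UU",), ("CPU_CCF",), ("ACM_1", "ACM_2")]
    p = {"ACM_CCF_UU": 1e-6, "CPU_CCF": 1e-7, "ACM_1": 1e-4, "ACM_2": 1e-4}
    for name, value in importance(cut_sets, p, {"ACM_CCF_UU"}).items():
        print(name, f"{value:.4g}")
```

A single-event CCF cut set dominating $Q_{\mathrm{Top}}$ produces a high FV and a high S at the same time, which is the pattern reported for the ACM CCFs.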

Preliminary Optimization Schemes for the System
Based on the insights from the importance and sensitivity analyses, two preliminary optimization schemes were explored: increasing the test frequency and adopting different kinds of diverse SCMs. Quantitative evaluations of the improvements were conducted as well. The probability of the system failing to generate a trip signal on demand was calculated for the following shorter TIs: (i) Case 1: TI for modules is reduced to three months. (ii) Case 2: TI for modules is reduced to one month. The calculation results are shown in Table 8. The probability of system failure on demand decreases significantly as TI is reduced. However, the reliability requirement of the system is still not clearly fulfilled. Considering the increased maintenance costs associated with more frequent periodic testing, this approach is not very promising.
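Because undetected failures are revealed only by periodic tests, the mean unavailability of an undetected unsafe CCF basic event is approximately proportional to TI. The sketch below shows this scaling for the baseline and the two cases; the values of $\lambda_{UU}$ and β are hypothetical placeholders, and the event is modeled, as an assumption, as β times the mean unavailability of a periodically tested component.

```python
import math

HOURS_PER_MONTH = 8760.0 / 12.0  # assumption: 730 h per month


def mean_unavailability(lam: float, ti_hours: float) -> float:
    """Mean unavailability of a periodically tested component: 1 - (1 - e^-x)/x, x = lam*TI."""
    x = lam * ti_hours
    return 1.0 + math.expm1(-x) / x


def ccf_uu_term(ti_months: float, lam_uu: float = 1.0e-7, beta: float = 0.05) -> float:
    """Hypothetical undetected unsafe CCF basic event: beta times the mean unavailability."""
    return beta * mean_unavailability(lam_uu, ti_months * HOURS_PER_MONTH)


if __name__ == "__main__":
    base = ccf_uu_term(6)
    for months in (6, 3, 1):  # baseline, Case 1, Case 2
        print(months, round(ccf_uu_term(months) / base, 4))
```

Only TI-dependent terms shrink in this way; contributions from detected failures, which depend on MTTR rather than TI, are unaffected, so the Top Event probability need not scale exactly in proportion to TI.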
Another potential approach is the use of two kinds of diverse SCMs to improve the ability of the SCMs to defend against CCFs. It should be recognized that although diverse modules usually achieve the same function through different principles, materials, and so forth, it is inappropriate to assume that diverse modules are completely free of CCFs, because they use small electronic components manufactured in a globally standardized environment. A more appropriate treatment is to assume that the CCF probability of diverse modules decreases to a certain extent. Calculations for the following three cases were performed: (i) Case 1: the CCF probability of diverse SCMs decreases by 50%. (ii) Case 2: the CCF probability of diverse SCMs decreases by 75%. (iii) Case 3: the CCF probability of diverse SCMs decreases by 90%. The calculation results are shown in Table 9. They indicate that the use of diverse SCMs would markedly improve system reliability, but even assuming that the CCF probability is reduced to an almost ideal level, the system reliability still does not definitively meet the target. The analysis results show that the system reliability requirement cannot be fulfilled only by shortening TI or by adopting diverse SCMs. More detailed optimization analysis is needed to determine the final system design optimization scheme, for example, a combination of the above schemes or a change of system architecture.

Conclusions
In this paper, a safety-related digital four-channel RPS in the design phase was assessed by the ET/FT method to verify whether the system reliability meets the specified requirements regarding the function to generate a reactor trip signal, and to obtain important risk information for design feedback.
The results of the quantitative analysis indicate that the probability of failure on demand of the system to generate a trip signal is 1.47 × 10⁻⁶ with a 95% upper bound of 4.63 × 10⁻⁶, and the frequency of the system generating a spurious signal is 7.94 × 10⁻⁴/year with a 95% upper bound of 2.50 × 10⁻³/year. The reliability of the system function regarding generating a trip signal on demand does not fulfill the reliability target of the system, that is, a probability of failure on demand no greater than 10⁻⁷. Importance and sensitivity analyses were performed to identify critical factors with significant impacts on system reliability and to determine the direction of improvement. It was found that undetected unsafe CCFs of SCMs dominate the probability of system failure on demand and that TI and $\lambda_{UU}$ of the SCMs have very high sensitivity. Quantitative evaluation of two preliminary optimization schemes, increasing the test frequency and using diverse SCMs, was conducted. The analysis results show that neither of them could definitively improve the system reliability to the target level. In the future, more detailed optimization analysis will be performed to determine a feasible system design optimization scheme, for example, a combination of the above schemes or a change of system architecture.
Note. X (X = 1, 2, 3, 4) represents channels. XY (X = 1, 2, 3, 4; Y = 1, 2) represents subchannels. ACM-1, 2, 3 represent ACMs for three sensor signals; CPU-A represents the main CPU in the subchannel.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.