An Association Rule Mining-Based Method for Revealing the Impact of Operational Sequence on Nuclear Power Plants Operating

The operations of the operators are important for nuclear safety, but conventional operating experience feedback and common data-driven methods make it difficult to explicitly find the valuable information hidden in operational sequences that could help advise operators at the operational level. During nuclear power plant (NPP) operation, a large amount of historical operating data is accumulated, recording both the operational sequences of the operators and the state parameters of the equipment. Therefore, this paper proposes using association rule techniques to mine NPP operating data, discover the operational characteristics of operators, and reveal their possible impact on NPP operation. This work helps improve the operational performance of operators and prevent human-factor events. To this end, the concept of state switching values for describing the operating states of NPPs is proposed, allowing the method to be adapted to different practical application scenarios. A sequence segmentation method is proposed to transform historical NPP operating data into a sequence dataset for association rule mining. Furthermore, an ensemble algorithm based on sequential pattern mining and sequential rule mining, together with its postprocessing method, is designed. An empirical study was carried out using 20 batches of historical cold start-up operating data. A total of 164 original association rules were generated with the proposed method and analyzed by experts, and recommendations that would improve the operational performance of the operators were made for 4 different cases.


Introduction
With the continuous growth of operating experience feedback (OEF) from nuclear power plants (NPPs), the analysis of human factors events (HFEs) has become an important part of OEF and is essential for improving personnel performance and preventing human errors [1,2]. Many studies have been conducted on HFEs in recent years [3][4][5][6]. However, human reliability analysis (HRA) tends to focus on the impact of objective conditions on the operator and lacks analysis at the operator's operational level [7,8]. Moreover, the main data source for HRA studies is full-scope simulators, whereas operational-level problems are more likely to occur under real operating conditions [9].
In HFEs, the operations of the operator have a direct impact on nuclear safety and account for a large proportion of human errors [10]. In an emergency, operators may misjudge the state of the equipment and act on that misjudgement [11]. More often, because cognitive ability, experience level, and behaviour vary among operators at nuclear power plants, operational sequences may display differing characteristics even when the initial state of the NPP is the same, such as the regular performance of certain operations and the habitual disregard of procedural steps [12]. Revealing the hidden characteristics of these operational sequences can help prevent human error and improve the safety of NPPs.
Currently, there are two approaches to analysing the operational sequences of NPP operators. The first obtains OEF after anomalies or accidents. However, such cases are very rare, and for operations that do not produce anomalies or accidents, it is difficult to determine their consequences for NPPs accurately. In this context, if operators ignore the characteristics hidden in operational sequences, the probability of HFEs is likely to increase. The second approach uses currently popular data-driven techniques such as neural networks, support vector machines, and clustering [13]. These techniques can extract the operational characteristics of operators from data and have been widely used in condition monitoring, fault diagnosis, environmental radiation detection, and other NPP domains [14]. The problem with this approach is that the features a model extracts from data are often implicit and difficult to interpret, so they do little to help operators improve their operational performance in a particular case. Furthermore, the generalization performance of such models is limited by the nature of the data, which makes data-driven models deployed in complex real-life scenarios unreliable sources of operation recommendations.
Association rule mining is one of the main data mining techniques; it can find frequent patterns, associations, and correlations among variables or items in a database [15]. Unlike the data-driven approaches mentioned above, it obtains frequent items from the data explicitly and involves no model learning. Its results take the form of rules that are intuitive and easy to understand. In addition, it is applicable to a wide variety of datasets and requires little parameter tuning. These characteristics allow association rules to be used directly to mine the operational patterns of operators from historical data, and effective conclusions and suggestions can then be obtained through expert analysis. Some research has applied association rule mining to HFEs in NPPs. Jiang et al. used an association rule-based approach to assess the support and confidence of HFEs, taking a steam generator tube rupture accident as an example, to help reduce HFEs [16]. Zou et al. used association rule techniques to identify associations and causality among HFEs [1]. These studies inspired us to use this technology to analyze the association between operational sequences and operating phenomena, helping operators improve their operational performance and prevent human errors. Moreover, with the increasing digitization of NPPs, a large amount of historical operating data is now available [17]. These data record the condition of the equipment and the operations of the operators, and already provide a data basis for such research.
This paper proposes a method based on association rule mining that discovers the operational sequence characteristics of operators from historical NPP operating data and reveals the impact of these operational sequences on NPPs. The method performs an association analysis of whether these operational sequences are conducive to the normal operation of NPPs.

Methodology
The outline of the proposed method is shown in Figure 1. It consists of three modules: data preprocessing, association rule mining, and rule postprocessing. Data preprocessing generates state switching values that represent operation events and segments the raw data into sequence datasets accordingly. Association rule mining is the central module; its aim is to discover original rules linking operational sequences to the operation events represented by the state switching values. Rule postprocessing yields rules with accurate confidence values for expert analysis, and these rules and metrics are used to reveal the impact of operational sequences on NPP operation. In Section 2.1, state switching values are proposed to adapt the association rule technique to different task types, and a sequence segmentation method is presented that accounts for the specific features of NPP operating data. In Section 2.2, the technical principles of association rules are introduced and our approach is presented. In Section 2.3, we outline how the association rule mining results are revised into a form suitable for expert analysis.

Data Preprocessing.
An NPP has many different types of systems and equipment, and its data acquisition devices acquire and record thousands of major parameters [18]. These variables can be divided into analog quantities and switching values. Analog quantities are physical quantities that vary continuously over time, usually represented by numerical values with units; the state of an NPP system, such as pressure and flow, can be monitored by observing how the analog quantities change during operation. A switching value is a physical quantity with only two states, usually represented by 0 and 1. Switching values can be further divided into control switching values and alarm switching values. A control switching value reflects an intervention by the operator; e.g., stopping a pump can be represented by 1 ⟶ 0, and the action of a control rod by 0 ⟶ 1. An alarm switching value reflects whether a key operating parameter exceeds its safety limit. The original dataset for the proposed method is time series data containing the above variable types; after data preprocessing, it is transformed into a segmented sequence dataset containing control switching values and state switching values. For ease of understanding, Figure 2 shows how the proposed method changes the data format during the process.

State Switching Value Generation.
Because the alarm switching value describes the abnormal or accident state of the NPP in terms of a definite safety limit, the information in the periods before and after the alarm is triggered is lost. Moreover, the scenarios that trigger alarm switching values in actual operation are very limited. State switching values are therefore defined from continuous variables using one of the following three methods.
Threshold definition method. Arrange the values of the continuous variable x in ascending or descending order; then, according to the physical meaning of the variable, mark the first n% of the values of x as state 1 and the rest as state 0.
Differential definition method. Arrange the continuous variable x in chronological order, let $q_1$ be the value of x at the current moment $t_1$ and $q_0$ its value at the previous moment $t_0$, and perform the following calculation for each moment:
$$x_d = \frac{q_1 - q_0}{t_1 - t_0}$$
The resulting $x_d$ is then arranged in ascending or descending order, depending on the physical meaning of the variable, and the first n% of moments of $x_d$ are marked as state 1 and the rest as state 0.

Figure 1: Outline of the proposed method. [Blocks: raw data; data preprocessing (Section 2.1), including event sequence dividing; association rule mining (Section 2.2), including sequential rule mining; rules postprocessing (Section 2.3); expert analysis; suggestions.]
Science and Technology of Nuclear Installations
Moving average definition method. For the continuous variable x in time order, let $q_m$ be the value of x at the current moment and $q_{m-1}, \ldots, q_1$ its values at the previous $m-1$ moments. For each moment, the following calculation is performed:
$$x_m = \frac{1}{m}\sum_{i=1}^{m} q_i$$
Then sort $x_m$ in ascending or descending order, according to the physical meaning of the variable, and mark the first n% of the values of $x_m$ as state 1 and the rest as state 0.
The threshold definition method applies constraints between the upper and lower limits of the alarm switching threshold and identifies states that are, statistically, closer to or further from the safety limit. The differential definition method captures states in which a variable changes unusually rapidly or slowly at a given instant. The moving average definition method is suitable for parameters prone to drastic fluctuations and for identifying variables with strong nonlinear changes over a period of time. In practice, irrelevant and redundant variables can be removed in advance with a filter feature selection algorithm based on the defined state switching values, which improves mining efficiency and eases postprocessing [19].
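As a minimal sketch of the three state switching value definitions above, the Python functions below rank a single continuous variable and mark the first n% of moments as state 1. The function names, the ranking direction flag `descending`, and the handling of the first moments (where no predecessor or full averaging window exists) are our own assumptions for illustration, not part of the original method; in practice the direction follows the physical meaning of the variable.

```python
from typing import List, Sequence, Tuple


def _mark_first_n_percent(values: Sequence[float], n_percent: float,
                          descending: bool = True) -> List[int]:
    """Sort values, mark the first n% as state 1 and the rest as state 0.

    The returned states keep the original chronological order of `values`.
    """
    k = int(round(len(values) * n_percent / 100.0))
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    states = [0] * len(values)
    for i in order[:k]:
        states[i] = 1
    return states


def threshold_states(x: Sequence[float], n_percent: float,
                     descending: bool = True) -> List[int]:
    # Threshold definition method: rank the raw values directly.
    return _mark_first_n_percent(x, n_percent, descending)


def differential_states(x: Sequence[float], t: Sequence[float], n_percent: float,
                        descending: bool = True) -> List[int]:
    # Differential definition method: x_d = (q1 - q0) / (t1 - t0).
    # Assumption: the first moment has no predecessor and gets x_d = 0.
    xd = [0.0] + [(x[i] - x[i - 1]) / (t[i] - t[i - 1]) for i in range(1, len(x))]
    return _mark_first_n_percent(xd, n_percent, descending)


def moving_average_states(x: Sequence[float], m: int, n_percent: float,
                          descending: bool = True) -> List[int]:
    # Moving average definition method: mean of the current and previous m - 1 values.
    # Assumption: the first m - 1 moments average over the values available so far.
    xm = []
    for i in range(len(x)):
        window = x[max(0, i - m + 1): i + 1]
        xm.append(sum(window) / len(window))
    return _mark_first_n_percent(xm, n_percent, descending)


def switch_events(t: Sequence[float], states: Sequence[int]) -> List[Tuple[float, str]]:
    # Report each 0 -> 1 or 1 -> 0 change of a state switching value with its timestamp.
    return [(t[i], "0->1" if states[i] else "1->0")
            for i in range(1, len(states)) if states[i] != states[i - 1]]
```

On the toy series `x = [1.0, 2.0, 10.0, 3.0, 2.0]` with n = 20%, the threshold and differential methods both flag the spike itself (index 2), whereas a two-point moving average flags the following moment (index 3); this is one reason the definition is chosen per variable.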

Event Sequence Segmentation.
Before association rules are mined, the event sequence must be segmented into sequence datasets. Commonly used static time series segmentation methods include piecewise aggregate approximation (PAA) [20], the discrete Fourier transform (DFT) [21,22], symbolic aggregate approximation (SAX) [23], and special point-based methods [24,25]. However, segmenting historical NPP operating data based on state switching values presents the following difficulties. (i) Segmenting by time intervals alone would place different events in the same sequence at the same time, where they affect each other; moreover, choosing different events as research objects would yield different segmentation results. (ii) A time interval that is too long lets irrelevant events appear in the same sequence and may produce pseudo-rules with low support and confidence, whereas an interval that is too short can split events that are actually associated into different sequences, lowering the confidence of the rule.
(iii) The event sequence may contain interference, such as a state switching value that flips between 0 ⟶ 1 and 1 ⟶ 0 frequently within a short time. (iv) If the segmented dataset contains extra-long sequences, the efficiency of the sequence segmentation algorithm can be seriously affected. (v) If the whole sequence dataset is used as input to the association rule mining algorithm after segmentation, the computation becomes too intensive for the mining to complete.
To address these difficulties, this paper designs a sequence segmentation method targeted at the event sequence. The motivation is that the issues above are engineering rather than academic in nature: existing methods struggle to efficiently generate segmented sequences with different time window sizes and event sequences of different lengths from raw time series data. The idea of event sequence segmentation is therefore to consider only the moments at which switching values change and to output the resulting sequences into two sets according to whether or not they contain a state switching value change. Only the set containing state switching value changes is used for association rule mining.
Figure 3 shows the sequence segmentation process, using a single state switching value change as an example. The input dataset D is time series data including the timestamp, the control switching values at each time, and the changes of the state switching value (0 ⟶ 1 or 1 ⟶ 0). The process scans the database only once and records the moments and operational sequences at which switching values change. Two parameters, λ and μ, limit the sequence length and the event interval, which facilitates expert analysis during rule postprocessing. With this method, two sets of sequences, S and S′, are generated for each state switching value: S is the input to the association rule mining algorithm, and S′ is used to update the confidence during rule postprocessing. Input with multiple state switching value changes only needs to be processed in parallel.
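The segmentation logic can be approximated as follows. This is a hypothetical sketch rather than the algorithm in Figure 3, which is not reproduced here: we assume that λ (`lam`) caps the number of operations in a sequence and μ (`mu`) caps the gap between consecutive events, that a state switching value change closes the current sequence into S, and that sequences closed by the length or gap limit go to S′. Events are `(timestamp, label, is_state_change)` tuples for one state switching value; the labels are illustrative.

```python
from typing import List, Tuple

Event = Tuple[float, str, bool]  # (timestamp, item label, is_state_switching_change)


def segment_events(events: List[Event], lam: int,
                   mu: float) -> Tuple[List[List[str]], List[List[str]]]:
    """Single-pass split into S (ends with a state change) and S_prime (no state change)."""
    S: List[List[str]] = []
    S_prime: List[List[str]] = []
    current: List[str] = []
    last_t = 0.0
    for t, item, is_state in sorted(events, key=lambda e: e[0]):
        # A gap longer than mu ends the current operational sequence.
        if current and t - last_t > mu:
            S_prime.append(current)
            current = []
        if is_state:
            # A state switching change closes the sequence and is appended to it.
            if current:
                S.append(current + [item])
            current = []
        else:
            # A sequence that has reached the length cap lam is closed first.
            if len(current) == lam:
                S_prime.append(current)
                current = []
            current.append(item)
        last_t = t
    if current:
        S_prime.append(current)
    return S, S_prime
```

For several state switching values, the same function would simply be called once per value, in line with the parallel processing mentioned above.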

Association Rule Mining.
Because the operator's operational sequence is reflected in timestamped sequence data, the association rule mining module processes the segmented sequence dataset with sequential pattern mining and sequential rule mining techniques.
Sequential pattern mining was proposed by Agrawal et al. to mine frequently occurring ordered events or subsequences as patterns [26]. The problem of sequential pattern mining can be briefly stated as follows [27].
Let $I = \{i_1, i_2, \ldots, i_k\}$ be a set of k items. A sequence $\alpha = \langle a_1, a_2, \ldots, a_m \rangle$ ($a_j \subseteq I$) is an ordered list of itemsets, each representing events that occur at the same timestamp. For a sequence $\beta = \langle b_1, b_2, \ldots, b_n \rangle$, α is called a subsequence of β if and only if there exist integers $1 \le j_1 < j_2 < \cdots < j_m \le n$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_m \subseteq b_{j_m}$. For a given sequence dataset $S = \{s_1, s_2, \ldots, s_N\}$, the support of a sequence α is the number of sequences in S that contain α as a subsequence. If α satisfies the minimum support threshold, α is a sequential pattern.
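The subsequence and support definitions can be checked with a short sketch. Itemsets are represented as Python `frozenset`s, and the function names are illustrative only.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def is_subsequence(alpha: Sequence[Itemset], beta: Sequence[Itemset]) -> bool:
    """Return True if indices j_1 < ... < j_m exist with a_i a subset of b_{j_i}.

    Matching each a_i greedily to the earliest possible b_j is sufficient:
    an earlier match never leaves fewer positions for the remaining itemsets.
    """
    j = 0
    for a in alpha:
        # Advance until an itemset of beta contains a.
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next itemset of alpha must match strictly later
    return True


def support(alpha: Sequence[Itemset], dataset: Sequence[Sequence[Itemset]]) -> int:
    # Support = number of sequences in the dataset that contain alpha.
    return sum(1 for s in dataset if is_subsequence(alpha, s))
```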
The key to association rule mining is to find all sequential patterns efficiently. In this paper, the prefix-projected pattern growth idea proposed by Pei et al. is used to traverse the search space and enumerate all frequent sequences [28]. The main ideas are as follows: (i) a first scan of the database yields the set of sequential patterns of length 1; (ii) each sequential pattern is regarded as a prefix, and the complete set of sequential patterns is divided into subsets according to the prefix; (iii) to mine each subset of sequential patterns, the corresponding projected database is constructed and mined recursively. Given the minimum support c, this process yields the frequent sequences, as shown in Figure 4 (lines 1-13).
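For illustration, the prefix-projected pattern growth idea can be sketched in a few lines. This simplified version assumes each sequence element is a single item (one operation or one state switching change per timestamp) rather than the general itemsets defined above, and `prefixspan` is our own function name, not a library API; `min_support` plays the role of c.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def prefixspan(dataset: List[List[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent sequential pattern with its support (single-item elements)."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        # Count each item at most once per projected sequence (support = sequence count).
        counts: Dict[str, int] = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                counts[item] += 1
        for item, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = sup
            # Projected database: the part of each suffix after the first occurrence of item.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    # The first call scans the whole dataset and yields the length-1 patterns.
    mine((), dataset)
    return patterns
```

Each recursive call only scans the projected suffixes of its prefix, which is what keeps the search space smaller than enumerating all candidate sequences.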
Once the frequent sequences are known, they can be used to obtain rules describing the relationships between different sequence items [29]. In this paper, we use the association rule representation in [30], i.e., A ⟹ B, where A is a subsequence of B. The confidence of the rule A ⟹ B is fr(A ∪ B)/fr(A), where fr(·) denotes the frequency of sequence occurrence; because A is a subsequence of B, this equals fr(B)/fr(A). Given the minimum confidence, the algorithm shown in Figure 4 (lines 14-18) generates the rules that satisfy this condition.
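A minimal sketch of the rule generation step follows, assuming the frequent sequences and their supports are already available as a dictionary from sequence tuples to support counts. It further restricts A to prefixes of B, a simplifying assumption that matches the intended "operations ⟹ operations followed by a state change" form; `generate_rules` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Seq = Tuple[str, ...]


def generate_rules(patterns: Dict[Seq, int],
                   min_conf: float) -> List[Tuple[Seq, Seq, int, float]]:
    """Build rules A => B (A a proper prefix of B) with confidence fr(B) / fr(A)."""
    rules = []
    for b, sup_b in patterns.items():
        for k in range(1, len(b)):
            a = b[:k]
            # Every prefix of a frequent sequence is itself frequent (anti-monotonicity).
            sup_a = patterns.get(a)
            if not sup_a:
                continue
            conf = sup_b / sup_a  # fr(A ∪ B) / fr(A) = fr(B) / fr(A) since A ⊆ B
            if conf >= min_conf:
                rules.append((a, b, sup_b, conf))
    return sorted(rules)
```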
Figure 4 shows the ensemble algorithm for association rule mining. The input of the algorithm is a set of sequences segmented according to the state switching values; sets segmented for different state switching values can be combined into a single input. Lines 1-13 search for sequential patterns and produce a set F of sequential patterns. Lines 14-18 generate sequence rules from F, producing a set AR of sequence rules with their support and confidence, as shown in Figure 2 (orange table). Owing to the limitations of the output data format, the minimum confidence η of the algorithm must be set to 1 to find the operation sequences associated with the state switching values; the true confidence is updated during rule postprocessing.

Rule Postprocessing.
After association rule mining, each event corresponding to a change of a state switching value (0 ⟶ 1 or 1 ⟶ 0) generates a rule set to be updated. The association rules for expert analysis are finally obtained through association rule filtering and confidence updating. To support expert analysis, each action in the operational sequence carries a timestamp, as shown in Figure 5.

Association Rules Filtering.
For a rule A ⟹ B, we call A the antecedent sequence and B the consequence sequence. The final purpose of association rule filtering is to ensure that the antecedent sequence is an operational sequence of the operator and that the consequence sequence contains that operational sequence followed by a change in a state switching value. However, many of the rules generated by association rule mining do not satisfy these requirements. Therefore, in the filtering phase, the rules for each state transition set are filtered using these data format requirements, and a rule is discarded when the following occurs.
The process of confidence updating is shown in Figure 6. By executing this process on the sequences without state switching value changes (S′) and the corresponding association rules (AR) generated for each state switching value change, association rules with updated confidence are obtained. In this process, the parameter ε serves as the true minimum confidence, and the association rules that meet this requirement are finally obtained.
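The filtering purpose and the confidence update can be sketched together. Because the concrete filter conditions and the Figure 6 procedure are not reproduced here, the following is a hypothetical reading: a rule is kept only if B is A followed by further operations and ends with exactly one state switching change, and the true confidence is taken as the number of S sequences containing B divided by the number of sequences in S and S′ containing A. The label set `state_items` and all function names are assumptions.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Seq = Tuple[str, ...]


def contains(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    # Ordered (not necessarily contiguous) containment; `in` consumes the iterator.
    it = iter(seq)
    return all(item in it for item in pattern)


def keep_rule(a: Seq, b: Seq, state_items: Set[str]) -> bool:
    # Hypothetical filter for the stated purpose: B = A + further operations + one
    # state switching change at the very end; A itself contains operations only.
    return (len(b) > len(a)
            and b[:len(a)] == a
            and b[-1] in state_items
            and not any(x in state_items for x in b[:-1]))


def update_confidence(rules: Iterable[Tuple[Seq, Seq]], S: List[List[str]],
                      S_prime: List[List[str]], state_items: Set[str],
                      eps: float) -> List[Tuple[Seq, Seq, int, float]]:
    kept = []
    for a, b in rules:
        if not keep_rule(a, b, state_items):
            continue
        n_b = sum(contains(s, b) for s in S)  # B ends with a state change, so only S has it
        # Assumption: the antecedent is counted in both S and S_prime, so its
        # occurrences that were NOT followed by the state change lower the confidence.
        n_a = sum(contains(s, a) for s in S) + sum(contains(s, a) for s in S_prime)
        if n_a and n_b / n_a >= eps:
            kept.append((a, b, n_b, n_b / n_a))
    return kept
```

Counting antecedent occurrences in S′ is what distinguishes the updated confidence from the value computed on S alone.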

Empirical Study
To verify the effectiveness of the proposed method, historical cold start-up data were used in an empirical study to analyze the operational sequences of operators and to offer guidance that can help improve operator performance and avoid HFEs [31]. The cold start-up of an NPP is the process of bringing a nuclear reactor from a cold state to stable power. It can be roughly divided into 3 stages [32]: (1) system preparation, (2) subcritical state to critical state, and (3) heat-up. This process involves extensive operation of pumps, valves, electrical heaters, and control rods and requires appropriate coordination of the operating states of various subsystems [31][32][33]. At present, reactor start-up is mainly controlled manually by operators; it takes a long time and places a heavy burden on them, making it prone to HFEs.

Description of the Data.
To ensure the credibility of the conclusions obtained from association rule mining, we extracted 20 batches of cold start-up data from the historical operating data of a commercial NPP. No operation errors or alarms are known to have occurred in these data. The start time of each batch was set to the time of the first control rod action (the lifting order of the control rods is I ⟶ II ⟶ III), and the end time to the time when the reactor reaches stable power operation. The total sample size of the extracted data was 357,595.

Result of Data Preprocessing.
Across the 3 stages of a cold start-up, the operator is primarily concerned with the response of the neutron period of the source range channel in Stage 1, changes in pressure in Stage 2, and changes in temperature, pressure, and flow in Stage 3. Depending on the distribution characteristics of the parameters, we used the threshold, differential, and moving average definition methods to define the state switching values related to the source range channel, and the differential definition method to define the state switching values related to the pressurizer pressure and the primary loop temperature and flow. The defined state switching values are described in detail in Table 1.
Based on the generated state switching values, we reduced the data dimensionality to 272 using the feature selection algorithm proposed in [34]. The sequence segmentation method in Section 2.1 was then executed with the maximum time interval set to 100 s and the maximum sequence length set to 30.

Results and Analysis of the Association Rule.
In this study, the minimum support threshold c was set to 3 to remove rare association rules. Although such a low threshold may reduce the efficiency of the algorithm, it is meaningful for revealing the influence of operational sequences on NPP operation: for example, rules with long sequences often have low support but may have high confidence, and they can be analyzed in comparison with their high-support subsequences. The true minimum confidence ε was set to 0.6 to remove weak association rules. In total, 164 original association rules were generated for further analysis; some of them are shown in Table 2.
All association rules are grouped according to the state switching value changes in the consequence sequence and are presented in the form operational sequence ⟹ state switching value change. After careful examination of all the association rules, experts discussed four typical cases. Note that the results and recommendations from this discussion apply only to the source from which the data were collected.
Case 1. Close the emergency shut-down signal on the source range channel.
According to the operating instructions, the operator should close the emergency shut-down signal of the source range channel when the corresponding threshold is exceeded [35]. Rules 1-6 show that the operation of closing this signal has high support and confidence with the 0 ⟶ 1 change of the state switching value for the source range channel (point 2) defined by the moving average definition method. The maximum support of the relevant rules is 18, meaning that the phenomenon failed to occur in only 2 of the 20 batches of cold start-up data. Furthermore, the rules show that the more frequently the operator withdraws the control rod before closing this signal, the lower the support and the higher the confidence of the 0 ⟶ 1 change of this moving average state switching value for the source range channel (point 2). In addition, although some rules related to this operation show a 1 ⟶ 0 switch of the state, in these rules the operation is the last item of the antecedent sequence.
These findings suggest that closing the emergency shut-down signal of the source range channel causes a local peak in the source range channel. This peak is influenced by the rate at which the control rod is withdrawn before the closing operation: the higher the rate, the more likely the phenomenon is to occur. After a short period, the source range channel value gradually returns to normal, and this recovery is unaffected by subsequent control rod withdrawal. Based on these rules, we make the following recommendations.
(i) The operator should carefully monitor the source range channel value for a while before and after closing the emergency shut-down signal of the source range channel, until the value stabilizes.
(ii) Before closing the emergency shut-down signal of the source range channel, the operator should reduce the control rod withdrawal rate, by withdrawing less frequently or shortening each withdrawal, to avoid larger fluctuations caused by this operation.
Case 2. Withdraw control rods from the lower limit position.
Before the cold start-up of the reactor, each control rod is at its lower limit position. To achieve a cold start-up, multiple sets of control rods must be withdrawn to the specified positions. Rules 8-11 show that withdrawing control rods II and III from the lower limit position is correlated with the 0 ⟶ 1 change of the source range channel state switching values defined by the threshold and differential definition methods; the maximum support of the corresponding rules is 6. According to prior knowledge, control rods I and II must be withdrawn before control rods II and III, respectively, are withdrawn from the lower limit position. These findings indicate that when one set of control rods is at the lower limit position, continuous withdrawal of the previous set of control rods increases the rate of change of the source range channel. This phenomenon is not conducive to the safe operation of the NPP. We therefore suggest that when a set of control rods at the lower limit position needs to be withdrawn, the operator should avoid continuous withdrawal of the previous set of control rods, thereby reducing the average withdrawal rate of the multiple sets and preventing a large rate of change in the source range channel.
Case 3. The source range channels of point 1 and point 2 are not uniformly affected by the same operation.
Building on the Case 1 results, it can further be seen that few rules contain the state switching values defined for the source range channel at point 1, and their support and confidence are low. This shows that, in the mined rules, the nonsmooth variation in the source range channel caused by the operator's operations is mainly reflected at point 2. Operators should therefore pay more attention to the source range channel at point 2 during cold start-up.
It is worth noting that although the differential definition method was used to define the state switching values related to the pressurizer pressure and the primary loop temperature and flow, it did not produce valuable rules related to changes in these state switching values. According to the mining results for these historical operating data, if a large instantaneous rate of change occurs in the pressurizer pressure or the primary loop temperature or flow during cold start-up while the operating instructions are being followed, the operator can preliminarily conclude that it is not caused by the operator's operations. In that case, the system may be abnormal or faulty, and the cause needs to be identified.

Conclusions
Nowadays, NPPs have accumulated a large amount of historical operating data, from which it is valuable to reveal the impact of operational sequences on their operation. In this paper, a method based on association rule mining is proposed to analyze the operational sequence characteristics of operators and their impact on NPP operation. We verified the effectiveness of the proposed method using 20 batches of historical cold start-up operating data. The results show that the raw data can be converted into segmented sequence datasets according to the defined state switching values. A total of 164 original association rules were obtained using the proposed mining algorithm and its postprocessing scheme. These rules reveal some valuable operational issues, such as the effect of control rod action on the neutron period under specific conditions.
The advantage of the proposed method is that, through artificially defined state switching values, it can flexibly mine the associations between the operator's operational sequences and the operating phenomenon under study. In particular, it can mine operating phenomena that do not trigger an alarm but may harm the safe operation of the NPP. The mined rules can also guide operators to avoid the recurrence of these phenomena. Note that the results obtained by association rule mining do not establish a causal relationship between the operations and the operating phenomenon to which a rule corresponds, so further analysis of the rules is necessary. Nevertheless, the approach can mine rule patterns whenever historical operating data for several batches of the same scenario are available. The proposed method could be extended in future work. First, it could be adapted for online data mining. Second, it could be applied to operator training data to evaluate the operations of operators. Third, it could be used to study the correlations between different operating phenomena of NPPs and to explore the implied nonlinear relationships between parameter variables.

Case 4. No rules related to the state switching values defined by the primary loop temperature and flow rate are generated.
These factors are not conducive to data analysis and mining. Based on the prior knowledge of experts, this paper

Table 1: The detailed description of the defined state switching values.

Table 2: Part of the association rules.