Prediction of Automatic Scram during Abnormal Conditions of Nuclear Power Plants Based on Long Short-Term Memory (LSTM) and Dropout

A deep-learning model based on long short-term memory (LSTM) and dropout was proposed for predicting the remaining time to automatic scram during abnormal conditions of nuclear power plants (NPPs). The proposed model was trained on simulated data of abnormal conditions; the input of the model was the deviation of the monitoring parameters from the normal operating state, and the output was the remaining time from the current moment to the upcoming reactor trip. The predicted remaining time to the reactor trip decreases with the development of abnormal conditions; thus, the output of the proposed model generates a predicted countdown to the reactor trip. The proposed prediction model showed better prediction performance than the Elman neural network model in the experiments but encountered an overfitting problem for testing data containing noise. Therefore, dropout was applied to further improve the generalization ability of the prediction model based on LSTM. The proposed automatic scram prediction model can provide NPP operators with an alert to the automatic scram during abnormal conditions.


Introduction
Various faults and failures cannot be completely avoided throughout the lifetime of nuclear power plants (NPPs). During abnormal conditions in NPPs that do not immediately lead to automatic scram, control room operators need to gather information through human-machine interfaces (HMIs) to follow the operating procedures [1]. Conventional monitoring techniques cannot tell control room operators when the abnormal condition will lead to automatic scram if there is no further operator intervention. Thus, control room operators usually cannot be alerted to the reactor trip ahead of time. Intelligent prediction for automatic scram can help operators know when the reactor trip will be triggered during abnormal conditions, so that they can better prepare or take countermeasures for the upcoming automatic scram to avoid further damage to the reactor. Therefore, intelligent prediction for automatic scram is a promising technique to improve the operating safety of NPPs.
Intelligent prediction for automatic scram belongs to the research field of NPP prognostics, which is an important aspect of nuclear safety. Ayo-Imoru and Cilliers [2] reviewed the studies on prognostics in the nuclear industry, and they pointed out that prognostics is a very important aspect of condition-based maintenance (CBM) in the nuclear power industry as it will help the operator and maintenance personnel to better understand how to schedule maintenance. Coble et al. [3] reviewed the studies on prognostics and health management (PHM) for nuclear power systems.
Many relevant studies focused on long-term remaining useful life (RUL) prediction for NPP systems. Liu et al. [4] proposed an exponential degradation model based on the Bayesian process and prior information to predict the RUL of equipment after the failure warning system alarm. Zio and di Maio [5] proposed a method to predict the RUL of lead-bismuth reactor subsystems based on fuzzy similarity analysis. Coble and Hines [6] proposed an RUL prediction method based on a general path model and Bayesian updating, which can predict the RUL of the system by predicting failure-related parameters. Furthermore, among the studies on prognostics and health management of NPPs, the application of machine learning algorithms, especially deep-learning methods, showed many benefits and much potential [7,8]. In these existing studies on NPP prognostics, the prediction target is the RUL of NPP subsystems or components, but the prediction of short-term NPP behavior during abnormal conditions, such as automatic scram, is not addressed.
On the other hand, some other studies on the prognostics of NPPs involve the short-term trend prediction of NPP operating conditions. An online condition forecasting method for a natural circulation system with flow instability was developed by Chen et al. [9] based on the ensemble of the online sequential extreme learning machine (EOS-ELM). Marseguerra et al. [10,11] presented a predictive model based on neurofuzzy techniques to predict the water level in a steam generator of a pressurized water reactor (PWR). Liu et al. [12] developed a condition prediction model for NPP operating parameters based on backpropagation neural networks (BPNN). An approach for predicting the condition of NPP components was proposed by Liu et al. [13] based on a modified probabilistic support vector machine (PSVM). Koo et al. [14] developed a model to provide information on the internal containment states during LOCA accidents based on rule dropout deep fuzzy neural networks (DFNN). Zhang and Hu [15] predicted radionuclides in the receiving water of an inland nuclear power plant based on a difference gated neural network. The aforementioned studies focus on the short-term prediction of NPP operating parameters such as coolant flow rate, coolant temperature, steam generator water level, and radionuclides, but these prediction methods are not applied to NPP events such as automatic scram.
There are very few studies investigating prediction methods for automatic scram during fault conditions of NPPs. Park et al. [16] pointed out that prognosis usually refers to the remaining useful lifetime, so from the point of view of an operator, the remaining time to reactor trip is regarded as the remaining useful lifetime of NPPs during abnormal conditions. As such, Park et al. [16] proposed a method to predict the expected remaining time to the reactor trip by calculating the similarity between the monitored operating data and the simulated data in a transient database, and used principal component analysis (PCA) to reduce the dimension of the data used in the similarity comparison. However, the prediction method for automatic scram proposed by Park et al. [16] is based on simple transient similarity measures and relies on calling the transient database during abnormal conditions, so the prediction results cannot be updated with the change of operating conditions. The advanced data-driven methods developed in recent years should provide better solutions to NPP automatic scram prediction problems.
In recent years, advanced data-driven methods such as long short-term memory (LSTM), random forest, and the attention mechanism have been widely used in studies of condition prognostics and diagnostics of NPPs. Bae et al. [17] conducted real-time prediction of nuclear power plant parameter trends based on LSTM. Zhang et al. [18] developed a cost-sensitive long short-term memory (CSLSTM) model to predict abnormal or false pressurizer water levels. A method for multistep-ahead prediction of leakage flow from the first reactor coolant pump seals was proposed by Nguyen et al. [19] based on LSTM combined with ensemble empirical mode decomposition. Radaideh et al. [20] developed a DNN/LSTM expert system for predicting NPP condition parameters during LOCA accidents. Wang et al. [21] applied several LSTM-based networks for abnormal event detection, identification, and isolation to help maintain the safe operation of NPPs. Yu et al. [22] proposed a continuous learning monitoring strategy for NPPs based on random forest and PCA. Qian and Liu [23] applied a new gated recurrent unit (GRU) network combined with the attention mechanism and transfer learning (TL) to the fault diagnosis of NPPs. Furthermore, variants of the aforementioned advanced data-driven methods, such as bidirectional long short-term memory (BiLSTM), interpretable machine learning, knowledge-data-driven attention (CFKDA), and self-attention, were also successfully applied in many prognostics fields, such as renewable energy generation forecast [24], battery capacity prediction [25], battery health prognostics [26], and charging demand forecast [27]. These existing studies have demonstrated the capability of advanced data-driven methods in time series prediction and showed the potential of these methods for solving the problem of automatic scram prediction during fault conditions of NPPs.
In the present paper, a deep-learning model based on LSTM and dropout was proposed for predicting the remaining time to automatic scram during abnormal conditions of NPPs. The proposed prediction model was trained and tested using abnormal condition data generated by a full-scale simulator for a 300 MW PWR. The input of the proposed model is the deviation from the normal operating state, and the output is the remaining time from the current moment to the upcoming reactor trip. The influence of data noise on automatic scram prediction was studied, and dropout was applied to improve the generalization ability of the LSTM prediction model. Unlike the existing method [16], the proposed models based on LSTM do not depend on the transient database when conducting automatic scram prediction, so the proposed method has better deployment flexibility and response speed. The proposed automatic scram prediction models can help NPP operators prepare for automatic scram during abnormal conditions.
The present paper is structured as follows: Section 2 introduces the specific process of the automatic scram prediction of NPPs. Section 3 illustrates the prediction algorithm used in this study. The experimental results of the proposed prediction model are shown and analyzed in Section 4. Section 5 shows the improvements in the proposed prediction model based on dropout for avoiding overfitting. Conclusions are drawn in Section 6.

Process of Automatic Scram Prediction
When an abnormal condition occurs in an NPP because of a component fault or failure, the operating parameters of the NPP will deviate from the normal operating state. An automatic scram will be triggered when the deviation of NPP operating parameters exceeds a certain limit. Therefore, for a specific mode of NPP abnormal condition, there is a close association between the time series of operating parameters and the remaining time to automatic scram. A machine learning model can be trained to learn this association and predict when the automatic scram will be triggered if there is no operator intervention.
In the proposed prediction model, the input of the network is the deviation of monitoring parameters from the operating state before an abnormal condition occurred, and the output is the remaining time from the current moment to the upcoming reactor trip. The deviation from the operating state is selected as the input of the network instead of the absolute value of the monitoring parameters because, under the same type of fault mode, the operating parameters of the NPP will show a similar deviation pattern, but the operating conditions before the abnormal condition are not necessarily the same, so the absolute values of the operating parameters may not reflect the development process of the abnormal condition. In this way, by using the deviation from the operating state as the input of the prediction model, even if the fault event occurs during nonstandard operating conditions (for example, operating below full power), the prediction model still has a high probability of being effective. The output of the network will be provided to the operators as an alert for the reactor trip.
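As an illustration of the input and output definitions above, the following minimal sketch builds deviation inputs and remaining-time targets from one simulated transient. The function name `build_samples`, the choice of the pre-fault mean as the reference state, and the toy numbers are assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

def build_samples(series, onset_index, trip_index, dt=1.0):
    """Build model inputs and targets from one transient.

    series      : (T, P) array, P monitored parameters at T time steps
    onset_index : first time step of the abnormal condition
    trip_index  : time step at which the reactor trip occurs
    dt          : sampling interval (hypothetical unit: seconds)
    """
    series = np.asarray(series, dtype=float)
    # Reference state: mean of the samples before the fault (assumed choice).
    baseline = series[:onset_index].mean(axis=0)
    steps = np.arange(onset_index, trip_index + 1)
    deviations = series[steps] - baseline      # network input
    remaining = (trip_index - steps) * dt      # network target (countdown)
    return deviations, remaining

# Toy transient with two parameters; values are invented.
series = np.array([[100.0, 5.0],
                   [100.0, 5.0],
                   [98.0, 5.5],
                   [95.0, 6.0],
                   [90.0, 7.0]])
X, y = build_samples(series, onset_index=2, trip_index=4, dt=2.0)
```

Because only deviations are passed to the model, two transients that start from different power levels but develop in the same way would produce the same inputs in this sketch.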
In this work, each prediction model corresponds to only one fault mode (for example, steam generator secondary side leakage). Since different fault modes of NPPs have different causes and mechanisms, it is almost impossible to train one prediction model covering all fault modes of NPPs, and it is reasonable to train separate prediction models for different fault modes. During NPP operation monitoring using the proposed method, the corresponding automatic scram prediction model is activated to predict the remaining time to automatic scram only when a certain fault mode is detected and identified by operators or by an intelligent fault diagnosis system. In this way, one prediction model does not need to be applicable to different abnormal modes, and the difficulty of model training is reduced.
During the training process of the prediction model, the entire dataset of the abnormal condition is available at once, so the complete time series of operating parameters from the occurrence of a fault condition to the reactor trip can be input into the prediction model. In contrast, during model testing or real-world automatic scram prediction, the time series of the monitored parameters are acquired sequentially and input into the prediction model at each time step. Every time the prediction model receives the operating parameters, it outputs the remaining time to automatic scram. In this way, the prediction of the reactor trip can be updated continuously during abnormal conditions. Without external intervention, the output of the prediction model should form a countdown to the upcoming automatic scram, which can be provided to NPP operators as a reference. The proposed process of automatic scram prediction is shown in Figure 1.
The fault mode of steam generator secondary side leakage was used to build the proposed automatic scram prediction model. After a steam generator secondary side leak occurs, the steam generator pressure decreases gradually and finally triggers a turbine trip and an automatic reactor scram. Without the intervention of the operators, a steam generator secondary side leakage event usually causes the reactor scram within a few minutes.
The monitoring data used in this study was generated by a full-scale simulator of the nuclear power plant. The reference plant for the simulator is a 300 MW PWR plant. The reference plant has two primary coolant loops with one steam generator and one coolant pump in each loop. The schematic diagram of the primary coolant system (PCS) of the reference plant is shown in Figure 2.
Fault conditions with different severities were used in the training and testing of the prediction model. The fault condition with the worst severity was given a severity of 1.00, and the normal operating condition has a severity of 0. The operating parameters used to build the prediction model are listed in Table 1. These parameters are all measurable in NPPs and are closely associated with the operating condition of the reactor, and their deviation can show the transient process of the abnormal conditions; thus, they were selected to build the prediction model for predicting the remaining time to automatic scram.

Prediction Algorithm
To predict the remaining time to automatic scram during NPP abnormal conditions, the prediction model must be able to obtain information from the time series of operating parameters. If a conventional feedforward neural network is applied to build the prediction model, then at each time step, the prediction output will only be determined by the network input of the current time step. Since the fault conditions of NPPs are transient processes, the prediction model should know not only the instantaneous operating condition of the NPP but also the changing tendency of various parameters in order to predict the automatic scram. Therefore, a feedforward neural network prediction model has to take the complete sequence, or at least a time window of the operating parameter time series, as the network input, which increases the scale of the network and makes it difficult to train.
Science and Technology of Nuclear Installations
Unlike the feedforward neural network, recurrent neural networks (RNNs) can obtain information from the historical network input. If RNNs are applied to build the prediction model, only the current operating parameters need to be input into the network at each time step, which improves the performance of the prediction model. Therefore, a simple RNN and its variant LSTM were applied to predict the remaining time to automatic scram in this study.

Recurrent Neural Networks (RNNs).
RNNs are a class of deep-learning algorithms first proposed by Rumelhart et al. [28]. For conventional feedforward neural networks, the outputs of neural nodes in each layer are connected to the inputs of neural nodes in the next layer, and the nodes in the same layer are not connected to each other. The past node outputs of feedforward neural networks have no effect on subsequent node outputs. In contrast, the layer outputs of RNNs can be stored and fed back to the input of neural nodes in subsequent time steps, so that past node outputs affect later ones.

[Figure 1: Flowchart of the automatic scram prediction process. Recoverable steps: NPP operation monitoring; a certain fault mode is detected and identified; the corresponding automatic scram prediction model is activated.]
One of the best-known simple RNNs is the Elman neural network [29], which can be regarded as a special feedforward network including an additional context layer. The architecture of the Elman neural network is shown in Figure 3.
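The Elman neural network uses the outputs of the hidden layer at the previous time step as part of the inputs to the hidden layer at the present time step. The network output is as follows:

$$y^{(t)} = \varphi_O\left(w_{HO}\, h^{(t)} + b_O\right), \tag{1}$$

$$h^{(t)} = \varphi_H\left(w_{IH}\, x^{(t)} + w_{CH}\, h^{(t-1)} + b_H\right), \tag{2}$$

where $y$ is the output vector of the network, $h$ is the output vector of the hidden layer, $x$ is the input vector of the network, $w_{HO}$ is the weight matrix from the hidden layer to the output layer, $w_{IH}$ is the weight matrix from the input layer to the hidden layer, $w_{CH}$ is the weight matrix from the context layer to the hidden layer, $\varphi_O$ and $\varphi_H$ are the activation functions of the output layer and hidden layer, and $b_O$ and $b_H$ are the biases of the output layer and hidden layer. In the Elman neural network, $\varphi_O$ is generally the linear function and $\varphi_H$ is generally the sigmoid or tanh function. The superscript $(t)$ represents the time step. By substituting the hidden layer outputs of previous time steps into equation (1), equations (3)-(5), and so on, can be obtained in turn, where the layer biases are omitted:

$$y^{(t)} = \varphi_O\left(w_{HO}\, \varphi_H\left(w_{IH}\, x^{(t)} + w_{CH}\, h^{(t-1)}\right)\right), \tag{3}$$

$$y^{(t)} = \varphi_O\left(w_{HO}\, \varphi_H\left(w_{IH}\, x^{(t)} + w_{CH}\, \varphi_H\left(w_{IH}\, x^{(t-1)} + w_{CH}\, h^{(t-2)}\right)\right)\right), \tag{4}$$

$$y^{(t)} = \varphi_O\left(w_{HO}\, \varphi_H\left(w_{IH}\, x^{(t)} + w_{CH}\, \varphi_H\left(w_{IH}\, x^{(t-1)} + w_{CH}\, \varphi_H\left(w_{IH}\, x^{(t-2)} + w_{CH}\, h^{(t-3)}\right)\right)\right)\right). \tag{5}$$

The hidden layer output $h^{(t)}$ can be decomposed sequentially until the start of the input data sequence.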
The time-dependent learning characteristics of RNNs can also be illustrated by unfolding the neural network architecture through time, as shown in Figure 4. It can be seen that all the past data inputs contribute to the current output $y^{(t)}$ of the Elman neural network model. Through adding the context layer, RNNs can store and learn information from the whole time-dependent input data sequence. Therefore, the prediction models based on RNNs seem to have the ability to learn the changing tendency of operating parameters to predict the automatic scram.
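The recurrence described above, in which the hidden layer receives the current input together with its own output from the previous time step, can be sketched in NumPy as follows. This is a minimal illustration rather than the implementation used in the study: the tanh hidden activation and linear output follow the typical choices named in the text, while the zero initial context, the layer sizes, and the random weights are assumptions.

```python
import numpy as np

def elman_forward(xs, w_ih, w_ch, w_ho, b_h, b_o):
    """Run an Elman network over a sequence xs of shape (T, n_in)."""
    h = np.zeros(w_ch.shape[0])   # context layer starts at zero (assumption)
    ys = []
    for x in xs:
        # Hidden layer: current input plus previous hidden output (context).
        h = np.tanh(w_ih @ x + w_ch @ h + b_h)
        # Output layer with a linear activation.
        ys.append(w_ho @ h + b_o)
    return np.array(ys)

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 3, 10, 1, 5
w_ih = rng.normal(scale=0.5, size=(n_hid, n_in))
w_ch = rng.normal(scale=0.5, size=(n_hid, n_hid))
w_ho = rng.normal(scale=0.5, size=(n_out, n_hid))
b_h, b_o = np.zeros(n_hid), np.zeros(n_out)

xs = rng.normal(size=(T, n_in))
ys = elman_forward(xs, w_ih, w_ch, w_ho, b_h, b_o)

# Perturb only the first input: later outputs change through the context layer.
xs_early = xs.copy()
xs_early[0] += 1.0
ys_early = elman_forward(xs_early, w_ih, w_ch, w_ho, b_h, b_o)

# Perturb only the last input: earlier outputs are unaffected.
xs_late = xs.copy()
xs_late[-1] += 1.0
ys_late = elman_forward(xs_late, w_ih, w_ch, w_ho, b_h, b_o)
```

Comparing `ys` with `ys_early` and `ys_late` shows the two properties discussed in the text: every past input contributes to the current output, and the network needs only the current input at each step.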
However, in the backpropagation through time (BPTT) training algorithm commonly used by RNNs, the error gradient of the current time step propagates backward to the previous time steps and usually decreases exponentially during the backpropagation. Therefore, the outputs of RNNs are in practice only affected by inputs within several time steps before the current time, while inputs far back from the current time have almost no effect on the network outputs. This is the so-called vanishing gradient problem [30]. The prediction model of NPP automatic scram needs to learn the long-term changing tendency of operating parameters, so it is necessary to find a more suitable algorithm to improve the prediction performance.
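The exponential decay of the gradient can be illustrated with a one-unit recurrent network. The sketch below assumes a scalar recurrence h_k = tanh(w h_{k-1} + x_k) with a recurrent weight of 0.5 and small random inputs; these values are chosen only for illustration and are not taken from the study.

```python
import numpy as np

def gradient_through_time(w, xs):
    """Return |d h_k / d h_0| for k = 1..T, where h_k = tanh(w*h_{k-1} + x_k)."""
    h, grad, grads = 0.0, 1.0, []
    for x in xs:
        h = np.tanh(w * h + x)
        # Chain rule: each step multiplies the gradient by w * (1 - h_k^2).
        grad *= w * (1.0 - h ** 2)
        grads.append(abs(grad))
    return np.array(grads)

rng = np.random.default_rng(1)
xs = rng.normal(scale=0.1, size=50)
grads = gradient_through_time(0.5, xs)
```

Each step multiplies the gradient by a factor of magnitude at most 0.5, so after 50 steps the sensitivity of the hidden state to its initial value is below 0.5^50, which is roughly 9e-16.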

Long Short-Term Memory (LSTM).
Long short-term memory (LSTM) was proposed by Hochreiter and Schmidhuber [31] as a solution to the vanishing gradient problem. LSTM is designed to adaptively control the memory length of the learned features and shows good performance in Seq2Seq problems. To solve the problem that RNNs cannot learn information from long data sequences, LSTM nodes add a new state value $c$, called the cell state, whose function is to save the information of data input far from the current moment, that is, long-term memory. Structures called gates are set in LSTM nodes to remove information from or add information to the cell state $c$. The gates are essentially fully connected layers with outputs between 0 and 1. Generally, an LSTM node has the following three types of gates that update and control the cell state: the forget gate, the input gate, and the output gate. The forget gate controls what information in the cell state to forget, and its output is given by the following equation:

$$f^{(t)} = \varphi\left(w_f \left[h^{(t-1)}, x^{(t)}\right] + b_f\right),$$

where $w_f$ is the weight matrix of the forget gate, $b_f$ is the bias vector of the forget gate, $h^{(t-1)}$ denotes the hidden layer outputs of the previous time step, $x^{(t)}$ denotes the input of the current time step, $[\cdot, \cdot]$ represents the concatenation of two vectors, and $\varphi$ is the activation function. The input gate controls what new information will be encoded into the cell state. The input gate output has the form

$$i^{(t)} = \varphi\left(w_i \left[h^{(t-1)}, x^{(t)}\right] + b_i\right),$$

where $w_i$ is the weight matrix and $b_i$ is the bias vector of the input gate. The output of the input gate $i^{(t)}$ determines whether the candidate value $\tilde{c}^{(t)}$ generated by the new input will be added to the cell state or not. The candidate value $\tilde{c}^{(t)}$ is given by

$$\tilde{c}^{(t)} = \tanh\left(w_c \left[h^{(t-1)}, x^{(t)}\right] + b_c\right),$$

where $w_c$ and $b_c$ are the corresponding weight matrix and bias vector.
The cell state of the current time step $c^{(t)}$ is calculated from the cell state of the previous time step and the candidate value of the current time step:

$$c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)},$$

where $\odot$ represents the element-wise multiplication of two vectors.
The output gate controls what information encoded in the cell state is sent to the network as the hidden layer output. The output of the output gate is given by

$$o^{(t)} = \varphi\left(w_o \left[h^{(t-1)}, x^{(t)}\right] + b_o\right),$$

where $w_o$ is the weight matrix, $b_o$ is the bias vector, and $o^{(t)}$ is the output of the output gate. The output vector of the LSTM layer is given by

$$h^{(t)} = o^{(t)} \odot \tanh\left(c^{(t)}\right),$$

where $h^{(t)}$ will be transmitted to the next network layer as well as to the LSTM layer of the next time step. The LSTM network can be built by the above calculation flow, which is shown in Figure 5. By introducing the cell state and three types of gates, LSTM networks can effectively learn long-term tendencies by resisting gradient vanishing [32]. It can be expected that the automatic scram prediction model based on LSTM will have better performance.
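The gate computations described in this section can be sketched as a single LSTM time step in NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the gates use the sigmoid function (consistent with outputs between 0 and 1), the candidate value and the output use tanh as in the standard LSTM formulation, and the parameter dictionary, layer sizes, and random weights are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step following the gate definitions in this section."""
    z = np.concatenate([h_prev, x])                        # [h(t-1), x(t)]
    f = sigmoid(params["w_f"] @ z + params["b_f"])         # forget gate
    i = sigmoid(params["w_i"] @ z + params["b_i"])         # input gate
    c_tilde = np.tanh(params["w_c"] @ z + params["b_c"])   # candidate value
    c = f * c_prev + i * c_tilde                           # updated cell state
    o = sigmoid(params["w_o"] @ z + params["b_o"])         # output gate
    h = o * np.tanh(c)                                     # hidden layer output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 10   # hypothetical sizes
params = {}
for gate in "fico":
    params[f"w_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
    params[f"b_{gate}"] = np.zeros(n_hid)

# Feed a short random sequence one time step at a time.
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(6, n_in)):
    h, c = lstm_step(x, h, c, params)
```

The cell state `c` is updated additively, scaled by the forget gate, rather than being passed through a squashing activation at every step; this is the mechanism that lets information from distant time steps persist.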
Furthermore, some other advanced data-driven methods will be applied in future work to improve the proposed prediction model for automatic scram in NPPs. BiLSTM [33] is an improvement of LSTM that enables additional training by traversing the input data twice, in the forward and backward directions. BiLSTM provides better predictions than LSTM in many cases [24,34] and may improve the accuracy of automatic scram prediction. The interpretable machine learning framework proposed by Liu et al. [25] can also be combined with the proposed models to analyze how the variations of NPP operating parameters during abnormal conditions affect the process leading to automatic scram.
Figure 4: Unfolding the Elman neural network architecture through time.

Prediction Experiments for Original Testing Data.
The prediction results for automatic scram during the steam generator secondary side leak condition are shown in this section. In the training of prediction models, the training error or training loss decreases gradually with the increase in training epochs. Prediction models with different training errors usually have different prediction performance, and deep-learning models such as RNN and LSTM networks often encounter overfitting problems; that is, the models fit the training datasets too closely but fit the testing datasets poorly. Therefore, in the present study, the training of prediction models was stopped when the preset goal of the training error was reached, and the performance of prediction models with different training errors was tested, so as to analyze the influence of the degree of fitting on the prediction performance. The training error and testing error of prediction models in this study were measured by the root mean square error (RMSE), which is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y(i) - \hat{y}(i)\right)^{2}},$$

where $y(i)$ is the actual value, $\hat{y}(i)$ is the predicted value, and $N$ is the number of data inputs used in the tests. Elman neural networks and LSTM networks were each applied in the modeling of automatic scram prediction. First, the automatic scram prediction model based on the Elman neural network was tested. The Elman neural network models were trained using Bayesian regularization backpropagation, and the learning rate was set to 0.01. During the training process, it was found that the training RMSE of the prediction models can be reduced from dozens to 0.01 or even lower within 1000 epochs. Therefore, logarithmic coordinates were used to represent the training RMSE of the prediction model in this study. The goal of training RMSE was set to $10^{1}, 10^{0.9}, 10^{0.8}, \ldots, 10^{-1.9}$, and $10^{-2}$, and the training was stopped when the goal of the training error was reached; then the testing dataset was applied to the trained prediction models with different training RMSE.
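For reference, the RMSE metric and the logarithmically spaced training goals described above can be written as a short NumPy sketch. The function and variable names are chosen here for illustration, and the sample values passed to `rmse` are invented.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Training goals 10^1, 10^0.9, ..., 10^-1.9, 10^-2 (exponent step of 0.1).
exponents = np.arange(10, -21, -1) / 10.0
goals = 10.0 ** exponents

# Invented example: countdown targets versus slightly wrong predictions.
example_error = rmse([6.0, 4.0, 2.0, 0.0], [7.0, 4.0, 1.0, 0.0])
```

With an exponent step of 0.1 between 1 and -2, this gives 31 training goals, evenly spaced on the logarithmic axis used for the training RMSE.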
In addition, Elman neural network models with 10, 15, 20, and 25 hidden layer nodes were trained and tested to evaluate the influence of the number of hidden layer nodes on prediction performance. The Elman neural network prediction model was trained 20 times for each training RMSE goal. The average testing RMSE of Elman neural network prediction models with different training errors is shown in Figure 6, and the testing RMSE results with error bars are shown in Figure 7. The x-axis of training RMSE in Figures 6 and 7 is on a logarithmic scale.
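The aggregation of repeated training runs into an average and a spread per training goal can be sketched as follows. The text does not state how the error bars were computed, so the sample standard deviation used here is an assumption, and the numbers are invented.

```python
import numpy as np

def summarize_runs(test_rmse):
    """test_rmse: (n_goals, n_runs) array of testing RMSE values.

    Returns the mean and the sample standard deviation per training goal.
    The standard deviation as the error-bar measure is an assumption.
    """
    test_rmse = np.asarray(test_rmse, dtype=float)
    return test_rmse.mean(axis=1), test_rmse.std(axis=1, ddof=1)

# Invented results: 2 training goals, 3 repeated runs each.
means, spreads = summarize_runs([[1.0, 2.0, 3.0],
                                 [2.0, 2.0, 2.0]])
```

In the study each goal was trained 20 times, so the input array would have 20 columns per goal.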
As shown in Figures 6 and 7, when the training RMSE of the Elman neural network prediction model is greater than $10^{-0.3}$ (about 0.5), the testing error decreases with the increase in training accuracy. However, when the training RMSE of the Elman neural network model is too low (less than 0.5), the prediction performance deteriorates, that is, overfitting occurs. When the Elman neural network prediction model is overfitted, its average testing RMSE increases (as shown in Figure 6), and the stability of the prediction model becomes worse, which may lead to prediction results with a very large deviation (as shown in Figure 7). In addition, the results show that when the number of hidden layer nodes is in a reasonable range (10 to 25), it has no significant influence on the performance of the prediction model based on Elman neural networks.
Examples of the predicted remaining time to automatic scram generated by the Elman neural network models with training RMSE of 5.00, 0.50, and 0.01 are shown in Figure 8, where the severity of the abnormal transient used in the testing is 0.55. It should be noted that, as shown in Figure 7, the outputs of the prediction models with very small training error are especially unstable, so the prediction results generated by the prediction models with training RMSE of 0.01 vary considerably from one training run to another.
It can be observed that the model with training RMSE of 0.50 generated prediction results that fit the actual values well. The predicted remaining time to reactor trip decreases with the development of abnormal conditions; thus, the output of the prediction model can generate a countdown to the reactor trip for NPP operators. However, the prediction results generated by the models with training RMSE of 5.00 and 0.01 did not fit the actual values as well. The output of the model with training RMSE of 5.00 increases with time in the initial stage of an abnormal condition, so the changing tendency of the predicted results was inconsistent with the actual value. Meanwhile, for the model with a training RMSE of 0.01, there is a significant deviation between the predicted results and the actual value.
The results shown in Figures 6-8 jointly show that the training error does have an impact on the prediction performance of the automatic scram prediction models based on the Elman neural network, and whether underfitting (RMSE of 5.00) or overfitting (RMSE of 0.01) occurs, the prediction model will not achieve the best prediction performance. Generally, the optimal training error of the Elman neural network prediction model cannot be easily determined for different datasets, so the overfitting problem will bring some difficulties to the practical application of the automatic scram prediction model based on the Elman neural network.
To improve the performance of the automatic scram prediction model, LSTM, as an improved algorithm for RNN, was applied in this study. The same training dataset and testing dataset were used in the experiments of automatic scram prediction models based on LSTM. As the performance of LSTM prediction models is also significantly affected by training errors, LSTM prediction models with different training errors were also trained and tested to analyze the impact of training errors on the performance of LSTM prediction models. Figure 9 shows the testing RMSE of the LSTM prediction model with 10, 15, 20, and 25 hidden layer nodes, while the testing errors of the Elman neural network prediction models are also shown in Figure 9 for comparison. During the training process, it was found that even if the number of training epochs of the LSTM prediction model was increased to more than 100,000, it was still difficult to reduce the training RMSE to less than 0.1, so only the results with training RMSE greater than 0.1 are shown in Figure 9. In addition, the testing RMSE results of LSTM models with error bars are shown in Figure 10.
It can be seen from Figure 9 that the average testing RMSE of the LSTM models is lower than that of the Elman neural network models, but the difference is not significant. When the training RMSE of the LSTM prediction model is greater than 1.0, the testing RMSE decreases with the decrease in training error. After its training RMSE drops below 1.0, the testing error of the LSTM model does not continue to decrease but remains basically unchanged. Furthermore, it can be seen from Figure 10 that the prediction performance of the LSTM model is stable in the range of training RMSE from 0.1 to 1.0, and, unlike the Elman neural network models, the prediction model does not generate prediction results that deviate greatly from the actual value.
Therefore, the LSTM prediction model can achieve good prediction performance in a large range of training RMSE from 0.1 to 1.0, which indicates that the LSTM prediction model can avoid the overfitting problem to a great extent in the training process. This feature is an important advantage in the practical application of automatic scram prediction of NPPs.

Prediction Experiments for Testing Data Containing Noise.
The training and testing datasets used in the above experiments are based on simulated fault condition data without measurement errors and noise. In the actual monitoring process of NPP fault conditions, the monitored data obtained by the prediction model will inevitably contain measurement noise. This requires the prediction model to be robust to data noise.
To analyze the influence of data noise on the prediction performance of the automatic scram prediction model proposed in this paper, random white noise with different amplitudes was added to the normalized testing data, and the prediction models based on the Elman neural network and LSTM were each tested on these noisy testing data, while keeping the training dataset unchanged. Figure 11 shows the average prediction RMSE for noisy testing sets by prediction models based on the Elman neural network and LSTM with different training errors.
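The noise injection step can be sketched as below. The text specifies white noise with a given maximum amplitude added to normalized testing data but not its distribution, so the uniform distribution on [-a, a] is an assumption, as are the function name and the placeholder data; the amplitudes 0.03, 0.05, and 0.07 are the values mentioned in this section.

```python
import numpy as np

def add_bounded_noise(data, max_amplitude, rng):
    """Add independent noise drawn uniformly from [-max_amplitude, max_amplitude].

    The uniform distribution is an assumption made for this sketch.
    """
    data = np.asarray(data, dtype=float)
    noise = rng.uniform(-max_amplitude, max_amplitude, size=data.shape)
    return data + noise

rng = np.random.default_rng(42)
# Placeholder for normalized testing data: 50 time steps, 4 parameters.
clean = np.linspace(0.0, 1.0, 200).reshape(50, 4)
noisy_sets = {a: add_bounded_noise(clean, a, rng) for a in (0.03, 0.05, 0.07)}
```

Only the testing data is perturbed in this sketch, matching the experiment in which the training dataset is kept unchanged.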
It can be seen from the prediction results in Figure 11(a) that the prediction accuracy of the Elman neural network model with training RMSE greater than 0.5 is relatively less affected by data noise, but its testing RMSE is still significantly higher than that of the LSTM prediction model shown in Figure 11(b). After the training RMSE drops below 0.5, the testing RMSE of the Elman neural network models for noisy testing data increases sharply. In particular, for the testing data containing noise with maximum amplitudes of 0.05 and 0.07, the testing RMSE of the Elman neural network models exceeds 30, which means that the prediction results deviate greatly from the actual values and no longer have reference value.
The above results show that overfitting seriously degrades the generalization ability of Elman neural network prediction models. In practical applications, once the condition monitoring data contain measurement noise, the prediction accuracy of Elman neural network models, especially overtrained ones, will be noticeably affected.
In contrast, the influence of testing data noise on the performance of LSTM prediction models is much smaller. This is mainly because the learning process of LSTM models is based on a longer data sequence, so their training error cannot decrease as low as that of Elman neural networks. The generalization ability of LSTM prediction models is relatively high, so they are more suitable for actual automatic scram time prediction in NPPs.
However, when the added noise is relatively large (maximum amplitude greater than or equal to 0.03), once the training RMSE decreases below 1.0, the testing error increases as the training error decreases, which indicates that the LSTM prediction models also overfit when tested on noisy data.
To further demonstrate the overfitting of the LSTM models, the prediction results of LSTM models with training RMSE of 1.47 and 0.12 are shown in Figure 12, where artificial noise with a maximum amplitude of 0.07 is added to the testing data and the severity of the abnormal condition used in the test is 0.55. Figure 12 shows that the LSTM prediction model with the larger training error (training RMSE = 1.47) has relatively high prediction accuracy, while the outputs of the model with the smaller training error (training RMSE = 0.12) show a larger deviation from the actual value in the initial stage of the abnormal condition. Figure 11(b) and Figure 12 jointly show that when the testing data contain noise, the LSTM-based automatic scram prediction model also suffers from overfitting, and the prediction model should be improved to solve this problem.

Solution to Overfitting Based on Dropout
To solve the overfitting problem in tests on noisy input data, the dropout technique was used to improve the LSTM prediction model. Dropout is a technique for addressing overfitting in deep neural networks, first proposed by Hinton et al. [35] and further discussed by Srivastava et al. [36].
The key idea of dropout is that in each epoch of training, some neural nodes (along with their connections) are randomly dropped with a certain probability (called the dropout rate) to prevent neural nodes from co-adapting with each other too much, thus improving the generalization ability of the network.
At test time, the full network, with all nodes and without dropout, is used, but the output of each node must be multiplied by the probability that the node was retained during dropout training (that is, 1 − dropout rate) to ensure the correct network output. In this study, dropout was applied to the LSTM layer of the prediction model. The detailed calculation flow is as follows. The output layer of the LSTM prediction model uses a linear activation function, so the network output without dropout is

$$y = \sum_{j=1}^{J} v_j h_j + b,$$

where $y$ is the output of the network, a real number indicating the remaining time to reactor trip, $J$ denotes the number of LSTM layer nodes in the network, $v_j$ is the weight from the $j$-th LSTM layer node to the output node, $h_j$ is the output of the $j$-th LSTM layer node, and $b$ is the output layer bias. Let the dropout rate of the hidden layer nodes be $p$; then the prediction model output during training becomes

$$y = \sum_{j=1}^{J} v_j r_j h_j + b,$$

where $r_j$ is a Bernoulli random variable that takes the value 1 with probability $1 - p$ and 0 otherwise; when $r_j$ is 0, the corresponding node is discarded. At test time, the prediction model uses all hidden layer nodes, but the output of each node is scaled down to preserve the expected network output; that is,

$$y' = \sum_{j=1}^{J} (1 - p)\, v_j h_j + b,$$

where $y'$ denotes the test-time network output of the prediction model trained with dropout.
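The relationship between the training-time and test-time outputs described above can be checked numerically. The following is a minimal sketch of the linear output layer only, not the authors' LSTM model: the node outputs `h`, weights `v`, bias `b`, and the function names are hypothetical values chosen for illustration. It compares the average of many randomly masked training-time outputs with the scaled test-time output.

```python
import random


def output_train(h, v, b, p, rng):
    """Training-time output: each LSTM-layer node is kept with probability 1 - p."""
    mask = [1 if rng.random() < 1.0 - p else 0 for _ in h]
    return b + sum(vj * rj * hj for vj, rj, hj in zip(v, mask, h))


def output_test(h, v, b, p):
    """Test-time output: all nodes are used, each contribution scaled by 1 - p."""
    return b + sum((1.0 - p) * vj * hj for vj, hj in zip(v, h))


# Hypothetical values for J = 4 LSTM-layer nodes (illustration only).
h = [0.5, -0.2, 0.8, 0.1]   # node outputs h_j
v = [1.0, 2.0, -1.5, 0.5]   # output weights v_j
b = 0.3                     # output bias
p = 0.2                     # dropout rate

rng = random.Random(0)      # fixed seed so the sketch is reproducible
n_draws = 20000

# Average the masked output over many independent dropout masks.
mean_train = sum(output_train(h, v, b, p, rng) for _ in range(n_draws)) / n_draws
y_test = output_test(h, v, b, p)

print(f"mean training-time output: {mean_train:.3f}")
print(f"scaled test-time output:   {y_test:.3f}")
```

Because each mask variable has expectation 1 − p, the two printed values should agree up to sampling error, which is why the scaling by 1 − p is applied at test time.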
The automatic scram prediction model based on LSTM and dropout was tested on datasets with and without noise. The prediction model with 10 LSTM layer nodes was used for testing, and dropout was applied to its LSTM layer during training. The dropout rate was set to 0.05, 0.10, 0.20, 0.30, 0.40, and 0.50 to show its effect on prediction performance. Every model was trained for 100,000 epochs to ensure sufficient training. Table 2 lists the training RMSE of prediction models trained with different dropout rates, as well as the testing RMSE for the testing datasets with and without noise.
When dropout was applied in training, the training RMSE of the sufficiently trained model increased. Prediction models with greater dropout rates reached larger training errors, which indicates that they are more likely to avoid overfitting, but an excessive dropout rate may lead to underfitting and deteriorate prediction performance. Conversely, prediction models with a smaller dropout rate reached smaller training errors, and their prediction performance was closer to that of the models without dropout.
As shown in Table 2, for the testing data with no noise or a small noise amplitude (±0.01), adopting dropout did not improve the prediction performance, and increasing the dropout rate may raise the prediction RMSE to some extent. This is because the original LSTM prediction model does not show obvious overfitting for testing data with no noise or a small noise amplitude, so increasing the training error does not benefit the prediction performance. However, for testing data with a maximum noise amplitude of ±0.03 or more, applying dropout at an appropriate rate can reduce overfitting and thus the testing RMSE of the prediction model. When the dropout rate is 0.2, the prediction performance for testing data with larger maximum noise amplitudes (±0.03, ±0.05, ±0.07) is significantly improved and the overfitting problem is largely avoided, while the prediction accuracy for the testing data with no noise and with a maximum noise amplitude of ±0.01 deteriorates only very slightly. Figure 13 gives some examples of prediction results for the aforementioned sufficiently trained LSTM models with different dropout rates, where the testing data contain artificial noise with a maximum amplitude of ±0.07 and the severity of the abnormal condition used in the test is 0.55. The illustrated results show that the predictions generated by the LSTM prediction model with an appropriate dropout rate (0.2) are in relatively good agreement with the actual remaining time to reactor trip, while the outputs of the models with no dropout and with a large dropout rate (0.5) both deviate more from the actual value in the initial stage of the abnormal condition.
The experimental results in Table 2 and Figure 13 jointly show that applying an appropriate dropout rate to the proposed LSTM prediction model can address the overfitting problem for noisy testing data.
However, it should be emphasized that all the above comparisons in this section were made on the premise of a large training epoch number (100,000). Although the LSTM prediction models with dropout achieve better performance when the training epoch number is sufficiently large, applying dropout only avoids overfitting; it cannot improve the best prediction performance achievable by the LSTM-based prediction models. The best prediction performance of the models with no dropout (i.e., the best prediction result shown in Figure 11(b)) and the prediction performance of a sufficiently trained model with a dropout rate of 0.2 are compared in Table 3. The comparison shows that the best testing RMSE of a prediction model with a dropout rate of 0.2 is not better than that of the prediction model without dropout. Therefore, the main advantage of applying dropout is that it simplifies the training process: the training epoch number can be set sufficiently large without the risk of overfitting, and it is not necessary to search for the best training error, which can be difficult for a new training dataset.

Conclusions
In the present paper, a deep-learning model based on LSTM and dropout was proposed for predicting the remaining time to automatic scram during abnormal conditions of NPPs. The prediction result of the proposed model can provide NPP operators with a predicted countdown to the upcoming reactor trip, so that the operators can prepare for it or take countermeasures to avoid further damage to the reactor.
The proposed prediction method was trained and tested on monitoring data of abnormal conditions generated by a full-scale PWR simulator. The input of the prediction model was the monitoring data during abnormal conditions, and the output was the remaining time from the current moment to the upcoming reactor trip. Monitoring data of abnormal conditions with different severities were used to train and test the proposed prediction method separately.
The experimental results show that the prediction model based on LSTM generated a more accurate automatic scram prediction than the prediction model based on simple RNNs such as the Elman neural network. The LSTM prediction model did not show an obvious overfitting problem when tested on input data without noise. Furthermore, the performance of the LSTM prediction model is not sensitive to the number of hidden layer nodes.
The influence of data noise on the automatic scram prediction performance was also studied. When the prediction model was tested on input data containing artificial noise, the prediction model based on LSTM achieved significantly better performance than the Elman neural network model, but showed some degree of overfitting.
The overfitting problem of the LSTM prediction model can be addressed by adding a dropout layer. When the dropout rate is 0.2, the prediction model can largely avoid overfitting for noisy testing datasets even if the training epoch number is very large. However, applying dropout cannot improve the best prediction performance achieved by the proposed prediction model; its main advantage is that it simplifies the training process, in that the training epoch number can be set sufficiently large without the risk of overfitting.
The proposed automatic scram prediction model based on LSTM and dropout can achieve good prediction performance within a wide range of model parameters, so it is suitable for application in real NPP systems.
In the present work, the proposed prediction method for automatic scram was applied only to one fault mode, leakage on the secondary side of the steam generator. In future work, more fault modes will be tested to verify the proposed model. Furthermore, advanced machine learning algorithms such as random forest, BiLSTM, and attention mechanisms will be applied to improve the performance of the prediction model.

Data Availability
The original data of abnormal conditions used in the present work are stored in the simulator system of Harbin Engineering University and can be obtained after approval.

Conflicts of Interest
The authors declare that they have no conflicts of interest.