Fault Diagnosis to Nuclear Power Plant System Based on Time-Series Convolution Neural Network

A nuclear power plant (NPP) is a highly complex engineering system with typical internal feedback and strong component coupling. Because of these features, most NPP systems carry a high risk of radioactive release, which makes fault detection (FD) for NPP systems essential. To address this challenge, this paper proposes an FD mechanism named characteristic time-series convolutional neural network (CT-CNN), which is based on principal component analysis (PCA), time-series analysis, and convolutional neural network (CNN) mechanisms. First, the models of the NPP FD system are formulated. Then, the PCA mechanism is applied to extract the features of the NPP system. Next, the time-series analysis and CNN approaches are applied to realize FD for the NPP system. With these mechanisms, the proposed approach shows strong stability, adapts to different data sets, and preserves both the time and state characteristics of the NPP system. Experiments show that the proposed approach achieves better detection accuracy and lower variance than the classic back propagation, LSTM, and standard CNN algorithms. More significantly, its optimal accuracy can be as high as 99.8%.


Introduction
The structure of an industrial nuclear power plant (NPP) control system is rather complex, as it consists of many interconnected systems and equipment [1]. Due to this high complexity, internal feedback is pronounced in the NPP system [2]; e.g., a change in coolant temperature affects the coolant volume, and this effect in turn puts pressure on the steam generator. In other words, any small deviation in the NPP system may quickly cause a failure that then spreads throughout the whole NPP system, which can consequently result in disasters [3]. Therefore, fault detection (FD) for the NPP system is of great significance.
Since FD technology can ensure the safety and reliability of the NPP system, it has been extensively studied in recent years. Currently, the related studies on FD can be divided into two kinds: model-based methods and model-free methods. Model-based methods commonly rely on ideal assumptions and physical knowledge to establish mathematical models; e.g., a differential equation model based on the thermodynamic equation and the nuclear reactor point dynamics equation can simulate the internal state of NPP to a certain extent through numerical operation and residual evaluation. However, in practice, the model-based method cannot establish a sufficiently accurate mathematical model, especially for highly complex coupled systems like NPP. The idealized assumptions are likely to deviate from the actual situation, and the numerical operation is likely to consume valuable diagnosis time. These problems limit the application of this method in many scenarios.
Compared with model-based methods, model-free methods are more widely applied. These methods are mainly divided into expert system methods and data-driven methods. The expert system models represented by fault tree analysis (FTA) [4] and random forest (RF) [4] have achieved good results. However, the expert system also faces some problems in practical applications, such as the impossibility of an exhaustive knowledge base, caused by the limitation of expert knowledge, and the contradiction between poor adaptability and expert knowledge, caused by the limitation of empirical knowledge. Different from the expert system, the data-driven method is more flexible and concise [5]. It does not need essential theoretical analysis or a summary of empirical rules; it only needs the historical operation data of NPP to establish a relatively feasible diagnosis model.
Since the data-driven approaches show the above advantages, many data-driven methods have been proposed for the FD of NPP system, such as k-nearest neighbor (KNN) method [6], support vector machine (SVM) [7], principal component analysis (PCA) [8], and other classical statistical machine learning methods. These methods are effective and have achieved good performance. However, they have the problems of poor anti-interference ability, low recognition accuracy, and high time complexity. The determined mapping function and linear classifier make many approaches unable to use the massive operation state data to extract the state features and summarize the system experience of NPP. Improved classic back propagation network [9], deep belief network (DBN) [10], and recurrent neural network (RNN) [11] are applied to this problem, but these methods still have the problem of unstable training.
To improve the FD performance, many approaches try to combine the concepts of different FD methods. Yao et al. [12] proposed a full-range FD method based on state information imaging. With this method, the state information of NPP is expressed by gray image, and the image features are extracted by Kernel Principal Component Analysis (KPCA). Then, the FD is realized by using various classification methods. Peng et al. [13] proposed a method combining correlation analysis (CA) and deep belief network (DBN), in which CA was used to reduce the dimension and DBN was in charge of training and diagnosis. Wang et al. [7] realized the diagnosis of coolant circuit of NPP system by using SVM and improved particle swarm optimization (PSO). Li and Lin [14] used the convolutional neural network (CNN) algorithms to extract the characteristics of instantaneous data of NPP system and realized the diagnosis of eight types of faults. The above methods are effective in realizing FD of NPP, yet they still have the limitation in ignoring the time relationship of NPP system data as well as the relationship between time and state. To solve this problem, many studies use the time-series approach in the mechanical failure analysis [15]. Yao et al. [16] integrated the time-domain and frequency-domain characteristics of multichannel acoustic signals through CNN and realized the diagnosis of gear fault. Chen et al. [17] input the mechanical monitoring signal of rolling bearing into 1DCNN for training and obtained the FD model of rolling bearing.
With the time-series approaches, the FD performance can be improved. However, these approaches cannot be applied directly to the NPP system, because the NPP system has multiple state characteristics. To address this challenge, many new FD approaches based on CNN [18,19] and long short-term memory (LSTM) networks have been proposed for the NPP system. He et al. [20,21] used Markov to process the multistate data. They transformed the data into color images after flattening the multistate data into one-dimensional data and extracted features through the CNN image processing functionality. The results showed that this approach could realize the diagnosis of eight types of faults of the NPP system. Choi and Lee [11] combined online monitoring technology based on signal reconstruction with the LSTM network to achieve FD of the NPP system. However, most of these methods do not mine the time features well, or they destroy the continuity of the time series and eliminate the relationship between the time series and the multidimensional states. Moreover, little work has considered the relationship between NPP state data and faults from a global perspective.
To address the above challenges, this paper proposes a data-driven FD method based on time-series analysis. On the one hand, the method uses PCA to reduce the dimension, excluding some irrelevant features and integrating some related features without losing much information. On the other hand, the method considers the connection between the features and the time series: it arranges these data into a matrix and uses convolution to extract both partial features and partial time series at the same time. This preserves the structure compared with [20], so that the FD of the NPP system can be realized more accurately and efficiently. The method does not impose strict requirements on the parameter values, which makes it highly adaptable and applicable to different NPP application scenarios.
The structure of this paper is as follows: Section 2 describes the problem of realizing FD in the NPP system; Section 3 builds the FD model and presents the solutions to find the optima of this model. In Section 4, the experiments are conducted and the results are discussed. Finally, in Section 5, a conclusion is drawn.

Problem Description
The operational status data of the NPP system, which reflect the health status and fault information, are collected online in real time through the instrumentation and control (I&C) system. This section gives the following definitions about the operational status data of the NPP system.
Definition 1. Let $X_t = (x_{1,t}, x_{2,t}, \ldots, x_{M,t})$ denote the state vector of the NPP at time $t$, where $x_{i,t}$ ($i \le M$) denotes the operation state data of NPP collected by the $i$th sensor at a certain time $t$, such as the water level and pressure of the pressurizer, the steam flow of the steam generator, the opening and feedwater flow of the feedwater regulating valve, the water level, and the exhaust steam flow of the main condenser.
Definition 2. Let $P = (X_{t_1}, X_{t_2}, \ldots, X_{t_T})$ denote the operational status data set collected in the time period from $t_1$ to $t_T$, where $X_{t_j}$ ($j \le T$) is the state vector in Definition 1. Suppose the sampling time-interval of the sensors is equal, that is, for any $j$, there is $t_{j+1} - t_j \equiv h$ ($h > 0$), where $h$ is the sampling period.
$P$ is a time-series matrix which has the following features: ① the data of each row or column of the matrix represent the same characteristic, and ② the data are arranged in chronological order.
Definition 3. Let $E = \{e_1, e_2, e_3, \ldots, e_D\}$ denote the set of fault types, where $e_i$ ($i \le D$) denotes a certain type of fault, such as heat pipe water loss accident, cold pipe water loss accident, rupture of steam pipe inside the containment, rupture of steam pipe outside the containment, loss of water supply accident, and closing of the main steam isolation valve. $e_0$ indicates the normal operational state, and $E' = \{e_0\} \cup E$.
The characteristic information of the fault is stored not only in the operating state vector $X_{t_j}$ at the time $t_j$, but also in the trend $X_{t_j}, X_{t_{j+1}}, X_{t_{j+2}}, \ldots$; e.g., when the pressure data of steam generator B drops sharply, there is a fault of steam line B pipe rupture. Therefore, this paper fully considers the time sequence characteristics and uses the sliding time window to intercept the time subsequence for FD.
Definition 4. Let $W$ denote the sliding time window, where $W$ is a continuous submatrix of $P$ with an equal number of columns, formed by $w$ consecutive state vectors; $\ell$ is the starting index of the sliding window, and $w$ is the size of the sliding time window. Then, the sliding time window represents the fault state of time $t_{\ell+w}$. Supposing that the starting index of the sliding window increases by $\ell$ each time, the number of sliding windows generated from a state data set $P$ is $Q = \lfloor (T - w)/\ell + 1 \rfloor$.
The objective is to establish a fitting mapping $G: W \to e_i$ whose output is as close as possible to the real operational state of NPP.
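The window construction and the window count can be illustrated with a minimal NumPy sketch. It assumes that time runs along the rows of the matrix (shape T × M) and that windows start at index 0; the function name `sliding_windows` and the random placeholder data are illustrative only and are not taken from the paper.

```python
import numpy as np

def sliding_windows(P, w, step):
    """Cut a (T, M) time-series matrix into windows of w consecutive rows.

    Returns an array of shape (Q, w, M) with Q = floor((T - w) / step) + 1.
    """
    T = P.shape[0]
    if w > T:
        raise ValueError("window size w must not exceed the series length T")
    Q = (T - w) // step + 1
    return np.stack([P[s * step : s * step + w] for s in range(Q)])

# Placeholder data: T = 300 samples of M = 91 sensor readings (random values).
P = np.random.default_rng(0).normal(size=(300, 91))
windows = sliding_windows(P, w=30, step=1)
print(windows.shape)  # (271, 30, 91)
```

With T = 300, w = 30 and a step of 1, the sketch yields 271 windows, which agrees with the count formula for Q.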

Model Design
3.1. Architecture Design of FD Model. To establish the mapping $G: W \to e_i$, this paper designs an FD model based on the CNN feature extraction, PCA, and sliding window mechanisms, as shown in Figure 1. First, PCA dimensionality reduction is performed on the original data set, and then the time-series submatrix is intercepted through the sliding window. Next, the CNN coder is used to analyze the features of the input sequence shaped by the sliding window. Finally, a small-size feature matrix is output to realize the classification of the input sequence.
The specific implementation process is shown in Figure 2, and the working process is as follows:
(1) Obtain the initial samples from NPP system and standardize the samples
3.2. Data Preprocessing. Generally, the status data of NPP monitored by the I&C system cannot be used directly as the input data of the model. It is necessary to standardize the data, which are based on different unit systems. Moreover, it is essential to reduce the dimension of the high-dimensional status data and extract the fault features.

Data Standardization Processing.
To realize the feature extraction of the NPP operational state time series, this paper uses the Z-score standardization method to convert the sample data to the same scale. The standardized conversion formula of Z-score is as follows:

$$\hat{x}_{i,t_j} = \frac{x_{i,t_j} - \mu_i}{\sigma_i}, \quad (3)$$

where $x_{i,t_j}$ denotes the state data collected by the $i$th sensor at time $t_j$, and $\mu_i$ and $\sigma_i$ denote the mean and the standard deviation of the data collected by the $i$th sensor.

Data Dimensionality Reduction Processing.
In terms of the coupling and correlation of the operational state data of NPP, the PCA method is used to reduce the dimension of the sample data. The PCA method divides the state vector space $\mathbb{R}^M$ in which the sample data set is located into a principal subspace and a residual subspace. The principal subspace represents the change trend of the data, and the residual subspace represents the data disturbance. Then, any sample can be decomposed into its projections onto the two subspaces, and the projection difference between samples in the residual subspace is small. Therefore, the projection difference in the residual subspace can be ignored, and a single vector can be selected to replace the projection of all samples in the residual subspace, so as to reduce the dimension.
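Considering the orthonormal basis $U = (u_1, u_2, u_3, \ldots, u_{M-1}, u_M)$, which spans $\mathbb{R}^M$, the sample data set state vector $\hat{X}_{t_k}$ can be written as

$$\hat{X}_{t_k} = \sum_{i=1}^{M} a_{ki} u_i, \quad (4)$$

where $a_{ki} = \hat{X}_{t_k} u_i^T$.
Let $\tilde{X}_{t_k}$ denote the target vector of dimension reduction, which can be expressed by

$$\tilde{X}_{t_k} = \sum_{i=1}^{m} z_{ki} u_i + \sum_{i=m+1}^{M} b_i u_i, \quad (5)$$

where $m$ is the dimension of the principal subspace, $\sum_{i=1}^{m} z_{ki} u_i$ is the linear representation of $\tilde{X}_{t_k}$ based on $U$ in the principal subspace, and $\sum_{i=m+1}^{M} b_i u_i$ is the linear representation of $\tilde{X}_{t_k}$ based on $U$ in the residual subspace.
Let $J$ denote the objective function of dimension reduction, which can be expressed by

$$J = \frac{1}{T} \sum_{k=1}^{T} \left\| \hat{X}_{t_k} - \tilde{X}_{t_k} \right\|^2. \quad (6)$$

That is:

$$J = \frac{1}{T} \sum_{k=1}^{T} \left[ \sum_{i=1}^{m} (a_{ki} - z_{ki})^2 + \sum_{i=m+1}^{M} (a_{ki} - b_i)^2 \right]. \quad (7)$$

By solving the partial derivative with respect to the coordinates of $\tilde{X}_{t_k}$, the minimum of $J$ [22] can be denoted as

$$J = \sum_{i=m+1}^{M} u_i^T S u_i, \quad (8)$$

where $S$ is the covariance matrix of the sample data set. The $u_i$ are linearly independent. Thus, $J$ takes its minimum value when each $u_i^T S u_i$ takes its minimum value. Next, the Lagrange multiplier method can be used for each $u_i^T S u_i$ under the constraint $u_i^T u_i = 1$:

$$L(u_i, \lambda_i) = u_i^T S u_i + \lambda_i \left(1 - u_i^T u_i\right). \quad (9)$$

The solution is

$$S u_i = \lambda_i u_i. \quad (10)$$

Figure 2: Flow chart of FD for NPP system.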

Wireless Communications and Mobile Computing
Equation (10) decomposes the state sequence by singular values and solves for the eigenvalues of the covariance matrix. After obtaining the eigenvalues and eigenvectors, Equation (8) can be expressed as

$$J = \sum_{i=m+1}^{M} \lambda_i. \quad (11)$$

In order to minimize the objective function $J$, the smallest $M - m$ eigenvalues are used. In this case, the basis of the principal component subspace consists of the eigenvectors corresponding to the largest $m$ eigenvalues. At this point, a characteristic time-series set with $m$ principal component state features is obtained from the original data set through the orthogonal transformation method.
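The preprocessing chain described in this section (Z-score standardization followed by PCA through the eigen-decomposition of the covariance matrix) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: samples are stored as rows, the covariance is normalized by the number of samples, and the names `zscore` and `pca_reduce` as well as the random input data are hypothetical. The last line checks numerically that the mean squared reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

def zscore(X):
    """Standardise each column (sensor) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant sensors
    return (X - mu) / sigma

def pca_reduce(X, m):
    """Project (T, M) data onto the m eigenvectors with the largest eigenvalues.

    Returns the reduced data Z of shape (T, m), the mean squared
    reconstruction error J, and all eigenvalues in descending order.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    S = Xc.T @ Xc / X.shape[0]            # covariance matrix, shape (M, M)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U_m = eigvecs[:, :m]                  # basis of the principal subspace
    Z = Xc @ U_m
    X_rec = Z @ U_m.T + mean              # residual part replaced by the mean
    J = np.mean(np.sum((X - X_rec) ** 2, axis=1))
    return Z, J, eigvals

rng = np.random.default_rng(0)
X = zscore(rng.normal(size=(300, 8)) @ rng.normal(size=(8, 8)))
Z, J, eigvals = pca_reduce(X, m=3)
print(Z.shape)                             # (300, 3)
print(np.isclose(J, eigvals[3:].sum()))    # True
```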

Sliding Time Window Design.
The process of generating $W$ from the time-series matrix $P$ is shown in Figure 3.
The form of the sliding window $W$ depends on the window size $w$ and the sliding step $\ell$ in terms of Definition 4. The smaller $w$ is, the less information a single window $W$ contains, and the less accurate but faster the fault classification is. The larger $\ell$ is, the smaller the $Q$ value is, and the faster but less accurate the training is. This is because less information is contained in the time series and the total amount of data is reduced. Therefore, the specific values of $w$ and $\ell$ need to be adjusted according to the characteristics of the data set.

CNN Model Design.
The key point to fit the mapping G is to obtain the state and time features of W. In this study, the CNN is used to extract the state and time features and then fuse these feature data. The network structure of CNN is shown in Figure 4.
The CNN consists of two parts: (1) the convolutional computation for feature extraction and (2) the pooling of the characteristic matrix.
The linear transformation of the input time-series matrix is given by Equation (12):

$$z^{(r+1)} = x^{(r)} * k^{(r+1)}, \quad (12)$$

where $z^{(r+1)}$ represents the linear transformation of the time-series matrix in the $(r+1)$th layer, $x^{(r)}$ represents the input of the time-series matrix of the $r$th layer, $k^{(r+1)}$ represents the coefficient matrix of the linear transformation in the $(r+1)$th layer, and $*$ denotes the convolution operation.
The nonlinear transformation based on this linear transformation is given by Equation (13):

$$a^{(r+1)} = f\left(z^{(r+1)} + b^{(r+1)}\right), \quad (13)$$

where the matrix $b^{(r+1)}$ represents the offset added to the time-series matrix $z^{(r+1)}$, $f$ represents the activation function, here the nonlinear ReLU function, and $a^{(r+1)}$ represents the time-series matrix output.
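The convolution followed by the nonlinear transformation can be illustrated with a small NumPy sketch. It makes several simplifying assumptions that are not stated in the paper: a single input channel, no padding, stride 1, a scalar offset `b`, and the deep-learning convention in which "convolution" is computed as cross-correlation (no kernel flip). The kernel values are arbitrary.

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2D cross-correlation of a single-channel matrix x with kernel k."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    z = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            z[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return z

def relu(z):
    """ReLU activation: negative entries are set to zero."""
    return np.maximum(z, 0.0)

# Toy input whose values grow by 1 per column and by 5 per row.
x = np.arange(25, dtype=float).reshape(5, 5)
# Arbitrary 3 x 3 kernel that responds to left-to-right increases.
k = np.array([[-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0],
              [-1.0, 0.0, 1.0]])
b = 0.5  # scalar offset (assumption; the text describes an offset matrix)

a = relu(conv2d_valid(x, k) + b)
print(a.shape)  # (3, 3)
print(a[0, 0])  # 6.5
```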

3.4.2. Step 2: Pooling Process. The dimension of this matrix needs to be scaled after the convolutional process. Since the size of the time-series characteristic matrix is still the same as the original matrix, it is necessary to perform the pooling operation to make the network lightweight.
The pooling method in this study uses the maximum pooling mechanism. Compared with other pooling methods, this method can perceive the small changes in the time series. After the time-series features are extracted via CNN, the number of parameters becomes smaller and interference such as noise becomes lower. This makes the model faster and its prediction more accurate.
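The output of the time-series matrix of layer $r + 1$ through the pooling operation can be expressed as

$$x^{(r+1)} = \mathrm{maxpool}\left(a^{(r+1)}\right), \quad (14)$$

where $a^{(r+1)}$ is the output of the nonlinear transformation in the $(r+1)$th layer and $x^{(r+1)}$ is the pooled matrix passed on to the next layer.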
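Maximum pooling can be sketched in a few lines of NumPy. The 2 × 2 non-overlapping block size is an assumption made for illustration; the pooling size used in the paper is not stated in this section.

```python
import numpy as np

def max_pool2d(a, size=2):
    """Non-overlapping max pooling.

    Trailing rows or columns that do not fill a complete block are dropped.
    """
    h, w = a.shape[0] // size, a.shape[1] // size
    blocks = a[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3))

a = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 9, 2],
              [3, 2, 4, 6]], dtype=float)
pooled = max_pool2d(a)
print(pooled)
# [[4. 5.]
#  [3. 9.]]
```

Each output entry keeps only the largest value of its block, so the matrix shrinks by the block size in both directions.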

3.4.3. Step 3: Full Connection. The full connection layer classifies the feature data extracted from the CNN layer and outputs the corresponding result, which can be expressed as

$$z^{(k+1)} = W^{(k)} z^{(k)} + b'^{(k)}, \quad (15)$$

where $z^{(k)}$ represents the one-dimensional data sequence input to the $k$th layer. Specifically, when $k$ is equal to 0, it represents the one-dimensional sequence flattened from the output of the CNN network. $W^{(k)}$ represents the coefficient matrix of the $k$th layer of the network, and $b'^{(k)}$ represents the offset of the $k$th layer of the full connection layer.
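The flattening and full connection step can be sketched as follows. The sketch assumes 16 feature maps of size 4 × 4 (so that the flattened vector has 256 entries), 11 output classes (10 faults plus the normal state), and a softmax to turn the outputs into probabilities; the softmax and the random weights are assumptions for illustration, not details given in the paper.

```python
import numpy as np

def dense(z, W, b):
    """Affine map of a fully connected layer: W z + b."""
    return W @ z + b

def softmax(v):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.normal(size=(16, 4, 4))   # hypothetical CNN output
z0 = features.reshape(-1)                # flattened input, k = 0
W0 = rng.normal(scale=0.1, size=(11, z0.size))
b0 = np.zeros(11)

probs = softmax(dense(z0, W0, b0))
print(z0.shape, probs.shape)  # (256,) (11,)
print(int(np.argmax(probs)))  # index of the most probable class
```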
3.5. Case Study. This paper uses random time-series faults to verify the functionality and performance of the proposed model. Figure 5 shows one of the original data samples used in this case study; this time-series matrix has dimension 91 × 300.
The case study sets $w = m = 30$ and $\ell = 1$ to construct the sliding window sequence. Under this parameter selection, 271 sliding time windows are generated, and these submatrices are labeled in terms of the relationship between the time series and the fault state. Figure 6 shows the classification process after selecting one of the windows.
After the time-series submatrix with L = 30 is intercepted in Step ①, the dimension of the submatrix is reduced. This produces a time-series submatrix with dimension n = 30, whose gray image is shown in Step ② of Figure 6. After the feature extraction is performed by CNN, the time-series submatrix becomes a 4 × 4 submatrix, as shown in Step ③ of Figure 6.
The output result is an abstract description of the original submatrix. Some original features of the characteristic submatrix can still be seen in the output feature extraction matrix (such as vertical gradient color bands and horizontal gray differences). After the feature extraction is completed, the number of input values is reduced from 900 to 256. In this case, the first full connection layer with 256 hidden units needs 164864 fewer weights, that is, (900 − 256) × 256, which greatly improves the detection performance and achieves high accuracy.
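The figure of 164864 can be checked with a short calculation. The sketch assumes that "256 hidden layers" refers to 256 units in the first full connection layer and that only weights are counted (bias terms are identical in both cases and cancel out).

```python
# Sizes taken from the case study: a 30 x 30 window and 256 extracted features.
window_inputs = 30 * 30   # 900 values if the raw window were flattened
cnn_features = 256        # values after CNN feature extraction
hidden_units = 256        # assumed width of the first full connection layer

# Weights of the first full connection layer with and without the CNN stage.
weights_without_cnn = window_inputs * hidden_units
weights_with_cnn = cnn_features * hidden_units
saved = weights_without_cnn - weights_with_cnn
print(saved)  # 164864
```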
Finally, through the full connection, the probability of each fault type can be generated, as shown in Step ④ of Figure 6. It can be seen that the predicted fault is the first type of fault, namely, the heat pipe water loss accident, which is consistent with the labeled information.
The experimental data are generated with the PCTran/AP1000 software (see Figure 7). The software is a reactor transient and accident simulation software, and its reactor model has been widely applied for NPP system simulation [23]. In this study, ten fault states are set in the PCTran/AP1000 software, which supplies all the training and test data sets in the experiment. The sampling period $h$ is 1 s, the samples are drawn from 7 different initial conditions, and every condition provides a time series with 5 minutes of running data. These fault states are merged with the normal state, and the one-hot coding mode is used to code the fault category. The faults and their parameters, which can make the model detect faults more sensitively, are given in Table 1.
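The one-hot coding of the fault category can be illustrated with a minimal sketch. The class order (index 0 for the normal state, indices 1 to 10 for the ten faults) and the helper name `one_hot` are assumptions made for illustration; the label order used in the experiments is not specified here.

```python
def one_hot(label, num_classes):
    """Return a list with a single 1 at position `label` and 0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label must lie in [0, num_classes)")
    return [1 if i == label else 0 for i in range(num_classes)]

NUM_CLASSES = 11  # ten fault states plus the normal state

normal = one_hot(0, NUM_CLASSES)       # assumed index of the normal state
first_fault = one_hot(1, NUM_CLASSES)  # assumed index of the first fault type
print(normal)       # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(first_fault)  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```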

Neural Network Parameter Setting.
Since the experiment can be described as a series of partial differential equations, the convolution layers of the CNN model can be seen as a numerical simulation of partial differential equations, so the layers do not need many parameters. Therefore, this study investigates the possible structure of the CNN network under this data set and gives a reference structure with 3 layers and a 3 × 3 convolution kernel, shown in Table 2.
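As a rough plausibility check, the following sketch traces the spatial size of a 30 × 30 input through one hypothetical arrangement of three 3 × 3 convolution layers. The arrangement (no padding, stride 1, 2 × 2 pooling after the first two convolutions) is an assumption chosen only because it is consistent with the 4 × 4 output reported in the case study; the actual layer settings are those of Table 2, which is not reproduced here.

```python
def conv_out(n, kernel=3, stride=1, padding=0):
    """Output width of a convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n, size=2):
    """Output width of non-overlapping pooling along one dimension."""
    return n // size

def trace(n, layers):
    """Return the sequence of widths after each layer in `layers`."""
    sizes = [n]
    for kind in layers:
        n = conv_out(n) if kind == "conv" else pool_out(n)
        sizes.append(n)
    return sizes

# Hypothetical layout: conv, pool, conv, pool, conv.
sizes = trace(30, ["conv", "pool", "conv", "pool", "conv"])
print(sizes)  # [30, 28, 14, 12, 6, 4]
```

Under these assumptions, 16 feature maps of size 4 × 4 would give the 256 values fed to the full connection layer.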

Data Preprocessing Settings.
This section discusses the influence of the sliding time window size $w$, the principal feature number $m$, and the sliding window step size $\ell$ on the fault detection results. The experiments use different parameters based on the network structure in Table 2.
Parameters $w$ and $m$ are the key settings for preprocessing the initial samples. On the one hand, their values determine the number of features retained from the initial sample. On the other hand, to generate a multiparameter time series that is easy for the CNN network to process, $w$ is set equal to $m$ so that the input matrix is a square matrix.
In addition, this study also tests the step ℓ of the sliding window to prevent over-fitting caused by too dense intervals and under-fitting caused by too sparse intervals.
The performance under different choices of these parameters is evaluated in this experiment. Since $w$ is set equal to $m$, their common value is denoted as $K$; Figures 8 and 9 show the results of the experiments.
Based on the results of Figures 8 and 9, if the $K$ value is too large, the data contain more noise and the convexity of the optimization model is less significant, so the model's accuracy fluctuates strongly. If $\ell$ is too large, the data set contains less information, so the model's accuracy is lower.
Specifically, it can be seen from Tables 3 and 4 that, after comprehensively considering indicators such as the average accuracy rate, the best accuracy rate, and the variance, the best choice is to set the parameter $K$ to 30.
In addition, different values of $\ell$ generate different numbers of feature time series, and the accuracy shows the trend that the more images there are, the higher the accuracy in the data set. Therefore, in this data set, only the relationship between training time and accuracy needs to be considered. Thus, the value of the parameter $\ell$ is set to 2. The experiment therefore evaluates the performance of CNN based on the preset parameter values $K = 30$ and $\ell = 2$.

Discussions on Experimental Results
4.2.1. Fault Feature Analysis. Through PCA dimensionality reduction and the sliding window operation, the sliding window time subsequence of the principal component features of each fault sample is obtained. Figure 10 shows the feature grayscale image of a typical sliding window selected from each of the 11 state types (10 types of faults and 1 normal state). It can be seen from the result that there are significant differences in the stripes of the grayscale images of different fault types.
The sliding window time subsequence goes through the convolutional layer to further extract fault features and generates a feature submatrix set through the convolutional operation. Figure 11 lists the grayscale images of the characteristic submatrix at 24 s, 48 s, 72 s, and 96 s for the normal state, the heat pipe water loss accident, and the main steam isolation valve closed accident, respectively.
It can be seen from Figure 11 that the feature submatrix sets of the same fault type at different times of the time series have high similarity, as shown in Figures 11(a)

Diagnostic Accuracy Analysis.
In order to verify the accuracy and effectiveness of the model in detecting faults, the generated data sets are used for 20 independent training sessions, and the results are compared with the traditional classic back propagation (BP) algorithm (one part of our own method), the LSTM method, and the classic CNN algorithm in [14], which generates images without the time dimension by simply repeating the single feature vector. The comparison indicators include the accuracy and the variance across training sessions, and the results are shown in Figure 12.
The experimental results show that the accuracy and training variances of the CNN models based on the time-series method and the sliding window mechanism are significantly better than those of the BP neural network. In terms of accuracy, the average accuracy of the BP neural network is only 67% and the average accuracy of the LSTM method is 86.9%, while the average accuracy of the model proposed in this paper is increased to 99%. In addition, the optimal accuracy of the BP neural network is 98.7% and the optimal accuracy of the LSTM method is 90.9%, while that of the proposed model is improved to 99.8%. In terms of variance, the variance of the BP neural network is 0.091 and the variance of LSTM is 0.0027, while that of the model proposed in this paper is only 0.0007.
For the comparison of the classification effect, this paper selects the confusion matrix of one typical experiment for each of the four methods. The results are shown in Figure 13. It can be seen that in the classification of the normal state, the CT-CNN method performs very well: only 1% of the normal state data is misdiagnosed, which is better than the other methods. The difference between the normal state and the fifth fault, steam
Compared with the standard CNN method, the model proposed in this paper also shows an improvement in accuracy. The highest accuracy rate of the standard CNN method is 99.3%, and the average accuracy rate is 98.0%. By using the approach in this paper, the highest accuracy rate is improved by 0.5 percentage points, and the average accuracy is increased by 0.8 percentage points. In addition, the variance of the two methods is almost the same.
The above results show that the model proposed in this paper has strong stability and accuracy.
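The comparison indicators used above (average accuracy, best accuracy, variance over repeated training sessions, and the confusion matrix) can be computed as in the following sketch. The accuracy values and labels are toy numbers invented for illustration, not results from the paper, and the use of the population variance is an assumption, since the variance definition is not stated.

```python
import statistics

def summarise_runs(accuracies):
    """Mean, best value, and population variance of per-run accuracies."""
    return {
        "mean": statistics.mean(accuracies),
        "best": max(accuracies),
        "variance": statistics.pvariance(accuracies),
    }

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts samples of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Toy accuracies of three hypothetical training sessions.
summary = summarise_runs([0.98, 0.99, 1.00])
print(summary["best"])  # 1.0

# Toy labels for a three-class problem.
cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(cm)  # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
```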

Conclusion
This paper proposes an FD mechanism for the NPP control system, which is based on PCA, time-series analysis, and CNNs. The CNNs process the time-series data collected by the NPP system through the time window, which not only retains the organic features of the internal time and state information of the data, but also reduces the processing difficulty and improves the feasibility of the fault data. In addition, the proposed approach adapts to different data sets and demonstrates a stable training process. The experimental results show that the proposed approach achieves better detection accuracy and lower variance than the classic back propagation (BP), LSTM, and standard CNN algorithms. More significantly, the optimal accuracy of the proposed model can be as high as 99.8%.

Data Availability
All data included in this study are available upon request by contacting the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.