A Correlation-Based Feature Selection Algorithm for Operating Data of Nuclear Power Plants

Nuclear power plant operating data are characterized by a large variety, strong coupling, and low data value density. When using machine learning techniques for fault diagnosis and other related research, feature selection enables dimensionality reduction while maintaining the physical meaning of the original features, thus improving the computational efficiency and generalization ability of the learning model. In this paper, a correlation-based feature selection algorithm is developed to implement feature selection of nuclear power plant operating data. The proposed algorithm is verified by experiments and compared with traditional correlation-based feature selection algorithms. The experiments and comparison results show that the proposed algorithm is effective in realizing the dimensionality reduction of nuclear power plant operating data.


Introduction
During the real-time operation of a nuclear power plant, parameters such as temperature and pressure are constantly monitored and recorded. When machine learning studies on fault diagnosis or anomaly detection are carried out using nuclear power plant operating data, dimensionality reduction is required to avoid the calculation delay caused by excessive data volume and the interference of weakly related parameters with prediction accuracy [1][2][3][4][5]. Feature selection is interpretable because it retains the physical meaning of the original features, which makes it more advantageous in related research [6,7]. Currently, feature selection for nuclear power plant operating data mainly relies on expert knowledge or simple data preprocessing. Santhosh et al. obtained a new feature set by manually selecting features according to professional knowledge [8][9][10]. Wang et al. comprehensively considered the reliability requirements of the nuclear power system, the coupling relationship between the parameters, and expert experience feedback to determine the strongly correlated parameters [11]. Wang et al. used statistical methods to remove low-variance features in time-series data [12]. The work of Peng et al. was based on relevance analysis and used the Pearson coefficient to evaluate correlation and delete features that have little impact on classification [13]. Na et al. proposed a method that combines correlation analysis with a genetic algorithm to realize automatic input selection [14,15]. However, when these methods are applied to specific classification problems, some important features may be mistakenly removed, or the new feature set may contain a large number of redundant features, which makes it impossible to achieve satisfactory dimensionality reduction while guaranteeing model performance.
Since nuclear power plant operating data have many types and strong coupling characteristics, common feature selection algorithms sometimes cannot accurately identify redundant features. In this context, this paper proposes a correlation-based feature selection algorithm for operating data of nuclear power plants (NPP-FS). The main contributions of the proposed method can be summarized as follows: (i) We consider the characteristics of nuclear power plant operating data and propose that the maximal information coefficient should be used to determine the strength of the correlation between features and class. The results are used as the basis for feature ranking to delete irrelevant features. (ii) We propose an improved approximate Markov blanket concept, which avoids the risk of removing too many features as redundant when making redundancy judgments on nuclear power plant operating data. (iii) We conduct a series of experiments on simulated data covering different operating backgrounds. The effects of the proposed method are investigated, and the feature selection performance is compared with traditional feature selection methods. The results show that the proposed method outperforms the conventional methods in dimensionality reduction.
The remainder of this article is arranged as follows. Section 2 reviews classical correlation-based feature selection algorithms and points out the problems in applying these algorithms to nuclear power plant operating data. Sections 3 and 4 introduce the details of the proposed feature selection algorithm. Section 5 evaluates the validity of the proposed theory and method through experiments. Conclusions are given in Section 6, which summarizes the advantages and disadvantages of the proposed approach as well as some future topics.

Related Work
Correlation-based feature selection is a common way to obtain the input features for model learning, and it has received widespread attention in many fields such as text classification [16][17][18][19], bioinformatics [20], and genetic analysis [21]. Correlation theory regards the feature-class correlation as feature relevance and the feature-feature correlation as redundancy [22]. An original feature set is then considered to be composed of strongly relevant features, weakly relevant but nonredundant features, redundant features, and irrelevant features. A feature selection algorithm is expected to return a feature subset containing only strongly relevant features and weakly relevant but nonredundant features. The most representative of these algorithms are correlation-based feature selection (CFS) [23], the minimal-redundancy-maximal-relevance criterion (mRMR) [24], and the fast correlation-based filter (FCBF) [25,26]. CFS uses a forward search strategy to estimate the performance of a subset of features rather than a single feature. It uses Merit_S as an evaluation function and examines how adding a given feature changes the Merit_S value. The mRMR is similar to the CFS algorithm in that it uses a mutual information-based evaluation function to estimate the correlation and redundancy while also taking into account the performance of the feature subset as a whole. FCBF uses symmetric uncertainty as the correlation measure and decouples relevance analysis from redundancy analysis, performing feature ranking and feature search sequentially to gradually remove irrelevant and redundant features.
Unlike the publicly available datasets used to validate such feature selection algorithms, nuclear power plant operating data have the following characteristics. (1) Various data sources: in addition to variables such as temperature, pressure, water level, and flow from the loop, process parameters such as neutron detection and radiation monitoring are also involved. (2) Strong coupling between variables: for example, the reactor physical and thermal-hydraulic parameters influence each other. (3) Low data value density: fault or abnormality data are scarce, and the occurrence of a certain fault or abnormality is often reflected only by individual variables. These characteristics of nuclear power plant operating data can amplify the shortcomings of the above feature selection algorithms. Because CFS and mRMR take into account the overall performance of feature subsets, they sometimes recognize strongly coupled features poorly. The symmetric uncertainty used in FCBF is weak in identifying nonlinear relationships, which can make the feature ranking unreliable. These problems pose challenges for the application of correlation-based feature selection algorithms.
To overcome the problems of existing algorithms based on correlation and meet the needs of nuclear power plant operating data feature selection, we develop a novel algorithm that can provide a collection of features with strong correlation and low redundancy.

Correlation-Based Measures
Choosing an appropriate measure of correlation is crucial for a correlation-based feature selection algorithm. Mutual information is first introduced in this section. We then focus on two mutual information-based correlation measures adopted by the NPP-FS algorithm. The maximal information coefficient is applied to measure nuclear power plant operating data. Symmetric uncertainty is used to improve the approximate Markov blanket.

Mutual Information.
Among nonlinear correlation measures, many are based on the information-theoretic concept of entropy, which quantifies the uncertainty of a random variable. The entropy of a variable X can be defined as
$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i), \tag{1}$$
where P(x_i) represents the prior probabilities for all values of X. Conditional entropy measures the uncertainty that remains in the random variable X once the value of the random variable Y is known. It is defined as
$$H(X|Y) = -\sum_{j} P(y_j) \sum_{i} P(x_i|y_j) \log_2 P(x_i|y_j), \tag{2}$$
where P(x_i|y_j) represents the posterior probabilities of X given the values of Y.

Science and Technology of Nuclear Installations
Mutual information can be expressed through equations (1) and (2) as [27]
$$IG(X|Y) = H(X) - H(X|Y). \tag{3}$$
It follows by derivation that IG(X|Y) = IG(Y|X), so the quantity is symmetric and can be written as I(X, Y). Mutual information is not normalized, and it is biased toward variables with more values. It is therefore generally not used directly as a correlation measure.
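The three quantities above can be estimated from discrete samples by counting frequencies. The following is a minimal sketch under that assumption, not the authors' code; the function names and the toy variables `valve`, `flow`, and `label` are hypothetical and serve only to illustrate equations (1)-(3).

```python
import math
from collections import Counter


def entropy(xs):
    """H(X) in bits, estimated from the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X|Y): entropy of X inside each group of equal Y, weighted by P(y)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum((len(g) / n) * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """IG(X|Y) = H(X) - H(X|Y), i.e. the mutual information I(X, Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Hypothetical toy data: one feature that mirrors the label, one that does not.
valve = ["open", "open", "closed", "closed"]
flow = ["low", "high", "low", "high"]
label = ["normal", "normal", "fault", "fault"]

print(information_gain(valve, label))  # feature fully determined by the label
print(information_gain(flow, label))   # feature independent of the label
```

Swapping the two arguments of `information_gain` gives the same value up to floating-point error, which is the symmetry stated after equation (3).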

Maximal Information Coefficient.
The maximal information coefficient is a nonparametric statistical method proposed by Reshef et al. that can measure the correlation between two variables [28]. Given sufficient sample data and information, it can identify any type of functional relationship (including the superposition of noise-free functions), and it is more equitable than mutual information.
The calculation method of the maximal information coefficient (MIC) is as follows.
Given a dataset D consisting of n ordered pairs of variables and a scatter plot of all pairs in this set, the X-axis is divided into x bins and the Y-axis into y bins, which yields an x-by-y grid G. For fixed x and y, the grid G can be placed in many different ways. If I(D|G) represents the mutual information of the dataset D divided by G, then the highest mutual information is expressed as follows:
$$I^{*}(D, x, y) = \max_{G} I(D|G). \tag{4}$$
Normalizing the highest mutual information obtained for different x and y forms the characteristic matrix M(D)_{x,y}, which can be expressed by the following equation:
$$M(D)_{x,y} = \frac{I^{*}(D, x, y)}{\log \min\{x, y\}}. \tag{5}$$
On the dataset D, the MIC can be calculated as follows:
$$\mathrm{MIC}(D) = \max_{xy \le B(n)} M(D)_{x,y}, \tag{6}$$
where B(n) = n^α, n is the number of samples, and α is an adjustable parameter. The value of α affects the density of the grid and therefore the judgment of the correlation. Following Reshef's suggestion, α is set to 0.6 in the experiments in this article. The MIC is symmetric due to the symmetry of mutual information, and its score is in the range [0, 1]. When there is a noise-free relationship between two variables, the MIC tends to 1, and when the two variables are statistically independent, the MIC tends to 0.
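To make the roles of the grid resolution and the bound B(n) concrete, the sketch below evaluates a deliberately simplified version of the characteristic matrix. It is an assumption-laden illustration rather than the MIC itself: for each grid size it tries only one equal-width grid instead of maximizing over all grid placements, so it can underestimate the true MIC. The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X, Y) in bits from the empirical joint distribution."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width bins (all 0 if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * k), k - 1) for v in values]


def mic_equal_width(xs, ys, alpha=0.6):
    """Max over grid sizes x*y <= n**alpha of the normalized grid MI.

    Simplification (assumption): one equal-width grid per (x, y) pair,
    not the optimized grid placement used by the real MIC.
    """
    budget = len(xs) ** alpha
    best = 0.0
    x_bins = 2
    while x_bins * 2 <= budget:
        y_bins = 2
        while x_bins * y_bins <= budget:
            mi = mutual_information(equal_width_bins(xs, x_bins),
                                    equal_width_bins(ys, y_bins))
            best = max(best, mi / math.log2(min(x_bins, y_bins)))
            y_bins += 1
        x_bins += 1
    return best


xs = list(range(64))
print(mic_equal_width(xs, xs))                          # noise-free linear relation
print(mic_equal_width(xs, [(x - 31.5) ** 2 for x in xs]))  # noise-free parabola
print(mic_equal_width(xs, [5] * 64))                    # constant second variable
```

With n = 64 and α = 0.6 the budget n^α is about 12, so only coarse grids are examined; a larger α admits finer grids, which is the sensitivity to α described above.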

Symmetric Uncertainty.
Symmetric uncertainty (SU) is a correlation measure based on normalized mutual information proposed by Kvalseth [29] and is often used in correlation-based feature selection algorithms. The calculation method of SU is shown in the following equation [30]:
$$SU(X, Y) = \frac{2\, I(X, Y)}{H(X) + H(Y)}, \tag{7}$$
where SU(X, Y) is the SU of random variables X and Y, I(X, Y) is the mutual information between X and Y, and H(·) represents the information entropy of the random variable. SU compensates for the bias of mutual information toward features with more values and at the same time normalizes its values to the range [0, 1]. SU(X, Y) = 0 indicates that X and Y are independent, and SU(X, Y) = 1 means that one variable can be completely predicted by the other. In addition, SU is symmetric in its two variables because of the symmetry of mutual information.
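As a small illustration of the SU definition, the sketch below computes it for discrete samples. It relies on the standard identity I(X, Y) = H(X) + H(Y) − H(X, Y); returning 0 when both variables are constant is a convention assumed here, not something stated in the text, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(xs):
    """H(X) in bits, estimated from the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X, Y) / (H(X) + H(Y)) for discrete samples."""
    h_x, h_y = entropy(xs), entropy(ys)
    if h_x + h_y == 0:
        return 0.0  # both variables constant; convention assumed in this sketch
    h_xy = entropy(list(zip(xs, ys)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy      # I(X, Y)
    return 2.0 * mutual_info / (h_x + h_y)


print(symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # one predicts the other
print(symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # independent variables
```

Continuous features would first have to be discretized, for example with a histogram, before this function can be applied.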

Improved Approximate Markov Blanket.
The Markov blanket is a concept proposed by Koller to eliminate redundant features, and its definition is as follows [31].
Given a feature f_j in the feature set F, let M_j ⊂ F be a feature subset that does not contain f_j. We say that M_j is a Markov blanket for f_j iff
$$P(F - M_j - \{f_j\}, C \mid f_j, M_j) = P(F - M_j - \{f_j\}, C \mid M_j).$$
The meaning of the above formula is that M_j contains all the information that f_j carries about C and F − M_j − {f_j}; that is, when the feature subset M_j is present, feature f_j does not contribute to classification. The redundant feature based on the Markov blanket is defined as follows.
A feature f_j is redundant iff f_j is weakly relevant and has a Markov blanket within F.
Because obtaining a feature's Markov blanket is an NP-hard problem, the approximate Markov blanket method is usually used to eliminate redundant features in applications. A commonly used approximate Markov blanket is defined as follows.
For two features f_i and f_j (i ≠ j), f_j forms an approximate Markov blanket for f_i iff
$$J(f_j, C) \ge J(f_i, C) \quad \text{and} \quad J(f_i, f_j) \ge J(f_i, C),$$
where J(·) is the chosen correlation measure. This approximation method uses the feature-class correlation and the feature-feature correlation together as the condition for a Markov blanket, and the time complexity is O(N log N). This avoids enumerating all combinations of features, as the nonapproximate definition requires. However, in practical applications this approximation may misjudge strongly relevant features as redundant and eliminate them, so that the subset obtained by feature selection loses part of the information needed for classification [32]. Therefore, this paper proposes an improved approximate Markov blanket method, which is defined as follows.
For two features f_i and f_j (i ≠ j), f_j forms an approximate Markov blanket for f_i iff
$$J(f_j, C) \ge J(f_i, C), \quad J(f_i, f_j) \ge J(f_i, C), \quad \text{and} \quad SU(f_i, f_j) \ge c.$$
In the formula, the value of c is in the range [0, 1], and it constrains how much of the information of the eliminated feature the Markov blanket must contain. The larger the value of c, the higher the feature-feature correlation of the eliminated features and the stronger their redundancy. This improvement can be explained by Figure 1 as follows. When eliminating redundant features, the feature-feature correlation consists of two parts: a class-related part (yellow) and a class-independent part (red). We impose a constraint through SU so that the correlation between a deleted feature and its blanket is mainly class-related rather than class-independent. The MIC is not used as the constraint because its strong ability to mine the correlation between variables may introduce more class-independent correlations into the results.
Through this improvement, it is possible to ensure that the deleted redundant features have high feature-feature correlation, thereby avoiding misjudgment of strongly relevant features and obtaining a feature set with better performance.
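The redundancy test can be written as a single predicate over precomputed correlation values. The sketch below is an interpretation, not the authors' definition: it assumes the usual two approximate-Markov-blanket inequalities on a correlation measure J plus an additional lower bound c on SU(f_i, f_j), as the description of the parameter c suggests. All argument names are hypothetical.

```python
def forms_improved_blanket(rel_i, rel_j, corr_ij, su_ij, c):
    """Return True if f_j is judged an approximate Markov blanket for f_i.

    rel_i, rel_j : feature-class correlations J(f_i, C) and J(f_j, C)
    corr_ij      : feature-feature correlation J(f_i, f_j)
    su_ij        : symmetric uncertainty SU(f_i, f_j)
    c            : threshold in [0, 1] on SU(f_i, f_j) (assumed form)
    """
    classic = rel_j >= rel_i and corr_ij >= rel_i  # commonly used conditions
    return classic and su_ij >= c                  # extra SU constraint


# f_i is strongly tied to f_j by J but only weakly by SU:
print(forms_improved_blanket(0.6, 0.9, 0.7, 0.2, c=0.0))  # classic test alone
print(forms_improved_blanket(0.6, 0.9, 0.7, 0.2, c=0.4))  # with the SU constraint
```

With c = 0 the predicate reduces to the commonly used approximate Markov blanket; raising c makes elimination stricter, so fewer features are removed as redundant.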

Algorithm.
Based on the selected correlation measures and the improved approximate Markov blanket presented above, we develop a two-stage algorithm named NPP-FS. Given a dataset with d features and a class C, the final output is the subset selected after the two stages. The detailed process of the algorithm is shown in Figure 2.
The first stage uses the original dataset as input for relevance analysis (lines 5-11). First, the MIC is used to calculate the feature-class correlation of each feature, and then feature correlation ranking is used to eliminate some features. In this process, a threshold σ in the range [0, 1] is set, all features with a feature-class correlation higher than σ are added to an initially empty set S'_list, and finally the features in S'_list are arranged in descending order of feature-class correlation to obtain the relevance analysis subset. The first stage considers only the correlation of each single feature, so its time complexity is low, but it cannot eliminate the redundant features in the dataset. The features in the relevance analysis subset are strongly or weakly relevant, while the features whose feature-class correlation is less than σ are regarded as irrelevant and are eliminated in relevance analysis. Therefore, different values of σ yield relevance analysis subsets of different dimensions, and σ also determines the minimum feature-class correlation of the features that can appear in the final subset selected by the NPP-FS algorithm.
The second stage uses the relevance analysis subset as input for redundancy analysis (lines 12-22). Based on the feature correlation ranking results, the redundancy of features is evaluated through the MIC and SU, and then the improved approximate Markov blanket is used to eliminate redundant features. The process of redundancy analysis includes a nested loop. For a given threshold c, the inner loop applies the improved approximate Markov blanket to the feature f_j and each feature f_i in the current set whose ranking is lower than that of f_j. Only one feature is judged at a time. If the judgment condition is met, f_i is regarded as a redundant feature and eliminated from the current set. After the inner loop is completed, the outer loop takes the feature following f_j in the current set as the new f_j. The outer loop ends when there is no new f_j. The feature set at this time is the final subset obtained by the NPP-FS algorithm. The second stage considers the relevance and redundancy of features together, so that a feature selection subset with higher relevance and lower redundancy can finally be obtained.
Combining the first and second stages, it can be seen that for a given relevance analysis subset, the result of redundancy analysis is affected only by the value of c. When the σ value is fixed, the dimension of the final feature selection subset can be further controlled by selecting different c values. Therefore, in practical applications, the values of σ and c can be adjusted together to generate lower-dimensional feature selection subsets composed of different features, adapting to different datasets and classification problems.
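The two stages described above can be sketched as follows. This is a hedged reading of the prose, not a transcription of Figure 2: correlation values are supplied by the caller (in the paper they would come from the MIC and SU), the redundancy condition is the assumed form with an SU lower bound c, and all names (`npp_fs_sketch`, `relevance`, `pair_corr`, `pair_su`) are hypothetical.

```python
def npp_fs_sketch(features, relevance, pair_corr, pair_su, sigma, c):
    """Two-stage selection: relevance ranking, then redundancy elimination.

    features  : list of feature names
    relevance : dict mapping feature -> feature-class correlation
    pair_corr : function (f_i, f_j) -> feature-feature correlation
    pair_su   : function (f_i, f_j) -> symmetric uncertainty SU(f_i, f_j)
    sigma, c  : the two thresholds, both in [0, 1]
    """
    # Stage 1: keep features whose relevance exceeds sigma, ranked descending.
    selected = sorted((f for f in features if relevance[f] > sigma),
                      key=lambda f: relevance[f], reverse=True)

    # Stage 2: each surviving f_j may eliminate lower-ranked features f_i.
    j = 0
    while j < len(selected):
        f_j = selected[j]
        survivors = selected[: j + 1]
        for f_i in selected[j + 1:]:
            # relevance[f_j] >= relevance[f_i] already holds by the ranking.
            redundant = (pair_corr(f_i, f_j) >= relevance[f_i]
                         and pair_su(f_i, f_j) >= c)
            if not redundant:
                survivors.append(f_i)
        selected = survivors
        j += 1
    return selected


# Hypothetical toy values: "b" is nearly a copy of "a", "d" is irrelevant.
rel = {"a": 0.9, "b": 0.8, "c": 0.5, "d": 0.05}
corr = {frozenset(("a", "b")): 0.95}
su = {frozenset(("a", "b")): 0.9}


def toy_corr(f_i, f_j):
    return corr.get(frozenset((f_i, f_j)), 0.1)


def toy_su(f_i, f_j):
    return su.get(frozenset((f_i, f_j)), 0.1)


print(npp_fs_sketch(list(rel), rel, toy_corr, toy_su, sigma=0.1, c=0.5))
print(npp_fs_sketch(list(rel), rel, toy_corr, toy_su, sigma=0.1, c=0.95))
```

In the toy run, raising c from 0.5 to 0.95 keeps "b" in the subset, which mirrors the statement that, for a fixed σ, the dimension of the final subset is controlled by c.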

Data Sources.
For unavoidable practical reasons, it is very difficult to obtain fault data from nuclear power plants, so the amount of fault data that can be collected is limited [33]. To verify the effectiveness of the NPP-FS algorithm and apply it to the classification of steady conditions and typical accident conditions, 12 conditions were simulated with nuclear power plant simulation software [34]. The simulation generated a total of 10800 samples, and each sample point includes 90 parameters (not considering time series). The datasets needed for the experiments are constructed from the generated sample data. Table 1 lists the detailed information of the sample data.

Correlation Measure Verification.
This experiment is mainly used to verify the validity of the MIC as a correlation measure of nuclear power plant operating data. The experimental program is shown in Figure 3.
First, we construct the experimental dataset shown in Table 2 according to the original data in Table 1. To show the advantage of the MIC in evaluating correlation, the SU is used for comparative experiments, and professional knowledge is used to assess the MIC as a correlation measure of nuclear power plant operating data. The MIC is calculated by the minepy module [35]. When the features are continuous data, they are discretized by the histogram method before the SU is calculated [36]. The experimental results are retained to 6 decimal places. Figure 4 shows the calculation results of feature-class correlation, and the meaning of the abbreviations can be found in [34]. In experiments 1-5, the average value of the MIC is higher than the average value of the SU. In experiment 1, the difference between the MIC and the SU is the smallest, which is 0.236. In experiment 2, the difference between the two is the largest, which is 0.437. This shows that, on nuclear power plant operating data, the MIC is overall stronger than the SU at measuring correlation. However, when the calculated value is less than 0.2, the difference between the results of the two correlation measures is small, indicating that the two measures perform similarly in this range. In addition, the two measures are 0 for the same features, indicating that they have the same effect in identifying features that are completely unrelated to classification. The above analysis shows that the two measures give different results on nuclear power plant operating data and that the ranking of the features by correlation value is also inconsistent between them. The mechanism of the different operating conditions in experiments 1-5 is clear.
Combined with professional knowledge, it can be found that using SU to evaluate feature-class correlation assigns lower values to some features that are more important for classification, such as TAVG, PSGA, and PSGB in experiment 1, P in experiment 2, WFWA and WFWB in experiment 3, WRCA and WRCB in experiment 4, and LVPZ in experiment 5. The reason for this phenomenon is illustrated in Figure 5, which shows the TAVG and TBLD data contained in the class label "75% power" in experiment 1. It can be seen from Figure 5 that the values of the feature TAVG fluctuate around the setpoint, while the values of TBLD are fixed. Combined with Figure 4, this shows that when the values of a feature fluctuate within the data of a single class label, the SU cannot accurately identify its correlation, yet this type of feature is widespread in the operating data of nuclear power plants.
Based on the above comparative analysis, when the MIC is used to evaluate the correlation, its ability to mine the correlation is overall stronger, and the relative value of different features is also in good agreement with the professional knowledge.

Sensitivity Analysis Experiment.
The NPP-FS algorithm contains two adjustable parameters, σ and c. σ is used to delete irrelevant features, and c is used as a constraint on the approximate Markov blanket to avoid eliminating strongly relevant features. This experiment mainly explores the influence of different values of these two parameters on the results of the NPP-FS algorithm. The experimental program is shown in Figure 6. The experiment builds classifier models to evaluate the feature selection results of the NPP-FS algorithm under different parameter values. First, we construct the experimental dataset shown in Table 3 according to the simulated data in Table 1. In data preprocessing, the NPP-FS algorithm is first used for feature selection, and the grid search method shown in Figure 6 is used to obtain the feature selection results for different parameter values. After the feature selection subset has been obtained by the NPP-FS algorithm, the data are normalized so that inconsistent feature value ranges do not affect the classifier. Since the maximum and minimum values of each feature in the dataset are known, the following formula is used for normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}. \tag{8}$$
After data preprocessing is completed, three classifiers, logistic regression (LR) [37], support vector machine (linear kernel, SVM) [38], and k nearest neighbors (KNN) [39], are used to evaluate the selected subset of features. The classifiers are evaluated by the average accuracy of 10-fold cross-validation. Finally, the performance of the classifiers on the different feature selection subsets is used to evaluate the results of the NPP-FS algorithm under different parameter values. Figure 7 shows how the accuracy of the classifier varies with σ and c.
The results show that the accuracy obtained on the nuclear power plant operating data differs among the three classifiers. This is caused by the different calculation principles of the classifiers. On the same experimental dataset, the results of LR and SVM are relatively consistent, while the results of KNN differ from both.
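The normalization step maps each feature onto [0, 1] using its known minimum and maximum. A minimal sketch is given below; the function name, the example values, and the choice to map a constant feature to 0 are assumptions made for illustration.

```python
def min_max_normalize(rows, mins, maxs):
    """Scale every column of rows to [0, 1] with known per-feature bounds.

    rows : list of samples, each a list of feature values
    mins : known minimum of each feature
    maxs : known maximum of each feature
    """
    normalized = []
    for row in rows:
        normalized.append([
            # A constant feature (max == min) is mapped to 0.0 by assumption.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


# Hypothetical samples with two features on very different scales.
samples = [[280.0, 15.0], [300.0, 15.5], [320.0, 16.0]]
print(min_max_normalize(samples, mins=[280.0, 15.0], maxs=[320.0, 16.0]))
```

Using fixed, known bounds rather than bounds estimated on each training fold keeps the scaling identical across all cross-validation folds.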

The Results of LR and SVM.
The classification accuracy obtained by the sensitivity analysis experiment using LR and SVM shows an obvious partition phenomenon. In each graph, there is a transition zone with a relatively large accuracy gradient. According to the theory of the NPP-FS algorithm, in the relevance analysis, irrelevant, weakly relevant, and strongly relevant features are eliminated in turn as σ increases. In the redundancy analysis, as c increases, there is a tendency to keep first the strongly relevant and then the weakly relevant features in the feature subset. Therefore, the shape of the transition zone is the result of the combined effect of σ and c. When the parameter values change in the vicinity of the transition zone, strongly relevant features are either eliminated or retained in the feature subset, resulting in a relatively large gradient of accuracy in the transition zone.
Figure 8 further shows how the dimensionality of the subset generated by the NPP-FS algorithm changes with σ and c; that is, the results of LR and SVM are strongly correlated with the change of the feature selection subset dimension with the parameters. To obtain higher accuracy, σ should be taken in the interval below the transition zone, but when σ is less than 0.2, the change of the feature selection subset dimension has no obvious effect on the accuracy. The value of c that gives results with higher accuracy is usually greater than 0.4. However, when σ ∈ [0.1, 0.3] and c ∈ [0, 0.4], affected by the approximate Markov blanket, the dimension of the subset of experiment 4 first increases and then decreases as the value of c increases. This also shows that the improved approximate Markov blanket can effectively alleviate the problem of incorrectly eliminating nonredundant features.
As shown in Figure 7, KNN has better adaptability to the classification of operating conditions of nuclear power plants and can achieve higher accuracy over a larger parameter range. In binary classification, the accuracy of experiments 1 and 4 is above 99% over the full experimental parameter range; when σ > 0.1, the accuracy of experiment 2 is above 94%; experiment 3 has an accuracy higher than 96% over the full experimental parameter range. In combination with Figure 8, it can be found that in binary classification, the dimensionality of the feature selection subset has little effect on the accuracy of KNN, and only a few features are needed to obtain a high classification accuracy. The results of experiments 5 and 6 show that in the multiclassification problem, the accuracy of KNN is also strongly related to the change of the feature selection subset dimension with the parameters, but the range of parameters with higher accuracy is larger than that of LR and SVM.

Comparative Experiment and Analysis.
To show the advantages of the NPP-FS algorithm, the same dataset and experimental program as in Section 5.3 are used to compare it with the feature selection algorithms CFS, mRMR, and FCBF. Using the data in Table 3 as input, we evaluate the original feature set (Fullset) and the subsets produced by the four feature selection algorithms, NPP-FS, CFS, mRMR, and FCBF, with the three classifiers LR, SVM, and KNN. The normalization method adopts equation (8), and the models are evaluated by the average of 10-fold cross-validation. We use the sklearn library to build these models with default parameters, which avoids uncertainty from model tuning.
It can be seen from Table 4 that the classification accuracy of NPP-FS on the LR model is lower than that of mRMR in experiment 1 and CFS in experiment 3, but NPP-FS performs better than the other algorithms in the remaining experiments. The average accuracy of NPP-FS over the six experiments is 2.42% higher than that of the CFS algorithm, 4.59% higher than that of the mRMR algorithm, and 17.9% higher than that of the FCBF algorithm. Although the average size of the feature subsets generated by NPP-FS is larger than that of FCBF, NPP-FS has an obvious advantage in terms of classification accuracy. The results shown in Table 5 on the SVM classifier are relatively similar to those on the LR classifier. The classification accuracy of NPP-FS is slightly lower than that of mRMR in experiment 1 and experiment 5 and slightly lower than that of CFS in experiment 3, but in all other experiments, NPP-FS outperforms the other algorithms. The mean accuracy over the six experiments is 1.65% higher than that of CFS, 4.66% higher than that of mRMR, and 17.47% higher than that of FCBF. Similarly, although the NPP-FS algorithm produces larger feature subsets on average than FCBF with SVM, it has a remarkable advantage in terms of classification accuracy. As shown in Table 6, the results of NPP-FS, CFS, mRMR, and FCBF on the KNN classifier are overall better than those on the LR and SVM classifiers. The accuracy of NPP-FS is slightly lower than that of FCBF in experiment 2, but in all other experiments, NPP-FS is superior to the rest of the algorithms. The average accuracy of NPP-FS over the six experiments is 0.32% higher than that of the CFS algorithm, 2% higher than that of the mRMR algorithm, and 11.3% higher than that of the FCBF algorithm. Among the four algorithms, the average size of the NPP-FS feature subsets is the second smallest, larger only than that of FCBF.
According to Tables 4-6, we can conclude that the overall performance of the NPP-FS algorithm is better than that of CFS, mRMR, and FCBF. This is because NPP-FS obtains a subset of features that performs well on every experimental dataset. The remaining three feature selection algorithms suffer, to varying degrees, from sharp declines in model performance, such as CFS in experiment 5, mRMR in experiment 3 and experiment 6, and FCBF in experiment 3, experiment 5, and experiment 6. The reasons for this phenomenon are explained below. The strong coupling between features of nuclear power plant operating data is often complex and at the same time follows deterministic mathematical relationships, and general feature selection algorithms have difficulty identifying whether features linked by such relationships are redundant. When these features contain information necessary for classification, the feature selection algorithm incorrectly removes them as redundant features. If these features are not relevant to the classification, the results are not influenced by this factor. NPP-FS uses the MIC to evaluate the correlation, which can effectively identify the complex nonlinear relationships in the nuclear power plant operating data. Meanwhile, the misjudgment of redundant features is mitigated by an approximate Markov blanket improved for nuclear power plant operating data. Therefore, the overall performance of NPP-FS is superior on the experimental datasets.

Conclusion
In this study, we propose a novel correlation-based feature selection algorithm to address the poor performance of existing correlation-based feature selection algorithms in identifying redundant features in nuclear power plant operating data. We propose the MIC as a correlation measure suited to the characteristics of nuclear power plant operating data and conduct correlation analysis experiments on the MIC and SU in combination with professional knowledge. The results of these experiments show that the MIC is more applicable to nuclear power plant operating data. In addition, we improve the approximate Markov blanket so that redundant features of nuclear power plant operating data are eliminated under a given constraint. Then, we demonstrate the validity of the proposed theory and method through parametric sensitivity analysis experiments. Finally, we compare the performance of the proposed algorithm with several typical correlation-based feature selection algorithms. The results show that the proposed algorithm performs better than the conventional algorithms.
Our proposed approach provides a general dimensionality reduction method for the application of machine learning techniques to nuclear power plant operating data. It can generate a feature-selected subset with good classification performance from labeled data under appropriately chosen parameters, resulting in a remarkable reduction of feature dimensionality and thus improving the computational efficiency and generalization ability of the model. However, there are two points to note in the application of this algorithm. (1) The MIC is less efficient to calculate than traditional correlation measures. Because the MIC and SU respond consistently to irrelevant features, the efficiency of the algorithm can be improved by using SU to filter out irrelevant features in advance. (2) The sensitivity of the two parameters of the algorithm generalizes poorly across different datasets, and the parameters still need to be selected according to the specific dataset and classifier. When applying this algorithm, it is recommended to sample the data with class labels into a small sample dataset and then perform feature selection.
There are several topics for further focus in the future. First, it would be interesting to explore the performance of additional correlation measures on nuclear power plant operating data and methods for evaluating that performance. Second, the proposed approach can be combined with association rules and causal analysis in the field of nuclear power plants for studies such as anomaly detection.
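The recommendation to first sample the labeled data into a smaller dataset can be realized in several ways. The sketch below assumes stratified random sampling, which keeps the class proportions of the full dataset; this choice, the function name, and the class labels in the example are assumptions and are not prescribed by the text.

```python
import random
from collections import defaultdict


def stratified_subsample(samples, labels, fraction, seed=0):
    """Draw roughly `fraction` of the samples from every class label."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))  # keep at least one per class
        chosen.extend(rng.sample(indices, k))
    chosen.sort()                        # preserve the original sample order
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical imbalanced dataset: 900 steady-state and 100 fault samples.
labels = ["steady"] * 900 + ["fault"] * 100
samples = [[float(i)] for i in range(1000)]
small_x, small_y = stratified_subsample(samples, labels, fraction=0.1)
print(len(small_x), small_y.count("steady"), small_y.count("fault"))
```

Feature selection, including the comparatively expensive MIC computation, would then run on the reduced dataset, while the classifiers can still be trained on the full data restricted to the selected features.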

H(·): Information entropy
IG(·)/I(·): Mutual information
f: A single feature
C: Collection of class labels
F: A feature set
J(·): Optional correlation measure
{Data number}: Dataset with a class label
σ: A parameter of the algorithm
c: A parameter of the algorithm.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.