A Method for Detecting Incipient Faults in Satellites Based on Dynamic Linear Discriminant Analysis

Timely detection and treatment of possible incipient faults in satellites will effectively reduce the damage and harm they could cause. Although much work has been done concerning fault detection problems, the related questions about satellite incipient faults are little addressed. In this paper, a new satellite incipient fault detection method was proposed by combining the ideas of deviation in unsupervised fault detection methods and classification in supervised fault detection methods. First, the proposed method uses dynamic linear discriminant analysis (LDA) to find an optimal projection vector that separates the in-orbit data from the normal historical data as much as possible. Second, under the assumption that the parameters obey a multidimensional Gaussian distribution, it applies the normal historical data and the optimal projection vector to build a normal model. Finally, it employs the noncentral F-distribution to test whether a fault has occurred. The proposed method was validated using a numerical simulation case and a real satellite fault case. The results show that the method proposed in this paper is more effective at detecting incipient faults than traditional methods.


Introduction
With the reduction in the costs of launching rockets and manufacturing satellites, the number of satellites operating in orbit increases annually, bringing large economic benefits to society [1][2][3]. However, due to the harsh operating environment of satellites and human error, key modules or components of satellites in orbit may have abnormalities or experience failures [4]. If incipient faults can be detected and dealt with promptly in their early stages, the damage and harm they cause will be effectively reduced [5]. erefore, the detection of incipient faults in satellites is receiving an increasing amount of attention because it is one of the key technologies that ensures the normal operation of satellites [6,7]. e current common method used to detect faults in satellites is to compare telemetry parameters with preset thresholds directly [8,9].
is fault detection method is suitable for detecting abrupt and large faults. However, it may be less effective at detecting incipient faults because the telemetry parameters with an incipient fault may not change significantly from their normal condition [10]. If the fault detection threshold was set too low, the fault detection method would be sensitive to noise and cause frequent false alarms; whereas if the threshold was set too high, some early symptoms of the fault might be missed. In addition, as the production batches, processes, and operating environments of different satellites are not identical, different fault detection thresholds may need to be determined for different satellites, and it is inefficient to manually set the appropriate threshold for each telemetry parameter.
Model-based satellite fault detection methods are more intelligent than threshold comparison methods and often combine fault detection, isolation, and recovery functions [11][12][13]. However, with the rapid development of science and technology, a variety of new technologies, materials, and highly integrated devices are being used in satellites. e complex coupling relationships between the various components of a satellite and the lack of familiarity of various faults can make it difficult to build accurate and comprehensive fault detection models, thus limiting the application of model-based satellite fault detection methods.
In recent years, data-driven fault detection methods have become a popular research topic due to the advantages of low expert involvement, high modeling efficiency, and high scalability [14][15][16]. At present, data-driven fault detection methods are mainly divided into two categories: unsupervised fault detection methods and supervised fault detection methods.
e core idea of unsupervised fault detection methods is deviation. ese methods use the normal historical data to automatically build a model that characterizes the normal condition of the satellite. It assumes that a fault has occurred when the actual in-orbit data deviates significantly from the model characterizing the normal data. Since the ground test data and the in-orbit data of satellites contain mostly fault-free data, fault detection methods based on unsupervised learning have been widely researched and applied. Representative unsupervised fault detection methods are one-class support vector machine (OCSVM) [17], inductive monitoring system (IMS) [18], principal component analysis (PCA) [19], Gaussian process regression (GPR) [20], long short-term memory (LSTM) [21], and so on. Although these methods use different principles to build normal models, they all have one thing in common-all normal models used to detect faults are obtained by learning from the normal historical data. Once the learning process is complete, each method will use a fixed and invariable model to detect faults, regardless of the variation of the actual inorbit data, with no optimization or adjustment for the actual faults that may occur. e core idea of supervised fault detection methods is classification. ese methods learn and build a classifier from the normal historical data and various real or simulated fault data. If the in-orbit data were classified as a normal class by the classifier, the in-orbit data would be free of faults. Conversely, if the in-orbit data were classified as a fault class by the classifier, the in-orbit data would be deemed to be faulty in some way. Representative supervised fault detection methods are linear discriminant analysis (LDA) [22], support vector machine (SVM) [23], neural networks [24], random forest [25], and so on. However, due to the high reliability of satellites, most of the samples collected by satellite operation and maintenance systems are normal, and fault samples are exceedingly rare. In addition, the classification models built using the fault samples from different satellites may not be generalized, thus hindering the application of supervised fault detection methods in the satellite domain.
Based on the existing research, this paper proposes a new satellite incipient fault detection method that combines the ideas of deviation in unsupervised fault detection methods and classification in supervised fault detection methods. e main contributions of our work are summarized as follows: (1) is paper first uses the idea of classification to find an optimal projection vector separating the in-orbit data from the normal historical data. Specifically, this paper considers the fault detection problem as a binary classification problem and uses LDA to find the optimal projection vector where the in-orbit telemetry data can be distinguished from the normal historical data to the greatest extent. (2) is paper then uses the idea of deviation to test whether a fault has occurred in the in-orbit data.
Specifically, a normal model is built using the normal historical data and the optimal projection vector, and the fault is determined by testing whether the deviation of the in-orbit data from the normal model exceeds the threshold.
is paper is organized as follows: A brief introduction about LDA is given in Section 2. e fault detection method based on dynamic LDA is presented in detail in Section 3. en, the proposed method is illustrated and analysed using a numerical simulation case and a real satellite fault case in Section 4. Finally, conclusions are given in Section 5.

Linear Discriminant Analysis
Linear discriminant analysis, also known as Fisher discriminant analysis [26], is a supervised dimensionality reduction and classification method which is widely used in the field of pattern recognition and machine learning [27][28][29]. Taking the binary classification as an example, given a data set D � (x i , z i ) n i�1 , where x i ∈ R m is a column vector of multidimensional telemetry parameters, z i ∈ R is the corresponding class label, m is the number of variables need be monitored, and n is the is the number of samples. ere are only two values of z i , z i ∈ 0, 1 { }. Let n j ∈ R, X j ∈ R n j ×m , μ j ∈ R m , and Σ j ∈ R m×m , respectively, represent the number of samples, the set of samples, the mean vector, and the covariance matrix of the class j, j ∈ 0, 1 { }. We assume that the projection vector is w ∈ R m . For each sample x i , the projection of x i onto the vector w is w T x i . Moreover, the projections of μ 0 and μ 1 onto the vector w are w T μ 0 and w T μ 1 , respectively. e scatter of each class after projection onto the vector w is S 2 j , as shown in We expect that the samples of the same class are clustered together as much as possible after projection onto the vector w, while the samples of different classes to be more dispersed [30]. us, we can construct the objective function of LDA J(w), as shown in In equation (2), S b � (μ 0 − μ 1 )(μ 0 − μ 1 ) T and S w � Σ 0 + Σ 1 . e objective of LDA is to find an optimal projection vector w that maximizes J(w). Let w T S b w � 1, the problem of finding the optimal projection vector w can be transformed into an optimization problem, as shown in 2 Computational Intelligence and Neuroscience max w T S b w, e optimization problem in equation (3) can be solved by Lagrange multiplier method and then we obtain From equation (4) and the relationship between eigenvalues and eigenvectors [30], we can know that the projection vector w is an eigenvector of the matrix S −1 w S b . Furthermore, the optimal projection vector w is the eigenvector corresponding to the largest eigenvalue of the matrix S −1 w S b . is article intends to use sliding windows and hypothesis testing methods to test whether a fault has occurred in the in-orbit data. e general idea of fault detection is as follows:

Incipient Fault Detection Method Based on
(1) Sliding windows with the length of n 1 are used to extract the in-orbit data in real time. Let the in-orbit data in the kth sliding window be X k ∈ R n 1 ×m . We assume that a fault has occurred in X k , and X k belongs to a different class from the normal historical data X 0 ∈ R n 0 ×m . (2) LDA is used to find an optimal projection vector w k ∈ R that separates the normal historical data X 0 from the in-orbit fault data X k as much as possible. (3) A normal model is built using the normal historical data X 0 and the optimal projection vector w k . (4) Whether the in-orbit data X k deviates significantly from the normal model is tested. If there was a significant deviation, then the original hypothesis that X k and X 0 belong to different classes is valid, and a fault has occurred in X k . If there was no significant deviation, then the original hypothesis is not valid, and there is no fault in X k .
As can be seen earlier, the traditional use of LDA is static. e optimal projection vector will be fixed once the training data are determined. e new use of LDA in this paper is dynamic. For each sliding window of the in-orbit data X k , an optimal projection vector w k is obtained using dynamic LDA. As the in-orbit data X k may vary from different windows, the optimal projection vector w k may not be the same for each LDA process. Due to the use of dynamic LDA, the optimal projection vectors can adjust the in-orbit data in real time, making the proposed method more adaptable to potential faults.

Construction of the Normal Model.
After the optimal projection vector w k is obtained, we need to verify whether there is a significant deviation between X k and X 0 . However, how large of the deviation is the significant deviation? erefore, we need to determine the normal fluctuation range of deviation between X k and X 0 when the in-orbit data X k is normal, and then use the normal fluctuation range to build a normal model. A fault is considered to have occurred when the deviation is outside the acceptable range.
In this paper, the objective function of LDA J(w) is used as the measure of deviation. We assume that the normal historical data X 0 ∈ R n 0 ×m and the in-orbit data X k ∈ R n 1 ×m obey two m-dimensional joint Gaussian distributions X 0 ∼ N(μ 0 , Σ 0 ) and X k ∼ N(μ k , Σ k ), respectively. e projections of X 0 and X k onto the vector w k are f ∈ R n 0 and g ∈ R n 1 , respectively. Based on the property of the m-dimensional joint Gaussian distribution [31], it is clear that f and g obey one-dimensional Gaussian distributions Since X 0 and w k have been obtained after using LDA, it can be considered that the mean vector μ 0 , the covariance matrix Σ 0 , and the optimal projection vector w k in equation (5) are known and fixed, while the mean vector μ k and covariance matrix Σ k associated with X k are unknown and variable. As μ 0 , Σ 0 , and w k are all known, we can assume that w T k μ 0 � c 1 and w T k Σ 0 w k � c 2 , where c 1 and c 2 are two constants. en, equation (5) can be reduced to For the purpose of obtaining the normal fluctuation range of J(w k ), we assume that X k and X 0 belong to the same class and X k is obtained by sampling the joint Gaussian distribution which X 0 obeys. Since f and g obey one-dimensional Gaussian distributions and are the projections of X k and X 0 onto the vector w k , respectively, we can consider that g is obtained by sampling the one-dimensional Gaussian distribution which f obeys. Based on the property of the one-dimensional Gaussian distribution [32,33], the sample mean value w T k μ k of g obeys a one-dimensional Gaussian distribution, as shown in equation (7). (n 1 − 1)w T k Σ k w k /w T k Σ 0 w k obeys the chi-square distribution with degrees of freedom of n 1 − 1 [32,33], as shown in the following equations: Computational Intelligence and Neuroscience Since w T k μ k obeys a one-dimensional Gaussian distribution and c 1 is a constant, w T k μ k − c 1 also obeys a onedimensional normal distribution, as shown in As w T k μ 0 � c 1 and w T k Σ 0 w k � c 2 , we can obtain Furthermore, we can get equation (12) from the relationship between the standard normal distribution and the chi-square distribution: us, the numerator of J(w k ) satisfies Using w T k Σ 0 w k � c 2 to simplify equation (8), we can get erefore, the denominator of J(w k ) satisfies In summary, the numerator of J(w k ) multiplied by a constant n 1 /c 2 obeys a chi-square distribution with a 1 degree of freedom. e denominator of J(w k ) minus a constant c 2 and then multiplied by a constant (n 1 − 1)/c 2 obeys a chi-square distribution with n 1 − 1 degrees of freedom.
erefore, the denominator of J(w k ) obeys a noncentral chi-square distribution. In addition, G 1 (w k ) and G 2 (w k ) are independent of each other. e relationship between the chi-square distribution and the Fdistribution and equations (13), (15) show that 1/n 1 J(w k ) obeys a noncentral F-distribution with degrees of freedom of n 1 − 1 and 1 and a noncentral parameter c 2 , as shown in 1 erefore, we can use the noncentral F-distribution to test whether there is a fault in X k [33]. Given a significance level α, the detection threshold ε k ∈ R of 1/n 1 J(w k ) can be obtained from the noncentral F-distribution test. If the value of 1/n 1 J(w k ) was greater than or equal to ε k , we consider that X k and X 0 belong to the same class, and there is no fault in X k . If the value of 1/n 1 J(w k ) was less than ε k , we consider that X k and X 0 belong to different classes and a fault has occurred in X k . Taking the reciprocal of 1/n 1 J(w k ), we can obtain

Overall Fault Detection
Process. e pseudocode for the overall fault detection method based on dynamic LDA is as follows: (1) Each parameter of the normal historical samples X 0 is normalized by Z-score to obtain X 0 (2) A sliding window with the length of n 1 is used to extract the in-orbit data and X k is obtained (3) e in-orbit data X k is normalized by Z-score to obtain X k (4) LDA is used to find the optimal projection vector w k between X 0 and X k (5) A normal model is built using X 0 and the optimal projection vector w k , and the detection threshold ε k is obtained with the significance level α (6) e value of LDA objective function J(w k ) is calculated according to equation (5)  (7) Determine whether J(w k ) is greater than 1/n 1 ε k ?. If J(w k ) > 1/n 1 ε k was, the in-orbit data X k is faulty; otherwise, X k is normal. Let k � k + 1, the in-orbit data X k+1 of next sliding window will be tested from Steps 3 to 7.
e computation cost of finding the optimum projection vector for each window mainly consists of matrix inversion, matrix multiplication, and solving eigenvalue problem. e time complexities of these three parts are O(n 3 ), where n is the number of monitored variables. Considering all the aforementioned computation cost parts, the computation cost of finding the optimum projection vector for each window is O(n 3 ).

Experiment with the Fixed Fault Magnitude.
A numerical simulation experiment which includes three faults was conducted to verify the effectiveness of the method proposed in this paper. e system was modeled as shown in In equation (18), In this paper, four evaluation indexes: fault detection rate (FDR), false alarm rate (FAR), F1 value, and AUC value were chosen as the indexes for evaluating the fault detection results: e other parameters of the numerical case were set as follows. e total number of samples was 120,400 of which 60,200 were normal historical samples and 60,200 were inorbit samples for testing. e sliding window length was 300, Computational Intelligence and Neuroscience and the sliding window interval was 100 for both the normal historical data and the in-orbit data in the experiment. After using sliding windows, both 600 windows were obtained from the normal historical data and the in-orbit data. e first 300 windows of the 600 windows of the in-orbit data were normal windows, while the last 300 windows were fault windows. e signal-to-noise ratio (SNR) was set to 30 dB [34].
In this paper, eight common fault detection methods were chosen as comparison methods, namely isolation forest (IForest) [35], OCSVM [36], kth nearest neighbor (KNN) [37], local outlier factor (LOF) [38], histogram-based outlier score (HBOS) [39], PCA with T 2 statistic (PCA + T 2 ) [19], PCA with squared prediction error statistic (PCA + SPE) [19], and PCA with combined index (PCA + CI) [40]. For comparison purposes, the parameters monitored by these eight methods were the mean values of each sliding window samples instead of the original values. IForest, OCSVM, KNN, LOF, and HBOS were implemented using the opensource program PyOD [41]. e parameters of PyOD are shown in Table 1. e significance of the parameters were detailed in Appendix (if there were no special explanations, other parameters were default values). For the three PCAbased fault detection methods, the cumulative variance contribution rate was 90%, and the confidence level of the statistic was set to 95%. e significance level of the proposed method was set to 0.005. e detection results of nine fault detection methods for the fault f 1 are shown in Figure 2. As can be seen from e detection results of nine fault detection methods for the fault f 2 are shown in Figure 3. As is shown in Figure 3, the results of the first seven fault detection methods are not satisfactory for the fault f 2 . e fault or anomaly scores of these methods except OCSVM did not change significantly before and after insertion of the fault f 2 . Although the detection result of OCSVM is better, there are still a large number of fault windows below the threshold. It can be seen from Figure 3(i) that the proposed method has a good detection result for the fault f 2 .
As can be seen from Figure 4, except OCSVM and the proposed method, the other seven fault detection methods have poor performance in the detection of the fault f 3 . However, the detection result of OCSVM is not stable. In other words, due to randomly generated signal sources and noise sources, OCSVM may obtain good or poor results. e three faults were simulated randomly 100 times in this paper [42], and then the average values of the fault detection results were calculated and are shown in Table 2.
As can be seen from Table 2, all the false alarm rate of the nine fault detection methods were concentrated in the vicinity of 5% ∼ 7%. Consequently, it could be considered that the results in Table 2 are comparative results under similar false alarm rate condition. In terms of fault detection rate, the fault detection rate of the proposed method for the faults f 1 , f 2 , and f 3 ranked 5th, 1st, and 1st, respectively. As for the fault f 1 , the proposed method ranked 5th but only 3.19% lower than the 1st method. In terms of F1 value, the F1 values of the proposed method for the faults f 1 , f 2 , and f 3 ranked 5th, 1st, and 1st, respectively. In terms of AUC value, the AUC values of the proposed method for the faults f 1 , f 2 , and f 3 ranked 3rd, 1st, and 1st, respectively. Figure 5 shows the AUC value of 100 simulation results using OCSVM and proposed method to detect the fault f 3 . As can be seen from Figure 5, the detection results of OCSVM are unstable. erefore, the evaluation index of OCSVM is not very high after averaging. On the contrary, the detection results of the proposed method are stable and satisfactory. Figure 4(i) that the proposed method has large allowances for the faults f 2 and f 3 . In other words, it seems that the proposed method can also detect the smaller magnitude of the faults f 3 . To test the ability of detecting smaller faults of the proposed method, another experiment was conducted and the fault f 3 was taken as an example. All the simulation environment and experimental parameters were retained, but the fault magnitude of f 3 in equation (19) was varied. e fault magnitude of f 3 was increased from 0.001 to 0.08, and the increase interval was 0.001. Each fault magnitude was simulated 30 times and the average value was used as the result. e F1 and AUC values of the detection results of nine methods with different fault magnitudes are shown in Figure 6.

Experiment with Different Fault Magnitudes. It is evident from
As shown in Figures 6(a) and 6(b), the F1 and AUC values of the aforementioned nine methods show an increasing trend as the fault magnitude of f 3 increases gradually, but the rate of increase varies among the nine methods. e F1 and AUC values of the proposed method increased rapidly with the increase of the fault magnitude and remained finally near the highest value. e optimal fault detection method of the other eight methods for the fault f 3 was the OCSVM method, but it was slower than the method proposed in this paper. Due to the influence of noise, the detection result of OCSVM fluctuated greatly. As can be seen from Figure 6, the detection result of the proposed method may not be advantageous for large magnitude faults. However, in the case of incipient faults, the proposed method has an obvious advantage over the other eight fault detection methods. erefore, the method proposed in this paper is more comprehensive in its ability to detect the different magnitudes of the fault f 3 . . Why is the proposed method more sensitive to small-magnitude faults than other methods? is paper attempted to explain the reasons from the perspective of the optimal projection vector. For presentation purposes, the optimal projection vectors for each sliding window were normalized (the moduli of vector were set to 1) and taken the absolute value. e optimal projection vectors obtained by using dynamic LDA before and after the faults f 1 , f 2 , and f 3 are shown in Figures 7(a)-7(c), respectively. In Figure 7, the first 300 windows were the normal windows, while the last 300 windows were the faulty windows.
As can be seen from Figure 7, due to the influence of noise, the optimal projection vectors obtained by the proposed method were chaotic and had no fixed pattern in the absence of faults. However, in Figures 7(b) and 7(c), the optimal projection vectors obtained using dynamic LDA showed regular patterns after faults f 2 and f 3 occurred. In comparison with Figure 7(a), the optimal projection vectors obtained using dynamic LDA were still chaotic after the occurrence of the fault f 1 , and there is no significant advantage between the proposed method and traditional methods: e size of each component of the optimal projection vector determines the degree of scaling of different parameters in X. Taking Figure 7(b) as an example, the weight of the parameter x 6 was up to about 0.6 after the fault f 2 occurred, while the weights of the parameters x 4 and x 8 were about 0.45, and the weights of the remaining parameters were below 0.3. It can be seen from equation (20) that the fault f 2 was added to the parameter x 6 . It can be concluded that the optimal projection vector enlarged the weights of fault parameters and suppressed the weights of the other parameters.
e enlargement of the fault parameters improves the ability of the proposed method to detect smallmagnitude faults, while the suppression of the other parameters reduces the effect of noise from the other parameters.
As can be seen from Figure 7(c), the weights of x 3 and x 7 were significantly higher than the weights of the other parameters after the fault f 3 occurred. It can be seen from equation (20) that the fault f 3 was added to the parameter x 7 . It can be concluded that the optimal projection vectors also enlarged the weights of fault parameters and suppressed the weights of other parameters after the fault f 3 occurred. e traditional fault detection methods, such as OCSVM, IForest, and PCA, are static methods. Once the learning process is complete, they will use a fixed and invariable model to detect faults. Although the KNN method is dynamic, it treats each parameter equally. ey did not have the aforementioned dynamic process of enlarging the weights of fault parameters and suppressing the weights of irrelevant parameters. Consequently, the other eight methods were not effective for the faults f 2 and f 3 . Although OCSVM can obtain significant results sometimes, the detection results are not stable. After analysis, we consider that the instability of OCSVM may be caused by the selection of noise as the support vectors. In addition, comparing Figures 7(b) and 7(c), the optimal projection vectors were not the same for different faults. e optimal projection vectors obtained using the method proposed in this paper can be automatically adjusted according to the actual fault, and there is no need to manually set the parameter weights in advance.  Figure 8. For reasons of confidentiality, the true telemetry parameter names were hidden. e first 266,540 samples of the total samples were selected as normal historical data, while the next 197,000 samples were selected as test data. e sliding window length was still set to 300 for both the test data and the normal historical data, but the sliding window interval was set to 100. After the sliding window extraction, a total of 2663 windows were obtained from the normal historical data, and a total of 1968 windows were obtained from the test data. e first 324 windows of the 1968 test windows were normal windows, while the subsequent windows were fault windows. e significance level of the proposed method was set to 0.0001, while the rest of the experimental parameters were the same as those presented in Section 4.1.1. e evaluation indexes and detection results of the nine fault detection methods for the real satellite fault are shown in Table 3 and Figure 9, respectively.

Real Satellite Fault
It can be summarized from Figure 9 that the four methods, which including OCSVM, KNN, LOF, and the proposed method, have better detection results for the real satellite fault. In terms of four evaluation indexes, the proposed method all obtained satisfactory results. Although the proposed method has little advantage over the other fault detection methods, it still ranked as the best that can be seen from Table 3. e effectiveness of the proposed method is further verified by the real satellite case.     Figure 7: Variation of optimal projection vectors for the different faults: (a) optimal projection vectors of f 1 , (b) optimal projection vectors of f 2 , and (c) optimal projection vectors of f 3 . From the perspective of projection, the projection process can be considered as a weighted sum process.

Conclusions
Based on the analysis and comparison of existing satellite fault detection methods, this paper proposes a new incipient fault detection method that combines the core ideas of unsupervised learning and supervised learning. en, the effectiveness and superiority of the proposed method were verified through a numerical simulation case and a real fault case. is paper only studies linear Gaussian system. If the system did not meet the assumptions of joint Gaussian distribution and linearity, the detection effect of the proposed method might decrease. Due to the insensitivity of LDA to variance, the proposed method is suitable for detecting the slight abnormal change of mean instead of the slight abnormal change of variance.
Data Availability e numerical case data used to support this study are included within the article. e real satellite fault data used to support the findings of this study are available from the corresponding author upon request.