Feature Frequency Extraction Based on Principal Component Analysis and Its Application in Axis Orbit

Vibration-based diagnosis has been employed as a powerful tool for maintaining the operating efficiency and safety of large rotating machinery. However, owing to their intrinsic shortcomings, traditional vibration signal processing techniques do not extract malfunction features accurately enough. In this paper, the relationship between effective eigenvalues and frequency components was investigated, and a new characteristic frequency separation method based on PCA (CFSM-PCA) was proposed. A given feature frequency can be purified by reconstructing the signal from its specified eigenvalues. Furthermore, three significant findings on the distribution of effective eigenvalues were obtained, and their theoretical derivations are presented. More importantly, the proposed scheme can also be used to synthesize the axis orbits of large machines. The purified orbits were clear, and CFSM-PCA exhibited higher efficiency than harmonic wavelet and wavelet packet.


Introduction
Principal component analysis (PCA), which reduces the dimensionality of a data set while retaining most of the information in the original variables [1][2][3], has been widely used in image processing, fault diagnosis, pattern recognition, neural networks, data compression, wavelet transforms, and so on. For example, Kirby et al. [4] employed PCA to compress images and extract their main features. Moreover, PCA combined with a back propagation (BP) neural network has been applied to facial image recognition. Xi et al. [5] and Malhi et al. [6] each applied PCA to reduce data dimensionality and extract feature variables, and then used a neural network as a classifier to categorize bearing faults. To investigate fault diagnosis of the impeller in a centrifugal compressor, Jiang's group [7] also adopted PCA to reduce the dimensionality of multiple time series. Sun et al. [8] analyzed the defects of conventional fault diagnosis methods, introduced data mining technology into fault diagnosis, and then proposed a feature-reduction scheme based on the C4.5 decision tree and PCA.
Generally, when PCA is used for denoising or data compression, the number of effective eigenvalues is determined by the cumulative contribution rate or its variants [9][10][11][12][13][14], expressed as

$$\eta = \frac{\sum_{i=1}^{l}\lambda_i}{\sum_{j=1}^{m}\lambda_j} \tag{1}$$

where $\lambda_i$ and $\lambda_j$ are eigenvalues of the covariance matrix, $m$ is the number of eigenvalues of the covariance matrix, and $l$ is the number of effective eigenvalues. When the cumulative contribution rate $\eta$ exceeds a chosen value (80%-95%), $l$ is determined [2]. Although impressive progress has been made in signal denoising and dimensionality reduction, the extraction or elimination of a specific characteristic spectral line (a single frequency) with this classical PCA method has been largely overlooked. However, precise extraction of the fundamental frequency (1X), the second harmonic (2X), or other feature frequencies of a raw signal is important for the purification of axis orbits, notch filtering [15], speech recognition [16], fault diagnosis of rolling bearings [17], and so forth. Over the past decade, many signal processing tools for extracting a given frequency have been developed, such as the wavelet packet transform, harmonic wavelet, ensemble empirical mode decomposition (EEMD), and sparse decomposition [18]. For instance, references [19][20][21] used the multilevel band division of the wavelet packet to select the frequency band containing a specific frequency, from which the axis orbit could be synthesized. References [22,23] subdivided arbitrary frequency bands with the harmonic wavelet to extract the frequencies of interest, and then refined the rotor center orbit from one or more of these bands. Nevertheless, wavelet packet and harmonic wavelet algorithms are subject to the Heisenberg uncertainty principle: the time-domain resolution $\Delta t$ and the frequency-domain resolution $\Delta\omega$ cannot both be made arbitrarily high, i.e., $\Delta t^2 \cdot \Delta\omega^2 \geq 1/4$.

In addition, in the EEMD method, signals are adaptively decomposed into a sum of intrinsic mode functions (IMFs) whose instantaneous frequencies have physical meaning. In practice, an IMF is often multicomponent rather than monocomponent, which leads to unexplainable irregularities in its instantaneous amplitude and to blind extraction of 1X, 2X, and other subharmonics. Hence, EEMD is not suitable for decomposing signals with multiple components in a narrow band [24]. Using singular value decomposition (SVD), reference [25] generated axis orbits by denoising noisy signals according to the cumulative contribution rate; however, this method cannot extract a specified feature frequency. Therefore, simpler and more effective methods free from the above disadvantages remain to be explored.
Our group has been committed to studying fault diagnosis of large-scale equipment [26][27][28]. While simulating single-frequency signals with PCA, we unexpectedly observed that one frequency component produces two eigenvalues. After further study, we found that PCA can be used to extract one or more specified feature frequencies from a raw signal. The findings are summarized as follows. (1) Each characteristic frequency in a signal produces exactly two effective eigenvalues. (2) The number of effective eigenvalues depends on the number of frequency components in the raw signal and is independent of the amplitudes $A_i$, frequencies $f_i$, and phases $\varphi_i$ of the components. (3) The order of the eigenvalues in the eigenvalue distribution chart of the covariance matrix is determined by the amplitudes of the feature frequencies. Based on these findings, a novel frequency separation method based on PCA was proposed, with which the axis orbits of large rotating machines were readily purified. Moreover, the purification results are better than those of existing methods such as wavelet packet and harmonic wavelet.
The rest of the paper is organized as follows. Section 2 briefly introduces the basic principle of PCA in signal processing and presents a new signal recovery method. Section 3 presents the findings on the relationship between eigenvalues and feature frequencies together with their theoretical verification. Section 4 applies the method to purifying the axis orbit of a large rotor test bed and compares the experimental results with those of harmonic wavelet and wavelet packet, demonstrating the efficiency of the proposed scheme. Section 5 addresses the filtering of a single frequency. Finally, Section 6 draws conclusions.

Basic Theories of PCA in Signal Processing
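We assume that there are $m$ random vectors $(\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m)$, each containing $n$ samples, $\mathbf{x}_i = (x_{i1}, x_{i2}, \cdots, x_{in})$. The $m \times n$ matrix $\mathbf{X}$ with $m$ rows and $n$ columns can be written as

$$\mathbf{X} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_m \end{pmatrix} = \left(\mathbf{x}_1^{\mathrm{T}}, \mathbf{x}_2^{\mathrm{T}}, \cdots, \mathbf{x}_m^{\mathrm{T}}\right)^{\mathrm{T}} \tag{2}$$

where the superscript $\mathrm{T}$ denotes the transpose. Supposing that $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_m$ are the original variables, $l$ new variables $\mathbf{y}_i$ ($i = 1, 2, \cdots, l$; $l \le m$) can be obtained by principal component analysis of $\mathbf{X}$:

$$\mathbf{y}_i = \mathbf{p}_i^{\mathrm{T}}\mathbf{X} = \sum_{j=1}^{m} p_{ij}\,\mathbf{x}_j \tag{3}$$

where $\mathbf{y}_i \in \mathbb{R}^{1\times n}$. According to the definition of PCA [3], $\mathbf{p}_i = (p_{i1}, p_{i2}, \cdots, p_{im})^{\mathrm{T}}$ is the eigenvector corresponding to the $i$th eigenvalue, in descending order, of the covariance matrix of $\mathbf{X}$ [2][3][4][6], and the eigenvectors satisfy

$$\mathbf{p}_i^{\mathrm{T}}\mathbf{p}_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{4}$$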
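The covariance matrix of $\mathbf{X}$ in (2) is

$$\mathbf{C} = \begin{pmatrix} \operatorname{cov}(\mathbf{x}_1,\mathbf{x}_1) & \cdots & \operatorname{cov}(\mathbf{x}_1,\mathbf{x}_m) \\ \vdots & \ddots & \vdots \\ \operatorname{cov}(\mathbf{x}_m,\mathbf{x}_1) & \cdots & \operatorname{cov}(\mathbf{x}_m,\mathbf{x}_m) \end{pmatrix} \tag{5}$$

where $\operatorname{cov}(\mathbf{x}_i, \mathbf{x}_j)$ is the covariance between $\mathbf{x}_i$ and $\mathbf{x}_j$. Based on PCA theory, the characteristic equation of the covariance matrix $\mathbf{C}$ is

$$\mathbf{C}\mathbf{p}_i = \lambda_i\mathbf{p}_i \tag{6}$$

where $\lambda_i$ are the eigenvalues of $\mathbf{C}$ and $\mathbf{p}_i$ are the eigenvectors corresponding to $\lambda_i$. With the eigenvalues arranged in descending order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, dimensionality reduction is achieved by using (1) and (3): the original $m$ variables are converted into $l$ new ones. For signal processing, the signal must then be reconstructed. Since $\mathbf{C}$ is a symmetric positive semidefinite matrix, its eigenvectors are mutually orthogonal, i.e., $\sum_{i=1}^{m}\mathbf{p}_i\mathbf{p}_i^{\mathrm{T}} = \mathbf{I}$ [1,2]. Left-multiplying both sides of (3) by $\mathbf{p}_i$ and summing over $i = 1, \cdots, m$ gives

$$\mathbf{X} = \sum_{i=1}^{m}\mathbf{p}_i\mathbf{y}_i \tag{7}$$

If the first $l$ principal components are chosen for reconstruction according to the cumulative contribution rate $\eta$, an approximate matrix $\hat{\mathbf{X}}$ can be formulated as

$$\hat{\mathbf{X}} = \sum_{i=1}^{l}\mathbf{p}_i\mathbf{y}_i \tag{8}$$

Compared with the original matrix $\mathbf{X}$, the reconstructed approximate matrix $\hat{\mathbf{X}}$ retains most of the information in $\mathbf{X}$ and excludes redundant features such as noise and power-frequency interference [9,10].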
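The selection rule in (1) and the reconstructions in (7) and (8) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes that the rows of X are the variables, that X is centred row by row before the covariance is formed, and that the threshold is 90%; the function name `n_components_by_ccr` is hypothetical.

```python
import numpy as np


def n_components_by_ccr(eigvals, threshold=0.9):
    """Smallest l whose cumulative contribution rate, eq. (1), reaches `threshold`."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # lambda_1 >= ... >= lambda_m
    eta = np.cumsum(lam) / lam.sum()                         # eta for l = 1, ..., m
    l = int(np.searchsorted(eta, threshold)) + 1             # first l with eta >= threshold
    return min(l, lam.size)                                  # guard against rounding at 100%


rng = np.random.default_rng(1)
X = rng.standard_normal((4, 50))             # m = 4 variables, n = 50 samples each
Xc = X - X.mean(axis=1, keepdims=True)       # centre each variable (row)
C = Xc @ Xc.T / Xc.shape[1]                  # covariance matrix of the rows
lam, P = np.linalg.eigh(C)                   # ascending eigenvalues, orthonormal p_i
lam, P = lam[::-1], P[:, ::-1]               # reorder to descending
Y = P.T @ Xc                                 # row i is y_i = p_i^T X, eq. (3)

l = n_components_by_ccr(lam, threshold=0.9)
X_hat = P[:, :l] @ Y[:l]                     # eq. (8): keep the first l components
print("l =", l)
print(np.allclose(P @ Y, Xc))                # eq. (7): all m components recover X
```

Keeping every component reproduces the (centred) matrix exactly, which is the content of (7); truncating to l components gives the approximation (8).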
The signal can be recovered from the reconstructed matrix $\hat{\mathbf{X}}$ by reversing the way the matrix was composed. The rows of $\hat{\mathbf{X}}$ are concatenated head to tail into a row vector $\mathbf{a} \in \mathbb{R}^{1\times mn}$, and the recovered signal $\mathbf{x}$ is obtained by the inverse transformation

$$\mathbf{x}^{\mathrm{T}} = \mathbf{M}^{+}\mathbf{a}^{\mathrm{T}} \tag{9}$$

where $\mathbf{x} \in \mathbb{R}^{1\times L}$, $L = m + n - 1$ is the length of the recovered data, and $\mathbf{M}^{+} = (\mathbf{M}^{\mathrm{T}}\mathbf{M})^{-1}\mathbf{M}^{\mathrm{T}} \in \mathbb{R}^{L\times mn}$ is the pseudoinverse of $\mathbf{M}$. The matrix $\mathbf{M} \in \mathbb{R}^{mn\times L}$ consists of $m$ stacked blocks, each containing an $n \times n$ identity matrix (of rank $n$) shifted one column to the right relative to the previous block. Taking $m = 3$ and $n = 3$ as an example, $L = m + n - 1 = 5$ and $mn = 9$, and $\mathbf{M}$ is

$$\mathbf{M} = \begin{pmatrix} 1&0&0&0&0\\ 0&1&0&0&0\\ 0&0&1&0&0\\ 0&1&0&0&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&1&0&0\\ 0&0&0&1&0\\ 0&0&0&0&1 \end{pmatrix} \tag{10}$$

Recovering $\mathbf{x}$ with $\mathbf{M}^{+}$ actually amounts to averaging the elements on each antidiagonal of $\hat{\mathbf{X}}$, which is consistent with the method reported in [23]. Although $\mathbf{M}$ is sparse, it has $mn$ rows and $L$ columns, so the memory and time needed to compute $\mathbf{M}^{+}$ grow rapidly when $m$ and $n$ are large. A simpler recovery method is therefore desirable. To this end, we developed an averaging method that recovers $\mathbf{x}$ from $\hat{\mathbf{X}}$ directly:

$$x_k = \frac{1}{|\Omega_k|}\sum_{(i,j)\in\Omega_k}\hat{x}_{i,j}, \qquad \Omega_k = \{(i,j) : i + j - 1 = k,\ 1 \le i \le m,\ 1 \le j \le n\}, \qquad k = 1, 2, \cdots, L \tag{11}$$

where $\hat{x}_{i,j}$ is the element in the $i$th row and $j$th column of $\hat{\mathbf{X}}$ and $|\Omega_k|$ is the number of elements on the $k$th antidiagonal. The signal $\mathbf{x}$ can be restored easily according to (11).
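The equivalence between the pseudoinverse recovery (9) and antidiagonal averaging (11) can be checked numerically. The sketch below uses 0-based indices and assumes that a stacks the rows of X̂ head to tail; `selection_matrix` and `antidiagonal_average` are illustrative names, not functions from the paper. Because every row of M contains exactly one 1, MᵀM is diagonal and holds the antidiagonal lengths, which is why M⁺ reduces to averaging.

```python
import numpy as np


def selection_matrix(m, n):
    """M in R^{mn x L}: a^T = M x^T maps a length-L signal to the rows of its Hankel matrix."""
    L = m + n - 1
    M = np.zeros((m * n, L))
    for i in range(m):
        for j in range(n):
            M[i * n + j, i + j] = 1.0      # Hankel element (i, j) equals x[i + j]
    return M


def antidiagonal_average(X_hat):
    """Eq. (11): average the elements on each antidiagonal i + j = const."""
    m, n = X_hat.shape
    idx = np.add.outer(np.arange(m), np.arange(n)).ravel()
    sums = np.bincount(idx, weights=X_hat.ravel(), minlength=m + n - 1)
    counts = np.bincount(idx, minlength=m + n - 1)
    return sums / counts


m, n = 3, 3
rng = np.random.default_rng(0)
X_hat = rng.standard_normal((m, n))      # stands in for a reconstructed, non-Hankel matrix
a = X_hat.ravel()                        # rows concatenated head to tail, a in R^{mn}
M = selection_matrix(m, n)

x_pinv = np.linalg.pinv(M) @ a           # eq. (9): x^T = M^+ a^T
x_avg = antidiagonal_average(X_hat)      # eq. (11)
print(M.shape)                           # (9, 5)
print(np.diag(M.T @ M))                  # [1. 2. 3. 2. 1.]
print(np.allclose(x_pinv, x_avg))        # True
```

The averaging version never forms M, so its cost is linear in the number of matrix elements, which matters when m and n are in the thousands.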

Internal Law of Effective Eigenvalues and Frequency Components
A signal $x(t)$ with amplitude $A_i$, frequency $f_i$, and phase $\varphi_i$ was constructed as follows:

$$x(t) = \sum_{i=1}^{k} A_i\sin(2\pi f_i t + \varphi_i) \tag{12}$$

where $k$ is the number of frequency components. 4096 data points were collected at a sampling frequency of 1024 Hz, and the Hankel matrix with $m$ rows and $n$ columns was formed from $x(t)$. The signal $x(t)$ was then decomposed and reconstructed with the PCA algorithm of Section 2 [9,13]. Zhao et al. [26] studied the effect of the Hankel matrix dimensions on signal processing and pointed out that the processing works better when the number of rows is close to the number of columns. Following this, when $L$ is even, $m = L/2$ and $n = L/2 + 1$; when $L$ is odd, $m = n = (L+1)/2$. Hence, $m = 2048$ and $n = 2049$ were used in our case. The decomposition procedure is as follows. (1) Given $k = 1$ and $A_1 = 1$, three groups of signals were constructed; their principal component eigenvalue distributions are shown in Figure 1. The eigenvalue index ranges from 1 to 2048, and only the leading 50 eigenvalues are shown.
When $k = 1$, each signal contains only one effective frequency component. The three signals have the same amplitude $A_1$ but different frequencies $f_1$ and phases $\varphi_1$. As Figure 1 shows, each signal produces two adjacent nonzero eigenvalues, $\lambda_1$ and $\lambda_2$. Although the three signals differ, their nonzero eigenvalues are identical.
The eigenvalue distributions in Figure 2 show that each signal produces two sets of nonzero eigenvalues, each set containing two adjacent eigenvalues $\lambda_i$ and $\lambda_{i+1}$. Interestingly, the four nonzero eigenvalues produced by each signal are equal to the corresponding eigenvalues generated by the other signals, even though the signals differ.
Moreover, comparing Figure 2 with Figure 1 shows that the first set of eigenvalues is exactly the same in both graphs. In fact, the signals in Figure 2 are established by adding an extra frequency component with amplitude 0.8 to the signals of Figure 1. Thus, it can be confirmed that the first set of nonzero eigenvalues in Figure 2, $\lambda_1$ and $\lambda_2$, is generated by the frequency component with amplitude 1.0, while the second set is generated by the frequency component with amplitude 0.8.
(3) Given $k = 3$ and $A_i = 1$, 0.8, and 0.6, three groups of signals were constructed, each containing three effective frequency components. The frequencies of the first group were 20 Hz, 30 Hz, and 40 Hz; those of the second group were 50 Hz, 60 Hz, and 70 Hz; and those of the third group were 80 Hz, 90 Hz, and 100 Hz. The corresponding phases were 10, 20, and 30 for the first group; 40, 50, and 60 for the second; and 70, 80, and 90 for the third.
With $k = 3$ and $A_i = 1$, 0.8, and 0.6, each group contains three effective frequency components. The amplitudes $A_i$ ($i = 1, 2, 3$) are the same across groups, while the frequencies $f_i$ and phases $\varphi_i$ differ. As Figure 3 shows, each group produces three sets of nonzero eigenvalues, each set containing two adjacent eigenvalues $\lambda_i$ and $\lambda_{i+1}$. In addition, the three sets of eigenvalues are the same for all three groups.
Figures 1-3 show that the first set of eigenvalues has the same magnitude in all three graphs. As noted above, the first set of nonzero eigenvalues, $\lambda_1$ and $\lambda_2$, is generated by the frequency component with amplitude $A_i = 1$. Similarly, the comparison of Figure 2 with Figure 1 shows that the second set, $\lambda_3$ and $\lambda_4$, is generated by the component with amplitude $A_i = 0.8$. It is then almost certain that the third set, $\lambda_5$ and $\lambda_6$, is generated by the component with amplitude $A_i = 0.6$.
The same results are obtained when the number of effective frequency components is increased further. Consider an $m \times n$ Hankel matrix derived from a signal $x(t)$ with $k$ effective frequencies, and let $l = \min(m, n)$. Provided that the Shannon sampling theorem is satisfied and $l > 2k$, the general conclusions are as follows. (1) Each frequency component of the signal produces two adjacent nonzero eigenvalues.
(2) The number of effective eigenvalues of the raw signal depends on the number of frequency components and is independent of the amplitudes $A_i$, frequencies $f_i$, and phases $\varphi_i$.
(3) In the eigenvalue distribution chart of the covariance matrix $\mathbf{C}$, the order of the nonzero effective eigenvalues is determined by the amplitudes $A_i$ of the signal: the larger the amplitude, the larger the two eigenvalues produced by the corresponding frequency, and the higher they rank.
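Conclusions (1)-(3) can be checked numerically on noise-free signals. This is a sketch under stated assumptions: a 1024-point record (instead of 4096) to keep it fast, phases 10, 20, 30 interpreted as radians, row-centred covariance with 1/n normalisation, a relative threshold of 1e-8 for "nonzero", and the k = 1 and k = 2 cases built from prefixes of the first group of Section 3. The helper names are hypothetical.

```python
import numpy as np

FS = 1024.0   # sampling frequency (Hz), as in the text
N = 1024      # assumption: shorter than the 4096 points in the text, to keep the demo fast


def hankel(x):
    """Hankel matrix with m = L/2 rows for even L (m = (L+1)/2 for odd L)."""
    L = len(x)
    m = L // 2 if L % 2 == 0 else (L + 1) // 2
    return np.lib.stride_tricks.sliding_window_view(x, L - m + 1)


def sorted_eig(x):
    """Eigenvalues/eigenvectors of the row-centred covariance C, descending."""
    X = hankel(x)
    Xc = X - X.mean(axis=1, keepdims=True)
    lam, P = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return lam[::-1], P[:, ::-1]


def dominant_freq(v):
    """Frequency (Hz) of the largest FFT magnitude of v."""
    return float(np.fft.rfftfreq(len(v), 1.0 / FS)[np.argmax(np.abs(np.fft.rfft(v)))])


t = np.arange(N) / FS
amps, freqs, phases = [1.0, 0.8, 0.6], [20.0, 30.0, 40.0], [10.0, 20.0, 30.0]
results = {}
for k in (1, 2, 3):
    x = sum(a * np.sin(2 * np.pi * f * t + ph)
            for a, f, ph in zip(amps[:k], freqs[:k], phases[:k]))
    lam, P = sorted_eig(x)
    n_eff = int(np.sum(lam > 1e-8 * lam[0]))                      # numerically nonzero
    pair_freqs = [dominant_freq(P[:, 2 * i]) for i in range(k)]   # first vector of each pair
    results[k] = (n_eff, pair_freqs)
    print(k, n_eff, np.round(lam[: 2 * k], 1), pair_freqs)
```

The count of nonzero eigenvalues should come out as 2k, and the eigenvector of the leading pair should oscillate at the frequency with the largest amplitude, the next pair at the next largest, and so on.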
Inspired by this relationship between frequencies and eigenvalues, a characteristic frequency separation method based on PCA (CFSM-PCA) was proposed. The steps are as follows: (1) For a signal $x(t)$, the direct-current (DC) component is first removed via the fast Fourier transform (FFT), and the Hankel matrix $\mathbf{X}$ is then constructed from the filtered signal.
(3) Based on the distribution of the eigenvalues $\lambda_i$, reconstruction is carried out from the two eigenvalues, and their eigenvectors, associated with the target frequency. For example, from the amplitude perspective, if the target frequency ranks $k$th in amplitude in the raw signal, a new matrix is obtained by reconstructing from the eigenvectors corresponding to the $(2k-1)$th and $2k$th eigenvalues in the eigenvalue distribution chart of $\mathbf{C}$.
(4) The matrix $\hat{\mathbf{X}}$ is obtained by adding the mean of the original matrix to the newly reconstructed matrix.
(5) The signal $\mathbf{x}$, which is the characteristic frequency component, is recovered from $\hat{\mathbf{X}}$ by the averaging method.
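The steps above can be sketched in NumPy as follows. This is an illustrative reading, not the authors' implementation: step (2) is missing from the text and is assumed here to be the covariance eigen-decomposition of Section 2, "the mean of the original matrix" is taken as the row means of the Hankel matrix, and all function names are hypothetical.

```python
import numpy as np


def remove_dc(x):
    """Step (1a): zero the DC bin of the FFT (equivalent to subtracting the mean)."""
    spec = np.fft.fft(x)
    spec[0] = 0.0
    return np.fft.ifft(spec).real


def hankel(x):
    """Step (1b): m x n Hankel matrix, m = L/2 (even L) or (L+1)/2 (odd L)."""
    L = len(x)
    m = L // 2 if L % 2 == 0 else (L + 1) // 2
    return np.lib.stride_tricks.sliding_window_view(x, L - m + 1)


def antidiagonal_average(X):
    """Step (5): average each antidiagonal i + j = const, eq. (11)."""
    m, n = X.shape
    L = m + n - 1
    idx = np.add.outer(np.arange(m), np.arange(n)).ravel()
    return np.bincount(idx, weights=X.ravel(), minlength=L) / np.bincount(idx, minlength=L)


def cfsm_pca_extract(x, rank):
    """Reconstruct the component whose spectral amplitude ranks `rank` (1 = largest)."""
    X = hankel(remove_dc(np.asarray(x, dtype=float)))
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    # Step (2), assumed: covariance matrix and its eigen-decomposition.
    _, P = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    P = P[:, ::-1]                             # columns ordered by descending eigenvalue
    Pk = P[:, 2 * rank - 2: 2 * rank]          # step (3): eigenvalues 2k-1 and 2k (1-based)
    X_hat = Pk @ (Pk.T @ Xc) + mu              # step (4): add the mean back
    return antidiagonal_average(X_hat)


fs = 1024.0
t = np.arange(1024) / fs
s20 = np.sin(2 * np.pi * 20 * t + 0.3)
s80 = 0.8 * np.sin(2 * np.pi * 80 * t + 1.0)
c1 = cfsm_pca_extract(s20 + s80, rank=1)       # expected to approximate s20
c2 = cfsm_pca_extract(s20 + s80, rank=2)       # expected to approximate s80
print(np.max(np.abs(c1 - s20)), np.max(np.abs(c2 - s80)))
```

Both printed errors should be small compared with the component amplitudes; the residual comes from the slight non-orthogonality of the two sinusoids over the finite Hankel rows.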

Theoretical Deduction.
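In this section, the three findings of Section 3.1 are derived. Suppose that a signal is expressed as $x(t) = A\sin(\omega t + \varphi)$ and is discretized with sampling interval $T_s$. The Hankel matrix with $m$ rows and $n$ columns derived from $x(t)$ is

$$\mathbf{X} = \begin{pmatrix} A\sin\varphi & A\sin(\omega T_s+\varphi) & \cdots & A\sin((n-1)\omega T_s+\varphi) \\ A\sin(\omega T_s+\varphi) & A\sin(2\omega T_s+\varphi) & \cdots & A\sin(n\omega T_s+\varphi) \\ \vdots & \vdots & \ddots & \vdots \\ A\sin((m-1)\omega T_s+\varphi) & A\sin(m\omega T_s+\varphi) & \cdots & A\sin((m+n-2)\omega T_s+\varphi) \end{pmatrix} \tag{13}$$

that is, $X_{pq} = A\sin\big((p+q-2)\omega T_s + \varphi\big)$ for $p = 1, \cdots, m$ and $q = 1, \cdots, n$.

(1) Each characteristic frequency produces two effective eigenvalues.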
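The first conclusion is derived as follows. Using Euler's formula, $\sin\theta = \left(e^{\mathrm{j}\theta} - e^{-\mathrm{j}\theta}\right)/(2\mathrm{j})$, (13) can be rewritten element-wise as

$$X_{pq} = \frac{A}{2\mathrm{j}}\left(e^{\mathrm{j}[(p+q-2)\omega T_s+\varphi]} - e^{-\mathrm{j}[(p+q-2)\omega T_s+\varphi]}\right) \tag{14}$$

which splits $\mathbf{X}$ into the sum of two matrices:

$$\mathbf{X} = \mathbf{X}_1 + \mathbf{X}_2 = \frac{A e^{\mathrm{j}\varphi}}{2\mathrm{j}}\,\mathbf{u}\mathbf{v}^{\mathrm{T}} - \frac{A e^{-\mathrm{j}\varphi}}{2\mathrm{j}}\,\bar{\mathbf{u}}\bar{\mathbf{v}}^{\mathrm{T}} \tag{15}$$

where $\mathbf{u} = \left(1, e^{\mathrm{j}\omega T_s}, \cdots, e^{\mathrm{j}(m-1)\omega T_s}\right)^{\mathrm{T}}$, $\mathbf{v} = \left(1, e^{\mathrm{j}\omega T_s}, \cdots, e^{\mathrm{j}(n-1)\omega T_s}\right)^{\mathrm{T}}$, and the bar denotes complex conjugation. From (15), both $\mathbf{X}_1$ and $\mathbf{X}_2$ are outer products of nonzero vectors, so each has rank 1. Since $\operatorname{rank}(\mathbf{X}_1 + \mathbf{X}_2) \le \operatorname{rank}(\mathbf{X}_1) + \operatorname{rank}(\mathbf{X}_2)$, the rank of $\mathbf{X}$ is at most 2.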
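When $A \neq 0$, consider the leading principal minor of order 2 of $\mathbf{X}$ in (13):

$$D_2 = \begin{vmatrix} A\sin\varphi & A\sin(\omega T_s+\varphi) \\ A\sin(\omega T_s+\varphi) & A\sin(2\omega T_s+\varphi) \end{vmatrix} \tag{16}$$

Using the identity $\sin\varphi\,\sin(2\omega T_s+\varphi) = \sin^2(\omega T_s+\varphi) - \sin^2(\omega T_s)$,

$$D_2 = A^2\left[\sin\varphi\,\sin(2\omega T_s+\varphi) - \sin^2(\omega T_s+\varphi)\right] = -A^2\sin^2(\omega T_s) \neq 0 \tag{17}$$

where $\omega T_s \neq 0, \pi, 2\pi, \cdots$, which holds for any frequency $0 < f < f_s/2$ permitted by the sampling theorem. Because this leading principal minor of order 2 is nonzero, the rank of $\mathbf{X}$ is at least 2.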
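Combining $\operatorname{rank}(\mathbf{X}) = \operatorname{rank}(\mathbf{X}_1 + \mathbf{X}_2) \le 2$ from (15) with the nonzero leading principal minor of order 2, the rank of $\mathbf{X}$ is exactly 2. Hence $\mathbf{X}$ has two nonzero singular values, $\sigma_1$ and $\sigma_2$. Since $\mathbf{C} = \mathbf{X}\mathbf{X}^{\mathrm{T}}/n$ (where $n$ is the number of columns of $\mathbf{X}$) [3], the covariance matrix $\mathbf{C}$ has exactly two nonzero eigenvalues, $\lambda_1 = \sigma_1^2/n$ and $\lambda_2 = \sigma_2^2/n$.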
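The rank argument can be spot-checked numerically. The parameters below (amplitude, frequency, phase, sampling rate and matrix size) are illustrative assumptions, and the indices are 0-based, so the text's (p + q − 2) becomes (p + q). The rank is counted from the singular values with an explicit relative threshold.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the paper)
A, f, phi, fs = 1.3, 100.0, 0.7, 1024.0
Ts = 1.0 / fs
m, n = 16, 17

p = np.arange(m)[:, None]          # 0-based row index (p - 1 in the text)
q = np.arange(n)[None, :]          # 0-based column index (q - 1 in the text)
X = A * np.sin((p + q) * 2 * np.pi * f * Ts + phi)    # eq. (13)

minor = np.linalg.det(X[:2, :2])                      # leading 2x2 principal minor, eq. (16)
predicted = -A**2 * np.sin(2 * np.pi * f * Ts) ** 2   # closed form, eq. (17)

s = np.linalg.svd(X, compute_uv=False)                # singular values, descending
rank = int(np.sum(s > 1e-10 * s[0]))

print(np.isclose(minor, predicted))   # True
print(rank)                           # 2
```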
(2) The sequence of eigenvalues is determined by amplitude.
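In terms of PCA, the covariance matrix $\mathbf{C}$ is

$$\mathbf{C} = \frac{1}{n}\mathbf{X}\mathbf{X}^{\mathrm{T}} \tag{18}$$

Referring to (6), $\mathbf{C}$ can be expanded in terms of its eigenvalues and eigenvectors as

$$\mathbf{C} = \sum_{i=1}^{l}\lambda_i\mathbf{p}_i\mathbf{p}_i^{\mathrm{T}} \tag{19}$$

A single frequency component generates two effective eigenvalues, that is, $l = 2$, so

$$\mathbf{C} = \lambda_1\mathbf{p}_1\mathbf{p}_1^{\mathrm{T}} + \lambda_2\mathbf{p}_2\mathbf{p}_2^{\mathrm{T}} \tag{20}$$

Measuring the energy of $\mathbf{C}$ by its trace gives

$$E(\mathbf{C}) = \operatorname{tr}(\mathbf{C}) = \lambda_1\mathbf{p}_1^{\mathrm{T}}\mathbf{p}_1 + \lambda_2\mathbf{p}_2^{\mathrm{T}}\mathbf{p}_2 \tag{21}$$

Based on (4), (21) reduces to

$$E(\mathbf{C}) = \lambda_1 + \lambda_2 \tag{22}$$

Substituting the Hankel matrix $\mathbf{X}$ of $x(t)$ into (18) gives $\operatorname{tr}(\mathbf{C}) = \frac{1}{n}\sum_{p=1}^{m}\sum_{q=1}^{n}A^2\sin^2\big((p+q-2)\omega T_s+\varphi\big)$, so the energy of $\mathbf{C}$ is proportional to $A^2$. By (22), the larger $A^2$ is, the greater the energy of $\mathbf{C}$ and hence the larger the eigenvalues $\lambda_i$. Therefore, the larger the amplitude of a frequency component, the larger its eigenvalues in the eigenvalue distribution chart of $\mathbf{C}$.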
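A quick numerical check of (18) and (22): for a single sinusoid the two nonzero eigenvalues carry the whole trace of C, and doubling the amplitude quadruples that energy. The frequency, phase, sampling rate and record length are illustrative assumptions, and C is left uncentred to match (18); `pair_energy` is a hypothetical helper name.

```python
import numpy as np


def pair_energy(A, f=30.0, phi=0.4, fs=1024.0, L=512):
    """Return (lambda_1 + lambda_2, tr C) for a single sinusoid of amplitude A."""
    t = np.arange(L) / fs
    x = A * np.sin(2 * np.pi * f * t + phi)
    m = L // 2
    X = np.lib.stride_tricks.sliding_window_view(x, L - m + 1)   # m x n Hankel matrix
    C = X @ X.T / X.shape[1]                                       # eq. (18), uncentred
    lam = np.linalg.eigvalsh(C)[::-1]                              # descending
    return lam[0] + lam[1], np.trace(C)


e1, tr1 = pair_energy(1.0)
e2, tr2 = pair_energy(2.0)
print(np.isclose(e1, tr1))        # True: eq. (22), the pair carries all of tr C
print(np.isclose(e2 / e1, 4.0))   # True: energy scales with A^2
```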
Based on these conclusions, once the amplitude rank of a frequency in the amplitude spectrum of the raw signal is determined, the corresponding frequency component can be reconstructed. In this way, one or more characteristic frequencies can be extracted.
Owing to the additive relation in (8), CFSM-PCA can also act as a notch filter: subtracting the frequency component extracted by the algorithm from the original signal eliminates that component from the raw signal. Readers may consult the examples in Section 5 for more details.
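The subtraction idea can be sketched as below. This is an assumption-laden illustration rather than the authors' code: DC removal is done by subtracting the mean (equivalent to zeroing the FFT DC bin), the covariance is row-centred with 1/n normalisation, the 50 Hz "hum" is synthetic, and `cfsm_pca_notch` is a hypothetical name.

```python
import numpy as np


def cfsm_pca_notch(x, rank):
    """Return x minus its CFSM-PCA reconstruction from eigenvalues 2*rank-1 and 2*rank."""
    x = np.asarray(x, dtype=float)
    x0 = x - x.mean()                                     # same as zeroing the FFT DC bin
    L = len(x0)
    m = L // 2 if L % 2 == 0 else (L + 1) // 2
    n = L - m + 1
    X = np.lib.stride_tricks.sliding_window_view(x0, n)  # m x n Hankel matrix
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    _, P = np.linalg.eigh(Xc @ Xc.T / n)
    Pk = P[:, ::-1][:, 2 * rank - 2: 2 * rank]           # eigenvectors of the chosen pair
    X_hat = Pk @ (Pk.T @ Xc) + mu
    idx = np.add.outer(np.arange(m), np.arange(n)).ravel()
    comp = (np.bincount(idx, weights=X_hat.ravel(), minlength=L)
            / np.bincount(idx, minlength=L))              # antidiagonal averaging
    return x - comp                                       # subtraction, following (8)


fs = 1024.0
t = np.arange(1024) / fs
wanted = np.sin(2 * np.pi * 20 * t) + 0.6 * np.sin(2 * np.pi * 40 * t + 0.5)
hum = 0.3 * np.sin(2 * np.pi * 50 * t + 1.0)       # synthetic stand-in for 50 Hz interference
filtered = cfsm_pca_notch(wanted + hum, rank=3)    # 50 Hz ranks third in amplitude here
print(np.max(np.abs(filtered - wanted)))
```

Only the two eigenvectors of the chosen pair enter the subtraction, so components at other frequencies are left essentially untouched, which is the property claimed for the notch filter in Section 5.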

Simulation Example.
To verify the applicability and effectiveness of the CFSM-PCA algorithm, a simulated sinusoidal signal $y$ was constructed as

$$y = \sin(40\pi t + 20) + 0.5\sin(100\pi t + 40) + 0.7\sin(98\pi t + 50) + 0.8\sin(160\pi t + 60) + n(t)$$

where $n(t)$ is Gaussian white noise with a standard deviation of 1.2. After 4096 data points were collected at a sampling frequency of 1024 Hz, the signal is shown in Figure 4(a) and its amplitude (Fourier) spectrum in Figure 4(b). The frequencies were then separated with the proposed approach. The eigenvalue distribution of the raw signal in Figure 5 shows that, apart from four discernible sets of eigenvalues, most eigenvalues are caused by noise; they are small but nonzero and spread over the remaining, higher-index part of the chart.
As Figures 6(a) and 6(b) show, the time-domain waveform and frequency spectrum obtained by reconstructing from the first set of eigenvalues in Figure 5, $\lambda_1$ and $\lambda_2$, have a frequency of 20 Hz and an amplitude of 0.977, corresponding to the component $\sin(40\pi t + 20)$ of $y$.
Figures 6(c) and 6(d) show the time-domain waveform and amplitude spectrum obtained by reconstructing from the second set of eigenvalues, $\lambda_3$ and $\lambda_4$. The reconstructed signal has a frequency of 80 Hz and an amplitude of 0.786, matching the component $0.8\sin(160\pi t + 60)$ of $y$. Figures 6(e) and 6(f) result from reconstructing from the third set, $\lambda_5$ and $\lambda_6$: the frequency is 49 Hz and the amplitude 0.688, consistent with the component $0.7\sin(98\pi t + 50)$ of $y$. Reconstructing from the last set of eigenvalues in Figure 5 gives Figures 6(g) and 6(h): the frequency is 50 Hz and the amplitude 0.463, which matches the component $0.5\sin(100\pi t + 40)$ of $y$ well.
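The simulation can be approximated with the sketch below. It is not a reproduction of Figure 6: it assumes 2048 points (instead of 4096) to keep the eigen-decomposition fast, a fixed random seed for the noise, phases in radians, and the same row-centred covariance as in Section 2. The printed amplitudes will differ somewhat from those reported above because the noise realisation and record length differ.

```python
import numpy as np

FS = 1024.0
N = 2048   # assumption: the text uses 4096 points; 2048 keeps eigh fast


def cfsm_pca_components(x, n_pairs):
    """Reconstruct the first n_pairs eigenvalue pairs of x as separate signals."""
    x = np.asarray(x, dtype=float)
    x0 = x - x.mean()                                   # DC removal
    L = len(x0)
    m = L // 2 if L % 2 == 0 else (L + 1) // 2
    n = L - m + 1
    X = np.lib.stride_tricks.sliding_window_view(x0, n)
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    _, P = np.linalg.eigh(Xc @ Xc.T / n)
    P = P[:, ::-1]                                      # descending eigenvalues
    idx = np.add.outer(np.arange(m), np.arange(n)).ravel()
    counts = np.bincount(idx, minlength=L)
    comps = []
    for k in range(1, n_pairs + 1):
        Pk = P[:, 2 * k - 2: 2 * k]                     # eigenvalues 2k-1 and 2k
        X_hat = Pk @ (Pk.T @ Xc) + mu
        comps.append(np.bincount(idx, weights=X_hat.ravel(), minlength=L) / counts)
    return comps


rng = np.random.default_rng(0)
t = np.arange(N) / FS
y = (np.sin(40 * np.pi * t + 20) + 0.5 * np.sin(100 * np.pi * t + 40)
     + 0.7 * np.sin(98 * np.pi * t + 50) + 0.8 * np.sin(160 * np.pi * t + 60)
     + rng.normal(scale=1.2, size=N))

comps = cfsm_pca_components(y, 4)
freqs = np.fft.rfftfreq(N, 1.0 / FS)
peaks = []
for k, c in enumerate(comps, start=1):
    amp = 2 * np.abs(np.fft.rfft(c)) / N        # single-sided amplitude spectrum
    i = int(np.argmax(amp))
    peaks.append(float(freqs[i]))
    print(f"pair {k}: {freqs[i]:.1f} Hz, amplitude {amp[i]:.3f}")
```

With a 1 s Hankel row length, the 49 Hz and 50 Hz sinusoids are almost orthogonal, which is what lets their eigenvalue pairs separate despite the 1 Hz spacing.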
Moreover, several signals with different signal-to-noise ratios (SNRs) were processed by CFSM-PCA; the amplitude differences are summarized in Table 1.
When the noise is low, the amplitudes of the frequency components extracted by CFSM-PCA are close to those of the raw signal. However, the amplitude difference increases as the SNR decreases.
Figure 6 shows that the individual frequency components can be extracted accurately from the raw signal. More importantly, the results agree closely with the ideal components without phase shift, indicating that CFSM-PCA is a zero-phase-shift frequency extraction algorithm. It should also be noted that the proposed algorithm separates two frequencies even when adjacent components differ by only 1 Hz. The method is accurate and appears more efficient than other existing filtering methods, such as wavelet filters and finite impulse response (FIR) filters, which can suffer from phase distortion.

Experimental Verification
Fault diagnosis of large rotating machinery is an important topic that is attracting increasing attention. Rotor orbits reflect the symptoms of malfunction and are related to variations in input force or dynamic stiffness. Generally, different rotor orbit shapes, in a sense, represent different fault types [19][20][21][22][29]. For example, inner "8", outer "8", and petal-shaped orbits are signs of oil-film whirl, misalignment, and rubbing faults of the rotor system, respectively. Therefore, extracting the rotor orbit plays a significant role in rotating machinery diagnostics. In this section, the simple and efficient CFSM-PCA is applied to purify rotor orbits precisely.

Experimental Rigs.
Figure 7 shows the large rotor vibration test bed built by our group. To monitor the operating state of the rotor in real time, two Kaman KD2306-1S eddy current proximity probes are installed orthogonally on each side (Figure 8). The experimental data were collected with LMS Test Lab.

The First Group Sample Analysis.
The experimental data come from eddy current displacement sensors D11 and D12 on side A. 4096 data points were collected at a test bed speed of 1080 r/min (a rotating frequency of 18 Hz) and a sampling frequency of 2048 Hz. Figure 9 shows the time-domain waveforms, the amplitude spectra, and the amplitude ranking obtained after the DC component was removed via FFT.
As Figure 9 shows, the raw signal was affected not only by noise but also by 50 Hz power-frequency interference and its harmonics. The signal energy was concentrated mainly in the first two harmonics of the rotating frequency (1X and 2X).
The rotor orbit was produced by combining spindle displacement signal D11 with signal D12 (Figure 10), with D11 on the X-axis and D12 on the Y-axis. The operating status of the test bed is difficult to judge from this unprocessed orbit, so PCA was next used to purify it.
Generally, a rotor orbit synthesized from 1X and 2X is reliable, whereas orbits that include higher harmonics tend to become complex or even messy. Hence, extracting 1X and 2X from the raw signal is particularly important [21,22]. The eigenvalue distributions of the covariance matrices of D11 and D12 are displayed in Figure 11. According to the amplitude of each frequency of D11 and D12 in Figure 9, 1X and 2X corresponded to the first set (first and second eigenvalues) and the second set (third and fourth eigenvalues) of eigenvalues, respectively. The leading four eigenvalues in Figure 11 were reconstructed with CFSM-PCA, and the results are shown in Figure 12. The extracted 1X and 2X components were clean, unaffected by noise or by the 50 Hz interference and its harmonics. Meanwhile, the amplitudes of the extracted signals were close to those of 1X and 2X in the raw signal, indicating no obvious spectral leakage.
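In this section the amplitude rank of 1X and 2X is read from the spectrum (Figure 9) and then fixes which eigenvalue pair (2k−1, 2k) to reconstruct. A possible way to automate that lookup is sketched below. The peak-picking rule (local maxima of the FFT magnitude, nearest peak to each target) is an assumption, not the authors' procedure, `amplitude_ranks` is a hypothetical name, and the test signal is synthetic, with a rotating frequency of 1080/60 = 18 Hz as in the first group.

```python
import numpy as np


def amplitude_ranks(x, fs, targets):
    """Rank (1 = largest) of each target frequency among the spectral peaks of x.

    Peaks are local maxima of the FFT magnitude after DC removal; each target is
    matched to the nearest peak (an assumed rule for this sketch).
    """
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    inner = np.arange(1, len(spec) - 1)
    is_peak = (spec[inner] > spec[inner - 1]) & (spec[inner] >= spec[inner + 1])
    peaks = inner[is_peak]
    order = peaks[np.argsort(spec[peaks])[::-1]]        # peak bins by decreasing amplitude
    ranks = []
    for f in targets:
        nearest = order[np.argmin(np.abs(freqs[order] - f))]
        ranks.append(int(np.where(order == nearest)[0][0]) + 1)
    return ranks


fs = 2048.0
t = np.arange(4096) / fs
f_r = 1080 / 60                                          # 1X at 1080 r/min
x = (np.sin(2 * np.pi * f_r * t)
     + 0.4 * np.sin(2 * np.pi * 2 * f_r * t + 0.3)       # synthetic 2X
     + 0.6 * np.sin(2 * np.pi * 50 * t + 1.1))           # synthetic power-frequency line
ranks = amplitude_ranks(x, fs, [f_r, 2 * f_r])
for name, k in zip(("1X", "2X"), ranks):
    print(f"{name}: rank {k} -> eigenvalues {2 * k - 1} and {2 * k}")
```

On real records, leakage sidelobes and noise also create local maxima; they only matter if they outrank the harmonic of interest, which mirrors the reading of the amplitude order from Figures 9 and 14.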
The feature spectra were further separated, and the eigenvalue distributions of the covariance matrices of D11 and D12 are shown in Figure 15.
The rotor orbit generated from the purified 1X and 2X is shown in Figure 13. It is a caved-in banana-like curve, suggesting that side A (near the motor) may suffer from a misalignment fault [18,19]. The clear orbit also supports the effectiveness of the proposed method.

The Second Group Sample Analysis.
In this case, the D11 and D12 data were analyzed after the test bed had run steadily for a period of time. Again, 4096 data points were measured at a test bed speed of 2770 r/min and a sampling frequency of 2048 Hz. The time-domain waveforms and amplitude spectra are displayed in Figure 14.
According to the amplitudes of the first three harmonics of D11 and D12 in Figure 14, 1X corresponded to the second set of eigenvalues (third and fourth eigenvalues), 2X to the third set (fifth and sixth eigenvalues), and 3X to the first set (first and second eigenvalues). As Figure 15 shows, the eigenvalues of 1X and 3X were large, so 1X and 3X were first used to synthesize the rotor orbit. The purification results and the shaft center orbit are shown in Figures 16 and 17, respectively. The clear orbit showed a typical plum blossom shape.

Shock and Vibration
Figures 16 and 17 show that the filtering effect was again evident. The noise, the 50 Hz power-frequency interference, and its harmonics were removed successfully. Moreover, the amplitude of the refined signal was close to that of the original, and the reconstructed time-domain signal appeared stable, without fluctuation.
As Figures 14(c) and 14(d) show, the amplitudes of 1X, 2X, and 3X in the amplitude spectrum of the raw signal were all large. Thus, 1X, 2X, and 3X were also extracted to synthesize the axis orbit. The time-domain waveforms and frequency spectra are shown in Figure 18, and the orbit in Figure 19.
The axis orbit synthesized from the leading three frequencies exhibited a quincunx shape, essentially the same as the orbit synthesized from the 1X and 3X components, suggesting that rubbing between stationary and rotating parts may exist on side A.

Comparison with Harmonic Wavelet and Wavelet Packet
(1) Comparison of CFSM-PCA with Harmonic Wavelet. The harmonic wavelet proposed by Newland [29] is a commonly used improved wavelet algorithm for signal processing. It can subdivide the entire frequency band arbitrarily finely and avoids the downsampling shortcoming of the dyadic (binary) wavelet and wavelet packet [22,23]. 1X and its harmonics can therefore be extracted easily to form the axis orbit of a rotating test rig.
Using the harmonic wavelet algorithm, an axis orbit was synthesized from the 1X and 2X components of the data of example 1 (Section 4.2); the results are shown in Figure 20. The orbit has a caved-in banana-like shape, consistent with the result obtained by CFSM-PCA. Note, however, that the orbit in Figure 13 is clearer than that in Figure 20, i.e., W1 < W2. Meanwhile, the amplitude of the time-domain signals fluctuated considerably (Figures 20(a) and 20(b)). The main reason is that the harmonic wavelet, being a bandpass filter in the frequency domain, inevitably extracts noise near the feature frequency. This also shows that the harmonic wavelet is subject to the Heisenberg uncertainty principle, i.e., amplitude leakage occurs.
(2) Comparison of CFSM-PCA with Wavelet Packet. Besides the harmonic wavelet, the wavelet packet is also used to purify axis orbits [19][20][21]. The axis orbit purified with a Daubechies wavelet packet (Figure 21) was seriously divergent, worse than that of the harmonic wavelet, i.e., W1 < W2 < W3. Severe energy leakage occurred during the extraction of 1X and 2X, causing the orbit to diverge. Moreover, the bandpass filtering of the wavelet packet also contributes to the divergent orbit.

A Single Frequency Filtration with CFSM-PCA
This section discusses the application of CFSM-PCA to single-frequency filtering. Taking the D11 data of example 1 (Section 4.2) as a demonstration, we attempt to filter the 50 Hz power-frequency interference out of D11 with CFSM-PCA. Figure 9(b) shows that the 50 Hz component ranked third in amplitude in the raw signal, so the eigenvectors corresponding to the 5th and 6th eigenvalues in the eigenvalue distribution map were used for reconstruction. According to (8), the filtered signal without 50 Hz (Figure 22) was obtained by subtracting the reconstructed signal from the original one.
The results in Figure 22 show that the algorithm filters out the 50 Hz power-frequency interference without affecting the other frequency components of the raw signal, demonstrating its potential as a notch filter.

Conclusion
The selection of effective eigenvalues in the eigenvalue distribution chart of the covariance matrix C is crucial in the PCA algorithm. In this paper, the relationships between effective eigenvalues and the frequencies and amplitudes of a signal were investigated, and a series of conclusions are summarized.
First, with the Hankel matrix construction, each valid frequency component of a signal produces only two adjacent nonzero eigenvalues. Second, the number of valid eigenvalues in the eigenvalue distribution chart depends on the number of frequency components and is independent of the values of $A_i$, $f_i$, and $\varphi_i$. Third, the order of the effective eigenvalues in the distribution chart is determined by the amplitudes $A_i$ of the signal's frequency components: the larger the amplitude, the larger the eigenvalues and the higher their rank.
Based on these principles, a new CFSM-PCA was developed, which proves to be an effective tool for extracting or eliminating a specific characteristic frequency, even when two frequencies differ by only 1 Hz. Purification of rotor orbits with this algorithm shows significant improvement over existing methods such as the wavelet packet and the harmonic wavelet. Although CFSM-PCA is effective, it has some drawbacks. For example, like other algorithms, its reconstruction error increases as the SNR decreases. In addition, applications of CFSM-PCA in speech recognition, bearing fault diagnosis, and power-system fault diagnosis should be further explored. We believe that CFSM-PCA will find use in diverse fields.


Figure 1: Eigenvalue distribution of covariance matrix C with one frequency component.

Figure 2: Eigenvalue distribution of covariance matrix C with two frequency components.

Figure 3: Eigenvalue distribution of covariance matrix C with three frequency components.

Figure 5: Eigenvalue distribution of the covariance matrix C of y.

Figure 6: The time and frequency domains of the reconstructed signals. (a) The time domain of the reconstructed signal via the first set of eigenvalues. (b) Amplitude spectrum of Figure 6(a). (c) The time domain of the reconstructed signal using the second set of eigenvalues. (d) Amplitude spectrum of Figure 6(c). (e) The time domain of the reconstructed signal using the third set of eigenvalues. (f) Amplitude spectrum of Figure 6(e). (g) The time domain of the reconstructed signal using the fourth set of eigenvalues. (h) Amplitude spectrum of Figure 6(g).

Figure 7: Large rotor vibration test bed.

Figure 8: Schematic and installation of eddy current displacement sensor.

Figure 9: The time domain and frequency domain of the D11 and D12 sensors on the A-end face. (a) The time domain of D11. (b) The frequency domain of D11. (c) The time domain of D12. (d) The frequency domain of D12.

Figure 11: Eigenvalue distribution of covariance matrix C. (a) Eigenvalue distribution of the D11 signal. (b) Eigenvalue distribution of the D12 signal.

Figure 14: The time and frequency domains of the D11 and D12 sensors. (a) The time domain of D11. (b) The time domain of D12. (c) The frequency domain of D11. (d) The frequency domain of D12.

Figure 15: Eigenvalue distribution of covariance matrix C. (a) Eigenvalue distribution of D11. (b) Eigenvalue distribution of D12.

Figure 16: Time-domain waveforms and spectra of the first two frequencies of D11 and D12 extracted by the PCA algorithm. (a) The time domain of D11. (b) The time domain of D12. (c) The frequency spectrum of D11. (d) The frequency spectrum of D12.

Figure 18: Time-domain waveforms and spectra of the first three frequencies of D11 and D12 extracted by the PCA algorithm. (a) The time domain of D11. (b) The time domain of D12. (c) The frequency spectrum of D11. (d) The frequency spectrum of D12.

Figure 20: The purification effect of the harmonic wavelet. (a) The time domain of D11. (b) The time domain of D12. (c) The frequency spectrum of D11. (d) The frequency spectrum of D12. (e) Orbit synthesized from the first two extracted frequencies.

Figure 21: The purification effect of the wavelet packet. (a) The time domain of D11. (b) The time domain of D12. (c) The frequency spectrum of D11. (d) The frequency spectrum of D12. (e) Orbit synthesized from the first two extracted frequencies.

Table 1: Amplitudes of signal components under different SNRs.