Research on Maximum Symbolic Entropy from Intrinsic Mode Function and Its Application in Fault Diagnosis

Empirical mode decomposition (EMD) is a self-adaptive analysis method for nonlinear and nonstationary signals. It has been widely applied to machinery fault diagnosis and structural damage detection. In this paper, a novel feature, the maximum symbolic entropy of intrinsic mode functions obtained by EMD, is proposed to enhance the recognition capability of EMD-based analysis. First, a signal is decomposed into a collection of intrinsic mode functions (IMFs) based on its local characteristic time scale, and the IMFs are then transformed into a series of symbolic sequences with different parameters. Second, the entropies of the symbolic IMFs are found to differ considerably across these parameters; however, each symbolic IMF always attains a maximum value. Third, the maximum symbolic entropy is taken as a feature describing the IMFs of a signal. Finally, the proposed feature is applied to rolling bearing fault diagnosis to evaluate its effectiveness, and it is compared with other standard time-domain features in a contrast experiment. Although maximum symbolic entropy is only a time-domain feature, it reveals the characteristic information of the signal accurately. It can also be used in other fields related to the EMD method.


Introduction
Empirical mode decomposition (EMD) is an adaptive time-frequency signal processing method [1][2][3][4]. EMD is widely applicable, offers a high signal-to-noise ratio, and does not require predefined basis functions; therefore, it has been applied in many engineering fields [5][6][7][8]. With EMD, the original signal is decomposed into a series of intrinsic mode functions (IMFs) according to its own features. The IMFs can reveal the nonstationary and nonlinear properties of the signal and reflect its information at different time scales.
How to extract the properties of a nonstationary signal from its IMFs is an important problem. Many researchers have attempted to solve this problem, often focusing on eliminating mode mixing and calculating sensitive features [5,9,10]. With the development of science and technology, industrial applications place higher demands on efficiency and accuracy.
Many researchers obtain the instantaneous frequency of the IMFs with the Hilbert-Huang transform and then apply envelope analysis to locate the key frequency band. Although both time-domain and frequency-domain characteristics are considered, the frequency-domain analysis of the IMFs requires a large amount of computation, which makes the features costly to compute. Moreover, many feature values are needed in pattern recognition, which makes intelligent decision-making and information fusion difficult [4][11][12][13]. With the development of pattern recognition methods such as support vector machines and neural networks, constructing effective features has become increasingly important in signal analysis, since effective features allow the state of a nonstationary signal to be recognized reliably and quickly. Energy features, dimensionless features, statistical features, and others have been applied in various situations [14][15][16]. However, EMD-based methods often increase computation because of the multiple IMFs and their corresponding frequency components, and the low-frequency components in the IMFs may introduce redundant information. These defects lead to lower prediction accuracy in signal analysis. Therefore, advanced features that represent the signal well and are easy to calculate should be developed for this challenging task.
Recently, entropy-based features such as information entropy and sample entropy have frequently been computed from the IMFs obtained by EMD [17][18][19]. Among these features, symbolic entropy has been shown to represent statistical regularity well. Owing to its computational efficiency and robustness to disturbances, symbolic entropy represents a major improvement among entropy-based features [20][21][22].
In this paper, we combine the advantages of IMFs and entropy and propose a novel feature, maximum symbolic entropy, based on symbolic dynamics in order to overcome the above shortcomings.
The paper is organized as follows. In Section 2, the components of the proposed methodology, namely EMD and the symbolization of IMFs, are introduced, together with the procedure for computing maximum symbolic entropy and its related parameters. In Section 3, the methodology is verified on simulated signals and its properties are described. In Section 4, contrast test results demonstrate the effectiveness and reliability of maximum symbolic entropy; the experimental data for normal and faulty bearings come from the Case Western Reserve University Bearing Data Center. Finally, conclusions are given in Section 5.

Calculation of Intrinsic Mode Function.
EMD is a powerful time-frequency analysis technique that decomposes a nonlinear and nonstationary time series into a set of nearly orthogonal components called intrinsic mode functions. An IMF is a function that satisfies the following two conditions: (1) over the whole data set, the number of extrema and the number of zero crossings must either be equal or differ by at most one; (2) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero. The EMD process of a signal $x(t)$ can be described as follows [4]:
(1) Find the positions and amplitudes of all local maxima and minima of $x(t)$.
(2) Create an upper envelope $e_{\max}(t)$ and a lower envelope $e_{\min}(t)$ by cubic spline interpolation of the local maxima and the local minima, respectively, and calculate their mean $m(t) = \left(e_{\max}(t) + e_{\min}(t)\right)/2$.
(3) Subtract the mean envelope from the signal: $h_1(t) = x(t) - m(t)$. If $h_1(t)$ satisfies the two IMF conditions, it is taken as an IMF. Otherwise, set $x(t) = h_1(t)$ and repeat steps (1)-(3) until the result satisfies the stopping criterion.
(4) Once the first IMF $c_1(t)$ has been obtained, replace $x(t)$ with the residual $r_1(t) = x(t) - c_1(t)$ and repeat the above process.
Finally, the signal $x(t)$ is separated into $n$ IMFs $c_i(t)$ and a residue $r_n(t)$:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t). \quad (1)$$
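The sifting procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy and SciPy (`scipy.interpolate.CubicSpline`) are available, pins both envelopes to the end samples instead of using a proper boundary treatment, and replaces the unspecified stopping criterion with a Cauchy-type test on successive sifts (`sd_tol` is a hypothetical threshold). The helper names `local_extrema`, `envelope_mean`, and `emd` are illustrative.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope_mean(x, t):
    """Step (2): mean m(t) of the cubic-spline envelopes, or None if x has too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Crude boundary handling (assumption): anchor both envelopes at the end samples.
    imax = np.r_[0, maxima, len(x) - 1]
    imin = np.r_[0, minima, len(x) - 1]
    e_max = CubicSpline(t[imax], x[imax])(t)
    e_min = CubicSpline(t[imin], x[imin])(t)
    return (e_max + e_min) / 2.0


def emd(x, t, max_imfs=8, max_sifts=50, sd_tol=0.2):
    """Return (imfs, residue) with x == imfs.sum(axis=0) + residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residue, t) is None:   # (nearly) monotonic residue: stop
            break
        h = residue.copy()
        for _ in range(max_sifts):
            m = envelope_mean(h, t)
            if m is None:
                break
            h_new = h - m                        # step (3): subtract the mean envelope
            # Cauchy-type stopping test on successive sifts (assumed criterion)
            sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
            h = h_new
            if sd < sd_tol:
                break
        imfs.append(h)                           # accept h as the next IMF c_i
        residue = residue - h                    # step (4): r_i = r_{i-1} - c_i
    return np.array(imfs), residue
```

Because each IMF is subtracted from the running residue, the IMFs and the final residue always add back to the input, which is a quick sanity check of the decomposition identity.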
To mitigate end effects and mode mixing, the original signal was extended by mirroring before EMD. The extended signal was then decomposed, and the extended parts of the IMFs were cut off correspondingly. High-frequency information of the signal is usually contained in the first IMFs, while noise and low-frequency information appear in the residue and the later IMFs. Thus, feature calculation mainly targets the first few IMFs.
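A mirror extension and the matching crop can be written with NumPy's `np.pad`. The text does not say whether the mirror repeats the end sample; this sketch assumes `mode="reflect"` (no repetition), and the extension length `n_ext` as well as the helper names are hypothetical.

```python
import numpy as np


def mirror_extend(x, n_ext):
    """Extend x by n_ext samples at each end by mirroring it about its end points."""
    # mode="reflect" mirrors about the first/last sample without repeating it;
    # mode="symmetric" would repeat the edge sample instead.
    return np.pad(np.asarray(x, dtype=float), n_ext, mode="reflect")


def crop_extension(component, n_ext):
    """Cut the extended parts off a component (e.g. an IMF) of the extended signal."""
    return component[n_ext:len(component) - n_ext]
```

In use, the extended signal is decomposed and `crop_extension` is applied to every IMF, so that each IMF again has the length of the original signal.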

Symbolic Intrinsic Mode Function.
Symbolizing an IMF according to a certain rule is essentially a quantization process; therefore, local details and some specific information are discarded. However, a symbolized signal is more robust and reliable against noise interference. These advantages make the resulting features more sensitive to the intrinsic properties of the signal.
There are two common ways to symbolize a signal. One converts the sequence into symbols point by point; the other takes several consecutive points as a unit and converts the sequence unit by unit. Because the latter accounts for the relationship between neighboring points, it is more conducive to revealing signal properties. In the symbolization process, the range between the minimum and maximum values is usually divided into 2 or 4 intervals [23][24][25].
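In this paper, the range was divided into 4 intervals, and 3 points of an IMF were taken as a unit to realize symbolization. Each point $x_i$ of an IMF is mapped to a symbol $s_i$ as
$$s_i = \begin{cases} 0, & x_{\min} \le x_i < b_1, \\ 1, & b_1 \le x_i < b_2, \\ 2, & b_2 \le x_i < b_3, \\ 3, & b_3 \le x_i \le x_{\max}, \end{cases} \quad (2)$$
where $x_{\min}$ and $x_{\max}$ are the lower and upper boundaries, respectively, and $b_1$, $b_2$, and $b_3$ are break points that strongly influence the symbolization result. According to symbolic dynamics, we can set $b_2 = \mu$, where $\mu$ is the mean value, and $b_2 - b_1 = b_3 - b_2 = \Delta$ so that the intervals are symmetric. $\Delta$ denotes the interval length, which is set through the ratio parameter $\alpha$.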
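The four-interval mapping with break points $b_1 = \mu - \Delta$, $b_2 = \mu$, $b_3 = \mu + \Delta$ reduces to a single `np.digitize` call. In this sketch, a point equal to a break point goes to the upper interval, and $\mu$ and $\Delta$ are passed in directly; how $\Delta$ is derived from the ratio parameter is left to the caller. The function name `symbolize` is illustrative.

```python
import numpy as np


def symbolize(x, mu, delta):
    """Map each sample to a symbol in {0, 1, 2, 3} using symmetric break points.

    b1 = mu - delta, b2 = mu, b3 = mu + delta (delta > 0):
    0: x < b1,  1: b1 <= x < b2,  2: b2 <= x < b3,  3: x >= b3.
    """
    breaks = np.array([mu - delta, mu, mu + delta])
    # np.digitize (right=False) returns i with breaks[i-1] <= x < breaks[i]
    return np.digitize(np.asarray(x, dtype=float), breaks)
```

For an IMF `c`, a typical call is `symbolize(c, c.mean(), delta)`; the resulting symbols are then grouped into 3-point units.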
The symbolization result depends on the ratio parameter $\alpha$ and the mean value $\mu$. For a given signal, $\mu$ is a constant. In essence, the distribution of points among the different intervals determines the result of the division; therefore, the choice of $\alpha$ is the most important factor in symbolizing IMFs. To obtain the symbolic law under different interval divisions, let $N_1$ denote the total number of symbols 1 and 2 and $N_2$ the total number of symbols 0 and 3 in the symbolized sequence, and introduce a new parameter $p$ defined as $p = N_1/(N_1 + N_2)$. Thus $p$ essentially determines the interval division, that is, the positions of both $b_1$ and $b_3$. Once $p$ changes, the symbolization result changes considerably.
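The text does not spell out how a chosen proportion $p = N_1/(N_1 + N_2)$ of points in the two inner intervals is turned into break points. One simple reading, used here purely as an assumption, is to set $\Delta$ to the $p$-quantile of $|x - \mu|$, so that roughly a fraction $p$ of the samples receives symbol 1 or 2. The functions `delta_for_proportion` and `inner_fraction` are hypothetical helpers.

```python
import numpy as np


def delta_for_proportion(x, mu, p):
    """Hypothetical rule: half-width delta so that about a fraction p of the
    samples falls in the inner intervals [mu - delta, mu + delta)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return float(np.quantile(np.abs(np.asarray(x, dtype=float) - mu), p))


def inner_fraction(x, mu, delta):
    """Empirical p = N1 / (N1 + N2): share of samples labelled 1 or 2."""
    x = np.asarray(x, dtype=float)
    inner = (x >= mu - delta) & (x < mu + delta)
    return float(inner.mean())
```

Under this rule, sweeping $p$ moves $b_1$ and $b_3$ symmetrically away from $\mu$, which matches the role the text assigns to $p$.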
where $N(u_k)$ is the number of occurrences of unit $u_k$. The symbolic entropy $H$ can then be written as
$$H = -\sum_{k=1}^{4^m} P(u_k) \log P(u_k). \quad (4)$$
Entropy characterizes the probability distribution of the symbolic units. A low value of $H$ reflects stronger regularity in the symbolic sequence, and a high value reflects weaker regularity.
For a symbolic IMF, changing the symbolization intervals changes $H$, and for a given signal there is always a maximum symbolic entropy. Symbolic entropy measures the randomness of a symbolic sequence, and a higher value usually means that the sequence carries more information about the symbolized signal. Thus, the maximum value reflects the complexity of a given symbolic sequence and can be taken as a feature of the corresponding symbolic IMF.
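The entropy of the symbolic units can be evaluated directly from the unit counts $N(u_k)$. The sketch below assumes a base-2 logarithm (the text does not state the base) and uses the usual convention $0 \log 0 = 0$; `symbolic_entropy` is an illustrative name.

```python
import numpy as np


def symbolic_entropy(counts, base=2.0):
    """Shannon entropy -sum P(u_k) log P(u_k) of a histogram of symbolic units.

    Units with zero count contribute nothing (0 * log 0 := 0).
    """
    counts = np.asarray(counts, dtype=float)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log(probs)) / np.log(base))
```

With 3-point units there are $4^3 = 64$ possible units, so a sequence in which all units are equally likely reaches $\log_2 64 = 6$ bits, while a sequence made of a single repeated unit gives 0.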
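Putting the steps together, the maximum symbolic entropy of one IMF and the proportion $\beta$ at which it occurs can be found by sweeping the proportion $p$ of points in the two inner intervals over a grid. This is a sketch under several assumptions not fixed by the text: overlapping 3-point windows, the quantile rule for $\Delta$, a base-2 logarithm, and a grid from 5% to 95% in 1% steps. The names `unit_counts` and `max_symbolic_entropy` are hypothetical.

```python
import numpy as np


def unit_counts(symbols, m=3):
    """Histogram over the 4**m possible units of m consecutive symbols (sliding window)."""
    s = np.asarray(symbols, dtype=int)
    n_units = len(s) - m + 1
    codes = np.zeros(n_units, dtype=int)
    for j in range(m):                      # read each window as a base-4 number
        codes = codes * 4 + s[j:j + n_units]
    return np.bincount(codes, minlength=4 ** m)


def max_symbolic_entropy(imf, m=3, p_grid=np.linspace(0.05, 0.95, 91)):
    """Return (H_max, beta): the largest symbolic entropy over the grid of p
    and the p where it occurs. Delta follows p via a quantile rule (assumption)."""
    x = np.asarray(imf, dtype=float)
    mu = x.mean()
    best_h, best_p = -np.inf, None
    for p in p_grid:
        delta = np.quantile(np.abs(x - mu), p)          # ~fraction p gets symbol 1 or 2
        symbols = np.digitize(x, [mu - delta, mu, mu + delta])
        counts = unit_counts(symbols, m)
        probs = counts[counts > 0] / counts.sum()
        h = -np.sum(probs * np.log2(probs))             # symbolic entropy in bits
        if h > best_h:
            best_h, best_p = h, p
    return float(best_h), float(best_p)
```

For white noise the maximum lies close to the 6-bit ceiling, whereas a pure sinusoid, whose neighboring samples are strongly correlated, yields far fewer distinct units and thus a much lower maximum. Averaging $\beta$ over a data set and evaluating only that single $p$ avoids the sweep.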

Analysis of Simulation Signal
To study the properties of the maximum symbolic entropy of IMFs from different signals, simulation experiments were carried out on a periodic signal and on the same periodic signal with added noise.

Periodic Signal.
The periodic signal is composed of 3 cosine components with frequencies of 50 Hz, 100 Hz, and 200 Hz; it contains 1024 points and is denoted $x(t)$ in (5). Figure 1 shows $x(t)$ and its IMFs; the IMFs are extracted successively by EMD. Figure 2 shows the relationship between the symbolic entropies and the proportion $p$ of points assigned to the two inner intervals: the symbolic entropy follows a roughly parabolic curve as $p$ increases.
The maximum symbolic entropies of the IMFs of $x(t)$ are recorded in Table 1. The maximum symbolic entropy decreases from IMF1 to the low-order IMFs. The value of $p$ at which the entropy reaches its maximum, denoted $\beta$, differs among the IMFs but fluctuates around 50%. This behavior can be attributed to the properties of entropy and of the symbolization.
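A test signal of this kind can be generated as follows to reproduce the simulation. The sampling frequency of 1024 Hz (so that the 1024 points span 1 s) and the unit amplitudes are assumptions, not values taken from the paper; `periodic_test_signal` is a hypothetical helper.

```python
import numpy as np

FS = 1024.0   # sampling frequency in Hz (assumption)
N = 1024      # number of points, as stated in the text


def periodic_test_signal(amplitudes=(1.0, 1.0, 1.0), freqs=(50.0, 100.0, 200.0)):
    """Sum of cosines at 50, 100, and 200 Hz; unit amplitudes are an assumption."""
    t = np.arange(N) / FS
    x = sum(a * np.cos(2 * np.pi * f * t) for a, f in zip(amplitudes, freqs))
    return t, x
```

With this choice each component falls exactly on an FFT bin, so the spectrum shows three clean peaks, which is a convenient check before running EMD.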

Periodic Signal Containing Noise.
To evaluate the noise suppression ability, Gaussian white noise was added to $x(t)$ with a signal-to-noise ratio of 10, and the result was denoted $x_n(t)$.
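Noise at a prescribed signal-to-noise ratio can be added by scaling a Gaussian draw so that the power ratio matches exactly. The sketch treats the stated ratio of 10 as a power ratio (equivalently 10 dB); if an amplitude ratio was meant, 100 should be passed instead. `add_white_noise` is a hypothetical helper.

```python
import numpy as np


def add_white_noise(x, snr_power, rng=None):
    """Return x plus zero-mean Gaussian white noise scaled so that
    mean(x**2) / mean(noise**2) equals snr_power (a linear power ratio)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    noise = rng.standard_normal(x.shape)
    # Rescale the realized noise so the empirical SNR is exact, not just expected
    noise *= np.sqrt(np.mean(x ** 2) / (snr_power * np.mean(noise ** 2)))
    return x + noise
```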
IMF1-IMF3 of $x_n(t)$ are shown in Figure 3. Because of mode mixing, these IMFs cannot reflect the properties of the original signal. Their symbolic entropies are shown in Figure 4, and the distribution is similar to that in Figure 2. $\beta$ is smaller than for the noise-free periodic signal. Moreover, the maximum symbolic entropies are given in Table 2.

For a periodic signal, there is always a maximum symbolic entropy whether or not the signal contains noise, and its value increases as the signal becomes more complex. With EMD, the maximum symbolic entropy decreases from the high-order IMFs to the low-order ones. Since it can represent IMFs in various situations, the maximum symbolic entropies of the high-order IMFs were taken as new features for engineering signals.

Bearing faults are among the most frequent causes of failure in mechanical drives; thus, the proposed approach is evaluated on bearing fault data from Case Western Reserve University [26]. Deep groove ball bearings (6205-2RS, SKF) were used in the experiment, and faults were seeded into the bearings by electro-discharge machining. The faulted bearings were reinstalled in the test motor, and vibration data were recorded at a sampling frequency of 12 kHz. Faults with diameters ranging from 0.178 mm to 0.533 mm were introduced separately at the inner raceway, the rolling element, and the outer raceway.

Application in Fault Diagnosis
Four groups of fault signals were selected, as listed in Table 3. The experiment has two aims: to distinguish the failure mode and to judge the fault level within each mode. Sample A includes a normal bearing and bearings with inner raceway, outer raceway, and rolling element faults of 0.533 mm. Sample B includes inner raceway faults of 0.178 mm, 0.356 mm, and 0.533 mm, and samples C and D contain the corresponding outer raceway and rolling element faults. Each sample consists of 60 signals, each 1024 points long. The average values of $\beta$ are given in Table 4.
Figure 5 shows the symbolic entropies of IMF1-IMF8 of the normal bearing. The distribution of entropy over different $p$ is similar to that in Figure 4; furthermore, $\beta$ is nearly equal to the values in Table 2, which means that the engineering signal can be regarded as a periodic signal with noise. The result proves that symbolic entropy can be …

The maximum symbolic entropies of IMF1-IMF3 from the rolling bearing samples are recorded in detail in Table 4. The values differ markedly between failure modes, which shows how the features are distributed across samples with different failure modes and levels. The features are also tightly concentrated within the same failure mode and clearly separated between different modes. $\beta$ ranges from 43% to 47%, so the average $\beta$ can be used in the calculation to avoid an exhaustive search over $p$. Because the bearing data are engineering signals, their $\beta$ values are closer to those in Table 2 than to those in Table 1.
The raw maximum symbolic entropy values of the high-order IMFs are shown in Figure 6. Their distribution shows that maximum symbolic entropy is sensitive to different kinds and degrees of faults.

Contrast Test.
To evaluate the separation ability, other time-series features of the IMFs, namely symbolic entropy (as in [23]), standard deviation, and root mean square, are compared with maximum symbolic entropy. Advanced machine-learning classifiers such as neural networks may separate the classes well with these features, but their strong nonlinear self-learning ability can mask the real properties of the features, and they also require large training sets. Therefore, K-means clustering is adopted in the contrast test to evaluate the features themselves, since its result reflects the spatial distribution of the features and the distances between them. The results are shown in Table 5.
In sample A, maximum symbolic entropy reaches an accuracy of 85.00%, better than the other features, whose accuracies are near 70.00%. This indicates that maximum symbolic entropy is robust in expressing different faults.
In samples B and C, standard deviation and root mean square recognize the inner and outer raceway faults of different degrees well, with accuracies above 90%. The two entropy features score lower, although maximum symbolic entropy still reaches nearly 80%.
In sample D, all features have lower accuracy, since rolling element faults are generally more difficult to distinguish in bearing fault diagnosis. Only maximum symbolic entropy exceeds 80%.
Overall, maximum symbolic entropy achieves an accuracy of about 80% for all samples with Euclidean-distance clustering and outperforms symbolic entropy. Although its accuracy is lower than that of the two classical time-domain features in samples B and C, maximum symbolic entropy distinguishes samples A and D more reliably.
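The K-means evaluation can be sketched with plain NumPy: Lloyd iterations with Euclidean distance, followed by scoring the clusters against the true classes under the best one-to-one relabeling. The deterministic farthest-point initialization, the iteration cap, and the permutation-based scoring are assumptions made for this illustration; the text does not say how clusters were matched to classes. All function names are hypothetical.

```python
import itertools

import numpy as np


def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the farthest point."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers)


def kmeans(features, k, n_iter=100):
    """Lloyd's K-means with Euclidean distance; returns a cluster index per row."""
    X = np.asarray(features, dtype=float)
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        # squared Euclidean distance from every row to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def clustering_accuracy(true_labels, cluster_labels, k):
    """Accuracy under the best one-to-one mapping of clusters to classes 0..k-1."""
    y = np.asarray(true_labels)
    c = np.asarray(cluster_labels)
    best = 0.0
    for perm in itertools.permutations(range(k)):   # 24 mappings for k = 4
        best = max(best, float(np.mean(np.array(perm)[c] == y)))
    return best
```

Brute force over permutations is adequate for the three or four classes per sample used here; for many classes, `scipy.optimize.linear_sum_assignment` on the confusion matrix finds the same best mapping efficiently. Farthest-point seeding is sensitive to outliers, so k-means++ seeding is a common alternative.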
To provide a reference for further applications, support vector machines (SVMs) are also used for classification with cross-validation, using all features. The training and validation sets each contain 30 sequences per sample. The radial basis function (RBF) is taken as the SVM kernel function:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma^2}\right), \quad (6)$$
where $\sigma$ is the kernel width.
The classification results are shown in Table 6. Accuracy is higher than with K-means clustering because of the supervised, nonlinear learning ability of the SVM, and most separation results are near 90%. Maximum symbolic entropy performs better than symbolic entropy. Compared with standard deviation and root mean square, maximum symbolic entropy separates the different types of bearing faults, and the failure levels of rolling element faults, more accurately.
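The Gaussian RBF kernel, $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 / (2\sigma^2))$, can be computed for whole feature matrices with NumPy. The width `sigma` is a free parameter whose value is not given in the text, and `rbf_kernel` is an illustrative name.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against round-off
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
```

In scikit-learn, `SVC(kernel="rbf", gamma=1 / (2 * sigma**2))` uses the same kernel, since its RBF kernel is defined as $\exp(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^2)$.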

Conclusion
Empirical mode decomposition and symbolization provide powerful tools for nonlinear and nonstationary signal analysis. In this research, maximum symbolic entropy based on IMFs is proposed and successfully applied to fault diagnosis. The simulations and bearing experiments lead to the following main conclusions.
(1) Maximum symbolic entropy based on IMFs is proposed and analyzed. The relationship between the symbolization parameters and the symbolic entropy value is established, and the maximum symbolic entropy is shown to be robust in expressing different signals.
(2) In the simulations and experiments, bearing faults are detected accurately by the proposed features. Furthermore, the proposed method, maximum symbolic entropy with EMD, can distinguish both the failure mode and the failure level of rolling bearings.
(3) In the contrast test, other time-series features are compared with maximum symbolic entropy. The proposed feature performs better than symbolic entropy in fault diagnosis, and different types of bearing faults, as well as the failure levels of rolling element faults, are separated more accurately with maximum symbolic entropy.

Figure 5: Relationship between symbolic entropy and $p$ for the normal bearing.

Fault Diagnosis.
Maximum symbolic entropy was used in fault diagnosis to demonstrate its effectiveness in analyzing engineering signals.

Figure 6: The original data of maximum symbolic entropy from test samples A to D.

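Table 1: Maximum symbolic entropy of the IMFs of $x(t)$.

Entropy of Symbolic Sequence.
If $m$ points of the sequence are selected as a basic unit for symbolization, there are $4^m$ possible kinds of units in the signal. With $N$ the length of the original signal, the probability $P(u_k)$ of one of the $4^m$ units $u_k$ is its relative frequency: the number of times $N(u_k)$ that $u_k$ occurs, divided by the total number of units obtained from the $N$-point sequence.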
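Reading each window of $m$ symbols as a base-4 number gives a direct index into the $4^m$ possible units. This sketch assumes overlapping (sliding) windows, so an $N$-point sequence yields $N - m + 1$ units; the text does not say whether the windows overlap. `unit_probabilities` is a hypothetical helper.

```python
import numpy as np


def unit_probabilities(symbols, m=3):
    """Estimate P(u_k) for all 4**m units of m consecutive symbols in {0, 1, 2, 3}.

    Each window (s_i, ..., s_{i+m-1}) is encoded as the base-4 integer
    k = s_i * 4**(m-1) + ... + s_{i+m-1}.
    """
    s = np.asarray(symbols, dtype=int)
    n_units = len(s) - m + 1
    codes = np.zeros(n_units, dtype=int)
    for j in range(m):
        codes = codes * 4 + s[j:j + n_units]
    counts = np.bincount(codes, minlength=4 ** m)   # N(u_k) for every unit
    return counts / n_units
```

For example, the sequence 0, 1, 2, 3, 0 contains the units (0, 1, 2), (1, 2, 3), and (2, 3, 0), encoded as 6, 27, and 44, each with probability 1/3.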

Table 2: Gaussian white noise increases the complexity of the signal, and the entropies of all IMFs increase.

Table 3: Test samples of rolling bearings.

Table 4: Maximum symbolic entropy of IMF1-IMF3 from the bearing data.

Table 5: Results of the contrast test based on K-means clustering.