A Wavelet Based Multiscale Weighted Permutation Entropy Method for Sensor Fault Feature Extraction and Identification

Sensors are the core modules in signal perception and measurement applications. Because of harsh external environments, aging, and similar causes, sensors are prone to failure and unreliability. In this paper, three common faults of a single sensor (bias, drift, and stuck-at) are investigated, and a fault diagnosis method based on wavelet permutation entropy is proposed. The method takes advantage of the multiresolution ability of the wavelet transform and the internal structure complexity measure of permutation entropy to extract fault features. Multicluster feature selection (MCFS) is used to reduce the dimension of the feature vector, and a three-layer back-propagation neural network classifier is designed for fault recognition. The experimental results show that the proposed method can effectively identify the different sensor faults and has good classification and recognition performance.


Introduction
At present, sensors are widely used in various processes to obtain data on a variety of physical quantities. In practical applications, owing to harsh external environments, battery depletion, aging, and other reasons, a sensor is prone to failure or even damage [1,2]. Data obtained from a faulty sensor has low reliability, and subsequent judgment, recognition, decision, and control based on such low-quality data lose their meaning. The reliability of sensor data and the identification of sensor faults are therefore important research subjects. Sensor fault identification mainly consists of two aspects: fault feature extraction and fault pattern classification [3,4].
The wavelet transform is a widely used time-frequency analysis technique. Using the wavelet transform, signals are decomposed into multilevel time-frequency components. Choosing a suitable wavelet basis for the decomposition is important for representing fault information [5]. Several criteria exist for selecting the wavelet basis, such as the minimum joint entropy criterion, the minimum conditional entropy criterion, the maximum mutual information criterion, the minimum relative entropy criterion, and the maximum energy-to-Shannon entropy ratio criterion [6,7]. The maximum energy-to-Shannon entropy ratio criterion takes both energy intensity and energy distribution into consideration and is capable of extracting sensor fault variations effectively. Under this criterion, the wavelet basis with the maximum energy-to-Shannon entropy ratio is the most appropriate one.
After wavelet decomposition, the main problem is how to extract fault information from the coefficients in the decomposed subbands. The traditional Shannon entropy considers only the probability distribution of the signal values and ignores their order structure. Reference [8] combines the concept of Shannon entropy with symbolic theory to propose a new complexity measure, the permutation entropy (PE). Permutation entropy is a time series complexity measure based on comparing neighboring values and mapping them into symbol sequence patterns. It describes the local structural features of a time series and amplifies subtle changes in the signal, with low computational complexity and good noise robustness. PE is an effective method for classifying different signal states, identifying breakpoints in a time sequence, predicting the future trend of a time series, determining causal relationships [9], and so forth. To overcome the single-scale limitation of PE, Aziz and Arif [10] proposed the multiscale permutation entropy (MPE) to estimate the complexity of a time series on different scales. MPE describes the structural characteristics and complexity of a time series on multiple scales and is widely used in heart sound signal analysis [10] and bearing fault diagnosis [11].
On the other hand, a disadvantage of PE is that it retains only the sequential pattern and discards the amplitude information of the signal [12]. Reference [13] puts forward the weighted permutation entropy (WPE), which extracts the sequential pattern of a time series while retaining its amplitude information. Although WPE uses the amplitude information, it reflects the complexity of a time series on a single scale only. Combining multiscale analysis with weighted permutation entropy yields the multiscale weighted permutation entropy (MWPE). MWPE measures the complexity of the signal on multiple scales and reflects both the local microstructure complexity and the amplitude information of the signal. It is widely used in signal analysis, such as the analysis of bubbly oil-in-water flow [14], bearing fault diagnosis [15-17], biomedical signal analysis [18-21], and stock information analysis [22].
From the viewpoint of structural feature representation, PE extracts the local microstructure features and the wavelet transform extracts the global macrostructure features. The combination of the wavelet transform with MWPE can therefore represent the features of a sensor fault comprehensively. A wavelet based multiscale weighted permutation entropy (WMWPE) is proposed in this paper. WMWPEs of different subbands are used to extract signal features. Because the dimension of the WMWPE feature vector is relatively high, using all features may lower identification accuracy and increase computation time, so the most important features in the WMWPEs must be selected [23,24]. In this paper, multicluster feature selection (MCFS) [23] is used as the feature selection method; it takes into account both the importance of each feature itself and the correlation between all features. The features are sorted by MCFS score, and those with the highest scores are selected as the important features. With MCFS feature selection, the recognition accuracy is maintained, the feature vector dimension is reduced, and the computational efficiency is improved.
After feature selection with MCFS, a multifault classifier is needed to conduct the fault diagnosis. A three-layer BP neural network is adopted as the classifier. The features selected by MCFS are fed into the classifier to identify the sensor fault.
The remainder of this paper is organized as follows. Section 2 introduces permutation entropy and multiscale permutation entropy. In Section 3, the wavelet based multiscale weighted permutation entropy and the corresponding fault identification method are presented in detail. Using practical data, the performance of the proposed method is investigated in Section 4. Section 5 makes some concluding remarks.
The PE with embedding dimension $m$ is defined as

$$H_{PE}(m) = -\sum_{j=1}^{m!} p(\pi_j) \ln p(\pi_j), \tag{4}$$

where $p(\pi_j)$ is the relative frequency of the permutation pattern $\pi_j$. The maximum value of $H_{PE}(m)$ is $\ln(m!)$, attained when all possible permutations appear with the same probability. Therefore, the normalized permutation entropy (NPE) can be obtained as

$$H_{NPE}(m) = \frac{H_{PE}(m)}{\ln(m!)}. \tag{5}$$

For any time series, $0 \le H_{NPE}(m) \le 1$ is satisfied. The value of $H_{PE}(m)$ depends on the selection of the embedding dimension $m$ and the delay $\tau$. If $m$ is too small, the scheme will not work well since there are too few distinct states. However, a large $m$ is often inappropriate for detecting the dynamic change of a time series.
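As an illustration of the definition above, the following minimal Python sketch computes PE and NPE for a short series. The function names, the tie-breaking rule for equal values, and the default parameter values are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def ordinal_pattern(window):
    """Return the permutation that sorts `window` in ascending order.

    Equal values are ordered by position (earlier sample first); this
    tie-breaking rule is one common convention, assumed here.
    """
    return tuple(sorted(range(len(window)), key=lambda k: (window[k], k)))


def permutation_entropy(x, m=3, tau=1, normalize=False):
    """Permutation entropy of `x` for embedding dimension m and delay tau."""
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for the chosen m and tau")
    # Count how often each ordinal pattern occurs among the embedding vectors.
    counts = Counter(
        ordinal_pattern([x[j + k * tau] for k in range(m)])
        for j in range(n_vectors)
    )
    # Shannon entropy (natural logarithm) of the pattern frequencies.
    h = sum(-(c / n_vectors) * math.log(c / n_vectors) for c in counts.values())
    # Dividing by ln(m!) maps the value into [0, 1].
    return h / math.log(math.factorial(m)) if normalize else h
```

For a strictly increasing series only one ordinal pattern occurs, so the sketch returns zero; a series that alternates up and down with $m = 2$ uses both patterns equally often and gives a normalized value of 1.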

Weighted Permutation Entropy.
Weighted permutation entropy (WPE) incorporates amplitude information from the time series when retrieving the sequential patterns. Its main motivation is to preserve the useful amplitude information carried by the signal. WPE is defined as follows [13].
Given a time series $x(i)$ and the $m$-dimensional embedding vectors $X_j^{m,\tau}$ of (1), the relative frequency of each motif in (3) is modified to include a weight $w_j$ for each $X_j^{m,\tau}$. The weights $w_j$ are calculated in (6) from the variance, or energy, of each subsequence $X_j^{m,\tau}$:

$$w_j = \frac{1}{m} \sum_{k=1}^{m} \left( x_{j+(k-1)\tau} - \bar{X}_j^{m,\tau} \right)^2, \tag{6}$$

where $\bar{X}_j^{m,\tau}$ is the mean of $X_j^{m,\tau}$:

$$\bar{X}_j^{m,\tau} = \frac{1}{m} \sum_{k=1}^{m} x_{j+(k-1)\tau}. \tag{7}$$

Thus, each pair of weight $w_j$ and motif type $\pi_i$ fully characterizes the vector $X_j^{m,\tau}$. WPE extends the concept of PE by adding the amplitude information of (6) before the probability of occurrence of each motif is computed. The weighted relative frequency is defined as

$$p_w(\pi_i) = \frac{\sum_{j:\, \mathrm{type}(X_j^{m,\tau}) = \pi_i} w_j}{\sum_{j} w_j}. \tag{8}$$

Then the WPE of the time series $x(i)$ is

$$H_{WPE}(x, m, \tau) = -\sum_{i} p_w(\pi_i) \ln p_w(\pi_i). \tag{9}$$

The MWPE procedure is summarized in the following steps. First, the original time series $x(i) = \{x_1, x_2, \ldots, x_N\}$ is divided into nonoverlapping windows of length $s$. Second, the data points inside each window are averaged by (10) to obtain the coarse-grained time series $y^{(s)}$. The coarse-graining procedure is illustrated schematically in Figure 1:

$$y_j^{(s)} = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x_i, \quad 1 \le j \le \lfloor N/s \rfloor. \tag{10}$$

The weighted permutation entropy of $y^{(s)}$ is the MWPE:

$$H_{MWPE}(x, s, m, \tau) = H_{WPE}(y^{(s)}, m, \tau). \tag{11}$$

MWPE is a function of the scale factor $s$ and represents the complexity of the time series on each scale. When $s = 1$, MWPE is the same as WPE; that is, WPE is a special case of MWPE. For most signals, the entropy on one single scale is not enough to describe the complexity of the signal, and the multiscale weighted permutation entropy is better suited to the analysis of real signals.
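The WPE and MWPE computations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default values, the handling of a constant signal (entropy set to zero), and the tie-breaking rule for equal samples are all choices made for this example.

```python
import math
from collections import defaultdict


def weighted_permutation_entropy(x, m=5, tau=1):
    """WPE: ordinal-pattern entropy with each vector weighted by its variance."""
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for the chosen m and tau")
    weight_per_motif = defaultdict(float)
    total_weight = 0.0
    for j in range(n_vectors):
        vec = [x[j + k * tau] for k in range(m)]
        mean = sum(vec) / m
        w = sum((v - mean) ** 2 for v in vec) / m  # variance weight of the vector
        # Motif = permutation that sorts the vector (ties broken by position).
        motif = tuple(sorted(range(m), key=lambda k: (vec[k], k)))
        weight_per_motif[motif] += w
        total_weight += w
    if total_weight == 0.0:
        return 0.0  # assumption: a constant signal is given zero entropy
    probs = [w / total_weight for w in weight_per_motif.values() if w > 0.0]
    return sum(-p * math.log(p) for p in probs)


def coarse_grain(x, s):
    """Average consecutive nonoverlapping windows of length s."""
    return [sum(x[i:i + s]) / s for i in range(0, len(x) - s + 1, s)]


def mwpe(x, scales, m=5, tau=1):
    """MWPE: the WPE of the coarse-grained series for each scale factor."""
    return [weighted_permutation_entropy(coarse_grain(x, s), m, tau)
            for s in scales]
```

Calling `mwpe(signal, range(1, 21))` would return one entropy value per scale; the first entry, for scale 1, equals the plain WPE of `signal`.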

Parameters Selection.
Before computing the multiscale weighted permutation entropy, four parameters must be set: the length $N$ of the time series, the embedding dimension $m$, the time delay $\tau$, and the scale factor $s$.
The value of PE mainly depends on the embedding dimension $m$ and the time delay $\tau$. The embedding dimension $m$ determines the number of sequential patterns, whose maximum is $m!$, so $m$ plays an important role in calculating the probability of each sequential pattern. If the embedding dimension is too small (for example, less than 3), it is hard to differentiate the sequential patterns. If the embedding dimension is too large (for example, more than 8), the calculation of PE is time consuming, and small changes in the signal are not easy to observe [16].
The time delay $\tau$ is related to the sampling rate of the signal. As [8] suggests, the time delay $\tau$ is set to 1 in this paper.
The length $N$ of the time series has a great influence on the calculation of PE: a larger $N$ lowers the computational efficiency, and a smaller $N$ cannot completely describe the complexity of the time series. $N \ge 5m!$ is recommended by [9].
To illustrate the rationality of the parameter selection, some experiments are conducted. The experimental data used in this paper is 1-minute gas sensor data. The sampling rate is 100 Hz, so each record has 6000 sampling points. With the scale factor $s$ fixed, Figure 2 shows the weighted permutation entropy (WPE) of four kinds of sensor data under different embedding dimensions, $m = 2$ to $8$, with time delay $\tau = 1$. As shown in Figure 2, when $m = 5$, WPE can separate the four kinds of signals. So $m = 5$ is feasible and is used in this paper.
Figure 3 shows that different values of $\tau$ have almost no effect on WPE, so setting $\tau$ to 1 is reasonable.
In this paper, the scale factor $s$ is set to 20. MWPEs on 20 scales are used as the features to identify and classify faults.
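The length recommendation quoted above can be checked against the 6000-sample records with a few lines of Python. The helper name `min_length` is hypothetical, and the check applies to the original record length only.

```python
import math


def min_length(m):
    """Smallest series length satisfying N >= 5 * m! for embedding dimension m."""
    return 5 * math.factorial(m)


N = 6000  # samples per record: 1 minute at 100 Hz
for m in range(2, 9):
    print(f"m = {m}: 5*m! = {min_length(m):6d}, N >= 5*m!: {N >= min_length(m)}")
```

For $m = 5$ the bound is $5 \cdot 5! = 600$, well below 6000, whereas for $m = 7$ it is already 25200, which exceeds the record length.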

Wavelet Based Multiscale Weighted Permutation Entropy (WMWPE) and Fault Identification Method
Although MWPE takes advantage of the local microstructure information and the amplitude information of the signal, it does not explore the macrostructure information. The wavelet transform is a powerful tool for extracting the global macrostructure information of the signal.
So the combination of wavelet transform with MWPE can comprehensively describe the multiple features of the signal.

WMWPE Based Fault Identification Method.
The WMWPE based fault identification method combines the wavelet transform, WMWPE, MCFS, and a BP neural network (BP-NN). It provides a complete workflow of feature selection and fault identification, as shown in Figure 4. The detailed procedure is as follows.
Step 1. Use the maximum energy-to-Shannon entropy ratio criterion to choose a proper wavelet base.
Step 2. Decompose the sensor signals with the selected wavelet base. A series of wavelet subband signals is obtained, and the appropriate subband signal is selected for feature extraction.
Step 3. Calculate the multiscale weighted permutation entropy of the selected wavelet subband to obtain the WMWPEs.
Step 4. After feature extraction, calculate the MCFS scores of the WMWPE features. Rank the features by MCFS score from high to low and select the top-ranked features as the best feature subset.
Step 5. Feed the features selected by MCFS into the BP-NN classifier for sensor fault recognition.
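Steps 1 and 2 can be illustrated with a small, dependency-free Python sketch. It uses the Haar wavelet only because that wavelet is easy to write out by hand; the paper does not state which base the criterion selects, so the Haar choice, the function names, and the subband labels are assumptions of this example. In practice a wavelet library would supply the candidate bases, and the ratio would be evaluated for each of them.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: (approximation, detail).

    A trailing unpaired sample (odd length) is dropped in this sketch.
    """
    r = math.sqrt(2.0)
    pairs = range(0, len(x) - 1, 2)
    approx = [(x[i] + x[i + 1]) / r for i in pairs]
    detail = [(x[i] - x[i + 1]) / r for i in pairs]
    return approx, detail


def haar_decompose(x, levels=3):
    """Return the subbands HF1..HF<levels> (details) and LF<levels>."""
    subbands = {}
    approx = list(x)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        subbands[f"HF{level}"] = detail
    subbands[f"LF{levels}"] = approx
    return subbands


def energy_to_entropy_ratio(coeffs):
    """Coefficient energy divided by the Shannon entropy of the
    normalized energy distribution of the coefficients."""
    energy = sum(c * c for c in coeffs)
    if energy == 0.0:
        return 0.0
    probs = [p for p in (c * c / energy for c in coeffs) if p > 0.0]
    entropy = sum(-p * math.log(p) for p in probs)
    # All energy in one coefficient gives zero entropy; report infinity.
    return math.inf if entropy == 0.0 else energy / entropy
```

With several candidate wavelet bases available, Step 1 would amount to computing this ratio on the coefficients produced by each base and keeping the base with the largest value.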

Experiments and Result Analysis
In this paper, the experimental data set consists of measurement recordings collected from a chemical detection platform based on an array of 72 metal-oxide gas sensors [25]. The sampling rate of the data is 100 Hz. One minute of sampled data (6000 sampling points) from each sensor is used as the original data. Three kinds of fault (bias, stuck-at, and drift) are injected into the original data. Each kind has 120 groups of data. For the bias fault, the bias constant is 2% of the average value of the original data. In each stuck-at fault, there are two segments of data with a constant value (98% of the average value of the original data). In the drift fault groups, the drift rate is 0.1% of the average value of the original data. Normal sensor data and faulty sensor data are shown in Figure 5.
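The fault injection described above can be sketched as follows. The text gives only the magnitudes (2%, 98%, and 0.1% of the signal mean), so the function names, the positions and lengths of the stuck-at segments, the drift onset, and the reading of the drift rate as an increment per sample are all assumptions of this example.

```python
def mean(x):
    return sum(x) / len(x)


def inject_bias(x, fraction=0.02):
    """Add a constant offset equal to `fraction` of the signal mean."""
    offset = fraction * mean(x)
    return [v + offset for v in x]


def inject_stuck_at(x, segments, fraction=0.98):
    """Hold the output at `fraction` of the signal mean inside each segment.

    `segments` is a list of (start, stop) index pairs; their positions and
    lengths are hypothetical, since the text does not specify them.
    """
    stuck_value = fraction * mean(x)
    y = list(x)
    for start, stop in segments:
        for i in range(start, min(stop, len(y))):
            y[i] = stuck_value
    return y


def inject_drift(x, fraction=0.001, start=0):
    """Add a linear ramp growing by `fraction` of the mean per sample.

    Both the per-sample unit of the rate and the onset index `start`
    are assumptions.
    """
    step = fraction * mean(x)
    return [v + step * max(0, i - start) for i, v in enumerate(x)]
```

For a 6000-point record, a call such as `inject_stuck_at(record, [(1000, 2000), (4000, 5000)])` would produce two constant segments; the indices here are illustrative only.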
(1) Selection of Wavelet Subband. Before the wavelet subband is selected, the maximum energy-to-Shannon entropy ratio criterion is used for wavelet base selection. More detailed information can be found in [26].

(2) Feature Extraction and Fault Identification. After 3-level wavelet decomposition, features are extracted from the third-level low frequency subband. The WMWPEs of the selected low frequency subband signal on 20 scales are calculated. These WMWPEs are fed into the BP-NN classifier to identify faults. A three-layer BP neural network is used as the classifier in the experiments. The hidden layer of the network has 10 neural nodes. The network is trained by the scaled conjugate gradient back-propagation method, and the mean squared error is used as the performance function.

To obtain a generalized identification performance, 10-fold cross-validation [27,28] is used in this paper. In 10-fold cross-validation, the feature vectors are randomly partitioned into 10 equal-sized subsets. Of the 10 subsets, a single subset is retained as the validation data for testing, and the remaining 9 subsets are used as training data. The cross-validation process is then repeated 10 times, with each of the 10 subsets used exactly once as the validation data. The 10 results from the folds are then averaged to produce a single estimate.

To illustrate the identification performance of WMWPE on different subbands (HF1, HF2, HF3, and LF3), all WMWPEs on the 20 scales are used to recognize faults. Table 1 shows the average identification accuracy for the different subbands. WMWPE of LF3 has much better identification performance than that of the other subbands. The average identification accuracy of LF3 is 99.5%, about 20% higher than that of the other subbands. At the same time, LF3 has the smallest standard deviation of identification accuracy among the subbands. This justifies selecting the WMWPE of the LF3 subband as features.
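The 10-fold cross-validation protocol described above can be sketched in plain Python. The names `k_fold_indices`, `cross_validate`, and the callback `train_and_score` are hypothetical; the callback stands in for training the BP-NN on the training folds and returning its accuracy on the held-out fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition range(n_samples) into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_and_score, k=10, seed=0):
    """Average the score returned by `train_and_score` over k folds.

    `train_and_score(train_X, train_y, test_X, test_y)` is a hypothetical
    callback that fits a classifier and returns its test accuracy.
    """
    folds = k_fold_indices(len(features), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(features)) if i not in held_out]
        scores.append(train_and_score(
            [features[i] for i in train_idx], [labels[i] for i in train_idx],
            [features[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    # Each sample is used for testing exactly once across the k folds.
    return sum(scores) / k
```

Repeating the call with different `seed` values and averaging the results would correspond to running the 10-fold procedure several times.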
Table 2 gives the detailed identification results of WMWPE from the LF3 subband. For bias and stuck-at, the identification accuracy is 100%, because the local structure pattern and the amplitude of these faults differ significantly from those of the normal signal, so WMWPE identifies them easily. Misjudgments occur between the normal and drift signals: although drift changes the signal amplitude, a small amplitude change may bring little change to the sequential pattern, so some false negative results exist.
(3) Feature Selection. Using all 20 scales of WMWPE as features for classification is time consuming and introduces information redundancy. To improve the efficiency of the proposed method, MCFS is used to reduce the dimension of the feature vector.
After the MCFS scores of the 20 WMWPE features are calculated, the features with the highest MCFS scores are selected as the inputs of the BP-NN. The feature selection result and the corresponding identification accuracy are shown in Table 3.
Table 3 shows that the average identification accuracy and standard deviation obtained with the 5 selected features are almost the same as those obtained with all 20 features. With the MCFS feature selection algorithm, the dimension of the feature vector is reduced from 20 to 5, while the recognition accuracy is maintained at about 99.5% with a standard deviation of 0.08. This is because MCFS considers not only the classification ability of each single feature but also the relationship between the features.
(4) Performance Comparison of Different Features. To further illustrate the superiority of the proposed method, the identification performance of WMWPE is compared with that of MWPE, WWPE, WPE, MPE, and PE. As shown in Tables 2, 4, and 5, the identification accuracy of WMWPE is higher than that of MWPE. The same conclusion can be drawn from the comparison of WWPE with WPE. This is because the wavelet transform brings macrostructure information into the feature. By combining macro- and microstructure information, WMWPE and WWPE achieve better performance.
Comparing WMWPE with WWPE, MWPE with WPE, and MPE with PE shows that multiscale analysis improves the identification accuracy by more than 16%. The main reason is that the multiscale feature explores more local structure information of the signals than the single-scale feature.
Comparing MWPE with MPE and WPE with PE shows that amplitude information increases the average identification accuracy by about 25% and 9%, respectively.
Thus WMWPE exploits macrostructure, microstructure, and amplitude information together. The experimental results validate that the proposed method based on WMWPE achieves high identification accuracy for sensor faults.

Conclusion
Finding an effective feature extraction method for sensor fault analysis and identification is an important issue. Taking full advantage of the macrostructure information, microstructure information, and amplitude information of typical sensor faults, this paper proposed a new sensor fault feature extraction and identification method based on the wavelet transform and multiscale weighted permutation entropy. Wavelet base selection, feature extraction, multicluster feature selection, and the BP classifier are investigated. Actual chemical gas concentration data is used to evaluate the performance of the proposed method. The experimental results show that the proposed WMWPE extracts more comprehensive feature information and achieves higher fault recognition accuracy than the other kinds of features.

Figure 2: The relationship of $m$ and WPE.

Figure 4: Flowchart of sensor fault feature extraction and recognition method.

Figure 6 shows the average WMWPE of 120 sensor signals after 3-level wavelet decomposition. The WMWPE of the low frequency subband clearly differentiates the four kinds of sensor signals (normal, bias, stuck-at, and drift) across the scale factors $s$, whereas the WMWPE of the three high frequency subbands cannot distinguish them. The experimental results show that selecting the low frequency subband for feature extraction is feasible.

Table 1: Average identification accuracy and standard deviation of the 20 WMWPEs of different subbands.

Table 2: Identification results of the WMWPE feature of the LF3 subband.

Table 3: Average identification accuracy of the selected feature vectors.

Table 4: Average identification accuracy using different features.

Among the compared features (MWPE, WWPE, WPE, MPE, and PE), WWPE is the WPE of the low frequency subband signal under wavelet decomposition. Table 4 shows the average recognition accuracy and standard deviation for the four kinds of signals using different features over 10 runs of 10-fold cross-validation. More detailed recognition results are shown in Table 5.

Table 5: Identification results of different features.