The Research on Information Representation of Φ-OTDR Distributed Vibration Signals

This paper focuses on the representation of Φ-OTDR distributed vibration signals. The research comprises a signal extraction part and a signal representation part. First, to extract the Φ-OTDR signal well, the time-domain data should be fully preserved, so the 2D-TESP method is used to extract the data. The traditional TESP method has 29 characters; in the 2D-TESP method, the number of characters is reduced from 29 to 13 and the dimension of the coding is expanded from 1 to 2. Second, to represent the Φ-OTDR signal better, the characteristics of Φ-OTDR data and of damped vibration signals are combined: the EMD method and the NMF method are merged into a new method, and some of its parameters are optimized and adjusted by the GA method. After the Φ-OTDR data is represented by the proposed method, performance is excellent on both the length dimension and the time dimension. Last, experiments are carried out in a semianechoic room under physically realistic conditions. The experiments show that the proposed methods are effective and perform better than the methods they are compared with.


Introduction
Φ-OTDR (phase-sensitive optical time-domain reflectometer) was put forward by Taylor and Lee in 1993. It serves as a typical technique for monitoring distributed vibrations. At present, it is widely applied in structural health monitoring of large buildings [1], perimeter security of important places [2], and so on. However, we hope to grasp its monitoring state more accurately, understand its vibration mode better, and enhance the monitoring efficiency. A standard time-domain analytic expression of the Φ-OTDR data is therefore of great significance. The Φ-OTDR data is a distributed vibration signal. To standardize the time-domain analytic expression, we need to extract and represent the useful signals.
In signal extraction, time-domain feature extraction methods mainly include the probability analysis method, the time series method, the correlation function analysis method, and time-domain waveform feature analysis. Potočnik and Govekar analyzed the probabilistic statistics of vibration signals. They combined probability statistical analysis, feature extraction, and principal component analysis, and used the combined method to evaluate the performance of multiple classifiers [3]. Delpha et al. also combined probability statistical analysis, feature extraction, and principal component analysis, applying the combined method to condition monitoring and fault diagnosis [4]. Ma et al. used a time series model to extract features of vibration signals, which identified fault signals and nonfault signals effectively [5]. Liu et al. used multiplex detection to eliminate trend correlation analysis and thereby achieved feature extraction [6]. Alhazza analyzed the waveform of the signal and used waveforms to control the system [7]. In short, the time series, correlation function, and waveform analysis methods all presuppose a clearly visible signal. Under poor SNR (signal-to-noise ratio), however, the probability analysis method has a clear advantage over the other methods because of its statistical character.
In signal expression, some scholars focused on the NMF (nonnegative matrix factorization) method and its extensions. Gao et al. combined the TDF method and the NMF method to diagnose faults [8]. Li et al. combined the S transform, the NMF method, the mutual information method, and a multiobjective evolutionary algorithm [9]. Li et al. also combined the generalized S transform with the NMF method and with the 2DNMF method, respectively [10,11]. Other scholars focused on the EMD (empirical mode decomposition) method and its extensions. Rai and Upadhyay combined the EMD method and the k-means method to process the signal [12]. Liu et al. put the EMD method into the process of multiplex detection to eliminate trend correlation analysis [6]. In conclusion, NMF and related methods pay more attention to the data model: they proceed from curve fitting and signal separation but ignore the vibration mechanism. EMD and related methods pay more attention to the mechanism model: they proceed from the vibration mechanism and the expression curve but ignore the features of the data.
The signal extraction studies above can be summarized as follows. When the SNR is weak, there will be some errors in the extraction of signals. The Φ-OTDR technique monitors the vibration of an optical fiber, which is a line-shaped sensor. Because the signal attenuates as it propagates, the SNR is weak at the farther points of the optical fiber. Therefore, a signal extraction method better suited to weak SNR is needed. On the other hand, representing Φ-OTDR distributed vibration signals calls for an organic combination of data-driven and mechanism-driven approaches. The combined method can reflect both the real situation of the data and the real state of vibration. This paper is divided into two parts. The first part uses the 2D-TESP method to extract the signal. The 2D-TESP method has a more appropriate coding interval and takes into account both the first and the second derivatives. It is effective for signal extraction under weak SNR. The second part uses the GAEMD-NMF method to express the signal. It combines the EMD method and the NMF method, and some parameters are optimized by GA (genetic algorithm) after the two methods are combined. It can better solve the problem of a relatively standard time-domain expression for distributed vibration signals.

Time-Domain Signal Extraction Based on 2D-TESP Algorithm
Signal extraction aims at better signal representation. It requires that the original form of the signal data be fully preserved in the time domain. Because of the characteristics of Φ-OTDR technology, effective signals should be extractable even at points with weak SNR. Therefore, the 2D-TESP method is selected to extract the features of the signal.

TESP Algorithm.
The TESP (time encoded signal processing) algorithm has two main features [13-16]. The first is that the algorithm works in the time domain and processes the signal directly. The second is that it converts signals into probabilistic models that contain a finite number of elements. Simply put, the method reencodes the signal in the time domain. Typical time-frequency domain analysis algorithms include FFT, WT, and HHT. These algorithms have a long operation time and are unsuitable for poor SNR, so they do not fit the characteristics of distributed vibration data. The TESP algorithm is intuitive, computationally light, and simple to implement. The traditional TESP algorithm is suitable for the feature extraction of speech signals [13]. The TESP method was improved by Wang's team in 2014 and applied to feature extraction of sound signals [14]. Wang et al. then extended it to two-dimensional feature extraction and applied the improved algorithm to signal feature extraction in a complex environment [15,16].
The specific implementation steps of the TESP algorithm are as follows.
(1) The time-domain signal at each length node of the optical fiber is divided into windows. In each window, the zero crossings of the signal are located. The interval between two adjacent zero crossings is a time period. Under the rules of the TESP algorithm, each such time period is called a meta.
(2) Each meta has two indexes. One is the duration, usually denoted D. The other is the signal shape, usually denoted S. The following information is obtained for these two indexes:
(a) The number of sampling points in each meta is its duration.
(b) Within each meta's duration, the data is differentiated point by point, and the number of extreme points in the meta is obtained.
(3) A matrix is constructed with D and S as its two dimensions. Each meta is coded according to the entries of the matrix.
(4) The probability of each code in the matrix is counted. Finally, the probability distribution is used as a feature and fed into a classifier or a clustering algorithm.
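The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, a sample equal to zero is treated as nonnegative, the shape index is taken as the number of sign changes of the first difference, and the sketch stops at the probability of each (duration, extrema count) pair because the symbol tables that map such pairs to codes are not reproduced in the text.

```python
from collections import Counter

def split_epochs(signal):
    """Split one window into time periods (metas) bounded by sign changes."""
    epochs, start = [], 0
    for i in range(1, len(signal)):
        # Assumption: a zero sample counts as nonnegative.
        if (signal[i - 1] >= 0) != (signal[i] >= 0):
            epochs.append(signal[start:i])
            start = i
    epochs.append(signal[start:])
    return epochs

def count_extrema(epoch):
    """Count interior extreme points as sign changes of the first difference."""
    diffs = [b - a for a, b in zip(epoch, epoch[1:])]
    return sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)

def tesp_distribution(signal):
    """Probability of each (duration, extrema count) pair in one window."""
    pairs = [(len(e), count_extrema(e)) for e in split_epochs(signal)]
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}

window = [1, 2, 1, -1, -3, -1, 2, 3, 2, 1]
print(tesp_distribution(window))
```

In a full pipeline, each pair would be looked up in the coding table before the probabilities are counted, and the resulting distribution would be passed to the classifier or clustering step.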

Feature Extraction Based on 2D-TESP Algorithm.
The traditional TESP algorithm has only 29 characters because of the limitations of its encoding principle. At the same time, the traditional TESP method achieves a high recognition rate. This paper focuses on damped vibration, and an encoding of 29 characters far exceeds what damped vibration in the Φ-OTDR technique requires. Therefore, the D-S matrix is first extended to two matrices, one built on extreme points and one built on inflection points. Second, the coding for each matrix is reduced from 29 characters to 13. Last, the extreme-point and inflection-point matrices are encoded together, using joint probability distribution statistics, to form a joint matrix.

The Reduced TESP Symbol Table.
In the traditional TESP algorithm, S represents only the first derivative. There may be mutual influence between points in Φ-OTDR technology, so further analysis of the Φ-OTDR signal is needed. The inflection point of the second-order derivative is therefore introduced into the traditional method. In this paper, the matrix built on extreme points is called the extreme-point matrix, and the matrix built on inflection points is called the inflection-point matrix. Because of the characteristics of the Φ-OTDR technology, the two indicators D and S do not require so many codes. The new method employs 13 codes. Tables 1 and 2 give the standard 29-code and the reduced 13-code extreme-point matrices, respectively.
From Tables 1 and 2, the longest run of continuous sampling points is still 35. The statistics of first-order extreme points are reduced from six classes to four classes, and the encoding is reduced from 29 symbols to 13 symbols. Tables 3 and 4 give the statistics of the second-order inflection points, whose encoding is also reduced to 13 codes.
By comparing Tables 2-5, it can be seen that the represented features become coarser as the number of codes decreases. In practice, the probability of the high-numbered codes is almost zero, so the encoding can be reduced to 13 characters.

The Improved 2D-TESP Algorithm.
After the original algorithm is reduced to 13 characters, its feature is further represented as a two-dimensional joint matrix. The two dimensions of the joint matrix are the 13 codes of the reduced extreme-point matrix and the 13 codes of the reduced inflection-point matrix. The values of the joint matrix are the joint distribution probabilities of the extreme-point and inflection-point codes. We select a window of the signal and encode it in accordance with the improved 2D-TESP algorithm. Figures 1 and 2 are histograms of the probability distributions of the extreme-point and inflection-point codes. Figure 3 is a histogram of the joint matrix, that is, the joint distribution probability of the two codes.
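The joint matrix can be illustrated with a short sketch. It assumes, purely for illustration, that every time period has already been assigned one extreme-point code and one inflection-point code, each an integer index from 0 to 12; the function name `joint_code_matrix` and the 0-based indexing are hypothetical and not taken from the paper.

```python
def joint_code_matrix(codes_1, codes_2, n_codes=13):
    """Joint probability matrix of two equally long code sequences.

    codes_1[i] and codes_2[i] are the (assumed 0-based) extreme-point and
    inflection-point codes of the same time period.
    """
    if len(codes_1) != len(codes_2) or not codes_1:
        raise ValueError("code sequences must be non-empty and equally long")
    matrix = [[0.0] * n_codes for _ in range(n_codes)]
    step = 1.0 / len(codes_1)
    for c1, c2 in zip(codes_1, codes_2):
        matrix[c1][c2] += step  # accumulate the joint relative frequency
    return matrix

m = joint_code_matrix([0, 0, 1, 2], [0, 1, 1, 2])
print(m[0][0], m[0][1], m[1][1], m[2][2])
```

Flattening the 13 x 13 matrix gives a 169-element feature vector that can be handed to a clustering algorithm in the same way as a one-dimensional code histogram.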

The Analysis of the Algorithm.
The improved 2D-TESP algorithm in this paper rests mainly on the following considerations.
(1) When a damped vibration signal is encoded by TESP, the probability drops as the code number rises. In this paper, the original 29-character encoding is therefore reduced to a 13-character encoding.
(2) The Φ-OTDR technology has its own characteristics. Therefore, the signal is expressed not only by extreme points but also by inflection points.
(3) After the reduction, the 13-character coding is necessarily slightly coarser than the 29-character coding. Experiments with k-means, hierarchical clustering, and spectral clustering show that the effect of the 13-character coding has not declined.

Time-Domain Signal Representation Based on GAEMD-NMF
Because Φ-OTDR data is three-dimensional, signal representations along both the time dimension and the length dimension are required. For the time axis and the length axis, a rational representation must consider both the signal's physical model and the characteristics of the actual data. Therefore, the paper combines prior knowledge of the vibration model with decomposition of the actual signal data. The new method combines EMD and NMF and is optimized by the GA method. The parameters of the method are effectively corrected, and good results are finally achieved.

The EMD of Damped Vibration.
In the time-amplitude plane, a damped vibration model is established because the data conforms to the law of damped vibration. The model is shown in (1). In the length-amplitude plane, an SA model is established because the data conforms to the law of the SA signal. The model is shown in (2). After the superposition of the signals, the model is shown in (3). According to the models above, the original signal is matched to the model to determine the various parameters. The concrete steps are as follows.
(1) The absolute maximum value of the signal amplitude on the time axis is $A_1$. $A_2$ is set to 1.
(2) The peak value points are fitted to a curve. The damping attenuation coefficient is obtained from the fitted curve.
(3) The time-axis frequency parameter (subscript 1) is determined by the number of zero crossings on the time axis.
(4) When the signal starts, it must start from zero. The time-axis phase parameter (subscript 1) is therefore determined as 0 or 1.
(5) The parameters $B_1$ and $B_2$ are determined as 0 according to the actual data.
(6) The length-axis frequency parameter (subscript 2) is determined by the number of zero crossings on the length axis.
(7) When the signal starts, it must be at its maximum. The length-axis phase parameter (subscript 2) is therefore determined as 0.
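The time-axis steps can be illustrated with a small sketch. The model equations are not reproduced in the text, so the sketch assumes a common damped-sinusoid form $x(t) = A_1 e^{-\beta t}\sin(\omega_1 t)$; the symbols $\beta$ and $\omega_1$, the function name `estimate_damped_params`, and the log-linear least-squares fit of the peaks are all assumptions made for illustration, not the authors' procedure.

```python
import math

def estimate_damped_params(samples, dt):
    """Estimate (A1, beta, omega1) for an assumed A1*exp(-beta*t)*sin(omega1*t)."""
    n = len(samples)
    mags = [abs(x) for x in samples]
    a1 = max(mags)  # step (1): absolute maximum amplitude on the time axis
    # Step (2): local maxima of |x| sample the decaying envelope.
    peaks = [(i * dt, mags[i]) for i in range(1, n - 1)
             if mags[i - 1] < mags[i] >= mags[i + 1] and mags[i] > 0]
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to fit the envelope")
    # Least-squares line through (t, ln peak); its slope is -beta.
    ts = [t for t, _ in peaks]
    ys = [math.log(m) for _, m in peaks]
    t_mean, y_mean = sum(ts) / len(ts), sum(ys) / len(ys)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
             / sum((t - t_mean) ** 2 for t in ts))
    beta = -slope
    # Step (3): one period contains two zero crossings, so omega = pi * N / T.
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    omega1 = math.pi * crossings / ((n - 1) * dt)
    return a1, beta, omega1

dt = 0.001
xs = [2.0 * math.exp(-3.0 * i * dt) * math.sin(2 * math.pi * 10 * i * dt)
      for i in range(1000)]
print(estimate_damped_params(xs, dt))
```

Because step (1) takes the largest sampled amplitude, the value returned for `a1` sits slightly below the true envelope amplitude whenever damping acts before the first peak; the zero-crossing count likewise gives only a coarse frequency estimate on short records.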
NMF.
NMF (nonnegative matrix factorization) was proposed by Lee and Seung in 1999. It is a matrix decomposition method published in Nature magazine [17]. It is widely used in image analysis, data mining, speech processing, and other fields. In recent years, scholars have modified the original NMF algorithm and achieved many results [18,19]. The NMF algorithm has two main advantages. First, its decomposition form and results are easy to interpret because it requires matrix elements to be nonnegative. Second, the result of the matrix decomposition is usually naturally sparse, so it is easy to express and occupies less space. Since the value of the vibration signal is positive, the vibration data meets the requirements of the algorithm and is processed by nonnegative matrix factorization.
The nonnegative matrix factorization algorithm is defined as follows. Given a nonnegative matrix $V \in \mathbb{R}_+^{n \times m}$, it is decomposed into the product of two nonnegative matrices $W \in \mathbb{R}_+^{n \times r}$ and $H \in \mathbb{R}_+^{r \times m}$:
$$V \approx WH. \tag{4}$$
Here, the subscript "+" stands for the nonnegativity constraint. The parameter $r$ is the dimension of a low-dimensional space that approximately describes the raw data; $r$ should satisfy $(n + m)r < nm$. To solve formula (4), the product of the matrices $W$ and $H$ is made to approach the original matrix $V$ gradually. The Euclidean distance is usually used to represent the error between $V$ and $WH$. The error function is
$$E(W, H) = \lVert V - WH \rVert^2 = \sum_{i,j} \left( V_{ij} - (WH)_{ij} \right)^2. \tag{5}$$
Here, the matrices $V$, $W$, and $H$ are nonnegative. When formula (5) reaches its minimum, the error between $V$ and $WH$ is minimal. Lee and Seung gave the corresponding iterative rules:
$$H_{aj} \leftarrow H_{aj} \frac{(W^{T}V)_{aj}}{(W^{T}WH)_{aj}}, \tag{6}$$
$$W_{ia} \leftarrow W_{ia} \frac{(VH^{T})_{ia}}{(WHH^{T})_{ia}}. \tag{7}$$
$W$ and $H$ are iterated with the two formulas above. The iteration ends when the error converges, which completes the NMF method. However, the traditional NMF method cannot meet the needs of signal representation, so a new iterative method is needed that fits the actual situation.
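As an illustration of the traditional method only, the sketch below implements the multiplicative update rules of Lee and Seung for the Euclidean error, factoring a nonnegative matrix V (n x m) into W (n x r) and H (r x m). It uses plain Python lists so that it runs without third-party packages; the helper names, the random initialization, the fixed iteration count, and the small constant `eps` that guards against division by zero are assumptions of this sketch, not details from the paper.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def squared_error(v, w, h):
    """Squared Euclidean distance between V and the product W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))

def nmf(v, r, iterations=200, eps=1e-12, seed=0):
    """Factor nonnegative V into W and H by multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        # H <- H * (W^T V) / (W^T W H), element by element
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(r)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        # W <- W * (V H^T) / (W H H^T), element by element
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)]
             for i in range(n)]
    return w, h

basis = [[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]]
coeffs = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
v = [[a * x + b * y for x, y in zip(*basis)] for a, b in coeffs]
w, h = nmf(v, 2)
print(squared_error(v, w, h))
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, W and H stay nonnegative throughout. A fixed iteration count stands in for the convergence test; a production version would more likely stop when the decrease in error falls below a tolerance and would use an array library for speed.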

GAEMD-NMF.
The accuracy of the traditional NMF method is not satisfactory. Therefore, this paper uses a combination of the EMD and NMF algorithms for signal representation.
The objective function is redefined as shown in (8). To facilitate the convergence calculation, the rotation matrix is defined by a quaternion, as shown in (9). Here, $q = q_0 + q_1 i + q_2 j + q_3 k$ represents the quaternion. By adjusting the expression, formula (9) is rewritten as (10). The GA method is then used to optimize the EMD-NMF method.
(1) Each row or column of the signals  and X is regarded as a population.
(2) The entire matrix is normalized.
(4) According to the sample input, we establish the fitness function as shown in (11), where J is the fitness.
(5) If the fitness decreases, the parameter results are inherited. If the fitness increases, the parameter results are changed.

Experiments
The Φ-OTDR instrument in this paper is the NBX-S3000 from the Japanese company Neubrex. The instrument is shown in Figure 4. Figure 5 is a photograph of the five percussion hammers on the knocker. Figure 6 is a photograph of the knocker.
To restrain environmental noise and ensure a good vibration effect, the test is conducted in an anechoic chamber. The specific experimental environment is shown in Figures 7 and 8. In Figure 7, the fiber clings to the ground. In Figure 8, the instrument is placed on the damping table.
The effects in each observation position on the fiber are shown in Figure 9.

Signal Extraction Experiment.
After the features are encoded and extracted by the 2D-TESP method, the results are fed to popular clustering algorithms: k-means clustering, hierarchical clustering, and spectral clustering. The main work of the first half of this paper is signal extraction, so the clustering targets are limited to two cases: "signal" and "no signal." We compare five cases: "extreme value 29 characters," "extreme value 13 characters," "inflection point 29 characters," "inflection point 13 characters," and "extreme point + inflection point matrix." The clustering accuracy of each case is shown in Table 5. Part 1 in Table 5 is the average accuracy of the signals from 4.4 to 4.6 meters and from 5.9 to 6.1 meters on the fiber. Part 2 in Table 5 is the signal on the fiber from 4.7 meters to 5.8 meters. Several conclusions can be drawn from Table 5. The 2D-TESP algorithm is the best of the five encoding methods, and spectral clustering is the best of the three clustering methods. With good SNR, the accuracy of the 2D-TESP algorithm is about 0.5% higher than that of the traditional TESP algorithm; with poor SNR, it is about 8% higher. Across strong and weak SNR, the accuracies of the 2D-TESP algorithm are relatively close, differing by about 1 percentage point.
The next question is how many coded characters work best in the improved TESP. The results are shown in Figure 10. From Figure 10, the correct rate is almost the same between 13 and 29 characters. Below 13 characters, however, the accuracy is affected, and the accuracy rate tends to decrease linearly within 3 characters.
To verify the quality of the method, it is compared with three other methods. The comparison also covers the three clustering methods and two different SNR situations. The results are shown in Table 6.
Several conclusions can be drawn from Table 6. The 2D-TESP algorithm is the best of the four methods, and spectral clustering is still the best of the three clustering methods. With good SNR, the accuracy of the 2D-TESP algorithm is about 1.5% higher than that of the other algorithms; with poor SNR, it is about 4% higher. The method not only has better accuracy but also has a shorter operation time. The same set of signals, 1 MB in size, is used in all timing experiments. The PC has a quad-core 1.6 GHz CPU and 8 GB of memory and runs Matlab. The results are shown in Figure 11.
As shown in Figure 11, the method in this paper is superior to the other methods. Because the method computes only in the time domain, its operation time is about 1 s shorter than that of FFT, about 1.2 s shorter than that of WT, and about 1.5 s shorter than that of HHT.

Signal Expression Experiment.
Since the Φ-OTDR signal is 3D data, the expression of the signal needs to be seen from the length axis and the time axis.

Positioning Problem on Length Axis.
Figure 12 shows the state of the fiber with one vibration source.The vertical intersection point between the vibration source and the optical fiber is the strongest point of vibration.
There are five hammers with equal intervals on the knocker.The striking distance between two strikes is 10 cm.The five signals are not the same on the time axis.Five different signals need to be separated.The result is shown in Figure 13.
The spacing between two adjacent hammers on the length axis is fixed at 10 cm, so the five hammers define four spacings. The values of the four spacings obtained by four methods are compared. The four methods are NMF, EMD, EMD-NMF, and GAEMD-NMF. The result is shown in Figure 14.
As can be seen from Figure 14, the GAEMD-NMF method works best. The EMD-NMF method performs well, and the NMF and EMD methods perform worst. After 100 sets of trials, the error range of the four methods is as shown in Figure 15.
From Figure 15, the GAEMD-NMF method in this paper is the best method for the four spacings.At the same time, the range of error fluctuation is also minimum.The range of error fluctuations in GAEMD-NMF is reduced by half compared with EMD-NMF and by 1/3 compared with NMF and EMD.The error range of the GAEMD-NMF method at 4 distances is from −0.1 cm to 0.1 cm.

Vibration Modes of Signals on Time Axis.
The most important thing about the performance of signal on time axis is whether the spectrum analysis is consistent or not.In this paper, a high-precision vibration sensor is arranged along the fiber.The sensor signal, the original signal, NMF, EMD, EMD-NMF, and GAEMD-NMF are compared in power spectrum.The result is shown in Figure 16.The method which compares the power spectrum similarity between different signals in this paper is as follows.
Step 1. Each 10 Hz band is taken as a frequency segment.
Step 2. The accumulated energy in each frequency segment is calculated.
Step 3. The total energy of the signal is calculated.
Step 4. The energy of each segment is divided by the total energy, which gives the percentage of each segment.
Step 5. The probability fluctuation of each segment between the two signals is calculated.
Step 6. The absolute values of these probability fluctuations are summed.
Step 7. The sum of the probability fluctuations is subtracted from 1. The result is the similarity value.
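The seven steps translate directly into a short function. The sketch below assumes that each power spectrum is already available as a list of frequencies in Hz and a matching list of power values; the function names `band_fractions` and `spectrum_similarity` and the example numbers are hypothetical.

```python
def band_fractions(freqs, power, band_hz=10.0, n_bands=None):
    """Share of the total spectral energy in consecutive band_hz-wide segments."""
    if n_bands is None:
        n_bands = int(max(freqs) // band_hz) + 1
    energy = [0.0] * n_bands
    for f, p in zip(freqs, power):
        idx = int(f // band_hz)          # Step 1: segment index of this bin
        if idx < n_bands:
            energy[idx] += p             # Step 2: accumulated energy per segment
    total = sum(energy)                  # Step 3: total energy
    return [e / total for e in energy]   # Step 4: share of each segment

def spectrum_similarity(freqs_a, power_a, freqs_b, power_b, band_hz=10.0):
    """Similarity of two power spectra: 1 minus the summed absolute differences."""
    n_bands = int(max(max(freqs_a), max(freqs_b)) // band_hz) + 1
    frac_a = band_fractions(freqs_a, power_a, band_hz, n_bands)
    frac_b = band_fractions(freqs_b, power_b, band_hz, n_bands)
    fluctuation = sum(abs(a - b) for a, b in zip(frac_a, frac_b))  # Steps 5-6
    return 1.0 - fluctuation                                       # Step 7

freqs = [0, 5, 10, 15, 20, 25]
sensor = [1, 1, 2, 2, 0, 0]
method = [1, 1, 1, 1, 1, 1]
print(spectrum_similarity(freqs, sensor, freqs, method))
```

Dividing by the total energy in Step 4 makes the measure insensitive to the absolute scale of each device. Note that the summed absolute difference of two normalized distributions can reach 2, so the value returned can be negative for spectra with no overlapping segments.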
From Figure 16, several experimental results are as follows.
(1) There is large high-frequency noise in the original signal. The same holds for the signals processed by the NMF method and the EMD method. This shows that their signal-to-noise separation is poor.
(2) There is no high-frequency noise in the EMD-NMF and GAEMD-NMF results. This shows that their signal separation is good.
(3) The trends of the spectrograms of EMD-NMF, GAEMD-NMF, and the sensor signal are consistent. This shows the validity of the method.
(4) The spectrograms of the original signal, the EMD method, and the NMF method are not quite consistent with the other three. This shows that these signal expression methods are not ideal.
(5) Because the devices differ, the effective values of the power spectra may differ. Therefore, each frequency segment of the power spectrum is normalized. The similarity between GAEMD-NMF and the sensor is 95.63%. The similarity between EMD-NMF and the sensor is 90.74%. This shows that the GAEMD-NMF method is 5.11% higher than the EMD-NMF method.

Conclusion
In this paper, we study the information representation of Φ-OTDR distributed vibration signals, and the main conclusions are as follows.
(1) The improved 2D-TESP signal extraction method has good signal extraction effect.The number of coded characters is reduced and the dimension is extended.This will effectively reduce redundancy and increase effectiveness.Particularly, it has good signal extraction effect on the fiber point with poor SNR.
(2) The improved GAEMD-NMF signal expression method has a good expression effect on the length axis.Because the new method takes into account the characteristics of data model, it is easier to locate the strongest point of the signal in the positioning process.The positioning error effect is effectively suppressed.
(3) The improved GAEMD-NMF signal expression method has a good expression effect on the time axis. Because the new method takes into account the characteristics of the mechanism model, the vibration model is closer to the theoretical model. The result is compared with the high-precision sensor, and the spectrograms are highly similar.

of China (no. 2015BAK40B03 and no. 2016YFC0701309), and Subproject of Key Project of Beijing, China (no. D161100004916002 and no. Z171100002417022).

Figure 1: The histogram of the extreme-point coding's probability distribution in a signal.

Figure 2: The histogram of the inflection-point coding's probability distribution in a signal.

Figure 3: The histogram of the joint coding's probability distribution in a signal.

Figure 5: A physical picture of five hammers on a knocker.

Figure 6: A physical picture of a knocker.

Figure 9: Time-domain vibration signals of each point in the effective range.

Figure 10: Comparison of the number of coded characters.

Figure 15: Error fluctuation on four distances by four different methods.

Figure 16: Comparison of power spectrum between four different methods, sensor signal, and original signal.

Table 1: The standard 29-symbol encoding table of the extreme-point matrix.

Table 3: The standard 29-symbol encoding table of the inflection-point matrix.

Table 5: Comparison of correct rates under different SNRs and different coding methods.

Table 6: Comparison of correct rates under different SNRs and different methods.