An Improved Feature Parameter Extraction Algorithm of Composite Detection Method Based on the Fusion Theory

An improved feature parameter extraction algorithm is proposed in this study to solve the problem of quantitative detection of subsurface defects. Firstly, common feature parameters are extracted from the differential signals of pulsed eddy current and ultrasonic testing in the time and frequency domains. Then, a dispersion model and a ReliefF model are established to determine the weight of each parameter. Finally, the weights from the two algorithms are fused by D-S evidence theory to select the feature parameters. Compared with PCA-based feature extraction applied to the pulsed eddy current or ultrasonic signal alone, the experimental results show that the feature parameters extracted by the proposed algorithm are more effective for the quantitative detection of subsurface defects and lead to higher accuracy in estimating subsurface defect depth.


Introduction
Pulsed eddy current testing (PEC) and ultrasonic testing (UT) are among the main nondestructive testing (NDT) techniques and have been widely applied to defect detection in metals [1,2]. PEC is generally applied to shallow subsurface defects, while UT is applied to deeper ones. Because the detection capability of PEC or UT alone is limited, composite detection methods are widely applied in NDT. Feature parameter extraction from PEC and UT signals is a key technique in NDT. Therefore, studying feature extraction algorithms for PEC and UT signals from defects at different depths, and establishing a mathematical model to estimate the depth of subsurface defects, is of great significance [3,4].
In this study, feature parameter extraction methods for PEC and UT signals are investigated for the quantitative detection of subsurface defects. The quantitative detection results depend directly on the result of feature extraction [5,6]. Features can be extracted from the time, frequency, and time-frequency domains of signals. Latif et al. analyzed time-domain features of PEC signals and distinguished subsurface defects in stainless steel sheets [7]. Cruz et al. extracted frequency-domain features from UT signals and selected features using statistical techniques such as PCA [8]. Li et al. extracted time-frequency features from PEC signals using EMD and reduced them to fewer features using PCA for weld recognition [9]. Defect signals may respond quite differently in the time, frequency, and time-frequency domains. Therefore, several scholars have proposed algorithms that extract all responsive features by constructing a multifeature framework across the time, frequency, and time-frequency domains.
During detection, the reliability of each feature varies with the type of defect present. Wang and Wang studied feature extraction from PEC signals for measuring depth in multilayer metals and concluded that the peak and fundamental frequency amplitude features were more accurate when the conductivity of the upper layer was lower than that of the lower layer; otherwise, the zero-crossing time was the more effective feature [10]. Nan et al. extracted multiple features from PEC signals in the time and frequency domains and combined them to classify surface, subsurface, and corrosion defects [11]. For defect recognition, an adaptive feature extraction algorithm with variable weights, which fully accounts for how feature reliability differs across defects, is necessary [12,13]. Common methods of feature weight assignment are based on statistical distribution or on distance measurement [14][15][16]. Statistical distribution methods determine the weight from the degree to which the feature is distributed over the samples. Wang and Qian proposed an unsupervised feature selection algorithm based on information entropy, in which the feature weight was measured by the information entropy of the feature after dimension reduction [17]. Zhou et al. proposed a feature weighting algorithm based on class variance to improve text categorization accuracy [18]. Distance measurement methods determine the weight of a feature from the distances between feature values of samples of the same class and of different classes. Li et al. studied the ReliefF and FCM clustering algorithms to classify different types of users: ReliefF was used to assign weights to user features adaptively, and FCM was used to classify the users [19]. Li et al. proposed an algorithm based on ReliefF and an LSTM network to select features and assign weights adaptively in a power system prediction model [20].
At present, feature weights are usually determined by either a statistical distribution method or a distance measurement method. Among studies that use statistical distribution, Chen et al. calculated feature weights from the statistical distribution of information entropy, but the method is prone to local minima when applied to small samples [21]. Hu et al. calculated the first-order mean moment and second-order variance moment of features by the Gan method and obtained image feature weights statistically, but the method was prone to local optima when the samples were insufficient [22]. Statistical distribution methods therefore depend heavily on the sample set and are prone to local minima when samples are insufficient. Among studies that use distance measurement, He et al. assigned weights to EMG features by ReliefF distance, but the weight assignment could be unreasonable when the training samples were insufficient [23]. Fan et al. proposed an improved ReliefF algorithm that determined feature weights by ReliefF distance, but the weight distribution could be unreasonable when the training samples were uneven [24]. Distance measurement methods therefore depend heavily on the training samples: when the selected training samples are uneven or insufficient, the resulting weight distribution may be unreasonable. To overcome these weaknesses, this paper considers both statistical distribution and distance measurement and proposes a feature extraction algorithm that combines the dispersion ratio with the ReliefF distance. Firstly, the statistical distribution weight is obtained by computing the dispersion ratio, which is the ratio of the interclass variance to the intraclass variance. Gong et al. proposed maximizing the interclass variance and minimizing the intraclass variance to calculate the dispersion ratio, and filtered out the best feature subsets to improve model performance [25]. Chiara et al. calculated the intraclass and interclass distances to obtain the dispersion ratio of micro-CT samples and distinguished damage categories by the dispersion ratio [26]. Then, the distance measurement weight is obtained by computing the ReliefF distance, which results from iteratively computing the difference between the interclass and intraclass feature distances. Lin et al. proposed a multilabel feature selection method that determined feature weights by ReliefF distance to select the best feature subset [27]. Jin et al. studied image features of PD and SWEDD and assigned the feature weights of the images of the two diseases by ReliefF distance [28]. Finally, the statistical distribution weight and the distance measurement weight are fused by D-S evidence theory, yielding feature weights that account for both the statistical distribution and the distance measurement of the sample features. In summary, this paper proposes a feature extraction algorithm based on the combination of dispersion ratio and ReliefF distance, which combines the advantages of features from different sources (ultrasonic and eddy current), compensates for the errors of the individual detection methods, and makes the quantitative estimation of defect depth more accurate.
This paper is organized as follows. In Section 2, the feature extraction methods for PEC and UT signals in the time and frequency domains are described in detail. In Section 3, the dispersion ratio and ReliefF distance algorithms are introduced to weight the features from different domains, and the D-S fusion algorithm is established to calculate the final feature weights and select the optimal features. In Section 4, experimental results are presented to analyze the performance of the proposed algorithm, and Section 5 concludes the paper.

Feature Extraction from Pulsed Eddy Current and Ultrasonic Signals
Based on the PEC and UT detection principles, the PEC and UT signals are analyzed in the time, frequency, and time-frequency domains, and a feature framework for PEC and UT signals is established.

Feature Extraction from Pulsed Eddy Current Signals.
When a metal conductor is placed in a changing magnetic field, a vortex-like induced current, or eddy current, is generated in the conductor. A pulsed excitation source injects an excitation signal into the coil, which generates the excitation magnetic field. When defects exist in the specimen, they alter the distribution of the induced eddy current and magnetic field inside the specimen, so the characteristics of the defects can be obtained by analyzing the variation of the current and magnetic field, as shown in Figure 1. At each rising or falling edge of the square wave excitation, the changing excitation field induces eddy currents inside the specimen. While the square wave remains at a constant level, the excitation field no longer changes, and the eddy current inside the specimen decays. Whether defects exist in the specimen can be determined from the change in the eddy current distribution.
In the time domain, the PEC signal resembles the square wave excitation and is also periodic, with the same period as the excitation. At the beginning and end of each period, the voltage of the PEC signal is considerably larger than at the other samples; these samples are called split voltages and can be located with a Python script. The record can thus be accurately segmented so that each period corresponds to one PEC test, giving the PEC data matrix. The PEC data are then examined with the box-plot method to handle outliers: outliers are identified and replaced with the mean of the neighboring voltage values. In addition, the PEC signal contains considerable high-frequency noise and oscillates at high frequency, so the wavelet threshold algorithm and the cumulative average algorithm are applied to eliminate this noise. The denoising results are compared in Figure 2; the signals processed by the wavelet threshold denoising algorithm and the cumulative average algorithm are smoother.
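The segmentation and outlier-handling steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' script: the split-voltage threshold, the 1.5 × IQR whisker rule of the box plot, and the neighbour window size are hypothetical choices, and the cumulative average is taken to mean averaging the aligned samples of successive periods.

```python
import statistics

def split_periods(signal, threshold):
    """Split a multi-period PEC record at its split-voltage samples.

    Assumption: any sample with |v| > threshold is a split voltage marking
    a period boundary; the threshold value is a hypothetical parameter.
    """
    periods, current = [], []
    for v in signal:
        if abs(v) > threshold:
            if current:                      # close the period collected so far
                periods.append(current)
                current = []
        else:
            current.append(v)
    if current:
        periods.append(current)
    return periods

def replace_outliers(x, window=2):
    """Box-plot outlier handling: values outside the 1.5 * IQR whiskers are
    replaced by the mean of the non-outlier samples within +/- window."""
    q1, _, q3 = statistics.quantiles(x, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    is_out = [v < low or v > high for v in x]
    y = list(x)
    for i, out in enumerate(is_out):
        if out:
            lo_k, hi_k = max(0, i - window), min(len(x), i + window + 1)
            neighbours = [x[k] for k in range(lo_k, hi_k) if not is_out[k]]
            if neighbours:                   # keep the value if no clean neighbour exists
                y[i] = statistics.fmean(neighbours)
    return y

def cumulative_average(periods):
    """Average aligned samples over all periods (truncated to the shortest)."""
    n = min(len(p) for p in periods)
    return [statistics.fmean([p[i] for p in periods]) for i in range(n)]
```

Averaging many aligned periods suppresses noise that is uncorrelated from period to period while preserving the repeatable eddy current response.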
The Fourier transform is applied to the denoised signal to obtain its frequency-domain representation; the time- and frequency-domain curves are shown in Figure 3. The peak value, peak time, and zero-crossing time can be extracted in the time domain, and the amplitudes of the fundamental and third harmonic components can be extracted in the frequency domain.
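A minimal sketch of these time- and frequency-domain features, assuming `x` holds exactly one excitation period of the denoised PEC signal sampled at `fs`. With the 100 Hz excitation and 200 kHz sampling used in the experiments, one period is 2000 samples, so the fundamental falls in DFT bin 1 and the third harmonic in bin 3. The definitions of peak time (index of the largest magnitude) and zero-crossing time (first sign change after the peak, linearly interpolated) are assumptions, and the function names are hypothetical.

```python
import cmath
import math

def time_features(x, fs):
    """Peak value, peak time (s) and zero-crossing time (s) of one period.

    Assumptions: the peak is the sample of largest magnitude, and the
    zero-crossing time is the first sign change after the peak, located by
    linear interpolation (None if the signal never crosses zero).
    """
    k_peak = max(range(len(x)), key=lambda i: abs(x[i]))
    zero_time = None
    for i in range(k_peak, len(x) - 1):
        if x[i] == 0:
            zero_time = i / fs
            break
        if x[i] * x[i + 1] < 0:
            frac = x[i] / (x[i] - x[i + 1])  # fraction of a sample to the crossing
            zero_time = (i + frac) / fs
            break
    return x[k_peak], k_peak / fs, zero_time

def harmonic_amplitude(x, harmonic):
    """Single-sided amplitude of a harmonic when x spans exactly one period.

    For one full period the h-th harmonic lies in DFT bin h, so only that
    bin is evaluated instead of computing a full FFT.
    """
    n = len(x)
    s = sum(v * cmath.exp(-2j * math.pi * harmonic * t / n) for t, v in enumerate(x))
    return 2 * abs(s) / n
```

Evaluating only the needed DFT bins keeps the sketch dependency-free; with a numerical library, a full FFT of the period gives the same bin values.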

Extraction Method of Ultrasonic Signals.
The UT detection method has been widely applied in component fault prevention and diagnosis. Owing to its accurate positioning and high sensitivity, the type-A pulse-reflection UT flaw detector is widely used. A defect changes the acoustic impedance in the specimen, and when ultrasonic waves meet a change in acoustic impedance they are reflected, so the reflection time changes. The size and depth of internal defects in the specimens are quantified from the reflection time and amplitude. UT is carried out on specimens with defects at different depths, and the ultrasonic echo signals from a defect-free specimen and from a specimen with a 3.5 mm deep defect are selected for analysis, as shown in Figure 4.
Based on the UT testing principle, the change in acoustic impedance caused by a defect produces a different UT reflection. Defects are therefore detected by analyzing the defect echo signal, which is obtained as the difference between the signals with and without defects.
The wavelet threshold algorithm is used to remove noise from the UT signal, with the result shown in Figure 5, and the defect echo signal is obtained as the difference between the denoised UT signals, as shown in Figure 6. Features such as the maximum differential peak and the second differential peak can then be extracted from the differential signal.
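The UT processing chain of denoising, differencing, and peak picking can be sketched as follows. The wavelet, decomposition level, and threshold rule are not specified here, so this sketch assumes an orthonormal Haar wavelet, soft thresholding, and the universal threshold σ√(2 ln N) with σ estimated from the finest detail coefficients; the helper names are hypothetical. Peaks are taken as local maxima of the magnitude of the differential signal.

```python
import math
import statistics

SQRT2 = math.sqrt(2)

def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    a = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return a, d

def haar_idwt(a, d):
    """Inverse of haar_dwt."""
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) / SQRT2, (ai - di) / SQRT2])
    return x

def soft_threshold(c, t):
    return math.copysign(max(abs(c) - t, 0.0), c)

def wavelet_denoise(x, levels=3):
    """Soft-threshold denoising; assumes len(x) is divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)                        # details[0] is the finest level
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2 * math.log(len(x)))  # universal threshold
    for d in reversed(details):                  # rebuild from the coarsest level
        approx = haar_idwt(approx, [soft_threshold(c, t) for c in d])
    return approx

def differential_peaks(defect, reference):
    """Differential (defect minus reference) signal and its two largest peaks."""
    diff = [p - q for p, q in zip(defect, reference)]
    mag = [abs(v) for v in diff]
    peaks = sorted((mag[i] for i in range(1, len(mag) - 1)
                    if mag[i] >= mag[i - 1] and mag[i] > mag[i + 1]), reverse=True)
    first = peaks[0] if peaks else 0.0
    second = peaks[1] if len(peaks) > 1 else 0.0
    return diff, first, second
```

Because the Haar transform is orthonormal and soft thresholding only shrinks detail coefficients, the reconstruction never amplifies noise in the detail bands.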
2.3. Feature Parameter Extraction from Pulsed Eddy Current and Ultrasonic Signals.
The feature parameters are extracted from the response curves in the time and frequency domains, and each feature parameter may carry information that reflects defect characteristics. In this study, ten feature parameters are extracted from the PEC signal and five from the UT signal, as shown in Table 1, where x_i(t) is the response curve data as a function of time and N is the length of the response curve.
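As an illustration of Table 1-style parameters, the sketch below computes a few common statistics of a response curve x_i(t) of length N. The formulas are assumptions: the kurtosis index uses one common signal-analysis definition (mean fourth power divided by RMS to the fourth), and the defect band mean uses a hypothetical threshold-based reading of "bands with significant defect signals". Table 1 contains the paper's own definitions.

```python
import math

def time_domain_features(x):
    """Mean, RMS and kurtosis index of a response curve x_i(t) of length N.

    Assumed definitions (Table 1 holds the paper's own formulas):
      mean           = (1/N) * sum x
      RMS            = sqrt((1/N) * sum x^2)
      kurtosis index = ((1/N) * sum x^4) / RMS^4
    """
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    kurtosis_index = (sum(v ** 4 for v in x) / n) / rms ** 4
    return {"mean": mean, "rms": rms, "kurtosis_index": kurtosis_index}

def defect_band_mean(diff, ratio=0.5):
    """Hypothetical reading of 'defect band mean': the mean ordinate of the
    differential-curve samples whose magnitude is at least `ratio` times the
    largest magnitude (the band with a significant defect signal)."""
    peak = max(abs(v) for v in diff)
    band = [v for v in diff if abs(v) >= ratio * peak]
    return sum(band) / len(band)
```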
Fifteen feature parameters are extracted from the PEC and UT response curves to establish the feature framework, but it is not certain that every feature parameter carries useful information for defect detection. It is therefore necessary to establish a mathematical model to analyze and weight the feature parameters. In this study, a dispersion model is established, and the standard deviation (SD) of each feature parameter is calculated to measure the statistical distribution weight. A ReliefF (RF) model is established, and the ReliefF distance is calculated to measure the distance measurement weight. D-S evidence theory is used to fuse the two kinds of weights into the final feature weights, and the feature parameters are selected by ranking these weights.

Feature Parameter Weight Algorithm
A feature parameter weight algorithm based on the dispersion ratio and ReliefF distance is proposed, as shown in Figure 7. Firstly, the common feature parameters are extracted from the differential signals of PEC and UT in the time and frequency domains. Then, the dispersion model and the ReliefF model are established to obtain the statistical distribution weight and the distance measurement weight. Finally, the weights from the two algorithms are fused by D-S evidence theory to obtain the best feature parameters.

Dispersion Evaluation Model.
Generally, a valid feature should have as small a dispersion as possible among samples at the same depth and as large a dispersion as possible among samples at different depths. A measure of feature validity is therefore constructed: the dispersion ratio, defined as the ratio of the interclass scatter to the intraclass scatter. The greater the dispersion ratio, the more valid the feature. In the experiments, the number of defect classes is denoted t, n_i is the total number of samples in the ith class, φ is the set of all parameters, and f_{i,j} is the value of feature parameter j in the ith class. The intraclass scatter of feature parameter j is calculated as follows: The interclass scatter of parameter j is expressed as follows: Then, the statistical distribution weight of feature parameter j is calculated as follows:

3.2. ReliefF Evaluation Model.
ReliefF is a feature weighting algorithm. Its principle is to evaluate how well a feature separates samples of different classes through the ReliefF distance, which is obtained by iteratively computing the difference between the feature's distance to samples of different classes and its distance to samples of the same class. The distance measurement weight is then assigned according to the ability of each feature to distinguish sample categories.

Table 1 (excerpt):
No. | Feature parameter | Description | Signal
11 | Defect band mean | The mean ordinate of the bands with significant defect signals in the differential response curve | UT
12 | Maximum differential peak | The maximum peak value of the wave caused by the difference between the defect and nondefect signals in the differential signal | UT
13 | Second differential peak | The second peak value of the wave caused by the difference between the defect and nondefect signals in the differential signal | UT
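The dispersion ratio of the dispersion evaluation model can be sketched as follows. Because the scatter formulas are not reproduced here, this sketch assumes Fisher-style definitions: the intraclass scatter is the mean squared deviation of f_{i,j} from its class mean, and the interclass scatter is the size-weighted squared deviation of the class means from the overall mean. The function name is hypothetical.

```python
import statistics

def dispersion_ratio(classes):
    """Interclass-to-intraclass scatter ratio of one feature parameter j.

    `classes` holds one list of feature values per defect-depth class.
    Assumed (Fisher-style) definitions, with N the total sample count:
      S_w = sum_i sum_k (f_ik - mean_i)^2 / N
      S_b = sum_i n_i * (mean_i - mean)^2 / N
    """
    all_values = [v for c in classes for v in c]
    n_total = len(all_values)
    grand_mean = statistics.fmean(all_values)
    class_means = [statistics.fmean(c) for c in classes]
    s_w = sum((v - m) ** 2 for c, m in zip(classes, class_means) for v in c) / n_total
    s_b = sum(len(c) * (m - grand_mean) ** 2 for c, m in zip(classes, class_means)) / n_total
    return s_b / s_w if s_w > 0 else float("inf")
```

A feature whose values cluster tightly within each depth class but differ between classes scores high; one whose class distributions overlap scores low.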
In the experiments, the sample set of the ith class is denoted X_i; all samples in the same class have the same defect depth. Each parameter j is represented by a matrix A_j, each row of which is one experimental measurement of that parameter. The specific steps are as follows:
Step 1. A sample R_i is randomly selected from all samples of the same parameter.
Step 2. The k samples nearest to R_i are taken from the same class as R_i; this set of k samples is denoted H_i.
Step 3. The k samples nearest to R_i are taken from each class l that differs from the class of R_i; this set of k samples is denoted M_{i,l}.
Step 4. RH and RM_l are calculated: RH is the mean distance between R_i and the samples of H_i, and RM_l is the mean distance between R_i and the samples of M_{i,l}.
The normalized distance between two samples R_1 and R_2 is defined as follows: where |R_1| and |R_2| are the values of the two samples. RH and RM_l are calculated as follows: RH and RM_l reflect the intraclass and interclass dispersions, respectively.
Note: SD, RF, and DS denote the SD dispersion ratio, the ReliefF algorithm, and D-S fusion, respectively.

Journal of Sensors
Step 5. The weight of parameter j is denoted w_j, with an initial value of zero. The parameter weight is updated as follows: where A is a parameter and P(X_i) is the proportion of samples in class X_i.
Step 6. Steps 1 to 5 are repeated m times with randomly selected samples of the same parameter, and the resulting w_j is the final weight of parameter j.
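Steps 1 to 6 can be sketched for a single parameter column A_j as follows. The update rule is assumed to be the standard ReliefF update, in which the mean hit distance RH is subtracted and each mean miss distance RM_l is added with the prior factor P(l)/(1 − P(class of R_i)), all divided by m; the range-normalized distance, the defaults for m and k, and the random seed are likewise assumptions.

```python
import random
from collections import Counter

def relieff_weight(values, labels, m=50, k=3, seed=0):
    """ReliefF weight w_j of one feature parameter (one column A_j).

    values[q] is the parameter value of sample q and labels[q] its depth class.
    Assumed update (standard ReliefF):
      w_j <- w_j - RH/m + sum_{l != c} P(l) / (1 - P(c)) * RM_l / m
    where c is the class of R_i and distances are range-normalized.
    """
    span = (max(values) - min(values)) or 1.0

    def dist(a, b):
        return abs(a - b) / span                 # normalized distance in [0, 1]

    n = len(values)
    prior = {c: cnt / n for c, cnt in Counter(labels).items()}  # P(X_i)
    rng = random.Random(seed)
    w = 0.0                                      # Step 5: initial weight is zero
    for _ in range(m):                           # Step 6: repeat m times
        i = rng.randrange(n)                     # Step 1: random sample R_i
        r, c = values[i], labels[i]
        # Step 2: k nearest hits H_i (same class, excluding R_i itself)
        hits = sorted(dist(r, values[q]) for q in range(n) if q != i and labels[q] == c)[:k]
        rh = sum(hits) / len(hits) if hits else 0.0
        w -= rh / m
        # Steps 3-4: k nearest misses M_{i,l} from every other class l
        for l, p_l in prior.items():
            if l == c:
                continue
            misses = sorted(dist(r, values[q]) for q in range(n) if labels[q] == l)[:k]
            rm_l = sum(misses) / len(misses)
            w += p_l / (1 - prior[c]) * rm_l / m
    return w
```

A parameter that separates the depth classes well ends with a large positive weight, while one whose classes interleave ends near zero or negative.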
3.3. Feature Fusion Model Based on D-S Evidence Theory.
From the above, the statistical distribution weight and the distance measurement weight are obtained by the dispersion model and the ReliefF model, respectively. Each method has advantages and disadvantages, so how to use them together to select feature parameters becomes a key problem, which is addressed here with D-S evidence theory. D-S evidence theory fuses evidence at the decision level through uncertain reasoning and can also fuse incomplete or even conflicting evidence. The specific implementation is as follows. The weight matrix is normalized as follows: where m(j) is the trust of feature parameter j and w_j is the weight of feature parameter j. The combination rule of evidence theory is expressed as follows: where m_1(A) is the trust of feature parameter A obtained by the ReliefF algorithm, m_2(A) is the trust of feature parameter A obtained by SD, and (m_1 ⊕ m_2)(A) is the new reliability weight of the parameter obtained by D-S evidence theory.
Step 1. The weights obtained by SD and by the ReliefF algorithm are merged into a two-column matrix, giving the new feature parameter weight matrix Θ.
Step 2. The weights calculated by the two methods can be interpreted as degrees of trust, and the mass function is defined accordingly. The trust of feature parameter A is m(A), where K is the normalization constant, given as follows:
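The fusion step can be sketched as follows, assuming each feature parameter is a singleton focal element, so that Dempster's rule reduces to a normalized product and the normalization constant is 1 − K = Σ_j m_1(j) m_2(j). Clipping negative ReliefF weights to zero before normalization is an added assumption, and the function names and example weights are hypothetical.

```python
def to_mass(weights):
    """Normalize a weight column into a mass function m(j) = w_j / sum_k w_k.
    Assumption: negative weights (possible with ReliefF) are clipped to zero."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0:
        raise ValueError("all weights are non-positive")
    return [w / total for w in clipped]

def dempster_combine(m1, m2):
    """Dempster's rule for singleton focal elements (one per feature parameter):
    (m1 ⊕ m2)(j) = m1(j) * m2(j) / (1 - K), where 1 - K = sum_j m1(j) * m2(j)."""
    agreement = sum(a * b for a, b in zip(m1, m2))
    if agreement == 0:
        raise ValueError("total conflict (K = 1): evidence cannot be combined")
    return [a * b / agreement for a, b in zip(m1, m2)]

def rank_features(fused):
    """Feature indices sorted by fused weight, largest first."""
    return sorted(range(len(fused)), key=lambda j: fused[j], reverse=True)

# Hypothetical two-column weight matrix Theta: (ReliefF weight, SD weight) per feature.
theta = [(0.5, 2.0), (0.3, 1.0), (0.2, 1.0)]
m1 = to_mass([rf for rf, _ in theta])   # trust from ReliefF
m2 = to_mass([sd for _, sd in theta])   # trust from SD
fused = dempster_combine(m1, m2)
print(rank_features(fused))             # [0, 1, 2]
```

The product form rewards parameters that both models trust, which is why a parameter ranked highly by only one model is pulled down after fusion.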

Analysis of Experimental Results
PEC and UT detectors are used for detection, respectively. The PEC test uses a pulse signal with a frequency of 100 Hz, a peak value of 5 V, and a 50% duty cycle as the excitation signal, and the sampling frequency is set to 200 kHz. The UT test uses edible oil as the couplant to keep the probe in close contact with the test piece, the sampling frequency is set to 5 MHz, and a single probe is used for both transmitting and receiving. Experimental data are collected from PEC and UT signals for defects at different depths to evaluate the performance of the feature parameter extraction algorithm. The PEC feature parameters are extracted as listed in Table 1, and the statistical distribution and distance measurement weights of each parameter are calculated with the SD and ReliefF algorithms by Eq. (3) and Eq. (7). Finally, the normalized weights from D-S fusion are obtained, as shown in Table 2 and Figure 8. The weights of feature parameters 4, 8, 9, and 10 are larger than those of the others. Therefore, the mean value of the signal, the kurtosis index, the amplitude of the fundamental component, and the amplitude of the third harmonic component are selected as the PEC feature parameters.
The UT feature parameters are likewise extracted as shown in Table 1. The statistical distribution weight, distance measurement weight, and fused weight are shown in Table 3 and Figure 9. The weight of feature parameter 11 is larger than those of the others, so the defect band mean is selected as the optimal UT feature parameter.
To verify the validity of the extracted features, the number of samples was changed several times. The above features (Nos. 4, 8, 9, and 10 of the PEC signal and No. 11 of the UT signal) still had larger weights, which shows that the selected features are insensitive to changes in the sample set and generalize well. Thus, the PEC and UT features extracted by the model remain valid in different situations, as shown in Table 4.
To evaluate the performance of the feature parameter extraction algorithms, the proposed algorithm was compared with the PCA algorithm, and the depth of the subsurface defects was calculated from the experimental data above. The depths estimated from the feature parameters selected by the fusion algorithm and from those extracted from a single signal are shown in Figure 10.
To test the effectiveness of the feature extraction method, materials with subsurface defect depths of 0.5 mm, 1 mm, 1.5 mm, 2 mm, 2.5 mm, 3.5 mm, 4 mm, and 4.5 mm are tested by PEC and UT, respectively; the proposed method and PCA are then used to extract features from the PEC and UT test data. Finally, the depth of the subsurface defects is estimated quantitatively from the extracted features, and the average quantization error is shown in Figures 10 and 11. The results show that the PCA-based depth estimation errors of PEC and UT are similar and that the proposed method has the smallest average error. For defect depths between 2 mm and 3 mm, the accuracy of the proposed method is close to that of PEC; for shallower or deeper defects, the depth estimation accuracy of the feature parameters selected by the proposed algorithm is clearly higher than that of the other two.

Conclusion
This study proposes a fusion-theory-based algorithm for extracting NDT feature parameters. Firstly, PEC and UT signal data with different defect depths are preprocessed to handle outliers and noise, and time- and frequency-domain feature parameters are extracted to form a feature framework. Then, the statistical distribution and distance measurement weights of the feature parameters are calculated using the dispersion ratio and ReliefF distance algorithms, and D-S evidence theory is used to fuse the two kinds of weights into the final feature weights. Finally, the subsurface defect depth quantization experiment shows that the feature parameters obtained in this paper have better validity and accuracy for defect depth quantification.

Data Availability
The simulation data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.