Intelligent Defect Identification Based on PECT Signals and an Optimized Two-Dimensional Deep Convolutional Network

Accurate and rapid defect identification based on pulsed eddy current testing (PECT) plays an important role in the structural integrity and health monitoring (SIHM) of in-service equipment in renewable energy systems. However, in conventional data-driven defect identification methods, signal feature extraction is time consuming and requires expert experience. To avoid the difficulty of manual feature extraction and overcome the shortcomings of the classic deep convolutional network (DCNN), such as large memory and high computational cost, an intelligent defect recognition pipeline based on the general Warblet transform (GWT) method and an optimized two-dimensional (2D) DCNN is proposed. The GWT method converts the one-dimensional (1D) PECT signal to a 2D grayscale image, which is used as the input of the 2D DCNN. A compound method is proposed to optimize the baseline VGG16, a well-known DCNN, in four respects: reducing the input size, adding a batch normalization layer (BN) after every convolutional layer (Conv) and fully connected layer (FC), simplifying the FCs, and removing unimportant filters in the Convs, so as to reduce memory and computational costs while improving accuracy. A PECT experiment that considers interference factors including liftoff and noise leads to the following conclusions. The time-frequency representation (TFR) obtained by the GWT method not only has excellent ability in terms of transient component analysis but also is less affected by the reduction of image size; the proposed optimized DCNN can accurately identify defect types without manual feature extraction. Compared with the baseline VGG16, the accuracy obtained by the optimized DCNN is improved by 7%, to about 99.58%, and the memory and computational cost are reduced by 98%. Moreover, compared with other well-known DCNNs, such as GoogLeNet, Inception V3, ResNet50, and AlexNet, the optimized network also has significant advantages in terms of accuracy and computational cost.


Introduction
Renewable energy sources such as wind and tides are gradually replacing coal and oil because they are inexhaustible and pollution-free. The large in-service equipment used in renewable energy systems, such as supports and pipes, often suffers from various defects due to the harsh working environment, which endangers the safe operation of the system. It is therefore of practical significance to explore an accurate, fast, and concise intelligent defect identification method that applies the latest achievements of artificial intelligence to the time-series signals widely used in the field of renewable energy, for example, those from pulsed eddy current testing, ultrasound testing, guided wave testing, and magnetic flux leakage testing.
With its excellent detection ability for surface and internal defects, simple mechanism, and low price, pulsed eddy current testing (PECT)-based defect identification has long been a research hotspot in the field of nondestructive testing [1]. Defect identification methods using PECT can be classified as model-based and data-driven. Model-based methods usually establish a suitable electromagnetic model to analyze the defects, which is generally complicated and time consuming [2,3]. Data-driven methods build defect identification models from historical inspection data and are suitable for complex systems for which it is difficult to establish accurate models through mechanism analysis [4,5]. In recent years, the popularization of intelligent manufacturing and the development of information technology have provided new opportunities for data-driven defect identification and for predicting remaining useful life [6,7].
Conventional data-driven defect recognition includes two steps: manual feature extraction and defect pattern recognition [8]. The features extracted from PECT signals can be roughly divided into time-domain, statistical, frequency-domain, and time-frequency-domain features. Commonly used time-domain features, such as signal peaks, peak rising times, and zero-crossing times of differential signals, are simple but susceptible to noise and liftoff interference. Frequency-domain features, such as the frequency spectrum separating point [9] and specific frequency components or frequency bands [10], extracted from the amplitude spectrum of a fast Fourier transform (FFT), often vary with the signal and are susceptible to noise. Statistical analysis methods, such as principal component analysis (PCA) [11] and independent component analysis (ICA) [12], have been applied to extract more effective time- or frequency-domain features. Time-frequency-domain features are generally obtained with time-frequency analysis (TFA) methods, such as the wavelet transform [13] and ensemble empirical mode decomposition (EEMD) [14], which transform a PECT signal to a two-dimensional space; methods such as PCA are then used to extract a few components to form a feature vector. However, manual feature extraction often requires great familiarity with the detection objects and signals and takes much trial and error to find an appropriate feature vector. In addition, the inevitable denoising before feature extraction causes information loss. Manual feature extraction is therefore a difficult task. The subsequent pattern recognition is much simpler: machine learning methods, such as neural networks [11] and support vector machines [15], are used to establish the mapping between features and defect patterns.
In recent years, with the development of artificial intelligence and machine learning, the deep convolutional neural network (DCNN) has become an important adaptive signal processing method [16]. Classical DCNN architectures consist of a series of convolutional layers (Convs) containing multiple filters, pooling layers, rectified linear units (ReLU), and fully connected layers (FCs) [17], where the Convs automatically perform feature extraction through the convolution operation and the FCs complete the pattern recognition. A DCNN can therefore realize pattern recognition directly from a two-dimensional (2D) input signal and avoid the difficulties of manual feature extraction. Some scholars have carried out research on applying DCNNs to PECT and related techniques. For example, Cheon et al. adopted a single convolutional neural network (CNN) model to extract effective features for defect classification [18]. Saeed et al. adopted a pretrained DCNN and transfer learning to recognize artificial subsurface defects in a carbon fiber reinforced polymer (CFRP) sample, where the defect (i.e., input) signal was a 2D thermogram obtained from a pulsed-thermography setup [19]. Zhu et al. used a CNN to form a pattern recognizer to improve the identification accuracy of heat exchange tube defects using eddy current testing [20]. However, there is little research on how to realize intelligent defect identification directly from a 1D PECT signal with a DCNN. There are two possible reasons for this. First, the PECT signal is 1D, while a DCNN usually adopts 2D convolutional layers for good feature extraction, which requires a 2D input [21]. Second, the parameters and the intermediate variables of a DCNN need too much memory and computational effort [22], which restricts the application of DCNNs to nondestructive testing, including PECT: the inspection and maintenance of large-scale in-service equipment in the field of renewable energy is often carried out with portable NDT equipment within a given time, so small memory and fast speed are as important as accuracy for the signal processing methods.
Some scholars have studied technical pipelines based on time-domain signals and DCNNs in the field of bearing fault diagnosis. Time-frequency analysis (TFA) methods, such as the short-time Fourier transform (STFT) [23], synchrosqueezed transform (SST) [24], Wigner-Ville distribution (WVD) [25], and ensemble empirical mode decomposition (EEMD) [26], were used to transform time-series signals into 2D time-frequency representations (TFRs) as the input of a 2D DCNN. These TFA methods cannot adaptively configure important parameters according to signal characteristics, so the obtained TFRs are distorted to some degree, which inevitably affects the accuracy of subsequent pattern recognition. Peng et al. proposed the parameterized time-frequency analysis method, which can adaptively match the appropriate parameters of TFA methods to the signal model through a matching kernel function [27]. This method can more accurately express the time-frequency characteristics of transient signals [28,29] and has been successfully used for processing low-signal-to-noise-ratio, multicomponent, and weak signals, such as micro-Doppler [30] and multiradar signals [31]. However, there is little research on its application to PECT.
On the other hand, valuable research has been conducted on reducing the memory and computational costs of DCNNs. For example, Li et al. and Frankle et al. used the L1- and L2-norm to rank the importance of filters in the convolutional layers (Convs) and removed the unimportant ones [32,33]. Liu et al. introduced a scaling factor for each filter (channel), which could automatically identify insignificant channels that were pruned afterwards [34]. Molchanov et al. treated network pruning as an optimization problem and proposed a criterion based on a Taylor expansion that approximates the change in the cost function [35]. Current research on network optimization methods is carried out on large databases such as Cifar and ImageNet, which contain many thousands of images. However, as far as nondestructive testing is concerned, it is difficult to build a large damage database even with the wide application of big data, because defects are extremely rare in normally operating equipment. Moreover, in addition to reducing the parameters of the Convs and the FCs, how to reduce the intermediate variables of the DCNN without degrading accuracy is also worthy of further study. To avoid the difficulty of manual feature extraction and overcome the large memory and computational cost of DCNNs, we propose a defect recognition pipeline based on the PECT signal and an optimized DCNN to identify defects intelligently, quickly, and accurately. In the pipeline, the general Warblet transform (GWT) method, a kind of parameterized time-frequency method, is used to transform the PECT signal into a 2D TFR as the input of the DCNN. A novel compound method is proposed to optimize the VGG16 baseline architecture; the resulting optimized DCNN is used for feature extraction and pattern recognition of the 2D TFRs.
The pipeline was verified in terms of accuracy and computational cost by a PECT experiment that considered interference such as liftoff and noise. The remainder of this article is organized as follows. Section 2 briefly introduces the self-made PECT equipment and the dataset used to verify the pipeline, which is introduced in Section 3. The pipeline is verified and analysed in Section 4. We discuss our conclusions in Section 5.

Self-Made PECT Equipment and Defects
2.1. Self-Made PECT Equipment. PECT technology uses pulse excitation with a certain duty cycle, which can be treated theoretically as a superposition of a series of harmonic components. Thus, PECT technology can potentially obtain better sensitivity for both surface and internal defects than conventional eddy current testing excited by a single harmonic signal [36]. The data were obtained with the self-made PECT equipment shown in Figure 1, which consists mainly of the following: (a) PECT probe; (b) excitation signal generator; (c) amplifier and low-pass filter; (d) data-acquisition card (DAC); (e) power module; and (f) detection signal. The PECT probe consists of a ferrite core, a detection coil, and a driver coil. The ferrite core concentrates the magnetic field lines around the probe for better sensitivity and deeper detection. The detection and driver coils are wound coaxially on the ferrite core. The detection coil is on the inside and is wound with 1000 turns of copper wire. The driver coil is on the outside, has 500 turns of copper wire, and is excited by a square pulse with a frequency of 100 Hz, an amplitude of 5 V, and a duty cycle of 50%. The pulse excitation signal is generated by an STM32 microcontroller through internal triggering. The driver coil, excited by the pulse signal, induces a transient eddy current in the conductive specimen according to the Maxwell equations. Subsequently, the changing eddy current field generates an induced magnetic field above the test piece, which is picked up by the detection coil as a voltage signal. After amplification and low-pass filtering, the voltage signal enters the DAC, which samples at 500 kS/s. The power module supplies power to each part. The display and save functions for the detection signal are programmed in LabVIEW. The detection signal, as shown in Figure 1(f), is a 1D time-series signal.
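The statement that a pulse excitation is a superposition of harmonic components can be checked numerically. The following is a minimal sketch, assuming an ideal square wave that switches between 0 V and 5 V (the text gives the amplitude but not the low level) and one 10 ms period sampled at 500 kS/s; the function name `pulse_harmonics` is hypothetical.

```python
import numpy as np

def pulse_harmonics(freq_hz=100.0, amplitude_v=5.0, duty=0.5,
                    sample_rate_hz=500_000, n_harmonics=5):
    """Return (dc, amplitudes) of an ideal rectangular pulse train.

    One period is synthesized and its one-sided FFT amplitude spectrum
    is read at the first `n_harmonics` multiples of `freq_hz`.
    Assumption: the pulse switches between 0 V and `amplitude_v`.
    """
    n = int(round(sample_rate_hz / freq_hz))       # samples per period
    pulse = np.zeros(n)
    pulse[: int(round(duty * n))] = amplitude_v    # high part of the period
    spectrum = np.fft.rfft(pulse) / n
    dc = spectrum[0].real
    amplitudes = 2.0 * np.abs(spectrum[1 : n_harmonics + 1])
    return dc, amplitudes

dc, amps = pulse_harmonics()
for k, a in enumerate(amps, start=1):
    print(f"harmonic {k} ({100 * k} Hz): {a:.3f} V")
```

Under these assumptions only the odd harmonics carry energy, and their amplitudes fall off as 1/k, which is why a single pulse probes many frequencies at once.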

Introduction of Defect.
A specimen with three kinds of artificial defects, i.e., surface defects, internal defects, and a hidden defect, was used to verify the proposed defect identification pipeline, as shown in Figure 2. The two simulated surface defects are shown in Figure 2(a): the first is a crack with a length of 15 mm, a width of 1 mm, and a depth of 3 mm, and the second consists of two such cracks at a distance of 2 mm. The two simulated internal defects are holes with diameters of 3 mm and 5 mm, respectively, located 2 mm below the surface, as shown in Figure 2(b). The artificial hidden defect, consisting of a crack and a hole in the same vertical position, simulates a very dangerous situation in which a serious internal defect is hidden by a slight surface defect. The sizes of the crack and hole are shown in Figure 2(c). It is worth noting that real internal defects differ from those shown in Figure 2. However, considering the difficulty of machining hidden internal defects, holes are often used to simulate various internal defects for research work and employee training.
In the experiment, the probe stood upright to collect data. The liftoff (the distance between the probe end face and the specimen surface) was set to 0.5 mm. Liftoff is a common interference during PECT [2]. To verify the robustness of the proposed method against this interference, the signal of the #2 crack with a liftoff of 1 mm was also collected. PECT signals were transformed to 2D TFRs through time-frequency analysis (TFA) and converted to grayscale images to form a dataset. Salt-and-pepper noise with amplitude 0.02 was added to imitate signals contaminated by noise, which is another common interference [37]. Finally, 596 signals, including 298 surface defects, 198 internal defects, and 100 hidden defects, were obtained. Sixty percent of the dataset was randomly selected as the training set, and the remaining 40 percent was the test set. Table 1 displays information on the training and test sets.
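The noise injection and the random 60/40 split described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the value 0.02 is read here as the fraction of corrupted pixels (the text calls it an amplitude and does not say how it is applied), images are taken to be 8-bit grayscale arrays, and the function names are hypothetical.

```python
import numpy as np

def add_salt_and_pepper(image, density=0.02, seed=None):
    """Replace a fraction `density` of pixels with 0 (pepper) or 255 (salt).

    Assumption: the 0.02 quoted in the text is interpreted as the noise
    density; the source does not specify this.
    """
    rng = np.random.default_rng(seed)
    noisy = np.array(image, dtype=np.uint8, copy=True)
    corrupt = rng.random(noisy.shape) < density    # pixels to replace
    salt = rng.random(noisy.shape) < 0.5           # half salt, half pepper
    noisy[corrupt & salt] = 255
    noisy[corrupt & ~salt] = 0
    return noisy

def split_indices(n_samples, train_fraction=0.6, seed=None):
    """Randomly split sample indices into training and test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return order[:n_train], order[n_train:]

gray = np.full((224, 224), 128, dtype=np.uint8)    # placeholder image
noisy = add_salt_and_pepper(gray, density=0.02, seed=0)
train_idx, test_idx = split_indices(596, train_fraction=0.6, seed=0)
```

The split here is purely random over all samples; whether the split in Table 1 was stratified by defect type is not stated in the text.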

Method
The proposed intelligent defect identification pipeline consists mainly of an input signal processing module and a network optimization module based on VGG16, as shown in Figure 3. In the input signal processing module, the GWT method transforms the 1D PECT signals to 2D TFRs, which are converted to grayscale images. Size compression is performed to save memory and increase the processing speed. In the network optimization module, the VGG16 baseline architecture is improved in four respects: reducing the input size, adding BNs, simplifying the FCs, and removing unimportant filters in the Convs, so as to reduce memory and computational costs while improving accuracy. This section introduces the pipeline.

2D TFR of PECT Signal Based on GWT Method.
We propose to use the GWT method to obtain the TFR of the PECT signal for two reasons. First, as a parameterized time-frequency analysis method, GWT constructs a matching kernel function to adaptively match the appropriate parameters for the signal model, so the obtained 2D TFR has excellent time-frequency resolution and energy concentration. Second, the PECT signal can theoretically be regarded as the superposition of a series of harmonic signals characterized by a Fourier series, while the kernel function of the GWT method is also constructed from a Fourier series. Thus, GWT can better approximate the PECT signal and accurately characterize the instantaneous frequency components it contains. The GWT method is defined by equation (1) [38], in which the window function is taken here to be the Gaussian window function. Only a proper kernel function allows the GWT method to obtain a TFR with good energy concentration and a clear instantaneous frequency. We estimate the coefficients of the kernel function as follows [38].
Step 1: assume initial values for the kernel coefficients, and substitute them into equation (1) to obtain the initial TFR of the PECT signal.
Step 2: extract the location of the local maximum energy in the TFR as the estimated instantaneous frequency $F_i(t)$.
Step 3: calculate the Fourier transform $F(n\omega_0)$ of the estimated instantaneous frequency and the corresponding Fourier coefficients $\alpha_n$ and $\beta_n$, where $F(n\omega_0) = \frac{1}{2}(\alpha_n + j\beta_n)$ for $n < 0$.
Step 4: substitute the Fourier coefficients into equation (1) to get a new TFR, and repeat Steps 2 and 3 until the difference between the instantaneous frequencies obtained in two consecutive iterations is less than the preset threshold $\delta$, at which point the iteration is terminated.
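Steps 2 and 3 can be sketched as follows. This is a minimal illustration under assumptions, not the implementation of [38]: the TFR is taken to be a 2D array with frequency along rows and time along columns, the ridge is the per-column maximum, the Fourier coefficients are estimated with an FFT over a segment treated as one period, and both function names are hypothetical. The kernel of equation (1) and the stopping test of Step 4 are not shown.

```python
import numpy as np

def ridge_frequency(tfr, freqs):
    """Step 2 (sketch): frequency of maximum energy in each time column."""
    freqs = np.asarray(freqs, dtype=float)
    return freqs[np.argmax(np.abs(tfr), axis=0)]

def fourier_coefficients(inst_freq, n_harmonics):
    """Step 3 (sketch): Fourier-series coefficients of the ridge.

    Models inst_freq(t) ~ alpha[0]/2
        + sum_n alpha[n] cos(n w0 t) + beta[n] sin(n w0 t),
    with the analysed segment treated as one fundamental period.
    """
    inst_freq = np.asarray(inst_freq, dtype=float)
    spectrum = np.fft.rfft(inst_freq) / len(inst_freq)
    alpha = 2.0 * spectrum.real[: n_harmonics + 1]
    beta = -2.0 * spectrum.imag[: n_harmonics + 1]
    return alpha, beta
```

In an iterative scheme the returned coefficients would parameterize the kernel for the next pass, and the loop would stop once successive ridge estimates differ by less than the threshold.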
The TFR of a PECT signal from the #1 crack obtained by the GWT method is shown in Figure 4. Because the PECT signal only responds at the moment when the pulse is triggered and then quickly decays, the TFA method is applied only to the signal segment around the moment of triggering.
To verify the effect of GWT, four other TFA methods were used to process the same signal: the short-time Fourier transform (STFT), ensemble empirical mode decomposition (EEMD), the synchrosqueezed transform (SST), and the smooth pseudo-Wigner-Ville distribution (PSWVD). The obtained TFRs and corresponding grayscale images are shown in Figures 5-8.
It can be seen from the figures that the TFRs obtained by GWT, STFT, and PSWVD are denser time-frequency spectra, while the TFRs of EEMD and SST indicate only a few time-frequency components. The spectra obtained by GWT and PSWVD have clearer profiles than that of STFT, which shows that these two TFA methods perform better in terms of instantaneous frequency analysis. However, the two methods produce peak-shaped distortions at the initial stage. This is because the PECT signal includes rich transient components at the moment of excitation, which inevitably cause cross-interference during the TFA process. GWT produces less distortion than PSWVD because of its better transient recognition ability.

Information Loss When Reducing Image Size.
We use the image entropy to evaluate the information loss generated when the TFRs obtained by the above methods are reduced in size.

Complexity
Shannon introduced the concept of entropy to information theory to measure information [39]. In image processing, the 1D image entropy $E$ indicates the amount of information contained in an image through the aggregation characteristics of its gray distribution, and it can be calculated as follows:

$$E = -\sum_{i} p_i \log_2 p_i,$$

where $p_i$ is the probability of the ith gray value appearing in the image, which can be obtained from the gray histogram. Table 2 shows the image entropies of the TFRs and their changes when the size is reduced from 224 × 224 to 32 × 32. The values in Table 2 are consistent with the appearance of the TFRs. The grayscale images obtained by STFT, PSWVD, and GWT contain more information; hence, their information entropies are larger. The TFRs obtained by EEMD and SST have only a few curves representing the frequency components, so their information entropies are smaller. When the image is reduced from 224 × 224 to 32 × 32, Table 2 shows that the information entropies of the first three change relatively little, for example, by 1.53% for the GWT method used in this article. However, the information entropies of EEMD and SST increase significantly. For example, the change of information entropy for EEMD reaches 168.39%, which means that the distribution of image information changes greatly and may have a significant impact on defect recognition.
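The entropy comparison described above can be sketched in a few lines. This is a hedged illustration: it assumes 8-bit grayscale images, a base-2 logarithm, and downsampling by averaging non-overlapping 7 × 7 blocks (224/32 = 7); the resizing method actually used is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

def image_entropy(gray):
    """1D image entropy in bits of an 8-bit grayscale image."""
    values = np.asarray(gray, dtype=np.uint8).ravel()
    hist = np.bincount(values, minlength=256)
    p = hist[hist > 0] / hist.sum()            # probabilities of used gray levels
    return float((p * np.log2(1.0 / p)).sum())

def shrink_by_block_mean(gray, factor):
    """Downsample by averaging non-overlapping factor x factor blocks.

    Assumption: block averaging stands in for the unspecified resizing.
    Height and width must be divisible by `factor`.
    """
    gray = np.asarray(gray)
    h, w = gray.shape
    blocks = gray.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3)).round().astype(np.uint8)

def entropy_change_percent(gray, factor=7):
    """Relative entropy change after shrinking (original entropy must be > 0)."""
    before = image_entropy(gray)
    after = image_entropy(shrink_by_block_mean(gray, factor))
    return 100.0 * (after - before) / before
```

A sparse image made of a few thin curves tends to gain intermediate gray levels when averaged, which is one plausible reading of why the entropies of the EEMD and SST images rise sharply after resizing.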

Network Optimization Based on VGG16
3.2.1. VGGNet. VGGNet was developed by the Visual Geometry Group of Oxford University and Google DeepMind in 2014 [40]. VGGNet remains one of the most commonly used pretrained architectures due to its excellent accuracy and feature extraction capabilities. Among its variants, VGG16 includes five Convs and three FCs and requires an input of 224 × 224. Each Conv is composed of multiple subconvolutional layers with small kernels and a pooling layer. This architecture increases the number of nonlinear mappings and improves the fitting ability of the network, which makes VGG16 more suitable for engineering applications where it is difficult to obtain a large number of samples. Moreover, the VGG16 architecture is simple and direct, so it is easy to implement network improvements.

Network Optimization Method.
A wider and deeper architecture can improve the feature extraction and mapping capabilities of a DCNN on input signals, but it has more parameters and requires more memory and computational effort. In fact, many channels (called filters in this article) in the Convs, especially the deeper Convs, have very low or even zero weights and do not play the expected role [32]. Regarding FCs, previous research has shown that more layers and hidden neurons do not imply stronger mapping ability [8]. In terms of the input image, although a larger size (more pixels in the image) means that more information is contained, more memory and computation are required because of the larger intermediate variables (such as feature maps).
We propose a network optimization method based on VGG16, which comprises the following four steps.
First, an adaptive pooling layer is adopted so that the VGG16 architecture accepts smaller input. In this article, the input signal is reduced from 224 × 224 to 32 × 32.
Second, a batch normalization layer (BN) is added after each convolutional layer (Conv) and fully connected layer (FC). BN normalizes the data of each minibatch to a mean of 0 and a variance of 1 when the DCNN is trained by gradient descent. BN alleviates the vanishing (or exploding) gradient phenomenon during DCNN training, speeds up model training, and reduces the dependence on the initial parameters [41].
Third, the original three FCs are compressed to two, and the number of hidden neurons in each FC is reduced from 4096 to 512.
Finally, unimportant filters (channels) in the Convs are removed. The importance of each filter is evaluated based on the activation value (the output of the activation function). The evaluation criterion is the absolute value of the first-order term in the Taylor expansion of the objective function with respect to the activation value. The biggest advantage of this method is that it avoids additional calculation. The principle of this criterion based on the Taylor expansion is as follows [35]. Assume that the cost of a pruning operation can be described by

$$|\Delta C(h_i)| = |C(D, h_i = 0) - C(D, h_i)|, \tag{5}$$

where $C(D, h_i = 0)$ is the cost if output $h_i$ is pruned and $C(D, h_i)$ is the cost if $h_i$ is not pruned. The first-order Taylor polynomial near $h_i = 0$ is

$$C(D, h_i = 0) = C(D, h_i) - \frac{\partial C}{\partial h_i} h_i + R_1(h_i = 0), \tag{6}$$

where $R_1(h_i = 0)$ is the first-order remainder and is neglected here. So, equation (5) can be written as follows:

$$\Theta_{TE}(h_i) = |\Delta C(h_i)| = \left| \frac{\partial C}{\partial h_i} h_i \right|. \tag{7}$$

Specifically, the kth filter of the lth convolutional layer is written as $z_l^{(k)}$, and the change in cost generated by removing the filter is $\Theta_{TE}$, which can be calculated as follows:

$$\Theta_{TE}\left(z_l^{(k)}\right) = \left| \frac{1}{M} \sum_{m} \frac{\partial C}{\partial z_{l,m}^{(k)}} z_{l,m}^{(k)} \right|, \tag{8}$$

where $M$ is the length of the vectorized feature map and $z_{l,m}^{(k)}$ is the activation of the kth filter in the lth convolutional layer. The partial derivative terms can be obtained from backpropagation, so $\Theta_{TE}$ can be obtained without additional calculation. All of the filters are sorted according to their $\Theta_{TE}$ values.
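The Taylor criterion can be sketched numerically as follows. This is a minimal illustration, assuming the activations and their gradients for one convolutional layer are already available as arrays of shape (batch, filters, height, width); averaging the per-sample scores over the minibatch is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

def taylor_importance(activations, gradients):
    """Theta_TE for every filter of one convolutional layer.

    activations, gradients: arrays of shape (batch, filters, height, width)
    holding z and dC/dz as obtained from a forward and a backward pass.
    """
    product = activations * gradients
    # |1/M * sum_m dC/dz_m * z_m| for each sample and filter
    per_sample = np.abs(product.mean(axis=(2, 3)))
    # assumption: scores are averaged over the minibatch
    return per_sample.mean(axis=0)

def least_important(scores, n_remove):
    """Indices of the n_remove filters with the smallest score."""
    return np.argsort(scores)[:n_remove]
```

Because both inputs are by-products of ordinary training, ranking filters this way adds only an elementwise product and a mean per layer.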
To avoid the performance degradation caused by removing a large number of filters at one time, we adopt multiple iterations of pruning and retraining to compress the network. The filters in all of the Convs are sorted according to the Taylor criterion, and the least important N filters are removed, where we set N = 512. The pruned network is retrained using the dataset shown in Table 1. After five iterations of pruning and retraining, the pruning operation is terminated. It is worth noting that the improved VGG16 architecture used in this article is quite different from the baseline VGG16 architecture due to improvements such as adding BNs and reducing FCs. Thus, the architecture is trained from scratch on the dataset in Table 1 to obtain weights before the iterations of pruning and retraining.
Figure 9 compares the baseline and optimized VGG16. The baseline VGG16 architecture requires 224 × 224 input signals, and the Convs directly perform convolution on the input signals to obtain the feature maps. After the convolutions are completed, the feature maps obtained by the last convolutional layer are fed into three FCs for mapping between feature maps and defect patterns. When the input signal is 224 × 224, the number of input neurons in the first FC is 25088. The numbers of hidden neurons in the first and second FCs are both 4096. The number of neurons in the third FC is three, which is equal to the number of defect types. The input size of the optimized VGG16 architecture is 32 × 32. BNs are added after every Conv and FC, and the numbers of neurons in the two FCs are 512 and 3. If the jth filter in the ith convolutional layer, $F_{Con}(i, j)$, must be deleted, then the corresponding filter in the BN behind the Conv, $F_{BN}(i, j)$, and the feature map $FM(i, j)$ must also be deleted.
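To give a sense of scale for the FC simplification described above, the parameter counts of the two FC stacks can be compared with a short calculation. This is a sketch under one assumption: with a 32 × 32 input and before any pruning, the last Conv is taken to output 512 feature maps of size 1 × 1, so the first FC receives 512 inputs. BN parameters are not counted, and `fc_parameters` is a hypothetical helper.

```python
def fc_parameters(layer_sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Baseline: 25088 inputs -> 4096 -> 4096 -> 3 defect types
baseline_fc = fc_parameters([25088, 4096, 4096, 3])

# Optimized (assumption: 512 inputs to the first FC before pruning)
optimized_fc = fc_parameters([512, 512, 3])

print(baseline_fc, optimized_fc)
```

Under this assumption the FC stack shrinks from roughly 119.6 million parameters to roughly 0.26 million, so most of the baseline model size sits in the first FC.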

Results and Discussion
We verify the proposed intelligent defect recognition pipeline and analyze the effect of TFRs and network optimization method on defect identification in detail.

Effect of TFR on Defect Identification.
To verify the effectiveness of the GWT method, four commonly used TFA methods, i.e., STFT, PSWVD, EEMD, and SST, were also used to transform the PECT signals in Table 1 into 2D TFRs, which formed the training and test sets. Five pretrained DCNN architectures, as shown in Table 3, were used to build end-to-end pattern recognizers to identify defects with the above 2D datasets. For simplicity, we do not repeat the principles of these methods, which are available in the references. The results of defect identification are shown in Table 4. The optimizer used by the five DCNNs was stochastic gradient descent with momentum (SGDM), whose main parameters, i.e., momentum, batch size, initial learning rate, and training epochs, were set to 0.9, 4, 0.0001, and 20, respectively. The loss function was the cross-entropy function. In Table 4, the accuracy is the average test accuracy over five experiments. The time, used to indicate the computational cost of the different algorithms, is the training time of one epoch in an experiment. Unless otherwise stated, the same parameter settings are adopted in the following. The processor was an Intel Core i5-7300U with a main frequency of 2.60 GHz. The algorithm was implemented in Python and ran on a single GPU. It can be seen from Table 4 that GWT generally performed better in combination with all five pretrained DCNNs. The simple architectures of VGG16 and AlexNet obtained higher accuracy than the complex architectures of GoogLeNet, Inception V3, and ResNet50 because of the small dataset. Figures 10-11 show the confusion matrices, accuracy, and loss function curves obtained during an experiment for two pipelines of TFR and DCNN. They indicate that, besides the end-to-end pattern recognizer, the TFR also has a great impact on the identification accuracy. The defect identification pipelines composed of AlexNet and the four TFRs other than STFT achieve better accuracy. AlexNet cannot extract the features contained in the less clear TFR obtained by STFT, which leads to a larger deviation between the training accuracy and the validation accuracy, as shown in Figure 11(b). The TFR obtained by the GWT method not only has excellent ability in terms of transient component analysis but also is less affected by the reduction of image size, which both guarantees high recognition accuracy and helps to reduce the memory and computational cost of the DCNN.
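The training settings quoted above (SGDM with momentum 0.9, batch size 4, initial learning rate 0.0001, 20 epochs, cross-entropy loss) can be illustrated with a framework-free sketch. The update shown is the classical momentum form; deep learning libraries may use an algebraically different but related formulation, and the names `TRAINING_CONFIG`, `sgdm_step`, and `cross_entropy` are hypothetical.

```python
import numpy as np

# Hyperparameters as stated in the text
TRAINING_CONFIG = {
    "momentum": 0.9,
    "batch_size": 4,
    "initial_learning_rate": 1e-4,
    "epochs": 20,
}

def sgdm_step(weights, velocity, gradient, learning_rate, momentum):
    """One classical momentum update: v <- mu * v - lr * g; w <- w + v."""
    velocity = momentum * velocity - learning_rate * gradient
    return weights + velocity, velocity

def cross_entropy(logits, label):
    """Cross-entropy loss of one sample computed from raw class scores."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits)              # numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])
```

With three defect classes, an untrained classifier that outputs equal scores has a loss of ln 3 ≈ 1.10, which is a useful reference point when reading loss curves that start near 1.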

Network Optimization Methods.
As analysed above, simple architectures such as VGG16 and AlexNet are more suitable for applications with small datasets. However, VGG16 has a much higher memory and computational cost than AlexNet due to its deeper and wider architecture. Therefore, it is necessary to explore effective network optimization methods that reduce the memory and computational cost of VGG16 while ensuring recognition accuracy, so that it is more suitable for engineering applications. The proposed network optimization method has four steps: reducing the input size, adding BNs, simplifying FCs, and removing unimportant filters in Convs. The following sections verify the proposed method in detail.

4.2.1. Reducing the Size of 2D Input. VGG16 can accept input signals of any size by adopting an adaptive pooling layer. In this article, the input signal is reduced from 224 × 224 to 32 × 32. In this section, the impact of using a smaller input on memory is discussed, as shown in Table 5. It is worth noting that only the memory occupied by the intermediate variables, i.e., the feature maps from the Convs and pooling layers (shown as ParamI in Table 5), and by the model parameters (ParamII in Table 5) is considered in this article. The format of ParamI is height × width × number of filters, and the format of ParamII is height × width × number of input filters × number of output filters. The sizes of the feature maps from the Convs and pooling layers are calculated by the following formulas:

$$\mathrm{Con}_H = \left\lfloor \frac{\mathrm{Con}_{H0} + 2 \times \mathrm{padding} - f}{\mathrm{stride}} \right\rfloor + 1, \qquad \mathrm{Con}_W = \left\lfloor \frac{\mathrm{Con}_{W0} + 2 \times \mathrm{padding} - f}{\mathrm{stride}} \right\rfloor + 1,$$

$$\mathrm{Pool}_H = \left\lfloor \frac{\mathrm{Pool}_{H0} + 2 \times \mathrm{padding} - f}{\mathrm{stride}} \right\rfloor + 1, \qquad \mathrm{Pool}_W = \left\lfloor \frac{\mathrm{Pool}_{W0} + 2 \times \mathrm{padding} - f}{\mathrm{stride}} \right\rfloor + 1,$$

where Con and Pool represent the Convs and pooling layers, respectively, and $f$ is the kernel size. The subscripts H and W indicate the height and width of the feature map, respectively. Symbols with the subscript 0 represent the input feature map. The hyperparameters of the convolutional layer are as follows: $f$ is 3, the stride is 1, and the padding is 1. Max pooling is adopted, where $f$ is 2, the stride is 2, and the padding is 0. As can be seen from Table 5, using a small-size input saves significant memory, which is of great significance to the engineering application of DCNNs. The impact of using a smaller input on accuracy is discussed in Section 4.2.2.
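The effect of the input size on the intermediate feature maps can be reproduced with a short calculation. This sketch assumes the standard 13-Conv VGG16 layout with five pooling stages and the hyperparameters given above (3 × 3 convolution with stride 1 and padding 1; 2 × 2 max pooling with stride 2); it counts feature-map elements only, not bytes, and the helper names are hypothetical.

```python
# Output channels of each Conv; "M" marks a max-pooling layer
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

def conv_size(size, kernel=3, stride=1, padding=1):
    """Output height/width of a convolutional layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_size(size, kernel=2, stride=2, padding=0):
    """Output height/width of a pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def feature_map_shapes(input_size, layout=VGG16_LAYOUT):
    """(height, width, channels) after every Conv and pooling layer."""
    shapes, size, channels = [], input_size, 1   # 1 channel: grayscale input
    for item in layout:
        if item == "M":
            size = pool_size(size)
        else:
            size, channels = conv_size(size), item
        shapes.append((size, size, channels))
    return shapes

def total_elements(shapes):
    """Total number of feature-map values across all layers."""
    return sum(h * w * c for h, w, c in shapes)

large = feature_map_shapes(224)
small = feature_map_shapes(32)
print(large[-1], small[-1], total_elements(large) / total_elements(small))
```

Under these assumptions the final feature map shrinks from 7 × 7 × 512 (25088 values) to 1 × 1 × 512, and the total number of intermediate values drops by a factor of 49, i.e., by about 98%.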

Adding BNs and Simplifying FCs.
We discuss the effects of adding BNs and reducing FCs by comparing the performance of VGG16 and three VGG16-based variants, as shown in Table 6, where VGG16 is the baseline VGG16 architecture, and VGG16I, VGG16II, and VGG16III are three VGG16 variants whose input sizes are all 32 × 32. Their main differences are whether BNs are added and whether the FCs are simplified. The accuracies and computational costs of the four architectures are compared in Table 7.
e following conclusions can be obtained from Table 7. First, by comparing VGG16 I and VGG16, it can be seen that reducing input size significantly reduces the computational cost, about 90%. However, the accuracy also decreases significantly, especially for EEMD and SST, whose accuracies decreased by over 30%. is may be because the information entropies of TFRs obtained by these two methods will change greatly as the size decreases, as shown in Table 2.
e other three TFA methods are less affected by size changes, so their accuracy drops within 10%. Second, by comparing VGG16II and VGG16I, it can be seen that the adding BNs significantly improves the recognition accuracy, especially for the GWT method, whose accuracy reaches 96%. e computational cost increases slightly because the addition of the BN layers requires a small amount of computational cost.
Third, comparing VGG16III with VGG16II shows that simplifying the FCs does not worsen the accuracy and further reduces the computational time by 40%. Table 7 again verifies the effectiveness of the GWT method for PEC signal processing, especially for small input sizes. Figure 12 contrasts the training accuracies, training loss functions, and test accuracies obtained by the three improved VGG16 architectures, where curves 1, 2, and 3 represent VGG16I, VGG16II, and VGG16III, respectively.
It can be seen that, for VGG16I without BNs, i.e., curve 1 in Figure 12, neither the accuracy nor the loss function improves during training. However, for VGG16II and VGG16III with BNs, i.e., curves 2 and 3 in Figure 12, the training accuracies increase from 40% to 100%, the loss functions decrease from about 1 to nearly 0, and the testing accuracies reach about 97% after five epochs of training. Figure 12 shows intuitively that adding BNs is important for improving the accuracy and convergence speed of the DCNN and that simplifying the FCs does not degrade its performance.
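A rough parameter count illustrates why BN layers add little cost while simplifying the FCs saves a great deal. The sketch below rests on assumptions that are not taken from Table 6: the standard VGG16 channel widths, a 512-element flattened feature (1 × 1 × 512 for a 32 × 32 input), a hypothetical number of defect classes `NUM_CLASSES`, and a hypothetical simplified head with a single FC layer. The helper names `fc_params` and `bn_params` are likewise introduced only for illustration.

```python
# Output channels of the 13 Convs in the standard VGG16 layout (assumed).
VGG16_CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                       512, 512, 512, 512, 512, 512]

NUM_CLASSES = 4   # hypothetical number of defect classes
FLAT_SIZE = 512   # 1 x 1 x 512 feature map for a 32 x 32 input


def fc_params(layer_sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def bn_params(channels):
    """Learnable scale and shift: two parameters per normalized channel."""
    return sum(2 * c for c in channels)


# Classic three-layer VGG head versus an assumed single-layer head.
full_head = [FLAT_SIZE, 4096, 4096, NUM_CLASSES]
simple_head = [FLAT_SIZE, NUM_CLASSES]

if __name__ == "__main__":
    print("FC parameters, classic head:   ", fc_params(full_head))
    print("FC parameters, simplified head:", fc_params(simple_head))
    print("BN parameters after all Convs: ", bn_params(VGG16_CONV_CHANNELS))
```

Under these assumptions, the BN layers contribute only a few thousand learnable parameters, whereas the classic FC head holds millions. The actual simplified FC structure is the one listed in Table 6.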

Pruning Filters in Convs.
We discuss filter pruning in the Convs based on the VGG16III architecture. Filter pruning is carried out in five iterations, each consisting of pruning and retraining. The activation values of the feature maps generated by all of the filters in each Conv are calculated and ranked according to the Taylor criterion described by equations (5)-(8), and the 512 filters with the lowest contribution are removed. The training set shown in Table 1 is used to retrain the pruned architecture to maintain performance. Five iterations are performed, and a total of 2,560 filters are pruned. Table 8 shows the number of remaining filters in every Conv and the testing accuracy after retraining in each iteration. It is worth noting that the VGG16III architecture is pretrained using the PECT dataset, as shown in Table 1.
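The ranking step of such a pruning iteration can be sketched as follows. This is a minimal pure-Python illustration of the commonly used first-order Taylor criterion (the absolute value of the spatially averaged product of activation and gradient, averaged over the batch and L2-normalized per layer). It is assumed here to correspond to equations (5)-(8), which may differ in detail. The data layout and the function names `taylor_scores`, `l2_normalize`, and `select_filters_to_prune` are hypothetical.

```python
import math


def taylor_scores(activations, gradients):
    """Per-filter importance for one Conv layer.

    activations[n][c] and gradients[n][c] are flat lists holding the
    spatial values of filter c for sample n (assumed layout).
    """
    n_samples = len(activations)
    n_filters = len(activations[0])
    scores = []
    for c in range(n_filters):
        total = 0.0
        for n in range(n_samples):
            act, grad = activations[n][c], gradients[n][c]
            # Spatial mean of activation * gradient, then absolute value.
            total += abs(sum(a * g for a, g in zip(act, grad)) / len(act))
        scores.append(total / n_samples)
    return scores


def l2_normalize(scores):
    """Rescale one layer's scores so that layers can be compared."""
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores] if norm > 0 else list(scores)


def select_filters_to_prune(layer_scores, n_prune):
    """Return the (layer, filter index) pairs with the lowest scores.

    layer_scores maps a layer name to its list of normalized scores.
    """
    ranked = sorted((score, layer, idx)
                    for layer, scores in layer_scores.items()
                    for idx, score in enumerate(scores))
    return [(layer, idx) for _, layer, idx in ranked[:n_prune]]


if __name__ == "__main__":
    # Toy layer: one sample, two filters, two spatial positions each.
    acts = [[[1.0, 2.0], [3.0, 4.0]]]
    grads = [[[0.5, 0.5], [0.0, 0.0]]]
    conv1 = l2_normalize(taylor_scores(acts, grads))
    print(select_filters_to_prune({"conv1": conv1, "conv2": [0.6, 0.8]}, 2))
```

In the setting described above, `n_prune` would be 512 per iteration, with retraining on the training set between iterations. Five such rounds remove 2,560 of the 4,224 filters that a standard VGG16 contains.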
Third, the deeper the Conv is, the higher the pruning rate (i.e., the percentage of the initial filters that are removed) is, gradually increasing from 34% in the first Conv to 87.5% in the last. Finally, the learning rate (LR) plays an important role in improving network performance, as shown in Figure 13. When a small LR is used, e.g., LR = 0.00001, the adjustment is slight and a satisfactory training effect can be obtained. The testing accuracy increases and remains at about 99% after three epochs of training, as shown in Figure 13(a).
However, when the LR is larger, e.g., LR = 0.0001, the adjustment fluctuates sharply, and it is difficult to obtain an ideal training effect, as shown in Figure 13(b). In short, the optimized VGG16 structure obtained by the proposed network optimization method not only reduces the computational effort significantly but also improves the accuracy. Moreover, the VGG16 architecture is simple and straightforward, and the proposed network optimization method does not introduce additional calculations, so the proposed method has low complexity and is suitable for industrial applications.

Conclusion
We have proposed and verified an intelligent defect identification pipeline based on time series signals and an optimized VGG16, which can accurately identify the defect type without manual feature extraction. Compared with other TFA methods, the GWT method not only resolves the transient components contained in the PECT signal more clearly but also ensures that the image information is not lost when the TFR size is reduced to 2% of the original. It has also been verified by experiments that the proposed optimized VGG16 increases the accuracy by 7% and reduces the running time and memory by 98%. This provides an effective solution to the problems of large memory and high computational cost that restrict the use of DCNN in engineering applications. In future work, other networks besides DCNN, such as the fast recurrent neural network and the long short-term memory (LSTM) network, will be used in nondestructive testing. In addition, the influence of nonlinear interference, such as temperature and stress, on defect recognition will be further studied.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.