A Damage Detection Method Using Neural Network Optimized by Multiple Particle Collision Algorithm

A critical task of structural health monitoring is damage detection and localization. Lamb wave propagation methods have been successfully applied for damage identification in plate-like structures. However, Lamb wave processing is still a challenging task due to the multimodal and dispersive characteristics of these waves. To address this issue, data-driven machine learning approaches such as the artificial neural network (ANN) have been proposed. However, the effectiveness of an ANN depends on its architecture and on the learning strategy employed to train it. The present paper proposes a Multiple Particle Collision Algorithm (MPCA) to design an optimum ANN architecture to detect and locate damage in plate-like structures. For the first time in the literature, the MPCA is applied to find damage in plate-like structures. The present work uses one piezoelectric transducer to generate Lamb wave signals on an aluminum plate structure and a linear array of four transducers to capture the scattered signals. The continuous wavelet transform (CWT) processes the captured signals to estimate the time-of-flight (ToF) values, which are the ANN inputs. The ANN outputs are the spatial coordinates of the damage. In addition to MPCA optimization, this paper uses a quantitative entropy-based criterion to find the best mother wavelet and the scale values. The presented experimental results show that MPCA is capable of finding a simple ANN architecture with good generalization performance in the proposed damage localization application. The proposed method is compared with the 1-dimensional convolutional neural network (1D-CNN). A discussion about the advantages and limitations of the proposed method is presented.


Introduction
In safety-critical systems, failure detection and prognostic (FDP) approaches are essential to avoid catastrophic failures. The two main tasks of FDP methods are incipient failure detection and remaining useful life estimation through prognostic techniques [1]. The FDP approaches can be applied in different critical systems such as wind turbines [2], gas turbines [3], power systems [4,5], and transmission lines [6].
Structural health monitoring (SHM) investigates damage detection and prognostics in structural components of critical systems such as aircraft and bridges. The SHM system provides an online solution for performing in-service structural analysis, reducing maintenance costs [7]. The SHM approaches have been developed for mechanical, aerospace, and civil engineering applications [8]. The most important task of the SHM strategy is the damage identification procedure, which can be classified into four levels: detection, location, quantification, and prediction [9].
The vibration methods compare vibration signatures of damaged and undamaged structures. Usually, a vibration signature is obtained through modal analysis [8]. The electromechanical impedance methods are based on the electromechanical impedance signature [12]. The guided wave methods are based on the propagation properties of acoustic waves such as Lamb waves. Lamb waves are guided waves that propagate in thin-walled structures; they can travel long distances with low attenuation and are highly sensitive to small damage [18,19]. Therefore, Lamb waves have been a promising tool for damage detection and localization in plate-like structures.
The SHM methods can be divided into two main approaches: physical-based or data-driven [20]. The physical-based approaches require a detailed model of the structure, while the data-driven approaches are based on database analysis. Phased-array and directional filter [21][22][23], subspace approach [24][25][26], time reversal [27,28], and ellipse-based imaging [16,29,30] are methodologies based on the physical model of Lamb wave propagation. Modeling this propagation is a challenging task due to the multimodal and dispersive characteristics of Lamb waves [18,27].
The data-driven approaches are machine learning and deep learning [8,31,32]. Machine learning techniques are data-driven approaches that learn patterns found in the database and can be classified into supervised learning and unsupervised learning [27,33]. The most common supervised learning methods are the artificial neural network and support vector machine [33]. Some advantages of these techniques are low computational cost and high generalization capability. Deep learning is a subbranch of machine learning, and it has the capability to deal with a large dataset [20,33]. Recent studies have proposed the 1-dimensional convolutional neural network (1D-CNN) for structural damage detection and localization [8,10,13,20]. The main advantage of 1D-CNN is automatic feature extraction performed through its initial convolutional layers [10,13,34]. However, CNN has a high computational cost, and its architecture design is a difficult task [35].
Several strategies using ANN for damage localization and classification in plate-like structures have been proposed. Lu et al. [36] implemented an inverse method based on the feedforward artificial neural network for damage identification using PZT transducers. The ANN configuration (the number of hidden layers and the number of neurons in each layer) was determined using a rule of thumb. The discrete wavelet transform was applied to signal denoising before feature extraction. Yelve and Mulla [37] proposed damage detection in an aluminum plate using ANN and PZT transducers. The ANN inputs were damage indexes obtained from twelve actuator-sensor paths, and the specific software selected the ANN architecture automatically. Feng et al. [30] presented two methods for damage detection in an anisotropic woven-fabric carbon fiber reinforced polymer plate using PZT transducers. The first method was a probabilistic approach based on constructing a probability matrix, and the second method was an ANN. The ANN inputs were the time-of-flight of the wave acquired by PZT transducers, and a trial-and-error method selected the ANN configuration. Hesser et al. [38] proposed an ANN and a support vector machine to locate low impact in an aluminum plate. An ANN with one hidden layer was proposed, and the number of neurons was found through a trial-and-error method. However, these approaches do not use any method to find an optimized ANN topology. The best choice of neural network topology, weight values, and activation function for a particular application is a difficult task. ANN topology has a high impact on its performance. A small number of neurons may reduce the ANN learning capacity, but an excessive number of neurons may reduce its generalization capacity [39].
In this context, the metaheuristic approach formulates the ANN parameter identification task as an optimization problem. The most common method is ANN weight optimization, where the metaheuristic algorithm searches for the weight values that minimize a cost function. This approach focuses on the ANN training process, where the metaheuristic algorithm provides a way to escape from local minima, which are the main problem in standard gradient-based methods. Several metaheuristic algorithms have been proposed to improve the ANN training process, such as genetic algorithm [40,41], particle swarm optimization [42], and grasshopper optimization [43]. However, to improve the ANN performance, the cost function needs to balance the weight values, ANN topology, and learning parameters. The genetic algorithm is the most common metaheuristic method for ANN architecture optimization [44][45][46][47].
The Multiple Particle Collision Algorithm (MPCA) is a metaheuristic that minimizes a single-objective function, providing the simplest neural network topology (lowest number of neurons and faster convergence during the training phase) with the best performance (lowest error) [48]. The MPCA optimization algorithm takes into account all relevant ANN parameters, such as the number of hidden layers, the number of neurons in each hidden layer, the weight values, the learning rate, the momentum constant, and the activation function [49]. MPCA has been successfully applied to climate prediction applications [39]. The present paper proposes an optimized ANN for damage detection in a plate-like structure using the MPCA. For the first time in the literature, the ANN optimized by MPCA is applied to find damage in plate-like structures. The ANN inputs are the time-of-flight values of the Lamb wave reflection captured by four piezoelectric transducers, and the ANN output is the damage location.
In the damage localization methods using ANN, a common approach is to use four PZT transducers placed in the corners of the inspected plate region [30,36,37]. The damage position is estimated using the time-of-flight information. This configuration is similar to a pitch-catch method that finds damage in the area delimited by the transducers. In the present paper, a linear array of PZT transducers is investigated. The transducer placement is similar to a phased-array configuration, so the damage location information is strongly related to the time-of-flight and the time delay between adjacent transducers [18]. Using a linear array of transducers makes it possible to inspect a large area of the plate.
In the proposed damage localization method, due to the importance of the ToF estimation, the continuous wavelet transform is implemented to improve the arrival time measurement of the wave scattered from the damage. The mother wavelet is the main CWT parameter, as it may significantly influence the performance of the transformation. Recent works have proposed the wavelet transform for damage detection and localization [4,6,30]. However, these works do not investigate a quantitative approach for mother wavelet selection. The present work uses a quantitative criterion, based on the Shannon entropy calculation, to select the best mother wavelet function [50]. The Shannon entropy criterion also selects the most relevant scale values, reducing the CWT calculation. The proposed new method combines two optimization approaches: an ANN optimized by MPCA and a CWT optimized by the entropy criterion. The experimental verification of the proposed method is performed on an aluminum plate structure. A mass of lead is employed to simulate several damage scenarios, providing a database for training the ANN and 1D-CNN. The performance of the proposed method is compared with two 1D-CNNs. The first 1D-CNN uses as input the ToF features extracted using CWT, and the second 1D-CNN performs feature extraction directly from raw data acquired from PZT transducers. A discussion about the advantages and limitations of the proposed method is presented.

Background
2.1. Continuous Wavelet Transform. The continuous wavelet transform (CWT) is a linear transformation that decomposes the input signal x(t) over the scaled and translated versions of the mother wavelet ψ(t), as shown by the following equation [51]:
$$\mathrm{CWT}(s,\tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt \tag{1}$$
where s is the scale, τ is the translation parameter, and (*) indicates the complex conjugate of the mother wavelet function ψ(t). Using a proper scale-to-frequency relationship, CWT provides a time-frequency analysis similar to the short-time Fourier transform. However, CWT is more effective in representing nonstationary signals in the time-frequency domain [52,53]. A challenge in using the wavelet transform is to establish a criterion to select the mother wavelet. A usual qualitative approach is based on the similarity between the mother wavelet and the analyzed signal.
A suitable quantitative approach is Shannon entropy-based optimization. This criterion consists of calculating the normalized Shannon entropy of the CWT coefficients as defined in [50], where s is the scale vector with M elements, T_s is the sample time, and N is the number of samples of the x(t) signal. Shannon entropy measures the CWT randomness in the time-scale domain, where a low entropy value indicates high energy concentration. Therefore, the best mother wavelet has the lowest value of Shannon entropy (Sh).
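The selection criterion described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the direct convolution CWT, the two candidate wavelets, the synthetic burst, the scale grid, and the normalization of the entropy by log(M·N) are all assumptions made for the example; the exact entropy definition is the one in [50].

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat wavelet (unnormalised)."""
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)

def morlet_real(t, w0=5.0):
    """Real-valued Morlet wavelet (unnormalised)."""
    return np.cos(w0 * t) * np.exp(-(t**2) / 2.0)

def cwt(x, wavelet, scales, dt):
    """Direct CWT of a real signal: one correlation per scale, shape (M, N)."""
    x = np.asarray(x, dtype=float)
    coeffs = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        half = int(np.ceil(5.0 * s / dt))            # wavelet support of +-5 scale units
        u = np.arange(-half, half + 1) * dt / s
        kernel = wavelet(u)[::-1] / np.sqrt(s)       # reversed kernel -> correlation
        full = np.convolve(x, kernel, mode="full") * dt
        coeffs[i] = full[half:half + len(x)]         # keep the samples aligned with x
    return coeffs

def shannon_entropy(coeffs):
    """Entropy of the normalised coefficient energy, scaled to [0, 1] (assumed form)."""
    energy = np.abs(np.asarray(coeffs, dtype=float)) ** 2
    p = energy / energy.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(energy.size))

# Hypothetical test signal: a 16 kHz burst sampled at 400 kHz.
fs = 400e3
dt = 1.0 / fs
t = np.arange(800) * dt
x = np.exp(-((t - 1e-3) / 1e-4) ** 2) * np.sin(2 * np.pi * 16e3 * t)
scales = np.linspace(5e-6, 4e-5, 8)                  # assumed scale grid, in seconds

wavelets = {"mexh": mexican_hat, "morl": morlet_real}
entropy = {name: shannon_entropy(cwt(x, w, scales, dt)) for name, w in wavelets.items()}
best_wavelet = min(entropy, key=entropy.get)         # lowest entropy wins

# Per-scale entropy then picks the single scale with the highest energy concentration.
coeffs = cwt(x, wavelets[best_wavelet], scales, dt)
best_scale = scales[int(np.argmin([shannon_entropy(row) for row in coeffs]))]
```

Which wavelet wins depends on the signal; the sketch only shows the mechanics of ranking candidates by entropy and then keeping one scale.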

2.2. Hilbert Transform. The Hilbert transform F_Hi(t) of the real signal x(t) can be represented by the equation below [54]:
$$F_{Hi}(t) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{x(\tau)}{t-\tau}\, d\tau \tag{4}$$
The Fourier transform of the F_Hi(t) signal is expressed as
$$\hat{F}_{Hi}(\omega) = -i\,\mathrm{sgn}(\omega)\, X(\omega) \tag{5}$$
where sgn(ω) is the sign function, X(ω) is the Fourier transform of the input signal x(t), and i is the imaginary unit. According to Equation (5), the Hilbert transform is a filter in which the amplitudes of the spectral components of the input signal x(t) are left unchanged, but their phases are shifted by 90 degrees (see [54], page 359). This property can be used to calculate the instantaneous amplitude, or envelope, of signal x(t).
The envelope R(t) of signal x(t) is given by [52,54]
$$R(t) = \sqrt{x^{2}(t) + F_{Hi}^{2}(t)} \tag{6}$$
2.3. Artificial Neural Networks. Artificial neural networks are computational methods that emulate the human brain learning process. Simple processing units, known as artificial neurons, compose the ANN and are organized in several layers. Typically, the ANN has one input layer, one or more hidden layers, and one output layer. The synaptic weights interconnect the neurons and store the ANN knowledge [39,55]. The Multilayer Perceptron Neural Network (MLP-NN) is a feed-forward model used mainly for classification, associative memory, or regression. Figure 1 shows an MLP neural network with just one hidden layer, where Equations (7) and (8) describe the neuron $x_n$ of the hidden layer and the neuron $y_k$ of the output layer, respectively, with $b^{\alpha}_{\beta}$ (α = x, y and β = n, m) being the bias and $v_{nm}$ and $\omega_{kn}$ the weight matrices of the connections between neuron $x_n$ and neuron $z_m$ and between neuron $y_k$ and neuron $x_n$, respectively. The neuron activation function $g_{1,2}(\cdot)$ is a nonlinear and smooth function [55]. The frequently used activation functions are the sigmoid and the hyperbolic tangent.
ANN training is the process of adjusting the neuron weights and can be supervised or unsupervised. For an MLP-NN, the gradient-based and stochastic-based approaches are the main supervised training methods [43]. Among gradient-based methods, the most frequently used is back-propagation. The back-propagation algorithm calculates the new weights using two steps: forward and backward [39]. In the first step, the input data is applied to the MLP-NN input, starting the data processing. The MLP-NN forward calculations propagate the results layer by layer, producing the MLP-NN output. The network output $y_k$ is subtracted from the expected output $d_k$ to produce an error value [55]:
$$e_k = d_k - y_k \tag{9}$$
After that, this error is propagated backward from the MLP-NN output to the input. Finally, the neuron weights are adjusted using the delta rule to minimize the calculated error.
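The forward and backward steps described above can be illustrated with a small one-hidden-layer network. This is a sketch under stated assumptions rather than the network used in the paper: the layer sizes (4 inputs, 5 hidden neurons, 2 outputs), the tanh hidden activation, the linear output, the squared-error cost, and the learning rate are illustrative choices, and momentum is omitted. The names `init_mlp`, `forward`, and `train_step` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in=4, n_hidden=5, n_out=2):
    """Random weights v (input->hidden) and w (hidden->output), zero biases."""
    return {
        "v": rng.normal(0.0, 0.5, (n_hidden, n_in)), "bx": np.zeros(n_hidden),
        "w": rng.normal(0.0, 0.5, (n_out, n_hidden)), "by": np.zeros(n_out),
    }

def forward(p, z):
    """Forward step: input z -> hidden x (tanh) -> output y (linear)."""
    x = np.tanh(p["v"] @ z + p["bx"])
    y = p["w"] @ x + p["by"]
    return x, y

def train_step(p, z, d, eta=0.05):
    """One back-propagation update on a single sample; returns 0.5 * sum(e^2)."""
    x, y = forward(p, z)
    e = d - y                                   # output error
    delta_y = e                                 # linear output layer
    delta_x = (p["w"].T @ delta_y) * (1.0 - x**2)   # error propagated backward through tanh
    p["w"] += eta * np.outer(delta_y, x)        # weight corrections scaled by eta
    p["by"] += eta * delta_y
    p["v"] += eta * np.outer(delta_x, z)
    p["bx"] += eta * delta_x
    return 0.5 * float(e @ e)

# Hypothetical sample: four normalised ToF inputs and a normalised (x, y) target.
params = init_mlp()
z = np.array([0.2, 0.4, 0.6, 0.8])
d = np.array([0.5, -0.25])
losses = [train_step(params, z, d) for _ in range(200)]
```

Repeating the update on the same sample drives the error toward zero, which is the behaviour the delta rule is meant to produce; a real training loop would iterate over the whole training set and monitor a validation set.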

Journal of Sensors
The delta rule, as expressed in [55], computes the weight correction factors $\Delta v_{nm}$ and $\Delta \omega_{kn}$, where η is the learning rate parameter. The feed-forward neural networks are popular mainly because of their generalization capability. Generalization refers to the ability of the ANN to correctly process data that is not part of the training data. Therefore, the generalization performance can be estimated from P data elements that are not part of the K data elements used to train the ANN [49].

2.4. 1D Convolutional Neural Networks. Convolutional neural networks (CNN) were inspired by the mammalian nervous system. The usual applications of CNN are regression and classification problems [34]. Typically, a CNN is composed of three types of processing layers: convolutional layers, pooling layers, and fully connected neural network (FCN) layers. Figure 2 shows an example of 1D-CNN.
A convolutional layer extracts relevant features from the input data. Equation (13) describes the convolution operation between layers l and l − 1:
$$y^{l}_{k} = b^{l}_{k} + \sum_{i} \mathrm{Conv1D}\left(\omega^{l-1}_{ik},\, s^{l-1}_{i}\right) \tag{13}$$
where $y^{l}_{k}$ is the output, $b^{l}_{k}$ is the bias of the k-th neuron at layer l, $s^{l-1}_{i}$ is the output of the i-th neuron at layer l − 1, $\omega^{l-1}_{ik}$ is the kernel from the i-th neuron at layer l − 1 to the k-th neuron at layer l, and Conv1D(·) is the 1D discrete convolution without zero padding [34]. The pooling layer performs a downsampling operation, resulting in a dimensional reduction. Pooling could be max-pooling or average pooling. The fully connected layer corresponds to an MLP-NN.
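As a concrete illustration of the convolution without zero padding and of max-pooling, the following NumPy sketch computes one convolutional layer and one pooling step. The array layout (inputs as `(I, L)`, kernels as `(I, K, W)`) and the function names are assumptions chosen for the example. Note that `np.convolve` flips the kernel (true convolution), whereas many deep learning libraries implement cross-correlation under the same name.

```python
import numpy as np

def conv_layer(inputs, kernels, biases):
    """y_k = b_k + sum_i Conv1D(w_ik, s_i), with no zero padding ('valid' mode).

    inputs:  (I, L) outputs s_i of the previous layer
    kernels: (I, K, W) kernel from input neuron i to output neuron k
    biases:  (K,)
    returns: (K, L - W + 1)
    """
    inputs = np.asarray(inputs, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    n_in, n_out, width = kernels.shape
    out_len = inputs.shape[1] - width + 1
    out = np.tile(np.asarray(biases, dtype=float)[:, None], (1, out_len))
    for k in range(n_out):
        for i in range(n_in):
            out[k] += np.convolve(inputs[i], kernels[i, k], mode="valid")
    return out

def max_pool(signal, size):
    """Non-overlapping max-pooling; trailing samples that do not fill a window are dropped."""
    signal = np.asarray(signal, dtype=float)
    usable = (len(signal) // size) * size
    return signal[:usable].reshape(-1, size).max(axis=1)

# One input channel, one output channel, kernel width 3.
feature = conv_layer([[1, 2, 3, 4, 5]], [[[1, 0, -1]]], [0.5])   # -> [[2.5, 2.5, 2.5]]
pooled = max_pool([1, 3, 2, 5, 4], 2)                             # -> [3., 5.]
```

The output length L − W + 1 shows directly how the absence of padding shortens the signal at every convolutional layer.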

2.5. Optimal Neural Network Architecture by Multiple Particle Collision Algorithm. The identification of the best ANN configuration for a given application and dataset is a challenging task. One standard procedure is to run several experiments with different ANN configurations until acceptable results are obtained. The specialist changes the values of the ANN parameters for each trial and compares the results with the observed values until finding the set of parameters judged most suitable [49]. An alternative approach is to formulate the task of identifying the best ANN topology as an optimization problem [56].
The goal of an optimization problem is to find the set of hyperparameters that maximizes or minimizes a function, called the objective function or cost function. The objective function used in this work is given by [57]
$$f_{obj} = \mathrm{penalty} \times \frac{\rho_1 \times E_{train} + \rho_2 \times E_{gen}}{\rho_1 + \rho_2} \tag{14}$$
The first term of Equation (14) is the penalty factor, and the second term is the weighted mean of two errors: the training error $E_{train}$ and the generalization error $E_{gen}$. Equation (15) shows the penalty term, which evaluates the number of neurons and the number of epochs needed to perform the training [49]. The parameter values used in this work are $\rho_1 = 1$, $\rho_2 = 0.1$, $C_1 = 5 \times 10^{-8}$, and $C_2 = 5 \times 10^{-5}$.
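A short numerical sketch of Equation (14) follows. The penalty factor is passed in as a plain number because the form of Equation (15) is not reproduced here; the function name and the sample error values are hypothetical, while the defaults for ρ1 and ρ2 are the values quoted in the text.

```python
def objective(e_train, e_gen, penalty, rho1=1.0, rho2=0.1):
    """Penalty factor times the weighted mean of training and generalization errors."""
    return penalty * (rho1 * e_train + rho2 * e_gen) / (rho1 + rho2)

# With rho1 = 1 and rho2 = 0.1 the training error dominates the weighted mean.
f_simple = objective(e_train=0.2, e_gen=0.4, penalty=1.0)
f_complex = objective(e_train=0.2, e_gen=0.4, penalty=1.5)   # larger penalty, worse score
```

Two configurations with identical errors are therefore ranked by their penalty alone, which is how the search is steered toward smaller networks that train in fewer epochs.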
The MPCA optimization strategy must find the ANN configuration that minimizes the objective function (Equation (14)).
The ANN parameters taken into account by MPCA are the number of hidden layers, the number of neurons in each hidden layer, the weight values, the learning rate η, the momentum constant α, and the activation function [49].
The MPCA is an extension of the particle collision algorithm (PCA). The PCA is a stochastic optimization algorithm inspired by the travel of neutrons in a nuclear reactor, where two main phenomena can occur: scattering and absorption. The MPCA modifies the PCA by using several particles to explore the search space [48]. Algorithm 1 shows the MPCA pseudocode.
The MPCA starts with a random ANN configuration (Old-Config). The Fitness() function evaluates Equation (14), which measures the performance of the ANN configuration, and updates the Best-Fitness information. For each particle, a stochastic perturbation (function Perturbation()) generates a new ANN configuration (New-Config) close to the previous one, and the Fitness() function calculates its fitness. If New-Config is better than Old-Config, the new configuration is absorbed (New-Config becomes Old-Config for the next iterations). The function Exploration() then generates small perturbations at nearby positions. However, if New-Config is worse than Old-Config, the function Scattering() sends the particle to a different location of the search space in an attempt to escape the local minimum [56]. The blackboard
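The perturbation, absorption, exploration, and scattering steps described above can be sketched as follows. This is a simplified illustration and not Algorithm 1: the configuration is a plain list of real numbers instead of an ANN description, scattering is always a uniform random restart, the perturbation widths are arbitrary, the particles run sequentially, and the best solution is simply tracked in shared variables. The function name `mpca_sketch` and the toy fitness function are hypothetical.

```python
import random

def mpca_sketch(fitness, bounds, n_particles=10, n_iter=30, n_explore=5, seed=0):
    """Minimise `fitness` over a box; returns (best_config, best_fitness)."""
    rng = random.Random(seed)

    def random_config():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def perturb(cfg, width):
        # Stochastic perturbation, clipped to the search bounds.
        return [min(hi, max(lo, c + rng.uniform(-width, width) * (hi - lo)))
                for c, (lo, hi) in zip(cfg, bounds)]

    particles = [random_config() for _ in range(n_particles)]
    best = min(particles, key=fitness)
    best_fit = fitness(best)

    for _ in range(n_iter):
        for i, old in enumerate(particles):
            new = perturb(old, 0.2)
            if fitness(new) < fitness(old):
                # Absorption, followed by exploration with small perturbations.
                for _ in range(n_explore):
                    cand = perturb(new, 0.02)
                    if fitness(cand) < fitness(new):
                        new = cand
                particles[i] = new
            else:
                # Scattering: send the particle to a different region.
                particles[i] = random_config()
            f = fitness(particles[i])
            if f < best_fit:
                best, best_fit = list(particles[i]), f
    return best, best_fit

# Toy stand-in for the ANN objective: a 2D sphere function with minimum at the origin.
def sphere(cfg):
    return sum(c * c for c in cfg)

best_cfg, best_val = mpca_sketch(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In the setting of this paper, each fitness evaluation would involve training and testing an ANN, so the number of evaluations rather than the bookkeeping above dominates the cost.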

Experimental Setup and Results
The validation scenario of the proposed damage detection method is an aluminum plate (500 mm × 500 mm × 1 mm) with five PZT transducers bonded on its surface, as represented by Figure 3. These transducers, which have a 20 mm diameter, are arranged with a spacing of 5 mm and labeled as R0, R1, R2, R3, and T0. Figure 3 shows twelve square sectors (50 mm sides) that delimit the damage scan region. The Lamb wave excitation signal is produced by transducer T0 and received by the other four transducers (R0, R1, R2, and R3). Figure 4 shows the experimental setup. The DE1-SoC (System-on-Chip) development board provides a hardware design platform that combines a dual-core Cortex-A9 embedded processor with an FPGA (field-programmable gate array). This board is programmed to generate a five-cycle sine tone burst with a Gaussian window through a look-up table implemented in programmable logic and a 10-bit DAC. The DAC signal is amplified by the power circuit; the amplified signal then generates the Lamb waves through the PZT transducer T0. The tone burst is generated at three different frequencies: 12 kHz, 16 kHz, and 20 kHz. The Lamb waves propagate through the aluminum plate and are captured by the PZT transducers R0, R1, R2, and R3. A 12-bit ADC converts the analog voltages from the PZT transducers into digital signals. The sampling rate is set to 400 kHz.
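The excitation described above can be approximated in software as a look-up table of DAC codes. The sketch below is illustrative only: the DAC update rate is not stated in the text and is assumed here to equal the 400 kHz sampling rate, the Gaussian window width (one sixth of the burst duration) is an arbitrary choice, and the function name `tone_burst` is hypothetical.

```python
import numpy as np

def tone_burst(freq_hz, n_cycles=5, fs_hz=400e3, dac_bits=10):
    """Gaussian-windowed sine burst and its unsigned DAC look-up table."""
    n = int(round(n_cycles * fs_hz / freq_hz))       # samples covering n_cycles periods
    t = np.arange(n) / fs_hz
    centre = t[-1] / 2.0
    sigma = t[-1] / 6.0                               # assumed window width
    window = np.exp(-0.5 * ((t - centre) / sigma) ** 2)
    burst = window * np.sin(2.0 * np.pi * freq_hz * t)
    full_scale = 2**dac_bits - 1                      # 1023 for a 10-bit DAC
    codes = np.round((burst + 1.0) / 2.0 * full_scale).astype(int)
    return t, burst, codes

# One table per excitation frequency used in the experiment.
tables = {f: tone_burst(f)[2] for f in (12e3, 16e3, 20e3)}
```

At the assumed update rate a 20 kHz burst needs 100 table entries and a 16 kHz burst needs 125, so the tables are small enough to hold in programmable logic.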
Two signals received by sensor R0 are plotted in Figure 5(a). The first signal (black color) corresponds to the undamaged condition, and the second signal (red color) represents the signal received when damage occurs on sector S5. The excitation arrow indicates the direct wave received from the T0 transducer, and the damage arrow indicates the wave reflected by the damage, as shown in Figure 5(a). Figure 5(b) shows the difference signal, corresponding to the damage signal subtracted from the undamaged signal.
In the next step, the continuous wavelet transform of the difference signal is performed using six different mother wavelets. Figure 6 presents the normalized Shannon entropy of the CWT coefficients. The selected mother wavelet is the Mexican hat (mexh), which has the minimum entropy.
The resultant scalogram is plotted in Figure 7(b). The CWT minimum entropy signal, shown in Figure 7(c), is constructed from the scalogram by taking the CWT coefficients at the scale with the minimum entropy value. The Hilbert transform of the CWT minimum entropy signal makes it possible to obtain the peak position, as presented in Figure 7(c). Table 1 shows the ANN inputs. The peak positions of the minimum entropy CWT coefficients obtained for each sensor R0, R1, R2, and R3 are the ANN inputs. The ANN output is the damage coordinate (x, y).
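The last step of this feature extraction, locating the envelope peak, can be sketched with an FFT-based Hilbert transform. The code is a minimal illustration under assumptions: the input stands in for the CWT coefficients at the minimum entropy scale and is replaced here by a synthetic 16 kHz echo, the time origin is taken as the first sample, and the function names are hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x(t) + i*H[x](t), built by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def envelope(x):
    """Instantaneous amplitude: sqrt(x^2 + H[x]^2)."""
    return np.abs(analytic_signal(x))

def time_of_flight(x, fs_hz):
    """Time of the envelope peak, in seconds from the first sample."""
    return int(np.argmax(envelope(x))) / fs_hz

# Synthetic stand-in for a reflected wave packet arriving at 1.2 ms.
fs = 400e3
t = np.arange(1000) / fs
echo = np.exp(-0.5 * ((t - 1.2e-3) / 1e-4) ** 2) * np.sin(2 * np.pi * 16e3 * t)
tof = time_of_flight(echo, fs)

# One such value per receiver (R0 to R3) would form the four ANN inputs.
```

Picking the peak of the envelope rather than of the oscillating signal avoids jumping between adjacent carrier cycles, which would otherwise shift the estimate by a fraction of the carrier period.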
A mass of lead is employed to simulate damage in the aluminum plate. The training and validation databases are generated by placing the mass of lead in the center of each sector defined in Figure 3. A total of 3600 samples, 300 samples per sector, is used to train the networks. The samples are randomly divided into three sets: 2520 samples are selected for training, 540 samples for validation, and 540 samples for generalization. The training set is used to update the ANN weights and biases. The validation set evaluates the ANN generalization performance during the training phase (cross-validation process). A poor generalization performance is signaled by a rise in the validation error [56]. The generalization set is used to evaluate the ANN generalization performance after the training process. Table 2 presents the ANN parameters found by the MPCA optimization algorithm. The parameters used in the MPCA are 10 particles on a multiprocessing machine (one particle per processor) and 30 iterations (a scheme used to explore a better solution around the new particle location), and the stopping criterion is a maximum of 300 objective function evaluations.
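The random division into the three sets can be reproduced with a single permutation of the sample indices. This is a sketch of one way to do it; the seed and the function name are assumptions, and the text does not say whether the split was stratified by sector.

```python
import numpy as np

def split_indices(n_samples=3600, n_train=2520, n_val=540, seed=0):
    """Shuffle sample indices once and cut them into train / validation / generalization."""
    order = np.random.default_rng(seed).permutation(n_samples)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    gen = order[n_train + n_val:]
    return train, val, gen

train_idx, val_idx, gen_idx = split_indices()
```

Because the three index arrays come from one permutation, no sample can appear in more than one set, which is what keeps the generalization error an honest estimate.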
The MPCA-ANN results are compared with two 1D-CNNs. The 1D-CNN-CWT uses the same dataset applied to MPCA-ANN. This dataset corresponds to the features extracted using the CWT as defined in Table 1. The 1D-CNN-RAW uses the raw signals acquired by the PZT sensors. The mean square ... Table 3 shows the 1D-CNN-CWT architecture.
For 1D-CNN-RAW, the training parameters are a batch size of 32, a learning rate of 1e-4, and 200 epochs. Table 4 shows the 1D-CNN-RAW architecture. Table 5 shows the dataset division by sector. The damage is simulated by placing the mass in the center of the sector indicated by the first column of Table 5. The second column of Table 5 presents the damage location in Cartesian coordinates whose origin is the corner of sector S12. The last three columns of Table 5 present the root-mean-square error (RMSE) of the damage position estimated by MPCA-ANN, 1D-CNN-CWT, and 1D-CNN-RAW, respectively. Equation (16) calculates the RMSE of the damage positions, where $(x_{ann}(n), y_{ann}(n))$ is the damage position estimated by the ANN and $(x(n), y(n))$ is the real damage position. Figure 8 shows the MPCA-ANN performance. The real damage position (the mass position on the aluminum plate) is represented in blue, and the damage position estimated by the ANN is represented in red.
In order to evaluate the MPCA-ANN generalization capability, experiments were performed with the mass placed at the vertices of sectors S6 and S7. The second column of Table 6 presents each vertex location in Cartesian coordinates whose origin is the corner of sector S12. For each vertex, 200 experiments were performed. The last three columns of Table 6 present the RMSE of the damage position estimated by MPCA-ANN, 1D-CNN-CWT, and 1D-CNN-RAW, respectively. Figure 9 shows the damage position estimated by MPCA-ANN for each case of Table 6. Each plot of Figure 9 represents the physical aluminum plate; the red points show the estimated damage position, and the blue cross shows the real damage position.

Discussion
The minimum Shannon entropy selected the Mexican hat as the best choice for the mother wavelet as shown by Figure 6. The CWT performance can be visualized by Figures 7(b) and 7(c). The results presented by these figures indicate that the CWT can be evaluated only on the scale with minimum entropy value, where more energy concentration exists. The comparison between Figures 7(a) and 7(b) shows that the CWT using the Mexican hat as the mother wavelet, calculated in the scale with minimum entropy, can estimate the peak position of the Lamb wave reflection signal.
The MPCA algorithm found the ANN architecture shown in Table 2. The ANN training performance result, presented in Figure 8, indicates that the ANN can find the correct damage position. Table 5 shows the training results for the MPCA-ANN, 1D-CNN-CWT, and 1D-CNN-RAW. The MPCA-ANN presented a good performance in comparison with the two versions of 1D-CNN. The results of Table 6 show the good generalization capability of MPCA-ANN when compared with the 1D-CNN-CWT and 1D-CNN-RAW.
A metric termed mean normalized distance (MND) [13] is used to compare the global result of the generalization performance. The distance values are normalized by the length of the inspected plate region. Equation (17) describes the MND metric:
$$\mathrm{MND} = \frac{1}{N} \sum_{n=1}^{N} \sqrt{\left(x^{n}_{ann}(n) - x^{n}(n)\right)^{2} + \left(y^{n}_{ann}(n) - y^{n}(n)\right)^{2}} \tag{17}$$
where $(x^{n}_{ann}, y^{n}_{ann})$ is the normalized damage position found by the ANN, $(x^{n}, y^{n})$ is the real normalized position of the damage, and N is the number of samples. The normalized coordinate $x^{n}$ corresponds to the division of x by 200 mm, and the normalized coordinate $y^{n}$ corresponds to the division of y by 150 mm, the side lengths of the inspected area. Table 7 shows MND values for the three algorithms, with errors of order $O(10^{-1})$, and the best performance was obtained by the 1D-CNN-RAW.
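The MND computation is small enough to check by hand. The sketch below assumes positions are given in millimetres as (x, y) pairs and uses the 200 mm and 150 mm normalization lengths quoted above; the function name and the sample points are hypothetical.

```python
import numpy as np

def mean_normalized_distance(estimated, actual, length_x=200.0, length_y=150.0):
    """Mean Euclidean distance between normalised estimated and real positions."""
    scale = np.array([length_x, length_y])
    est = np.asarray(estimated, dtype=float) / scale
    act = np.asarray(actual, dtype=float) / scale
    return float(np.mean(np.linalg.norm(est - act, axis=1)))

# Two hypothetical samples: one off by 20 mm in x, one off by 15 mm in y.
mnd = mean_normalized_distance(
    estimated=[[100.0, 75.0], [20.0, 15.0]],
    actual=[[80.0, 75.0], [20.0, 0.0]],
)
```

Both samples are off by one tenth of the corresponding side length, so the metric evaluates to 0.1, which shows that the normalization makes errors along the two axes comparable.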
The MND results presented in Table 7 show that the 1D-CNN-RAW, with features extracted directly from raw data, presents the same generalization performance as the 1D-CNN-CWT. This result shows that the ToF features extracted using the CWT optimized by the entropy calculation contain good damage localization information. Normally, the CWT performs several convolution operations using different scale values, which gives the CWT a high computational cost. However, the entropy method limits the CWT to the optimum scale value, reducing the computational cost of the CWT operation. Therefore, the 1D-CNN-CWT has the same performance as the 1D-CNN-RAW but with a lower computational cost. The 1D-CNN-RAW uses two convolutional layers, while the 1D-CNN-CWT uses just one convolutional layer. Table 8 shows an estimated computational cost comparison between MPCA-ANN, 1D-CNN-CWT, and 1D-CNN-RAW. The computational cost is indirectly estimated by the number of convolutions, the number of neurons, and the number of weights.
Both 1D-CNN methods presented a better generalization performance in comparison with MPCA-ANN. This result confirms the strong capability of the 1D-CNN to extract relevant features from input data. However, the 1D-CNN structure has a high computational cost. The 1D-CNN-CWT architecture, presented in Table 3, has two fully connected layers with 60 neurons, while the 1D-CNN-RAW architecture, presented in Table 4, has one fully connected layer with 60 neurons. The generalization performance of MPCA-ANN, with only one hidden layer and five neurons, was close to that of the complex 1D-CNN structures. An estimated computational cost is presented in Table 8. The lowest computational effort is associated with the MPCA-ANN, because its design is based on searching for the simplest neural architecture. Such low neural network complexity makes the method suitable for embedded applications with limited computational capacity, or for implementation in a hardware processing device, while maintaining a good generalization capability.

Conclusion
In this paper, an optimized artificial neural network is proposed to locate damage in plate-like structures. The Multiple Particle Collision Algorithm finds the most adequate ANN architecture for the considered dataset. The experimental results of damage detection and localization in an aluminum plate validate the effectiveness of MPCA in finding a simple and optimized ANN architecture with good generalization capability. The generalization performance of MPCA-ANN was compared with two 1D-CNNs. The results show that MPCA-ANN has a good performance in comparison with complex 1D-CNN structures. The continuous wavelet transform processes the Lamb wave reflections and improves the time-of-flight estimation. The Shannon entropy-based criterion finds the best mother wavelet and scale values for detecting the Lamb wave reflections. The experimental results validate the effectiveness of the proposed optimized CWT. In future works, this method will be extended to more complex plate-like structures such as composite structures.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.