Mathematical Problems in Engineering (MPE)

ISSN 1563-5147, 1024-123X

Hindawi Publishing Corporation

Research Article, Article ID 406521, DOI 10.1155/2012/406521

Application of Kernel Density Estimation in Lamb Wave-Based Damage Detection

Long,¹ Su Zhongqing,² Marzani Alessandro

¹ School of Mechanics and Civil & Architecture, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China (npu.edu)

² Department of Mechanical Engineering, The Hong Kong Polytechnic University, Hong Kong (polyu.edu.hk)

Dates: 13 04 2012; 15 06 2012; 20 06 2012; 8 8 2012

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The present work concerns the estimation of the probability density function (p.d.f.) of measured data in Lamb wave-based damage detection. Although a number of studies have focused on consensus algorithms that combine the results of individual sensors, the p.d.f. of the measured data, which is the fundamental part of the probability-based method, has so far been chosen empirically. An analysis of the noise-induced errors in measured data shows that the type of distribution depends on the noise level. Under weak noise, the p.d.f. of the measured data can be treated as a normal distribution, and the empirical methods give satisfactory estimates. Under strong noise, however, the p.d.f. is complex and does not belong to any common family of distributions, so nonparametric methods are needed. Kernel density estimation, the most popular nonparametric method, is therefore introduced. To demonstrate its performance, a numerical model was built to generate Lamb wave signals, and three levels of white Gaussian noise were intentionally added to the simulated signals. The estimation results show that the nonparametric methods outperform the empirical methods in terms of accuracy.

1. Introduction

Structural health monitoring (SHM) is an emerging technology that merges a variety of techniques related to diagnostics and prognostics. Monitoring structural health can improve the safety and maintainability of critical structures in many fields, such as civil engineering, aerospace, and the military industry. An ideal SHM system includes several subsystems, of which the damage detection methodology is the key part. Numerous damage-detection methods have therefore been studied over the years [1]. Compared with methods based on mode shapes [2] or structural dynamic responses [3], the method based on Lamb waves has the clear advantage of high sensitivity to structural damage. It has been verified that Lamb wave-based methods can detect cracks, delamination, surface corrosion, penetrating holes, weld defects, and many other kinds of damage in plate and shell structures [4–6]. Consequently, the Lamb wave is widely acknowledged as one of the most promising tools for SHM, and the relevant research has been conducted intensively since the 1980s [7].

The portion of the SHM process that has received the least attention in the technical literature is the development of statistical models for discriminating between features from undamaged and damaged structures. Algorithms that analyze the statistical distributions of measured or derived features to enhance the damage identification process have been developed [8, 9], and probability-based diagnostic methods have been introduced into Lamb wave-based damage detection in recent years [10, 11]. However, the statistical models used in the existing Lamb wave-based methods are relatively simple. Although a number of publications have focused on consensus algorithms that combine the results of individual sensors, the p.d.f. of the measured data was determined empirically. Because the p.d.f. is a key part of the statistical model, its accuracy has a significant effect on the precision of the damage-detection result. Compared with estimates from an empirical formula, estimates from statistical methods can be expected to be more accurate and reliable. Hence, it is necessary to study statistical methods for estimating the p.d.f. in Lamb wave-based damage detection.

An elementary parametric estimation method has been adopted under the assumption that the p.d.f. of the measured data is a normal distribution [12]. This assumption, however, limits the applicability of the method. If the assumption is correct, the results of the parametric method can be more accurate than those given by an empirical formula; if it is incorrect, the parametric method can be very misleading.

Since the type of p.d.f. of data measured in field experiments varies and can hardly be predicted, more robust methods should be considered. Nonparametric statistical methods estimate the distribution without relying on the assumption that the data are drawn from a given probability distribution. Introducing nonparametric statistical methods is therefore crucial in Lamb wave-based damage detection.

The aim of this paper is to demonstrate the necessity and feasibility of applying kernel density estimation, the most popular nonparametric estimation method, in Lamb wave-based damage detection. Two kinds of kernel density estimation methods are briefly introduced: one based on the Gaussian approximation and one based on the smoothing properties of linear diffusion processes. Lamb wave signals with different levels of white Gaussian noise were obtained by numerical simulation, and the framework for applying nonparametric estimation in Lamb wave-based damage detection is demonstrated with these simulated signals. The characteristics of the noise-induced error in the arrival time of damage-scattered Lamb waves, which is the index used to locate damage, are analyzed. Based on this analysis, the outcomes of the two kernel density estimation methods and of the parametric estimation method are compared. The results show that the nonparametric methods outperform the parametric method in terms of accuracy and reliability.

2. Lamb Wave-Based Damage Detection

2.1. Background

Lamb waves are elastic waves that propagate in thin plate and shell structures. They are highly susceptible to interference on the propagation path, for example, damage or a boundary, and they can travel over a long distance even in materials with a high attenuation ratio, so a broad area can be quickly examined [13].

Lamb waves are made up of a superposition of longitudinal and shear modes, and their propagation characteristics vary with entry angle, excitation, and structural geometry. A Lamb mode can be either symmetric or antisymmetric, formulated by

$$\frac{\tan(qh)}{\tan(ph)} = -\frac{4k^{2}qp}{(k^{2}-q^{2})^{2}} \quad \text{for symmetric modes}, \tag{2.1}$$

$$\frac{\tan(qh)}{\tan(ph)} = -\frac{(k^{2}-q^{2})^{2}}{4k^{2}qp} \quad \text{for antisymmetric modes}, \tag{2.2}$$

where $p^{2}=\omega^{2}/c_{L}^{2}-k^{2}$, $q^{2}=\omega^{2}/c_{T}^{2}-k^{2}$, $k=\omega/c_{p}$, and $h$, $k$, $c_{L}$, $c_{T}$, $c_{p}$, and $\omega$ are the plate thickness, wavenumber, velocities of the longitudinal and transverse modes, phase velocity, and wave circular frequency, respectively. Equations (2.1) and (2.2), which relate the propagation velocity to the frequency, imply that Lamb waves, regardless of mode, are dispersive (velocity depends on frequency).

Lamb waves can be actively excited by a variety of means, such as an ultrasonic probe, a laser, an interdigital transducer, or a piezoelectric element. The piezoelectric element can also be used as a sensor to collect Lamb wave signals. It is particularly suitable for integration into a host structure as an in situ generator/sensor because of its negligible mass and volume, easy integration, excellent mechanical strength, wide frequency response, low power consumption and acoustic impedance, and low cost. Applications of piezoelectric elements in Lamb wave-based damage detection are numerous.

Lamb mode selection is an important part of damage detection. The fundamental symmetric mode, S0, and the fundamental antisymmetric mode, A0, are normally used in practice. Although S0 is preferred in many studies [14], the use of A0 is increasing because A0 is highly effective for detecting delamination and transverse ply cracks [15]. To implement Lamb mode selection, a multielement transducer setup was proposed [16] to generate predominantly S0 or A0.

The algorithms for Lamb wave-based damage identification can be roughly divided into two categories. The first category comprises algorithms that identify and locate damage by observing the damage-reflected Lamb waves, such as the Time-of-Flight (ToF) method [17–19], embedded ultrasonic structural radar [20], and the time of difference method [21]. The second category comprises algorithms that analyze the changes in the characteristics of Lamb waves caused by damage in their propagation path, such as the tomography method [22] and the virtual-sensing paths method [23].

For the algorithms that focus on damage-reflected waves, the arrival time of the Lamb waves is the key index used to locate damage. Since a Lamb wave signal takes the form of a wave packet, several methods have been developed to measure its arrival time, such as the threshold method, the correlation method, the wavelet method [24], and a cross-correlation analysis method based on a wavelet transform [25, 26]. Among those methods, the threshold method, which is adopted in this paper, has the advantage of simplicity. In the threshold method, a threshold value $V_t$ is first set on the basis of experience. Every peak whose amplitude exceeds $V_t$ is recorded, so, depending on the magnitude of $V_t$, one or more peaks can be recorded for a wave packet. If only one peak is recorded, the arrival time is the time of that peak. If more than one peak is recorded, the arrival time is the average of all recorded times. Usually, the threshold is selected so that several peaks belonging to one wave packet are recorded. The benefit of recording several peaks instead of only the strongest one is that the averaging itself reduces noise to some extent.
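The threshold procedure described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `arrival_time_threshold`, the use of the rectified signal, and the local-peak rule are assumptions, and the input is assumed to contain a single wave packet.

```python
import numpy as np


def arrival_time_threshold(t, signal, v_t):
    """Average time of the local peaks of |signal| that exceed the threshold v_t.

    Assumptions (not stated in the text): peaks are taken on the rectified
    signal, and the record contains only one wave packet.
    """
    t = np.asarray(t, dtype=float)
    a = np.abs(np.asarray(signal, dtype=float))
    # A sample is a local peak if it is larger than its left neighbor
    # and at least as large as its right neighbor.
    is_peak = np.zeros(a.shape, dtype=bool)
    is_peak[1:-1] = (a[1:-1] > a[:-2]) & (a[1:-1] >= a[2:])
    recorded = is_peak & (a > v_t)
    if not recorded.any():
        raise ValueError("no peak exceeds the threshold")
    # One recorded peak: its time. Several: the average of their times.
    return float(t[recorded].mean())
```

Lowering `v_t` records more peaks of the packet, so the returned value becomes an average over more samples, which is the noise-reducing effect mentioned in the text.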

2.2. Time of Flight Method

ToF, defined as the time lag between the moment when a sensor captures the incident signal and the moment when the same sensor captures the damage-reflected signal, is widely used to locate damage [17–19].

Consider a sensor network consisting of $N$ piezoelectric wafers denoted by $s_i$ ($i=1,2,\dots,N$). For convenience of discussion, $s_i$-$s_j$ hereinafter stands for the sensing path in which $s_i$ serves as the actuator and $s_j$ as the sensor. The center of the damage, if any, is presumed to be at $(x,y)$ in the coordinate system. Then the ToF of this path, denoted $T_{i\text{-}j}$, is defined as

$$\frac{L_{A\text{-}D}}{V_{S_0}} + \frac{L_{D\text{-}S}}{V_{SH_0\text{-damage}}} - \frac{L_{A\text{-}S}}{V_{S_0}} = T_{i\text{-}j}, \tag{2.3}$$

in which $L_{A\text{-}D}$, $L_{D\text{-}S}$, and $L_{A\text{-}S}$ represent the distance from the actuator $s_i$ to the damage, from the damage to the sensor $s_j$, and from the actuator $s_i$ to the sensor $s_j$, respectively. $V_{SH_0\text{-damage}}$ and $V_{S_0}$ are the velocities of the damage-converted SH₀ mode and the incident S0 mode, respectively.

Because (2.3) contains two unknown damage parameters, $(x,y)$, its solution is a locus that represents the possible locations of the damage for a given ToF value. In traditional approaches, the damage location is found by seeking the intersections of two or more loci. As shown in Figure 1(a), with three sensor pairs there are three loci, each associated with a time delay caused by the damage. The point at which all three loci intersect is taken as the location of the damage, while the points at which only two loci intersect are regarded as pseudodamage locations.
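As a rough numerical illustration of how a locus follows from (2.3), the sketch below evaluates the ToF for a damage assumed at every node of a grid and marks the nodes whose predicted ToF matches a measured value within a tolerance. The function names, the sensor coordinates, the SH₀ velocity, the measured ToF, and the tolerance are all hypothetical values chosen for illustration; only the S0 velocity and the plate size are taken from the simulation described later in the paper.

```python
import numpy as np


def tof_field(xg, yg, actuator, sensor, v_s0, v_sh0):
    """ToF given by (2.3) for a damage centered at each point (xg, yg)."""
    l_ad = np.hypot(xg - actuator[0], yg - actuator[1])  # actuator -> damage
    l_ds = np.hypot(xg - sensor[0], yg - sensor[1])      # damage -> sensor
    l_as = np.hypot(sensor[0] - actuator[0], sensor[1] - actuator[1])
    return l_ad / v_s0 + l_ds / v_sh0 - l_as / v_s0


def locus_mask(xg, yg, actuator, sensor, v_s0, v_sh0, t_meas, tol):
    """Boolean grid marking nodes whose ToF lies within tol of t_meas."""
    field = tof_field(xg, yg, actuator, sensor, v_s0, v_sh0)
    return np.abs(field - t_meas) <= tol


# Hypothetical example: 0.6 m x 0.6 m plate meshed every 2 mm.
xs = np.linspace(0.0, 0.6, 301)
xg, yg = np.meshgrid(xs, xs)
mask = locus_mask(xg, yg, actuator=(0.1, 0.3), sensor=(0.5, 0.3),
                  v_s0=5159.5, v_sh0=3100.0,  # v_sh0 is an assumed value
                  t_meas=6.0e-5, tol=5.0e-7)
```

Intersecting the masks of several sensing paths reproduces the traditional approach; with noisy ToF values the masks generally stop sharing a common node, which is the situation sketched in Figure 1(b).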

Figure 1

Damage localization using ToF method in a plate. (a) Locus based on accurate ToF value. (b) Locus based on ToF with error.

The traditional approach has a prerequisite: all of the measured ToF values $T_m$ must be accurate. However, errors from sources such as noise are inevitable in any experimental result. Therefore, as shown in Figure 1(b), there is no point at which all three loci intersect if the loci are drawn from the noise-contaminated $T_m$ instead of the theoretical value $T$. This suggests that the damage location can be taken as the area where the density of pairwise intersections of loci is relatively large, which leads to the probability-based approach that identifies the damaged area from the density of intersections.

2.3. Probability-Based ToF Method

The concept of the probability-based approach was introduced by Zhao et al. [27] to improve the performance of Lamb wave-based methods and was then adopted by Su et al. [28] in the ToF method. In the traditional ToF method, only the points on the loci are considered possible damage locations; all other points, regardless of their distance to the loci, are excluded. In fact, because of the errors in $T_m$, the real damage may not lie on the loci drawn from $T_m$. Therefore, in the probability-based approach, points off the loci are also considered possible damage locations, and the probability of damage at such a point is determined by its distance to the loci. The mesh nodes located exactly on a locus have the highest probability of damage presence; for the others, the greater the distance to the locus, the lower the probability that damage exists there. To quantify the probabilities at all nodes with regard to all loci, a function called the p.d.f. of damage occurrence was introduced. For each locus, a probability distribution map of the inspected plate structure can be computed from the p.d.f. of damage occurrence. Combining all the probability distribution maps gives the final damage detection result.

The main framework of the data fusion-based method can be divided into two steps.

(1) The inspection area of the structure is evenly meshed. For a given measured ToF, the possibility of the presence of damage at each mesh node is evaluated by using a probability density function.

(2) The evaluated results for all measured ToF values are combined to give the detection result in matrix form. Each element of the matrix represents the probability of the presence of damage at one mesh node.

The detection result in matrix form can be illustrated in an image shown in Figure 2, where the lighter the greyscale, the greater the possibility of damage existing at that pixel (each pixel exclusively corresponds to a spatial point of the structure under inspection).

Figure 2

Damage localization result of probability-based method.

It is obvious that the p.d.f. of damage occurrence is the key part of the probability-based method. Su et al. [10] suggest that the p.d.f. can be quantified in relation to the loci:

$$f(z_{ij}) = \frac{1}{\sigma_{ij}\sqrt{2\pi}} \exp\left[-\frac{z_{ij}^{2}}{2\sigma_{ij}^{2}}\right], \tag{2.4}$$

where $f(z_{ij})$ is the Gaussian distribution function representing the p.d.f. of damage occurrence at node $L_i$ ($i=1,\dots,K\times K$ for a structure comprising $K\times K$ mesh nodes), perceived by sensor $s_j$ ($j=1,\dots,N$ for a sensor network consisting of $N$ sensors), $\sigma_{ij}$ is the standard deviation, and

$$z_{ij} = \lVert \chi_i - \mu_{ij} \rVert, \tag{2.5}$$

where $\chi_i$ is the location vector of node $L_i$ and $\mu_{ij}$ is the location vector of the point on the locus provided by sensor $s_j$ that has the shortest distance to node $L_i$.

Satisfactory results have been obtained with this kind of p.d.f. It should be noted, however, that the standard deviation $\sigma_{ij}$ was selected on the basis of experience.
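The evaluation of (2.4) and (2.5) over a mesh can be sketched as below. This is an illustrative reading of the formulas, not the code of Su et al.: the function name `damage_pdf_map` is hypothetical, the locus is assumed to be available as a finite set of sampled points, and a single user-chosen `sigma` stands in for the empirically selected $\sigma_{ij}$.

```python
import numpy as np


def damage_pdf_map(nodes, locus_pts, sigma):
    """Evaluate the Gaussian p.d.f. (2.4) at every mesh node for one locus.

    nodes     : (M, 2) array of node coordinates
    locus_pts : (P, 2) array of points sampled along the locus (assumption:
                the locus has already been discretized)
    sigma     : standard deviation, chosen by the user
    """
    nodes = np.asarray(nodes, dtype=float)
    locus_pts = np.asarray(locus_pts, dtype=float)
    # (2.5): distance from each node to its nearest point on the locus.
    dist = np.linalg.norm(nodes[:, None, :] - locus_pts[None, :, :], axis=2)
    z = dist.min(axis=1)
    return np.exp(-z**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
```

Nodes lying on the locus receive the peak value $1/(\sigma\sqrt{2\pi})$, and the value decays with distance at a rate set entirely by `sigma`, which is why the choice of this parameter matters for the final image.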

The concept of the probability-based approach has also been adopted in Lamb wave-based damage detection methods other than the ToF method. Wang et al. [23] combined it with the virtual-sensing paths method. The p.d.f. in their work is an empirical formula whose parameters were chosen by experience.

There are two main disadvantages in the existing work. First, an empirical formula is usually simpler to write down and faster to compute, but it depends heavily on the experimental environment, and any change in the experiment, which is inevitable, may cause a large error in the estimated results. That is, the simplicity of an empirical formula comes at the cost of robustness. Since data measurement in Lamb wave-based damage detection is not time consuming, it is reasonable to estimate the density function with a robust statistical method. Second, the p.d.f. used in existing work is the distribution function $f_{D(T)}(s)$ of the damage location in the plane, where $s=|D(T)-D(T_m)|$ and $D(T)$ and $D(T_m)$ are the damage locations corresponding to $T$ (the actual ToF) and $T_m$ (the experimental ToF), respectively. The damage location cannot be measured directly in an experiment, so estimating $f_{D(T)}(s)$ directly is difficult. A better approach is to first estimate the function $f_{T_m}(t)$ describing the distribution of the experimental data $T_m$ in the time domain and then obtain $f_{D(T)}(s)$ through the mapping defined in (2.3).

Therefore, probability density estimation methods will be introduced in Section 3. The advantages and feasibility of applying probability density estimation methods in ToF method will be demonstrated.

3. Probability Density Estimation

In statistics, density estimation is the estimation of a distribution from observed samples. Depending on whether a priori knowledge about the type of the distribution is required, density estimation methods can be divided into two categories: parametric estimation and nonparametric estimation.

3.1. Parametric Estimation

Parametric estimation mainly includes point estimation and interval estimation. Point estimation uses sample data to calculate a single value as the estimate of an unknown population parameter, in contrast to interval estimation, which yields an interval of plausible values. The most commonly used point estimation methods are the method of moments, maximum likelihood estimation, and Bayesian estimation. For instance, if the sample data are known to come from a normal distribution, then the two parameters of the normal distribution, the expectation and the variance, can be calculated by using (3.1) and (3.2), which are derived by maximum likelihood estimation:

$$\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i, \tag{3.1}$$

$$\hat{\sigma}^{2} = \frac{1}{N}\sum_{i=1}^{N} (x_i-\hat{\mu})^{2}, \tag{3.2}$$

where $N$ is the number of samples.
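For concreteness, (3.1) and (3.2) translate directly into a few lines of Python. The function name `normal_mle` is only illustrative.

```python
import numpy as np


def normal_mle(x):
    """Maximum likelihood estimates of the mean (3.1) and variance (3.2)."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    # (3.2) divides by N, not N - 1, so this is the (biased) ML estimate.
    var_hat = np.mean((x - mu_hat) ** 2)
    return mu_hat, var_hat
```

For the sample 1, 2, 3, 4 this gives a mean of 2.5 and a variance of 1.25; the unbiased sample variance, which divides by N - 1, would be larger.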

3.2. Nonparametric Estimation

Nonparametric estimation estimates an unknown distribution without relying on assumptions about its type. Common nonparametric estimation methods include the histogram, nonparametric regression, and kernel density estimation, which is the most popular one.

3.2.1. Kernel Density Estimation Based on the Gaussian Approximation

Kernel density estimation is a nonparametric method to estimate the probability density function of a random variable. It is a fundamental data smoothing problem in which inferences about the population are made from a finite data sample. In some fields, such as signal processing and econometrics, kernel density estimation is also termed the Parzen-Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating the method in its current form [29, 30].

Let $(x_1,x_2,\dots,x_n)$ be an independent and identically distributed sample drawn from some distribution with an unknown density $f$. The goal is to estimate the shape of this function $f$. Its kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x-x_i) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x-x_i}{h}\right), \tag{3.3}$$

where $K(\cdot)$ is the kernel, a symmetric but not necessarily positive function that integrates to one, and $h>0$ is a smoothing parameter called the bandwidth. A kernel with subscript $h$ is called the scaled kernel and is defined as $K_h(x)=(1/h)K(x/h)$. A range of kernel functions is commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. As with kernel regression, the choice of kernel function is not crucial, but the choice of bandwidth is important.
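A direct, unoptimized evaluation of the estimator in (3.3) might look like the sketch below. The normal kernel is used as the default, the bandwidth `h` is passed in by the caller, and the function names are illustrative.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kde(x, samples, h, kernel=gaussian_kernel):
    """Kernel density estimate (3.3) evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # One row per evaluation point, one column per sample.
    u = (x[:, None] - samples[None, :]) / h
    return kernel(u).sum(axis=1) / (samples.size * h)
```

The cost grows with the product of the number of evaluation points and the number of samples, which is acceptable for the small ToF samples considered in this paper. Any other kernel that integrates to one can be passed through the `kernel` argument.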

The bandwidth of the kernel is a free parameter that has a strong influence on the resulting estimate [31, 32]. The most common optimality criterion used to select this parameter is the expected $L_2$ risk function, also termed the Mean Integrated Squared Error (MISE):

$$\mathrm{MISE}(h) = E\int \left(\hat{f}_h(x)-f(x)\right)^{2} dx. \tag{3.4}$$

Under weak assumptions on $f$ and $K$ [29, 30], $\mathrm{MISE}(h)=\mathrm{AMISE}(h)+o(1/(nh)+h^{4})$, where $o$ is the little-o notation of the Bachmann-Landau family; $o(1/(nh)+h^{4})$ denotes the family of functions that grow much more slowly than $1/(nh)+h^{4}$ [33]. The AMISE is the asymptotic MISE, which consists of the two leading terms

$$\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4} m_2(K)^{2} h^{4} R(f''), \tag{3.5}$$

where $R(g)=\int g(x)^{2}\,dx$ for a function $g$, $m_2(K)=\int x^{2}K(x)\,dx$, and $f''$ is the second derivative of $f$. The minimum of the AMISE is found by setting its derivative with respect to $h$ to zero,

$$\frac{\partial}{\partial h}\mathrm{AMISE}(h) = -\frac{R(K)}{nh^{2}} + m_2(K)^{2}h^{3}R(f'') = 0, \tag{3.6}$$

which gives

$$h_{\mathrm{AMISE}} = \frac{R(K)^{1/5}}{m_2(K)^{2/5}\,R(f'')^{1/5}\,n^{1/5}}. \tag{3.7}$$

Neither the AMISE nor $h_{\mathrm{AMISE}}$ can be used directly, since they involve the unknown density function $f$ or its second derivative $f''$. Therefore, a variety of automatic, data-based methods have been developed for selecting the bandwidth.

If the kernel function is normal and the distribution being estimated is assumed to be Gaussian, then it can be derived from (3.7) that the optimal choice for $h$ is

$$h = \left(\frac{4\hat{\sigma}^{5}}{3n}\right)^{1/5} \approx 1.06\,\hat{\sigma}\,n^{-1/5}, \tag{3.8}$$

where $\hat{\sigma}$ is the standard deviation of the samples. This approximation is termed the normal distribution approximation, the Gaussian approximation, or Silverman’s rule of thumb [32].
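The rule of thumb in (3.8) is a one-line computation. In the sketch below the function name is illustrative, and the use of the sample standard deviation with `ddof=1` is an assumption, since the text does not say which normalization of $\hat{\sigma}$ is intended.

```python
import numpy as np


def silverman_bandwidth(samples):
    """Bandwidth from the Gaussian approximation (3.8)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    sigma_hat = samples.std(ddof=1)  # assumption: unbiased-variance convention
    return (4.0 * sigma_hat**5 / (3.0 * n)) ** 0.2
```

The constant in (3.8) is $(4/3)^{1/5}\approx 1.059$, which is usually rounded to 1.06. Because the rule assumes a Gaussian density, it tends to oversmooth multimodal or strongly skewed data.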

3.2.2. Kernel Density Estimation via Diffusion

Kernel density estimation is an ongoing research topic in statistics. Botev et al. [34] proposed an adaptive kernel density estimation method based on the smoothing properties of linear diffusion processes. This approach has two parts: first, a simple and intuitive kernel estimator with substantially reduced asymptotic bias and mean square error and with better boundary bias performance; second, an improved plug-in bandwidth selection method that completely avoids the Gaussian approximation. The new plug-in method is thus genuinely “nonparametric,” since it does not require a preliminary normal model for the data.

(I) The Diffusion Estimator

Given $N$ independent realizations $\mathcal{X}_N\equiv\{X_1,\dots,X_N\}$ from an unknown continuous p.d.f. $f$ on $\mathcal{X}$, the Gaussian kernel density estimator is defined as

$$\hat{f}(x;h) = \frac{1}{N}\sum_{i=1}^{N}\phi(x,X_i;h), \tag{3.9}$$

where

$$\phi(x,X_i;h) = \frac{1}{\sqrt{2\pi h}}\,e^{-(x-X_i)^{2}/(2h)} \tag{3.10}$$

is a Gaussian p.d.f. (kernel) with location $X_i$ and scale $h$. The scale is the bandwidth in kernel density estimation.

Chaudhuri and Marron [35] found a link between the Gaussian kernel density estimator and the well-known Fourier heat equation, which is a diffusion partial differential equation (PDE): the Gaussian kernel density estimator defined in (3.9) is the unique solution to the heat equation

$$\frac{\partial}{\partial h}\hat{f}(x;h) = \frac{1}{2}\frac{\partial^{2}}{\partial x^{2}}\hat{f}(x;h), \quad x\in\mathcal{X},\ h>0, \tag{3.11}$$

with $\mathcal{X}\equiv\mathbb{R}$ and initial condition $\hat{f}(x;0)=\Delta(x)$, where $\Delta(x)=(1/N)\sum_{i=1}^{N}\delta(x-X_i)$ is the empirical density of the data $\mathcal{X}_N$ and $\delta(x-X_i)$ is the Dirac measure at $X_i$. In the heat equation interpretation, the Gaussian kernel in (3.9) is the so-called Green’s function [36] for the diffusion PDE in (3.11). Thus, the Gaussian kernel density estimator $\hat{f}(x;h)$ can be obtained by evolving the solution of (3.11) up to time $h$.

Because any bounded domain can be mapped onto $[0,1]$ by a linear transformation, there is no loss of generality in assuming that the domain of the data is $\mathcal{X}\equiv[0,1]$. The analytical solution of PDE (3.11) with initial condition $\Delta(x)$ and the Neumann boundary condition is then

$$\hat{f}(x;h) = \frac{1}{N}\sum_{i=1}^{N}\kappa(x,X_i;h), \quad x\in[0,1], \tag{3.12}$$

where the kernel $\kappa$ is given by

$$\kappa(x,X_i;h) = \sum_{k=-\infty}^{\infty}\left[\phi(x,2k+X_i;h)+\phi(x,2k-X_i;h)\right]. \tag{3.13}$$

The Neumann boundary condition is

$$\left.\frac{\partial}{\partial x}\hat{f}(x;h)\right|_{x=1} = \left.\frac{\partial}{\partial x}\hat{f}(x;h)\right|_{x=0} = 0, \tag{3.14}$$

and its purpose is to ensure that (3.12) satisfies the requirements of a p.d.f., namely, that $\hat{f}$ is a nonnegative Lebesgue-integrable function that integrates to unity.
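The reflected kernel of (3.13) can be evaluated numerically by truncating the infinite sum. The following sketch does this for data already scaled to $[0,1]$; the function names and the truncation parameter `k_max` are assumptions made for illustration. Note that, following (3.10), `h` enters the Gaussian as a variance-like scale.

```python
import numpy as np


def phi(x, loc, h):
    """Gaussian p.d.f. (3.10) with location loc and scale h."""
    return np.exp(-(x - loc) ** 2 / (2.0 * h)) / np.sqrt(2.0 * np.pi * h)


def reflected_kde(x, samples, h, k_max=5):
    """Estimator (3.12) on [0, 1] with the sum in (3.13) cut at |k| <= k_max."""
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None, None]
    s = np.asarray(samples, dtype=float)[None, :, None]
    k = np.arange(-k_max, k_max + 1)[None, None, :]
    # Image sources at 2k + X_i and 2k - X_i enforce zero slope at x = 0, 1.
    kappa = phi(x, 2 * k + s, h) + phi(x, 2 * k - s, h)
    return kappa.sum(axis=2).mean(axis=1)
```

Because the mass that a plain Gaussian kernel would place outside $[0,1]$ is folded back into the interval, the estimate still integrates to one on $[0,1]$. For small `h` only the terms with `k` near zero contribute noticeably, so a small `k_max` is usually sufficient.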

It has been proved that the estimator in (3.12), which arises as the solution of the diffusion PDE, has better boundary bias properties than the traditional estimator in (3.9).

Motivated by the idea of obtaining the estimator from the solution of a diffusion PDE, Botev proposed the most general linear time-homogeneous diffusion PDE as the starting point for constructing a better kernel density estimator. The simple diffusion model in (3.11) is extended, on the basis of the smoothing properties of the linear diffusion PDE, to

$$\frac{\partial}{\partial h}g(x;h) = Lg(x;h), \quad x\in\mathcal{X},\ h>0, \tag{3.15}$$

where the linear differential operator $L$ is of the form $\frac{1}{2}\frac{d}{dx}\left(a(x)\frac{d}{dx}\left(\frac{\cdot}{p(x)}\right)\right)$, $a$ and $p$ can be arbitrary positive functions on $\mathcal{X}$ with bounded second derivatives, and the initial condition is $g(x;0)=\Delta(x)$.

The solution of (3.15) is the diffusion kernel estimator, written as

$$g(x;h) = \frac{1}{N}\sum_{i=1}^{N}\kappa(x,X_i;h). \tag{3.16}$$

Although there is no analytical expression for the diffusion kernel satisfying (3.16), $\kappa$ can be written in terms of a generalized Fourier series when $\mathcal{X}$ is bounded:

$$\kappa(x,y;h) = p(x)\sum_{k=0}^{\infty}e^{\lambda_k h}\varphi_k(x)\varphi_k(y), \tag{3.17}$$

evaluated at $y=X_i$, where $\{\varphi_k\}$ and $\{\lambda_k\}$ are the eigenfunctions and eigenvalues of the Sturm-Liouville problem on $[0,1]$:

$$L^{*}\varphi_k = \lambda_k\varphi_k, \quad \varphi_k'(0)=\varphi_k'(1)=0, \quad k=0,1,2,\dots, \tag{3.18}$$

where $L^{*}$ is of the form $\frac{1}{2p(y)}\frac{\partial}{\partial y}\left(a(y)\frac{\partial}{\partial y}(\cdot)\right)$; that is, $L^{*}$ is the adjoint operator of $L$.

(II) Improved Plug-In Bandwidth Selection Method

The novel plug-in bandwidth selection method for the diffusion estimator defined in (3.16) proposed by Botev is based on the improved plug-in bandwidth selection method for the Gaussian kernel density estimator defined in (3.9).

Assuming that $f''$ is a continuous square-integrable function, the asymptotically optimal value of $h$ for the Gaussian kernel density estimator is the minimizer of the first-order asymptotic approximation of the MISE [37]:

$$h^{*} = \left(2N\sqrt{\pi}\,\lVert f''\rVert^{2}\right)^{-2/5}. \tag{3.19}$$

It is clear from (3.19) that computing the optimal $h^{*}$ requires an estimate of the functional $\lVert f''\rVert^{2}$. Consider the problem of estimating $\lVert f^{(j)}\rVert^{2}$ for an arbitrary integer $j\ge 1$. The identity $\lVert f^{(j)}\rVert^{2}=(-1)^{j}\mathbb{E}_f[f^{(2j)}(X)]$ suggests two plug-in estimators. The first one is

$$\widehat{(-1)^{j}\mathbb{E}_f[f^{(2j)}(X)]} := \frac{(-1)^{j}}{N^{2}}\sum_{k=1}^{N}\sum_{m=1}^{N}\phi^{(2j)}(X_k,X_m;h_j),$$

and the second one is

$$\widehat{\lVert f^{(j)}\rVert^{2}} := \lVert\hat{f}^{(j)}(\cdot\,;h_j)\rVert^{2} = \frac{(-1)^{j}}{N^{2}}\sum_{k=1}^{N}\sum_{m=1}^{N}\phi^{(2j)}(X_k,X_m;2h_j). \tag{3.20}$$

For a given bandwidth, both estimators aim to estimate the same quantity $\lVert f^{(j)}\rVert^{2}$. Therefore, $h_j^{*}$ can be selected to make both estimators asymptotically equivalent in the mean square error sense:

$$h_j^{*} = \left(\frac{1+1/2^{\,j+1/2}}{3}\cdot\frac{1\times 3\times 5\times\cdots\times(2j-1)}{N\sqrt{\pi/2}\,\lVert f^{(j+1)}\rVert^{2}}\right)^{2/(3+2j)}. \tag{3.21}$$

Computing $h_j^{*}$ from (3.21) involves $\lVert f^{(j+1)}\rVert^{2}$, which is unknown. Thus, each $h_j^{*}$ is estimated by

$$\hat{h}_j^{*} = \left(\frac{1+1/2^{\,j+1/2}}{3}\cdot\frac{1\times 3\times 5\times\cdots\times(2j-1)}{N\sqrt{\pi/2}\,\widehat{\lVert f^{(j+1)}\rVert^{2}}}\right)^{2/(3+2j)}. \tag{3.22}$$

Computing $\widehat{\lVert f^{(j+1)}\rVert^{2}}$ requires the estimate $\hat{h}_{j+1}^{*}$, which in turn requires $\hat{h}_{j+2}^{*}$, and so on, as seen from (3.20) and (3.22). This raises the problem of estimating the infinite sequence $\{\hat{h}_{j+k}^{*},\ k\ge 1\}$. However, if $\hat{h}_{l+1}^{*}$ is given for some $l>0$, then all $\{\hat{h}_j^{*},\ 1\le j\le l\}$ can be estimated recursively. Based on this idea, the $l$-stage direct plug-in bandwidth selector [37] has been proposed.
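To make the second plug-in estimator in (3.20) concrete, the sketch below evaluates it by brute force over all sample pairs. It relies on the standard identity that the $n$-th derivative of a zero-mean Gaussian density with variance $v$ is $(-1)^{n}v^{-n/2}\,He_n(u/\sqrt{v})$ times the density, where $He_n$ is the probabilists' Hermite polynomial. The function names are hypothetical, and `h_j` is treated as the variance-like scale of (3.10).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def gauss_deriv(u, order, var):
    """order-th derivative in u of the zero-mean Gaussian p.d.f. with variance var."""
    s = np.sqrt(var)
    coef = np.zeros(order + 1)
    coef[order] = 1.0  # selects the Hermite polynomial He_order
    pdf = np.exp(-u**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return (-1) ** order * hermeval(u / s, coef) * pdf / s**order


def norm_fj_squared(samples, j, h_j):
    """Second estimator in (3.20) of the squared L2 norm of the j-th derivative."""
    x = np.asarray(samples, dtype=float)
    u = x[:, None] - x[None, :]  # all pairwise differences X_k - X_m
    # The mean over the N x N matrix is the double sum divided by N^2.
    return (-1) ** j * gauss_deriv(u, 2 * j, 2.0 * h_j).mean()
```

The double sum costs on the order of $N^{2}$ operations, which is negligible for samples of a few dozen ToF values. The result depends strongly on `h_j`, which is why (3.21) and (3.22) are needed to choose it.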

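Denote the functional dependence of $\hat{h}_j^{*}$ on $\hat{h}_{j+1}^{*}$ as

$$\hat{h}_j^{*} = \gamma_j(\hat{h}_{j+1}^{*}). \tag{3.23}$$

It then follows that $\hat{h}_j^{*}=\gamma_j(\gamma_{j+1}(\hat{h}_{j+2}^{*}))=\gamma_j(\gamma_{j+1}(\gamma_{j+2}(\hat{h}_{j+3}^{*})))=\cdots$. For simplicity of notation, the composition is defined as

$$\gamma^{[k]}(h) = \gamma_1(\cdots\gamma_{k-1}(\gamma_k(h))\cdots), \quad k\ge 1. \tag{3.24}$$

The estimate of $h^{*}$ satisfies

$$\hat{h}^{*} = \xi\hat{h}_1^{*} = \xi\gamma^{[1]}(\hat{h}_2^{*}) = \xi\gamma^{[2]}(\hat{h}_3^{*}) = \cdots = \xi\gamma^{[l]}(\hat{h}_{l+1}^{*}). \tag{3.25}$$

Then, for a given integer $l>0$, the $l$-stage direct plug-in bandwidth selector consists of computing

$$\hat{h}^{*} = \xi\gamma^{[l]}(h_{l+1}^{*}), \tag{3.26}$$

where $h_{l+1}^{*}$ is estimated by assuming that $f$ in $\lVert f^{(l+2)}\rVert^{2}$ is a normal density with mean and variance estimated from the data.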
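The constant $\xi$ is not given explicitly in the text. Under the reading that $\xi$ is the ratio of $h^{*}$ in (3.19) to $h_1^{*}$ in (3.21), which is what the first equality of (3.25) implies, its value can be worked out as follows (for $j=1$ the product $1\times 3\times\cdots\times(2j-1)$ equals 1):

```latex
\begin{aligned}
h^{*}   &= \left(\frac{1}{2N\sqrt{\pi}\,\lVert f''\rVert^{2}}\right)^{2/5}, \qquad
h_1^{*} = \left(\frac{1+2^{-3/2}}{3}\cdot\frac{1}{N\sqrt{\pi/2}\,\lVert f''\rVert^{2}}\right)^{2/5}, \\[4pt]
\xi = \frac{h^{*}}{h_1^{*}}
        &= \left(\frac{3\sqrt{\pi/2}}{2\sqrt{\pi}\,\bigl(1+2^{-3/2}\bigr)}\right)^{2/5}
         = \left(\frac{3}{2\sqrt{2}+1}\right)^{2/5}
         = \left(\frac{6\sqrt{2}-3}{7}\right)^{2/5} \approx 0.907 .
\end{aligned}
```

Both $N$ and $\lVert f''\rVert^{2}$ cancel in the ratio, so under this reading $\xi$ is a pure constant that does not depend on the data.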

Note that the normality assumption in the $l$-stage direct plug-in bandwidth selector can lead to arbitrarily bad estimates of $h^{*}$ when, for example, the true $f$ is far from Gaussian. Therefore, Botev proposed to find a solution to the nonlinear equation

$$h = \xi\gamma^{[l]}(h), \tag{3.27}$$

for some $l$, using either fixed-point iteration or Newton’s method with initial guess $h=0$. The fixed-point iteration version is formalized in the following Improved Sheather-Jones algorithm:

(1) Given $l>2$, initialize with $z_0=\varepsilon$, where $\varepsilon$ is machine precision, and $n=0$.

(2) Set $z_{n+1}=\xi\gamma^{[l]}(z_n)$.

(3) If $|z_{n+1}-z_n|<\varepsilon$, stop and set $\hat{h}^{*}=z_{n+1}$; otherwise, set $n:=n+1$ and repeat from step (2).

(4) Deliver the Gaussian kernel density estimator in (3.9) evaluated at $\hat{h}^{*}$ as the final estimator of $f$, and $\hat{h}_2^{*}=\gamma^{[l-1]}(z_{n+1})$ as the bandwidth for the optimal estimation of $\lVert f''\rVert^{2}$.
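Steps (1) to (3) describe a plain fixed-point iteration, whose control flow can be sketched as follows. The map $h\mapsto\xi\gamma^{[l]}(h)$ itself is not implemented here; it is passed in as a callable named `xi_gamma`, which is a hypothetical placeholder, and the iteration cap `max_iter` is an added safeguard that is not part of the algorithm as stated.

```python
import sys


def fixed_point_bandwidth(xi_gamma, eps=sys.float_info.epsilon, max_iter=1000):
    """Iterate z <- xi_gamma(z) from z0 = eps until successive values agree.

    xi_gamma : callable standing in for h -> xi * gamma^[l](h) (placeholder)
    eps      : machine precision, used as start value and stopping tolerance
    """
    z = eps                        # step (1)
    for _ in range(max_iter):
        z_next = xi_gamma(z)       # step (2)
        if abs(z_next - z) < eps:  # step (3)
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")
```

As a toy check with a map that is not related to density estimation, `fixed_point_bandwidth(lambda z: 0.5 * z + 0.25)` converges to the fixed point 0.5. Plain fixed-point iteration is only guaranteed to converge when the map is a contraction near the solution, which is one reason Newton's method is mentioned as an alternative.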

The recommended setting for $l$ is 5.

The above section explains how to estimate the bandwidth h* of the Gaussian kernel density estimator. Now, the algorithm that estimates the bandwidth h* of the diffusion estimator will be introduced.

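Assuming that $f$ is as many times continuously differentiable as needed, it has been proved that the square of the asymptotically optimal bandwidth is

$$h^{*} = \left(\frac{\mathbb{E}_f[\sigma^{-1}(X)]}{2N\sqrt{\pi}\,\lVert Lf\rVert^{2}}\right)^{2/5}. \tag{3.28}$$

Computing $h^{*}$ from (3.28) requires estimates of $\lVert Lf\rVert^{2}$ and $\mathbb{E}_f[\sigma^{-1}(X)]$. The latter can be estimated via the unbiased estimator $(1/N)\sum_{i=1}^{N}\sigma^{-1}(X_i)$. The identity $\lVert Lf\rVert^{2}=\mathbb{E}_f[L^{*}Lf(X)]$ suggests two possible estimators. The first one is

$$\widehat{\mathbb{E}_f[L^{*}Lf(X)]} := \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N} L^{*}L\kappa(x,X_i;h_2)\Big|_{x=X_j}. \tag{3.29}$$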

The second one is

$$\widehat{\lVert Lf\rVert^{2}} := \frac{1}{N^{2}}\sum_{i=1}^{N}\sum_{j=1}^{N} L^{*}L\kappa(x,X_i;2h_2)\Big|_{x=X_j}. \tag{3.30}$$

Just as $h_2^{*}$ is derived for the Gaussian kernel density estimator, $h_2^{*}$ for the diffusion estimator is selected so that the estimators $\widehat{\mathbb{E}_f[L^{*}Lf(X)]}$ and $\widehat{\lVert Lf\rVert^{2}}$ have the same asymptotic mean square error:

$$h_2^{*} = \left(\frac{8+\sqrt{2}}{24}\cdot\frac{-3\sqrt{2}\,\mathbb{E}_f[\sigma^{-1}(X)]}{8\sqrt{\pi}\,N\,\mathbb{E}_f[L^{*}L^{2}f(X)]}\right)^{2/7}. \tag{3.31}$$

Note that the $h_2^{*}$ in (3.31) has the same rate of convergence to 0 as the $h_2^{*}$ of the Gaussian kernel density estimator. In fact, since the Gaussian kernel density estimator is the special case of the diffusion estimator with $p(x)=a(x)=1$, the plug-in estimator (3.30) for $\lVert Lf\rVert^{2}$ reduces to the plug-in estimator for $(1/4)\lVert f''\rVert^{2}$. In addition, the two values of $h_2^{*}$ are identical when $p(x)=a(x)=1$. Thus, the bandwidth for the diffusion estimator in (3.16) can be selected by using the following algorithm:

(1) Given the data $X_1,\dots,X_N$, run the Improved Sheather-Jones algorithm to obtain the Gaussian kernel density estimator defined in (3.9) evaluated at $\hat{h}^{*}$ and the optimal bandwidth $\hat{h}_2^{*}$ for the estimation of $\lVert f''\rVert^{2}$. This is the pilot estimation step.

(2) Let $p(x)$ be the Gaussian kernel estimator from step (1), and let $a(x)=p^{\alpha}(x)$ for some $\alpha\in[0,1]$.

(3) Estimate $\lVert Lf\rVert^{2}$ via the plug-in estimator given in (3.30), using the bandwidth $\hat{h}_2^{*}$ obtained in step (1).

(4) Substitute the estimate of $\lVert Lf\rVert^{2}$ into (3.28) to obtain an estimate of $h^{*}$.

(5) Deliver the diffusion estimator in (3.16) evaluated at $\hat{h}^{*}$ as the final density estimate.

The flow chart of the entire bandwidth selection algorithm is shown in Figure 3.

Figure 3

Flow chart of kernel density estimation via diffusion.

4. Numerical Simulation

The feasibility of using kernel density estimation to estimate the p.d.f. of experimental results was demonstrated on a thin plate structure via finite-element (FE) simulation. Eight PZT wafers were surface-mounted on an aluminium plate measuring 600 mm × 600 mm × 1.5 mm and supported along all four edges. The elastic modulus, Poisson's ratio, and density of the aluminium are 71 GPa, 0.35, and 2711 kg/m³, respectively. The plate was modeled in three dimensions using eight-node brick solid elements. To ensure simulation precision, the largest dimension of the FE elements was less than 1 mm and the plate was divided into multiple layers through the thickness, guaranteeing at least ten elements per wavelength of the incident diagnostic wave, which has been demonstrated to be sufficient to portray the characteristics of elastic waves in a thin plate [19]. A through-thickness hole 16 mm in diameter was assumed in the plate, 200 mm from the left edge and 200 mm from the lower edge (Figure 4). The S0 mode of Lamb waves was used to detect damage. Five-cycle Hanning window-modulated sinusoidal tone bursts at a central frequency of 300 kHz were used as the incident diagnostic wave signal. The speed of the S0 mode is 5159.5 m/s in this simulation.
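The excitation described above, a five-cycle Hanning window-modulated sinusoid at 300 kHz, can be generated as in the following sketch. The function name and the sampling rate `fs` are assumptions; the text does not state the time step used in the FE model.

```python
import numpy as np


def hanning_tone_burst(fc, n_cycles, fs):
    """Hanning-windowed sinusoidal tone burst.

    fc       : central frequency in Hz
    n_cycles : number of carrier cycles in the burst
    fs       : sampling rate in Hz (assumed; not given in the text)
    """
    duration = n_cycles / fc
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    # Hanning window spanning exactly the burst duration.
    window = 0.5 * (1.0 - np.cos(2.0 * np.pi * t / duration))
    return t, window * np.sin(2.0 * np.pi * fc * t)


# Five cycles at 300 kHz, sampled at a hypothetical 30 MHz.
t, burst = hanning_tone_burst(300e3, 5, 30e6)
```

The window narrows the bandwidth of the excitation around the central frequency, which limits the effect of dispersion on the shape of the wave packet.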

Figure 4

Schematic of the numerical simulation model.

Gaussian noise is statistical noise whose probability density function is that of the normal distribution, also known as the Gaussian distribution. A special case is white Gaussian noise, in which the values at any pair of times are statistically independent (and uncorrelated). It is well known that noise arising from many natural sources is Gaussian. Therefore, to simulate environmental noise, white Gaussian noise at three signal-to-noise ratio (SNR) levels (20 dB, 30 dB, and 40 dB) was intentionally added to the numerically simulated Lamb wave signals.
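The noise-adding step can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function names are hypothetical, the sine record is a placeholder for a simulated Lamb wave signal, and the SNR is referenced to the mean power of the whole record, which is an assumption because the reference power is not stated in the text.

```python
import math
import random

def add_white_gaussian_noise(signal, snr_db, rng):
    """Return signal plus zero-mean white Gaussian noise at the requested SNR.

    The SNR is referenced to the mean power of the whole record (assumed).
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def measured_snr_db(clean, noisy):
    """SNR in dB recovered from a clean record and its noisy copy."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [math.sin(2.0 * math.pi * k / 50.0) for k in range(5000)]  # placeholder signal
noisy = {snr: add_white_gaussian_noise(clean, snr, rng) for snr in (20, 30, 40)}

for snr, record in noisy.items():
    print(snr, "dB requested,", round(measured_snr_db(clean, record), 2), "dB measured")
```

Each sample of the noise is drawn independently, which is what makes the noise white; repeating the call with fresh draws would give the repeated noisy realizations used for the ToF statistics.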

In the numerical model, four sensor pairs were used to locate the damage: s2-6, formed by sensor 2 and sensor 6; s4-8, by sensors 4 and 8; s3-7, by sensors 3 and 7; and s3-5, by sensors 3 and 5. The process of adding the three levels of white Gaussian noise to the signals captured by the four sensor pairs was repeated 30 times. That is, there were 30 ToF results for each sensor pair under each level of noise.

5. Results and Discussion

5.1. The Characteristics of Noise-Induced Error in ToF

In theory, nonparametric estimation methods can be expected to perform better than parametric estimation methods when dealing with a distribution whose type is not known a priori. The advantage of the kernel density estimation method is demonstrated in this paper by estimating fTm(t) of s4-8. In statistics, the performance of density estimation methods is usually verified by comparing the estimation results with the bona fide p.d.f. of some well-known datasets. That is, in order to show the accuracy of the estimation results, one needs to know the real p.d.f. of the distribution to be estimated. It is difficult to give an analytical expression of fTm(t) for ToF measured by the threshold method. However, a partial understanding of the characteristics of the noise-induced error in ToF can still be obtained by analyzing the process of the threshold method. This helps to establish the advantage of nonparametric estimation methods for ToF data.

ToF is obtained by comparing the arrival times of the incident waves and the damage-scattered waves. Since the incident waves are strong, the errors in their arrival time can be neglected. Without loss of generality, the errors in ToF were considered to be caused entirely by the errors in the arrival time of the damage-scattered waves.

As mentioned in Section 2.1, the existence of a wave packet is determined by whether the amplitude of the signal exceeds the threshold value. Once a wave packet is detected, the arrival time of the entire wave packet is given by the time of the recorded peaks. The process of the threshold method suggests that there are two kinds of noise-induced errors in ToF:

Tm = T + ε1 + ε2, (5.1)

where ε1 denotes the variation in the arrival time of a single peak and ε2 denotes the error caused by misidentification of peaks. While ε1 is easy to understand, ε2 is relatively complex. The signal received by s4-8, shown in Figure 5, is taken as an example to explain the existence of ε2. Noise can change not only the time of the peaks but also their relative magnitudes. That means the ranking of the peaks by magnitude may be changed by noise. If there were no noise and the arrival time were measured by recording the strongest peak, then the second peak of the damage-scattered waves shown in Figure 5 would be recorded. However, in noise-contaminated signals the strongest peak may change to another peak, such as the third or the fourth. The same problem exists when several peaks are recorded. For example, if there is no noise and the arrival time is measured as the average of four peaks, then the four strongest peaks (the second, the third, the fourth, and the fifth in this case) should be recorded. However, in noise-contaminated signals the first peak is likely to become stronger than the fifth peak. That leads to the error ε2 in ToF.
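The two error mechanisms can be illustrated with a small Monte Carlo sketch. It is not the threshold method of Section 2.1: it simply takes the arrival time as the time of the largest sample of a five-cycle tone burst standing in for the damage-scattered packet. The sampling rate, the noise level, the seed, and all names are assumptions chosen so that peak switching is visible.

```python
import math
import random

FC_HZ = 300e3     # carrier frequency, from Section 4
N_CYCLES = 5      # burst length, from Section 4
FS_HZ = 30e6      # sampling rate: assumed for this sketch
SIGMA = 0.15      # noise standard deviation relative to unit amplitude: assumed

def tone_burst():
    """Five-cycle Hann-windowed sine burst standing in for a scattered packet."""
    duration = N_CYCLES / FC_HZ
    n = int(round(duration * FS_HZ)) + 1
    t = [i / FS_HZ for i in range(n)]
    y = [0.5 * (1.0 - math.cos(2.0 * math.pi * ti / duration))
         * math.sin(2.0 * math.pi * FC_HZ * ti) for ti in t]
    return t, y

def strongest_peak_time(t, y):
    """Arrival time taken as the time of the largest sample."""
    k = max(range(len(y)), key=y.__getitem__)
    return t[k]

t, clean = tone_burst()
t_clean = strongest_peak_time(t, clean)
period = 1.0 / FC_HZ

rng = random.Random(1)  # fixed seed so the sketch is repeatable
errors = []
for _ in range(200):
    noisy = [v + rng.gauss(0.0, SIGMA) for v in clean]
    errors.append(strongest_peak_time(t, noisy) - t_clean)

# Errors well below one carrier period play the role of eps1;
# errors of about one period or more come from picking a different peak (eps2).
small = [e for e in errors if abs(e) < 0.5 * period]
jumps = [e for e in errors if abs(e) >= 0.5 * period]
print(len(small), "trials kept the same peak;", len(jumps), "trials switched peaks")
```

In this simplified setting the errors fall into separate groups: a tight group around zero, and groups displaced by roughly whole carrier periods when a neighbouring peak becomes the strongest. A mixture of this kind is what makes a single normal density a poor model once peak switching occurs.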

Figure 5

The signals of Lamb waves received by sensor 8.

It is obvious that ε2 is larger than ε1, but it appears only in a strong-noise environment.

5.2. Density Estimation Results

The parametric estimation method, the kernel density estimation based on the Gaussian approximation, and the adaptive kernel density estimation via diffusion were used to estimate fTm(t). The sample data were the ToF values measured by s4-8 under the three levels of noise.

The estimation results for the signals with 40 dB SNR noise were shown in Figure 6. The symbol “+” in Figure 6 and in Figures 7, 8, and 9 was used to give an intuitive picture of the distribution of the samples; each “+” represented one sample. It could be seen that the samples were distributed around two values. Most of the samples (26 of the 30) lay in the range from 1.1e-5 s to 1.15e-5 s, and the other 4 lay in the range from 0.82e-5 s to 0.87e-5 s. The p.d.f.s given by the kernel density estimation based on the Gaussian approximation and by the adaptive kernel density estimation via diffusion were functions with two peaks, whereas the p.d.f. given by the parametric estimation method was, by construction, a normal density function. Based on the conclusion drawn in the previous section about the characteristics of noise-induced errors in ToF, the distribution of the samples could be easily understood. Because the noise was weak in this case, most of the samples were affected only by ε1 and were distributed around the analytic value of ToF (1.117e-5 s). The other 4 samples, which were relatively far from the analytic value, were affected by both ε1 and ε2. Therefore, the two kinds of kernel density estimation gave a correct estimate of the p.d.f. of Tm. Because its assumption about the type of distribution was incorrect, the parametric estimation method was very misleading in this case.
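The contrast between the parametric fit and a kernel estimate can be reproduced on made-up data with the same 26 + 4 split. The sketch below is illustrative only: the sample values are synthetic (in units of 1e-5 s), not the simulated ToF data, and "Gaussian approximation" is taken here to mean a Gaussian kernel with the normal-reference rule-of-thumb bandwidth, which is an assumption about the method described in Section 3. The diffusion estimator is not implemented.

```python
import math
import statistics

# Synthetic ToF values in units of 1e-5 s, made up for illustration only.
main_cluster = [1.100 + 0.002 * k for k in range(26)]   # 1.100 .. 1.150
minor_cluster = [0.825, 0.840, 0.850, 0.870]
samples = main_cluster + minor_cluster

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_kde(x, data, h):
    """Kernel density estimate at x with a Gaussian kernel of bandwidth h."""
    return sum(normal_pdf(x, xi, h) for xi in data) / len(data)

def count_local_maxima(values):
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

mu = statistics.mean(samples)
sigma = statistics.stdev(samples)
h = 1.06 * sigma * len(samples) ** (-1 / 5)   # normal-reference rule of thumb

grid = [0.70 + 0.005 * k for k in range(121)]  # 0.70 .. 1.30
parametric = [normal_pdf(x, mu, sigma) for x in grid]
kde = [gaussian_kde(x, samples, h) for x in grid]

print("peaks in normal fit:", count_local_maxima(parametric))
print("peaks in kernel estimate:", count_local_maxima(kde))
```

The normal fit can only place a single peak between the two groups of samples, while the kernel estimate keeps one peak per group, which mirrors the qualitative behaviour described for Figure 6.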

Figure 6

p.d.f. estimated results for samples from s4-8 with 40 dB noise.

Figure 7

p.d.f. estimated results for refined samples from s4-8 with 40 dB noise.

Figure 8

p.d.f. estimated results for samples from s4-8 with 30 dB noise.

Figure 9

p.d.f. estimated results for samples from s4-8 with 20 dB noise.

The fact that only 4 samples were affected by both ε1 and ε2 in this case could be utilized to learn the characteristics of ε1. Since these samples could be easily distinguished from the samples affected only by ε1, they could be excluded from the dataset. The density function was then estimated with the refined dataset. The results were shown in Figure 7. It could be seen that the results of the two kinds of kernel density estimation were similar to a normal distribution.

The Lilliefors test was adopted to check whether the refined samples came from a normal distribution. The Lilliefors test, named after Hubert Lilliefors, is an adaptation of the Kolmogorov-Smirnov test [38]. It is used to test the null hypothesis that data come from a normally distributed population when the null hypothesis does not specify which normal distribution; that is, it does not specify the expected value and variance of the distribution.

The value of the Lilliefors test statistic was 0.1373, which was less than the critical value of 0.1699 at the 5% significance level. The null hypothesis that the refined data came from a normally distributed population was therefore not rejected. This explained why the empirical formula given in the previous work was of the normal distribution type and why the damage detection results based on the empirical formula were satisfactory. Since the noise in the previous work [12] was weak and the Tm data were affected only by ε1, their distribution was in fact normal.
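For readers who wish to repeat this kind of check, the test statistic can be computed as sketched below. The statistic is the Kolmogorov-Smirnov distance between the empirical distribution function and a normal distribution whose mean and standard deviation are estimated from the sample. The function names and the two 26-point samples are made up for illustration, and the critical value is simply the one quoted above rather than one computed by the code.

```python
import statistics

CRITICAL_5_PERCENT = 0.1699  # critical value quoted in the text for the refined sample

def lilliefors_statistic(data):
    """Largest gap between the empirical c.d.f. and the fitted normal c.d.f."""
    xs = sorted(data)
    n = len(xs)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical c.d.f. jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def normality_not_rejected(data, critical=CRITICAL_5_PERCENT):
    return lilliefors_statistic(data) < critical

# Synthetic 26-point samples (units of 1e-5 s), made up for illustration.
reference = statistics.NormalDist(1.125, 0.012)
normal_like = [reference.inv_cdf((i + 0.5) / 26) for i in range(26)]
two_clusters = ([1.10 + 0.001 * k for k in range(13)]
                + [0.84 + 0.001 * k for k in range(13)])

print("normal-like sample passes:", normality_not_rejected(normal_like))
print("two-cluster sample passes:", normality_not_rejected(two_clusters))
```

Because the mean and variance are estimated from the same data, the critical values differ from those of the ordinary Kolmogorov-Smirnov test and depend on the sample size, so the constant above should only be reused for samples of the same size as the refined dataset.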

The estimation results for the signals with 30 dB SNR noise were shown in Figure 8. It could be seen that, as in the case of 40 dB SNR noise, the parametric estimation method failed to give a correct estimate.

The estimation results for the signals with 20 dB SNR noise were shown in Figure 9. It could be seen that, with the increase of noise level, the kernel density estimation based on the Gaussian approximation, which was the traditional kernel density estimation, failed to give a correct estimate. Only the novel and completely data-driven method, the kernel density estimation via diffusion, could give a correct estimate.

5.3. Damage Detection Results

The damage localization under the 20 dB noise environment was selected as the example to show that an accurate estimate of the p.d.f. was important for the localization result. The p.d.f. estimates given by the three kinds of density estimation methods introduced in Section 2 were used to calculate the location of the damage. The results were shown in Figures 10, 11, and 12. It could be seen that the locating process which employed the kernel density estimation via diffusion had the most accurate localization result. This indicated that a more accurate estimate of the p.d.f. led to a better localization result.
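How an estimated p.d.f. enters the localization can be sketched as follows. This is a simplified illustration under several assumptions, not the procedure of this paper: the actuator and sensor coordinates are hypothetical (the real layout is given in Figure 4), ToF is taken as the delay of the damage-scattered wave relative to the direct wave, the per-pair densities are combined by a plain product, and a narrow Gaussian stands in for each estimated p.d.f. The code uses `math.dist`, which requires Python 3.8 or later.

```python
import math

S0_SPEED = 5159.5     # m/s, from Section 4
PLATE = 0.6           # plate side length in m, from Section 4
DAMAGE = (0.2, 0.2)   # hole centre in m, from Section 4

# Hypothetical actuator/sensor coordinates in metres (not the layout of Figure 4).
PAIRS = [
    ((0.10, 0.30), (0.50, 0.35)),
    ((0.30, 0.10), (0.25, 0.50)),
    ((0.12, 0.15), (0.45, 0.48)),
    ((0.45, 0.12), (0.15, 0.50)),
]

def tof(point, actuator, sensor):
    """Delay of the damage-scattered wave relative to the direct wave."""
    via_damage = math.dist(actuator, point) + math.dist(point, sensor)
    return (via_damage - math.dist(actuator, sensor)) / S0_SPEED

def gaussian_pdf(t, mean, sigma):
    z = (t - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def localize(pdfs, step=0.005):
    """Return the grid point that maximises the product of per-pair densities."""
    n = int(round(PLATE / step)) + 1
    best, best_score = None, -1.0
    for i in range(n):
        for j in range(n):
            p = (i * step, j * step)
            score = 1.0
            for pdf, (a, s) in zip(pdfs, PAIRS):
                score *= pdf(tof(p, a, s))
            if score > best_score:
                best, best_score = p, score
    return best

# Noise-free ToF values and an assumed spread of 1 microsecond per pair.
measured = [tof(DAMAGE, a, s) for a, s in PAIRS]
pdfs = [lambda t, m=m: gaussian_pdf(t, m, 1e-6) for m in measured]
estimate = localize(pdfs)
print("estimated damage location (m):", estimate)
```

Any callable density can be passed in place of the Gaussian stand-ins, for example a kernel density estimate built from repeated ToF measurements, so the sketch shows where a biased or over-smoothed p.d.f. would shift the maximum of the combined map.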

Figure 10

Damage localization result based on parametric estimate method (partial view).

Figure 11

Damage localization result based on kernel density estimation with Gaussian approximation (partial view).

Figure 12

Damage localization result based on kernel density estimation via diffusion (partial view).

6. Conclusion

The characteristics of noise-induced error in ToF data measured by using threshold method were analyzed.

The empirical formula method and the parametric estimation method presented in existing work shared the assumption that the experimental data came from a normal distribution. This assumption had been verified by real experiments and numerical simulation. The results in this paper revealed that the type of distribution of the ToF data was related to the noise level. The empirical formula method and the parametric estimation method were developed in a laboratory environment where the noise was weak. It was also shown in this paper that the ToF data measured from high-SNR signals (SNR > 40 dB) were normally distributed. Therefore, the density estimation method with the normality assumption presented in existing work can work well in a laboratory environment.

However, signals from field experiments usually contain much stronger noise. The results in this paper showed that even for signals with 40 dB SNR, the measured ToF data were not normally distributed. In this case, a nonparametric estimation method must be employed to estimate the p.d.f. correctly. Further investigation of the signals with 30 dB and 20 dB noise showed that, with increasing noise, only the kernel density estimation via diffusion, which is purely data driven, could give a satisfactory estimate.

The damage localization under the 20 dB noise environment was carried out. The parametric estimation method with the normality assumption, the kernel density estimation based on the Gaussian approximation, and the kernel density estimation via diffusion were adopted to estimate the p.d.f. of the measured data. Three different p.d.f.s were obtained with these three density estimation methods, and a damage location was calculated from each. Comparing the three localization results showed that the accuracy of the p.d.f. estimate has a direct effect on the accuracy of the localization. Applying kernel density estimation in Lamb wave-based damage detection was therefore necessary.

The noise studied in this paper was white Gaussian noise. The noise in real field experiments is much more complex, and further study is needed to reveal the characteristics of the errors in ToF data caused by such noise. However, the complex nature of field noise should not be an obstacle to the application of the kernel density estimation method; instead, it is a reason to apply it. It was shown that, even when dealing with simple noise, the kernel density estimation method introduced in this paper performed better than the empirical methods. Since the kernel density estimation method does not rely on any assumption about the distribution to be estimated, it can be expected to show a greater advantage in a complex noise environment.

Acknowledgments

This work was financially supported by National Natural Science Foundation of China under grant no. 50905141, the Program for New Century Excellent Talents in University of China, and the NPU Foundation for Fundamental Research under grant no. NPU-FFR-JC20110258.

References

[1] Yan, Y. J., Cheng, Z. Y., Yam, L. H., "Development in vibration-based structural damage detection technique," Mechanical Systems and Signal Processing, vol. 21, no. 5, pp. 2198–2211, 2007. doi:10.1016/j.ymssp.2006.10.002

[2] Yang, "A new damage identification method based on structural flexibility disassembly," Journal of Vibration and Control, vol. 17, no. 7, pp. 1000–1008, 2011. doi:10.1177/1077546309360052

[3] Cheng, Yam, L. H., Yan, Y. J., Jiang, J. S., "Online damage detection for laminated composite shells partially filled with fluid," Composite Structures, vol. 80, no. 3, pp. 334–342, 2007. doi:10.1016/j.compstruct.2006.05.019

[4] Zhou, L. M., Cheng, Meng, "Evaluation of welding damage in welded tubular steel structures using guided waves and a probability-based imaging approach," Smart Materials and Structures, vol. 20, no. 1, article 015018, 2011. doi:10.1088/0964-1726/20/1/015018

[5] Chen, Cheng, "Identification of corrosion damage in submerged structures using fundamental anti-symmetric Lamb waves," Smart Materials and Structures, vol. 19, no. 1, article 015004, 2010. doi:10.1088/0964-1726/19/1/015004

[6] Zumpano, Meo, "A new nonlinear elastic time reversal acoustic method for the identification and localisation of stress corrosion cracking in welded plate-like structures: a simulation study," International Journal of Solids and Structures, vol. 44, no. 11-12, pp. 3666–3684, 2007. doi:10.1016/j.ijsolstr.2006.10.010

[7] "Guided Lamb waves for identification of damage in composite structures: a review," Journal of Sound and Vibration, vol. 295, no. 3–5, pp. 753–780, 2006. doi:10.1016/j.jsv.2006.01.020

[8] Farrar, C. R., Doebling, S. W., Nix, D. A., "Vibration-based structural damage identification," Philosophical Transactions of the Royal Society A, vol. 359, no. 1778, pp. 131–149, 2001. doi:10.1098/rsta.2000.0717

[9] Sohn, Farrar, C. R., Hemez, F. M., Czarnecki, J. J., "A review of structural health monitoring literature: 1996–2001," in Proceedings of the 3rd World Conference on Structural Control, Como, Italy, April 2002.

[10] Cheng, Wang, Zhou, "Predicting delamination of composite laminates using an imaging approach," Smart Materials and Structures, vol. 18, no. 7, article 074002, 2009. doi:10.1088/0964-1726/18/7/074002

[11] Niri, E. D., Salamone, "A probabilistic framework for acoustic emission source localization in plate-like structures," Smart Materials and Structures, vol. 21, no. 3, article 035009, 2012. doi:10.1088/0964-1726/21/3/035009

[12] Liu, "The definition and measurement of the probability density function in Lamb wave damage detection based on data fusion," in Proceedings of the 14th Asia Pacific Vibration Conference, Kowloon, Hong Kong, 2011.

[13] Raghavan, Cesnik, C. E. S., "Review of guided-wave structural health monitoring," Shock and Vibration Digest, vol. 39, no. 2, pp. 91–114, 2007. doi:10.1177/0583102406075428

[14] Pierce, S. G., Culshaw, Manson, Worden, Staszewski, W. J., "Application of ultrasonic Lamb wave techniques to the evaluation of advanced composite structures," in Smart Structures and Materials 2000: Sensory Phenomena and Measurement Instrumentation for Smart Structures and Materials, vol. 3986 of Proceedings of SPIE, pp. 93–103, March 2000.

[15] Alleyne, D. N., Cawley, "The interaction of Lamb waves with defects," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 39, no. 3, pp. 381–397, 1992.

[16] Grondel, Paget, Delebarre, Assaad, Levin, "Design of optimal configuration for generating A0 Lamb mode in a composite plate using piezoceramic transducers," Journal of the Acoustical Society of America, vol. 112, no. 1, pp. 84–90, 2002. doi:10.1121/1.1481062

[17] "A damage identification technique for CF/EP composite laminates using distributed piezoelectric transducers," Composite Structures, vol. 57, no. 1–4, pp. 465–471, 2002. doi:10.1016/S0263-8223(02)00115-0

[18] Lemistre, Balageas, D. L., "Structural health monitoring system based on diffracted Lamb wave analysis by multiresolution processing," Smart Materials and Structures, vol. 10, no. 3, pp. 504–511, 2001. doi:10.1088/0964-1726/10/3/312

[19] Yang, "Quantitative assessment of through-thickness crack size based on Lamb wave scattering in aluminium plates," NDT and E International, vol. 41, no. 1, pp. 59–68, 2008. doi:10.1016/j.ndteint.2007.07.003

[20] Giurgiutiu, Bao, Zhao, "Piezoelectric wafer active sensor embedded ultrasonics in beams and plates," Experimental Mechanics, vol. 43, no. 4, pp. 428–449, 2003. doi:10.1177/0014485103434008

[21] Cheng, "Correlative sensor array and its applications to identification of damage in plate-like structures," Structural Control and Health Monitoring. In press. doi:10.1002/stc.461

[22] Hutchins, D. A., Jansen, D. P., Edwards, "Lamb-wave tomography using non-contact transduction," Ultrasonics, vol. 31, no. 2, pp. 97–103, 1993. doi:10.1016/0041-624X(93)90039-3

[23] Wang, Meng, "Probabilistic damage identification based on correlation analysis using guided wave signals in aluminum plates," Structural Health Monitoring, vol. 9, no. 2, pp. 133–144, 2010. doi:10.1177/1475921709352145

[24] Peng, Yuan, "Damage localization on two-dimensional structure based on wavelet transform and active Lamb wave-based method," Materials Science Forum, vol. 475-479, no. 3, pp. 2119–2122, 2005.

[25] De Marchi, Marzani, Speciale, Viola, "A passive monitoring technique based on dispersion compensation to locate impacts in plate-like structures," Smart Materials and Structures, vol. 20, no. 3, article 035021, 2011. doi:10.1088/0964-1726/20/3/035021

[26] Perelli, De Marchi, Marzani, Speciale, "Acoustic emissions localization in plates with dispersion and reverberations by using sparse PZT sensors in passive mode," Smart Materials and Structures, vol. 21, no. 2, article 025010, 2012. doi:10.1088/0964-1726/21/2/025010

[27] Zhao, Gao, Zhang, Ayhan, Yan, Kwan, Rose, J. L., "Active health monitoring of an aircraft wing with embedded piezoelectric sensor/actuator network: I. Defect detection, localization and growth monitoring," Smart Materials and Structures, vol. 16, no. 4, pp. 1208–1217, 2007. doi:10.1088/0964-1726/16/4/032

[28] Wang, Cheng, Chen, "On selection of data fusion schemes for structural damage evaluation," Structural Health Monitoring, vol. 8, no. 3, pp. 223–241, 2009. doi:10.1177/1475921708102140

[29] Rosenblatt, "Remarks on some nonparametric estimates of a density function," Annals of Mathematical Statistics, vol. 27, no. 3, pp. 832–837, 1956. doi:10.1214/aoms/1177728190

[30] Parzen, "On estimation of a probability density function and mode," Annals of Mathematical Statistics, vol. 33, no. 3, pp. 1065–1076, 1962. doi:10.1214/aoms/1177704472

[31] Scott, D. W., "On optimal and data-based histograms," Biometrika, vol. 66, no. 3, pp. 605–610, 1979.

[32] Silverman, B. W., Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability, Chapman & Hall, CRC Press, London, UK, 1998.

[33] Graham, R. L., Knuth, D. E., Patashnik, Concrete Mathematics, 2nd edition, Addison-Wesley, Reading, Mass, USA, 1994.

[34] Botev, Z. I., Grotowski, J. F., Kroese, D. P., "Kernel density estimation via diffusion," Annals of Statistics, vol. 38, no. 5, pp. 2916–2957, 2010. doi:10.1214/10-AOS799

[35] Chaudhuri, Marron, J. S., "Scale space view of curve estimation," Annals of Statistics, vol. 28, no. 2, pp. 408–428, 2000.

[36] Larsson, Thomée, Partial Differential Equations with Numerical Methods, vol. 45 of Texts in Applied Mathematics, Springer, Berlin, Germany, 2003.

[37] Wand, M. P., Jones, M. C., Kernel Smoothing, vol. 60 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1995.

[38] Lilliefors, H. W., "On the Kolmogorov-Smirnov test for the exponential distribution with mean unknown," Journal of the American Statistical Association, vol. 64, no. 325, pp. 387–389, 1969.