Self-Consistent Density Estimation in the Presence of Errors-in-Variables



Introduction
The statistical methodology of density estimation has been widely studied in the literature (see [1-5]) and remains an active topic in nonparametric statistics. For example, Park et al. [6] studied kernel-based local likelihood density estimation, Jones and Henderson [7] proposed a Gaussian copula kernel estimator, and Botev et al. [8] introduced an adaptive kernel density estimation method based on the smoothing properties of linear diffusion processes. The most commonly used nonparametric method is kernel density estimation. Usually, the choice of kernel function is not crucial, whereas the bandwidth parameter, which controls the degree of smoothing, must be chosen carefully. However, it is well known that choosing the bandwidth is difficult. Cross-validation techniques have been applied to this problem, but they are computationally expensive. Furthermore, both estimating the density and choosing the bandwidth become harder in the presence of measurement error.
In this paper, we consider density estimation in the presence of additive measurement error. Suppose we have n i.i.d. observations Y_1, . . ., Y_n, distributed as Y, from which we wish to estimate the unknown density f_X(x) of a random variable X, where

Y = X + ε, (1)

with a measurement error ε that is independent of X. The density function of ε is denoted by f_ε.
When the distribution of ε is known, statistical methodology for estimating the unknown density f_X(x) has been extensively discussed in the literature. For instance, Carroll and Hall [2] and Fan [9, 10] discussed the optimal rates of convergence over a class of functions whose derivatives are Lipschitz continuous using the kernel method. Zhang [11] studied deconvolution kernel density estimation of mixing densities and distributions and derived the optimal rates of convergence. See also Koo [12], Pensky and Vidakovic [13], Fan and Koo [14], and Comte et al. [15], among others, for earlier contributions. More recently, Butucea [4] and Butucea and Tsybakov [5] evaluated the minimax rate of convergence of the pointwise risk using the kernel method and computed upper bounds for the L_2 risk of the estimator. However, most of those papers deal only with theoretical aspects of the estimation, and very few address the important practical issue of choosing the bandwidth. Delaigle and Gijbels [16] proposed a bandwidth selection procedure based on bootstrap techniques and proved that the mean integrated squared error (MISE) of the bootstrap estimator of the density converges to the exact MISE. However, the algorithm is complex, and the quality of the density estimator depends strongly on the choice of the pilot bandwidth, which must be fixed before the bootstrap procedure.
Indeed, in most practical applications, the distribution of ε is not perfectly known. Meister [17] studied the effect of misspecifying the error density on the MISE of the deconvolution estimator and pointed out that the limit of the MISE can be infinite in some cases. Sometimes this problem can be solved by repeated observations of the same variable of interest, each time with an independent error; see, for example, Li and Vuong [18], Delaigle et al. [19], or Neumann [20] and the references therein. However, there are also many application fields where repeated measurements of the same variable are not possible. In that case, information about the error distribution can be drawn from an additional experiment; this means that knowledge of f_ε can be replaced by observations ε_{-1}, . . ., ε_{-m}, a noise sample with distribution f_ε that is independent of Y_1, . . ., Y_n. Neumann [21] and Kerkyacharian et al. [22] replaced the characteristic function of the error by its empirical version. Johannes [23] studied density deconvolution with unknown (but observed) noise and showed that the proposed estimators are asymptotically optimal in a minimax sense over a Sobolev space; the resulting estimators depend on two bandwidth-type parameters, but no data-driven selection of these bandwidths was given. Recently, Comte and Lacour [24] proposed an adaptive estimator of the density based on a data-driven model selection strategy and discussed its convergence rates. In addition, they studied the link between m and n required to preserve the rate found in the case where the distribution of ε is known.
In the present paper, we study model (1), with a known or unknown distribution of the error ε, and construct self-consistent estimators of the density f_X(x) by searching for the optimal kernel when the error distribution is ordinary smooth or super smooth. The method of constructing a density estimator via the optimal kernel was first proposed by Watson and Leadbetter [25], but the result cannot be used directly, since the Fourier transform of the true density is unknown. Bernacchia and Pigolotti [26] derived an exact expression for the density estimator based on the optimal kernel and called it the self-consistent estimate. They showed that the self-consistent method outperforms existing methods in all the examples they studied, but the data in their paper did not include measurement errors. The self-consistent method has several desirable features: the choice of the bandwidth-type parameter is more convenient than for kernel or spline methods, so the computing speed can be improved significantly; the resulting estimators are consistent; and the method is well suited to applications, especially to large datasets. Thus, it is of great significance to extend the self-consistent estimation method to this more general setting.
The paper is organized as follows. In Section 2, we propose the self-consistent method for density estimation with a known distribution of ε, or with the noise sample ε_{-1}, . . ., ε_{-m} when f_ε is unknown, and give the asymptotic properties. In Section 3, simulations are carried out to illustrate the efficacy of the proposed method. Two real data examples are used for illustration in Section 4. The proofs of the main results are included in the Appendix.

Error Distribution Is Known.
For descriptive convenience, we first introduce some notation. For two real numbers a and b, we write a ∧ b = min(a, b) and a_+ = max(a, 0). For a complex number z, let z̄ denote its conjugate and |z| its modulus. Let ‖g‖ be the L_2-norm of g; that is, ‖g‖² = ∫_R |g(x)|² dx. The Fourier transform of g is defined by φ_g(t) = ∫_R e^{itx} g(x) dx. Similarly, we write φ_ε(t) for the characteristic function corresponding to the known density f_ε of ε.
The smoothness of f_ε is described by the following assumption: there exist positive constants d_0, d_1, γ, and μ, and a constant γ_0 ∈ R (γ_0 > 0 if γ = 0), such that the characteristic function φ_ε(t) of the error distribution satisfies

d_0 (t² + 1)^{-γ_0/2} exp(-μ|t|^γ) ≤ |φ_ε(t)| ≤ d_1 (t² + 1)^{-γ_0/2} exp(-μ|t|^γ).

When γ > 0 we call the distribution of ε super smooth of order γ; for example, the standard normal and Cauchy distributions are super smooth with γ = 2 and γ = 1, respectively. When γ = 0 we call the distribution of ε ordinary smooth of order γ_0; for example, the gamma distribution Γ(α, β) and the double exponential distribution are ordinary smooth with γ_0 = α and γ_0 = 2, respectively. Assume that the density f_X(x) of X belongs to the smoothness space

{f : ∫_R |φ_f(t)|² (t² + 1)^δ exp(2a|t|^r) dt ≤ l},

where a > 0, r ≥ 0, l > 0, δ ∈ R, and δ > 1/2 if r = 0. If r > 0, the density f_X(x) is called super smooth, and ordinary smooth otherwise. Let f_Y denote the common density of the Y_j's. By (1) and the independence of X and ε, we have f_Y = f_X ∗ f_ε and hence φ_Y(t) = φ_X(t)φ_ε(t), where ∗ denotes convolution and φ_Y denotes the characteristic function of Y. Let f̂_Y be a kernel estimator of f_Y with kernel function K(·).
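To make the ordinary/super smooth distinction concrete, a small numerical sketch (the frequency grid and the unit scales are illustrative choices, not from the paper) compares the polynomial tail of the double exponential characteristic function with the exponential tail of the normal one:

```python
import numpy as np

# Characteristic functions of two error laws on a frequency grid.
t = np.linspace(1.0, 20.0, 100)
phi_laplace = 1.0 / (1.0 + t ** 2)   # double exponential: ordinary smooth, gamma_0 = 2
phi_normal = np.exp(-t ** 2 / 2.0)   # standard normal: super smooth, gamma = 2

# The polynomial tail dominates the exponential tail at high frequency,
# which is why deconvolution is statistically easier under ordinary smooth errors.
ratio = phi_laplace / phi_normal
```

The faster the error characteristic function decays, the larger the amplification of high-frequency noise when dividing by it, which is what the ordinary/super smooth orders quantify.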
Then the Fourier transform of f̂_Y is φ_{f̂_Y}(t) = φ_K(t)Δ(t), where Δ(t) = (1/n) ∑_{j=1}^{n} exp(itY_j) is the empirical characteristic function of Y. Therefore, under the classical assumption φ_ε(t) ≠ 0 for all t ∈ R, we can obtain an estimator of φ_X as φ̂_X(t) = φ_K(t)Δ(t)/φ_ε(t) (7). By the inverse Fourier transform, we obtain the estimator of f_X given in (8). Note that (8) does not depend on any bandwidth h, in contrast to the deconvolution kernel density estimation methods in the literature. Instead of fixing an arbitrary shape for the kernel K(·) and choosing an optimal bandwidth (see [1]), we look for an optimal shape of the kernel K(·) such that the estimator f̂(x) minimizes the mean integrated squared error MISE = E‖f̂ − f_X‖². By Parseval's theorem and (7), the MISE can be rewritten in the Fourier domain, and minimizing it over φ_K yields the optimal kernel, which in Fourier space reads as (12). Replacing φ_K(t) in (7) by the optimal kernel (12) and arguing as in Bernacchia and Pigolotti [26], the self-consistent estimator of φ_X is equivalent to solving a fixed-point equation. A simple calculation yields the solution (15), where A is the set of accepted frequencies (i.e., the frequencies giving a nonzero contribution to the estimate). In practical applications, once the sample Y_1, . . ., Y_n has been observed, we can restrict A to a bounded interval [−t_c, t_c], where t_c is a truncation parameter that can be chosen as follows.
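The deconvolution step behind (7) can be sketched numerically. In this minimal example the sample size, the Laplace noise scale, and the frequency grid are illustrative choices: the empirical characteristic function Δ(t) of the noisy sample is divided by φ_ε(t), giving a raw estimate of φ_X(t) before any kernel or truncation is applied.

```python
import numpy as np

def empirical_cf(y, t):
    """Empirical characteristic function Delta(t) = (1/n) sum_j exp(i t Y_j)."""
    return np.exp(1j * np.outer(t, y)).mean(axis=1)

# Hypothetical data: Y = X + eps with X ~ N(0, 1) and eps ~ Laplace(0, b).
rng = np.random.default_rng(0)
n, b = 2000, 0.5
y = rng.normal(0.0, 1.0, n) + rng.laplace(0.0, b, n)

t = np.linspace(-5.0, 5.0, 201)
delta = empirical_cf(y, t)
phi_eps = 1.0 / (1.0 + (b * t) ** 2)   # characteristic function of Laplace(0, b)
phi_x_hat = delta / phi_eps            # raw deconvolved estimate of phi_X(t)
```

At high frequencies the division amplifies sampling noise, which is exactly why the optimal kernel and the truncation to accepted frequencies are needed.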
Step 3. Repeat Step 2 until t_c satisfies the inequality condition.
From the simulation studies in Section 3, we can see that the choice of t_c is convenient and fast, and the density estimator is not sensitive to the choice of t_c.
By (15) and the inverse Fourier transform, the self-consistent estimate f̂_sc(x) of f_X(x) is defined by (17), and it can be rewritten in kernel form. We state the asymptotic behavior of the self-consistent estimator in the following theorem.
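A plausible sketch of the resulting Fourier-domain estimator uses the closed-form fixed point of Bernacchia and Pigolotti [26] applied to the deconvolved empirical characteristic function; the paper's exact expression (15) and its accepted-frequency set may differ in detail, so this is an assumption-laden illustration rather than the paper's formula.

```python
import numpy as np

def sc_fourier_estimate(delta, phi_eps, n):
    """Self-consistent estimate of phi_X on a frequency grid: the
    Bernacchia-Pigolotti fixed point evaluated at the deconvolved empirical
    characteristic function d(t) = Delta(t) / phi_eps(t) (a sketch)."""
    d = delta / phi_eps
    disc = 1.0 - 4.0 * (n - 1) / (n ** 2 * np.abs(d) ** 2)
    accept = disc >= 0.0                       # accepted frequencies A
    phi_k = np.where(
        accept,
        n / (2.0 * (n - 1)) * (1.0 + np.sqrt(np.maximum(disc, 0.0))),
        0.0,
    )
    return phi_k * d                           # optimal kernel times data
```

Frequencies where |d(t)| is too small are rejected (the kernel is set to zero there), which is the role of the accepted-frequency set: it replaces an explicit bandwidth choice.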
(b) The error distribution is ordinary smooth and t_c = O(n^{1/(2(γ_0+1))}). Then the self-consistent density estimator f̂_sc(x) defined by (17) is consistent; that is, f̂_sc(x) converges to f_X(x) in probability. Remark 2. As with many other methods (for example, higher-order kernel estimators, spline estimators, wavelet estimators, and orthogonal expansion estimators), the resulting self-consistent estimators are not necessarily nonnegative, but this can be corrected without any cost in error by translating the estimate downward until the positive part is normalized to 1 and setting the negative part to 0. For example, the modified estimator is defined by f̃_sc(x) = max{f̂_sc(x) − c, 0}, where c is chosen in such a way that ∫_{−∞}^{+∞} f̃_sc(x) dx = 1; see Efromovich [27], Glad et al. [28], and others for more details.
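The correction in Remark 2 can be sketched as a bisection search for the shift c on a fixed grid; the grid, the test density, and the iteration count below are illustrative choices.

```python
import numpy as np

def nonneg_correction(f_hat, dx):
    """Shift the estimate down by c and clip at zero, choosing c by bisection
    so that the positive part integrates to 1 (cf. Remark 2)."""
    mass = lambda c: np.maximum(f_hat - c, 0.0).sum() * dx
    lo, hi = f_hat.min() - 1.0, f_hat.max()   # mass(lo) >= 1 >= mass(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mass(mid) > 1.0 else (lo, mid)
    return np.maximum(f_hat - 0.5 * (lo + hi), 0.0)

# Hypothetical estimate on a grid that dips negative and over-integrates:
x = np.linspace(-3.0, 3.0, 601)
dx = x[1] - x[0]
f_hat = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi) + 0.05 * np.cos(3 * x)
f_tilde = nonneg_correction(f_hat, dx)
```

Since the clipped mass is continuous and decreasing in c, bisection converges to the unique shift that renormalizes the positive part to 1.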
Remark 5. In the cases 0 < γ < r, or γ_0 ≤ δ and γ = 0, one can check from the proof of the theorem in the Appendix that the estimator defined by (24) is consistent as long as the noise sample size m tends to infinity as n increases, while the rate at which m tends to infinity is not restricted. Therefore, in practice, the noise sample size m does not need to be very large in these cases.

Simulation Studies
In this section, we report simulation studies illustrating the finite-sample performance of the proposed self-consistent (SC) estimation method and compare it with the kernel (KN) method. For comparison, we compute the estimators for different signal densities and different types of noise. The following densities are considered: (a) the gamma distribution f(x) = 10.125x⁴e^{−3x} with shape parameter α = 5 and scale parameter β = 3. Two kinds of error distributions are considered to study their effects on the MISE of the estimators:
(c) double exponential error, where ε has a double exponential (Laplace) density; (d) normal error, where ε has a normal density. For the kernel method, we choose the Gaussian kernel for the double exponential error, and for the normal error we take a kernel with Fourier transform φ_K(t) = (1 − t²)²_+. For the unknown error distributions, we suppose that the noise sample ε_{-1}, . . ., ε_{-m} comes from these two error distributions. Throughout the simulations, we set n = 100, 200, 400 and m = 50, 100. The MISE of the estimators is computed over 1000 random samples, and the results are reported in Tables 1 and 2. For f_ε known and unknown, the gamma density estimates under double exponential noise and the normal density estimates under normal noise are shown in Figures 1 and 2, respectively.
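The simulation setup above can be sketched as follows; here a naive Gaussian KDE of the noisy sample stands in for the SC and KN estimators, 50 replications are used instead of the paper's 1000 to keep the sketch fast, and the grid, bandwidth, and noise scale are illustrative choices.

```python
import numpy as np

def true_density(x):
    # Gamma with shape 5 and rate 3: f(x) = 3^5 x^4 e^{-3x} / 4! = 10.125 x^4 e^{-3x}
    return 10.125 * x ** 4 * np.exp(-3.0 * x)

def naive_kde(y, x, h):
    # Gaussian KDE of the noisy sample (placeholder estimator for illustration)
    u = (x[:, None] - y[None, :]) / h
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 6.0, 301)
dx = x[1] - x[0]
ises = []
for _ in range(50):
    # Y = X + eps: gamma signal plus double exponential (Laplace) noise
    y = rng.gamma(5.0, 1.0 / 3.0, 200) + rng.laplace(0.0, 0.3, 200)
    f_hat = naive_kde(y, x, 0.25)
    ises.append(((f_hat - true_density(x)) ** 2).sum() * dx)
mise = np.mean(ises)
```

Averaging the integrated squared error over replications is exactly how the MISE entries of Tables 1-3 are estimated; in the paper the placeholder KDE is replaced by the SC and KN estimators.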
Tables 1 and 2 indicate the following simulation results.
(1) When the error distribution is unknown, we can see that estimating the characteristic function of the error does not spoil the procedure much. It even happens that the estimation with an unknown error distribution works better, which is probably due to the truncation (23).
(2) For the same observed sample, the self-consistent method performs slightly better than the kernel method. It is worth mentioning that the self-consistent method also requires less computing time than the kernel method, because the truncation parameter t_c can be chosen conveniently.
(3) The MISE obtained by the self-consistent and kernel methods decreases as n increases, whether the error distribution is known or unknown.
Figures 1 and 2 clearly show that the self-consistent estimators perform well, no matter whether the error distribution is known or unknown. Moreover, for m = 10, the results show that our method is still very satisfactory for a small noise sample size. Next, we evaluate the sensitivity of the self-consistent density estimation procedure to the choice of the truncation parameter t_c. In this simulation, we only consider the case where the error distribution is double exponential, and the value of t_c is fixed at t_c = t_c*, t_c*/1.25, and 1.25t_c*, respectively, where t_c* is obtained by the iterative algorithm given in Section 2.1. The MISE of the self-consistent estimator is reported in Table 3, from which we can see that, for given n, the performance of the proposed method is not sensitive to the choice of t_c.

Application to Real Data
We now use two real datasets to illustrate the proposed method. The first dataset concerns the direction chosen by an ant toward an evenly illuminated black target. The experiment with 100 ants was described by Fisher [29], who concluded that the ants tend to run toward the target with some moderate variation. For this particular example, Y, the point where an ant intersects a circle, can be treated as an indirect observation of the direction X chosen by the ant. The estimation method of (17) is used to analyze the dataset. In Figure 3, the solid line denotes the estimate based on the assumption that the data are accurately measured (i.e., Y = X), and the dashed and dot-dashed lines denote the cases of N(0, σ²) measurement errors with σ = 0.3 and σ = 0.5, respectively. We compare our results with those of Efromovich [30], who used the orthogonal series approach to estimate this density.
From Figure 3, we find that when the data are accurately measured (σ = 0), the corresponding density estimate is rather smooth and has a pronounced mode (the direction of the target) with a relatively large deviation. Interestingly, as the standard deviation σ increases, the density becomes more peaked and two additional local modes become more significant. Therefore, the assumption of measurement error reveals the presence of three distinct groups (strata) of ants: a majority clearly tends to move toward the target with rather small variation, and a minority tends to move in two other directions about the target, also with small variation. These findings basically agree with those discovered by Efromovich [30], although the two directions are not symmetric here, and the variation on the right side is slightly smaller than that on the left.
The second example involves estimating the density of the magnitudes of Alaskan earthquakes for the period from 1969 to 1978. The data come from the National Oceanic and Atmospheric Administration's Hypocenter Data File [31]. In this dataset, Y denotes the logarithm of the seismogram amplitude of longitudinal body waves for 62 Alaskan earthquakes. There is a measurement error associated with the observations, which includes errors made in determining the amplitude of ground motion arising from such things as the orientation of the limited number of observation stations relative to the fault plane of the earthquake. In Figure 4, the solid line denotes the estimate based on the assumption that the body waves are accurately measured (i.e., Y = X), and the dashed and dot-dashed lines denote the cases of Laplace measurement errors with location parameter μ = 0 and scale parameters σ = 0.3 and σ = 0.5, respectively.
Figure 4 clearly shows the nature of the indirect data. Under the assumption that the data are accurately measured (σ = 0), the corresponding density estimate is smooth and has two pronounced modes. For σ ≠ 0, the distinctions between the density estimates become more noticeable: the data tend toward a unimodal distribution as σ increases. It is clear that accounting for the measurement error allows one to gain additional insight into the data.

Figure 1: Estimates of the gamma density through double exponential noise for n = 400. The thick solid line is the true density; the thin solid line and the dashed line are the kernel estimator and the SC estimator when f_ε is known, respectively; and the dot-dashed line and the dotted line are the SC estimators when the noise sample size is m = 50 and m = 10, respectively.

Figure 3: Estimates of the density of the movement of ants. The solid line is the estimate based on the direct data, and the dashed and dot-dashed lines are the estimates under the assumption of normal error with zero mean and standard deviations σ = 0.3 and σ = 0.5, respectively.

Figure 4: Estimates of the density of body waves. The solid line is the estimate based on the direct data, and the dashed and dot-dashed lines are the estimates under the assumption of Laplace error with location parameter μ = 0 and scale parameters σ = 0.3 and σ = 0.5, respectively.

Table 1: MISE for the self-consistent estimator and the kernel estimator when the error distribution is double exponential.

Table 2: MISE for the self-consistent estimator and the kernel estimator when the error distribution is normal.

Table 3: MISE for the self-consistent estimator with different values of t_c when the error distribution is double exponential.