In many applications, observed signals are contaminated by both random noise and blur. This paper proposes a blind deconvolution procedure for estimating a regression function with possible jumps preserved, by removing both noise and blur when recovering the signals. Our procedure is based on three local linear kernel estimates of the regression function, constructed from observations in a left-sided, a right-sided, and a two-sided neighborhood of a given point, respectively. The estimated function at the given point is then defined to be the one of the three estimates with the smallest weighted residual mean square. To better remove the noise and blur, this estimate can also be updated iteratively. The performance of this procedure is investigated with both simulated and real data examples, which show that our procedure performs well in various cases.
1. Introduction
Nonparametric regression analysis provides statistical tools for estimating regression functions from noisy data [1]. When the underlying regression function has jumps, the estimates produced by conventional nonparametric regression procedures are not statistically consistent at the jump positions. However, estimating jump regression functions is an important problem because the true regression functions are often discontinuous in applications [2]. This paper focuses on estimating a jump regression function when the observed data are contaminated by both random noise and blur.
In the literature, there are some existing procedures for estimating regression curves with jumps preserved in cases when only random noise is present in observed data. These procedures include the one-sided kernel estimation methods (e.g., [3–9]), the local least-squares estimation procedures (e.g., [10–12]), the wavelet transformation method [13], the spline smoothing method [14], and the direct estimation methods (e.g., [15, 16]).
In some applications, our observations are both blurred and contaminated by pointwise noise (e.g., signals of groundwater levels in geothermy). It is, therefore, important to remove both noise and blur when estimating the true regression function. In the nonparametric regression literature, we are not aware of any existing discussion of this problem. In the context of image processing, which can be regarded as a two-dimensional nonparametric regression problem [2], image deblurring is a related research area. Most existing image deblurring procedures assume that the point-spread function (psf), which describes the blurring mechanism, either is known or has a parametric form, and that it is homogeneous over the entire image (e.g., [17–20]). In many applications, however, it is difficult to specify the psf completely or partially by a parametric model. Our major goal in this paper is to provide a method for estimating the true regression function properly when both noise and blur are present and the psf is unspecified.
The remainder of the paper is organized as follows. In the next section, our proposed method is discussed in detail. Some comparative results based on several simulated examples are presented in Section 3. A real data application is presented in Section 4. Some concluding remarks are given in Section 5.
2. Our Proposed Method
Suppose that the regression model concerned is
$$y_i = h\otimes f(x_i) + \varepsilon_i, \quad i=1,2,\ldots,n, \tag{2.1}$$
where $0<x_1<x_2<\cdots<x_n<1$ are design points, $\varepsilon_i$ are i.i.d. random noise with mean 0 and variance $\sigma^2$, and $f$ is an unknown nonparametric regression function that is continuous in $[0,1]$ except at some jump points $0<s_1<s_2<\cdots<s_m<1$. In (2.1), $h$ is a psf, and $h\otimes f$ denotes the convolution between $h$ and $f$, defined by
$$h\otimes f(x) = \int_{\mathbb{R}} h(u)\,f(x-u)\,du. \tag{2.2}$$
If $h$ is nonzero only at the origin (i.e., it is a point mass at the origin and zero everywhere else), then there is no blurring in the observed curve. In such cases, model (2.1) is the conventional nonparametric regression model. Generally speaking, the problem described by model (2.1) is ill posed if both $h$ and $f$ are unknown, because infinitely many pairs of $h$ and $f$ correspond to the same response in such cases. Therefore, proper estimation of $f$ based on (2.1) is a challenging problem. As a demonstration, in Figure 1(a), the solid lines denote a true regression function $f$ with one jump point at $x=0.5$, the dotted curve denotes the blurred regression function $h\otimes f$, and the small pluses denote a set of observations from model (2.1).
Figure 1: (a) Solid lines denote the true regression function, and dotted curve denotes the blurred regression function. (b) Solid lines denote the estimate of the regression function by the proposed method, dotted curve denotes the blurred true regression function, and dashed curve denotes the conventional local linear kernel estimate of the regression function. In both plots, small pluses denote a set of observations generated from model (2.1).
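The non-uniqueness behind the ill-posedness of model (2.1) can be seen in a small worked example. The constant $c$ and the function $g$ below are notation introduced here for illustration only and do not appear elsewhere in the paper. For any constant $c\neq 0$,
$$(ch)\otimes(f/c)(x)=\int_{\mathbb{R}} c\,h(u)\,\frac{f(x-u)}{c}\,du=h\otimes f(x),$$
so the pairs $(h,f)$ and $(ch,f/c)$ give exactly the same blurred curve. Similarly, for any function $g$ for which the convolutions below are well defined, associativity of convolution gives
$$(h\otimes g)\otimes f(x)=h\otimes(g\otimes f)(x),$$
so part of the blur can be attributed either to the psf or to the regression function without changing the response.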
To estimate $f$ by conventional local linear kernel (LLK) smoothing [1], we would consider the problem
$$\min_{a,b}\sum_{i=1}^{n}\bigl\{y_i-[a+b(x_i-x)]\bigr\}^2 K\!\left(\frac{x_i-x}{h_n}\right), \tag{2.3}$$
where $K$ is a density kernel function with support $[-1,1]$ and $h_n$ is a bandwidth parameter. The solution of (2.3) for $a$, denoted by $\hat a_c(x)$, is defined as the LLK estimator of $f(x)$. In Figure 1(b), the dotted curve denotes the blurred regression function, and the dashed curve denotes the LLK estimate of $f$. It can be seen that the LLK estimate removes the noise well, but the blur is not removed; it is actually made more serious around the jump point.
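As a rough illustration of (2.3), the weighted least-squares problem can be solved in closed form through its 2×2 normal equations. The sketch below is a minimal NumPy version and not the authors' implementation; the function names `epanechnikov` and `llk_estimate` are hypothetical, and the Epanechnikov kernel is assumed as the choice of $K$.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, and 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def llk_estimate(x0, x, y, h):
    """Intercept of the kernel-weighted linear fit centered at x0, i.e. the
    minimizer of (2.3) with respect to a. Returns nan if fewer than two
    observations receive positive weight."""
    d = np.asarray(x, dtype=float) - x0
    y = np.asarray(y, dtype=float)
    w = epanechnikov(d / h)
    if np.count_nonzero(w) < 2:
        return float("nan")
    # Weighted moments that appear in the normal equations.
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
    t0, t1 = (w * y).sum(), (w * d * y).sum()
    det = s0 * s2 - s1 * s1
    return float((s2 * t0 - s1 * t1) / det)

# Usage on noise-free linear data, which a local linear fit reproduces.
x = np.linspace(0.0, 1.0, 101)
y = 2.0 + 3.0 * x
a_c = llk_estimate(0.5, x, y, h=0.1)  # approximately 2 + 3 * 0.5
```

Solving the normal equations directly avoids a general least-squares call for each evaluation point, which keeps the sketch short; a production version might prefer a numerically more robust solver.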
Qiu [15] finds that the major reason why the conventional LLK estimate cannot preserve the jump at a jump point is that it uses a “continuous” function (i.e., a linear function) locally to approximate the jump regression function. To overcome this limitation, Qiu suggests fitting a piecewise linear function around $x$ as follows:
$$\min_{a_l,b_l;\,a_r,b_r}\sum_{i=1}^{n}\bigl\{y_i-[a_l+b_l(x_i-x)]-[(a_r-a_l)I(x_i-x)+(b_r-b_l)(x_i-x)I(x_i-x)]\bigr\}^2 K\!\left(\frac{x_i-x}{h_n}\right), \tag{2.4}$$
where $I(\cdot)$ is an indicator function defined by $I(u)=1$ if $u\ge 0$ and $0$ otherwise. The solutions for $a_l$ and $a_r$ are denoted by $\hat a_l(x)$ and $\hat a_r(x)$. It is easy to see that $\hat a_l(x)$ and $\hat a_r(x)$ are actually LLK estimates of $f(x)$ constructed from observations in the left-sided neighborhood $[x-h_n,x)$ and the right-sided neighborhood $[x,x+h_n]$, respectively, with kernel functions $K_l(x)=K(x)I(-x)$ and $K_r(x)=K(x)I(x)$. Then, Qiu suggests the following jump-preserving estimate of $f(x)$:
$$\hat f_1(x)=\begin{cases}\hat a_l(x), & \text{if } \mathrm{WRMS}_l(x)<\mathrm{WRMS}_r(x),\\[2pt] \hat a_r(x), & \text{if } \mathrm{WRMS}_l(x)>\mathrm{WRMS}_r(x),\\[2pt] \dfrac{\hat a_l(x)+\hat a_r(x)}{2}, & \text{if } \mathrm{WRMS}_l(x)=\mathrm{WRMS}_r(x),\end{cases} \tag{2.5}$$
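where $\mathrm{WRMS}_l$ and $\mathrm{WRMS}_r$ are the weighted residual mean squares (WRMSs) of the left-sided and right-sided estimates, respectively, defined by
$$\mathrm{WRMS}_l(x)=\frac{\sum_{i=1}^{n}\bigl[y_i-\hat a_l(x)-\hat b_l(x)(x_i-x)\bigr]^2 K_l\bigl((x_i-x)/h_n\bigr)}{\sum_{i=1}^{n}K_l\bigl((x_i-x)/h_n\bigr)},\qquad \mathrm{WRMS}_r(x)=\frac{\sum_{i=1}^{n}\bigl[y_i-\hat a_r(x)-\hat b_r(x)(x_i-x)\bigr]^2 K_r\bigl((x_i-x)/h_n\bigr)}{\sum_{i=1}^{n}K_r\bigl((x_i-x)/h_n\bigr)}. \tag{2.6}$$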
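The one-sided fits, their WRMS values, and the selection rule for the jump-preserving estimate can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of [15]: the names `one_sided_fit` and `f1_hat` are hypothetical, the Epanechnikov kernel is assumed, and the left and right neighborhoods are taken as $[x-h_n,x)$ and $[x,x+h_n]$ as described in the text.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, and 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def one_sided_fit(x0, x, y, h, side):
    """Kernel-weighted linear fit at x0 using one side of x0 only.

    side="l" keeps x_i in [x0 - h, x0); side="r" keeps x_i in [x0, x0 + h].
    Returns (a_hat, wrms), or (nan, inf) if fewer than two points have weight.
    """
    d = np.asarray(x, dtype=float) - x0
    y = np.asarray(y, dtype=float)
    keep = d < 0 if side == "l" else d >= 0
    w = np.where(keep, epanechnikov(d / h), 0.0)
    if np.count_nonzero(w) < 2:
        return float("nan"), float("inf")
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
    t0, t1 = (w * y).sum(), (w * d * y).sum()
    det = s0 * s2 - s1 * s1
    a = (s2 * t0 - s1 * t1) / det          # fitted intercept
    b = (s0 * t1 - s1 * t0) / det          # fitted slope
    wrms = (w * (y - a - b * d) ** 2).sum() / s0
    return float(a), float(wrms)

def f1_hat(x0, x, y, h):
    """Pick the one-sided estimate with the smaller WRMS; average on ties."""
    a_l, wrms_l = one_sided_fit(x0, x, y, h, "l")
    a_r, wrms_r = one_sided_fit(x0, x, y, h, "r")
    if wrms_l < wrms_r:
        return a_l
    if wrms_l > wrms_r:
        return a_r
    return 0.5 * (a_l + a_r)

# Usage on a noise-free step function with a jump at 0.5.
x = np.linspace(0.0, 1.0, 201)
y = np.where(x >= 0.5, 1.0, 0.0)
left_of_jump = f1_hat(0.48, x, y, h=0.1)   # the left-sided window misses the jump
right_of_jump = f1_hat(0.52, x, y, h=0.1)  # the right-sided window misses the jump
```

In this toy example, the side whose window does not straddle the jump has a WRMS of essentially zero, so the rule selects it and the jump is not smoothed over.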
Qiu [15] proves that $\hat f_1$ is a consistent estimate of $f$ when there is no blurring in the observed data.
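Since only part of the observations is actually used in $\hat f_1$, this estimator would be quite noisy in continuity regions of $f$. To overcome this problem, similar to the method in [16], we propose a modification as follows:
$$\hat f_2(x)=\begin{cases}\hat a_c(x), & \text{if } \mathrm{WRMS}_c(x)\le\min[\mathrm{WRMS}_l(x),\mathrm{WRMS}_r(x)],\\[2pt] \hat a_l(x), & \text{if } \mathrm{WRMS}_l(x)<\mathrm{WRMS}_r(x),\\[2pt] \hat a_r(x), & \text{if } \mathrm{WRMS}_l(x)>\mathrm{WRMS}_r(x),\\[2pt] \dfrac{\hat a_l(x)+\hat a_r(x)}{2}, & \text{if } \mathrm{WRMS}_l(x)=\mathrm{WRMS}_r(x),\end{cases} \tag{2.7}$$
where $\hat a_c(x)$ is the conventional LLK estimate, $\mathrm{WRMS}_c(x)$ is its weighted residual mean square, defined analogously to $\mathrm{WRMS}_l(x)$ and $\mathrm{WRMS}_r(x)$ with the full kernel $K$ in place of $K_l$ or $K_r$, and the last three cases apply only when the condition of the first case fails.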
By (2.7), when $x$ is far away from any jump point, $f(x)$ would be estimated by the conventional LLK estimate. Around the jump points, it would still be estimated by one of the one-sided estimates. Therefore, $\hat f_2(x)$ is expected to estimate $f(x)$ better than $\hat f_1(x)$. The solid curve in Figure 2 denotes $\hat f_2$ constructed from the data shown in either plot of Figure 1. The two dotted curves show $\hat a_l$ and $\hat a_r$, respectively. It can be seen that $\hat f_2$ indeed estimates $f$ well in this case.
Figure 2: The solid curve denotes $\hat f_2$ constructed from the data shown in either plot of Figure 1. The two dotted curves show $\hat a_l$ and $\hat a_r$, respectively.
The estimate defined in (2.7) can also be updated iteratively as follows. The estimated values $\{\hat f_2(x_i), i=1,2,\ldots,n\}$ are treated as observed data, and the estimate $\hat f_2$ is recomputed by (2.7) with all quantities on the right-hand side of (2.7) computed from $\{\hat f_2(x_i), i=1,2,\ldots,n\}$; this process can then be repeated. Numerical results in the next section suggest that a good estimate can usually be obtained after about 5 iterations in all the cases considered there.
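The three-way selection in (2.7) and its iterative update can be sketched as below. This is a simplified illustration and not the authors' code: the names `local_fit`, `f2_hat`, and `iterate_f2` are hypothetical, the Epanechnikov kernel is assumed, each pass over the data is counted as one iteration, and the bandwidth is held fixed across passes for brevity, whereas the simulation study re-selects it by cross-validation in each iteration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, and 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_fit(x0, x, y, h, side):
    """Weighted linear fit at x0; side is "c" (two-sided), "l", or "r".
    Returns (a_hat, wrms), or (nan, inf) if fewer than two points have weight."""
    d = np.asarray(x, dtype=float) - x0
    y = np.asarray(y, dtype=float)
    if side == "l":
        keep = d < 0
    elif side == "r":
        keep = d >= 0
    else:
        keep = np.ones(d.shape, dtype=bool)
    w = np.where(keep, epanechnikov(d / h), 0.0)
    if np.count_nonzero(w) < 2:
        return float("nan"), float("inf")
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
    t0, t1 = (w * y).sum(), (w * d * y).sum()
    det = s0 * s2 - s1 * s1
    a = (s2 * t0 - s1 * t1) / det
    b = (s0 * t1 - s1 * t0) / det
    wrms = (w * (y - a - b * d) ** 2).sum() / s0
    return float(a), float(wrms)

def f2_hat(x0, x, y, h):
    """Selection rule (2.7): two-sided fit unless a one-sided fit has smaller WRMS."""
    a_c, w_c = local_fit(x0, x, y, h, "c")
    a_l, w_l = local_fit(x0, x, y, h, "l")
    a_r, w_r = local_fit(x0, x, y, h, "r")
    if w_c <= min(w_l, w_r):
        return a_c
    if w_l < w_r:
        return a_l
    if w_l > w_r:
        return a_r
    return 0.5 * (a_l + a_r)

def iterate_f2(x, y, h, n_iter=5):
    """Apply (2.7) repeatedly, feeding the fitted values back in as data."""
    x = np.asarray(x, dtype=float)
    fitted = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        fitted = np.array([f2_hat(x0, x, fitted, h) for x0 in x])
    return fitted

# Usage: noisy step function (seed and noise level are arbitrary choices).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
truth = np.where(x >= 0.5, 1.0, 0.0)
y = truth + rng.normal(0.0, 0.1, x.size)
fit = iterate_f2(x, y, h=0.1, n_iter=5)
```

Each pass evaluates three local fits at every design point, so the cost grows linearly with the number of iterations.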
In our procedure, the bandwidth parameter $h_n$ should be chosen properly. To this end, we use the following cross-validation (CV) procedure:
$$h_n=\arg\min_{h_n}\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i-\hat f_{-i}(x_i)\bigr)^2, \tag{2.8}$$
where $\hat f_{-i}(x_i)$ is the estimate of $f(x_i)$ computed from all of the data points except the $i$th point $(x_i,y_i)$.
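The leave-one-out criterion can be sketched as a search over a finite grid of candidate bandwidths. The code below is an illustration under stated assumptions and not the authors' implementation: the names `cv_score` and `select_bandwidth` are hypothetical, the candidate grid is arbitrary, and a plain LLK estimator with the Epanechnikov kernel is plugged in to keep the example short; any estimator with the same call signature could be passed instead.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, and 0 elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def llk_estimate(x0, x, y, h):
    """Two-sided local linear kernel estimate at x0 (nan if under-determined)."""
    d = x - x0
    w = epanechnikov(d / h)
    if np.count_nonzero(w) < 2:
        return float("nan")
    s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
    t0, t1 = (w * y).sum(), (w * d * y).sum()
    return float((s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1))

def cv_score(x, y, h, estimator):
    """Mean of (y_i - f_hat_{-i}(x_i))^2, where the i-th point is left out."""
    idx = np.arange(len(x))
    errs = [y[i] - estimator(x[i], x[idx != i], y[idx != i], h) for i in idx]
    return float(np.mean(np.square(errs)))

def select_bandwidth(x, y, candidates, estimator):
    """Return the candidate with the smallest CV score, plus all scores.
    Candidates whose score is nan (too few points in some window) are skipped."""
    scores = np.array([cv_score(x, y, h, estimator) for h in candidates])
    return candidates[int(np.nanargmin(scores))], scores

# Usage: for a smooth noise-free curve, the smaller bandwidth has less bias.
x = np.linspace(0.0, 1.0, 101)
y = np.sin(2.0 * np.pi * x)
h_cv, scores = select_bandwidth(x, y, [0.05, 0.3], llk_estimate)
```

With noisy data the criterion trades bias against variance, so the selected bandwidth is typically not the smallest candidate.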
3. Simulation Study
In this section, some simulated examples are presented concerning the numerical performance of our proposed procedure. In all numerical examples presented in this paper, the Epanechnikov kernel function $K(x)=\frac{3}{4}(1-x^2)I(|x|\le 1)$ is used. We consider the following two true regression functions:
$$f_1(x)=\begin{cases}4x, & \text{if } 0\le x\le 0.2,\\ 4(0.4-x), & \text{if } 0.2<x\le 0.4,\\ \exp(-2x^2)\sin(2.5\pi x)-1, & \text{if } 0.4<x\le 0.8,\\ \exp(-2x^2)\sin(2.5\pi x), & \text{if } 0.8<x\le 1,\end{cases}$$
$$f_2(x)=\begin{cases}2-2(0.26-x)^{0.2}, & \text{if } 0\le x\le 0.26,\\ 2-2(x-0.26)^{0.6}, & \text{if } 0.26<x\le 0.78,\\ 2-2(x-0.26)^{0.6}+1, & \text{if } 0.78<x\le 1.\end{cases}$$
The function $f_1$ has two jumps, at 0.4 and 0.8, and a roof discontinuity (i.e., a jump in the slope) at 0.2. The function $f_2$ has a jump at 0.78 and an unbalanced cusp (i.e., a sharp angle) at 0.26. These functions are shown by the solid curves in Figure 3. Our observations are generated from model (2.1) with random noise from the $N(0,\sigma^2)$ distribution and the psf $h(x)=(1-x^2)(1+\cos(1.5\pi x))I(|x|\le 1)$. One set of observations from each regression function is shown by the small pluses in Figure 3.
Figure 3: (a) Solid curve denotes the true regression function $f_1$, and small pluses denote a set of observations when $n=400$ and $\sigma=0.2$. (b) Solid curve denotes the true regression function $f_2$, and small pluses denote a set of observations when $n=400$ and $\sigma=0.2$.
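For readers who wish to reproduce the test functions, the two regression functions and the psf can be written down directly. The sketch below rests on one reading of the formulas: the exponential term in $f_1$ is taken to be $\exp(-2x^2)\sin(2.5\pi x)$, and the trailing numbers in $f_2$ are taken to be exponents. The function names are hypothetical, and the code only evaluates the functions; it does not generate blurred observations, since the discretization of the convolution is not specified in the text.

```python
import numpy as np

def f1(x):
    """Test function with jumps at 0.4 and 0.8 and a slope change at 0.2."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    wave = np.exp(-2.0 * x**2) * np.sin(2.5 * np.pi * x)
    # np.select returns the choice for the first condition that holds.
    return np.select(
        [x <= 0.2, x <= 0.4, x <= 0.8],
        [4.0 * x, 4.0 * (0.4 - x), wave - 1.0],
        default=wave,
    )

def f2(x):
    """Test function with an unbalanced cusp at 0.26 and a jump at 0.78."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # abs() only guards the branches that np.select evaluates but discards;
    # within each branch's own interval the argument is already nonnegative.
    left = 2.0 - 2.0 * np.abs(0.26 - x) ** 0.2
    right = 2.0 - 2.0 * np.abs(x - 0.26) ** 0.6
    return np.select([x <= 0.26, x <= 0.78], [left, right], default=right + 1.0)

def psf(x):
    """Point-spread function (1 - x^2)(1 + cos(1.5 pi x)) on [-1, 1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    inside = np.abs(x) <= 1.0
    return np.where(inside, (1.0 - x**2) * (1.0 + np.cos(1.5 * np.pi * x)), 0.0)

# Usage: jump sizes measured just to the right of each jump position.
eps = 1e-9
jump_f1_at_04 = f1(0.4 + eps)[0] - f1(0.4)[0]
jump_f1_at_08 = f1(0.8 + eps)[0] - f1(0.8)[0]
jump_f2_at_078 = f2(0.78 + eps)[0] - f2(0.78)[0]
```

Under this reading, $f_1$ drops by one unit at 0.4 and rises by one unit at 0.8, and $f_2$ rises by one unit at 0.78, which matches the description of the jumps in the text.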
For the proposed procedure, its Mean-Squared Error (MSE) values for several different $(n,\sigma)$ combinations are presented in Figure 4 as functions of the number of iterations, where the bandwidth $h_n$ is chosen by the CV procedure in each iteration. All of the MSE values are computed based on 100 replications. From the plots in Figure 4, it can be seen that, for each $(n,\sigma)$ combination, the MSE values first decrease and then increase with the iteration number. The optimal number of iterations is around 5 in each case, which is the number that we recommend using in applications.
Figure 4: MSE values of the estimated regression function. (a) $f=f_1$ and $n=200$, (b) $f=f_1$ and $n=400$, (c) $f=f_2$ and $n=200$, (d) $f=f_2$ and $n=400$.
Next, we compare the proposed procedure (denoted as NEW) with the conventional local linear kernel (LLK) smoothing procedure and the jump-preserving curve estimation (JPCE) procedure by Qiu [15]. Figure 5 presents the estimated regression functions by all three methods from the observed data shown in Figure 3. For procedure NEW, 5 iterations are used. From the plots in Figure 5, it can be seen that LLK blurs the jumps, JPCE preserves the jumps well but its estimates are quite noisy in continuity regions, and our proposed procedure NEW preserves the jumps well and also removes noise efficiently.
Figure 5: Solid curves denote the estimates by the proposed procedure NEW, dotted curves denote the estimates by LLK, and dashed curves denote the estimates by JPCE.
Tables 1 and 2 present the MSE values and the corresponding standard errors of the three methods in various cases. In the tables, New, CLLK, and JPCF refer to procedures NEW, LLK, and JPCE, respectively. We use 5 iterations in the proposed procedure NEW. From Tables 1 and 2, it can be seen that procedure NEW performs the best in all cases.
Table 1: Comparison of the MSE values of the three methods in various cases when $f=f_1$. The numbers in parentheses are the standard errors.

| Method | n=200, σ=.1 | n=200, σ=.2 | n=200, σ=.5 | n=400, σ=.1 | n=400, σ=.2 | n=400, σ=.5 | n=1000, σ=.1 | n=1000, σ=.2 | n=1000, σ=.5 |
|---|---|---|---|---|---|---|---|---|---|
| New | .0012 (.0001) | .0051 (.0003) | .0195 (.0008) | .0008 (.0001) | .0022 (.0001) | .0093 (.0004) | .0005 (.00004) | .0010 (.0007) | .0040 (.0002) |
| CLLK | .0066 (.0001) | .0118 (.0002) | .0275 (.0006) | .0042 (.0001) | .0079 (.0001) | .0203 (.0004) | .0029 (.00002) | .0051 (.0004) | .01218 (.0002) |
| JPCF | .0052 (.0003) | .0106 (.0004) | .0336 (.0009) | .0035 (.0002) | .0056 (.0002) | .0184 (.0005) | .0013 (.00007) | .0024 (.0007) | .0087 (.0002) |
Table 2: Comparison of the MSE values of the three methods in various cases when $f=f_2$. The numbers in parentheses are the standard errors.

| Method | n=200, σ=.1 | n=200, σ=.2 | n=200, σ=.5 | n=400, σ=.1 | n=400, σ=.2 | n=400, σ=.5 | n=1000, σ=.1 | n=1000, σ=.2 | n=1000, σ=.5 |
|---|---|---|---|---|---|---|---|---|---|
| New | .0012 (.0001) | .0051 (.0003) | .0195 (.0008) | .0006 (.00004) | .0018 (.0001) | .0088 (.0004) | .0002 (.00003) | .0012 (.00005) | .0038 (.0002) |
| CLLK | .0066 (.0001) | .0118 (.0002) | .0275 (.0006) | .00344 (.00003) | .0066 (.0001) | .0171 (.0004) | .0021 (.00002) | .0041 (.00004) | .0109 (.0002) |
| JPCF | .0052 (.0003) | .0106 (.0004) | .0336 (.0009) | .00222 (.0001) | .0043 (.0001) | .0151 (.0004) | .0017 (.00003) | .0020 (.00005) | .0086 (.0002) |
4. An Application
In this section, we apply our proposed method to a groundwater level data set. Possible jumps in groundwater level, which arise from changes in subsurface fluid currents, have become an important predictor of earthquakes. In Figure 6, the small pluses denote the groundwater level observed by the Seismograph Network Stations of China Earthquake Center between January and May 2007. The solid curve denotes the regression curve estimated by our proposed procedure, in which all parameters are chosen in the same way as in the simulation examples presented in Section 3. As indicated by the plot, the jumps around March 23, April 10, and April 17 are preserved well by our procedure. We checked the earthquake history, and China Earthquake Center confirmed that earthquakes with magnitudes greater than 4.0 occurred during these periods in the area of observation.
Figure 6: Small pluses denote the groundwater level observed by the Seismograph Network Stations of China Earthquake Center between January and May 2007, and solid curve denotes the regression curve estimated by our proposed procedure.
5. Concluding Remarks
We have presented a blind deconvolution procedure for jump-preserving curve estimation when both noise and blur are present in the observed data. Numerical results show that it performs well in various cases. However, theoretical properties of the proposed method are not yet available and require further research. We believe that the proposed method can be generalized to two-dimensional cases to solve problems such as image deblurring and restoration.
Acknowledgments
This research was supported in part by an NSF grant. The authors thank the guest editor of this special issue, Professor Ming Li, for help during the paper submission process. They also thank the three anonymous referees for their careful reading of the paper.
References
[1] J. Fan and I. Gijbels, vol. 66 of Monographs on Statistics and Applied Probability, Chapman & Hall, London, UK, 1996, xvi+341 pp. MR1383587, Zbl 1135.76475.
[2] P. Qiu, Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, NJ, USA, 2005, xxiv+305 pp. doi:10.1002/0471733156, MR2111430, Zbl 1180.91062.
[3] P. H. Qiu, C. Asano, and X. P. Li, "Estimation of jump regression function," vol. 24, no. 3-4, pp. 197–212, 1991. MR1110448, Zbl 0808.62060.
[4] H.-G. Müller, "Change-points in nonparametric regression analysis," vol. 20, no. 2, pp. 737–761, 1992. doi:10.1214/aos/1176348654, MR1165590, Zbl 0783.62032.
[5] J. S. Wu and C. K. Chu, "Kernel-type estimators of jump points and values of a regression function," vol. 21, no. 3, pp. 1545–1566, 1993. doi:10.1214/aos/1176349271, MR1241278, Zbl 0795.62043.
[6] P. H. Qiu, "Estimation of the number of jumps of the jump regression functions," vol. 23, no. 8, pp. 2141–2155, 1994. doi:10.1080/03610929408831378, MR1293176.
[7] C. R. Loader, "Change point estimation using nonparametric regression," vol. 24, no. 4, pp. 1667–1678, 1996. doi:10.1214/aos/1032298290, MR1416655, Zbl 0867.62033.
[8] B. Zhang, Z. Su, and P. Qiu, "On jump detection in regression curves using local polynomial kernel estimation," vol. 25, pp. 505–528, 2009.
[9] J.-H. Joo and P. Qiu, "Jump detection in a regression curve and its derivative," vol. 51, no. 3, pp. 289–305, 2009. doi:10.1198/tech.2009.07163, MR2562274.
[10] J. A. McDonald and A. B. Owen, "Smoothing with split linear fits," vol. 28, no. 3, pp. 195–208, 1986. doi:10.2307/1269075, MR853113, Zbl 0626.65010.
[11] P. Hall and D. M. Titterington, "Edge-preserving and peak-preserving smoothing," vol. 34, no. 4, pp. 429–440, 1992. doi:10.2307/1268942, MR1190262.
[12] P. Qiu and B. Yandell, "A local polynomial jump-detection algorithm in nonparametric regression," vol. 40, no. 2, pp. 141–152, 1998. doi:10.2307/1270648, MR1626927.
[13] Y. Wang, "Jump and sharp cusp detection by wavelets," vol. 82, no. 2, pp. 385–397, 1995. doi:10.1093/biomet/82.2.385, MR1354236, Zbl 0824.62031.
[14] J.-Y. Koo, "Spline estimation of discontinuous regression functions," vol. 6, no. 3, pp. 266–284, 1997. doi:10.2307/1390733, MR1466868.
[15] P. Qiu, "A jump-preserving curve fitting procedure based on local piecewise-linear kernel estimation," vol. 15, no. 4-5, pp. 437–453, 2003. doi:10.1080/10485250310001595083, MR2017479, Zbl 1054.62047.
[16] I. Gijbels, A. Lambert, and P. Qiu, "Jump-preserving regression and smoothing using local linear fitting: a compromise," vol. 59, no. 2, pp. 235–272, 2007. doi:10.1007/s10463-006-0045-9, MR2394168.
[17] A. S. Carasso, "Direct blind deconvolution," vol. 61, no. 6, pp. 1980–2007, 2001. doi:10.1137/S0036139999362592, MR1856879, Zbl 0980.68123.
[18] P. Hall and P. Qiu, "Blind deconvolution and deblurring in image analysis," vol. 17, no. 4, pp. 1483–1509, 2007. MR2398603, Zbl 1141.68645.
[19] P. Hall and P. Qiu, "Nonparametric estimation of a point-spread function in multivariate problems," vol. 35, no. 4, pp. 1512–1534, 2007. doi:10.1214/009053606000001442, MR2351095.
[20] P. Qiu, "A nonparametric procedure for blind image deblurring," vol. 52, no. 10, pp. 4828–4841, 2008. doi:10.1016/j.csda.2008.03.027, MR2521626.