Mathematical Problems in Engineering, Hindawi Publishing Corporation, vol. 2016, Article ID 5053434. doi:10.1155/2016/5053434

Research Article

A General Proximal Alternating Minimization Method with Application to Nonconvex Nonsmooth 1D Total Variation Denoising

Xiaoya Zhang,1 Tao Sun,1 and Lizhi Cheng1,2

1 College of Science, National University of Defense Technology, Changsha, Hunan 410073, China
2 The State Key Laboratory for High Performance Computation, National University of Defense Technology, Changsha, Hunan 410073, China

Academic Editor: Maria L. Gandarias

Received 15 June 2016; Accepted 10 October 2016

Copyright © 2016 Xiaoya Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We deal with a class of problems whose objective functions are compositions of nonconvex nonsmooth functions, a class with a wide range of applications in signal/image processing. We introduce a new auxiliary variable and propose an efficient general proximal alternating minimization algorithm, which solves this class of nonconvex nonsmooth problems by alternating minimization. We give a systematic convergence analysis for the algorithm. Simulation results and a comparison with two existing algorithms for 1D total variation denoising validate the efficiency of the proposed approach. The algorithm thus contributes to the analysis and application of a wide class of nonconvex nonsmooth problems.

1. Introduction

In the past few years, increasing attention has been paid to convex optimization problems, which consist of minimizing a sum of convex or smooth functions. Each of the objective functions enjoys several appreciated properties, such as strong convexity, Lipschitz continuity, or other convexity conditions. These properties usually lead to great computational advantages, and work on such convex problems has provided a sound theoretical foundation. Both the theoretical and the computational advantages have brought many benefits to practical use, particularly in signal/image processing, machine learning, computer vision, and so forth. However, what deserves special attention is the fact that convex or smooth models are often approximations of nonconvex nonsmooth problems. For example, the nonconvex $l_0$-norm in sparse recovery problems is routinely relaxed to the convex $l_1$-norm, and many related works have been developed [4, 5]. Although the difference between a nonconvex nonsmooth problem and its approximation vanishes in certain cases, it is sometimes nonnegligible. On the other hand, the excellent numerical performance of various nonconvex nonsmooth algorithms inspires researchers to continue in the direction of the nonconvex methodology.

Nonconvex and nonsmooth optimization problems are ubiquitous in different disciplines, including signal denoising, image deconvolution [8, 9], and other ill-posed inverse problems, to name a few. In this paper, we aim at solving the following generic nonconvex nonsmooth optimization problem, formulated in real Hilbert spaces $X$ and $U_m$ ($m=1,2,\dots,M$), for some $M\in\mathbb{N}_+$:
$$\hat{x}\in\arg\min_{x} P(x) := f(x)+\sum_{m=1}^{M}h_m(L_m x), \tag{1}$$
where (i) $f(\cdot):X\to\mathbb{R}\cup\{+\infty\}$ and $h_m(\cdot):U_m\to\mathbb{R}\cup\{+\infty\}$ are proper lower semicontinuous functions; (ii) the operators $L_m:X\to U_m$ are linear; and (iii) the set of minimizers is assumed to be nonempty.

It is quite meaningful to find a common convergent point in the optimal set of sums of simple functions [2, 11]. Insightful studies of nonconvex problems are presented in [12, 13]: if a nonconvex structured function of the type $L(x,y)=f(x)+Q(x,y)+g(y)$ has the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by a proximal alternating minimization algorithm converges to a critical point of $L$. Our main work builds on this convergence result and introduces a generic proximal minimization method for the general model (1). Note that if some of the functions in (1) are zero and each linear operator $L_m$ is the identity $\mathrm{Id}$, the model reduces to a common one in compressed sensing. The model we consider generalizes many application problems, such as the common "lasso" problem and the compositions in [2, 16]. Here, we provide a few typical examples.

Example 1 (1D total variation minimization [<xref ref-type="bibr" rid="B7">7</xref>]).

In this application, we need to solve the following denoising problem:
$$\hat{x}\in\arg\min_{x\in X}\|x-y\|_2^2+\lambda\|Dx\|_0, \tag{2}$$
where $y\in X$ is the input signal, $X=\mathbb{R}^n$, $\|z\|_0$ counts the number of nonzero elements of $z$, and $Dx := (x_2-x_1,\,x_3-x_2,\dots,x_n-x_{n-1})$ is the discrete derivative of the original signal $x$.

Noise removal is a basis and prerequisite for many subsequent applications. Among algorithms for it, the total variation (TV) regularizer is of great importance since it can efficiently deal with noisy signals that have sparse derivatives (or gradients), for instance, piecewise constant (PWC) signals consisting of flat sections separated by a number of abrupt jumps. The 1D total variation minimization extends naturally to 2D image restoration.
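As a concrete illustration of problem (2), a minimal sketch is given below (not the paper's code; `diff_op`, `tv_l0_objective`, and the sample signal are hypothetical names chosen for this example):

```python
import numpy as np

def diff_op(x):
    """First-order difference Dx = (x2 - x1, ..., xn - x_{n-1}) from (2)."""
    return x[1:] - x[:-1]

def tv_l0_objective(x, y, lam):
    """Nonconvex nonsmooth objective of problem (2):
    ||x - y||_2^2 + lam * ||Dx||_0."""
    return np.sum((x - y) ** 2) + lam * np.count_nonzero(diff_op(x))

# A piecewise constant signal: two jumps, so ||Dx||_0 = 2.
x_pwc = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 0.0, 0.0, 0.0])
```

The sketch shows why the $l_0$ term favors PWC signals: only the jump locations contribute to the penalty.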

Example 2 (group lasso [<xref ref-type="bibr" rid="B17">17</xref>]).

In this application, one needs to solve
$$\hat{x}\in\arg\min_{x\in\mathbb{R}^N}\frac{1}{2}\|Ax-y\|_2^2+\lambda\sum_{i=1}^{n}\|x_i\|_2, \tag{3}$$
where $x^T=(x_1^T,x_2^T,\dots,x_n^T)$, the $x_i\in\mathbb{R}^{p_i}$ are the decision block variables, and $\sum_{i=1}^{n}p_i=N$, with $p_i$ the corresponding block size.

In the past few years, research on structured sparse signal recovery has become very popular, and group lasso is a typical representative of these problems. It has attracted much attention in face recognition, multiband signal processing, and other machine learning problems. The general case also applies to many other kinds of structured sparse recovery problems, such as $l_{1,2}$-minimization and block sparse recovery.
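The group penalty in (3) can be sketched as follows (a minimal illustration with hypothetical names; not the paper's code):

```python
import numpy as np

def group_lasso_penalty(x, block_sizes):
    """Regularizer of (3): sum of the Euclidean norms of the blocks x_i."""
    penalty, start = 0.0, 0
    for p in block_sizes:          # p_i: size of the i-th block
        penalty += np.linalg.norm(x[start:start + p])
        start += p
    return penalty

# Blocks of sizes (2, 2, 1): the zero middle block contributes nothing,
# which is exactly the group-sparsity effect the penalty promotes.
x = np.array([3.0, 4.0, 0.0, 0.0, 5.0])
```

Unlike the plain $l_1$-norm, the penalty is zero on an entire block at once, which drives whole blocks of coefficients to zero.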

Example 3 (image deconvolution [<xref ref-type="bibr" rid="B20">20</xref>]).

In this application, one needs to solve
$$\hat{X}\in\arg\min_{X\in\mathcal{X}}\|\mathcal{A}X-Y\|_F^2+\lambda\,\mathrm{TV}(X), \tag{4}$$
where $\mathcal{X}=\mathbb{R}^{m\times n}$ and $\|X\|_F=\big(\sum_{ij}X_{i,j}^2\big)^{1/2}$. The discrete total variation, denoted by $\mathrm{TV}$ in (4), is defined as follows. We define the $r\times r$ first-order difference matrix
$$D_r=\begin{pmatrix}1&&&\\-1&1&&\\&\ddots&\ddots&\\&&-1&1\end{pmatrix}, \tag{5}$$
and the discrete gradient operator $D:\mathcal{X}\to\mathcal{X}^2$ is defined as
$$(DX)_{i,j}=\big((d_x)_{i,j},(d_y)_{i,j}\big),\qquad d_x=D_m X,\qquad d_y=X D_n^T. \tag{6}$$
Then we have $\mathrm{TV}(X)=\|DX\|_{1,2}=\sum_{ij}\sqrt{|(d_x)_{i,j}|^2+|(d_y)_{i,j}|^2}$.

The concept of deconvolution finds many applications in signal and image processing. In this paper, it is considered only as a specific case of problem (1), although problem (4), with its own details, deserves attention in its own right.
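The discrete TV in (4)–(6) can be computed directly. Below is a minimal sketch (not the paper's code) using forward differences with a zero boundary, one common discretization of the gradient operator $D$:

```python
import numpy as np

def discrete_tv(X):
    """Isotropic discrete TV: sum over pixels of sqrt(dx^2 + dy^2).

    dx, dy are forward differences (zero at the last row/column),
    mirroring d_x = D_m X and d_y = X D_n^T in (6)."""
    dx = np.zeros_like(X)
    dy = np.zeros_like(X)
    dx[:-1, :] = X[1:, :] - X[:-1, :]   # vertical differences
    dy[:, :-1] = X[:, 1:] - X[:, :-1]   # horizontal differences
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))
```

A constant image has zero TV, while each unit-height edge pixel contributes one to the sum.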

The main difficulty in solving (1) lies in the coupling of $x$ through the operators $L_m$. To surmount this computational barrier, we introduce new auxiliary variables and split the problem into two sequences of subproblems, described in detail in the next section; our problem is then an extension of previously studied formulations. This paper aims at giving a generic proximal alternating minimization method for the class of nonconvex nonsmooth problems (1), applicable in many fields. The key idea is the introduction of auxiliary variables, which splits the original problem into two kinds of easier nonconvex nonsmooth subproblems. Recent studies often place a reasonable assumption on the regularizers $h_m$, namely, that the proximal map of $h_m$ is easy to compute; the convergence analysis can then be carried out within the framework of the present work. In the last section, we show an application to nonconvex nonsmooth 1D total variation signal denoising.

2. Algorithm

In order to reduce the computational complexity caused by the composition of the nonconvex functions $h_m(\cdot)$ and the operators $L_m$, we introduce a sequence of auxiliary variables $\{\theta_m\}_{m=1}^M$. The problem in (1) can then be represented equivalently as
$$(\hat{x},\hat{\theta})\in\arg\min_{x,\theta} P(x,\theta) := f(x)+Q(x,\theta)+H(\theta), \tag{7}$$
where the $\{\theta_m\}_{m=1}^M$ are collected into a single variable $\theta$ and
$$Q(x,\theta) := \sum_{m=1}^{M}\frac{\tau_m}{2}\|\theta_m-L_m x\|_2^2;\qquad Q_m(x,\theta_m) := \frac{\tau_m}{2}\|\theta_m-L_m x\|_2^2;\qquad H(\theta) := \sum_{m=1}^{M}h_m(\theta_m). \tag{8}$$
The term $\sum_{m=1}^{M}(\tau_m/2)\|\theta_m-L_m x\|_2^2$ enforces high similarity between $\theta_m$ and $L_m x$ ($m=1,\dots,M$), and this quadratic minimization can be solved easily. Hence, the original complex composition is split into two simpler objectives.

Apparently, this form can now be solved by alternating proximal minimization: for each $m\in\{1,\dots,M\}$,
$$x^{k+1}\in\arg\min_{u} f(u)+Q(u,\theta^k)+\frac{1}{2\zeta_k}\|u-x^k\|_2^2; \tag{9a}$$
$$\theta_m^{k+1}\in\arg\min_{v_m} Q_m(x^{k+1},v_m)+h_m(v_m)+\frac{1}{2\mu_k}\|v_m-\theta_m^k\|_2^2. \tag{9b}$$

We then make the following assumptions (H) about (9a) and (9b): (i) $\inf_{X\times\prod_{m=1}^{M}U_m}\,f(x)+H(\theta)+Q(x,\theta)>-\infty$; (ii) $f(x^0)+H(\theta^0)+Q(x^0,\theta^0)$ is proper; (iii) for some positive $0<r_-<r_+$, the stepsize sequences satisfy $r_-<\zeta_k,\mu_k<r_+$ for all $k\geq 0$. These assumptions are weak and easily met: the three examples given in the first section all satisfy the first two conditions, and the third holds whenever proper parameters are set in practical tests. Besides, each of the proximal terms $(1/2\zeta_k)\|u-x^k\|_2^2$ and $(1/2\mu_k)\|v_m-\theta_m^k\|_2^2$ in (9a) and (9b) is used to meet the decrease condition on the objective function. If we omit them, the algorithm can still perform well in practice, but a direct general convergence theory can no longer be built; in that case, the method reduces to an alternating projection minimization method.
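The alternating structure of (9a) and (9b) can be sketched generically (a minimal illustration, not the paper's implementation; `solve_x` and `solve_theta_m` are hypothetical caller-supplied subproblem solvers):

```python
def proximal_alternating_min(x0, thetas0, solve_x, solve_theta_m,
                             zeta, mu, n_iter=100):
    """Sketch of the generic scheme (9a)-(9b).

    solve_x(thetas, x_prev, zeta) solves subproblem (9a);
    solve_theta_m(m, x, theta_prev, mu) solves subproblem (9b) for block m.
    Each solver is expected to include its proximal term
    1/(2*step) * ||. - previous iterate||^2."""
    x, thetas = x0, list(thetas0)
    for _ in range(n_iter):
        x = solve_x(thetas, x, zeta)                    # step (9a)
        thetas = [solve_theta_m(m, x, thetas[m], mu)    # step (9b)
                  for m in range(len(thetas))]
    return x, thetas
```

For instance, with $f=0$, $h_1=0$, $L_1=\mathrm{Id}$, and $M=1$, both subproblems have closed forms and the two iterates are pulled toward a common limit.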

3. Convergence Results

In fact, the problem is now a proximal alternating minimization case whose global convergence has already been analyzed in detail. That analysis concentrates on the problem $L(x,y)=f(x)+Q(x,y)+g(y)$ with the following scheme:
$$x^{k+1}\in\arg\min_{x\in\mathbb{R}^m} L(x,y^k)+\frac{1}{2\zeta_k}\|x-x^k\|_2^2,$$
$$y^{k+1}\in\arg\min_{y\in\mathbb{R}^n} L(x^{k+1},y)+\frac{1}{2\mu_k}\|y-y^k\|_2^2. \tag{10}$$
If $L$ has the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by the above algorithm converges to a critical point of $L$. Even the convergence rate of the algorithm can be computed; it depends on the geometric properties of the function $L$ around its critical points. Remarkably, the K-L property can be verified for many common functions.

The difference between that algorithm and ours is that our minimization is not over two variables $(x,y)$ but over $x$ and a sequence of variables $\{\theta_m\}_{m=1}^M$. In this section, our task is to establish analogous results for algorithms (9a) and (9b).

3.1. Preliminaries

Definition 4 (subdifferentials [<xref ref-type="bibr" rid="B24">24</xref>, <xref ref-type="bibr" rid="B25">25</xref>]).

Let  g:RN(-,+]  be a proper and lower semicontinuous function.

For a given $x\in\mathrm{dom}(g)$, the Fréchet subdifferential of $g$ at $x$, written as $\hat{\partial}g(x)$, is the set of all vectors $u\in\mathbb{R}^N$ which satisfy
$$\liminf_{y\to x,\;y\neq x}\frac{g(y)-g(x)-\langle u,\,y-x\rangle}{\|y-x\|_2}\geq 0. \tag{11}$$

When $x\notin\mathrm{dom}(g)$, we set $\hat{\partial}g(x)=\emptyset$.

The "limiting" subdifferential, or simply the subdifferential, of $g$ at $x\in\mathbb{R}^N$, written as $\partial g(x)$, is defined through the following closure process:
$$\partial g(x) := \big\{u\in\mathbb{R}^N : \exists\, x^k\to x,\; g(x^k)\to g(x),\; u^k\in\hat{\partial}g(x^k),\; u^k\to u \text{ as } k\to\infty\big\}. \tag{12}$$

A necessary condition for $x\in\mathbb{R}^N$ to be a minimizer of $g$ is
$$0\in\partial g(x). \tag{13}$$
A point that satisfies (13) is called a limiting-critical point, or simply a critical point. The set of critical points of $g$ is denoted by $\mathrm{crit}\,g$.

Given $(x^0,\theta^0)\in X\times\prod_{m=1}^{M}U_m$, recall that the sequence generated by (9a) and (9b) is of the form $(x^k,\theta^k)\to(x^{k+1},\theta^k)\to(x^{k+1},\theta^{k+1})$. From the basic properties of $P$, we can deduce a few important points.

Corollary 5.

Assume that the sequences $(x^k,\theta^k)$ are generated by (9a) and (9b) under assumption (H); then they are well defined. Moreover, the following hold:

(i) The estimate
$$P(x^k,\theta^k)+\frac{1}{2\zeta_{k-1}}\|x^k-x^{k-1}\|_2^2+\sum_{m=1}^{M}\frac{1}{2\mu_{k-1}}\|\theta_m^k-\theta_m^{k-1}\|_2^2\le P(x^{k-1},\theta^{k-1}),\quad k\ge 1, \tag{14}$$
holds; hence $P(x^k,\theta^k)$ does not increase.

(ii) $$\sum_{k=1}^{\infty}\Big(\|x^k-x^{k-1}\|_2^2+\sum_{m=1}^{M}\|\theta_m^k-\theta_m^{k-1}\|_2^2\Big)<\infty; \tag{15}$$
hence $\lim_{k\to\infty}\big(\|x^k-x^{k-1}\|_2+\sum_{m=1}^{M}\|\theta_m^k-\theta_m^{k-1}\|_2\big)=0$.

(iii) For $k\ge 1$, we have
$$\Big(\sum_{m=1}^{M}\nabla_x Q_m(x^k,\theta_m^k)-\sum_{m=1}^{M}\nabla_x Q_m(x^k,\theta_m^{k-1}),\;\mathbf{0}\Big)-\Big(\frac{1}{\zeta_{k-1}}(x^k-x^{k-1}),\;\Big(\frac{1}{\mu_{k-1}}(\theta_m^k-\theta_m^{k-1})\Big)_{m=1}^{M}\Big)\in\partial P(x^k,\theta^k), \tag{16}$$
where $\mathbf{0}$ is a multivector with the same dimension as $\prod_{m=1}^{M}\theta_m$.

(iv) Moreover, for every bounded subsequence $(x^{k_j},\theta^{k_j})$ of $(x^k,\theta^k)$, $\mathrm{dist}(0,\partial P(x^{k_j},\theta^{k_j}))\to 0$ as $k_j\to+\infty$.

Corollary 6.

Assume that (H) holds. Let $(x^k,\theta^k)$ be a sequence generated by (9a) and (9b), and let $\omega(x^0,\theta^0)$ denote the (possibly empty) set of its limit points. Then:

(i) if $(x^k,\theta^k)$ is bounded, then $\omega(x^0,\theta^0)$ is a nonempty compact connected set and $\mathrm{dist}((x^k,\theta^k),\omega(x^0,\theta^0))\to 0$ as $k\to+\infty$;

(ii) $\omega(x^0,\theta^0)\subset\mathrm{crit}\,P$;

(iii) $P$ is finite and constant on $\omega(x^0,\theta^0)$, equal to $\inf_{k\in\mathbb{N}}P(x^k,\theta^k)=\lim_{k\to+\infty}P(x^k,\theta^k)$.

The above corollary gives convergence results for sequences generated by (7), (9a), and (9b). Point (ii) guarantees that all limit points produced by (9a) and (9b) are limiting-critical points, and point (iii) shows that the objective $P$ converges to a finite constant value.

3.2. Convergence to a Critical Point

This part gives more precise convergence analysis about the proximal algorithms (9a) and (9b).

Let $f:X\to\mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous function. For $-\infty<\eta_1<\eta_2\leq+\infty$, let us set $[\eta_1<f<\eta_2] := \{x\in X : \eta_1<f(x)<\eta_2\}$. We now give an important definition from optimization theory.

Definition 7 (K-L property [<xref ref-type="bibr" rid="B12">12</xref>]).

The function $f$ is said to have the K-L property at $\bar{x}\in\mathrm{dom}\,f$ if there exist $\eta\in(0,+\infty]$, a neighborhood $U$ of $\bar{x}$, and a continuous concave function $\varphi:[0,\eta)\to\mathbb{R}_+$ such that:

(i) $\varphi(0)=0$;

(ii) $\varphi$ is $C^1$ on $(0,\eta)$;

(iii) for all $s\in(0,\eta)$, $\varphi'(s)>0$;

(iv) for all $x\in U\cap[f(\bar{x})<f<f(\bar{x})+\eta]$, the Kurdyka-Lojasiewicz inequality holds:
$$\varphi'\big(f(x)-f(\bar{x})\big)\,\mathrm{dist}\big(0,\partial f(x)\big)\geq 1. \tag{17}$$
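As a concrete illustration of the definition, here is a worked one-dimensional example (not taken from the paper) showing how $\eta$, $U$, and $\varphi$ are exhibited, for $f(x)=x^2$ at $\bar{x}=0$ with $U=\mathbb{R}$, $\eta=+\infty$, and the concave function $\varphi(s)=\sqrt{s}$:

```latex
f(x)-f(\bar{x})=x^2, \qquad \partial f(x)=\{2x\}, \qquad \varphi'(s)=\frac{1}{2\sqrt{s}},
% so for every x \neq 0 the Kurdyka-Lojasiewicz inequality (17) holds with equality:
\varphi'\!\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\!\big(0,\partial f(x)\big)
  = \frac{1}{2\sqrt{x^2}}\cdot 2|x| = 1 \ge 1.
```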

To verify that a function has the K-L property, one should exhibit $\eta$, $U$, and $\varphi$. Many convex functions, for instance, satisfy the above property with $U=\mathbb{R}^n$ and $\eta=+\infty$; many nonconvex examples are known as well.

Below, we give the convergence analysis to a critical point.

Theorem 8 (convergence).

Assume that $P$ satisfies (H) and has the Kurdyka-Lojasiewicz property at each point of its domain. Then

either (i) $\|(x^k,\theta^k)\|_2$ tends to infinity,

or (ii) the sequence $(x^k-x^{k-1},\theta^k-\theta^{k-1})$ is $l^1$; that is,
$$\sum_{k=1}^{+\infty}\Big(\|x^k-x^{k-1}\|_2+\sum_{m=1}^{M}\|\theta_m^k-\theta_m^{k-1}\|_2\Big)<+\infty, \tag{18}$$

and, as a consequence, $(x^k,\theta^k)$ converges to a critical point of $P$.

The proof of the above theorem follows the same line of analysis as in the work cited above, so we present only the convergence results and omit the proofs.

4. Application to 1D TV Denoising

In practical scientific and engineering contexts, noise removal is a basis and prerequisite of other subsequent applications, and it has received extensive attention. A range of computational algorithms have been proposed to solve the denoising problem. Among these solvers, the total variation (TV) regularizer is of great importance since it can efficiently deal with noisy signals that have sparse derivatives (or gradients). For instance, a noisy piecewise constant (PWC) signal, whose derivative is sparse relative to the signal dimension, can be denoised by the powerful TV denoising method.

In the 1D TV denoising problem, one needs to solve model (2). TV denoising minimizes a composite of two parts: the first keeps the error between the observed data and the original as small as possible; the second promotes sparsity of the gradients. Usually, the denoising model combines a quadratic data fidelity term with either a convex regularization term or a differentiable one, for example, the convex but nonsmooth problem
$$\arg\min_{x\in\mathbb{R}^n}\|y-x\|_2^2+\lambda\|Dx\|_1 \tag{19}$$
or the differentiable but nonconvex problem
$$\arg\min_{x\in\mathbb{R}^n}\|y-x\|_2^2+\lambda\sum_{i}\frac{1}{a}\log\big(1+a|(Dx)_i|\big),\quad a>0, \tag{20}$$
where $\|x\|_1=\sum_{i=1}^{n}|x_i|$. Exact solutions of the above two types can be obtained by very fast direct algorithms [7, 30]. In fact, the convex $l_1$-norm replaces the nonconvex $l_0$-norm in (19) because convex optimization techniques have been deeply studied. The latter penalties, like the logarithmic and arctangent penalties, can be solved by MM update iterations, in which the total objective function (data fidelity plus regularization) must remain strictly convex.

In this test, we apply algorithms (9a) and (9b) to this example. The auxiliary variable $\theta$ is introduced to reduce the complexity of the composition $\|Dx\|_0$. Then (2) is represented as
$$\arg\min_{x\in\mathbb{R}^n,\;\theta\in\mathbb{R}^{n-1}}\|x-y\|_2^2+\frac{\tau}{2}\|Dx-\theta\|_2^2+\lambda\|\theta\|_0. \tag{21}$$
Apparently this problem satisfies the convergence conditions. The concrete steps of algorithms (9a) and (9b) are
$$x^{k+1}\in\arg\min_{u\in\mathbb{R}^n}\|u-y\|_2^2+\frac{\tau}{2}\|Du-\theta^k\|_2^2+\frac{1}{2\zeta_k}\|u-x^k\|_2^2; \tag{22a}$$
$$\theta^{k+1}\in\arg\min_{v\in\mathbb{R}^{n-1}}\lambda\|v\|_0+\frac{\tau}{2}\|Dx^{k+1}-v\|_2^2+\frac{1}{2\mu_k}\|v-\theta^k\|_2^2. \tag{22b}$$

In practice, $\zeta_k$ and $\mu_k$ can be set to very large constants, so the proximal terms $(1/2\zeta_k)\|u-x^k\|_2^2$ and $(1/2\mu_k)\|v-\theta^k\|_2^2$ in each step become negligible. The computation of (22a) and (22b) then proceeds as follows.

Computation of (22a). Step (22a) is the minimization of a quadratic function and can be computed by many techniques, such as gradient descent.
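Since (22a) is a strictly convex quadratic, one alternative to gradient descent is to solve its optimality system directly. A minimal sketch (hypothetical helper name; the proximal term is dropped, as in the simplification above; the dense matrix is fine for moderate $n$):

```python
import numpy as np

def solve_x_step(y, theta, tau):
    """Direct solution of step (22a) with the proximal term dropped:
    setting the gradient of ||u - y||^2 + (tau/2)||D u - theta||^2 to zero
    gives the linear system (2 I + tau D^T D) u = 2 y + tau D^T theta."""
    n = len(y)
    D = np.zeros((n - 1, n))        # forward differences: (Dx)_i = x_{i+1} - x_i
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    A = 2.0 * np.eye(n) + tau * D.T @ D
    return np.linalg.solve(A, 2.0 * y + tau * D.T @ theta)
```

When $\theta=Dy$, both terms of the objective vanish at $u=y$, so the solver must return $y$ exactly; this gives a quick sanity check.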

Computation of (22b). Apparently, with the proximal term dropped, (22b) can be rewritten as the proximal operator of the function $g(\theta) := (\lambda/\tau)\|\theta\|_0$ evaluated at $Dx^{k+1}$, that is, $\mathrm{prox}_{(\lambda/\tau)\|\cdot\|_0}(Dx^{k+1})$. Consider the proximal operator $\mathrm{prox}_{(\lambda/\tau)|\cdot|_0}(u)$. When $n=1$, the $l_0$ norm reduces to the scalar counting function $|\cdot|_0$, for which one easily establishes that
$$\mathrm{prox}_{(\lambda/\tau)|\cdot|_0}(u)=\begin{cases}u,&\text{if }|u|>\sqrt{2\lambda/\tau};\\ \{0,u\},&\text{if }|u|=\sqrt{2\lambda/\tau};\\ 0,&\text{otherwise}.\end{cases} \tag{23}$$

When $n$ is arbitrary, trivial algebraic manipulation gives, for $u=(u_1,u_2,\dots,u_n)\in\mathbb{R}^n$,
$$\big(\mathrm{prox}_{(\lambda/\tau)\|\cdot\|_0}(u)\big)_i=\mathrm{prox}_{(\lambda/\tau)|\cdot|_0}(u_i), \tag{24}$$
and thus $\mathrm{prox}_{(\lambda/\tau)\|\theta\|_0}(Dx^{k+1})$ is a perfectly known object.
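Putting the two steps together, the iteration (22a)–(22b) can be sketched as follows (a minimal illustration with hypothetical names, with the proximal terms dropped; not the paper's tested code):

```python
import numpy as np

def tv_l0_denoise(y, lam, tau, n_iter=100):
    """Alternating scheme (22a)-(22b) with the proximal terms dropped.

    Step (22a): solve (2 I + tau D^T D) x = 2 y + tau D^T theta.
    Step (22b): componentwise hard threshold of D x at sqrt(2*lam/tau),
    i.e. the prox of (lam/tau)||.||_0 from (23)-(24)."""
    n = len(y)
    D = np.zeros((n - 1, n))        # forward-difference matrix
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    A = 2.0 * np.eye(n) + tau * D.T @ D
    thr = np.sqrt(2.0 * lam / tau)
    x, theta = y.copy(), D @ y
    for _ in range(n_iter):
        x = np.linalg.solve(A, 2.0 * y + tau * D.T @ theta)  # step (22a)
        theta = D @ x
        theta[np.abs(theta) <= thr] = 0.0                    # step (22b)
    return x
```

On a constant input, $Dy=0$, so the scheme is stationary at $x=y$, which gives a simple sanity check of the fixed-point behavior.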

Total variation denoising examples with three convex and nonconvex regularization instances (the two others are the convex algorithm and the nonconvex but smooth algorithm of [7, 30], resp.) are shown in Figure 1. The original piecewise constant signal $x^0\in\mathbb{R}^n$ with length $n=256$ is obtained with MakeSignal. The noisy data are obtained using additive white Gaussian noise (AWGN, $\sigma=0.5$). For both convex and nonconvex cases, we set $\lambda=\sqrt{n}\,\sigma/4$, consistent with the range suggested for standard (convex) TV denoising, and the nonconvexity parameter is set to its maximal value, $a=1/(4\lambda)$, the default choice there. These settings lead to the best denoising results in the respective papers, and all other settings are kept consistent with them. The maximum number of iterations is 500 in all cases, and all codes are tested on the same computer.

Total variation denoising with nonconvex nonsmooth  l0  penalty (RMSE = 0.1709), compared with convex  l1  penalty  (RMSE = 0.2720) and smooth log penalty  (RMSE = 0.2611).

Comparing our algorithm for the TV-$l_0$ norm with the algorithms proposed in [7, 30], ours achieves a better result with a smaller root-mean-square error (RMSE), where $\mathrm{RMSE}(x) := \sqrt{(1/n)\sum_{i=1}^{n}(x_i-x_i^0)^2}$. Referring to Figure 1, the best RMSE results for 1D TV denoising with the convex $l_1$ penalty and the smooth log penalty are 0.2720 and 0.2611, respectively, while ours is 0.1709, much better than the convex and smooth cases.

5. Conclusion

Nonconvex nonsmooth algorithms find interesting applications in many fields. In this paper, we give a general proximal alternating minimization method for a class of nonconvex nonsmooth problems with complex compositions. It has a concise form, solid theoretical results, and promising numerical performance. For the specific 1D standard TV denoising problem, the improvement over the existing algorithms [7, 30] is substantial. Besides, our algorithm applies to other nonconvex nonsmooth problems, such as block sparse recovery, group lasso, and image deconvolution, among many others.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China, Grant no. 61571008.

References

[1] Combettes P. L., Pesquet J.-C., "Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators," Set-Valued and Variational Analysis, vol. 20, no. 2, pp. 307–330, 2012.
[2] Condat L., "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.
[3] Esser J. E., Primal Dual Algorithms for Convex Models and Applications to Image Restoration, Registration and Nonlocal Inpainting, Ph.D. thesis, University of California, Los Angeles, Calif, USA, 2010.
[4] Zhao Y. B., Li D., "Reweighted l1-minimization for sparse solutions to underdetermined linear systems," SIAM Journal on Optimization, vol. 22, no. 3, pp. 1065–1088, 2012.
[5] Wright J., Ma Y., "Dense error correction via l1-minimization," IEEE Transactions on Information Theory, vol. 56, no. 7, pp. 3540–3560, 2010.
[6] Sun T., Zhang H., Cheng L., "Subgradient projection for sparse signal recovery with sparse noise," Electronics Letters, vol. 50, no. 17, pp. 1200–1202, 2014.
[7] Selesnick I. W., Parekh A., Bayram I., "Convex 1-D total variation denoising with non-convex regularization," IEEE Signal Processing Letters, vol. 22, no. 2, pp. 141–144, 2015.
[8] He L., Schaefer S., "Mesh denoising via L0 minimization," ACM Transactions on Graphics, vol. 32, no. 4, article 64, 2013.
[9] Xu L., Lu C., Xu Y., Jia J., "Image smoothing via L0 gradient minimization," ACM Transactions on Graphics, vol. 30, no. 6, article 174, 2011.
[10] Brandt C., Seidel H.-P., Hildebrandt K., "Optimal spline approximation via l0-minimization," Computer Graphics Forum, vol. 34, no. 2, pp. 617–626, 2015.
[11] Zhang H., Cheng L., Yin W., "A dual algorithm for a class of augmented convex models," https://arxiv.org/abs/1308.6337.
[12] Attouch H., Bolte J., Redont P., Soubeyran A., "Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality," Mathematics of Operations Research, vol. 35, no. 2, pp. 438–457, 2010.
[13] Attouch H., Bolte J., Svaiter B. F., "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods," Mathematical Programming, vol. 137, no. 1-2, pp. 91–129, 2013.
[14] Bolte J., Sabach S., Teboulle M., "Proximal alternating linearized minimization for nonconvex and nonsmooth problems," Mathematical Programming, vol. 146, no. 1-2, pp. 459–494, 2014.
[15] Tibshirani R., "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
[16] Chen G., Teboulle M., "A proximal-based decomposition method for convex minimization problems," Mathematical Programming, vol. 64, no. 1–3, pp. 81–101, 1994.
[17] Meier L., van de Geer S., Bühlmann P., "The group lasso for logistic regression," Journal of the Royal Statistical Society, Series B (Statistical Methodology), vol. 70, no. 1, pp. 53–71, 2008.
[18] Liao X., Li H., Carin L., "Generalized alternating projection for weighted-l2,1 minimization with applications to model-based compressive sensing," SIAM Journal on Imaging Sciences, vol. 7, no. 2, pp. 797–823, 2014.
[19] Elhamifar E., Vidal R., "Block-sparse recovery via convex optimization," IEEE Transactions on Signal Processing, vol. 60, no. 8, pp. 4094–4107, 2012.
[20] Condat L., "A generic proximal algorithm for convex optimization—application to total variation minimization," IEEE Signal Processing Letters, vol. 21, no. 8, pp. 985–989, 2014.
[21] Chan T. F., Wong C.-K., "Total variation blind deconvolution," IEEE Transactions on Image Processing, vol. 7, no. 3, pp. 370–375, 1998.
[22] He L., Marquina A., Osher S. J., "Blind deconvolution using TV regularization and Bregman iteration," International Journal of Imaging Systems and Technology, vol. 15, no. 1, pp. 74–83, 2005.
[23] Dey N., Blanc-Féraud L., Zimmer C., Roux P., Kam Z., Olivo-Marin J.-C., Zerubia J., "Richardson-Lucy algorithm with total variation regularization for 3D confocal microscope deconvolution," Microscopy Research and Technique, vol. 69, no. 4, pp. 260–266, 2006.
[24] Mordukhovich B. S., Variational Analysis and Generalized Differentiation I: Basic Theory, Springer, Berlin, Germany, 2006.
[25] Rockafellar R. T., Wets R. J. B., Variational Analysis, Springer, Berlin, Germany, 2009.
[26] Rudin L. I., Osher S., Fatemi E., "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1–4, pp. 259–268, 1992.
[27] Lysaker M., Lundervold A., Tai X.-C., "Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time," IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1579–1590, 2003.
[28] Lysaker M., Osher S., Tai X.-C., "Noise removal using smoothed normals and surface fitting," IEEE Transactions on Image Processing, vol. 13, no. 10, pp. 1345–1357, 2004.
[29] Little M. A., Jones N. S., "Generalized methods and solvers for noise removal from piecewise constant signals. I. Background theory," Proceedings of the Royal Society A, vol. 467, no. 2135, pp. 3088–3114, 2011.
[30] Dümbgen L., Kovac A., "Extensions of smoothing via taut strings," Electronic Journal of Statistics, vol. 3, pp. 41–75, 2009.