A General Proximal Alternating Minimization Method with Application to Nonconvex Nonsmooth 1D Total Variation Denoising

We deal with a class of problems whose objective functions are compositions of nonconvex nonsmooth functions, which has a wide range of applications in signal/image processing. We introduce a new auxiliary variable and propose an efficient general proximal alternating minimization algorithm. This method solves a class of nonconvex nonsmooth problems through alternating minimization. We give a systematic analysis to guarantee the convergence of the algorithm. Simulation results and a comparison with two existing algorithms for 1D total variation denoising validate the efficiency of the proposed approach. The algorithm contributes to the analysis and applications of a wide class of nonconvex nonsmooth problems.


Introduction
In the past few years, increasing attention has been paid to convex optimization problems, which consist of minimizing a sum of convex or smooth functions [1][2][3]. Each of the objective functions enjoys several desirable properties, like strong convexity, Lipschitz continuity, or other convexity conditions. These properties usually lead to great computational advantages. Meanwhile, work on such convex problems has provided a sound theoretical foundation. Both the theoretical and the computational advantages bring many benefits in practice, particularly in signal/image processing, machine learning, computer vision, and so forth. However, what deserves special attention is the fact that the convex or smooth models are frequently approximations of nonconvex nonsmooth problems. For example, the nonconvex $\ell_0$-norm in sparse recovery problems is routinely relaxed to the convex $\ell_1$-norm, and many related works have been developed [4,5]. Although the difference between the nonconvex nonsmooth problem and its approximation vanishes in certain cases, it is sometimes nonnegligible, as in the problem of [6]. On the other hand, the excellent numerical performance of various nonconvex nonsmooth algorithms inspires researchers to continue in the direction of nonconvex methodology.
Nonconvex nonsmooth optimization problems are ubiquitous in different disciplines, including signal denoising [7], image deconvolution [8,9], and other ill-posed inverse problems [10], to name a few. In this paper, we aim at solving the following generic nonconvex nonsmooth optimization problem, formulated in real Hilbert spaces $X$ and $\{U_i\}_{i=1,2,\ldots,m}$, for some $m \in \mathbb{N}_+$:
$\hat{x} \in \arg\min_{x \in X} \{ F(x) := f(x) + \sum_{i=1}^{m} h_i(A_i x) \}$, (1)
where (i) $f(\cdot) : X \to \mathbb{R} \cup \{+\infty\}$ and $h_i(\cdot) : U_i \to \mathbb{R} \cup \{+\infty\}$ are proper lower semicontinuous functions; (ii) the operators $A_i : X \to U_i$ are linear; and (iii) the set of minimizers is supposed to be nonempty.
It is quite meaningful to find a common convergent point in the optimal set of sums of simple functions [2,11]. Insightful studies of nonconvex problems are presented in [12,13]: if a nonconvex structured function of the type $L(x, y) = f(x) + Q(x, y) + g(y)$ has the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by a proximal alternating minimization algorithm converges to a critical point of $L$. Our work is mainly based on this convergence result and introduces a generic proximal minimization method for the general model (1). Note that if some of the functions in (1) are zero and the linear operator $A_i$ is the identity, the model reduces to the common one in compressed sensing [14]. The model we consider generalizes many application problems, such as the common "lasso" problem [15] and the composite problems in [2,16]. Here, we provide a few typical examples.
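As a small worked instance of how a familiar model fits the template (1), consider the lasso [15]. This sketch assumes that (1) has the form $F(x) = f(x) + \sum_{i=1}^{m} h_i(A_i x)$; the symbols $B$, $b$, and $\lambda$ are illustrative names introduced here and are not notation from the paper.

```latex
% Lasso as a special case of (1): one regularizer (m = 1) and the identity operator.
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|Bx - b\|_2^2 + \lambda \|x\|_1
\quad\Longleftrightarrow\quad
f(x) = \tfrac{1}{2}\|Bx - b\|_2^2, \qquad
m = 1, \qquad
h_1(u) = \lambda \|u\|_1, \qquad
A_1 = I .
```

Replacing $\|\cdot\|_1$ by $\|\cdot\|_0$ in $h_1$ would give a nonconvex nonsmooth member of the same family.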
Example 1 (1D total variation denoising). Noise removal is the basis and requisite of subsequent applications, and many algorithms for it rely on the total variation (TV) regularizer. This regularizer is of great importance since it can efficiently deal with noisy signals that have sparse derivatives (or gradients), for instance, piecewise constant (PWC) signals, which have flat sections separated by a number of abrupt jumps. The 1D total variation minimization can be extended to the related two-dimensional image restoration.
Example 2 (group lasso). In the past few years, research on structured sparse signal recovery has been very popular, and group lasso is a typical one of those important problems. It attracts much attention in face recognition, multiband signal processing, and other machine learning problems. The general case also applies to many other kinds of structured sparse recovery problems, like $\ell_{12}$-minimization [18] and block sparse recovery [19].
Example 3 (image deconvolution [20]). In this application, one needs to solve a minimization problem (4) over the space $X$ of real matrices, in which $\|x\|_F = [\sum_i \sum_j x_{i,j}^2]^{1/2}$ denotes the Frobenius norm. The discrete total variation, denoted by TV in (4), is defined through a difference matrix and the associated discrete gradient operator, which maps $X$ to $X^2$. The concept of deconvolution finds many applications in signal processing and image processing [21][22][23]. In this paper, it is considered only as a specific case of problem (1), although studying problem (4) in more detail is equally important.
The main difficulty in solving (1) lies in that $x$ is coupled with the operators $A_i$ inside the functions $h_i$. In order to surmount this computational barrier, we introduce a new auxiliary variable and split the problem into two sequences of subproblems, described in detail in the next section. Our problem is then an extension of the problem given in [12]. The paper aims at giving a generic proximal alternating minimization method for the class of nonconvex nonsmooth problems (1), to be applied in many fields. The key idea is to introduce an auxiliary variable and split the original problem into two kinds of easier nonconvex nonsmooth subproblems. Recent studies often make a reasonable assumption on the regularization $h_i$, namely, that the proximal map of $h_i$ is easy to calculate. The convergence analysis can then be extended from the context of [12]. In the last section, we show an application to nonconvex nonsmooth 1D total variation signal denoising.

Algorithm
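In order to reduce the computational complexity caused by the composition of the nonconvex function $h_i(\cdot)$ and the operator $A_i$, we introduce a sequence of auxiliary variables $\{\theta_i\}_{i=1}^{m}$, with $\theta_i \in U_i$ for each $i \in \{1, \ldots, m\}$. Then, the problem in (1) can be reformulated as follows:
$(\hat{x}, \hat{\theta}) \in \arg\min_{x, \theta} L(x, \theta)$,
where $\{\theta_i\}_{i=1}^{m}$ are represented as a concise whole $\theta$ and
$L(x, \theta) := f(x) + \sum_{i=1}^{m} h_i(\theta_i) + \sum_{i=1}^{m} (\mu_i/2) \|\theta_i - A_i x\|_2^2$.
The last term $\sum_{i=1}^{m} (\mu_i/2) \|\theta_i - A_i x\|_2^2$ ensures the high similarity between $\theta_i$ and $A_i x$ ($i = 1, \ldots, m$), and this quadratic term is easy to minimize. Hence, the original complex composite is split into two simpler objectives.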

(H)
Assumption (H) is mild and easily met.
The three examples given in the first section all satisfy the first two conditions, and the third condition holds when proper parameters are set in practical tests. Besides, the proximal terms in (9a) and (9b) are used to ensure the decrease condition of the objective function. If we omit them, the algorithm can still perform well, but a direct general convergence theory can no longer be established. In that case, it reduces to the alternating projection minimization method.
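To make the alternating structure concrete, the following is a minimal Python sketch of a proximal alternating minimization loop on a scalar toy problem. It assumes that each update takes the standard form of [12]: minimize the split objective in one block with the other block fixed, plus a quadratic proximal term. The toy objective, the function name `pam_scalar`, and the parameters `mu`, `c`, and `d` are illustrative choices, not taken from the paper.

```python
def pam_scalar(y, lam, mu, c=10.0, d=10.0, iters=200):
    """Proximal alternating minimization on the toy objective
        L(x, theta) = 0.5*(x - y)**2 + lam*[theta != 0] + 0.5*mu*(theta - x)**2.
    c and d are assumed proximal parameters (larger means a weaker proximal term).
    """
    x, theta = y, y  # start both blocks from the observation
    for _ in range(iters):
        # x-step: closed-form minimizer of
        # 0.5*(x - y)^2 + 0.5*mu*(theta - x)^2 + (x - x_old)^2 / (2c)
        x = (y + mu * theta + x / c) / (1.0 + mu + 1.0 / c)
        # theta-step: the two quadratics in theta merge into
        # 0.5*w*(theta - z)^2 + const; keeping z costs lam, dropping to 0
        # costs 0.5*w*z^2, so keep z only when that is cheaper.
        w = mu + 1.0 / d
        z = (mu * x + theta / d) / w
        theta = z if 0.5 * w * z * z > lam else 0.0
    return x, theta
```

With a large observation relative to `lam`, the auxiliary variable stays active and both blocks settle at `y`; with a small observation, `theta` is set to zero and `x` is pulled toward it.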

Convergence Results
In fact, the problem is now a case of proximal alternating minimization, whose global convergence has been analyzed in detail in [12]. That paper mainly concentrates on the theoretical analysis of problems of the form $L(x, y) = f(x) + Q(x, y) + g(y)$, minimized alternately in $x$ and $y$ with proximal regularization. If $L$ has the Kurdyka-Lojasiewicz (K-L) property, then each bounded sequence generated by that algorithm converges to a critical point of $L$. Even the convergence rate of the algorithm can be computed; it depends on the geometrical properties of the function $L$ around its critical points. It is remarkable that the K-L property can be verified for many common functions. The difference between the algorithm of [12] and ours is that our minimization is not over two variables $(x, y)$ but over $x$ and a sequence of variables $\{\theta_i\}_{i=1}^{m}$. In this section, our work is to give similar results for algorithms (9a) and (9b).

A necessary condition for $x \in X$ to be a minimizer of $F$ is
$0 \in \partial F(x)$. (13)
A point that satisfies (13) is called a limiting-critical point or simply a critical point. The set of critical points of $F$ is denoted by crit $F$.
The above proposition gives some convergence results about the sequences generated by (7), (9a), and (9b). Point (ii) guarantees that all limit points produced by (9a) and (9b) are limiting-critical points, and point (iii) states that the objective $L$ converges to a finite constant.

Convergence to a Critical Point
This part gives a more precise convergence analysis of the proximal algorithms (9a) and (9b), namely, convergence to a critical point.
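Theorem 8 (convergence). Assume that $L$ satisfies (H) and has the Kurdyka-Lojasiewicz property at each point of the domain of $L$. Then (i) either $\|(x^k, \theta^k)\|_2$ tends to infinity, (ii) or the increments $(x^{k+1} - x^k, \theta^{k+1} - \theta^k)$ are summable in norm, that is, the sequence has finite length, and, as a consequence, $(x^k, \theta^k)$ converges to a critical point of $L$.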
The proof of the above theorem follows the same analysis as in [12], so here we just present the convergence results and omit the proofs.

Application to 1D TV Denoising
In practical scientific and engineering contexts, noise removal is the basis and requisite of subsequent applications, and it has received extensive attention. A range of computational algorithms have been proposed to solve the denoising problem [26][27][28]. Among these approaches, the total variation (TV) regularizer is of great importance since it can efficiently deal with noisy signals that have sparse derivatives (or gradients). For instance, a noisy piecewise constant (PWC) [29] signal, whose derivative is sparse relative to the signal dimension, can be denoised by the TV denoising method.
In the 1D TV denoising problem [10], one needs to solve model (2). TV denoising minimizes a composite of two parts. The first part keeps the error between the observed data and the original signal as small as possible. The second promotes sparsity of the gradients. Usually, the denoising model is defined as a combination of a quadratic data fidelity term and either a convex regularization term or a differentiable regularization term, for example, the convex but nonsmooth problem [30]
$\arg\min_{x} \{ \frac{1}{2}\|y - x\|_2^2 + \lambda \|Dx\|_1 \}$ (19)
or the differentiable but nonconvex problem [7]
$\arg\min_{x} \{ \frac{1}{2}\|y - x\|_2^2 + \lambda \sum_{i} \phi([Dx]_i) \}$, (20)
where $y$ is the observed signal, $D$ is the first-order difference matrix, $\phi$ is a nonconvex penalty function, and $\|x\|_1 = \sum_{i=1}^{n} |x_i|$. Exact solutions of the above two types can be obtained by very fast direct algorithms [7,30]. In fact, the convex $\ell_1$-norm in (19) is a replacement for the nonconvex $\ell_0$-norm, since convex optimization techniques have been deeply studied. The latter type, with penalties like the logarithmic penalty and the arctangent penalty, can be solved by majorization-minimization (MM) update iterations, in which the total objective function (including data fidelity and regularization terms) should be strictly convex.

Figure 1: Total variation denoising with nonconvex nonsmooth $\ell_0$ penalty (RMSE = 0.1709), compared with convex $\ell_1$ penalty [21] (RMSE = 0.2720) and smooth log penalty [7] (RMSE = 0.2611).
In this test, we apply our algorithms (9a) and (9b) to this example. An auxiliary variable $v$ is introduced to reduce the complexity of the composition $\|Dx\|_0$. Then (2) is split into the two subproblems (22a), in $x$, and (22b), in $v$. In fact, in tests the proximal parameters $c_k$ and $d_k$ can be set to very large constants, so that the last proximal terms $(1/(2c_k))\|x - x^k\|_2^2$ and $(1/(2d_k))\|v - v^k\|_2^2$ in each computing step are negligible. Hence, the computation of (22a) and (22b) is as follows.
Computation of (22a). Step (22a) minimizes a quadratic function and can be computed through many techniques, such as gradient descent.
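As an alternative to gradient descent, the quadratic step can also be solved directly. The sketch below assumes that, with the proximal term neglected, (22a) reads $\min_x \frac{1}{2}\|y - x\|_2^2 + \frac{\mu}{2}\|v - Dx\|_2^2$ with $D$ the first-order difference matrix, so that the minimizer satisfies the tridiagonal system $(I + \mu D^{T} D)\,x = y + \mu D^{T} v$. The function name `x_update` and the use of the Thomas algorithm are choices made here for illustration, not the paper's implementation.

```python
def x_update(y, v, mu):
    """Solve (I + mu * D^T D) x = y + mu * D^T v by the Thomas algorithm,
    where (D x)[i] = x[i+1] - x[i] and len(v) == len(y) - 1."""
    n = len(y)
    if n == 1:
        return [float(y[0])]
    # Right-hand side y + mu * D^T v: each v[i] contributes -v[i] to entry i
    # and +v[i] to entry i+1.
    rhs = [float(t) for t in y]
    for i, vi in enumerate(v):
        rhs[i] -= mu * vi
        rhs[i + 1] += mu * vi
    # Tridiagonal matrix: diagonal 1+mu at both ends, 1+2*mu inside,
    # and -mu on the two off-diagonals.
    diag = [1.0 + 2.0 * mu] * n
    diag[0] = diag[-1] = 1.0 + mu
    off = -mu
    # Forward elimination.
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    # Back substitution.
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - off * x[i + 1]) / diag[i]
    return x
```

The system matrix is symmetric and strictly diagonally dominant, so the elimination needs no pivoting and costs a number of operations linear in the signal length.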

Computation of (22b).
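The latter step (22b) can be rewritten as a proximal operator of the $\ell_0$ function [12], that is, $\operatorname{prox}_{(\lambda/\mu)\|\cdot\|_0}(Dx^{k+1})$, where $\operatorname{prox}_{\gamma g}(u) := \arg\min_{w} \{ \gamma g(w) + \frac{1}{2}\|w - u\|_2^2 \}$. Consider the proximal operator $\operatorname{prox}_{(\lambda/\mu)\|\cdot\|_0}(u)$. When the dimension is $p = 1$, the $\ell_0$ norm reduces to $|\cdot|_0$, and one easily establishes that
$\operatorname{prox}_{(\lambda/\mu)|\cdot|_0}(u) = u$ if $|u| > \sqrt{2\lambda/\mu}$; $\{0, u\}$ if $|u| = \sqrt{2\lambda/\mu}$; and $0$ otherwise.
When $p$ is arbitrary, trivial algebraic manipulations give, with $u = (u_1, u_2, \ldots, u_p) \in \mathbb{R}^p$,
$\operatorname{prox}_{(\lambda/\mu)\|\cdot\|_0}(u) = (\operatorname{prox}_{(\lambda/\mu)|\cdot|_0}(u_1), \ldots, \operatorname{prox}_{(\lambda/\mu)|\cdot|_0}(u_p))$,
and thus $\operatorname{prox}_{(\lambda/\mu)\|\cdot\|_0}(Dx^{k+1})$ is a perfectly known object.

Figure 1 shows total variation denoising examples with three regularization instances: our nonconvex nonsmooth penalty and, for comparison, the convex algorithm of [30] and the nonconvex but smooth algorithm of [7]. The original piecewise constant signal $x^0 \in \mathbb{R}^n$ with length $n = 256$ is obtained with MakeSignal as in [7]. The noisy data is obtained by adding white Gaussian noise (AWGN) with $\sigma = 0.5$. For both convex and nonconvex cases, we set $\lambda = \sqrt{n}\,\sigma/4$, consistent with the range suggested in [30] for standard (convex) TV denoising, and the nonconvexity parameter is set to its maximal value, $a = 1/(4\lambda)$, the default in [7]. These settings could lead to the best denoising results in those papers. All the other settings are consistent with [7]. The maximum iteration number is 500 in all cases. All the codes are tested on the same computer.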
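The componentwise hard-thresholding rule for the $\ell_0$ proximal operator used in step (22b) can be sketched in a few lines of Python. The function name `prox_l0` and the parameter `gamma` (standing for the ratio of the regularization weight to the penalty weight) are illustrative names, and ties at the threshold are resolved to 0, which is one of the two valid choices.

```python
import math


def prox_l0(u, gamma):
    """Componentwise proximal operator of gamma * ||.||_0 with respect to
    0.5 * ||w - u||^2: keep u[i] when |u[i]| > sqrt(2 * gamma), else return 0.
    At |u[i]| == sqrt(2 * gamma) both 0 and u[i] are minimizers; 0 is chosen."""
    threshold = math.sqrt(2.0 * gamma)
    return [ui if abs(ui) > threshold else 0.0 for ui in u]
```

For example, `prox_l0([3.0, -0.5, 1.0, -2.0], 0.5)` uses the threshold 1 and keeps only the first and last entries. Unlike soft thresholding for the $\ell_1$ penalty, the surviving entries are not shrunk, which is consistent with the sharper jumps expected from an $\ell_0$ penalty.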
According to the comparison between our algorithm for the TV-$\ell_0$ model and the algorithms proposed in [7,30], our algorithm gives a better result, with a smaller root-mean-square error (RMSE), where $\mathrm{RMSE}(x) := \sqrt{(1/n) \sum_{i=1}^{n} \|x_i - x_i^0\|_2^2}$. Referring to Figure 1, the best RMSE results for 1D TV denoising with the convex $\ell_1$ penalty [21] and the smooth log penalty [7] are 0.2720 and 0.2611, respectively, whereas ours is 0.1709, about 35% lower than in the convex and smooth cases.
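The error measure used in this comparison can be computed as in the following minimal Python sketch. The function name `rmse` and the argument names are illustrative; the code simply evaluates the root-mean-square error between a denoised signal and the clean reference signal.

```python
import math


def rmse(x, x0):
    """Root-mean-square error between a signal x and the reference x0."""
    if len(x) != len(x0) or len(x) == 0:
        raise ValueError("x and x0 must be nonempty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x0)) / len(x))
```

Because the measure is averaged over the signal length, values obtained on signals of different lengths remain comparable.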

Conclusion
Nonconvex nonsmooth algorithms find interesting applications in many fields. In this paper, we give a general proximal alternating minimization method for a class of nonconvex nonsmooth problems with complex composition. It has a concise form, good theoretical results, and promising numerical results. For the specific 1D standard TV denoising problem, the improvement over the existing algorithms [7,30] is substantial. Besides, our algorithm can also be applied to other nonconvex nonsmooth problems, such as block sparse recovery, group lasso, and image deconvolution.