Total Variation Regularization Algorithms for Images Corrupted with Different Noise Models: A Review

Total Variation (TV) regularization has evolved from an image denoising method for images corrupted with Gaussian noise into a more general technique for inverse problems such as deblurring, blind deconvolution, and inpainting, which also encompasses the Impulse, Poisson, Speckle, and mixed noise models. This paper focuses on giving a summary of the most relevant TV numerical algorithms for solving the restoration problem for grayscale/color images corrupted with several noise models, that is, Gaussian, Salt & Pepper, Poisson, and Speckle (Gamma) noise models as well as for the mixed noise scenarios, such the mixed Gaussian and impulse model. We also include the description of the maximum a posteriori (MAP) estimator for each model as well as a summary of general optimization procedures that are typically used to solve the TV problem.


Introduction
Acquired digital images are subject to different kinds of noise [1,Chapter 7] depending on the hardware used for their acquisition which may involve additional degradations due to transmission errors or other external factors.The more common image noise models include the Gaussian, Impulse (e.g., Salt & Pepper), Poisson, and Speckle (e.g., Gamma) and the mixed Gaussian and impulse noise models.
While there are several algorithms that can be considered state of the art for a particular noise model, typically the adaptation of such specialized algorithms to handle other noise models has been proven to be either severely difficult or just plain impossible.It is in this regard that regularization methods stand atop due to their flexibility to use any given noise model; while there are several examples of such methods (Tikhonov regularization, Wavelet image restoration, sparsity based denoising and inversion, etc.), here we focus on the Total Variation (TV) regularization [2] method.The original TV regularization method targeted image denoising under Gaussian noise [2]; nevertheless it has evolved into a more general technique for inverse problems (see [3] for more specific details) while retaining its edge preserving property ( [4] gives an extended analysis of TV's properties).
The TV regularized solution of the inverse problem involving data b and observation operator  (the identity in the case of denoising) is the minimum of the functional where  is the data fidelity term, which depends on the noise model (for instance see (40), (42), (43), and ( 44)), the scalar  is the regularization parameter, ‖∇u‖ 1 represents the total variation of solution u, and the ℓ 1 norm of vector k = ∇u is denoted by ‖k‖ 1 = ∑  |V  |.Moreover, depending on the noise model a nonnegativity constraint may be imposed to (1).
While in Section 4 we will give a complete list of several types of numerical algorithms to solve (1), here we mention that such algorithms can be classified on those that need to solve a linear system of equations and those that do not.Usually the formers can handle the denoising as well as the deconvolution problems, whereas the latter can only handle the denoising problem.Moreover for the cases in which a nonnegativity constraint needs to be imposed to (1), such for the Poisson and Speckle noise models, numerical algorithms differ on how they handle the non-negativity constraint, which may add additional computational costs.
The outline of the paper is as follows: in Section 2 we describe the more common image noise models, where we also include a brief description on how to find the best estimator of an observed data corrupted with a known noise model using the criterion based on the MAP (maximum a posteriori) estimator.Next in Section 3, we provide a general review of optimization procedures, with a particular focus on the TV problem, where we also include a more detailed description of some general algorithms that usually do not appear on well-known textbooks.In Section 4 we summarize several TV numerical algorithms for each noise model (Gaussian, Impulse, Poisson, Gamma, and mixed Gaussian plus Impulse) considered in the present paper; we also include some details that differentiate the Total Variation formulation for grayscale from color (vector-valued) images.Numerical simulations and examples are presented throughout Section 5. Finally, concluding remarks are listed in Section 6.

Image Noise Models
In this section we will describe the more common image noise models including the Gaussian, Impulse, Poisson, Speckle (Gamma), and the mixed Gaussian and impulse noise models; nevertheless we will start by succinctly describing the basics on how to find the best estimator of an observed data corrupted with a known noise model using the criterion based on the MAP (maximum a posteriori) estimator (for a more detailed introduction to this topic, we recommend [5] and the many references therein).
We will assume that u * , b, and u are a 1-dimensional (column) or 1D vector that represents the original, observed, and reconstructed 2D grayscale or color images, respectively, obtained via any ordering (although the most reasonable choices are row major or column major), that is, for the grayscale case, b = { 1 ,  2 , . . .,   }, whereas for the case of a 2D color image it will be represented by the 1D (column) , where each b  with  ∈  = {1, 2, 3} represents a grayscale image.Additionally, we will also consider an observation operator  (see (1)) to be a  ×  matrix, which we will assume to be either the identity or to represent an observation operator (e.g., blur); furthermore for the case of color images, the observation operator  is assumed to be decoupled; that is,  is a diagonal block matrix with elements   ; if  is coupled (interchannel blur) due to channel crosstalk, it is possible to reduced it to a diagonal block matrix via a similarity transformation [6].
The MAP estimator assumes that prior noise model (in the observed image b) is known; then the joint probability distribution function for pixel , given by (  |  *  ), will be described by the particular noise model that degrades the observed image; furthermore, if we assume that the noise affecting the (observed) image is independent, then we can write Since the aim of any restoration procedure is to estimate u ≈ u * conditioned on the knowledge of the observed data b, Bayes' law directly leads to the description of the a posteriori conditional joint probability distribution function where the maximization of (3) will lead us to the classical MAP estimator; moreover, this maximization is equivalent to the minimization of − ln( U|B (u | b)).If we additionally assume that (u) ≈ (u * ) and (b) are constant, then the MAP estimator can be found by minimizing − ln( B|U (b | u)).
In the next subsections we will use the previous description for each particular noise model to find its MAP estimator, which will be incorporated as the fidelity term in (1).

The Gaussian Noise Model.
The additive Gaussian noise model, is the most frequently occurring noise model used to model thermal noise as well as the limiting behavior of other noise models (for details see the Central Limit theorem, [7,Chapter 8.4]); the model is given by where the forward linear operator  models an observation operator which we will assume to be either the identity or to represent a blur operator and  = { 1 ,  2 , . . .,   } is a sample of a random vector of  independent Gaussian variables, each with probability density function given by (1/ √ 2 2 )e − (  −) 2 /2 2 .Assuming that u ≈ u * and  = 0, it is straightforward to check that Using the MAP criterion (and since  is a constant), then the estimator will be found by minimizing − ln(e −‖u−‖ 2 2 /2 2 ). Typically this MAP estimator is referred as the solution of the optimization problem given by min This result is the classical and well-known direct approach for inverse filtering under additive Gaussian noise (see [8]).
We finish this subsection with a concisely list of general denoising methods under the additive Gaussian noise model, highlighting that [9] is a very good introduction to the classical methods for image restoration.Well-known restoration algorithms include the bilateral filter [10], Total Variation denoising [2], the SUSAN filter [11], Wavelet based denoising [12], nonlocal means [13], and so forth; several of the above mentioned algorithms are reviewed and compared in [13].Finally we mention that patch based approaches, which group 2-D images fragments (or patches) with similar characteristics, are widely considered to be the current leaders in terms of reconstruction quality for the denoising problem; for instance [14] (and derived variants), known as the BM3D algorithm, is a good example of such approaches.

The Impulse Noise
Model.Degradations due to (wireless) transmission errors appearing during the acquisition of digital images are usually modeled as Impulse noise, which can be expressed as where arithmetic operations are to be considered as elementwise, B is a sample drawn from an i.i.Since Impulse noise (and Salt & Pepper noise in particular) is an example of heavy tailed noise (its probability density function or pdf approaches to zero slower than the Gaussian pdf) it is usually modeled as additive white Laplacian noise (in [15] it was noticed that Laplace noise has the heavy tail behavior associated with impulsive noise); in other words, the model described by ( 7) is approximated by b = u * + , where  = { 1 ,  2 , . . .  } is a sample of a random vector of  independent Laplacian variables, each with probability density function given by (1/2)e −|  −|/ ; as in the previous sub-section, by assuming that u ≈ u * and  = 0, then it is easy to verify that Using the MAP criterion, the estimator can be found by solving the optimization problem given by The use of the ℓ 1 -norm (as in ( 9)) has attracted attention in the past decades [15][16][17][18][19][20] due to a number of advantages, including superior performance with impulse noise in scenarios ranging from detection to variational models.
There are several (nonvariational) approaches to denoise images corrupted with Salt & Pepper noise (as well as for the more general case of Impulse noise), here we succinctly mention the classical 2D median filter [21] and its variants, such the rank-order-based adaptive median filter [22] for the Salt & Pepper case (which uses two-phase approach: first detects the pixels corrupted with Salt & Pepper noise and then filters only those pixels), or the center-weighted median filter [23] for the random-valued impulse noise case.We also mention other methods that also use a two-phase approach: the progressive switching median filter [24], the filter based on the ranked over absolute difference statistics [25], the fuzzy impulse noise detection and reduction method [26], and the directional weighted median filter [27] among others.[7,Chapter 8.4]) even when  *  (value that represents the "count") is not low, since the variance of such approximation would be spatially dependent, and thus be in contradiction with the standard Gaussian noise model (which assumes a constant variance).

The
Following a similar approach as in the previous subsections, we will assume that each pixel of the observed image b = { 1 ,  2 , . . .,   } is a sample of a random vector of  independent Poisson variables, with a joint pdf given by where  = u * ≥ 0,  models an observation operator (identity or blur), and u * is the unknown noise-free image, which is usually assumed to be nonnegative.Using the MAP criterion, the estimator can found by solving the optimization problem given by where (u)  represents the th element of u (i.e., if k = u, then V  = (u)  ).The minimization problem summarized by (11) was first presented in [28,29], where the (now classical) Richardson-Lucy algorithm was introduced.
It must be noted that the functional (Poisson likelihood) (u) = ∑ (u)  − b  ⋅ log((u)  ) is not Lipschitz continuous when (u)  (for any ) is close to zero: that is, it is no possible to find a constant 0 <  < ∞ such that |(u (1) ) − (u (0) )|/|u (1) − u (0) | ≤  if either any element of u (0) or u (1) is (close to) zero; therefore, any given algorithm that optimizes a cost functional that includes that the Poisson likelihood must take this consideration into account.
Finally, we mention that there are several methods to restore (denoise/deconvolve) non-negative images corrupted with Poisson noise (a comprehensive review is given in [30]) which includes the Richardson-Lucy algorithm and its regularized variants as well as Wavelet methods, Bayesian methods, and so forth.Additionally we also notice that there are methods [31][32][33] based on Variance-Stabilizing Transformations or VST (the Anscombe [34] or Freeman-Tukey [35] VST for this particular case): given a set or sequence of nonhomoskedastic random variables (random variables with different finite variances), with the same underlying distribution family, a VST is a specific distribution transformation that renders that set or sequence of random variables into an homoskedastic one (random variables with the same finite variance); moreover such transformation is chosen so the transformed random variables are Gaussian with unit variance.These methods ( [31][32][33]) attain very high quality reconstruction results by using state-of-the-art denoising methods for the Gaussian noise model and then applying the inverse VST.
2.4.The Speckle (Gamma) Noise Model.For images acquired via ultrasound, SAR (synthetic aperture radar) or coherent light imaging systems (e.g., Fabry-Perot interferometer, etc.), Speckle noise is a critical factor that limits their visual perception and processing (see [36,37] among others).Typically, the multiplicative model is considered, where (as in previous sub-sections)  is the observation operator, u * is the noise-free data, and  is the noise; also, it is assumed that b, u * > 0.Moreover, in the Speckle noise model, it is considered that  = { 1 ,  2 , . . .,   } is a sample of a random vector of  independent Gamma variables, each with pdf given by   () = (  /Γ()) −1 e − , with E{  } = 1 and var{  } = 1/, where Γ() = ( − 1)! for a positive value of .
The derivation of the MAP estimator is not as straightforward as for the previous noise models; on what follows we summarized the derivation of the MAP estimator as originally presented in [38], which leads to a nonconvex optimization problem.
We first reproduce Proposition 3.1 from [38] (for the proof, see the mentioned reference): assume that u and  are independent random variables with independent continuous pdfs  U and  Z ; then for b = u ⋅  and u > 0 where arithmetic operations are to be understood as elementwise; using this result, then MAP estimator can be computed by maximizing such maximization derives in the following non-convex optimization problem As for the Poisson noise model, it must be noted that the functional (Gamma likelihood) (u  ) = ∑  (b  /(u)  ) + log((u)  ) is not Lipschitz continuous when (u)  (for any ) is close to zero, and thus any given algorithm that optimizes a cost functional that includes the Gamma likelihood must take this consideration into account.In [38] the authors give a detailed proof of the existence and the uniqueness for this MAP estimator (15) when used within a variational model.While in [37] an extensive list of despeckle filtering algorithms is provided, which includes methods based on linear filtering (first-order statistics, local statistics, etc.), nonlinear filtering (median, linear scaling, etc.), diffusion filtering (anisotropic diffusion, etc.), and wavelet filtering, we also mention that methods following a VST-like approach [39] can be also an effective alternative.

The Mixed Gaussian-Impulse Noise Model.
As mentioned before, acquired digital images are frequently subject to additive Gaussian noise; if we then add the degradation due to transmission errors, the observed image will be corrupted by Gaussian with salt-and-pepper (constant-valued impulse) noise or random-valued impulse noise.The mixed Gaussian and impulse noise model is summarized by where we have the same considerations as for the Impulse noise model (see Section 2.2, in particular the considerations for ( 7)), and additionally  = { 1 ,  2 , . . .,   } is a sample of the random vector of  independent Gaussian variables, each with probability density function given by (1/ √ 2 2 )e −(  −) 2 /2 2 .The key idea of the methods that target this mixed noise model scenario is to use a two-phase approach: detect the outlier pixels before proceeding with the filtering phase, where the detected outliers (pixels corrupted with Impulse noise) and the rest of the pixels are filtered using a suitable (and sometimes independent) method for the particular noise model.For example (in the case of non-TV based methods), [25] introduced an universal noise removal filter that first detects the impulse corrupted pixels and estimates its local statistics and incorporate them into the bilateral filter [10], resulting in the trilateral filter.In [40] somewhat similar ideas were used to develop a similarity principle which in turn drives the weights of the "mixed noise filter"; the reconstruction performance (as reported in [40]) outperformed that of the trilateral filter.
Although all the methods that target (16) assume that N, the set of outliers (pixels corrupted with impulse noise), is known; the fact is that N must be estimated; for this purpose there are several options (for a performance comparison between several of the listed methods we refer the reader to [41]): the rank-order-based adaptive median filter [22], the progressive switching median filter [24], the filter based on the ranked over absolute difference statistics [25], the fuzzy impulse noise detection and reduction method [26], the center weighted median filter [23], and the directional weighted median filter [27] among others.

Optimization Algorithms: A Review from the Total Variation's Perspective
Optimization theory deals with methods to find the argument of a cost function that minimizes (maximizes) it, where the cost function is a quantitative measure of the performance of a given system.In this Section we focus on listing general optimization procedures that are usually used to solve the TV problem, giving references to well-known textbooks or succinctly describing such general optimization procedures.For a general introduction to the topic of optimization theory we recommend [42,43] among other textbooks.We first start this Section by reminding of the general TV problem (as described in (1)) and by stressing that a non-negativity constraint (u ≥ 0) may be imposed to (17); also, throughout this paper we will consider that the discrete version of ‖∇u‖ 1 is given by where  ∈  = {1} (grayscale images case) or  ∈  = {1, 2, 3} (or "red, " "green", and "blue, " for the color images case; also note that  could represent an arbitrary number of channels), the horizontal and vertical discrete derivative operators are denoted by   and   , respectively, and that the scalar operations are considered to be applied elementwise, so that, for example, u = √k ⇒ [] = √V [𝑘].We also note that (18) where We also mention that in other cases an anisotropic separable approximation of ‖∇u‖ 1 is preferred, that is |∇u| ≈ |  u| + |  u| (for instance, see [48]).Since its original formulation [2], the development of numerical algorithms for solving (17), as well as its nonnegativity version, has attracted considerable interest; while in Section 4 we will give a complete list of such algorithms, here we focus on briefly describing or listing (whenever already described in well-known textbooks) general optimization procedures that are usually used to solve (17).For instance in [49,Chapter 8] a complete description of several general optimization procedures for TV are given, such as the Steepest Descent (also referred as the artificial time marching approach, see [44, Chapter 4.5.5]), the Newton method, the lagged fixed point iteration (originally described in [50,51]), and the Primal-dual Newton method (originally described in [52]).Although all of these algorithms focus on the ℓ 2 -TV case (Gaussian noise model), several of them have been used for other noise models (see Section 4).Similarly in [44], several numerical algorithms for TV are described; of particular interest is the dual formulation of the TV problem described in [44, Chapter 4.5.6],which is a different approach than the Primal-dual Newton method (and was used in [53] to derive an effective algorithm).
The enforcement of a nonnegativity constraint, that is,: u ≥ 0, for the solution of ( 17) is not only physically meaningful in most of the cases: images acquired by digital cameras, CT, and so forth; it also improves the quality of the reconstruction (see [54]).Nevertheless, the nonnegativity constraint is seldom considered in the practice (unless explicitly needed by the noise model), since it makes a hard problem even harder.There are several well-known methods to attack this problem: for instance in [49,Chapter 9] very good descriptions are given for the gradient projection method (which can be understood as a generalization of the Steepest Descent method) and for the Projected Newton method (as well as its variants).
We finalize this Section by succinctly describing three general optimization procedures that have been used to solve the TV problem: the Iterative Reweighted Least Squares (IRLS), the Alternating Direction method of multipliers (ADMM) or Split-Bregman (SB), and the Nonnegative Quadratic Programming (NQP) method.

Iterative Reweighted Least Squares (IRLS).
Since its introduction [55] the IRLS method has been applied to a variety of optimization problems.Originally, the IRLS method was used to solve the ℓ  minimization problem (u) = (1/)‖u − b‖    by iteratively approximating it by a weighted to the global minimizer [56], while for 1 ≤  < 2 the definition of the weighting matrix  () must be modified to avoid numerical instability due to division by zero or by a very small value.A standard approach is to threshold elements of u () − b in constructing the corresponding elements of  () , but other choices are also possible [57,58].For 0 <  < 1 it has been shown [58], within a sparse representation framework, that the IRLS algorithm not only converges but increases its convergence rate as  goes to zero.Without loss of generality, we will focus on the ℓ 2 -TV [2] case for grayscale images: is the observed noisy data,  is a weighting factor controlling the relative importance of the data fidelity and regularization terms, and u is the restored image data.
The key idea [59] is to express the regularization term by the quadratic approximation , where and   isa  ×  identity matrix, and ⊗ is the Kronecker product.The resulting iterations can be expressed in the form of the standard IRLS problem: For a given current solution u () , the weighting matrix  ()   can be easily computed, and the threshold (to avoid numerical instability due to division by zero or by a very small value) may be automatically adapted to the input image [59,Section 4.7].Finally, the resulting algorithm has to iteratively solve the linear system which is its most computationally demanding part.The same strategy can be used to solve all other noise models within the TV framework, including the vector-valued (color) TV.Moreover, for the particular case of denoising ( =  in ( 22)) the solution of the linear system described by ( 23) is given by where Ŵ()  =  ()  ; if we now apply the (well-known) matrix inversion lemma to (24), we get It is important to notice that ( Ŵ()  ) −1 can be computed directly (no division by zero); furthermore, by solving We can compute u (+1) via This approach ((26) and ( 27)) was first proposed, within the Total Variation framework (particularly for the ℓ 2 -TV denoising case), in [60].

Alternating Direction Method of Multipliers (ADMMs) or Split Bregman (SB).
Alternating minimization methods have become popular in the past few years due to their ability to solve ℓ 1 regularized problems, that is, (b | u) + (u), where  is the regularization term (being (17) a particular example with (u) = ‖∇u‖ 1 ), in a simple and computationally efficient fashion.Although there are several incarnations of these methods [61], we focus on the Split-Bregman (SB) [62,63] algorithm, while noting that it is now recognized that the SB algorithm is equivalent to the older Alternating Direction Method of Multipliers (ADMM) [64,65] algorithm.
The key idea of the SB method [62,63] is to introduce an auxiliary variable to modify the original cost functional (17) so that it can be iteratively minimized using simple steps per iteration; this is done by first considering the following constrain optimization problem the standard Lagrangian of (28) will lead us to (u, z, w) = (b | u) + (z) + w  (u − z); nevertheless the augmented Lagrangian   (u, z, w), is preferred since it gives a more robust framework (see [61,Section 2.3]).Furthermore, by setting r = u − z and noting that w  r + (/2)‖r‖ 2 2 = (/2)‖k + r‖ 2  2 + (/2)‖k‖ 2 2 , with w = k/, we can write (29) as which can be iteratively solved by considering the following alternating optimization problems: It is important to highlight some observations regarding the previous optimization problems: assuming that (b | u) is quadratic then (31) is a generalized Tikhonov problem step, which is simple to solve; if (z) can be expressed as the ℓ 1 norm of z, then (32) can be solved via shrinkage (in some cases the generalized shrinkage is needed, see [61, Section 4.1]).
It is also important to note that the SB (or ADMM) method can also be used to solve the non-negative quadratic optimization (see (35)) or to solve the optimization problems derived from the Poisson and Speckle noise models (see Sections 2.3 and 2.4) using a similar approach (as described in the previous paragraphs); for a complete description we refer the reader to [61, Section 5.2], [66,67], respectively.
The general SB (or ADMM) algorithm can be easily adapted to handle isotropic TV ((u) = ‖∇u‖ 1 , that should be replaced by its appropriate discrete version for grayscale and vector-valued images); as an example we succinctly focus on the ℓ 2 -TV [2] case for grayscale images: , where the auxiliary variable z = [z  , z  ]  is used along with the constrains z  =   u and z  =   u; among other operations, such as generalized shrinkage (see [63,Section 4.1]) and auxiliary vector updates (which are not computationally demanding) use to solve the particular optimization problems (related to the general ones, i.e., (32) and ( 33)), the SB-TV algorithm has to solve the linear system Note that the left-hand side (LHS) of ( 34) is constant across different iterations while its right-hand side (RHS) changes at each iteration; the opposite is true for the resulting linear system in (23).Furthermore, since LHS of ( 34) is constant across iterations (and strictly diagonally dominant in the practice), it seems natural (as suggested in [63]) to use the Gauss-Seidel method to solve (34); furthermore, when  =  or  is a circulant (diagonalizable by the Fourier transform) matrix, (34) may also be solved in the Fourier or DCT domain.
One key computational difference between the SB-TV and the IRLS based algorithm is that even though the number of floating-point operations for each SB-TV iteration is slightly smaller than that of each IRLS-TV iteration, typically the number of global iterations to attain good reconstruction results for the latter algorithm is less than for the former (see [68] for a detailed analysis).

Nonnegative Quadratic Programming (NQP).
Recently [69] an interesting and quite simple algorithm has been proposed to solve the NQP problem: where the matrix Φ is assumed to be symmetric and positive defined, and  max is some positive constant.The multiplicative updates for the NQP are summarized as follows (see [69] for details on derivation and convergence): where  () = Φ + u () , ^() = Φ − u () , and all algebraic operations in (37) are to be carried out element-wise.One key observation to consider is that this algorithm cannot be initialized with zeros.Interestingly, once an element has been zeroed by the multiplicative updates it remains zero under successive updates; this property is specially appealing when this algorithm is applied to sparse representation problems.Furthermore, the NQP is quite efficient and has been used to solve interesting problems such as statistical learning [69] and compressive sensing [70] among others.
We finalize this sub-section by noting that the constraint problem, could be iteratively solved using an IRLS type approach that benefits from the NQP algorithm, by approximating (38) with and by setting Φ =    ()  and c = −   () 1/2 b in (37).

Numerical Algorithms for TV
In the following sub-sections we will summarize a complete list of TV numerical algorithms for each noise model; we will also include the particular cost functional that is targeted in each case, highlighting particular cases such the methods that adapt the regularization parameter, either in a global or local fashion.

TV Numerical Algorithms for the Gaussian Noise Model.
The minimization of the cost functional, is usually referred to as the ℓ 2 -TV solution.Since its original formulation [2] a large number of algorithms have been proposed to specifically solve (40); generally speaking these algorithms can be classified based on the necessity (or not) to solve a linear system of equations.
For instance algorithms based on a dual formulation of (40) do not need to solve a linear system of equations and are usually computational efficient, although they lack the ability to handle a nontrivial forward (observation) operator  in (40); the Chambolle [53] and the Aujol's ℓ 2 -TV approximation (or A 2 BC model) [71] algorithms are examples of these type of methods.
On the other hand, methods that are based on the Euler-Lagrange equations (derived from the direct optimization of ( 40)) usually use a smooth approximation of ‖∇u‖ 1 (see Section 3) and need to solve a linear system of equations, which is usually carried out via conjugate gradient (CG), the preconditioned CG (PCG), or via the Fourier transform (to solve the linear system when the forward operator  is equivalent to a circular convolution).For example, we mention that in [2] the authors used the steepest descent (artificial time marching) algorithm along with a line-search to solve (40) (the computational performance of the resulting method is very slow compare to all others); more elaborated algorithms include the primal-dual method [52] (which although uses a dual formulations, it does need to solve a set of linear equations), the lagged diffusivity algorithm [50,51], the majorization-minimization or MM method [72,73] (which can be considered a generalization of the Expectation-Maximization, or EM algorithm [74]), as well as its variant [60] for the denoising case that uses the matrix inversion lemma (see (26) and ( 27)), the TwIST method [75] (which is similar to MM method, previously mentioned and uses a predetermined PCG splitting method), and the Iteratively Reweighted Norm (IRN) [59] (which can be understood as an IRLS method).We also mention the FTVd method [62,76] (which is a FFT based algorithm) and the linear programming via interior point method [77] (for which the linear system to be solved has twice as many variables as the original problem) and the Newton's method [49].

4.1.1.
Vector-Valued ℓ 2 -TV Algorithms.The TV minimization scheme for deblurring color images was first introduced in [46], and since then several alternatives have been proposed: in [78] the authors proposed an algorithm that considered the HSV color model, in [79] an algorithm based on a dual formulation (similar to Chambolle's [53]) was proposed (only the denoising case was considered), [80] presented an extension to [79] with a dual/alternate algorithm, in [81] the IRN algorithm [59] was modified to handle vectorvalued images (this algorithm is related to the IRLS algorithm for vector-valued datasets, see [6]), and in [82] the FTVd algorithm [62,76] was extended to handle vector-valued images.Finally we also mention that a direct extension of [50], called vectorial lagged diffusivity, was used in [83] for comparison purposes.

Nonnegative ℓ 2 -TV Algorithms.
For scalar (grayscale) images, only a handful of numerical algorithms, for example, [49,Chapter 9] and more recently [84] (non-negativity is handled via the NQP algorithm, see Section 3.3), [54] (nonnegativity is handled via a variant of the Primal-dual Newton method, [49,52,Chapter 8]), [85] (non-negativity is handled via combination of a gradient based approach along with an iterative shrinkage/thresholding algorithm), and [86] (nonnegativity is handled via the SB algorithm, see Section 3.2), include a non-negativity constraint on the solution of the ℓ 2 -TV problem (40), and to the best of our knowledge, for vector-valued (color) images only [87] (non-negativity is handled via the NQP algorithm, see Section 3.3) explicitly includes the non-negativity constraint within the ℓ 2 -TV framework.

Adaptation of the Regularization Parameter for ℓ 2 -TV.
Several methods have been proposed to adapt the regularization parameter for ℓ 2 -TV; they differ on how the amount of noise is estimated and on the nature of the regularization parameter which can be a scalar (global) value or a spatially dependent (diagonal matrix) set of regularization parameters.For the latter case, the regularization parameter is moved from the regularization term into the fidelity term; moreover if we consider Λ = diag(1/), then (40) can be written (for the denoising case) as where Λ will be updated for each pixel (independently) in an iteratively fashion.
Succinctly we mention that for the ℓ 2 -TV case in [88] two adaptive regularization schemes were presented to spatially update the regularization parameters.Also in [89] the Stein's Unbiased Risk Estimate (SURE) was used to estimate the mean square error between the observed image and the denoised one; a global regularization parameter was used.In [90] an extension to the Unbiased Predictive Risk Estimator (UPRE) was used to estimate the global regularization parameter as well.

TV Numerical Algorithms for the Salt & Pepper Noise
Model.TV for the Salt & Pepper noise model is usually referred as ℓ 1 -TV; in this case the minimizer of the cost functional, is the solution of the ℓ 1 -TV problem.The use of the ℓ 1 for the fidelity term in ( 42) was a significant development [16,17,19,20] for TV based restoration methods and attracted attention due to a number of advantages, including superior performance with impulse noise [91].
The numerical algorithms that solve (42) are based on a variety of methods (although they can also be loosely classified between methods that need to solve a linear system and methods that need not): for instance in [20,91] a smooth approximation of the ℓ 1 norm was used, along with the steepest descent algorithm (the computational performance of this algorithm is slow compared to all other options); in [92] a Markov random field model was used, where the resulting algorithm, that used the anisotropic separable approximation, operated over integer-valued images and did not need to solve a linear system of equations; moreover, similar ideas were used in [93][94][95] (which are related to an earlier algorithm [96]) which yielded to a computational efficient algorithm (specially [93,94]), [71] also uses the anisotropic separable approximation, and does not need to solve a linear system of equations nevertheless; it introduces an auxiliary variable (similar to the core idea of ADDM/SB, see Section 3.2) resulting in an alternating algorithm; a second-order cone programming approach was proposed in [97], although it had high memory requirements and needed to solve a linear system of equations; in [77] the ℓ 1 -TV problem was solved using (linear programming) interior point method; finally we mention that the IRN algorithm [59] (IRLS based) can also handle the ℓ 1 -TV problem (both denoising and deconvolution) and was a computational efficient alternative to methods based on the Markov random field model for the denoising case.

Adaptation of the Regularization Parameter for ℓ 1 -TV.
The original ℓ 1 -TV problem ( 42) features a single regularization parameter (), which influences the entire pixel set, and has a direct impact on the quality of the reconstructed data.Ideally, for the salt-and-pepper noise model, noisefree pixels should preserve their values in the reconstructed (denoised) image.However, the use of a global parameter forces the entire pixel set to be penalized, which results in an inaccurate reconstruction.To the best of our knowledge, [41,[98][99][100] are the only published papers that tackle the above-mentioned shortcomings for the ℓ 1 -TV problem.
All the above-mentioned algorithms (in a way or another) minimize the cost functional (u) = ‖Λ(u − b)‖ 1 + ‖∇u‖ 1 , where Λ represents a diagonal matrix, whose entries are set to zero if the pixel is declared noise-free.Nevertheless, whereas in [98] a set of corrupted pixel candidates were estimated (via [22]) to then proceed to solve the ℓ 1 -TV problem only for the corrupted candidates, being the regularization parameter hand-picked, in [99] and in [41,100] a scheme was proposed to spatially adapt the regularization parameter; for the former, such adaptation was based only on local statistics and on the estimation of the noise level (although all pixels are still regularized); for the latter, which also estimated the set of corrupted pixel candidates via [22], the adaptation was initially based on similar ideas [100], however it was then proved [41] that the scheme that obtains the best results (reconstruction quality) is such that the nonzero entries of Λ should be increased at each iteration.

TV Numerical Algorithms for the Poisson Noise Model.
The minimization of the TV regularized Poisson loglikelihood cost functional [66,[101][102][103][104][105][106][107][108], summarized by min (u) s.t.u ≥ 0, where has been successfully employed to restore non-negative images corrupted with Poisson noise from a number of applications, such as PET images [101], confocal microscopy images [102], and others.Within the TV framework, numerical algorithms that solve (43) can be loosely classified as those that use a secondorder Taylor approximation of the data fidelity term (u) to make (43) a tractable problem and those that use other alternative approaches.
For the latter case, we mention that in [101] an EM algorithm along with a smooth approximation of ‖∇u‖ 1 was proposed to solve the Poisson-TV problem; in [102] a multiplicative gradient based algorithm (equivalent to the penalized EM algorithm) was used; a multilevel algorithm was used in [103] to solve a modified version of (43); in [66] the authors used the ℓ 1 regularized loss minimization (a particular case of the ADMM algorithm, see [61, Section 6.3]) to minimize (43); in [107] two alternative variational methods were used (called L2-L2Log and TV-log) which can be understood as a change of variable (  = log((u)  )) in (43).
Algorithms that do use a second-order Taylor approximation of the data fidelity term (u) need to explicitly address the problem of it ((u)) being not Lipschitz continuous when any element of u is (close to) zero (see Section 2.3).This issue is typically addressed by using a modified version of the second order Taylor approximation or by adding a constant to the observed data, and although the exact original problem is not solved (due to the approximations involved) the reconstruction (numerical) results have proven to be competitive.Moreover, this type of algorithms need solve a linear system of equations and typically differ in the constrained optimization algorithm used to carry out the actual minimization and in whether the true Hessian of (u) or an approximation is used.For instance, in [109] a linear approximation of the Hessian is used along with a quasi-Newton optimization method, in [104] the minimization was carried out via a nonnegatively constrained, projected quasi-Newton minimization algorithm; ∇ 2 (u) was used.In [106] a constrained TV algorithm, described in [85], plus the approximation ∇ 2 (u) ≈   ,   > 0 was used.In [105] a Expectation-Maximization TV approach was used and only the denoising problem was addressed.In [108] the NQP algorithm was used to optimize (43) for both grayscale and color images; ∇ 2 (u) was used.

TV Numerical Algorithms for the Speckle (Gamma)
Noise Model.The minimization of the non-convex TV multiplicative model (introduced in [38]) summarized by min (u) s.t.u ≥ 0, where is not the only alternative within the TV framework to restore images corrupted with Speckle noise; here we mention that [110] (the first method within the TV framework), used a constrained optimization approach with two Lagrange multipliers; the denoising and deconvolution problems were addressed; additionally in [39] the multiplicative model was converted into an additive one and used a multigrid algorithm to solve the resulting Euler-Lagrange equation; Also in [111] the multiplicative model was converted into an additive one and used the SB (or ADMM) algorithm to solve the optimization problem; only the denoising problem was addressed.Moreover, a framework based on MRF with levelable priors for restoration of images corrupted by Gaussian or Speckle (Rayleigh) was proposed in [112] where only the denoising problem was addressed, and in [113] a hard thresholding of the curvelet transform of the log-image followed by a ℓ 1 -TV in the log-image domain was used; only the denoising problem was addressed.
Regarding the methods that use (44) within the TV framework, we first mention that [38] introduced such data fidelity term, which was derived using the MAP criterion; moreover, a detailed mathematical study of (44) was also carried in [38,Section 4] to address several issues, including the problem of (44) being not Lipschitz continuous (see Section 2.4): the unique solution to (44) is such that all its elements are strictly greater than zero.This result was reached without any assumption on the underlying optimization procedure to solve (44); furthermore, as to give numerical evidence on such result, in [38] an artificial time marching approach (steepest descent) was used to numerically solve the resulting Euler-Lagrange equation from (44); the denoising and deconvolution problems were addressed.A general TV formulation was proposed in [114] which included several models ( [38,110,112]) as special cases; it also replaced the regularizer TV() by TV(log ); only the denoising problem was addressed.In [115] the non-convex model introduced in [38] was augmented with the Weberized TV (a term of the form ‖∇u‖ 1 /u, see also [116]) as an extra regularizer and solved the Euler-Lagrange equation via a fixed-point iteration; only the denoising problem was addressed.In [117] a second order Taylor approximation of the data fidelity term (u) in ( 44) was used along with the NQP algorithm.
we also recall that the key idea of the non-TV based methods is to use a two-phase approach: detect the outlier pixels before proceeding with the filtering phase.Within the TV framework, most of the algorithms also start with an outlier detection pre-processing phase followed by a denoising phase.The approach in [119] was based on two augmented cost functionals ([119, equations (15)-( 16)]) which had to be chosen depending on the noise characteristics; the reconstruction performance was competitive for grayscale images, but its computational performance was extremely poor.In [120] the computational performance of [119] was improved, although it still was highly dependent on the noise level, specially for high noise levels; In [121] an ℓ 1 -ℓ 0 minimization approach was proposed for grayscale and color images, resulting in a three-phase algorithm; reconstruction quality results and computational performance were quite good when compared with published works (and could be considered state-of-the-art), but the proposed cost functional ([121, Equation (5)]) is complicated and has several regularization parameters which have to be hand-picked, and the use of a dictionary learning second phase affects the overall computational performance.In [118] a spatially adaptive algorithm was proposed, along with the cost functional (u) = ‖Λ  ⋅ (u − b)‖ 1 +(1/2)‖Λ  ⋅ (u − b)‖ 2  2 +‖∇u‖ 1 (minimized via an IRLS based algorithm) to address the denoising problem; the computational performance of this algorithm greatly exceeded that of [120,121] while its reconstruction quality results can be place somewhere in between those reported by [120,121].

Numerical Simulations and Examples
The aim of this Section is to provide practical examples of restoration results that can be obtained with TV for different noise models, for which we have used several (grayscale and color) images from [122] as test images, which are scaled between 0 and 1 before blurring (by a 7 × 7 out-of-focus kernel); afterwards, they are corrupted with noise (models described in Section 2).To assess the reconstruction quality of the restored images the SNR (all cases) and SSIM (only for grayscale images) [123] metrics are reported.
Results presented throughout this Section may be reproduced using the [124], a Matlab-only implementation of IRLS based algorithms for TV with different fidelity terms.Since it is not the aim of this paper to present a computational performance comparison between the many TV numerical algorithms (for all the considered different noise models) here we will only report the time (computational) performance of [124] for completeness' sake and to offer the reader a rough idea of the processing time for each case (elapsed for Matlab, version R2011a, running on a 2.20 GHz Intel core i7 CPU laptop, with 6144 K of L2 memory and 6 G of RAM).For the readers interested in a thoroughly performance comparison between the many TV numerical algorithms we recommend to review the performance reports listed in the bibliography referenced throughout Section 4.
For the ℓ 2 -TV case we choose to present simulations where we compare the reconstruction performance results of the (standard) ℓ 2 -TV (Section 4.1) with the nonnegative ℓ 2 -TV (Section 4.1.2) for the deconvolution case (see Figure 1).Reconstruction SNR values and computation times are compared in Table 1; The nonnegative ℓ 2 -TV method has a better results, in terms of visual quality although SNR/SSIM metrics are about the same, than the (standard) ℓ 2 -TV method, although the computational performance for the latter is about 2 to 3 times faster.
For the ℓ 1 -TV case we choose to present simulations where we compare the reconstruction performance results of the (standard) ℓ 1 -TV (Section 4.2) with the adaptive ℓ 1 -TV (Section 4.2.1) for the denoising case (see Figure 2).Reconstruction SNR and SSIM values and computation times are compared in Table 2; The adaptive ℓ 1 -TV method has significantly better reconstruction results, both in terms of SNR, SSIM and visual quality than the (standard) ℓ 1 -TV; there is, however, a trade-off to be made for the superior reconstruction quality: the first step of the adaptive ℓ 1 -TV, that is the estimation of the set of pixels corrupted with Salt & Pepper noise, is very slow compared to the processing time needed to solve the ℓ 1 -TV problem.Here we want to highlight that although we only report the performance for the grayscale "Boats" and color "Barbara" images, these results are representative for all other test images.
For the Poisson-TV (Section 4.3) method, we choose to present simulations for the denoising and deconvolution cases.We use the grayscale "Cameraman" and color "Peppers" images (both 512 × 512) which were first scaled to a maximum value  ∈ {5, 30, 100, 255} (the lower the value of  is, the noisier the image will be perceived), and then were blurred ("Peppers" only), and finally the Poisson noise was added (this matches typical experimental setups, such [66, Section 6.3] and [108, Section 4]).In Table 3 (see also Figure 3) we summarize the restoration results for the Poisson-TV method.
Similarly, for the Gamma-TV (Section 4.4) method, we choose to present simulations for the denoising and deconvolution cases.We use the grayscale "Tank" and color "Lena" images (both 512 × 512) which were corrupted with Speckle noise, generated according to (12) with  = 5, 10, 33 (note that the lower the value of  is, the noisier the image will be perceived, only "Lena" was blurred).This matches typical experimental setups, such as [111,115,117].In Table 4 (see also Figure 4) we summarize the restoration results for the Gamma-TV method.Table 1: Deconvolution performance results for the standard ℓ 2 -TV and for the nonnegative ℓ 2 -TV methods for the grayscale "Goldhill" (720 × 576) image and the color "Lena" (512 × 512) image.

Image
Noise (      Table 5: Denoising performance results for the mixed Gaussian Impulse TV method for the grayscale "Peppers" and the color "Lena" images (both 512 × 512).The Gaussian plus Salt & Pepper noise case is marked as "G + S & P", whereas the Gaussian plus random-valued impulse noise case is marked as "G + R. " Finally, for the mixed model TV (Section 4.5) method, we choose to present simulations for three denoising cases, where in one case the grayscale "Peppers" image (512 × 512) was corrupted with Gaussian plus random-valued impulse noise generated according to (45) with  = 15/255 and  = 0.3, and in the other two, color "Lena" image (512 × 512) was corrupted with Gaussian plus Sal & Pepper noise as well as with Gaussian plus random-valued impulse noise with  = 15/255 and  = 0.7, and  = 10/255 and  = 0.2, respectively (according to (45)), see Figure 5. Additionally, in Table 5 we summarize the restoration results for the mixed model TV method for a broader combination of levels of Gaussian and Impulse noises that matches typical experimental setups [118,120,121].

Conclusions
To the best of our knowledge, we have provided a complete summary of recent TV algorithms for different noise models, that is, Gaussian, Impulse (e.g., Salt & Pepper), Poisson, and Speckle (e.g., Gamma) and the mixed Gaussian and impulse noise models.Although for some noise models, there are particular algorithms that have better reconstruction performance (e.g., patch-based approaches for the Gaussian noise model) we expect that in the coming years TV will improve its current reconstruction performance (and hopefully become competitive with state-of-the-art alternative denoising methods); moreover, we believe that the development of new TV algorithms that spatially adapt the regularization parameter is a possible answer (see the computational results for the adaptive ℓ 1 -TV case) to current reconstruction performance shortcomings when compared to patch-based approaches.
Finally, it is important to highlight the flexibility of the TV method, that allows it to be relevant to applications that come from a variety of fields, ranging from astronomical to SAR based systems, including medical applications such as Ultrasound, PET, and others.

4. 5 .
TV Numerical Algorithms for the Mixed Gaussian-Impulse Noise Model.A number of algorithms have recently been proposed for denoising of images subject to the mixed noise model (16), reproduced here for convenience b = B ⋅ (u * + ) + (1 − B) ⋅ r;
d. multivariate Bernoulli distribution with success probability 1 − , 1 represents a vector with all elements equal to one (both B and 1 have the approppiate dimension), r is either Salt & Pepper noise or random-valued impulse noise; for the former case,   = c min or   = c max with probability  1 and  2 , respectively, ( =  1 +  2 ), and for the latter case   is drawn from an uniform distribution in [c min , c max ].

Table 2 :
Denoising performance results for the ℓ 1 -TV and the adaptive ℓ 1 -TV methods for the grayscale "Boats" (512 × 512) image and the color "Barbara" (720 × 576) image.Time in parenthesis (adaptive case) corresponds to time elapsed to estimate the pixels corrupted with Salt & Pepper noise.