Inexact Version of Bregman Proximal Gradient Algorithm

The Bregman Proximal Gradient (BPG) algorithm is an algorithm for minimizing the sum of two convex functions, one of which is nonsmooth. The supercoercivity of the objective function is necessary for the convergence of this algorithm, which precludes its use in many applications. In this paper, we give an inexact version of the BPG algorithm that circumvents the supercoercivity condition by replacing it with a simple condition on the parameters of the problem. Our study covers the existing results while providing new ones.


Introduction
We consider the following minimization problem:

min_x Ψ(x) ≔ f(x) + g(x),        (1)

where f is a proper convex lower-semicontinuous (l.s.c.) function and g is a convex continuously differentiable function. This problem arises in many applications, including compressed sensing [1], signal recovery [2], and the phase retrieval problem [3]. One classical algorithm for solving this problem is the proximal gradient (PG) method:

x_n ≔ argmin_u { f(u) + 〈∇g(x_{n−1}), u − x_{n−1}〉 + (1/(2λ_n)) ‖u − x_{n−1}‖² },

where λ_n is the stepsize at each iteration. The proximal gradient method and its variants [4–14] have long been a hot topic in the optimization field due to their simple forms. A central property required in the analysis of gradient methods is the Lipschitz continuity of the gradient of the smooth part g. However, in many applications the differentiable function does not have this property, e.g., in the broad class of Poisson inverse problems. In [15], by introducing the Bregman distance [16] generated by some reference convex function h, defined by

D_h(u, v) ≔ h(u) − h(v) − 〈∇h(v), u − v〉,

the authors could replace the intricate question of Lipschitz continuity of gradients by a convexity condition that is easy to verify, which we call below the LC property. Thereby, they proposed and studied the algorithm called NoLips, defined by

x_n ≔ argmin_u { f(u) + 〈∇g(x_{n−1}), u − x_{n−1}〉 + (1/λ_n) D_h(u, x_{n−1}) }.        (4)

When g ≡ 0, Equation (4) is the Bregman Proximal (BP) algorithm studied in [17–21].
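To fix ideas, here is a minimal Python sketch of the NoLips iteration in the special case f ≡ 0, where the update has the closed mirror-descent form ∇h(x_n) = ∇h(x_{n−1}) − λ∇g(x_{n−1}). The choice of Burg's entropy for h, the Poisson-type test objective, and all function names are our own illustration, not part of the paper.

```python
import numpy as np

def nolips_step(x, lam, grad_g, grad_h, grad_h_inv):
    # With f == 0, the NoLips update solves
    # grad_h(x_new) = grad_h(x) - lam * grad_g(x).
    return grad_h_inv(grad_h(x) - lam * grad_g(x))

# Burg's entropy h(x) = -sum(log x): grad_h(x) = -1/x, with inverse y -> -1/y.
grad_h = lambda x: -1.0 / x
grad_h_inv = lambda y: -1.0 / y

# Illustrative smooth part (ours): g(x) = sum((A @ x) - b * log(A @ x)),
# a Poisson-type objective with grad_g(x) = A.T @ (1 - b / (A @ x)).
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 1.5, (20, 5))
b = rng.uniform(0.5, 1.5, 20)
grad_g = lambda x: A.T @ (1.0 - b / (A @ x))

x = np.ones(5)            # start in int(dom h) = R^5_++
lam = 0.5 / np.sum(b)     # stepsize below 1/L, with L = ||b||_1 as in [15]
for _ in range(200):
    x = nolips_step(x, lam, grad_g, grad_h, grad_h_inv)
print(x)
```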
In this article, we give an inexact version of BPG, defined by

x_n ∈ ε_n−argmin_u { f(u) + 〈∇g(x_{n−1}), u − x_{n−1}〉 + (1/λ_n) D_h(u, x_{n−1}) }.
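Here ε_n−argmin denotes the set of ε_n-approximate minimizers; for the reader's convenience, we recall the standard definition (our restatement):

```latex
% x is an eps-approximate minimizer of phi when it misses
% the optimal value by at most eps.
\[
  x \in \varepsilon\text{-}\operatorname*{argmin}_{u}\,\varphi(u)
  \quad\Longleftrightarrow\quad
  \varphi(x) \;\le\; \inf_{u}\varphi(u) + \varepsilon .
\]
```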
By circumventing the condition of supercoercivity required in [15, 22], replacing it with a simple condition on the parameters of the problem, our study covers the existing results while providing new ones.
Our notation is fairly standard: 〈·, ·〉 is the scalar product on ℝ^d and ‖·‖ the associated norm. The closure (respectively, relative interior) of a set C is denoted by cl C (respectively, ri C). For any convex function f, we denote by dom f ≔ {x : f(x) < +∞} its effective domain.

Preliminaries
In this section, we present the main convergence results for NoLips.
Definition 1 (see [23]). Let C be a nonempty convex subset of ℝ^d. A function h is said to be Legendre on C if it verifies the three following conditions: (a) int C ≠ ∅ and h is differentiable on int C; (b) h is strictly convex on int C; (c) ‖∇h(x_i)‖ ⟶ +∞ for any sequence {x_i} of int C that converges towards a boundary point of C. The class of strictly convex functions verifying (a), (b), and (c) is called the class of Legendre functions on C and is denoted by E(C).
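Two standard examples may help fix ideas; both are classical Legendre functions used repeatedly in the Bregman-distance literature (the second reappears in the application section):

```latex
% Classical Legendre functions:
% (1) the energy, Legendre on all of R^d;
% (2) Burg's entropy, Legendre on the closed positive orthant.
\[
  h_1(x) = \tfrac{1}{2}\lVert x\rVert^2 \ \text{on } \mathbb{R}^d,
  \qquad
  h_2(x) = -\sum_{j=1}^{d} \log x_j \ \text{on } \mathbb{R}^d_{+}.
\]
```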
Assumption 1. We consider the following minimization problem:

(P)  inf { Ψ(x) ≔ f(x) + g(x) : x ∈ cl(dom h) }.

Let the operator T_λ be defined by

T_λ(x) ≔ argmin_u { f(u) + 〈∇g(x), u − x〉 + (1/λ) D_h(u, x) }.        (7)

Lemma 1 (well-posedness of the method, see [15]). Under Assumption 1, suppose one of the following assumptions holds: (i) argmin{Ψ(x) : x ∈ dom h} is nonempty and compact; (ii) ∀λ > 0, h + λf is supercoercive. Then the map T_λ defined in (7) is nonempty and single-valued from int(dom h) to int(dom h).
By posing p(u) ≔ 〈∇g(x), u〉, they showed that T_λ(x) = (prox^h_{λf} ∘ prox^h_{λp})(x), where prox^h_φ(y) ≔ argmin_u {φ(u) + D_h(u, y)}. The operator T_λ thus appears as composed of two prox operators, and the NoLips algorithm then becomes x_n ≔ T_{λ_n}(x_{n−1}).

Assumption 2. (i) argmin{Ψ(x) : x ∈ dom h} is nonempty and compact, or ∀λ > 0, h + λf is supercoercive; (ii) for every x ∈ int(dom h) and r ∈ ℝ, the level set {u ∈ dom h : D_h(u, x) ≤ r} is bounded.

Theorem 1 (see [15]). Suppose that Σ_n λ_n = +∞ and Assumptions 1 and 2 are satisfied. Then the sequence {x_n}_n converges to some solution x* of (P).
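In the common case where h is Legendre with ∇h invertible on int(dom h), the inner prox has a closed form and the decomposition reads as follows; this is the standard mirror-descent rewriting of the Bregman step (our restatement, consistent with (7)):

```latex
% Two-stage form of the Bregman proximal gradient map:
% a mirror (gradient) step in the dual space, then a Bregman prox of f.
\begin{align*}
  y &= (\nabla h)^{-1}\bigl(\nabla h(x) - \lambda \nabla g(x)\bigr),
     && \text{(explicit step on } g\text{)}\\
  T_\lambda(x) &= \operatorname*{argmin}_{u}
     \bigl\{ \lambda f(u) + D_h(u, y) \bigr\}.
     && \text{(Bregman prox of } f\text{)}
\end{align*}
```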
Our contribution can be summarized in two essential points:

(1) Improvements of some assumptions:

(a) In [15, 22], f and g are both supposed convex; we show that this hypothesis can be reduced to supposing only that Ψ is convex, which allows us to distinguish two interesting cases that have not yet been studied, neither for BPG nor for PG:

(i) the nonsmooth part f is possibly not convex and the smooth part g is convex;

(ii) the nonsmooth part f is convex and the smooth part g is possibly not convex.

(b) The assumption that argmin{Ψ(x) : x ∈ cl(dom h)} is nonempty and compact is a condition on f and g (see [15]) which precludes the application of NoLips when Ψ is not supercoercive. In this work, we show that this condition can be circumvented by coupling the LC property with a bounded level set condition (H₄ below); this is a condition which relates only to the parameter h and which is verified by most of the interesting Bregman distances.

(2) An inexact version of NoLips.
We propose an inexact version of NoLips, called ε−NoLips, which incorporates a relative error criterion. The convergence result is established in Section 4. This study covers the convergence results given for PG and BPG while providing new ones, in particular the convergence of the inexact version of the interior method with Bregman distance studied in [24]; this result had not been established until now.
We also note that in [15] the convergence of NoLips is given under a condition on the domain of h. For this reason, and for the clarity of the hypotheses, we suppose in what follows that h: S ⟶ ℝ, with S an open convex subset of ℝ^d.

Main Results
In order to clarify the status of the parameter h, we give the following definitions. Let S be a convex open subset of ℝ^d and h: S ⟶ ℝ. Let us consider the following hypotheses:

H₁, H₂: h is continuous and strictly convex on S.

H₃: ∀r ≥ 0, ∀x ∈ S, the sets L₁(x, r) ≔ {u ∈ cl S : D_h(x, u) ≤ r} are bounded.

H₄: ∀r ≥ 0, ∀x ∈ S, the sets L₂(x, r) ≔ {u ∈ S : D_h(u, x) ≤ r} are bounded.

Definition 4. The quantity D_h(u, v) ≔ h(u) − h(v) − 〈∇h(v), u − v〉 in (18) is called a Bregman distance if h is a Bregman function; we denote by A(S) the class of such functions on S.

Lemma 2 (three-point identity). ∀h ∈ A(S), ∀a ∈ cl S, and ∀b, c ∈ S:

D_h(a, b) + D_h(b, c) − D_h(a, c) = 〈∇h(c) − ∇h(b), a − b〉.

Example. For the Boltzmann–Shannon entropy h(x) ≔ Σ_j x_j log x_j, with the convention 0 log 0 ≔ 0, we have ∀(x, y) ∈ S₁ × S₁:

D_h(x, y) = Σ_j ( x_j log(x_j/y_j) − x_j + y_j ).
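As a quick sanity check, the following sketch (our own, not from the paper) evaluates the Bregman distance of the Boltzmann–Shannon entropy, confirms that it coincides with the Kullback–Leibler divergence of the example above, and verifies the three-point identity of Lemma 2 numerically.

```python
import numpy as np

def bregman(h, grad_h, u, v):
    # Bregman distance D_h(u, v) = h(u) - h(v) - <grad h(v), u - v>.
    return h(u) - h(v) - np.dot(grad_h(v), u - v)

# Boltzmann-Shannon entropy on the positive orthant.
h = lambda x: np.sum(x * np.log(x))
grad_h = lambda x: np.log(x) + 1.0

rng = np.random.default_rng(1)
a, b, c = rng.uniform(0.1, 2.0, (3, 4))

# D_h equals the (unnormalized) Kullback-Leibler divergence.
kl = np.sum(a * np.log(a / b) - a + b)
assert np.isclose(bregman(h, grad_h, a, b), kl)

# Three-point identity of Lemma 2.
lhs = (bregman(h, grad_h, a, b) + bregman(h, grad_h, b, c)
       - bregman(h, grad_h, a, c))
rhs = np.dot(grad_h(c) - grad_h(b), a - b)
assert np.isclose(lhs, rhs)
print("three-point identity verified")
```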
We consider the following minimization problem:

(P)  inf { Ψ(x) ≔ f(x) + g(x) : x ∈ cl S }.

The following assumptions on the problem's data are made throughout the paper (and referred to as the blanket assumptions).
We consider the operator T_λ defined, ∀x ∈ S, by

T_λ(x) ≔ argmin_{u ∈ cl S} { f(u) + 〈∇g(x), u − x〉 + (1/λ) D_h(u, x) }.

We give in the following a series of lemmas allowing the establishment of Theorem 2, which assures the well-posedness of the method proposed in Section 4.
Proof. This follows since ∀x ∈ S ⊂ dom g, g is differentiable at x. □

Lemma 4. If the pair (g, h) verifies the condition (LC), then ∃L > 0 such that ∀x ∈ S, ∀u ∈ cl S:

D_g(u, x) ≤ L D_h(u, x).

Proof. On S, the inequality follows from the convexity of Lh − g. Let u ∈ cl S ∖ S. There exists a sequence {u_n}_n ⊂ S such that u_n ⟶ u; then, passing to the limit in the inequality written at the points u_n, we obtain it at u. □

Lemma 5. If 0 < λ < 1/L, then h − λg is strictly convex on S.

Proof. Otherwise there exist x, y ∈ S, x ≠ y, with D_{h−λg}(x, y) = 0, i.e., D_h(x, y) = λ D_g(x, y) ≤ λL D_h(x, y). Since h is strictly convex, D_h(x, y) ≠ 0, so 1 ≤ λL, which is absurd. Hence h − λg is strictly convex on S. □

Lemma 6. For 0 < λ < 1/L, h − λg is also a Legendre function on S. By application of Theorem 26.1 in [23], ∂(D_{h−λg}(·, u)) verifies ∂(D_{h−λg}(·, u))(x) = ∅ for every x ∈ cl S ∖ S. □

Theorem 2 (well-posedness of the method). We assume that the blanket assumptions hold and that 0 < λ < 1/L. Then, ∀x ∈ S, T_λ(x) is a nonempty singleton contained in S.
Proof. Let x ∈ S. We first show that T_λ(x) is nonempty; for this, it is enough to demonstrate that ∀r ∈ ℝ the level set

L(x, r) ≔ { u ∈ cl S : Ψ(u) + (1/λ) D_{h−λg}(u, x) ≤ r }

is closed, and bounded when it is nonempty. Since D_{h−λg}(u, x) ≥ (1 − Lλ) D_h(u, x) by Lemma 4, we have L(x, r) ⊂ L₂(x, λ(r − Ψ*)/(1 − Lλ)). It follows, thanks to H₄, that L₂(x, λ(r − Ψ*)/(1 − Lλ)) is bounded, which implies that L(x, r) is bounded too, which shows that T_λ(x) ≠ ∅. Let x* ∈ T_λ(x), and suppose that x* ∈ cl S ∖ S. The optimality condition [10] allows us to write that 0 ∈ ∂(Ψ + (1/λ) D_{h−λg}(·, x))(x*). It follows that ∂(D_{h−λg}(·, x))(x*) ≠ ∅, which is in contradiction with Lemma 6. Then T_λ(x) ⊂ S. On the other hand, h − λg is strictly convex on S and Ψ is convex, so Ψ(·) + (1/λ) D_{h−λg}(·, x) is strictly convex on S. Then T_λ(x) is a singleton for all x ∈ S. □
This result dispenses with the supercoercivity of Ψ and with the simultaneous convexity of f and g, as required by Lemma 2 of [15].
Proof. The first equality is due to Lemma 3; the second is established in [15]. □ The first equality plays a decisive role in the development of this paper.

Analysis of the ε−NoLips Algorithm
In this section, we propose an Inexact Bregman Proximal Gradient (IBPG) algorithm, an inexact version of the BPG algorithm described in [15, 22]; the IBPG framework allows an error in the subgradient inclusion through the error term ε_n. We study two algorithms:

(i) Algorithm 1: the inexact Bregman Proximal Gradient (IBPG) algorithm without a relative error criterion;

(ii) Algorithm 2: the inexact Bregman Proximal Gradient (IBPG) algorithm with a relative error criterion, which we call ε−NoLips.

We establish the main convergence properties of the proposed algorithms. In particular, we prove a global rate of convergence, showing that they share the sublinear rate O(1/n) of basic first-order methods such as the classical PG and BPG. We also derive global convergence of the sequence generated by NoLips to a minimizer of (P).
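For orientation, a typical form of such a sublinear estimate, written here under simplifying assumptions (monotone function values and our own paraphrase rather than the paper's exact constants), is:

```latex
% Typical inexact O(1/n) estimate, obtained by telescoping a
% per-iteration inequality such as that of Proposition 5 below.
\[
  \Psi(x_n) - \Psi(x^\ast)
  \;\le\;
  \frac{D_h(x^\ast, x_0) + \sum_{k=1}^{n}\lambda_k\,\varepsilon_k}
       {\sum_{k=1}^{n}\lambda_k},
\]
% which is O(1/n) when lambda_k = lambda is constant and
% sum_k lambda_k * eps_k < +infinity.
```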

Assumptions
In our analysis, Ψ is supposed to be a convex function; this allows us to distinguish two interesting cases: (i) the nonsmooth part f is possibly not convex and the smooth part g is convex; (ii) the nonsmooth part f is convex and the smooth part g is possibly not convex. In what follows, the choice of the sequence {λ_n} depends on the convexity of g.
Let λ be such that 0 < λ < 1/L and pose λ_0 ≔ 0; the precise choice of the sequence {λ_n} differs according to whether g is convex or not. In those conditions, we easily show that ∀x, y ∈ S and ∀n ∈ ℕ, 0 < λ_n ≤ λ and D_{h_n}(x, y) ≥ 0, where we pose h_n ≔ h − λ_n g.

Proposition 5. The sequence {x_n}_n defined by (IBPG) exists and verifies, for all n ∈ ℕ* and all u ∈ cl S:

λ_n (Ψ(x_n) − Ψ(u)) ≤ D_{h_n}(u, x_{n−1}) − D_{h_n}(u, x_n) − D_{h_n}(x_n, x_{n−1}) + ε_n λ_n.

Proof. Existence is deduced from (45) and Theorem 2; applying Lemma 2 then yields the stated inequality. □

This result shows that IBPG is an inexact version of BPG, and that it is exactly BPG when ε_n ≡ 0.
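As an illustration of the IBPG step, the following Python sketch (entirely ours) solves each subproblem only approximately by giving a generic solver a loose tolerance; a rigorous implementation would instead certify an ε_n-optimality gap before accepting x_n. With the Euclidean h used in the usage lines, the scheme reduces to an inexact PG.

```python
import numpy as np
from scipy.optimize import minimize

def ibpg(x0, f, grad_g, h, grad_h, lam, eps, n_iter):
    # Inexact Bregman proximal gradient (Algorithm 1 flavour):
    # each subproblem is solved only up to accuracy eps[n].
    x = x0.copy()
    for n in range(n_iter):
        c = grad_g(x)
        def sub(u):  # f(u) + <grad g(x), u> + (1/lam) D_h(u, x)
            return (f(u) + c @ u
                    + (h(u) - h(x) - grad_h(x) @ (u - x)) / lam)
        # Loose tolerances emulate an eps_n-approximate minimizer.
        res = minimize(sub, x, method="L-BFGS-B",
                       bounds=[(1e-10, None)] * x.size,
                       options={"ftol": eps[n], "maxiter": 50})
        x = res.x
    return x

# Usage (illustrative): least-squares g with an l1 term f, Euclidean h.
A = np.random.default_rng(2).normal(size=(30, 8))
b = A @ np.abs(np.random.default_rng(3).normal(size=8))
f = lambda u: 0.1 * np.sum(np.abs(u))
grad_g = lambda x: A.T @ (A @ x - b)
h = lambda x: 0.5 * x @ x
grad_h = lambda x: x
lam = 0.9 / np.linalg.norm(A, 2) ** 2   # below 1/L, L = ||A||_2^2
x = ibpg(np.ones(8), f, grad_g, h, grad_h, lam,
         eps=[1e-6] * 300, n_iter=300)
```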
Now, we derive the global convergence of the sequence generated by Algorithm 1 to a minimizer of (P).

Theorem 3. We assume that Σ_n λ_n ε_n < +∞ and that (i) {λ_n}_n is nonincreasing and Σ_n λ_n = +∞. Then (a) Ψ(x_n) ⟶ inf Ψ and (b) x_n ⟶ x* ∈ argmin Ψ.

Proof. (a) Suppose (i) holds. Let x* ∈ argmin Ψ; putting u = x* in (69) and combining with (68), we obtain that D_h(x*, x_n) is bounded, and from H₃, {x_n} is bounded as well. Let u* ∈ Adh{x_n}; there exists then a subsequence {x_{n_i}} such that x_{n_i} ⟶ u*, which shows that Ψ(u*) ≤ inf Ψ, i.e., u* ∈ argmin Ψ. Then, since D_{h_{n_i}}(u*, x_{n_i}) ⟶ 0 and u* ∈ argmin Ψ, we have, using

D_h(u*, x_n) = D_{h_n}(u*, x_n) + λ_n D_g(u*, x_n) ≤ D_{h_n}(u*, x_n) + L·λ·D_h(u*, x_n),

and from H₆, that x_n ⟶ u* ∈ argmin Ψ. □

The IBPG algorithm generates a sequence such that {Ψ(x_n)}_n is not necessarily nonincreasing; for this reason, and to improve the global estimate in function values, we now propose ε−NoLips, which is an inexact version of BPG with a relative error criterion. Let σ with 0 ≤ σ < 1 be given as follows.
In what follows, we will derive a convergence rate result (Theorem 4) for the ε−NoLips framework. First, we need to establish a few technical lemmas.

In the following, {x_n}_n denotes the sequence generated by ε−NoLips.

From the condition (LC), we have (82). □

Remark 3. We now notice that {Ψ(x_n)}_n is nonincreasing: it suffices to replace u with x_{n−1} in (79).

Lemma 8. For every n ∈ ℕ* and x* ∈ argmin Ψ, we have the following estimate.

Proof. Replacing u by x_{n−1} in (79), and since A_n(x_{n−1}) ≥ 0 and A_{n−1}(x_{n−1}) = 0, we obtain the claim. □
Application to a Nonnegative Linear Inverse Problem

In this section, we propose an approach for solving the nonnegative linear inverse problem defined by

(P_α)  min { g(x) + α‖x‖₁ : x ∈ ℝ^d₊ }.

We take

g(x) ≔ Σ_i ( 〈a_i, x〉 − b_i log〈a_i, x〉 ),  h(x) ≔ −Σ_j log(x_j),

where a_i denotes the ith row of A. It is shown in [15] that the couple (g, h) verifies the Lipschitz-like/Convexity Condition (LC) on ℝ^d₊ for any L satisfying the bound given there in terms of the data b and the columns A_j of A. For λ_n ≔ λ, ∀n, Theorem 3 is applicable and globally guarantees the convergence of Algorithm 1 to an optimal solution of (P_α).
Given x_n ∈ ℝ^d₊₊, the iteration amounts to solving, for j = 1, …, d, a one-dimensional problem in the jth coordinate, where c_j is the jth component of ∇g(x_n).
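Under the assumptions above (Burg's entropy h and f(x) = α‖x‖₁ on the nonnegative orthant, which is our reading of (P_α)), the one-dimensional subproblems admit the closed form u_j = x_j/(1 + λ x_j (c_j + α)), obtained from the first-order condition α + c_j + (1/λ)(−1/u_j + 1/x_j) = 0. The following sketch, with illustrative data of our own, implements the resulting iteration.

```python
import numpy as np

rng = np.random.default_rng(4)
m, d, alpha = 30, 6, 0.05
A = rng.uniform(0.5, 2.0, (m, d))
b = A @ rng.uniform(0.2, 1.0, d)     # synthetic nonnegative data (ours)
lam = 0.9 / np.sum(b)                # stepsize below 1/L with L = ||b||_1

def grad_g(x):
    # g(x) = sum_i (<a_i, x> - b_i * log <a_i, x>)
    return A.T @ (1.0 - b / (A @ x))

x = np.ones(d)                       # start in R^d_++
for _ in range(500):
    c = grad_g(x)
    # closed-form solution of the d one-dimensional subproblems
    x = x / (1.0 + lam * x * (c + alpha))
print(x)
```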

Conclusion
The proposed algorithms constitute a unified framework for the existing algorithms BPG, BP, and PG, while providing new ones, in particular the inexact version of the interior method with Bregman distance studied in [24]. More precisely:

(i) when ε_n = 0, ∀n ∈ ℕ*, our algorithm is the NoLips algorithm studied in [15, 22];

(ii) when g ≡ 0, our algorithm is the inexact version of the Bregman Proximal (BP) method studied in [19];

(iii) when ε_n = 0, ∀n ∈ ℕ*, and g ≡ 0, our algorithm is the Bregman Proximal (BP) method studied in [17, 21];

(iv) when f ≡ 0, our algorithm is the inexact version of the interior method with Bregman distance studied in [24];

(v) when f ≡ 0 and ε_n = 0, ∀n ∈ ℕ*, our algorithm is the interior method with Bregman distance studied in [24];

(vi) when h = (1/2)‖·‖², our algorithm is the proximal gradient method (PG) and its variants [4–14, 27].

Our analysis is different from and simpler than the one given in [15, 22], and it allows us to weaken some hypotheses, in particular the supercoercivity of Ψ as well as the simultaneous convexity of f and g.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.