An implementable algorithm for solving a nonsmooth convex optimization problem is proposed by combining Moreau-Yosida regularization with bundle and quasi-Newton ideas. In contrast with the quasi-Newton bundle methods of Mifflin et al. (1998), we only assume that the values of the objective function and its subgradients can be evaluated approximately, which makes the method easier to implement. Under some reasonable assumptions, the proposed method is shown to have a Q-superlinear rate of convergence.
1. Introduction
In this paper we are concerned with the unconstrained minimization of a real-valued, convex function f:Rn→R, namely,
(1)minf(x)s.t.x∈Rn,
and in general f is nondifferentiable. A number of attempts have been made to obtain convergent algorithms for solving (1). Fukushima and Qi [1] propose an algorithm for solving (1) under semismoothness and regularity assumptions; the algorithm is shown to have a Q-superlinear rate of convergence. An implementable BFGS method for general nonsmooth problems is presented by Rauf and Fukushima [2], and global convergence is obtained under the assumption of strong convexity. A superlinearly convergent method for (1) is proposed by Qi and Chen [3], but it requires a semismoothness condition. He [4] obtains a globally convergent algorithm for convex constrained minimization problems under certain regularity and uniform continuity assumptions. Among methods for nonsmooth optimization problems, some have a superlinear rate of convergence, see, for instance, Mifflin and Sagastizábal [5] and Lemaréchal et al. [6]. They propose two conceptual algorithms with superlinear convergence for minimizing a class of convex functions; the latter demands that the objective function f be differentiable in a certain subspace U (the subspace along which ∂f(p) has 0 breadth at a given point p), but sometimes it is difficult to perform this space decomposition. Besides the methods mentioned above, there is the quasi-Newton bundle-type method proposed by Mifflin et al. [7]; it has a superlinear rate of convergence, but the exact values of the objective function and its subgradients are required. In this paper, we present an implementable algorithm using bundle and quasi-Newton ideas together with Moreau-Yosida regularization, and the proposed algorithm is shown to have a superlinear rate of convergence. An obvious advantage of the proposed algorithm lies in the fact that we only need approximate values of the objective function and its subgradients.
It is well known that (1) can be solved by means of the Moreau-Yosida regularization F:Rn→R of f, which is defined by
(2)F(x)=minz∈Rn{f(z)+(2λ)-1∥z-x∥2},
where λ is a fixed positive parameter and ∥·∥ denotes the Euclidean norm or its induced matrix norm on Rn×n. The problem of minimizing F(x), that is,
(3)minF(x)s.t.x∈Rn,
is equivalent to (1) in the sense that x∈Rn solves (1) if and only if it solves (3), see Hiriart-Urruty and Lemaréchal [8]. The problem (3) has a remarkable feature that the objective function F is a differentiable convex function, even though f is nondifferentiable. Moreover F has a Lipschitz continuous gradient
(4)G(x)=λ-1(x-p(x))∈∂f(p(x)),
where p(x) is the unique minimizer of (2) and ∂f is the subdifferential mapping of f. Hence, by Rademacher’s theorem, G is differentiable almost everywhere and the set
(5)∂BG(x)={D∈Rn×n∣D=limxk→x∇G(xk), where G is differentiable at xk}
is nonempty and bounded for each x. We say G is BD-regular at x if all matrices D∈∂BG(x) are nonsingular. It is reasonable to pay more attention to the problem (3) since F has such good properties. However, because the Moreau-Yosida regularization itself is defined through a minimization problem involving f, the exact values of F and its gradient G at an arbitrary point x are difficult or even impossible to compute in general. Therefore, we attempt to explore the possibility of utilizing the approximations of these values.
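To make the regularization (2) concrete, here is a minimal numerical sketch (our own illustration, not part of the original development): it evaluates F(x), p(x), and G(x) for the hypothetical test function f(z)=|z|, with a grid search standing in for the inner minimization; for this f the proximal point has the closed form p(x)=sign(x)max{|x|-λ,0}.

```python
import numpy as np

def moreau_yosida(f, x, lam, grid):
    # F(x) = min_z { f(z) + (2*lam)^-1 * (z - x)^2 }, minimized over a grid;
    # returns F(x), the proximal point p(x), and G(x) = (x - p(x)) / lam.
    vals = f(grid) + (grid - x) ** 2 / (2 * lam)
    i = int(np.argmin(vals))
    p = grid[i]
    return vals[i], p, (x - p) / lam

f = np.abs                              # hypothetical nonsmooth convex test function
lam, x = 0.5, 2.0
grid = np.linspace(-5.0, 5.0, 200001)   # grid search stands in for the inner problem
F, p, G = moreau_yosida(f, x, lam, grid)
# closed form for f = |.|: p(2.0) = 1.5, so G(2.0) = (2.0 - 1.5) / 0.5 = 1.0
```

Note that G(x) is single-valued and Lipschitz continuous even though ∂f(p(x)) is not a singleton at the kink z=0.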
Several attempts have been made to combine quasi-Newton ideas with Moreau-Yosida regularization to solve (1). For related works on this subject, see Chen and Fukushima [9] and Mifflin [10]. In particular, Mifflin et al. [7] consider using bundle ideas to construct linear approximations of f in order to approximate F, in which the exact values of f and one of its subgradients g at certain points are needed. In this paper we assume that for given x∈Rn and ε≥0, we can find some f~∈R and ga(x,ε)∈Rn such that
(6)f(x)≥f~≥f(x)-ε,f(z)≥f~+〈ga(x,ε),z-x〉,∀z∈Rn,
which means that ga(x,ε)∈∂εf(x). This setting is realistic in many applications, see Kiwiel [11]. Let us see some examples. Assume that f is strongly convex with modulus μ>0, that is,
(7)f(x)+g(x)T(z-x)+μ2∥z-x∥2≤f(z),∀z,x∈Rn,g(x)∈∂f(x),
and that f(x)=w(v(x)) with v:Rn→Rm continuously differentiable and w:Rm→R convex. By the chain rule we have ∂f(x)={∑i=1mξi∇vi(x)∣ξ=(ξ1,ξ2,…,ξm)T∈∂w(v(x))}. Now assume that we have an approximation ∇hv(x) of ∇v(x) such that ∥∇hv(x)-∇v(x)∥≤κ(h),h>0. Such an approximation may be obtained by using finite differences. In this case, typically κ(h)→0 as h→0. Let gh(x)=∑i=1mξi∇hvi(x), ξ∈∂w(v(x)). Then, we have
(8)f(x)+gh(x)T(z-x)≤f(x)+g(x)T(z-x)+∥ξ∥∥∇hv(x)-∇v(x)∥∥z-x∥≤f(z)-μ2∥z-x∥2+κ(h)∥ξ∥∥z-x∥
for all x,z∈Rn and g(x)=∑i=1mξi∇vi(x)∈∂f(x). Some simple manipulations show that
(9)-μ2∥z-x∥2+κ(h)∥ξ∥∥z-x∥≤(2μ)-1∥ξ∥2κ(h)2=:εh,∀x,z∈Rn.
Since ξ depends on x, so does the bound εh. We thus obtain
(10)f(x)+gh(x)T(z-x)≤f(z)+εh,∀z∈Rn.
From the local boundedness of ∂w(v(x)), we infer that εh>0 is locally bounded. Thus, gh(x) is an εh-subgradient of f at x, see Hintermüller [12]. As for the approximate function values, if f is a max-type function of the form
(11)f(x)=sup{ϕu(x)∣u∈U},∀x∈Rn,
where each ϕu:Rn→R is convex and U is an infinite set, then it may be impossible to calculate f(x). However, for any positive ε one can usually find in finite time an ε-solution to the maximization problem (11), that is, an element uε∈U satisfying ϕuε(x)≥f(x)-ε. Then one may set fε(x)=ϕuε(x). On the other hand, in some applications, calculating uε for a prescribed ε≥0 may require much less work than computing u0. This is, for instance, the case when the maximization problem (11) involves solving a linear or discrete programming problem by the methods of Gabasov and Kirilova [13]. Several authors have tried to solve (1) under the assumption that the values of the objective function and its subgradients can only be computed approximately. For example, Solodov [14] considers the proximal form of a bundle algorithm for (1), assuming the values of the function and its subgradients are evaluated approximately, and shows how these approximations should be controlled in order to satisfy the desired optimality tolerance. Kiwiel [15] proposes an algorithm for (1) that utilizes approximate evaluations of the objective function and its subgradients; global convergence of the method is obtained. Kiwiel [11] introduces another method for (1); it requires only approximate evaluations of f and its ε-subgradients, and this method converges globally. It is evident that bundle methods with superlinear convergence for solving (1) that use approximate values of the objective and its subgradients are seldom obtained. Compared with the methods mentioned above, the method proposed in this paper is not only implementable but also has a superlinear rate of convergence under some additional assumptions, and it should be noted that we only use approximate values of the objective function and its subgradients, which makes the algorithm easier to implement.
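As a sketch of such an inexact oracle (our own illustration, under the assumption that U is sampled on a finite grid and that each ϕu is affine in x), the code below returns a pair (f~, ga(x,ε)) satisfying (6): the best sampled piece gives f~ with f(x)≥f~≥f(x)-ε, and since each piece is an affine minorant of f, its slope is an ε-subgradient.

```python
import numpy as np

def eps_oracle(x, a, b, U):
    # epsilon-oracle for f(x) = sup_u { a(u)*x + b(u) }: maximize over the
    # finite sample U; the winning piece phi_u is affine and minorizes f,
    # so its value serves as f_tilde and its slope a(u) as an eps-subgradient,
    # with eps the gap left by the inexact inner maximization.
    vals = a(U) * x + b(U)
    i = int(np.argmax(vals))
    return vals[i], a(U[i])

# hypothetical instance: f(x) = sup_u { u*x - u**2 } over u in [-2, 2],
# so f(x) = x**2 / 4 whenever |x| <= 4
a = lambda u: u
b = lambda u: -u ** 2
U = np.linspace(-2.0, 2.0, 41)       # coarse sample => inexact oracle
x = 1.3
f_tilde, g = eps_oracle(x, a, b, U)  # here f(x) - f_tilde = 0.0025 <= eps = 0.01
```

The affine minorant f~+g(z-x) stays below f for every z, which is exactly the second requirement in (6).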
Some notations are listed below for presenting the algorithm.
∂f(x)={ξ∈Rn∣f(z)≥f(x)+ξT(z-x),∀z∈Rn}, the subdifferential of f at x, and each such ξ is called a subgradient of f at x.
∂εf(x)={η∈Rn∣f(z)≥f(x)+ηT(z-x)-ε,∀z∈Rn}, the ε-subdifferential of f at x, and each such η is called an ε-subgradient of f at x.
p(x)=argminz∈Rn{f(z)+(2λ)-1∥z-x∥2}, the unique minimizer of (2).
G(x)=λ-1(x-p(x)), the gradient of F at x.
This paper is organized as follows: in Section 2, to approximate the unique minimizer p(x) of (2), we introduce the bundle idea, which uses approximate values of the objective function and its subgradients. The approximate quasi-Newton bundle-type algorithm is presented in Section 3. In the last section, we prove the global convergence and, under additional assumptions, Q-superlinear convergence of the proposed algorithm.
2. The Approximation of p(x)
Let x=xk and s=z-xk, where xk is the current iterate of the AQNBT algorithm presented in Section 3; then (2) takes the form
(12)F(xk)=mins∈Rn{f(xk+s)+(2λ)-1∥s∥2}.
Now we consider approximating f(xk+s) by using the bundle idea. Suppose we have a bundle Jk generated sequentially starting from xk and possibly a subset of the previous set used to generate xk. The bundle includes the data (zi,f~i,ga(zi,εi)), i∈Jk, where zi∈Rn, f~i∈R, and ga(zi,εi)∈Rn satisfy
(13)f(zi)≥f~i≥f(zi)-εi,f(z)≥f~i+〈ga(zi,εi),z-zi〉,∀z∈Rn.
Suppose that the elements of Jk are arranged in the order in which they entered the bundle. Without loss of generality we may suppose Jk={1,…,j}. The tolerance εi is updated by the rule εi+1=γεi, 0<γ<1, i∈Jk. The condition (13) means that ga(zi,εi)∈∂εif(zi), i∈Jk. Using the data in the bundle, we construct a polyhedral function fa(xk+s) defined by
(14)fa(xk+s)=maxi∈Jk{f~i+ga(zi,εi)T(xk+s-zi)}.
Obviously fa(xk+s) is a lower approximation of f(xk+s), so fa(xk+s)≤f(xk+s). We define a linearization error by
(15)α(xk,zi,εi)=f~xk-f~i-ga(zi,εi)T(xk-zi),
where f~xk∈R satisfies
(16)f(xk)≥f~xk≥f(xk)-εxk,for givenεxk≥0.
Then fa(xk+s) can be written as
(17)fa(xk+s)=f~xk+maxi∈Jk{ga(zi,εi)Ts-α(xk,zi,εi)}.
Let
(18)Fa(xk)=mins∈Rn{fa(xk+s)+(2λ)-1∥s∥2}=f~xk+mins∈Rn{maxi∈Jk{ga(zi,εi)Ts-α(xk,zi,εi)}+(2λ)-1sTs}.
The problem (18) can be dealt with by solving the following quadratic program:
(19)minv+(2λ)-1sTs,s.t.ga(zi,εi)Ts-α(xk,zi,εi)≤v,∀i∈Jk.
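For illustration, (19) can be handed to any off-the-shelf solver. The sketch below (our own example, with a hypothetical two-cut bundle for f(z)=|z| around xk=0.4 and λ=1) solves it with SciPy's SLSQP over the stacked variables (v,s):

```python
import numpy as np
from scipy.optimize import minimize

def solve_bundle_qp(G, alpha, lam):
    # Solve (19): min over (v, s) of v + (2*lam)^-1 * s^T s
    # subject to g_i^T s - alpha_i <= v for every bundle element i,
    # where G is an (m, n) array of eps-subgradients and alpha holds
    # the linearization errors.
    m, n = G.shape
    obj = lambda x: x[0] + x[1:] @ x[1:] / (2 * lam)   # x = (v, s)
    cons = [{"type": "ineq",
             "fun": lambda x, i=i: x[0] - G[i] @ x[1:] + alpha[i]}
            for i in range(m)]
    res = minimize(obj, np.zeros(n + 1), constraints=cons, method="SLSQP")
    return res.x[0], res.x[1:]                         # optimal (v, s(x^k))

# cuts taken at z1 = 0.4 (g = 1, alpha = 0) and z2 = -0.1 (g = -1, alpha = 0.8)
v, s = solve_bundle_qp(np.array([[1.0], [-1.0]]), np.array([0.0, 0.8]), 1.0)
```

For these data both cuts are active at the optimum, giving s(xk)=-0.4 and v=-0.4, which can be verified by hand from the two constraints.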
As the iterations proceed, the number of elements in the bundle Jk increases. When the size of the bundle becomes too big, it may cause serious computational difficulties in the form of unbounded storage requirements. To overcome these difficulties, it is necessary to compress the bundle and clean the model. Wolfe [16] and Lemaréchal [17] first introduced the aggregation strategy, which requires storing only a limited number of subgradients, see also Kiwiel and Mifflin [18–20]. The aggregation strategy is a synthesis mechanism that condenses the essential information of the bundle into one single couple (g^εk~,α^k~) (defined below). The corresponding affine function, inserted into the model when compression occurs, is called the aggregate linearization (defined below). This function summarizes all the information generated up to iteration k. Suppose Jmax is an upper bound on the number of elements in Jk, k=1,2,…. If |Jk| reaches the prescribed Jmax, two or more elements are deleted from the bundle Jk; that is, two or more linear pieces in the constraints of (19) are discarded (notice that different selections of the discarded linear pieces may result in different speeds of convergence), and the aggregate linearization associated with the aggregate ε-subgradient and linearization error is introduced into the bundle. Define the aggregate linearization as
(20)ft(xk+s)=f~xk+〈g^εk~,s〉-α^k~,
where g^εk~=∑i∈Jkμiga(zi,εi), α^k~=∑i∈Jkμiα(xk,zi,εi). The multiplier vector μ=(μi)i∈Jk is an optimal solution of the dual problem of (19), see Solodov [14]. By doing so, the surrogate aggregate linearization preserves the information of the deleted linear pieces, and at the same time problem (19) remains manageable since the number of elements in Jk is limited. Suppose s(xk) solves problem (19), let pa(xk)=xk+s(xk) be an approximation of p(xk), and set εpa(xk)=εj+1=γεj. Let
(21)Fa(xk)=f~pa(xk)+εpa(xk)+(2λ)-1s(xk)Ts(xk),
where f~pa(xk)∈R is chosen to satisfy
(22)f(pa(xk))≥f~pa(xk)≥f(pa(xk))-εpa(xk).
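The aggregation step (20) can be sketched as follows (our own minimal illustration with hypothetical data; in the algorithm the weights μ come from the dual solution of (19)):

```python
import numpy as np

def aggregate(G, alpha, mu):
    # Condense the bundle into one couple: g_hat = sum_i mu_i * g_i,
    # alpha_hat = sum_i mu_i * alpha_i, with mu in the unit simplex.
    return mu @ G, mu @ alpha

# hypothetical bundle of three cuts in R^2 and simplex multipliers
G = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
alpha = np.array([0.0, 0.2, 0.5])
mu = np.array([0.5, 0.3, 0.2])        # mu_i >= 0, sum(mu) == 1
g_hat, a_hat = aggregate(G, alpha, mu)

# being a convex combination of the cuts, the aggregate linearization
# <g_hat, s> - a_hat never exceeds max_i { <g_i, s> - alpha_i }, so replacing
# deleted pieces by it keeps the model a lower approximation of f
s = np.array([0.7, -1.2])
assert g_hat @ s - a_hat <= np.max(G @ s - alpha) + 1e-12
```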
The results stated below are fundamental and useful in the subsequent discussions.
(P1) Fa(xk)≤F(xk)≤Fa(xk), where the first Fa(xk) is the lower approximation defined in (18) and the second is the upper approximation defined in (21).
(P2) Fa(xk)=F(xk) if and only if pa(xk)=p(xk) and f~pa(xk)=f(p(xk)).
Note that p(xk) is the unique minimizer of (2) and (P1) and (P2) can be obtained by the definitions of Fa(xk), Fa(xk), and F(xk).
(P3) If we define Fea(xk)=mins∈Rn{maxi∈Jk{f(zi)+g(zi)T(xk+s-zi)}+(2λ)-1sTs}, where g(zi)∈∂f(zi), then Fa(xk)→Fea(xk) as new points zj+1=xk+s(xk) are appended to the bundle Jk infinitely.
(P4) Let ε=maxi∈Jk{εi}. If ga(zi,εi)=g(zi)∈∂f(zi), then Fa(xk)≥Fea(xk)-ε.
Because εi→0 by the update rule εi+1=γεi,0<γ<1, we have ga(zi,εi)→g(zi) and f~i→f(zi). Thus fa(xk+s)→maxi∈Jk{f(zi)+g(zi)T(xk+s-zi)}, so Fea(xk)→Fa(xk). It is easy to see that fa(xk+s)=maxi∈Jk{f~i+ga(zi,εi)T(xk+s-zi)} ≥ maxi∈Jk{f(zi)+ga(zi,εi)T(xk+s-zi)-εi} ≥ maxi∈Jk{f(zi)+g(zi)T(xk+s-zi)}-ε. Therefore, Fa(xk)=mins∈Rn{fa(xk+s)+(2λ)-1∥s∥2} ≥ Fea(xk)-ε.
Let
(23)a(xk)=Fa(xk)-Fa(xk).
We accept pa(xk) as an approximation of p(xk) based on the following rule:
(24)a(xk)<m(xk)min{λ-2s(xk)Ts(xk),L},
where m(xk) and L are given positive numbers and m(xk) is fixed during one bundling process; that is, m(xk) depends on xk, see Step 1 in the AQNBT algorithm presented in Section 3. If (24) is not satisfied, we let zj+1=xk+s(xk) and εj+1=γεj, 0<γ<1, and take f~j+1=f~pa(xk) and ga(zj+1,εj+1)∈Rn satisfying
(25)f(zj+1)≥f~j+1≥f(zj+1)-εj+1,f(z)≥f~j+1+〈ga(zj+1,εj+1),z-zj+1〉,∀z∈Rn,
and then append a new piece f~j+1+ga(zj+1,εj+1)T(xk+s-zj+1) to (14), replace j by j+1, and solve (19) for finding a new s(xk) and a(xk) to be tested in (24). If this bundle process does not terminate, we have the following conclusion.
Suppose that xk is not the minimizer of f. If (24) is never satisfied, then a(xk)→0 as the new point zj+1 is appended into the bundle Jk infinitely.
Suppose that |Jk|=|{1,2,…,j}|=j<Jmax. Define the functions ϕ and φj+1,j=1,2,… by
(26)ϕ(z)=f(z)+(2λ)-1∥z-xk∥2,φj+1(z)=maxi∈Jk={1,2,…,j}{f~i+ga(zi,εi)T(z-zi)}+(2λ)-1∥z-xk∥2.
Let zj+1 be the unique minimizer of minz∈Rnφj+1(z), and let zj+2 be the unique minimizer of minz∈Rnφj+2(z), where φj+2(z)=maxi∈Jk+1{f~i+ga(zi,εi)T(z-zi)}+(2λ)-1∥z-xk∥2. Note that if |{1,2,…,j+1}|=j+1<Jmax, then let Jk+1={1,2,…,j+1}, so φj+1(zj+1)≤φj+2(zj+2); if |{1,2,…,j+1}|=j+1=Jmax, delete at least two elements from {1,2,…,j+1}, say q1,q2, with q1≠j+1 and q2≠j+1; the order of the other elements in {1,2,…,j+1} is left intact. Introduce an additional index k~ associated with the aggregate ε-subgradient and linearization error into Jk+1 and let Jk+1={1,2,…,q1-1,q1+1,…,q2-1,q2+1,…,k~,j+1}, so |Jk+1|=j<Jmax. By adjusting λ appropriately, we can make sure that zj+1 and zj+2 are not far away from xk. According to the proof of Proposition 3, see Fukushima [21], we find that ϕ(zj) has a limit, say ϕ*, and φj+1(zj+1) also converges to ϕ* as j→∞. By the definitions of F(xk) and Fa(xk) we have Fa(xk)→F(xk) and Fa(xk)→F(xk) as j→∞, so a(xk)→0 as j→∞.
In the next part we give the definition of Ga(xk), which is the approximation of G(xk),
(27)Ga(xk)=λ-1(xk-pa(xk))=-λ-1s(xk),
and some properties of Ga(xk) are discussed. It is easy to see that the accuracy of this approximation of G(xk) is controlled by a(xk):
(P5) ∥G(xk)-Ga(xk)∥=∥λ-1(p(xk)-pa(xk))∥≤√(2a(xk)/λ).
By the strong convexity of ϕ(z), we have ϕ(pa(xk))≥ϕ(p(xk))+(2λ)-1∥p(xk)-pa(xk)∥2. From the definitions of Fa(xk) and p(xk), we obtain Fa(xk)=f~pa(xk)+εpa(xk)+(2λ)-1∥pa(xk)-xk∥2 ≥ f(pa(xk))+(2λ)-1∥pa(xk)-xk∥2=ϕ(pa(xk)) ≥ ϕ(p(xk))+(2λ)-1∥p(xk)-pa(xk)∥2=F(xk)+(2λ)-1∥p(xk)-pa(xk)∥2. By (P1), (P5) holds.
By (P4) and (P5), we have the following (P6). In fact, (P6) says that the bundle subalgorithm for finding s(xk) terminates in finite steps.
(P6) If xk does not minimize f, then we can find one solution s(xk) of (18) such that (24) holds.
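The bundle subprocedure described above (solve (19), test (24), otherwise append a new cut) can be sketched in one dimension as follows (our own illustration with an exact oracle, the hypothetical instance f(z)=|z|, and a grid search standing in for the QP solver):

```python
import numpy as np

def bundle_subalgorithm(xk, f, subgrad, lam, m_k, L, max_iter=50):
    # Inner-loop sketch with an exact oracle: build cuts, minimize the model
    # (17) plus the prox term over a grid (a stand-in for the QP (19)), and
    # stop once the acceptance test (24) holds.
    S = np.linspace(-10.0, 10.0, 400001)        # grid stand-in for the QP solver
    cuts = [(f(xk), subgrad(xk), xk)]           # triples (f_i, g_i, z_i)
    for _ in range(max_iter):
        model = np.max([fi + gi * (xk + S - zi) for fi, gi, zi in cuts], axis=0)
        vals = model + S ** 2 / (2 * lam)
        i = int(np.argmin(vals))
        s, Fa_low = S[i], vals[i]               # s(x^k) and the lower bound (18)
        p = xk + s                              # candidate p_a(x^k)
        Fa_up = f(p) + s ** 2 / (2 * lam)       # upper bound (21), here eps = 0
        a = Fa_up - Fa_low                      # the gap a(x^k) of (23)
        if a < m_k * min(s ** 2 / lam ** 2, L): # acceptance test (24)
            return p, s, a
        cuts.append((f(p), subgrad(p), p))      # otherwise append a new cut
    return p, s, a

# hypothetical instance: f(z) = |z|; at xk = 0.4 with lam = 1 the exact
# proximal point is p(xk) = 0, reached here after one extra cut
p, s, a = bundle_subalgorithm(0.4, abs, lambda z: 1.0 if z >= 0 else -1.0,
                              lam=1.0, m_k=0.1, L=1.0)
```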
3. Approximate Quasi-Newton Bundle-Type Algorithm
For presenting the algorithm, we use the following notations: ak=a(xk), sk=s(xk), and mk=m(xk). Given are positive numbers δ, υ, γ, and L such that 0<δ<1, 0<υ<1, 0<γ<1, and a symmetric n×n positive definite matrix N.
Step 1 (initialization).
Let x1 be a starting point, and let B1 be an n×n symmetric positive definite matrix. Let ε1 and λ be positive numbers. Choose a sequence of positive numbers {mk}k=1∞ such that ∑k=1∞mk<∞. Set k=1. Find s1∈Rn and a1 such that
(28)a1≤m1min{λ-2(s1)Ts1,L}.
Let Ga(x1)=-λ-1s1, z1=x1, and j=1, where j is the running index of the bundle subalgorithm.
Step 2 (finding a search direction).
If ∥Ga(xk)∥=0, stop with xk optimal. Otherwise compute
(29)dk=-Bk-1Ga(xk).
Step 3 (line search).
Starting with u=0, let ik be the smallest nonnegative integer u such that
(30)Fa(xk+υudk)≤Fa(xk)+δυu(dk)TGa(xk),
where εu+1=γεu corresponds to the approximations Fa(xk+υudk) and Fa(xk+υudk) of F at xk+υudk; Fa(xk+υudk) satisfies
(31)Fa(xk+υudk)-Fa(xk+υudk)≤mk+1min{λ-2s(xk+υudk)Ts(xk+υudk),L},
and s(xk+υudk) is the solution of (19) with xk replaced by xk+υudk; the expression for Fa(xk+υudk) is similar to (21), with xk replaced by xk+υudk. Set tk=υik and xk+1=xk+tkdk.
Step 4 (computing the approximate gradient).
Compute Ga(xk+1)=-λ-1sk+1.
Step 5 (updating Bk).
Let Δxk=xk+1-xk and Δgk=Ga(xk+1)-Ga(xk). Set
(32)Bk+1=N if (Δxk)TΔgk≤0; otherwise, Bk+1 is chosen to be symmetric and positive definite and to satisfy the secant condition Bk+1Δxk=Δgk.
Set k=k+1, and go to Step 2.
End of AQNBT algorithm.
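Putting the steps together, the outer loop can be sketched in one dimension as follows (our own illustration: the exact envelope and gradient of the hypothetical f=|·|, i.e. the Huber function, stand in for Fa and Ga, and a scalar secant quotient, safeguarded by N as in (32), stands in for the matrix update of Step 5):

```python
def huber(x, lam):
    # Moreau envelope of f = |.| (exact stand-in for the approximations Fa)
    return x * x / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2

def grad_huber(x, lam):
    # G(x) = (x - p(x)) / lam for f = |.| (stand-in for Ga)
    return x / lam if abs(x) <= lam else (1.0 if x > 0 else -1.0)

def aqnbt_sketch(x1, lam=1.0, N=1.0, delta=0.1, upsilon=0.5, tol=1e-8, max_iter=100):
    B, x = 1.0, x1
    g = grad_huber(x, lam)
    for _ in range(max_iter):
        if abs(g) <= tol:                   # Step 2: stop when the gradient vanishes
            return x
        d = -g / B                          # Step 2: d^k = -B_k^{-1} G_a(x^k)
        t, u = 1.0, 0
        while huber(x + t * d, lam) > huber(x, lam) + delta * t * d * g:
            u += 1                          # Step 3: backtracking test (30)
            t = upsilon ** u
        x_new = x + t * d
        g_new = grad_huber(x_new, lam)      # Step 4: (approximate) gradient
        dx, dg = x_new - x, g_new - g
        B = dg / dx if dx * dg > 0 else N   # Step 5: safeguarded secant update (32)
        x, g = x_new, g_new
    return x

x_star = aqnbt_sketch(2.0)   # converges to the minimizer 0 of f = |.|
```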
4. Convergence Analysis
In this section we prove the global convergence of the algorithm described in Section 3; furthermore, under the assumptions of semismoothness and regularity, we show that the proposed algorithm has a Q-superlinear convergence rate. Following the proof of Theorem 3, see Mifflin et al. [7], we can show that, at each iteration k, ik is well defined, and hence the stepsize tk>0 can be determined finitely in Step 3. We assume the proposed algorithm does not terminate in finitely many steps, so the sequence {xk}k=1∞ is an infinite sequence. Since the sequence {mk}k=1∞ satisfies ∑k=1∞mk<∞, there exists a constant W such that ∑k=1∞mk≤W. Let Da={x∈Rn∣F(x)≤F(x1)+2LW}. By making a slight change to the proof of Lemma 1, see Mifflin et al. [7], we have the following lemma.
Lemma 1.
F(xk+1)≤F(xk)+L(mk+mk+1) for all k≥1 and xk∈Da.
Theorem 2.
Suppose f is bounded below and there exists a constant β such that
(33)〈Bkd,d〉≥β∥d∥2,∀d∈Rn,∀k.
Then any accumulation point of {xk} is an optimal solution of problem (1).
Proof.
According to the first part of the proof of Theorem 3, see Mifflin et al. [7], we have limk→∞F(xk)=F*. Since mk→0, from (P1) we obtain ak→0 as k→∞, and limk→∞Fa(xk)=limk→∞Fa(xk)=F*. Thus
(34)limk→∞tk(dk)TGa(xk)=0.
Let x- be an arbitrary accumulation point of {xk}, and let {xk}k∈K be a subsequence converging to x-. By (P5) we have
(35)limk∈K,k→∞Ga(xk)=G(x-).
Since {Bk-1} is bounded, we may suppose
(36)limk∈K,k→∞dk=d-
for some d-∈Rn. Moreover we have
(37)limk∈K,k→∞〈Ga(xk),dk〉=〈G(x-),d-〉≤-β∥d-∥2.
If liminfk→∞tk>0, then d-=0. Otherwise, if liminfk→∞tk=0, by taking a subsequence if necessary we may assume tk→0 for k∈K. The definition of ik in the line search rule gives
(38)Fa(xk+υik-1dk)>Fa(xk)+δυik-1(dk)TGa(xk),
where υik-1=tk/υ. So by (P1) we obtain
(39)F(xk+υik-1dk)-F(xk)υik-1>δ(dk)TGa(xk).
By taking the limit in (39) on the subsequence k∈K, we have
(40)d-TG(x-)≥δd-TG(x-).
In view of (37), the last inequality also gives d-=0. Since Ga(xk)=-Bkdk and Bk is bounded, it follows from d-=0 that
(41)limk→∞,k∈KGa(xk)=G(x-)=0.
Therefore, x- is an optimal solution of problem (1).
In the next part, we focus our attention on establishing Q-superlinear convergence of the proposed algorithm.
Theorem 3.
Suppose that the conditions of Theorem 2 hold and x- is an optimal solution of (1). Assume that G is BD-regular at x-. Then x- is the unique optimal solution of (1) and the entire sequence {xk} converges to x-.
Proof.
By the convexity and BD-regularity of G at x-, x- is the unique optimal solution of (3); for the proof, see Qi and Womersley [22]. So x- is also the unique optimal solution of (1). This implies that both f and F must have compact level sets. By Lemma 1, {xk} has at least one accumulation point, and from Theorem 2 we know this accumulation point must be x- since x- is the unique solution of (1). Then, following the proof of Theorem 5.1, see Fukushima and Qi [1], we can prove that the entire sequence {xk} converges to x-.
The condition that the Lipschitz continuous gradient G of F is semismooth at the unique optimal solution of (1) is required in the next theorem. This condition is satisfied if f is the maximum of several affine functions or if f satisfies the constant rank constraint qualification.
Theorem 4.
Suppose that the conditions of Theorem 3 hold and G is semismooth at the unique optimal solution x- of (1). Suppose further that
(i) ak=o(∥G(xk)∥2);
(ii) limk→∞dist(Bk,∂BG(xk))=0;
(iii) tk≡1 for all large k.
Then {xk} converges to x- Q-superlinearly.
Proof.
Firstly, {xk} converges to x- by Theorem 3. Then by condition (i) and (P5), we have
(42)∥Ga(xk)-G(xk)∥=O(√ak)=o(∥G(xk)∥)=o(∥xk-x-∥).
By condition (ii), there is a B-k∈∂BG(xk) such that
(43)∥Bk-B-k∥=o(1).
Since G is semismooth at x-, we have, according to Qi and Sun [13],
(44)∥G(xk)-G(x-)-B-k(xk-x-)∥=o(∥xk-x-∥).
Noting that ∥Bk-1∥=O(1) and using (42)–(44) and condition (iii), for all large k we have
(45)∥xk+1-x-∥=∥xk-x--Bk-1Ga(xk)∥=∥Bk-1[Ga(xk)-G(xk)+G(xk)-G(x-)-B-k(xk-x-)+(B-k-Bk)(xk-x-)]∥≤∥Bk-1∥[∥Ga(xk)-G(xk)∥+∥G(xk)-G(x-)-B-k(xk-x-)∥+∥B-k-Bk∥∥xk-x-∥]=o(∥xk-x-∥).
This establishes Q-superlinear convergence of {xk} to x-.
Condition (i) can be replaced by the more realistic condition ak=o(∥G(xk-1)∥2) without impairing the convergence result, since ak is chosen before xk is generated. For condition (ii), Fukushima and Qi [1] suggest one possible choice of Bk; we may expect Bk to provide a reasonable approximation to an element of ∂BG(xk), but it may be far from what should be approximated. There are some approaches to overcome this difficulty, see Mifflin [10] and Qi and Chen [3]. For condition (iii), we can show that if the conditions of Theorem 4, except (iii), hold and 0<δ<1/2, then condition (iii) holds automatically.
Acknowledgment
This research was partially supported by the National Natural Science Foundation of China (Grants no. 11171049 and no. 11171138).
References
[1] M. Fukushima and L. Qi, "A globally and superlinearly convergent algorithm for nonsmooth convex minimization," SIAM Journal on Optimization, vol. 6, no. 4, pp. 1106–1120, 1996.
[2] A. I. Rauf and M. Fukushima, "Globally convergent BFGS method for nonsmooth convex optimization," Journal of Optimization Theory and Applications, vol. 104, no. 3, pp. 539–558, 2000.
[3] L. Qi and X. Chen, "A preconditioning proximal Newton method for nondifferentiable convex optimization," Mathematical Programming, vol. 76, no. 3, pp. 411–429, 1997.
[4] Y. R. He, "Minimizing and stationary sequences of convex constrained minimization problems," Journal of Optimization Theory and Applications, vol. 111, no. 1, pp. 137–153, 2001.
[5] R. Mifflin and C. Sagastizábal, A VU-proximal point algorithm for minimization, Universitext, Springer, Berlin, Germany, 2002.
[6] C. Lemaréchal, F. Oustry, and C. Sagastizábal, "The U-Lagrangian of a convex function," Transactions of the American Mathematical Society, vol. 352, no. 2, pp. 711–729, 2000.
[7] R. Mifflin, D. Sun, and L. Qi, "Quasi-Newton bundle-type methods for nondifferentiable convex optimization," SIAM Journal on Optimization, vol. 8, no. 2, pp. 583–603, 1998.
[8] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms, Springer, Berlin, Germany, 1993.
[9] X. Chen and M. Fukushima, "Proximal quasi-Newton methods for nondifferentiable convex optimization," Report 95/32, School of Mathematics, The University of New South Wales, Sydney, Australia, 1995.
[10] R. Mifflin, "A quasi-second-order proximal bundle algorithm," Mathematical Programming, vol. 73, no. 1, pp. 51–72, 1996.
[11] K. C. Kiwiel, "Approximations in proximal bundle methods and decomposition of convex programs," Journal of Optimization Theory and Applications, vol. 84, no. 3, pp. 529–548, 1995.
[12] M. Hintermüller, "A proximal bundle method based on approximate subgradients," Computational Optimization and Applications, vol. 20, no. 3, pp. 245–266, 2001.
[13] R. Gabasov and F. M. Kirilova, Izdatel'stvo BGU, Minsk, Belarus, 1980 (in Russian).
[14] M. V. Solodov, "On approximations with finite precision in bundle methods for nonsmooth optimization," Journal of Optimization Theory and Applications, vol. 119, no. 1, pp. 151–165, 2003.
[15] K. C. Kiwiel, "An algorithm for nonsmooth convex minimization with errors," Mathematics of Computation, vol. 45, no. 171, pp. 173–180, 1985.
[16] P. Wolfe, "A method of conjugate subgradients for minimizing nondifferentiable functions," Mathematical Programming Study, vol. 3, pp. 145–173, 1975.
[17] C. Lemaréchal, "An extension of Davidon methods to non differentiable problems," Mathematical Programming Study, vol. 3, pp. 95–109, 1975.
[18] K. C. Kiwiel, International Institute for Applied Systems Analysis, Laxenburg, Austria, 1981.
[19] K. C. Kiwiel, Department of Electronics, Technical University of Warsaw, Warsaw, Poland, 1982.
[20] R. Mifflin, "A modification and extension of Lemarechal's algorithm for nonsmooth minimization," Mathematical Programming Study, vol. 17, pp. 77–90, 1982.
[21] M. Fukushima, "A descent algorithm for nonsmooth convex optimization," Mathematical Programming, vol. 30, no. 2, pp. 163–175, 1984.
[22] L. Qi and R. S. Womersley, "An SQP algorithm for extended linear-quadratic problems in stochastic programming," Annals of Operations Research, vol. 56, pp. 251–285, 1995.