An Approximate Redistributed Proximal Bundle Method with Inexact Data for Minimizing Nonsmooth Nonconvex Functions

We describe an extension of the redistributed technique from the classical proximal bundle method to the inexact situation, for minimizing nonsmooth nonconvex functions. The cutting-plane model we construct approximates not the whole nonconvex function but a local convexification of the approximate objective function, and this local convexification is modified dynamically so that the linearization errors are always nonnegative. Since we employ only approximate function values and approximate subgradients, the convergence analysis shows that an approximate stationary point, or a double approximate stationary point, can be obtained under mild conditions.


Introduction and Motivation
Consider the following unconstrained nonsmooth nonconvex optimization problem:

min f(x), (1)

where f is a function from ℝⁿ to ℝ. Nonsmooth optimization (NSO) problems arise in many fields of application. There exist several approaches to solving these kinds of problems; see [1][2][3][4]. Bundle methods [5] are based on cutting-plane methods, first described in [6, 7], where convexity of the objective function is the fundamental assumption. If the objective function f is convex, tangent lines are cutting planes supporting the epigraph of f, the linearization errors are always nonnegative, and the model functions, usually defined as the maximum of tangent lines, are lower approximations to the objective function. However, in the nonconvex case the linearization errors can be negative, and the corresponding model function need not stay below f and may even cut off a region containing a minimizer. Little systematic research has been performed on extending convex bundle methods to the nonconvex case. The bundle methods for nonconvex functions [8][9][10][11][12] are of proximal type and were developed in the 1990s; see [13]. They use subgradient locality measures to redefine negative linearization errors, and primal information, corresponding to function values, is again ignored.
Note that in some cases computing the exact function value is not easy. The assumption of using approximate subgradients and approximate function values is realistic; consider, for instance, the Lagrangian relaxation problem: if f is a max-type function of the form f(x) = sup{f_u(x) | u ∈ U}, where each f_u(x) is convex and U is an infinite set, then it may be impossible to calculate f(x), since f itself is defined by an optimization problem involving another function. However, we may still consider two cases. In the first case, for each ε > 0 one can find an element u_ε ∈ U satisfying f_{u_ε}(x) ≥ f(x) − ε; in the second case, this may be possible only for some fixed (and possibly unknown) ε < ∞. In both cases we may set f̃(x) = f_{u_ε}(x) ≥ f(x) − ε. Besides that, the study of approximate subgradients of convex functions is worthwhile, since in some cases a subgradient g(x) ∈ ∂f(x) is expensive to compute. But if we know an already computed subgradient g(y) ∈ ∂f(y), where y is near x, then g(y) ∈ ∂_ε f(x), because

f(x) + g(y)ᵀ(z − x) = f(y) + g(y)ᵀ(z − y) + ε ≤ f(z) + ε, ∀z ∈ ℝⁿ, (2)

where ε = f(x) − f(y) − g(y)ᵀ(x − y) ≥ 0. For more details and papers involving approximate function values and subgradients, we refer to [14][15][16] and the references therein.
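The reuse of a nearby subgradient as an ε-subgradient can be checked numerically. The following sketch (the quadratic test function and all names are illustrative assumptions, not from the paper) verifies relation (2) for a convex function:

```python
import numpy as np

# Sketch (names hypothetical): reuse a subgradient computed at a nearby
# point y as an eps-subgradient at x, for the convex test function
# f(x) = ||x||^2 with gradient g(x) = 2x.
def f(x):
    return float(x @ x)

def grad(x):
    return 2.0 * x

x = np.array([1.0, -0.5])          # point of interest
y = x + np.array([0.05, 0.02])     # nearby point with a known subgradient
g = grad(y)

# eps = f(x) - f(y) - g^T (x - y) is nonnegative by convexity
eps = f(x) - f(y) - float(g @ (x - y))
assert eps >= 0.0

# g is then an eps-subgradient at x: f(z) >= f(x) + g^T (z - x) - eps
rng = np.random.default_rng(0)
for z in rng.normal(scale=3.0, size=(200, 2)):
    assert f(z) >= f(x) + float(g @ (z - x)) - eps - 1e-12
```

The loop samples many points z and confirms that the shifted tangent plane never exceeds f, which is exactly the ε-subgradient inequality.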
In our work, we explore the possibility of using approximate function values and approximate subgradients of f, instead of exact ones, for solving problem (1). From the point of view of the primal problem, by separating the traditional prox-parameter into two parts, a local convexification parameter and a new model prox-parameter, and by employing the inexact information on the objective function, we construct a cutting-plane model that approximates the local convexification function, and the iterate points are obtained approximately by computing its double approximate proximal points. The cutting-plane model is special in the sense that it no longer approximates the objective function, but rather a certain local convexification centered at the proximal center.
This paper is organized as follows. In Section 2, some preliminary results and assumptions required in our paper are provided. In Section 3 we focus on the primal pattern of (1) instead of the dual insight, and the cutting-plane model that locally approximates the objective function is constructed. In Section 4, the concrete approximate redistributed algorithm for solving (1) is presented. Convergence results are examined and discussed in Section 5: the iterate points generated by the proposed algorithm converge to an approximate (or double approximate) stationary point of the objective function. In the last section, some conclusions are given.

Preliminaries and Assumptions
In this part, we first present some basic definitions and results [17].
(i) The regular subdifferential of f at x is defined by

∂̂f(x) = { g ∈ ℝⁿ | f(z) ≥ f(x) + gᵀ(z − x) + o(‖z − x‖) for z near x }.

(ii) The limiting subdifferential of f at x is defined by

∂f(x) = { g ∈ ℝⁿ | there exist xₖ → x with f(xₖ) → f(x) and gₖ ∈ ∂̂f(xₖ) with gₖ → g }.

If f is finite at x, ∂̂f(x) and ∂f(x) are closed and ∂̂f(x) is convex.
(iii) For given ε > 0, the ε-limiting subdifferential ∂_ε f(x) of f at x is defined accordingly. We call elements of this approximate subdifferential approximate subgradients.
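A small numeric probe illustrates the regular subdifferential of item (i). The example function f(x) = |x| − x² and all names below are illustrative assumptions, not from the paper; its regular subdifferential at 0 is the interval [−1, 1]:

```python
import numpy as np

# Numeric probe of the regular subdifferential (illustrative example):
# for f(x) = |x| - x^2, the regular subdifferential at x = 0 is [-1, 1].
def f(x):
    return np.abs(x) - x**2

def looks_like_regular_subgrad(g):
    # sample the difference quotient (f(z) - f(0) - g*z) / |z| near 0;
    # for a regular subgradient it must stay (essentially) nonnegative
    z = np.concatenate([-np.logspace(-8, -3, 50), np.logspace(-8, -3, 50)])
    quot = (f(z) - f(0.0) - g * z) / np.abs(z)
    return bool(quot.min() >= -2e-3)   # small slack for the -x^2 curvature

assert looks_like_regular_subgrad(-1.0)
assert looks_like_regular_subgrad(0.3)
assert not looks_like_regular_subgrad(1.2)   # outside [-1, 1]
```

The slack term accounts for the o(‖z − x‖) remainder in the definition, which here comes from the smooth −x² part.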
The proximal-point mapping p_μ f(x) = argmin_y { f(y) + (μ/2)‖y − x‖² } is single-valued and Lipschitz continuous on B_0, provided μ is sufficiently large. By imitating the conclusion in [18] and the optimality condition, we define a new kind of approximate stationary point of the objective function. Here "μ sufficiently large" means μ > μ̄, where μ̄ is the value in item (c) of Lemma 3. This relation, together with the local convexification property, plays a fundamental role in the development of our algorithm once the ideal proximal threshold μ̄ is known.
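The role of the threshold can be seen on a small example. The sketch below (grid-based, 1-D; the function f(x) = |x| − x² with threshold μ̄ = 2 and all names are illustrative assumptions) shows that the prox subproblem has a well-defined minimizer for μ above the threshold, while below it the subproblem is unbounded:

```python
import numpy as np

# Grid-based sketch of p_mu f(c) = argmin_y f(y) + (mu/2)(y - c)^2
# for the nonconvex example f(x) = |x| - x^2 (prox threshold mu_bar = 2).
def f(x):
    return np.abs(x) - x**2

def prox(c, mu, half_width=50.0, n=200001):
    grid = np.linspace(c - half_width, c + half_width, n)
    vals = f(grid) + 0.5 * mu * (grid - c) ** 2
    i = int(np.argmin(vals))
    return grid[i], vals[i]

# mu = 4 > mu_bar: the subproblem is strongly convex, the proximal point
# is unique and stays close to the center
p_hi, _ = prox(0.6, 4.0)

# mu = 1 < mu_bar: the subproblem is unbounded below, so the grid
# "minimizer" escapes to the boundary of the search window
p_lo, _ = prox(0.6, 1.0)
```

For this example p_hi is the true proximal point 0.7, while p_lo sits at the edge of whatever window is searched, reflecting the loss of prox-boundedness below the threshold.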

Construction of the Model
For a convex function f, the exact linearization error of f at x̂ is defined by

e_j = f(x̂) − f(x_j) − g_jᵀ(x̂ − x_j),

where g_j ∈ ∂f(x_j) and x̂ is the current stability center.
Obviously we have e_j ≥ 0, and the reformulated bundle data consists of {x̂, f(x̂), g ∈ ∂f(x̂)} together with the approximate subgradients collected over the bundle. In our method, at any iteration k, the bundle method keeps memory of the iterative process in a bundle of inexact information indexed by B_k ⊆ {0, 1, 2, . . ., k}, where f̃_j ∈ [f(x_j) − σ_j, f(x_j) + σ_j], and f̃_{x̂} is the best approximate value obtained until iteration k, evaluated at the serious step x̂, corresponding to some past iterate x_ℓ. For a nonconvex function f, we work with the augmented functions

f_{x̂,η}(·) = f(·) + (η/2)‖· − x̂‖²,

and we consider the corresponding augmented bundle of inexact information. Note that the following relation holds, where f̃_{x̂,η} = f̃ + (η_k/2)‖· − x̂‖² and μ_k = η_k + R_k; here η_k and R_k are called the convexification parameter and the model prox-parameter, respectively. We use the past information in the bundle to construct a cutting-plane model of the function f̃_{x̂,η}. Written with all the iteration indices, the model is

φ_k(·) = max_{j∈B_k} { f̃_j + (η_k/2)‖x_j − x̂‖² + (g̃_j + η_k(x_j − x̂))ᵀ(· − x_j) }.

The next candidate point is chosen as x_{k+1} = p_{R_k} φ_k(x̂). The corresponding optimality condition is that there exists a multiplier vector α ∈ Δ_{|B_k|} such that

G_k := Σ_{j∈B_k} α_j (g̃_j + η_k(x_j − x̂)) = R_k (x̂ − x_{k+1}),

where Δ_{|B_k|} denotes the unit simplex; G_k is called the aggregate approximate subgradient, and the corresponding aggregate bundle element is a quadruplet. For all j ∈ B_k, the aggregate quantities ẽ_j^{k+1} and Δ_j^{k+1} can be updated accordingly. In the classical bundle method with inexact information for convex functions, the bundle consists of pairs (ẽ_j, g̃_j); for our method, this pair is replaced by a quadruplet (ẽ_j^k, g̃_j^k, Δ_j^k, x_j^k), for which the key relation holds: for all j ∈ B_k, the shifted linearization error is nonnegative whenever ẽ_j + (η_k/2)‖x_j − x̂‖² + 2σ ≥ 0. We want the parameter η_k to asymptotically estimate the ideal convexity threshold η̄; when η_k is sufficiently large, f̃ + (η_k/2)‖· − x̂‖² is a convex function. As a result, the model function φ_k eventually becomes a lower approximation to the locally convex function f̃_{x̂,η_k}. Set

η_k = max{ 0, max_{j∈B_k, x_j ≠ x̂} −2(ẽ_j + 2σ)/‖x_j − x̂‖² };

clearly ẽ_j + (η/2)‖x_j − x̂‖² + 2σ ≥ 0 for all j ∈ B_k whenever η ≥ η_k.
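The construction above can be sketched in a few lines. The example below (exact data, σ = 0, 1-D; the function f(x) = |x| − x² and all names are illustrative assumptions, not the paper's) computes the linearization errors, the smallest convexification parameter making the shifted errors nonnegative, and the resulting cutting-plane model:

```python
import numpy as np

# Sketch of the redistributed construction (names hypothetical) for the
# nonconvex example f(x) = |x| - x^2 with exact data (sigma = 0).
def f(x):
    return abs(x) - x**2

def subgrad(x):
    return np.sign(x) - 2.0 * x    # a subgradient of f for x != 0

center = 0.6                       # stability center x_hat
pts = np.array([0.6, 1.0, -0.4, 0.2])
fvals = np.array([f(x) for x in pts])
gvals = np.array([subgrad(x) for x in pts])

# plain linearization errors e_j = f(x_hat) - f(x_j) - g_j (x_hat - x_j);
# nonconvexity can make some of them negative
e = f(center) - fvals - gvals * (center - pts)

# smallest eta making every shifted error e_j + (eta/2)(x_j - x_hat)^2 >= 0
d2 = (pts - center) ** 2
eta = max([0.0] + [-2.0 * e[j] / d2[j] for j in range(len(pts)) if d2[j] > 0])

e_shift = e + 0.5 * eta * d2
assert np.all(e_shift >= -1e-12)   # redistributed errors are nonnegative

# cutting-plane model of the convexification f + (eta/2)(. - x_hat)^2:
# plane j has value f_j + (eta/2)(x_j - x_hat)^2, slope g_j + eta (x_j - x_hat)
def model(y):
    planes = (fvals + 0.5 * eta * d2
              + (gvals + eta * (pts - center)) * (y - pts))
    return float(planes.max())
```

For this bundle the computed η is 2, which is exactly the convexification threshold of |x| − x², and the model interpolates the augmented function at the stability center.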
Remark 5. Comparing with the threshold η̄ of [19] and η_min of [18], we find that η̄ ≥ η_min ≥ η_k, which means that the domain of η ensuring nonnegativity of the linearization errors is enlarged.

Algorithm
Algorithm 6 (approximate redistributed proximal bundle method). Consider the following steps.
Step 6 (update of the model prox-parameter). If f̃_{k+1} > f̃_{x̂} + σ_0, restart the algorithm by resetting the parameters.
Step 7 (stopping criterion). If Δ_{k+1} − 2σ ≤ TOL_stop, then stop with the message "Algorithm successfully terminated at x_{k+1}." Otherwise, in the case of a serious step, increase ℓ by 1. In all cases increase k by 1 and loop to Step 2.
From the definition of x_{k+1}, we obtain its equivalent expression; that is, f̃_{k+1} ≤ f̃_{x̂} + σ_0.
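One pass through the candidate computation and descent test can be sketched as follows (exact data, 1-D, grid-based prox for illustration; the example function, the parameter values, and the descent constant m = 0.1 are assumptions, not the paper's):

```python
import numpy as np

# One-iteration sketch: candidate as the prox-point of the cutting-plane
# model, then the descent test with the predicted decrease
# delta = (f(x_hat) + (eta/2)(x_+ - x_hat)^2) - phi(x_+).
def f(x):
    return abs(x) - x**2

center = 0.6
eta, R = 2.0, 4.0                  # convexification / model prox-parameters

pts = np.array([0.6, 1.0, -0.4, 0.2])
fvals = np.array([f(x) for x in pts])
gvals = np.sign(pts) - 2.0 * pts

def phi(y):   # cutting-plane model of f + (eta/2)(. - center)^2
    return float(np.max(fvals + 0.5 * eta * (pts - center) ** 2
                        + (gvals + eta * (pts - center)) * (y - pts)))

# candidate: argmin phi(y) + (R/2)(y - center)^2, here by brute-force grid
grid = np.linspace(center - 1.0, center + 1.0, 20001)
vals = np.array([phi(y) + 0.5 * R * (y - center) ** 2 for y in grid])
cand = float(grid[np.argmin(vals)])

# predicted decrease and an Armijo-like descent test (m = 0.1)
pred = (f(center) + 0.5 * eta * (cand - center) ** 2) - phi(cand)
serious = f(cand) <= f(center) - 0.1 * pred
```

Here the predicted decrease comes out nonnegative, as guaranteed once the convexification parameter is large enough, and the test declares a serious step.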

Convergence Results
Lemma 8. For the function φ_k, the following conclusions hold.
Lemma 9. Consider the functions φ_k(·) given by (14). If f satisfies Assumptions 1 and 2, then there exists an iteration k̄ > 0 such that all the parameter sequences stabilize. Therefore, condition (3) in [19] holds, where μ̄ = η̄ + R̄, and η̄ and R̄ are the stabilized convexification parameter and model prox-parameter.
Theorem 11. Suppose that θ = 0 and there is no termination. Let η̄ be the stabilized value of the local convexification parameter sequence, as in Lemma 9, with η̄ > μ̄. The following two mutually exclusive conclusions hold: (a) there is a last serious step x̂, followed by infinitely many null steps; then x_{k+1} → x̂ and x̂ is an approximate stationary point of f.
(b) There is an infinite number of serious steps. Then any accumulation point of the sequence {x̂_ℓ} is a double approximate stationary point of f.
Remark 12. Note that if ε = 0, the results obtained in Theorem 11 are exactly those in [18], which shows that our work is a genuine generalization of the previous work.

Conclusion
We propose an approximate redistributed proximal bundle method for nonsmooth nonconvex optimization, employing inexact information from the objective function. With inexact data we prove that the cutting-plane model constructed in this paper is eventually a local lower approximation to the approximate objective function. The convergence analysis proceeds by first showing that the convexification parameter eventually stabilizes; once it has stabilized, convergence to an approximate or double approximate stationary point of the objective function is obtained, under the condition that the stabilized convexification parameter is greater than the ideal proximal threshold μ̄. The local convexification approach opens a new way for future study of nonsmooth nonconvex optimization and may shed new insight on the first-order models of [20].
In [21], the authors present a framework of general bundle methods capable of handling inexact oracles; the framework generalizes, in various ways, a number of algorithms proposed in the literature. Next we discuss the relationship between our algorithm and [21]. In [21], the objective function f is a finite-valued convex function, and the authors make the following assumption: for each given x ∈ ℝⁿ, the oracle delivers the inexact information (66). In our paper, if f̃ is the function made locally convex by adding a quadratic term with the convexification parameter, we suppose that, for each given x ∈ ℝⁿ, an oracle delivers the inexact information (67). Assumption (67) is more general than (66), since if we choose θ = σ = ε, then (67) becomes f̃(x) ∈ [f(x) − ε, f(x) + ε] and g̃ ∈ ∂_{2ε} f(x), which is exactly (66). Therefore, our assumption is, in a sense, a generalization of the assumptions in [21].
In [21], the authors use linearizations and a cutting-plane model that becomes a lower approximation to the locally convex objective function. It is similar to (68) except for the appearance of the 2σ term. At the same time, our next trial point is chosen as x_{k+1} = p_{R_k} φ_k(x̂), the approximate proximal point of f̃ defined by (6), which is quite different from the general traditional techniques introduced in [21].
In [21], the authors mention that, for inexact oracles, the progress made by the algorithm can be measured relative to the model, to some nominal reference value, or to the approximate objective function; these measures are called the model decrease and the effective decrease. In our paper, we define a predicted decrease

δ_{k+1} = (f̃_{x̂} + (η_k/2)‖x_{k+1} − x̂‖² + 2σ) − φ_k(x_{k+1}),

which is similar to the model decrease, but a bit different from it, since the model decrease is associated only with the current stability center û_ℓ, while the first term in δ_{k+1} is related not only to the current stability center x̂ but also to the new trial point x_{k+1}. This predicted decrease is nonnegative as long as η_k is sufficiently large, and this requirement coincides exactly with the condition guaranteeing that φ_k(·) is a lower approximation to f̃_{x̂,η_k}(·). Therefore, we can further employ it in the descent test to decide between making a descent step and a null step.

And if f is finite at x, ∂_ε f(x) is convex and closed.
(iv) We say f is prox-bounded if there exists μ ≥ 0 such that f + (μ/2)‖ ⋅ ‖² is bounded below. The corresponding threshold is the smallest μ̄ ≥ 0 such that f + (μ/2)‖ ⋅ ‖² is bounded below for all μ > μ̄.
(v) The function f is lower-C² on an open set V if f is finite on V and for any x in V there exists a threshold ρ_{C²} > 0 such that f + (ρ/2)‖ ⋅ ‖² is convex on an open neighbourhood V′ of x for all ρ > ρ_{C²}.
Next we give the assumptions needed in our paper.
Assumption 1. For fixed accuracy tolerances θ ≥ 0, σ ≥ 0, for each x ∈ ℝⁿ the oracle can provide an approximate function value f̃(x) ∈ [f(x) − σ, f(x) + σ] and an approximate subgradient g̃ ∈ ∂_ε f(x) of f at x, where ε = θ + σ. Given x_0 ∈ ℝⁿ and σ_0 > 0, there exist an open bounded set B and a function h such that B_0 = {x ∈ ℝⁿ | f̃(x) ≤ f̃(x_0) + σ_0} ⊂ B, and h is lower-C² on B satisfying h ≡ f̃ on B_0.
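The lower-C² property of item (v) can be illustrated numerically. In the sketch below (the example f(x) = |x| − x² and all names are assumptions for illustration, not from the paper), the augmented function f + (ρ/2)x² is convex exactly when ρ ≥ 2, so the threshold is 2:

```python
import numpy as np

# Numeric check of the lower-C^2 definition for the illustrative example
# f(x) = |x| - x^2: the augmented function f + (r/2) x^2 is convex
# exactly when r >= 2.
def f(x):
    return np.abs(x) - x**2

def aug(x, r):
    return f(x) + 0.5 * r * x**2

def midpoint_convex(r, lo=-2.0, hi=2.0, n=2001):
    x = np.linspace(lo, hi, n)
    h = aug(x, r)
    # convexity on the grid: each midpoint lies on or below the chord
    return bool(np.all(h[1:-1] <= 0.5 * (h[:-2] + h[2:]) + 1e-12))

assert midpoint_convex(2.0)       # at the threshold: convex
assert midpoint_convex(3.5)       # above the threshold: still convex
assert not midpoint_convex(1.0)   # below the threshold: convexity fails
```

The same function is prox-bounded with threshold 2 as well, since f + (μ/2)x² is bounded below precisely when the quadratic term dominates the −x² part.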