An Approximate Proximal Bundle Method to Minimize a Class of Maximum Eigenvalue Functions

We present an approximate nonsmooth algorithm for a minimization problem whose objective function is the sum of a maximum eigenvalue function of matrices and a convex function. The essential idea is similar to that of the proximal bundle method; the difference is that we use approximate subgradients and approximate function values to construct an approximate cutting-plane model. An important advantage of this approximate cutting-plane model is that it is more stable than the standard cutting-plane model. Based on it, we give an approximate proximal bundle method algorithm and show that the sequences generated by the algorithm converge to the optimal solution of the original problem.


Introduction
The bundle method is one of the most efficient and promising methods for solving nonsmooth optimization problems. For example, a class of maximum eigenvalue functions can be minimized by a bundle method [1], a bundle-filter method can be used to deal with nonsmooth convex constrained optimization problems [2], and a penalized bundle method was proposed by Bonnans et al. [3] to solve nonsmooth optimization problems. Recently, a minimization problem for a class of constrained maximum eigenvalue functions has been solved in [4] with the help of a penalized bundle method. However, when the cutting-plane model for the objective function is constructed in [4], the formula $\partial F(x) = \mathcal{A}^{*}\partial\lambda_{\max}(A(x)) + \partial g(x)$ shows that the subdifferential of $\lambda_{\max}(A(x))$ is involved. Note that $\partial\lambda_{\max}(A(x))$ is the face of $C_n$ exposed by $A(x)$, where $C_n := \{V \in S_n : V \succeq 0,\ \operatorname{tr} V = 1\}$. Hence $\partial\lambda_{\max}(A(x))$ changes drastically when the multiplicity of $\lambda_{\max}(A(x))$ changes [5]. Thus $\partial F(x)$ is unstable, which leads to the instability of the cutting-plane model in [4]. In this paper, to avoid this drawback, we give a more stable approximate cutting-plane model for the objective function.
We consider a class of nonsmooth optimization problems of the form

$$\min_{x \in \mathbb{R}^m} f(x) + g(x), \tag{P}$$

where $f(x) := (\lambda_{\max} \circ A)(x)$ is the composition of $\lambda_{\max}$ with an affine mapping $A$; specifically, $A(x) = A_0 + \mathcal{A}x$ with $A_0 \in S_n$ and $\mathcal{A}$ a linear operator from $\mathbb{R}^m$ to $S_n$, and $g(x)$ is a nonsmooth convex function. We modify the elements of the bundle and give an approximate proximal bundle method algorithm for (P). Our algorithm rests on the basic assumption that at least one approximate function value and one approximate subgradient are available at each point. We also suppose that $\operatorname{ri}\operatorname{dom}(f) \cap \operatorname{ri}\operatorname{dom}(g) \neq \emptyset$.

The motivation for the present work is as follows. The whole idea of bundle methods is to construct a good approximation of the objective function. Reference [4] solved this class of optimization problems with the help of a cutting-plane model and a penalized bundle method; however, the cutting-plane model in [4] is unstable. Our aim is to construct a more stable approximate model for the objective function. To this end we use the approximate subdifferential of the objective function $F(x) := f(x) + g(x) = (\lambda_{\max} \circ A)(x) + g(x)$. In the equation $\partial_\varepsilon F(x) = \mathcal{A}^{*}\partial_\varepsilon\lambda_{\max}(A(x)) + \partial_\varepsilon g(x)$, the set $\partial_\varepsilon\lambda_{\max}(A(x))$ is involved. This set is no longer a face of $C_n$; it is the intersection of $C_n$ with the half-space $\{V \in S_n : V \cdot A(x) \ge \lambda_{\max}(A(x)) - \varepsilon\}$. In particular, almost all matrices in $\partial_\varepsilon\lambda_{\max}(A(x))$ have rank $n$ (for $\varepsilon > 0$), whereas the rank of matrices in $\partial\lambda_{\max}(A(x))$ is at most the multiplicity $r$, which is 1 for almost all $A(x)$. So $\partial_\varepsilon\lambda_{\max}(A(x))$ is relatively stable, and there is a big gap between these two convex sets. Therefore we introduce an enlarged subdifferential [6] of $\lambda_{\max}(A(x))$, which can be regarded as an outer approximation of $\partial\lambda_{\max}(A(x))$ and an inner approximation of $\partial_\varepsilon\lambda_{\max}(A(x))$. Through the enlarged subdifferential of $\lambda_{\max}(A(x))$ we easily obtain an enlarged subdifferential of the objective function $F(x)$. Using approximate subgradients from the enlarged subdifferential to construct the approximate cutting-plane model has two advantages: on the one hand, it avoids the instability of the cutting-plane model constructed in [4]; on the other hand, it avoids working with the very many elements of $\partial_\varepsilon\lambda_{\max}(A(x))$. After constructing this more stable approximate cutting-plane model, we propose our algorithm and prove its convergence.
The rest of this paper is organized as follows. Section 2 presents the approximate cutting-plane model of the objective function. We first introduce an enlarged subdifferential, which is an outer approximation of $\partial F(x)$ and, at the same time, an inner approximation of $\partial_\varepsilon F(x)$. We then use approximate subgradients from the enlarged subdifferential to construct an approximate cutting-plane model for the objective function. Section 3 gives the approximate proximal bundle method algorithm together with a corresponding compression mechanism. Section 4 is devoted to the convergence analysis of the algorithm of Section 3. Section 5 gives the conclusions.
Throughout the paper, $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$ denote the standard norm and inner product of the underlying Hilbert space.

The Approximate Model of the Objective Function
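In this section we give the approximate model for the objective function. Reference [4] gives a cutting-plane model $\check{F}(x)$ for the objective function $F(x)$, built from subgradients of the form $\mathcal{A}^{*}\big(q(A(y_i))\,q(A(y_i))^{T}\big) + s_i$, where $q(A(y_i))\,q(A(y_i))^{T} \in \partial\lambda_{\max}(A(y_i))$ and $s_i \in \partial g(y_i)$.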
Recall that $\partial\lambda_{\max}(A(x))$ is the face of $C_n$ exposed by $A(x)$, where $C_n := \{V \in S_n : V \succeq 0,\ \operatorname{tr} V = 1\}$. If the multiplicity of $\lambda_{\max}(A(x))$ changes, the subdifferential of $\lambda_{\max}(A(x))$ changes drastically. Consequently the subdifferential of $F(x)$ is unstable, which causes the instability of the cutting-plane model $\check{F}(x)$ of [4]. In order to construct a stable approximate model, we take the approximate subdifferential of $F(x)$ into account. First, we consider the approximate subdifferential of $\lambda_{\max}(A(x))$.
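For the composite maximum eigenvalue function, the approximate subdifferential is

$$\partial_\varepsilon\lambda_{\max}(A(x)) = \{V \in C_n : V \cdot A(x) \ge \lambda_{\max}(A(x)) - \varepsilon\},$$

where $C_n = \{V \in S_n : V \succeq 0,\ \operatorname{tr} V = 1\}$. From this one sees that $\partial_\varepsilon\lambda_{\max}(A(x))$ is the intersection of $C_n$ with the half-space $\{V \in S_n : V \cdot A(x) \ge \lambda_{\max}(A(x)) - \varepsilon\}$, instead of being a face of $C_n$. In particular, almost all matrices in $\partial_\varepsilon\lambda_{\max}(A(x))$ have rank $n$ (for $\varepsilon > 0$), whereas the rank of matrices in $\partial\lambda_{\max}(A(x))$ is at most the multiplicity $r$, which is 1 for almost all $A(x)$. This suggests that $\partial_\varepsilon\lambda_{\max}(A(x))$ is more stable than $\partial\lambda_{\max}(A(x))$ and that there is a big gap between these two convex sets. Introduce the compact convex set

$$\delta_\varepsilon\lambda_{\max}(A(x)) := \{Q_\varepsilon(A(x))\,Y\,Q_\varepsilon(A(x))^{T} : Y \in C_{r_\varepsilon}\},$$

where $Q_\varepsilon(A(x))$ is an $n \times r_\varepsilon$ matrix whose columns form an orthonormal basis of $E_\varepsilon(A(x))$.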
Accordingly, the proof of the right inclusion is complete.
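As a numerical illustration of the enlarged subdifferential, the following Python sketch builds one of its elements for a small symmetric matrix. It assumes the usual construction of this set: the columns of `Q` are orthonormal eigenvectors whose eigenvalues lie within `eps` of the largest one, and an element has the form `Q Y Q^T` with `Y` positive semidefinite of trace one. The function names and the choice `Y = I/r` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def eps_eigenspace(X, eps):
    """Return (lam_max, Q): the columns of Q are orthonormal eigenvectors of the
    symmetric matrix X with eigenvalues in [lam_max - eps, lam_max]."""
    w, U = np.linalg.eigh(X)            # eigenvalues in ascending order
    lam_max = w[-1]
    return lam_max, U[:, w >= lam_max - eps]

def enlarged_subgradient(X, eps, Y=None):
    """One element Q Y Q^T of the enlarged set; Y defaults to I/r (assumption)."""
    lam_max, Q = eps_eigenspace(X, eps)
    r = Q.shape[1]
    if Y is None:
        Y = np.eye(r) / r
    return lam_max, Q @ Y @ Q.T

# Two nearly equal top eigenvalues: with eps = 0 only one direction is seen,
# while a small eps > 0 already captures both.
X = np.diag([1.0, 0.999, 0.0])
lam, V_exact = enlarged_subgradient(X, eps=0.0)
_, V_enlarged = enlarged_subgradient(X, eps=0.01)
```

In this example `V_exact` has rank one, whereas `V_enlarged` has rank two and still satisfies the half-space condition of the approximate subdifferential, which is the kind of robustness to near-multiple eigenvalues that motivates the enlarged set.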
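Choose $\mathcal{A}^{*}\big(Q_\varepsilon(A(y_i))\,Y_i\,Q_\varepsilon(A(y_i))^{T}\big) \in \delta_\varepsilon f(y_i)$ and $s_i \in \partial g(y_i)$; their sum is then an approximate subgradient of the objective function at $y_i$. Next, at each point $y_i$ compute an approximate function value. Using this approximate information about $F(x)$, the model $\check{F}(x)$ is replaced by an approximate model. Denoting by $e_i$ the linearization error at the stability center $x_k$, the approximate model can be written in terms of the errors $e_i$ and the approximate subgradients.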
Observe that this approximate model is more stable than the cutting-plane model $\check{F}(x)$. In the following section, based on this approximate model, we give the approximate proximal bundle method algorithm.
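To make the role of the linearization errors concrete, here is a minimal Python sketch of a cutting-plane model written relative to a stability center. It assumes the standard form used in proximal bundle methods, namely the center value plus the maximum over the bundle of `-e_i + <s_i, y - x_center>`; the exact formulas of the paper are not reproduced here, and all function and variable names are hypothetical.

```python
import numpy as np

def linearization_errors(x_center, f_center, points, values, slopes):
    """e_i = f_center - (values[i] + <slopes[i], x_center - points[i]>)."""
    return np.array([f_center - (v + s @ (x_center - y))
                     for y, v, s in zip(points, values, slopes)])

def model_value(y, x_center, f_center, slopes, errors):
    """Piecewise-linear model f_center + max_i(-e_i + <s_i, y - x_center>)."""
    return f_center + max(-e + s @ (y - x_center) for s, e in zip(slopes, errors))

# Toy data for f(x) = |x|, which equals lam_max(diag(x, -x)) in one dimension.
points = [np.array([2.0]), np.array([-1.0])]
values = [2.0, 1.0]
slopes = [np.array([1.0]), np.array([-1.0])]
x_center, f_center = points[0], values[0]

errs = linearization_errors(x_center, f_center, points, values, slopes)
m0 = model_value(np.array([0.0]), x_center, f_center, slopes, errs)
```

With these two planes the model reproduces `|x|` exactly, so `m0` is zero; with approximate values and approximate subgradients the same two functions can be reused unchanged, only the bundle data differ.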

The Approximate Proximal Bundle Method Algorithm
Algorithm 3 (the approximate proximal bundle method).
Step 2. Solve the quadratic program (P$_1$) to obtain the candidate point $y_{k+1}$, and define $\delta_{k+1}$ as the nominal decrease.
Step 3. Call the black box again with $y = y_{k+1}$. If the descent test is satisfied, set $x_{k+1} = y_{k+1}$; otherwise set $x_{k+1} = x_k$. Correspondingly, the former is called a descent step and the latter a null step.
Step 4. Append $y_{k+1}$ to the bundle and construct the model for iteration $k+1$. Replace $k$ by $k + 1$ and go to Step 1.
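The steps above can be sketched in Python on a toy instance. Everything specific here is an assumption made for illustration: the map $A(x) = [[x_1, x_2], [x_2, -x_1]]$ (so that $\lambda_{\max}(A(x)) = \|x\|_2$), the convex term $g(x) = \|x - c\|_1$, the nominal decrease taken as the center value minus the model value at the candidate, the descent parameter `m`, and the projected-gradient solver for the dual of the quadratic program. It is a sketch under these assumptions, not the algorithm of the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the unit simplex (sort-based rule)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def solve_dual(G, e, mu, iters=500):
    """Approximately minimise ||G a||^2 / (2 mu) + e.a over the simplex by
    projected gradient; G holds one bundle subgradient per column."""
    a = np.full(G.shape[1], 1.0 / G.shape[1])
    L = np.linalg.norm(G, 2) ** 2 / mu + 1e-12    # Lipschitz constant of the gradient
    for _ in range(iters):
        a = project_simplex(a - (G.T @ (G @ a) / mu + e) / L)
    return a

def oracle(x, c, eps):
    """Black box: approximate value and approximate subgradient of
    lam_max(A(x)) + ||x - c||_1 for the toy map A (hypothetical instance)."""
    X = np.array([[x[0], x[1]], [x[1], -x[0]]])
    w, U = np.linalg.eigh(X)
    Q = U[:, w >= w[-1] - eps]                    # eps-eigenspace basis
    V = Q @ Q.T / Q.shape[1]                      # Q Y Q^T with Y = I/r
    value = np.sum(V * X) + np.abs(x - c).sum()   # within eps of the true value
    slope = np.array([V[0, 0] - V[1, 1], 2.0 * V[0, 1]]) + np.sign(x - c)
    return value, slope

def approx_bundle(x0, c, eps=1e-3, mu=1.0, m=0.1, tol=1e-6, max_iter=40):
    x = np.asarray(x0, dtype=float)
    fx, s = oracle(x, c, eps)
    pts, vals, subs, history = [x], [fx], [s], [fx]
    for _ in range(max_iter):
        G = np.column_stack(subs)
        # linearization errors of all bundle planes at the stability center
        e = np.array([fx - (v + g @ (x - y)) for y, v, g in zip(pts, vals, subs)])
        d = -G @ solve_dual(G, e, mu) / mu        # candidate direction from the dual
        y = x + d
        delta = -np.max(-e + G.T @ d)             # nominal decrease (assumed form)
        if delta <= tol:                          # stopping test
            break
        fy, sy = oracle(y, c, eps)
        pts.append(y); vals.append(fy); subs.append(sy)
        if fy <= fx - m * delta:                  # descent step: move the center
            x, fx = y, fy
        history.append(fx)                        # null step keeps the center
    return x, fx, history

c = np.array([1.0, 0.0])
x_best, f_best, history = approx_bundle([3.0, 2.0], c)
```

The list `history` records the approximate objective value at the stability center after each iteration; it can only decrease at descent steps and stays constant during null steps. No bundle compression is performed in this sketch.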
Remark 4. For a parameter $\mu_k > 0$, the candidate point $y_{k+1}$ can be obtained by solving the dual problem of (P$_1$).
[…] is the solution to the quadratic programming problem (P$_1$); hence (i) holds.
To see (ii) and (iii), note that, since there is no duality gap, the primal optimal value in (P$_1$) equals the dual optimal value in (D$_1$). Thus (ii) holds. The chain of inequalities relating the objective function, the approximate model, and the value of the model at $y_{k+1}$ implies that (iii) holds.
Theorem 5 ensures that the candidate point $y_{k+1}$ appearing in Step 2 can be obtained.
Remark 6. As the iterations proceed, the bundle accumulates more and more elements. When the bundle becomes too large, it must be compressed. So, at Step 4, a compression subalgorithm should be appended.
When the current size of the bundle reaches the maximal size, that is, $n_k \ge n_{\max}$, Step 4 becomes the following.
Step 4'. Let $n_{\mathrm{act}} := |\{i \le n_k : \alpha_i > 0\}|$ be the number of active indices.
If $n_{\mathrm{act}} \le n_{\max} - 1$, keep the active couples and delete all inactive couples from the bundle. Set $n_{\mathrm{left}} = n_{\mathrm{act}}$ and define $n_{k+1} = n_{\mathrm{left}} + 1$. Then append the new element to the bundle and construct the new model. Set $k = k + 1$ and go to Step 1. The new element carries the linearization error $e_{k+1}$: (i) after a descent step, $e_{k+1} = 0$; (ii) after a null step, $e_{k+1}$ is the linearization error of the new plane at the unchanged stability center $x_k$.
If $n_{\mathrm{act}} > n_{\max} - 1$, delete two or more couples so that $n_{\mathrm{left}} \le n_{\max} - 2$. Define $n_{k+1} = n_{\mathrm{left}} + 2$, then append the new element to the bundle and construct the new model. Set $k = k + 1$ and go to Step 1. The linearization error $e_{k+1}$ of the new element is determined by the same rule, (i) and (ii), as in the previous case.
Remark 7. In certain circumstances with $n_k > n_{\max}$, if too many couples remain after all inactive couples have been discarded from the bundle, one synthesizes the indispensable information of the active elements in the bundle.
The corresponding affine function is called the aggregate linearization. For the aggregate linearization, it holds that […]. When the maximum capacity is reached, for instance when $n = n_{\max}$, assume that one discards $t$ elements ($t < n$) from the bundle and appends the aggregate couple. In the resulting model, each subgradient has the form $\mathcal{A}^{*}\big(Q_\varepsilon(A(y_i))\,Y_i\,Q_\varepsilon(A(y_i))^{T}\big) + s_i$. Note that, in any case, one has […].
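The compression rule of Remarks 6 and 7 can be sketched as follows. The sketch assumes that each bundle couple is stored as a subgradient `subs[i]` with its linearization error `errs[i]`, that `alpha` holds the multipliers of the dual quadratic program, and that the aggregate couple is the `alpha`-weighted combination of the active couples. Keeping the couples with the largest multipliers when too many are active is a hypothetical selection rule chosen only for illustration.

```python
import numpy as np

def compress(subs, errs, alpha, max_size):
    """Return a compressed bundle that leaves room for one new element."""
    active = [i for i, a in enumerate(alpha) if a > 0]
    if len(active) <= max_size - 1:
        # few enough active couples: drop only the inactive ones
        return [subs[i] for i in active], [errs[i] for i in active]
    # too many active couples: keep at most max_size - 2 and add the aggregate
    keep = sorted(active, key=lambda i: alpha[i], reverse=True)[:max_size - 2]
    s_agg = sum(alpha[i] * subs[i] for i in active)   # aggregate subgradient
    e_agg = sum(alpha[i] * errs[i] for i in active)   # aggregate linearization error
    return [subs[i] for i in keep] + [s_agg], [errs[i] for i in keep] + [e_agg]

subs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
        np.array([-1.0, 0.0]), np.array([0.0, -1.0])]
errs = [0.0, 1.0, 2.0, 4.0]
alpha = [0.5, 0.0, 0.3, 0.2]

small_subs, small_errs = compress(subs, errs, alpha, max_size=3)
loose_subs, loose_errs = compress(subs, errs, alpha, max_size=5)
```

With `max_size=3` the three active couples do not fit, so one couple is kept and the aggregate couple is appended, leaving one free slot for the new element; with `max_size=5` only the inactive couple is removed.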

Convergence Analysis
To show convergence of the algorithm, we must take the stopping tolerance $\delta_{\mathrm{tol}}$ into account. We therefore consider two situations, $\delta_{\mathrm{tol}} > 0$ and $\delta_{\mathrm{tol}} = 0$. First, when $\delta_{\mathrm{tol}} > 0$, let $K_s$ denote the set of indices $k$ at which a descent step is done.
Theorem 8. Consider Algorithm 3 and let $F^{*}$ denote $\lim_{k \in K_s} F(x_k)$. Assume that the algorithm never stops ($k \to \infty$) and that $F^{*} > -\infty$. Then […].
Proof. Since $\delta_{\mathrm{tol}} > 0$ and the algorithm never stops, the nominal decrease must satisfy $\delta_k > 0$ for all $k \in K_s$. Since $K_s$ is the set of descent indices, we have $x_{k+1} = y_{k+1}$ and $F(x_k) - F(x_{k+1}) = F(x_k) - F(y_{k+1})$. Let $k'$ be the index following $k$ in $K_s$. Between $k$ and $k'$ the algorithm makes only null steps, without moving the stability center: $x_{k+1} = x_{k+j}$ for all $j = 2, \dots, k' - k$. The descent test at $k'$ gives […]. Thus, for all $k' \in K_s$, it holds that […]; letting $k' \to \infty$, […]. Rearranging the inequality, we obtain the desired result.
When $\delta_{\mathrm{tol}}$ is taken strictly positive, Theorem 8 shows that there is an index $\hat{k}$ for which $\delta_{\hat{k}} \le \delta_{\mathrm{tol}}$ if (P) has minimizers. By Theorem 5(ii), both quantities appearing there are small. Therefore, $x_{\hat{k}}$ is an approximate minimizer. Second, when the stopping tolerance is $\delta_{\mathrm{tol}} = 0$, the algorithm either stops, having found a solution to (P), or never stops. In the latter case there are two possibilities for the sequence of descent steps $\{x_k\}_{k \in K_s}$: either it has infinitely many elements, or there is an iteration $\hat{k}$ at which a last descent step is done, that is, $x_k = x_{\hat{k}}$ for all $k \ge \hat{k}$. We prove these two cases separately.
Case 1. There are infinitely many elements in $K_s$.
Proof. (i) Since the algorithm loops forever and $\delta_{\mathrm{tol}} = 0$, we have $\delta_{k+1} > 0$ for all $k \in K_s$. Hence the infinite sequence of objective values $\{F(x_k)\}$ is strictly decreasing. If (P) has no solution, the sequence $\{F(x_k)\}$ tends to $-\infty$. This completes the proof of (i).
(ii) We show that $\{x_k\}$ is a minimizing sequence. Suppose, for contradiction, that there exist $\bar{x} \in \mathbb{R}^m$ and $\eta > 0$ such that $F(\bar{x}) \le F(x_k) - \eta$ for all $k \in K_s$. By Theorem 5(iii), we have