A Decomposition Algorithm for Convex Nondifferentiable Minimization with Errors

A decomposition algorithm based on a proximal bundle-type method with inexact data is presented for minimizing an unconstrained nonsmooth convex function $f$. At each iteration, only approximate evaluations of $f$ and approximate subgradients are required, which makes the algorithm easier to implement. It is shown that every cluster point of the sequence of iterates generated by the proposed algorithm is an exact solution of the unconstrained minimization problem. Numerical tests support the theoretical findings.


Introduction
Consider the following minimization problem:
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a nondifferentiable convex function. It is well known that many practical problems can be formulated as (1.1), for example, problems in catastrophe, ruin, vitality, data mining, and finance. A classical conceptual algorithm for solving (1.1) is the proximal point method, based on the Moreau-Yosida regularization of $f$ [1, 2]. Implementable forms of the method can be obtained by means of a bundle technique, alternating serious steps with sequences of null steps [3, 4].
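As a concrete illustration of the proximal point method mentioned above, the following minimal sketch iterates the proximal map for the one-dimensional function f(t) = |t|, for which the proximal point has a closed form (soft-thresholding). The test function, the value of μ, and the helper names `prox_abs` and `proximal_point_method` are assumptions chosen for illustration; they are not taken from the paper.

```python
def prox_abs(x, mu):
    """Proximal point of f(t) = |t| at x: argmin_p |p| + (mu / 2) * (p - x)**2."""
    t = 1.0 / mu  # soft-threshold level
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def proximal_point_method(x0, mu, iters):
    """Iterate x_{k+1} = p_mu(x_k) and return the list of iterates."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox_abs(xs[-1], mu))
    return xs


# Hypothetical starting point and parameter, for illustration only.
iterates = proximal_point_method(x0=2.5, mu=1.0, iters=5)
```

With these illustrative values each step moves the iterate toward the minimizer 0 by at most 1/μ, and the iteration stays at 0 once it is reached. For a general nonsmooth f no closed form is available, which is why bundle techniques are used to approximate the proximal point.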
More recently, new conceptual schemes for solving (1.1) have been developed by using an approach that is somewhat different from Moreau-Yosida regularization. This is the VU-theory introduced in [5]; see also [6, 7]. The idea is to decompose $\mathbb{R}^n$ into two orthogonal subspaces, $U$ and $V$.
Consider problem (1.2), where $P$ is a compact subset of $\mathbb{R}^m$ and $q : \mathbb{R}^m \to \mathbb{R}$, $h : \mathbb{R}^m \to \mathbb{R}^n$. Lagrangian relaxation of the equality constraints in the problem leads to the following problem:
$$\min_{x \in \mathbb{R}^n} C(x), \tag{1.3}$$
where $C$ is the dual function, defined through the optimization problem (1.4). Trying to solve problem (1.2) by means of solving its dual problem (1.3) makes sense in many situations. In this case, evaluating the function value $C(x)$ and a subgradient $g(x) \in \partial C(x)$ requires solving the optimization problem (1.4) exactly. In some cases, however, computing exact values of $C(x)$ is unnecessary and inefficient. For this reason, some modifications of the bundle methods in [9] were needed.
The paper is organized as follows. In the next section we present the approximate U-Lagrangian based on the approximate subgradient. Then we design a conceptual Algorithm 2.6 which can deal with approximate subgradients and approximate function values. Section 3 has three parts. In the first part, we propose the approximate primal-dual track. The proximal bundle-type subroutine with inexact data is introduced in the second part. The third part of Section 3 is devoted to establishing an implementable Algorithm 3.5, which substitutes the approximate V-step in Algorithm 2.6 with the proximal bundle subroutine. Numerical testing of the resulting Algorithm 3.5 is reported in the final section.

Approximate U-Lagrangian and Its Properties
In some cases, computing exact values of the objective function and exact subgradients is unnecessary and inefficient. For this reason, a modification of the U-Lagrangian is proposed in this section. We assume that $g$ satisfies condition (2.1). Introducing this $g$ into the framework of [5], one can restate the definition of the U-Lagrangian and its properties as follows.
Definition 2.1. Assume (2.1). The approximate U-Lagrangian of $f$, denoted by $L_g(u)$, is defined by (2.2), together with the associated set $W_g(u)$.
Theorem 2.2. Assume (2.1). Then the following assertions are true: (i) the function $L_g$ defined in (2.2) is convex and finite everywhere; (iii) $0 \in W_g(0)$ and $L_g(0) = f(x)$.
Theorem 2.3. Assume (2.1) and $W_g(u) \neq \emptyset$. Then one has that
Remark 2.4. Assume (1.2). If $\varepsilon = 0$, then the approximate U-Lagrangian in this paper is exactly the U-Lagrangian in [5].

Approximate Decomposition Algorithm Frame
In order to give an approximate decomposition algorithm frame, we restate the definition of the Hessian matrix in [5] as follows.
Definition 2.5. Assume that $f$ is finite, $x$ is fixed, and $g$ satisfies (2.1). We say that $f$ has at $x$ a U-Hessian $H_U f(x)$ associated with $g$ if $L_g(u)$ has a generalized Hessian at 0.
Assuming (2.1), we investigate an approximate decomposition algorithm frame based on the definition of the approximate U-Lagrangian.
Algorithm 2.6.
Compute an optimal solution $\delta v \in V$ satisfying
Set $x = x^k + 0 \oplus \delta v$.
Step 3. U-step. Make a Newton step in $x$. Compute $g^k$ satisfying, for all
Compute the solution $\delta u \in U$ satisfying
Step 4. Update. Set
Compute $g^{k+1}$ satisfying, for all $x$, $f(x)$
Theorem 2.7. Assume (2.1) and that $f$ has a positive definite U-Hessian at $x$, a minimizer of $f$. Then the iterate points $\{x^k\}$ constructed by Algorithm 2.6 satisfy
This algorithm is the same as Algorithm 4.5 in [5], except that it uses only approximate objective function values, which makes it easier to implement.

Approximate Decomposition Algorithm
Since Algorithm 2.6 in Section 2 relies on knowing the subspaces U and V and converges only locally, it needs significant modification. In [10], Mifflin and Sagastizábal show that a proximal point sequence follows the primal track near a minimizer. This opens the way for defining a VU-algorithm where V-steps are replaced by proximal steps. In addition, the proximal step can be estimated with a bundle technique, which can also approximate the unknown U and V subspaces as a computational byproduct. On this basis, they establish Algorithm 6 in [8] by combining the bundle subroutine with the VU-space decomposition method. However, this algorithm needs exact function values and exact subgradients, which are expensive to compute. It is therefore worthwhile to study the use of approximate values instead of exact ones.

Approximate Primal-Dual Track
Given a positive scalar parameter $\mu$, the proximal point function depending on $f$ is defined by
$$p_\mu(x) := \arg\min_{p \in \mathbb{R}^n} \Big\{ f(p) + \tfrac{1}{2}\mu \|p - x\|^2 \Big\},$$
where $\|\cdot\|$ stands for the Euclidean norm. It has the property
$$g_\mu(x) := \mu\,(x - p_\mu(x)) \in \partial f(p_\mu(x)).$$
Similarly to the definition of the primal track, we define the approximate primal track.
Definition 3.1. For any $\varepsilon > 0$ and $\mu = \mu(x) > 0$, we say that $\Theta(u) = x + u \oplus v(u)$ is an approximate primal track leading to $x$, a minimizer of $f$, if for all $u \in \mathbb{R}^{\dim U}$ small enough, it satisfies the following: (iii) the Jacobian $J\Theta(u)$ is a basis matrix for $V(\Theta(u))^{\perp}$; (iv) the particular U-Lagrangian $L_0(u)$ is a $C^2$-function.
Accordingly, we have the approximate dual track, denoted by $\Gamma(u)$, corresponding to the approximate primal track. More precisely,
In fact, if $\varepsilon = 0$, the approximate primal-dual track is exactly the primal-dual track shown in [8].
The next lemma shows that making an approximate V-step in Algorithm 2.6 essentially amounts to finding a corresponding approximate primal track point.
Lemma 3.2. Let $\Theta(u_\mu(x))$ be an approximate primal track leading to $x$, a minimizer of $f$, and let $H := \nabla^2 L_0(0)$. Then for all $u$ sufficiently small, $\Theta(u_\mu(x))$ is the unique minimizer of $f$ on the affine set $\Theta(u_\mu(x)) + V(\Theta(u_\mu(x)))$.
Proof. Since $J\Theta(u)$ is a basis for $V(\Theta(u))^{\perp}$, Theorem 3.4 in [10] with $B_U(u) = J\Theta(u)$ gives the result.

The Proximal Bundle-Type Subroutine with Inexact Data
Throughout this section, we make the following assumption: at each given point $x \in \mathbb{R}^n$, and for $\varepsilon \ge 0$, we can find some $\tilde f(x) \in \mathbb{R}$ and $g(x) \in \mathbb{R}^n$ satisfying
$$f(z) \ge \tilde f(x) + \langle g(x), z - x \rangle \quad \text{for all } z \in \mathbb{R}^n,$$
where $\varepsilon = f(x) - \tilde f(x)$. At the same time, it can be ensured that $g(x) \in \partial_\varepsilon f(x)$, where $\partial_\varepsilon f(x) = \{ g : f(z) \ge f(x) + \langle g, z - x \rangle - \varepsilon \text{ for all } z \in \mathbb{R}^n \}$. This setting is realistic in many applications; see [11].
The bundle method includes two phases. (i) The first phase makes use of the bundle information to establish a polyhedral approximation of $f$ at the current iterate $x^k$. (ii) Due to the kinky structure of $f$, the model may not approximate $f$ precisely. Then, more information around the current iterate $x^k$ is gathered to obtain a more reliable model.
Feature (i) leads to the following approximation of $f$ at $x^k$. Let $I_k$ denote the index set at the $k$th iteration, with each $j \in I_k$ representing $(y^j, \tilde f^j, g^j)$, where $\tilde f^j$ and $g^j$ satisfy the above assumption at $y^j$ for given $\varepsilon_k > 0$. From the choices of $\tilde f^j$ and $g^j$, each bundle element yields, for all $x \in \mathbb{R}^n$ and all $j \in I_k$, a linearization inequality involving the linearization errors $\alpha_{k,j}$. On the basis of this observation, we explore the possibility of using approximate subgradients and approximate function values instead of exact ones. We approximate $f$ at $x^k$ from below by a piecewise linear convex function $\varphi$ of the form (3.8). Since (3.8) becomes more and more crude as an approximation of $f$ farther away from $x^k$, we add the proximal term $\tfrac{1}{2}\mu\|p - x^k\|^2$, $\mu > 0$, to it. To approximate a proximal point, we solve the first quadratic programming subproblem (3.9) in the variables $(r, p)$. Its corresponding dual problem is (3.10).
Let $r$, $p$, and $\lambda = (\lambda_1, \ldots, \lambda_{|I_k|})$ denote the optimal solutions of (3.9) and (3.10). Then it is easily seen that
$$r = \varphi(p), \qquad p = x^k - \frac{1}{\mu}\, g, \quad \text{where } g := \sum_{j \in I_k} \lambda_j g^j. \tag{3.11}$$
In addition, $\lambda_j = 0$ for all $j \in I_k$ such that $r > f^k + (g^j)^T (p - x^k) - \alpha_{k,j} + \varepsilon_k$, and relation (3.12) holds.
The vector $p$ is an estimate of an approximate proximal point. Hence, it approximates an approximate primal track point when the latter exists. To proceed further we let $y^{j_+} := p$, where $j_+$ denotes the new index, and compute $\tilde f(p)$, $e := \tilde f(p) - \varphi(p) = \tilde f(p) - r$, and $g^{j_+}$ satisfying $f(z) \ge \tilde f(p) + \langle g^{j_+}, z - p \rangle$ for all $z \in \mathbb{R}^n$.
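The role of the inexact bundle data can be illustrated with a small sketch. It builds a cutting-plane model from inexact linearizations of the form $f(z) \ge \tilde f(y^j) + \langle g^j, z - y^j \rangle$ and checks numerically that the model stays below $f$. The objective f(x) = |x|, the oracle `inexact_oracle`, the sample points, and the tolerance are all assumptions made for illustration. The model is written in the unshifted form max_j { f̃_j + g_j (z − y_j) } rather than with the linearization errors $\alpha_{k,j}$ used in the text.

```python
import random


def f(x):
    """Exact objective, assumed for illustration: f(x) = |x|."""
    return abs(x)


def inexact_oracle(y, eps, rng):
    """Return (f_tilde, g) with f_tilde in [f(y) - eps, f(y)] and g a subgradient at y.

    Since g is an exact subgradient and f_tilde <= f(y), the pair satisfies
    f(z) >= f_tilde + g * (z - y) for all z.
    """
    f_tilde = f(y) - eps * rng.random()
    g = 1.0 if y > 0 else (-1.0 if y < 0 else 0.0)
    return f_tilde, g


def model(bundle, z):
    """Cutting-plane model: pointwise maximum of the inexact linearizations."""
    return max(fj + gj * (z - yj) for (yj, fj, gj) in bundle)


rng = random.Random(0)
eps = 1e-2
bundle = []
for y in (-2.0, -0.5, 0.3, 1.7):  # hypothetical trial points
    fj, gj = inexact_oracle(y, eps, rng)
    bundle.append((y, fj, gj))

grid = [i / 10.0 for i in range(-30, 31)]
min_gap = min(f(z) - model(bundle, z) for z in grid)  # smallest value of f - model
```

In this sketch the gap `f(z) - model(bundle, z)` is nonnegative on the grid and is at most `eps` at the bundle points, which is the sense in which the model approximates f from below.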
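An approximate dual path point, denoted by $s$, is constructed by solving a second quadratic problem, which depends on a new index set $\hat I_k$ defined in (3.13). The second quadratic programming problem is (3.14), again in the variables $(r, p)$. It has the dual problem
$$\min \ \frac{1}{2} \Big\| \sum_{j \in \hat I_k} \lambda_j g^j \Big\|^2 \quad \text{s.t.} \quad \lambda_j \ge 0,\ j \in \hat I_k, \quad \sum_{j \in \hat I_k} \lambda_j = 1. \tag{3.15}$$
Similar to (3.11), the respective solutions, denoted by $r$, $p$, and $\lambda$, satisfy
$$r = \varphi(p), \qquad p - x^k = -s, \quad \text{where } s = \sum_{j \in \hat I_k} \lambda_j g^j. \tag{3.16}$$
Given $\sigma \in (0, 1/2]$, the proximal bundle subprocedure is terminated and $p$ is declared to be an approximate proximal point if the stopping test (3.17) holds. Otherwise, $I_k$ above is replaced by $\hat I_k$, and new iterate data are computed by solving the updated subproblems (3.9) and (3.14). This update, appending $(\alpha_{k,j}, g^j)$ for the new index to the active data at (3.9), ensures convergence to a minimizing point $x$ in case of nontermination.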
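Remark 3.3. From the discussion above, the following results are true: (i) $s = \arg\min\{ \|s\|^2 : s \in \operatorname{co}\{ g^j : j \in \hat I_k \} \}$; (ii) since $p_\mu(x)$ is an approximate primal track point $\Theta(u_\mu(x))$ approximated by $p$, and $\operatorname{co}\{ g^j : j \in \hat I_k \}$ approximates $\operatorname{co}\{ g : f(x) \ge f(\Theta(u)) + \langle g, x - \Theta(u) \rangle \}$, from (3.2) the corresponding $\Gamma(u_\mu(x))$ is estimated by $s$; (iii) we can obtain $U$ by means of the following construction:
$$I_k^{\mathrm{act}} := \{ j \in \hat I_k : r = (g^j)^T (p - x) \}. \tag{3.18}$$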
Then, from (3.16), $r = -(g^j)^T s$ for $j \in I_k^{\mathrm{act}}$, so
$$(g^j - g^l)^T s = 0 \tag{3.19}$$
for all such $j$ and for a fixed $l \in I_k^{\mathrm{act}}$. Define a full column rank matrix $V$ by choosing the largest number of indices $j$ satisfying (3.19) such that the corresponding vectors $g^j - g^l$ are linearly independent, and by letting these vectors be the columns of $V$. Then let $U$ be a matrix whose columns form an orthonormal basis for the null space of $V^T$, with $U = I$ if $V$ is vacuous.
Theorem 3.4. At the $k$th iteration, the above proximal bundle subprocedure satisfies the following: (iv) $\|s\| \le \|g\|$, where $g = \mu (x^k - p)$; (v) for any parameter $m \in (0, 1)$, (3.17) implies (3.20).
Proof. (i) Since $g^{j_+}$, the subgradient at the new index $j_+$, satisfies $f(z) \ge \tilde f(p) + \langle g^{j_+}, z - p \rangle$ and $e = \tilde f(p) - \varphi(p) \ge 0$, $g^{j_+}$ satisfies (3.21), where $\varepsilon = f(p) - \tilde f(p)$, so the result of item (i) holds for $j = j_+$. From the definition of $p$, $r$, and $\hat I_k$ we have (3.22) for all $j \ne j_+$ in $\hat I_k$, so for all such $j$,
$$e = \tilde f(p) - \varphi(p) = \tilde f(p) - f^k - (g^j)^T (p - x^k) + \alpha_{k,j} - \varepsilon_k. \tag{3.23}$$
In addition, (3.24) holds. Adding $0 = e - e$ to this inequality gives (3.25), which means that $g^j \in \partial_{\varepsilon + e} f(p)$ for $j \ne j_+$ in $\hat I_k$.
(ii) Multiplying each inequality in (3.25) by its corresponding multiplier $\lambda_j \ge 0$ and summing these inequalities, we have (3.26). Using the definition of $s$ from (3.16) and the fact that $\sum_{j \in \hat I_k} \lambda_j = 1$ gives (3.27), which means that $s \in \partial_{\varepsilon + e} f(p)$. In a similar manner, this time using the multipliers $\lambda_j$ that solve the dual problem (3.10) and define $g$ in (3.11), together with $\lambda_{j_+} := 0$, one obtains the result for $g$.
(iii) Since $g_\mu \in \partial f(p_\mu(x^k))$, we have (3.28). From (ii), $g \in \partial_{\varepsilon + e} f(p)$, so we get (3.29). Therefore, (3.30) holds. Then, since the expression for $g$ from (3.11) written in the form
$$g = -\mu\,(p - x^k), \tag{3.31}$$
combined with $g_\mu(x^k) = \mu\,(x^k - p_\mu(x^k))$, implies that $g - g_\mu(x^k) = \mu\,(p_\mu(x^k) - p)$, we obtain item (iii).
(iv) From (3.10), (3.11), (3.31), and the definition of $\hat I_k$, we have that $\mu\,(x^k - p) = g$ is in the convex hull of $\{ g^j : j \in \hat I_k \}$. We obtain the result by virtue of the minimum norm property of $s$.
(3.32)
Finally, combining this inequality with item (iv) gives (3.20).
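Remark 3.3 (i) characterizes $s$ as the minimum-norm element of the convex hull of the bundle subgradients. The sketch below approximates such a point with a Frank-Wolfe iteration using exact line search. This is a simple stand-in chosen for illustration, not the quadratic programming solver used in the paper, and the three sample vectors are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_point(gs, iters=200):
    """Approximate argmin{ |s|^2 : s in co(gs) } by Frank-Wolfe with exact line search."""
    s = list(gs[0])
    for _ in range(iters):
        # Vertex of the hull that minimizes the linearization <s, g>.
        v = min(gs, key=lambda g: dot(s, g))
        d = [vi - si for vi, si in zip(v, s)]
        dd = dot(d, d)
        if dd == 0.0:
            break
        # Exact minimizer of |s + t * d|^2 over t in [0, 1].
        t = max(0.0, min(1.0, -dot(s, d) / dd))
        if t == 0.0:
            break  # no descent along any vertex direction: s is optimal
        s = [si + t * di for si, di in zip(s, d)]
    return s


# Hypothetical bundle subgradients in R^2; the hull lies in the half-plane x >= 1.
gs = [(1.0, 2.0), (1.0, -1.0), (3.0, 0.0)]
s = min_norm_point(gs)
```

For this data the closest point of the hull to the origin lies on the edge joining the first two vectors. At the computed point the optimality condition $\langle s, g^j \rangle \ge \|s\|^2$ holds for every $j$, with equality on the active vectors, which mirrors the relation $r = -(g^j)^T s$ used to define the active index set.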
Journal of Applied Mathematics 11

Approximate Decomposition Algorithm and Convergence Analysis
Substituting the approximate V-step in Algorithm 2.6 with the proximal bundle-type subroutine, we present an approximate decomposition algorithm as follows. Afterwards a detailed convergence analysis is given. The main result is that each cluster point of the sequence of iterates generated by the algorithm is an optimal solution.
Algorithm 3.5.
Step 0. Compute $g^0$ satisfying $f(z) \ge \tilde f(p^0) + \langle g^0, z - p^0 \rangle$, where $\tilde f(p^0) \in [f(p^0) - \varepsilon_0, f(p^0)]$. Let $U_0$ be a matrix with orthonormal $n$-dimensional columns estimating an optimal U-basis. Set $s^0 = g^0$ and $k := 0$.
Step 2. Choose an $n_k \times n_k$ positive definite matrix $H_k$, where $n_k$ is the number of columns of $U_k$.
Step 3. Compute an approximate U-Newton step by solving the linear system
Step 4. Choose $\mu_{k+1} > \mu$, $\sigma_{k+1} \in (0, 1/2]$, initialize $\hat I_k$, and run the bundle subprocedure with $x = x^{k+1}$. Compute recursively, and set $(e_{k+1}, p^{k+1}, s^{k+1}, U_{k+1}) := (e, p, s, U)$.
Step 5. If
Otherwise, execute a line search
$$x^{k+1} := \arg\min \{ f(p^k), f(p^{k+1}) \}, \tag{3.36}$$
reinitialize $\hat I_k$, and rerun the bundle subroutine with $x = x^{k+1}$ to find new values for $(e, p, s, U)$; then set $(e_{k+1}, p^{k+1}, s^{k+1}, U_{k+1}, \varepsilon_{k+1}) = (e, p, s, U, \tau \varepsilon_k)$.
Step 6. Replace $k$ by $k + 1$ and go to Step 1.
Remark 3.6. This algorithm is the same as Algorithm 6 in [8], except that it uses a proximal bundle-type subroutine that can deal with approximate subgradients and approximate function values.
Theorem 3.7. One of the following two cases is true: (i) if the proximal bundle procedure in Algorithm 3.5 does not terminate, that is, if (3.17) never holds, then the sequence of $p$-values converges to $p_\mu(x)$ and $p_\mu(x)$ is a minimizer of $f$; (ii) if the procedure terminates with $s = 0$, then the corresponding $p$ equals $p_\mu(x)$ and is a minimizer of $f$.
Proof. By [12, Prop. 4.3], if this procedure does not terminate then it generates an infinite sequence of $e$-values and $\varepsilon$-values converging to zero. Since (3.17) does not hold, the sequence of $\|s\|$-values also converges to 0. Thus, item (iii) in Theorem 3.4 implies that $\{p\} \to p_\mu(x)$. Moreover, Theorem 3.4 (ii) gives (3.37). By the continuity of $f$, this becomes (3.38). The termination case with $s = 0$ follows in a similar manner, since (3.17) implies $e = 0$ in this case.
The next theorem establishes the convergence of Algorithm 3.5; its proof is similar to that of Theorem 9 in [8].
Theorem 3.8. Suppose that the sequence $\{\mu_k\}$ in Algorithm 3.5 is bounded above by $\bar\mu$. Then the following hold: (i) the sequence $\{f(p^k)\}$ is decreasing and either $\{f(p^k)\} \to -\infty$ or $\{\|s^k\|\}$ and $\{e_k\}$ both converge to 0; (ii) if $f$ is bounded from below, then any accumulation point of $\{p^k\}$ is a minimizer of $f$.
Proof. In this paper, the inequalities (3.15), (3.16), and (3.17) in [8] become (3.39), (3.40), and (3.41). Since $s^k \ne 0$, (3.39) implies that $\{f(p^k)\}$ is decreasing. Suppose $\{f(p^k)\}$ does not tend to $-\infty$. Then summing (3.39) over $k$ and using the fact that $m/(2\mu_k) \ge m/(2\bar\mu)$ for all $k$ implies that $\{\|s^k\|\} \to 0$. Then (3.41) with $\sigma_k \le 1/2$ and $\mu_k \ge \mu > 0$ implies that $\{e_k\} \to 0$, which establishes (i). Now suppose $f$ is bounded from below and $p$ is any accumulation point of $\{p^k\}$. Then, because $\{\|s^k\|\}$, $\{e_k\}$, and $\{\varepsilon_k\}$ converge to 0, (3.40) together with the continuity of $f$ implies that $f(p) \le f(z)$ for all $z \in \mathbb{R}^n$, and (ii) is proved.
In Table 1, we show some relevant data for the problems, including the dimensions of V and U, the known optimal values and solutions, and the starting points.
We calculate an $\varepsilon$-subgradient at $x$ by using the method in [13]: $g_\varepsilon(x) = \lambda g(x) + (1 - \lambda) g(x_1)$, where $g(x)$ is a subgradient at $x$ and $g(x_1)$ is a subgradient at a point $x_1$ within radius $r$ of $x$; $x_1$ and $\lambda \in [0, 1]$ are randomly chosen. The approximate function value $\tilde f(x)$ is randomly taken from the interval $[f(x) - \varepsilon, f(x)]$. The radius $r$ is adjusted iteratively in the following way: if the linearization error satisfies $\alpha_0(x, x_1) > \varepsilon$, then $r$ is reduced by a factor smaller than one. On the other hand, if $\alpha_0(x, x_1)$ is significantly smaller than $\varepsilon$, then $r$ is increased by a factor greater than one.
When $s^k = s$ in the algorithm, the U-Hessian at $x$ is computed in the following form: $H_k = U_k^T \big( \sum_{j \in B} \lambda_j H_j \big) U_k$, where $H_j = \nabla^2 f_{i_j}(y^j)$, $i_j$ is an active index such that $f_{i_j}(y^j) = f(y^j)$, and the $\lambda_j$ correspond to $s$ via (3.16). The parameters have values $\eta = 1.0 \times 10^{-4}$, $\varepsilon_0 = 1.0 \times 10^{-4}$, $m = 1.0 \times 10^{-1}$, $\tau = 1.0 \times 10^{-1}$, and $U_0$ equal to the $n \times n$ identity matrix. As for $\sigma_k$ and $\mu_k$, one can refer to [8].
Table 2 shows the results of Algorithm 3.5 for these examples, compared with Algorithm 6 in [8]. "Number of f/g" denotes the number of evaluations of the function and subgradient ($\varepsilon$-subgradient) in Algorithm 6 and Algorithm 3.5. $x$ is the calculated solution, and $|f(x) - f(x^*)|$ stands for the difference between the function values at $x$ and $x^*$.
Table 2 shows that Algorithm 3.5 with inexact data obtains quite accurate solutions at the cost of slightly more evaluations than with exact data. One noticeable exception occurs in example F3d-U1; it seems that the decomposition algorithm is sensitive with exact data but more stable when applying inexact data (function values and subgradients). These favorable results demonstrate that the approximate decomposition algorithm is suitable for solving (1.1) numerically.
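The $\varepsilon$-subgradient construction used in the numerical tests can be sketched as follows. The code forms a convex combination of a subgradient at $x$ and a subgradient at a nearby point $x_1$ whose linearization error does not exceed $\varepsilon$, and adjusts the radius $r$ as described. The objective (the l1 norm), the box-shaped sampling of $x_1$, the shrink and growth factors, and the threshold for "significantly smaller" are all assumptions made for illustration; the method of [13] may differ in these details.

```python
import random


def f(x):
    """Illustrative objective (an assumption): the l1 norm."""
    return sum(abs(xi) for xi in x)


def subgrad(x):
    """One exact subgradient of the l1 norm at x."""
    return [1.0 if xi > 0 else (-1.0 if xi < 0 else 0.0) for xi in x]


def lin_error(x, x1):
    """alpha_0(x, x1) = f(x) - f(x1) - <g(x1), x - x1>, nonnegative by convexity."""
    g1 = subgrad(x1)
    return f(x) - f(x1) - sum(g * (a - b) for g, a, b in zip(g1, x, x1))


def eps_subgradient(x, eps, r, rng, shrink=0.5, grow=2.0, max_tries=50):
    """Return (g_eps, f_tilde, r): eps-subgradient, inexact value, adjusted radius."""
    f_tilde = f(x) - eps * rng.random()  # value in [f(x) - eps, f(x)]
    for _ in range(max_tries):
        x1 = [xi + rng.uniform(-r, r) for xi in x]  # assumed sampling rule
        alpha = lin_error(x, x1)
        if alpha > eps:
            r *= shrink  # x1 is too far away: reduce the radius and retry
            continue
        lam = rng.random()
        g, g1 = subgrad(x), subgrad(x1)
        g_eps = [lam * a + (1.0 - lam) * b for a, b in zip(g, g1)]
        if alpha < 0.1 * eps:  # assumed threshold for "significantly smaller"
            r *= grow
        return g_eps, f_tilde, r
    return subgrad(x), f_tilde, r  # fall back to an exact subgradient


rng = random.Random(1)
x = [0.05, -1.0, 2.0]  # hypothetical evaluation point
eps = 0.2
g_eps, f_tilde, r = eps_subgradient(x, eps, r=1.0, rng=rng)
```

The construction is valid because $g(x_1)$ is an $\alpha_0(x, x_1)$-subgradient of $f$ at $x$, an exact subgradient at $x$ is also an $\varepsilon$-subgradient, and the $\varepsilon$-subdifferential is convex. Hence any convex combination satisfies $f(z) \ge f(x) + \langle g_\varepsilon, z - x \rangle - \varepsilon$ whenever $\alpha_0(x, x_1) \le \varepsilon$.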

Table 1: Problem data. The italic data in Table 1 are calculated by our algorithms.

Table 2: Numerical results of the UV-decomposition algorithm with inexact data.