DESCENT METHODS FOR CONVEX OPTIMIZATION PROBLEMS IN BANACH SPACES

We consider optimization problems in Banach spaces whose cost functions are convex and smooth but do not possess strengthened convexity properties. We propose a general class of iterative methods based on combining descent and regularization approaches, which provides strong convergence of the iteration sequences to a solution of the initial problem.


Introduction
A great number of problems arising in mathematical physics, economics, engineering, and other fields can be formulated as the following optimization problem: find a point x* in a convex and closed subset D of a Banach space E such that

f(x*) ≤ f(x) ∀x ∈ D, (1.1)

or, briefly,

min {f(x) : x ∈ D}, (1.2)

where f : E → R is a convex function; see, for example, [1, 2, 8] and the references therein.
It is well known that standard iterative methods for solving (1.2), which are designed for finite-dimensional optimization problems, cannot guarantee strong convergence of their iteration sequences to a solution of the initial problem if the cost function does not possess strengthened convexity properties such as strong convexity. Usually, these methods provide only weak convergence to a solution. However, such convergence is not satisfactory for many real problems, which are ill-posed in general, since even small perturbations of the initial data may cause great changes in the solutions. These questions are crucial for developing stable solution methods. Strong convergence, which ensures stability and continuous dependence on the initial data, can be obtained via the regularization approach (see [9]), in which the initial problem is replaced with a sequence of perturbed well-posed problems. A great number of applied problems in physics, mechanics, optimal control, economics, engineering, and other fields require the regularization approach for their stable solution; see [9, 10, 11] and the references therein. However, each perturbed problem has the form of a nonlinear optimization problem, and its solution can be found only approximately. In order to avoid the difficulty of solving each perturbed problem within a prescribed accuracy, other regularization techniques, such as averaging and iterative regularization, were proposed for obtaining implementable algorithms with strong convergence of their iteration sequences (see, e.g., [11]). However, due to their divergent-series step-size rules, their convergence may be very slow.
Very recently, another approach to constructing implementable and strongly convergent algorithms for variational inequalities was proposed in [4, 5]. This approach is based on applying a descent algorithm to perturbed problems obtained by adding the identity map, and on introducing artificial gap functions, which allow one to estimate error bounds.
In this paper, motivated by the above approach, we present a rather general class of implementable algorithms for solving the convex optimization problem (1.2) in a reflexive Banach space. These algorithms have a flexible structure in the sense that they can easily be adjusted to the essential features of each particular problem under consideration. At the same time, they provide strong convergence of the iteration sequence if the initial problem is solvable. Namely, we introduce a general class of perturbation functions and utilize its properties for designing gap functions and descent methods.

Regularization of optimization problems
In this section, we give a strong convergence result for approximate solutions of perturbed problems. First we recall several definitions and auxiliary properties. A function ψ : E → R is said to be uniformly convex (see [6]) if there exists a continuous increasing function θ : R₊ → R₊ with θ(0) = 0 such that, for all x, y ∈ E and each λ ∈ [0, 1],

ψ(λx + (1 − λ)y) ≤ λψ(x) + (1 − λ)ψ(y) − λ(1 − λ)θ(‖x − y‖).

If θ(τ) = κτ² for some κ > 0, then ψ is called a strongly convex function. One can see that the class of uniformly convex functions is rather broad. These functions possess several very useful properties (see [6] and also [10]), which are listed in the following proposition. In order to apply the Tikhonov regularization approach to problem (1.2), we introduce an auxiliary function ϕ : E → R, which satisfies the following property.
(B1) ϕ : E → R is a uniformly convex and lower semicontinuous function.

Now we define the perturbed optimization problem

min {f(x) + εϕ(x) : x ∈ D}, ε > 0. (2.6)

Note that the choice ϕ(x) = 0.5‖x‖² corresponds to the classical Tikhonov regularization, with ε playing the role of the regularization parameter. For brevity, set f_ε(x) = f(x) + εϕ(x); then the function f_ε is clearly uniformly convex and lower semicontinuous. From Proposition 2.2, it follows that problem (2.6) has a unique solution, which will be denoted by x_ε. We now establish a convergence result for approximate solutions of problem (2.6) to a solution of problem (1.2), which can be viewed as a modification of known results from [3, 9, 10, 11].
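As a minimal finite-dimensional illustration of this construction (our own example, not from the paper), consider a convex quadratic with a whole line of minimizers; the Tikhonov term 0.5ε‖x‖² makes the perturbed problem uniquely solvable, and x_ε tends to the minimum-norm minimizer as ε ↘ 0:

```python
import numpy as np

# Ill-posed model problem: f(x) = (a.x - b)^2 is convex but not strictly
# convex; every x with a.x = b is a minimizer.
a = np.array([1.0, 1.0])
b = 2.0

def tikhonov_solution(eps):
    # Unique minimizer of f_eps(x) = (a.x - b)^2 + 0.5 * eps * ||x||^2,
    # found from the stationarity condition (2 a a^T + eps I) x = 2 b a.
    H = 2.0 * np.outer(a, a) + eps * np.eye(2)
    return np.linalg.solve(H, 2.0 * b * a)

for eps in [1.0, 0.1, 0.001]:
    print(eps, tikhonov_solution(eps))
# As eps decreases, x_eps approaches (1, 1), the minimum-norm minimizer.
```

Here the regularized solutions trace the line x₁ = x₂ = 4/(4 + ε), illustrating the strong convergence to the particular minimizer selected by ϕ.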
Proposition 2.3. Suppose that assumptions (A1)–(A3) and (B1) are fulfilled. Then any sequence {z_k}, which is generated in conformity with the rules

‖z_k − x_{ε_k}‖ ≤ µ_k, {ε_k} ↘ 0, {µ_k} ↘ 0, (2.8)

converges strongly to the unique solution x*_n of the problem

min {ϕ(x) : x ∈ D*}. (2.9)

Proof. First we note that the set D* is nonempty, convex, and closed. Hence, by Proposition 2.2, problem (2.9) has a unique solution. Next, we proceed to show that

lim_{k→∞} x_{ε_k} = x*_n. (2.10)

Since f_ε is uniformly convex with the function εθ, we have from (2.5), for each ε > 0 and an arbitrary point x* ∈ D*, the estimate (2.11). Since ϕ is bounded from below on E, the sequence {x_{ε_k}} is bounded and hence has weak limit points. If x̄ is an arbitrary weak limit point of {x_{ε_k}}, then (2.11) yields x̄ ∈ D*, and taking the corresponding limit in (2.12), we obtain x*_n = x̄; hence (2.10) is true. Next, by definition, (2.15) holds. It now follows from (2.8) and (2.10) that {z_k} converges strongly to x*_n.

Thus, in order to present an implementable algorithm for solving problem (1.2), we have to find an approximate solution z_k of each perturbed problem (2.6) with the prescribed accuracy µ_k in a finite number of iterations.

Properties of the perturbed auxiliary problem
In this section, we establish several results which will be used for the construction and substantiation of an iterative solution method for problem (2.6). Recall that, on account of Proposition 2.2, this problem has a unique solution if assumptions (A1), (A2), and (B1) are fulfilled. Nevertheless, in order to construct a convergent solution method, we will make use of an additional differentiability condition. More precisely, we replace (A2) and (B1) with the following assumptions.
(A2′) f : E → R is convex and has a Lipschitz continuous gradient map ∇f : E → E*, where E* is the conjugate space.
Observe that this function can be regarded as the primal gap function for problem (3.1) or, equivalently, as the regularized gap function for problem (3.2) with the regularization term ε[ϕ(x) − ϕ(x_ε) − ⟨∇ϕ(x_ε), x − x_ε⟩] (see [4, 7]). The idea of utilizing auxiliary terms in regularization and proximal point methods for constructing smooth gap functions was first suggested in [4] and called the nonlinear smoothing approach. This approach enables us to avoid including additional parameters and functions in descent methods and to propose very flexible computational schemes. We now establish an error bound for the auxiliary problem (2.6).
Proof. Fix x ∈ D and, for brevity, set y = y_ε(x). Then, due to Lemma 3.1, we have the corresponding pair of inequalities, where the second follows from the same optimality criterion applied to problem (3.5). Adding these inequalities and using (A2′), (B1′), and Proposition 2.1(iii), we obtain (3.12). Applying now this inequality and the monotonicity of ∇ϕ, the result follows.

M. S. S. Ali 2353
We now give the basic descent property, which utilizes the direction y_ε(x) − x.
Proof. Take arbitrary points x′, x″ ∈ D and set y′ = y_ε(x′), y″ = y_ε(x″). Using the optimality condition (3.16) gives two inequalities; adding them and taking into account (3.2), we obtain an estimate which implies the continuity of the map x ↦ y_ε(x).

Descent method for convex optimization
First we describe a descent algorithm for solving the auxiliary problem (2.6) for a fixed ε > 0. This algorithm follows the general iteration scheme from [7].
Since the function ϕ may be chosen rather arbitrarily within rule (B1′), this algorithm in fact admits a wide variety of iteration schemes. That is, one can choose ϕ to be well suited to approximating the properties of the cost function f in (1.2), to solving the auxiliary problem (3.5), and to computing the gradient ∇ϕ. These properties make Algorithm 4.1 very flexible in comparison with the usual gradient schemes.
Step 2. Compute y_i = y_ε(x_i) and set d_i = y_i − x_i.
Step 3. Find m as the smallest nonnegative integer satisfying the line-search condition; set λ_i = β^m, x_{i+1} = x_i + λ_i d_i, i = i + 1, and go to Step 2.
Algorithm 4.1.

The next theorem presents a convergence result for Algorithm 4.1.
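Algorithm 4.1 can be sketched in finite dimensions as follows. The auxiliary map y_ε(·) solves problem (3.5), which is not reproduced in this extract, so a plain gradient step y_ε(x) = x − ∇f_ε(x) is used here purely as an illustrative stand-in; the parameters β, σ, the tolerance, and all names are our own notation, not the paper's:

```python
import numpy as np

def algorithm_4_1(f_eps, grad_f_eps, x0, beta=0.5, sigma=0.5, tol=1e-8, max_iter=10000):
    """Descent skeleton: d_i = y_eps(x_i) - x_i, Armijo-type stepsize beta^m.

    y_eps(x) = x - grad_f_eps(x) is only a stand-in for the solution of the
    auxiliary problem (3.5); beta, sigma, tol are illustrative parameters.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad_f_eps(x)                 # Step 2: d_i = y_i - x_i
        if np.linalg.norm(d) <= tol:       # stop once the direction vanishes
            break
        m = 0                              # Step 3: backtracking line search
        while f_eps(x + beta**m * d) > f_eps(x) + sigma * beta**m * (grad_f_eps(x) @ d):
            m += 1
        x = x + beta**m * d                # x_{i+1} = x_i + lambda_i d_i
    return x

# Example run on f_eps(x) = (x1 + x2 - 2)^2 + 0.5 * 0.1 * ||x||^2:
f = lambda x: (x[0] + x[1] - 2.0) ** 2 + 0.05 * (x @ x)
g = lambda x: 2.0 * (x[0] + x[1] - 2.0) * np.ones(2) + 0.1 * x
x_eps = algorithm_4_1(f, g, np.zeros(2))
```

For this quadratic example the iterate converges to (4/4.1, 4/4.1), the unique minimizer of the perturbed problem with ε = 0.1.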
Proof. It has been mentioned that (2.6) has a unique solution due to Proposition 2.2. By (A2′) and (B1′), ∇f_ε is Lipschitz continuous with constant L = L_f + εL_ϕ, where L_f and L_ϕ are the corresponding Lipschitz constants for ∇f and ∇ϕ, respectively. It follows that the well-known inequality (see, e.g., [10, Chapter 2, Section 3]) holds. In view of (3.15), if (4.4) is satisfied for a positive λ, then the line-search procedure in Algorithm 4.1 becomes implementable; moreover, (4.4) is equivalent to a lower bound on the admissible stepsizes, and the assertion follows.
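The well-known inequality invoked in the proof above is presumably the descent lemma for functions with Lipschitz continuous gradient; our reconstruction, consistent with the stated constant L = L_f + εL_ϕ, reads:

```latex
f_{\varepsilon}(y) \le f_{\varepsilon}(x)
  + \langle \nabla f_{\varepsilon}(x),\, y - x \rangle
  + \tfrac{L}{2}\,\|y - x\|^{2}
  \quad \forall x, y \in E, \qquad L = L_{f} + \varepsilon L_{\varphi}.
```

This estimate is the standard tool for showing that a backtracking line search of the type in Step 3 accepts some positive stepsize in finitely many trials.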
Based on the above result, we can approximate the solution x_ε of each perturbed problem (2.6) to any prescribed accuracy in a finite number of iterations. In order to construct such a combined regularization and descent method, we will make use of the error bound (3.8). The corresponding solution method for the initial convex optimization problem (1.2) can be described as follows.
Method. Choose a point z_0 ∈ D, a number δ > 0, and a positive sequence {ε_k} ↘ 0. For each k = 1, 2, ..., we have a point z_{k−1} ∈ D; apply Algorithm 4.1 with ε = ε_k, starting from z_{k−1}, until the stopping criterion (4.13) is satisfied. Then we set z_k = x_i and increase k by one.
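The outer loop above can be sketched as follows; this is our own finite-dimensional reading, in which a simple fixed-step gradient descent stands in for Algorithm 4.1, and the stopping rule ‖d‖ ≤ δε_k is only a plausible instance of the criterion (4.13), which is not reproduced in this extract:

```python
import numpy as np

def combined_method(grad_f, grad_phi, z0, delta=0.1, eps0=1.0, q=0.5,
                    step=0.1, n_outer=20):
    """Combined regularization and descent method (sketch).

    Each perturbed problem min f + eps_k * phi is solved approximately by
    fixed-step gradient descent; ||d|| <= delta * eps_k stands in for (4.13).
    The fixed step must satisfy step < 2 / L for the gradient's Lipschitz
    constant L, which holds for the example below.
    """
    z = np.asarray(z0, dtype=float)
    eps = eps0
    for _ in range(n_outer):
        g = lambda x, e=eps: grad_f(x) + e * grad_phi(x)  # gradient of f_eps
        x = z                                             # warm start at z_{k-1}
        while True:
            d = -g(x)                                     # stand-in descent direction
            if np.linalg.norm(d) <= delta * eps:          # inner stopping criterion
                break
            x = x + step * d
        z = x                                             # z_k := final inner iterate
        eps *= q                                          # eps_k decreases to 0
    return z

# Example: f(x) = (x1 + x2 - 2)^2, phi(x) = 0.5 * ||x||^2; the iterates
# approach the minimum-norm minimizer (1, 1).
grad_f = lambda x: 2.0 * (x[0] + x[1] - 2.0) * np.ones(2)
z = combined_method(grad_f, lambda x: x, np.zeros(2))
```

Warm-starting each inner loop at z_{k−1} keeps the number of inner iterations per outer step small, which is precisely the practical appeal of the combined scheme.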
We now show that our method is fully implementable and generates a strongly convergent sequence.

Theorem 4.2. Suppose that assumptions (A1), (A2′), (A3), and (B1′) are satisfied and that a sequence {z_k} is generated by the method. Then, (i) each kth iteration of the method is finite; (ii) the sequence {z_k} converges strongly to the point x*_n, which is the unique solution of problem (2.9).
Proof. First we note that the inequality (4.13) will be satisfied in a finite number of iterations, since d_i → 0 as i → +∞ due to Theorem 4.1 and Propositions 3.2 and 3.5. Hence assertion (i) of the theorem is true. Next, combining (4.13) and (3.8) yields the required estimate, since {ε_k} ↘ 0. Moreover, it follows that ‖z_k − x_{ε_k}‖ ≤ µ_k for some sequence {µ_k} ↘ 0. We now see that all the assumptions of Proposition 2.3 are satisfied; therefore, assertion (ii) of the theorem also holds.

Numerical experiments
As has been mentioned, a great number of applied problems in physics, mechanics, optimal control, economics, and engineering are solved via the regularization approach; see, for example, [9, 10, 11]. In this section, we present some preliminary results of numerical experiments that illustrate the convergence properties of the combined regularization and descent method. We chose a well-known ill-posed optimal control problem; see, for example, [10, page 162, Example 2]. The problem is to minimize the functional f(x) given there. It is easy to see that the cost functional is convex and differentiable, that the optimal control x*(·) ≡ 0 is determined uniquely, and that the minimal value is f* = 0. At the same time, the sequence x_k(t) = sin(2πkt) minimizes the cost functional, but its norm does not tend to zero; that is, it does not converge in norm to the solution. This means that the initial problem is ill-posed. We applied the described combined regularization and descent method to this problem, implemented in double-precision arithmetic with piecewise constant approximation of the control function x(t). We used the stopping criterion

‖x_k − z_k‖ ≤ η (5.4)

in Algorithm 4.1. The total number of iterations for several values of the accuracy η and of the number N of pieces in the control approximation is given in Table 6.1. The results show that the convergence of the method is rather stable and rapid for this ill-posed problem.
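Since the explicit functional from [10] is not reproduced in this extract, the ill-posedness phenomenon can be illustrated with a stand-in of the same type (our choice, not the paper's): f(x) = ∫₀¹ (∫₀ᵗ x(s) ds)² dt, whose unique minimizer is x*(·) ≡ 0 with f* = 0, while x_k(t) = sin(2πkt) is a minimizing sequence whose L² norm equals 1/√2 for every k:

```python
import numpy as np

N = 1000                      # number of pieces in the control discretization
h = 1.0 / N
t = np.arange(N) * h

def f(x):
    # Stand-in ill-posed functional: f(x) = int_0^1 ( int_0^t x(s) ds )^2 dt,
    # discretized with Riemann sums; its unique minimizer is x = 0.
    X = np.cumsum(x) * h      # X(t) = int_0^t x(s) ds
    return np.sum(X ** 2) * h

for k in [1, 10, 100]:
    xk = np.sin(2 * np.pi * k * t)
    l2_norm = np.sqrt(np.sum(xk ** 2) * h)
    print(k, f(xk), l2_norm)
# f(x_k) -> 0 while the L2 norm stays near 1/sqrt(2): the sequence is
# minimizing but not norm-convergent, i.e., the problem is ill-posed.
```

This is exactly the behavior that rules out plain descent methods and motivates the regularized scheme of Section 4.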

Conclusions
We have thus presented a general iterative scheme of implementable algorithms with strong convergence for convex optimization problems. Although this scheme is stated for differentiable problems, the above approach may also be viewed as a basis for methods for nondifferentiable problems.