Gradient-Type Methods: A Unified Perspective in Computer Science and Numerical Analysis

Stefano Fanelli

Dipartimento di Matematica, Università degli Studi di Roma Tor Vergata, Viale della Ricerca Scientifica, 00133 Roma, Italy

Research Article. ISRN Applied Mathematics (International Scholarly Research Network), Volume 2011, Article ID 715748, doi:10.5402/2011/715748. ISSN 2090-5572, 2090-5564.

Received 4 April 2011; Accepted 11 May 2011; Published 20 July 2011

Academic Editors: G. Kyriacou and M. Sun

Copyright © 2011 Stefano Fanelli. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper presents a general and comprehensive description of Optimization Methods and Algorithms from a novel viewpoint. It is shown, in particular, that Direct Methods, Iterative Methods, and Computer Science Algorithms belong to a well-defined general class of both Finite and Infinite Procedures, characterized by suitable descent directions.

1. Introduction

The dichotomy between Computer Science and Numerical Analysis has been for many years the main obstacle to the development of eclectic computational tools. With the latter term the author indicates the capability of implementing algorithms properly adaptable to particular environmental requirements and, therefore, optimized for this aim.

Since the formulation of a problem requires the preliminary definition of the variables and the functions involved in the model, the antithesis between finite and continuous applied mathematics is even stronger from a computational point of view.

In Computer Science, problems are typically defined on discrete sets (graphs, integer variables and so forth) and are characterized by procedures formalized in a finite number of steps.

Direct Methods, which are classical tools of Numerical Analysis, can in fact be considered algorithms according to the standard Computer Science definitions. However, the presence of ill-conditioned matrices can seriously affect the practical implementation of Direct Methods. On the other hand, Iterative Methods are based in the majority of cases on the convergence of a sequence approximating the optimal solution of a problem defined in a continuous range. Proper stopping rules on the truncation error reduce the latter computational scheme to a finite process, but, unfortunately, in many cases the theoretical result is affected by a variety of numerical instability problems, thereby preventing a precise forecast of the true number of iterations required to achieve the desired approximation.

Furthermore, Linear Programming, Convex Quadratic Programming, and the unconstrained minimization of a symmetric positive definite bilinear form are continuous problems that can be exactly solved with a finite number of steps. This proves that the distinction between algorithms and infinite iterative procedures is not always characterized by the discrete or the continuous range of the variables involved in the problem.

Most Numerical Analysis methods are based upon the application of the Fixed Point theorem, which assures the convergence of the iterative scheme by means of a contraction of the distance between successive terms of the sequence approximating the optimal solution.

Gradient methods are usually considered in the literature as particular procedures in the frame of optimization techniques, for classical unconstrained or constrained problems.

The main aim of the present paper is to show that Gradient or Gradient-type methods represent the fundamental computational tool to solve a wide set of continuous optimization problems, since they are based on a unitary principle that applies both to finite and to infinite procedures.

Moreover, some classical discrete optimization algorithms can be also viewed in the framework of Gradient-type methods.

Hence, the gradient approach makes it possible to deal with problems involving variables defined both in a continuous range and in a discrete one, by utilizing finite or infinite procedures in a quite general perspective.

It is essential to underline that ABS methods , which represent a remarkable class of algorithms for solving linear and nonlinear equations, are founded on a quite different approach. Roughly speaking, ABS methods construct, in fact, a set of spanning matrices in $\mathbb{R}^n$, by performing an adaptive optimization, associated to the dimension of the subspace and parameter dependent. In many ABS methods the choice of the set of optimal parameters is, in fact, crucial in order to identify by a unified approach the structural features of the optimization algorithms. Parameter dependence is not present in Gradient-type methods.

It is important to emphasize that the typical finiteness of Computer Science algorithms is characterized by classes of Gradient-type methods converging to an isolated point of a suitable sequence, generated by the procedure.

Furthermore, the most recent algorithms for Local Optimization can be precisely described by Gradient-type methods in a general framework. As a matter of fact, Interior Point techniques [2, 3] and Barrier Algorithms  represent a wide set of Gradient-type methods for NonLinear Programming.

Moreover, a fundamental role in this new approach is played by the properties of suitable Structured matrices, associated to the optimization procedures. Advanced Linear Algebra Techniques are, in fact, essential to construct low-complexity algorithms.

We point out, in particular, the techniques based on Fast Transforms and the corresponding approximations by algebras of matrices simultaneously diagonalized .

The utilization of Advanced Linear Algebra Techniques in NonLinear Programming opens a new research field, leading in many cases to a significant improvement both in the efficiency and in the practical application of Gradient-type methods for problems of operational interest [11, 12].

In Deterministic Global Optimization structured matrices allow remarkable results in the frame of the Tunneling techniques, by using the classical αBB approach .

The novel results on Tensor computation  are a promising area of research to improve the efficiency of global optimization algorithms for large-scale problems and particularly for the effective construction of more general sets of Repeller matrices in the Tunneling phases [14, 15]. This approach can have important consequences also in Nonlinear Integer Optimization (see the pioneer work in ), taking into account the more recent results concerning the discretization of the problem by the continuation methods (see, e.g., ).

Therefore, this survey has also the aim of finding in-depth general relationships between Local Optimization techniques and Deterministic Global Optimization algorithms in the frame of Advanced Linear Algebra Techniques.

Let
$$\min_{x \in A} f(x), \quad A \subseteq \mathbb{R}^n, \tag{2.1}$$
be an unconstrained problem to be solved.

By assuming $f(x) \in C^1(A)$, the simplest heuristic procedure to deal with (2.1) is to determine the stationary points of $f(x)$ in $A$, that is, $x^* \in A : \nabla f(x^*) = 0$, by the recursive computational scheme
$$x^{(k+1)} = x^{(k)} - \lambda_k \nabla f(x^{(k)}), \quad k = 0, 1, \ldots \tag{2.2}$$
with

$x^{(0)} \in A$ being the initial point of the procedure,

for all $k$, $\lambda_k$ is computed such that $x^{(k+1)} \in A$, $f(x^{(k+1)}) \le f(x^{(k)})$, and the sequence $\{x^{(k+1)}\}$ satisfies the condition $\lim_{k \to \infty} \nabla f(x^{(k)}) = 0$.

The iterative method (2.2) is a particular case of the following general Gradient-type method:
$$x^{(k+1)} = x^{(k)} - \lambda_k s^{(k)}, \quad k = 0, 1, \ldots \tag{2.5}$$
where

$s^{(k)}$ is a descent direction, that is,

The following theorem generalizes a well-known result shown in .

Theorem 2.1 (see [7]).

If $s^{(k)}$ is a descent direction in $x^{(k)}$ for a function $f(x)$, then $s^{(k)} = -A_k^{-1} \nabla f(x^{(k)})$ with $A_k$ being a symmetric positive definite (spd) matrix.

Moreover, the following property holds:
$$\cos\bigl(\widehat{-\nabla f(x^{(k)}),\, s^{(k)}}\bigr) \ge c > 0 \iff \operatorname{cond}(A_k) \le M, \quad \forall k. \tag{2.7}$$

Remark 2.2.

Particular cases of descent directions can be obtained by setting

$A_k = I$ (Steepest Descent method),

(see [5, 6, 18, 19]).
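The role of the matrix $A_k$ in the general scheme (2.5) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the test function, the simple step-halving rule used to enforce $f(x^{(k+1)}) < f(x^{(k)})$, and the helper names `gradient_type_step` and `minimize`. The direction used is $A_k^{-1} \nabla f(x^{(k)})$, combined with the update $x^{(k+1)} = x^{(k)} - \lambda_k s^{(k)}$.

```python
def gradient_type_step(x, f, grad, apply_A_inv, lam0=1.0, shrink=0.5):
    """One step x_new = x - lam * A^{-1} grad f(x); lam is halved until f decreases."""
    d = apply_A_inv(grad(x))
    lam = lam0
    for _ in range(60):
        x_new = [xi - lam * di for xi, di in zip(x, d)]
        if f(x_new) < f(x):
            return x_new
        lam *= shrink
    return x  # no decrease found: keep the current point


def minimize(f, grad, x0, apply_A_inv, tol=1e-6, max_iter=10_000):
    """Iterate until the gradient norm drops below tol; returns (x, iterations)."""
    x = list(x0)
    for k in range(max_iter):
        if sum(gi * gi for gi in grad(x)) ** 0.5 < tol:
            return x, k
        x = gradient_type_step(x, f, grad, apply_A_inv)
    return x, max_iter


# Hypothetical test problem: f(x) = x1^2 + 10 x2^2, minimized at the origin.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 20.0 * x[1]]

identity = lambda g: list(g)                      # A_k = I (Steepest Descent)
hessian_inv = lambda g: [g[0] / 2.0, g[1] / 20.0]  # A_k = Hessian (diagonal here)

x_sd, it_sd = minimize(f, grad, [1.0, 1.0], identity)
x_nw, it_nw = minimize(f, grad, [1.0, 1.0], hessian_inv)
print(it_sd, it_nw)
```

On this quadratic the choice $A_k = \nabla^2 f$ reaches the minimizer in a single step, while $A_k = I$ needs many more iterations, which is the kind of difference the condition $\operatorname{cond}(A_k) \le M$ is meant to control.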

It is useful to underline that the general theory of admissible directions for unconstrained optimization  is also a special case of (2.5). By setting, in fact, for a given $\gamma > 0$:
$$D(\gamma, x^{(k)}) = \bigl\{ s^{(k)} \in \mathbb{R}^n : \|s^{(k)}\| = 1,\ \nabla f(x^{(k)})^T s^{(k)} \ge \gamma \|\nabla f(x^{(k)})\| \bigr\},$$
one can obtain other Gradient-type methods described by (2.5).

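The iterative scheme described by Algorithm 1 contains several ingredients of a general Gradient-type method (see ).

Algorithm 1

(a) Given $\{\gamma_k\}$, $\{\sigma_k\}$, $k = 0, 1, \ldots$ s.t.

$\inf\{\gamma_k\} > 0$, $\inf\{\sigma_k\} > 0$

Let $x^{(0)} \in \mathbb{R}^n$ be a starting point

For a given vector $s^{(k)} \in D(\gamma_k, x^{(k)})$, set

$x^{(k+1)} = x^{(k)} - \lambda_k s^{(k)}$, with $\lambda_k \in (0, \sigma_k \|\nabla f(x^{(k)})\|)$:

$$f(x^{(k+1)}) \le \min_{\mu} \bigl\{ f(x^{(k)} - \mu s^{(k)}),\ 0 < \mu \le \sigma_k \|\nabla f(x^{(k)})\| \bigr\}$$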

The convergence of Algorithm 1 is guaranteed by the following result (see again ).

Theorem 2.3.

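Let $f(x) \in C^1(A)$, $A$ open $\subseteq \mathbb{R}^n$. Let $K = \{x \in \mathbb{R}^n : f(x) \le f(x^{(0)})\} \subset A$, $x^{(0)} \in K$. Then, for all $\{x^{(k)}\}$ evaluated by Algorithm 1:

(i) $x^{(k)} \in K$, for all $k$;

(ii) if $x^{(k+1)} \ne x^{(k)}$, $\{x^{(k)}\}$ has at least an Extremal Point (EP) $x^*$;

(iii) every EP $x^*$ of $\{x^{(k)}\}$ is a stationary point, that is, $\nabla f(x^*) = 0$.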

Remark 2.4.

Notice that Theorem 2.3 can also be applied in the case of classical Computer Science algorithms. As a matter of fact, if condition (ii) is not verified, then, by definition, $\exists \tilde{k} : x^{(\tilde{k}+1)} = x^{(\tilde{k})}$, implying $\nabla f(x^{(\tilde{k})}) = 0$, that is, the convergence to a stationary point in a finite number of steps $\tilde{k}$. Moreover, the convergence to an isolated point $\hat{x}$ of the sequence $\{x^{(k)}\}$ can be proven ab absurdo by showing that if $\hat{x} \ne \mathrm{EP}$, then $f(x^{(k)}) - f(x^{(k+1)}) > c_0$ for $k > k_0$, with $c_0 > 0$. We will see in Sections 3 and 4 that the convergence in a finite number of steps of a given iterative procedure can be verified in this way both for the unconstrained problems and for the constrained ones.

3. Local Unconstrained Optimization

Let $C$ and $b$ be an spd matrix of order $n$ and an $n$-dimensional vector, respectively.

It is well known that the problem
$$\min\ \tfrac{1}{2} x^T C x - b^T x, \quad x \in \mathbb{R}^n, \tag{3.1}$$
can be exactly solved in at most $n$ steps by the Conjugate Gradient (CG) method , which represents a direct method to solve (3.1). The quadratic form associated to an spd matrix $C$ is, in fact, a convex function.
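The finite termination of the (CG) method on (3.1) can be checked on a small instance. The following is a minimal sketch of the textbook CG recurrences in plain Python; the $3 \times 3$ matrix `C`, the vector `b`, and the helper names are hypothetical choices made for illustration.

```python
def matvec(C, v):
    return [sum(cij * vj for cij, vj in zip(row, v)) for row in C]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(C, b, x0, tol=1e-24):
    """Minimize 0.5 x^T C x - b^T x for spd C; returns (x, number of steps)."""
    n = len(b)
    x = list(x0)
    r = [bi - ci for bi, ci in zip(b, matvec(C, x))]  # residual = -gradient
    p = list(r)
    steps = 0
    while steps < n and dot(r, r) > tol:
        Cp = matvec(C, p)
        alpha = dot(r, r) / dot(p, Cp)                 # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * ci for ri, ci in zip(r, Cp)]
        beta = dot(r_new, r_new) / dot(r, r)
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]  # next conjugate direction
        r = r_new
        steps += 1
    return x, steps


# Hypothetical spd test matrix (leading minors 4, 11, 18 are positive).
C = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, steps = conjugate_gradient(C, b, [0.0, 0.0, 0.0])
residual = [bi - ci for bi, ci in zip(b, matvec(C, x))]
print(x, steps)
```

For this instance the exact solution is $(2/9, 1/9, 13/9)$, and the loop never performs more than $n = 3$ steps.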

However, it can also be proved that the application of the procedure defined in (2.2), that is, the Steepest Descent method, in general requires an infinite number of iterations, apart from special cases such as the trivial one $x^{(0)} = x^*$. The latter result shows that the existence of a finite procedure to solve (3.1) does not depend only on the role played by convexity but is also the consequence of a sort of optimal matching between the problem and the corresponding algorithm, which is in this case the (CG) method. On the other hand, the latter method can also be interpreted as an iterative method in the family of the following fixed point procedures:
$$x^{(k+1)} = \frac{(rI - C)x^{(k)} + b}{r} \tag{3.2}$$
with $r$ being a suitable scalar parameter. By setting, in fact,
$$H = I - \frac{C}{r}, \qquad D = \begin{pmatrix} \frac{1}{r} & 0 & \cdots & 0 \\ 0 & \frac{1}{r} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \frac{1}{r} \end{pmatrix}, \tag{3.3}$$
one can obtain the classical iterative scheme
$$x^{(k+1)} = H x^{(k)} + D b. \tag{3.4}$$
Since $\|H\|_s = 1 - 1/\operatorname{cond}(C)$, (3.4) is convergent if the original matrix $C$ is well conditioned.
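The contraction behind the fixed point scheme above can be observed numerically. The sketch below assumes that $r$ is taken as the largest eigenvalue of $C$, which is one choice that gives the contraction factor $1 - 1/\operatorname{cond}(C)$; the $2 \times 2$ matrix (eigenvalues 1 and 3) and the right-hand side are hypothetical.

```python
import math

C = [[2.0, 1.0], [1.0, 2.0]]     # spd, eigenvalues 1 and 3, so cond(C) = 3
b = [1.0, 0.0]
r = 3.0                          # assumption: r = largest eigenvalue of C
x_hat = [2.0 / 3.0, -1.0 / 3.0]  # exact solution of C x = b
rate = 1.0 - 1.0 / 3.0           # 1 - 1/cond(C)


def step(x):
    """One iteration x <- ((r I - C) x + b) / r."""
    Cx = [sum(cij * xj for cij, xj in zip(row, x)) for row in C]
    return [(r * xi - cxi + bi) / r for xi, cxi, bi in zip(x, Cx, b)]


def err(x):
    return math.hypot(x[0] - x_hat[0], x[1] - x_hat[1])


x = [0.0, 0.0]
errors = [err(x)]
for k in range(60):
    x = step(x)
    errors.append(err(x))

print(errors[-1])
```

Each recorded error stays below `rate**k` times the initial error, up to rounding, so the number of iterations needed for a given accuracy grows with the condition number of $C$.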

Moreover, if $\hat{x}$ is the optimal solution of (3.1), the truncation error of the method is
$$\|x^{(k)} - \hat{x}\|_2 \le \left(1 - \frac{1}{\operatorname{cond}(C)}\right)^k \|x^{(0)} - \hat{x}\|_2. \tag{3.5}$$
In the case of the (CG) method, one can prove the inequality
$$\|x^{(k)} - \hat{x}\|_2 \le 2\left(\frac{\sqrt{\operatorname{cond}(C)} - 1}{\sqrt{\operatorname{cond}(C)} + 1}\right)^k \|x^{(0)} - \hat{x}\|_2. \tag{3.6}$$
Equation (3.6) shows that, if the dimension $n$ is huge and the matrix $C$ is well conditioned, from a computational point of view it is more convenient to implement the (CG) method as a classical iterative procedure with a stopping rule based on the above inequality.

So, once again, the distinction between Numerical Analysis direct methods (or Computer Science algorithms) and infinite procedures cannot be considered as the fundamental classification rule in computational mathematics.

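In the case of the Steepest Descent method, the truncation error is
$$\|x^{(k)} - \hat{x}\|_2 \le 2\left(\frac{\operatorname{cond}(C) - 1}{\operatorname{cond}(C) + 1}\right)^k \|x^{(0)} - \hat{x}\|_2. \tag{3.7}$$
The difference between (3.6) and (3.7) clearly indicates the greater efficiency of the (CG) method.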
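To make the gap between the two error bounds concrete, the sketch below counts how many iterations each bound needs to guarantee a given reduction of the initial error. It assumes the usual forms of the bounds, with ratio $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)$ for (CG) and $(\kappa - 1)/(\kappa + 1)$ for Steepest Descent, where $\kappa = \operatorname{cond}(C)$; the value $\kappa = 100$, the tolerance $10^{-6}$, and the helper name are hypothetical.

```python
import math


def iterations_needed(rho, factor, eps):
    """Smallest k such that factor * rho**k <= eps."""
    k, bound = 0, factor
    while bound > eps:
        bound *= rho
        k += 1
    return k


cond = 100.0                                                # hypothetical condition number
rho_cg = (math.sqrt(cond) - 1.0) / (math.sqrt(cond) + 1.0)  # CG ratio, 9/11
rho_sd = (cond - 1.0) / (cond + 1.0)                        # Steepest Descent ratio, 99/101

k_cg = iterations_needed(rho_cg, 2.0, 1e-6)
k_sd = iterations_needed(rho_sd, 2.0, 1e-6)
print(k_cg, k_sd)
```

With these assumed values the bounds guarantee the reduction after 73 iterations for (CG) and 726 for Steepest Descent, roughly a factor of $\sqrt{\kappa}$ apart.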

In  the finite version of the (CG)-method was extended to a family of nonquadratic functions, including the following important sets:
$$F(x) = \frac{x^T C x}{(c^T x)^2}, \quad x \in X, \tag{3.8}$$
$$G(x) = x^T C x \cdot (c^T x)^k, \quad k \text{ integer},\ x \in X, \tag{3.9}$$
where $X = \{x \in \mathbb{R}^n : c^T x > 0\}$.

According to the classical definition, the function $F$ indicated in (3.8) is called conic. If $k = -2$, then $G(x) \equiv F(x)$.

Hence, $G$ represents a class of nonquadratic functions for which the optimal solution can be found with a finite number of steps if the matrix $C$ is spd.

As a matter of fact, the following result holds.

Theorem 3.1 (see [22], Theorem 3.1, Lemmas 3.2 and 5.1).

Let $G(x)$ be defined as in (3.9). Then the minimum problem
$$\min G(x), \quad x \in X,$$
can be solved in at most $n$ steps.

Let us now consider some generalizations of the convexity, which play an important role in global optimization see .

Let s(k) be a descent direction in x(k) for a function f(x). The importance of the following definitions will be shown in the next results of this paragraph.

Definition 3.2.

A function $f(x) \in C^1(\mathbb{R}^n)$ is called algorithmically convex if for all $x^{(k)}$, $x^{(k+1)}$ evaluated by an algorithm of type (2.5), one has
$$(s^{(k+1)} - s^{(k)})^T (x^{(k+1)} - x^{(k)}) \ge 0. \tag{3.11}$$

Definition 3.3.

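A function $f(x) \in C^1(\mathbb{R}^n)$ is called weakly convex if for all $x^{(k)}$, $x^{(k+1)}$ evaluated by an algorithm (2.5), the following inequality holds:
$$\frac{\|\nabla f(x^{(k+1)}) - \nabla f(x^{(k)})\|^2}{(\nabla f(x^{(k+1)}) - \nabla f(x^{(k)}))^T (x^{(k+1)} - x^{(k)})} \le M. \tag{3.12}$$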

Definition 3.4.

Let $s^{(k)} = -A_k^{-1} \nabla f(x^{(k)})$, for all $k$, be descent directions of an algorithm of type (2.5) applied to problem (2.1). Then the method is called secant if the matrix $A_k$ solves the secant equation:
$$A_k (x^{(k)} - x^{(k-1)}) = \nabla f(x^{(k)}) - \nabla f(x^{(k-1)}). \tag{3.13}$$

Definition 3.2 is clearly a generalization of convexity. As a matter of fact, if $s^{(k)} = \nabla f(x^{(k)})$ then $f(x) \in C^1(A)$, $A \subseteq \mathbb{R}^n$, is convex if and only if (3.11) is verified for all $x^{(k)}$, $x^{(k+1)} \in A$ (see ).

Definition 3.3 is also a generalization of convexity. In , in fact, it is proved that if $f(x) \in C^1(A)$, $A \subseteq \mathbb{R}^n$, is convex, then (3.12) is satisfied for all $x^{(k)}$, $x^{(k+1)} \in A$. So (3.12) is a necessary, but not sufficient, condition for a function $f$ to be convex.

Definition 3.4 is an $n$-dimensional generalization of the classical secant iterative formula to compute the zeroes of the derivative of a function $f(x) \in C^1(\mathbb{R}^1)$, that is,
$$x_{k+1} = \frac{f'(x_k)\, x_{k-1} - f'(x_{k-1})\, x_k}{f'(x_k) - f'(x_{k-1})}. \tag{3.14}$$
Observe, in fact, that (3.14) can be rewritten as
$$x_{k+1} = x_k - \frac{f'(x_k)}{a_k}, \qquad a_k (x_k - x_{k-1}) = f'(x_k) - f'(x_{k-1}).$$
Hence, the expression of $a_k$ is the 1-dimensional version of (3.13).
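The one-dimensional secant iteration can be written in a few lines. This is a minimal sketch under stated assumptions: the function $f(x) = x^4/4 - x$, whose derivative $x^3 - 1$ vanishes at $x = 1$, the two starting points, and the helper name `secant_stationary_point` are all hypothetical.

```python
def secant_stationary_point(df, x_prev, x_curr, tol=1e-10, max_iter=100):
    """Find a zero of the derivative df by the secant iteration.

    a_k solves the 1-D secant equation a_k (x_k - x_{k-1}) = f'(x_k) - f'(x_{k-1}),
    and the update is x_{k+1} = x_k - f'(x_k) / a_k.
    """
    for k in range(max_iter):
        g_prev, g_curr = df(x_prev), df(x_curr)
        if abs(g_curr) < tol:
            return x_curr, k
        if g_curr == g_prev:          # secant slope undefined: stop
            return x_curr, k
        a_k = (g_curr - g_prev) / (x_curr - x_prev)
        x_prev, x_curr = x_curr, x_curr - g_curr / a_k
    return x_curr, max_iter


# Hypothetical example: f(x) = x^4 / 4 - x, so f'(x) = x^3 - 1.
df = lambda x: x ** 3 - 1.0
x_star, iterations = secant_stationary_point(df, 0.5, 2.0)
print(x_star, iterations)
```

In $n$ dimensions the scalar $a_k$ is replaced by the matrix $A_k$, and the division by $a_k$ becomes the solution of a linear system with $A_k$.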

In  the following result is proved.

Theorem 3.5.

Let $s^{(k)} = -A_k^{-1} \nabla f(x^{(k)})$ be descent directions of a secant method, that is, satisfying (3.13), applied to problem (2.1). Moreover, let conditions (2.7) and (3.12) be verified. Then, $\exists \{\nabla f(x^{(k_i)})\}$ such that $\lim_{i \to \infty} \nabla f(x^{(k_i)}) = 0$.

Remark 3.6.

Theorem 3.5 shows that global convergence for a quasi-Newton secant method applied to problem (2.1) can be obtained if the function $f(x)$ is weakly convex and the matrices $A_k$ approximating $\nabla^2 f(x^{(k)})$ are well conditioned.

Remark 3.7.

By utilizing Armijo-Goldstein-Wolfe's method  and setting $s^{(k)} = \nabla f(x^{(k)})$, the step $\lambda_k$ in (2.5) is such that for all $k$
$$(\nabla f(x^{(k+1)}) - \nabla f(x^{(k)}))^T (x^{(k+1)} - x^{(k)}) > 0, \qquad f(x^{(k+1)}) < f(x^{(k)}).$$
Hence, by Definition 3.2, in this case the function $f(x)$ is also algorithmically convex. For general descent directions $s^{(k)}$, evaluated by a quasi-Newton secant method, inequality (3.11) is not always satisfied.

4. Local Constrained Optimization

Quadratic Programming (QP) is defined in the following way:
$$\min x^T C x + c^T x, \quad Ax = b,\ x \ge 0, \tag{4.1}$$
with $C$ being a symmetric semidefinite positive (ssdp) matrix of order $n$ and $A$ a matrix with $m$ rows and $n$ columns.

Remark 4.1.

Let $P = \{x \in \mathbb{R}^n : Ax = b,\ x \ge 0\}$. The optimal solution of (4.1) can be located in any point of $P$. Hence, (4.1) is a continuous problem which cannot be immediately reduced to a finite problem as in the case $F(x) = c^T x$, that is, Linear Programming (LP).

Let us consider, for instance, the following problems:
$$\min x_1^2 - 3x_2, \quad 2x_1 - x_2 \ge 4,\ -x_1 - 2x_2 \ge -16,\ -2x_1 + 4x_2 \ge -8,\ -7x_1 + 8x_2 \ge -35,\ x_1 \ge 0,\ x_2 \ge 0, \tag{4.2}$$
$$\min x_1^2 + 4x_2^2 - 4x_1 - 24x_2 + 40, \quad -7x_1 - 6x_2 + 42 \ge 0,\ -5x_1 + 3x_2 + 10 \ge 0,\ x_1 \ge 0,\ x_2 \ge 0. \tag{4.3}$$

The optimal solution of (4.2) is the point $(3, 2)$, which is on the boundary of $P$ but is not a vertex. On the other hand, problem (4.3) has the optimal solution in the inner point $(2, 3)$. However, QP can be solved in general in a finite number of steps by means of Frank-Wolfe's algorithm . So, QP can be considered as a finite continuous constrained optimization problem.
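The two examples can be checked numerically. In the sketch below every inequality is written with $\ge$, which is an assumption: it is the only orientation for which $(3, 2)$ is feasible and optimal for (4.2) and $(2, 3)$ is an interior point for (4.3). The grid search, its step of 0.25, and the helper names are illustrative only.

```python
def f42(x1, x2):
    return x1 ** 2 - 3 * x2


def constraints42(x1, x2):
    """Left-hand side minus right-hand side of each assumed '>=' constraint of (4.2)."""
    return [2 * x1 - x2 - 4, -x1 - 2 * x2 + 16, -2 * x1 + 4 * x2 + 8,
            -7 * x1 + 8 * x2 + 35, x1, x2]


def f43(x1, x2):
    return x1 ** 2 + 4 * x2 ** 2 - 4 * x1 - 24 * x2 + 40


def constraints43(x1, x2):
    """Same convention for the assumed '>=' constraints of (4.3)."""
    return [-7 * x1 - 6 * x2 + 42, -5 * x1 + 3 * x2 + 10, x1, x2]


grid = [i * 0.25 for i in range(65)]   # 0, 0.25, ..., 16


def grid_argmin(f, constraints):
    """Best feasible grid point as a tuple (value, x1, x2)."""
    feasible = [(f(x1, x2), x1, x2) for x1 in grid for x2 in grid
                if all(c >= 0 for c in constraints(x1, x2))]
    return min(feasible)


best42 = grid_argmin(f42, constraints42)
best43 = grid_argmin(f43, constraints43)
active42 = sum(1 for c in constraints42(3.0, 2.0) if c == 0)
active43 = sum(1 for c in constraints43(2.0, 3.0) if c == 0)
print(best42, active42, best43, active43)
```

Only one constraint is active at $(3, 2)$, so the point lies on an edge of the polygon rather than at a vertex, while no constraint is active at $(2, 3)$.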

The following question arises: does QP characterize the boundary separating finite continuous constrained optimization problems from infinite ones? In other words, do there exist more general nonlinear constrained optimization problems that can be solved in a finite number of iterations? Since in the unconstrained case we have shown in the previous section that there exist nonquadratic problems that can be exactly solved in a finite number of iterations by utilizing the (CG)-method, the answer is expected to be positive.

Given a convex function $f(x) \in C^1(S)$, $S$ convex $\subseteq \mathbb{R}^n$, Convex Programming with Linear Constraints (CPLC) is defined as
$$\min f(x), \quad Ax = b,\ x \ge 0. \tag{4.4}$$
Problem (4.4) can be solved by the Reduced Gradient (RG) algorithm or by the Gradient Projection (GP) method [23, 26, 27].

Assuming A with maximum rank and taking into account Remark 2.4, one can introduce the following.

Definition 4.2.

Let $\hat{f}(x) \in C^1(S)$, $S$ convex $\subseteq \mathbb{R}^n$, be a convex function.

Let $\hat{P} = \{x \in \mathbb{R}^n : \hat{A}x = \hat{b},\ x \ge 0\}$ be a nonempty polyhedron. The corresponding CPLC problem (4.4) is a finite continuous constrained optimization problem if and only if there exists a convergent Gradient-type method (2.5) and a positive real number $c_0$, such that if (2.5) required an infinite number of steps, then
$$x^{(k)} \in \hat{P},\ \forall k, \qquad \inf_k \bigl\{\hat{f}(x^{(k)}) - \hat{f}(x^{(k+1)})\bigr\} \ge c_0, \quad k > k_0,\ c_0 > 0. \tag{4.5}$$
Equation (4.5) clearly implies that $\exists k^* : \hat{f}(x^{(k^*)}) = \min_{x \in \hat{P}} \hat{f}(x)$.

The importance of Definition 4.2 can be pointed out by the next result, showing the relationship between (4.4) and a particular linear optimization problem.

Theorem 4.3.

Let $x^{(k)}$ be an admissible solution of (4.4). Let $s^{(k)}$ be a descent direction in $x^{(k)}$ for the function $f(x)$. Then $s^{(k)}$ is an admissible descent direction for (4.4) if $A s^{(k)} = 0$.

Moreover, for any fixed $\hat{x}^{(k)}$ the optimal solution $s^*$ of the problem
$$\min \nabla f(\hat{x}^{(k)})^T s, \quad As = 0,\ \|s\| = 1, \tag{4.6}$$
is given by
$$s^* = \frac{(I - A^T (A A^T)^{-1} A)\, \nabla f(\hat{x}^{(k)})}{\|(I - A^T (A A^T)^{-1} A)\, \nabla f(\hat{x}^{(k)})\|}.$$
By setting $c = \nabla f(\hat{x}^{(k)})$ and $x = s$ it was proven in  that (4.6) is equivalent to a general LP problem, that is,
$$\min c^T x, \quad Ax = b,\ x \ge 0. \tag{4.8}$$
Furthermore, if $T = \{x \in \mathbb{R}^n_+,\ Ax = 0,\ \|x\| = 1\}$, the following result holds (see ).

Theorem 4.4.

Given a suitable integer $L$ and the function
$$g(x) = \sum_{j=1}^{n} \ln \frac{c^T x}{x_j} = n \ln c^T x - \sum_{j=1}^{n} \ln x_j,$$
then, (4.6) and hence (4.8) are equivalent to finding a point $x^*$:
$$x^* \in T, \qquad g(x^*) < -2nL.$$
Moreover, it is possible to determine a real number $c_0$ and a sequence $\{x^{(k)}\} \subset T$ by a GP algorithm with a suitable scaling procedure (see again ) such that
$$g(x^{(k+1)}) < g(x^{(k)}) - c_0.$$

By Theorem 4.4 and Definition 4.2 it follows that there exists a Gradient-type method (2.5) solving LP in a finite number of steps. Hence LP is a finite continuous constrained optimization problem. It is important to underline that the latter result is not a consequence of the intrinsic finiteness of the set of the possible optimal solutions (the vertices of a polyhedron) as in the classical simplex algorithm.
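The projected direction that appears in Theorem 4.3 can be computed explicitly in the simplest setting. The sketch below assumes a single linear constraint, so that $A$ is one row $a^T$ and $(AA^T)^{-1}$ is a scalar; the objective, the feasible point, and the step length are hypothetical, and the nonnegativity constraints are assumed to stay inactive. The sign convention follows the update $x^{(k+1)} = x^{(k)} - \lambda_k s^{(k)}$ of (2.5).

```python
import math


def project_onto_null_space(a, g):
    """Return (I - a (a^T a)^{-1} a^T) g for a single constraint row a."""
    coeff = sum(ai * gi for ai, gi in zip(a, g)) / sum(ai * ai for ai in a)
    return [gi - coeff * ai for ai, gi in zip(a, g)]


TARGET = [3.0, 1.0, 0.0]   # hypothetical data defining the objective


def f(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


def grad_f(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, TARGET)]


a = [1.0, 1.0, 1.0]        # constraint a^T x = 3
x = [1.0, 1.0, 1.0]        # feasible starting point
pg = project_onto_null_space(a, grad_f(x))
norm = math.sqrt(sum(p * p for p in pg))
s = [p / norm for p in pg]                        # normalized projected gradient

lam = 0.1                                         # hypothetical step length
x_new = [xi - lam * si for xi, si in zip(x, s)]   # update as in (2.5)
print(s, f(x), f(x_new))
```

The direction satisfies $a^T s = 0$, so the new point remains on the constraint, and the objective decreases because $\nabla f^T s = \|P \nabla f\| > 0$.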

Given the convex functions $f(x), h_1(x), h_2(x), \ldots, h_m(x) \in C^1(S)$, $S$ convex $\subseteq \mathbb{R}^n$, let us now consider the general Convex Programming (CP) problem:
$$\min f(x), \quad h_i(x) \le 0,\ i = 1, 2, \ldots, m, \quad x \ge 0. \tag{4.12}$$
The following property is well known [23, 26].

Definition 4.5.

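Letting $\hat{x} \ge 0$ and $I = \{i : h_i(\hat{x}) = 0\}$, then the constraints of (4.12) are qualified if one of the following conditions is satisfied:
$$\exists x^* \ge 0 : h_i(x^*) < 0, \quad i = 1, 2, \ldots, m, \tag{4.13}$$
$$h_i(\hat{x}) \text{ is locally concave in } \hat{x}, \quad \forall i \in I,\ \forall \hat{x}. \tag{4.14}$$
If $h_i(x) = c_i^T x$, then (4.14) is trivially satisfied for all $\hat{x}$, for all $i \in I$.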

So, from Definition 4.5 we deduce that the constraints of CPLC problem (4.4) are always qualified. Assuming in (4.4) A with maximum rank, we clearly obtain a condition equivalent to (4.13).

Definition 4.6.

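A set $C \subseteq \mathbb{R}^n$ is called a convex cone if
$$x \in C \Rightarrow \lambda x \in C,\ \forall \lambda > 0, \qquad \forall x^{(1)}, x^{(2)} \in C,\ 0 \le \lambda_1, \lambda_2 \le 1,\ \lambda_1 x^{(1)} + \lambda_2 x^{(2)} \in C.$$
The following theorem was proved in  in a general Hilbert space (see Theorem 2.3).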

Theorem 4.7.

Let $S_1, S_2 \subseteq \mathbb{R}^n$ be closed convex cones, and let $S_1^o$ denote the interior of $S_1$. Assume that $S_1^o \ne \emptyset$.

Then the corresponding conic feasibility problem
$$\text{find } x \in S_1^o \cap S_2$$
can be solved in a finite number of steps.

The technique utilized to prove Theorem 4.7 is based upon the so-called Method of Alternating Projections (MAP) (see ).
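The basic alternating projection iteration can be sketched on two simple closed convex sets. This is only an illustration of the general MAP idea, not the conic algorithm behind Theorem 4.7: the sets (the unit disc and the half-plane $x_1 \ge 0.5$), the starting point, and the membership tolerance are hypothetical, and for these sets termination is obtained only up to that tolerance.

```python
import math

TOL = 1e-9   # hypothetical membership tolerance


def proj_disc(x):
    """Projection onto the closed unit disc."""
    n = math.hypot(x[0], x[1])
    return list(x) if n <= 1.0 else [x[0] / n, x[1] / n]


def proj_half_plane(x):
    """Projection onto the half-plane {x : x1 >= 0.5}."""
    return [max(x[0], 0.5), x[1]]


def in_disc(x):
    return math.hypot(x[0], x[1]) <= 1.0 + TOL


def in_half_plane(x):
    return x[0] >= 0.5 - TOL


def alternating_projections(x, max_iter=1000):
    """Project onto the two sets in turn until the point lies in both."""
    for k in range(max_iter):
        if in_disc(x) and in_half_plane(x):
            return x, k
        x = proj_half_plane(proj_disc(x))
    return x, max_iter


point, cycles = alternating_projections([0.0, 5.0])
print(point, cycles)
```

From the starting point $(0, 5)$ the iterates approach the corner $(0.5, \sqrt{0.75})$ of the intersection at a linear rate, which is why explicit projection formulas matter for turning such schemes into finite procedures.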

Theorem 4.7 was extended in  (see Proposition 2.1) by assuming $S_1$ and $S_2$ to be closed convex sets, thereby proving that a convex feasibility problem is equivalent to a conic feasibility problem. However, the open question remains how to express explicit formulas for the projection operators to convert the algorithm from $S_1$ and $S_2$ to the conified closed sets $\overline{\operatorname{con}(S_1)}$ and $\overline{\operatorname{con}(S_2)}$ in the case of nonlinear and nonquadratic problems. The Linear Matrix Inequality (LMI) feasibility problem was, in fact, efficiently solved in the literature (see ).

Remark 4.8.

Theorem 4.7 can be applied to CPLC problem (4.4), by assuming
$$S_1 = \{x \in \mathbb{R}^n : Ax = b,\ x \ge 0\}, \qquad S_2^{(k)} = \{x \in \mathbb{R}^n : f(x) \le t^{(k)},\ t^{(k)} \in \mathbb{R}\}, \quad k = 1, 2, \ldots, k_0.$$

Hence, explicit formulas for the projection operators for suitable classes of nonlinear convex feasibility problems in terms of the corresponding conified sets might make it possible to solve CPLC problem (4.4) in the nonquadratic case with a finite number of steps. By utilizing Theorem 3.1, we can prove, in fact, the following important theorem.

Theorem 4.9 (see [33]).

Consider the particular CPLC problem
$$\min x^T C x \cdot (c^T x)^{-2}, \quad C \text{ spd}, \quad Ax = b,\ -c^T x \le 0,\ x \ge 0. \tag{4.18}$$
Assume that the optimal solution $x^*$ of problem (4.18) is such that $-c^T x^* < 0$. Then, (4.18) can be converted into a convex feasibility problem by utilizing a proper modification of the Alternating Projection method, and the latter algorithm converges to the optimal solution with a finite number of steps.

Remark 4.10.

Given the convex set of feasible solutions
$$S_1 = \{x \in \mathbb{R}^n : Ax = b,\ -c^T x \le 0,\ x \ge 0\}, \tag{4.19}$$
the proof of Theorem 4.9 is essentially based upon the following computational ingredients:

by Theorem 4.7, one can convert the closed convex set defined in (4.19) into a closed convex cone;

by Theorem 3.1, the extended version of (CG)-method and a suitable projection algorithm can be applied to problem (4.18) thereby obtaining a convergence with a finite number of steps.

5. Global Optimization

One can prove the following global convergence theorem .

Theorem 5.1.

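Consider Problem (2.1), where $f(x) \in C^2(\mathbb{R}^n)$.

Let $f_{\min}$ be the value of the optimal solution. Assume that
$$\forall \epsilon_a \in \mathbb{R}^+,\ \exists \epsilon_s \in \mathbb{R}^+ : \|\nabla f(x^{(k)})\| > \epsilon_s \ \text{ except for } k : f(x^{(k)}) - f_{\min} < \epsilon_a. \tag{5.1}$$
If in an iterative scheme of BFGS-type,
$$x^{(k+1)} = x^{(k)} - \lambda_k B^{(k)-1} \nabla f(x^{(k)}), \qquad \bigl(B^{(k)} = \varphi(\tilde{B}^{(k-1)}, \ldots),\ \forall k\bigr),$$
the following conditions are satisfied:
$$\frac{\|\nabla f(x^{(k+1)}) - \nabla f(x^{(k)})\|^2}{(\nabla f(x^{(k+1)}) - \nabla f(x^{(k)}))^T \lambda_k d^{(k)}} = \frac{\|y_k\|^2}{y_k^T s_k} \le M, \tag{5.3}$$
$$\operatorname{cond}(B^{(k)}) \le N. \tag{5.4}$$
Then $\forall \epsilon_a \in \mathbb{R}^+$, $\exists k^{**} : \forall k > k^{**}$, $f(x^{(k)}) - f_{\min} < \epsilon_a$.

Theorem 5.1 points out the following three conditions for a global optimization BFGS-type method.

Condition (5.1) assumes an optimal matching between the BFGS-type algorithm and the function $f$.

Condition (5.3) is equivalent to (3.12), that is, $f(x)$ is weakly convex (see ).

Condition (5.4) can be easily satisfied, by modifying the matrices $B^{(k)}$ by a restarting procedure, because every descent direction is associated to an spd matrix (see Theorem 2.1).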

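Let us now consider the classical “box-constrained” problem:
$$\min f(x), \quad x^L \le x \le x^U. \tag{5.6}$$
Let $x^L_{c(m)} \le x_{c(m)} \le x^U_{c(m)}$ denote the current box at iteration $m$.

Set
$$\alpha_{x_{c(m)}} = \max\Bigl\{0,\ -\tfrac{1}{2} \min_i \lambda_i \bigl\{\nabla^2 f(x_{c(m)})\bigr\}\Bigr\}, \tag{5.7}$$
$$L_{c(m)}(x_{c(m)}) = f(x_{c(m)}) + \alpha_{x_{c(m)}} (x^L_{c(m)} - x_{c(m)})(x^U_{c(m)} - x_{c(m)}).$$
The following global convergence theorem holds (see [35, 36]).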

Theorem 5.2.

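Consider Problem (5.6) and assume $f(x) \in C^2$. These hypotheses imply
$$\operatorname{cond}(\nabla^2 f(x)) \le c, \qquad \forall m\ \exists \alpha_m^* = \max_{x_{c(m)}} \alpha_{x_{c(m)}}.$$
Set
$$f^L_{c(m)} = \inf_{x_{c(m)}} L_{c(m)}(x_{c(m)}), \qquad f^U_{c(m)} = f\Bigl(\frac{x^L_{c(m)} + x^U_{c(m)}}{2}\Bigr).$$
Then, it follows for all $m$:
$$f^L_{c(m)} \le f^L_{c(m+1)} \le \min_{x_{c(m+1)}} f(x_{c(m+1)}) \le \min_x f(x), \qquad f^U_{c(m)} \ge f^U_{c(m+1)} \ge \min_x f(x) \ge f^L_{c(m)}.$$
Moreover, for all $\epsilon_a > 0$, $\exists m^*$: for all $m \ge m^*$:
$$f^U_{c(m)} - f^L_{c(m)} < \epsilon_a, \qquad \|x^U_{c(m)} - x^L_{c(m)}\|^2 \le \frac{4\epsilon_a}{c}, \quad c \text{ constant}.$$
Theorem 5.2 can be immediately extended to Problem (2.1), by assuming a growth condition on the function $f(x)$.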

In fact, we have the following.

Corollary 5.3.

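Given $f(x) \in C^2(\mathbb{R}^n)$:
$$\lim_{\|x\| \to \infty} f(x) = +\infty. \tag{5.14}$$
Equation (5.14) implies
$$\exists K_0 : \min_{x \in \mathbb{R}^n} f(x) \equiv \min_{x \in K_0} f(x), \qquad \|\nabla^2 f(x)\| \le c_1,\ \forall x \in K_0.$$
Assume $\|\nabla^2 f(x)^{-1}\| \le c_2$ and hence $\operatorname{cond}(\nabla^2 f(x)) \le c_1 c_2$. Then, the convergence results proved for (5.6) can be applied to (2.1).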
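The underestimating function $L_{c(m)}$ introduced before Theorem 5.2 can be illustrated in one dimension. In the sketch below the function $f(x) = x^4 - 4x^2$, the box $[-2, 2]$, and the grid used to estimate the smallest second derivative over the box are assumptions made for illustration; a single $\alpha$ for the whole box is used, in the spirit of $\alpha_m^*$.

```python
def f(x):
    return x ** 4 - 4.0 * x ** 2        # hypothetical nonconvex test function


def d2f(x):
    return 12.0 * x ** 2 - 8.0          # second derivative (1-D Hessian)


xL, xU = -2.0, 2.0                       # current box
grid = [xL + (xU - xL) * i / 400 for i in range(401)]

# alpha = max{0, -1/2 * (smallest Hessian eigenvalue over the box)},
# with the minimum estimated on the grid (an assumption of this sketch).
alpha = max(0.0, -0.5 * min(d2f(x) for x in grid))


def lower(x):
    """Underestimator f(x) + alpha (xL - x)(xU - x)."""
    return f(x) + alpha * (xL - x) * (xU - x)


f_lower = min(lower(x) for x in grid)      # lower bound on the box
f_upper = f(0.5 * (xL + xU))               # value at the midpoint of the box
f_min_grid = min(f(x) for x in grid)
print(alpha, f_lower, f_min_grid, f_upper)
```

Here $\alpha = 4$ and the underestimator reduces to $x^4 - 16$, which is convex and lies below $f$ on the box, so that $f^L \le \min f \le f^U$ holds with $f^L = -16$ and $f^U = 0$.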

We can fruitfully combine the results of Theorems 5.1 and 5.2, by proving the following.

Theorem 5.4.

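Consider Problem (5.6) and assume $f(x) \in C^2(\mathbb{R}^n)$.

If in a BFGS-type iterative scheme
$$x^{(k+1)} = x^{(k)} - \mu_k B^{(k)-1} \nabla f(x^{(k)}), \tag{5.17}$$
the following conditions are satisfied:
$$x^L \le x^{(k)} \le x^U,\ \forall k,$$
$$\operatorname{cond}(B^{(k)}) \le N, \tag{5.19}$$
$$\frac{\|\nabla f(x^{(k+1)}) - \nabla f(x^{(k)})\|^2}{(\nabla f(x^{(k+1)}) - \nabla f(x^{(k)}))^T \lambda_k d^{(k)}} = \frac{\|y_k\|^2}{y_k^T s_k} \le M, \tag{5.20}$$
then (5.17) is convergent to the optimal solution of (5.6).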

Proof.

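By the assumptions it follows:
$$\operatorname{cond}(\nabla^2 f(x)) \le c. \tag{5.21}$$

Hence, by (5.7) we have for all $m$:
$$\exists \alpha_m^* = \max_{x_{c(m)}} \alpha_{x_{c(m)}}. \tag{5.22}$$
Set
$$\tilde{L}_{c(m)}(x_{c(m)}) = f(x_{c(m)}) + \alpha_m^* (x^L_{c(m)} - x_{c(m)})(x^U_{c(m)} - x_{c(m)}).$$
Therefore, by (5.21) and (5.22) for all $m$:
$$L_{c(m)}(x_{c(m)}) \ge \tilde{L}_{c(m)}(x_{c(m)}),\ \forall x_{c(m)}, \qquad \tilde{L}_{c(m)}(x_{c(m)}) \text{ convex } \forall x_{c(m)}.$$
So,
$$f^L_{c(m)} = \inf_{x_{c(m)}} L_{c(m)}(x_{c(m)}) = \min_{x_{c(m)}} \tilde{L}_{c(m)}(x_{c(m)}),\ \forall m.$$
Let $x^{(\tilde{k}_m)}_{c(m)}$ be a local minimum in the box $c(m)$.

If $x^L_{c(m+1)} \le x^{(\tilde{k}_m)}_{c(m)} \le x^U_{c(m+1)}$ and $f\bigl((x^L_{c(m+1)} + x^U_{c(m+1)})/2\bigr) \ge f(x^{(\tilde{k}_m)}_{c(m)})$, then define
$$f^U_{c(m+1)} = f(x^{(\tilde{k}_m)}_{c(m)}).$$
Else, set
$$x^{(0)}_{c(m+1)} = \frac{x^L_{c(m+1)} + x^U_{c(m+1)}}{2}, \qquad f^U_{c(m+1)} = f(x^{(\tilde{k}_{m+1})}_{c(m+1)}),$$
with $x^{(\tilde{k}_{m+1})}_{c(m+1)}$ being a local minimum evaluated by the starting point $x^{(0)}_{c(m+1)}$ and contained in the box $c(m+1)$. Since the assumptions of Theorem 5.2 are satisfied, by the results of  (see Theorem 2 and Corollary 2), it follows that (5.19), (5.20) imply that for all $\epsilon_b > 0$, $\exists \{x^{(\tilde{k}_{m_i})}_{c(m_i)}\}$:
$$f^U_{c(m_i)} \ge f^U_{c(m_{i+1})}, \qquad \|\nabla f(x^{(\tilde{k}_{m_i})}_{c(m_i)})\| < \epsilon_b. \tag{5.27}$$
Applying Theorem 5.2, by inequalities (5.11) and (5.27) and by setting $\epsilon = \max\{\epsilon_a, \epsilon_b\}$, we have that for all $\epsilon > 0$, $\exists \{x^{(\tilde{k}_{m_i})}_{c(m_i)}\}$,
$$\|\nabla f(x^{(\tilde{k}_{m_i})}_{c(m_i)})\|_2 < \epsilon, \qquad \|x^U_{c(m_i)} - x^L_{c(m_i)}\|^2 \le \frac{4\epsilon}{c}, \qquad f(x^{(\tilde{k}_{m_i})}_{c(m_i)}) - f_{\min} \le f^U_{c(m_i)} - f^L_{c(m_i)} < \epsilon.$$
This completes the proof.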

Although the local minimization phases are performed effectively by the iterative scheme (5.17), the convergence of the method to the global minimum is usually very slow by the very nature of the αBB approach. In particular, the number of the upper bounds $f^U_{c(m_i)}$ and the corresponding boxes $m_i$ required to obtain a satisfactory approximation can be unacceptable from a computational point of view. In order to overcome this problem, a fast determination of “good” local minima is essential.

More precisely, by the utilization of terminal repellers and tunneling techniques , one can build algorithms based on a sequence of cycles, where each cycle has two phases, that is, a local optimization phase and a tunneling one. The main aim of these procedures is to build a favourable sequence of local minima (maxima), thereby determining a set of possible candidates for the global minimum (maximum) more efficiently.

By injecting in the method suitable “tunneling phases,” one can avoid the unfair entrapment in a “bad” local minimum, that is, when the condition
$$f^U_{c(m+1)} = f(x^{(\tilde{k}_{m+1})}_{c(m+1)}) = f(x^{(\tilde{k}_m)}_{c(m+1)}) = f^U_{c(m)} \gg f_{\min}$$
is verified for several iterations. For this purpose, the power of the repellers, utilized in the tunneling phases, plays a crucial role. The classical and well-known use of scalar repellers [14, 34] is often unsuitable, when the dimension $n$ of the problem assumes values of operational interest. A repeller structured matrix, based on the sum of a diagonal matrix and a low-rank one , can be constructed to overcome the latter difficulty.

Let $x^{(\tilde{k})}$ be an approximation of a local minimizer for $f(x) \in C^1$.

A matrix $A^{(\tilde{k})}$ is called a repeller matrix for $x^{(\tilde{k})}$ if $\exists \hat{x}$,
$$\hat{x} = x^{(\tilde{k})} - A^{(\tilde{k})} \nabla f(x^{(\tilde{k})}), \qquad f(\hat{x}) < f(x^{(\tilde{k})}).$$
The repeller matrix $A^{(\tilde{k})}$ for any given computed local minimizer $x^{(\tilde{k})}$ can be approximated in the following way (see ):
$$A^{(\tilde{k})} \approx \lambda^{(\tilde{k})} I + (I\mu + R)^{-1}, \qquad 2 \le \operatorname{rank}(R) \le 4,$$
with $\lambda^{(\tilde{k})}$ being the maximal scalar repeller  that is,
$$\lambda^{(\tilde{k})} = \frac{\epsilon_a}{\|\nabla f(x^{(\tilde{k})})\|^2}, \qquad \|\nabla f(x^{(\tilde{k})})\| \le \epsilon_a, \quad \epsilon_a \text{ desired precision},$$
with $R$ being of the following structure:
$$R = \mu_1 p p^T + \mu_2 q q^T + \mu_3 p r^T + \mu_4 r q^T, \qquad p, q, r \text{ suitable vectors},\ \mu_1, \mu_2, \mu_3, \mu_4 \text{ scalars}. \tag{5.33}$$
In this way, the application of a BFGS-type method can be effectively extended to the tunneling phases and hence to the whole global optimization scheme (see [9, 33]).

The structure in (5.33) can be generalized by using the recent Tensor-Train (TT)-cross approximation theory .

It is well known, in fact, that a rank-p matrix can be recovered from a cross of p linearly independent columns (or rows). Therefore, an arbitrary matrix can be interpolated by a pseudoskeleton approximation (see  and again ). In particular, since a repeller matrix is not arbitrary and possesses some hidden structure, it is fundamental to discover a low-parametric representation, which can be useful in the tunneling phases.

An operational cross approximation method, evaluating large close-to-rank-$p$ matrices in $\mathcal{O}(np^2)$ time complexity and by computing $\mathcal{O}(np)$ elements, was shown in .
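The statement that a rank-$p$ matrix can be recovered from a cross of $p$ columns and rows can be illustrated in the simplest case $p = 1$. The sketch below is not the TT-cross method itself; the vectors `u` and `v`, the pivot position, and the helper name `rank_one_cross` are hypothetical.

```python
def rank_one_cross(column, row, pivot):
    """Rebuild a rank-1 matrix from one column, one row, and their common entry."""
    return [[ci * rj / pivot for rj in row] for ci in column]


# Hypothetical rank-1 matrix A = u v^T.
u = [1.0, 2.0, 3.0, 4.0]
v = [2.0, -1.0, 0.5]
A = [[ui * vj for vj in v] for ui in u]

i0, j0 = 0, 0                           # cross position; A[i0][j0] must be nonzero
column = [A[i][j0] for i in range(len(u))]
row = list(A[i0])

approx = rank_one_cross(column, row, A[i0][j0])
max_err = max(abs(A[i][j] - approx[i][j])
              for i in range(len(u)) for j in range(len(v)))
print(max_err)
```

Only the entries of one row and one column are read, which is the $p = 1$ analogue of computing $\mathcal{O}(np)$ elements instead of the whole matrix.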

6. Discrete Optimization

A well-known family of Computer Science methods is represented by the so-called Greedy algorithms. The simplest application of this type of procedures is in the standard Knapsack Problem (KP), that is,
$$\max c^T x, \quad a^T x \le b, \quad x \ge 0, \text{ integer}. \tag{6.1}$$
Greedy approach is essentially a generalization of the classical Dynamic Programming (DP) methods, which are based on the Bellman Principle. By utilizing the DP computational scheme and assuming $y$ integer, problem (6.1) can be reduced to the recursive solution of the following family of problems:
$$\max c^{(k)T} x^{(k)}, \quad a^{(k)T} x^{(k)} \le y, \quad x^{(k)} \ge 0, \text{ integer}, \quad 1 \le k \le n,\ 1 \le y \le b, \text{ integer}, \tag{6.2}$$
where $c^{(k)}$, $a^{(k)}$, $x^{(k)}$ indicate the vectors associated to the first $k$ components of $c$, $a$, $x$, respectively.

Given $k$ and $y$, let $\psi_k(y)$ be the value of the objective function corresponding to the optimal solution of problem (6.2).

The algorithm computes $\psi_k(y)$ by the recursive formula
$$\psi_k(y) = \max\bigl\{\psi_{k-1}(y),\ \psi_k(y - a_k^{(k)}) + c_k^{(k)}\bigr\}. \tag{6.3}$$
By (6.3), the optimal value of (6.2) is determined by a generalized discrete Steepest Descent algorithm, since $c_k^{(k)}$ is the $k$th component of the gradient of the objective function and represents, in fact, the increase associated to the choice of the $k$th object.

Therefore, formula (6.3) is based on a discrete Steepest Descent approach, and the value $\psi_k(y - a_k^{(k)}) + c_k^{(k)}$ assures that the corresponding solution is admissible.
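The recursion for $\psi_k(y)$ translates directly into a table-filling procedure. The sketch below assumes positive integer weights and an integer capacity; the data `c`, `a`, `b` and the function name `knapsack_dp` are hypothetical.

```python
def knapsack_dp(c, a, b):
    """Value of max c^T x s.t. a^T x <= b, x >= 0 integer, via the recursion for psi_k(y)."""
    n = len(c)
    # psi[k][y]: optimal value using the first k objects with capacity y.
    psi = [[0] * (b + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        for y in range(b + 1):
            best = psi[k - 1][y]                    # object k is not used
            if a[k - 1] <= y:                       # one more copy of object k fits
                best = max(best, psi[k][y - a[k - 1]] + c[k - 1])
            psi[k][y] = best
    return psi[n][b]


# Hypothetical instance: values c, weights a, capacity b.
c = [10, 7, 3]
a = [4, 3, 2]
b = 10
best_value = knapsack_dp(c, a, b)
print(best_value)
```

For this instance the optimum is 24, obtained with one copy of the first object and two copies of the second; the test `a[k - 1] <= y` is what keeps every candidate solution admissible.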

Integer Nonlinear Programming with Linear Constraints problems (INPLCs) can be transformed into continuous GO problems over the unit hypercube . In order to reduce the difficulties caused by the introduction of undesirable local minimizers, a special class of continuation methods, called smoothing methods, can be introduced . These methods deform the original objective function into a function whose smoothness is controlled by a parameter. Of course, the success of the latter approach depends on the existence of a suitable smoothing function.

Hence, the Gradient-type methods for Global Optimization of Section 5 can also be applied to INPLC.

7. Conclusions

In this paper we have tried to demonstrate that Gradient or Gradient-type methods lead both to a general approach to optimization problems and to the construction of efficient algorithms.

In particular, we have shown that the class of problems for which the optimal solution can be obtained in a finite number of steps is larger than canonical unconstrained Convex Quadratic problems or Convex Quadratic Programming. Moreover, we have pointed out that the classical distinction between Direct Methods and Iterative Methods cannot be considered as a fundamental classification of techniques in Numerical Analysis. Many optimization problems can be, in fact, solved in a finite number of steps by suitable hybrid efficient algorithms (see ).

Furthermore, if the matrices involved in the computation are well conditioned, the superiority of Iterative Methods with respect to Direct ones, which is a typical feature of the (CG) algorithm, can be proved in a more general context (see again ).

Several heuristic and ad hoc algorithms in operational environments can be considered, in fact, as particular cases of a general Gradient-type approach to the problem. In some cases, surprisingly enough, the convergence of Iterative Methods can be guaranteed only by utilizing a special Line-Search Minimization algorithm (see, for instance, the Fletcher-Reeves method in conjunction with Armijo-Goldstein-Wolfe's procedure, , Theorem 5.8).

It is also important to underline that many combinatorial problems, representing a remarkable benchmark set in Computer Science, can be translated in terms of Gradient-type methods in a general framework.

Once again, we stress that the Fixed Point theorem, which is considered a milestone in Numerical Analysis and guarantees the convergence of most classical Iterative Methods, represents the background for only a subset of Gradient-type methods.

Acknowledgment

This paper was partially supported by PRIN 2008 N. 20083KLJEZ.

References

J. Abaffy and E. Spedicato, ABS Projection Algorithms, Ellis Horwood Series: Mathematics and Its Applications, Ellis Horwood, Chichester, UK, 1989, 220 pp. MR1015928. Zbl 0691.65022.

S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, UK, 2004, xiv+716 pp. MR2061575.

A. Forsgren, P. E. Gill, and M. H. Wright, "Interior methods for nonlinear optimization," SIAM Review, vol. 44, no. 4, pp. 525–597, 2002. doi:10.1137/S0036144502414942. MR1980444. Zbl 1028.90060.

S. G. Nash and A. Sofer, Linear and NonLinear Programming, McGraw-Hill, New York, NY, USA, 1995.

C. Di Fiore, S. Fanelli, F. Lepore, and P. Zellini, "Matrix algebras in quasi-Newton methods for unconstrained minimization," Numerische Mathematik, vol. 94, no. 3, pp. 479–500, 2003. doi:10.1007/s00211-002-0410-4. MR1981164. Zbl 1034.65045.

C. Di Fiore, S. Fanelli, and P. Zellini, "Low-complexity minimization algorithms," Numerical Linear Algebra with Applications, vol. 12, no. 8, pp. 755–768, 2005. doi:10.1002/nla.449. MR2172675. Zbl 1164.65422.

C. Di Fiore, S. Fanelli, and P. Zellini, "Low complexity secant quasi-Newton minimization algorithms for nonconvex functions," Journal of Computational and Applied Mathematics, vol. 210, no. 1-2, pp. 167–174, 2007. doi:10.1016/j.cam.2006.10.060. MR2389166. Zbl 1139.65047.

C. Di Fiore, F. Lepore, and P. Zellini, "Hartley-type algebras in displacement and optimization strategies," Linear Algebra and Its Applications, vol. 366, pp. 215–232, 2003. doi:10.1016/S0024-3795(02)00499-8. MR1987722. Zbl 1044.65050.

S. Fanelli, "A new algorithm for box-constrained global optimization," Journal of Optimization Theory and Applications, vol. 149, no. 1, pp. 175–196, 2011. doi:10.1007/s10957-010-9780-4.

C. Di Fiore, "Structured matrices in unconstrained minimization methods," in Fast algorithms for Structured Matrices: Theory and Applications (South Hadley, MA, 2001), vol. 323 of Contemporary Mathematics, pp. 205–219, American Mathematical Society, Providence, RI, USA, 2003. MR1999396. Zbl 1031.65076.

A. Bortoletti, C. Di Fiore, S. Fanelli, and P. Zellini, "A new class of quasi-Newtonian methods for optimal learning in MLP-networks," IEEE Transactions on Neural Networks, vol. 14, no. 2, pp. 263–273, 2003. doi:10.1109/TNN.2003.809425.

J.-F. Cai, R. H. Chan, and C. Di Fiore, "Minimization of a detail-preserving regularization functional for impulse noise removal," Journal of Mathematical Imaging and Vision, vol. 29, no. 1, pp. 79–91, 2007. doi:10.1007/s10851-007-0027-4. MR2374258.

I. Oseledets and E. Tyrtyshnikov, "TT-cross approximation for multidimensional arrays," Linear Algebra and Its Applications, vol. 432, no. 1, pp. 70–88, 2010. doi:10.1016/j.laa.2009.07.024. MR2566459. Zbl 1183.65040.

B. C. Cetin, J. Barhen, and J. W. Burdick, "Terminal repeller unconstrained subenergy tunneling (TRUST) for fast global optimization," Journal of Optimization Theory and Applications, vol. 77, no. 1, pp. 97–126, 1993. doi:10.1007/BF00940781. MR1222786. Zbl 0801.49001.

I. Oseledets and E. Tyrtyshnikov, "A unifying approach to the construction of circulant preconditioners," Linear Algebra and Its Applications, vol. 418, no. 2-3, pp. 435–449, 2006. doi:10.1016/j.laa.2006.02.037. MR2260203. Zbl 1109.65041.

R. P. Ge and C. B. Huang, "A continuous approach to nonlinear integer programming," Applied Mathematics and Computation, vol. 34, no. 1, part I, pp. 39–60, 1989. doi:10.1016/0096-3003(89)90005-2. MR1020593. Zbl 0682.90067.

K. M. Ng, A continuation approach for solving nonlinear optimization problems with discrete variables, Ph.D. Dissertation, Stanford University, Palo Alto, Calif, USA, 2002.

J. Nocedal and S. J. Wright, Numerical Optimization, Springer Series in Operations Research, Springer, New York, NY, USA, 1999, xxii+636 pp. doi:10.1007/b98874. MR1713114.

C. Di Fiore, S. Fanelli, and P. Zellini, "An efficient generalization of Battiti-Shanno's Quasi-Newton Algorithm for learning in MLP-networks," in Proceedings of the International Conference on Neural Information Processing (ICONIP '04), pp. 483–488, Springer, Calcutta, India, 2004.

J. Stoer, Einführung in die Numerische Mathematik. I, Springer, Berlin, Germany, 1972, ix+250 pp. MR0400616.

J. E. Dennis Jr. and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall Series in Computational Mathematics, Prentice Hall, Englewood Cliffs, NJ, USA, 1983, xiii+378 pp. MR702023.

L. Lukšan, "Conjugate gradient algorithms for conic functions," Československá Akademie Věd, vol. 31, no. 6, pp. 427–440, 1986. MR870480. Zbl 0622.65045.

D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, Reading, Mass, USA, 1984.

M. J. D. Powell, "Some global convergence properties of a variable metric algorithm for minimization without exact line searches," in Nonlinear Programming (Proc. Sympos., New York, 1975), pp. 53–72, American Mathematical Society, Providence, RI, USA, 1976. MR0426428. Zbl 0338.65038.

M. Frank and P. Wolfe, "An algorithm for quadratic programming," Naval Research Logistics Quarterly, vol. 3, pp. 95–110, 1956. doi:10.1002/nav.3800030109. MR0089102.

M. Minoux, Mathematical Programming: Theory and Algorithms, A Wiley-Interscience Publication, John Wiley & Sons, Chichester, UK, 1986, xxviii+489 pp. MR868279.

J. B. Rosen, "The gradient projection method for nonlinear programming. I. Linear constraints," vol. 8, pp. 181–217, 1960. MR0112750. Zbl 0099.36405.

N. Karmarkar, "A new polynomial-time algorithm for linear programming," Combinatorica, vol. 4, no. 4, pp. 373–395, 1984. doi:10.1007/BF02579150. MR779900. Zbl 0557.90065.

C. C. Gonzaga, "Conical projection algorithms for linear programming," Mathematical Programming, vol. 43, no. 2, pp. 151–173, 1989. doi:10.1007/BF01582287. MR978133. Zbl 0667.90064.

M. A. Rami, U. Helmke, and J. B. Moore, "A finite steps algorithm for solving convex feasibility problems," Journal of Global Optimization, vol. 38, no. 1, pp. 143–160, 2007. doi:10.1007/s10898-006-9088-y. MR2316406. Zbl 1180.90239.

H. H. Bauschke and J. M. Borwein, "On projection algorithms for solving convex feasibility problems," SIAM Review, vol. 38, no. 3, pp. 367–426, 1996. doi:10.1137/S0036144593251710. MR1409591. Zbl 0865.47039.

L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Review, vol. 38, no. 1, pp. 49–95, 1996. doi:10.1137/1038003. MR1379041. Zbl 0845.65023.

S. Fanelli, "Finite iterative procedures in optimization," in preparation.

C. Di Fiore, S. Fanelli, and P. Zellini, "Computational experiences of a novel global algorithm for optimal learning in MLP-networks," in Proceedings of the International Conference on Neural Information Processing (ICONIP '02), vol. 1, pp. 317–321, 2002.

C. A. Floudas, Deterministic Global Optimization, vol. 37 of Nonconvex Optimization and Its Applications, Kluwer Academic, Dordrecht, The Netherlands, 2000, xviii+739 pp. MR1746644.

C. A. Floudas and V. Visweswaran, "Primal-relaxed dual global optimization approach," Journal of Optimization Theory and Applications, vol. 78, no. 2, pp. 187–225, 1993. doi:10.1007/BF00939667. MR1236801. Zbl 0796.90056.

E. E. Tyrtyshnikov, "Incomplete cross approximation in the mosaic-skeleton method," Computing, vol. 64, no. 4, pp. 367–380, 2000. doi:10.1007/s006070070031. MR1783468. Zbl 0964.65048.

J. J. Moré and Z. Wu, "Global continuation for distance geometry problems," SIAM Journal on Optimization, vol. 7, no. 3, pp. 814–836, 1997. doi:10.1137/S1052623495283024. MR1462067. Zbl 0891.90168.