To solve unconstrained nonlinear minimization problems, we propose an optimal algorithm (OA) and a globally optimal algorithm (GOA), obtained by deflecting the gradient direction to the best descent direction at each iteration step, with an optimal parameter derived explicitly. An invariant manifold, defined for the model problem in terms of a locally quadratic function, is used to derive a purely iterative algorithm, and its convergence is proven. Then, the rank-two updating techniques of BFGS are employed, resulting in several novel algorithms that are faster than the steepest descent method (SDM) and the variable metric method (DFP). Six numerical examples are examined and compared with exact solutions, revealing that the new algorithms of OA, GOA, and their updated variants have superior computational efficiency and accuracy.
1. Introduction
The steepest descent method (SDM), which can be traced back to Cauchy (1847), is the simplest gradient method for solving unconstrained minimization problems. However, the SDM performs well only during the early stages; as it approaches a stationary point it converges very slowly. In this paper, we consider the following unconstrained nonlinear minimization problem:
(1)minf(x),
where f: ℝⁿ → ℝ is a C² differentiable function.
In the iterative solution of (1), if xk is the current iterative point, then we denote f(xk) by fk, ∇f(xk) by gk, and ∇2f(xk) by Ak, which is known to be a symmetric Hessian matrix. The second-order Taylor expansion of function f(x) at the point xk is
(2) f(x) = fk + gkᵀΔx + (1/2)(Δx)ᵀAkΔx,
where Δx=x-xk. The superscript T signifies the transpose and meanwhile gkTΔx signifies the inner product of gk and Δx.
Let x = xk − λ0gk; inserting it into (2), we obtain
(3) f(xk − λ0gk) = fk − λ0gkᵀgk + (λ0²/2)gkᵀAkgk.
By minimizing with respect to λ0, we can derive
(4) λ0 = ‖gk‖² / (gkᵀAkgk).
This yields the classical steepest descent method (SDM) for solving (1).
(i) Give an initial x0, and compute g0 = ∇f(x0).
(ii) For k = 0, 1, 2, …, repeat the following iteration:
(5) xk+1 = xk − [‖gk‖² / (gkᵀAkgk)] gk.
(iii) If ‖gk+1‖ < ε, then stop; otherwise, go to step (ii).
For the minimization problem (1) we need to solve ∇f=0, and hence the residual means the value of ∥∇f∥=∥g∥. The above convergence criterion ∥gk+1∥<ε means that when the residual norm is smaller than a given error tolerance ε, the iterations are terminated.
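As a concrete illustration, the SDM iteration (5) with the steplength (4) can be sketched as follows; the quadratic test problem, the function names, and the tolerance below are illustrative choices, not taken from the paper.

```python
import numpy as np

def sdm(grad, hess, x0, eps=1e-10, max_steps=10000):
    """Steepest descent with the steplength (4): lam = ||g||^2 / (g^T A g)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        g = grad(x)
        if np.linalg.norm(g) < eps:      # stop when the residual ||g|| < eps
            break
        A = hess(x)
        lam = (g @ g) / (g @ A @ g)      # steplength (4)
        x = x - lam * g                  # iteration (5)
    return x

# usage: minimize phi(x) = x^T A x / 2 - b^T x, whose minimizer solves A x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = sdm(lambda x: A @ x - b, lambda x: A, [5.0, -3.0])
```

For a symmetric positive definite quadratic the iterates converge to the solution of Ax = b, which gives an easy correctness check.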
In the derivation of SDM for solving (1) it is easy to see that we have transformed the global minimization problem into a model problem in terms of a quadratic minimization problem of
(6) ϕ(x) = (1/2)xᵀAx − bᵀx + c0
and determined the coefficient λ0 by (4), where c0 = fk − gkᵀxk + (1/2)xkᵀAkxk + k0, with k0 a constant that raises the level value of ϕ, and b = Akxk − gk is a constant vector within each iterative step. For notational simplicity we omit the subscript k in (6); we are going to modify the SDM starting from this locally quadratic function.
Several modifications of the SDM have been proposed. These modifications have led to a renewed interest in the SDM: the gradient vector itself is not a bad choice of direction; rather, the original steplength λ0 leads to slow convergence. Barzilai and Borwein [1] presented a new choice of steplength through a two-point stepsize. Although their method does not guarantee descent of the function values, Barzilai and Borwein [1] produced a substantial improvement of the convergence speed on certain quadratic test functions. Their results have spurred much research on the SDM, for example, Raydan [2, 3], Friedlander et al. [4], Raydan and Svaiter [5], Dai et al. [6], Dai and Liao [7], Dai and Yuan [8], Fletcher [9], and Yuan [10]. In this paper, we approach this problem from the quite different viewpoint of an invariant manifold and propose a new strategy to modify the steplength and the descent direction. Besides the SDM, there have been many modifications of the conjugate gradient method for unconstrained minimization problems, such as Birgin and Martinez [11], Andrei [12–14], Zhang [15], Babaie-Kafaki et al. [16], and Shi and Guo [17].
There is also another class of methods, usually named quasi-Newton methods, in which the descent direction d in f(xk − λd) is taken to be D∇f(xk), where D is a positive definite matrix that approximates the inverse of the Hessian matrix A. An early minimization method of this type was developed by Davidon [18]; it was then simplified and reformulated by Fletcher and Powell [19] and is referred to as the variable metric method (DFP).
The remaining portions of this paper are arranged as follows. In Section 2 we describe an invariant manifold used to derive the governing ordinary differential equations (ODEs). The main results are derived in Section 3, which include the proof of a convergence theorem, the optimal parameter, the optimal algorithm, a critical parameter, and the globally optimal algorithm. Then, in Section 4 we employ the rank-two Broyden-Fletcher-Goldfarb-Shanno (BFGS) updating techniques to update the Hessian matrix or its inverse, resulting in several novel optimal algorithms. Numerical examples are tested in Section 5 to assess the performance of the newly proposed algorithms. Finally, conclusions are drawn in Section 6.
2. An Invariant Manifold
From this section on, we focus on the local minimum problem defined in terms of ϕ in (6), rather than that of f. When the novel algorithms are developed, we will return to the minimization problem (1). The present approach differs from the conventional line search method, which minimizes the steplength λk in
(7)f(xk-λkdk)=minλ>0f(xk-λdk),
where dk is a given search direction.
First, we consider an iterative scheme for x derived from ordinary differential equations (ODEs) defined on an invariant manifold formed from ϕ(x):
(8)h(x,t):=Q(t)ϕ(x)=C.
Here, we let x be a function of a fictitious time variable t. We do not need to specify the function Q(t) a priori; C/Q(t) merely measures the decrease of ϕ in time. Hence, we expect that in our algorithm, if Q(t) > 0 is an increasing function of t, the iterative point xk can tend to the minimal point. We let Q(0) = 1, and C is determined from the initial condition x(0) = x0 by
(9)C=ϕ(x0)>0.
We can suitably choose the constant k0 and hence c0 in (6), such that ϕ(x)>0. Indeed, the different level of ϕ does not alter its minimal point.
When C > 0 and Q > 0, the manifold defined by (8) is continuous, and thus the following differential operations carried out on the manifold make sense. The consistency condition requires
(10) Q̇(t)ϕ(x) + Q(t)(Ax − b)·ẋ = 0,
which is obtained by taking the time differential of (8) with respect to t, using (6), and considering x=x(t). We suppose that x is governed by the following ODEs:
(11) ẋ = −κu,
where κ is to be determined. Inserting (11) into (10) we can solve
(12) κ = q(t)ϕ / (g·u),
where
(13) g := Ax − b,
(14) q(t) := Q̇(t)/Q(t).
We further suppose that
(15) u = u1 + αu2 := g + αBg,
where α is a parameter to be determined below through an optimality equation, and B is a descent matrix to be specified. Here, we assert that the driving vector u is an optimal linear combination of the gradient vector and a supplemental vector, namely the gradient vector multiplied by B.
3. Numerical Methods
3.1. Convergence Theorem
Before the derivation of optimal algorithms we can prove the following convergence result.
Theorem 1.
For an iterative scheme to solve (1) by
(16) x(t+Δt) = x(t) − (1−γ)[g·u / (uᵀAu)]u,
which is generated from the ODEs in (11), the iterative point x on the manifold (8) has the following convergence rate:
(17) Convergence Rate := Q(t+Δt)/Q(t) = 1/s > 1,
where
(18) 0 < s = 1 − (1−γ²)/(2a0) < 1,
(19) a0 := ϕuᵀAu/(g·u)² ≥ 1/2,
and 0≤γ<1 is a relaxation parameter.
Proof.
The proof of this theorem is quite lengthy and we divide it into three parts.
(A) Inserting (12) into (11) we can obtain an evolution equation for x:
(20) ẋ = −[q(t)ϕ/(g·u)]u.
In the algorithm, if Q(t) can be guaranteed to be an increasing function of t, we may obtain an absolutely convergent scheme for finding the minimum of ϕ through the following equation:
(21) ϕ(t) = C/Q(t),
which is obtained from (8). Here we simplify the notation of ϕ(x(t)) to ϕ(t).
(B) By applying the Euler method to (20) we can obtain the following algorithm:
(22) x(t+Δt) = x(t) − [βϕ/(g·u)]u,
where
(23)β=q(t)Δt.
In order to keep x on the manifold defined by (21) we can insert the above x(t+Δt) into
(24) (1/2)xᵀ(t+Δt)Ax(t+Δt) − bᵀx(t+Δt) = C/Q(t+Δt) − c0,
and obtain
(25) C/Q(t+Δt) − c0 = (1/2)xᵀ(t)Ax(t) − bᵀx(t) + βϕ[b − Ax(t)]ᵀu/(g·u) + β²ϕ²uᵀAu/(2(g·u)²).
Thus, by (13), (21), and (6) and through some manipulations we can derive the following scalar equation:
(26) (1/2)a0β² − β + 1 = Q(t)/Q(t+Δt),
where
(27) a0 := ϕuᵀAu/(g·u)² ≥ 1/2,
of which the inequality can be achieved by taking a suitable value of k0 and hence c0 in (6).
(C) Let
(28) s := Q(t)/Q(t+Δt),
and by (26) we can derive
(29) (1/2)a0β² − β + 1 − s = 0.
From (29), we can take the solution of β to be
(30) β = [1 − √(1 − 2(1−s)a0)]/a0, if 1 − 2(1−s)a0 ≥ 0.
Let
(31) 1 − 2(1−s)a0 = γ² ≥ 0,
(32) s = 1 − (1−γ²)/(2a0),
and the sufficient condition 1-2(1-s)a0≥0 in (30) is satisfied automatically, and thus by (30) and (31) we can obtain a preferred solution of β by
(33) β = (1−γ)/a0.
Here 0≤γ<1 is a relaxation parameter. The inequality 0<s<1 follows from (32), 1-γ2>0, and 2a0≥1. Inserting the above β into (22) and using (19) we can derive algorithm (16). For this algorithm we can define the local convergence rate by
(34) Convergence Rate := ϕ(t)/ϕ(t+Δt),
which, using (28) and (21) and 0<s<1 just proved, renders (17). This ends the proof of Theorem 1.
3.2. Optimization of α
In algorithm (22) we have not yet specified how to choose the parameter α. We can determine a suitable value of α such that s defined in (32) is minimized with respect to α, because a smaller s leads to faster convergence, as shown by (17).
Thus by inserting (19) for a0 into (32) we can write s to be
(35) s = 1 − (1−γ²)(g·u)²/(2ϕu·(Au)),
where u, as defined by (15), includes the parameter α. Setting ∂s/∂α = 0, through some algebraic operations we can solve for α, denoted by
(36) αo = [g·u1 u1·(Au2) − g·u2 u1·(Au1)] / [g·u2 u1·(Au2) − g·u1 u2·(Au2)],
where the subscript o signifies that αo is the optimal value of α.
Remark 2.
For usual three-dimensional vectors a, b, c ∈ ℝ³, the following formula is well known:
(37)a×(b×c)=(a·c)b-(a·b)c.
Liu [20] has developed a Jordan algebra by extending the above formula to vectors in n-dimension:
(38) [a, b, c] = (a·b)c − (c·b)a, a, b, c ∈ ℝⁿ.
In terms of the Jordan algebra we can write
(39) αo = [u1, g, u2]·(Au1) / ([u2, g, u1]·(Au2)),
where the symmetry of A was used. It can be seen that the above equation is a more symmetric form than that in (36).
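As a quick numeric sanity check (an illustrative sketch, not from the paper), the bracket (38) can be implemented directly, and the two expressions (36) and (39) for αo can be compared on sample data; the matrix A, vector g, and the choice B = A below are assumptions for the demonstration.

```python
import numpy as np

def bracket(a, b, c):
    """Jordan-algebra triple product (38): [a, b, c] = (a.b) c - (c.b) a."""
    return (a @ b) * c - (c @ b) * a

# illustrative data: a symmetric A and vectors u1 = g, u2 = B g with B = A
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
g = np.array([1.0, -2.0, 0.5])
u1, u2 = g, A @ g

# alpha_o via the componentwise formula (36)
num = (g @ u1) * (u1 @ A @ u2) - (g @ u2) * (u1 @ A @ u1)
den = (g @ u2) * (u1 @ A @ u2) - (g @ u1) * (u2 @ A @ u2)
alpha_36 = num / den

# alpha_o via the Jordan-algebra form (39); A symmetric makes the two agree
alpha_39 = (bracket(u1, g, u2) @ (A @ u1)) / (bracket(u2, g, u1) @ (A @ u2))
```

Expanding the brackets and using u1·(Au2) = u2·(Au1) for symmetric A recovers (36) term by term, so the two values coincide up to rounding.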
3.3. An Optimal Algorithm
Now let xk denote the numerical value of x at the kth step, and replace g by gk, u by uk, A by Ak, and B by Bk. Thus, by using (16) we can derive an iterative algorithm:
(40) xk+1 = xk − η[gk·uk/(ukᵀAkuk)]uk,
where
(41)η=1-γ.
Therefore, we have the following optimal algorithm (OA).
(i) Select 0 ≤ γ < 1, and give an initial x0.
(ii) For k = 0, 1, 2, …, repeat the following iterations:
(42) αk = [gk, gk, Bkgk]·(Akgk) / ([Bkgk, gk, gk]·(AkBkgk)),
uk = gk + αkBkgk,
xk+1 = xk − (1−γ)[gk·uk/(ukᵀAkuk)]uk.
(iii) If ‖gk+1‖ < ε, then stop; otherwise, go to step (ii).
Again we emphasize that we need to solve ∇f=0 for the minimization problem (1), and hence the residual means the value of ∥∇f∥=∥g∥. The above convergence criterion ∥gk+1∥<ε means that when the residual norm is smaller than a given error tolerance ε, the iterations are terminated.
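The OA loop above can be sketched as follows. This is a minimal illustration assuming B = A (the choice used in Section 4 for the plain OA) on a quadratic test problem; the problem data and the fallback guard when the denominator of (36) vanishes are assumptions of this sketch, not part of the paper's algorithm.

```python
import numpy as np

def oa(grad, hess, x0, gamma=0.0, eps=1e-10, max_steps=10000):
    """Optimal algorithm (42) with the descent matrix B fixed to the Hessian A."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        g = grad(x)
        if np.linalg.norm(g) < eps:          # criterion ||g_{k+1}|| < eps
            break
        A = hess(x)
        u1, u2 = g, A @ g                    # B = A, so u2 = B g = A g
        num = (g @ u1) * (u1 @ A @ u2) - (g @ u2) * (u1 @ A @ u1)
        den = (g @ u2) * (u1 @ A @ u2) - (g @ u1) * (u2 @ A @ u2)
        alpha = num / den if abs(den) > 1e-30 else 0.0   # optimal parameter (36)
        u = u1 + alpha * u2                  # descent vector (15)
        x = x - (1.0 - gamma) * (g @ u) / (u @ A @ u) * u  # step (40)
    return x

# usage on a small SPD quadratic phi(x) = x^T A x / 2 - b^T x
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_star = oa(lambda x: A @ x - b, lambda x: A, [5.0, -3.0, 2.0])
```

For a quadratic, each step performs an exact line minimization along u, so the objective decreases at least as fast as in the SDM and the iterates reach the solution of Ax = b.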
3.4. A Critical Value for α
In Sections 3.2 and 3.3 we have used ∂s/∂α=0 (or equivalently, ∂a0/∂α=0) to find the optimal value of α in the descent vector u=g+αBg. Usually, this value of α obtained from ∂s/∂α=0 is not the global minimum of a0 (or s). Here, we try another approach and attempt to derive a better value of α than αo, such that the value of α obtained in this manner is the global minimum of a0 (or s).
In practice, we can take
(43) a0 := ϕuᵀAu/(g·u)² = as.
When as is near 1/2, the convergence speed is very fast. Inserting (15) for u into the above equation, through some elementary operations we can derive a quadratic equation for α:
(44)e1α2+e2α+e3=0,
where
(45) e1 := ϕu2ᵀAu2 − as(g·u2)²,
(46) e2 := 2ϕu1ᵀAu2 − 2as(g·u1)(g·u2),
(47) e3 := ϕu1ᵀAu1 − as(g·u1)².
If the following condition is satisfied:
(48) D := e2² − 4e1e3 ≥ 0,
then α in (44) has a real solution:
(49) α = (√D − e2)/(2e1).
Inserting (45)–(47) into the critical equation:
(50) D = e2² − 4e1e3 = 0,
we can derive an algebraic equation that determines the lowest admissible value of as in (48). This lower bound is a critical value of as, denoted by ac; for all as ≥ ac, condition (48) is satisfied automatically. From (50), through some elementary operations, the critical value ac can be solved as
(51) ac = ϕ[u1ᵀAu1u2ᵀAu2 − (u1ᵀAu2)²] / [u1ᵀAu1(g·u2)² + u2ᵀAu2(g·u1)² − 2u1ᵀAu2(g·u1)(g·u2)].
Then, inserting it for as into (45) and (46), we obtain a critical value αc for α from (49):
(52) αc = [ac(g·u1)(g·u2) − ϕu1ᵀAu2] / [ϕu2ᵀAu2 − ac(g·u2)²],
where D=0 was used in view of (50).
By inserting (51) for ac into (52) and cancelling the common factor ϕ, we can derive final ϕ-free expressions for αc, for which we reuse the same symbols to save notation:
(53) ac = [u1ᵀAu1u2ᵀAu2 − (u1ᵀAu2)²] / [u1ᵀAu1(g·u2)² + u2ᵀAu2(g·u1)² − 2u1ᵀAu2(g·u1)(g·u2)],
(54) αc = [ac(g·u1)(g·u2) − u1ᵀAu2] / [u2ᵀAu2 − ac(g·u2)²].
Here we must emphasize that, in the current descent vector u=g+αBg, the above value αc is the best one, and the vector
(55) u = g + αcBg  (best descent vector)
is the best descent vector. Due to its criticality, if one attempts to find a better value of the parameter α than αc, there would be no real solution of α. Furthermore, the best descent vector is also better than the optimal vector u=g+αoBg derived in Section 3.2.
3.5. A Globally Optimal Algorithm
Then, we can derive the following globally optimal algorithm (GOA) to solve the minimization problem in (1).
(i) Select 0 ≤ γ < 1, and give an initial guess x0.
(ii) For k = 0, 1, 2, …, repeat the following iterations:
(56) u1k = gk, u2k = Bkgk,
ack = [u1k·Aku1k u2k·Aku2k − (u1k·Aku2k)²] / [u1k·Aku1k(gk·u2k)² + u2k·Aku2k(gk·u1k)² − 2u1k·Aku2k(gk·u1k)(gk·u2k)],
αk = [ack(gk·u1k)(gk·u2k) − u1k·Aku2k] / [u2k·Aku2k − ack(gk·u2k)²]  (critical optimal parameter),
uk = gk + αkBkgk  (best descent vector),
(57) xk+1 = xk − (1−γ)[gk·uk/(ukᵀAkuk)]uk.
(iii) If ‖gk+1‖ < ε, then stop; otherwise, go to step (ii).
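In the same spirit as the OA sketch, the critical parameter and critical αk of (56) can be sketched as a small helper inside a GOA loop. This is a minimal illustration assuming B = A; the quadratic test problem and the fallback to the pure gradient direction when a denominator degenerates are assumptions of this sketch.

```python
import numpy as np

def goa_step_alpha(A, g, B):
    """Critical parameter a_c and alpha_c of (56), with u1 = g and u2 = B g."""
    u1, u2 = g, B @ g
    gu1, gu2 = g @ u1, g @ u2
    a11, a12, a22 = u1 @ A @ u1, u1 @ A @ u2, u2 @ A @ u2
    ac = (a11 * a22 - a12 ** 2) / (a11 * gu2 ** 2 + a22 * gu1 ** 2
                                   - 2.0 * a12 * gu1 * gu2)
    alpha_c = (ac * gu1 * gu2 - a12) / (a22 - ac * gu2 ** 2)
    return ac, alpha_c

def goa(grad, hess, x0, gamma=0.0, eps=1e-10, max_steps=10000):
    """Globally optimal algorithm (56)-(57) with B fixed to the Hessian A."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        g = grad(x)
        if np.linalg.norm(g) < eps:
            break
        A = hess(x)
        _, alpha = goa_step_alpha(A, g, A)       # B = A here
        if not np.isfinite(alpha):               # degenerate case: fall back to g
            alpha = 0.0
        u = g + alpha * (A @ g)                  # best descent vector (55)
        x = x - (1.0 - gamma) * (g @ u) / (u @ A @ u) * u   # step (57)
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_star = goa(lambda x: A @ x - b, lambda x: A, [5.0, -3.0, 2.0])
```

On an exact quadratic the critical αc coincides with the optimal αo of (36), consistent with the paper's observation in Example 4 that the OA and GOA give the same results there.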
Remark 3.
We have derived a novel globally optimal algorithm for solving the minimization problem in (1). In terms of the descent vector u = g + αcBg, the GOA is the best one, since it leads to the global minimum of a0 (or s) and hence the largest convergence rate. While the parameter γ is chosen by the user in a problem-dependent manner, the parameter αk is given exactly by (56). We have thus derived a novel best-descent-vector algorithm with the help of (19), (53), and (54).
Remark 4.
At the very beginning we set up the invariant manifold h(x, t) = Q(t)ϕ(x) in (8) as our starting point to derive the iterative optimal algorithms, which involves the local objective function ϕ in the governing equations. However, in the final stage the terms that involve ϕ cancel out, and thus the optimal algorithm in (42) and the globally optimal algorithm in (57) are both independent of ϕ.
4. The Broyden-Fletcher-Goldfarb-Shanno Updating Techniques
In the above we derived two optimal algorithms, leaving the descent matrix B to be specified by the user. Fixing B to be the exact Hessian matrix A yields the two optimal algorithms OA and GOA.
We can also apply the technique of BFGS by updating the Hessian matrix A and its inverse matrix B. To derive this updating technique let us mention the Newton iterative scheme to solve (1):
(58) xk+1 = xk − λk[∇²f(xk)]⁻¹gk,
where λk is the optimal steplength along the Newton direction [∇²f(xk)]⁻¹gk at the kth step. In order to construct a matrix Bk that approximates [∇²f(xk)]⁻¹, we can analyze the relation between [∇²f(xk)]⁻¹ and the first-order derivative gk. We take the Taylor expansion of f(x) at the point xk+1, obtaining
(59) f(x) ≈ f(xk+1) + ∇f(xk+1)ᵀ(x − xk+1) + (1/2)(x − xk+1)ᵀ∇²f(xk+1)(x − xk+1).
Then we have
(60) ∇f(xk) ≈ ∇f(xk+1) + ∇²f(xk+1)(xk − xk+1).
Let
(61) pk = xk+1 − xk,
(62) qk = ∇f(xk+1) − ∇f(xk),
and from (60) we have
(63) qk ≈ ∇²f(xk+1)pk.
Assume that the Hessian matrix ∇2f(xk+1) is invertible and denote the inverse matrix by Bk+1. Then from the above equation we have the so-called quasi-Newton condition:
(64)pk=Bk+1qk.
If we take the descent matrix B to be the inverse of the Hessian matrix A, we may accelerate the convergence; however, computing the inverse is a difficult task when the dimension of the Hessian matrix is large. As is done in the quasi-Newton method, we can employ the following BFGS updating technique:
(65) Bk+1 = Bk + [1 + qkᵀBkqk/(qk·pk)](pkpkᵀ)/(qk·pk) − (pkqkᵀBk + Bkqkpkᵀ)/(qk·pk).
The advantage of this updating technique is that we obtain an approximation of the inverse of the Hessian matrix without actually computing A⁻¹.
By the same token we can also apply the technique of BFGS to update the Hessian matrix without needing the computation of the real Hessian matrix:
(66) Ak+1 = Ak + qkqkᵀ/(qk·pk) − AkpkpkᵀAk/(pk·(Akpk)),
where at the first step we can take A0=In. The advantage of this approach is that we do not need to calculate the Hessian matrix exactly, as developed independently by Broyden [21], Fletcher [22], Goldfarb [23], and Shanno [24]. It is easy to check that the updating technique in (66) satisfies the quasi-Newton condition:
(67)Ak+1pk=qk,
which is the inverse relation of (64).
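Both rank-two updates can be written in a few lines; the sketch below (with illustrative variable names and data) verifies that (65) and (66) satisfy their quasi-Newton conditions (64) and (67) for a pair (pk, qk) with qk·pk > 0.

```python
import numpy as np

def bfgs_inverse_update(B, p, q):
    """BFGS update (65) for the inverse-Hessian approximation B."""
    qp = q @ p
    Bq = B @ q
    return (B + (1.0 + q @ Bq / qp) * np.outer(p, p) / qp
              - (np.outer(p, Bq) + np.outer(Bq, p)) / qp)

def bfgs_hessian_update(A, p, q):
    """BFGS update (66) for the Hessian approximation A."""
    Ap = A @ p
    return A + np.outer(q, q) / (q @ p) - np.outer(Ap, Ap) / (p @ Ap)

# illustrative data satisfying the curvature condition q . p > 0
p = np.array([1.0, -0.5, 2.0])
q = np.array([0.5, 0.25, 1.5])
B1 = bfgs_inverse_update(np.eye(3), p, q)   # should satisfy B1 q = p, i.e. (64)
A1 = bfgs_hessian_update(np.eye(3), p, q)   # should satisfy A1 p = q, i.e. (67)
```

Since B is symmetric, `np.outer(p, B @ q)` equals pqᵀB, so the code matches (65) term by term.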
The present notations of A and B are the same as those B and H used by Dai [25, 26].
The OA with Bk = Ak, where Ak is generated from (66), will be named the OABFGS; two numerical examples will be given to test its performance.
When the matrix A is given exactly by the Hessian matrix in the OA and GOA and Bk is computed by the updating technique (65), the resulting iterative algorithms are named OABFGS1 and GOABFGS1, respectively. Finally, if in the OA and GOA both A and B are updated by (66) and (65), the resulting algorithms are named OABFGS2 and GOABFGS2, respectively.
In order to show the high performance of the optimal algorithms OABFGS1, GOABFGS1, and OABFGS2, we will also apply the DFP method [18, 19] to solve nonlinear minimization problems, of which the updating technique is
(68) Bk+1 = Bk + pkpkᵀ/(qk·pk) − BkqkqkᵀBk/(qk·(Bkqk)).
For each iterative step we search for the minimizer of f(xk − λBk∇f(xk)), finding λ by solving the optimality equation ∇f(xk − λBk∇f(xk))·(Bk∇f(xk)) = 0 with the half-interval method. Gill et al. [27], among several others, have shown that the modification in (65) performs more efficiently than that in (68) for most problems.
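For comparison, the DFP update (68) also satisfies the quasi-Newton condition pk = Bk+1qk; a minimal sketch with illustrative data is:

```python
import numpy as np

def dfp_update(B, p, q):
    """DFP update (68) for the inverse-Hessian approximation B."""
    Bq = B @ q
    return B + np.outer(p, p) / (q @ p) - np.outer(Bq, Bq) / (q @ Bq)

# illustrative data with q . p > 0
p = np.array([1.0, -0.5, 2.0])
q = np.array([0.5, 0.25, 1.5])
B1 = dfp_update(np.eye(3), p, q)   # should satisfy B1 q = p
```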
5. Numerical Examples
In order to evaluate the performance of the newly developed methods, we investigate the following examples. Some results are compared with those obtained from the steepest descent method (SDM) and the DFP method [18, 19]. The above algorithms involve a relaxation parameter γ, which is problem dependent; a good value of γ can be selected easily by comparing the convergence speeds for different values of γ.
Example 1.
As the first testing example of OA and GOA we use the following function given by Rosenbrock [28]:
(69) min{f = 100(x2 − x1²)² + (1 − x1)²}.
In mathematical optimization, the Rosenbrock function is a nonconvex function used as a performance benchmark for optimization algorithms. It is also known as Rosenbrock's valley or Rosenbrock's banana function. The minimum is zero, occurring at (x1, x2) = (1, 1). This function is difficult to minimize because it has a steep-sided valley following the parabolic curve x1² = x2. Kuo et al. [29] used the particle swarm method to solve this problem; however, the numerical procedures are rather complex. Liu and Atluri [30] applied a fictitious time integration method to solve the above problem under the constraints x1 ≥ 0 and x2 ≥ 0, with accuracy reaching the fifth order.
We apply the OA to this problem starting from the initial point (3, 2) and under the convergence criterion ε = 10⁻¹⁰. The SDM runs 3749 steps, as shown in Figures 1(a) and 1(b) by solid lines for the residual and the objective function f. The SDM reaches a very accurate minimum value of f of 1.22×10⁻²⁰. The OA with γ = 0 converges very fast, in only 6 steps, with the residual and the objective function f shown in Figures 1(a) and 1(b) by dashed lines. The OA is about 625 times faster than the SDM, and furthermore the minimum value of f is reduced to 1.925×10⁻²⁵. In Figure 1(c) we compare the solution paths generated by the SDM and OA. While the SDM moves very slowly along the valley, the OA moves outside the valley region and converges very fast to the solution. In Figure 1(c) the red dashed line represents the valley of the Rosenbrock function; it does not indicate the result of a numerical method. As shown in Figure 1(b), the GOA is slightly better than the OA, although the paths of the OA and GOA appear identical in Figure 1(c).
The SDM usually works very well during early stages; however, as a stationary point is approached, it behaves poorly, taking small, nearly orthogonal steps. In contrast, with the help of the optimal direction g + αoAg, the OA can quickly reach the final point with high accuracy. For this example the GOA spends the same number of steps as the OA; however, the GOA gives a very accurate minimum value of f = 1.26×10⁻²⁹.
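For reference, the Rosenbrock function (69), its gradient, and its Hessian (the ingredients the gradient-based methods above require) can be written out explicitly; this is a generic sketch, not the paper's implementation.

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function (69): f = 100 (x2 - x1^2)^2 + (1 - x1)^2."""
    x1, x2 = x
    return 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2

def rosenbrock_grad(x):
    """Gradient of (69); it vanishes at the minimizer (1, 1)."""
    x1, x2 = x
    return np.array([-400.0 * x1 * (x2 - x1 ** 2) - 2.0 * (1.0 - x1),
                     200.0 * (x2 - x1 ** 2)])

def rosenbrock_hess(x):
    """Hessian of (69); positive definite at (1, 1)."""
    x1, x2 = x
    return np.array([[1200.0 * x1 ** 2 - 400.0 * x2 + 2.0, -400.0 * x1],
                     [-400.0 * x1, 200.0]])
```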
Figure 1: For the Rosenbrock problem, (a) the residuals, (b) the objective function f, and (c) the solution paths of the SDM, OA, and GOA.
Example 2.
Next, we consider a generalization of the Rosenbrock function [31, 32]:
(70) min{f = Σ_{k=1}^{n−1}[100(xk+1 − xk²)² + (1 − xk)²]}.
This variant has been shown to have exactly one minimum for n = 3, at (x1, x2, x3) = (1, 1, 1), and exactly two minima for 4 ≤ n ≤ 7. The global minimum occurs at (x1, x2, …, xn) = (1, 1, …, 1), and a local minimum lies near (x1, x2, …, xn) = (−1, 1, …, 1). This result is obtained by setting the gradient of the function equal to zero and noticing that the resulting equation is a rational function of xi. For small n the polynomials can be determined exactly, and Sturm's theorem can be used to determine the number of real roots, while the roots can be bounded in the region |xi| < 2.4 [32]. For larger n this method breaks down due to the size of the coefficients involved.
We apply the OA to this problem with n = 30, starting from xi = 0.1i, and under the convergence criterion ε = 10⁻⁵. Under this stopping criterion, the SDM runs over 13830 steps, as shown in Figure 2(a) by solid lines for the residual and f. The SDM reaches a very accurate minimum value of f of 1.357×10⁻¹⁰. The OA with γ = 0.2 converges in 9956 steps, with the residual and f shown in Figure 2 by dashed lines. The OA is faster than the SDM, and similarly the minimum of f is reduced to 1.96×10⁻¹¹.
At the same time we show the values of the optimal α for the OA in Figure 3(a). The value of α is much smaller than 1, which means that the term g plays the dominant role in the descent direction; however, the influence of αAg, although small, is the key to speeding up the convergence.
Then we apply the GOA to this problem under the same conditions as the OA. The GOA with γ = 0.1 converges in 9846 steps, with the residual and f shown in Figure 2 by dashed-dotted lines. The GOA is faster than the SDM, and the minimum value of f is further reduced to 9.59×10⁻¹². We show the values of the optimal α for the GOA in Figure 3(b). This problem is quite difficult, and many other algorithms fail to solve it.
Figure 2: For the generalized Rosenbrock problem, (a) the residuals and (b) the objective functions obtained by the SDM, OA, and GOA.
Figure 3: In the solution of the generalized Rosenbrock problem by the OA and GOA, the variation of the optimal values of α for (a) the OA and (b) the GOA.
Example 3.
We consider a problem due to Powell [33]:
(71) min{f = (x1 + 10x2)² + 5(x3 − x4)² + (x2 − 2x3)⁴ + 10(x1 − x4)⁴}.
The minimum of f is zero, occurring at (x1, x2, x3, x4) = (0, 0, 0, 0). We apply the OA to this problem, starting from (x1, x2, x3, x4) = (3, −1, 0, 1) and under the convergence criterion ε = 10⁻⁶. The SDM converges very slowly, taking over 50,000 steps, as shown in Figure 4 by solid lines for a0, s, and the residual. The SDM reaches a very accurate value of f = 9.938×10⁻⁹. In contrast, the OA with γ = 0.15 converges in 349 steps, with a0, s, and the residual shown in Figure 4 by dashed lines. The OA is over 130 times faster than the SDM, and furthermore we obtain a more accurate f = 8.33×10⁻¹⁰.
As shown in Figure 4(c), the residual obtained by the OABFGS rapidly grows to the order 10²⁶ and then quickly decays to the order 10⁻⁶; after 1043 steps the OABFGS leads to a better solution, with f = 1.285×10⁻⁹, than that obtained by the SDM. The combination of the optimal algorithm with the BFGS updating technique is very useful when the exact Hessian matrix is unavailable or cumbersome to compute.
Then we apply the GOA of Section 3.5 to the Powell problem. Under the same convergence criterion ε = 10⁻⁶, the GOA with γ = 0.001 converges in only 96 steps, as shown in Figure 4(c); the GOA is 500 times faster than the SDM and 3.5 times faster than the OA, and furthermore we obtain a very accurate minimum value f = 1.55×10⁻⁹. In Figure 5 we compare the values of α obtained by the OA and the GOA, from which we can observe that both values of α are quite small and may be negative.
Then we apply the OABFGS1 and the GOABFGS1 of Section 4 to the Powell problem. Under the same convergence criterion ε = 10⁻⁶, the OABFGS1 with γ = 0.1 converges in only 29 steps, as shown in Figure 6(a), reaching a very accurate value of f = 7.57×10⁻¹². Similarly, the GOABFGS1 with γ = 0.1 converges in 30 steps, as shown in Figure 6(a), reaching a very accurate value of f = 5.28×10⁻¹⁰. In Figure 6(b) we compare the values of α obtained by the OABFGS1 and GOABFGS1, from which we can observe that both values of α are quite large, which means that the quasi-Newton direction is the dominant factor accelerating the convergence.
Then we apply the OABFGS2 to the Powell problem with γ = 0; it converges in 116 steps with a very accurate value of f = 2.77×10⁻¹¹. In Figures 6(c) and 6(d) we show the residual and the value of α obtained by the OABFGS2.
In order to reveal the high performance of the optimal algorithms OABFGS1, GOABFGS1, and OABFGS2, we apply the DFP method [18, 19] to the Powell problem; it converges in 131 steps, as shown in Figure 6(a). For each iteration step we search for the minimizer of f(xk − λBk∇f(xk)), finding λ by solving the optimality equation ∇f(xk − λBk∇f(xk))·(Bk∇f(xk)) = 0 with the half-interval method, with the two end points fixed at λ = 0 and λ = 1 and the convergence criterion ε = 10⁻¹⁰. At the end of the iterations we obtain the minimum value f = 9.79×10⁻¹², which is very accurate.
Figure 4: For the Powell problem, comparing (a) a0 and (b) s of the SDM and OA and (c) the residuals obtained by the SDM, OA, GOA, and OABFGS.
Figure 5: For the Powell problem, comparing α obtained by the OA and GOA.
Figure 6: For the Powell problem, comparing (a) the residuals and (b) α obtained by the OABFGS1 and GOABFGS1 with the DFP, and showing (c) the residual and (d) α obtained by the OABFGS2.
Example 4.
In this example we design an office block inside a structure with a curved roof given by x = 100 − y². Suppose that the total number of cuboids is n and each cuboid can have a different size; we attempt to find the dimensions of all cuboids with maximum volume that fit inside the given roof structure; that is,
(72) max{f = y1[100 − y1²] + y2[100 − (y1 + y2)²] + … + yn[100 − (y1 + … + yn)²]},
where yi>0 is the height of the ith cuboid.
The maximum of f tends to 2000/3 as n increases. For n = 95, we apply the GOA to this problem starting from yi = 0.05 and under the convergence criterion ε = 10⁻⁵. The SDM converges in 6868 steps, as shown in Figure 7 by solid lines for a0, s, the residual, and f. The GOA with γ = 0.35 converges in 96 steps, with a0, s, the residual, and f shown in Figure 7 by dashed lines. Both values of f tend to 661.9945. The GOA is 80 times faster than the SDM. The heights and widths of the cuboids with respect to the number of floors are plotted in Figure 8. For this problem both the OA and GOA lead to the same numerical results.
Moreover, we also apply the OABFGS and the OABFGS1, as well as the DFP, to this problem under the same initial condition and convergence criterion, where for the DFP we use the half-interval method to solve the local minimization for the best λ, with the two end points fixed at λ = 0 and λ = 1 and the convergence criterion ε = 10⁻¹⁰. The values of γ used in the OABFGS and OABFGS1 are, respectively, 0.3 and 0.05. The residuals of these three methods are compared in Figure 9(c), from which we observe that the OABFGS converges in 35 steps, the OABFGS1 in 32 steps, and the DFP in 46 steps. They are all better than the above OA and GOA algorithms. The value of α for the OABFGS, shown in Figure 9(a), is quite small, indicating that the descent direction is dominated by the gradient direction. The value of α for the OABFGS1, shown in Figure 9(b), is quite large, indicating that the descent direction is dominated by the quasi-Newton direction.
Figure 7: For the maximum area under a given curve, comparing (a) a0, (b) s, (c) the residual error, and (d) f obtained by the SDM and GOA.
Figure 8: The heights and widths of the floors with respect to the number of floors.
Figure 9: For Example 4, showing (a) α of the OABFGS and (b) α of the OABFGS1, and (c) comparing the residuals obtained by the DFP, OABFGS, and OABFGS1.
Example 5.
In this example we test the minimization of the Schwefel function with n=100:
(73) min{f = Σ_{i=1}^{n}(Σ_{j=1}^{i} xj)²}.
The minimum is zero at xj=0,j=1,…,n.
We apply the OA to this problem starting from xi = 1 and under the convergence criterion ε = 10⁻⁴. The SDM does not converge within 10000 steps, as shown in Figure 10 by solid lines for a0, s, the residual, and f. The OA with γ = 0.1 converges in 276 steps, with a0, s, the residual, and f shown in Figure 10 by dashed lines. The OA is over 100 times faster than the SDM, and f tends to 3.99×10⁻¹⁰, which is more accurate than the value 1.733×10⁻⁴ obtained by the SDM. When we apply the GOA with γ = 0.05 to this problem, it converges in 299 steps, over 100 times faster than the SDM, and f tends to 7.38×10⁻¹⁰, which is much more accurate than the SDM result.
Figure 10: For the minimization of the Schwefel function, comparing (a) a0, (b) s, (c) the residual error, and (d) f obtained by the SDM and OA.
Example 6.
In this example we test the minimization of the Whitley function with n=8:
(74) min{f = Σ_{i=1}^{n}Σ_{j=1}^{n}[yji²/4000 − cos yji + 1]},
where yji = 100(xi − xj²)² + (xj − 1)². The minimum is zero at xj = 1, j = 1, …, n.
It is very difficult to minimize the Whitley function. The SDM diverges, as shown in Figure 11 by solid lines. We apply the OA to this problem, starting from xi = 1.12 and under the convergence criterion ε = 10⁻⁸. The OA with γ = 0.06 converges in 24 steps, with the residual and f shown in Figure 11 by dashed lines and f tending to 1.47×10⁻¹³.
Figure 11: For the minimization of the Whitley function, comparing (a) the residuals and (b) f obtained by the SDM and OA, where the SDM fails.
6. Conclusions
By formulating the minimization problem on a continuous manifold, we derived a governing system of ODEs from which iterative algorithms were obtained and proven to converge to the minimum point x of a given nonlinear objective function. The novel algorithm is named "an optimal algorithm (OA)," because in the local frame we derived the optimal parameter α in the descent direction, which is a linear combination of the gradient vector and a supplemental vector. We then demonstrated a critical descent vector to derive a globally optimal algorithm (GOA), which can substantially accelerate the convergence speed in the numerical solution of nonlinear minimization problems. It was proven that the critical value αc in the critical descent vector leads to the largest convergence rate among all descent vectors of the form u = g + αBg; due to its criticality, if one attempts to find a better descent vector than u = g + αcBg, there is no real descent vector u. Through several numerical tests we found that both the OA and the GOA performed very well. We then proposed novel algorithms based on the OA and GOA by replacing the Hessian matrix A and the descent matrix B with the BFGS updating techniques for one or both of these two matrices. Two numerical examples were given to test the performance of these algorithms, which are faster than the original OA and GOA algorithms, owing to the enhancement from the quasi-Newton conditions on the updated matrices.
Conflict of Interests
This paper is purely academic research, and the author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The author highly appreciates the constructive comments from the anonymous referees, which improved the quality of this paper. Also highly appreciated are Project NSC-102-2221-E-002-125-MY3, the 2011 Outstanding Research Award from the National Science Council of Taiwan, and the 2011 Taiwan Research Front Award from Thomson Reuters. The author has held the position of Lifetime Distinguished Professor of National Taiwan University since 2013.