A modified spectral PRP conjugate gradient method is presented
for solving unconstrained optimization problems. The constructed search direction
is proved to be a sufficient descent direction of the objective function. With an
Armijo-type line search to determine the step length, a new spectral PRP conjugate
gradient algorithm is developed. Under some mild conditions, global convergence is established. Numerical results demonstrate that this algorithm is promising,
particularly in comparison with existing similar methods.
1. Introduction
The conjugate gradient method has proven efficient and powerful in solving large-scale unconstrained minimization problems, owing to its low memory requirements and simple computation. For example, many variants of conjugate gradient algorithms have been developed in [1–17]. However, as pointed out in [2], there remain many theoretical and computational challenges in applying these methods to unconstrained optimization problems. Indeed, 14 open problems on conjugate gradient methods are presented in [2]. These problems concern the selection of the initial direction, the computation of the step length and of the conjugacy parameter from values of the objective function, the influence of the accuracy of the line search procedure on the efficiency of the algorithm, and so forth.
The general model of an unconstrained optimization problem is
\[
\min f(x), \quad x \in \mathbb{R}^n, \tag{1.1}
\]
where f: R^n → R is continuously differentiable such that its gradient is available. Let g(x) denote the gradient of f at x, and let x_0 be an arbitrary initial approximate solution of (1.1). Then, when a standard conjugate gradient method is used to solve (1.1), a sequence of iterates is generated by
\[
x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, \ldots, \tag{1.2}
\]
where α_k is the step length chosen by some line search method and d_k is the search direction defined by
\[
d_k =
\begin{cases}
-g_k & \text{if } k = 0,\\
-g_k + \beta_k d_{k-1} & \text{if } k > 0,
\end{cases} \tag{1.3}
\]
where β_k is called the conjugacy parameter and g_k denotes the value of g(x_k). For a strictly convex quadratic programming problem, β_k can be chosen appropriately such that d_k and d_{k-1} are conjugate with respect to the Hessian matrix of the objective function. If β_k is taken as
\[
\beta_k = \beta_k^{\mathrm{PRP}} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}, \tag{1.4}
\]
where ‖·‖ stands for the Euclidean norm of a vector, then (1.2)–(1.4) constitute the Polak-Ribière-Polyak (PRP) conjugate gradient method (see [8, 18]).
It is well known that the PRP method has the property of finite termination when the objective function is a strongly convex quadratic function and the exact line search is used. Furthermore, for a twice continuously differentiable strongly convex objective function, global convergence has also been proved in [7]. However, it is nontrivial to establish a global convergence theory under an inexact line search, especially for a general nonconvex minimization problem. Quite recently, many modified PRP conjugate gradient methods have been studied (see, e.g., [10–13, 17]). In these methods, the search direction is constructed to possess the sufficient descent property, and global convergence is established under different line search strategies. In [17], the search direction d_k is given by
\[
d_k =
\begin{cases}
-g_k & \text{if } k = 0,\\
-g_k + \beta_k^{\mathrm{PRP}} d_{k-1} - \theta_k y_{k-1} & \text{if } k > 0,
\end{cases} \tag{1.5}
\]
where
\[
\theta_k = \frac{g_k^T d_{k-1}}{\|g_{k-1}\|^2}, \qquad y_{k-1} = g_k - g_{k-1}, \qquad s_{k-1} = x_k - x_{k-1}. \tag{1.6}
\]
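For concreteness, the following minimal sketch (ours, in Python with NumPy; the helper names are not from the paper) shows how the standard PRP update (1.2)–(1.4) and the modified direction (1.5)–(1.6) of [17] can be computed:

```python
import numpy as np

def beta_prp(g, g_prev):
    # PRP conjugacy parameter (1.4): beta_k = g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2
    return g @ (g - g_prev) / (g_prev @ g_prev)

def direction_prp(g, g_prev, d_prev):
    # Standard PRP search direction (1.3) for k > 0
    return -g + beta_prp(g, g_prev) * d_prev

def direction_mprp17(g, g_prev, d_prev):
    # Modified direction (1.5)-(1.6) of [17] for k > 0:
    #   d_k = -g_k + beta_k^PRP d_{k-1} - theta_k y_{k-1}
    y = g - g_prev                              # y_{k-1}
    theta = (g @ d_prev) / (g_prev @ g_prev)    # theta_k in (1.6)
    return -g + beta_prp(g, g_prev) * d_prev - theta * y
```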
Following the idea of [17], a new spectral PRP conjugate gradient algorithm is developed in this paper. On the one hand, we present a new spectral conjugate gradient direction, which also possesses the sufficient descent property. On the other hand, a modified Armijo-type line search strategy is incorporated into the developed algorithm. Numerical experiments are reported to compare the new algorithm with some similar ones.
The rest of this paper is organized as follows. In the next section, the new spectral PRP conjugate gradient method is proposed. Section 3 is devoted to proving its global convergence. In Section 4, numerical experiments are conducted to test its efficiency, especially in comparison with existing methods. Concluding remarks are given in the last section.
2. New Spectral PRP Conjugate Gradient Algorithm
In this section, we first study how to determine a descent direction of the objective function.
Let x_k be the current iterate, and let d_k be defined by
\[
d_k =
\begin{cases}
-g_k & \text{if } k = 0,\\
-\theta_k g_k + \beta_k^{\mathrm{PRP}} d_{k-1} & \text{if } k > 0,
\end{cases} \tag{2.1}
\]
where β_k^{PRP} is specified by (1.4) and
\[
\theta_k = \frac{d_{k-1}^T y_{k-1}}{\|g_{k-1}\|^2} - \frac{(d_{k-1}^T g_k)(g_k^T g_{k-1})}{\|g_k\|^2 \|g_{k-1}\|^2}. \tag{2.2}
\]
Note that the direction d_k given by (2.1) and (2.2) differs from those in [3, 16, 17], in the choice of either θ_k or β_k.
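In code, the direction (2.1)–(2.2) can be computed as in the following sketch (again ours, with hypothetical names):

```python
import numpy as np

def spectral_prp_direction(g, g_prev=None, d_prev=None):
    # Search direction (2.1)-(2.2): d_0 = -g_0, and for k > 0,
    #   d_k = -theta_k g_k + beta_k^PRP d_{k-1}.
    if d_prev is None:                    # k = 0
        return -g
    y = g - g_prev                        # y_{k-1} = g_k - g_{k-1}
    gg_prev = g_prev @ g_prev             # ||g_{k-1}||^2
    gg = g @ g                            # ||g_k||^2
    beta = (g @ y) / gg_prev              # beta_k^PRP, see (1.4)
    # theta_k as in (2.2)
    theta = (d_prev @ y) / gg_prev - (d_prev @ g) * (g @ g_prev) / (gg * gg_prev)
    return -theta * g + beta * d_prev
```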
We first prove that d_k is a sufficient descent direction.
Lemma 2.1.
Suppose that d_k is given by (2.1) and (2.2). Then
\[
g_k^T d_k = -\|g_k\|^2 \tag{2.3}
\]
holds for any k ≥ 0.
Proof.
Firstly, for k=0, it is easy to see that (2.3) is true since d0=-g0.
Secondly, assume that
\[
d_{k-1}^T g_{k-1} = -\|g_{k-1}\|^2 \tag{2.4}
\]
holds for k-1, where k ≥ 1. Then, from (1.4), (2.1), and (2.2), it follows that
\[
\begin{aligned}
g_k^T d_k &= -\theta_k \|g_k\|^2 + \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}\, d_{k-1}^T g_k\\
&= -\frac{d_{k-1}^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}\, g_k^T g_k + \frac{(d_{k-1}^T g_k)(g_k^T g_{k-1})}{\|g_k\|^2 \|g_{k-1}\|^2}\, g_k^T g_k + \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}\, d_{k-1}^T g_k\\
&= \frac{d_{k-1}^T g_{k-1}}{\|g_{k-1}\|^2}\, g_k^T g_k = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}\left(-\|g_{k-1}\|^2\right) = -\|g_k\|^2.
\end{aligned} \tag{2.5}
\]
Thus, (2.3) also holds with k-1 replaced by k. By mathematical induction, we obtain the desired result.
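The identity (2.3) is also easy to check numerically. The following snippet (ours; it inlines (2.1)–(2.2) on random data with d_0 = -g_0, so that (2.4) holds for the previous index) illustrates the lemma:

```python
import numpy as np

rng = np.random.default_rng(0)
g_prev = rng.standard_normal(5)      # plays the role of g_0
g = rng.standard_normal(5)           # plays the role of g_1
d_prev = -g_prev                     # d_0 = -g_0, so (2.4) holds for k-1 = 0

y = g - g_prev
beta = (g @ y) / (g_prev @ g_prev)                                     # (1.4)
theta = (d_prev @ y) / (g_prev @ g_prev) \
        - (d_prev @ g) * (g @ g_prev) / ((g @ g) * (g_prev @ g_prev))  # (2.2)
d = -theta * g + beta * d_prev                                         # (2.1)

print(g @ d, -(g @ g))   # the two printed numbers coincide, as (2.3) asserts
```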
From Lemma 2.1, it is known that d_k is a descent direction of f at x_k. Furthermore, if the exact line search is used, then g_k^T d_{k-1} = 0; hence
\[
\theta_k = \frac{d_{k-1}^T y_{k-1}}{\|g_{k-1}\|^2} - \frac{(d_{k-1}^T g_k)(g_k^T g_{k-1})}{\|g_k\|^2\|g_{k-1}\|^2} = -\frac{d_{k-1}^T g_{k-1}}{\|g_{k-1}\|^2} = 1. \tag{2.6}
\]
In this case, the proposed spectral PRP conjugate gradient method reduces to the standard PRP method. However, the exact line search is often time-consuming and sometimes unnecessary. In the following, we develop a new algorithm in which the search direction d_k is chosen by (2.1)-(2.2) and the step size is determined by an Armijo-type inexact line search.
Algorithm 2.2 (modified spectral PRP conjugate gradient method).
Step 1.
Given constants δ_1, ρ ∈ (0,1), δ_2 > 0, ϵ > 0, choose an initial point x_0 ∈ R^n. Let k := 0.
Step 2.
If ‖g_k‖ ≤ ϵ, then the algorithm stops. Otherwise, compute d_k by (2.1)-(2.2), and go to Step 3.
Step 3.
Determine a step length α_k = max{ρ^j : j = 0, 1, 2, …} such that
\[
f(x_k + \alpha_k d_k) \le f(x_k) + \delta_1 \alpha_k g_k^T d_k - \delta_2 \alpha_k^2 \|d_k\|^2. \tag{2.7}
\]
Step 4.
Set x_{k+1} := x_k + α_k d_k and k := k+1. Return to Step 2.
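A complete sketch of Algorithm 2.2 might look as follows (in Python with NumPy; the paper's own experiments are coded in MATLAB, and all names here are ours):

```python
import numpy as np

def modified_spectral_prp(f, grad, x0, eps=1e-6, rho=0.75,
                          delta1=0.1, delta2=1.0, max_iter=20000):
    """Algorithm 2.2: spectral PRP direction (2.1)-(2.2) combined with the
    Armijo-type line search (2.7)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                      # d_0 = -g_0
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:            # Step 2: stopping test
            break
        # Step 3: alpha_k = max{rho^j} satisfying (2.7)
        fx, gTd, dd = f(x), g @ d, d @ d        # gTd equals -||g_k||^2 by (2.3)
        alpha = 1.0
        while f(x + alpha * d) > fx + delta1 * alpha * gTd - delta2 * alpha**2 * dd:
            alpha *= rho
        # Step 4: next iterate
        x_new = x + alpha * d
        g_new = grad(x_new)
        # direction (2.1)-(2.2) for the next iteration
        y = g_new - g
        gg, gg_new = g @ g, g_new @ g_new
        beta = (g_new @ y) / gg                 # beta^PRP at the new iterate
        theta = (d @ y) / gg - (d @ g_new) * (g_new @ g) / (gg_new * gg)
        d = -theta * g_new + beta * d
        x, g = x_new, g_new
    return x, g, k
```

By Proposition 2.3 below, the backtracking loop in Step 3 terminates after finitely many trials whenever d_k is a descent direction, so the sketch is well defined.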
Since d_k is a descent direction of f at x_k, we now prove that there must exist j_0 such that α_k = ρ^{j_0} satisfies the inequality (2.7).
Proposition 2.3.
Let f: R^n → R be a continuously differentiable function. Suppose that d is a descent direction of f at x. Then there exists j_0 such that
\[
f(x + \alpha d) \le f(x) + \delta_1 \alpha g^T d - \delta_2 \alpha^2 \|d\|^2, \tag{2.8}
\]
where α = ρ^{j_0}, g is the gradient vector of f at x, and δ_1, ρ ∈ (0,1), δ_2 > 0 are given constant scalars.
Proof.
Actually, we only need to prove that a step length α is obtained in finitely many steps. If this were not true, then for all sufficiently large positive integers m, we would have
\[
f(x + \rho^m d) - f(x) > \delta_1 \rho^m g^T d - \delta_2 \rho^{2m} \|d\|^2. \tag{2.9}
\]
Thus, by the mean value theorem, for each such m there is a θ_m ∈ (0,1) such that
\[
\rho^m g(x + \theta_m \rho^m d)^T d > \delta_1 \rho^m g^T d - \delta_2 \rho^{2m} \|d\|^2. \tag{2.10}
\]
Dividing by ρ^m, this reads
\[
\left(g(x + \theta_m \rho^m d) - g\right)^T d > (\delta_1 - 1) g^T d - \delta_2 \rho^m \|d\|^2. \tag{2.11}
\]
Letting m → ∞, the left-hand side tends to zero by the continuity of g, so it is obtained that
\[
(\delta_1 - 1)\, g^T d \le 0. \tag{2.12}
\]
From δ_1 ∈ (0,1), it follows that g^T d ≥ 0. This contradicts the condition that d is a descent direction.
Remark 2.4.
From Proposition 2.3, it is known that Algorithm 2.2 is well defined. In addition, it is easy to see that the modified Armijo-type line search (2.7) yields a larger decrease at each step than the standard Armijo rule.
3. Global Convergence
In this section, we study the global convergence of Algorithm 2.2. We first state the following mild assumptions, which will be used in the proof.
Assumption 3.1.
The level set Ω={x∈Rn|f(x)≤f(x0)} is bounded.
Assumption 3.2.
In some neighborhood N of Ω, f is continuously differentiable and its gradient is Lipschitz continuous; namely, there exists a constant L > 0 such that
\[
\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in N. \tag{3.1}
\]
Since {f(x_k)} is decreasing, it is clear from Assumption 3.1 that the sequence {x_k} generated by Algorithm 2.2 is contained in a bounded region. Hence, {x_k} has a convergent subsequence, and without loss of generality it can be supposed that {x_k} is convergent. On the other hand, from Assumption 3.2, it follows that there is a constant γ_1 > 0 such that
\[
\|g(x)\| \le \gamma_1, \quad \forall x \in \Omega. \tag{3.2}
\]
Hence, the sequence {gk} is bounded.
In the following, we first prove that the step size α_k at each iteration is large enough.
Lemma 3.3.
Under Assumption 3.2, there exists a constant m > 0 such that the inequality
\[
\alpha_k \ge m \frac{\|g_k\|^2}{\|d_k\|^2} \tag{3.3}
\]
holds for all k sufficiently large.
Proof.
Firstly, from the line search rule in Step 3, we know that α_k ≤ 1.
If α_k = 1, then ‖g_k‖ ≤ ‖d_k‖. Indeed, ‖g_k‖ > ‖d_k‖ would imply, by the Cauchy-Schwarz inequality,
\[
\|g_k\|^2 > \|g_k\| \|d_k\| \ge -g_k^T d_k, \tag{3.4}
\]
which contradicts (2.3). Therefore, taking m = 1, the inequality (3.3) holds.
If 0 < α_k < 1, then the line search rule implies that ρ^{-1}α_k does not satisfy the inequality (2.7). So, we have
\[
f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) > \delta_1 \rho^{-1}\alpha_k g_k^T d_k - \delta_2 \rho^{-2}\alpha_k^2 \|d_k\|^2. \tag{3.5}
\]
Since
\[
\begin{aligned}
f(x_k + \rho^{-1}\alpha_k d_k) - f(x_k) &= \rho^{-1}\alpha_k g(x_k + t_k \rho^{-1}\alpha_k d_k)^T d_k\\
&= \rho^{-1}\alpha_k g_k^T d_k + \rho^{-1}\alpha_k \left(g(x_k + t_k \rho^{-1}\alpha_k d_k) - g_k\right)^T d_k\\
&\le \rho^{-1}\alpha_k g_k^T d_k + L \rho^{-2}\alpha_k^2 \|d_k\|^2,
\end{aligned} \tag{3.6}
\]
where t_k ∈ (0,1) satisfies x_k + t_k ρ^{-1}α_k d_k ∈ N and the last inequality follows from the Lipschitz condition (3.1), combining (3.5) and (3.6) yields
\[
\delta_1 \rho^{-1}\alpha_k g_k^T d_k - \delta_2 \rho^{-2}\alpha_k^2 \|d_k\|^2 < \rho^{-1}\alpha_k g_k^T d_k + L \rho^{-2}\alpha_k^2 \|d_k\|^2, \tag{3.7}
\]
which reads
\[
(1-\delta_1)\rho^{-1}\alpha_k g_k^T d_k + (L+\delta_2)\rho^{-2}\alpha_k^2 \|d_k\|^2 > 0, \tag{3.8}
\]
that is,
\[
(L+\delta_2)\rho^{-1}\alpha_k \|d_k\|^2 > (\delta_1 - 1)\, g_k^T d_k. \tag{3.9}
\]
Therefore,
\[
\alpha_k > \frac{(\delta_1 - 1)\rho\, g_k^T d_k}{(L+\delta_2)\|d_k\|^2}. \tag{3.10}
\]
From Lemma 2.1, it follows that
\[
\alpha_k > \frac{\rho(1-\delta_1)\|g_k\|^2}{(L+\delta_2)\|d_k\|^2}. \tag{3.11}
\]
Taking
\[
m = \min\left\{1, \frac{\rho(1-\delta_1)}{L+\delta_2}\right\}, \tag{3.12}
\]
then the desired inequality (3.3) holds.
From Lemmas 2.1 and 3.3 and Assumption 3.1, we can prove the following result.
Lemma 3.4.
Under Assumptions 3.1 and 3.2, the following results hold:
\[
\sum_{k\ge 0} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty, \tag{3.13}
\]
\[
\lim_{k\to\infty} \alpha_k^2 \|d_k\|^2 = 0. \tag{3.14}
\]
Proof.
From the line search rule (2.7) and Assumption 3.1, f is bounded below on Ω, so there exists a constant M such that
\[
\sum_{k=0}^{n-1}\left(-\delta_1 \alpha_k g_k^T d_k + \delta_2 \alpha_k^2 \|d_k\|^2\right) \le \sum_{k=0}^{n-1}\left(f(x_k) - f(x_{k+1})\right) = f(x_0) - f(x_n) < 2M. \tag{3.15}
\]
Then, from Lemmas 2.1 and 3.3, we have
\[
\begin{aligned}
2M &\ge \sum_{k=0}^{n-1}\left(-\delta_1\alpha_k g_k^T d_k + \delta_2\alpha_k^2\|d_k\|^2\right) = \sum_{k=0}^{n-1}\left(\delta_1\alpha_k\|g_k\|^2 + \delta_2\alpha_k^2\|d_k\|^2\right)\\
&\ge \sum_{k=0}^{n-1}\left(\delta_1 m \frac{\|g_k\|^2}{\|d_k\|^2}\|g_k\|^2 + \delta_2 m^2 \frac{\|g_k\|^4}{\|d_k\|^4}\|d_k\|^2\right) = \sum_{k=0}^{n-1} m\left(\delta_1 + \delta_2 m\right)\frac{\|g_k\|^4}{\|d_k\|^2}.
\end{aligned} \tag{3.16}
\]
Letting n → ∞, the first conclusion (3.13) is proved.
Since
\[
2M \ge \sum_{k=0}^{n-1}\left(\delta_1\alpha_k\|g_k\|^2 + \delta_2\alpha_k^2\|d_k\|^2\right) \ge \delta_2 \sum_{k=0}^{n-1} \alpha_k^2\|d_k\|^2, \tag{3.17}
\]
the series
\[
\sum_{k=0}^{\infty} \alpha_k^2 \|d_k\|^2 \tag{3.18}
\]
is convergent. Thus,
\[
\lim_{k\to\infty} \alpha_k^2 \|d_k\|^2 = 0. \tag{3.19}
\]
The second conclusion (3.14) is obtained.
At the end of this section, we establish the global convergence theorem for Algorithm 2.2.
Theorem 3.5.
Under Assumptions 3.1 and 3.2, it holds that
\[
\liminf_{k\to\infty} \|g_k\| = 0. \tag{3.20}
\]
Proof.
Suppose, to the contrary, that there exists a constant ϵ > 0 such that
\[
\|g_k\| \ge \epsilon \tag{3.21}
\]
for all k. Then, from (2.1), it follows that
\[
\begin{aligned}
\|d_k\|^2 &= d_k^T d_k = \left(-\theta_k g_k + \beta_k^{\mathrm{PRP}} d_{k-1}\right)^T \left(-\theta_k g_k + \beta_k^{\mathrm{PRP}} d_{k-1}\right)\\
&= \theta_k^2\|g_k\|^2 - 2\theta_k \beta_k^{\mathrm{PRP}} d_{k-1}^T g_k + \left(\beta_k^{\mathrm{PRP}}\right)^2\|d_{k-1}\|^2\\
&= \theta_k^2\|g_k\|^2 - 2\theta_k \left(d_k + \theta_k g_k\right)^T g_k + \left(\beta_k^{\mathrm{PRP}}\right)^2\|d_{k-1}\|^2\\
&= \theta_k^2\|g_k\|^2 - 2\theta_k d_k^T g_k - 2\theta_k^2\|g_k\|^2 + \left(\beta_k^{\mathrm{PRP}}\right)^2\|d_{k-1}\|^2\\
&= \left(\beta_k^{\mathrm{PRP}}\right)^2\|d_{k-1}\|^2 - 2\theta_k d_k^T g_k - \theta_k^2\|g_k\|^2,
\end{aligned} \tag{3.22}
\]
where the third equality uses β_k^{PRP} d_{k-1} = d_k + θ_k g_k, which follows from (2.1).
Dividing both sides of this equality by (g_k^T d_k)^2, which equals ‖g_k‖^4 by (2.3), and using (1.4), (3.1), and (3.21), we obtain
\[
\begin{aligned}
\frac{\|d_k\|^2}{\|g_k\|^4} &= \frac{\left(\beta_k^{\mathrm{PRP}}\right)^2\|d_{k-1}\|^2 - 2\theta_k d_k^T g_k - \theta_k^2\|g_k\|^2}{\|g_k\|^4}\\
&= \frac{\left(g_k^T(g_k - g_{k-1})\right)^2}{\|g_{k-1}\|^4}\, \frac{\|d_{k-1}\|^2}{\|g_k\|^4} - \frac{(\theta_k - 1)^2}{\|g_k\|^2} + \frac{1}{\|g_k\|^2}\\
&\le \frac{\|g_k - g_{k-1}\|^2}{\|g_{k-1}\|^4}\, \frac{\|d_{k-1}\|^2}{\|g_k\|^2} - \frac{(\theta_k - 1)^2}{\|g_k\|^2} + \frac{1}{\|g_k\|^2}\\
&\le \frac{\|g_k - g_{k-1}\|^2}{\|g_k\|^2}\, \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{1}{\|g_k\|^2}\\
&\le \frac{L^2 \alpha_{k-1}^2 \|d_{k-1}\|^2}{\epsilon^2}\, \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{1}{\|g_k\|^2}.
\end{aligned} \tag{3.23}
\]
From (3.14) in Lemma 3.4, it follows that
\[
\lim_{k\to\infty} \alpha_{k-1}^2 \|d_{k-1}\|^2 = 0. \tag{3.24}
\]
Thus, there exists a sufficiently large index k_0 such that, for all k ≥ k_0,
\[
0 \le \alpha_{k-1}^2 \|d_{k-1}\|^2 < \frac{\epsilon^2}{L^2}. \tag{3.25}
\]
Therefore, for k ≥ k_0,
\[
\frac{\|d_k\|^2}{\|g_k\|^4} \le \frac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \frac{1}{\|g_k\|^2} \le \cdots \le \frac{\|d_{k_0}\|^2}{\|g_{k_0}\|^4} + \sum_{i=k_0+1}^{k} \frac{1}{\|g_i\|^2} \le \frac{C_0}{\epsilon^2} + \sum_{i=k_0+1}^{k} \frac{1}{\epsilon^2} = \frac{C_0 + k - k_0}{\epsilon^2}, \tag{3.26}
\]
where C_0 = ϵ^2 ‖d_{k_0}‖^2 / ‖g_{k_0}‖^4 is a nonnegative constant.
The last inequality implies
\[
\sum_{k\ge 1} \frac{\|g_k\|^4}{\|d_k\|^2} \ge \sum_{k>k_0} \frac{\|g_k\|^4}{\|d_k\|^2} \ge \epsilon^2 \sum_{k>k_0} \frac{1}{C_0 + k - k_0} = \infty, \tag{3.27}
\]
which contradicts (3.13) in Lemma 3.4.
The global convergence theorem is established.
4. Numerical Experiments
In this section, we report the numerical performance of Algorithm 2.2. We test it on 15 benchmark problems from [19] and compare its performance with that of similar methods: the standard PRP conjugate gradient method in [6], the modified FR conjugate gradient method in [16], and the modified PRP conjugate gradient method in [17]. These algorithms differ from one another in either the updating formula or the line search rule.
All procedures are coded in MATLAB 7.0.1 and run on a PC with a 2.0 GHz CPU, 1 GB of RAM, and the Windows XP operating system.
The parameters are chosen as follows: ϵ = 10^{-6}, ρ = 0.75, δ_1 = 0.1, δ_2 = 1.
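As an illustration of this setup (not the authors' MATLAB code), the modified_spectral_prp sketch from Section 2 can be run on the two-dimensional Rosenbrock function with exactly these parameter values; the starting point (-1.2, 1) below is the classical one from [19], assumed here for the example:

```python
import numpy as np

# Two-dimensional Rosenbrock function and its gradient
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
grad = lambda x: np.array([
    -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
    200.0 * (x[1] - x[0]**2),
])

x, g, k = modified_spectral_prp(f, grad, x0=[-1.2, 1.0],
                                eps=1e-6, rho=0.75, delta1=0.1, delta2=1.0)
print(k, np.linalg.norm(g), x)   # iteration count, final gradient norm, minimizer
```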
In Tables 1 and 2, we use the following denotations:
Dim: the dimension of the objective function;
GV: the gradient value of the objective function when the algorithm stops;
NI: the number of iterations;
NF: the number of function evaluations;
CT: the run time of CPU;
mfr: the modified FR conjugate gradient method in [16];
prp: the standard PRP conjugate gradient method in [6];
msprp: the modified PRP conjugate gradient method in [17];
mprp: the new algorithm developed in this paper.
Table 1: Comparison of efficiency with the other methods.

Function                  Algorithm  Dim  GV           NI     NF      CT(s)
Rosenbrock                mfr        2    8.8818e-007  328    7069    0.2970
                          prp        2    9.2415e-007  760    41189   1.4370
                          mprp       2    8.6092e-007  124    2816    0.0940
                          msprp      2    6.9643e-007  122    2597    0.1400
Freudenstein and Roth     mfr        2    5.5723e-007  236    5110    0.2190
                          prp        2    7.1422e-007  331    18798   0.6250
                          mprp       2    2.4666e-007  67     1904    0.0940
                          msprp      2    8.6967e-007  62     1437    0.0780
Brown badly scaled        mfr        2    —            —      —       —
                          prp        2    —            —      —       —
                          mprp       2    7.9892e-007  105    10279   0.2030
                          msprp      2    7.6029e-007  70     7117    0.2660
Beale                     mfr        2    6.1730e-007  74     714     0.0780
                          prp        2    8.2455e-007  292    12568   0.4370
                          mprp       2    6.2257e-007  130    1539    0.0940
                          msprp      2    8.7861e-007  91     877     0.0470
Powell singular           mfr        4    9.9827e-007  4122   10578   0.6870
                          prp        4    —            —      —       —
                          mprp       4    9.6909e-007  13565  218964  5.2660
                          msprp      4    9.8512e-007  11893  169537  7.2500
Wood                      mfr        4    7.7937e-007  263    5787    0.2660
                          prp        4    9.9841e-007  1284   69501   2.3440
                          mprp       4    9.6484e-007  280    6432    0.1720
                          msprp      4    7.9229e-007  404    9643    0.4070
Extended Powell singular  mfr        4    9.9827e-007  4122   10578   0.6800
                          prp        4    —            —      —       —
                          mprp       4    9.6909e-007  13565  218964  5.5310
                          msprp      4    9.8512e-007  11893  169537  7.4070
Broyden tridiagonal       mfr        4    4.8451e-007  53     784     0.0630
                          prp        4    6.6626e-007  87     4460    0.1180
                          mprp       4    5.8166e-007  39     430     0.0320
                          msprp      4    9.7196e-007  52     785     0.0780
Table 2: Comparison of efficiency with the other methods.

Function                  Algorithm  Dim  GV            NI    NF     CT(s)
Kowalik and Osborne       mfr        4    —             —     —      —
                          prp        4    8.9521e-007   833   26191  1.2970
                          mprp       4    9.9698e-007   6235  35425  3.5940
                          msprp      4    9.9560e-007   7059  37976  4.9850
Broyden banded            mfr        6    8.9469e-007   40    505    0.0780
                          prp        6    8.4684e-007   268   9640   0.4840
                          mprp       6    8.9029e-007   102   1319   0.0940
                          msprp      6    9.3276e-007   44    556    0.0940
Discrete boundary value   mfr        6    9.1531e-007   107   509    0.0780
                          prp        6    7.8970e-007   269   11449  0.4690
                          mprp       6    8.28079e-007  157   1473   0.0930
                          msprp      6    9.9436e-007   165   1471   0.1410
Variably dimensioned      mfr        8    7.3411e-007   57    1233   0.1250
                          prp        8    7.3411e-007   113   7403   0.3290
                          mprp       8    9.0900e-007   69    1544   0.0780
                          msprp      8    7.3411e-007   57    1233   0.1100
Broyden tridiagonal       mfr        9    9.1815e-007   129   2173   0.1250
                          prp        9    6.4584e-007   113   5915   0.2500
                          mprp       9    7.3529e-007   187   2967   0.1250
                          msprp      9    9.2363e-007   82    1304   0.1100
Linear - rank 1           mfr        10   9.7462e-007   84    3762   0.1720
                          prp        10   4.5647e-007   98    6765   0.2810
                          mprp       10   6.9140e-007   51    2216   0.0780
                          msprp      10   6.6630e-007   50    2162   0.1250
Linear - full rank        mfr        12   7.6919e-007   9     36     0.0160
                          prp        12   8.2507e-007   47    1904   0.1090
                          mprp       12   7.6919e-007   9     36     0.0630
                          msprp      12   7.6919e-007   9     36     0.0150
The above numerical experiments show that the algorithm proposed in this paper (mprp) is promising.
5. Conclusion
In this paper, a new spectral PRP conjugate gradient algorithm has been developed for solving unconstrained minimization problems. Under some mild conditions, its global convergence has been proved with an Armijo-type line search rule. Compared with other similar algorithms, the numerical performance of the developed algorithm is promising.
Acknowledgments
The authors would like to express their great thanks to the anonymous referees for their constructive comments on this paper, which have improved its presentation. This work is supported by National Natural Science Foundation of China (Grant nos. 71071162, 70921001).
References
[1] N. Andrei, "Acceleration of conjugate gradient algorithms for unconstrained optimization," Applied Mathematics and Computation, vol. 213, no. 2, pp. 361–369, 2009.
[2] N. Andrei, "Open problems in nonlinear conjugate gradient algorithms for unconstrained optimization," vol. 34, no. 2, pp. 319–330, 2011.
[3] E. G. Birgin and J. M. Martínez, "A spectral conjugate gradient method for unconstrained optimization," Applied Mathematics and Optimization, vol. 43, no. 2, pp. 117–128, 2001.
[4] S.-Q. Du and Y.-Y. Chen, "Global convergence of a modified spectral FR conjugate gradient method," Applied Mathematics and Computation, vol. 202, no. 2, pp. 766–770, 2008.
[5] J. C. Gilbert and J. Nocedal, "Global convergence properties of conjugate gradient methods for optimization," SIAM Journal on Optimization, vol. 2, no. 1, pp. 21–42, 1992.
[6] L. Grippo and S. Lucidi, "A globally convergent version of the Polak-Ribière conjugate gradient method," Mathematical Programming, vol. 78, no. 3, pp. 375–391, 1997.
[7] J. Nocedal and S. J. Wright, Numerical Optimization, Springer Series in Operations Research, Springer, New York, NY, USA, 1999.
[8] B. T. Polyak, "The conjugate gradient method in extremal problems," USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 4, pp. 94–112, 1969.
[9] Z. J. Shi, "A restricted Polak-Ribière conjugate gradient method and its global convergence," vol. 31, no. 1, pp. 47–55, 2002.
[10] Z. Wan, C. M. Hu, and Z. L. Yang, "A spectral PRP conjugate gradient methods for nonconvex optimization problem based on modified line search," vol. 16, no. 4, pp. 1157–1169, 2011.
[11] Z. Wan, Z. Yang, and Y. Wang, "New spectral PRP conjugate gradient method for unconstrained optimization," Applied Mathematics Letters, vol. 24, no. 1, pp. 16–22, 2011.
[12] Z. X. Wei, G. Y. Li, and L. Q. Qi, "Global convergence of the Polak-Ribière-Polyak conjugate gradient method with an Armijo-type inexact line search for nonconvex unconstrained optimization problems," Mathematics of Computation, vol. 77, no. 264, pp. 2173–2193, 2008.
[13] G. Yu, L. Guan, and Z. Wei, "Globally convergent Polak-Ribière-Polyak conjugate gradient methods under a modified Wolfe line search," Applied Mathematics and Computation, vol. 215, no. 8, pp. 3082–3090, 2009.
[14] G. Yuan, X. Lu, and Z. Wei, "A conjugate gradient method with descent direction for unconstrained optimization," Journal of Computational and Applied Mathematics, vol. 233, no. 2, pp. 519–530, 2009.
[15] G. Yuan, "Modified nonlinear conjugate gradient methods with sufficient descent property for large-scale optimization problems," Optimization Letters, vol. 3, no. 1, pp. 11–21, 2009.
[16] L. Zhang, W. Zhou, and D. Li, "Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search," Numerische Mathematik, vol. 104, no. 4, pp. 561–572, 2006.
[17] L. Zhang, W. Zhou, and D.-H. Li, "A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence," IMA Journal of Numerical Analysis, vol. 26, no. 4, pp. 629–640, 2006.
[18] E. Polak and G. Ribière, "Note sur la convergence de méthodes de directions conjuguées," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 3, no. 16, pp. 35–43, 1969.
[19] J. J. Moré, B. S. Garbow, and K. E. Hillstrom, "Testing unconstrained optimization software," ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.