The sufficient descent property plays an important role in the global convergence analysis of many iterative methods. In this paper, we propose a new iterative method for solving unconstrained optimization problems. The method generates a sufficient descent direction for the objective function at every iteration, independently of the line search. Moreover, the global convergence of the proposed method is established under appropriate conditions. We also report numerical results and compare the performance of the proposed method with some existing methods. The numerical results indicate that the presented method is efficient.
1. Introduction
Consider the unconstrained optimization problem
$$\min_{x\in\mathbb{R}^n} f(x), \tag{1}$$
where $f:\mathbb{R}^n\to\mathbb{R}$ is a continuously differentiable function. For solving (1), the following iterative formula is often used:
$$x_{k+1}=x_k+\alpha_k d_k,\quad k=0,1,2,\ldots, \tag{2}$$
where $x_k$ is the current iterate, $\alpha_k>0$ is a step size determined by some line search, and $d_k$ is a search direction. Different search directions correspond to different iterative methods [1–4]. Throughout this paper, $g_k=\nabla f(x_k)$ is an $n$-dimensional column vector, $y_{k-1}=g_k-g_{k-1}$, and $\|\cdot\|$ and $^T$ denote the Euclidean norm and the transpose of a vector, respectively. Generally, if there exists a constant $c>0$ such that
$$g_k^T d_k\le -c\|g_k\|^2, \tag{3}$$
then the search direction $d_k$ possesses the sufficient descent property. This property may be crucial for an iterative method to be globally convergent [5], and some numerical experiments have shown that sufficient descent methods are efficient [6]. However, not all iterative methods satisfy the sufficient descent condition (3) under inexact line search conditions; examples are the conjugate gradient method proposed by Wei et al. [7] and the gradient method presented in [8]. Much effort has been made to ensure that the search direction $d_k$ satisfies condition (3) at each step [9–12].
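In [9], Cheng proposed a modified PRP conjugate gradient method in which the search direction $d_k$ is determined by
$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\beta_k\left(I-\dfrac{g_k g_k^T}{\|g_k\|^2}\right)d_{k-1}, & k\ge 1,\end{cases} \tag{4}$$
where $\beta_k=\beta_k^{PRP}=g_k^T y_{k-1}/\|g_{k-1}\|^2$, $g_k g_k^T$ is an $n\times n$ matrix, and $I$ is the identity matrix.
In [10], Zhang et al. derived a simple sufficient descent method; the search direction $d_k$ is given by
$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\left(I-\dfrac{g_k g_k^T}{\|g_k\|^2}\right)g_{k-1}, & k\ge 1.\end{cases} \tag{5}$$
Recently, Zhang et al. [11] presented a three-term modified PRP conjugate gradient method; the search direction $d_k$ is generated by
$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\beta_k d_{k-1}-\theta_k y_{k-1}, & k\ge 1,\end{cases} \tag{6}$$
where
$$\beta_k=\beta_k^{PRP}=\frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2},\qquad \theta_k=\frac{g_k^T d_{k-1}}{\|g_{k-1}\|^2}. \tag{7}$$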
We note that (4), (5), and (6) can be written as a linear combination of the steepest descent direction and the projection of an original direction; that is,
$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\lambda_k\left(I-\dfrac{\mu_k g_k^T}{\mu_k^T g_k}\right)\bar d_k, & k\ge 1,\end{cases} \tag{8}$$
where $\bar d_k$ is an original direction, $\lambda_k$ is a scalar, and $\mu_k\in\mathbb{R}^n$ is any vector such that $\mu_k^T g_k\ne 0$. Indeed, if $\lambda_k=\beta_k^{PRP}$, $\mu_k=g_k$, and $\bar d_k=d_{k-1}$, then (8) reduces to the method (4). If $\lambda_k=1$, $\mu_k=g_k$, and $\bar d_k=g_{k-1}$, then (8) reduces to the method (5). If $\lambda_k=\beta_k^{PRP}$, $\mu_k=y_{k-1}$, and $\bar d_k=d_{k-1}$, it is easy to deduce that (8) reduces to the method (6). From (8), we can easily obtain
$$g_k^T\left(\lambda_k\left(I-\frac{\mu_k g_k^T}{\mu_k^T g_k}\right)\bar d_k\right)=0. \tag{9}$$
Thus, one has $g_k^T d_k=-\|g_k\|^2$ for all $k$, which implies that the sufficient descent condition (3) holds with $c=1$. However, the method (5) does not possess a restart feature that can avoid the jamming phenomenon. In addition, the methods (4) and (6) may not always be globally convergent under some inexact line searches [13], such as the standard Armijo-type line search, which is given as follows:
$$\alpha_k=\max\{\rho^j,\ j=0,1,2,\ldots\}\quad\text{such that}\quad f(x_k+\alpha_k d_k)\le f_k+\delta\alpha_k g_k^T d_k, \tag{10}$$
where $f_k=f(x_k)$, $\rho\in(0,1)$, and $\delta\in(0,1/2)$.
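As an illustration, the backtracking rule (10) can be sketched in Python. This is a minimal sketch rather than the authors' implementation: the function name `armijo_step`, the safeguard `max_backtracks`, and the quadratic test function are assumptions introduced here.

```python
def armijo_step(f, x, d, g_dot_d, rho=0.5, delta=0.1, max_backtracks=60):
    """Largest alpha in {1, rho, rho^2, ...} satisfying the Armijo test (10).

    g_dot_d is g_k^T d_k and should be negative (d is a descent direction).
    max_backtracks is a safeguard added for this sketch only.
    """
    fx = f(x)
    alpha = 1.0
    for _ in range(max_backtracks):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + delta * alpha * g_dot_d:
            return alpha
        alpha *= rho
    raise RuntimeError("no acceptable step found")


# Hypothetical example: f(x) = x1^2 + 4*x2^2 with the steepest descent direction.
f = lambda x: x[0] ** 2 + 4.0 * x[1] ** 2
x = [1.0, 1.0]
g = [2.0 * x[0], 8.0 * x[1]]
d = [-gi for gi in g]
g_dot_d = sum(gi * di for gi, di in zip(g, d))
alpha = armijo_step(f, x, d, g_dot_d)
print(alpha)  # prints 0.125
```

In this example the trial steps 1, 0.5, and 0.25 are rejected and 0.125 is the first step that gives sufficient decrease.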
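Motivated by (8) and (9), our purpose is to design a direction in the hyperplane $\{d\in\mathbb{R}^n\mid g_k^T d=-t_k\}$, where $t_k\ge 0$ is a parameter. This direction can be written as
$$\hat d_k=\lambda_k\left(I-\frac{\mu_k g_k^T}{\mu_k^T g_k}\right)\bar d_k-t_k\frac{v_k}{v_k^T g_k}, \tag{11}$$
where $v_k\in\mathbb{R}^n$ is any vector such that $v_k^T g_k\ne 0$. Let
$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\hat d_k, & k\ge 1.\end{cases} \tag{12}$$
It is clear that (8) can be regarded as a special case of (12) with $t_k=0$. Therefore, (12) has a wider range of application than (8). If we take $\lambda_k=\beta_k^{PRP}$, $\mu_k=y_{k-1}$, $v_k=y_{k-1}$, $\bar d_k=g_{k-1}$, and $t_k=(g_k^T y_{k-1})^2/\|g_{k-1}\|^2$ in (12), then a new search direction is given as follows:
$$d_k=\begin{cases}-g_k, & k=0,\\ -g_k+\beta_k g_{k-1}-\theta_k y_{k-1}, & k\ge 1,\end{cases} \tag{13}$$
where
$$\beta_k=\frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2},\qquad \theta_k=\frac{\|g_k\|^2}{\|g_{k-1}\|^2}. \tag{14}$$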
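The reduction from (11) and (12) to (13) and (14) can be checked numerically. The following Python sketch evaluates the general direction with the parameter choice stated above and compares it with the closed form (13). The function names and the two sample gradients are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def general_direction(g, d_bar, mu, v, lam, t):
    """Direction (12) for k >= 1: -g + d_hat, with d_hat taken from (11)."""
    proj_coef = dot(g, d_bar) / dot(mu, g)   # g^T d_bar / (mu^T g)
    shift_coef = t / dot(v, g)               # t / (v^T g)
    return [-gi + lam * (di - proj_coef * mi) - shift_coef * vi
            for gi, di, mi, vi in zip(g, d_bar, mu, v)]


def nsdm_direction(g, g_prev):
    """Direction (13) with beta_k and theta_k from (14)."""
    y = [a - b for a, b in zip(g, g_prev)]
    gp2 = dot(g_prev, g_prev)
    beta = dot(g, y) / gp2
    theta = dot(g, g) / gp2
    return [-gi + beta * gp - theta * yi for gi, gp, yi in zip(g, g_prev, y)]


# Hypothetical gradients at two successive iterates.
g_prev = [2.0, 4.0]
g = [1.0, -4.0]
y = [a - b for a, b in zip(g, g_prev)]
gp2 = dot(g_prev, g_prev)
lam = dot(g, y) / gp2          # lambda_k = beta_k^PRP
t = dot(g, y) ** 2 / gp2       # t_k = (g_k^T y_{k-1})^2 / ||g_{k-1}||^2

d_general = general_direction(g, g_prev, y, y, lam, t)
d_nsdm = nsdm_direction(g, g_prev)
gap = max(abs(a - b) for a, b in zip(d_general, d_nsdm))
descent_residual = dot(g, d_general) + dot(g, g) + t   # g^T d + ||g||^2 + t
print(d_nsdm, gap, descent_residual)
```

For these sample gradients both formulas give the direction (2.95, 17) up to rounding, and $g_k^T d_k=-\|g_k\|^2-t_k$ holds.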
In this paper, we present a new iterative method for unconstrained optimization problems; the search direction is defined by (13) and (14). We prove that $d_k$ satisfies $g_k^T d_k\le-\|g_k\|^2$ independently of the line search used, which means that the sufficient descent condition (3) holds with $c=1$. Furthermore, we prove that the proposed method is globally convergent under the standard Armijo-type line search or the modified Armijo-type line search. From (13) and (14), we can see that the proposed method has a restart feature that directly addresses the jamming problem. In fact, when the step $x_k-x_{k-1}$ is small, $y_{k-1}$ tends to the zero vector, so both correction terms $\beta_k g_{k-1}$ and $\theta_k y_{k-1}$ become small. Therefore, the direction $d_k$ generated by (13) is very close to the steepest descent direction $-g_k$.
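The restart behaviour described above can be observed numerically. The sketch below, written under the assumption that a small step produces a gradient change of size `eps`, measures how far the direction (13) is from $-g_k$ as `eps` shrinks. The helper names and the sample vector are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nsdm_direction(g, g_prev):
    """Direction (13) with beta_k and theta_k from (14)."""
    y = [a - b for a, b in zip(g, g_prev)]
    gp2 = dot(g_prev, g_prev)
    beta = dot(g, y) / gp2
    theta = dot(g, g) / gp2
    return [-gi + beta * gp - theta * yi for gi, gp, yi in zip(g, g_prev, y)]


def deviation_from_steepest_descent(g, g_prev):
    """Euclidean norm of d_k - (-g_k)."""
    d = nsdm_direction(g, g_prev)
    return math.sqrt(sum((di + gi) ** 2 for di, gi in zip(d, g)))


g_prev = [1.0, -2.0, 0.5]       # hypothetical previous gradient
deviations = {}
for eps in (1e-1, 1e-3, 1e-6):  # size of the gradient change y_{k-1}
    g = [p + eps for p in g_prev]
    deviations[eps] = deviation_from_steepest_descent(g, g_prev)
    print(eps, deviations[eps])
```

The printed deviation decreases with `eps`, which is consistent with the direction approaching $-g_k$ when successive iterates are close.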
The rest of this paper is organized as follows. In Section 2, we propose a new algorithm and discuss its sufficient descent property. In Section 3, the global convergence of the proposed method is proved under the modified Armijo-type line search or the standard Armijo-type line search. Some numerical results are given in Section 4 to test the performance of the proposed method. Finally, Section 5 gives some conclusions about the proposed method.
2. New Algorithm
In this section, the iterative steps of the proposed algorithm are listed as follows.
Algorithm 1.
Step 1. Choose parameters $\delta\in(0,1)$, $\rho\in(0,1)$, and $\beta>0$ and an initial point $x_0\in\mathbb{R}^n$. Set $d_0=-g_0$ and $k:=0$.
Step 2. If $\|g_k\|=0$, then stop; otherwise go to the next step.
Step 3. Determine a step size $\alpha_k$ satisfying the modified Armijo-type line search condition:
$$\alpha_k=\max\{\beta\rho^j,\ j=0,1,2,\ldots\}\quad\text{such that}\quad f(x_k+\alpha_k d_k)\le f(x_k)-\delta\alpha_k^2\|d_k\|^2. \tag{15}$$
Step 4. Let $x_{k+1}=x_k+\alpha_k d_k$.
Step 5. Calculate the search direction $d_{k+1}$ by (13) and (14).
Step 6. Set $k:=k+1$, and go to Step 2.
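The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function name `nsdm`, the gradient tolerance `tol` used in place of the exact test in Step 2, the caps `max_iter` and `max_backtracks`, the default parameter values, and the quadratic test function are all introduced here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nsdm(f, grad, x0, delta=0.1, rho=0.5, beta=1.0, tol=1e-5,
         max_iter=10000, max_backtracks=100):
    """Sketch of Algorithm 1; returns (x, number of iterations used)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                              # Step 1: d_0 = -g_0
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:                # Step 2, with a tolerance
            return x, k
        fx, dd, alpha = f(x), dot(d, d), beta          # Step 3: line search (15)
        for _ in range(max_backtracks):
            trial = [xi + alpha * di for xi, di in zip(x, d)]
            if f(trial) <= fx - delta * alpha ** 2 * dd:
                break
            alpha *= rho
        else:
            raise RuntimeError("line search failed")
        x = trial                                      # Step 4
        g_prev, g = g, grad(x)
        y = [a - b for a, b in zip(g, g_prev)]         # Step 5: (13) and (14)
        gp2 = dot(g_prev, g_prev)
        beta_k = dot(g, y) / gp2
        theta_k = dot(g, g) / gp2
        d = [-gi + beta_k * gp - theta_k * yi
             for gi, gp, yi in zip(g, g_prev, y)]
    return x, max_iter                                 # Step 6 is the loop itself


# Hypothetical test function: f(x) = 0.5 * (x1^2 + 4 * x2^2), minimizer at 0.
f = lambda x: 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2)
grad = lambda x: [x[0], 4.0 * x[1]]
x0 = [2.0, 1.0]
x_star, iters = nsdm(f, grad, x0)
print(iters, x_star)
```

The direction update needs only the current and previous gradients, so the sketch stores no search direction history beyond `g_prev`.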
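Theorem 2.
Let the sequences $\{d_k\}$ and $\{x_k\}$ be generated by (13) and (2); then
$$g_k^T d_k\le-\|g_k\|^2 \tag{16}$$
for all $k\ge 0$.
Proof.
Since $d_0=-g_0$, the conclusion is true for $k=0$.
If $k\ge 1$, multiplying (13) by $g_k^T$, we have
$$\begin{aligned}
g_k^T d_k &= -\|g_k\|^2+g_k^T(\beta_k g_{k-1}-\theta_k y_{k-1})\\
&= -\|g_k\|^2+\frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}g_k^T g_{k-1}-\frac{\|g_k\|^2}{\|g_{k-1}\|^2}g_k^T y_{k-1}\\
&= -\|g_k\|^2+\frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}\left(g_k^T g_{k-1}-\|g_k\|^2\right)\\
&= -\|g_k\|^2+\frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}g_k^T(g_{k-1}-g_k)\\
&= -\|g_k\|^2-\frac{(g_k^T y_{k-1})^2}{\|g_{k-1}\|^2}\le-\|g_k\|^2.
\end{aligned} \tag{17}$$
Therefore, the inequality (16) holds for all $k\ge 0$. The proof is completed.
Theorem 2 shows that the search direction $d_k$ given by (13) possesses the sufficient descent property for any line search.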
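Inequality (16) can also be spot-checked on randomly generated gradient pairs. The Python sketch below is only a sanity check, not a proof: the sampling scheme, the seed, and the helper names are assumptions made for this illustration, and the component magnitudes are kept in [0.5, 2] so that rounding errors stay small.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nsdm_direction(g, g_prev):
    """Direction (13) with beta_k and theta_k from (14)."""
    y = [a - b for a, b in zip(g, g_prev)]
    gp2 = dot(g_prev, g_prev)
    beta = dot(g, y) / gp2
    theta = dot(g, g) / gp2
    return [-gi + beta * gp - theta * yi for gi, gp, yi in zip(g, g_prev, y)]


def worst_violation(trials=1000, n=4, seed=0):
    """Largest observed value of g^T d + ||g||^2, which (16) says is <= 0."""
    rng = random.Random(seed)

    def sample():
        # Random signs, magnitudes in [0.5, 2] to stay away from g_prev = 0.
        return [rng.choice((-1.0, 1.0)) * rng.uniform(0.5, 2.0)
                for _ in range(n)]

    worst = -float("inf")
    for _ in range(trials):
        g_prev, g = sample(), sample()
        d = nsdm_direction(g, g_prev)
        worst = max(worst, dot(g, d) + dot(g, g))
    return worst


worst = worst_violation()
print(worst)  # nonpositive up to rounding error
```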
3. Convergence Analysis
The following assumptions are often needed to prove the global convergence of nonlinear conjugate gradient methods [14, 15]. In this section, we also use these assumptions in the convergence analysis of the proposed method.
Assumption 3.
(i) The level set $S=\{x\in\mathbb{R}^n: f(x)\le f(x_0)\}$ is bounded.
(ii) In a neighborhood $N$ of $S$, the function $f$ is continuously differentiable and its gradient is Lipschitz continuous; namely, there exists a constant $L>0$ such that
$$\|g(x)-g(y)\|\le L\|x-y\|,\quad\forall x,y\in N. \tag{18}$$
Lemma 4.
Suppose that Assumption 3 holds. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 1. If the step size $\alpha_k$ is obtained by (15) or (10), then there exists a constant $m>0$ such that
$$\alpha_k\ge m\frac{\|g_k\|^2}{\|d_k\|^2}, \tag{19}$$
and one also has
$$\sum_{k=0}^{\infty}\frac{\|g_k\|^4}{\|d_k\|^2}<\infty. \tag{20}$$
Proof.
The results of this lemma will be proved in the following two cases.
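Case 1. Let the step size $\alpha_k$ be computed by (15). From Theorem 2, we have $\|g_k\|\|d_k\|\ge-g_k^T d_k\ge\|g_k\|^2$; thus $\|d_k\|\ge\|g_k\|$. If $\alpha_k=\beta$, then we obtain $\alpha_k\ge\beta\|g_k\|^2/\|d_k\|^2$. If $\alpha_k<\beta$, then $\rho^{-1}\alpha_k$ does not satisfy the inequality in (15). So we have
$$f(x_k+\rho^{-1}\alpha_k d_k)-f_k>-\delta\alpha_k^2\rho^{-2}\|d_k\|^2. \tag{21}$$
By Assumption 3(ii) and the mean value theorem, we have
$$\begin{aligned}
f(x_k+\rho^{-1}\alpha_k d_k)-f_k &= \rho^{-1}\alpha_k g(x_k+t_k\rho^{-1}\alpha_k d_k)^T d_k\\
&= \rho^{-1}\alpha_k g_k^T d_k+\rho^{-1}\alpha_k\left(g(x_k+t_k\rho^{-1}\alpha_k d_k)-g_k\right)^T d_k\\
&\le \rho^{-1}\alpha_k g_k^T d_k+L\rho^{-2}\alpha_k^2\|d_k\|^2,
\end{aligned} \tag{22}$$
where $t_k\in(0,1)$.
From (21) and (22), we have
$$-\delta\alpha_k^2\rho^{-2}\|d_k\|^2<\rho^{-1}\alpha_k g_k^T d_k+L\rho^{-2}\alpha_k^2\|d_k\|^2. \tag{23}$$
Using Theorem 2 again, we get
$$\alpha_k>\frac{\rho\|g_k\|^2}{(L+\delta)\|d_k\|^2}. \tag{24}$$
Let $m=\min\{\beta,\rho/(L+\delta)\}$; then the inequality (19) is obtained.
From Assumption 3(i), there exists a constant $M>0$ such that $|f(x)|<M$ for all $x\in S$. By (15), (19), and Theorem 2, we have
$$\sum_{k=0}^{n-1}\left(\delta m^2\frac{\|g_k\|^4}{\|d_k\|^4}\|d_k\|^2\right)\le\sum_{k=0}^{n-1}\left(\delta\alpha_k^2\|d_k\|^2\right)\le\sum_{k=0}^{n-1}(f_k-f_{k+1})<2M. \tag{25}$$
Therefore, from the above inequality, we have
$$\sum_{k=0}^{\infty}\frac{\|g_k\|^4}{\|d_k\|^2}<\infty. \tag{26}$$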
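Case 2. Let the step size $\alpha_k$ be computed by (10). Similar to the proof of the above case, we can obtain
$$\alpha_k\ge\frac{\|g_k\|^2}{\|d_k\|^2}\ \text{ if }\alpha_k=1,\qquad \alpha_k>\frac{\rho(1-\delta)\|g_k\|^2}{L\|d_k\|^2}\ \text{ if }\alpha_k<1. \tag{27}$$
Let $m=\min\{1,\rho(1-\delta)/L\}$; then the inequality (19) is obtained. From (10), (19), and Theorem 2, we obtain
$$\sum_{k=0}^{n-1}\left(\delta m\frac{\|g_k\|^2}{\|d_k\|^2}\|g_k\|^2\right)\le\sum_{k=0}^{n-1}\left(-\delta\alpha_k g_k^T d_k\right)\le\sum_{k=0}^{n-1}(f_k-f_{k+1})<2M. \tag{28}$$
By the above inequality, we can get (20). The proof is completed.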
Theorem 5.
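Suppose that Assumption 3 holds. If Algorithm 1 generates infinite sequences $\{d_k\}$ and $\{x_k\}$, then one has
$$\liminf_{k\to\infty}\|g_k\|=0. \tag{29}$$
Proof.
We prove (29) by contradiction. Suppose that (29) does not hold; then there exists a constant $\lambda_1>0$ such that $\|g_k\|\ge\lambda_1$ for all $k\ge 0$. From Assumption 3(i) and the continuity of $g$, there also exists a constant $\lambda_2>0$ such that $\|g_k\|\le\lambda_2$ for all $k\ge 0$. Since $d_k=-g_k+\beta_k g_{k-1}-\theta_k y_{k-1}$, we have
$$\begin{aligned}
\|d_k\| &\le \|g_k\|+|\beta_k|\|g_{k-1}\|+|\theta_k|\|y_{k-1}\|\\
&\le \|g_k\|+\frac{\|g_k\|(\|g_k\|+\|g_{k-1}\|)}{\|g_{k-1}\|^2}\|g_{k-1}\|+\frac{\|g_k\|^2}{\|g_{k-1}\|^2}(\|g_k\|+\|g_{k-1}\|)\\
&\le \lambda_2+\frac{2\lambda_2^2}{\lambda_1}+\frac{2\lambda_2^3}{\lambda_1^2}\triangleq M_1.
\end{aligned} \tag{30}$$
The above inequality implies
$$\sum_{k=0}^{\infty}\frac{\|g_k\|^4}{\|d_k\|^2}\ge\sum_{k=0}^{\infty}\frac{\lambda_1^4}{M_1^2}=+\infty, \tag{31}$$
which contradicts (20). This completes the proof.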
Remark 6.
If the search direction $d_k$ is defined by (13) with $\beta_k=-(g_k^T y_{k-1})/(g_{k-1}^T d_{k-1})$ and $\theta_k=-\|g_k\|^2/(g_{k-1}^T d_{k-1})$, then the sufficient descent property and global convergence can also be proved in a way similar to the proofs of Theorems 2 and 5.
4. Numerical Results
In this section, some numerical results are provided to test the performance of the proposed method, which is compared with the existing methods of [9–11]. For brevity, the proposed method is denoted by NSDM, and the methods it is compared with are denoted by LPRP [11], SSD [10], and MPRP [9]. The test problems and initial points are from [16]. The test problems are listed in Table 1. In our experiments, all the codes were written in MATLAB 7.0 and run on a PC with 2.00 GB of RAM, a 2.10 GHz CPU, and the Windows 7 operating system.
Table 1: The test problems.

| Number | Function name |
|---|---|
| P1 | Generalized Tridiagonal 1 |
| P2 | Extended Himmelblau |
| P3 | Liarwhd |
| P4 | Diagonal 7 |
| P5 | Diagonal 8 |
| P6 | Nonscomp |
| P7 | Cosine |
| P8 | Hager |
| P9 | Diagonal 2 |
| P10 | Raydan 1 |
| P11 | Extended Penalty |
| P12 | Diagonal 3 |
| P13 | Generalized Quartic |
| P14 | Power |
| P15 | Extended Denschnf |
| P16 | Perturbed Tridiagonal Quadratic |
| P17 | Extended Denschnb |
| P18 | Raydan 2 |
| P19 | Almost Perturbed Quadratic |
| P20 | Extended BD1 |
| P21 | Extended Tet |
| P22 | Extended Denschnb |
| P23 | Arwhead |
| P24 | Extended Tridiagonal 2 |
| P25 | Quartc |
| P26 | Extended Maratos |
| P27 | Engval 1 |
| P28 | Extended Quadratic Exponential EP1 |
In all algorithms, the step size $\alpha_k$ is computed by the modified Armijo-type line search (15) with $\delta=0.1$, $\rho=0.1$, and $\beta=1$, and the stopping condition is $\|g_k\|\le 10^{-5}$. We also stop an algorithm if its CPU time exceeds 500 seconds.
In Table 2, P, N, NI, NF, NG, and CPU stand for the label of the test problem, the dimension of the vectors, the number of iterations, the number of function evaluations, the number of gradient evaluations, and the CPU run time in seconds, respectively. The symbol “—” means that the corresponding method fails to solve the test problem within 500 seconds of CPU time, and the star * denotes that the numerical result is the best one among all the compared methods.
Table 2: The numerical results of the NSDM/LPRP/SSD/MPRP methods.

| P | N | NSDM (NI/NF/NG/CPU) | LPRP (NI/NF/NG/CPU) | SSD (NI/NF/NG/CPU) | MPRP (NI/NF/NG/CPU) |
|---|---|---|---|---|---|
| P1 | 400 | 57/164/58/1.934* | 70/199/71/2.098 | 65/187/66/2.337 | 74/210/75/2.984 |
| P2 | 1000 | 53/161/54/4.715* | 60/181/61/4.764 | 119/361/120/13.974 | 58/175/59/7.881 |
| P3 | 900 | 24/68/25/3.276* | 68/140/69/8.175 | 65/199/66/9.594 | 80/216/81/13.400 |
| P4 | 1000 | 36/73/37/5.990* | 41/83/42/6.053 | 41/83/42/7.410 | 41/83/42/8.424 |
| P5 | 900 | 29/59/30/3.946* | 36/73/37/4.352 | 36/73/37/5.336 | 36/73/37/6.052 |
| P6 | 300 | 70/213/71/1.424* | 108/310/109/1.921 | 293/879/294/6.316 | —/—/—/— |
| P7 | 4000 | 41/115/42/32.339* | 73/201/74/49.889 | 79/216/80/95.581 | 82/203/83/115.440 |
| P8 | 100 | 57/118/58/0.156 | 65/125/66/0.172 | 100/218/101/0.280 | 59/109/60/0.188 |
| P9 | 100 | 960/1108/961/2.606 | 780/781/781/1.888* | 1096/1266/1097/2.886 | 780/781/781/2.293 |
| P10 | 100 | 362/802/363/0.967 | 230/414/231/0.546 | 742/1578/743/1.872 | 151/266/152/0.437* |
| P11 | 1000 | 53/186/54/9.388* | 65/245/66/10.329 | 146/496/147/28.011 | 64/242/65/14.234 |
| P12 | 1000 | 44/89/45/7.896* | 49/99/50/7.933 | 49/99/50/9.718 | 49/99/50/10.955 |
| P13 | 3000 | 52/105/53/35.802 | 54/109/55/33.056 | 55/116/56/48.891 | 54/109/55/55.973 |
| P14 | 200 | 613/2798/614/5.210* | 839/4045/840/6.412 | 650/2990/651/5.491 | 1601/5914/1602/16.114 |
| P15 | 800 | 31/118/32/3.354* | 86/331/87/8.206 | 78/304/79/9.142 | 82/302/83/10.982 |
| P16 | 100 | 298/1015/299/0.796 | 157/504/158/0.374 | 499/1943/500/1.310 | 143/480/144/0.421 |
| P17 | 1000 | 67/135/68/5.523 | 71/143/72/5.210 | 70/141/71/7.317 | 70/141/71/8.486 |
| P18 | 3000 | 13/20/14/10.076 | 5/6/6/3.659* | 5/6/6/5.373 | 5/6/6/6.194 |
| P19 | 100 | 274/937/275/0.734 | 125/396/126/0.312* | 544/2281/545/1.435 | 141/448/142/0.421 |
| P20 | 3000 | 47/110/48/16.895 | 23/49/24/7.395* | 58/140/59/39.243 | 27/59/28/19.451 |
| P21 | 500 | 59/129/60/1.420 | 44/89/45/0.951* | 81/185/82/2.527 | 46/93/47/1.576 |
| P22 | 2000 | 69/139/70/23.469 | 73/147/74/22.386 | 73/147/74/32.423 | 73/147/74/36.179 |
| P23 | 500 | 183/914/184/8.221* | —/—/—/— | —/—/—/— | —/—/—/— |
| P24 | 500 | 87/176/88/4.072 | 74/136/75/3.089 | 338/678/339/16.957 | 60/109/61/3.463 |
| P25 | 100 | 3093/3096/3094/9.485 | 3145/3147/3146/8.159 | 3145/3147/3146/9.064 | 3145/3147/3146/9.984 |
| P26 | 100 | 293/1189/294/0.936 | 111/410/112/0.327* | —/—/—/— | 131/447/132/0.421 |
| P27 | 1000 | 78/184/79/12.699* | 92/238/93/13.478 | 101/267/102/18.533 | —/—/—/— |
| P28 | 200 | 17/70/18/0.092* | 29/121/30/0.137 | 29/121/30/0.183 | 29/121/30/0.198 |
In Table 2, we compare the performance of the new method on 28 different test problems. According to the distribution of the star *, one can see that the NSDM method performs better than the LPRP, MPRP, and SSD methods on 14 test problems, worse than the MPRP method on 1 test problem, and worse than the LPRP method on 6 test problems. There are also 7 test problems that are not marked by the symbol *. Among these 7 test problems, the NSDM method performs better than the other methods on 5 test problems in the number of iterations, 4 test problems in the number of function evaluations, 5 test problems in the number of gradient evaluations, and 1 test problem in CPU time.
In order to compare the performance of these methods clearly, we adopt the performance profiles introduced by Dolan and Moré [17]. The results are shown in Figures 1–4. In [17], Dolan and Moré introduced performance profiles as a means to evaluate and compare the performance of a set of solvers $S$ on a test set $P$. Assuming that there are $n_s$ solvers and $n_p$ problems, for each problem $p$ and solver $s$, they defined
$$t_{p,s}=\text{computing time (or the number of iterations, or another cost measure) required to solve problem } p \text{ by solver } s. \tag{32}$$
Figure 1: Performance profiles about the number of iterations.
Figure 2: Performance profiles about the number of function evaluations.
Figure 3: Performance profiles about the number of gradient evaluations.
Figure 4: Performance profiles about CPU time.
The performance ratio is given by
$$\gamma_{p,s}=\frac{t_{p,s}}{\min\{t_{p,s}: s\in S\}}. \tag{33}$$
Assume that a parameter $\gamma_M\ge\gamma_{p,s}$ for all $p,s$ is chosen, and that $\gamma_{p,s}=\gamma_M$ if and only if solver $s$ does not solve problem $p$. The performance profile is defined by
$$P_s(t)=\frac{1}{n_p}\operatorname{size}\{p\in P:\gamma_{p,s}\le t\}. \tag{34}$$
Hence, $P_s(t)$ is the probability for solver $s\in S$ that a performance ratio $\gamma_{p,s}$ is within a factor $t\in\mathbb{R}$ of the best possible ratio. The performance profile $P_s:\mathbb{R}\to[0,1]$ of a solver is a nondecreasing, piecewise constant function that is continuous from the right. The value of $P_s(1)$ is the probability that the solver will win over the rest of the solvers. In general, a solver with high values of $P_s(t)$, that is, one whose curve lies at the top right of the figure, is preferable.
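To make definitions (33) and (34) concrete, the following Python sketch computes performance profiles from a small cost table. It is an illustration only: the function name `performance_profile` is hypothetical, a failure is encoded as `math.inf` (which plays the role of $\gamma_M$ here), and each problem is assumed to be solved by at least one solver. The sample data are the iteration counts (NI) of problems P1, P6, P9, and P23 from Table 2.

```python
import math


def performance_profile(costs, taus):
    """costs[s][p] is t_{p,s}; math.inf marks a failure. Returns P_s(tau)."""
    solvers = list(costs)
    n_p = len(costs[solvers[0]])
    # Best cost per problem over all solvers, the denominator in (33).
    best = [min(costs[s][p] for s in solvers) for p in range(n_p)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n_p)]      # (33)
        profile[s] = [sum(r <= tau for r in ratios) / n_p         # (34)
                      for tau in taus]
    return profile


fail = math.inf
# NI values for P1, P6, P9, P23 taken from Table 2.
costs = {
    "NSDM": [57, 70, 960, 183],
    "LPRP": [70, 108, 780, fail],
    "SSD":  [65, 293, 1096, fail],
    "MPRP": [74, fail, 780, fail],
}
profile = performance_profile(costs, taus=[1.0, 1.25])
for name, values in profile.items():
    print(name, values)
# NSDM [0.75, 1.0]
# LPRP [0.25, 0.5]
# SSD [0.0, 0.25]
# MPRP [0.25, 0.25]
```

On this four-problem subset, the first value of each list is $P_s(1)$, the fraction of problems on which the solver attains the best iteration count.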
From Figures 1–4, we can see that the NSDM method performs better than the MPRP and SSD methods. Although the LPRP method outperforms the NSDM method for $1.2<t<2.4$ in Figure 1, $1.2<t<3.2$ in Figure 2, $1.2<t<2.2$ in Figure 3, and $1.1<t<2.8$ in Figure 4, the NSDM method is superior to the LPRP method on the remaining intervals. Moreover, Figures 1–4 show that the NSDM method solves 100% of the test problems, while the LPRP method solves about 96% of them. Hence, in terms of robustness, the NSDM method is superior to the LPRP method. Comparing the values of $P_s(1)$ in Figures 1–4, one can conclude that the NSDM method is competitive with the others; for example, it attains the smallest number of iterations on at least 45% of the test problems. In summary, the analysis of the numerical results indicates that the presented method performs better than the LPRP, MPRP, and SSD methods on this test set.
5. Conclusions
In this paper, we have proposed a new formula (11) that can generate different search directions by taking different parameters. Based on this formula, we have proposed a new sufficient descent method for solving unconstrained optimization problems. At each iteration, the generated direction depends only on the gradients at two successive iterates. We have shown that this method is globally convergent. The numerical results indicate that the given method is superior to the compared methods on the test problems considered. In the future, we will study further iterative methods based on (11) and carry out the corresponding convergence analysis.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
The authors would like to thank the editor and the anonymous referees for their valuable comments and suggestions, which improved this paper greatly. This work is partly supported by the National Natural Science Foundation of China (11371071), the Natural Science Foundation of Liaoning Province (20102003), the Scientific Research Foundation of Liaoning Province Educational Department (L2013426), and the Graduate Innovation Foundation of Bohai University (201208).
References
[1] Z. Dai and F. Wen, Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property.
[2] K. Ueda and N. Yamashita, Convergence properties of the regularized Newton method for the unconstrained nonconvex optimization.
[3] M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem.
[4] Y. Xiao, Z. Wei, and Z. Wang, A limited memory BFGS-type method for large-scale unconstrained optimization.
[5] J. C. Gilbert and J. Nocedal, Global convergence properties of conjugate gradient methods for optimization.
[6] D.-H. Li and B.-S. Tian, n-step quadratic convergence of the MPRP method with a restart strategy.
[7] Z. Wei, S. Yao, and L. Liu, The convergence properties of some new conjugate gradient methods.
[8] J. Barzilai and J. M. Borwein, Two-point step size gradient methods.
[9] W. Cheng, A two-term PRP-based descent method.
[10] M.-L. Zhang, Y.-H. Xiao, and D. Zhou, A simple sufficient descent method for unconstrained optimization.
[11] L. Zhang, W. Zhou, and D.-H. Li, A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence.
[12] X.-M. An, D.-H. Li, and Y. Xiao, Sufficient descent directions in unconstrained optimization.
[13] Z. Dai, Two modified Polak-Ribière-Polyak-type nonlinear conjugate methods with sufficient descent property.
[14] Z. Wan, C. Hu, and Z. Yang, A spectral PRP conjugate gradient methods for nonconvex optimization problem based on modified line search.
[15] Z. Wei, G. Li, and L. Qi, New nonlinear conjugate gradient formulas for large-scale unconstrained optimization problems.
[16] N. Andrei, An unconstrained optimization test functions collection.
[17] E. D. Dolan and J. J. Moré, Benchmarking optimization software with performance profiles.