A New Method with Sufficient Descent Property for Unconstrained Optimization

Weiyi Qian and Haijuan Cui

College of Mathematics and Physics, Bohai University, Jinzhou 121000, China (bhu.edu.cn)

Research Article. Abstract and Applied Analysis, vol. 2014, Article ID 940120. Hindawi Publishing Corporation. ISSN 1687-0409, 1085-3375. doi:10.1155/2014/940120

Academic Editor: Adrian Petrusel

Received 14 September 2013; Revised 21 December 2013; Accepted 4 January 2014; Published 13 February 2014

Copyright © 2014 Weiyi Qian and Haijuan Cui. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The sufficient descent property has recently come to play an important role in the global convergence analysis of some iterative methods. In this paper, we propose a new iterative method for solving unconstrained optimization problems. The method provides a sufficient descent direction for the objective function. Moreover, the global convergence of the proposed method is established under some appropriate conditions. We also report numerical results and compare the performance of the proposed method with that of some existing methods. The numerical results indicate that the presented method is efficient.

1. Introduction

Consider the unconstrained optimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. For solving (1), the following iterative formula is often used:
$$x_{k+1} = x_k + \alpha_k d_k, \quad k = 0, 1, 2, \ldots, \tag{2}$$
where $x_k$ is the current iterate, $\alpha_k > 0$ is a step size determined by some line search, and $d_k$ is a search direction. Different search directions correspond to different iterative methods. Throughout this paper, $g_k = \nabla f(x_k)$ is an $n$-dimensional column vector, $y_{k-1} = g_k - g_{k-1}$, and $\|\cdot\|$ and $T$ denote the Euclidean norm and the transpose of vectors, respectively. Generally, if there exists a constant $c > 0$ such that
$$g_k^T d_k \le -c \|g_k\|^2, \tag{3}$$
then the search direction $d_k$ is said to possess the sufficient descent property. This property may be crucial for an iterative method to be globally convergent, and some numerical experiments have shown that sufficient descent methods are efficient. However, not all iterative methods satisfy the sufficient descent condition (3) under inexact line search conditions; examples are the conjugate gradient method proposed by Wei et al. [7] and the gradient method presented in [8]. Much effort has been made to ensure that the search direction $d_k$ satisfies condition (3) at each step.

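In [9], Cheng proposed a modified PRP conjugate gradient method in which the search direction $d_k$ is determined by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k \left( I - \dfrac{g_k g_k^T}{\|g_k\|^2} \right) d_{k-1}, & k \ge 1, \end{cases} \tag{4}$$
where $\beta_k = \beta_k^{PRP} = g_k^T y_{k-1} / \|g_{k-1}\|^2$, $g_k g_k^T$ is an $n \times n$ matrix, and $I$ is the identity matrix.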

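In [10], Zhang et al. derived a simple sufficient descent method; the search direction $d_k$ is given by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \left( I - \dfrac{g_k g_k^T}{\|g_k\|^2} \right) g_{k-1}, & k \ge 1. \end{cases} \tag{5}$$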

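Recently, Zhang et al. [11] presented a three-term modified PRP conjugate gradient method; the search direction $d_k$ is generated by
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1} - \theta_k y_{k-1}, & k \ge 1, \end{cases} \tag{6}$$
where
$$\beta_k = \beta_k^{PRP} = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad \theta_k = \frac{g_k^T d_{k-1}}{\|g_{k-1}\|^2}. \tag{7}$$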

We note that (4), (5), and (6) can each be written as a linear combination of the steepest descent direction and a projection of an original direction; that is,
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \lambda_k \left( I - \dfrac{\mu_k g_k^T}{\mu_k^T g_k} \right) \bar{d}_k, & k \ge 1, \end{cases} \tag{8}$$
where $\bar{d}_k$ is an original direction, $\lambda_k$ is a scalar, and $\mu_k \in \mathbb{R}^n$ is any vector such that $\mu_k^T g_k \ne 0$. Indeed, if $\lambda_k = \beta_k^{PRP}$, $\mu_k = g_k$, and $\bar{d}_k = d_{k-1}$, then (8) reduces to the method (4). If $\lambda_k = 1$, $\mu_k = g_k$, and $\bar{d}_k = g_{k-1}$, then (8) reduces to the method (5). If $\lambda_k = \beta_k^{PRP}$, $\mu_k = y_{k-1}$, and $\bar{d}_k = d_{k-1}$, it is easy to deduce that (8) reduces to the method (6). From (8), we easily obtain
$$g_k^T \left( \lambda_k \left( I - \frac{\mu_k g_k^T}{\mu_k^T g_k} \right) \bar{d}_k \right) = 0. \tag{9}$$
Thus $g_k^T d_k = -\|g_k\|^2$ for all $k$, which implies that the sufficient descent condition (3) holds with $c = 1$. However, the method (5) does not possess a restart feature, which can avoid the jamming phenomenon. In addition, the methods (4) and (6) may not always be globally convergent under some inexact line searches, such as the standard Armijo-type line search, which is given as follows:
$$\alpha_k = \max\{\rho^j,\ j = 0, 1, 2, \ldots\}, \qquad f(x_k + \alpha_k d_k) \le f_k + \delta \alpha_k g_k^T d_k, \tag{10}$$
where $\rho \in (0, 1)$ and $\delta \in (0, 1/2)$.
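The orthogonality in (9) can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dot` and `projected_direction` and the sample vectors are illustrative assumptions, not part of the paper.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def projected_direction(g, lam, mu, d_bar):
    """Direction (8) for k >= 1: -g + lam * (I - mu g^T / (mu^T g)) d_bar."""
    mu_g = dot(mu, g)
    if mu_g == 0.0:
        raise ValueError("mu^T g must be nonzero")
    scale = dot(g, d_bar) / mu_g  # (g^T d_bar) / (mu^T g)
    return [-gi + lam * (di - scale * mi) for gi, di, mi in zip(g, d_bar, mu)]

# Arbitrary sample data (hypothetical values chosen for illustration).
g = [1.0, -2.0, 0.5]
mu = [0.3, 0.1, -1.0]
d_bar = [2.0, 1.0, -1.0]
d = projected_direction(g, lam=0.7, mu=mu, d_bar=d_bar)

# g^T d + ||g||^2 should vanish up to rounding, whatever lam, mu, d_bar are.
gap = dot(g, d) + dot(g, g)
```

Changing `lam`, `mu`, or `d_bar` leaves `gap` at rounding level, which is the content of (9): the projected term contributes nothing to $g_k^T d_k$.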

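Motivated by (8) and (9), our purpose is to design a direction in the set $\{d \in \mathbb{R}^n \mid g_k^T d = -t_k\}$, where $t_k \ge 0$ is a parameter. This direction can be written as
$$\hat{d}_k = \lambda_k \left( I - \frac{\mu_k g_k^T}{\mu_k^T g_k} \right) \bar{d}_k - t_k \frac{v_k}{v_k^T g_k}, \tag{11}$$
where $v_k \in \mathbb{R}^n$ is any vector such that $v_k^T g_k \ne 0$. Let
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \hat{d}_k, & k \ge 1. \end{cases} \tag{12}$$
It is clear that (8) can be regarded as a special case of (12) with $t_k = 0$. Therefore, (12) has a wider application than (8). If we take $\lambda_k = \beta_k^{PRP}$, $\mu_k = y_{k-1}$, $v_k = y_{k-1}$, $\bar{d}_k = g_{k-1}$, and $t_k = (g_k^T y_{k-1})^2 / \|g_{k-1}\|^2$ in (12), then a new search direction is given as follows:
$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k g_{k-1} - \theta_k y_{k-1}, & k \ge 1, \end{cases} \tag{13}$$
where
$$\beta_k = \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}, \qquad \theta_k = \frac{\|g_k\|^2}{\|g_{k-1}\|^2}. \tag{14}$$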
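For readers who want the intermediate steps, the reduction of (11) and (12) to (13) and (14) under the stated parameter choice can be sketched as follows. The sketch assumes $y_{k-1}^T g_k \ne 0$ and uses only $y_{k-1} = g_k - g_{k-1}$, so that $g_k^T g_{k-1} + g_k^T y_{k-1} = \|g_k\|^2$:

$$\begin{aligned}
\hat{d}_k &= \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2} \left( g_{k-1} - \frac{g_k^T g_{k-1}}{y_{k-1}^T g_k}\, y_{k-1} \right) - \frac{(g_k^T y_{k-1})^2}{\|g_{k-1}\|^2} \cdot \frac{y_{k-1}}{y_{k-1}^T g_k} \\
&= \beta_k g_{k-1} - \frac{g_k^T g_{k-1} + g_k^T y_{k-1}}{\|g_{k-1}\|^2}\, y_{k-1} \\
&= \beta_k g_{k-1} - \frac{\|g_k\|^2}{\|g_{k-1}\|^2}\, y_{k-1}.
\end{aligned}$$

The final expression no longer contains the factor $y_{k-1}^T g_k$ in a denominator, so the resulting formulas remain well defined even when that inner product vanishes.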

In this paper, we present a new iterative method for unconstrained optimization problems; the search direction is defined by (13) and (14). We prove that $d_k$ satisfies $g_k^T d_k \le -\|g_k\|^2$ independently of the line search used, which means that the sufficient descent condition (3) holds with $c = 1$. Furthermore, we prove that the proposed method is globally convergent under either the standard Armijo-type line search or the modified Armijo-type line search. From (13) and (14), we can see that the proposed method has a restart feature that directly addresses the jamming problem. In fact, when the step $x_k - x_{k-1}$ is small, the vector $y_{k-1}$ tends to the zero vector, and the direction $d_k$ generated by (13) is then very close to the steepest descent direction $-g_k$.

The rest of this paper is organized as follows. In Section 2, we propose a new algorithm and discuss its sufficient descent property. In Section 3, the global convergence of the proposed method is proved under the modified Armijo-type line search or the standard Armijo-type line search. Numerical results that test the performance of the proposed method are given in Section 4. Finally, Section 5 gives some conclusions about the proposed method.

2. New Algorithm

In this section, the specific iterative steps of the proposed algorithm are listed as follows.

Algorithm 1.


Step 1. Choose parameters $\delta \in (0, 1)$, $\rho \in (0, 1)$, and $\beta > 0$, and an initial point $x_0 \in \mathbb{R}^n$. Set $d_0 = -g_0$ and $k := 0$.

Step 2. If $\|g_k\| = 0$, then stop; otherwise go to the next step.

Step 3. Determine a step size $\alpha_k$ satisfying the modified Armijo-type line search conditions:
$$\alpha_k = \max\{\beta \rho^j,\ j = 0, 1, 2, \ldots\}, \qquad f(x_k + \alpha_k d_k) \le f(x_k) - \delta \alpha_k^2 \|d_k\|^2. \tag{15}$$

Step 4. Let $x_{k+1} = x_k + \alpha_k d_k$.

Step 5. Calculate the search direction $d_{k+1}$ by (13) and (14).

Step 6. Set k:=k+1, and go to Step 2.
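The steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' MATLAB code: the function name `nsdm`, the tolerance `tol` (which relaxes the exact test in Step 2), the iteration cap `max_iter`, and the small quadratic test function are all hypothetical choices.

```python
import math

def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def nsdm(f, grad, x0, delta=0.1, rho=0.1, beta=1.0, tol=1e-5, max_iter=10000):
    """Sketch of Algorithm 1 with direction (13)-(14) and line search (15)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                      # Step 1: d_0 = -g_0
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= tol:        # Step 2, relaxed to a tolerance
            return x, k
        # Step 3: largest alpha in {beta * rho^j} such that
        # f(x + alpha d) <= f(x) - delta * alpha^2 * ||d||^2.
        fx, dd, alpha = f(x), dot(d, d), beta
        while True:
            x_new = [xi + alpha * di for xi, di in zip(x, d)]
            if f(x_new) <= fx - delta * alpha ** 2 * dd:
                break
            alpha *= rho
        # Step 4: accept the new iterate and evaluate its gradient.
        g_prev, x, g = g, x_new, grad(x_new)
        # Step 5: new direction from (13) and (14).
        y = [a - b for a, b in zip(g, g_prev)]
        denom = dot(g_prev, g_prev)            # nonzero because ||g_prev|| > tol
        beta_k = dot(g, y) / denom
        theta_k = dot(g, g) / denom
        d = [-gi + beta_k * gp - theta_k * yi
             for gi, gp, yi in zip(g, g_prev, y)]
    return x, max_iter

# Hypothetical test problem: a convex quadratic with minimizer (1, -2).
def f(x):
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2

def grad(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]

x_star, iters = nsdm(f, grad, [0.0, 0.0])
```

The default values of `delta`, `rho`, and `beta` mirror the settings reported in Section 4; only the gradients of two successive points are stored, as the direction (13) requires.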

Theorem 2.

Let the sequences $\{d_k\}$ and $\{x_k\}$ be generated by (13) and (2); then
$$g_k^T d_k \le -\|g_k\|^2 \tag{16}$$
for all $k \ge 0$.

Proof.

Obviously, the conclusion is true for k=0.

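If $k \ge 1$, multiplying (13) by $g_k^T$, we have
$$\begin{aligned}
g_k^T d_k &= -\|g_k\|^2 + g_k^T (\beta_k g_{k-1} - \theta_k y_{k-1}) \\
&= -\|g_k\|^2 + \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}\, g_k^T g_{k-1} - \frac{\|g_k\|^2}{\|g_{k-1}\|^2}\, g_k^T y_{k-1} \\
&= -\|g_k\|^2 + \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2} \left( g_k^T g_{k-1} - \|g_k\|^2 \right) \\
&= -\|g_k\|^2 + \frac{g_k^T y_{k-1}}{\|g_{k-1}\|^2}\, g_k^T (g_{k-1} - g_k) \\
&= -\|g_k\|^2 - \frac{(g_k^T y_{k-1})^2}{\|g_{k-1}\|^2} \le -\|g_k\|^2.
\end{aligned} \tag{17}$$
Therefore, the inequality (16) holds for all $k \ge 0$. The proof is completed.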

Theorem 2 shows that the search direction $d_k$ given by (13) possesses the sufficient descent property for any line search.
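The identity behind Theorem 2 can also be checked numerically for a single pair of gradients. The sketch below uses plain Python; the helper names and the two sample gradient vectors are assumptions made for illustration.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def nsdm_direction(g, g_prev):
    """Direction (13) for k >= 1, with beta_k and theta_k taken from (14)."""
    y = [a - b for a, b in zip(g, g_prev)]
    denom = dot(g_prev, g_prev)
    beta_k = dot(g, y) / denom
    theta_k = dot(g, g) / denom
    return [-gi + beta_k * gp - theta_k * yi
            for gi, gp, yi in zip(g, g_prev, y)]

# Hypothetical successive gradients g_{k-1} and g_k.
g_prev = [-2.0, 8.0]
g = [-1.6, 4.8]
d = nsdm_direction(g, g_prev)

y = [a - b for a, b in zip(g, g_prev)]
lhs = dot(g, d)                                          # g_k^T d_k
rhs = -dot(g, g) - dot(g, y) ** 2 / dot(g_prev, g_prev)  # last line of (17)
```

Here `lhs` and `rhs` agree up to rounding, and both lie below $-\|g_k\|^2$, with a gap of $(g_k^T y_{k-1})^2 / \|g_{k-1}\|^2$ as in the last line of (17).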

3. Convergence Analysis

The following assumptions are often needed to prove the global convergence of nonlinear conjugate gradient methods [14, 15]. In this section, we also use these assumptions in the convergence analysis of the proposed method.

Assumption 3.


(i) The level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.

(ii) In a neighborhood $N$ of $S$, the function $f$ is continuously differentiable and its gradient is Lipschitz continuous; namely, there exists a constant $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|, \quad \forall x, y \in N. \tag{18}$$

Lemma 4.

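Suppose that Assumption 3 holds. Let $\{x_k\}$ and $\{d_k\}$ be generated by Algorithm 1. If the step size $\alpha_k$ is obtained by (15) or (10), then there exists a constant $m > 0$ such that
$$\alpha_k \ge m \frac{\|g_k\|^2}{\|d_k\|^2}, \tag{19}$$
and one also has
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{20}$$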

Proof.

The results of this lemma will be proved in the following two cases.

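Case 1. Let the step size $\alpha_k$ be computed by (15). From Theorem 2 and the Cauchy-Schwarz inequality, we have $\|g_k\| \|d_k\| \ge -g_k^T d_k \ge \|g_k\|^2$; thus $\|d_k\| \ge \|g_k\|$. If $\alpha_k = \beta$, then we obtain $\alpha_k \ge \beta \|g_k\|^2 / \|d_k\|^2$. If $\alpha_k < \beta$, then $\rho^{-1} \alpha_k$ does not satisfy the inequality in (15). So we have
$$f(x_k + \rho^{-1} \alpha_k d_k) - f_k > -\delta \alpha_k^2 \rho^{-2} \|d_k\|^2. \tag{21}$$
By Assumption 3(ii) and the mean value theorem, we have
$$\begin{aligned}
f(x_k + \rho^{-1} \alpha_k d_k) - f_k &= \rho^{-1} \alpha_k\, g(x_k + t_k \rho^{-1} \alpha_k d_k)^T d_k \\
&= \rho^{-1} \alpha_k g_k^T d_k + \rho^{-1} \alpha_k \left( g(x_k + t_k \rho^{-1} \alpha_k d_k) - g_k \right)^T d_k \\
&\le \rho^{-1} \alpha_k g_k^T d_k + L \rho^{-2} \alpha_k^2 \|d_k\|^2,
\end{aligned} \tag{22}$$
where $t_k \in (0, 1)$.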

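From (21) and (22), we have
$$-\delta \alpha_k^2 \rho^{-2} \|d_k\|^2 < \rho^{-1} \alpha_k g_k^T d_k + L \rho^{-2} \alpha_k^2 \|d_k\|^2. \tag{23}$$
Using Theorem 2 again, we get
$$\alpha_k > \frac{\rho \|g_k\|^2}{(L + \delta) \|d_k\|^2}. \tag{24}$$
Let $m = \min\{\beta, \rho / (L + \delta)\}$; then the inequality (19) is obtained.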

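From Assumption 3(i), there exists a constant $M > 0$ such that $|f(x)| < M$ for all $x \in S$. By (15), (19), and Theorem 2, we have
$$\sum_{k=0}^{n-1} \left( \delta m^2 \frac{\|g_k\|^4}{\|d_k\|^4} \|d_k\|^2 \right) \le \sum_{k=0}^{n-1} \left( \delta \alpha_k^2 \|d_k\|^2 \right) \le \sum_{k=0}^{n-1} (f_k - f_{k+1}) < 2M. \tag{25}$$
Therefore, from the above inequality, we have
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} < \infty. \tag{26}$$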

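Case 2. Let the step size $\alpha_k$ be computed by (10). Similar to the proof of the above case, we can obtain
$$\alpha_k \ge \frac{\|g_k\|^2}{\|d_k\|^2} \ \text{ if } \alpha_k = 1, \qquad \alpha_k > \frac{\rho (1 - \delta) \|g_k\|^2}{L \|d_k\|^2} \ \text{ if } \alpha_k < 1. \tag{27}$$
Let $m = \min\{1, \rho (1 - \delta) / L\}$; then the inequality (19) is obtained. From (10), (19), and Theorem 2, we obtain
$$\sum_{k=0}^{n-1} \left( \delta m \frac{\|g_k\|^2}{\|d_k\|^2} \|g_k\|^2 \right) \le \sum_{k=0}^{n-1} \left( -\delta \alpha_k g_k^T d_k \right) \le \sum_{k=0}^{n-1} (f_k - f_{k+1}) < 2M. \tag{28}$$
By the above inequality, we get (20). The proof is completed.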
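The step summarized as "similar to the proof of the above case" can be filled in along the following lines; this is a sketch of the omitted argument, reusing the mean value estimate (22), and not a quotation from the paper. If $\alpha_k = 1$, the bound follows from $\|d_k\| \ge \|g_k\|$. If $\alpha_k < 1$, then $\rho^{-1} \alpha_k$ violates the inequality in (10), so

$$\begin{aligned}
\delta \rho^{-1} \alpha_k g_k^T d_k &< f(x_k + \rho^{-1} \alpha_k d_k) - f_k \le \rho^{-1} \alpha_k g_k^T d_k + L \rho^{-2} \alpha_k^2 \|d_k\|^2, \\
\text{hence} \quad -(1 - \delta)\, g_k^T d_k &< L \rho^{-1} \alpha_k \|d_k\|^2, \\
\text{and} \quad \alpha_k &> \frac{\rho (1 - \delta)(-g_k^T d_k)}{L \|d_k\|^2} \ge \frac{\rho (1 - \delta) \|g_k\|^2}{L \|d_k\|^2},
\end{aligned}$$

where the last inequality uses $-g_k^T d_k \ge \|g_k\|^2$ from Theorem 2.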

Theorem 5.

Suppose that Assumption 3 holds. If Algorithm 1 generates infinite sequences $\{d_k\}$ and $\{x_k\}$, then one has
$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{29}$$

Proof.

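We prove (29) by contradiction. Suppose that (29) does not hold; then there exists a constant $\lambda_1 > 0$ such that $\|g_k\| \ge \lambda_1$ for all $k \ge 0$. From Assumption 3(i), we know that there also exists a constant $\lambda_2 > 0$ such that $\|g_k\| \le \lambda_2$ for all $k \ge 0$. Since $d_k = -g_k + \beta_k g_{k-1} - \theta_k y_{k-1}$, we have
$$\begin{aligned}
\|d_k\| &\le \|g_k\| + |\beta_k| \|g_{k-1}\| + |\theta_k| \|y_{k-1}\| \\
&\le \|g_k\| + \frac{\|g_k\| (\|g_k\| + \|g_{k-1}\|)}{\|g_{k-1}\|^2} \|g_{k-1}\| + \frac{\|g_k\|^2}{\|g_{k-1}\|^2} (\|g_k\| + \|g_{k-1}\|) \\
&\le \lambda_2 + \frac{2 \lambda_2^2}{\lambda_1} + \frac{2 \lambda_2^3}{\lambda_1^2} \triangleq M_1.
\end{aligned} \tag{30}$$
The above inequality implies
$$\sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|d_k\|^2} \ge \sum_{k=0}^{\infty} \frac{\lambda_1^4}{M_1^2} = \infty, \tag{31}$$
which contradicts (20). This completes the proof.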

Remark 6.

If the search direction $d_k$ is defined by (13) with $\beta_k = -(g_k^T y_{k-1}) / (g_{k-1}^T d_{k-1})$ and $\theta_k = -\|g_k\|^2 / (g_{k-1}^T d_{k-1})$, then the sufficient descent property and global convergence can also be proved in a way similar to the proofs of Theorems 2 and 5.
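As a sketch of why the sufficient descent property carries over to the variant in Remark 6 (this computation is not spelled out in the paper), multiply (13) by $g_k^T$ with the modified $\beta_k$ and $\theta_k$ and use $g_k^T y_{k-1} = \|g_k\|^2 - g_k^T g_{k-1}$:

$$\begin{aligned}
g_k^T d_k &= -\|g_k\|^2 - \frac{g_k^T y_{k-1}}{g_{k-1}^T d_{k-1}}\, g_k^T g_{k-1} + \frac{\|g_k\|^2}{g_{k-1}^T d_{k-1}}\, g_k^T y_{k-1} \\
&= -\|g_k\|^2 + \frac{g_k^T y_{k-1}}{g_{k-1}^T d_{k-1}} \left( \|g_k\|^2 - g_k^T g_{k-1} \right) \\
&= -\|g_k\|^2 + \frac{(g_k^T y_{k-1})^2}{g_{k-1}^T d_{k-1}}.
\end{aligned}$$

Since $d_0 = -g_0$ gives $g_0^T d_0 = -\|g_0\|^2 < 0$, an induction on $k$ shows $g_{k-1}^T d_{k-1} < 0$ as long as the gradients are nonzero, so the last term is nonpositive and $g_k^T d_k \le -\|g_k\|^2$.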

4. Numerical Results

In this section, some numerical results are provided to test the performance of the proposed method, which is compared with the existing methods [9–11]. For simplicity, the proposed method and the comparison methods are named NSDM, LPRP [9], SSD [10], and MPRP [11], respectively. The test problems and initial points are from [16]. The test problems are listed in Table 1. In our experiments, all codes were written in MATLAB 7.0 and run on a PC with 2.00 GB of RAM, a 2.10 GHz CPU, and the Windows 7 operating system.

Table 1: The test problems.

Number Function name
P1 Generalized Tridiagonal 1
P2 Extended Himmelblau
P3 Liarwhd
P4 Diagonal 7
P5 Diagonal 8
P6 Nonscomp
P7 Cosine
P8 Hager
P9 Diagonal 2
P10 Raydan 1
P11 Extended Penalty
P12 Diagonal 3
P13 Generalized Quartic
P14 Power
P15 Extended Denschnf
P17 Extended Denschnb
P18 Raydan 2
P20 Extended BD1
P21 Extended Tet
P22 Extended Denschnb
P24 Extended Tridiagonal 2
P25 Quartc
P26 Extended Maratos
P27 Engval 1

In all algorithms, the step size $\alpha_k$ is computed by the modified Armijo-type line search (15) with $\delta = 0.1$, $\rho = 0.1$, and $\beta = 1$, and the stopping condition is $\|g_k\| \le 10^{-5}$. We also stop an algorithm if its CPU time exceeds 500 seconds.

In Table 2, P, N, NI, NF, NG, and CPU stand for the test problem number, the dimension of the vectors, the number of iterations, the number of function evaluations, the number of gradient evaluations, and the CPU run time in seconds, respectively. The symbol “—” means that the corresponding method fails to solve the test problem within 500 seconds of CPU time, and the star * denotes the best numerical result among all the compared methods.

Table 2: The numerical results of the NSDM/LPRP/SSD/MPRP methods.

P N NSDM LPRP SSD MPRP
NI/NF/NG/CPU NI/NF/NG/CPU NI/NF/NG/CPU NI/NF/NG/CPU
P1 400 57/164/58/1.934* 70/199/71/2.098 65/187/66/2.337 74/210/75/2.984
P2 1000 53/161/54/4.715* 60/181/61/4.764 119/361/120/13.974 58/175/59/7.881
P3 900 24/68/25/3.276* 68/140/69/8.175 65/199/66/9.594 80/216/81/13.400
P4 1000 36/73/37/5.990* 41/83/42/6.053 41/83/42/7.410 41/83/42/8.424
P5 900 29/59/30/3.946* 36/73/37/4.352 36/73/37/5.336 36/73/37/6.052
P6 300 70/213/71/1.424* 108/310/109/1.921 293/879/294/6.316 —/—/—/—
P7 4000 41/115/42/32.339* 73/201/74/49.889 79/216/80/95.581 82/203/83/115.440
P8 100 57/118/58/0.156 65/125/66/0.172 100/218/101/0.280 59/109/60/0.188
P9 100 960/1108/961/2.606 780/781/781/1.888* 1096/1266/1097/2.886 780/781/781/2.293
P10 100 362/802/363/0.967 230/414/231/0.546 742/1578/743/1.872 151/266/152/0.437*
P11 1000 53/186/54/9.388* 65/245/66/10.329 146/496/147/28.011 64/242/65/14.234
P12 1000 44/89/45/7.896* 49/99/50/7.933 49/99/50/9.718 49/99/50/10.955
P13 3000 52/105/53/35.802 54/109/55/33.056 55/116/56/48.891 54/109/55/55.973
P14 200 613/2798/614/5.210* 839/4045/840/6.412 650/2990/651/5.491 1601/5914/1602/16.114
P15 800 31/118/32/3.354* 86/331/87/8.206 78/304/79/9.142 82/302/83/10.982
P16 100 298/1015/299/0.796 157/504/158/0.374 499/1943/500/1.310 143/480/144/0.421
P17 1000 67/135/68/5.523 71/143/72/5.210 70/141/71/7.317 70/141/71/8.486
P18 3000 13/20/14/10.076 5/6/6/3.659* 5/6/6/5.373 5/6/6/6.194
P19 100 274/937/275/0.734 125/396/126/0.312* 544/2281/545/1.435 141/448/142/0.421
P20 3000 47/110/48/16.895 23/49/24/7.395* 58/140/59/39.243 27/59/28/19.451
P21 500 59/129/60/1.420 44/89/45/0.951* 81/185/82/2.527 46/93/47/1.576
P22 2000 69/139/70/23.469 73/147/74/22.386 73/147/74/32.423 73/147/74/36.179
P23 500 183/914/184/8.221* —/—/—/— —/—/—/— —/—/—/—
P24 500 87/176/88/4.072 74/136/75/3.089 338/678/339/16.957 60/109/61/3.463
P25 100 3093/3096/3094/9.485 3145/3147/3146/8.159 3145/3147/3146/9.064 3145/3147/3146/9.984
P26 100 293/1189/294/0.936 111/410/112/0.327* —/—/—/— 131/447/132/0.421
P27 1000 78/184/79/12.699* 92/238/93/13.478 101/267/102/18.533 —/—/—/—
P28 200 17/70/18/0.092* 29/121/30/0.137 29/121/30/0.183 29/121/30/0.198

In Table 2, we compare the performance of the new method on 28 different test problems. According to the distribution of the star *, the NSDM method performs better than the LPRP, MPRP, and SSD methods on 14 test problems, worse than the MPRP method on 1 test problem, and worse than the LPRP method on 6 test problems. The remaining 7 test problems are not marked by the symbol *. Among these 7 test problems, the NSDM method performs better than the other methods on 5 test problems in the number of iterations, on 4 test problems in the number of function evaluations, on 5 test problems in the number of gradient evaluations, and on 1 test problem in CPU time.

In order to compare the performance of these methods clearly, we adopt the performance profiles introduced by Dolan and Moré [17]. The performance results are shown in Figures 1–4. In [17], Dolan and Moré introduced this notion as a means to evaluate and compare the performance of a set of solvers $S$ on a test set $P$. Assuming that there are $n_s$ solvers and $n_p$ problems, for each problem $p$ and solver $s$, they defined
$$t_{p,s} = \text{computing time (the number of iterations or others) required to solve problem } p \text{ by solver } s. \tag{32}$$

Figure 1: Performance profiles for the number of iterations.

Figure 2: Performance profiles for the number of function evaluations.

The performance ratio is given by
$$\gamma_{p,s} = \frac{t_{p,s}}{\min\{t_{p,s} : s \in S\}}. \tag{33}$$
Assume that a parameter $\gamma_M \ge \gamma_{p,s}$ for all $p, s$ is chosen, and $\gamma_{p,s} = \gamma_M$ if and only if solver $s$ does not solve problem $p$. The performance profile is defined by
$$P_s(t) = \frac{1}{n_p} \operatorname{size}\{p \in P : \gamma_{p,s} \le t\}. \tag{34}$$
Hence, $P_s(t)$ is the probability for solver $s \in S$ that a performance ratio $\gamma_{p,s}$ is within a factor $t \in \mathbb{R}$ of the best possible ratio. The performance profile $P_s : \mathbb{R} \to [0, 1]$ of a solver is nondecreasing, piecewise constant, and continuous from the right. The value of $P_s(1)$ is the probability that the solver will win over the rest of the solvers. In general, a solver with high values of $P_s(t)$, that is, one whose curve lies toward the top of the figure, is preferable.
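The definitions (33) and (34) translate into a few lines of code. The following plain-Python sketch is illustrative only: the function name `performance_profile` is hypothetical, failed runs are encoded as `None`, and their ratio is set to infinity, which plays the role of $\gamma_M$ for any finite $t$. The sample costs are the iteration counts (NI) of problems P1, P6, and P23 from Table 2.

```python
def performance_profile(costs, taus):
    """Evaluate the profile (34) on a grid of factors.

    costs[p][s] is the positive cost t_{p,s} of solver s on problem p,
    or None if the solver failed. Returns profile[s][i], the fraction of
    problems whose ratio (33) for solver s is at most taus[i].
    """
    n_p, n_s = len(costs), len(costs[0])
    ratios = []
    for row in costs:
        # Assumes at least one solver succeeded on every problem.
        best = min(t for t in row if t is not None)
        # Assumption: a failure gets ratio +inf, standing in for gamma_M.
        ratios.append([t / best if t is not None else float("inf")
                       for t in row])
    return [[sum(ratios[p][s] <= tau for p in range(n_p)) / n_p
             for tau in taus]
            for s in range(n_s)]

# NI for P1, P6, and P23 from Table 2, in the order NSDM, LPRP, SSD, MPRP.
costs = [
    [57, 70, 65, 74],
    [70, 108, 293, None],
    [183, None, None, None],
]
profile = performance_profile(costs, taus=[1.0, 2.0])
```

On this three-problem subset, `profile[0]` is `[1.0, 1.0]` because NSDM has the smallest iteration count on each of the three problems, while LPRP reaches two thirds of the problems within a factor of 2. A full profile would use all 28 rows of Table 2 and a finer grid of factors.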

From Figures 1–4, we can see that the NSDM method performs better than the MPRP and SSD methods. Although the LPRP method outperforms the NSDM method for $1.2 < t < 2.4$ in Figure 1, $1.2 < t < 3.2$ in Figure 2, $1.2 < t < 2.2$ in Figure 3, and $1.1 < t < 2.8$ in Figure 4, the NSDM method is superior to the LPRP method on the remaining intervals. Moreover, Figures 1–4 show that the NSDM method solves 100% of the test problems, while the LPRP method solves about 96% of them. Hence, the NSDM method is superior to the LPRP method. Comparing the values of $P_s(1)$ in Figures 1–4, one can conclude that the NSDM method is competitive with the others; for example, in the number of iterations the NSDM method is the best solver on at least 45% of the problems. In summary, the analysis of the numerical results indicates that the presented method performs better than the LPRP, MPRP, and SSD methods on this test set.

5. Conclusions

In this paper, we have proposed a new formula (11) that can generate different search directions by choosing different parameters. Based on this formula, we have proposed a new sufficient descent method for solving unconstrained optimization problems. At each iteration, the generated direction depends only on the gradient information of two successive points. We have shown that this method is globally convergent. The numerical results indicate that the proposed method is superior to the other methods on the test problems. In the future, we will study further iterative methods based on (11) and carry out convergence analysis for them.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to thank the editor and the anonymous referees for their valuable comments and suggestions, which greatly improved this paper. This work is partly supported by the National Natural Science Foundation of China (11371071), the Natural Science Foundation of Liaoning Province (20102003), the Scientific Research Foundation of Liaoning Province Educational Department (L2013426), and the Graduate Innovation Foundation of Bohai University (201208).

References

[1] Z. Dai and F. Wen, "Another improved Wei-Yao-Liu nonlinear conjugate gradient method with sufficient descent property," Applied Mathematics and Computation, vol. 218, no. 14, pp. 7421–7430, 2012. doi:10.1016/j.amc.2011.12.091. MR2892710. Zbl 1254.65074.
[2] K. Ueda and N. Yamashita, "Convergence properties of the regularized Newton method for the unconstrained nonconvex optimization," Applied Mathematics and Optimization, vol. 62, no. 1, pp. 27–46, 2010. doi:10.1007/s00245-009-9094-9. MR2653894. Zbl 1228.90087.
[3] M. Raydan, "The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem," SIAM Journal on Optimization, vol. 7, no. 1, pp. 26–33, 1997. doi:10.1137/S1052623494266365. MR1430555. Zbl 0898.90119.
[4] Y. Xiao, Z. Wei, and Z. Wang, "A limited memory BFGS-type method for large-scale unconstrained optimization," Computers & Mathematics with Applications, vol. 56, no. 4, pp. 1001–1009, 2008. doi:10.1016/j.camwa.2008.01.028. MR2437872. Zbl 1155.90441.
[5] J. C. Gilbert and J. Nocedal, "Global convergence properties of conjugate gradient methods for optimization," SIAM Journal on Optimization, vol. 2, no. 1, pp. 21–42, 1992. doi:10.1137/0802003. MR1147881. Zbl 0767.90082.
[6] D.-H. Li and B.-S. Tian, "n-step quadratic convergence of the MPRP method with a restart strategy," Journal of Computational and Applied Mathematics, vol. 235, no. 17, pp. 4978–4990, 2011. doi:10.1016/j.cam.2011.04.026. MR2817305. Zbl 1221.65144.
[7] Z. Wei, S. Yao, and L. Liu, "The convergence properties of some new conjugate gradient methods," Applied Mathematics and Computation, vol. 183, no. 2, pp. 1341–1350, 2006. doi:10.1016/j.amc.2006.05.150. MR2294093. Zbl 1116.65073.
[8] J. Barzilai and J. M. Borwein, "Two-point step size gradient methods," IMA Journal of Numerical Analysis, vol. 8, no. 1, pp. 141–148, 1988. doi:10.1093/imanum/8.1.141. MR967848. Zbl 0638.65055.
[9] W. Cheng, "A two-term PRP-based descent method," Numerical Functional Analysis and Optimization, vol. 28, no. 11-12, pp. 1217–1230, 2007. doi:10.1080/01630560701749524. MR2372254. Zbl 1138.90028.
[10] M.-L. Zhang, Y.-H. Xiao, and D. Zhou, "A simple sufficient descent method for unconstrained optimization," Mathematical Problems in Engineering, vol. 2010, Article ID 684705, 9 pages, 2010. doi:10.1155/2010/684705. MR2740327. Zbl 1202.90246.
[11] L. Zhang, W. Zhou, and D.-H. Li, "A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence," IMA Journal of Numerical Analysis, vol. 26, no. 4, pp. 629–640, 2006. doi:10.1093/imanum/drl016. MR2263891. Zbl 1106.65056.
[12] X.-M. An, D.-H. Li, and Y. Xiao, "Sufficient descent directions in unconstrained optimization," Computational Optimization and Applications, vol. 48, no. 3, pp. 515–532, 2011. Supplementary material available online. doi:10.1007/s10589-009-9268-z. MR2784403. Zbl 1242.90223.
[13] Z. Dai, "Two modified Polak-Ribière-Polyak-type nonlinear conjugate methods with sufficient descent property," Numerical Functional Analysis and Optimization, vol. 31, no. 7–9, pp. 892–906, 2010. doi:10.1080/01630563.2010.498597. MR2683711. Zbl 1202.90239.
[14] Z. Wan, C. Hu, and Z. Yang, "A spectral PRP conjugate gradient methods for nonconvex optimization problem based on modified line search," Discrete and Continuous Dynamical Systems B, vol. 16, no. 4, pp. 1157–1169, 2011. doi:10.3934/dcdsb.2011.16.1157. MR2836104. Zbl 1229.90209.
[15] Z. Wei, G. Li, and L. Qi, "New nonlinear conjugate gradient formulas for large-scale unconstrained optimization problems," Applied Mathematics and Computation, vol. 179, no. 2, pp. 407–430, 2006. doi:10.1016/j.amc.2005.11.150. MR2293156. Zbl 1106.65055.
[16] N. Andrei, "An unconstrained optimization test functions collection," Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008. MR2424936. Zbl 1161.90486.
[17] E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201–213, 2002. doi:10.1007/s101070100263. MR1875515. Zbl 1049.90004.