Abstract and Applied Analysis, Hindawi Publishing Corporation, Volume 2014, Article ID 507102, doi:10.1155/2014/507102. ISSN 1085-3375 (print), 1687-0409 (online).

Research Article

The Hybrid BFGS-CG Method in Solving Unconstrained Optimization Problems

Mohd Asrul Hery Ibrahim (1) (http://orcid.org/0000-0002-6950-6937), Mustafa Mamat (2, 3), and Wah June Leong (4)

(1) School of Applied Sciences and Foundation, Infrastructure University Kuala Lumpur, 43000 Kajang, Malaysia (iukl.edu.my)
(2) Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Tembila Campus, 22200 Besut, Malaysia (unisza.edu.my)
(3) Department of Mathematics, Faculty of Science and Technology, Universiti Malaysia Terengganu (UMT), 21030 Kuala Terengganu, Malaysia (umt.edu.my)
(4) Department of Mathematics, Faculty of Science, Universiti Putra Malaysia (UPM), 43400 Serdang, Malaysia (upm.edu.my)

Academic Editor: Lucas Jódar

Received 22 April 2013; Revised 23 January 2014; Accepted 4 March 2014

Copyright © 2014 Mohd Asrul Hery Ibrahim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In solving large scale unconstrained optimization problems, the quasi-Newton method is known as one of the most efficient methods. Based on these properties, a new hybrid method, known as the BFGS-CG method, has been created by combining the search directions of the conjugate gradient and quasi-Newton methods. In comparison to the standard BFGS method and conjugate gradient methods, the BFGS-CG method shows significant improvement in the total number of iterations and CPU time required to solve large scale unconstrained optimization problems. We also prove that the hybrid method is globally convergent.

1. Introduction

The unconstrained optimization problem requires only the objective function:

(1) min_{x ∈ R^n} f(x),

where R^n is an n-dimensional Euclidean space and f : R^n → R is continuously differentiable. Iterative methods are used to solve (1). On the i-th iteration, an approximation point x_i is available and the (i+1)-th iterate is given by

(2) x_{i+1} = x_i + α_i d_i,

where d_i denotes the search direction and α_i the step size. The search direction must satisfy g_i^T d_i < 0, which guarantees that d_i is a descent direction of f(x) at x_i. Different choices of d_i and α_i yield different convergence properties. Generally, the first-order condition ∇f(x*) = 0 is used to check for local convergence to a stationary point x*. There are many ways to calculate the search direction, depending on the method used, such as the steepest descent method, the conjugate gradient (CG) method, the Newton-Raphson method, and quasi-Newton methods.

Different choices of the step size ensure that the sequence of iterates x_i defined by (2) is globally convergent with some rate of convergence. There are two ways to determine the step size: the exact line search and the inexact line search. For the exact line search, α_i is calculated by the formula α_i = argmin_{α > 0} f(x_i + α d_i). However, it is difficult, and often impossible, to find the step size in practical computation using the exact line search. Hence, inexact line searches were proposed by researchers such as Armijo [1], Wolfe [2, 3], and Goldstein [4] to overcome this problem. Recently, Shi proposed a new inexact line search rule similar to the Armijo line search and analysed its global convergence [5]. Shi also claimed that, among the well-known inexact line search procedures published by previous researchers, the Armijo rule is one of the most useful and the easiest to implement in computational calculations. The Armijo line search rule can be described as follows:

(3) given s > 0, β ∈ (0,1), and σ ∈ (0,1), set α_i = max{s, sβ, sβ^2, ...} such that f(x_i) - f(x_i + α_i d_i) ≥ -σ α_i g_i^T d_i, for i = 0, 1, 2, ....

Then the sequence {x_i}_{i=0}^∞ converges to the optimal point x*, which minimises f. Hence, we use the Armijo line search in this research, in association with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method and the new hybrid method.
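The Armijo rule (3) amounts to backtracking from the initial trial step s. A minimal sketch in Python (the function name and the max_backtracks safeguard are ours, not from the paper; NumPy is assumed):

```python
import numpy as np

def armijo_step(f, grad, x, d, s=1.0, beta=0.5, sigma=0.1, max_backtracks=50):
    """Armijo rule (3): largest alpha in {s, s*beta, s*beta^2, ...} with
    f(x) - f(x + alpha*d) >= -sigma * alpha * grad(x)^T d."""
    gTd = float(grad(x) @ d)
    alpha = s
    for _ in range(max_backtracks):
        if f(x) - f(x + alpha * d) >= -sigma * alpha * gTd:
            return alpha
        alpha *= beta  # shrink the trial step and test again
    return alpha
```

For a descent direction the loop terminates quickly; for example, on f(x) = ½‖x‖² with d = -g the very first trial step s = 1 already satisfies (3).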

This paper is organised as follows. In Section 2, we elaborate on the step size and search direction used in this research; the BFGS method and the CG method are also presented there. The new hybrid method and its convergence analysis are then discussed in Section 3. The numerical results are explained in Section 4, and the paper ends with a short conclusion in Section 5.

2. The Search Direction

The different methods for solving unconstrained optimization problems are distinguished by the calculation of the search direction d_i in (2). In this paper, we focus on the CG method and quasi-Newton methods. The CG method, introduced by Hestenes and Stiefel [12], is useful for finding the minimum value of functions in unconstrained optimization problems. The search direction of the CG method is

(4) d_i = -g_i for i = 0, and d_i = -g_i + β_i d_{i-1} for i ≥ 1,

where g_i = ∇f(x_i) and β_i is known as the CG coefficient. There are many ways to calculate β_i; some well-known formulas are

(5) β_i^FR = (g_i^T g_i) / ‖g_{i-1}‖^2, β_i^PR = (g_i^T (g_i - g_{i-1})) / ‖g_{i-1}‖^2, β_i^HS = (g_i^T (g_i - g_{i-1})) / ((g_i - g_{i-1})^T d_{i-1}),

where g_i and g_{i-1} are the gradients of f(x) at the points x_i and x_{i-1}, respectively, ‖·‖ is a vector norm, and d_{i-1} is the search direction of the previous iteration. The corresponding coefficients are known as Fletcher-Reeves (CG-FR) [7], Polak-Ribière (CG-PR) [9], and Hestenes-Stiefel (CG-HS) [12].
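The piecewise direction (4) and the three coefficients in (5) can be sketched as follows (function names are illustrative; NumPy is assumed):

```python
import numpy as np

def cg_beta(g, g_prev, d_prev, kind="FR"):
    """CG coefficients from (5): Fletcher-Reeves, Polak-Ribiere, Hestenes-Stiefel."""
    y = g - g_prev
    if kind == "FR":
        return float(g @ g) / float(g_prev @ g_prev)
    if kind == "PR":
        return float(g @ y) / float(g_prev @ g_prev)
    if kind == "HS":
        return float(g @ y) / float(y @ d_prev)
    raise ValueError(f"unknown coefficient: {kind}")

def cg_direction(g, g_prev=None, d_prev=None, kind="FR"):
    """Search direction (4): steepest descent on the first iteration,
    -g + beta * d_prev on subsequent iterations."""
    if g_prev is None:
        return -g
    return -g + cg_beta(g, g_prev, d_prev, kind) * d_prev
```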

In quasi-Newton methods, the search direction is

(6) d_i = -H_i g_i,

where H_i is an approximation of the Hessian. The initial matrix H_0 is chosen as the identity matrix and is subsequently updated by an update formula. A few update formulas are widely used, such as the Davidon-Fletcher-Powell (DFP), BFGS, and Broyden family formulas. This research uses the BFGS formula in the classical algorithm and in the new hybrid method. The BFGS update formula is

(7) H_{i+1} = H_i - (H_i s_i s_i^T H_i) / (s_i^T H_i s_i) + (y_i y_i^T) / (s_i^T y_i),

with s_i = x_i - x_{i-1} and y_i = g_i - g_{i-1}. The secant equation that the updated approximation must fulfil is

(8) H_{i+1} s_i = y_i.

This condition is required to hold for the updated matrix H_{i+1}. Note that it is only possible to fulfil the secant equation if

(9) s_i^T y_i > 0,

which is known as the curvature condition.
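Update (7) can be checked numerically: by construction, the updated matrix satisfies the secant equation (8) exactly whenever the curvature condition (9) holds. A sketch, with a skip-on-violation safeguard that is our addition rather than part of the paper:

```python
import numpy as np

def bfgs_update(H, s, y, eps=1e-12):
    """BFGS update (7): H_new = H - (H s s^T H)/(s^T H s) + (y y^T)/(s^T y).
    Skips the update when the curvature condition (9) s^T y > 0 fails."""
    s = s.reshape(-1, 1)
    y = y.reshape(-1, 1)
    if float(s.T @ y) <= eps:
        return H  # curvature condition violated; keep previous approximation
    return H - (H @ s @ s.T @ H) / float(s.T @ H @ s) + (y @ y.T) / float(s.T @ y)
```

With H = I, s = (1, 0)^T, and y = (2, 0)^T, the update gives diag(2, 1), and one can verify the secant equation H_new s = y directly.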

3. The New Hybrid Method

Modifications of the quasi-Newton method based on hybridization have already been introduced by previous researchers. One such study is a hybridization of the quasi-Newton and Gauss-Seidel methods, aimed at solving systems of linear equations [13]. Luo et al. [14] suggested a hybrid method that solves systems of nonlinear equations by combining the quasi-Newton method with chaos optimization. Han and Neumann [6] combined quasi-Newton methods with the Cauchy descent method to solve unconstrained optimization problems, yielding what is recognised as the quasi-Newton-SD method.

These modifications of the quasi-Newton method by previous researchers spawned the idea of hybridizing classical methods to yield a new hybrid method. This study proposes a new hybrid search direction that combines the search directions of the quasi-Newton and CG methods. It yields the search direction of the hybrid method known as the BFGS-CG method:

(10) d_i = -H_i g_i for i = 0, and d_i = -H_i g_i + η(-g_i + β_i d_{i-1}) for i ≥ 1,

where η > 0 and β_i = (g_i^T g_{i-1}) / (g_i^T d_{i-1}).
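The hybrid direction (10) is cheap to form once H_i and the previous direction are stored. A sketch (the default value of η is a placeholder of ours; the paper only requires η > 0):

```python
import numpy as np

def bfgs_cg_direction(H, g, g_prev=None, d_prev=None, eta=0.5):
    """Hybrid search direction (10): -H g on the first iteration,
    -H g + eta * (-g + beta_i * d_prev) afterwards, with
    beta_i = (g_i^T g_{i-1}) / (g_i^T d_{i-1})."""
    if g_prev is None or d_prev is None:
        return -H @ g          # i = 0: pure quasi-Newton step
    beta_i = float(g @ g_prev) / float(g @ d_prev)
    return -H @ g + eta * (-g + beta_i * d_prev)
```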

The complete algorithms for the BFGS method, the CG-HS, CG-PR, and CG-FR methods, and the BFGS-CG method are given in Algorithms 1, 2, and 3, respectively.

Algorithm 1 (BFGS method).

States the following.

Step 0. Given a starting point x_0 and H_0 = I_n, choose values for s, β, and σ, and set i = 1.

Step 1. Terminate if ‖g_i‖ < 10^-6 or i ≥ 10000.

Step  2. Calculate the search direction by (6).

Step  3. Calculate the step size α i by (3).

Step 4. Compute s_i = x_i - x_{i-1} and y_i = g_i - g_{i-1}.

Step  5. Update H i - 1 by (7) to obtain H i .

Step  6. Set i = i + 1 and go to Step 1.

Algorithm 2 (CG-HS, CG-PR, and CG-FR methods).

States the following.

Step  0. Given a starting point x 0 , choose values for s , β , and σ and set i = 1 .

Step 1. Terminate if ‖g_i‖ < 10^-6 or i ≥ 10000.

Step  2. Calculate the search direction by (4) with respect to the coefficient of CG.

Step  3. Calculate the step size α i by (3).

Step 4. Compute s_i = x_i - x_{i-1} and y_i = g_i - g_{i-1}.

Step  5. Set i = i + 1 and go to Step 1.

Algorithm 3 (BFGS-CG method).

States the following.

Step  0. Given a starting point x 0 and H 0 = I n , choose values for s , β , and σ and set i = 1 .

Step 1. Terminate if ‖g_i‖ < 10^-6 or i ≥ 10000.

Step  2. Calculate the search direction by (10).

Step  3. Calculate the step size α i by (3).

Step 4. Compute s_i = x_i - x_{i-1} and y_i = g_i - g_{i-1}.

Step  5. Update H i - 1 by (7) to obtain H i .

Step  6. Set i = i + 1 and go to Step 1.
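Putting Steps 0-6 together, Algorithm 3 can be sketched as below. One caveat: the paper states the update in the direct form (7), whereas this sketch maintains H as an inverse-Hessian approximation via the standard inverse BFGS formula so that Step 2 stays a matrix-vector product; the line-search parameters follow Section 4 (s = 1, β = 0.5, σ = 0.1), and the default η is our placeholder:

```python
import numpy as np

def bfgs_cg(f, grad, x0, eta=0.5, s=1.0, beta=0.5, sigma=0.1,
            tol=1e-6, max_iter=10000):
    """Sketch of Algorithm 3 (BFGS-CG method)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                     # Step 0: H_0 = I_n
    g = grad(x)
    g_prev = d_prev = None
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:   # Step 1: termination test
            break
        # Step 2: hybrid search direction (10)
        if d_prev is None:
            d = -H @ g
        else:
            denom = float(g @ d_prev)
            beta_i = float(g @ g_prev) / denom if abs(denom) > 1e-12 else 0.0
            d = -H @ g + eta * (-g + beta_i * d_prev)
        # Step 3: Armijo backtracking line search (3)
        alpha, gTd = s, float(g @ d)
        while f(x) - f(x + alpha * d) < -sigma * alpha * gTd and alpha > 1e-16:
            alpha *= beta
        # Step 4: form s_i and y_i
        x_new = x + alpha * d
        g_new = grad(x_new)
        sk, yk = x_new - x, g_new - g
        # Step 5: inverse BFGS update, skipped if curvature condition (9) fails
        sy = float(sk @ yk)
        if sy > 1e-12:
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(sk, yk)
            H = V @ H @ V.T + rho * np.outer(sk, sk)
        # Step 6: advance the iteration
        g_prev, d_prev = g, d
        x, g = x_new, g_new
    return x
```

On a simple convex quadratic, the first iteration already takes a full quasi-Newton step to the minimiser.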

Based on Algorithms 1, 2, and 3, we assume that every search direction d_i satisfies the descent condition

(11) g_i^T d_i < 0 for all i ≥ 0.

If there exists a constant c_1 > 0 such that

(12) g_i^T d_i ≤ -c_1 ‖g_i‖^2 for all i ≥ 0,

then the search directions satisfy the sufficient descent condition, which is proved in Theorem 6. Hence, we need to make a few assumptions on the objective function.

Assumption 4.

Consider the following.

(H1) The objective function f is twice continuously differentiable.

(H2) The level set L is convex. Moreover, positive constants c_1 and c_2 exist, satisfying

(13) c_1 ‖z‖^2 ≤ z^T F(x) z ≤ c_2 ‖z‖^2

for all z ∈ R^n and x ∈ L, where F(x) is the Hessian matrix of f.

(H3) The Hessian matrix is Lipschitz continuous at the point x*; that is, there exists a positive constant c_3 satisfying

(14) ‖g(x) - g(x*)‖ ≤ c_3 ‖x - x*‖

for all x in a neighbourhood of x*.

Theorem 5 (see [15, 16]).

Let {B_i} be generated by the BFGS formula (7), where B_1 is symmetric and positive definite and y_i^T s_i > 0 for all i. Furthermore, assume that {s_i} and {y_i} are such that

(15) ‖y_i - G_* s_i‖ / ‖s_i‖ ≤ ε_i

for some symmetric and positive definite matrix G_* = G(x*) and for some sequence {ε_i} with the property Σ_{i=1}^∞ ε_i < ∞. Then

(16) lim_{i→∞} ‖(B_i - G_*) d_i‖ / ‖d_i‖ = 0,

and the sequences {‖B_i‖}, {‖B_i^{-1}‖} are bounded.

Theorem 6.

Suppose that Assumption 4 and Theorem 5 hold. Then condition (12) holds for all i 0 .

Proof.

From (10), we see that

(17) g_i^T d_i = -g_i^T B_i^{-1} g_i + η g_i^T (-g_i + (g_i^T g_{i-1} / g_i^T d_{i-1}) d_{i-1}) = -g_i^T B_i^{-1} g_i + η (-g_i^T g_i + (g_i^T g_{i-1} / g_i^T d_{i-1}) g_i^T d_{i-1}) = -g_i^T B_i^{-1} g_i + η (-g_i^T g_i + g_i^T g_{i-1}).

Based on Powell [17], g_i^T g_{i-1} ≤ ε ‖g_i‖^2 with ε ∈ (0, 1], and

(18) g_i^T d_i = -g_i^T B_i^{-1} g_i + η(-‖g_i‖^2 + ε ‖g_i‖^2) ≤ -λ_i ‖g_i‖^2 + (-η + ηε) ‖g_i‖^2 ≤ -c_1 ‖g_i‖^2,

where c_1 = λ_i + η - ηε, which is bounded away from zero. Hence, g_i^T d_i ≤ -c_1 ‖g_i‖^2 holds. The proof is completed.

Lemma 7.

Under Assumption 4, positive constants ϖ_1 and ϖ_2 exist such that, for any x_i and any d_i with g_i^T d_i < 0, the step size α_i produced by Algorithm 2 satisfies either

(19) f(x_i + α_i d_i) - f_i ≤ -ϖ_1 (g_i^T d_i)^2 / ‖d_i‖^2

or

(20) f(x_i + α_i d_i) - f_i ≤ ϖ_2 g_i^T d_i.

Proof.

Suppose that α_i < 1, which means that (3) failed for the step size α = α_i / τ:

(21) f(x_i + α d_i) - f(x_i) > ϖ α g_i^T d_i.

Then, by using the mean value theorem, we obtain

(22) f(x_i + α d_i) - f(x_i) = ḡ^T (α d_i),

where ḡ = ∇f(x̄), for some x̄ on the segment between x_i and x_i + α d_i. Now, by the Cauchy-Schwarz inequality, we get

(23) α ḡ^T d_i = α g_i^T d_i + α (ḡ - g_i)^T d_i ≤ α g_i^T d_i + α ‖ḡ - g_i‖ ‖d_i‖ ≤ α g_i^T d_i + M α^2 ‖d_i‖^2.

Thus, from (H3),

(24) (ϖ - 1) α g_i^T d_i < α (ḡ - g_i)^T d_i ≤ M α^2 ‖d_i‖^2,

which implies that

(25) α_i = τ α > τ (1 - ϖ) (-g_i^T d_i) / (M ‖d_i‖^2).

Substituting this into (3), we have

(26) f(x_i + α_i d_i) - f(x_i) ≤ -c_2 (g_i^T d_i)^2 / ‖d_i‖^2,

where c_2 = τ ϖ (1 - ϖ) / M, which gives (19).

Theorem 8 (global convergence).

Suppose that Assumption 4 and Theorem 5 hold. Then

(27) lim_{i→∞} ‖g_i‖^2 = 0.

Proof.

Combining the descent property (12) and Lemma 7 gives

(28) Σ_{i=0}^∞ ‖g_i‖^4 / ‖d_i‖^2 < ∞.

Hence, from Theorem 6, we have ‖d_i‖ ≤ c ‖g_i‖ for some constant c > 0, so (28) simplifies to Σ_{i=0}^∞ ‖g_i‖^2 < ∞, and therefore ‖g_i‖ → 0. The proof is completed.

4. Numerical Result

In this section, we use the test problems considered by Andrei [18], Michalewicz [19], and Moré et al. [20], listed in Table 1, to analyse the improvement of the BFGS-CG method over the BFGS method and CG methods. Each of the test problems is tested with dimensions varying from 2 to 1,000 variables, for a total of 159 test problems. As suggested in the literature, for each of the test problems the initial point x_0 is chosen progressively farther from the minimum point. In doing so, we test the global convergence properties and the robustness of our method. For the Armijo line search, we use s = 1, β = 0.5, and σ = 0.1. The stopping criteria are ‖g_i‖ ≤ 10^-6, or the number of iterations exceeding its limit, which is set to 10,000. In our implementation, the numerical tests were performed on an Acer Aspire with the Windows 7 operating system, using MATLAB 2012.

Table 1: A list of problem functions.

Test problem | n-dimensional | Source
Powell badly scaled | 2 | Moré et al. [20]
Beale | 2 | Moré et al. [20]
Biggs Exp 6 | 6 | Moré et al. [20]
Chebyquad | 4, 6 | Moré et al. [20]
Colville polynomial | 4 | Michalewicz [19]
Variably dimensioned | 4, 8 | Moré et al. [20]
Freudenstein and Roth | 2 | Moré et al. [20]
Goldstein price polynomial | 2 | Michalewicz [19]
Himmelblau | 2 | Andrei [18]
Penalty 1 | 2, 4 | Moré et al. [20]
Extended Powell singular | 4, 8 | Moré et al. [20]
Extended Rosenbrock | 2, 10, 100, 200, 500, 1000 | Andrei [18]
Trigonometric | 6 | Andrei [18]
Watson | 4, 8 | Moré et al. [20]
Six-hump camel back polynomial | 2 | Michalewicz [19]
Extended shallow | 2, 4, 10, 100, 200, 500, 1000 | Andrei [18]
Extended strait | 2, 4, 10, 100, 200, 500, 1000 | Andrei [18]
Scale | 2 | Michalewicz [19]
Raydan 1 | 2, 4 | Andrei [18]
Raydan 2 | 2, 4 | Andrei [18]
Diagonal 3 | 2 | Andrei [18]
Cube | 2, 10, 100, 200 | Moré et al. [20]

The performance results are shown in Figures 1 and 2, respectively, using the performance profile introduced by Dolan and Moré [21]. The performance profile seeks to find how well the solvers perform relative to the other solvers on a set of problems. In general, P(τ) is the fraction of problems solved within a performance ratio of τ; thus, a solver with high values of P(τ), that is, one whose curve sits toward the top of the figure, is preferable.
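The Dolan-Moré profile can be computed from a matrix of per-problem costs; a sketch (the helper name is ours; failures are encoded as infinite cost):

```python
import numpy as np

def performance_profile(T):
    """Dolan-More performance profile. T is an (n_problems x n_solvers)
    array of costs (iterations or CPU time), with np.inf marking failures.
    Returns P(solver, tau) = fraction of problems on which that solver's
    cost is within a factor tau of the best solver's cost."""
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost on each problem
    ratios = T / best                     # performance ratios r_{p,s} >= 1
    def P(solver, tau):
        return float(np.mean(ratios[:, solver] <= tau))
    return P
```

Plotting P(solver, τ) against τ (here, on a log10 scale) reproduces curves like those in Figures 1 and 2.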

Performance profile in a log 10 scale based on iteration.

Performance profile in a log 10 scale based on CPU time.

Figures 1 and 2 show that the BFGS-CG method has the best performance, since it solves 99% of the test problems, compared with the BFGS (84%), CG-HS (65%), CG-PR (80%), and CG-FR (75%) methods. Moreover, the BFGS-CG method is the fastest solver on approximately 68% of the test problems in terms of iterations and 52% in terms of CPU time.

5. Conclusion

We have presented a new hybrid method for solving unconstrained optimization problems. The numerical results for a broad class of test problems show that the BFGS-CG method is efficient and robust in solving unconstrained optimization problems. We also note that, as the size and complexity of the problems increase, greater improvements are realised by our BFGS-CG method. Our future research will be to try the BFGS-CG method with CG coefficients such as Fletcher-Reeves, Hestenes-Stiefel, and Polak-Ribière.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research was supported by Fundamental Research Grant Scheme (FRGS Vote no. 59256).

References

1. L. Armijo, "Minimization of functions having Lipschitz continuous first partial derivatives," Pacific Journal of Mathematics, vol. 16, no. 1, pp. 1–3, 1966.
2. P. Wolfe, "Convergence conditions for ascent methods," SIAM Review, vol. 11, no. 2, pp. 226–235, 1969.
3. P. Wolfe, "Convergence conditions for ascent methods. II: some corrections," SIAM Review, vol. 13, no. 2, pp. 185–188, 1971.
4. A. A. Goldstein, "On steepest descent," Journal of the Society for Industrial and Applied Mathematics, Series A, vol. 3, no. 1, pp. 147–151, 1965.
5. Z.-J. Shi, "Convergence of quasi-Newton method with new inexact line search," Journal of Mathematical Analysis and Applications, vol. 315, no. 1, pp. 120–131, 2006.
6. L. Han and M. Neumann, "Combining quasi-Newton and Cauchy directions," International Journal of Applied Mathematics, vol. 12, no. 2, pp. 167–191, 2003.
7. R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients," The Computer Journal, vol. 7, no. 2, pp. 149–154, 1964.
8. N. Andrei, "Accelerated scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization," European Journal of Operational Research, vol. 204, no. 3, pp. 410–420, 2010.
9. E. Polak and G. Ribière, "Note on the convergence of methods of conjugate directions," Revue Française d'Informatique et de Recherche Opérationnelle, vol. 3, pp. 35–43, 1969.
10. Z.-J. Shi and J. Shen, "Convergence of the Polak-Ribière-Polyak conjugate gradient method," Nonlinear Analysis: Theory, Methods & Applications, vol. 66, no. 6, pp. 1428–1441, 2007.
11. G. Yu, L. Guan, and Z. Wei, "Globally convergent Polak-Ribière-Polyak conjugate gradient methods under a modified Wolfe line search," Applied Mathematics and Computation, vol. 215, no. 8, pp. 3082–3090, 2009.
12. M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems," Journal of Research of the National Bureau of Standards, vol. 49, no. 6, pp. 409–436, 1952.
13. A. Ludwig, "The Gauss-Seidel-quasi-Newton method: a hybrid algorithm for solving dynamic economic models," Journal of Economic Dynamics and Control, vol. 31, no. 5, pp. 1610–1632, 2007.
14. Y.-Z. Luo, G.-J. Tang, and L.-N. Zhou, "Hybrid approach for solving systems of nonlinear equations using chaos optimization and quasi-Newton method," Applied Soft Computing, vol. 8, no. 2, pp. 1068–1073, 2008.
15. R. H. Byrd and J. Nocedal, "A tool for the analysis of quasi-Newton methods with application to unconstrained minimization," SIAM Journal on Numerical Analysis, vol. 26, no. 3, pp. 727–739, 1989.
16. R. H. Byrd, J. Nocedal, and Y.-X. Yuan, "Global convergence of a class of quasi-Newton methods on convex problems," SIAM Journal on Numerical Analysis, vol. 24, no. 5, pp. 1171–1191, 1987.
17. M. J. D. Powell, "Restart procedures for the conjugate gradient method," Mathematical Programming, vol. 12, no. 1, pp. 241–254, 1977.
18. N. Andrei, "An unconstrained optimization test functions collection," Advanced Modeling and Optimization, vol. 10, no. 1, pp. 147–161, 2008.
19. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer, Berlin, Germany, 1996.
20. J. J. Moré, B. S. Garbow, and K. E. Hillstrom, "Testing unconstrained optimization software," ACM Transactions on Mathematical Software, vol. 7, no. 1, pp. 17–41, 1981.
21. E. D. Dolan and J. J. Moré, "Benchmarking optimization software with performance profiles," Mathematical Programming, vol. 91, no. 2, pp. 201–213, 2002.