Journal of Mathematics, vol. 2021, Hindawi. ISSN 2314-4785, 2314-4629. DOI: 10.1155/2021/6692024

Research Article

An Efficient Modified AZPRP Conjugate Gradient Method for Large-Scale Unconstrained Optimization Problems

Ahmad Alhawarat (1,2), Thoi Trung Nguyen (1,3), Ramadan Sabra (4), and Zabidin Salleh (5)

(1) Division of Computational Mathematics and Engineering, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
(2) Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
(3) Faculty of Civil Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam
(4) Department of Mathematics, Faculty of Science, Jazan University, Jazan, Saudi Arabia
(5) Department of Mathematics, Faculty of Ocean Engineering Technology and Informatics, Universiti Malaysia Terengganu, Kuala Nerus 21030, Terengganu, Malaysia

Academic Editor: Qingli Zhao

Received 5 December 2020; Revised 30 January 2021; Accepted 19 March 2021; Published 26 April 2021

Copyright © 2021 Ahmad Alhawarat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To find a solution of unconstrained optimization problems, we normally use a conjugate gradient (CG) method, since it does not require the storage of second-derivative information that Newton's method or the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method needs. Recently, a modification of the Polak–Ribière–Polyak (PRP) method with a new restart condition was proposed, called the AZPRP method. In this paper, we propose a new modification of the AZPRP CG method for solving large-scale unconstrained optimization problems, based on a modified restart condition. The new parameter satisfies the descent property, and global convergence is established with the strong Wolfe–Powell line search. The numerical results show that the new CG method is strongly competitive with the CG_Descent method. The comparisons are made on a set of more than 140 standard functions from the CUTEst library and cover the number of iterations and the CPU time.

Funding: Universiti Malaysia Terengganu.
1. Introduction

The conjugate gradient (CG) method aims to find a solution of unconstrained optimization problems. Consider the problem

(1) $\min f(x), \quad x \in \mathbb{R}^n$,

where $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient $\nabla f(x)$ is available. The iterative method generates the sequence

(2) $x_{k+1} = x_k + \alpha_k d_k, \quad k = 1, 2, \ldots$,

where $x_1$ is the starting point and $\alpha_k > 0$ is the step length. The search direction $d_k$ of the CG method is defined by

(3) $d_k = -g_k$ if $k = 1$, and $d_k = -g_k + \beta_k d_{k-1}$ if $k \ge 2$,

where $g_k = \nabla f(x_k)$ and $\beta_k$ is a parameter.
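Iteration (2) together with direction update (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`cg_minimize`, `backtracking`) are our own, a simple Armijo backtracking rule stands in for the Wolfe–Powell line search discussed below, and a steepest-descent restart is added as a safeguard whenever the computed direction fails to be a descent direction.

```python
import numpy as np

def backtracking(f, grad, x, d, alpha=1.0, rho=0.5, delta=1e-4):
    """Armijo backtracking line search (a simple stand-in for SWP)."""
    g = grad(x)
    while f(x + alpha * d) > f(x) + delta * alpha * g.dot(d):
        alpha *= rho
    return alpha

def cg_minimize(f, grad, x0, beta_fn, tol=1e-6, max_iter=500):
    """Generic CG iteration: x_{k+1} = x_k + alpha_k d_k with
    d_1 = -g_1 and d_k = -g_k + beta_k d_{k-1} for k >= 2."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # first direction, eq. (3)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = backtracking(f, grad, x, d)   # step length, eq. (2)
        x = x + alpha * d
        g_new = grad(x)
        d = -g_new + beta_fn(g_new, g, d) * d
        if g_new.dot(d) >= 0:                 # safeguard: restart with -g
            d = -g_new
        g = g_new
    return x

# Example: PRP coefficient on the convex quadratic f(x) = x1^2 + 2*x2^2.
beta_prp = lambda gn, go, d: gn.dot(gn - go) / go.dot(go)
x_star = cg_minimize(lambda x: x[0]**2 + 2 * x[1]**2,
                     lambda x: np.array([2 * x[0], 4 * x[1]]),
                     [3.0, -2.0], beta_prp)
```

Any of the coefficients discussed in this paper can be passed as `beta_fn`, which is the only piece that changes between CG variants.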

To obtain the step length, we normally use an inexact line search, since the exact line search, defined by

(4) $f(x_k + \alpha_k d_k) = \min_{\alpha > 0} f(x_k + \alpha d_k)$,

requires too many evaluations. We usually use the strong Wolfe–Powell (SWP) line search [1, 2], given by

(5) $f(x_k + \alpha_k d_k) \le f(x_k) + \delta \alpha_k g_k^T d_k$,

(6) $|\nabla f(x_k + \alpha_k d_k)^T d_k| \le \sigma |g_k^T d_k|$,

where $0 < \delta < \sigma < 1$.
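The two SWP conditions can be checked directly for a trial step. The sketch below is illustrative; the helper name `satisfies_swp` is our own, and the default parameter values follow those used in the numerical experiments of this paper ($\delta = 0.01$, $\sigma = 0.1$).

```python
import numpy as np

def satisfies_swp(f, grad, x, d, alpha, delta=0.01, sigma=0.1):
    """Check the SWP conditions (5)-(6) for a trial step length alpha.
    (5): sufficient decrease; (6): strong curvature condition."""
    g_d = grad(x).dot(d)                  # g_k^T d_k (negative for descent d)
    armijo = f(x + alpha * d) <= f(x) + delta * alpha * g_d
    curvature = abs(grad(x + alpha * d).dot(d)) <= sigma * abs(g_d)
    return armijo and curvature

# 1-D example: f(x) = x^2 / 2 from x = 1 along d = -1.  The exact
# minimizer alpha = 1 satisfies both conditions; alpha = 2 violates (5).
f = lambda x: 0.5 * x.dot(x)
grad = lambda x: x
x, d = np.array([1.0]), np.array([-1.0])
```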

The weak Wolfe–Powell (WWP) line search is defined by (5) and

(7) $\nabla f(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k$,

where $g_k = \nabla f(x_k)$. The most famous $\beta_k$ parameters are the Hestenes–Stiefel (HS), Fletcher–Reeves (FR), and Polak–Ribière–Polyak (PRP) formulas, given by

(8) $\beta_k^{HS} = \dfrac{g_k^T y_k}{d_{k-1}^T y_k}, \quad \beta_k^{FR} = \dfrac{\|g_k\|^2}{\|g_{k-1}\|^2}, \quad \beta_k^{PRP} = \dfrac{g_k^T y_k}{\|g_{k-1}\|^2}$,

where $y_k = g_k - g_{k-1}$.
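The three classical coefficients in (8) are direct transcriptions; the function names below are our own.

```python
import numpy as np

def beta_hs(g_new, g_old, d_old):
    """Hestenes-Stiefel coefficient from (8)."""
    y = g_new - g_old
    return g_new.dot(y) / d_old.dot(y)

def beta_fr(g_new, g_old):
    """Fletcher-Reeves coefficient from (8)."""
    return g_new.dot(g_new) / g_old.dot(g_old)

def beta_prp(g_new, g_old):
    """Polak-Ribiere-Polyak coefficient from (8)."""
    return g_new.dot(g_new - g_old) / g_old.dot(g_old)

# Worked example: g_{k-1} = (1, 2), g_k = (3, 1), d_{k-1} = (-1, -1),
# so y_k = (2, -1).
g_old = np.array([1.0, 2.0])
g_new = np.array([3.0, 1.0])
d_old = np.array([-1.0, -1.0])
```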

Powell  showed that there exists a nonconvex function for which the PRP method does not converge globally. Gilbert and Nocedal  showed that if $\beta_k^{PRP+} = \max\{0, \beta_k^{PRP}\}$ is used with the WWP line search and the descent property is satisfied, then the method is globally convergent.

Al-Baali  proved that the CG method with the FR coefficient converges with the SWP line search when $\sigma \le 1/2$. Hager and Zhang [9, 10] presented a CG parameter with the descent property $g_k^T d_k \le -(7/8)\|g_k\|^2$. Their formula is

(9) $\beta_k^{HZ} = \max\{\beta_k^N, \eta_k\}$,

where $\beta_k^N = \dfrac{1}{d_k^T y_k}\left(y_k - 2 d_k \dfrac{\|y_k\|^2}{d_k^T y_k}\right)^T g_k$; $\eta_k = \dfrac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}}$; and $\eta > 0$ is a constant. In their numerical experiments, they set $\eta = 0.01$ in (9). Al-Baali et al.  compared $\beta_k^{HZ}$ with a new three-term CG method (G3TCG).
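Formula (9) can be transcribed as follows. This is a sketch: the negative sign of $\eta_k$ follows the standard Hager–Zhang formulation (an assumption, since the extracted text lost the sign), and the argument naming (`g_old`/`d_old` for the previous gradient and direction) is ours.

```python
import numpy as np

def beta_hz(g_new, g_old, d_old, eta=0.01):
    """Hager-Zhang coefficient (9): max(beta_N, eta_k), with the
    lower safeguard eta_k = -1 / (||d|| * min(eta, ||g||))."""
    y = g_new - g_old
    dy = d_old.dot(y)
    beta_n = (y - 2.0 * d_old * (y.dot(y) / dy)).dot(g_new) / dy
    eta_k = -1.0 / (np.linalg.norm(d_old) * min(eta, np.linalg.norm(g_old)))
    return max(beta_n, eta_k)

# Worked example: g_{k-1} = (1, 0), g_k = (0, 1), d_{k-1} = (-1, 0),
# giving beta_N = 1 and eta_k = -100, so beta_HZ = 1.
g_old = np.array([1.0, 0.0])
g_new = np.array([0.0, 1.0])
d_old = np.array([-1.0, 0.0])
```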

The development of new CG formulas has been driven by speed, memory requirements, the number of iterations, function and gradient evaluations, and robustness in solving unconstrained optimization problems; the reader is referred to  for more information on these new formulas.

2. The New Formula and the Algorithm

Alhawarat et al.  presented the following simple formula:

(10) $\beta_k^{AZPRP} = \dfrac{\|g_k\|^2 - \mu_k |g_k^T g_{k-1}|}{\|g_{k-1}\|^2}$ if $\|g_k\|^2 > \mu_k |g_k^T g_{k-1}|$, and $\beta_k^{AZPRP} = 0$ otherwise.

Dai and Liao  presented the following formula:

(11) $\beta_k^{DL+} = \max\{\beta_k^{HS}, 0\} - t \dfrac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$,

where $s_{k-1} = x_k - x_{k-1}$ and $t \ge 0$.

The new formula, a modification of $\beta_k^{AZPRP}$ and $\beta_k^{DL+}$, is defined as follows:

(12) $\beta_k^{A} = \dfrac{\|g_k\|^2 - \mu_k |g_k^T s_{k-1}|}{\|g_{k-1}\|^2}$ if $\|g_k\|^2 > \mu_k |g_k^T s_{k-1}|$, and $\beta_k^{A} = -t \dfrac{g_k^T s_{k-1}}{d_{k-1}^T y_{k-1}}$ otherwise,

where $\mu_k = \|s_{k-1}\| / \|y_{k-1}\|$ and $t > 0$.
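The proposed coefficient can be transcribed as below. Two caveats: the absolute value on $g_k^T s_{k-1}$ and the minus sign on the Dai–Liao restart branch are our reading of the Dai–Liao term and of the descent proof in Theorem 2 (the extracted formula lost signs and bars), and the value `t=0.1` is purely illustrative.

```python
import numpy as np

def beta_A(g_new, g_old, d_old, s_old, t=0.1):
    """Proposed coefficient (12), with mu_k = ||s_{k-1}|| / ||y_{k-1}||.
    Falls back to a Dai-Liao-type restart term when the switching
    condition fails."""
    y = g_new - g_old
    mu = np.linalg.norm(s_old) / np.linalg.norm(y)
    gg = g_new.dot(g_new)
    gs = g_new.dot(s_old)
    if gg > mu * abs(gs):
        return (gg - mu * abs(gs)) / g_old.dot(g_old)
    return -t * gs / d_old.dot(y)

# First-branch example: g_k^T s_{k-1} = 0, so beta_A reduces to the FR
# value ||g_k||^2 / ||g_{k-1}||^2 = 4.
g_old = np.array([1.0, 0.0])
g_new = np.array([0.0, 2.0])
s_old = np.array([0.5, 0.0])
d_old = np.array([-1.0, 0.0])
```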

For the first branch of (12), we obtain the following relations:

(13) $\beta_k^{A} \ge 0, \qquad \beta_k^{A} \le \dfrac{\|g_k\|^2 - \mu_k |g_k^T s_{k-1}|}{\|g_{k-1}\|^2} \le \dfrac{\|g_k\|^2}{\|g_{k-1}\|^2} = \beta_k^{FR}$.

<bold>Algorithm 1:</bold> Steps of the CG method with the new modification to obtain a stationary point of the objective function.

3. Convergence Analysis of Coefficient <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M38"><mml:msubsup><mml:mi>β</mml:mi><mml:mi>k</mml:mi><mml:mi>A</mml:mi></mml:msubsup></mml:math></inline-formula> with the CG Method

Assumption 1.

The level set $\Psi = \{x \mid f(x) \le f(x_1)\}$ is bounded; that is, a positive constant $T$ exists such that

(14) $\|x\| \le T, \quad \forall x \in \Psi$.

In some neighbourhood $N$ of $\Psi$, $f$ is continuously differentiable and its gradient is Lipschitz continuous; that is, for all $x, y \in N$, there exists a constant $L > 0$ such that

(15) $\|g(x) - g(y)\| \le L \|x - y\|$.

This assumption implies that there exists a positive constant $B$ such that

(16) $\|g(u)\| \le B, \quad \forall u \in N$.

The descent condition

(17) $g_k^T d_k \le -\|g_k\|^2, \quad \forall k \ge 1$,

plays an important role in the CG method. The sufficient descent condition proposed by Al-Baali  is a modification of (17), as follows:

(18) $g_k^T d_k \le -c \|g_k\|^2, \quad \forall k \ge 1$,

where $c \in (0, 1)$. Note that the general form of the sufficient descent condition is (18) with $c > 0$.

3.1. Global Convergence for <inline-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" id="M54"><mml:msubsup><mml:mi>β</mml:mi><mml:mi>k</mml:mi><mml:mi>A</mml:mi></mml:msubsup></mml:math></inline-formula> with the SWP Line Search

The following theorem shows that $\beta_k^{A}$ satisfies the sufficient descent condition (18) with the SWP line search. The proof is similar to that presented in .

Theorem 1.

Let $g_k$ and $d_k$ be generated by (2) and (3) with $\beta_k^{A} = \left(\|g_k\|^2 - \mu_k |g_k^T s_{k-1}|\right) / \|g_{k-1}\|^2$, where $\alpha_k$ is computed by the SWP line search (5) and (6). If $\sigma \in (0, 1/2)$, then the sufficient descent condition (18) holds.



Proof.

By multiplying (3) by $g_k^T$, we obtain

(19) $g_k^T d_k = g_k^T\left(-g_k + \beta_k d_{k-1}\right) = -\|g_k\|^2 + \beta_k g_k^T d_{k-1}$.

Dividing (19) by $\|g_k\|^2$ and using the curvature condition (6) at the previous iteration,

(20) $|g_k^T d_{k-1}| = |\nabla f(x_{k-1} + \alpha_{k-1} d_{k-1})^T d_{k-1}| \le -\sigma g_{k-1}^T d_{k-1}$,

together with (12) and the bound (13), we obtain

(21) $-1 + \sigma \dfrac{g_{k-1}^T d_{k-1}}{\|g_{k-1}\|^2} \le \dfrac{g_k^T d_k}{\|g_k\|^2} \le -1 - \sigma \dfrac{g_{k-1}^T d_{k-1}}{\|g_{k-1}\|^2}$.

From (3), we have $g_1^T d_1 = -\|g_1\|^2$. Assume the result holds up to $k-1$, i.e., $g_i^T d_i < 0$ for $i = 1, 2, \ldots, k-1$. Repeating the process for (21), we obtain

(22) $-\sum_{j=0}^{k-1} \sigma^j \le \dfrac{g_k^T d_k}{\|g_k\|^2} \le -2 + \sum_{j=0}^{k-1} \sigma^j$.

Since

(23) $\sum_{j=0}^{k-1} \sigma^j = \dfrac{1 - \sigma^k}{1 - \sigma}$,

it follows that

(24) $-\dfrac{1 - \sigma^k}{1 - \sigma} \le \dfrac{g_k^T d_k}{\|g_k\|^2} \le -2 + \dfrac{1 - \sigma^k}{1 - \sigma}$,

and when $\sigma \le 1/2$, we obtain $(1 - \sigma^k)/(1 - \sigma) < 2$. Let $c = 2 - (1 - \sigma^k)/(1 - \sigma)$; then

(25) $c - 2 \le \dfrac{g_k^T d_k}{\|g_k\|^2} \le -c$.

The proof is complete.

Theorem 2.

Let $g_k$ and $d_k$ be obtained by (2) and (3) with $\beta_k^{A} = -t g_k^T s_{k-1} / \left(d_{k-1}^T y_{k-1}\right)$, where $\alpha_k$ is computed by the SWP line search (5) and (6). Then the descent condition holds.

Proof.

Writing $s_{k-1} = \alpha_{k-1} d_{k-1}$, the second branch of (12) becomes

(26) $\beta_k^{A} = -\dfrac{t \alpha_{k-1} g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}}$.

By multiplying (3) by $g_k^T$ and substituting $\beta_k^{A}$, we obtain

(27) $g_k^T d_k = -\|g_k\|^2 - \dfrac{t \alpha_{k-1} g_k^T d_{k-1}}{d_{k-1}^T y_{k-1}} g_k^T d_{k-1} = -\|g_k\|^2 - \dfrac{t \alpha_{k-1} \left(g_k^T d_{k-1}\right)^2}{d_{k-1}^T y_{k-1}} < 0$,

since $d_{k-1}^T y_{k-1} > 0$ under the Wolfe conditions, which completes the proof.

Zoutendijk  presented a useful lemma for the global convergence of the CG method, given as follows.

Lemma 1.

Let Assumption 1 hold and consider any method of the form (2) and (3), where $\alpha_k$ is obtained by the WWP line search (5) and (7) and the search direction is a descent direction. Then the following condition holds:

(28) $\sum_{k=1}^{\infty} \dfrac{\left(g_k^T d_k\right)^2}{\|d_k\|^2} < \infty$.

Theorem 3.

Suppose Assumption 1 holds. Consider any method of the form (2) and (3) with the new formula (12), in which $\alpha_k$ is obtained from the SWP line search (5) and (6) with $\sigma \le 1/2$. Then

(29) $\liminf_{k \to \infty} \|g_k\| = 0$.

The proof is similar to that presented in .

Proof.

We prove the theorem by contradiction. Assume that the conclusion is not true; then a constant $\varepsilon > 0$ exists such that

(30) $\|g_k\| \ge \varepsilon, \quad \forall k \ge 1$.

Squaring both sides of equation (3), we obtain

(31) $\|d_k\|^2 = \|g_k\|^2 - 2 \beta_k g_k^T d_{k-1} + \beta_k^2 \|d_{k-1}\|^2$.

Dividing (31) by $\|g_k\|^4$, we get

(32) $\dfrac{\|d_k\|^2}{\|g_k\|^4} = \dfrac{1}{\|g_k\|^2} - \dfrac{2 \beta_k g_k^T d_{k-1}}{\|g_k\|^4} + \dfrac{\beta_k^2 \|d_{k-1}\|^2}{\|g_k\|^4}$.

Using (6), (12), (25), and (32), we obtain

(33) $\dfrac{\|d_k\|^2}{\|g_k\|^4} \le \dfrac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \dfrac{1}{\|g_k\|^2} + \dfrac{2 \sigma |g_{k-1}^T d_{k-1}|}{\|g_{k-1}\|^2 \|g_k\|^2} \le \dfrac{\|d_{k-1}\|^2}{\|g_{k-1}\|^4} + \dfrac{1 + 2\sigma(2 - c)}{\|g_k\|^2}$.

Repeating the process for (33) and using the relationship $\|g_1\| = \|d_1\|$ yields

(34) $\dfrac{\|d_k\|^2}{\|g_k\|^4} \le \left(1 + 2\sigma(2 - c)\right) \sum_{i=1}^{k} \dfrac{1}{\|g_i\|^2}$.

From (34) and (30), we obtain

(35) $\dfrac{\|g_k\|^4}{\|d_k\|^2} \ge \dfrac{\varepsilon^2}{k \left(1 + 2\sigma(2 - c)\right)}$.

Therefore,

(36) $\sum_{k=1}^{\infty} \dfrac{\|g_k\|^4}{\|d_k\|^2} = \infty$.

This result contradicts (28); thus $\liminf_{k \to \infty} \|g_k\| = 0$. The proof is complete.

4. Numerical Results

To investigate the effectiveness of the new parameter, the test problems in Table 1, taken from the CUTEst library , are chosen. We performed a comparison with CG_Descent 5.3 based on the CPU time and the number of iterations. We employed the SWP line search as described in [1, 2] with $\delta = 0.01$ and $\sigma = 0.1$. CG_Descent 6.8 with memory (mem) set to zero is employed to obtain all results. The code can be downloaded from Hager's web page: http://users.clas.ufl.edu/hager/papers/Software/.

Table 1: The test functions. Each row lists the function name, the problem dimension, and then, for CG_Descent 5.3 and the new formula $\beta_k^{A}$ in turn, the number of iterations and the CPU time (seconds).
AKIVA2100.0280.02
ALLINITU4120.0290.02
ARGLINA20010.0210.02
ARGLINB20050.0260.11
BARD3160.02120.02
BDQRTIC50001360.581610.75
BEALE2150.02110.02
BIGGS66270.02240.02
BOX33110.02100.02
BOX100080.0870.08
BRKMCC250.0250.02
BROWNAL20090.0290.02
BROWNBS2130.02100.02
BROWNDEN4160.02160.02
BROYDN7D500014115.47640.37
BRYBND5000850.38390.22
CHAINWOO40003180.8663791.08
CHNROSNB502870.023400.02
CLIFF2180.02100.02
COSINE10000110.19140.26
CRAGGLVY50001030.451040.48
CUBE2320.02170.02
CURLY101000047808173.742454145.16
CURLY201000066587383.9467279366.03
CURLY301000079030639.6374375509.59
DECONVU634002.00E − 022270.02
DENSCHNA290.0260.02
DENSCHNB270.0260.02
DENSCHNC2120.02110.02
DENSCHND3470.02140.02
DENSCHNE3180.02120.02
DENSCHNF280.0290.02
DIXMAANA300070.0260.02
DIXMAANB300060.0260.02
DIXMAANC300060.0260.02
DIXMAAND300070.0280.02
DIXMAANE30002220.332180.33
DIXMAANF30001610.131160.09
DIXMAANG30001570.121730.14
DIXMAANH30001730.221900.2
DIXMAANI300038564.2531603.34
DIXMAANJ30003270.363600.39
DIXMAANK30002830.284160.36
DIXMAANL30002370.23990.36
DIXON3DQ100001000019.121000019.12
DJTL2820.02750.02
DQDRTIC500050.0250.02
DQRTIC5000170.03150.03
EDENSCH2000260.03320.05
EG2100050.0230.02
EIGENALS255010083178.367247133.4
EIGENBLS25501530123718846290.3
EIGENCLS265210136174.1911152186.86
ENGVAL15000270.06230.12
ENGVAL23260.02260.02
ERRINROS503800.02955042.36
EXPFIT2130.0290.02
EXTROSNB100038081.2523700.87
FLETCBV2500010.0210.02
FLETCHCR10001520.05840.05
FMINSRF256253461.09E + 004851.4
FMINSURF56254731.515421.64
FREUROTH5000250.11290.19
GENROSE50010780.1720980.45
GROWTHLS31560.021090.02
GULF3370.02330.02
HAIRY2360.02170.02
HATFLDD3200.02170.02
HATFLDE3300.02130.02
HATFLDFL3390.02210.02
HEART6LS66840.023750.02
HEART8LS82490.022530.02
HELIX3230.02230.02
HIELOW3140.02130.05
HILBERTA220.0220.02
HILBERTB1040.0240.02
HIMMELBB2100.0240.02
HIMMELBF4260.02230.02
HIMMELBG280.0270.02
HIMMELBH270.0250.02
HUMPS2520.02450.02
JENSMP2150.02120.02
JIMACK3544983141182.2572971030.3
KOWOSB4170.02160
LIARWHD5000210.03150.05
LOGHAIRY2270.02260.02
MANCINO100110.08110.08
MARATOSB211450.025890.02
MEXHAT2200.02140.02
MOREBV50001610.411610.38
MSQRTALS102429058.6427889.08
MSQRTBLS102422806.9121816.84
NCB20B500203546.36418170.16
NCB20501087911.8395913
NONCVXU25000661015.89637915.92
NONDIA500070.0370.03
NONDQUAR500019422.4530583.88
OSBORNEA5940.02820.02
OSBORNEB11620.02570.02
PALMER1C8110.02120.02
PALMER1D7110.02100.02
PALMER2C8110.02110.02
PALMER3C8110.02110.02
PALMER4C8110.02110.02
PALMER5C660.0260.02
PALMER6C8110.02110.02
PALMER7C8110.02110.02
PALMER8C8110.02110.02
PARKCH1567229.4582339.39
PENALTY11000280.02410.02
PENALTY22001910.052000.03
PENALTY3200991.78881.98
POWELLSG5000260.02270.05
POWER100003720.765431.2
QUARTC5000170.03150.02
ROSENBR2340.02280.02
S308280.0270.02
SCHMVETT5000430.23400.27
SENSORS100210.25500.8
SINEVAL2640.02460.02
SISSER260.0250.02
SNAIL21000.02610.02
SPARSINE500018358732132883
SPARSQUR10000280.31350.98
SPMSRTLS49992030.592190.61
SROSENBR5000110.0290.03
STRATEC1046219.981706.23
TOINTGOR501350.021200.02
TOINTGSS500040.0250.02
TOINTPSP501430.021570.02
TOINTQOR50290.02290.02
TQUARTIC5000140.03110.03
TRIDIA50007820.847831.11
VAREIGVL230.02240.02
VIBRBEAM501380.02980.02
WATSON8490.02610.02
WOODS12220.06220.03
YFITU4000840.02680.02
ZANGWIL2310.0210.02

The CG_Descent 5.3 results are obtained by running CG_Descent 6.8 with memory set to zero. The host computer is an AMD A4-7210 with 4 GB of RAM. The results are shown in Figures 1 and 2, in which the performance measure introduced by Dolan and Moré  was employed. As shown in Figure 1, formula A clearly outperforms CG_Descent in the number of iterations. In Figure 2, we notice that the new CG formula A is strongly competitive with CG_Descent in CPU time.
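The Dolan–Moré performance profile underlying Figures 1 and 2 can be computed as follows. This is a sketch under the assumption that each solver's cost (iterations or CPU time) is stored in one column of a matrix; the function name is ours.

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profile.  T[p, s] is the cost (e.g.
    iterations or CPU time) of solver s on problem p.  Returns, for
    each solver, the fraction of problems it solves within a factor
    tau of the best solver, for every tau in taus."""
    r = T / T.min(axis=1, keepdims=True)          # performance ratios
    return np.array([[np.mean(r[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Toy example with 3 problems and 2 solvers: each solver is best on
# one problem and they tie on the third.
T = np.array([[10.0, 20.0],
              [30.0, 15.0],
              [12.0, 12.0]])
rho = performance_profile(T, taus=[1.0, 2.0])
```

The solver whose curve sits highest on the left is the fastest on the most problems; the curve's limit as tau grows measures robustness.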

Performance measure based on the number of iterations.

Performance measure based on the CPU time.

4.1. Multimodal Function with Its Graph

In this section, we present the six-hump camel back function, a multimodal function used to test the efficiency of optimization algorithms. The function is defined as follows:

(37) $f(x) = \left(4 - 2.1 x_1^2 + \dfrac{x_1^4}{3}\right) x_1^2 + x_1 x_2 + \left(-4 + 4 x_2^2\right) x_2^2$.

The number of variables $n$ equals 2. This function has six local minima, two of which are global. Thus, this function is a multimodal function usually used to test for global minima. The global minima are $x^* = (0.0898, -0.7126)$ and $x^* = (-0.0898, 0.7126)$, with function value $f(x^*) = -1.0316$. As its name suggests, this function looks like the back of an upside-down camel with six humps (see Figure 3 for a three-dimensional graph); for more information about two-dimensional test functions, the reader can refer to .
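The function in (37) and its two standard global minimizers, $(0.0898, -0.7126)$ and $(-0.0898, 0.7126)$ with value $\approx -1.0316$, can be checked numerically with a short script (the function name below is ours).

```python
import numpy as np

def six_hump_camel(x):
    """Six-hump camel back function, eq. (37)."""
    x1, x2 = x
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2 + (-4 + 4 * x2**2) * x2**2)

# The two global minimizers; note f(-x) = f(x) by symmetry, since every
# term has even total degree except x1*x2, which is also sign-invariant.
x_star1 = np.array([0.0898, -0.7126])
x_star2 = np.array([-0.0898, 0.7126])
```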

Six-hump camel back function in 3D.

Finally, note that the CG method can also be applied to image restoration problems, neural networks, and other applications. For more information, the reader can refer to [20, 21].

5. Conclusions

In this study, a modified version of the CG algorithm (formula A) is proposed and its performance investigated. The modified formula restarts based on an approximation of the Lipschitz constant. Global convergence is established using the SWP line search. Our numerical results show that the new coefficient produces efficient and competitive results compared with other methods, such as CG_Descent 5.3. In the future, the new version of the CG method will be combined with feed-forward neural network training (the back-propagation (BP) algorithm) to improve the training process and produce a fast multilayer training algorithm. This will help reduce the time needed to train neural networks when the training set is massive.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Universiti Malaysia Terengganu for supporting this work.