We consider optimization problems with nonnegativity constraints. It is well known that conjugate gradient methods are efficient for solving large-scale unconstrained optimization problems owing to their simplicity and low storage requirements. Combining the modified Polak-Ribière-Polyak method proposed by Zhang, Zhou, and Li with the Zoutendijk feasible direction method, we propose a conjugate gradient type method for solving nonnegativity-constrained optimization problems. If the current iterate is a feasible point, the direction generated by the proposed method is always a feasible descent direction at that iterate. Under appropriate conditions, we show that the proposed method is globally convergent. We also present numerical results that illustrate the efficiency of the proposed method.
1. Introduction
Due to their simplicity and low memory requirements, conjugate gradient methods play a very important role in solving unconstrained optimization problems, especially large-scale ones. Over the years, many variants of the conjugate gradient method have been proposed, and some are widely used in practice. The key features of conjugate gradient methods are that they require no matrix storage and are typically faster than the steepest descent method.
The linear conjugate gradient method was proposed by Hestenes and Stiefel [1] in the 1950s as an iterative method for solving linear systems
\[ A x = b, \quad x \in \mathbb{R}^n, \tag{1} \]
where $A$ is an $n \times n$ symmetric positive definite matrix. Problem (1) can be stated equivalently as the following minimization problem:
\[ \min \ \tfrac{1}{2} x^{T} A x - b^{T} x, \quad x \in \mathbb{R}^n. \tag{2} \]
This equivalence allows us to interpret the linear conjugate gradient method either as an algorithm for solving linear systems or as a technique for minimizing convex quadratic functions. For any starting point $x_0 \in \mathbb{R}^n$, the sequence $\{x_k\}$ generated by the linear conjugate gradient method converges (in exact arithmetic) to the solution $x^*$ of the linear system (1) in at most $n$ steps.
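The finite termination property can be illustrated with a minimal sketch of the linear conjugate gradient iteration. The function name `conjugate_gradient`, the tolerance `tol`, and the 2×2 test system are assumptions made for this illustration only; they do not come from the paper.

```python
def conjugate_gradient(A, b, x0, tol=1e-12):
    """Linear CG sketch for A x = b, with A symmetric positive definite.

    A is a list of rows, b and x0 are lists of floats (illustrative layout).
    Returns the final iterate and the number of steps taken.
    """
    n = len(b)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual b - A x
    d = list(r)                                       # first direction: steepest descent
    steps = 0
    while steps < n and dot(r, r) > tol:
        Ad = matvec(A, d)
        alpha = dot(r, r) / dot(d, Ad)                # exact minimizer along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r_new = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        beta = dot(r_new, r_new) / dot(r, r)          # makes d conjugate to the old d
        d = [rn + beta * di for rn, di in zip(r_new, d)]
        r = r_new
        steps += 1
    return x, steps


# Hypothetical 2x2 test system; its exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, steps = conjugate_gradient(A, b, [0.0, 0.0])
```

For this two-dimensional system the loop runs at most twice, in line with the bound of at most $n$ steps.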
The first nonlinear conjugate gradient method was introduced by Fletcher and Reeves [2] in the 1960s. It is one of the earliest known techniques for solving large-scale nonlinear optimization problems
\[ \min f(x), \quad x \in \mathbb{R}^n, \tag{3} \]
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. The nonlinear conjugate gradient methods for solving (3) have the following form:
\[ x_{k+1} = x_k + \alpha_k d_k, \qquad d_k = \begin{cases} -\nabla f(x_k), & k = 0, \\ -\nabla f(x_k) + \beta_k d_{k-1}, & k \ge 1, \end{cases} \tag{4} \]
where $\alpha_k$ is a steplength obtained by a line search and $\beta_k$ is a scalar which determines the different conjugate gradient methods. If we choose $f$ to be a strongly convex quadratic and $\alpha_k$ to be the exact minimizer, the nonlinear conjugate gradient method reduces to the linear conjugate gradient method. Several famous formulae for $\beta_k$ are the Hestenes-Stiefel (HS) [1], Fletcher-Reeves (FR) [2], Polak-Ribière-Polyak (PRP) [3, 4], Conjugate-Descent (CD) [5], Liu-Storey (LS) [6], and Dai-Yuan (DY) [7] formulae, which are given by
\[ \beta_k^{HS} = \frac{\nabla f(x_k)^{T} y_{k-1}}{d_{k-1}^{T} y_{k-1}}, \qquad \beta_k^{FR} = \frac{\|\nabla f(x_k)\|^2}{\|\nabla f(x_{k-1})\|^2}, \tag{5} \]
\[ \beta_k^{PRP} = \frac{\nabla f(x_k)^{T} y_{k-1}}{\|\nabla f(x_{k-1})\|^2}, \qquad \beta_k^{CD} = -\frac{\|\nabla f(x_k)\|^2}{d_{k-1}^{T} \nabla f(x_{k-1})}, \tag{6} \]
\[ \beta_k^{LS} = -\frac{\nabla f(x_k)^{T} y_{k-1}}{d_{k-1}^{T} \nabla f(x_{k-1})}, \qquad \beta_k^{DY} = \frac{\|\nabla f(x_k)\|^2}{d_{k-1}^{T} y_{k-1}}, \tag{7} \]
where $y_{k-1} = \nabla f(x_k) - \nabla f(x_{k-1})$ and $\|\cdot\|$ stands for the Euclidean norm of vectors. In this paper, we focus our attention on the Polak-Ribière-Polyak (PRP) method. The PRP method has received much attention, and good progress has been made in its study. The global convergence of the PRP method with exact line search has been proved in [3] under a strong convexity assumption on $f$. However, for general nonlinear functions, an example given by Powell [8] shows that the PRP method may fail to be globally convergent even if the exact line search is used. Inspired by Powell's work, Gilbert and Nocedal [9] conducted an elegant analysis and showed that the PRP method is globally convergent if $\beta_k^{PRP}$ is restricted to be nonnegative and $\alpha_k$ is determined by a line search satisfying the sufficient descent condition $g_k^{T} d_k \le -c \|g_k\|^2$, where $g_k = \nabla f(x_k)$, in addition to the standard Wolfe conditions. Other conjugate gradient methods and their global convergence can be found in [10–15] and the references therein.
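As a small illustration of formulae (5)-(7), the following sketch evaluates all six choices of $\beta_k$ from the same data. The function name `cg_betas` and the sample vectors are hypothetical; the vectors were chosen to mimic one exact line search step on a convex quadratic started from the steepest descent direction, a setting in which the six formulae are known to coincide.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_betas(g_new, g_old, d_old):
    """Evaluate the six beta formulae (5)-(7).

    g_new = grad f(x_k), g_old = grad f(x_{k-1}), d_old = d_{k-1}.
    """
    y = [a - b for a, b in zip(g_new, g_old)]  # y_{k-1}
    return {
        "HS": dot(g_new, y) / dot(d_old, y),
        "FR": dot(g_new, g_new) / dot(g_old, g_old),
        "PRP": dot(g_new, y) / dot(g_old, g_old),
        "CD": -dot(g_new, g_new) / dot(d_old, g_old),
        "LS": -dot(g_new, y) / dot(d_old, g_old),
        "DY": dot(g_new, g_new) / dot(d_old, y),
    }


# Illustrative data: d_old = -g_old and g_new is orthogonal to d_old,
# as after an exact line search along the steepest descent direction.
betas = cg_betas([0.5, -0.25], [-1.0, -2.0], [1.0, 2.0])
```

With inexact line searches or nonquadratic $f$, the six values generally differ, which is what distinguishes the methods in practice.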
Recently, Li and Wang [16] extended the modified Fletcher-Reeves (MFR) method proposed by Zhang et al. [17] for solving unconstrained optimization to the nonlinear equations
\[ F(x) = 0, \tag{8} \]
where $F : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable, and proposed a descent derivative-free method for solving symmetric nonlinear equations. The direction generated by the method is a descent direction for the residual function. Under appropriate conditions, the method is globally convergent when a backtracking line search technique is used.
In this paper, we further study the conjugate gradient method. We focus our attention on the modified Polak-Ribière-Polyak (MPRP) method proposed by Zhang et al. [18]. The direction generated by MPRP method is given by
\[ d_k = \begin{cases} -g(x_k), & k = 0, \\ -g(x_k) + \beta_k^{PRP} d_{k-1} - \theta_k y_{k-1}, & k > 0, \end{cases} \tag{9} \]
where $g(x_k) = \nabla f(x_k)$, $\beta_k^{PRP} = g(x_k)^{T} y_{k-1} / \|g(x_{k-1})\|^2$, $\theta_k = g(x_k)^{T} d_{k-1} / \|g(x_{k-1})\|^2$, and $y_{k-1} = g(x_k) - g(x_{k-1})$. The MPRP method not only preserves the good properties of the PRP method but also possesses another nice property; that is, it always generates descent directions for the objective function. This property is independent of the line search used. Under suitable conditions, the MPRP method with an Armijo-type line search is also globally convergent. The purpose of this paper is to develop an MPRP type method for nonnegativity-constrained optimization problems. Combining the Zoutendijk feasible direction method with the MPRP method, we propose a conjugate gradient type method for solving such problems. If the initial point is feasible, the method generates a sequence of feasible points. We also carry out numerical experiments to test the proposed method and compare its performance with that of the Zoutendijk feasible direction method. The numerical results show that the proposed method outperforms the Zoutendijk feasible direction method.
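The descent property of direction (9) can be checked numerically. The sketch below is only an illustration: the function name `mprp_direction` and the sample vectors are assumptions, not taken from [18]. Substituting the definitions of $\beta_k^{PRP}$ and $\theta_k$ into (9) shows that the two correction terms cancel in $g(x_k)^{T} d_k$, so that $g(x_k)^{T} d_k = -\|g(x_k)\|^2$ regardless of how $x_k$ was obtained.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mprp_direction(g_new, g_old, d_old):
    """Direction (9) for k > 0, from g(x_k), g(x_{k-1}) and d_{k-1}."""
    y = [a - b for a, b in zip(g_new, g_old)]   # y_{k-1}
    denom = dot(g_old, g_old)                   # ||g(x_{k-1})||^2
    beta = dot(g_new, y) / denom                # beta_k^{PRP}
    theta = dot(g_new, d_old) / denom           # theta_k
    return [-gn + beta * dk - theta * yk
            for gn, dk, yk in zip(g_new, d_old, y)]


# Arbitrary illustrative vectors; no line search is involved.
g_new = [1.0, -2.0, 0.5]
g_old = [0.3, 0.7, -1.0]
d_old = [2.0, -1.0, 4.0]
d = mprp_direction(g_new, g_old, d_old)
descent_gap = dot(g_new, d) + dot(g_new, g_new)  # should vanish up to rounding
```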
2. Algorithm
Consider the following nonnegativity-constrained optimization problem:
\[ \min f(x) \quad \text{s.t.} \quad x \ge 0, \tag{10} \]
where $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable. Let $x_k \ge 0$ be the current iterate. Define the index sets
\[ I_k = I(x_k) = \{ i \mid x_k(i) = 0 \}, \qquad J_k = \{1, 2, \ldots, n\} \setminus I_k, \tag{11} \]
where $x_k(i)$ is the $i$th component of $x_k$. In fact, the index set $I_k$ is the active set of problem (10) at $x_k$.
The purpose of this paper is to develop a conjugate gradient type method for problem (10). Since the iterates are required to be feasible, the search directions should be feasible descent directions. Let $x_k \ge 0$ be the current iterate. By the definition of a feasible direction [19], $d \in \mathbb{R}^n$ is a feasible direction of (10) at $x_k$ if and only if $d_{I_k} \ge 0$. Similar to the Zoutendijk feasible direction method, we consider the following problem:
\[ \min \nabla f(x_k)^{T} d \quad \text{s.t.} \quad d_{I_k} \ge 0, \ \|d\| \le 1. \tag{12} \]
Next, we show that, if $x_k$ is not a KKT point of (10), the solution of problem (12) is a feasible descent direction of $f$ at $x_k$.
Lemma 1.
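Let $x_k \ge 0$ and let $\bar{d}$ be a solution of problem (12); then $\nabla f(x_k)^{T} \bar{d} \le 0$. Moreover, $\nabla f(x_k)^{T} \bar{d} = 0$ if and only if $x_k$ is a KKT point of problem (10).
Proof.
Since $d = 0$ is a feasible point of problem (12), we must have $\nabla f(x_k)^{T} \bar{d} \le 0$. Consequently, if $\nabla f(x_k)^{T} \bar{d} \ne 0$, then $\nabla f(x_k)^{T} \bar{d} < 0$. This implies that the direction $\bar{d}$ is a feasible descent direction of $f$ at $x_k$.
Suppose that $\nabla f(x_k)^{T} \bar{d} = 0$. Problem (12) is equivalent to the following problem:
\[ \min \nabla f(x_k)^{T} d \quad \text{s.t.} \quad d_{I_k} \ge 0, \ \|d\|^2 \le 1. \tag{13} \]
Then there exist $\lambda_{I_k}$ and $\mu$ such that the following KKT condition holds:
\[ \begin{aligned} & \nabla f(x_k) - \begin{pmatrix} \lambda_{I_k} \\ 0 \end{pmatrix} + 2 \mu \bar{d} = 0, \\ & \lambda_{I_k} \ge 0, \quad \bar{d}_{I_k} \ge 0, \quad \lambda_{I_k}^{T} \bar{d}_{I_k} = 0, \\ & \mu \ge 0, \quad \|\bar{d}\| \le 1, \quad \mu \left( \|\bar{d}\|^2 - 1 \right) = 0. \end{aligned} \tag{14} \]
Multiplying the first of these expressions by $\bar{d}$, we obtain
\[ \nabla f(x_k)^{T} \bar{d} - \lambda^{T} \bar{d} + 2 \mu \|\bar{d}\|^2 = 0, \tag{15} \]
where $\lambda = \begin{pmatrix} \lambda_{I_k} \\ 0 \end{pmatrix}$. The assumption $\nabla f(x_k)^{T} \bar{d} = 0$ and the complementarity condition $\lambda_{I_k}^{T} \bar{d}_{I_k} = 0$ in (14) reduce (15) to $\mu \|\bar{d}\|^2 = 0$; together with $\mu (\|\bar{d}\|^2 - 1) = 0$, this gives $\mu = 0$. Substituting $\mu = 0$ into the first expression of (14), we obtain
\[ \nabla f_{I_k}(x_k) - \lambda_{I_k} = 0, \qquad \nabla f_{J_k}(x_k) = 0. \tag{16} \]
Let $\lambda_i = 0$ for $i \in J_k$; then $\lambda_i \ge 0$ for $i \in I_k \cup J_k$. Moreover, we have
\[ \nabla f(x_k) - \begin{pmatrix} \lambda_{I_k} \\ \lambda_{J_k} \end{pmatrix} = 0, \qquad \lambda_i \ge 0, \quad x_k(i) \ge 0, \quad \lambda_i x_k(i) = 0, \quad i \in I_k \cup J_k. \tag{17} \]
This implies that $x_k$ is a KKT point of problem (10).
Conversely, suppose that $x_k$ is a KKT point of problem (10). Then there exist $\lambda_i$, $i \in I_k \cup J_k$, such that the following KKT condition holds:
\[ \nabla f(x_k) - \begin{pmatrix} \lambda_{I_k} \\ \lambda_{J_k} \end{pmatrix} = 0, \qquad \lambda_i \ge 0, \quad x_k(i) \ge 0, \quad \lambda_i x_k(i) = 0, \quad i \in I_k \cup J_k. \tag{18} \]
Since $x_k(i) > 0$ for $i \in J_k$, the complementarity condition gives $\lambda_{J_k} = 0$. Substituting this into the first expression of (18), we have $\nabla f_{I_k}(x_k) = \lambda_{I_k} \ge 0$ and $\nabla f_{J_k}(x_k) = 0$, so that $\nabla f(x_k)^{T} \bar{d} = \nabla f_{I_k}(x_k)^{T} \bar{d}_{I_k} = \lambda_{I_k}^{T} \bar{d}_{I_k} \ge 0$. However, we have shown that $\nabla f(x_k)^{T} \bar{d} \le 0$, so $\nabla f(x_k)^{T} \bar{d} = 0$.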
The proof of Lemma 1 shows that $\nabla f_{I_k}(x_k) \ge 0$ and $\nabla f_{J_k}(x_k) = 0$ are necessary conditions for $x_k$ to be a KKT point of problem (10). We summarize these observations in the following result.
Lemma 2.
Let $x_k \ge 0$; then $x_k$ is a KKT point of problem (10) if and only if $\nabla f_{I_k}(x_k) \ge 0$ and $\nabla f_{J_k}(x_k) = 0$.
Proof.
First, suppose that $x_k$ is a KKT point of problem (10). As in the proof of Lemma 1, it follows that $\nabla f_{I_k}(x_k) \ge 0$ and $\nabla f_{J_k}(x_k) = 0$.
Second, suppose that $\nabla f_{I_k}(x_k) \ge 0$ and $\nabla f_{J_k}(x_k) = 0$. Let $\lambda_{I_k} = \nabla f_{I_k}(x_k) \ge 0$ and $\lambda_{J_k} = 0$; then the KKT condition (18) holds, so that $x_k$ is a KKT point of problem (10).
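Lemma 2 gives a componentwise test for KKT points that is simple to code. The following sketch is one possible reading of it, under assumptions that are not in the paper: the function name `is_kkt_point`, the tolerance `tol` on the gradient components, and the example objective $f(x) = (x_1 - 1)^2 + (x_2 + 1)^2$ are all hypothetical, and the input point is assumed to be feasible.

```python
def is_kkt_point(x, grad, tol=1e-8):
    """Check the conditions of Lemma 2 at a feasible point x >= 0.

    Active components (x_i == 0) need grad_i >= 0; the remaining
    components need grad_i == 0. Both tests are relaxed by tol,
    which is an assumption of this sketch.
    """
    for xi, gi in zip(x, grad):
        if xi == 0.0:
            if gi < -tol:          # i in I_k: gradient must be nonnegative
                return False
        elif abs(gi) > tol:        # i in J_k: gradient must vanish
            return False
    return True


def grad_f(x):
    # Gradient of the illustrative objective (x1 - 1)^2 + (x2 + 1)^2.
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 1.0)]


at_solution = is_kkt_point([1.0, 0.0], grad_f([1.0, 0.0]))
at_origin = is_kkt_point([0.0, 0.0], grad_f([0.0, 0.0]))
at_interior = is_kkt_point([2.0, 0.0], grad_f([2.0, 0.0]))
```

For this objective the constrained minimizer is $(1, 0)$: the second constraint is active there with a positive gradient component, while the first component of the gradient vanishes.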
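Based on the above discussion, we propose a conjugate gradient type method for solving problem (10) as follows. Let the feasible point $x_k$ be the current iterate. For the components on the boundary of the feasible region, $x_{kI_k} = 0$, we take
\[ d_{ki} = \begin{cases} 0, & g_i(x_k) > 0, \\ -g_i(x_k), & g_i(x_k) \le 0, \end{cases} \qquad \forall i \in I_k, \tag{19} \]
where $g_i(x_k) = \nabla f_i(x_k)$. For the components in the interior of the feasible region, $x_{kJ_k} > 0$, similar to the direction $d_k$ in the MPRP method, we define $d_{kJ_k}$ by the following formula:
\[ d_{kJ_k}^{MPRP} = \begin{cases} -g_{J_k}(x_k), & k = 0, \\ -g_{J_k}(x_k) + \beta_k^{PRP} d_{k-1,J_k} - \theta_k^{MPRP} y_{k-1}, & k > 0, \end{cases} \tag{20} \]
where $g_{J_k}(x_k) = \nabla f_{J_k}(x_k)$, $\beta_k^{PRP} = g_{J_k}(x_k)^{T} y_{k-1} / \|g(x_{k-1})\|^2$, $\theta_k^{MPRP} = g_{J_k}(x_k)^{T} d_{k-1,J_k} / \|g(x_{k-1})\|^2$, and $y_{k-1} = g_{J_k}(x_k) - g_{J_k}(x_{k-1})$.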
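It is easy to see from (19) and (20) that
\[ -\|g_{I_k}(x_k)\|^2 \le g_{I_k}(x_k)^{T} d_{kI_k} \le 0, \qquad g_{J_k}(x_k)^{T} d_{kJ_k} = -\|g_{J_k}(x_k)\|^2. \tag{21} \]
The above relations indicate that
\[ g(x_k)^{T} d_k = g_{I_k}(x_k)^{T} d_{kI_k} + g_{J_k}(x_k)^{T} d_{kJ_k} \le -\|g_{J_k}(x_k)\|^2, \tag{22} \]
\[ g(x_k)^{T} d_k \ge -\|g_{I_k}(x_k)\|^2 - \|g_{J_k}(x_k)\|^2 = -\|g(x_k)\|^2, \tag{23} \]
where $g(x_k) = \nabla f(x_k)$.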
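For completeness, the computation behind (21) can be written out. This is a short verification sketch that uses only definitions (19) and (20); the shorthand $g_{J_k} = g_{J_k}(x_k)$ and $g_i = g_i(x_k)$ is introduced here for brevity.

```latex
% Interior part, k > 0: the beta and theta terms cancel.
\begin{aligned}
g_{J_k}^{T} d_{kJ_k}^{MPRP}
 &= -\|g_{J_k}\|^2
    + \beta_k^{PRP}\, g_{J_k}^{T} d_{k-1,J_k}
    - \theta_k^{MPRP}\, g_{J_k}^{T} y_{k-1} \\
 &= -\|g_{J_k}\|^2
    + \frac{(g_{J_k}^{T} y_{k-1})(g_{J_k}^{T} d_{k-1,J_k})
          - (g_{J_k}^{T} d_{k-1,J_k})(g_{J_k}^{T} y_{k-1})}
           {\|g(x_{k-1})\|^2}
  = -\|g_{J_k}\|^2 .
\end{aligned}

% Boundary part: each product is either 0 or -g_i^2.
g_i\, d_{ki} =
\begin{cases}
 0, & g_i > 0, \\
 -g_i^2, & g_i \le 0,
\end{cases}
\quad i \in I_k
\qquad\Longrightarrow\qquad
-\|g_{I_k}\|^2 \le g_{I_k}^{T} d_{kI_k} \le 0 .
```

For $k = 0$ the interior identity is immediate, since $d_{0J_0} = -g_{J_0}(x_0)$.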
Theorem 3.
Let $x_k \ge 0$ and let $d_k$ be defined by (19) and (20); then
\[ g(x_k)^{T} d_k \le 0. \tag{24} \]
Moreover, $x_k$ is a KKT point of problem (10) if and only if $g(x_k)^{T} d_k = 0$.
Proof.
Clearly, inequality (22) implies that
\[ g(x_k)^{T} d_k \le 0. \tag{25} \]
If $x_k$ is a KKT point of problem (10), then by Lemma 2 we have $g_{I_k}(x_k) \ge 0$ and $g_{J_k}(x_k) = 0$. Hence $g_{I_k}(x_k)^{T} d_{kI_k} = 0$ by (19) and $g_{J_k}(x_k)^{T} d_{kJ_k} = 0$ by (21), so that $g(x_k)^{T} d_k = 0$.
If $g(x_k)^{T} d_k = 0$, then by (21) and (22) we get
\[ g_{I_k}(x_k)^{T} d_{kI_k} = 0, \qquad g_{J_k}(x_k)^{T} d_{kJ_k} = -\|g_{J_k}(x_k)\|^2 = 0. \tag{26} \]
The equality $g_{I_k}(x_k)^{T} d_{kI_k} = 0$ and the definition (19) of $d_{kI_k}$ imply that
\[ g_{I_k}(x_k) \ge 0. \tag{27} \]
Let $\lambda_{I_k} = g_{I_k}(x_k) \ge 0$ and $\lambda_{J_k} = 0$; then the KKT condition (18) holds, so that $x_k$ is a KKT point of problem (10).
By combining (22) with Theorem 3, we conclude that $d_k$ defined by (19) and (20) provides a feasible descent direction of $f$ at $x_k$ if $x_k$ is not a KKT point of problem (10).
Based on the above process, we propose an MPRP type method for solving (10) as follows.
Algorithm 4 (MPRP type method).
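Step 0. Given constants $\rho \in (0, 1)$, $\delta > 0$, $\epsilon > 0$. Choose the initial point $x_0 \ge 0$. Let $k := 0$.
Step 1. Compute $d_k = (d_{kI_k}, d_{kJ_k})$ by (19) and (20). If $|g(x_k)^{T} d_k| \le \epsilon$, then stop. Otherwise, go to the next step.
Step 2. Determine $\alpha_k = \max\{\rho^j : j = 0, 1, 2, \ldots\}$ satisfying $x_k + \alpha_k d_k \ge 0$ and
\[ f(x_k + \alpha_k d_k) \le f(x_k) - \delta \alpha_k^2 \|d_k\|^2. \tag{28} \]
Step 3. Let the next iterate be $x_{k+1} = x_k + \alpha_k d_k$.
Step 4. Let $k := k + 1$ and go to Step 1.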
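Steps 0-4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Matlab code used in the experiments: the function name `mprp_nonneg`, the cap of 60 backtracking steps, and the example objective $f(x) = (x_1 - 1)^2 + (x_2 + 1)^2$ are hypothetical, and the active set is detected by exact comparison with zero.

```python
def mprp_nonneg(f, grad, x0, rho=0.5, delta=0.1, eps=1e-4, max_iter=10000):
    """Sketch of Algorithm 4 for min f(x) s.t. x >= 0, starting from x0 >= 0."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x, g_old, d_old = list(x0), None, None
    n = len(x)
    for k in range(max_iter):
        g = grad(x)
        free = [i for i in range(n) if x[i] != 0.0]      # J_k; the rest is I_k
        d = [0.0] * n
        for i in range(n):                               # (19): boundary components
            if x[i] == 0.0 and g[i] <= 0.0:
                d[i] = -g[i]
        if k == 0:                                       # (20): interior components
            for i in free:
                d[i] = -g[i]
        else:
            denom = dot(g_old, g_old)                    # ||g(x_{k-1})||^2
            y = {i: g[i] - g_old[i] for i in free}
            beta = sum(g[i] * y[i] for i in free) / denom
            theta = sum(g[i] * d_old[i] for i in free) / denom
            for i in free:
                d[i] = -g[i] + beta * d_old[i] - theta * y[i]
        if abs(dot(g, d)) <= eps:                        # Step 1: stopping test
            return x, k
        alpha, fx, dd = 1.0, f(x), dot(d, d)             # Step 2: backtracking
        for _ in range(60):                              # safeguard cap (assumption)
            trial = [xi + alpha * di for xi, di in zip(x, d)]
            if min(trial) >= 0.0 and f(trial) <= fx - delta * alpha ** 2 * dd:
                break
            alpha *= rho
        else:
            return x, k                                  # no acceptable step found
        x, g_old, d_old = trial, g, d                    # Steps 3 and 4
    return x, max_iter


def f_example(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 1.0) ** 2


def grad_example(x):
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 1.0)]


x_star, iters = mprp_nonneg(f_example, grad_example, [0.5, 0.0])
```

In this small example the second constraint is active at the starting point with a positive gradient component, so (19) keeps that component fixed at zero and the method moves only in the free variable.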
It is easy to see that the sequence $\{x_k\}$ generated by Algorithm 4 is a sequence of feasible points. Moreover, it follows from (28) that the function value sequence $\{f(x_k)\}$ is decreasing. In addition, if $f(x)$ is bounded from below, we have from (28) that
\[ \sum_{k=0}^{\infty} \alpha_k^2 \|d_k\|^2 < \infty. \tag{29} \]
In particular, we have
\[ \lim_{k \to \infty} \alpha_k \|d_k\| = 0. \tag{30} \]
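The passage from (28) to (29) is a telescoping argument, sketched below for completeness. The symbol $f_{\inf}$ denotes a lower bound of $f$ on the feasible region and is introduced here only for this sketch.

```latex
% Sum the line search condition (28) over k = 0, ..., K.
\delta \sum_{k=0}^{K} \alpha_k^2 \|d_k\|^2
  \le \sum_{k=0}^{K} \bigl( f(x_k) - f(x_{k+1}) \bigr)
  = f(x_0) - f(x_{K+1})
  \le f(x_0) - f_{\inf} < \infty .
```

Letting $K \to \infty$ gives (29), and the terms of a convergent series tend to zero, which gives (30).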
Next, we prove the global convergence of Algorithm 4 under the following assumptions.
Assumption A.
(1) The level set $\omega = \{ x \in \mathbb{R}^n \mid f(x) \le f(x_0) \}$ is bounded.
(2) In some neighborhood $N$ of $\omega$, $f$ is continuously differentiable and its gradient is Lipschitz continuous; namely, there exists a constant $L > 0$ such that
\[ \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|, \quad \forall x, y \in N. \tag{31} \]
Clearly, Assumption A implies that there exists a constant $\gamma_1$ such that
\[ \|\nabla f(x)\| \le \gamma_1, \quad \forall x \in N. \tag{32} \]
Lemma 5.
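Suppose that the conditions in Assumption A hold, and let $\{x_k\}$ and $\{d_k\}$ be the iterative sequence and the direction sequence generated by Algorithm 4. If there exists a constant $\epsilon > 0$ such that
\[ \|g(x_k)\| \ge \epsilon, \quad \forall k, \tag{33} \]
then there exists a constant $M > 0$ such that
\[ \|d_k\| \le M, \quad \forall k. \tag{34} \]
Proof.
By combining (19), (20), and (33) with Assumption A, we deduce that
\[ \begin{aligned} \|d_k\| &\le \|d_{kI_k}\| + \|d_{kJ_k}^{MPRP}\| \\ &\le \gamma_1 + \|g_{J_k}(x_k)\| + \frac{2 \|g_{J_k}(x_k)\| \, \|y_{k-1}\| \, \|d_{k-1,J_k}^{MPRP}\|}{\|g(x_{k-1})\|^2} \\ &\le 2 \gamma_1 + \frac{2 \gamma_1 L \alpha_{k-1} \|d_{k-1,J_k}^{MPRP}\|}{\epsilon^2} \, \|d_{k-1,J_k}^{MPRP}\|. \end{aligned} \tag{35} \]
By (30), there exist a constant $\gamma \in (0, 1)$ and an integer $k_0$ such that the following inequality holds for all $k \ge k_0$:
\[ \frac{2 L \gamma_1}{\epsilon^2} \, \alpha_{k-1} \|d_{k-1,J_k}^{MPRP}\| \le \gamma. \tag{36} \]
Hence, for any $k \ge k_0$, we have
\[ \|d_k\| \le 2 \gamma_1 + \gamma \|d_{k-1}\| \le 2 \gamma_1 \left( 1 + \gamma + \gamma^2 + \cdots + \gamma^{k-k_0-1} \right) + \gamma^{k-k_0} \|d_{k_0}\| \le \frac{2 \gamma_1}{1 - \gamma} + \|d_{k_0}\|. \tag{37} \]
Let
\[ M = \max \left\{ \|d_1\|, \|d_2\|, \ldots, \|d_{k_0}\|, \ \frac{2 \gamma_1}{1 - \gamma} + \|d_{k_0}\| \right\}. \tag{38} \]
Then
\[ \|d_k\| \le M, \quad \forall k. \tag{39} \]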
Theorem 6.
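Suppose that the conditions in Assumption A hold. Let $\{x_k\}$ and $\{d_k\}$ be the iterative sequence and the direction sequence generated by Algorithm 4. Then
\[ \liminf_{k \to \infty} |g(x_k)^{T} d_k| = 0. \tag{40} \]
Proof.
We prove the theorem by contradiction. Assume that the theorem is not true; then there exists a constant $\epsilon > 0$ such that
\[ |g(x_k)^{T} d_k| \ge \epsilon, \quad \forall k. \tag{41} \]
By (23), $|g(x_k)^{T} d_k| \le \|g(x_k)\|^2$, so (41) implies that $\|g(x_k)\|$ is bounded away from zero; that is, a bound of the form (33) holds, and Lemma 5 applies.
If $\liminf_{k \to \infty} \alpha_k > 0$, we get from (30) that $d_k \to 0$, so that $\lim_{k \to \infty} |g(x_k)^{T} d_k| = 0$. This contradicts assumption (41).
If $\liminf_{k \to \infty} \alpha_k = 0$, there is an infinite index set $K$ such that
\[ \lim_{k \in K, \, k \to \infty} \alpha_k = 0. \tag{42} \]
It follows from Step 2 of Algorithm 4 that, when $k \in K$ is sufficiently large, the steplength $\rho^{-1} \alpha_k$ does not satisfy (28); that is,
\[ f(x_k + \rho^{-1} \alpha_k d_k) - f(x_k) > -\delta \rho^{-2} \alpha_k^2 \|d_k\|^2. \tag{43} \]
By the mean-value theorem, Lemma 1, and Assumption A, there is $h_k \in (0, 1)$ such that
\[ \begin{aligned} f(x_k + \rho^{-1} \alpha_k d_k) - f(x_k) &= \rho^{-1} \alpha_k \, g(x_k + h_k \rho^{-1} \alpha_k d_k)^{T} d_k \\ &= \rho^{-1} \alpha_k \, g(x_k)^{T} d_k + \rho^{-1} \alpha_k \left( g(x_k + h_k \rho^{-1} \alpha_k d_k) - g(x_k) \right)^{T} d_k \\ &\le \rho^{-1} \alpha_k \, g(x_k)^{T} d_k + L \rho^{-2} \alpha_k^2 \|d_k\|^2. \end{aligned} \tag{44} \]
Substituting the last inequality into (43), we get for all $k \in K$ sufficiently large
\[ 0 \le -g(x_k)^{T} d_k \le \rho^{-1} (L + \delta) \alpha_k \|d_k\|^2. \tag{45} \]
Taking the limit on both sides of this inequality, using $\|d_k\| \le M$, and recalling that $\lim_{k \in K, \, k \to \infty} \alpha_k = 0$, we obtain $\lim_{k \in K, \, k \to \infty} |g(x_k)^{T} d_k| = 0$. This also yields a contradiction.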
3. Numerical Experiments
In this section, we report some numerical experiments. We test the performance of Algorithm 4 and compare it with the Zoutendijk method.
The code was written in Matlab, and the program was run on a PC with 2.20 GHz CPU and 1.00 GB memory. The parameters in the method are specified as follows. We set ρ=1/2, δ=1/10. We stop the iteration if |∇f(xk)Tdk|≤0.0001 or the iteration number exceeds 10000.
We first test Algorithm 4 on small and medium size problems and compare it with the Zoutendijk method in terms of the total number of iterations and the CPU time used. The test problems are from the CUTE library [20]. The numerical results of Algorithm 4 and the Zoutendijk method are listed in Table 1. The columns have the following meanings.
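P(i) is the number of the test problem, Dim is the dimension of the test problem, Iter is the number of iterations, and Time is the CPU time in seconds.

Table 1: The numerical results.

| P(i) | Dim | Algorithm 4: Iter | Algorithm 4: Time | Zoutendijk method: Iter | Zoutendijk method: Time |
|---|---|---|---|---|---|
| 3 | 2 | 1973 | 1.5710 | — | — |
| 4 | 2 | 201 | 0.2290 | — | — |
| 6 | 2 | 30 | 0.0160 | — | — |
| | 3 | 35 | 0.0160 | — | — |
| | 4 | 39 | 0.0470 | — | — |
| | 10 | 124 | 0.1210 | — | — |
| | 50 | 220 | 0.5370 | — | — |
| 8 | 3 | 44 | 0.0150 | 40 | 0.2188 |
| 11 | 3 | 3 | 0.0000 | 4 | 0.1094 |
| 15 | 4 | 10 | 0.0160 | 20 | 0.1563 |
| 18 | 6 | 322 | 0.0690 | 1936 | 12.0938 |
| 19 | 11 | 438 | 0.5440 | 8338 | 72.4219 |
| 23 | 50 | 12 | 0.0300 | 4 | 0.5000 |
| 24 | 100 | 142 | 0.3750 | — | — |
| 25 | 100 | 38 | 0.0810 | 6 | 0.3438 |
| 26 | 100 | 8 | 0.0470 | 6 | 0.1250 |
| | 1000 | 4 | 47.9060 | 4 | 190.1406 |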
We can see from Table 1 that Algorithm 4 successfully solved 12 test problems, whereas the Zoutendijk method successfully solved 8 test problems. In terms of the number of iterations, Algorithm 4 gives the better result in 12 of the test runs. In terms of computation time, Algorithm 4 performs much better than the Zoutendijk method. We then test Algorithm 4 and the Zoutendijk method on two problems of larger dimension. The VARDIM problem comes from [20], and Problem 1 below comes from [16]. The results are listed in Tables 2 and 3.
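Table 2: Test results for VARDIM with various dimensions.

| Problem | Dim | Algorithm 4: Iter | Algorithm 4: Time | Zoutendijk method: Iter | Zoutendijk method: Time |
|---|---|---|---|---|---|
| VARDIM | 1000 | 46 | 13.4485 | — | — |
| | 2000 | 55 | 49.0090 | — | — |
| | 3000 | 65 | 97.1020 | — | — |
| | 4000 | 78 | 164.6213 | — | — |
| | 5000 | 90 | 271.0340 | — | — |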
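Table 3: Test results for Problem 1 with various dimensions.

| Problem | Dim | Algorithm 4: Iter | Algorithm 4: Time | Zoutendijk method: Iter | Zoutendijk method: Time |
|---|---|---|---|---|---|
| Problem 1 | 1000 | 17 | 0.1400 | 8 | 110.2578 |
| | 2000 | 26 | 16.8604 | 8 | 263.2660 |
| | 3000 | 39 | 39.6561 | 11 | 554.0310 |
| | 4000 | 51 | 68.1729 | 30 | 910.1090 |
| | 5000 | 55 | 110.5660 | — | — |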
Problem 1.
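The nonnegativity-constrained optimization problem
\[ \min f(x) \quad \text{s.t.} \quad x \ge 0, \tag{46} \]
where the Engval function $f : \mathbb{R}^n \to \mathbb{R}$ is defined by
\[ f(x) = \sum_{i=2}^{n} \left\{ \left( x_{i-1}^2 + x_i^2 \right)^2 - 4 x_{i-1} + 3 \right\}. \tag{47} \]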
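For readers who wish to reproduce Problem 1, the objective (47) and its gradient can be coded directly. The sketch below is an illustration only; the function names `engval` and `engval_grad` are hypothetical, and the gradient is obtained by differentiating each summand of (47) with respect to $x_{i-1}$ and $x_i$.

```python
def engval(x):
    """Engval objective (47); x is a list of floats with len(x) >= 2."""
    total = 0.0
    for i in range(1, len(x)):                 # 0-based index of x_i in (47)
        t = x[i - 1] ** 2 + x[i] ** 2
        total += t ** 2 - 4.0 * x[i - 1] + 3.0
    return total


def engval_grad(x):
    """Gradient of (47), accumulated summand by summand."""
    g = [0.0] * len(x)
    for i in range(1, len(x)):
        t = x[i - 1] ** 2 + x[i] ** 2
        g[i - 1] += 4.0 * t * x[i - 1] - 4.0   # d/dx_{i-1} of the i-th summand
        g[i] += 4.0 * t * x[i]                 # d/dx_i of the i-th summand
    return g


value = engval([1.0, 1.0])        # (1 + 1)^2 - 4 + 3
gradient = engval_grad([1.0, 1.0])
```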
We can see from Table 2 that Algorithm 4 successfully solved the VARDIM problem for dimensions ranging from 1000 to 5000, whereas the Zoutendijk method failed to solve it at these dimensions. In Table 3, although Algorithm 4 requires more iterations than the Zoutendijk method, its computation time is smaller, and this advantage becomes more evident as the dimension of the test problem increases.
In summary, the results in Tables 1–3 show that Algorithm 4 is more efficient than the Zoutendijk method on these test problems and provides an efficient method for solving nonnegativity-constrained optimization problems.
Acknowledgment
This research is supported by the NSF (11161020) of China.
References
[1] Hestenes M. R., Stiefel E. Methods of conjugate gradients for solving linear systems. 1952; 49: 409–436. MR0060307. ZBL0048.09901.
[2] Fletcher R., Reeves C. M. Function minimization by conjugate gradients. 1964; 7: 149–154. MR0187375. doi:10.1093/comjnl/7.2.149. ZBL0132.11701.
[3] Polak B., Ribière G. Note sur la convergence de directions conjugees. 1969; 16: 35–43.
[4] Polyak B. T. The conjugate gradient method in extremal problems. 1969; 9(4): 94–112. 2-s2.0-0001931644.
[5] Fletcher R. 2nd ed. Chichester, UK: John Wiley & Sons Ltd.; 1987. xiv+436. MR955799.
[6] Liu Y., Storey C. Efficient generalized conjugate gradient algorithms. I. Theory. 1991; 69(1): 129–137. doi:10.1007/BF00940464. MR1104590. ZBL0702.90077.
[7] Dai Y. H., Yuan Y. A nonlinear conjugate gradient method with a strong global convergence property. 1999; 10(1): 177–182. doi:10.1137/S1052623497318992. MR1740963. ZBL0957.65061.
[8] Powell M. J. D. Convergence properties of algorithms for nonlinear optimization. 1986; 28(4): 487–500. doi:10.1137/1028154. MR867680. ZBL0624.90091.
[9] Gilbert J. C., Nocedal J. Global convergence properties of conjugate gradient methods for optimization. 1992; 2(1): 21–42. doi:10.1137/0802003. MR1147881. ZBL0767.90082.
[10] Pytlak R. On the convergence of conjugate gradient algorithms. 1994; 14(3): 443–460. doi:10.1093/imanum/14.3.443. MR1283946. ZBL0830.65052.
[11] Li G., Tang C., Wei Z. New conjugacy condition and related new conjugate gradient methods for unconstrained optimization. 2007; 202(2): 523–539. doi:10.1016/j.cam.2006.03.005. MR2319974. ZBL1116.65069.
[12] Li X., Zhao X. A hybrid conjugate gradient method for optimization problems. 2011; 3(1): 85–90. doi:10.4236/ns.2011.31012.
[13] Dai Y. H., Yuan Y. An efficient hybrid conjugate gradient method for unconstrained optimization. 2001; 103: 33–47. doi:10.1023/A:1012930416777. MR1868442. ZBL1007.90065.
[14] Hager W. W., Zhang H. A new conjugate gradient method with guaranteed descent and an efficient line search. 2005; 16(1): 170–192. doi:10.1137/030601880. MR2177774. ZBL1093.90085.
[15] Li D.-H., Nie Y.-Y., Zeng J.-P., Li Q.-N. Conjugate gradient method for the linear complementarity problem with S-matrix. 2008; 48(5-6): 918–928. doi:10.1016/j.mcm.2007.10.017. MR2451124.
[16] Li D.-H., Wang X.-L. A modified Fletcher-Reeves-type derivative-free method for symmetric nonlinear equations. 2011; 1(1): 71–82. doi:10.3934/naco.2011.1.71. MR2806294.
[17] Zhang L., Zhou W., Li D. Global convergence of a modified Fletcher-Reeves conjugate gradient method with Armijo-type line search. 2006; 104(4): 561–572. doi:10.1007/s00211-006-0028-z. MR2249678. ZBL1103.65074.
[18] Zhang L., Zhou W., Li D.-H. A descent modified Polak-Ribière-Polyak conjugate gradient method and its global convergence. 2006; 26(4): 629–640. doi:10.1093/imanum/drl016. MR2263891. ZBL1106.65056.
[19] Li D. H., Tong X. J. Beijing, China: Science Press; 2005.
[20] Moré J. J., Garbow B. S., Hillstrom K. E. Testing unconstrained optimization software. 1981; 7(1): 17–41. doi:10.1145/355934.355936. MR607350. ZBL0454.65049.