A method is proposed for smoothing the square-order exact penalty function for inequality constrained optimization. It is shown that, under some conditions, an approximately optimal solution of the original problem can be obtained by searching for an approximately optimal solution of the smoothed penalty problem. An algorithm based on the smoothed penalty function is given and is shown to be convergent under mild conditions. Two numerical examples indicate that the algorithm is efficient.
1. Introduction
Consider the following nonlinear constrained optimization problem:
(1) [P]  min f(x)  s.t. gi(x) ≤ 0, i = 1, 2, …, m, x ∈ Rn,
where f:Rn→R and gi(x):Rn→R, i∈I={1,2,…,m} are twice continuously differentiable functions. Let
(2)G0={x∈Rn∣gi(x)≤0,i=1,2,…,m}.
To solve [P], many penalty function methods have been proposed in numerous pieces of literature. One of the popular penalty functions is given by
(3) F(x,q) = f(x) + q ∑_{i=1}^{m} (gi+(x))^2,
where gi+(x) = max{0, gi(x)}, i = 1, 2, …, m. This function is continuously differentiable, but it is not an exact penalty function. A penalty function is called exact if, whenever the penalty parameter q is large enough, each minimum of the penalty problem is a minimum of the original problem or each minimum of the original problem is a minimum of the penalty problem.
In Zangwill [1], the classical l1 exact penalty function is defined as follows:
(4) f(x,q) = f(x) + q ∑_{i=1}^{m} gi+(x).
After Zangwill’s work, exact penalty functions have attracted considerable attention (see, e.g., [2–6]). It is known from the theory of constrained optimization that the l1 penalty function is a good candidate for penalization. However, it is not smooth, and its implementation suffers from numerical instability as the penalty parameter q becomes large. Several methods for smoothing exact penalty functions have been developed (see, e.g., [7–14]).
In [15, 16], the square-order penalty function
(5) φq(x) = f(x) + q ∑_{i=1}^{m} (gi+(x))^(1/2)
has been introduced and investigated. The penalty function φq(x) is exact but not smooth. Its smoothing has been investigated in [15, 16], so that it can be applied to solve the problem [P] via a gradient-type or a Newton-type method.
In this paper, a new smoothing function to the square-order penalty function of the form (5) is investigated. The rest of this paper is organized as follows. In Section 2, a new smoothing function to the square-order penalty function is introduced, and some fundamental properties of the smoothing function are discussed. In Section 3, an algorithm is presented to compute an approximate solution to [P] based on the smooth penalty function and is shown to be convergent. In Section 4, two numerical examples are given to show the applicability of the algorithm. In Section 5, we conclude the paper with some remarks.
2. Smoothing Exact Lower Order Penalty Function
Consider the following lower order penalty problem:
(6)[LOP]minx∈Rnφq(x).
In this paper, we say that the pair (x*,λ*) satisfies the KKT condition if
(7) ∇f(x*) = -∑_{i∈I} λi* ∇gi(x*), λi* gi(x*) = 0, λi* ≥ 0, gi(x*) ≤ 0, i ∈ I
and that the pair (x*,λ*) satisfies the second-order sufficiency condition [17, page 169] if
(8) ∇xL(x*,λ*) = 0, gi(x*) ≤ 0, λi* ≥ 0, λi* gi(x*) = 0, i ∈ I, y^T ∇^2 L(x*,λ*) y > 0 for any y ∈ V(x*),
where L(x,λ) = f(x) + ∑_{i=1}^{m} λi gi(x) and
(9) V(x*) = {y ∈ Rn | ∇^T gi(x*) y = 0, i ∈ A(x*); ∇^T gi(x*) y ≤ 0, i ∈ B(x*)}, A(x*) = {i ∈ I | gi(x*) = 0, λi* > 0}, B(x*) = {i ∈ I | gi(x*) = 0, λi* = 0}.
In order to establish the exact penalization, we need the following assumptions.
Assumption 1.
f(x) satisfies the following coercive condition:
(10) lim_{∥x∥→+∞} f(x) = +∞.
Under Assumption 1, there exists a box X such that G([P]) ⊂ int(X), where G([P]) is the set of global minima of problem [P] and int(X) denotes the interior of the set X. Consider the following problem:
(11) [P′]  min f(x)  s.t. gi(x) ≤ 0, i = 1, …, m, x ∈ X.
Let G([P′]) denote the set of global minima of problem [P′]. Then G([P′])=G([P]).
Assumption 2.
The set G([P]) is a finite set.
Then we consider the penalty problem of the form
(12)[LOP′]minx∈Xφq(x).
Let p(u) = (max{0, u})^(1/2); that is,
(13) p(u) = { u^(1/2) if u > 0; 0 otherwise,
then
(14) φq(x) = f(x) + q ∑_{i=1}^{m} p(gi(x)).
For any ϵ>0, let
(15) pϵ(u) = { 0 if u ≤ 0; (2/3)ϵ^(-2)u^(5/2) - (1/3)ϵ^(-3)u^(7/2) if 0 < u ≤ ϵ; u^(1/2) - (2/3)ϵ^(1/2) if u > ϵ.
It follows that
(16) pϵ′(u) = { 0 if u ≤ 0; (5/3)ϵ^(-2)u^(3/2) - (7/6)ϵ^(-3)u^(5/2) if 0 < u ≤ ϵ; (1/2)u^(-1/2) if u > ϵ.
It is easy to see that pϵ(u) is continuously differentiable on R. Furthermore, we can obtain that pϵ(u)→p(u) as ϵ→0.
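The claim that pϵ is continuously differentiable can be checked directly at the breakpoint u = ϵ (the breakpoint u = 0 is immediate, since all branches and their derivatives vanish there):

```latex
p_\epsilon(\epsilon^-) = \tfrac{2}{3}\epsilon^{-2}\epsilon^{5/2} - \tfrac{1}{3}\epsilon^{-3}\epsilon^{7/2}
= \tfrac{2}{3}\epsilon^{1/2} - \tfrac{1}{3}\epsilon^{1/2} = \tfrac{1}{3}\epsilon^{1/2}
= \epsilon^{1/2} - \tfrac{2}{3}\epsilon^{1/2} = p_\epsilon(\epsilon^+),
\qquad
p_\epsilon'(\epsilon^-) = \tfrac{5}{3}\epsilon^{-1/2} - \tfrac{7}{6}\epsilon^{-1/2}
= \tfrac{1}{2}\epsilon^{-1/2} = p_\epsilon'(\epsilon^+).
```

Both one-sided values and one-sided derivatives agree, so pϵ is C^1 on R.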
Figure 1 shows the behavior of p(u) (solid line), p0.1(u) (solid line with plus signs), p0.01(u) (dash-dot line), and p0.001(u) (dashed line).
The behavior of pϵ(u) and p(u).
Let
(17)φq,ϵ(x)=f(x)+q∑i=1mpϵ(gi(x)).
Then φq,ϵ(x) is continuously differentiable on Rn. Consider the following smoothed optimization problem:
(18)[SP]minx∈Xφq,ϵ(x).
Lemma 3.
For any x∈X, ϵ>0,
(19) -(2/3)mqϵ^(1/2) ≤ φq,ϵ(x) - φq(x) ≤ (4/3)mqϵ^(1/2).
Proof.
Note that
(20) p(gi(x)) - pϵ(gi(x)) = { 0 if gi(x) ≤ 0; (gi(x))^(1/2) - (2/3)ϵ^(-2)(gi(x))^(5/2) + (1/3)ϵ^(-3)(gi(x))^(7/2) if 0 < gi(x) ≤ ϵ; (2/3)ϵ^(1/2) if gi(x) > ϵ.
When gi(x) ∈ (0, ϵ], we have
(21) -(2/3)ϵ^(1/2) ≤ p(gi(x)) - pϵ(gi(x)) ≤ (4/3)ϵ^(1/2).
Then
(22) -(2/3)mqϵ^(1/2) ≤ φq,ϵ(x) - φq(x) ≤ (4/3)mqϵ^(1/2).
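As a quick numerical illustration (not part of the proof), the per-term gap p(u) - pϵ(u) can be sampled on a grid; it stays within the interval [-(2/3)ϵ^(1/2), (4/3)ϵ^(1/2)] used in (21), and is in fact nonnegative, since pϵ ≤ p pointwise:

```python
import numpy as np

def p(u):
    # p(u) = (max{0, u})^(1/2), equation (13)
    return np.sqrt(np.maximum(0.0, u))

def p_eps(u, eps):
    # piecewise C^1 smoothing of p, equation (15)
    u = np.asarray(u, dtype=float)
    um = np.clip(u, 0.0, None)  # avoid fractional powers of negative numbers
    mid = (2.0 / 3.0) * eps**-2 * um**2.5 - (1.0 / 3.0) * eps**-3 * um**3.5
    high = np.sqrt(um) - (2.0 / 3.0) * np.sqrt(eps)
    return np.where(u <= 0.0, 0.0, np.where(u <= eps, mid, high))

eps = 0.1
u = np.linspace(-1.0, 2.0, 20001)
gap = p(u) - p_eps(u, eps)  # per-term difference appearing in (20)-(21)
```

For u > ϵ the gap is the constant (2/3)ϵ^(1/2), in agreement with the third branch of (20).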
As a direct result of Lemma 3, we have the following result.
Theorem 4.
Let {ϵj} → 0 be a sequence of positive numbers, and assume that xj is a solution to min_{x∈X} φq,ϵj(x) for some q > 0. Let x̄ be an accumulation point of the sequence {xj}. Then x̄ is an optimal solution to min_{x∈X} φq(x).
Proof.
Because xj is a solution to min_{x∈X} φq,ϵj(x), we have
(23) φq,ϵj(xj) ≤ φq,ϵj(x) for any x ∈ X.
By Lemma 3, we have
(24) φq(xj) ≤ φq,ϵj(xj) + (2/3)mqϵj^(1/2),  φq,ϵj(x) ≤ φq(x) + (4/3)mqϵj^(1/2).
It follows that
(25) φq(xj) ≤ φq,ϵj(xj) + (2/3)mqϵj^(1/2) ≤ φq,ϵj(x) + (2/3)mqϵj^(1/2) ≤ φq(x) + (4/3)mqϵj^(1/2) + (2/3)mqϵj^(1/2) = φq(x) + 2mqϵj^(1/2).
Letting j → +∞, we have
(26) φq(x̄) ≤ φq(x).
This completes the proof.
Theorem 5.
Let xq* ∈ X be an optimal solution of problem [LOP′] and x̄q,ϵ ∈ X an optimal solution of problem [SP] for some q > 0 and ϵ > 0. Then
(27) -(2/3)mqϵ^(1/2) ≤ φq(xq*) - φq,ϵ(x̄q,ϵ) ≤ (4/3)mqϵ^(1/2).
If both xq* and x̄q,ϵ are feasible, then
(28) 0 ≤ f(x̄q,ϵ) - f(xq*) ≤ (2/3)mqϵ^(1/2).
Proof.
By Lemma 3, we have
(29) -(2/3)mqϵ^(1/2) ≤ φq(xq*) - φq,ϵ(xq*) ≤ φq(xq*) - φq,ϵ(x̄q,ϵ) ≤ φq(x̄q,ϵ) - φq,ϵ(x̄q,ϵ) ≤ (4/3)mqϵ^(1/2).
In particular, if both xq* and x̄q,ϵ are feasible, we have
(30) f(xq*) ≤ f(x̄q,ϵ)
by φq(xq*) ≤ φq(x̄q,ϵ). It follows that
(31) 0 ≤ f(x̄q,ϵ) - f(xq*).
On the other hand, by (14), (15), (17), and (19), we have
(32) f(x̄q,ϵ) - f(xq*) = φq,ϵ(x̄q,ϵ) - φq(xq*) ≤ (2/3)mqϵ^(1/2).
We complete the proof.
Theorem 6.
Suppose that Assumptions 1 and 2 hold and that, for any x* ∈ G([P]), there exists λ* ∈ R+m such that the pair (x*,λ*) satisfies the second-order sufficiency condition (8). Let x* ∈ X be a global solution of problem [P] and x̄q,ϵ ∈ X a global solution of problem [SP] for ϵ > 0. Then there exists q* > 0 such that, for any q > q*,
(33) -(2/3)mqϵ^(1/2) ≤ f(x*) - φq,ϵ(x̄q,ϵ) ≤ (4/3)mqϵ^(1/2),
where q* is defined in Corollary 2.3 in [16].
Proof.
By Corollary 2.3 in [16], x* is a global solution of problem [LOP′]. Then, by Theorem 5, we have
(34) -(2/3)mqϵ^(1/2) ≤ φq(x*) - φq,ϵ(x̄q,ϵ) ≤ (4/3)mqϵ^(1/2).
Since ∑_{i=1}^{m} p(gi(x*)) = 0, we have
(35) φq(x*) = f(x*) + q ∑_{i=1}^{m} p(gi(x*)) = f(x*).
We complete the proof.
Theorems 4 and 5 mean that an approximate solution to [SP] is also an approximate solution to [LOP′]. Furthermore, by Theorem 6, an optimal solution to [SP] is an approximately optimal solution to [P]. Now we present a penalty function algorithm to solve [P].
3. A Smoothing Method
We propose the following algorithm to solve [P].
Algorithm 7.
Consider the following.
Step 1. Choose a point x0. Given ϵ0>0, q0>0, 0<η<1, and N>1, let j=0, and go to Step 2.
Step 2. Use xj as the starting point to solve minx∈Rnφqj,ϵj(x). Let xj* be the optimal solution obtained (xj* is obtained by a quasi-Newton method and a finite difference gradient). Go to Step 3.
Step 3. Let qj+1=Nqj, ϵj+1=ηϵj, xj+1=xj* and j=j+1; then go to Step 2.
Remark 8.
From 0<η<1 and N>1, we can easily obtain that the sequence {ϵj} is decreasing to 0 and the sequence {qj} is increasing to +∞ as j→+∞.
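A minimal Python sketch of Algorithm 7 under stated assumptions: SciPy's BFGS with its default finite-difference gradient plays the role of the quasi-Newton inner solver of Step 2, and a fixed iteration count stands in for a stopping rule, which the algorithm as stated leaves open; the parameter defaults and the toy problem are illustrative choices, not part of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def p_eps(u, eps):
    # C^1 smoothing of p(u) = max(0, u)**0.5, equation (15)
    if u <= 0.0:
        return 0.0
    if u <= eps:
        return (2.0 / 3.0) * eps**-2 * u**2.5 - (1.0 / 3.0) * eps**-3 * u**3.5
    return u**0.5 - (2.0 / 3.0) * eps**0.5

def algorithm7(f, gs, x0, q0=5.0, eps0=0.1, eta=0.5, N=10.0, iters=6):
    """Smoothed penalty method; gs is a list of constraint functions g_i(x) <= 0."""
    x, q, eps = np.asarray(x0, dtype=float), q0, eps0
    for _ in range(iters):
        # Step 2: minimize phi_{q,eps}(x) = f(x) + q * sum_i p_eps(g_i(x)),
        # equation (17), by a quasi-Newton method with finite-difference gradients
        phi = lambda x, q=q, eps=eps: f(x) + q * sum(p_eps(g(x), eps) for g in gs)
        x = minimize(phi, x, method="BFGS").x
        # Step 3: enlarge the penalty parameter, shrink the smoothing parameter
        q, eps = N * q, eta * eps
    return x

# toy illustration: min (x - 2)^2  s.t.  x <= 1, whose solution is x* = 1
x_star = algorithm7(lambda x: (x[0] - 2.0) ** 2, [lambda x: x[0] - 1.0], [0.0])
```

As q grows and ϵ shrinks, the iterates approach the constrained minimizer from the slightly infeasible side, consistent with Theorem 5.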
Now we prove the convergence of the algorithm under mild conditions.
Theorem 9.
Suppose that, for any q∈[q0,+∞), ϵ∈(0,ϵ0], the set
(36)argminx∈Rnφq,ϵ(x)≠∅.
Let {xj*} be the sequence generated by Algorithm 7. If {xj*} has a limit point, then any limit point of {xj*} is a solution of [P].
Proof.
Let x̄ be any limit point of {xj*}. Then there exists an infinite subset J ⊂ N such that xj* → x̄, j ∈ J. If we can prove that (i) x̄ ∈ G0 and (ii) f(x̄) ≤ inf_{x∈G0} f(x) hold, then x̄ is an optimal solution of [P].
(i) Suppose, to the contrary, that x̄ ∉ G0; then there exist δ0 > 0, i0 ∈ I, and a subset J1 ⊂ J such that
(37)gi0(xj*)≥δ0
for any j∈J1.
If ϵj≥gi0(xj*)≥δ0, it follows from Step 2 in Algorithm 7 and (15) that
(38) f(xj*) + (2/3)qjϵj^(-2)δ0^(5/2) - (1/3)qjϵj^(-3)δ0^(7/2) ≤ φqj,ϵj(xj*) ≤ φqj,ϵj(x) = f(x)
for any x ∈ G0, which contradicts ϵj → 0 and qj → +∞.
If gi0(xj*)≥δ0>ϵj or gi0(xj*)>ϵj≥δ0, it follows from Step 2 in Algorithm 7 and (15) that
(39) f(xj*) + qj(δ0^(1/2) - (2/3)ϵj^(1/2)) ≤ φqj,ϵj(xj*) ≤ φqj,ϵj(x) = f(x)
for any x ∈ G0, which contradicts ϵj → 0 and qj → +∞.
Then we have x̄ ∈ G0.
(ii) For any x∈G0, it holds that
(40)f(xj*)≤φqj,ϵj(xj*)≤φqj,ϵj(x)=f(x);
then f(x̄) ≤ inf_{x∈G0} f(x) holds.
This completes the proof.
4. Numerical Examples
In this section, we solve two numerical examples, implemented in Fortran, to show the applicability of Algorithm 7.
Example 1 (see [18, Example 4.1]).
Consider the following problem:
(41) min f(x) = x1^2 + x2^2 - cos(17x1) - cos(17x2) + 3
s.t. g1(x) = (x1 - 2)^2 + x2^2 - 1.6^2 ≤ 0
g2(x) = x1^2 + (x2 - 3)^2 - 2.7^2 ≤ 0
0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2.
With the starting point x0 = (0,0), q0 = 5.0, ϵ0 = 0.1, η = 0.5, and N = 10, the results obtained by Algorithm 7 are shown in Table 1.
For comparison, the algorithms based on the penalty function (3) and the exact penalty function (4) are described as follows.
Numerical results for Example 1 by Algorithm 7.

j | xj* | qj | ϵj | g1(xj*) | g2(xj*) | f(xj*)
0 | (0.7811047, 1.057024) | 5 | 0.1 | 0.0430043 | −2.904718 | 3.333603
1 | (0.7260887, 0.3992826) | 50 | 0.05 | −0.7777236 | 0.000935 | 1.836004
2 | (0.7244794, 0.3991450) | 500 | 0.025 | −0.7737306 | −0.000683 | 1.838842
3 | (0.7245065, 0.3990242) | 5000 | 0.0125 | −0.7738962 | −0.000016 | 1.837684
Algorithm 10.
Consider the following.
Step 1. Choose a point x0, and a stopping tolerance ϵ>0. Given ϵ0>0, q0>0, 0<η<1, and N>1, let j=0, and go to Step 2.
Step 2. Use xj as the starting point to solve minx∈RnF(x,qj). Let xj* be the optimal solution obtained (xj* is obtained by a quasi-Newton method and a finite difference gradient). Go to Step 3.
Step 3. Let qj+1=Nqj, ϵj+1=ηϵj, xj+1=xj*, and j=j+1; then go to Step 2.
Algorithm 11.
Consider the following.
Step 1. Choose a point x0 and a stopping tolerance ϵ>0. Given ϵ0>0, q0>0, 0<η<1, and N>1, let j=0, and go to Step 2.
Step 2. Use xj as the starting point to solve minx∈Rnf(x,qj). Let xj* be the optimal solution obtained (xj* is obtained by a quasi-Newton method and a finite difference gradient). Go to Step 3.
Step 3. Let qj+1=Nqj, ϵj+1=ηϵj, xj+1=xj*, and j=j+1; then go to Step 2.
Let x0=(0,0), q0=0.1, ϵ0=0.1, η=0.1, and N=5; numerical results by Algorithm 10 are shown in Table 2.
Numerical results for Example 1 by Algorithm 10.

j | xj* | qj | ϵj | g1(xj*) | g2(xj*) | f(xj*)
0 | (0.003902, 0.006869) | 0.1 | 10^-1 | 1.424453 | 1.668849 | 1.009072
1 | (0.3739730, 0.7309830) | 0.1×5 | 10^-2 | 0.6182998 | −2.001706 | 1.686692
2 | (0.3971605, 0.7207292) | 0.1×5^2 | 10^-3 | 0.5285448 | −1.937189 | 1.833846
3 | (0.7277128, 0.3901970) | 0.1×5^3 | 10^-4 | −0.7890318 | 0.050637 | 1.761504
4 | (0.7259367, 0.3969422) | 0.1×5^4 | 10^-5 | −0.7791997 | 0.012894 | 1.815956
5 | (0.7254778, 0.3987666) | 0.1×5^5 | 10^-6 | −0.7765785 | 0.002733 | 1.832846
6 | (0.7271451, 0.3997456) | 0.1×5^6 | 10^-7 | −0.7800440 | 0.000062 | 1.837930
7 | (0.7271337, 0.3997256) | 0.1×5^7 | 10^-8 | −0.7800310 | 0.000150 | 1.837770
8 | (0.7260696, 0.3994698) | 0.1×5^8 | 10^-9 | −0.7775255 | −0.000066 | 1.837740
Let x0=(0,0), q0=0.1, ϵ0=0.1, η=0.1, and N=20; numerical results by Algorithm 11 are shown in Table 3.
Numerical results for Example 1 by Algorithm 11.

j | xj* | qj | ϵj | g1(xj*) | g2(xj*) | f(xj*)
0 | (0.001377, 0.002065) | 0.1 | 10^-1 | 1.434500 | 1.697615 | 1.000896
1 | (0.7513676, 1.085665) | 2 | 10^-2 | 0.1777513 | −3.060769 | 2.840876
2 | (0.7188042, 0.3974435) | 40 | 10^-3 | −0.7605761 | −0.000021 | 1.844097
This example is a nonconvex problem with 22 local optimal solutions in the interior of the feasible region. According to Sun and Li [18], x* = (0.7255, 0.3993) is a global minimizer with global optimal value f* = 1.8376. It is clear from Table 1 that the obtained approximately optimal solution is x* = (0.7245065, 0.3990242), with corresponding objective function value 1.837684.
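The cited optimum can be checked directly by evaluating the objective and constraints of (41) at x* = (0.7255, 0.3993); this only verifies the reported values, it does not rerun the algorithm:

```python
import math

def f(x1, x2):
    # objective of Example 1, equation (41)
    return x1**2 + x2**2 - math.cos(17 * x1) - math.cos(17 * x2) + 3

def g1(x1, x2):
    # first constraint of Example 1
    return (x1 - 2) ** 2 + x2**2 - 1.6**2

def g2(x1, x2):
    # second constraint of Example 1
    return x1**2 + (x2 - 3) ** 2 - 2.7**2

x1, x2 = 0.7255, 0.3993  # global minimizer reported by Sun and Li [18]
fval = f(x1, x2)         # close to the reported optimal value 1.8376
```

The constraint g2 turns out to be (nearly) active at this point, while g1 is inactive.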
From Tables 1–3, one can see that Algorithm 11 converges faster than Algorithms 7 and 10, but the solution generated by Algorithm 11 is the worst. Algorithm 10 is the slowest one, and the solution generated by Algorithm 10 is worse than the solution generated by Algorithm 7.
Example 2 (see the Rosen-Suzuki problem in [15]).
Consider the following problem:
(42) min f(x) = x1^2 + x2^2 + 2x3^2 + x4^2 - 5x1 - 5x2 - 21x3 + 7x4
s.t. g1(x) = 2x1^2 + x2^2 + x3^2 + 2x1 + x2 + x4 - 5 ≤ 0
g2(x) = x1^2 + x2^2 + x3^2 + x4^2 + x1 - x2 + x3 - x4 - 8 ≤ 0
g3(x) = x1^2 + 2x2^2 + x3^2 + 2x4^2 - x1 - x4 - 10 ≤ 0.
Let x0=(1,1,1,1), q0=2.0, ϵ0=1.0, η=0.1, and N=2; the results by Algorithm 7 are shown in Table 4.
Let x0=(1,1,1,1), q0=0.1, ϵ0=1.0, η=0.1, and N=5; numerical results by Algorithm 10 are shown in Table 5.
Let x0=(1,1,1,1), q0=2.0, ϵ0=1.0, η=0.1, and N=2; the results by Algorithm 11 are shown in Table 6.
Numerical results for Example 2 by Algorithm 7.

j | xj* | qj | ϵj | g1(xj*) | g2(xj*) | g3(xj*) | f(xj*)
0 | (1.139751, 1.272704, 3.819159, 1.997231) | 2 | 1.0 | 15.35879 | 19.17715 | 17.95990 | −70.16554
1 | (0.1705428, 0.8361722, 2.011177, −0.9678533) | 4 | 10^-1 | 0.0115902 | 0.0232417 | −1.856925 | −44.28859
2 | (0.1335792, 0.8091212, 1.995527, −1.008688) | 8 | 10^-2 | −0.2599155 | 0.0007742 | −1.780662 | −44.02616
3 | (0.1585001, 0.8339736, 2.014753, −0.959688) | 16 | 10^-3 | −0.00367924 | −0.0003119 | −1.881673 | −44.22965
Numerical results for Example 2 by Algorithm 10.

j | xj* | qj | ϵj | g1(xj*) | g2(xj*) | g3(xj*) | f(xj*)
0 | (0.4528408, 0.8017877, 2.580122, −1.214387) | 0.1 | 1.0 | 3.203103 | 4.425251 | 1.858834 | −53.31968
1 | (0.2167079, 0.8595678, 2.192063, −1.159403) | 0.1×5 | 10^-1 | 0.7715020 | 1.643783 | −0.039055 | −47.79021
2 | (0.1782670, 0.8383819, 2.051085, −1.015255) | 0.1×5^2 | 10^-2 | 0.1530515 | 0.3785772 | −1.457033 | −45.08350
3 | (0.1712600, 0.8356738, 2.017540, −0.9755791) | 0.1×5^3 | 10^-3 | 0.030094 | 0.078609 | −1.795672 | −44.41169
4 | (0.1696707, 0.8354126, 2.010606, −0.9668932) | 0.1×5^4 | 10^-4 | 0.005887 | 0.015878 | −1.865860 | −44.26973
5 | (0.1836584, 0.8497245, 1.993892, −0.9809219) | 0.1×5^5 | 10^-5 | 0.001217 | 0.002323 | −1.824922 | −44.23592
6 | (0.1835818, 0.8496645, 1.993588, −0.9808556) | 0.1×5^6 | 10^-6 | −0.000298 | 0.000466 | −1.826614 | −44.23108
7 | (0.1843284, 0.8502323, 1.992886, −0.9814753) | 0.1×5^7 | 10^-7 | −0.000141 | 0.000219 | −1.824903 | −44.23038
8 | (0.1843219, 0.8502275, 1.992824, −0.9814662) | 0.1×5^8 | 10^-8 | −0.000412 | −0.000131 | −1.825209 | −44.22948
Numerical results for Example 2 by Algorithm 11.

j | xj* | qj | ϵj | g1(xj*) | g2(xj*) | g3(xj*) | f(xj*)
0 | (0.1670927, 0.8365505, 2.011684, −0.9752350) | 2 | 1.0 | −0.001971 | 0.043152 | −1.815266 | −44.31766
1 | (0.1692354, 0.8394703, 2.006255, −0.9683554) | 4 | 0.1 | −0.003364 | 0.000497 | −1.862336 | −44.23219
2 | (0.1691869, 0.8394533, 2.006037, −0.9682074) | 8 | 0.01 | −0.004265 | −0.001105 | −1.863955 | −44.22833
It is clear from Table 4 that the obtained approximately optimal solution is x* = (0.1585001, 0.8339736, 2.014753, −0.959688), with corresponding objective function value −44.22965. In [15], the reported approximately optimal solution is x* = (0.169234, 0.835656, 2.008690, −0.964901), with corresponding objective function value −44.233582.
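The reported solution can be verified by plugging it into the problem data of (42); again this only checks the tabulated values:

```python
def f(x):
    # objective of Example 2 (Rosen-Suzuki), equation (42)
    x1, x2, x3, x4 = x
    return x1**2 + x2**2 + 2 * x3**2 + x4**2 - 5 * x1 - 5 * x2 - 21 * x3 + 7 * x4

def g(x):
    # the three constraint functions of Example 2
    x1, x2, x3, x4 = x
    return (
        2 * x1**2 + x2**2 + x3**2 + 2 * x1 + x2 + x4 - 5,
        x1**2 + x2**2 + x3**2 + x4**2 + x1 - x2 + x3 - x4 - 8,
        x1**2 + 2 * x2**2 + x3**2 + 2 * x4**2 - x1 - x4 - 10,
    )

x_table4 = (0.1585001, 0.8339736, 2.014753, -0.959688)  # row j = 3 of Table 4
fval = f(x_table4)  # approximately -44.2299, near the tabulated -44.22965
```

All three constraints are satisfied at this point (g1 and g2 are nearly active, g3 is well inside the feasible region), matching the signs of the tabulated constraint values.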
From Tables 4–6, one can see that Algorithm 11 converges faster than Algorithms 7 and 10, but the solution generated by Algorithm 11 is the worst. Algorithm 10 is the slowest one, and the solution generated by Algorithm 10 is worse than the solution generated by Algorithm 7.
From Tables 1–6, one can see that Algorithm 7 yields some approximate solutions to [P] that have a better objective function value in comparison with Algorithms 10 and 11.
5. Conclusion
In this paper, we propose a method for smoothing the nonsmooth square-order exact penalty function for inequality constrained optimization. Error estimates are obtained among the optimal objective function values of the smoothed penalty problem, the nonsmooth penalty problem, and the original optimization problem. The algorithm based on the smoothed penalty function is shown to be convergent under mild conditions.
According to the numerical results given in Section 4, one may conclude that the smoothed penalty function based on φq(x) yields better results for computing an approximate solution to [P] than F(x,q) and f(x,q).
Finally, we give some advice on how to choose the parameters in the algorithm. According to our experience, the initial value q0 may be 0.1, 1, 5, 10, 100, 1000, or 10000, with N = 2, 5, 10, or 100 and the update q := Nq; the initial value ϵ0 may be 10, 5, 1, 0.5, or 0.1, with η = 0.5, 0.1, 0.05, or 0.01 and the update ϵ := ηϵ.
Acknowledgments
This work is supported by National Natural Science Foundation of China (10971118 and 71371107) and the Foundation of Shandong Province (J10LG04 and ZR2012AL07).
[1] Zangwill, W. I., Non-linear programming via penalty functions.
[2] Bazaraa, M. S., Goode, J. J., Sufficient conditions for a globally exact penalty function without convexity.
[3] Han, S. P., Mangasarian, O. L., Exact penalty functions in nonlinear programming.
[4] Mangasarian, O. L., Sufficiency of exact penalty minimization.
[5] Pillo, G. D., Spedicato, E., Exact penalty methods.
[6] Yu, C., Teo, K. L., Zhang, L., Bai, Y., A new exact penalty function method for continuous inequality constrained optimization problems.
[7] Bai, F. S., Luo, X. Y., Modified lower order penalty functions based on quadratic smoothing approximation.
[8] Chen, C., Mangasarian, O. L., Smoothing methods for convex inequalities and linear complementarity problems.
[9] Lian, S. J., Smoothing approximation to l1 exact penalty function for inequality constrained optimization.
[10] Meng, Z. Q., Gao, S., Smoothed square-root penalty function for nonlinear constrained optimization.
[11] Pinar, M., Zenios, S., On smoothing exact penalty functions for convex constrained optimization.
[12] Wang, C. Y., Zhao, W. L., Zhou, J. C., Lian, S. J., Global convergence and finite termination of a class of smooth penalty function algorithms.
[13] Xu, X. S., Meng, Z. Q., Sun, J. W., Huang, L. G., Shen, R., A second-order smooth penalty function algorithm for constrained optimization problems.
[14] Yang, X. Q., Meng, Z. Q., Huang, X. X., Pong, G. T. Y., Smoothing nonlinear penalty functions for constrained optimization problems.
[15] Meng, Z., Dang, C., Yang, X., On the smoothing of the square-root exact penalty function for inequality constrained optimization.
[16] Wu, Z. Y., Bai, F. S., Yang, X. Q., Zhang, L. S., An exact lower order penalty function and its smoothing in nonlinear programming.
[17] Bazaraa, M. S., Sherali, H. D., Shetty, C. M., Nonlinear Programming: Theory and Algorithms.
[18] Sun, X. L., Li, D., Value-estimation function method for constrained global optimization.