A Derivative-Free Trust Region Algorithm with Nonmonotone Filter Technique for Bound Constrained Optimization

We propose a derivative-free trust region algorithm with a nonmonotone filter technique for bound constrained optimization. The derivative-free strategy is designed for minimization problems in which not all derivatives are available. The nonmonotone filter technique preserves the trust region framework and ensures global convergence under reasonable assumptions. Numerical experiments demonstrate that the new algorithm is effective for bound constrained optimization. In addition, locally optimal parameters with respect to overall computational time on a set of test problems are identified. The best choice of parameter values obtained by the proposed algorithm differs from the traditionally used values, and its performance indicates that the algorithm has a clear advantage on nondifferentiable optimization problems.


Introduction
Many objective functions in engineering optimization are obtained from large numbers of numerical experiments and have special characteristics, such as nonconvexity, for which first-order or second-order derivatives are unavailable. In this paper, we analyze the solution of the nonlinear problem with bound constraints

min f(x), s.t. x ∈ Ω, (1a)-(1b)

where Ω = {x ∈ R^n | l ≤ x ≤ u} and f(x) : Ω ⊂ R^n → R is a twice continuously differentiable function, but its first-order or second-order derivatives are not explicitly available. Such problems mainly emerge in operations research, management science, industrial engineering, applied mathematics, and network transmission [1], as well as in engineering disciplines that rely on analytical optimization techniques, such as banking and weather analysis. The unavailability of first- or second-order derivatives means that traditional derivative-based methods, such as quasi-Newton methods and conjugate gradient methods, do not work. Therefore, research focuses on derivative-free methods, which avoid the use of derivative information.
1.1. Derivative-Free Trust Region Methods. Derivative-free techniques have been explored to tune parameters of nonlinear optimization methods [2], to perform automatic error analysis [3,4], and to design helicopter rotor blades [5,6] and hydrodynamic applications [7]. These methods are special algorithms designed for particular applications and have their usage limitations. In [8-12] another type of derivative-free method is proposed, based on the traditional derivative-based algorithm framework [8,13-15]. These methods construct a model function with available derivatives to approximate the original objective function.
Conn, Scheinberg, and Vicente [8,13] have already given a derivative-free method under the trust region framework. They construct the trust region subproblem

min m_k(x_k + s) = f(x_k) + g_k^T s + (1/2) s^T H_k s, s.t. ‖s‖ ≤ Δ_k, (2)

where f(x_k) is the function value at the kth iterate, g_k is the gradient of the model m_k(x_k + s) at the kth iterate, and H_k is the model Hessian. Although the model and the true objective function are meant to coincide in function value, the model gradient and Hessian may be (and typically are) different from the objective gradient ∇f and Hessian ∇²f. The function (2) defined in [8] is called a fully linear or fully quadratic model, depending on the chosen truncated Taylor series conditions; it must occasionally be updated in order to guarantee that the residual between the approximate and true functions (and, more critically, their gradients) stays within the related error bounds. We give the definition of a fully linear model after a reasonable assumption.
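To make the model construction above concrete, the following sketch builds a quadratic model m(x + s) = f(x) + g^T s + (1/2) s^T H s around a point x. Here g and H are obtained by central finite differences with step Δ, which is one simple stand-in for the interpolation schemes of [8,13]; the function name `quadratic_model` and the finite-difference scheme are illustrative assumptions, not the method of this paper.

```python
import numpy as np

def quadratic_model(f, x, delta):
    """Build m(x + s) = f(x) + g^T s + 0.5 s^T H s around x.

    g and H come from central finite differences with step `delta`;
    interpolation on a sample set, as in derivative-free trust region
    methods, is one alternative way to obtain them.
    """
    n = len(x)
    fx = f(x)
    e = np.eye(n)
    # central-difference gradient estimate
    g = np.array([(f(x + delta * e[i]) - f(x - delta * e[i])) / (2 * delta)
                  for i in range(n)])
    # central-difference Hessian estimate
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + delta * e[i] + delta * e[j])
                       - f(x + delta * e[i] - delta * e[j])
                       - f(x - delta * e[i] + delta * e[j])
                       + f(x - delta * e[i] - delta * e[j])) / (4 * delta ** 2)
    model = lambda s: fx + g @ s + 0.5 * s @ H @ s
    return model, g, H
```

For a quadratic objective the central differences recover the gradient and Hessian exactly (up to rounding), so the model and function coincide, as in the discussion above.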
Assumption (A1). Suppose that a level set L(x_0) and a maximal radius Δ_max are given. Suppose furthermore that f is twice continuously differentiable with Lipschitz continuous Hessian in an appropriate open domain containing the Δ_max-neighborhood ⋃_{x∈L(x_0)} B(x, Δ_max) of the set L(x_0).
Definition 1. Let a function f, which satisfies Assumption (A1), be given. A set of model functions M = {m : R^n → R, m ∈ C^2} is called a fully linear class of models if the following hold.
There exist positive constants κ_ef, κ_eg, and κ_blg such that, for any x ∈ L(x_0) and Δ ∈ (0, Δ_max], there is a model function m(x + s) in M, with Lipschitz continuous gradient and corresponding Lipschitz constant bounded by κ_blg, such that
(1) the error between the gradient of the model and that of the original objective function satisfies ‖∇f(x + s) − ∇m(x + s)‖ ≤ κ_eg Δ, for all s ∈ B(0, Δ);
(2) the error between the model and the original objective function satisfies |f(x + s) − m(x + s)| ≤ κ_ef Δ², for all s ∈ B(0, Δ).
Such a model m is called fully linear on B(x, Δ).
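The defining property in item (1), that the gradient error of a fully linear model shrinks proportionally to Δ, can be observed numerically. The sketch below (an illustration only, not part of the algorithm) builds a simple forward-difference linear model on B(x, Δ) and checks that its gradient error decreases roughly linearly as Δ shrinks by a factor of ten.

```python
import numpy as np

def linear_model_gradient(f, x, delta):
    """Gradient of a linear model interpolating f at x and x + delta*e_i."""
    return np.array([(f(x + delta * e) - f(x)) / delta for e in np.eye(len(x))])

# Gradient error of the model should scale like kappa_eg * Delta
f = lambda v: np.sin(v[0]) + v[1] ** 2
grad_f = lambda v: np.array([np.cos(v[0]), 2 * v[1]])
x = np.array([0.5, -0.3])
errs = [np.linalg.norm(linear_model_gradient(f, x, d) - grad_f(x))
        for d in (0.1, 0.01)]
```

Shrinking Δ from 0.1 to 0.01 reduces the gradient error by roughly the same factor, consistent with the bound ‖∇f − ∇m‖ ≤ κ_eg Δ.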
Remark 2. For this class M there exists an algorithm, which we will call a model-improvement algorithm, that in a finite, uniformly bounded (with respect to x and Δ) number of steps can
(1) either establish that a given model m ∈ M is fully linear on B(x, Δ) (we will say that a certificate has been provided and the model is certifiably fully linear),
(2) or find a model m̃ ∈ M that is fully linear on B(x, Δ).

As the solution of subproblem (5), s_k induces the following decrease:

m_k(x_k) − m_k(x_k + s_k) ≥ c_1 σ_k^m min{1, Δ_k, σ_k^m}, (6)

with a constant c_1 > 0 independent of k. Hereby, the norm of the projected gradient is a suitably chosen criticality measure. In order to obtain a relatively short and elegant convergence result, we describe a concrete implementation by means of a Cauchy step defined by an affine-scaled gradient, which has stronger smoothness properties. Similar approaches can be found in [16-18]. Define the diagonal affine-scaling matrix D_k(x), whose ith diagonal entry depends on the distance from x_i to its nearest bound, where γ > 0 and p > 0 are given constants and (g_k)_i is the ith component of the gradient of m_k(x_k + s). Solve the subproblem

min m_k(s), s.t. x_k + s ∈ Ω, ‖s‖ ≤ Δ_k. (9)

Following the idea of [18], we are able to prove that the solution of this quadratic model (9) also satisfies the decrease (6).
1.2. Nonmonotone Filter Technique. The filter method was first introduced for constrained nonlinear optimization by Fletcher and Leyffer [19] and has since found a wide range of applications in various optimization problems; see [18,20-23]. In 2005, the filter method was extended to a filter trust region method by Gould et al. [24] for unconstrained optimization and by Sainvitu [25] for general box constrained optimization. These works indicate that the filter method is a reliable and efficient approach for nonlinear optimization. In this paper, we make a further study of the nonmonotone filter method and propose a new algorithm for (1a) and (1b). The main features of this paper are as follows: (i) We present a further extension of the filter trust region method by introducing both a suitable nonmonotonicity criterion and a derivative-free strategy for bound constrained optimization.
(ii) The global convergence of the presented derivative-free trust region method with the nonmonotone filter technique for bound constrained optimization is established.
(iii) Numerical results indicate that the new algorithm is effective for problems for which the derivative functions are unavailable.
The paper is organized as follows: we present our algorithmic scheme in Section 2. There we first show that the decrease direction described in this paper satisfies the predicted decrease inequality, recall the nonmonotone trust region method from [18,26], and then make the necessary modifications for a derivative-free version. The global convergence properties of the derivative-free trust region method with nonmonotone filter technique are established in Section 3. The corresponding numerical results are reported in Section 4, together with some additional tests. Finally, conclusions and further discussions are given.
Notation. Unless otherwise specified, throughout this paper the norm ‖·‖ is the 2-norm for a vector and the induced 2-norm for a matrix. B(x, Δ) denotes the closed ball in Ω ⊂ R^n centered at x with radius Δ > 0. In addition, L(x_0) = {x ∈ Ω | f(x) ≤ f(x_0)} is the level set at x_0. We use subscripts and superscripts to distinguish the relevant information between the original function and the approximate function; for example, σ_k^f is the criticality measure of f(x_k) and σ_k^m is the criticality measure of m_k(x_k + s). s_k is the trial step at the kth iteration, and g_k and H_k are the gradient and Hessian used for the trial step, respectively.

The Derivative-Free Trust Region Algorithm with Nonmonotone Filter Technique
We analyze the behavior of subproblem (9) with the diagonal matrix defined by (8). Let σ_k^m denote the criticality measure such that (10) holds on Ω for some c_2 > 0, and let the fraction of Cauchy decrease condition (11) require that the trial step achieve at least a fixed fraction β ∈ (0, 1) of the decrease of the Cauchy step along the direction d_k = −D_k(x_k) g_k. It is not difficult to prove that both ‖D_k(x) g(x)‖ and σ_k are criticality measures, i.e., σ_k = 0 if and only if x is a KKT point of problem (1a) and (1b). Furthermore, if g(x) is bounded and uniformly continuous on Ω, then ‖D_k(x) g(x)‖ is uniformly continuous. The proof is similar to Lemmas 6.1 and 6.2 in [18], except that we now have to replace ∇h(x) by the approximated gradient g_k. In order to discuss global convergence, we first provide the following lemma to show that the decrease direction d_k satisfies the predicted decrease inequality (6).

Lemma 3. Suppose that the criticality measure σ_k^m satisfies (10). If σ_k^m ≠ 0 and the trial step x_k + s_k satisfies the fraction of Cauchy decrease condition (11), then (6) holds.
Proof. Considering that d_k = −D_k(x_k) g_k and that the Cauchy point minimizes the model along d_k, we first obtain from the following inequality that d_k is a descent direction of m_k(s) at 0. In the case that the maximum stepsize t_1 is determined by the trust region constraint ‖s_k‖ ≤ Δ_k, a lower bound on t_1 follows. In the case that the maximum stepsize t_2 is determined by the lower bounds of the feasible set, a similar bound follows. In the same way, the stepsize t_3 admitted by the upper bounds of the feasible set can be estimated. In the case d_k^T H_k d_k = 0, we set t_4 = +∞.
Otherwise, in the case that d_k^T H_k d_k is positive and ‖H_k‖ is bounded by a constant κ_H, the one-dimensional model function along d_k, for t ≥ 0, attains its global minimum at t = t_4. If d_k^T H_k d_k ≤ 0, then the model decreases along the whole admissible segment. If, on the other hand, t* = t_4, then the corresponding decrease can be bounded from below. Combining these bounds with the assumptions (10) and (11), the conclusion is obtained.
There are two criteria in the proposed algorithm to measure whether the trial step x_k^+ = x_k + s_k is acceptable. The first belongs to the trust region method. The new algorithm is based on a nonmonotone decrease criterion. Nonmonotone trust region methods were investigated by Toint [26] and Ulbrich [18]. Let the increasing sequence (k_i)_{i≥0} enumerate all indices of accepted steps. Conversely, if k ≠ k_i for all i, then s_k was rejected. In the following we denote the set of all these "successful" indices by S. We follow [18] to choose an integer M ≥ 0, fix weights in (0, 1/M], and then compare the predicted decrease promised by the trust region model with a relaxation of the actual decrease, in order to compute the reduction ratio ρ_k = ared_k / pred_k and decide whether a step is acceptable or not. The idea behind the update rule (22) is the following: instead of requiring that f(x_k^+) be smaller than f(x_k), it is only required that f(x_k^+) be either less than f(x_k) or less than the weighted mean of the function values at the last m_k = min{k + 1, M} successful iterates. Of course, if M = 1, the usual reduction ratio is recovered. Our approach is a slightly stronger requirement than the straightforward idea of replacing the actual reduction with max_{0≤j<m_k} f(x_{k−j}) − f(x_k + s_k). Unfortunately, for this latter choice it does not seem possible to establish all the global convergence results that are available in the monotone case. For our approach, however, this is possible without making the theory substantially more difficult.
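The relaxed acceptance test above can be sketched as follows. As a simplifying assumption for illustration, the weighted mean over the last m_k successful iterates is taken with uniform weights, and the reference value is the larger of the current function value and that mean; the function name `nonmonotone_ratio` and parameter `M` are ours, not the paper's.

```python
import numpy as np

def nonmonotone_ratio(f_hist, f_trial, pred_red, M=5):
    """Reduction ratio rho with a relaxed (nonmonotone) reference value.

    f_hist: function values at accepted iterates, most recent last.
    Instead of comparing f_trial against f_hist[-1] alone, the reference
    is the larger of the current value and a uniform mean of the last
    min(len(f_hist), M) accepted values.
    """
    m = min(len(f_hist), M)
    ref = max(f_hist[-1], np.mean(f_hist[-m:]))
    return (ref - f_trial) / pred_red
```

With M = 1 the usual monotone ratio is recovered; with M > 1 a trial point slightly above the current value can still yield a large ratio and be accepted, which is the nonmonotone effect described in the text.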
The other criterion is the filter step. We prefer a filter mechanism to assess the suitability of x_k^+. Our strategy is inspired by that of [24]: we decide that a trial point is acceptable for the filter if and only if it sufficiently improves at least one gradient component against every filter entry, where γ_g ∈ (0, 1/√n) is a small positive constant and ĝ_{ℓ,j} = ĝ_j(x_ℓ).
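A minimal sketch of a multidimensional filter acceptance test in the spirit of [24] follows. The stored entries, the margin rule, and the function name are illustrative assumptions: a trial gradient is accepted if, against every filter entry ĝ_ℓ, at least one component is reduced by a margin γ_g ‖ĝ_ℓ‖.

```python
import numpy as np

def acceptable_to_filter(g_trial, filter_entries, gamma_g):
    """Gradient-based filter test: g_trial is acceptable if, for every
    stored entry g_l, some component satisfies
    |g_trial[j]| < |g_l[j]| - gamma_g * ||g_l||."""
    for g_l in filter_entries:
        margin = gamma_g * np.linalg.norm(g_l)
        if not np.any(np.abs(g_trial) < np.abs(g_l) - margin):
            return False  # dominated by this filter entry
    return True
```

An empty filter accepts every trial point, which matches the behavior after a filter reset.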
Aiming to solve nonlinear optimization problems with unavailable first- (or second-) order derivatives while guaranteeing reliable and efficient numerical performance, we now state the following derivative-free trust region method.
Algorithm 4 (a derivative-free trust region method with nonmonotone filter technique).
Step 6. Test whether to accept the trial step by (20) and (23).
Remark 5. At the beginning of every iteration, Step 5 is encountered with the history index equal to −1. In this case the sum in (22) is empty, and thus we define the reference value to be f(x_k), recovering the usual reduction ratio.

Remark 6. In order to obtain a suitable approximation function, the objective function of the trust region subproblem needs to be updated if necessary.
Step 9 is the model improvement method, which follows the same principle as Algorithm 2 proposed in [14,15].
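The model improvement loop of Step 9 can be sketched as follows: the radius is repeatedly multiplied by a factor ω ∈ (0, 1) (while the model is re-certified as fully linear) until the radius is dominated by μ times the model criticality measure. The names `criticality_step`, `omega`, and `mu`, and the defaults, are illustrative assumptions; the sketch also shows why the loop terminates finitely whenever the criticality measure stays away from zero, as Lemma 8 asserts.

```python
def criticality_step(sigma_of_delta, delta0, omega=0.5, mu=1.0, max_iter=50):
    """Shrink the radius by omega until delta <= mu * sigma(delta).

    sigma_of_delta: criticality measure of the (improved) model as a
    function of the current radius -- a stand-in for re-running the
    model-improvement algorithm at each radius.
    """
    delta = delta0
    for _ in range(max_iter):
        sigma = sigma_of_delta(delta)
        if delta <= mu * sigma:
            return delta, sigma  # radius dominated by criticality: stop
        delta *= omega           # otherwise shrink and improve again
    return delta, sigma_of_delta(delta)
```

If σ stays bounded away from zero, the geometric shrinking must eventually satisfy Δ ≤ μσ, so the loop stops after finitely many improvement steps.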

Global Convergence for First-Order Critical Points
The purpose of this section is to provide an in-depth analysis of the global convergence properties of Algorithm 4 in the first-order case. We recall or state some reasonable assumptions that are assumed to hold for problem (1a) and (1b) in order to obtain global convergence of the derivative-free trust region method.
The global convergence property is described by the following Theorem 7, which indicates that the sequence generated by the derivative-free trust region method of Algorithm 4 with the filter technique has at least one accumulation point that is a stationary point of the optimization problem (1a) and (1b).

Theorem 7. Suppose that Assumptions (A1)-(A3), the error bounds (3) and (4), and the fact that Δ_k is bounded by Δ_max hold. Suppose furthermore that m_k(x_k + s) is fully linear on B(x_k, Δ_k). Then

lim inf_{k→∞} σ_k^f = 0.

In order to obtain Theorem 7, the remainder of this section derives Lemmas 8-15 as support.
Lemma 8. Suppose that Assumptions (A1)-(A3), the error bounds (3) and (4), and the fact that Δ_k is bounded by Δ_max hold, and that m_k(x_k + s) is the fully linear function on B(x_k, Δ_k). Then Step 9 of Algorithm 4 will terminate in a finite number of improvement steps if σ_k^m ≠ 0.
Proof. We prove this result by contradiction. Assume that the loop in Step 9 is infinite. We will show that σ_k^m must be zero in this case. When Step 9 is invoked, we know that we do not have a certifiably fully linear model m_k(x_k + s) and that the radius Δ_k exceeds μσ_k^m. Then set m_k^(0)(x_k + s) = m_k(x_k + s) and improve the model until it is fully linear on B(x_k, ω^(0) Δ_k). If the criticality measure σ_k^(1) of the resulting model m_k^(1)(x_k + s) satisfies σ_k^(1) ≥ μω^(0) Δ_k, the procedure stops with Δ_k = ω^(0) Δ_k ≤ μσ_k^(1). Otherwise, that is, if σ_k^(1) < μω^(0) Δ_k, improve the model until it is fully linear on B(x_k, ω^(1) Δ_k). Then test again whether the procedure stops. If not, the radius is multiplied by ω again, and so on.
The only way for this procedure to be infinite is if the radius ω^(i) Δ_k tends to zero. This construction implies that σ_k^(i) tends to zero as well. Since each model m_k^(i)(x_k + s) was fully linear on B(x_k, ω^(i−1) Δ_k), the bound (3) provides the corresponding bound on the true criticality measure. The choice of ω ∈ (0, 1) in Algorithm 4 then implies that σ_k^m = 0, the desired contradiction.

Lemma 9. Suppose that Assumptions (A1)-(A3), the error bounds (3) and (4), and the fact that Δ_k is bounded by Δ_max hold, and that m_k(x_k + s) is the fully linear function on B(x_k, Δ_k).

Lemma 10. Suppose that Assumptions (A1)-(A3), the error bounds (3) and (4), and the fact that Δ_k is bounded by Δ_max hold. Suppose furthermore that x_k, s_k, Δ_k, etc. are generated by Algorithm 4, that m_k(x_k + s) is fully linear on B(x_k, Δ_k), and that Δ_k ≤ min{1, σ_k^m}. Then for arbitrary x_k ∈ L(x_0) with σ_k^m ≠ 0, the kth iteration is successful.
Proof. Since σ_k^m ≠ 0 and Δ_k ≤ min{1, σ_k^m}, we obtain a lower bound on the predicted decrease from the decrease condition (6). On the other hand, since the current model is fully linear on B(x_k, Δ_k), the bound (3) controls the difference between the actual and predicted reductions. In either case, we obtain the conclusion that ρ_k ≥ η_1; that is, the kth iteration is successful.

Lemma 11. Suppose that Assumptions (A1)-(A3), the error bounds (3) and (4), and the fact that
where N_k = |{1, . . . , k} ∩ S ∩ N|. As we have supposed that there are infinitely many successful nonconvex iterations, we have lim_{k→∞} N_k = +∞, so |f(x_0) − f(x_{k+1})| is unbounded above, which contradicts the fact that the objective function is bounded below, as stated in Assumption (A1). Our initial assumption must then be false, and the set S ∩ N of successful nonconvex iterations must be finite.

Proof. Let k_0 be the index of the last successful iteration. Then x* = x_{k_0+1} = x_{k_0} + s_{k_0} and

Lemma 13. Suppose that Assumptions (A1)-(A3), the error bounds (3) and (4), and the fact that
Now observe that the flag Restrict is set by the algorithm in the course of every unsuccessful iteration. This flag must thus be set at the beginning of every iteration of index k_0 + j + 1 for j > 0.
Proof. Assume, for the purpose of obtaining a contradiction, that for all k large enough the criticality measure is bounded below by some ε_0 > 0, and denote the corresponding index set by K. The bound (51) and Lemma 12 then imply that |S ∩ N| is finite and therefore that the filter is no longer reset to the empty set for k sufficiently large. Moreover, since our assumptions imply that the gradient sequence is bounded above and away from zero, there must exist a subsequence {k_ℓ} ⊆ S such that the corresponding limit in (52) holds. By the definition of {k_ℓ}, x_{k_ℓ} is acceptable for the filter at iteration k_ℓ − 1. This implies, since the filter is not reset for ℓ large enough, that for each ℓ sufficiently large there exists an index j_ℓ ∈ {1, . . . , n} such that (53) holds. But (51) implies that ‖ĝ_{k_ℓ−1}‖ ≥ ε_0 for all ℓ sufficiently large. Hence we deduce from (53) a lower bound that holds for all ℓ sufficiently large. But the left-hand side of this inequality tends to zero as ℓ tends to infinity because of (52), yielding the desired contradiction. Hence the conclusion holds.
We now consider a conclusion similar to Lemma 5.7 in [8]: if the model criticality measure σ_k^m converges to 0 on a subsequence, then so does the true criticality measure σ_k^f.

Lemma 15. For any subsequence {k_i} such that lim_{i→∞} σ_{k_i}^m = 0, it also holds that lim_{i→∞} σ_{k_i}^f = 0.

Combining Lemmas 14 and 15, the global convergence property is immediately given as Theorem 7, which also illustrates that the criticality step plays an important role in ensuring that a subsequence of the iterates approaches first-order stationarity.

Numerical Experiment
In this section, we examine the practical performance of the derivative-free trust region method in two respects. First, comparisons of the numerical results between the proposed derivative-free trust region method and traditional gradient-based algorithms are given in order to illustrate the effectiveness of Algorithm 4 in solving general optimization problems. Then, the application of the derivative-free trust region algorithm with nonmonotone filter technique to parameter estimation is presented to show the performance of Algorithm 4 on derivative-free optimization problems. All routines are written in Matlab R2009a and run on a PC with a 2.66 GHz Intel(R) Core(TM)2 Quad CPU and 4 GB of DDR2 memory.

4.1. Numerical Results of the Derivative-Free Trust Region Algorithm with Nonmonotone Filter Technique. The Hock and Schittkowski [27] test set is frequently used to test derivative-free algorithms on moderate-size problems. For running the proposed derivative-free trust region algorithm with nonmonotone filter technique, the bound constraints of each problem define the set Ω, and linear (not bound) constraints of the original problem were handled by projections. We test 27 simple bound constrained optimization problems (listed in Table 1) from Test Examples for Nonlinear Programming Codes [27-29]. The values Δ_0 = 2, η_0 = 0.25, η_1 = 0.75, together with 0.4, 2, 0.5, 0.3, 10^-8, and 0.5 for the remaining algorithmic constants, are used. At the same time, we introduce two traditional algorithms, the quasi-Newton method and the conjugate gradient method [30], to measure the efficiency of the objective algorithm on the tested problems. We denote these two algorithms as Algorithm 1.1 and Algorithm 1.2.
We use the performance profile tool of Dolan and Moré [31] to analyze the efficiency of the three algorithms. Figures 1 and 2 show that Algorithm 4 is feasible and the most robust among the three methods. It is not difficult to see from Figure 1 that Algorithm 4 has a clear lead among the three methods in the CPU time for solving each test problem. Simultaneously, Figure 2 illustrates that Algorithm 4 reaches optimal performance more quickly in terms of the number of function evaluations than the other two algorithms.
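The performance profiles of Dolan and Moré [31] used in Figures 1 and 2 can be computed as in the following sketch; this is a generic illustration, not the exact plotting code of our experiments. For each solver s, ρ_s(τ) is the fraction of problems solved within a factor τ of the best solver on that problem.

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profile.

    costs: (n_problems, n_solvers) array of positive costs
           (e.g. CPU time or function evaluations).
    Returns an array of rho_s(tau) values, one row per tau.
    """
    costs = np.asarray(costs, dtype=float)
    # performance ratio of each solver against the best on each problem
    ratios = costs / costs.min(axis=1, keepdims=True)
    return np.array([[np.mean(ratios[:, s] <= tau)
                      for s in range(costs.shape[1])]
                     for tau in taus])
```

A solver whose profile reaches 1 at smaller τ dominates: it solves all problems within a smaller factor of the best observed cost.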
To measure the efficiency of the proposed algorithm for large-scale optimization, in this section we compare the method with Algorithm 2.1 in [32] using three characteristics, "NI," "NF," and "CPU," where "NI" is the number of iterations, "NF" is the number of function evaluations, and "CPU" is the processing time for the tested problems. The numerical results for the corresponding problems are listed in Table 2.
Although we only list the data for dimension 9000, the objective algorithm (Algorithm 4) is clearly more effective for large-scale optimization, since its iteration counts and CPU times, two essential measures of algorithmic efficiency, are smaller than those of Algorithm 2.1.

4.2. The Derivative-Free Trust Region Algorithm for Parameter Estimation

Algorithm (basic trust region method)

Step 1. Solve a trust region subproblem and compute a step s_k which satisfies the sufficient decrease conditions.

Step 2. Set the new iterate, update the trust region radius, increment k by one, and go to Step 1.
Trust region methods generate steps with the help of a quadratic model of the objective function, but they use this model in a different way from line search methods. Trust region methods define a region around the current iterate within which they trust the model to be an adequate representation of the objective function, and then choose the step to be an approximate minimizer of the model in this region. The size of the trust region is critical to the effectiveness of each step. Four parameters appear in the trust region update process (57); namely, η_0, η_1, γ_1, and γ_2 are used to adjust the size of the trust region radius. These values are somewhat arbitrary, and much better options may be available. The classical parameter values P_c = (0.25, 0.75, 0.5, 2) are recommended in the literature [30], but they may not be the best choice if we consider the total CPU time or the overall number of function evaluations necessary to solve a set of optimization problems. This is itself a derivative-free optimization problem. In this section, our objective is to identify optimal values of the four parameters in the trust region update using the proposed Algorithm 4. We next describe the specific representation of this parameter estimation problem. Let P be a set of optimization problems, Ω = {θ ∈ R^4 | 0 ≤ η_0 < η_1 < 1 and 0 < γ_1 < 1 < γ_2}, t_p(θ) be the CPU time necessary to solve problem p ∈ P with the parameter values θ, and nf_p(θ) be the corresponding number of function evaluations. The small-dimensional nondifferentiable optimization problem, measured by the overall CPU time T(θ) or the overall number of function evaluations NF(θ), is then written as the minimization of T(θ) or NF(θ) over θ ∈ Ω. We consider minimizing the total computing time T(·). The test problems in P and their dimensions are listed in Table 3. The initial point chosen for Algorithm 4 is P_c. The best set of parameters given by Algorithm 4 is

P* = (0.238721, 0.923489, 0.352533, 2.304294). (60)
Table 3 shows the results on the test problems. Timings t_p(·) are in seconds, and nf_p(·) is the number of function evaluations. Table 3 demonstrates that this strategy allowed the improvement of the total time from T(P_c) = 71.04 to T(P*) = 64.4 seconds, i.e., an improvement of approximately 10.31%. The total number of function evaluations is improved from NF(P_c) = 14221 to NF(P*) = 12752, i.e., an improvement of approximately 11.52%.
The function characterizing the performance profiles of the parameter values P_c and P* is the same as in [2]. The performance in computing time t_p(·) required to solve each optimization problem in the problem set P with the parameter values P_c can be visualized in the profile of Figure 3, which compares the CPU time against that with the parameter values P*. The profile of Figure 4 presents a similar comparison using the number of function evaluations. Both Figures 3 and 4 illustrate that Algorithm 4 has an advantage in solving nondifferentiable optimization problems.
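The four-parameter radius update (57) being tuned in this section can be sketched as follows, with the classical values (0.25, 0.75, 0.5, 2) as defaults. The exact shrink rule (e.g., whether γ_1 multiplies Δ_k or the step norm) varies in the literature, so this is one common variant, not necessarily the precise rule of (57); the cap Δ_max and the function name are illustrative assumptions.

```python
def update_radius(rho, delta, s_norm,
                  eta0=0.25, eta1=0.75, gamma1=0.5, gamma2=2.0,
                  delta_max=100.0):
    """One common trust region radius update driven by the reduction
    ratio rho, parameterized by (eta0, eta1, gamma1, gamma2)."""
    if rho < eta0:                        # unsuccessful: shrink
        return gamma1 * min(delta, s_norm)
    if rho >= eta1:                       # very successful: expand
        return min(gamma2 * delta, delta_max)
    return delta                          # successful: keep the radius
```

Tuning (η_0, η_1, γ_1, γ_2) changes how aggressively the radius shrinks and grows, which is exactly what the parameter estimation problem above optimizes over the set P.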

Conclusion
This paper proposes a derivative-free trust region algorithm with nonmonotone filter technique for bound constrained optimization.
(i) The algorithm is mainly designed to solve engineering optimization problems whose derivatives are unavailable. The proposed algorithm possesses the trust region property and adopts the nonmonotone filter technique for bound constrained optimization.
(ii) Global convergence is established under the definition of the fully linear model. The sufficient descent property makes the objective function value decrease, and the iterate sequence {x_k} converges to a global limit point if the problem is convex.
(iii) Preliminary numerical results, compared with the traditional quasi-Newton and conjugate gradient methods, show that the proposed algorithm is feasible for problems whose derivative functions are unavailable. Large-scale versions of the given problems are also solved, which shows that the new algorithm is very effective.
(iv) Finally, optimal parameters with respect to overall computational time on a set of test problems are identified. We use the proposed algorithm to obtain a best choice of parameter values, which differs from the traditionally used values, and compare the numerical results under the two parameter sets in terms of CPU time and number of function evaluations.

Figure 1: Performance profile comparing the CPU time required by the different methods.

Figure 2: Performance profile comparing the numbers of function evaluations of the different methods.

Figure 3: Performance profile comparing the CPU time required when using different parameter values.

Figure 4: Performance profile comparing the total numbers of function evaluations when using different parameter values.
Δ_k is bounded by Δ_max hold. Suppose furthermore that m_k(x_k + s) is fully linear on B(x_k, Δ_k) and that there exists a constant ε > 0 such that σ_k^m ≥ ε for all k. Then there is a constant δ > 0 such that Δ_k ≥ δ for all k.

Proof. The proof is the same as for Lemma 5.3 in [8] when ‖s_k‖_∞ ≤ Δ_k, except that we now have to replace ‖g_k‖ by the criticality measure σ_k^m and use (38) instead of the model decrease defined in [8].

Lemma 12. Suppose that Assumptions (A1)-(A3), the error bounds (3) and (4), and the fact that Δ_k is bounded by Δ_max hold. Suppose furthermore that m_k(x_k + s) is fully linear on B(x_k, Δ_k) and that there exists a constant ε > 0 such that σ_k^m ≥ ε for all k. Then there can be only finitely many successful nonconvex iterations in the course of the algorithm, i.e., |S ∩ N| < +∞.

Proof. Suppose, for the purpose of obtaining a contradiction, that there are infinitely many successful nonconvex iterations, which we index by S ∩ N = {k_i}. The algorithm guarantees that ρ_k ≥ η_1 for all iterations in S ∩ N, where {k | ρ_k ≥ η_1} is the set of sufficient descent iterations, which in turn implies with (6) that, for k ∈ S ∩ N,

f(x_k) − f(x_k + s_k) ≥ η_1 [m_k(x_k) − m_k(x_k + s_k)] ≥ η_1 c_1 σ_k^m min{1, Δ_k, σ_k^m} ≥ η_1 c_1 ε min{1, Δ_k, ε}.

Table 3: Continued. (Columns: problem name, t_p(P_c), t_p(P*), nf_p(P_c), nf_p(P*).)

Acknowledgments. [...] (Grant no. 20170101037JC); the PhD Start-Up Fund of the Natural Science Foundation of Beihua University; and the Youth Training Project Foundation of Beihua University (Grant no. 2017QNJJL10).