Accelerated Double Direction Method for Solving Unconstrained Optimization Problems

An iterative method for solving a minimization problem of unconstrained optimization is presented. This multistep curve search method uses a specific form of iteration with two direction parameters, an approximation of the Hessian by an appropriately constructed diagonal matrix, and an inexact line search procedure. It is proved that the constructed numerical process is well defined under some assumptions. Under certain conditions, the method is linearly convergent for uniformly convex and strictly convex quadratic functions. Numerical results arising from the defined algorithms are also presented and analyzed.


Introduction
In this paper, we derive a first-order numerical method for solving the following nonlinear unconstrained optimization problem:
$$\min f(x), \quad x \in \mathbb{R}^n, \qquad (1)$$
where $f(x)$ is a twice continuously differentiable function. The iteration of the form
$$x_{k+1} = x_k + \alpha_k d_k + \alpha_k^2 b_k \qquad (2)$$
is considered. Here, $x_{k+1}$ represents the new iterative point, $x_k$ is the previous iterate, and $\alpha_k$ denotes the stepsize, while $d_k$ and $b_k$ generate the search directions. Each of these directions is calculated by a particular algorithm. As in other iterative methods for solving unconstrained optimization problems, the crucial task is to find appropriate descent direction vectors $d_k$, $b_k$ and a suitable step length $\alpha_k$. The proposed iteration (2) contains two direction vectors, and that was the motivation for naming this method the Accelerated Double Direction method (or, shortly, the ADD method). The decisive point of our research is precisely the two-direction form of the analyzed method and its implementation. Originally, a method of this particular form, but under different assumptions, is described in [1, 2].
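To see why the second direction acts as a correction term, the displacement in (2) can be factored; the following rearrangement is included here only as an illustration and is not taken from [1]:
$$x_{k+1} - x_k = \alpha_k d_k + \alpha_k^2 b_k = \alpha_k \left( d_k + \alpha_k b_k \right).$$
For small stepsizes $\alpha_k$, the contribution of $b_k$ is of second order in $\alpha_k$, so the direction $d_k$ dominates the update while $b_k$ acts as a correction.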
In [1, 2], the implementation of the similarly defined method is omitted, so in this work we extend and complete that topic. To make the method suitable for implementation, we modify the choice of the vectors $d_k$ and $b_k$. Another contribution of this paper is obtaining better numerical results, with respect to the number of iterations, than some known methods for unconstrained optimization. Further, the following notation is used:
$$g(x) = \nabla f(x), \qquad G(x) = \nabla^2 f(x), \qquad g_k = g(x_k),$$
where $\nabla f(x)$ denotes the gradient of $f$ and $\nabla^2 f(x)$ denotes the Hessian of $f$. As usual, the superscript $T$ denotes transposition.
There are some known procedures for deriving appropriate search directions. We mention several of them (see [1, 3, 4]).
On the other hand, computation of the step length is also important. The common way to determine the stepsize $\alpha_k$ is an inexact line search technique. The only requirement in the line search procedure is a decrease in the objective function values. In this way we calculate a step length which is suitable for our iterative optimization problem (see [5-12]).
In the present paper, we use a combination of the iterative scheme (2) and the accelerated gradient descent method from [13]. More precisely, the first term, $\alpha_k d_k$, in (2) is defined using the principles of the method from [13]. The second term, $\alpha_k^2 b_k$, appears as a correction factor which is defined from the Taylor expansion series.
The paper is organized as follows. In Section 2, the basic motivation and idea for deriving the accelerated gradient descent method of the form (2) are explained. The algorithm of the derived Accelerated Double Direction method, in short the ADD method, is presented in Section 3, where the main result of this work is also analyzed. The convergence of the ADD method is proved in Section 4. Numerical tests and comparisons of the derived ADD method with the accelerated gradient descent method with line search (the so-called SM method) originating in [13], as well as with the nonaccelerated version of the ADD method (the NADD method), are given in Section 5.

Preliminaries
The accelerated gradient descent (AGD) methods use a parameter $\gamma_k > 0$ as an acceleration parameter which rescales the gradient direction in each iteration. The first AGD method originated in [14]. A type of AGD method is considered in [13]. This AGD method is called the SM method, and it is derived starting from the Newton iterative method with line search, in which the inverse of the Hessian is replaced by an appropriate approximation, presented as a symmetric $n \times n$ positive definite matrix. Taking this approximation to be $\gamma_k^{-1} I$, $\gamma_k > 0$, the authors in [13] derived an accelerated modified Newton scheme:
$$x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k,$$
where the step length $\alpha_k$ is computed by means of the backtracking inexact line search procedure and the acceleration parameter $\gamma_k$ is obtained from the second-order Taylor expansion of the objective function. In this paper, we use the same motivation and calculate the acceleration parameter $\gamma_k$ from the Taylor expansion of the objective function. But, unlike the SM method, we choose to evaluate a method which contains two direction vectors.
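As a sketch of how such an acceleration parameter can be obtained in the single-direction case, assume that the Hessian in the second-order Taylor expansion is replaced by $\gamma_{k+1} I$ and that $x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k$; then
$$f(x_{k+1}) \approx f(x_k) - \alpha_k \gamma_k^{-1} \|g_k\|^2 + \frac{1}{2} \gamma_{k+1} \alpha_k^2 \gamma_k^{-2} \|g_k\|^2,$$
and solving this relation for $\gamma_{k+1}$ gives
$$\gamma_{k+1} = \frac{2 \gamma_k \left[ \gamma_k \left( f(x_{k+1}) - f(x_k) \right) + \alpha_k \|g_k\|^2 \right]}{\alpha_k^2 \|g_k\|^2}.$$
This only outlines the idea; the precise definition of the acceleration parameter of the SM method is the one given in [13].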
A multistep minimization iterative process (2) with two direction vectors is described in [1, 2]. This algorithm considers generally nondifferentiable functions and consists of three partial subalgorithms, each defining one of the needed parameters: the two direction vectors $d_k$ and $b_k$ and the stepsize $\alpha_k$. Since this work considers uniformly convex or strictly convex quadratic functions, we modify the proposed subalgorithms according to the present conditions. In a further section, we give three supplemented algorithms based on propositions originally stated in [1].

Main Algorithm
Taking into consideration the results obtained in [1, 13], we construct a new iterative method. That process has the form predefined by (2), where the parameter $\gamma_k$ has the properties taken from the SM method. Using the same notation as in the previous section, the process (2) is considered with the direction vector $d_k$ defined, according to the SM method, by $d_k = -\gamma_k^{-1} g_k$. Practically, deriving the direction vector $d_k$ is reduced to deriving the positive real number $\gamma_k$. Taking the vector $b_k$ and the step length $\alpha_k$ defined similarly as in [1], we get
$$x_{k+1} = x_k - \alpha_k \gamma_k^{-1} g_k + \alpha_k^2 b_k. \qquad (9)$$
Now, from the second-order Taylor expansion, the approximation of $f(x_{k+1})$ can be written as follows:
$$f(x_{k+1}) \approx f(x_k) + g_k^T (\alpha_k d_k + \alpha_k^2 b_k) + \frac{1}{2} (\alpha_k d_k + \alpha_k^2 b_k)^T \nabla^2 f(\xi) (\alpha_k d_k + \alpha_k^2 b_k). \qquad (10)$$
The matrix $\nabla^2 f(\xi)$ is, as in [13], replaced by $\gamma_{k+1} I$, and the point $\xi$ fulfills the condition $\xi = x_k + \delta (\alpha_k d_k + \alpha_k^2 b_k)$ for some $0 \le \delta \le 1$. Knowing this, the expression (10) becomes
$$f(x_{k+1}) \approx f(x_k) + g_k^T (\alpha_k d_k + \alpha_k^2 b_k) + \frac{1}{2} \gamma_{k+1} \|\alpha_k d_k + \alpha_k^2 b_k\|^2. \qquad (12)$$
From (12), $\gamma_{k+1}$ is computed in the following way:
$$\gamma_{k+1} = \frac{2 \left[ f(x_{k+1}) - f(x_k) - g_k^T (\alpha_k d_k + \alpha_k^2 b_k) \right]}{\|\alpha_k d_k + \alpha_k^2 b_k\|^2}. \qquad (13)$$
It is supposed that $\gamma_{k+1} > 0$; otherwise, the second-order necessary condition and the second-order sufficient condition would not be fulfilled. If in some iterative step it happens that $\gamma_{k+1} < 0$, we take $\gamma_{k+1} = 1$.
Then, the next iterative point $x_{k+2}$ is computed by
$$x_{k+2} = x_{k+1} - \alpha_{k+1} \gamma_{k+1}^{-1} g_{k+1} + \alpha_{k+1}^2 b_{k+1}.$$
According to the assumptions posed in this paper, the original calculations of the step length $\alpha_k$ and the direction vector $b_k$ given in [1] are modified by Algorithms 1 and 2.
Supplemented further, the main contribution of this paper is presented by Algorithm 3.

Require: the objective function $f(x)$, the direction $d_k$ of the search at the point $x_k$, and numbers $0 < \sigma < 0.5$ and $\beta \in (\sigma, 1)$ (input of Algorithm 1).
Algorithm 1: Calculation of the step length $\alpha_k$ by reducing the curve search rule from [1] to the backtracking line search starting from $\alpha = 1$; here $t_k^*$ denotes the solution of the problem $\min_{t \in \mathbb{R}} \Phi_k(t)$ for the auxiliary function $\Phi_k(t)$ of the curve search rule from [1].
Algorithm 2: Calculation of the direction vector $b_k$.
One of the steps of Algorithm 3 reads: if $\|g_k\| < \varepsilon$, then go to Step 9; otherwise, continue with the next step.

Remark 1. It is possible to compare the iteration (9) proposed in the present paper with the general iterative scheme proposed in [15]. The search direction in [15] is defined as a linear combination of $g_{k+1}$ and $x_{k+1} - x_k$. On the other hand, the search direction in (9) is defined as a particular linear combination of $-g_k$ and the vector $b_k$ defined in Algorithm 2.
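For illustration, the following is a minimal C++ sketch of a single step of an iteration of the form (9), combined with a standard backtracking line search in the spirit of Algorithm 1. The vector $b_k$ is treated as a user-supplied correction direction, since its exact computation is the subject of Algorithm 2, and the helper names and the default values of $\sigma$ and $\beta$ are assumptions made for this sketch only.

#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Dot product of two vectors of equal length.
static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// One step of an iteration of the form (9) (a sketch, not the authors' code):
//   x_{k+1} = x_k - alpha_k * g_k / gamma_k + alpha_k^2 * b_k,
// with alpha_k found by backtracking (0 < sigma < 0.5, beta in (sigma, 1), alpha = 1 initially)
// and gamma_{k+1} computed from (13), reset to 1 if it is not positive.
// Returns gamma_{k+1}; the iterate x is updated in place.
static double add_step(const std::function<double(const Vec&)>& f,
                       const Vec& g, const Vec& b, double gamma, Vec& x,
                       double sigma = 0.25, double beta = 0.8) {
    const std::size_t n = x.size();
    Vec d(n);
    for (std::size_t i = 0; i < n; ++i) d[i] = -g[i] / gamma;   // d_k = -gamma_k^{-1} g_k

    const double f0  = f(x);
    const double gTd = dot(g, d);                               // negative for a descent direction

    double alpha = 1.0;
    Vec xn(n);
    for (int it = 0; it < 50; ++it) {                           // backtracking loop
        for (std::size_t i = 0; i < n; ++i)
            xn[i] = x[i] + alpha * d[i] + alpha * alpha * b[i];
        if (f(xn) <= f0 + sigma * alpha * gTd) break;           // Armijo-type decrease test
        alpha *= beta;
    }

    Vec s(n);                                                   // s = x_{k+1} - x_k
    for (std::size_t i = 0; i < n; ++i) s[i] = alpha * d[i] + alpha * alpha * b[i];

    const double ss = dot(s, s);
    double gamma_next = (ss > 0.0) ? 2.0 * (f(xn) - f0 - dot(g, s)) / ss : gamma;
    if (gamma_next <= 0.0) gamma_next = 1.0;                    // safeguard used in the text

    x = xn;
    return gamma_next;
}

Repeated calls to add_step, with the gradient and the correction vector recomputed at each new iterate, then mimic the iteration (9).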

Convergence Analysis
In this section, the convergence analysis of the constructed method is discussed. We first analyze the set of uniformly convex functions and afterwards a subset of strictly convex quadratic functions. We start with the following proposition and lemma, which can be found in [16, 17].
Proposition 2 (see [16, 17]). If the function $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable and uniformly convex on $\mathbb{R}^n$, then (1) the function $f$ has a lower bound on the level set $\mathcal{L}_0 = \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$, where $x_0 \in \mathbb{R}^n$ is available; (2) the gradient $g$ is Lipschitz continuous in an open convex set $\mathcal{B}$ which contains $\mathcal{L}_0$; that is, there exists $L > 0$ such that
$$\|g(x) - g(y)\| \le L \|x - y\|, \quad x, y \in \mathcal{B}.$$

Lemma 3 (see [16, 17]). Under the assumptions of Proposition 2, there exist real numbers $m$, $M$ satisfying $0 < m \le M$ such that $f(x)$ has a unique minimizer $x^*$ and
$$m \|y\|^2 \le y^T \nabla^2 f(x) y \le M \|y\|^2, \quad x, y \in \mathbb{R}^n.$$
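As a simple illustration of the constants $m$, $M$, and $L$, consider the strictly convex quadratic $f(x) = \frac{1}{2} x^T Q x$ with a symmetric positive definite matrix $Q$; the bounds below follow directly from the eigenvalue decomposition of $Q$ and are included only as an example:
$$\lambda_{\min}(Q) \|y\|^2 \le y^T \nabla^2 f(x) y = y^T Q y \le \lambda_{\max}(Q) \|y\|^2,$$
so one may take $m = \lambda_{\min}(Q)$, $M = \lambda_{\max}(Q)$, and $L = \lambda_{\max}(Q)$ as a Lipschitz constant of the gradient $g(x) = Qx$.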
The estimate of the decrease of a given uniformly convex function in each iteration is described in the following lemma taken from [13].

Lemma 4 (see [13]). For a twice continuously differentiable and uniformly convex function $f$ on $\mathbb{R}^n$ and for the sequence $\{x_k\}$ generated by Algorithm 3, the following inequality is valid:
$$f(x_k) - f(x_{k+1}) \ge \mu \|g_k\|^2,$$
where $\mu > 0$ is a constant depending on the backtracking parameters $\sigma$ and $\beta$ and on the constants from Lemma 3.

Proof. The proof follows directly from the proof of Lemma 4.2 in [13], using $d_k = -\gamma_k^{-1} g_k$ instead of the corresponding search direction from [13].

The following theorem guarantees the linear convergence of the Accelerated Double Direction method. Its proof is the same as the proof of Theorem 4.1 in [13].
Theorem 5 (see [13]). If the objective function $f$ is twice continuously differentiable and uniformly convex on $\mathbb{R}^n$ and the sequence $\{x_k\}$ is generated by Algorithm 3, then
$$\lim_{k \to \infty} \|g_k\| = 0,$$
and the sequence $\{x_k\}$ converges to $x^*$ at least linearly.
We now consider the case of the strictly convex quadratic function, which has the form
$$f(x) = \frac{1}{2} x^T A x - b^T x, \qquad (21)$$
where $A$ is a real $n \times n$ symmetric positive definite matrix and $b \in \mathbb{R}^n$. This particular case is considered because the convergence analysis of gradient methods is generally difficult and nonstandard. In the following analysis, we use some known assumptions taken from [18-20]. Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ be the eigenvalues of the matrix $A$. In [20], a linear rate of convergence is presented for the BB method under the assumption $\lambda_n < 2 \lambda_1$.
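For the quadratic function (21), the quantities used in the analysis below have simple closed forms; the following elementary identities are recorded here for convenience:
$$g(x) = \nabla f(x) = A x - b, \qquad \nabla^2 f(x) = A,$$
$$f(y) - f(x) = g(x)^T (y - x) + \frac{1}{2} (y - x)^T A (y - x), \quad x, y \in \mathbb{R}^n.$$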

Lemma 6.
Let $f$ be a strictly convex quadratic function given by the expression (21), which involves a symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$, and consider the gradient descent method (8). Let $\lambda_1$ and $\lambda_n$ be, respectively, the smallest and the largest eigenvalues of $A$. Let the parameters $\gamma_k$, $\alpha_k$, and $b_k$ be determined according to (13) and Algorithm 3. Then the bound (22) holds.

Proof. According to (21), the difference between the values of the objective strictly convex quadratic function at the current and the previous iterative point is
$$f(x_{k+1}) - f(x_k) = g_k^T (x_{k+1} - x_k) + \frac{1}{2} (x_{k+1} - x_k)^T A (x_{k+1} - x_k).$$
Substituting this equality into (13), $\gamma_{k+1}$ becomes
$$\gamma_{k+1} = \frac{(x_{k+1} - x_k)^T A (x_{k+1} - x_k)}{\|x_{k+1} - x_k\|^2},$$
and further, since $x_{k+1} - x_k = \alpha_k (\alpha_k b_k - \gamma_k^{-1} g_k)$, the scalar factors $\alpha_k^2$ cancel. Finally, this implies the definitive expression for $\gamma_{k+1}$:
$$\gamma_{k+1} = \frac{(\alpha_k b_k - \gamma_k^{-1} g_k)^T A (\alpha_k b_k - \gamma_k^{-1} g_k)}{\|\alpha_k b_k - \gamma_k^{-1} g_k\|^2}.$$
Since $A$ is a real symmetric positive definite matrix and since the previous expression for $\gamma_{k+1}$ is the Rayleigh quotient of the real symmetric matrix $A$ at the vector $\alpha_k b_k - \gamma_k^{-1} g_k$, it can be concluded that
$$\lambda_1 \le \gamma_{k+1} \le \lambda_n.$$
This fact, together with the bound $0 < \alpha_{k+1} \le 1$ ensured by the backtracking procedure, implies the left hand side of inequality (22). The right hand side of the same inequality arises from the inequality proved in Lemma 4 of [13]. A direct consequence of that inequality, together with the fact that $A$ is symmetric and $g(x) = A x - b$, so that $\|g(x) - g(y)\| = \|A (x - y)\| \le \lambda_n \|x - y\|$, is that in the last expression the largest eigenvalue $\lambda_n$ of the matrix $A$ plays the role of the Lipschitz constant $L$. In the backtracking algorithm, the parameters $\sigma$ and $\beta$ take the values $0 < \sigma < 0.5$ and $\beta \in (\sigma, 1)$. The resulting inequality gives the right hand side of (22), and with this the proof is completed.
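A small numerical illustration of the Rayleigh quotient bound used above, with data chosen only for this example: for $A = \operatorname{diag}(1, 4)$ and $v = (1, 1)^T$,
$$\frac{v^T A v}{v^T v} = \frac{1 \cdot 1^2 + 4 \cdot 1^2}{1^2 + 1^2} = \frac{5}{2} = 2.5 \in [\lambda_1, \lambda_n] = [1, 4].$$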

Theorem 7. For the strictly convex quadratic function $f$ given by (21) and the gradient descent method (2), under the additional assumption $\lambda_n < 2 \lambda_1$ on the eigenvalues of the matrix $A$, the sequence $\{x_k\}$ generated by Algorithm 3 converges to the unique minimizer $x^*$ of $f$.

Proof. Suppose that $\{v_1, v_2, \ldots, v_n\}$ are orthonormal eigenvectors of the symmetric positive definite matrix $A$ and let $\{x_k\}$ be the sequence of iterates constructed by Algorithm 3. For every $k$ and the corresponding iterate $x_k$, one has $g_k = A x_k - b$. On the other hand, the vectors $g_k$ and $b_k$ can be expanded in this eigenbasis as
$$g_k = \sum_{i=1}^{n} c_i^k v_i, \qquad b_k = \sum_{i=1}^{n} d_i^k v_i,$$
for some real constants $c_1^k, c_2^k, \ldots, c_n^k$ and $d_1^k, d_2^k, \ldots, d_n^k$.
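Since the eigenvectors $\{v_1, \ldots, v_n\}$ form an orthonormal basis of $\mathbb{R}^n$, the expansion coefficients above are obtained by orthogonal projection, and applying $A$ simply rescales each coefficient by the corresponding eigenvalue:
$$c_i^k = v_i^T g_k, \qquad A g_k = \sum_{i=1}^{n} \lambda_i c_i^k v_i.$$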

Numerical Results
In order to numerically verify the acceleration property of the ADD method, we constructed the nonaccelerated version of this scheme and named it the Nonaccelerated Double Direction method, shortly the NADD method. For that purpose, we had to eliminate the acceleration parameter $\gamma_k$. Since $\gamma_k$ defines the approximation $\gamma_k^{-1} I$ of the inverse of the Hessian, the question was what the adequate substitution for $\gamma_k$ in iteration (9) is. The natural choice for the nonaccelerated counterpart of the ADD method is obtained by taking the constant value $\gamma_k = 1$ for all $k$ in iteration (9). Then, the Hessian is approximated by the identity matrix in each iteration. In that way, the nonaccelerated form of the process is obtained and iteration (9) becomes
$$x_{k+1} = x_k - \alpha_k g_k + \alpha_k^2 b_k.$$
We tested the presented Accelerated Double Direction method, in short the ADD method, on a large set of unconstrained test problems given by 25 functions proposed in [21]. Throughout the execution, we track the number of iterative steps, since our primary goal is to reduce this number. Each of the 25 functions is tested in 10 numerical experiments. In order to obtain a more general view of the analyzed characteristic, the number of iterations, we choose to test cases with a large number of variables: 1000, 2000, 3000, 5000, 7000, 8000, 10000, 15000, 20000, and 30000. The ADD method is compared with the SM method, since in [13] the SM method has already been compared with the AGD method and the GD (gradient descent) method from [14], but for a lower number of variables (500, 1000, 2000, 3000, 5000, 7000, 8000, 10000, and 15000). In the same paper, it is proved that the SM method outperformed the AGD and GD methods with respect to the number of iterative steps. Since our aim is to improve the numerical results with respect to this characteristic by using the constructed ADD method, it is enough to show that, on this matter, the ADD algorithm obtains better results than the SM method. The stopping criteria for Algorithm 3 are the same as those used in [13] for the SM method. The codes used for testing are written in the C++ programming language on a Workstation Intel Celeron 2.2 GHz.
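As an illustration only, a stopping test of the usual kind (gradient norm plus relative function decrease) could be coded as follows; the function name and the tolerance values are assumptions of this sketch and not necessarily the values used in the reported experiments.

#include <cmath>
#include <cstddef>
#include <vector>

// Stopping test of the usual form for accelerated gradient schemes:
// stop when the gradient norm is small and the relative decrease of f is negligible.
// The tolerances are placeholder values chosen for this sketch.
static bool should_stop(const std::vector<double>& g, double f_new, double f_old,
                        double grad_tol = 1e-6, double rel_tol = 1e-16) {
    double gnorm2 = 0.0;
    for (std::size_t i = 0; i < g.size(); ++i) gnorm2 += g[i] * g[i];
    const bool small_gradient = std::sqrt(gnorm2) <= grad_tol;
    const bool small_decrease = std::fabs(f_new - f_old) <= rel_tol * std::fabs(f_old);
    return small_gradient && small_decrease;
}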
The results presented in Table 1 show the enormous dominance of the ADD method with respect to the number of iterations. Among the 25 tested functions, a very large difference in the number of iterations in favor of the ADD method is obvious. Precisely, for 20 of the tested functions, ADD is significantly more effective with respect to the analyzed characteristic than the SM method.
In Table 1, the test results of the NADD method are also presented. During the test procedure, the execution time needed for the NADD method was evidently too long. That is why we defined an execution time limit: if a test execution lasts longer than this limit, we stop further testing and declare that the testing takes too long. The longest execution time recorded in our tests is obtained for the Diagonal 7 function, and it totals 3287 seconds. We doubled this time and approximated it, which gives a time limit of approximately 2 hours. To get a clearer comparison between ADD and NADD, we performed additional tests for a 100 times lower number of variables: 10, 20, 30, 50, 70, 80, 100, 150, 200, and 300. The contents of Table 3 show that in this case 9 of the 25 test functions could be tested with the NADD scheme.

Remark 8. During the testing procedures, we were able to test the Generalized quartic function for the larger numbers of variables (1000, 2000, 3000, 5000, 7000, 8000, 10000, 15000, 20000, and 30000), but while applying the NADD iteration on this function for the 100 times lower numbers of variables (specifically for n = 30 and n = 100), the execution time greatly exceeds the 2-hour limit. That is why this function is not displayed in Table 3.

Conclusion
We used the form of the iteration for unconstrained optimization problems proposed in [1] to define, in a similar way, a double direction method for uniformly convex functions and for some strictly convex quadratic functions. The presented iterative method is an accelerated gradient descent method, constructed from the Newton method with line search by approximating the Hessian with an appropriate diagonal matrix. The aim of constructing the Accelerated Double Direction method, in short the ADD method, is to reduce the number of iterations for the chosen test functions from [21] with large numbers of parameters, and this goal is successfully achieved. Also, an important contribution of the ADD method is the implementation of the specific form of iteration originally introduced in [1].
It is proved that the ADD method is linearly convergent for uniformly convex functions and for a special subset of strictly convex quadratic functions.
In order to confirm the advantages of the accelerated properties of the ADD iteration, a nonaccelerated representation of the ADD scheme, the NADD method, is constructed. Comparative tests substantiate the enormous benefits of the ADD method. The derived accelerated method is also compared with the SM method, whose dominance over the AGD method and the GD method has already been proven in [13]. The ADD algorithm generates multiple times better numerical results with respect to the number of iterations compared with the SM method.
In Table 1, entries marked as exceeding the time limit mean that the execution time exceeds the 2-hour limit. Considering the presented results in Table 1 for all 25 given functions and all 250 tests, Table 2 actually illustrates the fact that the ADD method needs approximately 66 times fewer iterations than the SM method.

Table 1: Number of iterations needed by the SM, ADD, and NADD methods on 25 large-scale test functions.

Considering the results in Table 3, obtained for the ADD and NADD methods, we display in Table 4 the average values for the 90 test executions over the 9 functions that satisfy the defined time limit condition. This table confirms even 1502 times better results in favor of the ADD method compared with its nonaccelerated counterpart, the NADD method.

Table 3: Numerical results of 90 tests for each of the ADD and NADD methods, on 9 test functions, generated for a lower number of variables.