Hybrid Modification of Accelerated Double Direction Method

We present a hybridization of the accelerated gradient method with two vector directions. This hybridization is based on the usage of a chosen three-term hybrid model. The derived hybrid accelerated double direction model keeps the preferable properties of both included methods. Convergence analysis demonstrates at least linear convergence of the proposed iterative scheme on the set of uniformly convex and strictly convex quadratic functions. The results of numerical experiments confirm a better performance profile in favor of the derived hybrid accelerated double direction model when compared to its forerunners.


Introduction
The main goal herein is to derive an efficient optimization method for the minimization of an objective function $f: \mathbb{R}^n \to \mathbb{R}$. We assume that the function $f$ is uniformly convex and twice continuously differentiable. For the gradient and the Hessian of $f$ at the $k$-th iterative point we use the following notation:
$$g_k = \nabla f(x_k), \qquad G_k = \nabla^2 f(x_k). \tag{1}$$
The general form of the iterations for finding the extreme values of the objective function $f$ is given by
$$x_{k+1} = x_k + t_k d_k, \tag{2}$$
where $x_k$ is the current and $x_{k+1}$ the next iterative point, $t_k$ is the iterative step length, and $d_k$ is an iterative vector direction which leads to the solution of the problem. The step length $t_k$ and the direction $d_k$ are the most important elements of an optimization model (2), and they determine the efficiency of the relevant method. For that reason, the way these two elements are defined is of great importance for each minimization scheme.
One of the first algorithms for solving unconstrained optimization problems is the steepest descent gradient method proposed by Cauchy, in which the iteration is defined as
$$x_{k+1} = x_k - t_k g_k. \tag{3}$$
Here, the descent direction is simply the negative gradient vector, while the iterative step size is calculated by the exact line search formula
$$t_k = \arg\min_{t > 0} f(x_k - t g_k). \tag{4}$$
In the general Newton method, the vector direction is calculated as the negative product of the inverse Hessian and the gradient of the objective function, $d_k = -G_k^{-1} g_k$. Defining the vector direction in this way guarantees fast convergence, but computing the Hessian and its inverse can be difficult in practice. Therefore, many modified Newton, Newton-conjugate, and quasi-Newton schemes were developed in which the calculation of the Hessian and its inverse is avoided.
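For a strictly convex quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ (an assumption made only for this illustration), the exact line search (4) has the closed form $t_k = g_k^T g_k / (g_k^T A g_k)$. A minimal pure-Python sketch of the steepest descent iteration (3) with this step, using plain lists as vectors:

```python
def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def steepest_descent_exact(A, b, x0, tol=1e-10, max_iter=1000):
    """Cauchy's method (3) with the exact step (4) on f(x) = 0.5 x^T A x - b^T x."""
    x = list(x0)
    for k in range(max_iter):
        g = [ai - bi for ai, bi in zip(matvec(A, x), b)]  # g_k = A x_k - b
        gg = dot(g, g)
        if gg ** 0.5 < tol:
            return x, k
        t = gg / dot(g, matvec(A, g))  # exact minimizer of t -> f(x_k - t g_k)
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x, max_iter


A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 1.0]  # the minimizer is x* = (1, 1)
x, iters = steepest_descent_exact(A, b, [0.0, 0.0])
```

From $x_0 = (0, 0)$ the first exact step is $t_0 = 5/9$, giving $x_1 = (10/9, 5/9)$.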
In quasi-Newton methods, the Hessian of the goal function, or its inverse, is approximated by an adequately defined matrix. Methods of this type generally reduce the computation time, since they avoid the expensive calculation of the Hessian of the objective function. Nevertheless, quasi-Newton methods preserve good properties of the Newton method. For these reasons, in this paper we propose a method of quasi-Newton type in which the value of the iterative step size parameter $t_k$ is obtained by the inexact Backtracking line search procedure.
In the second section we give an overview of some accelerated gradient methods and hybrid iterations. In the third section we describe the derivation of the hybrid accelerated double direction method and state the algorithm. In the fourth section we give a convergence analysis of the proposed iteration. Numerical experiments and comparisons are presented in the fifth section of this paper.

Preliminaries: Accelerated Gradient Methods and Hybrid Iterations
The authors in [1] identified a class of accelerated gradient descent methods, defined by the general iterative scheme
$$x_{k+1} = x_k - \gamma_k^{-1} t_k g_k. \tag{6}$$
In the previous expression, $\gamma_k$ is an iterative acceleration parameter which improves the performance of the relevant method. A common way to determine this parameter is through the second-order Taylor expansion of the scheme (6). Acceleration parameters computed in this way are applied in the methods described in [1–5]. According to the iteration form (6), the accelerated gradient methods are of the quasi-Newton type: the Hessian is approximated by the scalar matrix $\gamma_k I$ (and its inverse by $\gamma_k^{-1} I$), where $I$ is the identity matrix of appropriate size and $\gamma_k = \gamma(x_k, x_{k-1})$ is the matching acceleration parameter. Expressions of this kind define the acceleration parameters of the SM method [1], the ADD method [4], the ADSS method [2], the TADSS method [5], and the HSM method [3], given in (7)–(11). An interesting concept of merging iterations through a hybrid expression was suggested in several research articles (see [6–8]). Some of these representations are given by the iterations (12)–(14), where $T: C \to C$ is a mapping defined on a nonempty convex subset $C$ of a normed space $E$, the iterates are the sequences generated by the proposed iterations, and the parameter sequences (such as $\{\alpha_k\}$) take values in $(0, 1)$.
In [9] it was proved that the Picard-Mann-Ishikawa type hybrid method improves on the hybrid models mentioned above. The authors of [3] used the advantages of the hybrid model (15) and derived a hybrid version of the accelerated gradient SM method from [1], termed the HSM method and defined by
$$x_{k+1} = x_k - (\alpha_k + 1)\,\gamma_k^{-1} t_k g_k. \tag{16}$$
Numerical tests from [3] confirmed that the hybrid model (16) improves on its forerunner, the SM iterative rule.
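The mechanism behind the factor $\alpha_k + 1$ that the hybrid rule (19) carries can be illustrated with a short sketch. Since the formula of (15) is not reproduced above, we assume here that it has the Picard-Mann form $y_k = (1-\alpha_k)x_k + \alpha_k T x_k$, $x_{k+1} = T y_k$. If $T$ is a gradient step whose gradient is frozen at $x_k$ (a simplification that is also an assumption of this sketch), composing the two steps gives exactly $x_{k+1} = x_k - (\alpha_k + 1)\,s\,g_k$:

```python
def picard_mann_step(T, x, alpha):
    """One step of the assumed Picard-Mann hybrid: y = (1 - alpha) x + alpha T(x), then T(y)."""
    tx = T(x)
    y = [(1.0 - alpha) * xi + alpha * ti for xi, ti in zip(x, tx)]
    return T(y)


def frozen_gradient_map(g, s):
    """T(z) = z - s * g with the gradient g held fixed at x_k (illustrative simplification)."""
    return lambda z: [zi - s * gi for zi, gi in zip(z, g)]


x_k = [1.0, 2.0]
g_k = [0.5, -1.0]  # hypothetical gradient value at x_k
s, alpha = 0.2, 0.3
x_next = picard_mann_step(frozen_gradient_map(g_k, s), x_k, alpha)
# The same point as one step of length (alpha + 1) * s along -g_k:
x_scaled = [xi - (alpha + 1.0) * s * gi for xi, gi in zip(x_k, g_k)]
```

Both computations give $(0.87, 2.26)$ for these illustrative inputs; with a true (non-frozen) gradient the two points differ, which is why the factor $\alpha_k + 1$ should be read as the outcome of this approximation.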

HADD Algorithm
We are motivated by the advantages confirmed in [3] when the scheme (15) was applied to the SM method. As a result, the hybrid SM model (called HSM) was defined and tested in [3]. Herein, we apply the same hybridization strategy to the accelerated double direction method (ADD method, for short), introduced in [4]. The derived scheme is based on the hybrid scheme (15), and therewith it keeps the accelerated features of the ADD iterations.
Algorithm 1: Procedure Second direction (calculation of the second direction vector $d_k$).
Require: Objective function $f(x)$, gradient $g_k$, and step size $t_k$.
1: Compute $d_k$, where $t_k$ is the step size and $d_k^*$ is the solution of the problem $\min \Phi_k(d)$.

In order to complete the presentation, we start from the ADD iteration:
$$x_{k+1} = x_k - t_k \gamma_k^{-1} g_k + t_k^2 d_k, \tag{17}$$
where $t_k$ is an appropriately defined step size, the first direction vector is given by $-(\gamma_k I)^{-1} g_k = -\gamma_k^{-1} g_k$, and the second one, $d_k$, is determined by the procedure Second direction. That procedure was introduced in [4] as a practical realization of the more general procedure considered in [10]. The procedure Second direction is restated in Algorithm 1.
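As a sketch of the ADD update (17), assuming that the step size $t_k$, the acceleration parameter $\gamma_k$, and the second direction $d_k$ (produced by Algorithm 1) are already available, one iteration combines the scaled gradient direction with the second direction as follows:

```python
def add_step(x, g, d, t, gamma):
    """ADD update (17): x_{k+1} = x_k - t * gamma^{-1} * g_k + t^2 * d_k."""
    return [xi - (t / gamma) * gi + (t * t) * di for xi, gi, di in zip(x, g, d)]


# Illustrative inputs only; d would normally come from the Second direction procedure.
x_next = add_step(x=[0.0, 0.0], g=[2.0, -4.0], d=[1.0, 1.0], t=0.5, gamma=2.0)
```

With these inputs the gradient part contributes $-(0.5/2)\,g_k = (-0.5, 1)$ and the second direction contributes $0.25\,d_k = (0.25, 0.25)$, so $x_{k+1} = (-0.25, 1.25)$.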
Remark 1. For further investigation within this topic, the second direction $d_k$ in the ADD iteration can be defined differently. For example, in [11] the authors proposed directional k-step Newton methods for solving a single nonlinear equation in n variables. They established the semi-local convergence analysis for these models based on two different approaches: the first one is based on recurrent relations, while the other, preferable one, uses recurrent functions. Using one (or both) of the approaches from [11] to determine the second direction $d_k$ in the ADD method, as well as in its hybrid version, is an interesting topic for further research.
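Applying the hybrid scheme (15) to the iterative rule (17), we get the hybrid iterative scheme (18). After substituting the third expression of the system (18) into the second one, the following iterative rule is obtained:
$$x_{k+1} = x_k + (\alpha_k + 1)\left(t_k^2 d_k - t_k \gamma_k^{-1} g_k\right). \tag{19}$$
To simplify further calculations, we use a constant value for the parameter $\alpha_k \in (0, 1)$ in (19), just as the authors did in [1,9]. Thus, in (19), instead of $\alpha_k + 1 \in (1, 2)$ we simply take a constant $\chi \in (1, 2)$. Now we can state the hybrid ADD method, or the HADD iterative scheme, as follows:
$$x_{k+1} = x_k + \chi\left(t_k^2 d_k - t_k \gamma_k^{-1} g_k\right). \tag{20}$$
Still, we need to determine the iterative value of the acceleration parameter $\gamma_{k+1}$. As mentioned previously, this parameter can be defined using the second-order Taylor expansion of the proposed iteration (20) at two successive iterative points:
$$f(x_{k+1}) = f(x_k) + \chi\, g_k^T\left(t_k^2 d_k - t_k\gamma_k^{-1} g_k\right) + \frac{\chi^2}{2}\left(t_k^2 d_k - t_k\gamma_k^{-1} g_k\right)^T \nabla^2 f(\xi)\left(t_k^2 d_k - t_k\gamma_k^{-1} g_k\right). \tag{21}$$
The parameter $\xi$ in the previous expansion fulfills the condition
$$\xi \in [x_k, x_{k+1}], \qquad \xi = x_k + \delta(x_{k+1} - x_k), \quad 0 \le \delta \le 1. \tag{22}$$
In the next relation we replace the Hessian $\nabla^2 f(\xi)$ in (21) by the scalar matrix $\gamma_{k+1} I$, which leads to
$$f(x_{k+1}) \approx f(x_k) + \chi\, g_k^T\left(t_k^2 d_k - t_k\gamma_k^{-1} g_k\right) + \frac{\chi^2}{2}\,\gamma_{k+1}\left\|t_k^2 d_k - t_k\gamma_k^{-1} g_k\right\|^2. \tag{23}$$
From (23), we derive the approximation factor of the HADD scheme:
$$\gamma_{k+1} = \frac{2\left[f(x_{k+1}) - f(x_k) - \chi\, g_k^T\left(t_k^2 d_k - t_k\gamma_k^{-1} g_k\right)\right]}{\chi^2\left\|t_k^2 d_k - t_k\gamma_k^{-1} g_k\right\|^2}. \tag{24}$$
With the aim of preserving the Second-Order Necessary Condition and the Second-Order Sufficient Condition, we assume positivity of the acceleration parameter: $\gamma_{k+1} > 0$.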
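The update (24) can be evaluated directly from the step $s_k = x_{k+1} - x_k$, because $\chi g_k^T(t_k^2 d_k - t_k\gamma_k^{-1}g_k) = g_k^T s_k$ and $\chi^2\|t_k^2 d_k - t_k\gamma_k^{-1}g_k\|^2 = s_k^T s_k$. The pure-Python sketch below computes the HADD step (20) and the new acceleration parameter; `chi` stands for the hybrid parameter, and the fallback to 1 for non-positive values is the safeguard that turns the first direction into $-g_k$. On an illustrative quadratic the result coincides with the Rayleigh quotient $s_k^T A s_k / s_k^T s_k$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hadd_step(x, g, d, t, gamma, chi):
    """HADD update (20): x_{k+1} = x_k + chi * (t^2 d_k - t gamma^{-1} g_k). Returns (x_{k+1}, s_k)."""
    s = [chi * (t * t * di - (t / gamma) * gi) for gi, di in zip(g, d)]
    return [xi + si for xi, si in zip(x, s)], s


def next_gamma(f_new, f_old, g, s):
    """Acceleration parameter (24) written in terms of s_k; returns 1 if the value is not positive."""
    gamma = 2.0 * (f_new - f_old - dot(g, s)) / dot(s, s)
    return gamma if gamma > 0 else 1.0


# Illustrative quadratic f(x) = 0.5 x^T A x with A = diag(3, 1), so lambda_1 = 1 and lambda_n = 3.
A = [[3.0, 0.0], [0.0, 1.0]]
f = lambda x: 0.5 * dot(x, [sum(a * xi for a, xi in zip(row, x)) for row in A])
x_k = [1.0, 1.0]
g_k = [3.0, 1.0]  # gradient A x_k
x_next, s = hadd_step(x_k, g_k, d=[0.1, -0.2], t=0.5, gamma=2.0, chi=1.1)
gamma_next = next_gamma(f(x_next), f(x_k), g_k, s)
rayleigh = dot(s, [3.0 * s[0], 1.0 * s[1]]) / dot(s, s)  # s^T A s / s^T s
```

For a quadratic, $f(x_k + s_k) - f(x_k) - g_k^T s_k = \frac{1}{2}s_k^T A s_k$ holds exactly, so `gamma_next` and `rayleigh` agree up to rounding.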
Algorithm 2: The Backtracking line search procedure.
If this condition is violated, we take $\gamma_{k+1} = 1$, so that the first vector direction becomes the negative gradient vector $-g_k$. In this special case ($\gamma_k = 1$), the next iterative point of the iteration (20) becomes
$$x_{k+1} = x_k + \chi\left(t_k^2 d_k - t_k g_k\right).$$
In order to present the main HADD algorithm, we need two auxiliary procedures. The first one is the previously displayed Algorithm 1, which computes the second vector direction $d_k$. The second one is the Backtracking line search procedure (Algorithm 2), which computes the iterative step size $t_k$.
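Since the body of Algorithm 2 is not reproduced here, the following is only a sketch of a standard Armijo-type backtracking adapted to the HADD path $s(t) = \chi(t^2 d_k - t\gamma_k^{-1}g_k)$. The acceptance test $f(x_k + s(t)) \le f(x_k) + \sigma g_k^T s(t)$ is an assumption of this sketch; the defaults 0.0001 and 0.8 are the Backtracking parameter values reported for the numerical tests.

```python
def backtracking(f, x, g, d, gamma, chi, sigma=1e-4, beta=0.8, t=1.0, max_reductions=100):
    """Armijo-type backtracking along s(t) = chi * (t^2 d - t gamma^{-1} g); returns t_k."""
    fx = f(x)
    for _ in range(max_reductions):
        s = [chi * (t * t * di - (t / gamma) * gi) for gi, di in zip(g, d)]
        x_trial = [xi + si for xi, si in zip(x, s)]
        if f(x_trial) <= fx + sigma * sum(gi * si for gi, si in zip(g, s)):
            return t  # sufficient decrease reached
        t *= beta  # shrink the step: t <- beta * t
    return t


# One-dimensional illustration: f(x) = 5 x^2 at x = 1, so g = 10, with no second direction (d = 0).
t_k = backtracking(lambda x: 5.0 * x[0] ** 2, [1.0], [10.0], [0.0], gamma=1.0, chi=1.1)
```

In this example the trial steps $1, 0.8, \ldots, 0.8^7$ all overshoot the minimizer, and $t_k = 0.8^8 \approx 0.168$ is the first accepted value.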
Algorithm 3 describes the main algorithm, termed the HADD algorithm.
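Algorithm 3 itself is not reproduced in this text, so the following is a hypothetical end-to-end sketch of an HADD-type loop assembled from (20), (24), the positivity safeguard $\gamma_{k+1} := 1$, and an Armijo-type backtracking (an assumed acceptance test). Because the formula of Algorithm 1 is not reproduced either, the second direction is supplied by a hypothetical callback `second_direction`, which returns the zero vector by default; with that default the loop reduces to a hybrid scaled-gradient method.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hadd_minimize(f, grad, x0, chi=1.1, sigma=1e-4, beta=0.8,
                  eps=1e-6, max_iter=1000, second_direction=None):
    """Hypothetical HADD-type loop: returns (approximate minimizer, iterations used)."""
    if second_direction is None:
        # Placeholder for Algorithm 1 (assumption): no second direction.
        second_direction = lambda x, g, t: [0.0] * len(x)
    x, gamma = list(x0), 1.0
    for k in range(max_iter):
        g = grad(x)
        if math.sqrt(dot(g, g)) < eps:
            return x, k
        fx, t = f(x), 1.0
        while True:  # backtracking along the HADD path (20)
            d = second_direction(x, g, t)
            s = [chi * (t * t * di - (t / gamma) * gi) for gi, di in zip(g, d)]
            x_new = [xi + si for xi, si in zip(x, s)]
            f_new = f(x_new)
            if f_new <= fx + sigma * dot(g, s) or t < 1e-12:
                break
            t *= beta
        # Acceleration parameter (24) expressed with s_k, plus the positivity safeguard.
        gamma = 2.0 * (f_new - fx - dot(g, s)) / dot(s, s)
        if gamma <= 0.0:
            gamma = 1.0
        x = x_new
    return x, max_iter


# Quadratic with A = diag(1.5, 1) and b = (1.5, 1); the minimizer is (1, 1).
f = lambda x: 0.5 * (1.5 * x[0] ** 2 + x[1] ** 2) - 1.5 * x[0] - x[1]
grad = lambda x: [1.5 * x[0] - 1.5, x[1] - 1.0]
x_min, iters = hadd_minimize(f, grad, [0.0, 0.0])
```

The test matrix has $\lambda_n/\lambda_1 = 1.5$, which satisfies $\lambda_n < 2\lambda_1/\chi$ for $\chi = 1.1$, the eigenvalue condition used in the convergence analysis for quadratics.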

Convergence of the HADD Method
The convergence properties of the established HADD iterative method are considered on the set of uniformly convex and strictly convex quadratic functions. For uniformly convex functions, the statements are the same as those in [1,4]. For that reason, we only restate the following lemma, in which the decrease of the objective function at two successive points is estimated for the HADD scheme. The subsequent theorem then confirms the linear convergence of our hybrid accelerated model.

Lemma 2. Suppose the function $f$
Therewith, the sequence $\{x_k\}$ converges to the optimal solution at least linearly.
We now show that the iteration (20) is convergent on the set of strictly convex quadratic functions
$$f(x) = \frac{1}{2}x^T A x - b^T x. \tag{29}$$
In (29), it is assumed that $A$ is a real $n \times n$ symmetric positive definite matrix and that the vector $b \in \mathbb{R}^n$ is given. The smallest and the largest eigenvalues of the matrix $A$ are denoted by $\lambda_1$ and $\lambda_n$, respectively.

Lemma 4.
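Let $f$ be the strictly convex quadratic function defined by (29), where $A \in \mathbb{R}^{n \times n}$ is a symmetric positive definite matrix. Let $\lambda_1$ and $\lambda_n$ be the smallest and the largest eigenvalues of $A$. Then, the inequalities (30) are valid for the hybrid accelerated gradient model (20).

Proof. For brevity, denote $w_k = t_k d_k - \gamma_k^{-1} g_k$ and $s_k = x_{k+1} - x_k = \chi t_k w_k$, so that the iteration (20) reads $x_{k+1} = x_k + s_k$. Let us calculate the difference of the goal function (29) at two successive iterative points:
$$f(x_{k+1}) - f(x_k) = \frac{1}{2}x_{k+1}^T A x_{k+1} - b^T x_{k+1} - \frac{1}{2}x_k^T A x_k + b^T x_k. \tag{31}$$
Including the iteration (20), we continue the computations:
$$f(x_{k+1}) - f(x_k) = \frac{1}{2}(x_k + s_k)^T A (x_k + s_k) - \frac{1}{2}x_k^T A x_k - b^T s_k. \tag{32}$$
Applying the equality $g_k = A x_k - b$ and the symmetry of $A$, we get
$$f(x_{k+1}) - f(x_k) = g_k^T s_k + \frac{1}{2}s_k^T A s_k. \tag{33}$$
The right-hand side of the previous expression can be further transformed as follows:
$$f(x_{k+1}) - f(x_k) = \chi t_k\, g_k^T w_k + \frac{\chi^2 t_k^2}{2}\, w_k^T A w_k. \tag{34}$$
Replacing $f(x_{k+1}) - f(x_k)$ in (24) by the right-hand side of (34), and noting that $t_k^2 d_k - t_k\gamma_k^{-1}g_k = t_k w_k$, leads to
$$\gamma_{k+1} = \frac{2\left[\chi t_k\, g_k^T w_k + \frac{\chi^2 t_k^2}{2}\, w_k^T A w_k - \chi t_k\, g_k^T w_k\right]}{\chi^2 t_k^2\, w_k^T w_k}. \tag{35}$$
After cancelling the terms $\pm\chi t_k\, g_k^T w_k$, we obtain
$$\gamma_{k+1} = \frac{w_k^T A w_k}{w_k^T w_k}. \tag{36}$$
The previous expression confirms that $\gamma_{k+1}$ is the Rayleigh quotient of the real symmetric matrix $A$ at the vector $w_k = t_k d_k - \gamma_k^{-1} g_k$, which leads us to the conclusion
$$\lambda_1 \le \gamma_{k+1} \le \lambda_n. \tag{37}$$
The left-hand side of the inequalities (30) follows from the fact that $0 \le t_{k+1} \le 1$. To prove the right-hand side of (30), we use the estimation (38) from [3, eq. (3.8)], which implies (39). We can approximate the Lipschitz constant $L$ by the largest eigenvalue $\lambda_n$ and use the facts that $\chi \le 2$, $0 < \sigma < 0.5$, and $\beta \in (\sigma, 1)$. Then (39) is restated as (40), which completes the proof.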
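The proof rests on the fact that $\gamma_{k+1}$ is a Rayleigh quotient of $A$, and a Rayleigh quotient of a symmetric matrix always lies between its smallest and largest eigenvalues. A small numerical check of this fact, for an illustrative symmetric $2 \times 2$ matrix whose eigenvalues are available in closed form (the matrix and the random vectors are assumptions of the example):

```python
import math
import random


def rayleigh_quotient(A, w):
    """w^T A w / w^T w for a list-of-rows symmetric matrix A."""
    Aw = [sum(a * wi for a, wi in zip(row, w)) for row in A]
    return sum(wi * awi for wi, awi in zip(w, Aw)) / sum(wi * wi for wi in w)


def eig2_sym(a, b, c):
    """Eigenvalues (ascending) of the symmetric matrix [[a, b], [b, c]]."""
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return mid - rad, mid + rad


A = [[2.0, 1.0], [1.0, 2.0]]
lam1, lamn = eig2_sym(2.0, 1.0, 2.0)  # (1.0, 3.0)
rng = random.Random(0)
quotients = [rayleigh_quotient(A, [rng.uniform(-1, 1), rng.uniform(-1, 1)]) for _ in range(1000)]
```

The bounds are attained at the eigenvectors: the quotient equals $\lambda_1 = 1$ at $(1, -1)$ and $\lambda_n = 3$ at $(1, 1)$.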

Theorem 5. Let the iterations (19) be applied to the strictly convex quadratic function $f$ given by (29). Suppose that the condition
$$\lambda_n < \frac{2\lambda_1}{\chi} \tag{42}$$
holds for the largest and the smallest eigenvalues of the symmetric positive definite matrix $A$. Then, the estimations (43)–(45) are true.

Proof. Let us consider the orthonormal system of eigenvectors $\{v_1, v_2, \ldots, v_n\}$ of the matrix $A$. Thereon, we construct the sequence $\{x_k\}$ by applying Algorithm 3 to the strictly convex quadratic function $f$ defined by (29). Then, for some $m \in \mathbb{N}$ and for some constants, the representation (46) holds. Applying (20), we draw a further conclusion, and, having in mind the representation (46), one can verify the corresponding relation between two successive iterations. The proof then distinguishes two cases, each of which implies a set of inequalities from which the required estimations can be verified. The representation (46) and the fact that $\{v_1, v_2, \ldots, v_n\}$ is an orthonormal system of eigenvectors lead to the final conclusion. Now, knowing that under the condition $\lambda_n < 2\lambda_1/\chi$ the contraction parameter appearing in these estimations lies in $(0, 1)$, we confirm that the final statement is true.

The assumption (42) used in the previous theorem is required in order to prove that the HADD process is convergent for strictly convex quadratics. Since the hybrid parameter satisfies $\chi \in (1, 2)$, condition (42) restricts the ratio $\lambda_n/\lambda_1$ to the interval $[1, 2/\chi) \subset [1, 2)$, which seems to suggest that Theorem 5 is applicable to very few cases. However, this is not entirely so, since only one particular value $\chi \in (1, 2)$ is chosen for the practical computations. In this regard, the authors of [3] numerically confirmed that the optimal value of the hybrid parameter is close to the left end of the interval $(1, 2)$, i.e., very close to 1. Therefore, we choose $\chi = 1.1$ for the numerical tests displayed in the next section. For such values of $\chi$, condition (42) becomes very close to the condition $\lambda_n < 2\lambda_1$, used in [12], under which the Q-linear convergence rate of the preconditioned BB method was established.
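A simplified per-component view shows where a bound of the form (42) comes from; it is not the proof of Theorem 5. Assuming $d_k = 0$ and $t_k = 1$ (both assumptions of this sketch), the gradient of the quadratic (29) satisfies $g_{k+1} = (I - \chi\gamma_k^{-1}A)\,g_k$, so its component along $v_i$ is multiplied by $1 - \chi\lambda_i/\gamma_k$. With $\gamma_k \in [\lambda_1, \lambda_n]$, the worst case over all components and admissible $\gamma_k$ is below 1 exactly when $\chi\lambda_n/\lambda_1 < 2$, i.e. $\lambda_n < 2\lambda_1/\chi$:

```python
def worst_case_factor(lam1, lamn, chi):
    """max |1 - chi * lam / gamma| over lam, gamma in [lam1, lamn] (sketch: d_k = 0, t_k = 1).

    The ratio u = chi * lam / gamma ranges over [chi*lam1/lamn, chi*lamn/lam1], and
    |1 - u| is convex in u, so the maximum is attained at an endpoint.
    """
    return max(abs(1.0 - chi * lam1 / lamn), abs(1.0 - chi * lamn / lam1))


def condition_42(lam1, lamn, chi):
    """The eigenvalue condition lambda_n < 2 lambda_1 / chi."""
    return lamn < 2.0 * lam1 / chi


chi = 1.1  # value used in the numerical tests
for lam1, lamn in [(1.0, 1.5), (1.0, 1.8), (1.0, 2.0), (2.0, 5.0)]:
    print(lam1, lamn, condition_42(lam1, lamn, chi), round(worst_case_factor(lam1, lamn, chi), 4))
```

For $\chi = 1.1$ the admissible ratio is $\lambda_n/\lambda_1 < 2/\chi \approx 1.818$, so the pair $(1, 1.8)$ still satisfies the condition while $(1, 2)$ does not.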

Computational Tests and Comparisons
The performance of the C++ implementation of the derived HADD model is investigated on a set of 630 unconstrained optimization test problems taken from [13]. The tests are conducted on an Intel Celeron workstation. The values of the Backtracking parameters are set as follows: $\sigma = 0.0001$ and $\beta = 0.8$. We compare the hybrid accelerated HADD method with its forerunner, the ADD scheme, as well as with the hybrid accelerated HSM method. The performance measure in all tests is the number of function evaluations. The dominance of the ADD method over the other compared models with respect to the number of iterations was confirmed in [4]. However, that research gives no information about the behavior of the ADD method with respect to the number of function evaluations. With respect to this measure, the HSM scheme improves on the accelerated SM method as well as on Nesterov's line search algorithm; see [3]. For these reasons, our experimental goal is to demonstrate numerically the better performance of the HADD method, in terms of the number of function evaluations, compared with the ADD and HSM methods.
In Table 1, we display the number of problems, out of 630, for which each algorithm achieved the minimum number of function evaluations. The same table also displays the number of problems for which all three algorithms achieved an equal number of function evaluations. According to these results, the HADD scheme outperforms the other two compared models with respect to this measure.
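Counts of the kind reported in Table 1 can be obtained mechanically from per-problem evaluation counts. The sketch below uses one possible convention (an assumption): a method is credited for a problem whenever it attains the minimum there, and problems on which all methods tie are counted separately instead of being credited. The data are toy values, not the results of the tests.

```python
def count_minima(evals):
    """evals: {problem: {method: function evaluations}} -> (wins per method, all-tie count)."""
    methods = sorted({m for counts in evals.values() for m in counts})
    wins = {m: 0 for m in methods}
    all_ties = 0
    for counts in evals.values():
        best = min(counts.values())
        winners = [m for m, c in counts.items() if c == best]
        if len(winners) == len(counts):
            all_ties += 1  # every method needed the same number of evaluations
            continue
        for m in winners:
            wins[m] += 1
    return wins, all_ties


toy = {  # hypothetical evaluation counts for three problems
    "p1": {"HADD": 10, "ADD": 12, "HSM": 15},
    "p2": {"HADD": 8, "ADD": 8, "HSM": 8},
    "p3": {"HADD": 5, "ADD": 5, "HSM": 9},
}
wins, ties = count_minima(toy)
```

Under this convention a partial tie, such as problem `p3`, credits every method that shares the minimum.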
For a clearer visualization of the performance of the HADD algorithm versus the ADD and HSM algorithms, Figure 1 displays the Dolan-Moré performance profile with respect to the number of function evaluations. As the figure shows, the HADD scheme is both more robust and more efficient than the other two methods.
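A Dolan-Moré performance profile compares solvers through the ratios $r_{p,s} = c_{p,s}/\min_{s'} c_{p,s'}$ of each solver's cost on problem $p$ to the best cost on that problem, and plots $\rho_s(\tau) = |\{p : r_{p,s} \le \tau\}|/n_p$. A minimal sketch, with failures encoded as `math.inf` (a convention assumed here) and toy costs rather than the data behind Figure 1:

```python
import math


def performance_profile(costs, taus):
    """costs: list of {method: cost} dicts, one per problem; returns {method: [rho(tau) per tau]}.

    A failure is encoded as math.inf; its ratio stays inf and never counts as solved.
    """
    methods = list(costs[0])
    ratios = {m: [] for m in methods}
    for c in costs:
        best = min(c.values())
        for m in methods:
            ratios[m].append(c[m] / best)
    n = len(costs)
    return {m: [sum(r <= tau for r in ratios[m]) / n for tau in taus] for m in methods}


toy_costs = [
    {"A": 10, "B": 20},
    {"A": 30, "B": 15},
    {"A": 5, "B": math.inf},  # method B failed on this problem
]
profile = performance_profile(toy_costs, taus=[1.0, 2.0, 4.0])
```

The value $\rho_s(1)$ is the fraction of problems on which method $s$ is best (efficiency), while $\rho_s(\tau)$ for large $\tau$ is the fraction it solves at all (robustness).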
The obtained numerical results confirm that the applied hybridization process is a good way to improve some important characteristics of the chosen accelerated methods. The preferable outcomes of the HADD scheme, regarding the analyzed characteristic, come from the properly chosen hybrid value $\chi$, together with the derived acceleration parameter $\gamma_k$. The good convergence properties of the HADD process suggest applying the proposed hybridization to other gradient and accelerated gradient models.

Conclusion
We present a hybrid accelerated double direction gradient method for solving unconstrained optimization problems. The HADD method is derived by combining the good properties of the hybrid representation introduced in [3] with the form of the double direction optimization model with an acceleration parameter presented in [4]. The convergence of the defined optimization model is established on the set of uniformly convex and strictly convex quadratic functions. The HADD scheme preserves the preferable features of both forerunner methods. Moreover, according to the conducted numerical experiments, it outperforms the ADD and HSM methods with respect to the required number of function evaluations. We evaluated the Dolan-Moré performance profiles of the compared methods and showed that the HADD iteration is the most efficient of the three algorithms.

Figure 1: Performance profile for the HADD, ADD, and HSM methods with respect to the number of function evaluations.
is twice continuously differentiable and uniformly convex on $\mathbb{R}^n$. With that, let the sequence $\{x_k\}$ be generated by Algorithm 3.
Theorem 3. For the twice continuously differentiable and uniformly convex function $f$ on $\mathbb{R}^n$ and the sequence $\{x_k\}$ generated by Algorithm 3, the following holds: $\lim_{k \to \infty} \|g_k\| = 0$.

Table 1: Comparison between the HADD, ADD, and HSM methods regarding the minimal number of function evaluations.