A Three-Term Gradient Descent Method with Subspace Techniques

We propose a three-term gradient descent method that can be well applied to the unconstrained optimization problems considered in this article. The search direction of the method is generated in a specific subspace; specifically, a quadratic approximation model is used in the process of generating the search direction. To reduce the amount of calculation and make the best use of existing information, the subspace is made up of the gradients at the current and previous iteration points and the previous search direction. Using this subspace-based optimization technique, the global convergence result is established under the Wolfe line search. The results of numerical experiments show that the new method is effective and robust.


Introduction
Gradient descent and conjugate gradient (CG) methods have profound significance for dealing with unconstrained optimization problems

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$

where $f:\mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function. They are widely used because of their low storage requirements and strong convergence properties. Starting from an initial point $x_0 \in \mathbb{R}^n$, at each iteration such a method produces an approximate solution sequence $\{x_k\}$ for (1) via

$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$

in which $x_k$ is the current iterate, $\alpha_k \ge 0$ is the step length, and $d_k$ is the search direction, which for typical CG methods has the form

$$d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad d_0 = -g_0, \tag{3}$$

where $g_k = \nabla f(x_k)$ is the gradient of $f(x)$ at $x_k$ and $\beta_k$ is the CG parameter. The step size $\alpha_k$ is usually determined by performing a specific line search, among which the Wolfe line search [1, 2] is one of the most commonly used:

$$f(x_k + \alpha_k d_k) \le f(x_k) + \rho \alpha_k g_k^T d_k \tag{4}$$

and

$$g(x_k + \alpha_k d_k)^T d_k \ge \sigma g_k^T d_k, \tag{5}$$

where $0 < \rho \le \sigma < 1$. Sometimes the strong Wolfe line search, given by (4) and

$$\left| g(x_k + \alpha_k d_k)^T d_k \right| \le -\sigma g_k^T d_k, \tag{6}$$

is used instead.

Recently, subspace techniques have attracted more and more researchers' attention. Various subspace techniques have been applied to construct different approaches for various optimization problems; for a detailed description, refer to [12–14]. In [15], Yuan and Stoer came up with the SMCG method by using a subspace technique in the conjugate gradient method. Specifically, a two-dimensional subspace $\Omega_{k+1} = \mathrm{Span}\{g_{k+1}, s_k\}$ is used in [15] to determine the search direction, that is,

$$d_{k+1} = \xi_k g_{k+1} + \varphi_k s_k, \tag{7}$$

where $\xi_k$ and $\varphi_k$ are parameters and $s_k = x_{k+1} - x_k$. See also [16, 17]. The authors of [20] presented a three-term conjugate gradient method that possesses the sufficient descent property without line search; they further extended it to two variant methods and established global convergence results for general functions under the standard Wolfe line search. In [21], Narushima et al. proposed a specific three-term CG method, drawing on the idea of the multistep quasi-Newton method. Deng and Wan [22] put forward a three-term CG algorithm in which the direction is formed by $g_{k+1}$, $y_k = g_{k+1} - g_k$, and $d_k$. For some other three-term conjugate gradient methods, refer to [23–27].

The outline of the article is as follows. In Section 2, we use the subspace minimization technique to derive the search direction on the subspace spanned by $\{-g_{k+1}, s_k, g_k\}$. We elaborate the proposed algorithm in Section 3. Section 4 provides the convergence analysis of the given algorithm under suitable conditions. In Section 5, numerical experiments comparing the method with others are presented.
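To make the iteration (2)–(3) and the Wolfe conditions (4)–(5) concrete, the following is a minimal NumPy sketch, not the authors' Fortran implementation: the names `wolfe_line_search` and `hs_cg` are our own, and the bisection-style bracketing is just one simple way to find a step satisfying (4) and (5).

```python
import numpy as np

def wolfe_line_search(f, grad, x, d, rho=1e-4, sigma=0.9,
                      alpha=1.0, max_iter=50):
    """Find a step satisfying the Wolfe conditions (4)-(5) by simple
    bracketing/bisection; returns the last trial step if none is found."""
    lo, hi = 0.0, np.inf
    fx = f(x)
    gTd = grad(x) @ d
    for _ in range(max_iter):
        if f(x + alpha * d) > fx + rho * alpha * gTd:   # (4) violated
            hi = alpha
        elif grad(x + alpha * d) @ d < sigma * gTd:     # (5) violated
            lo = alpha
        else:
            return alpha                                # both conditions hold
        alpha = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * alpha
    return alpha

def hs_cg(f, grad, x0, tol=1e-6, max_iter=1000):
    """Classical CG iteration (2)-(3) with the Hestenes-Stiefel parameter,
    shown for contrast with the three-term direction derived later."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        alpha = wolfe_line_search(f, grad, x, d)
        x = x + alpha * d
        g_new = grad(x)
        y = g_new - g
        d = -g_new + ((g_new @ y) / (d @ y)) * d  # beta_k^{HS} = g_{k+1}^T y_k / d_k^T y_k
        g = g_new
    return x

# Example: minimize a convex quadratic f(x) = 0.5 x^T A x - b^T x
A = np.diag([1.0, 10.0, 100.0]); b = np.ones(3)
xstar = hs_cg(lambda x: 0.5 * x @ A @ x - b @ x, lambda x: A @ x - b, np.zeros(3))
```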

Derivation of the Search Direction
Different from the abovementioned subspace $\Omega_{k+1} = \mathrm{Span}\{-g_{k+1}, s_k\}$, in this paper the following subspace is used to construct the search direction:

$$\Omega_{k+1} = \mathrm{Span}\{-g_{k+1}, s_k, g_k\}. \tag{8}$$

In order to simplify the calculation process, for each iterate $x_{k+1}$ the following quadratic model is used to approximate the function $f(x)$:

$$\min_{d \in \Omega_{k+1}} \phi_{k+1}(d) = g_{k+1}^T d + \frac{1}{2} d^T B_{k+1} d, \tag{9}$$

where $B_{k+1}$ can be viewed as an approximation of the Hessian matrix. Moreover, $B_{k+1}$ is supposed to be positive definite and to satisfy the quasi-Newton equation $B_{k+1} s_k = y_k$. The model $\phi_{k+1}(d)$ has two advantages: it is not only a good approximation of $f(x)$ [14] but is also convenient to minimize over the subspace $\Omega_{k+1}$. Since $g_{k+1}$ and $g_k$ are usually not collinear, we discuss the cases of dimensions 3 and 2.
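The key computational point is that minimizing the quadratic model (9) over a low-dimensional subspace reduces to a tiny linear system. Here is a generic NumPy sketch that assumes an explicit matrix $B$ (the paper itself never forms $B_{k+1}$; it uses the quasi-Newton equation and the estimates derived below instead); `subspace_minimizer` is an illustrative name of our own.

```python
import numpy as np

def subspace_minimizer(g, B, basis):
    """Minimize phi(d) = g^T d + 0.5 d^T B d over span(basis).
    Writing d = P z with P = [p_1 ... p_m], stationarity reduces to
    the m x m system (P^T B P) z = -P^T g."""
    P = np.column_stack(basis)
    z = np.linalg.solve(P.T @ B @ P, -P.T @ g)
    return P @ z

# Tiny usage example with a random positive definite B:
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
B = M @ M.T + 6 * np.eye(6)            # positive definite
g = rng.standard_normal(6)             # plays the role of g_{k+1}
s = rng.standard_normal(6)             # plays the role of s_k
g_old = rng.standard_normal(6)         # plays the role of g_k
d = subspace_minimizer(g, B, [-g, s, g_old])   # Omega_{k+1} = span{-g_{k+1}, s_k, g_k}
print(g @ d)                           # negative: d is a descent direction
```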
Case I. $\dim(\Omega_{k+1}) = 3$. In this case, the direction has the form

$$d_{k+1} = -g_{k+1} + a_k s_k + b_k g_k, \tag{10}$$

where $a_k$ and $b_k$ are parameters to be calculated. Substituting $d_{k+1}$ from (10) into the minimization problem (9), we get the following minimizing problem:

$$\min_{a_k, b_k} \phi_{k+1}\left(-g_{k+1} + a_k s_k + b_k g_k\right); \tag{11}$$

then $a_k$ and $b_k$ are the solutions of the following linear system, obtained by setting the partial derivatives of (11) with respect to $a_k$ and $b_k$ to zero and using $B_{k+1} s_k = y_k$:

$$\begin{pmatrix} s_k^T y_k & g_k^T y_k \\ g_k^T y_k & g_k^T B_{k+1} g_k \end{pmatrix} \begin{pmatrix} a_k \\ b_k \end{pmatrix} = \begin{pmatrix} g_{k+1}^T y_k - g_{k+1}^T s_k \\ g_{k+1}^T B_{k+1} g_k - g_{k+1}^T g_k \end{pmatrix}. \tag{12}$$

To solve system (12), we need to estimate two quantities, $\eta_k = g_k^T B_{k+1} g_k$ and $\omega_k = g_{k+1}^T B_{k+1} g_k$. Inspired by BBCG [28], we adopt

$$\eta_k = \frac{3}{2} \frac{\|y_k\|^2}{s_k^T y_k} \|g_k\|^2. \tag{13}$$

Next, we choose the BFGS [29–32] update initialized with the identity matrix to compute $\omega_k$:

$$\omega_k = g_{k+1}^T \left( I - \frac{s_k s_k^T}{s_k^T s_k} + \frac{y_k y_k^T}{s_k^T y_k} \right) g_k. \tag{14}$$

This choice not only retains useful information but also gives numerical results that perform better than the scaling matrix $(s_k^T y_k / s_k^T s_k) I$. Substituting (13) and (14) into the linear system (12), we get

$$\begin{pmatrix} s_k^T y_k & g_k^T y_k \\ g_k^T y_k & \eta_k \end{pmatrix} \begin{pmatrix} a_k \\ b_k \end{pmatrix} = \begin{pmatrix} g_{k+1}^T y_k - g_{k+1}^T s_k \\ \omega_k - g_{k+1}^T g_k \end{pmatrix}. \tag{15}$$

Combining (13), the determinant of the system (15) is calculated as

$$\Delta_k = s_k^T y_k \, \eta_k - \left(g_k^T y_k\right)^2 = \frac{3}{2} \|y_k\|^2 \|g_k\|^2 - \left(g_k^T y_k\right)^2 \ge \frac{1}{2} \|y_k\|^2 \|g_k\|^2 > 0, \tag{16}$$

where the inequality follows from the Cauchy–Schwarz inequality, so the system is always solvable. Then, $a_k$ and $b_k$ are obtained by Cramer's rule as

$$a_k = \frac{\eta_k \left(g_{k+1}^T y_k - g_{k+1}^T s_k\right) - g_k^T y_k \left(\omega_k - g_{k+1}^T g_k\right)}{\Delta_k}, \qquad b_k = \frac{s_k^T y_k \left(\omega_k - g_{k+1}^T g_k\right) - g_k^T y_k \left(g_{k+1}^T y_k - g_{k+1}^T s_k\right)}{\Delta_k}. \tag{17}$$

Case II. $\dim(\Omega_{k+1}) = 2$. By observing numerical experiments, we find that taking $\mathrm{Span}\{-g_{k+1}, s_k\}$ produces better numerical results, so in this case we choose $\Omega_{k+1} = \mathrm{Span}\{-g_{k+1}, s_k\}$ as the subspace. The direction is expressed as

$$d_{k+1} = -g_{k+1} + a_k s_k. \tag{18}$$

Substituting (18) into (9), we get

$$\min_{a_k} \phi_{k+1}\left(-g_{k+1} + a_k s_k\right). \tag{19}$$

Then, the solution of (19) can be expressed as

$$a_k = \frac{g_{k+1}^T y_k - g_{k+1}^T s_k}{s_k^T y_k}. \tag{20}$$

Therefore,

$$d_{k+1} = -g_{k+1} + \frac{g_{k+1}^T y_k - g_{k+1}^T s_k}{s_k^T y_k} s_k. \tag{21}$$

In the case of an exact line search, namely $g_{k+1}^T s_k = 0$, from (17) and (21) we get

$$a_k = \frac{g_{k+1}^T y_k}{s_k^T y_k}, \qquad b_k = 0, \tag{22}$$

which means the obtained method degrades to the HS method (Hestenes and Stiefel) [4].
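With the formulas (13)–(17) and (21) as reconstructed above, the direction can be computed without ever forming $B_{k+1}$. The following is a hedged NumPy sketch; the safeguard thresholds and the restart fallback are our own choices, not taken from the paper.

```python
import numpy as np

def tcgs_direction(g_new, g, s, eps=1e-12):
    """Three-term direction d_{k+1} = -g_{k+1} + a_k s_k + b_k g_k, with
    a_k, b_k from the 2x2 system (15) using the estimates (13)-(14) as
    reconstructed above; falls back to the two-dimensional choice (21)."""
    y = g_new - g
    sy = s @ y
    if abs(sy) < eps:                        # safeguard: restart with steepest descent
        return -g_new
    eta = 1.5 * (y @ y) / sy * (g @ g)       # eta_k = g_k^T B_{k+1} g_k, estimate (13)
    omega = (g_new @ g) - (g_new @ s) * (s @ g) / (s @ s) \
            + (g_new @ y) * (y @ g) / sy     # omega_k via BFGS-from-identity, (14)
    gy = g @ y
    det = sy * eta - gy ** 2                 # determinant (16)
    r1 = g_new @ y - g_new @ s               # right-hand side of (15)
    r2 = omega - g_new @ g
    if det > eps:                            # Case I: dim = 3, Cramer's rule (17)
        a = (r1 * eta - r2 * gy) / det
        b = (r2 * sy - r1 * gy) / det
        return -g_new + a * s + b * g
    a = r1 / sy                              # Case II: dim = 2, formula (21)
    return -g_new + a * s
```

Note that under an exact line search (`g_new @ s == 0`), the Case II branch returns the HS direction, as in (22).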

The Proposed TCGS Algorithm
This section mainly elaborates the three-term CG method with subspace techniques (TCGS), in which the acceleration scheme [33] is used (Algorithm 1):

Step 1: given $x_0 \in \mathbb{R}^n$ and $\epsilon > 0$, set $d_0 := -g_0$ and $k := 0$.
...
Step 3: determine the step size $\alpha_k$ by the Wolfe line search, so that conditions (4) and (5) hold.
...
Step 6: set $k := k + 1$, and go to Step 2.
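Only Steps 1, 3, and 6 of Algorithm 1 are reproduced above. The following Python skeleton, reusing `wolfe_line_search` and `tcgs_direction` from the earlier sketches, is our reading of the overall loop: the stopping test and the direction update are assumptions, and the acceleration scheme of [33] is omitted since its exact form is not given here.

```python
import numpy as np
# Requires wolfe_line_search and tcgs_direction from the sketches above.

def tcgs(f, grad, x0, eps=1e-6, max_iter=5000):
    """Skeleton of Algorithm 1 (TCGS); Steps 2, 4, and 5 are assumed,
    and the acceleration scheme [33] is not included."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                         # Step 1
    for _ in range(max_iter):
        if np.linalg.norm(g) <= eps:               # assumed stopping test (Step 2)
            return x
        alpha = wolfe_line_search(f, grad, x, d)   # Step 3: Wolfe conditions (4)-(5)
        s = alpha * d                              # s_k = x_{k+1} - x_k
        x = x + s
        g_new = grad(x)
        d = tcgs_direction(g_new, g, s)            # assumed direction update
        g = g_new                                  # Step 6: k := k + 1, go to Step 2
    return x
```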

Convergence Analysis
The main content of this section is to study the convergence properties of the TCGS algorithm. Some necessary assumptions on the objective function are given as follows.
Assumption 2. Suppose that $f:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and its gradient $g(x)$ is Lipschitz continuous with constant $L > 0$, i.e.,

$$\|g(x) - g(y)\| \le L \|x - y\|, \qquad \forall x, y \in \mathbb{R}^n. \tag{23}$$

Based on the above assumptions, it can be shown that there exists a constant $\Gamma > 0$ such that

$$\|g(x)\| \le \Gamma. \tag{24}$$

It is well known that, for an optimization algorithm, the properties of the search direction are very important for the convergence of the algorithm. Firstly, we study some properties of the search direction generated by the TCGS algorithm. The following lemmas show that the direction is a descent direction and that the Dai–Liao conjugacy condition is satisfied.

Lemma 1. Suppose that $B_{k+1}$ is positive definite. Then, $d_{k+1}$ generated by the TCGS algorithm is a descent direction.
Proof. According to (9), we get $\phi_{k+1}(0) = 0$. Since $B_{k+1}$ is positive definite and $d_{k+1}$ generated by the TCGS algorithm minimizes $\phi_{k+1}$ over $\Omega_{k+1}$, we have $\phi_{k+1}(d_{k+1}) \le \phi_{k+1}(0) = 0$, and hence

$$g_{k+1}^T d_{k+1} \le -\frac{1}{2} d_{k+1}^T B_{k+1} d_{k+1} < 0 \tag{25}$$

whenever $d_{k+1} \ne 0$, so $d_{k+1}$ is a descent direction.

Lemma 3. Suppose that $d_k$ is generated by the proposed TCGS algorithm and the step size $\alpha_k$ satisfies conditions (4) and (5); then

$$\alpha_k \ge \frac{(\sigma - 1)\, g_k^T d_k}{L \|d_k\|^2}. \tag{26}$$

Proof. Based on condition (5) and assumption (23), we get

$$(\sigma - 1)\, g_k^T d_k \le \left(g_{k+1} - g_k\right)^T d_k \le L \alpha_k \|d_k\|^2. \tag{27}$$

Since $g_k^T d_k < 0$ and $\sigma < 1$, dividing by $L \|d_k\|^2$ proves the claim.

Lemma 4. Assume that Assumptions 1 and 2 hold. Consider the algorithm TCGS, in which the Wolfe line search is used to compute the step size $\alpha_k$. Then, the Zoutendijk condition [34] holds:

$$\sum_{k \ge 0} \frac{\left(g_k^T d_k\right)^2}{\|d_k\|^2} < +\infty. \tag{28}$$
Proof. Combining equation (4) and Lemma 3, we have

$$f(x_k) - f(x_{k+1}) \ge -\rho \alpha_k g_k^T d_k \ge \frac{\rho (1 - \sigma)}{L} \frac{\left(g_k^T d_k\right)^2}{\|d_k\|^2}. \tag{29}$$

Condition (29) and Assumption 1 deduce (28) directly.

Lemma 5. Suppose that the objective function $f(x)$ satisfies Assumptions 1 and 2, and consider the algorithm TCGS in which the strong Wolfe line search, (4) and (6), is used to compute the step size $\alpha_k$. If

$$\sum_{k \ge 0} \frac{1}{\|d_k\|^2} = +\infty \tag{30}$$
holds, then

$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{31}$$

Theorem 1. Under Assumptions 1 and 2, consider the sequence $\{x_k\}$ generated by algorithm TCGS, where $\alpha_k$ satisfies conditions (4) and (6). If $f(x)$ is uniformly convex, i.e., there exists a constant $\mu > 0$ such that

$$\left(\nabla f(x) - \nabla f(y)\right)^T (x - y) \ge \mu \|x - y\|^2, \qquad \forall x, y \in \mathbb{R}^n, \tag{32}$$

then (31) holds.
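The text of the proof of Theorem 1 is not reproduced above. For the two-dimensional direction (21), a minimal sketch of the standard argument, under the reconstructed conditions (23), (24), and (30)–(32), runs as follows:

```latex
% Under uniform convexity (32), s_k^T y_k \ge \mu \|s_k\|^2, and by the
% Lipschitz condition (23), \|y_k\| \le L \|s_k\|. Hence, for direction (21),
\[
  \|d_{k+1}\|
    \;\le\; \|g_{k+1}\| + \frac{|g_{k+1}^T y_k| + |g_{k+1}^T s_k|}{s_k^T y_k}\,\|s_k\|
    \;\le\; \Gamma + \frac{\Gamma (L+1)\,\|s_k\|^2}{\mu \|s_k\|^2}
    \;=\; \Gamma\Bigl(1 + \frac{L+1}{\mu}\Bigr).
\]
% Thus \|d_k\| is uniformly bounded, condition (30) holds trivially,
% and Lemma 5 yields the convergence result (31).
```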

Numerical Results
This section aims to observe the performance of the TCGS algorithm through numerical experiments and to verify the effectiveness of the algorithm in dealing with unconstrained problems. For experimental purposes, we compare the numerical performance of TCGS with the PRP and MTHREECG [22] methods; MTHREECG has a structure similar to TCGS, and PRP is a classic and effective CG method.
In [22], Deng and Wan presented a similar three-term conjugate gradient method (MTHREECG), whose direction is formed from $g_{k+1}$, $y_k = g_{k+1} - g_k$, and $d_k$. We chose a total of 75 test functions, all taken from [35]. The dimension of each function ranges over $1000, 2000, \ldots, 10000$ in the numerical experiments. The code is written in Fortran and is available at https://camo.ici.ro/neculai/THREECG/threecg.for [36]; the default parameter values of the algorithm are consistent with [36].

The performance profile introduced by Dolan and Moré [37] is one of the most widely used tools for evaluating the performance of different methods, and we use it here to investigate the numerical performance of the proposed TCGS algorithm. The performance profile for the number of iterations can be seen in Figure 1. Looking at Figure 1, it can be found that the TCGS algorithm solves about 78% of the test problems with the fewest iterations, and with an increasing factor $\tau$, the TCGS method outperforms both the MTHREECG and PRP methods. Figure 2 shows the performance profile with respect to the number of function evaluations; the result is similar to that in Figure 1, with TCGS again outperforming both MTHREECG and PRP. From the numerical experiments, it can be seen that the proposed TCGS method is efficient and robust in dealing with a set of unconstrained test problems.
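For readers unfamiliar with the Dolan–Moré tool, the following is a minimal sketch of how such a profile is computed. The cost matrix `T` here is hypothetical illustrative data, not the paper's results; `performance_profile` is an illustrative name of our own.

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T, labels, tau_max=10.0):
    """Dolan-More performance profile: T[i, j] is the cost (e.g., number
    of iterations) of solver j on problem i, with np.inf marking failures.
    Plots rho_j(tau) = fraction of problems on which solver j's cost is
    within a factor tau of the best solver's cost."""
    ratios = T / T.min(axis=1, keepdims=True)   # performance ratios r_{i,j}
    taus = np.linspace(1.0, tau_max, 200)
    for j, label in enumerate(labels):
        rho = [(ratios[:, j] <= t).mean() for t in taus]
        plt.step(taus, rho, where="post", label=label)
    plt.xlabel("factor tau")
    plt.ylabel("fraction of problems solved")
    plt.legend()
    plt.show()

# Hypothetical example: 3 solvers on 5 problems (np.inf = failure).
T = np.array([[10.0, 12.0, 30.0],
              [25.0, 20.0, 22.0],
              [np.inf, 40.0, 35.0],
              [ 8.0,  9.0, 10.0],
              [50.0, 60.0, 45.0]])
performance_profile(T, ["TCGS", "MTHREECG", "PRP"])
```

The value $\rho_j(1)$ is the fraction of problems on which solver $j$ is the best, which is how the "about 78%" figure for TCGS in Figure 1 should be read.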

Conclusion
By using the idea of subspace minimization, we have come up with a new type of subspace gradient descent method in this paper. In our method, the subspace is spanned by $\{-g_{k+1}, s_k, g_k\}$, and the quadratic approximation of the objective function is minimized to obtain the search direction. Therefore, the direction has the form $d_{k+1} = -g_{k+1} + a_k s_k + b_k g_k$, where the values of $a_k$ and $b_k$ are computed by distinguishing two cases. In addition, we prove the descent property of the direction $d_k$ and show that the Dai–Liao conjugacy condition is satisfied. Under the Wolfe line search, the convergence of the proposed TCGS algorithm is established. The numerical results show that the performance of the TCGS algorithm on a set of unconstrained problems is competitive.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.