An Improvement of Stochastic Gradient Descent Approach for Mean-Variance Portfolio Optimization Problem

In this paper, a current variant of the stochastic gradient descent (SGD) approach, namely, the adaptive moment estimation (Adam) approach, is improved by adding the standard error to the updating rule. The aim is to speed up the convergence rate of the Adam algorithm. This improvement is termed the Adam with standard error (AdamSE) algorithm. On the other hand, the mean-variance portfolio optimization model is formulated from the historical data of the rates of return of the S&P 500 stock, the 10-year Treasury bond, and the money market. The application of the SGD, Adam, adaptive moment estimation with maximum (AdaMax), Nesterov-accelerated adaptive moment estimation (Nadam), AMSGrad, and AdamSE algorithms to solve the mean-variance portfolio optimization problem is further investigated. During the calculation procedure, the iterative solution converges to the optimal portfolio solution. It is noticed that the AdamSE algorithm requires the smallest number of iterations. The results show that the rate of convergence of the Adam algorithm is significantly enhanced by the AdamSE algorithm. In conclusion, the efficiency of the improved Adam algorithm using the standard error is demonstrated. Furthermore, the applicability of the SGD, Adam, AdaMax, Nadam, AMSGrad, and AdamSE algorithms to the mean-variance portfolio optimization problem is validated.


Introduction
Recently, the application of the stochastic gradient descent (SGD) approach to machine learning and deep learning has been actively explored. Moreover, owing to the ability of the SGD approach to handle stochastic optimization problems [1] and to solve optimization problems under uncertainty [2,3], the SGD approach and its variants have developed rapidly. By virtue of this, the mean-variance portfolio optimization problem [4], which deals with risk and return, has attracted the attention of the investment community. An optimal decision on the portfolio selection is needed, where a scientific approach is employed to maximize the return with minimum risk [5]. However, this optimal decision is difficult to make in advance.
In this paper, the main disadvantage of the SGD approach, namely its slow convergence [6,7], is addressed. To overcome this weakness, the standard error from sampling theory is added to the updating rule of the adaptive moment estimation (Adam) algorithm [8], which is a current variant of the SGD approach. On this basis, the convergence rate of the Adam algorithm is improved significantly.
This improved version is then known as the Adam with standard error (AdamSE) algorithm. On the other hand, the application of SGD methods, including the Adam, adaptive moment estimation with maximum (AdaMax), Nesterov-accelerated adaptive moment estimation (Nadam), AMSGrad, and AdamSE approaches, to solving the mean-variance portfolio optimization problem is further studied. For this purpose, the historical data of the rates of return of the S&P 500 stock, the 10-year Treasury bond, and the money market are employed. Then, the mean-variance portfolio optimization model is formulated. During the calculation procedure, the iterative solution converges to the optimal portfolio solution, and the performance of these algorithms is presented. The rest of the paper is organized as follows. In Section 2, the mean-variance portfolio optimization problem is described, where the expected return and the covariance matrix are expressed. In Section 3, the enhancement of the convergence rate of the Adam algorithm by using the standard error from sampling theory is discussed. The calculation procedures for the SGD, Adam, AdaMax, Nadam, AMSGrad, and AdamSE algorithms are summarized. In Section 4, a mean-variance portfolio optimization model is formulated using the historical data of the S&P 500 stock, 10-year Treasury bond, and money market. Then, the model is solved by using the algorithms discussed, and the results are presented. Finally, some concluding remarks are made.

Problem Description
Consider a general mean-variance portfolio optimization problem for n risky assets, given by

min_w w^T Σ w subject to μ^T w = R, 1^T w = 1, (1)

where w = (w_1, ..., w_n)^T is the vector of portfolio weights of the assets and Σ represents the covariance matrix of the assets. Here, w^T Σ w gives the variance of the portfolio, μ = (μ_1, ..., μ_n)^T is the vector of mean portfolio returns, and 1 = (1, ..., 1)^T is the vector whose elements are all 1. Note that the targeted expected return R depends on the risk tolerance of investors. Furthermore, by using the geometric mean, the mean portfolio return is computed from

μ_i = (∏_{t=1}^{T} (1 + r_{i,t}))^{1/T} − 1, (2)

where r_{i,t} is the rate of return of asset i at time t = 1, ..., T and μ_i is the mean rate of return of asset i, whereas the covariance matrix of the assets is defined as

σ_{ij} = (1/T) ∑_{t=1}^{T} (r_{i,t} − μ_i)(r_{j,t} − μ_j), (3)

for i, j = 1, ..., n. The aim of the mean-variance portfolio optimization problem defined in (1) is to minimize the risk of an investment, which is represented by the variance, while satisfying the targeted return of the portfolio [9]. Since risk is always related to randomness and uncertainty [10], a stochastic optimization approach will be used to solve the optimization problem defined in (1).
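As a sketch of how the inputs above are built from data, the following computes the geometric-mean return and a covariance entry for a toy return series. The figures are hypothetical placeholders, not the paper's S&P 500, bond, or money-market data:

```python
# Hypothetical annual rates of return for two assets over T = 4 years.
returns = {
    "stock": [0.10, -0.05, 0.08, 0.12],
    "bond":  [0.04,  0.03, 0.05, 0.02],
}

def geometric_mean_return(r):
    # mu_i = (prod_{t=1}^{T} (1 + r_{i,t}))**(1/T) - 1
    prod = 1.0
    for x in r:
        prod *= 1.0 + x
    return prod ** (1.0 / len(r)) - 1.0

def covariance(r_i, r_j):
    # sigma_ij = (1/T) * sum_t (r_{i,t} - mu_i) * (r_{j,t} - mu_j)
    mu_i = geometric_mean_return(r_i)
    mu_j = geometric_mean_return(r_j)
    T = len(r_i)
    return sum((a - mu_i) * (b - mu_j) for a, b in zip(r_i, r_j)) / T

mu = {name: geometric_mean_return(r) for name, r in returns.items()}
sigma_sb = covariance(returns["stock"], returns["bond"])
```

The geometric mean is used here, as in the text, because it compounds multi-period returns; the arithmetic mean would overstate the realized growth rate.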

Stochastic Optimization Method
Now, let us define the Lagrange function, where λ = (λ_1, λ_2, λ_3)^T is the vector of Lagrange multipliers. Then, the following first-order necessary conditions are derived.

3.1. Analytical Optimal Solution. From (5), the optimal weight vector of the portfolio is calculated. Referring to (6), the targeted expected return is provided. Then, substituting (9) into (10) gives the targeted expected return in terms of the Lagrange multipliers, as in (11). Rewriting (7) as (12) and substituting (9) into (12) yields (13). From (11) and (13), after some algebraic manipulation, the Lagrange multipliers are computed from (14) and (15), with λ_3 = 0. Therefore, from the discussion above, the analytical solution of the mean-variance portfolio optimization problem defined in (1), which is given by (9), (14), and (15), is obtained. However, this analytical solution is assumed to be unavailable due to the uncertainty and randomness of the variables.
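Since λ_3 vanishes at the optimum, the remaining first-order conditions form a linear system: the stationarity condition together with the return and budget constraints. A minimal sketch of solving that system directly, under the assumption of two active equality constraints (μ^T w = R, 1^T w = 1) and with hypothetical values for μ, Σ, and R (not the paper's data):

```python
# Hypothetical inputs: 3 assets with illustrative mean returns and covariances.
mu = [0.10, 0.05, 0.02]              # mean returns (illustrative)
Sigma = [[0.0400, 0.0060, 0.0010],
         [0.0060, 0.0100, 0.0008],
         [0.0010, 0.0008, 0.0025]]   # covariance matrix (positive definite)
R = 0.06                             # targeted expected return
n = 3

def solve(M, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    size = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, size):
            f = A[r][col] / A[col][col]
            for c in range(col, size + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (A[r][size] - sum(A[r][c] * x[c] for c in range(r + 1, size))) / A[r][r]
    return x

# Stationarity: 2*Sigma*w - lam1*mu - lam2*1 = 0; constraints: mu^T w = R, 1^T w = 1.
M = [[2.0 * Sigma[i][j] for j in range(n)] + [-mu[i], -1.0] for i in range(n)]
M.append(mu + [0.0, 0.0])
M.append([1.0] * n + [0.0, 0.0])
b = [0.0] * n + [R, 1.0]

w = solve(M, b)[:n]   # optimal weights (may be negative, i.e., short positions)
```

The solved weights satisfy both constraints exactly, which is the benchmark the iterative algorithms below are expected to approach.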

Stochastic Gradient Descent Algorithm.
Referring to the mean-variance portfolio optimization problem defined in (1), let us introduce an augmented objective function f_a(w), with w = (w_1, ..., w_n)^T, as in (16). Owing to the presence of uncertainty, the augmented objective function f_a(w) defined in (16) can be rewritten as the expected objective function

f_a(w) = E[f_{a,i}(w)], (17)

where f_{a,i}(w), i = 1, ..., n, is an element of the augmented objective function f_a(w) that is uniformly sampled at random and E[·] is the expectation operator. By virtue of this, the sampled gradient ∇f_{a,i}(w) is an unbiased estimator, that is,

E[∇f_{a,i}(w)] = ∇f_a(w). (18)

Notice that the first-order necessary condition for (16) is equivalent to the first-order necessary condition (5); that is,

∇f_a(w) = 0. (19)

For convenience, define the stochastic gradient

g_j(w) = ∇f_{a,j}(w), (20)

which can be calculated from (5). The updating rule of the SGD approach is given by

w^(k+1) = w^(k) − α g_j(w^(k)), (21)

with the step size α, which is also known as the learning rate, where k is the iteration number and j is the random index of the sampled gradient. Hence, the calculation procedure of the SGD algorithm is summarized in Algorithm 1.
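The SGD updating rule can be sketched on a one-dimensional toy objective, the average of sampled quadratics f_i(w) = (w − c_i)^2, whose minimizer is the mean of the c_i. The targets c_i and the diminishing step-size schedule are illustrative choices, not the paper's:

```python
import random

random.seed(0)  # fixed seed for reproducibility

c = [1.0, 2.0, 3.0, 4.0]      # hypothetical per-sample targets; minimizer is mean(c) = 2.5

def grad_i(w, i):
    # sampled gradient of f_i(w) = (w - c[i])**2; an unbiased estimator
    # of the gradient of the averaged objective
    return 2.0 * (w - c[i])

w = 0.0
for k in range(2000):
    j = random.randrange(len(c))      # random index j, sampled uniformly
    alpha = 1.0 / (2.0 * (k + 1))     # diminishing step size (learning rate)
    w -= alpha * grad_i(w, j)         # SGD updating rule
```

With this particular schedule the iterate reduces to a running average of the sampled targets, so it settles near 2.5; a constant step size would instead leave the iterate fluctuating in a band around the minimizer.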

Adaptive Moment Estimation Algorithm.
In the Adam approach [11], the exponentially decaying averages of past gradients m_k and past squared gradients v_k are considered as follows:

m_k = β_1 m_{k−1} + (1 − β_1) g_k, (22)

v_k = β_2 v_{k−1} + (1 − β_2) g_k^2, (23)

where g_k is the gradient and β_1 and β_2 are the decay rates, which are close to 1. Notice that m_k and v_k are estimates of the first moment (the mean) and the second moment (the uncentered variance) of the gradients, respectively. Their biases are counteracted by using the bias-corrected first- and second-moment estimates, given by

m̂_k = m_k / (1 − β_1^k), (24)

v̂_k = v_k / (1 − β_2^k). (25)

Thus, the Adam updating rule is presented as follows:

w^(k+1) = w^(k) − α m̂_k / (√v̂_k + δ), (26)

where δ is the smoothing term used to avoid division by zero. The calculation procedure of the Adam algorithm is summarized in Algorithm 2.
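A minimal sketch of the Adam recursion on a toy quadratic objective, using the paper's default decay rates; the objective, step size, and iteration count are illustrative assumptions:

```python
import math

def grad(w):
    return 2.0 * (w - 3.0)   # gradient of toy objective f(w) = (w - 3)**2

beta1, beta2 = 0.9, 0.999    # decay rates (paper defaults)
alpha, delta = 0.01, 1e-8    # step size and smoothing term
w, m, v = 0.0, 0.0, 0.0
for k in range(1, 5001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g           # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g       # second-moment estimate
    m_hat = m / (1 - beta1 ** k)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** k)              # bias-corrected second moment
    w -= alpha * m_hat / (math.sqrt(v_hat) + delta)   # Adam updating rule
```

Because the ratio m̂_k/√v̂_k is roughly scale-invariant, the step magnitude stays near α regardless of the gradient scale, which is what distinguishes Adam from plain SGD.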

Adaptive Moment Estimation with Maximum.
AdaMax, which is the adaptive moment estimation with maximum [11], is a variant of the Adam optimizer that uses the infinity norm (ℓ∞), whereas the Adam optimizer itself uses the ℓ2-norm. When the Adam algorithm is generalized to the ℓ∞-norm, the second-moment update in AdaMax becomes the maximum of the past gradients and the current gradient, which is shown as

u_k = max(β_2 u_{k−1}, |g_k|). (27)

Then, the updating rule of AdaMax is

w^(k+1) = w^(k) − (α / (1 − β_1^k)) m_k / u_k. (28)

The calculation procedure of the AdaMax algorithm is summarized in Algorithm 3.
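A sketch of the AdaMax variant on the same toy quadratic; the objective and parameter values are illustrative, and a tiny floor on the denominator is added defensively (an assumption, since the ℓ∞ accumulator needs no smoothing term in general):

```python
def grad(w):
    return 2.0 * (w - 3.0)   # gradient of toy objective f(w) = (w - 3)**2

beta1, beta2, alpha = 0.9, 0.999, 0.01
w, m, u = 0.0, 0.0, 0.0
for k in range(1, 5001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g   # first-moment estimate, as in Adam
    u = max(beta2 * u, abs(g))        # infinity-norm accumulator replaces v_k
    w -= (alpha / (1 - beta1 ** k)) * m / (u + 1e-12)  # AdaMax updating rule
```

Note that u_k is a decayed running maximum, so it reacts instantly to a large gradient but forgets it only geometrically, which makes the effective step size less jittery than Adam's √v̂_k.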

Nesterov-Accelerated Adaptive Moment Estimation.
Nadam, which is the Nesterov-accelerated adaptive moment estimation, combines Adam with the Nesterov accelerated gradient (NAG) [12]. The Nadam algorithm is employed for noisy gradients or gradients with high curvature. The NAG algorithm performs a more accurate step in the gradient direction by updating the parameters with the momentum step before computing the gradient. The learning process is accelerated by summing up the exponentially decaying moving averages of the previous and current gradients, which results in slightly faster training than the Adam algorithm. Its updating rule is given by

w^(k+1) = w^(k) − (α / (√v̂_k + δ)) (β_1 m̂_k + ((1 − β_1) / (1 − β_1^k)) g_k). (29)

The calculation procedure of the Nadam algorithm is summarized in Algorithm 4.
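A sketch of the Nadam recursion on the same toy quadratic; objective and parameter values are illustrative assumptions:

```python
import math

def grad(w):
    return 2.0 * (w - 3.0)   # gradient of toy objective f(w) = (w - 3)**2

beta1, beta2 = 0.9, 0.999
alpha, delta = 0.01, 1e-8
w, m, v = 0.0, 0.0, 0.0
for k in range(1, 5001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** k)
    v_hat = v / (1 - beta2 ** k)
    # Nesterov-style lookahead: blend the momentum estimate with the current gradient
    nesterov = beta1 * m_hat + ((1 - beta1) / (1 - beta1 ** k)) * g
    w -= alpha * nesterov / (math.sqrt(v_hat) + delta)   # Nadam updating rule
```

Compared with Adam, the only change is in the numerator: part of the bias-corrected momentum is replaced by the freshest gradient, which is the lookahead that gives Nadam its slight speed advantage.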

AMSGrad.
In settings where the Adam algorithm converges to a suboptimal solution, it has been observed that some minibatches provide large and informative gradients; however, because these minibatches occur only rarely, exponential averaging diminishes their influence, which leads to poor convergence. To fix this behaviour, a new algorithm, known as the AMSGrad algorithm [13], uses the maximum of past squared gradients rather than the exponential average to update the parameters:

v̂_k = max(v̂_{k−1}, v_k). (30)

Therefore, the updating rule of AMSGrad is

w^(k+1) = w^(k) − α m_k / (√v̂_k + δ). (31)

The calculation procedure of the AMSGrad algorithm is summarized in Algorithm 5.
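A sketch of AMSGrad on the same toy quadratic; objective and parameter values are illustrative assumptions:

```python
import math

def grad(w):
    return 2.0 * (w - 3.0)   # gradient of toy objective f(w) = (w - 3)**2

beta1, beta2 = 0.9, 0.999
alpha, delta = 0.01, 1e-8
w, m, v, v_hat = 0.0, 0.0, 0.0, 0.0
for k in range(1, 5001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    v_hat = max(v_hat, v)     # non-decreasing maximum of past squared-gradient averages
    w -= alpha * m / (math.sqrt(v_hat) + delta)   # AMSGrad updating rule
```

Since v̂_k never decreases, a rare large gradient permanently caps the step size, which restores the convergence guarantee that vanilla Adam lacks, at the cost of steps that can only shrink over time.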

Improved Adaptive Moment Estimation Algorithm.
Consider the standard error (SE) from sampling theory [14],

SE = σ / √n, (32)

where σ is the population standard deviation and n is the sample size. Thus, to improve the updating rule of the Adam algorithm, assume that the standard error of the bias-corrected first-moment estimate m̂_k is defined by

SE(m̂_k) = √v̂_k / √k, (33)

where √v̂_k represents the sample standard deviation of the gradient g_k and k is the iteration number. From (26), the updating rule of the Adam algorithm is modified to be

w^(k+1) = w^(k) − α (m̂_k + SE(m̂_k)) / (√v̂_k + δ). (34)
Data: given the initial value w(0) = w0, the number of samples n, the step size α, and the tolerance ε; set k = 0. Step 3: set the random index j. Step 4: compute the decaying averages of past and past squared gradients from (22) and (23).

ALGORITHM 2: Adam algorithm.

Journal of Mathematics
Owing to this modification, the improved Adam algorithm is known as the Adam with standard error (AdamSE) algorithm [11,14]. The calculation procedure of the AdamSE algorithm is summarized in Algorithm 6.
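The extracted text does not fully preserve the modified rule, so the sketch below assumes the standard error √v̂_k/√k is added to the bias-corrected first moment inside the Adam step; this placement, the toy objective, and the parameter values are all assumptions for illustration:

```python
import math

def grad(w):
    return 2.0 * (w - 3.0)   # gradient of toy objective f(w) = (w - 3)**2

beta1, beta2 = 0.9, 0.999
alpha, delta = 0.01, 1e-8
w, m, v = 0.0, 0.0, 0.0
for k in range(1, 5001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** k)
    v_hat = v / (1 - beta2 ** k)
    se = math.sqrt(v_hat) / math.sqrt(k)   # standard error of m_hat (assumed form)
    # AdamSE updating rule (assumed placement of the SE term)
    w -= alpha * (m_hat + se) / (math.sqrt(v_hat) + delta)
```

The SE term decays like 1/√k, so it perturbs the early iterations most and leaves the late, near-converged iterations essentially following the ordinary Adam rule.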

Illustrative Example
Consider a portfolio optimization problem [15], where the portfolio selection is based on three securities, namely, the S&P 500 stock, the 10-year Treasury bond, and the money market (MM). The corresponding historical data of the annual rates of return for these securities, dated from 1961 to 2003, are shown in Table 1.
As a result, the optimal portfolio in percentage is shown in Table 2, where the final iterative solutions obtained from each algorithm are reported.

Remark: the default values of the decay rates are β_1 = 0.9 and β_2 = 0.999, the smoothing term is δ = 10^−8, the tolerance is ε = 10^−6, and the learning rate is α = 0.001.

The performance of the respective methods in solving the mean-variance portfolio optimization problem is shown in Table 3, where the numbers of iterations are presented.
Apparently, the AdamSE algorithm requires the smallest number of iterations, an 86 percent reduction from the number of iterations of the Adam algorithm, while the Adam algorithm brings the iterative solution to convergence 1 percent faster than the SGD algorithm. At the same time, some variants of the Adam algorithm, namely, AdaMax, Nadam, and AMSGrad, require more iterations than the Adam algorithm. The convergence behaviour of each algorithm, represented by the norm of the stochastic gradient, is shown in Figures 1-6, respectively. For better visualization, the iterative results for the first 300 iterations of the SGD and AdaMax algorithms are presented in Figures 1 and 3, respectively. Therefore, equipping the Adam algorithm with the standard error significantly enhances its rate of convergence, and the efficiency of the AdamSE algorithm is clearly demonstrated.
In addition, the objective function has a minimum risk of 5.0182 × 10^−4, and the changes in the variance produced by each algorithm during the iteration procedure are shown in Figures 7-12, respectively. From these figures, it is noticed that the variance increases dramatically until it reaches a peak point for each of the algorithms, and then the variance gradually decreases toward the minimum variance of 5.0182 × 10^−4. This behaviour indicates that the iterates first move to satisfy the constraints at the cost of a temporarily larger variance; once the constraints are satisfied, the optimal solution is successfully determined, giving the optimal weights of the portfolio selection.

Concluding Remarks
The enhancement of the convergence rate of the Adam algorithm by using the standard error was discussed in this paper. This improved version of the algorithm is known as the AdamSE algorithm. In addition, the application of the SGD, Adam, AdaMax, Nadam, AMSGrad, and AdamSE algorithms to solving the mean-variance portfolio optimization problem was studied. The results obtained showed that the AdamSE algorithm is an efficient approach, especially for solving the mean-variance portfolio optimization problem. In conclusion, the practicality of the SGD algorithm and its current variants, namely, the Adam, AdaMax, Nadam, AMSGrad, and AdamSE algorithms, is validated for the mean-variance portfolio optimization problem.

Data Availability

The data used are shown in Table 1.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.