An Improved Adam Optimization Algorithm Combining Adaptive Coefficients and Composite Gradients Based on Randomized Block Coordinate Descent

An improved Adam optimization algorithm combining adaptive coefficients and composite gradients based on randomized block coordinate descent is proposed to address issues of the Adam algorithm such as slow convergence, the tendency to miss the global optimal solution, and ineffectiveness in processing high-dimensional vectors. First, an adaptive coefficient is used to adjust the gradient deviation and correct the search direction. Then, a predicted gradient is introduced and combined with the current gradient and the first-order momentum to form a composite gradient that improves the global optimization ability. Finally, the randomized block coordinate method is used to determine the gradient update mode, which reduces the computational overhead. Simulation experiments on two standard classification datasets show that the convergence speed and accuracy of the proposed algorithm are higher than those of six gradient descent methods, and its CPU and memory utilization are significantly reduced. In addition, BP neural networks optimized by the six algorithms, respectively, are used to predict reservoir porosity from logging data. The results show that the proposed method has lower system overhead, higher accuracy, and stronger stability, and the absolute error of more than 86% of the data is within 0.1%, which further verifies its effectiveness.


Introduction
The introduction of this study is described in the following sections.

Background.
With the rapid development of artificial intelligence, population optimization algorithms [1], the memetic algorithm [2], and first-order optimization methods such as stochastic gradient descent [3] and gradient descent with momentum (SGDM) [4] have been widely used in machine learning and play an important role in solving optimization problems of complex systems. As a first-order stochastic gradient optimizer with adaptive step sizes, the Adam algorithm has gained much attention in the field of numerical optimization for its outstanding computational efficiency and has been widely used in deep learning with impressive results [5]. However, the first-order momentum of the Adam algorithm is an exponentially weighted average of the historical gradients, and the update of the search direction is influenced by gradient deviations, which leads to slow convergence of the model. Meanwhile, the second-order momentum is accumulated over a fixed time window, and the data do not vary monotonically with the time window; this generates oscillations in the learning rate in the later stages of training and can cause the model to fail to converge. Therefore, seeking methods to remedy the convergence defects of the Adam algorithm has become a focus of researchers. Recent studies focus on further improving the performance of the optimizer or combining it with other optimization methods [6]. By assigning a "long-term memory" to the historical gradients, the AMSGrad algorithm [7] was proposed, which solves the convergence problem theoretically. Based on momentum-accelerated stochastic gradient descent, Ma and Yarats [8] proposed a quasi-hyperbolic weight decay acceleration algorithm and adjusted its hyperparameters. Luo et al.
[9] compared the generalization and convergence capabilities of stochastic gradient descent (SGD) and adaptive methods and provided new variants of Adam and AMSGrad, named AdaBound and AMSBound, respectively, which use dynamic bounds on the learning rate to achieve a gradual and smooth transition from adaptive methods to SGD. Yin et al. [10] proposed the C-Adam algorithm, which combines the current gradient, a predicted gradient, and the historical momentum to attain iteratively more accurate search directions by updating the true gradient. Subsequently, a hybrid Adam-based optimization method, HyAdamC [11], was proposed, which carefully tunes the search intensity with three velocity control functions (initial, short-term, and long-term), significantly enhancing prediction accuracy. Later, further methods were proposed, such as AdaGrad [12], Yogi [13], Fromage [14], diffGrad [15], RBC-Adam [16], and TAdam [17].
Although the above optimization algorithms achieve competent results when used to train neural networks, they still pose the following three problems. First, the algorithms need to determine an optimal search speed at each training step, which may introduce overfitting or affect the training and testing accuracy [18]. Second, the momentum used in Adam is prone to inaccurate search directions because of gradient deviations caused by outliers [17]. Third, such algorithms have difficulty identifying the current state of the optimization landscape in the solution space spanned by the weights and therefore fail to find approximately optimal weights.

Contribution.
To deal with the above problems, an improved Adam optimization algorithm combining adaptive coefficients and composite gradients based on randomized block coordinate descent, written ACGB-Adam, is proposed. The contributions and innovations of this article are summarized as follows. (1) To deal with the slow convergence of the Adam algorithm, adaptive coefficients are used to compute the degree of difference between the first-order momentum and the current gradient. This reduces the influence of deviated gradients caused by outlier points, increases the influence of the momentum at the previous moment, avoids gradient deviation, and enhances the search speed and convergence accuracy. (2) To address the shortcoming that the Adam algorithm tends to miss the global optimal solution, a predicted gradient is introduced and combined with the current gradient and the first-order momentum to form a composite gradient, so that the direction of iterative optimization is determined jointly. This yields a more accurate search direction and improves the global search capability, thereby speeding up the search for the global optimal solution. (3) To address the high computational overhead of the Adam algorithm when dealing with high-dimensional vectors, randomized block coordinate descent (RBC) is introduced to determine the gradient update mode according to the random variables of a diagonal matrix. This ensures that only one block of the gradient needs to be computed in each iteration instead of the entire gradient, achieving a dynamic balance between convergence accuracy and system overhead. (4) Combining the above ideas, the ACGB-Adam optimization algorithm is proposed.
The optimization performance of the proposed algorithm is verified on the standard classification datasets Mnist and CIFAR-10; it is further applied to BP neural networks and compared with optimization methods based on SGD, AdaGrad, Adam, A-Adam, C-Adam, and RBC-Adam. The experimental results show that the proposed algorithm performs better, and its convergence speed, stability, and prediction accuracy are higher than those of the comparison methods.

Adam Algorithm
The Adam algorithm is explained in the following sections.

Basic Principles.
The Adam algorithm [19] differs significantly from traditional SGD algorithms. The SGD algorithm maintains a single learning rate to update all the weights during training; the AdaGrad algorithm keeps a learning rate for each parameter to improve performance on sparse gradients; the RMSProp algorithm adaptively maintains a learning rate for each parameter based on a moving average of the recent magnitudes of the weight gradients, thereby improving performance on nonstationary problems. The Adam algorithm sets independent adaptive learning rates for different parameters by computing first-order and second-order moment estimates of the gradient, gaining the advantages of both the AdaGrad and RMSProp algorithms.
In particular, the Adam algorithm uses not only first-order momentum to maintain the direction of the historical gradient but also second-order momentum to maintain the adaptive state of the learning rate. Besides, it directly considers a sequential setting where samples arrive one by one rather than assuming that a large number of training samples are available in advance. For these reasons, the Adam algorithm performs well with high computational efficiency and low memory requirements [20]. In recent years, research on the Adam algorithm has flourished, and several variants such as NAdam [21], GAdam [22], AMSGrad [23], Adafactor [24], and Adadelta [25] have been proposed.
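As a concrete reference point, one step of the standard Adam update can be sketched in NumPy. This is the textbook update of Kingma and Ba with its usual defaults (including the stability constant ε), not the modified update proposed later in this article; the quadratic test function is purely illustrative.

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; returns the new (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * g            # biased first-order moment
    v = beta2 * v + (1 - beta2) * g * g        # biased second-order moment
    m_hat = m / (1 - beta1 ** t)               # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)               # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimize f(x) = x^2; the adaptive step drives theta toward 0
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 101):
    g = 2.0 * theta                            # gradient of x^2
    theta, m, v = adam_step(theta, g, m, v, t, alpha=0.1)
```

Because the step is normalized by the second-moment estimate, the effective step size stays near α regardless of the raw gradient scale, which is the behavior the improvements below modify.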

Algorithm Flow.
To describe the Adam algorithm and its improvement precisely, the relevant parameters involved in this article are described in Table 1. The pseudocode of the Adam algorithm is shown in Algorithm 1.

Computational Intelligence and Neuroscience

Existing Problems.
In deep learning, the Adam algorithm is widely used to solve parameter optimization problems because of its efficient computation, small number of tuning parameters, and high compatibility. However, the algorithm has certain shortcomings. First, model convergence is very slow. The first-order momentum in the Adam algorithm is the exponentially weighted average of the historical gradients, which controls the update of the optimization direction. It is easily affected by gradient deviations, leading to poor search ability and slow convergence of the model. Second, the global optimal solution is easily missed. Neural network models often contain a large number of parameters. In a space of extremely high dimension, the nonconvex objective function tends to rise and fall repeatedly, and it is easy to produce the "plateau phenomenon" that stalls training and thereby misses the global optimal solution.

ACGB-Adam Algorithm
To solve the problems of the Adam algorithm, the ACGB-Adam algorithm is proposed, which improves on Adam in the following three aspects. (1) To address the slow convergence speed of the Adam algorithm, an adaptive coefficient calculation method is adopted to improve the search direction and reduce the influence of gradient deviations, caused by outliers, on the first-order momentum search direction. (2) In view of the issue that the Adam algorithm easily misses the global optimal solution, a composite gradient is formed out of the current gradient and the predicted gradient, which enhances the correctness of the search direction, improves the global optimization ability, and further boosts the search efficiency of the algorithm. (3) To reduce the computational cost of the algorithm, the randomized block coordinate descent method is introduced to select variables by block when calculating the gradient update. This reduces memory and CPU utilization as much as possible while preserving search performance.

Adjusting Gradient Deviation with Adaptive Coefficients.
In the Adam algorithm, the gradient deviation caused by outliers has a significant impact on the calculation of the first-order momentum. From the exponentially weighted average (EWA), it can be seen that the first-order momentum maintains the movement direction of its historical gradients, so the next search direction is determined by the previous first-order momentum and the current gradient. Consequently, if the current gradient is far from the global optimal direction, the direction of the first-order momentum will move further away from the approximate optimum, leading to a serious decline in search ability. Figure 1 demonstrates the impact of a deviated gradient on the first-order momentum. As highlighted in Figure 1(a), the first-order momentum at the current time, m_t, is calculated as the EWA of the previous momentum m_{t−1} and the current gradient g_t, using the two constant coefficients β_1 and (1 − β_1). The direction of m_t therefore shifts toward g_t if g_t deviates from the desired direction because of outliers, and the search direction at the next time step will also move further away from the approximate global optimum P*, as demonstrated in Figure 1(b). To improve the slow convergence caused by the deviation of the first-order momentum search direction, it is necessary to determine whether the current gradient is a deviated gradient caused by outliers and to reduce its impact as much as possible. The ACGB-Adam algorithm therefore computes the difference between m_{t−1} and g_t. If this difference is very large, g_t is more likely than the first-order momentum to distort the search direction at the next moment. In this case, the influence of the momentum at the previous time, m_{t−1}, is increased according to the degree of difference, via an adaptive coefficient, to reduce the influence of g_t on m_t as much as possible.
The outlier gradient adjustment method based on the adaptive coefficient is expressed as

m_t = β_{1,t}·m_{t−1} + (1 − β_{1,t})·g_t, (1)

where β_{1,t} is the adaptive coefficient, which is proportional to the difference between m_{t−1} and g_t. In this article, the method in [17] is used to determine the difference ratio, as given in equation (2):

β_{1,t} = Q_{t−1}/(Q_{t−1} + q_t). (2)

In equation (2), q_t denotes the similarity between g_t and m_{t−1}, calculated by equation (3), where d represents the vector dimension and ν is a degrees-of-freedom constant:

q_t = (ν + d)/(ν + Σ_{i=1}^{d} (g_{t,i} − m_{t−1,i})²/v_{t−1,i}). (3)

Q_{t−1} is a weighted cumulative sum of q_1, q_2, ..., q_{t−1}, calculated by equation (4):

Q_t = ((2β_1 − 1)/β_1)·(Q_{t−1} + q_t). (4)
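A minimal sketch of this adaptive coefficient follows, in the style of the difference ratio from [17]. The degrees-of-freedom constant ν, the stability constant eps, and the exact decay applied to Q are illustrative assumptions, not necessarily the authors' exact choices.

```python
import numpy as np

def adaptive_beta1(m_prev, g, v_prev, Q_prev, beta1=0.9, nu=1.0, eps=1e-8):
    """Compute the adaptive coefficient beta_{1,t}: the more g deviates
    from m_prev, the smaller q_t and the larger the weight on m_prev."""
    d = g.size
    dev = np.sum((g - m_prev) ** 2 / (v_prev + eps))   # scaled deviation of g
    q_t = (nu + d) / (nu + dev)                        # similarity (eq. 3)
    beta1_t = Q_prev / (Q_prev + q_t)                  # adaptive coefficient (eq. 2)
    Q_next = (2 * beta1 - 1) / beta1 * (Q_prev + q_t)  # cumulative sum (eq. 4, assumed decay)
    return beta1_t, Q_next

# an outlier gradient receives a larger beta1_t, i.e., more trust in m_prev
m_prev, v_prev = np.ones(3), np.ones(3)
b_inlier, _ = adaptive_beta1(m_prev, np.ones(3), v_prev, Q_prev=8.0)
b_outlier, _ = adaptive_beta1(m_prev, 10 * np.ones(3), v_prev, Q_prev=8.0)
```

Here b_outlier > b_inlier: when the incoming gradient looks like an outlier relative to the momentum, the update leans on m_{t−1} and the deviated gradient is suppressed.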

Combining the Predicted Gradient to Form a Composite Gradient.
In the Adam algorithm, the first-order momentum m_t is determined by the current gradient g_t and the historical first-order momentum m_{t−1}. This makes the search direction excessively dependent on the historical gradient, so the global optimum is easily missed. The ACGB-Adam algorithm therefore introduces the predicted gradient u_t: it provisionally updates the parameter to be optimized by the gradient descent method, evaluates the gradient there so that a real gradient update is used, and then merges it with the current gradient and the historical first-order momentum to form a composite gradient. This makes it possible to obtain a more accurate search direction in the next iteration. Figure 2 illustrates the schematic diagram of the first-order momentum search direction adjustment mechanism integrating adaptive coefficients and the composite gradient.
For the first-order momentum before improvement, shown in Figure 2(a), a constant coefficient β_1 is used. Therefore, if g_t moves away from the optimal position P* in a direction deviated from the desired one, m_t will continue its movement in the direction of g_t. In Figure 2(b), m_{1,t} is the search direction corrected by the adaptive coefficient β_{1,t}. Compared with the direction of m_t in Figure 2(a), m_{1,t} approaches the global optimal position in a more accurate direction. Therefore, to adjust for the gradient effect of outliers, an adaptive coefficient is introduced. Because of this, the influence of outliers on the first-order momentum at the previous moment is kept as small as possible while calculating the current first-order momentum. Thus, a more promising search direction can be determined effectively, and the search for the global optimal solution can be accelerated. Second, on top of the adaptive coefficient used to correct the search direction, the predicted gradient u_t is introduced, and the search direction m_t is formed together with the current gradient g_t and the historical first-order momentum m_{t−1}.

Table 1: Description of the relevant parameters.
α: learning rate
β_1, β_2: exponential decay rates of the first- and second-order moment estimates, respectively
T, t: the maximum number of iterations and the current time step, respectively
β_1^t, β_2^t: products of the exponential decay rates of the first- and second-order moment estimates up to time step t, respectively
m_t, v_t: the first- and second-order moment vectors at time step t
g_t: current gradient at time step t
θ: the parameter that needs to be optimized
f_t: the sequence of smooth convex loss functions
P*: global optimal position

Algorithm 1: Adam.
Input: α, β_1, β_2, f_t. Output: θ_t.
(1) Initialize m_0 ← 0, v_0 ← 0, θ_0
(2) For t = 1 to T
(3) Get a stochastic gradient objective at time step t: g_t ← ∇_θ f_t(θ_{t−1})
(4) Update biased first-order moment estimation: m_t ← β_1·m_{t−1} + (1 − β_1)·g_t
(5) Update biased second-order moment estimation: v_t ← β_2·v_{t−1} + (1 − β_2)·g_t²
(6) Get bias-corrected first-order moment estimation: m̂_t ← m_t/(1 − β_1^t)
(7) Get bias-corrected second-order moment estimation: v̂_t ← v_t/(1 − β_2^t)
(8) Update the parameters: θ_t ← θ_{t−1} − α·m̂_t/(√v̂_t + ε)
(9) End For
(10) Return θ_t
It can be observed that, by introducing the predicted gradient on top of the adjustment of m_{1,t}, the resulting search direction can move even closer to P*, avoiding missing the global optimal solution. Therefore, the convergence accuracy of the algorithm is improved.
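The predicted-gradient mechanism can be sketched as follows. The look-ahead point (one provisional gradient-descent step) matches the description above, while the equal weighting of g_t and u_t inside the momentum update is an illustrative assumption.

```python
import numpy as np

def composite_gradient(grad_f, theta, m_prev, g_t, alpha=0.001, beta1=0.9):
    """Form a composite search direction from the current gradient g_t,
    the history m_prev, and a predicted gradient u_t evaluated at a
    provisional gradient-descent step (mixing weights are illustrative)."""
    theta_pred = theta - alpha * g_t     # provisional parameter update
    u_t = grad_f(theta_pred)             # predicted gradient at the look-ahead point
    # composite first-order momentum: history + (current + predicted) blend
    m_t = beta1 * m_prev + (1 - beta1) * 0.5 * (g_t + u_t)
    return m_t, u_t

# on f(x) = x^2 the predicted gradient already points from nearer the optimum
grad_f = lambda x: 2.0 * x
theta, m_prev = np.array([1.0]), np.zeros(1)
g_t = grad_f(theta)
m_t, u_t = composite_gradient(grad_f, theta, m_prev, g_t)
```

Because u_t is measured one step ahead, the blended direction anticipates where the iterate is going rather than only where it has been, which is the intuition behind Figure 2(b).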

Gradient Update Mode Based on Randomized Block Coordinate Descent.
As a simple and effective method, SGD is often used to learn linear classifiers. However, when dealing with high-dimensional vector data, the full gradient descent mode in SGD is difficult to parallelize. Therefore, this article introduces the randomized block coordinate method to optimize the Adam algorithm, which can handle high-dimensional vectors and avoid calculating the complete gradient over all dimensions in each iteration, thus saving computing cost and reducing the system overhead while preserving the convergence speed and optimization accuracy.

RBC Algorithm.
RBC is a stochastic optimization algorithm [26]. In each iteration, a coordinate (block) is randomly selected, and its variables are updated along the coordinate gradient direction. If f is a convex smooth function whose block gradients ∇_i f (i ∈ {1, 2, ..., N}) are Lipschitz continuous with constants L_i, one iteration of the RBC algorithm selects a block i uniformly at random and performs

x_{t+1}^{(i)} = x_t^{(i)} − (1/L_i)·∇_i f(x_t),

where x_t denotes the parameter vector to be updated and only the selected block changes. The RBC algorithm is shown in Algorithm 2.
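A minimal sketch of RBC on a blockwise-separable quadratic follows; the block size, Lipschitz constants, and test function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbc_step(x, grad_block, lipschitz, n_blocks, block_size):
    """One randomized block coordinate descent step: pick a block i
    uniformly at random and update only its coordinates."""
    i = rng.integers(n_blocks)
    sl = slice(i * block_size, (i + 1) * block_size)
    x = x.copy()
    x[sl] -= (1.0 / lipschitz[i]) * grad_block(x, sl)  # step 1/L_i on block i
    return x

# minimize f(x) = ||x||^2 / 2, whose block gradient is simply x[sl]
x = np.ones(8)
L = np.ones(4)  # each block gradient is 1-Lipschitz
for _ in range(200):
    x = rbc_step(x, lambda x, sl: x[sl], L, n_blocks=4, block_size=2)
```

Each iteration touches only two of the eight coordinates, yet after enough random draws every block has been visited and the whole vector converges.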
The RBC algorithm has been widely used to address large-scale optimization problems because of its low computation and update cost [16] and its good optimization effect. For instance, Hu and Kwok [27] studied the learning of scalable nonparametric low-rank kernels, and Zhao et al. [28] proposed an accelerated small-batch random block optimization algorithm. Moreover, several machine learning algorithms can be optimized with the help of RBC. For instance, Singh et al. [29] improved the gradient projection algorithm by using RBC, and Xie et al. [30] combined the RBC algorithm with mean-variance optimization.

Gradient Calculation Based on RBC.
In this article, a new gradient calculation method is proposed based on the RBC method. Let D_t (t = 0, 1, 2, ..., N) be an n-dimensional diagonal matrix in the t-th iteration, and let the i-th element on its diagonal be denoted d_i^t. Here, d_i^t is an independent and identically distributed Bernoulli random variable, i.e.,

d_i^t ∈ {0, 1}, with P(d_i^t = 1) = p. (5)

Algorithm 2: RBC.
(1) Initialize x_0
(2) For t = 0, 1, 2, ...
(3) Select a coordinate (block) i ∈ {1, 2, ..., N} uniformly at random
(4) Update x_{t+1}^{(i)} = x_t^{(i)} − (1/L_i)·∇_i f(x_t)
(5) End For

Algorithm 3: ACGB-Adam.
Input: α, β_1, β_2, f_t. Output: θ_t.
(1) Initialize parameters (the adaptive coefficient β_{1,t} and the predicted gradient u_t; the remaining parameters are initialized in the same way as in Algorithm 1)
(2) For t = 1 to T
(3) Generate a random diagonal matrix D_t /* gradient calculation based on Algorithm 2 (RBC) */
(4) Get a stochastic gradient at time step t: g_t ← D_t·∇_θ f_t(θ_{t−1})
(5) Update the parameters according to the gradient descent method
(6) Get a predicted stochastic gradient u_t at time step t from the provisionally updated parameters /* optimization of the first-order moment estimation */
(7) Update the biased first-order moment estimation with the adaptive coefficient β_{1,t} and the composite gradient
(8) Update the biased second-order moment estimation
(9) Compute the bias-corrected first-order moment estimation
(10) Compute the bias-corrected second-order moment estimation
(11) Update the parameters θ_t
(12) End For
(13) Return θ_t
The RBC method is used to randomly select a block (subset) from the elements of a high-dimensional vector through equation (5). If d_i^t = 1, the corresponding coordinate is selected, and the ACGB-Adam algorithm performs the gradient calculation for it; if d_i^t = 0, the corresponding coordinate is not selected, and no gradient update is performed. Thus, in each round of gradient updating, only one block (subset) of the gradient has to be computed, and the first-order and second-order momenta are calculated from it; it is not necessary to calculate the entire gradient. Therefore, compared with full gradient descent algorithms, the optimization method based on randomized block coordinate descent can save considerable computing cost and reduce both CPU and memory utilization while ensuring the convergence of the algorithm. The specific calculation process is shown in Figure 3.
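The Bernoulli-mask view of this gradient update can be sketched as follows; the selection probability p is an assumption, and the selection is modeled coordinate-wise for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)

def masked_gradient(full_grad, p=0.5):
    """Apply the diagonal Bernoulli selector D_t: coordinates with d_i = 1
    keep their gradient entry, coordinates with d_i = 0 are skipped (set
    to zero), so only a random subset of the gradient is used."""
    d = rng.random(full_grad.shape) < p   # d_i ~ Bernoulli(p), i.i.d.
    return np.where(d, full_grad, 0.0)

g = np.ones(1000)                  # stand-in for a full gradient vector
g_block = masked_gradient(g, p=0.5)
```

On average only p of the coordinates carry a nonzero entry per round, which is where the savings in computation and memory traffic come from.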

ACGB-Adam Algorithm Process.
The ACGB-Adam algorithm process is described in the following sections.

Overall Architecture of Algorithm.
The overall architecture of the ACGB-Adam algorithm is shown in Figure 4. It mainly includes three core modules: the randomized block coordinate method, the adjustment of gradient deviations through adaptive parameters, and the composite gradient.
The general strategy of the ACGB-Adam algorithm is to integrate the above three modules and apply the three optimization methods to parameter updating so as to improve the convergence speed and global optimization ability while reducing the system overhead. First, the current gradient update mode is optimized by RBC, which avoids calculating all gradients and reduces the system overhead. Second, through the adaptive parameters, the algorithm adaptively calculates the coefficient proportion of the first-order momentum according to the difference between the current gradient and the first-order momentum at the previous time step, minimizing the influence of outlier gradients and improving the search direction and speed. Finally, the composite gradient combines the predicted gradient, the current gradient, and the first-order momentum of the previous time step to form the final search direction, aiming to approach the global optimal position more closely and improve the global search ability of the algorithm.
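Putting the three modules together, one illustrative ACGB-Adam step might look like the sketch below. The composition order follows the description above, but the specific weights (the equal g_t/u_t mixing, the Q decay, ν, and the selection probability p) are assumptions for illustration, not the authors' exact formulas.

```python
import numpy as np

rng = np.random.default_rng(2)

def acgb_adam_step(grad_f, theta, state, alpha=0.001, beta1=0.9, beta2=0.9,
                   p=0.5, nu=1.0, eps=1e-8):
    """One illustrative ACGB-Adam step: RBC mask -> adaptive beta1 ->
    composite gradient -> Adam-style update (details are assumptions)."""
    m, v, Q, t = state["m"], state["v"], state["Q"], state["t"] + 1
    d = rng.random(theta.shape) < p                   # RBC: Bernoulli selector
    g = np.where(d, grad_f(theta), 0.0)               # block gradient only
    u = np.where(d, grad_f(theta - alpha * g), 0.0)   # predicted gradient
    dev = np.sum((g - m) ** 2 / (v + eps))            # deviation for beta_{1,t}
    q = (nu + g.size) / (nu + dev)
    b1 = Q / (Q + q)                                  # adaptive coefficient
    m = b1 * m + (1 - b1) * 0.5 * (g + u)             # composite momentum
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)                      # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    state.update(m=m, v=v, Q=(2 * beta1 - 1) / beta1 * (Q + q), t=t)
    return theta

# minimize f(x) = ||x||^2 with the combined step
theta = np.ones(4)
state = {"m": np.zeros(4), "v": np.zeros(4), "Q": 8.0, "t": 0}
for _ in range(500):
    theta = acgb_adam_step(lambda x: 2 * x, theta, state, alpha=0.05)
```

Even though each step sees only a random block of the gradient and damps outlier directions, the iterate still descends on the quadratic, illustrating the claimed balance between convergence and per-step cost.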

ACGB-Adam Algorithm Process.
The overall flow of the ACGB-Adam algorithm is shown in Algorithm 3.

Experiment and Analysis
The experiment and analysis are described in the following sections.

Standard Datasets and Experimental Setup.
To evaluate the performance of the ACGB-Adam algorithm, experiments were carried out on two standard classification datasets (Table 2). The proposed algorithm was compared with the stochastic gradient descent (SGD), adaptive gradient (AdaGrad), adaptive moment estimation (Adam), Adam based on adaptive coefficients (A-Adam), Adam based on composite gradients (C-Adam), and Adam based on randomized block coordinate descent (RBC-Adam) algorithms.
(1) Mnist Dataset. The Mnist dataset [31], derived from handwriting samples collected by NIST, is a classic dataset for image recognition. It contains 70,000 images of the digits 0-9 handwritten by 250 different people. The digits have been size-normalized and centered in the images. Some examples of handwriting in the dataset are shown in Figure 5(a). (2) CIFAR-10 Dataset. The CIFAR-10 dataset [32] is used for recognizing universal objects and consists of 60,000 RGB images. Compared with handwritten characters, this dataset contains pictures of objects in the real world. The noise is large, and the proportions and characteristics of the objects differ, which makes recognition considerably harder. Figure 5(b) lists the ten classes in the dataset, with ten randomly chosen pictures per class. (3) Experimental Setting. MATLAB is used for the simulation experiments. The operating system is Win10, the CPU is an Intel i7-1065G7 with a base frequency of 1.30 GHz, the memory is 16 GB, and the SSD capacity is 512 GB. To improve the comparability of the results, the comparison algorithms in the experiment all use the same parameter settings. The main hyperparameters are α = 0.001 and β_1 = β_2 = 0.9, and the maximum number of iterations is 100. MSE and accuracy are used as the performance evaluation indicators for training and classification.

Experimental Results on the Standard Datasets.
The experimental results on the standard datasets are explained in the following sections.

Mnist Experimental Results.
Figure 6 represents the training error loss and classification accuracy of the seven algorithms on the Mnist dataset. The training error and test accuracy at the 100th iteration are shown in Table 3. It can be observed from Figure 6 and Table 3 that, as the number of iterations increases, each algorithm gradually converges on the Mnist dataset.

CIFAR-10 Experimental Results.
Figure 7 demonstrates the training error of the algorithms on the CIFAR-10 dataset, along with the classification accuracy on the test set. The training error and test accuracy at the 100th iteration are shown in Table 4. It can be observed from Figure 7 and Table 4 that the training error of the ACGB-Adam algorithm decreases quickly in the early iterations and gradually stabilizes. As the number of iterations increases, the error loss of the algorithm continues to decrease steadily. Compared with the other six algorithms, the ACGB-Adam algorithm has the smallest error loss and the highest classification accuracy of 0.941. From the experimental results on the CIFAR-10 dataset, it can be inferred that the proposed algorithm outperforms the other six algorithms in terms of convergence speed, accuracy, stability, and classification accuracy.

Memory and CPU Usage Rate Analysis.
For the two standard datasets, the changes in memory and CPU utilization of the seven algorithms with the number of iterations are illustrated in Figure 8 and Table 5. It can be observed that, as the number of iterations increases, the memory and CPU utilization of each algorithm increase gradually. Under the same conditions, the memory and CPU utilization of the RBC-Adam algorithm is the lowest, followed by the ACGB-Adam algorithm proposed in this article; the difference between the two algorithms' memory and CPU utilization rates is less than 2%. The specific experimental results are shown in Table 6. It can be seen from Tables 5 and 6 that, although the computing cost of the RBC-Adam algorithm is slightly lower than that of the ACGB-Adam algorithm, its training error and classification accuracy are far worse than those of the proposed algorithm. Altogether, the ACGB-Adam algorithm can achieve a dynamic balance between convergence and computing cost. While improving convergence speed and accuracy, it keeps memory and CPU utilization as low as possible and has good comprehensive optimization performance.

Reservoir Porosity Prediction.
To further verify the effectiveness and utility of the proposed algorithm, the reservoir porosity in a real work area was predicted by a BP neural network based on the ACGB-Adam algorithm. As shown in Figure 9, the sample data come from the real data of two wells, A and B, in an exploration area. The logging depth is 900-1120 m, comprising 1492 records and 11 logging parameters. To achieve efficient and accurate porosity prediction, the grey correlation analysis method [33] is used to select the parameters most correlated with porosity as input parameters of the neural network, namely, Depth, RLLS (shallow lateral resistivity), GR (natural gamma ray), HAC (high-resolution interval transit time), and DEN (density), as represented in Figure 10. This improves data processing efficiency without sacrificing prediction accuracy. These five parameters, which have a significant impact on porosity, differ in nature and usually have distinct dimensions and orders of magnitude. If the level difference between parameters is too large, the influence of parameters with larger values is amplified and that of parameters with smaller values is weakened. To ensure the comparability of the data, this article uses the deviation normalization method [33] to preprocess the data, eliminating the influence of the dimensions and magnitudes of the variables on the results.
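Deviation normalization is min-max scaling of each logging parameter to [0, 1]; a small sketch follows (the guard for constant columns is an added assumption).

```python
import numpy as np

def minmax_normalize(X):
    """Deviation (min-max) normalization: rescale each column of the
    sample matrix X to [0, 1] so that differing units and magnitudes
    no longer dominate the network's inputs."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return (X - lo) / span

# two hypothetical logging parameters with very different scales
X = np.array([[0.0, 10.0],
              [5.0, 20.0],
              [10.0, 30.0]])
Z = minmax_normalize(X)
```

After scaling, both columns span exactly [0, 1], so a parameter measured in large units can no longer drown out one measured in small units.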

Model Performance Analysis.
The preprocessed data were taken as sample data, and the training and test sets were divided in the ratio 8 : 2. The BPNN model was set as follows: one hidden layer with 5 neurons, the transfer function Tansig, a learning rate of 0.001, and a maximum of 5000 iterations. Using MSE and RMSE as the model performance evaluation indices, the proposed ACGB-Adam_BP model was compared with five methods, namely, SGD_BP, AdaGrad_BP, Adam_BP, C-Adam_BP, and RBC-Adam_BP. The final training and test errors of the various methods are listed in Table 7, in which the minimum values of MSE and RMSE are shown in bold; the iterative error curves are shown in Figure 11.
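The evaluation indices and the 8 : 2 split can be reproduced with a few helper functions; this is a minimal sketch, and the shuffling and random seed are assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between measured and predicted porosity."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mse(y_true, y_pred)))

def split_8_2(X, y, seed=0):
    """Shuffle the samples and split them 80% / 20% into train and test."""
    idx = np.random.default_rng(seed).permutation(len(X))
    k = int(0.8 * len(X))
    return X[idx[:k]], X[idx[k:]], y[idx[:k]], y[idx[k:]]

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)
X_tr, X_te, y_tr, y_te = split_8_2(X, y)
```

RMSE is reported alongside MSE because it is in the same units as porosity itself, which makes the error magnitudes in Table 7 easier to interpret.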
It can be seen from Table 7 and Figure 11 that the BPNN based on the ACGB-Adam algorithm produces the lowest error on both the training set and the test set and stabilizes earliest. Its convergence speed is much better than that of the other five comparison algorithms, indicating that the proposed algorithm has better optimization performance.

Porosity Prediction Results.
To observe the above results more intuitively and validate the effectiveness and correctness of the proposed method for porosity prediction, the prediction results of the BP model based on the ACGB-Adam optimization algorithm are visually analyzed on 300 test samples, as highlighted in Figure 12. Due to space constraints, the error analysis results on the training set are not shown. From the comparison curve between the predicted and actual porosity values, it can be observed that the BP neural network model based on the ACGB-Adam optimization algorithm yields relatively ideal predictions, with few abnormal predicted porosity values. The absolute error of more than 86% of the data is within 0.1%, which signifies the high prediction accuracy of the proposed algorithm.

Conclusion
Starting from improving the Adam algorithm to heighten the convergence speed, accelerate the search for the global optimal solution, and enhance the ability to process high-dimensional data, an Adam optimization algorithm combining adaptive coefficients and composite gradients based on randomized block coordinate descent is proposed, which enhances the performance of the algorithm. Through theoretical analysis and numerical experiments, the following conclusions can be drawn. (1) The gradient deviation caused by outliers is crucial to the convergence speed and solution precision of the Adam algorithm. Using an adaptive coefficient to adjust for the difference between the first-order momentum and the current gradient reduces the influence of deviated gradients, remedies the slow convergence of the Adam algorithm, boosts the search speed, and improves the convergence accuracy. (2) By introducing the predicted gradient and combining it with the current gradient and the first-order momentum to form a composite gradient, a more accurate search direction can be obtained in subsequent iterations, enhancing the global optimization ability of the algorithm. (3) In the process of gradient updating, the RBC method determines the gradient calculation by randomly selecting variables from a parameter subset. This reduces the calculation cost as much as possible while ensuring convergence, enhances the algorithm's ability to process high-dimensional data, and maintains a good balance between optimization accuracy and system overhead. (4) The test results on the Mnist and CIFAR-10 standard classification datasets indicate that the ACGB-Adam algorithm is significantly superior to the SGD, AdaGrad, Adam, A-Adam, C-Adam, and RBC-Adam algorithms in terms of convergence speed and optimization accuracy. Although the proposed method is slightly higher than the RBC-Adam algorithm in memory and CPU utilization, it achieves a decent balance between convergence and system overhead. According to the evaluation indices, the proposed algorithm has better overall performance than the other algorithms, which validates the effectiveness of the improvements. (5) The BPNN model based on the ACGB-Adam algorithm is applied to reservoir porosity prediction. The experimental results suggest that, compared with BPNN models based on Adam and its variants, the maximum reductions in MSE and RMSE of the proposed model are approximately 86.30% and 62.99%, respectively. The model achieves higher accuracy in porosity prediction, verifying the superiority of the proposed algorithm and extending its application field.
The method proposed in this article enhances the performance of the Adam optimization algorithm to a certain extent but does not consider the impact of the second-order momentum and different learning rates on the performance of the original algorithm. Follow-up research can therefore focus on optimizing the second-order momentum and the learning rate and conduct in-depth research on the parts not covered by this algorithm, which may yield even better optimization performance.

Data Availability
No data were used to support the findings of the study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.