Boosting Adversarial Attacks on Neural Networks with Better Optimizer

Convolutional neural networks have outperformed humans in image recognition tasks, but they remain vulnerable to attacks from adversarial examples. Because such data are crafted by adding imperceptible noise to normal images, their existence poses potential security threats to deep learning systems. Sophisticated adversarial examples with strong attack performance can also serve as a tool for evaluating the robustness of a model. However, the success rate of adversarial attacks in black-box environments still leaves room for improvement. This study therefore combines a modified Adam gradient descent algorithm with the iterative gradient-based attack method. The proposed Adam Iterative Fast Gradient Method is used to improve the transferability of adversarial examples. Extensive experiments on ImageNet showed that the proposed method offers a higher attack success rate than existing iterative methods. By extending our method, we achieved a state-of-the-art attack success rate of 95.0% on defense models.


Introduction
In image recognition tasks, convolutional neural networks are able to classify images with an accuracy approaching that of humans [1][2][3][4]. However, researchers have found that neural networks are also vulnerable to adversarial examples. Szegedy et al. [5] first proposed the concept of adversarial examples: images with small added perturbations that cause neural network models to output incorrect classifications with high confidence. These adversarial perturbations are often indistinguishable to the human eye; in other words, there is no obvious visual difference between the adversarial examples and the original images.
Adversarial attacks can be categorized into white-box and black-box attacks. A variety of techniques can be used to generate adversarial examples and perform white-box attacks, depending on the model structure and corresponding parameters [6][7][8][9][10]. In addition, adversarial examples are generally transferable, as data generated for one model may fool other models. This facilitates black-box attacks on various neural networks, in which the structure and parameters of the model are not available [11]. Goodfellow et al. [6] suggested that different models learn similar decision boundaries during the same image classification tasks and obtain similar parameters. These properties make it easier to generalize adversarial examples to different models.
Although adversarial examples are generally transferable, the optimal approach for improving this transferability requires further study. Owing to a trade-off between attack performance and transferability, basic iterative attacks are often more effective than single-step attacks in white-box environments but weaker in black-box environments. In white-box attacks, iterative methods fit specific network parameters excessively, achieving a high success rate but failing to generalize to other models [9]. We consider this a result of overfitting, since the attack performance of adversarial examples in white-box and black-box environments parallels neural network performance on training and test sets.
Unlike white-box attacks, black-box attacks are more consistent with real attack-defense environments and are primarily performed in one of three ways. 1) Decision-based attacks are conducted using information collected from the network: although the model structure and parameters are unavailable, the attacker can generate adversarial examples by executing multiple queries against the model [12][13][14][15]. 2) In substitute-model techniques, the attacker inputs images into the model to obtain output labels; a substitute model is then trained to imitate the target model and generate adversarial examples [16]. 3) Transferability attacks are conducted by improving the transferability of adversarial examples generated in a white-box setting [8,9]. The first two approaches require large numbers of model queries, which is impractical in some cases (e.g., online platforms often limit the number of queries). As such, this study investigates black-box attacks using adversarial data with strong transferability, employing the state-of-the-art Adam optimizer to improve black-box adversarial attacks.
In this paper, we propose the Adam Iterative Fast Gradient Method (AI-FGM) to improve the transferability of adversarial examples among different models [17]. Inspired by the fact that Adam outperforms momentum in optimization, we adapt the Adam optimizer to the iterative gradient-based attack, applying a second momentum term and a decreasing step size to accelerate data updates in dimensions with small gradients and achieve better convergence. As shown in Figure 1, unlike the momentum method, which only accumulates the gradients of data points along the optimization path, our Adam-based method also accumulates the squares of the gradients. This accumulation helps obtain an adaptive update direction, which leads to a better local maximum, while a variable step size avoids oscillation. Compared with existing gradient-based methods, the proposed method offers improved attack performance in black-box settings. The approach was tested on multiple networks, including adversarially trained networks, achieving higher attack success rates on black-box models. In addition, we combined our approach with advanced methods and attacked an ensemble of multiple networks, which further improved the transferability of adversarial examples.

Figure 2. Classification of a normal image and the corresponding adversarial example by Inc-v3 and Inc-v4. The first row shows the top-10 confidence distributions for a clean image, for which both models provide the correct prediction. The second row shows the top-10 confidence distributions for the adversarial example generated for Inc-v3 by AI-FGM. The adversarial example successfully attacks Inc-v3 (white-box) and Inc-v4 (black-box) with high confidence.

Lin et al. proposed the Scale-Invariant attack Method (SIM) and combined it with existing methods, resulting in SI-NI-TI-DIM, currently the strongest black-box attack method [18].

Adversarial Defense Methods
Multiple defense mechanisms have been proposed to protect deep learning models from the threat of adversarial examples [20][21][22][23][24][25][26][27][28][29]. Among these, adversarial training is the most effective way to improve model robustness [6,30]. In this process, adversarial examples are generated and added to the training set to participate in the model training procedure. Since normal adversarial training is still vulnerable to adversarial examples, Tramèr et al. proposed ensemble adversarial training to further improve robustness [31]. In this process, adversarial data generated by multiple models are included in the training set for a single model, thereby producing a more robust classifier.
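As a concrete illustration of this procedure, adversarial training can be sketched on a toy logistic-regression model. Everything below (the 2-D synthetic data, the plain FGSM inner attack, the 0.1 perturbation budget, and all hyperparameters) is a hypothetical stand-in for illustration, not the ensemble procedure of Tramèr et al.:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D binary classification data (hypothetical stand-in for a real task).
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = np.zeros(2)  # logistic-regression weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(200):
    # Craft FGSM adversarial examples against the current model:
    # the cross-entropy gradient w.r.t. the input x is (p - y) * w.
    p = sigmoid(X @ w)
    X_adv = X + 0.1 * np.sign((p - y)[:, None] * w[None, :])
    # Train on the union of clean and adversarial batches.
    X_all = np.vstack([X, X_adv])
    y_all = np.concatenate([y, y])
    p_all = sigmoid(X_all @ w)
    w -= 0.1 * X_all.T @ (p_all - y_all) / len(y_all)

clean_acc = ((sigmoid(X @ w) > 0.5) == (y > 0.5)).mean()
```

The same loop, with adversarial examples drawn from several pre-trained models instead of the model currently being trained, would correspond to ensemble adversarial training.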

Methodology
This section provides a detailed introduction to the proposed methodology. Let $x$ denote an input image and $y$ denote the corresponding ground-truth label. The term $\theta$ represents the network parameters, and $J(x, y, \theta)$ denotes a loss function, typically the cross-entropy loss. The primary objective is to generate an adversarial example $x^*$ that fools the model by maximizing $J(x^*, y, \theta)$, such that the predicted label $y^{pre} \neq y$. In this paper, an $L_\infty$ norm bound is used to limit the adversarial perturbation, such that $\|x^* - x\|_\infty \leq \epsilon$.

Attack Methods Based on Gradient
In this section, several techniques used to solve the above optimization problem are briefly introduced.

Fast Gradient Sign Method (FGSM) [6]: One of the simplest techniques, FGSM seeks adversarial examples in the direction of the gradient of the loss function with respect to the input image $x$:
$$x^* = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(x, y, \theta)).$$

Iterative Fast Gradient Sign Method (I-FGSM) [7]: This algorithm is an iterative version of FGSM, dividing the FGSM gradient operation into multiple steps:
$$x^*_0 = x, \quad x^*_{t+1} = \mathrm{Clip}^{\epsilon}_{x}\{x^*_t + \alpha \cdot \mathrm{sign}(\nabla_x J(x^*_t, y, \theta))\},$$
where $\alpha = \epsilon / T$ denotes the step size of each iteration and $T$ denotes the number of iterations. The Clip function limits the adversarial example $x^*$ to the $\epsilon$-neighborhood of the original image $x$, satisfying the $L_\infty$ norm constraint. I-FGSM is more effective than FGSM in white-box environments but less effective in black-box environments; in other words, I-FGSM exhibits poor transferability. This iterative attack is also known as Projected Gradient Descent (PGD) when a random initialization of $x$ is added [20].

Momentum Iterative Fast Gradient Sign Method (MI-FGSM) [8]: In this method, a momentum term is applied to the iterative process to escape poor local maxima, producing adversarial examples with greater transferability [8]:
$$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x^*_t, y, \theta)}{\|\nabla_x J(x^*_t, y, \theta)\|_1}, \quad x^*_{t+1} = \mathrm{Clip}^{\epsilon}_{x}\{x^*_t + \alpha \cdot \mathrm{sign}(g_{t+1})\},$$
where $\mu$ denotes the decay factor of the momentum term and $g_t$ denotes the weighted accumulation of gradients over the first $t$ iterations.
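For concreteness, the three updates above can be sketched in NumPy on a toy differentiable loss; the quadratic `grad_J`, `target`, and all hyperparameters are hypothetical stand-ins for the gradient of a real network's loss:

```python
import numpy as np

# Toy loss J(x) = -0.5 * ||x - target||^2, maximized exactly at `target`.
target = np.array([0.8, -0.3, 0.5])

def grad_J(x):
    # Gradient of the toy loss with respect to the input x.
    return -(x - target)

def clip_eps(x_adv, x, eps):
    # Clip: project x_adv back into the L-inf ball of radius eps around x.
    return np.clip(x_adv, x - eps, x + eps)

def fgsm(x, eps):
    # FGSM: a single step of size eps along the gradient sign.
    return x + eps * np.sign(grad_J(x))

def i_fgsm(x, eps, T):
    # I-FGSM: T steps of size alpha = eps / T, clipped after each step.
    alpha = eps / T
    x_adv = x.copy()
    for _ in range(T):
        x_adv = clip_eps(x_adv + alpha * np.sign(grad_J(x_adv)), x, eps)
    return x_adv

def mi_fgsm(x, eps, T, mu=1.0):
    # MI-FGSM: accumulate L1-normalized gradients with decay factor mu.
    alpha = eps / T
    x_adv, g = x.copy(), np.zeros_like(x)
    for _ in range(T):
        grad = grad_J(x_adv)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = clip_eps(x_adv + alpha * np.sign(g), x, eps)
    return x_adv

x0 = np.zeros(3)
adv = mi_fgsm(x0, eps=0.1, T=10)  # satisfies ||adv - x0||_inf <= 0.1
```

All three attacks share the same $L_\infty$ projection; they differ only in how the update direction is computed at each step.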

Adam Iterative Fast Gradient Method
The generation of adversarial examples is similar to the training of neural networks: both processes can be viewed as optimization problems. Specifically, the adversarial example can be viewed as the training parameters, while the white-box model plays the role of the training set and the black-box model that of the test set. From this point of view, the transferability of adversarial examples is analogous to the generalization of models. Therefore, methods used to improve model generalization can be applied to the generation of adversarial examples. Many such methods have been proposed, and they can be divided into two categories: better optimization and data augmentation. Correspondingly, both kinds of methods can be used in adversarial attacks, and there have been attempts to do so, e.g., MI-FGSM and DIM [8][9]. Based on this analysis, we aim to improve the transferability of adversarial examples with the Adam optimizer, since it performs well in the training of neural networks.
Adam [17] is an optimization algorithm combining momentum [32] and RMSProp [33]. In the iteration process, Adam accumulates not only the gradients of the loss function with respect to the input image, but also the squares of the gradients, which accelerates loss-function ascent in dimensions with small gradients. In addition, Adam uses a decreasing step size in each iteration to achieve better convergence. In contrast, both I-FGSM and MI-FGSM adopt a constant step size; in the latter stages of those algorithms, oscillations occur near the local maximum and the iterates do not converge well. To address this, we improve the Adam algorithm by normalizing the step-size sequence (defined in Eq. (13)) and apply it to the iterative gradient method. The proposed Adam Iterative Fast Gradient Method (AI-FGM) is summarized in Algorithm 1.
Specifically, the gradient $g_t = \nabla_x J(x^*_t, y, \theta)$ in each iteration is normalized by its own $L_1$ distance, as defined in Eq. (9), because the scale of these gradients differs widely across iterations [8]. Similar to MI-FGSM, $m_t$ accumulates the gradients of the first $t$ iterations with a decay factor $\beta_1$, as defined in Eq. (10); the result can be considered the first momentum. The term $v_t$ represents the second momentum, which accumulates the squares of the gradients over the first $t$ iterations with a decay factor $\beta_2$, as defined in Eq. (11). It is noteworthy that $g_t^2$ denotes the element-wise square $g_t \odot g_t$, where $\odot$ represents the Hadamard product [32]. The terms $\beta_1$ and $\beta_2$ are typically chosen in the range $(0, 1)$. The update direction of the input $x$, $s_t = m_t / (\sqrt{v_t} + \delta)$, is defined in Eq. (12), where the stability coefficient $\delta$ avoids a zero denominator. The term $s_t$ is advantageous as it prompts $x$ to escape from local maxima and accelerates the updating of $x$ in dimensions with small gradients.
Learning rate decay is often used in the training of neural networks. In this study, the step size used in the Adam optimizer is continually reduced to improve the convergence of the algorithm. If $\beta_1$ and $\beta_2$ are set appropriately, the value of $\sqrt{1 - \beta_2^t}/(1 - \beta_1^t)$ decreases as $t$ increases. Thus, a decreasing sequence can be generated as $t$ ranges from 1 to $T$. This sequence can then be normalized to obtain the weight of the step size in each iteration relative to the total step size, yielding a set of exponentially decaying step sizes controlled by $\beta_1$ and $\beta_2$, as defined in Eq. (13).
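Taking the decreasing quantity to be Adam's bias-correction ratio $\sqrt{1-\beta_2^t}/(1-\beta_1^t)$, which does decrease over the first iterations for the paper's settings $\beta_1 = 0.99$ and $\beta_2 = 0.999$, the normalized schedule might be computed as follows (a hedged sketch of Eq. (13); the paper's exact expression may differ):

```python
import numpy as np

def step_sizes(total, T, beta1=0.99, beta2=0.999):
    # gamma_t = sqrt(1 - beta2^t) / (1 - beta1^t): a decreasing sequence
    # over t = 1..T for suitable beta1, beta2 (assumed form of Eq. (13)).
    t = np.arange(1, T + 1)
    gamma = np.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    # Normalize so the per-iteration steps sum to the total step-size budget.
    return total * gamma / gamma.sum()

alphas = step_sizes(total=16.0, T=10)  # strictly decreasing, sums to 16
```

Normalizing the sequence guarantees that the total movement budget is fully used regardless of how $\beta_1$ and $\beta_2$ shape the decay.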
Existing techniques typically use the sign function to satisfy the $L_\infty$ norm limitation. However, if the sign function were applied in Eq. (14), the update direction of our method would be equivalent to that of MI-FGSM. Hence, we constrain adversarial examples within the $L_\infty$ norm bound using the Clip function, and apply the step size and update direction within the corresponding $L_2$ norm bound, as defined in Eqs. (14) and (15). The relationship between the $L_\infty$ norm bound $\epsilon$ and the $L_2$ norm bound $\epsilon'$ is defined in Eq. (8), where $N$ represents the dimension of the input image $x$.
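Putting these pieces together, Algorithm 1 might be sketched as follows. This is a minimal NumPy sketch, not the authors' exact implementation: the Adam-style momentum updates, the step-size schedule, and the relation $\epsilon' = \epsilon\sqrt{N}$ between the two norm bounds are assumptions standing in for Eqs. (8)-(15):

```python
import numpy as np

def ai_fgm(x, grad_J, eps, T, beta1=0.99, beta2=0.999, delta=1e-8):
    """Sketch of AI-FGM: first/second momentum with decaying L2 steps,
    projected back into the L-inf ball after every iteration."""
    eps2 = eps * np.sqrt(x.size)             # assumed L2 budget eps' = eps*sqrt(N)
    t_idx = np.arange(1, T + 1)
    gamma = np.sqrt(1.0 - beta2 ** t_idx) / (1.0 - beta1 ** t_idx)
    alphas = eps2 * gamma / gamma.sum()      # decaying step sizes (cf. Eq. (13))

    x_adv = x.copy()
    m = np.zeros_like(x)                     # first momentum
    v = np.zeros_like(x)                     # second momentum
    for t in range(T):
        g = grad_J(x_adv)
        g = g / (np.abs(g).sum() + 1e-12)    # L1 normalization (cf. Eq. (9))
        m = beta1 * m + (1.0 - beta1) * g    # cf. Eq. (10)
        v = beta2 * v + (1.0 - beta2) * g * g  # Hadamard square (cf. Eq. (11))
        s = m / (np.sqrt(v) + delta)         # update direction (cf. Eq. (12))
        # L2-normalized step of size alphas[t], then Clip to the L-inf ball.
        x_adv = x_adv + alphas[t] * s / (np.linalg.norm(s) + 1e-12)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

For example, with a toy gradient `lambda z: -(z - np.array([0.8, -0.3, 0.5]))` the result stays inside the $L_\infty$ ball of radius 0.1 while moving toward the loss maximum.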

Attacking an Ensemble of Networks
The proposed method was also used to attack an ensemble of networks. If an adversarial example poses a threat to multiple networks, it is far more likely to transfer to other models [11]. We followed the ensemble strategy proposed by Dong et al., in which multiple models are attacked by fusing network logits [8]. Specifically, to attack an ensemble of $K$ models, the logits were fused as
$$l(x) = \sum_{k=1}^{K} w_k\, l_k(x),$$
where $l_k(x)$ denotes the logits of the $k$-th model and $w_k$ is its ensemble weight, with $\sum_{k=1}^{K} w_k = 1$.
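The fusion can be illustrated with two toy linear "models" producing 3-class logits; the weight matrices and shapes here are hypothetical:

```python
import numpy as np

def fuse_logits(logit_fns, weights, x):
    # l(x) = sum_k w_k * l_k(x), with the ensemble weights summing to 1.
    assert np.isclose(sum(weights), 1.0)
    return sum(w * f(x) for w, f in zip(weights, logit_fns))

# Two toy linear "models" mapping a 2-D input to 3-class logits.
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
W2 = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]])
models = [lambda x: W1 @ x, lambda x: W2 @ x]

fused = fuse_logits(models, [0.5, 0.5], np.array([2.0, 4.0]))
# fused == [2.5, 3.0, 3.5]; the fused logits then feed the loss J that the
# gradient-based attack maximizes.
```

Because the fusion happens before the softmax, gradients flow through all member models at once, so a single attack step accounts for every network in the ensemble.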

Attacking a Single Network
We first performed adversarial attacks on a single network using FGSM, I-FGSM, MI-FGSM, and AI-FGM. The adversarial examples were generated on four normally trained networks and tested on all seven networks. The results are shown in Table 1, where the success rates are the misclassification rates of the corresponding models with adversarial examples as input. The decay factors $\beta_1$ and $\beta_2$ in AI-FGM were set to 0.99 and 0.999, respectively. The maximum perturbation $\epsilon$ was 16. The number of iterations $T$ for I-FGSM, PGD, MI-FGSM, and AI-FGM was 10. These methods are hereafter referred to as "iterative methods" without ambiguity. The effects of these parameter choices are discussed further in this section. As shown in Table 1, all four iterative methods attacked a white-box model with a near 100% success rate. AI-FGM outperformed the other methods on all black-box models. For example, for adversarial examples generated on Inc-v3, AI-FGM achieved a success rate of 60.7% on Inc-v4, while MI-FGSM, PGD, I-FGSM, and FGSM reached 54.3%, 18.9%, 27.5%, and 32.1%, respectively, demonstrating the effectiveness of the proposed method. An original image and the corresponding adversarial example generated for Inc-v3 by AI-FGM are shown in Figure 2.

Decay Factors
The terms $\beta_1$ and $\beta_2$ control not only the decay amplitude of the step sizes but also the accumulation intensity of the gradients in $m_t$ and $v_t$; as such, they have a direct impact on attack success rates. We applied a grid search to identify an optimal pair of $\beta_1$ and $\beta_2$ values. In these experiments, the maximum perturbation $\epsilon$ was set to 16 and the number of iterations $T$ to 10. The values of $\beta_1$ and $\beta_2$ ranged from 0 to 1. Notably, we not only chose values uniformly from 0.1 to 0.9, but also selected values close to 0 and 1, as shown in Figure 3. This was done to study the effects of the decay factors more comprehensively, as we propose that the relationship between the success rate and the decay factors may be nonlinear. Adversarial examples were then generated on Inc-v3 with AI-FGM and used to attack Inc-v3, Inc-v4, and Inc-v3ens4. It is evident from the figure that, regardless of the values of $\beta_1$ and $\beta_2$, attack success rates were always near 100% in white-box environments. Beyond that, attack success rates were more sensitive to $\beta_1$ for black-box models, regardless of whether the networks were normally or adversarially trained. Success rates were maximized when both $\beta_1$ and $\beta_2$ were close to 1.

The Number of Iterations
The effect of the number of iterations on success rates was studied by performing attacks with the iterative methods. The maximum perturbation $\epsilon$ was set to 16 and the decay factors $\beta_1$ and $\beta_2$ were set to 0.99 and 0.999, respectively. The number of iterations $T$ ranged from 1 to 20. Adversarial examples generated on Inc-v3 with I-FGSM, MI-FGSM, and AI-FGM were then used to attack Inc-v3 and Inc-v4, as shown in Figure 4.

The experimental results presented above suggest that AI-FGM increases the transferability of adversarial examples. The success rate of black-box attacks can be further improved by attacking an ensemble of networks. As discussed in Sec. 3.3, multiple models were attacked by fusing network logits. In this experiment, all seven networks described in Sec. 4.1 were used; we generated adversarial examples on the ensemble of Inc-v3, Inc-v4, IncRes-v2, and Res-101 with FGSM, I-FGSM, MI-FGSM, NI-FGSM, and AI-FGM. Attacks were then performed on the other three defense models. The decay factors were set to 0.99 and 0.999, the number of iterations was 10, the maximum perturbation was 16, and each network was given an equal ensemble weight of 1/4. The corresponding results are shown in Table 2; AI-FGM was clearly more effective than the other four methods on adversarially trained models.

Combination with Advanced Methods
In this subsection, we combine AI-FGM with SI-NI-TI, SI-NI-DI, and SI-NI-TI-DI [18], and compare the black-box attack success rates of our extensions with those of the original methods under the single-model setting.
The adversarial examples were generated on Inc-v3, with the number of iterations set to 10 and the maximum perturbation set to 16. It is noteworthy that our combination with other methods acts more like an improvement: for example, AI-FGM was combined with SI-NI-TI by replacing the Nesterov optimizer in SI-NI-TI with Adam, resulting in SI-AI-TI. As shown in Table 3, our method SI-AI-TI-DI achieved an average attack success rate of 65.1%, surpassing the state-of-the-art attack by 10.7%.

[Figure: success rate (%) versus perturbation size for I-FGSM, MI-FGSM, and AI-FGM attacks generated on Inc-v3 and tested on Inc-v3 and Inc-v4.]

Furthermore, we generated adversarial examples on the ensemble of models using our SI-AI-TI-DI. As shown in Table 4, we achieved an average attack success rate of 95.0% on adversarially trained models under the black-box setting, raising a new security concern for robust deep neural networks.

Discussion
It is commonly acknowledged that the training of neural network models is similar to the generation of adversarial examples, especially for gradient-based generation methods. Hence, techniques used in the training of neural networks, to improve model generalizability, can also be adopted to improve the transferability of adversarial examples. Since the Adam optimizer is often used in the training of neural networks, to improve convergence and achieve better performance on test sets, a second momentum term and decay step size in Adam were included in the AI-FGM algorithm. This was done to improve the transferability of adversarial examples. Furthermore, we suggest that other techniques (such as data augmentation) could be used to further improve the performance of adversarial examples in black-box environments.

Conclusions
In this study, we proposed the Adam Iterative Fast Gradient Method to improve the transferability of adversarial examples. Specifically, the Adam algorithm was modified to increase its suitability for the generation of adversarial examples. The proposed method improved both the iteration update direction and step size. The effectiveness of the proposed method was verified by an extensive series of experiments with ImageNet. Compared with previous gradient-based adversarial example generation techniques, our method improved attack success rates in black-box environments. In addition, we further improved the transferability of adversarial examples by combining our approach with existing methods and attacking ensemble models, which achieved a state-of-the-art attack success rate against adversarially trained networks. We suggest the proposed method could be used as a reference for other iterative gradient-based methods. For example, data augmentation could be combined with AI-FGM to achieve better attack performance.