Complete Defense Framework to Protect Deep Neural Networks against Adversarial Examples

Although Deep Neural Networks (DNNs) have achieved great success in various applications, investigations have increasingly shown DNNs to be highly vulnerable when adversarial examples are used as input. Here, we present a comprehensive defense framework to protect DNNs against adversarial examples. First, we present statistical and minor alteration detectors to filter out adversarial examples contaminated by noticeable and unnoticeable perturbations, respectively. Then, we ensemble the detectors, a deep Residual Generative Network (ResGN), and an adversarially trained targeted network to construct a complete defense framework. In this framework, the ResGN is our previously proposed network used to remove adversarial perturbations, and the adversarially trained targeted network is a network learned through adversarial training. Specifically, once the detectors determine an input example to be adversarial, it is cleaned by the ResGN and then classified by the adversarially trained targeted network; otherwise, it is directly classified by this network. We empirically evaluate the proposed complete defense on the ImageNet dataset. The results confirm its robustness against current representative attacking methods, including the fast gradient sign method, randomized fast gradient sign method, basic iterative method, universal adversarial perturbations, DeepFool method, and Carlini & Wagner method.


Introduction
Lately, the performance of deep neural networks (DNNs) on various applications, such as computer vision [1], natural language processing [2], and speech recognition [3], has been impressive. However, recent investigations also revealed that DNNs are fragile and are easily confused by adversarial examples contaminated by elaborately designed perturbations [4][5][6][7]. Szegedy et al. [4] first crafted some adversarial examples that are misclassified by DNNs with high probability.
These adversarial examples easily deceived the targeted network even though they did not affect human recognition (see Figure 1). Undoubtedly, such adversarial examples are serious potential threats to security-concerned applications such as autonomous vehicle systems [8] and face recognition [9]. Therefore, improving the robustness of DNNs to adversarial examples is of crucial importance.
To date, roughly three categories of approaches have been used to defend against adversarial examples. The first is to make neural networks themselves robust against adversarial examples, the second is to reform adversarial examples, and the third is to detect adversarial examples. With respect to the first type, one of the most effective strategies is to (re)train the targeted network with adversarial examples to obtain an adversarially trained targeted network. However, each of these approaches has its respective limitations. The first approach cannot effectively defend against adversarial examples that have not been learned while training the network. The second approach inevitably has an impact on legitimate examples, as they are also reformed. The third approach may reject the detected adversarial examples, which might be unacceptable in certain scenarios.
To overcome the aforementioned drawbacks, we propose a complete defense framework that combines the three types of approaches. It consists of two detectors, a deep Residual Generative Network (ResGN) [10], and an adversarially trained targeted network [11]. The two detectors are motivated by the observation that existing attacks broadly fall into gradient-based and optimization-based attacks. A gradient-based attack increases or decreases a loss function along its gradients to seek an adversarial example, while an optimization-based attack directly takes the minimal adversarial perturbation as one of the objective functions to optimize. Consequently, gradient-based attacks generally introduce more visible flaws than optimization-based attacks at the same attacking rate. According to their corresponding perturbation magnitudes, we call these noticeable perturbations and unnoticeable perturbations, respectively, in the remainder of the paper. To illustrate the distinction, we show the two types of perturbations in Figure 1. Adversarial examples with noticeable perturbations contain large statistical abnormalities that can be readily distinguished from legitimate ones, whereas unnoticeable perturbations can be readily destroyed. This led us to design a pair of complementary detectors: a statistical detector and a minor alteration detector.
Figure 1: (b) and (d) show the noticeable and unnoticeable adversarial perturbations, respectively. The corresponding adversarial images generated by the fast gradient sign and Carlini & Wagner methods are misclassified as "starfish" with 97.6% confidence (c) and "chickadee" with 95.2% confidence (e). The colors in (b) and (d) represent the average pixel values of the three channels of the residual image, which is the difference between the adversarial and legitimate images. Colors close to blue indicate small differences and colors close to orange indicate large differences, so that noticeable and unnoticeable perturbations can be distinguished (the pixel values of legitimate and adversarial images range from 0 to 255).

The statistical detector relies on extracting statistical features from an input image to distinguish adversarial images from legitimate ones. The minor alteration detector relies on applying minor alterations to an input image and comparing the outputs of the targeted network on the original and altered examples to distinguish adversarial examples from legitimate ones. This work differs in two respects from our previous work [12]: first, we extract a detection feature from each of the three channels instead of using the average of the three channels; second, minor geometric alterations are performed instead of Gaussian noise corruption on the input image. The improvement in detection performance resulting from these two changes is verified by our empirical results.
We summarize our contributions as follows: (1) We designed two detectors, a statistical and a minor alteration detector, which are adaptive to the characteristics of adversarial perturbations and filter out adversarial examples contaminated by noticeable and unnoticeable perturbations, respectively.
(2) We used an adversarially trained targeted network to classify examples. Thus, the two detectors, ResGN, and the adversarially trained targeted network form a complete defense framework.
(3) We conducted comprehensive experiments on the ImageNet dataset [13] to confirm the effectiveness of the proposed complete defense framework. The paper is organized as follows. In Section 2, adversarial attacks and defensive techniques are briefly surveyed. In Section 3, the proposed complete defense framework is presented and analyzed. In Section 4, comprehensive experiments verifying the effectiveness of the proposed framework are described. Section 5 draws the conclusions.

Background
Although attack is the opposite of defense, studying attacks is essential for increasing the adversarial robustness of networks.

Attacks.
In terms of the knowledge known by the adversary regarding the targeted model, attacks are grouped into white box, gray box, and black box attacks. In white box attacks, the adversary knows the structure and parameters of the targeted network, the training data, and even the defensive scheme of the defender. Of the three categories of attacks, white box attacks are the most frequently used when evaluating defensive techniques. In the following, we review widely used white box attacks.

Fast Gradient Sign Method (FGSM).
Goodfellow et al. [5] developed FGSM. It is a single-step attack that uses the ℓ∞ metric to measure the distance between a legitimate and a perturbed example. Formally, the adversarial example x_adv is obtained as follows:

x_adv = x + ε · sign(∇_x J(x, y)),

where g(·) is the classification result of the targeted network, ∇_x J(·, ·) is the gradient of the cost function J(·, ·) with respect to x, and the value of ε controls the strength of the perturbation. Although FGSM is efficient, it introduces noticeable perturbations.
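To make the update rule concrete, here is a minimal sketch of FGSM on a toy logistic-regression "network" (the model, weights, and ε below are illustrative stand-ins, not the networks or settings used in this paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, eps):
    """Single-step FGSM against a logistic-regression model.

    The cross-entropy loss of sigmoid(w.x) has gradient
    (sigmoid(w.x) - y) * w with respect to x; FGSM steps eps
    along the sign of that gradient.
    """
    grad = (sigmoid(w @ x) - y) * w
    return x + eps * np.sign(grad)

# Toy example: a correctly classified point (w.x = 1.5 > 0, class 1)
# is pushed across the decision boundary by a large enough eps.
w = np.array([1.0, -2.0])
x = np.array([0.5, -0.5])
x_adv = fgsm(x, y=1.0, w=w, eps=1.0)
```

Note that the step size ε directly trades attack strength against visibility, which is why FGSM perturbations tend to be noticeable.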

Randomized Fast Gradient Sign Method (R-FGSM).
An improved version of FGSM, R-FGSM, was proposed by Tramèr et al. [14]. It injects a small amount of random noise into the legitimate example before performing the FGSM attack. Specifically, for α < ε, a legitimate example x is corrupted into x_R by additive Gaussian noise,

x_R = x + α · sign(N(0, I)),

after which the FGSM attack with strength ε − α is performed on x_R.

Basic Iterative Method (BIM).

BIM [15] applies FGSM iteratively with a small step size, clipping the intermediate result after every step so that it remains within the ε-neighborhood of the legitimate example. BIM has a much higher attacking rate than FGSM, and it still causes noticeable perturbations even though fewer visual flaws occur than in those crafted by FGSM.
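The iterative scheme can be sketched on the same toy logistic-regression model as before (the model, weights, and step sizes are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bim(x, y, w, eps, alpha, steps):
    """Iterative FGSM (BIM) on a logistic-regression model: take small
    FGSM steps of size alpha and clip back into the eps-ball around the
    legitimate example after every step."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = (sigmoid(w @ x_adv) - y) * w   # gradient of cross-entropy loss
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

w = np.array([1.0, -2.0])
x = np.array([0.5, -0.5])
x_adv = bim(x, y=1.0, w=w, eps=1.0, alpha=0.3, steps=5)
```

Because each step is small and re-evaluates the gradient, BIM crosses the decision boundary with a smaller overall distortion than a single FGSM step of strength ε.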

DeepFool Method (DeepFool).
Moosavi-Dezfooli et al. [16] proposed an iterative attack based on the ℓ2 norm to compute the minimal distortion for a given example. The attacker assumes that the legitimate example resides in the region restricted by the decision boundaries of the classifier. The algorithm disturbs the example by a small vector. Then, the resulting example is taken to the boundary of the polyhedron, which is obtained by linearizing the boundaries of the region within which the image resides during each iterative step. The final perturbation is calculated by accumulating the perturbations added to the legitimate example in each iterative step, which forces the perturbed image to change its ground truth (GT) label.
The formulation of DeepFool is as follows:

ρ*(x) = argmin_ρ ‖ρ‖₂ subject to g(x + ρ) ≠ g(x).

The perturbations introduced by DeepFool are unnoticeable, and its attacking rate is much higher than that of FGSM and BIM.
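For an affine binary classifier the linearization in DeepFool is exact, so the minimal ℓ2 step has a closed form. A minimal sketch (the weights and overshoot value are illustrative assumptions):

```python
import numpy as np

def deepfool_linear(x, w, b, overshoot=0.02):
    """DeepFool step for an affine binary classifier f(x) = w.x + b.

    The minimal l2 perturbation projects x onto the decision hyperplane,
    r = -(f(x) / ||w||^2) * w, scaled by a small overshoot to cross it.
    """
    f = w @ x + b
    r = -(f / np.dot(w, w)) * w          # closed-form minimal perturbation
    return x + (1 + overshoot) * r

w, b = np.array([3.0, 4.0]), -1.0
x = np.array([2.0, 1.0])                  # f(x) = 9 > 0
x_adv = deepfool_linear(x, w, b)          # sign of f flips with minimal l2 cost
```

For deep networks, DeepFool repeats this projection against the locally linearized boundaries until the label changes, accumulating the per-step perturbations.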

Carlini & Wagner Method (CW).
In an attempt to counter defensive distillation, Carlini and Wagner [17] introduced optimization-based adversarial attacks that render the perturbations quasi-imperceptible by restricting their ℓ2, ℓ∞, and ℓ0 norms. The researchers demonstrated that distilled targeted networks almost completely fail against these attacks. In addition, the adversarial examples generated using a targeted network without distillation can be transferred successfully to a network with distillation. These facts indicate that the perturbations are suitable for black box attacks. In the remainder of this paper, CW_UT and CW_T denote the untargeted and targeted CW attacks in the ℓ2 norm, respectively.

Universal Adversarial Perturbations (UAP).
Moosavi-Dezfooli et al. [18] developed an attack generating universal adversarial perturbations that are image-agnostic. In their problem setting, given the targeted classifier g and data distribution S, they examine the existence of a small universal perturbation ρ, whose magnitude is measured by the ℓp norm with p ∈ [1, ∞), that leads to the misclassification of most images. The problem can be formulated as follows:

‖ρ‖_p ≤ ε and P_{x∼S}(g(x + ρ) ≠ g(x)) ≥ 1 − δ.

The parameter ε controls the magnitude of the perturbation ρ, and δ quantifies the acceptable failure rate of fooling the targeted network over images sampled from distribution S. The UAP attack is implemented iteratively, and the iterations do not terminate until most of the sampled images are misclassified by the targeted network.
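The outer loop of UAP can be sketched as follows; `classify` and `per_image_attack` are placeholders for the targeted network and an inner minimal-perturbation attack (DeepFool in [18]), and the toy linear classifier is purely illustrative:

```python
import numpy as np

def universal_perturbation(images, classify, per_image_attack, eps, delta, max_iter=10):
    """UAP outer loop: whenever the current universal perturbation rho
    fails to fool the classifier on a sampled image, add that image's
    minimal perturbation and project rho back into the l-inf eps-ball.
    Stop once the fooling rate on the sample reaches 1 - delta."""
    rho = np.zeros_like(images[0])
    for _ in range(max_iter):
        for x in images:
            if classify(x + rho) == classify(x):
                rho = np.clip(rho + per_image_attack(x + rho), -eps, eps)
        fooled = np.mean([classify(x + rho) != classify(x) for x in images])
        if fooled >= 1 - delta:
            break
    return rho

# Toy setting: linear binary classifier sign(w.x); the inner attack is the
# closed-form minimal l2 step, standing in for DeepFool.
w = np.array([1.0, 0.0])
classify = lambda x: 1 if w @ x > 0 else -1
per_image_attack = lambda x: -1.1 * (w @ x / (w @ w)) * w
images = [np.array([0.5, 0.0]), np.array([0.3, 0.0])]
rho = universal_perturbation(images, classify, per_image_attack, eps=1.0, delta=0.0)
```

A single ρ found this way fools the classifier on the whole sample, which is what makes the perturbation image-agnostic.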

Defensive Techniques.
There is a rich literature on defensive strategies. Here, we outline the major defensive strategies in five categories.

Adversarial Training.
By augmenting training samples with adversarial examples, adversarial training enhances the robustness of the network. Goodfellow et al. [5] and Huang et al. [19] evaluated their adversarial training only on the MNIST dataset, and adversarial training was thought to provide regularization for the DNNs. Kurakin et al. [11] presented a comprehensive analysis of adversarial training on the ImageNet dataset. Madry et al. [20] showed that alternating between retraining and the projected gradient descent (PGD) attack is one of the most effective methods of adversarial training, in which retraining uses the adversarial examples generated by the PGD attack. Tramèr et al. [14] proposed "ensemble adversarial training," in which the training set includes adversarial examples produced by the trained model itself and by pretrained external models to improve the robustness of the network to transferred examples.

Network Distillation.
As a training strategy, Hinton et al. [21] originally designed distillation to transfer knowledge from a complex network to a simpler network with the purpose of reducing the size of DNNs. In distillation, a high temperature increases the vagueness of the softmax output. Papernot et al. [22] applied this property and further showed that a high-temperature softmax decreases the sensitivity of the model to small perturbations. This result is harmful to the adversary, as attacks primarily rely on the sensitivity of the model. Thus, they proposed defensive distillation to improve the robustness of the model to adversarial examples. In subsequent work, Papernot and McDaniel [23] solved the numerical instabilities encountered in [22] to extend the defensive distillation method.

Adversarial Examples Reforming.
This defense reforms adversarial examples, aiming to mitigate adversarial perturbations before they reach the targeted network. Gu and Rigazio [24] proposed a variant of the autoencoder network; at inference, the network is used to encode adversarial examples to remove adversarial perturbations. Santhanam and Grnarova [25] used both the discriminator and the generator of a generative adversarial network (GAN) to project adversarial examples back onto the legitimate data manifold. In another GAN-based defense method, Samangouei et al. [26] used the generator to sanitize inputs prior to the targeted classifier. Xie et al. [27] randomly resized the adversarial examples and added random padding to the resized examples to reduce the effects of adversarial perturbations. Liao et al. [28] proposed the high-level representation-guided denoiser (HGD) to defend image classification models. The HGD is trained by optimizing a loss function that represents the difference between the target model's outputs on the clean image and the denoised image. Jia et al. [29] studied a preprocessing module to reform adversarial examples, termed ComDefend, which is composed of a compression convolutional neural network and a reconstruction convolutional neural network.

Adversarial Examples Detecting.
The purpose of this type of defense is to filter out adversarial examples. Metzen et al. [30] learned a small network as an auxiliary part of the original network to output the probability of the input example being adversarial. Grosse et al. [31] enabled their model to classify all adversarial examples into one special class by augmenting the targeted network with an additional class. From the Bayesian perspective that the uncertainty of adversarial data is higher than that of legitimate data, Feinman et al. [32] deployed a Bayesian neural network to estimate the uncertainty of input data so as to detect adversarial inputs. Xu et al. [33] introduced feature squeezing, an approach that detects adversarial examples by comparing the predictions of the targeted network on the original input and the squeezed input. Fan et al. [12] proposed an integrated detection framework comprising a statistical detector and a Gaussian noise injection detector to filter out adversarial examples with different perturbation characteristics.

Miscellaneous Approaches.
Owing to the great diversity of adversarial examples, multiple defense strategies can be integrated to defend against them. PixelDefend [34] and MagNet [35] combine an adversarial detector and an adversarial reformer to compose a defense scheme. Akhtar et al. [36] proposed a defense against UAP. They trained a Perturbation Rectifying Network (PRN) as "preinput" layers to a targeted model; the PRN acts as both a UAP detector and an input image reformer.

Detection of Adversarial Example Based on Two Detectors.
The proposed detection method employs a statistical detector and a minor alteration detector in two consecutive stages to adapt to different perturbation characteristics. Specifically, the tasks of the former and latter detectors are to inspect noticeable and unnoticeable perturbations, respectively; the two detectors are thus complementary. The detection procedure is shown in Figure 3. Following the subtractive pixel adjacency matrix (SPAM) model [37], we apply subsets of the transition probability matrix as the perturbation detection feature. Since SPAM is capable of highlighting statistical anomalies, and noticeable perturbations definitely introduce statistical anomalies into an adversarial image, examples containing noticeable perturbations are naturally expected to be detectable. Last, an ensemble of base Fisher linear classifiers is trained to make the decision. During testing, the SPAM-based feature is extracted from the input image and the ensemble classifier outputs the decision.
(1) SPAM-Based Feature Extraction. I = (R, G, B) represents a color image with a spatial size of m × n, where R = {r(i, j)}, G = {g(i, j)}, and B = {b(i, j)} denote the red, green, and blue channels, respectively. First, the difference arrays of the three channels in eight directions are computed. For instance, for the R channel, the difference array D_R^→(i, j) in the horizontal left-to-right direction (→) is calculated as follows:

D_R^→(i, j) = r(i, j) − r(i, j + 1).

The difference arrays for the other directions are calculated in the same way; β ∈ {→, ←, ↑, ↓, ↖, ↘, ↙, ↗} denotes the eight directions. Then, a second-order Markov process is used to model the differences and construct a transition probability matrix. Taking the red channel and the horizontal direction (→) as an example, the transition probability matrix is defined by the following formula:

M_R^→(u, v, w) = Pr(D_R^→(i, j + 2) = u | D_R^→(i, j + 1) = v, D_R^→(i, j) = w),

where −T ≤ u, v, w ≤ T, T is a preset truncation parameter (difference values are truncated to [−T, T]), and Pr denotes a conditional probability. Since u, v, and w each range from −T to T, each parameter takes 2T + 1 values and the size of the transition probability matrix M is (2T + 1)³, accordingly.
The transition probability matrices of the other directions and channels are defined in the same way. To reduce the dimension, the transition probability matrices of some directions are averaged; for the red channel,

F_R^1 = (1/4)(M_R^→ + M_R^← + M_R^↑ + M_R^↓), F_R^2 = (1/4)(M_R^↖ + M_R^↘ + M_R^↙ + M_R^↗).

The fused transition probability matrices are used as the feature, and the features of the three channels are denoted F_R, F_G, and F_B. The final feature F_M is the concatenation of F_R, F_G, and F_B. The dimension of the feature of each channel is 2(2T + 1)³. To obtain an optimal trade-off between efficiency and performance, we set T = 3, so the dimension of F_M is 686 × 3 = 2058.
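The feature extraction above can be sketched for one channel and one direction as follows (a simplified sketch: the full detector uses eight directions, three channels, and the averaged matrices described above):

```python
import numpy as np

def transition_probability_features(channel, T=3):
    """Second-order Markov transition-probability features for one
    channel in the left-to-right direction (SPAM-style sketch).
    """
    c = channel.astype(np.int64)
    # Difference array D(i, j) = c(i, j) - c(i, j + 1), truncated to [-T, T].
    D = np.clip(c[:, :-1] - c[:, 1:], -T, T)
    # Triples of horizontally adjacent differences (u at j+2, v at j+1, w at j).
    u, v, w = D[:, 2:], D[:, 1:-1], D[:, :-2]
    size = 2 * T + 1
    M = np.zeros((size, size, size))
    counts = np.zeros((size, size))          # occurrences of each (v, w) pair
    np.add.at(M, (u + T, v + T, w + T), 1)
    np.add.at(counts, (v + T, w + T), 1)
    # Conditional probability Pr(u | v, w); (v, w) pairs never seen stay 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        M = np.where(counts[None, :, :] > 0, M / counts[None, :, :], 0.0)
    return M.ravel()                          # (2T + 1)^3 = 343 values for T = 3
```

For each observed context (v, w), the entries over u form a probability distribution, which is what makes statistical anomalies introduced by noticeable perturbations visible in the feature.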
To provide an intuitive understanding of SPAM-based feature extraction, a legitimate image and its corresponding FGSM-attacked adversarial image were selected to illustrate the significant differences between their F_M features (see Figure 4).
(2) Ensemble Binary Classifiers. Ensemble classifiers [38] consist of multiple base classifiers independently trained from a set of positive and negative samples. As a base classifier, each Fisher classifier is trained on a random subspace of the entire feature space. The symbol L stands for the number of base classifiers. For the ith base classifier, the corresponding random subspace is denoted D_i, i = 1, ..., L. Then, we train a base classifier B_i on the features of the positive and negative samples using the Fisher linear discriminant (FLD). For a test feature y, the decision of the ith base learner is B_i(y^(D_i)). After collecting all L decisions, the final classifier output B(y) is formed by combining them with a majority voting strategy, where 1 stands for the positive class and 0 for the negative class:

B(y) = 1 if Σ_{i=1}^{L} B_i(y^(D_i)) > L/2, and B(y) = 0 otherwise.

We notice that if a legitimate example undergoes a minor alteration, the classification result given by the targeted network remains relatively unchanged, whereas a minor transformation of an adversarial example containing unnoticeable perturbations has a significant effect on the classification result. The reason may be that the legitimate example is located on the manifold of its ground truth (GT) class, so a slight bias does not critically influence the result. An adversarial example with unnoticeable perturbations, by contrast, is still close to the manifold of its GT class, but the perturbations are easily damaged by minor transformations, so its classification result most probably changes once it undergoes minor alterations.

Figure 3: An overview of the detection of adversarial examples. First, the input example is fed into the statistical detector. If it is not determined to be an adversarial example with noticeable perturbations, it is further analyzed by the minor alteration detector. Specifically, the input example is altered by four minor operations, and the original input and its four altered counterparts are all fed into the targeted network. Then, the ℓ1 norm difference between the outputs corresponding to the original input and each of the four alterations is calculated. Finally, the maximum of the four differences is compared with a threshold T. If the maximum exceeds the threshold, the input example is detected as an adversarial example with unnoticeable perturbations; otherwise, it is judged legitimate.

Depending on this observation and the need to adapt to a range of adversarial perturbation characteristics, we devise four minor alterations to images and a straightforward yet effective max fusion rule: taking the maximum of the four output differences between the image before and after alteration, where the output is given by the targeted network. If the maximum exceeds a threshold, the input sample is classified as an adversarial example with unnoticeable perturbations; otherwise, it is a legitimate example. This detection process is referred to as a minor alteration detector (see "Minor alteration detector" in Figure 3). Obviously, the statistical detector and the minor alteration detector function in a complementary way to detect adversarial examples. Specifically, the four alterations operate as follows: remove&interpolation (ri): remove a small number of rows and columns at fixed positions in the image and then recover the removed rows and columns by interpolation (see Figure 5(a)); remove&expansion (re): remove a small peripheral part and then expand the remaining part to the original image size (see Figure 5(b)); rotate-clockwise (rc): rotate clockwise by a small angle around the geometric center of the image; rotate-anticlockwise (ra): rotate anticlockwise by a small angle around the geometric center of the image.
Computing the difference is another critical issue. We choose the prediction probability distribution vector as the output and use the ℓ1 norm to measure the output difference d(x, x_alt):

d(x, x_alt) = ‖g(x) − g(x_alt)‖₁,

where g(x) and g(x_alt) denote the prediction probability distribution vectors of the targeted network for the original input x and its minorly altered version x_alt, respectively. Since g(·) outputs probability vectors, the range of d(x, x_alt) is from 0 to 2, with a higher value indicating a greater difference. We expect the difference to be as small as possible for legitimate input and as large as possible for adversarial input. Figure 6 shows some examples of a legitimate image and adversarial images produced by the DeepFool, CW_UT, and CW_T attacks.
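The max fusion rule can be sketched as follows; `predict` and the alteration functions below are toy stand-ins for the targeted network and the four geometric operations (ri, re, rc, ra):

```python
import numpy as np

def minor_alteration_score(x, alterations, predict):
    """Max fusion rule: the largest l1 distance between the network's
    probability vector on the original input and on each altered copy."""
    p = predict(x)
    return max(np.abs(p - predict(alt(x))).sum() for alt in alterations)

def is_adversarial(x, alterations, predict, threshold):
    """Flag the input as adversarial (unnoticeable perturbations) when
    the max-fused difference exceeds the decision threshold."""
    return minor_alteration_score(x, alterations, predict) > threshold

# Toy stand-ins: a two-class softmax "network" and two alterations,
# one harmless (identity) and one that changes the prediction.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

predict = lambda x: softmax(np.array([x.sum(), -x.sum()]))
alterations = [lambda x: x, lambda x: -x]
score = minor_alteration_score(np.array([0.5, 0.5]), alterations, predict)
```

Because the score is bounded by 2, the decision threshold can be tuned on a held-out set at a fixed acceptable false positive rate, as done in the experiments.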

The Deep Residual Generative Network to Clean Adversarial Perturbations.
Cleaning adversarial perturbations is also a feasible defense scheme. In this paper, we utilize our previously proposed network, ResGN, to reform adversarial examples. The network, built with residual blocks, is conditionally generative and is trained in a supervised way. The supervision consists of pairs of a legitimate image and the corresponding adversarial image, where the adversarial images are generated by white box attacks on a certain targeted network. The optimization of ResGN is driven by minimizing a joint loss composed of a pixel loss, a texture loss, and a semantic loss, in which the latter two losses depend on a pretrained network independent of the targeted network (see Figure 8). The specific structure of ResGN and its detailed training algorithm are described in [10].
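The structure of the joint objective can be sketched as follows; this is an illustrative sketch, not ResGN's implementation: the feature extractor `feat_fn` stands in for the pretrained network (vgg-19 in our experiments), and the weights are placeholder values:

```python
import numpy as np

def gram(features):
    # features: (C, H*W) -> channel co-occurrence (Gram) matrix, used
    # as a simple texture statistic
    return features @ features.T / features.shape[1]

def joint_loss(clean, restored, feat_fn, w_pix=1.0, w_tex=1.0, w_sem=1.0):
    """Joint loss in the spirit of ResGN's pixel + texture + semantic
    objective: pixel MSE, MSE between Gram matrices of features, and
    MSE between the features themselves."""
    pixel = np.mean((clean - restored) ** 2)
    f_c, f_r = feat_fn(clean), feat_fn(restored)
    texture = np.mean((gram(f_c) - gram(f_r)) ** 2)
    semantic = np.mean((f_c - f_r) ** 2)
    return w_pix * pixel + w_tex * texture + w_sem * semantic

# Toy usage with an identity "feature extractor" over (C, H, W) images.
feat = lambda img: img.reshape(3, -1)
clean = np.ones((3, 4, 4))
loss = joint_loss(clean, clean, feat)     # identical images -> zero loss
```

Using a pretrained network independent of the targeted network for the texture and semantic terms is what keeps the cleaned image perceptually and semantically close to the legitimate one rather than merely close in pixel space.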

The Complete Defense Framework.
Thus far, we have discussed two defense methods: adversarial example detection and adversarial perturbation cleaning. Accordingly, we can implement three defense patterns: the integration of a nonadversarially trained targeted network with adversarial example detection, with adversarial perturbation cleaning, or with both (see Figures 9(a)∼9(c)). Apart from these three patterns, an adversarially trained targeted network is itself a well-known defensive technique [11], in which the input samples are directly classified by such a network (see Figure 9(d)). Naturally, other defensive options are to replace the nonadversarially trained targeted network in these patterns with an adversarially trained one. These considerations suggest the use of an adversarially trained targeted network rather than a nonadversarially trained one. In addition, the joint use of the detection and cleaning modules is expected to significantly boost the performance of the adversarially trained targeted network alone. This led us to combine the two proposed detectors, the ResGN, and an adversarially trained targeted network into a complete defense framework (see Figure 9(g)).

Experiment Setup.
In our experiments, we used a PC equipped with an i7-6850K 3.60 GHz CPU and an NVIDIA TITAN X GPU. The development environment is TensorFlow [39]. We chose Inception-v3 [40] and adv-Inception-v3 [11] as the targeted network and the adversarially trained network, respectively. From the ImageNet validation dataset, 5000 legitimate images correctly classified by the pretrained Inception-v3 model were selected, of which 4000 images form the training set and 1000 images constitute the testing set. All adversarial examples were generated from the legitimate images using the attack implementations from the Cleverhans library [41]. Table 1 lists the parameters of the attacking methods mentioned in Section 2.1, and Table 2 shows the classification accuracy of Inception-v3 on their corresponding adversarial examples.

Results of Statistical Detector.

These results motivated us to separate all samples into two groups in terms of perturbation significance: one group includes the FGSM, R-FGSM, BIM, and UAP adversarial examples with noticeable perturbations, and the other group contains the adversarial examples with unnoticeable perturbations produced by DeepFool, CW_UT, and CW_T, together with the legitimate examples. Accordingly, the former and latter groups constitute the positive and negative training sets, respectively. Table 4 lists the overall true positive rate (TPR) and false positive rate (FPR) on the testing set, which are 99.6% and 0.6%, respectively. These results confirm that the statistical detector achieves promising performance.
The parameter ε in the FGSM, R-FGSM, and BIM attacks controls the attacking strength. In a real setting, attackers may produce adversarial examples using various attacking strengths. Thus, we designed this experiment to explore the transfer ability of the statistical detector. The results in Table 5 show that a detector learned from examples with a given ε performs well on testing sets with the same or larger ε, whereas the performance decreases when the testing ε is weaker. Thus, on average, the detector learned from the sample set with ε = 8/255 performs best (99.4%) and is supposed to have the best transfer ability. These results validate that more strongly attacked examples are much easier to detect, since the statistical anomalies hidden in them are more evident than those in more weakly attacked examples. Although the higher transfer ability is obtained at the cost of a slightly increased FPR (1.7%), we still favor the detector learned from the sample set with ε = 8/255 as the ultimate statistical detector owing to its satisfactory TPR.

Results of Minor Alteration Detector.
The performance of the statistical detector in terms of detecting the adversarial examples produced by FGSM, R-FGSM, BIM, and UAP is significantly high. Unfortunately, it cannot reliably distinguish the adversarial examples produced by DeepFool, CW_UT, and CW_T from legitimate examples. This experiment was therefore intended to evaluate the capability of the minor alteration detector to detect these three types of adversarial examples with unnoticeable perturbations.
First, we selected an optimal parameter for each alteration from the candidate parameters. We calculated the AUC value of the ROC curve (ROC-AUC) for all candidate parameters (see Table 6); the values in bold indicate the top value and the corresponding parameter. Then, an optimal decision threshold needed to be determined. Because a detector with a high TPR at the cost of a high FPR is meaningless, we chose 5% as the acceptable FPR. Thus, the optimal decision threshold is the value corresponding to a 5% FPR. Note that the optimal parameters for all alterations and the final optimal decision threshold are derived from the training set. The performance of the different single alterations is also compared by evaluating the TPR and FPR for the four alterations, in addition to the max fusion rule. Table 7 lists all corresponding thresholds; the optimal TPR (94.8%) and FPR (4.7%) are obtained by the max fusion rule. These results confirm that the max fusion rule has an advantage over the single alterations.

Results of Combination of Two Detectors.
The combined results show that the combination of the statistical detector and the minor alteration detector (shown in Figure 3) enables all seven types of adversarial examples to be detected. A promising trade-off between TPR (97.6%) and FPR (6.3%) was obtained by combining the two detectors (Table 8).
We next compare the performance of the combination of the two detectors with that of the integrated detector [12] and the feature squeezing detector [33]. The proposed detector achieves the highest TPR on adversarial examples produced by FGSM, R-FGSM, BIM, and UAP. However, its performance on adversarial examples produced by DeepFool, CW_UT, and CW_T is slightly weaker than that of the integrated detector. Although the feature squeezing detector achieves a higher TPR on CW_UT and CW_T adversarial examples than ours, the proposed detector performs better on the other types of adversarial examples. It is worth mentioning that the proposed detector obtains a lower FPR (6.3%). In general, the performance of the proposed detector in detecting adversarial examples is satisfactory.

Results of Optimization of ResGN.
We discovered that increasing the number of residual blocks improves the performance of ResGN; nevertheless, this comes at the expense of a considerable increase in computational complexity. To maintain a balance, we use 24 residual blocks in ResGN. Considering the adaptability of ResGN, vgg-19 is selected as the pretrained network during training rather than the Inception-v3 or adv-Inception-v3 network. Finally, the results in Table 9 confirm the effectiveness of ResGN optimized on FGSM adversarial examples.

Results of Complete Defense Framework.

We constructed testing sets in which the proportion of adversarial examples was increased in increments of 10 percent. The adversarial examples contained in each of the testing sets vary with respect to their type and attacking strength. Furthermore, we compared the performance of our complete method with that of the RANDOMIZATION [27], HGD [28], and ComDefend [29] techniques. In Figure 11, the accuracy is plotted as a function of the proportion for the three forms of defense, details of which are provided in the legend. Instead of Inception-v3, adv-Inception-v3 was used as the targeted network, and its ability to function alone as a form of defense is shown in Table 10. The experimental results indicate, first, that detection alone or the combination of detection and ResGN, working collaboratively with Inception-v3 or adv-Inception-v3, yields performance superior to that of the other defense methods; second, the combination of our proposed defense methods with adv-Inception-v3 remarkably improves the performance of adv-Inception-v3 alone (see Table 10). These results verify that ResGN meaningfully improves the performance of an adversarially trained network because it is at least capable of mitigating adversarial perturbations, even though the perturbations are impossible to remove perfectly owing to their diversity.
Finally, the performance of the joint use of detection and ResGN with the targeted network is expected to outperform that of ResGN alone. Let T_d, T_c, and T_r denote the per-sample times for detection, adversarial perturbation cleaning, and classification by the targeted network, respectively, and let s denote the proportion of adversarial examples. Then, for the proposed complete defense, the total time required for recognizing N samples is N × (T_d + s × T_c + T_r). For the defense excluding detection, the total time required is N × (T_c + T_r), since every sample is cleaned. Consider the special case of the complete defense where s equals 0: the required time becomes N × (T_d + T_r). In this case, if detection is more efficient than adversarial perturbation cleaning, that is, T_d < T_c, the complete defense is more efficient than the defense excluding detection, and vice versa. Moreover, as s increases, the required time increases. In sum, the computational complexity of the complete defense depends on two factors: the computational complexity of the detection module and the proportion of adversarial examples. The proportion of adversarial examples is beyond our control, so the efficiency of detection is crucial. For each image, the average time required by our proposed framework is 1.5 seconds, of which the detection module accounts for 1.0 second. Therefore, because the detection module in our work consumes more time than the adversarial perturbation cleaning module and the adversarially trained targeted network, more efficient detectors call for study in future work.
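The runtime model above can be written down directly; the sketch below encodes the two formulas, with the per-sample times passed in as parameters (the numbers used in the test are illustrative, not measurements):

```python
def total_time(n, s, t_d, t_c, t_r, with_detection=True):
    """Expected total time for recognizing n samples.

    s:   proportion of adversarial samples in the input stream
    t_d: per-sample detection time
    t_c: per-sample perturbation-cleaning time (ResGN)
    t_r: per-sample classification time (targeted network)
    """
    if with_detection:
        # Cleaning runs only on the fraction s flagged as adversarial.
        return n * (t_d + s * t_c + t_r)
    # Without detection, every sample is cleaned before classification.
    return n * (t_c + t_r)
```

Comparing the two branches shows the break-even condition: detection pays off exactly when T_d < (1 − s) × T_c, which reduces to the paper's T_d < T_c condition at s = 0.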

The Complete Defense Aware Attack.
We evaluate the robustness of the complete defense framework in a totally white box setting. The adversary is aware of the complete defense framework and includes it in generating adversarial examples. The entire attacking process is explained in Algorithm 1. We consider two types of white box attacks: the single-step attack FGSM and the iterative attack BIM. The adversarial examples generated by using them to attack Inception-v3 and the complete defense framework are illustrated in Figure 12. More serious color or texture distortions are induced by attacking the complete defense than by attacking Inception-v3 alone, and the differences can be observed for FGSM at the global level and for BIM at the local region level. Table 11 shows the success rates of FGSM and BIM attacking Inception-v3 and the complete defense framework on the testing set. The success rates of attacking the complete defense framework are much lower than those of attacking Inception-v3. Within our proposed complete defense itself, the success rate of the single-step attack is higher than that of the iterative attack. Owing to the involvement of detection, the single-step adversary almost exclusively attacks the adversarially trained targeted network, whereas in the iterative attack, the adversary must attack the combination of adversarial perturbation cleaning and the adversarially trained targeted network. The results confirm that the complete defense maintains its robustness in the totally white box setting.
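For reference, the single-step FGSM attack discussed above computes x_adv = x + ε · sign(∇_x L). A minimal sketch of that update is given below; the gradient would normally come from backpropagation through the attacked model (here it is simply passed in), and the [0, 1] pixel range is an assumption:

```python
import numpy as np

def fgsm(x, grad, eps):
    """One FGSM step: perturb x in the sign direction of the loss
    gradient with step size eps, then clip to the valid pixel range."""
    x_adv = x + eps * np.sign(grad)
    # Keep the adversarial image within [0, 1].
    return np.clip(x_adv, 0.0, 1.0)
```

BIM simply repeats this update with a smaller step size and re-clips after each iteration, which is why, against the complete defense, the iterative adversary must differentiate through cleaning and classification jointly.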

Conclusions
We propose a complete defense framework comprising three modules: adversarial example detection, adversarial perturbation cleaning, and an adversarially trained targeted network. Specifically, if an input sample is detected to be adversarial, the sample is cleaned by ResGN and then classified by the adversarially trained targeted network; otherwise, the sample is directly classified by the adversarially trained targeted network. Furthermore, detection is accomplished by two complementary detectors adapted to the characteristics of adversarial perturbations: the statistical detector filters out adversarial examples with noticeable perturbations, and the minor alteration detector filters out adversarial examples with unnoticeable perturbations. In future work, we expect to extend the proposed complete defense framework to other applications, such as face recognition. Furthermore, we aim to dynamically optimize the proposed defense method to incrementally boost its capability to counteract adversarial examples.
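The control flow of the framework summarized above can be sketched in a few lines. The three callables are placeholders standing in for the detectors, ResGN, and the adversarially trained network, not the actual implementations:

```python
def defend_and_classify(x, detect, clean, classify):
    """Framework dispatch: detect, optionally clean, then classify.

    detect:   returns True if x is judged adversarial
    clean:    perturbation removal (ResGN stand-in)
    classify: adversarially trained targeted network stand-in
    """
    if detect(x):
        # Only flagged inputs pay the cleaning cost.
        x = clean(x)
    # Every input ends at the adversarially trained network.
    return classify(x)
```

This dispatch is also what makes the runtime analysis in the experiments hold: cleaning runs only on the fraction of inputs the detectors flag.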

Data Availability
The ImageNet dataset used in the experiments is public. Please refer to the corresponding project website to download the dataset. The source code of the proposed method is available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.