From Spatial to Spectral Domain, a New Perspective for Detecting Adversarial Examples

Deep neural networks (DNNs) have been likened to Pandora's box since their birth. Although they achieve high accuracy in real-world tasks (e.g., object detection and speech recognition), they still retain fatal vulnerabilities and flaws. Malicious attackers can cause a DNN model to misclassify just by adding tiny perturbations to the original image. These crafted samples are called adversarial examples. One effective defense is to detect them before feeding them into the model. In this paper, we delve into the representation of adversarial examples in the original spatial domain and in the spectral domain. Qualitative and quantitative analysis confirms that the high-level representation and high-frequency components of abnormal samples contain richer discriminative information. To further explore how the two factors influence each other, we perform an ablation study, and the results show a win-win effect. Building on this finding, we propose a detection method (HLFD) based on extracting high-level representation and high-frequency components. In a series of experiments conducted on MNIST, CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet, we achieve better detection performance than other state-of-the-art detection methods in most scenarios. In particular, we improve detection rates by a large margin on DeepFool and CW attacks.


Introduction
Exploring the inherent patterns and changing trends of data is an eternal subject for all human beings. The traditional way of processing data is driven by rules obtained through experience or manual summary. The advent of DNNs makes it possible to process data automatically. Owing to this characteristic, DNNs have recently been widely applied in data-sensitive fields, such as financial payment [1], medical assistance [2], and satellite remote sensing [3]. However, the benefit of automation also brings the property of a black box [4][5][6], which means almost everything inside the network is unknown to us. Against this background, Goodfellow et al. found in [7] that adversarial examples, produced by adding crafted perturbations to benign images and imperceptible to a human observer, were able to make a DNN model misclassify with high confidence. There is no doubt that defending against these out-of-distribution samples has become a top priority.
Before implementing adversarial defenses, exploring the intrinsic properties of adversarial examples is crucial. At present, there are two main explanations for the appearance of adversarial samples: low-probability regions in the manifold [8] and linear explanations [7]. These studies all focused on the distribution of spatial probability (i.e., statistical probability) to justify their explanations. However, recent studies [9][10][11][12][13][14][15] indicated that adversarial examples are mainly concentrated in the high-frequency region. Moreover, Ilyas et al. further illustrated in [16] that adversarial examples are not even bugs. They are non-robust features, which means a DNN model can learn them through training.
Inspired by these studies, we set out to explore how the representation of adversarial examples differs between the original spatial domain and the spectral domain. Here, the spatial and the spectral domain refer to the original samples and the original samples after Fourier transform, respectively. Visually, we observe that adversarial examples have many small black dots in the mid-high frequency region. By performing cluster analysis, we further discover that adversarial examples can be better separated in the spectral domain. To improve the detection performance, we further analyze the impact of different network layers and frequency bands. The experimental results demonstrate that extracting high-level representation and high-frequency components improves the detection performance significantly.
In this paper, we propose the detection method HLFD, which detects abnormal samples by extracting high-level representation and high-frequency components. Using the high-level feature maps of the model as input, we transform them to the spectrum by Fourier transform and extract the high-frequency components. A detector trained on these transformed data achieves strong detection performance. Compared with other defense methods, HLFD has two advantages: it does not need to alter the network architecture, and it has a lower computational cost. The overview of the detection model is shown in Figure 1.
We evaluate our method on six different attacks: the fast gradient sign method (FGSM) [7]; two of its variants, the basic iterative method (BIM) [17] and projected gradient descent (PGD) [18]; the Jacobian-based Saliency Map Attack (JSMA) [19]; Carlini and Wagner (CW) [20]; and DeepFool [21]. Using only one of the two techniques, high-frequency extraction or high-level representation, cannot achieve the ideal detection performance on DeepFool and CW, although it performs well on FGSM, BIM, PGD, and JSMA attacks. Considering that high-level representation and high-frequency components may affect each other, we further perform an ablation study on the two factors. The experimental results show a win-win effect: performance is better when the two techniques are applied together. DeepFool and CW attacks can be detected effectively by employing the two factors simultaneously. For a more rigorous conclusion, the detector is evaluated on five datasets: MNIST, CIFAR-10, CIFAR-100, SVHN, and Tiny-ImageNet (aka T-ImageNet).
For a fair comparison, adversarial examples are restricted to similar $L_2$ norm values, and we compare against three state-of-the-art detectors: kernel density and Bayesian uncertainty (KD + BU) [22], local intrinsic dimensionality (LID) [23], and Mahalanobis distance (M-D) [24]. Our method outperforms the other three detection methods for each attack on CIFAR-10 and CIFAR-100. In the scenarios where our detection rate is not the highest, the gap is small. Moreover, we improve the detection rates by a large margin on DeepFool and CW attacks.
In particular, our main contributions are as follows: (i) We show intuitively and experimentally that the spectral samples have richer discriminative details, which can be effectively distinguished by the detector. (ii) The ablation study shows that the detection performance can be improved effectively by using either high-level representation or high-frequency extraction. (iii) An effective method for detecting adversarial examples is proposed, which performs better in most scenarios compared to the state of the art.

Related Work
In this section, we briefly introduce several state-of-the-art methods for adversarial attacks and adversarial defenses.

Basic Iterative Method (BIM) [17]
In most cases, adversarial samples generated solely by the FGSM method are ineffective. As a variant of FGSM, BIM performs multiple gradient computations along the direction of the loss function and can be represented as follows:

$$X^{adv}_0 = X, \qquad X^{adv}_{N+1} = \mathrm{Clip}_{X,\varepsilon}\big\{X^{adv}_N + \alpha \cdot \mathrm{sign}\big(\nabla_X J(X^{adv}_N, Y_{true})\big)\big\},$$

where N and α refer to the number of iterations and the step size of each iteration, respectively, and $J(X, Y_{true})$ is the cross-entropy loss function on a given image X and label $Y_{true}$. The norm of $(X^{adv}_{N+1} - X)$ is limited to ε by the clipping function.

Projected Gradient Descent (PGD) [18]
PGD is an advancement of BIM that initializes the starting point with uniform random noise.

Jacobian-Based Saliency Map Attack (JSMA) [19]
Utilizing the Jacobian matrix, Papernot et al. put forward the Jacobian-based Saliency Map Attack (JSMA). It mainly uses the prior probability of the last layer to backpropagate, thereby obtaining the corresponding gradient information for the class t to be attacked. By constructing a saliency map S, the pixels that contribute the most to the result can be found. Adversarial examples can then be generated by exploiting this information.

Carlini and Wagner (CW) [20]
CW abstracts the adversarial attack task into an optimization problem. Its key idea is to keep searching for the smallest adversarial perturbation while guaranteeing that the model misclassifies. The formulation is built on Z(x), the output of the pre-softmax layer. The tanh(·) function is used to map $X^{adv}$ to [−1, 1], thereby avoiding the loss caused by truncation.

DeepFool [21]
Compared with other attack methods, DeepFool is known for its minimal disturbances, which makes adversarial examples harder to detect. It stops searching as soon as the samples cross the decision boundary, as described by the formula below:

$$\Delta(X, X^{adv}) = \arg\min_{Z} \|Z\|_2 \quad \text{subject to} \quad g(X + Z) \neq g(X),$$

where Z is the smallest perturbation. We apply the Python tool Foolbox [25] to generate adversarial examples for all attack methods. An intuitive comparison of the various attack methods is presented in Figure 2.
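The iterative update used by BIM, as described in this section, can be illustrated with a minimal NumPy sketch. It is not the Foolbox implementation used in the paper: the linear softmax classifier, the helper names (`input_gradient`, `bim`), the elementwise clipping to an ε-box, and the [0, 1] pixel range are all assumptions chosen so that the input gradient has a closed form.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(W, b, x, y_true):
    # J(x, y_true) for a toy linear softmax model (stand-in for a DNN)
    return -np.log(softmax(W @ x + b)[y_true])

def input_gradient(W, b, x, y_true):
    # Closed-form gradient of the cross-entropy loss with respect to the input x
    p = softmax(W @ x + b)
    p[y_true] -= 1.0
    return W.T @ p

def bim(W, b, x, y_true, eps=0.1, alpha=0.01, n_iter=20):
    # Hypothetical helper: repeat a signed-gradient step, then clip
    x_adv = x.copy()
    for _ in range(n_iter):
        g = input_gradient(W, b, x_adv, y_true)
        x_adv = x_adv + alpha * np.sign(g)          # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # assumption: elementwise eps-box
        x_adv = np.clip(x_adv, 0.0, 1.0)            # assumption: pixels live in [0, 1]
    return x_adv

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 8)), np.zeros(3)
x = rng.uniform(0.2, 0.8, size=8)
y = int(np.argmax(W @ x + b))                       # label the model currently predicts
x_adv = bim(W, b, x, y)
```

Starting the same loop from `x` plus uniform random noise inside the ε-box would give a PGD-style variant under the same assumptions.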

Adversarial Defense and Detection.
In general, adversarial defenses can be divided into two main categories. One strategy is to modify the architecture or parameters of the network, and the other is to defend by preprocessing images before feeding them into the model. Adversarial training, which belongs to the first strategy [26,27], has achieved great success in the defensive area. The model is retrained on normal and abnormal samples to learn the details of the decision boundary, thereby avoiding misclassification and gaining stronger robustness. Nonetheless, its need for massive data and its ineffectiveness against specialized attacks have made it less attractive. Adversarial distillation [28] was proposed soon afterwards. Although experiments conducted on small datasets showed that it can defend against adversarial examples effectively, it can only be used in DNN models that output probability distribution vectors. Methods belonging to the second strategy, such as JPG image compression [29], rejecting classification [30], and detection [22][23][24][31][32][33], try to eliminate the abnormal statistical characteristics before they poison the model. Defense methods that do not touch the training process are undoubtedly more attractive.
As one of the approaches in the defensive field, adversarial detection has attracted the attention of scholars due to its higher flexibility and lower computational cost. Sample statistics and training a detector are the two main routes. Exploiting statistical properties, Feinman et al. studied the kernel density (KD) and Bayesian uncertainty (BU) in the hidden layers of the model and proposed an effective detection method in [22]. Ma et al. further applied local intrinsic dimensionality (LID) in [23] to describe the intrinsic characteristics of adversarial subspaces. Considering that the information of the last layer may not be enough to judge out-of-distribution data, Lee et al. in [24] made full use of each layer of the DNN and obtained a detector by calculating the Mahalanobis distance (M-D). Hendrycks and Gimpel indicated in [34] that samples with a large principal component had higher weights for attacking successfully. Still, the latest research showed that DNNs were sensitive to the direction of the Fourier basis function. In [9,10], it was found that the high-frequency components of adversarial examples seriously affected the robustness of the model. On the assumption that each layer of the DNN obeyed the generalized Gaussian distribution, Ma et al. in [35] calculated the Benford-Fourier coefficients of each layer, thereby obtaining a support vector machine with an ideal detection performance.

Methodology
In this section, we introduce in detail the detection mechanism used to identify adversarial examples.

An adversarial attack aims to make the model misclassify while keeping the disturbance as small as possible. Suppose $X \in \mathbb{R}^n$ is an n-dimensional input image, $Z \in \mathbb{R}^n$ is a perturbation of X, and M(·) is a model trained on X. If $M(X) \neq M(X + Z)$, we define $X^{adv} = X + Z$ as the adversarial example specific to the model M(·) and the normal image X. However, if the perturbation Z is so large that neither the model nor the human eye can correctly identify the image, the adversarial example $X^{adv}$ has no practical significance. Hence, the objective can be expressed as

$$\min_{Z} \|Z\|_p \quad \text{subject to} \quad M(X) \neq M(X + Z), \qquad \|Z\|_p = \Big(\sum_{i=1}^{n} |Z_i|^p\Big)^{1/p},$$

where $Z_i$ is the value of the ith dimension of Z and $\|Z\|_p$ is the $L_p$ norm of Z. In general, the norm is used to limit the growth of the perturbation. In this paper, we apply the $L_2$ norm for all adversarial attacks.

For a detection task, suppose D(·) is a detector, which is essentially a binary classifier. An ideal detector classifies normal images as label 0 (i.e., $D(X) = 0$) and abnormal images as label 1 (i.e., $D(X^{adv}) = 1$). We formulate this objective as follows:

$$\max_{D} \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\big[D(X_i) \neq D(X_i^{adv})\big], \qquad (7)$$

where m refers to the number of samples and D(X) represents the result of feeding X into the detector D. The indicator returns true if $D(X_i) \neq D(X_i^{adv})$ and false otherwise. D(·) is exactly the detector we seek, and the maximized value is employed as one of our evaluation metrics, which is also called detection accuracy.

According to (7), our objective is to make the detector D identify as many samples as possible. In general, there are two strategies to achieve this purpose: transform the input data, or alter the internal structure of the detector. Through the subsequent experiments shown in Figure 8, we found that altering the detector model does not improve the detection performance significantly. Hence, feature engineering on the input data becomes our principal subject.
Surprisingly, a small attempt to extract high-level representation from the raw data overcomes this technical difficulty. As Harder et al. illustrated in [11], high-level representation has more stable and robust discriminative details for adversarial detection.
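The detection objective described above can be made concrete with a short sketch. The function names are hypothetical; `pairwise_detection_score` follows the pairwise reading of (7) given in the text (a pair counts when the detector outputs differ), while `detection_accuracy` is the ordinary binary accuracy with normal samples labeled 0 and adversarial samples labeled 1. Whether the reported ACC corresponds to one or the other is an assumption the text does not settle.

```python
import numpy as np

def pairwise_detection_score(d_normal, d_adv):
    # Fraction of pairs (X_i, X_i^adv) on which the detector outputs differ
    d_normal, d_adv = np.asarray(d_normal), np.asarray(d_adv)
    return float(np.mean(d_normal != d_adv))

def detection_accuracy(d_normal, d_adv):
    # Ordinary accuracy: normal samples should get 0, adversarial samples 1
    d_normal, d_adv = np.asarray(d_normal), np.asarray(d_adv)
    correct = np.sum(d_normal == 0) + np.sum(d_adv == 1)
    return float(correct) / (d_normal.size + d_adv.size)

# Toy detector outputs for four normal images and their adversarial versions
d_normal = [0, 0, 1, 0]
d_adv = [1, 1, 1, 0]
pair_score = pairwise_detection_score(d_normal, d_adv)
accuracy = detection_accuracy(d_normal, d_adv)
```

In this toy case two of the four pairs differ, and six of the eight individual decisions are correct, so the two quantities need not coincide.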

High-Level Representation. As shown in
Suppose M(·) is a DNN model trained on X; we can obtain the mth feature map simply by calculating $M_m(X)$. Using $M_m(X)$ instead of X as the input data, (7) attains a higher value, which means better detection accuracy for the same training time.
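As a rough illustration of what computing $M_m(X)$ involves, the sketch below runs a toy fully connected network only up to its mth layer and returns that activation. The network, its random weights, and the helper names `make_toy_model` and `feature_map` are assumptions for illustration; the paper's models are trained convolutional networks whose feature maps are two-dimensional.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def make_toy_model(rng, sizes):
    # Hypothetical stand-in for a trained DNN: one (weights, bias) pair per layer
    return [(rng.normal(scale=0.5, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def feature_map(model, x, m):
    # M_m(x): activation after the first m layers (m = 0 returns the raw input)
    h = x
    for W, b in model[:m]:
        h = relu(W @ h + b)
    return h

rng = np.random.default_rng(0)
model = make_toy_model(rng, sizes=[16, 12, 8, 4])
x = rng.uniform(0.0, 1.0, size=16)
high_level = feature_map(model, x, m=2)   # this, not x, would feed the detector
```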

Fourier Transform and High-Frequency Extraction
3.3.1. Fourier Transform. In general, the Fourier transform is used to convert signals between the time domain (or spatial domain) and the frequency domain. After converting data into the spectrum, multiple characteristics hidden in the spatial domain are revealed. The low-frequency components correspond to slowly changing regions (i.e., the flat regions), while the high-frequency components correspond to rapidly changing regions (i.e., the edges or noise). Exploiting these properties, we can obtain blurred or edge-sharpened images by suppressing the high-frequency or low-frequency components, respectively. Still, unlike continuous mathematical signals, images are discrete data consisting of pixels, which means we must convert them using the discrete Fourier transform (DFT). For a low computational cost, we employ the fast Fourier transform (FFT) [36], which has a time complexity of O(N · log(N)). Suppose an image $X \in [0, 255]^{M \times N}$, where M and N represent the width and height of the image, respectively. We can acquire the Fourier coefficients by the following formula:

$$F(X)(l, k) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} X(m, n)\, e^{-j 2\pi \left(\frac{l m}{M} + \frac{k n}{N}\right)},$$

where $l = 0, 1, \ldots, M - 1$, $k = 0, 1, \ldots, N - 1$, and X(m, n) refers to the pixel value at the coordinate (m, n). F(X) is a complex matrix with the same size as the image X. The magnitude matrix |F(X)| is acquired by calculating the following formula:

$$|F(X)| = \sqrt{\mathrm{Real}(F(X))^2 + \mathrm{Imag}(F(X))^2},$$

where Real(·) and Imag(·) refer to the real and imaginary parts, respectively. In subsequent experiments, the magnitude of the spectrum |F(X)| is used to represent the spectral domain.
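The magnitude spectrum described here can be computed with NumPy's FFT routines. The sketch below is only an illustration: the function name `magnitude_spectrum` and the toy inputs are assumptions, and it handles a single-channel array.

```python
import numpy as np

def magnitude_spectrum(x):
    # 2-D DFT via the FFT, then the magnitude from the real and imaginary parts
    f = np.fft.fft2(x)
    return np.sqrt(f.real ** 2 + f.imag ** 2)

# A constant image has no variation, so all energy sits at zero frequency
flat = np.ones((4, 4))
flat_mag = magnitude_spectrum(flat)

# A random image spreads energy over many frequencies
rng = np.random.default_rng(0)
noisy = rng.uniform(0, 255, size=(8, 8))
noisy_mag = magnitude_spectrum(noisy)
```

The zero-frequency entry `noisy_mag[0, 0]` equals the sum of all pixel values, which is a quick sanity check when wiring this into a larger pipeline.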

High-Frequency Extraction.
Due to the conjugate symmetry of the FFT, the effective spectrum accounts for only a quarter of |F(X)| for a two-dimensional image. We divide the effective spectrum into four parts (a), (b), (c), and (d), as shown in Figure 3, corresponding to low, medium-low, medium-high, and high frequency, respectively. For a fair division, it is necessary to ensure that each part occupies 25% of the pixels. Hence, we introduce a threshold function $\varphi(\cdot; R)$ that separates the frequency components according to the radius R. Suppose the effective spectrum is $X_e \in \mathbb{R}^{N \times N}$; the formal definition of $\varphi(X_e; r_L, r_R)$ is as follows:

$$\varphi(X_e; r_L, r_R)(i, j) = \begin{cases} X_e(i, j), & r_L \le d\big((i, j), (N - 1, 0)\big) < r_R, \\ 0, & \text{otherwise}, \end{cases}$$

where $X_e(i, j)$ represents the effective spectrum $X_e$ at position (i, j), (N − 1, 0) is the lower-left corner of the effective spectrum, and d(·, ·) refers to the Euclidean distance. The radii $r_L$ and $r_R$ are taken from the boundary values $r_i$ that quarter the matrix $X_e$, from which each frequency band can be obtained simply. Hence, the low-frequency component $X_e^{Low}(i, j)$ can be obtained by calculating $\varphi(X_e; 0, r_1)$. By analogy, the medium-low, medium-high, and high-frequency components can be obtained by computing $\varphi(X_e; r_1, r_2)$, $\varphi(X_e; r_2, r_3)$, and $\varphi(X_e; r_3, r_4)$, respectively.

As shown in Figure 4, we divide the HLFD detection method into three parts: extracting high-level representation, extracting high-frequency components, and the training process. Given the normal and abnormal samples X and $X^{adv}$ as input, we obtain the mth feature maps of the model M by calculating $M_m(X)$ and $M_m(X^{adv})$. According to the experimental results in Section 4.2, further converting $M_m(X)$ and $M_m(X^{adv})$ to the spectral domain gives the model a greater improvement in detection tasks. Thus, we employ the Fourier transform to acquire the spectral characteristics $F(M_m(X))$ and $F(M_m(X^{adv}))$ and further obtain the high-frequency components $F_H(M_m(X))$ and $F_H(M_m(X^{adv}))$ by equation (11).

As emphasized in the previous paragraphs, feature engineering is the key to our HLFD method. Whether the detector is logistic regression [37], a support vector machine [38], or a neural network model, we obtain better detection performance as long as $F_H(M_m(X))$ and $F_H(M_m(X^{adv}))$ are used as input. Figures 5 and 6 illustrate this conclusion by intuition and by experiment, respectively. More specific procedures can be found in Algorithm 1.
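The band-splitting step can be sketched as follows. Several details are assumptions rather than the paper's definitions: the effective spectrum is taken as the quadrant of the shifted magnitude spectrum whose lower-left corner holds the zero-frequency term, the input is square with an even side, the boundary radii are chosen as distance quantiles so that each band holds roughly 25% of the entries, and the outermost band is left open-ended. The helper names are hypothetical.

```python
import numpy as np

def effective_spectrum(x):
    # Shifted magnitude spectrum; keep the quadrant with DC at its lower-left corner
    mag = np.abs(np.fft.fftshift(np.fft.fft2(x)))
    half = x.shape[0] // 2
    return mag[1:half + 1, half:]        # DC ends up at index (half - 1, 0)

def corner_distance(n):
    # Euclidean distance of every position (i, j) to the lower-left corner (n - 1, 0)
    i, j = np.indices((n, n))
    return np.sqrt((i - (n - 1)) ** 2 + j ** 2)

def band_radii(n):
    # Assumption: quartile distances serve as the boundaries r_1, r_2, r_3
    r1, r2, r3 = np.quantile(corner_distance(n), [0.25, 0.5, 0.75])
    return [0.0, r1, r2, r3, np.inf]

def phi(x_e, r_lo, r_hi):
    # Keep entries whose distance lies in [r_lo, r_hi); zero out the rest
    d = corner_distance(x_e.shape[0])
    return np.where((d >= r_lo) & (d < r_hi), x_e, 0.0)

def high_frequency_features(x, first_band=2):
    # Bands from r_2 outward, i.e. the medium-high and high-frequency parts
    x_e = effective_spectrum(x)
    r = band_radii(x_e.shape[0])
    return phi(x_e, r[first_band], r[4]).ravel()

rng = np.random.default_rng(0)
feature = rng.uniform(0, 255, size=(32, 32))   # stands in for one feature map
x_e = effective_spectrum(feature)
r = band_radii(x_e.shape[0])
bands = [phi(x_e, r[k], r[k + 1]) for k in range(4)]
detector_input = high_frequency_features(feature)
```

Because the four distance intervals do not overlap and together cover every position, the four bands add back up to the effective spectrum, which makes the split easy to verify.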

Experiment
In this section, we conduct experiments to demonstrate the effectiveness of our detection method. We begin with a basic experimental setup and explore the discrepancy between the spatial and spectral domains. To improve the detection performance, we then explore the impact of the representations of different layers, different frequency bands, and different detectors on the detection task. Finally, we conduct an ablation study and compare our method with the existing state-of-the-art methods.

... number of normal samples. We split them into a training set (64%), a validation set (16%), and a test set (20%), and use the detection rate ACC (accuracy) and AUC (area under the curve) as the evaluation metrics. All adversarial examples are generated with the Python tool Foolbox [25]. For a fair comparison, each pixel is changed by an average of 10%. For MNIST, $L_2 = (28 \times 28 \times 1 \times 0.1^2)^{1/2} = 2.8$. For SVHN, CIFAR-10, and CIFAR-100, $L_2 = (32 \times 32 \times 3 \times 0.1^2)^{1/2} \approx 5.5$. Similarly, $L_2 = 22$ for T-ImageNet. The $L_2$ norm is used to limit the size of the perturbation in subsequent experiments.
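The $L_2$ budgets quoted above follow from one rule: every pixel value changes by 10% on average. A short sketch of that arithmetic is given below; the function name `l2_budget` is hypothetical, and only the MNIST and 32 × 32 × 3 cases stated in the text are evaluated.

```python
import math

def l2_budget(height, width, channels, per_pixel_change=0.1):
    # L2 norm of a perturbation that changes every value by per_pixel_change
    return math.sqrt(height * width * channels * per_pixel_change ** 2)

mnist_budget = l2_budget(28, 28, 1)     # about 2.8
cifar_budget = l2_budget(32, 32, 3)     # about 5.54, reported as 5.5
```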

Spatial Vs. Spectral Domain.
The spatial and the spectral domain refer to the original samples and the original samples after Fourier transform, respectively. To observe the discrepancy between the spatial and spectral domains intuitively, we provide a visual comparison in Figure 5. The pixel distribution of adversarial examples appears discontinuous in the spatial domain, which is caused by random perturbation. Although humans can recognize the difference between normal and abnormal samples, it is hard for a machine to learn the pattern since the distribution is neither generalized nor stable. In contrast, adversarial examples in the spectral domain have many small black dots in the mid-high frequency region, which express fixed and generalized patterns. This pattern may be effective for training detectors.

To obtain a more rigorous conclusion, we further perform cluster analysis in the spatial and spectral domains, as shown in Figure 6. In the first column, normal and abnormal samples cannot be separated by clustering in either the spatial or the spectral domain. Still, it is not hard to see that normal and abnormal samples are gradually separated as the network layer deepens. Besides illustrating the effectiveness of high-level representation, we find that data in the spectral domain can be linearly separated, which cannot be achieved in the spatial domain. This phenomenon demonstrates the effectiveness of the spectral domain to some extent.

To further explore the performance of spatial domain data on detection tasks, we conduct a series of experiments on CIFAR-10. As shown in Table 1, although spatial data are effective for FGSM, PGD, BIM, and JSMA attacks, they are powerless against CW and DeepFool attacks. A possible reason is that the perturbations generated by these two attack methods are small and only just cross the decision boundary, which makes them hard to detect in the spatial domain.

Influence of High-Level Representation.
The above experiments reveal the effectiveness of high-level representation and high-frequency extraction qualitatively. Yet, it is not clear how high-level representation affects the detection task. For this, we further conduct experiments to explore the impact of representations of different layers. As shown in Figure 7, the detection rate gradually increases as the network layer deepens, although it decreases occasionally. CW and DeepFool, which are hard to detect in the spatial domain, can also be detected effectively after high-level representation. Nonetheless, we are still not sure which layer works best for the detector. To be safe, we recommend extracting the last two or three layers for aggregation. An intuitive understanding of why high-level representations work is that these features are incomprehensible (i.e., nonrobust) to humans and are extracted as the network layers deepen. However, both robust and nonrobust features are crucial for model training, as Ilyas et al. illustrated in [16]. This is exactly the reason why clustering analysis separates them farther and farther as the network layers deepen, as shown in Figure 6.

Influence of High-Frequency.
Wang et al. illustrated in [9] that high-frequency components can affect model perception and further proposed that high-frequency regions are correlated with semantic components of images. Inspired by this, we explore the impact of different frequency bands on the detection performance. As the experimental results in Tables 2 and 3 show, high-frequency regions can indeed improve the performance of the detector to a certain extent. However, which frequency bands should be considered high remains an issue. Although we obtain the highest detection rate from the 3/4 to 4/4 frequency bands (high-frequency component) as shown in Table 2, the 2/4 to 4/4 frequency bands (mid-high and high-frequency components) achieve the highest detection rate as shown in Table 3. To be safe, we suggest the frequency bands from 2/4 to 4/4 as the output of high-frequency extraction. The high-frequency components correspond to the part of the image that changes drastically, and so does the perturbation. This commonality makes the high-frequency components contain more perturbation detail, which is effective for detection. The method for frequency band division can be found in equations (10) and (11).

Influence of Different Detectors.
Although the input data is crucial, the choice of the detector model also has an impact on the detection results. We apply three classifier models for comparison: LR [37], SVM [38], and a simple neural network. As shown in Figure 8, since the detector is trained on CW, it unsurprisingly works well for detecting the CW attack. However, the three classifiers show no regular pattern in their detection performance. Therefore, we believe that altering the model structure is an inefficient way to improve the detection performance significantly. This result further confirms the rationality of concentrating on feature engineering.
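To make the role of the detector concrete, the sketch below trains a plain logistic-regression detector with gradient descent on precomputed feature vectors. It is an illustration only: the synthetic Gaussian features stand in for the spectral features, the feature standardization, learning rate, and iteration count are assumptions, and the helper names `fit_detector` and `predict` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_detector(features, labels, lr=0.5, n_iter=300):
    # Standardize with training statistics, then run batch gradient descent
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8
    z = (features - mean) / std
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(n_iter):
        p = sigmoid(z @ w + b)
        w -= lr * (z.T @ (p - labels)) / len(labels)
        b -= lr * np.mean(p - labels)
    return {"mean": mean, "std": std, "w": w, "b": b}

def predict(detector, features):
    # Label 1 means "adversarial", label 0 means "normal"
    z = (features - detector["mean"]) / detector["std"]
    return (sigmoid(z @ detector["w"] + detector["b"]) >= 0.5).astype(int)

# Synthetic stand-ins for feature vectors of normal and adversarial samples
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 5))
adversarial = rng.normal(2.0, 1.0, size=(500, 5))

train_x = np.vstack([normal[:400], adversarial[:400]])
train_y = np.concatenate([np.zeros(400), np.ones(400)])
test_x = np.vstack([normal[400:], adversarial[400:]])
test_y = np.concatenate([np.zeros(100), np.ones(100)])

detector = fit_detector(train_x, train_y)
test_accuracy = float(np.mean(predict(detector, test_x) == test_y))
```

Swapping in a different classifier changes only `fit_detector` and `predict`; the feature vectors passed in stay the same, which is the sense in which the section treats the input features as the main lever.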

Ablation Study.
To explore the influence mechanism between the representation of different layers and the frequency bands, we perform an ablation study on them. As shown in Figure 9, the low-frequency band of the original images (i.e., the leftmost brown-red bar) is taken as the benchmark of the detection rate. Two dimensions, the layer of the feature map and the interval of the frequency band, are taken into account in the experiment. To verify the effectiveness of high-level representation and high-frequency extraction, we compare the benchmark results along the two dimensions. For a fair comparison, the $L_2$ norm of each image is controlled around 5.5, and the SVHN dataset is used here. From Figure 9, we can observe that the detection rates show an upward trend whether we consider only different layers of the network or only different frequency bands. The result shows that there is a win-win effect between the representations of different layers and frequency bands, which further confirms the effectiveness of our HLFD method. We also find that an 83% detection rate can be achieved even on DeepFool, which is hard to detect in the spatial domain.

Comparison with Existing Methods.
We compare our method with three state-of-the-art detection methods (KD + BU, LID, and M-D). To obtain a more objective conclusion, we conduct a series of experiments on five datasets and evaluate on six attacks. For a fair comparison, we set the same perturbation for each attack. As shown in Table 4, our method outperforms the other three detection methods for each attack on CIFAR-10 and CIFAR-100. To make the results more convincing, we also test on a more realistic dataset (T-ImageNet). In the scenarios where our detection rate is not the highest, the gap is small. Moreover, we improve the detection rates by a large margin on DeepFool and CW attacks. Overall, our HLFD method is more robust and stable in various real-world environments compared to the existing state-of-the-art methods.

Conclusion
In this paper, we propose a simple yet effective HLFD method for detecting adversarial examples. By moving from the spatial to the spectral domain, we find that adversarial examples transformed to the spectrum have richer characteristics, which are beneficial for training the detector. Moreover, the ablation study shows that extracting high-level representations and high-frequency components improves the detection performance and that the two factors have a win-win relationship. We explain intuitively and experimentally why these two factors work. Exploiting these findings, the HLFD detection method is proposed. Although our method outperforms other state-of-the-art adversarial detection methods in most scenarios, detectors still face more complex and unknown attacks in a real-world environment. Extending our method to more realistic settings (e.g., the ImageNet dataset) is crucial. Exploring how to detect more aggressive attacks effectively is also a worthwhile research subject.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.