MedicalGuard: U-Net Model Robust against Adversarially Perturbed Images

Deep neural networks perform well for image recognition, speech recognition, and pattern analysis. This type of neural network has also been used in the medical field, where it has displayed good performance in predicting or classifying patient diagnoses. An example is the U-Net model, which has demonstrated good performance in data segmentation, an important technology in the field of medical imaging. However, deep neural networks are vulnerable to adversarial examples. Adversarial examples are samples created by adding a small amount of noise to an original data sample in such a way that they appear normal to human perception but are incorrectly classified by the classification model. Adversarial examples pose a significant threat in the medical field, as they can cause models to misidentify or misclassify patient diagnoses. In this paper, I propose an advanced adversarial training method to defend against such adversarial examples. An advantage of the proposed method is that it creates a wide variety of adversarial examples for use in training, generated by the fast gradient sign method (FGSM) for a range of epsilon values. A U-Net model trained on these diverse adversarial examples will be more robust to unknown adversarial examples. Experiments were conducted using the ISBI 2012 dataset, with TensorFlow as the machine learning library. According to the experimental results, the proposed method builds a model that demonstrates segmentation robustness against adversarial examples by reducing the pixel error between the original labels and the adversarial examples to an average of 1.45.


Introduction
Deep learning technology has enabled innovations in the field of computer image recognition. The deep neural network [1], which is a neural network with a multilayer structure, has displayed better performance than previous machine learning models on tasks in the field of image object classification.
This technology has attracted particular attention in the field of medical imaging [2]. For example, Google's machine learning technology for diabetic retinopathy diagnosis [3] using fundus images has demonstrated the capability of performing medical diagnoses at a level comparable to that of physicians. Studies using deep learning technology are also being conducted in the fields of radiology, pathology, and ophthalmology.
However, deep neural networks are vulnerable to adversarial examples [4,5]. An adversarial example is a sample created by adding a small amount of noise to an original data sample in such a way that it does not appear abnormal to humans but will be incorrectly classified by the model. If adversarial examples are included in medical image data, patient image data may be incorrectly classified by the deep learning model, resulting in incorrect diagnoses.
In this paper, I propose a method of constructing a model that is robust against adversarial examples in medical images.
The proposed method accomplishes this by generating adversarial examples for a range of epsilon values using the fast gradient sign method (FGSM) [6] and then training the model on them. The contributions of this paper are as follows: first, I propose a system of gradual adversarial training, targeting the U-Net segmentation model, to build a model that is robust to adversarial examples. The remainder of this paper is organized as follows: Section 2 introduces related research, and Section 3 explains the proposed scheme. Section 4 presents the experiments and evaluations, and Section 5 discusses the proposed method further. Finally, the paper concludes with Section 6.

Related Work
This section describes the U-Net model and provides a brief introduction to adversarial examples.

U-Net Model.
The U-Net model [7] is a model that performs well for segmenting a specific area of an image. As shown in Figure 1, the U-Net model has two advantages over previous models. First, it is fast: instead of using a sliding window to subdivide an image and scan the pieces one by one, it uses a patch method, cutting the entire image into a grid and classifying it all at once. Second, it does not face a trade-off between recognition rate and patch size. With the conventional method, the recognition rate for the overall data improves when a large area is examined at once, but this comes at the expense of localization. With U-Net, however, a high recognition rate is maintained both locally and for the entire dataset. The left half of the U-Net model's structure, called the contracting path, progressively decreases the feature map size, and the right half, called the expansive path, increases it. A feature of the structure is that the result of each layer in the contracting path is concatenated with the output of the same-sized layer on the right. In this concatenation operation, when the left feature map is larger than the right one, the model crops and resizes it so that the left and right feature maps mirror each other. In addition, U-Net uses a trick called overlap tiles to increase the recognition rate: when the image is cut into patches, the surrounding border region is attached to each patch. Another advantage is its fast forwarding speed due to the lack of a fully connected layer.
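To make the structure concrete, the following is a minimal U-Net sketch in TensorFlow/Keras. The depth, filter widths, and input size are illustrative assumptions and do not reproduce the exact configuration given in Table 1; same-padding is used here instead of the cropping described above.

```python
# Minimal U-Net sketch in TensorFlow/Keras. Layer widths and depth are
# illustrative assumptions, not the exact configuration reported in Table 1.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU, as in the original U-Net design.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(512, 512, 1)):
    inputs = layers.Input(shape=input_shape)

    # Contracting path: each step halves the spatial size and doubles the filters.
    c1 = conv_block(inputs, 64)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, 128)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck.
    b = conv_block(p2, 256)

    # Expansive path: upsample and concatenate with the same-sized
    # feature maps from the contracting path (skip connections).
    u2 = layers.Conv2DTranspose(128, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.Concatenate()([u2, c2]), 128)
    u1 = layers.Conv2DTranspose(64, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.Concatenate()([u1, c1]), 64)

    # One output channel with a sigmoid for the binary (black/white) label.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return tf.keras.Model(inputs, outputs)

model = build_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="binary_crossentropy")
```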

Adversarial Examples.
Adversarial examples are samples created by adding a small amount of noise to an original data sample such that the noise will be difficult for humans to discern but will cause misclassification by the model. The study of adversarial examples was introduced by Szegedy et al. [4] in 2014, and various types of attack and methods of defense have been investigated. Attacks by adversarial example can be classified in four ways: by the amount of information available on the target model, by the specificity of the desired misclassification, by the metric of distortion, and by the method used to generate the adversarial examples. First, adversarial examples can be divided according to the amount of information available on the target model into white box attacks and black box attacks [8]. A white box attack is one occurring in a scenario in which the attacker has all of the information about the target model: its structure, its parameters, and the output probability values for a given input. A black box attack is one occurring in a scenario in which the attacker does not have information about the target model; that is, only the model's result for a given input can be known. It is more difficult for an attacker to generate adversarial examples for a black box attack than for a white box attack.
Second, adversarial examples can be classified according to the specificity of the desired misclassification as targeted attacks [9,10] or untargeted attacks [11]. In a targeted attack, the intent is that the adversarial example will be misclassified by the model as a specific target class determined by the attacker. In an untargeted attack, the intent is that the adversarial example will be misclassified by the model as any class other than the original class. Targeted attacks have the advantage of enabling more sophisticated attacks than untargeted attacks. On the other hand, untargeted attacks have the advantage of being able to generate adversarial examples in less time and with less distortion than targeted attacks require.
Third, adversarial examples can be classified according to the metric of distortion [12] into $L_1$, $L_2$, and $L_\infty$, defined for an original sample $x$ and an adversarial example $x^*$ as follows:

$$L_1 = \sum_i \lvert x^*_i - x_i \rvert, \qquad L_2 = \sqrt{\sum_i (x^*_i - x_i)^2}, \qquad L_\infty = \max_i \lvert x^*_i - x_i \rvert.$$

In all three metrics of distortion, the smaller the result, the more similar the adversarial example $x^*$ is to the original sample $x$.
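As a small illustration, the three distortion metrics can be computed as follows; the array names are assumptions.

```python
# Sketch: computing the three distortion metrics between an original image x
# and an adversarial example x_adv, both given as NumPy arrays in [0, 1].
import numpy as np

def distortions(x, x_adv):
    diff = (x_adv - x).ravel()
    l1 = np.abs(diff).sum()          # L1: total absolute change
    l2 = np.sqrt((diff ** 2).sum())  # L2: Euclidean distance
    linf = np.abs(diff).max()        # L-infinity: largest single-pixel change
    return l1, l2, linf
```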
Fourth, adversarial examples can be classified according to the method used for generating them. These include FGSM [6], iterative FGSM (I-FGSM) [13], the DeepFool method [14], the Jacobian-based saliency map attack (JSMA) [15], and the Carlini and Wagner attack [5]. These methods generate adversarial examples using the output fed back by the target model on values given to it as input. A transformer adds a small amount of noise to the original data sample, generates a transformed data sample, and passes it to the target model, which delivers a corresponding probability value to the transformer as feedback.
The transformer creates adversarial examples by iteratively adding a small amount of noise to the transformed data such that the probability value corresponding to the target class (or to a random class other than the original class) is increased. Methods of defense against adversarial examples involve manipulating the input data [16,17] or making the classifier more robust [4,6]. The first approach, manipulation of input data, defends against adversarial examples by reducing the effect of the adversarial noise on the input, such as by filtering or feature squeezing. The second approach, making the classifier more robust, reduces the effectiveness of adversarial example attacks by training the classifier on adversarial examples. The adversarial training method [...] is a representative example of this second approach.

Proposed Scheme
The mathematical expression of the proposed method is as follows. First, to generate local adversarial examples for each of several values of epsilon, the fast gradient sign method (FGSM) [6] finds an adversarial example $x^*_l$ through $L_\infty$:

$$x^*_l = x + \epsilon \cdot \operatorname{sign}\left(\nabla_x F(x, t)\right),$$

where F is an objective function of the local model and t is the target class. In FGSM, the local adversarial example is generated according to the value of ϵ from the input image x through the gradient ascent method, which is simple but has excellent performance.
Second, the U-Net model is additionally trained on the local adversarial examples generated for each value of epsilon, together with the original labels, so that it learns to segment them correctly.
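The following is a minimal sketch of this generation step in TensorFlow. It uses the common untargeted form of FGSM, which ascends the loss with respect to the original label; the model, loss choice, and variable names are assumptions rather than the exact implementation.

```python
# Sketch of FGSM adversarial example generation with TensorFlow.
# `local_model`, `x`, and `label` are assumed to exist; using binary
# cross-entropy on the segmentation label as the objective is an assumption.
import tensorflow as tf

def fgsm_example(local_model, x, label, epsilon):
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        pred = local_model(x, training=False)
        objective = tf.keras.losses.binary_crossentropy(label, pred)
    grad = tape.gradient(objective, x)
    # One gradient-sign step of size epsilon, then clip back to the valid pixel range.
    x_adv = x + epsilon * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)

# Local adversarial examples for a range of epsilon values, as in the proposed method.
epsilons = [0.1, 0.2, 0.3, 0.4]
# adv_sets = [fgsm_example(local_model, x, label, eps) for eps in epsilons]
```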
Third, to confirm the robustness of the trained U-Net model against unknown adversarial examples, the U-Net model segments the unknown adversarial example $x^*_u$ generated from the holdout model, and the result is compared with the original label.

Experimental Setup and Evaluation
This section describes the experimental environment and reports the results for the proposed method. The experiments were performed using the TensorFlow [21] machine learning library and a Xeon E5-2609 1.7 GHz server.

Dataset.
The ISBI 2012 dataset [22] was used for the experiments. This is a serial section transmission electron microscopy (ssTEM) dataset of the Drosophila first instar larva ventral nerve cord (VNC). It is commonly used in the segmentation of medical images. It is composed of 30 images and labels. The microcube measures approximately 2 × 2 × 1.5 microns, and the resolution is 4 × 4 × 50 nm/pixel. The label is binary and is represented by a black and white image, with segmented objects represented in white and the rest represented in black. It was tested by the k-fold cross-validation method. Although the ISBI 2012 dataset is small, the U-Net model has demonstrated very high performance on data segmentation using this dataset.
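A sketch of the k-fold split over the 30 image/label pairs is shown below; the fold count k = 5 is an illustrative assumption, as the exact value is not stated here.

```python
# Sketch: k-fold cross-validation split over the 30 ISBI 2012 image/label pairs.
# `images` and `labels` are assumed NumPy arrays of length 30; k = 5 is an
# illustrative choice.
import numpy as np

def kfold_indices(n_samples=30, k=5, seed=0):
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

# for train_idx, test_idx in kfold_indices():
#     model.fit(images[train_idx], labels[train_idx], ...)
```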

Model Configurations.
The experiments in this study involved a U-Net model as the attack target; a local model, used for the advanced adversarial training of the U-Net model; and a holdout model, used by attackers to perform a transfer attack. An attacker performs a transfer attack on the U-Net model as a black box attack, using an adversarial example created with the holdout model.

U-Net Model.
The target model was a U-Net model used for data segmentation. Its structure is given in Table 1.
The Adam algorithm [23] was used as the optimization algorithm of the U-Net model, and ReLU [24] was used as the activation function. The model's parameter values are given in Table 2.

Local Model.
The local model has the same structure as the U-Net model but was trained with different parameters: the learning rate was 0.002, and the number of epochs was set to 80.

Holdout Model.
The holdout model was configured as shown in Table 3; its structure was different from that of the U-Net models. The learning rate was set to 0.002, the number of epochs was set to 120, and the remaining parameter values were as listed in Table 2.

Experimental Results.
This section presents the analysis of adversarial example images generated from the holdout model, the segmentation performance of the proposed method, and the pixel error of the proposed method.
Table 4 shows examples of original sample images, adversarial noise, and adversarial examples from the holdout model. To generate the adversarial noise, epsilon was varied over a range of values. Table 5 shows a comparison between the original labels and the output results of the model with no defense, the baseline model, and the proposed model. The model with no defense is a U-Net model with no defense against adversarial examples. The baseline model is a U-Net model to which the existing adversarial training method was applied; adversarial examples generated from the local model using an epsilon value of 0.4 were used for its additional training. It can be seen that the model with no defense produced many incorrect segmentations on the adversarial examples. This is because the noise of the adversarial example affects the elements to be segmented, resulting in errors. On the other hand, the segmentations produced by the proposed model and the baseline model for the adversarial examples are similar to the original labels. Furthermore, the proposed model had better segmentation performance on a greater number of adversarial examples than the baseline model because it was trained on adversarial examples generated from a wider range of epsilon values.
Figure 3 shows the pixel error between the original label and the segmentation of the adversarial example for each model. The "pixel error" is the difference between the pixels of the original label and those of the classified output, calculated according to the L2 metric. It can be seen in the figure that, as epsilon increases, the pixel error increases, and the proposed model shows the lowest pixel error of the three models.
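A sketch of the pixel error computation used in this analysis is given below; the binarization threshold and variable names are assumptions, and the label is assumed to have the same shape as the model output.

```python
# Sketch: pixel error between an original label and a model's segmentation of an
# adversarial example, using the L2 metric described above. `model`, `x_adv`, and
# `label` are assumed to exist; the 0.5 threshold is an illustrative assumption.
import numpy as np

def pixel_error(model, x_adv, label):
    pred = model.predict(x_adv)
    pred_binary = (pred > 0.5).astype(np.float32)
    diff = (pred_binary - label).ravel()
    return np.sqrt((diff ** 2).sum())  # L2 distance between label and output pixels
```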

Assumptions.
The proposed method assumes that the attacker will perform a transfer attack in the form of a black box attack, with no information on the U-Net model. In other words, it is assumed that the attacker will exploit the feature that an adversarial example generated from the holdout model (known to the attacker) can be effective as an attack against other U-Net models. In the experiment, to give the advantage to the side of the attacker, the holdout model and the U-Net model were defined to have similar structures. The proposed defense method used adversarial examples generated from the local model (similar to the U-Net model) in the additional training of the U-Net model.
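The assumed transfer attack setting can be sketched as follows, reusing the fgsm_example and pixel_error sketches above; the model and data names and the epsilon value are illustrative.

```python
# Sketch of the assumed transfer (black box) attack: adversarial examples are
# generated from the holdout model and fed to the separately trained U-Net model.
# `holdout_model`, `unet`, `test_images`, and `test_labels` are assumed to exist.
import numpy as np

x_adv_unknown = fgsm_example(holdout_model, test_images, test_labels, epsilon=0.3)
errors = [pixel_error(unet, x_adv_unknown[i:i + 1], test_labels[i:i + 1])
          for i in range(len(test_labels))]
print("mean pixel error:", np.mean(errors))
```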

Epsilon.
In FGSM, epsilon is a parameter that controls the amount of adversarial noise. When adversarial examples are segmented, as epsilon increases, the pixel error of the segmentation increases. On the other hand, it is important to keep in mind that the amount of adversarial noise added in generating the adversarial example increases as epsilon increases.
Therefore, it is desirable to select a value for epsilon such that the pixel error of the segmentation by the model will be high but the adversarial noise will not be identifiable by the human eye.
The adversarial examples generated from the local model using values of epsilon ranging from 0.1 to 0.4 were used in the additional training of the U-Net model. By training on adversarial examples generated using various values of epsilon, the proposed method produces a model that is more robust to unknown adversarial examples.
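A sketch of this additional training step, reusing the fgsm_example sketch above, is given below; the epoch and batch settings are illustrative assumptions.

```python
# Sketch of the gradual adversarial training step: local adversarial examples are
# generated for several epsilon values and used for additional training of the U-Net.
# `unet`, `local_model`, `train_images`, and `train_labels` are assumed to exist.
import numpy as np

epsilons = [0.1, 0.2, 0.3, 0.4]
adv_images, adv_labels = [], []
for eps in epsilons:
    for x, y in zip(train_images, train_labels):
        x_adv = fgsm_example(local_model, x[None, ...], y[None, ...], eps)
        adv_images.append(np.asarray(x_adv)[0])
        adv_labels.append(y)

# Additional training on the adversarial examples with their original labels.
unet.fit(np.stack(adv_images), np.stack(adv_labels), epochs=10, batch_size=2)
```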

Pixel Error.
The results of the model with no defense, the baseline model, and the proposed model were analyzed in terms of the pixel error with respect to the original labels. As epsilon increases, the adversarial noise of the unknown adversarial example increases, and thus the pixel error increases. The trade-off is that with higher epsilon values the adversarial noise becomes easier for humans to discern. The proposed model has less pixel error between the original labels and the adversarial examples than the other models. The proposed model is robust in correctly segmenting unknown adversarial examples because it has been trained on a variety of adversarial examples.

Applications.
An adversarial training method can be used in medical applications in which there is a risk of misclassification due to adversarial examples. In the experiment, adversarial examples for a segmentation application in the medical field were analyzed. Segmentation is an important technology for MRI imaging and tumor identification in the healthcare industry. If a segmentation is incorrect because of adversarial examples in such medical projects, it can pose a serious threat to patients' medical care. Therefore, the proposed method can be an important tool in the field of medical imaging because it creates models that are robust against adversarial examples.

Limitations.
FGSM was chosen as a representative method from among the adversarial example generation methods for use in the proposed method. Studies on the proposed method can be expanded by using alternative adversarial example generation methods. In addition, it may be interesting to research targets of attack other than the U-Net segmentation model. In future research, the scope can be expanded to experiments with other datasets. The proposed method could also be applied to models for medical data by generating the adversarial examples using generative adversarial nets [25]. Finally, another interesting topic for research would be ensemble defense methods for use in the medical field.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request after acceptance.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.