Protecting Privacy in Shared Photos via Adversarial Examples Based Stealth

Online image sharing on social platforms can lead to undesired privacy disclosure. For example, some enterprises may run detection on these large volumes of uploaded images to analyze users’ preferences in depth for commercial purposes. Their technology may well be today’s most powerful learning model, the deep neural network (DNN). To elude these automatic DNN detectors without affecting the visual quality perceived by human eyes, we design and implement a novel Stealth algorithm, which makes the automatic detector blind to the existence of objects in an image by crafting a kind of adversarial example. From the view of the detector, it is as if all objects disappear after putting on an “invisible cloak.” We then evaluate the effectiveness of the Stealth algorithm through a newly defined measurement, named privacy insurance. The results indicate that our scheme has a considerable success rate in guaranteeing privacy compared with other methods, such as mosaic, blur, and noise. Better still, the Stealth algorithm has the smallest impact on image visual quality. Meanwhile, we provide a user-adjustable parameter called cloak thickness for regulating the perturbation intensity. Furthermore, we find that the processed images have the transferability property; that is, the adversarial images generated for one particular DNN will influence others as well.


Introduction
With the pervasiveness of cameras, especially smartphone cameras, coupled with the almost ubiquitous availability of Internet connectivity, it is extremely easy for people to capture photos and share them on social networks. For example, according to statistics, around 300 million photos are uploaded onto Facebook every day [1]. Unfortunately, when users eagerly share photos online, they also inadvertently hand over their privacy [2]. Many companies are adept at analyzing the information in photos that users upload to social networks [3]. They collect massive amounts of data and use advanced algorithms to explore users' preferences and then target advertising more accurately [4]. It is as if the life of the owner behind each photo were being peeped at.
Recently, a news report about fingerprint information leaking from the popular two-fingered pose in photos gave cause for alarm [5]. The researchers were able to copy fingerprints from photos taken by a digital camera as far as three metres away from the subject. Another shocking piece of news is that a new crop of digital marketing firms has emerged. They aim at searching, scanning, storing, and repurposing images uploaded to popular photo-sharing sites, to help marketers send targeted ads [6,7] or conduct market research [8]. Such large-scale, continuous access to users' private information will, no doubt, disturb the photo owners.
Moreover, shared photos may contain information about location, events, and relationships, such as family members or friends [9,10]. This inadvertently brings security threats to others. After analyzing more than one million online photos collected from 9987 randomly selected users on Twitter, we find that people are fairly fond of sharing photos containing portraits on social platforms, as shown in Table 1. We test 9987 users and take 108.7 images on average from each person. The result shows that about 53.4% of the photos contain portraits and 97.9% of the users have shared one or more photos containing portraits, which indicates a great risk of privacy disclosure. In addition to portraits, photos containing other objects, such as road signs and air tickets, may reveal private information as well.
Traditional methods of protecting personal information in images are mosaic, blur, partial occlusion, and so on [11,12]. These approaches are usually crude and destructive. A more elegant way is to use a fine-grained access control mechanism, which enforces the visibility of each part of an image according to the access control list for every accessing user [13]. More flexibly, a portrait privacy preserving photo capturing and sharing system can let users who are photographed choose whether to appear in the photo (select the "tagged" item) or not (select the "invisible" item) [14].
These processing methods can be good ways to block access by people. But companies that push large-scale advertising usually use automated systems rather than manual work to detect user-uploaded images. For instance, Figure 1 shows the general process of obtaining private information through online photos. First, a user unguardedly shares a photo on the social network. Then this photo is collected by astute companies and put into their own automatic detection system. Based on the detection results from a single photo, the user's private information might be at their fingertips. The traditional processing methods (mosaic, blur, etc.) not only greatly reduce image quality, but also do not work well against an automatic detection system based on DNNs, as shown in the later experimental results (Figure 6). Users share photos to show their life to other people, not to give a detection machine any opportunity to pry into their privacy. Therefore, we need a technique for processing images so that the automatic detection system is unable to work well, while humans are not aware of the subtle changes in the images.
From Figure 1, we can see that, whether for commercial or malicious purposes, the basic model of infringing image privacy follows the same pattern: first, the system gives object proposals, that is, it finds where objects may exist in the picture and outlines bounding boxes of all possible objects; then the system identifies the specific category of each proposal.
With regard to the detection process, the most advanced algorithms are based on deep neural networks. Their unparalleled accuracy has made them the darling of artificial intelligence (AI). DNNs are able to reach near-human-level performance in language processing [15], speech recognition [16], and some vision tasks [17][18][19], such as classification, detection, and segmentation.
Although they dominate the AI field, recent studies have shown that DNNs are vulnerable to adversarial examples [20], which are carefully designed to mislead DNNs into giving an incorrect classification result. For humans, however, the processed images remain visually indistinguishable from the original ones.
Since adversarial examples are highly effective at resisting the classification task, can we produce adversarial examples with a similar effect for the more complex detection task? Even if the classification result is incorrect, knowing that an object exists (without knowing its specific category) is a kind of privacy leakage to some extent. So preventing the detection machine from seeing anything is both meaningful and challenging.
As we mentioned above, the detection process is divided into two steps, region proposal and proposal box classification. If we can successfully break either of these two steps without degrading the visual quality of the original image, then we are able to produce a new kind of adversarial example specifically for the detection task. Successful resistance involves two cases. One is failure in object proposal, that is, proposing nothing for the next step; the other is incorrect recognition on correctly proposed boxes. Our work focuses on the first case. It makes DNNs turn a blind eye to the objects in images; in other words, DNNs will fail to give any boxes of possible objects. Intuitively, our approach works as if the objects in an image are wearing an "invisible cloak." Therefore, we call it the Stealth algorithm. Furthermore, we define cloak thickness to evaluate the strength of the perturbation and privacy insurance to measure the capacity of privacy preservation, and we also discuss how the two are related. In addition, we find the cloak can be shared; that is, adversarial examples which we make specially for one DNN can also resist other DNN detectors.
In previous work, adversarial examples were usually used to attack various detection systems, such as face recognition [21,22], malicious code detection [23], and spam filtering [24], all of which are aggressive behaviors out of malice.
The rest of the paper is organized as follows. In Section 2, we review the related work. In Section 3, we introduce several DNN-based detectors and highlight the Faster RCNN detection framework, which we use in our algorithm. In Section 4, we illustrate the approach we design to process an image into an adversarial one for eluding a DNN detector. Then, in Section 5, we evaluate our approach in multiple aspects. Finally, in Section 6, we draw conclusions and discuss future work.

Related Work
Over the past few years, many researchers have studied the limitations of deep learning, and it has been found to be quite vulnerable to some well-designed inputs. Many algorithms have sprung up in classification tasks to generate this kind of adversarial input. Szegedy et al. [25] first discovered that there is a huge difference between DNN and human vision. Adding an almost imperceptible interference to the original image (e.g., a dog as seen by human eyes) would cause a DNN to misclassify it into a completely unrelated category (maybe an ostrich). Then the fast gradient sign method was presented by Goodfellow et al. [20], which can very efficiently calculate the interference to an image for a particular DNN model. An iterative algorithm for generating adversarial perturbations by Papernot et al. [26] followed; it is based on a precise understanding of the mapping between inputs and outputs of DNNs obtained by constructing adversarial saliency maps, and it can choose any category as the target to mislead the classifier. Nguyen et al. [27], along the opposite line of thinking, synthesized a kind of "fooling image." These images are totally unrecognizable to human eyes, but DNNs classify them into a specified category with high confidence. More interestingly, Moosavi-Dezfooli et al. [28] found that there exists a universal perturbation vector that can fool a DNN on all natural images. Adversarial examples have also been found by Goodfellow et al. [20] to have the transferability property. It means an adversarial image designed to mislead one model is very likely to mislead another as well. That is to say, it might be possible to craft adversarial perturbations without having access to the underlying DNN model. Papernot et al. [29,30] then put forward such a black-box attack based on the cross-model transfer phenomenon. Attackers do not need to know the network architecture, parameters, or training data. Kurakin et al. [31] have also shown that, even in physical world scenarios, DNNs are vulnerable to adversarial examples. This was followed by an ingenious face recognition deceiving system by Sharif et al. [32], which enables subjects to dodge face recognition simply by wearing printed paper eyeglass frames.
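As a rough illustration of the fast gradient sign idea mentioned above, the following sketch applies one signed-gradient step to a toy logistic classifier. The model, its weights, the input, and the step size `epsilon` are made-up assumptions chosen for readability; they are not taken from [20] or from our experiments. Plain Python lists are used so that the sketch runs without dependencies.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of class 1 under a toy logistic classifier (assumed model)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def fgsm(weights, bias, x, label, epsilon):
    """One fast gradient sign step: x + epsilon * sign(d loss / d x).

    For the log loss of a logistic model, d loss / d x = (p - label) * weights.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, sign)]

# Toy numbers, for illustration only.
weights, bias = [2.0, -1.0, 0.5], 0.0
x = [0.3, 0.1, 0.2]
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.3)
```

In this toy setting the clean input is assigned to class 1, while the perturbed input falls on the other side of the decision boundary even though no coordinate moves by more than `epsilon`.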
It can be seen that most of the previous studies on confronting DNNs address the classification task. Our work is about the detection task, which is another basic task in computer vision. It is quite distinct from classification, since detection returns both several bounding boxes indicating object positions and labels for their categories. Its implementation framework is also more complicated than that of classification. The higher dimensionality of the result, the continuity of the bounding box coordinates, and the more complex algorithm make deceiving DNNs on detection more challenging.
Viewed from another aspect, Ilia et al. [13] proposed an approach that can prevent unwanted individuals from recognizing users in a photo. When another user attempts to access a photo, the designed system determines which faces the user does not have permission to view and presents the photo with the restricted faces blurred out. Zhang et al. [14] presented a portrait privacy preserving photo capturing and sharing system. People who do not want to be captured in a photo are automatically erased from the photo by image inpainting or blurring.
Previous work protects privacy at the level of human vision, but these methods have proven less effective against computer vision. In this article, we attempt to design a privacy protection method against computer vision that also preserves human visual quality. In future work, this method can be applied in conjunction with the above-mentioned photo-sharing system by Zhang et al. [14]. It would then allow users to choose whether their privacy protection is aimed at computer vision or human vision.

Object Detectors Based on DNNs
Object detection frameworks based on DNNs have been emerging in recent years, such as RCNN [33], Fast RCNN [34], Faster RCNN [18], Multibox [35], R-FCN [36], SSD [37], and YOLO [38]. These methods generally have excellent performance, and many of them have been put into practical applications. To help practitioners choose among detection frameworks, some researchers have tested and evaluated in detail the speed and accuracy of Faster RCNN, R-FCN, and SSD, which are prominent on the detection task [39]. The results reflect that, in general, Faster RCNN exhibits the best tradeoff between speed and accuracy. So we choose to resist a detection system employing the Faster RCNN framework, as shown in Figure 2. Technically, it integrates the RPN (region proposal network) and Fast RCNN. The proposals obtained by the RPN are directly connected to the ROI (region of interest) pooling layer [34], so that the whole is an end-to-end object detection framework implemented with DNNs. First of all, images are processed by one kind of DNN (ZF-net, VGG-net, ResNet, etc.) to extract features. The detection then happens in two stages: region proposal and box classification. At the region proposal stage, the features are used for predicting class-agnostic bounding box proposals (object or not object). At the second stage, box classification, the same features and the corresponding box proposals are used to predict a specific class and a bounding box refinement.
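Here, we explain the notation. $X \in \mathbb{R}^{m}$ is an input image composed of $m$ pixels, and $k$ is the number of classes that can be detected. The trained models of the two processes in detection, region proposal and box classification, are $f_{rp}$ and $f_{cl}$, respectively. A feature extraction process $f_{feat}$ precedes both of them.
In the process of feature extraction, some translation-invariant reference boxes, called anchors, are generated based on the extracted features, denoted by

$$A(X) = \begin{bmatrix} x^{a}_{1} & y^{a}_{1} & w^{a}_{1} & h^{a}_{1} \\ \vdots & \vdots & \vdots & \vdots \\ x^{a}_{n} & y^{a}_{n} & w^{a}_{n} & h^{a}_{n} \end{bmatrix}. \quad (1)$$

The value $n$ represents the number of anchors. $x^{a}_{j}, y^{a}_{j}, w^{a}_{j}, h^{a}_{j}$ ($j = 1, 2, \ldots, n$) are, respectively, the coordinates of the upper left corner of each anchor and its width and height. Each anchor corresponds to a nearby ground truth box, which can be denoted by

$$G_{gt}(X) = \begin{bmatrix} x^{gt}_{1} & y^{gt}_{1} & w^{gt}_{1} & h^{gt}_{1} \\ \vdots & \vdots & \vdots & \vdots \\ x^{gt}_{n} & y^{gt}_{n} & w^{gt}_{n} & h^{gt}_{n} \end{bmatrix}. \quad (2)$$

Then, in the region proposal stage, $f_{rp}: \mathbb{R}^{m} \to \mathbb{R}^{n \times 5}$ predicts $n$ region proposals, which are parameterized relative to the $n$ anchors:

$$f_{rp}(X) = \begin{bmatrix} x_{1} & y_{1} & w_{1} & h_{1} & p_{1} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ x_{n} & y_{n} & w_{n} & h_{n} & p_{n} \end{bmatrix}. \quad (3)$$

Security and Communication Networks

$x_{j}, y_{j}, w_{j}, h_{j}$ ($j = 1, 2, \ldots, n$) are, respectively, the coordinates of the upper left corner of each region proposal and its width and height. The value $p_{j}$ is the probability of it being an object (only two classes: object versus background). For convenience, we let $B(X)$ be the first four columns, which contain the location and size information of all the bounding boxes, and let $P(X)$ be the last column, which contains their probability information.
The region proposal function is followed by a function for box classification, $f_{cl}: \mathbb{R}^{m} \times \mathbb{R}^{n \times 5} \to \mathbb{R}^{r \times (4+k)}$. Here, besides the image $X$, the partial result $B(X)$ above is also one of the inputs:

$$f_{cl}(X, B(X)) = \begin{bmatrix} \tilde{x}_{1} & \tilde{y}_{1} & \tilde{w}_{1} & \tilde{h}_{1} & p_{11} & \cdots & p_{1k} \\ \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\ \tilde{x}_{r} & \tilde{y}_{r} & \tilde{w}_{r} & \tilde{h}_{r} & p_{r1} & \cdots & p_{rk} \end{bmatrix}. \quad (4)$$

The value $r$ is the number of final bounding box results ($r \le n$). Similarly, $\tilde{x}_{j}, \tilde{y}_{j}, \tilde{w}_{j}, \tilde{h}_{j}$ ($j = 1, 2, \ldots, r$) represent their location and size information, and $p_{j1}, p_{j2}, \ldots, p_{jk}$ are, respectively, the probabilities of each box result belonging to each class ($k$ classes in total). We also let $\tilde{B}(X, B(X))$ and $\tilde{P}(X, B(X))$ be the two parts of the result matrix. In short, the Faster RCNN framework is the combination of region proposal and box classification.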
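To make the layout of the region proposal output concrete, here is a minimal sketch that splits a proposal matrix into its box part B(X) and its score part P(X). The function name `split_proposals` and the sample numbers are hypothetical; only the row layout (x, y, w, h, p) follows the description above.

```python
def split_proposals(proposals):
    """Split rows (x, y, w, h, p) into boxes B(X) and objectness scores P(X)."""
    boxes = [row[:4] for row in proposals]   # first four columns: location and size
    scores = [row[4] for row in proposals]   # last column: probability of being an object
    return boxes, scores

# Two made-up proposals: a confident one and a weak one.
proposals = [
    [10, 20, 50, 80, 0.92],
    [5, 5, 30, 30, 0.08],
]
boxes, scores = split_proposals(proposals)
```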

Stealth Algorithm for Privacy
4.1. Motivation and Loss Function. Our Stealth algorithm is aimed at the first stage, region proposal. Targeting the first stage is likely to be the simplest and most effective approach, because if the detector does not give any proposal boxes, the next stage (box classification) has nothing to classify and cannot succeed. In a word, we deceive a DNN detector at the source.
Our aim is to find a small perturbation $\delta X$, with $X_{st} = X + \delta X$, such that $P(X_{st}) < th_{rp}$ holds for every proposal with considerable probability. (5)
Here $th_{rp}$ is a threshold, according to which the detection machine decides whether each box is retained. Formula (5) expresses that we want to add a small perturbation so that, in the region proposal stage, no object proposal is detected, with considerable probability. In other words, at this stage, all the boxes receive low scores (probability of being an object) and are discarded by the system.
Likewise, we can also interfere with the subsequent box classification stage, which can be expressed as the requirement that, with great probability, $\tilde{P}(X_{st}, B(X_{st})) < (th_{cl})_{k}$ (6), where $(th_{cl})_{k}$ denotes the threshold $th_{cl}$ applied to each of the $k$ class scores. Some other bounding boxes will then be discarded, because the probability that they belong to any of the $k$ classes is less than the threshold $th_{cl}$. On the surface, formula (5) and formula (6) are two different modification methods. But in the Faster RCNN detection framework, the two tasks (region proposal and box classification) share the convolution layers; that is, the two functions ($f_{rp}$ and $f_{cl}$) take the same deep features as their input. Modifying the image to resist either of the two stages may therefore inadvertently mislead the other function as well. For this reason, we choose to process the image only according to formula (5). This operation directly defeats the region proposal stage, and it is also very likely to defeat the following box classification process in formula (6). A more straightforward explanation is that, in the view of the detection machine, our algorithm makes the objects in the image no longer resemble an object, let alone an object of a certain class. The image seems to be wearing an invisible cloak. So, in the machine's eyes, an image with a lot of content looks completely empty, which is what we expect.
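The two discard rules described above can be sketched as follows. This is a simplified reading, not the actual Faster RCNN post-processing: the function names, the plain score thresholds, and the sample values are assumptions made for illustration, and class scores are given here for every proposal even though a real detector computes them only for retained proposals.

```python
def surviving_detections(objectness, class_scores, th_rp, th_cl):
    """Return indices of boxes that pass both stages.

    objectness[j]   : objectness score of proposal j (region proposal stage).
    class_scores[j] : list of k class probabilities for proposal j
                      (box classification stage).
    """
    kept = []
    for j, (p, scores) in enumerate(zip(objectness, class_scores)):
        if p < th_rp:
            continue  # discarded at the region proposal stage
        if max(scores) < th_cl:
            continue  # discarded at the box classification stage
        kept.append(j)
    return kept

def is_stealthy(objectness, th_rp):
    """True if every proposal would already be discarded at the first stage."""
    return all(p < th_rp for p in objectness)

# Made-up scores for three proposals and k = 2 classes.
objectness = [0.9, 0.2, 0.8]
class_scores = [[0.1, 0.85], [0.9, 0.05], [0.3, 0.4]]
kept = surviving_detections(objectness, class_scores, th_rp=0.5, th_cl=0.6)
```

Only the first proposal survives in this example: the second fails the first-stage threshold and the third fails the second-stage threshold. An image counts as fully cloaked in the first-stage sense when `is_stealthy` returns `True`.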
We are more concerned about the region proposal stage. Its loss function in the Faster RCNN framework is a weighted sum of a box regression loss and a binary classification loss. Here $T(A(X_i), B(X_i))$ represents a certain distance between the anchors and the predicted region proposals, and $T(A(X_i), G_{gt}(X_i))$ is that between the anchors and the ground truth boxes (in Figure 3, we represent it as a vector). In the training phase, the goal of the neural network is to make $T(A(X_i), B(X_i))$ closer to $T(A(X_i), G_{gt}(X_i))$, as shown in Figure 3(a). More specifically, $T(A(X_i), B(X_i))$ collects, for each anchor, the offsets in position and size between the anchor and its predicted proposal; similarly, $T(A(X_i), G_{gt}(X_i))$ collects the offsets between the anchor and its ground truth box. The classification term compares the predicted scores $P(X_i)$ with the ground truth object labels, which take values in $\{0, 1\}$: 1 means the box is an object and 0 means it is not. $\theta$ denotes the parameters of the trained model. At the region proposal stage, the total loss $L$ is thus composed of two parts, the box regression loss $\ell_{box}$ (smooth $L_1$ loss) and the binary classification loss $\ell_{prb}$ (log loss), with two weights balancing the two losses.

Algorithm
(1) Get the anchors $A(X_i)$ on the basis of the features extracted by the DNN. $X_i$ is the temporary image in the $i$th iteration.
(2) Compute the forward prediction $f_{rp}(X_i)$. This indicates the position of the prediction boxes.
(3) Get the adversarial perturbation $\delta X_i$ based on backpropagation of the loss. The loss function $L$ is the same as that of Faster RCNN, but we change one of its independent variables. In other words, we replace $T(A(X_i), G_{gt}(X_i))$ with $-T(A(X_i), B(X_i))$, as shown in Figure 3(b). We compute the backpropagation value of the total loss function with respect to the image and use it as the perturbation $\delta X_i$ in one iteration.
The role of backpropagation and the loss function in the training process is to adjust the network so that the current output moves closer to the ground truth. Here we substitute the reverse of the direction in which the box should be adjusted, $-T(A(X_i), B(X_i))$, for the ground truth $G_{gt}$. An intuitive understanding is that we track the adjustment that the DNN detector makes to the region proposals. If the DNN wants to move the proposals in a certain direction, then we add some small and well-designed perturbations to the original image. These perturbations may cause the proposals to move in the opposite direction and consequently counteract their generation.
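The mechanics of steps (2) and (3) can be sketched on a toy model. Everything below is an assumption made for illustration: a small linear map `W` stands in for the part of the network that produces the offsets T(A(X), B(X)), the smooth L1 loss is the only loss term, the update descends the loss, and `tau` plays the role of a step size. It only shows how the reversed target is swapped in and backpropagated to the input; it says nothing about how a real detector behaves.

```python
def matvec(W, x):
    """Toy forward pass: predicted offsets t = W x (stand-in for T(A(X), B(X)))."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def smooth_l1_grad(pred, target):
    """Derivative of the smooth L1 loss with respect to each prediction."""
    return [max(-1.0, min(1.0, p - t)) for p, t in zip(pred, target)]

def stealth_step(x, W, tau):
    t = matvec(W, x)                    # step (2): forward prediction
    target = [-v for v in t]            # step (3): reversed direction replaces the ground truth
    g_pred = smooth_l1_grad(t, target)  # target is held constant during backpropagation
    # Backpropagate through the linear model: grad_x = W^T g_pred.
    g_x = [sum(W[r][c] * g_pred[r] for r in range(len(W))) for c in range(len(x))]
    # Assumed sign convention: descend the loss so predictions move towards the target.
    delta = [-tau * g for g in g_x]
    return [xi + d for xi, d in zip(x, delta)]

def stealth(x, W, tau=0.5, iterations=20):
    x_i = list(x)
    for _ in range(iterations):
        x_i = stealth_step(x_i, W, tau)
    return x_i

# Made-up toy "network" and "image".
W = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.1]]
x = [0.2, 0.1, 0.3]
x_st = stealth(x, W)
```

In this toy setting the predicted offsets shrink towards zero over the iterations, which mirrors the intuition of counteracting the adjustment that the detector would like to make.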
The original image and the image processed by the Stealth algorithm yield totally different results from the DNN detector, as shown in Figure 4. The original image is detected and labeled correctly, whereas for the processed image no objects are detected by the DNN detector; that is, no information is perceived at all. Even better, to human eyes, there is little difference between the adversarial image and the original image.

Privacy Metric.
To measure the effectiveness of our algorithm quantitatively, we define a variable PI, named privacy insurance. It can be interpreted as how much privacy the algorithm can protect. Suppose there are $k$ classes in the dataset, each with an independent privacy insurance value $PI_i$ ($i = 1, 2, \ldots, k$). We let $N_i$ be the total number of bounding boxes of the $i$th class ($1 \le i \le k$) detected on all original images, including both correct and wrong results. We let $c_i$ be the number of correct boxes of that class detected on the adversarial images, and PI be the average of all $PI_i$ values.
We can observe from the above definition that PI is actually the success rate of our detection resistance, and it also indicates how much of the users' privacy can be preserved. Normally, mAP (mean average precision) is used to measure the validity of a detector. But here PI is a more appropriate evaluation index, because the model itself makes some errors when detecting original images (that is, its accuracy is not 100%), and the major concern of our algorithm is to resist the detection model. Consider the following case: the machine's judgment on the original image is itself wrong, and after the image is processed by the algorithm, the judgment is still wrong but in a different way. Then this attempt at resisting detection is theoretically successful. However, the difference in mAP between pre- and postprocessing cannot reflect that this case is a successful one. By contrast, PI can evaluate the validity of our work in all cases, including the one above.
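A minimal sketch of how PI could be computed from box counts is given below. It assumes that the per-class value is the fraction of originally detected boxes that are no longer correctly detected, PI_i = 1 - c_i / N_i, which is one reading of the description above; the function name and the sample counts are hypothetical.

```python
def privacy_insurance(total_boxes, correct_after):
    """Per-class PI_i and their average PI.

    total_boxes[i]   : N_i, boxes of class i detected on the original images.
    correct_after[i] : c_i, correct boxes of class i detected on adversarial images.
    Assumes PI_i = 1 - c_i / N_i; classes with N_i == 0 are skipped.
    """
    per_class = [1.0 - c / n for n, c in zip(total_boxes, correct_after) if n > 0]
    return per_class, sum(per_class) / len(per_class)

# Made-up counts for two classes.
per_class, pi = privacy_insurance([100, 50], [10, 25])
```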

Experiment and Evaluation
In order to illustrate the effectiveness of our Stealth algorithm, we evaluate it from four aspects:
(i) We clarify whether the images processed by our algorithm can resist DNNs effectively. We show the results on nearly 5000 images in the PASCAL VOC 2007 test dataset to confirm that.
(ii) We compare our algorithm with ten other methods of modifying images for resisting detection. Results indicate that our method works best and has minimal impact on image visual quality.
5.1. Some Experimental Setups. We test our algorithm on the PASCAL VOC 2007 dataset [40]. This dataset consists of 9963 images and is split roughly equally into the trainval (training and validation) set and the test set. It contains 20 categories, which are common objects in life, including people, several kinds of animals, vehicles, and indoor items. Each image contains one or more objects, and the objects vary considerably in scale. As for DNNs, we use two nets trained with Faster RCNN on the deep learning framework Caffe [41]. One is the fast version of ZF-net [42] with 5 convolution layers and 3 fully connected layers, and the other is the widely used VGG-16 net [43] with 13 convolution layers and 3 fully connected layers. Our implementation runs on a machine with 64 GB RAM, an Intel Core i7-5960X CPU, and two Nvidia GeForce GTX 1080 GPU cards.

Effectiveness and Comparison.
Here we first illustrate the effectiveness through several samples and compare with other trivial methods. In the next subsection, we introduce the results of larger-scale experiments. As shown in Figure 5, images processed by our algorithm can dodge detection successfully, and humans can hardly notice the slight changes. Consequently, we have generated a kind of image that is harmful to the machine but friendly to humans. For most images in our experimental dataset, the machine cannot see where objects are (the first two rows in Figure 5), let alone identify what specific category they belong to. For a small number of images, even if the machine is aware that there may be some objects in the image, it cannot locate them exactly or classify them correctly (the last row in Figure 5). In short, in the vast majority of cases, the machine gives the wrong answer. To give a quantitative analysis, we introduce a new measurement, cloak thickness, which is explained in detail in Section 5.3.
In addition, we show ten other trivial but interesting ways of modifying images to interfere with detection machines in Figure 6. We use PSNR (Peak Signal to Noise Ratio) to evaluate the visual quality of the processed images. These methods include both global and local modification. Local processing here is applied at the location of objects, rather than at a random location.
(i) For global mosaic in Figure 6(b), local mosaic in Figure 6(c), global blur (Gaussian blur here) in Figure 6(d), and local blur in Figure 6(e), the PSNR value is somewhat larger than for the other ways. This indicates that although the perturbation is not very large, the image becomes unpleasantly murky. People usually cannot endure viewing such images on the Web. Sadly, although people cannot bear it, the machine can still detect most objects correctly. Thus smoothing filters (like mosaic or Gaussian blur) are unable to resist a DNN-based detector. We think DNNs can compensate for the homogeneous loss of information; that is, once a certain pixel is determined, a small number of surrounding pixels are not very critical.
(ii) As shown in Figures 6(f) and 6(g), an image with large Gaussian noise has poor quality, judging by its low PSNR value. But the machine is still able to draw an almost correct conclusion. This shows that adding Gaussian noise is not a good way to deceive the detector, either.
(iii) As for a large area of occlusion on key objects, whether black occlusion in Figure 6(h) or white occlusion in Figure 6(i), both make the quality deteriorate drastically. Surprisingly, in spite of a large amount of information loss, the detection result is still almost accurate.
(iv) From Figure 6(j), we can see that adjusting the image brightness to a fairly low level cannot resist the detector, either. It also causes the greatest damage to the image, so that human eyes cannot see anything in the image at all. But the detector gives rather accurate results.
(v) In order to make the machine unaware of the existence of objects in the image, another natural idea is to make objects transparent to the machine. So we try to change the transparency of the image and hide it in another image, as shown in Figure 6(k). And yet it still does not work.
(vi) By contrast, from Figure 6(l), we can see that our Stealth algorithm does the least damage to image quality while also resisting detection effectively. In order to better illustrate its effectiveness, we have carried out larger-scale experiments, which are described next.
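For reference, the PSNR used in the comparison above follows the standard definition, 10 log10(peak^2 / MSE). The sketch below computes it for small grayscale images stored as nested lists and also shows a simple global mosaic as one example of the baseline modifications. The function names, the block-averaging form of the mosaic, and the tiny test images are assumptions for illustration, not the exact preprocessing used in our experiments.

```python
import math

def psnr(original, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, processed)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def mosaic(image, block):
    """Replace each block x block tile by its mean value (a simple global mosaic)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            tile = [image[r][c] for r in rows for c in cols]
            mean = sum(tile) / len(tile)
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out

# Made-up 2 x 2 image.
image = [[0, 2], [4, 6]]
blurred = mosaic(image, 2)
quality = psnr(image, blurred)
```

A higher PSNR means the processed image is closer to the original, so a method is preferable for human viewers when it keeps the PSNR high while still defeating the detector.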

Privacy Insurance.
In order to describe the degree of privacy protection in our algorithm, we define a parameter, cloak thickness $\tau$, to weigh the trade-off between privacy and visual quality. Users can tune this parameter to determine the intensity of the adversarial disturbance on each pixel. For a specific $\tau$, the modification to each pixel is uneven. What we do is multiply the gradient value obtained by DNN backpropagation by $\tau$. This is equivalent to scaling the gradient of every pixel by a factor of $\tau$ simultaneously, and the result is the final modification added to the image. A greater gradient value at a pixel means a greater distance from our target, so we need to add more adversarial interference to this pixel. Different $\tau$ values also influence the results. The added interference is proportional to $\tau$. The greater $\tau$ is, the thicker the cloak the image is wearing, and the more blind the machine is to it. But, of course, the visual quality goes down. We test on nearly 5000 images and calculate the PI using ZF-net and VGG-net; the results can be found in Table 2. The 20 classes include airplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv monitor. Except for very few classes, the PI values of the vast majority are fairly high. This roughly means that we have successfully protected most of the users' information in the images.
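The scaling described above can be sketched as a single function. Names and sample values are hypothetical, and clipping to the pixel range [0, 255] is an added assumption so that the result stays a valid image; the gradient is whatever the backpropagation step returns.

```python
def apply_cloak(image, gradient, tau, low=0.0, high=255.0):
    """Add tau * gradient to every pixel and clip to the valid range (assumed)."""
    return [[min(high, max(low, p + tau * g)) for p, g in zip(p_row, g_row)]
            for p_row, g_row in zip(image, gradient)]

# Made-up one-row image and gradient.
image = [[100.0, 250.0]]
gradient = [[0.5, 2.0]]
thin = apply_cloak(image, gradient, tau=4.0)
thick = apply_cloak(image, gradient, tau=8.0)
```

Doubling the cloak thickness doubles the change on pixels that are not clipped, which is the proportionality between the added interference and the parameter described above.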
Assume that a user shares many pictures and then tries to protect his or her privacy by using different methods of perturbing images. We test the PI values of all these methods, as shown in Figure 7. We can see that our Stealth algorithm protects the most privacy; mosaic comes second, but it has destructive effects on the image. The other methods not only fail to protect privacy, but also degrade the visual quality of images to a level that users cannot put up with. Of course, users can get more insurance for their privacy by increasing the cloak thickness $\tau$, but they then face the risk of deteriorating image quality, as shown in Figure 8. From this figure, we find that $\tau = 0.3 \times 10^{3}$ could be an appropriate value, at which we obtain satisfactory privacy insurance while preserving the visual effect. Even if the cloak thickness is fairly large (e.g., $\tau = 1.2 \times 10^{3}$), the PSNR is still greater than that of any other method. The Stealth algorithm's modification to a pixel is related to the current value of the pixel, so the result does not look abrupt after processing.
From the above experimental results, we can see that our algorithm works well, but the existence of classes with low PI values (e.g., Class 8 "cat," Class 12 "dog," and Class 14 "motorbike") is worth thinking about. Here we present some illustrations and thoughts on this question. The extracted feature of each region proposal corresponds to a point in a high dimensional space. The correctness of the judgment is related to the classification boundary. Our work changes the positions of these points by adding a perturbation to an image, so that the points cross the boundary and jump to another class (from the be-object class to the not-object class).
Our algorithm is independent of the specific class of the object. That is to say, to offset the generation of region proposals, we use the same number of iterations ($\Gamma$) and the same multiplier ($\tau$) for all classes when we superimpose the gradient disturbance. In the abstract high dimensional space, features of different classes occupy different subspaces, which may be large or small. So perturbations with the same number of iterations and the same multiplier are bound to successfully counteract the features of some classes while failing for a few others. The reason for failure may be that the number of iterations is insufficient or that the magnitude of the modification is not large enough for these classes. For each region proposal feature in the detector, Figure 9 gives a vivid illustration of the following four cases.
Case 1. The region proposal features of some classes are successfully counteracted after the image is processed. In other words, the corresponding feature point jumps from the be-object subspace to the not-object subspace. In this case, our algorithm can be deemed a success.
Case 2. Region proposal features of some classes are partly counteracted. The feature point stays in the be-object subspace, but features in this region are not strong enough to belong to any specific class. That is to say, these proposals will be discarded in the following classification stage, because their scores for every class are lower than our set threshold. In this case, the final result is that objects cannot be detected, so it is an indirect success.
Case 3. The feature point jumps from one object class to another. As a result, the detector gives an approximate bounding box, but its label might be incorrect. This case is just a weak success.
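The cases listed above can be told apart from the detector's scores for a single proposal, as in the following sketch. The function name, the thresholds, and the sample scores are hypothetical, and the fallback label "unchanged" is this sketch's own name for any outcome not covered by Cases 1 to 3.

```python
def resistance_case(objectness, class_scores, original_class, th_rp, th_cl):
    """Label the outcome for one proposal after the image has been processed."""
    if objectness < th_rp:
        return "case 1"     # jumps to the not-object subspace
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    if class_scores[best] < th_cl:
        return "case 2"     # still an object, but no class score reaches the threshold
    if best != original_class:
        return "case 3"     # detected, but labeled as a different class
    return "unchanged"      # fallback label used only in this sketch

# Made-up scores for k = 2 classes; the original class has index 0.
outcomes = [
    resistance_case(0.1, [0.9, 0.1], 0, th_rp=0.5, th_cl=0.6),
    resistance_case(0.8, [0.3, 0.4], 0, th_rp=0.5, th_cl=0.6),
    resistance_case(0.8, [0.2, 0.7], 0, th_rp=0.5, th_cl=0.6),
    resistance_case(0.8, [0.9, 0.05], 0, th_rp=0.5, th_cl=0.6),
]
```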

Conclusion and Future Work
In this paper, we propose the Stealth algorithm for crafting adversarial examples that resist automatic detection systems based on the Faster RCNN framework. Similar to misleading the classification task in previous work, we add some interference that tricks computer vision into ignoring the existence of objects contained in images. Users can process images with our algorithm before uploading them to social networks, thus avoiding tracking by online detection systems and minimizing privacy disclosure. In effect, it is as if the objects in images were wearing an invisibility cloak and everything disappeared from the machine's view. As a comparison, we conduct experiments on modifying images with several other trivial but intriguing methods (e.g., mosaic, blur, noise, low brightness, and transparency). The results show that our Stealth scheme is the most effective and has minimal impact on image visual quality. It can guarantee, with high probability, both high image fidelity to humans and invisibility to machines. We define a user-adjustable parameter that determines the adversarial disturbance intensity on each pixel, that is, cloak thickness, and a measurement that indicates how much privacy can be protected, that is, privacy insurance. We have further explored the relation between them. In addition, we find the adversarial examples crafted by our Stealth algorithm have the transferability property; that is, the

Figure 1: The general process of obtaining privacy through online photos.

Figure 3: Region proposal process in the training phase and in our algorithm.

Figure 4: The original and processed image through a DNN detector.
(i) Whether global mosaic in Figure 6(b), local mosaic in Figure 6(c), global blur (Gaussian blur here) in Figure 6(d), or local blur in Figure 6(e), compared to other ways, their PSNR value is a bit larger.

Figure 9: An intuitive understanding of adversarial images for the detection task in the high dimensional space. (a) Different cases in which the feature point moves between the be-object class and the not-object class in the high dimensional feature space. (b) Different cases in which the feature point moves among different specific classes. Each colored subspace represents a specific class. The subspace in the be-object region that does not belong to any specific class represents features whose score for every class is lower than our set threshold.

Table 1: Some statistics on photos from Twitter.