Re-ADP: Real-Time Data Aggregation with Adaptive ω-Event Differential Privacy for Fog Computing

In the Internet of Things (IoT), aggregating and releasing real-time data can reveal useful information that makes people's lives more convenient and efficient. However, privacy disclosure is a major concern because aggregated data usually carries sensitive information about users. Various data encryption technologies have therefore emerged to preserve privacy. These technologies not only introduce complicated computation and high communication overhead but also fail to protect endless data streams. Considering these challenges, we propose a real-time stream data aggregation framework with adaptive ω-event differential privacy (Re-ADP). Based on adaptive ω-event differential privacy, the framework protects any data collected by sensors within any dynamic window of ω successive time stamps over an infinite stream. It is designed for the fog computing architecture, which extends cloud computing to the edge of the network. In the proposed framework, fog servers send only aggregated, privacy-protected data to cloud servers, which relieves the computing overhead of cloud servers, improves communication efficiency, and protects data privacy. Finally, experimental results demonstrate that our framework outperforms existing methods and improves data availability with stronger privacy preservation.


Introduction
Driven by the development of cyberphysical networks, cloud computing, the mobile Internet, and context-aware smart devices, the corresponding data is experiencing explosive growth [1]. Cloud computing provides a good solution to deal with this data growth and to realize resource sharing [2]. However, cloud-based services may face many challenges, such as high latency and high overhead at cloud servers, due to the centralized structure and the limitation of network bandwidth. Some researchers have presented a distributed service computing paradigm called fog networking [3][4][5]. It allocates the capabilities of data gathering, data processing, computing, and applications to devices located at the edge of the network, so as to provide intelligent services for nearby users.
Although fog computing provides great benefits, sensitive and private information mined from raw data (e.g., social relationships and financial transactions) is also exposed to the risk of disclosure. Moreover, due to the complexity and diversity of fog nodes, user privacy in a fog network can easily be disclosed. For example, more than 400,000 electronic eyes in Beijing may lead to privacy leakage (e.g., vehicle location information) through data sharing in vehicular ad hoc networks (VANETs) [6][7][8]. Similarly, an attacker can gain illegal access to personal health datasets gathered from various physical-sign sensors in body sensor networks (BSNs) and publish these private data without permission [9][10][11]. As a result, how to protect user privacy is one of the important research issues in fog computing.
Currently, the protection of aggregated data privacy is mainly divided into two types. The first one is designed based on various encryption technologies, such as homomorphic encryption [6][8][9][10]. In this type, the encryption technology may cause huge computational overhead and consume many computing resources of cloud services [12]. In addition, cryptography-based schemes may lower the efficiency of the communication system, especially when the system contains many sensors with high reporting frequency. The reason is that a great number of communication resources may be spent on transmitting encrypted information and the corresponding keys. As a result, this type is not suitable for energy-limited sensor networks.

Wireless Communications and Mobile Computing
The other type of aggregated data privacy preservation uses differential privacy [13]. Compared with traditional cryptography-based schemes, differential privacy can protect individuals' privacy while keeping data accuracy as high as possible. For example, the authors of [14] protect the privacy of aggregated data with differential privacy by using machine learning. Although many studies are based on differential privacy, some challenges remain unaddressed. These studies do not consider the high correlation of time series and therefore cannot generate real-time aggregated data with high accuracy. Moreover, a practical framework should be able to satisfy batch queries over continuous time by exchanging information only once.
To address these challenges, we propose a real-time privacy-preserving stream data aggregation framework based on adaptive ω-event differential privacy under the fog computing architecture. In fog computing, data storage, processing, and applications are concentrated in devices on the edge of the network rather than entirely in the cloud. This type of architecture reduces the amount of data transmitted to the cloud, increases efficiency, and significantly lowers overhead on the server itself. In addition, a fog center is considered as a data aggregator in our framework. It reports only the sanitized aggregation results to the cloud server. In this way, the efficiency of communication can be greatly improved. Moreover, sensors report raw data instead of encrypted data because our framework does not rely on complex encryption technology. Finally, several techniques for processing time-series data are exploited in our framework to improve the accuracy of aggregated data, such as adaptive sampling, time-series prediction, and filtering.
In a nutshell, the main contributions of the paper are summarized as follows.
(i) We propose a real-time privacy-preserving stream data aggregation framework based on adaptive ω-event differential privacy under the fog computing architecture. The framework relieves the overhead of cloud servers and generates aggregated data with differential privacy preservation.
(ii) In order to improve ω-event differential privacy, we introduce a novel metric, i.e., quality of privacy (QoP). The QoP design takes into account both the window size ω and the errors of published statistics. Using this metric, we adjust the window size ω adaptively through a QoP-based adaptive ω-event mechanism.
(iii) We exploit the long short-term memory (LSTM) network to predict time-series data and design an adaptive sampling scheme to improve the accuracy of aggregated data.
(iv) We theoretically analyze the privacy of the proposed Re-ADP framework and demonstrate the high accuracy of aggregated data through numerical simulation results.
The rest of the paper is organized as follows. In Section 2, we introduce preliminaries of differential privacy and ω-event privacy. Then, we provide the system model, the adversary model, and the whole Re-ADP framework to illustrate our problem. In Section 3, we present a QoP-based adaptive ω-event privacy algorithm that includes a dynamic adjustment method for the window size ω. Section 4 presents a smart grouping-based perturbation algorithm, which can significantly reduce the noise added to data. In Section 5, we analyze whether the Re-ADP framework satisfies differential privacy and provide a series of simulation results to discuss the performance of each mechanism in our framework. We then review previous work related to the privacy preservation of aggregated data and differential privacy in Section 6. Finally, Section 7 concludes our paper and explains promising research directions for future work.

Problem Statement and Preliminaries
2.1. System Model. The system model, shown in Figure 1, is composed of four layers: the things layer, the fog layer, the core layer, and the cloud layer. The function of each layer is described as follows.
(i) The things layer, consisting of various smart devices, e.g., sensors, mobiles, and actuators, generates raw data and reports it to the fog layer.
(ii) The fog layer, typically located between IoT devices and core networks, is composed of many fog devices. The fog devices can be traditional network devices, such as routers, switches, and gateways, or specially deployed local servers. In this paper, the devices are mainly local servers and are responsible for (a) gathering and storing data reported from the things layer, (b) computing and aggregating data to satisfy differential privacy, and (c) responding to query requests from the cloud layer.
(iii) The core layer is in charge of transferring and exchanging data between the fog layer and the cloud layer through network protocols such as IP and MPLS.
(iv) The cloud layer deploys many cloud servers that can analyze massive aggregated data. Using the analysis results, cloud servers can provide a wide range of services.

Adversary Model.
In this paper, we assume that both the cloud layer and the core layer are untrustworthy.
where Range(A) denotes the range of the randomized algorithm A.
Note that ε, called the privacy budget, is an important parameter in differential privacy. It represents the privacy level of the randomized algorithm A. More specifically, the level of privacy is inversely related to ε: a smaller ε gives stronger privacy. A widely used method to achieve ε-differential privacy is the Laplacian mechanism, as shown below.
Theorem 2 (the Laplacian mechanism [15]). Let $\mathcal{D}$ denote a set of datasets. Considering a function $f : \mathcal{D} \to \mathbb{R}^d$, the Laplacian mechanism A for any dataset $D \in \mathcal{D}$ is $A(D) = f(D) + \mathrm{Lap}(\Delta(f)/\epsilon)$, where the noise follows a Laplacian distribution with mean zero and scale $\Delta(f)/\epsilon$. Here, $\Delta(f)$ denotes the sensitivity of $f$, which is defined as the maximum $L_1$ norm of $f(D_1) - f(D_2)$ over any neighboring datasets $D_1$ and $D_2$.
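As an illustration of Theorem 2, the following minimal sketch adds Laplacian noise with scale Δ(f)/ε to a scalar statistic. The function names `laplace_noise` and `laplace_mechanism` are hypothetical and not taken from the paper; the sampler uses the standard inverse-CDF method.

```python
import math
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5            # uniform on [-0.5, 0.5)
    u = max(u, -0.5 + 1e-12)          # avoid log(0) at the left endpoint
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value plus Laplacian noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # A count query (sensitivity 1) released with budget epsilon = 0.5.
    print(laplace_mechanism(120, sensitivity=1.0, epsilon=0.5))
```

A smaller ε yields a larger noise scale, which matches the inverse relation between ε and the privacy level described above.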
Theorem 3 shows that the overall privacy budget of a combination of several differentially private algorithms is the sum of their individual budgets.

ω-Event Privacy.
ω-event privacy, short for ω-event ε-differential privacy, is a privacy model proposed by Kellaris et al. [17]. It protects any event sequence occurring within any window of ω time stamps.
We define two neighboring datasets at the $i$th time stamp as $D_i$ and $D_i'$ and a stream prefix of an infinite series $S = (D_1, D_2, \ldots)$ at the $t$th time stamp as $S_t = (D_1, D_2, \ldots, D_t)$.
where O is the set of all possible outputs of A. A mechanism satisfying ω-event privacy protects the sensitive information that may be disclosed from a sequence of length ω.
According to the above definitions, we refer to [17] to conclude Theorem 6. The theorem enables an ω-event private scheme to view ε as the total available privacy budget in any sliding window of size ω and to allocate portions of it appropriately across the time stamps.
Theorem 6. Assume that A is a mechanism with input stream prefix $S_t$, where $S_t[i] = D_i \in \mathcal{D}$, and output $\mathbf{s} = (s_1, \ldots, s_t) \in \mathcal{S}$. Suppose A can be decomposed into $A_i(D_i) = s_i$, $i \in [1, t]$, where each $A_i$ generates independent randomness and achieves $\epsilon_i$-differential privacy. Then, A satisfies ω-event privacy if $\sum_{k=i-\omega+1}^{i} \epsilon_k \le \epsilon$ for every $i \in [1, t]$.
Based on this fundamental theorem, we explore a novel adaptive ω-event differential privacy mechanism in our work. The proposed mechanism is designed for real-time privacy-preserving stream data aggregation under the fog computing architecture.
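The sliding-window budget view described by Theorem 6 can be checked mechanically. The sketch below is only an illustration under the assumption that the per-time-stamp budgets are given as a list; the function name `satisfies_w_event_budget` is hypothetical.

```python
def satisfies_w_event_budget(budgets, w, epsilon, tol=1e-12):
    """Return True if every window of w consecutive time stamps spends at most epsilon.

    budgets[i] is the budget spent at time stamp i (0 for skipped time stamps).
    """
    if w <= 0:
        raise ValueError("window size must be positive")
    window_sum = 0.0
    for i, eps_i in enumerate(budgets):
        if eps_i < 0:
            raise ValueError("budgets must be non-negative")
        window_sum += eps_i
        if i >= w:
            window_sum -= budgets[i - w]   # drop the time stamp that left the window
        if window_sum > epsilon + tol:
            return False
    return True


if __name__ == "__main__":
    # Uniform allocation: epsilon / w at every time stamp.
    print(satisfies_w_event_budget([0.25] * 10, w=4, epsilon=1.0))
    # Sampling: the whole budget is spent once per window, other time stamps are skipped.
    print(satisfies_w_event_budget([1.0, 0, 0, 0, 1.0, 0, 0, 0], w=4, epsilon=1.0))
```

Skipping time stamps leaves budget for later ones inside the same window, which is the property that sampling-based schemes rely on.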

Motivation and System Framework.
Our motivation is to design a real-time stream data aggregation framework that can protect user privacy in any window of ω time stamps, allow batch queries, and obtain high-accuracy results. To this end, we divide our work into two main tasks.
(i) Protect privacy in any window of ω time stamps. Servers may query aggregated data within ω time stamps in only one round of communication. Therefore, the proposed framework must protect the privacy of data generated within ω time stamps. Besides, the window size ω should be adaptively adjusted according to how the data changes.
(ii) Improve the accuracy of aggregated data. Because of the Laplacian mechanism, the proposed framework needs to add random noise to data to guarantee privacy protection. Thus, the framework must reduce the extra errors of aggregated data as much as possible on the premise of privacy preservation.
In this article, we design an adaptive ω-event based differential privacy-preserving strategy. This strategy, shown in Figure 2, is composed of adaptive ω-event privacy analysis, smart grouping-based perturbation, and the filtering mechanism. The complete process of the proposed Re-ADP strategy is outlined in Algorithm 1. The first component, described in Section 3, is based on adaptive sampling and QoP measurement. The second one, presented in Section 4, is based on K-means smart grouping and the corresponding perturbation mechanism. Finally, we exploit a filtering mechanism similar to that in [18] to reduce the errors of aggregated data and thereby improve data availability.

QoP-Based Adaptive 𝜔-Event Privacy Design
For privacy protection in infinite stream data aggregation, ω-event privacy is a convincing model. The objective is to make a trade-off between utility and privacy to protect all data sequences that occur within all windows of ω time stamps. However, it is not applicable to many realistic scenarios due to the fixed size of the sliding window. The key issue with this assumption is that most real-time aggregate data streams collected from sensors differ significantly across time periods. For example, traffic data varies sharply in the daytime but is relatively stable at night. Thus, we introduce a new QoP-based adaptive ω-event privacy mechanism in this section to dynamically adjust the window size ω over time. The following three subsections describe the key parts of this mechanism: the QoP definition, the adaptive sampling design, and the adaptive ω-event privacy design.

Quality of Privacy.
Considering the window size ω and the errors of aggregated statistics, QoP is proposed to measure the corresponding privacy quality. Assume $\mathbf{x} = \{x_1, \ldots, x_\omega\}$ and $\mathbf{r} = \{r_1, \ldots, r_\omega\}$ represent the raw time series in a window and the sanitized time series, respectively. Then, we exploit the mean absolute error (MAE) to measure the difference between these two time series: $\mathrm{MAE}(\mathbf{x}, \mathbf{r}) = \frac{1}{\omega} \sum_{i=1}^{\omega} |x_i - r_i|$. (5)
Next, we employ a sampling mechanism in the proposed Re-ADP. It perturbs statistics at selected time stamps and approximates the nonsampled statistics with perturbed sampled statistics. Thus, (5) can be rewritten as (6). As a result, QoP in a window is defined as in (7), where ω is the window size and a weight, set to 0.002 in our experiments, balances ω against MAE(x, r). In addition, $\sigma(\cdot)$ is the logistic sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$. We employ the logistic sigmoid function for normalization because it does not require knowing the general characteristics of the data. Intuitively, as sensor data generated in contiguous time stamps is not independent, there is close correlation among these data when data changes slowly. Meanwhile, because sensitive information may then be disclosed, the window size ω should be increased when data changes slowly.
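The two ingredients named above, the MAE between the raw and released series and the logistic sigmoid, can be sketched as follows. This is an illustration only: it assumes that a nonsampled time stamp reuses the most recent perturbed sample, and it stops short of the QoP formula itself. The helper names are hypothetical.

```python
import math


def mean_absolute_error(raw, released):
    """MAE between the raw series and the released (sanitized) series."""
    if len(raw) != len(released) or not raw:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(x - r) for x, r in zip(raw, released)) / len(raw)


def fill_nonsampled(perturbed_samples, length):
    """Build the released series from perturbed samples.

    perturbed_samples maps a sampled time index to its perturbed statistic;
    a nonsampled index reuses the most recent perturbed sample (assumption).
    """
    released, last = [], None
    for t in range(length):
        if t in perturbed_samples:
            last = perturbed_samples[t]
        if last is None:
            raise ValueError("the first time stamp must be sampled")
        released.append(last)
    return released


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    raw = [10, 12, 11, 15]
    released = fill_nonsampled({0: 11.0, 2: 10.0}, len(raw))
    print(released, mean_absolute_error(raw, released), sigmoid(0.0))
```

The sigmoid maps any real input into (0, 1), which is why it can normalize a quantity without prior knowledge of the data range.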

The Adaptive Sampling Design.
In general, each report of noisy data consumes part of the fixed budget ε. When all time stamps are protected, the budget allocated to each time stamp is small if the window size ω is large. In this case, the report shows very large errors. This problem can be addressed by using a sampling mechanism, which perturbs sampled statistics while skipping nonsampled statistics. Skipping some data points saves budget for future perturbation. The sampling rate is adjusted by PID control, where $K_p$, $K_i$, and $K_d$ denote the proportional gain, the integral gain, and the derivative gain, respectively. Intuitively, the sampling interval needs to be small when data changes rapidly. Thus, a new sampling interval is calculated by (11).
In (11), the two interval terms denote the current sampling interval and the previous one of a sensor. Of the two tuning parameters, the first regulates the sampling interval and the second controls the sensitivity to the PID error.
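To make the idea concrete, the sketch below computes a PID error from recent feedback errors and derives a new sampling interval from it. The update rule in `next_interval` is a hypothetical stand-in chosen for illustration and is not claimed to be the paper's (11); the parameter names `step`, `tolerance`, and `window` are likewise assumptions, and the derivative term is simplified to a plain difference.

```python
import math


def pid_error(errors, kp, ki, kd, window):
    """Combine proportional, integral, and derivative terms of the feedback errors.

    errors holds the feedback errors at past sampling points, oldest first.
    """
    if not errors:
        raise ValueError("at least one feedback error is required")
    recent = errors[-window:]
    proportional = kp * errors[-1]
    integral = ki * sum(recent) / len(recent)
    derivative = kd * (errors[-1] - errors[-2]) if len(errors) > 1 else 0.0
    return proportional + integral + derivative


def next_interval(previous_interval, delta, step, tolerance):
    """Hypothetical update: grow the interval when delta is below tolerance, shrink otherwise."""
    change = step * (1.0 - math.exp((delta - tolerance) / tolerance))
    return max(1, round(previous_interval + change))


if __name__ == "__main__":
    delta = pid_error([0.1, 0.2], kp=0.9, ki=0.1, kd=0.0, window=10)
    print(delta, next_interval(5, delta, step=2.0, tolerance=0.5))
```

With this rule, slowly changing data (small PID error) lengthens the interval and saves budget, while rapidly changing data drives the interval down to its minimum of one time stamp.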

The Adaptive ω-Event Privacy Algorithm.
On the basis of the two subsections above, the adaptive ω-event privacy algorithm is proposed in Algorithm 2. Note that lines 1 to 6 of the pseudocode are executed offline over the training set.
We assume that the starting and ending points of the window are both sampling points. As a result, the window size ω equals the sum of the sampling intervals between consecutive sampling points in the current window. According to (6) and (7), the QoP in a window can then be calculated. After the required parameter is obtained over the training set, the adaptive ω-event privacy mechanism proceeds as described in lines 7 to 10. In particular, we can adjust the window size by moving the start point of the window forward or backward by Δ time stamps.

Smart Grouping-Based Perturbation
A naive method to achieve differential privacy is to inject Laplacian noise into the statistics directly. Nonetheless, this is likely to introduce large perturbation errors, especially for statistics with small values. Therefore, we present a smart grouping-based perturbation that aggregates sensors with small statistics together dynamically as the statistics change.
The Smart Grouping Algorithm is presented in Algorithm 3. It is divided into three main steps. First, it screens out the sensors that need to be grouped according to the statistics predicted by the LSTM model. Then, it groups these sensors using the K-means algorithm. Finally, the aggregated data is perturbed based on the grouping result. We elaborate on each step in the following subsections.

Statistics Prediction with LSTM.
To protect the privacy of raw data, we use predicted data instead of real values in the smart grouping-based perturbation algorithm. As mentioned above, whether a sensor needs to be grouped depends on the predicted data of the sensor. In addition, the group to which each sensor is assigned also depends on the predicted value. This means that the accuracy of the predicted value is critical to the accuracy of the final aggregated data. Thus, a good model must be formulated, one that describes the characteristics of data change well and predicts data accurately.
Algorithm 3 (excerpt):
Let the sensor itself be a group
(5) Add the group to the grouping strategy
(6) else
(7) Add the sensor into Φ
(8) Employ the K-means algorithm to cluster the sensors in Φ according to their predicted statistics.
(9) Add each group into the grouping strategy based on the cluster results.
To achieve accurate prediction, we introduce the LSTM model. The LSTM network [19] has been increasingly applied to time-series analysis [20][21][22]. It is a special type of recurrent neural network (RNN) that mitigates the vanishing gradient problem of RNNs. A common LSTM unit is composed of a memory cell, an update gate, an output gate, and a forget gate. The memory cell stores a value (or state) over either long or short terms. Information can be removed from or added to the cell state through the three gates. As a result, we adopt the LSTM network to capture the nonlinear characteristics of data in our algorithm.
To keep the Smart Grouping Algorithm efficient, our LSTM network consists of only three layers (shown in Figure 3), i.e., the input layer, the hidden layer, and the output layer. The number of neurons in the input layer is determined by the number of previously aggregated data points used for prediction. The output layer has just one neuron because we only need to predict the value at the next time stamp. The hidden layer consists of several LSTM units. $W_1$ is the weight matrix between the input layer and the hidden layer, while $W_2$ is that between the hidden layer and the output layer. In addition, each context unit corresponds to a neuron in the hidden layer and records the output of the hidden layer in one recurrence.
As shown in Figure 3, the historical aggregated data is used as the input to the LSTM model so as to predict the value for each sensor at the current time. For example, suppose we need to predict the value generated by a sensor at the current time stamp, using the previously aggregated data of that sensor as the input. We first calculate the output of a hidden-layer unit. Figure 4 shows the detailed structure of an LSTM unit, and this output is calculated as follows.
First, the LSTM unit determines what information should be forgotten from the cell state by using the forget gate in (13), where $\sigma(\cdot)$ is the logistic sigmoid function, $W_f$ is the weight matrix of the forget gate layer, and $b_f$ is the bias vector of the forget gate layer. The forget gate takes two inputs: the output of the hidden layer at the previous time stamp and the input of the hidden layer at the current time, which is computed in (14).

[Figure 3: Structure of the LSTM network, with an input layer, a hidden layer of LSTM units with context units, and an output layer of output units.]
Next, the LSTM unit uses the update gate layer to decide what new information needs to be stored in the cell state. The update gate indicates which values will be updated, and a vector of new candidate values is produced; each of the two has its own weight matrix in the update gate layer. The candidate values are computed with the hyperbolic tangent function, $\tanh(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})$. Then, the cell state at the current time is updated based on (18).
Here, the cell state at the current time is controlled by the update gate and the forget gate. Finally, based on the latest cell state, the output of the hidden layer at the current time is calculated, where the output gate determines which part of the cell state should be output. According to (13), (15), and (19), we can see that the LSTM unit is able to decide which information is forgotten, updated, and output. This ability enables us to predict the time series in our network more accurately.
The final predicted value of a sensor at the current time is then calculated from the hidden-layer output, where the output layer applies its activation function.
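For reference, one step of a standard LSTM cell can be sketched as below. This follows the textbook formulation (forget gate, update gate, candidate values, cell-state update, output gate) and is only assumed to correspond to the equations referenced above; the parameter layout in `params` and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell.

    params maps "forget", "update", "candidate", "output" to (W, b), where
    W has one row per hidden unit and len(h_prev) + len(x) columns.
    """
    z = h_prev + x  # concatenation of previous hidden output and current input
    w_f, b_f = params["forget"]
    w_u, b_u = params["update"]
    w_c, b_c = params["candidate"]
    w_o, b_o = params["output"]
    forget = [sigmoid(a) for a in affine(w_f, z, b_f)]       # what to forget
    update = [sigmoid(a) for a in affine(w_u, z, b_u)]       # what to update
    candidate = [math.tanh(a) for a in affine(w_c, z, b_c)]  # new candidate values
    cell = [f * c + u * g
            for f, c, u, g in zip(forget, c_prev, update, candidate)]
    out_gate = [sigmoid(a) for a in affine(w_o, z, b_o)]     # what to output
    hidden = [o * math.tanh(c) for o, c in zip(out_gate, cell)]
    return hidden, cell


if __name__ == "__main__":
    zeros = ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0])
    params = {name: zeros for name in ("forget", "update", "candidate", "output")}
    print(lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params))
```

In practice a deep learning library would be used for training and inference; the pure-Python form here is meant only to show how the three gates interact with the cell state.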
Training of the LSTM network: in order to achieve real-time prediction, we train the network parameters offline in advance. In addition, we use the true statistics of the training set so that the trained model is accurate. Therefore, for a sensor at a given time stamp, the input consists of the true statistics at the preceding time stamps. Using the backpropagation algorithm [23], the training error is propagated to the neurons in the LSTM network. Then, we calculate the training error caused by each neuron and adjust the corresponding weights to reduce the errors. Details of the training process can be found in [23]. Finally, given the historical aggregated data, the trained LSTM model can predict sensor data in real time.

K-Means Based Smart Grouping Algorithm.
In this subsection, we present a Smart Grouping Algorithm based on the K-means method. The algorithm aggregates small statistics obtained from sensors so that they are less affected by noise. First, we allocate the budget to each sampling point and then generate an antinoise threshold dynamically. The threshold is inversely proportional to the allocated budget. Then, we obtain the predicted data of each sensor at each time stamp by using the trained LSTM model. Finally, we apply the K-means algorithm [24] to group the sensors whose predicted statistics are smaller than the antinoise threshold.
Compared with other algorithms, the K-means algorithm is fast and efficient, which makes it suitable for large data scenarios. Thus, it fits the data size and real-time requirement of our algorithm. Next, we introduce how the K-means algorithm works in our scenario. The input is the set of predicted statistics, at the current time stamp, of the sensors that need to be grouped. We first randomly initialize the $K$ cluster centers and then assign each sensor to the cluster whose center is nearest, using the Euclidean distance from the current point to the center point. Next, we update each cluster center to the mean of all points in the cluster obtained from the previous step. The algorithm repeats these two steps until convergence, i.e., until the squared error of every point to its center point falls below a threshold value or the preset maximum number of iterations is reached.
Figure 5 gives an example that explains the whole process of the Smart Grouping Algorithm. Assume there are four sensors that need to be sampled at the current time stamp, with predicted statistics 12, 11, 23, and 80, respectively, and that the antinoise threshold is 50. Sensor 4 forms an independent group because 80 > 50, and this group is added to the grouping strategy of the current time stamp. Sensors 1, 2, and 3 are input to the K-means algorithm. Sensors 1 and 2 are clustered into one group, while sensor 3 becomes a single group. Thus, the final grouping strategy is {{sensor 1, sensor 2}, {sensor 3}, {sensor 4}}.
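The grouping steps and the four-sensor example can be reproduced with the short sketch below. It is an illustration under two assumptions that the text does not state: the number of clusters is set to 2 for the example, and the cluster centers are initialized deterministically (evenly spread between the smallest and largest value) rather than randomly, so that the result is repeatable. The function names are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values; return clusters as lists of indices into values."""
    k = min(k, len(values))
    lo, hi = min(values), max(values)
    if k > 1:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    else:
        centers = [sum(values) / len(values)]
    assignment = None
    for _ in range(max(1, max_iter)):
        # Assignment step: each value goes to its nearest center.
        new_assignment = [min(range(k), key=lambda c: abs(v - centers[c]))
                          for v in values]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    clusters = [[i for i, a in enumerate(assignment) if a == c] for c in range(k)]
    return [cluster for cluster in clusters if cluster]


def smart_grouping(predicted, threshold, k):
    """Group sensors whose predicted statistic is below the antinoise threshold."""
    groups, small = [], []
    for sensor, value in predicted.items():
        if value >= threshold:
            groups.append([sensor])      # large statistic: the sensor is its own group
        else:
            small.append(sensor)         # small statistic: candidate for clustering
    if small:
        for cluster in kmeans_1d([predicted[s] for s in small], k):
            groups.append([small[i] for i in cluster])
    return groups


if __name__ == "__main__":
    predicted = {"s1": 12, "s2": 11, "s3": 23, "s4": 80}
    print(smart_grouping(predicted, threshold=50, k=2))
```

Grouping on predicted rather than true statistics keeps the grouping decision itself from touching raw data.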

Smart Grouping-Based Perturbation.
To add noise, we use the Laplacian mechanism [15] to inject noise directly into the aggregated statistics at the time stamps selected by adaptive sampling. Nonsampled statistics are not perturbed; they are approximated by the last aggregated statistics. In this article, we present a smart grouping-based perturbation scheme composed of a perturbation component and an allocation component. Given the grouping algorithm, we apply the Laplacian mechanism to each group rather than to each sensor.
We assume that a group $G$ contains several sensors and that $f(G)$ represents a function that aggregates the number of data contributors in $G$. Intuitively, because each contributor can appear in the collection range of only one sensor at one time stamp, the sensitivity of the function $f$ is equal to 1; i.e., $\Delta(f) = 1$. Then, the Laplacian mechanism can be employed on group $G$ by adding Laplacian noise to $f(G)$, where $G[i]$ is the $i$th sensor of the group $G$ and $\lambda(G)$ denotes the scale of the Laplacian noise injected into $f(G)$. In order to avoid exceeding the total budget, the smallest budget of a sensor in a group is taken as the budget of the whole group. Under this rule, the RescueDP strategy does not make full use of the total budget. In our scheme, we instead fix the sampling points and allocate the total budget uniformly to each sampling point. Therefore, each sampling point receives the total budget ε divided by the number of sampling points, which makes full use of the total budget while ensuring that it is not exceeded. Next, we allocate the perturbed group statistic to the sensors of the group according to their predicted statistics. This allocation method avoids the errors resulting from the average operation in the RescueDP strategy. In our allocation method, each sensor receives a share of the perturbed group statistic given by its weight, where the weight of a sensor is calculated from its predicted statistic. With the smart grouping-based perturbation scheme, the perturbed statistics of a sensor are more accurate.
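The perturbation and allocation components can be sketched as follows. The weight used here, a sensor's predicted statistic divided by the sum of predicted statistics in its group, is an assumption made for illustration, since the text only states that the weight is calculated from the predicted statistics. The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) as the difference of two i.i.d. exponentials with mean scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def allocate(noisy_group_sum, predicted_stats):
    """Split a perturbed group statistic among sensors in proportion to predictions."""
    total = sum(predicted_stats)
    if total > 0:
        weights = [p / total for p in predicted_stats]
    else:
        # No usable prediction: fall back to an equal split.
        weights = [1.0 / len(predicted_stats)] * len(predicted_stats)
    return [w * noisy_group_sum for w in weights]


def perturb_group(true_stats, predicted_stats, epsilon, rng=random):
    """Perturb the group sum once (sensitivity 1), then allocate it to the sensors."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    noisy_sum = sum(true_stats) + laplace_noise(1.0 / epsilon, rng)
    return allocate(noisy_sum, predicted_stats)


if __name__ == "__main__":
    rng = random.Random(7)
    print(perturb_group([12, 11], [12.5, 10.5], epsilon=1.0, rng=rng))
```

Because noise is added once per group rather than once per sensor, small statistics share a single noise draw, and the proportional split avoids assigning every sensor the group average.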

Performance Discussion
In this section, we first analyze the privacy of our proposed Re-ADP framework in theory and then provide several numerical simulations to study the performance of our framework in terms of MAE and QoP.

Privacy Analysis
Theorem 7. The proposed Re-ADP framework satisfies differential privacy.

Proof. In the Re-ADP framework, perturbation is the only mechanism that could disclose private information because it is the only one that accesses raw data. As a result, if the perturbation mechanism can be proved to satisfy ω-event ε-differential privacy, the Re-ADP framework meets the same requirement.
On the basis of the smart grouping strategy at a given time stamp, each group includes several sensors. Consider an arbitrary group in this strategy. According to (12), the Laplacian mechanism on this group adds Laplacian noise to its aggregated statistic, where the sensitivity of the aggregation function is 1.
Based on Definition 1, the mechanism on the group satisfies differential privacy with the budget allocated to that group at the time stamp. According to Axiom 2.1.1 in [25], postprocessing sanitized data will not reveal privacy as long as sensitive information is not directly available to the postprocessing algorithm. As a result, the statistic released for each sensor of the group satisfies differential privacy with the same budget. Consider the budget consumed and the budget allocated for a sensor at a time stamp. If all allocated budget is employed for perturbation in our algorithm, then the consumed budget equals the allocated budget. Based on Theorem 6, the perturbation mechanism of a sensor satisfies differential privacy at every time stamp, and the budgets consumed within any sliding window of ω time stamps sum to at most ε. This bound always holds because the consumed budget equals the allocated budget in our budget allocation algorithm. Thus, the perturbation mechanism on each group satisfies ω-event ε-differential privacy. In other words, the Re-ADP algorithm also satisfies ω-event ε-differential privacy. This completes the proof of Theorem 7.

Numerical Simulation.
We compare the performance of the proposed Re-ADP strategy with MLDP in [14] and the RescueDP strategy in [26] over two real datasets. MLDP is a privacy-preserving data aggregation scheme under fog computing based on machine learning, while RescueDP is the latest strategy that provides ω-event privacy for real-time aggregate data publishing. In the simulation, we employ MAE and QoP as metrics to study the performance of the three schemes. The specific expressions of these metrics are given by (6) and (12). Our experiment is conducted in a Python environment on the Windows 10 operating system. Each experiment is run 100 times, and each point in the results is the average over these 100 runs.
The real-world test datasets used in our experiment are Bike data [27] and Station data [28]. The Bike dataset records the bike share trips in Washington DC for one year, from January 1, 2016, to December 31, 2016. It contains a total of 3,333,791 bike share trajectories. Each of them consists of the bike number, the start station and time, and the end station and time. We transform it into a dataset that consists of 368 sensors counting the number of bikes at each bike parking spot in real time. The first three-quarters of the data is used as the training set and the final quarter as the test set. The Station dataset consists of the number of passengers of 2116 stations between January 1, 2016, and December 30, 2016. It contains 9,917,584 records, and each record reports the number of passengers of a station. Because many stations have very little throughput, we chose the 1393 sensors with more throughput to report. The division into training set and test set is the same as for the Bike dataset.
In our experiment, the parameters of the PID control for the adaptive sampling mechanism are set as follows: $K_p = 0.9$, $K_i = 0.1$, and $K_d = 0$; the remaining three control parameters are set to 10, 15, and 10, where the last two are the tuning parameters of (11). In Algorithm 2, one parameter is set to 100, and a second parameter is tuned by iterating over the training set, which yields 10 for the Bike dataset and 8 for the Station dataset. In the K-means based Smart Grouping Algorithm, the tuned parameter is set to 5 for the Bike dataset and 8 for the Station dataset, which gives the best performance on the training set. The parameters of the LSTM network are set as follows: the previous 50 historical data points are used as the input of the LSTM network, so the input layer has 50 neurons, and the hidden layer has 100 neurons. Besides, we train for 1,000 iterations. Note that the training process, which takes about two minutes, is time-consuming. However, it is conducted only offline, so it does not affect the real-time nature of the algorithm.

Utility versus Privacy.
Figure 6 provides the trade-off analysis between utility and privacy.It is clear that when  increases, the MAE of three schemes decreases gradually.The reason is that the larger  represents the smaller noise that needs to be injected.Moreover, for two real-world test datasets, the Re-ADP scheme outperforms the other two schemes greatly, especially in a small privacy budget.Also, the QoP of Re-ADP is obviously superior to the other two schemes in a sufficient privacy budget.
The superior performance of the Re-ADP scheme results from the following three aspects. First, due to the design of the optimal number of sampling points and the corresponding privacy budget allocation mechanism, the privacy budget is fully used for private perturbation. Second, the adaptive event privacy mechanism in the Re-ADP scheme satisfies the privacy window adaptively, which improves the practicability of the scheme. Finally, LSTM-based prediction can provide a high-accuracy prediction result for the smart grouping mechanism.

Effect of Adaptive 𝜔-Event Privacy Mechanism.
To highlight the advantages of the adaptive ω-event privacy mechanism, we compare our Re-ADP scheme with a variant, Re-ADP(f), which adopts only a fixed ω-event privacy mechanism. Figure 7 shows the comparison results in terms of MAE and QoP. The adaptive ω mechanism clearly increases QoP while significantly decreasing MAE on both real-world datasets. Therefore, we conclude that the adaptive ω-event privacy mechanism considerably improves the quality of the reported data.

Effect of Smart Grouping Mechanism.
In this part, we investigate the performance of our smart grouping mechanism. As shown in Figure 8, Re-ADP with smart grouping outperforms Re-ADP without smart grouping in terms of both MAE and QoP. The excellent performance of smart grouping chiefly benefits from the K-means-based grouping algorithm and the application of the deep learning algorithm.
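To make the grouping idea concrete, the following minimal sketch clusters sensors by their predicted statistics with a one-dimensional K-means (Lloyd's algorithm). The deterministic initialisation, the function name `kmeans_1d`, and the toy predictions are assumptions for illustration only; the paper's Smart Grouping Algorithm may differ in its features and initialisation.

```python
def kmeans_1d(values, k, iterations=20):
    """Lloyd's algorithm on scalars; returns a cluster label for each value."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the centres across the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


# Predicted statistics of six sensors (toy values, e.g. from an LSTM predictor).
predicted = [2.0, 3.0, 2.5, 40.0, 42.0, 41.0]
labels = kmeans_1d(predicted, k=2)
```

Sensors with similar predicted statistics end up in the same group, so noise added to a group total distorts each member by a comparable relative amount.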

Related Work
Many methods have been proposed to ensure the privacy of aggregated data generated from IoT devices [29][30][31][32][33]. Wu et al. [34] proposed a Dynamic Trust Relationships Aware Data Privacy Protection (DTRPP) mechanism for Mobile Crowd Sensing (MCS), which evaluates the trust value of a public key. Zhang et al. [35] designed a priority-based health data aggregation scheme (PHDA) for cloud-assisted wireless body area networks. In this scheme, a credible relay node is selected according to the social relationship between nodes to help aggregate data and then forward it to cloud servers. PHDA also provides a lightweight privacy-preserving aggregation scheme, which not only resists the forgery attack but also reduces communication overhead. Li et al. [36] presented a privacy-aware data aggregation protocol for mobile sensing, which aggregates time-series data while preventing untrustworthy aggregators from disclosing private information. Using additive homomorphic encryption and a novel key management scheme, the aggregator can obtain only the sum of all users' data. Still, both schemes cannot cope with complex attacks that mine private information from the raw sum data.
In addition, all of the above methods achieve privacy preservation through encryption technologies. These complicated encryption technologies usually introduce high computation overhead, which may not be suitable for energy-constrained sensor networks. Some researchers suggest exploiting differential privacy, a convincing model for providing privacy, to protect aggregated data generated by IoT devices. Han et al. [37] proposed a scheme to provide privacy preservation for health data aggregation. It employs a differential privacy model to resist the differential attacks from which most existing data aggregation schemes suffer. Yang et al. [14] also proposed a differential privacy model based on machine learning algorithms. The model reduces communication overhead and rigorously protects the privacy of sensitive data in the fog computing architecture. Also in fog computing, Wang et al. [38] put forward a privacy-preserving content-based publish-subscribe scheme with differential privacy, which can protect against collusion attacks.
Although these works apply differential privacy to protect the privacy of aggregated data, they have a serious deficiency in real scenarios: they may greatly reduce the availability of aggregated data streams. Some studies are therefore committed to solving this challenge. Cao et al. [39] studied a protection method for sensitive streams within a window instead of the whole infinite stream. Considering window-based applications, they explored a stream-based management system to cope with numerous aggregate queries simultaneously. In [18], Fan and Xiong aimed to hide all events of users and designed a user-level privacy strategy for a finite stream. For the received perturbed data, they employed the Kalman filter [40] to improve the accuracy of differentially private data release. Considering multiple events occurring at continuous time segments, Kellaris et al. presented an ω-event ε-differential privacy model in [17]. This model combines the advantages of the event-level privacy model and the user-level privacy model. In the model, they employed a sliding window to capture a wide range of ω-event privacy and designed a scheme to distribute and absorb the privacy budget on the assumption that statistics do not change significantly. On this basis, Wang and Zhang further designed an online aggregate monitoring scheme for infinite streams in [26]. Their scheme integrates adaptive sampling, a budget mechanism, and dynamic grouping and perturbation to preserve the privacy of statistics.
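The sliding-window constraint behind the ω-event model can be stated as a simple check: in every window of ω consecutive time stamps, the per-time-stamp budgets must sum to at most the total budget ε. The sketch below verifies this for a fixed window length; the function name `satisfies_w_event` and the example allocations are illustrative, and the adaptive window proposed in this paper is not modelled.

```python
def satisfies_w_event(budgets, w, epsilon, tol=1e-9):
    """True if every window of w consecutive time stamps spends at most epsilon."""
    window_sum = 0.0
    for i, budget in enumerate(budgets):
        window_sum += budget
        if i >= w:
            # Drop the budget that has just slid out of the window.
            window_sum -= budgets[i - w]
        if window_sum > epsilon + tol:
            return False
    return True


# Uniform allocation: epsilon / w at every time stamp always satisfies the check.
uniform_ok = satisfies_w_event([0.25] * 12, w=4, epsilon=1.0)
# Spending too much at adjacent time stamps violates it.
greedy_ok = satisfies_w_event([0.5, 0.5, 0.5], w=2, epsilon=0.9)
```

Skipping publication at some time stamps frees budget for later ones within the same window, which is the effect that budget distribution and absorption schemes exploit.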
Although ongoing studies of differential privacy for stream data aggregation have played a vital role, challenges remain. We point out that the fixed sliding windows employed in most existing frameworks may not be practical. Moreover, existing metrics are suitable only for static data rather than streaming data. Motivated by these challenges, in this paper we present a real-time privacy-preserving stream data aggregation framework based on adaptive ω-event differential privacy for the fog computing architecture.

Conclusion
Considering privacy disclosure of aggregated data in fog computing, we present a real-time stream data aggregation framework with adaptive ω-event differential privacy (Re-ADP) in this paper. For the four layers of our system model, this framework is composed of three components, i.e., adaptive ω-event privacy analysis, smart grouping-based perturbation, and the filtering mechanism. The first component protects the privacy of an infinite stream over any ω successive time stamps. The second component achieves smart grouping based on K-means and injects noise into the aggregated data, and the third component exploits an existing filter to improve data availability. Finally, we provide a theoretical proof that the proposed Re-ADP framework satisfies differential privacy. Extensive experiments over real-world datasets show that the Re-ADP scheme outperforms existing methods and improves the utility of real-time data publishing with strong privacy preservation.

(10) Introduce Laplacian noise into each group    .
(11) Allocate the group perturbed statistic to each sensor according to     .
(12) return the noised data of sensors in    
Algorithm 3: Smart grouping-based perturbation.
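Steps (10) and (11) of the smart grouping-based perturbation can be sketched as follows. The sketch assumes a sensitivity of 1 for the group sum and assumes that the noisy group total is shared among sensors in proportion to positive weights such as predicted statistics. Both assumptions, and the name `perturb_group`, are illustrative and not confirmed by the algorithm listing.

```python
import random


def perturb_group(true_values, weights, epsilon, rng):
    """Add one Laplace draw to the group sum, then split the result by weight."""
    scale = 1.0 / epsilon  # assumed sensitivity of 1 for the group sum
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    noisy_sum = sum(true_values) + noise
    total_weight = sum(weights)  # weights are assumed to be positive
    return [noisy_sum * w / total_weight for w in weights]


rng = random.Random(42)
# One group of three sensors; weights stand in for predicted statistics.
shares = perturb_group([10.0, 20.0, 30.0], weights=[1.0, 2.0, 3.0], epsilon=0.5, rng=rng)
```

Because a single noise draw is spread over the whole group, each sensor receives only a fraction of it, which is the motivation for grouping sensors before perturbation.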

Figure 3: The architecture of the LSTM network.

Figure 4: The structure of the LSTM unit.

The input of the network for sensor j at time stamp t is the sequence of its preceding statistics, and the expected output is the true statistic at time stamp t. Based on the predicted statistic, the loss function of our network is defined as below.

Figure 5: An example of smart grouping.
They will try to acquire the actual values of gathered data or maliciously tamper with the data. The fog layer is considered trusted, which means that it can acquire raw data but does not disclose data to third parties.
Input: The raw data database   at the th time stamp
Output: The aggregation secure statistics  
(1) Find out the optimal number of sampling points 
(2) Find out sets of sampling sensors at the current time stamp
(3) Obtain the grouping strategy    via smart grouping
(4) Allocate the budget for all sampling sensors
(5) Add Laplacian noise to group    with allocated budget at the perturbation mechanism
(6) Report the aggregated secure statistics   that is filtered by Kalman filtering.
(7) Update the sampling interval by adaptive sampling
Algorithm 1: A real-time stream data aggregation framework with adaptive ω-event differential privacy (Re-ADP).
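The filtering in step (6) can be illustrated with a scalar Kalman filter. This is a generic textbook filter under stated assumptions, not the paper's exact filter: the true statistic is modelled as a random walk with process variance `process_var`, and the Laplace noise is approximated by Gaussian noise with variance `noise_var` (a Laplace distribution with scale b has variance 2b²). All names are hypothetical.

```python
def kalman_step(prior_estimate, prior_variance, observation, process_var, noise_var):
    """One predict/correct step for a random-walk state observed with additive noise."""
    # Predict: the state is assumed unchanged, but uncertainty grows.
    predicted_var = prior_variance + process_var
    if observation is None:
        # Non-sampling time stamp: release the prediction as it is.
        return prior_estimate, predicted_var
    # Correct: blend the prediction and the noisy observation by the Kalman gain.
    gain = predicted_var / (predicted_var + noise_var)
    estimate = prior_estimate + gain * (observation - prior_estimate)
    return estimate, (1.0 - gain) * predicted_var


epsilon_t = 0.5                              # budget spent at this time stamp (toy value)
noise_var = 2.0 * (1.0 / epsilon_t) ** 2     # variance of Laplace noise with scale 1/epsilon_t
estimate, variance = kalman_step(100.0, 4.0, 108.0, process_var=1.0, noise_var=noise_var)
```

The gain moves toward 0 when the injected noise is large, so the filter trusts its prediction more exactly when the perturbed observation is least reliable.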