Privacy protection is one of the major obstacles to data sharing. Time-series data are characterized by autocorrelation, continuity, and large scale. Current research on time-series data publication largely ignores the correlation within time-series data and suffers from insufficient privacy protection. In this paper, we study the problem of correlated time-series data publication and propose a sliding window-based autocorrelation time-series data publication algorithm, called SW-ATS. Instead of the global sensitivity used in traditional differential privacy mechanisms, we propose periodic sensitivity to provide a stronger privacy guarantee. SW-ATS introduces a sliding window mechanism to protect the privacy of the latest data, with the correlation between the noise-adding sequence and the original time-series data guaranteed by sequence indistinguishability. We prove that SW-ATS satisfies

Time-series data are a set of sequential, large, and continuous data sequences. In general, time-series data can be regarded as a dynamic dataset that grows infinitely over time. Using the correlation between data values to analyze and mine time-series data can bring considerable benefits to government, enterprises, and public services. For example, during the COVID-19 outbreak, monitoring and analyzing patients’ physical condition can help treat the disease and control the spread of the epidemic. Similarly, navigation software needs to count the total traffic on each road within a specific time range to calculate the best route to the destination.

The above examples illustrate the importance of publishing time-series data for knowledge discovery and acquisition. However, if the curator publishes the data directly without adopting appropriate privacy protection technology, sensitive personal information will be leaked and citizens’ privacy will be violated.

Traditional data publishing mainly uses anonymous technology, such as

Insufficient privacy protection: when independent and identically distributed (IID) noise is added to correlated data, an attacker can remove much of the noise through filtering attacks and similar methods, thus disclosing the user’s privacy.

Low data utility: adding IID noise to correlated data lowers the effective level of privacy protection. To maintain the same level of differential privacy protection, more noise must be added, which sharply reduces the utility of the published data.
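The filtering risk described above can be illustrated with a small sketch. It assumes a hypothetical smooth signal (a slow sine wave) and a plain moving-average filter standing in for the attacker's filter. Neither comes from the paper, and the filtering attack used in the experiments may differ.

```python
import numpy as np


def moving_average(x, width):
    """Smooth x with a centered moving-average filter of odd length `width`."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


rng = np.random.default_rng(0)

# Hypothetical strongly autocorrelated series: a slow sine wave.
t = np.arange(2000)
signal = 10.0 * np.sin(2 * np.pi * t / 100)

# IID Laplace noise, as a traditional differential privacy mechanism would add.
noisy = signal + rng.laplace(loc=0.0, scale=1.0, size=signal.size)

# The attacker smooths the published series to suppress the independent noise.
recovered = moving_average(noisy, width=9)

raw_error = np.mean(np.abs(noisy - signal))
filtered_error = np.mean(np.abs(recovered - signal))
print(f"error before filtering: {raw_error:.3f}")
print(f"error after filtering:  {filtered_error:.3f}")
```

Because the signal changes slowly while the noise is independent from sample to sample, averaging neighboring samples cancels much of the noise and leaves the signal largely intact.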

These issues indicate that the current methods of differential privacy are not suitable for processing time-series data with correlation. Although Wang et al. [

First, because time-series data exhibit periodic changes and have strong autocorrelation, even if a single record in the dataset is deleted, an attacker can infer information about the missing record from other correlated records. To avoid this situation, we propose periodic sensitivity to replace the global sensitivity in traditional differential privacy, providing stronger privacy protection under the same privacy budget. Second, based on the periodic sensitivity, we propose a sliding window mechanism to process infinitely growing and correlated time-series data. Third, we theoretically prove that our proposed sliding window-based correlated time-series data publication algorithm (SW-ATS) satisfies differential privacy. Compared with the state-of-the-art method, the experimental results show that SW-ATS achieves lower error and provides stronger privacy protection.

In early research on differentially private data publication, most studies assume that the data are independent. At present, research on differential privacy for correlated data is still relatively limited. The main research obstacle of correlated differential privacy is that correlated records can provide additional information to attackers, which traditional mechanisms can hardly model. In this case, meeting the definition of differential privacy is a complex task. Kifer et al. [

In the research of correlated time-series data, Cao et al. [

Wang et al. [

Some scholars convert the correlated time-series data to another independent domain for processing while retaining the main characteristics of the original sequence. Rastogi et al. [

Summary of literature survey.

Algorithm | Advantage | Limitation |
---|---|---|
Pufferfish [ | The algorithm takes into account the correlation between data | Does not satisfy differential privacy |
PCA [ | Under the premise of keeping the main characteristics of the sequence unchanged, the correlation time series is transformed into another independent domain for processing | Independent noise is added and the sequence correlation is destroyed to some extent |
CIM [ | Literature [ | It is only applicable to the publication of histogram statistics |
CTS-DP [ | The correlation noise is added to the original time-series data | Dynamic data cannot be processed and privacy protection is inadequate |

How to dynamically publish correlated time-series data?

How to deal with the lack of privacy intensity due to the periodic changes of correlated time-series data?

Dwork et al. [

(

(Global sensitivity [

(Laplace mechanism [

parallel combinatorial properties [
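As a concrete illustration of the Laplace mechanism named above, the following minimal sketch adds Laplace noise with scale equal to sensitivity divided by the privacy budget, which is the standard formulation. The function name `laplace_mechanism` and the example query are hypothetical and are not taken from the paper.

```python
import numpy as np


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return true_answer perturbed by Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)


# Hypothetical counting query: adding or removing one record changes
# the count by at most 1, so the global sensitivity is 1.
rng = np.random.default_rng(42)
noisy_count = laplace_mechanism(true_answer=100.0, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy_count)
```

A smaller privacy budget gives a larger noise scale, so stronger privacy comes at the cost of a less accurate answer.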

Time-series data are a set of sequential, large, and continuous data sequences. In general, time-series data can be regarded as a dynamic dataset that grows infinitely over time. For example, Table

User blood glucose monthly statistics (mg/dl).

User | $t_1$ | $t_2$ | $t_3$ | ... | $t_n$ |
---|---|---|---|---|---|
James | 186 | 203 | 196 | ... | 260 |
Mary | 140 | 132 | 129 | ... | 148 |
Jane | 167 | 152 | 198 | ... | 176 |
... | ... | ... | ... | ... | ... |
Tom | 188 | 239 | 197 | ... | 204 |

Consider the following scenarios: user A wants to query the average blood glucose value within the range $t_1$-$t_2$; user B wants to query the number of people whose blood pressure is greater than 140 mmHg at time $t_3$; and so on. The goal of this article is to use differential privacy to publish correlated time-series data so that users can obtain meaningful query results without leaking the personal privacy of individuals in the database. The curator aggregates the time-series data of all users and divides it into

Data publication scenario.
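The two kinds of queries in this scenario can be sketched on the first three time points of the table above. The function names are hypothetical. User B's query is written against the same glucose values purely for illustration, since the table does not contain a blood pressure column, and no noise is added here: these are the raw answers that a publication algorithm would have to protect.

```python
import numpy as np

users = ["James", "Mary", "Jane", "Tom"]
# First three time points for the four named users in the table above.
glucose = np.array(
    [
        [186, 203, 196],
        [140, 132, 129],
        [167, 152, 198],
        [188, 239, 197],
    ],
    dtype=float,
)


def average_query(data, start, end):
    """Mean over all users for time indices start..end (0-based, inclusive)."""
    return float(data[:, start : end + 1].mean())


def count_query(data, time_index, threshold):
    """Number of users whose value at time_index exceeds threshold."""
    return int((data[:, time_index] > threshold).sum())


print(average_query(glucose, 0, 1))   # average over the first two time points
print(count_query(glucose, 2, 140))   # users above 140 at the third time point
```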

For any piece of time-series data

(Autocorrelation function [

Among them, _{0} represents the power spectral density of
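For readers who want to compute an autocorrelation function in practice, the sketch below uses the standard normalized sample estimator. This is an assumption: the paper's exact definition is not reproduced in this section, and the function name and the example series are hypothetical.

```python
import numpy as np


def sample_autocorrelation(x, max_lag):
    """Normalized sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    if not 0 <= max_lag < x.size:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Correlate the series with a copy of itself shifted by `lag` samples.
        acf[lag] = np.dot(centered[: x.size - lag], centered[lag:]) / denom
    return acf


# Hypothetical series that repeats every four samples.
series = np.tile([90.0, 140.0, 120.0, 100.0], 30)
acf = sample_autocorrelation(series, max_lag=6)
print(np.round(acf, 3))
```

For a series that repeats every four samples, the largest autocorrelation at a nonzero lag appears at lag 4, which is how strong periodicity shows up in this function.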

(Sequence indistinguishability [

In real life, time-series data are a dynamic dataset with infinite growth over time. Therefore, on the basis of the CTS-DP algorithm, this paper uses the sliding window mechanism for any length of time-series data to realize the continuous publication of time-series data under the premise of satisfying differential privacy. In order to solve the problem of insufficient privacy protection in the CTS-DP algorithm, we propose periodic sensitivity instead of global sensitivity to achieve greater privacy protection.

Define time-series data

The sliding window in time-series data refers to a specified interval on the time-series data that contains the latest data. Its purpose is to bound the infinite data stream and capture the characteristics of the data. As new data arrive, the data in the sliding window are processed once the amount of data reaches the configured sliding window size. The window then slides forward and waits for the next set of data. Figure

Time-series data publication under sliding window.
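The window behavior described above can be sketched as a small generator. It assumes that the window advances by its full length after each publication, which is one reading of the description, and the `process` callback is a hypothetical placeholder for the noise-adding step.

```python
from typing import Callable, Iterable, Iterator, List


def sliding_window_publish(
    stream: Iterable[float],
    window_size: int,
    process: Callable[[List[float]], object],
) -> Iterator[object]:
    """Buffer incoming values and publish one result per full window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    buffer: List[float] = []
    for value in stream:
        buffer.append(value)
        if len(buffer) == window_size:
            # The window is full: process and publish its contents.
            yield process(buffer)
            # Slide forward and wait for the next set of data.
            buffer = []
    # Values left in the buffer stay unpublished until the window fills.


# Usage with a placeholder processing step (a plain sum per window).
published = list(sliding_window_publish(range(1, 8), window_size=3, process=sum))
print(published)
```

In this usage example the seventh value is held back because the third window is not yet full.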

Differential privacy protection under time-series data is divided into two levels: the event level and the user level [

Time-series data usually exhibit strong periodic change, and this periodicity can be used to determine the sampling period of the time-series data. For example, the blood glucose of a healthy person remains within a constant range before each of the three daily meals and before bedtime. Usually, the sampling frequency of health data within one day is taken as the period. Taking the blood glucose data as an example, the blood glucose data are sampled four times a day, and then the sampling period of blood glucose data is

Since the time-series data have strong periodic changes, if the global sensitivity is still adopted at this time, it will indeed increase the risk of privacy leakage.

For example, suppose someone’s blood pressure has surged recently because of staying up late. If users query the blood pressure value of one day during this time, they have a higher probability of inferring the other neighboring blood pressure samples. Therefore, to ensure that the data are not leaked, all the sampling data before and after this blood pressure value would also need to be deleted. In this case, if global sensitivity is still used to generate Laplace noise, it cannot adequately protect the data from leakage. For this reason, this paper proposes periodic sensitivity to replace global sensitivity and provide stronger privacy protection.

(periodic sensitivity). According to the attribute

Among them,

The SW-ATS algorithm can iteratively process and publish the existing data (static data) in the database, while recently arrived data (dynamic data) can be processed and published once the data volume reaches the sliding window size. Alternatively, the size of the sliding window can be adjusted to the size of the newly added data before publishing. The establishment process of the SW-ATS algorithm is shown in Algorithm

Read the original time series

f

Calculate the autocorrelation function

According to the query function

Generate four IID Gaussian white noise series

Calculate

Splice
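One standard way to obtain Laplace noise from four Gaussian series relies on the identity that, for independent standard normal variables $X_1, X_2, X_3, X_4$, the quantity $X_1 X_2 + X_3 X_4$ follows a Laplace distribution with location 0 and scale 1. The sketch below checks this identity empirically. It is an assumption that the four series in the steps above are combined this way, and the filtering that makes the noise correlated with the original series is not shown. The function name is hypothetical.

```python
import numpy as np


def laplace_from_gaussians(size, scale, rng):
    """Build Laplace(0, scale) samples from four IID standard normal series."""
    g1, g2, g3, g4 = rng.standard_normal((4, size))
    # For standard normals, g1*g2 + g3*g4 is Laplace(0, 1); rescale to `scale`.
    return scale * (g1 * g2 + g3 * g4)


rng = np.random.default_rng(7)
noise = laplace_from_gaussians(size=200_000, scale=2.0, rng=rng)

# A Laplace(0, b) variable has mean 0, mean absolute value b, and variance 2*b**2.
print(f"mean:          {noise.mean():.3f}")
print(f"mean absolute: {np.mean(np.abs(noise)):.3f}")
print(f"variance:      {noise.var():.3f}")
```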

Algorithm

For the newly added data, when the amount of data reaches the size of the sliding window, the sequence

Algorithm SW-ATS satisfies

Literature [

Therefore, according to Theorem

The noise sequence generated by the algorithm SW-ATS in each sliding window is correlated with the original sequence.

Literature [

In Algorithm

This paper uses the differential privacy utility definition proposed by Blum et al. [

((

For any query

Let

Since

The sliding window-based correlated time-series differential privacy publishing algorithm was implemented in MATLAB. The experimental environment is an Intel(R) Core(TM) i5 2.7 GHz processor, 4 GB of memory, and the Windows 7 operating system. We used two real-world datasets in our evaluations to illustrate the effectiveness of our approach in real-world applications.

Some fields of the Steps dataset.

Field | Sample |
---|---|
Start date | 2019-05-14 10:37:07 +0800 |
End date | 2019-05-14 11:49:32 +0800 |
Value | 956 |

CTS-DP is currently the state-of-the-art method for publishing correlated time-series data. Therefore, we choose the CTS-DP algorithm as the baseline for comparison.

Figure

Utility comparison when the size of sliding window changes. (a) MAE (diabetes dataset). (b) MAE (steps dataset).

Figure

Utility comparison when epsilon changes. (a) MAE (diabetes dataset). (b) MAE (steps dataset).

The average error of the SW-ATS algorithm on the Diabetes dataset is 25.1% lower than that of CTS-DP, and on the Steps dataset the average error is 12.5% lower.
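The figures report utility as mean absolute error (MAE). A minimal sketch of how MAE and a relative error reduction can be computed follows. It assumes the standard MAE definition, since the paper's exact formula is not reproduced in this section, and the function names and sample values are hypothetical.

```python
import numpy as np


def mean_absolute_error(original, published):
    """Average absolute difference between original and published values."""
    original = np.asarray(original, dtype=float)
    published = np.asarray(published, dtype=float)
    if original.shape != published.shape:
        raise ValueError("original and published must have the same shape")
    return float(np.mean(np.abs(original - published)))


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical values, not taken from the experiments.
print(mean_absolute_error([1, 2, 3], [2, 2, 5]))
print(relative_reduction(baseline_error=4.0, new_error=3.0))
```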

In this paper, we use the filtering-based attack method proposed by Xiong et al. [

Comparison of privacy protection strength. (a) Diabetes dataset. (b) Steps dataset.

Each time CTS-DP releases data, it needs to process all the time-series data involved in the query. When new data arrive, CTS-DP must recalculate all the time-series data to be released, which involves many unnecessary calculations. As the data stream keeps growing, the computational cost of the CTS-DP algorithm becomes larger and larger and may cause the system to crash in extreme cases. The SW-ATS algorithm proposed in this paper introduces a sliding window mechanism on the basis of CTS-DP, which can both process the latest data and respond to queries with different starting times and lengths. This avoids many unnecessary calculations and greatly saves system resources. The experimental results show that, across sliding windows of different sizes, the error of SW-ATS is about 31% lower than that of CTS-DP, and across different privacy budgets, the error is about 19% lower.

In this paper, we proposed a sliding window-based differential privacy publishing algorithm for autocorrelation time series, which is applied to the publishing of time-series data. We proved that SW-ATS satisfies

Although SW-ATS is effective, some aspects remain to be improved in the future. First, the periodic sensitivity depends on the sampling period of the time-series data. Only when the time-series data have an obvious sampling period can SW-ATS achieve a better protection effect. If the time-series data are sampled randomly, the privacy protection strength may not meet expectations. In addition, to calculate the periodic sensitivity, the length of the sliding window must be greater than three times the length of the sampling period. Second, the SW-ATS algorithm currently considers only the autocorrelation of a single attribute and can process the time-series data of only one attribute at a time. However, the data of each attribute exhibit not only autocorrelation but also cross-correlation with other attributes. Our next research direction is to consider the correlation between multiple attributes and publish multidimensional correlated time-series data.

The data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. This work was supported by the National Natural Science Foundation of China (Grant no. 41971407), Major Technical Innovation Project of Hubei (Grant no. 2018AAA046), and Applied Basic Research Project of Wuhan (Grant no. 2017060201010162).