Differential privacy (DP) provides a rigorous, provable privacy guarantee that holds even when adversaries have arbitrary background knowledge, which distinguishes it from prior work in privacy preservation. However, DP cannot achieve its claimed privacy guarantees over datasets with correlated tuples. Aiming to protect whether two individuals have a close relationship in a correlated dataset corresponding to a weighted network, we propose a differentially private network data release method, based on edge correlation, that balances privacy and utility. Specifically, we first extract the Edge Profile (PF) of each edge from a graph, which is transformed from a raw correlated dataset. Edge correlation is then defined on the PFs of two edges via Jensen-Shannon Divergence (JS-Divergence). Second, we transform a raw weighted dataset into an indicated dataset by adopting a weight threshold, to satisfy specific real-world needs and decrease query sensitivity. Furthermore, we propose
Recently, social networks such as cooperation networks, online/mobile social networks, and software-defined vehicular networks [
In general, a dataset corresponding to such a network, usually modeled as a graph, is considered correlated data; that is, the tuples in the dataset are dependent. Privacy preservation in such correlated settings is more difficult because an adversary can infer the relationship between two individuals from their associated friends. Accordingly, our concern is to prevent an adversary from learning whether a relationship between two individuals appears in a network dataset.
Differential privacy (DP), a privacy-preserving model that originated in statistical databases, has drawn considerable attention in the research community [
The focus of our work is on hiding the affinity degree of two individuals in a correlated dataset corresponding to a weighted network, that is, protecting whether the affinity degree of two individuals exceeds a given weight threshold, in a differentially private manner. Toward this end, we first transform a weighted network dataset into a corresponding weighted graph and define the correlation of two edges via Jensen-Shannon Divergence (JS-Divergence). To satisfy specific query needs at the cost of some utility loss, we utilize the Threshold Based Transformation (TBT) algorithm to transform a weighted dataset, by adopting a weight threshold, into an indicated dataset, which also decreases query sensitivity. Finally, we present the notion of
The framework of our solution.
The contributions of our work are as follows.
First, we extract the Edge Profile (PF) vectors of edges in a weighted graph corresponding to a network dataset and then define the correlation of two edges via JS-Divergence. This correlation analysis is better suited to datasets corresponding to such networks, since the typical Pearson correlation coefficient assumes that sample data follow a normal distribution, whereas the degree and weight distributions in such networks often do not.
Second, we propose the
Third, we utilize the TBT algorithm to transform a raw weighted dataset into an indicated dataset by adopting a weight threshold, so that each weight value becomes 1 or 0; this satisfies specific real-world needs and decreases query sensitivity. Admittedly, the transformation incurs some utility loss. However, many real-world queries only need Boolean values, indicated by one and zero, rather than exact numeric answers. Therefore, this solution provides a feasible way to decrease query sensitivity while still meeting real query needs.
The rest of this paper is organized as follows. Section
Compared with previous works in privacy preserving, DP proposed by Dwork [
DP has drawn much attention for privacy-preserving work in many fields. Wang et al. [
However, existing works have found that DP provides a weaker privacy guarantee over nonindependent data; that is, DP needs more noise added to the query output to cancel out the impact of correlations among tuples on the privacy guarantee. How to analyze correlations among tuples and incorporate them into DP therefore deserves further exploration. For example, Kifer and Machanavajjhala [
As seen from the above discussions, correlation analysis plays an important role in privacy preserving mechanisms, which directly influences the tradeoff between privacy protection and service utility. Obviously, the more accurate the correlation analysis, the better the balance of both aspects. Therefore, we attribute the underestimated privacy guarantee of DP over correlated data to the lack of data knowledge, and our work starts from data correlation analysis.
In this paper, we focus on correlated datasets corresponding to weighted cooperation networks. Different from the existing methods of correlation analysis, for example, simple multiplication in [
Differential privacy provides the privacy guarantee for an individual in the probabilistic sense [
A randomized mechanism
Generally, DP is achieved via two mechanisms: Laplace mechanism [
For any query function
Laplace mechanism, used in this paper, is formally presented as follows.
Given any query function
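As a minimal sketch of the Laplace mechanism in its standard form, the snippet below perturbs a numeric query answer with noise drawn from a Laplace distribution whose scale is the query sensitivity divided by the privacy budget. The function name `laplace_mechanism`, its parameters, and the use of NumPy are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Perturb a numeric query answer with Laplace noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    answer = np.asarray(true_answer, dtype=float)
    # A larger sensitivity or a smaller privacy budget yields more noise.
    scale = sensitivity / epsilon
    return answer + rng.laplace(loc=0.0, scale=scale, size=answer.shape)


# Hypothetical usage: a count query with sensitivity 1 and privacy budget 0.5.
rng = np.random.default_rng(seed=0)
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5, rng=rng)
```

Because the noise scale grows linearly with the sensitivity, any step that lowers the sensitivity of a query directly reduces the noise required for the same privacy budget.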
In this paper, we model a correlated dataset as a weighted undirected simple graph
Suppose a raw weighted dataset
A raw weighted dataset.
Node | Node | Weight |
---|---|---|
1 | 2 | 2 |
2 | 3 | 4 |
2 | 4 | 8 |
2 | 5 | 1 |
4 | 5 | 5 |
4 | 6 | 3 |
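The following sketch shows one way the table above could be loaded as a weighted undirected simple graph. It assumes that each row is a tuple of two node identifiers followed by an edge weight; the names `raw_dataset`, `build_weighted_graph`, `degree`, and `strength` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Rows of the example table, read as (node, node, weight). This reading is an assumption.
raw_dataset = [(1, 2, 2), (2, 3, 4), (2, 4, 8), (2, 5, 1), (4, 5, 5), (4, 6, 3)]


def build_weighted_graph(tuples):
    """Build a symmetric adjacency map {node: {neighbor: weight}} from weighted tuples."""
    adjacency = defaultdict(dict)
    for u, v, w in tuples:
        if u == v:
            raise ValueError("a simple graph has no self-loops")
        adjacency[u][v] = w
        adjacency[v][u] = w  # undirected: store the edge in both directions
    return dict(adjacency)


graph = build_weighted_graph(raw_dataset)
# Node degree: number of incident edges.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
# Node strength: sum of the weights of incident edges.
strength = {node: sum(neighbors.values()) for node, neighbors in graph.items()}
```

Node degrees and edge weights are the two ingredients from which the edge descriptors used later are built, so keeping both readily available per node is convenient.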
Motivated by entropy in information theory, we adopt JS-Divergence, derived from Kullback-Leibler Divergence (KL-Divergence) [
Suppose
Here
Suppose
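To make the relation between the two divergences concrete, here is a small sketch of the standard KL-Divergence and JS-Divergence for discrete distributions. The base-2 logarithm (which bounds the JS-Divergence by 1), the normalization step, and the function names are assumptions for illustration; the paper's own definitions may use a different base.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) in bits for discrete distributions; terms with p[i] == 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence of p and q to their mixture."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.any(p < 0) or np.any(q < 0) or p.sum() == 0 or q.sum() == 0:
        raise ValueError("inputs must be non-negative with a positive sum")
    p = p / p.sum()  # normalize so that each vector is a probability distribution
    q = q / q.sum()
    m = 0.5 * (p + q)  # the mixture is positive wherever p or q is positive
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike KL-Divergence, the JS-Divergence is symmetric and always finite, which makes it usable as a similarity measure between two vectors even when one of them has zero entries.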
In this section, we first discuss how to define the correlation of two edges in a weighted graph corresponding to a network dataset and then introduce why and how we conduct dataset transformation based on a given weight threshold. Finally, we define the
For achieving the correlation of tuples in a raw weighted dataset
For a weighted edge
Similarly, for any other edge
Note that the Pearson correlation coefficient assumes that sample data follow a normal distribution. However, in social networks, the weight and degree distributions do not follow a normal distribution, which is also verified by our experiments shown in Figure
Degree and weight distributions on network data: (a) degree distribution on geom; (b) weight distribution on geom; (c) degree distribution on out; (d) weight distribution on out.
Since our constructed PF vectors of edges do not satisfy the assumption of normal distribution, we adopt JS-Divergence, instead of Pearson correlation coefficient, to measure the CORrelation (COR) of any two edges
Meanwhile, we consider the distance of both edges as follows.
Suppose
Specifically, we first calculate the distances including
Based on Definitions
Suppose
According to (
Substituting (
In our opinion, the proposed correlation definition, extracted from two aspects of relational strength and network structure, is more reasonable. The rationale is (i) graph models, commonly abstracted from networks, reflect inherent dependent relations of individuals, which naturally form edge correlations and (ii) the weights of edges in weighted graphs describe the affinity degree of individuals’ relations, which also influence the variances of edge correlations.
Take
Furthermore, according to (
Finally, according to (
We consider real-world situations that do not need exact query answers. For example, people sometimes only want to learn whether two individuals have an intimate relationship, rather than the specific number of communications or collaborations between them. The privacy concern in this case is to avoid leaking the existence of a close relationship, that is, a yes-or-no answer. Therefore, the first thing we focus on is to transform a weighted dataset
To this end, we propose the TBT algorithm to modify raw weight values
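The thresholding idea behind TBT can be sketched as follows: every raw weight is replaced by 1 if it exceeds the weight threshold and by 0 otherwise. This is a simplified illustration, not the authors' algorithm listing; the strict comparison, the tuple layout (node, node, weight), and the names `threshold_based_transformation` and `close_count` are assumptions.

```python
def threshold_based_transformation(tuples, threshold):
    """Map each (u, v, weight) tuple to (u, v, indicator), with indicator in {0, 1}."""
    return [(u, v, 1 if w > threshold else 0) for u, v, w in tuples]


# Example weighted tuples, read as (node, node, weight).
raw_dataset = [(1, 2, 2), (2, 3, 4), (2, 4, 8), (2, 5, 1), (4, 5, 5), (4, 6, 3)]
indicated_dataset = threshold_based_transformation(raw_dataset, threshold=3)

# A counting query over the indicators: how many pairs have a close relationship?
close_count = sum(indicator for _, _, indicator in indicated_dataset)
```

On the indicated dataset, changing a single tuple changes such a count by at most 1, whereas a sum over the raw weights could change by as much as the largest weight; this is the sense in which the transformation trades some utility for lower query sensitivity.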
As discussed above, we only consider situations in which the query answers over correlated weighted data are yes or no, indicating whether two individuals have a close relationship. That is, the privacy concern herein is to avoid the leakage of whether there is a close relationship between two individuals, in a weighted dataset whose at most
Any pair of databases
Note that the neighboring databases in Definition
(i) Based on JS-Divergence, we have the following conclusion about the correlation
For any two edges
For the ease of exposition, we denote the last two items in the numerator of (
Since
Substituting (
Similarly, we obtain
Combining (
Note that
Clearly,
(ii) Similar to [
Furthermore, we define the
A randomized mechanism
After transforming weighted dataset
Since the change of a tuple only affects at most other
Finally, we have the correlated sensitivity denoted by CS, that is, the maximal
Note that the CS is also suitable for the independent and fully correlated cases. Based on the CS, we can achieve
Given any query function
According to (
Finally, combining (
For indicated dataset
Based on indicated dataset
The goal of the NDR algorithm is to achieve the tradeoff between privacy and utility under correlated settings. To this end, three phases are taken into account: (i) for achieving the correlation of two tuples in
Clearly, NDR algorithm satisfies the
A mechanism NDR is
Based on Definition
For any query
By Definition
If
According to (
Therefore, mechanism NDR satisfies
Generally, the goal of privacy preserving is to achieve maximal utilities while maintaining required privacy guarantees; that is, the tradeoff between privacy and utility is desired. In this section, we first present the better privacy guarantees and utilities of Algorithm NDR based on the definition of
We analyzed privacy and utility of NDR and Baseline algorithms in terms of
Figures
Comparison of privacy and utility on network data in terms of
Panels: (a) privacy on geom; (b) utility on geom; (c) privacy on out; (d) utility on out; (e) privacy on rgd; (f) utility on rgd.
On the other hand, Figures
We adopt MAE, that is,
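For reference, the mean absolute error in its standard form averages the absolute differences between true and noisy query answers. The sketch below assumes both sets of answers are given as equally long numeric sequences; the function name and the sample values are hypothetical.

```python
import numpy as np


def mean_absolute_error(true_answers, noisy_answers):
    """Average absolute difference between true and perturbed query answers."""
    true_answers = np.asarray(true_answers, dtype=float)
    noisy_answers = np.asarray(noisy_answers, dtype=float)
    if true_answers.shape != noisy_answers.shape:
        raise ValueError("both answer sets must have the same shape")
    return float(np.mean(np.abs(true_answers - noisy_answers)))


# Hypothetical example: three queries and their perturbed answers.
mae = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
```

A lower MAE means the released answers stay closer to the true ones, that is, higher utility.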
Figures
Comparison of utility on network data in terms of mean absolute error.
Panels, for each of the geom, out, and rgd datasets: utility for a given correlated parameter; utility for a given privacy budget.
Figures
In this paper, we focus on adopting the differential privacy model to avoid the leakage of a close relationship between two individuals in a network. To this end, we first extract the PF vector from both aspects of node degree and edge weight to depict an edge in a network dataset and then design the correlation metric of two edges via JS-Divergence to avoid the constraints of the Pearson correlation coefficient. Next, we propose the
The authors declare that they have no conflicts of interest.
This work is partly supported by the Fundamental Research Funds for the Central Universities of China under Grant no. GK201703061, the National Natural Science Foundation of China under Grants nos. 61402273 and 61373083, the National Science Foundation (NSF) under Grants nos. CNS-1252292, 1741277, and 1704287, and the Natural Science Basic Research Plan in Shaanxi Province of China under Grants nos. 2017JM6060 and 2017JM6103.