An Edge Correlation Based Differentially Private Network Data Release Method

Differential privacy (DP) provides a rigorous and provable privacy guarantee and assumes adversaries with arbitrary background knowledge, which makes it distinct from prior work in privacy preservation. However, DP cannot achieve its claimed privacy guarantees over datasets with correlated tuples. Aiming to protect whether two individuals have a close relationship in a correlated dataset corresponding to a weighted network, we propose a differentially private network data release method, based on edge correlation, to achieve a tradeoff between privacy and utility. Specifically, we first extract the Edge Profile (PF) of an edge from a graph, which is transformed from a raw correlated dataset. Then, edge correlation is defined based on the PFs of two edges via Jensen-Shannon Divergence (JS-Divergence). Secondly, we transform a raw weighted dataset into an indicated dataset by adopting a weight threshold, to satisfy specific real needs and decrease query sensitivity. Furthermore, we propose ε-correlated edge differential privacy (CEDP), by combining the correlation analysis and the correlated parameter with traditional DP. Finally, we propose a network data release (NDR) algorithm based on the ε-CEDP model and discuss its privacy and utility. Extensive experiments over real and synthetic network datasets show that the proposed release method provides better utility while maintaining the privacy guarantee.


Introduction
Recently, social networks such as cooperation networks, online/mobile social networks, and software defined vehicular networks [1] have become increasingly prevalent. Accompanying the growth of these networks, a mass of network data is released for analytical decisions or scientific research. However, direct publication of these data, which include sensitive information, leads to privacy leakage of individuals. For example, whether two individuals in a social network have a close relationship may be expected to be kept secret. Therefore, privacy concerns have been raised in increasingly emerging technologies [2][3][4][5][6][7][8][9].
In general, a dataset corresponding to such a network, usually modeled as a graph, is considered as correlated data; that is, tuples in this dataset are dependent. Clearly, privacy preservation in such correlated settings is more difficult because an adversary can infer the relationship of two individuals from their associated friends. Accordingly, our concern is to prevent an adversary from learning whether the relationship of two individuals appears in a network dataset.
Differential privacy (DP), a privacy preserving model originating from statistical databases, has drawn considerable attention in research communities [10][11][12][13][14][15][16][17] due to (i) its rigorous and provable privacy guarantee and (ii) its assumption of adversaries' arbitrary background knowledge. However, DP actually assumes that the tuples in databases are independent [18]. In other words, DP cannot provide its claimed privacy guarantees over correlated (nonindependent) data [19]. Therefore, the application of DP over correlated data is a challenge, and how to achieve a differentially private correlated data release method deserves further exploration.
The focus of our work is on hiding the affinity degree of two individuals in a correlated dataset corresponding to a weighted network, that is, protecting whether the affinity degree of two individuals exceeds a given weight threshold, in a differentially private manner. Toward this end, we first transform a weighted network dataset into a corresponding weighted graph and define the correlation of two edges via Jensen-Shannon Divergence (JS-Divergence). To satisfy specific query needs at the cost of some utility loss, we utilize the Threshold Based Transformation (TBT) algorithm to transform a weighted dataset, by adopting a weight threshold, into an indicated dataset, which also decreases query sensitivity. Finally, we present the notion of ε-correlated edge differential privacy (CEDP), by combining the correlation analysis and the correlated parameter, that is, the maximal number of correlated tuples, with traditional DP, and design a differentially private network data release (NDR) algorithm to obtain better utility while maintaining the DP guarantee. Experimental results over real and synthetic network datasets also show the advantages of the proposed method. The framework of our solution is shown in Figure 1.
The contributions of our work are as follows.
First, we extract the Edge Profile (PF) vectors of edges in a weighted graph corresponding to a network dataset and then define the correlation of two edges via JS-Divergence. The resulting correlation analysis is more reasonable for datasets corresponding to such networks, since the typical Pearson correlation coefficient assumes that sample data follow a normal distribution, whereas the degree and weight distributions in such networks often do not.
Second, we propose the ε-CEDP model based on our correlation analysis and the introduction of the correlated parameter, which makes DP over correlated datasets applicable and flexible. Furthermore, the NDR algorithm, based on correlated sensitivity and the Laplace mechanism, is proposed, which also satisfies ε-CEDP and achieves a tradeoff between privacy and utility.
Third, we utilize the TBT algorithm to transform a raw weighted dataset into an indicated dataset, in which each weight value is equal to 1 or 0 according to a weight threshold, to satisfy specific real needs and decrease query sensitivity. Admittedly, some utility loss exists in such a transformation. However, many real-world queries only need Boolean values, indicated by one and zero, instead of accurate numeric answers. Therefore, this solution provides a feasible way of decreasing query sensitivity while still meeting real query needs.
The rest of this paper is organized as follows. Section 2 discusses related literature. Section 3 provides the preliminaries. In Section 4, correlation analysis of two edges in a weighted graph is presented, and the ε-CEDP model and sensitivity calculation are proposed. Furthermore, a differentially private NDR algorithm, including the TBT algorithm, that achieves a tradeoff between privacy and utility over correlated data is proposed in Section 5. The experiments are presented in Section 6. Finally, Section 7 concludes the paper.

Related Work
Compared with previous works in privacy preservation, DP, proposed by Dwork [20], provides a probabilistic formulation under which adversaries learn little from two databases differing in one tuple, even if they know all tuples except the target one. In other words, the inference abilities of adversaries about the presence or absence of a tuple are bounded regardless of their knowledge; that is, the presence or absence of a tuple is probabilistically indistinguishable for adversaries.
Currently, DP has drawn much attention in the privacy preserving work needed in many fields. Wang et al. [10] considered a unified privacy distortion framework, where the distortion is defined to be the expected Hamming distance between the input and output databases, and investigated the relation between three different notions of privacy: identifiability, differential privacy, and mutual-information privacy. To provide personalized recommendation in big data resulting from social networks while maintaining user privacy, a cloud-assisted differentially private video recommendation system based on distributed online learning was proposed [11]. The work in [12] proposed a new privacy preserving smart metering scheme for the smart grid, which supports data aggregation, differential privacy, fault tolerance, and range-based filtering simultaneously. To et al. [13] introduced a novel privacy-aware framework for spatial crowdsourcing, which enables the participation of workers without compromising their location privacy. Focusing on the privacy protection of sensitive information in body area networks, the authors in [14,15] proposed different privacy preserving schemes, based on the differential privacy model, via a tree structure and dynamic noise thresholds, respectively. The work in [16] proposed a novel differentially private frequent sequence mining algorithm by leveraging a sampling-based candidate pruning technique, which satisfies ε-differential privacy and can privately find frequent sequences with high accuracy. In order to protect users' privacy in ridesharing services, a jointly differentially private scheduling protocol has been proposed [17], which aims to protect riders' location information and minimize the total additional vehicle mileage in the ridesharing system.
However, existing works have found that DP provides a weaker privacy guarantee over nonindependent data; that is, DP needs more noise added to the output query result to cancel out the impact of correlations among tuples on the privacy guarantee. How to analyze correlations among tuples and apply them in DP therefore deserves further exploration. For example, Kifer and Machanavajjhala [19] first explicitly questioned the privacy guarantee of DP in correlated settings, for example, social networks, and then adopted the subsequently proposed privacy framework, Pufferfish, to formalize and prove that DP assumes independence between tuples [18]. Inspired by the Pufferfish framework, Blowfish privacy [21] was proposed to achieve a tradeoff between privacy and utility using policies specifying secrets and constraints. Similarly, the authors in [22] proposed Bayesian DP to evaluate the level of private information leakage even when data are correlated and prior knowledge is incomplete. The work in [23] regarded the correlation among tuples as complete correlation and multiplied the query sensitivity by the number of correlated tuples when publishing correlated network data, which leaves room for fine-grained correlation analysis in subsequent work. Aiming to decrease the noise amount, Zhu et al. [24] depicted the correlation between tuples via the Pearson correlation coefficient, including complete correlation, partial correlation, and independence. Liu et al. [25] inferred the dependence coefficient, distributed in the interval [0, 1], to evaluate the probabilistic correlation between two tuples in a more fine-grained manner, thus reducing the query sensitivity, which results in less noise. Considering temporal correlations of a moving user's locations, the work in [26] leveraged a hidden Markov model to establish a location set and proposed a variant of DP to protect location privacy. Wu et al. [27] proposed the definition of correlated differential privacy to evaluate the real privacy level of a single dataset influenced by the other datasets when multiple datasets are correlated. The work in [28] formalized the privacy preservation problem as an optimization problem by modeling the temporal correlations among contexts and further proposed an efficient context-aware privacy preserving algorithm. Cao et al. [29] modeled temporal correlations using a Markov model and investigated the privacy leakage of a traditional DP mechanism under temporal correlations in the context of continuous data release. The work in [30] quantified the location correlation between two users through the similarity measurement of two hidden Markov models and applied differential privacy via private candidate sets to achieve multiuser location correlation protection.
As seen from the above discussion, correlation analysis plays an important role in privacy preserving mechanisms and directly influences the tradeoff between privacy protection and service utility. The more accurate the correlation analysis, the better the balance between the two. Therefore, we attribute the underestimated privacy guarantee of DP over correlated data to the lack of data knowledge, and our work starts from data correlation analysis.
In this paper, we focus on correlated datasets corresponding to weighted cooperation networks. Different from the existing methods of correlation analysis, for example, simple multiplication in [23], the Pearson correlation coefficient in [24], and the maximal information coefficient in [31], we extract the PF vectors of edges in a weighted graph corresponding to a correlated dataset and then define the correlation of two edges via JS-Divergence, which is more accurate and reasonable. Specifically, the work in [23] assumes that two tuples are completely correlated, whereas our correlation values lie in the interval [0, 1] and thus represent multiple degrees of correlation, including complete correlation. In addition, the work in [24] assumes that sample data follow a normal distribution, while our method makes no such assumption. Also, the maximal information coefficient proposed in [31] satisfies two heuristic properties, generality and equitability, and we will consider it in our future work.

Preliminaries
3.1. Differential Privacy. Differential privacy provides the privacy guarantee for an individual in the probabilistic sense [20]. It is defined as follows.
Definition 1 (ε-differential privacy). A randomized mechanism $A$ satisfies ε-differential privacy if, for any pair of databases $D$ and $D'$ differing in only one tuple and for any output $O \in \mathrm{Range}(A)$, where $\mathrm{Range}(A)$ represents the possible output set of $A$,
$$\Pr[A(D) = O] \le e^{\varepsilon} \cdot \Pr[A(D') = O], \quad (1)$$
where $\varepsilon$ is the privacy budget depicting the probabilistic difference between the same outputs of $A$ over $D$ and $D'$.
Generally, DP is achieved via two mechanisms: the Laplace mechanism [32] and the exponential mechanism [33]. Both mechanisms rely on the concept of global sensitivity [20], which reflects DP's choice of protecting the extreme case.
Definition 2 (global sensitivity). For any query function $f : D \to \mathbb{R}^d$, where $D$ is a dataset and $\mathbb{R}^d$ is a $d$-dimension real-valued vector, the global sensitivity of $f$ is defined as
$$\Delta f = \max_{D, D'} \| f(D) - f(D') \|_1, \quad (2)$$
where $D$ and $D'$ denote any pair of databases differing in only one tuple and $\| \cdot \|_1$ denotes the $L_1$ norm.
The Laplace mechanism, used in this paper, is formally presented as follows.
Theorem 3 (Laplace mechanism). Given any query function $f : D \to \mathbb{R}^d$, where $D$ is a dataset and $\mathbb{R}^d$ is a $d$-dimension real-valued vector, the global sensitivity $\Delta f$ of $f$, and privacy budget $\varepsilon$, a randomized mechanism $A$ with
$$A(D) = f(D) + \mathrm{Lap}(\Delta f / \varepsilon) \quad (3)$$
provides ε-differential privacy, where $\mathrm{Lap}(\cdot)$ denotes Laplace noise drawn independently for each of the $d$ output dimensions.
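To make Theorem 3 concrete, the following is a minimal Python sketch of the Laplace mechanism. The function names `laplace_noise` and `laplace_mechanism` are illustrative and not taken from the paper; the sampler relies on the standard fact that the difference of two independent exponential variables with the same rate is Laplace distributed.

```python
import random

def laplace_noise(scale, rng=random):
    """Sample Lap(0, scale) as the difference of two i.i.d. exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    """Perturb each coordinate of true_answer with Lap(sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in true_answer]

# Usage: a two-dimensional query answer with global sensitivity 1 and budget 0.5.
noisy = laplace_mechanism([10.0, 20.0], sensitivity=1.0, epsilon=0.5)
```

A smaller privacy budget or a larger sensitivity increases the noise scale, which is why the later sections aim to reduce the sensitivity used in this mechanism.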

Weighted Adjacency Matrix.
In this paper, we model a correlated dataset as a weighted undirected simple graph $G = (V, E, W)$, where $V = \{v_1, \ldots, v_n\}$ is the set of vertices and $n = |V|$ is the number of vertices, $E = \{e_{ij}\}$ is the set of edges with $e_{ij} = (v_i, v_j)$, $v_i \in V$, $i = 1, \ldots, n$, $v_j \in V$, $j = 1, \ldots, n$, and $W = \{w_{ij}\}$ is the set of weights, where weight $w_{ij}$ corresponds to edge $e_{ij}$. Then, the weighted adjacency matrix $M_w$ of $G$ can be denoted as
$$M_w = \begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{pmatrix}, \quad (4)$$
where $w_{ij}$ represents the affinity degree between two individuals. Obviously, the weighted adjacency matrix $M_w$ is symmetric.
Example 4. Suppose a raw weighted dataset $D_w$ is listed in Table 1. Then, the corresponding weighted adjacency matrix $M_w$ of $D_w$ is the matrix referred to as (5).
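As an illustration of how a weighted dataset maps to a symmetric weighted adjacency matrix, consider the following Python sketch. The three-tuple dataset and the function name `weighted_adjacency_matrix` are hypothetical and chosen only for this example; they are not the contents of Table 1.

```python
def weighted_adjacency_matrix(vertices, weighted_edges):
    """Build a symmetric matrix with entry [i][j] equal to the weight of edge (v_i, v_j)."""
    index = {v: pos for pos, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        # The graph is undirected, so each weight is stored in both directions.
        matrix[index[u]][index[v]] = w
        matrix[index[v]][index[u]] = w
    return matrix

# Hypothetical dataset: each tuple is (individual, individual, relation strength).
dataset = [("v1", "v2", 3), ("v1", "v3", 1), ("v2", "v3", 5)]
matrix = weighted_adjacency_matrix(["v1", "v2", "v3"], dataset)
```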

Correlation Metric.
Motivated by the notion of entropy in information theory, we adopt JS-Divergence, derived from the Kullback-Leibler Divergence (KL-Divergence) [23], to depict the difference between two distributions, which can in turn be used to depict the correlation of two tuples in a correlated dataset.
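For reference, the standard textbook forms of the two divergences can be sketched in Python as follows. The base-2 logarithm is an assumption made here so that the JS-Divergence lies in [0, 1]; the function names are illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute zero by convention."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """JS(p, q) = KL(p || m) / 2 + KL(q || m) / 2 with m the average of p and q."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Identical distributions have divergence 0; disjoint ones reach the maximum of 1.
same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Unlike KL-Divergence, JS-Divergence is symmetric and always finite, which makes it convenient as the basis of a bounded correlation measure.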

Correlation Analysis of Weighted Edges
In this section, we first discuss how to define the correlation of two edges in a weighted graph corresponding to a network dataset and then explain why and how we conduct dataset transformation based on a given weight threshold. Finally, we define the ε-CEDP model and calculate the correlated sensitivity, which leads to smaller added noise.

Correlation Definition.
To obtain the correlation of tuples in a raw weighted dataset $D_w$, we first obtain a weighted graph $G$, whose weighted adjacency matrix is denoted by $M_w$. Then, the correlation problem is changed to seeking the correlation of edges in $G$. To this end, we first describe the PF vector of a weighted edge from the perspectives of relational strength and network structure and then define the correlation of two edges via JS-Divergence instead of the Pearson correlation coefficient.
For a weighted edge $e_{ij}$, suppose $N_i$ represents the set of vertices connected with $v_i$; we extract the PF vector of $e_{ij}$, denoted by $\mathrm{PF}(e_{ij})$, from the perspectives of relational strength and network structure simultaneously. Specifically, we obtain $w_{ij} / \max_{e_{kl} \in E}(w_{kl}) \in [0, 1]$ from the global weights of all edges. In addition, we get $w_{ij} / \sum_{v \in N_i} w_{iv} \in [0, 1]$ and $w_{ij} / \sum_{v \in N_j} w_{jv} \in [0, 1]$ from the local weights of edge $e_{ij}$. On the other hand, similar to the representation of relational strength, $\mathrm{Deg}(v_i) / \max_{v \in V}(\mathrm{Deg}(v)) \in [0, 1]$ and $\mathrm{Deg}(v_j) / \max_{v \in V}(\mathrm{Deg}(v)) \in [0, 1]$ are constructed, by introducing the node degree $\mathrm{Deg}(\cdot)$, to depict the global active degree of both vertices of edge $e_{ij}$. Also, $|N_i \cap N_j| / |N_i \cup N_j| \in [0, 1]$ is adopted, via the set similarity, to depict the ratio of the number of common vertices connecting $v_i$ and $v_j$ to that of $N_i$ and $N_j$. Meanwhile, $\mathrm{Deg}(v_i) / \sum_{v \in N_i} \mathrm{Deg}(v) \in [0, 1]$ and $\mathrm{Deg}(v_j) / \sum_{v \in N_j} \mathrm{Deg}(v) \in [0, 1]$ are used to depict the local active degree of both vertices of edge $e_{ij}$. Combining the above factors, we obtain $\mathrm{PF}(e_{ij})$ as the vector of these eight components:
$$\mathrm{PF}(e_{ij}) = \left( \frac{w_{ij}}{\max_{e_{kl} \in E}(w_{kl})}, \frac{w_{ij}}{\sum_{v \in N_i} w_{iv}}, \frac{w_{ij}}{\sum_{v \in N_j} w_{jv}}, \frac{\mathrm{Deg}(v_i)}{\max_{v \in V}(\mathrm{Deg}(v))}, \frac{\mathrm{Deg}(v_j)}{\max_{v \in V}(\mathrm{Deg}(v))}, \frac{|N_i \cap N_j|}{|N_i \cup N_j|}, \frac{\mathrm{Deg}(v_i)}{\sum_{v \in N_i} \mathrm{Deg}(v)}, \frac{\mathrm{Deg}(v_j)}{\sum_{v \in N_j} \mathrm{Deg}(v)} \right). \quad (8)$$
Similarly, for any other edge $e_{kl}$, $\mathrm{PF}(e_{kl})$ is defined according to (8) with $k$ and $l$ in place of $i$ and $j$ (9).
Note that the Pearson correlation coefficient assumes that sample data follow a normal distribution. However, in social networks, the weight and degree distributions do not follow such a distribution, which is also verified by our experiments shown in Figure 2, where geom and out are the abbreviations of geom.net [34][35][36] and out.moreno_lesmis_lesmis [37][38][39] for simplicity. Specifically, (i) geom.net is the author collaboration network in Computational Geometry based on the file geombib.bib, and the reduced simple network contains 7343 vertices and 11898 edges. Two authors are linked with an edge if and only if they wrote a common work. The value of an edge is the number of common works. (ii) out.moreno_lesmis_lesmis is the character co-occurrence network of Victor Hugo's novel "Les Misérables," and it contains 77 vertices and 254 edges. A node represents a character, and an edge between two nodes shows that these two characters appeared in the same chapter of the book. The weight of each link indicates how often such a coappearance occurred.
Since our constructed PF vectors of edges do not satisfy the assumption of normal distribution, we adopt JS-Divergence, instead of the Pearson correlation coefficient, to measure the CORrelation (COR) of any two edges $e_{ij}$, $e_{kl}$ in a weighted graph. To this end, we normalize $\mathrm{PF}(e_{ij})$ and $\mathrm{PF}(e_{kl})$ as $\mathrm{PN}(e_{ij})$ and $\mathrm{PN}(e_{kl})$, which are two probability distributions, as given in (10)-(13). Meanwhile, we consider the distance of both edges as follows.
Definition 7 (edge distance). Suppose $e_{ij} = (v_i, v_j)$ and $e_{kl} = (v_k, v_l)$ are two edges in graph $G$, $d(v_1, v_2)$ denotes the length of the shortest path between nodes $v_1$ and $v_2$, and $t$ is the index of the smallest value in the vector of pairwise node distances; then the distance of $e_{ij}$ and $e_{kl}$ is defined from these shortest-path lengths. Specifically, we first calculate the distances $d(v_i, v_k)$, $d(v_i, v_l)$, $d(v_j, v_k)$, and $d(v_j, v_l)$, then determine the smallest one among these distances and its index $t$, complete the calculation of the distance of the other pair of nodes, and finally obtain the distance of the two edges $e_{ij}$ and $e_{kl}$.
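The construction of a PF vector and its normalization can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the graph is a dictionary mapping each vertex to its neighbors and weights, the eight components are ordered as they are introduced in the text, the common-neighbor ratio is taken as a Jaccard similarity, and normalization divides each component by the component sum. The names `edge_profile` and `normalize` are hypothetical.

```python
def edge_profile(graph, i, j):
    """Eight-component profile of edge (i, j); graph maps vertex -> {neighbor: weight}."""
    w_ij = graph[i][j]
    max_w = max(w for nbrs in graph.values() for w in nbrs.values())
    deg = {v: len(nbrs) for v, nbrs in graph.items()}
    max_deg = max(deg.values())
    n_i, n_j = set(graph[i]), set(graph[j])
    return [
        w_ij / max_w,                        # global relational strength
        w_ij / sum(graph[i].values()),       # local strength at v_i
        w_ij / sum(graph[j].values()),       # local strength at v_j
        deg[i] / max_deg,                    # global active degree of v_i
        deg[j] / max_deg,                    # global active degree of v_j
        len(n_i & n_j) / len(n_i | n_j),     # common-neighbor ratio (assumed Jaccard)
        deg[i] / sum(deg[v] for v in n_i),   # local active degree of v_i
        deg[j] / sum(deg[v] for v in n_j),   # local active degree of v_j
    ]

def normalize(profile):
    """Scale a profile so that its components sum to one (assumed normalization)."""
    total = sum(profile)
    return [x / total for x in profile]

# A hypothetical triangle graph with one heavier edge.
graph = {"a": {"b": 2, "c": 1}, "b": {"a": 2, "c": 1}, "c": {"a": 1, "b": 1}}
pn_ab = normalize(edge_profile(graph, "a", "b"))
```

Two normalized profiles produced this way are probability distributions and can be compared with a divergence measure such as JS-Divergence.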
Based on Definitions 6 and 7, we define the CORrelation (COR) of two probability distributions via JS-Divergence as follows.
Definition 8 (CORrelation). Suppose $P = \{p_1, p_2, \ldots, p_n\}$ and $Q = \{q_1, q_2, \ldots, q_n\}$ are the probability distributions of random variables $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$; then the CORrelation of $P$ and $Q$ is defined via their JS-Divergence.
According to (10)-(13) and Definition 6, we adopt the normalized PN vectors of edges $e_{ij}$, $e_{kl}$ to measure their correlation as in (14), where $\mathrm{PN}(\cdot)_t$ denotes the $t$-th element of vector $\mathrm{PN}(\cdot)$ and an auxiliary term is given by (15). Substituting (15) into (14), we obtain (16).
In our opinion, the proposed correlation definition, extracted from the two aspects of relational strength and network structure, is more reasonable. The rationale is that (i) graph models, commonly abstracted from networks, reflect the inherent dependent relations of individuals, which naturally form edge correlations, and (ii) the weights of edges in weighted graphs describe the affinity degree of individuals' relations, which also influences the variation of edge correlations.

Dataset Transformation.
We consider some real-world situations that do not need exact query answers. For example, people sometimes only want to learn whether two individuals have an intimate relationship or not, rather than the specific number of communications or cooperations. The privacy concern in this case is to avoid the leakage of the close relationship, that is, a yes or no answer. Therefore, the first thing we focus on is to transform a weighted dataset $D_w$ into an indicated dataset $D_I$, based on a given weight threshold $\theta$. In other words, we consider replacing the query "Select SUM(weight) from $D_w$ where $w_{ij} > \theta$" with the query "Select COUNT(*) from $D_w$ where $w_{ij} > \theta$", which satisfies some specific situations and decreases the query sensitivity simultaneously. Note that this method aims to avoid revealing whether an edge satisfying the given threshold condition exists, not to prevent exposure of the weights of the edges that satisfy it. In our opinion, this solution is reasonable and suitable for achieving privacy protection via DP in spite of some utility loss.
To this end, we propose the TBT algorithm to replace each raw weight value $w_{ij}$ in $D_w$ with an indicated value; that is, $w_{ij} = 1$ if $w_{ij} > \theta$; otherwise, $w_{ij} = 0$, thus transforming $D_w$ into $D_I$. The TBT algorithm is presented in Algorithm 1.
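The thresholding rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in the text, not a reproduction of Algorithm 1; the tuple layout and the function names are assumptions.

```python
def threshold_based_transformation(weighted_dataset, threshold):
    """Map each (u, v, weight) tuple to (u, v, 1) if weight > threshold, else (u, v, 0)."""
    return [(u, v, 1 if w > threshold else 0) for (u, v, w) in weighted_dataset]

def count_close_relationships(indicated_dataset):
    """Counting query over the indicated dataset: number of tuples flagged with 1."""
    return sum(flag for (_, _, flag) in indicated_dataset)

# Hypothetical weighted dataset with threshold 2.
weighted = [("v1", "v2", 3), ("v1", "v3", 1), ("v2", "v3", 5)]
indicated = threshold_based_transformation(weighted, threshold=2)
close = count_close_relationships(indicated)
```

Changing one tuple of the indicated dataset changes the counting query by at most 1, whereas a sum over raw weights could change by as much as the largest weight.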

Correlated Edge Differential Privacy.
As discussed above, we only consider situations in which the query answers over correlated weighted data are yes or no, indicating whether two individuals have a close relationship. That is, the privacy concern herein is to avoid the leakage of whether there is a close relationship between two individuals, in a weighted dataset in which at most $k$ tuples are correlated, where $k$ is the correlated parameter. To this end, we first define correlated neighboring databases as follows.
Definition 10 (correlated neighboring databases). Any pair of databases $D_{\mathrm{COR},k}$ and $D'_{\mathrm{COR},k}$ are correlated neighboring databases if the weight change of a tuple in $D_{\mathrm{COR},k}$ results in the weight changes of at most $k - 1$ other correlated tuples in the dataset, based on the correlation $\mathrm{COR}(\cdot, \cdot)$ of both tuples.
Note that the neighboring databases in Definition 10 are described by two parameters: the aforementioned correlation $\mathrm{COR}(\cdot, \cdot)$ and the correlated parameter $k$. Specifically, we have the following.
(i) Based on JS-Divergence, we have the following conclusion about the correlation $\mathrm{COR}(\cdot, \cdot)$.
Theorem 11. For any two edges $e_{ij}$ and $e_{kl}$ in the weighted graph $G$ corresponding to a network dataset $D_w$, $0 \le \mathrm{COR}(e_{ij}, e_{kl}) \le 1$ holds.
Proof. For ease of exposition, we denote the last two items in the numerator of (16) as follows.
Since PN(⋅) denotes a probability distribution, we have we consider two cases separately.
(ii) Similar to [23][24][25], we introduce the correlated parameter $k$, representing that there are at most $k$ correlated tuples in a dataset. In other words, a tuple is correlated with at most $k - 1$ other tuples; that is, an edge in a graph is correlated with at most $k - 1$ other edges. Obviously, $k = 1$ represents the independent case of tuples in a dataset, $k$ equal to the number of tuples represents the fully correlated case, and values of $k$ strictly between these two represent the partially correlated case. Therefore, varying $k$ increases the flexibility of Definition 10. Furthermore, we define the ε-CEDP model as follows.
Definition 12 (ε-correlated edge differential privacy). A randomized mechanism $M$ satisfies ε-CEDP if, for any correlated neighboring databases $D_{\mathrm{COR},k}$ and $D'_{\mathrm{COR},k}$ and for any output $O \in \mathrm{Range}(M)$, where $\mathrm{Range}(M)$ represents the possible output set of $M$,
$$\Pr[M(D_{\mathrm{COR},k}) = O] \le e^{\varepsilon} \cdot \Pr[M(D'_{\mathrm{COR},k}) = O],$$
where $\varepsilon$ is the privacy budget depicting the probabilistic difference between the same outputs of $M$ over $D_{\mathrm{COR},k}$ and $D'_{\mathrm{COR},k}$, and $\mathrm{COR}(\cdot, \cdot)$ and $k$ are the correlation of two tuples and the correlated parameter representing the maximal number of correlated tuples, respectively.

Sensitivity Calculation.
After transforming weighted dataset $D_w$ into indicated dataset $D_I$, we add Laplace noise to query answers based on the ε-CEDP model. Laplace noise is determined by two factors: the privacy budget $\varepsilon$ and the global sensitivity of a query, where the latter refers to the maximal change of the query result due to the modification of only one tuple. Here, for a query $f$, assume the global sensitivity of $f$ resulting from the change of tuple $t_i$ in independent settings is $\Delta f_i$. Clearly, $\Delta f_i = 1$. However, for dataset $D_I$ with $n$ tuples where at most $k$ tuples are correlated, the query sensitivity resulting from modifying tuple $t_i$, called Edge Sensitivity and denoted by $\mathrm{ES}_i$, is more complex. Specifically, (i) if $\mathrm{COR}(t_i, t_j) = 0$, that is, $k = 1$, denoting the independent case, $\mathrm{ES}_i = 1$; (ii) if $\mathrm{COR}(t_i, t_j) = 1$ and $2 \le k \le n$, denoting the fully correlated case, $\mathrm{ES}_i = k$; and (iii) if $0 < \mathrm{COR}(t_i, t_j) < 1$ and $2 \le k \le n$, denoting the partially correlated case, $\mathrm{ES}_i$ is defined in terms of the correlations between $t_i$ and the other tuples.
Since the change of a tuple affects at most $k - 1$ other correlated tuples, $\mathrm{ES}_i$ can be rewritten to range over those tuples only. Finally, we have the correlated sensitivity, denoted by CS, that is, the maximal $\mathrm{ES}_i$ in dataset $D_I$:
$$\mathrm{CS} = \max_{i} \mathrm{ES}_i.$$
Note that the CS is also suitable for the independent and fully correlated cases. Based on the CS, we can achieve ε-CEDP, which is shown as follows.
Theorem 13. Given any query function $f : D_{\mathrm{COR},k} \to \mathbb{R}^d$, where $D_{\mathrm{COR},k}$ is a correlated dataset with the correlation definition $\mathrm{COR}(\cdot, \cdot)$ and the correlated parameter $k$ and $\mathbb{R}^d$ is a $d$-dimension real-valued vector, the correlated sensitivity CS of $f$, and privacy budget $\varepsilon$, a randomized mechanism $M$ with
$$M(D_{\mathrm{COR},k}) = f(D_{\mathrm{COR},k}) + \mathrm{Lap}(\mathrm{CS} / \varepsilon)$$
provides ε-CEDP, where $\mathrm{Lap}(\cdot)$ denotes Laplace noise.
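The following Python sketch illustrates how a correlated sensitivity could be computed from a matrix of pairwise correlations. It rests on an assumption that is not taken from the paper: the edge sensitivity of tuple $t_i$ is taken to be 1 (for the tuple itself, since $\Delta f_i = 1$) plus the sum of its $k - 1$ largest correlations with other tuples. This choice reproduces the three cases stated in the text (1 when independent, $k$ when fully correlated, and a value in between otherwise), but the exact expression may differ. The function names are hypothetical.

```python
def edge_sensitivity(cor_row, i, k):
    """Assumed form: 1 for tuple i itself plus its k - 1 largest correlations with others."""
    others = sorted((c for j, c in enumerate(cor_row) if j != i), reverse=True)
    return 1.0 + sum(others[: k - 1])

def correlated_sensitivity(cor, k):
    """CS is the maximal edge sensitivity over all tuples in the dataset."""
    return max(edge_sensitivity(row, i, k) for i, row in enumerate(cor))

# Hypothetical correlation matrix for three tuples, with correlated parameter k = 2.
cor = [
    [1.0, 0.4, 0.1],
    [0.4, 1.0, 0.7],
    [0.1, 0.7, 1.0],
]
cs = correlated_sensitivity(cor, k=2)
```

Under this assumed form, CS never exceeds $k$, so the noise scale CS/ε is never larger than the scale $k$/ε obtained by treating all $k$ tuples as fully correlated.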

Security and Communication Networks
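Input: original dataset $D_w$, privacy budget $\varepsilon$, correlated parameter $k$, threshold $\theta$, and query set $Q$.
Output: noisy query result $M(D_I)$.
Proof. Following the steps leading to (35) and combining (37) with (38), we have $\Pr(M(D_{\mathrm{COR},k}) = O) \le e^{\varepsilon} \cdot \Pr(M(D'_{\mathrm{COR},k}) = O)$, which is the condition required by ε-CEDP.
For indicated dataset $D_I$ with weight $w_{ij} \in \{0, 1\}$ and the correlated parameter $k$, we can easily infer that the global sensitivity is equal to $k$. Since $0 \le \mathrm{COR}(\cdot, \cdot) \le 1$, we have $\mathrm{CS} \le k$, with equality only in the fully correlated case. Therefore, CS never exceeds the global sensitivity and is strictly smaller whenever the tuples are only partially correlated. In other words, the noise added via CS is no larger than that added via the global sensitivity; hence the utility of the mechanism $M$ based on CS is better.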

Network Data Release Method
Based on the indicated dataset $D_I$ and the CS discussed in Section 4, we propose a network data release method for these special cases, which achieves the ε-CEDP model. Furthermore, the theoretical analysis of privacy and utility is elaborated.

NDR Algorithm.
The goal of the NDR algorithm is to achieve a tradeoff between privacy and utility under correlated settings. To this end, three phases are taken into account: (i) to obtain the correlation of two tuples in $D_w$, we transform dataset $D_w$ into the corresponding graph and calculate the correlation $\mathrm{COR}(\cdot, \cdot)$ of two edges via JS-Divergence; (ii) based on a given weight threshold $\theta$, we convert $D_w$ into $D_I$ via the TBT algorithm, so that the sensitivity in independent settings is 1, irrespective of the weights, and we then calculate the CS; and (iii) combining the affordable privacy budget $\varepsilon$ with the CS, we calculate the added Laplace noise and finally obtain the noisy query result $M(D_I)$ for each query $f$ in query set $Q$. The NDR algorithm is presented in Algorithm 2.
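The three phases can be sketched end to end in Python as follows. This is a minimal illustration, not a reproduction of Algorithm 2: the correlated sensitivity from phases (i) and (ii) is assumed to be precomputed and passed in, each query is assumed to be a function of the indicated dataset, and the allocation of the privacy budget across multiple queries is not handled. The name `ndr_release` is hypothetical.

```python
import random

def ndr_release(weighted_dataset, threshold, epsilon, correlated_sensitivity, queries, rng=random):
    """Threshold the weights, answer each query, and add Lap(CS / epsilon) noise."""
    # Phase (ii): threshold-based transformation into the indicated dataset.
    indicated = [(u, v, 1 if w > threshold else 0) for (u, v, w) in weighted_dataset]
    # Phase (iii): Laplace noise calibrated to the correlated sensitivity.
    scale = correlated_sensitivity / epsilon
    rate = 1.0 / scale
    results = []
    for query in queries:
        true_answer = query(indicated)
        noise = rng.expovariate(rate) - rng.expovariate(rate)  # Laplace(0, scale)
        results.append(true_answer + noise)
    return results

# Hypothetical usage: count the close relationships in a three-tuple dataset.
weighted = [("v1", "v2", 3), ("v1", "v3", 1), ("v2", "v3", 5)]
count_close = lambda dataset: sum(flag for (_, _, flag) in dataset)
noisy = ndr_release(weighted, threshold=2, epsilon=0.5, correlated_sensitivity=1.7, queries=[count_close])
```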

Utility Analysis.
Clearly, the NDR algorithm satisfies the ε-CEDP model. To conduct utility analysis, we adopt the $(\alpha, \beta)$-useful definition in [40] to depict the utility of NDR as follows.
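To illustrate how accuracy, failure probability, and privacy budget interact under Laplace noise, the following Python sketch uses the standard tail bound $\Pr[|\mathrm{Lap}(b)| > \alpha] = e^{-\alpha / b}$ with $b = \mathrm{CS} / \varepsilon$. It assumes the usual reading of $(\alpha, \beta)$-usefulness for a single numeric query, namely that the noisy answer is within $\alpha$ of the true answer with probability at least $1 - \beta$; the function names are illustrative.

```python
import math

def failure_probability(alpha, epsilon, sensitivity):
    """beta such that |noise| <= alpha holds with probability 1 - beta."""
    return math.exp(-alpha * epsilon / sensitivity)

def required_epsilon(alpha, beta, sensitivity):
    """Smallest privacy budget for which Lap(sensitivity / epsilon) is (alpha, beta)-useful."""
    return sensitivity * math.log(1.0 / beta) / alpha

# With the same alpha and beta, a smaller sensitivity needs a smaller budget.
eps_small = required_epsilon(alpha=10, beta=0.1, sensitivity=2.5)
eps_large = required_epsilon(alpha=10, beta=0.1, sensitivity=10.0)
```

The required budget grows linearly with the sensitivity and shrinks as either $\alpha$ or $\beta$ is relaxed, which matches the trends discussed in the experiments.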

Experiment
Generally, the goal of privacy preserving is to achieve maximal utilities while maintaining required privacy guarantees; that is, the tradeoff between privacy and utility is desired.
In this section, we first present the better privacy guarantees and utility of Algorithm NDR based on the definition of $(\alpha, \beta)$-useful and then further demonstrate its better utility in terms of mean absolute error (MAE). Here the Baseline algorithm adopts the multiplication in [23] to handle the correlated tuples in a network dataset. Considering the constraint of applying the Pearson correlation coefficient, we do not adopt the method using the Pearson correlation coefficient as a comparison reference in the following experiments. To verify the advantages of Algorithm NDR concerning privacy and utility, we run the NDR and Baseline algorithms on three datasets: geom and out, explained in Section 4, and a randomly generated dataset (rgd), which is a randomly generated weighted network containing 100 vertices and 1645 edges. The weight of each edge is uniformly distributed in the interval [1, 50]. Doing so also shows how well the proposed correlation metric and algorithm adapt to both real-world and synthetic datasets. Without loss of generality, threshold $\theta$ here is set to 0, and the selection of its value is to be investigated in future work.
6.1. Privacy and Utility. We analyzed the privacy and utility of the NDR and Baseline algorithms in terms of $(\alpha, \beta)$-useful when the correlated parameter is set to the size of the whole dataset.
In terms of privacy, we evaluate the consumption of privacy budget $\varepsilon$ under the same accuracy $\alpha$ and the same probability $1 - \beta$. Clearly, the smaller the consumed privacy budget, the better the performance of the algorithm. Figures 3(a), 3(c), and 3(e) present the variation of the privacy budget consumed by algorithms NDR and Baseline on datasets geom, out, and rgd, with the increase of $\alpha$ from 1 to 40 when $\beta$ equals 0.1 and 0.5, respectively. From Figures 3(a), 3(c), and 3(e), we can see that privacy budgets decrease in all cases with the increase of $\alpha$. The reason is that, with the relaxation of $\alpha$, larger noise can be allowed when $\beta$ stays fixed; therefore the algorithms can consume a smaller privacy budget. Meanwhile, we also see that privacy budgets decrease with the increase of $\beta$ from 0.1 to 0.5 when $\alpha$ stays fixed, because the probability of satisfying the accuracy requirement decreases with the increase of $\beta$, which means that the algorithms have more chances to add larger noise; that is, they can use a smaller privacy budget. This advantage of both algorithms is more obvious in the case of $\alpha = 1$ than in the other cases. In fact, when $\beta$ stays constant, the higher the accuracy presented by $\alpha$, the more privacy budget the algorithms need.
On the other hand, Figures 3(b), 3(d), and 3(f) demonstrate the variation of $\beta$ for algorithms NDR and Baseline on datasets geom, out, and rgd, with the increase of $\alpha$ from 0 to 10000 when $\varepsilon$ equals 0.1 and 1.0, respectively. We can see that $\beta$ decreases in all cases with the increase of $\alpha$; that is, the probability $1 - \beta$ increases with the increase of $\alpha$. Note that this trend varies from dataset to dataset; for example, the probability and accuracy of algorithm NDR over datasets geom and out are evidently different when $\varepsilon = 1$. Clearly, when $\varepsilon$ is determined, the probability increases with the relaxation of $\alpha$. In addition, we find that algorithm NDR has a larger probability, that is, smaller $\beta$, than the Baseline algorithm of achieving the same accuracy $\alpha$ under the same level of privacy budget $\varepsilon$. Also, when $\varepsilon$ increases from 0.1 to 1.0, the algorithms have a larger probability of achieving the same accuracy $\alpha$, which is easily understood from (40).
6.2. Utility. We adopt MAE, that is, $(1/|Q|) \sum_{q \in Q} |\hat{f}(q) - f(q)|$, where $\hat{f}(q)$ and $f(q)$ denote the noisy and true answers to query $q$, to depict the performance of algorithms NDR and Baseline. Obviously, the smaller the MAE value, the better the utility. For each dataset, 10000 queries are randomly generated, and each query result ranges from 0 to the maximal number of tuples.
Figures 4(a), 4(c), and 4(e) show the variation of the MAEs of the NDR and Baseline algorithms, over datasets geom, out, and rgd, under various privacy budgets when the correlated parameter $k$ is 10. From Figures 4(a), 4(c), and 4(e), we can see that the MAEs of both algorithms decrease with the increase of privacy budget $\varepsilon$ from 0.1 to 1. Because a larger privacy budget leads to smaller noise added to raw data, this downward trend always holds. More importantly, algorithm NDR obtains better accuracy, that is, smaller MAE, under various $\varepsilon$. Furthermore, the smaller the privacy budget $\varepsilon$, the more obvious this advantage. The reason is that algorithm NDR adopts a more reasonable correlation metric than the Baseline algorithm.
Figures 4(b), 4(d), and 4(f) show the variation of the MAEs of the NDR and Baseline algorithms, over datasets geom, out, and rgd, under various correlated parameters when privacy budget $\varepsilon$ is 0.5. In Figures 4(b), 4(d), and 4(f), we find that the MAEs of both algorithms increase with the increase of correlated parameter $k$ from 1 to 40. With the increase of the number of correlated tuples in a dataset, larger noise needs to be injected to eliminate the effect of tuple correlation, which necessarily results in the increase of MAE. In addition, we also note that algorithm NDR obtains better accuracy, that is, smaller MAE, under various correlated parameters compared with the Baseline algorithm. Also, the larger the correlated parameter, the larger this advantage. All these advantages are due to the more reasonable correlation metric, which is proposed in Section 4 and adopted by algorithm NDR.
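The MAE measure is simple to compute; a minimal Python sketch follows, with the function name and the sample values chosen only for illustration.

```python
def mean_absolute_error(noisy_answers, true_answers):
    """Average absolute difference between noisy and true query answers."""
    if len(noisy_answers) != len(true_answers):
        raise ValueError("answer lists must have the same length")
    total = sum(abs(noisy - true) for noisy, true in zip(noisy_answers, true_answers))
    return total / len(true_answers)

# Hypothetical answers to two queries: errors of 1 and 2 average to 1.5.
mae = mean_absolute_error([1.0, 3.0], [2.0, 1.0])
```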

Conclusion
In this paper, we focus on adopting the differential privacy model to avoid the leakage of a close relationship between two individuals in a network. To this end, we first extract the PF vector from the two aspects of node degree and edge weight to depict an edge in a network dataset and then design the correlation metric of two edges via JS-Divergence to avoid the constraint of adopting the Pearson correlation coefficient. Next, we propose the ε-CEDP model to deal with correlated datasets by introducing two parameters: our correlation metric and the correlated parameter. Furthermore, we present the NDR algorithm based on ε-CEDP and discuss its privacy and utility in terms of the definition of $(\alpha, \beta)$-useful. Extensive experiments on real and synthetic network datasets verify the advantages of our proposed privacy preserving model and algorithm concerning privacy and utility. Admittedly, the proposed solution is currently appropriate for weighted network datasets, and other datasets are out of the scope of this paper. In future work, we will discuss the impact of the choice of weight threshold on algorithm performance, explore more appropriate correlation metrics, and investigate privacy preserving algorithms in different applications.

Figure 1: The framework of our solution.

Figure 2: Degree and weight distributions on network data.

Figure 3: Comparison of privacy and utility on network data in terms of $(\alpha, \beta)$-useful.

Table 1: A raw weighted dataset. Note. $v_i$ and $v_j$ denote two individuals, and weight $w_{ij}$ denotes the relation strength between them.