The Discrete Gaussian Expectation Maximization (Gradient) Algorithm for Differential Privacy

In this paper, we give a modified gradient EM algorithm that protects the privacy of sensitive data by adding discrete Gaussian mechanism noise. Specifically, it makes high-dimensional data easier to process through scaling, truncation, noise multiplication, and smoothing steps. Since the variance of a discrete Gaussian is smaller than that of the corresponding continuous Gaussian, adding noise from the discrete Gaussian mechanism guarantees the differential privacy of the data more effectively. Finally, the standard gradient EM algorithm, the clipped algorithm, and our algorithm (DG-EM) are compared on the GMM model. The experiments show that our algorithm can effectively protect high-dimensional sensitive data.


Introduction
Today, big data has spread to every field and organization in our society, generating large amounts of personal data every day, which people use and analyse to drive the rapid development of society and technology. However, individuals expect their private data to be protected from being hacked or made public when it is collected. Therefore, how to effectively protect the privacy of data, so that it cannot be attacked yet can still be used effectively, has gradually attracted attention. Dwork et al. [1] introduced the concept and basic theoretical framework of differential privacy, which can effectively protect users' data privacy and comes with a rigorous and elegant mathematical framework and guarantees.
The gradient EM algorithm is one of the most important statistical methods, and Wang et al. [2] recently applied it to the privacy protection of sensitive data. Before this, the original EM algorithm and the gradient EM algorithm were used without statistical guarantees, until Balakrishnan et al. [4] gave statistical guarantees for the EM algorithm; building on that work, Wang et al. [3] gave guarantees for the gradient EM algorithm and extended it to data privacy protection. However, like most researchers, they add continuously distributed Gaussian noise to the data, while in practice the outputs of data queries are often discrete, such as the number of records in a database that meet certain conditions. For this reason, Canonne et al. [5] proposed a discrete Gaussian mechanism that adds discrete Gaussian noise to the data while achieving the same excellent accuracy as continuous Gaussian noise.
In this paper, we design a discretized Gaussian algorithm for differentially private computation based on the gradient EM algorithm of [2]. Our algorithm is effective in practice and can be extended to the general standard model; we also give the corresponding statistical guarantees. The structure of this paper is as follows: in the second part, we introduce the theory of the gradient EM algorithm, the discrete Gaussian, and differential privacy, as well as work related to this paper. In the third part, we introduce our model, the differentially private discrete Gaussian EM (gradient) algorithm (DG-EM), and the relevant statistical guarantee theorems. In the fourth part, we present simulations studying the sensitivity, sample size, and dimension of the aggregated data; a discussion of the model and future work appears in the fifth part. Finally, we give the proofs of some lemmas in the appendix.

Gradient EM Algorithm.
Assume that (X, Z) is the complete data, where X is an observed sample and Z is a latent variable. Latent variables are generally unobservable because they are missing or arise from an underlying data structure. We denote by 𝒳 and 𝒵 the sample spaces of the variables X and Z, respectively. Suppose that (X, Z) has a joint density function p_θ₀(x, z) belonging to some parameterized distribution family {p_θ | θ ∈ Ω}. The variable X then has marginal density π_θ(x) = ∫_𝒵 p_θ(x, z) dz, and π_θ(z|x) = p_θ(x, z)/π_θ(x) is the conditional density of Z given X = x. Suppose the observed samples x₁, ..., x_n are drawn from the population X. The EM algorithm maximizes the log-likelihood function ℓ_n(θ) = (1/n) Σ_{i=1}^n log π_θ(x_i). Through Jensen's inequality, a lower bound of the log-likelihood function can be written as

ℓ_n(θ) ≥ Q_n(θ; θ′) + R_n(θ′), where Q_n(θ; θ′) = (1/n) Σ_{i=1}^n ∫_𝒵 π_{θ′}(z|x_i) log p_θ(x_i, z) dz

and R_n(θ′) collects the terms that do not depend on θ. To maximize the log-likelihood, the left-hand side can be made sufficiently large by iteratively increasing the lower bound on the right-hand side. The standard EM algorithm [6][7][8][9] evaluates the function Q_n(θ, θ^(t)) in the E-step at each iteration; the parameters are then estimated in the M-step so that the parameter value of this iteration maximizes Q_n(θ, θ^(t)), that is, θ^(t+1) = arg max_{θ∈Ω} Q_n(θ, θ^(t)). When Q_n(θ, θ^(t)) is differentiable at each iteration step, the gradient EM algorithm is usually used to achieve higher accuracy and faster convergence to the maximum: at the t-th iteration, the current parameter θ^(t) is updated to θ^(t+1) by

θ^(t+1) = θ^(t) + η ∇Q_n(θ^(t); θ^(t)),

where η is a parameter called the step size.
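To make the update concrete, the following sketch (in Python; the constants and function names are our illustrative choices, not the paper's code) runs gradient EM on a symmetric two-component Gaussian mixture 0.5 N(θ*, I) + 0.5 N(−θ*, I), for which the E-step weights and the gradient of Q_n have well-known closed forms:

```python
import numpy as np

def gradient_em_gmm(x, theta0, eta=0.5, T=50):
    """Gradient EM for the mixture 0.5*N(theta, I) + 0.5*N(-theta, I).

    Each iteration takes one gradient ascent step on Q_n(.; theta^(t))
    instead of maximizing it exactly."""
    theta = theta0.astype(float).copy()
    for _ in range(T):
        # E-step: posterior weight of the +theta component (unit variance)
        w = 1.0 / (1.0 + np.exp(-2.0 * (x @ theta)))
        # Gradient of Q_n(theta'; theta) evaluated at theta' = theta
        grad = np.mean((2.0 * w - 1.0)[:, None] * x, axis=0) - theta
        theta = theta + eta * grad
    return theta

rng = np.random.default_rng(0)
theta_star = np.array([2.0, -1.0])
z = rng.choice([-1.0, 1.0], size=2000)
x = z[:, None] * theta_star + rng.normal(size=(2000, 2))
theta_hat = gradient_em_gmm(x, np.array([1.0, 0.0]))
```

With step size η = 1 the update reduces to the classical EM fixed-point iteration.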

Discrete Gaussian.
The study of discretely distributed forms of noise has received increasing attention in recent years. In the literature, the discrete Laplace, discrete binomial, and discrete Gaussian distributions have been studied and applied in the field of cryptography.
In this paper, the differential privacy model is studied based on the Gaussian mechanism. Normally distributed noise gives the model many elegant mathematical properties. Although the discrete Laplace noise mechanism and the discrete Gaussian noise mechanism cannot be compared within the same model, since they are used in different privacy mechanisms, we still prefer discrete Gaussian noise in order to obtain clean mathematical conclusions [10][11][12][13].
In this paper, we need the added noise to have a discrete Gaussian distribution on the specially treated sample. Firstly, we give the definition of the discrete Gaussian distribution and some useful related theory.

Definition 1. Let μ ∈ R and σ > 0. If the random variable X has probability mass function

P(X = x) = exp(−(x − μ)²/(2σ²)) / Σ_{y∈Z} exp(−(y − μ)²/(2σ²)), x ∈ Z,

supported on the integers, then we call X a discrete Gaussian with location parameter μ and scale parameter σ², denoted N_Z(μ, σ²).
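A simple way to draw from N_Z(μ, σ²) is to normalize the pmf over a truncated integer support (the truncation width and function name below are our choices; Canonne et al. [5] give an exact rejection sampler):

```python
import numpy as np

def sample_discrete_gaussian(mu, sigma, size, rng, tail=12.0):
    """Draw from N_Z(mu, sigma^2): P(X = x) proportional to
    exp(-(x - mu)^2 / (2 sigma^2)) for integer x.

    The integer support is truncated at mu +/- tail*sigma; the mass
    outside is negligible for moderate `tail`."""
    lo = int(np.floor(mu - tail * sigma))
    hi = int(np.ceil(mu + tail * sigma))
    support = np.arange(lo, hi + 1)
    logp = -((support - mu) ** 2) / (2.0 * sigma ** 2)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return rng.choice(support, size=size, p=p)

rng = np.random.default_rng(1)
draws = sample_discrete_gaussian(0.0, 2.0, size=100_000, rng=rng)
```

The variance of N_Z(0, σ²) never exceeds σ², in line with the remark above on discrete versus continuous Gaussian variance.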

Some Basic Theories on Differential Privacy.
In this part, we give some basic theories on differential privacy [14, 15].

Definition 2.
A randomized algorithm M: X ⟶ Y satisfies (ϵ, δ)-differential privacy (DP) if for all neighboring datasets D, D′ ⊂ X differing in a single entry and for all events S in the space Y, we have Pr(M(D) ∈ S) ≤ e^ϵ Pr(M(D′) ∈ S) + δ. We call this approximate differential privacy if δ > 0, and pure (or point-wise) ϵ-differential privacy in the case of (ϵ, 0)-differential privacy. The concept of concentrated differential privacy was given by Bun et al. [14] as follows:

Definition 3. A randomized algorithm M: X ⟶ Y satisfies ρ-concentrated differential privacy (ρ-CDP) if for all neighboring datasets D, D′ ⊂ X and for any α ∈ (1, ∞), we have D_α(M(D)‖M(D′)) ≤ ρα, where D_α(P‖Q) = (1/(α − 1)) log Σ_y (P(y)/Q(y))^α Q(y) is the Rényi divergence of order α of the distribution P from the distribution Q.
From these definitions, we have the standard conversions: pure ϵ-DP implies (ϵ²/2)-CDP, and ρ-CDP implies (ρ + 2√(ρ log(1/δ)), δ)-DP for every δ > 0 [14]. In order to ensure the consistency of the parameters of our model, we also need some basic definitions and assumptions based on [4].

Differential Privacy Discrete Gaussian EM (Gradient) Model
We now present our algorithm, which is based on the gradient EM algorithm of [2] and uses a discrete Gaussian noise mechanism together with a high-dimensional truncation step, and which satisfies concentrated differential privacy (CDP). Like Wang et al. [2], we first consider the one-coordinate case, that is, a 1-dimensional random variable x. Let x₁, ..., x_n be i.i.d. samples of x. We build the clipped estimator as follows:

Step 1. For the samples x_i, we take the soft truncation function h(x) defined by Catoni and Giulini [16], namely h(x) = x − x³/6 for |x| ≤ √2 and h(x) = (2√2/3) sign(x) otherwise, so that |h| ≤ 2√2/3 everywhere. Then, we take some mild constant ω and rescale each sample x_i by dividing by ω to get h(x_i/ω); through this approach, we obtain the truncated mean (ω/n) Σ_{i=1}^n h(x_i/ω).

Step 2. We multiply the rescaled samples by noise. Multiplicative noise is an effective method to preserve the estimation quality at typical points while controlling the influence of outliers as much as possible. It was first proposed by Srivastava et al. [17], and the motivation for using Gaussian multiplicative noise comes from [18].
Step 3. Finally, we take the expectation over the multiplicative noise. Like Catoni and Giulini [16], who take χ ∼ N(0, 1/β), we let the noise χ follow the discrete Gaussian distribution χ ∼ N_Z(0, 1/β). Easily, for any given constants a, b > 0, the corresponding bound also holds. We then have the following estimation error bound, Lemma 1, which is analogous to Lemma 5 of Holland [19]; we give its proof in Appendix A.

Lemma 1. Suppose the relevant upper bound is known. Then, for a given failure probability c ∈ (0, 1), the estimation error admits the stated upper bound with probability at least 1 − c.
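Numerically, Steps 1-3 for one coordinate can be sketched as follows (the constants ω and β, the Monte Carlo average standing in for the exact expectation over the noise, and the function names are our illustrative assumptions):

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def soft_truncate(u):
    """Catoni-Giulini soft truncation: h(u) = u - u^3/6 on [-sqrt(2), sqrt(2)],
    capped at +/- 2*sqrt(2)/3 outside, so |h| <= 2*sqrt(2)/3 everywhere."""
    return np.where(np.abs(u) <= SQRT2, u - u ** 3 / 6.0,
                    np.sign(u) * 2.0 * SQRT2 / 3.0)

def robust_mean(x, omega, beta, n_mc=1000, rng=None):
    """Step 1: rescale by omega and soft-truncate; Step 2: multiply by
    (1 + o) with o ~ N(0, 1/beta); Step 3: average over the noise
    (here by Monte Carlo) and over the sample, then rescale back."""
    rng = np.random.default_rng() if rng is None else rng
    o = rng.normal(0.0, 1.0 / np.sqrt(beta), size=(n_mc, 1))
    h = soft_truncate((x[None, :] / omega) * (1.0 + o))
    return omega * h.mean()

rng = np.random.default_rng(2)
x = rng.normal(5.0, 1.0, size=2000)
est = robust_mean(x, omega=20.0, beta=100.0, rng=rng)
```

Because |h| is bounded by 2√2/3, replacing one sample changes the estimate by at most 2ω(2√2/3)/n, which is the kind of sensitivity bound used in the next step.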
From the soft truncation function and the multiplicative noise algorithm, we know that the sensitivity of the processed observation samples is 4√2 s/(3n). Next, we add discrete Gaussian noise to these observations so that the query is (ϵ, δ)-DP, which leads to the following Lemma 2; we give the proof in Appendix B.
Lemma 2. Let ϵ > 0, and let the function q: X^n ⟶ Z be the operator defined by Steps 1-3, satisfying |q(x) − q(x′)| ≤ Δ for any neighboring x, x′ ∈ X^n. Then the query can be written as the randomized algorithm M(x) = q(x) + Y with Y ∼ N_Z(0, σ²), which satisfies (Δ²/(2σ²))-concentrated differential privacy. Furthermore, these results imply the following lemma.
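As a standalone illustration of the mechanism in Lemma 2 (using a counting query of sensitivity Δ = 1 in place of the Steps 1-3 operator; the function names are ours), releasing an integer query plus N_Z(0, σ²) noise satisfies (Δ²/(2σ²))-CDP by the result of Canonne et al. [5]:

```python
import numpy as np

def sample_dg(sigma, rng, tail=12.0):
    """Crude N_Z(0, sigma^2) sampler over a truncated integer support."""
    k = int(np.ceil(tail * sigma))
    support = np.arange(-k, k + 1)
    p = np.exp(-(support ** 2) / (2.0 * sigma ** 2))
    p /= p.sum()
    return int(rng.choice(support, p=p))

def private_count(data, predicate, rho, rng):
    """Release a counting query under rho-CDP: a count has sensitivity
    Delta = 1, so noise N_Z(0, sigma^2) with sigma^2 = 1/(2*rho) suffices."""
    sigma = np.sqrt(1.0 / (2.0 * rho))
    true_count = int(np.sum(predicate(data)))
    return true_count + sample_dg(sigma, rng)

rng = np.random.default_rng(3)
data = np.arange(100)
noisy = private_count(data, lambda v: v < 50, rho=0.5, rng=rng)
```

The output stays an integer, which is precisely the appeal of the discrete mechanism for count-valued queries.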

Lemma 3.
Under Assumption 1, with probability at least 1 − c, the stated bound holds. After this estimation of the univariate private data, in the t-th iteration of Algorithm 1 we apply the univariate estimation method to each coordinate of the gradient ∇Q_n(θ^(t); θ^(t)) and thereby obtain an estimate of the gradient; finally, the M-step is performed.
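Putting the pieces together, a simplified sketch of one possible DG-EM loop for the symmetric two-component GMM is shown below. For brevity it replaces the paper's soft-truncation/multiplicative-noise estimator with plain coordinate-wise clipping, and the grid resolution, clipping level, and noise scale are our illustrative assumptions; only the shape of the iteration (robust per-coordinate gradient, discretization, discrete Gaussian noise, gradient step) follows Algorithm 1.

```python
import numpy as np

def dg_em_gmm(x, theta0, sigma_dg=2.0, eta=0.5, T=30, clip=8.0,
              resolution=0.01, rng=None):
    """Noisy gradient EM: at each iteration, clip per-sample gradients,
    round the averaged gradient to an integer grid, add discrete Gaussian
    noise N_Z(0, sigma_dg^2) per coordinate, and take a gradient step."""
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0.astype(float).copy()
    n, d = x.shape
    # truncated-support sampler for N_Z(0, sigma_dg^2)
    k = int(np.ceil(12.0 * sigma_dg))
    supp = np.arange(-k, k + 1)
    p = np.exp(-(supp ** 2) / (2.0 * sigma_dg ** 2))
    p /= p.sum()
    for _ in range(T):
        w = 1.0 / (1.0 + np.exp(-2.0 * (x @ theta)))   # E-step (unit variance)
        g = (2.0 * w - 1.0)[:, None] * x - theta        # per-sample gradients
        g = np.clip(g, -clip, clip).mean(axis=0)        # robust aggregate
        g_int = np.rint(g / resolution)                 # integer-valued query
        noise = rng.choice(supp, size=d, p=p)           # discrete Gaussian noise
        theta = theta + eta * (g_int + noise) * resolution
    return theta

rng = np.random.default_rng(4)
theta_star = np.array([2.0, -1.0])
z = rng.choice([-1.0, 1.0], size=3000)
x = z[:, None] * theta_star + rng.normal(size=(3000, 2))
theta_hat = dg_em_gmm(x, np.array([1.0, 0.0]), rng=rng)
```

The grid resolution trades off rounding bias against the magnitude of the integer noise after rescaling.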

Lemma 4. For any θ ∈ B, the j-th coordinate of the privatized gradient satisfies the stated tail bound, where Y ∼ N_Z(0, σ²).

For Algorithm 1, the next results show that the parameter estimate is consistent if the initial parameter θ^Init is close enough to the true parameter θ*. After some simple calculations, we conclude that the upper bound in Lemma 2 is attained at ω_op, the optimal numerical solution of the corresponding equation.

Lemma 5. Let B = {θ: ‖θ − θ*‖2 ≤ R} denote a parameter set with R = κ‖θ*‖2², where κ ∈ (0, 1) is a positive constant. Assume the parameters L, B, μ, λ, τ satisfy 1 − 2(λ − L)/(λ + μ) ∈ (0, 1). If ‖θ^Init − θ*‖2 ≤ R/2 and n is large enough, then the iterates remain in B with high probability.

Lemma 6. Let ‖θ*‖/σ ≥ r. Then there exists a constant C such that the properties of self-consistency, Lipschitz-gradient-2(L, B), μ-smoothness, and λ-strong concavity hold for the function Q, where r is a sufficiently large constant that lower-bounds the minimum signal-to-noise ratio (SNR).

Furthermore, we can obtain Theorems 1 and 2. The proofs of these theorems are straightforward, so we do not list the detailed procedures here; in fact, we only need to replace the upper bound on the variance of the discrete noise in [2], for a single coordinate, with 3 exp(−1/(2σ²)).

Theorem 1. Under the same conditions as in Lemma 4, for any θ ∈ B, the j-th coordinate of ∇q(θ; θ) satisfies the stated bounds.

Theorem 2. Under the same conditions as in Lemma 3, assume that ‖θ^Init − θ*‖2 ≤ ‖θ*‖2²/8 in Algorithm 1 and that n is large enough. If we take T = O(log n) iterations and a step size η = O(1), then for a failure probability c, the stated estimation error bound holds with probability at least 1 − 2Tc.

We note that Lemmas 3-6 and Theorems 1 and 2 follow readily from Lemmas 1 and 2. Due to limited space, we omit these proofs; readers can reproduce them by tracking the upper bound on the ℓ2-norm between the parameter iterates and the true parameter in the course of the proof.

Experiments and Results
In this section, we evaluate the performance of Algorithm 1 on the GMM model, studying its statistical setting and theoretical behavior on synthetic data.

Algorithm 1 (DG-EM). Input: D = {x_i ∈ R^d, i = 1, ..., n}, privacy parameters ϵ, δ, functions Q(·; ·) and q_i(·; ·), an initial parameter θ^Init ∈ B and τ satisfying Assumption 1, the number of iterations T, step size η, and failure probability c > 0. At each iteration, calculate the robust gradient coordinate-wise and add discrete Gaussian noise, then perform the gradient update.

Baseline Methods.
In this part, we compare primarily with two methods. For convenience, we refer to the gradient EM algorithm simply as EM, which serves as the nonprivate baseline. The other is the clipped differentially private EM algorithm [20], which we refer to as clipped and which serves as our private baseline.

Experimental Settings.
In this experiment, we generate synthetic data from a mixture of two components. For each algorithm, we use random initialization to select the initial parameter values. In the results, we use ‖θ − θ*‖2 to measure the estimation error. We set the signal-to-noise ratio ‖θ*‖/σ = 3. For the privacy parameter ϵ, we set ϵ ∈ {0.5, 0.8, 1}; the parameter δ = Pr(Y > ϵσ²/Δ + Δ/2) is then computed as a function of ϵ.
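The value δ = Pr(Y > ϵσ²/Δ + Δ/2) with Y ∼ N_Z(0, σ²) can be evaluated by summing the pmf over a truncated integer support (a sketch with our own function name; σ = 2 and Δ = 1 below are illustrative):

```python
import numpy as np

def delta_from_eps(eps, sigma, delta_sens=1.0, tail=30.0):
    """delta = Pr(Y > eps*sigma^2/Delta + Delta/2) for Y ~ N_Z(0, sigma^2),
    computed by summing the normalized pmf over a truncated support."""
    k = int(np.ceil(tail * sigma))
    support = np.arange(-k, k + 1)
    p = np.exp(-(support ** 2) / (2.0 * sigma ** 2))
    p /= p.sum()
    threshold = eps * sigma ** 2 / delta_sens + delta_sens / 2.0
    return float(p[support > threshold].sum())

deltas = [delta_from_eps(e, sigma=2.0) for e in (0.5, 0.8, 1.0)]
```

As expected, δ shrinks as ϵ grows, since a larger budget moves the tail threshold outward.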

Experimental Results.
As can be seen from Figure 1, we fix n = 1000 and d = 20. When the budget of our method is set to different values, the estimation error decreases significantly as the iterations proceed. When the budget is 0.2, 0.5, and 1, the optimal value is 1, 2, and 2, respectively, so it is difficult to determine the optimal value C.
In Figure 2, in the lower-dimensional case, we test how the data dimension d, privacy budget ϵ, and data size n affect the estimation error ‖θ − θ*‖2 of the algorithms on the Gaussian mixture model over the iterations t. We can see that the estimation error of Algorithm 1 on the GMM decreases when ϵ increases, n increases, or d decreases. However, when the budget ϵ is small, our algorithm performs badly, and the estimation error declines unstably as the number of iterations increases.
In Figure 3, we can see that, for high-dimensional data, a relatively large sample is needed to control the estimation error ‖θ − θ*‖2. We conducted experiments with higher dimensions d = 40, 80, 160 and sample sizes of 2000, 5000, and 10,000, respectively. When the sample size n is large enough, the estimation error decreases significantly with the number of iterations t. As shown in Figure 3, as the sample size increases, our algorithm remains effective in high-dimensional settings, which Wang et al.'s [2] algorithm cannot match.

Conclusions
In this paper, we study a differential privacy model with discrete Gaussian mechanism noise. Through data scaling and truncation, the model effectively handles the influence of high-dimensional data. Through the experiments and the theoretical analysis, we see that in low dimensions the estimation error of the model with discrete Gaussian noise decreases faster than that of the clipped model with continuous Gaussian noise, and in high dimensions the effect is much better than that of [2]. At the same time, as the earlier lemmas show, our model enjoys tighter bounds, because of the smaller variance of the discrete Gaussian noise.

Appendix A. Proof of Lemma 1
Proof. In order to make the conclusion general, we make some necessary assumptions. Firstly, let P(R) denote the set of all probability measures on R, equipped with an appropriate σ-field. Consider any two measures ν, ν′ ∈ P(R), and let f₀: R ⟶ R be a ν′-measurable function. We take the cumulative generating function in the form of (A.1). Since the estimator depends on the two random quantities x_i and the noise o_i, we write it accordingly. By the definition of the function h(·) above, the function f: R ⟶ R is measurable and bounded by 2√2/3. Next, we set f₀(o) with a term c(o) to be determined later. Inserting f₀(o) into (A.1), we obtain the stated inequality with probability at least 1 − c. Since the noise terms o₁, ..., o_n are independent and follow the distribution o ∼ ν, we obtain the bound in (A.9); thus, we need to analyse the first and second terms on the right-hand side of inequality (A.11). For the first term, from the definition of the truncation function h(·) and (A.11), we have the corresponding bound. Since o ∼ ν = N(0, 1/β), the expectation and variance of 1 + o are as follows:

For the second term in (A.11), we need to evaluate D_α(ν‖ν′). We take ν′ = N(0, 1/β); through simple computations, we get (A.14), and thus we can take the upper bound in the form of (A.15). Differentiating with respect to the variable s gives (A.16), and we differentiate likewise with respect to β. To get a lower bound on x̂ − E_μ(x), we need an upper bound on −(x̂ − E_μ(x)); similar to the analysis above, we obtain this upper bound. Then, with probability at least 1 − c, the stated bound holds.

Appendix B. Proof of Lemma 2

The proof of Lemma 2 needs the next proposition [5]:

Proposition 1. Let σ, α ∈ R with σ > 0 and α ≥ 1, and let μ₁, μ₂ ∈ Z. Then D_α(N_Z(μ₁, σ²)‖N_Z(μ₂, σ²)) ≤ α(μ₁ − μ₂)²/(2σ²). Furthermore, this inequality is an equality whenever α(μ₁ − μ₂) is an integer.
Proof of Lemma 2: we can get Lemma 2 easily through Proposition 1 and Definition 3.
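As a numerical sanity check (our own sketch, with the integer support truncated), Proposition 1 can be verified directly; for α(μ₁ − μ₂) an integer the bound should be attained exactly:

```python
import numpy as np

def renyi_div_dg(mu1, mu2, sigma, alpha, tail=15.0):
    """D_alpha(N_Z(mu1, sigma^2) || N_Z(mu2, sigma^2)) evaluated on a
    truncated integer support via the definition of the Renyi divergence."""
    lo = int(np.floor(min(mu1, mu2) - tail * sigma))
    hi = int(np.ceil(max(mu1, mu2) + tail * sigma))
    y = np.arange(lo, hi + 1)
    p = np.exp(-((y - mu1) ** 2) / (2.0 * sigma ** 2)); p /= p.sum()
    q = np.exp(-((y - mu2) ** 2) / (2.0 * sigma ** 2)); q /= q.sum()
    return float(np.log(np.sum((p / q) ** alpha * q)) / (alpha - 1.0))

# alpha*(mu1 - mu2) = 6 is an integer, so equality should hold here
d = renyi_div_dg(3.0, 0.0, sigma=2.0, alpha=2.0)
b = 2.0 * (3.0 - 0.0) ** 2 / (2.0 * 2.0 ** 2)  # alpha*(mu1-mu2)^2/(2*sigma^2)
```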
Data Availability. The data in this paper are random numbers generated by the statistical software R.

Conflicts of Interest
The author declares no conflicts of interest.