Privacy-Preserving Restricted Boltzmann Machine

With the arrival of the big data era, distributed data mining is predicted to drive a revolution in information technology. To motivate different institutes to collaborate with each other, the crucial issue is to eliminate their concerns regarding data privacy. In this paper, we propose a privacy-preserving method for training a restricted Boltzmann machine (RBM). With our method, the collaborating parties can obtain the RBM without revealing their private data to each other. We provide a correctness and efficiency analysis of our algorithms. A comparative experiment shows that the accuracy is very close to that of the original RBM model.


Introduction
With the rapid development of information technology and modern networks, huge amounts of personal data are generated every day, and people care deeply about maintaining their privacy. Therefore, there is a need to focus on developing privacy-preserving data mining algorithms. With the rapid growth of social networks like Facebook and LinkedIn, increasingly more research, such as work on advertising suggestions, will be based on personal data. In another scenario, doctors routinely collect patients' personal information before diagnosing a disease or treating an illness. To prevent the leakage of such private data, the Health Insurance Portability and Accountability Act (HIPAA) has set up a series of regulations that protect the privacy of individually identifiable health information.
Data mining is an important interdisciplinary field of computer science and has been widely extended to the fields of bioinformatics, medicine, and social networks. For example, when a research institute wants to study the DNA sequence and related genetic diseases, they need to collect patients' DNA data and apply data mining or machine learning algorithms to obtain a relevant model. However, if scientists from other institutes also want to use these DNA sequences, ensuring that the patients' personal information is protected is an example of the problem at hand. In another scenario, some researchers want to combine the personal data from Facebook and LinkedIn to undertake a study. However, neither company wants to reveal the personal information of their subscribers, and they especially do not want to give it to a competitor. Therefore, we propose a privacy-preserving machine learning method to ensure that individuals' privacy is protected.
The restricted Boltzmann machine (RBM) [1] is increasingly being used in supervised and unsupervised learning scenarios, such as classification. It is a variant of the Boltzmann machine (BM), a type of stochastic recurrent neural network invented by Hinton and Sejnowski. It has been used to model windows of mel-cepstral coefficients that represent speech [2], bags of words that represent documents [3], and user ratings of movies [4].
In this paper we propose a privacy-preserving method for training the RBM, which can be used for information sharing without revealing the personal data held by different institutions to each other. We provide a correctness and efficiency analysis of our algorithms. A comparative experiment shows that the accuracy is very close to that of the original RBM model. The rest of this paper is organized as follows. Section 2 describes the related work. We introduce the restricted Boltzmann machine, Gibbs sampling, contrastive divergence, and the cryptographic scheme in more detail in Section 3. In Section 4, we describe our privacy-preserving method for training the RBM. The analysis of our model is described in Section 5. Section 6 gives the design of our experiments in detail. Finally, Section 7 concludes the paper.

Related Work
In [5], Hinton gives a practical guide for training the restricted Boltzmann machine, which is widely used in collaborative filtering [4]. Agrawal and Srikant [6] and Lindell and Pinkas [7] independently argue that much future research in data mining will focus on the development of privacy-preserving techniques. Privacy-preserving data mining techniques can be divided into two classes: randomization-based methods such as [6] and cryptography-based methods such as [7].
Randomization-based privacy-preserving data mining, which perturbs data or reconstructs the distribution of the original data, can only provide a limited degree of privacy and accuracy but is more efficient when the database is very large. In [8], Du and Zhan present a method to build decision tree classifiers from the disguised data. They have conducted experiments to compare the accuracy of their decision tree with the one built from the original undisguised data. In [9], Huang et al. study how correlations affect the privacy of a dataset disguised via the random perturbation scheme and propose two data reconstruction methods that are based on data correlations. In [10], Aggarwal and Yu develop a new flexible approach for privacy-preserving data mining, which does not require new problem-specific algorithms since it maps the original dataset into a new anonymous dataset.
Cryptography-based privacy-preserving data mining, which can provide a better guarantee of privacy when different institutes want to cooperate toward a common research goal, is often limited by its efficiency when the dataset is very large. In [11], Wright and Yang propose a cryptography-based privacy-preserving protocol for learning the Bayesian network structure. Chen and Zhong [12] present a cryptography-based privacy-preserving algorithm for backpropagation neural network learning. In [13], Laur et al. propose cryptographically secure protocols for kernel perceptron and kernelized support vector machines. In [14], Vaidya et al. propose a privacy-preserving naive Bayes classifier on both vertically and horizontally partitioned data.
To the best of our knowledge, we are the first to provide a privacy-preserving RBM training algorithm for vertical partitions.

Technical Preliminaries
In this section, we give a brief review of the RBM and of the cryptographic method used in our privacy-preserving algorithm. First, we introduce the RBM and the learning method for binary units. Much of the description of the RBM and its training method in this section is adapted from [5,15]. Second, we introduce the cryptographic technique [12] that we have used in our work.

RBM.
The Boltzmann machine (BM) [16] is a stochastic neural network with symmetric connections between units and no connection from a unit to itself. BMs can be used to learn important aspects of an unknown probability distribution based on its samples. Restricted Boltzmann machines (RBMs) further restrict BMs to have no visible-visible and hidden-hidden connections [15], thus simplifying the learning process. A graphical depiction of an RBM is shown in Figure 1. A joint configuration $(v, h)$ of the visible and hidden units has an energy [17] defined as

$$E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i h_j w_{ij}, \tag{1}$$

where $v$ and $h$ are the vectors consisting of the states of all visible units and hidden units, respectively; $a_i$ and $b_j$ are the biases associated with visible unit $i$ and hidden unit $j$, respectively; and $w_{ij}$ is the weight between units $i$ and $j$. The energy determines the probability distribution over the hidden units' and visible units' state vectors as follows:

$$p(v, h) = \frac{e^{-E(v, h)}}{Z}, \tag{2}$$

where $Z$ is the sum of $e^{-E(v, h)}$ over all possible $(v, h)$ pairs.
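As a concrete check of the energy and probability definitions, the following minimal sketch enumerates every configuration of a tiny RBM with two visible and two hidden binary units. The parameter values and the function names `energy`, `partition_function`, and `joint_probability` are illustrative assumptions and do not come from the paper; `A`, `B`, and `W` stand for the visible biases, hidden biases, and weights.

```python
import itertools
import math

# Hypothetical parameters for a 2-visible, 2-hidden RBM (illustration only).
A = [0.1, -0.2]      # visible biases a_i
B = [0.3, 0.0]       # hidden biases b_j
W = [[0.5, -1.0],    # W[i][j] is the weight between visible i and hidden j
     [1.5, 0.2]]


def energy(v, h):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij."""
    visible_term = sum(a * vi for a, vi in zip(A, v))
    hidden_term = sum(b * hj for b, hj in zip(B, h))
    pair_term = sum(v[i] * h[j] * W[i][j]
                    for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + pair_term)


def all_configurations():
    """Yield every binary (v, h) pair."""
    for v in itertools.product([0, 1], repeat=len(A)):
        for h in itertools.product([0, 1], repeat=len(B)):
            yield v, h


def partition_function():
    """Z: the sum of exp(-E(v, h)) over all configurations."""
    return sum(math.exp(-energy(v, h)) for v, h in all_configurations())


def joint_probability(v, h):
    """p(v, h) = exp(-E(v, h)) / Z."""
    return math.exp(-energy(v, h)) / partition_function()
```

The enumeration grows exponentially with the number of units, so this brute-force form is only useful for checking small cases; it is the reason sampling methods are needed for training.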

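RBM with Binary Units.
When the units' states are binary, according to [18], a probabilistic version of the usual neuron activation function can be simplified to

$$P(h_j = 1 \mid v) = \mathrm{sigm}\left(b_j + W_{\cdot j}^{\top} v\right), \qquad P(v_i = 1 \mid h) = \mathrm{sigm}\left(a_i + W_{i \cdot}\, h\right), \tag{3}$$

where sigm denotes the sigmoid function and $W_{i \cdot}$ (and $W_{\cdot j}$, resp.) is the $i$th row vector (the $j$th column vector, resp.) of the weight matrix $W = (w_{ij})$. Based on (2) and (3), the log-likelihood gradients for an RBM with binary units [15] can be computed as

$$\frac{\partial \log p(v)}{\partial w_{ij}} = v_i\, P(h_j = 1 \mid v) - \mathbb{E}_{p(v')}\left[v'_i\, P(h_j = 1 \mid v')\right], \qquad \frac{\partial \log p(v)}{\partial a_i} = v_i - \mathbb{E}_{p(v')}\left[v'_i\right], \qquad \frac{\partial \log p(v)}{\partial b_j} = P(h_j = 1 \mid v) - \mathbb{E}_{p(v')}\left[P(h_j = 1 \mid v')\right]. \tag{4}$$

These gradients guide the updates of the weight matrix during the training procedure of the RBM.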
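The sigmoid form of the conditional probability follows directly from the energy function. The short derivation below is a sketch, assuming the energy $E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij}$ with visible biases $a_i$, hidden biases $b_j$, and weights $w_{ij}$; the shorthand $x_j$ is introduced here only for illustration.

```latex
% E is linear in each h_j, so the hidden units are independent given v.
% The terms of -E that involve h_j are h_j * x_j, with
x_j = b_j + \sum_i v_i w_{ij},
\qquad
P(h_j = 1 \mid v)
  = \frac{e^{x_j}}{e^{0} + e^{x_j}}
  = \frac{1}{1 + e^{-x_j}}
  = \mathrm{sigm}(x_j).
```

The same argument with the roles of the visible and hidden units exchanged gives the conditional probability of a visible unit.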

Sampling and Contrastive Divergence in an RBM.
Using Gibbs sampling as the transition operator, samples of $p(x)$ can be obtained by running a Markov chain to convergence [15]. To sample a joint of $N$ random variables $S = (S_1, \ldots, S_N)$, Gibbs sampling performs a sequence of $N$ sampling substeps of the form $S_i \sim p(S_i \mid S_{-i})$, where $S_{-i}$ represents the ensemble of the $N - 1$ random variables in $S$ other than $S_i$.
An RBM consists of visible and hidden units. Since the units within one layer are conditionally independent given the other layer, we can perform block Gibbs sampling [15]. In this setting, the hidden units are sampled simultaneously given fixed values of the visible units. Similarly, the visible units are sampled simultaneously given the hidden units. A step in the Markov chain is thus taken as follows [15]:

$$h^{(n)} \sim \mathrm{sigm}\left(W^{\top} v^{(n)} + b\right), \qquad v^{(n+1)} \sim \mathrm{sigm}\left(W h^{(n)} + a\right), \tag{5}$$

where $W$ is the weight matrix, $a$ and $b$ are the bias vectors of the visible and hidden units, and $h^{(n)}$ refers to the set of all hidden units at the $n$th step of the Markov chain. This means that, for example, $h^{(n)}_j$ is randomly chosen to be 1 (versus 0) with probability $\mathrm{sigm}(W_{\cdot j}^{\top} v^{(n)} + b_j)$, and similarly $v^{(n+1)}_i$ is randomly chosen to be 1 (versus 0) with probability $\mathrm{sigm}(W_{i \cdot} h^{(n)} + a_i)$ [15]. This is illustrated graphically in Figure 2. Contrastive divergence does not wait for the chain to converge. Samples are obtained after only $k$ steps of Gibbs sampling. In practice, $k = 1$ has been shown to work surprisingly well [15].
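One step of block Gibbs sampling followed by a contrastive divergence update with $k = 1$ can be sketched as follows. This is the standard, non-private CD-1 procedure written in plain Python, not the privacy-preserving variant; the function names, the learning rate `lr`, and the layout `W[i][j]` (visible unit `i`, hidden unit `j`) are assumptions made for illustration.

```python
import math
import random


def sigm(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigm(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigm(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def sample_binary(probs, rng):
    """Draw each unit as 1 with its given probability, else 0."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, a, b, rng, lr=0.1):
    """One CD-1 update; returns new (W, a, b) without mutating the inputs."""
    ph0 = hidden_probs(v0, W, b)                       # positive phase
    h0 = sample_binary(ph0, rng)
    v1 = sample_binary(visible_probs(h0, W, a), rng)   # one Gibbs step
    ph1 = hidden_probs(v1, W, b)                       # negative phase
    new_W = [[W[i][j] + lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
              for j in range(len(b))] for i in range(len(a))]
    new_a = [a[i] + lr * (v0[i] - v1[i]) for i in range(len(a))]
    new_b = [b[j] + lr * (ph0[j] - ph1[j]) for j in range(len(b))]
    return new_W, new_a, new_b
```

Using the hidden probabilities rather than sampled states in the update statistics is a common variance-reduction choice; the sampled `h0` is still used to drive the reconstruction.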

ElGamal Scheme.
In our privacy-preserving scheme, we use ElGamal [19], a typical public-key encryption method, as our cryptographic tool. Reference [20] has shown that the ElGamal encryption scheme is semantically secure [21] under a standard cryptographic assumption. In [12], the authors develop an elegant method for securely computing the sigmoid function and an algorithm for securely computing the product of two integers, both based on ElGamal's homomorphic and probabilistic properties. Here we give a brief review of these two algorithms. As shown in Algorithm 1, Party A first computes $y(x_1 + i) - R$ for every possible input $i$ of Party B, where $x_1$ is Party A's input, $R$ is a random number generated by Party A, and $y$ is the sigmoid function. Similarly, as shown in Algorithm 2, Party A holds $M$ and Party B holds $N$. Party A computes $M \times i - R$ for all possible inputs $i$ of Party B and then sends all encrypted messages to Party B. Then, Party A and Party B can obtain secret shares of $M \times N$ [12].

Overview and Algorithm of Our Privacy-Preserving Restricted Boltzmann Machine.
In order to use cryptographic tools in our privacy-preserving RBM, we use probabilities as the values of the hidden and visible units. That is, during the Gibbs sampling process, we use the probability itself instead of a sampled value in {0, 1} as the value of each hidden and visible unit. Therefore, we can use the ElGamal scheme to encrypt the probability after rounding it to a fixed number of decimal places. However, this approximation causes some loss of accuracy, which we evaluate in Section 5. In our privacy-preserving RBM training algorithm, we assume that the data are vertically partitioned; that is, each party owns some of the features of the dataset. Our privacy-preserving RBM is the first work on training a restricted Boltzmann machine over a vertically partitioned dataset. We now describe our training algorithm in detail.
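The rounding step that turns a probability into an integer suitable for encryption can be sketched as follows. The scale of 100 (two decimal digits) and the helper names `to_fixed` and `from_fixed` are assumptions chosen for illustration; the paper does not prescribe a particular encoding.

```python
SCALE = 100  # assumed: keep two decimal digits


def to_fixed(p, scale=SCALE):
    """Map a real value to an integer by scaling and rounding."""
    return int(round(p * scale))


def from_fixed(n, scale=SCALE):
    """Map a fixed-point integer back to a real value."""
    return n / scale


probability = 0.7311
encoded = to_fixed(probability)    # integer that could be encrypted
decoded = from_fixed(encoded)      # value recovered after decryption
```

With this encoding the absolute rounding error of a single value is at most half of the last kept digit, that is, `0.5 / SCALE`.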
For each training iteration, two parties, A and B, own the inputs $v^0_A = (v^0_1, v^0_2, \ldots, v^0_{m_A})$ and $v^0_B = (v^0_{m_A+1}, \ldots, v^0_{m_A+m_B})$ separately. The main idea of our privacy-preserving RBM is that, when training our model, we use the cryptographic method (Algorithms 1 and 2) [12] to secure each step without revealing either party's original data to the other.
First, we let each party sum up its visible data for each sample. Then Party A computes $\mathrm{sigmoid}(\sum_{i \le m_A} w_{ij} v^0_i + b_j + x) - R$ for all possible $x$, where $x$ ranges over the possible values of Party B's sum and $R$ is a random number generated by Party A. Then Party A rounds all these results to integers and encrypts them. Then Party A sends the ciphertexts to Party B in increasing order of $x$. Then Party B picks the ciphertext indexed by its own sum, rerandomizes it, and sends it to Party A, who partially decrypts this message and sends it back to Party B, who decrypts it and gets the value of $\mathrm{sigmoid}(\sum_{i \le m_A} w_{ij} v^0_i + b_j + \sum_{m_A < i \le m_A + m_B} w_{ij} v^0_i) - R$. Specifically, $h^0_1$ is $R$ and $h^0_2 = \mathrm{sigmoid}(\sum_{i \le m_A} w_{ij} v^0_i + b_j + \sum_{m_A < i \le m_A + m_B} w_{ij} v^0_i) - R$, as shown in the Privacy-Preserving Distributed Algorithm for RBM. Then, using the same method, we can perform the rest of the privacy-preserving Gibbs sampling process.
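Step 1.
Party A first generates a random number $R$ and computes $y(x_1 + i) - R$ for each $i$, where $x_1$ is Party A's input and $i$ ranges over the possible inputs of Party B. We define $m_i = y(x_1 + i) - R$, where $m_i$ is the plaintext. Party A encrypts each $m_i$ using the ElGamal scheme and gets $E(m_i, r_i)$, where each $r_i$ is a new random number. Party A sends each $E(m_i, r_i)$ to Party B in increasing order of $i$.

Step 2.
Party B, whose input is $x_2$, picks $E(m_{x_2}, r_{x_2})$, rerandomizes it, and sends $E(m_{x_2}, r')$ back to Party A, where $r' = r_{x_2} + s$ and $s$ is known only to Party B.

Step 3.
Party A partially decrypts $E(m_{x_2}, r')$ and sends the partially decrypted message to Party B.

Step 4.
Party B finally decrypts the message (by doing partial decryption on the already partially decrypted message) to get $m_{x_2} = y(x_1 + x_2) - R$. Note that $R$ is known only to Party A and $m_{x_2}$ is known only to Party B. Furthermore, $m_{x_2} + R = y(x_1 + x_2) = y(x)$.
Algorithm 1: Securely computing the sigmoid function [12].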
Step 1.
Party A, which holds $M$, first generates a random number $R$ and computes $M \cdot i - R$ for each $i$, where $i$ ranges over the possible inputs of Party B. Then define $m_i = M \cdot i - R$, where $m_i$ is the plaintext. Then Party A encrypts each $m_i$ using the ElGamal scheme and gets $E(m_i, r_i)$, where each $r_i$ is a new random number. After that, Party A sends each $E(m_i, r_i)$ to Party B in increasing order of $i$.

Step 2.
Party B, which holds $N$, picks $E(m_N, r_N)$, rerandomizes it, and sends $E(m_N, r')$ back to Party A, where $r' = r_N + s$ and $s$ is known only to Party B.

Step 3.
Party A partially decrypts $E(m_N, r')$ and sends the partially decrypted message to Party B.

Step 4.
Party B finally decrypts the message (by doing partial decryption on the already partially decrypted message) to get $m_N = M \cdot N - R$. Note that $R$ is known only to Party A and $m_N$ is known only to Party B. Furthermore, $m_N + R = M \cdot N$.
Algorithm 2: Securely computing the product of two integers [12].
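Algorithms 1 and 2 share one pattern: Party A masks a table of function values with a random number, encrypts it, and Party B obliviously selects one entry, so that the two parties end up with additive shares of the result. The sketch below simulates both parties in a single process with a toy ElGamal over a small prime. It is insecure and purely illustrative: the modulus `P`, the base `G`, the `OFFSET` encoding of negative plaintexts, the fixed-point `SCALE`, and all function names are assumptions, and the key shares are generated inside one function only to keep the example short.

```python
import math
import random

# INSECURE toy parameters, chosen only so the example runs quickly.
P = 2**61 - 1      # a Mersenne prime used as the modulus
G = 3              # assumed base element
OFFSET = 1 << 40   # shifts possibly negative plaintexts into [1, P - 1]
SCALE = 100        # assumed fixed-point scale for the sigmoid


def inv(x):
    """Modular inverse via Fermat's little theorem (P is prime)."""
    return pow(x, P - 2, P)


def encrypt(h, m, rng):
    """ElGamal encryption of integer m under the joint public key h."""
    r = rng.randrange(2, P - 1)
    return pow(G, r, P), (m + OFFSET) * pow(h, r, P) % P


def rerandomize(h, ct, rng):
    """Refresh the randomness of a ciphertext without changing its plaintext."""
    s = rng.randrange(2, P - 1)
    return ct[0] * pow(G, s, P) % P, ct[1] * pow(h, s, P) % P


def partial_decrypt(key_share, ct):
    """Strip one party's key share from the ciphertext."""
    return ct[0], ct[1] * inv(pow(ct[0], key_share, P)) % P


def shared_lookup(f, domain, b_input, rng):
    """Return (share_a, share_b) with share_a + share_b == f(b_input)."""
    x_a, x_b = rng.randrange(2, P - 1), rng.randrange(2, P - 1)
    h = pow(G, x_a, P) * pow(G, x_b, P) % P      # joint public key
    R = rng.randrange(1, 1000)                   # Step 1: A masks and encrypts
    table = [encrypt(h, f(i) - R, rng) for i in domain]
    picked = rerandomize(h, table[domain.index(b_input)], rng)  # Step 2: B
    half_open = partial_decrypt(x_a, picked)     # Step 3: A
    _, encoded = partial_decrypt(x_b, half_open)  # Step 4: B
    return R, encoded - OFFSET


def fixed_sigmoid(x):
    """Sigmoid on fixed-point integers; input and output are scaled by SCALE."""
    return int(round(SCALE / (1.0 + math.exp(-x / SCALE))))


rng = random.Random(7)
domain = list(range(-50, 51))
x1, x2 = 30, -12
a1, b1 = shared_lookup(lambda i: fixed_sigmoid(x1 + i), domain, x2, rng)  # Algorithm 1
M, N = 17, 23
a2, b2 = shared_lookup(lambda i: M * i, domain, N, rng)                   # Algorithm 2
```

In both calls the first share is the random mask held by Party A and the second is the value Party B decrypts; neither share alone reveals the result. The cost is dominated by the table, which needs one encryption per element of `domain`.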
For the second part, updating the weights, we use Algorithm 2 [12] to securely compute the products $v^0 h^0$ and $v^1 h^1$ separately. Specifically, $h^0 = h^0_1 + h^0_2$, $v^1 = v^1_1 + v^1_2$, and $h^1 = h^1_1 + h^1_2$, where the number on the top indicates the Gibbs step and the number on the bottom indicates the party the data belongs to. So we can get $v^0 h^0 = v^0 h^0_1 + v^0 h^0_2$. Regardless of which party $v^0$ belongs to, we can get the same result. Furthermore, we have $v^1 h^1 = (v^1_1 + v^1_2)(h^1_1 + h^1_2) = v^1_1 h^1_1 + v^1_1 h^1_2 + v^1_2 h^1_1 + v^1_2 h^1_2$. Therefore, we use Algorithm 2 to securely compute these products. As one example, in $v^0_1 h^0_2$, $v^0_1$ belongs to Party A, which computes $v^0_1 \times i - R$ for all $i$, rounds the results to integers, encrypts them, and then sends the ciphertexts to Party B in increasing order of $i$. Then Party B picks the ciphertext indexed by its $h^0_2$ value, rerandomizes it, and sends it to Party A, who partially decrypts this message and sends it back to Party B, who decrypts it and gets the value of $v^0_1 h^0_2 - R$. Specifically, Party A's share is $R$ and Party B's share is $v^0_1 h^0_2 - R$, as shown in the Privacy-Preserving Distributed Algorithm for RBM (Algorithm 3). Then, using the same method, we can perform the rest of the privacy-preserving product process.
Lastly, if Party A owns $v^0$, it can compute the term $v^0 h^0_1$ locally. Party B then sends its share of the result to Party A, and Party A sums the two shares to get the final value of $v^0 h^0 - v^1 h^1$. Then Party A can perform gradient descent to update the weight. Using the same method, we can update the bias $a$ of the visible units and the bias $b$ of the hidden units.
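The way the weight-update statistic decomposes over additive shares can be checked with a small sketch. It uses plain integer arithmetic in place of the secure product protocol, so it illustrates only the algebra, not the privacy guarantee; the helper names and the sample share values are assumptions made for illustration.

```python
def outer(v, h):
    """Outer product: entry [i][j] is v[i] * h[j]."""
    return [[vi * hj for hj in h] for vi in v]


def mat_add(*mats):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def vec_add(x, y):
    """Element-wise sum of two vectors (recombines two additive shares)."""
    return [xi + yi for xi, yi in zip(x, y)]


def gradient_from_shares(v0, h0_shares, v1_shares, h1_shares):
    """Compute v0 h0 - v1 h1 using only products of individual shares."""
    # v0 h0 = v0 h0_1 + v0 h0_2
    positive = mat_add(*(outer(v0, h) for h in h0_shares))
    # v1 h1 = sum of the four cross products of the shares of v1 and h1
    negative = mat_add(*(outer(v, h) for v in v1_shares for h in h1_shares))
    return [[p - n for p, n in zip(p_row, n_row)]
            for p_row, n_row in zip(positive, negative)]


# Hypothetical fixed-point values, scaled to integers.
v0 = [1, 0]
h0_shares = ([5, -3], [68, 40])
v1_shares = ([9, 4], [51, 20])
h1_shares = ([-7, 2], [70, 30])
gradient = gradient_from_shares(v0, h0_shares, v1_shares, h1_shares)
```

In the protocol, each product that mixes shares held by different parties would be obtained with the secure product algorithm, and products of shares held by the same party can be computed locally.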
A privacy-preserving testing algorithm can be easily derived from the Gibbs sampling part of the privacy-preserving training algorithm.

Analysis of Algorithm Complexity and Accuracy Loss.
The running time of one training iteration consists of two parts: Gibbs sampling and updating the weights. First, we analyze the execution time of the Gibbs sampling process. According to [12], Algorithm 1 takes time $(2 \times \mathrm{Domain} + 1)E + 2D$, where Domain is the total number of possible values of $i$ in Algorithm 1 and $E$ and $D$ are the costs of encryption and decryption. Therefore, in the Gibbs sampling process, we assume there are samples,

Initialize all weights ($w$, $a$, $b$) to small random numbers and make them known to both parties.
Repeat
for all training samples $\{v^0_A, v^0_B\}$ do
// This part mainly uses (5). Samples are obtained after only one step of Gibbs sampling because one-step Gibbs has been shown to work surprisingly well [22].
Algorithm 3: Privacy-preserving distributed algorithm for RBM.
To preserve privacy, we introduced two approximations in our algorithm. First, we replaced the binary values by probabilities. Second, we mapped the real numbers to fixed-point representations to enable the cryptographic operations in Algorithms 1 and 2 [12]. This is necessary because intermediate results, such as the values of visible and hidden units, are represented as real numbers in normal RBM learning, whereas cryptographic operations work on discrete finite fields. We empirically evaluate the impact of these two sources of approximation on the accuracy loss of our RBM learning algorithm in Section 6. Below we give a brief theoretical analysis of the accuracy loss caused by the fixed-point representations. We assume that the error ratio bound caused by truncating a real number is $\epsilon$. In the Gibbs sampling process, Algorithm 1 is applied three times; therefore, the error ratio bound is $(1 + \epsilon)^3 - 1$. In the weight-update process, Algorithm 2 is applied once for each dataset, so the corresponding error ratio bound is $\epsilon$.
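To give a feeling for the size of the compounded bound, the expansion below evaluates it for an assumed per-step error ratio of $\epsilon = 0.01$; this value is chosen only for illustration and is not taken from the experiments.

```latex
(1 + \epsilon)^3 - 1 = 3\epsilon + 3\epsilon^2 + \epsilon^3,
\qquad
\epsilon = 0.01 \;\Longrightarrow\; (1.01)^3 - 1 = 0.03 + 0.0003 + 0.000001 = 0.030301 .
```

For small $\epsilon$ the bound is therefore close to $3\epsilon$, that is, the three applications of the secure sigmoid computation roughly triple the single-step error ratio.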

Analysis of Algorithm's Security.
In our distributed RBM training algorithm, except for the computations that a party can carry out by itself, all computations that have to be done jointly by the two parties protect their input data with semantically secure encryption. In addition, all intermediate results are protected using the secret sharing scheme. In the semihonest model, where both parties follow the algorithm without any deviation, our algorithm guarantees that the only additional knowledge a party gains from executing the algorithm is the final training result. Therefore, our algorithm protects both parties' privacy in this model.

Experiments
In this section, we explain the experimental process for measuring the accuracy loss of our modified algorithms. We compare the testing error rates with those of the non-privacy-preserving case. We distinguish the two types of approximation introduced by our algorithms, the use of probabilities in place of binary values and the conversion of real numbers to fixed-point numbers when applying cryptographic algorithms, and analyze how each affects the accuracy of the RBM.

Effects of Two Types of Approximation on Accuracy.
In this section, we evaluate the loss of accuracy of our modified training model. Our model contains two approximations. The first is that we use a probability instead of a binary value as the Gibbs sampling result. The second is that we truncate the probability to a finite number of digits so that we can shift the decimal point and then use the resulting number for encryption. We distinguish and evaluate the effects of these two approximation types without cryptographic operations (we call this the approximation test).
First, we measure the loss of accuracy caused by using probabilities instead of binary values on the MNIST dataset. We chose 5,000 samples as training data and 1,000 as testing data. We then set the number of hidden units to 100 and ran the experiments with a varying number of training epochs, evaluating the loss of accuracy for each. In Figure 3, we can see that the accuracy loss caused by this approximation is less than 1%. Since encryption and decryption do not influence the accuracy of our model, this is the actual accuracy loss of our privacy-preserving training method.
Second, we measure the accuracy loss caused by truncating the probability to a finite number of digits. Specifically, we truncate each number to two digits. We use the same parameters as in the first experiment. From the results we can see that the error rate is still close to that of the algorithm without approximation.

Conclusion and Future Work
In this paper, we have presented a privacy-preserving algorithm for RBM training. The algorithm guarantees privacy in a standard cryptographic model, the semihonest model. Although approximations are introduced in the algorithm, the experiments on real-world data show that the amount of accuracy loss is reasonable.
Using our techniques, it should not be difficult to develop privacy-preserving algorithms for RBM learning with three or more participants. In this paper, we have proposed only the RBM training method. Future research topics include applying it in a practical implementation and extending our work to the training of deep networks.