We present SecureBP, a secure backpropagation neural network training model based on homomorphic encryption, which allows a neural network to be trained while retaining the confidentiality of the training data. We make two contributions. The first is a method for finding a more accurate and numerically stable polynomial approximation of functions on a given interval. The second is a strategy for refreshing ciphertexts during training, which keeps the order of magnitude of the noise at a manageable level.
Driven by massive amounts of data and by the scalability, versatility, and efficiency of cloud computing, modern machine learning (ML) has been widely used in many fields, including health care, the military, and finance [
DP allows one to control the amount of information leaked from an individual record in a dataset. By using DP, one can ensure privacy for any entity whose information is contained in the dataset and create models that do not leak information about the data they were trained on. Therefore, DP is mainly used in the training process. However, we are more concerned with how to use cryptographic methods to protect data privacy.
Most MPC methods establish a communication protocol among the parties involved such that if the parties follow the protocol, then they will end with the desired results while protecting the security and privacy of their respective assets [
HE is another major method for protecting data privacy; it allows us to perform certain arithmetic operations on encrypted data without decryption. Fully homomorphic encryption (FHE), which allows arbitrarily complex but efficiently computable evaluations over encrypted data without decrypting them, traces back to the idea proposed by Rivest et al. in 1978 [
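To make the homomorphic principle concrete, the following is a toy additively homomorphic sketch in the spirit of Paillier, not the CKKS scheme used in this paper; the hard-coded small primes are purely illustrative and offer no security:

```python
import random
from math import gcd

# Toy Paillier-style keypair with tiny hard-coded primes
# (illustration only; offers no real security)
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
g = n + 1                                      # standard generator choice

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # needs Python 3.8+ for modular inverse

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

c1, c2 = encrypt(17), encrypt(25)
# Multiplying ciphertexts adds the underlying plaintexts
assert decrypt((c1 * c2) % n2) == 42
# Raising a ciphertext to a constant multiplies the plaintext by it
assert decrypt(pow(c1, 3, n2)) == 51
```

CKKS additionally supports approximate arithmetic over packed real vectors, but the basic idea is the same: operations on ciphertexts induce operations on the hidden plaintexts.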
Outstanding HE schemes.

Base scheme | Library | Plaintext space | Supported operations
---|---|---|---
BGV | HElib, SEAL | Packed finite-field elements | Addition, multiplication
GSW | FHEW, TFHE | Binary strings | Look-up table evaluation
CKKS | HEAAN | Packed real/complex numbers | Fixed-point arithmetic
Two important use cases for machine learning models are the predictions-as-a-service (PaaS) setting and the training-as-a-service (TaaS) setting. In the PaaS setting, a large organization (or the cloud) uses its proprietary data to train machine learning models and then monetizes them by deploying services that allow users to upload their inputs and receive predictions for a price. In the TaaS setting, the organization makes a profit by deploying services that allow users to upload their encrypted inputs and receive an encrypted machine learning model. Moreover, in this setting, since training an encrypted model is time- and resource-consuming, the techniques and proprietary tools for the training algorithm are often considered critical intellectual property by their owner, who is typically not willing to share them.
BP [
In this paper, we present a secure backpropagation neural network model (SecureBP) based on HE. In this model, in a setup phase, the data owner (user) encrypts his data and sends them to the cloud. In the computation phase, the cloud can train the model on the encrypted data without learning any information beyond the ciphertext of data. Technically, we have two main contributions: a more accurate polynomial approximation technique and a lightweight interactive scheme to refresh ciphertexts during training.
We focus on the TaaS setting in this paper, and we choose HE (specifically, the CKKS scheme) as the method to protect the user's data. For clarity, let us review the technical challenges of using HE for the BP network in the TaaS setting. First, in the BP network, each node is activated before output by an activation function, which is usually a nonpolynomial function such as the sigmoid, the hyperbolic tangent (tanh), or the rectified linear unit (ReLU). However, most existing HE schemes only support polynomial arithmetic operations. The evaluation of the activation function is therefore an obstacle for a homomorphic implementation of the BP network, since it cannot be expressed as a polynomial. In addition, to ensure security, HE introduces some noise during encryption, and the noise grows as the homomorphic computation proceeds. When the noise exceeds a certain threshold, decryption errors occur. In view of these technical difficulties, we make the following two contributions.
The first contribution is a more accurate polynomial approximation of the activation function based on Chebyshev polynomials (in fact, several studies have suggested this approach, but none have examined it in detail).
The second contribution is a lightweight interaction protocol, a novel strategy for refreshing ciphertexts during training. The trivial way to deal with the growing noise is bootstrapping, but bootstrapping comes with high computational overhead. To avoid costly bootstrapping, we introduce a lightweight interaction protocol into training. With this method, on the one hand, no technical information about the cloud's training model is revealed to the user; on the other hand, the noise of the weight ciphertexts grows only linearly once it reaches a certain magnitude.
Now that the basic ingredients are in place, we construct our SecureBP network. To demonstrate its feasibility, we evaluate its performance on three datasets: the Iris, Diabetes, and Sonar datasets (see Table
Performance of SecureBP in time and accuracy.

Dataset | Accuracy (%) | Time (ms)
---|---|---
Iris | 79.6 | 7632.7
Diabetes | 65.1 | 9962.1
Sonar | 82.23 | 2.0993 × 10^8
Before the current work, there has been some research on privacy-preserving machine learning algorithms [
Research works in secure machine learning.

Setting | Prior work | Problem | Activation | Technique
---|---|---|---|---
PaaS | DeepSecure [ | DNN | ReLU, sigmoid | MPC
PaaS | Gazelle [ | CNN | ReLU | MPC, HE
PaaS | CryptoNets [ | CNN | Square function | SHE
PaaS | FHE-DiNN [ | DiNN | Sign | FHE
PaaS | Chameleon [ | DNN | ReLU, sigmoid | MPC, HE
TaaS | SecureML [ | LR, NN | Sigmoid, Softmax | MPC
TaaS | [ | LR | Least squares approximation | HE
TaaS | Ours | BP | Chebyshev approximation of sigmoid | HE
Privacy-preserving machine learning via MPC provides a promising solution by allowing different parties to train various models on their joint data without revealing any information beyond the outcome. It requires interactivity between the party that holds the data and the party that performs the blind classification. Even though the practical performance of MPC-based solutions has been impressive compared with FHE-based solutions, they incur other issues such as network latency and high bandwidth usage. Because of these downsides, HE-based solutions seem more scalable for real-life applications.
Privacy-preserving machine learning based on HE is more challenging. As mentioned before, the standard activation function is a challenge when applying HE to machine learning algorithms. Faced with this challenge, Ran Gilad-Bachrach et al. [
Section
All logarithms are base 2 unless otherwise indicated. During homomorphic operations, we use
Next, we introduce some notation used in the BP network:
In this subsection, we give a brief review of one version of the BP network. For ease of presentation, we only consider a neural network with three layers (an input layer, a hidden layer, and an output layer); it is straightforward to extend our work to multilayer networks. This configuration can be seen in Figure
A conventional BP network with three layers.
In the BP algorithm, there is one forward phase and one backward phase during each iteration. Then, the whole BP algorithm can be described in Algorithm
(1) Set the number of iterations, the weight matrix W, and the bias vector b to small random initial values.
(2) Repeat:
(3) Forward phase: beginning with the input nodes, compute the weighted sums and the activation function for all nodes.
(4) Backward phase: compute the gradients for all nodes, starting from the output nodes.
(5) Adjust the weights and biases.
(6) Until the number of iterations reaches a preset value.
The forward phase starts from the input layer and approaches the output layer. During this phase, weighted sums and activations are computed for every node in each layer using the activation function, which is normally the sigmoid function. That is,
The backward phase starts from the output layer and descends toward the bottom layer (i.e., the input layer) of the network to compute gradients. Finally, we need to update the weights (
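For reference, one plaintext (unencrypted) training iteration of this three-layer network can be sketched in a few lines of numpy. The network sizes, learning rate, and sample below are illustrative and loosely follow the Iris setting used later in the paper; this is the plain algorithm, not our encrypted protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy three-layer network: 4 inputs, 4 hidden nodes, 3 outputs (Iris-like sizes)
n_in, n_hid, n_out = 4, 4, 3
W1 = rng.uniform(-0.5, 0.5, (n_hid, n_in))
b1 = np.zeros(n_hid)
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hid))
b2 = np.zeros(n_out)
lr = 0.6  # illustrative learning rate

def train_step(x, y):
    """One forward/backward iteration of plain (unencrypted) BP."""
    global W1, b1, W2, b2
    # Forward phase: weighted sums, then sigmoid activation, layer by layer
    h = sigmoid(W1 @ x + b1)
    o = sigmoid(W2 @ h + b2)
    # Backward phase: error terms from the output layer down
    delta_o = (o - y) * o * (1.0 - o)
    delta_h = (W2.T @ delta_o) * h * (1.0 - h)
    # Adjust weights and biases
    W2 -= lr * np.outer(delta_o, h)
    b2 -= lr * delta_o
    W1 -= lr * np.outer(delta_h, x)
    b1 -= lr * delta_h
    return 0.5 * np.sum((o - y) ** 2)

x = np.array([5.1, 3.5, 1.4, 0.2]) / 8.0  # one scaled Iris-like sample
y = np.array([1.0, 0.0, 0.0])             # one-hot target
errors = [train_step(x, y) for _ in range(200)]
assert errors[-1] < errors[0]  # the squared error shrinks over iterations
```

Every operation here is an addition or multiplication except the sigmoid, which is exactly the part that must be replaced by a polynomial before the network can run under HE.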
In this section, we explain how to securely train the BP network model using the CKKS scheme.
In the preceding update formula, except for activation function inside neurons (i.e., sigmoid function
Actually, the Taylor polynomials
However, we observe that the size of error grows rapidly as
The Chebyshev polynomials are used to construct the optimal uniform approximation polynomials
From the abovementioned definition, we can derive two important properties of Chebyshev polynomials. The first property is the three-term recurrence T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x), with T_0(x) = 1 and T_1(x) = x.
Let
Therefore, it can be seen from the abovementioned theorem that to find the optimal uniform approximation polynomial of
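As a quick plaintext sanity check, one can compare a degree-7 Chebyshev interpolant of the sigmoid on [-8, 8] against its degree-7 Taylor expansion at 0. The interval and degree here are illustrative; `Chebyshev.interpolate` yields a near-minimax approximation, close to but not exactly the optimal uniform one:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Near-minimax degree-7 approximation: interpolate sigmoid at the
# Chebyshev points of the interval [-8, 8]
g = Chebyshev.interpolate(sigmoid, 7, domain=[-8, 8])

def taylor7(x):
    # Degree-7 Taylor expansion of sigmoid around 0:
    # 1/2 + x/4 - x^3/48 + x^5/480 - 17x^7/80640
    return 0.5 + x / 4 - x**3 / 48 + x**5 / 480 - 17 * x**7 / 80640

xs = np.linspace(-8.0, 8.0, 2001)
cheb_err = float(np.max(np.abs(g(xs) - sigmoid(xs))))
tayl_err = float(np.max(np.abs(taylor7(xs) - sigmoid(xs))))

assert cheb_err < 0.1    # uniformly small over the whole interval
assert tayl_err > 100.0  # Taylor error explodes near the endpoints
```

The Taylor polynomial is accurate only near 0 and diverges badly toward the interval endpoints, whereas the Chebyshev-based polynomial keeps the error uniformly small, which is exactly what a homomorphic evaluation over the full input range requires.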
To justify our claims, we compare the accuracy of the produced BP neural network model using different activation functions with the Iris dataset (see Table
Polynomial approximation for the sigmoid function
Accuracy of different activation functions on the Iris dataset.
Iteration | 50 (%) | 100 (%) | 150 (%) | 180 (%) | 200 (%) | 220 (%) | 240 (%) | 260 (%) | 280 (%) | 300 (%)
---|---|---|---|---|---|---|---|---|---|---
 | 33.33 | 33.33 | 40.67 | 74.67 | 83.33 | 84.00 | 84.00 | 84.67 | 82.67 | 85.33
 | 33.33 | 33.33 | 40.67 | 51.33 | 81.33 | 84.00 | 84.00 | 84.00 | 82.67 | 82.67
 | 33.33 | 33.33 | 45.33 | 54.67 | 58.00 | 88.00 | 90.00 | 90.00 | 91.30 | 91.30
In this section, we explain how to perform the lightweight interactive protocol to refresh ciphertexts during the training phase. To be precise, we explicitly describe the full pipeline of the SecureBP evaluation. We adopt the same assumptions as in the previous section, so that the whole database can be encrypted in
First of all, in the setup phase, the user encrypts the dataset and sends it to the public cloud. The cloud randomly initializes the weights and biases (in the initialization phase, the weights and biases can be plaintexts). Next, we introduce the iterative computing phase carried out in the cloud. The goal of each iteration is to update the weights and biases. Note that
(1) Cloud starts the iterative computation (here,
(2) Cloud sends
(3) Cloud computes
(4) Cloud sends
(5) Cloud updates
(6) Cloud updates
In the abovementioned iteration, we use interaction between the cloud and the user to avoid high-cost bootstrapping. The cloud sends the outputs of the hidden layer and the output layer to the user; after the user refreshes these ciphertexts, they are sent back to the cloud to continue the subsequent homomorphic operations. Because the outputs of the hidden layer and the output layer are two ciphertext vectors, with a total of
In this process, it should also be noted that what the cloud sends to the user is not the true outputs of the hidden layer and output layer (i.e.,
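The masking idea behind this step can be illustrated with a toy simulation in which a "ciphertext" is just a value plus an abstract noise counter. The `Ctxt` class and its noise bookkeeping below are hypothetical stand-ins for a real CKKS ciphertext, not our actual implementation:

```python
import random

class Ctxt:
    """Toy stand-in for a ciphertext: a value plus an abstract noise counter."""
    def __init__(self, value, noise=1.0):
        self.value, self.noise = value, noise

    def add_plain(self, p):
        # Plaintext addition: value shifts, noise grows slightly
        return Ctxt(self.value + p, self.noise + 1.0)

def user_refresh(ct):
    # The user decrypts and re-encrypts: the value survives,
    # but the noise is reset to a fresh level
    return Ctxt(ct.value, noise=1.0)

# Cloud holds an encrypted hidden-layer output whose noise has grown large
ct = Ctxt(0.73, noise=40.0)

# Cloud masks the true value with a random offset before sending it out,
# so the user learns nothing about the intermediate model state
mask = random.uniform(-100.0, 100.0)
masked = ct.add_plain(mask)

fresh = user_refresh(masked)       # user only ever sees value + mask
unmasked = fresh.add_plain(-mask)  # cloud removes the mask afterwards

assert abs(unmasked.value - 0.73) < 1e-9
assert unmasked.noise < ct.noise   # noise reset, minus a small overhead
```

The point of the simulation is the ordering: masking happens before the ciphertext leaves the cloud, and unmasking happens after it returns, so the user's decryption reveals only a uniformly shifted value.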
In this section, we show the parameters setting for BP and the CKKS scheme and analyze the estimation and implementation results.
In the BP model, the numbers of input and output nodes are determined by the data, while the number of hidden nodes is not. In fact, the number of hidden nodes affects the performance of the neural network; an empirical formula can determine the number of hidden nodes as follows:
Weights are initialized as uniformly random values in the range
Datasets and parameters in BP.

Dataset | Number of samples | Number of features | Number of hidden nodes | Learning rate
---|---|---|---|---
Iris | 150 | 4 | 4 | 0.6
Diabetes | 768 | 8 | 5 | 0.5
Sonar | 208 | 60 | 9 | 0.4
In the CKKS scheme, the coefficients of error polynomials are sampled from the discrete Gaussian distribution of standard deviation
We analyze the growth of noises in some ciphertexts, and Table
Noise growth during the SecureBP process.
The ciphertext | The 1st iteration | The |
---|---|---|
As can be seen from Table
By carefully analyzing our SecureBP protocol, we calculate the number of homomorphic operations required for each step in the course of an iteration (as shown in Table
Number of operations in SecureBP.
Step | Enc | Dec | Mult | CMult | Add |
---|---|---|---|---|---|
1 | 0 | 0 | 0 | ||
2 | 0 | 0 | 0 | ||
3 | 0 | 0 | 0 | ||
4 | 0 | 0 | 0 | ||
5 | 0 | 0 | 3 | 4 | 5 |
6 | 0 | 0 | 3 | ||
Total |
From Table
Homomorphic training of SecureBP.

Dataset | Enc (ms) | Dec (ms) | Mult (ms) | CMult (ms) | Add (ms) | Total (ms)
---|---|---|---|---|---|---
Iris | 522 | 39 | 6068 | 836 | 167.7 | 7632.7
Diabetes | 812 | 46.8 | 7872 | 1012 | 219.3 | 9962.1
Sonar | 4060 | 78 | 20992 | 1716 | 580.5 | 2.0993 × 10^8
Comparison of encrypted/unencrypted BP algorithm.

Dataset | Number of iterations | Error rate of SecureBP (%) | Error rate of conventional BP (%)
---|---|---|---
Iris | 10 | 66.67 | 66.67
Iris | 23 | 20.40 | 16.67
Diabetes | 10 | 36.97 | 34.71
Diabetes | 23 | 34.90 | 33.89
Sonar | 10 | 21.45 | 18.26
Sonar | 23 | 17.77 | 17.21
There are still some limitations in applying our evaluation model to an arbitrary dataset. On the one hand, HE is a promising solution to the privacy issue, but its efficiency in real applications remains an open question; in other words, one constraint of our approach is that the efficiency of the SecureBP network is limited by the efficiency of homomorphic operations. On the other hand, we find that the accuracy of the network model is positively correlated with the degree of the approximating polynomial. However, a higher-degree polynomial means more homomorphic operations and thus more time to train the network model. Therefore, we need to trade off training efficiency against model accuracy.
In this paper, we present the SecureBP network model for homomorphic training. We introduce two methods, a more accurate polynomial approximation and a lightweight interactive protocol, to overcome the difficulties encountered when the CKKS scheme is used to protect the BP network, and our method performs well experimentally on different datasets. For future work, we plan to explore how to effectively train deep neural networks and convolutional neural networks on encrypted data in a training-as-a-service setting.
All types of data used to support the findings of this study have been deposited in the University of California at Irvine (UCI) Machine Learning Repository (
The authors declare that they have no conflicts of interest.
This research was supported by the National Natural Science Foundation of China under Grant no. 61672030.