SecureBP from Homomorphic Encryption

We present a secure backpropagation neural network training model (SecureBP), which allows a neural network to be trained while retaining the confidentiality of the training data, based on a homomorphic encryption scheme. We make two contributions. The first is a method for finding a more accurate and numerically stable polynomial approximation of a function on a given interval. The second is a strategy for refreshing ciphertexts during training, which keeps the order of magnitude of the noise at $\tilde{O}(e^{33})$.


Introduction
Driven by massive amounts of data and by the high scalability, versatility, and efficiency of cloud computing, modern machine learning (ML) has been widely used in many fields, including health care, military, and finance [1][2][3]. These fields often involve a large amount of sensitive data, so protecting data privacy while using the data has become a very important problem. Various approaches can be used to protect data privacy; differential privacy (DP), secure multiparty computation (MPC), and homomorphic encryption (HE) are the most widely used.
DP allows one to control the amount of information leaked from an individual record in a dataset. By using DP, one can ensure privacy for any entity whose information is contained in the dataset and create models that do not leak information about the data they were trained on. Therefore, DP is mainly used in the training process. However, we are more concerned with how to use cryptographic methods to protect data privacy.
Most MPC methods establish a communication protocol among the parties involved such that if the parties follow the protocol, then they will end with the desired results while protecting the security and privacy of their respective assets [4][5][6][7]. However, due to the large scale of data used in machine learning, the communication cost of MPC is very high.
HE is another major method to protect data privacy; it allows us to perform certain arithmetic operations on encrypted data without decryption. Fully homomorphic encryption (FHE), which allows arbitrarily complex, efficiently computable evaluations over encrypted data without decrypting them, was originally introduced by Rivest et al. in 1978 [8]. It remained an open problem until Gentry presented the first plausible candidate FHE construction based on ideal lattices in 2009 [9]. Since then, a series of works [10][11][12][13][14][15] have been proposed to improve the security assumptions and efficiency of FHE, following Gentry's blueprint. Currently, some public libraries are available (Table 1), namely, HElib [16] and SEAL [17] based on the BGV scheme [18], FHEW [19] and TFHE [20,21] based on the GSW scheme [14], and HEAAN [22] based on the CKKS scheme [23]. The BGV-based schemes can handle many bits at the same time, so they can pack and batch many operations in a SIMD manner. However, the set of operations that are efficient with BGV depends on the selection of the parameter set. The GSW-based schemes can use Boolean circuits to deal with nonlinear operations quickly, but their arithmetic operations are relatively inefficient. The CKKS-based schemes can perform efficient approximate arithmetic operations on encrypted data by introducing a novel encoding technique and a fast rescale operation, but they cannot deal with nonpolynomial operations. CKKS is widely used in machine learning due to its high efficiency in arithmetic operations, which is why we chose the CKKS scheme for our SecureBP model.
Two important use cases for machine learning models are the predictions-as-a-service (PaaS) setting and the training-as-a-service (TaaS) setting. In the PaaS setting, a large organization (or the cloud) uses its proprietary data to train machine learning models. The organization then hopes to monetize the model by deploying services that allow users to upload their inputs and receive predictions for a fee. In the TaaS setting, the organization makes profits by deploying services that allow users to upload their encrypted inputs and receive the encrypted machine learning model. Moreover, in this setting, since the process of training an encrypted model is time- and resource-consuming, the techniques and proprietary tools for the training algorithm are often considered critical intellectual property by their owner, who is typically not willing to share them.
BP [24] is one of the most classical and widely used neural network models. It is more powerful than linear regression and logistic regression models. Moreover, the BP network already contains the basic building block of a deep neural network (DNN); in other words, the BP network is the cornerstone of DNNs. Therefore, when we study data privacy protection in machine learning, it is appropriate to take the BP network model as the starting point.

Our Contributions.
In this paper, we present a secure backpropagation neural network model (SecureBP) based on HE. In this model, in a setup phase, the data owner (user) encrypts his data and sends them to the cloud. In the computation phase, the cloud can train the model on the encrypted data without learning any information beyond the ciphertext of data. Technically, we have two main contributions: a more accurate polynomial approximation technique and a lightweight interactive scheme to refresh ciphertexts during training.
We focus on the TaaS setting in this paper, and we choose HE (i.e., the CKKS scheme) as the method to protect the user's data. For clarity, let us review the technical challenges of using HE for the BP network in the TaaS setting. Firstly, in the BP network, each node is activated before output by an activation function, which is usually a nonpolynomial function such as the sigmoid, hyperbolic tangent (tanh), or rectified linear unit (ReLU). However, most existing HE schemes only support polynomial arithmetic operations. The evaluation of the activation function is therefore an obstacle for the homomorphic implementation of the BP network, since it cannot be expressed as a polynomial. In addition, in order to ensure security, HE introduces some noise in encryption, and the noise increases as the homomorphic computation proceeds. When the noise reaches a certain threshold, decryption errors occur. In view of these technical difficulties, we make the following two contributions. The first contribution is that, by using Chebyshev polynomials (several studies have suggested this approach, but none have examined it in detail), we introduce a more accurate polynomial approximation $L_n(x)$ of the sigmoid function on a given interval. Compared with Taylor polynomials, our method yields derivatives that more closely resemble those of the sigmoid function (see Section 3.1).
The second contribution is a lightweight interaction protocol, which is a novel strategy to refresh ciphertexts during training. The straightforward way to deal with the growing noise is bootstrapping. However, bootstrapping comes with high computational overhead. To avoid the costly bootstrapping of HE, we use the lightweight interaction protocol during training. With this method, on the one hand, no technical information about the cloud's training model is revealed to the user. On the other hand, the noise of the weight ciphertexts grows only linearly once it reaches a certain value. With these basic ingredients in place, we construct our SecureBP network. To demonstrate the feasibility of SecureBP, we estimate its performance on three datasets, the Iris, Diabetes, and Sonar datasets (see Table 2), which are from the University of California at Irvine (UCI) dataset repository [25].

Related Work.
Before the current work, there has been some research on privacy-preserving machine learning algorithms [26][27][28][29]. These papers propose solutions based on MPC and HE techniques (see Table 3), but they appear to incur some problems.
Privacy-preserving machine learning via MPC provides a promising solution by allowing different parties to train various models on their joint data without revealing any information beyond the outcome. These solutions require interactivity between the party that holds the data and the party that performs the blind classification. Even though the practical performance of MPC-based solutions has been impressive compared to FHE-based solutions, they incur other issues such as network latency and high bandwidth usage. Because of these downsides, HE-based solutions seem more scalable for real-life applications.
Privacy-preserving machine learning based on HE is more challenging. As we mentioned before, standard activation functions are not polynomials; CryptoNets therefore uses a square function instead of a standard activation function. Homomorphic computation depends on the total number of levels required to implement the network and results in a relatively high computational overhead, which limits the practicality of CryptoNets in resource-limited settings where the data owners have severe computational constraints. Moreover, an inherent limitation of most existing HE constructions is that they only support arithmetic operations over modular spaces. Therefore, these approaches require parameters for real-number operations (i.e., with no modular reduction over the plaintext space) that are too large to be practically implemented.
1.3. Organization. Section 2 briefly introduces some notation and reviews the framework of BP. Section 3 describes our SecureBP model. In Section 4, we evaluate our model and discuss the estimation and implementation results.

Preliminaries
2.1. Notations. All logarithms are base 2 unless otherwise indicated. During homomorphic operations, we use ⊗ to denote the multiplication between ciphertexts; ⊕ denotes the addition between ciphertexts and ⊙ denotes the scalar multiplication between a constant and a ciphertext.
Next, we introduce the symbols used in the BP network:
$h_j$, the output value of the hidden layer.
$o_k$, the output value of the output layer.
$W^h_{j,i}$, the weight connecting the hidden-layer j-th node and the input-layer i-th node.
$W^O_{k,j}$, the weight connecting the output-layer k-th node and the hidden-layer j-th node.
$b^h_j$, the bias of the hidden-layer j-th node.
$b^O_k$, the bias of the output-layer k-th node.
$L$, the learning rate.

The Framework of BP.
In this subsection, we give a brief review of one version of the BP network. For ease of presentation, in this paper, we only consider a neural network of three layers (input layer, hidden layer, and output layer). It is straightforward to extend our work to networks with more layers. This configuration is shown in Figure 1.
In the BP algorithm, each iteration consists of one forward phase and one backward phase; the whole BP algorithm is described in Algorithm 1. The forward phase starts from the input layer and proceeds toward the output layer. During this phase, a weighted sum and an activation are computed for every node in each layer using the activation function, which is normally the sigmoid function (equation (1)). The backward phase starts from the output layer and descends toward the bottom layer (i.e., the input layer) of the network to compute gradients. Finally, we update the weights ($W^h_{j,i}$, $W^O_{k,j}$) and biases ($b^h_j$, $b^O_k$) using the computed gradients and the learning rate $L$.
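To make the two phases concrete, the following is a minimal plaintext sketch of one BP iteration for a three-layer network. It assumes a squared-error loss and sigmoid activations in both layers; the names `x`, `y`, `lr` (standing in for the learning rate $L$), `forward`, `bp_iteration`, and `make_network` are illustrative choices and are not taken from Algorithm 1.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W_h, b_h, W_o, b_o):
    """Forward phase: weighted sum plus bias, then sigmoid, layer by layer."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W_h, b_h)]
    o = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
         for row, b in zip(W_o, b_o)]
    return h, o

def bp_iteration(x, y, W_h, b_h, W_o, b_o, lr):
    """One forward and backward pass on a single sample; updates in place.

    Returns the squared-error loss measured before the update.
    """
    h, o = forward(x, W_h, b_h, W_o, b_o)
    # Output-layer error terms (assumption: squared-error loss).
    delta_o = [(yk - ok) * ok * (1.0 - ok) for yk, ok in zip(y, o)]
    # Hidden-layer error terms, propagated back through the old W_o.
    delta_h = [hj * (1.0 - hj) *
               sum(delta_o[k] * W_o[k][j] for k in range(len(W_o)))
               for j, hj in enumerate(h)]
    for k, dk in enumerate(delta_o):
        for j, hj in enumerate(h):
            W_o[k][j] += lr * dk * hj
        b_o[k] += lr * dk
    for j, dj in enumerate(delta_h):
        for i, xi in enumerate(x):
            W_h[j][i] += lr * dj * xi
        b_h[j] += lr * dj
    return sum((yk - ok) ** 2 for yk, ok in zip(y, o)) / 2.0

def make_network(n_in, n_hidden, n_out, seed=0):
    """Random small weights and zero biases (an arbitrary initialization)."""
    rng = random.Random(seed)
    W_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return W_h, [0.0] * n_hidden, W_o, [0.0] * n_out
```

Apart from the sigmoid, every operation in `bp_iteration` is an addition or a multiplication, which is what makes the algorithm a candidate for homomorphic evaluation.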

SecureBP Based on CKKS Scheme
In this section, we explain how to securely train the BP network model using the CKKS scheme.

A Decent Polynomial Approximation.
In the preceding update formulas, except for the activation function inside the neurons (i.e., the sigmoid function $\sigma(x) = 1/(1 + e^{-x})$), all other operations in the BP network are additions and multiplications, so they can be implemented over encrypted data. One limitation of existing HE schemes is that they only support polynomial arithmetic operations. The evaluation of the activation function is an obstacle for the implementation of the BP network since it cannot be expressed as a polynomial. Hence, in order to operate a complete BP neural network over encrypted data, we replace the sigmoid function with polynomial approximations that are compatible with practical HE schemes.
The Taylor polynomials $T_d(x) = \sum_{k=0}^{d} \frac{f^{(k)}(0)}{k!} x^k$ have been commonly used to approximate the sigmoid function [32][33][34]: $\sigma(x) \approx \frac{1}{2} + \frac{x}{4} - \frac{x^3}{48} + \frac{x^5}{480} - \cdots$. However, we observe that the size of the error grows rapidly as $|x|$ increases. Besides, in order to guarantee the accuracy of the BP network, we would have to use a higher-degree Taylor polynomial, but it requires too many homomorphic multiplications to be practically implemented. In summary, although Taylor expansions are convenient and easy to compute, their accuracy is not consistent across an interval because a Taylor expansion is a local approximation near a certain point. Therefore, we introduce another candidate with better approximation ability to replace the sigmoid function: the optimal uniform polynomial approximation of $\sigma(x)$. More precisely, we find a polynomial $L_n(x)$ that minimizes the maximum absolute error between $\sigma(x)$ and $L_n(x)$ within a given interval. The Chebyshev polynomials are used to construct the optimal uniform approximation polynomials $L_n(x)$.
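The growth of the Taylor error away from the expansion point can be checked numerically. The sketch below hard-codes the Taylor coefficients of the sigmoid at 0 up to degree 7 (1/2, 1/4, -1/48, 1/480, -17/80640); the helper name `taylor_sigmoid` and the sample points are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def taylor_sigmoid(x, degree=3):
    """Taylor polynomial of the sigmoid at 0, truncated at `degree` (at most 7)."""
    coeffs = {0: 0.5, 1: 1 / 4, 3: -1 / 48, 5: 1 / 480, 7: -17 / 80640}
    return sum(c * x ** k for k, c in coeffs.items() if k <= degree)

# The degree-3 error is tiny near 0 and grows quickly with |x|.
for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    err = abs(sigmoid(x) - taylor_sigmoid(x, 3))
    print(f"x = {x:4.1f}   |sigmoid(x) - T3(x)| = {err:.5f}")
```

Because the weighted sums fed into the activation are not guaranteed to stay near 0 during training, a bound that only holds locally is of limited use here.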
Recall that the Chebyshev polynomial of degree $n$ is defined as $C_n(x) = \cos(n \arccos x)$ for $x \in [-1, 1]$. From this definition, we obtain two important properties of Chebyshev polynomials. The first is the recurrence $C_{n+1}(x) = 2x C_n(x) - C_{n-1}(x)$, with $C_0(x) = 1$ and $C_1(x) = x$. The second is that the Chebyshev polynomial $C_n(x)$ has $n$ distinct zeros on the interval $[-1, 1]$, namely $x_k = \cos\frac{(2k-1)\pi}{2n}$, $k = 1, 2, \ldots, n$. We then obtain the following important theorem in polynomial approximation.
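Both properties are easy to verify numerically. The following sketch evaluates $C_n$ through the standard three-term recurrence $C_{k+1}(x) = 2x C_k(x) - C_{k-1}(x)$ and lists the zeros $\cos((2k-1)\pi/(2n))$; the function names `chebyshev` and `chebyshev_zeros` are hypothetical helpers introduced for illustration.

```python
import math

def chebyshev(n, x):
    """Evaluate C_n(x) with the recurrence C_{k+1} = 2x C_k - C_{k-1}."""
    prev, cur = 1.0, x            # C_0(x) = 1 and C_1(x) = x
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur

def chebyshev_zeros(n):
    """The n zeros x_k = cos((2k - 1) pi / (2n)), k = 1..n, of C_n on [-1, 1]."""
    return [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]

# Each listed zero should make C_4 vanish up to rounding error.
for zero in chebyshev_zeros(4):
    print(f"x = {zero:+.6f}   C_4(x) = {chebyshev(4, zero):+.1e}")
```

The zeros cluster toward the endpoints of the interval, which is what keeps the interpolation error from blowing up near the boundary.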

Theorem 1. Let $f(x)$ be an $(n+1)$-times continuously differentiable function on the interval $[-1, 1]$, and let $L_n(x)$ be the interpolation polynomial whose interpolation nodes $x_0, \cdots, x_n$ are the zeros of the Chebyshev polynomial $C_{n+1}(x)$. Then $L_n(x)$ is the optimal uniform polynomial approximation of $f(x)$ on the interval $[-1, 1]$, and $\max_{-1 \le x \le 1} |f(x) - L_n(x)| \le \frac{1}{2^n (n+1)!} \max_{-1 \le x \le 1} |f^{(n+1)}(x)|$.
Therefore, it can be seen from the theorem that, to find the optimal uniform approximation polynomial of $f(x)$ on the interval $[-1, 1]$, we only need to set the interpolation nodes of $L_n(x)$ to the zeros of the Chebyshev polynomial $C_{n+1}(x)$. For a function $f(x)$ on an interval $[a, b]$, we can take the transformation $x = \frac{b-a}{2} t + \frac{a+b}{2}$ with $t \in [-1, 1]$ and set $g(t) = f\left(\frac{b-a}{2} t + \frac{a+b}{2}\right)$. Then we can apply Theorem 1 to $g(t)$. We note that, compared with Taylor polynomials, this method of polynomial approximation yields derivatives that more closely resemble those of the sigmoid function, which might help produce a better model (see Figure 2).
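The construction can be sketched as follows: take the zeros of $C_{n+1}$, map them from $[-1, 1]$ to $[a, b]$ with $x = \frac{b-a}{2}t + \frac{a+b}{2}$, and interpolate the sigmoid at the mapped nodes. The interval $[-8, 8]$ and the Lagrange-form evaluation are assumptions made for illustration; the text does not state which interval or which evaluation form is used.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def chebyshev_nodes(n, a, b):
    """Zeros of C_{n+1} on [-1, 1], mapped to [a, b] by x = (b-a)/2 * t + (a+b)/2."""
    ts = [math.cos((2 * k - 1) * math.pi / (2 * (n + 1))) for k in range(1, n + 2)]
    return [(b - a) / 2.0 * t + (a + b) / 2.0 for t in ts]

def interpolant(f, n, a, b):
    """Return the degree-n interpolation polynomial L_n as a callable (Lagrange form)."""
    xs = chebyshev_nodes(n, a, b)
    ys = [f(x) for x in xs]

    def L(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    return L

def max_error(f, g, a, b, samples=2001):
    """Largest |f - g| on an evenly spaced grid over [a, b]."""
    grid = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - g(x)) for x in grid)

def taylor3(x):
    """Degree-3 Taylor polynomial of the sigmoid at 0, for comparison."""
    return 0.5 + x / 4.0 - x ** 3 / 48.0

A, B = -8.0, 8.0  # assumed interval, not taken from the paper
print("Taylor degree 3:", max_error(sigmoid, taylor3, A, B))
for degree in (3, 5, 7):
    L_n = interpolant(sigmoid, degree, A, B)
    print(f"Chebyshev interpolant degree {degree}:", max_error(sigmoid, L_n, A, B))
```

In an HE setting one would convert $L_n$ to explicit coefficients so that it can be evaluated with ciphertext additions and multiplications; the Lagrange form is used here only because it keeps the sketch short.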
To justify our claims, we compare the accuracy of the produced BP neural network model using different activation functions with the Iris dataset (see Table 4).

Our SecureBP Network Model.
In this section, we explain how to perform the lightweight interactive protocol to refresh ciphertexts during the training phase. To be precise, we explicitly describe a full pipeline of the evaluation of the SecureBP. We adopt the same assumptions as in the previous section so that the whole database can be encrypted in m ciphertexts.
First of all, in the setup phase, the user encrypts the dataset and sends it to the public cloud. The cloud randomly initializes the weights and biases (in the initialization phase, the weights and biases can be plaintexts). Next, we introduce the iterative computing phase carried out in the cloud. The goal of each iteration is to update the weights and biases. Note that the prefix ct. denotes a ciphertext. Each iteration consists of the following six steps:
Step 1. The cloud starts the iterative computation: it homomorphically evaluates the degree-3 approximation $L_3$ to obtain the hidden-layer output $\mathrm{ct}.h'_j$ and masks it as $\mathrm{ct}.h_j = \mathrm{ct}.h'_j \oplus \mathrm{ct}.r_j$. Here, $\mathrm{ct}.r_j$ (and likewise $\mathrm{ct}.r_k$ in (8)) is the encryption of a small random number, which has no effect on the correctness of the decryption.
Step 2. The cloud sends $\{\mathrm{ct}.h_j\}_{j \in [z]}$ to the user. After decrypting and re-encrypting them, the user sends the refreshed ciphertexts $\{\mathrm{ct}.h_j\}_{j \in [z]}$ to the cloud for further computation.
Step 3. The cloud computes the output-layer ciphertexts $\{\mathrm{ct}.o_k\}_{k \in [d]}$.
Step 4. The cloud sends $\{\mathrm{ct}.o_k\}_{k \in [d]}$ to the user. After decrypting and re-encrypting them, the user sends the refreshed ciphertexts $\{\mathrm{ct}.o_k\}_{k \in [d]}$ to the cloud for further computation.
Step 5. The cloud updates $\mathrm{ct}.W^O_{k,j}$ and $\mathrm{ct}.b^O_k$.
Step 6. The cloud updates $\mathrm{ct}.W^h_{j,i}$ and $\mathrm{ct}.b^h_j$.
In this iteration, we choose interaction between the cloud and the user in order to avoid high-cost bootstrapping. The cloud sends the outputs of the hidden layer and the output layer to the user. After the user refreshes these ciphertexts, they are sent back to the cloud to continue the subsequent homomorphic operations. Because the outputs of the hidden layer and output layer are two ciphertext vectors, with a total of $(z + d)$ ciphertexts, the communication cost between the cloud and the user is not high. Through the analysis of noise in the later section, we find that this interactive protocol makes the noise of the ciphertexts grow only linearly during the homomorphic operations once it reaches a certain value (i.e., $e^{33}$).
In this process, it should also be noted that what the cloud sends to the user is not the true outputs of the hidden layer and output layer (i.e., $\{\mathrm{ct}.h'_j\}_{j \in [z]}$ and $\{\mathrm{ct}.o'_k\}_{k \in [d]}$), but the disturbed $\{\mathrm{ct}.h_j\}_{j \in [z]}$ and $\{\mathrm{ct}.o_k\}_{k \in [d]}$. The idea is to prevent the user from learning how the cloud trains the neural network.
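The refresh round trip for a single hidden node can be illustrated with a toy model. The sketch below performs no encryption at all: `MockCiphertext` is a hypothetical stand-in that keeps the value in the clear and tracks only a level budget, so that the effect of masking and refreshing is visible. The level budget of 10, the rule that every multiplication (including multiplication by a constant) spends one level, the mask bound, and the cubic coefficients are all assumptions made for illustration, not the paper's parameters or a real CKKS API.

```python
import random

MAX_LEVEL = 10  # assumed level budget for a fresh ciphertext

def _spend(level):
    """Consume one level, failing when the budget is exhausted."""
    if level < 1:
        raise RuntimeError("no levels left: the ciphertext must be refreshed")
    return level - 1

class MockCiphertext:
    """Toy ciphertext: plaintext value plus remaining multiplicative levels."""

    def __init__(self, value, level=MAX_LEVEL):
        self.value = value
        self.level = level

    def add(self, other):        # plays the role of the ciphertext addition
        return MockCiphertext(self.value + other.value,
                              min(self.level, other.level))

    def mul(self, other):        # plays the role of the ciphertext multiplication
        return MockCiphertext(self.value * other.value,
                              _spend(min(self.level, other.level)))

    def scale(self, constant):   # plays the role of constant-times-ciphertext
        return MockCiphertext(self.value * constant, _spend(self.level))

def cloud_activation(ct_sum, coeffs, rng, mask_bound=1e-6):
    """Step 1 (cloud): evaluate a cubic c0 + c1*x + c3*x^3, then add a small mask."""
    c0, c1, c3 = coeffs
    square = ct_sum.mul(ct_sum)
    cube = square.mul(ct_sum)
    ct_true = cube.scale(c3).add(ct_sum.scale(c1)).add(MockCiphertext(c0))
    ct_mask = MockCiphertext(rng.uniform(-mask_bound, mask_bound))
    return ct_true.add(ct_mask)

def user_refresh(ct):
    """Step 2 (user): decrypt and re-encrypt, which restores the full level budget."""
    plaintext = ct.value
    return MockCiphertext(plaintext)

rng = random.Random(0)
coeffs = (0.5, 0.25, -1.0 / 48.0)   # placeholder cubic, not the paper's L_3
ct_h = cloud_activation(MockCiphertext(0.8), coeffs, rng)
print("levels left before refresh:", ct_h.level)
ct_h = user_refresh(ct_h)
print("levels left after refresh: ", ct_h.level)
```

The user only ever sees the masked value, and the cloud receives back a ciphertext with a full budget without running bootstrapping, at the cost of one round trip per layer.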

Estimation
In this section, we present the parameter settings for BP and the CKKS scheme and analyze the estimation and implementation results.

Parameters for the BP Algorithm.
In the BP model, the numbers of input nodes and output nodes are fixed, while the number of hidden nodes is not. In fact, the number of hidden nodes has an impact on the performance of the neural network; the following empirical formula can be used to determine it: $z = \sqrt{d + m} + a$, where $z$ is the number of hidden nodes, $d$ is the number of input nodes, $m$ is the number of output nodes, and $a$ is an adjustment constant between 0 and 10. The parameters are listed in Table 5, and we choose the Iris, Diabetes, and Sonar datasets, which are from the University of California at Irvine (UCI) dataset repository [25]. The conventional BP learning network has the same parameters as the SecureBP algorithm.
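As a small worked example, assume the empirical rule has the commonly used form $z = \sqrt{d + m} + a$. The helper below applies it and rounds to the nearest integer; the rounding, the function name `hidden_nodes`, and the choice $a = 2$ are assumptions for illustration, and the values actually used are those of Table 5.

```python
import math

def hidden_nodes(d, m, a):
    """Empirical hidden-layer size z = sqrt(d + m) + a, rounded to an integer.

    d: number of input nodes, m: number of output nodes,
    a: adjustment constant, expected to lie between 0 and 10.
    """
    if not 0 <= a <= 10:
        raise ValueError("adjustment constant a should lie between 0 and 10")
    return round(math.sqrt(d + m) + a)

# Iris has 4 features and 3 classes; one output node per class is assumed here.
print(hidden_nodes(d=4, m=3, a=2))
```

Since each extra hidden node adds homomorphic multiplications to every iteration, a small value of $a$ keeps the encrypted training cost down.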

Parameters for the CKKS Scheme.
In the CKKS scheme, the coefficients of error polynomials are sampled from the discrete Gaussian distribution with standard deviation $\sigma = 3.2$, and a secret key is chosen randomly from the set of signed binary polynomials with Hamming weight $h = 64$. We used the estimator of Albrecht et al. [35] to guarantee that the proposed parameter sets achieve at least an 80-bit security level against the known attacks on the LWE problem.
We analyze the growth of noises in some ciphertexts, and Table 6 provides theoretical upper bounds on the noise growth during homomorphic operations. Note that e denotes the noise of a fresh ciphertext.
As can be seen from Table 6, the maximum noise growth during homomorphic operations is $O(e^{33})$ [36]; we choose the parameters as follows: $L = 10$, $N = 2^{15}$, $\log q = 55$, and $[\log Q_L] = 611$.

Estimation and Implementation Results.
By carefully analyzing our SecureBP protocol, we count the number of homomorphic operations required for each step of an iteration (as shown in Table 7). From Table 7, we can see that the computation time required for an iteration depends only on the number of nodes in each layer. Combined with the time required for each homomorphic operation in [36], we give the estimated time (Table 8) of training the SecureBP network homomorphically on the Iris, Diabetes, and Sonar datasets, and Table 9 shows the accuracy comparison of encrypted and unencrypted BP networks in the case of 10 and 23 iterations, respectively.
[Figure: Polynomial approximations for sigmoid function]

Efficiency and Accuracy Discussion.
There are still some limitations in applying our evaluation model to an arbitrary dataset. On the one hand, HE is a promising solution to the privacy issue, but its efficiency in real applications remains an open question. In other words, one constraint of our approach is that the efficiency of the SecureBP network is limited by the efficiency of homomorphic operations. On the other hand, we find that the accuracy of the network model is positively correlated with the degree of the approximation polynomial. However, a higher-degree polynomial means more homomorphic operations and therefore more time to train the network model. Therefore, we need a tradeoff between the training efficiency and the accuracy of the model.

Conclusion and Future Work
In this paper, we present a SecureBP network model for homomorphic training. We introduce two methods, a more accurate polynomial approximation and a lightweight interactive protocol, to solve the difficulties encountered when the CKKS scheme is used to protect the BP network, and our method performs well in experiments on different datasets. For future work, we plan to explore how to train deep neural networks and convolutional neural networks effectively on encrypted data in a training-as-a-service setting.

Data Availability
All types of data used to support the findings of this study have been deposited in the University of California at Irvine (UCI) Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets.html).

Conflicts of Interest
The authors declare that they have no conflicts of interest.