Privacy Preserving RBF Kernel Support Vector Machine

Data sharing is challenging but important for healthcare research. Methods for privacy-preserving data dissemination based on the rigorous differential privacy standard have been developed, but they do not consider the characteristics of biomedical data or make full use of the available information. This often results in too much noise in the final outputs. We hypothesized that this situation can be alleviated by leveraging a small portion of open-consented data to improve utility without sacrificing privacy. We developed a hybrid differentially private support vector machine (SVM) model that uses public and private data together. Our model leverages the RBF kernel and can handle nonlinearly separable cases. Experiments showed that this approach outperforms two baselines: (1) SVMs that use only public data and (2) differentially private SVMs built from private data alone. Our method achieved performance very close to that of nonprivate SVMs trained on the private data.


Introduction
Data sharing is important for accelerating scientific discoveries, especially when there are not enough local samples to test a hypothesis [1,2]. However, medical data are sensitive because they inherently contain personal information and can reveal much about ethnicity, disease risk [3], and even family surnames [4]. To promote data sharing, it is important to develop privacy-preserving algorithms that respect data confidentiality and preserve data utility [5], especially when one wants to leverage cloud computing [6].
Privacy-preserving data analysis and publishing [7,8] have received considerable attention in recent years as a promising approach to sharing information while preserving data privacy. Differential privacy [9][10][11] has recently emerged as one of the strongest privacy guarantees for statistical data release [12][13][14][15][16][17]. A statistical aggregation or computation is differentially private (DP) if its outcome is formally indistinguishable when run with and without any particular record in the dataset. The level of indistinguishability is quantified by a privacy parameter $\epsilon$. A common mechanism for achieving differential privacy is the Laplace mechanism [18], which injects calibrated noise into a statistical measure; the noise magnitude is determined by the privacy parameter and by the sensitivity of the statistical measure to the inclusion or exclusion of a record in the dataset. A lower privacy parameter requires larger noise and provides a higher level of privacy.
General-purpose algorithms for privacy protection (e.g., [19,20]) often introduce too much perturbation error, which renders the resulting information useless for healthcare research. Our contribution is to leverage a small portion of open-consented data, through a hybrid framework, to make maximal use of the information that resides in the private data. Figure 1 shows an example of such an environment. We recently published a differentially private distributed logistic regression model using public and private biomedical datasets [21], which demonstrated advantages over purely private or purely public models. However, logistic regression is a generalized linear model with limited flexibility in classifying complex patterns. In this paper, we extend our previous effort to the more powerful RBF kernel-based support vector machine.
The remainder of the paper is organized as follows. Section 2 reviews background knowledge of differential privacy, SVM, and the RBF kernel. Section 3 describes the framework and details of our hybrid SVM mechanism. Section 4 presents an extensive set of experimental evaluations. Finally, Section 5 concludes the paper and discusses limitations and directions for future work.
Figure 1: Biomedical data-sharing system. A small amount of public data and a large amount of private data are available from different data providers. A privacy-preserving support vector machine can leverage both public and private data to maximize classification accuracy under differential privacy. Users can then classify their test data with the released privacy-preserving classifier.

Related Work
Rubinstein et al. [22] proposed a private kernel SVM algorithm (PrivateSVM) that works only for translation-invariant kernels $k(\Delta)$. The method approximates the original infinite-dimensional feature space $\Omega$ of $k(\Delta)$ with a finite-dimensional feature space $\hat{\Omega}$ using the Fourier transform $p(\omega)$ of $k(\Delta)$, and then adds noise to the weight parameters of the primal form in the new space $\hat{\Omega}$. One weakness is that the parameters used to construct $\hat{\Omega}$ are randomly drawn from $p(\omega)$, which degrades how well $\hat{\Omega}$ approximates $\Omega$. Another problem is that the utility bounds use the same regularization parameter value to compare the private and nonprivate classifiers, ignoring the change in the regularization parameter incurred by privacy constraints. Chaudhuri et al. [23] investigated a general mechanism, DPERM, that produces private approximations of classifiers trained by regularized empirical risk minimization (ERM) with low perturbation error. Like PrivateSVM, DPERM requires the underlying kernel to be translation invariant. In this paper, we compare our method with the PrivateSVM algorithm, since DPERM performs comparably to PrivateSVM.

Preliminary
Consider an original dataset $D = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^d, y_i \in \{+1, -1\}, 1 \le i \le n\}$ that contains a small portion of public data $D_{\text{public}}$ and a large portion of private data $D_{\text{private}}$. Our goal is to release a differentially private support vector machine using both public and private data. In this section, we first introduce the definition of differential privacy and then give a brief overview of SVM and the RBF kernel.
3.1. Differential Privacy. Differential privacy has emerged as one of the strongest privacy definitions for statistical data release. It guarantees that if an adversary knows complete information about all the tuples in $D$ except one, the output of a differentially private randomized algorithm does not give the adversary much additional information about the remaining tuple. We say that datasets $D$ and $D'$ differ in only one tuple if $D'$ can be obtained by adding or removing a single tuple from $D$. A formal definition of differential privacy is given as follows.
Definition 1 ($\epsilon$-differential privacy [18]). Let $\mathcal{A}$ be a randomized algorithm, let $D$ and $D'$ be any two datasets differing in only one tuple, and let $O$ be any set of possible outputs of $\mathcal{A}$. Algorithm $\mathcal{A}$ satisfies $\epsilon$-differential privacy if and only if the following holds for all such $D$, $D'$, and $O$:
$$\Pr[\mathcal{A}(D) \in O] \le e^{\epsilon} \cdot \Pr[\mathcal{A}(D') \in O]. \qquad (1)$$
Intuitively, differential privacy ensures that the released output distribution of $\mathcal{A}$ remains nearly the same whether or not an individual tuple is in the dataset.
Definition 2 (sensitivity [18]). Let $f$ denote a numeric function. The sensitivity of $f$ is defined as the maximal $L_1$-norm distance between the outputs of $f$ over two datasets $D$ and $D'$ that differ in only one tuple. Formally,
$$\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1. \qquad (2)$$
Given the sensitivity, the noise follows a zero-mean Laplace distribution with scale $\lambda = \Delta f / \epsilon$. To fulfill $\epsilon$-differential privacy for a numeric function $f$ over $D$, it is sufficient to publish $f(D) + X$, where $X$ is drawn from $\mathrm{Lap}(\Delta f / \epsilon)$.
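The Laplace mechanism of Definition 2 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in this work; the names `laplace_noise` and `laplace_mechanism` are hypothetical. It samples $\mathrm{Lap}(0, b)$ as the difference of two i.i.d. exponential draws with mean $b$ (a standard identity) and adds independent noise with scale $\Delta f/\epsilon$ to each coordinate of $f(D)$.

```python
import random
from typing import List, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Lap(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace-distributed with location 0 and scale `scale`.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    rate = 1.0 / scale  # random.expovariate expects a rate, i.e. 1 / mean
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(
    value: Sequence[float],
    sensitivity: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Release f(D) + X with X ~ Lap(sensitivity / epsilon) on every coordinate.

    `sensitivity` is the L1 sensitivity Delta f of the query f.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng if rng is not None else random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in value]


# Usage: a counting query has L1 sensitivity 1; a smaller epsilon means more noise.
noisy_count = laplace_mechanism([120.0], sensitivity=1.0, epsilon=0.5, rng=random.Random(7))
```

Halving $\epsilon$ doubles the noise scale, which is the privacy/utility trade-off described in the Introduction.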

Review of SVM and RBF Kernel
SVM is one of the most popular supervised binary classification methods; it takes a sample and a predetermined kernel function as input and outputs a predicted class label for the sample. Consider training data $D = \{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^d, y_i \in \{+1, -1\}, 1 \le i \le n\}$, where $\mathbf{x}_i$ denotes a training input point, $y_i$ its class label, $n$ the size of the training data, and $d$ the dimension of the input data. An SVM maximizes the geometric margin between the two classes and minimizes the error from misclassified data points. The primal form of a soft-margin SVM can be written as
$$\min_{\mathbf{w}} \; \frac{1}{2}\lVert \mathbf{w} \rVert_2^2 + \frac{C}{n}\sum_{i=1}^{n} \ell\big(y_i, f_{\mathbf{w}}(\mathbf{x}_i)\big), \qquad (3)$$
where $\mathbf{w}$ is the normal vector to the hyperplane separating the two classes, $\ell(y, \hat{y})$ is a loss function convex in $\hat{y}$, $C$ is a regularization parameter that trades off smoothness against errors (larger $C$ for fewer errors, smaller $C$ for increased smoothness), and $f_{\mathbf{w}}(\mathbf{x}_i) = \langle \phi(\mathbf{x}_i), \mathbf{w} \rangle$, where $\phi: \mathbb{R}^d \to \mathbb{R}^F$ maps training points from their input space to an $F$-dimensional feature space ($F$ may be infinite). Mapping the training data into a high-dimensional feature space allows nonlinearly separable data to be classified. When $F$ is large or infinite, inner products in the feature space can be computed efficiently through a kernel function $k(\mathbf{x}, \mathbf{y}) = \langle \phi(\mathbf{x}), \phi(\mathbf{y}) \rangle$ without forming $\phi$ explicitly. For example, $k(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top \mathbf{y}$ is the linear kernel of a linear SVM, and $k(\mathbf{x}, \mathbf{y}) = \exp(-\lVert \mathbf{x} - \mathbf{y} \rVert_2^2 / (2\sigma^2))$, with kernel width $\sigma > 0$, is the RBF kernel, which is translation invariant.
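As a concrete illustration of the two kernels above, the following Python sketch (standard library only; the function names and the bandwidth argument `sigma` are illustrative choices) evaluates the linear and RBF kernels and checks that the RBF kernel depends only on the difference $\mathbf{x} - \mathbf{y}$, that is, that it is translation invariant.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """k(x, y) = x^T y."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||_2^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


x, y, shift = [1.0, 2.0], [0.0, 1.0], [5.0, -3.0]
# Shifting both points by the same vector leaves the RBF value unchanged,
# whereas the linear kernel would change.
shifted = rbf_kernel([a + s for a, s in zip(x, shift)], [b + s for b, s in zip(y, shift)])
print(rbf_kernel(x, y), shifted, linear_kernel(x, y))
```

Translation invariance is exactly the property that lets PrivateSVM, DPERM, and the hybrid method below replace the kernel with a finite Fourier-feature approximation.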
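In this paper, we use the RBF kernel function, although our method can be applied to any translation-invariant kernel SVM. With the hinge loss $\ell(y, f_{\mathbf{w}}(\mathbf{x})) = \max(0, 1 - y f_{\mathbf{w}}(\mathbf{x}))$, the dual form of the SVM can be written as
$$\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j k(\mathbf{x}_i, \mathbf{x}_j) \quad \text{s.t. } 0 \le \alpha_i \le \frac{C}{n}, \qquad (4)$$
where $\alpha_i \in \boldsymbol{\alpha}$ ($1 \le i \le n$) is a per-sample parameter and $w_j \in \mathbf{w}$ ($1 \le j \le d$) is a per-feature weight parameter. In the linear SVM, the weight vector $\mathbf{w}$ can be recovered from the sample weight vector $\boldsymbol{\alpha}$ via $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \mathbf{x}_i$.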
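To make the link between the primal form (3) and the dual form (4) concrete for the linear kernel, the sketch below evaluates the hinge loss, the primal objective, and the dual objective, and recovers $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ from a given $\boldsymbol{\alpha}$. It is illustrative only: the function names are hypothetical, there is no bias term (matching $f_{\mathbf{w}}(\mathbf{x}) = \langle \phi(\mathbf{x}), \mathbf{w} \rangle$), and the toy $\boldsymbol{\alpha}$ is chosen by hand rather than produced by a solver.

```python
from typing import List, Sequence

Vector = Sequence[float]


def hinge_loss(y: float, score: float) -> float:
    """l(y, f_w(x)) = max(0, 1 - y * f_w(x))."""
    return max(0.0, 1.0 - y * score)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def primal_objective(w: Vector, X: Sequence[Vector], Y: Vector, C: float) -> float:
    """(1/2)||w||^2 + (C/n) * sum_i hinge(y_i, <x_i, w>) for a linear SVM."""
    n = len(X)
    loss = sum(hinge_loss(yi, dot(xi, w)) for xi, yi in zip(X, Y))
    return 0.5 * dot(w, w) + (C / n) * loss


def weights_from_dual(alpha: Vector, X: Sequence[Vector], Y: Vector) -> List[float]:
    """w = sum_i alpha_i * y_i * x_i (valid for the linear kernel)."""
    w = [0.0] * len(X[0])
    for ai, xi, yi in zip(alpha, X, Y):
        for j, xij in enumerate(xi):
            w[j] += ai * yi * xij
    return w


def dual_objective(alpha: Vector, X: Sequence[Vector], Y: Vector) -> float:
    """sum(alpha) - 0.5 * ||sum_i alpha_i y_i x_i||^2, the linear-kernel case of (4)."""
    w = weights_from_dual(alpha, X, Y)
    return sum(alpha) - 0.5 * dot(w, w)


X = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]]
Y = [1.0, 1.0, -1.0, -1.0]
alpha = [0.5, 0.0, 0.5, 0.0]  # hand-picked; each alpha_i lies in [0, C/n] = [0, 2.5]
w = weights_from_dual(alpha, X, Y)
primal = primal_objective(w, X, Y, C=10.0)
dual = dual_objective(alpha, X, Y)
```

Here both objectives equal 0.5, so the duality gap is zero and this $\boldsymbol{\alpha}$ is optimal for the toy problem; only the two points closest to the boundary carry nonzero weight.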

Privacy Preserving Hybrid SVM
In this section, we first give an overview of the framework and then present the technical details of our hybrid SVM method. We assume that all original data from the different datasets follow the same unknown joint multivariate distribution, so that all data tuples are samples from this distribution.
Figure 2 illustrates the general framework of hybrid SVM, and Algorithm 1 presents the hybrid SVM algorithm. First, we use the small amount of public data together with (5) and (6) to compute the parameter $\rho = (\rho_1, \ldots, \rho_{\hat{d}})^\top$, $\rho_j \in \mathbb{R}^d$, of the mapping function in the approximation of the RBF kernel. Second, with $\rho$, we transform the private data from the original sample space to a new $2\hat{d}$-dimensional feature space via the mapping function $\hat{\phi}(\mathbf{x})$ in (7). Third, we compute the parameter $\boldsymbol{\alpha}$ in the dual space from the transformed private data, and then $\mathbf{w}$ in the primal space via the linear relationship between $\boldsymbol{\alpha}$ and $\mathbf{w}$ in the linear SVM. Finally, we draw $\boldsymbol{\mu}$ from $\mathrm{Lap}(\lambda)^{2\hat{d}}$, where $\lambda = 2^{2.5} C \sqrt{\hat{d}} / (\epsilon n)$ and $n$ is the number of private training tuples, and return $\hat{\mathbf{w}} = \mathbf{w} + \boldsymbol{\mu}$ and $\rho$. Users can then transform their test data to the new $2\hat{d}$-dimensional feature space with $\rho$ and classify the transformed data with $\hat{\mathbf{w}}$. The computation of $\rho$ carries no privacy risk because it is derived directly from public data. More details about hybrid SVM are given in the following subsections.
Algorithm 1: Hybrid SVM algorithm. Input: public data $D_{\text{public}}$, private data $D_{\text{private}}$, the dimensionality $\hat{d}$ of $\rho$, a regularization parameter $C$, and privacy budget $\epsilon$. Output: differentially private SVM.
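Step 4 of Algorithm 1 reduces to a single formula. The short Python sketch below (function names are hypothetical) computes the Laplace scale $\lambda = 2^{2.5} C \sqrt{\hat{d}} / (\epsilon n)$ and perturbs a $2\hat{d}$-dimensional weight vector. It shows that the noise shrinks as the number of private tuples $n$ grows, and grows with $\sqrt{\hat{d}}$ and with smaller $\epsilon$. The values of $C$, $\hat{d}$, and $\epsilon$ in the usage lines are arbitrary assumptions; $n = 30{,}000$ mirrors the private-data size mentioned in the experiments.

```python
import math
import random
from typing import List, Sequence


def hybrid_svm_noise_scale(C: float, d_hat: int, epsilon: float, n: int) -> float:
    """lambda = 2^{2.5} * C * sqrt(d_hat) / (epsilon * n), the scale used in step 4."""
    if C <= 0 or d_hat <= 0 or epsilon <= 0 or n <= 0:
        raise ValueError("C, d_hat, epsilon, and n must be positive")
    return 2.0 ** 2.5 * C * math.sqrt(d_hat) / (epsilon * n)


def perturb_weights(w: Sequence[float], scale: float, rng: random.Random) -> List[float]:
    """Return w + mu with mu ~ Lap(scale)^{len(w)} (difference of two exponentials)."""
    rate = 1.0 / scale
    return [wj + rng.expovariate(rate) - rng.expovariate(rate) for wj in w]


d_hat = 16
lam = hybrid_svm_noise_scale(C=1.0, d_hat=d_hat, epsilon=1.0, n=30000)  # about 7.54e-4
w_hat = perturb_weights([0.0] * (2 * d_hat), lam, random.Random(0))
```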
Privacy Properties. We present the following theorem showing the privacy property of Algorithm 1.

Theorem 3. Algorithm 1 guarantees $\epsilon$-differential privacy.
Proof. Step 1 uses no private data and hence does not affect the privacy guarantee. By Corollary 15 in [22] and the fact that the hinge loss is convex and 1-Lipschitz in $\hat{y}$, the sensitivity of $\mathbf{w}$ over a pair of neighbouring datasets is $\Delta \mathbf{w} = 2^{2.5} C \sqrt{\hat{d}} / n$. The scale parameter in step 4 is then set to $\lambda = \Delta \mathbf{w} / \epsilon = 2^{2.5} C \sqrt{\hat{d}} / (\epsilon n)$, following the Laplace mechanism introduced in Section 3.1. Therefore, Algorithm 1 preserves $\epsilon$-differential privacy, which completes the proof.

The Computation of $\rho$. Rahimi and Recht [24] approximate a reproducing kernel Hilbert space (RKHS) $H$ induced by an infinite-dimensional feature mapping with a random RKHS $\hat{H}$ induced by a random finite-dimensional mapping. The random finite-dimensional RKHS $\hat{H}$ can be constructed by drawing i.i.d. vectors $\rho_1, \ldots, \rho_{\hat{d}}$ from the Fourier transform of a positive-definite translation-invariant kernel function $k(\mathbf{x}, \mathbf{y})$, such as the RBF kernel function. An approximation $z(\mathbf{x})^\top z(\mathbf{y})$ of $k(\mathbf{x}, \mathbf{y})$ is then obtained using the real-valued mapping function $z: \mathbb{R}^d \to \mathbb{R}^{\hat{d}}$ defined in (5), which maps the data from its original $d$-dimensional input space to the new $\hat{d}$-dimensional feature space. Their approach is based on the fact (Bochner's theorem) that a continuous positive-definite translation-invariant kernel is the Fourier transform of a nonnegative measure. The uniform convergence of $z(\mathbf{x})^\top z(\mathbf{y})$ to $k(\mathbf{x}, \mathbf{y})$ has also been proved in [24]. In our context, $k(\mathbf{x}, \mathbf{y})$ refers to the RBF kernel function.
In our problem setting, a small amount of public data can serve as the inputs $\mathbf{x}$ of $z(\mathbf{x})$, and only the vectors $\rho_1, \ldots, \rho_{\hat{d}}$ are needed to construct the random finite-dimensional RKHS $\hat{H}$; we therefore compute $\rho_1, \ldots, \rho_{\hat{d}}$ from the public data by solving the optimization problem (6). Since (6) is an unconstrained nonlinear optimization problem, we solve it using the L-BFGS (limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm. By using the public data to compute $\rho$, we obtain a more accurate approximation $z(\mathbf{x})^\top z(\mathbf{y})$ of the kernel function $k(\mathbf{x}, \mathbf{y})$ than by randomly sampling $\rho$ from the Fourier transform of $k(\mathbf{x}, \mathbf{y})$ as in [25]. To guarantee differential privacy, we need only consider the data-dependent weight parameter $\mathbf{w}$. Fortunately, we can employ the differentially private linear SVM approach in [25] to compute $\mathbf{w}$ after transforming all private data to a new $2\hat{d}$-dimensional feature space using the mapping $\hat{\phi}: \mathbb{R}^d \to \mathbb{R}^{2\hat{d}}$, defined with the vectors $\rho_1, \ldots, \rho_{\hat{d}}$ as follows:
$$\hat{\phi}(\mathbf{x}) = \hat{d}^{-1/2}\left[\cos(\rho_1^\top \mathbf{x}), \sin(\rho_1^\top \mathbf{x}), \ldots, \cos(\rho_{\hat{d}}^\top \mathbf{x}), \sin(\rho_{\hat{d}}^\top \mathbf{x})\right]^\top. \qquad (7)$$
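The mapping (7) and the resulting kernel approximation can be sketched as follows. For illustration only, this Python sketch draws $\rho_1, \ldots, \rho_{\hat{d}}$ at random from the Fourier transform of the RBF kernel, $\mathcal{N}(\mathbf{0}, \sigma^{-2}\mathbf{I})$, which is the random-sampling approach described above; it does not implement the public-data optimization (6) used by hybrid SVM. Function names and parameter values are hypothetical. Because $\hat{\phi}(\mathbf{x})^\top \hat{\phi}(\mathbf{y}) = \hat{d}^{-1}\sum_j \cos(\rho_j^\top(\mathbf{x} - \mathbf{y}))$, the inner product approaches $k(\mathbf{x}, \mathbf{y})$ as $\hat{d}$ grows.

```python
import math
import random
from typing import List, Sequence


def sample_rbf_frequencies(d: int, d_hat: int, sigma: float, rng: random.Random) -> List[List[float]]:
    """Draw rho_1..rho_{d_hat} i.i.d. from N(0, sigma^{-2} I), the RBF kernel's Fourier transform."""
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(d_hat)]


def feature_map(x: Sequence[float], rho: Sequence[Sequence[float]]) -> List[float]:
    """phi_hat(x) = d_hat^{-1/2} [cos(rho_1^T x), sin(rho_1^T x), ..., cos(rho_dhat^T x), sin(rho_dhat^T x)]."""
    scale = 1.0 / math.sqrt(len(rho))
    features: List[float] = []
    for rj in rho:
        proj = sum(a * b for a, b in zip(rj, x))
        features.extend((scale * math.cos(proj), scale * math.sin(proj)))
    return features


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Exact k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for comparison."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


rng = random.Random(0)
rho = sample_rbf_frequencies(d=3, d_hat=5000, sigma=1.0, rng=rng)
x, y = [0.2, -0.1, 0.4], [0.5, 0.3, -0.2]
phi_x, phi_y = feature_map(x, rho), feature_map(y, rho)
approx = sum(a * b for a, b in zip(phi_x, phi_y))
exact = rbf_kernel(x, y, sigma=1.0)
```

Random sampling needs many frequencies for a close match, which is the weakness of PrivateSVM noted in the Related Work; hybrid SVM instead fits $\rho$ to the public data.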

The Computation of $\hat{\mathbf{w}}$.
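With the vectors $\rho_1, \ldots, \rho_{\hat{d}}$ approximating the RBF kernel function, we can convert the RBF kernel SVM in the $d$-dimensional input space into a linear SVM in a new $2\hat{d}$-dimensional feature space with (7), and then use the privacy-preserving linear SVM algorithm in [25]. The general idea of this algorithm is that, with the transformed $2\hat{d}$-dimensional private data, we first compute the parameter $\boldsymbol{\alpha}$ in the dual space and then $\mathbf{w}$ in the primal space using $\mathbf{w} = \sum_{i=1}^{n} \alpha_i y_i \hat{\phi}(\mathbf{x}_i)$; we then draw $\boldsymbol{\mu}$ from $\mathrm{Lap}(\lambda)^{2\hat{d}}$, where $\lambda = 2^{2.5} C \sqrt{\hat{d}} / (\epsilon n)$, and compute the noisy weight vector $\hat{\mathbf{w}} = \mathbf{w} + \boldsymbol{\mu}$.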
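A minimal sketch of this step in Python is given below. It assumes the private data have already been mapped with (7), so each row of `Z` is $2\hat{d}$-dimensional. To compute $\boldsymbol{\alpha}$ in the dual (4) with the linear kernel, it uses dual coordinate descent, a swapped-in solver chosen for brevity that may differ from the one used in [25]. It keeps $\mathbf{w} = \sum_i \alpha_i y_i \hat{\phi}(\mathbf{x}_i)$ in sync with $\boldsymbol{\alpha}$ and finally adds $\mathrm{Lap}(\lambda)$ noise to every coordinate. All names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def dual_cd_linear_svm(
    Z: Sequence[Sequence[float]], Y: Sequence[float], C: float, epochs: int = 50
) -> Tuple[List[float], List[float]]:
    """Maximize sum(alpha) - 0.5*||sum_i alpha_i y_i z_i||^2 s.t. 0 <= alpha_i <= C/n.

    Returns (alpha, w) with w = sum_i alpha_i * y_i * z_i maintained incrementally.
    """
    n, dim = len(Z), len(Z[0])
    upper = C / n
    alpha = [0.0] * n
    w = [0.0] * dim
    for _ in range(epochs):
        for i in range(n):
            zi, yi = Z[i], Y[i]
            q_ii = sum(v * v for v in zi)
            if q_ii == 0.0:
                continue
            # Partial derivative of the minimization form 0.5*a^T Q a - sum(a).
            grad = yi * sum(a * b for a, b in zip(w, zi)) - 1.0
            # Exact one-variable minimizer, projected back onto the box [0, C/n].
            new_ai = min(max(alpha[i] - grad / q_ii, 0.0), upper)
            delta = new_ai - alpha[i]
            if delta != 0.0:
                alpha[i] = new_ai
                for j in range(dim):
                    w[j] += delta * yi * zi[j]
    return alpha, w


def private_linear_svm(
    Z: Sequence[Sequence[float]], Y: Sequence[float], C: float, epsilon: float,
    rng: random.Random, epochs: int = 50,
) -> List[float]:
    """Train on mapped private data, then release w_hat = w + mu, mu ~ Lap(lambda)^{2 d_hat}."""
    n, dim = len(Z), len(Z[0])
    d_hat = dim // 2  # Z has 2 * d_hat columns (cos/sin pairs)
    lam = 2.0 ** 2.5 * C * math.sqrt(d_hat) / (epsilon * n)
    _, w = dual_cd_linear_svm(Z, Y, C, epochs)
    rate = 1.0 / lam
    return [wj + rng.expovariate(rate) - rng.expovariate(rate) for wj in w]


# Toy, already-"mapped" data with 2 columns (d_hat = 1).
Z = [[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0], [-2.0, 0.0]]
Y = [1.0, 1.0, -1.0, -1.0]
alpha, w = dual_cd_linear_svm(Z, Y, C=10.0)
w_hat = private_linear_svm(Z, Y, C=10.0, epsilon=1.0, rng=random.Random(0))
```

On realistic data, $n$ is large (tens of thousands of private tuples), so $\lambda$ is small and $\hat{\mathbf{w}}$ stays close to $\mathbf{w}$; on a four-point toy set with $\epsilon = 1$, the noise dominates, which illustrates why private data volume matters.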

Experiments
In this section, we experimentally evaluate our hybrid SVM and compare it with a state-of-the-art method, called private SVM [25]. For each dataset, we randomly extract a subset of the original data as a public data pool, from which public data are sampled uniformly, and use the remaining 30,000 tuples as the private data.
Comparison. We experimentally compared the performance of our hybrid SVM against two approaches: the public data baseline and private SVM [25]. The public data baseline is an RBF kernel SVM that uses only public data. In our experiment figures, "Public-#" denotes the public data baseline with # as the size of the public data. The private SVM is a state-of-the-art differentially private RBF kernel SVM that uses private data only. The parameters of all methods are set to their optimal values.
Metrics. We used the other attributes to predict annual income, which we converted into a binary attribute: values higher than a predefined threshold were mapped to 1, and all others to −1. We set the threshold to the median annual income. Classification performance was measured by the AUC (area under the ROC curve) [26]. Boxplots were used to assess the stability of our method and of private SVM. The boxplots of "Public-50," "Public-100," and "Public-200" are qualitatively similar to that of our hybrid SVM; hence, we do not report boxplots for these baseline methods. We performed 10-fold cross-validation 10 times for each algorithm and report the average results. We varied three different parameters: the privacy budget $\epsilon$, the dataset dimensionality, and the
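The label construction and the AUC metric described in Metrics can be sketched as follows. This illustrative Python sketch is not the evaluation code used in the experiments, and its function names and toy values are hypothetical. It binarizes a continuous target at its median (values strictly above the median become 1) and computes AUC as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half; this pairwise quantity equals the area under the ROC curve.

```python
import statistics
from typing import List, Sequence


def binarize_at_median(values: Sequence[float]) -> List[int]:
    """Map values strictly above the median to +1 and all others to -1."""
    threshold = statistics.median(values)
    return [1 if v > threshold else -1 for v in values]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


income = [18000.0, 25000.0, 41000.0, 52000.0, 60000.0, 75000.0]
labels = binarize_at_median(income)  # median is 46500
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]  # hypothetical classifier scores
print(auc(labels, scores))
```

The pairwise loop costs $O(n_+ n_-)$, which is fine for illustration; a rank-based computation scales better on large test folds.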

Computation Time. Finally, Figure 9 shows the time cost of our proposed algorithm with varying dimensions and different sampling rates. We report only the results for the US dataset; the results for the Brazil dataset are very similar. The dimensionality, rather than the sampling rate, determines the computational cost of the hybrid SVM. The overhead of the hybrid SVM comes from computing $\rho$ with the public data, since a nonlinear optimization problem must be solved. Like other private SVM methods, our hybrid SVM is intended for offline use, and hence its running time is acceptable even for 14-dimensional datasets.

Discussion and Conclusion
We proposed and developed an RBF kernel SVM that uses a small amount of public data and a large amount of private data to preserve differential privacy with improved utility. In this algorithm, we use public data to compute the parameters of an approximation of the RBF kernel function and then train private classifiers with a linear SVM after converting all private data into the new feature space defined by the approximation. A limitation of our approach is that we used the L-BFGS method [27], which is not very efficient, to find the optimal solution. Because the objective function in (6) is nonconvex, computing locally optimal values is computationally intensive, especially when the public dataset is large. We will develop more efficient methods and test the model on clinical records in future work. Another limitation is that we assume all original data from different datasets follow the same unknown joint multivariate distribution. This assumption might not always hold in practice, and calibration warrants future investigation; in the presence of distributional differences, we will leverage transfer learning to build the global model.