Practical Privacy-Preserving Aided Disease Diagnosis with Multiclass SVM in an Outsourced Environment

With the rapid development of cloud computing and machine learning, using outsourced data and machine learning models for training and online aided disease diagnosis has great application prospects. However, training and diagnosis in an outsourced environment pose serious challenges to data privacy. Many scholars have proposed privacy preserving machine learning schemes and made considerable progress, but achieving strong security with a low client load remains challenging. In this paper, we propose a complete privacy preserving outsourced multiclass SVM training and aided disease diagnosis scheme. We design several efficient basic operation algorithms for encrypted data and, on top of them, an efficient and privacy preserving SVM model training protocol. We further propose a secure maximum finding algorithm and a secure comparison algorithm, and design an efficient online aided disease diagnosis scheme based on the BFV cryptosystem and a blinding technique. Detailed security analysis proves that our scheme protects the privacy of each participant. The experimental results illustrate that our proposed scheme significantly reduces the computation overhead compared with previous similar works. Our scheme completes most of the aided disease diagnosis operations on the cloud servers, and the client only needs to perform a small number of encryption and decryption operations. The overall computation overhead is 0.175 s, and the efficiency of online aided disease diagnosis is improved by 85.4%. At the same time, our scheme provides multiclass diagnosis results, which can better assist doctors in their treatment.


Introduction
Machine learning (ML) uses computer systems to build mathematical models on sample data with statistical methods and to make predictions or decisions without being explicitly programmed. ML has shown significant advantages in the field of disease diagnosis and brings increasing convenience to the prevention and treatment of diseases.
With the rapid development of cloud computing technology, cloud service providers (CSPs) offer high-quality computation and huge storage space, providing data processing, model training, diagnosis services, deployment, and other intelligent solutions based on machine learning. In this context, local clients can outsource their medical data and machine learning models to a CSP without having to build their own large-scale infrastructure and computing resources. The cloud can train a machine learning model and provide an aided disease diagnosis service using the outsourced medical data and models, which helps improve doctors' diagnosis and treatment decisions and provides patients with an online disease diagnosis service. A typical cloud platform machine learning system architecture is shown in Figure 1.
However, the security and privacy of outsourced data face various threats, making people afraid to use CSP services. The security and privacy threats mainly concern the leakage of the data, the machine learning models of the model owners, the users' requests, and the diagnosis results. As we all know, the leakage of medical information may cause irreversible losses or become a major incident.
Therefore, privacy preserving model training and diagnosis based on cloud computing have become a major challenge.
To address the abovementioned challenges, many scholars have proposed various schemes, such as a secure outsourced classification based on the logistic regression model [1], an electronic medical disease risk prediction scheme based on the naive Bayes model [2], and other secure disease prediction schemes based on machine learning technology [3][4][5]. As a machine learning algorithm with high computational efficiency and good predictive accuracy, the support vector machine (SVM) has achieved high classification accuracy and efficiency in the medical field [6,7]. However, existing privacy preserving SVM schemes mainly implement secure prediction [8][9][10][11]; few support secure training. Moreover, most existing schemes are designed for binary classification, which can only determine whether the patient has a disease [12] but cannot handle multiclass diagnosis. In addition, multiclass SVM requires more computation, which reduces efficiency [13].
To solve the abovementioned problems, we propose an efficient and privacy preserving online disease diagnosis scheme based on the SVM algorithm. Our scheme achieves multiclass SVM training on the encrypted outsourced data of multiple data owners and provides users with privacy preserving disease diagnosis. In summary, our contributions are as follows:
(1) Efficient and secure basic operation algorithms: Based on the Paillier cryptosystem, we design several basic operation algorithms to realize secure outsourced data storage and computation, including a secure aggregation algorithm, a secure multiplication algorithm, and others. These secure computation algorithms are the building blocks of our proposed training protocol.
(2) A complete machine learning process under privacy preservation: Targeting the general machine learning workflow and the goal of privacy preservation, we propose a privacy preserving outsourced multiclass SVM model training and online aided disease diagnosis scheme. Unlike existing privacy preserving schemes that support only training or only diagnosis, our scheme extends the functionality of privacy preserving machine learning systems.
(3) Efficient and secure online aided disease diagnosis: Based on the BFV cryptosystem, we design a secure maximum finding algorithm and a secure comparison algorithm, yielding an efficient and privacy preserving aided disease diagnosis scheme. Experimental results illustrate that our scheme significantly reduces the computation cost compared with existing similar schemes, making it suitable for practical scenarios in which many users request diagnosis simultaneously.
(4) Low overhead for the local client: The client only needs to perform encryption and decryption operations, which minimizes its storage and computation overhead and makes full use of the computation power of the cloud servers.
The remainder of this paper is organized as follows. In Section 2, we review related work. In Section 3, we review the Paillier cryptosystem, the BFV cryptosystem, and the SVM algorithm as preliminaries. In Section 4, we give a system overview. Then, we present our scheme in Section 5. In Section 6, we analyze the security of the proposed scheme. In Section 7, we evaluate its performance. Finally, we conclude this paper in Section 8.

Related Work
In this section, we summarize the privacy preserving machine learning schemes in recent years.
With the development of the big data era, machine learning has been widely used in many fields. Among them, the application of machine learning to intelligent disease diagnosis has developed rapidly, and diagnosis schemes based on various machine learning classification algorithms have been proposed [14][15][16][17]. At the same time, however, the problem of privacy disclosure in the machine learning process has become increasingly serious. Therefore, many scholars have carried out research on privacy preserving machine learning.
Triastcyn and Faltings [18] proposed Bayesian differential privacy, which considers the distribution of data and provides a more practical privacy guarantee. Laur et al. [19] proposed a privacy preserving support vector machine scheme based on secure multiparty computation; in each training or testing phase, their scheme involves multiple parties holding encrypted data and secret shares obtained during training. Based on additive homomorphic encryption, Mandal and Gong [20] designed a privacy preserving scheme that performs gradient descent across data owners and a cloud server, achieving secure linear and logistic regression model training. Shen et al. [21] used blockchain technology to establish a secure and reliable data sharing platform among multiple data providers and constructed a privacy preserving support vector machine training scheme based on the Paillier cryptosystem. However, in their scheme, the data provider needs to interact with the cloud server to complete the computation, so the computation cost of the data provider is large. Liu et al. [22] proposed a privacy preserving clinical decision support system using the naive Bayes (NB) classifier; the BGV homomorphic encryption system significantly improved the performance. In work [23], a framework for securely and efficiently outsourcing decision tree inference was proposed. Tan et al. [24] proposed a privacy-preserving machine learning system that implements all operations on the GPU, making full use of its computing power. Zheng et al. [25] combined random permutation and arithmetic secret sharing via the compute-after-permutation technique and built a privacy-preserving machine learning framework. Li et al. [26] proposed a verifiable privacy-preserving machine learning prediction scheme for edge-enhanced HCPSs, which outputs verifiable prediction results for users without privacy leakage. Ma et al.
[27] designed a lightweight privacy-preserving medical diagnosis mechanism on the edge called LPME. Among these approaches, the SVM algorithm is a research hotspot and has been widely used in data mining and machine learning schemes. Most existing privacy preserving SVM schemes are based on three main privacy preserving technologies: differential privacy (DP), secure multiparty computation (SMC), and homomorphic encryption (HE). DP can significantly improve calculation and communication efficiency, but at the cost of sacrificing model accuracy by adding random noise [28,29]. Zhang et al. [30] proposed a general differentially private model fitting method based on the genetic algorithm, but it reduces the decision accuracy of the model. SMC alleviates the limitations of local computing but requires more interaction between participants, which leads to expensive communication overhead [31,32]. Yu et al. [33] first proposed a privacy preserving SVM classification method based on vertically partitioned data; they use SMC technology to obtain the global model, thereby protecting the local private data and hiding the classification model. However, this method requires at least three parties to participate in the calculation, which is complex and inefficient. HE can compute directly on encrypted data, but it also incurs large computing costs [34,35]. Bajard et al. [36] use HE technology to protect the decision model and medical data, but their approach requires a high computational load. Therefore, it is necessary to design an efficient and secure SVM scheme for cloud online disease diagnosis services. Wang et al. [37] proposed an efficient privacy preserving outsourced SVM scheme for Internet of medical things deployments, which protects training data privacy and guarantees the security of the trained SVM model.
In this paper, we propose a new privacy preserving scheme for training and disease diagnosis with the multiclass SVM algorithm. We make a comparative analysis with the schemes in [38][39][40]. The experimental results demonstrate that our scheme has more practical application value.

Preliminaries
In this section, we describe some techniques as the basis of our scheme, including the Paillier cryptosystem, BFV cryptosystem, and SVM algorithm.

Paillier Cryptosystem.
In the training phase, the data are encrypted with the Paillier cryptosystem [41]. The Paillier cryptosystem is a public key cryptosystem with additive homomorphism. We introduce it as follows.
(i) Key generation: Given a security parameter k, choose two large primes p and q of bit length k and set n = pq, λ = lcm(p − 1, q − 1). Choose g ∈ Z*_{n²} (a common choice is g = n + 1). The public key is pk = (n, g) and the private key is sk = λ.
(ii) Encryption: Given m ∈ Z_n, the message m is encrypted with pk. The ciphertext is c = E_pk(m) = g^m · r^n mod n², where r ∈ Z*_n is a random number.
(iii) Decryption: By the key generation stage and Carmichael's theorem, g^λ ≡ 1 mod n, so g^λ = kn + 1 mod n² for some integer k. Define L(u) = (u − 1)/n. Then, m = D_sk(c) = (L(c^λ mod n²)/L(g^λ mod n²)) mod n.
(iv) Homomorphic computation: Given two ciphertexts E_pk(m₁) and E_pk(m₂) under the same public key pk, the homomorphic computations are defined as E_pk(m₁ + m₂) = E_pk(m₁) · E_pk(m₂) and E_pk(m₁ · m₂) = E_pk(m₁)^{m₂}.
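The homomorphic properties above can be checked with a minimal textbook implementation. The following sketch uses toy parameters (small primes, our own variable names) purely for illustration; it is not a secure implementation.

```python
# A minimal textbook Paillier sketch (toy parameters, NOT secure) to
# illustrate the additive homomorphism used in this paper.
import math
import random


def keygen():
    p, q = 1117, 1129                      # toy primes; real keys use ~1024-bit primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)           # Carmichael's lambda for n = p*q
    g = n + 1                              # common choice of generator
    return (n, g), lam


def encrypt(pk, m):
    n, g = pk
    while True:
        r = random.randrange(2, n)         # random r in Z*_n
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def L(u, n):
    return (u - 1) // n


def decrypt(pk, lam, c):
    n, g = pk
    num = L(pow(c, lam, n * n), n)
    den = L(pow(g, lam, n * n), n)
    return (num * pow(den, -1, n)) % n     # L(c^lam)/L(g^lam) mod n


pk, lam = keygen()
n = pk[0]
c1, c2 = encrypt(pk, 15), encrypt(pk, 27)
assert decrypt(pk, lam, (c1 * c2) % (n * n)) == 42   # E(m1)*E(m2) decrypts to m1+m2
assert decrypt(pk, lam, pow(c1, 3, n * n)) == 45     # E(m1)^k decrypts to k*m1
```

The two assertions check exactly the homomorphic identities (iv) above: ciphertext multiplication yields plaintext addition, and ciphertext exponentiation yields plaintext scalar multiplication.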

BFV Cryptosystem.
In the prediction phase, the data are encrypted with the BFV cryptosystem [34]. The BFV cryptosystem is a leveled fully homomorphic public key cryptosystem based on the RLWE problem, which supports an unlimited number of additive homomorphic operations and a limited number of multiplicative homomorphic operations.
Key generation samples a secret key s, a uniformly random polynomial a, and a noise polynomial e ← χ, where χ denotes the Gaussian distribution. The polynomial a is used to generate the public key pk = ([−(a · s + e)]_q, a).

SVM Algorithm.
SVM is a classical supervised learning algorithm for binary classification problems. The SVM algorithm finds the best separating hyperplane. The classifier is a decision function f(X) = ⟨W · X⟩ + b, where f(X) ≥ 0 indicates the positive class and f(X) < 0 indicates the negative class.
There are two training methods for the SVM model: one based on the SMO algorithm and the other based on gradient descent. Because the operation steps of the SMO algorithm are more complex, it incurs high computation costs on encrypted data. Therefore, we choose gradient descent to realize privacy preserving SVM model training. In gradient-descent training, the objective function L(X) = (1/2)‖W‖² + C Σ_{i=1}^{n} max(0, 1 − y_i(⟨W · X_i⟩ + b)) = (1/2)‖W‖² + C Σ_{i=1}^{n} loss_i needs to be minimized. When y_i(⟨W · X_i⟩ + b) ≥ 1, the classification is correct: loss_i = 0 and the parameters do not need to be updated. When y_i(⟨W · X_i⟩ + b) < 1, the classification is incorrect: loss_i = 1 − y_i(⟨W · X_i⟩ + b) and the parameters need to be updated.
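The update rule above can be sketched in plaintext. The following gradient-descent trainer is an illustrative stand-in (toy data; the hyperparameters C, lr, and epochs are our assumptions), not the paper's exact encrypted protocol.

```python
# Plaintext sketch of hinge-loss gradient-descent SVM training, matching the
# case split above: no loss update when the margin is >= 1, hinge update otherwise.
import numpy as np


def train_svm(X, y, C=1.0, lr=0.01, epochs=200):
    n, t = X.shape
    w, b = np.zeros(t), 0.0
    for _ in range(epochs):
        for i in range(n):
            margin = y[i] * (w @ X[i] + b)
            if margin >= 1:
                w -= lr * w                        # correct side: only the regularizer term
            else:
                w -= lr * (w - C * y[i] * X[i])    # loss_i = 1 - y_i(<w,x_i>+b)
                b += lr * C * y[i]
    return w, b


# Tiny linearly separable toy set (illustrative).
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = train_svm(X, y)
assert all(np.sign(w @ x + b) == yi for x, yi in zip(X, y))
```

In the paper, the same per-sample case split is evaluated on ciphertexts via the secure symbol judgment block.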

System Overview
In this section, we will introduce our system model, security goals, and threat model.

System Model.
Our system model should achieve privacy preserving training and online disease diagnosis. Therefore, it is designed as shown in Figure 2.
There are six participants in our system model: the trusted authority (TA), medical centers (MCs), the cloud storage server (CSS), the cloud computation server (CCS), the diagnosis service provider (DSP), and users.
(i) Trusted authority (TA): TA is the fully trusted party of the whole system, which generates and distributes keys for the other participants. After initialization, TA stays offline. (ii) Medical centers (MCs): Each MC has its own local medical data. To reduce local storage costs, MCs outsource their medical data to CSS for storage. (iii) Cloud storage server (CSS): CSS stores the encrypted outsourced data and, together with CCS, executes the secure training and diagnosis protocols. (iv) Cloud computation server (CCS): CCS assists CSS in the secure computations by operating on blinded ciphertexts. (v) Diagnosis service provider (DSP): DSP holds the SVM model and provides the online aided diagnosis service. (vi) Users: Users submit encrypted diagnosis requests and can decrypt the returned results with their own private keys.

Security Goals.
To meet the security requirements of outsourced training and diagnosis, our scheme achieves the following security goals.
(i) Medical data privacy: The outsourced data of MCs are not leaked to other participants during the whole machine learning process. (ii) Model privacy: Other participants cannot learn any useful information about the model of DSP. (iii) User privacy: The diagnosis requests and results of users cannot be acquired by other participants. (iv) Intermediate result privacy: During protocol execution, no participant can infer other participants' sensitive information from the intermediate results.
In our scheme, the training and diagnosis processes are completed by CSS and CCS. All participants are semi-honest (honest-but-curious): they honestly execute the secure computation protocols, but they try to analyze the sensitive data and intermediate results to infer useful information about other participants. As in previous works, we assume that CSS and CCS do not collude; since they belong to different commercial companies, they will not collude with each other for the sake of their own reputations.

Threat Model.
In this paper, we define three attacks in our system model.
(i) Eavesdropping attack: An adversary can eavesdrop on and analyze data during transmission, including the outsourcing process and the interactions between participants during protocol execution. (ii) Honest-but-curious attack: All participants execute the protocol honestly but try to infer useful information during its execution. (iii) Client-collusion attack: In the training and diagnosis processes, some clients may collude to analyze the useful information of other participants.

Proposed Scheme
In this section, we describe the proposed scheme in detail. Our scheme mainly includes system initialization, privacy preserving machine learning training, and online disease diagnosis.
To describe our proposed scheme accurately, the notations used are listed in Table 1.

System Initialization.
In the system initialization phase, TA generates system parameters and distributes them to MCs, CSS, CCS, and DSP, respectively, through a secure communication channel. Then, TA stays offline. We assume there are m MCs in our system. Because the Paillier and BFV cryptosystems can only encrypt integers, floating point numbers and negative numbers must be converted into integers. Therefore, all participants perform data conversion before encrypting their sensitive information.

Generate System Parameters
(1) Generate a public-private key pair (PK_P = (N_P, g), SK_P) of the Paillier cryptosystem and a public-private key pair (PK_B, SK_B) of the BFV cryptosystem. The BFV plaintext space is N_B. The public keys are published and the private keys are sent to CCS.
(2) Generate a public-private key pair (PK^C_P, SK^C_P) of the Paillier cryptosystem and a public-private key pair (PK^C_B, SK^C_B) of the BFV cryptosystem for CSS. The BFV plaintext space is N^C_B. The public keys are published and the private keys are sent to CSS.
(3) Generate a public-private key pair (PK^D_P, SK^D_P) of the Paillier cryptosystem for DSP. The public key is published and the private key is sent to DSP.
(4) Generate a random integer ω ∈ Z_{N_P}. TA randomly splits ω into m integers satisfying ω₁ + ω₂ + ⋯ + ω_m = ω and sends ω_i to MC_i. Then, TA generates two lists H and H′, each containing m random integers: H = (n₁, n₂, …, n_m) with n_i ∈ Z_{N_P}, and H′ = (n₁′, n₂′, …, n_m′) with n_i′ ∈ Z_{N_P}. Each element pair of H and H′ represents the ID of one MC. When MC_i sends its authentication id_i to CSS, MC_i hides g^{a_i} and ω_i with n_i and n_i′, respectively. The pair (n_i, n_i′) is sent to MC_i; H and H′ are sent to CSS.

Data Conversion.
In the machine learning application scenario, data and model parameters contain floating point numbers and negative numbers.
For a floating point number x, we enlarge x to x · 2^E (E is the precision of floating point numbers). For example, given a floating point number x = 3.61 and precision E = 20, we convert x into the integer x′ = 3785359. For a negative number y, we divide the plaintext space N (N denotes the plaintext space of the Paillier or BFV cryptosystem) into two halves, because all variables and intermediate results during training and prediction are much smaller than N/2. An integer in [0, N/2) represents a positive integer, and an integer in (N/2, N − 1] represents a negative integer. When encrypting a negative integer y, it is encrypted as N − |y|. If y is both fractional and negative, y is first scaled and then converted into a negative integer.
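The conversion above can be sketched as follows; the modulus N here is a toy stand-in for the actual Paillier/BFV plaintext space.

```python
# Fixed-point encoding sketch: scale by 2^E, map negatives to the upper
# half (N/2, N-1] of the plaintext space, as described in the text.
E = 20
N = 2 ** 64                       # stand-in for the real plaintext modulus


def encode(x):
    v = int(x * (1 << E))         # enlarge: 3.61 -> 3785359 at E = 20
    return v % N                  # negatives wrap into (N/2, N-1]


def decode(v):
    if v > N // 2:                # upper half encodes negative values
        v -= N
    return v / (1 << E)


assert encode(3.61) == 3785359                        # matches the paper's example
assert abs(decode(encode(-2.5)) + 2.5) < 2 ** -E      # negative round-trip
```
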

Privacy Preserving Machine Learning Training.
The privacy preserving machine learning training process is completed by CSS and CCS. We assume the total amount of outsourced data is n.

Local Data Outsourcing.
To protect the privacy of MCs' local data, MCs encrypt the data before outsourcing. The outsourcing process of MC_i (i = 1, …, m) is as follows.
(1) MC_i generates a random integer a_i ∈ Z_{N_P}, computes pk_i = g^{a_i} mod N_P² as its public key, and sets the private key sk_i = a_i.
(2) After receiving the training request, MC_i computes id_i = (pk_i + n_i, ω_i + n_i′) as its authentication (id_i indicates that CSS is allowed to use the outsourced data of MC_i for training) and sends it to CSS.
(3) CSS obtains pk_i and ω_i of MC_i through id_i and computes ω = ω₁ + ω₂ + ⋯ + ω_m. Note that ω can be obtained only after all MCs have sent their authentications. Then, CSS computes h_i = g^{a_i + ω} mod N_P² and completes the aggregation.
(1) Secure data aggregation algorithm (Block_1): CSS needs to aggregate the MCs' outsourced data before starting machine learning training. The algorithm works as described in Algorithm 1.
(2) Secure multiplication algorithm (Block_2): Given two encrypted integers [x]_{PK_P} and [y]_{PK_P}, the algorithm computes [x · y]_{PK_P}. The algorithm works as follows and is described in Algorithm 2.
(1) CSS generates two random integers R₁, R₂ ∈ Z_{N_P}. Then, applying the additive homomorphism, it computes [x + R₁]_{PK_P} and [y + R₂]_{PK_P} and sends them to CCS.
(2) CCS generates a random integer T ∈ Z_{N_P}. It decrypts [y + R₂]_{PK_P} using SK_P, then encrypts (y + R₂ + T) mod N^C_P with PK^C_P to get [y + R₂ + T]_{PK^C_P}. It computes [xT + R₁T]_{PK_P} = [x + R₁]^T_{PK_P} and encrypts T with PK_P. It sends [y + R₂ + T]_{PK^C_P}, [xT + R₁T]_{PK_P}, and [T]_{PK_P} to CSS.
(3) CSS decrypts [y + R₂ + T]_{PK^C_P} with SK^C_P and computes y + T. Then, applying the additive homomorphism, CSS removes the remaining blinding terms and obtains [x · y]_{PK_P}.
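Our reading of why the blinding cancels can be checked in plaintext: every mask introduced by CSS (R₁, R₂) or CCS (T) drops out of the final combination, while CSS only ever sees y blinded as y + T. The modulus and operand values below are illustrative.

```python
# Plaintext check of the blinding identity behind Block_2 (our reading):
# (x+R1)(y+T) - (xT + R1*T) - R1*y == x*y, with all masks cancelling.
import random

N = 2 ** 61 - 1                         # stand-in plaintext modulus
x, y = 123456, 789012                   # the two secret operands
R1, R2 = random.randrange(N), random.randrange(N)   # CSS's masks
T = random.randrange(N)                 # CCS's mask

# CSS can strip R2 from (y + R2 + T) to get y + T, but never learns y itself.
y_plus_T = ((y + R2 + T) - R2) % N

# Homomorphically this corresponds to [x+R1]^(y+T) / [xT+R1T] / [y]^R1 -> [xy].
recovered = (((x + R1) * y_plus_T) - (x * T + R1 * T) - R1 * y) % N
assert recovered == (x * y) % N
```

In the protocol the same identity is evaluated under encryption: the multiplications by known plaintexts become ciphertext exponentiations, and the subtractions become multiplications by ciphertext inverses.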
(3) Secure inner product algorithm (Block_3): Given two encrypted vectors [X]_{PK_P} and [Y]_{PK_P}, the algorithm computes [X · Y]_{PK_P} and is described in Algorithm 3.

(4) Secure scalar multiplication of vector algorithm (Block_4): Given an encrypted vector [X]_{PK_P} and an encrypted integer [y]_{PK_P}, the algorithm computes [y · X]_{PK_P} and is described in Algorithm 4.
(5) Secure symbol judgment algorithm (Block_5): Given an encrypted integer [x]_{PK_P}, the algorithm computes the sign of x: judge = 1 if x ≥ 0, else judge = 0. The algorithm works as follows and is described in Algorithm 5.
(1) CSS chooses a random integer r with l(r) < l(N_P)/2. Then, it computes [x · r]_{PK_P} = [x]^r_{PK_P} by applying the additive homomorphism and sends [x · r]_{PK_P} to CCS.
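The idea behind Block_5 can be sketched in plaintext: multiplying by a random positive r of bounded bit length hides the magnitude of x but preserves which half of the plaintext space x · r falls in. The modulus and bounds below are toy stand-ins.

```python
# Sketch of Block_5's sign test under the half-range negative encoding:
# sign(x * r) == sign(x) for random positive r, since |x*r| stays below N/2.
import random

N = 2 ** 61 - 1


def encode(x):                 # negatives occupy (N/2, N-1], as in Section 5.1
    return x % N


def judge_sign(blinded):       # CCS's view: only x*r mod N, never x itself
    return 1 if blinded <= N // 2 else 0


x = -4321
r = random.randrange(2, 2 ** 30)         # l(r) < l(N)/2 keeps x*r within one half
blinded = (encode(x) * r) % N            # CSS does this homomorphically as [x]^r
assert judge_sign(blinded) == 0                          # x < 0 -> judge = 0
assert judge_sign((encode(4321) * r) % N) == 1           # x >= 0 -> judge = 1
```

CCS thus learns only the single bit judge, which is exactly what Block_5 returns to CSS.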

Privacy Preserving Outsourced Training with Multiclass SVM.
In this section, we construct a privacy preserving outsourced training protocol that trains a multiclass SVM model using the proposed building blocks. DSP outsources the training task to CSS, and CSS completes the aggregation of the outsourced data. Then, CSS and CCS jointly complete the model training. After training, CSS transforms the encrypted model parameters into ciphertexts under DSP's public key. To achieve this transformation, we use the algorithm proposed in reference [38].
For multiclass SVM training, there are two methods: one-versus-rest (ovr) and one-versus-one (ovo). To improve efficiency and reduce the number of iterations, we choose the ovr method. We construct L binary SVM classifiers, each corresponding to one class. The process is described in Algorithm 6. [Algorithm 2 (Block_2), final step: (14) [xy]_{PK_P} = [xy − R₁T]_{PK_P} · [R₁T]_{PK_P}.]
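The ovr setup can be sketched as follows (illustrative names; in the paper, each binary task is trained on ciphertexts): classifier k treats class k as +1 and all other classes as −1.

```python
# One-vs-rest (ovr) label construction: L binary tasks from one multiclass
# label vector, one row of {+1, -1} labels per class.
import numpy as np


def ovr_labels(y, L):
    """Return an L x n matrix of {+1, -1} labels, one row per binary task."""
    return np.array([[1 if yi == k else -1 for yi in y] for k in range(L)])


y = [0, 2, 1, 0, 2]
Y = ovr_labels(y, L=3)
assert Y.tolist() == [[ 1, -1, -1,  1, -1],
                      [-1, -1,  1, -1, -1],
                      [-1,  1, -1, -1,  1]]
```

Each row of Y then drives one run of the hinge-loss training loop from Section 3.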
[Algorithm 1: Secure data aggregation (Block_1). Input: the authentication and outsourced data of MC_i. Output: the training data. CSS: (1) Send a training request to MC_i, i = 1, 2, …, m. MCs: …]

Security and Communication Networks
For W_i and the corresponding class result class_i, DSP generates a (t + 1)-dimensional random integer vector R_i = (R_{i,1}, R_{i,2}, …, R_{i,t+1}) and a random integer r_i, with l(R_{i,j}) = l(r_i) < l(N^C_B)/2 and l(R_{i,j}) < l(N_B)/2. Then, DSP computes W_i + R_i and class_i + r_i to hide the parameters and class results.
From the pairwise differences of the L random integer vectors, DSP constructs a combination table. [Algorithm 6 fragment: for it = 1 → T: [grad]_{PK_P} = [W_k]_{PK_P}; for i = 1 → n: …]

Secret Diagnosis Request Generation.
For user_i, the symptom is expressed as X_i = (x_{i,1}, x_{i,2}, …, x_{i,t}, 1) (the last 1 is added to facilitate computing the vector inner product with the bias term). User_i generates a (t + 1)-dimensional random integer vector T_i and hides the plaintext symptom as X_i + T_i. User_i then encrypts X_i + T_i with PK_B and encrypts T_i with PK^C_B. Let S denote the secret diagnosis request of user_i.
Then, user_i sends S to CSS.

Diagnosis Value Computation.
Our proposed diagnosis scheme addresses a multiclass problem, so the diagnosis value of each class must be computed. After receiving the secret diagnosis request S, CSS decrypts [T_i]_{PK^C_B} with SK^C_B. Then, it computes [X_i + T_i]_{PK_B} − T_i by the homomorphic operation of the BFV cryptosystem.
According to the decision function f(X) = ⟨W · X⟩ + b of the SVM algorithm, each diagnosis value requires one multiplicative and one additive homomorphic operation. Because the BFV encryption algorithm supports ciphertext packing, batch operation can be realized and the computation efficiency is significantly improved. The process is described in Algorithm 7.
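A plaintext stand-in for the batched diagnosis-value computation: numpy's vectorized inner products below play the role of the packed BFV slots, with the dermatology dimensions (6 classes, 34 symptoms) as an example and random model parameters as placeholders.

```python
# Batched evaluation of f_k(x) = <W_k, x> + b_k for all L classifiers at once;
# with BFV batching, each inner product costs one ciphertext multiply and add.
import numpy as np

L_cls, t = 6, 34                        # dermatology: 6 classes, 34 symptoms
rng = np.random.default_rng(0)
W = rng.normal(size=(L_cls, t))         # one hyperplane per binary classifier (placeholder)
b = rng.normal(size=L_cls)
x = rng.normal(size=t)                  # the user's (unblinded) symptom vector

values = W @ x + b                      # all L diagnosis values in one batched step
pos = int(np.argmax(values))            # the class with the maximum value wins
assert values.shape == (L_cls,) and 0 <= pos < L_cls
```

In the paper, the argmax step is not computed locally but via the secure maximum finding protocol of the next subsection.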

Diagnosis Result Generation.
After computing the diagnosis values, CSS holds L encrypted diagnosis values, each corresponding to one class result. CSS then needs to select the class corresponding to the maximum of the L encrypted values as the diagnosis result. Therefore, we design a secure maximum finding protocol and a secure comparison algorithm, which CSS and CCS execute jointly.
(1) Secure maximum finding: CSS sets an initial maximum position pos = 1. Then, CSS executes L cycles, each of which runs the secure comparison algorithm to continuously update pos.
After L cycles, CSS obtains the final diagnosis result [class_pos]_{PK_P} and converts it into [class_pos]_{PK_{user_i}} under the public key PK_{user_i} of user_i. To achieve this transformation, we use the algorithm proposed in [38].
Then, CSS sends [class_pos]_{PK_{user_i}} to user_i, who decrypts the encrypted result with SK_{user_i}. The process is described in Algorithm 8.
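The maximum finding loop can be sketched in plaintext as follows; the inline comparison here stands in for the interactive secure comparison (SC) between CSS and CCS, and we use 0-based indexing instead of the paper's pos = 1.

```python
# Skeleton of secure maximum finding: pos starts at the first position and
# each cycle runs one comparison whose judge bit may update pos.
def find_max_position(values):
    pos = 0                                           # initial maximum position
    for j in range(1, len(values)):
        judge = 1 if values[j] > values[pos] else 0   # SC stand-in: CCS returns judge
        if judge == 1:
            pos = j                                   # update pos when judge = 1
    return pos


assert find_max_position([0.2, 1.7, -0.4, 1.1, 0.9, 1.6]) == 1
```

After the loop, pos indexes the encrypted class result that CSS key-switches and returns to the user.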
(2) Secure comparison (SC): In the i-th cycle, CSS computes the difference between the two encrypted diagnosis-value vectors to be compared. Then, according to pos and j, it computes index = pos · L + j; the index corresponds to the value (R_pos − R_j) in the combination table, which is used to remove the blinding terms. CCS encrypts judge with PK^C_B and sends it to CSS. CSS decrypts it and, if judge = 1, updates the value of pos. The process is described in Algorithm 9.

Security Analysis
In this section, we analyze the security of the proposed scheme. The focus is on the outsourced data of MCs, the SVM model parameters of DSP, and the symptoms and diagnosis results of users.

Security Analysis of Training.
In the training phase, the outsourced data of MCs and the SVM model parameters of DSP need privacy protection. The training protocol is composed of the building blocks designed in Section 5.2.2, which are executed by CSS and CCS. [Table 2: Combination table; entry (1, 2): index = 1 · L + 2, ….] According to the threat model proposed in Section 4.3, we analyze the security of the training protocol.

Eavesdropping Attack.
The data transmission in the training phase includes MCs outsourcing the encrypted data to CSS and the interactions of the training protocol between CSS and CCS.
In the outsourcing process, the data of MC_i are encrypted. MC_i combines the system public key PK_P, the parameter ω, and its own public key g^{a_i} so that the data are additionally hidden while being encrypted. Suppose an adversary obtains the private key SK_P and eavesdrops when MCs outsource their data to CSS. Because the data of MC_i are encrypted, e.g., [x]_{MC_i} = g^x · r^{N_P} · h_i mod N_P², the adversary cannot obtain any useful information. Similarly, the authentications are hidden by random numbers. During the execution of the training protocol, CSS and CCS interact, and the transmitted data are encrypted with their real values hidden by random numbers; the adversary again cannot obtain any useful information.

Honest-But-Curious Attack.
During the training phase, CSS and CCS obtain some intermediate results from the building blocks proposed in Section 5.2.2.
In Block_2, CSS hides x and y with R₁ and R₂ by homomorphic operations before sending them to CCS. Then, CCS sends [xT + R₁T]_{PK_P}, [y + R₂ + T]_{PK^C_P}, and [T]_{PK_P} to CSS after computing. Therefore, neither CSS nor CCS can learn any useful information about x or y. Because Block_3 and Block_4 are built on Block_2, we omit their analysis. In Block_5, CSS hides x with r and sends [x · r]_{PK_P} to CCS. CCS can only learn the sign of x, not its real value, and returns only the result judge (0 or 1) to CSS. Through the above analysis, CSS and CCS cannot learn any useful information in the training process.

Client-Collusion Attack.
For MCs, each MC_i knows only its own ω_i. Therefore, even if (m − 1) MCs collude to steal the privacy of the remaining MC, they cannot learn any useful information.

Security Analysis of Disease Diagnosis.
In the diagnosis phase, the SVM parameters of DSP, the symptom X_i, and the diagnosis result class_pos of user_i need privacy protection. The diagnosis process consists of diagnosis outsourcing, secret diagnosis request generation, diagnosis value computation, and diagnosis result generation. Therefore, we conduct a security analysis of the main steps under the threat model.

Eavesdropping Attack.
The data transmission includes DSP outsourcing [W_i + R_i]_{PK_B}, [class_i + r_i]_{PK^C_B}, and [r_i]_{PK_P} (i = 1, 2, …, L) to CSS, user_i sending the request S to CSS, and the interactions of the diagnosis process between CSS and CCS.
From the encrypted data of the outsourcing process, the adversary (CCS) can only decrypt [W_i + R_i]_{PK_B} and [r_i]_{PK_P} with SK_B and SK_P. However, the adversary cannot learn W_i because of R_i, and r_i alone does not contain any useful information. When user_i sends S to CSS, the symptom X_i may be eavesdropped and decrypted by the adversary, but X_i is hidden by random numbers. In the interaction of the SC algorithm between CSS and CCS, all transmitted data are hidden by random numbers and remain in ciphertext, so the adversary cannot learn any useful information.

Honest-But-Curious Attack.
In the diagnosis value computation process, CSS can only obtain the L encrypted diagnosis values under PK_B and does not know the corresponding class meanings. The whole process is executed on ciphertexts, so CSS cannot learn any useful information. The diagnosis result generation consists of the secure maximum finding protocol and the secure comparison algorithm. When CSS and CCS execute the secure comparison algorithm, CSS computes the difference between the two encrypted vectors to be compared. The resulting difference vector confuses the signs of the two numbers in each dimension of the original vectors, and random integers further hide the difference vector. After decrypting the difference vector, CCS can eliminate the random number only after summing. During this process, CSS and CCS cannot obtain any useful information.
After CSS and CCS execute the secure maximum finding protocol, CSS obtains the diagnosis result [class_pos]_{PK_P}. When performing key conversion on [class_pos]_{PK_P}, CSS hides class_pos with a random integer R and sends [class_pos + R]_{PK_P} to CCS. CCS can decrypt it, but because of the hidden random integer, CCS cannot obtain class_pos.

Client-Collusion Attack.
Users can only obtain their own diagnosis results and cannot obtain any other information. Therefore, our proposed scheme can resist the client-collusion attack.

Performance Evaluation
In this section, we implemented our scheme and evaluated the performance of training and diagnosis.
Our experimental environment is shown in Table 3.
In our experiments, we evaluated our proposed scheme with a real dataset from the UCI Machine Learning Repository called dermatology.
The dermatology dataset is a multiclass dataset with 6 categories and 34 symptoms.

Effect of Key Length on Computation Overhead.
The key length of a cryptosystem has a great impact on efficiency and security. Therefore, we tested the data encryption time and the time of the main building blocks (Block_1 and Block_3), which have high computation overhead. The test results are shown in Table 4.
From Table 4, it can be seen that increasing the key length has a great impact on the computation overhead. Based on the experimental results and security considerations, the key length of the Paillier cryptosystem is set to 1024 bits in the training phase.

Privacy Preserving Multiclass SVM Training Analysis.
To meet the requirements of data encryption, we convert all floating-point numbers to integers. The conversion accuracy E of the floating-point numbers has a great impact on the accuracy of the SVM model. We tested the accuracy of the SVM model under different values of E; the results are shown in Figure 3.
From the above experimental analysis, it can be seen that the larger E is, the higher the accuracy of the model; as E increases, the accuracy stabilizes. When E = 20, the accuracy of the model is highest. At the same time, we also trained the SVM model in the plaintext state using the gradient descent method and compared its accuracy with that of the model trained in the ciphertext state; the results are shown in Table 5.
From this comparison, it can be seen that the accuracy of our proposed scheme is the same as that in the plaintext state (98.61%). Therefore, our proposed scheme is verified to be correct and practical.
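The float-to-integer conversion above can be sketched as fixed-point scaling. This is a minimal illustration under our reading that the conversion accuracy E acts as a power-of-two scaling exponent; the paper does not spell out the exact encoding.

```python
# Minimal sketch of fixed-point conversion (assumption: conversion
# accuracy E is a scaling exponent, so x is stored as round(x * 2**E)).
def to_fixed(x: float, E: int) -> int:
    """Encode a float as an integer usable as a Paillier plaintext."""
    return round(x * (1 << E))

def from_fixed(n: int, E: int) -> float:
    """Decode the integer back, recovering x up to rounding error."""
    return n / (1 << E)

E = 20
w = 0.734512                         # an example model weight
w_int = to_fixed(w, E)               # integer fed to the cryptosystem
# Rounding loses at most half a unit in the last place:
assert abs(from_fixed(w_int, E) - w) <= 2 ** -(E + 1)
```

Larger E preserves more fractional precision, which matches the observation that model accuracy rises and then stabilizes as E grows.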

Privacy Preserving Online-Aided Disease Diagnosis Evaluation.
We implemented our proposed scheme using the SEAL library in the diagnosis phase.

Noise Effect of BFV Cryptosystem.
When using the BFV cryptosystem for homomorphic operations, the influence of noise must be considered. The noise in a ciphertext grows when homomorphic multiplication is performed. If the noise becomes too large during computation, the correct result cannot be recovered after decryption.
Therefore, the BFV cryptosystem in SEAL sets a noise budget during initialization. If the noise budget remains greater than 0 after the computation, the ciphertext can be decrypted correctly. The value of the noise budget depends on the parameter settings. We evaluated the influence of the poly modulus degree (d) on the encryption time, the change in the noise budget after homomorphic operations, the computation time, and whether the decryption result is correct. The results are shown in Table 6. It can be seen that the noise consumption of the BFV cryptosystem is relatively large when performing homomorphic multiplication, so the BFV cryptosystem can perform only a limited number of homomorphic multiplications. When computing the diagnosis values, only one inner product operation and one addition operation are required. Therefore, using the BFV cryptosystem is completely feasible.
We comprehensively considered the encryption time and the computation time while ensuring that the computation results can be decrypted correctly, and we set the poly modulus degree to d = 8192.
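The diagnosis-value computation needs only multiplicative depth one, which is why the noise budget suffices. The batched inner product can be sketched with a plaintext mock in which a list stands in for the slots of a packed BFV ciphertext; real SEAL code would use a batch encoder, relinearization, and rotations, none of which are shown here.

```python
# Plaintext mock of BFV batching: one "ciphertext" holds a vector of
# slots, and the inner product costs a single slot-wise multiplication
# (depth 1) plus additions, which barely consume the noise budget.
def slotwise_mul(ct_a, ct_b):
    """One homomorphic multiplication on packed ciphertexts (mocked)."""
    return [a * b for a, b in zip(ct_a, ct_b)]

def slot_sum(ct):
    """Summing across slots uses only rotations and additions in BFV,
    so it adds almost no noise (mocked as a plain sum)."""
    return sum(ct)

symptoms = [1, 0, 2, 1]        # user_i's packed symptom vector X_i
weights  = [3, 5, 1, 4]        # one row of the (blinded) model parameters
diagnosis_value = slot_sum(slotwise_mul(symptoms, weights))  # <X_i, W>
assert diagnosis_value == 1*3 + 0*5 + 2*1 + 1*4
```

Because only one multiplication level is consumed per diagnosis value, a moderate parameter set such as d = 8192 leaves ample noise budget for correct decryption.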

Influence of Different Classification Numbers on Computation Overhead.
When using the BFV cryptosystem to encrypt data, multiple plaintext values can be packed and encrypted into one ciphertext. The number of classifications is L. We tested the impact of different L on user_i and DSP. The results are shown in Figure 4(a). As L increases, the encryption time of DSP gradually increases, while the encryption time of user_i remains essentially unchanged.
We also tested the impact of different L on the diagnosis value computation of CSS. The results are shown in Figure 4(b). As L increases, the computation time of CSS also increases. The diagnosis result generation process is jointly completed by CSS and CCS. We tested the effect of different L on the diagnosis result generation. The results are shown in Figure 4(c). As L increases, the time for CSS and CCS also increases.

Comparison Analysis of Secret Diagnosis Request Generation and Diagnosis Values Computation.
In our proposed scheme, secret diagnosis request generation can be regarded as data encryption by user_i, and diagnosis value computation can be regarded as a homomorphic operation. We compared our scheme with three other privacy preserving schemes; the results are shown in Table 7. From the comparison, it can be seen that the data encryption time of our proposed scheme is significantly lower than that of [38, 39]. In the computation of the decision function, our scheme significantly reduces the computational cost compared with the schemes in [39, 40]. The total time also shows that our proposed scheme is significantly faster than the other three schemes.
Next, we provide further analysis. The names of the participants may differ slightly across schemes, so to facilitate the analysis, we divided the participants into cloud server and client and compared their computation overheads separately. The results are shown in Tables 8 and 9.
In our proposed scheme, the client only needs to encrypt the data and can go offline after uploading them to the cloud server; the cloud server only needs to compute the decision function. This model reduces the computation overhead of the client to the greatest extent and performs privacy preserving computation using the powerful computing capability of the cloud server. In the scheme of [38], the cloud server does not participate in the process at all, which imposes a heavy computation overhead on the client. In the scheme of [39], the computation of the diagnosis values must be completed jointly by the cloud server and the client; therefore, it not only imposes a heavy computation overhead on the client but also requires the client to stay online throughout this process.

Comparison Analysis of Diagnosis Result Generation.
In our proposed scheme, after CSS completes the diagnosis value computation, it jointly executes the secure protocols with CCS to generate the diagnosis result. We continued the comparison analysis with the schemes in [38-40]; the results are shown in Table 10.
From the comparison in Table 10, it can be seen that the computation time of our proposed scheme is significantly lower than that of the schemes in [38, 40]. In our proposed scheme, the client does not need to participate in the diagnosis result generation process, whereas the schemes in [38, 39] require the participation of the client, which imposes a heavy computation overhead on the client.

Comprehensive Comparison Analysis.
We performed a comparison analysis of the whole privacy preserving online disease diagnosis process, divided into secret diagnosis request generation (data encryption), diagnosis value computation, and diagnosis result generation. The results are shown in Table 11.
From the comparison in Table 11, the total time of our proposed scheme is significantly lower than that of the schemes in [38-40]. In practical application scenarios, a large number of users will constantly initiate secret diagnosis requests, so responding quickly with diagnosis results is very important. Therefore, our scheme has more practical application value. Finally, we summarize the comparison in Table 12.

Conclusion
In this paper, we propose an efficient and privacy preserving outsourced multiclass SVM training and online-aided disease diagnosis scheme. We design several secure basic operation algorithms for machine learning training over data outsourced by multiple data owners and achieve privacy preserving multiclass SVM training based on these algorithms. In the diagnosis phase, we achieve privacy preserving multiclass diagnosis through our proposed secure maximum finding algorithm and secure comparison algorithm. Security analysis proves that our proposed scheme ensures that the outsourced data, model parameters, users' symptoms, and diagnosis results will not be leaked. Experimental evaluation illustrates that our proposed scheme significantly reduces the computation overhead. In the future, we will study more efficient privacy preserving machine learning schemes.

Data Availability
The data supporting the results of this study can be obtained from the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.