An Efficient Online Multiparty Interactive Medical Prediagnosis Scheme with Privacy Protection

Medical prediagnosis systems are now available online to give users quick, preliminary diagnostic information. The need for such systems is particularly evident in areas with too few health professionals. Because patient medical information is private and cloud diagnosis models are sensitive, the data, the models, and the communications must all be protected. Existing diagnosis systems can hardly provide satisfactory diagnostic accuracy while also ensuring comprehensive security and high efficiency. To solve these problems, we propose the Relief-k minimum Wasserstein distance (Relief-kMW) classification method and combine it with data encryption and BLS signatures to form a privacy-preserving, efficient online multiparty interactive medical prediagnosis scheme (OMPD). Theoretical analysis shows that OMPD provides high-precision prediagnosis services. Extensive experimental results demonstrate that OMPD not only greatly improves diagnostic accuracy but also reduces computational and communication overhead.


Introduction
With the rapid development of the mobile Internet, wearable devices, and the intelligent Internet of Things, online medical prediagnosis systems that can provide prediagnosis services and medical advice anytime and anywhere have received extensive research attention. Typically, an online medical prediagnosis system needs to provide a high degree of diagnostic accuracy along with a strong level of privacy protection. To achieve these two goals at the same time, or at least maintain an acceptable balance between them, multiple factors need to be considered. Firstly, a good classifier must be carefully selected from the existing classification algorithms to achieve high diagnostic accuracy, and it must be refined to capture the particular nature of the online diagnostic problem. The existing literature already provides candidate algorithms such as random forests [1], neural networks [2], and other methods [3][4][5], which have been widely used in medical diagnosis. However, these algorithms often suffer from long training and response times, low diagnostic accuracy, or weak privacy protection. Therefore, we need to reinvestigate the potential of existing classifiers and select one that can be modified to offer the best experimental performance in terms of computational overhead and privacy protection.
Secondly, to keep the medical data used by an online medical prediagnosis system protected from malicious users who may attack and profit from these data, we need to choose privacy protection schemes [6][7][8][9][10][11][12][13][14][15] that fit the security needs. There are three kinds of privacy protection schemes: anonymity protection, differential privacy protection, and homomorphic encryption. Anonymity protection (such as k-anonymity [9] and l-diversity [10]) simply erases users' private information, which may decrease diagnostic accuracy because the deleted data cannot be used. Differential privacy protection [11][12][13] protects users' privacy by adding noise to obfuscate the query response. However, this random noise can mask critical information that is needed to boost diagnostic accuracy. Unlike the prior two schemes, homomorphic encryption [14][15] can strictly protect privacy without destroying the original data. However, its encryption and decryption processes usually incur huge computational overhead. Therefore, making homomorphic encryption work for online medical prediagnosis still requires considerable effort to reduce its computational burden without sacrificing much protection strength.
In addition, a multiparty interactive system needs to address communication security. This is particularly essential for online medical prediagnosis scenarios with vulnerable and insecure communication channels, where the messages transmitted between two parties can be eavesdropped on by malicious attackers. Hence, an online medical prediagnosis system needs an efficient information security scheme to protect its communication. Existing methods for this purpose [16][17][18][19] include secure multiparty protocols, digital signatures, and other data encryption methods. In this paper, we choose a digital signature scheme for message authentication. Used together with encryption under the recipient's public key, it ensures that only the correct recipient can obtain the communication content through its private key and that the content is not maliciously stolen or tampered with. The BLS short signature [16] is considered to be one of the most effective methods. It authenticates and confirms users' identity information and prevents others from fraudulently using it.
Based on the above three points, it is difficult to combine a classification method with a privacy protection method while keeping computation and communication overhead low enough for high system efficiency. In this paper, we propose an online multiparty interactive medical prediagnosis service scheme with high efficiency, high precision, and privacy protection, called OMPD. It protects the private information of medical users, the large number of medical instances held by the hospital, and the diagnosis model in the cloud. Users can obtain online medical prediagnosis services even though the original data is not available in the cloud. The main contributions of this paper are as follows:
(1) OMPD can provide high-precision prediagnosis services. We introduce a data preprocessing method (simple data encryption and some conversions) based on the newly proposed Relief-kMW classifier, and these data processing operations do not change the original classification accuracy of the classifier. To verify the classification accuracy of OMPD, we conducted accuracy analysis and experiments on two real data sets from the UCI machine learning repository (http://archive.ics.uci.edu/ml). Experimental results show that OMPD can provide high-precision services.
(2) OMPD is a three-party interactive system that can provide medical prediagnosis services with full-process protection. By preprocessing medical data and applying BLS signatures to the communicated messages, the security of the private information of medical users, the instances in the hospital's database, the diagnostic model in the cloud, and the interaction can be guaranteed.
The rest of this paper is organized as follows: Section 2 introduces related works, and Section 3 introduces the Relief-kMW classification method and BLS signatures. Section 4 describes the entire process of OMPD in detail. Section 5 carries out the accuracy and security analysis. The experimental evaluation is carried out in Section 6. Finally, we conclude in Section 7.

Related Works
In recent years, the efficiency and privacy of medical diagnosis services have attracted growing attention, and many solutions [20][21][22][23][24][25] have been proposed. To address the privacy and security issues in online medical prediagnosis services, homomorphic encryption can protect the private information in medical data well and is widely used in medical diagnosis schemes. For example, literature [22] constructed three classification protocols based on the Paillier cryptosystem to protect the security of data collected from medical users and service providers. Literature [23] developed a privacy-preserving automatic diagnosis system in which the remote server classifies the biomedical signal provided by the client without learning anything about the signal itself or the final classification result. Liu et al. [24] proposed a privacy-preserving patient-centered clinical decision support system based on additive homomorphism to help clinicians diagnose patients' disease risks. Hua et al. [25] proposed an efficient and privacy-preserving medical diagnosis framework, which outsourced the accurate diagnosis model to a cloud server in an encrypted manner based on partial decryption and secure comparison technology to achieve the advantages of two-way ciphertext quantification. Since all encryption operations in these schemes are based on homomorphic encryption, the huge computational overhead makes them extremely inefficient and unsuitable for online medical service scenarios.
In addition, many online medical prediagnosis schemes based on various machine learning classification algorithms [26][27][28][29][30][31][32][33] adopt different privacy protection strategies. Wu et al. [27] designed a new efficient and privacy-preserving conditional oblivious transfer protocol. The literature [28] proposed a novel privacy-preserving biometric identification scheme, which improved efficiency by using the power of cloud computing. Zhang et al. [29] proposed a cloud-based privacy-preserving deep computation model to improve the efficiency of big data feature learning. These schemes mainly protect the privacy of users' query vectors but do not protect the security of communication information and diagnostic models.
Zhang et al. [32] proposed a disease prediction system (called PPDP) that used random matrices to construct new medical data encryption, disease learning, and disease prediction. Although the use of a single-layer perceptron makes the disease prediction stage simple and efficient, the accuracy of the prediagnosis is not high enough. In the disease learning stage, constantly updating the weights until convergence consumes huge computation and communication overhead. At the same time, in this three-party interaction scenario, the users' private information is transmitted directly without communication security. Zhu et al. [33] proposed an efficient and privacy-preserving medical primary diagnosis scheme based on kNN (called EPDK). With lightweight multiparty random masking and polynomial aggregation technology, users can ensure the security of their sensitive information in the online medical diagnosis. This is an interactive service in which only the user and the server participate. The diagnostic model of the server is based on the original medical data set, and medical data that is not encrypted or processed is vulnerable to attack or theft. In addition, the classification accuracy of EPDK is still not high enough. There are few schemes that provide comprehensive privacy protection together with high-precision and high-efficiency prediagnosis services.
Wireless Communications and Mobile Computing

Preliminaries
This section introduces the Relief-kMW classifier used by OMPD and the BLS signature technology to protect communication security.
3.1. Relief-kMW Classifier. The Relief-kMW classifier first uses the Relief algorithm [34] to obtain the feature weights $weight = (w_1, w_2, \ldots, w_n)$, and then calculates the Relief-Wasserstein distance between the query vector $X = (x_1, x_2, \ldots, x_n)$ and each medical instance $Y^{(j)} = (y^{(j)}_1, y^{(j)}_2, \ldots, y^{(j)}_n)$, $j \in [1, N]$, in the database $D$, where $n$ is the number of features in each vector and $N$ is the total number of instances in the database. The specific steps are as follows.
3.1.1. Relief Feature Weight Distribution Algorithm. Before the Relief-kMW classifier works, the weight of each feature is calculated according to the Relief feature weight distribution algorithm (as shown in Algorithm 1).
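The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `relief_weights`, the `seed` parameter, and in particular the form of `diff` (absolute difference divided by the feature's value range, a common choice for numeric features) are assumptions made for the example.

```python
import random


def relief_weights(samples, labels, m, seed=0):
    """Relief feature weights following the loop structure of Algorithm 1.

    Assumption: diff(i, R, H) = |R[i] - H[i]| / (max_i - min_i).
    Every class must contain at least two samples so that a same-class
    neighbour exists.
    """
    rng = random.Random(seed)
    n = len(samples[0])
    # Value range of each feature, used to normalise diff into [0, 1].
    spans = []
    for i in range(n):
        column = [s[i] for s in samples]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(i, r, other):
        return abs(r[i] - other[i]) / spans[i]

    def nearest(idx, same_class):
        # Nearest neighbour of sample idx inside (hit) or outside (miss) its class.
        candidates = [
            j for j in range(len(samples))
            if j != idx and (labels[j] == labels[idx]) == same_class
        ]
        return min(
            candidates,
            key=lambda j: sum(diff(i, samples[idx], samples[j]) for i in range(n)),
        )

    weights = [0.0] * n
    for _ in range(m):
        idx = rng.randrange(len(samples))
        r = samples[idx]
        hit = samples[nearest(idx, True)]
        miss = samples[nearest(idx, False)]
        for i in range(n):
            # Penalise features that differ within a class, reward those
            # that differ across classes.
            weights[i] += -diff(i, r, hit) / m + diff(i, r, miss) / m
    return weights
```

On a toy set where the first feature separates the classes and the second does not, for example `relief_weights([[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]], [0, 0, 1, 1], m=20)`, the first weight comes out positive and the second negative.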
The function $\mathrm{diff}(i, R, H)$ in Algorithm 1 represents the difference between the sample $R$ and the sample $H$ on the $i$-th feature, and its formula is as follows:

Definition 1 (Relief-Wasserstein distance). Suppose $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$ are two input samples, where $n$ is the total number of features, $\theta_i$ is the iterative value of the difference between $X$ and $Y$ on each component, and $weight = (w_1, w_2, \ldots, w_n)$ is the weight vector of the corresponding features obtained by Algorithm 1. Then, we calculate the following:

The Relief-Wasserstein distance between $X$ and $Y$ is

Then, we select the $k$ closest instances. Finally, we take the majority class among these $k$ instances as the final classification result. In fact, there is no theoretically optimal choice of $k$; it depends on the data characteristics and the classification requirements. Studies on the optimal value of $k$ [35] point out that the best choice of $k$ for a given data set may depend on many attributes of the data. They carried out $k$ selection experiments for specific data sets under different applications and chose, as far as possible, the best $k$ for each application.
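The classification step can be illustrated with a short Python sketch. It assumes that the iterative value is the running weighted difference, $\theta_i = \theta_{i-1} + w_i (x_i - y_i)$ with $\theta_0 = 0$, and that the distance is $\sum_i |\theta_i|$; this form is inferred from the cumulative sums that appear in the proof of Theorem 3 and should be read as an assumption. The function names are hypothetical.

```python
from collections import Counter


def relief_wasserstein(x, y, weights):
    """Assumed form: theta_i = theta_{i-1} + w_i * (x_i - y_i), RW = sum |theta_i|."""
    theta = 0.0
    total = 0.0
    for xi, yi, wi in zip(x, y, weights):
        theta += wi * (xi - yi)
        total += abs(theta)
    return total


def classify(query, instances, labels, weights, k):
    """Majority label among the k instances with the smallest distance."""
    ranked = sorted(
        range(len(instances)),
        key=lambda j: relief_wasserstein(query, instances[j], weights),
    )
    votes = Counter(labels[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]
```

For instance, `relief_wasserstein([1, 2, 3], [1, 1, 1], [1, 1, 1])` accumulates the running differences 0, 1, and 3 and returns 4.0.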

3.2. BLS Signature. Suppose there is a large prime number $q$ and two cyclic groups $G_1$ and $G_2$ whose orders are both $q$, and $g$ is a generator of $G_1$. Then, there is a mapping $e: G_1 \times G_1 \to G_2$ which, for $\forall a, b \in Z_q^{\star}$ and $\forall u, v \in G_1$, has the following properties:

Definition 2 (BLS signature). The bilinear parameter generator takes a security parameter $\mu$ as input and outputs a 5-tuple $(q, g, G_1, G_2, e)$, where $q$ is a $\mu$-bit prime number, $G_1$ and $G_2$ are two cyclic groups of order $q$, $g$ is the generator of $G_1$, and $e: G_1 \times G_1 \to G_2$ is a nondegenerate and efficiently computable bilinear mapping. The steps of the BLS signature are as follows:
(1) Generate the public key. The message sender randomly selects an integer as the private key $SK$, with $0 < SK < q - 1$, and then calculates the public key:
(2) Create a signature. The sender creates a signature by performing the following operation on the message $m \in G_1$:
(3) Verify the signature. After receiving the signature, the receiver checks the following equation, and the message content can be obtained after the verification succeeds:
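Under the textbook BLS construction, which matches the forms $PK_H = g^{SK_H}$ and $Sig_U = \mathrm{Hash}(X_Q \| TS_1)^{SK_U}$ used in Section 4, the bilinearity property and the three steps can be written as below. This is a sketch of the standard scheme under that assumption; the symbol $\sigma$ for the signature is introduced here for illustration.

```latex
\begin{align*}
e(u^{a}, v^{b}) &= e(u, v)^{ab} && \text{(bilinearity)} \\
PK &= g^{SK} && \text{(1) public key} \\
\sigma &= m^{SK} && \text{(2) signature on } m \in G_1 \\
e(\sigma, g) &= e(m, PK) && \text{(3) verification}
\end{align*}
```

The verification equation holds for an honest signature because $e(m^{SK}, g) = e(m, g)^{SK} = e(m, g^{SK})$, which follows directly from bilinearity.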

OMPD Scheme
Our OMPD includes three entities: the hospital, the cloud, and the users. And it consists of six stages: initialization, query generation, query processing, prediagnosis service, query result analysis, and result acquisition. Figure 1 shows the flow of the OMPD. For ease of expression, Table 1 gives the description of the notations used in the following sections.
4.1. Initialization. The hospital first generates the bilinear parameters $(q, g, G_1, G_2, e)$. Then, the hospital chooses a random number as its private key $SK_H$ ($SK_H \in Z_q^{\star}$), calculates the public key $PK_H = g^{SK_H}$, and sets the parameters $t_1$, $t_2$, and $t_3$, where $t_1 > t_2 > t_3$. It also chooses a secure asymmetric encryption algorithm $E()$ and a cryptographic hash function $\mathrm{Hash}()$, where $\mathrm{Hash}(): \{0, 1\}^{\star} \to G_1$. The hospital securely saves its private key $SK_H$ as the master key and publishes the system parameters.
Suppose there is a medical data set $\{D_l = \{Y^{(j)}_l = (y^{(j)}_{l1}, y^{(j)}_{l2}, \ldots, y^{(j)}_{ln_l}),\ j \in [1, N_l]\},\ l \in [1, L]\}$ in the hospital database. These medical instances cover $L$ kinds of diseases, and each disease $l$ corresponds to a data subset $D_l$ containing $N_l$ instances, each with $n_l$ features. The hospital uses Algorithm 1 to obtain the feature weights $weight_l = (w_{l1}, w_{l2}, \ldots, w_{ln_l})$ for each data subset $D_l$. The preprocessing of medical data is shown in Algorithm 2. The hospital selects two large prime numbers $p$ and $a$ and sets $\mathrm{len}(p) = \mathrm{len}(t_1)$ and $\mathrm{len}(a) = \mathrm{len}(t_2)$. Next, it chooses a large random number $\beta \in Z_p$. Each medical instance $Y^{(j)}_l$ is preprocessed accordingly. The hospital keeps the parameters $(p, a, b^{(j)}_i)$ secret and sends all preprocessed medical instances to the cloud. After obtaining the preprocessed medical instances, the cloud randomly selects part of the data as the test set to obtain the optimal value $k_l$ of the classifier for the disease $l$. The cloud then holds the Relief-kMW classifier with the best value $k_l$ and the preprocessed medical instances received from the hospital.

4.2. Query Generation. The user generates the query vector $X = (x_1, x_2, \ldots, x_{n_l})$. Then, the user encrypts the query vector with the hospital's public key $PK_H$ to get $X_Q = E_{PK_H}(l \| x_1 \| x_2 \| \cdots \| x_{n_l})$, where $l$ is the disease that the user wants to query. Next, the user creates a signature $Sig_U = \mathrm{Hash}(X_Q \| TS_1)^{SK_U}$ with the private key $SK_U$ and sends $ID_U \| X_Q \| Sig_U \| TS_1$ to the hospital.

Algorithm 1: Relief feature weight distribution algorithm.
Input: training data set $D$, number of sampling rounds $m$, feature set $F = (f_1, f_2, \ldots, f_n)$ with $n$ features in total.
Output: feature weights $weight = (w_1, w_2, \ldots, w_n)$.
Set all feature weights to 0;
for $r = 1$ to $m$ do
    Randomly select a sample $R$;
    Find the nearest neighbor sample $H$ in the same class as $R$ and the nearest neighbor sample $M$ in a different class from $R$;
    for $i = 1$ to $n$ do
        $w_i = w_i - \mathrm{diff}(i, R, H)/m + \mathrm{diff}(i, R, M)/m$;
    end for
end for
return $weight$.

4.3. Query Processing. After receiving $ID_U \| X_Q \| Sig_U \| TS_1$ from the user, the hospital first confirms $ID_U$ and then uses the following formula to verify the validity of the message:

If the equation holds, the hospital performs Algorithm 2. The processing method is the same as that of the medical instances. The hospital selects a large random number $\gamma \in Z_p$ and then performs the vector feature weight distribution (to get the vector $IX = (Ix_1, Ix_2, \ldots, Ix_{n_l})$), the vector value iterative transformation (to get the vector $IIX = (IIx_1, IIx_2, \ldots, IIx_{n_l})$), and the two-dimensional forward expansion (to get the vector $IIIX = (IIIx_1, IIIx_2, \ldots, IIIx_{n_l+2})$), and obtains the preprocessed query vector $EX = (ex_1, ex_2, \ldots, ex_{n_l+2})$ after encryption.
The hospital calculates $Q = E_{PK_C}(l \| \beta \| \gamma \| ex_1 \| ex_2 \| \cdots \| ex_{n_l+2})$ and keeps the parameters $(p, a, b_i)$ secret. Then, it uses its private key $SK_H$ to create a signature $Sig_H = \mathrm{Hash}(Q \| TS_2)^{SK_H}$ and sends $Q \| Sig_H \| TS_2$ to the cloud.
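The three plaintext transformations named above can be sketched in Python as follows. The sketch rests on assumptions drawn from the proof of Theorem 3: the iterative transformation is taken to be a running sum of the weighted features, and the forward expansion is taken to place two zero-valued dimensions in front. The final encryption step that produces $EX$ is not reproduced, and all function names are hypothetical.

```python
from itertools import accumulate


def weight_distribution(x, weights):
    """IX: scale every feature by its Relief weight."""
    return [w * v for w, v in zip(weights, x)]


def iterative_transformation(ix):
    """IIX: running sum, so component i holds w_1*x_1 + ... + w_i*x_i (assumed)."""
    return list(accumulate(ix))


def forward_expansion(iix):
    """IIIX: two zero-valued dimensions placed in front (assumed position)."""
    return [0.0, 0.0] + iix


def preprocess(x, weights):
    """Apply the three plaintext steps in order; encryption is left out."""
    return forward_expansion(iterative_transformation(weight_distribution(x, weights)))


def l1_distance(u, v):
    """Sum of absolute component-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


weights = [0.5, 2.0, 1.0]
x = [4.0, 1.0, 3.0]
y = [2.0, 2.0, 1.0]
iiix = preprocess(x, weights)  # [0.0, 0.0, 2.0, 4.0, 7.0]
iiiy = preprocess(y, weights)  # [0.0, 0.0, 1.0, 5.0, 6.0]
rw1 = l1_distance(iiix, iiiy)  # 3.0
```

Under these assumptions the component-wise differences of the two expanded vectors are exactly the running weighted differences of the original vectors, so the two padded zeros add nothing to the distance and the distance on the preprocessed plaintext equals the one on the original data, which is what Theorem 3 states.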

4.4. Prediagnostic Service. After receiving $Q \| Sig_H \| TS_2$ from the hospital, the cloud verifies the validity of the message by using the following formula:

If the above equation holds, the cloud performs the prediagnosis service algorithm. As shown in Algorithm 3, the cloud finds the data subset $D_l$ corresponding to the disease $l$ and calculates the Relief-Wasserstein distance between $EX$ and each preprocessed encrypted medical instance in $D_l$. Then, the cloud selects the $k_l$ instances with the smallest distance and takes the majority class among these $k_l$ instances as the final prediagnosis result $R$. The time complexity of Algorithm 3 is $O(N_l n_l)$. The cloud keeps each $RW$ and the computation rules secret. Then, it uses the hospital's public key to calculate $R_l = E_{PK_H}(R)$ and uses $SK_C$ to create a signature $Sig_C = \mathrm{Hash}(R_l \| TS_3)^{SK_C}$. Finally, the cloud sends $R_l \| Sig_C \| TS_3$ to the hospital.

4.5. Query Result Analysis. After receiving $R_l \| Sig_C \| TS_3$ from the cloud, the hospital verifies the validity of the message by using the following formula:

The prediagnosis result $R$ can be obtained if the equation holds. The hospital gives some advice $A_l$ for this result and calculates $RA_l = E_{PK_U}(R \| A_l)$. Then, the hospital uses its private key to create a signature $Sig_H = \mathrm{Hash}(RA_l \| TS_4)^{SK_H}$ and sends $RA_l \| Sig_H \| TS_4$ to the user.
4.6. Result Acquisition. After receiving $RA_l \| Sig_H \| TS_4$ from the hospital, the user verifies the validity of the message by using the following formula:

If the equation holds, the user can obtain the prediagnosis result $R$ and the advice $A_l$.
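Assuming that every party's public key has the form $PK = g^{SK}$ and that the textbook BLS check is used, the four verification steps of Sections 4.3 to 4.6 would take the following shape. Here $PK_U$ and $PK_C$ denote the public keys of the user and the cloud; the exact equations of the scheme may differ from this sketch.

```latex
\begin{align*}
e(Sig_U, g) &= e(\mathrm{Hash}(X_Q \,\|\, TS_1),\ PK_U) && \text{hospital verifies the user (4.3)} \\
e(Sig_H, g) &= e(\mathrm{Hash}(Q \,\|\, TS_2),\ PK_H) && \text{cloud verifies the hospital (4.4)} \\
e(Sig_C, g) &= e(\mathrm{Hash}(R_l \,\|\, TS_3),\ PK_C) && \text{hospital verifies the cloud (4.5)} \\
e(Sig_H, g) &= e(\mathrm{Hash}(RA_l \,\|\, TS_4),\ PK_H) && \text{user verifies the hospital (4.6)}
\end{align*}
```

Each check binds the ciphertext to its timestamp, so a replayed message with an altered $TS$ would fail verification.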

Accuracy and Security Analysis
This section analyzes the accuracy and security of OMPD. We verify that the data preprocessing of OMPD does not affect the original classification accuracy and can ensure privacy security.

5.1. Accuracy Analysis. If it can be verified that the Relief-Wasserstein distance $RW_2$ between the preprocessed data calculated by Algorithm 3 and the Relief-Wasserstein distance $RW$ between the original data are approximately a fixed multiple of each other, it can be proved that our scheme does not affect the original classification accuracy.

Theorem 3. Assuming that $RW_1$ represents the Relief-Wasserstein distance between $IIIX$ and $IIIY^{(j)}_l$ and $RW$ represents the Relief-Wasserstein distance between the two original data, then $RW$ is equal to $RW_1$.

Proof: The calculation process of the Relief-Wasserstein distance $RW$ between the original data is as follows:

$RW_1$ represents the Relief-Wasserstein distance between $IIIX$ and $IIIY^{(j)}_l$, and the calculation process of $RW_1$ is

$$\ldots,\quad \theta_{n_l+2} = IIIx_{n_l+2} - IIIy^{(j)}_{l,n_l+2} = (w_1 x_1 + w_2 x_2 + \cdots + w_{n_l} x_{n_l}) - (w_1 y^{(j)}_{l1} + w_2 y^{(j)}_{l2} + \cdots + w_{n_l} y^{(j)}_{ln_l}),$$

Table 1: Description of notations. Recoverable entries: the components of $X$, $IX$, $IIX$, $IIIX$, and $EX$; the two large prime numbers chosen by the hospital.

Algorithm 3: Prediagnosis service algorithm.
Input: the preprocessed query vector $EX$, $\beta$, $\gamma$, $l$, and $k_l$.
Output: prediagnosis result $R$.
$EX = \gamma^{-1} \cdot EX$;
for $j = 1$ to $N_l$ do
    set $RW = 0$;
    for $i = 1$ to $n_l + 2$ do
        li j …;
    end for
end for
Select the $k_l$ instances with the smallest $RW$ between the $N_l$ instances and the query vector, and use the majority class among these $k_l$ instances as the prediagnosis result $R$;
return $R$.

Proof of Theorem 4: $RW_1$ represents the Relief-Wasserstein distance between $IIIX$ and $IIIY^{(j)}_l$, and the calculation process of $RW_1$ is as follows:

$RW_2$ represents the Relief-Wasserstein distance between $EX$ and $EY^{(j)}_l$, and the calculation of $RW_2$ is as follows:

From $t_1 > t_2 > t_3$, we can get the following:

From $t_1 > t_2 > t_3$, we can also get that $b_i - b^{(j)}_i$ is negligible relative to the large prime number $a$, and then

Finally, Theorem 4 holds, and $RW_1$ and $RW_2$ are approximately in a fixed multiple relationship. It follows from Theorems 3 and 4 that the cloud can still obtain accurate prediagnosis results through Algorithm 3 without obtaining the original data.

5.2. Security Analysis. The medical instances $Y^{(j)}_l = (y^{(j)}_{l1}, y^{(j)}_{l2}, \ldots, y^{(j)}_{ln_l})$, $j \in [1, N_l]$, in the hospital data subset $D_l$ are privacy-preserving. In the initialization stage, to prevent the privacy of medical instances from leaking, the hospital expands each medical instance by two zero-valued dimensions after the feature weight distribution and vector value iteration, which prevents the cloud and illegal users from obtaining the real medical instances.
Each $b^{(j)}_{li}$ in the encryption calculation is randomly generated, so $IIIy^{(j)}_{li}$ is protected. When $p$, $a$, $b^{(j)}_{li}$, and $weight_l$ are unknown, the cloud cannot obtain the original medical data. The random numbers $b^{(j)}_{li}$ are generated independently each time. Only the hospital knows the data processing rules, so the cloud cannot infer the original medical data. Therefore, the medical instance set $Dataset = \{D_l = \{Y^{(j)}_l,\ j \in [1, N_l]\},\ l \in [1, L]\}$ is kept secret during the calculation process. Similarly, the user's query vector $X = (x_1, x_2, \ldots, x_{n_l})$ is also kept secret.
The Relief-kMW classifier is confidential. In the operating calculation phase, each medical instance $Y^{(j)}_l = (y^{(j)}_{l1}, y^{(j)}_{l2}, \ldots, y^{(j)}_{ln_l})$, $j \in [1, N_l]$, and each query vector $X = (x_1, x_2, \ldots, x_{n_l})$ are preprocessed by the hospital before being sent to the cloud. Two identical query vectors would normally have the same Relief-Wasserstein distance to the same medical instance, but the random numbers $b^{(j)}_{li}$ and $b_i$ generated in each round of data processing differ, so the computed Relief-Wasserstein distances differ as well. This ensures that even the same user cannot obtain medical instance information through multiple queries. Each $RW$ calculated by the Relief-kMW classifier is confidential, and the cloud uses the hospital's public key $PK_H$ to encrypt the query result. The hospital uses the user's public key $PK_U$ to encrypt the query result and the advice, so only the corresponding user can decrypt them. Moreover, the user and the cloud cannot communicate directly during the query. Although the cloud knows the final prediction result, it cannot obtain the corresponding user's information, and the user cannot obtain detailed information about the Relief-kMW classifier in the cloud. Therefore, the Relief-kMW classifier is confidential.
OMPD ensures communication security. Our scheme uses BLS signatures to protect the messages of each interaction. The signature is proved to be secure under the Diffie-Hellman problem in the random oracle model [36]. In addition, an illegal user cannot successfully submit a query request to the hospital because they hold no valid key. Signature authentication ensures that a message is not maliciously tampered with during transmission. Even if someone maliciously intercepts a message, they cannot obtain its effective information without the key.
In summary, our OMPD can provide privacy-preserving medical prediagnosis services.

Performance Evaluation
In this section, multiple experiments are conducted to verify the accuracy and efficiency of OMPD from multiple dimensions of accuracy, computation overhead, and communication overhead.
6.1. Experiment Configuration. We implement OMPD in the Python programming language and evaluate its computation and communication overhead. We carry out our experiments on a machine with an Intel(R) Xeon(R) Silver 4114 CPU at 2.20 GHz and 32 GB of memory. We choose two real data sets from the UCI machine learning repository, Wisconsin Breast Cancer (WBC) and Mammographic Mass (MM), to evaluate the accuracy of OMPD. In the comparative experiment, OMPD uses the Relief-kMW classifier, OMPD-kMW uses the kMW classifier, EPDK [33] is a two-party interactive prediagnosis scheme using a kNN classifier, and PPDP [32] is a three-party interactive prediagnosis scheme using a single-layer perceptron trained with encryption matrices as the classifier.
The WBC contains 683 instances, including 444 benign instances and 239 malignant instances. Each instance contains 9 attributes. The MM contains 830 instances, of which 427 instances are benign and 403 instances are malignant. Each instance contains 5 attributes.
6.2. Selection of the Optimal Value of k. For WBC and MM, we randomly selected 100 malignant instances and 100 benign instances as the test data set to evaluate the accuracy of OMPD, and the rest were used as the training data set. Then, we performed 100 runs to compare the average accuracy and computation time under different values of $k$. As shown in Table 2, when $k_{WBC} = 9$ and $k_{MM} = 11$, the classification accuracy of OMPD is the highest and the fluctuation in computation time is the smallest. Therefore, in the following experiments, the values of $k_{WBC}$ and $k_{MM}$ are 9 and 11, respectively.
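A selection procedure of this kind can be sketched in Python as follows. The sketch only compares held-out accuracy for a list of candidate values and does not reproduce the 100 repeated runs or the timing comparison; the function names and the pluggable `distance` argument are assumptions made for the example.

```python
from collections import Counter


def knn_predict(query, train, k, distance):
    """Majority label among the k training items closest to the query.

    train is a list of (vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: distance(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def best_k(train, test, candidates, distance):
    """Return (k, accuracy) with the highest held-out accuracy.

    Ties keep the smaller k because candidates are scanned in ascending order.
    """
    best = None
    for k in sorted(candidates):
        hits = sum(knn_predict(x, train, k, distance) == y for x, y in test)
        accuracy = hits / len(test)
        if best is None or accuracy > best[1]:
            best = (k, accuracy)
    return best
```

Any distance function with the signature `distance(u, v)` can be passed in, so the same loop works for a plain kNN metric or for a Relief-weighted one.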
6.3. Evaluation of Accuracy. To verify the accuracy of our method, we conducted an accuracy comparison experiment, as shown in Table 3. The results show that the classification accuracy of OMPD is significantly higher than that of the other three schemes. This means that OMPD still has high classification accuracy after the data is preprocessed by feature weight distribution, iteration, expansion, and encryption, and that it can provide high-precision online medical prediagnosis services for medical users. Note that the accuracy of OMPD is higher than that of OMPD-kMW. The reason is that the feature weights obtained with the Relief feature weight distribution algorithm increase the impact of features that contribute more to the classification and reduce the impact of features that contribute less. The experimental results thus verify that Relief-kMW is more accurate than kMW.

6.4. Evaluation of Computational Efficiency. To verify that OMPD can provide efficient online medical prediagnosis services for medical users, this section evaluates its computational efficiency. During system operation, after the user generates the query vector and sends it to the hospital, the hospital preprocesses the query vector with $(3n_l + 2)$ multiplications (divisions) and $(2n_l - 1)$ additions (subtractions). After the cloud receives the query request, it needs $(N_l n_l + 2N_l + n_l + 2)$ multiplications (divisions) and $(2N_l n_l + 4N_l)$ additions (subtractions) to calculate the diagnosis result. Let $C_a$ and $C_m$ denote the running time of one addition and one multiplication, respectively. Then, the overall computational cost of OMPD during system operation is $C_m \cdot (N_l n_l + 2N_l + 4n_l + 4) + C_a \cdot (2N_l n_l + 4N_l + 2n_l - 1)$. As shown in Table 4, OMPD-kMW differs from OMPD only in omitting one multiplication step (multiplying the feature values by the feature weights). The computational complexity of PPDP in the operating phase does not depend on the number of instances $N_l$; its cost is concentrated in the system initialization stage, where many calculations are required to update the weights until convergence. OMPD needs far fewer multiplications (divisions) than EPDK, and $C_m$ is larger than $C_a$. Therefore, the computational cost of OMPD is lower than that of EPDK, and the gap becomes more obvious as the amount of data increases. However, Table 4 does not show the computation time directly, so we conducted a comparison experiment of computational overhead. As shown in Figure 2, as the number of medical instances increases, the computation time of EPDK grows rapidly, that of OMPD and OMPD-kMW grows slowly, and that of PPDP remains almost unchanged. This is because EPDK spends more time on operations such as multiplication (division).
In addition, the large amount of medical data stored by EPDK providers is collected directly from various medical service sites without encryption or other privacy protection measures, which poses a serious risk of privacy leakage. As shown in Table 4, OMPD and OMPD-kMW require significantly fewer calculations than EPDK. PPDP only needs to combine the query data with the weights to get the diagnosis result, so its computation time in the running phase remains unchanged; however, PPDP incurs a lot of computation and communication overhead in the initialization phase to update the weights until convergence. PPDP also ignores the protection of the communication process: users send sensitive information directly to the hospital, which is vulnerable to attacks and theft by malicious third parties.
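The operation counts stated in this section can be checked with a few lines of Python. The function names are hypothetical, and the example values $N_l = 483$ and $n_l = 9$ are an illustration that corresponds to the WBC data set after 200 of its 683 instances are held out for testing.

```python
def ompd_operation_counts(N, n):
    """Total multiplications and additions of the running phase (hospital + cloud)."""
    hospital_mul = 3 * n + 2
    hospital_add = 2 * n - 1
    cloud_mul = N * n + 2 * N + n + 2
    cloud_add = 2 * N * n + 4 * N
    return {
        "mul": hospital_mul + cloud_mul,
        "add": hospital_add + cloud_add,
    }


def ompd_running_cost(N, n, c_m, c_a):
    """Overall cost C_m * (#multiplications) + C_a * (#additions)."""
    counts = ompd_operation_counts(N, n)
    return c_m * counts["mul"] + c_a * counts["add"]


wbc = ompd_operation_counts(N=483, n=9)  # {'mul': 5353, 'add': 10643}
```

The hospital and cloud counts add up to the closed form $N_l n_l + 2N_l + 4n_l + 4$ multiplications and $2N_l n_l + 4N_l + 2n_l - 1$ additions, and both totals grow linearly in $N_l$ for a fixed number of features.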
6.5. Evaluation of Communication Overhead. The communication overhead of the four schemes in the system operation phase is compared in Figure 3. We assume that the medical advice contains 100 words, that each word contains an average of 5 characters and, with punctuation and spaces, approximately 6 characters, and that each character occupies 1 byte. The total communication overhead of OMPD is then about 0.735 KB (including 600 B of medical advice). As the number of medical instances increases, the total communication overhead of OMPD, OMPD-kMW, and PPDP remains unchanged, while that of EPDK is much higher than the other three schemes. This is because these three schemes obtain the diagnosis results directly in the cloud, so only the results need to be sent during the communication process. The server in EPDK only calculates the two intermediate values of the Spearman distance between the query vector and each of the $N_l$ medical instances in the database. The server needs to send $2N_l$ intermediate values to the user, and the user finally calculates the diagnosis result. Therefore, the larger $N_l$ is, the greater the communication overhead of EPDK, and users cannot get corresponding medical advice in time.
In summary, OMPD can balance accuracy and efficiency to achieve high-precision online medical prediagnosis with lower computational complexity and communication overhead. At the same time, it also provides timely professional medical advice.

Conclusions
In this article, we proposed an efficient online multiparty interactive medical prediagnosis scheme with privacy protection, called OMPD, which protects privacy with low computation and communication overhead. Accuracy analysis showed that OMPD can obtain more accurate classification results without decryption. The security analysis demonstrated its security strength and privacy protection capabilities, and its effectiveness was verified through comparative experiments. Future work is to further improve the classification accuracy and operating efficiency of the scheme while ensuring the strength and security of privacy protection.

Data Availability
The data used to support the findings of this study are included within the article.