Highly Secure Privacy-Preserving Outsourced k-Means Clustering under Multiple Keys in Cloud Computing

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China Department of Mathematics Teaching and Research, Shanghai Business School, Shanghai, China State Key Laboratory of Integrated Service Networks, Xidian University, Xi’an, China Engineering Research Center of Molecular and Neuro Imaging of Ministry of Education of China and School of Life Science and Technology, Xidian University, Xi’an, China PBC School of Finance, Tsinghua University, Beijing, China School of Information Engineering, Xuchang University, Xuchang, China


Introduction
Data clustering [1,2] enables data records to be classified into groups according to their features, attributes, or similarities.
This property makes it significant in many fields related to data analysis, such as pattern recognition, image processing, information retrieval, geography, and marketing. Furthermore, with the explosive growth of data in the information era, it has become a challenge for digital devices not only to store but also to perform computation on such massive data. Cloud computing relieves this problem by providing a platform with high storage capacity and strong computing power. Users tend to outsource their data to the cloud and authorize the cloud server to compute on the data. The cloud server can therefore perform some computation on the outsourced data on behalf of users, such as keyword search [3], equality test [4], and outsourced data clustering [5]. It is worth noting that, in these applications, the cloud server sends the final result to the data owner. This raises a security issue of data integrity, which has been further researched in [6][7][8][9][10][11].
Outsourced data clustering, in which the cloud classifies data into different groups according to their similarities, makes it possible to efficiently detect abnormalities, segment images, and predict diseases. As a widely applied clustering method, k-means clustering [1] classifies data into k clusters based on their distances from cluster centers. However, the sensitive information of data on the cloud platform cannot be protected by simply using k-means clustering. This calls for privacy-preserving outsourced k-means clustering, where data is classified without exposing its sensitive information. The traditional privacy-preserving k-means clustering schemes [12][13][14][15] protect data privacy by adding noise at the sacrifice of clustering accuracy. Subsequently, some symmetric and asymmetric constructions [16][17][18] have been proposed to improve on this, at the cost of additional computation and communication overhead. The literature on outsourced privacy-preserving clustering schemes falls into two categories, i.e., single-key and multikey clustering: in the former, all of the outsourced data of the owners are encrypted with the same key, while in the latter they are encrypted with different keys. For practical applications, it is necessary to consider privacy-preserving clustering under multiple keys.
Recently, Rong et al. proposed an outsourced k-means clustering scheme [19] under multiple keys. Nevertheless, their scheme is not secure against a semihonest assistant server (AS) and key management server (KMS): AS can extract the ratio of messages, and KMS can even extract all data records of users with its master secret key. In addition, as long as AS obtains one of the data records, it can recover all data records. Such privacy leakage may incur a huge economic loss to the user in practice. To solve this problem, we present a highly secure outsourced k-means clustering scheme under multiple keys in cloud computing.

Our Contribution.
In this paper, we propose a highly secure privacy-preserving outsourced k-means clustering under multiple keys in cloud computing.
We first introduce our system model and threat models. Specifically, the system model includes four entities, i.e., data owners (DOs), query client (QC), cloud computing service (CCS), and key management server (KMS), and the threat models describe semihonest CCS and KMS. Subsequently, based on [19] and BCP homomorphic encryption, we construct five protocols to realize different functions. It is worth noting that the secure multiplication (SM) protocol is defined to achieve the multiplicative homomorphic property using BCP encryption, which only has the additive homomorphic property. We then present a highly secure outsourced k-means clustering scheme under multiple keys in cloud computing, which achieves privacy security against semihonest CCS and KMS. In particular, we use BCP encryption to prevent privacy leakage to CCS, such that a semihonest CCS cannot extract any useful information from the ciphertexts of data records. We then utilize AES encryption to protect privacy against a semihonest KMS. KMS, therefore, cannot extract any data records of data owners although it possesses the master secret key, which can be used to decrypt ciphertexts encrypted using BCP encryption.

The authors of [20] proposed a high-order possibilistic c-means algorithm for big data in cloud computing based on the BGV cryptosystem [21]. However, their scheme is not practical because of its low efficiency. Subsequently, Almutairi et al. [22] improved it and developed a privacy-preserving k-means clustering scheme based on homomorphic encryption but failed to protect the plaintext information in the update of clustering centers. To address this, Yuan and Tian [23] put forward a privacy-preserving clustering scheme using a novel lightweight cryptosystem based on the hardness of learning with errors (LWE) [24]. Their scheme can compute the sum of ciphertexts and compare distances using ciphertexts of multidimensional data. Nevertheless, this scheme is not fully outsourced.

Outsourced Single-Key Clustering.
Lin [25] constructed a privacy-preserving kernel k-means clustering scheme based on linear transformation and a kernel matrix with random perturbation, but this scheme cannot realize ciphertext comparison. Based on the Paillier cryptosystem, Rao et al. [26] proposed a privacy-preserving outsourced distributed clustering protocol in the union cloud environment, which includes a new protocol to compute the Euclidean distance and evaluate the termination condition over the encrypted data. The problem with this scheme lies in its heavy computing load and lack of support for datasets encrypted under multiple keys. Liu et al. [27] constructed a secure KNN multilabel data classification scheme based on the Paillier cryptosystem.

Multikey Clustering.
Gheid and Challal [28] presented a novel privacy-preserving k-means clustering scheme with the multiparty security of Clifton [29]. Peter et al. [30] further proposed a scheme to outsource multiparty computation to the cloud under multiple keys, but it does not support ciphertext comparison. Li et al. [31] applied the BCP homomorphic encryption [32] to multiparty horizontally partitioned databases and then set up the ciphertext comparison for the outsourced privacy-preserving random decision tree algorithm. Rong et al. [19] improved it by presenting an efficient privacy-preserving protocol for outsourced k-means clustering under multiple keys based on the double decryption cryptosystem [33].

Organization.
The rest of this paper is organized as follows. In Section 2, we recall the definitions of k-means clustering, BCP encryption, and AES encryption. The system model and threat models are proposed in Section 3. In Section 4, five basic protocols are constructed, and we present our scheme, in which the defined protocols are invoked throughout. The security proof and performance analysis are given in Section 5. Finally, we conclude this paper in Section 6.

Notations.
We summarize the notations used in this paper in Table 1.

k-Means Clustering.
k-means clustering is an iterative algorithm that allocates l data records into k disjoint clusters, each of which has a center. Let the l m-dimensional data records be $\vec{d}_i = (d_{i,1}, d_{i,2}, \ldots, d_{i,m})$ for $1 \le i \le l$, and let the k clusters be $c_1, c_2, \ldots, c_k$ with centers $\vec{\mu}_1, \vec{\mu}_2, \ldots, \vec{\mu}_k$. The detailed process of k-means clustering is depicted as Algorithm 1. The algorithm takes as input the l m-dimensional data records, a predefined number of clusters k, and a predefined max number of iterations I. First, k cluster centers are picked, and the Euclidean distance between each center and each data record is computed. Each data record is assigned to the cluster whose center has the minimum Euclidean distance to it. After one iteration, the cluster center $\vec{\mu}_j$ is reassigned as the average value of all data records in $c_j$ for $j \in \{1, 2, \ldots, k\}$. If the max number of iterations is reached or the output clusters no longer change, the algorithm terminates and outputs the k clusters.
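The iteration described above can be illustrated with a minimal plaintext sketch. It is not the paper's Algorithm 1 (which is not reproduced in this text); the function names, the seeded random choice of initial centers, and the handling of empty clusters are assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two m-dimensional records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(records, k, max_iter, seed=0):
    """Plaintext k-means: returns (cluster index per record, list of centers)."""
    rng = random.Random(seed)           # seeded only to make the sketch repeatable
    centers = rng.sample(records, k)    # assumption: initial centers are k distinct records
    assignment = None
    for _ in range(max_iter):
        # Assign every record to the cluster whose center is nearest.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(rec, centers[j]))
            for rec in records
        ]
        if new_assignment == assignment:    # clusters no longer change
            break
        assignment = new_assignment
        # Recompute each center as the average of its members.
        for j in range(k):
            members = [rec for rec, a in zip(records, assignment) if a == j]
            if members:                     # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(records, k=2, max_iter=10)
```

Comparing squared distances gives the same nearest center as comparing Euclidean distances, which is also why the encrypted protocols later in the paper can avoid square roots.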

BCP Encryption.
In this paper, we utilize the BCP encryption proposed by Bresson et al. [32], which has the additive homomorphic property and provides a double decryption mechanism. The BCP encryption consists of five algorithms as follows:
(i) Setup(λ). Taking as input a security parameter λ, the setup algorithm picks two primes p, q of the form $p = 2p' + 1$, $q = 2q' + 1$ and computes $N = pq$, where $p', q'$ are also primes. Consider $G = QR_{N^2}$, the cyclic group of quadratic residues modulo $N^2$, and we have $\mathrm{ord}(G) = N\lambda(N)/2$ with $\lambda(N) = 2p'q'$. It chooses $g \in G$, the order of which is $N\lambda(N)/2$, and we have $g^{\lambda(N)} \bmod N^2 = (1 + \alpha N) \bmod N^2$, $\alpha \in \{1, 2, \ldots, N-1\}$. The public parameter pp and the master secret key msk are denoted as $pp = (N, g)$, $msk = (p', q')$.
(ii) KeyGen(pp). Taking as input the public parameter pp, the key generation algorithm randomly chooses $a \in [1, \mathrm{ord}(G)]$ and computes $h = g^a \bmod N^2$. Note that h is of maximal order with high probability. It sets the output public and secret key pair (pk, sk) as $pk = h$, $sk = a$.
(iii) Enc(pk, M). Taking as input a public key pk and a message M, the encryption algorithm randomly chooses $r \in Z_N$ and generates the ciphertext $CT = (A, B)$ as $A = g^r \bmod N^2$, $B = h^r (1 + MN) \bmod N^2$. Specifically, we denote $\mathrm{Enc}_{pk}(M)$ as the encryption of message M under the public key pk.
(iv) Dec(sk, CT). Taking as input a secret key sk and a ciphertext $CT = (A, B)$, the decryption algorithm outputs the message as $M = \frac{(B/A^{a} \bmod N^2) - 1}{N}$. Specifically, we denote $\mathrm{Dec}_{sk}(CT)$ as the decryption of ciphertext CT under the secret key sk.
(v) sDec(msk, CT). Taking as input the master secret key msk and a ciphertext $CT = (A, B)$, the system decryption algorithm computes $a \bmod N = \frac{(h^{\lambda(N)} \bmod N^2) - 1}{N} \cdot \alpha^{-1} \bmod N$ and $r \bmod N = \frac{(A^{\lambda(N)} \bmod N^2) - 1}{N} \cdot \alpha^{-1} \bmod N$. Let $ar \bmod \mathrm{ord}(G) = c_1 + c_2 N$; thus, $c_1 = ar \bmod N$ is efficiently computable. Let π be the inverse of $\lambda(N)$ modulo N. It generates the message as $M = \frac{((B/g^{c_1})^{\lambda(N)} \bmod N^2) - 1}{N} \cdot \pi \bmod N$. Specifically, we denote $\mathrm{sDec}_{msk}(CT)$ as the decryption of ciphertext CT under the master secret key msk.
Specifically, BCP encryption has the additive homomorphic property, which means $\mathrm{Enc}_{pk}(M_1) \cdot \mathrm{Enc}_{pk}(M_2) = \mathrm{Enc}_{pk}(M_1 + M_2)$, where the product of two ciphertexts is taken componentwise modulo $N^2$. This property will be utilized in the whole system.

Table 1: Notations.
QC: Query client
CCS: Cloud computing service
KMS: Key management server
λ: The security parameter
p, q: Two primes of the form p = 2p′ + 1, q = 2q′ + 1
G: The cyclic group of quadratic residues modulo N^2
g: g ∈ G and its order is Nλ(N)/2
msk: Master secret key in BCP encryption
pk, sk: Public key and secret key in BCP encryption
ask: Symmetric key used in AES encryption
V_{n×k}: Location matrix of n records in k clusters
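To make the double decryption mechanism concrete, the following is a minimal toy sketch of the five BCP algorithms in Python. It is an illustration under stated assumptions, not a secure or authoritative implementation: the primes are tiny, the generator search does not verify that g has maximal order, the range used for the secret exponent a is an assumption (users do not know ord(G)), and `sdec` takes the public key h as an explicit argument, which the definition above leaves implicit.

```python
import math
import random

def setup(p=23, q=59):
    """Toy Setup with tiny safe primes p = 2p'+1, q = 2q'+1 (insecure sizes)."""
    n = p * q
    n2 = n * n
    lam = 2 * ((p - 1) // 2) * ((q - 1) // 2)    # lambda(N) = 2 p' q'
    for x in range(2, n):
        if math.gcd(x, n) != 1:
            continue
        g = pow(x, 2, n2)                        # a quadratic residue modulo N^2
        alpha = (pow(g, lam, n2) - 1) // n       # g^lambda = 1 + alpha*N mod N^2
        if math.gcd(alpha, n) == 1:              # sdec needs alpha invertible mod N
            return (n, g), ((p - 1) // 2, (q - 1) // 2)   # pp, msk
    raise ValueError("no suitable g found")

def keygen(pp):
    n, g = pp
    a = random.randrange(1, n * n // 2)          # assumption: public range for a
    return pow(g, a, n * n), a                   # pk = h, sk = a

def enc(pp, pk, m):
    n, g = pp
    r = random.randrange(1, n)
    return pow(g, r, n * n), pow(pk, r, n * n) * (1 + m * n) % (n * n)

def dec(pp, sk, ct):
    n, _ = pp
    n2 = n * n
    return (ct[1] * pow(ct[0], -sk, n2) % n2 - 1) // n

def sdec(pp, msk, pk, ct):
    """Decrypt with the master secret key; pk (= h) is passed explicitly."""
    n, g = pp
    n2 = n * n
    lam = 2 * msk[0] * msk[1]
    alpha_inv = pow((pow(g, lam, n2) - 1) // n, -1, n)
    a_mod_n = (pow(pk, lam, n2) - 1) // n * alpha_inv % n
    r_mod_n = (pow(ct[0], lam, n2) - 1) // n * alpha_inv % n
    c1 = a_mod_n * r_mod_n % n                   # c1 = a*r mod N
    d = pow(ct[1] * pow(g, -c1, n2) % n2, lam, n2)
    return (d - 1) // n * pow(lam, -1, n) % n

def add(pp, ct1, ct2):
    """Additive homomorphism: componentwise product of ciphertexts."""
    n2 = pp[0] ** 2
    return ct1[0] * ct2[0] % n2, ct1[1] * ct2[1] % n2

pp, msk = setup()
pk, sk = keygen(pp)
ct = enc(pp, pk, 123)
```

With these toy parameters N = 1357, so messages and homomorphic sums must stay below 1357. Modular inverses via `pow(x, -1, n)` require Python 3.8 or later.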

AES Encryption.
AES encryption is an efficient symmetric encryption system widely used in practical applications, where symmetric means that encryption and decryption require the same key. We give the simplified definition of AES as follows:
(i) AKeyGen. The sender and receiver agree on the secret key ask of the AES encryption system.
(ii) AEnc. The sender generates the ciphertext CT of message M under the secret key ask following the AES encryption algorithm. We denote it as $CT = \mathrm{AEnc}_{ask}(M)$.
(iii) ADec. The receiver decrypts the ciphertext CT with the secret key ask. We denote it as $M = \mathrm{ADec}_{ask}(CT)$.

System Model.
As shown in Figure 1, our scheme considers four types of entities, i.e., data owner (DO), cloud computing service (CCS), key management server (KMS), and query client (QC).
(i) DO: DO has limited computing power and therefore outsources its encrypted data to the cloud. Our system involves n DOs, denoted as $DO_1, DO_2, \ldots, DO_n$. For $i \in [1, n]$, each $DO_i$ has $n_i$ data records, and each data record has m attributes. Data owners are assumed not to collude with the cloud servers.
(ii) QC: QC is authorized to query and receive the clustering results and is not involved in any clustering calculation.
(iii) CCS: CCS stores the datasets of multiple DOs, takes part in the clustering process, and sends the clustering results to the QC.
(iv) KMS: KMS generates system parameters and performs ciphertext transformation with the master secret key. It also participates in the clustering process.
3.2. Threat Models.
In our system, we suppose that CCS and KMS are semihonest. This means they will honestly perform what the protocol requires but will be curious about the messages under the ciphertexts they receive. Upon this assumption, we define three threat models as follows, where an adversary A acting in different roles in different models attempts to decrypt the ciphertexts sent from DOs and CCS.
(i) Acting as a "malicious" CCS, A tries to obtain the message under ciphertexts sent from DOs and KMS.
(ii) Acting as a "malicious" KMS, A tries to obtain the real message under ciphertexts sent from CCS.
(iii) Acting as a "malicious" KMS, A tries to obtain the message under the ciphertexts that are sent from DOs to CCS.
It is worth noting that CCS and KMS are assumed not to collude with each other.
ALGORITHM 1: k-means clustering. Input: the l m-dimensional data records; k: predefined number of clusters; I: predefined max number of iterations.

Our Construction
Based on the scheme proposed by Rong et al. in [19], we construct a more secure clustering scheme. In our construction, we utilize BCP homomorphic encryption to protect the privacy of data owners such that adversaries cannot extract any useful information about the underlying data records, whereas AS can easily extract $M_1/M_2$ in [19]. Furthermore, AES encryption is also used to double-encrypt the data records to prevent KMS from directly extracting data records from ciphertexts sent from DO to CCS.

Protocols.
We first define five underlying protocols to satisfy different requirements in the clustering process. To securely transfer the data records of DO to CCS, we define the secure ciphertext transformation (SCT) protocol. Since the BCP encryption used in our scheme only has the additive homomorphic property, we build a secure multiplication (SM) protocol to realize the multiplicative property. Finally, aiming to classify similar data records using the ciphertexts, we construct three protocols, namely, the secure distance measurement (SDM) protocol, the secure distance comparison (SDC) protocol, and the secure minimum distance measurement (SMDM) protocol. These protocols will be invoked throughout our scheme.

Secure Ciphertext Transformation Protocol.
The secure ciphertext transformation (SCT) protocol aims to transform the ciphertext of a message M encrypted under public key $pk_x$ into a ciphertext of M encrypted under public key $pk_y$ without revealing M. Suppose there are two entities in the SCT protocol, i.e., Alice and Bob. Alice interacts with Bob following the SCT protocol to convert $\mathrm{Enc}_{pk_x}(M)$ to $\mathrm{Enc}_{pk_y}(M)$. To prevent Bob from extracting the message M, a random number is used to blind the message from Bob. The detailed process is listed in Algorithm 2.
Taking as the input the public keys $pk_x$ and $pk_y$ and the ciphertext $\mathrm{Enc}_{pk_x}(M)$, Alice randomly chooses $r \in Z_N$ and encrypts r using $pk_x$ to obtain $\mathrm{Enc}_{pk_x}(r)$. She then computes the encryption of $(M + r)$ under $pk_x$, which can be realized by $\mathrm{Enc}_{pk_x}(M) \cdot \mathrm{Enc}_{pk_x}(r)$ because of the additive homomorphic property of BCP encryption. Alice then sends the output $\mathrm{Enc}_{pk_x}(M + r)$ to Bob. Taking as the input the public key $pk_y$, the master secret key msk, and the received $\mathrm{Enc}_{pk_x}(M + r)$, Bob decrypts this ciphertext using msk following the system decryption algorithm sDec and obtains $(M + r)$. He then encrypts $(M + r)$ with $pk_y$ and sends the output $\mathrm{Enc}_{pk_y}(M + r)$ to Alice. Alice eliminates r in the ciphertext by computing $\mathrm{Enc}_{pk_y}(M + r) \cdot \mathrm{Enc}_{pk_y}(-r)$ and obtains $\mathrm{Enc}_{pk_y}(M)$ as the final output.
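The message flow of the SCT protocol can be traced with a toy simulation. This is a sketch under assumptions, not the paper's Algorithm 2: it uses a compact, insecure BCP with tiny primes, runs Alice and Bob inside one function instead of over a network, and all helper names (`pick_g`, `keygen`, `enc`, `dec`, `sdec`, `add`, `sct`) are hypothetical.

```python
import math
import random

P, Q = 23, 59                                   # toy safe primes (insecure sizes)
N, N2 = P * Q, (P * Q) ** 2
LAM = 2 * ((P - 1) // 2) * ((Q - 1) // 2)       # lambda(N); derived from msk, held by Bob

def pick_g():
    """Find a square g with g^lambda = 1 + alpha*N and alpha invertible mod N."""
    for x in range(2, N):
        g = pow(x, 2, N2)
        if math.gcd(x, N) == 1 and math.gcd((pow(g, LAM, N2) - 1) // N, N) == 1:
            return g
    raise ValueError("no suitable g found")

G = pick_g()
ALPHA_INV = pow((pow(G, LAM, N2) - 1) // N, -1, N)

def keygen():
    a = random.randrange(1, N2 // 2)            # assumption: public range for a
    return pow(G, a, N2), a                     # (pk, sk)

def enc(pk, m):
    r = random.randrange(1, N)
    return pow(G, r, N2), pow(pk, r, N2) * (1 + m * N) % N2

def dec(sk, ct):
    return (ct[1] * pow(ct[0], -sk, N2) % N2 - 1) // N

def sdec(pk, ct):
    """Master-key decryption (uses LAM); pk is passed explicitly."""
    a = (pow(pk, LAM, N2) - 1) // N * ALPHA_INV % N
    r = (pow(ct[0], LAM, N2) - 1) // N * ALPHA_INV % N
    d = pow(ct[1] * pow(G, -(a * r % N), N2) % N2, LAM, N2)
    return (d - 1) // N * pow(LAM, -1, N) % N

def add(ct1, ct2):
    """Homomorphic addition: componentwise product of ciphertexts."""
    return ct1[0] * ct2[0] % N2, ct1[1] * ct2[1] % N2

def sct(ct_x, pk_x, pk_y):
    r = random.randrange(1, N)
    blinded = add(ct_x, enc(pk_x, r))           # Alice: Enc_x(M + r)
    m_plus_r = sdec(pk_x, blinded)              # Bob: sees only M + r mod N
    ct_y = enc(pk_y, m_plus_r)                  # Bob: Enc_y(M + r)
    return add(ct_y, enc(pk_y, (-r) % N))       # Alice: removes r, gets Enc_y(M)

pk_x, sk_x = keygen()
pk_y, sk_y = keygen()
converted = sct(enc(pk_x, 321), pk_x, pk_y)
```

In the scheme, CCS plays Alice and KMS plays Bob, so KMS only ever decrypts the blinded value M + r.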

Secure Multiplication Protocol.
The secure multiplication (SM) protocol is used to obtain the ciphertext of the product of two messages from the ciphertexts of those messages using the BCP homomorphic cryptosystem. It is required in this process that the messages should not be exposed. As in the SCT protocol, we assume two entities in the SM protocol, i.e., Alice and Bob. Alice attempts to obtain $\mathrm{Enc}_{pk_x}(M_1 \cdot M_2)$ from $\mathrm{Enc}_{pk_x}(M_1)$, $\mathrm{Enc}_{pk_x}(M_2)$ without revealing $M_1, M_2$ to Bob, who is the owner of the corresponding secret key sk. We define the SM protocol in Algorithm 3.
Taking as the input the ciphertexts $\mathrm{Enc}_{pk_x}(M_1)$ and $\mathrm{Enc}_{pk_x}(M_2)$, Alice randomly chooses numbers $r_1, r_2 \in Z_N$ and computes the ciphertexts of $(M_1 + r_1)$, $(M_2 + r_2)$ as $\mathrm{Enc}_{pk_x}(M_1) \cdot \mathrm{Enc}_{pk_x}(r_1)$ and $\mathrm{Enc}_{pk_x}(M_2) \cdot \mathrm{Enc}_{pk_x}(r_2)$, respectively. This utilizes the additive homomorphic property of BCP encryption. She then sends the output $\mathrm{Enc}_{pk_x}(M_1 + r_1)$, $\mathrm{Enc}_{pk_x}(M_2 + r_2)$ to Bob. Taking as the input the corresponding secret key $sk_x$ of $pk_x$, Bob decrypts the received ciphertexts with $sk_x$ and obtains $(M_1 + r_1)$, $(M_2 + r_2)$. He computes their product as $\Lambda = (M_1 + r_1) \cdot (M_2 + r_2) = M_1 \cdot M_2 + r_2 M_1 + r_1 M_2 + r_1 r_2$ and encrypts Λ with $pk_x$ as $\mathrm{Enc}_{pk_x}(\Lambda)$, from which $M_1 \cdot M_2$ is to be separated in the underlying message. Bob sends $\mathrm{Enc}_{pk_x}(\Lambda)$ to Alice. Finally, Alice computes $\mathrm{Enc}_{pk_x}(M_1 \cdot M_2)$ from $\mathrm{Enc}_{pk_x}(\Lambda)$, $r_1$, $r_2$, $\mathrm{Enc}_{pk_x}(M_1)$, $\mathrm{Enc}_{pk_x}(M_2)$ using the additive homomorphic property of BCP encryption.
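A toy simulation may help show how Alice strips the cross terms $r_2 M_1$, $r_1 M_2$, and $r_1 r_2$ from Λ. The final step below, which raises ciphertexts to the powers $N - r_2$, $N - r_1$, and $N - 1$ to subtract under encryption, is an assumption about how Algorithm 3 (not reproduced in this text) realizes it; the tiny primes, the fixed g = 4, and all helper names are likewise illustrative only.

```python
import random

P, Q = 23, 59                        # toy safe primes (insecure sizes)
N, N2 = P * Q, (P * Q) ** 2
G = 4                                # a square coprime to N; order is not verified here

def keygen():
    a = random.randrange(1, N2 // 2)             # assumption: public range for a
    return pow(G, a, N2), a                      # (pk, sk)

def enc(pk, m):
    r = random.randrange(1, N)
    return pow(G, r, N2), pow(pk, r, N2) * (1 + m * N) % N2

def dec(sk, ct):
    return (ct[1] * pow(ct[0], -sk, N2) % N2 - 1) // N

def add(ct1, ct2):
    """Enc(M1) * Enc(M2) = Enc(M1 + M2), componentwise modulo N^2."""
    return ct1[0] * ct2[0] % N2, ct1[1] * ct2[1] % N2

def scalar(ct, k):
    """Enc(M)^k = Enc(k * M); k = N - 1 therefore encrypts -M mod N."""
    return pow(ct[0], k, N2), pow(ct[1], k, N2)

def sm(ct1, ct2, pk, sk):
    # Alice: blind both inputs with fresh random numbers.
    r1, r2 = random.randrange(1, N), random.randrange(1, N)
    blinded1 = add(ct1, enc(pk, r1))
    blinded2 = add(ct2, enc(pk, r2))
    # Bob: decrypt the blinded values, multiply them, re-encrypt the product.
    big_lambda = dec(sk, blinded1) * dec(sk, blinded2) % N
    ct_lambda = enc(pk, big_lambda)
    # Alice: subtract r2*M1, r1*M2 and r1*r2 under encryption.
    out = add(ct_lambda, scalar(ct1, N - r2))
    out = add(out, scalar(ct2, N - r1))
    return add(out, scalar(enc(pk, r1 * r2 % N), N - 1))

pk, sk = keygen()
product_ct = sm(enc(pk, 12), enc(pk, 34), pk, sk)
```

All arithmetic on plaintexts is modulo N, so with N = 1357 the product must stay below 1357 to be read back unchanged.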

Secure Distance Measurement Protocol.
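We define the secure distance measurement (SDM) protocol to measure the distance between data records and cluster centers using the Euclidean distance. Assume there are n data records and k clusters. Let $\vec{s}_j = (s_{j,1}, s_{j,2}, \ldots, s_{j,m})$ be the sum of data records in cluster $c_j$ and $|c_j|$ be the number of data records in cluster $c_j$. Given a data record $\vec{d}_i = (d_{i,1}, d_{i,2}, \ldots, d_{i,m})$ and the cluster center $\vec{\mu}_j = \vec{s}_j / |c_j|$, the squared Euclidean distance between them is $\sum_{\alpha=1}^{m} (d_{i,\alpha} - s_{j,\alpha}/|c_j|)^2 = \Omega_{i,j}/|c_j|^2$, where $\Omega_{i,j} = \sum_{\alpha=1}^{m} (|c_j| \cdot d_{i,\alpha} - s_{j,\alpha})^2$. The process is depicted as Algorithm 4.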
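The quantity accumulated by the SDM protocol can be checked on plaintext values. The sketch below assumes, following the loop of Algorithm 4, that $\Omega_{i,j} = \sum_{\alpha} (|c_j| \cdot d_{i,\alpha} - s_{j,\alpha})^2$; the function names are hypothetical and no encryption is involved.

```python
from fractions import Fraction

def omega(record, cluster_sum, cluster_size):
    """Integer-only quantity: sum over attributes of (|c_j| * d - s)^2."""
    return sum((cluster_size * d - s) ** 2 for d, s in zip(record, cluster_sum))

def squared_distance_to_center(record, cluster_sum, cluster_size):
    """Exact squared Euclidean distance to the center s / |c_j|."""
    return sum((Fraction(d) - Fraction(s, cluster_size)) ** 2
               for d, s in zip(record, cluster_sum))

record = (3, 4)
cluster_sum = (10, 2)      # sum of the 4 records currently in the cluster
cluster_size = 4
value = omega(record, cluster_sum, cluster_size)
exact = squared_distance_to_center(record, cluster_sum, cluster_size)
```

Here `value` is 200 and `exact` is 200/16 = 12.5, so the integer quantity carries the distance information without any division, which an additively homomorphic scheme cannot perform on ciphertexts.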

Secure Distance Comparison Protocol.
The secure distance comparison (SDC) protocol determines the shorter of two distances output by the SDM protocol. Taking as the input two distances, i.e., $(\mathrm{Enc}_{pk_x}(\Omega_{i,a}), |c_a|)$ and $(\mathrm{Enc}_{pk_x}(\Omega_{i,b}), |c_b|)$, Alice interacts with Bob to obtain the shorter one. As in [19], the difference between the two distances can be expressed as $\frac{\Omega_{i,a}}{|c_a|^2} - \frac{\Omega_{i,b}}{|c_b|^2} = \frac{|c_b|^2 \Omega_{i,a} - |c_a|^2 \Omega_{i,b}}{|c_a|^2 |c_b|^2}$. Since we only need to know whether this fraction is positive, and its denominator is always positive, it suffices to judge whether $|c_b|^2 \Omega_{i,a} - |c_a|^2 \Omega_{i,b} > 0$. This means the comparison reduces to comparing $|c_b|^2 \Omega_{i,a}$ with $|c_a|^2 \Omega_{i,b}$. Let β be the maximum size of messages. We have [...]. Let η be the threshold for sign judgement chosen from [2 β − 1, N + 2 β − 1]. To prevent Bob from obtaining distance-related information, Alice blinds the message with a random $r \in [1, \min(N - \eta, (N - \phi N)/2^{\beta-1})]$ with $\phi \in Z$ satisfying [...]. We illustrate the detailed realization in Algorithm 5.
In the process, Bob cannot obtain $\Omega_{i,a}$, $\Omega_{i,b}$.
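The division-free comparison that the SDC protocol relies on can be checked on plaintext integers. The sketch below covers only the cross-multiplication step; the blinding with r and the threshold η from Algorithm 5 are not modeled, and the function name and tie-breaking rule are assumptions.

```python
from fractions import Fraction

def closer_cluster(omega_a, size_a, omega_b, size_b):
    """Return 'a' or 'b' using only the sign of |c_b|^2*Omega_a - |c_a|^2*Omega_b."""
    diff = size_b ** 2 * omega_a - size_a ** 2 * omega_b
    return "b" if diff > 0 else "a"      # assumption: ties go to cluster a

# Cluster a: Omega = 200 with 4 members, squared distance 200/16 = 12.5
# Cluster b: Omega = 90 with 3 members, squared distance 90/9 = 10
choice = closer_cluster(200, 4, 90, 3)
reference = "b" if Fraction(200, 16) > Fraction(90, 9) else "a"
```

Both the cross-multiplied test and the exact fraction comparison select cluster b in this example.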

Secure Minimum Distance Measurement Protocol.
Finally, we define the secure minimum distance measurement (SMDM) protocol as Algorithm 6 to choose the shortest one among given distances.

Our Scheme.
At the beginning, the four entities in the system, i.e., data owners DOs, query client QC, cloud computing service CCS, and key management server KMS, set up the system by running the algorithms Setup, KeyGen, and AKeyGen. DOs then run Enc and AEnc on their data records and upload the results to CCS separately. CCS decrypts the received ciphertexts using ADec. After receiving the clustering request from QC, CCS interacts with KMS to transform the ciphertexts encrypted under different public keys into ciphertexts encrypted under the same public key. Subsequently, CCS performs the clustering computation. Finally, CCS interacts with KMS to transfer the clustering result to QC. It is worth noting that the defined protocols are invoked throughout the process.

System Setup.
Following the setting in the system model (see Section 3.1), we have n data owners $DO_i$, $1 \le i \le n$, a cloud computing service (CCS), a key management server (KMS), and a query client (QC). Before running the protocols, the related entities in the system model generate their keys as follows:
(1) Taking as the input a security parameter λ, KMS runs the setup algorithm Setup(λ) of the BCP homomorphic cryptosystem and generates the public parameter pp and master secret key msk, where msk is kept secret.
(2) Each data owner $DO_i$ runs KeyGen(pp) to generate its own public/secret key pair $(pk_i, sk_i)$, $1 \le i \le n$.
(3) Each $DO_i$ agrees with CCS on a symmetric key $ask_i$ through the Diffie-Hellman key exchange protocol or other methods, for $1 \le i \le n$.
(4) CCS runs the key generation algorithm KeyGen(pp) to generate its public/secret key pair $(pk_c, sk_c)$.
(5) QC runs KeyGen(pp) to generate its own public/secret key pair $(pk_q, sk_q)$.

Data Uploading.
(1) Each $DO_i$ runs Enc on each of its data records $\vec{d}^{\,i}_j$, $1 \le j \le n_i$, under its public key $pk_i$ to obtain $\mathrm{Enc}_{pk_i}(\vec{d}^{\,i}_j)$.
(2) To prevent privacy disclosure to KMS, data owners double-encrypt the output ciphertext with AES encryption. Each $DO_i$ computes $\mathrm{AEnc}_{ask_i}(\mathrm{Enc}_{pk_i}(\vec{d}^{\,i}_j))$ and sends the output results to CCS.
(3) After receiving $\mathrm{AEnc}_{ask_i}(\mathrm{Enc}_{pk_i}(\vec{d}^{\,i}_j))$, $1 \le j \le n_i$, from $DO_i$, CCS runs the decryption algorithm ADec with the agreed symmetric key $ask_i$ on each ciphertext to obtain $\mathrm{ADec}_{ask_i}(\mathrm{AEnc}_{ask_i}(\mathrm{Enc}_{pk_i}(\vec{d}^{\,i}_j))) = \mathrm{Enc}_{pk_i}(\vec{d}^{\,i}_j)$.
In our setting for data uploading, each data owner $DO_i$ sends its double-encrypted ciphertexts to CCS such that the KMS cannot obtain the original message of the data owner, although the KMS has the master secret key msk, which can be used to decrypt ciphertexts encrypted under the BCP homomorphic cryptosystem.

Ciphertext Transformation.
This phase converts the "multiuser" setting into a "single-user" one by re-encrypting each ciphertext encrypted under the public key $pk_i$ into a ciphertext encrypted under $pk_c$, $1 \le i \le n$.
(1) QC sends a clustering request to CCS.
(2) For a ciphertext $\mathrm{Enc}_{pk_i}(\vec{d}^{\,i}_j)$ from $DO_i$, CCS interacts with KMS to run the SCT protocol by setting $pk_x = pk_i$, $pk_y = pk_c$, $\mathrm{Enc}_{pk_x}(M) = \mathrm{Enc}_{pk_i}(\vec{d}^{\,i}_j)$, $msk = msk$. Finally, CCS obtains $\mathrm{Enc}_{pk_c}(\vec{d}^{\,i}_j)$.
(3) By performing the SCT protocol on all the ciphertexts received from $DO_i$, $1 \le i \le n$, CCS finally obtains $\mathrm{Enc}_{pk_c}(\vec{d}^{\,i}_j)$, $1 \le j \le n_i$. Let $n = n_1 + n_2 + \cdots + n_l$, and denote these n ciphertexts as $\mathrm{Enc}_{pk_c}(\vec{d}_1), \ldots, \mathrm{Enc}_{pk_c}(\vec{d}_n)$. For simplicity, we denote $\mathrm{Enc}_{pk_c}(\vec{d}_i)$ as $\mathrm{Enc}(\vec{d}_i)$. It is worth noting that the final ciphertexts are unknown to the KMS since they are blinded in the SCT protocol.

Input: $\mathrm{Enc}_{pk_x}(\vec{d}_i)$, $\mathrm{Enc}_{pk_x}(\vec{s}_j)$, $|c_j|$
Begin:
$\mathrm{Enc}_{pk_x}(\Omega_{i,j}) = 0$
for α = 1 to m
1. Run SM protocol on $\mathrm{Enc}_{pk_x}(d_{i,\alpha})$ and $\mathrm{Enc}_{pk_x}(|c_j|)$ to obtain $\Gamma = \mathrm{Enc}_{pk_x}(|c_j| \cdot d_{i,\alpha})$
2. Compute $\mathrm{Enc}_{pk_x}(\Omega_{i,j}) = (\Gamma \cdot \mathrm{Enc}_{pk_x}(s_{j,\alpha})^{N-1})^2 + \mathrm{Enc}_{pk_x}(\Omega_{i,j})$
end for
Output: $(\mathrm{Enc}_{pk_x}(\Omega_{i,j}), |c_j|)$
ALGORITHM 4: SDM protocol.

Security and Communication Networks

Clustering Computation.
In this phase, CCS computes the clustering results with k randomly chosen cluster centers $(\mathrm{Enc}(\vec{s}_j), |c_j|)$, $1 \le j \le k$, where $\mathrm{Enc}(\vec{s}_j) = (\mathrm{Enc}(s_{j,1}), \mathrm{Enc}(s_{j,2}), \ldots, \mathrm{Enc}(s_{j,m}))$ and $|c_j| = 1$. CCS also outputs a matrix $V_{n \times k}$ which records the location of the n records in the k clusters, where $V_{i,j} = 1$ means $\vec{d}_i$ is allocated to the j-th cluster. In addition, there is a maximum iteration number $\phi_{\max}$. Let $\phi = 0$.
(1) For a data record $\mathrm{Enc}(\vec{d}_i)$, CCS runs the SMDM protocol on it and the k cluster centers with the setting $pk_x = pk_c$. Finally, CCS obtains the output [...]

[...] runs step 1 and obtains $(\mathrm{Enc}(\Omega_{i,\min}), c_{i,\min})$, $1 \le i \le n$, and the matrix $V_{n \times k}$.

Result Retrieval
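(1) CCS interacts with KMS to run the SCT protocol on $\mathrm{Enc}_{pk_c}(\vec{s}_j)$, $1 \le j \le k$, with the setting $pk_x = pk_c$, $pk_y = pk_q$, $\mathrm{Enc}_{pk_x}(M) = \mathrm{Enc}_{pk_c}(\vec{s}_j)$. CCS obtains $\mathrm{Enc}_{pk_q}(\vec{s}_j)$, $1 \le j \le k$, and sends it and $V_{n \times k}$ to QC.
(2) QC decrypts the received $\mathrm{Enc}_{pk_q}(\vec{s}_j)$ with its secret key $sk_q$ by computing $\vec{s}_j = \mathrm{Dec}_{sk_q}(\mathrm{Enc}_{pk_q}(\vec{s}_j))$. QC then computes the cluster centers as $\vec{\mu}_j = \vec{s}_j / |c_j|$, where $|c_j| = \sum_{i=1}^{n} V_{i,j}$.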
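The last step on the QC side is plain arithmetic on the decrypted sums and the location matrix. The following minimal sketch assumes the sums have already been decrypted and that every cluster is non-empty; the function name is hypothetical.

```python
from fractions import Fraction

def cluster_centers(sums, V):
    """Compute mu_j = s_j / |c_j| with |c_j| = sum_i V[i][j]."""
    k = len(sums)
    sizes = [sum(row[j] for row in V) for j in range(k)]   # column sums of V
    # Assumption: every cluster has at least one record, so sizes[j] > 0.
    return [tuple(Fraction(s, sizes[j]) for s in sums[j]) for j in range(k)]

# Three records: the first two are in cluster 0, the third is in cluster 1.
V = [[1, 0],
     [1, 0],
     [0, 1]]
sums = [(4, 6), (5, 7)]          # decrypted attribute sums per cluster
centers = cluster_centers(sums, V)
```

Exact fractions are used so that the centers are not rounded; a deployment could equally return floating-point values.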

Security Analysis.
As shown in the proposed scheme (see Section 4.2), our protocol is realized by invoking the BCP homomorphic cryptosystem, AES encryption, and the defined protocols. Given that the former two cryptosystems are semantically secure, we give the security proof of the defined protocols as follows. We take the security proof of the SM protocol under the "Real-vs.-Ideal" framework as an example. The security proofs of the other protocols are similar, and we omit them here.

Theorem 1. SM protocol is secure.
Proof. The SM protocol involves two semihonest parties, namely, Alice and Bob. Therefore, we consider the security of the SM protocol against both a semihonest attacker Alice, $A_A$, and a semihonest attacker Bob, $A_B$. In the protocol, Alice takes as the input $pk_x$, $\mathrm{Enc}_{pk_x}(M_1)$, $\mathrm{Enc}_{pk_x}(M_2)$, and Bob takes as the input the corresponding secret key $sk_x$ of public key $pk_x$.
(i) Security against $A_A$: In the SM protocol, the real-world view of the attacker $Z_A$ includes the input $pk_x$, $\mathrm{Enc}_{pk_x}(M_1)$, $\mathrm{Enc}_{pk_x}(M_2)$, random numbers $r_1, r_2$, $\mathrm{Enc}_{pk_x}(\Lambda)$, and the output $\mathrm{Enc}_{pk_x}(M_1 \cdot M_2)$, where $\Lambda = (M_1 + r_1)(M_2 + r_2)$. $A_A$ tries to obtain useful information about the underlying messages, i.e., $M_1$, $M_2$, $(M_1 + r_1)(M_2 + r_2)$, $M_1 \cdot M_2$, that are encrypted under $pk_x$. Because of the semantic security of the BCP homomorphic cryptosystem, $A_A$ cannot extract any information about the underlying messages except their bit length without $sk_x$. Therefore, we can construct a simulator $S_A$ in the ideal world by using ciphertexts of randomly chosen messages. It is computationally hard for $A_A$ to distinguish the ideal world from the real world because of the semantic security of the BCP homomorphic cryptosystem. That is, the view produced by $S_A$ in the ideal world and the view $Z_A$ in the real world are related by $\approx_c$, where $\approx_c$ means computationally indistinguishable.
(ii) Security against $A_B$: In the protocol, $A_B$ takes as the input the secret key $sk_x$ of $pk_x$ and $\mathrm{Enc}_{pk_x}(M_1 + r_1)$, $\mathrm{Enc}_{pk_x}(M_2 + r_2)$. With $sk_x$, $A_B$ can decrypt the ciphertexts and obtain the underlying messages $M_1 + r_1$, $M_2 + r_2$. However, since $r_1, r_2$ are randomly chosen by Alice, the blinded values look like random numbers from the point of view of $A_B$. We can then construct a simulator $S_B$ in the ideal world by using ciphertexts of randomly chosen messages, and it is computationally hard for $A_B$ to distinguish the ideal world from the real world, i.e., the two views are again related by $\approx_c$. This completes the proof of Theorem 1. Next, we prove that our protocol is secure by taking the process of data uploading as an example.

Theorem 2. The data uploading process is secure.
Proof. In the data uploading process, data owners (DOs) double-encrypt their data records with pk and ask using the BCP homomorphic cryptosystem and AES encryption, respectively. They then send the encrypted result to CCS, which has ask but does not have the corresponding secret key sk of pk. Because of the semantic security of the BCP homomorphic cryptosystem, the process is secure against a semihonest CCS. Although KMS can extract the underlying messages of ciphertexts encrypted using the BCP homomorphic cryptosystem, it is computationally hard for a semihonest KMS to obtain any information about the data records because of the semantic security of AES encryption. Furthermore, CCS and KMS are supposed not to collude in our scheme, so the data uploading process is secure against semihonest CCS and KMS separately. This completes the proof of Theorem 2.
It is worth noting that the security of our construction is protected by the semantic security of the BCP homomorphic cryptosystem, AES encryption, and blinding with random numbers, which prevents the adversaries from obtaining any useful information from the received ciphertexts. □

5.2. Performance Analysis.
In our construction, we use the BCP homomorphic cryptosystem and AES encryption to encrypt data owners' data records to prevent information disclosure to KMS. Compared with the underlying scheme [19], which utilizes Youn's homomorphic encryption scheme [33], our scheme therefore increases the computation cost between DOs and CCS in exchange for the increased security.
In particular, each data owner additionally needs to interact with CCS to agree on a symmetric key of AES encryption in the system setup phase. Apart from this, since BCP encryption has the additive homomorphic property instead of the multiplicative property of Youn's encryption scheme [33], we give a secure multiplication protocol SM instead of the secure addition protocol SA in [19]. This leads to different invocations in the other defined protocols, which results in more computation cost.
At the expense of computation cost, our scheme achieves semantic security such that adversaries cannot obtain any useful information about the underlying data records, whereas AS can extract $M_1/M_2$ in the SA protocol of [19]. Furthermore, in our scheme, KMS cannot extract the underlying data records of data owners, whereas KMS can do so with its master secret key in [19].
Finally, we compare our scheme with the existing outsourced k-means clustering schemes [19,22,23,26,30,34] in Table 2 on six aspects, i.e., whether the scheme is based on symmetric or asymmetric cryptosystem, whether it supports or achieves multiple data owners and multiple keys, ciphertext comparison, security, and multidimensional data. As shown in Table 2, our scheme achieves all the listed functionalities under the asymmetric cryptosystem.

Conclusions
This paper proposed a highly secure privacy-preserving outsourced k-means clustering scheme on encrypted datasets under multiple keys. We utilized BCP homomorphic encryption and AES encryption to double-encrypt the data records in the database to provide security against a semihonest cloud computing server and key management server. Furthermore, we constructed five protocols, i.e., secure ciphertext transformation (SCT), secure multiplication (SM), secure distance measurement (SDM), secure distance comparison (SDC), and secure minimum distance measurement (SMDM), as the basis of our scheme. In particular, the SM protocol is built to achieve the homomorphic multiplicative property using BCP encryption. Finally, we proposed our scheme by invoking the defined protocols throughout. The given security and performance analysis shows that our scheme is comparable with the existing outsourced k-means clustering schemes in security and functionality.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.