Secure KNN Classification Scheme Based on Homomorphic Encryption for Cyberspace

With the advent of the intelligent era, more and more artificial intelligence algorithms are widely used and large amounts of user data are collected on cloud servers for sharing and analysis, but the risk of private data breaches is increasing at the same time. CKKS homomorphic encryption has become a research focal point in cryptography because it supports homomorphic computation on floating-point numbers with comparable computational efficiency. Based on CKKS homomorphic encryption, this paper implements a secure KNN classification scheme on cloud servers for Cyberspace (CKKSKNNC) that supports batch calculation. The scheme uses CKKS to encrypt user data samples and then uses the Euclidean distance, Pearson correlation coefficient, and cosine similarity to compute the similarity between ciphertext data samples. Finally, the secure classification of the samples is realized by voting rules. The experiments use the IRIS data set, a classification data set commonly used in machine learning. The experimental results show that, except for the Pearson correlation coefficient, the accuracy of the other three similarity algorithms on the IRIS data is around 97%, which is almost the same as that in plaintext and demonstrates the effectiveness of this scheme. Through comparative experiments, the efficiency of this scheme is


Introduction
With the gradual maturity of various AI algorithms, data has become the basis of social operation, playing an essential role in areas such as economic investment, social management, scientific and technological development, and national security. Large amounts of data are uploaded to cloud servers from personal front-ends, social networks, sensor networks, and the Internet for sharing and analysis. With the development of cloud computing, many artificial intelligence algorithms built on big data have been developed and are widely used in various apps. Major vendors generate recommendations or make pricing decisions by analyzing large volumes of user behavior data; for example, TikTok, Taobao, and Meituan recommend short videos, commodities, and stores to users, and Drip travel analyzes user travel segments to increase prices for older users. As one of the classic algorithms for big data classification, the KNN algorithm classifies data by calculating the similarity between the test data set and the training data set. Because of its simple structure, high efficiency, and accuracy, the KNN algorithm is adopted in various scenarios, such as image recognition, traffic classification, sensor detection, and especially text classification. Commercial clouds such as Google, Microsoft, and Amazon also provide related services. With the popularity of cloud computing, large numbers of users upload local data to cloud servers to enjoy the storage and computing services provided by cloud platforms. However, commercial clouds are often not fully trusted [1][2][3]. As shown in Figure 1, users and cloud servers exchange data and classification results in network space, and the security of these data needs to be protected.
In recent years, privacy leakage incidents such as the malicious collection and theft of sensitive user data by apps have emerged one after another. In 2007, the National Security Agency (NSA) and the Federal Bureau of Investigation (FBI) launched the infamous "Prism" secret surveillance project, which directly accessed the central servers of US Internet companies to mine data and collect intelligence; nine international network giants, including Microsoft, Yahoo, Google, and Apple, participated in it. In 2018, the Z. power risk monitoring platform detected the NASDAQ: HTHT data leak, involving sensitive and private data of about 500 million citizens. In 2021, China Internet Network Information Center issued a notice on the illegal collection and use of personal information by 105 apps including TikTok. Under this trend, personal data has gradually become a commodity, and many vendors privately collect or even sell users' personal data. In the traditional KNN classification schemes [4][5][6][7], the user uploads the local raw data to the cloud platform for storage and calculation, and the cloud platform then computes the similarity between samples and returns the result to the user. However, for highly confidential data, such as personal privacy data, commercial confidential data, medical privacy data, and national security data, the consequences of leakage or theft are unimaginable. Therefore, the core issue of this study is to implement a secure privacy-preserving KNN classification scheme efficiently and accurately in the cloud environment. An effective solution is to encrypt local data with a cryptographic algorithm and compute the similarity on the ciphertext data. However, due to the limitations of the encryption algorithm, the computational and storage overhead of this solution is extremely large compared to plaintext [8,9].
First of all, the basic logic operations supported by general encryption schemes are limited, and iterative algorithms are needed instead. Second, some encryption schemes are restricted by modulus reduction and can only support a limited number of multiplications. Finally, traditional encryption schemes can only encrypt digits or vectors one by one, requiring a lot of resources to store the ciphertext. In 2017, Cheon et al. [10] proposed CKKS, a homomorphic encryption scheme that supports approximate arithmetic on real and complex numbers. This scheme supports homomorphic computation on floating-point numbers with comparable computational efficiency; it has become a research focal point in cryptography and is widely used in machine learning and big data analysis. To protect users' privacy, ensure data security, and implement KNN classification safely and efficiently in the cloud environment while maintaining the computational efficiency and classification accuracy achieved on plaintext, this paper proposes a secure KNN classification scheme based on the CKKS homomorphic encryption scheme on cloud servers for Cyberspace (CKKSKNNC). We compute the similarity through the Euclidean distance, Pearson correlation coefficient, and cosine similarity, so that secure KNN classification can operate in the ciphertext domain and avoid leaking any private user data.

Related Work
The traditional KNN classification scheme is divided into two types: in one, the same optimal K value is assigned to every test sample [11][12][13]; in the other, experts assign individual K values to different test samples [14][15][16][17]. In recent years, many classifiers have been designed based on the KNN algorithm. In 2016, Deng et al. [18] first divided the data set into several categories with the K-means algorithm and then classified them by KNN, which realized an efficient KNN algorithm for big data. In 2017, Zhang et al. [19] came up with a kTree method to effectively implement KNN classification with different neighbor numbers. In 2020, Zhang et al. [20] designed two effective cost-sensitive KNN classifiers to classify unbalanced data. In 2021, Zhu et al. [21] proposed an ML-KNN integration scheme that can realize classification algorithm recommendations and take advantage of the diversity of different data features. Levchenko et al. [22] implemented a KNN query on a large time-series database based on iSAX and sketch.
In the meantime, to protect users' private data, researchers have also carried out a lot of research on how to perform KNN classification on ciphertext. As early as 2009, Wong et al. [23] designed an asymmetric scalar-product-preserving encryption scheme (ASPE), which supports KNN computation on encrypted data by retaining a special type of scalar product, but the scheme assumes that the querying user is fully trusted, which is not suitable for practical application in complex network environments. In 2013, Zhu et al. [24] proposed a secure KNN calculation scheme for encrypted cloud data that does not need to share the key with the querying user, but it increases the communication overhead compared to the ASPE scheme. In 2014, Elmehdwi et al. [25] proposed a secure KNN scheme in an outsourcing environment based on the Paillier homomorphic encryption scheme, which can query the data without leaking any information to the cloud server by using the features of homomorphic encryption and hides the query and data access patterns of users, but the computing cost is large. In 2015, Xia et al. [26] proposed a secure dynamic multikeyword ranked search scheme based on encrypted cloud data, which achieves sublinear search time and handles document deletion and insertion flexibly with a special tree-based index structure. Samanthula et al. [27] proposed a KNN classification scheme for encrypted data stored in the cloud based on Paillier and a secure multiparty protocol. In 2016 and 2017, also based on the Paillier homomorphic encryption scheme, Kim et al. [28,29] designed privacy-preserving KNN classification algorithms using a tree index structure and Yao's garbled circuits, respectively. However, KNN classification schemes based on the Paillier homomorphic encryption scheme are inefficient, are limited in the calculations they support, and have a high computation cost. Li et al. [30] presented two secure and effective dynamic searchable symmetric encryption (SEDSSE) schemes for medical cloud data. They combined the secure KNN scheme and ABE technology to design a dynamic searchable symmetric encryption scheme and a key sharing scheme, achieved both forward and backward privacy, and proposed an enhanced scheme to effectively solve the key sharing problem caused by search encryption using KNN. In 2018, Wu et al. [31] used the Paillier and ElGamal encryption schemes to implement a secure KNN classification scheme on a semantically secure hybrid encrypted cloud database. Later, Liu et al. [32] proposed a privacy-preserving KNN classification scheme in the dual cloud model based on secret sharing and additive homomorphic cryptography. In 2020, Parvin et al. [33] developed an electronic medical record analysis system on the blockchain based on KNN and LDA algorithms to automatically and safely share medical data sets among medical experts. In the same year, in order to classify large-scale ciphertext data on distributed servers, Yang et al. [34] proposed a vector homomorphic encryption (VHE) scheme by constructing a key switching matrix and a noise matrix and built a secure distributed KNN classification algorithm (seed KNN) on it. Recently, Kim et al. [35] proposed an index-based KNN query processing algorithm and improved query processing efficiency through Yao's garbled circuits and data packing technology. Liu et al. [36] achieved secure KNN classification with a secure and efficient query processing (SecEQP) scheme, which encodes location information through a projection function and implements location-based query processing on the encrypted geospatial data stored in the cloud.
This article mainly analyzes the CKKS homomorphic encryption algorithm. The following describes the main algorithm flow of CKKS, as shown in Figure 2, which is drawn with reference to Cheon's report [10].
Set the security parameter $\lambda$ and choose a power-of-two integer $N$. Set the distributions $\chi_{key}$, $\chi_{err}$, and $\chi_{enc}$ on $R = \mathbb{Z}[X]/(X^N + 1)$ for the key, learning with errors, and encryption, respectively. Given a base integer $p$ and the number of levels $L$, set the ciphertext modulus $q_l = p^l$ ($1 \le l \le L$), where $l$ is the level of the ciphertext, then choose an integer $P$ at random and output $pp = (N, \chi_{key}, \chi_{err}, \chi_{enc}, L, q_l)$:
(1) $\mathrm{KeyGen}(params) \to (pk, sk, ks, rk_r, ck)$.
Because the CKKS encryption scheme is homomorphic, the computation performed by the cloud server on ciphertexts is equivalent to the same computation on plaintexts, which ensures both the privacy of the user and the efficiency of the encryption.

K-Nearest Neighbor.
Proposed by Cover and Hart in 1968 [37], KNN ranks among the simplest machine learning algorithms. Due to its simple structure and remarkable classification performance, it has become one of the most popular algorithms in data mining and statistics, earning a place among the top ten data mining algorithms [6], and it is commonly used in classification, regression, missing value interpolation, and other fields [38][39][40]. At present, many machine learning algorithms have been developed to better determine the value of k and the distance measure in the KNN algorithm. The main idea of KNN classification is to determine the category of the object to be classified from the category of the majority of samples in a certain range adjacent to that object. Its working principle is to compare the sample to be classified with the samples of known categories in the database, compute the similarity between them, and select the k samples of known categories that are most similar to the sample to be classified. According to the voting rule (the minority obeys the majority), the sample to be classified is assigned the category with the highest proportion among the k nearest samples. Suppose that we have samples of known categories $[(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)]$, where $X$ represents the characteristic index of the sample and $Y$ represents the category label of the sample. For a given sample $X'$ to be classified, we select the k samples with the highest similarity in the vicinity of $X'$, and these samples vote for the category of $X'$ according to their own categories. The category label with the most votes is taken as the category $Y'$ of $X'$, as shown in Figure 3.
The green dots represent samples to be classified, and the blue squares and red triangles represent samples of two known categories. When $k = 3$, the proportion of red triangles in the nearest neighboring range is 2/3, and the green dot is judged to be a red triangle sample. When $k = 5$, the proportion of blue squares among the nearest neighbors is 3/5, and the green dot is judged to be a blue square sample.
The KNN method is more suitable than other methods when the class domains of the samples to be classified intersect or overlap heavily. There are many methods for calculating similarity in the KNN algorithm, such as the Euclidean distance, cosine similarity, Pearson correlation, Manhattan distance, and Chebyshev distance. The most commonly used method is the Euclidean distance.
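The voting rule can be illustrated on plaintext data. The following is a minimal sketch, not the implementation used in this paper: the function name `knn_predict` and the toy coordinates are hypothetical, and the points are chosen only so that the outcome flips between k = 3 and k = 5 in the same way as the example described for Figure 3.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Rank training samples by Euclidean distance to the query (smaller = more similar).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    # Each of the k nearest samples votes with its own category label.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: distances to the query are 1.0, 1.2, 1.1, 1.5, 1.6, 3.0.
train_x = [(1.0, 0.0), (0.0, 1.2), (-1.1, 0.0), (0.0, -1.5), (1.6, 0.0), (3.0, 0.0)]
train_y = ["red", "red", "blue", "blue", "blue", "red"]
query = (0.0, 0.0)

label_k3 = knn_predict(train_x, train_y, query, 3)  # 2 of the 3 nearest are red
label_k5 = knn_predict(train_x, train_y, query, 5)  # 3 of the 5 nearest are blue
```

With these points the predicted label changes from "red" to "blue" as k grows, which is why the choice of k matters.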

Ciphertext Matrix Transpose Operation.
This scheme is implemented with the TenSEAL homomorphic encryption library. Although TenSEAL provides the ciphertext matrix multiplication function mul and the inner product function dot, it does not provide a ciphertext transpose function. Therefore, this part introduces the process of transposing a ciphertext matrix in TenSEAL. TenSEAL provides a very useful function, reshape, whose effect can be expressed as $A_{m \times n} = A_{1 \times mn}.\mathrm{reshape}([m, n])$. Suppose that there is a ciphertext matrix $A_{m \times n}$. First, the transition matrix $D_{mn \times mn}$ is generated, and the transposition process is shown in Figure 4.
It can be seen that $A_{m \times n}$ is converted to $A_{1 \times mn}$ through reshape([1, m * n]). Afterward, the internal elements are rearranged through dot($D_{mn \times mn}$), and the result is finally transposed through reshape([n, m]).
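The reshape, dot, reshape sequence can be checked on plaintext lists. The sketch below is a plaintext analogue only: it assumes row-major flattening and builds the transition matrix as a 0/1 permutation matrix, which is one natural reading of Figure 4 and not necessarily the exact matrix used by the authors. The helper names `transition_matrix` and `transpose_via_dot` are hypothetical.

```python
def transition_matrix(m, n):
    """Build the (m*n) x (m*n) permutation matrix D that moves the element at
    flat position i*n + j (row i, column j of A) to flat position j*m + i."""
    size = m * n
    d = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            d[i * n + j][j * m + i] = 1
    return d

def transpose_via_dot(a):
    """Transpose an m x n matrix using only flatten, vector-matrix product, reshape."""
    m, n = len(a), len(a[0])
    size = m * n
    flat = [v for row in a for v in row]                    # reshape([1, m * n])
    d = transition_matrix(m, n)
    moved = [sum(flat[r] * d[r][c] for r in range(size))    # dot(D)
             for c in range(size)]
    return [moved[r * m:(r + 1) * m] for r in range(n)]     # reshape([n, m])

a = [[1, 2, 3],
     [4, 5, 6]]
a_t = transpose_via_dot(a)
```

Because D depends only on m and n, it can be generated once per matrix shape and reused, which is consistent with treating transposition as preprocessing.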

Symbols and Parameters.
In order to show the algorithm in this article more intuitively, we briefly introduce the related symbols that are often used in this article, as shown in Table 1. Vectors are written in lowercase bold letters and matrices in uppercase bold letters. The prefix enc_ indicates the ciphertext form of the data.

Proposed Model.
According to Figure 5, the CKKSKNNC protocol model designed in this paper is composed of two parts, namely, the user (USER) and the cloud service provider (CSP). The CSP, which is "honest and curious", provides remote storage and computing services for users. Users have a large amount of local data and enjoy the services provided by CSP. The division of labor of each part is as follows:
(1) USER: generates public and private keys locally, encrypts data and uploads them to CSP, and decrypts the ciphertext computation results.
(2) CSP: provides remote storage and computing services for USER with sufficient storage and computing capabilities; it is in charge of storing the ciphertext data uploaded by USER, calculating the similarity between the encrypted sample to be classified and the other ciphertext samples, and returning the ciphertext result to USER.
First of all, USER generates public and private keys locally, encrypts the local samples of known classes, and sends them to CSP. CSP accepts the ciphertext samples sent by USER and stores them. When USER receives a new sample to be classified, USER encrypts the sample locally and delivers it to the CSP. The CSP accepts and

Security Model.
Since the CSP is "honest and curious" and the transmission network may also be subject to malicious attacks, we list the following security issues that may occur when users upload data to the cloud server for KNN classification:
(1) CSP may strictly abide by the designed protocol but infer additional information from the information it legally receives during the protocol.
(2) CSP may attempt to steal USER's public and private keys and rely on the stored ciphertext data samples to try to recover USER's plaintext data samples and private keys.
(3) While the user-uploaded ciphertext data and the ciphertext result returned by the cloud server are in transit, data samples may be maliciously intercepted by hackers and used to crack the user's sensitive data.

CKKSKNNC Framework.
Assume that the user locally holds the sample data of known categories to be uploaded, $[(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)]$, where $X_i = (x_{i1}, x_{i2}, \ldots, x_{im})^T$. The system protocol framework is shown in Figure 6. According to the protocol framework, the protocol algorithm is made up of two phases, namely, the data initialization phase and the classification phase. The specific operation procedures are listed as follows:
(1) Data initialization
(a) First, USER standardizes the characteristic index of the local data samples by computing
$$\tilde{x}_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{\mathrm{var}(x_j)}},$$
where $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ is the average value of the $j$-th characteristic index and $\mathrm{var}(x_j) = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$ is the variance of the $j$-th characteristic index. The standardized data are $[(\tilde{X}_1, Y_1), (\tilde{X}_2, Y_2), \ldots, (\tilde{X}_n, Y_n)]$; the average value of each characteristic index is 0, the variance is 1, and the data are dimensionless.
(b) USER generates the public and private keys $(pk, sk)$ locally, encrypts the characteristic index and category labels of the original data and of the standardized data, respectively, to get $(\mathrm{enc\_}X, \mathrm{enc\_}Y)$ and $(\mathrm{enc\_}\tilde{X}, \mathrm{enc\_}Y)$, and uploads both to CSP for storage.
(2) Classification
(a) After receiving the new sample $X'$ to be classified, USER first standardizes its characteristic index to obtain $\tilde{X}'$, then uses the public key $pk$ to encrypt it to obtain $\mathrm{enc\_}\tilde{X}'$, and sends the encrypted result to the CSP as a query matrix.
(b) After receiving the query matrix, the CSP computes, in the ciphertext domain, the similarity $\mathrm{enc\_result}$ between the sample to be classified and the samples of known categories and returns it to USER.
(c) USER decrypts $\mathrm{enc\_result}$, selects the top k samples with the highest similarity, and obtains the category label $Y'$ of the sample to be classified according to the voting rule.
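The standardization step that USER performs locally before encryption can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the variance uses the n − 1 denominator given above, and the new sample is standardized with the means and variances of the known samples, which is an assumption because the text does not say which statistics are used for the query.

```python
import math

def standardize(samples):
    """Column-wise z-score: subtract the mean and divide by the square root of the
    sample variance (n - 1 denominator). Returns the scaled data and the statistics.
    Assumes every characteristic index has nonzero variance."""
    n, m = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    variances = [sum((row[j] - means[j]) ** 2 for row in samples) / (n - 1)
                 for j in range(m)]
    scaled = [[(row[j] - means[j]) / math.sqrt(variances[j]) for j in range(m)]
              for row in samples]
    return scaled, means, variances

def standardize_query(query, means, variances):
    """Standardize a new sample with the statistics of the known samples (assumption)."""
    return [(q - mu) / math.sqrt(v) for q, mu, v in zip(query, means, variances)]

known = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled, means, variances = standardize(known)
query_scaled = standardize_query([2.0, 40.0], means, variances)
```

After this step each characteristic index of the known samples has mean 0 and variance 1, so indexes measured in different units contribute on a comparable scale.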

Security Similarity Calculation.
In the process of data mining and data analysis, there are many methods to measure the differences between samples. In the CKKSKNNC protocol, this paper uses the Euclidean distance, Pearson correlation coefficient, and cosine similarity to measure the similarity between samples.

Euclidean Distance.
The Euclidean distance [41] is the most popular similarity measurement method and has been widely used in scenarios such as face recognition. The traditional computation either directly calculates the absolute distance between points in the multidimensional space or obtains the Euclidean distance between samples through the matrix inner product [42]. Two methods for calculating the Euclidean distance are introduced below.
Method 1: since a ciphertext encrypted by the CKKS homomorphic encryption algorithm cannot be directly square-rooted, the square root is omitted, and the ciphertext distance between the sample to be classified and a sample of known category is
$$\mathrm{enc\_}d = (\mathrm{enc\_}X' - \mathrm{enc\_}X)^T (\mathrm{enc\_}X' - \mathrm{enc\_}X).$$
The smaller the distance, the higher the similarity of the samples.
Method 2: before uploading data, USER computes $X^{*\prime} = (1, X'^T X', X'^T)^T$ and $X^{*} = (X^T X, 1, -2X^T)^T$, respectively, and uploads them to CSP after encryption. CSP can then directly compute the ciphertext distance between two samples through the inner product:
$$\mathrm{enc\_}d = (\mathrm{enc\_}X^{*})^T (\mathrm{enc\_}X^{*\prime}). \quad (2)$$
The result of this method is the same as that of the first method. Although it increases the computational complexity, it allows batch computation.
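To see why the two methods agree, the inner product of Method 2 can be expanded on plaintext vectors. The derivation below assumes the augmented vectors are read as $X^{*} = (X^T X, 1, -2X^T)^T$ and $X^{*\prime} = (1, X'^T X', X'^T)^T$; this reading is a reconstruction chosen so that the product equals the squared distance of Method 1.

```latex
\begin{aligned}
(X^{*})^{T} X^{*\prime}
  &= (X^{T}X)\cdot 1 \;+\; 1\cdot(X'^{T}X') \;+\; (-2X^{T})\,X' \\
  &= X^{T}X - 2X^{T}X' + X'^{T}X' \\
  &= (X' - X)^{T}(X' - X) \;=\; \sum_{i=1}^{m} (x'_i - x_i)^2 .
\end{aligned}
```

Each sample therefore grows from m to m + 2 entries, but the distance reduces to a single inner product, which is what makes the matrix form suitable for batching.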

Pearson's Correlation Coefficient.
Since differences in the magnitude of the characteristic indexes have a large impact on the Euclidean distance, in some applications people choose the Pearson correlation coefficient [43], which is insensitive to magnitude, to measure the similarity between samples. The data samples have been standardized before being uploaded to the CSP; the computation process is as follows:

Cosine Similarity.
The angle cosine similarity, like the Pearson correlation coefficient, is insensitive to the magnitude of the characteristic index. It is often used to compute text similarity, but it needs to be computed on the original data. It measures similarity by calculating the cosine of the angle $\theta$ between two samples in the vector space, and it pays more attention to the difference in direction between the vectors than to their distance. Similarly, because a CKKS ciphertext cannot be directly square-rooted, the cosine similarity computation process in the protocol is as follows:
$$\cos^2\theta = \frac{(\mathrm{enc\_}X'^T \, \mathrm{enc\_}X)^2}{(\mathrm{enc\_}X'^T \, \mathrm{enc\_}X')(\mathrm{enc\_}X^T \, \mathrm{enc\_}X)}.$$

Security and Communication Networks

In terms of actual implementation, division cannot be performed directly on CKKS ciphertexts, so the CSP actually returns the two values $(\mathrm{enc\_}X'^T \, \mathrm{enc\_}X)^2$ and $(\mathrm{enc\_}X'^T \, \mathrm{enc\_}X')(\mathrm{enc\_}X^T \, \mathrm{enc\_}X)$ to the USER, and the USER decrypts them and performs the division on the plaintext.
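The split between the work done by CSP and the division done by USER can be sketched on plaintext values. This is a plaintext analogue under stated assumptions, not the authors' code: the function names are hypothetical, and in the real protocol the two returned values would be CKKS ciphertexts that USER decrypts before dividing.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def csp_cosine_parts(x_query, x_known):
    """Values CSP can compute with additions and multiplications only:
    (X'^T X)^2 and (X'^T X')(X^T X). No division or square root is needed."""
    numerator = dot(x_query, x_known) ** 2
    denominator = dot(x_query, x_query) * dot(x_known, x_known)
    return numerator, denominator

def user_cosine_squared(numerator, denominator):
    """Division performed by USER on the decrypted plaintext values."""
    return numerator / denominator

num, den = csp_cosine_parts([1.0, 1.0], [1.0, 0.0])   # vectors 45 degrees apart
cos_sq = user_cosine_squared(num, den)
```

Because the numerator is squared, the sign of the cosine is not recoverable from the two returned values; the sketch simply ranks by the squared value, which is an assumption about how the result is used.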

Batch.
Assume that CSP stores ciphertext samples of known categories $[(\mathrm{enc\_}X_1, \mathrm{enc\_}Y_1), (\mathrm{enc\_}X_2, \mathrm{enc\_}Y_2), \ldots, (\mathrm{enc\_}X_n, \mathrm{enc\_}Y_n)]$. When USER uploads multiple ciphertext samples to be classified $[\mathrm{enc\_}X'_1, \mathrm{enc\_}X'_2, \ldots, \mathrm{enc\_}X'_q]$, CSP needs to compute the similarity between each sample to be classified and each sample of a known category, and the encryption method is determined by the method of calculating the similarity.

Euclidean Distance.
When CSP uses the Euclidean distance method 1 to compute similarity, batch processing cannot be performed. USER needs to encrypt each sample separately, and CSP needs to separately compute the ciphertext similarity between each sample to be classified and all of the samples of known categories and return the similarity matrix $\mathrm{enc\_}D_1 = [\mathrm{enc\_}d_{ij}]$, $i \in [1, n]$, $j \in [1, q]$, where $\mathrm{enc\_}d_{ij}$ is the similarity between the $i$-th known category sample and the $j$-th sample to be classified. When CSP uses the Euclidean distance method 2 to compute similarity, USER can directly encrypt the plaintext matrix of the samples to be classified and obtain the ciphertext matrix $\mathrm{enc\_}X' = [\mathrm{enc\_}X'_1, \mathrm{enc\_}X'_2, \ldots, \mathrm{enc\_}X'_q]$. CSP computes $[(\mathrm{enc\_}X^{*})_1, (\mathrm{enc\_}X^{*})_2, \ldots, (\mathrm{enc\_}X^{*})_n]$ and $[(\mathrm{enc\_}X^{*\prime})_1, (\mathrm{enc\_}X^{*\prime})_2, \ldots, (\mathrm{enc\_}X^{*\prime})_q]$, and the similarity matrix is $\mathrm{enc\_}D_2 = [\mathrm{enc\_}d_{ij}] = (\mathrm{enc\_}X^{*})^T (\mathrm{enc\_}X^{*\prime})$, $i \in [1, n]$, $j \in [1, q]$, where $\mathrm{enc\_}d_{ij}$ is the similarity between the $i$-th sample of a known category and the $j$-th sample to be classified.
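The batch form of method 2 amounts to one matrix product between the augmented known samples and the augmented samples to be classified. The sketch below works on plaintext lists and assumes the augmented vectors $(X^T X, 1, -2X^T)^T$ and $(1, X'^T X', X'^T)^T$; the function names are hypothetical, and in the protocol the product would be evaluated on CKKS ciphertexts.

```python
def augment_known(x):
    """X* = (X^T X, 1, -2 X^T)^T for a sample of known category."""
    return [sum(v * v for v in x), 1.0] + [-2.0 * v for v in x]

def augment_query(xq):
    """X'* = (1, X'^T X', X'^T)^T for a sample to be classified."""
    return [1.0, sum(v * v for v in xq)] + [float(v) for v in xq]

def distance_matrix(known, queries):
    """Return the n x q matrix D with D[i][j] = squared Euclidean distance between
    known[i] and queries[j], computed only through inner products."""
    a = [augment_known(x) for x in known]        # n rows of length m + 2
    b = [augment_query(xq) for xq in queries]    # q rows of length m + 2
    return [[sum(p * q for p, q in zip(ai, bj)) for bj in b] for ai in a]

known = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
queries = [[0.0, 0.0], [1.0, 1.0]]
D = distance_matrix(known, queries)
```

Each column of D holds the distances from one sample to be classified to all known samples, so USER can pick the k smallest entries per column after decryption and apply the voting rule.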

Security Analysis
According to the protocol model of CKKSKNNC, the CSP is "honest and curious", the USER's private key is stored only locally, and the CSP is only in charge of storing the user-uploaded ciphertext data and computing on it; therefore, the CSP cannot obtain the USER's private information. The security definition of the semitrusted model is listed as follows.
Definition (security of semitrusted model): assume a function $f(x, y)$, where $f_1(x, y)$ and $f_2(x, y)$ are, respectively, the first and second elements of $f(x, y)$. Assume that $\Gamma$ is a two-party protocol used to compute $f(x, y)$. $\mathrm{PARTY}_1(x, y)$ is the view of the first party executing the $\Gamma$ protocol, where $\mathrm{PARTY}_1(x, y) = (x, r, p_1, p_2, \ldots, p_t)$, $x$ represents the input, $r$ represents the randomness, and $p_i$ represents the $i$-th message received. Similarly, $\mathrm{PARTY}_2(x, y) = (y, r, p_1, p_2, \ldots, p_t)$. If there exist probabilistic polynomial-time algorithms $V_1$ and $V_2$ such that
$$\{V_1(x, f_1(x, y))\} \stackrel{c}{\equiv} \{\mathrm{PARTY}_1(x, y)\}, \qquad \{V_2(y, f_2(x, y))\} \stackrel{c}{\equiv} \{\mathrm{PARTY}_2(x, y)\},$$
where $\stackrel{c}{\equiv}$ denotes computational indistinguishability, then the $\Gamma$ protocol is said to compute $f(x, y)$ securely against a semitrusted adversary.
Proof. According to the protocol algorithm, USER's sample data, category labels, and query matrix are all transmitted in the form of CKKS ciphertext. Their security is guaranteed by the CKKS homomorphic encryption scheme itself. Therefore, the CSP cannot restore the original user data and keys from the stored user data and intermediate computation results. When the result is returned, the similarity computation result is transmitted to the user in ciphertext for decryption, so an attacker cannot recover the user's original sensitive data from ciphertext intercepted during transmission or from data stolen from the cloud server. In addition, no matter which method is used to compute the similarity, at most three multiplications are needed. Therefore, the CKKSKNNC protocol algorithm does not impose additional special requirements on the parameters of the CKKS encryption scheme.

Experimental Test
To test the effectiveness of the proposed scheme, we conduct our experiments on the Windows 10 operating system with an Intel® Core™ i7-7700HQ CPU @ 2.80 GHz and 16 GB RAM, using PyCharm 2020.1.1 x64 to call the TenSEAL-0.1.4 library to implement the CKKS encryption scheme. We take poly_modulus_degree = 8192, coeff_mod_bit_sizes = [50, 30, 30, 50], and scale = 30 as the parameters of the CKKS homomorphic encryption scheme and test on the IRIS data set.
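For orientation, the parameters above could be passed to TenSEAL roughly as follows. This is a hedged sketch rather than the authors' setup code: it assumes that "scale = 30" means a global scale of 2**30, it uses the context API of recent TenSEAL releases (which may differ from version 0.1.4), and the names `CKKS_PARAMS`, `rescale_levels`, and `make_context` are hypothetical. TenSEAL is imported lazily so that the parameter helpers can be used without the library installed.

```python
CKKS_PARAMS = {
    "poly_modulus_degree": 8192,
    "coeff_mod_bit_sizes": [50, 30, 30, 50],
    "scale_bits": 30,  # assumption: "scale = 30" is read as a scale of 2**30
}

def rescale_levels(bit_sizes):
    """Number of intermediate primes in the modulus chain. Under the usual CKKS
    convention each one allows one rescaling after a multiplication; the first
    and last primes are not counted."""
    return len(bit_sizes) - 2

def make_context(params=CKKS_PARAMS):
    """Create a TenSEAL CKKS context from the parameters (requires TenSEAL)."""
    import tenseal as ts  # imported here so the helpers above work without TenSEAL

    ctx = ts.context(
        ts.SCHEME_TYPE.CKKS,
        poly_modulus_degree=params["poly_modulus_degree"],
        coeff_mod_bit_sizes=params["coeff_mod_bit_sizes"],
    )
    ctx.global_scale = 2 ** params["scale_bits"]
    ctx.generate_galois_keys()  # rotations are needed for dot products
    return ctx

total_bits = sum(CKKS_PARAMS["coeff_mod_bit_sizes"])
levels = rescale_levels(CKKS_PARAMS["coeff_mod_bit_sizes"])
```

Matching the two 30-bit intermediate primes to a 30-bit scale keeps the scale roughly stable after each rescaling, which is the usual reason for choosing the middle bit sizes equal to the scale.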

Efficiency of Similarity Calculation.
In this part, we test the computational efficiency of the different similarity algorithms. We randomly selected 100 groups of the IRIS data set as the known class samples and randomly selected 10, 20, 30, 40, and 50 groups from the remaining samples as the samples to be classified to form the test sets. We recorded the time to compute the similarity on the ciphertext over 30 experiments and took the average value as the final experimental data. The computational efficiency of the four similarity algorithms is shown in Figure 7.
In this part, we did not record the time to transpose the ciphertext matrix, because we treat matrix transposition as preprocessing work. The results show that as the number of samples to be classified in the test set increases, the computation costs of the four similarity algorithms increase linearly. Among them, the Euclidean distance 2 has the highest computational efficiency, and cosine similarity has the lowest efficiency because it performs more ciphertext multiplications. It is worth mentioning that the ciphertext in the Euclidean distance 1 uses the CKKS_Vector data type, while the other methods use the CKKS_Tensor data type. The storage overhead of the different test sets is shown in Table 2 (unit: byte).
It shows that, as the number of samples rises, the memory occupied by the ciphertext data set of CKKS_Vector type increases linearly and exceeds that of the same number of ciphertext samples of CKKS_Tensor type, while the ciphertext data set of CKKS_Tensor type occupies constant memory. Therefore, when dealing with big data, try to avoid using the Euclidean distance 1 and cosine similarity to compute the similarity.

Accuracy of Similarity Calculation.
In this part, we take a random number of samples to be classified between 35 and 45 as the data set and test the classification accuracy of the four similarity algorithms when $k = 3$, 5, 7, and 9, respectively. We record the results of 30 experiments and take the average value as the final experimental data. The classification accuracy of the four similarity algorithms is shown in Figure 8.
It shows that the accuracy of the Euclidean distance and cosine similarity is stable at about 97%, but the accuracy of the Pearson correlation coefficient is as low as about 65%. Therefore, when performing KNN classification, try to avoid using the Pearson correlation coefficient to compute the similarity.
... scheme, we test the efficiency and accuracy of computing the Euclidean distance in the Paillier and CKKS homomorphic encryption schemes with the same encryption parameters and in plaintext. It should be emphasized that the CKKS scheme is implemented with the TenSEAL homomorphic encryption library, which wraps the C++-based SEAL encryption library into a dynamic library callable from Python, so the speed of the CKKS scheme depends on the efficiency of the SEAL library in C++. Therefore, we did the comparative experiment in C++, implementing the Paillier homomorphic encryption scheme by calling the NTL and GMP libraries with encryption parameter $|N| = 1024$. First, we compare the computational efficiency of the three schemes. We randomly selected 10, 20, 30, 40, and 50 groups of the remaining samples as the test sets to be classified and measured the similarity computation time, as shown in Figure 9. Clearly, the CKKS scheme is more computationally efficient than the Paillier scheme and closer to plaintext, and CKKS supports batch processing of data. In terms of encryption, the CKKS scheme can directly encrypt the data matrix, while the Paillier scheme can only encrypt numbers one by one and does not support floating-point operations. Then, we compare the accuracies of the three schemes. We take a random number of samples to be classified between 35 and 45 as a data set and test the classification accuracy of the four similarity algorithms at $k = 3$, 5, 7, and 9, respectively. The result can be found in Figure 10.
It is evident that whether the CKKS scheme or the Paillier scheme is used for security calculation, the calculation accuracy is not different from that calculated directly in plain text. We then tested the storage costs of the three schemes in datasets with different sample sizes, as shown in Table 3 (in bytes).
As the number of samples goes up, the encrypted data set of the CKKS scheme occupies the least memory and remains unchanged, while the Paillier scheme and the plaintext scheme occupy more memory and increase linearly. A secure KNN classification algorithm that uses the CKKS scheme therefore has a clear advantage when processing large data.

Conclusions
To protect the sensitive private data of cloud servers and users during transmission while meeting the classification accuracy and computational efficiency requirements of classification algorithms, this paper implements a secure KNN classification scheme in the ciphertext domain for Cyberspace (CKKSKNNC), based on the KNN classification scheme and the CKKS algorithm. We use the TenSEAL homomorphic encryption library to implement the CKKS homomorphic encryption scheme, select two Euclidean distance schemes, the Pearson correlation coefficient, and cosine similarity as the similarity algorithms in the KNN classification algorithm, and test the computational efficiency, storage cost, and classification accuracy of the four similarity algorithms on the IRIS data set. Through experimental tests, we found that the Euclidean distance Scheme 1 has the largest storage cost, cosine similarity has the lowest computational efficiency, and Pearson's correlation coefficient has the lowest classification accuracy. Nevertheless, which similarity algorithm should be used depends on the specific data.

Data Availability
Marked datasets, which support the conclusion of this study, can be obtained upon request to the corresponding authors.

Conflicts of Interest
The authors declare that there are no conflicts of interest.