A Bitwise Design and Implementation for Privacy-Preserving Data Mining: From Atomic Operations to Advanced Algorithms

Homomorphic encryption (HE) is considered one of the most powerful solutions for protecting clients' data from malicious users and even from servers in cloud computing. However, although HE can protect data in theory, it has not been widely used because many HE operations, especially multiplication, are too slow. In addition, existing data mining studies on encrypted data focus on implementing only specific algorithms without addressing the fundamental problems of HE. In this paper, we propose a fundamental design and implementation of data mining algorithms built from logic gates. To do this, we design the logic of various atomic operations in the encrypted domain and finally apply this logic to well-known data mining algorithms. We also analyze the execution time of both the atomic operations and the advanced algorithms.


Introduction
With the progress of cloud storage, advanced data processing and analysis techniques based on machine learning and data mining have been developed to extract valuable information. However, storing and managing information on cloud servers raises concerns about data privacy and security. This is because, with conventional cryptosystems such as AES and DES, the server must decrypt the data in order to process it, even though the client transmits the data in encrypted form. Eventually, users must share the decryption key with the cloud, which can lead to data infringement by a malicious server.
Homomorphic encryption (HE) [1,2] is regarded as one of the most powerful solutions to the data security problem in the cloud, since data can be processed in the encrypted domain without decryption. However, data analysis with HE is not yet popular in the real world, although it is highly recommended for providing proper security in the cloud. The major reason is that it is difficult to link HE and machine learning. HE is a relatively new cryptosystem that relies on deep mathematical properties of lattices, which makes it difficult for data scientists to understand and use.
In addition, several well-known HE schemes support only very simple operations such as addition and multiplication between integers. Although Gentry [3] presented fully homomorphic encryption (FHE), which in theory allows an unlimited number of operations on ciphertexts, it had many limitations in adapting to the real cloud model [4]. Since implementation and engineering are not the main interest of theoretical cryptographers, practical implementations have been developed far less than the theory of FHE. Therefore, to date, FHE has been applied only to specific algorithms without solving its fundamental problems [5–11].
From this point of view, we propose an FHE computation method that can be applied more generally by using bitwise logic circuits, rather than algorithms that work only under certain conditions. By designing the basic operations needed for machine learning, we build a universal link between HE and machine learning. Researchers studying FHE can easily apply machine learning with homomorphic operations, and machine learning researchers can run data-driven analysis algorithms on encrypted data even without any knowledge of FHE.
Our contribution in this paper is threefold: (i) to build simple data mining techniques with FHE, we design various atomic operations, including absolute value, multiplication, comparison, and sorting, from the gate operations provided by the TFHE library; (ii) in contrast to integer-based FHE schemes, in which the possible operations are limited, all operations, including division and logarithm, can be designed in the bit-based FHE scheme; (iii) we demonstrate the applicability of our bitwise FHE scheme to several well-known data mining techniques: linear regression, logistic regression, the k-NN classifier, and k-means clustering.

Background
2.1. Homomorphic Encryption.
Homomorphic encryption (HE) [1,2] is a cryptosystem in which the result of an operation between ciphertexts, once decrypted, equals the result of the corresponding operation between the plaintexts. For ciphertexts of a and b, this can be expressed as $D[\mathrm{Add}(E[a], E[b])] = a + b$ and $D[\mathrm{Mult}(E[a], E[b])] = a \times b$, where E[•] and D[•] denote encryption and decryption, respectively, and Add and Mult denote the homomorphic addition and multiplication. The concept of HE was first presented in 1978 by Rivest et al. [12]. Many HE schemes have been introduced since then, and the most popular one was the Paillier cryptosystem, proposed by Paillier [13] in 1999. However, these schemes were limited: partially homomorphic schemes support only one type of operation, and schemes supporting both addition and multiplication could perform only a limited number of operations, since the encryption noise grows each time an operation is performed. The solution to this noise accumulation problem was the fully homomorphic encryption (FHE) scheme of Gentry [3] in 2009. Gentry proposed a bootstrapping algorithm that removes the accumulated noise, thereby eliminating the limit on the number of operations. However, Gentry's technique had to encrypt each plaintext bit by bit, which placed a heavy burden on memory because the ciphertexts were very large. In addition, bootstrapping was performed with a very complicated algorithm, so it took dozens of minutes to bootstrap a single bit. For these reasons, many FHE libraries now use integer-based schemes, but these have the disadvantage that the possible operations are very limited.

TFHE Library for FHE.
In 2017, Ilaria Chillotti, Nicolas Gama, Mariya Georgieva, and Malika Izabachène proposed the TFHE library [14], an improved version of the FHEW library [15]. It uses a bit-by-bit encryption scheme similar to Gentry's initial FHE [3]. However, unlike [3], TFHE builds its operations at a more fundamental level than addition and multiplication between ciphertexts: it operates on encrypted bits with binary circuits. In other words, TFHE supports NOT, AND, OR, NAND, NOR, XOR, and XNOR gate operations between encrypted bits, allowing users to construct encrypted circuits from these logical operations. Another advantage of TFHE is that it efficiently solves the bootstrapping problem, which was the biggest obstacle to using FHE. TFHE performs bootstrapping automatically as part of every gate operation, unlike conventional FHE, in which bootstrapping must be invoked explicitly to remove noise after a certain number of operations. As a result, computations of unlimited length can be performed. Bootstrapping takes less than 0.1 seconds, which is the fastest performance among all of the preceding FHE schemes.
In addition, TFHE supports a multiplexer function, which makes circuits more convenient to implement and faster. In the function below, a is the multiplexer (selector) input, and the output is either b or c depending on its value: $\mathrm{MUX}(a, b, c) = (a \wedge b) \vee (\neg a \wedge c)$, that is, the output decrypts to b if D[a] = 1 and to c if D[a] = 0.
This process is repeated until the clusters converge, and finally, the data are clustered into k clusters.
The result of k-means, like that of kNN, is affected by the distance measurement method.

Problems
3.1. FHE for Machine Learning.
Although machine learning and FHE each have a long history, research on them has been conducted separately for a long time. Recently, with the era of cloud computing, privacy-preserving machine learning and data mining have become a hot topic. There have been several studies on connecting FHE and machine learning [6,7,9,10,16].
However, as mentioned in Section 2, it is difficult to apply FHE to machine learning algorithms, because most libraries support only limited operations such as addition and multiplication. Accordingly, existing machine learning studies on encrypted data have focused on implementing specific algorithms such as the Naïve Bayes classifier [9] or linear regression [16]. In addition, since FHE requires complex theoretical knowledge, it is difficult for general machine learning engineers to understand. Worse, using an FHE scheme requires a technique for replacing every plaintext operation with a homomorphic operation.
In this paper, we focus on how to efficiently implement basic atomic operations and apply them universally to various machine learning algorithms. These studies can serve as a good mediator between FHE and machine learning.

Integer-Level Encryption vs. Bitwise Encryption.
FHE that operates in integer space takes scalar integers, or polynomials with integer coefficients, as input and performs operations on an integer basis. Therefore, real-valued data require an additional integer encoding. Previous studies on FHE applications have used a rounding function to convert real numbers to integers for the encoding and decoding processes. Most of them multiply by a scaling constant k before rounding to preserve the precision of the original number, and to recover the encoded value, the decrypted result must be divided by k:
(1) Encoding: $m = \mathrm{round}(k \cdot x)$; Decoding: $x \approx D[E[m]] / k$.
However, this method has a problem: it operates on an approximation rather than on the exact data. The approximation accuracy is determined by the scaling constant, which the user must also choose.
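To make the precision trade-off concrete, here is a minimal Python sketch of the scale-and-round encoding. The function names encode and decode and the sample value are ours, chosen only for illustration; in a real integer-based scheme the encoded integer would be encrypted between the two steps.

```python
def encode(x: float, k: int) -> int:
    """Scale a real value by the constant k and round to the nearest integer."""
    return round(x * k)


def decode(m: int, k: int) -> float:
    """Recover an approximation of the real value from a decrypted integer."""
    return m / k


x = 3.14159
# A larger scaling constant keeps more fractional digits of x.
print([decode(encode(x, k), k) for k in (10, 100, 1000)])  # [3.1, 3.14, 3.142]
```

Whatever k is chosen, digits beyond its resolution are lost before encryption, which is the approximation error the paragraph refers to.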
On the other hand, bitwise encryption does not require an integer encoding process, because all real-valued data can be represented in bits. In addition, since computers store and process data bit by bit, a generalized encryption scheme can easily be applied to any data.
In this paper, we introduce the logic of various operations for the bitwise encryption scheme using the TFHE library. We present a method for constructing atomic operations from circuit operations on each bit after converting integer data into bits. Table 1 shows the logical operators used in this study and their notation.
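As a plaintext illustration of the bit representation assumed here, each ciphertext bit can be modeled as a Python int 0 or 1, and an integer as an l-bit two's complement list stored lsb first, which is the order in which the circuits consume their inputs. The helper names to_bits and from_bits are hypothetical; in TFHE each of these bits would be encrypted separately.

```python
def to_bits(value: int, l: int) -> list[int]:
    """Return the l-bit two's complement bits of value, lsb first."""
    if not -(1 << (l - 1)) <= value < (1 << (l - 1)):
        raise ValueError(f"{value} does not fit in {l} signed bits")
    # Python's >> on negative ints is arithmetic, so this yields two's complement.
    return [(value >> i) & 1 for i in range(l)]


def from_bits(bits: list[int]) -> int:
    """Interpret an lsb-first bit list as a signed two's complement integer."""
    unsigned = sum(bit << i for i, bit in enumerate(bits))
    return unsigned - (1 << len(bits)) if bits[-1] else unsigned


print(to_bits(5, 8))   # [1, 0, 1, 0, 0, 0, 0, 0]
print(to_bits(-5, 8))  # [1, 1, 0, 1, 1, 1, 1, 1]
```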

Designing Homomorphic Atomic Operations
Our method uses the TFHE library, so we perform all operations bit by bit. This is similar to the way a computer processes binary plaintext data with AND/OR/NAND/NOR/XOR/XNOR/NOT gates. However, since the actual values being computed are unknown, the operations must be designed differently from plaintext algorithms. For example, a ciphertext cannot drive an if-statement such as "if ciphertext = E[0], then execute the following command": because encryption is randomized, two encryptions of 0 are different ciphertexts, so a ciphertext cannot be compared with E[0] in the usual way. Considering these characteristics, we introduce a new design of atomic operations in this section. The atomic operations include Addition, 2's Complement, Subtraction, Equivalent Comparison, Large and Small Comparison, Shift, Absolute Value, Multiplication, and Division. Note that Addition, Subtraction, and Multiplication have already been introduced in the literature [14,17]. However, the other atomic operations have rarely been studied, although they are highly significant for numerical computation. We describe the algorithms of both the previously studied and the rarely studied atomic operations in this section, because they are classified as homomorphic atomic operations, separately from the advanced homomorphic data mining algorithms presented later.
All algorithms introduced in this paper are implemented and evaluated on an Intel i7-7700 3.60 GHz CPU with 8.0 GB RAM running Ubuntu 16.04.4 LTS.

Addition Operation.
Addition is one of the most basic operations. There are many ways to implement a full adder circuit from basic gates, for example with 9 NAND gates, or with 7 NOR and 5 NOT gates. However, since the execution time of a circuit in the TFHE library is roughly proportional to the number of basic gates it evaluates, addition can be designed more efficiently with only 2 XOR, 2 AND, and 1 OR gates. More details are described in Figure 1.
In the circuit diagram of Figure 1, the bits of a and b are fed in starting from the least significant bit (lsb), and the initial carry $c_0$ is set to E[0]. The output $s_i$ of the circuit is the sum bit at position i, and $c_{i+1}$ is the carry into the next bit: $s_i = a_i \oplus b_i \oplus c_i$ and $c_{i+1} = (a_i \wedge b_i) \vee ((a_i \oplus b_i) \wedge c_i)$.
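The Figure 1 cell can be simulated on plaintext bits to check the gate budget. This is a sketch assuming bits are Python ints in lsb-first lists, not TFHE code; in the encrypted version each ^, & and | would be one bootstrapped XOR, AND or OR gate.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One cell of Figure 1: 2 XOR, 2 AND and 1 OR gate."""
    t = a ^ b                    # XOR 1, shared by the sum and the carry
    s = t ^ c                    # XOR 2: sum bit s_i
    carry = (a & b) | (t & c)    # AND 1, AND 2, OR: carry c_{i+1}
    return s, carry


def add(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Ripple-carry addition of two lsb-first lists; overflow wraps modulo 2**l."""
    carry = 0                    # c_0 plays the role of E[0]
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out


print(add([1, 1, 0, 0], [1, 0, 1, 0]))  # 3 + 5 = 8 -> [0, 0, 0, 1]
```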

2's Complement Operation.
Representing negative numbers is necessary for operations on integer binary data. There are two main ways to represent negative numbers in a computer: the 1's complement method and the 2's complement method. The 1's complement method has the advantage of being simpler to compute.
Security and Communication Networks
In the 1's complement method, each bit of the number is XORed with a single bit 1. This can be replaced by applying a NOT gate to every bit, because the NOT gate is significantly faster than the XOR gate. However, the 1's complement method has two representations of 0, so operations such as addition and subtraction require logic different from ordinary plaintext arithmetic. The 2's complement method improves on this: the 2's complement is obtained by adding the integer 1 to the 1's complement. Since the binary representation of 1 is all zeros except for the lsb, this addition only needs to propagate a carry upward from the lsb. Therefore, instead of the full adder described above, the addition can be performed with half adders, which need no separate carry input, placed after the NOT gates; this reduces the execution time. Writing $\bar{a}_i = \neg a_i$ for the output of the NOT gate, each stage computes $s_i = \bar{a}_i \oplus b_i$ and $b_{i+1} = \bar{a}_i \wedge b_i$. This circuit is shown in Figure 2.
To take the 2's complement of a number a, set b = [00...01] and feed both inputs in starting from their lsbs. The output $s_i$ of the circuit is the result bit at position i, and the carry $b_{i+1}$ becomes the b input of the next bit.
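A plaintext sketch of the Figure 2 idea, under the same lsb-first bit-list assumption: NOT every bit, then ripple the initial $b_0 = 1$ through half adders. The function name is ours.

```python
def twos_complement(a_bits: list[int]) -> list[int]:
    """Negate an lsb-first two's complement number as in Figure 2:
    NOT every bit, then add b = [0...01] with a chain of half adders."""
    b = 1                    # b_0: the lsb of [00...01]
    out = []
    for a in a_bits:
        na = 1 - a           # NOT gate (no bootstrapping needed in TFHE)
        out.append(na ^ b)   # s_i = NOT(a_i) XOR b_i
        b = na & b           # b_{i+1} = NOT(a_i) AND b_i, the next carry
    return out


print(twos_complement([1, 0, 1, 0]))  # 5 -> -5 = [1, 1, 0, 1] in 4 bits
```

Each stage costs one XOR and one AND plus a nearly free NOT, compared with 2 XOR, 2 AND and 1 OR for a full adder cell.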

Subtraction Operation.
In a typical computer, subtraction is implemented with the 2's complement method and addition, so no separate subtraction logic is needed. Subtraction can be processed the same way in the encrypted domain, but it can also be implemented directly with 2 XOR, 2 AND, 1 OR, and 2 NOT gates. The detailed circuit diagram is shown in Figure 3.
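Subtraction takes its inputs starting from the lsbs of a and b. The output $d_i$ of the circuit is the difference bit at position i, and $c_{i+1}$ is the carry (borrow) into the next bit, with $c_0 = E[0]$. After defining $D = a_i \oplus b_i$, $d_i$ and $c_{i+1}$ are given by
$d_i = D \oplus c_i$, $c_{i+1} = (\neg a_i \wedge b_i) \vee (\neg D \wedge c_i)$,
which uses exactly 2 XOR, 2 AND, 1 OR, and 2 NOT gates.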
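The full-subtractor equations above can be checked with a short plaintext simulation; the names are ours, and the bits are ordinary ints standing in for ciphertexts.

```python
def full_subtractor(a: int, b: int, c: int) -> tuple[int, int]:
    """Difference bit d_i and borrow c_{i+1} for a_i - b_i - c_i,
    using 2 XOR, 2 AND, 1 OR and 2 NOT gates."""
    D = a ^ b                                  # XOR 1
    d = D ^ c                                  # XOR 2: difference bit
    borrow = ((1 - a) & b) | ((1 - D) & c)     # NOT, AND, NOT, AND, OR
    return d, borrow


def sub(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Compute a - b on lsb-first lists; the result wraps modulo 2**l."""
    c = 0                                      # c_0 = E[0]
    out = []
    for a, b in zip(a_bits, b_bits):
        d, c = full_subtractor(a, b, c)
        out.append(d)
    return out


print(sub([1, 0, 1, 0], [1, 1, 0, 0]))  # 5 - 3 = 2 -> [0, 1, 0, 0]
```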

Large and Small Comparison.
We explain only the large comparison (greater than), because the large and small comparisons are logically similar. In plaintext, a computer compares two numbers from the upper bits to the lower bits and outputs the result as soon as it finds a bit position where the two values differ. However, since it is not known whether each compared bit is a ciphertext of 1 or of 0, we cannot tell which bit position differs or which of the two numbers is larger. Thus, new logic is needed.
First, consider the sign bit of the result of subtracting the first input a from the second input b. If a is less than or equal to b, the sign bit is E[0]; if a is larger, it is E[1]. Therefore, this subtraction could be used as a large comparison. However, considering the speed of the circuit, we instead use a method based on the multiplexer function and the XNOR gate. The detailed circuit diagram is shown in Figure 5. The comparison result is obtained by repeating this circuit once for every bit of the data.
The larger-than-or-equal and smaller-than-or-equal comparisons are obtained by applying a NOT gate to the result of the small comparison and the large comparison, respectively.
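A plaintext sketch of a data-oblivious greater-than comparison built from one XNOR and one MUX per bit. It scans every bit from the lsb to the msb and lets each differing position overwrite the running result, so the highest differing bit decides. Figure 5 is not reproduced here; in particular, using b's bit as the decider at the sign position to handle two's complement inputs is our assumption, and mux is only a plaintext stand-in for the TFHE multiplexer.

```python
def mux(s: int, x: int, y: int) -> int:
    """Plaintext stand-in for the TFHE multiplexer: x if s == 1, else y."""
    return x if s else y


def greater_than(a_bits: list[int], b_bits: list[int]) -> int:
    """Return 1 if a > b for lsb-first two's complement inputs, else 0."""
    res = 0                                  # E[0]: "a > b" not established yet
    msb = len(a_bits) - 1
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        same = 1 - (a ^ b)                   # XNOR gate
        # A differing magnitude bit favors a when a_i = 1; a differing sign
        # bit favors a when b_i = 1 (b negative). i is public, not secret.
        decider = b if i == msb else a
        res = mux(same, res, decider)        # keep res only if the bits agree
    return res


def greater_or_equal(a_bits: list[int], b_bits: list[int]) -> int:
    """a >= b is NOT(a < b), and a < b is greater_than with swapped inputs."""
    return 1 - greater_than(b_bits, a_bits)
```

The loop always runs l times, so the cost is l XNOR gates and l MUX gates regardless of the encrypted values.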

Shift Operation.
Since the ciphertext is encrypted bitwise, it can be shifted in the same way as in plaintext: shift the bits k positions to the left and fill the k vacated low-order bits with E[0]. Shifting left by k bits multiplies the value by $2^k$, as shown in Algorithm 1.
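Because shifting only relabels bit positions, a sketch needs no gates at all. The Python below, an illustration on plaintext lsb-first bit lists with function names of our choosing, covers the left shift described above and the arithmetic right shift discussed next; k is assumed to satisfy 0 ≤ k ≤ l.

```python
def shift_left(bits: list[int], k: int) -> list[int]:
    """Multiply by 2**k (mod 2**l): fill the k low positions with E[0]
    (HomCONSTANT(0) in the text) and drop the k highest bits."""
    return [0] * k + bits[: len(bits) - k]


def shift_right_arith(bits: list[int], k: int) -> list[int]:
    """Floor-divide by 2**k: drop the k lowest bits and fill the k high
    positions with copies of the sign bit (HomCOPY in the text)."""
    return bits[k:] + [bits[-1]] * k


eight = [0, 0, 0, 1, 0, 0, 0, 0]        # 8 in 8 bits, lsb first
print(shift_left(eight, 2))             # 32 -> [0, 0, 0, 0, 0, 1, 0, 0]
print(shift_right_arith(eight, 2))      # 2  -> [0, 1, 0, 0, 0, 0, 0, 0]
```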
In this algorithm, "HomCONSTANT" is a function that produces a one-bit ciphertext corresponding to the input value, and "HomCOPY" is a function that produces a different ciphertext that decrypts to the same bit as its input. The right shift is divided into the logical shift, which fills the upper k bits with E[0] after shifting, as in the left shift, and the arithmetic shift, which fills the upper k bits with the value of the sign bit. The arithmetic shift is mainly used, and shifting right by k bits divides the value by $2^k$ (rounding toward negative infinity).
4.6. Absolute Value Operation.
In plaintext, the absolute value algorithm outputs the number as is if the most significant bit (msb) is 0 and takes its 2's complement if the msb is 1. Since the value of the msb is not known for a ciphertext, a new algorithm must be designed. Let the original value be a and its 2's complement be b; then one is positive and the other negative (except for 0). Now, using the sign bit $a_{l-1}$ of a as the multiplexer selector, the absolute value is
(2) $|a| = \mathrm{MUX}(a_{l-1}, b, a)$,
which returns b when the sign bit is E[1] and a otherwise.
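Equation (2) can be sketched in plaintext as follows. Both candidates, a and its 2's complement, are always computed, and the sign bit only drives l MUX gates. The helper names are ours, and the most negative value -2**(l-1) has no positive counterpart in l signed bits, as in ordinary two's complement arithmetic.

```python
def mux(s: int, x: int, y: int) -> int:
    """Plaintext stand-in for the TFHE multiplexer: x if s == 1, else y."""
    return x if s else y


def twos_complement(bits: list[int]) -> list[int]:
    """NOT every bit, then add 1 with a chain of half adders."""
    out, carry = [], 1
    for a in bits:
        na = 1 - a
        out.append(na ^ carry)
        carry = na & carry
    return out


def absolute(bits: list[int]) -> list[int]:
    """Equation (2): |a| = MUX(a_{l-1}, b, a) applied bit by bit,
    where b is the 2's complement of a; both candidates are always computed."""
    b = twos_complement(bits)
    sign = bits[-1]
    return [mux(sign, nb, pb) for nb, pb in zip(b, bits)]


print(absolute([1, 1, 0, 1]))  # -5 in 4 bits -> 5 = [1, 0, 1, 0]
```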

Multiplication Operation.
In general, multiplying an m-bit number by an n-bit number yields an (m + n)-bit result. When both numbers are positive, the multiplicand is multiplied by each bit of the multiplier, from the lsb upward, as in calculation by hand. The product is then the sum of these partial products, each shifted left according to the position of the multiplier bit. Thus, the fewer 1-bits the multiplier has, the more efficient the computation is; therefore, we decompose the multiplier into sums or differences to reduce the number of 1-bits as much as possible. However, as mentioned earlier, this algorithm applies only to positive numbers, so a more advanced algorithm is needed to handle negative numbers. For unencrypted plaintext data, the sign can be inspected by checking the msb, but for encrypted data the value of the msb cannot be confirmed. That is, a new algorithm must output the correct result regardless of the signs of the inputs. To solve this problem, we compute the product of the absolute values and then take the 2's complement of the result according to the sign. That is, for the multiplication of a and b, we compute
(3) $a \times b = \mathrm{MUX}(a_{l-1} \oplus b_{l-1}, -(|a| \times |b|), |a| \times |b|)$,
where $a_{l-1}$ and $b_{l-1}$ are the sign bits and $-(\cdot)$ denotes the 2's complement. Therefore, our algorithm adopts the latter method, and its circuit diagram is shown in Figure 6.

Division Operation.
If the divisor or the dividend is negative, a slightly different algorithm is needed. First, a negative binary division algorithm can be implemented by slightly modifying Algorithm 2. However, since the sign of the input cannot be known, if negative division is implemented as a separate algorithm, both algorithms must be performed and a single result must be output according to the signs of the inputs. This is inefficient because it effectively performs Algorithm 2 twice. Therefore, we implement signed binary division with the second method, which uses absolute values and the multiplexer function as in multiplication. That is, for the signed binary division of the dividend Q by the divisor M, we compute the quotient as
$Q / M = \mathrm{MUX}(Q_{l-1} \oplus M_{l-1}, -(|Q| / |M|), |Q| / |M|)$,
where $|Q| / |M|$ is the quotient produced by Algorithm 2.
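The sign correction of equation (3), which the signed division above reuses, can be sketched for multiplication on plaintext bits. This is an illustration, not the Figure 6 circuit: it uses plain shift-and-add on the magnitudes rather than decomposing the multiplier into sums and differences, and it selects each partial-product addition with MUX gates so that no step branches on a secret bit. All helper names are ours.

```python
def mux(s: int, x: int, y: int) -> int:
    """Plaintext stand-in for the TFHE multiplexer: x if s == 1, else y."""
    return x if s else y


def add(a: list[int], b: list[int]) -> list[int]:
    """Ripple-carry adder (2 XOR, 2 AND, 1 OR per bit); wraps on overflow."""
    out, c = [], 0
    for x, y in zip(a, b):
        t = x ^ y
        out.append(t ^ c)
        c = (x & y) | (t & c)
    return out


def negate(bits: list[int]) -> list[int]:
    """2's complement: NOT every bit, then propagate +1 with half adders."""
    out, c = [], 1
    for x in bits:
        nx = 1 - x
        out.append(nx ^ c)
        c = nx & c
    return out


def absolute(bits: list[int]) -> list[int]:
    """|a| = MUX(sign bit, -a, a), read as an unsigned magnitude."""
    neg = negate(bits)
    return [mux(bits[-1], n, p) for n, p in zip(neg, bits)]


def multiply(a_bits: list[int], b_bits: list[int]) -> list[int]:
    """Signed l-bit x l-bit -> 2l-bit product following equation (3)."""
    l = len(a_bits)
    mag_a = absolute(a_bits) + [0] * l          # zero-extend |a| to 2l bits
    mag_b = absolute(b_bits)
    acc = [0] * (2 * l)
    for i, bit in enumerate(mag_b):
        shifted = [0] * i + mag_a[: 2 * l - i]  # |a| << i
        summed = add(acc, shifted)
        # Add the partial product only where the multiplier bit is 1.
        acc = [mux(bit, s, o) for s, o in zip(summed, acc)]
    sign = a_bits[-1] ^ b_bits[-1]              # XOR of the sign bits
    neg = negate(acc)
    return [mux(sign, n, p) for n, p in zip(neg, acc)]
```

Because the magnitude is treated as an unsigned l-bit value, even the most negative input works: in 4 bits, multiply(-8, -8) yields 64 in the 8-bit result. The l iterations, each containing a 2l-bit addition, give the quadratic gate count discussed later.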

Basic Gate Experiment.
We implemented the operations of Section 4 from the basic gates and measured the execution time of each 1-bit basic gate operation in TFHE over 1000 runs. As shown in Table 2, all basic gates except the NOT gate have the same execution time, and the NOT gate is significantly faster than the others, since it requires no bootstrapping. The multiplexer function is implemented differently from the basic gates, so its execution time also differs: it is faster than computing the same selection with basic gates, by a factor of about two.

Number of Gates Used in Designed Homomorphic Atomic Operations.
Since all gates except the NOT gate and the MUX gate have the same execution time, we denote the execution time of these gates by $T_G$. The execution time of the MUX gate is denoted by $T_M$, and the NOT gate is omitted because its execution time is negligible.
Table 3 shows the number of gates used when performing designed homomorphic operations with l-bit input values for each operation.
The number of gates required by most of the operations in Table 3 grows linearly with the data length. The shift operation only moves bit positions without evaluating any gate, and the number of gates in the multiplication and division operations is proportional to the square of the data length.

Execution Time of the Homomorphic Atomic Operations.
Table 4 shows the measured execution times of the operations for 16-bit data.
The shift operation is not measured because it uses no gates; for the nonlinear operations (multiplication and division), we measured 8, 16, and 32 bits to observe how the execution time changes.
The measurements show that doubling the data length increases the execution time of both algorithms by about four times.
This is because the execution time of the addition, subtraction, and comparison operations that make up multiplication and division increases linearly with the data length, and the number of iterations of each algorithm is also proportional to the data length.
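A back-of-the-envelope cost model makes the factor of four explicit. It assumes, following the explanation above, that each of the l iterations of multiplication or division performs a fixed number of linear-time operations costing about $c\,l$ in total, for some constant c; for instance, the adder of Figure 1 costs $5l\,T_G$, since each bit uses 2 XOR, 2 AND, and 1 OR gate.

```latex
T_{\mathrm{add}}(l) = 5\,l\,T_G, \qquad
T_{\mathrm{mul}}(l) \approx \sum_{i=1}^{l} c\,l = c\,l^{2}, \qquad
\frac{T_{\mathrm{mul}}(2l)}{T_{\mathrm{mul}}(l)} \approx \frac{c\,(2l)^{2}}{c\,l^{2}} = 4
```

The same reasoning applies to division. The model ignores lower-order terms and the separate MUX cost, so the measured values in Table 4 remain the reference numbers.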

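Linear Regression.
Given a d-dimensional input variable $x^{(i)} \in \mathbb{R}^d$ and its corresponding target variable $y^{(i)} \in \mathbb{R}$ for $i = 1, 2, \ldots, n$, the hypothesis of linear regression is the linear function $h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_d x_d$, where $x_0 = 1$ and the parameter vector $\theta = (\theta_0, \ldots, \theta_d)^T$ has d + 1 entries, one per feature plus the intercept. This regression describes a hyperplane in the d-dimensional space of the independent variables x.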
In general, linear regression can easily be estimated by least squares as follows:
(6) $\theta = (X^T X)^{-1} X^T Y$,
where $Y = [y^{(1)}, y^{(2)}, \ldots, y^{(n)}]^T$ and X is the matrix whose rows are $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$ (each with the leading $x_0 = 1$). However, in FHE it is rather difficult to design and implement the matrix inversion in equation (6). Therefore, instead of the exact solution, we choose an approximate estimate based on gradient descent updates to avoid computing the inverse matrix. The approximation uses an error function to optimize the parameters of both simple and multiple linear regression:
$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} (h_\theta(x^{(i)}) - y^{(i)})^2$.
The main goal of linear regression is to fit a line (in general, a hyperplane) through the data, so we minimize the error function J(θ). Gradient descent starts with an initial θ and repeatedly performs the update
$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$,
where α is the learning rate. The parameters $\theta_j$ are updated simultaneously in every iteration until convergence. Our linear regression algorithm is given in Algorithm 3. The implementation is very similar to the plaintext version, except that it is computed in the encrypted state; all operations in the gradient descent algorithm, namely multiplication, addition, and subtraction, can be computed in the encrypted domain. We initialized the parameters θ to 0 and updated them with the linear regression function built from FHE operations.
Input: divisor M, dividend Q. The l-bit register A is initialized to 0, AQ denotes the 2l-bit concatenation of A (upper half) and Q (lower half), and the count is initialized to l.
(1) Shift AQ to the left by one bit; A is the upper l bits of AQ.
(2) Calculate A − M and store it in A.
(3) If A is negative, set the last bit of AQ to 0 and calculate A + M, storing it in A, to restore the value from before step (2).
(4) If A is positive or zero, set the last bit of AQ to 1.
(5) Decrement the count by 1.
(6) If the count is not 0, go to step (1).
(7) If the count is 0, output the result: the lower l bits of AQ are the quotient and the upper l bits are the remainder.
ALGORITHM 2: Positive binary division operation.
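Algorithm 2 can be sketched in plaintext with the restoring step written as a selection instead of a branch, which is how it must be expressed over ciphertexts. In this sketch the registers are ordinary Python ints rather than lists of encrypted bits, and mux is a plaintext stand-in for the TFHE multiplexer. signed_divide is a hypothetical wrapper applying the absolute-value and sign-MUX rule described for signed division; it returns the quotient rounded toward zero.

```python
def mux(s: int, x: int, y: int) -> int:
    """Plaintext stand-in for the TFHE multiplexer: x if s == 1, else y."""
    return x if s else y


def restoring_divide(q: int, m: int, l: int) -> tuple[int, int]:
    """Algorithm 2 on unsigned l-bit integers (m > 0): (quotient, remainder)."""
    mask = (1 << l) - 1
    a = 0                                   # register A, the upper half of AQ
    for _ in range(l):                      # count = l, l-1, ..., 1
        # (1) shift AQ left: the msb of Q enters the lsb of A
        a = (a << 1) | ((q >> (l - 1)) & 1)
        q = (q << 1) & mask
        # (2) trial subtraction A - M; its sign bit decides steps (3)/(4)
        diff = a - m
        negative = 1 if diff < 0 else 0
        # (3)/(4) keep the old A when negative (the restore); q_0 = NOT(sign)
        a = mux(negative, a, diff)
        q |= 1 - negative
    return q, a


def signed_divide(q: int, m: int, l: int) -> int:
    """Quotient of signed q / m via |q| / |m| and a MUX on the sign bits."""
    quotient, _ = restoring_divide(abs(q), abs(m), l)
    sign = (q < 0) ^ (m < 0)
    return mux(sign, -quotient, quotient)


print(restoring_divide(200, 7, 8))  # (28, 4)
print(signed_divide(-7, 2, 8))      # -3
```

Selecting between a and a − m instead of adding m back saves one addition per iteration; in a circuit both versions have the same structure, one subtraction plus l MUX gates per step.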

Performance Evaluation of FHE Linear Regression.
We performed two experiments with different values of d: simple linear regression (d = 1) and multiple linear regression (d > 1). We set the number of data (N), the number of dimensions (d), the length of data (l), and the number of iterations of the algorithm (p) as the factors of the linear regression algorithm; the number of gates (T) can then be expressed as a function of these factors. For simple linear regression, we set (N, d, l, p) = (10, 1, 16, 1) for the experiment. The dataset consists of 10 artificially created data points with feature vector x = [2, 4, 5, 6, 8, 10, 13, 16, 17, 19] and target variable y = [5, 9, 12, 14, 15, 18, 24, 26, 30, 32], and the experiment takes 554 seconds with a learning rate of 0.01.
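The update rule of Algorithm 3 can be checked in plaintext on the artificial dataset above. This sketch uses floating-point numbers instead of 16-bit encrypted values and assumes a gradient averaged over the n data points, so its numbers illustrate the iteration only and are not the paper's encrypted results; the function names are ours.

```python
def gradient_step(theta, xs, ys, alpha):
    """One simultaneous gradient descent update for h(x) = theta0 + theta1 * x,
    with the gradient averaged over the n data points."""
    n = len(xs)
    errors = [theta[0] + theta[1] * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / n
    grad1 = sum(e * x for e, x in zip(errors, xs)) / n
    return [theta[0] - alpha * grad0, theta[1] - alpha * grad1]


def fit(xs, ys, alpha, iterations):
    theta = [0.0, 0.0]                  # parameters initialized to 0
    for _ in range(iterations):
        theta = gradient_step(theta, xs, ys, alpha)
    return theta


xs = [2, 4, 5, 6, 8, 10, 13, 16, 17, 19]
ys = [5, 9, 12, 14, 15, 18, 24, 26, 30, 32]
print(fit(xs, ys, 0.01, 1))       # one iteration (p = 1): about [0.185, 2.336]
print(fit(xs, ys, 0.01, 20000))   # about [3.3125, 1.51875], the least squares line
```

With α = 0.01 the iteration converges on this data, but slowly, because the feature values are not normalized; a single iteration, as in the p = 1 experiment, is still far from the least squares solution.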

Logistic Regression.
Algorithms such as linear regression can easily be implemented with our FHE arithmetic operations. However, logistic regression contains a nonlinear function, which must be transformed before it can be computed. Therefore, the key to deriving FHE logistic regression lies in designing the nonlinear sigmoid function. We first give a brief derivation and the structure of FHE logistic regression, and then explain two ways of constructing the logistic function.
Given an input variable $x^{(i)} \in \mathbb{R}^d$ and its corresponding target variable $y^{(i)} \in \mathbb{Z}_2$ for $i = 1, 2, \ldots, n$, the hypothesis of logistic regression applies the logistic function g(z) to a linear function: $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$, where $x_0 = 1$ and $\theta = (\theta_0, \ldots, \theta_d)^T$ has d + 1 entries. We also denote by $x_j^{(i)}$ the element in the i-th row and j-th column of the data matrix.
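Logistic regression uses a likelihood function to estimate the weights θ. If we let $p(y^{(i)} = 1 \mid x^{(i)}; \theta) = h_\theta(x^{(i)})$ and $p(y^{(i)} = 0 \mid x^{(i)}; \theta) = 1 - h_\theta(x^{(i)})$, the likelihood of a single data point is $p(y^{(i)} \mid x^{(i)}; \theta) = h_\theta(x^{(i)})^{y^{(i)}} (1 - h_\theta(x^{(i)}))^{1 - y^{(i)}}$. The likelihood of the whole dataset is the product of the likelihoods of the individual data points. Taking the logarithm turns this product into a sum:
$L(\theta) = \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log (1 - h_\theta(x^{(i)})) \right]$.
To maximize the log-likelihood L(θ), we perform a gradient descent algorithm that iteratively minimizes the cost function J(θ) = −L(θ).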
Therefore, θ is updated with the following equation:
$\theta_j \leftarrow \theta_j - \alpha \sum_{i=1}^{n} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$.
Existing literature [18] designed the nonlinear logistic function with two approximation techniques, namely, the Taylor series method and least squares approximation. In this paper, we show the feasibility of constructing both approximations from our proposed bitwise FHE operations to perform logistic regression.

Taylor Series Method.
It is well known that a Taylor expansion allows a differentiable real-valued function f(x) to be expanded in a series at x = a such that $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$. Bos et al. applied the Taylor series expansion to the logistic function, which simplifies the calculation of the nonlinear function because the resulting polynomial involves only the four fundamental arithmetic operations [18]. The Taylor polynomial of degree 9 for the sigmoid function at a = 0 is
(13) $g(x) \approx \frac{1}{2} + \frac{1}{4}x - \frac{1}{48}x^3 + \frac{1}{480}x^5 - \frac{17}{80640}x^7 + \frac{31}{1451520}x^9$.
Using the basic bitwise FHE operations presented in the previous section, we construct the approximate logistic function with Algorithm 4. We denote by $c_i$ the coefficients of g(x), where $c_0 = 1/2$, $c_1 = 1/4$, ..., $c_5 = 31/1451520$.
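Steps (2) to (4) of Algorithm 4 can be sketched in plaintext with the exact rational coefficients of equation (13). The quantize helper, which is ours, shows how a fixed-point representation with too few fractional bits turns the smallest coefficients into 0; the choice of 10 fractional bits is purely illustrative and is not the representation used in the experiments.

```python
import math
from fractions import Fraction

# c_0..c_5 of equation (13) and the powers of x they multiply.
COEFFS = [Fraction(1, 2), Fraction(1, 4), Fraction(-1, 48), Fraction(1, 480),
          Fraction(-17, 80640), Fraction(31, 1451520)]
POWERS = [0, 1, 3, 5, 7, 9]


def taylor_sigmoid(x: float, coeffs=COEFFS) -> float:
    """Steps (2)-(4) of Algorithm 4: powers of x, scale by c_i, then sum."""
    return sum(float(c) * x ** p for c, p in zip(coeffs, POWERS))


def quantize(c: Fraction, frac_bits: int) -> Fraction:
    """Round c to a fixed-point number with frac_bits fractional bits
    (a hypothetical stand-in for step (1), converting c_i to bit arrays)."""
    return Fraction(round(c * 2 ** frac_bits), 2 ** frac_bits)


for x in (0.5, 1.0, 2.0):
    print(x, taylor_sigmoid(x), 1 / (1 + math.exp(-x)))
print([quantize(c, 10) for c in COEFFS])   # the last two coefficients become 0
```

With 10 fractional bits, c_4 and c_5 both round to 0, so the x^7 and x^9 terms disappear and the range on which the polynomial tracks the sigmoid shrinks, which is the kind of effect described for Figure 7.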
Figure 7 illustrates the logistic function approximated by the Taylor series expansion. Our approximation is accurate on the interval (−1, 1), whereas that of the existing literature [18] is accurate on (−2, 2). This is because the 4th and 5th coefficients become 0 when represented with the 32-bit input length. This can be solved by assigning a larger length to represent the coefficients.

Least Square Approximation.
Kim and Cheon et al. proposed a least squares polynomial that broadens the bounded domain of the Taylor series expansion to (−8, 8) [19,20]. The underlying principle is to derive a function g(x) that minimizes the mean squared error (MSE) $\frac{1}{|I|} \int_I (g(x) - f(x))^2 \, dx$, where |I| denotes the length of the interval I.
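The least squares idea can be sketched in plaintext by replacing the integral with a dense uniform grid on I and solving the normal equations. This illustrates the principle only; the grid size, the degree, and the helper name lstsq_poly are our assumptions, and the resulting coefficients are not claimed to match those of [19,20].

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstsq_poly(f, degree: int, lo: float, hi: float, samples: int = 2001) -> list[float]:
    """Coefficients c_0..c_degree minimizing the squared error of
    sum(c_j x^j) against f on a uniform grid over [lo, hi], which stands in
    for the integral over the interval I."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    n = degree + 1
    # Augmented normal equations (V^T V | V^T y) for the Vandermonde matrix V.
    rows = [[sum(x ** (r + c) for x in xs) for c in range(n)]
            + [sum(f(x) * x ** r for x in xs)] for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        acc = rows[r][n] - sum(rows[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = acc / rows[r][r]
    return coeffs


coeffs = lstsq_poly(sigmoid, 3, -8.0, 8.0)
print(coeffs)  # even-degree terms beyond c_0 vanish because sigmoid(x) - 1/2 is odd
```

Once the coefficients are fixed, evaluating the polynomial needs only the multiplications and additions already available as bitwise FHE operations, exactly as in the Taylor case.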
We omit the algorithm for implementing the least squares approximation with our scheme, since it follows a procedure similar to Algorithm 4. Figure 8 compares the real sigmoid function with our approximation and with that of the existing literature [18], showing that our FHE scheme approximates the desired function as well as the existing literature does.

FHE Gradient Descent Algorithm.
Once the logistic function is designed with either the Taylor series or the least squares approximation technique, we can perform the gradient descent algorithm to estimate the parameters. The logistic regression process is given in Algorithm 5.
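A plaintext sketch of the gradient descent loop with the degree-9 polynomial of equation (13) in place of the exact sigmoid. The one-dimensional toy data, the learning rate, and the iteration count are illustrative assumptions; the iteration count is kept small so that θ^T x stays inside the range where the Taylor polynomial is accurate.

```python
# Degree-9 Taylor coefficients of equation (13) and their powers.
C = [1 / 2, 1 / 4, -1 / 48, 1 / 480, -17 / 80640, 31 / 1451520]
P = [0, 1, 3, 5, 7, 9]


def g(z: float) -> float:
    """Polynomial stand-in for the sigmoid (accurate only for small |z|)."""
    return sum(c * z ** p for c, p in zip(C, P))


def train(xs, ys, alpha: float, iterations: int) -> list[float]:
    """Gradient descent on J = -L for h(x) = g(theta0 + theta1 * x)."""
    theta = [0.0, 0.0]                              # initialized to 0
    for _ in range(iterations):
        preds = [g(theta[0] + theta[1] * x) for x in xs]
        grad0 = sum(p - y for p, y in zip(preds, ys))
        grad1 = sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
    return theta


xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]  # illustrative toy data
ys = [0, 0, 0, 0, 1, 1, 1, 1]
theta = train(xs, ys, alpha=0.02, iterations=5)
print(theta, [round(g(theta[0] + theta[1] * x)) for x in xs])
```

Running many more iterations on separable data would push θ^T x outside the accurate range of the polynomial, which is one practical reason a wider approximation such as the least squares polynomial is attractive.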

Performance Evaluation of the FHE Logistic Regression.
We implemented logistic regression with both of the strategies described above. From Algorithm 4, the number of data (N), length of data (l), dimension (d), and number of iterations (p) are the principal factors of the time complexity (T) for both methods, where $T_{Taylor}$ and $T_{ls}$ denote the time complexity of the Taylor series and least squares approximation methods, respectively. Since the experiment requires a significant amount of time, we set the number of data, dimension, and iterations to 10, 2, and 1, respectively. Table 5 summarizes the time performance for 16-bit data.

kNN Classifier.
The bitwise FHE implementation of the kNN algorithm in Algorithm 6 is nearly identical to the plaintext version, except for the sorting operation, which is described in the next section. The conventional kNN algorithm uses the Euclidean distance between data points, but for speed, our algorithm replaces it with the sum of absolute differences. Also, when sorting the calculated distances, we search for only the k smallest values to reduce the computation time. As shown in Algorithm 6, two additional homomorphic operations are needed for the homomorphic kNN classifier: the sort of Algorithm 7 and the conditional swap of Algorithm 8.
When sorting is completed, we check the labels of the k nearest data points and output the majority label. Since the labels are also encrypted, it is not possible to see directly which label is the most frequent. To obtain the most frequent label, we first count the number of data points with the same label: since the labels are encrypted, we apply the equivalent comparison operation between each label and every other label and add up the outputs. Lastly, we sort these counts in descending order and pick the label with the largest count as the desired label. Algorithm 9 gives the pseudocode that finds the majority label among the k nearest data points in our kNN algorithm.
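The counting step can be sketched in plaintext: every label is compared with every other label using an equality test standing in for the equivalent comparison, and the results are summed. As a simplification of the descending sort described above, this sketch then keeps a running maximum with MUX selections, which yields the same top label with fewer comparisons; ties resolve to the earliest label, an arbitrary choice. All names are ours.

```python
def mux(s, x, y):
    """Plaintext stand-in for the TFHE multiplexer: x if s == 1, else y."""
    return x if s else y


def equal(a: int, b: int) -> int:
    """Stand-in for the equivalent comparison: 1 if a == b else 0."""
    return 1 if a == b else 0


def majority_label(labels: list[int]) -> int:
    """Most frequent label among the k nearest neighbors."""
    k = len(labels)
    # counts[i] = number of neighbors whose label equals labels[i]
    counts = [sum(equal(labels[i], labels[j]) for j in range(k)) for i in range(k)]
    best_label, best_count = labels[0], counts[0]
    for i in range(1, k):
        larger = 1 if counts[i] > best_count else 0   # large comparison
        best_label = mux(larger, labels[i], best_label)
        best_count = mux(larger, counts[i], best_count)
    return best_label


print(majority_label([2, 0, 2]))  # 2
```

The k^2 equality tests dominate the cost, which is acceptable because k is small compared with the number of training data.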

Sorting for kNN Algorithm.
The kNN algorithm in the encrypted domain requires a sorting algorithm to find the nearest neighbors, so we design a new sort algorithm for ciphertexts. Algorithm 7 gives the pseudocode that sorts the numbers in arr[n] with the selection sort algorithm.
In plaintext, a swap operation simply exchanges the positions of two values, but to apply selection sort to ciphertexts, the decision whether to exchange must come from a large or small comparison whose result is encrypted. Therefore, the swap takes an additional input that determines whether to swap, and we call this operation a conditional swap. We set the number of data (N), the dimension of data (d), the length of data (l), the number of nearest neighbors (k), and the length of the label (L) as the factors of the kNN algorithm; the time complexity of the kNN algorithm (T) can then be expressed as a function of these factors. We set (N, d, l, k, L) = (64, 1, 10, 3, 1) for the experiment, which took 226 seconds with these initial values. Then, we repeated the experiment while changing the value of each factor one at a time. Because the cost of the algorithm is linear in every factor except k, we confirmed that the execution time is almost proportional to the value of each of those factors.
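A plaintext sketch of the partial selection sort with a conditional swap, applied to (distance, label) pairs. The distance is the sum of absolute differences mentioned earlier, and only the first k positions are finalized. The loop structure never depends on the data, and each swap is two MUXes driven by a small comparison, which is what allows it to run on ciphertexts; the function names and sample points are ours.

```python
def mux(s, x, y):
    """Plaintext stand-in for the TFHE multiplexer: x if s == 1, else y."""
    return x if s else y


def l1_distance(p, q):
    """Sum of absolute differences, used instead of the Euclidean distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cond_swap(c, x, y):
    """Swap x and y when c == 1; two MUXes, so no branch on c in a circuit."""
    return mux(c, y, x), mux(c, x, y)


def k_nearest(train, query, k):
    """Return the (distance, label) pairs of the k nearest training points."""
    arr = [(l1_distance(x, query), label) for x, label in train]
    n = len(arr)
    for i in range(k):                      # only the first k positions
        for j in range(i + 1, n):
            c = 1 if arr[j][0] < arr[i][0] else 0   # small comparison
            arr[i], arr[j] = cond_swap(c, arr[i], arr[j])
    return arr[:k]


points = [((0, 0), 0), ((5, 5), 1), ((1, 0), 0), ((6, 5), 1), ((0, 2), 0)]
print(k_nearest(points, (0, 1), 3))
```

Stopping after k outer passes costs about k(N - 1) comparisons instead of the N(N - 1)/2 of a full selection sort.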

k-Means Algorithm for Image Segmentation.
We also performed gray-scale image segmentation using the k-means algorithm. Each pixel of the target image has an 8-bit gray value, and the k-means algorithm takes the encrypted gray values of the pixels as input. For efficient computation, the cloud server first obtains the encrypted values of the pixels at N random locations rather than all encrypted pixels. Afterwards, the k-means algorithm is applied to partition the N encrypted pixels into k clusters, so the cloud server computes the representative values of the k clusters homomorphically. After the representative values are decrypted on the client's side, the color of every pixel in the image is compared with the representative values, and image segmentation is performed by replacing each color with the representative value of the nearest cluster. Our k-means algorithm is given in Algorithm 10; we ran it with the data length expanded to 10 bits to accommodate the 8-bit original data, the sign bit, and the addition operation.
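A plaintext sketch of the server-side part, assuming the absolute difference as the one-dimensional distance, a fixed number of iterations instead of a convergence test (convergence cannot be tested on encrypted values without revealing information), and integer division for the cluster means. The function name, the initial centroids, and the toy gray values are illustrative assumptions.

```python
def mux(s, x, y):
    """Plaintext stand-in for the TFHE multiplexer: x if s == 1, else y."""
    return x if s else y


def kmeans_1d(values, centroids, iterations):
    """Fixed-iteration k-means on gray values with |x - c| as the distance.
    Every point is compared with every centroid in every iteration."""
    k = len(centroids)
    for _ in range(iterations):
        sums, counts = [0] * k, [0] * k
        for v in values:
            best, best_d = 0, abs(v - centroids[0])
            for j in range(1, k):
                d = abs(v - centroids[j])
                closer = 1 if d < best_d else 0            # small comparison
                best, best_d = mux(closer, j, best), mux(closer, d, best_d)
            for j in range(k):
                hit = 1 if best == j else 0                # equivalent comparison
                sums[j] += mux(hit, v, 0)
                counts[j] += hit
        # Integer division; keep the old centroid if a cluster is empty.
        centroids = [mux(counts[j] > 0, sums[j] // max(counts[j], 1), centroids[j])
                     for j in range(k)]
    return centroids


print(kmeans_1d([10, 12, 11, 100, 102, 200, 201], [10, 100, 200], 10))  # [11, 101, 200]
```

Only the k representative values leave the server; the per-pixel replacement described above happens after decryption on the client.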
In general, the Euclidean distance is used to calculate the distance between two points. In this experiment, however, another method can be used because the dimension of [...]

Input: a training data x^(i)
Output: logistic value of x^(i) w.r.t. the Taylor expansion method
(1) Convert coefficients c_i into arrays
(2) Construct the power series of x up to the 9th power
(3) Multiply each c_i with the corresponding power of x
(4) Add all the terms derived in step 3
ALGORITHM 4: FHE sigmoid function by Taylor expansion.

[...] input of an image in Figure 9(a), where the parameters are given as 64, 10, 3, and 10. The representative values of the three clusters are 23, 56, and 170, respectively. The experiment took approximately 1,500 seconds, and the segmentation result is shown in Figure 9(b).

Discussion
In this section, we describe the limitations of our proposed approach in practice. The main concern is that computation is extremely slow and requires a large amount of memory.
It is true that bit-based schemes are currently less efficient than integer-based schemes in terms of speed and memory. However, integer-based schemes have a fatal disadvantage: the operations they support are limited, so they can only be used for specific algorithms. This is a fundamental problem that is hard to overcome. On the other hand, computation speed, the main disadvantage of bit-based schemes, can be improved more flexibly.
Our current approach is not yet optimized, so each operation in the encrypted domain is extremely time-consuming. However, this problem may be addressed by accelerating the computation with state-of-the-art techniques. For instance, the atomic operations can be implemented at the hardware level rather than the software level; FPGAs and ASICs would be good candidates for such an implementation. We can also reduce computation time by optimizing the logic and code at the software level, and by using graphics processing units (GPUs) and parallel computing.

Conclusion
In this paper, we have proposed basic homomorphic arithmetic operations built from bitwise homomorphic gates. We applied these operations to several well-known data mining techniques: linear regression, logistic regression, the kNN classifier, and k-means clustering. To implement these algorithms, we introduced advanced bitwise operations, such as sorting and conditional swap, that are specific to bitwise homomorphic computation. With our proposed bitwise homomorphic atomic and advanced operations, even data scientists without any knowledge of FHE can easily analyze and process data in the encrypted domain.

Data Availability
The training data and image data used to support the findings of this study have not been made available because they are artificially created and small enough that readers can easily create them.

Figure 2: The circuit design for 2's complement in the FHE scheme.
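The 2's complement circuit of Figure 2 amounts to inverting every bit and adding one. The following is a plaintext sketch under the assumption that the +1 is propagated with a chain of half adders (XOR for the sum, AND for the carry); the exact gate layout in Figure 2 may differ. Ints 0/1 stand in for encrypted bits, bit lists are MSB-first, and the helper names are hypothetical.

```python
# Plaintext simulation: ints 0/1 stand in for encrypted bits; helpers for gates.
def NOT(a): return 1 - a
def AND(a, b): return a & b
def XOR(a, b): return a ^ b


def twos_complement(bits):
    """Negate an MSB-first bit list: invert all bits, then add 1 with half adders."""
    out = [0] * len(bits)
    carry = 1                              # the "+1", i.e. E[1] fed into the LSB
    for i in reversed(range(len(bits))):   # LSB is the last element
        inv = NOT(bits[i])
        out[i] = XOR(inv, carry)           # half-adder sum
        carry = AND(inv, carry)            # half-adder carry; the final carry is dropped
    return out


def to_bits(x, l):
    """Encode a signed int as an l-bit 2's complement list, MSB first."""
    return [((x % (1 << l)) >> (l - 1 - i)) & 1 for i in range(l)]


def to_signed(bits):
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v - (1 << len(bits)) if bits[0] else v


print(to_signed(twos_complement(to_bits(5, 8))))     # -5
print(to_signed(twos_complement(to_bits(-128, 8))))  # -128 (the one value with no l-bit negation)
```

The circuit uses l NOT, l XOR, and l AND gates, so its cost is linear in the bit length l.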
[...]
(6) end for
(7) return [a_{l−1−k}, ..., a_0, E[0], ..., E[0]]
ALGORITHM 1: Pseudocode of left shift.

Security and Communication Networks

[...] cases. First, let us consider the case where both the divisor and the dividend are positive. Let the arrays M, Q, and A have the same length l, and initialize M to the divisor, Q to the dividend, and A to zero. The count value is the dividend length, l. Let AQ = [A||Q], which has length 2l, and start the main part of the algorithm.
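The left shift of Algorithm 1 and the division setup (M, Q, A, count = l, AQ = [A||Q]) fit the classic restoring-division loop. The sketch below assumes that loop for the positive/positive case: shift AQ left by one, compute A − M, and, since the server cannot branch on the encrypted sign, select between the restored and subtracted A with a multiplexer while writing NOT(sign) into the lowest quotient bit. It runs in plaintext (ints 0/1 for encrypted bits, MSB-first lists), and `left_shift`, `add`, `subtract`, `mux`, and `divide_positive` are hypothetical names.

```python
# Plaintext simulation: ints 0/1 stand in for encrypted bits; helpers for gates.
def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b
def XOR(a, b): return a ^ b


def mux(sel, x, y):
    return OR(AND(sel, x), AND(NOT(sel), y))


def left_shift(bits, k):
    """Algorithm 1: keep a_{l-1-k} ... a_0 and append k copies of E[0]."""
    return bits[k:] + [0] * k


def add(a, b):
    """Ripple-carry adder on MSB-first lists; the final carry is discarded."""
    out, carry = [0] * len(a), 0
    for i in reversed(range(len(a))):
        s = XOR(a[i], b[i])
        out[i] = XOR(s, carry)
        carry = OR(AND(a[i], b[i]), AND(s, carry))
    return out


def subtract(a, b):
    """a - b computed as a + NOT(b) + 1."""
    one = [0] * (len(b) - 1) + [1]
    return add(a, add([NOT(x) for x in b], one))


def divide_positive(dividend, divisor):
    """Restoring division for positive l-bit operands; returns (quotient, remainder)."""
    l = len(dividend)
    M, Q, A = list(divisor), list(dividend), [0] * l
    for _ in range(l):                                 # count = l
        AQ = left_shift(A + Q, 1)                      # list concat: shift [A||Q] left
        A, Q = AQ[:l], AQ[l:]
        diff = subtract(A, M)
        neg = diff[0]                                  # sign bit of A - M
        A = [mux(neg, a, d) for a, d in zip(A, diff)]  # restore A when A - M < 0
        Q[-1] = NOT(neg)                               # next quotient bit
    return Q, A


def to_bits(x, l):
    return [(x >> (l - 1 - i)) & 1 for i in range(l)]


q, r = divide_positive(to_bits(100, 8), to_bits(7, 8))
print(q, r)  # bits of 14 and 2
```

With both operands positive in l-bit 2's complement, the shifted A stays below 2M, so A − M always fits in l bits and its top bit is a reliable sign; the other sign cases can then be reduced to this one.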

Figure 6: The circuit design for signed multiplication.
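One common way to build a signed multiplier from bitwise gates is to take absolute values, multiply them with shift-and-add, and conditionally negate the product by the XOR of the two sign bits. The sketch below assumes that structure; it is not necessarily the exact circuit of Figure 6. It runs in plaintext (ints 0/1 for encrypted bits, MSB-first lists), produces a 2l-bit product, and every name in it (`cond_negate`, `signed_multiply`, and the gate helpers) is hypothetical.

```python
# Plaintext simulation: ints 0/1 stand in for encrypted bits; helpers for gates.
def NOT(a): return 1 - a
def AND(a, b): return a & b
def OR(a, b): return a | b
def XOR(a, b): return a ^ b


def mux(sel, x, y):
    return OR(AND(sel, x), AND(NOT(sel), y))


def add(a, b):
    """Ripple-carry adder on MSB-first lists; the final carry is discarded."""
    out, carry = [0] * len(a), 0
    for i in reversed(range(len(a))):
        s = XOR(a[i], b[i])
        out[i] = XOR(s, carry)
        carry = OR(AND(a[i], b[i]), AND(s, carry))
    return out


def cond_negate(sel, bits):
    """Return the 2's complement of bits if sel == 1, else bits, without branching."""
    one = [0] * (len(bits) - 1) + [1]
    neg = add([NOT(x) for x in bits], one)
    return [mux(sel, n, b) for n, b in zip(neg, bits)]


def signed_multiply(a, b):
    """Multiply two l-bit 2's complement values into a 2l-bit product."""
    l = len(a)
    sign = XOR(a[0], b[0])                            # sign of the product
    abs_a, abs_b = cond_negate(a[0], a), cond_negate(b[0], b)
    acc = [0] * (2 * l)
    addend = [0] * l + abs_a                          # |a| zero-extended to 2l bits
    for i in reversed(range(l)):                      # scan |b| from its LSB
        acc = add(acc, [AND(abs_b[i], x) for x in addend])
        addend = addend[1:] + [0]                     # shift |a| left by one
    return cond_negate(sign, acc)


def to_bits(x, l):
    return [((x % (1 << l)) >> (l - 1 - i)) & 1 for i in range(l)]


def to_signed(bits):
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v - (1 << len(bits)) if bits[0] else v


print(to_signed(signed_multiply(to_bits(-3, 8), to_bits(5, 8))))  # -15
```

The l partial-product additions dominate the cost, which is consistent with multiplication being the slowest of the atomic operations.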

Figure 7: Comparison of the real sigmoid with our approach and with the Taylor series approximation from the existing literature [18].
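Algorithm 4 (FHE sigmoid function by Taylor expansion) evaluates a degree-9 polynomial in four steps. The sketch below mirrors those steps in plaintext floating point. It assumes the coefficients c_i are the Maclaurin coefficients of 1/(1 + e^(−x)), which may not match the coefficients actually used; in the encrypted version each c_i and each power of x would be a fixed-point bit array processed by the homomorphic multiplier and adder. The names `COEFFS` and `taylor_sigmoid` are hypothetical.

```python
import math

# Assumed c_i: Maclaurin coefficients of 1 / (1 + e^{-x}) up to x^9.
# sigma(x) - 1/2 is odd, so the even coefficients above x^0 are zero.
COEFFS = [1 / 2, 1 / 4, 0.0, -1 / 48, 0.0, 1 / 480, 0.0,
          -17 / 80640, 0.0, 31 / 1451520]


def taylor_sigmoid(x):
    # Step (1) would encode each c_i as a bit array; plain floats are used here.
    # Step (2): powers x^0 ... x^9 by repeated multiplication.
    powers = [1.0]
    for _ in range(9):
        powers.append(powers[-1] * x)
    # Steps (3) and (4): multiply each c_i by x^i and add all the terms.
    return sum(c * p for c, p in zip(COEFFS, powers))


for x in (0.0, 1.0, 2.0, 4.0):
    print(x, round(taylor_sigmoid(x), 4), round(1 / (1 + math.exp(-x)), 4))
```

This series converges only for |x| < π, so the approximation drifts away from the true sigmoid as |x| grows; printing the comparison for larger x makes the gap visible.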
In data mining, the k-nearest neighbors (kNN) algorithm is one of the best-known and most useful supervised methods for classifying a dataset. Given data already classified into several classes, kNN determines the class of a new input from its neighbors: the input is assigned the label that occurs most often among its k closest data points.
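The classification rule itself is short. This plaintext sketch shows only the logic (sort by distance, keep the k closest, take a majority vote); in the encrypted setting the sort would be built from comparisons and conditional swaps instead of Python's `sorted`. The squared Euclidean distance and the name `knn_classify` are assumptions for illustration.

```python
from collections import Counter


def knn_classify(train, query, k):
    """Label query by majority vote among its k nearest training points.

    train: list of (feature_tuple, label); distance is squared Euclidean (assumed).
    """
    by_distance = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [((1,), 0), ((2,), 0), ((3,), 0), ((10,), 1), ((11,), 1), ((12,), 1)]
print(knn_classify(train, (2,), k=3))  # 0
print(knn_classify(train, (9,), k=3))  # 1
```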
2.2. Data Mining and Machine Learning Algorithms

2.2.1. Linear Regression. Linear regression is the most popular model for predicting a target value y. It estimates the coefficients of a linear equation involving one or more independent variables. Several procedures exist to optimize the coefficient values; we focus on gradient descent, which iteratively minimizes the error on the training data.

2.2.2. Logistic Regression. Logistic regression is a special case of the generalized linear model in which the target variable is binary, such as pass or fail, or alive or dead. In general, logistic regression infers the parameters of a sigmoid function, which is then used to classify binary or categorical dependent variables.

2.2.3. k-Nearest Neighbors (kNN) Classification.
4.4.1. Equivalent Comparison. In plaintext, an equality comparison compares each bit of the two input values and outputs 1 if all bits are equal and 0 otherwise. For encrypted data, an XNOR gate can determine whether each pair of bits is equal, but its output is itself encrypted, so the result cannot be read directly. Therefore, the XNOR outputs of all bit positions are combined with AND gates, as shown in Figure 4. The circuit then outputs E[0] if the two inputs differ in any bit and E[1] if every bit is the same.
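The equality circuit (one XNOR per bit position, all results ANDed together) can be simulated in plaintext as follows. Ints 0/1 stand in for E[0]/E[1], bit lists are MSB-first, and the helper names `XNOR`, `AND`, and `equal` are hypothetical.

```python
# Plaintext simulation: ints 0/1 stand in for encrypted bits E[0]/E[1].
def XNOR(a, b): return 1 - (a ^ b)
def AND(a, b): return a & b


def equal(a, b):
    """E[1] if bit lists a and b match in every position, else E[0]."""
    result = 1                              # start from E[1]
    for ai, bi in zip(a, b):
        result = AND(result, XNOR(ai, bi))  # one XNOR per bit, AND-chained
    return result


print(equal([1, 0, 1, 1], [1, 0, 1, 1]))  # 1
print(equal([1, 0, 1, 1], [1, 1, 1, 1]))  # 0
```

Starting the chain from the first XNOR output instead of E[1] saves one AND gate. Combining the XNOR outputs in a balanced tree uses the same number of ANDs but reduces the circuit depth from linear to logarithmic in l, which helps when gates can be evaluated in parallel.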

Table 1: The notations of the logical operators.

Table 2: Execution time of basic gates (s).

Table 3: Time complexity of the designed homomorphic atomic operations with l-bit input values.

Table 4: Execution time of the designed homomorphic atomic operations.