Secure Two-Party Decision Tree Classification Based on Function Secret Sharing



Introduction
A machine learning process has two stages. In the first, known as the learning phase, a model or classifier is developed from a potentially vast collection of training data. In the second, the model is used to classify raw data. In many fields, including healthcare, finance, spam filtering, intrusion detection, and remote diagnostics, machine learning (ML) classifiers are useful tools [1]. These classifiers frequently need access to highly sensitive personal information, such as medical or financial records, to do their work. Investigating systems that guarantee data privacy while reaping the rewards of ML is therefore essential. On the one hand, the ML model itself may contain private information. For instance, a bank that uses a decision tree to evaluate the creditworthiness of its clients would wish to keep the model private. On the other hand, the model may have been created using private information. So-called model inversion attacks are widely known; they can jeopardize the confidentiality of the training data and are facilitated by white-box and, even worse, black-box access to ML models [2][3][4]. Publicizing the ML model can therefore conflict with the privacy of the training data.
Private decision tree evaluation can be implemented using general secure two-party computation [5][6][7] techniques such as secret sharing and garbled circuits. The goal is to protect the decision tree algorithm so that it can be evaluated without disclosing any private information. Frameworks such as ObliVM [8] and CBMC-GC [9] can transform plaintext programs written in high-level programming languages into oblivious programs suitable for secure computation. Their straightforward application to decision tree algorithms improves performance compared to a manually created architecture. Nonetheless, the size of the resulting oblivious program is still proportional to the size of the tree. Generic methods are therefore typically impractical, especially when the tree is large.

Overview of Our Construction
We use homomorphic encryption and function secret sharing to implement a server-server secure two-party decision tree classification protocol without a trusted third party.
The basic idea is as follows.
(1) Without relying on a trusted third party, two different servers share their data sources and combine their common data to perform secure decision tree training on the ciphertext. (2) In this study, we address the problem of private decision tree training on confidential data from different data sources. The features of the two servers' data are public, and each server needs to combine its data with the other party's data to classify the private attribute vector. The goal of the computation is to determine the classification while keeping the user input and the decision tree confidential. Once the computation is complete, only the classification models and their respective training results are secret-shared; neither party learns anything else. Any general secure multiparty computation protocol can solve this problem. There are also specialized solutions that integrate multiple techniques and exploit domain knowledge to create efficient protocols.
In this study, we provide a 2PC-based (two-party computation) framework for decision tree training and inference that is faster and more accurate. We provide several new building blocks based on the comparison protocol [10][11][12][13][14][15][16][17][18][19], support and implement secret-sharing-based comparison in 2PC, and establish a new preprocessing protocol for mask generation. The experimental findings indicate that our approach is more accurate and time-efficient than the majority of existing frameworks.
When several parties are involved, preventing collusion among participants is more challenging, and the deployment itself raises issues. Existing 3PC (three-party computation) and multiparty security architectures must assume an honest majority, a condition that is difficult to meet in the real world. Collusion can be controlled easily only if the protocol is deployed on cloud servers owned by different businesses. 2PC, however, can fulfill this need.
Our intent is to deliver and implement a novel two-server protocol that gives both parties access to a complete classification model while maintaining the privacy of their own data at a tolerable cost in speed. The plan is to evaluate trees encrypted under a server's public key using fully or somewhat homomorphic encryption (FHE/SHE). The evaluating server therefore learns no intermediate or final results of the computation. Existing fully homomorphic encryption techniques have high computational overhead and data transmission costs. To address this, we use efficient data representations and algorithmic improvements. However, fully homomorphic encryption still has substantial overhead compared to our approach.
We summarize the key differences between our method and prior work by De Cock et al. [20] and Lu et al. [21] in Table 1. In this work, we introduce a novel framework for secure two-party decision tree classification that provides substantial improvements over prior art. As evidenced by the table, our approach achieves higher accuracy, lower communication overhead, and reasonable computation complexity compared to De Cock et al. [20] and Lu et al. [21]. Our innovations in computing decision bits and in combining secret sharing with homomorphic encryption lead to an efficient and accurate framework. The empirical results substantiate the concrete efficiency and accuracy advantages of our proposed techniques over existing methods.
In this paper, we present secure two-party decision tree classification for training and inference over different data sources. The two-party setting is reasonable for real-world applications [22] and has been widely employed in privacy-preserving machine learning [23][24][25][26]. First, exploiting an advanced cryptographic primitive, function secret sharing (FSS) [27], we present an efficient comparison protocol for choosing the best split. The main challenge is that directly using the general FSS scheme [28] leads to a high evaluation overhead, since it requires two FSS invocations to handle the wrap-around problem illustrated in Section 3. We address this with a novel theoretical analysis, which shows that the probability of incurring the wrap-around problem is negligible under appropriate parameter settings even though we invoke only one FSS evaluation. This achieves an approximately 2× reduction in online runtime compared to the most efficient FSS scheme [28], at the cost of a slight accuracy loss in the training of trees. For communication, our protocol requires only one communication round with 2 ring elements. The computational workload can be parallelized to reduce computation time, even though the computational overhead may still end up larger than that of existing protocols. By providing encrypted input and returning only encrypted output, we obtain a noninteractive protocol that enables clients to outsource evaluation to servers. Furthermore, existing systems can be made one-sided simulatable and secure in the semihonest model by employing techniques that may double the computation and communication costs.
Brickell et al. [12] introduced the first private decision tree evaluation protocol by combining homomorphic encryption (HE) and garbled circuits (GC). The server translates the decision tree into a GC, which the client subsequently executes. The protocol combines homomorphic encryption and oblivious transfer (OT), which enables the client to learn its garbling keys. While the evaluation time is sublinear in the tree size, the technique is not efficient for large trees because the secure program and the communication cost are linear in the tree size. Barni et al. [10] improved upon this technique by removing the leaf nodes from the secure program, thereby reducing computation costs by a constant factor. Bost et al. [11] modelled the decision tree as a multivariate polynomial in which the constants signify the classification labels and the variables signify the outcomes of the Boolean conditions at the decision nodes. To compute the values of the Boolean conditions privately, each threshold is compared with the respective attribute value encrypted under the client's public key. The server then homomorphically evaluates the polynomial, and the client receives the result.
Wu et al. [4] employed methods that require only additive homomorphic encryption (AHE). They used the protocol from [36] to compare data and to send the comparison bits, encrypted under the client's public key, to the server. After evaluating the tree, the server returns to the client the index of the matching classification label, and the result is conveyed to the client via an OT. Tai et al. [15] used the comparison protocol of [36] and AHE. They assign costs b and 1 − b to the left and right edges of each node, respectively, where b is the result of the comparison at that node. The costs are then summed along each path of the tree, and the classification label is the one on the path that yields zero cost.
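The path-cost idea attributed to Tai et al. above can be illustrated in plaintext. The following is a minimal sketch, not the authors' protocol: the `Node` layout and function names are hypothetical, no encryption is involved, and it assumes the comparison bit is b = 1 when the attribute is at least the threshold (the convention used later in this paper for choosing the right child).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    # Hypothetical plaintext node layout, for illustration only.
    feature: int = 0                 # index of the attribute tested at a decision node
    threshold: int = 0               # threshold compared against that attribute
    label: Optional[int] = None      # set only on leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def path_costs(root: Node, x: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (cost, label) for every leaf; the taken path has cost 0."""
    out = []
    stack = [(root, 0)]
    while stack:
        node, cost = stack.pop()
        if node.label is not None:
            out.append((cost, node.label))
            continue
        # Assumed convention: b = 1 means "attribute >= threshold" (go right).
        b = 1 if x[node.feature] >= node.threshold else 0
        stack.append((node.left, cost + b))        # left edge costs b
        stack.append((node.right, cost + 1 - b))   # right edge costs 1 - b
    return out


def classify(root: Node, x: Sequence[int]) -> int:
    """Pick the label of the unique zero-cost path."""
    return next(label for cost, label in path_costs(root, x) if cost == 0)


# Small example tree: the root tests x[0] >= 5, its right child tests x[1] >= 3.
tree = Node(feature=0, threshold=5,
            left=Node(label=10),
            right=Node(feature=1, threshold=3,
                       left=Node(label=20), right=Node(label=30)))
```

Because every edge not taken adds 1 to the cost, exactly one leaf ends with cost 0, which is what lets a private protocol reveal only that leaf's label.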
Tueno et al. [16] represented the tree as an array and conducted a number of comparisons equal to the depth of the tree, using small garbled circuits to obtain secret shares of the index of the next node along the path. They also introduced a novel primitive called oblivious array indexing, which enables the selection of the following nodes without learning them. Using a modular approach, Kiss et al. [14] identified subfunctionalities such as attribute selection, integer comparison, and path evaluation. For privately computing these subfunctionalities, they thoroughly examined the trade-offs and performance of various combinations of reduction protocols.
De Cock et al. [20] follow earlier methods in first carrying out the comparisons. To minimize interaction, they use secret-sharing-based secure multiparty computation (SMC) and commodity-based cryptography [37] in an information-theoretic model. In contrast to ours and other protocols, which are secure in the computational setting, De Cock et al.'s approach is secure in the information-theoretic model. Lu et al. [21] proposed XCMP, a noninteractive comparison protocol that uses the BGV homomorphic scheme [38] with a polynomial encoding of the inputs. They further employed the output-expressive XCMP to construct the private decision tree protocol suggested by Tai et al., thereby maintaining additive homomorphism.
The resulting decision tree technique is efficient and noninteractive owing to its small multiplicative depth. However, it is not generic: it works best with small inputs and depends on BGV-type homomorphic encryption (HE) schemes. Furthermore, it is no longer output expressive like XCMP. It cannot support SIMD operations, which makes it unsuitable for extension to more complex protocols such as random forest [39] while preserving its noninteractive nature. Additionally, the output length of the technique is exponential in the depth of the tree. In contrast, our binary instantiation has an output length that is only linear in the depth, and the integer instantiation can reduce it further by using SIMD.
Like most privacy-preserving techniques based on FSS [28,[40][41][42], their schemes derive correlated randomness from a third party. However, the role of the third party can be jointly simulated by the two parties using either generic two-party secure protocols, such as garbled circuits (GCs) [43] and GMW [44], or specific techniques [45]. Specifically, (1) one can use generic GC or GMW style protocols to produce the required correlated randomness during the offline phase. Although versatile, these protocols require a private evaluation of the underlying pseudorandom generators (PRGs) during the FSS key generation phase. (2) In a customized approach, Doerner and Shelat [45] proposed a solution that offers significant efficiency advantages, as PRG evaluation takes place locally without the need for secure simulation. However, it is suitable only for moderate domain sizes and is difficult to extend to larger and more general cases.

Preliminaries
This section provides essential definitions and notation for our system, serving as background for the rest of the study. Fully or somewhat homomorphic encryption is the fundamental concept; we have simplified its mathematical details to ease presentation and comprehension. We use the terminology of [46] to describe several foundational concepts. The literature on homomorphic encryption [12,[46][47][48][49][50][51][52] is recommended for further reading.

Decision Tree Classifier
Machine learning relies heavily on decision trees for data classification and regression. This study considers two parties that provide their distinct input variables and independently reconstruct the tree model, with the data of one party kept confidential from the other. Given an input query $x$, decision tree classification follows the tree model and compares the input entries to the node-specific threshold at each decision node. Depending on the comparison result, the left or right child is chosen as the next node. The evaluation eventually ends in a specific leaf node, giving the input query a unique classification label. Let $x \in \mathbb{Z}^k$ be a feature vector. A decision tree $T: \mathbb{Z}^k \to \mathbb{Z}$ implements a function $T(x)$ over a $k$-dimensional feature space. Let the input query $q = (q_1, \ldots, q_n) \in \mathbb{Z}^k$ of party $i$ be an $n_i$-dimensional positive integer vector over $\mathbb{Z}$. For the Boolean function Bool, the index is that of an entry of the feature vector $x$, and $t_k$ is the threshold.
Then the decision tree evaluation on input $x = (x_0, \ldots, x_{n-1})$ is given by $l_{label} = T(x)$, with $T$ mapping the input to one of $m$ classification labels, in which $m$ is the number of leaf nodes. This function starts from the root node and performs a comparison at each decision node. Let $j$ be the index of a decision node and $f$ be the function mapping the decision node index $j$ to the corresponding input index $f(j)$. Let $t_j$ be the threshold value of decision node $j$. If $x_{f(j)} \ge t_j$ holds for node $j$, the right child is chosen as the next decision node; otherwise, the left child is chosen. At the end, the function outputs the classification label $l_{label}$ of the final leaf node.
A decision tree (DT) is a function $T: \mathbb{Z}^n \to \{c_0, \ldots, c_{k-1}\}$ that maps an attribute vector $x = (x_0, \ldots, x_{n-1})$ to a finite set of classification labels. The tree consists of
(i) Internal nodes, or decision nodes, each of which contains a test condition.
(ii) Leaf nodes, each of which contains a classification label.
The decision tree model comprises a decision tree and the functions outlined below: At each decision node, a "greater-than" comparison is made between the assigned threshold and the attribute value, i.e., the decision at node $v$ is whether the attribute value tested at $v$ reaches the threshold of $v$.
Node Indices. Given a decision tree, the index of a node is determined by a breadth-first search (BFS) traversal, starting at the root with index 0. When the tree is complete, a node with index $v$ has left child $2v + 1$ and right child $2v + 2$.
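The BFS indexing of a complete tree lends itself to an array layout. The following plaintext sketch shows the traversal rule described above (go right when the attribute is at least the threshold, children at 2v + 1 and 2v + 2). The array names and the function are hypothetical and are not taken from the paper's implementation.

```python
from typing import Sequence


def evaluate_complete_tree(features: Sequence[int],
                           thresholds: Sequence[int],
                           leaf_labels: Sequence[int],
                           x: Sequence[int]) -> int:
    """Plaintext traversal of a complete binary tree stored in BFS order.

    features[v] and thresholds[v] describe decision node v (v = 0 is the root);
    leaf_labels lists the leaves from left to right.
    """
    num_internal = len(thresholds)
    v = 0
    while v < num_internal:
        go_right = x[features[v]] >= thresholds[v]
        v = 2 * v + 2 if go_right else 2 * v + 1
    # Leaves occupy BFS indices num_internal, num_internal + 1, ...
    return leaf_labels[v - num_internal]


# Depth-2 example: 3 decision nodes (indices 0..2) and 4 leaves (indices 3..6).
features = [0, 1, 1]
thresholds = [5, 3, 8]
leaf_labels = [10, 11, 12, 13]
```

With this layout a depth-d complete tree needs no child pointers: the index arithmetic alone determines the path.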

Homomorphic Encryption
This article focuses on lattice-based homomorphic encryption schemes, which allow computations on ciphertexts by generating an encrypted output that corresponds to the result of a function applied to the plaintexts. Such schemes support a number of chained additions and multiplications on plaintexts in a homomorphic manner.
Definition 1. Consider the plaintext space defined as the ring $\mathbb{Z}_q[X]/(X^N + 1)$, where $q$ is a prime number and $N$ is a power of two. The homomorphic encryption (HE) scheme under consideration consists of the following algorithms:
(i) $(pk, sk, ek) \leftarrow \mathrm{KGen}(\lambda)$: a probabilistic algorithm that takes a security parameter $\lambda$ and generates a private key $sk$, a public key $pk$, and an evaluation key $ek$.
(ii) $c \leftarrow \mathrm{Enc}(pk, m)$: a probabilistic algorithm that produces a ciphertext $c$ from a message $m$ and the public key $pk$. We denote the resulting encryption by ⟦m⟧.
(iii) $c \leftarrow \mathrm{Eval}(ek, f, c_1, \ldots, c_n)$: a probabilistic algorithm that generates a ciphertext $c$ from the evaluation key $ek$, an $n$-ary function $f$, and $n$ ciphertexts $c_1, \ldots, c_n$.
(iv) $m' \leftarrow \mathrm{Dec}(sk, c)$: a deterministic algorithm that generates a message $m'$ from a ciphertext $c$ and the private key $sk$.
The encryption algorithm in HE adds "noise" to the ciphertext, and this noise grows during homomorphic evaluation. While the noise level grows exponentially upon multiplication, adding ciphertexts results in a linear increase. If the noise level becomes too high, the ciphertext can no longer be decrypted. To avoid this problem, either a refresh algorithm can be employed or the depth of the circuit for the function $f$ can be kept sufficiently low. The refresh algorithm consists of a key-switching or bootstrapping procedure, which converts a ciphertext encrypted under one key into a ciphertext of the same message encrypted under another key with a fixed amount of noise [46].
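To make the (KGen, Enc, Eval, Dec) interface concrete, here is a toy sketch. It deliberately swaps in the Paillier cryptosystem, which is additively homomorphic only and is not the lattice-based scheme this paper targets; it has no noise and needs no evaluation key, so `kgen` returns only a public and a private key. The tiny fixed primes and all function names are assumptions for illustration, and the code is insecure.

```python
import math
import secrets


def kgen(p: int = 10007, q: int = 10009):
    """Toy Paillier key generation with tiny fixed primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                # valid because g = n + 1
    return n, (n, lam, mu)                              # (pk, sk)


def enc(pk: int, m: int) -> int:
    """Probabilistic encryption: c = (1 + n)^m * r^n mod n^2."""
    n = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2


def eval_add(pk: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (pk * pk)


def dec(sk, c: int) -> int:
    """Deterministic decryption: m = L(c^lam mod n^2) * mu mod n."""
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


pk, sk = kgen()
total = dec(sk, eval_add(pk, enc(pk, 3), enc(pk, 4)))
```

The evaluator in this sketch only ever handles `pk` and ciphertexts, which mirrors the role of the evaluating server in Definition 1; a lattice-based scheme would additionally support multiplication and require noise management.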

Function Secret Sharing
Function secret sharing (FSS) works by splitting a function f into two succinct function parts such that each part reveals nothing about the function f, but when the evaluations are combined at some point x, the result is f(x).
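As a rough illustration of this splitting, the sketch below additively shares the full truth table of a function over a small domain. This is only a stand-in: real FSS keys are succinct, whereas the keys here are as large as the domain. The names `gen` and `eval_share`, the domain size, and the modulus are assumptions chosen for the example.

```python
import secrets
from typing import Callable, List, Tuple


def gen(f: Callable[[int], int], domain_size: int,
        modulus: int) -> Tuple[List[int], List[int]]:
    """Split the truth table of f into two additive shares (not succinct)."""
    k0 = [secrets.randbelow(modulus) for _ in range(domain_size)]
    k1 = [(f(x) - k0[x]) % modulus for x in range(domain_size)]
    return k0, k1


def eval_share(key: List[int], x: int) -> int:
    """Each party evaluates locally on its own key; no interaction is needed."""
    return key[x]


def comparison(x: int) -> int:
    # Example function: outputs 7 if x < 5 and 0 otherwise.
    return 7 if x < 5 else 0


k0, k1 = gen(comparison, domain_size=16, modulus=256)
y = (eval_share(k0, 3) + eval_share(k1, 3)) % 256   # recombines to comparison(3)
```

Each key on its own is a list of uniformly random ring elements, so it reveals nothing about the function beyond the domain size and modulus; only the sum of the two evaluations equals f(x).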

Formally, an FSS scheme is a pair of algorithms Gen and Eval with the following syntax. We identify two FSS constructions [28,40] as a natural fit for our scheme.
Definition 2 ([27,53]). A two-party function secret sharing (FSS) scheme is a pair of algorithms (Gen, Eval) such that
(1) $\mathrm{Gen}(1^\lambda, \hat{f})$ is a probabilistic polynomial-time (PPT) key generation algorithm that, given the security parameter $1^\lambda$ and a function description $\hat{f} \in \{0, 1\}^*$, outputs a pair of keys $(k_0, k_1)$. We assume that $\hat{f}$ explicitly contains descriptions of the input and output groups $G^{in}$, $G^{out}$.
(2) $\mathrm{Eval}(\sigma, k_\sigma, x)$ is a polynomial-time evaluation algorithm that, given the party index $\sigma \in \{0, 1\}$, a key $k_\sigma$ defining a function $f_\sigma: G^{in} \to G^{out}$, and an input $x \in G^{in}$ for $f_\sigma$, outputs a group element $y_\sigma \in G^{out}$.
When the number of parties $m$ is omitted, it is understood to be 2. When $m = 2$, we sometimes index the parties by $\sigma \in \{0, 1\}$.
Definition 3 (correctness and security [27,53]). Let $F = \{f\}$ be a function family and Leak be a function specifying the allowable leakage about $\hat{f}$. When Leak is omitted, it is understood to output only $G^{in}$, $G^{out}$. We say that (Gen, Eval) as in Definition 2 is an FSS scheme for $F$ (with respect to leakage Leak) if it satisfies the following requirements.
(1) Correctness: for all $\hat{f}$ describing $f: G^{in} \to G^{out}$ and every $x \in G^{in}$, if $(k_0, k_1) \leftarrow \mathrm{Gen}(1^\lambda, \hat{f})$, then $\Pr[\mathrm{Eval}(0, k_0, x) + \mathrm{Eval}(1, k_1, x) = f(x)] = 1$.
(2) Security: for each $\sigma \in \{0, 1\}$ there is a PPT algorithm $\mathrm{Sim}_\sigma$ (simulator), such that for every sequence $(\hat{f}_\lambda)_{\lambda \in \mathbb{N}}$ of polynomial-size function descriptions from $F$ and polynomial-size input sequence $x_\lambda$ for $f_\lambda$, the outputs of the following experiments Real and Ideal are computationally indistinguishable:
$\mathrm{Real}_\lambda$: $(k_0, k_1) \leftarrow \mathrm{Gen}(1^\lambda, \hat{f}_\lambda)$; output $k_\sigma$.
$\mathrm{Ideal}_\lambda$: output $\mathrm{Sim}_\sigma(1^\lambda, \mathrm{Leak}(\hat{f}_\lambda))$.
A central building block for many of our constructions is an FSS scheme for a special interval function referred to as a distributed comparison function (DCF), which we formalize below.

Definition 4 (distributed comparison function). A special interval function $f^{<}_{\alpha,\beta}$, also referred to as a comparison function, outputs $\beta$ if $x < \alpha$ and 0 otherwise. We refer to an FSS scheme for comparison functions as a DCF. Analogously, the function $f^{\le}_{\alpha,\beta}$ outputs $\beta$ if $x \le \alpha$ and 0 otherwise. In all of these cases, we allow the default leakage, that is, Leak outputs only $G^{in}$ and $G^{out}$.

Our Construction
This section gives a modular description of our base protocol for secure two-party decision tree classification. We first introduce the data structures used in the protocol. This structured representation of the data ensures that each party has access to the information it needs while the privacy of sensitive data is preserved, which makes the protocol suitable for real-world applications requiring secure data analysis. Finally, we present the honest-but-curious adversarial model assumed in our protocol and the cryptographic primitives, namely function secret sharing, homomorphic encryption, and secure comparison, that prevent leakage of sensitive data to either party. Overall, the modular design of our base protocol enables us to address specific security concerns by considering data structures and access control mechanisms. In subsequent sections, we describe the key components of the protocol in more detail, including the cryptographic primitives employed and the communication protocol used to carry out the secure computation.
Definition 7 (classification function). Let the attribute vector be $x = (x_0, \ldots, x_{n-1})$ and the decision tree model be $M = (D, L)$. We define the classification function to be $f_c(x, M) = tr(x, root)$, where root is the root node and tr is the traverse function defined as

Building Blocks
A one-time key is generated as part of the initialization of the homomorphic encryption system. One server $S_i$ ($i \in \{0, 1\}$) creates the triple (pk, sk, ek), which consists of the public, private, and evaluation keys, and sends (pk, ek) to the other server $S_{1-i}$. For each classification instance, $S_i$ encrypts its input and forwards it to $S_{1-i}$. To reduce transmission costs, a trusted randomizer can be employed; it must not cooperate with $S_{1-i}$ and does not participate in the actual protocol. This technique is similar to commodity-based cryptography, except that $S_i$ can act as the randomizer itself and provide the list of ⟦r⟧ before the start of the protocol, when the network is not overloaded.
10.1. Initialization. The initialization consists of a one-time key generation. One server $S_i$ ($i \in \{0, 1\}$) generates an appropriate triple (pk, sk, ek) of public, private, and evaluation keys for a homomorphic encryption scheme. Then $S_i$ sends (pk, ek) to the other server $S_{1-i}$. For each input classification, $S_i$ just encrypts its input and sends it to $S_{1-i}$. To reduce the communication cost of sending its input, $S_i$ can use a trusted randomizer that does not take part in the real protocol and is not allowed to collaborate with $S_{1-i}$. The trusted randomizer generates a list of random strings r and sends the encrypted strings ⟦r⟧ to $S_{1-i}$ and the list of r to $S_i$. For an input x, $S_i$ then sends x + r to $S_{1-i}$ in the real protocol. This technique is similar to commodity-based cryptography, with the difference that $S_i$ can play the role of the randomizer itself and send the list of ⟦r⟧'s before the protocol starts, when the network is not too busy.
10.3. Aggregating Decision Bits. For each leaf node v, the server aggregates the comparison bits along the path from the root to v. We implement this using a queue and traversing the tree in BFS order, as illustrated in Algorithm 1.
10.4. Finalizing. After aggregating the decision bits along the paths to the leaf nodes, each leaf node v stores either v.cmp = 0 or v.cmp = 1. Then the server aggregates the decision bits at the leaves by computing for each leaf v the value ⟦v.cmp⟧ ⊕ ⟦v.cLabel⟧ and summing all the results. This is illustrated in Algorithm 3.
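The aggregation and finalizing steps can be pictured in plaintext as follows. This sketch is not Algorithm 1 or Algorithm 3: it works on cleartext bits rather than ciphertexts, and it assumes that the aggregated bit of a node is the product of the edge bits on its path (so exactly one leaf ends with cmp = 1) and that a leaf's bit is combined with its label by multiplication. The paper's own operators may differ. All names are hypothetical.

```python
from collections import deque
from typing import Sequence


def aggregate_and_finalize(bits: Sequence[int],
                           leaf_labels: Sequence[int]) -> int:
    """bits[v] is the comparison bit of decision node v in a complete tree
    stored in BFS order (1 means "go right"). Returns the selected label."""
    num_internal = len(bits)
    cmp = [0] * (num_internal + len(leaf_labels))
    cmp[0] = 1                       # the root is always on the path
    queue = deque([0])
    while queue:                     # BFS traversal with a queue
        v = queue.popleft()
        if v >= num_internal:        # leaf: nothing to propagate
            continue
        left, right = 2 * v + 1, 2 * v + 2
        cmp[left] = cmp[v] * (1 - bits[v])
        cmp[right] = cmp[v] * bits[v]
        queue.extend((left, right))
    # Finalizing: only the leaf with cmp = 1 contributes its label to the sum.
    return sum(cmp[num_internal + i] * label
               for i, label in enumerate(leaf_labels))


selected = aggregate_and_finalize([1, 0, 0], [10, 11, 12, 13])
```

Because the result is a fixed arithmetic expression over all leaves, the evaluator touches every node in the same order regardless of the input, which is the property a homomorphic evaluation relies on.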
The comparison operation is used to select the maximum Gini impurity gain. Algorithm 4 gives a specific comparison protocol Compare(⟦x⟧, ⟦y⟧) based on FSS, which outputs shares of $z = \mathbf{1}\{y > x\}$. Note that the comparison protocol is executed over secret-shared inputs rather than public values, which must be supported by our FSS scheme. The key idea is therefore to construct the FSS scheme for the offset function $f^{[r]}(x) = f(x - r)$, where $r$ is sampled uniformly from $\mathbb{Z}_{2^n}$ and secret-shared between $S_0$ and $S_1$. In this way, $S_0$ and $S_1$ first reconstruct $x + r$ and then evaluate $f^{[r]}(x + r)$, which equals $f(x)$. Note that the offset function fails if $x + r$ wraps around. Our protocol invokes only 1 DCF and introduces $2n$ communication bits within 1 round in the setup phase.
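The masked-comparison flow can be sketched on a toy 8-bit ring. This is not Algorithm 4: the DCF is replaced by an additively shared truth table that stores f applied to the unmasked value, so a lookup at the revealed y − x + r returns a share of f(y − x). Because the table reduces modulo 2^n, this stand-in does not exhibit the wrap-around failure that the single-DCF construction has to bound. The dealer role, bit width, and all names are assumptions.

```python
import secrets

N_BITS = 8                  # toy ring Z_{2^8}; the paper works over a larger ring
MOD = 1 << N_BITS


def share(v: int):
    """Additive secret sharing over Z_{2^n}."""
    s0 = secrets.randbelow(MOD)
    return s0, (v - s0) % MOD


def is_positive(d: int) -> int:
    """f(d) = 1 if d, read as a signed N_BITS-bit value, is greater than 0."""
    return 1 if 0 < d < MOD // 2 else 0


def setup():
    """Offline phase (hypothetical dealer): sample the mask r, share it, and
    share the truth table of the masked function m -> f(m - r)."""
    r = secrets.randbelow(MOD)
    table = [is_positive((m - r) % MOD) for m in range(MOD)]
    k0 = [secrets.randbelow(MOD) for _ in range(MOD)]
    k1 = [(t - a) % MOD for t, a in zip(table, k0)]
    return share(r), (k0, k1)


def compare(x_sh, y_sh, r_sh, keys):
    """Online phase: one round in which each party reveals its masked share."""
    m0 = (y_sh[0] - x_sh[0] + r_sh[0]) % MOD    # sent by S_0
    m1 = (y_sh[1] - x_sh[1] + r_sh[1]) % MOD    # sent by S_1
    m = (m0 + m1) % MOD                         # both parties learn y - x + r
    return keys[0][m], keys[1][m]               # additive shares of 1{y > x}


r_sh, keys = setup()
z0, z1 = compare(share(17), share(42), r_sh, keys)
z = (z0 + z1) % MOD                             # 42 > 17, so z reconstructs to 1
```

The sketch is correct only while |y − x| stays below 2^(n−1), so that the sign of the difference is readable from the ring element; a fresh mask and key pair should be used for every comparison.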

Secure Two-Party Decision Tree Classification
In this section, we present our secure two-party decision tree classification protocol, which targets scenarios where two counterpart parties provide private information and both parties can own a decision tree model (see Figure 1). The protocol ensures that both servers learn the classification results, while each server learns only its own individual inputs. Our protocol is designed to be secure against honest-but-curious parties.
To establish the necessary functionality for the tree array $A_T$ and the feature array $X$, $S_0$ and $S_1$ perform the required setup work for function secret sharing. Additionally, $S_0$ shares the root node $A_T[0]$ with $S_1$, which serves as the starting evaluation node. Our secure two-party decision tree classification protocol provides an effective solution for secure data analysis while maintaining data privacy. It enables both parties to access the classification results without compromising sensitive information, thereby ensuring transparency in the data analysis process.
In each iteration, the evaluation starts with a call to the FSS functionality F on $S_i$, which initiates the sharing of X[v] among the parties. The parties then perform a secure comparison between ⟦X[v]⟧ and thr to obtain a comparison result, denoted b. Subsequently, the MUX computation determines which child becomes the next evaluation node.
Lemma 8 (correctness). Assuming the evaluation correctness of the underlying FSS scheme and our Eval algorithm, the above construction is a dual-server private decision tree classification protocol that outputs the correct classification label.
Proof. Based on the correctness of FSS and Eval, $\mathrm{cost}_l$ is equal to 0 if and only if v.cmp is equal to 0; the corresponding result then satisfies $\mathrm{result}_l = r_l^{1} \times 0 + l_{label} = l_{label}$. □
Lemma 9 (security). The algorithm Compare(⟦x⟧, ⟦y⟧) securely realizes the functionality $F_{Compare}$, assuming the existence of secure protocols for the FSS procedures.
Proof. We prove the security of Compare(⟦x⟧, ⟦y⟧). $S_i$ receives no private information of $S_{1-i}$, $i \in \{0, 1\}$, and hence this protocol is trivially secure against an honest-but-curious adversary. We now prove security against corruption of $S_i$, $i \in \{0, 1\}$, when the server receives $⟦b⟧_{1-i} = ⟦y⟧_{1-i} - ⟦x⟧_{1-i} + ⟦r⟧_{1-i}$ and $k_i$. Given the security of PRFs, $⟦r⟧_{1-i}$ is a random value unknown to $S_i$. Thus, the distribution of $⟦b⟧_{1-i}$ is uniformly random from the view of $S_i$. Then, given the security of FSS, the information learned by $S_i$ can be perfectly simulated. Hence, our protocol is secure against honest-but-curious corruption of $S_i$.

Security Analysis
We now present a formal security proof of our secure two-party decision tree classification protocol described in this section. We show that the protocol satisfies computational semihonest security by proving the existence of probabilistic polynomial-time (PPT) simulators whose output is computationally indistinguishable from the real view of each party during the protocol execution.
Let $\mathrm{VIEW}_0(x_0, x_1)$ denote the view of party $S_0$ during an execution of the protocol on inputs $(x_0, x_1)$, consisting of its input $x_0$, internal random coins $r_0$, and received messages. Similarly, $\mathrm{VIEW}_1(x_0, x_1)$ denotes the view of party $S_1$. We construct the following PPT simulators (Algorithms 5 and 6).
We now show that the output of each simulator is computationally indistinguishable from the real view.
Theorem 10. The secure two-party decision tree classification protocol satisfies computational semihonest security. Formally, there exist PPT simulators Sim0 and Sim1 such that $\mathrm{Sim0}(x_0, f(x_0, x_1)) \approx \mathrm{VIEW}_0(x_0, x_1)$ and $\mathrm{Sim1}(x_1, f(x_0, x_1)) \approx \mathrm{VIEW}_1(x_0, x_1)$.
Proof. The SETUP message and PRF randomness $r_0'$ generated by Sim0 are distributed identically to those in the real protocol execution. The simulated transcript consists of
(1) Encrypted inputs computed on $x_0, x_1'$
(2) FSS keys generated independently of the inputs
(3) Encrypted outputs that encrypt results from $x_0, x_1'$
These are all computationally indistinguishable from the real transcript due to the IND-CPA security of the encryption scheme and the security of the FSS scheme. Therefore, $\mathrm{Sim0}(x_0, f(x_0, x_1)) \approx \mathrm{VIEW}_0(x_0, x_1)$. By a similar argument, we can show $\mathrm{Sim1}(x_1, f(x_0, x_1)) \approx \mathrm{VIEW}_1(x_0, x_1)$. Since PPT simulators Sim0 and Sim1 exist whose output is computationally indistinguishable from the real view of each party, the protocol satisfies computational semihonest security. This security proof demonstrates that our protocol protects the privacy of each party's inputs and decision tree model during the secure two-party computation. By simulating the views using arbitrary inputs, we have shown that the views leak no additional information beyond the intended output. Therefore, our protocol provides provable security guarantees for practical applications requiring privacy-preserving decision tree classification.

Experiment
We present experimental results evaluating the performance of our secure two-party decision tree classification protocol on the MNIST dataset [54]. Specifically, we analyze how the number of training epochs affects accuracy. We also benchmark the runtime of training and inference under different model configurations. Finally, we compare our approach with prior frameworks from related work in terms of efficiency and accuracy.

Experimental Setup
We implement the secure two-party decision tree training algorithm in Python. For communication between the parties, we use the communication backend of the Porthos framework in EzPC [55]. We employ a pseudorandom function (PRF) based on the block cipher AES using the OpenSSL-AES library [56,57], and the function secret sharing (FSS) schemes are implemented using the LibFSS library. The implementation is executed on two machines, each with an Intel(R) Core(TM) i7-6700 CPU and 16 GB of RAM running Ubuntu 18.04, with each machine representing a party ($S_0$ and $S_1$). The reported communication overhead covers the communication between the two parties, while the runtime includes both the local computation of each party and the communication latency between them. For experiments conducted over a local area network (LAN), we assume a bandwidth of 2 Gbps and an echo latency of 0.3 ms. We use secret-sharing protocols over the ring $\mathbb{Z}_{2^{64}}$, following existing works [23,58]. We encode the inputs using a fixed-point representation with a precision of 20 bits.
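The encoding described above can be sketched as follows: real values are scaled by 2^20, mapped into the ring Z_{2^64} with negative values wrapping around, and then additively shared. This is a minimal illustration under those stated parameters; the function names are hypothetical and do not come from the paper's code base.

```python
import secrets

RING_BITS = 64
MOD = 1 << RING_BITS        # ring Z_{2^64}
FRAC_BITS = 20              # fixed-point precision of 20 bits


def encode(value: float) -> int:
    """Scale by 2^20 and reduce into the ring (negatives wrap around)."""
    return round(value * (1 << FRAC_BITS)) % MOD


def decode(elem: int) -> float:
    """Interpret the upper half of the ring as negative numbers."""
    if elem >= MOD // 2:
        elem -= MOD
    return elem / (1 << FRAC_BITS)


def share(elem: int):
    """Split a ring element into two additive shares."""
    s0 = secrets.randbelow(MOD)
    return s0, (elem - s0) % MOD


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD


def add_shares(a, b):
    """Addition is local: each party adds the shares it holds."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


a_sh = share(encode(0.25))
b_sh = share(encode(-1.5))
result = decode(reconstruct(*add_shares(a_sh, b_sh)))   # 0.25 + (-1.5)
```

Addition of shared values needs no communication, whereas multiplying two fixed-point values doubles the number of fractional bits and requires a truncation step, which is outside the scope of this sketch.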
Our implementation demonstrates the practical viability of secure two-party decision tree training for data analysis applications prioritising privacy.By leveraging commonly available resources such as Python and the Porthos framework in EzPC, we provide a simple yet efective solution that can be quickly adopted for particular data analysis tasks.In summary, our study presents an efcient and practical approach to implementing secure two-party decision tree training, providing insights into designing secure data analysis systems for real-world applications.
In this section, we give the accuracy of secure two-party decision classifcation.Our study aims to evaluate the effectiveness of secure two-party decision tree classifcation by conducting several epochs of training on a decision tree classifer.Specifcally, we perform 5, 10, and 15 epochs of training on the model and record the corresponding accuracies obtained in each case.
We present the findings in Table 2. Due to space constraints, we only report the plaintext training results and the corresponding secure training results. The table shows that the trend in secure training accuracy is similar to that of plaintext training accuracy, with no discernible fluctuations. Furthermore, the difference between the accuracy obtained from secure and plaintext training is approximately ±0.05%. These results suggest that the secure two-party decision tree classification method achieves high accuracy while preserving data privacy. To sum up, the experimental results demonstrate that secure two-party decision tree classification can achieve performance comparable to plaintext training, with only a negligible difference in accuracy, making it a promising method for secure data analysis.
In our study, we investigate the impact of varying the number of training data and the maximum tree depth on the communication overhead of secure two-party decision tree classification. Firstly, we examine the relationship between the number of training samples and the communication cost. Figure 2 shows that as the number of training samples increases, both the data cost and the communication cost grow roughly linearly. This can be attributed to the secure two-party decision tree training phase requiring more multiplication operations to compute the impurity gain when the number of training samples is larger. Secondly, we explore the impact of varying the maximum tree depth on the communication overhead. As the well-trained tree tends towards a complete binary tree, approximately $2^d - 1$ internal nodes are constructed for a given depth $d$. Therefore, as shown in Figure 3, the communication overhead increases logarithmically with the tree depth. This is because deeper trees require more computation and more communication between the parties.

ALGORITHM 5: Simulator sim0.
(1) Generate random coins $r_0'$ for PRF evaluation
(2) Generate SETUP message to FSS functionality $F$
(3) Run protocol execution locally on inputs $(x_0, x_1')$, outputting $f(x_0, x_1)$
(4) Use $r_0'$ as randomness and arbitrary $x_1'$ as input
(5) Output view $(x_0, r_0', \text{transcript})$
Our results show that the communication overhead is influenced by critical factors such as the number of training samples and the maximum tree depth. As such, it is essential to carefully consider these factors when designing secure two-party decision tree classification systems to ensure optimal performance while maintaining data privacy. In conclusion, our study highlights the need for efficient and secure methods for decision tree classification, especially in situations where data privacy is of utmost importance.
The study [59] presents an initial GPU-based implementation of function secret sharing, although further optimizations could reduce the memory footprint of cryptographic keys by approximately 50% to match theoretical minimum bounds. Moreover, the small gap between LAN and WAN runtimes suggests that computation, rather than communication, dominates the cost for sufficiently large networks. Thus, optimizing GPU-based computation has the potential to markedly improve the overall efficiency of inference and training.

Discussion
We have demonstrated the utility of function secret sharing for private training and evaluation of decision trees. Compared to related works, our protocols are highly competitive and achieve negligible failure rates for ML applications.
Numerous opportunities remain to improve performance further and expand the applicability of private ML via function secret sharing. Running experiments at 16-bit rather than 32-bit precision could be another promising improvement, as major ML frameworks now support 16-bit encoding on CPU. Reducing key sizes, leveraging lower precision, and GPU optimizations can help overcome scaling bottlenecks. Testing new model architectures and data modalities will be essential to gauge general viability. Overall, there is considerable promise in employing function secret sharing primitives to enable practical secure computation for diverse machine learning pipelines.
Moreover, a core technical challenge in applying homomorphic encryption (HE) to secure machine learning is managing the noise growth inherent in lattice-based cryptosystems. Our scheme introduces randomness into the ciphertext to ensure security when applying HE. However, each subsequent homomorphic operation (addition or multiplication) also accumulates and amplifies this noise. Excessive noise during HE evaluation prevents correct decryption and reduces arithmetic fidelity. In the context of secure decision tree protocols, imprecise calculations may propagate errors when calculating attribute thresholds and reduce model accuracy. While multiplication noise worsens exponentially, even repeated additions can produce considerable noise.
We employ techniques including optimized circuitry, modular design, and regular ciphertext refresh to suppress noise. However, some accuracy loss may still occur for deep trees and large datasets. We will empirically quantify the potential degradation in accuracy due to noise in future work. Analyzing real data will better reveal the actual impact. If noise-induced inaccuracies prove unacceptable, an alternative homomorphic encryption scheme with slower noise growth would be a better choice. However, such schemes usually require more computational overhead. Developing protocols that are robust to large amounts of noise remains an open problem when applying HE to machine learning.
The inherent stochasticity of lattice-based HE affects model accuracy when noise accumulates across multiple operations. While this paper mitigates noise growth through multiple strategies, more empirical analysis is needed to determine the extent of this problem in practice. Managing noise persistence remains an active research challenge in building efficient and accurate protocols for secure computation on encrypted data. We have demonstrated the utility of function secret sharing in private training and evaluation of decision trees. Our protocol is highly competitive compared to related work and suffers a negligible failure rate for machine learning applications.

Conclusions
In conclusion, our research has introduced a two-party secure decision tree classification protocol that offers low communication and computational costs and minimal client interaction. Our approach makes the solution more practical by improving the multiplication depth of the tree evaluation circuit and the efficiency of the underlying general FSS solution. Notably, each participant adds a relatively small amount of blurring noise in threshold decryption, resulting in a considerable reduction in the overall computational cost and ciphertext size of FSS. Together, our contributions have enabled the application of our protocol with a lower computational overhead while maintaining a higher level of security.
(i) A threshold value is assigned to each decision node by the function $\mathrm{thr}: [0, m-1] \to \mathbb{Z}$.
(ii) An attribute index is assigned to each decision node by the function $\mathrm{att}: [0, m-1] \to [0, n-1]$.
(iii) A label is assigned to each leaf node by the labeling function $\mathrm{lab}: [m, M-1] \to \{c_0, \ldots, c_{k-1}\}$.
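The three functions above can be sketched as plain arrays, with a plaintext (non-private) traversal that uses them. The heap-style layout, in which the children of node $v$ are $2v+1$ and $2v+2$, the name `classify`, and the example values are assumptions made for illustration only. The convention of going right when the attribute is at least the threshold follows the comparison bit $[x_{\mathrm{att}(v)} \ge \mathrm{thr}(v)]$ used in the paper.

```python
from typing import Dict, List


def classify(x: List[int], thr: List[int], att: List[int],
             lab: Dict[int, str], m: int) -> str:
    """Walk from the root (node 0) to a leaf and return its label.

    Nodes 0..m-1 are decision nodes; nodes m..M-1 are leaves.
    Assumed layout: children of node v are 2v+1 (left) and 2v+2 (right).
    """
    v = 0
    while v < m:
        go_right = x[att[v]] >= thr[v]
        v = 2 * v + 2 if go_right else 2 * v + 1
    return lab[v]


# Complete tree of depth 2: m = 3 decision nodes, M = 7 nodes in total.
thr = [5, 2, 8]                               # thr: [0, m-1] -> Z
att = [0, 1, 1]                               # att: [0, m-1] -> [0, n-1]
lab = {3: "c0", 4: "c1", 5: "c2", 6: "c3"}    # lab: [m, M-1] -> labels

print(classify([4, 3], thr, att, lab, m=3))
```

In the secure protocol the comparison results and the chosen path stay hidden from both parties; this sketch only fixes the plaintext semantics that the protocol has to reproduce.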
(1) distributed point function (DPF) $(\mathrm{Gen}^{\mathrm{DPF}}_{a,b}, \mathrm{Eval}^{\mathrm{DPF}}_{a,b})$ that satisfies $f_{a,b}(x) = b$ if $x = a$ and 0 otherwise and (2) distributed comparison function (DCF) $(\mathrm{Gen}^{\mathrm{DCF}}_{a,b}, \mathrm{Eval}^{\mathrm{DCF}}_{a,b})$ that satisfies $f_{a,b}(x) = b$ if $x < a$ and 0 otherwise.

Definition 2 (function secret sharing

there exists a DCF for $f^{<}_{\alpha,\beta}: G^{\mathrm{in}} \to G^{\mathrm{out}}$ with key size $4n \cdot (\lambda + 1) + nl + \lambda$, where $n = \lceil \log|G^{\mathrm{in}}| \rceil$ and $l = \lceil \log|G^{\mathrm{out}}| \rceil$. For $l' = l/\lambda + 2$, the key generation algorithm Gen invokes $G$ at most $n \cdot (4 + l')$ times and the algorithm Eval invokes $G$ at most $n \cdot (2 + l')$ times.

We use $\mathbf{DCF}_{n,G}$ (bold) to denote the total key size, i.e., $|k_0| + |k_1|$, of the DCF key with input length $n$ and output group $G$. On the other hand, we use $\mathrm{DCF}_{n,G}$ (nonbold) to denote the key size per party, i.e., $|k_b|$, $b \in \{0, 1\}$. This captures the key size used in the Eval algorithm. In the rest of the paper, we use $\mathrm{DCF}_{n,G}$ to count the number of invocations/evaluations as well as the key size per evaluator $P_b$, $b \in \{0, 1\}$.
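The Gen/Eval interface and its correctness condition can be illustrated with a deliberately naive sketch that additively shares the whole truth table of the function. This is not the PRG-based construction behind the key sizes quoted above: its keys grow with the domain size $2^n$ rather than with $n$. The names `gen`, `eval_share`, and `dcf_key_bits`, as well as the example parameters, are hypothetical.

```python
import secrets
from typing import Callable, List, Tuple


def point_fn(a: int, b: int, x: int) -> int:
    """f_{a,b}(x) = b if x == a, else 0 (the function shared by a DPF)."""
    return b if x == a else 0


def compare_fn(a: int, b: int, x: int) -> int:
    """f_{a,b}(x) = b if x < a, else 0 (the function shared by a DCF)."""
    return b if x < a else 0


def gen(f: Callable[[int, int, int], int], a: int, b: int,
        n_bits: int, out_mod: int) -> Tuple[List[int], List[int]]:
    """Naive Gen: k0 is a random table, k1 = truth table - k0 (mod out_mod)."""
    size = 1 << n_bits
    k0 = [secrets.randbelow(out_mod) for _ in range(size)]
    k1 = [(f(a, b, x) - k0[x]) % out_mod for x in range(size)]
    return k0, k1


def eval_share(key: List[int], x: int) -> int:
    """Naive Eval: a party looks up its additive share of f(x)."""
    return key[x]


def dcf_key_bits(n: int, lam: int, l: int) -> int:
    """Key size 4n(lambda + 1) + n*l + lambda, as quoted in the text."""
    return 4 * n * (lam + 1) + n * l + lam


# Example: 4-bit inputs, outputs in Z_{2^8}, threshold a = 9, payload b = 1.
OUT_MOD = 1 << 8
k0, k1 = gen(compare_fn, 9, 1, n_bits=4, out_mod=OUT_MOD)
total = (eval_share(k0, 6) + eval_share(k1, 6)) % OUT_MOD
print(total)                       # the shares of f(6) add up to 1, since 6 < 9
print(dcf_key_bits(64, 128, 64))   # key size formula for n = 64, lambda = 128, l = 64
```

Each key alone is a table of values that look uniformly random, so a single party learns nothing about $a$ or $b$, while the two evaluation results always sum to $f_{a,b}(x)$. The constructions cited in the paper achieve the same property with keys whose size is linear in $n$.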
The server starts by computing for each node $v \in D$ the comparison bit $b \leftarrow [x_{\mathrm{att}(v)} \ge \mathrm{thr}(v)]$ and stores $b$ at the right child node (v.right.cmp = $b$) and $1 - b$ at the left child node (v.left.cmp = $1 - b$). It is illustrated in Algorithm 1.

10. 2 .
Computing Decision Bits. The server starts by computing for each node $v \in D$ the comparison bit $b \leftarrow [x_{\mathrm{att}(v)} \ge \mathrm{thr}(v)]$ and stores $b$ at the right child node (v.right.cmp = $b$) and $1 - b$ at the left child node (v.left.cmp = $1 - b$). It is illustrated in Algorithm 2.
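The decision-bit step can be sketched in plaintext as follows. The fields `left`, `right`, `cmp`, and `cLabel` follow the node structure described in the paper; the field names `threshold` and `aIndex`, the default `cmp = 1` at the root, and the helper `selected_leaf` (which picks the leaf whose path carries only 1-bits) are assumptions added for illustration and are not claimed to match Algorithm 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    threshold: int = 0               # thr(v), hypothetical field name
    aIndex: int = 0                  # att(v), hypothetical field name
    left: Optional["Node"] = None    # null for leaf nodes
    right: Optional["Node"] = None   # null for leaf nodes
    cmp: int = 1                     # root has no parent; 1 is neutral (assumption)
    cLabel: str = ""                 # empty string for decision nodes


def compute_decision_bits(root: Node, x: List[int]) -> None:
    """For every decision node v, store b at v.right and 1 - b at v.left."""
    stack = [root]
    while stack:
        v = stack.pop()
        if v.left is None or v.right is None:   # leaf: nothing to compare
            continue
        b = int(x[v.aIndex] >= v.threshold)
        v.right.cmp = b
        v.left.cmp = 1 - b
        stack.extend([v.left, v.right])


def selected_leaf(root: Node) -> Optional[str]:
    """Return the label of the leaf whose root-to-leaf bits are all 1."""
    stack = [(root, root.cmp)]
    while stack:
        v, acc = stack.pop()
        if v.left is None or v.right is None:
            if acc == 1:
                return v.cLabel
        else:
            stack.append((v.left, acc * v.left.cmp))
            stack.append((v.right, acc * v.right.cmp))
    return None


inner = Node(threshold=2, aIndex=1,
             left=Node(cLabel="c0"), right=Node(cLabel="c1"))
tree = Node(threshold=5, aIndex=0, left=inner, right=Node(cLabel="c2"))

compute_decision_bits(tree, [4, 3])
print(selected_leaf(tree))
```

Every decision node is evaluated regardless of the input, so the access pattern does not depend on the path taken. In the secure protocol the bits themselves would be secret-shared rather than stored in the clear.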

Table 1: Comparison to prior work.
(v) v.right: Pointers to the right child nodes are stored in the variables v.right. For leaf nodes, these pointers are null.
(vi) v.cmp: During tree evaluation, the comparison bit $b \leftarrow [x_{\mathrm{att}(v.\mathrm{parent})} \ge \mathrm{thr}(v.\mathrm{parent})]$ is computed and stored in the variable v.cmp. If v is a right node, it stores $b$; otherwise, it stores $1 - b$.
(vii) v.cLabel: The classification label is stored in the variable v.cLabel if v is a leaf node; otherwise, it stores an empty string.

Table 2: The performance of secure two-party decision tree classification.