Private Function Evaluation Using Intel’s SGX

Private Function Evaluation (PFE) is the problem of evaluating one party’s private data using a private function owned by another party. Existing solutions for PFE are based on universal circuits evaluated in secure multiparty computations or on hiding the circuit’s topology and the gates’ functionality through additive homomorphic encryption. These solutions, however, are not efficient enough for practical use; hence there is a need for more efficient techniques. This work looks at utilizing the Intel Software Guard Extensions platform (SGX) to provide a more practical solution for PFE where the privacy of the data and the function are both preserved. Notably, our solution carefully avoids the pitfalls of side-channel attacks on SGX. We present solutions for two different scenarios: the first is when the function’s owner has an SGX-enabled device and the other is when a third party (or one of the data owners) has the SGX capability. As expected, our results show a clear running-time advantage for the first case over the second. Investigating the slowdown in the second case shows that garbling accounts for more than 60% of the running time. Both solutions clearly outperform FairplayPF in our tests.


Introduction
In Private Function Evaluation (PFE), a participant $S_0$ holds some private function $f$, while participants $S_1, S_2, \ldots, S_m$ each have their own private input $x_i$. These parties would like to work together to find $f(x_1, x_2, \ldots, x_m)$ while retaining the confidentiality of their respective inputs and of $S_0$'s function.
This problem is useful when an entity holding a proprietary piece of software would like to offer some service using that software to other entities that have confidential data. One typical example would be a privacy-centric recommendation system. Using PFE, a company can run its proprietary algorithms to recommend products to consumers while maintaining the privacy of the consumers' data even from the company itself.
This problem is similar to Secure Multiparty Computation (SMPC) in that both problems require the input data to remain hidden. However, PFE additionally requires the function to be private, while SMPC assumes a publicly known function. The performance of SMPC solutions has improved considerably over the years, making SMPC more practical and thereby more widely adopted. This is not the case with PFE, as the additional requirement of function privacy adds more complexity to the problem. Although not very practical, solutions for PFE do exist and are mostly adapted from techniques used in SMPC.
One such solution involves running a universal circuit in SMPC that takes $x_1, x_2, \ldots, x_m$ in addition to $C_f$, a circuit representation of $f$, as inputs. The idea is that SMPC ensures the privacy of all inputs; hence the privacy of the function is ensured since it is part of the input. The issue, however, is that a universal circuit that can run $C_f$ will be of size $\Omega(|C_f| \log |C_f|)$ according to the state of the art [1]. Different universal circuits must be constructed depending on $|C_f|$, adding further cost to this solution.
More recent solutions involve modifying the garbled circuits used in SMPC in order to hide the gates' functionality and the circuit topology of $C_f$. These approaches can actually achieve a cost linear in $|C_f|$, but with an additional cost of a linear number of asymmetric key operations, which are not practical. Without asymmetric key operations, the best solution still takes $O(|C_f| \log |C_f|)$ time. Hence, these approaches are also not very practical.
In this article, we propose a practical solution to the PFE problem which builds on Intel Software Guard Extensions (SGX). SGX is a set of CPU instructions provided by Intel to allow hardware-based protection of the running software. SGX provides this protection through the use of enclaves, which are protected areas of execution in memory. Code and data in an enclave will be encrypted and hidden from even the operating system. SGX provides powerful tools that can help in developing a more practical PFE solution. However, we note that SGX does not hide memory accesses and instruction trace; hence it is vulnerable to side-channel attacks which we should carefully consider.
This work looks at developing protocols that make use of SGX to create more practical solutions for PFE.
Our key contributions can be summarized as follows: (i) We propose the first approach built on SGX to offer efficient PFE. (ii) We consider two scenarios for our solution: one where the function owner has the SGX-enabled device and a more challenging scenario where a data owner or any third party has SGX that assists in performing PFE. (iii) We analyze the security and the performance of our scheme theoretically. We also implement a proof-of-concept of our solutions in both scenarios to benchmark the efficiency of our approach and show that it outperforms existing solutions.

Preliminary Definitions and Background
In this section, we present an overview of the concepts and technologies needed to build our solutions.

SGX.
SGX is a set of instructions provided by Intel that enables developers to create enclaves, which are protected areas in memory. SGX guarantees both integrity and confidentiality to data and code inside the enclave. Intel provides an API for developers to create applications with enclaves. SGX supports multiple enclaves with a limited total size of 128 MB. Unfortunately, SGX does not hide the memory accesses of the enclave, which opens the door for both deterministic and probabilistic side-channel attacks [2].
Remote SGX attestation provides a hardware-based guarantee that a specific piece of software is running in another server's enclave. This means that a client will be able to gain confidence that the server it is communicating with is running a legitimate enclave. Attestation also provides a secure communication channel by establishing shared keys between the client and the enclave, which enables the client to encrypt messages that can only be read by the enclave itself but not by the host of the enclave.

Garbled Circuits (GC).
Originally introduced by Yao [3], GC is a technique that performs secure multiparty computation in the two-party setting. A formal description of GC can be found in [4].
In GC, the function is assumed to be public and represented by a circuit. The two parties work together to evaluate the result of the function on their respective private inputs. One party assumes the role of garbling the circuit and is called the garbler, while the other party evaluates the garbled circuit and is referred to as the evaluator.
Garbling obscures the circuit by replacing the wire values, which are 0 or 1 prior to garbling, with pseudorandom values of a size defined by the security parameter $k$. For each wire $w_i$, the garbler picks two random $k$-bit values $w_i^0$ and $w_i^1$ to denote its new 0 and 1 values. For each gate whose input wire labels are $w_l$ and $w_r$, whose output wire label is $w_o$, and whose truth table is $T$, the garbler computes a garbled table of four ciphertexts, one for each pair $a, b \in \{0, 1\}$, each encrypting the output label $w_o^{T(a,b)}$ under the two input labels $w_l^a$ and $w_r^b$. After garbling, the garbler sends the garbled circuit and his garbled inputs to the evaluator. The evaluator must also receive his garbled input representation from the garbler.
This is done through Oblivious Transfer (OT). OT is a method that allows the garbler to present two choices $w_i^0$ and $w_i^1$ to the evaluator; the evaluator picks one of these two choices without learning the other, and the garbler does not learn which choice the evaluator picked.
Given all garbled inputs and the garbled circuit, the evaluator evaluates the garbled gates in topological order until the final output is reached. Each garbled gate is evaluated by decrypting each ciphertext in its garbled table using the two gate inputs $w_l^a$ and $w_r^b$ and checking whether the decrypted value is a valid gate output (this could be done by padding the valid gate output with zeros). The final output of the circuit can be sent back to the garbler to decode it.
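The garbling and evaluation of a single gate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction used in any cited work: SHA-256 stands in for the encryption under the two input labels, labels are 16 bytes, and the names `new_wire`, `garble_gate`, and `evaluate_gate` are hypothetical. A valid decryption is recognized by the zero padding mentioned in the text.

```python
import hashlib
import secrets

K = 16            # label length in bytes (assumed security parameter of 128 bits)
PAD = bytes(K)    # zero padding that marks a valid decryption


def _pad_stream(left: bytes, right: bytes, gate_id: int) -> bytes:
    """Derive a 2K-byte mask from the two input labels (hash used as a stand-in cipher)."""
    return hashlib.sha256(left + right + gate_id.to_bytes(4, "big")).digest()[: 2 * K]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def new_wire():
    """Return the pair (label for value 0, label for value 1) of one wire."""
    return (secrets.token_bytes(K), secrets.token_bytes(K))


def garble_gate(truth_table, w_l, w_r, w_o, gate_id=0):
    """Encrypt the output label for every input combination, then shuffle the rows."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            out_label = w_o[truth_table[(a, b)]]
            rows.append(_xor(_pad_stream(w_l[a], w_r[b], gate_id), out_label + PAD))
    secrets.SystemRandom().shuffle(rows)
    return rows


def evaluate_gate(rows, label_l, label_r, gate_id=0):
    """Try every row; the one whose plaintext ends in zero padding is the output label."""
    mask = _pad_stream(label_l, label_r, gate_id)
    for row in rows:
        plain = _xor(mask, row)
        if plain[K:] == PAD:
            return plain[:K]
    raise ValueError("no row decrypted to a valid label")
```

The evaluator holds exactly one label per input wire, so only one row yields the padding (except with negligible probability), and the evaluator learns an output label without learning which bit it encodes.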

GC Optimizations.
Many optimizations have been proposed for GC that greatly improve its practicality. These optimizations generally follow two different directions: (i) reducing the number of ciphertexts per gate in order to reduce network transfer time [5][6][7][8]; (ii) improving the computational efficiency of garbling and evaluating the circuit [9,10]. The first direction involves a specific process of generating the garblings which, as a consequence, creates a dependency between a gate's input wires and the corresponding output wire. These dependencies do not leak information and are proven secure. The second direction aims to make the garbling and evaluation computation more efficient without paying any cost in terms of the security of the overall garbled circuit. This work makes use of two of these optimizations: point and permute, and fixed-key block cipher.
Point and permute [10] assigns a bit to each wire garbling and permutes the truth table so that the evaluator can tell which ciphertext of the table to decrypt, thereby removing the need for further decryptions to make sure that the correct one is decrypted.
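A minimal sketch of point and permute, under the same kind of assumptions as a textbook presentation: SHA-256 stands in for the cipher, labels are 16 bytes, and all function names are hypothetical. Each wire gets a random permute bit; the label for value v carries the pointer bit v XOR permute bit, and the garbled table is indexed by the two pointer bits, so the evaluator decrypts exactly one row.

```python
import hashlib
import secrets

K = 16  # label length in bytes (assumption)


def new_wire():
    """Return ((label0, pointer0), (label1, pointer1)) with complementary pointer bits."""
    p = secrets.randbits(1)
    return ((secrets.token_bytes(K), p), (secrets.token_bytes(K), p ^ 1))


def _mask(left: bytes, right: bytes, gate_id: int) -> bytes:
    """Derive a (K+1)-byte mask from the input labels (hash used as a stand-in cipher)."""
    return hashlib.sha256(left + right + gate_id.to_bytes(4, "big")).digest()[: K + 1]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def garble_gate(truth_table, w_l, w_r, w_o, gate_id=0):
    """Place each ciphertext at the row selected by the input labels' pointer bits."""
    table = [b""] * 4
    for a in (0, 1):
        for b in (0, 1):
            (label_a, ptr_a), (label_b, ptr_b) = w_l[a], w_r[b]
            out_label, out_ptr = w_o[truth_table[(a, b)]]
            plain = out_label + bytes([out_ptr])
            table[2 * ptr_a + ptr_b] = _xor(_mask(label_a, label_b, gate_id), plain)
    return table


def evaluate_gate(table, in_l, in_r, gate_id=0):
    """Decrypt only the row the pointer bits select; no trial decryption is needed."""
    (label_a, ptr_a), (label_b, ptr_b) = in_l, in_r
    plain = _xor(_mask(label_a, label_b, gate_id), table[2 * ptr_a + ptr_b])
    return (plain[:K], plain[K])
```

Because the permute bit is random and independent of the wire's value, the pointer bit reveals nothing about the value itself.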
Fixed-key block cipher [9] notes that fixing the key used during garbling and evaluation turns the block cipher into a fixed permutation, which executes much faster than the full block cipher because the computations that depend only on the key can be precomputed beforehand.

Universal Circuits.
A universal circuit UC is a circuit that takes other circuits as input and evaluates them. If $C$ is some circuit that UC supports and $x$ is some input to $C$, then $UC(C, x) = C(x)$. The main issue is the size of the universal circuit, and recently researchers have proposed solutions to obtain smaller universal circuits. For the class of size-$n$ circuits, Valiant's universal circuit [11] is of size $19 n \log n$ with depth $O(n)$, and Kolesnikov and Schneider's universal circuit [1] is of size $1.5 n \log^2 n$, though it has smaller universal circuits for circuit sizes less than 5000. Kiss and Schneider [12] further reduced Valiant's bound by constructing a universal circuit where the number of AND gates is bounded by $5 n \log n$ and where the total number of gates is bounded by $20 n \log n$. Although Kiss and Schneider [12] showed that it is practical to implement PFE using Valiant's size-optimized universal circuits, they claimed that "universal circuits are not the most efficient solution to perform PFE." Despite the fact that implementations of Secure Function Evaluation (SFE) protocols with billions of gates have been reported in the literature, the best reported implementation of a universal-circuit-based PFE protocol [12] is for simulated circuits of 300,000 gates, which results in a universal circuit of at most 245,627,140 gates (at most 61,406,785 AND gates).

Terminology.
We define the "circuit owner" as the party who has a private function, while a "data owner" is a party who owns private data. The terms program owner and circuit owner are used interchangeably. The "enclave parent" is the party that creates an enclave in which all sensitive computations are done. We also use the terms enclave parent and enclave host interchangeably.

Related Work
In this section, we review related work on secure multiparty computations and give an overview for the development of garbled computing. We also discuss SGX applications and the advantages of using SGX in solving PFE.

GC with Universal Circuits.
GC provides secure multiparty computation (SMPC) and hides the input data but assumes a public function. If, additionally, the function that the program owner wishes to evaluate is passed as part of the input to SMPC, then the function will be hidden together with the input data. Thus, one way to achieve PFE is by running a UC using GC whose input is $C_f$ and $x$. Note that any kind of GC optimization is applicable here because it is not required to hide the functionality of the UC (which is actually publicly known). However, there will be an additional logarithmic cost incurred since a UC will be of size $\Omega(|C_f| \log |C_f|)$. Additionally, a UC has to be constructed specifically to run a certain set of possible circuits. If the UC construction cost is pushed to an offline phase, then the produced UC needs to be big enough to evaluate any possible input circuit, which results in a very large UC. Nevertheless, this approach does have some benefits as it supports multiple data owners. The current state-of-the-art UC construction for small circuits was proposed by Kiss and Schneider in [12] and achieves an upper bound of $1.5 n \log^2 n$. One may also use TinyGarble [13] techniques to construct smaller circuits for PFE. Most existing garbled circuit techniques convert a function/program to a combinational Boolean function with a directed acyclic graph (DAG) of binary gates. The authors in [13] analyzed an approach which first converts a function/program to a sequential circuit, which allows feedback from the output to the input by adding the notion of a state (memory). Then, one can convert each sequential cycle to Boolean combinational logic. The results in [13] show that this approach can reduce the size of the garbled circuit significantly. FairplayPF [1] is a well-documented framework for secure evaluation of private functions using universal circuits.
It is an extension of the classical Fairplay [14], which is a tool for secure two-party computation with a publicly known function. Most of the proposed PFE techniques have been of theoretical interest. They lacked implementations and tools for program (private function) development. This is attributed to the fact that PFE is still too slow to meet the time requirements of real applications. FairplayPF was unique in the sense that it provided a tool and an implementation of PFE. Therefore, we used FairplayPF as the baseline to evaluate the performance of our proposed techniques.

Modified GC: Nonuniversal Circuit-Based PFE.
In the modified GC, we try to hide both the individual gate functions and the topology of the circuit $C_f$. The gate functions can be hidden by using only universal gates such as NAND. The result is that all gates have the same function, which does not leak information. Hiding the topology is trickier. To do so, existing solutions make a distinction between outgoing wires and ingoing wires. Outgoing wires are the gate output wires and the circuit input wires. Assuming that the input size is $n$, the number of outgoing wires is $n + |C_f|$. Ingoing wires are gate input wires, which means that the number of ingoing wires is $2|C_f|$ (as we assume that all gates are NAND gates, which are binary). Ingoing wires and outgoing wires have different wire labels, and in order to connect an outgoing wire such as a gate output to an ingoing wire, some translation needs to take place. This translation needs to be oblivious to the data owner but can be known to the program owner.
There are currently two solutions that achieve this, one that uses additive homomorphic encryption [15] and another that uses switching networks [16].
Katz and Malka also introduced a more efficient variant PFE protocol with provable security in the random oracle model. The second protocol is roughly twice as efficient as the first one. The PFE protocols in [15,16] use a singly homomorphic public-key encryption scheme such as the additive homomorphic Paillier encryption scheme. Let $C_f$ be a circuit that computes $P_2$'s function $f$ and contains only NAND gates. Assume that $C_f$ has $n$ gates and takes $l$-bit inputs. At a high level, the PFE protocol proceeds as follows.
(1) Given the pair $(n, l)$, $P_1$ generates a sequence of $n + l$ pairs of labels, which are encrypted using a singly homomorphic encryption scheme and sent to $P_2$.
(2) $P_2$ obliviously groups these labels into gates to form a circuit $C_f$ using a linear transformation compatible with the singly homomorphic encryption scheme and sends the gates to $P_1$.
(3) After decrypting the gates, $P_1$ produces a garbled circuit corresponding to the circuit $C_f$ by garbling the $n$ gates received from $P_2$ independently (and $P_1$ does not learn the circuit in this way).
(4) $P_1$ gives an encoded version of the input $x$ to $P_2$, and $P_2$ evaluates the garbled circuit to obtain the circuit output.
The PFE protocols in [15,16] were described with $P_2$ learning the output $f(x)$. The limitation of this approach is that the server $P_2$ is allowed to compute any function of his choice. Hence, a malicious $P_2$ can just provide a function that outputs the data owner's inputs without violating the privacy definitions of the authors. For this reason, in our settings of the PFE protocols, we do not allow the function owner to get any output. Katz and Malka explain how to modify their protocol at no additional cost to achieve this. The modified version is the closest solution to our setting. The PFE protocols in Katz and Malka [15] have provable security in the semihonest security model under the assumption of semantic security for homomorphic encryption schemes and linear-related-key security for symmetric encryption schemes. It is also worth noting that Mohassel et al. later proposed a solution for PFE protocols which is secure against malicious adversaries [17]. The latter, however, relies heavily on zero-knowledge proofs and is therefore much more costly.

ORAM with GC.
Based on the physically shielded Central Processing Unit (CPU) technique [18], Goldreich and Ostrovsky [19] proposed a theoretical treatment of software protection by formulating the problem in the setting of learning a program structure by observing its execution. Using this new formulation, they reduced this problem to online simulation of any program on oblivious RAMs (random access machines). A machine is oblivious if its access to memory locations is independent of the input values and is processed with the same running time. Lu and Ostrovsky [20] showed how to design garbled ORAMs by constructing $t$ pairs of garbled circuits, where $t$ is the maximum runtime of the ORAM, $O_i^{\mathrm{ORAM}}$ simulates the $i$th-step memory read/write command, and $O_i^{\mathrm{CPU}}$ simulates the $i$th-step shielded CPU operation. Gentry et al. [21] showed that in order to prove the security of the garbled RAM scheme in [20], an additional circularity assumption is required. Gentry et al. [21] then proposed two new constructions to avoid this additional assumption.

SGX.
SGX has been used in several applications such as private membership testing [2], oblivious RAM [22], and secure indexing [23]. SGX has also been used for secure multiparty computation. For example, Bahmani et al. [24,25] designed practical secure multiparty computation (SMPC) using SGX. As mentioned above, SGX may leak information about the running program through side-channel attacks. References [26,27] use Intel SGX to achieve secure function evaluation, with the former trying to hide the function.

Benefit over Related Work.
In both cases, using SGX can keep the cost linear in $|C_f|$, which makes this solution more scalable to larger circuits in comparison with UC or the modified GC using switching networks. Compared to the nonuniversal-circuit-based PFE protocols in Katz and Malka [15] (see also [16]), our approach has two advantages. First, our approach allows more than one data owner to participate in the private function evaluation, while the PFE in [15] has only two participants: one data owner and one circuit owner. Second, our approach only uses symmetric key ciphers, while the PFE protocols in [15,16] require an additive homomorphic encryption scheme to obliviously connect each of the gates. That is, at least two extra additive homomorphic encryption operations per wire are required for each participant. These additive homomorphic encryption operations are the major cost of the PFE schemes in [15,16]. Thus, our scheme is significantly more efficient.

PFE Leveraging SGX
A trivial design of PFE with SGX may run a private program within an SGX enclave directly. One might think, for example, of running the program in an enclave and interpreting it there. The attestation can show that the enclave is running a trusted interpreter (e.g., a Java Virtual Machine), and the keys to decrypt the data to perform the computation can be securely shared with the enclave and not with the developer of the private function. However, this approach is vulnerable to several attacks. For example, one may use the program's runtime or the memory access pattern to infer some information about private inputs (see, e.g., [25]). A number of research studies have actually shown that cache usage can reveal details about the execution of a program [28][29][30][31][32].
To defeat these side-channel data leakage attacks that SGX is susceptible to, the program is represented as a circuit. This is because circuits perform the same memory accesses regardless of the input data and hence are memory access oblivious. Accordingly, the words "circuit" and "program" are used interchangeably.
While SGX is available on most modern Intel CPUs, it may still be the case that a machine does not have SGX support; hence we propose two solutions that cover different assumptions: (i) The program owner has an SGX-supported CPU. This is the simpler of the two cases. (ii) The program owner's machine does not support SGX. In this case, SGX support is available either on the data owner's machine or on the machine of an untrusted third party other than the data or program owners. The technique to address this second scenario is the same regardless of whether SGX is available on the machine of a third party or on the machine of the data owner (in this case, the data owner takes the role of the third party).

Technique 1: Circuit Owner Has SGX.
In the first case, the circuit owner creates an enclave with a specific list of tasks. This enclave must first attest to the data owners that it is a valid enclave and that it is performing the right protocol. For example, the enclave should attest to the data owners that it will only issue an Ocall at the end of the function evaluation and that the Ocall will only return the output value encrypted under a key established with the data owner. Furthermore, the enclave should also attest to the data owners that it will only accept circuit descriptions (instead of general program descriptions with loops, etc.) from the program owner. During the attestation phase, a secret key between each data owner and the enclave is established using the Diffie-Hellman key agreement protocol. This key will be used to encrypt all future communication between the data owner and the enclave. Specifically, the data input will be encrypted using this key.
This concludes the initialization phase. The online phase proceeds as follows:
(i) The data owners encrypt their inputs using the established secret keys and send them to the enclave via the enclave's parent.
(ii) The enclave also receives the circuit from the program owner, who is also its host (parent).
(iii) Since both data and circuit are in the enclave, it performs the evaluation and obtains an output. The output is encrypted and then sent to the data owners so that only they can learn the output and not the program owner.
Figure 1 demonstrates case 1. The data owners and the enclave are trusted parties, while the program owner, who is the enclave's parent, is an untrusted party.
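Step (iii) of the online phase, the in-enclave evaluation, can be sketched as below. This is an illustration under stated assumptions rather than the authors' enclave code: the circuit is assumed to consist only of NAND gates listed in topological order, the function name and data layout are hypothetical, and Python itself gives no constant-time guarantees (a real enclave would be written against the SGX SDK). The point of the sketch is that the sequence of wire indices touched is fixed by the circuit alone and never depends on the input bits.

```python
from typing import List, Tuple

# A gate is (left input wire, right input wire, output wire); every gate is NAND.
Gate = Tuple[int, int, int]


def evaluate_nand_circuit(num_wires: int,
                          gates: List[Gate],
                          inputs: List[int],
                          outputs: List[int]) -> List[int]:
    """Evaluate a NAND-only circuit with an input-independent access pattern.

    The first len(inputs) wires carry the circuit inputs. Gates are visited in
    the given (topological) order, and each gate reads and writes fixed wire
    indices, so the memory trace is the same for every input.
    """
    wires = [0] * num_wires
    wires[: len(inputs)] = inputs
    for left, right, out in gates:
        # Branch-free NAND: no control flow depends on the wire values.
        wires[out] = 1 ^ (wires[left] & wires[right])
    return [wires[i] for i in outputs]


# Example circuit: XOR of wires 0 and 1 built from four NAND gates.
XOR_GATES = [(0, 1, 2), (0, 2, 3), (1, 2, 4), (3, 4, 5)]
```

For instance, `evaluate_nand_circuit(6, XOR_GATES, [1, 0], [5])` evaluates the XOR circuit on inputs 1 and 0; in the protocol, the resulting output bits would then be encrypted under each data owner's key before leaving the enclave.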
Note that this case is simpler than the next one, as the function does not need to be private with regard to the enclave. The function is indeed executed at the function owner's side, but in an enclave, and the function owner naturally knows the function.

Technique 2: The Circuit Owner Does Not Have SGX.
In the second case, the circuit owner does not have SGX, but a data owner or a third party has SGX-enabled hardware. In the following, we treat the case in which a third party has SGX as equivalent to the case where a data owner has SGX. In this case, the participants may jointly use the SGX enclave to evaluate the circuit on the data inputs. In order to hide the memory access patterns from the SGX owner, one can implement an ORAM within the enclave. However, this approach is not as efficient as the following fixed-key-based garbled circuit technique, since it is more efficient to garble a circuit obliviously than to implement an ORAM within an enclave. At a high level, the data owners and the program owner submit their data/program to the enclave. The enclave garbles the data/program and delivers the garbled circuit/data to the program owner, who will evaluate it. Finally, the program owner sends the encrypted garbled circuit output to the enclave, which decodes the garbled output, encrypts the output, and sends it to the data owners to decrypt. Note that this case works if any of the data owners has SGX too, so it is not necessary for the enclave's parent to be a third party. The initialization phase is almost identical to the one in the first case, but with the program owner also sharing a secret key with the enclave during the attestation phase. The online phase, however, differs, as the job of this enclave is to garble $C_f$ and not to evaluate it, because evaluating can leak some information about $C_f$'s topology through memory access patterns. When garbling, however, care must be taken not to leak the functionality of the gates or the topology of the circuit through side channels. To hide the functionality, the garbling scheme must garble each gate in the same way regardless of the gate type; therefore, gate-type-dependent optimizations such as FreeXOR cannot be used.
Instead, Garbled Row Reduction techniques can be used together with fixed-key block cipher and point and permute. To hide the circuit topology, gates should be garbled independently from one another. This can be done by generating wire garblings on the fly while garbling each gate, even if they were generated before. Specifically, assume that the circuit $C$ has $n$ gates $G = \{g_1, \ldots, g_n\}$ and $l$ inputs $w_1, \ldots, w_l$. Let $W = \{w_1, \ldots, w_{n+l}\}$ be the collection of wires. Each gate is a tuple $g_i = (w_{i1}, w_{i2}, w_{i3}, T_{g_i})$, where $w_{i1}$ and $w_{i2}$ are the input wires, $w_{i3}$ is the output wire, and $T_{g_i}$ is the gate type (that is, AND, OR, or NAND). By using the fixed-key garbling technique, the labels $K^0_{w_{i1}}, K^1_{w_{i1}}, K^0_{w_{i2}}, K^1_{w_{i2}}, K^0_{w_{i3}}, K^1_{w_{i3}}$ for the wires $w_{i1}, w_{i2}, w_{i3}$ can be computed on the fly as $K^b_{w_{ij}} = F(R, w^b_{ij})$ for $b = 0, 1$ and $j = 1, 2, 3$, where $F$ is a fixed function and $R$ is a fixed entropy string. Thus, given a description of the circuit $C = (W, G)$, one can obtain the garbled circuit by garbling each of the gates $g_1, \ldots, g_n$ in sequence independently (obliviously). Note that for each gate with two incoming wires and one outgoing wire, the garbling scheme needs to compute six labels. For a gate with one incoming wire (e.g., a negation gate) and one outgoing wire, the garbling scheme only needs to compute four labels. Thus, an attacker could use the timing channel to guess whether a gate being garbled has two incoming wires or a single incoming wire. To avoid this attack, the above discussion assumes that each gate in circuit $C$ has two incoming wires. If the circuit contains gates with a single incoming wire, the garbling scheme should introduce time delays when garbling such gates.
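The on-the-fly label derivation $K^b_{w} = F(R, w^b)$ can be sketched as follows. The source only states that $F$ is a fixed function; instantiating it with HMAC-SHA256 keyed by $R$ is an assumption made here for illustration, as are the 16-byte label length and the function names. The sketch shows why gates can be garbled independently: a wire shared by two gates receives the same labels each time they are recomputed, so no table of previously generated labels needs to be looked up.

```python
import hashlib
import hmac
from typing import List, Tuple

K = 16  # label length in bytes (assumption)


def wire_label(R: bytes, wire_id: int, bit: int) -> bytes:
    """Derive the label of wire `wire_id` for value `bit` from the entropy string R."""
    message = wire_id.to_bytes(4, "big") + bytes([bit])
    return hmac.new(R, message, hashlib.sha256).digest()[:K]


def gate_labels(R: bytes, gate: Tuple[int, int, int]) -> List[Tuple[bytes, bytes]]:
    """Recompute all six labels of a gate (w_i1, w_i2, w_i3) from scratch.

    Always deriving six labels, even when a wire was already handled at an
    earlier gate, keeps the work per gate identical and avoids any lookup
    whose address would reveal how gates are connected.
    """
    return [(wire_label(R, w, 0), wire_label(R, w, 1)) for w in gate]
```

With two gates `(0, 1, 2)` and `(2, 1, 3)`, the labels computed for wire 2 as the output of the first gate equal those computed for it as the left input of the second, without either call referring to the other.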
Accordingly, technique 2 works as follows:
(i) The data owner $S_i$ chooses a random string $R_i$, encrypts its data input and $R_i$ using its secret key with the enclave (established during the attestation phase), and sends them to the enclave's parent (third party).
(ii) The circuit owner $S_0$ chooses a random string $R_0$, encrypts the circuit and $R_0$ using its secret key with the enclave (established during the attestation phase), and sends them to the enclave's parent.
(iii) The enclave's parent (untrusted third party) passes the encrypted data, the encrypted circuit, and the encrypted random strings to the enclave.
(iv) The enclave computes the entropy string $R = H(R_0, R_1, \ldots, R_m)$ for the garbling process, where $H$ is a secure hash function.
(v) The enclave garbles the circuit and the input data as explained above.
(vi) The enclave encrypts the garbled circuit and the garbled input data using the secret key shared with the circuit owner (established during the attestation phase) and sends the encrypted values to the circuit owner.
(vii) The circuit owner decrypts the garbled circuit and the garbled input data and evaluates the garbled circuit on the garbled input data.
(viii) The circuit owner encrypts the garbled output using the secret key shared with the enclave and sends it back to the enclave via the enclave's parent.
(ix) The enclave decrypts the garbled output, creates copies of the output for the data owners, encrypts each copy with its appropriate key, and sends these encrypted copies to the data owners.
Figure 2 explains the technique.
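Step (iv), combining the parties' random strings into the entropy string $R = H(R_0, R_1, \ldots, R_m)$, can be sketched as below. The choice of SHA-256 for $H$ and the length-prefixed encoding of each contribution are assumptions for illustration; the source only requires a secure hash function. The function name is hypothetical.

```python
import hashlib
from typing import Sequence


def combine_entropy(contributions: Sequence[bytes]) -> bytes:
    """Hash R_0, ..., R_m into a single entropy string R.

    Each contribution is prefixed with its length so that different splits of
    the same bytes (e.g. b"ab", b"c" versus b"a", b"bc") hash differently.
    """
    h = hashlib.sha256()
    for r in contributions:
        h.update(len(r).to_bytes(4, "big"))
        h.update(r)
    return h.digest()
```

Because every party contributes a string, the resulting $R$ is unpredictable to any party as long as at least one contribution is random and kept secret from the others.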

Encryption.
The communication between the data owners and the enclave is encrypted with AES-CBC using the keys that are generated by the attestation process. Initialization vectors (IVs) are incremented with every encryption in order to eliminate the possibility of producing the same ciphertext when encrypting the same value more than once. The same encryption algorithm is used to secure the communication between the circuit owner and the enclave. Clearly, the enclave should use the appropriate key to decrypt the received data (or circuit) while maintaining a corresponding increment for each IV.
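The IV bookkeeping described above can be sketched as follows. Only the counter handling is shown; the AES-CBC call itself is omitted because it would require a cryptographic library. The class name, the 16-byte IV length (the AES block size), the big-endian increment, and the wraparound behavior are assumptions for illustration.

```python
class IvCounter:
    """Tracks the IV for one direction of one channel.

    The sender and the enclave each keep their own copy, initialized to the
    same value, and advance it once per message so that both sides agree on
    the IV without transmitting it.
    """

    def __init__(self, initial_iv: bytes):
        if len(initial_iv) != 16:
            raise ValueError("AES-CBC uses a 16-byte IV")
        self._iv = initial_iv

    def next(self) -> bytes:
        """Return the IV for the current message, then increment the stored IV."""
        current = self._iv
        value = (int.from_bytes(current, "big") + 1) % (1 << 128)
        self._iv = value.to_bytes(16, "big")
        return current
```

The enclave would hold one such counter per shared key (one per data owner, plus one for the circuit owner in the second technique), matching the per-party key list described in the experimental setup.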

Security
Our adversary on the enclave's host machine is highly privileged. Accordingly, he can observe all memory accesses. However, we assume a semihonest adversary who has no intention of modifying messages going to and from the enclave. We assume that the number of parties, the number and size of inputs, and the size of the circuit are known to all parties. We also assume that the inputs, the circuit, and the garbled circuit all fit inside the enclave. Enabling paging would allow an enclave to exceed the limit of 128 MB with an expected slowdown. Although converting a function into a circuit is not a trivial problem, we consider it out of scope since this procedure can be done entirely by the program owner before executing the technique. The security of the two proposed protocols depends on the security of SGX. If the hardware has no faults and performs as intended, then SGX should guarantee that the memory of the enclave and the registers used by it, as well as the instructions performed by the CPU, remain hidden from the host and its operating system in particular. SGX, however, does not protect against side-channel attacks. Hence, if something can be observed by measuring the time that certain instructions take, through the accesses made to memory, or through other indirect means, then the two proposed protocols should protect against that information leakage.
In the first protocol, which uses the program owner's SGX, the program stays hidden from the data owners. This is because the program never leaves the program owner's machine, and the enclave, as created by the program owner himself, will not leak information about the program to others. On the other hand, the data is encrypted using a secret key established between the enclave and each data owner during the attestation process. Thus the encrypted data can only be decrypted by the enclave and not by the program owner. By running a directed acyclic circuit, the enclave executes memory access instructions. The time these instructions take and any other observable side-channel behavior will look the same regardless of what the input data was. Therefore the data stays hidden from the program owner. After evaluating the circuit, the enclave returns the result only to the data owners to prevent the program owner from writing an identity function and learning all the data. Lastly, since the enclave is attested by the data owners, they can be sure that the enclave is executing the correct protocol.
In the second protocol, each data owner receives the evaluation output of the garbled circuit on the garbled input data, and the program owner receives the garbled circuit and its garbled input. Since a garbled circuit hides the input throughout the computation, the program owner cannot learn anything about the input data. As for the enclave host, just as in the first protocol, all communications between the enclave and the other servers are hidden from the host through encryption. Additionally, the data of the other data owners also remains hidden because it is evaluated in a directed acyclic circuit that produces the same memory accesses and timings regardless of the input data. Since a gate is garbled in the same way regardless of its type, side channels cannot leak the gate type. In addition, generating each garbling from a permutation (fixed-key block cipher) of the wire labels every time it is needed hides the topology of the circuit and avoids leaking information through memory accesses. Using a garbling technique that is independent of the gate type, together with generating wire garblings independently at each gate, ensures that the circuit stays hidden during garbling. Therefore, the program does not leak to any of the data owners and, as mentioned before, the individual data does not leak to other servers.

Experimental Evaluation
In this section, we first explain the setup for our experiments and then discuss their results. We compare the performance of our techniques with FairplayPF, and we analyze the effect of the number of parties (i.e., data owners) on the performance. We also identify the bottleneck step of the second technique, which is responsible for the largest time overhead.

Experimental Setup.
We present two C++ implementations for multiparty private function evaluation using SGX. The source code of our implementations can be downloaded from https://github.com/maanrachid/PFE-SGX.
(i) The first implementation has one circuit owner and an arbitrary number of data owners np, which is known to all parties. The circuit owner is the parent of the enclave, so it creates the enclave and then passes the circuit to it using one ECALL. The data owners send their encrypted data in an arbitrary order to the enclave via its parent. Using a list of np different keys, the enclave decrypts the data. The enclave evaluates the circuit and sends the encrypted output to the data owners via its parent using one OCALL for each data owner.
(ii) The second implementation has np data owners, one circuit owner, and a server, which is the enclave's parent. The data owners and the circuit owner send their data and circuit, respectively, in an arbitrary order to the server. The server passes them to the enclave, which decrypts the data and the circuit using a list of np + 1 keys and garbles the circuit and the data.
Wire garblings are generated every time they are needed (rather than saving the garbling of a gate's output wire for reuse as another gate's input). Two keys are used for garbling: one to generate the 0 garbling and another to generate the 1 garbling. Only point-and-permute is used in this implementation. The enclave sends the garbled circuit and the garbled input back to the circuit owner with all needed keys. The circuit owner evaluates the garbled circuit on the garbled input and sends the garbled output back to the enclave. The enclave finally decodes the garbled output and sends the output to the data owners.
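The garbling flow of the second implementation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: HMAC-SHA256 and SHA-256 stand in for the fixed-key block cipher, the gate format, the key values, and all function names are hypothetical, and garbler and evaluator run in one process for brevity. Wire labels are recomputed from two keys whenever they are needed, each gate is garbled by the same steps whatever its type, and point-and-permute select bits tell the evaluator which row to decrypt.

```python
import hashlib
import hmac

# Illustrative sketch only. HMAC-SHA256 and SHA-256 stand in for the
# fixed-key block cipher; the gate format and all names are assumptions.
# A gate is (in_a, in_b, out, table); bit (2*a + b) of `table` is the output.
AND, XOR, OR = 0b1000, 0b0110, 0b1110


def prf(key, tag, wire, n):
    msg = tag + wire.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:n]


def pad(label_a, label_b, gate_id):
    data = label_a + label_b + gate_id.to_bytes(4, "big")
    return hashlib.sha256(data).digest()[:17]


def xor(p, q):
    return bytes(x ^ y for x, y in zip(p, q))


class Garbler:
    def __init__(self, key0, key1):
        self.keys = (key0, key1)  # key0 derives 0-labels, key1 derives 1-labels

    def permute_bit(self, wire):
        return prf(self.keys[0], b"perm", wire, 1)[0] & 1

    def label(self, wire, bit):
        """Recompute (label, select bit) on demand; nothing is cached."""
        lab = prf(self.keys[bit], b"wire", wire, 16)
        return lab, self.permute_bit(wire) ^ bit

    def garble_gate(self, gate_id, gate):
        """Same steps for every gate type: four rows ordered by select bits."""
        in_a, in_b, out, table = gate
        rows = [None] * 4
        for a in (0, 1):
            for b in (0, 1):
                la, sa = self.label(in_a, a)
                lb, sb = self.label(in_b, b)
                lo, so = self.label(out, (table >> (2 * a + b)) & 1)
                rows[2 * sa + sb] = xor(pad(la, lb, gate_id), lo + bytes([so]))
        return rows

    def decode(self, wire, garbled_value):
        _, select = garbled_value
        return select ^ self.permute_bit(wire)


def evaluate_garbled(num_wires, gates, garbled, input_labels):
    """Evaluator side: one decryption per gate, chosen by the select bits."""
    wires = [None] * num_wires
    wires[:len(input_labels)] = input_labels
    for gate_id, ((in_a, in_b, out, _), rows) in enumerate(zip(gates, garbled)):
        (la, sa), (lb, sb) = wires[in_a], wires[in_b]
        plain = xor(pad(la, lb, gate_id), rows[2 * sa + sb])
        wires[out] = (plain[:16], plain[16])
    return wires


# One-bit full adder: wires 0, 1, 2 hold x, y, carry-in.
FULL_ADDER = [(0, 1, 3, XOR), (3, 2, 4, XOR), (0, 1, 5, AND),
              (3, 2, 6, AND), (5, 6, 7, OR)]


def run(x, y, carry_in):
    g = Garbler(b"demo-key-0", b"demo-key-1")  # placeholder keys
    garbled = [g.garble_gate(i, gate) for i, gate in enumerate(FULL_ADDER)]
    inputs = [g.label(w, bit) for w, bit in enumerate((x, y, carry_in))]
    wires = evaluate_garbled(8, FULL_ADDER, garbled, inputs)
    return g.decode(4, wires[4]), g.decode(7, wires[7])  # (sum, carry-out)


print(run(1, 1, 1))
```

In the protocol described above, `Garbler` corresponds to the enclave, `evaluate_garbled` to the circuit owner, and `decode` to the final decoding step performed by the enclave.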
Our implementations were tested with version 1.9 of the SGX SDK. The communication between the data owners and the enclave is encrypted using AES-CBC as offered by Intel's IPP cryptography library. The same encryption is used for the communication between the enclave and the circuit owner in the second implementation.
Our experiments were run on a machine with 8 GB RAM and an Intel(R) Core(TM) i7-6770HQ 2.60 GHz CPU, with a maximum enclave size of 128 MB. The sizes of the input, the output, and the circuit are known to all parties. The data owners, the circuit owner, and the server (the enclave's parent) were run on the same machine in all experiments; nevertheless, each program takes the machine name and a port as parameters, so they can also be run on different machines. The circuits used in our experiments were downloaded from https://homes.esat.kuleuven.be/~nsmart/MPC/. The descriptions of these circuits are shown in Table 1. The circuits are well known and vary widely in size. We slightly modified the format of each circuit to allow an arbitrary number of data owners with arbitrary input shares.

Results.
We compare our implementations with FairplayPF in terms of time consumption. Since FairplayPF uses a different input format from ours (Secure Function Definition Language (SFDL)), we run our tests only on simple circuits. We study the effect of the number of clients on the performance, and we also analyze the cause of the relative slowdown in our second implementation. Table 2 shows the elapsed times for one complete round of each technique. These numbers are obtained by running each round 1000 times and recording the average. The time command is run on one of the data owners to obtain the time for each round.
We compared the performance of technique 1 and technique 2, and its relationship to circuit size, by sorting the circuits by size (wires and gates). With the adder, technique 1 has a speedup of 415 times over technique 2, while with the multiplication, technique 1 has a speedup of 23 times. With SHA-1 (the largest circuit), the speedup is 13 times, which suggests that the advantage of technique 1 over technique 2 shrinks as the circuit grows. Figure 3 shows the time consumption for technique 1, technique 2, and FairplayPF. As mentioned before, we restrict our comparisons to these circuits because they are the only circuits that we have in both formats: SFDL (for FairplayPF) and Bristol (for our work). For MD5, for example, an SFDL version would have to be implemented, which is beyond the scope of this work.
We investigated the effect of the number of data owners by running the same experiments with 4 data owners instead of 2. We obtained the same time consumption for all rounds, suggesting that the number of data owners has no effect on the performance of either technique.
We studied the main reason behind the slowdown in technique 2. Table 3 shows that garbling is responsible for 60% to 80% of the total time consumption for most of the circuits. The exception is the adder: it is a relatively small circuit, so communication accounts for a relatively large share of the total time.

Conclusion
This work provides a novel method of solving PFE with Intel Software Guard Extensions. This method allows for a much more efficient, and therefore more practical, approach to performing PFE in comparison to previous solutions. The practicality of this solution will open the door to some interesting applications of PFE. Our future work will involve looking at some of these applications and developing a simpler, full-fledged system that can be used by regular developers to create PFE applications.

Data Availability
The source code of our implementations can be downloaded from https://github.com/maanrachid/PFE-SGX.

Disclosure
The work was mainly performed while the author was working at Qatar University. Contents of the research are solely the responsibility of the authors and do not necessarily represent the official views of the Qatar National Research Fund.