Practical Frequency-Hiding Order-Preserving Encryption with Improved Update

Order-preserving encryption (OPE), which preserves the numerical ordering of plaintexts, is one of the promising solutions for cloud security. In 2013, an ideally secure OPE, which reveals no information beyond the order of the underlying plaintexts, was proposed, along with the notion of mutable encryption, in which ciphertexts can be changed. Unfortunately, even an ideally secure OPE is vulnerable to attacks that infer the underlying frequency of repeated plaintexts. To solve this problem, in 2015, Kerschbaum designed a frequency-hiding OPE (FH-OPE) scheme based on the notion of a randomized order under a strengthened security model. Later, Maffei et al. showed that Kerschbaum's model is imprecise, which means that no such OPE scheme can exist. Moreover, they provided a new FH-OPE scheme under a corrected security model. However, their scheme requires the order information of all the encrypted plaintexts as an input; therefore, it incurs relatively high overhead during encryption. In this work, we propose a more efficient FH-OPE based on Maffei et al.'s security model and also present an improved update algorithm suitable for duplicate plaintexts.


Introduction
Cloud storage has become common practice in recent years, but it still raises privacy concerns with respect to the service provider hosting the data. In these data-outsourcing scenarios, encryption is one of the most reliable solutions. However, conventional encryption schemes have limitations; for instance, it is impossible to perform operations such as range queries on encrypted data. To perform such operations, the client has to download all the encrypted data and decrypt them. To overcome these limitations, a few solutions have been proposed that slightly weaken the security of conventional encryption schemes. Order-preserving encryption is one of the promising solutions and allows a client to perform efficient range queries on the encrypted data because the ciphertexts maintain the ordering of the plaintexts.

Related Works.
The first concept of order-preserving encryption was introduced by Agrawal et al. [1]. In 2009, Boldyreva et al. [2] presented the first formal security notion of OPE, which is called indistinguishability against ordered chosen plaintext attacks (IND-OCPA). Moreover, they showed that no stateless OPE can guarantee the IND-OCPA security unless the ciphertext space is exponentially large in the plaintext space. They also presented a weaker security notion, which is known as pseudorandom order-preserving function advantage under chosen ciphertext attacks (POPF-CCA). However, this security model does not precisely quantify the information leaked about the plaintexts. Later, Boldyreva et al. [3] and Xiao and Yen [4] showed that the ciphertexts of the scheme in [2] leak approximately the first half of the bits of the underlying plaintexts. Yum et al. [5] improved Boldyreva et al.'s construction by extending it to nonuniformly distributed plaintexts, but their scheme remained at the same security level as random order-preserving functions. Subsequently, a few OPE schemes [6][7][8][9][10][11][12][13][14][15] were proposed that provide only an ad hoc security analysis rather than a formal security proof.
Recently, some ideally secure (IND-OCPA secure) OPE schemes [16][17][18][19][20], which are stateful or interactive, have been proposed. Popa et al. [18] developed an interactive model for clients and servers as a two-party protocol. The client encrypts plaintexts using a deterministic OPE algorithm and sends them to the server, which maintains a search tree where the ciphertexts are stored. When the client wants to perform range queries on the encrypted data, the server exploits the search tree. Moreover, they presented the notion of mutable encryption, which means that ciphertexts can be updated to achieve the IND-OCPA security. Their interactive scheme requires a large amount of communication. In 2014, Kerschbaum and Schröpfer [19] presented a revised ideally secure OPE scheme in which the client stores the search tree. This approach allows their scheme to incur a lower communication cost than that of [18].
To solve the problem that deterministic OPE schemes [1,3,18,19,21,22] are vulnerable to frequency analysis, sorting, and cumulative attacks [23], Kerschbaum [16] presented a new frequency-hiding OPE that applies randomization to duplicate plaintexts. In addition, he introduced a stronger security notion than IND-OCPA, which is known as indistinguishability against frequency-analyzing ordered chosen plaintext attacks (IND-FA-OCPA). In 2017, Maffei et al. [17] showed that Kerschbaum's security model is imprecise. Therefore, they designed a new construction based on a corrected security model. However, their scheme incurs relatively high overhead during encryption because it requires the order information of all encrypted plaintexts. Moreover, we observe that the update algorithm used in [16,17] cannot guarantee a perfectly balanced search tree when duplicate plaintexts are encrypted.
Yang et al. [24] presented semiorder-preserving encryption (SOPE), which sacrifices some precision of order preservation. In this scheme, two different plaintexts may be encrypted to the same ciphertext; thus, the ciphertext sequence cannot be mapped back to a plaintext sequence. Dyer et al. [25] presented an OPE scheme based on the general approximate common divisor problem (GACDP). This is the first OPE scheme whose security rests on a computational hardness assumption rather than on a security game. Like Liu and Wang [9], their scheme adds random noise to the initial plaintext so that the ciphertexts of duplicate plaintexts appear distinct. Kim [26] showed a new OPE scheme based on order-revealing encryption (ORE) and improved the round and client-side storage complexities over the existing ideally secure OPE schemes [16,20]. Tueno and Kerschbaum [27] introduced oblivious OPE (OOPE) as an equivalent of public-key OPE; they also showed a protocol for OOPE that combines existing ideally secure OPE [16,19] with Paillier's homomorphic encryption and garbled circuits. In [28], Taigel et al. presented a real-life use case that combines OPE and decision tree classification to enable privacy-preserving forecasting of demand for spare parts based on distributed condition data. In [29], Meng and Feigenbaum described an application that combines OPE, pseudorandom functions (PRFs), and additively homomorphic encryption (AHE) to design a privacy-preserving XGBoost inference algorithm, that is, to create an encrypted regression tree. Table 1 shows the comparison of the existing FH-OPE schemes. As mentioned before, the original definition of IND-FA-OCPA in [16] is imprecise, and the FH-OPE scheme of [16] is insecure under the security model that it proposed. In fact, no FH-OPE scheme can be proven secure under that model.
The scheme of [17] guarantees the IND-FA-OCPA security that has been revised to be feasible, but the client has to maintain the order information of all the plaintexts encrypted to date; this maintenance causes inefficiency in the client's persistent storage and in the encryption performance.

Our Contributions.
To summarize, our contributions are as follows:
(i) We propose a more practical FH-OPE scheme compared with the previous schemes. Our scheme does not require the order information of all the encrypted plaintexts; thus, the client does not need to maintain it. The security of the proposed scheme can be proven under the IND-FA-OCPA security model of [17].
(ii) We observe that the update algorithm in [16,17] is not suitable for random duplicate distributions. Moreover, it cannot guarantee a perfectly balanced search tree when duplicate plaintexts are encrypted. To overcome this problem, we propose an improved update algorithm. The proposed algorithm always produces a perfectly balanced search tree regardless of the distribution of plaintexts and positively affects the overall performance of FH-OPE.
(iii) We implement the schemes of [16,17] and the proposed scheme and evaluate them on different plaintext distributions. The implementation results show that our scheme outperforms the others.

Outline.
In Section 2, we recall the formal notion of (stateful) OPE and its security definitions. In Section 3, we analyze the scheme proposed in [17] and show that it still needs to be improved in terms of storage and computational complexity. Section 4 proposes a new practical FH-OPE scheme and an improved update algorithm and shows that the proposed scheme achieves the IND-FA-OCPA security. We present the experimental results in Section 5. Finally, we conclude our work in Section 6.

Preliminaries
This section briefly recalls the formal notion of OPE and its security definitions.

Order-Preserving Encryption.
The OPE scheme is defined in two ways: stateless and stateful. It is difficult for a stateless scheme to achieve the IND-OCPA security. Because the stateful OPE is a keyless scheme, its state operates as the secret key.
Definition 2 (order-preserving). An OPE scheme is order-preserving if, for any two ciphertexts $y_1$ and $y_2$ with corresponding messages $x_1$ and $x_2$, we have $y_1 \ge y_2 \Rightarrow x_1 \ge x_2$.
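The order-preserving condition of Definition 2 can be checked mechanically on a finite set of (plaintext, ciphertext) pairs. The following is a minimal sketch; the function name `is_order_preserving` and the sample values are illustrative assumptions, not part of the scheme.

```python
from itertools import permutations


def is_order_preserving(pairs):
    """Check Definition 2 on a list of (plaintext, ciphertext) pairs.

    For every ordered pair of entries, y1 >= y2 must imply x1 >= x2.
    """
    return all(
        x1 >= x2
        for (x1, y1), (x2, y2) in permutations(pairs, 2)
        if y1 >= y2
    )


# Duplicate plaintexts may map to distinct ciphertexts without violating the definition.
print(is_order_preserving([(1, 64), (1, 96), (3, 112)]))
# A larger ciphertext for a smaller plaintext violates it.
print(is_order_preserving([(1, 96), (3, 64)]))
```

Note that the condition constrains ciphertexts only through the order of their plaintexts, which is why a frequency-hiding scheme is free to assign different ciphertexts to equal plaintexts.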

Security Definitions.
The standard security notion of OPE is IND-OCPA [2]. It means that an adversary cannot learn anything about the plaintexts except for their order. Let $n$ be the number of necessarily distinct plaintexts in a sequence $X = x_1, x_2, \ldots, x_n$. The security game $\mathrm{Game}_{\text{IND-OCPA}}(\lambda)$ between the adversary A and the challenger C for the security parameter $\lambda$ proceeds as follows:
(1) The adversary A prepares two sequences $X_0$ and $X_1$ of necessarily distinct plaintexts with the same order. He sends them to the challenger C.
(2) The challenger C randomly chooses $b \leftarrow \{0, 1\}$ and encrypts the sequence $X_b$. He sends $y_{1 \le i \le n, b}$ to the adversary A.
(3) The adversary A guesses which sequence is encrypted and accordingly outputs guess $b'$.
We say that the adversary A wins the game if $b' = b$.
Now, we review IND-FA-OCPA, which was originally presented in [16] and modified in [17]. To capture "frequency-hiding" security, it allows duplicate plaintexts. However, the two challenge sequences $X_0$ and $X_1$ must have at least one common randomized order $\Gamma$. A randomized order $\Gamma$ of $X$ is any permutation of $\{1, 2, \ldots, |X|\}$ that is arranged according to the order of $X$, where the order of duplicate plaintexts is determined randomly. For example, with $X = \{1, 1, 3, 3\}$, the randomized order $\Gamma$ for $X$ can be any of $\{1, 2, 3, 4\}$, $\{1, 2, 4, 3\}$, $\{2, 1, 3, 4\}$, and $\{2, 1, 4, 3\}$. The randomized order $\Gamma$ is precisely defined as follows.

Definition 4 (randomized order). Let $n$ be the number of plaintexts in a sequence $X = x_1, x_2, \ldots, x_n$. A randomized order $\Gamma = \gamma_1, \gamma_2, \ldots, \gamma_n$ of $X$ is a permutation of $\{1, 2, \ldots, n\}$ such that, for all $i, j$, $x_i > x_j \Rightarrow \gamma_i > \gamma_j$ and $\gamma_i > \gamma_j \Rightarrow x_i \ge x_j$.
The security game $\mathrm{Game}_{\text{IND-FA-OCPA}}(\lambda)$ between an adversary A and a challenger C for a security parameter $\lambda$ proceeds as follows:
(1) The adversary A prepares two sequences $X_0$ and $X_1$ that have at least one common randomized order. He sends them to the challenger C.
(2) The challenger C randomly chooses $b \leftarrow \{0, 1\}$ and selects $\Gamma$ from the common randomized orders of $X_0$ and $X_1$. Then, the challenger C executes Setup($1^\lambda$) and encrypts the plaintexts of $X_b$ one by one, where the relative order of duplicate plaintexts is determined by $\Gamma \downarrow i$. He sends $y_{1 \le i \le n, b}$ to the adversary A.
Table 1: Comparison of the existing FH-OPE schemes.
As the randomized order of distinct plaintexts is equal to its order, the IND-FA-OCPA security is stronger than IND-OCPA.
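To make the notion of a randomized order concrete, the sketch below enumerates all randomized orders of a short sequence by brute force and intersects them for two challenge sequences. It assumes the characterization that a permutation $\gamma$ of $\{1, \ldots, n\}$ is a randomized order of $X$ when $x_i > x_j$ implies $\gamma_i > \gamma_j$; the function names are illustrative, and the factorial-time enumeration is only meant for tiny examples.

```python
from itertools import permutations


def randomized_orders(xs):
    """Return every permutation gamma of 1..n with x_i > x_j => gamma_i > gamma_j."""
    n = len(xs)
    return [
        gamma
        for gamma in permutations(range(1, n + 1))
        if all(
            gamma[i] > gamma[j]
            for i in range(n)
            for j in range(n)
            if xs[i] > xs[j]
        )
    ]


def common_randomized_orders(x0, x1):
    """Randomized orders shared by two challenge sequences (empty if none exist)."""
    return sorted(set(randomized_orders(x0)) & set(randomized_orders(x1)))


print(randomized_orders([1, 1, 3, 3]))                       # four orders
print(randomized_orders([5, 2, 9]))                          # a single order
print(common_randomized_orders([1, 2, 2, 3], [1, 2, 3, 3]))  # one shared order
```

For distinct plaintexts exactly one randomized order exists, namely the order of the plaintexts themselves, which is the observation behind the remark that IND-FA-OCPA is stronger than IND-OCPA.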

Maffei Et Al.'s FH-OPE Scheme
We review the FH-OPE scheme of [17] in detail. The main idea is that the client maintains the randomized order Γ of all encrypted plaintexts and uses it as one of the inputs of the encryption algorithm. In this way, the relative order of duplicate plaintexts is determined externally to the encryption algorithm. A search tree T that maps plaintexts to ciphertexts is stored as a state S on the client side and used in the decryption algorithm. For a node t of T, t.m and t.c represent a plaintext and the corresponding ciphertext, and t.left and t.right denote the left and right children of t, respectively. Every node t in T stores its index in the plaintext sequence. N is the number of distinct plaintexts, and n is the number of plaintexts in the sequence to be encrypted, which also equals |Γ|. k is the number of plaintexts encrypted and stored on the server so far. M denotes the number of distinct ciphertexts, and its bit length is expanded by a factor of λ, i.e., O(λ log N). As described in Algorithm 1, the state S comprises min, max, and T. When the search tree T is empty, the state (min, max) is initialized as (0, M). The update (tree rebalancing) algorithm is as described in [16].
To review their scheme, we present a concrete example. Figure 1 shows the encryption results of a plaintext sequence ordered properly based on Γ. However, Γ has to be updated continuously with each encryption. For the mutable OPE schemes whose state is stored on the client side, the computational cost of rebalancing T is similar; thus, it is excluded from the following efficiency analysis.
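The way Γ steers the placement of duplicate plaintexts can be illustrated with a small sketch. This is not the Algorithm 1 of [17]; it is a simplified reading under stated assumptions: the class name `GammaOPE` is hypothetical, a new ciphertext is taken as the integer midpoint of the current (min, max) interval, a duplicate goes left when its rank in Γ is smaller than the rank of the node it meets, and rebalancing is replaced by an exception.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    m: int                           # plaintext t.m
    c: int                           # ciphertext t.c
    index: int                       # 1-based position in the plaintext sequence
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class GammaOPE:
    """Client-side state: a search tree over the interval (0, M)."""

    def __init__(self, M: int):
        self.M = M
        self.root: Optional[Node] = None
        self.k = 0                   # number of plaintexts encrypted so far

    def encrypt(self, x: int, gamma: List[int]) -> int:
        """gamma[i - 1] is the rank of the i-th plaintext in the randomized order."""
        i = self.k + 1
        lo, hi = 0, self.M
        parent, go_left = None, False
        t = self.root
        while t is not None:
            # Ties between equal plaintexts are broken by their ranks in gamma.
            go_left = x < t.m or (x == t.m and gamma[i - 1] < gamma[t.index - 1])
            if go_left:
                hi = t.c
            else:
                lo = t.c
            parent, t = t, (t.left if go_left else t.right)
        if hi - lo < 2:
            raise RuntimeError("no free ciphertext: the tree must be rebalanced")
        node = Node(m=x, c=lo + (hi - lo) // 2, index=i)
        if parent is None:
            self.root = node
        elif go_left:
            parent.left = node
        else:
            parent.right = node
        self.k = i
        return node.c


ope = GammaOPE(M=128)
gamma = [1, 3, 2, 4]
print([ope.encrypt(1, gamma) for _ in range(4)])
```

In this sketch, with M = 128 and Γ = {1, 3, 2, 4}, the four duplicates of plaintext 1 receive the ciphertexts 64, 96, 80, and 112. The sketch passes Γ in full for simplicity; in practice the client has to keep Γ up to date after every encryption, which is the overhead analyzed next.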

Computational Cost.
Excluding the rebalancing of T, the encryption algorithm of [17] has computational complexities O(log n) and O(n) in the best and worst cases, respectively. This is because the cost of finding an empty node in the search tree and placing a plaintext based on T, which is required for each encryption, is O(log n). In addition, the cost of updating Γ is required. In the case of increasing sequential plaintexts, e.g.,

Storage Cost.
There are n elements in Γ, and each can be represented by log n bits. Thus, the client requires O(n log n) bits of additional persistent storage, except for T.

Rebalancing Tree.
In [16,17], if there is no available ciphertext in T, it has to be rebalanced by calling the update algorithm. However, the algorithm presented in [19] was designed under the assumption that there are no duplicate plaintexts. Therefore, the algorithm cannot guarantee a perfectly balanced tree when duplicate plaintexts are encrypted. The quality of the result of the update algorithm significantly impacts the overall performance; thus, a new improved algorithm is needed.

Proposed Scheme
We propose a practical FH-OPE scheme, described in Algorithm 2, that achieves the IND-FA-OCPA security with an improved update algorithm. Our search tree T does not need to store the index of the encrypted plaintexts. Let $H: \{0, 1\}^* \rightarrow \{0, 1\}$ be a hash function with 1-bit output modeled as a random oracle. Our main idea is to replace the inefficient input Γ with the combination of a single random value $r \leftarrow \{0, 1\}^{\mathrm{poly}(\lambda)}$ and H. In our scheme, the selection of empty nodes for duplicate plaintexts is determined by $H(r \| \mathrm{count})$. This means that the order of duplicate plaintexts is not determined internally but is specified externally. The other notations and the initialization of T are defined as described in the previous sections. Figure 2 shows the encryption of the duplicate plaintexts {1, 1, 1, 1} based on our scheme. We can check that the algorithm produces the distinct ciphertexts {64, 96, 80, 112} based on the chosen random values; e.g., $r''$ plays the same role as $\Gamma = \{1, 3, 2, 4\}$ in [17].
ALGORITHM 1: Encryption [17]. Input: x, S, and $\Gamma = c_1, \ldots, c_k$. Output: y.
The existing update algorithm for FH-OPE [16,17,19] sorts the plaintext sequence $X = t.m_1, t.m_2, \ldots, t.m_n$ in ascending order and simply re-encrypts the sequence. These algorithms cannot guarantee a perfectly balanced tree because the node positions are selected randomly for the duplicate plaintexts. The idea of our improved update algorithm (Algorithm 3) is simple. We build a new search tree $T^*$ on $\{1, 2, \ldots, n\}$, where n is the number of nodes in T, and replace $t^*.m_i$ with $t.m_i$ for $1 \le i \le n$. The resulting $T^*$ is a perfectly balanced tree because it is built from distinct plaintexts.
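The role of $H(r \| \mathrm{count})$ in placing duplicates can be sketched as follows. This is an illustration under several assumptions rather than the authors' Algorithm 2: H is instantiated with the low bit of the first byte of SHA-256, count is encoded as four big-endian bytes, a bit of 0 sends a duplicate to the left, new ciphertexts are interval midpoints, and the names `H` and `HashFHOPE` are hypothetical.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Optional


def H(r: bytes, count: int) -> int:
    """1-bit hash of r || count (SHA-256 stands in for the random oracle)."""
    digest = hashlib.sha256(r + count.to_bytes(4, "big")).digest()
    return digest[0] & 1


@dataclass
class Node:
    m: int                           # plaintext; no index is stored
    c: int                           # ciphertext
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class HashFHOPE:
    def __init__(self, M: int):
        self.M = M
        self.root: Optional[Node] = None

    def encrypt(self, x: int, r: Optional[bytes] = None) -> int:
        r = secrets.token_bytes(16) if r is None else r   # fresh r per plaintext
        lo, hi, count = 0, self.M, 0
        parent, go_left = None, False
        t = self.root
        while t is not None:
            if x == t.m:
                count += 1                    # one oracle query per duplicate met
                go_left = H(r, count) == 0
            else:
                go_left = x < t.m
            if go_left:
                hi = t.c
            else:
                lo = t.c
            parent, t = t, (t.left if go_left else t.right)
        if hi - lo < 2:
            raise RuntimeError("no free ciphertext: the tree must be rebalanced")
        node = Node(m=x, c=lo + (hi - lo) // 2)
        if parent is None:
            self.root = node
        elif go_left:
            parent.left = node
        else:
            parent.right = node
        return node.c


ope = HashFHOPE(M=2 ** 16)
print([ope.encrypt(x) for x in [5, 1, 1, 3, 1, 5]])
```

Because the direction taken at each duplicate depends only on r and the running counter, the client does not have to keep a randomized order of all past plaintexts in this sketch; the search tree alone is the state.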
In stateful OPE, the description of the decryption algorithm can be omitted because the state is stored on the client side; this omission does not affect the correctness of the OPE scheme. To decrypt a given ciphertext y, the client uses the binary search tree T and finds the node t, containing t.m and t.c, for which t.c = y. Thus, the client can decrypt the ciphertext y and return the plaintext x simply by performing a binary search.
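The decryption step described above amounts to a lookup by ciphertext in the client's search tree. A minimal sketch follows, assuming nodes with the fields m, c, left, and right; the function name `decrypt` and the hand-built example tree are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    m: int                           # plaintext t.m
    c: int                           # ciphertext t.c
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def decrypt(root: Optional[Node], y: int) -> int:
    """Binary search for the node with t.c == y and return its plaintext t.m."""
    t = root
    while t is not None:
        if y == t.c:
            return t.m
        # Ciphertexts are ordered like the tree, so they can guide the search.
        t = t.left if y < t.c else t.right
    raise KeyError(f"ciphertext {y} is not in the state")


# Example state: plaintext 5 encrypted twice (64 and 96), plaintext 2 once (32).
root = Node(m=5, c=64, left=Node(m=2, c=32), right=Node(m=5, c=96))
print(decrypt(root, 32), decrypt(root, 96))
```

The search works on ciphertexts alone because every left subtree holds smaller ciphertexts and every right subtree holds larger ones, so its cost is proportional to the height of the tree.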
Next, we prove the security of our proposed scheme with regard to the IND-FA-OCPA security model and analyze our construction in terms of efficiency.
Theorem 1. Let Γ denote any possible randomized order of the plaintext sequence X, and let $Y_\Gamma$ denote the ciphertext sequence obtained when Γ is used as the plaintexts. Then, the challenger C in IND-FA-OCPA can always simulate the ciphertexts of X so that they are identical to $Y_\Gamma$.
Proof. In the encryption algorithm of $x_{1 \le i \le n}$, let $b_{i,j}$ be the output of $H(r_i \| j)$, where count = j. Then, we can compute the values $b_{i,j}$ that yield a search tree identical to the one obtained as if Γ were encrypted. As shown in Figure 2, the values $b_{i,j}$ can be obtained for the example sequence. Finally, the challenger C chooses n random values $r_1, r_2, \ldots, r_n$ and simulates the random oracle H as $b_{i,j} \leftarrow H(r_i \| j)$; otherwise, it returns a bit chosen randomly.
Based on Theorem 1, if the challenger C outputs $Y_\Gamma$ in step 3 of IND-FA-OCPA, where Γ is the common randomized order chosen in step 2, the attacker A has no advantage in distinguishing $X_0$ and $X_1$.
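The simulation argument can be illustrated with a toy experiment. The sketch below first inserts a sequence while breaking ties according to Γ and records each decision as a programmed oracle entry for $(r_i, j)$, then repeats the insertion while consulting only the programmed oracle. The helper names, the midpoint rule for ciphertexts, and the convention that a true bit means "go right" are assumptions made for illustration.

```python
import secrets


def encrypt_all(xs, M, choose_right):
    """Insert xs into an interval-halving search tree and return the ciphertexts.

    choose_right(i, j, other) decides the direction when the i-th plaintext
    meets its j-th duplicate, stored at the node holding plaintext `other`.
    """
    root, out = None, []
    for i, x in enumerate(xs, start=1):
        lo, hi, j = 0, M, 0
        parent, right = None, False
        t = root
        while t is not None:
            if x == t["m"]:
                j += 1
                right = choose_right(i, j, t["index"])
            else:
                right = x > t["m"]
            lo, hi = (t["c"], hi) if right else (lo, t["c"])
            parent, t = t, t["right" if right else "left"]
        node = {"m": x, "c": lo + (hi - lo) // 2, "index": i,
                "left": None, "right": None}
        if parent is None:
            root = node
        else:
            parent["right" if right else "left"] = node
        out.append(node["c"])
    return out


def simulate(xs, gamma, M):
    """Return (Y_gamma, ciphertexts produced with the programmed oracle)."""
    rs = [secrets.token_bytes(16) for _ in xs]   # r_1, ..., r_n
    table = {}                                   # programmed oracle entries

    def by_gamma(i, j, other):
        bit = gamma[i - 1] > gamma[other - 1]
        table[(rs[i - 1], j)] = bit              # program H(r_i || j) := b_{i,j}
        return bit

    y_gamma = encrypt_all(xs, M, by_gamma)

    def oracle(r, j):
        # Unprogrammed queries are answered with a fresh random bit.
        return table.setdefault((r, j), bool(secrets.randbits(1)))

    y_hash = encrypt_all(xs, M, lambda i, j, other: oracle(rs[i - 1], j))
    return y_gamma, y_hash


print(simulate([1, 1, 1, 1], [1, 3, 2, 4], 128))
```

Both runs produce the same ciphertext sequence in this sketch because every oracle query made during the second run was programmed during the first one.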

Efficiency.
The security column in Table 1 indicates that the scheme of [16] achieves only the IND-OCPA security and that Kerschbaum's model is imprecise, which means that no such FH-OPE can exist. Furthermore, neither [16] nor [17] provides an improved update algorithm. The experiments in Section 5 show that the proposed update algorithm positively affects the computational performance. Compared with [17], our scheme does not require the order information of all the encrypted plaintexts, whereas Maffei et al.'s scheme requires O(n log n) bits of additional persistent storage, except for the state. Moreover, the repeated sorting of every element in the randomized order Γ causes very low computational performance in [17] because these operations occur whenever a plaintext is passed to the encryption algorithm of [17].
ALGORITHM 2: Our encryption. Input: x, S, and (r‖count).
Security and Communication Networks

Experiments
We analyze the performance of [16,17] and the proposed scheme on a system with an AMD Ryzen 5 3600 6-core processor at 3.59 GHz and 16 GB of RAM, using Python 3.9.5. We vary the plaintext size N and the number of plaintexts to be encrypted n, but the ciphertext size M is fixed at 2048. Figure 3 shows the comparison of the encryption of n plaintexts that are randomly selected, allowing duplicates. Because the scheme of [17] maintains the order of all encrypted plaintexts, it requires additional Γ updates besides the ciphertext updates. Figure 3 shows that the scheme of [17] exhibits lower performance than the others.

Random Distinct Plaintexts.
Here, we encrypt n plaintexts that are selected randomly in $\{1, 2, \ldots, N\}$, not allowing duplicates, where n = N. Figure 4 shows that the overall speed improves in all the schemes owing to the absence of duplicate plaintexts, but the encryption time of [17] is still longer than that of the others. For the plaintext sequence N, N − 1, ..., Figure 5 shows that all the schemes take more time to encrypt the data owing to the frequent updates of the search tree. However, as the number of plaintexts to be encrypted increases, the encryption time of the scheme of [17] increases sharply.

Ciphertext Update Cost.
In this section, we show experimentally that our update algorithm performs better than the other algorithms. We encrypt n plaintexts that are selected randomly. The update algorithm in [16,17] may produce an unbalanced search tree after executing the ciphertext updates. Accordingly, Figure 6 shows that the number of updates in our case is relatively small.
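The balancing idea behind the improved update algorithm of Section 4 can be sketched as follows: the tree is rebuilt over the positions 1, ..., n, which are distinct even when the plaintexts are not, and each position is then labeled with the corresponding plaintext. The function names, the midpoint rule for the new ciphertexts, and the assumption that M is large enough for the resulting height are illustrative and are not taken from the authors' Algorithm 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    m: int
    c: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def rebuild_balanced(sorted_plaintexts: List[int], M: int) -> Optional[Node]:
    """Rebuild the tree from the in-order plaintext list of the old tree.

    The tree shape is decided on the positions 1..n only, so duplicates
    cannot skew it; position i is then labeled with sorted_plaintexts[i - 1].
    """
    def build(first: int, last: int, lo: int, hi: int) -> Optional[Node]:
        if first > last:
            return None
        mid = (first + last) // 2        # middle position becomes the subtree root
        c = lo + (hi - lo) // 2          # fresh ciphertext at the interval midpoint
        return Node(
            m=sorted_plaintexts[mid - 1],
            c=c,
            left=build(first, mid - 1, lo, c),
            right=build(mid + 1, last, c, hi),
        )

    return build(1, len(sorted_plaintexts), 0, M)


def height(t: Optional[Node]) -> int:
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


def inorder(t: Optional[Node]) -> list:
    return [] if t is None else inorder(t.left) + [(t.m, t.c)] + inorder(t.right)


tree = rebuild_balanced([1] * 7, M=128)
print(height(tree), inorder(tree))
```

For seven copies of the same plaintext, this sketch yields a tree of height 3, whereas re-encrypting the sorted sequence with randomly chosen positions for duplicates could produce a tree as deep as 7.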

Conclusion
We reviewed the construction presented by Maffei et al. and concluded that the scheme still needs to be improved in terms of storage and computational complexity. Then, we proposed a more practical FH-OPE scheme with a formal IND-FA-OCPA security proof. Moreover, we observed that the previous update algorithms are not suitable for duplicate plaintexts and proposed an improved update algorithm that produces a perfectly balanced search tree regardless of the distribution of the plaintexts. Finally, we presented experimental results that demonstrate the advantages of the proposed scheme.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.