
Existing password-based encryption (PBE) methods used to protect private data are vulnerable to brute-force attacks. For a wrongly guessed key, the decryption process yields an invalid-looking plaintext message, which confirms that the key is wrong, while for the correct key it outputs a valid-looking plaintext message, which confirms that the guess is correct. Honey encryption helps to minimise this vulnerability. In this paper, we design and implement a honey encryption mechanism and apply it to three types of private data: Chinese identification numbers, mobile phone numbers, and debit card passwords. We evaluate the performance of our mechanism and propose an enhancement to address the overhead issue. We also present lessons learned from designing, implementing, and evaluating the honey encryption mechanism.

Most people in China, as in other countries, are annoyed by junk text messages. Internet users can also fall victim to identity theft, in which criminals use someone else's identity. This can occur when sensitive private data is not well protected and is then used maliciously by other parties, damaging the finances and reputation of the data owner.

When purchasing a product online, we are asked to provide our mobile phone number for the delivery purpose. When buying a train ticket in China, we need to fill in the identification card number. The commercial parties gather such sensitive private data. Some store them in a plaintext format. Some employ password-based encryption (PBE) [

Juels and Ristenpart [

The innovation of honey encryption is the design of the distribution-transforming encoder (DTE). According to the probabilities of a message in the message space, it maps the message to a seed range in a seed space, then it randomly selects a seed from the range and XORs it with the key to get the ciphertext. For decryption, the ciphertext is XORed with the key and the seed is obtained. Then DTE uses the seed location to map it back to the original plaintext message. Even if the key is incorrect, the decryption process outputs a message from the message space and thus confuses the attacker.

The contribution of this paper is threefold. First, we design and implement the honey encryption system and apply the concept to three applications including Chinese identification numbers, mobile numbers, and passwords. These applications are based on uniformly distributed message spaces and the symmetric encryption mechanism. We also extend honey encryption to applications with nonuniformly distributed message spaces and an asymmetric encryption mechanism (RSA). Second, we evaluate the performance of our honey encryption mechanism and propose an enhancement. Third, we discuss lessons learned from implementing and evaluating the honey encryption technique.

The rest of this paper is organised as follows. Section

Most of the systems with encryption use password-based encryption (PBE). These systems are susceptible to brute-force guessing attacks. Honey encryption [

Honey encryption deceives attackers into believing that an incorrectly guessed key is valid. Many luring technologies that also use the term honey have been proposed in the last 20 years. Honeytokens [

Honey encryption is also related to Format-Preserving Encryption (FPE) [

While Vinayak and Nahala [

In [

Our paper focuses on applying the honey encryption technique to three new applications: citizens’ identification numbers, mobile phone numbers, and debit card passwords. This data is vital private information whose theft can seriously damage a person’s finances and/or reputation. Although honey encryption has already been applied to a number of applications, message formats and probability features differ, so the message space design needs to vary for each new type of application. The applications discussed in our paper are carefully selected, because later we will show that the protection honey encryption provides varies between applications: it is stronger for debit card passwords and weaker for mobile phone numbers.

In our comprehensive design and implementation of the honey encryption mechanisms for three different applications we cover small/large message spaces, uniformly/nonuniformly distributed message probabilities, and symmetric/asymmetric encryption mechanisms. As far as we know, our paper is the first one to study the performance of honey encryption. We discover the performance problem for large message spaces and present a performance optimisation for small message spaces. We also show that the capability of honey encryption to address the brute-force vulnerability could be lost if the message space has not been well designed and that this design needs to vary for different types of applications.

Honey encryption protects a set of messages that have some common features (e.g., credit card numbers are such messages). A message set is called a message space. Before encrypting a message, we should determine the possible message space. All messages in the space must be sorted in some order. Then the probability of each message (PDF) that occurs in the space and the cumulative probability (CDF) of each message are needed. A seed space should be available for the distribution-transforming encoder (DTE) to map each message to a seed range in the seed space (

Let us consider using honey encryption to encrypt the coffee types, as shown in Figure

Honey encryption example.

For decryption, the ciphertext is XORed with the key to obtain the seed. Then the DTE inversely maps the seed to the original plaintext message. In the encryption process, a message could have multiple mapping choices and the mapping is directional and random. However, since we sort plaintext messages in the message space and determine the seed range by the PDF and CDF of each message, it can be guaranteed that the seed ranges are arranged in the same order and the cumulative probability of the seed range in the seed space is equal to the cumulative probability of the message in the message space. Therefore, we establish an inverse_table that consists of mappings of the cumulative probability to the plaintext message. Finding the seed, we can determine the seed range. Finding the seed range, we can determine the cumulative probability shared by the seed range and the corresponding plaintext message. Then by looking up the cumulative probability in the inverse_table, we can find the original plaintext message and the ciphertext is decrypted.
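The mapping between messages, seed ranges, and the inverse_table can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper: the 8-bit seed space, the four message names, and their probabilities are hypothetical values chosen so that the ranges are easy to check by hand, and the cumulative probability of a message is taken to be the total probability of the messages sorted before it.

```python
import secrets
from bisect import bisect_right
from fractions import Fraction

SEED_BITS = 8                    # hypothetical, deliberately tiny
SEED_SPACE = 1 << SEED_BITS      # seeds are integers in [0, 256)

# Hypothetical sorted message space with illustrative probabilities.
MESSAGES = ["americano", "cappuccino", "espresso", "mocha"]
PDF = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

# CDF[i] = total probability of the messages sorted before message i.
CDF = []
running = Fraction(0)
for p in PDF:
    CDF.append(running)
    running += p

# inverse_table: cumulative probabilities (sorted) and their messages.
TABLE_KEYS = CDF
TABLE_MESSAGES = MESSAGES


def encode(message: str) -> int:
    """Pick a random seed inside the seed range of the message."""
    i = MESSAGES.index(message)
    start = int(CDF[i] * SEED_SPACE)
    end = int((CDF[i] + PDF[i]) * SEED_SPACE)  # exclusive upper bound
    return start + secrets.randbelow(end - start)


def decode(seed: int) -> str:
    """Map any seed to the message whose seed range contains it."""
    position = Fraction(seed, SEED_SPACE)
    # Last table entry whose cumulative probability is <= position.
    return TABLE_MESSAGES[bisect_right(TABLE_KEYS, position) - 1]
```

With these values the seed ranges are [0, 128), [128, 192), [192, 224), and [224, 256), so every possible seed decodes to some message in the space.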

In [

We can design DTE as a common module that implements the encryption and decryption algorithms. For encryption, the DTE module takes in some parameters from the message space including the PDF and CDF probabilities of each message. Therefore, we abstract some interfaces for DTE to use when designing the message space. For decryption, the main task for DTE is to search the inverse_table and find the correct plaintext message. Therefore, the message space implementation should provide interfaces for probabilities and the inverse_table.

DTE maps the plaintext message to a seed in a seed range. The starting point of the seed range is determined by the CDF of the message, while the end point of the seed range is determined by the PDF of the message. Therefore, we define an interface for the message space containing functions including the cumulative_probability(mesg) function and the probability(mesg) function. These two functions accept a plaintext message as the parameter and output the CDF and PDF, respectively.

In decryption, DTE finds the plaintext message in the inverse_table by looking up the cumulative probability of the seed. The inverse_table is stored in a file. We define another function, get_inverse_table_file_name(), in the message space interface. The function returns the filename of the inverse_table for DTE to look up when decrypting the ciphertext. If the inverse_table is not large, we can load its content into memory when the system starts. Then, during decryption, binary search can be used to save time. However, if the inverse_table size exceeds the available system memory, DTE needs to read the inverse_table file line by line and find the plaintext by linear search.
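The interface and the two lookup strategies can be sketched as follows. The three method names come from the text; everything else is an assumption made for illustration, in particular the class name MessageSpace, the helper names, and the file layout of one tab-separated "cumulative probability, message" pair per line, sorted by cumulative probability and starting at 0.

```python
from abc import ABC, abstractmethod
from bisect import bisect_right
from fractions import Fraction
from typing import List, Optional, Tuple


class MessageSpace(ABC):
    """Interface used by the DTE (class name is hypothetical)."""

    @abstractmethod
    def probability(self, mesg: str) -> Fraction:
        """PDF value of mesg."""

    @abstractmethod
    def cumulative_probability(self, mesg: str) -> Fraction:
        """CDF value of mesg."""

    @abstractmethod
    def get_inverse_table_file_name(self) -> str:
        """Path of the inverse_table file."""


def _parse(line: str) -> Tuple[Fraction, str]:
    # Assumed line layout: "<cumulative probability>\t<message>".
    cum, mesg = line.rstrip("\n").split("\t", 1)
    return Fraction(cum), mesg


def load_inverse_table(path: str) -> Tuple[List[Fraction], List[str]]:
    """Small table: read everything into memory once."""
    keys: List[Fraction] = []
    messages: List[str] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            cum, mesg = _parse(line)
            keys.append(cum)
            messages.append(mesg)
    return keys, messages


def lookup_in_memory(keys: List[Fraction], messages: List[str],
                     position: Fraction) -> str:
    """Binary search for the last entry with key <= position."""
    return messages[bisect_right(keys, position) - 1]


def lookup_in_file(path: str, position: Fraction) -> str:
    """Large table: scan the file line by line (linear search)."""
    found: Optional[str] = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            cum, mesg = _parse(line)
            if cum > position:
                break
            found = mesg
    if found is None:
        raise ValueError("position precedes the first table entry")
    return found
```

Both lookups return the same message for a given position; they differ only in memory use and in the number of entries inspected.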

DTE maps the plaintext message into a seed range, randomly selects a seed from the range, and XORs the seed with the key to output the ciphertext. The beginning of the seed range is determined by the CDF and the end of the seed range is determined by the PDF. The seed is randomly selected from the range.

When decrypting a ciphertext, the ciphertext is XORed with the key to obtain the seed. Then DTE determines the location of the seed in the seed space. This location corresponds to a probability value that lies between the CDF of the message and the CDF of the next message in the message space. Every line in the inverse_table contains a cumulative probability and its corresponding plaintext message. All lines are sorted by cumulative probability. By searching the inverse_table, DTE can find the plaintext message for the cumulative probability determined by the seed.
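The complete encrypt/decrypt path, including the behaviour under a wrong key, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the evaluated system: the message space is a hypothetical uniform space of the strings "00" to "99", the seed space is 16 bits wide, and the key is simply an integer of the same width.

```python
import secrets
from bisect import bisect_right

SEED_BITS = 16                       # hypothetical seed width
SEED_SPACE = 1 << SEED_BITS

# Hypothetical uniform message space: "00", "01", ..., "99".
MESSAGES = [f"{n:02d}" for n in range(100)]

# Start of each seed range; range i ends where range i + 1 begins.
STARTS = [i * SEED_SPACE // len(MESSAGES) for i in range(len(MESSAGES))]


def encode(message: str) -> int:
    """DTE encode: random seed inside the message's seed range."""
    i = MESSAGES.index(message)
    end = STARTS[i + 1] if i + 1 < len(STARTS) else SEED_SPACE
    return STARTS[i] + secrets.randbelow(end - STARTS[i])


def decode(seed: int) -> str:
    """DTE decode: message whose seed range contains the seed."""
    return MESSAGES[bisect_right(STARTS, seed) - 1]


def encrypt(message: str, key: int) -> int:
    return encode(message) ^ key


def decrypt(ciphertext: int, key: int) -> str:
    # XOR of two 16-bit values is again a valid 16-bit seed, so any
    # key, right or wrong, yields a message from the message space.
    return decode(ciphertext ^ key)
```

Calling decrypt(encrypt("42", key), key) returns "42", while decrypting the same ciphertext with a different key returns some other two-digit string that looks equally valid.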

In this section, we apply the implemented honey encryption technique to private sensitive data including Chinese identification numbers, Chinese mobile phone numbers, and passwords. The code for DTE and the message space interface can be reused. However, the message space implementation should be customised and this is the focus of this section.

Identification numbers identify citizens and encode personal information, and they are widely used for authentication. They are therefore used and stored by many commercial organisations. Stolen or leaked identification numbers can be misused by malicious users.

The identification number consists of 18 digits, as shown in Figure

Identification number format.

By gathering statistics, we found that China has 3519 location symbols. The 11th to 14th digits have 365 choices, as a year has 365 days. The sequence code has 999 choices, but in fact, this value rarely reaches 999. So the message space has

Therefore, the probability(mesg) function returns

In this implementation, we find that the message space and the inverse_table file are much larger than the available memory if we consider 100 years and 999 sequence codes. To address this problem, we divide the identification number without the checksum into four parts (location, year, month-day, and sequence code) and store the possible values of each part in a separate file. Therefore the identification number without the checksum can be viewed as a

In this way, we construct a message space considering all people in China who were born between 1917 and 2016 and that the sequence code lies between 1 and 200. DTE reads these four files into the memory and stores these values in four different lists. To encrypt a message, DTE joins the four parts to make an identification number and compares it to the message to be encrypted to determine the location of the message. Then the cumulative probability is calculated and the message is encrypted. For decryption, the system XORs the ciphertext with the key to obtain the seed. Then DTE calculates the probability,
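One way to locate an identification number in such a four-part message space is sketched below. Note that it swaps in a different technique from the one described above: instead of joining parts and comparing candidates, it computes the position arithmetically by treating the four parts as digits of a mixed-radix number. The value lists are hypothetical, heavily shortened stand-ins for the four files, the function names are illustrative, and the checksum digit is left out.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical, shortened stand-ins for the four value files.
LOCATIONS = ["110101", "110102", "310101"]
YEARS = [str(y) for y in range(1917, 2017)]      # 1917..2016
MONTH_DAYS = ["0101", "0102", "1231"]
SEQS = [f"{s:03d}" for s in range(1, 201)]       # 001..200
PARTS: List[List[str]] = [LOCATIONS, YEARS, MONTH_DAYS, SEQS]

SPACE_SIZE = 1
for values in PARTS:
    SPACE_SIZE *= len(values)


def split(id17: str) -> Tuple[str, str, str, str]:
    """Split the first 17 digits into location, year, month-day, seq."""
    return id17[:6], id17[6:10], id17[10:14], id17[14:17]


def index_of(id17: str) -> int:
    """Position of the number in the sorted message space."""
    idx = 0
    for values, field in zip(PARTS, split(id17)):
        idx = idx * len(values) + values.index(field)
    return idx


def message_at(idx: int) -> str:
    """Inverse of index_of: rebuild the number from its position."""
    fields = []
    for values in reversed(PARTS):
        idx, pos = divmod(idx, len(values))
        fields.append(values[pos])
    return "".join(reversed(fields))


def cumulative_probability(id17: str) -> Fraction:
    """Uniform space: probability mass of all earlier messages."""
    return Fraction(index_of(id17), SPACE_SIZE)
```

Under the uniform assumption every message has probability 1/SPACE_SIZE, so a seed position p in [0, 1) decodes to message_at(int(p * SPACE_SIZE)) without reading an inverse_table.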

Although we solve the memory problem by storing the identification numbers in different files, we find that the time to honey encrypt and decrypt a message in such a large message space is very high. In addition, if taking into account a nonuniformly distributed message space, it would be more complicated, since different messages have different PDFs and CDFs that cannot be calculated by determining the message location.

Nowadays, mobile phone numbers are linked to debit cards for financial transactions, and therefore mobile numbers should be well protected. The mobile phone number consists of 11 digits. The first 3 digits represent the operator code. There are three operators in China: China Mobile, China Unicom, and China Telecom. Each operator has been assigned a number of operator codes. The 4th to 7th digits are the location code of the mobile phone. The 8th to 11th digits are random. We implement the honey encryption technique for the Beijing Unicom numbers. For Beijing Unicom, the first 7 digits have 2872 choices, and the last 4 digits have

We define the MobileNumber class and implement the mentioned three functions for the message space interface. Given a mobile number, DTE outputs a random seed, which is XORed with the key to get the ciphertext. For a specific mobile number, using the system to encrypt it multiple times, the system outputs different ciphertexts due to the randomness in the seed generation process. When the ciphertext is XORed with the key, we can obtain the seed. Then the seed is used by DTE to get the correct plaintext of the mobile number.

Passwords usually consist of uppercase and lowercase letters, digits, and symbols. Many users use weak passwords. For example, a debit card uses a 6-digit password for withdrawing money from an ATM. Honey encryption can help to protect such passwords from brute-force attacks. The 6-digit password space consists of

We implement the password class by defining the three functions in the message space interface. When combined with DTE, the system encrypts and decrypts correctly. Moreover, when a wrong key is used to decrypt a message, the system outputs a valid-looking message that gives no indication that the key is incorrect.
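For a sorted, uniformly distributed space such as 6-digit passwords, the probabilities can be computed directly from the numeric value of the message. The sketch below assumes exactly that: the class name Password6 and the helper message_for are hypothetical, probability and cumulative_probability follow the interface names used in the text, and the inverse_table file is replaced here by arithmetic, which is an assumption of this sketch rather than a description of the implemented class.

```python
from fractions import Fraction

SPACE_SIZE = 10 ** 6     # "000000" .. "999999"


class Password6:
    """Uniform message space of 6-digit passwords (illustrative)."""

    def probability(self, mesg: str) -> Fraction:
        self._check(mesg)
        return Fraction(1, SPACE_SIZE)

    def cumulative_probability(self, mesg: str) -> Fraction:
        # Passwords sorted numerically: int(mesg) messages precede mesg.
        self._check(mesg)
        return Fraction(int(mesg), SPACE_SIZE)

    def message_for(self, position: Fraction) -> str:
        """Inverse lookup for a seed position in [0, 1)."""
        if not 0 <= position < 1:
            raise ValueError("position must lie in [0, 1)")
        return f"{int(position * SPACE_SIZE):06d}"

    @staticmethod
    def _check(mesg: str) -> None:
        if len(mesg) != 6 or not mesg.isdigit():
            raise ValueError("expected a 6-digit password")
```

For example, Password6().cumulative_probability("123456") is 123456/1000000, and any position between that value and the next password's cumulative probability maps back to "123456".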

The platform for evaluating our honey encryption system is a Toshiba Portege-M800 laptop. The processor is an Intel Core 2 Duo running at 2.0 GHz, and the machine has 3 GB of RAM. The operating system is Ubuntu Kylin 16.04. The goal of the experiments is to study the time taken to encrypt and decrypt a message. To make it easy to increase the size of the message space several times, we choose the password message space for evaluation and increase the size from

For encryption in a large message space, DTE should read the message space file line by line, calculate the PDF and CDF, determine the seed range, and randomly select a seed from the range. Finally, the chosen seed is XORed with the key to obtain the ciphertext.

We extend the message space size from

Time to encrypt/decrypt a message (panels: encryption, decryption).

During the decryption process, DTE first XORs the key with the ciphertext and obtains the seed. Then it determines the location of the seed in the seed space. Using the location information, it looks up the inverse_table and gets the corresponding plaintext message.

We measure the time to decrypt a message in three message spaces, ranging from

For a large message space, the decryption algorithm needs to read the inverse_table file line by line and find the correct plaintext message using the calculated cumulative probability. For a small message space, we can read the whole inverse_table into the memory and use the binary search method to find the corresponding plaintext message in the decryption process.

For a large message space, the encryption algorithm needs to read the message space file and determine the message’s PDF and CDF. But if the message space is incrementally sorted like the password message space, the value of the message,

We improve the encryption and decryption algorithm and evaluate their performance. Figure

Time to encrypt/decrypt a message (panels: encryption enhanced, decryption enhanced).

Figure

We discussed honey encryption for identification numbers, mobile phone numbers, and debit card passwords in Section

The coffee example in Section

Performance in application with a nonuniform distributed message space.

So far, we have combined a symmetric encryption mechanism, XOR, with the honey encryption technique to produce a ciphertext from a given seed. The symmetric key scheme assumes that the involved parties can store the symmetric key securely and the key is distributed to both parties by a secure channel. In this section, we extend the honey encryption mechanism to a public key encryption mechanism, RSA, to mitigate this limitation.

The encryption process can be easily integrated with RSA. The 1024-bit public and private keys are generated by the RSA algorithm. For encryption, the plaintext message is mapped to a seed by the DTE encode() process and then the seed is encrypted by an RSA public key to generate the ciphertext. For decryption, the RSA private key is used to decrypt the ciphertext to obtain the seed and then the seed goes through the DTE decode() process to obtain the plaintext message.

When decrypting with a wrong private key, RSA encounters an error instead of outputting a valid-looking seed. To solve this problem, the system captures the exception when a wrong decryption key is utilised and outputs a random seed from the seed range to confuse the attacker. If the decryption key is the correct private key, the system can call the RSA decryption function without any exception and obtain the right seed. The seed then is mapped to a cumulative probability corresponding to a plaintext message. We implemented the asymmetric key honey encryption mechanism for the debit card password application (enhanced version) with a message space size of
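The exception-handling step can be sketched independently of any particular RSA library. In the Python sketch below, decrypt stands for whatever private-key decryption routine is in use; the assumptions, which are not taken from the text, are that it signals a failed decryption by raising ValueError, that the seed is encoded as a big-endian integer, and that the seed space is 32 bits wide. The names recover_seed and SEED_SPACE are hypothetical.

```python
import secrets
from typing import Callable

SEED_SPACE = 1 << 32     # hypothetical seed space size


def recover_seed(decrypt: Callable[[bytes], bytes],
                 ciphertext: bytes) -> int:
    """Return the decrypted seed, or a random one if decryption fails.

    `decrypt` is assumed to raise ValueError when the private key does
    not match the ciphertext (for example, on a padding failure).
    """
    try:
        seed = int.from_bytes(decrypt(ciphertext), "big")
        if seed >= SEED_SPACE:
            raise ValueError("seed outside the seed space")
    except ValueError:
        # Wrong key: hide the failure behind a valid-looking seed.
        seed = secrets.randbelow(SEED_SPACE)
    return seed
```

The returned seed is then passed to the DTE decode step as usual, so the caller cannot tell from the output whether the fallback was taken.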

We evaluate the performance of honey encryption with the RSA extension, as shown in Figure

Time to encrypt/decrypt a message (panels: encryption time, decryption time).

In our research on honey encryption, we find that it is an effective countermeasure for brute-force attacks. However, we also discover the following limitations.

Private data should be well protected to avoid loss due to leakage and misuse. Existing password-based encryption (PBE) methods used to protect private data are vulnerable to brute-force attacks, because the attacker can determine whether a guessed key is correct by looking at the output of the decryption process. The honey encryption technique is a countermeasure for this vulnerability. In this paper, we discussed the honey encryption concept and designed and implemented a honey encryption mechanism for Chinese identification numbers, mobile phone numbers, and debit card passwords. We covered applications with uniformly and nonuniformly distributed message spaces and with symmetric and asymmetric key encryption mechanisms. We evaluated the performance of our honey encryption mechanism and proposed an enhancement to address the overhead issue.

Finally, we discussed the lessons learned from our experience of designing, implementing, and evaluating the honey encryption mechanism. Specifically, we have the following observations.

The authors declare that they have no conflicts of interest.

The work is partially supported by the NSFC project 61702542 and the China Postdoctoral Science Foundation project 2016M603017.