A Key Selected S-Box Mechanism and Its Investigation in Modern Block Cipher Design

The block cipher is an important means of providing data confidentiality in practice, and the S-box is an essential component of most modern block cipher designs. In 1973, Feistel used a key selected S-box mechanism in his early block cipher designs. The idea is to let each S-box have two different states and to use a key bit to select which of the two states is used in an encryption or decryption operation. However, this key selected S-box mechanism has received little attention in modern block cipher design since the DES block cipher was published in 1977. In this paper, we revisit Feistel's key selected S-box mechanism, give a generalised version of it, compare it with existing close notions, and design the LBC example cipher. LBC demonstrates that, in some application environments, the generalised key selected S-box mechanism can be advantageous over the ordinary S-box mechanism in modern block cipher design, improving security and/or performance without intensifying computational effort and space.


Introduction
The block cipher is an important primitive in secret-key cryptography. A block cipher is an algorithm that transforms a fixed-length data block, called a plaintext block, into another data block of the same length, called a ciphertext block, under the control of a secret user key. One main purpose of a block cipher is to provide confidentiality for data transmitted in insecure communication environments. A block cipher typically involves two types of operations. One provides confusion, which aims to make the relationship between the ciphertext and the plaintext/key complex. The other provides diffusion, which aims to dissipate the statistical structure of the plaintext over the ciphertext. The diffusion operation is usually a linear permutation, and the confusion operation is usually built from nonlinear substitution boxes (S-boxes for short). An S-box takes a certain number of input bits and transforms them nonlinearly into a certain number of output bits; it is usually implemented as a lookup table. In modern block cipher designs such as DES [1] and AES [2], the S-box is an essential part and plays an important role in securing the cipher.
In 1973, Feistel (the inventor of the so-called Feistel ciphers) used a key selected S-box mechanism in his early block cipher designs [3,4]. Feistel's key selected S-box mechanism lets each S-box have two different states and uses a key bit to select which of the two states is used in an encryption or decryption operation. However, this mechanism has received little attention since DES was published in 1977 and has not been investigated in modern block cipher design. There is only the occasional application, such as [5], in which the AES S-box is replaced with such key selected S-boxes.
In this paper, we revisit Feistel's key selected S-box mechanism. First, we generalise it. The generalised key selected S-box mechanism stores several specific S-boxes (of the same dimensions) in a table and uses certain key (or subkey) bits to select which S-box is used at each S-box position of the S-box layer of a round of a cipher in an encryption or decryption operation. Then, we compare the generalised key selected S-box mechanism with existing close notions. We find that it can offer extra security without intensifying computational effort and space, because it produces many key-dependent choices for the round function of a cipher. In particular, it resists not only conventional cryptanalysis methods such as differential cryptanalysis [6] and linear cryptanalysis [7] but also more recent, sophisticated variants such as multiple differential cryptanalysis [8], multiple linear cryptanalysis [9], and multidimensional linear cryptanalysis [10]. The extra security gain allows us to reduce the number of rounds for the sake of performance. This holds as long as the overhead of the key selected S-box mechanism relative to the ordinary S-box mechanism is negligible compared with the gain resulting from the reduced number of rounds. Finally, we design the LBC cipher as an example to demonstrate that the generalised key selected S-box mechanism can be advantageous over the ordinary S-box mechanism in some application environments. Along the way, we define the combined difference distribution table and the combined bias distribution table of a generalised key selected S-box. We also describe frameworks for analysing the security of a block cipher with a generalised key selected S-box against differential and linear cryptanalysis.
For this example cipher, the key selected S-box mechanism offers software speedups of around 12% on a lightweight ARM NEON processor and around 16% on a general-purpose Intel i3 processor. It also offers a hardware speedup of around 22% in a parallel hardware implementation with one cycle per round, although in this particular parallel case it requires slightly more gate equivalents (GEs) than the ordinary S-box mechanism. Note, however, that, like most block cipher designs, we consider only algorithmic security in the black-box model. We do not consider the physical security of implementations, such as resistance to side-channel attacks [11]. Such attacks work in the gray-box model, which assumes a more powerful attacker than the black-box model, and usually require additional countermeasures. An implementation of the (generalised) key selected S-box mechanism may be more vulnerable to side-channel attacks. However, the applicability of side-channel attacks depends entirely on the application environment, and no single cipher design can be optimal in all application environments.
The remainder of the paper is organised as follows. In Section 2, we give the abbreviations and notation used throughout this paper. In Section 3, we generalise Feistel's key selected S-box mechanism and compare it with existing similar notions. We specify the LBC block cipher in Section 4 and discuss its design rationale in Section 5. Sections 6 and 7 evaluate, respectively, the security gain and the performance gain of the key selected S-box mechanism over the ordinary S-box mechanism for the LBC example cipher. Section 8 concludes this paper.

Abbreviations and Notations
In all descriptions, we assume that the bits of an n-bit value are numbered from 0 to n − 1 from right to left, with the most significant bit being the (n − 1)th. A number without a prefix is decimal unless stated otherwise, and a number with the prefix 0x is hexadecimal. We use the following abbreviations and notation throughout this paper:
GE: gate equivalent
SIMD: single instruction multiple data
SPN: substitution-permutation network
⊕: bitwise logical exclusive OR (XOR) operation
⋘i: left rotation (of a bit string) by i bits
‖: string concatenation
∘: functional composition; when composing functions X and Y, X ∘ Y denotes the function obtained by first applying X and then Y
〈X〉: X in binary (base 2) notation
|X|: the bit length of a value X
$X_{(i_0, i_1, \ldots, i_j)}$: bits $i_0, i_1, \ldots, i_j$ of an n-bit value X ($0 \le i_0, i_1, \ldots, i_j, j \le n - 1$)

The Generalised Key Selected S-Box Mechanism
In this section, we generalise Feistel's key selected S-box mechanism and compare it with existing close notions by discussing their similarities and differences.

Definition
A function $F: \{0,1\}^m \times \{0,1\}^v \to \{0,1\}^q$ (for specific values of m, v, and q) is called a key selected S-box if two conditions hold. First, there are $2^v$ ordinary (that is, one-variable) m × q-bit S-boxes, indexed from 0 to $2^v - 1$. Second, for each fixed v-bit value V (some v bits of key material, e.g., of a round key), $F(\cdot, V)$ is the Vth m × q-bit S-box.
We call V the selection vector and, for any fixed V, write $F_V(\cdot)$ for $F(\cdot, V)$, or simply $F_V$.
Like a key-dependent S-box, the key selected S-box produces many key-dependent choices for the round function of a cipher. This provides greater security than the ordinary S-box mechanism and makes the resulting cipher particularly resistant to multiple differential cryptanalysis [8], multiple linear cryptanalysis [9], and multidimensional linear cryptanalysis [10]. Under the key selected S-box, different differential characteristics or linear approximations usually require different sets of key (or subkey) bits. An attacker must therefore specify the corresponding selecting key bits when establishing a differential characteristic or linear approximation, which shrinks the remaining key space to be guessed in the key recovery phase. By contrast, under the ordinary S-box mechanism, a differential characteristic or linear approximation generally holds under a random key. The attacker need not specify any key bits when establishing it, and different characteristics or approximations can presumably hold under the same key, so the full key space remains to be guessed in the key recovery phase. As a consequence, under the key selected S-box mechanism, we do not need to add rounds to a cipher to counter multiple differential cryptanalysis, multiple linear cryptanalysis, and multidimensional linear cryptanalysis, which may produce a performance gain.
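Operationally, the definition above is a table lookup indexed by the selection vector. The Python sketch below assumes m = q = 4 and v = 2. It uses four hypothetical placeholder tables (the PRESENT S-box, its inverse, its square, and the inverse of its square) rather than any S-boxes from this paper. The names `key_selected_sbox` and `fixed_sbox` are illustrative only.

```python
from typing import Callable, Sequence

# Hypothetical placeholder S-boxes (NOT the LBC S-boxes of Table 1).
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PRESENT_INV = [PRESENT.index(y) for y in range(16)]
SQUARE = [PRESENT[PRESENT[x]] for x in range(16)]
SQUARE_INV = [SQUARE.index(y) for y in range(16)]
TABLES = [PRESENT, PRESENT_INV, SQUARE, SQUARE_INV]  # 2^v tables with v = 2


def key_selected_sbox(x: int, v_sel: int,
                      tables: Sequence[Sequence[int]] = TABLES) -> int:
    """F(x, V): apply the V-th ordinary m x q-bit S-box to the m-bit input x."""
    return tables[v_sel][x]


def fixed_sbox(v_sel: int) -> Callable[[int], int]:
    """F_V: the ordinary S-box obtained by fixing the selection vector V."""
    return lambda x: key_selected_sbox(x, v_sel)


print(key_selected_sbox(0, 0), fixed_sbox(1)(0))  # 12 5
```

Because V only picks a row of `TABLES`, fixing V turns F into an ordinary S-box. This is why, once a user key is fixed, the cipher sees one specific ordinary S-box per position.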
However, the key selected S-box is slightly different from the key-dependent S-box.
(i) Current key-dependent S-box construction methods such as [12,13] generally involve a number of interactions between the key parameter and the data parameter, which is costly. There are at least two: the key-dependent S-box built from an ordinary S-box in [14] uses one XOR with the input of the ordinary S-box and one XOR with its output. In the key selected S-box, by contrast, the key parameter serves simply as the index to the associated ordinary S-boxes, and the output is produced after only one simple interaction with the data parameter. In other words, the key selected S-box is usually much less computationally intensive than the key-dependent S-box.
(ii) In current key-dependent S-boxes, the key parameter usually plays the same role as, and has the same dimension as, the data parameter, for good randomness. A key-dependent S-box can therefore usually produce a relatively large number of instantiations over the key parameter space. By comparison, in the key selected S-box, the key parameter plays a different role from the data parameter and usually has a smaller dimension than the data parameter, as in LBC, described below.
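To make points (i) and (ii) concrete, the following sketch compares two constructions. The first is a key-dependent S-box in the style of [14], with one XOR on the input and one on the output of an ordinary S-box and 8 key bits in total. The second is a key selected S-box with v = 2. The ordinary S-box (PRESENT) and the four input-shifted copies used as selectable tables are hypothetical placeholders. Counting distinct instantiations only illustrates (ii) and is not a security measure.

```python
from itertools import product

# Hypothetical ordinary S-box shared by both constructions (PRESENT S-box).
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
# Hypothetical key selected set: four input-shifted copies of PRESENT (v = 2).
SELECTED = [[PRESENT[x ^ i] for x in range(16)] for i in range(4)]


def key_dependent(x: int, k_in: int, k_out: int) -> int:
    # Two key/data interactions: XOR before and XOR after the table lookup.
    return PRESENT[x ^ k_in] ^ k_out


def key_selected(x: int, v: int) -> int:
    # One interaction: the key bits only choose which table is read.
    return SELECTED[v][x]


def count_instantiations(fn, key_space) -> int:
    """Number of distinct 4-bit S-boxes that fn yields over key_space."""
    return len({tuple(fn(x, *key) for x in range(16)) for key in key_space})


kd = count_instantiations(key_dependent, product(range(16), repeat=2))  # 8 key bits
ks = count_instantiations(key_selected, ((v,) for v in range(4)))       # 2 key bits
print(kd, ks)
```

The key selected construction yields exactly as many S-boxes as there are tables. The key-dependent one yields up to $2^8$ instantiations at the cost of an extra key/data interaction per lookup.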

A Comparison with DES (-like) S-Boxes.
The notion of a key selected S-box is similar to that of a DES (or DES-like) S-box [1]. A DES S-box is an ordinary (6 × 4-bit) S-box involving only one input parameter, the data parameter, rather than a key-dependent S-box involving both data and key parameters. However, it uses two bits of the data parameter as an index into the four rows of the S-box table, each of which can be treated as an ordinary (4 × 4-bit) S-box. The key selected S-box differs from a DES S-box in that it has a second input parameter, the key parameter. Although both the key parameter and the two row bits of a DES S-box serve as an index, they behave differently, for example:
(i) When applying differential cryptanalysis [6] at the S-box level, for a DES S-box we can generate its difference distribution table and use it under the general assumption that data is distributed uniformly at random. For a key selected S-box, we can generate the difference distribution tables of the associated ordinary S-boxes. However, we have to guess the value of the key parameter to determine which difference distribution table applies. The key parameter is fixed once a user key is provided, and thus it is not uniformly distributed over the data processed under that user key.
(ii) When applying differential cryptanalysis at the cipher level, the differential behaviour of the rounds of a cipher using a DES-like S-box is simply an iteration of the differential behaviour of one round, since data is distributed uniformly at random. For a cipher using a key selected S-box, we can guess the values of the key parameters of a few rounds, but each such guess shrinks the space of possible user keys. After a number of rounds this space becomes very small or empty, at which point further cryptanalysis of the cipher no longer makes sense. In short, like the key-dependent S-box, the key selected S-box makes differential cryptanalysis harder for an attacker than an ordinary S-box does. The same holds for linear cryptanalysis [7], multiple differential cryptanalysis, multiple linear cryptanalysis, multidimensional linear cryptanalysis, etc.
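Points (i) and (ii) can be illustrated with a short Python sketch. For (i), it computes one difference distribution table per selectable S-box, so the probability of an S-box differential is only known once V is fixed. For (ii), it uses a deliberately simplified accounting in which each active key selected S-box fixes v fresh, independent selection bits of the user key. The two placeholder tables (PRESENT and its inverse, v = 1 as in Feistel's two-state S-boxes) and this independence assumption are ours. In a real cipher, the selection bits come from round subkeys derived from the user key and need not be independent.

```python
from typing import List, Sequence

# Hypothetical placeholder S-boxes (NOT from this paper): v = 1, two states.
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PRESENT_INV = [PRESENT.index(y) for y in range(16)]
TABLES = [PRESENT, PRESENT_INV]


def ddt(sbox: Sequence[int]) -> List[List[int]]:
    """Difference distribution table: ddt[dx][dy] = #{x : S(x) ^ S(x ^ dx) = dy}."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for dx in range(n):
        for x in range(n):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table


# (i) One DDT per value of the selection vector V.
DDTS = [ddt(t) for t in TABLES]


def differential_probability(dx: int, dy: int, v: int) -> float:
    """Probability of dx -> dy through the S-box selected by V = v."""
    return DDTS[v][dx][dy] / 16


# (ii) Simplified model (assumption): every active key selected S-box in a
# characteristic fixes v fresh, independent selection bits of the user key.
def remaining_key_bits(key_bits: int, active_per_round: List[int],
                       v: int = 1) -> List[int]:
    remaining, history = key_bits, []
    for active in active_per_round:
        remaining = max(0, remaining - v * active)
        history.append(remaining)
    return history


print([differential_probability(0x1, dy, v) for v in (0, 1) for dy in (0x3, 0x9)])
print(remaining_key_bits(96, [2] * 25, v=2))
```

Under this toy model, a characteristic with two active S-boxes per round and v = 2 exhausts a 96-bit key after 24 rounds. This mirrors the argument in (ii) without claiming anything about the actual LBC key schedule.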

A Comparison with Lucifer S-Box Mechanism.
The DES precursor Lucifer [15] also uses a key bit to control which of its two S-boxes is used, as follows. Suppose $S_0$ and $S_1$ are two 4-bit S-boxes, X and Y are 4-bit nibbles, and some key bit is a so-called Interchange Control Bit (ICB). When ICB is equal to 0, X goes through $S_0$ and Y goes through $S_1$; when ICB is equal to 1, X goes through $S_1$ and Y goes through $S_0$.
Security and Communication Networks
The Lucifer S-box mechanism differs from the key selected S-box mechanism, as is best illustrated by the simple example with only two S-boxes in Figure 1. In the Lucifer S-box mechanism, the S-box assignments of X and Y are dependent: if X goes through $S_0$, then Y must go through $S_1$, and vice versa. In the key selected S-box mechanism, however, whether X goes through $S_0$ or $S_1$ is independent of whether Y goes through $S_0$ or $S_1$, since the two selection key bits are independent. In this simple example, the Lucifer S-box mechanism can produce two possible output patterns, $(S_0(X), S_1(Y))$ and $(S_1(X), S_0(Y))$. The key selected S-box mechanism can produce four: $(S_0(X), S_1(Y))$, $(S_1(X), S_1(Y))$, $(S_1(X), S_0(Y))$, and $(S_0(X), S_0(Y))$. There are two further, smaller distinctions. (1) The relative positions of X and Y are variable in the output of the Lucifer S-box mechanism but fixed in the output of the key selected S-box mechanism. (2) The relative positions of $S_0$ and $S_1$ are fixed in the output of the Lucifer S-box mechanism but indeterminate in the output of the key selected S-box mechanism.
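The pattern counts in the two-S-box example can be checked by enumeration. In this small Python sketch, each pattern records which S-box is applied to X and which to Y. The function names are illustrative, and the S-boxes are treated symbolically.

```python
from itertools import product
from typing import Set, Tuple

Pattern = Tuple[str, str]  # (S-box applied to X, S-box applied to Y)


def lucifer_patterns() -> Set[Pattern]:
    # A single interchange control bit (ICB) decides both assignments jointly.
    return {("S0", "S1") if icb == 0 else ("S1", "S0") for icb in (0, 1)}


def key_selected_patterns() -> Set[Pattern]:
    # Two independent selection bits, one per S-box position.
    return {(f"S{bx}", f"S{by}") for bx, by in product((0, 1), repeat=2)}


print(len(lucifer_patterns()), len(key_selected_patterns()))  # 2 4
```

With n positions and two S-boxes, the same enumeration gives $2^n$ patterns for independent selection bits, while a single shared control bit always yields only two.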

A Comparison with Key-Dependent S-Box Layers.
In 1994, when discussing how to strengthen the DES block cipher, Biham and Biryukov [14] mentioned the idea of using several sets of S-boxes (for the S-box layer of the DES round function) and using additional key bits to control which set is used (in an encryption/decryption operation). They wrote that 'One can compute several different sets of S-boxes according to the design principles of DES and use additional key bits to control which set is used.' In 1999, Harris and Adams [16] mentioned a slightly different idea, which uses several S-boxes in a key-dependent order (also for the S-box layer of the round function of a cipher). They wrote 'Another possibility is to order the s-boxes in a key-dependent way.' However, neither Biham and Biryukov nor Harris and Adams implemented their idea, and both noted that the security gain is small when the number of S-boxes (or sets of S-boxes) is small. Specifically, Biham and Biryukov mentioned that the scheme is strengthened by a factor smaller than the number of sets of S-boxes. Harris and Adams mentioned that it is not particularly useful with only four S-boxes, since there are only 4! = 24 possible orders, adding less than five bits of entropy to the key space. In other words, Biham and Biryukov's and Harris and Adams's mechanisms would require a large number of S-boxes to produce a large security gain in practice.
Compared with Biham and Biryukov's and Harris and Adams's mechanisms [14,16], the (generalised) key selected S-box mechanism can produce a much larger security gain with a small number of S-boxes, at the expense of relatively more overhead. Below, we discuss other similarities and differences between the (generalised) key selected S-box mechanism and Biham and Biryukov's and Harris and Adams's mechanisms.
These similarities and differences are best illustrated by the typical example in Figure 2. There, $S_0$, $S_1$, $S_2$, and $S_3$ are four S-boxes of the same size, K is a user key, and $K_1, K_2, \ldots, K_m$ are round keys for some positive integer m.
(i) Storage space required: Biham and Biryukov's mechanism [14] requires storing a number of sets of permuted S-boxes for the S-box layer of the round function. Harris and Adams's mechanism and the (generalised) key selected S-box mechanism require storing only a number of individual S-boxes for that layer. Thus, Biham and Biryukov's mechanism generally requires more storage space than Harris and Adams's mechanism and the key selected S-box mechanism.
(ii) The number of choices for the S-box layer of the round function of a cipher: Biham and Biryukov's mechanism produces as many choices as there are sets of permuted S-boxes stored. Harris and Adams's mechanism produces all possible permutations of the stored S-boxes, and the key selected S-box mechanism produces all possible patterns of the stored S-boxes.
Biham and Biryukov's and Harris and Adams's mechanisms use the same S-box layer for all rounds of a cipher, while the key selected S-box mechanism likely uses different S-box layers for different rounds. This significantly increases the security gain, albeit at the expense of relatively more implementation overhead under the same number of rounds. Nevertheless, the security gain can allow for a reduced number of rounds, so better overall performance may be possible, depending on the specific cipher design.
(v) When implemented in parallel hardware with one cycle per round, the key selected S-box mechanism generally requires slightly more hardware area (or GEs) than counterparts using Biham and Biryukov's or Harris and Adams's mechanisms. However, when implemented in serial hardware, the key selected S-box mechanism may produce a more compact implementation, depending on how far the security gain allows the number of rounds to be reduced.
In particular, consider again the typical example in Figure 2. Biham and Biryukov mentioned that the security gain is small (i.e., 2 bits of entropy) when the number of sets of S-boxes is small. Harris and Adams mentioned that their mechanism is not particularly useful with only four S-boxes, since there are only 4! = 24 possible orders, adding less than five bits of entropy to the key space. In contrast, the key selected S-box mechanism can produce a much larger security gain even with as few as four S-boxes; the specific gain depends on the cipher design to which the mechanism is applied.
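The entropy figures quoted above follow from simple counting, sketched below in Python. We assume the Figure 2 example has four S-box positions per round; eight positions correspond to LBC's S-box layer. The key selected counts are per-round numbers of S-box layer configurations, expressed in bits. Because the selection bits are drawn from the key, these counts show only how many layer choices exist; they do not add entropy beyond the key length.

```python
import math

# Bits = log2(number of S-box layer configurations), four S-boxes available.
bb_bits = math.log2(4)                  # Biham-Biryukov: one of 4 S-box sets
ha_bits = math.log2(math.factorial(4))  # Harris-Adams: 4! = 24 orders


def key_selected_bits(positions: int, num_sboxes: int = 4) -> float:
    """Each position independently selects one of num_sboxes S-boxes (per round)."""
    return positions * math.log2(num_sboxes)


print(bb_bits, round(ha_bits, 3), key_selected_bits(4), key_selected_bits(8))
# 2.0 4.585 8.0 16.0
```

The key selected count grows linearly in the number of positions, and it can differ from round to round. The other two mechanisms fix a single layer for the whole cipher.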
3.6. Summary. In summary, the key selected S-box is similar to, but distinct from, existing close notions. It is simple to construct from ordinary S-boxes, and it can produce a greater security improvement. A modern block cipher can gain extra security by using the (generalised) key selected S-box mechanism, and it can gain better performance by reducing the number of rounds according to that extra security. This holds as long as the overhead of the key selected S-box mechanism relative to the ordinary S-box mechanism is negligible compared with the gain resulting from the reduced number of rounds.

The LBC Block Cipher
In this section, we specify the LBC block cipher, which employs a Feistel structure with a 64-bit block size, a variable-length user key of 96 to 128 bits, and a total of 25 rounds. LBC takes advantage of the key selected S-box mechanism to achieve good security and performance. It uses two elementary operations and involves three subalgorithms, namely, a key schedule algorithm, an encryption algorithm, and a decryption algorithm.
Below, we first describe the two elementary operations used in LBC, then the round function, the key schedule algorithm, the encryption algorithm, the decryption algorithm, and finally several test vectors of LBC.

Elementary Operations.
LBC mainly uses two elementary operations, a confusion operation S and a diffusion operation L, which are defined as follows. S is a nonlinear substitution operation, constructed by applying a key selected S-box $S: \{0,1\}^4 \times \{0,1\}^2 \to \{0,1\}^4$ eight times. The four general 4 × 4-bit S-boxes involved in the key selected S-box S are $S_0$, $S_1$, $S_2$, and $S_3$. We chose them according to the most recent work on 4-bit optimal S-boxes by Zhang et al. [17], and their specifications are given in Table 1. If $X = (X_3, X_2, X_1, X_0)$ is a 32-bit block represented as four bytes, which are further arranged as a 4 × 8-bit array:

Round Function.
The round function of LBC is built mainly on the nonlinear substitution operation S and the linear diffusion operation L; it takes two 32-bit blocks as inputs and outputs a 32-bit block.

Key Schedule Algorithm.
The key schedule algorithm of LBC takes a k-bit user key as input and outputs the required twenty-five 32-bit round subkeys, where k can be any value from 96 to 128 and is typically 96.
The key schedule algorithm is as follows:
(1) Store the k-bit user key in a key register K: $K = (K_{(k-1)}, \ldots, K_{(1)}, K_{(0)})$.
(2) Output the leftmost 32 bits of the current content of the key register K as the first round subkey $K_1$.
(3) For i = 1 to 24:
(a) Rotate the key register to the left by 29 bits, that is, K = (K ⋘ 29).
(b) Update the leftmost 32 bits of the key register as follows:
where 〈k〉 and 〈i〉 represent, respectively, the binary representations of k and i, left-padded with zeros to the required bit length.
(c) Output the leftmost 32 bits of the current content of the key register K as the (i + 1)th round subkey $K_{i+1}$.
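The register handling in steps (1) to (3) can be sketched in Python as follows. This is only a skeleton. The step (b) update is represented by `update`, a hypothetical caller-supplied function standing in for the actual LBC formula. The identity update used in the example merely exercises the rotation and subkey extraction and is not LBC's key schedule.

```python
from typing import Callable, List

MASK32 = 0xFFFFFFFF
UpdateFn = Callable[[int, int, int], int]  # (leftmost 32 bits, k, i) -> new 32 bits


def rotl(value: int, r: int, width: int) -> int:
    """Rotate a width-bit value left by r bits."""
    r %= width
    mask = (1 << width) - 1
    return ((value << r) | (value >> (width - r))) & mask


def key_schedule_skeleton(user_key: int, k: int, update: UpdateFn) -> List[int]:
    """Register handling of steps (1)-(3). `update` is a hypothetical placeholder
    for the step (b) formula and is NOT LBC's actual update function."""
    if not (96 <= k <= 128 and 0 <= user_key < (1 << k)):
        raise ValueError("expected a k-bit user key with 96 <= k <= 128")
    low_bits = k - 32
    reg = user_key                                      # (1) key register K
    subkeys = [reg >> low_bits]                         # (2) K_1 = leftmost 32 bits
    for i in range(1, 25):                              # (3) i = 1, ..., 24
        reg = rotl(reg, 29, k)                          # (a) K = K <<< 29
        top = update(reg >> low_bits, k, i) & MASK32    # (b) placeholder update
        reg = (top << low_bits) | (reg & ((1 << low_bits) - 1))
        subkeys.append(reg >> low_bits)                 # (c) K_{i+1}
    return subkeys


subkeys = key_schedule_skeleton(1 << 95, 96, lambda top, k, i: top)
print(len(subkeys), hex(subkeys[0]))  # 25 0x80000000
```

Because the register width is k itself, one routine serves every key length from 96 to 128 bits. This matches the paper's point that the same algorithm is used for all key lengths.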
Note that the key schedule uses the key selected S-box in an unconventional way: the selection vector of the key selected S-box is the key length rather than key material.

Encryption Algorithm.
The encryption algorithm of LBC transforms a 64-bit data block, called a plaintext (block), into a pseudorandom data block of the same length, called a ciphertext (block), under the control of a secret user key. The encryption algorithm takes as input a 64-bit plaintext block p and has a total of 25 rounds. The encryption procedure is as follows. Here, $L_j$ and $R_j$ ($j = 0, 1, \ldots, 25$) are 32-bit variables, and $K_i$ ($1 \le i \le 25$) is a round subkey generated from the user key by the key schedule algorithm of LBC.

Decryption Algorithm.
The decryption algorithm of LBC is the inverse of the encryption algorithm: it decrypts a ciphertext to recover the original plaintext under the control of the same user key as in the encryption process. It takes a 64-bit ciphertext block C as input and works as follows:

Design Rationale of LBC
In this section, we give our design rationale for the structure, parameters, and components of LBC. At a high level, the LBC example cipher has the following distinctive features:
(1) The novel notion of the key selected S-box is used to achieve good performance and sufficient security.
(2) The Feistel structure is combined with simple substitution and permutation operations to enable an efficient hardware implementation with a moderate number of GEs, as well as an efficient software implementation.
(3) The same key schedule algorithm and the same encryption and decryption algorithms are used for all key lengths of a variable-length user key, providing user friendliness and efficient resource utilization.
(4) The strong key schedule helps ensure that data authenticity is robustly provided when LBC is used to build, or is repurposed as, a hash function in some applications.

Structure.
There are two main types of structures for iterated block ciphers: the Feistel structure and the Substitution-Permutation Network (SPN) structure.
LBC has a Feistel structure. Compared with an SPN structure, a Feistel structure with the same block size has the following merits:
(1) There is more flexibility in designing its round function; for example, the linear or S-box operation does not need to be invertible.
(2) The round function is generally lighter, partly because it operates on fewer bits.
(3) Implementing a circuit for both encryption and decryption costs little more than implementing one for encryption only, as decryption is (almost) identical to encryption. By contrast, for an SPN structure we need to implement both the round function and its inverse to support encryption and decryption.
Admittedly, a Feistel structure may need more rounds to be secure, but this is not always the case. For example, the Feistel block cipher LBlock [18] has a number of rounds comparable to that of the SPN block cipher PRESENT [19] and has resisted extensive cryptanalysis. Moreover, LBC uses the novel notion of the key selected S-box as well as a good diffusion operation to achieve additional security protection.
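Merits (1) and (3) can be seen in a generic Feistel network, sketched below in Python. The round function `toy_round` is a hypothetical, deliberately non-invertible stand-in, not LBC's F. The round layout (no final swap) is also our assumption rather than the exact LBC procedure. Decryption reuses the encryption routine unchanged: swap the halves, run the rounds with the subkeys in reverse order, and swap back.

```python
import random
from typing import Callable, List

MASK32 = 0xFFFFFFFF
RoundFunction = Callable[[int, int], int]


def toy_round(x: int, k: int) -> int:
    """Hypothetical, deliberately NON-invertible round function (not LBC's F):
    the mask discards half of the bits of x ^ k."""
    return (((x ^ k) & 0x0F0F0F0F) * 0x11) & MASK32


def feistel_encrypt(block: int, subkeys: List[int], f: RoundFunction = toy_round) -> int:
    left, right = block >> 32, block & MASK32
    for k in subkeys:
        left, right = right, left ^ f(right, k)  # (L, R) -> (R, L xor F(R, K))
    return (left << 32) | right


def swap_halves(block: int) -> int:
    return ((block & MASK32) << 32) | (block >> 32)


def feistel_decrypt(block: int, subkeys: List[int], f: RoundFunction = toy_round) -> int:
    # Same routine as encryption; only the half order and subkey order change.
    return swap_halves(feistel_encrypt(swap_halves(block), subkeys[::-1], f))


subkeys = [random.getrandbits(32) for _ in range(25)]  # 25 rounds, as in LBC
plaintext = 0x0123456789ABCDEF
print(feistel_decrypt(feistel_encrypt(plaintext, subkeys), subkeys) == plaintext)  # True
```

Even though `toy_round` maps many inputs to the same output, decryption succeeds. This is why a Feistel design can use a non-invertible round function and share one circuit for encryption and decryption.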

Block Size.
In practice, a general-purpose block cipher typically has a block size of 64 or 128 bits. LBC uses a block size of 64 bits to meet the memory, space, and performance requirements of moderate application environments. A 64-bit block size may be too short for some applications because of the birthday paradox, but it remains adequate for many applications when used with appropriate block cipher modes of operation.
Table 1: The 4 × 4-bit S-boxes $S_0$, $S_1$, $S_2$, and $S_3$ of LBC, where the values are in hexadecimal notation.

Key Length.
In 2001, Lenstra and Verheul [20] estimated that, for a symmetric cipher, an 80-bit key can provide a security margin until around 2012, a 96-bit key until 2034, and a 128-bit key until 2076. NIST recommended not using an 80-bit key in 2010 and disallowed 80-bit keys in 2012. In 2012, the European ECRYPT II project gave the following security levels for symmetric ciphers. An 80-bit key provides "Very short-term protection against agencies" and "≤ 4 years protection." A 96-bit key provides "Legacy standard level" and "≈ 10 years protection." A 128-bit key provides "Long-term protection" and "≈ 30 years (protection)". As Dinur [21] noted in 2015, the Bitcoin network [22] demonstrated that a computation of $2^{80}$ (cipher encryption operations) is (marginally) practical. In short, an 80-bit key is now considered too short to be secure in practice. When designing the LBC cipher, we use a minimum key length of 96 bits for short-term protection and a maximum key length of 128 bits for long-term protection. To be flexible and user friendly, LBC accepts a variable-length user key, so the user can choose any key length between 96 and 128 bits. A key shorter than 96 bits may be used, but we do not recommend it. For example, a 112-bit user key may be used for medium-term protection. A variable key length gives the user more flexibility to choose an appropriate key length according to the expected lifetime of the security application concerned, so as not to waste computing and hardware resources.

S-Box Layer.
In practice, a general-purpose block cipher typically uses an 8 × 8-bit S-box, and a lightweight block cipher typically uses a 4 × 4-bit S-box. The smaller S-box meets the memory and space requirements of lightweight application environments, since a 4 × 4-bit S-box is generally more compact in hardware than an 8 × 8-bit S-box. The PRESENT block cipher uses a 4 × 4-bit S-box based on Leander and Poschmann's work [23]. However, this S-box has a weak property with respect to linear cryptanalysis: there are a number of valid combinations of a one-bit input mask and a one-bit output mask [24]. In 2015, Zhang et al. [17] studied 4 × 4-bit optimal S-boxes under additional security criteria and presented three classes of 4 × 4-bit optimal S-boxes. In these classes, the number of valid combinations of one-bit input difference and one-bit output difference is x, and the number of valid combinations of one-bit input mask and one-bit output mask is 4 − x, where x ∈ {0, 1, 2}.
LBC applies a key selected S-box S, based on four ordinary 4 × 4-bit S-boxes $S_0$, $S_1$, $S_2$, and $S_3$, eight times to build the S-box layer S of its round function F, and it uses the same S-box layer in all 25 rounds.
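A possible shape for this S-box layer is sketched below in Python. The actual assignment of selection bits to nibble positions is not given here. As a hypothetical choice, we let nibble j (bits 4j to 4j + 3) be selected by bits 2j and 2j + 1 of a 16-bit selection word. The four tables are placeholders derived from the PRESENT S-box, not the S-boxes of Table 1.

```python
# Hypothetical placeholder S-boxes (NOT LBC's Table 1).
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PRESENT_INV = [PRESENT.index(y) for y in range(16)]
SQUARE = [PRESENT[PRESENT[x]] for x in range(16)]
SQUARE_INV = [SQUARE.index(y) for y in range(16)]
TABLES = [PRESENT, PRESENT_INV, SQUARE, SQUARE_INV]


def sbox_layer(x: int, sel: int) -> int:
    """Apply the key selected S-box to each of the eight nibbles of a 32-bit x.

    Hypothetical bit assignment: nibble j (bits 4j..4j+3) is processed by the
    S-box chosen by bits 2j..2j+1 of the 16-bit selection word `sel`.
    """
    out = 0
    for j in range(8):
        nibble = (x >> (4 * j)) & 0xF
        v = (sel >> (2 * j)) & 0x3
        out |= TABLES[v][nibble] << (4 * j)
    return out


print(hex(sbox_layer(0, 0x0000)), hex(sbox_layer(0, 0x5555)))  # 0xcccccccc 0x55555555
```

With two selection bits per nibble, 16 selection bits choose among $4^8 = 2^{16}$ S-box layers. For every fixed selection word the layer is a permutation of the 32-bit input, because each selected table is bijective.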
From Zhang et al.'s (2, 2)-Num1-DL category of 4 × 4-bit optimal S-boxes, we further chose each 4 × 4-bit S-box $S_i$ (0 ≤ i ≤ 3) according to the following additional security criterion:
(i) The two valid combinations of one-bit input difference and one-bit output difference do not share an input or output difference, and the two valid combinations of one-bit input mask and one-bit output mask do not share an input or output mask.
Here, a one-bit input difference (or mask) is one whose binary representation has exactly one bit set to 1 and zeros everywhere else. The same convention applies throughout the rest of this paper, although we do not restate it.
That is, we use the following security criteria in total:
(1) The S-box is bijective, that is, $S(x) \neq S(y)$ for all $x \neq y \in \{0,1\}^4$.
(2) The S-box has no fixed point, that is, $S(x) \neq x$ for all $x \in \{0,1\}^4$.
(5) The number of valid combinations of one-bit input difference and one-bit output difference is 2, and the number of valid combinations of one-bit input mask and one-bit output mask is also 2.
(6) Each valid combination of one-bit input difference and one-bit output difference has the smallest (valid) probability, that is, 1/8 for a 4 × 4-bit S-box.
(7) Each valid combination of one-bit input mask and one-bit output mask has the smallest (valid) bias, that is, ±1/8 for a 4 × 4-bit S-box.
(8) The two valid combinations of one-bit input difference and one-bit output difference do not share an input or output difference.
(9) The two valid combinations of one-bit input mask and one-bit output mask do not share an input or output mask.
The four ordinary 4 × 4-bit S-boxes $S_0$, $S_1$, $S_2$, and $S_3$ together meet the following security criteria:
(10) Ideally, no two of the 4 × 4-bit S-boxes share a valid combination of one-bit input difference and one-bit output difference, or of one-bit input mask and one-bit output mask.
(11) Ideally, no two of the 4 × 4-bit S-boxes simultaneously have the largest (valid) probability (i.e., 1/4) for the same (input difference, output difference) pair. Likewise, no two simultaneously have the largest (valid) bias (i.e., ±1/4) for the same (input mask, output mask) pair.
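Most of the criteria above can be checked mechanically. The Python sketch below is one way to do this under our own conventions. A one-bit combination counts as valid when its DDT entry or bias is nonzero, biases are computed as Pr − 1/2 using exact fractions, and the dictionary keys and helper names are hypothetical. The largest DDT probability and largest absolute bias are reported to help with criterion (11).

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Set, Tuple

ONE_BIT = (0x1, 0x2, 0x4, 0x8)  # 4-bit values with exactly one bit set
Pair = Tuple[int, int]


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def ddt(s: Sequence[int]) -> List[List[int]]:
    """ddt[dx][dy] = #{x : S(x) ^ S(x ^ dx) = dy} for a 4 x 4-bit S-box."""
    table = [[0] * 16 for _ in range(16)]
    for dx in range(16):
        for x in range(16):
            table[dx][s[x] ^ s[x ^ dx]] += 1
    return table


def bias(s: Sequence[int], a: int, b: int) -> Fraction:
    """Bias Pr[a.x = b.S(x)] - 1/2 of the linear approximation (a, b)."""
    hits = sum(parity(a & x) == parity(b & s[x]) for x in range(16))
    return Fraction(hits, 16) - Fraction(1, 2)


def disjoint(pairs: List[Pair]) -> bool:
    """True if no two pairs share an input value or an output value."""
    return (len({a for a, _ in pairs}) == len(pairs)
            and len({b for _, b in pairs}) == len(pairs))


def check_criteria(s: Sequence[int]) -> Dict[str, object]:
    d = ddt(s)
    diffs = [(a, b) for a in ONE_BIT for b in ONE_BIT if d[a][b]]
    masks = [(a, b) for a in ONE_BIT for b in ONE_BIT if bias(s, a, b)]
    return {
        "bijective": sorted(s) == list(range(16)),                        # (1)
        "no_fixed_point": all(s[x] != x for x in range(16)),              # (2)
        "one_bit_diffs": diffs,                                            # (5)
        "one_bit_masks": masks,                                            # (5)
        "one_bit_diff_probs": [Fraction(d[a][b], 16) for a, b in diffs],   # (6)
        "one_bit_mask_biases": [bias(s, a, b) for a, b in masks],          # (7)
        "diffs_disjoint": disjoint(diffs),                                 # (8)
        "masks_disjoint": disjoint(masks),                                 # (9)
        "max_diff_prob": max(Fraction(d[a][b], 16)
                             for a in range(1, 16) for b in range(16)),
        "max_abs_bias": max(abs(bias(s, a, b))
                            for a in range(1, 16) for b in range(1, 16)),
    }


def shared_one_bit(s1: Sequence[int], s2: Sequence[int]) -> Tuple[Set[Pair], Set[Pair]]:
    """Criterion (10): one-bit combinations that are valid for both S-boxes."""
    c1, c2 = check_criteria(s1), check_criteria(s2)
    return (set(c1["one_bit_diffs"]) & set(c2["one_bit_diffs"]),
            set(c1["one_bit_masks"]) & set(c2["one_bit_masks"]))
```

Running `check_criteria` on each candidate S-box, and `shared_one_bit` on each pair, gives a quick filter over a candidate class such as Zhang et al.'s (2, 2)-Num1-DL category before any manual review.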

Diffusion Layer.
The diffusion layer L has a branch number [25] of 4. Together with the S-box layer, this provides a sufficiently large avalanche effect to make LBC secure against currently known cryptanalysis techniques such as differential and linear cryptanalysis. L uses only simple operations (namely, rotations and XORs), so it is very lightweight and suitable for both hardware and software implementations.

Key Schedule.
To achieve good performance, many lightweight or moderate block ciphers use a simple key schedule, for example, HIGHT [26] and PRESENT. In particular, the style of PRESENT's key schedule was followed by many subsequent lightweight block ciphers such as LBlock [18] and RECTANGLE [17]. However, full-round HIGHT was shown in 2011 to be vulnerable to a related-key [27,28] attack [29], and full-round PRESENT was shown in 2015 to be vulnerable to a known-key distinguisher [30,31], mainly because of their simple key schedules. In practice, a block cipher may be used to build, or sometimes be repurposed as, a hash function for data authenticity to save hardware space. In that setting, the unknown key parameter of the block cipher corresponds to the known message parameter of the hash function. Thus, HIGHT and PRESENT are not suitable for this case because of the related-key and known-key cryptanalysis results, and the known-key distinguisher on full PRESENT raises a security concern for PRESENT-based hash functions.
We aim to design a strong key schedule for LBC so that LBC can resist key schedule attacks and can be used to build (or be abused as) a hash function that provides data authenticity in some devices, since confidentiality without authenticity is usually insufficient for real-life applications. (Note that a 64-bit digest may be too short for some applications, since the birthday bound is 2^32; however, it is practically acceptable for many real-life applications.)
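The remark about the 2^32 birthday bound can be quantified with the standard birthday approximation, assuming that digests behave like uniformly random 64-bit values (an idealisation). The helper name below is hypothetical.

```python
import math


def collision_probability(num_digests, digest_bits=64):
    """Approximate probability of at least one collision among num_digests
    uniformly random digest_bits-bit values: 1 - exp(-q(q-1) / 2^(b+1))."""
    exponent = num_digests * (num_digests - 1) / 2 / 2 ** digest_bits
    return -math.expm1(-exponent)   # expm1 keeps precision for tiny exponents


for log_q in (16, 24, 32, 36):
    print(f"2^{log_q} digests: collision probability ~ {collision_probability(2 ** log_q):.3g}")
```

Around 2^32 digests the collision probability is already close to 0.39, which is why a 64-bit digest is only acceptable when an attacker cannot gather that many hashes.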
The key schedule of LBC is based on the round function, so as to achieve good nonlinearity and save some hardware area; together with the encryption and decryption procedures of LBC, it makes LBC secure against related-key cryptanalysis [27,28,32] as well as slide attacks [33,34]. Once the user fixes the key length parameter k, the ordinary S-box used for the key selected S-box in the key schedule is easily determined (in other words, the key selected S-box becomes an ordinary S-box), and the order of the ordinary S-boxes in the S-box layer can also be easily determined; this yields a determinate S-box layer and thus a simple hardware implementation.
The key schedule of LBC is user-friendly in several respects. First, a variable-length user key gives the user more flexibility to choose an appropriate key length according to the expected lifetime of the target application, so as not to waste computing and hardware resources. Second, the key schedule uses the same algorithm for all key lengths, unlike most existing block ciphers, which usually use different key schedule algorithms for different key lengths (if they support several); this makes it easy, for example, to build one hardware implementation for different key-length versions. Third, as computing power increases over time, it is often necessary to upgrade to a larger key length when the current one becomes insufficient after some period of use. A variable-length key enables an LBC user to upgrade to exactly the required key length, so as to use hardware resources efficiently instead of having to upgrade to a prespecified key length much larger than required. For example, PRESENT, published in 2007, accepts only 80- and 128-bit user keys. Since an 80-bit key is now considered too short, as mentioned in Section 5.3, a PRESENT deployment with an 80-bit key should be upgraded now, even though PRESENT was published not long ago; however, a 128-bit key may be too long, and thus wasteful, for many lightweight security applications.
The idea of using a variable-length key for LBC is motivated by the general-purpose block ciphers Serpent [35] and SHACAL-2 [36], but LBC processes a variable-length key differently. Serpent and SHACAL-2 extend a shorter user key to the maximum key length by appending as many zeros as required, or a one followed by as many zeros as required, and thus hardly distinguish between different key-length versions. LBC does not extend a shorter user key to the maximum key length; instead, it distinguishes different key-length versions by involving the key length parameter in the key schedule, to avoid potential key-schedule attacks.
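The following Python sketch illustrates, under simplifying assumptions, why not distinguishing key-length versions can matter. It models two hypothetical key-extension rules loosely based on the description above (zeros only, or a one followed by zeros, both placed above the short key) and a hypothetical length-aware rule that keeps k as key schedule input. None of these functions is the actual Serpent, SHACAL-2, or LBC key processing.

```python
# Hypothetical key-extension rules, for illustration only (not the actual
# Serpent, SHACAL-2, or LBC key processing).
MAX_KEY_BITS = 128


def zero_extend(key, k):
    """Extend a k-bit key to MAX_KEY_BITS with leading zeros; k is discarded."""
    assert 0 <= key < 2 ** k <= 2 ** MAX_KEY_BITS
    return key


def one_then_zeros(key, k):
    """Put a single 1 bit just above a short key, then zeros; k is discarded."""
    assert 0 <= key < 2 ** k <= 2 ** MAX_KEY_BITS
    return key if k == MAX_KEY_BITS else key | (1 << k)


def length_aware(key, k):
    """Keep k as part of the key schedule input (abstract model of the LBC idea)."""
    assert 0 <= key < 2 ** k <= 2 ** MAX_KEY_BITS
    return (k, key)


key96 = 0x0123456789ABCDEF01234567            # some 96-bit key
key128 = key96                                 # the 128-bit key 0^32 || key96
key128b = (1 << 96) | key96                    # the 128-bit key 0^31 || 1 || key96

print(zero_extend(key96, 96) == zero_extend(key128, 128))          # True
print(one_then_zeros(key96, 96) == one_then_zeros(key128b, 128))   # True
print(length_aware(key96, 96) == length_aware(key128, 128))        # False
```

Under either extension rule, a 96-bit key and a particular 128-bit key feed identical material to the key schedule, whereas keeping k in the input separates the two versions.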

Security Gain Evaluation
In this section, we briefly give our evaluation results on the security of LBC against a list of advanced cryptanalysis techniques (under worst-case assumptions) and then derive the security gain of LBC over the LBC version with the ordinary S-box mechanism. We develop conservative frameworks for analysing the security of the key selected S-box mechanism against differential and linear cryptanalysis. Recall that, like most block cipher designs, we consider only the black-box security of the algorithm and not its grey-box security, such as resistance to side-channel attacks [11], which usually assume a more powerful attacker and need additional countermeasures. Note first that LBC uses a user key of at least 96 bits and can withstand elementary cryptanalysis methods. We start with two properties of LBC.

6.1. Properties of LBC.
A simple analysis of the L operation reveals the following property.

Property 1.
For the L operation, if the input X and the output Y = L(X) are each represented as eight 4-bit nibbles corresponding to the eight S-boxes, that is, X = (X_7, ..., X_1, X_0) and Y = (Y_7, ..., Y_1, Y_0), then
(1) The eight 4-bit nibbles of the output Y can be expressed with the eight 4-bit nibbles of the input X as follows:
(2) The eight 4-bit nibbles of the input X can be expressed with the eight 4-bit nibbles of the output Y as follows:
A more detailed investigation reveals the following property.

Property 2.
The propagation of a single bit:
(i) After 3 rounds, a single bit involves at least 62 subkey bits, depending on the bit position.
(ii) After 4 rounds, a single bit involves at least 92 subkey bits and about 60 output bits, depending on the bit position.
(iii) After 5 rounds, a single bit involves all 96 subkey bits and all 64 output bits.
The detailed numbers of involved subkey bits for (i) and (ii) are given in Table 2, in comparison with the corresponding numbers under the ordinary mechanism.
The numbers of disjoint subkey bits involved in the propagation of a single nibble position through 3 and 4 rounds are summarised in Table 2 and briefly illustrated in Figures 4-11.

6.2. Differential Cryptanalysis.
As mentioned in Section 3.3, a key selected S-box makes it difficult for an attacker to apply differential cryptanalysis. Nevertheless, we develop a conservative framework for the differential cryptanalysis of block ciphers using a key selected S-box. We begin by introducing the concept of the combined difference distribution (CDD) table for a key selected S-box as follows.

Definition 2.
The combined difference distribution (CDD) table for a key selected S-box mapping m-bit inputs to q-bit outputs (for given values of the parameters m, v, and q) is a table with 2^m rows indexed by the 2^m possible input differences and 2^q columns indexed by the 2^q possible output differences, whose (i, j)th entry is the set of possible combinations (N, c), where N is the number of m-bit inputs satisfying the input and output difference pair (i, j) under an ordinary m × q-bit S-box, and c is the number of ordinary m × q-bit S-boxes that have exactly N such inputs for (i, j); here, 0 ≤ i ≤ 2^m − 1 and 0 ≤ j ≤ 2^q − 1.
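A minimal Python sketch of Definition 2 follows, assuming each ordinary S-box is given as a lookup list. Because the four LBC S-boxes S_0, ..., S_3 are not reproduced in this section, the example combines two hypothetical stand-ins, the PRESENT S-box and its inverse. Each entry is a set of pairs (N, c), so that for four combined S-boxes entry (0, 0) would be {(16, 4)}.

```python
from collections import Counter


def ddt(sbox):
    """ddt[a][b] = number of inputs x with S(x) ^ S(x ^ a) == b."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for a in range(n):
        for x in range(n):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def cdd(sboxes):
    """Combined difference distribution table of a key selected S-box whose
    selectable states are the ordinary S-boxes in `sboxes`. Entry (i, j) is the
    set of pairs (N, c): N inputs satisfy (i, j) under an ordinary S-box, and
    c of the ordinary S-boxes have exactly that N."""
    tables = [ddt(s) for s in sboxes]
    n = len(sboxes[0])
    return [[set(Counter(t[i][j] for t in tables).items()) for j in range(n)]
            for i in range(n)]


PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PRESENT_INV = [PRESENT.index(y) for y in range(16)]
table = cdd([PRESENT, PRESENT_INV])
print(table[0][0])  # {(16, 2)}

# Largest count outside (0, 0): 4 of 16 inputs, i.e. probability 2^-2.
max_count = max(N for i in range(16) for j in range(16) if (i, j) != (0, 0)
                for N, _ in table[i][j])
print(max_count)  # 4
```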
As an example, we compute the CDD table for the key selected S-box S used in LBC, which is given as Table 3. Each entry except (0, 0) has at most three combinations, which follow the difference distribution tables of the four ordinary S-boxes (see Table 4). One may enhance the CDD table by associating each combination with its corresponding probability or probabilities. By treating the eight key selected S-boxes in the S-box layer of LBC as eight identical ordinary S-boxes whose difference distribution table is the CDD table, we can determine the minimum number of active S-boxes in a differential characteristic over a given number of rounds, in a manner similar to what Matsui did for DES (under the usual assumptions of differential cryptanalysis) in [37]. As each ordinary S-box has a maximum (valid) probability of 2^−2, we obtain an upper bound on the probability of a differential characteristic over a given number of rounds, and hence its security against differential cryptanalysis. The bound is clearly overestimated, since it is based on the CDD table and each of the four ordinary difference distribution tables is only a subset of the CDD table. In this way, we bound the security against differential cryptanalysis in the worst case from the point of view of the cipher's user.
We wrote a computer program to compute the minimum numbers of active S-boxes in i-round differential characteristics under the CDD table (1 ≤ i ≤ 18); the results are given in Table 5.
From Table 5, we see that the number of active S-boxes is larger than 32 for 18 or more rounds; in particular, an 18-round differential characteristic has at least 33 active S-boxes, so its probability is at most (2^−2)^33 = 2^−66 < 2^−64. These 33 active S-boxes require a total of 66 selecting key bits, which leaves only 128 − 66 = 62 key bits for the key recovery phase. Table 2 shows that a single nibble or bit involves at least 62 subkey bits after propagating through 3 rounds. If a total of five rounds are appended at both ends of an 18-round differential characteristic, at least 3 rounds are appended at one end, which would require an attacker to guess all the remaining 62 key bits in the key recovery phase. As a result, we can assume at most an 18-round differential characteristic with at most a total of five rounds appended at both ends. Note also that multiple differential cryptanalysis does not work well under the key selected S-box mechanism, because each different differential characteristic requires a different set of selecting key bits, which further shrinks the space of key bits that can be guessed in the key recovery phase. Therefore, 25-round LBC should be secure against differential cryptanalysis.
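The arithmetic behind this argument can be sketched in a few lines of Python with exact fractions. The constants come from the text (a 64-bit block, a 128-bit key, a maximum S-box probability of 2^−2, and 2 selecting key bits per active S-box, as implied by 33 S-boxes needing 66 bits); the function names are hypothetical.

```python
from fractions import Fraction

MAX_SBOX_PROB = Fraction(1, 4)   # 2^-2 per active ordinary 4x4-bit S-box
BLOCK_BITS = 64
KEY_BITS = 128
SELECT_BITS_PER_SBOX = 2         # one of four ordinary S-boxes per key selected S-box


def characteristic_bound(active_sboxes):
    """Upper bound on a differential characteristic's probability."""
    return MAX_SBOX_PROB ** active_sboxes


def remaining_key_bits(active_sboxes):
    """Key bits left for key recovery once the selecting key bits are fixed."""
    return KEY_BITS - SELECT_BITS_PER_SBOX * active_sboxes


for active in (32, 33):
    bound = characteristic_bound(active)
    usable = bound > Fraction(1, 2 ** BLOCK_BITS)   # must beat a random 64-bit block
    print(active, usable, remaining_key_bits(active))
# 32 False 64
# 33 False 62
```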

6.3. Linear Cryptanalysis.
To analyse the security of LBC against linear cryptanalysis, we first have the following property of the L operation, where the input mask ΓX and the output mask ΓY are each represented as eight 4-bit nibbles, as in Property 1.
(1) The eight 4-bit nibbles of the output mask ΓY can be expressed with the eight 4-bit nibbles of the input mask ΓX as follows:
(2) The eight 4-bit nibbles of the input mask ΓX can be expressed with the eight 4-bit nibbles of the output mask ΓY as follows:
Proof. The results follow directly from Property 1 (1) and Property 1 (2). □

As mentioned earlier, a key selected S-box also makes it difficult for an attacker to apply linear cryptanalysis. Here, we similarly develop a conservative framework for the linear cryptanalysis of block ciphers using a key selected S-box, based on the concept of the combined bias distribution (CBD) table for a key selected S-box as follows.

Definition 3.
The combined bias distribution (CBD) table for a key selected S-box mapping m-bit inputs to q-bit outputs (for given values of the parameters m, v, and q) is a table with 2^m rows indexed by the 2^m possible input masks and 2^q columns indexed by the 2^q possible output masks, whose (i, j)th entry is the set of possible combinations (N, c), where N is the number of m-bit inputs satisfying the input and output mask pair (i, j) under an ordinary m × q-bit S-box, and c is the number of ordinary m × q-bit S-boxes that have exactly N such inputs for (i, j); here, 0 ≤ i ≤ 2^m − 1 and 0 ≤ j ≤ 2^q − 1.

Table 4: The difference distribution tables of the S_0, S_1, S_2, and S_3 S-boxes.

Likewise, we can compute the CBD table for the key selected S-box S used in LBC, which is given as Table 6. Each entry except (0, 0) has at most five combinations, with values 0, ±2, and ±4, which follow the bias distribution tables of the four ordinary S-boxes (see Table 7). One may enhance the CBD table by associating each combination with its corresponding bias or biases. By treating the eight key selected S-boxes in the S-box layer of LBC as eight identical ordinary S-boxes whose bias distribution table is the CBD table, we can determine the minimum number of active S-boxes in a linear approximation over a given number of rounds, in a manner similar to what Matsui did for DES (under the usual assumptions of linear cryptanalysis) in [37]. As each ordinary S-box has a maximum (valid) absolute bias of 2^−2, we obtain an upper bound on the bias of a linear approximation over a given number of rounds, and hence its security against linear cryptanalysis. The bound is clearly overestimated, since it is based on the CBD table and each of the four ordinary bias distribution tables is only a subset of the CBD table. In this way, we bound the security against linear cryptanalysis in the worst case from the point of view of the cipher's user.
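Definition 3 can be sketched the same way as Definition 2. In the Python sketch below, a linear entry is taken as the number of inputs satisfying the mask equation minus 2^(m−1), which matches the values 0, ±2, and ±4 quoted above; this convention, the helper names, and the stand-in S-boxes (PRESENT and its inverse, not S_0, ..., S_3) are assumptions for illustration.

```python
from collections import Counter


def parity(v):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(v).count("1") & 1


def lat(sbox):
    """lat[a][b] = #{x : a.x == b.S(x)} - n/2 (the bias scaled by n)."""
    n = len(sbox)
    return [[sum(parity(a & x) == parity(b & sbox[x]) for x in range(n)) - n // 2
             for b in range(n)] for a in range(n)]


def cbd(sboxes):
    """Combined bias distribution table: entry (i, j) is the set of pairs
    (value, c), where c of the ordinary S-boxes have that LAT value at (i, j)."""
    tables = [lat(s) for s in sboxes]
    n = len(sboxes[0])
    return [[set(Counter(t[i][j] for t in tables).items()) for j in range(n)]
            for i in range(n)]


PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PRESENT_INV = [PRESENT.index(y) for y in range(16)]
table = cbd([PRESENT, PRESENT_INV])

values = {v for i in range(16) for j in range(16) if (i, j) != (0, 0)
          for v, _ in table[i][j]}
print(table[0][0])                    # {(8, 2)}
print(max(abs(v) for v in values))    # 4, i.e. a bias of 4/16 = 2^-2
```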
We wrote a computer program to compute the minimum numbers of active S-boxes in i-round linear approximations under the CBD table (1 ≤ i ≤ 5); the results for 1, 2, 3, 4, and 5 rounds are 0, 1, 2, 5, and 8, respectively (the computation is rather time-consuming for 6 or more rounds). Thus, a 20-round linear approximation has at least 4 × 8 = 32 active S-boxes, and by the piling-up lemma 32 active S-boxes have a bias of at most 2^(32−1) × (2^−2)^32 = 2^−33. Such an approximation is not usable for a linear cryptanalysis attack on a 64-bit block, because 2^−33 < 2^−32, and exploiting a bias ε requires about ε^−2 > 2^64 known plaintexts. As a result, we can assume at most a 20-round linear approximation with at most a total of five rounds appended at both ends, since appending a total of five rounds at both ends places at least 3 rounds at one end, which would require an attacker to guess all the remaining 62 key bits. Note also that multiple linear cryptanalysis does not work well under the key selected S-box mechanism, because each different linear approximation requires a different set of selecting key bits, which further shrinks the space of remaining key bits that can be guessed in the key recovery phase. Therefore, 25-round LBC should be secure against linear cryptanalysis.
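The bias bound above follows from the piling-up lemma, which for k active S-boxes with biases ε_1, ..., ε_k gives 2^(k−1) ε_1 ... ε_k. The Python sketch below reproduces the 20-round figure from the per-round minima quoted in the text, splitting the rounds into 5-round blocks; the function names are hypothetical.

```python
from fractions import Fraction

MAX_SBOX_BIAS = Fraction(1, 4)                  # 2^-2, as stated in the text
MIN_ACTIVE = {1: 0, 2: 1, 3: 2, 4: 5, 5: 8}     # minima under the CBD table (from the text)


def active_lower_bound(rounds):
    """Lower bound on active S-boxes from 5-round blocks plus a remainder."""
    blocks, rest = divmod(rounds, 5)
    return blocks * MIN_ACTIVE[5] + (MIN_ACTIVE[rest] if rest else 0)


def piling_up_bias(active_sboxes):
    """Piling-up lemma with every S-box at the maximum bias 2^-2."""
    return Fraction(2) ** (active_sboxes - 1) * MAX_SBOX_BIAS ** active_sboxes


active = active_lower_bound(20)
bias = piling_up_bias(active)
print(active, bias == Fraction(1, 2 ** 33))   # 32 True
print(bias < Fraction(1, 2 ** 32))            # True: about 2^66 > 2^64 texts needed
```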
6.4. Impossible Differential Cryptanalysis.
Impossible differential cryptanalysis [38,39] is a special case of differential cryptanalysis that is based on a differential with zero probability. Here, we analyse the security of LBC against impossible differential cryptanalysis.

6.5. Boomerang and Rectangle Attacks.
Boomerang, amplified boomerang, and rectangle attacks [40][41][42] are variants of differential cryptanalysis that treat a block cipher as a cascade of two sub-ciphers and use two short differentials with larger probabilities instead of one long differential with a smaller probability. Here, we analyse the security of LBC against boomerang, amplified boomerang, and rectangle attacks.
Differential cryptanalysis is typically based on a single long differential characteristic, which usually has a small probability. In contrast, the boomerang attack [42] is based on two short differential characteristics with relatively larger probabilities. Given two short differential characteristics with probabilities p and q, respectively, p and q must satisfy p × q > 2^(−n/2) to construct a valid boomerang distinguisher, where n is the block size of the cipher concerned. Amplified boomerang and rectangle attacks refine the boomerang attack mainly by using more than two differential characteristics with the same input or output difference.

Figure 12: A 9-round impossible differential of LBC.

For LBC, Table 5 shows that an 11-round boomerang distinguisher has at least 16 active S-boxes, which means that the product of the probabilities of the two differential characteristics covering the 11 rounds is at most (2^−2)^16 = 2^−32 = 2^(−n/2) for n = 64. Thus, 25-round LBC should be sufficiently secure against the boomerang attack as well as amplified boomerang and rectangle attacks.
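A boomerang distinguisher needs its quartet probability (pq)^2 to exceed 2^−n, that is, pq > 2^(−n/2). The short Python sketch below applies this test to the total number of active S-boxes in the two characteristics, assuming each contributes at most 2^−2; the function name is hypothetical.

```python
from fractions import Fraction


def boomerang_usable(active_sboxes_total, block_bits=64, max_sbox_prob=Fraction(1, 4)):
    """Bound p*q by the active S-boxes of both characteristics and test
    whether p*q > 2^(-n/2), i.e. whether (p*q)^2 beats 2^(-n)."""
    pq_bound = max_sbox_prob ** active_sboxes_total
    return pq_bound, pq_bound > Fraction(1, 2 ** (block_bits // 2))


print(boomerang_usable(16))  # (Fraction(1, 4294967296), False)
```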
6.6. Integral Cryptanalysis.
Here, we analyse the security of LBC against integral cryptanalysis [43]. Let A denote a 4-bit nibble position that takes all 16 possible values, B a 4-bit nibble position that is balanced (in other words, its XOR sum is zero), C a constant 4-bit nibble, and "?" a 4-bit nibble whose status is unknown, that is, it is unclear whether it has any of the above three statuses. Now, there is a 4-bit nibble position with symbol "C" and a 4-bit nibble position with symbol "A" in the output of Round 5. If we continue with one more round, all the 4-bit nibble positions in the output of that round will have an unknown status. Thus, we get a 5-round integral distinguisher of one dimension, which is illustrated in Figure 13(a); here, "one dimension" means that there is only one active nibble position in the set of inputs.
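The nibble-level rules behind the A/B/C notation can be checked numerically. The Python sketch below, which uses the PRESENT S-box as a hypothetical stand-in for the LBC S-boxes and an arbitrary constant subkey nibble, shows that a key XOR followed by a bijective S-box maps an A nibble to an A nibble, and that an A nibble is balanced; it also shows that XORing two A nibbles generally yields only a B nibble.

```python
from functools import reduce
from operator import xor

PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def is_all(values):
    """Status "A": every 4-bit value occurs exactly once."""
    return sorted(values) == list(range(16))


def is_balanced(values):
    """Status "B": the XOR sum over the set is zero."""
    return reduce(xor, values, 0) == 0


active = list(range(16))                       # an A nibble
after_key = [v ^ 0x7 for v in active]          # XOR with a constant subkey nibble
after_sbox = [PRESENT[v] for v in after_key]   # bijective S-box keeps status A
print(is_all(after_sbox), is_balanced(after_sbox))  # True True

mixed = [PRESENT[v] ^ v for v in active]       # XOR of two A nibbles
print(is_all(mixed), is_balanced(mixed))            # False True
```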
To obtain a longer integral distinguisher by adding rounds at the beginning, we can add at most 4 rounds before reaching the full plaintext space, as illustrated in Figure 13(b). As a result, 25-round LBC should be sufficiently secure against integral cryptanalysis.

6.7. Slide Attack.
The key schedule involves the round number i to prevent slide attacks [33,34], and the rotation amount "29" in Step 3 (a) guarantees that the three least significant bits of i are involved in generating the subkey of the next round (if any), and the two most significant bits of i are involved in generating another round subkey (if any).

6.8. Related-Key Cryptanalysis.
The key schedule is based on the LBC round function to achieve a high level of nonlinearity, and it involves the key length parameter k to distinguish the different key-length versions, so as to avoid potential related-key attacks [27,28,32] across different key lengths. The use of the key selected S-box in the encryption and decryption algorithms makes related-key cryptanalysis more difficult, since the order of the ordinary S-boxes in the S-box layer of a given round is indeterminate when the key is unknown, and it is very likely to change when the key changes.
6.9. Summary.
We also analysed the security of LBC against other cryptanalysis methods. In summary, the potential 20-round linear approximation of Section 6.3 is the longest distinguisher we have obtained, and thus 25-round LBC should be sufficiently secure.
Note that differential and linear cryptanalysis require a different framework under the key selected S-box mechanism, whereas impossible differential and integral cryptanalysis work much as in the ordinary mechanism, and boomerang, amplified boomerang, and rectangle attacks follow from differential cryptanalysis.

6.10. Security of LBC with the Ordinary S-Box Mechanism.
In comparison, consider LBC with the ordinary S-box mechanism instead of the key selected S-box mechanism (i.e., using an ordinary S-box S(x), say S_0(x), instead of a key selected S-box S_{K_i^(·)}(x)). As shown in Figure 14, a single nibble or bit must then propagate through at least 6 rounds before all 128 subkey bits are involved. Appending a total of 11 rounds at both ends of a linear approximation places at least 6 rounds at one end, which would force an attacker to guess all 128 key bits in the key recovery phase; moreover, multiple linear cryptanalysis works well in the ordinary mechanism, so its effect must be taken into consideration. As a result, LBC with the ordinary S-box mechanism would require 32 rounds to be secure: a 20-round linear approximation, a total of 11 rounds appended at both ends, and one additional round to counter the potential effect of multiple linear cryptanalysis.

Performance Gain Evaluation
In this section, we briefly give our evaluation of the performance gain of LBC over LBC with the ordinary S-box mechanism. Recall that, as discussed in Section 6 from a design perspective, LBC requires 25 rounds to be secure, whereas LBC with the ordinary S-box mechanism requires 32 rounds.
We test software performance on two types of processors: one type has ample storage and computing resources for general purposes, such as servers, and the other has low or moderate storage and computing resources, as in resource-constrained devices such as smartphones. Note that various software and hardware implementation optimizations and trade-offs among metrics such as memory, cost, area, and throughput exist, and they may yield faster or slower results than those presented here.

Software Performance on Intel i3.
The second-to-last subcolumn of Table 8 shows the encryption-only performance of the two LBC versions under the same Single Instruction Multiple Data (SIMD) implementation method on a popular Intel i3 CPU i5-4200U @ 1.6GHz processor (x64 architecture), which has enough storage and computing resources for general purposes such as servers. The results cover only the encryption part, and the round keys are stored for use after being generated, which is the usual case for a server. Note that if the key schedule part were included, the speedup would be greater, since the two versions use the same process to generate round keys, but LBC with the key selected mechanism has only 25 rounds, whereas LBC with the ordinary mechanism has 32 rounds. Note also that since the round keys are stored after being generated, the results also hold when the server processes a larger number of plaintexts at a time.

Software Performance on ARM NEON.
The last subcolumn of Table 8 shows the performance of the two LBC versions under the same SIMD implementation method on a popular ARM Cortex-A9 @ 1.4GHz processor (×64 architecture) for cost-sensitive devices such as smartphones. Here, the results cover both the encryption and key schedule parts, and the round keys are generated on the fly, which is the usual case for a resource-constrained device. As a result, the key selected S-box mechanism offers a speedup of (91.7 − 80.3)/91.7 ≈ 12% in the example LBC cipher.

Hardware Performance.
In a parallel hardware implementation with one cycle per round, processing a 64-bit plaintext block takes 25 cycles for LBC with the key selected mechanism and 32 cycles for LBC with the ordinary mechanism. Thus, the key selected S-box mechanism offers a speed improvement of about (32 − 25)/32 ≈ 22% under this implementation approach in the example LBC cipher. In this case, the key selected S-box mechanism requires slightly more hardware area (in GEs) than the ordinary S-box mechanism, which may make it unsuitable for extremely resource-constrained environments, but it is nevertheless acceptable in moderately resource-constrained environments.
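Both relative gains reported in this section use the same formula, (baseline − improved)/baseline. The Python sketch below, with a hypothetical helper name, recomputes them from the figures in the text (the ARM NEON figures are in whatever unit Table 8 uses; the hardware figures are cycles per block).

```python
def relative_gain(baseline, improved):
    """Relative reduction (baseline - improved) / baseline."""
    return (baseline - improved) / baseline


print(f"ARM NEON: {relative_gain(91.7, 80.3):.1%}")   # 12.4%
print(f"Hardware: {relative_gain(32, 25):.1%}")       # 21.9%
```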

Concluding Remarks
We have presented and investigated a generalised version of Feistel's key selected S-box mechanism in modern block cipher design and have designed the LBC example cipher to demonstrate that the generalised key selected S-box mechanism can be advantageous over the ordinary S-box mechanism for improving security and/or performance without intensifying computational effort and storage space in some application environments. In particular, we have defined the combined difference distribution table and the combined bias distribution table for the generalised key selected S-box mechanism to analyse the security of a block cipher with a generalised key selected S-box against differential and linear cryptanalysis [44,45].
As a first attempt, LBC is designed mainly as an example, with the primary purpose of investigating the relative security and performance gains of the generalised key selected S-box mechanism over the popular ordinary S-box mechanism in modern block cipher design. In our view, the main overhead of the key selected S-box mechanism is that it requires slightly more hardware area (in GEs) than the ordinary S-box mechanism, which, depending on the specific design, may make it unsuitable for extremely lightweight application environments; nevertheless, it can achieve better security and/or performance at least in general-purpose or moderately lightweight application environments. No single cipher design can be optimal in all application environments; this is the first detailed investigation of the key selected S-box mechanism, and we would like to see more investigations and better cipher designs based on it.
Data Availability
The data are available from the corresponding author upon request.

Disclosure
An extended abstract version of this work was published in the Proceedings of the 2017 IEEE Region Ten Conference (TENCON 2017, Penang, Malaysia, 5-8 November 2017) [46]. As the full version of the work, this paper gives more design rationale and security analysis of the LBC example cipher. The authors were with the Institute for Infocomm Research (Singapore) when this work was completed.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.