CFA: A New Family of Hybrid CA-Based PRNGs

In this article, a new Pseudorandom Number Generator (PRNG) construction is proposed. It is based on cellular automata (CAs) and comprises other cryptographic primitives organized as blocks. Each of these blocks has a purpose and serves toward obtaining a higher level of randomness. The construction described is modular, and each of its blocks can be replaced, modified, or adapted according to the user, the application, or the level of randomness required. The authors first describe a general structure and the design principles behind each of the components. Next, a concrete example using the SHA-3 hash function, a hybrid cellular automaton, and the AES block cipher is provided. Then, the security analysis and the statistical properties for this specific instance of the scheme are presented.


Introduction
Designing cryptographic primitives that achieve security strength and satisfy all of the cryptographic properties is a hard task. First, the development of cryptanalysis techniques makes it difficult for designers to build new primitives with a high security level. In addition, some cryptographic properties conflict with one another, for example, nonlinearity and correlation [1]. Therefore, a compromise between cryptographic properties should be found according to the security level the designer wants to achieve.
Many cryptographic applications make use of random numbers. Examples are session keys, seeds, salts, initialization vectors, nonces, etc. However, for cost reasons, not all these applications use truly random bits generated from physical sources. Generally, a pseudorandom number generator is used. This cryptographic mechanism tries to generate sequences that are hard to distinguish from truly random sequences. The motivation of this work is to design a versatile, portable, and secure PRNG family based on cellular automata. Consequently, it could be adapted easily according to the different applications requiring random sequences. The design presented here is modular in the sense that all the building blocks can be modified by the system's designer to meet specific requirements.
Those building blocks are a hash function, a cellular automaton, and a block cipher. The mechanism of seeding and reseeding is also part of the design and ensures that the PRNG starting state is unpredictable. All those components were included to achieve a high degree of randomness, a high security level, and good cryptographic properties. Statistical tests, properties evaluation, and a selection procedure for the ruleset of the cellular automaton have been conducted to show that the generator proposed is a good candidate for generating high quality random sequences. The family of generators presented in this work can be used as standalone schemes to generate random sequences or as part of a bigger system, for example, within a stream cipher.
The previous CA-based PRNGs, found in [2][3][4][5][6][7][8], based the security of their systems on cellular automata only. Consequently, their designs cannot prevent some attacks like chosen-input attacks, direct cryptanalytic attacks, and side-channel attacks. They focused more on the randomness property, which made the systems lack the security level required by a secure PRNG. Instead, CFA, the PRNG family proposed and studied in this article, was designed by considering the randomness properties and known attacks to provide a high security level.
The main contributions of this work are the definition of a general design satisfying the requirements of a Cryptographically Secure PRNG and the choice of the building blocks of this design. First, the construction of the cellular automaton ruleset provides good cryptographic properties (see Appendix A), which enhances the security of the PRNG in addition to the randomness properties. Second, the hash function and the block cipher chosen are adequate for the proposed implementation. In addition, the seed file, the reseed, and the reseed trigger mechanisms make attacks trying to find out the initial configuration of the PRNG challenging.
The rest of this article is organized as follows. In Section 2, some basic notions about randomness, entropy, and cellular automata are defined. In Section 3, a brief description of related works is given. Next, in Section 4, the general scheme of the generator and a specific implementation are described. The experimental results and an analysis of that specific implementation follow in Section 5. Finally, Section 6 presents challenges, possible solutions, and future directions, followed by conclusions.

Preliminaries
2.1. Randomness. Random numbers are an essential component of cryptographic applications such as encryption, key generation, or content masking. The notions of random, randomness, and random numbers are often associated with unpredictability and uniform distribution.
A Random Number Generator (RNG) is an algorithm that produces sequences of numbers or bits that seem to be random, from a seed or from a continuously fed input. True Random Number Generators (TRNGs) are a class of RNGs that use physical sources to generate "true" random sequences. However, working with truly random sequences has drawbacks like the reliability of the physical sources, the availability of the data, the limited amount of data, or the high cost of getting the data. Pseudorandom Number Generators (PRNGs) are another class of RNGs that use algorithms to generate random sequences as indistinguishable as possible from "truly" random sequences [9].
In general, an RNG must meet specific conditions. Its output should be independent, uniformly distributed, and unpredictable [9]. The generator should be efficient in terms of the amount of random numbers generated. In order to be used in a cryptographic environment, an RNG should additionally be resistant to well-known attacks [9]. This means that, despite having information about the input or the current/previous output of the generator, an attacker cannot guess its output (previous, current, or future).

Entropy.
Entropy is another term associated with RNGs. The concept was first introduced in 1948 by Claude Shannon in the field of Information Theory. Entropy is a measure of randomness, as it represents the uncertainty of the output generated by a data source. Entropy is denoted by H(X) and defined as [10]

$$H(X) = -\sum_{x \in \Omega_X} P(X = x) \log_2 P(X = x),$$

where $P(X = x)$ is the probability of X taking the value x in the sample space $\Omega_X$. As shown in [10], the maximum entropy distribution over the range is the uniform distribution. The entropy in this range is no greater than $\log_2 |\Omega_X|$.
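As an illustration of this definition, the following Python sketch estimates the entropy of a sample from its observed symbol frequencies. The function name `shannon_entropy` and the two toy samples are assumptions made for this example; they are not part of the CFA design.

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Empirical Shannon entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(samples)
    n = len(samples)
    # H(X) = -sum over observed symbols of p * log2(p)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform = [0, 1, 2, 3] * 4   # four symbols, equally likely
biased = [0] * 15 + [1]      # two symbols, heavily skewed

h_uniform = shannon_entropy(uniform)  # reaches the bound log2(4)
h_biased = shannon_entropy(biased)    # stays well below the bound log2(2)
```

The uniform sample attains the upper bound, while the skewed sample falls far short of it, which is why a biased source is easier for an attacker to guess.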
In cryptographic applications, entropy is one of the measures of the quality of an RNG. Therefore, it is expected to be as high as possible, as it translates into more effort needed by an attacker to guess the output.

Cellular Automata (CA).
Stanislaw Ulam took an interest in self-replicating automata concepts in the 1940s and initiated the studies on cellular automata. Later on, upon suggestions by Stanislaw Ulam, John von Neumann further developed the theory on cellular automata in the 1960s to model self-reproduction in the field of Biology. CAs gained popularity in the 1970s with Martin Gardner and John Conway's Game of Life [11].
A cellular automaton represents a network of cells, or finite-state automata. The state of each cell at time t changes according to a local rule and depends on the state of the surrounding cells, or neighborhood, at time t−1.
The interest in cellular automata comes from the fact that simple local calculations, at the level of cells, result in a complex global behavior, at the level of the automaton. Even though uniform CAs (uniformity in update, in dependency to neighbors, in local rules, etc.) are suitable for modeling physical systems, nonuniform CAs are more suited and more efficient to model real life systems.
A CA is formally defined as the tuple (d, L, S, N, f) [11], where
(i) d is the dimension of the cellular space
(ii) L is the d-dimensional cellular space, whose elements are called cells
(iii) S is a finite set, whose elements are called states
(iv) N is the neighborhood, that is, the cells on which the next state of a cell depends
(v) f is the local rule of the automaton that dictates the change of state of each cell from time t to time t+1
If c is considered to be the current configuration of the automaton, then $\Phi(c), \Phi^2(c), \ldots$ represent its next configurations. $\Phi$ is called the global function.
Changing d, L, S, N, and f leads to different types of CAs.
A CA for which all the cells are updated by the same rule f is called uniform. Otherwise, the CA is called hybrid or nonuniform. A CA for which the boundary cells are assigned the state 0 is called a null boundary CA. A periodic boundary CA has the extreme cells adjacent to each other.
Elementary Cellular Automata or ECAs were introduced in the 1980s by Wolfram. These CAs are one-dimensional, have two possible states (0 or 1), and are characterized by a three-neighborhood dependency. Extensive research related to CAs has been conducted in the field of Cryptography, especially in producing random numbers using CAs. Wolfram introduced this association between Pseudorandom Number Generators and CAs in 1985 [2].
In the case of ECAs, the neighborhood of the ith cell is given by $N(i) = \{s_{i-1}, s_i, s_{i+1}\}$ and the local rule f for that cell is defined as

$$s_i^{t+1} = f(s_{i-1}^t, s_i^t, s_{i+1}^t),$$

$s_i^t$ being the current state and $s_i^{t+1}$ the next state of the cell. Since ECAs assume two possible states and have a three-neighborhood dependency, $2^{2^3} = 256$ rules are possible. Each of these rules can be represented as a Boolean function or as a truth table, the decimal value of the truth table being the rule number. If the Boolean expression of the rule contains only the XOR (⊕) operator, then the rule is linear. Otherwise, if the Boolean expression contains also AND(·)/OR(+) operators, then the rule is nonlinear. Table 1 shows an example of a linear and a nonlinear rule.
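The correspondence between a rule number and its truth table can be sketched in Python as follows. The helper names `rule_table` and `eca_step` are illustrative, and the periodic boundary is an assumption chosen for the example.

```python
def rule_table(rule_number):
    """Map each neighborhood (left, center, right) to the next state of the center cell."""
    table = {}
    for n in range(8):
        neighborhood = ((n >> 2) & 1, (n >> 1) & 1, n & 1)
        # Bit n of the rule number is the output for the neighborhood whose value is n.
        table[neighborhood] = (rule_number >> n) & 1
    return table

def eca_step(cells, rule_number):
    """Apply one step of a uniform ECA, assuming a periodic boundary."""
    table = rule_table(rule_number)
    size = len(cells)
    return [
        table[(cells[(i - 1) % size], cells[i], cells[(i + 1) % size])]
        for i in range(size)
    ]

# Rule 150 is linear: its output is the XOR of the three neighborhood cells.
rule_150_is_xor = all(out == a ^ b ^ c for (a, b, c), out in rule_table(150).items())

next_cells = eca_step([0, 0, 0, 1, 0, 0, 0], 30)
```

Checking a rule against the XOR of its inputs, as done for rule 150, is one quick way to tell a linear rule from a nonlinear one such as rule 30 or rule 110.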

Related Work
Wolfram was the first to use CAs as a source of randomness [2]. To generate pseudorandom number sequences (PNSs), he used a one-dimensional cellular automaton with r = 1 and rule 30. In [3], Hortensius et al. proposed a PRNG based on cellular automata for built-in self-test. Both uniform (rule 30) and nonuniform (rules 30 and 45, and rules 90 and 150) cellular automata were evaluated in the article. Nandi et al. [4] also worked on nonuniform cellular automata and proposed 5 relevant rulesets (rulesets can be found in Appendix B). Tomassini et al. [5] extended the research on nonuniform, one-dimensional CAs using four rules (90, 105, 150, and 165) selected using cellular programming, an evolutionary technique. Guan et al. introduced controllable CA [6] and self-programmable CA [7]. By using control signals and hybrid cell configurations, they managed to increase the randomness. Seredynski et al. [8] also worked on nonuniform, one-dimensional CAs with rules of radius 1 and 2. They used cellular programming to discover the relevant rules among 47 rules. The most relevant ruleset chosen for radius 1 in [8] was (86, 90, 101, 105, 153, 165). Recently, Bhattacharjee et al. [12] explored 3-state one-dimensional CA.
Each of these constructions either has been attacked or presents some flaws. This is still a young field of research, and improvements are still to be made. The configuration of the cellular automaton (dimension, neighborhood, rules selection, etc.) can be seen as an optimization problem. As such, the configuration chosen must be a compromise of cryptographic properties resulting in some flaws and gaps. In this article, the authors chose to work with a hybrid one-dimensional cellular automaton combined with other cryptographic primitives to maximize the strength and the randomness quality of the sequences generated. Table 2 summarizes the previously developed CA-based PRNGs.

Design Principles.
The goal behind the general CFA scheme was to develop a flexible PRNG. In other words, it can be adapted according to the design constraints and can be easily integrated in a broader system. This implies that its components are independent. It also implies that it can rely on existing cryptographic primitives or use new constructions.
CFA was also designed with portability and resistance to attacks in mind. The system was conceived to be resistant to most of the known attacks against PRNGs. The authors also tried to make it comply with the following constraints:
(i) Flexibility
(ii) Efficiency in execution
(iii) Ease of use
(iv) Simplicity in design
The general building blocks of CFA are as follows:
(i) A seed file containing high entropy sequences
(ii) A reseed trigger mechanism that initiates the reseed mechanism
(iii) A reseed mechanism that updates the seed file
(iv) An output generation mechanism that comprises the cryptographic primitives and other functions and generates the output
Figure 1 details the general building blocks and mechanisms by which an output is generated using CFA.
In the rest of this section, each of these components is detailed.

Seed File.
High entropy sequences are stored in the seed file before being used in the output generation mechanism. The origin of these sequences depends on the application and the designer of the system. They are appended to the seed file, stored in flash memory or on a hard disk, and subsequently used by the reseed mechanism to feed the output generation mechanism.

Reseed Trigger Mechanism.
The reseed trigger mechanism controls when a reseeding is necessary based on some parameter threshold. This parameter could be, for example, the period of the cellular automaton. In addition to this periodic reseed, an upon-request reseed mechanism can be implemented for when the state of the PRNG is compromised by some attack.

Reseed Mechanism.
During the reseed mechanism, the seed file is updated. When the reseed mechanism is triggered, a new random sequence of fixed length is generated and prepended to the seed file. When all the sequences of the seed file are used, its content is cleared, and new sequences are generated.
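The interplay between the seed file, the reseed trigger, and the reseed mechanism can be sketched as follows. This is only one possible reading of the description above: the class `SeedFile`, the function `reseed_due`, the in-memory storage, and the use of `os.urandom` as the entropy source are all assumptions made for the example, and a real deployment would substitute its own source of high entropy sequences and persistent storage.

```python
import os
from collections import deque

SEED_BYTES = 128  # 1024-bit sequences (hypothetical choice for this sketch)

class SeedFile:
    """In-memory stand-in for the seed file (hypothetical layout)."""

    def __init__(self, initial_sequences=4):
        self.initial_sequences = initial_sequences
        self.sequences = deque()
        self.fill()

    def fill(self):
        """Clear the file and store freshly generated sequences."""
        self.sequences.clear()
        for _ in range(self.initial_sequences):
            self.sequences.append(os.urandom(SEED_BYTES))

    def reseed(self):
        """Generate one new fixed-length sequence and prepend it to the file."""
        self.sequences.appendleft(os.urandom(SEED_BYTES))

    def take(self):
        """Hand out the next sequence, regenerating the content once it is used up."""
        if not self.sequences:
            self.fill()
        return self.sequences.popleft()

def reseed_due(steps_since_reseed, threshold, requested=False):
    """Reseed trigger: fire upon request or once the threshold is reached."""
    return requested or steps_since_reseed >= threshold
```

In such a sketch the threshold would play the role of the trigger parameter, for example the period of the cellular automaton, while the `requested` flag models the upon-request reseed.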

Output Generation Mechanism.
The output generation mechanism is the core of the system. It uses cryptographic primitives and cellular automata configurations to generate a random sequence with entropy as high as possible.

Description.
In this section, a more detailed description of the general scheme's (CFA) generation mechanism and a specific implementation are presented.
3.2.1. CFA Outline. The CFA output generation mechanism comprises three major building blocks:
(i) A hash function, h(x), that has the properties of preimage resistance, second preimage resistance, and collision resistance
(ii) A cellular automaton evolution function, Evol
(iii) A block cipher, E, with good statistical properties and good resistance to attacks
CFA has a strength in bits of min(m, k), where m is the size in bits of the hash function digest, and k is the size in bits of the key of the block cipher. Figure 2 depicts the output generation mechanism of CFA comprising h, Evol, and E.

A Specific Implementation: CFA-256.
In CFA-256, the building blocks used are as follows: (i) A seed generated by means of the Bouncy Castle Java library (ii) The SHA-3-256 hash function (iii) A one-dimensional, nonuniform two-state CA with r = 1 (iv) The AES-256 block cipher in counter mode

Step 0: Seed File Generation.
The CFA-256 algorithm starts with the generation of a seed using the ThreadedSeedGenerator class found in the Bouncy Castle Java library. The size of the seed generated is 1024 bits. Each sequence generated is then stored in the seed file. A selected sequence from the seed file is then fed to the SHA-3 hash function.

3.2.4. Step 1: SHA-3 Processing. SHA-3 relies on a sponge construction rather than a Merkle-Damgård construction like SHA-1 and SHA-2. The message is divided into blocks, and padding is applied if necessary (preprocessing phase).

Table 1: Example of a nonlinear rule and a linear rule.
Rule | Linear | Truth table: 111 | 110 | 101 | 100 | 011 | 010 | 001 | 000
110 | No | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0
150 | Yes | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0

Table 2 (surviving row): [12], 3-state one-dimensional CA with r = 1, rule 120021120021021120021021210.

Figure 1: General building blocks of CFA.

The sponge construction starts with an absorbing phase followed by a squeezing phase. During the absorbing phase (or input phase), the input blocks $x_i$ are processed, while, during the squeezing phase (or output phase), the output h(M) is computed. For both of these phases, the same function f is used. f is a fixed permutation function consisting of 24 rounds of 5 reversible operations (θ, ρ, π, χ, and ι applied in this order).
In Figure 3, r represents the size of the input blocks, called the bitrate. c represents the capacity, which is closely related to the security level of the construction and equal to two times the digest size. The sum of these two parameters represents the width of the state, b. In SHA-3, b is equal to 1600 bits. Using SHA-3, hash values of lengths 224, 256, 384, or 512 bits can be obtained from messages of any size. For CFA-256, the parameters used are r = 1088 bits, c = 512 bits, and 256 bits for the size of the digest (Table 3, [13]). Table 3 summarizes SHA-3 parameters for different digest sizes.
A 256-bit digest is obtained from the SHA-3 function and is then evolved using a one-dimensional elementary CA.
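The hashing step and the relation between the sponge parameters can be illustrated with Python's standard `hashlib` module. This is a sketch only: the seed below comes from `os.urandom` as a stand-in for a sequence read from the seed file, and the constant names are chosen for this example.

```python
import hashlib
import os

STATE_WIDTH = 1600                  # b, width of the SHA-3 state in bits
DIGEST_BITS = 256
CAPACITY = 2 * DIGEST_BITS          # c, twice the digest size
BITRATE = STATE_WIDTH - CAPACITY    # r, size of the input blocks

# hashlib reports the block size of SHA3-256 in bytes.
reported_rate_bits = hashlib.sha3_256().block_size * 8

seed = os.urandom(128)  # stand-in for a 1024-bit sequence from the seed file
digest = hashlib.sha3_256(seed).digest()

# Unpack the digest into a list of bits, most significant bit first,
# ready to be used as the initial configuration of the CA.
digest_bits = [(byte >> (7 - k)) & 1 for byte in digest for k in range(8)]
```

The bit ordering used when unpacking the digest is a convention of this sketch; any fixed ordering would serve, as long as it is applied consistently.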

Step 2: CA Evolution.
The CA used in this case is a hybrid CA. The local rule f consists of the ruleset [30, 90, 150, 30, 110, 30, 90, 150] applied alternately on the cells. In order to choose those rules, the guideline provided in [1] was followed.
The rule selection procedure is detailed in Appendix A. The CA is evolved using this local rule 128 times. The obtained 256 cells are then supplied to the AES encryption algorithm.
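A minimal Python sketch of this evolution step is given below. Two points are assumptions of the sketch rather than statements from the text: "applied alternately" is read as assigning rule number i mod 8 of the ruleset to cell i, and a periodic boundary is used because the boundary condition is not specified here. The function names are illustrative.

```python
RULESET = [30, 90, 150, 30, 110, 30, 90, 150]

def hybrid_step(cells, ruleset=RULESET):
    """One step of a hybrid CA: cell i is updated with ruleset[i % len(ruleset)]."""
    size = len(cells)
    nxt = []
    for i in range(size):
        rule = ruleset[i % len(ruleset)]
        # Neighborhood value in 0..7, periodic boundary assumed.
        idx = (cells[(i - 1) % size] << 2) | (cells[i] << 1) | cells[(i + 1) % size]
        nxt.append((rule >> idx) & 1)
    return nxt

def evolve(cells, steps=128, ruleset=RULESET):
    """Apply the hybrid rule the given number of times."""
    for _ in range(steps):
        cells = hybrid_step(cells, ruleset)
    return cells

# Toy usage on 8 cells with a single active cell.
one_step = hybrid_step([1, 0, 0, 0, 0, 0, 0, 0])
```

With a 256-cell configuration obtained from the digest, `evolve(cells)` would return the 256 cells that are passed on to the encryption step.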

Step 3: AES Encryption.
This step encrypts the output of the second one in counter mode using a random secret key. Figure 4 shows the general AES encryption process. The plaintext is the result of the CA evolutions, and the ciphertext is the random sequence generated by the whole CFA-256 algorithm. The secret key involved in the encryption is generated using ThreadedSeedGenerator.

CFA-256 Algorithm.
Algorithm 1 shows the process by which the seed file is filled with high entropy sequences. Algorithm 2 shows the generation mechanism of CFA-256.

Security Analysis.
Here, it is assumed that the security of cryptographic primitives translates to the PRNG [14].
According to [15], the following points represent some of the reasons that lead to a compromised PRNG: (i) Entropy overestimation and guessable starting points (ii) Chosen-Input Attacks (iii) Side-Channel Attacks (iv) Direct Cryptanalytic attacks

Entropy Overestimation and Guessable Starting Points.
In order to avoid this situation, a seed file has been used, and a seed/reseed mechanism has been implemented [16]. The seed file contains high entropy sequences that are used as input to the hash function. The seed/reseed mechanism is responsible for generating a new state of the PRNG from time to time or upon request. Thus, guessing the starting configuration, even in the case of entropy overestimation, is costly. In [4,5,8], no seed generation mechanism, reseed mechanism, or seed is mentioned.

Chosen-Input Attacks.
To resist these attacks, a cryptographic hash function is used in the generation mechanism. In [4,5,8], no strategy or mechanism is used to avoid or lessen the risk of chosen-input attacks.

Side-Channel Attacks.
A hybrid CA, with a carefully chosen ruleset combining linear and nonlinear rules with good cryptographic properties, has been used to protect against this kind of attack. As shown in the next section and Appendix B, the cryptographic properties of the ruleset used in CFA-256 are better than the ones presented in [4,5,8].

Direct Cryptanalytic Attacks.
To prevent this kind of attack, a secure block cipher along with a hash function has been used. It should be mentioned that attacks related to a state compromise are not considered for the design of a PRNG as these attacks can be prevented by the environment. Measures to prevent them should be handled at the system level [16]. In [4,5,8], no strategy or mechanism is used to avoid or lessen the risk of chosen-input attacks.
From these tables, it can be noted that the correlation immunity, the resiliency, and the balancedness decrease with the iterations. However, the nonlinearity and the algebraic degree increase with the number of clock cycles. Therefore, the ruleset selected appears to be a good compromise for the cryptographic properties (Appendix A displays the selection process). Moreover, if the design proposed in this paper is compared to other known CA-based PRNGs on the basis of cryptographic properties and the CAs evolutions (results are presented in Appendix B and Appendix C), it can be considered as better suited for cryptographic use.

Avalanche Effect.
One interesting feature sought by any cryptographic primitive is the avalanche effect [9]. This property was first introduced by Feistel [17] and can be expressed as follows: a small change in the input (plaintext or key) results in a large change in the ciphertext [9]. Mathematically, it can be formulated as

$$\frac{d_H(F(I), F(I'))}{|F(I)|} \approx \frac{1}{2},$$

where I and I' are two inputs that differ in one bit, F is a function applied to these inputs, $d_H$ denotes the Hamming distance, and $|F(I)|$ is the output length in bits. In this article, F corresponds to CFA-256.
In order to evaluate the avalanche effect of CFA-256, the following steps were applied:
(i) Generate 100 high entropy seeds
(ii) For each seed, generate outputs by changing one bit each time
(iii) For each seed, calculate the Hamming distances
(iv) For each bit i (1 ≤ i ≤ 1024), calculate the average of the Hamming distances
Figure 5 shows the results obtained. The maximum change is 54.99%, and the minimum is 45.01%. The average of all changes is equal to 50.02%. Therefore, changing a bit in the input yields about 50% changes in the output. This shows that CFA-256 displays a good diffusion property.
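The four steps above can be sketched in Python as follows. Since a CFA-256 implementation is not reproduced here, SHA3-256 is used as a stand-in for F; the function names, the use of `os.urandom` for the seeds, and the reduced number of seeds (10 instead of 100) are assumptions of this sketch, so its numbers are not the results reported in Figure 5.

```python
import hashlib
import os

def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def flip_bit(data, position):
    """Return a copy of data with the bit at the given position flipped."""
    out = bytearray(data)
    out[position // 8] ^= 0x80 >> (position % 8)
    return bytes(out)

def avalanche_profile(F, seeds):
    """For each input bit, the fraction of output bits changed, averaged over the seeds."""
    n_bits = 8 * len(seeds[0])
    totals = [0.0] * n_bits
    for seed in seeds:
        base = F(seed)
        for i in range(n_bits):
            changed = hamming_distance(base, F(flip_bit(seed, i)))
            totals[i] += changed / (8 * len(base))
    return [t / len(seeds) for t in totals]

def stand_in(data):
    """Stand-in for F; NOT CFA-256, only its hashing block."""
    return hashlib.sha3_256(data).digest()

seeds = [os.urandom(128) for _ in range(10)]  # 1024-bit seeds
profile = avalanche_profile(stand_in, seeds)
mean_change = sum(profile) / len(profile)
```

Plotting `profile` against the bit index gives the kind of per-bit view that the evaluation describes, and `max(profile)` and `min(profile)` give the extreme changes.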

NIST Statistical Test Suite.
The NIST Statistical Test Suite (STS) is a set of tests developed by the National Institute of Standards and Technology. The goal behind this test suite is to check the random behavior and the statistical properties of PRNG and TRNG algorithms by deriving p-values from sequences generated by these algorithms. Each p-value is the probability that a perfect random number generator would have produced a sequence less random than the sequence tested, given the kind of nonrandomness assessed by the test. A sequence passes a test when its p-value is at least the significance level, set to 0.01 for the NIST STS. A more detailed description of this test suite is provided in the NIST special publication 800-22 [18].
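To make the role of a p-value concrete, the simplest test of the suite, the frequency (monobit) test, can be sketched in a few lines of Python. The function name is illustrative, and this sketch covers only one test; it is not a substitute for running the full suite.

```python
import math

def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

ALPHA = 0.01  # significance level

p_example = monobit_p_value([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
p_constant = monobit_p_value([1] * 100)

example_passes = p_example >= ALPHA
constant_passes = p_constant >= ALPHA
```

A balanced sequence yields a p-value well above the significance level, whereas a constant sequence yields a p-value that is vanishingly small and therefore fails the test.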
A sequence of 10 M bits was generated using CFA-256 and was used as an input to the test suite. Table 9 summarizes the results of the DIEHARDER tests for CFA-256 and other known PRNGs. Results in bold represent failed tests. Table 10 shows that CFA-256 passes all the tests included in the DIEHARDER test suite. Furthermore, it is the only design that passes all the tests. Therefore, it can be considered as a better alternative to the previously proposed designs.

Table 8: Balancedness.

Challenges, Possible Solutions, and Future Directions
The general design proposed in this article is adaptable, as mentioned before. Consequently, the building blocks can be changed according to the application. However, this is challenging in some applications like IoT systems. It is possible to choose existing lightweight primitives (hash functions and block ciphers) together with CAs. As an alternative, irreversible and reversible CAs could be involved instead of hash functions and block ciphers, respectively, which provide the required properties and security. Future directions could include the use of CFA in banking systems security like [20]. In addition, machine learning classifiers such as those found in [21] could help to find the best combination of linear and nonlinear CA rules. Moreover, an adequate extension of CFA to secure wireless sensor networks [22] will be proposed.

Conclusions
A cryptographically secure pseudorandom number generator is useful in various applications. A new family of PRNGs is described in this article. Both the general and the specific designs presented rely on three building blocks carefully selected for their cryptographic features. The generator is built upon two strong cryptographic primitives: a hash function and a block cipher. Those primitives ensure that the PRNG is resistant to several types of attacks. In addition, a hybrid cellular automaton is used to achieve better confusion and diffusion properties, as well as to increase the randomness and security criteria. Passing statistical tests (NIST STS and DIEHARDER), exhibiting a good avalanche property, and using a cellular automaton with good cryptographic properties show that CFA-256 can generate high quality random sequences and displays good confusion and diffusion properties.

A. Cellular Automata
The self-reproducing feature of cellular automata makes them a good candidate for the generation of high quality random sequences [1]. This randomness property is provided by linear rules only. Although maximum period linear rules provide high quality randomness and security against side-channel attacks (power attack, timing attack, etc.), linear cellular automata have been shown to be insecure for cryptographic applications [23]. Therefore, nonlinear rules are needed to achieve a better security against linear cryptanalysis and the MS (Meier and Staffelbach) attack [23]. However, nonlinear rules have high correlation [23]. Consequently, for cryptographic applications, hybrid cellular automata with a ruleset made of a combination of carefully chosen maximum period linear rules and nonlinear rules should be used. For the selection of the ruleset used within the specific pseudorandom generator proposed in the paper, the authors followed the guideline provided in [1].

A.1. Definitions
Before detailing the selection process of the ruleset used, some definitions related to cryptographic properties are described [24].
The Hamming weight of $f_1 \oplus f_2$ is called the Hamming distance between $f_1$ and $f_2$.
The algebraic degree of a Boolean function is the number of variables in the highest order term with nonzero coefficient.
The minimum of the Hamming distances between a Boolean function f and all affine functions involving its input variables is known as the nonlinearity of the function.
A.1.6. Balancedness. If the Hamming weight of a Boolean function of n variables is $2^{n-1}$, it is called a balanced Boolean function.
A.1.7. Correlation Immunity. A function in n variables is correlation immune of order k, $1 \le k \le n$, if and only if all of the Walsh transforms

$$\hat{f}(w) = \sum_{x \in V_n} (-1)^{f(x) \oplus x \cdot w}, \quad 1 \le wt(w) \le k,$$

are equal to zero.
A Boolean function f in n variables is correlation immune of order k if its values are statistically independent of any subset of k input variables.
A.1.8. Resiliency. A Boolean function that is balanced and correlation immune of order k is said to be a k-resilient function. For example, the resiliency of $f(x_1, x_2) = x_1 \oplus x_2$ is 1, and the resiliency of $f(x_1, x_2) = x_1 \cdot x_2$ is 0.
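These definitions can be checked on individual ECA rules, viewed as Boolean functions of three variables, with the following Python sketch. The helper names are illustrative, the truth table is indexed by the integer value of the neighborhood, and the sketch evaluates a single rule at the first iteration only; it does not reproduce the multi-iteration analysis of the hybrid ruleset.

```python
def truth_table(rule, n=3):
    """Truth table of an ECA rule: entry x is the output for neighborhood value x."""
    return [(rule >> x) & 1 for x in range(2 ** n)]

def is_balanced(tt):
    """Balanced when the Hamming weight is half of the table size."""
    return sum(tt) == len(tt) // 2

def algebraic_degree(tt):
    """Degree of the algebraic normal form, obtained by a Moebius transform."""
    anf = list(tt)
    n = len(tt).bit_length() - 1
    for i in range(n):
        for x in range(len(anf)):
            if (x >> i) & 1:
                anf[x] ^= anf[x ^ (1 << i)]
    return max((bin(m).count("1") for m, coeff in enumerate(anf) if coeff), default=0)

def walsh_spectrum(tt):
    """Walsh transform value for every w, with x.w taken as a bitwise dot product."""
    size = len(tt)
    return [
        sum((-1) ** (tt[x] ^ (bin(x & w).count("1") & 1)) for x in range(size))
        for w in range(size)
    ]

def nonlinearity(tt):
    """Minimum Hamming distance to the affine functions."""
    return len(tt) // 2 - max(abs(v) for v in walsh_spectrum(tt)) // 2

def correlation_immunity(tt):
    """Largest k such that the Walsh transform vanishes for all 1 <= wt(w) <= k."""
    spectrum = walsh_spectrum(tt)
    n = len(tt).bit_length() - 1
    order = 0
    for k in range(1, n + 1):
        if any(spectrum[w] for w in range(1, len(tt)) if bin(w).count("1") == k):
            break
        order = k
    return order

properties = {
    rule: (
        is_balanced(truth_table(rule)),
        algebraic_degree(truth_table(rule)),
        nonlinearity(truth_table(rule)),
        correlation_immunity(truth_table(rule)),
    )
    for rule in (30, 90, 110, 150)
}
```

Each entry of `properties` lists, in order, balancedness, algebraic degree, nonlinearity, and correlation immunity order, which makes the trade-off between the linear rules (90, 150) and the nonlinear rules (30, 110) visible at a glance.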
A.1.9. Selection Process. In [1], a selection procedure for selecting a ruleset for constructing a robust hybrid cellular automaton is presented. Table 11 summarizes the different steps of the guideline along with the choices made by the authors for each of these steps. The sets of linear rules and nonlinear rules are taken from [1,25]. Tables 12 and 13 summarize their algebraic normal form and their cryptographic properties up to the third iteration.
Tables 19 to 23 show the computed cryptographic properties (nonlinearity, algebraic degree, correlation immunity, resiliency, and balancedness) for the ruleset used in [5].

Data Availability
The data used to support the findings of this study are included in the article.