We describe a modification to a previously published pseudorandom number generator that improves security while maintaining high performance. The proposed generator is based on the powers of a word-packed block upper triangular matrix. It is designed to be fast and easy to implement in software, since it mainly involves bitwise operations between machine registers, and in our tests it shows excellent security and statistical characteristics. The modifications include a new key-derived, sbox-based nonlinear output filter and improved seeding and extraction mechanisms. This output filter can also be applied to other generators.
Most cryptographic protocols require unpredictable quantities; these include keys, prime numbers, and challenge values. If these values were predictable, the security of such systems would be compromised.
The most common way to obtain these values is from pseudorandom sequences. Furthermore, a suitable pseudorandom number generator (PRNG) can be used as the keystream generator in a Vernam stream cipher scheme (see [
A PRNG is a completely deterministic algorithm: the sequence it generates is a function of its inputs and, unlike the output of a truly random generator, it can be reproduced. This means that the seed (the input to the PRNG) is all we need in order to generate the complete output sequence. The output sequence is much longer than the seed and is, in practice, indistinguishable from a truly random sequence.
In security applications, we need sequences with large periods, high linear complexity, and good statistical properties that also satisfy certain unpredictability criteria. Several statistical suites (see [
Most available cryptographic PRNGs are based on linear feedback shift registers (LFSRs). They are popular because they are easy to implement in hardware, produce sequences with large periods and good statistical properties, and have a simple structure that is easy to analyse. LFSRs are not secure by themselves, but they are commonly combined with other techniques to improve their cryptographic properties.
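As a minimal illustration of the LFSR structure mentioned above (it is not part of the proposed generator), the following Python sketch steps a 4-bit LFSR. The recurrence s[n] = s[n-1] XOR s[n-4] and the names `lfsr_bits` and `period` are chosen here purely for illustration; this recurrence has a primitive characteristic polynomial, so any nonzero initial state should cycle through all 15 nonzero states.

```python
def lfsr_bits(state, count):
    """Return `count` bits from a 4-bit LFSR with s[n] = s[n-1] XOR s[n-4].

    `state` holds the 4 most recent bits, oldest first.
    """
    state = list(state)
    out = []
    for _ in range(count):
        out.append(state[0])              # emit the oldest bit
        new_bit = state[3] ^ state[0]     # s[n-1] XOR s[n-4]
        state = state[1:] + [new_bit]
    return out


def period(bits):
    """Smallest p > 0 such that bits[i] == bits[i + p] over the whole sample."""
    n = len(bits)
    for p in range(1, n):
        if all(bits[i] == bits[i + p] for i in range(n - p)):
            return p
    return n


sample = lfsr_bits([1, 0, 0, 0], 60)
print(period(sample))
```

The short period and the purely linear update are what make a bare LFSR unsuitable on its own for cryptographic use.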
We propose a modification to a previously published PRNG (see [
As in the LFSR case, the generator comprises two distinct blocks: a generator component that involves linear operations but is proven to produce sequences with a guaranteed period, perfect linear complexity (unlike LFSRs), and excellent statistical properties; and a nonlinear output filtering component that introduces unpredictability and resistance to common attacks.
One of the main contributions of this paper, together with the seeding and extraction algorithms, is the output filter, which is a new design based on the key scheduling algorithm of RC4 (see [
The main differences and improvements of the proposal with respect to the original PRNG (see [
The paper is divided as follows: a description of the generator is given in Section
Our generator is based on the powers of a BUTM defined over
Consider the BUTM
The following result, which forms the basis of the generator, establishes the expression of the different powers of matrix
Let
In order to generate the pseudorandom bit sequence, matrices
For each matrix
The key for obtaining long periods for the sequence given by (
The value of
Although any prime can be chosen for the generator, we choose
The initial contents of matrices
The concept of word-packed matrices is essential for the optimized implementation of the generator over
We define a matrix, whose elements lie in
Operations involving word-packed matrices are equivalent to those between conventional matrices, since packing is essentially just a way of storing the elements of a matrix so that the required computations can be implemented efficiently as binary operations between processor registers. Nevertheless, packed matrices have certain peculiarities of their own that must be taken into account.
The addition of word-packed matrices must be done between matrices of the same type, that is, with the same packing orientation (rows or columns). Although the matrices could be unpacked and operated on normally, the optimal way is to perform an XOR operation word by word.
The product of packed matrices is a little more complex than the addition. It must be done between matrices of different types and with compatible sizes: the multiplicand has to be a row-packed matrix, while the multiplier must be packed by columns.
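The two operations described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes binary entries (consistent with addition being an XOR), stores each packed row or column in a Python integer with bit k holding index k, and uses hypothetical helper names (`add_packed`, `mul_packed`, `pack_rows`).

```python
def add_packed(x, y):
    """Add two packed matrices of the same type: XOR word by word."""
    return [a ^ b for a, b in zip(x, y)]


def parity(word):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(word).count("1") & 1


def mul_packed(rows, cols):
    """Multiply a row-packed matrix by a column-packed matrix (binary entries).

    rows[i] packs row i of the multiplicand and cols[j] packs column j of the
    multiplier. Element (i, j) of the product is the parity of
    rows[i] AND cols[j]. The result is an unpacked matrix of single bits.
    """
    return [[parity(r & c) for c in cols] for r in rows]


def pack_rows(bits):
    """Repack a bit matrix by rows (bit k of word i is element (i, k))."""
    return [sum(b << k for k, b in enumerate(row)) for row in bits]


# 2 x 2 example: X = [[1, 1], [0, 1]] and Y = [[1, 0], [1, 1]]
x_rows = pack_rows([[1, 1], [0, 1]])
y_cols = pack_rows([[1, 1], [0, 1]])   # the columns of Y are (1, 1) and (0, 1)
product = mul_packed(x_rows, y_cols)
print(product)
print(pack_rows(product))
```

Each product element costs one AND and one parity computation, and the resulting single bits have to be repacked into words before they can take part in further packed operations.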
In order to implement the generator over
It can be observed that the computations required on each iteration imply that matrices
Taking into account the peculiarities of the product operation between packed matrices, we can identify the following matrices and types:
Although the product of word-packed matrices generates individual bits instead of words, these bits can be repacked into the desired format (rows or columns) without a significant performance penalty.
Besides determining the format for each matrix, their sizes must also be decided for the correct operation of the implementation.
Several sizes and the number of decimal digits of the corresponding period are shown in Table
Different orders of


          Digits
15   8    06
31   8    11
47   8    16
23   16   11
31   16   14
47   16   18
47   32   23
63   32   28
64   48   33
80   48   38
95   48   43
96   53   44
In this section, we describe the different algorithms that perform seeding, bit extraction, and output filtering using a suitable pseudocode notation involving the following operators:
bitwise XOR,
bitwise AND,
bitwise right shift,
bitwise left shift,
modulus,
assignment,
addition modulo
product.
The generator takes a 128 bit seed (or key) as input and uses it to generate the initial state of block
The four
We initialise a temporary
Then, for each sbox (
The first sbox (
Note that all sboxes constructed in this way are balanced and have properties similar to purely random, nonkeydependent, sboxes (see [
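To illustrate why a swap-based construction yields balanced sboxes, the following Python sketch runs the standard RC4 key scheduling algorithm. It is not the modified procedure described in this section, only the well-known RC4 starting point; the key value and the function name `rc4_ksa` are illustrative assumptions.

```python
def rc4_ksa(key):
    """Standard RC4 key scheduling: a key-dependent permutation of 0..255."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]   # only swaps are used, so s stays a permutation
    return s


key = bytes(range(16))            # hypothetical 128-bit key
sbox = rc4_ksa(key)

# Balanced 8 x 8 sbox: every output byte value appears exactly once.
print(sorted(sbox) == list(range(256)))
```

Because the table is only ever modified by swapping entries, it remains a bijection on byte values regardless of the key.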
The generator is seeded using the
Each row of this matrix is filled up using a 64 bit word,
Rows 0 and 63 are fixed to a specific bit pattern, preventing a full zero
The generator is then iterated 64 times without generating any output, further improving avalanche characteristics and overall security.
The output filtering and extraction mechanism employs two adjacent 32-bit words from matrix
This process is repeated for all 96 32-bit words produced on each iteration of the generator, yielding 48 32-bit words (1536 bits) of filtered output sequence per iteration.
The array
u32 A := {0x40000000, 0x00000000, 0x20000000, 0x00000000, 0x10000000, 0x00000000,
0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000, 0x00000000, 0x01000000,
0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000, 0x00200000, 0x00000000,
0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000, 0x00000000, 0x00020000,
0x00000000, 0x00010000, 0x00000000, 0x00008000, 0x00000000, 0x00004000, 0x00000000,
0x00002000, 0x00000000, 0x00001000, 0x00000000, 0x00000800, 0x00000000, 0x00000400,
0x00000000, 0x00000200, 0x00000000, 0x00000100, 0x00000000, 0x00000080, 0x00000000,
0x00000040, 0x00000000, 0x00000020, 0x00000000, 0x00000010, 0x00000000, 0x00000008,
0x00000000, 0x00000004, 0x00000000, 0x00000002, 0x00000000, 0x00000001, 0x00000000,
0x00000000, 0x80000000, 0x00000000, 0x40000000, 0x00000000, 0x20000000, 0x00000000,
0x10000000, 0x00000000, 0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000,
0x00000000, 0x01000000, 0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000,
0x00200000, 0x00000000, 0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000,
0x00000000, 0x00020000, 0x00000000, 0x00010000, 0x00000000, 0x00008000, 0x00000000,
0x00004000, 0x00000000, 0x00002000, 0x00000000, 0x00001000, 0x00000000, 0x00000800,
0x00000000, 0x00000400, 0x00000000, 0x00000200, 0x00000000, 0x00000100, 0x00000000,
0x00000080, 0x00000000, 0x00000040, 0x00000000, 0x00000020, 0x00000000, 0x00000010,
0x00000000, 0x00000008, 0x00000000, 0x00000004, 0x00000000, 0x00000002, 0x00000000,
0x00000001, 0xD8000000, 0x00000000};
u32 B := {0x40000000, 0x00000000, 0x20000000, 0x00000000, 0x10000000, 0x00000000,
0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000, 0x00000000, 0x01000000,
0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000, 0x00200000, 0x00000000,
0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000, 0x00000000, 0x00020000,
0x00000000, 0x00010000, 0x00000000, 0x00008000, 0x00000000, 0x00004000, 0x00000000,
0x00002000, 0x00000000, 0x00001000, 0x00000000, 0x00000800, 0x00000000, 0x00000400,
0x00000000, 0x00000200, 0x00000000, 0x00000100, 0x00000000, 0x00000080, 0x00000000,
0x00000040, 0x00000000, 0x00000020, 0x00000000, 0x00000010, 0x00000000, 0x00000008,
0x00000000, 0x00000004, 0x00000000, 0x00000002, 0x00000000, 0x00000001, 0x00000000,
0x00000000, 0x80000000, 0x00000000, 0x40000000, 0x00000000, 0x20000000, 0x00000000,
0x10000000, 0x00000000, 0x08000000, 0x00000000, 0x04000000, 0x00000000, 0x02000000,
0x00000000, 0x01000000, 0x00000000, 0x00800000, 0x00000000, 0x00400000, 0x00000000,
0x00200000, 0x00000000, 0x00100000, 0x00000000, 0x00080000, 0x00000000, 0x00040000,
0x00000000, 0x00020000, 0x00000000, 0x00010000, 0x89400000, 0x00000000};
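The two tables above follow a regular pattern, which the following Python sketch reproduces as a check on the listing. Reading each row as two 32-bit words, with the most significant bit of the first word as column 0, is an assumption made here, as are the names `companion_words`, `a_words`, and `b_words`. Under that reading, row i holds a single 1 in column i + 1, and the last row carries a table-specific pattern (0xD8000000 for A, 0x89400000 for B).

```python
def companion_words(order, last_row_hi):
    """Rebuild a table as a flat list of 32-bit words, two words per row.

    Rows 0..order-2 have a single 1 in column i + 1 (column 0 is the most
    significant bit of the first word); the final row is (last_row_hi, 0).
    """
    words = []
    for i in range(order - 1):
        row = 1 << (63 - (i + 1))          # 64-bit row with column i + 1 set
        words += [row >> 32, row & 0xFFFFFFFF]
    words += [last_row_hi, 0x00000000]
    return words


a_words = companion_words(64, 0xD8000000)  # 64 rows, 128 words, as in table A
b_words = companion_words(48, 0x89400000)  # 48 rows, 96 words, as in table B
print(len(a_words), len(b_words))
print([hex(w) for w in a_words[:4]])
```

Generating the tables this way avoids transcription errors when the constants are copied into an implementation.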
The resulting generator has been tested successfully with three different statistical suites.
The first one, shown in Table
RandTest statistical comparison.
Test          Result    Correction
Frequency     0.7200    2.7060
Serial        2.2407    4.6050
Poker 8       250.4640  284.30
Poker 16      65554     65999
Runs          21.0368   23.5418
Autocorr.     0.8074    1.2820
Linear comp.  10000

The second suite is PractRand (see [
PractRand results for a 64 GB sequence.
Test                    Raw  Processed  Evaluation
BCFN(2, 13):!                “pass”     Normal
BCFN(2 + 0, 13 − 0)                     Normal
BCFN(2 + 1, 13 − 0)                     Normal
BCFN(2 + 2, 13 − 0)                     Normal
BCFN(2 + 3, 13 − 0)                     Normal
BCFN(2 + 4, 13 − 0)                     Normal
BCFN(2 + 5, 13 − 0)                     Normal
BCFN(2 + 6, 13 − 0)                     Normal
BCFN(2 + 7, 13 − 0)                     Normal
BCFN(2 + 8, 13 − 1)                     Normal
BCFN(2 + 9, 13 − 1)                     Normal
BCFN(2 + 10, 13 − 2)                    Normal
BCFN(2 + 11, 13 − 3)                    Normal
BCFN(2 + 12, 13 − 3)                    Normal
BCFN(2 + 13, 13 − 4)                    Normal
BCFN(2 + 14, 13 − 5)                    Normal
BCFN(2 + 15, 13 − 5)                    Normal
BCFN(2 + 16, 13 − 6)                    Normal
BCFN(2 + 17, 13 − 6)                    Normal
BCFN(2 + 18, 13 − 7)                    Normal
BCFN(2 + 19, 13 − 8)                    Normal
DC6-9x1Bytes-1                          Normal
Gap16:!                      “pass”     Normal
Gap16:A                                 Normal
Gap16:B                                 Normal
The third suite consists of the 160 statistics included in the BigCrush battery of TestU01 1.2.3 (see [
Avalanche is an important property of cryptographic primitives, since it helps to prevent successful cryptanalysis.
It is defined as the number of output bits that change when a single input bit is flipped, and the expected outcome is that roughly half of the output bits change when this happens.
In this case, we have taken 128 different seeds that differ in a single bit and have measured avalanche at two different points: the
Avalanche analysis of
Avalanche analysis of generated sequence.
In both cases, the results are excellent with no abnormal values and with most of the population very close to 32, which is the expected value.
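The measurement described above can be sketched in Python as follows. The sketch does not use the proposed generator: the first 64 bits of a SHA-256 digest serve as a stand-in function, and the base seed and the names `stand_in` and `avalanche` are illustrative assumptions. For a 64-bit output, a good avalanche effect gives Hamming distances clustered around 32.

```python
import hashlib


def stand_in(seed):
    """Stand-in for the generator: first 64 bits of SHA-256 of a 128-bit seed."""
    digest = hashlib.sha256(seed.to_bytes(16, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def avalanche(func, seed, seed_bits=128):
    """Hamming distances between func(seed) and func(seed with one bit flipped)."""
    base = func(seed)
    return [bin(base ^ func(seed ^ (1 << bit))).count("1")
            for bit in range(seed_bits)]


distances = avalanche(stand_in, 0x0123456789ABCDEF0123456789ABCDEF)
mean = sum(distances) / len(distances)
print(len(distances), min(distances), max(distances), round(mean, 2))
```

Replacing `stand_in` with a function that seeds the generator and returns a 64-bit word of state or output would give the two measurement points discussed in this section.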
The proposed generator could be employed as the source for random values, nonces, keys, and so forth, as well as the key stream generator in a Vernam stream cipher.
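The Vernam stream cipher use case amounts to XORing the data with the keystream, as in the following Python sketch. The keystream here is a stand-in built from SHA-256 in counter mode, not the proposed PRNG, and the key value and function names are illustrative assumptions.

```python
import hashlib


def keystream(key, length):
    """Stand-in keystream: SHA-256 in counter mode (not the proposed PRNG)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def vernam(key, data):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


key = bytes(16)                          # hypothetical 128-bit key
ciphertext = vernam(key, b"attack at dawn")
print(vernam(key, ciphertext))
```

Since encryption and decryption are the same operation, the security of the scheme rests entirely on the unpredictability of the keystream and on never reusing a keystream segment.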
In this case, it accepts keys of 128 bits in size, but the seeding algorithm (see Section
The maximum capacity of the
The generator component is based on the powers of a 2 × 2 BUTM and has excellent statistical characteristics, passing all tests of stringent suites (see Section
It also guarantees a period of at least
Regarding linear complexity, the generated sequences present the expected linear complexity of a random sequence (half the sequence length, see Table
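Linear complexity is commonly computed with the Berlekamp-Massey algorithm. The text does not state which method was used for the reported figure, so the following Python sketch is only an illustration of how such a measurement can be made; the function name and the test sequence (a 4-bit LFSR with s[n] = s[n-1] XOR s[n-4]) are assumptions introduced here.

```python
def linear_complexity(bits):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR producing bits."""
    n = len(bits)
    c = [0] * n                 # current connection polynomial
    b = [0] * n                 # copy saved at the last length change
    if n:
        c[0] = b[0] = 1
    length, m = 0, -1
    for i in range(n):
        # Discrepancy between the LFSR prediction and the actual bit.
        d = bits[i]
        for j in range(1, length + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * length <= i:
                length = i + 1 - length
                m = i
                b = t
    return length


seq = [1, 0, 0, 0]
while len(seq) < 30:
    seq.append(seq[-1] ^ seq[-4])   # 4-bit LFSR recurrence
print(linear_complexity(seq))
```

A sequence produced by a short LFSR is exposed immediately by this measure, whereas a random sequence of n bits is expected to have a linear complexity close to n/2.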
The generator is a linear algorithm, involving exclusively addition and multiplication operations over
Substitution boxes (or sboxes) are simple substitution tables where an input value is transformed into a different output value; the most common are 8 × 8 bits (byte as input and output) and 8 × 32 bits (byte as input and four byte word as output). They are essential in many cryptosystem designs since they can introduce the required nonlinearity characteristics, making cryptanalysis a more difficult endeavour (see [
The proposed nonlinear filter (see Sections
There have been some known attacks on RC4 (see [
The proposed nonlinear filter presents defences against common attacks.
We have included a performance benchmark in Table
Performance benchmark.
Algorithm      MB/s
Proposed PRNG  133.1
AES256 (OFB)   71.3
RC4            218.1
Salsa20        209.7
HC128          176.7
All algorithm implementations are single-threaded, native, compiler-optimized code, without hardware acceleration or processor-specific high-performance instruction set extensions.
Although not as fast as other lighter-weight stream ciphers, the proposed generator achieves acceptable performance and has valuable characteristics.
One of them is that, being a matrix based generator, it is essentially an
Another advantage is that all matrix computations are binary operations between registers, so matrix sizes can be adjusted to exploit architectures with larger registers. The proposed design employs a 64 × 48 bit matrix size, taking advantage of current 64-bit architectures, and this size can easily be adjusted for maximum performance on future architectures if necessary.
We have presented a modification of a block matrix PRNG that introduces a key-dependent sbox output filter, together with different seeding and initialization algorithms, to improve security and nonlinearity. It employs a word packing technique in order to optimise computations over
The resulting generator produces sequences of great quality in terms of randomness, passing battery tests like PractRand [
The key-dependent sbox output filter is the result of adapting the concept of block cipher key scheduling to a keystream generator. It offloads some computations to each seed change while maintaining high performance during sequence generation. An additional benefit is that it multiplies the cost of a brute-force key search. Although somewhat based on the key scheduling algorithm of RC4 [
Possible future work includes parallel implementations of the proposed generator on suitable CPU and GPU platforms, assembler optimization, and other performance analyses, as well as the adaptation of the nonlinear filter component to other PRNGs and further analysis.
For more details, see the supplementary material.
The authors declare that there is no conflict of interests regarding the publication of this paper.
Research partially supported by the Spanish MINECO under Project TIN201125452.