The Prediction of Serial Number in OpenSSL’s X.509 Certificate

In 2007, Marc Stevens presented a real forged X.509 certificate based on a chosen-prefix collision of MD5. In that method, besides constructing the MD5 collision pair, attackers needed to predict the serial number of the X.509 certificates generated by CAs. Since then, the serial number has been required to be random. How, then, can a random serial number be predicted? To answer this question, we reviewed the way OpenSSL generates serial numbers. We found a vulnerability: the value of the “not before” field of X.509 certificates generated by OpenSSL leaks the time at which the certificate was generated. Since this time is the seed for generating the serial number in OpenSSL, we can limit the seed to a narrow range, obtain a series of candidate serial numbers, and use these candidates to construct forged X.509 certificates through Stevens’s method. Although CAs have replaced MD5, this kind of attack will become feasible again if a chosen-prefix collision of a currently used hash function is found in the future. Furthermore, we investigate the way serial numbers of certificates are generated in other open source libraries, such as EJBCA, CFSSL, NSS, Botan, and Fortify.


Introduction
Digital certificates are widely adopted on the Internet as a basic security measure. Many principals, such as clients and servers, depend on digital certificates to authenticate each other. If an attacker can forge another party's digital certificate, he/she may impersonate that party and access sensitive information. This is a serious threat to the public.
The security of digital certificates is based on digital signature algorithms and hash algorithms. If one of these algorithms is successfully attacked, the digital certificates based on it can no longer be trusted. Among such attacks, collisions of hash algorithms are one of the most serious threats. Since the first real MD5 collision attack was presented by Wang [1,2] in 2004, it has been possible to construct forged certificates based on collision attacks on MD5.
At Eurocrypt 2007, Stevens was the first to create different certificates with the same signature, based on the chosen-prefix collision attack on MD5 [3][4][5]. This was a major event for commercial CAs and their users because this kind of forged certificate passes verification. After that, many organizations, such as Verisign, Microsoft, Mozilla, TC TrustCenter, RSA, US-CERT, and Cisco, announced that MD5 was no longer safe for digital certificates [6]. In addition, the super-malware Flame, discovered in 2012 [7], used the method to forge a Microsoft certificate [8].
Stevens's method cannot forge a certificate from an existing certificate because a second-preimage attack on MD5 is still infeasible. Instead, the method constructs two certificates based on a chosen-prefix collision of MD5 before submitting one of them to a CA in a certificate application. Implementing the process raises two key issues: one is the construction of the MD5 collision pair, and the other is that some certificate fields, such as the serial number, are controlled by the CA, so attackers need to predict them before submitting the application. Against this threat, Stevens gave two suggestions for CAs: one is to replace MD5 with another secure hash algorithm (such as SHA-256), because no chosen-prefix collision of other hash algorithms is known at present; the other, if MD5 cannot be replaced at once, is to add a sufficient amount of fresh randomness to appropriate fields (such as the serial number) in order to prevent attackers from predicting them [5]. In the wild, however, many valid certificates still use MD5 [9]. In addition, we collected more than 180,000 certificates from the Internet, of which more than 5,000 (about 2.8%) are based on MD5.
In this paper, we focus on whether the randomness of some certificate fields is sufficient to prevent attackers from predicting them. Since the detailed code of commercial CAs is not public, we review how the open source software OpenSSL generates certificates in order to find out how the values of some certificate fields can be predicted. OpenSSL uses a pseudorandom number generator (PRNG) to output random numbers. Several studies on the security of this PRNG have been published [10][11][12][13][14][15]. The security of OpenSSL's PRNG in Android and Debian has been reported in [10,14]. A theoretical analysis of OpenSSL's PRNG was presented in [10]. However, it is not clear how the PRNG works in the procedure of generating X.509 certificates. Furthermore, we also investigated certificate generation in other open source libraries: EJBCA, CFSSL, NSS, Botan, and Fortify.
This paper makes three contributions:
(1) We find a vulnerability in OpenSSL: the field "not before" in certificates leaks the time at which the certificate was generated, which is the seed for generating the field "serial number," so that it is possible to predict the value of "serial number."
(2) We give a prediction method for the field "serial number" and forge certificates based on the proposed method and Stevens's method.
(3) We investigate five other open source libraries and find a similar vulnerability in two of them, EJBCA and NSS.
The paper is organized as follows. In Section 2, some preliminaries are introduced and the problems addressed by the paper are defined. Section 3 reviews the source code of OpenSSL for generating X.509 certificates. Section 4 then proposes a method for predicting the key fields of certificates. Some countermeasures are given in Section 5, and Section 6 investigates other open source libraries. Finally, Section 7 concludes the paper.

Preliminaries
In X.509 certificates, the CA's signature is the most important protection against forgery. Any modification of the contents of a certificate changes the hash value and therefore invalidates the CA's signature. If user A's certificate already exists, we cannot forge it directly because that would require constructing a second preimage of the hash value of the certificate. However, we can use another user B's identity to apply to the CA for a certificate and generate a chosen-prefix collision pair, which makes it possible to forge A's certificate.

Chosen-Prefix Collision Attack of MD5.
In a chosen-prefix collision attack, the prefixes $p$ and $p'$ of two messages are chosen. Then a collision pair, $s$ and $s'$, is generated so that $\mathrm{MD5}(p \| s \| d) = \mathrm{MD5}(p' \| s' \| d)$ is satisfied for any suffix $d$. The two prefixes $p$ and $p'$ must be of equal length, and this length must be a multiple of the MD5 message block size; otherwise, padding must be added. The computational complexity of the attack is $O(2^{39})$ [4,5], and a program was presented by Stevens [16]. Attackers can apply the method to forge certificates successfully.
Before that, identical-prefix collisions had been studied; they are easier to construct than chosen-prefix collisions. Although an identical-prefix collision can be used to forge certificates, this kind of forgery is meaningless in practical attacks because the user's identity is in the prefix and cannot be changed.
An overview of collision complexities is given in Table 1. We can see that a chosen-prefix collision of MD5 is computationally feasible, while a chosen-prefix collision of SHA-1 is infeasible so far. In the near future, however, a real chosen-prefix collision of SHA-1 may be found, at which point the attack will become feasible for SHA-1 as well.

X.509 Certificates.
To forge a certificate, we need to know which part of the certificate serves as the prefix and in which part the collision pair is placed. The data structure of an X.509 certificate is shown in Table 2.
In the chosen-prefix collision attack, the generated collision pair looks like random data, and only the field "subject public key info" resembles random data. Thus, the collision pair constructed by the chosen-prefix collision attack is placed in the field "subject public key info" (Table 2), and the fields from "version number" to "public key algorithm" of the certificate form the chosen prefix. An attacker must therefore know in advance which values the CA will fill into these fields, because he/she must construct the collision pair and then submit the generated "public key info" before requesting the certificate from the CA. Among these fields, the values of "serial number" and "not valid before" need to be predicted because they are controlled by the CA, while the others are easy to obtain.

The Procedure of Forging a Certificate.
To forge A's certificate, we need to generate a chosen-prefix collision pair and construct two certificates, one in the name of A and the other in the name of B. Then, we submit B's identity and public key to the CA and obtain its signature. This signature is copied into A's certificate, which then passes verification. The flow of forging a certificate is shown in Figure 1. First, attackers choose a target CA. Before guessing the serial number and validity period of certificates, they need to collect or apply for enough certificates issued by the CA and check whether the two fields follow any pattern. If they find one, the fields can be predicted. After constructing the collision pair based on the chosen-prefix collision attack, attackers submit one of the two certificates' contents to the CA and obtain its signature. If the guessed serial number and validity period are correct, the forgery succeeds. Otherwise, attackers guess again.
In [4], Stevens reported that their targeted CA used sequential serial numbers and the validity period started exactly 6 seconds after a certification request was submitted. Thus they could predict the value of the fields easily. However, the attack becomes effectively impossible if the CA adds a sufficient amount of fresh randomness to the certificate fields, such as in the serial number. This randomness is to be generated after the approval of the certification request, so that if attackers cannot predict the value of these fields, they cannot construct the collision pair.

The Problem.
Thus, in this paper, we try to answer two questions:
(1) How do we predict the value of the field "serial number" if the CA chooses a random number as the serial number?
(2) How do we predict the value of the field "not valid before," which has a resolution of one second?
To answer the two questions, we need to know how CAs generate the values of the two fields. However, different CAs may fill the fields in different ways. Since the open source software OpenSSL [18] is widely used to generate X.509 certificates, we take it as an example to answer the two questions. For example, the open source PKI architecture OpenCA [19] calls OpenSSL to generate X.509 certificates.

The Reviewing of OpenSSL
We use OpenSSL 1.1.0e to review how a certificate is generated. Before OpenSSL 0.9.8, MD5 was the default configuration for creating message digests [20]; since then, MD5 has remained supported for compatibility.
(ii) RAND_bytes(void *buf, int n): outputs n bytes of random numbers into buf.
The input parameter md_0 of RAND_add is the IV of the SHA-1 algorithm. The parameter s is an array, initialized to zero, that holds the internal state of the random number generator. The parameters p and q are position markers into the array s, whose initial values are zero.

The Serial Number of X.509 Certificates.
When we use OpenSSL to generate an X.509 certificate, there are two ways to generate the serial number. In the OpenSSL configuration file "openssl.cnf" (Figure 2), the term "serial" relates to the serial number. If the file "serial" exists in the current directory, the serial number is taken from that file; that is to say, we can designate a number as the serial number in the file. For example, if we write "01" into the file "serial," the serial number will be "01." In addition, after the certificate is generated, the number in the file "serial" is incremented by one and becomes "02." In other words, the serial number of the next certificate will be "02." Thus, because the serial numbers are sequential, we can predict them exactly.
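The sequential case can be illustrated with a minimal Python sketch. It is not OpenSSL's code: the function name `next_serial` is hypothetical, and the sketch assumes that the "serial" file holds a single hexadecimal number, as in the "01" example above.

```python
def next_serial(file_text):
    """Return (serial for this certificate, new content of the serial file).

    Assumption: the file content is one hexadecimal number such as "01".
    """
    current = file_text.strip()
    following = int(current, 16) + 1
    # Keep at least the original width so that "01" is followed by "02".
    return current, format(following, "X").zfill(len(current))


if __name__ == "__main__":
    content = "01"
    for _ in range(3):
        serial, content = next_serial(content)
        print("issued serial:", serial, "| file now holds:", content)
```

Anyone who has seen one certificate issued this way knows the serial number of the next one, which is the situation Stevens exploited.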
On the other hand, if the file "serial" does not exist, OpenSSL uses a random number as the serial number of X.509 certificates. An example is shown in Figure 3. In this paper, we discuss the prediction of serial numbers generated in this way.
Reviewing the source code of OpenSSL, we find that it calls the function "rand_serial(BIGNUM *b, ASN1_INTEGER *ai)" in x509.c to generate the serial number (Figure 4).
After a series of function calls, the functions "RAND_add(const void *buf, int num, double add)" and "RAND_bytes(unsigned char *buf, int num)" are called in bn_rand.c (Figure 5).
In this case, the parameter b of RAND_add() is the variable "tim" of type "time_t," while the parameter r of RAND_bytes() is defined internally. We reviewed the source code of RAND_bytes() and found that r is the variable "tv" of type "FILETIME" (Figure 6). In addition, the parameter md_0 of RAND_bytes() depends on the "dummy seed" in Figure 6, whose value is 20 bytes of "." by default.
In summary, the serial number depends on two time variables, "tim" and "tv." Here "tim" is a 32-bit integer that records the number of seconds since 00:00:00 Jan. 1, 1970. In Windows, "tv" is a 64-bit integer that records the number of 100-nanosecond intervals since 00:00:00 Jan. 1, 1601, while in Linux "tv" records the number of microseconds since 00:00:00 Jan. 1, 1970. Both values are taken from the current system time.
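To make the relation between the two time variables concrete, the following Python sketch converts a second count ("tim") into the range of "tv" values that fall within the same second. The function names are hypothetical; the only external fact used is that the 1601 and 1970 epochs are 11,644,473,600 seconds apart.

```python
# Seconds between 1601-01-01 00:00:00 and 1970-01-01 00:00:00 (UTC).
EPOCH_DIFF_SECONDS = 11_644_473_600


def unix_to_filetime(unix_seconds, ticks=0):
    """Windows-style count of 100 ns intervals since 1601 for a Unix time."""
    return (unix_seconds + EPOCH_DIFF_SECONDS) * 10_000_000 + ticks


def tv_range_windows(tim):
    """All 100 ns 'tv' values that share the second 'tim' (10^7 values)."""
    return range(unix_to_filetime(tim), unix_to_filetime(tim + 1))


def tv_range_linux(tim):
    """All microsecond 'tv' values that share the second 'tim' (10^6 values)."""
    return range(tim * 1_000_000, (tim + 1) * 1_000_000)


if __name__ == "__main__":
    tim = 1_500_000_000
    print(len(tv_range_windows(tim)), "candidate tv values on Windows")
    print(len(tv_range_linux(tim)), "candidate tv values on Linux")
```

Once "tim" is known, "tv" is no longer a free 64-bit value; it is confined to one of these ranges.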

The Valid Time of X.509 Certificates.
The valid time of an X.509 certificate depends on two times: "not before" and "not after." The difference between "not before" and "not after" is the valid time.
In the source code of OpenSSL, x509.c generates the content of an X.509 certificate (Figure 4), and "set_cert_times(X509 *x, const char *startdate, const char *enddate, int days)" sets the valid time (Algorithm 3). Since the parameter "startdate" is NULL when the function is called, the field "not before" of the certificate is set to the current system time. The detailed code is reached in x509_vfy.c through a series of function calls (Figure 7).
In Figure 7, "not after" is obtained as "not before" + "days," where "days" is the parameter of set_cert_times(), because "enddate" is set to NULL.
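The effect of the NULL start and end dates can be mimicked in a few lines of Python. This is only an illustration of the behaviour described above, not a port of OpenSSL's code; the function name `set_validity` is hypothetical, and the output uses the UTCTime layout YYMMDDHHMMSSZ, which has a resolution of one second.

```python
from datetime import datetime, timedelta, timezone

UTCTIME_FORMAT = "%y%m%d%H%M%SZ"  # YYMMDDHHMMSSZ, whole seconds only


def set_validity(now, days):
    """Return ("not before", "not after") as UTCTime strings.

    "not before" is the current time cut to whole seconds;
    "not after" is "not before" plus the given number of days.
    """
    not_before = now.replace(microsecond=0)
    not_after = not_before + timedelta(days=days)
    return not_before.strftime(UTCTIME_FORMAT), not_after.strftime(UTCTIME_FORMAT)


if __name__ == "__main__":
    print(set_validity(datetime.now(timezone.utc), days=30))
```

The sub-second part of the clock is dropped, but the second itself is published in every certificate.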

The Prediction of Serial Number
From the above analysis, both the serial number and "not before" depend on the system time at which OpenSSL generates the certificate. Moreover, the value of "not before" is exactly the time of generation. Thus, we know part of the seed of the serial number.

Low Entropy Secret Leakage.
In [10], Strenzke pointed out that if the seed is in a low-entropy state, the output of the random number generator leaks information about the seed; this is called low entropy secret leakage (LESL). An attacker can then try all possible seeds and generate the corresponding outputs with his/her own instance of the random number generator. If the resulting outputs are equal to the outputs of the real random number generator, the attacker knows the seed used by the real random number generator.
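The seed search described above can be sketched in Python. The generator here is a deliberately simple stand-in (SHA-1 over a fixed 20-byte string plus two time values), not OpenSSL's PRNG; `toy_generator`, `recover_seed`, and the `step` parameter are hypothetical names introduced only for illustration.

```python
import hashlib
import struct


def toy_generator(seconds, micros):
    """Stand-in generator: output depends only on two time values."""
    seed = b"." * 20 + struct.pack("<I", seconds) + struct.pack("<Q", micros)
    return hashlib.sha1(seed).hexdigest()[:16]


def recover_seed(observed, seconds, step=1):
    """Try every candidate microsecond value inside the known second.

    'step' is the assumed granularity of the clock in microseconds.
    Returns the matching microsecond timestamp, or None if none matches.
    """
    start = seconds * 1_000_000
    for micros in range(start, start + 1_000_000, step):
        if toy_generator(seconds, micros) == observed:
            return micros
    return None


if __name__ == "__main__":
    known_second = 1_500_000_000
    secret = known_second * 1_000_000 + 1008 * 321
    output = toy_generator(known_second, secret)
    print(recover_seed(output, known_second, step=1008) == secret)
```

The search succeeds because the attacker can run the same generator and the space of seeds is small enough to enumerate.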
The serial number generator for X.509 certificates in OpenSSL described above is an example of LESL. The current time of day in microseconds provides about 36 bits of entropy. However, since "not before" leaks the time in seconds, which is part of the seed of the serial number, we only need to try every 100-nanosecond step (in Windows) or every microsecond (in Linux) within that second to find which seed was used. Thus, most of the entropy is lost, and only about 20 bits ($10^6$ candidates) remain. In the next subsection, we reduce the entropy further to about 10 bits ($10^3$ candidates).
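The entropy figures quoted above follow from simple base-2 logarithms, as the short Python check below shows. The helper name `bits` is hypothetical.

```python
import math


def bits(candidates):
    """Entropy in bits of a uniformly chosen value among 'candidates'."""
    return math.log2(candidates)


if __name__ == "__main__":
    microseconds_per_day = 24 * 60 * 60 * 10**6
    print(f"time of day in microseconds: {bits(microseconds_per_day):.1f} bits")
    print(f"one second in microseconds:  {bits(10**6):.1f} bits")
    print(f"about 10^3 clock ticks:      {bits(10**3):.1f} bits")
```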

Testing.
In [4], the authors reported that the validity period started exactly 6 seconds after a certification request was submitted. To verify this behaviour, we selected a commercial CA that provides individuals with free certificates. We used ten different e-mail addresses to apply to the CA for certificates. The submission time was recorded, and the value of "not before" was checked after receiving each certificate. We found that the difference between the two times was fixed at 5 seconds.
Given this fixed difference, attackers can control the time at which a CA generates a certificate, because the value of "not before" directly shows that time. Furthermore, in OpenSSL the serial number depends on the time in seconds and on the time in units of 100 nanoseconds (Figures 3 and 4). Attackers know the time in seconds but not the time in units of 100 nanoseconds. Thus, to predict the serial number of a certificate, a natural idea is to brute-force every 100-nanosecond step within the second according to Algorithms 1 and 2. The computational complexity is $O(10^7)$. However, can the timing precision of real computer systems actually be 100 nanoseconds? We tested the parameter "tv" in Figure 4 on different operating systems. We installed three operating systems on the same computer (Intel Core i7, 2 GHz) and measured the increments between successive time values. Every increment is larger than 100 nanoseconds. The results are shown in Table 3, from which we can see that the computational complexity in practice is much smaller than in theory.
According to the above discussion, attackers can predict the serial number and "not before" of a certificate. To verify this conclusion, we use Algorithm 4 to predict the serial number and "not before." In Windows XP, the time precision is 0x18730 (= 100144) units of 100 nanoseconds. So if we randomly select a value of m in Step 5, the success probability is 0.01; in other words, if we submit the application at least 69 times, the success probability is more than 50%. In Ubuntu, the time precision is 0x3f0 (= 1008) microseconds, so the success probability is 0.001.
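The probabilities above can be reproduced with a short calculation. The sketch below assumes that each submission is an independent trial with the same success probability; the function names are hypothetical.

```python
import math


def candidates_per_second(units_per_second, tick):
    """Number of distinct clock readings within one second."""
    return math.ceil(units_per_second / tick)


def attempts_for(p_single, target=0.5):
    """Smallest number of independent attempts whose overall success
    probability is at least 'target'."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_single))


if __name__ == "__main__":
    win = candidates_per_second(10**7, 0x18730)  # 100 ns units, Windows XP
    lin = candidates_per_second(10**6, 0x3F0)    # microseconds, Ubuntu
    print("Windows XP:", win, "candidates, about", round(1 / win, 3), "per guess")
    print("Ubuntu:", lin, "candidates, about", round(1 / lin, 3), "per guess")
    print("attempts for >50% at p = 0.01:", attempts_for(0.01))
    print("success after 69 attempts:", 1 - (1 - 0.01) ** 69)
```

Under the same independence assumption, the Ubuntu case with a per-guess probability of 0.001 needs several hundred submissions to pass 50%.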
The testing result shows that the real serial number of the certificate is one of the candidate serial numbers that we predict (in Table 4).

Countermeasure
Since the value of "not before" leaks the time at which a certificate is generated, attackers can limit the seeds for generating serial numbers in OpenSSL to a narrow range. The problem is that the entropy of the seed is too low to guarantee the randomness of serial numbers. Thus, a natural idea is to add entropy to the seed. In Figure 4, a dummy seed is defined, but it is a fixed string of 20 bytes of ".". Obviously, if the seed were a variable secret, the entropy would increase. This is the simplest way to deal with the problem.
The other idea is to set the value of "not before" to a future time instead of the current system time. For example, the value can be set to 00:00:00 of the second day after the day of application. Attackers then cannot learn the exact time at which the certificate is generated.
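Both ideas can be sketched in Python. This is an illustration rather than a patch for OpenSSL: instead of mixing a secret into the seed, the sketch takes the serial number directly from the operating system's CSPRNG through the standard `secrets` module, which is a substitution for the first idea. The function names and the `days_ahead` parameter are hypothetical, and reading "the second day" as the following day is an assumption.

```python
import secrets
from datetime import datetime, timedelta, timezone


def random_serial(bits=64):
    """Positive serial number drawn from the OS CSPRNG, not from the clock."""
    return secrets.randbelow(2**bits - 1) + 1


def rounded_not_before(application_time, days_ahead=1):
    """00:00:00 of a later day, so "not before" no longer reveals the
    moment at which the certificate was generated."""
    day = application_time.date() + timedelta(days=days_ahead)
    return datetime(day.year, day.month, day.day, tzinfo=application_time.tzinfo)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(hex(random_serial()))
    print(rounded_not_before(now).isoformat())
```

With either change alone, the link between the published "not before" and the seed of the serial number is broken.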

EJBCA.
EJBCA is an open source PKI Certificate Authority software based on Java technology. We reviewed the source code of EJBCA Community 6.10.1.2. In EJBCA, a tool called CertTool is provided to generate certificates, which is located in \ \ \ − \ \ \ \ \ . V . We reviewed this file to find how the valid time and serial number of certificates are generated.
From Figures 8 and 9, we can conclude that the default value of "not before" is set to "current time - 10 minutes" (in milliseconds), and "not after" is set to "current time + 24 hours" (in milliseconds). The generation algorithm of "serial number" is "SHA1PRNG," and the seed is set to "current time" (in milliseconds).
Obviously, the problem of EJBCA is similar to that of OpenSSL. We can easily obtain "not before" from a certificate, derive the seed of "SHA1PRNG" from it, and then predict the serial number.
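The step from "not before" to the seed can be sketched as follows. The sketch rests on two assumptions that the analysis above does not state explicitly: that the seed and "not before" come from (nearly) the same clock reading, and that the certificate stores "not before" in whole seconds. The names `candidate_seeds_ms` and `slack_seconds` are hypothetical, and the sketch only enumerates candidate seeds; it does not reimplement "SHA1PRNG."

```python
TEN_MINUTES_MS = 10 * 60 * 1000


def candidate_seeds_ms(not_before_seconds, slack_seconds=0):
    """Millisecond timestamps that could have seeded the serial number.

    "not before" is "current time - 10 minutes", so the seed lies 10 minutes
    after "not before"; the unknown millisecond part leaves 1000 candidates.
    'slack_seconds' widens the window if the two clock readings differ.
    """
    start = not_before_seconds * 1000 + TEN_MINUTES_MS
    return range(start - slack_seconds * 1000, start + 1000 + slack_seconds * 1000)


if __name__ == "__main__":
    seeds = candidate_seeds_ms(1_526_000_000)
    print(len(seeds), "candidate seeds, from", seeds[0], "to", seeds[-1])
```

Each candidate seed would then be fed to the same generator to obtain a candidate serial number.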

CFSSL.
CFSSL is an open source PKI/TLS toolkit developed by CloudFlare. We reviewed the source code of CFSSL 1.2 to find how the valid time and serial number of certificates are generated.
From Figure 10, we can see that the default value of "not before" is set to "current time." The "serial number" is generated by the function "rand" in the package "crypto/rand" of Go. "rand.Reader" is a global, shared instance of a cryptographically secure PRNG, which reads from /dev/urandom on Unix-like systems or from the CryptGenRandom API on Windows systems; i.e., the seed of the PRNG comes from the operating system. So far, it is hard to predict the output of the random number generators of operating systems.

NSS.
NSS is a set of libraries supporting cross-platform network security services, developed by Mozilla. We reviewed the source code of NSS 3.38 to find how the valid time and serial number of certificates are generated. The tool creating certificates is in \ \ \ \ . . From Figure 11, we can see that the default value of "not before" is set to "current time." From Figure 12, "serial number" is not a random number. "LL_USHR" is a macro defined in "prlong.h" that logically shifts the second operand right by the number of bits specified in the third operand. "PRTime" is a 64-bit value in microseconds. Thus, "serial number" is the 64-bit "current time" shifted right by 19 bits. Obviously, we can predict "serial number" easily.
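The consequence of the 19-bit shift can be shown with a few lines of Python. The sketch follows the description above (a microsecond timestamp shifted right by 19 bits) and ignores any later truncation of the value; the function names are hypothetical.

```python
def shifted_time_serial(now_us):
    """Serial number as described above: microsecond time shifted right by 19."""
    return now_us >> 19


def candidate_serials(not_before_seconds):
    """All serial numbers consistent with a "not before" given in seconds.

    2**19 microseconds is about 0.52 s, so one second of "not before"
    leaves only two or three possible values.
    """
    first = (not_before_seconds * 1_000_000) >> 19
    last = ((not_before_seconds + 1) * 1_000_000 - 1) >> 19
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(candidate_serials(1_530_000_000))
```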

Botan.
Botan is an open source cryptography library written in C++. We reviewed the source code of Botan 2.6 to find how the valid time and serial number of certificates are generated.
From Figure 13, the default value of "not before" (start time in Figure 13) is set to "current time." The "serial number" is derived from the second parameter of the function "sign_request", i.e., "rng()", which is defined in the header file /botan/src/cli/cli.h (Figure 14).
There are 5 kinds of random number generators in Botan, selected by the command parameters "rng -system -rdrand -auto -entropy -drbg -drbg-seed= *bytes." The parameter "-system" means using the RNG of the operating system, such as /dev/(u)random in Linux-like systems. The parameter "-rdrand" means using the RDRAND instruction of the Intel x86 on-chip hardware random number generator. The parameters "-auto" and "-entropy" use the system RNG or else a default entropy source to input seeds. The parameter "-drbg" uses a PRNG compliant with NIST SP 800-90A, whose seed is designated by "-drbg-seed." So far, there are no known security vulnerabilities of these RNGs that allow their outputs to be predicted.

Figure 14: The definition of rng() in Botan.

Algorithm 4 (predicting the serial number and "not before"):
Step 1: Attacker applies for a certificate to the target CA and records the submitting time $T_s$ in seconds.
Step 2: Attacker receives the certificate, checks the time $T_b$ of "not before" and the time $T_a$ of "not after" in the certificate, and computes $d = T_b - T_s$ and $v = T_a - T_b$.
Step 3: According to Algorithms 1 and 2, attacker brute-forces every 100-nanosecond step (for Windows) or every microsecond (for Linux) in $T_b$ to find which time seed $T_{bn}$ was used to generate the certificate ($10^7$ or $10^6$ candidates in total).
Step 4: Attacker selects a future time $T_f$ as the time of "not before" of the target forged certificate.
Step 5: Attacker randomly selects a value of $m$, which satisfies the condition:
Step 6: According to Algorithms 1 and 2, attacker computes the candidate serial numbers with the seed of $T_f$ (in seconds) and $T_{bn} + 100144m$ (in 100 nanoseconds) or $T_{bn} + 1008m$ (in microseconds).
Step 7: Attacker uses the candidate serial numbers, $T_f$ as "not before", and $T_f + v$ as "not after", to generate forged certificates according to Stevens's method [3].
Step 8: Attacker submits the application at the time $T_f - d$ and gets the signature of the certificate.
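The timing arithmetic in Steps 2, 6, 7, and 8 of Algorithm 4 can be written out as a short Python sketch. It covers only the bookkeeping of times; the collision construction and the choice of m in Step 5 are outside its scope. The function and key names are hypothetical.

```python
def plan_submission(t_s, t_b, t_a, t_f):
    """Derive the schedule for the forged certificate (all times in seconds).

    t_s: submitting time of the probe application
    t_b: "not before" of the probe certificate
    t_a: "not after" of the probe certificate
    t_f: chosen "not before" of the forged certificate
    """
    d = t_b - t_s  # delay between submission and "not before" (Step 2)
    v = t_a - t_b  # length of the validity period (Step 2)
    return {"submit_at": t_f - d, "not_before": t_f, "not_after": t_f + v}


def candidate_tv(t_bn, m, tick=100144):
    """Candidate fine-grained seed from Step 6: T_bn plus m clock ticks.

    tick = 100144 (units of 100 ns) or 1008 (microseconds), as in the steps.
    """
    return t_bn + tick * m


if __name__ == "__main__":
    print(plan_submission(t_s=100, t_b=105, t_a=105 + 86_400, t_f=1_000))
    print(candidate_tv(t_bn=5, m=3))
```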

Fortify.
Fortify is an open source application supported by the CA Security Council. We reviewed the source code of Fortify 1.0.17 to find how the valid time and serial number of certificates are generated. From Figure 15, the default value of "not before" is set to "current time." The "serial number" is generated by the function "crypto.getRandomValues," which comes from the Web Crypto API and is a cryptographically strong RNG.
Summarizing the above analysis of OpenSSL, EJBCA, CFSSL, NSS, Botan, and Fortify, we compare the ways in which they generate the valid time and serial number of certificates in Table 5.

Conclusion
In this paper, we found a vulnerability in the way OpenSSL generates the serial number of X.509 certificates. It makes it possible to forge certificates based on the method presented by Stevens. Among the five other open source libraries investigated, EJBCA and NSS have a similar vulnerability.
Although CAs have now replaced MD5, new attacks on the hash algorithms currently adopted by CAs, such as SHA-256, will probably appear in the future as technology develops. If a chosen-prefix collision of such a hash algorithm is found, the threat will probably become real again. In that case, attackers will still need to predict the values of the fields controlled by CAs in order to construct forged certificates. Thus, the randomness of the serial number remains important for CAs.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The author declares that they have no conflicts of interest.