Construction of a New Biometric-Based Key Derivation Function and Its Application

Biometric data is user-identifiable, and therefore methods of using biometrics for authentication have been widely researched. Biometric cryptosystems allow a user to derive a cryptographic key from noisy biometric data and to perform a cryptographic task for authentication or encryption. The fuzzy extractor is a prominent biometric cryptosystem. However, the fuzzy extractor has a drawback: to derive the correct key, a user must store user-specific helper data or receive it online from the server over an additional trusted channel. In this paper, we present a new biometric-based key derivation function (BB-KDF) to address this issue. In our BB-KDF, users are able to derive cryptographic keys solely from their own biometric data; they do not need any other user-specific helper information. We introduce a security model for the BB-KDF, construct a BB-KDF, and prove its security in this model. We then propose an authentication protocol based on the BB-KDF. Finally, we give experimental results to analyze the performance of the BB-KDF, showing that it is computationally efficient and can be deployed on many different kinds of devices.


Introduction
Biometric data is unique to the individual, and, therefore, there has been a lot of research on using biometrics for authentication systems. There are two main types of biometrics: physical (e.g., a fingerprint, face, iris, or hand) and behavioral (e.g., a handwritten signature or keyboard dynamics such as rhythm, speed, and use of the left or right shift key). Recently, much research has been conducted to develop models that combine several biometrics for user authentication [1][2][3][4]. In contrast to non-biometric authentication systems, biometric-based systems do not require the user to remember passwords or possess security tokens. Instead, the server authenticates the user by his/her unique physical or behavioral features (i.e., who you are). Devices that collect biometric data are now common in everyday environments, making deployment of a biometric-based authentication system practicable.
In the early stages of research on biometric authentication, biometric templates were used directly. In other words, in the enrollment phase, a template extracted from a user's biometric data was stored in the server, and, in the authentication phase, a newly extracted template was compared with the stored one. However, the use of biometric data for authentication has given rise to a host of privacy issues [5]. Once a biometric template is stored in a server or database, an attacker who obtains it can recover the raw biometric data, compromising user privacy [6,7]. In addition, once a particular biometric trait has been compromised, it cannot be used again for authentication, and since each person has only a few biometric traits available for authentication, they cannot be replaced indefinitely. Given the threat of compromise and the limited biometric resources for authentication, recent research has focused on protecting biometric templates.
This research can be categorized into two types: (1) cancelable biometrics and (2) biometric cryptosystems. In cancelable biometrics, biometric data are altered via noninvertible transformations [8][9][10]. The transformed template is stored in a server for matching. If the transformed template is compromised, the system revokes it and issues a new template with different transformation parameters. However, the template serves only for authentication. In a biometric cryptosystem, the user derives a deterministic key from his/her own noisy biometric data (with the help of additional data generated in an initial step). The derived key is used to operate cryptographic mechanisms for various security goals such as authentication, encryption, and message integrity [11][12][13][14].
The main challenge in biometric cryptosystems is dealing with the noisy nature of biometric data while preserving its privacy in a cryptographic sense. Biometric readings differ every time, even when they are taken from the same individual. Nevertheless, the same digital key should be derived whenever the difference between two noisy readings is within a certain threshold.
The fuzzy extractor is a biometric cryptosystem first introduced by Dodis et al. in 2004 [13] and has been widely researched since then. It consists of two algorithms. The first algorithm, Gen, takes a piece of noisy data as input and outputs a key together with so-called helper data, which is public information. The second algorithm, Rep, takes as inputs the helper data and another piece of noisy data and outputs a key. If the two pieces of noisy data are sufficiently similar, Gen and Rep output the same key with the help of the helper data. Although the fuzzy extractor is a useful cryptographic primitive in that it derives a cryptographically secure key from noisy biometric data, it has a drawback. A user must either receive the user-specific helper data output by Gen and keep it as a security token to input into Rep, or receive it from the server over an additionally established channel whenever authentication is required. Since the helper data is of substantial size (approximately 33,569 MB or 19,372 GB when the error between measurements is 15% or 20%, respectively, in [15]), a large number of users in the system consume considerable network bandwidth, which causes heavy network traffic and can overwhelm the system.
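The role of helper data can be seen in a toy fuzzy extractor. The sketch below assumes the code-offset approach (one of the secure-sketch constructions studied by Dodis et al.) with a simple repetition code, and it uses SHA-256 in place of a seeded strong extractor; `REP`, `gen`, and `rep` are hypothetical names, and these parameters give no meaningful security. Gen returns a key plus helper data that is public yet specific to the enrolled reading, and Rep cannot recover the key without it.

```python
import hashlib
import secrets

REP = 5  # hypothetical repetition factor; corrects up to 2 flipped bits per block


def encode(bits):
    """Repetition code: repeat every bit REP times."""
    return [bit for bit in bits for _ in range(REP)]


def decode(word):
    """Majority vote within each block of REP bits."""
    return [int(sum(word[i:i + REP]) > REP // 2) for i in range(0, len(word), REP)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def gen(w):
    """Gen: noisy reading w (list of bits) -> (key, helper data)."""
    if len(w) % REP:
        raise ValueError("reading length must be a multiple of REP")
    secret = [secrets.randbelow(2) for _ in range(len(w) // REP)]
    helper = xor(w, encode(secret))  # code offset: public, yet specific to this user
    return hashlib.sha256(bytes(secret)).hexdigest(), helper


def rep(w_prime, helper):
    """Rep: new reading w' plus helper data -> key (w' xor helper = codeword + noise)."""
    return hashlib.sha256(bytes(decode(xor(w_prime, helper)))).hexdigest()


w = [secrets.randbelow(2) for _ in range(40)]   # enrollment reading (random stand-in)
key, helper = gen(w)
noisy = list(w)
for i in (0, 7, 12):                             # one flipped bit in each of three blocks
    noisy[i] ^= 1
print(rep(noisy, helper) == key)                 # True
```

Because the helper data is a function of one user's reading, it has to be stored by that user or fetched from the server at every authentication, which is exactly the burden described above.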

Our Contributions.
In this paper, we propose a new, efficient biometric-based key derivation function (BB-KDF) that requires no user-dependent randomized information (such as the helper data of the fuzzy extractor) and uses only the user's biometric data to generate a cryptographic key for authentication. In our BB-KDF, biometric data are assumed to be encoded as a biometric vector of real numbers, as in [16,17]. Although the BB-KDF is described for biometrics in this paper, it is applicable to other fuzzy or noisy data.
The proposed BB-KDF is conceptually simple and computationally efficient, and, therefore, it can be deployed on a wide range of devices for the Internet of Things (IoT). In the IoT, various devices are connected to the Internet and potentially to each other, requiring authentication to limit access to a particular user or users. Since the proposed construction has little computational overhead, the BB-KDF is deployable on many different kinds of devices, from low-performance devices (e.g., sensors) to high-performance devices (e.g., smartphones).
In the proposed BB-KDF, a cryptographic key is computed using only a user's biometric vector and a public parameter that is generated when the system is set up and is not associated with the user or his/her biometric vector. Therefore, in the BB-KDF, no data dependent on a user's biometrics is available to adversaries. To capture this precisely, we define a security model for the BB-KDF and prove the security of our construction in this model.
As an application, we propose an authentication protocol using the BB-KDF. We construct a biometric authentication scheme from the Schnorr identification scheme, with the secret key derived by the BB-KDF. The security of the authentication scheme follows from that of the Schnorr identification scheme, which is based on the hardness of the discrete-logarithm problem.
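For orientation, one round of Schnorr identification can be sketched as follows. This is a generic sketch under stated assumptions, not the protocol of Section 6: it uses a deliberately tiny, insecure group ($p = 2039$, $q = 1019$, $g = 4$), and `key_to_exponent`, which reduces a KDF output to a secret exponent, is a hypothetical mapping. The prover with secret $x$ and public key $y = g^x \bmod p$ sends $a = g^r$, receives a challenge $c$, answers $s = r + cx \bmod q$, and the verifier checks $g^s \equiv a \cdot y^c \pmod p$.

```python
import hashlib
import secrets

# Toy group parameters (INSECURE, for illustration only): p = 2q + 1 with p and q prime,
# and g = 4 generates the subgroup of order q (the quadratic residues mod p).
P, Q, G = 2039, 1019, 4


def key_to_exponent(key: bytes) -> int:
    """Hypothetical mapping from a derived key to a Schnorr secret in [1, q - 1]."""
    return int.from_bytes(key, "big") % (Q - 1) + 1


def keygen(key: bytes):
    x = key_to_exponent(key)
    return x, pow(G, x, P)              # (secret x, public y = g^x mod p)


def commit():
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)              # prover keeps r and sends a = g^r


def respond(r: int, c: int, x: int) -> int:
    return (r + c * x) % Q              # s = r + c*x mod q


def verify(y: int, a: int, c: int, s: int) -> bool:
    """Accept iff g^s == a * y^c (mod p)."""
    return pow(G, s, P) == (a * pow(y, c, P)) % P


# Usage: the SHA-256 digest merely stands in for a BB-KDF output.
x, y = keygen(hashlib.sha256(b"stand-in for a BB-KDF output").digest())
r, a = commit()
c = secrets.randbelow(Q)                # verifier's challenge
print(verify(y, a, c, respond(r, c, x)))   # True
```

Only the derivation of $x$ is biometric here; everything after `keygen` is the unmodified identification round, which is why its security argument carries over.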
Finally, we give experimental results to show the performance of the BB-KDF. To analyze the efficiency of the BB-KDF fairly, we conducted experiments in various settings, varying both the device specification and the length of the biometric vector. We also compare the computational cost of the BB-KDF with that of an existing KDF, namely a password-based KDF. On smartphones, the BB-KDF is approximately 146 times faster than PBKDF1 in PKCS♯5 with the iteration count set to 1000.

Organization.
The remainder of this paper is organized as follows. In Section 1.3, we briefly review related work in the area. In Section 2, we introduce basic notions and definitions. In Section 3, we define the notion of a BB-KDF and its security model. We describe our proposed BB-KDF construction and experimental results in Section 4, followed by its security analysis in Section 5. In Section 6, we present an authentication protocol as an application of the BB-KDF. Section 7 is dedicated to analyzing the efficiency of the BB-KDF with implementation results. Conclusions are given in Section 8.

Related Work
Key Derivation Function. The goal of a key derivation function (KDF) is to derive a pseudorandom (i.e., cryptographically secure) key from a source of initial keying material. The keying material used in practice has often been imperfect, namely, not uniformly random or pseudorandom: examples include physical sources [18,19], a shared Diffie-Hellman value [20,21], a user password [21][22][23][24], or any bit sequence from a source of more or less entropy [21]. For imperfect input material, the extract-then-expand approach has been used to derive a cryptographic key [25]. The most typical KDF is the password-based key derivation function (PB-KDF), which takes a user-chosen password as input. PB-KDFs have been constructed from a variety of cryptographic primitives such as block ciphers [23], stream ciphers [24], and cryptographic hash functions [21,22]. PKCS♯5, the de facto standard for password-based cryptography, provides methods to construct practical PB-KDFs [26]. PB-KDFs have been employed in various environments. For example, PBKDF2, described in PKCS♯5, has been used in Android's full disk encryption [27], the WPA/WPA2 encryption process [28], FileVault on Mac OS X [29,30], and WinRAR [31]. When a PB-KDF is applied to existing cryptosystems (e.g., authentication systems), users have to memorize their passwords, which is quite inconvenient. In this paper, we propose the construction of a new KDF that requires no additional information to be memorized. The proposed KDF can replace existing PB-KDFs in various environments.
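For reference, the two standard KDF styles mentioned above can be sketched with Python's standard library alone: HKDF (RFC 5869), which standardizes the extract-then-expand approach with HMAC-SHA256, and PBKDF2 from PKCS♯5, which the standard library exposes as `hashlib.pbkdf2_hmac`. The salt, info, and password values below are placeholders, and this is our illustration rather than a construction from the cited works.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output length in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Extract: PRK = HMAC-SHA256(salt, IKM); an empty salt means HASH_LEN zero bytes."""
    return hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Expand: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i); output is T(1) || T(2) || ..."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    okm, block = b"", b""
    for counter in range(1, -(-length // HASH_LEN) + 1):
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
    return okm[:length]


# Extract-then-expand from imperfect keying material (placeholder values).
prk = hkdf_extract(b"public salt", b"shared Diffie-Hellman value")
session_key = hkdf_expand(prk, b"example context", 32)

# PBKDF2 from PKCS #5 is built into the standard library.
password_key = hashlib.pbkdf2_hmac("sha256", b"user password", b"per-user salt", 1000)
```

Both functions need a secret the user supplies or remembers (keying material or a password); the BB-KDF instead starts from the biometric vector itself.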
Biometric Cryptosystems. As a means of biometric template protection, a biometric cryptosystem represents a comprehensive biometric-based key generation and encryption system [14]. In previous biometric cryptosystems, biometric-dependent but public information (a.k.a. helper data) has been generated from a biometric template to correctly recover a cryptographic key. Biometric data are noisy, and there has been much research on deriving cryptographic keys from noisy data. Fuzzy commitment schemes have been constructed from binary error-correcting codes [11]; in these schemes, biometric data are represented as binary strings whose similarity is measured by the Hamming distance. Fuzzy vault schemes correct errors in noisy data by using polynomial interpolation [12]; in such schemes, biometric data are represented as a set of elements in a finite field. In 2004, Dodis et al. generalized the fuzzy vault and the fuzzy commitment and proposed the fuzzy extractor [13]. Since then, information-theoretic fuzzy extractors have been widely researched [32][33][34], and, recently, a computational fuzzy extractor has also been constructed [35]. A reusable fuzzy extractor was first constructed in 2004 by Boyen [36], and, until recently, very little further research was done on it. Over a decade later, in 2016, a reusable fuzzy extractor was proposed with no limitation on the number of correlated readings of the source [15]. Since then, reusable fuzzy extractors based on various assumptions have been proposed [37,38]. In the fuzzy extractor, user-specific, biometric-dependent information is generated to correct the noise in biometric data, which causes a great deal of inconvenience because the user must carry it in person or receive it from the server whenever authentication is required. In contrast, the proposed (biometric-based) KDF does not require any user-specific information for authentication, mitigating the user's inconvenience.
Fuzzy Signature. A fuzzy signature scheme uses the user's biometric data as a signing key. The concept of fuzzy signatures was first proposed by Takahashi et al. [39] and has since been improved by relaxing the requirements for construction or increasing efficiency [40]. However, none of the fuzzy signature schemes proposed so far is robust, since the user's biometric data can be directly recovered from the (public) verification key or a signature [41].

Preliminaries
In this section, we introduce the basic notations and definitions that will be used throughout the paper.
Basic Notations. Let $\mathbb{R}$ denote the set of all real numbers. Let $[0, 1)$ denote the set of all real numbers $x$ satisfying $0 \le x < 1$. If $b \in \mathbb{R}^n$ for an integer $n > 0$, then $b$ denotes a vector of length $n$ whose components are all real numbers. Similarly, if $t \in [0, 1)^n$ for an integer $n > 0$, then each component of $t$ is a real number in the range $[0, 1)$. Let $[1, n]$ denote the set of all integers $i$ satisfying $1 \le i \le n$ for a positive integer $n$. The symbol "$\|$" denotes concatenation. If $x \in \mathbb{R}$, then $\lceil x \rfloor$ denotes $x$ rounded to the nearest integer.

Entropy and Hash Functions
Min-Entropy. For a probability distribution $X$, we use the notation $\Pr_X(x)$ to denote the probability assigned by $X$ to the value $x$. We often omit the subscript when the probability distribution is clear from the context. Throughout, $\log$ denotes the base-2 logarithm. For a probability distribution $X$ over $\{0,1\}^{\ell}$, we define its min-entropy, denoted by $\mathrm{H}_\infty(X)$, as the largest $k$ such that $\Pr_X(x) \le 2^{-k}$ for all $x \in \{0,1\}^{\ell}$, that is, $\mathrm{H}_\infty(X) = -\log \max_x \Pr_X(x)$.
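As a quick numerical check of the definition, the min-entropy of an explicit distribution is the negative base-2 logarithm of its largest probability. The dictionaries below are illustrative toy distributions, not biometric data.

```python
import math


def min_entropy(dist):
    """H_inf(X) = -log2(max_x Pr_X(x)), for a dict mapping outcomes to probabilities."""
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    return -math.log2(max(dist.values()))


uniform_3bit = {format(x, "03b"): 1 / 8 for x in range(8)}
biased = {"00": 0.5, "01": 0.25, "10": 0.125, "11": 0.125}
print(min_entropy(uniform_3bit))   # 3.0
print(min_entropy(biased))         # 1.0: the likeliest outcome alone sets the bound
```

The biased distribution still has four outcomes, but an adversary guessing "00" succeeds half the time, which is exactly what one bit of min-entropy expresses.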
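Conditional Entropy. The (average) min-entropy of $X$ given $Z$ is defined as follows:
$$\tilde{\mathrm{H}}_\infty(X \mid Z) = -\log\Big(\mathbb{E}_{z \leftarrow Z}\big[\max_{x}\Pr[X = x \mid Z = z]\big]\Big).$$
The collision probability of $X$ given $Z$ is given by
$$\mathrm{Col}(X \mid Z) = \mathbb{E}_{z \leftarrow Z}\Big[\sum_{x}\Pr[X = x \mid Z = z]^2\Big],$$
and the collision entropy of $X$ given $Z$ is equal to
$$\tilde{\mathrm{H}}_2(X \mid Z) = -\log \mathrm{Col}(X \mid Z).$$
For any joint distribution $X$ and $Z$, these three notions are related as follows:
$$2^{-\tilde{\mathrm{H}}_2(X \mid Z)} = \mathrm{Col}(X \mid Z) \le 2^{-\tilde{\mathrm{H}}_\infty(X \mid Z)}.$$
Definition 1. Let $\ell$ and $m$ be integers and $\mathcal{H}$ be a family of hash functions $h$ with domain $\{0,1\}^{\ell}$ and range $\{0,1\}^m$. We say that the family $\mathcal{H}$ is $\delta$-almost universal ($\delta$-AU) if for every pair of distinct inputs $x, y$ from $\{0,1\}^{\ell}$ it holds that $\Pr[h(x) = h(y)] \le \delta$, where the probability is taken over $h \xleftarrow{\$} \mathcal{H}$. For a given probability distribution $X$ over $\{0,1\}^{\ell}$, we say that $\mathcal{H}$ is $\delta$-AU with respect to $X$ if $\Pr[h(x) = h(y)] \le \delta$, where the probability is taken over $h \xleftarrow{\$} \mathcal{H}$ and $x, y \xleftarrow{\$} X$ conditioned on $x \ne y$. Clearly, a family $\mathcal{H}$ is $\delta$-AU if it is $\delta$-AU with respect to all distributions over $\{0,1\}^{\ell}$. If $\delta = 2^{-m}$, then we say that $\mathcal{H}$ is universal.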
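To see Definition 1 on a concrete family, the sketch below enumerates every key of the Carter–Wegman family $h_{a,b}(x) = ((ax + b) \bmod p) \bmod 8$ with $a \ne 0$ and computes the exact collision probability for distinct inputs. The toy parameters ($p = 101$, 8 buckets, inputs $0, \ldots, 9$) are assumptions, and the family acts on integers rather than bit strings; with $8 = 2^3$ buckets (3-bit outputs), the guaranteed bound $1/8$ is the universal bound $2^{-3}$.

```python
from fractions import Fraction
from itertools import combinations

p, buckets = 101, 8   # toy prime (larger than every input) and number of output buckets


def h(a, b, x):
    """Carter-Wegman family: h_{a,b}(x) = ((a*x + b) mod p) mod buckets, with a != 0."""
    return ((a * x + b) % p) % buckets


def collision_probability(x, y):
    """Exact Pr over a uniformly random key (a, b) that h_{a,b}(x) == h_{a,b}(y)."""
    keys = [(a, b) for a in range(1, p) for b in range(p)]
    hits = sum(h(a, b, x) == h(a, b, y) for a, b in keys)
    return Fraction(hits, len(keys))


worst = max(collision_probability(x, y) for x, y in combinations(range(10), 2))
print(worst, worst <= Fraction(1, buckets))   # worst-case collision probability, and True
```

Exact enumeration with `Fraction` avoids any sampling error, so the comparison with $1/8$ is a check of the definition rather than an estimate.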

Leftover Hash Lemma. The leftover hash lemma (LHL) for conditional min-entropy [42] is given as follows.
Lemma 2. Let $(X, Z)$ be a joint distribution on $\mathcal{X} \times \mathcal{Z}$. Let $\ell$ and $m$ be integers, let $X$ be a probability distribution over $\{0,1\}^{\ell}$, and let $\mathcal{H}$ be a family of hash functions with domain $\{0,1\}^{\ell}$ and range $\{0,1\}^m$. If $\mathcal{H}$ is $(1/2^m + \delta)$-almost universal with respect to $X$, $U_m$ is uniform over $\{0,1\}^m$, and $h$ is chosen uniformly at random from $\mathcal{H}$, then the statistical distance between $h(X)$ and $U_m$, given $h$ and $Z$, is bounded as follows:
$$\mathrm{SD}\big((h(X), h, Z), (U_m, h, Z)\big) \le \frac{1}{2}\sqrt{2^{m}\delta + 2^{-L}},$$
where $L = \tilde{\mathrm{H}}_\infty(X \mid Z) - m$ is the entropy loss.
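Lemma 2 can be evaluated numerically. The helper below computes the bound $\frac{1}{2}\sqrt{2^m\delta + 2^{-L}}$ and, for universal hashing ($\delta = 0$), the min-entropy needed to push it below a target $\epsilon$; the function names and the sample values ($m = 128$, $\epsilon = 2^{-64}$) are illustrative assumptions.

```python
import math


def lhl_bound(m, delta, min_entropy):
    """Lemma 2 bound: 0.5 * sqrt(2^m * delta + 2^-L), with L = min_entropy - m."""
    L = min_entropy - m
    return 0.5 * math.sqrt(2.0 ** m * delta + 2.0 ** (-L))


def required_min_entropy(m, eps):
    """Smallest conditional min-entropy for which the delta = 0 bound is <= eps.

    0.5 * 2^(-L/2) <= eps  <=>  L >= 2*log2(1/eps) - 2.
    """
    return m + 2 * math.log2(1 / eps) - 2


# Universal hashing: roughly 2*log2(1/eps) bits of entropy loss are needed.
print(required_min_entropy(128, 2.0 ** -64))   # 254.0
# A family with delta = 2^-m (collision probability 2/2^m) never gets below 1/2.
print(lhl_bound(128, 2.0 ** -128, 1000))       # 0.5
```

The second line shows why $\delta$ must be much smaller than $1/2^m$: the $2^m\delta$ term does not shrink no matter how much entropy the source has.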
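For universal hashing (i.e., $\delta = 0$ in Lemma 2), we need $L \approx 2\log(1/\epsilon)$ to make the statistical distance smaller than $\epsilon$. Note that we need $\delta \ll 1/2^m$ (say, $\delta \approx 1/2^{2m}$). There are families with $\delta = 1/2^m$ (i.e., $(2/2^m)$-AU families) whose outputs are easily distinguishable from the uniform distribution. For example, if $\mathcal{H}$ is a family of pairwise independent hash functions with $m$-bit outputs and we define a new family $\mathcal{H}'$ that is identical to $\mathcal{H}$ except that it replaces the last output bit with 0, then $\mathcal{H}'$ has a collision probability of $2/2^m$, yet its output (which has a fixed bit) is trivially distinguishable from the uniform distribution. Therefore, to be secure, we need $\delta \ll 1/2^m$.

Pseudorandom Function. A pseudorandom function (PRF) [43] is a function $F: \mathcal{K} \times \mathcal{X} \to \mathcal{Y}$, where $\mathcal{K}$ is a key space, $\mathcal{X}$ is a domain, and $\mathcal{Y}$ is a range. For a given security parameter $\lambda$, we say that a PRF $F$ is secure if, for all efficient algorithms $\mathcal{A}$,
$$\Big|\Pr\big[V \xleftarrow{\$} \mathcal{K} : \mathcal{A}^{F(V,\cdot)}(1^\lambda) = 1\big] - \Pr\big[f \xleftarrow{\$} \mathrm{Funs}(\mathcal{X}, \mathcal{Y}) : \mathcal{A}^{f(\cdot)}(1^\lambda) = 1\big]\Big| \le \mathrm{negl}(\lambda),$$
where $\mathrm{Funs}(\mathcal{X}, \mathcal{Y})$ denotes the set of all functions from $\mathcal{X}$ to $\mathcal{Y}$.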

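Biometric-Based Key Derivation Function (BB-KDF). A biometric-based key derivation function consists of two (PPT) algorithms.
(1) The public parameter generation algorithm Setup($1^\lambda$), on input of a security parameter $1^\lambda$, outputs a threshold vector $t$, a PRF $F_V$ with its key $V$, and a hash function $H$ as a public parameter $pp$.
(2) The key derivation algorithm KDF($pp$, $b$), on input of the public parameter $pp$ and a biometric vector $b$, outputs a key, and it outputs the same key for any biometric vector $b'$ with $d(b, b') < t$. Here, $d(b, b') < t$ means that each component-wise difference of the two vectors $b$ and $b'$, denoted by $|b_i - b'_i|$, is smaller than the corresponding threshold $t_i$.
In this paper, we construct a practical biometric-based key derivation function (see Section 4 for detailed explanations). For biometric data of $N$ bits encoded as a vector of length $n$, each component of the biometric vector, denoted by $b_i$, has a bit size of $N/n$.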
(i) Biometric space S. A biometric vector consists of $n$ real-number components. We refer to the set of all biometric vectors as the biometric space, expressed as $S \subset [0, M)^n \subset \mathbb{R}^n$ for some positive integer $M$. For example, a biometric vector $b$ can be expressed as $b = (0.004002, 0.007759, \ldots)$, which is obtained by transforming fingerprint images from [44] with the method of [45].
(ii) Distance d.If the distance between two different biometric vectors is smaller than the threshold value or, to be exact, the threshold vector, two outputs of BB-KDF with these biometric vector inputs will be the same.The distance between two biometric vectors is defined as follows.
For two vectors $b = (b_1, \ldots, b_n)$ and $b' = (b'_1, \ldots, b'_n)$,
$$d(b, b') = \big(|b_1 - b'_1|, \ldots, |b_n - b'_n|\big).$$
For $t = (t_1, \ldots, t_n)$, we write $d(b, b') < t$ if $|b_i - b'_i| < t_i$ for all $i \in [1, n]$.
Biometrics such as fingerprints and faces can be represented as real-number vectors, and a number of cryptographic protocols using this representation have been proposed [16,17]. Whenever a feature vector is extracted as a real-number vector, the BB-KDF described above can be applied, improving the overall efficiency of the authentication system.
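The distance predicate $d(b, b') < t$ amounts to a component-wise comparison. The sketch below (hypothetical helper names, reusing the example vector and thresholds from the accuracy analysis in Section 4) checks it directly.

```python
def distance(b, b_prime):
    """Component-wise distance d(b, b') = (|b_1 - b'_1|, ..., |b_n - b'_n|)."""
    if len(b) != len(b_prime):
        raise ValueError("vectors must have the same length")
    return [abs(x - y) for x, y in zip(b, b_prime)]


def within_threshold(b, b_prime, t):
    """True iff d(b, b') < t, i.e. |b_i - b'_i| < t_i for every i."""
    d = distance(b, b_prime)
    if len(d) != len(t):
        raise ValueError("threshold vector has the wrong length")
    return all(d_i < t_i for d_i, t_i in zip(d, t))


t = [0.01, 0.02, 0.004]
b = [23.784, 50.125, 9.369]
print(within_threshold(b, [23.780, 50.130, 9.366], t))   # True
print(within_threshold(b, [23.800, 50.125, 9.369], t))   # False: first component is off by 0.016
```

This predicate states the intended acceptance condition; whether the derived keys actually coincide also depends on where the rounding boundaries fall, as the accuracy analysis in Section 4 discusses.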

Security Model.
In this section, we introduce two security requirements: privacy of a biometric vector and security of the derived key.

Privacy of a Biometric Vector.
A biometric vector is unique to each user and therefore user-identifiable. Since a biometric vector is sensitive information, its privacy should be guaranteed. Maintaining the privacy of a biometric vector means that no significant information about the biometric vector becomes public. The only public information in our BB-KDF scheme is its public parameter, from which an adversary can extract no meaningful information about users' biometric vectors.
Security and Communication Networks 5
Leakage of information results in a loss of entropy. If our key generation scheme permits only a negligible loss of entropy of a biometric vector given public side information, e.g., the public parameter, we can say that it guarantees the privacy of a biometric vector. The privacy of a biometric vector is deemed to be maintained when the following definition is met.
Definition 3. Let $B$ be an input source and $Z$ be side information. When $Z$ is public, the BB-KDF maintains the privacy of a biometric vector if $\epsilon$, defined below, is negligible:
$$\epsilon = \mathrm{H}_\infty(B) - \tilde{\mathrm{H}}_\infty(B \mid Z).$$
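The entropy loss in Definition 3 can be computed for toy joint distributions. In the sketch below (illustrative names and distributions, not a model of real biometrics), $B$ is uniform over eight values; side information $Z$ drawn independently of $B$, like a system-wide public parameter, causes no loss, while a $Z$ that reveals the parity of $B$ costs one bit.

```python
import math
from collections import defaultdict


def min_entropy(dist):
    return -math.log2(max(dist.values()))


def avg_min_entropy(joint):
    """H~_inf(B | Z) = -log2( sum_z max_b Pr[B = b, Z = z] ) for a dict {(b, z): prob}."""
    best = defaultdict(float)
    for (_, z), p in joint.items():
        best[z] = max(best[z], p)
    return -math.log2(sum(best.values()))


def entropy_loss(joint):
    """The quantity of Definition 3: H_inf(B) - H~_inf(B | Z)."""
    marginal = defaultdict(float)
    for (b, _), p in joint.items():
        marginal[b] += p
    return min_entropy(marginal) - avg_min_entropy(joint)


values = range(8)   # toy source: B uniform over 8 values (3 bits)
independent = {(b, z): 1 / 16 for b in values for z in ("pp0", "pp1")}   # Z unrelated to B
leaky = {(b, b % 2): 1 / 8 for b in values}                                # Z reveals parity of B
print(entropy_loss(independent))   # 0.0
print(entropy_loss(leaky))         # 1.0
```

The identity $\mathbb{E}_z[\max_b \Pr[B = b \mid Z = z]] = \sum_z \max_b \Pr[B = b, Z = z]$ is what lets `avg_min_entropy` work directly on the joint table.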
3.2.2. Security of the Derived Key.
The key derived by our BB-KDF should be cryptographically secure. In other words, an adversary should not be able to figure out the key generated by the BB-KDF or obtain any non-negligible information about the biometric vector associated with the derived key. We define the security level afforded by the BB-KDF through the statistical distance (SD) between two distributions: the distribution of the key derived by the BB-KDF and the uniform distribution.
Assume that the BB-KDF outputs an $m$-bit key string. Given a string $k$ of length $m$, the adversary should not be able to distinguish whether $k$ is derived by the BB-KDF or chosen uniformly at random from $\{0,1\}^m$. In other words, it should be nearly impossible to tell the difference between the distribution of the key derived by the BB-KDF and the uniform distribution over $\{0,1\}^m$. We consider the derived key to be secure when the following definition is met.
Definition 4. Let BB-KDF be a biometric-based key derivation function that outputs an $m$-bit value, $B$ be an input source of the BB-KDF, and $Z$ be side information. When $Z$ is public, the key derived by the BB-KDF is secure if $\epsilon$, defined below, is negligible:
$$\epsilon = \mathrm{SD}(\text{BB-KDF}(B); U_m \mid Z),$$
where $U_m$ denotes the uniform distribution over $\{0,1\}^m$.
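Definition 4 is stated in terms of statistical distance, $\mathrm{SD}(P, Q) = \frac{1}{2}\sum_x |P(x) - Q(x)|$. The sketch below computes it exactly with fractions for small toy distributions; the 4-bit "key" whose last bit is always 0 is a hypothetical example of an output that is far from uniform even though its other bits look random.

```python
from fractions import Fraction
from itertools import product


def statistical_distance(p, q):
    """SD(P, Q) = 1/2 * sum_x |P(x) - Q(x)| over the union of both supports."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def uniform(m):
    """Uniform distribution U_m over m-bit strings (as tuples of '0'/'1')."""
    return {bits: Fraction(1, 2 ** m) for bits in product("01", repeat=m)}


# Toy 4-bit "key" distribution whose last bit is always 0.
fixed_bit = {bits: Fraction(1, 8) for bits in product("01", repeat=4) if bits[-1] == "0"}
print(statistical_distance(fixed_bit, uniform(4)))   # 1/2
print(statistical_distance(uniform(4), uniform(4)))  # 0
```

An SD of $1/2$ means a distinguisher that simply checks the last bit succeeds with advantage $1/2$, so such a key would badly fail Definition 4.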

Construction
4.1. Design Goal.
The design goal of the BB-KDF is to derive cryptographic keys only from users' biometric vectors. A cryptographic primitive called the fuzzy extractor has been widely used to generate keys from biometric data. In the fuzzy extractor, additional information called helper data is generated to deal with noisy biometric data. The purpose of helper data, which is derived from the user's biometric data, is to generate or regenerate the same key from several readings of the user's biometric data within a threshold range. In other words, a user who wants to extract a key from his/her biometric data has to access the helper data in addition to his/her own biometric data, which is burdensome. By contrast, in the BB-KDF scheme, each user is able to derive the correct key using only a public parameter and his/her own biometric vector. Recall that the public parameter is not associated with the user's biometric vector and is configured when the system is deployed. Therefore, a user does not need to store or obtain any additional information, such as helper data, and derives a cryptographic key only from his/her own biometric vector.

Biometric-Based Key Derivation Function.
Our construction of the BB-KDF consists of two algorithms, Setup and KDF. Let $n$ be a positive integer denoting the length of a biometric vector.
Setup($1^\lambda$). It takes a security parameter $\lambda$ as input and generates a public parameter $pp$ through the following steps: (1) It sets a threshold value $t_i \in [0, 1)$ for each $i \in [1, n]$.
(2) It selects a PRF $F$ and a key $V$ from the key space of $F$ uniformly at random.
(3) It selects a hash function $H: \{0,1\}^{\ell} \to \{0,1\}^m$.
The public parameter $pp$ is defined by $pp = (t, F_V, H)$.
Note that the key V for PRF  is also included in the public parameter.
Given the $i$-th components $b_i$ and $t_i$ of a biometric vector and a threshold vector, respectively, a biometric value $b'_i$ satisfies $f_{h_i}(b_i) = \lceil b_i \cdot h_i \rfloor = \lceil b'_i \cdot h_i \rfloor = f_{h_i}(b'_i)$, where $h_i = 1/(2t_i)$, only if it lies in the range $[b_i - t_i, b_i + t_i)$. Note that the error-correcting data, denoted by $h_i$, have no correlation with a biometric vector and are determined only by the threshold vector.
In our construction, we use a (nonadaptive) PRF so that the input $X$ of the hash function is uniformly distributed over $\{0,1\}^{\ell}$. Let $F: \mathbb{N}^n \to \{0,1\}^{\ell}$ be a PRF family. In the Setup algorithm, the description of the PRF and its key $V$ are released as public parameters. Although publishing a PRF key is unusual, we do so because we require only that the output distribution of the PRF be indistinguishable from the uniform distribution. This technique has been used in previous constructions [46][47][48][49]. The PRF key $V$ differs from the helper data of the fuzzy extractor in that $V$ is a system parameter and therefore independent of any individual, whereas helper data is biometric-dependent information and each user has his/her own.

Accuracy Analysis.
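To illustrate how our BB-KDF derives a key from similar biometric vectors, we give a simple example. Assume that the length of a biometric vector is 3, i.e., $n = 3$, and the threshold vector is set to $t = (0.01, 0.02, 0.004)$, so that $h = (1/(2t_1), 1/(2t_2), 1/(2t_3)) = (50, 25, 125)$. Given a biometric vector $b = (23.784, 50.125, 9.369) \in \mathbb{R}^3$, the BB-KDF outputs a key $k_b$ as follows:
$$\big(\lceil 23.784 \cdot 50 \rfloor, \lceil 50.125 \cdot 25 \rfloor, \lceil 9.369 \cdot 125 \rfloor\big) = \big(\lceil 1189.2 \rfloor, \lceil 1253.125 \rfloor, \lceil 1171.125 \rfloor\big) = (1189, 1253, 1171),$$
$$k_b = H\big(F_V(1189 \,\|\, 1253 \,\|\, 1171)\big).$$
Hence, for any second reading $b'$ whose components round to the same integers, $k_{b'}$ is exactly equal to $k_b$. By multiplying $b_i$ by $h_i = 1/(2t_i)$ for each $i \in [1, n]$, we allow only a biometric vector $b' = (b'_1, \ldots, b'_n)$ satisfying
$$b_i - t_i \le b'_i < b_i + t_i \quad \text{for all } i \in [1, n]$$
to generate the same key as $k_b$.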
Let us take a closer look at this, using the first components of the biometric vector $b$ and the threshold vector $t$, denoted by $b_1$ and $t_1$, respectively. Given the biometric value $b_1$, we regard any $b'_1 \in [b_1 - t_1, b_1 + t_1]$ as a valid biometric value, namely, one from the same user. By multiplying by $h_1 = 1/(2t_1)$, we intend both $b_1$ and $b'_1$ to produce the same output of the rounding function, i.e., $\lceil b_1 \cdot h_1 \rfloor = \lceil b'_1 \cdot h_1 \rfloor$; however, some errors may arise. In the case of $b_1 = 23.784$ and $t_1 = 0.01$, the range of valid biometric values is [23.774, 23.794], as shown in blue in Figure 1, whereas the actual range that satisfies $\lceil b'_1 \cdot h_1 \rfloor = 1189$ is [23.77475..., 23.79475...). In fact, the range we deal with is slightly higher than expected, as shown in red in Figure 1. As Figure 1 illustrates, there are two gaps between the two ranges. Biometric values within the gap on the left side of Figure 1, [23.774, 23.77475...), will not derive the correct key, even though they are valid data provided by an authorized user. This is a false rejection, where a genuine user is incorrectly rejected as an imposter. It is difficult to reduce the false rejection rate (FRR) through a design-level approach. However, since the margin of error is quite small, this can be managed by establishing appropriate policies for the system. By contrast, biometric values within the gap on the right side of Figure 1, [23.794, 23.79475...), will derive the correct key, even though they are not valid values from the user. This is a false acceptance, where an imposter is incorrectly accepted as a genuine user. We can reduce the false acceptance rate (FAR) by increasing the length of the biometric vector: the larger $n$ is, the smaller the probability that all components of $b'$ fall within the FAR-related ranges. We should therefore configure the length of the biometric vector so that the probability that an imposter generates the correct key is negligible.
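The effect of the vector length on FAR can be estimated with a back-of-the-envelope calculation. Assuming, hypothetically, that each component of an imposter's vector lands in the accepting interval independently with probability $p$, the chance that all $n$ components do is $p^n$, so $n \ge \lambda / \log_2(1/p)$ suffices for a $2^{-\lambda}$ bound. Real biometric features are correlated, so this independence assumption is optimistic.

```python
import math


def min_vector_length(p_component, security_bits):
    """Smallest n with p_component ** n <= 2 ** -security_bits.

    p_component is a hypothetical probability that a single component of an
    imposter's vector falls in the accepting interval; components are assumed
    independent, which real biometric features usually are not.
    """
    if not 0 < p_component < 1:
        raise ValueError("p_component must lie strictly between 0 and 1")
    return math.ceil(security_bits / -math.log2(p_component))


print(min_vector_length(0.5, 20))   # 20
print(min_vector_length(0.1, 80))   # 25
```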
The fundamental reason for the difference between the two ranges is that the rounding function used to derive a key is discrete, whereas a biometric vector is continuous. However, as explained above, there are strategies to deal with these gaps. Furthermore, we can close the gap between the two ranges, thereby reducing both FRR and FAR, by decreasing the threshold values. In the example described above, if we set the threshold value $t_1$ to 0.003, the errors (i.e., the gaps on both sides) become almost zero.
In Figure 2, the gaps on the left and right sides indicate the FAR and FRR, respectively. The gap on the left side represents the range [23.78095..., 23.781), and the gap on the right side represents the range [23.78695..., 23.787). We therefore need to ensure that the gaps shown in Figure 2 are smaller than the corresponding gaps in Figure 1. If we set the threshold value to 0.002 or 0.001, the errors (i.e., the gaps on both sides) are eliminated altogether. Note that since the biometric value $b_1$ is given with three decimal places, the threshold value $t_1$ cannot be smaller than 0.001.

Performance Evaluation.
To show the accuracy of our BB-KDF, we give experimental results in terms of FAR and FRR. The experiments were conducted on two public datasets, FVC2002 (DB1, DB2) [44], each of which consists of 100 users with 8 samples per user.
As input for the BB-KDF evaluation, we used a real-valued fixed-length fingerprint vector generated by the method of [50]. For example, an input vector is represented as (..., 0.013335, 0.032443, 0.009297, ...). The method of [50] consists of two main components: minutiae descriptor extraction and kernel learning-based transformation. As a minutia descriptor, it uses the variable-sized Minutia Cylinder-Code (MCC) [45], which is considered as a … A projection matrix is defined as a 300 × 299 matrix, so the length of the real-valued fixed-length fingerprint vector is 299. This means that the biometric vector taken as input by the BB-KDF has length 299.
The accuracy of the BB-KDF can be analyzed using FAR and FRR. FAR is the rate at which an imposter is accepted as a legitimate user, and FRR is the rate at which a legitimate user is rejected as an imposter. Table 1 shows experimental results of FAR and FRR according to the threshold value $t$. In real applications, achieving a low FAR is more important than achieving a low FRR, because most users can bear with 2-4 retrials as long as the authentication system guarantees the expected level of security against imposters. For example, when $t = 0.063$ and $t = 0.053$, we have FAR = 0.05 (and FRR = 0.66) for both FVC2002-DB1 and FVC2002-DB2. This suggests that the BB-KDF can work effectively in practice.
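FAR and FRR for a key-derivation scheme can be computed directly from key equality. The sketch below assumes a common evaluation convention: genuine pairs are all sample pairs of the same user, and impostor pairs compare the first samples of two different users. The function name and the toy keys are illustrative and do not reproduce the experiments above.

```python
from itertools import combinations


def far_frr(keys_by_user):
    """keys_by_user: one list per user of keys derived from that user's samples.

    Genuine pairs: every pair of samples from the same user (a mismatch is a false
    rejection). Impostor pairs: first samples of two different users (a match is a
    false acceptance). This pairing convention is an assumption of the sketch.
    """
    genuine = [k1 == k2 for keys in keys_by_user for k1, k2 in combinations(keys, 2)]
    impostor = [u[0] == v[0] for u, v in combinations(keys_by_user, 2)]
    frr = genuine.count(False) / len(genuine)
    far = impostor.count(True) / len(impostor)
    return far, frr


# Toy keys (strings stand in for derived key bytes): three users, three samples each.
toy = [["k1", "k1", "k1"], ["k2", "k2", "x"], ["k1", "k3", "k3"]]
far, frr = far_frr(toy)
print(f"FAR = {far:.3f}, FRR = {frr:.3f}")   # FAR = 0.333, FRR = 0.444
```

Because the BB-KDF either reproduces the key exactly or not at all, no similarity score or matching threshold enters the computation; the threshold vector $t$ alone moves the FAR/FRR trade-off.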
The experiments were conducted on a PC with an Intel Core i7-4790 CPU @ 3.60 GHz and 8 GB of RAM. The BB-KDF was implemented using the Python hashlib library. For the PRF and the hash function, we used HMAC-SHA256 and SHA256, respectively. Given a real-valued biometric vector of length 299, the computation of the BB-KDF took about 0.07 ms per sample.
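A timing comparison in the spirit of the PBKDF1 comparison can be set up with `timeit`. The sketch assumes a hypothetical reading of the construction (round, HMAC-SHA256, SHA-256) and a random stand-in vector of length 299; PBKDF1 follows RFC 8018, Section 5.1, with SHA-1 and 1000 iterations. Absolute numbers depend entirely on the machine and Python version and are not meant to reproduce the 0.07 ms or 146-times figures.

```python
import hashlib
import hmac
import math
import secrets
import timeit


def pbkdf1_sha1(password: bytes, salt: bytes, c: int = 1000, dk_len: int = 20) -> bytes:
    """PBKDF1 (PKCS #5): T_1 = Hash(P || S), T_i = Hash(T_{i-1}), output T_c[:dk_len]."""
    if dk_len > 20:
        raise ValueError("derived key too long for SHA-1")
    t = hashlib.sha1(password + salt).digest()
    for _ in range(c - 1):
        t = hashlib.sha1(t).digest()
    return t[:dk_len]


def bb_kdf(v_key: bytes, b, t) -> bytes:
    """Hypothetical BB-KDF reading: round b_i / (2 t_i), apply HMAC-SHA256, then SHA-256."""
    v = [math.floor(b_i / (2 * t_i) + 0.5) for b_i, t_i in zip(b, t)]
    x = hmac.new(v_key, "|".join(map(str, v)).encode(), hashlib.sha256).digest()
    return hashlib.sha256(x).digest()


n = 299                                                     # vector length used above
b = [secrets.randbelow(10**6) / 10**6 for _ in range(n)]   # random stand-in vector
t = [0.053] * n
V = secrets.token_bytes(32)
salt = secrets.token_bytes(8)                               # PBKDF1 uses an 8-octet salt

runs = 200
t_bb = timeit.timeit(lambda: bb_kdf(V, b, t), number=runs) / runs
t_pb = timeit.timeit(lambda: pbkdf1_sha1(b"password", salt), number=runs) / runs
print(f"BB-KDF: {t_bb * 1e3:.3f} ms, PBKDF1 (c=1000): {t_pb * 1e3:.3f} ms")
```

PBKDF1 appears here only as the comparison baseline; RFC 8018 recommends it solely for compatibility with existing applications.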

Security Analysis
In this section, we analyze the security of the BB-KDF constructed in Section 4. First, we prove the security of our construction in the security models described in Section 3.2. We then present an example showing that the more information about a biometric vector is revealed, the more entropy is lost. This implies an extended version of the security model described in Section 3.2.2, in which more information related to a biometric vector is included in the side information Z.

Privacy of a Biometric Vector.
To ensure the privacy of a biometric vector, we first identify the information revealed to potential adversaries in the BB-KDF scheme. In the BB-KDF scheme described in Section 4, only a public parameter is revealed, which consists of a threshold vector, denoted by $t$, a PRF and its key, denoted by $F_V$, and a hash function, denoted by $H$. The public parameter is not related to users' biometric vectors, and, therefore, an adversary will not be able to extract any meaningful biometric information from it. As a result, the public parameter causes no loss of entropy, as represented by the formula below:
$$\tilde{\mathrm{H}}_\infty(B \mid Z) = \mathrm{H}_\infty(B),$$
where $B$ is an input source (i.e., a biometric vector) and $Z$ is side information (i.e., a public parameter). Therefore, the BB-KDF scheme satisfies Definition 3 and preserves the privacy of a biometric vector.

Security of the Derived Key.
To ensure that the derived key is cryptographically secure, the distribution of the output of the BB-KDF should be (close to) uniform. In other words, the distribution of the key derived by the BB-KDF should be statistically indistinguishable from the uniform distribution over $\{0,1\}^m$. We can estimate the security of the derived key by computing the statistical distance (SD) between the distribution of the key derived by the BB-KDF and the uniform distribution. In our security proof, we assume that the input source of the BB-KDF has enough entropy to guarantee the security of the derived key.
Assumption. We assume that there is an input source of the BB-KDF whose entropy is large enough to derive a secure key. This is quite a strong assumption, since it is hard to satisfy in practice. However, the gap between reality and the assumption can be bridged by using multimodal biometrics as an input source [1][2][3][4]. In other words, we combine several biometrics to create an input biometric vector, which may have higher entropy than one using only a single biometric modality.
Theorem 5. Let $Z$ be side information that consists of a threshold vector $t = (t_1, \ldots, t_n) \in [0, 1)^n$, a PRF and its key, denoted by $F_V$, and a hash function $H$. Let $H$ be a randomly chosen universal hash function from $\{0,1\}^{\ell}$ to $\{0,1\}^m$ and let $B$ be an input source of the BB-KDF. Then $\mathrm{SD}(\text{BB-KDF}(B); U_m \mid Z)$, the statistical distance between BB-KDF($B$) and the uniform distribution over $\{0,1\}^m$ given $Z$, is at most
$$\frac{1}{2}\sqrt{2^{m}\prod_{i=1}^{n}\frac{2t_i \cdot 10^{d}}{M \cdot 10^{d}}},$$
where $n$ is the length of the biometric vector, $M$ is the upper bound of the biometric space, $d$ is the number of decimal places, and $t_i$ is the $i$-th threshold value in $[0, 1)$.
Note that X denotes the input of the hash function, which is generated by computing the PRF on a biometric vector and a threshold vector. If we assume that H is a randomly chosen universal hash function from ℓ to k bits, we can use the leftover hash lemma (LHL) to bound the statistical distance between H(X) and $U_k$ given Z.

$$\mathrm{SD}(H(X); U_k \mid Z) \le 2^{-L/2},$$
where $L = H_\infty(X \mid Z) - k$ is the entropy loss. We first compute $H_\infty(X \mid Z)$, the min-entropy of X given Z, which is defined as
$$H_\infty(X \mid Z) = -\log_2 \max_{x} \Pr[X = x \mid Z].$$
Since $F_V$ is a PRF whose key V is chosen at random, its output X on each distinct (rounded) input is indistinguishable from a uniformly random string in $\{0,1\}^\ell$. If not, we could construct a distinguisher that distinguishes the output of $F_V$ from a random string when V is randomly chosen. Note that it does not matter whether the PRF key V is released in public or not. Using this property, we derive the following formula:
$$H_\infty(X \mid Z) = -\log_2 \prod_{i=1}^{n} \frac{2t_i}{M} = \sum_{i=1}^{n} \log_2 \frac{M}{2t_i},$$
where $t_i$ is the i-th threshold value in [0, 1), M is the upper bound of the input space, and n is the length of the biometric vector.
Consider the i-th component $b_i$ of a biometric vector. First, since $b_i$ lies in [0, M) for some integer M and has d decimal places, the size of the space of $b_i$ is $M \cdot 10^d$.
Second, consider the biometric values $b'_i$ that satisfy $\lceil b'_i \cdot h_i \rfloor = \lceil b_i \cdot h_i \rfloor$, where $b_i$ is a biometric value, $t_i$ is a threshold value, and $h_i = 1/(2t_i)$. These values form an interval of width $2t_i$, so the size of the space of $b'_i$ is $2t_i \cdot 10^d$. In conclusion, if the threshold vector T and the PRF $F_V$ are public, the probability that X takes any particular value x is
$$\Pr[X = x \mid Z] = \prod_{i=1}^{n} \frac{2t_i \cdot 10^{d}}{M \cdot 10^{d}} = \prod_{i=1}^{n} \frac{2t_i}{M}.$$
Using the leftover hash lemma, the statistical distance between H(X) and $U_k$ given Z is therefore
$$\mathrm{SD}(H(X); U_k \mid Z) \le 2^{-\frac{1}{2}\left(\sum_{i=1}^{n} \log_2 \frac{M}{2t_i} \;-\; k\right)},$$
where k is the length of the derived key in bits and n is the length of the biometric vector. With this formula, we can compute the statistical distance in various settings (e.g., k = 128 and n = 50). For simplicity, we assume that all components of the threshold vector have the same value, denoted by t (i.e., $t_i = t$ for all $i \in [1, n]$). Table 2 gives the statistical distance in various settings. If the statistical distance is negligible in a parameter setting, meeting Definition 4, then the key derived by the BB-KDF in that setting is secure.
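The bound above can be evaluated numerically. The sketch below is a minimal illustration, assuming (as the counting argument implicitly does) that each component of B is uniform over [0, M), so that each rounding cell has probability $2t_i/M$. It uses the simplified LHL form $2^{-L/2}$. The values M = 100 and t = 0.01 are borrowed from Section 5.2 for illustration and are not claimed to reproduce any row of Table 2; the function names are hypothetical.

```python
import math

def min_entropy_bits(thresholds, upper_bound):
    """H_inf(X | Z) = sum_i log2(M / (2 t_i)), i.e. -log2 of prod_i (2 t_i / M)."""
    return sum(math.log2(upper_bound / (2 * t)) for t in thresholds)

def lhl_sd_bound(thresholds, upper_bound, key_bits):
    """Return (L, 2^(-L/2)) with L = H_inf(X | Z) - k, the leftover-hash-lemma bound."""
    gap = min_entropy_bits(thresholds, upper_bound) - key_bits
    return gap, 2.0 ** (-gap / 2)

# k = 128 and n = 50 as in the text; M = 100 and t = 0.01 borrowed from Section 5.2.
gap, bound = lhl_sd_bound([0.01] * 50, upper_bound=100, key_bits=128)
print(f"L = {gap:.1f} bits, SD <= 2^{-gap / 2:.1f}")  # L = 486.4 bits, SD <= 2^-243.2
```

A larger threshold widens each rounding cell, which lowers the min-entropy and therefore weakens the bound; for example, t = 0.1 in the same setting gives a noticeably larger (though still tiny) statistical distance.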

Leakage of a Biometric Vector.
If an adversary cannot obtain any information associated with a user's biometric vector, the probability that the adversary learns meaningful information about the biometric vector is negligible. However, if any additional information about the biometric vector, beyond the public parameter, is revealed, the entropy of the biometric vector is reduced, and an adversary may be able to narrow down the candidate biometric vectors enough to launch an effective attack.
We evaluate the probability that an adversary succeeds in obtaining the user's biometric vector as a function of the degree of leakage. We assume that the length of a biometric vector is 1; that is, a biometric vector consists of only one component, denoted by b. We also assume that the biometric space S is [0, 100) ⊂ ℝ (i.e., M = 100) and that b has three decimal places. Let b = 23.784 and t = 0.01, where t is the threshold value. Note that, when we say the adversary succeeds, we mean that the adversary can find a biometric value b′ that generates the same key as b does. In other words, a successful adversary finds b′ such that K = K′, where KDF(pp, b) → K, KDF(pp, b′) → K′, and pp is the public parameter.
In the next subsections, we show how the degree of information leakage about a biometric vector affects its recovery: Section 5.2.1 covers the case where b is never leaked, Section 5.2.2 the case where the integer part of b is leaked, and Section 5.2.3 a generalized setting where the length of a biometric vector is n (> 1).
To derive the same key K, a biometric value b′ must satisfy ⌈b′ · h⌋ = 1189, since h = 1/(2t) = 50 and ⌈b · h⌋ = ⌈23.784 · 50⌋ = ⌈1189.2⌋ = 1189. In other words, an attack succeeds if the adversary guesses b′ within the range [23.770, 23.790). Since the size of the space of b is $100 \cdot 10^3$ and the range [23.770, 23.790) contains 20 biometric values, the probability that an adversary succeeds in guessing a correct b′ (i.e., b′ ∈ [23.770, 23.790)) is
$$\Pr[\text{success}] = \frac{20}{100 \cdot 10^{3}} = 0.0002,$$
which is only 0.02%. As a result, if the biometric vector is not leaked at all, the probability that an adversary recovers the underlying biometric vector (i.e., a biometric vector from which a valid key is derived) is reasonably low.
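The count of 20 matching values can be checked by exhaustive enumeration. This is a minimal sketch, assuming that ⌈·⌋ rounds halves upward and that the space is every three-decimal value in [0, 100); exact `Fraction` arithmetic is used so that no value near a cell edge is misclassified by floating-point error.

```python
import math
from fractions import Fraction

def round_half_up(x: Fraction) -> int:
    """The rounding function ⌈x⌋, assumed here to round halves upward."""
    return math.floor(x + Fraction(1, 2))

t = Fraction("0.01")            # threshold value
h = 1 / (2 * t)                 # h = 1/(2t) = 50
b = Fraction("23.784")          # the genuine user's biometric value
target = round_half_up(b * h)   # the cell the genuine key is derived from

# Exhaustively enumerate the 3-decimal biometric space [0, 100).
space_size = 100 * 10**3
hits = [Fraction(u, 1000) for u in range(space_size)
        if round_half_up(Fraction(u, 1000) * h) == target]
prob = Fraction(len(hits), space_size)
print(target, len(hits), float(min(hits)), float(max(hits)), float(prob))
# 1189 20 23.77 23.789 0.0002
```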

Leakage of the Integer Part.
We assume that the integer part of the biometric value b is leaked. For example, suppose an adversary knows that the integer part of b is 23. In this case, b lies in [23, 24), and, therefore, it satisfies 1150 ≤ ⌈b · h⌋ ≤ 1200, where t = 0.01 and h = 1/(2t) = 50. So there are fifty-one candidates for the derived key (i.e., from $H(F_V(1150))$ to $H(F_V(1200))$), and, accordingly, fifty-one candidates for b. In the case where ⌈b · h⌋ = 1181, we can determine the range of b that satisfies ⌈b · h⌋ = 1181 as follows:
$$1180.5 \le 50b < 1181.5 \iff 23.61 \le b < 23.63.$$
Since this range (i.e., 23.61 ≤ b < 23.63) is the interval [b − t, b + t) for the threshold value t = 0.01, an adversary may infer that b is 23.62. In the same way, an adversary can compute all fifty-one candidates for b. Table 3 lists them in detail.
Given that the integer part of the biometric value b is 23 and the threshold value t is 0.01 (i.e., the side information available to the adversary), the probability that an adversary succeeds in correctly guessing a valid biometric value is
$$\Pr[\text{success}] = \frac{1}{51} \approx 1.96\%.$$
Compared with the case where no leakage occurs, the probability of success increases. In other words, it is easier for an adversary (i.e., an impostor) to guess the biometric vector of a genuine user when the integer part is leaked. To lower this probability, we can decrease the threshold value t, since the probability of guessing a valid biometric value is $1/(1/(2t) + 1)$. For example, in the above case, lowering t from 0.01 to 0.001 reduces the success probability from 1.96% to about 0.2%.
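The candidate count and the inversion of a single cell can be reproduced directly. A minimal sketch, assuming halves round upward and b has three decimal places; `candidate_cells` is a hypothetical helper that lists every rounded value ⌈b · h⌋ consistent with a leaked integer part. It also shows how shrinking t from 0.01 to 0.001 increases the number of candidates.

```python
import math
from fractions import Fraction

def round_half_up(x: Fraction) -> int:
    """The rounding function ⌈x⌋, assumed here to round halves upward."""
    return math.floor(x + Fraction(1, 2))

def candidate_cells(integer_part: int, t: Fraction, decimals: int = 3):
    """All values ⌈b·h⌋ consistent with a leaked integer part of b (h = 1/(2t))."""
    h = 1 / (2 * t)
    scale = 10**decimals
    cells = {round_half_up(Fraction(integer_part * scale + u, scale) * h)
             for u in range(scale)}
    return sorted(cells), h

cells, h = candidate_cells(23, Fraction("0.01"))
print(len(cells), cells[0], cells[-1])          # 51 1150 1200

# Invert one cell: ⌈b·h⌋ = 1181  <=>  1180.5 <= b·h < 1181.5.
lo, hi = (1181 - Fraction(1, 2)) / h, (1181 + Fraction(1, 2)) / h
print(float(lo), float(hi), float(Fraction(1181) / h))   # 23.61 23.63 23.62

# Lowering t from 0.01 to 0.001 multiplies the number of candidates by ~10.
cells_fine, _ = candidate_cells(23, Fraction("0.001"))
print(f"{1 / len(cells):.2%} -> {1 / len(cells_fine):.2%}")   # 1.96% -> 0.20%
```

The success probabilities assume, as the text does, that the adversary guesses uniformly among the candidate cells.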

Analysis of the Generalized Case.
We now generalize the assumption by extending the length of a biometric vector to n (> 1). Let p be the probability of success when the length of a biometric vector is 1 (e.g., p = 0.0002 in Section 5.2.1). Ideally, the probability that an adversary succeeds in guessing a biometric vector of length n is $p^n$. However, since the components of a biometric vector generally correlate with each other, entropy loss occurs, making the probability of success higher than $p^n$. Let $\alpha_i$ be the entropy-loss factor of the i-th component of the biometric vector. We can write the probability that an adversary succeeds in guessing a biometric vector of length n as follows.
If n is sufficiently large, we can expect the probability of leakage of a user's biometric vector to be negligible.
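For a sense of scale in the ideal (uncorrelated) case, take p = 0.0002 from Section 5.2.1 and n = 50, the vector length used in the statistical-distance example. This back-of-the-envelope calculation ignores the entropy-loss factors, so it gives only the ideal value that correlations between components would raise. Since p = 2t/M for t = 0.01 and M = 100, the exponent coincides with the min-entropy $\sum_{i} \log_2 (M/(2t_i)) \approx 614.4$ bits computed for the same setting.

```latex
p^{n} = \left(2 \times 10^{-4}\right)^{50} = 2^{50} \cdot 10^{-200} \approx 1.13 \times 10^{-185} \approx 2^{-614.4}
```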

Application
6.1. Replacement of PB-KDF.
Password-based key derivation functions (PB-KDFs) are widely used in a variety of applications [27][28][29][30][31]. All of these can be replaced with the proposed BB-KDF. In fact, only the key generation part needs to be replaced, from the password-based to the biometric-based mechanism, and the remaining process does not change.
Since the BB-KDF works with noisy (biometric) data, it can also work with a combination of noisy and deterministic data. That is, with the BB-KDF, a user can generate a cryptographic key from his/her biometric vector and a human-memorable password together. Since users generally choose passwords that are easy to memorize, password entropy is likely to be low, and, therefore, a PB-KDF is vulnerable to offline password-guessing attacks.
Using a BB-KDF based on two factors (biometrics and a password), we can improve on a PB-KDF based on a single factor (a password) in both efficiency and security.
… that KDF is reasonably low. Table 4 describes the computational costs of the proposed authentication system on the user side (for n = 100). The device specifications are given in Table 5.
Theorem 6. Let the secret key in the above authentication scheme (i.e., the discrete logarithm of the user's public key) be derived from a secure BB-KDF. If the discrete-logarithm problem is hard in G, then the authentication scheme above is secure.
Proof. The authentication scheme above is constructed on the basis of the Schnorr identification scheme, whose security rests on the hardness of the discrete-logarithm problem (DLP). In the Schnorr identification scheme, the secret key is chosen uniformly at random. Since we use a secure BB-KDF to derive the secret key in the above authentication scheme, the derived key is indistinguishable from a random string of the same length. Therefore, assuming that the DLP is hard in G, the authentication scheme above is secure. For more details of the proof, see Theorem 12.11 in [56].
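The protocol details are not reproduced in this section, so the following is only a sketch of textbook Schnorr identification in which the secret x is obtained from a BB-KDF output key. Several pieces are assumptions: the toy group (p = 2039, q = 1019, g = 4) is far too small to be secure and is used only so the example runs instantly; `secret_from_bbkdf_key` is a hypothetical mapping from key bytes to $\mathbb{Z}_q$; and the key is a stand-in rather than a real BB-KDF output.

```python
import hashlib
import secrets

# Toy Schnorr group (INSECURE, illustration only): p = 2q + 1 with p, q prime,
# and g = 4 generates the subgroup of prime order q.
P, Q, G = 2039, 1019, 4

def secret_from_bbkdf_key(key: bytes) -> int:
    """Hypothetical mapping of a BB-KDF output key to a Schnorr secret x in [1, q-1]."""
    return int.from_bytes(key, "big") % (Q - 1) + 1

def keygen(key: bytes):
    x = secret_from_bbkdf_key(key)
    return x, pow(G, x, P)          # (secret x, public y = g^x mod p)

def prover_commit():
    r = secrets.randbelow(Q - 1) + 1
    return r, pow(G, r, P)          # (ephemeral r, commitment a = g^r mod p)

def prover_respond(x, r, c):
    return (r + c * x) % Q          # s = r + c*x mod q

def verifier_check(y, a, c, s):
    return pow(G, s, P) == (a * pow(y, c, P)) % P   # accept iff g^s == a * y^c

# One run of the three-move protocol with a stand-in for BB-KDF(b).
key = hashlib.sha256(b"stand-in for BB-KDF(b)").digest()
x, y = keygen(key)
r, a = prover_commit()
c = secrets.randbelow(Q)            # verifier's random challenge
s = prover_respond(x, r, c)
print(verifier_check(y, a, c, s))   # True
```

A real deployment would use a standardized large prime-order group (or an elliptic curve) and an unbiased reduction of the key into $\mathbb{Z}_q$; the sketch only shows where the BB-KDF output enters the scheme.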

Experimental Results
In this section, we analyze the performance of the BB-KDF constructed in Section 4 under various conditions, considering both device specifications and the length of the biometric vector. We also compare the BB-KDF with a PB-KDF in terms of efficiency. Note that our BB-KDF requires the inverse of each threshold value. Because computing these inverses is slow, we assume that each inverse is included in the public parameter for efficient evaluation. Since the threshold vector is public, it is reasonable to assume that the server computes the inverse of each threshold value and publishes it as part of the public parameter.
Algorithm 1 gives pseudocode for the BB-KDF.KDF algorithm. In the experiment, HMAC-SHA-256 and SHA-256 are used as the underlying PRF $F_V$ and hash algorithm H, respectively.

Effects of Device Specification.
In this subsection, we analyze the computational overhead of the BB-KDF on various devices. Table 5 specifies the devices evaluated and the software used for the evaluation.
Given a biometric vector and a public parameter, the computational overhead of running the BB-KDF on MCU1, MCU2, and the smartphone (listed in Table 5) is shown in Figure 3(a). In the experiment, HMAC-SHA-256 and SHA-256 are used as the underlying PRF and hash algorithm, respectively. The size of the PRF key and the length of the biometric vector are set to 32 bytes and 100, respectively. The key generation time of the BB-KDF is measured in three phases. The first phase, denoted by kGen, measures the computing time of the rounding function, namely, $R_{h_i}(b_i) = \lceil b_i \cdot h_i \rfloor$ for all $i \in [1, 100]$. The second phase, denoted by PRF, measures the computing time of the PRF, namely, $F_V(R_{h_1}(b_1) \,\|\, \cdots \,\|\, R_{h_n}(b_n))$. The third phase, denoted by hash, measures the computing time of the hash function, namely, $H(X)$, where $X = F_V(R_{h_1}(b_1) \,\|\, \cdots \,\|\, R_{h_n}(b_n))$. Total denotes the sum of kGen, PRF, and hash.
As illustrated in Figure 3(a), the more powerful the device, the less computational overhead it takes to derive a key. The total computational overhead on MCU1, MCU2, and the smartphone is 1569.25 s, 814.09 s, and 37.7 s, respectively. The results show that the second phase (PRF) takes much more computation than the first phase (kGen).
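The three-phase measurement can be mimicked with a small timing harness. This is a sketch under stated assumptions, not the benchmark used for Figure 3(a): the biometric vector is synthetic (drawn uniformly from three-decimal values in [0, 100)), a single threshold t = 0.01 is used for all components, and the byte encoding of the rounded values is the same hypothetical one as in other sketches. In Python, exact `Fraction` rounding is relatively expensive, so the kGen share here need not match the C-level measurements, and absolute timings are not comparable with the devices in Table 5.

```python
import hashlib
import hmac
import math
import random
import secrets
import time
from fractions import Fraction

def time_phases(n=100, t=Fraction(1, 100), upper=100, decimals=3, seed=1):
    """Time the kGen, PRF and hash phases of one BB-KDF evaluation separately."""
    rng = random.Random(seed)
    scale = 10**decimals
    b = [Fraction(rng.randrange(upper * scale), scale) for _ in range(n)]  # synthetic vector
    h = 1 / (2 * t)                       # inverse threshold, precomputed as a public value
    prf_key = secrets.token_bytes(32)     # 32-byte PRF key, as in the experiment

    t0 = time.perf_counter()
    cells = [math.floor(bi * h + Fraction(1, 2)) for bi in b]           # kGen
    t1 = time.perf_counter()
    msg = b"".join(c.to_bytes(8, "big", signed=True) for c in cells)    # assumed encoding
    x = hmac.new(prf_key, msg, hashlib.sha256).digest()                 # PRF
    t2 = time.perf_counter()
    k = hashlib.sha256(x).digest()                                      # hash
    t3 = time.perf_counter()
    phases = {"kGen": t1 - t0, "PRF": t2 - t1, "hash": t3 - t2}
    phases["Total"] = sum(phases.values())
    return phases, k

phases, derived = time_phases()
for name, seconds in phases.items():
    print(f"{name:>5}: {seconds * 1e6:9.1f} us")
```

Calling `time_phases(n=...)` with 50, 100, 200, 300, and 400 gives a rough software analogue of the vector-length experiment that follows.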

Effects of the Length of Biometric Vector.
In this subsection, we analyze the computational overhead of the BB-KDF as a function of the length of the biometric vector, denoted by n. We consider five cases, where n is 50, 100, 200, 300, or 400. Since several biometrics can be combined to build a single biometric vector [1][2][3][4], we set the length of the biometric vector to be sufficiently large. Figure 3(b) illustrates the computational overhead of the BB-KDF for each vector length. For example, when n = 200, the computational overhead on MCU1, MCU2, and the smartphone is 3239.09 s, 1617.85 s, and 49.87 s, respectively, indicating that the BB-KDF is deployable on all three devices. The results show that the computational overhead increases linearly as the length of the biometric vector grows.

Comparison with PB-KDF.
Table 6 compares the computation time of the BB-KDF and a PB-KDF on both a smartphone and a desktop. In the experiment, PBKDF1 from PKCS #5 is employed with the iteration count set to 1000 [26]. The BB-KDF employs only one PRF and one hash function, so key generation is much faster with the BB-KDF than with the PB-KDF. Feature extraction time varies with the extraction algorithm and the type of biometric data [57,58]. If we use a suitable algorithm for feature extraction, such as [16,59], the BB-KDF can be an

Conclusions
In this paper, we have proposed the construction of a novel biometric-based key derivation function, called BB-KDF, and have proven its security.The proposed BB-KDF is computationally efficient, as shown in our experiments.
As a result, it can be deployed in a wide range of devices for the Internet of Things (IoT) and can replace existing PB-KDFs in various environments.We have also introduced an authentication protocol using BB-KDF as an application.
For future work, it will be interesting to study the construction of BB-KDFs that are robust against outliers. Here, outliers are biased biometric readings that lie further from the expected values than is deemed reasonable. Since biometric readings may differ from environment to environment, or from sensor to sensor, outliers may occur with high probability.

Table 1: FAR/FRR of the implementation of the proposed BB-KDF.

Table 2: Statistical distance in various settings.

Table 3: Candidates for a biometric value (in the case of leakage of the integer part).

Table 4: Computational costs of the authentication system.

Table 5: Specifications of devices and software.