Deep Hybrid Multimodal Biometric Recognition System Based on Features-Level Deep Fusion of Five Biometric Traits

The need for information security and the adoption of the relevant regulations is becoming an overwhelming demand worldwide. As an efficient solution, hybrid multimodal biometric systems use fusion to combine multiple biometric traits and sources, improving recognition accuracy, providing higher security assurance, and overcoming the limitations of uni-biometric systems. In this paper, three strategies for the feature-level deep fusion of five biometric traits (face, both irises, and two fingerprints) derived from three sources of evidence are proposed and compared. In the first two proposed methodologies, each feature vector is mapped separately from the feature space into a reproducing kernel Hilbert space (RKHS) by selecting an appropriate reproducing kernel. In this higher-dimensional space, where nonlinear relations become linear, dimensionality reduction algorithms (KPCA, KLDA) and quaternion-based algorithms (KQPCA, KQLDA) are used to fuse the feature vectors. In the third methodology, the feature spaces are fused based on deep learning by combining feature vectors in deep, fully connected layers. The experimental results on six databases clearly show that the multimodal template obtained from the deep fusion of feature spaces is secure against spoof attacks and makes the system robust; thanks to the low dimensionality of the fused vector, it raises the accuracy of the hybrid multimodal biometric system to 100%, a significant improvement over uni-biometric and other multimodal systems.


Introduction
The multibiometric system utilizes fusion to combine multiple biometric sources with improved recognition accuracy [1] while eliminating the limitations of uni-biometric systems, which rely on a single biometric trait and are therefore subject to noise, poor data quality, nonuniversality, and large variations between users. As far as multiple sources of biometric information are concerned, five possible scenarios exist, and multibiometric systems can be classified accordingly as multisensory, multi-algorithm, multi-instance, multisampling, and multimodal. In four of these scenarios, several types of information are derived from a single biometric trait (such as a fingerprint or an iris), while the fifth scenario, called the multimodal biometric system, involves the use of several biometric traits (such as fingerprints and irises). These five scenarios can also be combined into a single multibiometric system [2]; such systems are known as hybrid multibiometric systems (Figure 1).
Moreover, to further increase the complexity of user authentication and ensure higher security, more than one trait is combined [3]. Therefore, this paper introduces hybrid multibiometric structures to solve the abovementioned problems. Thanks to stronger reliability, greater applicability, and better security, multimodal biometric systems have been developed for biometric recognition and have attracted more researchers [4].
As shown in Figure 2, there are four levels of biometric data fusion. At the sensor level, raw data are combined; multimodal systems cannot benefit from this type of fusion, but it can improve the efficiency of uni-biometric systems. Feature-level fusion combines various biometric features of the same class. Score-level fusion combines the scores obtained from multiple classifiers, each of which pertains to a particular biometric; the simplicity and low cost of this method make it a popular choice for designing multibiometric systems. Decision-level fusion combines several decisions, each derived from a single biometric system; its effectiveness, however, is lower than that of score-level fusion. Because uni-biometric systems are efficient at recognizing individuals, both of these latter levels can improve systems with limited space. A comparison of the four fusion levels reveals that the feature level can extract the maximum discriminative data from the initial feature sets and remove redundant information [4,5]. Multimodal systems are therefore best devised using feature-level fusion, owing to the richer information in the feature vectors. Machine learning methods have related applications in biomedical science [6][7][8][9], knowledge science [10,11], and network and system protection [12][13][14][15][16].
We approach feature-level deep fusion as a design and development technique to achieve a robust and secure hybrid multimodal biometric template. However, direct concatenation becomes challenging for feature sets with inherent differences in their representation (e.g., IrisCode for the iris and minutiae for fingerprints) [17,18]. Different feature fusion approaches have been explored by several authors [19][20][21] for fusing other modalities reasonably and effectively. We likewise propose three strategies for the feature-level deep fusion of biometric traits.
A fused feature vector is formed by combining two or more feature vectors in the feature space; as a result, the final vector has higher discriminative power than the original vectors. Feature vectors can be combined by mapping them, via selected reproducing kernel functions, to the reproducing kernel Hilbert space (RKHS) of much higher dimension, and then fusing the mapped vectors in the RKHS via dimensionality reduction algorithms (KPCA, KLDA) and quaternion-based algorithms (KQPCA, KQLDA).
Alternatively, the feature spaces can be fused based on deep learning by combining feature vectors in deep, fully connected layers (Figure 3).
Three strategies are presented here to merge the face, combined iris, and combined fingerprint feature vectors in a hybrid multibiometric recognition system and create a robust and secure multimodal biometric template. We propose and compare dimensionality reduction algorithms (KPCA, KLDA) and quaternion-based algorithms (KQPCA, KQLDA) in the reproducing kernel Hilbert space (RKHS), as well as deep learning-based fusion.
The remainder of this paper is organized as follows. Section 2 introduces the theoretical foundations of the kernel method, Hilbert space, the RKHS, and quaternions, as well as deep learning as a method for combining feature vectors in deep, fully connected layers. Section 3 presents the design and implementation of the proposed hybrid system, including the feature extraction, feature-level fusion, and classification modules. The results of the analysis and tests are reported in Section 4, and the conclusion is presented in Section 5.

Theoretical Fundamentals
In this section, to explain the proposed methodology, we present the main theoretical fundamentals on which we relied in our previous works [22,23].

Kernel Method.
The kernel technique is based on the following idea. Assume that x₁, x₂ ∈ X ⊆ R^d represent two specimens and φ: X ⟶ H is a nonlinear feature map that transforms each element of X into a high-dimensional (or even infinite-dimensional) reproducing kernel Hilbert space (RKHS). The inner product between φ(x₁) and φ(x₂) in the feature space H can be calculated through a kernel function:

k(x₁, x₂) = 〈φ(x₁), φ(x₂)〉_H.

In practice, this inner product is obtained directly from the kernel k without having to find the explicit expression of φ, which is often called the "kernel trick." Even though it is effective in learning nonlinear structures, the kernel technique often suffers from scalability drawbacks when dealing with large-scale problems due to extreme space and time complexity [24].

Hilbert Space (H).
Numerous results carry over from Hilbert space theory, especially in infinite-dimensional function spaces. A Hilbert space H is a complex vector space on which the inner product of each pair of vectors satisfies conjugate symmetry, 〈y, x〉 = 〈x, y〉*. Moreover, H with a distance function d(x, y) is a complete metric space, so that every Cauchy sequence in H has a limit in H. In a Hilbert space, the Pythagorean theorem and parallelogram law have exact analogs, and each element can be represented by its coordinates with respect to an orthonormal basis. The extension of projections and changes of basis from finite-dimensional spaces to Hilbert space is one of the things that shows the breadth of applications of Hilbert space.

Reproducing Kernel Hilbert Space (RKHS).
The Hilbert space can be used to extract nonlinearity or higher-order moments from data [25]. In an RKHS, for every x ∈ X the function k_x(y) = k(x, y) belongs to H. Specifically, when X is a finite set, G = (k(x, y))_{x,y} can be determined as a self-adjoint matrix, known as the Gram matrix of H [26]. It also follows from the reproducing property that the kernel is positive semidefinite: for any points x₁, ..., x_n ∈ X and coefficients a₁, ..., a_n, the quadratic form Σ_{i,j} a_i a_j k(x_i, x_j) ≥ 0, i.e., the matrix A = (k(x_i, x_j)) ≥ 0.
The RKHS property is significant in kernel learning because it uniquely determines a reproducing kernel function k(x, ·) in a specific Hilbert space H. In practice, H may be abstracted as the image Φ(x) under a nonlinear, possibly infinite-dimensional, mapping function with the property k(x, y) = 〈φ(x), φ(y)〉_H, also called the kernel trick. The kernel trick lets us directly evaluate the inner product between φ(x) and φ(x′) in H without constructing an explicit expression for φ. Indeed, some kernels, such as the Gaussian kernel, map into infinite-dimensional feature spaces. Many kernel-based learning algorithms operate on only a finite number of data points through Gram matrices. The Gram matrix G ∈ R^{n×n} on a data set x₁, ..., x_n is given by G_ij = k(x_i, x_j) [27]. Our transfer operators may live in a space of infinite dimension, but all relevant operations can be performed using Gram matrices derived from the training data. Feature spaces can be infinite-dimensional (e.g., Gaussian kernels) as well as low-dimensional (e.g., polynomial kernels) [28]. In this paper, feature vectors of faces, irises, and fingerprints are mapped with appropriate kernel functions from the feature space to the Hilbert space (of higher dimension), and the change of basis is made in that space to extract more linear relationships.
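As a concrete illustration of the Gram-matrix view described above, the following minimal Python sketch (not from the paper; the toy data and the Gaussian kernel bandwidth are arbitrary) builds a Gram matrix G_ij = k(x_i, x_j) for a small set of feature vectors:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: inner product <phi(x), phi(y)> in the RKHS."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def gram_matrix(X, kernel):
    """Gram matrix G_ij = k(x_i, x_j) over a data set X of shape (n, d)."""
    n = X.shape[0]
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = kernel(X[i], X[j])
    return G

# Toy feature vectors standing in for normalized biometric feature vectors.
X = np.random.default_rng(0).normal(size=(5, 16))
G = gram_matrix(X, gaussian_kernel)
print(G.shape)               # (5, 5)
print(np.allclose(G, G.T))   # the Gram matrix is symmetric (self-adjoint)
```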

We can calculate the principal components or linear discriminants in the RKHS by finding the best kernel function, using high-order correlations of the input pixels. The Matlab code selects the best response based on mapping the input image to a higher-order feature space using multiple kernels. In general, linear kernels, polynomial kernels (and the related PolyPlus kernel), Gaussian kernels, and Hamming kernels (where m is the number of image pixels) are the most commonly used kernel functions in the RKHS, as sketched below.
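For illustration, the kernel families named above can be written as simple Python functions; the degrees, bandwidth, and in particular the exact form of the Hamming kernel are assumptions for demonstration, not the parameterizations used in the paper:

```python
import numpy as np

# Illustrative kernel functions; degree, sigma, and the Hamming form are
# assumptions for demonstration, not the parameters used in the paper.
def linear_kernel(x, y):
    return float(np.dot(x, y))

def polynomial_kernel(x, y, d=2):
    return float(np.dot(x, y)) ** d

def polyplus_kernel(x, y, d=2):
    return (float(np.dot(x, y)) + 1.0) ** d

def gaussian_kernel(x, y, sigma=1.0):
    return float(np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2)))

def hamming_kernel(x, y, m=None):
    # For binary codes (e.g., IrisCode); m is the number of image pixels/bits.
    m = len(x) if m is None else m
    return (m - np.count_nonzero(np.asarray(x) != np.asarray(y))) / m
```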

Quaternions.
The current section provides background on quaternion algebra and the quaternion operators used to introduce our models. As a number system in mathematics, the quaternions extend the complex numbers. A quaternion is of the form Q = a·1 + b·i + c·j + d·k, where a, b, c, and d are real numbers and i, j, and k are the imaginary units (Table 1) [29].
Compared with traditional real-valued approaches in Euclidean space, quaternions are advantageous for several reasons: (1) a quaternion is constructed from one real component and three imaginary components, which leads to greater expressiveness; (2) the Hamilton product replaces the dot product of Euclidean space when combining quaternion numbers/vectors, which better captures the relationships across multiple (inter-latent) quaternions, strengthening their inter-latent relationships and leading to a more expressive model; (3) the Hamilton product shares weights, so a quaternion model has fewer parameters than one without it. Figure 4 illustrates how quaternions are superior to real-valued representations by comparing the corresponding transformation procedures. Compared with real-valued representations in Euclidean space, quaternions can provide superior coding of inter-dependency interactions with a 75% reduction in parameters [30].
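As a minimal illustration of the Hamilton product referred to above (the numeric values are arbitrary), the following sketch multiplies two quaternions written as (a, b, c, d) for a + b·i + c·j + d·k:

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of quaternions p = (a, b, c, d) and q = (e, f, g, h)."""
    a, b, c, d = p
    e, f, g, h = q
    return np.array([
        a * e - b * f - c * g - d * h,   # real part
        a * f + b * e + c * h - d * g,   # i component
        a * g - b * h + c * e + d * f,   # j component
        a * h + b * g - c * f + d * e,   # k component
    ])

p = np.array([1.0, 2.0, 3.0, 4.0])
q = np.array([0.5, -1.0, 0.0, 2.0])
print(hamilton_product(p, q))
# The product is non-commutative: hamilton_product(p, q) != hamilton_product(q, p).
```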
Parallel feature fusion by quaternion may be extended in the RKHS and performed directly with a data-adapted kernel. This corresponds to parallel fusion on a hyperplane in the possibly infinite-dimensional space H onto which a nonlinear kernel maps the patterns Φ(x). The advantage is that using the appropriate kernel in the RKHS largely results in a linear resolution of the nonlinear relationships among the multibiometric features.
A feature-level fusion module implements the proposed methodology, as shown in Figure 5. Kernel functions are used to map the feature subspaces to the RKHS. A nonlinear kernel mapping, with a kernel function that maximizes the interclass scatter while minimizing the intraclass scatter, is defined to extract the discriminant information. In the RKHS, the feature vectors are integrated using several quaternion-based algorithms, including quaternion singular value decomposition (QSVD), quaternion principal component analysis (QPCA), quaternion linear discriminant analysis (QLDA) [31], and quaternion locality preserving projection (QLPP). QPCA and QSVD extract the global data from the quaternion division ring, and QLPP extracts the local data, finding the essential manifold structure of the quaternion fusion. In addition, QLDA minimizes scatter within classes and maximizes variance between classes in the quaternion fusion feature sets [31]. The result is a quaternion-based fusion in the RKHS. The proposed algorithm for achieving a hybrid multimodal template from the three feature vectors of the face, combined iris, and combined fingerprint in the RKHS by quaternion comprises the following four steps (sketched in the example below): (i) Step 1: normalize the feature vectors. (ii) Step 2: apply kernel mapping to each individual input data point x_i; a nonlinear mapping function φ maps the input feature point x_i into an implicit high-dimensional feature space F (the RKHS), in which a kernel function supplies the required inner products. (iii) Step 3: place the three mapped feature vectors in the three imaginary parts of a quaternion. (iv) Step 4: apply the quaternion-based algorithms above to obtain the fused multimodal template. Accordingly, we propose a quaternion-based parallel fusion algorithm in the RKHS that models the fusion of the three types of biometrics: face, iris, and fingerprint. Unlike prior works on feature-level fusion, which are based on Euclidean space, the introduced system models the fusion of the three biometric feature vectors of the face, combined irises, and combined fingerprints in a high-dimensional (or even infinite-dimensional) reproducing kernel Hilbert space and a hypercomplex system (i.e., quaternion space), achieving further efficiency.
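The following hedged sketch illustrates the step sequence just described under simple assumptions: the kernel mapping is approximated with scikit-learn's KernelPCA (an RBF kernel with arbitrary parameters), and the three mapped modality vectors are placed in the three imaginary parts of a quaternion array; the subsequent quaternion-based algorithms (QSVD/QPCA/QLDA/QLPP) are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def kernel_map(X, n_components=64, gamma=1e-3):
    """Map feature vectors into an RKHS-derived subspace via kernel PCA."""
    return KernelPCA(n_components=n_components, kernel="rbf", gamma=gamma).fit_transform(X)

# Toy stand-ins for the face, combined-iris, and combined-fingerprint feature matrices.
rng = np.random.default_rng(0)
face, iris, finger = (rng.normal(size=(100, 256)) for _ in range(3))

# Step 1: normalize; Step 2: kernel mapping of each modality separately.
norm = lambda X: (X - X.mean(0)) / (X.std(0) + 1e-8)
f_face, f_iris, f_fing = (kernel_map(norm(X)) for X in (face, iris, finger))

# Step 3: fill the three imaginary parts of a quaternion array (real part zero);
# Step 4 (not shown) would apply quaternion-based algorithms (QSVD/QPCA/QLDA/QLPP) to Q.
Q = np.stack([np.zeros_like(f_face), f_face, f_iris, f_fing], axis=-1)
print(Q.shape)  # (100, 64, 4): samples x fused dimension x quaternion components
```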

Deep Learning.
In the previous sections, the theoretical foundations of quaternions and dimensionality reduction algorithms in the reproducing kernel Hilbert space (RKHS) were briefly discussed to express one of the proposed models for combining the feature vectors of several biometric traits into a hybrid biometric template. Similarly, this section covers deep learning, that is, the construction of a machine learning model that learns a hierarchical representation of the data. Deep learning is a potent tool because it can manage large amounts of data. Neural networks with multiple layers are generally referred to as deep neural networks, which illustrate how stacked layers can successfully create representational structures [32][33][34]. The weights of these networks can be adjusted using supervised and unsupervised feature learning algorithms. The convolutional neural network (CNN) is one of the most popular deep neural networks. Convolutional, pooling, and fully connected layers make up the CNN architecture. The convolution layer is the backbone of any CNN model: this layer performs a pixel-by-pixel scan of the images and creates a feature map used for later classification. Pooling, also known as downsampling, reduces the overall dimensions of the images so that the information passed on from each convolution layer is limited to the essential information. The sequence of convolution layers followed by pooling may be repeated multiple times. Once the feature analysis has been performed, the fully connected layers assign (initially random) weights to their inputs and learn to predict the appropriate label. The fully connected output layer is the last layer of the CNN model; it contains the labels assigned for classification and allocates a class to each image.
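A minimal PyTorch sketch of the convolution, ReLU, pooling, fully connected, and softmax pipeline described above follows; the layer sizes and the number of classes are illustrative assumptions, not the paper's network.

```python
import torch
import torch.nn as nn

# Minimal CNN mirroring the pipeline described above (sizes are illustrative only).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution: pixel-wise scan -> feature maps
    nn.ReLU(),                                   # zero out negative activations
    nn.MaxPool2d(2),                             # pooling: downsample, keep salient responses
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 128),                # fully connected layer
    nn.ReLU(),
    nn.Linear(128, 100),                         # one output neuron per class (e.g., 100 users)
)
logits = model(torch.randn(1, 1, 64, 64))
probs = torch.softmax(logits, dim=1)             # softmax gives class probabilities
print(probs.shape)                               # torch.Size([1, 100])
```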
As described above, the filters determine the properties through the convolution process, producing attribute (feature) maps as their output, as shown in Figure 6. These attribute vectors are placed next to each other and combined to form the hybrid template for our system. Figure 7 shows the three main modules of the hybrid multibiometric system: feature extraction, feature fusion, and classification. The proposed methodology is implemented in the fusion layer. The modules are described in the following subsections.

Face, Iris, and Fingerprint Features Extraction.
The unimodal iris, face, and fingerprint feature extraction algorithms used in our previous work were reused in the proposed hybrid multimodal biometric system based on the three traits (iris, face, and fingerprint).
(1) Face Feature Extraction. There are three types of face feature extraction methods: model-based, template-based, and appearance-based. Several appearance-based methods are available, including principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projections (LPP), local binary patterns (LBP), and the discrete cosine transform (DCT). Nonlinear methods include kernel principal component analysis, kernel linear discriminant analysis, and kernel locality preserving projections. A face undergoes extensive intrinsic changes (e.g., age wrinkles and facial expression variations) and extrinsic changes (e.g., occlusions, gestures, and lighting variations) in the real world. Because of high cost and complex calculations, it may be challenging to provide many experimental face photos. Accordingly, face recognition is often a nonlinear problem due to the complexity, small number, and high dimensionality of the sample photos. This explains the importance of developing algorithms with useful features and nonlinear scales for describing the nonlinear relationships between face samples. Considering that the kernel method can effectively capture nonlinear similarities between samples, kernel-based face feature extraction methods have been introduced to extend linear algorithms: appropriate kernel functions map the samples implicitly into a new feature space with higher dimensionality. In this new feature space, nonlinear relations become linear, and the distance metrics are trained for the desired use. Of course, despite the excellent performance of kernel functions in various algorithms, one key issue is selecting the appropriate kernel, or the parameters of a specific kernel [35]. This paper uses KPCA [36] and KLDA [37] to extract face features, as illustrated in the sketch below.
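As a brief, hedged example of kernel-based face feature extraction, the sketch below applies scikit-learn's KernelPCA (an RBF kernel with arbitrary parameters) to toy face vectors; it stands in for the KPCA step, not the paper's exact configuration.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Face images flattened to row vectors (toy data standing in for a face database).
rng = np.random.default_rng(1)
faces = rng.normal(size=(200, 64 * 64))   # 200 images of 64x64 pixels

# KPCA: implicit mapping to the RKHS via an RBF kernel, then linear PCA there.
kpca = KernelPCA(n_components=100, kernel="rbf", gamma=1e-4)
face_features = kpca.fit_transform(faces)
print(face_features.shape)                # (200, 100) face feature vectors
```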
(2) Iris Feature Extraction. The iris recognition system is an accurate and reliable biometric technology. The Daugman algorithm [38] and the Hough transform [39] are used to extract iris features. The iris feature extraction algorithm may be summarized in three steps: (1) Identifying the iris boundaries in an eye image is the first and most vital step in iris recognition. In the Daugman algorithm, (x, y) denotes a point of the iris image, r denotes the radius range for searching in the image, and G(r) represents the Gaussian smoothing function; the search starts from the pupil to detect changes in the maximum pixel values (partial derivative). In the Hough transform, two circles represent the eye, with (xi, yi) denoting the center coordinates and r denoting the radius.
(2) A rectangular block is created by normalizing the images segmented by the circles into equal dimensions. (3) The iris features are extracted by applying a 1D Log-Gabor filter to the normalized image to capture information about the iris tissue. This filter is a Gabor function on a logarithmic scale, and the frequency response of the Log-Gabor filter is given by G(f) = exp(−(ln(f/f₀))² / (2 (ln(σ/f₀))²)), where f₀ denotes the center frequency and σ determines the filter bandwidth. A 9600-bit code is used to encode the iris features, while another 9600-bit code handles the upper and lower eyelashes.
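A small sketch of the 1D Log-Gabor frequency response given above follows; the center frequency and bandwidth ratio are illustrative assumptions, not the paper's values.

```python
import numpy as np

def log_gabor_response(f, f0=0.05, sigma_ratio=0.5):
    """Standard 1D Log-Gabor transfer function
    G(f) = exp(-(ln(f/f0))^2 / (2 * (ln(sigma/f0))^2)),
    with sigma expressed via sigma_ratio = sigma / f0 (values here are illustrative)."""
    f = np.asarray(f, dtype=float)
    G = np.zeros_like(f)
    nz = f > 0                            # the Log-Gabor filter has zero DC response
    G[nz] = np.exp(-(np.log(f[nz] / f0)) ** 2 / (2 * (np.log(sigma_ratio)) ** 2))
    return G

freqs = np.linspace(0, 0.5, 256)
print(log_gabor_response(freqs).max())    # the response peaks at f = f0
```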
(3) Fingerprint Feature Extraction. Fingerprint verification techniques can be categorized as correlation-based, ridge-feature-based, and minutiae-based. The pattern of ridges and valleys can also be considered an oriented texture. The minutiae pattern is unique, but many factors affect system performance, such as noise and distortion occurring while acquiring the image [40].
To solve the problems with fingerprint matching algorithms, new representations and techniques for fingerprint images have been proposed recently [41]. Fake fingerprints made from special materials can attack fingerprint identification systems, so fingerprint liveness detection (FLD) algorithms have been developed to enhance the security of these systems. Software-based FLD methods fall into five categories based on image quality, sweat pores, perspiration, skin deformation, and texture features. The first four FLD methods offer a poor user experience, as they require two or more images to compare. Texture-based methods avoid this by analyzing fine texture information from just one image. Visual traits such as texture reflect the arrangement properties of the surface structure and describe the homogeneity of an image. For FLD, fingerprint texture information can be used to determine the morphology, smoothness, and orientation of authentic and fake fingerprints [42].
The fingerprint feature space is based on the fingerprint texture features. For fingerprint feature extraction, methods such as minutiae matching, the short-time Fourier transform (STFT), and Gabor filter banks [43] are used; Gabor filter banks are the most common choice.
After enhancing the fingerprint image, the fingerprint feature extraction algorithm has four major steps (see the sketch below): (1) identify the target area and the reference point; (2) segment the target area with respect to the reference point; (3) filter the target area with the Gabor filter bank in six or eight different directions; (4) calculate the absolute standard deviation of each segment to generate the feature vector [44].
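The sketch below illustrates steps (3) and (4) under stated assumptions: a scikit-image Gabor filter bank at eight orientations, 16×16 blocks, and a mean absolute deviation per block standing in for the deviation statistic; none of these parameters are taken from the paper.

```python
import numpy as np
from skimage.filters import gabor

def fingerprint_features(region, n_orientations=8, frequency=0.1, block=16):
    """Filter an already-segmented target area with a Gabor filter bank at
    several orientations and compute one deviation value per block per filter.
    Parameters (frequency, block size, deviation statistic) are illustrative."""
    feats = []
    for k in range(n_orientations):
        theta = k * np.pi / n_orientations
        real, _ = gabor(region, frequency=frequency, theta=theta)
        h, w = real.shape
        for i in range(0, h - block + 1, block):
            for j in range(0, w - block + 1, block):
                cell = real[i:i + block, j:j + block]
                feats.append(np.mean(np.abs(cell - cell.mean())))  # deviation of the block
    return np.array(feats)

region = np.random.default_rng(2).random((64, 64))   # toy stand-in for a segmented fingerprint area
print(fingerprint_features(region).shape)            # (8 orientations x 16 blocks,) = (128,)
```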

CNN-Based Feature Extraction.
In the VGG-16 architecture [45], the CNN comprises convolutional layers, pooling layers, and fully connected layers. An input image is processed by a convolutional layer using a sliding-window technique. A feature map is produced by convolving the original image, capturing various features such as edges and corners; different filters thus produce different feature maps. After a convolution layer, a nonlinear activation function (e.g., the Rectified Linear Unit, ReLU) is usually applied element-wise to generate a rectified feature map; ReLU replaces all negative pixel values in a feature map with zero. To decrease the dimensionality of the rectified feature map, a pooling layer is used. By pooling the pixels in the local neighborhoods of a feature map, pooling retains the significant information on the map, making the feature map robust to small scale and translation changes [46]. Following the sequence of convolutional layers with nonlinear activation and pooling, CNNs have one or more fully connected layers, in which all neurons are connected to all neurons in the subsequent layer, so that the first fully connected layer is coupled to the last reduced feature map. The fully connected layers further reduce the dimensionality and capture nonlinear dependencies. The last fully connected layer has as many output neurons as there are target classes and uses the softmax function. Several pretrained CNN architectures are currently available, including VGG-16 [46]. The VGG-16 network provides outstanding efficiency on the ImageNet competition, in which the network is trained on a vast number of images in one thousand categories. Moreover, VGG-16 was used with good results in our previous work based on Faster R-CNN, giving us the impetus to reuse it in the current study. VGG-16 has thirteen convolution + ReLU layers, five pooling layers, and three fully connected layers [46] (see Figure 8).
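A hedged sketch of extracting deep features from a pretrained VGG-16 with torchvision is shown below; the weight identifier follows the recent torchvision API (older versions use pretrained=True), the call downloads ImageNet weights, and taking the second fully connected layer's 4096-dimensional output is an assumption consistent with the fusion described later.

```python
import torch
from torchvision import models

# Load a pretrained VGG-16 (ImageNet weights) and expose the 4096-d output of
# the second fully connected layer as a deep feature vector.
vgg = models.vgg16(weights="IMAGENET1K_V1")   # older torchvision: models.vgg16(pretrained=True)
vgg.eval()

def vgg16_fc2_features(batch):
    """batch: float tensor of shape (N, 3, 224, 224), already normalized."""
    with torch.no_grad():
        x = vgg.features(batch)               # 13 conv+ReLU layers with 5 poolings
        x = vgg.avgpool(x)
        x = torch.flatten(x, 1)
        x = vgg.classifier[:5](x)             # fc1 -> ReLU -> dropout -> fc2 -> ReLU
    return x                                  # (N, 4096)

print(vgg16_fc2_features(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 4096])
```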

Feature Fusion Module.
The output of the feature fusion module is a single vector, the hybrid multimodal template, obtained by combining the three vectors of face, iris, and fingerprint features. This vector has more discriminative power than the output feature vectors of the feature extraction module. As Figure 2 illustrates, the feature space contains the richest data; that is, feature vectors are quantitatively and qualitatively better than the other levels in terms of information. Data fusion in the feature space is essential in two respects: first, it derives discriminant information from the original set of features; second, it can remove unnecessary and repetitive information arising from the correlation between separate sets of features. In other words, feature fusion should produce the vector that gives maximum distinction with minimum dimensionality so that the system can make the best decision [1]. Vector fusion in the feature space can be implemented in three ways: "series or parallel combination" [2], "feature extraction algorithms or dimensionality reduction methods" [47], or "binary feature fusion" [2]. The combination of feature vectors constitutes one of these three feature-level fusion strategies. There are three principal modes for combining feature vectors: the serial rule, the weighted sum rule [48], and parallel fusion. The first consumes abundant computational resources, while for the weighted sum rule the weight selection is problematic. Parallel fusion was proposed by Yang et al. [49]; it avoids the huge amount of computation of the serial rule and the choice of weights of the weighted sum rule. However, this method takes two features as the real and imaginary parts of a complex vector, so it can only fuse two single feature modalities, or one modality with two types of features. Often, multimodal biometric recognition must benefit from more modalities or features for higher efficiency.
For this reason, this paper proposes the parallel fusion of three or more feature vectors using the quaternion algorithm. For better performance, before the quaternion fusion, the feature vectors are mapped into the Hilbert space using reproducing kernels to extract more nonlinear relationships. As shown in Figure 5, the features extracted from the three traits undergo fusion to generate new features representing the user. Because of the fusion strategy discussed above, the model can recognize the combined features throughout the training phase. In the deep learning-based strategy, the outputs of the second fully connected layers of the face, iris, and fingerprint CNNs are fused: by combining the three CNN models, the vectors resulting from the second fully connected layers become a single vector X, the concatenation of x_f (the face characteristics), x_r (the iris characteristics), and x_v (the fingerprint characteristics). To identify the person, the resulting vector X is then input into the softmax classifier, which classifies the image based on the similarity score [19], as sketched below.
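A minimal PyTorch sketch of this deep fusion head follows; the per-branch dimensionality (4096) and the number of classes (100) are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DeepFusionHead(nn.Module):
    """Concatenate the second fully connected layer outputs of the face, iris,
    and fingerprint CNN branches and classify the fused vector with softmax.
    Dimensions are illustrative assumptions (e.g., 4096-d per branch, 100 users)."""
    def __init__(self, dim_per_branch=4096, n_classes=100):
        super().__init__()
        self.classifier = nn.Linear(3 * dim_per_branch, n_classes)

    def forward(self, x_face, x_iris, x_finger):
        fused = torch.cat([x_face, x_iris, x_finger], dim=1)  # X = [x_f, x_r, x_v]
        return torch.softmax(self.classifier(fused), dim=1)   # similarity scores per class

head = DeepFusionHead()
scores = head(torch.randn(4, 4096), torch.randn(4, 4096), torch.randn(4, 4096))
print(scores.shape)  # torch.Size([4, 100])
```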

Classification Module.
As shown in Figure 9, training and testing of the hybrid system are proposed in two ways.
As part of the testing phase, the same parameters are applied to new data to determine the level of distinction between the resulting feature vectors. A comparison is then made between the classification results and a favorable target function to determine the efficiency of the system. Obtaining the weights of each neuron in a neural network is analogous to determining the effectiveness of the neural network.
In deep learning, a trained model is obtained from the training data. The testing data are entered into this model (which can be a deep neural network encompassing all the extraction, fusion, and classification modules), and the result is read from the output. In the other method, as explained in the previous sections, the features of the face, iris, and fingerprint images are extracted first. Then, after normalization and mapping of the feature vectors to the RKHS, quaternion-based algorithms are used to fuse them and store a multimodal biometric template representing each class in the database. Finally, in the recognition phase, the classification module compares the new hybrid biometric template obtained from the previous modules (feature extraction and fusion) with the hybrid templates previously stored in the database during the enrolment phase, determining its class according to the greatest similarity (or shortest distance) between the new template and the stored templates. For the system to be efficient, the classification module must perform well. The results of nine classifiers are evaluated in this article, as shown in Figure 8: distance function classifiers (Euclidean, Manhattan, Angle, Mahalanobis), a probabilistic neural network (PNN), a radial basis function neural network (RBF), a k-nearest neighbor classifier, a kernel support vector machine (KSVM), and a Gaussian classifier.

Evaluation Metrics
It is essential to evaluate a model before deploying a machine learning or deep learning model in practical applications. The trained model was used to classify the test images after preprocessing, training, and validation. We propose to measure the performance of the uni-biometric, multi-instance, multialgorithm, and hybrid multimodal biometric systems [22,34] using performance metrics such as recognition accuracy, receiver operating characteristic (ROC) curves, the area under the ROC curve (AUC), sensitivity, specificity, and efficiency. In Table 2, true positives (TP) are positive examples correctly assigned to the positive class, false negatives (FN) are positive examples incorrectly assigned to the negative class, false positives (FP) are negative examples incorrectly assigned to the positive class, and true negatives (TN) are negative examples correctly assigned to the negative class.
(i) Sensitivity: also known as the recall rate or the true positive rate, it measures how well the classifier performs by calculating the percentage of positive cases that are correctly identified.
(ii) Specificity: the percentage of actually negative cases that test negative, that is, the rate at which negatives are correctly identified. For example, if a test identifies all healthy individuals as negative for a given condition, its specificity reflects the percentage of accurately diagnosed healthy individuals across all healthy groups.
(iii) Accuracy: the proportion of all samples that are correctly identified (measurements close to the true value).
(iv) Precision: the percentage of samples assigned to the positive class that are actually relevant (how close the measurements are to each other); it evaluates the classifier's ability to reject irrelevant subjects, whereas the recall metric considers how well the classifier retrieves all the relevant subjects.
(v) Positive likelihood ratio (PLR): the probability that a case with the condition tests positive divided by the probability that a case without the condition tests positive.
(vi) Negative likelihood ratio (NLR): the probability that a case with the condition tests negative divided by the probability that a case without the condition tests negative.
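A small computational sketch of these confusion-matrix metrics (with arbitrary example counts) follows:

```python
def biometric_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics listed above (a small illustrative helper)."""
    sensitivity = tp / (tp + fn)                  # recall / true positive rate
    specificity = tn / (tn + fp)                  # true negative rate
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    precision   = tp / (tp + fp)
    plr = sensitivity / (1 - specificity)         # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity         # negative likelihood ratio
    return dict(sensitivity=sensitivity, specificity=specificity,
                accuracy=accuracy, precision=precision, PLR=plr, NLR=nlr)

print(biometric_metrics(tp=95, fp=3, tn=97, fn=5))
```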
Receiver operating characteristic (ROC) curves are obtained by plotting the true positive rate (TPR) against the false positive rate (FPR) at different thresholds, and the area under the curve is then AUC = ∫₀¹ TPR d(FPR) (the integration boundaries are reversed because a large threshold T corresponds to a low value on the x-axis). ROC curves and verification performance alone cannot validate a multibiometric system's performance; research on person authentication, which analyzes whether two models differ from one another in a statistically significant way, is essential but has received insufficient attention. Accordingly, NG and NI denote the numbers of intraclass and interclass comparisons [51]. For the introduced system, performance parameters such as recognition accuracy, ROC curve, AUC, sensitivity, specificity, and efficiency are presented. One hundred classes are considered for system training and testing in these tests. For this purpose, the faces, right and left irises, and right and left index fingerprints of 100 persons registered in the databases above were selected for feature vector extraction. Eighty percent of the images of each person (class) are used for training and the remaining twenty percent for testing.
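For illustration, the ROC curve and AUC can be computed with scikit-learn on synthetic genuine/impostor similarity scores (the score distributions below are arbitrary):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(3)
# Synthetic similarity scores: genuine comparisons (label 1) score higher on average.
labels = np.concatenate([np.ones(500), np.zeros(500)])
scores = np.concatenate([rng.normal(0.8, 0.1, 500), rng.normal(0.5, 0.1, 500)])

fpr, tpr, thresholds = roc_curve(labels, scores)   # TPR vs. FPR over all thresholds
print("AUC =", auc(fpr, tpr))                      # area under the ROC curve
```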

Experimental Results Analysis and Discussions
A multimodal biometric device's efficiency is greatly influenced by the fusion methodology adopted. Because of the high quality and quantity of information in the feature space, feature-level fusion is more effective than fusion at other levels; the feature-level fusion technique is therefore the most stable. We use feature-vector fusion to convert five feature vectors into a single vector and achieve a multimodal biometric template that is robust, secure, and of higher discriminative power than the original vectors. However, the feature sets derived from multiple biometric traits and used for multimodal system design can be inconsistent [52]; this is the challenge faced in fusing different feature spaces. The fusion of feature spaces is done through one of three processes: "series or parallel combination," "feature extraction algorithms or dimensionality reduction methods," or "binary feature fusion." In this article, we propose three fusion methodologies to avoid the inconsistency problems of the feature spaces of five biometric traits derived from three sources of evidence. This section compares the results of the serial combination, dimensionality reduction, parallel fusion, and CNN-based fusion strategies in the multi-instance fingerprint, multialgorithm iris, and deep hybrid multimodal recognition systems. The deep hybrid multimodal biometric recognition system uses deep learning, quaternion-based algorithms, and dimensionality reduction algorithms in the RKHS for an effective fusion strategy and accurate biometric recognition. Feature-level deep fusion is performed using mapping into the higher space (RKHS) or deep layers: feature vectors are combined by mapping them, via selected reproducing kernel functions, to the much higher-dimensional RKHS and then fusing the mapped vectors via dimensionality reduction algorithms. Parallel feature fusion by quaternion may likewise be extended in the RKHS and performed directly with a data-adapted kernel; the advantage is that using the appropriate kernel in the RKHS largely results in a linear resolution of the nonlinear relationships among the multibiometric features. The deep learning-based fusion model in Figure 6, with fully connected layers, shows the architecture of the multimodal CNN network for deep fusion of the face, iris, and fingerprint features.
We perform experiments on six databases, including two face databases (FERET [53] and Shahed-University, gathered at Shahed University, Tehran, Iran) [23], the CASIA iris databases (right and left irises) [54], and the right and left index fingerprint databases of Shahed-University.
First, we present the results of biometric recognition using iris, fingerprint, and face data separately with the corresponding classifiers. A multialgorithm and multi-instance recognition system then examines the fusion of the two fingerprints and of the right and left irises. Finally, the recognition results of the hybrid multimodal biometric system are presented using the same classifiers, combining the features of the face, the two irises, and the two index fingerprints. Figure 10 compares the AUC of the CNN model and the uni-biometric recognition systems' performance on the face, iris (right, left), and fingerprint (right index, left index), with ROC curves, true prevalence curves, and AUC, against the FERET and Shahed face databases, the CASIA iris database, and the Shahed index fingerprint database.

Uni-Biometric Recognition Systems.
The AUC using the LDA feature extraction technique in the Gaussian reproducing kernel Hilbert space with the angle distance classifier (Dis-Angle) and the Mahalanobis distance classifier (Dis-L1) against the FERET and Shahed databases amounts to 0.7881 and 0.9094, respectively, as represented in Figures 10(a) and 10(b).
To investigate the uni-biometric iris recognition system, 100 CASIA database classes for the left and right irises are considered. From the left iris database, 3 images of each iris are evaluated, 2 for training and 1 for testing; from the right iris database, 4 images are considered in each class, 3 for training and 1 for testing. For feature extraction, the Daugman and Hough transform algorithms were utilized, and some 9,600 features were extracted from each iris. Next, the five reproducing kernels (Gaussian, PolyPlus, polynomial, linear, and Hamming) were used to map the feature vectors from the feature space to the RKHS, through which the nonlinear relations were converted into linear ones. A further comparison of the uni-biometric iris recognition systems' performance is made in Figure 10. For the right and left index fingerprint databases, figures of 0.8553 and 0.9593 are obtained, respectively, as the areas under the ROC curve (AUC) in the uni-biometric system. Table 3 shows the AUC with the CNN model and the classification results of the multialgorithm and multi-instance systems in the RKHS for the five reproducing kernels.

Multialgorithm Iris and Multi-Instance Fingerprint Recognition Systems.
By mapping the feature space to the RKHS through Gaussian and linear reproducing kernels, the recognition precision increases to 99.07 and 100 percent, respectively, with a combined vector dimension of 180 and a reduction-based fusion dimension of 83. Figure 11(a) compares more precisely the multialgorithm iris recognition systems' performance through their ROC curves, drawn from the application of the Daugman and Hough transform algorithms and the CNN for extraction of features from the left and right irises of the CASIA database.
Upon applying LDA and PCA in the RKHS, the fingerprint features are reduced from 73,960 to 150 features. The right and left index fingerprint feature vectors are combined in the RKHS, and the resulting vector is input to the classifier. When used to transfer the feature vectors from the feature space to the RKHS, the linear and Gaussian reproducing kernels raise the recognition precision of the multi-instance system to 84% and 81%, respectively. Figure 11(b), illustrating the ROC curve, shows the multi-instance recognition systems' performance when the Dis-Angle classifier and the CNN-based approach are applied to the left and right fingerprint databases. For the multi-instance fingerprint system, the area under the ROC curve (AUC = 0.9123) represents the proper performance of the system. Table 4 compares the recognition precision of the multi-instance fingerprint and multialgorithm iris systems and the hybrid multimodal systems using four approaches: dimensionality reduction, serial combination, parallel fusion through quaternion, and CNN-based fusion. The deep hybrid multimodal template results from CNN-based fusion accompanied by the parallel fusion of the face, combined iris, and combined fingerprint vectors through the quaternion algorithm in the RKHS. With the introduced hybrid multimodal system, using CNN-based fusion and parallel feature fusion of the face, combined iris, and combined fingerprint feature vectors through quaternion with the QSVD-QPCA algorithms in the RKHS, 100 percent recognition precision is achieved.

Conclusion and Future Work
Due to the richness of information available in the feature space (in terms of quality and quantity), feature-level fusion is more effective than fusion at the other levels (sensor, score, and decision). This paper proposes a deep hybrid multimodal biometric system to obtain a robust and secure hybrid template from the feature-level fusion of the face, both irises, and the left and right index fingerprints. In one of the proposed strategies, the feature spaces are fused based on deep learning algorithms by combining feature vectors in deep, fully connected layers. The second proposed methodology was implemented as the parallel fusion of the feature spaces of the face, combined iris, and combined fingerprint using quaternion in the reproducing kernel Hilbert space. Using appropriate kernel functions to map the feature vectors to the RKHS makes the nonlinear relations linear; in other words, it achieves more resolution of the nonlinear relationships of the biometric feature vectors in the new space. Then, in the RKHS, the three feature vectors fill the three imaginary parts of the quaternion. Using the parallel fusion approach, quaternion-based algorithms extract the global and local information that constitutes the quaternion fusion features. Biometric systems can be evaluated using the AUC without specifying client and impostor priors or the costs associated with the different errors. An AUC value of 1 indicates a perfect verifier with no false rejects and no false accepts, while a verifier that performs like random guessing has an AUC of 0.5; a verifier should at the very least perform better than a random guess, and the higher the AUC value, the better the verifier.
For the FERET and Shahed face databases, the right and left index fingerprint databases, and the right and left iris CASIA databases, figures of 0.7881, 0.9094, 0.8553, 0.9593, 0.8892, and 0.9593, respectively, are obtained as the AUC in the uni-biometric systems. The proposed strategies for feature-level deep fusion yield an AUC of 0.9813 for the multialgorithm iris recognition system and 0.9123 for the multi-instance fingerprint recognition system, as expected.
For searching extensive databases (recognition), CNN-based and quaternion-based feature-level fusion in the RKHS is recommended. Based on the results, the corresponding class of a test sample can be accurately differentiated in a secure multimodal template database without consistency errors; the system is 100% accurate and performs well. Several research topics remain for future work. One issue is how to perform the analysis on other large multibiometric databases: it would be essential to obtain a robust and secure hybrid template from the vector fusion strategies in the feature space, and to repeat and re-evaluate the methods of the three aforementioned processes, given that the quality of the data would be impaired and the volume would be larger. Another topic is to analyze the computational cost of the algorithms [55][56][57][58][59][60].

Data Availability
The data are available and can be provided upon request by email to the corresponding author (doostari@shahed.ac.ir).

Conflicts of Interest
The authors declare that they have no conflicts of interest.