Online Signature Verification on MOBISIG Finger-Drawn Signature Corpus

We present MOBISIG, a pseudosignature dataset containing finger-drawn signatures from 83 users captured with a capacitive touchscreen-based mobile device. The database was captured in three sessions resulting in 45 genuine signatures and 20 skilled forgeries for each user. The database was evaluated by two state-of-the-art methods: a function-based system using local features and a feature-based system using global features. Two types of equal error rate computations are performed: one using a global threshold and the other using user-specific thresholds. The lowest equal error rate was 0.01% against random forgeries and 5.81% against skilled forgeries using user-specific thresholds that were computed a posteriori. However, these equal error rates rose significantly to 1.68% (random forgeries) and 14.31% (skilled forgeries) using global thresholds. The same evaluation protocol was performed on the publicly available DooDB dataset. Besides the verification performance evaluations conducted on the two finger-drawn datasets, we evaluated the quality of the samples and the users of the two datasets using basic quality measures. The results show that finger-drawn signatures can be used by biometric systems with reasonable accuracy.


Introduction
One of the oldest ways of proving your identity is giving your signature. Many official documents require signatures from the agreeing parties. Signature recognition can be divided into off-line (static) and online (dynamic) methods. Off-line systems work with images, so only the shape of the signature is available, whereas online systems also use information related to the dynamics of the signature. Due to this additional information, online systems outperform off-line systems [1].
Biometric systems can produce two types of errors: false rejections of genuine signatures (false rejection rate (FRR)) and false acceptances of forged signatures (false acceptance rate (FAR)). The overall system error is usually reported in terms of the EER (equal error rate), which is defined as the system error rate at the operating point where FAR and FRR are equal.
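As an illustration of how an EER is obtained from two score lists, the sketch below sweeps a decision threshold over the pooled similarity scores (higher means "more genuine") and returns the error rate where FAR and FRR cross; the function and variable names are ours, not from the paper.

```python
import numpy as np

def eer(genuine_scores, impostor_scores):
    """Equal error rate by threshold sweep.

    Scores are similarities (higher = more likely genuine).
    At each candidate threshold t:
      FRR = fraction of genuine scores below t (genuine rejected)
      FAR = fraction of impostor scores at or above t (forgery accepted)
    The EER is reported where |FAR - FRR| is smallest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, best_eer = np.inf, None
    for t in thresholds:
        frr = np.mean(genuine < t)
        far = np.mean(impostor >= t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

For perfectly separable score lists the EER is 0; heavily overlapping lists push it toward 0.5.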
In signature database evaluations, two types of forgeries are considered: skilled and random forgeries. Skilled forgery evaluation is based on the forgery samples available in the database (forgery samples are provided by forgers who know both the shape and the dynamics of the imitated signature). Random forgery (or zero-effort) evaluation is based on using random genuine samples from the dataset (corresponding to the case when the forger does not know the signature to be forged and therefore uses his/her own signature). The state of the art in automatic signature verification is presented in a study by Impedovo and Pirlo [1].
Online signature recognition is not a new research area: several online signature corpora have already been collected using digitizer tablets. While the PHILIPS [2], SVC2004 [3], and SUSIG [4] databases contain only online signatures, MCYT [5] is a bimodal database containing both fingerprints and online signatures. BIOMET [6] and BioSecurID [7] contain several types of biometric data, including online signatures.
Due to the increasing number of touchscreen-based mobile devices and the familiarity of users with signatures, we consider signatures plausible candidates for authentication on mobile devices. A number of studies have already been conducted on this topic, although their signature databases are not publicly available [8][9][10], with the exception of the DooDB database [11]. Compared to the DooDB database, which was collected on a device using a resistive touchscreen, our database was collected on a device with a capacitive touchscreen. Specifically, while DooDB contains only the coordinates of the points of the signature and the corresponding time information, our MOBISIG database contains additional information such as pressure, finger area, and data from the accelerometer and gyroscope.
In this paper, we analyze whether signatures can be used for authentication in a mobile device context. Therefore, two state-of-the-art methods, a local- or function-based and a global- or feature-based system, were implemented and evaluated on the MOBISIG database. In order to compare our database to the other publicly available online signature database, we performed the same evaluations on the DooDB database using the same parameters and features. In addition, we present a few basic quality evaluations for both databases.
The main contribution of this paper is the presentation and analysis of the MOBISIG signature database containing data from 83 users. The signatures are not the original signatures of the users; instead, users were assigned a family name and were required to create a signature for that name. Signature collection was performed on a Nexus 9 tablet under supervision; data providers were instructed on how to draw signatures using their finger.
The database was collected during three sessions and contains 45 genuine and 20 forged signatures for each user. The database is publicly available at http://www.ms.sapientia.ro/∼manyi/mobisig.html.
On the MOBISIG database, the best EER for skilled forgeries was obtained by our function-based DTW system: 5.81% for a posteriori user-specific thresholds and 20.82% for a common threshold. When we added pressure information to the coordinates and their first- and second-order differences, only the a posteriori user-specific threshold result improved. When using a 9-inch device for data collection, users tend to put the device down on the table while drawing the signatures. Therefore, we did not use the data obtained from the accelerometer and gyroscope sensors in the computations.
Comparisons with the DooDB database indicate higher signature quality in the case of the MOBISIG database, and correspondingly better performances for the verification methods studied in our paper. Our study is limited by the sample size (83 users) and a slightly unbalanced age distribution (77% of the users aged under 25); in addition, our data providers were not experts in forging signatures.
The rest of the paper is organized as follows. A literature review on signature recognition on mobile devices is presented in Section 2, completed with a review of a few papers on signature quality evaluation. Section 3 presents our MOBISIG dataset, followed by a detailed description of the methods used for signature verification. Experiments and benchmark results are presented in Section 5, whereas Section 6 compares the DooDB and MOBISIG datasets along verification system performance and quality measures. Section 7 concludes the paper.

Related Work
2.1. Signature Recognition on Mobile Devices. Little research has been carried out in the field of online signature recognition on mobile devices. We have found only six studies [8][9][10][11][12][13] reporting results obtained on signature databases captured in a mobile context. The properties of the databases used in these studies are presented in Table 1.
In most of the studies concerning signature recognition, results are reported using signature databases captured on a pen tablet. However, touchscreens present some drawbacks compared to pen tablets, the most important being the quality of the captured signal. While pen tablets sample the signal uniformly at a relatively high frequency, hand-held device sampling is usually event-driven, with a lower sampling frequency than pen tablets. Moreover, while both touchscreen devices and pen tablets are able to capture trajectory and pressure, only the latter can track pen orientation. The one advantage of a touchscreen device is that it allows capturing the signature with a fingertip.
One of the objectives of the BioSecure Signature Evaluation Campaign (BSEC'2009) was to study the influence of acquisition conditions (digitizing tablet or PDA) on authentication systems' performance [14]. Results are reported using signatures from 382 writers, acquired on a digitizing tablet and on a PDA, respectively. The authors reported a significant quality degradation of signatures acquired in mobile conditions. The semester thesis of Bissig [12] is the first study reporting results using a signature database captured on a resistive touchscreen with a fingertip. Four types of signals were acquired: coordinates x(t), y(t), pressure p(t), and finger area a(t). Both local (function-based: DTW) and global systems (feature-based: one-class SVM and Mahalanobis distance), as well as their combination, were evaluated. Unfortunately, neither the number of subjects nor the number of forgeries in the captured database is reported. However, this is the only study reporting the influence of pressure on the performance of a signature verification system.
Houmani et al. [8] report results on a new dataset collected from 64 subjects on a PDA. Unfortunately, neither the number of sessions nor the acquired signals are reported. However, they propose an entropy-based quality metric for selecting reference signatures in the enrollment phase. Krish et al. [9] collected a new signature database using a Samsung Galaxy Note device from 25 users (20 genuine signatures per user). Their verification algorithm combines two state-of-the-art algorithms (function-based DTW and feature-based Mahalanobis distance). Due to the missing forgery samples, only the results obtained by random forgery evaluations are reported.
Sae-Bae and Memon [10] collected an online signature dataset from 180 users using HTML5. Users were allowed to enter their signatures on their own iOS devices. The dataset is not publicly available and contains only genuine signatures; therefore, only the random forgery-type evaluation was feasible. They proposed a new histogram-based feature set and reported performance evaluations on both their own and the MCYT datasets.
The first publicly available database collected on a handheld device (an HTC Touch HD mobile phone) is the DooDB. This database contains data from 100 users and includes doodles in addition to pseudosignatures. Martinez-Diaz et al. [11] report the database analysis and benchmark results using a function-based DTW verification system with several local features. Although the EERs obtained by the random forgery evaluations are low (around 3%), those obtained for skilled forgery evaluations are high (around 27%). In a later study [13], the skilled forgery result was improved (20.9%) by using the Gaussian mixture method.
2.2. Signature Quality. Quality evaluation of biometric datasets is a difficult problem. A biometric dataset consists of biometric samples from a number of users, usually containing a fixed number of samples from each user collected in a fixed number of sessions. Moreover, signature datasets contain skilled forgery samples for each user. There are two ways to achieve the quality evaluation of a biometric dataset: (i) evaluating each sample of the dataset and (ii) evaluating each user of the dataset. In each case, we obtain a set of scores, from which an average score can be computed. Both the samples and the users can be evaluated by using only the genuine signatures or by using both the genuine and forgery signatures. Müller and Henniger [15] proposed two quality metrics for signature dataset evaluation. One of the quality metrics evaluates the samples, while the other one evaluates the users of the dataset. Both metrics use the DTW distance between samples.
Houmani et al. [16] proposed a personal entropy measure for online signatures and showed the existence of a clear relationship between the proposed measure and the verification performance that a signature verification system achieves for a user. This measure allowed them to categorize the users of several signature datasets. In a later study [17], they adapted the measure to the skilled forgery samples of signature datasets. Similar to their previous study, they proved the effectiveness of the quality measure by evaluating several online signature databases using state-of-the-art signature verification methods.
One of the objectives of the BSEC'2009 competition was the evaluation of online signature algorithms with respect to the quality of the signatures [14]. The personal entropy measure introduced by Houmani et al. was used to group the signatures into different categories. The results of the competition showed that the performance of the classifiers varied significantly between good and bad quality signatures. Houmani and Garcia-Salicetti [18] extended the Biometric Menagerie to online signatures and categorized the users of the MCYT database using the Personal Entropy quality measure.
Kahindo et al. [19] proposed a novel signature complexity measure to select reference signatures for online signature verification systems. Guest and Henniger [20] used commercial engines for the assessment of the quality of handwritten signatures. They concluded that predicting the utility of a signature sample using a multifeature vector was possible. More recently, another novel method was proposed for the quality evaluation of off-line signatures [21].

The MOBISIG Database
Due to security reasons (people are reluctant to give their own signatures), participants were asked to create a signature for a given family name. Family names were selected from the 100 most frequent Hungarian family names. Participants were also asked to practice the created signatures by drawing and deleting several attempts. The first five attempts were discarded.
The database contains signatures from 83 subjects: 49 men and 34 women, with the following age distribution: 64 subjects under 25, 12 between 25 and 40, and 7 over 40.
In addition to the touchscreen signals, the application recorded data from the accelerometer and the gyroscope sensor. The accelerations and the values obtained from the gyroscope characterize the holding position of the device.
The screen of the device was divided into two sections (Figure 1): the upper section was the replay section, where users were shown the animated signature, and the lower section was designed for drawing the signature. The animation functionality was available in both types of signature collection: genuine and forgery. The animation allows participants to recall the shape and the dynamics of genuine and forged signatures, and it could be replayed any number of times. Before data collection, users were asked to become familiar with the device usage as well as with their pseudosignatures. Signatures were saved only after the user's acceptance, and any signature could be deleted by its provider if he/she was not satisfied with the result. The data collection process was divided into three sessions with one week between consecutive sessions. In the first session, each user had to provide 15 genuine pseudosignatures for the assigned name. In the second and third sessions, participants had to provide 15 genuine pseudosignatures and 10 forgeries for two assigned users (two times 5 forgeries). At the end of the data collection process, we had 45 genuine signatures and 20 forgeries for each participant. A few of these signatures are shown in Figure 2.

Signature Files. Each user has a dedicated folder which contains the 45 genuine signatures of the user and the 20 forgeries made by other users. The naming convention of the files is SIGN [T] USER[SID] USER[WID] [NR], where T is FOR for forgeries and GEN for genuine signatures. SID identifies the user whose signatures are in the folder. WID is the identifier of the user who drew the signature; SID and WID are equal for genuine signatures and differ for forgeries. The NR at the end of the filename runs from 1 to 45 for genuine signatures and from 1 to 20 for forgeries. The first 15 genuine signatures were collected in the first session, the second 15 in the second session, and the last 15 in the third session. The first 10 forgeries were collected in the second session and the second 10 in the third session.
The naming convention of the folders is USER [SID]. Each signature is represented as a sequence of points and is stored in a file. Each line of the file represents one point of the signature and consists of the following features: x-coordinate, y-coordinate, time stamp, pressure, finger area, x-velocity, y-velocity, x-acceleration, y-acceleration, z-acceleration, x-gyroscope, y-gyroscope, and z-gyroscope.
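Based on the file format described above, a reader for one signature file might look like the sketch below. The column separator is an assumption (the paper does not state it; a comma is assumed here, with semicolons tolerated), and the short field names are ours.

```python
import io

# One name per column, in the order listed in the text (13 columns per point).
FIELDS = ["x", "y", "timestamp", "pressure", "finger_area",
          "vx", "vy", "ax", "ay", "az", "gx", "gy", "gz"]

def read_signature(fileobj):
    """Parse one MOBISIG signature file: one sampled point per line.

    Assumes comma-separated numeric columns (semicolons also accepted);
    returns a list of dicts, one per sampling point.
    """
    points = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # skip blank lines defensively
        values = [float(v) for v in line.replace(";", ",").split(",")]
        points.append(dict(zip(FIELDS, values)))
    return points
```

A caller would open the per-user file (e.g. a genuine sample from USER [SID]'s folder) and pass the file object in.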

Methods
In order to assess the authentication performance based on pseudosignatures, both a function-based and a feature-based verification system were implemented. Features used by signature verification systems can be local or global. Local features correspond to sample points along the signature's trajectory (e.g., point-wise pressure). Global features are computed from the signature as a whole (e.g., duration).
Function-Based Verification. In a function-based system, each signature is represented as a time sequence of N_f local features, where N_f is the number of local features. In this work, the following local features were employed: the x, y coordinates; the x1, y1 first-order differences; the x2, y2 second-order differences; and the pressure p and its first-order difference p1. Before computations, the features of the time sequences were standardized (f'_i = (f_i − μ_fi)/σ_fi, where μ_fi and σ_fi are the mean and standard deviation of the ith local feature computed over all sampling points of the signature). The Euclidean distance function was used to compute the distance between two elements of the time sequences (distance(s[i], t[j])). Finally, the obtained DTW distance was divided by the sum of the time sequence lengths (n + m).
The verification process works as follows: in the enrollment stage, a set of N reference signatures {e1, e2, ..., eN} is selected. In the verification stage, the DTW distances between the test signature and all the reference signatures are computed, and the final score results as the average of these distances. Finally, this distance-based score (Dscore) is transformed into a similarity score using (2). The architecture of our function-based system is shown in Figure 3.
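The pipeline just described (per-signature standardization of each local feature, DTW with Euclidean point distance normalized by n + m, and averaging over the N references) can be sketched as follows; this is our reconstruction of the steps in the text, not the authors' code.

```python
import numpy as np

def standardize(seq):
    """Z-score each local feature (column) over the signature's sampling points."""
    mu = seq.mean(axis=0)
    sigma = seq.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return (seq - mu) / sigma

def dtw_distance(s, t):
    """Classic dynamic-programming DTW with Euclidean point distance,
    normalized by the sum of the sequence lengths (n + m), as in the text."""
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)

def dscore(test_sig, references):
    """Distance-based verification score: mean DTW distance to the references."""
    return float(np.mean([dtw_distance(test_sig, r) for r in references]))
```

The distance-to-similarity transform of equation (2) is omitted here, since its exact form is not recoverable from the text.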

Feature-Based Verification.
The second type of verification system is a feature-based, or global, system, which computes a fixed-size feature vector from each signature. Each feature is a global feature related to the signature as a whole.
The components of our feature-based system are the following: feature extractor, user template creation, and matcher. No preprocessing was applied; however, before computations, features were scaled (separately for each user). The architecture of our feature-based system is shown in Figure 4.
Our feature extractor computed position- and time-based features [12, 22], such as duration and different types of velocity, as well as different types of sign change, a few sensor-related features (touchscreen pressure and finger area), and two types of histogram-based features. In the computation of the histogram-based features, we followed the work of Sae-Bae and Memon [10], except that we used fewer features.
Let X = (x1, x2, ..., xn) and Y = (y1, y2, ..., yn) be the x, y coordinates of a signature and P = (p1, p2, ..., pn) the pressure attribute. We compute the first- and second-order differences of these sequences (e.g., x1_i = x_{i+1} − x_i). The angle sequence θ_i = atan2(y1_i, x1_i), i = 1, ..., n − 1, was computed in order to derive a uniform-width histogram. The atan2 trigonometric function is a common variation of the standard arctan function, which produces results in the range (−π, π]. Angles characterize the shape of the signature. This interval of angles was divided into 8 equal bins in order to compute the histogram.
The speed sequence r_i = sqrt((x1_i)^2 + (y1_i)^2), i = 1, ..., n − 1, captures the speed distribution and is considered very useful in combating skilled forgeries [10]. In the histogram computation, we considered only the values from the interval [0, μ + 3σ], where μ represents the mean and σ the standard deviation of the r sequence obtained from a single signature. This interval was divided into 16 equal bins for the histogram computation.
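The two histogram features described above can be sketched as follows: an 8-bin angle histogram over (−π, π] via atan2, and a 16-bin speed histogram over [0, μ + 3σ]. The handling of values exactly on bin edges follows NumPy's conventions and is our assumption, not the paper's.

```python
import numpy as np

def angle_histogram(x, y, bins=8):
    """Normalized 8-bin histogram of atan2 angles of the first-order
    differences, over the interval (-pi, pi]."""
    dx, dy = np.diff(x), np.diff(y)
    theta = np.arctan2(dy, dx)
    hist, _ = np.histogram(theta, bins=bins, range=(-np.pi, np.pi))
    return hist / hist.sum()

def speed_histogram(x, y, bins=16):
    """Normalized 16-bin histogram of point-to-point displacement
    magnitudes r_i, restricted to [0, mu + 3*sigma] as in the text."""
    r = np.hypot(np.diff(x), np.diff(y))
    hi = r.mean() + 3 * r.std()
    r = r[r <= hi]  # drop outliers beyond mu + 3*sigma
    hist, _ = np.histogram(r, bins=bins, range=(0.0, hi))
    return hist / max(hist.sum(), 1)
```

Normalizing each histogram to sum to 1 makes the feature independent of signature length; whether the original work normalized is an assumption here.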
The full list of the features is presented in Table 2.
In feature-based methods, each signature is represented by a D-dimensional feature vector. After selecting the signatures used for template creation, features were scaled. We applied min-max normalization to each feature (f'_i = (f_i − min_i)/(max_i − min_i), i = 1, ..., D) before each template creation. In order to be able to apply the same scaling to the test signature before matching, the scaling parameters (max and min values for each feature) were stored as part of the user's template. User-based scaling of the data matches real biometric verification systems, because the system does not take into account the data of other users. The performance of the system is increased significantly by normalizing all users' data in a single step.
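The template-bound min-max scaling described above can be sketched as follows. Storing the per-feature min/max with the template follows the text; the guard for constant features is our addition.

```python
import numpy as np

def fit_minmax(train):
    """Per-feature min/max from the enrollment samples; stored in the
    user's template so the same scaling can be applied at match time."""
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(sample, lo, hi):
    """Scale a feature vector with the template's stored parameters:
    f' = (f - min) / (max - min). Constant features map to 0."""
    span = np.where(hi > lo, hi - lo, 1.0)
    return (sample - lo) / span
```

At verification time, the test signature's feature vector is scaled with the enrolled user's stored (min, max) pair before being passed to the matcher.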
This type of normalization was already evaluated for the DooDB database [23].
In data mining, anomaly detection (also outlier detection) is the identification of observations which do not conform to an expected pattern in a dataset [24] and therefore can be used for impostor detection. Killourhy and Maxion [25] evaluated 14 anomaly detectors for keystroke dynamics biometrics.
Four anomaly detectors were implemented for template creation and matching. Each anomaly detector uses a specific template and score computation (see below). Dissimilarity scores were transformed into similarity scores by using formula (2), as for the DTW algorithm.
In the training phase, user models (templates) are constructed using a fixed number of training samples. Testing, or verification, refers to an anomaly score computation for the test sample, which in our case is a distance from the user model.
The classic Euclidean anomaly-detection algorithm calculates the mean vector of the training samples in the training phase; in the test phase, it calculates the Euclidean distance between the mean vector and the test sample.
The Manhattan detector is similar to the Euclidean detector except that the Manhattan distance is used in the testing phase. Another variant of the Manhattan detector is the Manhattan scaled detector, described in Araujo et al. [26]. In the training phase, besides the mean vector of the training samples (m_i, i = 1, ..., D), the mean absolute deviation of each feature is also calculated (a_i, i = 1, ..., D). The anomaly score of a test sample is the sum of the feature-wise distances from the mean, each scaled by the corresponding mean absolute deviation. The k-nearest neighbour (kNN) detector works as follows: in the training phase, the detector saves the training samples (reference-based system). In the test phase, the detector calculates the Euclidean distance between each of the training samples and the test sample. The anomaly score is calculated as the mean of the distances to the k nearest training samples.
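The four distance-based detectors described above (excluding the one-class SVM) can be sketched as follows. The form of the Manhattan scaled score follows Araujo et al.'s description as summarized in the text; the zero-deviation guard is our addition.

```python
import numpy as np

def euclidean_score(train, test):
    """Euclidean distance from the test vector to the training mean."""
    return float(np.linalg.norm(test - train.mean(axis=0)))

def manhattan_score(train, test):
    """Manhattan (L1) distance from the test vector to the training mean."""
    return float(np.abs(test - train.mean(axis=0)).sum())

def manhattan_scaled_score(train, test):
    """Manhattan distance with each feature scaled by its mean absolute
    deviation a_i computed on the training samples."""
    m = train.mean(axis=0)
    a = np.abs(train - m).mean(axis=0)
    a[a == 0] = 1.0  # guard: constant feature contributes |diff| unscaled
    return float((np.abs(test - m) / a).sum())

def knn_score(train, test, k=2):
    """Mean Euclidean distance to the k nearest training samples."""
    d = np.sort(np.linalg.norm(train - test, axis=1))
    return float(d[:k].mean())
```

Each function returns a dissimilarity score; in the paper these are converted to similarity scores via formula (2) before thresholding.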
One-class SVM is an outlier detector based on the support vector method. We used the LibSVM implementation [27] provided by the e1071 package in R, with the following parameters: type = one-classification (for novelty detection), kernel = "radial", and gamma = 0.04. The SVM parameter nu was set to 0.4, and parameters were not tuned for individual users.

Evaluation Protocol.
The same evaluation protocol, consisting of three measurements, was used for both verification systems. The systems were trained with the first 5, 10, and 15 samples from the first session, respectively. The 15 genuine signatures from the second session were used for positive score computation. All of the 20 available forgeries per user were used for skilled forgery score computations. Random forgery scores were computed for each user by using the first genuine signature from all the other users (Table 3).
Two types of EERs were computed. The global EER was computed based on a global threshold, which was computed using common genuine and forgery score lists for all users of the database. The second type of EER, the a posteriori user-specific EER, is reported in some recent papers [13, 28]. This type of EER is computed by averaging the user-specific EERs. User-specific EERs are computed independently for each subject of the dataset and therefore are based on user-specific decision thresholds. Martinez-Diaz et al. [13] used the notation aEER for this type of EER. As we compare our results with theirs, it is important to use the same notation. However, it is important to note that this type of EER is based on a posteriori user-specific thresholds [29]. Hence, the corresponding EER (aEER) represents the best global EER that would be obtained if optimal score normalization techniques were known a priori [13].
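The two EER variants can be contrasted in code: the global EER pools all users' score lists under one threshold, while the aEER averages per-user EERs, each computed with its own a posteriori threshold. The data layout and helper names below are ours.

```python
import numpy as np

def eer_from_scores(genuine, impostor):
    """EER by threshold sweep (similarity scores: higher = more genuine)."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    ts = np.sort(np.concatenate([genuine, impostor]))
    best_gap, best_eer = np.inf, None
    for t in ts:
        frr = np.mean(genuine < t)
        far = np.mean(impostor >= t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer

def global_eer(per_user_scores):
    """Pool every user's scores into common lists: one global threshold."""
    g = np.concatenate([u["genuine"] for u in per_user_scores])
    i = np.concatenate([u["impostor"] for u in per_user_scores])
    return eer_from_scores(g, i)

def a_posteriori_eer(per_user_scores):
    """aEER: average of per-user EERs, each at its own optimal threshold."""
    return float(np.mean([eer_from_scores(u["genuine"], u["impostor"])
                          for u in per_user_scores]))
```

Because each per-user threshold is chosen with hindsight, the aEER is always at least as good as the global EER, which matches the gap between the two numbers reported in the paper.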

Training Set Size.
The effect of the number of training samples used during enrollment was investigated. Three cases were evaluated for each type of local and global features: using 5, 10, and 15 samples during enrollment.
Table 4 shows the function-based system results for the MOBISIG database. The five types of local feature sets were as follows: (i) xy, the coordinate sequence; (ii) x1y1, the first-order differences of the coordinate sequence; (iii) x2y2, the second-order differences of the coordinate sequence; (iv) xyx1y1x2y2, the coordinate sequence with first- and second-order differences; and (v) xyx1y1x2y2pp1, the coordinate sequence with first- and second-order differences plus pressure with its first-order difference. Both types of evaluations, skilled and random forgeries, are reported.
In the case of skilled forgeries, the best global EER was obtained by using the xy time sequences only (20.82%). However, the best aEER was 5.81%, using type (v) local features.
In the case of the random forgeries evaluation, the best result was obtained by using the first differences of the x and y time sequences (EER: 1.41% and aEER: 0.01%). The very low EERs obtained show that this type of function-based verification system is highly reliable against random forgeries.
As expected, the more samples used in the enrollment phase, the better the verification system performance was.
This is true for each type of local feature and for both types of evaluations: skilled and random forgeries. However, the improvements are small, especially between using 10 and 15 samples for enrollment. The results obtained for the feature-based system are reported in the following. In order to show the contribution of different categories of global features, we formed three global feature sets. Table 5 shows the feature-based system results for the MOBISIG database using all features (feat62). As for the function-based system, we report results for using 5, 10, and 15 samples for enrollment. Similar to the function-based system, using more samples for enrollment improves the performance of the feature-based verification system. The performance gaps are larger between 5 and 10 samples than between 10 and 15 samples.
Comparing Table 5 with Table 4, it can be observed that the DTW system achieved far better performances in the case of random forgeries than the anomaly detectors. In the case of skilled forgeries, both the best global EER (14.31%) and the best aEER (9.35%) were obtained by the Manhattan detector. Interestingly, the differences between the global EER and the aEER are not as large as in the case of the DTW system. The ROC curves for the global EERs are shown in Figure 5.

Global Feature Sets.
The results obtained for the three global feature sets are shown in Table 6. The best (lowest) error rates were obtained by the Manhattan detector for both skilled and random forgery evaluations. For this detector, using fewer features resulted in higher error rates. However, not all detectors were affected negatively by using fewer features. For example, the SVM detector's performance increased when the dimension of the feature set was reduced, especially when the pressure-related features were excluded (feat56).

Intersession Variability.
We evaluated the intersession variability for the MOBISIG dataset. Two cases were evaluated: (1) using session 1 for training and session 2 for testing and (2) using session 2 for training and session 3 for testing. The results for using 15 samples for enrollment are shown in Table 7. The performance differences between the two cases are between 0.16% and 3.30% and are highly dependent on the method used. For example, the system-wide EER improves for the DTW method (type (ii) features) for session pair 2-3 compared to session pair 1-2, but the same tendency does not hold for the aEER of the Manhattan detector.
According to the results in Table 7, no general tendency of improvement or deterioration of the verification results could be stated. The results might suggest similar signature quality in session pairs 1-2 and 2-3 in terms of verification performance, but further investigations are necessary.

DooDB and MOBISIG Comparison
In this section, comparative results are presented for our MOBISIG database and the other publicly available database, DooDB. The results are presented along (i) their statistical properties, (ii) the verification system performances, (iii) score distributions, and (iv) some signature quality metrics. Table 8 presents the most important characteristics regarding the data collection process of the two publicly available mobile-device-context signature databases.

Verification Performance.
In the case of the function-based DTW system, only the coordinates and their first- and second-order differences were used (no pressure information is available in the DooDB database). The pressure-based features were omitted for the feature-based system; consequently, measurements were performed on the feat56 and feat32 feature sets.
The DooDB database contains (0, 0) coordinate values in the case of sampling errors. In order to correct this, the coordinate values from the previous sample were assigned to the erroneous sample. The same evaluation protocol was followed as presented in Section 5.1. We present the results obtained by the two verification methods on the DooDB database (Tables 9 and 10), followed by the comparison of the results obtained on the two databases (Table 11).
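The (0, 0) correction described above amounts to a forward fill of the previous valid coordinates; a sketch (naming is ours):

```python
def fix_dropped_samples(points):
    """Replace (0, 0) sampling-error coordinates with the previous
    sample's coordinates, as done before evaluating DooDB signatures."""
    fixed = []
    prev = None
    for (x, y) in points:
        if (x, y) == (0, 0) and prev is not None:
            x, y = prev  # reuse the last valid coordinates
        fixed.append((x, y))
        prev = (x, y)
    return fixed
```

A leading (0, 0) with no previous sample is left unchanged here, since the text does not say how that edge case was handled.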
Comparing the skilled forgery results of the DTW method (Tables 4 and 9), we observe that using more reference samples decreases both the global and the user-average EERs. Among the xy, x1y1, and x2y2 features, the first-order differences (x1y1) provide the lowest aEER for both databases (7.34% for MOBISIG and 16.67% for DooDB). However, when combining all features (xyx1y1x2y2), the aEERs become slightly better (6.27% for MOBISIG and 15.78% for DooDB). Kholmatov and Yanikoglu [30] also reported that the first-order differences gave the lowest error rates.
The worst results were obtained by using the second-order differences (x2y2) alone, which means that these features do not contain much user-specific information.
The effect of using a reduced feature set is shown in Table 12. It can be seen that omitting the histogram-based features resulted in a very small performance degradation (about 1%).
The ROC curves for some of the verification methods are shown in Figure 6.

Score Distributions.
The score distributions of genuine and impostor samples (skilled forgeries case, 15 training samples) for both verification methods are shown in Figure 7.
6.4. Signature Quality. Alonso-Fernandez et al. [31] reviewed the state of the art in the biometric quality problem. They consider that a biometric sample is of good quality if it is suitable for personal recognition. According to the ISO/IEC 29794-1 standard, biometric quality has three dimensions: fidelity, utility, and character. Fidelity is the degree of similarity between a sample and its source. Utility is related to the impact of the sample on the overall performance of a biometric system, while character indicates the inherent discriminative capability of the source.
The influence of the character of a biometric sample on its utility was investigated by Müller and Henniger [15]. In the case of signatures, recognition systems will give different results depending on the reference samples (the sample set used for template creation). The selection of samples used in the template can be controlled by sample quality assessment algorithms. Evaluating the quality of a single sample based only on its own features is difficult, but methods exist that evaluate a sample relative to other samples. Some a posteriori methods use only genuine samples; others also use skilled forgery samples. Using only a set of genuine samples for quality assessment restricts the selection of reference samples to the genuine user and thus allows the method to be used in real applications. Consequently, we used two metrics proposed by Müller and Henniger [15] which examine the character of the sample. The first method, sampleEER, consists of computing an EER for each genuine sample.
This is obtained by computing the distances from that sample to all the other corresponding genuine and forgery samples. We formed positive and negative score lists from the obtained distances (scores) and computed the EER. We computed DTW distances between signatures using the type (i) local features. The second metric, the userAvgDist, was computed as the average of the pairwise distances (DTW distance, type (i) local features) of the genuine samples of each user (with N genuine samples, we computed N(N − 1)/2 distances). The lower it was, the more similar the user's samples were, and hence the more stable the user's signature was. For both databases, we used the first N = 30 genuine samples for these computations.
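The userAvgDist computation described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes plain DTW with a Euclidean per-frame cost over feature sequences (e.g., (x, y) trajectories), and the function names `dtw_distance` and `user_avg_dist` are ours:

```python
import numpy as np
from itertools import combinations

def dtw_distance(a, b):
    """Plain DTW between two feature sequences (rows = time steps)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean frame cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def user_avg_dist(samples):
    """Average pairwise DTW distance over a user's genuine samples
    (N samples -> N*(N-1)/2 distances); lower means a more stable signature."""
    dists = [dtw_distance(a, b) for a, b in combinations(samples, 2)]
    return float(np.mean(dists))
```

Because it needs only the user's own genuine samples, this metric can be evaluated on the fly while a user enrolls, which is the property the paper exploits below.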
After applying the two metrics to each genuine sample (sampleEER) or each user (userAvgDist), we obtained two sequences of values. In order to characterize each dataset, both the mean and the standard deviation of each sequence were computed. The obtained results are shown in Table 13.
Of the two quality metrics proposed by Müller and Henniger, we favor the userAvgDist because it does not rely on forgery samples and could therefore be used during data collection: samples that are not sufficiently similar to those already collected could be discarded.

Conclusion
In this paper, the MOBISIG finger-drawn online signature dataset has been presented. The dataset comprises pseudosignatures from 83 users, both genuine and forgery samples. Benchmark verification experiments have been performed using both function-based and feature-based methods. Two types of EERs are reported: one based on a global threshold, the other on a posteriori user-specific thresholds. The global threshold-based EER results are not outstanding; nevertheless, further improvement may be obtained by score normalization techniques. Good results were obtained using a posteriori user-specific thresholds (aEER). The lowest aEER was 5.81% for the skilled forgery case and 0.01% for the random forgery case; both results were obtained by the function-based DTW method. Although the feature-based methods offer poor results with a global threshold, they are still significantly better there than the function-based methods in the skilled forgery case. The lowest aEERs of the feature-based system were 9.35% (skilled forgeries) and 2.10% (random forgeries), both obtained by the Manhattan outlier detector.
The second objective of this paper was to compare our new dataset with similar publicly available ones. The only publicly available dataset collected on mobile devices and containing finger-drawn signatures was the DooDB. Therefore, we have presented a comparison of the MOBISIG and DooDB datasets along (i) their statistical properties, (ii) the verification system performances using exactly the same methods and features, and (iii) some signature quality metrics.

Figure 2 :
Figure 2: Pseudosignatures from the database. The first column contains genuine signatures and the second column contains forgeries.

Figure 3 :
Figure 3: System architecture of function-based method.

Table 1 :
Mobile signature databases. Each signature was stored as a sequence of discrete values [x_t, y_t, t, p_t, fa_t, vx_t, vy_t, ax_t, ay_t, az_t, gx_t, gy_t, gz_t], where x_t, y_t are the coordinate values, t is the time stamp, p_t and fa_t are the pressure and finger area (normalized values in [0, 1], obtainable through the standard Android API), vx_t, vy_t are the directional velocities, ax_t, ay_t, az_t are the directional accelerations of the device, and gx_t, gy_t, gz_t are the values obtained from the gyroscope.

Table 3 :
Number of testing samples used in evaluations.

Table 5 :
Verification performance in terms of EER and averaged individual EER (aEER) for the MOBISIG database using anomaly detectors and 62 features (%).

Table 4 :
Verification performance in terms of EER and averaged individual EER (aEER) for the MOBISIG database using DTW and local features (%).

Table 9 :
Verification performance in terms of EER and averaged individual EER (aEER) for the DooDB database using DTW (%).

Table 10 :
Verification performance in terms of EER and averaged individual EER (aEER) for the DooDB database using anomaly detectors and 56 features (%).

Table 11 :
EER and aEER comparison for DooDB and MOBISIG databases.

Table 12 :
Verification performance in terms of EER and averaged individual EER (aEER) for the DooDB database using anomaly detectors (different feature sets).