Identifying Individuals Using Eigenbeat Features of Electrocardiogram

We present a new method to characterize the electrocardiogram (ECG) for individual identification. The proposed ECG biometric system is insensitive to noise and muscle flexure. The method uses the principle of linearly projecting the heartbeat features into a subspace of lower dimension using an orthogonal basis that represents the most significant features for distinguishing individuals. The performance of the proposed biometric system is evaluated on subjects of both health statuses: the ECG recordings of the MIT-BIH Arrhythmia database and the ECG recordings of normal subjects prepared at IIT(BHU). The results demonstrate that the eigenbeat features derived from the proposed ECG characterization achieve recognition accuracies of 91.42% and 95.55% on the subjects of the MIT-BIH Arrhythmia database and the IIT(BHU) database, respectively.


Introduction
The accurate and automatic authentication of individuals is becoming indispensable in several aspects of our daily life. It includes border crossing, business or commercial transactions, health care, physical access control, and the management of digital rights. The proliferation of computers, the Internet, and computer-based applications such as e-business, e-commerce, and e-learning has raised concerns about security breaches and identity theft. Conventional methods of automatic identity proofing use credentials such as passwords and PINs. The deployment of such identity proofing systems, which are either possession based or knowledge based, raises a serious risk of identity theft. In addition, the use of multiple documents (e.g., passport, PAN card, license, ration card) for identity proofing gives someone the opportunity to replicate one of these identity markers and impersonate the holder. Cases of identity theft have increased significantly, which is a dark side of document-based methods of identity recognition.
Identity theft is a growing problem globally. In a report of the Federal Trade Commission [1], identity theft made up about one-fifth of all the consumer complaints reported in 2010. According to the report, the most common type of identity theft involved criminals stealing a victim's identity so that they could use the information to apply for federal benefits such as social security payments. The Federal Trade Commission estimated that about 3-5% of US residents have their identities stolen every year and, surprisingly, most of them might not be aware that this has happened to them. Overall losses from identity fraud in the US alone are estimated at $37 billion [2]. No authoritative data are available to estimate identity fraud losses in countries like India or Brazil, where the adoption of the Internet and computer applications has grown exponentially in an insecure manner.
With recent advances in technology, it is now possible to create an automatic individual recognition system using biometric attributes such as face, fingerprint, iris, or handwritten signature that can be verified online [3]. Individual authentication using biometrics is attractive because the authentication process is principally based on physiological or behavioral characteristics that are unique and measurable. Biometrics is emerging as a state-of-the-art tool of information security for the accurate and efficient identification of individuals in a digital society [4]. Biometric identities are intrinsic to an individual; therefore, they are difficult to share or distribute among peers, to steal or forge, and to counterfeit or hack [5]. They are not easy to fool, nor are they intrusive. However, biometric authentication as a part of security is not a solution in itself.
Biometric attributes are unique among individuals, but they are not secret. Biometric information is irrevocable, and a compromised identity is hard to regain [6]. Biometric technology is becoming popular, but there are growing concerns about vulnerabilities that breach the security of the system and the privacy of the user. These include circumvention, replay attacks, and obfuscation [7]. Circumvention refers to the reproduction of falsified credentials from an original biometric sample, whereas the presentation of an original biometric feature by an illegitimate subject is referred to as a replay attack. The removal of biometric features in order to avoid the establishment of true identity is called obfuscation.
This paper advocates the use of a bioelectrical signal, the electrocardiogram (ECG), as a novel biometric attribute for secure identity proofing. The main advantage of using the ECG as a biometric is its robustness to circumvention, replay, and obfuscation attacks. If the ECG signal can be established as a biometric, then a recognition system using the ECG gains an inherent shield against these security threats. The ECG is a physiological signal generated in the human heart, and it has an inherent feature of vitality that signifies signs of life. As a biometric, the ECG signal is not readily vulnerable to spoof attacks, so it may ensure the robustness of a system. It is universally present among live subjects and, as such, it is naturally secured. The ECG is difficult to mimic and hard to copy or steal. Therefore, the ECG has strong credentials to address the security and privacy issues of an individual [8].
The ECGs acquired from different individuals show heterogeneous characteristics [9][10][11][12][13][14][15]. The heterogeneity has been noted in studies conducted for diagnosing arrhythmia in heart function [16]. The distinctiveness of the ECG signal generally results from differences in ionic potential, in the time the ionic potential takes to spread across different parts of the heart muscle, in plasma levels of electrolytes (e.g., potassium, calcium, and magnesium), and in rhythm. Differences in heart structure among individuals, such as chest geometry, position, size, and physical condition, also manifest as distinctiveness in their heartbeat rhythm. The distinct characteristics of an individual's heartbeat are reflected in changes in morphology, differences in amplitude, and variations in the time intervals of the dominant fiducials. The main issues in using the ECG as a biometric are the variations present in the heartbeats and the contamination of the signal with noise artifacts, which make data representation more difficult [17,18].
In this paper, we present a novel method for identifying individuals using their ECG signals; in particular, the method is insensitive to the noise and muscle flexure that usually contaminate them. We make use of ECG waveform delineators [19,20] to determine the dominant fiducials of each heartbeat efficiently. The ECG characterization is performed next, such that the heartbeat interval features and the morphological features are derived from successive beats. In order to make the features insensitive to noise and other artifacts, a two-stage procedure is employed. In the first stage, amplitude features are derived from the signal scaled using Pareto normalization [21]. In the second stage, the derived heartbeat features are linearly projected from the high-dimensional space to a lower dimensional feature space. The projection uses principal component analysis (PCA), also known as the Karhunen-Loeve method [22], for dimensionality reduction and yields projection directions that maximize the scatter across all traces of an individual ECG. When the heartbeat features are projected into the subspace spanned by the dominant eigenvectors, the separability among the subjects becomes manifest. The dimension of the generated eigenvectors is the same as that of the original feature vectors; therefore, they can be referred to as eigenbeat features.
The identity classification is performed using a nearest neighbor classifier. The classification results are obtained using a subset of the MIT-BIH Arrhythmia database [23] and the ECG database prepared from normal subjects at IIT(BHU). In the sections that follow, the dependency of recognition performance on the number of principal components is reported for both databases. We found that the recognition accuracy is already high at low dimensions of the feature subspace of the ECG signal.
The rest of the paper is organized as follows. The related work and the state of the art in using the ECG signal as a biometric are given in Section 2. Section 3 presents the method of ECG characterization for biometric application. The schematic description of an ECG biometric recognition system is presented in Section 4. Section 5 presents the experimental results that demonstrate the efficacy of the proposed characterization of the ECG signal for identity recognition on a publicly available database and on a database acquired from real subjects. Finally, some conclusions are drawn in Section 6.

Related Work
Different studies have shown that the ECG can be used as a new biometric candidate for individual authentication [9][10][11][12][13][14][15]. Biel et al. [9] were among the first to demonstrate the use of the ECG for biometric application. They conducted the biometric experiment on a group of 20 subjects and used a multivariate method for classification. The feasibility of using the ECG as a new biometric for individual identity verification has been shown by Shen et al. [10]. They performed the experiment on appearance and time domain features of the heartbeat. Most of the features are extracted from the QRS complex, which is stable under changes in heart rate. The QT interval, a feature that varies with the heart rate, is normalized. Template matching and decision-based neural network approaches are used to quantify the identity verification rate, which is reported to be 95% and 80%, respectively. After combining the classification approaches, the identity verification rate is found to be 100% for a group of 20 individuals.
Israel et al. [11] have shown that the ECG of an individual exhibits a distinct pattern. They processed the ECG for quality check and proposed a quantifiable metric for classifying heartbeats among individuals. A total of 15 intrabeat features based upon cardiac physiology are extracted from each heartbeat, and the classification is performed using linear discriminant analysis. The tests show that the extracted features are independent of electrode position (e.g., around the chest and neck), invariant to an individual's state of anxiety, and unique to an individual.
Wang et al. [12] have introduced a two-step fiducial detection framework that incorporates analytic and appearance-based features from the heartbeat. The analytic features capture local information of a heartbeat and consist of temporal and amplitude features, while the appearance-based features capture the holistic patterns of a heartbeat. To better utilize the complementary characteristics of analytic and appearance-based features, a hierarchical data integration scheme has been presented. The method used for feature extraction is based on the combination of autocorrelation (AC) and discrete cosine transform (DCT), which is free from fiducial detection. The recognition performance of the AC/DCT method is reported to be between 94.47% and 97.8%.
The feasibility of using the ECG signal to aid in human identification has also been explored recently. Signal processing methods are used to delineate ECG waveforms (e.g., P and T waves) from each heartbeat. The delineation results are found to be optimum and stable in comparison to other published results. These delineators are used along with the QRS complex to extract features of different classes (time interval, amplitude, and angle) from clinically dominant fiducials on each heartbeat. The experiment was conducted on the ECG recordings of 50 subjects from the Physionet database [23]. The individuals are classified with an accuracy of up to 99%.

Method of ECG Characterization
The ECG is a noninvasive tool used to record the electrical manifestation of the contraction and relaxation activity of the heart. The Nobel laureate Willem Einthoven was the first to record the ECG, in 1903 [24]. The ECG can be recorded with surface electrodes placed on the limbs and the chest. ECG devices use a varying number of electrodes, ranging from 3 to 12, for signal acquisition, while systems using more than 12 and up to 120 electrodes are also available [25]. Each normal cycle of an ECG signal contains P, QRS, and T waves (see Figure 1). The P wave represents the contraction of the atrial muscle and has a duration of 60-100 milliseconds (ms). It has a low amplitude of 0.1-0.25 millivolts (mV) and is usually found at the beginning of the heartbeat. The QRS complex is the result of depolarization of the ventricles. It is a sharp biphasic or triphasic wave of 80-120 ms duration and shows a significant amplitude deflection that varies from person to person. The time taken for the ionic potential to spread from the sinus node through the atrial muscles and into the ventricles is 120-200 ms and is known as the PR interval. The ventricles have a relatively long ionic potential duration of 300-420 ms, known as the QT interval. The plateau part of the ionic potential, lasting 80-120 ms after the QRS complex, is known as the ST segment. The return of the ventricular muscle to its resting ionic state causes the T wave, which has an amplitude of 0.1-0.5 mV and a duration of 120-180 ms. The duration from the resting of the ventricles to the beginning of the next cycle of atrial contraction is known as the TP segment, which is a long plateau of negligible elevation.

Feature Extraction.
Before the ECG signal is used in the subsequent stages of heartbeat segmentation and feature extraction, all signals are passed through a two-stage median filter with widths of 200 ms and 600 ms, respectively, to remove the baseline wander. The first median filter suppresses the QRS complexes and P waves, while the second median filter suppresses the T waves. The resulting signal is then subtracted from the original signal to produce the baseline-corrected ECG signal [26].
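The baseline correction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the two filters are assumed to run in cascade, and the window widths are rounded up to an odd number of samples because `scipy.signal.medfilt` requires an odd kernel size.

```python
import numpy as np
from scipy.signal import medfilt


def _odd(n):
    """Return n if it is odd, otherwise n + 1 (medfilt needs an odd kernel)."""
    return n if n % 2 == 1 else n + 1


def remove_baseline(ecg, fs):
    """Subtract a median-filter baseline estimate from an ECG trace.

    ecg : 1-D array of samples, fs : sampling rate in Hz.
    Assumption: the 200 ms filter output feeds the 600 ms filter.
    """
    ecg = np.asarray(ecg, dtype=float)
    k1 = _odd(int(round(0.200 * fs)))  # suppresses QRS complexes and P waves
    k2 = _odd(int(round(0.600 * fs)))  # suppresses T waves
    baseline = medfilt(medfilt(ecg, k1), k2)
    return ecg - baseline
```

Note that `medfilt` pads the signal with zeros at both ends, so the first and last few hundred milliseconds of a real recording may need to be discarded or padded differently.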
The QRS complex delineator is used to detect the heartbeats in the ECG signal. We employ the technique proposed by Pan and Tompkins [27] for QRS complex detection, with some improvements. It uses digital analysis of the slope, amplitude, and width information of the ECG waveforms. The beginning and the end of the QRS complex, that is, the QRS onset and QRS offset time instances, respectively, are delineated according to the location and convexity of the R peak. Once the heartbeats are detected, temporal windows are defined heuristically before and after the QRS complex time instances to search for the P and T waves. The technique proposed in [19] is used to determine the P onset and P offset time instances of the P wave, while the technique proposed in [20] is used to determine the T onset and T offset time instances of the T wave. From all these time instances of the heartbeats, three different classes of features are derived: (1) heartbeat interval features, (2) interbeat interval features, and (3) ECG morphological features.
(1) Heartbeat Interval Features. Five features relating to heartbeat intervals are computed after heartbeat segmentation. The QRS width is the duration between the QRS onset and the QRS offset. The T wave duration is defined as the time interval between the QRS offset and the T offset. The PQ segment is defined as the time interval between the P onset and the QRS onset. The pre-TP segment is defined as the time interval between a given P onset and the T offset of the previous wave. Similarly, the post-TP segment is defined as the time interval between a given T offset and the P onset of the following wave.
(2) Interbeat Interval Features. Ten features relating to interbeat intervals are computed after the segmentation of the fiducial points of successive heartbeats. These features are extracted from the PP, QQ, SS, TT, and RR sequences of the successive heartbeats. The pre-PP (post-PP) interval is the time interval between the P onset of a given heartbeat and the P onset of the previous (following) heartbeat. The pre-QQ (post-QQ) interval is the time interval between the Q peak of a given heartbeat and the Q peak of the previous (following) heartbeat. The pre-SS (post-SS) interval is the time interval between the S peak of a given heartbeat and the S peak of the previous (following) heartbeat. The pre-TT (post-TT) interval is the time interval between the T offset of a given heartbeat and the T offset of the previous (following) heartbeat. Similarly, the pre-RR (post-RR) interval is defined as the RR interval between a given heartbeat and the previous (following) heartbeat.
(3) ECG Morphological Features. We divide the ECG morphological features into two groups, both of which contain amplitude values of the segmented heartbeat of the ECG signal. The first group contains thirty-three features. These features are determined within the time windows shown in Figure 2. The first window is set between the QRS onset and the QRS offset. Five features are extracted, corresponding to the fiducials QRS onset, Q peak, R peak, S peak, and QRS offset. The boundaries of the second window are set such that it approximately covers the P wave: it contains the portion of the heartbeat between the P onset and the P onset + 120 ms. Using linear interpolation, thirteen features are estimated uniformly within this time window. Similarly, the third window is bounded by the QRS offset and the T offset. Fifteen features of the heartbeat amplitude are derived uniformly within this window using linear interpolation.
The second group contains twenty-eight features which are extracted from the normalized ECG signal. In the normalized signal, the amplitude difference from $x_i$ to the mean $\mu$ is measured in units of the standard deviation $\sigma$, such that $\tilde{x}_i = (x_i - \mu)/\sigma$, where $x_i$ represents the data sample, from a signal of size $N$, at the discrete instance of time $i$ [21]. The aim of normalization is to reduce the sensitivity of the ECG signal to both the noise and the muscle flexure that contaminate the signal. We define three different time windows with respect to the location of the heartbeat fiducial point (FP), as shown in Figure 3. The window covering the T wave starts at FP + 150 ms and ends at FP + 420 ms, and ten amplitude features are derived from this window. In all three windows, the features are derived using linear interpolation, with the signal sampled uniformly.
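The three feature classes above can be sketched in a few lines, under stated assumptions. The dictionary layout `fid` (fiducial name to per-beat times in seconds), the function names, and the inclusion of both window endpoints in the uniform sampling are all hypothetical choices made for illustration. The normalization divides by the standard deviation, following the verbal description in the text, and is computed over whatever segment is passed in; Pareto scaling as commonly defined divides by the square root of the standard deviation instead.

```python
import numpy as np

# Hypothetical layout: fid[name][k] is the time (s) of a fiducial in beat k.
INTERBEAT = (("p_onset", "PP"), ("q_peak", "QQ"), ("r_peak", "RR"),
             ("s_peak", "SS"), ("t_offset", "TT"))


def interval_features(fid, k):
    """Five heartbeat-interval and ten interbeat-interval features of beat k."""
    f = {name: np.asarray(v, dtype=float) for name, v in fid.items()}
    if not 1 <= k <= len(f["r_peak"]) - 2:
        raise ValueError("beat k needs a previous and a following beat")
    feats = {
        "qrs_width": f["qrs_offset"][k] - f["qrs_onset"][k],
        "t_wave_duration": f["t_offset"][k] - f["qrs_offset"][k],
        "pq_segment": f["qrs_onset"][k] - f["p_onset"][k],
        "pre_tp_segment": f["p_onset"][k] - f["t_offset"][k - 1],
        "post_tp_segment": f["p_onset"][k + 1] - f["t_offset"][k],
    }
    for name, label in INTERBEAT:
        feats["pre_" + label] = f[name][k] - f[name][k - 1]
        feats["post_" + label] = f[name][k + 1] - f[name][k]
    return feats


def sample_window(signal, fs, t_start, t_end, n_points):
    """Amplitudes at n_points uniform instants in [t_start, t_end] (seconds),
    obtained by linear interpolation between the recorded samples."""
    t = np.arange(len(signal)) / fs
    return np.interp(np.linspace(t_start, t_end, n_points), t, signal)


def morphological_group1(ecg, fs, fid, k):
    """Thirty-three raw amplitudes: 5 QRS fiducials, 13 P-window samples,
    and 15 samples between the QRS offset and the T offset."""
    t = np.arange(len(ecg)) / fs
    names = ("qrs_onset", "q_peak", "r_peak", "s_peak", "qrs_offset")
    qrs = np.interp([fid[n][k] for n in names], t, ecg)
    p_win = sample_window(ecg, fs, fid["p_onset"][k],
                          fid["p_onset"][k] + 0.120, 13)
    t_win = sample_window(ecg, fs, fid["qrs_offset"][k], fid["t_offset"][k], 15)
    return np.concatenate([qrs, p_win, t_win])


def normalized_window(ecg, fs, fp_time, n_points=10):
    """Ten features of the normalized signal from FP + 150 ms to FP + 420 ms.
    Assumption: mean and standard deviation are taken over the given segment."""
    x = np.asarray(ecg, dtype=float)
    z = (x - x.mean()) / x.std()
    return sample_window(z, fs, fp_time + 0.150, fp_time + 0.420, n_points)
```

For a beat `k` with both neighbours present, `interval_features(fid, k)` returns 15 values and `morphological_group1(ecg, fs, fid, k)` returns 33 values; only the one normalized window whose bounds are stated in the text is reproduced here.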

Selection of Eigenbeat Features.
The eigenbeat method is based on the linear projection of the sample space to a low-dimensional feature space [22]. It uses principal component analysis (PCA) for dimensionality reduction, which yields the projection directions that maximize the scatter across all samples present in the gallery and the probe ECG signals.
More formally, let there be $c$ classes of feature vectors $\{X_1, X_2, \ldots, X_c\}$, where each class $X_i$ contains one or more feature vectors $x_k$ ($k = 1, 2, \ldots, N$) in an $n$-dimensional space. Then, a set of $m$ ($m < n$) feature basis vectors $\{w_j\}_{j=1}^{m}$ can be estimated by maximizing the expression

$$W_{\mathrm{opt}} = \arg\max_{W} \left| W^{T} S_T W \right| = [\, w_1 \; w_2 \; \cdots \; w_m \,],$$

where $\{w_j \mid j = 1, 2, \ldots, m\}$ is the set of $n$-dimensional eigenvectors of the scatter matrix $S_T$ corresponding to the $m$ largest eigenvalues, and $S_T$ is defined as

$$S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^{T},$$

where $\mu \in \mathbb{R}^{n}$, $\mu = \frac{1}{N} \sum_{k=1}^{N} x_k$, is the mean of all feature vectors participating in the recognition process. It is to be noted that the dimension of the generated eigenvectors is the same as that of the original feature vectors; therefore, they can be referred to as eigenbeat features. The generated eigenvectors form the basis representation of the gallery and the probe ECG signals. They yield projection directions that maximize the scatter across all feature vectors within a subject. The coefficient set $\Omega_i$ is derived for each class of feature vectors corresponding to each subject $i$, and it is a compact representation of the heartbeat features in the gallery set. If a class contains more than one feature vector, then the average over all heartbeats of a single subject provides the gallery representation $\Omega_i$ against which the probe data are compared. For identity recognition, the classification is performed using a nearest neighbor classifier in the reduced feature space. The best match in the gallery set is the subject $i$ that minimizes the distance between $\Omega$ and $\Omega_i$, such that

$$i^{*} = \arg\min_{i} \left\| \Omega - \Omega_i \right\|,$$

where $\Omega$ is the vector of coefficients in the eigenspace for the probe ECG signal, which is obtained using the same processing steps as for the gallery ECG.
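As an illustration of the projection and matching steps, the sketch below computes the eigenvectors of the scatter matrix with NumPy, averages the projected heartbeats of each subject into a gallery representation, and assigns a probe to the nearest gallery entry by Euclidean distance. The function names and the array layout (one feature vector per row) are assumptions made for this example, not part of the original method description.

```python
import numpy as np


def fit_eigenbeats(X, m):
    """Return the mean vector and the m dominant eigenvectors (as columns)
    of the scatter matrix of X, where X has shape (N, n)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    scatter = Xc.T @ Xc                       # sum of outer products
    eigvals, eigvecs = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:m]     # indices of the m largest
    return mu, eigvecs[:, order]


def project(X, mu, W):
    """Coefficients of the feature vectors in the eigenbeat subspace."""
    return (np.atleast_2d(X) - mu) @ W


def build_gallery(X, labels, mu, W):
    """Average the projected heartbeats of each subject."""
    labels = np.asarray(labels)
    return {s: project(X[labels == s], mu, W).mean(axis=0)
            for s in np.unique(labels)}


def identify(probe, gallery, mu, W):
    """Nearest neighbor match of the averaged probe coefficients."""
    omega = project(probe, mu, W).mean(axis=0)
    return min(gallery, key=lambda s: np.linalg.norm(omega - gallery[s]))
```

`np.linalg.eigh` is used because the scatter matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.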

ECG Biometric Recognition System
The biometric recognition system for identifying individuals using the ECG signal is shown in Figure 4.

Experimental Results
The performance of the identity recognition system described above is tested on two different databases. The first database is prepared from the publicly available PhysioBank archives [23]; in particular, the MIT-BIH Arrhythmia database is used. Forty-four ECG recordings are randomly selected from this database for this study. The second database is prepared in the laboratory of the School of Biomedical Engineering, Indian Institute of Technology (Banaras Hindu University), using the PowerLab 4/25 system of AD Instruments. A total of 29 volunteers aged 20 to 56 years participated in the data enrollment process, and the data were acquired in multiple sessions across a year. The data acquisition is performed in a simple manner: the subjects merely sit on a chair or a wooden stool under relaxed conditions, and clamp electrodes are fixed to both wrists and the left ankle. The data are bandpass filtered at 0.3-50 Hz and sampled at 1000 Hz.
The MIT-BIH Arrhythmia database contains only one ECG recording for each subject; therefore, the complete record of a subject is divided into two halves such that the first half is used for training and the latter half is used for testing. In the IIT(BHU) database, two different sessions of data are used for the gallery and the probe. We randomly select 10 sets of heartbeats from the gallery data, and in each set the features are derived from 10 successive beats that meet the requirements of the delineators. Before the eigenbeat feature selection procedure is applied, the derived features are normalized using the Z-score criterion [28]. A representation relative to the basis formed by the dominant eigenvectors is derived by selecting the five most significant eigenvectors, corresponding to the five largest eigenvalues. Finally, the coefficients of the components in the projected subspace are generated, which give a compact representation of the heartbeats in the gallery set. Averaging over all of the heartbeat sets for a single subject provides the gallery representation against which the probe data are compared. The probe signal undergoes the same processing steps as the gallery set, yielding a representation relative to the basis formed by the dominant eigenvectors. The best match in the gallery set is the subject that minimizes the Euclidean distance between the components. The distinction between the eigenbeat features of the subjects of the MIT-BIH Arrhythmia database and the IIT(BHU) database can be represented by the principal components shown in Figure 5. The separability between the subjects of both health statuses is clearly visible at the lower dimensions of projection, represented by the first principal component (PC1) and the second principal component (PC2) only. For example, the decrease in intraclass variability and the increase in interclass separability, represented by PC1 and PC2, are shown for the subjects of the MIT-BIH Arrhythmia database and the IIT(BHU) database in Figures 5(a) and 5(b), respectively. The performance of the proposed system on the eigenbeat features of the MIT-BIH Arrhythmia database and the IIT(BHU) database is shown in Figure 6 through a plot of recognition accuracy versus the number of principal components.
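The data preparation described in this section, splitting a single record into gallery and probe halves and applying Z-score normalization to the derived features, can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: the normalization statistics are estimated on the gallery features only and then reused for the probe, and constant features are left unscaled to avoid division by zero.

```python
import numpy as np


def split_record(beat_features):
    """Split the per-beat feature rows of one record into a first half
    (gallery/training) and a latter half (probe/testing)."""
    half = len(beat_features) // 2
    return beat_features[:half], beat_features[half:]


def zscore_fit(X):
    """Per-feature mean and standard deviation of the rows of X."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # assumption: constant features stay as is
    return mean, std


def zscore_apply(X, mean, std):
    """Express each feature as a deviation from the mean in units of std."""
    return (X - mean) / std
```

A typical use under these assumptions would be `gallery, probe = split_record(features)`, followed by `mean, std = zscore_fit(gallery)` and `zscore_apply` on both halves before the eigenbeat projection.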
The recognition accuracy can be computed as the complement of the equal error rate (EER) reported by the system, that is, 100% minus the EER. For the subjects of the MIT-BIH Arrhythmia database, the system reports a recognition accuracy of 88.5% when the comparison between the gallery and the probe ECG is done on the information derived from the first principal component (PC1). The recognition accuracy improves further if the system accumulates the information associated with additional principal components. For example, the reported recognition accuracies are 89.87%, 90.78%, 91.23%, and 91.42% when the components up to PC2, PC3, PC4, and PC5, respectively, are accumulated. A similar trend is observed for the subjects of the IIT(BHU) database. The recognition accuracy reaches a maximum of 95.55% on the accumulation of the first five principal components (PC5), whereas the minimum accuracy of 91.15% is reported at PC1. The recognition accuracies at PC2, PC3, and PC4 are found to be 93.65%, 94.82%, and 95.22%, respectively. The higher recognition accuracy for the subjects of the IIT(BHU) database is attributed to the fact that it contains the ECG data of healthy subjects acquired under normal conditions. These results indicate that the proposed characterization of the ECG signal and the subsequently derived eigenbeat features are robust enough to distinguish subjects of both health statuses, that is, healthy subjects and subjects suffering from cardiac arrhythmia.
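One way to obtain such an accuracy figure from matching distances is sketched below. It assumes, as our reading of the relation stated above, that the accuracy is taken as 100% minus the equal error rate, and it approximates the EER by sweeping a decision threshold over the observed distances. The function names and the convention that a comparison is accepted when its distance is at or below the threshold are illustrative assumptions.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine (same subject) and impostor
    (different subject) distances by a threshold sweep."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    # False rejection: a genuine comparison whose distance exceeds the threshold.
    frr = np.array([(genuine > t).mean() for t in thresholds])
    # False acceptance: an impostor comparison at or below the threshold.
    far = np.array([(impostor <= t).mean() for t in thresholds])
    best = np.argmin(np.abs(far - frr))
    return 0.5 * (far[best] + frr[best])


def recognition_accuracy(genuine, impostor):
    """Accuracy in percent, assumed here to be 100 * (1 - EER)."""
    return 100.0 * (1.0 - equal_error_rate(genuine, impostor))
```

With few comparisons the two error curves may not cross exactly, which is why the sketch averages the false acceptance and false rejection rates at the threshold where they are closest.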

Conclusion
This study has proposed a new method to characterize the ECG signal for identifying individuals. The set of features is derived from the analysis of successive heartbeats and includes the heartbeat interval features and the waveform morphological features. We have derived eigenbeat features by linearly projecting the sample space to a lower dimensional feature space. The advantages of using eigenbeat features are the elimination of the effects of noise and muscle flexure in the ECG data, the reduced complexity of handling a larger attribute set, and a simpler classification process. The reported results demonstrate the effectiveness of the proposed characterization of the ECG signal and the subsequently derived eigenbeat features for individual identification.

Figure 1: A typical ECG signal that includes three successive heartbeats and the information lying in the P, Q, R, S and T waves.

Figure 2: Extraction of ECG morphological features from a heartbeat, where the fiducial point (FP) represents the position of R peak.

Figure 3: Extraction of ECG morphological features from the scaled samples of a heartbeat.

Figure 4: Schematic of a biometric recognition system for identifying individuals based on their electrocardiograms.

In the system of Figure 4, the ECG signals acquired from the individuals are first preprocessed for quality check, which corrects the signal for noise and muscle flexure. The ECG delineation includes segmentation of the heartbeats, that is, detection of the P, Q, R, S, and T waves and determination of their end fiducials. The feature extraction includes determination of the interval features and the ECG morphological features from successive beats and derivation of the eigenbeat features. Finally, the authentication is performed on the reduced feature set in the projected domain by comparing the features of the gallery and the probe ECG signals using the nearest neighbor criterion.

Figure 5: Intersubject variability represented by first and second principal components of five different subjects (a) MIT-BIH Arrhythmia database and (b) IIT(BHU) database.