We propose an ear recognition system based on 2D ear images that consists of three stages: ear enrollment, feature extraction, and ear recognition. Ear enrollment includes ear detection and ear normalization. The ear detection approach, based on an improved AdaBoost algorithm, detects the ear under a complex background in two steps: offline cascaded classifier training and online ear detection. The Active Shape Model is then applied to segment the ear and normalize all ear images to the same size. For its eminent characteristics in spatial local feature extraction and orientation selection, Gabor-filter-based ear feature extraction is presented in this paper. Kernel Fisher Discriminant Analysis (KFDA) is then applied for dimension reduction of the high-dimensional Gabor features. Finally, a distance-based classifier is applied for ear recognition. Experimental results on two datasets (USTB and UND) and the performance of the ear authentication system show the feasibility and effectiveness of the proposed approach.
Research on ear recognition has drawn increasing attention over the recent five years [
Ear recognition using 2D images can be categorized into three kinds according to the features extracted from the ear images: (1) structural features, (2) local features, and (3) holistic features. The application scenarios can be summarized as the following three: ear recognition under a constrained environment, ear recognition with pose variation, and ear recognition under partial occlusion. Table
Representative feature extraction methods and their performance evaluation.
Reference | Description | Dataset | Performance
---|---|---|---
*Structural feature extraction methods* | | |
Burge and Burger (1997) | Used the main curve segments to form a Voronoi diagram and an adjacency-graph-matching-based algorithm for authentication. The curve segments are affected by changes in camera-to-ear orientation and by lighting variation. | — | —
Moreno et al. (1999) | Used feature points of the outer ear contour together with information obtained from ear shape and wrinkles; a compression network is applied for classification. | 28 subjects, 168 images, 6 images per subject | Rank-1: 93%
Mu et al. (2004) | Proposed a long-axis-based shape and structural feature extraction method: the shape feature consists of the curve-fitting parameters of the outer ear contour, the structural feature is composed of ratios of the lengths of key sections to the length of the long axis, and a nearest-neighbor classifier is used for recognition. | USTB dataset2: 77 subjects, 4 images per subject | Rank-1: 85%
Choras (2005) | Proposed a geometrical feature extraction method based on the number of pixels sharing the same radius in a circle centered at the centroid, and on the main curves. | 240 images | —
*Local feature extraction methods* | | |
Hurley et al. (2005) | Proposed the force field transformation method: ear images are treated as arrays of mutually attracting particles acting as the source of a Gaussian force field; the force field transforms are converted to convergence fields, and Fourier-based cross-correlation performs multiplicative template matching on ternary-thresholded convergence maps. | XM2VTS face profile subset (252 subjects) | Rank-1: 99.2%
Nanni and Lumini (2007) | Proposed a local multimatcher approach: each matcher is trained on features extracted by convolving a subwindow with a bank of Gabor filters; the most discriminative subwindows are selected by Sequential Forward Floating Selection with a fitness function tied to recognition performance, and recognition uses sum-rule decision-level fusion. | UND collection E (114 subjects) | Rank-1: 80%
Bustard and Nixon (2010) | Proposed an ear registration and recognition method that treats the ear as a planar surface and creates a homography transform from SIFT feature matches; discussed ear recognition under partial occlusion and presented the relationship between occlusion percentage and recognition rate. | XM2VTS face profile dataset (63 subjects) | Rank-1: 92% (30% occlusion from above)
Arbab-Zavar and Nixon (2011) | Proposed a model-based approach: the model is a partwise description of the ear derived by stochastic clustering on a set of scale-invariant features of the training set; the outer ear curves are further analyzed with a log-Gabor filter, and recognition fuses the model-based and outer-ear metrics. | XM2VTS face profile dataset (63 subjects) | Rank-1: 89.4% (30% occlusion from above)
*Holistic feature extraction methods* | | |
Chang et al. (2003) | Used standard PCA to compare face and ear and concluded that they do not differ much in recognition performance. | Human ID Database (197 subjects) | Rank-1: 70.5% (face), 71.6% (ear)
Yuan et al. (2006) | Proposed an Improved Nonnegative Matrix Factorization with Sparseness Constraint (INMFSC) for ear recognition under occlusion: the ear image is divided into three nonoverlapping parts, INMFSC is applied for feature extraction, and classification uses a Gaussian-model-based classifier. | USTB dataset3 (79 subjects) | Rank-1: ~91% (10% occlusion from above)
Dun and Mu (2009) | Proposed an ICA-based method with nonlinear adaptive feature fusion: two complementary feature types are extracted with ICA and concatenated with different weights into a high-dimensional fused feature; Kernel PCA reduces the dimension, and a nearest-neighbor classifier makes the final decision. | USTB dataset3 (79 subjects), USTB dataset4 (150 subjects) | Rank-1: ≥90% (pose variation within 15°)
Wang et al. (2008) | Proposed Local Binary Pattern based ear recognition: ear images are decomposed by the Haar wavelet transform, uniform LBP combined with block-based and multiresolution methods describes the texture features, and a nearest-neighbor classifier is applied. | USTB dataset3 (79 subjects) | Rank-1: ≥92% (pose variation within 20°)
Zhou et al. (2010) | Proposed ear recognition via sparse representation: Gabor features are used to build a dictionary, features extracted from the test data are represented over the dictionary, and the class is determined by which dictionary entries are involved in the representation. | UND G subset (39 subjects) | Rank-1: 98.46% (4 training and 2 test images per subject)
The present works listed in Table
The rest of this paper is organized as follows. Section
This section details the ear enrollment process, which includes ear detection and ear normalization. The ear detection approach, based on our modified AdaBoost algorithm, detects the ear under a complex background in two steps: offline cascaded classifier training and online ear detection. We have made several modifications compared with our previous work on ear detection [
In our previous work [ ], the classical AdaBoost training procedure is as follows.
1. Given a training sample set, initialize the sample weights.
2. Repeat for each round:
   (a) train the weak classifiers with the current weights;
   (b) get the error rate of each weak classifier;
   (c) set the weight of the chosen weak classifier;
   (d) update the weights of the training samples.
3. After the final round, the strong classifier is obtained as the weighted combination of the weak classifiers.
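The loop above can be sketched in a few lines. This is an illustrative discrete AdaBoost with exhaustive decision stumps in NumPy, not the cascaded Haar-feature implementation used in the paper; the stump search and the round count are assumptions made for the sketch.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with decision stumps (illustrative).

    X: (n_samples, n_features) feature values (e.g. Haar-like responses),
    y: labels in {-1, +1} (non-ear / ear).
    Returns a list of (feature_index, threshold, polarity, alpha) stumps.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)  # initialize sample weights uniformly
    stumps = []
    for _ in range(n_rounds):
        best = None
        # weak learner: exhaustively pick the stump with the lowest
        # weighted error under the current weight distribution
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # numerical guard
        alpha = 0.5 * np.log((1 - err) / err)  # weak-classifier weight
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)         # re-weight the samples
        w /= w.sum()
        stumps.append((j, thr, pol, alpha))
    return stumps

def predict_adaboost(stumps, X):
    """Strong classifier: sign of the weighted vote of the weak classifiers."""
    score = np.zeros(len(X))
    for j, thr, pol, alpha in stumps:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(score)
```

In a cascaded detector, one such strong classifier is trained per layer, with each layer filtering out most non-ear windows before the next.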
The training process of the algorithm described above is time-consuming, and the false rejection and false acceptance rates need to be lowered for real application scenarios. Based on the structural features of the ear itself, we propose the following four improvements to the traditional AdaBoost algorithm in view of its deficiencies.
For each Haar-like feature, we divide the feature value space, composed of the feature values of all the training samples, into
In the new space, we search for the optimum threshold
The strong classifier is a weighted combination of weak classifiers: the smaller a weak classifier's error rate, the larger the weight assigned to it. The error rate is determined by the training samples, in which positive and negative samples are equally important. Ear detection experiments show that with the traditional AdaBoost algorithm the false acceptance rate is not acceptable. We therefore improve the training procedure of the weak classifiers so that the weight distribution among them is decided not only by the total error rate but also by the performance on the negative samples.
To this end, we extend the AdaBoost algorithm with a parameter
1. Given a weak classifier learning algorithm and a training sample set, initialize the weights.
2. Repeat for each round:
   (a) train the weak classifier learning algorithm with the current weights;
   (b) compute the error rate;
   (c) get the updating parameter;
   (d) update the weights.
3. After the final round, the strong classifier is the weighted combination of the trained weak classifiers.
By analyzing human ear samples, we find that the global structure of human ears is similar: the shape of the outer contour is almost oval, and all human ears have similar shapes of the helix, antihelix, and concha. These similar global features are helpful for training an ear/non-ear classifier. Looking into more detail, however, each ear has its unique features or measures on different ear components. These differences in detail make the AdaBoost-based two-class classifier more difficult to construct. Here, we regard ear samples that present special detail components as "difficult samples." Such samples gain ever more weight during weak classifier learning, because each weak classifier keeps trying to classify them correctly, which ultimately leads to overfitting. To prevent overfitting, we introduce a new parameter called the "elimination threshold Hw" to improve the robustness of the ear detector: during training, ear samples with weight greater than Hw are eliminated.
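The elimination step amounts to a simple filter on the sample weights, sketched below; the threshold value Hw = 0.05 is purely illustrative, as the concrete value is not specified here.

```python
import numpy as np

def eliminate_difficult_samples(X, y, w, Hw=0.05):
    """Drop "difficult" samples whose boosting weight exceeds the
    elimination threshold Hw, then renormalize the remaining weights.
    (Hw = 0.05 is an illustrative value, not the paper's setting.)
    """
    keep = w <= Hw
    w_kept = w[keep] / w[keep].sum()
    return X[keep], y[keep], w_kept
```

Calling this between boosting rounds prevents a handful of atypical ear samples from dominating the weight distribution.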
Table
Detection performance comparison: FRR and FAR.
Test dataset | Number of ear images | Previous work [ ]: FRN/FRR | Previous work [ ]: FAN/FAR | Dual-ear detector: FRN/FRR | Dual-ear detector: FAN/FAR | Left + right ear detectors: FRN/FRR | Left + right ear detectors: FAN/FAR
---|---|---|---|---|---|---|---
CAS-PEAL | 166 | 5/3.0% | 6/3.6% | 5/3.0% | 2/1.2% | 3/1.8% | 2/1.2%
UMIST | 48 | 1/2.1% | 0/0% | 0/0% | 0/0% | 1/2.1% | 0/0%
USTB220 | 220 | 1/0.5% | 5/2.3% | 0/0% | 5/2.3% | 0/0% | 1/0.5%
For each layer, we compare the detection performance of the single-ear detector and the dual-ear detector proposed in this paper. On every layer, the difference in detection rate between the two detectors is very limited (not more than 1%), but they differ in false acceptance rate, as shown in Figure
Performance comparison between single-ear detector and dual-ear detector.
Ear detection experimental examples.
As we can see from the second figure in Figure
We apply an automatic ear normalization method based on improved Active Shape Model (ASM) [
Example images of ear normalization.
After converting the color images to gray images, we use histogram equalization to eliminate lighting variations among different subjects. Figure
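The lighting normalization step is standard histogram equalization, which can be sketched as follows; this is a generic NumPy implementation, not the paper's exact code.

```python
import numpy as np

def hist_equalize(gray):
    """Histogram equalization for an 8-bit grayscale ear image,
    used to reduce lighting variation across subjects."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # map each gray level through the normalized cumulative histogram
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]
```

The lookup-table form makes the transform a single vectorized indexing operation per image.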
For feature extraction, Gabor filter is applied on the ear images to extract spatial local features of different directions and scales. The Gabor features are of high dimension, so Kernel Fisher Discriminant Analysis is further applied for dimension reduction. Then distance based classifier is applied for ear recognition.
The ear has distinct features compared with the face, such as texture features along different orientations. For its eminent characteristics in spatial local feature extraction and orientation selection, Gabor-filter-based ear feature extraction is presented in this paper. A two-dimensional Gabor function is defined as [
The Gabor feature of an ear image is the convolution of the ear image and the Gabor kernel function as shown in
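A minimal sketch of this extraction step is given below. The kernel follows the common complex form used in Gabor-based face/ear recognition, but the specific parameter values (kmax, spacing factor f, σ, kernel size, downsampling step) are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def gabor_kernel(scale, orientation, n_orients=8, size=11,
                 kmax=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """One complex Gabor kernel at the given scale and orientation.
    Parameter values here are a common convention, assumed for the sketch."""
    k = kmax / f ** scale
    phi = orientation * np.pi / n_orients
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    z2 = xx ** 2 + yy ** 2
    # oscillatory carrier minus the DC term, under a Gaussian envelope
    return (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * z2 / (2 * sigma ** 2)) * \
           (np.exp(1j * (kx * xx + ky * yy)) - np.exp(-sigma ** 2 / 2))

def conv2_same(img, ker):
    """Zero-padded 'same'-size 2D convolution via FFT (complex output)."""
    H, W = img.shape
    h, w = ker.shape
    shape = (H + h - 1, W + w - 1)
    full = np.fft.ifft2(np.fft.fft2(img, shape) * np.fft.fft2(ker, shape))
    r0, c0 = (h - 1) // 2, (w - 1) // 2
    return full[r0:r0 + H, c0:c0 + W]

def gabor_features(img, scales=5, orients=8, step=4):
    """Concatenate downsampled magnitude responses over all scales
    and orientations into one high-dimensional feature vector."""
    feats = []
    for v in range(scales):
        for u in range(orients):
            resp = conv2_same(img, gabor_kernel(v, u, n_orients=orients))
            mag = np.abs(resp)[::step, ::step]  # downsample to curb dimension
            feats.append(mag.ravel())
    return np.concatenate(feats)
```

Even with downsampling, the resulting vector is high-dimensional, which is why a subsequent dimension reduction step such as KFDA is needed.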
Gabor ear image: (a) the real part of the Gabor kernel; (b) the magnitude spectrum of the Gabor feature of the ear image on 4 orientations; (c) the magnitude spectrum of the Gabor feature of the ear image on 8 orientations.
For a
Suppose that
Dimension reduction using the Full Space Kernel Fisher Discriminant Analysis.
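The KFDA projection can be sketched as a regularized generalized eigenproblem on the training kernel matrix. The sketch below is a generic multi-class kernel Fisher discriminant, not the paper's exact Full Space variant; the regularization constant is an assumption.

```python
import numpy as np

def kfda_fit(K, y, n_components, reg=1e-6):
    """Kernel Fisher Discriminant Analysis (generic sketch).
    K: (n, n) kernel matrix of the training data; y: integer class labels.
    Returns projection coefficients A of shape (n, n_components): a sample
    with kernel row k(x, X_train) is projected as k @ A.
    """
    n = len(y)
    m = K.mean(axis=1, keepdims=True)        # overall mean of kernel rows
    M = np.zeros((n, n))                     # between-class scatter
    N = np.zeros((n, n))                     # within-class scatter
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        Kc = K[:, idx]
        mc = Kc.mean(axis=1, keepdims=True)  # class mean in kernel space
        M += len(idx) * (mc - m) @ (mc - m).T
        D = Kc - mc
        N += D @ D.T
    N += reg * np.eye(n)                     # regularize for stability
    # leading eigenvectors of N^{-1} M maximize the Fisher criterion
    vals, vecs = np.linalg.eig(np.linalg.solve(N, M))
    order = np.argsort(-vals.real)
    return vecs[:, order[:n_components]].real
```

At test time, only the kernel row between the probe and the training set is needed, so the projection cost is linear in the number of training samples.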
In this experiment, we select two datasets: the USTB dataset3 [
Sample images from USTB dataset3.
From UND collection J2, we selected 150 subjects that have more than six images each; six images per subject are used in the experiment. This subset exhibits illumination and pose variation. Using the ear detection method mentioned in Section
Sample images of UND dataset.
In the training stage, we use the
For the kernel function in KFDA, we apply linear kernel, polynomial kernel, RBF kernel, and cosine kernel functions as shown in Table
Rank-1 recognition rate of the proposed method when adopting different kernel functions.
Kernel | Parameters in kernel functions | Rank-1 recognition rate (USTB dataset3) | Rank-1 recognition rate (UND dataset)
---|---|---|---
Linear | — | 75.44% | 74.22%
Polynomial | | 89.11% | 87.33%
RBF | | 96.46% | 94%
Cosine | | 92.41% | 90.22%
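The four kernels compared in the table can be written, in their standard forms, as follows; the parameter values (polynomial degree and offset, RBF σ) below are placeholders, since the tuned values used in the experiments are not reproduced here.

```python
import numpy as np

def linear_kernel(X, Y):
    """k(x, y) = x . y"""
    return X @ Y.T

def poly_kernel(X, Y, degree=2, c=1.0):
    """k(x, y) = (x . y + c)^d; degree and c are illustrative values."""
    return (X @ Y.T + c) ** degree

def rbf_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)); sigma is illustrative."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def cosine_kernel(X, Y):
    """k(x, y) = x . y / (||x|| ||y||)"""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T
```

Each function maps two sample matrices to the corresponding kernel (Gram) matrix, which is the input that a KFDA implementation consumes.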
Table
Performance comparison between ear recognition with or without ear normalization.
Dataset | Rank-1 recognition rate without normalization | Rank-1 recognition rate with normalization
---|---|---
USTB dataset3 | 93.58% | 96.46%
UND dataset | 91.11% | 94%
We have designed an ear authentication system with the proposed method. Figure
Framework flow of the ear authentication system.
Figure
Ear authentication system: (a) authentication scenario; (b) ear enrollment interface; (c) ear authentication interface.
ROC curve of the ear authentication system.
Comparison of the number of weak classifiers on each layer.
In this paper, an ear recognition system based on 2D images is proposed. The main contributions of the proposed method are: (1) automatically detecting the ear from the input image and normalizing it based on the long axis of the outer ear contour and (2) extracting discriminating Gabor features of the ear images using the Full Space Kernel Fisher Discriminant Analysis. Experimental results show that automatic ear recognition based on 2D images can be achieved with the proposed method.
Our future work will focus on two aspects: (1) in the ear normalization stage, improving the accuracy of earlobe localization, building more elaborate models for the earlobe landmarks, and making the search process less dependent on the initial model shape; and (2) in the ear authentication stage, using a larger dataset to verify the matching accuracy and the real-time performance of the proposed method.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work is supported by the National Natural Science Foundation of China under Grant no. 61300075 and Fundamental Research Funds for China Central Universities under Grant no. FRF-SD-12-017A.