Optimized Periocular Template Selection for Human Recognition

A novel approach is proposed for selecting a rectangular template around the periocular region that is optimal for human recognition. A periocular template larger than the optimal one can be slightly more potent for recognition, but it heavily slows down the biometric system by making feature extraction computationally intensive and by increasing the database size. A smaller template, on the contrary, is faster owing to the lower cost of feature extraction but cannot yield the desired recognition accuracy. The proposed research aims to balance these two contradictory objectives, namely, (a) minimizing the size of the periocular template and (b) maximizing the recognition accuracy achieved through the template. This paper proposes four different approaches for dynamic optimal template selection from the periocular region. The proposed methods are tested on the publicly available unconstrained UBIRISv2 and FERET databases, and satisfactory results have been achieved. The template thus obtained can be used for recognition of individuals in an organization and can be generalized to recognize every citizen of a nation.


Introduction
A biometric system recognizes a person uniquely through a physical or behavioral trait. Computer-aided identification of a person through the face has grown in importance over the last decade, and researchers have attempted to find unique facial nodal points. However, facial data change with expression and age, which makes recognition through the face challenging. In such scenarios, a strong need has been felt to identify a person from partial facial data. There are also forensic applications in which the antemortem information is a partial face. These motives led researchers to derive auxiliary biometric traits from the facial image, namely, iris, ear, lip, and periocular region. Recognizing humans through iris images captured under near infrared (NIR) illumination in a constrained scenario yields satisfactory recognition accuracy, while recognition under the visual spectrum (VS) in an unconstrained scenario is relatively challenging. In particular, the VS periocular image has been examined for uniqueness, as it contains many nodal points. Classification and recognition through the periocular region show significant accuracy, given that periocular biometrics uses only approximately 10% of the complete face data (illustrated in Section 4.1). Figure 1 illustrates the working model of a biometric system that employs the region around the eye (periocular region) as a trait for recognition. Face is one of the primitive means of human recognition.
Periocular (peripheral area of ocular) region refers to the immediate vicinity of the eye, including the eyebrow and lower eye fold, as depicted in Figure 2. Face recognition has been the main focus of biometric researchers owing to its ease of unconstrained acquisition and its uniqueness. The face has been shown to have approximately 18 feature points [1], which can together form a unique template for authentication. The major challenges faced by researchers in face detection arise from changes of the human face with age, expression, and so forth. With the advent of low-cost hardware able to fuse multiple biometrics in real time, emphasis shifted to extracting a subset of the face that can partially resolve the aforementioned issues, listed in Table 1. Hence investigation of the ear, lip, and periocular region has started gaining priority. Furthermore, capturing an eye or face image automatically acquires the periocular image. This gives the flexibility of recognizing an individual using the periocular data along with the iris data without extra storage or acquisition cost. Moreover, periocular features can be used when an iris image does not contain subtle details, which mostly occurs due to poor image quality. Periocular biometrics also comes into play as a candidate for fusion with the face image for better recognition accuracy. This paper attempts to fit an optimal boundary to the periocular region that is sufficient and necessary for recognition. Unlike for other biometric traits, edge information is not the required criterion for exactly localizing the periocular region. Rather, the periocular region can be bounded where the periphery of the eye contains no further information. Researchers have considered a static rectangular boundary around the eye to recognize humans and termed the localized rectangle the periocular region.
However, this approach is naive, as the same static boundary does not work for every face image (e.g., when face images are captured at different distances from the camera, or when the face or camera is tilted during acquisition). So there is a need to derive a dynamic boundary to describe the periocular region. While deciding the periocular boundary, the objective of achieving the highest recognition accuracy also needs to be maintained. The paper specifies a few metrics through which the periocular region can be optimally localized in a scale- and rotation-invariant manner.
The rest of the paper is organized as follows. Section 2 describes the landmark works on recognition and classification through the periocular region. Section 3 analyzes the need for optimizing the periocular region considered for recognition. In Section 4, four methods of template optimization are described, and Section 5 records the experimental results obtained to establish the proposed methods. Finally, Section 6 concludes by describing the periocular template that is optimal for human recognition and marks its importance for recognition from a large database.

Literature Review
Researchers have investigated localizing the iris from high-quality constrained eye images captured under NIR illumination. For visible spectrum images, researchers have been motivated to take into account not only the iris but also its peripheral regions during recognition. The task of recognition is more challenging than classification and hence draws more attention. The feature extraction techniques most commonly used in the context of periocular recognition are Scale Invariant Feature Transform and Local Binary Pattern. Tables 3 and 4 outline the methods used and the performance obtained for periocular classification and recognition in visual spectrum images, respectively. However, the portion of the eye to which these techniques are applied is not computationally justified in the literature. An arbitrary rectangular portion centered on the eye has been taken into account without questioning the following.
(a) Will the accuracy obtained from this arbitrary boundary increase if a larger region is considered?
(b) How much of the considered periocular region is actually contributing to recognition?
(c) Is there any portion within this arbitrarily considered periocular region that can be removed while still achieving comparable accuracy?
The derivation of optimal dynamic periocular region gives a simultaneous solution to the aforementioned questions.

Why Optimal Template for Periocular Region Is Required
Unlike other biometric traits, the periocular region has no boundary defined by edge information. Hence the periocular region cannot be detected through differential change in pixel value in different directions. Rather, its boundary lies in a region that is smooth in terms of pixel intensity, that is, a region with no information. The authors of [2] have localized the periocular region statically by taking a rectangle of dimension $6R_{\text{iris}} \times 4R_{\text{iris}}$ centered on the iris, where $R_{\text{iris}}$ denotes the radius of the iris. But this localization method fails when the eye is tilted or the gaze is not frontal. Moreover, the method presumes that the location of the iris center is accurately detectable. However, the iris center cannot be detected in some eye images because of their low resolution. The objective of the paper is to attain a dynamic boundary around the eye that defines the periocular region. The region thus derived should have the following properties: (a) it should be able to recognize humans uniquely, (b) it should be achievable for low-quality VS images, (c) it should contain the main features of the eye region identifiable by a human being, and (d) no subset of the derived periocular region should be as potent for recognition as the derived region.
The optimally selected periocular template can serve as a template that holds the identity of an individual. If such a template can be generated for a whole nation, it can serve as the authorized identity (i.e., biometric passport [23]) of every citizen of the nation.

Proposed Periocular Template Selection Methods
To achieve the above stated properties, four different dynamic models are proposed through which periocular region can be segmented out. These models are based on (a) human anthropometry, (b) demand of the accuracy of biometric system, (c) human expert judgement, and (d) subdivision approach.

Through Human Anthropometry.
In a given face image, the face can be extracted by neural training of the system or by fast color-segmentation methods. The color-segmentation methods detect the skin region in the image and find the connected components in that region. Among the connected components having skin color, the system labels the largest one as the face. Algorithm 1 proposes a skin detection method based on binary component analysis. The thresholds are experimentally fitted to obtain the highest accuracy in segmenting the skin region in face images covering different skin tones. The algorithm takes an RGB face image as input. It first converts the face image to another color space and normalizes the pixel values. In the next step, the average luminance value is calculated by summing the luminance component values of all pixels and dividing by the total number of pixels in the image. A brightness-compensated image is generated depending on the value of the average luminance, as specified in the algorithm. On the brightness-compensated image, a compound condition is applied and thresholding is performed to finally obtain the skin map. Through connected component analysis of the skin map in that color space, the open eye region can be obtained as explained in Algorithm 2. The reason for segmenting the open eye region is to obtain the nonskin region within the detected face, which can be labeled as the eye, and thus to obtain the approximate location of the eye center.
Once the eye region is detected, the iris center can be obtained using conventional pupil detection and the integrodifferential approach for finding the iris boundary, and a static boundary can be fitted. As described earlier, the authors of [2] bounded the periocular region with a $6R_{\text{iris}} \times 4R_{\text{iris}}$ rectangle centered on the iris center, where $R_{\text{iris}}$ is the iris radius. But no justification is given in that paper for the empirically chosen height and width of this periocular boundary. This process of finding the periocular boundary requires prior knowledge of the coordinates of the iris center and the radius of the iris.
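As a point of reference for the dynamic methods that follow, the static boundary of [2] can be sketched in a few lines. This is a minimal illustration only: the function name `static_periocular_box` is hypothetical, and it assumes that the factor of six iris radii is the horizontal extent and the factor of four is the vertical extent, which the text does not state explicitly.

```python
def static_periocular_box(center_x, center_y, iris_radius):
    """Return (left, top, right, bottom) of a 6R x 4R box centred on the iris.

    Assumption: 6R is the width and 4R is the height of the rectangle.
    """
    half_width = 3.0 * iris_radius   # half of 6R (assumed horizontal)
    half_height = 2.0 * iris_radius  # half of 4R (assumed vertical)
    return (center_x - half_width, center_y - half_height,
            center_x + half_width, center_y + half_height)
```

The sketch makes the prerequisite visible: both the iris center and the iris radius must already be known, which is the limitation the anthropometric and sclera-based approaches try to avoid.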
Anthropometric analysis [24] of the human face and eye region gives the ratio of eye to iris and the ratio of the width of the face to that of the eye. A typical block diagram in Figure 6 depicts the ratios of different parts of the human face with respect to the height or width of the face. From the analysis, it is found that $\text{width}_{\text{periocular}} = \text{width}_{\text{eyebrow}} = 0.67 \times \frac{\text{height}_{\text{face}}}{2}$, where $d(\text{eyebrow}, \text{eyecenter})$ denotes the distance between the center of the eyebrow and the eye center. This information can be used to decide the boundary of the periocular region. In (1), the width and height of the eye are expressed as functions of the height and width of the human face. Hence, to gauge the width and height of the periocular template boundary, there is no need to know the iris radius. However, knowledge of the coordinates of the iris center is necessary. From this information, a bounding box can be fitted that encloses all visible portions of the periocular region, for example, eyebrow, eyelashes, tear duct, eye fold, eye corner, and so forth. This approach is crude and depends on human supervision or intelligent detection of these nodal points in the human eye.
Further, from (2), it can be observed that knowing either the height or the width of the periocular region is sufficient to derive the other, provided that the aspect ratio of the face is known. This aspect of periocular localization is used in Section 4.2. Equation (3) uses an elliptical model of the face to find the ratio of the area of the periocular region to the area of a human face. It justifies using an optimally selected periocular template for human recognition rather than a full face recognition system. This method achieves periocular localization without knowledge of the iris radius. Hence it is suitable for localizing the periocular region in unconstrained images where the iris radius is not detectable by machines because of low quality, partial closure of the eye, or the luminance of the visible spectrum eye image.
However, to make the system work in a more unconstrained environment, the periocular boundary can be obtained through sclera detection, for scenarios in which the iris cannot be properly located owing to unconstrained acquisition of the eye or in which the captured image is a low-quality color face image taken from a distance.

Detection of Sclera Region and Noise Removal
(1) The input RGB iris image is converted to a grayscale image $im_{gray}$.
(2) The input RGB iris image is converted to the HSI color model, where the $S$ (saturation) component of each pixel can be determined by $S = 1 - \frac{3}{R + G + B}\min(R, G, B)$, where $R$, $G$, and $B$ denote the red, green, and blue color components of the pixel. Let the image formed from the $S$ component of each pixel be $im_s$.
(3) If $S < \tau$, where $\tau$ is a predefined threshold, the pixel is marked as sclera region; otherwise it is marked as nonsclera region. The authors of [25] have experimented with $\tau = 0.21$ to get a binary map of the sclera region through binarization of $im_s$ as follows: $im_{bin} = im_s < \tau$. Only a noisy binary map of the sclera can be found through this process, in which white pixels denote the noisy sclera region and black pixels denote the nonsclera region.
(5) V is formed as follows:
(6) All binary connected components present in V are removed except the largest and second largest components.
(7) If the size of the second largest connected component is less than 25% of that of the largest one, the largest component is interpreted as the single sclera detected and the second largest connected component is removed. Otherwise both components are retained as the binary map of the sclera.
After processing these above specified steps, the binary image would only contain one or two components describing the sclera region, after removing noises.
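The thresholding and component-filtering steps above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the saturation formula is the standard HSI one, 4-connectivity is assumed for the connected components, the intermediate steps whose formulas are not reproduced in the text are skipped, and the function names are hypothetical. The threshold 0.21 and the 25% rule are taken from the text.

```python
from collections import deque


def saturation(r, g, b):
    """Standard HSI saturation of one RGB pixel (defined as 0 for black)."""
    total = r + g + b
    if total == 0:
        return 0.0
    return 1.0 - 3.0 * min(r, g, b) / total


def connected_components(mask):
    """Return the 4-connected components of a boolean grid as pixel lists."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, comp = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                comp.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            components.append(comp)
    return components


def sclera_components(rgb_image, threshold=0.21, min_ratio=0.25):
    """Keep the one or two low-saturation components taken to be sclera."""
    # Low saturation marks a candidate sclera pixel.
    mask = [[saturation(*px) < threshold for px in row] for row in rgb_image]
    # Retain only the largest and second largest components.
    comps = sorted(connected_components(mask), key=len, reverse=True)[:2]
    # Drop the second component when it is under 25% of the largest.
    if len(comps) == 2 and len(comps[1]) < min_ratio * len(comps[0]):
        comps = comps[:1]
    return comps
```

Here `rgb_image` is a list of rows of `(R, G, B)` tuples. On real images an array library would be far faster; the nested lists are used only to keep the sketch dependency-free.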

Content Retrieval of Sclera Region. After a denoised binary map of the sclera region within an eye image is obtained, it is necessary to retrieve information about the sclera: whether the two parts of the sclera on the two sides of the iris are separately visible, only one of them is detected, or both parts are detected as a single component.
There are three exhaustive cases for the binary image found as sclera: (a) the two sides of the sclera are connected and found as a single connected component, (b) the two sclera regions are found as two different connected components, and (c) only one side of the sclera is detected owing to the pose of the eye in the image. If the number of connected components is two, the image is classified as Case b (as shown in Figures 3(a), 3(b), and 3(c)) and the two components are treated as two portions of the sclera. Otherwise, if a single connected component is obtained, the ratio of length to breadth of its best-fitted oriented bounding rectangle is checked. If the ratio is greater than 1.25, the image belongs to Case a; otherwise it belongs to Case c (shown in Figure 3(e)). For Case a, the region is subdivided into two components (by detecting the minimal cut that divides the joined sclera into two parts), as shown in Figure 3(d), and further processing is performed.
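The decision rule for the three cases can be written compactly. The sketch below is illustrative only: the function name `classify_sclera` is hypothetical, the dimensions of the oriented bounding rectangle are assumed to be computed elsewhere, and the ratio is assumed to be the longer side divided by the shorter side.

```python
def classify_sclera(num_components, length=None, breadth=None,
                    ratio_threshold=1.25):
    """Return 'a', 'b', or 'c' for the three sclera cases.

    length and breadth are the sides of the best-fitted oriented bounding
    rectangle of a single component; they are ignored for two components.
    """
    if num_components == 2:
        return "b"  # two separate sclera portions
    if num_components != 1:
        raise ValueError("expected one or two sclera components")
    # Assumption: the ratio is longer side over shorter side.
    ratio = max(length, breadth) / min(length, breadth)
    return "a" if ratio > ratio_threshold else "c"
```

An elongated single component is read as two joined sclera portions (Case a), while a compact one is read as a single visible side (Case c).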

Nodal Points Extraction from Sclera Region.
Each sclera is subjected to following processing through which three nodal points are detected from each sclera region, namely (a) center of sclera, (b) center of concave region of sclera, and (c) eye corner. So in general cases where two parts of the sclera are detected, six nodal points will be detected. The method of nodal point extraction is illustrated below.
(1) Finding Center of Sclera. The sclera component is subjected to a distance transform in which the value of each white pixel (indicating a pixel belonging to the sclera) is replaced by its minimum distance from any black pixel. The pixel that is farthest from all black pixels has the highest value after this transformation. That pixel is labeled as the center of the sclera.
(2) Finding Center of Concave Region of Sclera. The midpoints of all straight lines joining any two border pixels of the detected sclera component are found, as shown in Figure 5. The midpoints lying on the component itself (shown by the red point between 1 and 2 in Figure 5) are neglected. The midpoints lying outside the component (shown by the yellow point between 3 and 4 in Figure 5) are taken into account. Owing to the discrete computation of straight lines, the midpoints of many such lines fall on the same pixel. A separate matrix of the same size as the sclera image is introduced, with every entry initially zero. For every valid midpoint, the value of the corresponding entry in this matrix is incremented. Once this process is over, more than one connected component of nonzero values may be obtained in the matrix, signifying concave regions. The largest connected component is retained while the others are removed. The pixel having the maximum value in the largest component is labeled as the center of the concave region.
(3) Finding the Eye Corner. The distances of all pixels lying on the boundary of the sclera region from the sclera center, found as described above, are calculated. The boundary pixel that is farthest from the center of the sclera is labeled as the eye corner.
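The three nodal points can be sketched for a single sclera component given as a list of `(row, col)` pixels. This is a brute-force illustration intended for small components, not the authors' code. Its assumptions are: Euclidean distance, boundary pixels defined through 4-neighbours, integer (floored) midpoints, 8-connectivity when grouping the voted midpoints, and everything outside the component (including outside the image) treated as background. All function names are hypothetical.

```python
import math
from itertools import combinations


def boundary_pixels(component):
    """Pixels of the component with at least one 4-neighbour outside it."""
    pixels = set(component)
    return [(r, c) for r, c in component
            if any(n not in pixels
                   for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)))]


def sclera_center(component):
    """Pixel with the largest distance to its nearest background pixel."""
    pixels = set(component)
    rows = [r for r, _ in component]
    cols = [c for _, c in component]
    # The bounding box plus a one-pixel frame contains every nearest
    # background pixel, so the search can be limited to it.
    background = [(r, c)
                  for r in range(min(rows) - 1, max(rows) + 2)
                  for c in range(min(cols) - 1, max(cols) + 2)
                  if (r, c) not in pixels]
    return max(component,
               key=lambda p: min(math.dist(p, q) for q in background))


def eye_corner(component, center):
    """Boundary pixel farthest from the sclera center."""
    return max(boundary_pixels(component), key=lambda p: math.dist(p, center))


def concave_center(component):
    """Most-voted outside midpoint within the largest voted region."""
    pixels = set(component)
    votes = {}
    for (r1, c1), (r2, c2) in combinations(boundary_pixels(component), 2):
        mid = ((r1 + r2) // 2, (c1 + c2) // 2)  # discretised midpoint
        if mid not in pixels:                   # keep outside midpoints only
            votes[mid] = votes.get(mid, 0) + 1
    if not votes:
        return None  # convex component: no concave region found
    remaining, largest = set(votes), []
    while remaining:  # group voted pixels into 8-connected regions
        stack, region = [remaining.pop()], []
        while stack:
            r, c = stack.pop()
            region.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    n = (r + dr, c + dc)
                    if n in remaining:
                        remaining.remove(n)
                        stack.append(n)
        if len(region) > len(largest):
            largest = region
    return max(largest, key=votes.get)
```

The pairwise midpoint loop is quadratic in the number of boundary pixels, which is acceptable for a sketch but would likely need vectorising or subsampling for full-resolution sclera masks.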
The result of extracting these nodal points from eye image helps in finding the tilt of eye along with the position of iris in eye. Figure 3 depicts five sample images from UBIRISv2 dataset and the outputs obtained from their processing through the aforementioned nodal point extraction technique. This information can be useful in localization of periocular region.

Through Demand of Accuracy of Biometric System.
Beginning with the center of the eye (pupil center), a rectangular bounding box is taken that encloses only the iris. Figure 4 shows how the eye image changes when it is cropped around the pupil center and the size of the bounding box is gradually increased. The recognition accuracy for every cropped image is tested. In subsequent steps, the width of this bounding box is increased by 3% of the diameter of the iris and the change in accuracy is observed. After a certain number of iterations, the bounding box reaches a portion of the periocular region where there is no more change in intensity; hence the region has low entropy. No more local features can be extracted from this region even if the bounding box is enlarged. In such a scenario the saturation accuracy is achieved, and the minimum bounding box that attains the saturation accuracy is considered the desired periocular region. As the demands of different biometric systems may vary, the bounding box corresponding to a certain predefined accuracy can also be segmented as the periocular region. Similar results have also been observed for the FERET database. The exact method of obtaining the dynamic boundary is as follows.
(2) For each image in database, find approximate iris location in eye image.
(3) For each image in the database, centering at the iris center, crop a bounding box whose width $w = (100 + 3i)\%$ of the diameter of the iris, where $i$ is the iteration index, and whose height $h = 73\%$ of $w$.
(4) Find the accuracy of the system with this image size.
(5) Observe the change in accuracy with $w$. Figure 7 illustrates a plot of accuracy against $w$, which shows that the accuracy of the biometric system saturates beyond a particular size of the bounding box. Increasing the box further does not increase the accuracy. To carry out this experiment, Local Binary Pattern (LBP) [26] and Scale Invariant Feature Transform (SIFT) [27] are employed together as the feature extractor for the eye images. First, LBP is applied, and the resulting image is then subjected to local feature extraction through SIFT. In the process, a maximum accuracy of 85.64% is achieved when testing with 50 randomly chosen eye images of 12 subjects from the UBIRISv2 dataset [28]. When the same experiment is executed for 50 randomly chosen eye images of 12 subjects from the FERET dataset [29], a maximum accuracy of 78.29% is achieved. These saturation accuracy values are obtained when a rectangular boundary whose width is 300% of the diameter of the iris, or a wider rectangular eye area, is considered. To validate the result obtained on the sample, the same experiment was conducted on the complete UBIRISv2 and FERET datasets, which yielded 85.43% and 78.01% accuracy, respectively. This indicates that a subset of a large database can be employed to find the optimal template size and that the result can be used on the whole dataset for cropping of images. So, to minimize the template size without compromising accuracy, the narrowest rectangle that attains the saturation accuracy can be used as the localization boundary of the periocular region. It is also observed that the region beyond 300% of the diameter of the iris, though it does not contribute to recognition, increases the matching time, as shown in Figure 11. This is another reason for removing the redundant eye region to make the recognition process fast.
To validate this experiment, the same experiment has been carried out once again on the full UBIRISv2 and FERET databases. The accuracy values obtained, as depicted in Figure 8, confirm that there is no significant feature in the periocular region beyond 300% of the diameter of the iris that can contribute to recognition. The distributions of imposter and genuine scores are shown in Figures 9 and 10.
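The growing-box procedure of this subsection can be sketched as two small helpers. This is an illustration under assumptions rather than the procedure actually run in the experiments: the box width is taken as 100 plus three times the iteration index, in percent of the iris diameter, with the height fixed at 73% of the width, and "saturation" is interpreted as the smallest width whose accuracy lies within a tolerance of the best accuracy observed. The function names and the `tolerance` parameter are hypothetical.

```python
def box_size(iteration, iris_diameter):
    """Width and height (in pixels) of the crop at a given iteration."""
    width_percent = 100 + 3 * iteration          # percent of iris diameter
    width = width_percent / 100.0 * iris_diameter
    height = 0.73 * width                        # height is 73% of the width
    return width, height


def saturation_width(accuracy_by_width, tolerance=0.5):
    """Smallest width (percent of iris diameter) at saturation accuracy.

    accuracy_by_width maps a width in percent to the measured accuracy.
    tolerance is an assumed slack, in accuracy points, below the best value.
    """
    best = max(accuracy_by_width.values())
    return min(w for w, acc in accuracy_by_width.items()
               if best - acc <= tolerance)
```

In use, the accuracy measured for each cropped size would be collected into the mapping passed to `saturation_width`, and the returned width would then be applied when cropping every image in the database.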

Human Expert Judgement on Importance of Portions of Eye.
Human expertise has been utilized to decide a sorted order of importance of different sections of the periocular region for recognition [17]. This information can be used to detect only the section of the human eye that is most important for recognition. If that section is not found in the eye region, the captured image is marked as Failure to Acquire (FTA) and is not used for recognition. Hence a prior decision on the quality of the live query template can increase the accuracy of the system by reducing false rejections. However, this technique requires human supervision both while enrolling an image in the database and when a live query arrives. The human expert has to verify whether the most important portion of the eye is visible in the image and has to guide the biometric system accordingly.

Through Subdivision Approach and Automation of Human Expertise.
During the enrolment phase of a biometric system, a human expert needs to verify manually whether the captured image includes the expected region of interest. By automatically labeling the different sections of an eye, it can be stated which portion of the eye is necessary for identification (from the human expert knowledge already discussed), and an automated FTA detection system can be built. Hence there is no need for a human expert to verify the existence of the important portions of the human eye in an acquired eye image. The challenge in incorporating this strategy into localization of the periocular region is the automatic detection of portions of the human eye such as the eyelid, eye corner, tear duct, lower eye fold, and so forth. Subdivision detection in the eye region can be attempted through color detection and analysis and by applying different transformations.

Experimental Results
Four methods have been explained through which an optimal periocular template can be selected for biometric recognition. The first two methods, explained in Sections 4.1 and 4.2, are experimentally evaluated using the publicly available FERET and UBIRISv2 databases. A brief description of the two databases used for evaluation is given in Table 5. A total of $\binom{11102}{2} = 61621651$ genuine and imposter matchings among images from UBIRISv2 and $\binom{14126}{2} = 99764875$ genuine and imposter matchings among images from the FERET database are performed to support the claim of optimality.
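The matching counts quoted above are the numbers of unordered image pairs in each database, which can be checked directly. The variable names below are illustrative.

```python
import math

# Number of unordered pairs among the images of each database.
ubiris_pairs = math.comb(11102, 2)  # equals 11102 * 11101 / 2
feret_pairs = math.comb(14126, 2)   # equals 14126 * 14125 / 2

print(ubiris_pairs, feret_pairs)
```

Each pair is one genuine or imposter comparison, so the count grows quadratically with the number of images, which is why the per-comparison matching time matters.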
The anthropometry-based approach performs accurately when combined with proper skin detection and sclera detection in the eye region. Sample outputs are shown in Figure 3 and are found to be correct when evaluated against ground truth.
The saturation-accuracy-based approach achieves an accuracy of more than 80% on the noisy and low-resolution images of UBIRISv2 and FERET, which marks the efficiency of the proposed approach. To analyse the performance more deeply, Receiver Operating Characteristic (ROC) curves are obtained when the width $w$ of the periocular region is 200%, 250%, and 300% of the diameter of the iris, respectively. The ROC curve depicts the dependence of the false rejection rate (FRR) on the false acceptance rate (FAR) as the value of the threshold changes. The curve is plotted using linear, logarithmic, or semilogarithmic scales. As plotted in Figures 12 and 13, the system performs better at low FAR when $w = 300$ than when $w = 200$ or $w = 250$. Hence the ROC curves reveal that the portions of the eye lying between 200% and 300% of the diameter of the iris contribute strongly to recognition and form a feature-dense part of a periocular image. Furthermore, for a 1 : N matching analysis, Cumulative Match Characteristic (CMC) curves representing the probability of identification at various ranks are also obtained when the width of the periocular region is 200%, 250%, and 300% of the diameter of the iris, respectively (shown in Figures 14 and 15). The $d'$ index [31], which measures the separation between the arithmetic means of the genuine and imposter probability distributions in standard deviation units, is defined as $d' = \frac{|\mu_G - \mu_I|}{\sqrt{(\sigma_G^2 + \sigma_I^2)/2}}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the genuine ($G$) and imposter ($I$) scores. Table 6 shows the change in the $d'$ index of recognition when the width of the periocular region is varied. The value of $d'$ increases monotonically from 1.23 to 2.85 for the UBIRISv2 dataset and from 1.19 to 2.69 for the FERET dataset with incremental change in $w$. The increasing values of $d'$ for $w = 100$ to 300 and the insignificant change in $d'$ for $w = 300$ to 400 also establish the existence of a boundary between regions that contribute to recognition and regions that do not.
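The evaluation measures in this paragraph can be sketched with the standard library. This is a minimal illustration under assumptions: the decidability index is computed in its common form (absolute difference of the means divided by the root of the averaged variances), population variances are used, and the scores are assumed to be distances, so that a comparison is accepted when its score does not exceed the threshold. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def decidability(genuine, imposter):
    """Separation of genuine and imposter score means in std-dev units."""
    mu_g, mu_i = fmean(genuine), fmean(imposter)
    var_g, var_i = pvariance(genuine), pvariance(imposter)  # population form
    return abs(mu_g - mu_i) / math.sqrt((var_g + var_i) / 2.0)


def far_frr(genuine, imposter, threshold):
    """One ROC point: (FAR, FRR) at a given decision threshold.

    Assumption: scores are distances, accepted when score <= threshold.
    """
    far = sum(s <= threshold for s in imposter) / len(imposter)
    frr = sum(s > threshold for s in genuine) / len(genuine)
    return far, frr
```

Sweeping `threshold` over the range of observed scores and collecting the resulting pairs gives the points of an ROC curve, while a larger value from `decidability` indicates better-separated genuine and imposter distributions.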
Human expert judgement was studied experimentally by Hollingsworth et al. [17], and the results are used here for optimal periocular localization. Human subjects were asked which part of the eye they felt to be the most important for recognition. Most of the subjects voted that blood vessels are the most important feature for recognizing an individual from a VS eye image. This information is used to infer which subportions of the eye must belong to the optimal periocular region for it to be a candidate for recognition. Removal of those important regions will lead to rejection of the template.
Subdivision approach needs manual supervision in the process of proper labeling of the different portions of human eye. Once the enrolled templates are labeled by the expert, an optimal part of the template can be selected for recognition. The method is tested on FERET database and yielded proper localization of periocular region.

Conclusions
Recent research shows why recognition through visual spectrum periocular images has gained so much importance and how the present approaches work. When developing a recognition system for a large database, optimizing the template size is a crucial factor. Any redundant region in the template will increase the matching time but will not contribute to the accuracy of matching. Hence removal of the redundant region of the template should be accomplished before the matching procedure. As the recognition time for identification depends on the database size $n$, a decrease of $t$ in the 1 : 1 matching time decreases the total matching time for identification by $nt$. As $n$ is large (in the range of $10^9$ in practical cases), $nt$ is a significant amount of time, especially when concurrent matching is implemented in distributed biometric systems. The paper prescribes four metrics for the optimization of visual spectrum periocular images and experimentally establishes their relevance in terms of satisfying the expected recognition accuracy. These methods can be used to localize the periocular region dynamically so that an optimized region can be selected that is best suited for recognition in terms of two contradictory objectives: (a) minimal template size and (b) maximal recognition accuracy.