An Example-Based Super-Resolution Algorithm for Selfie Images

A selfie is typically a self-portrait captured using the front camera of a smartphone. Most state-of-the-art smartphones are equipped with a high-resolution (HR) rear camera and a low-resolution (LR) front camera. As selfies are captured by the front camera with limited pixel resolution, their fine details are lost. This paper aims to improve the resolution of selfies by exploiting the fine details in HR images captured by the rear camera using an example-based super-resolution (SR) algorithm. HR images captured by the rear camera carry significant fine details and are used as exemplars to train an optimal matrix-value regression (MVR) operator. The MVR operator serves as an image-pair priori that learns the correspondence between the LR-HR patch-pairs and is used to super-resolve LR selfie images. The proposed MVR algorithm avoids vectorization of image patch-pairs and preserves image-level information during both the learning and recovery processes. The proposed algorithm is evaluated for its efficiency and effectiveness, both qualitatively and quantitatively, against other state-of-the-art SR algorithms. The results validate that the proposed algorithm is efficient, as it requires less than 3 seconds to super-resolve an LR selfie, and effective, as it preserves sharp details without introducing any counterfeit fine details.


Introduction
With the advent of smartphones having sophisticated camera technologies and integrated online social networking services, selfies have gained popularity among social media users. A selfie is typically a photograph that one has taken of oneself using the front camera of a smartphone. Most conventional smartphones have two cameras, a primary rear camera and a secondary front camera. As the front camera is mainly intended for video conferencing, it has limited pixel resolution compared with the rear camera. For instance, Apple's iPhone 6 has a 1.2-megapixel (MP) front camera, far below the pixel resolution of its primary 8 MP rear camera. Though the front camera is designed for video conferencing, it is often used to capture selfies. Selfies are low-resolution (LR) images, as their fine details are lost due to the hardware limitations of the front camera. Although selfies are self-portraits that essentially comprise facial information of the user, the background information in them is equally important. This background can be an interesting scene, an astounding location, or a group of friends. Selfies are widely shared via social media; hence the volume of such images is burgeoning and there is a need to improve their quality.
Super-resolution (SR) algorithms [1] aim to generate a high-resolution (HR) image from a single LR image or an ensemble of LR images. Example-based SR algorithms [2][3][4] enhance the resolution of an LR image by learning the high frequency (HF) details from LR-HR training examples. The priori which defines the relation between the LR and HR images can be learned from the training image-pairs. The learned image-pair priori [5] can be used to generate an HR image from the observed LR image. Conventional example-based SR algorithms fall into two categories with respect to the way the image-pair priori is learned from the training set, namely, implicit- and explicit-priori based methods. The implicit-priori based algorithms [6][7][8] represent the priori directly from the training image-pairs. Most of the traditional $k$-nearest neighbor algorithms [6,9] are implicit, and searching for the $k$ nearest neighbors to estimate the HR image is computationally expensive. The explicit-priori based algorithms use either a dictionary [10][11][12] or a regression function [13,14] to map the correspondence between the LR-HR image-pairs. Dictionary based algorithms [15,16] represent the priori between LR and HR image-pairs by an LR-HR dictionary pair. In regression based approaches, the regression function which maps the LR and HR image-pairs can be learned by either a supervised [13] or semisupervised [17] learning process. The time required to train an explicit image-pair priori is generally high. Therefore, conventional example-based SR algorithms are not suitable for super-resolving selfies.
The main challenge in super-resolving an LR selfie is to learn the image-pair priori which maps the LR to HR image-level correspondence with minimum computational complexity. As HR images captured by the rear camera preserve fine details, they can be used to learn a priori to super-resolve selfies. Most of the conventional example-based SR algorithms are implemented by vectorizing the training image-pairs [9,15]. By vectorizing, the image-level information between image-pairs is lost due to structural disparity. Hence the vector-based priori which relates the LR-HR image-pairs is not effective [18]. To overcome this difficulty, a novel matrix-based priori is proposed by Tang and Yuan [18] to model the image-pair priori. However, the matrix-based priori is derived based on the assumption that most of the image patches extracted from natural training images are full rank [18]. Though this assumption is valid for natural images, patches extracted from real-life images with facial information and smooth textures are intuitively rank deficient.
This paper endeavors to improve the spatial resolution of selfies by efficiently learning an optimal matrix-value regression (MVR) operator from LR-HR image patch-pairs extracted from training samples captured by the rear camera of the smartphone. The training image patch-pairs are factorized by singular value decomposition (SVD) to accommodate rank deficient patch-pairs in the learning process. The MVR operator explicitly models the correspondence between the LR and HR training image patches to super-resolve the LR selfies. As the proposed MVR algorithm avoids vectorization, it preserves the structural similarity of training image patches and retains the image-level information within them. The computational cost of the proposed algorithm is greatly reduced by optimally selecting a larger patch-size in both the training and recovery phases, as a larger patch carries significant image-level information. The main contributions of this paper are as follows:
(i) A fast selfie SR algorithm: LR selfies are super-resolved by a fast example-based algorithm using an optimal MVR operator learned from HR training images captured by the rear camera of the smartphone.
(ii) Effective and efficient MVR operator: The computational cost to learn the MVR operator is minimal. Also, it faithfully preserves the structural similarity between training image patch-pairs, which makes the MVR operator effective and efficient.
The remainder of the paper is organized as follows. A brief description of image-pair analysis methods is given in Section 2. In Section 3, the proposed SR methodology for selfie images is explained in detail. In Section 4, experimental evaluations are reported to compare the performance of the proposed method, and finally Section 5 concludes the paper.

Brief Description on Image-Pair Analysis Methods
Example-based super-resolution algorithms estimate the fine details that are missing in LR images by learning the correspondence between training image-pairs. The process of example-based super-resolution is summarized in Figure 1. Effective image-pair analysis methods are required by example-based SR algorithms to learn an image-pair regression operator, which defines a relation between LR-HR image-pairs. A training image-pair typically consists of an HR image and its corresponding synthetically generated LR image. A well learned image-pair regression operator provides a precise correspondence between LR-HR patch-pairs and can be used as a global priori in many inverse image processing tasks [19,20]. Example-based SR is an ill-posed problem and requires sophisticated image-pair analysis methods [18] to learn a suitable regression operator from training examples. Image-pair analysis methods are classified as vector-based and matrix-based methods. In vector-based image-pair analysis methods [15,16], LR-HR image patch-pairs are represented as feature vectors and their correspondence is learned with an explicit vector-based regression operator. Though image patch-pairs are faithfully represented as vectors in vector-based methods, their image-level structural information is lost due to vectorization [21,22]. Therefore the problem of image-pair analysis is converted to a problem of vector-pair analysis. To avoid structural disparity and preserve image-level information within patch-pairs, a few matrix-based image-pair analysis methods have been suggested [18,23]. In these methods [18,23], a linear matrix-based regression operator is learned to map the global dependency [18] between LR-HR patch-pairs.

Matrix-Value Regression (MVR) Operator.
An image patch-pair denoted as $P = (X, Y) \in \mathbb{R}^{d \times d}$ defines a linear matrix-value regression (MVR) operator $M \in \mathbb{R}^{d \times d}$ such that

$$Y = MX. \quad (1)$$

If the image patch-pairs are assumed to be full rank matrices, then the MVR operator can be obtained as

$$M = YX^{-1}, \quad (2)$$

where $X^{-1}$ refers to the matrix inverse of $X$. The MVR operator depends strongly on the full rank condition of its constituent patch-pairs to compute the matrix inverse. For rank deficient matrices, computing the inverse is not stable. Hence, recent matrix-based image-pair analysis methods assume the image patch-pairs to be full rank matrices. However, the main difference between a selfie image and a general image lies in its information content. Typical selfie images essentially carry the facial information of the user: a foreground with vivid facial features of similar textures and a background with less complex information. General images, by contrast, can carry any natural information with more complex structures and random patterns and textures [25]. Image patches extracted from random natural images are intuitively assumed to be full rank [18] due to the complex structures in them. Though this assumption is valid (ideally producing 5% rank deficiency) for natural images, image patches extracted from selfie images are intuitively rank deficient. To validate this, an experiment was carried out with 100000 patches extracted from training images, and it is observed that approximately 50% of the patches are rank deficient, as shown in Table 1. This is attributed to the similar texture details present in the training samples. Furthermore, this percentage increases for larger patch-sizes as the patch coherence becomes higher.
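The rank-deficiency measurement described above can be illustrated with a minimal sketch. It assumes square patches held as NumPy arrays and uses the default numerical tolerance of `numpy.linalg.matrix_rank`; the function name `rank_deficient_fraction` and the toy patches are hypothetical and are not the data behind Table 1.

```python
import numpy as np

def rank_deficient_fraction(patches):
    """Fraction of square patches whose numerical rank is below their side length."""
    patches = [np.asarray(p, dtype=float) for p in patches]
    deficient = sum(int(np.linalg.matrix_rank(p) < p.shape[0]) for p in patches)
    return deficient / len(patches)

# Toy patches (illustrative only): smooth regions give low-rank matrices.
flat = np.full((3, 3), 0.5)                    # constant patch, rank 1
stripes = np.tile([0.0, 1.0, 0.0], (3, 1))     # identical rows, rank 1
textured = np.array([[1.0, 0.2, 0.0],
                     [0.1, 1.0, 0.3],
                     [0.0, 0.4, 1.0]])         # diagonally dominant, rank 3
identity = np.eye(3)                           # rank 3

fraction = rank_deficient_fraction([flat, stripes, textured, identity])
```

Smooth or repetitive regions, such as skin or a plain background, produce patches like `flat` and `stripes`, which is consistent with the higher rank-deficiency rate reported for selfie patches.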
To accommodate rank deficient patch-pairs to represent the image-pair priori, matrix inverse is computed by factorizing the patch-pairs with singular value decomposition.
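A minimal sketch of inverting a matrix through its SVD is given below. One detail is an assumption rather than something stated in the text: singular values below a relative tolerance (`rel_tol`, a hypothetical parameter) are treated as zero, which turns the result into a pseudo-inverse when the matrix is rank deficient.

```python
import numpy as np

def svd_inverse(a, rel_tol=1e-10):
    """Invert `a` through its SVD; near-zero singular values are dropped."""
    u, s, vt = np.linalg.svd(np.asarray(a, dtype=float), full_matrices=False)
    # Assumption: singular values below rel_tol * (largest value) count as zero.
    keep = s > rel_tol * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # a = U diag(s) V^T, so the (pseudo-)inverse is V diag(1/s) U^T.
    return (vt.T * s_inv) @ u.T

# A constant patch has rank 1, so an ordinary inverse does not exist.
flat_patch = np.full((3, 3), 0.5)
flat_pinv = svd_inverse(flat_patch)

# For a full rank matrix the result coincides with the ordinary inverse.
full_rank = np.array([[2.0, 0.0], [1.0, 4.0]])
full_inv = svd_inverse(full_rank)
```

The thresholding is what keeps the computation stable: without it, a singular value of zero would produce a division by zero.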

Similarity Measure via MVR Operator.
The linear MVR operator precisely models the correspondence between the image patch-pairs $\{P_i = (X_i, Y_i)\}_{i=1}^{N} \in \mathbb{R}^{d \times d}$. Therefore from (1), we get

$$Y = MX = Y_i \left( X_i^{-1} X \right). \quad (3)$$

If an LR test patch $X$ is identical to the $i$th patch $X_i$ in the training set, then $(X_i^{-1} X)$ becomes an identity matrix. The term $(X_i^{-1} X)$ can be regarded as a patch-similarity measure which defines the mutual information between $X_i$ and $X$. From (3), the HR estimate of the LR test image can be found effectively using the MVR operator.
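The similarity interpretation can be checked numerically with a small sketch. It assumes a full rank training LR patch `x_i` (made diagonally dominant here so that it is guaranteed invertible) and a paired HR patch `y_i`; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# Entries in [0, 1) plus d on the diagonal give a strictly diagonally
# dominant, hence invertible, LR training patch.
x_i = rng.uniform(0.0, 1.0, (d, d)) + d * np.eye(d)
y_i = rng.uniform(0.0, 1.0, (d, d))

def similarity(x_train, x_test):
    """Patch-similarity term: inverse of the training patch times the test patch."""
    return np.linalg.inv(x_train) @ x_test

def estimate_hr(y_train, x_train, x_test):
    """HR estimate: the training HR patch weighted by the similarity term."""
    return y_train @ similarity(x_train, x_test)

same = similarity(x_i, x_i)          # identity when test and training patch match
y_same = estimate_hr(y_i, x_i, x_i)  # reproduces the training HR patch

x_other = x_i + 0.1                  # a different test patch
other = similarity(x_i, x_other)     # no longer the identity
```

When the test patch equals the training patch, the estimate reproduces the paired HR patch; any departure from the identity in the similarity term reflects how the test patch differs.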

Computational Efficiency via MVR Operator.
The MVR operator significantly reduces the computational complexity by reducing the number of variables required to represent the operator. As the image patch-pairs are matrices of size $d \times d$, the image-pair regression operator is a matrix of size $d \times d$. Therefore $d^2$ variables are required to represent the matrix-based regression operator. In vector-based approaches, by contrast, image patches are column vectors of size $d^2 \times 1$, so the regression operator that maps the two vectors is a matrix of size $d^2 \times d^2$ and hence requires $d^4$ variables.
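To make the gap concrete, the short sketch below counts the entries of both operators for a patch of side `d`. The helper name `operator_sizes` is hypothetical; 27 is the patch side used later in the experiments.

```python
def operator_sizes(d):
    """Return (matrix-based, vector-based) operator entry counts for a d x d patch."""
    matrix_params = d * d           # a d x d operator acting on the patch matrix
    vector_params = (d * d) ** 2    # a d^2 x d^2 operator acting on the vectorized patch
    return matrix_params, vector_params

for side in (3, 11, 27):
    matrix_params, vector_params = operator_sizes(side)
    print(side, matrix_params, vector_params, vector_params // matrix_params)
```

For a 27 × 27 patch this gives 729 entries for the matrix-based operator against 531441 for the vector-based one, a ratio of $d^2 = 729$.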

The Proposed Selfie Super-Resolution Methodology
The overview of the proposed selfie SR methodology is illustrated in Figure 2. The example-based selfie SR algorithm consists of a training phase (performed offline), where an optimal MVR operator is learned from a set of image patch-pairs extracted from the training image set, and a reconstruction phase, which performs super-resolution on the test selfie image using the MVR operator learned in the training phase. The correspondence between the LR patches $X$ and the HR patches $Y$ is learned by an optimal MVR operator.

Algorithm to Learn Optimal MVR Operator.
Let the training patch-pairs be denoted as $\{P_i = (X_i, Y_i)\}_{i=1}^{N} \in \mathbb{R}^{d \times d}$, where $(X_i, Y_i)$ are low- and high-resolution patch-pairs of size $d \times d$ and $N$ is the number of training patch-pairs. Let $M : \mathcal{X} \rightarrow \mathcal{Y}$ be an MVR operator mapping the low-resolution image space to the high-resolution image space.
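The optimal MVR operator $M^*$ is subsequently learned from the training set using the least square regression model given by

$$M^* = \arg\min_{M} \sum_{i=1}^{N} \left\| Y_i - M X_i \right\|^2, \quad (4)$$

where $\| \cdot \|$ is the Frobenius norm. Let $J(M)$ be the cost function such that (4) becomes

$$M^* = \arg\min_{M} J(M), \quad (5)$$

where

$$J(M) = \sum_{i=1}^{N} \left\| Y_i - M X_i \right\|^2. \quad (6)$$

To obtain the optimal MVR operator, the target function is given by

$$J(M) = \operatorname{tr}\left( M A_2 M^{T} \right) - 2 \operatorname{tr}\left( M A_1^{T} \right) + A_0, \quad (7)$$

where $A_0 = \sum_{i=1}^{N} \| Y_i \|^2$, $A_1 = \sum_{i=1}^{N} Y_i X_i^{T}$, and $A_2 = \sum_{i=1}^{N} X_i X_i^{T}$ are the auxiliary matrices.
The optimal MVR operator $M^*$ can be deduced by imposing the condition for minimization on (7); hence

$$\frac{\partial J(M)}{\partial M} = 2 M A_2 - 2 A_1 = 0. \quad (8)$$

Therefore, the optimal MVR operator is given by

$$M^* = A_1 A_2^{-1}. \quad (9)$$

The inverse of the auxiliary matrix $A_2$ is computed by factorizing $A_2$ with SVD; thus $A_2 = U \Sigma V^{T}$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix of singular values. Thus

$$M^* = A_1 \left( V \Sigma^{-1} U^{T} \right). \quad (10)$$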
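The intermediate algebra behind this closed form can be sketched as follows. The sketch assumes the cost is the summed squared Frobenius error between each HR patch $Y_i$ and its prediction $M X_i$, with $N$ training pairs and the auxiliary quantities $A_0$, $A_1$, $A_2$ defined as the sums marked below. The expansion uses standard trace identities and is offered as a reading aid, not as a step quoted from the source.

```latex
\begin{align*}
J(M) &= \sum_{i=1}^{N} \| Y_i - M X_i \|_F^2
      = \sum_{i=1}^{N} \operatorname{tr}\!\left[ (Y_i - M X_i)(Y_i - M X_i)^{T} \right] \\
     &= \underbrace{\sum_{i=1}^{N} \operatorname{tr}\!\left( Y_i Y_i^{T} \right)}_{A_0}
        \;-\; 2 \operatorname{tr}\!\Big( M \big( \underbrace{\textstyle\sum_{i=1}^{N} Y_i X_i^{T}}_{A_1} \big)^{T} \Big)
        \;+\; \operatorname{tr}\!\Big( M \big( \underbrace{\textstyle\sum_{i=1}^{N} X_i X_i^{T}}_{A_2} \big) M^{T} \Big), \\
\frac{\partial}{\partial M} \operatorname{tr}\!\left( M A_1^{T} \right) &= A_1,
\qquad
\frac{\partial}{\partial M} \operatorname{tr}\!\left( M A_2 M^{T} \right) = M \left( A_2 + A_2^{T} \right) = 2 M A_2, \\
\frac{\partial J}{\partial M} &= 2 M A_2 - 2 A_1 = 0
\;\;\Longrightarrow\;\; M^{*} = A_1 A_2^{-1}.
\end{align*}
```

The second derivative identity relies on $A_2$ being symmetric, which holds because it is a sum of terms of the form $X_i X_i^{T}$.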
The optimal MVR operator $M^*$ shown in (10) explicitly represents the image-level correspondence between the low- and high-resolution image patch-pairs. The MVR operator resulting from the training phase is used to reconstruct the fine details from the low-resolution selfie images. The procedure to deduce the optimal MVR operator is summarized in Algorithm 1.

Algorithm 1
Input: Training image patch-pairs $\{P_i = (X_i, Y_i)\}_{i=1}^{N} \in \mathbb{R}^{d \times d}$
Output: Optimal matrix-value operator $M^*$
Steps:
(1) Calculate the auxiliary matrices $A_1$ and $A_2$
(2) Factorize the auxiliary matrix $A_2$ using SVD
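The training procedure can be sketched in a few lines of NumPy. The sketch assumes the LR and HR patches are stacked into arrays of shape `(N, d, d)`. The function name `learn_mvr_operator` and the tolerance `rel_tol` used to discard near-zero singular values are hypothetical choices for illustration, not details given in the text.

```python
import numpy as np

def learn_mvr_operator(lr_patches, hr_patches, rel_tol=1e-10):
    """Closed-form least-squares MVR operator from stacked d x d patch-pairs."""
    x = np.asarray(lr_patches, dtype=float)    # shape (N, d, d)
    y = np.asarray(hr_patches, dtype=float)    # shape (N, d, d)
    a1 = np.einsum("nij,nkj->ik", y, x)        # sum over pairs of Y_i X_i^T
    a2 = np.einsum("nij,nkj->ik", x, x)        # sum over pairs of X_i X_i^T
    u, s, vt = np.linalg.svd(a2)
    # Assumption: singular values below rel_tol * (largest value) are dropped,
    # so a rank deficient auxiliary matrix yields a pseudo-inverse.
    keep = s > rel_tol * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return a1 @ (vt.T * s_inv) @ u.T           # A_1 (V Sigma^-1 U^T)

# Synthetic check: HR patches generated by a known operator are recovered.
rng = np.random.default_rng(1)
true_m = rng.standard_normal((5, 5))
lr = rng.standard_normal((50, 5, 5))
hr = true_m @ lr                               # broadcasts over the 50 patches
learned = learn_mvr_operator(lr, hr)
```

Because the operator has a closed form, training amounts to two accumulations over the patch-pairs and a single SVD of a `d × d` matrix, whatever the number of training pairs.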

Algorithm for SR Reconstruction.
In the reconstruction phase, LR selfies captured by the front camera are super-resolved using the MVR operator learned from Algorithm 1. In addition, the MVR operator is adapted to learn from the test selfie itself by a bootstrapping approach [16]. The given test selfie is assumed to be the HR image and its scaled-down version is taken as its LR counterpart. The correspondence between the LR-HR patch-pairs extracted from the bootstrapped image-pairs is used to update the optimal MVR operator. The test selfie $I_{lr}$ is interpolated by a factor $s$ with an interpolation operator $\mathcal{B}$. Nonoverlapping image patches of size $d \times d$ are extracted from the interpolated test image. This collection of low-resolution patches is represented as $\mathcal{X}_{lr} = \{X_{lr}^{j}\}_{j=1}^{n}$. Every test LR image patch in the set $\mathcal{X}_{lr}$ is super-resolved using the optimal MVR operator, such that

$$Y_{hr}^{j} = M^{*} X_{lr}^{j}. \quad (11)$$

The super-resolved test image patches are merged to form the super-resolved high-resolution image $I_{hr}$. The steps involved in the reconstruction phase are summarized in Algorithm 2.
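The patch-wise reconstruction can be sketched as below. The sketch makes several simplifying assumptions: the input is a single luminance plane that has already been interpolated to the target size, the image is cropped to a multiple of the patch side `d`, and the bootstrapping update is omitted. The function name `super_resolve` is hypothetical.

```python
import numpy as np

def super_resolve(interp_img, m_opt, d):
    """Apply a d x d operator to each nonoverlapping d x d patch and merge the results."""
    img = np.asarray(interp_img, dtype=float)
    h, w = img.shape
    # Assumption: rows and columns that do not fill a whole patch are cropped.
    h, w = h - h % d, w - w % d
    out = np.empty((h, w), dtype=float)
    for r in range(0, h, d):
        for c in range(0, w, d):
            # Each HR patch is the operator times the corresponding LR patch.
            out[r:r + d, c:c + d] = m_opt @ img[r:r + d, c:c + d]
    return out

# With the identity operator, the merged output equals the cropped input.
image = np.arange(70, dtype=float).reshape(7, 10)
unchanged = super_resolve(image, np.eye(3), 3)
doubled = super_resolve(image, 2.0 * np.eye(3), 3)
```

Since the patches do not overlap, merging is a direct write-back and each pixel is computed exactly once.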

Results and Discussions
The proposed algorithm is evaluated for its effectiveness and efficiency by conducting both qualitative and quantitative experiments on various test images shown in

Experimental Setup.
In the experiments carried out, the test images shown in Figures 3 and 4 are used as LR images. Though the algorithm is proposed to super-resolve LR selfie images, a few standard test images (shown in Figure 3) such as Barbara, girl, and Lena are used to fairly compare the performance of the proposed algorithm with other state-of-the-art SR algorithms.
To evaluate the effectiveness of the proposed algorithm on selfies, various test selfies captured by different smartphones such as iPhone 4s, iPhone 6, and Nexus 5 with diverse specifications are collected. Figure 4 shows the selfie test images used for comparison, in which images (#1) Figure 5.
The training and testing color images are converted to the YCbCr color space and only the luminance channel is considered for super-resolution, as the human eye is more sensitive to it. The LR images are synthetically generated by downsampling the test images shown in Figures 3 and 4 using a bicubic interpolator. The downsampled LR images are resized to the size of the target HR image and are contiguously blocked into nonoverlapping patches of size 27 × 27. The LR test images are super-resolved by scale-factors of $s$ = 2, 3, and 4. LR-HR training image-pairs are generated with the same scale-factor $s$. All the experiments were carried out using Matlab R2012 on an Intel Core i5-2400 @ 2.7 GHz processor with 4 GB RAM.
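Two of the preprocessing steps, luminance extraction and blocking into nonoverlapping patches, can be sketched as follows. The luma weights are the common BT.601 values and are an assumption, since the text does not state which YCbCr variant was used. The function names are hypothetical, and the bicubic resizing step is left out.

```python
import numpy as np

# Assumed BT.601 luma weights; the source does not state which variant was used.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def luminance(rgb):
    """Return the luma plane of an H x W x 3 RGB array as floats."""
    return np.asarray(rgb, dtype=float) @ LUMA_WEIGHTS

def block_into_patches(plane, d=27):
    """Split a 2D plane into nonoverlapping d x d patches in row-major order."""
    h, w = plane.shape
    rows, cols = h // d, w // d
    cropped = plane[:rows * d, :cols * d]      # drop any incomplete border blocks
    blocks = cropped.reshape(rows, d, cols, d).swapaxes(1, 2)
    return blocks.reshape(-1, d, d)

white = np.full((4, 4, 3), 255.0)
luma = luminance(white)

grid = np.arange(36, dtype=float).reshape(6, 6)
patches = block_into_patches(grid, d=3)
```

The reshape-and-swap approach avoids an explicit double loop and returns the patches as one `(n, d, d)` array.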

Experimental Analysis
Effectiveness. Qualitative and quantitative evaluations are carried out to assess the effectiveness of the proposed algorithm. Qualitative evaluation of SR methods relies on a few attributes of the reconstructed image such as sharpness, naturalness, and granularity [28]. The sharpness of an image is assessed based on the HF details it preserves. The naturalness of an image is affected by the artifacts present in it. Various artifacts such as ghosting, ringing, jagging, and staircase artifacts generally affect the quality of an image. A visual comparison is made to assess the fidelity of the proposed algorithm qualitatively. The effectiveness of the proposed method is quantitatively evaluated based on a few objective performance metrics such as root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) index [29]. A high PSNR score indicates that the scaled-up image has little distortion and effectively reconstructs the HF details. Similarly, a high SSIM value (close to 1) implies that the scaled-up image has a very similar structure to its ground truth. For fair comparison, the standard test images shown in Figure 3 are super-resolved using the proposed method and are compared with the aforementioned algorithms. Table 2 summarizes the quantitative comparison of various SR algorithms on test images for 3x magnification. Figure 6 shows the 2x visual comparison for the standard test image Barbara. Figure 6(a) shows the ground truth and its corresponding scaled-up local image, and Figure 6(f) shows the SR image and its corresponding local image super-resolved by the proposed MVR algorithm. In Figure 6(c), the texture on the table cloth is blurred when compared with the ground truth. Though the stripes in the table cloth are sharp in Figure 6(d), the pattern differs from the ground truth, as the fine details in the table cloth are not well preserved. Dong et al.'s method reconstructs the texture as in the ground truth; however, it introduces ringing and jagging artifacts, as observed in Figure 6(e), and accordingly has a low PSNR value. As observed from Figure 6(f), the proposed algorithm preserves sharp texture details as in the ground truth and is free from artifacts.
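Two of the objective metrics named above, RMSE and PSNR, follow directly from their standard definitions and can be sketched as below. The peak value of 255 assumes 8-bit images, the function names are hypothetical, and SSIM is omitted because it needs a longer windowed computation.

```python
import numpy as np

def rmse(reference, estimate):
    """Root mean square error between two images of equal shape."""
    diff = np.asarray(reference, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` assumes an 8-bit intensity range."""
    err = rmse(reference, estimate)
    if err == 0.0:
        return float("inf")        # identical images
    return float(20.0 * np.log10(peak / err))

ground_truth = np.zeros((8, 8))
estimate = np.full((8, 8), 5.0)    # uniform error of 5 gray levels
error = rmse(ground_truth, estimate)
score = psnr(ground_truth, estimate)
```

A lower RMSE gives a higher PSNR, so the two metrics rank reconstructions identically. SSIM is reported alongside them because it responds to structural differences that a pixel-wise error can miss.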
For visual comparison on test selfies, 3x magnification on test selfie images is carried out. Figure 7 depicts the qualitative visual comparison for five test selfie images. Figure 7 [15] model, two coupled dictionaries are trained simultaneously from random raw image patches. Based on a dictionary pretrained from thousands of natural images, Yang et al.'s method seems to produce natural-looking results. Though Yang et al.'s algorithm faithfully reconstructs natural-looking images, it can be observed from Table 2 that its objective measures are not the best among the compared algorithms. This is because the fine details in the image are not well preserved, as the universal dictionary used in this method fails to represent complex structures accurately. For instance, the spectacle frame in the ROI of test image (#1) shown in Figure 7(b) looks sharp and natural but for the ROI of test image (#4) in Figure 7 is carried out on the reconstructed image. Nevertheless, for images with complicated edges, the edge priori tends to introduce ringing artifacts along the corners of edges, which reduces the PSNR and SSIM values. For example, artifacts can be seen in the fan rails of the ROI of test image (#5) in Figure 7 Gaussian process regression method. On the contrary, the proposed method preserves the sharp details and fine textures in most of the images without affecting the naturalness of the image. It is also observed that the proposed method provides more photorealistic details, as it does not introduce any counterfeit fine details. The effectiveness of the proposed algorithm is quantitatively validated by the PSNR and SSIM values in Table 2.
The proposed method achieves the best PSNR and SSIM values, which indicates that the proposed algorithm reconstructs the LR image with minimal distortion; the high SSIM value corroborates that structural similarity is preserved by the proposed matrix-based regression algorithm. The proposed method performs better than other state-of-the-art SR approaches as it avoids vectorization of image patch-pairs during the training phase of the MVR operator, which intuitively preserves structural similarity and image-level information within patch-pairs. Also, as the MVR operator is trained with HR images captured by the rear camera of the smartphone, it effectively captures the relation between LR-HR patch-pairs, thereby improving the performance of the proposed algorithm. For instance, in the highlighted ROI of test image (#1) shown in Figure 7
Efficiency. The efficiency of the proposed matrix-based SR algorithm is compared with the aforementioned algorithms on a personal computer with an Intel Core i5-2400 @ 2.7 GHz processor and 4 GB RAM. The computation time required to train and recover the images is reported in Table 3. Among the training-free algorithms (those of Dong et al. and He et al.), the average CPU time taken to recover the SR image by He et al.'s Gaussian process regression algorithm [27] is significantly high, as the source code available on the author's homepage is not optimized. The NARM based SR algorithm by Dong et al. [26] takes approximately 3 to 6 minutes to recover the HR image with a magnification factor of $s$ = 3. Table 3 shows that the training time required by training-based SR algorithms such as those of Yang et al. and Kim et al. is significantly high, as they have to extract training image patches from an extensive dataset to train a universal dictionary. Owing to the fact that image patches are represented as matrices and large patches (typically of size 27 × 27) are used in the proposed MVR algorithm, the computational time is significantly less (under a minute), thereby outperforming other state-of-the-art approaches. The experimental results presented in Table 3 reveal that the proposed MVR algorithm can be efficiently applied to super-resolve LR selfie images with minimum computational expense.

Influence of Patch-Size.
The patch-size influences the performance of the algorithm. Intuitively, selecting a larger patch-size may produce overly smooth results, whereas a small patch tends to produce undesired artifacts in smooth areas of the image. In addition, the computational cost of the algorithm is influenced by patch-size. Hence a performance evaluation based on variation in patch-size for the proposed algorithm is carried out and depicted in Figure 5. The magnified ROI highlighted in the red box is compared for visual fidelity. In addition, a quantitative analysis based on PSNR for different patch-sizes is reported in Table 4. The size of the training patch is varied from 3 × 3 to 43 × 43 with a step size of 8 pixels. For a small patch-size of 3 × 3 as in Figure 8, the freckles near the eye are relatively blurred, which is quantitatively validated in Table 4. The qualitative and quantitative performance of the proposed algorithm increases as the patch-size is increased and is maximum for a patch-size of 27 × 27, as shown in Figure 8 and Table 4, respectively. For instance, the freckles near the eyes are comparatively crisper, and hence the eyes look sharper and more natural for a patch-size of 27 × 27 as in Figure 8(e). Because the image-level information between patches is preserved by the proposed matrix-based regression algorithm, the performance of the proposed algorithm is better for larger patch-sizes. However, an overly large patch-size reduces the performance of the algorithm, as it becomes more complex to utilize the image-level information within the patches.

Influence of Scale-Factor.
The test images are magnified by a scale-factor $s$ and the resulting performance is evaluated. For visual comparison, test image (#6) is upscaled by factors of 2x, 3x, 4x, and 5x by the proposed algorithm and is depicted in Figure 9. The ROI considered for visual evaluation is the texture of the shirt. It is observed from Figure 9 that the texture details are well preserved for 2x magnification. For 3x magnification, the proposed algorithm is able to preserve the fine texture details, as the interleaved pattern in the shirt remains clearly visible. For 4x magnification, though the pattern in the ROI is visible, its fine details are lost. It is also observed that ringing artifacts along the edges affect the quality of the image. Furthermore, the texture details are lost for a magnification factor of 5x. The results are quantified by the PSNR values tabulated in Table 5.

Influence of Training Dataset.
Training images captured by the rear camera of the smartphone can serve as fine exemplars to train the MVR operator. The performance of the proposed algorithm can be influenced by the training dataset used to train the MVR operator. To validate this, a performance evaluation based on variation in dataset is carried out. It is observed that training images from the same device as the test image lead to better results than when the training and testing images are taken from different devices. For visual comparison, test selfie (#2) is super-resolved by a factor of 3x using the MVR operator trained on four different datasets and is depicted in Figure 10. The training dataset TR1 is a collection of random natural images as example images. Similarly, training datasets TR2, TR3, and TR4 are collections of example images captured by the rear camera of iPhone 4s, iPhone 6, and Nexus 5, respectively. From Figure 10(e), it is observed that the freckles beneath the eye are sharp and crisp for the image super-resolved using the MVR operator trained with TR4. This is because both the training examples in TR4 and the test selfie (#2) are captured by the same smartphone. As the rear camera of the smartphone is used by the same user, example images captured from the rear camera tend to possess similar low-level image features such as texture, granularity, and exposure as the selfie image captured by the front camera. In addition, the facial information contained in the selfies can possibly reoccur in the training set as it is captured by the same user. This self-similarity improves the interdependency between the images and results in a more robust and efficient MVR operator. The results are quantified by the PSNR values tabulated in Table 6.

Conclusion
In this paper, a fast example-based SR algorithm for super-resolving LR selfie images is presented. The proposed SR algorithm learns an optimal matrix-value regression (MVR) operator from a set of training samples captured from the rear camera of a smartphone. The relation between LR-HR training patch-pairs is established by an optimal MVR operator. It preserves structural similarity across training patch-pairs and effectively represents the image-level information of the training image patch-pairs. It is used to super-resolve clean LR selfie images captured by the front camera of the smartphone, and it is observed that the fine details in the super-resolved test selfie are preserved. Qualitative and quantitative experiments have validated the efficiency and effectiveness of the proposed algorithm over other state-of-the-art SR algorithms. In the future, the proposed algorithm will be extended to super-resolve distorted selfie images.