Image Super-Resolution Based on Sparse Representation via Direction and Edge Dictionaries

Sparse representation has recently attracted enormous interest in the field of image super-resolution. Sparsity-based methods usually train a single pair of global dictionaries. However, one pair of global dictionaries cannot best sparsely represent different kinds of image patches, as it neglects the two most important image features: edge and direction. In this paper, we propose to train two novel pairs of Direction and Edge dictionaries for super-resolution. For single-image super-resolution, the training image patches are divided into two clusters by two new templates representing direction and edge features. For each cluster, a pair of Direction and Edge dictionaries is learned. Sparse coding is combined with the Direction and Edge dictionaries to realize super-resolution. This single-image super-resolution can restore faithful high-frequency details, and the POCS method is convenient for incorporating any kind of constraints or priors. Therefore, we combine the two methods to realize multiframe super-resolution. Extensive experiments on image super-resolution are carried out to validate the generality, effectiveness, and robustness of the proposed method. Experimental results demonstrate that our method can recover better edge structure and details.


Introduction
In video surveillance, medical imaging, satellite observation, and other scenarios, limitations of the imaging equipment, hardware storage, natural environment, and other factors mean that we usually obtain low-resolution (LR) images [1]. However, high-resolution (HR) images are often needed for subsequent image processing and analysis in most practical applications. As an effective approach to this problem, the super-resolution (SR) technique fulfils the task of estimating an HR image from one or a sequence of LR images. SR technology increases high-frequency components and removes resolution degradation, blur, noise, and other undesirable effects by making full use of the existing data.
As a hot research direction in the field of image processing, the problem of SR has been studied for more than three decades, and many SR approaches have been proposed. According to the number of input LR images, SR approaches can be broadly classified into two categories: single-image SR and multiframe SR [2]. According to the processing method, there are mainly three kinds of SR approaches: interpolation-based methods [3], reconstruction-based methods [4], and learning-based methods [5]. Interpolation methods compute the value of an interpolated point from its surrounding pixels with different weights. Classical interpolation methods include nearest-neighbor, bilinear, and bicubic interpolation [6]. Although such methods have a simple principle and low algorithmic complexity, they tend to produce considerable blurring and jagged artifacts. The reconstruction-based methods [7][8][9][10] are usually used for multiframe SR. These methods usually incorporate reconstruction constraints or prior knowledge into a regularized cost function with a data-fidelity term [11]. Reconstruction-based methods can recover better edges and suppress aliasing artifacts. However, they cannot restore fine structures when the upscaling factor is large, as their performance depends heavily on the nonredundant complementary information among the input LR images. The learning-based methods have undoubtedly become a research hotspot in recent years. These methods exploit the information from training images to establish the relationship between HR and LR image patches. As this relationship reflects the inherent similarity among natural images, learning methods can restore high-frequency information effectively. Typical methods include the Example-Based method [12], the Neighbor Embedding method [13], the Sparse Coding method [14][15][16], and the Anchored Neighborhood Regression method [17]. In 2010, Yang et al. [18] proposed an image SR method via sparse representation that provides better reconstruction results. In 2012, Zeyde et al. [19] improved the efficiency of Yang's method by reducing the dimension of the training samples and using the K-SVD algorithm to train the dictionaries. In 2014, Farhadifard et al. [20] presented a single-image SR method based on sparse representation via directionally structured dictionaries. It avoids the problem, present in Yang et al. [18] and Zeyde et al. [19], that using the same dictionary for the sparse representation of all image patches cannot reflect the different structural characteristics of the patches [21]. Usually, learning-based methods need a large and representative database, leading to high computational costs in the dictionary training process.
Inspired by the work of [18, 20] and considering the importance of the learned dictionary, we present a novel Direction and Edge dictionary model for image SR. First, a pair of Direction and Edge templates is built to classify the training image patches into two clusters. Then each cluster is trained to obtain two pairs of HR and LR overcomplete Direction and Edge dictionaries. Finally, sparse coding and the Direction and Edge dictionaries are combined to realize single-image SR. The performance of reconstruction-based methods degrades rapidly when the upscaling factor is large, so we combine the above single-image SR with POCS to realize multiframe SR. Experimental results prove that our method is feasible and effective, while demonstrating better edge and texture preservation.
The rest of this paper is arranged as follows: Section 2 introduces sparse representation and the Direction and Edge learning dictionaries. In Section 3, the novel sparse-representation-based image SR using Direction and Edge dictionaries is illustrated. The experimental results of single-image and multiframe SR and their evaluation are given in Section 4. Section 5 gives a brief conclusion.

Sparse Representation and Direction and Edge Learning Dictionaries

Sparse Representation. Let X denote the HR image and Y the observed LR image, where Y = LX for a degradation operator L (blurring and downsampling), so for patches y = Lx. If x is an image patch taken from X and y is the image patch taken from Y at the same location as x, the sparse representation model is as follows [22]:

x ≈ D_h α  s.t.  ‖α‖_0 ≪ K,   (2)

where α is the sparse representation coefficient of x and D_h ∈ R^(n×K) (K > n) is the HR overcomplete dictionary. Assuming the LR overcomplete dictionary D_l = L D_h, with D_l ∈ R^(m×K) (K > m), then y = D_l α, so HR and LR image patches have the same sparse representation coefficient. As a result, taking a known pair of HR and LR dictionaries {D_h, D_l} as prior knowledge, we are able to rebuild the corresponding HR image patch as long as we acquire the sparse representation coefficient of the LR image patch.
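The coupled-dictionary idea above can be sketched in code. The following is a minimal illustration, not the paper's implementation: a greedy orthogonal matching pursuit codes a synthetic LR patch on D_l, and the same coefficients are reused on D_h. The dictionaries, the degradation operator L, and all dimensions are made-up toy values.

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: approximate y with at most k atoms of D."""
    residual = y.copy()
    support = []
    a = np.zeros(D.shape[1])
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit the coefficients on the chosen support by least squares
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        a[:] = 0.0
        a[support] = coef
        residual = y - D @ a
    return a

rng = np.random.default_rng(0)
n, m, K = 36, 36, 100                 # HR patch dim, LR feature dim, dictionary size
D_h = rng.standard_normal((n, K))     # hypothetical HR dictionary
L = rng.standard_normal((m, n))       # hypothetical degradation operator
D_l = L @ D_h                         # coupled LR dictionary: shares the coefficients

alpha_true = np.zeros(K)
alpha_true[[3, 40, 77]] = [1.0, -2.0, 0.5]
y = D_l @ alpha_true                  # synthetic noiseless LR patch
alpha = omp(D_l, y, k=3)              # code the LR patch on the LR dictionary
x_hat = D_h @ alpha                   # reuse the coefficients on the HR dictionary
```

The key point is the last two lines: the coefficient vector is estimated on D_l only, and the HR patch is synthesized by applying it to D_h.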

Direction and Edge Learning Dictionaries. The quality of the reconstructed image depends largely on the representational ability of the overcomplete dictionary. In Yang et al. [18], the dictionary training scheme is as follows:

min_{D_h, D_l, A} ‖X_h − D_h A‖²_2 + ‖X_l − D_l A‖²_2 + λ‖A‖_1,   (3)

where X_h is the set of sampled HR training image patches, X_l is the corresponding set of LR training image patches, A = {α_i} is the sparse representation coefficient matrix, and λ is a balance parameter.
Based on the same sparse representation model (2), Zeyde et al. [19] modified the above dictionary training method: the LR dictionary D_l is trained from the LR set X_l by applying the K-SVD algorithm [23] to solve the following minimization problem [24]:

min_{D_l, A} ‖X_l − D_l A‖²_F  s.t.  ‖α_i‖_0 ≤ L for all i,   (4)

where ‖·‖_0 denotes the sparsity constraint. The obtained sparse representation matrix A is then used to infer the dictionary D_h as follows:

D_h = X_h A^T (A A^T)^(−1) = X_h A^+.   (5)

Both Yang et al. [18] and Zeyde et al. [19] share two aspects in dictionary training: (i) the large scale of the training sample sets leads to a heavy computational burden in the training process; (ii) using only one pair of global dictionaries, whose representational ability is limited, ignores the differences between image patches.
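The HR dictionary update of (5) has a simple closed form. Below is a small numerical check of this Zeyde-style update, with synthetic data standing in for the training patches and sparse codes; all sizes are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, N = 36, 64, 500                 # patch dim, dictionary atoms, training patches
D_true = rng.standard_normal((n, K))  # dictionary that generated the HR patches
# sparse code matrix: ~10% of the entries are active
A = rng.standard_normal((K, N)) * (rng.random((K, N)) < 0.1)
X_h = D_true @ A                      # synthetic HR training patches

# Closed-form HR dictionary: D_h = X_h A^T (A A^T)^(-1) = X_h A^+  (equation (5))
D_h = X_h @ A.T @ np.linalg.inv(A @ A.T)
```

When A has full row rank, A A^+ equals the identity, so the generating dictionary is recovered exactly in this noiseless toy setting.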
It has been shown in [25] that designing multiple dictionaries is more beneficial than a single one. Furthermore, [26] points out that using clustering to design several dictionaries improves quality and reduces computational complexity [27]. In 2014, Farhadifard et al. [20] trained eight pairs of directionally structured dictionaries for directional patches and one pair of dictionaries for nondirectional patches. First, the two-dimensional space is divided into eight fixed directions. Then eight kinds of template sets are designed, each containing several templates. Finally, these templates are applied to classify the training sets into eight directional clusters and one nondirectional cluster, and a pair of dictionaries is learned for each cluster.
As is well known, edges represent the large-scale structure of an image and are smooth, so the human visual system is more sensitive to edges. Besides, image content is highly directional. In short, edge and direction are the most important features of an image. In order to better capture the intrinsic direction and edge characteristics of an image, we design Direction and Edge dictionaries for different clusters of patches, instead of a global dictionary for all patches.
Based on the significant difference between edge pixels and their neighboring pixels and the strong directionality of images, we design a new pair of Direction and Edge templates, as shown in Figure 1. It is not difficult to see that template A represents the vertical direction and edge, while template B represents the horizontal direction and edge.
The Direction and Edge templates are used to guide the clustering of image patches and further to obtain the Direction and Edge dictionaries. First, the training image patches are classified into two clusters, with Euclidean distance as the clustering criterion: the Euclidean distances between an image patch and the two templates are computed, and the smaller value determines which cluster the patch belongs to. Then the two clusters are trained, respectively, to obtain two pairs of HR and LR dictionaries, which are referred to as the Direction and Edge dictionaries.
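The clustering rule can be sketched as follows. The step-edge templates below are stand-ins for the templates of Figure 1, whose exact pixel values are not reproduced here; the normalization step is likewise an assumption.

```python
import numpy as np

# Hypothetical 6x6 templates: A = vertical direction/edge, B = horizontal
template_a = np.hstack([np.zeros((6, 3)), np.ones((6, 3))])  # vertical step edge
template_b = np.vstack([np.zeros((3, 6)), np.ones((3, 6))])  # horizontal step edge

def classify_patch(patch):
    """Assign a normalized patch to the nearer template (Euclidean distance)."""
    p = patch - patch.mean()
    p = p / (np.linalg.norm(p) + 1e-12)
    dists = []
    for t in (template_a, template_b):
        tn = t - t.mean()
        tn = tn / (np.linalg.norm(tn) + 1e-12)
        dists.append(np.linalg.norm(p - tn))
    return int(np.argmin(dists))  # 0 -> cluster A, 1 -> cluster B

# A vertical step patch lands in cluster A, a horizontal one in cluster B
c_vert = classify_patch(np.hstack([np.zeros((6, 3)), np.ones((6, 3))]))
c_horz = classify_patch(np.vstack([np.zeros((3, 6)), np.ones((3, 6))]))
```

Each training patch is then routed to one of the two cluster-specific training sets before dictionary learning.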
The Direction and Edge dictionaries have several advantages: (i) they are expected to better represent the intrinsic direction and edge characteristics of natural images; (ii) the HR image reconstructed via these dictionaries inherits the large-scale information of natural images and has more high-frequency information, which is the most important part for SR; (iii) they reduce computational complexity, since structural dictionaries can be smaller than a global dictionary.
In order to improve efficiency, our templates are of size 6 × 6. Compared with Farhadifard et al. [20], our method contains only two templates, which consider not only the direction but also the edge features. In addition, there is no need to set a specific threshold for clustering nondirectional patches, as required in [20]. Of course, other classification templates could be tried. In the Direction and Edge dictionary training phase, we obtain the LR dictionary for each cluster of the LR training set using the K-SVD algorithm and then obtain the corresponding HR dictionary by (5).

Image SR Based on Direction and Edge Dictionaries
In the reconstruction phase, after computing the sparse representation coefficient of an LR patch, the HR patch is obtained by multiplying the coefficient by the HR dictionary of the corresponding class.

Algorithm Implementation
Step 2 (Direction and Edge dictionary training). For the first class, train the LR1 training set with the K-SVD algorithm to get the first-class LR dictionary D_l1 and the sparse coefficients A_1. According to (5), obtain the first-class HR dictionary D_h1 from the known A_1 and the HR1 training set. Similarly, obtain the second-class LR dictionary D_l2 and HR dictionary D_h2.
Step 3 (image reconstruction). (a) Acquire the MR image by interpolation amplification of the input LR image. Take patches with five-pixel overlap from the MR image and classify the patches into two clusters by the same method as above. Then obtain feature vectors by extracting the first- and second-order gradients of the patches. Finally, calculate the sparse coefficient α of each column feature vector on the corresponding class LR dictionary D_l. (b) Multiply α by the corresponding class HR dictionary D_h to obtain the HR patches, and merge the overlapped patches by averaging to form the reconstructed HR image. The results of single-image SR are shown in Section 4.
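The merging of overlapped HR patches can be sketched as follows. `assemble` is a hypothetical helper, not from the paper, that averages overlapping contributions; here it is demonstrated on exact crops of a toy image, which it must reassemble exactly.

```python
import numpy as np

def assemble(patches, positions, shape, psize=6):
    """Place HR patches at their top-left positions and average overlapped pixels."""
    out = np.zeros(shape)
    weight = np.zeros(shape)
    for (r, c), patch in zip(positions, patches):
        out[r:r+psize, c:c+psize] += patch
        weight[r:r+psize, c:c+psize] += 1.0
    return out / np.maximum(weight, 1.0)  # average where patches overlap

# 6x6 patches on a stride-1 grid (five-pixel overlap, as in the paper)
img = np.arange(64, dtype=float).reshape(8, 8)
positions = [(r, c) for r in range(3) for c in range(3)]
patches = [img[r:r+6, c:c+6] for r, c in positions]
rec = assemble(patches, positions, (8, 8))
```

Since the toy patches are exact crops, averaging the overlaps reproduces the original image; with sparse-coded patches the averaging smooths seams between neighboring reconstructions.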

Method.
The POCS method is widely used for multiframe SR and easily incorporates prior knowledge. However, it usually shows jagged edges in the reconstructed results when the upscaling factor is large. Our method based on Direction and Edge dictionaries can recover more high-frequency information and preserve smooth edges. Therefore, we combine the POCS method with our single-image SR method to realize multiframe SR. It includes three steps: multiframe registration, POCS reconstruction, and single-image SR based on Direction and Edge dictionaries, as shown in Figure 5.
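As a rough illustration of the POCS idea, and not the exact projection operators used in the paper, one sweep can enforce consistency with a single LR frame under a simple block-averaging degradation model (both the model and the uniform correction are assumptions):

```python
import numpy as np

def pocs_step(hr, lr_obs, scale=2, delta=0.0):
    """One POCS sweep: project the HR estimate onto the set of images whose
    simulated LR pixels match the observed LR frame to within delta."""
    hr = hr.copy()
    for i in range(lr_obs.shape[0]):
        for j in range(lr_obs.shape[1]):
            block = hr[i*scale:(i+1)*scale, j*scale:(j+1)*scale]
            r = lr_obs[i, j] - block.mean()   # residual of the data constraint
            if abs(r) > delta:
                block += r                    # spread the correction uniformly
    return hr

hr_true = np.arange(16, dtype=float).reshape(4, 4)
lr = hr_true.reshape(2, 2, 2, 2).mean(axis=(1, 3))  # simulated 2x2 LR frame
est = pocs_step(np.zeros((4, 4)), lr)               # project a blank estimate
sim_lr = est.reshape(2, 2, 2, 2).mean(axis=(1, 3))  # re-simulate LR from estimate
```

In practice one such projection is applied per registered LR frame and the sweeps are iterated, optionally with additional constraint sets (amplitude bounds, smoothness priors).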
In the multiframe registration stage, feature points of the input images are first extracted with the SURF algorithm [28] and matched. Then the mismatched points are removed by the RANSAC algorithm [29]. Finally, the registered images are obtained according to the parameters computed from the affine transformation matrix.

Algorithm Implementation
Step 1 (multiframe image registration). (a) Obtain the LR image sequence via geometric distortion and downsampling of the HR image. Then select the first frame as the reference frame and the other frames as floating frames. Use the SURF algorithm to extract feature points and the RANSAC algorithm to remove false matches.
(b) The registered images are calculated on the basis of the affine transformation model with the matched points.
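Once mismatches have been removed, the affine model of step (b) can be fitted by least squares. The sketch below (RANSAC inlier selection omitted) recovers a synthetic 2×3 affine matrix from noiseless correspondences; the matrix values are made up for illustration.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine transform M such that dst ~ M @ [x, y, 1]."""
    n = src.shape[0]
    X = np.hstack([src, np.ones((n, 1))])        # n x 3 homogeneous coordinates
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)  # solves X @ M = dst, M is 3 x 2
    return M.T                                   # return as 2 x 3

rng = np.random.default_rng(2)
M_true = np.array([[0.9, -0.1,  3.0],
                   [0.1,  0.9, -2.0]])           # hypothetical ground-truth affine
src = rng.random((20, 2)) * 100                  # matched points in the float frame
dst = src @ M_true[:, :2].T + M_true[:, 2]       # their positions in the reference
M = fit_affine(src, dst)
```

The estimated matrix is then used to warp each floating frame onto the reference frame's grid.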
Step 2. Use the POCS method to reconstruct the registered images by an upscaling factor r1.

Step 3. The result of POCS is magnified by our single-image SR method by a factor of r2. The overall reconstruction upscaling factor is r1 × r2.

The Experimental Results and Evaluation
In this section, we present numerous experiments to verify the performance of our method. All the experiments are executed with MATLAB 8.3.0. The dictionaries have 256 atoms, and the patch size is 6 × 6 with an overlap of 5 pixels between adjacent patches. The LR training and testing images are generated by resizing the ground-truth images with bicubic interpolation. Since the human visual system is more sensitive to luminance changes, we apply the SR method only to the luminance component, while applying simple bicubic interpolation to the chromatic components.
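Luminance-only processing presumes a luma/chroma split. A common way to do this is the BT.601 RGB↔YCbCr transform shown below; the paper does not specify its exact conversion, so this is an illustrative assumption.

```python
import numpy as np

# ITU-R BT.601 full-range RGB <-> YCbCr conversion matrix
M = np.array([[ 0.299,     0.587,     0.114   ],
              [-0.168736, -0.331264,  0.5     ],
              [ 0.5,      -0.418688, -0.081312]])

def rgb_to_ycbcr(rgb):
    """Split an RGB image (values in [0, 1]) into luma Y and chroma Cb, Cr."""
    ycc = rgb @ M.T
    ycc[..., 1:] += 0.5          # center the chroma channels around 0.5
    return ycc

def ycbcr_to_rgb(ycc):
    """Invert the conversion after Y has been super-resolved separately."""
    ycc = ycc.copy()
    ycc[..., 1:] -= 0.5
    return ycc @ np.linalg.inv(M).T

rng = np.random.default_rng(3)
rgb = rng.random((4, 4, 3))
back = ycbcr_to_rgb(rgb_to_ycbcr(rgb))  # round trip should be lossless
```

In the SR pipeline, only channel 0 (Y) would be fed through the dictionary-based reconstruction; Cb and Cr would be bicubically upscaled and recombined.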
We compare the proposed single-image SR based on Direction and Edge dictionaries with bicubic interpolation and several state-of-the-art SR methods, including Yang et al. [18], Zeyde et al. [19], NCSR [16], ANR [17], and CSC [15]. The source codes of the competing methods are downloaded from the authors' websites, and we use the parameters recommended by the authors.
Visual Quality. We perform experiments on 16 widely used test images with an upscaling factor of 2. In Figures 6, 7, and 8, we show the single-image SR results of the competing methods on the Plant, Parrot, and Comic images. For clear comparison, a local region is magnified four times and shown in the upper-left corner of each figure. As highlighted in the small window, the SR results of our method recover more high-frequency information and reduce artifacts.
PSNR and SSIM. The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values of the competing methods are shown in Tables 1 and 2. Our method achieves much better PSNR and SSIM indices than bicubic interpolation and NCSR. The average values are only slightly inferior to Yang's method, Zeyde's method, and ANR. For the PSNR index, our method is better than Yang's method on Raccoon, better than Zeyde's method on Hat, Lena, and Bike, and better than ANR on Hat, Parrot, and Raccoon. The PSNR of CSC, which is based on a convolutional neural network, is higher than that of our method, but it requires a long running time and at least 3 GB of memory.
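For reference, PSNR is computed as 10·log10(peak²/MSE). A minimal implementation follows; SSIM, which requires windowed means, variances, and covariances, is omitted here for brevity.

```python
import numpy as np

def psnr(ref, est, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# toy example: a constant error of 16 gray levels gives MSE = 256
val = psnr(np.zeros((8, 8)), np.full((8, 8), 16.0))
```

In SR evaluation, PSNR is conventionally computed on the luminance channel only, matching the luminance-only reconstruction described above.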
In short, the results verify not only the validity of our method but also its good robustness to different kinds of input.

Conclusion
In this paper, we present a novel method for image super-resolution based on sparse representation in terms of Direction and Edge dictionaries. The key idea is to classify image patches based on their direction and edge features and to code each patch selectively using the more appropriate dictionary. According to the Euclidean distances between an image patch and two new templates, image patches are divided into two clusters, which are then trained to obtain two pairs of Direction and Edge dictionaries. Single-image experimental results indicate the usefulness of the proposed Direction and Edge dictionaries. Furthermore, we combine POCS with our single-image SR method to realize multiframe SR, especially when the upscaling factor is large, and the experiments show that it achieves equally satisfactory results. In short, our proposed method achieves not only competitive PSNR and SSIM values but also more pleasant visual quality of image edge structures and texture.

Figure 1: Direction and Edge templates (from left to right: A template and B template).

3.1. Single-Image SR

3.1.1. Method. The single-image SR based on Direction and Edge dictionaries includes three steps: training set construction, Direction and Edge dictionary training, and image reconstruction, as shown in Figures 2, 3, and 4. In the training set construction phase, after taking overlapped patches from the training images, all patches are classified into two clusters according to the Euclidean distances.
Step 1 (training set construction). (a) Take 91 natural images as the HR image library; the LR image library comprises LR images obtained by downsampling the HR images. To reach the HR image dimension, the LR images are scaled up to the size of the HR images via bicubic interpolation and are termed medium-resolution (MR) images. (b) Take patches with five-pixel overlap from the HR images (6 × 6), and then calculate the Euclidean distances between each normalized patch and the two templates. Classify the patches into two classes by these distances, and record the positions of the first- and second-class patches. (c) Take the same-size patch from the MR image at the same position as in the HR image, and then use the first- and second-order gradients of the patches as the feature vector. Build the first-class LR (LR1) training set and the second-class LR (LR2) training set by collecting the corresponding class feature vectors. (d) Extract each image patch from the HR−MR image as a column feature vector, so as to build the first-class HR training set (HR1) and the second-class HR training set (HR2) by collecting the corresponding class feature vectors.
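The gradient features of step (c) can be sketched as follows, using the derivative filters commonly adopted in the Yang/Zeyde line of work; the paper does not list its exact filters, so these are an assumption.

```python
import numpy as np

def gradient_features(mr_patch):
    """First- and second-order gradients of an MR patch, in both directions,
    stacked into one feature vector (four maps per patch)."""
    f1 = np.array([-1.0, 0.0, 1.0])            # first-order derivative filter
    f2 = np.array([1.0, 0.0, -2.0, 0.0, 1.0])  # second-order derivative filter
    maps = []
    for f in (f1, f2):
        # filter every row (horizontal gradient) and every column (vertical)
        gx = np.apply_along_axis(lambda r: np.convolve(r, f, mode='same'), 1, mr_patch)
        gy = np.apply_along_axis(lambda c: np.convolve(c, f, mode='same'), 0, mr_patch)
        maps += [gx, gy]
    return np.concatenate([m.ravel() for m in maps])

feat = gradient_features(np.arange(36, dtype=float).reshape(6, 6))
```

For a 6 × 6 patch this yields a 144-dimensional feature vector (four 36-pixel gradient maps), which is what gets sparse-coded on the LR dictionary.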

Figure 2: Two classes of tectonic HR and LR training sets.

Figure 3: Direction and Edge dictionary training.

4.1. Single-Image SR. The experimental setting in this paper follows Yang et al. [18]. The same 91 training images are adopted.

4.2. Multiframe Image SR. The experiments aim to obtain an HR image (512 × 512) from 10 LR frames (128 × 128) by an overall upscaling factor of 4 (a factor of 2 in the POCS stage and a factor of 2 in the single-image SR stage). In order to simulate the imaging process of an actual scene, we obtain the 10 LR images from the original HR image via downsampling by a factor of 2, random jitter of about 1–2 pixels, and clockwise rotation between −1 and +1 degree.

Figure 10: Results of the Monarch (upscaling factor 4). Smaller: input images. Larger: from left to right and top to bottom: bicubic, POCS, our method, and original image.

Figure 11: Results of the Pepper (upscaling factor 4). Smaller: input images. Larger: from left to right and top to bottom: bicubic, POCS, our method, and original image.