Multiframe Superresolution Reconstruction Based on Self-Learning Method

One widely used category of superresolution algorithms in practical applications is the dictionary-based algorithms, which construct a single high-resolution (HR), high-clarity image from multiple low-resolution (LR) images. Although general dictionary-based superresolution algorithms obtain redundant dictionaries from numerous HR-LR image pairs, HR image distortion is unavoidable. To solve this problem, this paper proposes a multiframe superresolution reconstruction method based on self-learning. First, multiple images of the same scene are selected as both input and training images, and larger-scale images, which also join the training set, are constructed from the learned dictionary. Then, further larger-scale images are constructed by repeating the first step, until initial HR sets whose scale closely approximates that of the target HR image are obtained. Lastly, the initial HR images are fused into one target HR image using the nonlocal means (NLM) idea, while the iterative back-projection (IBP) idea is adopted to satisfy the global constraint. Simulation results demonstrate that the proposed algorithm produces more accurate reconstructions than other general superresolution algorithms, and, in real-scene experiments, it runs well and creates clearer HR images from input images captured by cameras.


Introduction
In modern engineering, high-resolution (HR) images are always desirable because they provide important information for decision making in various practical applications, for example, biometrics, airborne detection systems, medical diagnosis, high-definition television (HDTV), and remote surveillance. One way to obtain HR images is to increase sensor pixel resolution, which, however, increases sensor cost. A promising alternative is the superresolution technique, which reconstructs one HR image from one or more low-resolution (LR) images shot from the same scene, providing sufficient relevant information.
In recent years, superresolution techniques have been extensively studied and put into practice. CEVA first successfully combined superresolution techniques with its CEVA-MM3101 low-power imaging and vision platform, which uses LR sensors to construct HR images and has been applied in camera-enabled mobile devices. The French company SPOT applied superresolution techniques to the SPOT-5 satellite, which obtains clearer ground-scene images and identifies targets more accurately than SPOT-4, which did not adopt superresolution. Superresolution has therefore become a popular way to improve image resolution without changing the optical system of the camera.
Superresolution techniques fall into three general categories, namely, interpolation-based, reconstruction-based, and dictionary-based algorithms. Interpolation-based algorithms are simple but less effective: unknown pixel values are calculated from known pixels using an interpolation formula [1][2][3][4], such as bilinear, bicubic, or Lanczos interpolation. Reconstruction-based algorithms solve the inverse of the imaging process: they use prior image knowledge and recover high-frequency information via established imaging models [5][6][7][8][9][10].
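As a concrete baseline for the interpolation-based category, bilinear upscaling can be sketched in a few lines of numpy. This is a minimal illustration for reference only, not code from the paper:

```python
import numpy as np

def bilinear_upscale(img, factor):
    """Upscale a 2-D grayscale image by `factor` using bilinear interpolation."""
    h, w = img.shape
    H, W = int(h * factor), int(w * factor)
    # Map each output pixel back to fractional source coordinates.
    ys = np.linspace(0, h - 1, H)
    xs = np.linspace(0, w - 1, W)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four neighbouring pixels with bilinear weights.
    return (img[np.ix_(y0, x0)] * (1 - wy) * (1 - wx)
            + img[np.ix_(y0, x1)] * (1 - wy) * wx
            + img[np.ix_(y1, x0)] * wy * (1 - wx)
            + img[np.ix_(y1, x1)] * wy * wx)
```

Because each output pixel is a weighted average of at most four known pixels, no high-frequency information is recovered, which is exactly the limitation the dictionary-based methods below address.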
Most recently, dictionary-based algorithms have been more widely applied owing to developments in compressed sensing [11][12][13][14][15]. This category of algorithms requires a number of external training images, using the similarity redundancy structure of natural images to restore the high-frequency information missing from input images. Roweis and Saul [16] proposed a nonlinear dimension-reduction method, locally linear embedding (LLE), which has been widely applied in image processing. Based on [16], Chang et al. [17] applied LLE to superresolution reconstruction. In this method, the single input LR image is divided into overlapping blocks; each image block then searches for its K nearest blocks among all LR training image blocks, whereby the optimal weights are obtained; finally, an HR image is constructed from a linear combination of the K HR training blocks corresponding to the K LR training blocks. References [18][19][20] improved the algorithm in [17] and achieved enhanced reconstruction results. Yang et al. [21] proposed a sparse coding method for dictionary learning. Compared with previous algorithms, their algorithm relies on larger databases while obtaining missing high-frequency information more effectively. In [22], this sparse coding is improved to strengthen the sparse similarity between LR and HR image blocks, making the method simpler and more effective than [21]. He et al. [23] adopted a beta-process dictionary learning approach to make the HR and LR dictionaries more accurate and consistent, gaining better results in comparison with [22]. Jeong et al. [24] proposed a superresolution method robust to noise, which involves two phases: a learning phase and a reconstruction phase. In the learning phase, the training images are classified and different dictionaries are constructed according to noise quantity; in the reconstruction phase, input images with various noise quantities adopt the appropriate dictionaries to reconstruct HR images.
All the above methods require a large number of external training images. When external images cannot be accessed, it is feasible to exploit the similarity redundancy structure of the input images themselves. In [25][26][27][28], different self-learning methods are proposed. Glasner et al. [25] realized superresolution reconstruction without external images by adopting a gradual magnification scheme. Chen et al. [26] used the high-frequency and low-frequency components of the input images to train dictionary image blocks; they then used approximate nearest neighbor search to reconstruct images and recover the missing high-frequency information. Zhang et al. [27] proposed a self-similarity redundancy algorithm, which selects the original LR images and their degraded versions as the training images and uses the LLE algorithm to obtain one HR image. In [28], Ramakanth and Babu proposed a single-image learning dictionary and adopted the approximate nearest neighbor fields (ANNF) method to find similarities between input LR blocks and the LR dictionary; this method reconstructs HR images more effectively and efficiently than other dictionary-based methods.
Studies [25][26][27][28] have shown that natural images contain self-similarity information across scales and that the high-frequency information of a small-scale image can be obtained from a large-scale image. This paper therefore proposes a multiframe superresolution algorithm based on self-learning that requires no external training images. First, the self-similarity information of multiple input images is used to create larger-scale training images through a step-by-step process. Then, an overcomplete dictionary is learned by sparse representation. Finally, the reconstructed images are further improved by imposing a global constraint. Compared with [25][26][27], the proposed algorithm possesses the following unique features.
(1) Larger-scale images are created from the input images. Both the input images and the larger-scale images are involved in the training set, and sparse representation is adopted to learn the mapping relation of the dictionary.
(2) One larger-scale image is reconstructed from each input image, and this reconstruction is applied to all N input images. Through gradual magnification, N initial estimates are obtained, each corresponding to one input image, with a scale closely approximating that of the target HR image. The NLM method is then employed to fuse all estimates into one target HR image.
(3) In order to restore more high-frequency content, degraded versions of the input images are used as the LR training set and the high-frequency parts of the input images are used as the HR training set.
Our theoretical analysis demonstrates that the proposed algorithm is feasible, and simulation results show that it performs better than general dictionary-based superresolution algorithms. In real-scene experiments carried out on a camera system, the proposed algorithm obtains clearer real HR images.
This paper is organized as follows. Section 2 discusses the image sparse model. Section 3 presents the framework of the proposed algorithm and its implementation, followed by experimental results in Section 4. Finally, Section 5 concludes the paper.

Image Sparse Model
Superresolution reconstruction is an inverse estimation problem. Dictionary-based superresolution trains corresponding LR and HR dictionaries from the mapping between LR images and HR images; the relationship between the two dictionaries is then used to reconstruct an HR image, recovering the missing high-frequency information. The imaging model between an LR image and an HR image is given in (1): when the real scene is captured through the optical system, a group of LR images is obtained due to optical blur and downsampling:

Y = DHX, (1)

where Y is the LR image captured by the camera, X is the real scene image, H is the optical blur matrix, and D is the downsampling matrix. Superresolution reconstruction is a highly ill-conditioned problem: given one LR input image Y, many HR images X satisfy (1). With the development of compressed sensing, dictionary-based methods have been applied more widely in image restoration. We divide the image into blocks y of the same dimension; every image block is represented by a dictionary and sparse coefficients, y = D × α, where D is the dictionary matrix and α is the sparse coefficient vector. The basic process of dictionary-based reconstruction is shown in Figure 1.
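The forward model Y = DHX (blur, then downsample) can be simulated directly. The sketch below, a numpy-only illustration under the assumption of a separable Gaussian blur and simple decimation, is how test LR inputs are typically generated:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """1-D normalized Gaussian kernel (separable blur)."""
    ax = np.arange(size) - size // 2
    k = np.exp(-ax**2 / (2 * sigma**2))
    return k / k.sum()

def degrade(X, factor=2, sigma=1.0):
    """Simulate Y = D H X: blur the HR image X, then downsample by `factor`."""
    k = gaussian_kernel(5, sigma)
    pad = len(k) // 2
    Xp = np.pad(X, pad, mode='edge')
    # Separable blur: convolve rows, then columns ('valid' restores original size).
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, Xp)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, blurred)
    return blurred[::factor, ::factor]   # decimation = downsampling matrix D
```

Many distinct HR inputs X map to the same Y under this operator, which is the ill-conditioning the sparse prior is meant to resolve.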
First, many training images are required; the corresponding LR dictionary D_l and HR dictionary D_h are trained from the LR and HR training images. Second, the optimal sparse coefficients α of an LR input image block y are solved from min ‖y − D_l × α‖. Then the corresponding HR image block x is obtained via x = D_h × α, and all HR image blocks are fused into one HR image X.
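The per-block step — a sparse solve against D_l followed by synthesis x = D_h × α — can be sketched with greedy orthogonal matching pursuit (OMP). This is an illustrative stand-in (the paper itself uses an l1 formulation in Section 3), assuming dictionary atoms are normalized:

```python
import numpy as np

def omp(D, y, n_nonzero=3):
    """Orthogonal Matching Pursuit: greedily approximate min ||y - D a||
    with at most `n_nonzero` nonzero coefficients (atoms assumed normalized)."""
    residual, idx = y.copy(), []
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        sub = D[:, idx]
        # Re-fit all chosen atoms jointly (least squares).
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coef
    a = np.zeros(D.shape[1])
    a[idx] = coef
    return a

# Usage sketch (hypothetical dictionaries D_l, D_h of matching atom count):
#   a = omp(D_l, y_patch)
#   x_patch = D_h @ a
</imports>```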

The Proposed Method
When one LR image is given to reconstruct one HR image, the main problem of dictionary-based methods is finding a large number of HR-LR training images and exploiting the self-similarity information of external images. However, if the frequency information of the input images is absent from the selected HR-LR training set, the reconstructed HR image will be distorted. In this paper, we propose a multiframe superresolution reconstruction method based on a self-learning dictionary. We collect multiple images of the same scene as both the input images and the training set, and larger-scale images, constructed from the previous training set, are continually added to the training set. We thereby obtain initial reconstructed images whose scale closely approximates that of the target HR image. Then, the NLM method is used to fuse the initial reconstructed images into one HR image. Lastly, the IBP idea is adopted to satisfy the global reconstruction constraint. Four major parts are detailed in the following. Section 3.1 discusses how the input images are gradually magnified to create many larger-scale images as the training set. Section 3.2 adopts sparse representation to jointly learn the redundant dictionary. Section 3.3 employs the NLM method to fuse the initial reconstructed images into one HR image and introduces the importance of the global reconstruction constraint. Section 3.4 describes the working process of the proposed algorithm. Section 3.5 discusses how to simplify the computation.

3.1. Self-Learning Method.
As stated above, dictionary-based reconstruction algorithms need many training images to learn the overcomplete dictionary for constructing the HR image. References [25][26][27][28] point out that an input image contains enough self-similarity information and that the high-frequency information of a small-scale image can be obtained from a large-scale image, so multiframe input images can be used to reconstruct images accurately. Therefore, when external images cannot be accessed, this paper proposes a self-learning pyramid method that gradually constructs larger-scale images to join the training set. This is feasible because all the redundant information across the different scales can be fully learned. First, assume that the target HR image resolution is s times that of the input LR image resolution; we enlarge the input image resolution step by step by a factor q per step, so that after n steps

q^n ≈ s. (2)

Assuming the input images are of dimension M × N, there are two cases for the principle of q; we discuss this paper based on condition (a). Then, the degraded images X̂_1 are obtained via X_1 ↓q ↑q (↓q denotes q-times downsampling; ↑q denotes q-times upsampling). Together with the high-frequency part of the input images X_1, they constitute the HR-LR training set T_1, as in (3): the degraded images X̂_1 form the LR training set, and the high-frequency part (X_1 − X̂_1) of the input images X_1 forms the HR training set. This is the difference between this paper and [25][26][27], and it lets our method recover edge high-frequency information better. We train on T_1 to get dictionary D_1 and construct the corresponding larger-scale HR images X_2, whose resolution is q times that of the input images. Then the degraded images X̂_2 are computed, which together with the high-frequency parts of X_2 and X_1 constitute the new HR-LR training set T_2. As shown in (3), after n repeated learning steps we reconstruct the final HR images, whose resolution is closest to the target HR images. The process of training images is shown in Figure 2.
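The pyramid bookkeeping above can be sketched directly: choose the number of q-sized steps so q^n approaches s, and build each level's LR/HR training pair as (down-up degraded image, high-frequency residual). The resampling here is nearest-neighbour purely to keep the sketch dependency-free; the paper's own interpolation choice may differ:

```python
import numpy as np

def pyramid_schedule(s=2.0, q=1.25):
    """Number of q-sized magnification steps and the residual factor r
    such that q**n * r == s."""
    n = int(np.floor(np.log(s) / np.log(q)))
    return n, s / q**n

def training_pair(X, q=1.25):
    """One pyramid level's training pair: degraded image X_hat = (X down-up by q)
    as the LR sample, and the high-frequency residual X - X_hat as the HR target."""
    h, w = X.shape
    lh, lw = max(1, int(round(h / q))), max(1, int(round(w / q)))
    ri = np.linspace(0, h - 1, lh).astype(int)
    ci = np.linspace(0, w - 1, lw).astype(int)
    down = X[ri][:, ci]                                   # X ↓q
    ui = np.linspace(0, lh - 1, h).astype(int)
    vi = np.linspace(0, lw - 1, w).astype(int)
    X_hat = down[ui][:, vi]                               # ↑q back to original size
    return X_hat, X - X_hat
```

With s = 2 and q = 1.25 this yields n = 3 steps and a residual factor of 128/125, matching the experimental settings in Section 4.1.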
We enlarge every input image step by step, like a pyramid; see (3).

3.2. Learning Dictionary. The previous section presented the self-learning pyramid method. Chen et al. [26] and Zhang et al. [27] use image-block manifolds to learn the dictionary: they solve for the optimal weights between an input image block and several nearest local training blocks, but this cannot effectively utilize global image information. Yang et al. [22] point out that sparse representation utilizes global image information better and constructs a universal dictionary with a better reconstructed image, so we propose a jointly learned dictionary method based on sparse representation in this section. With T_t as the training set from Section 3.1, the HR and LR images of T_t are divided into blocks x and y, respectively, each represented by an atomic linear combination: x = D_h × α and y = D_l × β, where α and β are sparse coefficients and D_h and D_l are the HR and LR dictionaries, respectively. The premise of the joint learning is that the sparse coefficients α and β are the same, as shown in (4):

x = D_h × α, y = D_l × α. (4)

By solving the sparse coefficients α against the LR dictionary D_l, we obtain HR image blocks via x = D_h × α. This is the basic idea of dictionary training. Before reconstructing the HR image, we must first learn dictionaries D_l and D_h satisfying the constraints of (4), which can be converted into the optimization problem (5):

min ‖α‖_0 s.t. ‖x − D_h × α‖_2 ≤ ε, ‖y − D_l × α‖_2 ≤ ε. (5)

As long as the coefficients α are sparse enough, (5) can be solved as an l1-norm problem, as shown in (6) and (7):

min ‖x − D_h α‖_2^2 + λ‖α‖_1, (6)
min ‖y − D_l α‖_2^2 + λ‖α‖_1. (7)

In order to obtain HR and LR dictionaries sharing the same coefficients α, (6) and (7) are combined into (8):

min (1/N)‖x − D_h α‖_2^2 + (1/M)‖y − D_l α‖_2^2 + λ(1/N + 1/M)‖α‖_1, (8)

where N is the dimension of the HR image blocks and M is the dimension of the LR image blocks. Defining the concatenations

x_c = [x/√N; y/√M], D_c = [D_h/√N; D_l/√M], (9)

(8) can be converted into (10):

min ‖x_c − D_c α‖_2^2 + λ̂‖α‖_1. (10)

With the initial value of matrix D_c set, (10) is solved for α as the transformed problem (11),

min_α ‖x_c − D_c α‖_2^2 + λ̂‖α‖_1, (11)

which involves A = D_c^T D_c and b = D_c^T x_c. After obtaining α, we solve (12) for the dictionary matrix:

min_{D_c} ‖x_c − D_c α‖_2^2 s.t. ‖D_c(:, k)‖_2^2 ≤ 1 for every atom k. (12)

Therefore, dictionaries D_l and D_h are obtained. Then, we enlarge every input image of X_t one by one, as shown in Figure 3. First, every input image is enlarged to Y_t by interpolation, with upscale factor q. Second, Y_t is divided into blocks y of the same size. Using dictionaries D_l and D_h, we solve (11) for the coefficients α of each block; the HR image block is then obtained via x = D_h × α. When all the HR image blocks are gained, they are merged into one HR image X_h in a fixed order. X_h + Y_t is the larger-scale reconstructed image and also one of the reconstructed images X_{t+1}, as shown in Figure 3.
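The block bookkeeping used throughout this section — dividing an image into overlapping blocks and later merging the reconstructed blocks back "in a fixed order" — can be sketched as follows, with overlapping regions averaged (a common convention; the paper does not spell out its merge rule):

```python
import numpy as np

def extract_patches(img, size=5, step=1):
    """Slide a size x size window with the given step; return flattened
    patches and their top-left positions."""
    h, w = img.shape
    patches, pos = [], []
    for i in range(0, h - size + 1, step):
        for j in range(0, w - size + 1, step):
            patches.append(img[i:i + size, j:j + size].ravel())
            pos.append((i, j))
    return np.array(patches), pos

def merge_patches(patches, pos, shape, size=5):
    """Place patches back at their positions, averaging where they overlap."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for p, (i, j) in zip(patches, pos):
        acc[i:i + size, j:j + size] += p.reshape(size, size)
        cnt[i:i + size, j:j + size] += 1
    return acc / np.maximum(cnt, 1)
```

With step = 1 and size = 5 this reproduces the 4-pixel overlap used in the experiments of Section 4.1.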
Lastly, we obtain the reconstructed image set X_{t+1} by using the dictionary to enlarge every image in X_t. The image set X_{t+1} then becomes the input images, and together with its high-frequency parts it forms the training set of the next enlargement, as shown in (13):

X_{t+1} = { X_h + Y_t for each input image }. (13)

3.3. Image Fusion and Global Constraint. The nonlocal means (NLM) filter is a very effective denoising filter based on the assumption that each pixel is a repetition of its surrounding pixels; the target pixel value is obtained as a weighted sum of surrounding pixels. NLM is now widely used in superresolution reconstruction. As in [9], when the NLM idea is used to reconstruct multiframe images, the smaller the magnification, the smaller the reconstruction error. So, when the image dimension of X_{n+1} is closest to the target HR image, processing X_{n+1} with the NLM idea is feasible. In this section, we adopt (14) to fuse the images of X_{n+1} into the estimate X_0:

X_0 = arg min_X Σ_k Σ_{(m,n)} w_k(m, n) [X(i, j) − Y_k(m, n)]^2, with w_k(m, n) = exp(−‖R_{i,j}X − R_{m,n}Y_k‖_2^2 / h^2). (14)
Here (i, j) is a pixel of the target HR image, Y_k is one of the images in the set X_{n+1}, (m, n) is a pixel of image Y_k, and R_{m,n}Y_k denotes the image block extracted from Y_k whose center is (m, n). After this fusion (amplification and denoising) of the reconstructed images, we obtain the prime estimate X_0. Finally, we must add the global reconstruction constraint to satisfy Y = DHX, as shown in (16):

X* = arg min_X ‖DHX − Y‖_2^2 + λ‖X − X_0‖_2^2, (16)

which is solved iteratively as in (17):

X_{t+1} = X_t + ν [(DH)^T (Y − DHX_t) + λ(X_0 − X_t)], (17)

where ν is the step size. We solve (17) via the IBP idea to obtain the optimal estimated HR image X.
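The NLM fusion of (14) can be sketched pixel-by-pixel: each output pixel is a weighted mean of the co-located candidate pixels, with weights from patch similarity. This is a minimal numpy illustration; the reference patch here is taken from the mean of the estimates, an assumption standing in for whatever intermediate estimate the paper uses:

```python
import numpy as np

def nlm_fuse(images, h=10.0, half=2):
    """Fuse co-registered estimates into one image. Each output pixel is a
    weighted mean of the candidates' pixels, weighted by patch similarity
    exp(-||patch_ref - patch_k||^2 / h^2), as in NLM filtering."""
    ref = np.mean(images, axis=0)          # crude reference estimate (assumption)
    H, W = ref.shape
    padded = [np.pad(im, half, mode='edge') for im in images]
    pref = np.pad(ref, half, mode='edge')
    out = np.zeros_like(ref)
    for i in range(H):
        for j in range(W):
            pr = pref[i:i + 2 * half + 1, j:j + 2 * half + 1]
            num = den = 0.0
            for im, pim in zip(images, padded):
                pk = pim[i:i + 2 * half + 1, j:j + 2 * half + 1]
                w = np.exp(-np.sum((pr - pk) ** 2) / h ** 2)
                num += w * im[i, j]
                den += w
            out[i, j] = num / den
    return out
```

Estimates that agree with the reference patch dominate the average, which suppresses the fusion noise mentioned in Section 4.1.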

3.4. The Proposed Algorithm.
The process of the proposed algorithm, based on the sections above, is as follows. Suppose that we have four LR images of the same scene.

Objective: Reconstruct HR Image from Multiple LR Images
(1) Initialization: input the four LR images X_1 and solve the corresponding degraded images via X_1 ↓q ↑q; construct the first training image set T_1; set the step magnification q and the number of iterations n.
(2) For t = 1, ..., n:
(a) Divide the HR and LR images of the HR-LR training set T_t into image blocks x and y, respectively, and train the dictionaries D_h and D_l.
(b) Take X_t as the LR images to be reconstructed and solve the new HR images X_{t+1} from dictionaries D_h and D_l, obtaining four reconstructed images X_{t+1} whose resolution is q^t times that of the original input images.
(c) Build the new training set T_{t+1} and take X_{t+1} as the new LR images to be reconstructed.
(3) End.
(4) Adopt the NLM idea to fuse the images of X_{n+1} into one HR image X_0 at the same scale as the target image.
(5) Add the global reconstruction constraint to satisfy Y = DHX; use the IBP idea and gradient descent to solve (16) and obtain the optimal reconstructed HR image X.
(6) Output the final HR image X.
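Step (5)'s back-projection refinement can be sketched generically. `degrade` and `upscale` below are placeholder operators standing in for the paper's DH and its back-projection; the loop simply feeds the LR-domain residual back into the HR estimate:

```python
import numpy as np

def ibp(X0, Y, degrade, upscale, n_iter=20, step=1.0):
    """Iterative Back-Projection sketch: refine X so the simulated LR image
    matches the observation Y.
      degrade: HR -> LR operator (blur + downsample), stands in for DH.
      upscale: LR-sized error -> HR-sized correction (back-projection kernel).
    Both operators are assumptions for illustration."""
    X = X0.copy()
    for _ in range(n_iter):
        err = Y - degrade(X)            # residual in the LR domain
        X = X + step * upscale(err)     # project the residual back to HR
    return X
```

For example, with 2x2 mean-pooling as `degrade` and pixel replication as `upscale`, the loop drives the simulated LR image to match Y exactly.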

3.5. The Discussion of Reconstruction Speed.
When the NLM method is applied to adjust the reconstructed image in Section 3.3, solving (14) directly costs much computation time, so in this section we accelerate reconstruction by simplifying that step. Setting the derivative of (14) with respect to X(i, j) to zero gives (21):

Σ_k Σ_{(m,n)} w_k(m, n) [X(i, j) − Y_k(m, n)] = 0, (21)

where X(i, j) denotes a single pixel extracted from the HR image. Equation (21) becomes the per-pixel weighted average (22):

X_0(i, j) = Σ_k Σ_{(m,n)} w_k(m, n) Y_k(m, n) / Σ_k Σ_{(m,n)} w_k(m, n). (22)

As described above, this reduces operation time and avoids complicated matrix operations. Meanwhile, steps 2(a), 2(b), and (4) of Section 3.4 can be parallelized, so a GPU can further improve the reconstruction speed; as shown in Section 4.2, we use a GeForce 780M GPU to process the captured images in the camera-system experiments. Moreover, some blocks of the input LR image are smooth with few details, as shown in Figure 4 (the black box). These blocks contain little high-frequency information and contribute nothing to the reconstructed HR image, so this section proposes enlarging them by interpolation, which runs faster. A smooth image block is detected by the variance

V = (1/n) Σ_i (p_i − μ)^2,

where p_i is a pixel value of the image block and μ is the average pixel value of the block. We set a fixed threshold V_0; if V < V_0, we use the interpolation method to reconstruct the block.
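The smooth-block shortcut amounts to a one-line variance test (the threshold value here is an arbitrary illustration, not the paper's setting):

```python
import numpy as np

def is_smooth(block, T0=25.0):
    """A block is 'smooth' when its pixel variance falls below the threshold
    T0; such blocks carry little high-frequency detail and can be enlarged
    by plain interpolation instead of the slower dictionary path."""
    mu = block.mean()
    var = np.mean((block - mu) ** 2)
    return var < T0
```

In a full pipeline this test would gate each extracted block before the sparse-coding step.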

Experimental Results and Analysis
4.1. Simulation Experimentation. In this section, we perform 2x magnification experiments on three test images to validate the effectiveness of the proposed method. These images cover varied content, namely, Lena, flower, and building, as shown in Figure 4. The comparison group includes five other methods: the interpolation-based method [4], the NLM method based on reconstruction [8], Glasner's method [25], Yang's method [23], and Zhang's method [27]. For example, the input LR Lena images are simulated from a 256 × 256 reference HR image that is randomly rotated, translated, and downsampled by a factor of two. The image block size is set to 5 × 5 with an overlap of 4 pixels between adjacent blocks, and the upscale factor s is 2. The gradual enlargement factor q is set to 1.25, n is 3, and the final adjustment factor is 128/125. Although the proposed method shares some ideas with dictionary-based reconstruction [23], it has the following unique features. (1) It constructs larger-scale images step by step with the self-learning pyramid method and does not need to search a large set of external HR-LR images as a training set. (2) The reconstruction result depends on the selected training images, and the absence of the input images' frequency information from the selected HR-LR training set distorts the reconstructed HR image; our method puts reconstructed images of different scales into the training set, providing much real similarity information to ensure the authenticity of the reconstructed HR image. (3) Because the reconstruction results contain fusion noise and may not be at the same scale as the target HR image (Section 3.3), we use the NLM idea to handle both issues and merge them into one HR image, which contains more target-image information. Lastly, we use the IBP idea to meet the global reconstruction constraint.
Figures 5, 6, and 7 are the reconstruction results of the testing images Lena, flower, and building.
We demonstrate the superiority of the proposed algorithm through both the visual quality and the objective quality of the reconstructed HR image. Peak signal-to-noise ratio (PSNR), structural similarity (SSIM), mean squared error (MSE), and mean absolute error (MAE) are employed to evaluate the objective quality of the reconstructed HR image. Their definitions are as follows.
Mean squared error:

MSE = (1/(M × N)) Σ_i Σ_j [f(i, j) − g(i, j)]^2.

Peak signal-to-noise ratio:

PSNR = 10 log10(255^2 / MSE).

Mean absolute error:

MAE = (1/(M × N)) Σ_i Σ_j |f(i, j) − g(i, j)|.

Here f(i, j) and g(i, j) are pixels of the reconstructed HR image and the original HR image, respectively, and M × N is the image size. By these metrics, the proposed method achieves a better reconstructed result, recovering more of the missing high-frequency information and clearer edge details than the other five methods.
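The three full-reference metrics above translate directly into numpy (SSIM is omitted here because its definition involves windowed statistics beyond this sketch):

```python
import numpy as np

def mse(f, g):
    """Mean squared error between two images."""
    return np.mean((f.astype(float) - g.astype(float)) ** 2)

def psnr(f, g, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    e = mse(f, g)
    return float('inf') if e == 0 else 10 * np.log10(peak ** 2 / e)

def mae(f, g):
    """Mean absolute error between two images."""
    return np.mean(np.abs(f.astype(float) - g.astype(float)))
```

Higher PSNR and lower MSE/MAE indicate a reconstruction closer to the reference HR image.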
To further assess the visual quality obtained by the different methods, we compare their reconstructed image details with the original HR image, as shown in Figures 5, 6, and 7. Contrasting the eye and the hat brim in Figure 5, the result of the interpolation-based method is blurrier, and the NLM method's result shows distortion and mosaic artifacts. In contrast, the proposed method causes neither distortion nor blur, and its reconstructed details are more delicate than those of the other algorithms. In Figures 6 and 7, the proposed method also reproduces details such as the eaves and the flower border more accurately and more pleasingly than the other superresolution algorithms.

4.2. Camera System Experimentation.
In this section, we conduct real-scene reconstruction experiments to prove the superiority of the proposed algorithm. The camera system, assembled by ourselves, consists of a microdisplacement lens, a PI turntable, and a GeForce 780M GPU. By controlling the PI turntable, we continuously capture multiple real images, one of which is shown in Figure 8(a). These images are reconstructed by our method, with the result shown in Figure 8(f). Figures 8(b)-8(e) are the reconstruction results of the NLM method based on reconstruction [9], Glasner's method [25], Yang's method, and Zhang's method [27]. Signal-to-noise ratio (SNR), average gradient (AG), information entropy (IE), and standard deviation (SD), which do not require a reference HR image, are employed to evaluate the objective quality of the reconstructed real HR image. SNR is defined as μ / STD_max, where μ is the image mean value and STD_max is the maximum value of the local standard deviation; SD is the root of the mean squared deviation of pixel values from μ over the M × N image; IE is computed over the image gray levels l, where p_l is the ratio of pixels with gray value l to all image pixels. The average gradient represents clarity, information entropy indicates the abundance of image information, and standard deviation represents the dispersion of the image gray levels: when SNR, SD, IE, and AG are larger, the reconstructed image is clearer and contains richer information. The numeric results of the proposed method are better than those of the other algorithms, as shown in Table 2. Visually, comparing the contour outlines of the reconstructed image details, the image in Figure 8(f) is significantly clearer than the other reconstructed images. Consequently, the proposed method works well in the camera-system experiments and is capable of reproducing plausible details and sharp edges with minimal artifacts.

Conclusion
It is known that, in superresolution reconstruction, reconstructed images are distorted when the frequency information of the target images is not contained in the training images. To solve this problem, this paper proposes a multiframe superresolution reconstruction algorithm based on a self-learning dictionary. First, images at a series of scales created from multiple input LR images are used as training images.
Then, a learned overcomplete dictionary provides sufficient real image information, ensuring the authenticity of the reconstructed HR image. In both simulation and real experiments, the proposed algorithm achieves better results than the other algorithms, recovering missing high-frequency information and edge details more effectively. Finally, several ideas for higher computation speed are suggested.

Figure 1: Simple diagram of general dictionary-based reconstruction algorithm.

Figure 2: Process of training images.

Figure 3: Process of constructing dictionary and reconstructing image (the upscale factor is ).

Figure 4: The input LR images. (a) is one of several Lena images. (b) is one of several building images. (c) is one of several flower images.
