A l 1 Norm Based Image Prior Combination in Multiframe Superresolution

We address themultiframe superresolution problemusing the variational Bayesianmethod in this paper. In the variational Bayesian framework, the prior is crucial in transferring the ill-posed reconstruction problem to a well-posed one. We propose a prior combination method based on filter bank and l1 norm. Multiple filters are used in our prior model, and the corresponding combination coefficient vector can be estimated by the characteristics of the filtered image andnoise. Furthermore, the local adaptive coefficients of every filter are more effective in removing noise and preserving image edges. Extensive experiments demonstrate the advantages of the proposed method.


Introduction
Spatial resolution is a crucial parameter to indicate the image quality.Images with high spatial resolution have more distinguishable details and are usually desirable in real applications.In this paper, we address the multiframe superresolution (SR) problem.It uses a sequence of undersampled, shifted, rotated, and noisy low resolution (LR) observation images of the same scene to reconstruct a high resolution (HR) image by the algorithm.It has been widely used in remote sensing [1], surveillance [2], medical imaging [3], and astronomy imaging [4].
The superresolution method has been developed for more than three decades [5].Although the literature is rich (see [6,7] for reviews), it is still an open and wildly investigated topic.The superresolution concept was first proposed by Tsai and Huang in the frequency domain [5].However, these frequency approaches [8] have difficulty in including spatial prior knowledge.Therefore, methods in spatial domain have become more popular in recent years due to their convenient incorporation of prior knowledge.These methods fall into two categories: deterministic approach, such as projection onto convex sets (POCS) [9] and iterative back projection (IBP) [10]; stochastic approach, including maximum likelihood (ML) [11], maximum a posterior (MAP) [12][13][14], and Bayesian method [15][16][17].Compared with the deterministic approach, the stochastic approach has more flexibility in modeling the imaging formation process and the prior knowledge [5], and using the hierarchical Bayesian method, both the high resolution image and model parameters can be estimated simultaneously and automatically.So in this paper, we use the Bayesian framework to address the superresolution problem.
Using the Bayesian theorem, the prior knowledge of the desired HR image and the LR observation data are combined to get the posterior of the HR image and the parameters.Using the MAP principle, the HR image and the parameters maximize the posterior.However, for many prior models, the posterior is intractable mathematically.This can be efficiently solved by the variational Bayesian method, which was first introduced into superresolution by Babacan et al. [16].To suppress the noise, the prior models are designed to constraint the high spatial frequency component in the HR image.However, important edges and textures are also high frequency signals.So the prior model is very critical to keep a balance between removing noise and preserving edges and textures.How to choose the appropriate prior model is still an open question.The simultaneous autoregressive model (SAR) prior [18] and the total variation (TV) prior [19] were first used in the variational Bayesian framework [16].Later the  1 prior [20] was introduced and proven to have better performance.In [21], the combination of the sparse prior and nonsparse SAR prior was proposed, but the combination coefficient had to be determined by the hand empirically.In [17], the author suggested a nonstationary image prior combination method based on a general combination of spatially adaptive image filters, and the combination coefficient can be estimated automatically.However, this method favors oversmooth image estimates because of the heavy penalty of the  2 norm in the prior model.Also the reported PSNR values are low in lots of filter combinations.Based on the above problem, we proposed a prior combination method based on filter bank and  1 prior.The high resolution image is reconstructed according to the characteristics of the filtered image, and the locally adaptive property is guaranteed by the majorization-minimization (MM) approximation of the  1 prior.Furthermore, our work is useful in exploring the superresolution limit performance with respect to the prior model in the variational Bayesian framework.Extensive experiments demonstrate the advantages of the proposed method.
The paper is organized as follows.In Section 2, we formulate the underlying mathematical framework for variational Bayesian superresolution.In Section 3, we present the variational Bayesian SR algorithms using prior combination based on filter bank and  1 norm.Experimental results are given in Section 4. Finally, Section 5 concludes the paper.

Problem Formulation
2.1.Imaging Model.The imaging process is based on a mathematical model that describes the physical process of obtaining  LR images y  ,  = 1, . . ., , from the latent HR image x we desire.We adopt matrix-vector notation such that each y  and x are arranged as  × 1 and PN × 1 vectors in lexicographic order, where the integer  > 1 is the factor of increase in resolution, and √  is the magnification factor.The imaging process (Figure 1) consists of warping, blurring, and downsampling, which is modeled as [22,23] where n  is the additive observation noise, D is the  × PN downsampling matrix, H  is the PN × PN blurring matrix, and M  is the PN × PN warping matrix, encoding the motion of y  with respect to reference coordinate defined by x.In this work, we assume H  is known and decided by the linear space invariant point spread function (PSF), and M  (s  ) is generated by the motion vector s  = (s 1  , s 2  , s 3  ) = (  ,   ,   )  , where   is the rotation angle and   and   are horizontal and vertical translation pixels, respectively.We denote the motion vectors by s = {s  }  =1 .The effects of downsampling, blurring, and warping matrix are combined into a  × PN degradation matrix B  , describing the generation of LR frames from the HR image.The joint imaging model for all LR frames is given by where y = (y where

Hyperprior of the Hyperparameters.
We call the parameter in the imaging model and image prior model as hyperparameter, and they form the hyperparameter set Θ = { 1 , . . .,   ,  1 , . . .,   }.Hyperprior of the hyperparameter gives the prior knowledge on the different hyperparameter  ∈ Θ, which will be modeled using where () is the Gamma hyperprior where  > 0,   > 0,   > 0, Γ(⋅) is the Gamma function, and the expectation () =   /  .The Gamma hyperpriors are used because they are the conjugate priors [24] for the variance of the Gaussian distribution and convenient to effectively incorporate the prior knowledge.

Prior of the Motion Vectors.
In the preprocessing step, the initial motion vectors are useful to accelerate the convergence rate of the SR algorithm.For example, using the algorithms reported in [25], we can acquire an initialization of s  , denoted by s   , from the LR images.However, this initialization is usually inaccurate.As in [16], we use the Gaussian distribution to model the prior of this initialization, where Λ   is the covariance matrix to describe the uncertainty of the initialization.

Variational Bayesian Inference
In this section, we use the variational Bayesian method in [16,24] to infer the HR image x, hyperparameters Θ, and the motion vector s.The Bayesian inference will be based on posterior probability distribution where (y) has fixed value for given LR observation images y.
Express the joint probability distribution (x, Θ, s, y) in (10) in terms of the hyperprior (Θ), the prior of the motion vectors (s), the image prior (x | ), and the imaging model Since (x, Θ, s | y) is intractable due to the fixed unknown (y) and the  1 norm in the image prior, as in [16], we apply variational methods to approximate (x, Θ, s | y) by the simpler distribution (x, Θ, s) by minimizing the Kullback-Leibler (KL) divergence, which is given by The KL divergence is nonnegative and equals zero only when (x, Θ, s) = (x, Θ, s | y).Substitute (x, Θ, s | y) in ( 12) with (10); we have , Θ, s, y) ) x Θ s To further make the computation of  KL ((x, Θ, s) ‖ (x, Θ, s | y)) tractable, we assume that (x, Θ, s) can be factorized as Due to the  1 norm in the image prior, we still cannot calculate the KL divergence.We resort to the majorizationminimization (MM) approach [26] to overcome this difficulty.Utilizing the inequality which states that, for real numbers  ≥ 0 and  > 0, we can make the minimization of ( 13) tractable.For the image prior model in (5), let  = x  (), and  = √   ; using (15), then where w  is a PN × 1 auxiliary vector with   > 0 as its th component.w  is the local adaptive coefficient for the th filtered image.Using ( 16), a lower bound of the joint distribution (x, Θ, s, y) can be found Substitute (x, Θ, s, y) in ( 13) with (x, Θ, s, y, w  ) and use (17); we have Using ( 18), we have min where the last equality holds because the solution of the minimization problem (19) is not effected by the fixed valued log((y)) even if we do not know the exact value of it.Since inequality (19) holds for any distribution (x, Θ, s) and auxiliary variable {w  }, we can solve the problem min (x,Θ,s)  KL ((x, Θ, s) ‖ (x, Θ, s, y, w  )) to get an approximation minimum of  KL ((x, Θ, s) ‖ (x, Θ, s | y)).We circularly update {w  } and the factors of (x, Θ, s) in (14) given by the minimum of  KL ((x, Θ, s) ‖ (x, Θ, s, y, {w  })) with other distributions held constant.For fixed {w  }, to solve the problem min (x,Θ,s) the standard solutions of variational Bayesian methods [24] can be used to estimate the unknown distribution by where  \ [⋅] denotes expected value with respect to all stochastic variable (i.e., {x, Θ, s}) with  ∈ {x, Θ, s} removed.For fixed (x, Θ, s), we differentiate  KL ((x, Θ, s) ‖ (x, Θ, s, y, {w  })) with respect to {w  } to find the minimum to tighten the upper bound in (19).By this procedure, we get a sequence of distribution {  (x, Θ, s)} and auxiliary variable {w   }, which satisfy where the first inequality holds because  +1 (x, Θ, s) is the solution of the problem min (x,Θ,s)  KL ((x, Θ, s) ‖ (x, Θ, s, y, {w   })), and the second one holds because The proximity of the estimated posterior distribution (x, Θ, s) = lim →∞   (x, Θ, s) to the true posterior (x, Θ, s | y) is still an open question.However, the experiments demonstrate that it is a good approximation.The following (A), (B), (C), and (D) give the estimation of the unknown distributions and auxiliary variable.
(A) Estimation of the HR Image Distribution.From ( 21), the distribution of HR image x is which is Gaussian distribution where the mean and inverse variance are where In this paper, for a random variable , ⟨⟩ is the mean value of it, and ⟨()⟩  is the mean value of () with respect to ; for a vector a, diag(a) generates a square diagonal matrix with the elements of vector a on the main diagonal; for a matrix A, diag(A) generates a column vector using the main diagonal elements of A.
and, as a result, ) is nonlinear with respect to s  , we use the first-order Taylor approximation at ⟨s  ⟩ as in [16].For the above estimated Gaussian distribution (x) = N(x | ⟨x⟩, Σ x ) of high resolution image x, the mean value ⟨x⟩ maximizes the posterior and is the optimal estimation of x.The algorithm is summarized in Algorithm 1.

Experimental Setup.
This section presents experimental results of our proposed method on simulated and real images and compares it with (1) bicubic interpolation (BBC), that is, bicubic interpolation of the LR reference frame, (2) the variational SR method using simultaneous autoregressive model (SAR) prior [16] (denoted by SAR), (3) the variational SR method using TV prior model [16] (denoted by TV), and (4) the nonstationary image prior combination method [17] with the filter combination NF3, because it has the best performance in the combination selection (denoted by NS NF3).It is easy to see that the  1 prior method in [20] is a special case of our proposed method.We have not compared our method with the one in [21], because how to determine the combination coefficients is not given in [21].
For the sake of comparison, the filters in Section 2.2 and the filter combinations are mainly designed as the ones in [17] except the Haar filters.The image filters used in this paper are the first-order difference (f.o.d.) filters f 1 , f 2 , f 3 , and f 4 , the second-order Laplacian filter f 5 , the second-order difference filters f 6 , f 7 , f 8 , and f 9 , the Prewitt filters f 10 and f 11 , the Sobel filters f 12 and f 13 , and the Haar filters f 14 , f 15 , and f 16 .All the corresponding convolution kernels are listed below: , , , In our method, we use the combinations: NF2 = {f 1 , f 2 }, NF3 = {f 1 , f 2 , f 5 }, and NF4 = {f 14 , f 15 , f 16 }, and our method with these combinations is denoted by NF2, NF3, and NF4, respectively.Our proposed method with NF2 is equivalent to the  1 prior method in [20].
The initial values of the algorithm are chosen as follows: a noninformative hyperprior model assumption, that is,   =   = 0,  ∈ Θ.For the motion vectors, the initial values are estimated using the LK method [25], assuming that (Λ   ) −1 is equal to zero matrix.The HR image estimate is initialized using bicubic interpolation of the reference frame.With the initialized HR image, the parameters w  ,   , and   are initialized using the formula in (31) and (35) with the expectation operator removed.

Assessment of SR Algorithms.
For experiments with simulated images, the objective methods have been used to assess the results of SR algorithms.With ground truth HR images, peak signal-to-noise ratio (PSNR) and structural similarity measure (SSIM) are used to assess the reconstructions of the SR algorithms.PSNR is defined as SSIM is defined as where x is the ground truth grayscale HR image, from which the sequence of LR images is generated, x is the reconstructed HR image using the SR algorithms, and  1 ,  2 are constant.
In this paper, we set  1 =  2 = 0.01. x and  x are the mean of the component of vectors x and x, respectively. x and x are the standard deviations of the associated images, and  xx is defined as In this paper, high resolution images are all reconstructed iteratively for superresolution methods mentioned in this paper except the BBC.And for experiments with real images, the ground truth images are not available.So we choose the reconstruction images empirically such that they produce the visually most appealing results in the iteration process.

Experiments with Simulated
Images.The simulated images are generated using the grayscale high resolution images in Figure 2.For every image, synthetic sequences of 5 LR images are generated according to (1), that is, the following step: (1) warp, including transition and rotation, the transitions are pixels, respectively, and the rotation angles are (2) blur using isotropic 3 × 3 Gaussian PSF with standard deviation 1; (3) downsample the row and column of the image by factor of √  = 2; (4) add independent identically distributed (i.i.d.) Gaussian noise of SNR (signal noise ratio): 10 dB, 15 dB, 20 dB, 25 dB, and 30 dB; SNR is defined as where  x and  n are the standard derivation of the HR image and noise, respectively.We conduct the simulations with 10 different noise realizations at each SNR, and the average and variance of the performance are reported.
For our proposed method with different filter combinations, the performance comparisons in terms of PSNR and SSIM of the reconstructed image are shown in Figures 3, 4, and 5.Here we take Figure 2(b) as an example.The filter combinations in Figure 3 are coupled by the distance between the positive and negative values of the filter, distance 1: (1) {f 1 , f 2 }, (2) {f 14 , f 15 , f 16 }; distance √ 2: (3) {f 3 , f 4 }; distance 2: (4) {f 10 , f 11 }, and (5) {f 12 , f 13 }.From Figure 3, we can observe that the performance degrades as the distance increases.Figure 4 shows the reconstruction performance for the second-order filters combinations, distance 1: (1) {f 6 , f 7 }, distance √ 2: (2) {f 8 , f 9 } and (3) {f 5 }.The performance degrades as the distance increases, but {f 5 } has better performance than {f 8 , f 9 }. Figure 5 shows the performance comparison when combining the first-and second-order filters, the performance is mainly decided by the first-order filters, and the addition of the second-order filters slightly degrades the performance.
Figure 6 shows the performance comparison in terms of PSNR and SSIM for the images presented in Figure 2 with different methods.It is observed that the proposed method has the highest PSNR and SSIM for all images with all SNR and this performance advantage is more obvious for low SNR.For our method, the best performance is achieved by filter combination of NF2 and NF4.In the following comparison, only NF2 and NF4 are used for our method.
To further compare the performance of different method, we set up another experiment.Here, we use the synthesized sequence EIA from the MDSP benchmark dataset [27].Detailed description about this synthesized sequence is available in [14].We approximate the PSF using isotropic 3 × 3 Gaussian function with standard deviation 1, first 15 frames are used to reconstruct the HR images, and the first frame in the sequence is set as the reference frame.Figure 7 shows the HR reconstructions for different method.It is observed that our method has best overall performance.Ringing artifacts can be found for the SAR.The NS NF3 has the best noise removing performance, but there are some unnatural artifacts around the edges, for example, in the red circle domain of the Figure 7(e).Compared to TV, our proposed method has better noise removing performance.So our proposed method has better trade-off between preserving the edges and removing noise and artifacts.

Experiments with Real Images.
In this section, we demonstrate the performance comparison of our proposed method with real images.The color image sequence car is downloaded from the website [28].From Figures 8(a) and 9(a), we can observe that it is a challenging sequence because the LR car images are severely degraded by the blur and unknown noise model.We reconstruct each color channel separately using the superresolution algorithm.We approximate the PSF using isotropic 3 × 3 Gaussian function with standard deviation 1, and the first frame in the sequence is set as the reference frame.
For the car sequence, Figures 8 and 9 show the reconstructions with magnification factor √  = 2, using  = 5 and  = 15 LR frames, respectively.As expected, it can be observed that the reconstructions are better with 15 than 5 LR frames.From Figure 8, we can clearly see that the proposed method has better performance: the texts on the window of car are oversmoothed by the NS NF3 method; by the red circle inserted in Figure 8(c), we can see the ringing artifact produced by the SAR method; near the vertical boundary of Figure 8(d), we can observe the unnatural and obvious artifacts produced by the TV method; also compared to TV and SAR method, our method has better noise removing performance with crisper edges.Although the visual performance difference in Figure 9 is not obvious as that in Figure 8, similar conclusions can be acquired by careful observation: especially, the NS NF3 method oversmooths the texts on the window of the car; the artifacts near the vertical boundary in Figures 9(c) and 9(d) are obvious for SAR and TV method.
Although the NS NF3 method has the best noise removing performance, the edges are also oversmoothed by the  2 norm in the prior model, especially for the low number of frames when the prior has a strong effect on the reconstructed HR image; this can be observed by comparing Figures 8(e) and 9(e).To sum up, by the results in Figures 8 and 9, our proposed method has better trade-off between preserving the edges and removing the noise and artifacts.

Conclusions
In this paper, a prior model combination method, or a class of prior model based on filter bank and the  1 norm, has been proposed for the variational Bayesian superresolution method.High resolution images and all parameters can be estimated automatically and cumbersome parameter-tuning by hand is no longer necessary.Example filters have been designed and used in the proposed method, and the proposed method outperforms the state of art methods in simulated and real data experiments.For the proposed method with different filter combination, filter combinations with small distance have better performance.For all image scenarios with different SNR, the proposed method has the highest PSNR and SSIM values and this performance advantage is more obvious for low SNR.By the visual effect in the simulated and real experiments, it is observed that the proposed method has better trade-off between preserving the edges and removing the noise and artifacts.
, and (⋅)  is the transposition operator.In this work, we assume that the additive noise n  of LR frame is zero-mean white Gaussian noise with inverse variance (precision)   , and the conditional distribution ofy  is  (y  | x,   , s  ) ∝  /2 [17]he th component of the vector x  , F  is the PN × PN convolution filter matrix, which is decided by the convolution kernel, and ‖ ⋅ ‖ 1 is the  1 norm of the vector.For a vector a = ( 1 ,  2 , ...,   )  , ‖a‖ 1 = ∑  =1 |  |. (  ) is the partition function that we approximate as a group of high-pass filter, such as Laplacian filter and some edge detection filters.Using the filter band, high spatial frequency signals in the reconstructed image are extracted and penalized by the  2 norm for prior model in[17]and by  1 norm for our proposed one.As is well known, for the large amplitude high spatial frequency component, the  2 norm penalizes it heavily than the  1 norm, which means steep edges and textures are oversmoothed by the  2 norm.Also the image prior model can be considered as regularization term and, as is well known, the  1 norm favors a more sparse solution than the  2 norm.And the sparsity of the high spatial frequency signals means crisp edges and local smoothness.The nonsparse property of the solution induced by  2 norm usually means blur edges, especially for low number of LR images.Experiments demonstrate that good performance can be achieved using appropriate filter combinations. contains Given the values    ,    ,    ,    , s where    ,    ,    , and    are the parameters of the hyperpriors.Substitute the first equality of (31) into (32); we get (  ) ∝     −1+PN/2  exp (−  (   + PN ∑ =1 √   )) , (34)