Content-Aware Compressive Sensing Recovery Using Laplacian Scale Mixture Priors and Side Information

Nonlocal methods have shown great potential in many image restoration tasks including compressive sensing (CS) reconstruction through use of image self-similarity prior. However, they are still limited in recovering fine-scale details and sharp features, when rich repetitive patterns cannot be guaranteed; moreover the CS measurements are corrupted. In this paper, we propose a novel CS recovery algorithm that combines nonlocal sparsity with local and global prior, which soften and complement the selfsimilarity assumption for irregular structures. First, a Laplacian scalemixture (LSM) prior is utilized tomodel dependencies among similar patches. For achieving group sparsity, each singular value of similar packed patches is modeled as a Laplacian distribution with a variable scale parameter. Second, a global prior and a compensation-based sparsity prior of local patch are designed in order to maintain differences between packed patches. The former refers to a prediction which integrates the information at the independent processing stage and is used as side information, while the latter enforces a small (i.e., sparse) prediction error and is alsomodeledwith the LSMmodel so as to obtain local sparsity. Afterward,we derive an efficient algorithmbased on the expectationmaximization (EM) and approximate message passing (AMP) frame for the maximum a posteriori (MAP) estimation of the sparse coefficients. Numerical experiments show that the proposed method outperforms many CS recovery algorithms.


Introduction
Compressive sensing (CS) [1,2] allows us to reconstruct highdimensional data with only a small number of random samples or measurements, if the original signal can be sparsely represented by some given appropriate basis.Owing to the fact that image prior knowledge plays a critical role in the performance of compressive sensing reconstruction, much efforts have been made to develop an effective regularization term or signal model to reflect the image prior knowledge.Standard CS methods exploit the sparsity of signal in some domains, such as DCT [3], wavelets [4,5], total variation (TV) [6,7], and learned dictionary [8,9].Unfortunately, these methods are less appropriate for many imaging applications.The reason for this failure is that natural images do not have an exactly sparse representation in any above basis.These models favor piecewise constant image structures and hence tend to smooth much the image details.
More recently, the concept of sparsity has evolved into various sophisticated forms, including group sparsity [10,11], tree sparsity [12][13][14], and nonlocal sparsity [15][16][17][18][19], where higher-order dependency among sparse coefficients is exploited.Among them, nonlocal sparsity, which refers to the fact that a patch often has many nonlocal similar patches to it across the image, has been shown to be most beneficial to CS image recovery.In [15], a nonlocal total variation (NLTV) regularization model for CS image recovery is proposed by using the self-similarity property in gradient domain.In order to obtain an adaptive sparsity regularization term for CS image recovery process, a local piecewise autoregressive model is designed in [16].In [17], similar patches are grouped to form a two-dimensional data matrix for characterizing the low-rank property, leading to a CS recovery method via nonlocal low-rank regularization (NLR-CS).In [18,19], a probabilistic graphical model is established, which uses collaborative filtering [20] to promote sparsity of packed patches.
Despite the steady progress in nonlocal methods, they still tend to smooth the detailed image textures, degrading the image visual quality, for the reason that the lack of selfrepetitive structures and corruption for data is unavoidable.
To deal with this issue, local and global priors are designed to soften and complement the nonlocal sparsity for irregular structures so as to preserve image details.More specifically, the nonlocal sparsity is only imposed on a set of patches with limited influence from neighboring pixels, while the global prior refers to a prediction used as a reference which integrates the outcomes at the independent processing stage and can maintain the entire consistency of image; moreover, a compensation-based constraint term of local patch is utilized to enforce a small (i.e., sparse) prediction error.Both local sparsity and nonlocal sparsity are represented by Laplacian scale mixture (LSM) [21,22] models, which are adopted to force coefficients, that is, singular values of local patches and similar packed patches, to be sparse.Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, resulting in weighted singular value minimization problems, where weights are adaptively assigned according to the signal-to-noise ratio.On the other hand, the reference image can be used as side information.Finally, we obtain a side informationaided LSM prior model for CS image reconstruction.To solve this model, the expectation-maximization (EM) [23] method is adopted, turning the CS recovery problem into a prior parameter estimation problem and a singular value minimization problem.In particular, owing to its promising performance and efficiency, we are motivated to apply the approximate message passing (AMP) algorithm [24,25], which is an iterative algorithm that can be used in signal and image reconstruction by performing denoising at each iteration, to solve the latter.Experimental results on natural images show that our approach can achieve more accurate reconstruction than other competing approaches.

CS Recovery Problem.
The CS recovery problem aims to find the sparsest solution  ∈ C  from the underdetermined linear system  =  + , where  ∈ C  are the measurements,  ∈ C × ,  <  is the measurement matrix, and  denotes the additive noise.One can solve the following objective function: The first term is the data fidelity term that represents the closeness of the solution to the measurements.The second term is a regularization term that represents a priori sparse information of the original signal. is a regularization parameter that balances the contribution of both terms.As mentioned in Introduction, CS recovery methods exploit the sparsity of signal in some domains, such as DCT [3], wavelets [4,5], learned dictionary [8,9], and total variation (TV) [6,7], leading to various forms of R() : ‖DCT()‖  , ‖Ψ()‖  , and ‖Dic()‖  where  ∈ {0, 1} and ‖‖ TV , respectively.

Nonlocal Sparsity.
The abundance of self-repeating patterns in natural image (as shown in Figure 1) can be characterized by the nonlocal sparsity [16,17].As shown in Figure 2

The Approximate Message Passing
Algorithm.On the other hand, these minimizing problems (e.g., (1) and ( 2)) can be solved easily by iterative shrinkage/thresholding (IST) methods [26][27][28], alternating direction method of multipliers (ADMM) [29,30], or Bregman iterative algorithms [31,32].The approximate message passing reconstruction algorithm defined by Donoho et al. [24] has recently become a popular algorithm for solving signal reconstruction problems in linear systems as defined in (1).It is based on the theory of belief propagation in graphical models.Unlike belief propagation that needs to calculate 2 messages in each iteration, by employing quadratic approximation, the expression of each message can be simplified in the AMP algorithm, and the number of messages can be reduced to  +  [33].The final alternating expressions to solve the objective function min  ‖ − ‖ 2  2 /2 + ‖Ψ()‖ 1 are where Ψ(⋅) denotes wavelet transforms;  () and  () are the estimates of  and the residual at iteration .The iteration starts from  (0) = 0 and  (0) = . * is the conjugate transpose of .The functions (⋅) and   (⋅) are the wavelet threshold function and its first derivative, respectively.The last term in (4) is called the "Onsager reaction term" [24] in statistical physics.This Onsager reaction term helps improve the phase transition (trade-off between the measurement rate and signal sparsity) of the reconstruction process over existing IST algorithms [26][27][28].We can summarize the AMP algorithm in three steps: a residue update step (i.e., (4)), a back-projection step to yield a noisy image  () =  () + *  () ,  and a proximal denoising correction (i.e., (3)).These steps are identical to the ones of IST methods.As a result, the AMP algorithm can be viewed as a special IST method.One benefit of this interpretation is that if R() varies, only the proximal denoising operator needs to be altered, when the AMP algorithm is utilized to solve (1).Some AMP variants [13,34,35] have been proposed with various forms of R(), such as total variation [34], a Cauchy prior in the wavelet domain [35], and tree sparsity [13].This motivates us to find a more suitable prior for natural images to improve the AMP algorithm.

Side Information-Aided LSM Prior Modeling
The distribution in ( 5) is defined as a Laplacian scale mixture.Note that, for most choices of (  ), we do not have an analytical expression for (  ).

The Proposed Model.
In this section, we formulate the side information-aided LSM prior model and apply the MAP theory to estimate the original signal.As discussed earlier, the global information of image refers to a prediction that is actually a reference image.We use this reference image denoted by  as side information which is supposed to be known in each iteration to assist the reconstruction.
Using Bayesian formula, we might derive the following MAP estimation problem: The side information  and the measurements  are supposed to be independent.According to (6), we need to define three terms ( | ), ( | ), and () for the MAP estimation.First, the additive noise  is assumed to be white Gaussian with variance  2 , that is,  ∼ (0,  2 ).Thus, we have the following likelihood of CS: Second, we characterize the nonlocal sparsity of image with the Laplacian scale mixture model.Recall that, in Section 2, we define   = [ ,1 ,  ,2 , . . .,  , ] as the singular value vector of the low-rank matrix   .For each coefficient  , , we assign it a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter, that is,  , ∼ Gamma(, ).Note that the mean of the Laplacian distribution is 0. Hence, the mean subtraction should be carried out as shown in Figure 2. The observation that the singular values can be modeled by a Laplacian distribution has been proposed and validated in [36].Here, we extend this idea by viewing scale parameters as random variables for achieving a better spatial adaptation.Assume that the sparse coefficients are i.i.d., and then the LSM prior of the coefficients can be expressed as () = ∏  =1 ∏  =1 ( , ).Third, for irregular structures, the packed patches are less similar.We wish to preserve the relative difference among these patches.Supposing that the reference image u is available, we wish to get a solution that has a small Euclidean distance from u while being constrained by the self-similarity prior: where   and   denote the patches located at the th position and  is the total number of patches.In practice, the reference image  is produced by minimizing the data fidelity term with the gradient descent method [26].Note that the resulting image  has fine details as well as a small amount of noise; thus the noise level might be a little higher than the estimated one.The image  which integrates the outcomes at the independent processing stage can represent the global information of image.This approach is similar to the one reported in [37], which is a denoising method.It reduces the local-global gap by encouraging the overlapping patches to reach an agreement before they merge their forces by the averaging.
At last, when the packed patches have an obvious dissimilarity, a local constraint imposed on each local patch is added by enforcing a small (i.e., sparse) prediction error   =   − V  , where V is the noiseless version of the image .To get V, the mean filtering is applied.The singular value vector of the prediction error of the th patch is denoted by where with a Gamma distribution prior over the scale parameter, that is,  , ∼ Gamma(, ). is the total number of singular values of the th patch.The proposed hierarchical model can be summarized in Figure 3.We will have an objective function that can be maximized with respect to , if we observe the latent variable  and .The standard approach in machine learning when confronted with such a problem is the EM algorithm.Note that once the spare coefficients (i.e.,  and ) are determined, the image  can be obtained by averaging all reconstructed patches.

EM-AMP Algorithm for CS Recovery
In this section, we simultaneously learn the hidden parameters and do inference.To accomplish this task, we embed the AMP algorithm within an EM framework.2 ) + log  () .
Performing coordinate descent in the auxiliary function (Θ, ) leads to the following updates that are usually called the E step and the M step: :  (+1) = arg min   (Θ (+1) , ) .
The E step and the M step represent the prior learning and the signal reconstruction, respectively.First, we have equality in the Jensen inequality if () = ( | ) and () = ( | ), which implies that the E step can be reduced to      ) . (16)

Image Recovery via the AMP Algorithm.
Once the prior parameters are estimated, the composite regularization problem, that is, (15), can be solved with various CS reconstruction algorithms [29][30][31][32].Owing to its promising performance, the AMP algorithm [24,25] is employed.As discussed earlier, the proximal operator varies according to the regularization term R().Hence we have the following proximal operator ℓ( () ) on the basis of (15): where  () =  () +  *  () denotes a noisy image yielded by back-projection and ℓ( () ) is a denoising operation.To solve this composite regularization problem, that is, (17), we decompose it into two simpler regularization subproblems by using the composite splitting algorithm (CSA) [7], which includes three steps: In order to solve (18), we transform the data from spatial domain to SVD domain for the unity of variable representation by (1) dividing the images into overlapped patches; (2) building the low-rank matrices   =    and  ()  =   ( () +  2  () / 2  ), and then we have ‖ − ( , where  is a scale factor for balancing the increasing energy caused by the repetitive computation because of the overlap division and ‖ ⋅ ‖  is the Frobenius norm [17]; (3) performing the singular value decomposition on   and  ()   to get the singular value vectors   = [ ,1 ,  ,2 , . . .,  , ] and   = [ ,1 ,  ,2 , . . .,  , ].Based on the von Neumann trace inequality [38], we know that tr(( ()   )    ) achieves its upper bound ∑  =1     .Then we have By incorporating ( 20) into ( 18), it yields Taking a derivative with respect to  , , we have which is the global optimum of  , .The noise variance  2 is obtained by maximum likelihood estimation [39]; that is,  2 = ‖ () ‖ 2 2 /.Afterward, we obtain the matrix constructed by similar patches, that is,  (+1)  =   diag( (+1)  )   , and then recover  (+1) 1 by averaging all reconstructed patches.Similarly, we can derive the global optimum of another subproblem by incorporating (20) into (19): where   is the singular value vector of the th patch.Then  (+1) 2 can be recovered by aggregating all reconstructed patches.Because  is the singular value of the prediction error, the reconstructed image solved by this subproblem is the prediction error in fact.Thus, we add V to obtain the final image  (+1) 2 .
To apply the AMP algorithm, computing the Onsager term  () ‖ℓ  ( () )‖ 1 /, which involves computing the derivative ℓ  ( () ), is required.It is not easy to get ℓ  ( () ), since ℓ( () ) do not have an explicit input-output relation.Thanks to the Monte Carlo (MC) method [40], we can simulate ℓ  ( () ) with random numbers.This method has been used to estimate the derivative of BM3D denoiser in [18].More details can be found in [18].

Content-Aware Strategy.
In order to preserve fine details, we distinguish irregular structures from regular structures through the similarity of packed patches.The local regularization and side information-aided term are only utilized for irregular structures.They are adopted to maintain differences among packed patches.We use the normalized mean squared error (NMSE) as the similarity measure: where x is an exemplar patch located at the th position and   denotes the th patch in the similar patch group.The objective function ( 17) is then modified to where   and   denote the weights that correspond to the th patch and ⊙ is the Hadamard product.  and   are determined by where TH1 and TH2 (TH2 > TH1) are two threshold parameters.First, if the patches are similar, that is,  < H1, only the nonlocal regularization is utilized.Second, if the packed patches are not so similar, that is, TH1 <  < TH2, the side information-aided term is added.At last, when the patches have an obvious dissimilarity, that is,  > TH2, the local regularization is superadded.
In general, the proposed algorithm is summarized in Algorithm 1, named as the AMP algorithm with side information-aided LSM priors (SI-LSM-AMP).We also propose an algorithm as a by-product, named as the AMP algorithm with nonlocal LSM prior (NL-LSM-AMP), which is a special form of SI-LSM-AMP by setting  < TH1 for all image patches.By comparing SI-LSM-AMP with NL-LSM-AMP in the experiments, one can validate the effectiveness of our content-aware strategy.

Experiments
In this section, we report the experimental results of the proposed CS recovery method SI-LSM-AMP.

Experiment Setup.
We generate the CS measurements by randomly sampling the Fourier transform coefficients of test images; that is,  is partial Fourier transform with  rows and  columns.Thus, the sampling ratio is /.We follow the sampling strategy of previous works ( [7,12]), which randomly choose more Fourier coefficients from low frequency and less on high frequency and set the sampling ratio near to 0.2, as CS imaging is always interested in low sampling ratio cases.All measurements are mixed with Gaussian white noise with standard deviations 5 and 15, representing the environments with low noise and high noise, respectively.Peak Signal-to-Noise Ratio (PSNR) is used for quantitative evaluation.
We conduct experiments on 22 natural images, as shown in Figure 4.Besides the NL-LSM-AMP algorithm, we compare the SI-LSM-AMP algorithm with image reconstruction algorithms, including the original AMP method [24], two tree-based algorithms: WaTMRI [12] and Turbo-AMP [13], and two nonlocal sparsity algorithms: NLR-CS [17] and BM3D-AMP [19].For fair comparisons, all codes are downloaded from the authors' websites.All algorithms terminate after 50 iterations, except 20 iterations for Turbo-AMP and 260 iterations for NLR-CS.For the WaTMRI algorithm, in order to achieve the best result, the regularization parameters are tuned to 0.8 and 0.35.The original AMP uses Daubechies wavelet to decompose the images.Then the wavelet denoising is applied with a threshold 0.8.For the rest of algorithms, default settings in their codes are adopted.All experiments are on a desktop with 3.80 GHz AMD A10-5800K CPU.Matlab version is R2014a.
The main parameters of the proposed algorithms are set as follows: patch size is 6 × 6; a total of 36 similar patches are selected for each exemplar patch.To reduce the computational complexity, we extract exemplar image patch in every 5 pixels along both horizontal and vertical directions.For better CS recovery performance, some parameters are tuned empirically, including (1)  2  = 3.3 2 , which implies that the proportion between the data fidelity term and the side information-aided term is 1 : 0.3; (2)  = 5.2 in ( 22); (3) the parameters of Gamma distribution  =  = 0 and  =  = 0.01 as suggested in [21]; (4) the combination parameters  1 = 2/3 and  2 = 1/3.Recall that the noise level should be a little higher (1.08 times in our experiments) than the estimated one because the reference image introduces a small amount of noise.The experimental results including average PSNR, visual quality, and runtime are present.

Average PSNR Evaluation.
To reduce the randomness, we run each experiment 5 times for each image.The average PSNR results are shown in Figures 5 and 6. Figure 5 shows the PSNR results in the environment with low noise, while Figure 6 shows the ones with high noise.From these figures, one can draw the following four conclusions.First, four nonlocal sparsity-based methods SI-LSM-AMP, NL-LSM-AMP, BM3D-AMP, and NLR-CS perform better than others, which implies that the assumption of the nonlocal sparsity structure is more appropriate for natural images when compared to the tree structure and the standard sparsity.Second, the highest PSNR results are achieved by the proposed algorithm SI-LSM-AMP.In fact, the PSNR gain of the SI-LSM-AMP algorithm over the next-best algorithm NL-LSM-AMP can be as much as 0.25 dB on average.This result validates the effectiveness of our content-aware strategy that treats irregular and regular structures differently.Third, through the use of the LSM prior model, the SI-LSM-AMP algorithm and the NL-LSM-AMP algorithm outperform the other nonlocal sparsity-based AMP method BM3D-AMP (in fact, by 0.88 dB and 0.63 dB on average).Therefore, we can conclude that the LSM prior model is more appropriate for representing the nonlocal sparsity of natural images.At last, the PSNR curves decline with the decrease in sampling ratio or with the increase in measurement noise.The SI-LSM-AMP algorithm always performs better with low sampling ratio or strong measurement noise.For better quality perception, experiments performed on Baboon and Boat images present the similar results in Figures 5(b) and 5(c) and Figures 6(b) and 6(c), respectively.In a word, these results validate the superiority of the proposed SI-LSM-AMP algorithm in objective quality and the effectiveness of the LSM prior model.

Visual Quality and Runtime Evaluation in Lower-Power
Noise Environment.Figures 7 and 8 show the visual comparisons of the reconstructed results on Baboon and Boat images with 20% sampling by different methods with measurement noise with standard deviation 5, while the corresponding iterative curves are given in Figures 9 and 10, respectively.From Figures 7 and 8, we can clearly see that four nonlocal sparsitybased methods are still better than others.Among them, the SI-LSM-AMP algorithm enjoys great advantages over the NL-LSM-AMP algorithm in producing clearer image, for example, on the area of hair (Figure 7) and beach (Figure 8).It can not only perfectly reconstruct large-scale sharp edges but also well recover small-scale fine structures.The images reconstructed by the NL-LSM-AMP algorithm and the BM3D-AMP algorithm are too smooth.These two methods all have a strong assumption of the nonlocal self-similarity structure.However, many images with irregular structures do not strictly follow this assumption.We could find that the SI-LSM-AMP algorithm has great superiority on the images with irregular structures, because we combine local and global sparsity, which soften and complement the nonlocal selfsimilarity structure assumption for irregular structures.We also compute the PSNR as well as the structural similarity index (SSIM), which better reflects the visual quality of the images.The SI-LSM-AMP algorithm achieves the highest quantitative results.The superiority of the proposed SI-LSM-AMP in visual quality could be demonstrated by these results.
The CPU time and PSNR are traced in each iteration for each of the methods.Figures 9 and 10 present the CPU time versus PSNR curves and iteration number versus PSNR curves, respectively.We can see that the SI-LSM-AMP algorithm achieves higher PSNR results after about 30 iterations in Figures 9(a  Therefore, the PSNR results of the SI-LSM-AMP algorithm in the first 30 iterations are identical to the ones of the NL-LSM-AMP algorithm.The SI-LSM-AMP algorithm is relatively slow.The main computational burdens are introduced by iteratively applying SVD on each of the patch groups.The computational time of our algorithm can be further reduced through the use of the parallel computing method or C language development.Besides, in the case of NLR-CS with more iterations, its results are every five iterations presented in Figures 9 and 10.These curves demonstrate that the SI-LSM-AMP algorithm can converge to a good reconstructed result in a reasonable amount of time.(2) compared to the reconstructed images in the presence of lower-power measurement noise, the ones in higher-power noise environment tend to be more noisy (e.g., AMP, Turbo-AMP, WaTMRI, and NLR-CS) or smooth (e.g., BM3D-AMP and NL-LSM-AMP); however, the SI-LSM-AMP algorithm can better remove the artifacts and preserve important image structures more effectively even when the measurement noise is high.The corresponding PSNR and SSIM results of these algorithms are also provided, from which we can see that the proposed algorithm achieves the highest quantitative results.This indicates that the SI-LSM-AMP algorithm is shown to be more robust to noise.The corresponding CPU time versus PSNR curves and iteration number versus PSNR curves are given in Figures 13 and 14.From these figures, we can see that all algorithms converge to reconstructed results more quickly in higherpower noise environment.Among them, the PSNR results of the proposed SI-LSM-AMP algorithm are higher than all other competing methods.

Visual Quality and Runtime Evaluation in
We could find that, compared to other nonlocal methods, the SI-LSM-AMP algorithm has great superiority in higher-power noise environment.These results are reasonable because the mismatching issue is worse when the measurement noise is higher, leading to the difficulty in finding similar patches.However, other nonlocal algorithms depend on self-similarity structure only, which makes them hard to perform well.In contrast, besides the self-similarity structure, the SI-LSM-AMP algorithm also uses the local and global sparse prior of natural images and thus has a soft assumption of the self-similarity structure.As a result, one advantage of the SI-LSM-AMP algorithm is that it is more appropriate for compressive imaging under severe environment, where compressive samples are subject to strong measurement noise.

Conclusion
We have proposed an effective structured AMP algorithm for CS image reconstruction.Our work has the following contributions.First, guided by structure sparsity theories, we introduce the Laplacian scale mixture distribution to model the nonlocal sparsity for higher-order sparse representation of natural images, and then it is used as a prior constraint for CS problem.By taking the scale parameter as a random variable, it makes the practical representation much more feasible for achieving a better spatial adaptation.Second, in order to maintain differences between packed patches for irregular structures, we combine local sparsity and global sparsity to soften and complement the nonlocal self-similarity structure assumption.It can substantially enhance the details of image.Afterward, an effective algorithm based on the EM and AMP frame is proposed in this paper to solve this model.The derived inference procedures are efficient to estimate both the sparse coefficients and the scale parameter.After learning the prior parameters, the AMP algorithm is utilized to solve the singular value minimization problem for achieving the accurate image reconstruction.Finally, our simulations and experiments on a variety of natural images demonstrate the superiority of the proposed algorithm to the original AMP algorithm and several tree-based and nonlocal sparsity-based algorithms or solvers in CS image recovery.

Figure 1 :
Figure 1: The nonlocal similar patches in natural images.

Figure 2 :
Figure 2: Illustration for the low-rank matrices construction.

Figure 6 :
Figure 6: Average PSNR at different sampling ratios with measurement noise with standard deviation 15.(a) Comparisons on 22 natural images; (b) comparisons on Baboon image; (c) comparisons on Boat image.
) and 10(a) and after about 100 seconds in Figures9(b) and 10(b).Note that, considering the fact that it is hard to distinguish irregular structures from
, for each local patch, we can find the first  most similar nonlocal patches to it.In practice, this can be done by Euclidean distance based block matching in a large enough local window.Let    (or   ) denote an exemplar patch located at the th position.Patches that are similar to    including    itself are found to form a low-rank matrix    = [  1 ,   2 , . . .,    ]:    ∈ R × .Here, we suppose that  ≥ .An objective function that reflects the group sparsity of similar patches with a low-rank regularization term for CS recovery can be formulated as follows: where  is the total number of similar patch groups; ‖  ‖ * is the nuclear norm of   , taking a sum value of its singular values, namely, ‖  ‖ * = ∑  | , |.   = [ ,1 ,  ,2 , . . .,  , ] denotes the singular value vector of   , that is,    =   Σ     , and   is a vector that contains the diagonal elements of Σ  .