Distributed Compressive Video Sensing with Mixed Multihypothesis Prediction

Traditional video acquisition systems require complex data compression at the encoder, which makes them unsuitable for resource-limited applications such as wireless multimedia sensor networks (WMSNs). To address this problem, distributed compressive video sensing (DCVS) represents a novel sensing approach with a simple encoder. This method shifts the computational burden from the encoder to the decoder and needs a robust reconstruction algorithm. In this paper, a mixed measurement-based multihypothesis (MH) reconstruction algorithm (mixed-MH) is proposed for DCVS to improve the reconstruction quality at low sampling rates. Considering the inaccuracy of MH prediction when measurements are insufficient, the available side information (SI) is resampled to obtain artificial measurements, which are then integrated with the real measurements via regularization. Furthermore, to avoid the negative effect of the SI at high sampling rates, an adaptive regularization parameter is designed to balance the contributions of real and artificial measurements at different sampling rates. The experimental results demonstrate that the proposed mixed-MH prediction scheme outperforms other state-of-the-art algorithms in reconstruction quality at the same low sampling rate.


Introduction
In traditional video acquisition systems, the Shannon-Nyquist sampling theorem requires a high sampling rate to obtain the sampled video signals without any loss of information. However, because of the inherent temporal and spatial correlations of video sequences, abundant redundancies exist in video signals. Therefore, a computationally expensive compression process is required after acquisition for effective transmission or storage. In applications such as wireless multimedia sensor networks (WMSNs), traditional video acquisition systems are not appropriate because such complex compression tasks are difficult to achieve with the resource-limited sensors of WMSNs. Recently, distributed compressive video sensing (DCVS) has emerged as a promising video sensing method for these resource-limited applications. This approach incorporates the features of compressive sensing (CS) [1][2][3] and distributed video coding (DVC) [4]. Without complex data compression, the encoders in a DCVS system measure the video signals by random projections and directly obtain the compressed measurements. At the decoder, the side information (SI) is generated from the previously reconstructed frames to assist in the reconstruction of the current video frame. Among the existing reconstruction algorithms for DCVS, the multihypothesis (MH) prediction model [5][6][7][8] has achieved significant success in improving recovery performance. Instead of directly reconstructing the video signal, the MH model first predicts the target frame using a linear combination of a group of hypotheses. Because the residual between the original signal and the prediction is more compressible than the original signal itself, the reconstruction quality is improved [6].
The main objective of an MH model is to provide an accurate MH prediction, and considerable research has recently been performed on this topic. In [6], the MH prediction algorithm was first proposed, and it yielded satisfactory performance with low computational complexity. In [7], Chen et al. suggested an elastic net-based MH prediction method based on the assumption that the coefficient vector of the MH prediction is sparse. However, this assumption is not realistic in all circumstances. Instead of considering the sparsity of the coefficient vector, Azghani et al. [8] proposed an MH prediction model that exploits the sparsity of frames in the DCT domain and designed an ADMM-based iterative algorithm to solve the derived model. Note that, although the reconstruction quality of the algorithms suggested in [7, 8] is improved, they are both computationally expensive compared with the original MH prediction algorithm [6]. In general, these three MH-based recovery methods were all established based on the Johnson-Lindenstrauss (JL) lemma, which enables them to obtain the MH prediction in the measurement domain. However, a low sampling rate might not satisfy the requirements of the JL lemma, resulting in poor prediction quality. To solve this problem, Chen et al. [9] integrated the measurements obtained from the SI into the original measurements to enhance the reconstruction quality at low sampling rates. However, this method is influenced by the inaccuracy of the SI; therefore, only a few measurements of the SI were involved in the prediction, and the original MH prediction model [6] was used to obtain a backup prediction. Consequently, this method does not sufficiently exploit the SI and leads to high computational complexity.
In this paper, a mixed-MH prediction model is proposed as a generalization of the method proposed in [9]. The innovations of the proposed mixed-MH prediction are twofold: (a) it utilizes the measurements from the SI in the form of a new regularization rather than directly integrating them into the original measurements, and (b) an adaptive regularization parameter is designed to enhance the robustness of the proposed scheme at different sampling rates. Compared with the method in [9], the proposed mixed-MH prediction algorithm exploits the SI more effectively and yields a better reconstruction performance with lower computational complexity.
The remainder of the paper is organized as follows. Section 2 provides the research background, introducing the basic concepts of CS theory and MH prediction. Section 3 details the proposed mixed-MH algorithm and compares it with the method suggested in [9]. Section 4 presents the experimental results, and the conclusions are drawn in Section 5.

Background
2.1. Compressive Sensing. CS theory indicates that if the signal to be sampled is sufficiently sparse or compressible in some domain, exact recovery can be achieved with a small number of measurements [2]. Suppose that the signal x ∈ ℝ^(N×1) has a sparse representation in the transform domain Ψ, that is, x = Ψα, where α ∈ ℝ^(N×1) is a K-sparse coefficient vector; then, the acquisition process of CS can be formulated as follows:

y = Φx = ΦΨα = Aα, (1)

where y ∈ ℝ^(M×1) denotes the obtained measurements, Φ ∈ ℝ^(M×N) refers to the measurement matrix, and A = ΦΨ is the sensing matrix. The sampling rate is measured by M/N. Typically, the dimension of the measurement vector y is much smaller than that of the signal vector x, which makes CS recovery an underdetermined problem. However, according to CS theory, if the sensing matrix satisfies the restricted isometry property (RIP), the K-sparse vector α can be well recovered from M > cK log(N/K) measurements, where c is a small constant [3]. Under this condition, CS recovery is equivalent to solving the following optimization problem:

min ‖α‖₀ subject to y = Aα, (2)

where ‖·‖₀ denotes the ℓ₀ norm. Note that this ℓ₀ norm optimization is an NP-hard problem. Hence, a general solution can be obtained by replacing the ℓ₀ norm with the ℓ₁ norm, which makes the optimization convex. Various methods have been proposed to effectively solve this ℓ₁ norm optimization problem [10].
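To make the recovery problem concrete, the following sketch (an illustration, not the paper's implementation; all dimensions and the random seed are chosen arbitrarily) samples a K-sparse signal with a Gaussian sensing matrix and recovers it via the ℓ₁ relaxation of (2), posed as a linear program with α = u − v, u, v ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

N, M, K = 64, 32, 3            # signal length, measurement count, sparsity
alpha = np.zeros(N)
alpha[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

A = rng.standard_normal((M, N)) / np.sqrt(M)   # Gaussian sensing matrix
y = A @ alpha                                  # compressed measurements (M < N)

# Basis pursuit: min ||a||_1  s.t.  A a = y, as an LP over [u; v] with a = u - v
res = linprog(c=np.ones(2 * N),
              A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * N), method="highs")
alpha_hat = res.x[:N] - res.x[N:]

# For K well below the RIP bound, the l1 solution matches the sparse vector
recovery_error = np.linalg.norm(alpha_hat - alpha)
```

Since M = 32 here comfortably exceeds cK log(N/K) for K = 3, the ℓ₁ minimizer coincides with the true sparse coefficient vector.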
In terms of the measurement matrix, a widely used choice is the orthonormalized Gaussian matrix because its product with most transform bases Ψ satisfies the RIP with high probability [11]. In this paper, the orthonormalized Gaussian matrix is denoted as Φ₀ ∈ ℝ^(N×N), and the measurement matrix is obtained by extracting some rows of Φ₀ according to the sampling rate. Note that an orthonormalized matrix can be thought of as the matrix representation of an orthogonal transformation, and its rows compose an orthonormal basis. Therefore, suppose that the basis composed of the rows of Φ₀ is {φ₁, φ₂, ⋯, φ_N}; then the basis of the measurement domain consists of the first M terms, that is, {φ₁, φ₂, ⋯, φ_M}. Analogously to the definitions of the measurement matrix and measurement domain, in this paper, the matrix Φ₀ is defined as the complete measurement matrix, and the domain spanned by the basis {φ₁, φ₂, ⋯, φ_N} is the complete measurement domain. In addition, the matrix generated by stacking the basis vectors {φ_(M+1), φ_(M+2), ⋯, φ_N} is defined as the residual measurement matrix Φ*, and the domain spanned by this basis is the residual measurement domain.
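A minimal sketch of this construction (dimensions are illustrative): orthonormalize an i.i.d. Gaussian matrix via a QR decomposition, split it into the measurement and residual measurement matrices, and verify that the two complementary projections together recover the signal exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 256, 64                       # signal dimension, number of real measurements

# Orthonormalized Gaussian matrix: QR of an i.i.d. Gaussian matrix yields Q with
# orthonormal columns; its transpose has orthonormal rows forming a basis of R^N
G = rng.standard_normal((N, N))
Q, _ = np.linalg.qr(G)
Phi0 = Q.T                           # complete measurement matrix, Phi0 @ Phi0.T = I

Phi      = Phi0[:M]                  # measurement matrix (first M rows)
Phi_star = Phi0[M:]                  # residual measurement matrix (remaining N - M rows)

x = rng.standard_normal(N)
y, y_star = Phi @ x, Phi_star @ x    # projections onto the two complementary domains

# Perfect recovery by the inverse orthogonal transform: x = Phi0^T [y; y_star]
x_rec = Phi0.T @ np.concatenate([y, y_star])
```

Because Φ₀ is orthogonal, the complete measurement domain carries exactly the same information as the pixel domain, which is what motivates projecting the SI into the residual part in Section 3.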

Multihypothesis Prediction.
Generally, considering the limited computational and memory resources of the encoder, block-based compressive sensing (BCS) [12] is adopted to sample each frame in a blockwise fashion. In this context, the MH prediction model also predicts the current frame in a block-by-block procedure. The main objective of the MH prediction model is to predict the target block with an optimized linear combination of a group of hypotheses. Suppose that the target block x_{t,j} ∈ ℝ^(B×1) is the j-th nonoverlapping block in the t-th frame and that its measurement vector is y_{t,j} ∈ ℝ^(M_B×1). In this case, Φ_B ∈ ℝ^(M_B×B) represents the block-based measurement matrix derived by extracting the first M_B rows of Φ₀. Thus, the MH prediction can be obtained by solving the following optimization problem [6]:

ŵ_{t,j} = arg min_w ‖x_{t,j} − H_{t,j} w‖₂², (3)

where w_{t,j} is the coefficient vector and x̃_{t,j} = H_{t,j} ŵ_{t,j} is the desired prediction. Here, H_{t,j} represents the hypothesis set corresponding to the target block x_{t,j}. The columns of H_{t,j} are the vectorizations of hypotheses taken from the search window in the reference frame. Figure 1 depicts an example of hypothesis set generation. The red square represents the corresponding position of the target block in the reference frame. The blue dotted square, extending ±d pixels about the target block, represents the search window. All overlapping blocks in the search window that have the same size as the target block (e.g., the black dotted squares in Figure 1) constitute the hypothesis set H_{t,j}. However, there is a problem associated with optimizing (3); namely, the original frame x_t is unknown at the decoder. Therefore, reference [6] suggested recasting (3) into the measurement domain to approximate it; that is,

ŵ_{t,j} = arg min_w ‖y_{t,j} − Φ_B H_{t,j} w‖₂². (4)

According to the JL lemma [13, 14], n points in ℝ^N can be projected into a k-dimensional subspace while approximately maintaining pairwise distances as long as

k = O(ε⁻² log n). (5)

Consequently, the solution in the measurement domain will coincide with that in the pixel domain if the dimension of the measurement domain is sufficiently large. Nevertheless, problem (4) is
still unsolvable because of its ill-posed nature. Therefore, Tikhonov regularization is suggested to impose an ℓ₂ penalty on this least squares problem [6]; specifically,

ŵ_{t,j} = arg min_w ‖y_{t,j} − Φ_B H_{t,j} w‖₂² + λ‖Γw‖₂², (6)

where λ is a regularization parameter that controls the penalty degree and Γ is the Tikhonov matrix, via which prior knowledge can be imposed on the solution. A common Tikhonov matrix used in MH prediction is a diagonal matrix of the following form:

Γ = diag(‖y_{t,j} − Φ_B h₁‖₂, ⋯, ‖y_{t,j} − Φ_B h_K‖₂), (7)

where h₁, ⋯, h_K are all hypotheses in the hypothesis set. The prior knowledge involved in this regularization is that the hypothesis closest to the target block in the measurement domain should be assigned the largest weight [6]. With this regularization approach, the optimization problem (4) can be solved with a closed-form Tikhonov solution:

ŵ_{t,j} = ((Φ_B H_{t,j})ᵀ(Φ_B H_{t,j}) + λΓᵀΓ)⁻¹ (Φ_B H_{t,j})ᵀ y_{t,j}. (8)
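The hypothesis-set construction and the closed-form solution (8) can be sketched as follows. This is an illustration under synthetic data: the reference frame, block position, block size, and parameter values are arbitrary assumptions, and the function names are mine. The target block is placed exactly at a hypothesis location, so the Tikhonov solution should concentrate its weight there:

```python
import numpy as np

def hypothesis_set(ref, top, left, B, d):
    """Stack all overlapping BxB blocks within +/- d pixels of (top, left)."""
    cols = []
    for r in range(max(0, top - d), min(ref.shape[0] - B, top + d) + 1):
        for c in range(max(0, left - d), min(ref.shape[1] - B, left + d) + 1):
            cols.append(ref[r:r + B, c:c + B].ravel())
    return np.column_stack(cols)

def mh_tikhonov(y, Phi_B, H, lam=0.25):
    """Closed-form Tikhonov-regularized MH prediction in the measurement domain."""
    PH = Phi_B @ H
    gamma = np.linalg.norm(y[:, None] - PH, axis=0)   # distance of each hypothesis to y
    w = np.linalg.solve(PH.T @ PH + lam * np.diag(gamma ** 2), PH.T @ y)
    return H @ w, w

rng = np.random.default_rng(2)
B, d = 8, 4
ref = rng.standard_normal((32, 32))          # reconstructed reference frame (e.g. the SI)
target = ref[10:18, 12:20].ravel()           # target block assumed to sit at (10, 12)

M_B = 16                                     # block measurements (rate 16/64 = 0.25)
Phi_B = rng.standard_normal((M_B, B * B)) / np.sqrt(M_B)
y = Phi_B @ target

H = hypothesis_set(ref, 10, 12, B, d)        # 9 x 9 = 81 hypotheses for d = 4
pred, w = mh_tikhonov(y, Phi_B, H)
```

Because the matching hypothesis has zero measurement-domain distance, it incurs no Tikhonov penalty and the prediction reproduces the target block; with real video and an imperfect reference, the solution instead blends several nearby hypotheses.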

The Proposed DCVS Scheme with Mixed-MH Prediction
The proposed mixed-MH prediction scheme is schematically illustrated in Figure 2, and the area denoted by dotted lines highlights the innovations of this paper. At the encoder, the video sequences are divided into several groups of pictures (GOPs) and sampled frame by frame using BCS [12]. Each GOP consists of a key frame and several subsequent nonkey frames, and the sampling rate assigned to the key frame is higher than that assigned to nonkey frames.
At the decoder, the key frames are first reconstructed separately by the BCS reconstruction algorithm using smoothed projected Landweber iterations (BCS-SPL) [15]. Due to their higher sampling rate, key frames typically achieve better recovery than nonkey frames. Accordingly, the SI of each nonkey frame is generated by performing motion compensation and interpolation operations on the reconstructed neighbouring key frames, similar to DVC [16]. The contributions of the SI in mixed-MH are twofold: (a) it serves as the reference frame for generating the hypothesis set, and (b) it generates the artificial measurements, which are used in a new regularization named SI regularization. With the aid of the SI, the prediction of the current nonkey frame is obtained, and residual reconstruction is then performed to yield the final reconstructed frame.

SI Regularization.
The MH prediction method is essentially a motion estimation/motion compensation (ME/MC) technique implemented in the measurement domain. Note that this approach is reasonable only when the dimension of the measurement domain is sufficiently large, that is, when the sampling rate is sufficiently high. By contrast, at a low sampling rate, the information derived from the measurement domain cannot produce an accurate prediction. In such a case, information from domains other than the measurement domain is desired.
According to the definition in Section 2.1, Φ₀ is the matrix representation of the orthogonal transformation from the pixel domain to the complete measurement domain, and the complete measurement domain is the direct sum of the measurement domain and the residual measurement domain. Assuming that the projections of the original signal onto both the measurement domain and the residual measurement domain are available, the original signal in the pixel domain can be perfectly recovered by the inverse orthogonal transformation. Therefore, information associated with the original signal in the residual measurement domain is desired. Because the original signal is unknown at the decoder, a noisy version of it, namely, the SI, is projected into the residual measurement domain to obtain the artificial measurements. Specifically, the artificial measurements can be obtained as follows:

y* = Φ* x_SI, (9)

where x_SI ∈ ℝ^(N×1) represents the SI and y* ∈ ℝ^((N−M)×1) refers to the artificial measurements in the residual measurement domain.
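A short sketch of (9) under synthetic data (the noise level and dimensions are illustrative assumptions): since Φ* has orthonormal rows, the artificial measurements deviate from the true residual-domain projection by no more than the SI noise itself:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 128, 32

Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
Phi0 = Q.T                                 # complete measurement matrix
Phi, Phi_star = Phi0[:M], Phi0[M:]         # measurement / residual measurement matrices

x = rng.standard_normal(N)                 # unknown original frame (vectorized)
x_si = x + 0.1 * rng.standard_normal(N)    # side information = noisy version of x

y = Phi @ x                                # real measurements (available at the decoder)
y_star = Phi_star @ x_si                   # artificial measurements from the SI, eq. (9)

# Orthonormal rows imply ||y_star - Phi_star x|| <= ||x_si - x||: the artificial
# measurements are a controlled-noise surrogate for the missing projections
err = np.linalg.norm(y_star - Phi_star @ x)
```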
In order to integrate the real measurements with the artificial measurements in an appropriate manner, a model that accounts for the relationship between these two types of measurements is needed. Recently, some works [7, 8] have succeeded in improving the MH prediction accuracy by adding regularizations other than the Tikhonov regularization to the original MH model [6]. The motivation comes from the fact that the prior knowledge associated with the Tikhonov regularization is not always correct in all circumstances; therefore, new prior knowledge is desired in prediction generation. Following this philosophy, a mixed measurement-based MH prediction model is proposed to incorporate the artificial measurements in the form of a regularization. Specifically, the mixed-MH prediction is equivalent to solving the following problem:

ŵ_{t,j} = arg min_w ‖y_{t,j} − Φ_B H_{t,j} w‖₂² + λ‖Γw‖₂² + ρ (‖y*_{t,j} − Φ*_B H_{t,j} w‖₂² + λ‖Γ* w‖₂²), (10)

where y*_{t,j} denotes the artificial measurements of the target block, Φ*_B is the block-based residual measurement matrix, and the matrix Γ* is given by

Γ* = diag(‖y*_{t,j} − Φ*_B h₁‖₂, ⋯, ‖y*_{t,j} − Φ*_B h_K‖₂). (11)
The first term in (10) is the original MH term, and it has the same form as the objective function of (6). The second term is the SI regularization, which can be thought of as an artificial measurement version of the first term. Here, ρ is an adaptive regularization parameter that balances the contributions of the two terms. A detailed description of ρ is presented in the next subsection.
In essence, the prior knowledge associated with this regularization is that the SI is thought to be close to the original signal in the residual measurement domain. Accordingly, the prediction is forced towards the SI, instead of the unknown original signal, in the residual measurement domain. Although the SI is only a noisy version of the original signal, it is still helpful when the dimension of the measurement domain is low, that is, when the information that can be derived from the measurement domain is limited. Note that the second term of (10) also contains a Tikhonov regularization in the residual measurement domain, in addition to the squared Euclidean distance constraint. The reason is that the Tikhonov regularization in the first term uses Euclidean distances in the measurement domain rather than the pixel domain, so it too is affected by a low sampling rate.
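Since (10) is a sum of quadratic terms in w, it also admits a closed-form solution by setting the gradient to zero. The sketch below (synthetic block-level data; the function name, parameter values, and the assumption that Φ*_B collects the remaining rows of the block complete measurement matrix are mine) solves both the plain MH problem (ρ = 0) and the mixed-MH problem at a very low sampling rate:

```python
import numpy as np

def mixed_mh(y, y_star, Phi_B, Phi_star, H, lam=0.25, rho=1.0):
    """Closed-form minimizer of the mixed-MH objective (10): the real-measurement
    MH term plus rho times its artificial-measurement counterpart."""
    PH, QH = Phi_B @ H, Phi_star @ H
    g  = np.linalg.norm(y[:, None] - PH, axis=0) ** 2       # Tikhonov weights (real)
    gs = np.linalg.norm(y_star[:, None] - QH, axis=0) ** 2  # Tikhonov weights (artificial)
    A = PH.T @ PH + lam * np.diag(g) + rho * (QH.T @ QH + lam * np.diag(gs))
    b = PH.T @ y + rho * QH.T @ y_star
    return H @ np.linalg.solve(A, b)

rng = np.random.default_rng(4)
B2 = 64                                      # vectorized 8x8 block
Q, _ = np.linalg.qr(rng.standard_normal((B2, B2)))
M_B = 6                                      # very low sampling rate (about 0.1)
Phi_B, Phi_star = Q.T[:M_B], Q.T[M_B:]

x = rng.standard_normal(B2)                  # unknown target block
x_si = x + 0.05 * rng.standard_normal(B2)    # side information for this block
H = np.column_stack([x_si + 0.2 * rng.standard_normal(B2) for _ in range(20)])

y, y_star = Phi_B @ x, Phi_star @ x_si
pred_mh    = mixed_mh(y, y_star, Phi_B, Phi_star, H, rho=0.0)  # plain MH term only
pred_mixed = mixed_mh(y, y_star, Phi_B, Phi_star, H, rho=1.0)
```

With ρ = 0 the function reduces to the Tikhonov solution of (6); with ρ > 0 the artificial measurements constrain the prediction in the 58 residual-domain dimensions that the 6 real measurements leave unobserved.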

Regularization Parameter Selection.
In this subsection, an adaptive regularization parameter ρ is designed for the mixed-MH prediction model at different sampling rates. As noted above, the SI regularization forces the MH prediction to be close to the SI in the residual measurement domain, but such an approach is helpful only at a low sampling rate. At high sampling rates, the information in the measurement domain is sufficient for accurate MH prediction according to the JL lemma. Consequently, the parameter ρ, which controls the relative contribution of the SI regularization compared with the original MH term, might be expected to decrease as the sampling rate increases. However, the reality is counterintuitive. The parameter ρ that yields the best results at different sampling rates is shown in Figure 3(a). As shown, ρ is not a monotonically decreasing function of the sampling rate. After rewriting the Tikhonov regularization term as a weighted sum of K squared Euclidean distances, the SI regularization term can be thought of as a linear combination of K + 1 squared Euclidean distances in the residual measurement domain. Similarly, the original MH term can be considered a linear combination of K + 1 squared Euclidean distances in the measurement domain. Thus, it is unfair to directly compare their contributions because the dimensions of the measurement domain and the residual measurement domain are unequal. To achieve fairness, the parameter ρ is divided into two factors. One factor is the fraction r/(1 − r), which represents the ratio of the dimensions of the measurement and residual measurement domains, where r denotes the sampling rate. The other factor is the function f(r), which represents the relative contribution of the SI regularization term under the assumption that the measurement and residual measurement domains have the same dimension. Figure 3(b) presents the magnitude of f(r) at different sampling rates, which is derived by multiplying (1 − r)/r by the ρ values in Figure 3(a). As the sampling rate increases, the value of f(r) decreases in an approximately linear manner. Therefore, for simplicity, the function f(r) is defined as follows:

f(r) = c (r₀ − r), (12)

where c is a scale factor and r₀ refers to the sampling rate at which the contribution of the SI regularization should be zero. As a result, the regularization parameter ρ is given by

ρ = (r / (1 − r)) max(f(r), 0). (13)

Here, the function max(·) is used to ensure that ρ ≥ 0.
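Following (12) and (13), the adaptive weight can be implemented directly; c and r₀ are as defined above, with illustrative default values:

```python
def rho(r, c=1.0, r0=0.6):
    """Adaptive SI-regularization weight, eq. (13): a dimension-balancing factor
    r/(1-r) times the linearly decaying contribution c*(r0 - r), clipped at zero."""
    return (r / (1.0 - r)) * max(c * (r0 - r), 0.0)
```

Note that ρ is not monotonic: the balancing factor r/(1 − r) grows with r while f(r) shrinks, which reproduces the non-monotone behaviour observed in Figure 3(a); beyond r₀ the SI regularization is switched off entirely.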

A Comparative Discussion
In the RH-MH scheme [9], the SI is resampled by a matrix Φ_SI to obtain the measurements y_SI = Φ_SI x_SI, where y_SI represents the SI measurements sampled by Φ_SI. These measurements are stacked with the real measurements to form an augmented MH problem, whose Tikhonov matrix Γ_SI is defined analogously to (7) over the augmented measurement vector. To avoid the negative effects caused by the inaccuracy of the SI, reference [9] takes two preventative measures: (a) instead of projecting the SI into the entire residual measurement domain, the sampling rate of the SI is set to approximately 0.1; (b) the original MH prediction scheme [6] is additionally implemented, and its result is compared with that of the RH-MH prediction in the measurement domain. The prediction that is closer to the real measurements is selected as the final prediction.
In effect, the RH-MH scheme can be considered a special case of the proposed mixed-MH scheme in which only a small part of the artificial measurements is considered and ρ = 1. Compared with RH-MH, mixed-MH has two main advantages. (a) Because the RH-MH method prevents deterioration by adding only a few artificial measurements from the SI, the total quantity of measurements may be insufficient at a low sampling rate. For example, when the sampling rate of the real measurements is 0.1, the overall sampling rate, including that of resampling, is 0.2, which is still low. By contrast, the mixed-MH prediction model can incorporate the information from the entire residual measurement domain, thereby improving the prediction quality at low sampling rates. (b) At high sampling rates, although the sampling rate of the SI is only 0.1 in RH-MH, the SI may still have a negative effect on the prediction. Moreover, the second preventative measure applied in the RH-MH method cannot completely offset this negative effect, since the comparison of the two predictions in the measurement domain is not strictly exact. Unlike RH-MH, the mixed-MH prediction model avoids the negative effects of the SI using the adaptive regularization parameter ρ. At a high sampling rate, this parameter becomes extremely small, and the contribution of the SI regularization is suppressed.

Experimental Results
In this section, the recovery performance of the mixed-MH prediction model is evaluated via extensive experiments. Furthermore, the results are compared with those of two representative schemes from the literature, namely, MH-BCS-SPL [6] and RH-MH [9], to verify the effectiveness of the proposed method. Note that the MH-BCS-SPL model utilizes key frames to form the hypothesis set. For fairness, the key frames are replaced with the same SI used in the mixed-MH and RH-MH schemes.
In all schemes, the frames are sampled by BCS [12] with the measurement matrix described in Section 2. The SI of each nonkey frame is generated by an efficient frame rate up-conversion tool [17]. BCS-SPL [15] is employed as the CS recovery algorithm for residual reconstruction. Since the key frames are reconstructed in the same way in all schemes, the experiments only compare the recovery performance of nonkey frames.
Note that all the experiments are performed in MATLAB R2017a on an Intel(R) Core(TM) i7-7700HQ CPU (2.80 GHz) with 8.00 GB of RAM under the Windows 10 operating system.

Parameter Settings.
In the experiments, every two frames were considered a GOP, where the first frame was the key frame with a sampling rate of 0.7 and the second frame was the nonkey frame sampled at a rate varying from 0.05 to 0.3. In all cases, the block size was 16 × 16 pixels, and the search window size was set to ±12 pixels.
In the mixed-MH prediction method, the two parameters c and r₀ must be specified. Based on various video sequences, a value of r₀ = 0.6 yielded the best results. This finding implies that the SI regularization becomes useless when the sampling rate is larger than 0.6; in practice, however, the improvement of mixed-MH prediction faded away once the sampling rate exceeded 0.3. For the parameter c, there was no unified value for different video sequences because the quality of the SI varied across sequences; therefore, the penalty degree of the SI regularization was difficult to standardize. In practical applications, a larger value of c provides better recovery performance, provided that high-quality SI is available for the target video signal. Generally, a value of c ∈ (0.5, 2.5) yields acceptable results.
Compared with the MH-BCS-SPL method, the mixed-MH scheme displays a significant improvement at low sampling rates; specifically, it yields PSNR improvements of 2.4 dB, 4.1 dB, 2.78 dB, and 2.81 dB over the MH-BCS-SPL method for "coastguard," "hall," "mother-daughter," and "waterfall," respectively, at a sampling rate of 0.05. Here, the c values used for these four video sequences in the mixed-MH model are 0.64, 0.52, 2.2, and 0.82, respectively. Notably, the c value of "hall" is much larger than that of the other video sequences, which implies that the SI of "hall" is highly accurate and the penalty of the SI regularization can be enhanced to yield better reconstruction quality.
A comparison of the mixed-MH and RH-MH results shows that the mixed-MH scheme also achieves a significant PSNR improvement over the RH-MH scheme at low sampling rates. For example, a 2.1 dB PSNR improvement is obtained on average over the four test video sequences at a sampling rate of 0.05. In addition, at high sampling rates, the improvement of the RH-MH scheme over MH-BCS-SPL rapidly decreases, and MH-BCS-SPL even displays better performance in some cases. In contrast, the mixed-MH scheme approximately maintains the same reconstruction quality as MH-BCS-SPL due to the use of a small regularization parameter at high sampling rates.
To demonstrate the superior performance of the mixed-MH scheme in a subjective way, the reconstruction results for mixed-MH, RH-MH, and MH-BCS-SPL are shown in Figure 5. The frame recovered by MH-BCS-SPL is noisy and suffers from serious blocking artefacts. The RH-MH scheme alleviates the blocking artefacts, but the recovered frame is blurry in the motion area, for example, in the area of the boat. In the frame recovered by mixed-MH, there are no blocking artefacts in the entire frame, and the motion area is the clearest. These figures demonstrate that the mixed-MH scheme yields the best recovery performance at low sampling rates and confirm the PSNR improvements shown in Figure 4.
In terms of computational complexity, the three schemes were evaluated based on the average CPU time of nonkey frame reconstruction. As presented in Table 1, mixed-MH requires a slightly longer recovery time than MH-BCS-SPL due to the SI regularization. The RH-MH scheme, however, is more complex than both the mixed-MH and MH-BCS-SPL methods because it includes an additional original MH recovery task; its CPU time is approximately twice that of the MH-BCS-SPL scheme. Nevertheless, the complexity of all three schemes is relatively low compared with most algorithms in the literature.

Conclusion
In this paper, a novel MH prediction scheme based on mixed measurements is proposed for DCVS to improve the reconstruction quality at low sampling rates. At low sampling rates, the information in the measurement domain cannot provide an accurate MH prediction, which leads to poor CS reconstruction quality. To address this problem, the proposed mixed-MH scheme capitalizes on the information in the residual measurement domain by projecting the SI into this domain and enhances the MH prediction quality via an SI regularization. Furthermore, considering the negative effect of this regularization at high sampling rates, an adaptive regularization parameter comprising two factors is designed. The experimental results objectively and subjectively demonstrate the superior performance of the proposed scheme over other state-of-the-art DCVS algorithms.

Figure 4: Recovery performance of four video sequences using the mixed-MH, RH-MH, and MH-BCS-SPL methods.

Figure 5: Visual comparison of the reconstruction results for the 4th "coastguard" frame at a sampling rate of 0.05.

Table 1 :
Average CPU time per frame for each method at different sampling rates.