Learning-Based Video Superresolution Reconstruction Using Spatiotemporal Nonlocal Similarity

Aiming at improving the visual resolution quality and detail clarity of video, a novel learning-based video superresolution reconstruction algorithm using spatiotemporal nonlocal similarity is proposed in this paper. Objective high-resolution (HR) estimations of low-resolution (LR) video frames can be obtained by learning the LR-HR correlation mapping and fusing spatiotemporal nonlocal similarities between video frames. With the objective of improving algorithm efficiency while guaranteeing superresolution quality, a novel visual saliency-based LR-HR correlation mapping strategy between LR and HR patches is proposed based on semicoupled dictionary learning. Moreover, aiming at improving the performance and efficiency of spatiotemporal similarity matching and fusion, an improved spatiotemporal nonlocal fuzzy registration scheme is established using a similarity weighting strategy based on pseudo-Zernike moment feature similarity and structural similarity, together with a self-adaptive regional correlation evaluation strategy. The proposed spatiotemporal fuzzy registration scheme does not rely on accurate estimation of subpixel motion; therefore, it can be adapted to complex motion patterns and is robust to noise and rotation. Experimental results demonstrate that the proposed algorithm achieves competitive superresolution quality compared with other state-of-the-art algorithms in terms of both subjective and objective evaluations.


Introduction and Motivation
Factors such as environmental changes, inaccurate focusing, optical or motion blur, subsampling, and noise disturbance can have a negative effect on video visual quality. Superresolution (SR) reconstruction technology [1][2][3][4] aims to reconstruct high-resolution (HR) video sequences from their low-resolution (LR) counterparts. With the rapid and significant development of computer vision, there is a growing need for HR videos. Video visual resolution quality plays an important role in accurate moving-target tracking and recognition in intelligent video surveillance systems because HR video provides more important details of moving targets. HR medical videos are also very useful for doctors in making correct diagnoses. Therefore, video SR has great research significance and application potential.
In recent years, SR reconstruction has been one of the most active research fields in smart image and video analytics and processing. SR techniques have been developed to solve SR problems in both the frequency domain and the spatial domain. Current studies fall into three main categories: interpolation-based SR methods [5,6], multiframe-based SR methods [7][8][9], and learning-based SR methods [10,11]. Interpolation-based SR methods have relatively low computational cost and therefore are well suited for real-time applications. However, their degradation models are not applicable if blur and noise characteristics vary across different LR video frames. Moreover, these methods cannot effectively recover additional video details, because some of the details of interest have usually been blurred away.
Multiframe-based SR methods produce HR video sequences by fusing several LR video frames, making full use of complementary and redundant information between adjacent video frames at different spatiotemporal scales, whose details are similar but not exactly identical. Two main branches of research address this kind of method. The first is based on accurate estimation of subpixel motion using methods such as projections onto convex sets (POCS), maximum a posteriori (MAP) estimation, and iterative back projection (IBP); these can be applied only to video sequences with relatively simple motions such as global translation and cannot be adapted to more complex motion patterns such as local motion or rotation. The second branch [12,13] is based on a recently proposed probabilistic motion-estimation scheme built on nonlocal similarity, which does not rely on accurate estimation of subpixel motion and can be adapted to more complex motion patterns. Using this scheme, Protter et al. [14] proposed a nonlocal fuzzy registration-based SR reconstruction framework built on a 3D nonlocal mean filter (3D NLM) [15]. Subsequently, Gao et al. [16] improved the nonlocal similarity matching method based on Zernike moment feature similarity and proposed a novel Zernike moment-based SR method, which improved the noise robustness and rotation invariance of the NLM-based SR process. However, multiframe-based SR methods cannot be adapted to larger magnification factors and usually fail when insufficient complementary and redundant information between video frames is available.
In recent years, learning-based SR methods [17][18][19] have received much attention. These methods estimate the missing high-frequency details in the input LR images by learning the relationship between LR image patches and the corresponding HR patches from a training set of LR and HR image pairs. This kind of method can be adapted to larger magnification factors and can produce better superresolved results. This paper concentrates on the learning-based SR method for video SR. Until now, nearly all studies of this kind have focused on SR for static images. In this paper, by combining the spatiotemporal similarities between video frames, learning-based SR methods are extended to the video SR field. In the learning-based image SR field, the representative methods are the neighbor embedding-based SR methods (NESR) and the sparse representation-based SR methods (SRR).
Motivated by locally linear embedding (LLE), Chang et al. [20] first proposed a neighbor embedding-based SR method, which reconstructed HR patches by learning a mapping from the local geometry of the LR image patch manifold to that of the HR image patch manifold. Since then, numerous other methods have been proposed and have achieved good performance. Gao et al. [21] extended this method using sparse neighbor embedding, in which the k-nearest neighbors (k-NN) of each LR patch were adaptively chosen by describing local structural information using the histogram of oriented gradients (HoG) feature. Timofte et al. [22] proposed a novel anchored neighborhood regression method for fast example-based SR, in which the nearest neighbors were computed using correlation with dictionary atoms rather than Euclidean distance. However, when dealing with a huge number of training patches, searching for the nearest neighbors can be prohibitively slow and can require much memory. Moreover, with increasing magnification factor, the correlation between LR patches and their corresponding HR patches becomes ambiguous [23].
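As a rough illustration of the neighbor-embedding idea described above, the following minimal sketch estimates an HR patch from LLE-style least-squares weights of its k nearest LR training patches. All names are illustrative, and the regularized local-Gram solve is a simplified stand-in for the constrained LLE formulation of [20]:

```python
import numpy as np

def neighbor_embed(lr_patch, lr_train, hr_train, k=3, reg=1e-6):
    """Estimate an HR patch by neighbor embedding (LLE-style weights).

    lr_train: (n, d_lr) LR training patches; hr_train: (n, d_hr) HR counterparts.
    """
    # k nearest LR neighbors by Euclidean distance
    dists = np.linalg.norm(lr_train - lr_patch, axis=1)
    idx = np.argsort(dists)[:k]
    # Solve for weights w minimizing ||lr_patch - w @ N||^2 with sum(w) = 1
    Z = lr_train[idx] - lr_patch           # neighbors in local coordinates
    G = Z @ Z.T + reg * np.eye(k)          # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()
    # Transfer the same weights to the HR counterparts of the neighbors
    return w @ hr_train[idx]
```

When the query patch coincides with a training patch, the weights concentrate on that patch and the estimate approaches its HR counterpart.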
Recently, sparse representation and dictionary learning have been proven to be very effective for SR. In sparse representation-based SR methods, several coupled dictionary learning methods [24,25] have been proposed for superresolution. Lin and Tang [26] proposed a novel coupled subspace learning strategy to learn mappings between different styles. They first used correlative component analysis to find the hidden spaces for each style to preserve correlative information and then learned a bidirectional transform between the two subspaces. Yang et al. [27] proposed a coupled dictionary learning model for image superresolution. They assumed that coupled HR and LR image dictionaries exist which have the same sparse representation for each pair of HR and LR patches. After learning the coupled dictionary pair, the HR patch was reconstructed on the HR dictionary with sparse coefficients coded by the LR image patch over the LR dictionary. This coupled dictionary learning-based SR method assumes that the representation coefficients of the image pair are strictly equal in the coupled subspace. However, this assumption is too strong to address the flexibility of image structures at different resolutions. To overcome this problem, in [28], a semicoupled dictionary learning-based SR method was proposed, which relaxed the above assumption and assumed that there exists a dictionary pair over which the representations of HR and LR image patches have a stable correlation mapping. He et al. [29] used a beta process for sparse coding, establishing a mapping function between HR and LR coefficients. Moreover, in the methods described in [28][29][30], nonlocal similarities were used to enhance SR performance.
However, these learning-based methods consider nonlocal similarities only in the spatial region of a single image. Therefore, they cannot be directly adapted to video superresolution, because they do not make full use of the spatiotemporal correlation between video frames, which influences video spatiotemporal consistency to some extent. This paper aims to solve this problem by extending the concept of single frame-based nonlocal similarities to spatiotemporal nonlocal similarities. A novel learning-based video superresolution method using a spatiotemporal nonlocal similarity constraint is proposed which can be adapted to larger magnification factors while effectively preserving video spatiotemporal consistency.
This paper presents a novel learning-based video superresolution reconstruction algorithm using spatiotemporal nonlocal similarity (LBST-SR). The novelty and contributions of this paper are as follows: (1) A novel visual saliency-based LR-HR correlation mapping strategy between LR and HR patches is proposed based on semicoupled dictionary learning, with the aim of improving algorithm efficiency while guaranteeing superresolution quality. (2) A self-adaptive regional correlation evaluation strategy based on regional average energy and structural similarity is used in spatiotemporal similarity matching. (3) An improved spatiotemporal nonlocal fuzzy registration scheme using pseudo-Zernike moments (PZM) and structural similarity is proposed for spatiotemporal similarity matching, with the aim of further improving SR accuracy and robustness.
The remainder of the paper is organized as follows. Section 2 gives the observation model for video superresolution reconstruction. Section 3 presents the details of the proposed LBST-SR algorithm. Section 4 gives the experimental results and analysis. Conclusions are presented in Section 5.

Observation Model for Video Superresolution Reconstruction
The observation model for video superresolution reconstruction shown in Figure 1, which describes the relationship between HR and LR video frames for superresolution reconstruction, can be formulated as

$$y_k = D B_k M_k x_k + n_k,$$

where $x_k$ denotes the $k$th original HR video frame and $y_k$ denotes the $k$th observed LR video frame, which is produced by warping ($M_k$), blurring ($B_k$), downsampling ($D$), and noise disturbance ($n_k$). $M_k$ describes the motions that occur during video acquisition, such as global or local translation and rotation, and $k$ denotes the frame number in the video sequence. Objective HR estimations of LR video frames can be obtained by learning the LR-HR correlation mapping and fusing spatiotemporal nonlocal similarity information between video frames. With the aim of improving algorithm efficiency while guaranteeing superresolution quality, LR-HR correlation mapping is performed only for the visually salient object region, and then an improved nonlocal fuzzy registration scheme using the pseudo-Zernike moment feature and structural similarity is applied for spatiotemporal similarity matching and fusion. The advantages of the proposed LBST-SR algorithm lie mainly in the following three aspects: (1) it does not rely on accurate estimation of subpixel motion and therefore can be adapted to complex motion patterns (local motions, angles of rotation, etc.); (2) it has high rotation invariance and is robust to noise and illumination; and
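The degradation pipeline above can be sketched numerically. The following is a minimal, hypothetical simulation of the forward model (blur, then decimation, then additive noise); a 3 × 3 box blur stands in for a generic $B_k$, and the warping $M_k$ is taken as the identity for brevity:

```python
import numpy as np

def degrade(x, factor=3, noise_sigma=2.0, rng=None):
    """Forward observation model: blur B_k -> downsample D -> add noise n_k.
    (Warping M_k is omitted here, i.e. identity motion, for brevity.)"""
    rng = np.random.default_rng(rng)
    # Blur B_k: 3x3 box filter applied by summing shifted copies
    k = 3
    pad = np.pad(x, k // 2, mode='edge')
    blurred = np.zeros_like(x, dtype=float)
    for di in range(k):
        for dj in range(k):
            blurred += pad[di:di + x.shape[0], dj:dj + x.shape[1]]
    blurred /= k * k
    # Downsampling D by decimation with the given factor
    low = blurred[::factor, ::factor]
    # Additive Gaussian noise n_k
    return low + rng.normal(0.0, noise_sigma, low.shape)
```

For a constant frame the blur is exact and, with the noise disabled, the output is the decimated constant frame.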

Proposed LBST-SR Algorithm
(3) it can be adapted to larger superresolution magnification factors. The proposed algorithm architecture is shown in the corresponding architecture figure, and the overall objective can be formulated as

$$X^{*} = \arg\min_{X} \left\{ E^{\mathrm{CML}}_{SR}(x, y) + \lambda\, E^{\mathrm{STNL}}_{SR}(x, y) \right\},$$

where $X^{*}$ denotes the HR estimation of the video sequence, $p$ denotes a pixel in the LR sequence $Y$, $\Omega_{so}$ denotes the salient object region in $Y$, and $\Omega_{nso}$ denotes the nonsalient region in $Y$. $E^{\mathrm{CML}}_{SR}(x, y)$ denotes an LR-HR correlation mapping energy element, $E^{\mathrm{STNL}}_{SR}(x, y)$ denotes a spatiotemporal nonlocal similarity regularization constraint element, and $\lambda$ is the balancing parameter between the two elements. Aiming at improving algorithm time efficiency while guaranteeing superresolution quality, the LR-HR correlation mapping is established only for the human-eye-concentrated salient object region $p \in \Omega_{so}$.

LR-HR Correlation Mapping Learning
The HR estimations of LR video frames can be obtained by learning the correlation mapping between LR and HR patches. With the objective of improving algorithm efficiency while guaranteeing SR quality, the LR-HR correlation mapping is established only for the human-eye-concentrated salient object region $\Omega_{so} \in Y$ in the LR video frame $y$. In this paper, a saliency optimization method based on robust background detection [31] is used to detect and extract the visually salient region. The learning process for LR-HR correlation mapping can be formulated as follows: given the LR patch set $L$ and the HR patch set $H$, the mapping process can be described as seeking a mapping function $f(\cdot)$ from space $L$ to space $H$: $H = f(L)$.
The correlation learning model based on a coupled dictionary assumes that each pair of HR and LR patches has the same sparse representation coefficients. This assumption is too strong to address the flexibility of frame structures at different resolutions, which restricts superresolution performance. Therefore, in this research, a more flexible and stable semicoupled dictionary learning method is used to establish the correlation mapping between HR and LR patches; it assumes that there exists a stable correlation mapping between the sparse representation coefficients of HR and LR patches. In the LR-HR correlation learning process based on semicoupled dictionary learning, the LR-HR dictionary pair $(D_h, D_l)$ and the correlation mapping matrix $W$ can be obtained by minimizing the objective energy function given in (3).

Mapping Updating. With the dictionary pair $(D_h, D_l)$, $\Lambda_h$, and $\Lambda_l$ fixed, the mapping $W$ can be updated as follows:

$$\min_{W} \left\| \Lambda_h - W \Lambda_l \right\|_F^2 + \lambda_W \left\| W \right\|_F^2. \tag{6}$$

By solving (6), the following expression can be derived:

$$W = \Lambda_h \Lambda_l^{T} \left( \Lambda_l \Lambda_l^{T} + \lambda_W I \right)^{-1}, \tag{7}$$

where $I$ is an identity matrix.
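The closed-form mapping update in (7) is a ridge-regression solve and is straightforward to verify numerically. A minimal sketch follows, with hypothetical names and `reg` playing the role of $\lambda_W$:

```python
import numpy as np

def update_mapping(lam_h, lam_l, reg=0.01):
    """Closed-form update of the mapping W with dictionaries and codes fixed:
    W = Lambda_h Lambda_l^T (Lambda_l Lambda_l^T + reg * I)^{-1}."""
    k = lam_l.shape[0]
    return lam_h @ lam_l.T @ np.linalg.inv(lam_l @ lam_l.T + reg * np.eye(k))
```

If the HR codes are generated by an exact linear map of the LR codes, a small regularizer recovers that map to numerical precision.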
After obtaining the LR-HR correlation mapping $W$ by the above learning process, superresolution reconstruction is performed by using it to derive the HR estimation of the salient object region in the video frame: for the salient object region $\Omega_{so} \in Y$ in LR video frame $y$, the optimization problem given in (8) is solved patch by patch.

The PZM feature with order $n$ and repetition $m$ ($0 \le n \le \infty$, $0 \le |m| \le n$) of video frame $f(x, y)$ is defined as

$$\mathrm{PZM}_{nm} = \frac{n+1}{\pi} \sum_{x} \sum_{y} f(x, y)\, V^{*}_{nm}(r, \theta), \qquad x^{2} + y^{2} \le 1,$$

where $r$ and $\theta$ are the radius and angle, respectively, of the pixels in the polar coordinate system, $r = \sqrt{x^{2} + y^{2}}$ and $\theta = \tan^{-1}(y/x)$. The function set $\{V_{nm}(r, \theta)\}$ is the basis of the PZM feature, and $V^{*}_{nm}(r, \theta)$ denotes the complex conjugate of $V_{nm}(r, \theta)$.
The nonlocal fuzzy registration scheme based on PZM performs a similarity match in the nonlocal spatiotemporal domain between video frames at different spatiotemporal scales, measured by the Euclidean distance between regional PZM feature vectors. The weight $w^{\mathrm{PZM}}_{SR}[k, l, i, j, t]$ of each pixel in the nonlocal spatiotemporal region is calculated from this similarity as

$$w^{\mathrm{PZM}}_{SR}[k, l, i, j, t] = \frac{1}{C(k, l)} \exp\!\left( -\frac{\left\| \mathrm{PZM}(k, l) - \mathrm{PZM}_t(i, j) \right\|_2^2}{2\sigma^2} \right), \tag{12}$$

where $\sigma$ controls the decay rate of the exponential function and thereby the weight, and $C(k, l)$ is a normalization constant, calculated as

$$C(k, l) = \sum_{t} \sum_{(i, j) \in N_{\mathrm{nonloc}}(k, l)} \exp\!\left( -\frac{\left\| \mathrm{PZM}(k, l) - \mathrm{PZM}_t(i, j) \right\|_2^2}{2\sigma^2} \right).$$

Note that the higher the PZM order is, the more sensitive the PZM is to noise. Therefore, in the experiments performed in this study, only moments up to the third order, namely $\mathrm{PZM}_{00}$, $\mathrm{PZM}_{11}$, $\mathrm{PZM}_{20}$, $\mathrm{PZM}_{22}$, $\mathrm{PZM}_{31}$, and $\mathrm{PZM}_{33}$, were calculated.
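The weighting rule in (12) can be sketched as follows, treating PZM feature extraction as given and operating on precomputed feature vectors; the function name and interface are illustrative:

```python
import numpy as np

def nonlocal_weights(ref_feat, cand_feats, sigma=10.0):
    """Similarity weights between a reference region's feature vector and
    candidate regions' feature vectors (e.g. pseudo-Zernike moment vectors),
    normalized so the weights sum to one."""
    d2 = np.sum((cand_feats - ref_feat) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()          # the sum plays the role of C(k, l)
```

A candidate with a feature vector far from the reference receives an exponentially smaller share of the total weight.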
By analyzing the weight calculation formula for the PZM-based nonlocal fuzzy registration scheme in (12), it is clear that the time complexity is much too high and increases with the number of LR video frames and the amplification factor. To achieve further improvements in time efficiency and in the edge detail-preserving ability of the superresolution algorithm, a novel spatiotemporal nonlocal fuzzy registration scheme (ZSFR) was established by improving the PZM-based spatiotemporal nonlocal fuzzy registration scheme with a similarity weighting strategy based on PZM feature similarity and structural similarity and with a self-adaptive regional correlation evaluation strategy.
The improvements in the ZSFR involve two main aspects: (1) with the aim of improving algorithm efficiency, a self-adaptive regional correlation evaluation strategy based on regional average energy and regional structural similarity was constructed for nonlocal similarity matching; and (2) an improved similarity weighting strategy based on regional PZM feature similarity and regional structural similarity was proposed for spatiotemporal nonlocal similarity matching, with the aim of further improving SR performance. To describe this improved ZSFR scheme, the following three definitions are required.
Definition 1 (regional average energy). The video frame $y$ is divided into many regions of equal size, and each region is divided into 5 × 5 patches. The total number of pixels in each region is Num, and the energy values of the pixels are denoted by $E_1, E_2, \ldots, E_{\mathrm{Num}}$. $\mathrm{AE}(k, l)$ is defined as the regional average energy centered on pixel $(k, l)$ and is calculated as

$$\mathrm{AE}(k, l) = \frac{1}{\mathrm{Num}} \sum_{q=1}^{\mathrm{Num}} E_q.$$

In the improved spatiotemporal nonlocal fuzzy registration scheme, the regional correlation is first evaluated to divide the local regions centered on all pixels $(i, j)$ in the nonlocal search region for pixel $(k, l)$ into related and unrelated regions. Only related regions are used to calculate the weight, an approach that further improves time efficiency and is beneficial for mining the most similar patterns for the similarity weight. The regional correlation is calculated by combining the regional average energy and the regional structural similarity. Moreover, a self-adaptive threshold $\delta_{\mathrm{adap}}$ is introduced, which yields a self-adaptive regional correlation evaluation mechanism. Two regions are judged to be related if

$$\left| \mathrm{AE}(k, l) - \mathrm{AE}_t(i, j) \right| \le \delta_{\mathrm{adap}}.$$

The self-adaptive threshold $\delta_{\mathrm{adap}}$ is determined adaptively by the average energy $\mathrm{AE}(k, l)$ of the region centered on pixel $(k, l)$, which leads to a more accurate regional correlation evaluation; it is calculated as

$$\delta_{\mathrm{adap}} = c \cdot \mathrm{AE}(k, l),$$

where $c$ is an adjustment factor that controls $\delta_{\mathrm{adap}}$. Experiments have confirmed that the best SR quality is obtained when $c$ is set to 0.08.
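A minimal sketch of the self-adaptive evaluation above, under the assumption that a pixel's energy is its squared intensity (the paper's exact per-pixel energy definition is not recoverable from this copy), with the stated adjustment factor $c = 0.08$:

```python
import numpy as np

def regional_average_energy(region):
    """AE: mean per-pixel energy over a region.
    Pixel energy is taken here as squared intensity (an assumption)."""
    return float((region.astype(float) ** 2).mean())

def regions_related(region_a, region_b, c=0.08):
    """Self-adaptive correlation test: two regions are 'related' when their
    average-energy gap is within delta_adap = c * AE(region_a)."""
    ae_a = regional_average_energy(region_a)
    ae_b = regional_average_energy(region_b)
    return abs(ae_a - ae_b) <= c * ae_a
```

Only candidate regions passing this test would enter the (more expensive) weight computation, which is where the time saving comes from.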
With the aim of further improving superresolution accuracy and detail-preserving ability, the similarity weight $w^{\mathrm{EPZM}}_{SR}[k, l, i, j, t]$ improves on the weighting strategy given in (12) by combining two factors, regional PZM feature similarity and regional structural similarity:

$$w^{\mathrm{EPZM}}_{SR}[k, l, i, j, t] = \frac{1}{C(k, l)} \exp\!\left( -\frac{\left\| \mathrm{PZM}(k, l) - \mathrm{PZM}_t(i, j) \right\|_2^2}{2\sigma^2} \right) \cdot \mathrm{SSIM}\!\left( (k, l), (i, j, t) \right), \tag{19}$$

where $(k, l)$ denotes the pixel to be superresolved and $(i, j)$ denotes a pixel in the nonlocal search region $N_{\mathrm{nonloc}}(k, l)$ centered on pixel $(k, l)$. The parameter $\sigma$ controls the decay rate of the exponential function, as well as the weight, and $C(k, l)$ is a normalization constant.

Spatiotemporal Nonlocal Similarity Information Fusion Based on ZSFR
Spatiotemporal nonlocal similarity information fusion is based on the improved nonlocal fuzzy registration scheme using PZM feature similarity and structural similarity. By learning the spatiotemporal nonlocal similarities between video frames, the similarity weight is calculated according to (19). The HR estimation of the video frame to be superresolved can then be obtained by spatiotemporal information fusion, implemented as a weighted average based on spatiotemporal nonlocal similarities.
Once the weight $w^{\mathrm{EPZM}}_{SR}[k, l, i, j, t]$ has been determined, the HR estimation of each pixel in the video frame to be superresolved can be obtained as the weighted average of the pixels in the nonlocal spatiotemporal region. The objective superresolution energy function based on spatiotemporal nonlocal similarity can be expressed as

$$E^{\mathrm{STNL}}_{SR}(X) = \sum_{t \in [t_1, t_2]} \sum_{(i, j) \in N_{\mathrm{nonloc}}(k, l)} w^{\mathrm{EPZM}}_{SR}[k, l, i, j, t] \left\| x(k, l) - y_t(i, j) \right\|_2^2, \tag{21}$$

where $[t_1, t_2]$ denotes a 3D spatiotemporal region (temporal sliding window). By minimizing the objective energy function in (21), the HR estimation $x_{\mathrm{stnl}}$ of each video frame can be obtained as

$$x_{\mathrm{stnl}}(k, l) = \frac{\sum_{t} \sum_{(i, j)} w^{\mathrm{EPZM}}_{SR}[k, l, i, j, t]\, y_t(i, j)}{\sum_{t} \sum_{(i, j)} w^{\mathrm{EPZM}}_{SR}[k, l, i, j, t]}, \qquad (k, l) \in \Psi, \tag{22}$$

where $\Psi$ denotes the video frame to be superresolved. Consequently, the proposed learning-based video superresolution reconstruction using spatiotemporal nonlocal similarity can be performed as

$$X^{*} = \arg\min_{X} \left\{ E^{\mathrm{CML}}_{SR} + \lambda\, E^{\mathrm{STNL}}_{SR} \right\}, \tag{23}$$

where $E^{\mathrm{CML}}_{SR}$ denotes the energy function defined in (8) and $\lambda$ is a balancing parameter.
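Per pixel, the weighted-average fusion in (22) reduces to a normalized dot product between the weights and the candidate intensities gathered from the 3D search region, for example:

```python
import numpy as np

def fuse_spatiotemporal(candidates, weights):
    """HR estimate of one pixel as the weighted average of candidate pixels
    gathered from the nonlocal spatiotemporal (3D) search region."""
    w = np.asarray(weights, dtype=float)
    c = np.asarray(candidates, dtype=float)
    return float((w * c).sum() / w.sum())
```

The normalization by the weight sum means the weights need not be normalized in advance.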

Implementation Steps of the Proposed LBST-SR Algorithm.
The LBST-SR algorithm implementation includes the following steps, as shown in Algorithm 4.

Step 1. Train the LR-HR dictionary pair $(D_h, D_l)$ and the correlation mapping matrix $W$ by LR-HR correlation learning according to (3).
Step 2. According to the learned dictionary pair $(D_h, D_l)$ and the LR-HR correlation mapping $W$, map each LR patch of the salient region $\Omega_{so}$ of video frame $y$ to its HR estimation $\hat{x}$ using (8) and (9).
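Step 2 can be sketched as follows. For brevity, ridge regression stands in for the paper's sparse coding of the LR coefficients (a simplification), after which the learned mapping $W$ and the HR dictionary produce the HR patch estimate; all names are illustrative:

```python
import numpy as np

def map_lr_patch_to_hr(y_patch, D_l, D_h, W, reg=1e-3):
    """Map an LR patch to its HR estimate through the learned pair and mapping.
    Ridge regression replaces the paper's sparse coding step (a simplification)."""
    k = D_l.shape[1]
    # alpha_l: code of the LR patch over the LR dictionary D_l
    alpha_l = np.linalg.solve(D_l.T @ D_l + reg * np.eye(k), D_l.T @ y_patch)
    # alpha_h = W alpha_l, then reconstruct on the HR dictionary D_h
    return D_h @ (W @ alpha_l)
```

With identity dictionaries and an identity mapping this reduces to (approximately) scaling the input by the HR dictionary.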
Step 3. Update x using the improved spatiotemporal nonlocal similarity regularization constraint in (23).
Step 4. Iteratively refine the fusion result for further optimization. Update the counter, $n = n + 1$. If $n \le N$, return to Step 3; otherwise, end the process.

Experimental Results and Analysis
The test video sequences were downloaded from the YOUKU website (http://www.youku.com/). The superresolution effects were validated in terms of both subjective visual evaluation and four objective quantitative indices: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), feature similarity (FSIM), and root-mean-square error (RMSE), calculated as

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^{2}\, MN}{\sum_{i=1}^{M} \sum_{j=1}^{N} \left[ X(i, j) - Y(i, j) \right]^{2}},$$

$$\mathrm{SSIM} = \frac{\left( 2\mu_X \mu_Y + C_1 \right)\left( 2\sigma_{XY} + C_2 \right)}{\left( \mu_X^{2} + \mu_Y^{2} + C_1 \right)\left( \sigma_X^{2} + \sigma_Y^{2} + C_2 \right)},$$

$$\mathrm{FSIM} = \frac{\sum_{x \in \Omega} S_L(x)\, \mathrm{PC}_m(x)}{\sum_{x \in \Omega} \mathrm{PC}_m(x)},$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ X(i, j) - Y(i, j) \right]^{2}},$$

where $M$ and $N$ denote the length and width of the video frame; $X$ and $Y$ denote the reconstructed frame and the original frame, respectively; $\mu_X$ and $\mu_Y$ are the means and $\sigma_X$ and $\sigma_Y$ the standard deviations of the original and reconstructed frames; $\sigma_{XY}$ is their covariance; $C_1$ and $C_2$ are constants; $\Omega$ denotes the whole spatial domain of the video frame; $S_L(x)$ is a similarity measure combining the phase congruency and gradient magnitude features between $X$ and $Y$ together with a chrominance similarity measure; and $\mathrm{PC}_m(x)$ weights the importance of $S_L(x)$ in the overall similarity between $X$ and $Y$. These terms are calculated according to [32]. The greater the PSNR is, the closer the reconstructed frame is to the original. The closer SSIM ($0 \le \mathrm{SSIM} \le 1$) is to 1, the greater the similarity between the original and reconstructed frame structures. The closer FSIM ($0 \le \mathrm{FSIM} \le 1$) is to 1, the greater the similarity between the original and reconstructed frame features. The smaller the RMSE is, the closer the reconstructed frame is to the original.
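Two of the four indices, PSNR and RMSE, follow directly from the definitions above and can be computed in a few lines (SSIM and FSIM require the windowed statistics and phase-congruency maps of [32] and are omitted here):

```python
import numpy as np

def rmse(ref, rec):
    """Root-mean-square error between a reference and a reconstructed frame."""
    diff = ref.astype(float) - rec.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical frames)."""
    e = rmse(ref, rec)
    return float('inf') if e == 0 else 20.0 * np.log10(peak / e)
```

For 8-bit video the peak value is 255, matching the PSNR definition above.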

This section describes the experiments carried out to evaluate the performance of the proposed LBST-SR superresolution reconstruction algorithm and compares the results with five recently proposed representative state-of-the-art superresolution algorithms in terms of both visual quality and objective quantitative indices: the learning-based ANRSR [22], DPSR [30], and ScSR [27] algorithms, the 3D nonlocal mean-based NL-SR [14] algorithm, and the Zernike moment-based ZM-SR [16] algorithm. In the experiments, ten benchmark and two satellite video sequences were used: "Forman," "Calendar," "Coastguard," "Suzie," "Mother Daughter," "Miss America," "Ice," "Football," "Carphone," "Akiyo," "Satellite-1," and "Satellite-2." Based on their motion content, these video sequences fall into three categories: (1) "Calendar," "Suzie," "Mother Daughter," "Miss America," and "Akiyo" contain small-motion objects; (2) "Forman," "Coastguard," "Carphone," "Satellite-1," and "Satellite-2" contain moderate-motion objects; and (3) "Ice" and "Football" contain fast-motion objects. Some complex motion scenes exist in these dynamic sequences, such as local motion patterns and rotations. Each video sequence was decimated by a factor of 1:3 and then contaminated by additive Gaussian white noise with σ = 2.

The proposed ZSFR scheme uses a self-adaptive regional correlation evaluation strategy based on regional average energy and regional structural similarity, which is an improvement over the ZFR scheme. The superresolved results for Frame 29 of the "Calendar" sequence are shown in Figure 9, with the magnified local textures marked by red and blue rectangular boxes. The results demonstrate that the proposed algorithm generates the best visual effects and produces clearer contours and details. The "Calendar" sequence contains complex object motions, including translational motion, occluded areas, and newly appearing object areas. The proposed algorithm still performed well under such complex motion scenes, benefitting mainly from the improved spatiotemporal nonlocal fuzzy registration scheme based on PZM feature and structural similarity, which is robust to complex motion scenes. The local magnified details indicate that the ANRSR, DPSR, and ScSR algorithms introduce noticeable annoying artifacts around the edges of each number, and the ZM-SR algorithm shows some blurring effects. In the local detail area marked by the red rectangle, the quality of the proposed algorithm is comparable to that of the NL-SR algorithm, but in the magnified road-details area marked by the blue rectangular box, the proposed algorithm produces smoother effects, whereas the NL-SR algorithm generates discontinuous edges and annoying block artifacts.

Subjective Visual Evaluations.
Figure 10 shows the superresolved results for Frame 18 and the magnified local details of the "Coastguard" sequence. The "Coastguard" sequence contains complex backgrounds and motions of both object and camera. Moreover, complex motions such as translation, occluded areas, and newly appearing object areas exist in this sequence. Under such complex motion scenes, the proposed LBST-SR algorithm still performed better than the other five algorithms. As can be observed from the magnified local details marked by the red rectangle and the details in the background regions, annoying black spots and block artifacts are generated in the ANRSR, DPSR, and ScSR algorithms. The ZM-SR algorithm produces blurred edges and details, especially in the complex stone-bank background area. Annoying block artifacts and discontinuous edges are generated in the NL-SR algorithm because its nonlocal similarity matching strategy cannot be well adapted to complex motion scenes.
The superresolved results for Frame 1 of the "Akiyo" sequence are shown in Figure 11, with local details magnified to emphasize visual quality. The magnified visual effects of the face region marked by the red rectangle demonstrate that the proposed algorithm is superior to the other five algorithms and produces a more natural and smoother visual effect. The ANRSR, DPSR, and ScSR algorithms produce annoying artifacts in the face region and unnatural skin colors. The NL-SR algorithm produces block effects, and some blurring phenomena are generated in the ZM-SR algorithm.
The "Satellite-2" sequence contains local motions, light variation, and more object details. Figure 12 shows the superresolved results for Frame 3 of the "Satellite-2" sequence.

Conclusions
A novel learning-based algorithm to implement video SR reconstruction using spatiotemporal nonlocal similarity was proposed in this paper. On the basis of LR-HR correlation mapping, spatiotemporal nonlocal similarity structural redundancies were used to further improve SR quality. With the objective of improving algorithm efficiency while guaranteeing SR quality, LR-HR correlation mapping was performed only for the salient object region of the video frame, following which an improved spatiotemporal nonlocal fuzzy registration scheme was established for spatiotemporal similarity matching and fusion, using the similarity weighting strategy based on pseudo-Zernike moment feature similarity and structural similarity and the self-adaptive regional correlation evaluation strategy. The proposed spatiotemporal nonlocal fuzzy registration scheme does not rely on accurate estimation of subpixel motion, and therefore it can be adapted to complex motion patterns and is robust to noise and rotation. Experimental results demonstrated that the proposed algorithm achieves competitive SR quality compared with other state-of-the-art algorithms in terms of both subjective and objective evaluations.

Figure 3: Objective evaluation indices of the six algorithms for the "Satellite-1" sequence.

Figure 4: Objective evaluation indices of the six algorithms for the "Satellite-2" sequence.

Figure 6: Objective evaluation indices of the six algorithms for the "Calendar" sequence.

Figure 7: Objective evaluation indices of the six algorithms for the "Coastguard" sequence.

Figure 8: SR reconstruction visual effects for Frame 6 of the "Forman" sequence with a magnification factor of three.

Figure 9: SR reconstruction visual effects for Frame 29 of the "Calendar" sequence with a magnification factor of three.

Figure 8 shows the SR reconstruction visual effects of the six algorithms (ANRSR, DPSR, ScSR, NL-SR, ZM-SR, and LBST-SR) for Frame 6 of the "Forman" sequence, with the magnified local textures marked by the red rectangular box. The frame contains moderate-motion objects in the "Forman" sequence, such as local motions of the head and mouth and rotational motion of the eyes. By analyzing the global and local detail effects (such as the regions around the eyes), it is clear that the proposed LBST-SR algorithm obtains a better visual effect than the other five algorithms. The learning-based ANRSR, DPSR, and ScSR algorithms produce annoying spot artifacts and unnatural visual effects in the face regions. Edge detail blurring phenomena are produced in the ZM-SR algorithm. Some annoying block artifacts are generated in the NL-SR algorithm, mainly because local complex motions influenced the accuracy of nonlocal similarity matching and fusion between video frames. The proposed LBST-SR algorithm was able to solve this problem because its spatiotemporal similarity matching process can be adapted to complex motion patterns. In comparison, the proposed LBST-SR algorithm not only produces clearer edges and contours but also smoother effects in the face region.

Figure 10: SR reconstruction visual effects for Frame 18 of the "Coastguard" sequence with a magnification factor of three.

Figure 11: SR reconstruction visual effects for Frame 1 of the "Akiyo" sequence with a magnification factor of three.

Figure 12: SR reconstruction visual effects for Frame 3 of the "Satellite-2" sequence with a magnification factor of three.
$\| X_h - D_h \Lambda_h \|_F^2$ and $\| X_l - D_l \Lambda_l \|_F^2$ denote the reconstruction errors; $\| \Lambda_h - W \Lambda_l \|_F^2$ denotes the mapping error; and $d_{h,i}$ and $d_{l,i}$ denote the atoms of $D_h$ and $D_l$, respectively. The full semicoupled objective is

$$\min_{D_h, D_l, W} \left\| X_h - D_h \Lambda_h \right\|_F^2 + \left\| X_l - D_l \Lambda_l \right\|_F^2 + \gamma \left\| \Lambda_h - W \Lambda_l \right\|_F^2 + \lambda_h \left\| \Lambda_h \right\|_1 + \lambda_l \left\| \Lambda_l \right\|_1 + \lambda_W \left\| W \right\|_F^2 \quad \text{s.t. } \left\| d_{h,i} \right\|_2 \le 1,\ \left\| d_{l,i} \right\|_2 \le 1,\ \forall i, \tag{3}$$

where $\gamma$, $\lambda_h$, $\lambda_l$, and $\lambda_W$ denote the regularization parameters needed to balance the terms in the objective function, and $\Lambda_l$ and $\Lambda_h$ are the sparse representation coefficients of the LR and HR patches, respectively.

Sparse Coding for Training Samples. With the initialization of $W$ and the dictionary pair $(D_h, D_l)$, the sparse coding coefficients $\Lambda_h$ and $\Lambda_l$ can be obtained by solving the $\ell_1$-optimization problems

$$\min_{\Lambda_h} \left\| X_h - D_h \Lambda_h \right\|_F^2 + \gamma \left\| \Lambda_h - W \Lambda_l \right\|_F^2 + \lambda_h \left\| \Lambda_h \right\|_1, \qquad \min_{\Lambda_l} \left\| X_l - D_l \Lambda_l \right\|_F^2 + \gamma \left\| \Lambda_l - \widetilde{W} \Lambda_h \right\|_F^2 + \lambda_l \left\| \Lambda_l \right\|_1, \tag{4}$$

where $W$ denotes the mapping from $\Lambda_l$ to $\Lambda_h$ and $\widetilde{W}$ denotes the mapping from $\Lambda_h$ to $\Lambda_l$; $\| \Lambda_h - W \Lambda_l \|_F^2$ denotes the mapping error generated when $\Lambda_l$ is mapped to $\Lambda_h$, and $\| \Lambda_l - \widetilde{W} \Lambda_h \|_F^2$ the error generated when $\Lambda_h$ is mapped to $\Lambda_l$.

Dictionary Updating. With $\Lambda_h$ and $\Lambda_l$ fixed, the dictionary pair $(D_h, D_l)$ can be updated using

$$\min_{D_h, D_l} \left\| X_h - D_h \Lambda_h \right\|_F^2 + \left\| X_l - D_l \Lambda_l \right\|_F^2 \quad \text{s.t. } \left\| d_{h,i} \right\|_2 \le 1,\ \left\| d_{l,i} \right\|_2 \le 1,\ \forall i.$$

For superresolution of a salient-region patch, the following problem is solved to obtain its HR estimation:

$$\min_{\alpha_l, \alpha_h, x_p} \left\| y_p - D_l \alpha_l \right\|_2^2 + \left\| x_p - D_h \alpha_h \right\|_2^2 + \gamma \left\| \alpha_h - W \alpha_l \right\|_2^2, \tag{8}$$

where $y_p$ is a patch of LR video frame $y$ and $x_p$ is the corresponding patch in the initial estimation of HR video frame $x$. An initial estimation of $x$ can be obtained using a bicubic interpolator. Equation (8) can be solved by alternately updating $\alpha_l$, $\alpha_h$, and $x_p$. The objective HR estimation $\hat{x}_p$ of each patch $y_p$ in the salient object region $\Omega_{so} \in Y$ of $y$ can then be derived as $\hat{x}_p = D_h \alpha_h$. Considering the good rotation, translation, and scale-invariance properties of the PZM feature and its insensitivity to noise and illumination, the nonlocal fuzzy registration scheme can be further improved by using this feature, resulting in a more accurate and robust similarity measure between regional features in the nonlocal spatiotemporal domain for weight calculations. In this way, the performance and robustness of SR reconstruction can be further improved. Unlike traditional methods, the improved spatiotemporal nonlocal fuzzy registration scheme does not rely on accurate estimation of subpixel motion, and therefore it can be adapted to complex motion scenes and is robust to noise and rotation. Let $\mathrm{PZM}(k, l)$ and $\mathrm{PZM}_t(i, j)$ represent two PZM feature vectors of the local regions corresponding to pixel $(k, l)$ and pixel $(i, j)$ in the nonlocal search region $N_{\mathrm{nonloc}}(k, l)$ of pixel $(k, l)$:

$$\mathrm{PZM}(k, l) = \left( \mathrm{PZM}_{00}, \mathrm{PZM}_{11}, \mathrm{PZM}_{20}, \mathrm{PZM}_{22}, \mathrm{PZM}_{31}, \mathrm{PZM}_{33} \right). \tag{9}$$

3.3. Spatiotemporal Nonlocal Fuzzy Registration and Fusion. The superresolution process based on the learned LR-HR correlation mapping uses only the spatial information within the video frame and the LR-HR mapping; therefore, it cannot make full use of the spatiotemporal correlation between video frames.

Table 5: Comparison of time efficiency for ZFR and ZSFR schemes.