Spatiotemporal Super-Resolution Reconstruction Based on Robust Optical Flow and Zernike Moment for Video Sequences

In order to improve the spatiotemporal resolution of video sequences, a novel spatiotemporal super-resolution reconstruction model (STSR) based on robust optical flow and Zernike moment is proposed in this paper, which integrates spatial resolution reconstruction and temporal resolution reconstruction into a unified framework. The model does not rely on accurate estimation of subpixel motion and is robust to noise and rotation. Moreover, it can effectively overcome hole and block artifacts. First, we propose an efficient robust optical flow motion estimation model based on motion details preserving; then we introduce a biweighted fusion strategy to implement spatiotemporal motion compensation. Next, combining a self-adaptive region correlation judgment strategy, we construct a fast fuzzy registration scheme based on Zernike moment for better STSR with higher efficiency. The final video sequences with high spatiotemporal resolution are then obtained by fusing the complementary and redundant information with nonlocal self-similarity between adjacent video frames. Experimental results demonstrate that the proposed method outperforms existing methods in terms of both subjective visual and objective quantitative evaluations.


Introduction
The resolution quality of video sequences collected by multisource vision sensors plays an important role in the accurate recognition and tracking of moving targets in intelligent monitoring and control systems. However, factors such as lighting, inaccurate focusing, optical or motion blur, subsampling, and noise can degrade the visual quality of video sequences. In this situation, spatiotemporal super-resolution (SR) reconstruction technology [1], illustrated in Figure 1, provides an effective solution. By making full use of the complementary and redundant information, with similar but not identical details at different spatiotemporal scales, between adjacent video frames, it can produce high-resolution (HR) video sequences by fusing several low-resolution (LR) video frames. It therefore has great research significance and application potential for intelligent monitoring and control.
In recent years, spatiotemporal super-resolution reconstruction has become the focus of much research [2][3][4][5]. Many researchers exploit the complementary and redundant features among multiple frames and study SR reconstruction via multiframe fusion, aiming to improve the spatial resolution of each image or video [6][7][8][9]. However, the traditional methods [10,11] usually rely on accurate estimation of subpixel motion, which constrains their applicability to video sequences with relatively simple motions such as global translation. Therefore, some scholars have proposed a fuzzy registration scheme that estimates motion probabilistically based on similarity matching and introduced it into super-resolution methods [12] to further improve the spatial resolution of images or videos; this scheme effectively avoids the accurate estimation of subpixel motion. Using such a scheme, Protter et al. [13] proposed a nonlocal means (NLM) based SR framework by extending the NLM filter [14,15], which had been successfully applied to denoising, into the SR field. Su et al. proposed a spatially adaptive block-based SR model [16].
However, the newly developed fuzzy registration scheme also has some limitations. If arbitrary-angle rotations exist in the image or video sequence, the correlation between corresponding pixels becomes weak, and it becomes difficult to use the LR images or video frames effectively during SR reconstruction. Moreover, if the LR images or video frames are noisy, the reconstruction quality is seriously affected. Considering the rotation, translation, and scale invariance of Zernike moments (ZM) [17,18], we therefore propose a fast fuzzy registration scheme based on ZM that uses a self-adaptive region correlation judgment strategy, which provides an efficient similarity measure between region features in the spatiotemporal nonlocal domain for weight calculation. On this basis, we construct a novel spatiotemporal SR reconstruction model based on robust optical flow and ZM, which makes full use of the nonlocal self-similarity and redundant information at different spatiotemporal scales between adjacent video frames and produces HR video sequences by fusing several LR video frames. Meanwhile, the new model integrates spatial SR and temporal SR into a unified framework, improving both spatial and temporal resolution and making the video sequences clearer and smoother. Unlike traditional SR reconstruction methods, the proposed method does not rely on accurate estimation of subpixel motion and can adapt to many kinds of complex motion patterns. Traditional motion-vector-based frame interpolation [19,20] can also improve temporal resolution, but because of unavoidable motion estimation errors, visible block or hole artifacts usually appear in the interpolated frames. Our method effectively overcomes these artifacts while improving both spatial and temporal resolution.
The contributions of this paper are as follows. (1) We propose a novel spatiotemporal SR reconstruction model for video sequences based on robust optical flow and ZM. (2) We propose a robust optical flow motion estimation and compensation model based on motion details preserving. (3) By introducing the self-adaptive region correlation judgment strategy, we construct a fast fuzzy registration scheme based on ZM for better STSR with higher efficiency. (4) An efficient iterative curvature-based interpolation (ICBI) scheme is introduced to obtain the initial HR estimate of each LR video frame.
The remainder of the paper is organized as follows. Section 2 presents our proposed model architecture. Section 3 describes the algorithm implementation of our model. Section 4 gives the experimental results and analysis. Conclusions are presented in Section 5.

The Model Architecture
To improve both the spatial and the temporal resolution of video sequences by using spatiotemporal reconstruction technology, a novel spatiotemporal SR reconstruction model (STSR) based on robust optical flow and Zernike moment is proposed in this paper, which integrates spatial SR and temporal SR into a unified framework. The architecture of the proposed model is shown in Figure 2. It mainly includes the following three processes.
First, by analyzing the motion in the spatiotemporal domain of the LR video sequence $\{Y[i,j,t]\}_{t=1}^{T}$, $(i,j) \in \Omega$, we propose a robust multilayered optical flow motion estimation method based on motion details preserving to obtain the motion vector $(u, v)$.
Then, according to the obtained motion vector $(u, v)$, we introduce an efficient biweighted fusion strategy to implement spatiotemporal motion compensation, aiming to obtain the compensated video sequence $\{\tilde{Y}[i,j,t]\}$, $(i,j) \in \Omega$. Finally, a fast fuzzy registration scheme based on ZM is proposed to implement efficient spatiotemporal SR reconstruction and optimization of the compensated sequence $\{\tilde{Y}[i,j,t]\}$ by fusing the nonlocal complementary and redundant information between adjacent video frames, which produces a high-quality video sequence $\{\hat{X}[i,j,t]\}$ with high spatiotemporal resolution.

This final process mainly includes three operations: initial HR estimation based on the ICBI scheme, iterative multiframe fusion, and deblurring, with an iterative refinement mechanism used for optimization.
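To make the data flow between the three processes concrete, here is a minimal Python sketch of the overall structure. The stage functions (`estimate_flow`, `compensate`, `super_resolve`) are hypothetical placeholders rather than the authors' code, and the sketch assumes that temporal SR doubles the frame rate by predicting one frame between every pair of adjacent LR frames.

```python
from typing import Callable, List

import numpy as np

Frame = np.ndarray


def stsr_pipeline(
    lr_frames: List[Frame],
    estimate_flow: Callable[[Frame, Frame], np.ndarray],
    compensate: Callable[[Frame, Frame, np.ndarray], Frame],
    super_resolve: Callable[[List[Frame]], List[Frame]],
) -> List[Frame]:
    """Hypothetical skeleton of the three STSR processes."""
    # Processes 1 and 2: estimate motion between each pair of adjacent LR
    # frames, then predict the missing intermediate frame by compensation.
    sequence: List[Frame] = [lr_frames[0]]
    for prev, nxt in zip(lr_frames[:-1], lr_frames[1:]):
        flow = estimate_flow(prev, nxt)
        sequence.append(compensate(prev, nxt, flow))
        sequence.append(nxt)
    # Process 3: spatiotemporal SR of the compensated sequence
    # (ICBI initialization, multiframe fusion, deblurring, refinement).
    return super_resolve(sequence)
```

With trivial stand-ins (zero flow, averaging compensation, identity SR), three input frames become five output frames, the two new ones lying midway between their neighbors.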

Robust Optical Flow Motion Estimation Model Based on Motion Details Preserving.
Because they rely on the brightness constancy and motion smoothness constraints, traditional optical flow motion estimation methods are usually not robust to illumination changes and noise, and they preserve motion details poorly [21]. To solve this problem, we propose a novel robust optical flow motion estimation model. In our model, an iterative multiresolution coarse-to-fine strategy and the total variation (TV) idea are applied, which effectively avoids falling into local optima and also improves time efficiency.
Traditional TV-based optical flow methods usually have high computational complexity, and when large displacements exist in the video sequence, the motion estimation error grows. To overcome this problem, an iterative multiresolution layered mechanism based on a Gaussian pyramid is introduced in our model to compute optical flow from coarse to fine across resolution scales. The final motion vectors are obtained by adding the offset estimated at the higher resolution to the motion vector obtained at the lower resolution. This mechanism not only improves time efficiency but also yields more reliable motion vectors.
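The coarse-to-fine mechanism can be sketched as follows. The Gaussian smoothing width, the factor-2 downsampling, the bilinear flow upsampling, and the per-level solver `estimate_increment` are assumptions of this sketch; the solver is a hypothetical hook where the TV energy at each level would be minimized.

```python
import numpy as np
from scipy import ndimage


def gaussian_pyramid(img, levels, sigma=1.0):
    """Return pyramid levels ordered coarsest first (factor-2 downsampling)."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(ndimage.gaussian_filter(pyr[-1], sigma)[::2, ::2])
    return pyr[::-1]


def upsample_flow(flow, shape):
    """Resize a (2, H, W) flow field (u horizontal, v vertical) to `shape`
    and scale the vectors by the same factors."""
    zy, zx = shape[0] / flow.shape[1], shape[1] / flow.shape[2]
    u = ndimage.zoom(flow[0], (zy, zx), order=1, mode="nearest") * zx
    v = ndimage.zoom(flow[1], (zy, zx), order=1, mode="nearest") * zy
    return np.stack([u, v])


def coarse_to_fine(img1, img2, levels, estimate_increment):
    """Flow = upsampled coarser-level flow + increment found at this level.
    `estimate_increment(a, b, flow)` is a hypothetical per-level solver."""
    p1, p2 = gaussian_pyramid(img1, levels), gaussian_pyramid(img2, levels)
    flow = np.zeros((2,) + p1[0].shape)
    for k, (a, b) in enumerate(zip(p1, p2)):
        if k > 0:
            flow = upsample_flow(flow, a.shape)
        flow = flow + estimate_increment(a, b, flow)
    return flow
```

Doubling the vectors on each upsampling step is what lets small per-level increments add up to a large displacement at full resolution.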
Given two adjacent frames $I_1$ and $I_2$ in the video sequence, the traditional TV-based optical flow model obtains the motion vectors by finding the velocity field $(u, v)$ that minimizes an objective energy function consisting of a data term $E_D(u, v)$ and a regularization term $E_S(u, v)$. The data term $E_D(u, v)$ is based on the brightness constancy constraint, so the traditional model is strongly affected by factors such as illumination changes, shadows, and occlusion. The regularization term $E_S(u, v)$ is based on the motion smoothness constraint; however, motion is often discontinuous near the contours and edges of a video frame, so the estimate usually degrades at motion discontinuities.
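For reference, a traditional TV-based model of this kind is commonly written in the following TV-L1 form. The weight $\lambda$ and the exact penalties are assumptions about the standard formulation, not a transcription of the authors' equations:

```latex
E(u,v) = E_D(u,v) + \lambda\, E_S(u,v), \\
E_D(u,v) = \sum_{i,j} \bigl| I_2(i + u_{i,j},\; j + v_{i,j}) - I_1(i,j) \bigr|, \\
E_S(u,v) = \sum_{i,j} \bigl( |\nabla u_{i,j}| + |\nabla v_{i,j}| \bigr).
```

The absolute difference in $E_D$ encodes brightness constancy and the TV sum in $E_S$ encodes motion smoothness, which are exactly the two constraints whose weaknesses are discussed above.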
Therefore, building on the traditional methods, we improve the optical flow model in three respects to further increase its robustness and motion estimation accuracy, as follows.
First, to enhance the robustness of the model to factors such as illumination changes and noise, we improve the data term $E_D(u, v)$ by constructing a new data term that combines the brightness constancy and gradient constancy constraints. In this term, the parameter $\gamma$ is a weight adjustment factor between the two constraints, $I(x, y, t)$ denotes the luminance of pixel $(x, y)$ at time $t$, $I_x$, $I_y$, and $I_t$ denote the partial derivatives of $I(x, y, t)$ with respect to $x$, $y$, and $t$, and $(u, v)$ denotes the motion vector obtained by optical flow estimation.
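A minimal NumPy/SciPy sketch of a data term that combines brightness constancy and gradient constancy is shown below, evaluated by warping the second frame with the flow. The Charbonnier penalty, the bilinear warping, and the function names are assumptions of this sketch, since the authors' exact data term is not reproduced here; `gamma` plays the role of the weight between the two constraints.

```python
import numpy as np
from scipy import ndimage


def warp(img, u, v):
    """Bilinearly sample img at (row + v, col + u); borders are clamped."""
    rows, cols = np.indices(img.shape, dtype=float)
    return ndimage.map_coordinates(img, [rows + v, cols + u], order=1, mode="nearest")


def charbonnier(r2, eps=1e-3):
    """Robust penalty sqrt(r^2 + eps^2) applied to a squared residual."""
    return np.sqrt(r2 + eps**2)


def data_term(i1, i2, u, v, gamma=1.0):
    """Brightness + gamma * gradient constancy cost of flow (u, v) from i1 to i2."""
    i1, i2 = np.asarray(i1, float), np.asarray(i2, float)
    gy1, gx1 = np.gradient(i1)
    gy2, gx2 = np.gradient(i2)
    brightness = charbonnier((warp(i2, u, v) - i1) ** 2)
    gradient = charbonnier((warp(gx2, u, v) - gx1) ** 2 + (warp(gy2, u, v) - gy1) ** 2)
    return float(np.sum(brightness + gamma * gradient))
```

The gradient residual is unchanged by an additive brightness offset between frames, which is why adding it makes the data term less sensitive to illumination changes.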
Second, to effectively protect motion discontinuities and edge details, and inspired by the ideas of [22,23], we introduce a motion structure adaptive strategy into the regularization term of our optical flow model. In the improved regularization term, $|\nabla u_{i,j}| + |\nabla v_{i,j}|$ is the traditional TV regularization, and $\omega(i,j)$ is an adaptive weight that protects motion details. Extensive experiments show that motion estimation performs best when the parameter of the adaptive weight $\omega(i,j)$ is set to 0.8.
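As an illustration, the sketch below assumes the structure-adaptive form $\omega(i,j) = \exp(-|\nabla I(i,j)|^{\kappa})$, in which the value 0.8 quoted above is taken to be the exponent $\kappa$ (`kappa`); both the form and this mapping are assumptions rather than the authors' stated formula. The weight multiplies the TV term.

```python
import numpy as np


def structure_adaptive_weight(img, kappa=0.8):
    """omega = exp(-|grad I|^kappa): about 1 in flat areas, small across edges."""
    gy, gx = np.gradient(np.asarray(img, float))
    return np.exp(-np.hypot(gx, gy) ** kappa)


def weighted_tv(u, v, omega):
    """Sum over pixels of omega * (|grad u| + |grad v|)."""
    uy, ux = np.gradient(np.asarray(u, float))
    vy, vx = np.gradient(np.asarray(v, float))
    return float(np.sum(omega * (np.hypot(ux, uy) + np.hypot(vx, vy))))
```

Where the image gradient is large, `omega` shrinks the smoothness penalty, so flow discontinuities that coincide with image edges become cheap to keep while flat regions are still smoothed.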
Finally, a heuristic nonlocal median filtering term $E_{\mathrm{WNL}}$ is introduced into our model, which optimizes the optical flow field at each pyramid level by adaptive weighted median filtering. In this term, $N_{i,j}$ is the set of neighbors of pixel $(i, j)$ in a large nonlocal region, and $\hat{u}_{i,j}$ and $\hat{v}_{i,j}$ denote the flow field estimate.
$w_{i,j,i',j'}$ is the adaptive weight factor for the pixel $(i, j)$ and its neighbor $(i', j')$. Inspired by the idea in [24], the weight is determined by the spatial distance, the color-value distance, and the occlusion state, where $r(i, j)$ and $r(i', j')$ denote color vectors in the Lab space, $o(i, j)$ and $o(i', j')$ denote occlusion variables, and $\sigma_1 = 7$.
Because pixels in occluded regions lack correspondences between adjacent video frames, the motion vectors estimated in these regions are usually inaccurate. We therefore further optimize the estimated motion vectors by occlusion-aware refinement. Considering both the flow difference and the pixel projection difference, we detect occluded regions and solve for the occlusion variable $o(i, j)$ using an unnormalized Gaussian function with zero mean, in which $d(i, j)$ denotes the flow difference and $e(i, j)$ denotes the pixel projection difference; in our experiment, $\sigma_d = 0.3$ and $\sigma_e = 20$.
Based on the above three improvements, we construct the objective energy function (9) of our new optical flow model. The final high-accuracy optical flow field $(u, v)$ is obtained by minimizing (9), where the parameters $\lambda_1$ and $\lambda_2$ are weight adjustment factors that balance $E_D(u, v)$, $E_S(u, v)$, and $E_{\mathrm{WNL}}$.
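The occlusion variable and the adaptive weight can be sketched as follows. The functional forms (an unnormalized zero-mean Gaussian of the flow difference and the projection difference, and a bilateral-style weight multiplied by the occlusion ratio $o(i',j')/o(i,j)$) are assumptions modeled on the Classic-NL approach of [24], as is treating only negative flow divergence as the flow difference. The color scale `sigma2` is a required argument because only $\sigma_1 = 7$ is stated above.

```python
import numpy as np


def occlusion_state(flow_div, proj_diff, sigma_d=0.3, sigma_e=20.0):
    """Unnormalized zero-mean Gaussian occlusion variable in (0, 1].
    Assumption: only negative flow divergence counts as flow difference;
    values near 0 mark likely occluded pixels."""
    d = np.minimum(np.asarray(flow_div, float), 0.0)
    e = np.asarray(proj_diff, float)
    return np.exp(-d**2 / (2 * sigma_d**2) - e**2 / (2 * sigma_e**2))


def nonlocal_weight(p, q, lab_p, lab_q, occ_p, occ_q, sigma2, sigma1=7.0):
    """Weight of neighbor q for pixel p from spatial distance, Lab color
    distance, and occlusion state (hypothetical bilateral-style form)."""
    spatial = np.sum((np.asarray(p, float) - np.asarray(q, float)) ** 2)
    color = np.sum((np.asarray(lab_p, float) - np.asarray(lab_q, float)) ** 2)
    return float(np.exp(-spatial / (2 * sigma1**2) - color / (2 * sigma2**2)) * occ_q / occ_p)
```

Neighbors that are far away, differ in color, or are themselves likely occluded receive small weights, so they contribute little to the median that refines the flow.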
In the optimization with the heuristic nonlocal median filtering term $E_{\mathrm{WNL}}$, we design a strategy to further improve the time efficiency of our optical flow model. Given the estimated optical flow, we first detect motion boundaries using a Canny edge detector and then dilate these edges with a 5 × 5 mask to obtain flow boundary regions. In these regions, we apply the adaptive weight in (7) within a 15 × 15 nonlocal window, whereas in nonboundary regions we apply equal weights within a 5 × 5 window for the median calculation.
Mathematical Problems in Engineering 5
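Assuming, as in the Classic-NL model [24], that $E_{\mathrm{WNL}}$ penalizes weighted absolute differences between each flow value and its nonlocal neighbors, minimizing it for one pixel amounts to taking a weighted median of the neighbors' values. The sketch below combines that with the boundary-dependent windows described above. The edge mask is assumed to come from a Canny detector applied to the flow (for example `skimage.feature.canny` on the flow magnitude), which is not reproduced here, and `weight_fn` is a hypothetical hook for the adaptive weight (7).

```python
import numpy as np
from scipy import ndimage


def weighted_median(values, weights):
    """Return a value m minimizing sum(weights * |values - m|)."""
    values = np.asarray(values, float).ravel()
    weights = np.asarray(weights, float).ravel()
    order = np.argsort(values)
    cum = np.cumsum(weights[order])
    # First sorted value whose cumulative weight reaches half of the total.
    return float(values[order][np.searchsorted(cum, 0.5 * cum[-1])])


def window_size_map(edge_mask, dilate=5, boundary_win=15, flat_win=5):
    """Dilate motion-boundary pixels with a dilate x dilate mask and pick the
    median window size per pixel (large near boundaries, small elsewhere)."""
    region = ndimage.binary_dilation(np.asarray(edge_mask, bool),
                                     structure=np.ones((dilate, dilate), bool))
    return region, np.where(region, boundary_win, flat_win)


def median_refine(flow, edge_mask, weight_fn):
    """Weighted median filtering of one flow component: adaptive weights from
    weight_fn(i, j, rows, cols) in boundary regions, equal weights elsewhere."""
    h, w = flow.shape
    region, sizes = window_size_map(edge_mask)
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            r = sizes[i, j] // 2
            r0, r1 = max(0, i - r), min(h, i + r + 1)
            c0, c1 = max(0, j - r), min(w, j + r + 1)
            rows, cols = np.mgrid[r0:r1, c0:c1]
            wts = weight_fn(i, j, rows, cols) if region[i, j] else np.ones(rows.shape)
            out[i, j] = weighted_median(flow[r0:r1, c0:c1], wts)
    return out
```

Restricting the large 15 × 15 window and the per-neighbor weights to the dilated boundary band keeps the expensive computation to a small fraction of the frame.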

Spatiotemporal Motion Compensation Using the Biweighted Fusion Strategy.
After the motion vector $(u, v)$ is obtained using the optical flow model introduced in Section 3.1, a spatiotemporal motion compensation scheme is applied to predict the missing intermediate frames of the video sequence, yielding their initial estimates.
Traditional motion compensation strategies based on unidirectional motion vectors usually produce visual artifacts such as block effects, frame distortion, and motion blur, which significantly degrade video quality. More complex strategies can perform better to some extent, but at a much higher time cost. To obtain better compensation while keeping the algorithm efficient, we introduce an efficient biweighted fusion strategy for spatiotemporal motion compensation, in which the pixel energy value of the predicted frame is a weighted fusion of motion-compensated values from the two adjacent frames. Considering both time complexity and compensation accuracy, the weights $\alpha_1$ and $\alpha_2$ are both set to 0.5 in our experiment.
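Because the compensation formula itself is not reproduced above, the following is only a sketch under a common bidirectional assumption: the predicted frame averages a backward half-step sample from the previous frame and a forward half-step sample from the next frame, weighted by $\alpha_1 = \alpha_2 = 0.5$ (`a1`, `a2`). It also assumes the flow is sampled on the intermediate frame's grid, a simplification that ignores the holes discussed in the next section.

```python
import numpy as np
from scipy import ndimage


def biweighted_interpolate(prev, nxt, u, v, a1=0.5, a2=0.5):
    """Predict the frame midway between prev and nxt, given the motion (u, v)
    from prev to nxt (assumed to be sampled on the intermediate grid)."""
    prev, nxt = np.asarray(prev, float), np.asarray(nxt, float)
    rows, cols = np.indices(prev.shape, dtype=float)
    # Backward half-step into prev and forward half-step into nxt.
    from_prev = ndimage.map_coordinates(prev, [rows - 0.5 * v, cols - 0.5 * u],
                                        order=1, mode="nearest")
    from_next = ndimage.map_coordinates(nxt, [rows + 0.5 * v, cols + 0.5 * u],
                                        order=1, mode="nearest")
    return a1 * from_prev + a2 * from_next
```

Averaging two independently warped samples partly cancels errors that a single-direction warp would expose as block or distortion artifacts.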
Based on Sections 3.1 and 3.2, the general framework of our proposed optical flow motion estimation and spatiotemporal motion compensation is shown in Figure 3.

Enhanced Spatiotemporal Super-Resolution Reconstruction Based on Robust Optical Flow and Zernike Moment.
Experimental analysis shows that hole artifacts usually appear in the spatiotemporally predicted frames, mainly because of unavoidable optical flow motion estimation errors. To overcome this problem, we make full use of the nonlocal self-similarity and redundant information in the spatiotemporal domain between adjacent video frames: we propose a fast fuzzy registration scheme based on ZM and then apply a multiframe information fusion strategy to construct an efficient enhanced spatiotemporal super-resolution reconstruction model that reconstructs and optimizes the predicted frames. This yields predicted frames with more pleasing visual quality and further improves the temporal resolution of the video sequence. Moreover, the new model performs spatial resolution reconstruction of the initial LR video sequence, finally producing a video sequence with high spatiotemporal resolution.
Unlike traditional SR reconstruction approaches, the new scheme does not depend on accurate estimation of subpixel motion; it performs SR reconstruction by mining the nonlocal self-similarities among several adjacent video frames. Because of the rotation, translation, and scale invariance of ZM, the proposed model performs better when complex motion patterns, such as rotations by arbitrary angles, exist in the video sequence. Moreover, low-order ZM are not sensitive to noise, so our model also has good noise robustness. The implementation of our proposed STSR model mainly includes the following three steps.
Step 1. An efficient iterative curvature-based interpolation (ICBI) scheme is introduced to obtain the initial HR estimate $\{\hat{X}_0[i,j,t]\}$ of the motion-compensated LR video sequence $\{\tilde{Y}[i,j,t]\}$.
Step 2. On the basis of the initial HR estimate $\{\hat{X}_0[i,j,t]\}$, each video frame is super-resolved using the multiframe information fusion strategy based on the proposed fast fuzzy registration scheme.
Step 3. Deblurring and an iterative refinement mechanism are applied to optimize the super-resolved video sequence and further improve the reconstruction quality, finally yielding the high-quality video sequence $\{\hat{X}[i,j,t]\}$ with high spatiotemporal resolution.
In the first step, we introduce an efficient iterative curvature-based interpolation (ICBI) scheme [25] to provide a better initial HR estimate for the second step, which significantly influences the weight calculation in the first iteration of the subsequent fusion process. Compared with traditional interpolation schemes, the ICBI scheme, which is based on the continuity of second-order derivatives and on curvature energy, is simple, highly effective at removing blurring and jagged artifacts, and runs in real time. In this scheme, a rough estimate of the energy $\hat{I}(2i+1, 2j+1)$ of each interpolated pixel is first computed from its diagonal neighbors, where $V_1(2i+1, 2j+1)$ and $V_2(2i+1, 2j+1)$ are local approximations of the second-order derivatives along the two diagonal directions computed from the eight neighboring pixels. Because this is only a rough estimate, it must be refined iteratively. Following Giachetti and Asuni [25], the rough estimate for each pixel is modified according to (12), which yields a higher-quality initial HR estimate of each LR video frame.
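The first (diagonal) ICBI step can be sketched as follows: each new pixel $\hat{I}(2i+1, 2j+1)$ is the average of the two known pixels along the diagonal whose second-derivative magnitude is smaller. The specific eight-pixel approximations used for `v1` and `v2` are a simplified choice of this sketch and may differ from those in [25]; the iterative refinement with (12) and the pass that fills pixels such as $(2i, 2j+1)$ are not shown, so those positions are left as NaN.

```python
import numpy as np


def icbi_diagonal_step(lr):
    """Fill the (2i+1, 2j+1) pixels of a roughly 2x upscaled grid by averaging
    along the smoother diagonal; other new pixels are left as NaN."""
    lr = np.asarray(lr, float)
    h, w = lr.shape
    p = np.pad(lr, 1, mode="edge")

    def s(di, dj):
        # lr[i + di, j + dj] for every centre (i, j), 0 <= i < h-1, 0 <= j < w-1.
        return p[1 + di:h + di, 1 + dj:w + dj]

    # Second differences along the main diagonal (V1) and anti-diagonal (V2).
    v1 = np.abs(s(-1, -1) - 2 * s(0, 0) + s(1, 1)) + np.abs(s(0, 0) - 2 * s(1, 1) + s(2, 2))
    v2 = np.abs(s(-1, 2) - 2 * s(0, 1) + s(1, 0)) + np.abs(s(0, 1) - 2 * s(1, 0) + s(2, -1))
    centre = np.where(v1 < v2, 0.5 * (s(0, 0) + s(1, 1)), 0.5 * (s(0, 1) + s(1, 0)))
    hr = np.full((2 * h - 1, 2 * w - 1), np.nan)
    hr[::2, ::2] = lr
    hr[1::2, 1::2] = centre
    return hr
```

On a step edge running along the main diagonal, the sketch averages along the edge rather than across it, so the edge stays sharp instead of being blurred to an intermediate value.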
$$U(2i+1, 2j+1) = a\,U_c(2i+1, 2j+1) + b\,U_e(2i+1, 2j+1) + c\,U_i(2i+1, 2j+1) \qquad (12)$$
where $a$, $b$, and $c$ are adjustment factors that control the proportions of the curvature continuity energy $U_c$, the curvature enhancement energy $U_e$, and the isolevel curve smoothing energy $U_i$. We calculate these three energies using (5), (9), and (10) in [25].
Once the initial HR estimate of each LR video frame is obtained, the next step (Step 2), which is the core of the algorithm, is built on the fast fuzzy registration scheme based on ZM. We compute weights by mining the nonlocal self-similarity patterns between the frame to be super-resolved and each LR video frame, and then obtain the HR estimate of the frame to be super-resolved by weighted averaging. The weight $w_{\mathrm{ezer}}[k, l, i, j, t]$ of each pixel in the nonlocal neighboring region is calculated from the ZM-based similarity. Building on [17], to improve time efficiency, we introduce a region correlation judgment strategy controlled by a self-adaptive threshold, which yields a spatiotemporally adaptive model. Moreover, this strategy helps mine the most similar patterns for similarity-based weight calculation, so SR quality is also improved to some extent. To describe our improved method, we first provide the following definitions.

Definition 1. A video frame is divided into many regions of equal size, and each region is divided into 5 × 5 patches. The total number of pixels in each region is Num, and the pixel energies are denoted by $e_1, e_2, \ldots, e_{\mathrm{Num}}$. We define $\bar{E}(i,j)$ as the average energy of the region centered on pixel $(i,j)$:
$$\bar{E}(i,j) = \frac{1}{\mathrm{Num}} \sum_{n=1}^{\mathrm{Num}} e_n.$$
Definition 2. Given two regions centered on pixels $(k,l)$ and $(i,j)$, denoted by $R(k,l)$ and $R(i,j)$, respectively, the corresponding feature vectors extracted from these two regions are $\mathrm{ZM}(k,l)$ and $\mathrm{ZM}'(i,j)$, the ZM feature vectors for pixel $(k,l)$ and pixel $(i,j)$, respectively, and the feature similarity between the two regions is measured by comparing these vectors. Each component is a Zernike moment of order $n$ and repetition $m$,
$$\mathrm{ZM}_{nm} = \frac{n+1}{\pi} \sum_{x}\sum_{y} f(x,y)\, V^{*}_{nm}(\rho,\theta), \qquad x^2 + y^2 \le 1,$$
where $\rho = \sqrt{x^2+y^2}$, $\theta = \tan^{-1}(y/x)$, and $V^{*}_{nm}(\rho,\theta)$ represents the complex conjugate of the Zernike polynomial $V_{nm}(\rho,\theta)$.
In our SR reconstruction model, region correlation judgment is first applied to divide the regions centered on all pixels $(i,j)$ in the search region of pixel $(k,l)$ into related and unrelated regions; only related regions are used to calculate the weight, which further improves time efficiency. For region correlation judgment, a self-adaptive threshold $T_{\mathrm{adap}}$ is introduced to decide whether two regions are related. The threshold is adaptively determined by the average energy $\bar{E}(k,l)$ of the region centered on pixel $(k,l)$, which leads to more accurate judgment of region correlation, and it is scaled by an adjustment factor $\beta$. Experiments confirmed that better SR quality is obtained when $\beta$ is set to 0.08. Based on the above ideas, we construct the following enhanced weight calculation formula based on ZM similarity.
where $(k,l)$ denotes the pixel to be super-resolved and $(i,j)$ denotes a pixel in the nonlocal neighboring region $N_{\mathrm{nonloc}}(k,l)$ of $(k,l)$. The parameter $h$ controls the decay rate of the exponential function and hence of the weight, and $C(k,l)$ is a normalization constant.
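A sketch of the weight is given below, assuming the nonlocal-means style form $w \propto \exp(-\|\mathrm{ZM}(k,l) - \mathrm{ZM}'(i,j)\|^2 / h^2)$ used in ZM-based SR; this form and the function name are assumptions of the sketch. Candidates judged unrelated by the region correlation test get zero weight, and dividing by the sum plays the role of $C(k,l)$.

```python
import numpy as np


def zm_weights(ref_feat, cand_feats, related, h):
    """NLM-style weights exp(-||ZM_ref - ZM_cand||^2 / h^2), zeroed for
    unrelated regions and normalized to sum to one."""
    cand = np.asarray(cand_feats, float)
    d2 = np.sum((cand - np.asarray(ref_feat, float)) ** 2, axis=1)
    w = np.exp(-d2 / h**2) * np.asarray(related, bool)
    total = w.sum()  # corresponds to the normalization constant C(k, l)
    return w / total if total > 0 else w
```

A larger `h` flattens the weights toward a plain average; a smaller `h` concentrates them on the closest feature matches.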
Note that the higher the ZM order, the more sensitive the moment is to noise. Therefore, in our experiments we only calculated moments up to the third order: $\mathrm{ZM}_{00}$, $\mathrm{ZM}_{11}$, $\mathrm{ZM}_{20}$, $\mathrm{ZM}_{22}$, $\mathrm{ZM}_{31}$, and $\mathrm{ZM}_{33}$.
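The six moments and the region correlation judgment can be sketched as below. Several details are assumptions of this sketch: pixel centers of a square region are mapped into the unit disk, the feature vector holds the magnitudes $|\mathrm{ZM}_{nm}|$ (which are rotation invariant), and two regions count as related when their average energies differ by at most $\beta \bar{E}(k,l)$, which is only one plausible reading of the self-adaptive threshold.

```python
from math import factorial

import numpy as np

# Moments up to third order with n - |m| even and m >= 0.
ORDERS = [(0, 0), (1, 1), (2, 0), (2, 2), (3, 1), (3, 3)]


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm(rho)."""
    m = abs(m)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        c = (-1) ** s * factorial(n - s) / (
            factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s))
        out += c * rho ** (n - 2 * s)
    return out


def zernike_features(region, orders=ORDERS):
    """Magnitudes |ZM_nm| of a square region mapped into the unit disk."""
    region = np.asarray(region, float)
    size = region.shape[0]
    coords = (2 * np.arange(size) - (size - 1)) / size
    x, y = np.meshgrid(coords, coords)  # x follows columns, y follows rows
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1.0
    feats = []
    for n, m in orders:
        v_conj = radial_poly(n, m, rho) * np.exp(-1j * m * theta)  # V*_nm
        feats.append(abs((n + 1) / np.pi * np.sum(region[inside] * v_conj[inside])))
    return np.array(feats)


def average_energy(region):
    """Mean pixel energy of a region (Definition 1)."""
    return float(np.mean(region))


def regions_related(region_kl, region_ij, beta=0.08):
    """Hypothetical rule: related if |E(k,l) - E(i,j)| <= beta * E(k,l)."""
    e_kl, e_ij = average_energy(region_kl), average_energy(region_ij)
    return abs(e_kl - e_ij) <= beta * abs(e_kl)
```

Rotating a region only multiplies each $\mathrm{ZM}_{nm}$ by a unit-modulus phase factor, so comparing magnitudes lets rotated but otherwise similar regions still receive large weights.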
Once the weight $w_{\mathrm{ezer}}[k,l,i,j,t]$ is determined, the HR estimate of each pixel in the video frame to be super-resolved is obtained as the weighted average of the pixels in its nonlocal neighboring region in each LR video frame. Suppose that $Z = HX$, that is, $Z$ is the blurred version of the HR frame $X$; the blurred HR estimate is then obtained by minimizing the following energy function, where $\Psi$ represents the video frame to be super-resolved.
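The weighted average of Step 2 can be sketched as a nonlocal-means style fusion across frames. For simplicity, the sketch assumes all frames already lie on the same (initial HR) grid, omits the region correlation judgment, and takes a pluggable `feature_fn` (raw patch pixels or ZM magnitudes), all of which are assumptions of this sketch. For each pixel, the weighted average is the minimizer of $\sum_j w_j (z - y_j)^2$.

```python
import numpy as np


def nonlocal_fusion(frames, ref_idx, feature_fn, h, patch=3, search=5):
    """Estimate each pixel of frames[ref_idx] as a weighted average of the
    pixels in a search window of every frame; weights decay with the squared
    distance between patch features."""
    frames = [np.asarray(f, float) for f in frames]
    rows, cols = frames[ref_idx].shape
    pr, sr = patch // 2, search // 2
    off = pr + sr
    padded = [np.pad(f, off, mode="reflect") for f in frames]

    def feat(img, ci, cj):
        return feature_fn(img[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1])

    out = np.empty((rows, cols))
    for k in range(rows):
        for l in range(cols):
            ref = feat(padded[ref_idx], k + off, l + off)
            num = den = 0.0
            for img in padded:
                for di in range(-sr, sr + 1):
                    for dj in range(-sr, sr + 1):
                        ci, cj = k + off + di, l + off + dj
                        w = np.exp(-np.sum((feat(img, ci, cj) - ref) ** 2) / h**2)
                        num += w * img[ci, cj]
                        den += w
            out[k, l] = num / den
    return out
```

Because matching pixels are gathered from every frame without any explicit motion vector, this kind of fusion does not need subpixel-accurate registration.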
Finally, in the third step, we apply a stronger adaptive kernel regression (AKR) deblurring mechanism [26] to the result of the multiframe fusion in the second step to remove the remaining blur. The desired HR video frame is obtained by minimizing the following objective function, where $\lambda$ denotes the weighting parameter of the AKR deblurring process.
To further improve the SR reconstruction quality, the result is refined iteratively. The result of each iteration provides a basis for more accurate weight calculation in the next iteration.

Experimental Data Set and Evaluation Indices.
To validate the effectiveness of our proposed model and algorithm, two groups of experiments were designed. In the first group, we evaluated our proposed optical flow motion estimation model in terms of estimation accuracy and time efficiency and compared it with several existing methods.
The second group of experiments was designed to evaluate our proposed spatiotemporal SR reconstruction model. In the experiments, we used spatial video sequences taken from the YOUKU website (http://www.youku.com/) and standard video sequences taken from the http://trace.eas.asu.edu/yuv/index.html website. The SR methods were assessed by subjective visual evaluation and three objective quantitative indices, the peak signal-to-noise ratio (PSNR), mean structural similarity index (MSSIM), and root-mean-square error (RMSE), calculated as follows:
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(x(i,j) - \hat{x}(i,j)\bigr)^2},$$
$$\mathrm{SSIM}(x,\hat{x}) = \frac{(2\mu_x\mu_{\hat{x}} + C_1)(2\sigma_{x\hat{x}} + C_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + C_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + C_2)},$$
$$\mathrm{MSSIM}(X,\hat{X}) = \frac{1}{K}\sum_{k=1}^{K}\mathrm{SSIM}\bigl(x(k),\hat{x}(k)\bigr),$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(x(i,j) - \hat{x}(i,j)\bigr)^2},$$
where $M$ and $N$ denote the length and width of the video frame, $x$ and $\hat{x}$ denote the original frame and the reconstructed frame, respectively, $\mu_x$ and $\mu_{\hat{x}}$ are their means, $\sigma_x$ and $\sigma_{\hat{x}}$ are their standard deviations, $\sigma_{x\hat{x}}$ is the covariance of the original and reconstructed frames, $C_1$ and $C_2$ are constants, and $K$ is the number of frame blocks. A greater PSNR means the reconstructed frame is closer to the original. The closer MSSIM ($0 \le \mathrm{MSSIM} \le 1$) is to 1, the greater the structural similarity between the original and reconstructed frames. A smaller RMSE means the reconstructed frame is closer to the original.
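The three indices can be computed as in the sketch below. The constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$ follow the usual SSIM convention, and averaging SSIM over non-overlapping 8 × 8 blocks is an assumption about how the $K$ frame blocks are formed; PSNR is infinite for identical frames.

```python
import numpy as np


def mse(x, x_hat):
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return float(np.mean((x - x_hat) ** 2))


def psnr(x, x_hat, peak=255.0):
    return 10.0 * np.log10(peak**2 / mse(x, x_hat))


def rmse(x, x_hat):
    return float(np.sqrt(mse(x, x_hat)))


def mssim(x, x_hat, block=8, peak=255.0, k1=0.01, k2=0.03):
    """Mean SSIM over non-overlapping block x block windows."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    scores = []
    for i in range(0, x.shape[0] - block + 1, block):
        for j in range(0, x.shape[1] - block + 1, block):
            a, b = x[i:i + block, j:j + block], x_hat[i:i + block, j:j + block]
            mu_a, mu_b = a.mean(), b.mean()
            cov = np.mean((a - mu_a) * (b - mu_b))
            scores.append(((2 * mu_a * mu_b + c1) * (2 * cov + c2))
                          / ((mu_a**2 + mu_b**2 + c1) * (a.var() + b.var() + c2)))
    return float(np.mean(scores))
```

For example, two 8-bit frames that differ by exactly 1 everywhere have RMSE 1 and PSNR $10\log_{10}(255^2) \approx 48.13$ dB.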

Experimental Results.
In the first group of experiments, our improved optical flow motion estimation model was assessed by two quantitative indices, the average end-point error (EPE) and the average angular error (AAE), and four existing methods (HS [27], BA [27], Classic-C [28], and Classic-NL [24]) were used for comparison. We chose two sequences, RubberWhale and Grove2, from the standard Middlebury optical flow database [29] to evaluate the performance of our optical flow model. The three weight parameters of our optical flow model are set to 1, $10^2$, and 1, respectively. The EPE and AAE values and the running times of the different optical flow methods for the two sequences without and with noise are shown in Tables 1 and 2, respectively. The results in Table 1 show that, compared with the existing methods, the proposed method has the best overall performance, with lower average EPE and AAE values and a smaller time cost. The results in Table 2 show that the proposed model is robust to noise and also performs better under noise disturbance. Figure 4 shows the optical flow maps and motion vector graphs obtained by the proposed model for the spatial sequences. As Figure 4 shows, the proposed method accurately detects the motion area of the spatial target and preserves motion edges well, so it is effective for motion estimation in spatial sequences.
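For readers reproducing Tables 1 and 2, EPE and AAE are commonly defined as in the following sketch, which reflects our reading of the standard Middlebury definitions; AAE is the angle, in degrees, between the space-time vectors $(u, v, 1)$ and $(u_{gt}, v_{gt}, 1)$, averaged over pixels.

```python
import numpy as np


def epe(u, v, u_gt, v_gt):
    """Average end-point error: mean Euclidean distance between flow vectors."""
    return float(np.mean(np.hypot(np.subtract(u, u_gt), np.subtract(v, v_gt))))


def aae(u, v, u_gt, v_gt):
    """Average angular error in degrees between (u, v, 1) and (u_gt, v_gt, 1)."""
    u, v, u_gt, v_gt = (np.asarray(a, float) for a in (u, v, u_gt, v_gt))
    cos = (u * u_gt + v * v_gt + 1.0) / np.sqrt(
        (u**2 + v**2 + 1.0) * (u_gt**2 + v_gt**2 + 1.0))
    return float(np.degrees(np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))))
```

The clip guards against rounding pushing the cosine slightly outside $[-1, 1]$. A uniform one-pixel error against a zero ground truth gives EPE 1 and AAE 45°.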
In the second group of experiments, to verify the performance of the proposed SR reconstruction model and algorithm, three experiments were designed to compare our method (STSR) with existing approaches: POCS, NL-SR [13], and ZM-SR [17]. To obtain better SR reconstruction results while improving time efficiency, we applied STSR to each frame using its six adjacent frames in the video sequence. In our STSR method, the region size for weight calculation is 3 × 3, and the optimal search region size is 5 × 5.
In Experiment 1, we used two spatial video sequences, Satellite-1 (20 frames/s, 592 × 256/frame) and Satellite-2 (20 frames/s, 640 × 346/frame), and two standard video sequences, Foreman (20 frames/s, 352 × 288/frame) and Suzie (20 frames/s, 351 × 240/frame). Each frame was blurred using a 3 × 3 uniform mask, decimated by a factor of 2, and contaminated by additive Gaussian noise with σ = 2. Meanwhile, the even frames were removed, reducing the frame rate to 10 frames/s. We then attempted to reconstruct the HR video sequences from the LR video sequences by 2× spatial and temporal super-resolution reconstruction. Figure 5 gives the PSNR, MSSIM, and RMSE values of the POCS, NL-SR, ZM-SR, and STSR methods for the four video sequences, and the average PSNR, MSSIM, and RMSE values of the four methods are shown in Table 3. Table 4 shows the time cost of the multiframe fusion process in the ZM-SR and STSR methods, listing the average SR time per iteration for each video frame in the spatial sequences and in the standard Foreman and Suzie sequences. Further analysis of the experimental results shows that the proposed STSR method has clear advantages over the traditional methods. On the one hand, it yields better results, with higher PSNR and MSSIM values and lower RMSE values. On the other hand, it is more efficient, requiring only half the time of ZM-SR. The main reason is the region correlation judgment strategy controlled by a self-adaptive threshold: it helps mine the most similar patterns for similarity-based weight calculation, and only the most related regions, rather than all regions, are used for weight calculation. Figure 6 shows the visual results of spatial resolution reconstruction by the four methods (POCS, NL-SR, ZM-SR, and STSR) for the four video sequences.
Figure 6 shows that, because it relies on accurate estimation of subpixel motion, the POCS method is affected by motion estimation errors and produces ghosting (see the red rectangles), so it does not adapt well to sequences with complex motion patterns. Compared with POCS, NL-SR, ZM-SR, and STSR perform better because they do not rely on accurate estimation of subpixel motion and also have a denoising effect to some extent. However, whereas NL-SR and ZM-SR usually produce blur and jagged artifacts in some textures (see the local textures marked by red rectangles), the proposed STSR method yields more pleasing results with richer details and clearer edges and contours. Figure 7 shows the visual results of temporal resolution reconstruction by two schemes (ICBI and STSR) for the two spatial video sequences, where the aim is to predict and reconstruct missing or distorted video frames. Figure 7 shows that, because of noise and optical flow motion estimation errors, the traditional single-frame ICBI scheme usually produces black hole artifacts in the reconstructed frames (see the local textures marked by red rectangles). Our method effectively overcomes this problem, mainly because it applies a multiframe information fusion strategy for SR reconstruction that makes full use of the nonlocal self-similarity and redundant information in the spatiotemporal domain between adjacent video frames. In Experiment 2, we used the two spatial sequences Satellite-1 and Satellite-2 to test the noise robustness and rotation invariance of the proposed method and compared it with the traditional POCS, NL-SR, and ZM-SR methods.
Frames 55-60 of Satellite-1 and frames 62-67 of Satellite-2 were blurred using a 3 × 3 uniform mask, decimated by a factor of 2, and contaminated by additive noise with mean 0 and standard deviation 0.2, 0.4, 0.6, 0.8, 1.0, and 1.2; some frames were then rotated by a slight angle. Moreover, the even frames were removed. We then performed 2× spatial and temporal SR reconstruction on the two sequences. The PSNR, MSSIM, and RMSE values for the POCS, NL-SR, ZM-SR, and STSR methods under different noise levels are shown in Figure 8. Compared with the traditional methods, STSR with a filter parameter of $h = 10$ performs better, with higher PSNR and MSSIM values and lower RMSE values at all tested noise levels. The results of the first two experiments demonstrate that STSR performs better regardless of whether rotations occur, which shows that our method is more effectively rotation invariant and is not sensitive to noise. Furthermore, in Experiment 3 we tested our STSR model under different noise models (Gaussian, Poisson, and mixed Poisson-Gaussian) on the Satellite-1, Satellite-2, Foreman, and Suzie sequences; the results are shown in Table 5.
The PSNR, MSSIM, and RMSE results in Table 5 demonstrate that our method performs better under Gaussian, Poisson, and mixed Poisson-Gaussian noise. Thus, our method can also be applied to noise models other than white Gaussian noise and still performs well.
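The degradation protocol of Experiment 1 can be sketched as follows. Reading "even frames" as frames 2, 4, 6, ... in 1-based numbering, applying the 3 × 3 uniform blur with `scipy.ndimage.uniform_filter` and its default reflective borders, and fixing the random seed are assumptions of this sketch.

```python
import numpy as np
from scipy import ndimage


def degrade_sequence(frames, factor=2, sigma=2.0, seed=0):
    """Simulate the LR input: 3x3 uniform blur, decimation by `factor`,
    additive Gaussian noise, and removal of the even-numbered frames."""
    rng = np.random.default_rng(seed)
    out = []
    for idx, frame in enumerate(frames):
        if idx % 2 == 1:  # frames 2, 4, 6, ... in 1-based numbering
            continue
        blurred = ndimage.uniform_filter(np.asarray(frame, float), size=3)
        lr = blurred[::factor, ::factor]
        out.append(lr + rng.normal(0.0, sigma, lr.shape))
    return out
```

Setting `sigma` to the values 0.2 through 1.2 gives the noise levels of Experiment 2; the slight rotations applied there are not modeled in this sketch.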

Conclusions
A novel model and algorithm for spatiotemporal super-resolution reconstruction of video sequences are proposed in this paper. In our model, an optical flow motion estimation model based on motion details preserving is first proposed to obtain the motion vectors, and an efficient biweighted fusion strategy is then introduced to implement spatiotemporal motion compensation. Next, exploiting the rotation, translation, and scale invariance of ZM, we propose a fast fuzzy registration scheme based on ZM with a self-adaptive region correlation judgment strategy, and the final video sequences with high spatiotemporal resolution are obtained by fusing the complementary and redundant information with nonlocal self-similarity between adjacent video frames. Moreover, the new model integrates spatial SR and temporal SR into a unified framework, improving both spatial and temporal resolution and making the video sequences clearer and smoother. Unlike traditional SR reconstruction methods, the proposed method does not rely on accurate estimation of subpixel motion and can adapt to many kinds of complex motion patterns. Experimental results demonstrate that the proposed method outperforms existing methods in terms of both subjective visual and objective quantitative evaluations and has better rotation invariance and noise robustness.