A Novel Adaptive Directional Interpolation Algorithm for Digital Video Resolution Enhancement

In this paper, a novel digital video resolution enhancement algorithm based on adaptive directional interpolation is proposed, where the directionality of the edge structure and the nonlocal self-similarity prior within the current frame as well as its adjacent frames are both considered. First, we establish the regularization equation that conforms to the prior model of a video frame and then take the classic bicubic interpolation result as the initial estimation to iteratively solve the restoration equation, in which the edge structures and contours in low resolution (LR) input are reconstructed to estimate and refine the desired high resolution (HR) output. Experimental results show that the proposed algorithm can effectively enhance the clarity of a video frame, with satisfying subjective visual quality and PSNR value.


Introduction
Videos and images are the main sources of information for humans. According to statistics, more than 80% of the information we receive from the outside world comes from vision. With the development of digital mobile communication and computer technology, various novel applications such as distance education, video on demand, telemedicine, and multiperson online video conferencing have appeared, promoting productivity and social progress. Meanwhile, the demand for digital video quality keeps rising: clarity has moved from standard definition to high definition (HD) and ultrahigh definition, and the corresponding resolution from 480p to 720p, 1080p, and 2160p (4K). On the one hand, these improvements in clarity and resolution meet the increasing demand of end users and provide better image quality; on the other hand, while high-resolution video provides more detail, it also burdens the entire production and consumption ecosystem: more expensive capture and storage devices on the image acquisition side, additional computing resources for video editing on the media creation side, and more data transmission pressure on the communication network side. All of these have become important factors that restrict further improvement of video clarity and quality. A common way to address this problem is an image postprocessing procedure in which the LR input frame is interpolated by a superresolution method [1][2][3][4][5][6][7], yielding a resolution-enhanced HR frame. This software-based technique does not change the existing image acquisition and data transmission systems and is thus of great value in videotelephony, virtual reality, augmented reality, and HD video games.
Natural images are highly structured, which reflects the strong spatiotemporal redundancy and self-similarity among pixels and plays a key role in solving inverse problems such as image denoising, deblurring, inpainting, and superresolution. Considering that the human visual system is sensitive to image edge structure [7][8][9][10][11], this paper proposes a novel digital video resolution enhancement algorithm via adaptive directional filtering, in which both the characteristics of the edge contour and the nonlocal self-similarity within the current frame and its adjacent frames are considered. We first establish the regularization equation that conforms to the prior model of a video frame and then take the classic bicubic interpolation result as the initial estimation to iteratively solve the restoration equation, where the edge structures and contours in the LR input are reconstructed to estimate and refine the desired HR output.
The rest of the paper is organized as follows. In Section 2, we introduce the core idea of the proposed adaptive directional interpolation scheme for estimating the missing details of the LR image and then use the nonlocal self-similarity prior to further improve the interpolation performance. The details of the video resolution enhancement algorithm are provided in Section 3. Section 4 presents the experimental validation of the proposed algorithm and a comparison with the classic bicubic interpolation method; conclusions are drawn in Section 5.

The Core Idea
Directional regularity widely exists in the textures, edges, and contours of natural images (as shown in Figure 1). Denote the vector $f_i \in \mathbb{R}^{n^2}$ as the image patch of size $n \times n$ centered around the $i$th pixel, and $L_\theta \in \mathbb{R}^{n^2 \times n^2}$ as the filter matrix corresponding to the directional filter with angle $\theta$ (in this paper, the directional controllable steerable filter [12] is used). The filtered vector $L_\theta f_i$ is sparsest (namely, $L_\theta f_i$ is approximately zero) when $\theta$ is parallel to the main direction of $f_i$. In general, an image patch may include more than one main direction due to its complexity (examples are shown in Figures 1(c) and 1(d)); these direction angles can be found with the main direction searching algorithm (Algorithm 1).
In our previous works [3,13], we showed in detail how to construct a blurring matrix from its corresponding linear degradation operator (as well as the downsampling matrix $H$). Here, we simply present the steps to construct the directional filter matrix $L$ from a 2-D filter kernel $B$, as follows:
(i) Let $L$ be an $n^2 \times n^2$ zero matrix;
The structure of the filter matrix $L$ is presented in Figure 2. Figure 3 shows the main direction searching results of the test images barbara and butterfly using Algorithm 1.
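The relation between a 2-D kernel and its $n^2 \times n^2$ filter matrix can be illustrated with a minimal NumPy sketch. It is not the authors' construction: the kernel form (a first-derivative-of-Gaussian steered to angle θ), the replicate boundary handling, and the names `steerable_kernel`, `filter_matrix`, and `main_direction` are all assumptions made for illustration. Only the kernel width 5 and σ = 0.7 are taken from the experimental settings.

```python
import numpy as np

def steerable_kernel(theta_deg, width=5, sigma=0.7):
    """First-derivative-of-Gaussian kernel steered to theta (assumed kernel form)."""
    r = width // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    gx, gy = -xx * g, -yy * g                  # unnormalized partial derivatives
    t = np.deg2rad(theta_deg)
    return np.cos(t) * gx + np.sin(t) * gy     # directional derivative along theta

def filter_matrix(kernel, n):
    """Build L (n^2 x n^2) so that L @ f.ravel() filters an n x n patch f.

    Boundary pixels are replicated; this boundary rule is an assumption.
    """
    k = kernel.shape[0]
    r = k // 2
    L = np.zeros((n * n, n * n))               # start from a zero matrix, as in step (i)
    for row in range(n):
        for col in range(n):
            for a in range(k):
                for b in range(k):
                    rr = min(max(row + a - r, 0), n - 1)
                    cc = min(max(col + b - r, 0), n - 1)
                    L[row * n + col, rr * n + cc] += kernel[a, b]
    return L

def main_direction(patch, angles):
    """Return the candidate angle whose filtered response has the smallest norm."""
    n = patch.shape[0]
    energy = [np.linalg.norm(filter_matrix(steerable_kernel(t), n) @ patch.ravel())
              for t in angles]
    return angles[int(np.argmin(energy))]

# Synthetic 8 x 8 patch: intensity varies along x and is constant along y.
n = 8
stripes = np.tile(np.cos(np.pi * np.arange(n) / 3.0), (n, 1))
angles = list(range(0, 180, 10))
print(main_direction(stripes, angles))
```

For this synthetic patch the response is smallest for the filter steered along the vertical stripes, which matches the statement that $L_\theta f_i$ is approximately zero when θ is parallel to the main direction.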
Algorithm 1: Main direction searching
Partition $f$ into overlapping patches $\{f_1, f_2, \cdots\}$, and for each patch, do the following steps:
• Initialization: Set the main direction angle set $S = \emptyset$, the candidate angle set $\Theta = \{\theta_1, \theta_2, \cdots, \theta_K\}$, and the largest number of direction angles $P$. Set the start point $d = f_i$.

Denote $y_i = H f_i$ as the LR image patch, where $H \in \mathbb{R}^{m^2 \times n^2}$ is the downsampling matrix [3]. When the downsampling factor $D$ is an integer, we have $m = n/D$, and the corresponding LR input can be represented as $y(h, v) = f(Dh, Dv)$. With the constraint of the directional regularity posed above, the HR patch can be estimated by solving the regularization problem
$$\hat{f}_i = \arg\min_{f_i} \|y_i - H f_i\|_2^2 + \lambda \|L_i f_i\|_2^2, \tag{1}$$
where $\lambda$ is the regularization parameter and $L_i = \prod_{p=1}^{P} L_{\theta_p}$ is the adaptive directional filter matrix. This problem has the well-known closed-form solution
$$\hat{f}_i = (H^T H + \lambda L_i^T L_i)^{-1} H^T y_i. \tag{2}$$
It is easy to see from the structure of the downsampling matrix $H$ that $H^T H$ is diagonal. For the downsampling factor $D = 2$, for example, the diagonal entries of $H^T H$ are 1 at the pixel positions retained by the downsampling and 0 elsewhere.

Figure 4: The NAR image model. As an example of (a), one image patch in (b) can be linearly represented by several nonlocal neighbors in (c).
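The diagonal structure of $H^T H$ can be checked numerically. The sketch below assumes that downsampling is direct decimation (every $D$th pixel is kept, as in the experimental setup) and that patches are vectorized row by row; the helper name `downsampling_matrix` is hypothetical.

```python
import numpy as np

def downsampling_matrix(n, D):
    """m^2 x n^2 selection matrix (m = n // D) that keeps every D-th pixel
    of an n x n patch stored row by row (direct decimation is assumed)."""
    m = n // D
    H = np.zeros((m * m, n * n))
    for h in range(m):
        for v in range(m):
            H[h * m + v, (h * D) * n + (v * D)] = 1.0
    return H

n, D = 8, 2
H = downsampling_matrix(n, D)
f = np.arange(n * n, dtype=float).reshape(n, n)
y = (H @ f.ravel()).reshape(n // D, n // D)
HtH = H.T @ H

print(np.array_equal(y, f[::D, ::D]))                  # H performs the decimation
print(np.count_nonzero(HtH - np.diag(np.diag(HtH))))   # number of off-diagonal entries
print(int(np.trace(HtH)))                              # number of retained pixels
```

Because $H^T H$ has zeros on the diagonal at every discarded pixel position, it is rank deficient on its own, which is why the regularization terms are needed to make the restoration kernel invertible.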

Wireless Communications and Mobile Computing
Recall that $L_i f_i \approx 0$; thus, $L_i$ is approximately singular, implying that one or more singular values of $L_i^T L_i$ are close to zero. The restoration kernel $H^T H + \lambda L_i^T L_i$ is therefore ill-conditioned, and its inverse cannot be computed stably. To solve this problem, we exploit the self-similarity prior widely existing in natural images to further improve the interpolation performance. In this paper, the nonlocal autoregressive (NAR) model of images [14] is used to add an additional constraint to the restoration kernel and reduce the degrees of freedom of the unknown pixels, which helps to yield a more stable result.
In our previous works [15][16][17], we showed that each patch in an image can be approximately represented as a linear combination of $M$ nonlocal neighbors at different locations (as shown in Figure 4):
$$f_i \approx F_i \omega_i. \tag{5}$$
The neighbor set $F_i = [f_{i1}, f_{i2}, \cdots, f_{iM}] \in \mathbb{R}^{n^2 \times M}$ consists of $M$ nonlocal patches around $f_i$ and can be seen as an adaptive local dictionary for the target vector $f_i$. The corresponding representation coefficient $\omega_i$ can be easily computed by ridge regression:
$$\omega_i = (F_i^T F_i + \gamma n I)^{-1} F_i^T f_i, \tag{6}$$
where the parameter $\gamma$ is set manually to give the best results. Moreover, we have also proved in [15,17] that $\omega_i$ is sparse when the atoms of $F_i$ are similar to $f_i$ in terms of normalized inner products. Since sparsity is widely used in solving various inverse problems and has shown the ability to handle the image superresolution task [3,6,14,15,18], we propose Algorithm 2 to construct the adaptive dictionary $F_i$. Figure 5 shows the dictionary construction results for two patches of the test image lena using this algorithm.
For a video sequence, the same algorithm is used to construct a dictionary for each image patch of a frame. In this case, the atoms of $F_i$ come from nonlocal neighbors belonging to the current frame and its $Q$ adjacent frames, as shown in Figure 6. Since a video scene changes smoothly most of the time, the differences between neighboring frames are small; it is therefore easier to find more similar candidate patches, which leads to a sparser and better representation coefficient $\omega_i$ and further improves the interpolation performance.
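The ridge regression step for the representation coefficient can be sketched in a few lines of NumPy. The regularizer weight $\gamma n$ follows the inverse $(F_i^T F_i + \gamma n I)^{-1}$ that appears in the complexity discussion; the function name `nar_coefficients` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def nar_coefficients(F, f, gamma, n):
    """Ridge-regression weights omega such that f is approximated by F @ omega."""
    M = F.shape[1]
    A = F.T @ F + gamma * n * np.eye(M)   # M x M system, cheap because M is small
    return np.linalg.solve(A, F.T @ f)

# Synthetic check: a patch that is an exact combination of its neighbors.
rng = np.random.default_rng(0)
n, M = 8, 12
F = rng.standard_normal((n * n, M))
w_true = rng.standard_normal(M)
f = F @ w_true
omega = nar_coefficients(F, f, gamma=1e-8, n=n)   # tiny gamma, only for this check
print(np.allclose(omega, w_true, atol=1e-6))
```

The tiny γ is used here only so that the recovered weights can be compared with the known ones; the experiments in the paper use γ = 800, which shrinks the weights and stabilizes the solve when neighbors are nearly collinear.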
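Replacing the constraint posed in (5) by an equivalent penalty and adding it to Equation (1), we obtain
$$\hat{f}_i = \arg\min_{f_i} \|y_i - H f_i\|_2^2 + \lambda \|L_i f_i\|_2^2 + \mu \|f_i - F_i \omega_i\|_2^2, \tag{7}$$
where $\mu$ is the penalty parameter. Combining this equation with Equation (6), we get the desired HR patch estimator
$$\hat{f}_i = (H^T H + \lambda L_i^T L_i + \mu I)^{-1} (H^T y_i + \mu F_i \omega_i), \tag{8}$$
where $\omega_i$ is computed by Equation (6) from the current estimate of $f_i$.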

Comparing the expression above with formula (2), we can see that the restoration kernel is now full rank, because the added term $\mu I$ is diagonal and positive definite; the matrix inversion therefore remains cheap and stable.
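The effect of the extra $\mu I$ term can be illustrated with a small numerical sketch of the patch estimator. Everything here is an assumption made for illustration: the estimator is written as the solution of $(H^T H + \lambda L^T L + \mu I) f = H^T y + \mu F \omega$, direct decimation is used for $H$, a simple difference matrix stands in for the steerable directional filter, and the neighbors and weights are synthetic. Only λ = 1 and μ = 5 come from the experimental settings.

```python
import numpy as np

def estimate_patch(y, H, L, F, omega, lam, mu):
    """Solve (H^T H + lam L^T L + mu I) f = H^T y + mu F omega for f."""
    kernel = H.T @ H + lam * (L.T @ L) + mu * np.eye(H.shape[1])
    return np.linalg.solve(kernel, H.T @ y + mu * (F @ omega)), kernel

rng = np.random.default_rng(0)
n, D, M = 8, 2, 12
keep = np.arange(n * n).reshape(n, n)[::D, ::D].ravel()
H = np.eye(n * n)[keep]                       # direct decimation (assumed)
L = np.eye(n * n) - np.eye(n * n, k=1)        # stand-in difference filter, not the steerable one
f_true = np.tile(np.linspace(0.0, 1.0, n), (n, 1)).ravel()
F = f_true[:, None] + 0.05 * rng.standard_normal((n * n, M))   # synthetic neighbors
omega = np.full(M, 1.0 / M)                   # placeholder weights

f_hat, kernel = estimate_patch(H @ f_true, H, L, F, omega, lam=1.0, mu=5.0)
print(f_hat.shape, np.linalg.eigvalsh(kernel).min())
```

Since $H^T H$ and $L^T L$ are positive semidefinite, the smallest eigenvalue of the kernel is at least μ, so the linear solve is well conditioned regardless of how close $L_i$ is to singular.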

Video Resolution Enhancement Algorithm
To sum up, we use the interpolation algorithm (Algorithm 3) listed below for digital video resolution enhancement: A graphic demonstration of this algorithm is displayed in Figure 7.
In each interpolation loop, the time consumption $T_{\mathrm{loop}}(N)$ mainly consists of three parts: the main direction searching $T_m(N)$, the adaptive dictionary construction $T_a(N)$, and the patch estimation $T_e(N)$:
$$T_{\mathrm{loop}}(N) = T_m(N) + T_a(N) + T_e(N). \tag{9}$$

Algorithm 2: Adaptive dictionary construction
Partition $f$ into overlapping patches $\{f_1, f_2, \cdots\}$, and for each patch, do the following steps:
• Initialization: Set the nonlocal neighbor number $M$ and the search window size $W$.
• Dictionary construction:
  - Sweep over all possible patches $f_{i1}, f_{i2}, \cdots$ in the search window centered around $f_i$, and compute the normalized candidate atom set $G_i = [f_{i1}/\|f_{i1}\|, f_{i2}/\|f_{i2}\|, \cdots]$;
  - Compute the normalized inner product vector $r = G_i^T f_i$;
  - Select the atoms with the largest $M$ values in $|r|$ to construct the dictionary $F_i$.
• Output: The adaptive dictionary $F_i$ of the $i$th image patch $f_i$.
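The dictionary construction steps can be sketched for a single frame as follows. This is an illustrative reading rather than the authors' code: the function name `build_dictionary`, the exclusion of the target patch from its own candidates, and the choice to store the selected patches unnormalized are assumptions. For video, the same sweep would simply be repeated over the adjacent frames.

```python
import numpy as np

def build_dictionary(image, top, left, n, W, M):
    """Pick the M patches in a W x W search window with the largest
    normalized inner product |r| with the target patch at (top, left)."""
    target = image[top:top + n, left:left + n].ravel()
    half = W // 2
    rows = range(max(top - half, 0), min(top + half, image.shape[0] - n) + 1)
    cols = range(max(left - half, 0), min(left + half, image.shape[1] - n) + 1)
    patches, scores = [], []
    for r in rows:
        for c in cols:
            if (r, c) == (top, left):
                continue                      # skip the target itself (assumption)
            p = image[r:r + n, c:c + n].ravel()
            norm = np.linalg.norm(p)
            if norm == 0.0:
                continue                      # a zero patch cannot be normalized
            patches.append(p)
            scores.append(abs(p @ target) / norm)   # entry of |r| = |G_i^T f_i|
    order = np.argsort(scores)[::-1][:M]      # indices of the M largest scores
    return np.stack([patches[k] for k in order], axis=1)   # n^2 x M dictionary

# Periodic test image: exact copies of the target exist inside the window.
rng = np.random.default_rng(0)
image = np.tile(rng.random((4, 4)), (8, 8))   # 32 x 32, period 4 in both axes
n, W, M = 8, 20, 12
F = build_dictionary(image, 12, 12, n, W, M)
target = image[12:20, 12:20].ravel()
print(F.shape, np.allclose(F[:, 0], target))
```

Sorting all candidate scores, as done here with `np.argsort`, corresponds to the simple ordering-based selection considered in the complexity analysis; a partial selection would also work.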
Figure 6: Dictionary construction (panel labels: target patch, nonlocal neighbors, frames #i − 1, #i, and #i + 1). As an illustration, for a target patch in the $i$th frame of a video sequence, the corresponding adaptive dictionary is composed of nonlocal neighbors scattered over frames $i$, $i - 1$, and $i + 1$ (taking $Q = 1$ as an example).
Algorithm 3: Resolution enhancement algorithm
For each LR frame $y$ of the input digital video sequence, do the following steps:
• Initialization: Set $\tilde{f}$ to the bicubic interpolation of $y$.
• Main loop (repeat $C$ times):
  - Use Algorithm 1 to search the main directions of each patch of $\tilde{f}$, and calculate the corresponding adaptive directional filter matrix $L_i$;
  - Use Algorithm 2 to construct the adaptive dictionary $F_i$;
  - Taking $\tilde{f}_i$ as an initial estimation of the desired HR output $f_i$, use Equation (8) to compute the resolution enhancement result $\hat{f}_i$;
  - Update $\tilde{f} \leftarrow \hat{f}$ for the next iteration when all image patches have been restored.
• Output: The resolution-enhanced output $\hat{f}$.
For the first term $T_m(N)$, we know from Algorithm 1 that searching each direction for every target patch needs $K$ filtering operations. Since filtering an image patch of fixed size $n \times n$ can be done in constant time $t_1$, and $P$ and $K$ are fixed parameters, we have
$$T_m(N) = O(N^2 \cdot P K t_1) = O(N^2). \tag{10}$$
For the second term $T_a(N)$, we need to sweep over $W^2$ candidate patches around each target LR patch to search for atoms. Similarly, since the normalization and inner product computation can also be finished in constant time $t_2$,
$$T_a(N) = O\big(N^2 (W^2 t_2 + T_{\mathrm{top}}(M))\big). \tag{11}$$
In the above expression, $T_{\mathrm{top}}(M)$ represents the time consumption of selecting the top $M$ largest elements from the vector $|r| = |G_i^T f_i| \in \mathbb{R}^{W^2}$. This task can be implemented simply by a fast sorting algorithm with time complexity $O(W^2 \log W)$, which leads to
$$T_a(N) = O(N^2 W^2 \log W) = O(N^2). \tag{12}$$
For the last term $T_e(N)$, the time consumption is mainly determined by the computation of the inverse matrices $(H^T H + \lambda L_i^T L_i + \mu I)^{-1}$ and $(F_i^T F_i + \gamma n I)^{-1}$. Since the sizes of $H$, $L_i$, $F_i$, and $I$ are fixed and independent of $N$, these operations can also be done in constant time $t_3$, and we have
$$T_e(N) = O(N^2 t_3) = O(N^2). \tag{13}$$
Plugging Equations (10), (12), and (13) into (9), we obtain
$$T_{\mathrm{loop}}(N) = O(N^2). \tag{14}$$
The equation above means that the computational complexity of the proposed interpolation algorithm is proportional to the pixel number ($N^2$) of the LR input frame.
For color video sequence interpolation, the YUV color model can be used: we start by splitting the input color frame into the luminance channel and the chrominance channels and then enhance them using the proposed algorithm and classic bicubic interpolation, respectively. The resolution-enhanced frame is then obtained by converting these channels back to the RGB color space. The diagram is shown in Figure 8.
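The channel-splitting pipeline can be sketched as below. The text only says "YUV", so the BT.601 conversion matrix is an assumed choice, and `upsample_nearest` is a deliberately trivial stand-in for both the proposed luminance enhancement and the bicubic chrominance interpolation; all function names are hypothetical.

```python
import numpy as np

# BT.601 analog YUV matrix (an assumed choice of "YUV").
RGB2YUV = np.array([[ 0.299,    0.587,    0.114  ],
                    [-0.14713, -0.28886,  0.436  ],
                    [ 0.615,   -0.51499, -0.10001]])
YUV2RGB = np.linalg.inv(RGB2YUV)

def upsample_nearest(channel, D):
    """Pixel replication; a placeholder for any single-channel upscaler."""
    return np.kron(channel, np.ones((D, D)))

def enhance_color_frame(rgb, D, enhance_luma, enhance_chroma):
    """Split into Y/U/V, upscale luminance and chrominance separately, merge back."""
    yuv = rgb @ RGB2YUV.T
    y = enhance_luma(yuv[..., 0], D)          # slot for the proposed algorithm
    u = enhance_chroma(yuv[..., 1], D)        # slot for bicubic interpolation
    v = enhance_chroma(yuv[..., 2], D)
    return np.stack([y, u, v], axis=-1) @ YUV2RGB.T

rgb = np.random.default_rng(0).random((4, 6, 3))
out = enhance_color_frame(rgb, 2, upsample_nearest, upsample_nearest)
print(out.shape)
```

Passing the two upscalers as arguments keeps the color handling independent of the interpolation method, so the luminance slot can be replaced by the iterative algorithm without touching the conversion code.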

Experimental Results
In this section, several experimental results of the proposed resolution enhancement algorithm are reported and compared with the widely used bicubic interpolation method, in terms of subjective image quality and the objective PSNR index. The LR input image/video frame is generated by directly decimating the original HR one by a factor of $D$ in each axis and is then interpolated back to the original size for performance evaluation. The chosen parameters are as follows: $P = 2$, $n = 8$, $\lambda = 1$, $\gamma = 800$, $\mu = 5$, $M = 12$, $C = 4$, and $W = 20$; the candidate angle set for main direction searching is $\Theta = \{0°, 10°, 20°, \cdots, 170°\}$; and the width of the directional controllable steerable filter is 5 (with Gaussian kernel standard deviation $\sigma = 0.7$). According to our tests, performing a 2 × 2 interpolation for a single frame costs about 2.3 seconds on an Intel Core i7 8750H with 6 cores at 3.9 GHz, Windows 64 bit, Matlab 2017b, accelerated by the C-MEX interface, in a typical setting of a 512 × 512 frame ($N = 512$) and $D = 2$. Using a GPU-accelerated architecture (CUDA or OpenCL) may reduce the computation time substantially; we will study this in future research. Figures 9-12 present the resolution enhancement results on the test still images leaves, airplane, butterfly, and peppers, with factors $D = 2$ and 3. Figures 13 and 14 further show the 2 × 2 interpolation results of the test video sequences foreman and ice, with reference frame number $Q = 2$. From these figures, we see that the proposed algorithm reconstructs image contours and fine details well, with few noticeable staircase artifacts in tiny structures, whereas the bicubic interpolation method produces a large amount of aliasing in edges and textures and performs poorly by comparison. Moreover, Figure 15 gives the objective quality evaluation of foreman and ice for the first 50 frames.
As expected, our method achieves satisfying PSNR values (about 2 dB higher than bicubic on average); this is consistent with the subjective visual quality shown above.
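The evaluation protocol (decimate the HR frame, interpolate back, and compare by PSNR) can be sketched as follows. The 8-bit peak value of 255 and the helper names `decimate` and `psnr` are assumptions; the synthetic frame only exercises the metric and says nothing about the reported results.

```python
import numpy as np

def decimate(frame, D):
    """Generate the LR input by keeping every D-th pixel in each axis."""
    return frame[::D, ::D]

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak = 255 assumes 8-bit frames)."""
    diff = reference.astype(float) - estimate.astype(float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
hr = rng.integers(0, 255, size=(16, 16))      # synthetic 8-bit frame
lr = decimate(hr, 2)
print(lr.shape)
print(psnr(hr, hr + 1))                       # uniform error of 1 gray level
```

A uniform error of one gray level gives an MSE of 1 and hence a PSNR of $20 \log_{10} 255 \approx 48.13$ dB, which is a convenient sanity check for the implementation.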

Conclusion
In this paper, we present an effective algorithm for enhancing digital video/still image resolution based on directional regularization and the nonlocal self-similarity structure, where the missing pixels of an image patch are estimated from its nonlocal neighbors via an adaptive directional filtering operation. The appeal of this work is its simplicity: it does not require solving complex optimization equations and is easy to implement. Experimental results show that the proposed algorithm can effectively improve digital video quality in terms of clarity and resolution and thus will be of value in both theory and application.

Data Availability
Please contact the first author (sundong@ahu.edu.cn) to obtain the Matlab demo codes.

Conflicts of Interest
The authors declare that they have no conflicts of interest.