Motion-Compensated Frame Interpolation Using Cellular Automata-Based Motion Vector Smoothing

Motion-Compensated Frame Interpolation (MCFI) is a common temporal-domain tampering operation that produces synthetic video frames to improve the visual quality of video sequences. Because temporal symmetry is unstable, Bidirectional Motion Estimation (BME) in MCFI yields many incorrect Motion Vectors (MVs). Existing Motion Vector Smoothing (MVS) methods often oversmooth, i.e., they revise correct MVs into wrong ones. To overcome this problem, we propose a Cellular Automata-based MVS (CA-MVS) algorithm to smooth the Motion Vector Field (MVF) output by BME. In our work, a cellular automaton is constructed to deduce MV outliers according to a defined local evolution rule. By performing CA-based evolution in a loop, we gradually expose MV outliers and reduce, as far as possible, the incorrect MVs that result from oversmoothing. Experimental results show that the proposed algorithm improves the accuracy of BME and provides better objective and subjective interpolation quality than traditional MVS algorithms.


Introduction
Motion-Compensated Frame Interpolation (MCFI) [1] is a common temporal-domain tampering operation that produces several new video frames between two neighboring frames along the motion trajectories of objects. It can be used to increase the frame rate, or temporal resolution, of a video sequence, so it is also called Motion-Compensated Frame Rate Upconversion (MC-FRUC) [2,3]. MCFI is a key step in many video applications: in low bit-rate video coding, it is used to remove temporal redundancy [4]; in slow-motion replay, it is used to improve movement details within a short time interval [5]; in Liquid Crystal Displays (LCDs), it is used to reduce hold-type motion blur [6]. As a fundamental technique, MCFI has therefore remained an active research topic since its introduction.
Motion Estimation (ME) and Motion-Compensated Interpolation (MCI) are the main parts of MCFI. ME predicts the Motion Vector Field (MVF) between neighboring frames, and MCI interpolates a new frame using the MVF output by ME. The performance of MCFI depends heavily on the prediction accuracy of ME, so many works focus on improving ME performance. They can be classified into two types: Unidirectional ME (UME) [7] and Bidirectional ME (BME) [8]. The Block Matching Algorithm (BMA) [9] is the core of both UME and BME, and it produces a block-based MVF according to the execution mechanism of each. UME predicts the motion trajectory of each block between the previous frame and the next frame, then determines the MVs of the blocks that these trajectories pass through in the intermediate frame. The MVs produced by UME are close to the true motion, but some blocks in the intermediate frame may receive no MV or multiple MVs, which introduces holes or overlaps. That is why many works abandon UME and turn to BME. BME directly predicts the MVF of the intermediate frame under the assumption of temporally symmetric translational motion, so each block has a unique MV and the estimated MVF is free from overlaps and holes. BME is popular because of its straightforward implementation, but it inevitably produces MV outliers, because translational motion cannot describe complex movements such as rotation and deformation, and because flat regions contain many structurally similar patches. To reduce the MV outliers produced by BME, Motion Vector Smoothing (MVS) [10] becomes an essential postprocessing step, and in some works, BME combined with MVS is called true ME [11][12][13]. An efficient MVS is therefore key to improving the prediction accuracy of BME.
A typical MVS method is the Mean Filter (MF) [14], which replaces the MV of a block with the mean of the MVs of its adjacent blocks. When an MV outlier occurs in a flat region, MF does not align this outlier with the major direction of the neighboring MVs; it only produces a new MV biased toward that direction. As a result, the MVF output by MF does not reach a high spatial correlation in flat regions. A more popular approach is the Vector-Median Filter (VMF) [15], whose effectiveness is much more evident than that of MF in motion fields with high spatial correlation. Beyond that, VMF can also reduce impulse noise while preserving contours. In fields that include edges and textures, spatial coherence is limited because adjacent MVs do not necessarily share a direction, so MF and VMF correct MV outliers poorly. To suppress MV outliers in nonflat regions, the Weighted VMF (WVMF) [16] is a good choice. By adaptively controlling weights, WVMF relies not only on spatial coherence but also on a measure of matching success. Although every MV in the MVF can be smoothed by MF, VMF, or WVMF, not all MVs are outliers, so filtering all of them introduces redundant computation and even oversmoothing, i.e., revising a correct MV into a wrong one. To prevent oversmoothing, many works add outlier detection before filtering the MVF. For example, Yoon et al. [17] regard an MV as an outlier if the absolute value of its horizontal or vertical component is larger than the average of those of its neighboring MVs; Kim and Sunwoo [18] detect an MV outlier from the absolute difference between the current MV and the mean of its neighboring MVs. However, these methods cannot identify outliers accurately; in particular, some obvious outliers are often missed. Another way of suppressing oversmoothing is to refine the MVF by imposing a spatial-temporal smoothness constraint on BMA. For example, Huang et al. [19] designed a Spatial MVS (S-MVS), which uses a Markov Random Field (MRF) to model the spatial smoothness constraint of the MVF and refines each MV by performing BMA with an MRF-based penalty term; Yoo et al. [20] proposed a Temporal MVS (T-MVS), which selects a reliable MV from the temporally neighboring MVs along the forward and backward directions. With the spatial-temporal smoothness constraint, these methods bring the MVF closer to the true motion but at a high computational cost. Considering both correction capability and computational complexity, a more effective MVS can be realized by combining outlier detection with spatial-temporal smoothing. In this work, we try to provide a solution that makes better use of the advantages of both.
In this paper, we first perform BME to generate the initial MVF of the intermediate frame; then, a Cellular Automata-based (CA-based) MVS (CA-MVS) algorithm is proposed to filter the MVF output by BME. The proposed CA-MVS algorithm combines outlier detection and spatial-temporal smoothing, in which CA is the key to the trade-off between oversmoothing and computational redundancy. Compared with the existing works, the main contributions of our work are summarized as follows:
(i) MV outliers are found in the MVF by measuring the angle between each MV and the median of its neighboring MVs. A strict threshold on the angle is set in order to pick out the evident outliers.
(ii) According to the positions of the outliers, an outlier map is generated and input to CA. By the evolution rule of CA, some hidden outliers are found. After several iterations, the distribution of MV outliers tends to be stable, resulting in a good balance between oversmoothing and computational redundancy.
(iii) Spatial-temporal coherence is exploited to correct MV outliers. For any MV outlier, VMF is first performed on its neighboring MVs to obtain the spatially predictive MV. Then, we refine this outlier by using the temporally neighboring MV candidates along the above-mentioned spatially predictive MV.
Experimental results show that the proposed CA-MVS algorithm can improve the prediction accuracy of BME and provide better objective and subjective interpolation quality than the traditional MVS methods.

Background
As shown in Figure 1(a), due to the spatial coherence of the MVF, a true MV is similar in magnitude and direction to its adjacent MVs. An example of an MV outlier is illustrated in Figure 1(b): an MV is detected as an outlier once it differs from its adjacent MVs in magnitude or direction. In this case, MF or VMF can be used to correct the outlier. Suppose $v_0$ is an MV outlier and $v_1, \ldots, v_8$ are its adjacent MVs. MF is performed as follows:

$$\hat{v}_0 = \frac{1}{8} \sum_{i=1}^{8} v_i, \tag{1}$$

in which $\hat{v}_0$ is the corrected MV of $v_0$, located at the geometric center of $v_1, \ldots, v_8$. As shown in Figure 1, VMF instead takes the vector median of the adjacent MVs:

$$\hat{v}_0 = \mathrm{med}\{v_1, v_2, \ldots, v_8\}, \tag{2}$$

in which $\mathrm{med}\{\cdot\}$ produces the median value of the input set, which satisfies

$$\sum_{i=1}^{8} \|\hat{v}_0 - v_i\| \le \sum_{i=1}^{8} \|v_j - v_i\|, \quad j = 1, 2, \ldots, 8, \tag{3}$$

in which $\|\cdot\|$ is the norm. Equation (3) indicates that the sum of distances between the median MV and the adjacent MVs is no larger than the sum of distances between any MV in $v_1, \ldots, v_8$ and the adjacent MVs. This property forces $\hat{v}_0$ to be biased toward the main direction of $v_1, \ldots, v_8$. A drawback of VMF lies in its lack of tuning parameters. As shown in Figure 1(d), if the adjacent MVs vary in magnitude and direction, the median MV is unreliable, since we cannot be confident in the validity of any single MV. To overcome this drawback, WVMF has been introduced based on VMF and is defined as

$$\sum_{i=1}^{8} w_i \|\hat{v}_0 - v_i\| \le \sum_{i=1}^{8} w_i \|v_j - v_i\|, \quad j = 1, 2, \ldots, 8, \tag{4}$$

in which $w_i$ is the weight coefficient corresponding to $v_i$. Fixed weights are considered first; then, the definition is extended to the case of adaptively varying weights. The more reliable $v_i$ is, the higher the corresponding weight $w_i$ is. Therefore, the weighted-median MV is close to the reliable MVs.

MF, VMF, and WVMF exploit the spatial coherence of the MVF to smooth MVs, so their performance degrades when the spatial coherence is limited. In addition to spatial coherence, temporal coherence can also be used to build a smoothness constraint into BMA. As shown in Figure 2, by assuming that MVs remain stable in a local region along the time axis, the temporally and spatially neighboring MVs are combined in a candidate set $C_S$. An MV outlier can be refined by searching for a more reliable MV candidate in $C_S$. To measure the reliability of MV candidates, the Sum of Bidirectional Absolute Difference (SBAD) is defined by

$$\mathrm{SBAD}(v_c) = \sum_{(x, y) \in B} \left| f_{t-1}(x - v_{c,h},\, y - v_{c,v}) - f_{t+1}(x + v_{c,h},\, y + v_{c,v}) \right|, \tag{5}$$

in which $v_c$ is the MV candidate, $B$ is the set of pixel positions of the current block in the intermediate frame, $v_{c,h}$ and $v_{c,v}$ are the horizontal and vertical components of $v_c$, respectively, and $f_{t-1}(x, y)$ and $f_{t+1}(x, y)$ represent the pixel values at position $(x, y)$ in the previous frame and the next frame, respectively.
By the SBAD criterion, the MV outlier $v_0$ can be smoothed as follows:

$$\tilde{v}_0 = \arg\min_{v_c \in C_S} \mathrm{SBAD}(v_c), \tag{6}$$

in which $\tilde{v}_0$ is the refined MV of $v_0$. Combined with SBAD, MVS can fully exploit the temporal coherence of the MVF. For the above-mentioned MVS algorithms, an inevitable defect is oversmoothing, i.e., revising a true MV into a wrong one, which not only reduces the fidelity of the MVF but also introduces wasted computation. To overcome the oversmoothing that results from unstable spatial-temporal coherence, we need a proper outlier detection mechanism that correctly recognizes MV outliers.

Figure 5: Framework of the proposed CA-MVS algorithm (outlier detection, CA-based evolution, spatial-temporal smoothing, residual calculation with the exit test against Thr, and OBMC).
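As a minimal sketch of the filters described in this section, the following Python code computes the vector median of Equation (2), optionally weighted in the spirit of WVMF, and picks the candidate MV with the smallest SBAD. The function names, the Euclidean norm, the (horizontal, vertical) component order, the sign convention of the displacement, and the handling of blocks that leave the frame are assumptions made for illustration; they are not specified in the text.

```python
import numpy as np

def vector_median(mvs, weights=None):
    """Return the MV in `mvs` with the smallest (weighted) sum of distances to the others."""
    mvs = np.asarray(mvs, dtype=float)
    w = np.ones(len(mvs)) if weights is None else np.asarray(weights, dtype=float)
    # dist[j, i] = ||v_j - v_i|| (Euclidean norm, an assumption)
    dist = np.linalg.norm(mvs[:, None, :] - mvs[None, :, :], axis=2)
    cost = dist @ w  # cost[j] = sum_i w_i * ||v_j - v_i||
    return mvs[np.argmin(cost)]

def sbad(prev, nxt, top, left, size, mv):
    """Sum of bidirectional absolute differences for candidate mv = (h, v).

    The block at (top, left) in the intermediate frame is compared between
    the previous frame displaced by -mv and the next frame displaced by +mv.
    Candidates that push the block outside the frame get an infinite cost
    (assumed border policy).
    """
    h, v = int(mv[0]), int(mv[1])
    rows, cols = prev.shape
    y0, x0 = top - v, left - h   # block position in the previous frame
    y1, x1 = top + v, left + h   # block position in the next frame
    if min(y0, x0, y1, x1) < 0 or max(y0, y1) + size > rows or max(x0, x1) + size > cols:
        return np.inf
    a = prev[y0:y0 + size, x0:x0 + size].astype(float)
    b = nxt[y1:y1 + size, x1:x1 + size].astype(float)
    return np.abs(a - b).sum()

def refine_by_sbad(prev, nxt, top, left, size, candidates):
    """Return the candidate MV with the smallest SBAD."""
    costs = [sbad(prev, nxt, top, left, size, c) for c in candidates]
    return candidates[int(np.argmin(costs))]
```

Passing reliability scores as `weights` turns the plain vector median into a weighted one; how those weights are chosen adaptively is left open here.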

Cellular Automata (CA)
Cellular Automata (CA) are discrete dynamical systems with a simple structure, used to investigate self-organization in statistical mechanics; they were originally introduced by von Neumann and Ulam as a possible idealization of biological systems [21]. CA began to attract researchers' attention when the computer game "Life," an application of a two-dimensional cellular automaton, became successful [22]. Then, Stephen Wolfram advanced the theory of CA through an in-depth and comprehensive study of elementary CA [23][24][25]. CA have since been widely used in a variety of fields such as sociology, graphics, and physics [26]. The construction of a cellular automaton can be represented by the following formula:

$$\mathrm{CA} = (\Omega_d, S, \Lambda, g), \tag{7}$$

which shows that each point in a $d$-dimensional lattice $\Omega$, called a cell, can take any one state from a finite state set $S$, and the states of the cells of the lattice are updated according to a local evolution rule $g$, i.e.,

$$S_i^{k+1} = g\left(S_i^k, S_{i \pm r}^k\right), \tag{8}$$

which denotes that the state $S_i^{k+1}$ of a cell at time $k + 1$ depends on its own state $S_i^k$ at time $k$ and on the states $S_{i \pm r}^k$ of the cells in its neighbors set $\Lambda$ at time $k$. All cells in the lattice are updated synchronously, and the state of the lattice advances in discrete time steps. An example of a one-dimensional cellular automaton with code 76 is illustrated in Figure 3. All cells are arranged in a line. Each cell has a binary state, 1 or 0, and its left and right cells form its neighbors set. The eight possible states of three adjacent cells are given at time $k$; then, the central cell of the three takes its state at the next time $k + 1$ by a defined rule. The time evolution of the complete cellular automaton is obtained by applying the rule simultaneously at each cell for each time step. In the two-dimensional case, different definitions of the neighbors set are possible, among which the von Neumann neighborhood and the Moore neighborhood are common. As shown in Figure 4(a), the four cells above, below, to the left of, and to the right of a cell are called the von Neumann neighborhood of this cell. The Moore neighborhood is illustrated in Figure 4(b); it enlarges the von Neumann neighborhood to include the diagonal cells.

Figure 7: Local evolution rule of the constructed cellular automaton.
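The one-dimensional example can be reproduced with a short sketch. It assumes that "code 76" follows Wolfram's rule numbering, in which bit n of the code gives the next state for the neighborhood whose (left, center, right) states form the binary number n, and it assumes periodic boundaries, which the text does not specify. The two offset lists simply spell out the von Neumann and Moore neighborhoods.

```python
import numpy as np

# Neighbor offsets (row, column) for the two common 2-D neighborhoods.
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]
MOORE = VON_NEUMANN + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def step_elementary_ca(cells, rule=76):
    """Advance a binary 1-D cellular automaton by one synchronous time step."""
    cells = np.asarray(cells, dtype=np.uint8)
    left = np.roll(cells, 1)    # left[i] = cells[i - 1], wrapping around
    right = np.roll(cells, -1)  # right[i] = cells[i + 1], wrapping around
    index = 4 * left + 2 * cells + right          # neighborhood as a number 0..7
    table = np.right_shift(rule, np.arange(8)) & 1  # next state per neighborhood
    return table[index].astype(np.uint8)
```

Under this numbering, code 76 keeps a live cell alive unless both of its neighbors are also alive, and never turns a dead cell on.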

Wireless Communications and Mobile Computing
By properly modeling objective behaviors as evolution rules, CA can simulate the running of some physical systems. Particularly in two dimensions, CA have been extensively used to exploit the statistics and latent features of images [27,28]. The MVF of a video frame is a two-dimensional lattice, and there is inherent coherence between adjacent MVs, which enables CA to deduce the statistical mechanics of MV outliers in the MVF. Motivated by the rule evolution of CA, we construct a two-dimensional cellular automaton to model a universal law analogous to the variation of MV outliers. The cellular automaton controls the evolution of the outlier map and helps MVS reach a good trade-off between oversmoothing and computational complexity.

Figure 5 shows the framework of the proposed CA-MVS algorithm. BME is first performed to produce the initial MVF of the intermediate frame $f_t$, and a loop then refines it. In the $j$th iteration, outlier detection is performed on the current MVF $V_t^{(j)}$ to generate the outlier map $E_t^{(j)}$, CA-based evolution updates this map to expose hidden outliers, and spatial-temporal smoothing corrects the marked MVs to produce the MVF $V_t^{(j+1)}$ of $f_t$ at the $(j+1)$th iteration. We select fewer spatial-temporal neighboring MVs to smooth the MV outliers in order to suppress oversmoothing. To decide whether to quit the loop, we calculate the residual $\varepsilon$ between $V_t^{(j+1)}$ and $V_t^{(j)}$ over all $M \times N$ MVs, in which $M \times N$ denotes the spatial size of the MVF. The difference between $V_t^{(j+1)}$ and $V_t^{(j)}$ is thus measured by the residual $\varepsilon$, and a threshold Thr is set to determine whether to perform the next iteration. Once $\varepsilon$ is less than or equal to Thr, the loop exits and outputs $V_t^{(j+1)}$ as the final MVF $V_t$ of $f_t$. After several iterations, by the evolution of CA, the number of outliers tends to be stable, which prevents the oversmoothing introduced by redundant computations. According to $V_t$, Overlapped Block Motion Compensation (OBMC) [29] is finally performed on $f_{t-1}$ and $f_{t+1}$ to interpolate the intermediate frame $\hat{f}_t$. Outlier detection, CA-based evolution, and spatial-temporal smoothing are the important parts of the proposed CA-MVS algorithm, and the following describes them in detail.
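The loop and its exit test can be sketched as follows. The residual is taken here as the mean Euclidean distance between corresponding MVs of two consecutive fields, which is an assumed form since only its normalization by $M \times N$ is stated. The `detect`, `evolve`, and `smooth` callables and the `max_iter` safeguard are hypothetical placeholders for the three stages, not part of the described algorithm.

```python
import numpy as np

def mvf_residual(v_new, v_old):
    """Mean Euclidean distance between corresponding MVs of two M x N x 2 fields."""
    diff = np.asarray(v_new, dtype=float) - np.asarray(v_old, dtype=float)
    return float(np.linalg.norm(diff, axis=-1).mean())

def ca_mvs_loop(v0, detect, evolve, smooth, thr=0.1, max_iter=20):
    """Iterate detection, CA evolution and smoothing until the residual is <= thr.

    Returns the final MVF and the number of iterations performed.
    """
    v = np.asarray(v0, dtype=float)
    for j in range(max_iter):
        e = detect(v)              # outlier map with evident outliers set to 1
        e_hat = evolve(e)          # map after CA-based evolution
        v_next = smooth(v, e_hat)  # corrected MVF for the next iteration
        eps = mvf_residual(v_next, v)
        v = v_next
        if eps <= thr:
            break
    return v, j + 1
```

Passing the stages as functions keeps the driver independent of how each stage is realized, so alternative detectors or evolution rules can be swapped in without touching the exit logic.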

Outlier Detection.
An evident MV outlier forms a large angle with its adjacent MVs, so we use the angle between adjacent MVs to detect the outliers in the MVF. A 3 × 3 window is used to scan all MVs in the MVF from left to right and from top to bottom. As shown in Figure 6, in the MV window, $v_0$ is the MV to be tested, in the red block, and $v_1, \ldots, v_8$ are the eight adjacent MVs of $v_0$, marked in blue. Computing all the angles between $v_0$ and $v_1, \ldots, v_8$ is expensive. To reduce computations, we first obtain the median MV $\hat{v}_0$ of $v_1, \ldots, v_8$ according to Equation (2), then compute the angle $\theta$ between $v_0$ and $\hat{v}_0$ as follows:

$$\theta = \arccos\left( \frac{v_0 \cdot \hat{v}_0}{\|v_0\| \, \|\hat{v}_0\|} \right), \tag{10}$$

in which $\arccos(\cdot)$ is the arc-cosine function. $\hat{v}_0$ represents the main direction of $v_1, \ldots, v_8$, so a large $\theta$ means that $v_0$ deviates far from its adjacent MVs. Based on this observation, we regard $v_0$ as an evident outlier if $|\theta|$ is larger than $\pi/2$. After testing all MVs in $V_t^{(j)}$, the outlier map $E_t^{(j)}$ is generated, in which outliers and nonoutliers are marked with 1 and 0, respectively. This strict criterion makes the exposed outliers more reliable, but it also excludes some real outliers. These hidden outliers create a trade-off between oversmoothing and computation: if the hidden outliers are ignored, little computation is spent on smoothing outliers, but the hidden outliers can mislead the correction of the exposed outliers, resulting in oversmoothing. Therefore, we add CA-based evolution to keep a good balance between oversmoothing and computation.
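The detection step can be sketched in Python as below. The sketch assumes an MVF stored as an M x N x 2 array, uses the Euclidean norm for the vector median, uses only the neighbors that exist at the borders of the field, and leaves an MV unmarked when the angle is undefined because a zero vector is involved. None of these details are fixed by the text, and the function names are illustrative.

```python
import numpy as np

def vector_median(mvs):
    """MV with the smallest sum of Euclidean distances to the other MVs."""
    mvs = np.asarray(mvs, dtype=float)
    dist = np.linalg.norm(mvs[:, None, :] - mvs[None, :, :], axis=2)
    return mvs[np.argmin(dist.sum(axis=1))]

def detect_outliers(mvf, max_angle=np.pi / 2):
    """Return a binary map with 1 where an MV deviates from its neighborhood median."""
    mvf = np.asarray(mvf, dtype=float)
    rows, cols, _ = mvf.shape
    outlier_map = np.zeros((rows, cols), dtype=np.uint8)
    for m in range(rows):
        for n in range(cols):
            # Adjacent MVs inside the 3 x 3 window, excluding the center.
            neighbors = [mvf[i, j]
                         for i in range(max(m - 1, 0), min(m + 2, rows))
                         for j in range(max(n - 1, 0), min(n + 2, cols))
                         if (i, j) != (m, n)]
            med = vector_median(neighbors)
            denom = np.linalg.norm(mvf[m, n]) * np.linalg.norm(med)
            if denom == 0:
                continue  # angle undefined for a zero MV; left unmarked (assumption)
            cos_theta = np.clip(mvf[m, n] @ med / denom, -1.0, 1.0)
            if np.arccos(cos_theta) > max_angle:
                outlier_map[m, n] = 1
    return outlier_map
```

Because only one angle is computed per MV, the cost per window is dominated by the vector median rather than by eight separate angle evaluations.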

CA-Based Evolution.
To ensure the accuracy of outlier detection, only the evident outliers are marked with 1 in $E_t^{(j)}$. When we use $E_t^{(j)}$ to decide whether to smooth an MV in $V_t^{(j)}$, the hidden outliers cannot be corrected, and the exposed outliers may also be modified incorrectly if some outliers hide among their adjacent MVs. It is necessary to find these hidden outliers in $V_t^{(j)}$ in order to solve this oversmoothing problem. Motivated by CA theory, we construct a cellular automaton to model the interaction between outliers in $E_t^{(j)}$ and deduce the hidden outliers from the exposed ones according to a defined local evolution rule. $E_t^{(j)}$ is a two-dimensional lattice, and each of its elements is regarded as a cell. Each cell has two states, 0 and 1, denoting nonoutlier and outlier, respectively. Due to the locally stationary statistics of the MVF, an outlier propagates within its neighborhood. In a 3 × 3 window, this propagation effect is stronger when an outlier is closer to the center or when more outliers occur in the window. Based on this observation, as shown in Figure 7, we define a local evolution rule, Equation (11), for the constructed cellular automaton. According to Equation (11), the current MV is identified as an outlier if outliers occur in its von Neumann neighborhood or if more than three outliers occur in its Moore neighborhood. The states of all cells in $E_t^{(j)}$ are updated simultaneously, and these updated states form the new outlier map $\hat{E}_t^{(j)}$. To speed up the CA-based evolution, we summarize the truth table of the above local rule in Table 1. According to Table 1, the state of a cell at a given time is obtained immediately from the logical combination of its neighbors' states at the previous time step. The CA-based evolution propagates outliers within a local region and deduces a moderate number of hidden outliers.

Spatial-Temporal Smoothing.
Each MV marked as an outlier in $\hat{E}_t^{(j)}$ is corrected by the proposed spatial-temporal smoothing. To improve the correction capability, existing works try to construct a large candidate set by exploiting the spatial-temporal coherence of the MVF. However, the more candidate MVs there are, the more computation is required and, in particular, the higher the probability that outliers occur in the candidate set. We incorporate VMF into the construction of the candidate MV set in the spatial-temporal neighborhood, which simplifies the set of candidate MVs while providing a robust capability to correct outliers. The flow of spatial-temporal smoothing is illustrated in Figure 8. Due to the small radius of the search window, the number of candidates in $C_S$ is limited, but these candidates have a strong spatial-temporal coherence with $v_0$, and in that way, a good correction capability can be ensured.
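The local evolution rule described in the CA-Based Evolution subsection can be sketched as a synchronous update of the binary outlier map. Two points are assumptions rather than statements from the text: a cell already marked as an outlier stays marked, and cells outside the lattice are treated as nonoutliers through zero padding. The function name is illustrative.

```python
import numpy as np

def evolve_outlier_map(outlier_map):
    """Apply one synchronous step of the outlier-propagation rule.

    A cell becomes 1 if at least one von Neumann neighbor is 1, or if more
    than three of its eight Moore neighbors are 1.
    """
    e = np.asarray(outlier_map, dtype=np.uint8)
    p = np.pad(e, 1)  # zero border: outside cells count as nonoutliers (assumption)
    von_neumann = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    diagonal = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    moore = von_neumann + diagonal
    # Existing outliers are kept (assumption); new ones follow the stated rule.
    new_map = (e == 1) | (von_neumann >= 1) | (moore > 3)
    return new_map.astype(np.uint8)
```

All neighbor counts are read from the padded copy of the old map, so every cell is updated from the states of the previous time step, as a synchronous cellular automaton requires.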

Experimental Results
In this section, the performance of the proposed CA-MVS algorithm is evaluated by testing it on different video sequences and comparing the results with those obtained by the traditional MVS algorithms, including MF [14], VMF [15], WVMF [16], S-MVS [19], and T-MVS [20]. An MCFI algorithm is also built by combining BME, CA-MVS, and OBMC and is compared objectively with recent state-of-the-art MCFI algorithms [1,3,7,8]. All test sequences used in the experiments are in the standard CIF (352 × 288) format at 30 frames/s. To evaluate the quality of the interpolated frames, we remove the first 50 even frames of each test sequence, then use the various MCFI algorithms to recover these even frames from the first 51 odd frames. In the proposed CA-MVS algorithm, the block size is set to 8 × 8, and the setting of the threshold Thr is discussed in the following subsection.

Effect of CA on Performance.
We evaluate the effect of CA on performance by presenting how the interpolated frames and outlier maps vary over the loop iterations, as shown in Figure 9. In the first iteration, there are many outliers in the outlier map after detection; CA-based evolution then exposes many hidden outliers, and finally, we smooth these outliers and obtain the interpolated frame shown on the left of Figure 9(a), in which obvious mismatches occur around the mouth region. In the second iteration, outlier detection finds significantly fewer outliers, but the outliers around the mouth cannot be detected. Thanks to the CA-based evolution, these outliers hidden around the mouth are exposed again. The interpolated frame after the second iteration is shown in the second column of Figure 9(a), and we can see that the mismatches around the mouth are suppressed effectively. In the third iteration, after detection, some outliers disappear as a result of the previous spatial-temporal smoothing; then, the CA-based evolution propagates outliers around the detected ones. After these outliers are corrected, the interpolated frame is shown in the third column of Figure 9(a), and it can be seen that the mismatches are removed. In the fourth iteration, the distribution of outliers tends to be stable, and the quality of the interpolated frame, shown on the right of Figure 9(a), differs little from that after the third iteration. Figure 9 indicates that the adverse effects of outliers on the interpolated frame are reduced step by step owing to the CA-based evolution. Figure 10(a) shows the PSNR value of the interpolated frame as the number of iterations increases; the PSNR value increases gradually at each iteration and tends to stabilize after the third one, indicating that CA-based evolution can effectively improve the interpolation quality. We observe no quality degradation due to oversmoothing in either the objective or the subjective results, which suggests that the outlier propagation controlled by CA prevents oversmoothing. From Figure 10(b), we find that the PSNR value changes little once the residual $\varepsilon$ is smaller than 0.1 for the 82nd frame of the Foreman sequence. Limited by space, we cannot present the PSNR and residual curves for the other sequences, but the results are similar to Figure 10, i.e., the PSNR value tends to be stable when $\varepsilon$ is smaller than 0.1. Therefore, in order to avoid unproductive iterations, we set the threshold Thr to 0.1.

Figure 11 shows the visual results of the 14th interpolated frame of the Foreman sequence with different MVS algorithms. MF and VMF produce a clear background, but serious blocking artifacts occur around the nose. For WVMF, S-MVS, and T-MVS, there are many mismatches in the background, and the face is severely deformed. The proposed CA-MVS algorithm provides a pleasing result without blocking artifacts. Figure 12 shows the visual results of the 58th interpolated frame of the Mobile sequence with different MVS algorithms. MF, VMF, and WVMF blur the numbers on the calendar and recover the red ball wrongly. S-MVS and T-MVS produce more serious distortion: the numbers on the calendar disappear, and color distortion occurs around the red ball. The CA-MVS algorithm produces clear numbers without deformation. Figure 13 shows the visual results of the 22nd interpolated frame of the News sequence with different MVS algorithms. WVMF, S-MVS, and T-MVS produce serious blocking effects around the face of the anchorwoman. MF and VMF generate a clear face, but some mismatches occur in the background. The CA-MVS algorithm provides a visually comfortable result. Figure 14 shows the visual results of the 30th interpolated frame of the Stefan sequence with different MVS algorithms. For the traditional MVS algorithms, there are serious ghost effects around the sportsman; the CA-MVS algorithm, however, effectively suppresses the blurring and provides a satisfying result.

The proposed CA-MVS and the traditional MVS algorithms are each combined with BME and OBMC to obtain different MCFI algorithms, which are used to interpolate the removed frames of the test sequences. Table 2 presents the average PSNR values of these interpolated frames for the different MVS algorithms. The CA-MVS algorithm achieves clear PSNR gains over every other algorithm: for Football, with complex motions, the CA-MVS algorithm is 1.03 dB higher than VMF, and for News, with simple motions, the CA-MVS algorithm obtains PSNR gains of 6.00 dB, 5.42 dB, and 5.36 dB over WVMF, S-MVS, and T-MVS, respectively. The last row of Table 2 lists the average PSNR values over all test sequences for the various MVS algorithms, and the proposed CA-MVS algorithm again obtains clear PSNR improvements over the other MVS algorithms. Figure 15 shows the PSNRs of individual interpolated frames for Foreman, Mobile, News, and Stefan. The proposed CA-MVS algorithm outperforms the traditional MVS algorithms in most cases, and its PSNR improvements are especially clear for Mobile and Stefan. Table 3 lists the average processing time of the various MVS algorithms.

Table 4 presents the PSNR results of the CA-MVS algorithm and recent state-of-the-art MCFI algorithms [1,3,7,8]. The results of [1,3,7,8] are taken directly from the original papers. From Table 4, we can see that the CA-MVS algorithm obtains PSNR gains for some test sequences: for Container, the CA-MVS algorithm outperforms [3,7,8], and for Football, the CA-MVS algorithm obtains a 0.88 dB PSNR gain over [3]. In most cases, the proposed CA-MVS algorithm has PSNR results comparable to those of the state-of-the-art MCFI algorithms. From the above, we can see that the CA-MVS algorithm provides good objective interpolation quality.
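The evaluation protocol of this section can be illustrated with a short sketch that splits a sequence into the retained odd frames and the removed even frames and scores a recovered frame by PSNR. It assumes 8-bit frames with a peak value of 255 and 1-based frame numbering; the function names are illustrative and the code is not taken from the experimental implementation.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

def split_odd_even(frames, n_removed=50):
    """Split a sequence (frame 1 first) into kept odd frames and removed even frames.

    With n_removed = 50, frames 1, 3, ..., 101 are kept as input and
    frames 2, 4, ..., 100 are held out as ground truth.
    """
    odd = frames[0:2 * n_removed + 1:2]
    even = frames[1:2 * n_removed:2]
    return odd, even
```

Each held-out even frame can then be interpolated from its two odd neighbors and compared with the original by `psnr`; averaging over the 50 frames gives a per-sequence score of the kind reported in the tables.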

Conclusion
In this paper, the CA-MVS algorithm is proposed to reduce the incorrect MVs resulting from BME. To overcome oversmoothing, following CA theory, we construct a cellular automaton to model the interaction between outliers and define a logical evolution rule to expose outliers accurately, step by step. The CA-MVS algorithm executes a loop. First, the evident outliers are detected from the angles between adjacent MVs. Second, through CA-based evolution, we find the hidden outliers based on the information from the exposed outliers. Third, the spatial-temporal smoothing corrects each MV outlier by searching for a reliable MV among the spatial-temporal neighboring MVs. Finally, we calculate the residual between the current and the previous MVFs to decide whether to exit the loop. Experimental results show that, compared with the traditional algorithms, the CA-MVS algorithm better improves the accuracy of BME and provides better subjective and objective interpolation quality. As the research in this paper is exploratory, there are many intriguing questions that future work should consider. First, the outlier detection needs to be made more accurate. Second, we will further improve the CA evolution rule to find hidden outliers more efficiently. Finally, we will consider incorporating color information into the CA evolution.

Data Availability
The experimental code can be downloaded from Ran Li's homepage: http://www.scholat.com/liran358.