Video Texture Synthesis Based on Flow-Like Stylization Painting

The paper presents an NP-video rendering system based on natural phenomena. It provides a simple nonphotorealistic video synthesis system in which user can obtain a flow-like stylization painting and infinite video scene. Firstly, based on anisotropic Kuwahara filtering in conjunction with line integral convolution, the phenomena video scene can be rendered to flow-like stylization painting. Secondly, the methods of frame division, patches synthesis, will be used to synthesize infinite playing video. According to selection examples from different natural video texture, our system can generate stylized of flow-like and infinite video scenes. The visual discontinuities between neighbor frames are decreased, and we also preserve feature and details of frames. This rendering system is easy and simple to implement.


Introduction
Most natural phenomena can use image or video to demonstrate directly. However, both natural and artificial phenomena cannot adequately be captured by a single static photo or image. On the other hand, though video is considered as the best media to show natural scene, it also has some shortcomings, such as if video scene should be stored on a computer or other storage devices, we always apply finite duration video clip. Therefore, video needs a beginning, a processing, and the ending frame. Further, though video captures the time-varying behavior of the phenomenon, it lacks the "timeless" quality of the photograph or image. In addition, it needs too much storage space.
In order to solve the above problems, Schödl and Essa proposed a new type of medium called "video texture" [1]. Video texture has qualities between photograph and video. This medium provides a continuous, infinitely varying stream of video images, and it can show an arbitrary length of video. The natural short video can be captured to synthesize infinite scene based on the technique of video texture synthesis. Then video texture works well for motions not only repetitive but also quasi-repetitive, such as swaying trees, waterfalls, and flickering flames. So, video texture is the best media to represent natural phenomena of our nature. Because of its wide application in the entertainment and industry and so forth, computer-based phenomena video texture becomes hot topic.
Generally, video texture may be regarded as expansion of image not only in the time domain but also in natural phenomena simulation. Schödl and Essa also employed video texture technique to animation. They used L2 distance to measure similarity between neighbor frames and maintained playing continuity among video frames. The proposed method has realized highly third dimension scenery animation successfully. Based on image cutting and synthesizing technique, Kuwahara improved video texture synthesis method. To ensure continuity, the input frames were divided into some parts, and then they would be synthesized to video texture [2]. Bhat et al. put forward an editing nature phenomena method to generate the final video texture. Their method first analyzed the motion of texture particles and segmented frames into patches, then they reorganized the patches to reduce error dither [3]. Their method can merge an arbitrary length video based on natural scene. Agarwala et al. described an automatic method for taking the output of a single panning video camera and creating a panoramic video texture [4]. Their method applied a dynamic programming step through hierarchical min-cut optimization process to improve synthesis speed.

2
The Scientific World Journal Though there are so many video synthesis methods to synthesize natural video phenomena, the synthesis results are traditionally striven photorealism scenes. That is to say photorealism effects can not produce artistic sensation. Sometimes, people want to enjoy artistic video with cartoon, oil, and water painting effects. In addition, photorealistic images often include more details than necessary to communicate intended information. It is known that nonphotorealistic rendering (NPR) is the subject of intense debate for representative methods of artistic style creation. Artistic expression can often convey a specific mood which is difficult to imbue in a photorealistic scene. NPR also can focus viewer's attention on the important information while downplaying extraneous or unimportant features. Additionally, a NPR look is often more engaging than the traditional, photorealism computer graphic rendering [5]. So, if video texture can be rendered to some nonphotorealistic effects, we can obtain artistic video with infinite playing scene.
Some NPR methods address the computer generated abstract stylization artistic effects, and this process fit perceptual task performance and evaluation. Decaudin applied modeling tools to generate abstract effect from natural scene [6], and his method kept details very well and shadow stayed coherent. Based on mean-shift, DeCarlo and Santella extended three-dimensional video volumes to automatically generate abstract artistic [7]. Particularly in the presence of occlusions and camera movement, contour tracking required substantial user correction of the segmentation results.
This paper presents an improved method to synthesize phenomena video with flow-like abstract painting. The new NP-video model can convert natural phenomena to a flowlike stylization video fast, and we also can achieve the infinite video texture easily. The amalgamations of the NPR and video texture synthesis can make the user enjoy artistic phenomena work, looking on the original phenomena in different artistic style, immersing vision and spirit into the artistic kingdom, and avoid some critical problems in photorealism systems.

NP-Video Phenomena Model System
We will introduce the basic architecture of our NP-video phenomena model in this section. As shown in Figure 1, our system proceeds in two steps. (1) Flow-like effects rendering: the natural phenomena video should be rendered to NPR artistic scene. (2) Video texture synthesis: the phenomena with flow-like effect will be synthesized to infinite playing video. We define the final play sequence of video texture to eliminate visual discontinuities. In addition, frames dividing and recombination method is used to arrive coherent playing frames. Our architecture supports a common NPR technique merging to video texture, and the final NP-video can give user flexibility and enjoyment.

Flow-Like Artistic Style Simulating
Many arts create flow-like artistic work, and people always enjoy this work. Van Gogh and Munch emphasized flowlike structures and directional features in their paintings.
The paintings are so famous because they are harmonic, interesting, and pleasant. Directional coherence and flow structure in this artistic work can help salience region features and boundaries. It also helps to evoke viewer's imagination. So, before synthesizing video texture, we transfer the input phenomena to flow-like art first, and our method provides a versatile NPR flow-like rendering interface.
As same as the abstract effects, flow-like stylization artistic work can generate using edge-preserving filter method. Bilateral filter [8] and mean-shift method [9] are very famous examples of edge-preserving filter. Kang et al. improved the filter shapes, which is based on the vector field derived from the salient image features [10]. However, because edgepreserving filter cannot obtain good result with weak vector, this method may lose some details such as weak edge and structure. Kuwahara filter provides an overall stereoscopic painting visual and maintains a uniform level of abstraction across the image [11]. Unfortunately, Kuwahara filter can result in clustering artifacts during filter process. Papari et al. introduced a new weighting filter window, and the shape of the filter window can be changed based on local feature directions [12]. Kyprianidis also generated an abstract effect along the local feature directions [13], but the stylized surface of their methods cannot preserve both spatial and temporal coherence. Like bilateral filter and Kuwahara filter, Kang and Lee used Shock filter and mean curvature flow to generate abstraction effect, but the boundaries are not salient and simplified [14]. Recently, Kim et al. put forward bristle maps to generate aggregation, abstraction, and stylization of spatiotemporal data [15]. Isenberg explored visual map technique for generating of stylized renderings of 2D map data [16]. His method can render different stylization artistic work.
Though there are many techniques that can be used to obtain flow-like artistic work, these methods should be improved to adopt phenomena video. After capturing phenomena video, we will resolve this video and generate flowlike stylization painting of every frame. Our method begins with the anisotropic Kuwahara filtering. Firstly, eigenvalues and eigenvectors can be used to calculate structure tensor. Secondly, local orientation and anisotropy structure are used to guide Kuwahara filter process. Then the flow-like effects can be generated using improved line integral convolution.

Local Orientation Calculate.
Brox et al. obtained anisotropy through calculating eigenvalues and eigenvectors of structure tensor [17]. We use Brox's method to calculate structure tensor first. Let , and , be the spatial derivatives in -and -direction and let be the input frame with standard deviation ; structure tensor can be calculated as The pixel's gradient vector is G = ( , ) . Let * denote convolution operation; the partial derivatives of can be calculated: The Scientific World Journal  The structure tensor of each video frame f can be defined as Let 1 and 2 be the minimum and the maximum eigenvalue of T, which represents gray scale variation degree of vector k 1 and k 2 where parameter denotes filter kernel of Gauss function variance, denotes filter radius which is calculated using 1 / 2 . If 1 and 2 are close, this filter field tends to be circular. The parameter represents edge direction of pixel ( , ), which reflects curve direction of eigenvalue. Then anisotropic structure tensor can be expressed as T : Because the local vector field maybe discontinuous, we use Gaussian filter technique to smooth this vector field, and eigenvalues of structure tensor are given: Then, the eigenvector in local curve direction is calculated as The local orientation is = arg , and we calculate anisotropy based on Yang's method [18]: where parameter ranges between 0 and 1, which denotes isotropic and anisotropic regions, respectively.

Anisotropic Filter.
Because pixel ( , ) of filter result is determined by the spatial distance from the center pixel ( , ), as well as its relative intensity difference, the filtering output at pixel ( , ) is calculated: where parameter Ω denotes filter spatial space. , is the local intensity or color similarity with weighted , , and , denotes the squared standard deviations, and can be defined: Let be the local orientation, be anisotropy, and be matrix defining a rotation, the filter spatial space Ω is where parameter ℎ = 2 . Parameter is coefficient to adjust anisotropy . So, − is a linear coordinate transform which maps Ω to a disc with radius ℎ. This anisotropic filter ensures that if local region has lower standard deviation, the more weight value is given during filter process. Figure 2 shows our filter results. Figure 2 From these results, we can find that our method can maintain more details and features, such as edge, color, and local structure. Our method also eliminates artifacts of shapesimplifying and Shock filter method.

Integral Convolution.
We benefit from our efficient stylization painting not only in anisotropic filtering, but also with flow-like effect. Papari and Petkov applied integral convolution technique to produce nonphotorealistic rendering glass patterns texture effect [19]. We refer their integral convolution method to generate final artistic work. During convolution process, our method calculates a local stream line which moves out from the positive and negative directions. Let out The Scientific World Journal 5 be flow-like output image, and let ( ) be a stream line with length of . Parameter 0 denotes each pixel in ( ). Let 0 be current pixel; we can obtain 0 = ( 0 ). Parameter ( ) denotes convolution kernel, and we utilize Hanning window function in this paper. ( ( )) denotes all the pixels in this line, the convolution implement is defined as follows [20]: where parameter denotes convolution length. If is small, an insufficient amount of filtering flow occurs. However, too large will result in close convolution value and regular output. Because variance can well reflect local statistic information, it can be used to calculate local convolution length automatically: where max and min are the largest and the smallest integral length which can be assigned. Parameter is variance value of local area. Based on anisotropic filtering and integral convolution, input video can be rendered to flow-like effects. Next we will synthesize nonphotorealistic video texture to infinite video scene.

Phenomena Video Texture Synthesis
After phenomena video is rendered to flow-like effects, we will synthesize video texture clip to infinite video scene through video texture synthesis method. There are two significant challenges for video texture synthesis. (1) Because infinite video scene is generated through playing period video sequence repeat, the end frame in one sequence has large difference with the beginning frame in the next loop sequence. So, there are visual breaks between different loop arrays.
(2) During video texture playing, visual discontinuities exist in different frames in one sequence. When frames dispose to nonphotorealistic artistic work, the discontinuities will be more prominent. So, we will determinate loop sequence of input video frames first. Then, methods of frame division and recombination, patches synthesis will be applied to eliminate visual discontinuities.

Loop Sequence Determinate.
Many natural phenomena present playing cycle in video texture. If we play video cycle of natural phenomena repeat, people can enjoy infinite scene, and visual discontinuities will be reduced. So, we should find the loop cycle of a certain video texture. We use Schödl's method of L2 distance to measure the similarity among different video frames [1]. Because human's vision is sensitive to the change of luminance in frames, during the process of L2 distance calculation, we transfer the color space of frame from RGB to YCbCr. So L2 distance can be calculated only in luminance channel Y. If , denote two different frames in one video texture, similarity can be acquired as where , are luminance information of and . Parameters , represent pixel positions of and . So, if ( , ) is smaller than the threshold , frames and can be regarded as the same loop sequence. That is to say, and are neighbor frames. If denotes frame number of input natural video texture and denotes each frame in ; threshold can be calculated: Based on similarity, video texture can be divided into different sequences. Because L2 distance of neighbor frames in one sequence is small, we consider using different sequences to combine final infinite video scene. If L2 distance satisfies formula (16), then there only exists one sequence, and this sequence of input video texture has frames: Figure 4 shows one natural video texture of stream water. There are fifteen frames 1 to 15 in this video, and we can obtain similarity in Table 1 using formula (14).

Frame Division and Recombination.
Though L2 distance is smaller than threshold in one sequence, when playing video sequence repeatedly, there also exists visual discontinuity between neighbor frames, such as ending and beginning frames. Therefore, we utilize Bhat's method to eliminate these discontinuities. Most natural phenomena video texture is similar particle moving, such as waterfall, fountain, flame, and stream water. Figure 5(a) shows a particle moving procedure along a flow line, and we can find that a patch texture of one frame in Figure 5(b) will shift to neighbor frame in next playing moment.
One frame in video texture can be divided into some texture patches, such as Figure 5(c). If the sequence of input video is four, then each frame should be divided into four texture parts, and each patch will shift along flow line like Figure 5(a). The patch will move to different position in neighbor frames. For example, when = 3, the patch moves to the third part in the fourth frame, whose beginning is the first part in the second frame. If frames are divided into three parts, Figure 5(d) shows the result after frames recombined together, and the same color stands for the same texture patch. We can find that the same patch is in different frames when this video scene is playing.      Table 2 shows original playing sequence without frame dividing and recombination. The different column expresses different playing video frames. When this video texture is playing repeatedly, there is a visual discontinuity between the end frame of front cycle and the beginning frame of next cycle. If sequence length of this input video texture is three, then we divide each frame into three texture parts. After using frames recombination method, we obtain new group frames like Table 3. From Table 3, we can find that the discontinuity is scattered to different frames and forms a ladder-shaped discontinuity. Because this technique disperses visual dither to the whole playing sequence, visual break between different loop arrays will be reduced.
The method of frames division and recombination is suitable to natural phenomena moving downwards, such as rivers, waterfalls, and fallen leaves. If input video texture moves upwards, such as fountains or flames, the frames should be divided into patches and recombination retroflex like Table 3. However, if a natural phenomenon is complicated and its sequence is more than ten, it is impossible to divide every frame into ten patches. Then, if video texture's sequence is more than appointed threshold, each frame of this video can be divided using this threshold.

Patches Synthesis Method.
Though visual discontinuity can be removed applying frame division and recombination, some distinct boundaries will be produced between different patches in one frame. Texture synthesis method can be used to eliminate these boundaries.
Texture synthesis method can be used to synthesize texture sample to an unlimited size image. Efros and Freeman proposed a texture quilting method to synthesize surprisingly Table 3: Frame division and recombination.
good results for a wide range of textures [21]. Efros and Freeman chose some blocks from the sample texture to fill in the final texture image. The blocks satisfy the overlap constraints within some error tolerance. As shown in Figure 6, there have been some overlap regions between neighbor blocks. To eliminate discontinuity between overlap regions, based on the error cost path in overlap region, Efros found the best seam line to synthesize neighbor blocks. Hence, when frames of input video texture are divided into patches, we apply Efros' method to recombine these patches. Let ( , ) be error cost function and width of overlap region. color denotes distance error of color, geometry denotes structure error, and the minimum error boundary can be calculated as Figure 7 shows the slide window when we calculate ( , ). Let pre be error cost of previous slider window and old error cost of region A; new is error cost of region B. Equation (18) is used to calculate the error cost between New and Old windows, and this method can optimize synthesis process between different patches:

Experiment Results
To verify the feasibility and effectiveness of our proposed methods in this paper, we implemented the relevant algorithms and carried out our corresponding experiments.  Figure 8 shows blowing water video scene, and this input video has seventeen frames. Figure 9 shows fountain video scene, and this input video has fifty-eight frames. We also can conclude that the experimental results are realistic and they all have some esthetics.

Conclusions
In this paper, we have presented an NP-video texture synthesis system based on natural phenomena. Based on a simple and effective anisotropic filtering and integral convolution, flow-like stylization video effects can be generated. In addition, through loop sequence determination, frame division, and recombination, our algorithm removes visual discontinuities between different frames during video texture synthesis process. The experiment results indicate that our algorithm is easy to synthesize infinite video scene with nonphotorealistic artistic work.
An obvious limitation of our method is that the synthesis speed should be improved. Some strategies should be used to protect highly important area in the frames. We may further extend the GPU implementation of our algorithm to process video in real time.