A Simple and High Performing Rate Control Initialization Method for H.264 AVC Coding Based on Motion Vector Map and Spatial Complexity at Low Bitrate

The temporal complexity of video sequences can be characterized by a motion vector map, which consists of the motion vectors of each macroblock (MB). In order to obtain the optimal initial quantization parameter (QP) for video sequences with different spatial and temporal complexities, this paper proposes a simple and high-performance method, based on the motion vector map and spatial complexity, for determining the initial QP at a given target bit rate. The proposed algorithm produces reconstructed video sequences with high and stable quality. For any video sequence, the initial QP can be easily determined from matrices indexed by target bit rate and mapped spatial complexity, using the proposed mapping method. Experimental results show that the proposed algorithm achieves better objective and subjective performance than conventional initial QP determination methods.


Introduction
In the last decade, multimedia data has been applied to communication, security, entertainment, and military uses. Because multimedia data is large in volume, it is difficult to store and transmit. Video coding can effectively solve this problem. With the development of terminal equipment and communication networks, video coding standards have been continually established, including MPEG-1 [1], MPEG-2 [2], MPEG-4 [3], H.261 [4], H.262, H.263 [5], H.264 [6], and H.265 [7]. In particular, H.264/AVC can be applied in extensive areas such as DVD, VOD over cable, and streaming video at low bit rate with high quality.
In multimedia communication and transmission, the rate control (RC) algorithm plays a crucial role. H.264/AVC includes an RC [8] that regulates the encoding parameters to achieve the best subjective quality under a given transmission bandwidth limit. In RC, the optimal QP is determined for every frame of the video sequence. Reducing the QP yields more encoded bits and better perceptual quality. In contrast, increasing the QP reduces both the encoded bits and the perceptual quality. The mean absolute difference (MAD) of the residual MB can be used to determine the QP for an MB. However, a chicken-and-egg dilemma [6] occurs in the process of determining the QP: the QP is computed from the MAD, but the MAD is only available after the MB has been encoded with a QP. In JVT-G012 [9], the one-pass RC scheme uses a linear MAD prediction model to solve the chicken-and-egg dilemma. Owing to its efficiency and simplicity, the JVT-G012 scheme has been widely applied in H.264 software and hardware. The JVT-G012 algorithm includes an initialization process for RC, in which the initial QP is determined for the IDR picture of a video sequence. The MAD of the IDR can then be calculated using the initial QP, and the MAD of the next frame is predicted from the MAD of the IDR. The determination of the initial QP is therefore an important part of RC, and its value influences the RC performance. Conventionally, in JVT-G012, the initial QP is determined according to the number of bits per pixel (BPP), which depends on the target bit rate, frame rate, and image size. Although the BPP scheme is easy to implement, it is rough and imprecise. Generally, a video sequence is two-element data comprising spatial and temporal components. According to information theory, a relatively large number of bits is needed to represent a video sequence with highly complicated spatial and temporal features. In contrast, a relatively small number of bits can encode a low-complexity video sequence. When the target bits are restricted, a large initial QP is expected for a highly complicated video sequence; although the image quality is reduced, the encoded data stays within the limited range. For a low-complexity video sequence, a small initial QP is assigned, so that not only is the target bit budget met but excellent image quality is also obtained. Following this idea, it is very important that the RC initialization scheme consider the spatial and temporal complexities of the video sequence. To this end, Wang and Kwong [10] and Wu and Kim [11] proposed schemes that use the characteristics of video sequences as well as BPP to determine the initial QP. However, their methods do not consider the quality balance of the reconstructed video sequence, and the provided parameters cannot be applied to arbitrary video sequences. The criterion for selecting the sample video sequences is not provided. In addition, the algorithm for determining the optimal initial QP is not explained. Moreover, Hu et al. [12] proposed a scheme for computing the initial QP for spatial scalable video coding (SVC). However, this scheme is only applicable to the SVC standard.
In order to solve these problems, we propose a simple and high-performance method to determine the initial QP at a given target bit rate. To obtain the initial QP for any video sequence, it is very important to measure the spatial and temporal complexities of the video sequence. The motion vector of the H.264/AVC [6] standard is proposed as a measure of the temporal complexity of video sequences. First, the temporal complexities of ten video sequences are analyzed using motion vectors, and 4 video sequences are selected as samples. Then, the spatial complexities of the samples are calculated as the proportion of complex MBs in the IDR, where an MB is classified as complex by its variance. In this paper, an algorithm for determining the optimal initial QP is proposed that produces reconstructed video sequences with high and stable quality at encoded bit counts very close to the target bit rates, within the range of 0.4 to 1.0 Mbps. Subsequently, the optimal initial QPs of the 4 samples form two-dimensional matrices indexed by spatial complexity and target bit rate. Moreover, we propose a mapping method that maps the spatial complexity of a tested video sequence onto the two-dimensional matrices. For any tested video sequence, its optimal initial QP can be chosen from the built matrix using its mapped spatial complexity and the target bit rate. Section 2 presents related methods for the initialization process of RC. Section 3 describes the proposed method. The experimental results are shown in Section 4. Finally, Section 5 concludes this study.

Related Method of Initialization Algorithm of Rate Control
In this section, we review initialization methods of RC from the recent literature that are used to decide the initial QP. The JVT-G012 method, which automatically derives the initial QP for the IDR, is the most traditional and coarsest method. Its advantages are easy implementation and low computational complexity.
For terminal equipment with low-performance hardware, JVT-G012 is in extensive use. Various versions of the H.264 reference software [7, 13] have adopted JVT-G012. Its disadvantage is that it is not accurate enough. The JVT-G012 method uses only BPP to determine the initial QP: in (1), the initial QP of the RC initialization process, JVTQP, is selected by comparing BPP against three thresholds. BPP, the number of bits per pixel, is computed as
$$\mathrm{BPP} = \frac{\mathrm{BR}}{\mathrm{FR} \times \mathrm{VS}}, \tag{2}$$
where BR, FR, and VS represent the target bit rate, the frame rate, and the size of the frame, respectively. For QCIF video sequences, the thresholds are given as 0.1, 0.3, and 0.6. For CIF video sequences, the thresholds are recommended to be 0.6, 1.4, and 2.4 in JM9.3 [10].
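As a minimal sketch of the BPP-based initialization described above, the following Python computes BPP from the target bit rate, frame rate, and frame size, and then selects a QP by threshold interval. The function names and the `qp_levels` argument are illustrative assumptions. In particular, the frame size is assumed to be width times height in pixels, the interval boundaries are assumed to be inclusive on the lower interval, and the QP assigned to each interval is supplied by the caller: the values in the usage lines are placeholders, not values taken from JVT-G012.

```python
def bits_per_pixel(bit_rate, frame_rate, width, height):
    """BPP = BR / (FR * VS), with VS assumed to be width * height pixels."""
    return bit_rate / (frame_rate * width * height)


def threshold_initial_qp(bpp, thresholds, qp_levels):
    """Pick the QP of the interval that bpp falls into.

    thresholds: three ascending BPP thresholds.
    qp_levels:  four QPs, one per interval (hypothetical values supplied
                by the caller; a lower BPP should map to a higher QP).
    """
    if len(thresholds) != 3 or len(qp_levels) != 4:
        raise ValueError("expected 3 thresholds and 4 QP levels")
    # Assumption: a BPP equal to a threshold belongs to the lower interval.
    for threshold, qp in zip(thresholds, qp_levels):
        if bpp <= threshold:
            return qp
    return qp_levels[-1]


# CIF (352 x 288) at 0.7 Mbps and 30 frames per second.
bpp = bits_per_pixel(700_000, 30, 352, 288)
# Thresholds quoted in the text for CIF; the QP levels are placeholders.
qp = threshold_initial_qp(bpp, (0.6, 1.4, 2.4), (40, 30, 20, 10))
```

Because the decision depends on BPP alone, two CIF sequences of very different content receive the same initial QP at the same bit rate, which is the coarseness the text points out.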
In the method of Wang, the initial QP is computed on the basis of BPP and a spatial feature. The spatial feature is computed in (3) from the entropy information of the IDR and the DC mode of INTRA 16 × 16. The entropy is taken over the probability of each gray level, up to the maximum gray level value of a pixel; the DC term uses the pixel value at each position of each MB in the IDR together with the predicted compensation value computed from the DC mode of INTRA 16 × 16, over all MBs in the IDR. Finally, the initial QP is calculated by (4). Figure 1 shows the relationship between the best initial QP and BPP for News, Foreman, and Mobile, from which (5) is derived.
In the scheme of Wu, the initial QP is calculated in (6) on the basis of BPP, the MAD of the IDR, and the average MAD of the 2nd, 3rd, and 4th frames, which are encoded in inter mode. One constant of (6) is set to 0.05, and its remaining ten parameters are determined as in (7).
The parameters used in the methods of Wang and Wu are calculated using three types of test video sequences. Neither method shows how the sample video sequences were selected, so it is difficult to say that they represent video sequences of various spatial and temporal complexities. Moreover, Wang and Wu do not take into account the quality consistency of the recovered video sequences. Furthermore, the scheme for determining the best initial QP is not explained.

Spatial and Temporal Complexities.
In a video sequence, the content of adjacent frames does not differ significantly. To save bits, only the difference is encoded. To find the difference, most video coding standards support motion estimation (ME) [6]. ME searches the target region of the reference frames for the 16 × 16 block that most closely matches the current MB. H.264/AVC improves ME with features such as variable block sizes, multiple reference frames, and optimization algorithms. Figure 2 shows the process of ME. Motion vectors are used to compress video by storing the changes to an image from one frame to the next. Figure 3 shows a motion vector. The temporal complexity of video sequences can be measured and predicted from the number and magnitude of the motion vectors. In H.264/AVC, most frames are encoded in inter mode [4][5][6]. For video coding performance, the temporal complexity is therefore more important than the spatial complexity.
Figure 4 shows the motion vector maps of ten different types of sample video sequences [14]. In Figure 4, the ten video sequences can be simply divided into two categories. The Bus, Flower, and Mobile video sequences are classified as the complex case. In contrast, the Foreman, Waterfall, Silent, Paris, Bridge-far, Mother-daughter, and News video sequences are the relatively simple case. Here, a complex sequence is one with a large number of motion vectors, and a simple sequence is one with few. Among the complex videos, Mobile is captured by a fixed camera and has motion vectors in every MB. Flower shows movement of the objects located in the lower part of the frame, while Bus has movement over the whole frame. In the proposed RC initialization algorithm, the initial QP is computed based on the spatial and temporal complexities at the given target bit rate. Once the sample video sequences are selected, a method for calculating their spatial complexity must be provided.
In H.264, the smallest encoding unit is the MB. An MB has two INTRA prediction modes, INTRA 16 × 16 [4] and INTRA 4 × 4 [4]. Figure 5 shows the INTRA prediction modes. Since the variance of an MB, which includes 256 pixels, is equal to the total information of the DC and AC coefficients of the MB, the spatial complexity of the MB can be estimated using the variance [15]. Let $\mathrm{MB}_{\mathrm{var}}$ denote the variance of an MB, computed from the luminance values of its pixels. An MB is classified as being of high or low complexity by a threshold on the variance: it is of high complexity when $\mathrm{MB}_{\mathrm{var}}$ exceeds the threshold, which is defined as 92735 [15], and of low complexity otherwise. Figure 7 shows the classification process of MBs according to the value of the variance. The spatial complexity of the IDR is measured by the proportion of complex MBs as follows:
$$\mathrm{Frame}_{\mathrm{complex}} = \frac{\mathrm{MB}_{\mathrm{Complex}}}{\mathrm{MB}_{\mathrm{Frame}}}, \tag{11}$$
where $\mathrm{Frame}_{\mathrm{complex}}$ is the proportion of complex MBs in the IDR, $\mathrm{MB}_{\mathrm{Complex}}$ is the number of complex MBs in the IDR, and $\mathrm{MB}_{\mathrm{Frame}}$ is the total number of MBs in the IDR. $\mathrm{Frame}_{\mathrm{complex}}$ can thus serve as a measure of the spatial complexity.
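The spatial complexity measure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the exact form of the MB variance is not recoverable from the text, so `mb_variance` assumes the unnormalized sum of squared deviations from the MB mean (a scale on which a threshold of 92735 is meaningful for 8-bit luminance), and an MB is assumed to be complex when its variance is strictly greater than the threshold. The function names and the small test frame are hypothetical.

```python
MB_SIZE = 16
VARIANCE_THRESHOLD = 92735  # threshold value quoted in the text [15]


def mb_variance(frame, top, left):
    """Sum of squared deviations of the 256 luminance values of one MB.

    Assumption: the variance is not divided by the number of pixels.
    """
    pixels = [frame[top + i][left + j]
              for i in range(MB_SIZE) for j in range(MB_SIZE)]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels)


def frame_complexity(frame, threshold=VARIANCE_THRESHOLD):
    """Proportion of complex MBs in a luminance frame (list of rows)."""
    mb_rows = len(frame) // MB_SIZE
    mb_cols = len(frame[0]) // MB_SIZE
    total = mb_rows * mb_cols
    if total == 0:
        raise ValueError("frame is smaller than one macroblock")
    complex_count = sum(
        1
        for r in range(mb_rows)
        for c in range(mb_cols)
        if mb_variance(frame, r * MB_SIZE, c * MB_SIZE) > threshold
    )
    return complex_count / total


# Hypothetical 32 x 32 frame: three flat MBs and one checkerboard MB.
frame = [[128] * 32 for _ in range(32)]
for i in range(MB_SIZE):
    for j in range(MB_SIZE):
        frame[i][j] = 255 if (i + j) % 2 else 0

complexity = frame_complexity(frame)  # one complex MB out of four
```

Only the IDR picture needs to be scanned, so the cost is one pass over the luminance samples of a single frame.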

Proposed Algorithm for Determining the Optimal Initial QP.
At a given target bit rate, the value of the initial QP is directly related to the encoding performance of H.264/AVC. The encoding performance can be evaluated using the quality of the reconstructed video sequences and the total bits. In other words, the objectives are to satisfy the target bit rate and to ensure the best quality of the reconstructed video sequences. The stability of the reconstructed video sequences is also a very important quality measure in multimedia broadcasting and transmission. Thus the optimal initial QP algorithm has the following properties: (1) maximizing the PSNR (peak signal-to-noise ratio), which means the best quality of the reconstructed video sequences, (2) maximizing the stability, which is measured by the differences of QPs, and (3) minimizing the total real bits while satisfying the target bit rate.
To find the optimal initial QP, all 52 possible initial QPs are evaluated. For each initial QP, the PSNR, the real bits of the first 60 frames, and the differences of QPs in a group of pictures (GOP) are recorded at the given target bit rate. The optimal initial QP is the one that produces the maximum PSNR, the minimum real bits while satisfying the target bit rate, and the minimum differences of QPs. Since the numeric value of the real bits is much larger than those of the PSNR and the differences of QPs, and the three quantities have different dimensions, they are normalized and combined with three weights in (15), which yields the optimal initial QP. Each weight lies between 0 and 1, and the weights sum to 1. In our research, all three weights are set to 1/3.
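The selection step can be illustrated with the following Python sketch. It is written under several assumptions, because the normalization formulas are not recoverable from the text: each quantity is min-max normalized over the candidate QPs, a candidate satisfies the target when its real bits do not exceed the target bits, and the score is the weighted normalized PSNR minus the weighted normalized bits and QP differences. The function names, the record layout, and the measurements in the usage lines are all hypothetical.

```python
def normalize(values):
    """Min-max normalize to [0, 1]; an assumed normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_initial_qp(results, target_bits, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return the initial QP with the best weighted score.

    results: one dict per candidate initial QP with the keys
             "qp", "psnr", "bits", and "qp_diff" (hypothetical layout).
    """
    # Assumption: only candidates within the bit budget are eligible.
    feasible = [r for r in results if r["bits"] <= target_bits]
    if not feasible:
        raise ValueError("no initial QP satisfies the target bits")
    psnr_n = normalize([r["psnr"] for r in feasible])
    bits_n = normalize([r["bits"] for r in feasible])
    diff_n = normalize([r["qp_diff"] for r in feasible])
    w_psnr, w_bits, w_diff = weights
    scores = [w_psnr * p - w_bits * b - w_diff * d
              for p, b, d in zip(psnr_n, bits_n, diff_n)]
    best = max(range(len(feasible)), key=scores.__getitem__)
    return feasible[best]["qp"]


# Hypothetical measurements for four of the 52 candidate initial QPs.
results = [
    {"qp": 20, "psnr": 30.0, "bits": 1_500_000, "qp_diff": 26},
    {"qp": 28, "psnr": 29.5, "bits": 1_390_000, "qp_diff": 6},
    {"qp": 32, "psnr": 29.0, "bits": 1_380_000, "qp_diff": 10},
    {"qp": 40, "psnr": 26.0, "bits": 1_300_000, "qp_diff": 12},
]
# 60 frames at 30 frames per second and 0.7 Mbps give 1.4 Mbit.
best_qp = select_initial_qp(results, target_bits=1_400_000)
```

In this made-up data the candidate with QP 20 has the highest PSNR but exceeds the bit budget, so it is excluded before scoring.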

Mapping Method of Spatial Complexity for Generalization.
For video sequences of similar complexity, the initial QPs should be similar. A mapping method of spatial complexity is proposed for any tested video sequence. The spatial complexities of the selected sample video sequences are calculated using (11). For any tested video sequence, its spatial complexity is also computed using (11). The spatial complexity of a tested video sequence is then mapped to the nearest matching spatial complexity among the samples as follows:
$$\mathrm{Difference}_i = \left|\mathrm{SSC}_i - \mathrm{TSC}\right|, \quad i = 1, \ldots, 4,$$
where $\mathrm{Difference}_i$ is the absolute value of the difference between the spatial complexity of the $i$th sample video and that of the tested one, $\mathrm{SSC}_i$ is the spatial complexity of the $i$th of the 4 sample video sequences, and TSC is the spatial complexity of the tested video sequence. The sample with the smallest $\mathrm{Difference}_i$ is the Mapping Sample of the tested video sequence.
In Table 1, MSC is the spatial complexity of the sample video. Type 1, Type 2, Type 3, and Type 4 are the 4 sample video sequences, Waterfall, Foreman, Flower, and Mobile, respectively. Table 1 shows that Mobile has the highest spatial complexity and Waterfall the lowest, which is in agreement with Figure 4. In addition, Table 1 includes 4 groups of optimal initial QPs at low bit rate, one for each of the 4 sample video sequences. The initial QP of a test video sequence can be determined simply by selecting an element from Table 1 based on its Mapping Sample and the given target bit rate.
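The mapping and lookup can be sketched in Python as below. The nearest-sample rule follows the description above; everything numeric is a placeholder, since the contents of Table 1 are not reproduced here. The names `map_sample`, `lookup_initial_qp`, `SAMPLE_SC`, and `QP_TABLE` are hypothetical, and the sample complexities only respect the ordering stated in the text (Waterfall lowest, Mobile highest).

```python
def map_sample(tsc, sample_complexities):
    """Return the sample whose spatial complexity is nearest to tsc."""
    return min(sample_complexities,
               key=lambda name: abs(sample_complexities[name] - tsc))


def lookup_initial_qp(tsc, bit_rate_kbps, sample_complexities, qp_table):
    """Map the tested sequence to a sample, then read that sample's QP."""
    sample = map_sample(tsc, sample_complexities)
    return sample, qp_table[sample][bit_rate_kbps]


# Placeholder spatial complexities (SSC) of the four samples.
SAMPLE_SC = {"Waterfall": 0.10, "Foreman": 0.25, "Flower": 0.60, "Mobile": 0.85}

# Placeholder initial QPs per sample and target bit rate in kbps.
QP_TABLE = {
    "Waterfall": {400: 26, 700: 23, 1000: 21},
    "Foreman": {400: 29, 700: 26, 1000: 24},
    "Flower": {400: 34, 700: 31, 1000: 29},
    "Mobile": {400: 37, 700: 34, 1000: 32},
}

# A tested sequence with TSC = 0.55 is nearest to Flower (difference 0.05).
sample, qp = lookup_initial_qp(0.55, 700, SAMPLE_SC, QP_TABLE)
```

With only four samples the lookup is a constant-time operation once the spatial complexity of the tested IDR picture is known.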

Experimental Results
The proposed algorithm and the existing methods, JVT-G012 and Wu, are implemented on JM9.3 [13], the reference software for H.264. The experiment uses 7 CIF (352 × 288) standard video sequences: Waterfall, Foreman, Flower, Mobile, Bus, City, and Stefan. Of these, Waterfall, Foreman, Flower, and Mobile are the selected samples, and Bus, City, and Stefan are used to test how well the proposed algorithm generalizes. According to the proposed spatial complexity calculation and mapping, Mobile is the Mapping Sample of Stefan, and Bus and City correspond to Flower and Foreman, respectively. The experimental conditions are as follows.
(c) The baseline profile is used; one GOP has 15 frames, of which the 1st frame is intra coded and the others are inter coded; B-pictures are not used. The "Rate Control Enable" item is enabled, the "Initial QP" item is set to 0, and the target bit rates are limited to the range from 0.4 to 1.0 Mbps.
(d) The proposed initial QP is determined using Table 1.
(e) For each standard video sequence, the number of frames is 60 and the frame rate is 30 fps.

Objective Evaluation.
The three methods, the proposed algorithm, JVT-G012, and Wu, are compared in terms of PSNR and the difference of real bits. The PSNR is computed from the MSE of a frame of $M \times N$ pixels,
$$\mathrm{MSE} = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \bigl(p(i,j) - f(i,j)\bigr)^2,$$
where $p(i,j)$ is the pixel value of the reconstructed video sequence and $f(i,j)$ is the original pixel value. The difference of real bits, $\Delta R$, is computed from $R_{\mathrm{JVT}}$, the real bits of JVT-G012, and $R_{\mathrm{JVT\&TEST}}$, the real bits of the algorithm under comparison (the proposed method, Wu, or JVT-G012). Table 2 shows the simulation results for the average PSNR and the total $\Delta R$, denoted PSNR Average and ΔR total, respectively, which are calculated over the target bit rate range from 0.4 to 1.0 Mbps for each video sequence.
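For reference, a PSNR computation for 8-bit frames can be sketched in Python as follows. This uses the widely used definition, 10·log10(peak² / MSE) with the squared error averaged over the pixels; that normalization is an assumption here and may differ in form from the equation of the original text. The function name `psnr` and the tiny frames in the usage lines are hypothetical.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized frames (lists of rows)."""
    squared_error = 0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            squared_error += (r - o) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames are empty")
    mse = squared_error / count  # assumed: error averaged per pixel
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 4 x 4 frames that differ by one gray level everywhere.
original = [[100] * 4 for _ in range(4)]
reconstructed = [[101] * 4 for _ in range(4)]
value = psnr(original, reconstructed)  # MSE is 1, so about 48.13 dB
```

An average PSNR over a sequence is then the mean of the per-frame values, which is why it can hide large frame-to-frame swings.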
In Table 2(a), the proposed method obtains better quality than JVT-G012 and Wu, although the improvement is not obvious. However, ΔR total shows that the proposed algorithm has a clearly better bit rate performance. The proposed method uses 243032 fewer bits than JVT-G012 in total, although each PSNR Average is similar. This illustrates that the video reconstructed by the proposed method has the minimum actual total bits at the same or similar quality in almost all of the simulations, the exception being City.
One of the important quality measures of a video is that the quality of each frame should be uniform. The existing methods do not address this issue. The proposed method addresses it by selecting the initial QP according to the highest and most stable quality as well as the lowest actual bits in (15). Table 3 shows an example in which the quality of each frame changes extremely. According to JVT-G012, the initial QP is 25 at the target bit rate of 0.7 Mbps. The maximum QP is 51, which generates very poor quality, and the minimum QP is 25, which generates good quality. So the difference between the maximum and minimum QPs in a GOP can be a good measure of the stability of video quality. Figure 8 shows this stability performance for the proposed method, Wu, and JVT-G012. In Figure 8, the proposed method obtains a lower difference of QPs than the others in all cases. PSNR can objectively and effectively assess the quality of one frame. However, PSNR is not a perfect measure for evaluating the quality of video sequences, which have multiple frames. In Table 3, although the average PSNR is good, the quality of the reconstructed video sequence is very low because the frame quality fluctuates strongly. Extremely high fluctuation of the frame quality means not only the alternation of fuzzy and vivid images but also broken video.
So in order to evaluate video sequences, the average PSNR and the stability should be considered simultaneously in a GOP.
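The stability measure, the difference between the maximum and minimum QPs within a GOP, can be sketched in Python as follows. The function names are hypothetical, the GOP length of 15 matches the experimental conditions, and the per-frame QP list is made up except for its extremes of 25 and 51, which are the values quoted for the JVT-G012 example.

```python
def qp_spread(gop_qps):
    """Difference between the maximum and minimum QP of one GOP."""
    if not gop_qps:
        raise ValueError("GOP has no frames")
    return max(gop_qps) - min(gop_qps)


def worst_qp_spread(frame_qps, gop_size=15):
    """Largest per-GOP QP spread over a sequence of per-frame QPs."""
    gops = [frame_qps[i:i + gop_size]
            for i in range(0, len(frame_qps), gop_size)]
    return max(qp_spread(gop) for gop in gops)


# Hypothetical per-frame QPs of one 15-frame GOP, from 25 up to 51.
gop = [25, 27, 30, 33, 36, 39, 42, 45, 47, 49, 50, 51, 51, 51, 51]
spread = qp_spread(gop)  # 51 - 25 = 26
```

A small spread means the frames of a GOP are quantized at similar levels, which is the uniformity of quality the text asks for.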
The quality of video sequences that have high complexity is very sensitive to the value of the initial QP at low bit rate. For a given target bit rate, a low initial QP can lead to a large bit assignment to the first frame, leaving insufficient bits for the other frames in a GOP to maintain quality. Therefore, the stability of a reconstructed video sequence is a very important quality measure. The proposed optimal initial QP is determined with stability taken into account in (15). In Figure 8, the proposed method shows better results for all test

Subjective Evaluation.
General video users evaluate video sequences simply by looking, not by calculating PSNR or stability. This implies the importance of subjective evaluation. The relationship between the results of objective and subjective evaluation is not known. Therefore, it is not easy to convert differences in objective evaluation results into differences in subjective evaluation. In this part, the three methods, the proposed method, JVT-G012, and Wu, are subjectively evaluated using the objective evaluation results, that is, the frames at the maximum and minimum QPs at the 0.4 Mbps target bit rate. The Mobile sample video sequence is tested. Figure 9 shows the result.
The average PSNRs of the proposed method, JVT-G012, and Wu are very similar in the objective assessment: 25.22, 25.03, and 24.83 dB, respectively.
In subjective evaluation, we can also see the importance of stability from the gaps in Figure 9.

Conclusions
In order to obtain the optimal initial QP for video sequences with different spatial and temporal complexities, we propose a simple and high-performance method, based on the motion vector map and spatial complexity, for determining the initial QP at a given target bit rate. Four sample video sequences are selected according to the temporal complexity, which is measured using the motion vector map, and the spatial complexities of the four sample video sequences are computed according to the proposed method. For any video sequence, the initial QP can be easily determined from matrices indexed by target bit rate and mapped spatial complexity, using the proposed mapping method. Experimental results show that the proposed algorithm achieves better objective and subjective performance than conventional determination methods. In the future, one further research area will be the development of a quantitative measure of temporal complexity. A study of temporal complexity may provide a hint for explaining or solving the exceptional case in H.264 AVC coding.

Figure 1: Relationship between the best initial QP and BPP according to News, Foreman, and Mobile.

Figure 2: The process of ME.

Figure 4: Motion vectors obtained from ten video sequences.
In Figure 5(a), the INTRA 4 × 4 has 9 prediction modes. In Figure 5(b), the INTRA 16 × 16 has 4 prediction modes. Generally, the INTRA 16 × 16 mode is used for MBs in the homogeneous regions of an image, whereas the INTRA 4 × 4 mode is used for MBs in the object and edge parts of an image [4]. In Figure 6, the homogeneous MBs have the same pixel values. The INTRA 16 × 16 mode is used to process the homogeneous MBs; the reasons for using INTRA 16 × 16 blocks are to save computing time and to maintain the image quality. The INTRA 4 × 4 mode is used to process the nonhomogeneous MBs, which have different pixel values; the reason for using INTRA 4 × 4 blocks is to maintain the quality.

(a) The system platform is an Intel(R) Core(TM)2 Duo CPU E7400 at 2.80 GHz with 2.00 GB RAM, and the OS is Microsoft Windows XP Professional 2002 Service Pack 3.
(b) JM 9.3 is implemented in Visual Studio 6.0.
Figure 9: (a) Minimum QP of GOP based on JVT-G012; (b) maximum QP of GOP based on JVT-G012; (c) minimum QP of GOP based on Wu; (d) maximum QP of GOP based on Wu; (e) minimum QP of GOP based on proposed method; (f) maximum QP of GOP based on proposed method.
Although Figures 9(a) and 9(c) show better quality than Figure 9(e), all objects in Figure 9(e) are vividly identified at minimum QPs. Figures 9(b) and 9(d) are very fuzzy and some details of the frame are lost, whereas the quality of Figure 9(f) is excellent by comparison.

Table 1: Lookup table for proposed initial QP.

Table 2: Comparison of coding performance.

Table 3: Reconstructed video with extremely changing quality.