Fast HEVC Intramode Decision Based on Hybrid Cost Ranking

To improve rate-distortion (R-D) performance, high efficiency video coding (HEVC) increases the number of intraprediction modes at the price of a heavy computational load, and thus intracoding optimization is in high demand for real-time applications. According to the conditional probabilities of most probable modes and the correlation of potential candidate subsets, this paper proposes a fast HEVC intramode decision scheme based on hybrid cost ranking, which includes both the Hadamard cost and the rate-distortion cost. The proposed scheme utilizes the coded results of the modified rough mode decision and the neighboring prediction units to obtain a potential candidate subset and then conditionally selects the optimal mode through early likelihood decision and hybrid cost ranking. Following an experiment-driven methodology, the proposed scheme terminates early if the best mode from the candidate subset is equal to one or two neighboring intramodes. The experimental results demonstrate that the proposed scheme provides, on average, about 23.7% encoding speedup with just 0.82% BD-rate loss in comparison with the default fast intramode decision in HM16.0. Compared to other fast intramode decision schemes, the proposed scheme also significantly reduces intracoding time while maintaining similar R-D performance for the all-intraconfiguration in HM16.0 Main profile.


Introduction
High efficiency video coding (HEVC) has been developed through the efforts of the joint collaborative team on video coding (JCT-VC) [1]. HEVC includes three types of coding blocks: coding unit (CU), prediction unit (PU), and transform unit (TU). A CU has a square quad-tree structure whose size ranges from 8 × 8 to 64 × 64. Each CU may be recursively split into four equally sized CUs. After a CU is split, its PUs and TUs are split independently, and their size cannot be larger than the size of the CU. For intraprediction, a PU can only take two partition types: N × N and 2N × 2N. For a PU, HEVC can perform 35 intraprediction modes, whereas the maximum number is 9 in H.264/AVC. The increased number of intraprediction modes brings a higher compression ratio in intracoding but requires substantially higher computational complexity to select the optimal intraprediction mode and unit splitting through rate-distortion optimization (RDO) [2]. Traversing all candidates through brute-force RDO is prohibitively complex, so the HEVC encoder cannot afford to use all intraprediction modes in the RDO process. To overcome this problem, fast intraprediction selectively goes through the available intraprediction modes and removes unlikely modes to reduce complexity.
Motra et al. [3] used the intraprediction modes of previous frames and neighboring blocks to estimate the candidates of the current block, but this method requires a lot of memory to store the information of previous frames. To reduce the number of candidate modes, texture detection techniques are employed to decide the edge direction of the current block. Zhang and Ma [4] presented a fast intraencoding method that estimates the rate-distortion (R-D) cost of the current CU based on direction intensity detection, and the R-D cost is calculated for the main angles only, with local direction refinement. However, the preprocessing needed to detect and classify textures requires additional processing time and memory.
To reduce the computational complexity of HEVC intraprediction, rough mode decision (RMD) was adopted [5], where the $N$ best candidate modes are selected according to the sum of absolute transformed differences (SATD) and the mode bits. SATD is usually computed by applying a Hadamard transform to the residual signal between the pixels in the original 4 × 4 block and the corresponding pixels in the reference block. The subsequent RDO process is only applied to the $N$ best candidate modes from RMD. However, the correlation of intraprediction modes among spatially neighboring blocks is not considered in the RMD process. Therefore, Zhao et al. [6] improved the RMD process so that it always includes a most probable mode (MPM) process, and the method makes full use of the direction information of neighboring blocks to speed up intramode decision in the high-efficiency and low-complexity configurations. The above RMD and MPM proposals are still retained in the HEVC test model 16.0 (HM16.0) [7]. A recent intraframe optimization algorithm for HEVC can reduce intracoding complexity by up to 50% on a monocore platform; it includes multiple functional modules: early termination of CU encoding, fast intramode decision, fast TU depth selection, and fast intratransform skip mode decision [8, 9]. These modules are inseparable in the intraframe optimization algorithm, and thus it is difficult to compare the performance of the MPM-related module on its own. Other recent related work is also available. For example, Gan et al. [10] presented a fast intraprediction mode decision algorithm with early MPM decision. Based on the Lagrangian cost function of RMD candidates, the algorithm utilizes the correlation between the first RMD candidate mode and the MPM modes to terminate the RDO process early and thereby reduce the computational complexity of intraprediction.
Kumar et al. [11] used the line average of each row and column to identify the directional orientation, and their scheme compared the strength of the directional orientations in each direction to detect the most probable intraprediction direction for each PU. Further, Fini and Zargari [12] proposed a two-stage decision method for HEVC fast intraprediction. In the first stage, the number of tested modes in RMD is reduced from 35 to 19, and the rough encoding cost of the omitted modes is estimated from the cost of two neighboring modes; in the second stage, the method reduces the candidate modes based on the correlations among angular modes. Reference [12] also briefly reviewed the current development of fast intramode decision for HEVC and summarized the experimental results of different decision methods on test sequences, where the experiments showed that the two-stage decision method reduces the coding time with minimum quality degradation.
With the evolution of the HEVC reference software, the correlation between RMD and spatially adjacent PUs has not been fully studied, and there is still room to further reduce the encoding complexity of HEVC intraprediction. The emergence probabilities of coding modes have been used for the optimization of different video codecs. In this paper, we utilize an experiment-driven methodology to further modify the RMD and MPM processes in HM16.0. We propose a fast HEVC intramode decision scheme based on hybrid cost ranking, which utilizes the instant information of RMD and neighboring PUs to obtain a potential candidate subset and then conditionally selects an optimal mode through early likelihood decision and hybrid cost ranking. The remainder of this paper is organized as follows. Section 2 presents a brief review of fast HEVC intramode decision. The details of the proposed scheme are provided in Section 3. Extensive experiments are carried out and analyzed in Section 4 to demonstrate the performance of the proposed scheme, while Section 5 concludes the paper.

HEVC Intramode Decision
2.1. Intraprediction Modes. Since high-definition video has more complex and detailed texture, the accuracy of HEVC intraprediction has been significantly improved. HEVC defines 33 angular direction modes in addition to the planar mode and the DC mode. In HM16.0, the PU size ranges from 4 × 4 to 64 × 64, and a PU checks up to 35 prediction modes to derive the R-D optimal one. Figure 1 gives an illustration of the HEVC intraprediction modes, where a larger number of modes leads to both a greater compression ratio and greater computational complexity.

Fast Intramode Decision.
In HM16.0, the default fast intraprediction mainly includes two stages for a current PU, which are briefly summarized as follows.
Stage 1 Is the RMD Process. The Hadamard cost of each intraprediction mode is estimated from the SATD value and the bit consumption. For the $i$th mode, its Hadamard cost $J_{H}^{i}$ is calculated by the cost function
$$J_{H}^{i} = \mathrm{SATD}_{i} + \lambda_{H} \cdot R_{H}^{i},$$
where $\mathrm{SATD}_{i}$ denotes the SATD between the original block and the reference block; $R_{H}^{i}$ denotes the mode bits; and $\lambda_{H}$ denotes a Lagrange multiplier. The RMD process places a few candidates with small Hadamard costs into a suboptimal candidate subset. An additional MPM process may be implemented, where three MPM modes are estimated from the left and above PUs. In addition to the RMD candidates, the MPM modes are added to the potential candidate subset if they are not included yet.
Stage 2 Is the RDO Process. The full RDO process is implemented for the suboptimal candidate subset, including DCT transformation, quantization, and entropy coding. For the $i$th mode, its R-D cost $J_{\mathrm{RD}}^{i}$ is calculated by the cost function
$$J_{\mathrm{RD}}^{i} = \mathrm{SSD}_{i} + \lambda(\mathrm{QP}) \cdot R_{\mathrm{RD}}^{i},$$
where $\mathrm{SSD}_{i}$ denotes the sum of squared differences between the original block and the reconstructed block; $R_{\mathrm{RD}}^{i}$ denotes the number of coded bits after quantization; and $\lambda(\mathrm{QP})$ denotes a Lagrange multiplier related to the quantization parameter (QP). Intraprediction uses the RDO process to loop over these potential modes at the maximum TU size. By R-D cost ranking, the candidate with minimum R-D cost is chosen as the optimal mode of the current PU. Based on the optimal mode, the residual quad-tree TU selection is further used to derive the best TU partition [13].
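The RMD stage described above can be sketched in a few lines. This is a minimal illustration, not the HM16.0 implementation: the function names, the unnormalized 4 × 4 SATD, and the `lam` and `mode_bits` parameters are assumptions chosen for the example, and HM applies its own scaling and Lagrange multiplier.

```python
# Illustrative sketch of Stage 1 (RMD): SATD via a 4x4 Hadamard transform,
# then cost = SATD + lambda * mode bits, then ranking by increasing cost.
# Names and the unnormalized SATD are assumptions, not HM16.0 code.

# 4x4 Hadamard matrix (entries +1/-1); it is symmetric, so H equals its transpose.
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def satd_4x4(orig, pred):
    """Sum of absolute Hadamard-transformed differences of one 4x4 block."""
    diff = [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(orig, pred)]
    coeffs = matmul(matmul(H4, diff), H4)  # H * D * H^T
    return sum(abs(c) for row in coeffs for c in row)


def hadamard_cost(satd, mode_bits, lam):
    """Hadamard cost of one mode: distortion term plus weighted mode bits."""
    return satd + lam * mode_bits


def rough_mode_decision(costs, num_candidates):
    """Return the modes with the smallest Hadamard costs, in increasing order."""
    return sorted(costs, key=costs.get)[:num_candidates]


# A flat residual of 1 only excites the DC coefficient of the transform.
orig = [[10] * 4 for _ in range(4)]
pred = [[9] * 4 for _ in range(4)]
flat_satd = satd_4x4(orig, pred)
best = rough_mode_decision({0: 30.0, 1: 12.5, 26: 20.0, 10: 40.0}, 3)
```

The Hadamard transform is used here because it needs only additions and subtractions, which is why SATD is a cheap stand-in for the transform-domain distortion that full RDO measures.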
The intraprediction in HM16.0 uses the time-consuming RDO process to calculate and compare the R-D costs of the candidate modes and chooses the optimal mode with minimum R-D cost. In HM16.0, the execution time of intraprediction coding can occupy over 20% of the total intraencoding process. Figure 2 shows the average runtime ratio of the RMD and RDO processes in default intraprediction coding, where the experimental settings are given in Section 4; the two processes consume about 66% of the computations. This gives us an optimization target: cutting down the RMD and RDO runtime.
The MPM modes are acquired through the coding results of spatially adjacent PUs (Figure 3). The candidate modes in RMD are ranked in increasing order of Hadamard cost, and their probability of being the optimal mode shows a descending trend. By Hadamard cost ranking, the earlier modes in $S_{RMD}$ have a higher probability of being the optimal mode than the later candidates. As shown in Figure 4, the statistics show that the MPM modes are highly correlated with the RMD candidate modes, and the two overlap to a certain extent as the optimal mode. In particular, the first candidate mode "RMD1" from the RMD process and the MPM modes from the spatially adjacent PUs are selected as the optimal intramode in a high percentage of cases. There is still room to further reduce the encoding complexity of HEVC intraprediction. We will use an experiment-driven methodology to further improve the RMD and MPM processes in HM16.0.
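As background for the MPM process mentioned above, the sketch below follows the three-MPM derivation of the HEVC standard as we understand it, with planar = 0, DC = 1, and angular modes 2 to 34 (vertical = 26). The function name and the use of `None` for an unavailable neighbor are assumptions for illustration; this is not code from HM16.0 or from the proposed scheme.

```python
# Sketch of the three-MPM derivation from the left and above neighbouring modes.
# Mode numbering assumed: 0 = planar, 1 = DC, 2..34 = angular, 26 = vertical.

PLANAR, DC, VERTICAL = 0, 1, 26


def derive_mpm(left_mode, above_mode):
    """Return three most probable modes; None marks an unavailable neighbour."""
    a = DC if left_mode is None else left_mode
    b = DC if above_mode is None else above_mode
    if a == b:
        if a < 2:
            # Both neighbours are non-angular: fall back to a fixed list.
            return [PLANAR, DC, VERTICAL]
        # Angular neighbour: the mode itself plus its two adjacent angles,
        # wrapping around inside the angular range 2..34.
        return [a, 2 + ((a + 29) % 32), 2 + ((a - 2 + 1) % 32)]
    # Different neighbours: keep both and add the first unused default.
    if PLANAR not in (a, b):
        third = PLANAR
    elif DC not in (a, b):
        third = DC
    else:
        third = VERTICAL
    return [a, b, third]


mpm_same_angular = derive_mpm(26, 26)
mpm_mixed = derive_mpm(10, 0)
```

When both neighbors share an angular mode, the list contains that mode and its two nearest directions, which is one reason the MPM list and the top RMD candidates overlap so often in directional regions.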

Conditional Decision with Hybrid Cost Ranking
For real-time applications, a significant reduction of HM16.0 complexity is required while maintaining similar bitrate and perceptual quality. The Hadamard cost is positively correlated with the R-D cost and may be used to avoid unnecessary R-D calculation in fast mode decision. To describe the proposed scheme, the symbols used in Section 3 are defined in Table 1.
Our optimization scheme includes two strategies: (1) early termination if the optimal candidate obtained from RMD is equal to one or both neighboring intramodes and (2) selectively reducing the number of candidates for full RDO. Therefore, we collect the mode information during HM16.0 intracoding and analyze the statistical relationship between the candidate modes and the optimal mode, so as to speed up intraprediction.

Intraprediction Statistical Experiments.
To indicate the potential effectiveness of the proposed scheme, we carry out the following three experiments on all sequences in Table 7 and collect the probabilities with which different candidates emerge as $M_{opt}$. Here $M_L$ and $M_a$ denote the modes of the left and above PUs, $M_1$ denotes the first RMD candidate, and $M_{opt}$ denotes the optimal mode. In Experiment 2, among the PUs with ($M_L \neq M_a = M_1$) or ($M_a \neq M_L = M_1$), the percentage of the PUs with $M_1 = M_{opt}$ is about 14.29%/15.46% = 92.43% on average. Under this condition, $M_L$ or $M_a$ is highly correlated with $M_{opt}$. For a PU with ($M_L \neq M_a = M_1$) or ($M_a \neq M_L = M_1$), mode $M_1$ may therefore be directly selected as $M_{opt}$. Based on Tables 2 and 3, the mode correlation of adjacent PUs can be used to terminate the RDO process early.
Experiment 3. For the RMD process, each PU size has a fixed candidate number. RMD determines a potential candidate subset among all 35 prediction modes, for example, 8 candidate modes for an 8 × 8 PU and 3 candidate modes for a 32 × 32 PU. Generally speaking, a 64 × 64 PU tends to be applied to homogeneous regions of an image, while a 4 × 4 or 8 × 8 PU is often applied to textured regions. In Experiment 3, our RMD process does not include the MPM process, and the RMD candidate subset $S_{RMD}$ is obtained by Hadamard cost ranking. Then the Hadamard cost of each MPM mode is calculated, and the MPM candidate subset $S_{MPM}$ is also obtained for the current PU. Based on Hadamard cost ranking, a few candidate modes from RMD and MPM may be chosen for the subsequent RDO process to derive the optimal mode. In increasing order of Hadamard cost in $S_{MPM} \cup S_{RMD}$, the probability of a candidate mode being $M_{opt}$ also shows a descending trend, and the earlier modes account for a larger proportion of $M_{opt}$ than the later modes in $S_{MPM} \cup S_{RMD}$. After the RMD and MPM processes, the number of candidate modes may therefore be reduced according to their rank. The experiment-driven method is used to choose the optimal number of $S_{RD}$ candidates. In $S_{MPM} \cup S_{RMD}$, some later modes for a certain PU size are selected as $M_{opt}$ in a very low percentage of cases, so we can obtain the potential candidate subset $S_{RD} \subseteq S_{MPM} \cup S_{RMD}$. In increasing order of Hadamard cost, the optimized R-D candidate subset $S_{RD}$ includes the earlier modes of $S_{MPM} \cup S_{RMD}$, with the candidate number defined in Table 5: through extensive trials, the number of candidate modes involved in the subsequent RDO process is set to 6, 6, 3, 3, and 2 for PU sizes of 4 × 4, 8 × 8, 16 × 16, 32 × 32, and 64 × 64, respectively. Other allocation choices give poorer performance. The proposed algorithm fully exploits the correlation between the rough modes from RMD and MPM and the final optimal mode.
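The candidate reduction described above can be sketched as follows. The per-size counts are the ones quoted in the text for Table 5; the function names and the cost callables are hypothetical stand-ins, not identifiers from HM16.0.

```python
# Sketch of hybrid cost ranking: merge RMD and MPM candidates, rank them by
# Hadamard cost, keep the first N for the PU size, then pick the R-D minimum.
# Function names and the cost callables are assumptions for illustration.

# Number of R-D candidates per PU size, as quoted in the text for Table 5.
RD_CANDIDATES = {4: 6, 8: 6, 16: 3, 32: 3, 64: 2}


def build_rd_subset(pu_size, s_rmd, s_mpm, hadamard_cost):
    """Return S_RD: the cheapest modes of S_MPM ∪ S_RMD by Hadamard cost."""
    merged = list(dict.fromkeys(list(s_rmd) + list(s_mpm)))  # union, no repeats
    ranked = sorted(merged, key=hadamard_cost)
    return ranked[:RD_CANDIDATES[pu_size]]


def hybrid_cost_ranking(pu_size, s_rmd, s_mpm, hadamard_cost, rd_cost):
    """Run the (expensive) R-D cost only on S_RD and return the best mode."""
    s_rd = build_rd_subset(pu_size, s_rmd, s_mpm, hadamard_cost)
    return min(s_rd, key=rd_cost)


# Toy costs for a 16x16 PU; the numbers are invented for the example.
had = {1: 10, 25: 11, 26: 12, 0: 15, 10: 20}
rd = {1: 100, 25: 90, 26: 95, 0: 50, 10: 60}
subset = build_rd_subset(16, [1, 26, 0], [26, 25, 10], had.__getitem__)
chosen = hybrid_cost_ranking(16, [1, 26, 0], [26, 25, 10],
                             had.__getitem__, rd.__getitem__)
```

In the toy data, mode 0 has the lowest R-D cost but is pruned by its Hadamard rank, which shows where the small BD-rate loss of such a scheme comes from: the R-D search only sees the modes that survive the cheaper ranking.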
After Experiments 1 and 2, the combination of MPM and RMD candidate modes is effective in further removing unlikely modes. According to the statistical distribution of optimal intraprediction modes, the candidate modes in $S_{RD}$ have a high probability of being chosen as $M_{opt}$. From these experiments, it is observed that the candidate modes selected from $S_{RD}$ cover, on average, about 90.45% of the optimal modes for the current PU, which is further illustrated in Table 6.

Proposed Intramode Decision.
The spatial correlation of intraprediction modes among neighboring PUs is exploited to accelerate the mode decision. We have modified the RMD process so that it does not include an MPM process. The proposed RMD process obtains the candidate subset $S_{RMD}$ by Hadamard cost ranking. After the RMD and MPM processes, the number of candidate modes is reduced according to the hybrid cost ranking. From these experiments, it is observed that the first 3, 3, 2, 2, and 1 candidate modes selected from $S_{RD}$ can cover about 90% of the optimal intramodes. By hybrid (Hadamard + R-D) cost ranking, a few candidate modes from RMD and MPM are chosen for the subsequent RDO process. According to the above statistical analysis, Figure 5 illustrates the flowchart of the proposed fast intramode decision algorithm for one PU, and the proposed scheme modifies the RMD and MPM processes as follows.
Step 1. $M_L$ and $M_a$ from the left and above PUs are first obtained and compared.
Step 2. For a current PU, the RMD process calculates the Hadamard cost of each intraprediction mode and traverses up to 35 modes. Then the RMD candidate subset $S_{RMD}$ is obtained by Hadamard cost ranking.
Step 3. According to the following three conditions, the early likelihood decision and hybrid cost ranking are implemented.
Condition A. For the current PU with $M_L = M_a = M_1$, mode $M_1$ is directly decided as the optimal mode $M_{opt}$; the RDO process is implemented only for mode $M_1$, and the algorithm goes to Step 4.

Condition B. For the current PU with ($M_L \neq M_a = M_1$) or ($M_a \neq M_L = M_1$), mode $M_1$ is directly decided as $M_{opt}$; the RDO process is implemented only for mode $M_1$, and the algorithm goes to Step 4.
Condition C. For other conditions, an additional MPM process is implemented to predict three MPM modes, and the MPM candidate subset $S_{MPM}$ is obtained. If any MPM mode is not included in $S_{RMD}$, its Hadamard cost is calculated, and the MPM mode is added to the potential candidate subset $S_{MPM} \cup S_{RMD}$ by Hadamard cost ranking. In increasing order of Hadamard cost, the R-D candidate subset $S_{RD}$ is obtained, with the candidate number defined in Table 5. Then all members of $S_{RD}$ are considered as candidates in the RDO process to compete for the optimal intraprediction mode. By R-D cost ranking, the RDO process is implemented only for the candidate modes in $S_{RD}$; the mode with minimum R-D cost is decided as $M_{opt}$, and the algorithm goes to the next step.
Step 4. The residual quad-tree TU selection is performed on the optimal mode $M_{opt}$ to determine the best TU partition.
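The early likelihood decision in Step 3 can be sketched as below. The function name, the returned condition label, and the `full_decision` callback (standing in for the Condition C path with MPM merging and R-D ranking) are assumptions made for this example.

```python
# Sketch of Step 3: early likelihood decision over Conditions A, B, and C.
# `full_decision` is a hypothetical callback that runs the Condition C path
# (MPM process, Hadamard ranking, R-D ranking) and returns the chosen mode.


def decide_intra_mode(m_left, m_above, s_rmd, full_decision):
    """Return (optimal mode, condition label) for one PU."""
    m1 = s_rmd[0]  # first RMD candidate, i.e. minimum Hadamard cost
    if m_left == m_above == m1:
        # Condition A: both neighbours agree with the best RMD candidate.
        return m1, "A"
    if m1 in (m_left, m_above):
        # Condition B: exactly one neighbour equals the best RMD candidate
        # (the case where both are equal was handled by Condition A).
        return m1, "B"
    # Condition C: no early termination, fall back to hybrid cost ranking.
    return full_decision(), "C"


case_a = decide_intra_mode(26, 26, [26, 10, 1], lambda: 0)
case_b = decide_intra_mode(10, 26, [26, 1, 0], lambda: 0)
case_c = decide_intra_mode(10, 1, [26, 0, 25], lambda: 0)
```

Only Condition C calls the callback, so in Conditions A and B the MPM process and the R-D comparison over several candidates are skipped; RDO still runs once for the selected mode before Step 4.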

Experimental Results
To evaluate the performance of the proposed scheme, it is implemented in the reference software of the HEVC test model 16.0 (HM16.0). The experimental results for various test sequences are shown in this section. Since the proposed scheme only focuses on the optimization of the RMD and MPM processes, it is difficult to compare it fairly with non-MPM schemes. Therefore, the proposed scheme is compared with the following intramode decision schemes that use the HM16.0 MPM process: (1) D-FIMD: the Default Fast Intramode Decision with the MPM process [7]; (2) ET: the fast intramode decision with Early Termination [10]; (3) BO: the fast intramode decision based on Block Orientation [11]; (4) TS: the Two-Stage fast intramode decision [12]; and (5) HCR: the proposed fast intramode decision based on Hybrid Cost Ranking.
The proposed HCR scheme is fully compatible with the standard and does not bring additional load to the HEVC decoder. Since the proposed scheme focuses on intracoding, experiments are carried out in the all-intraconfiguration in HM16.0 Main profile according to the JCT-VC common test conditions [14]. The experimental platform and HM16.0 encoding configuration are briefly summarized as follows: (a) our scheme is independent of the parallelization of any multicore processor, and the experimental platform is a Dell 9020MT: 3.2 GHz i5-4570 CPU, 8 GB memory, 1 GB discrete graphics; (b) QP is fixed at 22, 27, 32, and 37, respectively, and the test sequences are all used for the performance verification. In the experiments, the difference in encoding bit-rate is compared by the BD-rate (Bjøntegaard Delta Bitrate) loss [15], and the encoding speedup $\Delta T$ is derived as
$$\Delta T = \frac{T_{d} - T_{s}}{T_{d}} \times 100\%,$$
where $T_{d}$ denotes the encoding time of the D-FIMD benchmark scheme and $T_{s}$ denotes the encoding time of the ET, BO, TS, or HCR scheme. The performance is measured with the BD-rate loss and the encoding speedup, and the simulation results of the different schemes are shown in Table 7, obtained without any multicore platform or source code optimization.
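For concreteness, the speedup measure can be computed as below, assuming it is the relative encoding-time saving with respect to the D-FIMD benchmark, expressed in percent. The function name and the sample timings are hypothetical and are not measurements from the paper.

```python
# Sketch of the encoding speedup metric, assumed to be the relative time
# saving against the benchmark encoder, in percent.


def encoding_speedup(t_benchmark, t_scheme):
    """Percentage of encoding time saved relative to the benchmark."""
    if t_benchmark <= 0:
        raise ValueError("benchmark time must be positive")
    return (t_benchmark - t_scheme) / t_benchmark * 100.0


# Hypothetical timings in seconds, chosen only to illustrate the formula.
example = encoding_speedup(200.0, 150.0)
```

A positive value means the tested scheme encodes faster than the benchmark; a scheme that is slower would produce a negative speedup.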
The results are compared with related work. The results in Tables 2 and 3 suggest that early termination would save about 25% of the time, which agrees with the observed speedup of 20% to 25%. The experimental results demonstrate that, in comparison with the D-FIMD scheme for HM16.0, the ET scheme achieves on average 12.6% speedup with 0.75% BD-rate loss, the TS scheme achieves on average 10.9% speedup with 0.51% BD-rate loss, and the proposed HCR scheme achieves on average 23.7% encoding speedup with just 0.82% BD-rate loss. On average, the HCR scheme outperforms the BO scheme in terms of both speedup and BD-rate. Compared with the ET and TS schemes, the HCR scheme saves on average 9.5% to 11.2% of the encoding time while increasing the BD-rate by about 0.11% to 0.35% for all test sequences. Therefore, the proposed HCR scheme can significantly reduce intracoding time with negligible degradation in BD-rate.
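The figure of about 25% can be traced back to the percentages reported for Experiments 1 and 2. The block below restates that arithmetic; it assumes that 9.12% and 15.46% are the shares of PUs meeting Conditions A and B, respectively, and it only bounds the share of PUs that terminate early, not the exact time saved.

```latex
\begin{align*}
P(M_1 = M_{opt} \mid M_L = M_a = M_1) &= \frac{8.73\%}{9.12\%} \approx 95.72\% \\
P(M_1 = M_{opt} \mid \text{exactly one of } M_L, M_a \text{ equals } M_1) &= \frac{14.29\%}{15.46\%} \approx 92.43\% \\
\text{share of PUs terminated early} &= 9.12\% + 15.46\% = 24.58\% \approx 25\%
\end{align*}
```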
Figure 6 plots the R-D curves of four typical sequences, "Traffic," "BQTerrace," "PartyScene," and "RaceHorses," for the five schemes. We can observe that the proposed HCR scheme slightly reduces the luma peak signal-to-noise ratio (Y-PSNR) at a similar encoding bit-rate (kbps). Compared with the other schemes, the proposed HCR scheme preserves nearly the same R-D performance. The experimental results show that the algorithm achieves about 24% speedup of HEVC intravideo coding with a small degradation in BD-rate.
To demonstrate the speedup stability of the proposed HCR scheme, we test different classes of video sequences with different QP values (22, 27, 32, and 37). Figure 7 shows the encoding speedup curves of each sequence class for the HCR scheme versus the D-FIMD benchmark scheme. Across the different QP values, the proposed scheme consistently achieves about 24.78%, 21.43%, 20.45%, 21.33%, and 22.60% encoding speedup for test sequence Classes A, B, C, D, and E, respectively. There is no obvious fluctuation across the different bit-rate ranges. Hence, the speedup of the proposed HCR scheme is largely insensitive to changes in bit-rate or QP.

Conclusions
This paper proposed a fast HEVC intramode decision scheme based on hybrid cost ranking, which exploits the correlation between the rough modes from RMD and MPM and the final optimal mode. By optimizing the RMD and MPM processes, the proposed scheme utilizes the instant information of RMD and neighboring PUs to obtain a potential candidate subset and then conditionally selects the optimal mode through early likelihood decision and hybrid cost ranking. Following an experiment-driven methodology, this paper speeds up HEVC intraencoding by two strategies: (1) early termination if the best candidate obtained from RMD is equal to one or both neighboring intramodes and (2) selectively reducing the number of candidates for full RDO. The experimental results demonstrate that the proposed scheme can significantly reduce intracoding time while maintaining similar R-D performance in the all-intraconfiguration in HM16.0 Main profile.

Figure 4: The possibility of being selected as the optimal prediction mode for RMD and MPM.


Figure 5: The flowchart of the proposed algorithm.

Figure 6: R-D curves of four test sequences for different schemes.

Figure 7: Encoding speedup curves of each sequence class for HCR versus D-FIMD.

Table 1: Symbol definition for a current PU.

Table 2: The conditional probability of Experiment 1.
Experiment 1. In homogeneous regions of an image, the adjacent PUs have similar smooth characteristics, and the optimal mode of a current PU has a strong correlation with that of the adjacent PUs. In $S_{RMD}$, $M_1$ denotes the first candidate mode, with minimum Hadamard cost. For a current PU, $M_{opt}$ denotes the optimal mode, with minimum R-D cost. When the modes $M_L$ and $M_a$ of the left and above PUs and the first RMD candidate $M_1$ overlap, we need to investigate whether $M_1$ has a high probability of being $M_{opt}$ of the current PU. The conditional probability of Experiment 1 is shown in Table 2. The experimental results demonstrate that, among the PUs with $M_L = M_a = M_1$, the percentage of the PUs with $M_1 = M_{opt}$ is about 8.73%/9.12% = 95.72% on average. Therefore, for a PU with $M_L = M_a = M_1$, mode $M_1$ has a high conditional probability of being chosen as $M_{opt}$.
Experiment 2. Statistical experiments have shown that image objects often have higher spatial correlation in the horizontal or vertical direction. Because of the spatial correlation between adjacent PUs, the modes $M_L$ and $M_a$ of the left and above PUs may reflect the direction information of the current region. The proposed scheme uses the direction information of neighboring PUs to speed up intramode decision. The experiment is further carried out to investigate when $M_L$ or $M_a$ is finally selected as $M_{opt}$. The conditional probability of Experiment 2 is shown in Table 3.

Table 4: Number of intra modes in $S_{RMD}$.
... for the critical 4 × 4 or 8 × 8 PU. Table 4 shows the default relationship between the PU size and the RMD candidate number, where the non-RMD candidate number denotes the empirical number of intraprediction candidates without the RMD process.

Table 5: Number of candidate modes from $S_{RD}$.

Table 6: The conditional probability of an $S_{RD}$ candidate being $M_{opt}$.

Table 7: BD-rate loss and encoding speedup.