Complexity Analysis of New Future Video Coding (FVC) Standard Technology

Future Video Coding (FVC) is a modern video coding standard that offers much higher compression efficiency than the HEVC standard. FVC was developed by the Joint Video Exploration Team (JVET), formed through collaboration between ISO/IEC MPEG and ITU-T VCEG. The new tools introduced with FVC enable super-resolution schemes that are recommended for Ultra-High-Definition (UHD) video coding of both SDR and HDR content. In particular, to enhance compression efficiency, the FVC standard adopts a new flexible block structure named quadtree plus binary tree (QTBT). In this paper, we provide a fast FVC algorithm to achieve better performance and reduce encoding complexity. First, we evaluate the FVC profiles under All Intra, Low-Delay P, and Random Access to determine which coding components consume the most time. Second, a fast FVC mode decision is proposed to reduce encoding computational complexity. Then, three configurations, namely, Random Access, Low-Delay B, and Low-Delay P, are compared in terms of Bitrate, PSNR, and encoding time. Compared to previous works, the experimental results show that the time saving reaches 13% with a decrease in the Bitrate of about 0.6% and in the PSNR of 0.01 to 0.2 dB.


Introduction
High-Efficiency Video Coding (HEVC) is the leading video coding standard [1], standardized in 2013 by the Joint Collaborative Team on Video Coding (JCT-VC), formed by the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG). HEVC achieves an increase of about 50% in coding efficiency while maintaining the same visual quality as previous standards, such as H.264/Advanced Video Coding (AVC) [2]. With the development of video technologies, better quality and higher resolutions are in demand. For this reason, a new video codec is of great interest to improve on the compression efficiency and quality of the predecessor standards. Since October 2015, a new group, the Joint Video Exploration Team (JVET), has been working on a new video coding standard, called post-HEVC or Future Video Coding (FVC), as the successor of HEVC [3]. The Versatile Video Coding (VVC) standard is a new video coding technology, which is expected to be standardized in 2020. At the same video quality, especially for UHD video, the FVC standard currently provides between 25 and 30% Bitrate saving compared to HEVC [4].
These new FVC technologies are being evaluated for their compression efficiency using an experimental platform, the Joint Exploration Model (JEM) software, which was developed from the HM reference software [3]. FVC is designed to serve essentially all existing HEVC and H.264/AVC applications, such as broadcast, surveillance, and smart home, and focuses on two goals: higher video resolution and parallel architectures [5]. On the other hand, video coding has a high potential for deployment in wireless networks due to features such as independent frame coding and low-complexity encoding operations. In practice, however, growing complexity has hampered the adoption of video encoding for real-time streaming over mobile wireless networks, such as 4G networks and upcoming 5G networks. The main reasons for these challenges are the high computational complexity of video encoding, its large contribution to a node's power consumption, and video transmission over error-prone wireless channels [6]. In [7], the authors proposed a unified Quality of Experience (QoE) prediction framework for HEVC-encoded video streaming over wireless networks. In addition, the work in [8] proposes a novel frame-level rate control algorithm for videos with complex scenes in HEVC over wireless networks. Many other works in this nascent field of video coding over wireless networks can be found in [9,10].
This paper focuses on optimizing the new video coding standard (FVC) in terms of encoding time through fast methods. To reduce the computational complexity, we propose a fast FVC scheme based on fast mode decision. The rest of this paper is structured as follows. An overview of the FVC standard is given in Section 2. Section 3 presents existing fast-mode decision algorithms developed to reduce the HEVC and FVC complexity in terms of encoding time. Section 4 presents the JEM configuration. Section 5 gives experimental results. Finally, the conclusion of this paper is presented in Section 6.

FVC Overview
As in most previous standards, FVC has a hybrid block-based encoding architecture, containing intra and inter prediction and transform coding with entropy coding [1]. The FVC is developed by the JVET based on the HM test model (HM 16.6) [11]. The picture partitioning structure divides the input video into blocks called Coding Tree Units (CTUs). A CTU is split using a quadtree with a nested binary tree structure into Coding Units (CUs), with a leaf CU defining a region sharing the same prediction mode (e.g., intra or inter). Figure 1 shows a general block diagram of the FVC standard. The new coding features in the FVC standard are listed as follows [12].

Block Partitioning.
The Coding Tree Unit (CTU) is the principal block partition in the HEVC standard, where it replaced the macroblock of the H.264/AVC encoder. In HEVC, a CTU of up to 64 × 64 is recursively split by a quadtree into CUs as small as 8 × 8. More recently, the quadtree plus binary tree (QTBT) block structure was introduced in the new video coding FVC [13,14]. The concept of multiple partition types has been removed in the FVC standard, which means that the CU, PU, and TU share the same block size in the QTBT structure [15]. There are two types of binary split: a flag value of 1 indicates a symmetric vertical split of a CU, while a value of 0 specifies a symmetric horizontal split. These improvements notably increase compression efficiency. The leaf nodes of the binary tree are called coding units (CUs), and this segmentation is used for prediction and transform processing without further splitting; a CU can be square or rectangular in shape [16]. Figure 2 illustrates the QTBT block structure in JEM software.
The Coding Tree Unit (CTU) with P/B slice coding is presented in Figure 3. In the QTBT structure, CTU is firstly divided into a quadtree partition, and then it can be further partitioned into a binary tree partition. With a QTDepth from 0 to 4 levels, the quadtree nodes have a block size from 256 × 256 (CTU) to 16 × 16 (MinQTSize). The maximum allowed size of the root node of the binary tree is 128 × 128, corresponding to a BTDepth from 0 to 3 [17].
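As a minimal sketch of the size ranges quoted above, the following Python snippet enumerates the quadtree node sizes for QTDepth 0 to 4 starting from a 256 × 256 CTU and applies the two symmetric binary splits. The function names and the (width, height) tuple representation are illustrative assumptions, not JEM identifiers.

```python
def quadtree_sizes(ctu_size=256, max_qt_depth=4):
    """Square node size at each quadtree depth: the side halves per level."""
    return [ctu_size >> depth for depth in range(max_qt_depth + 1)]


def binary_split(width, height, flag):
    """Symmetric binary split of a block into two equal halves.

    Following the text: flag 1 = vertical split (left/right halves),
    flag 0 = horizontal split (top/bottom halves).
    """
    if flag == 1:
        return [(width // 2, height), (width // 2, height)]
    return [(width, height // 2), (width, height // 2)]


print(quadtree_sizes())            # [256, 128, 64, 32, 16]
print(binary_split(128, 128, 1))   # [(64, 128), (64, 128)]
print(binary_split(128, 128, 0))   # [(128, 64), (128, 64)]
```

The last depth yields 16 × 16, which matches the MinQTSize stated above, and a binary split of a 128 × 128 root produces the rectangular CUs that QTBT allows.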

Intra Prediction.
The intra prediction modes have also been enhanced; JEM software has 67 intra modes (65 angular modes plus the Direct Current (DC) and planar modes) instead of the 35 modes in HEVC [3,14]. Figure 4 illustrates the intra prediction modes. The black lines represent the directional modes already present in HEVC, and the red lines represent the directional modes newly added in FVC. The planar and DC modes remain the same.

Inter Prediction.
Compared to HEVC, inter prediction has many improvements in JEM software. Among them are two sub-CU-level motion vector prediction methods: Alternative Temporal Motion Vector Prediction (ATMVP) and Spatial-Temporal Motion Vector Prediction (STMVP). ATMVP allows each CU to fetch multiple sets of motion information from multiple blocks smaller than the current CU in the collocated reference frame, as illustrated in Figure 5.
In the STMVP process, based on the neighboring spatiotemporal motion vector predictor, the motion vectors of the sub-CUs are derived recursively, as shown in Figure 6.

Transforms.
The prediction residual is encoded using transform blocks. Two families of transforms are used: the Discrete Sine Transform (DST) and the Discrete Cosine Transform (DCT). FVC introduced multiple transform types, namely DST-I and DST-VII, and DCT-II, DCT-V, and DCT-VIII. Transform block sizes in the new codec range from 4 × 4 up to 64 × 64, larger than the 32 × 32 maximum in HEVC.
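To make two of the transform types named above concrete, the sketch below builds the textbook floating-point DCT-II and DST-VII basis matrices and checks that each is orthonormal. This is an illustration only: the function names are assumptions, and a real encoder such as JEM uses scaled integer approximations rather than these floating-point bases.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis: row i holds the i-th basis function."""
    rows = []
    for i in range(n):
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def dst7_matrix(n):
    """Orthonormal DST-VII basis: row i holds the i-th basis function."""
    scale = math.sqrt(4.0 / (2 * n + 1))
    return [[scale * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)]
            for i in range(n)]


def is_orthonormal(matrix, tol=1e-9):
    """Check that M * M^T equals the identity within a tolerance."""
    n = len(matrix)
    for a in range(n):
        for b in range(n):
            dot = sum(matrix[a][k] * matrix[b][k] for k in range(n))
            if abs(dot - (1.0 if a == b else 0.0)) > tol:
                return False
    return True


print(is_orthonormal(dct2_matrix(4)), is_orthonormal(dst7_matrix(4)))
```

Orthonormality is what lets the decoder invert the transform with a simple transpose, which is why both families can be swapped per block without changing the rest of the pipeline.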

Filter Improvements.
Three in-loop filtering methods are used in the FVC standard: a deblocking filter, a Sample Adaptive Offset (SAO), and an Adaptive Loop Filter (ALF). The deblocking filter is designed to minimize the visibility of blocking artifacts and is applied only to samples located at block boundaries. The SAO filter aims to improve the reconstruction of the original signal amplitudes and is applied adaptively to all samples. ALF minimizes the mean square error between the decoded frame and the original image.

Entropy Coding.
"Context Adaptive Binary Arithmetic Coding (CABAC)" is the entropy coder in HEVC [18]. In the FVC standard, an improved version of CABAC is adopted, with a modified context model selection for transform coefficients, multi-hypothesis probability estimation with a context-dependent update rate, and adaptive model initialization.

Figure 1: FVC block diagram.

Many researchers aim to reduce the complexity of each module of the standard, in terms of encoding time, through software and hardware methods [19]. An overview of previous works, which introduced fast algorithms using the RA and LD configurations, is presented in the next section.

Related Works
Computational complexity remains a serious issue in video compression, especially when real-time operation is desired. Consequently, several optimizations are required to reduce it. The main goal of FVC is to solve the critical issues in HEVC [20]. Because FVC is still under continuous development, works detailing its techniques and methods are very rare. This is why some efforts build on HEVC techniques to develop FVC. Besides, several fast approaches have been proposed to minimize the encoding time of JEM software, especially the ME time [21].
García-Lucas et al. [21] proposed a fast scheme called "pre-analysis algorithm" to accelerate the ME in JEM. This algorithm reduces the size of the search range and the number of reference frames. Numerical results show that the proposed algorithm saves more than 62% of the execution time with a negligible BD-rate of 0.11%. Moreover, the work in [22] proposes a Naïve-Bayes model-based fast CU mode prediction in HEVC-JEM transcoding to improve coding efficiency. This technique reduces the computational time by 12.71%.
In [23], Khemiri et al. suggested an algorithm using the parallel-difference-reduction process to optimize the ME module of HEVC. The proposed scheme achieves on average 56.17% and 30.4% reduction in coding time with a PSNR loss of 0.095 dB and a reduction in Bitrate of 0.64%. Three algorithms are presented in [24] to improve the TZ search (test zonal search) algorithm. The reduction in computational complexity reaches 75%, with a 0.12 dB change in PSNR and a 0.5% decrease in Bitrate under the RA configuration. Another fast algorithm for HEVC, named "early skip mode decision," is presented in [25]. The obtained results reveal that the fast scheme saves on average 58.5% and 54.8% of the execution time for several video sequences under the RA and LDB configurations, respectively.
Two other fast-mode decisions are presented in [26]: the Coded Block Flag and the Early CU Termination. This work reduced the encoder complexity by 58.7%, while maintaining the same level of coding efficiency. Kim et al. [27] proposed two fast-mode decisions, ESD and CBF, to accelerate inter prediction. The proposed algorithm achieves a 34.55% reduction in execution time using the RA configuration and 36.48% using the LD configuration. Lee et al. [28] suggest an algorithm called "Adaptive Search Range" (ASR) to reduce the ME complexity by replacing the fixed ME search range with an adaptive one. ASR can also be combined with several search patterns in the software to minimize the number of search points. The results show that the proposed algorithm can reduce the execution time by up to 53% for different sequences in fast ME schemes. Another interesting work, proposed by Park et al. [29], aims at reducing the encoding complexity of JVET JEM with the QTBT partition technique. The proposed "Reference Frame Search" method allows the encoder to skip unnecessary reference frame searches by exploiting the strong correlation between parent and child nodes in the QTBT partition. The experiment was carried out on a quad-core Intel i7 4.00 GHz CPU with 16 GB RAM. Results revealed that this technique reduced the ME time by 34% compared with JEM 3.1, while keeping the BD-rate below 0.3%.
In order to optimize the TZSearch motion estimation, Purnachand et al. [30] replaced the "diamond search pattern" with the "Hexagonal search pattern." In addition, the proposed algorithm is improved by changing the search threshold in the search area for each grid. The simulations show that the computational complexity of ME is decreased by almost 50% compared to the TZSearch algorithm, with a nonsignificant change in PSNR and Bitrate. In addition, Ahn et al. [31] introduced a fast inter-HEVC encoding scheme. The results show that the proposed scheme obtains an average time saving of 49.6% and 42.7% with an average Bitrate of 1.4% and 1% under the RA and LDB configurations, respectively, for various test sequences.
On the other hand, in [32], the authors proposed an effective quadtree plus binary tree (QTBT) partition method to reach a good compromise between compression performance and encoding complexity. Experimental results show an average time reduction of 64% with only a 1.26% increase in Bitrate. A fast algorithm combining both CU and PU early termination decisions to address the high computational complexity of HEVC is proposed by Chen et al. in [33]. Results show that the proposed method achieves 57% time saving with an increase of 0.43% in the BD-rate. Wang et al. [34] proposed an algorithm named "Confidence Interval-Based Early Termination" for QTBT partition, which identifies redundant partition modes based on their RD cost. The results show that the proposed scheme can speed up the QTBT partition process by reducing the execution time by 54.7% with an increase of only 1.12% in Bitrate. In short, several schemes have been introduced to reduce the FVC complexity. Some of them aim at reducing the number of searches in order to improve ME. Others adopt fast-mode decisions to improve the TZSearch motion estimation using different configurations.

JEM Configuration Overview
As with HM for HEVC, the reference software JEM supports four types of coding configurations, as indicated in the Common Test Conditions [35]. The four modes provided are as follows: All Intra, Low-Delay B, Random Access, and Low-Delay P slices only.

All Intra (AI).
All pictures are encoded using I-slices. The Quantization Parameter (QP) is constant for all images. For the AI configuration, a temporal subsampling of the sequences is performed in JEM. The subsampling can be enabled in the JEM software using the parameter "TemporalSubsampleRatio." In the AI configuration file "encoder_intra_jvet10.cfg," this parameter is set to 8, indicating that one frame out of every 8 is encoded [35]. The number associated with each image indicates the encoding and display order. QPI denotes the QP of the IDR (Instantaneous Decoder Refresh) picture, which is the same for all pictures. Figure 7 gives a graphical presentation of the AI configuration.
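The effect of a temporal subsampling ratio of 8 can be illustrated with a short sketch. The helper name is hypothetical, and the assumption that subsampling starts at frame 0 is ours; the snippet only shows which source frame indices would be encoded.

```python
def subsampled_frames(total_frames, ratio=8):
    """Indices of the frames that are encoded when one of every `ratio` frames is kept.

    Assumption: the first kept frame is frame 0.
    """
    return list(range(0, total_frames, ratio))


print(subsampled_frames(33))  # [0, 8, 16, 24, 32]
```

With this ratio, a 33-frame clip is reduced to 5 intra-coded pictures, which is why AI runs remain tractable despite every picture being an I-slice.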

Low-Delay (LD).
This configuration has two subtypes, "Low-Delay P" and "Low-Delay B." In the LD configuration, only the first frame is encoded in the Intra mode. In the LDP mode, all subsequent pictures are encoded as P-slices only, while in the LDB mode frames are coded as P and B slices. The coding order is represented by the number associated with each frame. The QP of each intercoded frame is calculated by adding an offset to the QP of the intracoded frame as a function of the temporal layer. Figure 8 gives a graphical presentation of the Low-Delay configuration.

Random Access (RA).
For the JEM reference software encoder, the hierarchical B-picture coding structure is used in the RA configuration. The "encoder_randomaccess_jvet10.cfg" file is selected. The size of the Group of Pictures (GOP) is fixed to 16 frames [36]. Figure 9 gives a graphical presentation of the random access configuration. In RA mode, only the first frame in the video sequence is encoded as an intra frame. The successive pictures are encoded as generalized P and B pictures.
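The idea of deriving each inter frame's QP from the intra QP plus a temporal-layer offset can be sketched as follows. The snippet assumes a dyadic hierarchical structure with a GOP of 16, and the offset table is hypothetical: the actual offsets are set in the JEM configuration files and are not given in this paper.

```python
# Hypothetical per-layer QP offsets (illustrative values, not taken from JEM).
HYPOTHETICAL_QP_OFFSETS = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}


def temporal_layer(poc, gop_size=16):
    """Temporal layer of a picture in a dyadic hierarchical GOP.

    Pictures at multiples of the GOP size are in layer 0; each halving of
    the picture distance moves one layer deeper.
    """
    pos = poc % gop_size
    if pos == 0:
        return 0
    layer = gop_size.bit_length() - 1   # log2(16) = 4, the deepest layer
    while pos % 2 == 0:
        pos //= 2
        layer -= 1
    return layer


def frame_qp(poc, intra_qp, gop_size=16):
    """QP of a frame: the intra QP for the first frame, otherwise intra QP + offset."""
    if poc == 0:
        return intra_qp
    return intra_qp + HYPOTHETICAL_QP_OFFSETS[temporal_layer(poc, gop_size)]


for poc in (0, 16, 8, 4, 2, 1):
    print(poc, temporal_layer(poc), frame_qp(poc, 37))
```

Pictures in deeper layers are referenced by fewer frames, so coding them with a larger QP costs little quality overall while saving bits.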

Experimental Condition.
In this section, we have evaluated the performance of the FVC standard and compared    [34]. Each class consists of different videos with different scenarios and features, as shown in Table 1.

Evaluation Criteria.
The coding performance is evaluated through: PSNR, BR, and T, which are defined as follows:
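The defining equations are not reproduced here, so the sketch below uses the conventional definitions as an assumption: PSNR computed from the mean square error for a given bit depth, and the Bitrate and time differences expressed as percentages relative to the original encoder. The function names, the sign conventions, and the sample numbers are illustrative and are not taken from the paper.

```python
import math


def psnr(mse, bit_depth=8):
    """Peak signal-to-noise ratio in dB for a given mean square error."""
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / mse)


def delta_percent(reference, proposed):
    """Relative change of `proposed` with respect to `reference`, in percent.

    Negative values mean the proposed encoder produced a smaller value.
    """
    return (proposed - reference) / reference * 100.0


def time_saving(t_reference, t_proposed):
    """Encoding time saved by the proposed encoder, in percent."""
    return (t_reference - t_proposed) / t_reference * 100.0


# Illustrative numbers only (not measurements from this paper).
print(round(psnr(6.5), 2))
print(round(delta_percent(1000.0, 994.0), 2))
print(round(time_saving(100.0, 87.0), 2))
```

Under these conventions, a negative Bitrate difference and a positive time saving both favour the fast encoder, while the PSNR difference is simply the fast encoder's PSNR minus the original's.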

FVC Time Profile.
In this section, we evaluate the profiling results obtained by JEM-7.1, in order to define the encoding components that consume the most time. The time distribution of the JEM encoder for three profiles, namely, All Intra, Random Access, and Low-Delay P is illustrated in Figure 10. These profiling results were obtained with Valgrind tools when processing the "Drums100" sequence encoded with QP = 37 in RA and All Intra configuration, and the "BasketballDrive" sequence encoded in LDP configuration with QP = 37.
For All Intra, the most critical block in execution time is the Transform and Quantization module. The encoding time consumed in intra prediction exceeds 30%, while in Low-Delay P, more than 60% is devoted to the inter prediction. Likewise, the inter prediction consumes 60% of the execution time in an RA configuration.
The complexity of the inter prediction is explained by the huge number of redundant operations that the standard must perform on the same frame and with different block partitions.

FVC Fast Mode Decision.
To reduce the encoding time of the new standard FVC, many fast-mode decision algorithms for splitting were adopted in JEM software, such as Early Skip Detection (ESD), Coded Block Flag (CBF), and Early CU termination (ECU) algorithms, which are clarified as follows.

Early CU Termination (ECU).
The early CU termination algorithm is applied at the transition from depth p to depth p + 1. The best mode is determined by computing the RD cost. If the mode with the minimum RD cost at the current depth is the skip mode, there is no need to continue the partitioning [24].
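A minimal sketch of this termination rule is given below. It counts how many CUs are evaluated in a plain quadtree recursion when the search stops as soon as the best mode at a depth is SKIP. The callable `best_is_skip` is a hypothetical stand-in for the encoder's RD search, and the binary-tree part of QTBT is ignored for simplicity.

```python
def count_evaluated_cus(best_is_skip, depth=0, max_depth=3, path=()):
    """Number of CUs evaluated with early CU termination.

    best_is_skip(depth, path) -> bool is a hypothetical stand-in for the
    RD search: it reports whether SKIP has the minimum RD cost for the CU
    identified by `path` at the given depth.
    """
    evaluated = 1                      # the current CU is always evaluated
    if best_is_skip(depth, path) or depth == max_depth:
        return evaluated               # ECU: do not go from depth p to p + 1
    for child in range(4):             # otherwise recurse into the four sub-CUs
        evaluated += count_evaluated_cus(best_is_skip, depth + 1,
                                         max_depth, path + (child,))
    return evaluated


print(count_evaluated_cus(lambda d, p: False))    # full search: 85
print(count_evaluated_cus(lambda d, p: d >= 1))   # SKIP wins at depth 1: 5
print(count_evaluated_cus(lambda d, p: True))     # SKIP wins at the root: 1
```

The gap between 85 and 5 evaluated CUs shows where the saving comes from: every early termination prunes an entire subtree of RD cost computations.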

Early Skip Detection (ESD).
Some works show that SKIP is the most frequently chosen mode. The skip mode is a very efficient coding tool, since it can represent a coded block without residual information. After searching the best inter 2N × 2N mode, the Early Skip Detection (ESD) algorithm performs a simple check of the differential motion vector (DMV) and the Coded Block Flag (CBF), which are the two "early Skip conditions." After selecting the best mode having the minimum RD cost, the method checks its DMV and CBF. If the DMV and CBF of the best inter 2N × 2N mode are equal to (0, 0) and zero, respectively, the best mode is determined to be the SKIP mode. In that case, the remaining CU modes are not searched in the inter mode decision [26].

Coded Block Flag Algorithm (CBF).
The optimal prediction mode is detected by the coded block flag fast method (CFM) algorithm. The RD cost is calculated for each mode of the CU. If the CBF is zero (all transform coefficients are zero: CBF_Y, CBF_U, and CBF_V), the remaining modes are not tested [25,26]. Figure 11 shows the flowchart of the mode decision process. The Early_Skip condition checks if the motion vector difference of inter partition mode 2N × 2N is equal to (0, 0). The CBF_Fast condition checks if inter partition mode 2N × 2N contains no nonzero transform coefficients. When one of these conditions is true, the algorithm evaluates the Early_CU condition directly. The Early_CU condition checks if the best inter coding mode is Skip. If this condition is true, the algorithm stops. Otherwise, it evaluates the next CU level of the recursive mode decision, provided that the current CU depth is not the maximum. This process is repeated recursively for every coding depth 0, 1, 2, and 3, with the corresponding CU sizes. For every prediction mode, the RD cost must be calculated, which is computationally expensive. The combined fast algorithms (ECU, ESD, CBF) were proposed in order to reduce the FVC computational complexity and improve the RD performance.
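The mode decision flow described above can be sketched for a single CU as follows. The `ModeResult` record and the function name are hypothetical, the RD costs are supplied as precomputed inputs rather than searched, and the ESD test uses both the DMV and CBF conditions. It is a simplified illustration, not the JEM implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ModeResult:
    name: str
    rd_cost: float
    mvd: Tuple[int, int]   # differential motion vector of the mode
    cbf: int               # 0 when CBF_Y, CBF_U and CBF_V are all zero


def fast_mode_decision(modes: List[ModeResult]):
    """Mode decision for one CU with ESD, CBF fast and ECU.

    modes[0] must be inter 2Nx2N; the rest are the remaining candidates in
    test order. Returns (best_mode_name, modes_tested, split_further).
    """
    best = modes[0]
    tested = 0
    for index, mode in enumerate(modes):
        tested += 1
        if mode.rd_cost < best.rd_cost:
            best = mode
        # ESD: inter 2Nx2N with zero DMV and zero CBF is coded as SKIP.
        if index == 0 and mode.mvd == (0, 0) and mode.cbf == 0:
            # ECU: the best mode is SKIP, so the CU is not split further.
            return "SKIP", tested, False
        # CBF fast: no nonzero coefficients, stop testing the other modes.
        if mode.cbf == 0:
            break
    # In this sketch SKIP only arises through ESD, so splitting may continue.
    return best.name, tested, True


static_cu = [ModeResult("inter_2Nx2N", 100.0, (0, 0), 0),
             ModeResult("inter_2NxN", 90.0, (1, 0), 1)]
print(fast_mode_decision(static_cu))    # ('SKIP', 1, False)

moving_cu = [ModeResult("inter_2Nx2N", 100.0, (1, 0), 1),
             ModeResult("inter_2NxN", 90.0, (1, 0), 0),
             ModeResult("intra", 80.0, (0, 0), 1)]
print(fast_mode_decision(moving_cu))    # ('inter_2NxN', 2, True)
```

In the second case the intra candidate is never tested even though its RD cost would have been lower, which illustrates the trade-off these shortcuts make between encoding time and RD performance.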

Results.
The comparative performance of the proposed scheme and the original algorithm in terms of Bitrate, encoding time, and PSNR is listed in Table 2. The test configurations are (randomaccess_jvet10, lowdelay_jvet10 (P and B)) based on the JEM CTC [35].
As shown in Table 2, the runtime was reduced on average by 13% for the RA configuration and by slightly less for the LDP and LDB configurations. Regarding the performance of the fast JEM algorithm, the Bitrate decreases by 0.6% with a loss of 0.05 dB in PSNR for the LDB configuration compared to the RA and LDP conditions. Furthermore, the results show that the proposed approach performs well with high-resolution video sequences, where it achieves up to 20% reduction in time. For the low-resolution class C and D sequences, a block of 128 × 128 pixels covers a large part of the image, so the chances of splitting this block into a quadtree are higher. As a result, the time reduction is smaller than for the other classes, but with an insignificant impact on the Bitrate. In summary, the fast FVC algorithm provides a good trade-off between encoding time and coding efficiency.

International Journal of Digital Multimedia Broadcasting

Figure 12 shows the RD curves for the video sequences. The four points on each graph correspond to the QP values 22, 27, 32, and 37. In each chart, the Bitrate (kbps) is shown on the horizontal axis and the PSNR (dB) on the vertical axis. The results demonstrate that the fast JEM algorithm offers almost the same performance as the original JEM software, with a negligible loss of quality and Bitrate. According to Figure 12, the quality degradation is more noticeable at lower QP values. Figure 13 shows the time saving for sequences of classes A1 and B coded in the RA configuration while varying the QP from 22 to 37. The time saving decreases as the QP value increases. The proposed algorithm achieves a 14.65% reduction in execution time for the CampfireParty video and 13.9% for the BQTerrace video at lower QP values.

Comparative Performance with Other Approaches.
For a more in-depth evaluation of the proposed scheme's encoding performance, a comparison with the approaches proposed in [22,38,39] is given below. In terms of execution time, our proposed scheme saves 13.10%, whereas only 12.5% is saved by [22], with an insignificant degradation of Bitrate, around 0.7%. Therefore, we confirm that our proposed scheme outperforms the method proposed in [22], owing to its ability to quickly split the QTBT partition, which ensures a low FVC complexity.
In [38], the authors proposed an enhanced fast algorithm for the QTBT structure. This approach skips some partition processes in QTBT to enhance the encoding efficiency. The results show that the method in [38] achieves 10% encoding time saving with less than 0.2% BD-rate loss under the RA profile. Our proposed scheme therefore outperforms the work in [38] in terms of encoding time, saving 13.10% with a 0.7% change in Bitrate.
In [39], Huang et al. proposed an algorithm to reduce the encoding complexity by reusing the encoder decisions of the same CU explored in previous partition choices. The simulation results report that the proposed fast algorithms can achieve a 9% encoding time reduction with a 0.1% BD-rate in the RA configuration, while our proposed approach saves 13.10% of the encoding time with an insignificant degradation of Bitrate, around 0.7%. Comparing our work to the state-of-the-art approaches in [22,38,39], we can conclude that our proposed scheme performs better in terms of encoding efficiency.

[40,41]. In this context, all the modules will jointly learn through a single loss function, in which they will collaborate with each other by considering the trade-off between reducing the number of compression bits and improving the quality of the decoded video. Thus, the deep end-to-end video compression model can be advantageous to enhance FVC performance.

Deep Learning Approaches for FVC.
The demand for multimedia video streaming has increased exponentially, and video currently accounts for 75% of internet traffic. As a result, video streaming and storage are a huge challenge for service providers. Image and video compression algorithms rely on codecs such as FVC, whose encoders and decoders lack adaptability. With the advent of and advances in deep learning, these issues can be addressed by replacing the FVC coding tools with a new deep learning model. Accordingly, an intelligent fast algorithm based on deep learning models [42] will be proposed to achieve higher encoding efficiency, lower computational complexity, and better visual quality for the next-generation video coding standard developed in 2020, named VVC [43,44].

Conclusion
In this paper, an overview of the FVC standard versus HEVC has been presented. We compared the three JEM configurations in terms of the evaluation metrics (encoding time, Bitrate, and PSNR). The most important feature of FVC is the QTBT partition, which simplifies coding units and improves compression efficiency. We adopted a fast decision algorithm to reduce the FVC encoding complexity. Experimental results reveal that the proposed method consistently achieves promising performance for various video sequences.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.