Optimized Rate Control Algorithm of High-Efficiency Video Coding Based on Region of Interest

The JCTVC-K0103 rate control algorithm of the ITU-T H.265/high efficiency video coding (HEVC) standard has two problems: its target bit allocation strategy at the CTU layer deviates from the human subjective observation mechanism, and the update phase of its parametric model has high complexity. To address them, an optimized rate control (ORC) algorithm for ITU-T H.265/HEVC based on the region of interest (ROI) is proposed. First, the algorithm extracts the region of interest of video frames in the time and space domains by using an improved Itti model. Then, the weight of target bits w is recalculated in the space-time domain to improve the rate control accuracy, and the target bits are redistributed based on the ROI by an adaptive weight algorithm so that the output videos are more attuned to the human visual attention mechanism. Finally, the quasi-Newton algorithm is used to update the rate distortion model, which reduces the computational complexity of the update phase of the parametric model. The experimental results show that the ORC algorithm obtains better subjective quality in the compressed results with a smaller bit rate error than the other two algorithms. Meanwhile, the rate distortion performance of the ORC algorithm is better while the rate control performance is maintained.


Introduction
As the clarity and quality of videos improve, the traditional video coding standards can no longer meet the coding requirements of high-resolution videos. To keep up with the trend toward high-resolution videos and the corresponding requirements on coding technology, the Joint Collaborative Team on Video Coding (JCT-VC) began in 2010 to develop a new video coding standard based on ITU-T H.264/AVC, named the ITU-T H.265/High Efficiency Video Coding (HEVC) standard [1]. Compared with ITU-T H.264/AVC, ITU-T H.265/HEVC can reduce the bit rate by more than 50%. The transmission of high-resolution videos places higher requirements on network bandwidth: the compressed stream must not overflow the buffer under limited bandwidth, and a reasonable bit allocation strategy is needed to generate videos with minimum loss of rate distortion performance. JCT-VC has proposed two representative solutions for rate control in ITU-T H.265/HEVC: the JCTVC-H0213 rate control algorithm based on the $R$-$Q$ model and the JCTVC-K0103 rate control algorithm based on the $R$-$\lambda$ model [2]. The JCTVC-K0103 rate control algorithm achieves better rate control and less bit fluctuation than the JCTVC-H0213 rate control algorithm. However, the JCTVC-K0103 rate control algorithm does not consider the characteristics of the video content during bit allocation at the CTU layer, so the bit allocation is inaccurate, which affects both the quality of the output videos and the accuracy of the rate control algorithm. At the same time, the gradient descent algorithm used in the update phase of the rate distortion model has high computational complexity, which increases the complexity of the rate control algorithm. Therefore, optimizing the JCTVC-K0103 rate control algorithm to improve the compression performance while maintaining the rate control performance has become a hot topic in video coding.
To address the shortcomings of the K0103 rate control algorithm, many scholars at home and abroad have carried out research, mainly on the complexity measurement of the CTU and the computational load of the parametric model. Regarding the complexity measurement of the CTU, Guo et al. [3] propose a method that calculates the complexity of the CTU from pixel statistics, but the allocated bits have a large error for video sequences with severe local motion. In [4], SATD is taken as the measure of complexity, which ignores the relevant characteristics of the video content. In [5], differential histograms are adopted to measure the complexity of the CTU, but the choice of threshold is uncertain across different video sequences. The above algorithms improve the accuracy of the CTU complexity to a certain extent, but the overall performance of the rate control algorithm is not improved noticeably. Regarding the computational load of the parametric model, the rate distortion of the parametric model is improved in [6], but the adaptability is poor at scene changes and the computational load is large. In [7], the gradient descent method of the original parametric model is improved, but the computational load grows gradually as the number of iterations increases. The above algorithms improve the performance of the parametric model to a certain extent, but the reduction in computational load is not significant. In short, these algorithms address either the complexity measurement of the CTU or the computational load of the parametric model in isolation, so the overall gain of the optimized rate control algorithms is limited. In this paper, the rate control algorithm is optimized jointly in three respects: the complexity measurement of the CTU, the weight assignment of bits at the CTU layer, and the computational load of the parametric model. First, the improved Itti model is used to extract the ROI of video frames. Then, the complexity at the CTU layer is calculated based on the ROI, and the adaptive weight algorithm is used to redistribute the target bits in combination with the complexity of the CTU, which makes the output videos more attuned to the human visual attention mechanism. Finally, the quasi-Newton method is used to update the parametric model, which reduces the computational load of the update phase of the parametric model. Therefore, the overall performance of the rate control algorithm is improved.

JCTVC-K0103 Rate Control Algorithm.
The JCTVC-K0103 rate control algorithm allocates a reasonable number of bits to each coding layer for a given target rate, based on the $R$-$\lambda$ model, in order to optimize the coding performance. It consists of the following two steps:
(1) Target bits are allocated hierarchically to the GOP layer, the image layer, and the CTU layer according to the target bit rate.
(2) The quantization parameter QP is determined by the $R$-$\lambda$-QP model according to the target bits allocated to the corresponding layer.

Target Bit Allocation at the CTU Layer.
The bit allocation of each layer in the K0103 rate control algorithm is a dynamic process. When allocating the target bits of the next layer, it is necessary to refer to the actual number of bits in the current coding layer and to ensure that the number of target bits allocated to the current coding layer is smaller than the total number of target bits allocated. The implementation process of target bit allocation at the CTU layer is as follows.
After the bits are allocated at the image layer, the target bit allocation of the CTU layer depends on the parameter $R_i$, the number of target bits per frame. Equation (1) [2] for the target bit allocation at the CTU layer is as follows:
$$R_{\mathrm{CTU}} = \frac{R_i - R_{\mathrm{header}} - R_{\mathrm{compic}}}{\sum_{k=c}^{N_L} W_k} \cdot W_c, \tag{1}$$
where $R_{\mathrm{CTU}}$ represents the number of target bits allocated to the $c$th CTU; $R_{\mathrm{header}}$ represents the bits of the encoded header information, which includes the GOP flag bit, the frame flag bit, and so on; $R_{\mathrm{compic}}$ represents the actual number of bits of the CTUs already encoded in the current frame; $N_L$ represents the total number of CTUs in the current frame, $c \in [1, N_L]$; $W_k$ represents the weight of bit allocation of the $k$th CTU; and $W_c$ represents the weight of bit allocation of the $c$th CTU.
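The CTU-level allocation described above can be illustrated with a minimal Python sketch. It assumes the usual reading of equation (1): the bits still available in the frame are shared among the CTUs not yet coded, in proportion to their weights. The function name `ctu_target_bits` and its arguments are hypothetical and only serve the illustration.

```python
def ctu_target_bits(frame_bits, header_bits, coded_bits, weights, c):
    """Target bits for the c-th CTU (1-based index).

    frame_bits  : R_i, target bits of the current frame
    header_bits : R_header, bits spent on header information
    coded_bits  : R_compic, bits actually spent on CTUs coded so far
    weights     : list of W_k for all N_L CTUs of the frame
    """
    remaining = frame_bits - header_bits - coded_bits
    # Sum of the weights of the CTUs c..N_L that are not yet coded.
    denom = sum(weights[c - 1:])
    if denom <= 0:
        raise ValueError("remaining weights must be positive")
    return remaining * weights[c - 1] / denom


weights = [1.0, 2.0, 1.0]
first = ctu_target_bits(4200, 200, 0, weights, 1)
# Suppose the first CTU really consumed 1000 bits.
second = ctu_target_bits(4200, 200, 1000, weights, 2)
print(first, second)
```

Because the denominator only covers the CTUs that remain, any overshoot or undershoot on earlier CTUs is automatically absorbed by the later ones.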
In the process of bit allocation at the CTU layer, the weight $W_k$ of each CTU directly affects the quality of the compressed video. In the K0103 rate control algorithm, the weight $W_k$ reflects the texture complexity of the current CTU. Generally, a CTU with high texture complexity needs more bits, so $W_k$ should be larger; a CTU with low texture complexity needs fewer bits, so $W_k$ should be smaller. Following this strategy, the K0103 rate control algorithm uses the mean absolute difference (MAD) to characterize the texture complexity of the current CTU [8]. Equation (2) is as follows:
$$\mathrm{MAD} = \frac{1}{M \times N} \sum_{x=1}^{M} \sum_{y=1}^{N} \left| I(x, y) - I'(x, y) \right|, \tag{2}$$
where $M$ and $N$ represent the width and height of the current CTU, respectively; $I(x, y)$ represents the pixel value of the current CTU; and $I'(x, y)$ represents the reconstructed pixel value of the current CTU. According to the definition of MAD, only the predicted value of MAD is taken as the weight in the target bit allocation at the CTU layer, so the accuracy is low, which ultimately affects the quality of the compressed video. Therefore, the weight of bit allocation at the CTU layer needs to be adjusted.
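As a small illustration of the MAD measure described above, the following sketch computes the mean absolute difference between an original block and its reconstruction. Blocks are represented as nested lists of luma samples; the function name is hypothetical.

```python
def mean_absolute_difference(original, reconstructed):
    """Mean absolute difference between two equally sized pixel blocks."""
    rows = len(original)
    cols = len(original[0])
    total = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += abs(o - r)
    return total / (rows * cols)


block = [[10, 20], [30, 40]]
recon = [[12, 18], [30, 44]]
print(mean_absolute_difference(block, recon))
```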

Implementation of Target Bit.
The purpose of implementing the target bits is to convert the value of the target bit allocation into the value of the quantization parameter QP, since QP determines the compression ratio and the final encoding rate. After the quantization parameter is determined, the relevant model parameters need to be updated. The target bits are implemented at the image layer and at the CTU layer, and the procedure is basically the same at both layers. The implementation steps at the CTU layer are as follows. The K0103 rate control algorithm uses the hyperbolic R-D model to describe the relationship between rate and distortion:
$$D = C R^{-K}, \tag{3}$$
where $D$ represents the distortion, $C$ and $K$ represent parameters related to the characteristics of the video content, and $R$ represents the value of the target bit allocation. In rate-distortion optimization theory, the Lagrangian multiplier $\lambda$ is the absolute value of the slope of the tangent to the rate-distortion curve, so differentiating equation (3) gives
$$\lambda = -\frac{\partial D}{\partial R} = C K R^{-K-1} = \alpha R^{\beta}, \tag{4}$$
where $\alpha = C \cdot K$, $\beta = -K - 1$, and $\alpha$ and $\beta$ represent parameters related to the characteristics of the video content. A large number of experimental studies have shown [9] that there is a linear relationship between the quantization parameter QP and the logarithm of the Lagrangian multiplier $\lambda$:
$$\mathrm{QP} = 4.2005 \ln \lambda + 13.7122. \tag{5}$$
When the target rate is known, QP can therefore be determined by adjusting $\alpha$ and $\beta$. In the implementation of the target bits at the image layer or CTU layer, the relationship between $\lambda$ and the value of the target bits satisfies
$$\lambda = \alpha \cdot \mathrm{bpp}^{\beta}, \tag{6}$$
where bpp represents the average number of bits per pixel in the image or CTU, determined by the value of the target bit allocation at the image layer or CTU layer. To determine QP, $\lambda$ is first computed from $\alpha$ and $\beta$, and the relationship between $\lambda$ and QP then gives the quantization parameter QP.
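The two-stage mapping from target bits to QP can be sketched as follows. The sketch assumes the widely quoted fit QP = 4.2005 ln λ + 13.7122 for the λ-to-QP relationship and, purely for the example, initial values α = 3.2003 and β = −1.367; these constants and the function names are assumptions of this illustration, not values stated in this section.

```python
import math


def lambda_from_bpp(bpp, alpha, beta):
    """R-lambda model: lambda = alpha * bpp ** beta."""
    return alpha * bpp ** beta


def qp_from_lambda(lam):
    """Map lambda to an integer QP, clipped to the HEVC range 0..51."""
    qp = round(4.2005 * math.log(lam) + 13.7122)
    return max(0, min(51, qp))


alpha, beta = 3.2003, -1.367   # assumed example values
for bpp in (1.0, 0.1, 0.01):
    lam = lambda_from_bpp(bpp, alpha, beta)
    print(bpp, lam, qp_from_lambda(lam))
```

Since β is negative, a smaller bit budget per pixel yields a larger λ and therefore a larger QP, which is the behaviour the model is meant to capture.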
After the current image or CTU is encoded, the parameters $\alpha$ and $\beta$ are adjusted according to the actual number of bits encoded in the current image or CTU. The updating methods of the parameters $\alpha$ and $\beta$ are specifically as follows: where $\lambda_{\mathrm{com}}$ represents the Lagrangian multiplier of the encoded image or CTU, $\lambda_{\mathrm{old}}$ represents the Lagrangian multiplier of the current image or CTU, $\alpha_{\mathrm{old}}$ and $\beta_{\mathrm{old}}$ are the parameter values of the current image or CTU, $\mathrm{bpp}'$ represents the average number of bits per pixel in the image or CTU determined by the value of target bit allocation at the current image layer or CTU layer, and $\alpha_{\mathrm{new}}$ and $\beta_{\mathrm{new}}$ represent the updated parameter values. The K0103 rate control algorithm uses the gradient descent method to update and adjust the parameters $\alpha$ and $\beta$.
However, the gradient descent method converges slowly, which leads to higher computational complexity for the K0103 algorithm and is not conducive to practical application.
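For orientation, the gradient-descent style update of α and β can be sketched as below. The sketch assumes the form commonly given for the K0103 update, λ_com = α_old · bpp'^β_old, α_new = α_old + δ_α (ln λ_old − ln λ_com) α_old, and β_new = β_old + δ_β (ln λ_old − ln λ_com) ln bpp'. The step sizes δ_α = 0.1 and δ_β = 0.05 and the function name are assumptions of this illustration.

```python
import math


def update_alpha_beta(alpha_old, beta_old, lambda_old, bpp_actual,
                      delta_alpha=0.1, delta_beta=0.05):
    """One update of the R-lambda model parameters after coding a unit."""
    # Lambda that the old model would predict for the bits actually spent.
    lambda_com = alpha_old * bpp_actual ** beta_old
    err = math.log(lambda_old) - math.log(lambda_com)
    alpha_new = alpha_old + delta_alpha * err * alpha_old
    beta_new = beta_old + delta_beta * err * math.log(bpp_actual)
    return alpha_new, beta_new


alpha, beta = 3.2003, -1.367            # assumed example values
lam_used = 2.0 * alpha * 0.5 ** beta    # lambda twice the model prediction
print(update_alpha_beta(alpha, beta, lam_used, 0.5))
```

When the model prediction matches the λ that was actually used, the error term vanishes and the parameters are left unchanged; each coded unit triggers only one such small correction, which is why convergence can be slow.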

Extraction of ROI from Videos Based on Improved Itti Algorithm
To address the deficiencies of the Itti algorithm, such as incomplete ROIs and blurred edges, this paper improves the Itti algorithm to extract the ROI of video sequences based on picture content in both the spatial and temporal domains. First, in addition to the brightness, color, and orientation features, the algorithm extracts texture and shape features in the spatial domain and a motion feature in the time domain. Then, the normalization step and the cross-scale step of the Itti model are adopted to generate six single-feature salient maps: the brightness feature map, the color feature map, the orientation feature map, the texture feature map, the shape feature map, and the motion feature map. Finally, information entropy theory is used to obtain the weights of the six single-feature salient maps adaptively, and cross-scale fusion is carried out to extract the final ROI of video sequences.

Texture Feature.
The characteristics of the human visual system (HVS) indicate [10] that human eyes pay different amounts of attention to different regions of an image, and that texture-rich regions and moving objects are more likely to attract attention. In this paper, the texture feature is extracted as a subfeature of the Itti model. Chiranjeevi and Sengupta [11] propose that the structural tensor matrix represents the texture feature of an image well, but it suffers from inaccurate localization. Because the isotropic Sobel operator has more accurate position-weighted coefficients, it is used to improve the structural tensor matrix for extracting the texture feature of video frames. The directional templates of the isotropic Sobel operator are shown in Figure 1.
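For the image $I(x, y)$, let $I_x$ and $I_y$ represent the horizontal gradient and the vertical gradient of the image, respectively. Then, the structural tensor matrix $M$ of the image is
$$M = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix},$$
where $I_{11} = I_x^2$, $I_{12} = I_{21} = I_x I_y$, and $I_{22} = I_y^2$.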

Journal of Electrical and Computer Engineering
where $G_x$ and $G_y$ are replaced with $I_x$ and $I_y$, respectively. Since the eigenvalues $\lambda_1$ and $\lambda_2$ of the structural tensor matrix represent the overall trend of the grayscale in the window and the contrast along the direction of the eigenvector, the consistency $T$ of video frames can be calculated from $\lambda_1$ and $\lambda_2$: According to the center-surround difference algorithm (C-S algorithm) of the Itti model [12], the texture feature maps $T(c, s)$ are obtained from the consistency $T$ of video frames as
$$T(c, s) = \left| T(c) \, \Theta \, T(s) \right|,$$
where $c \in \{2, 3, 4\}$ represents the center of the receptive field, $s = c + \delta$ ($\delta \in \{3, 4\}$) represents the periphery of the receptive field, and $\Theta$ represents the difference operator between feature maps at different scales. The normalization function $N(\cdot)$ is used to perform cross-scale fusion of the six texture feature maps to form a single-feature salient map $T$: In summary, the texture feature salient map can be obtained.
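To make the role of the eigenvalues concrete, the following sketch computes the two eigenvalues of a 2x2 structure tensor accumulated over a window of gradient samples and derives a consistency value from them. The specific measure ((λ1 − λ2)/(λ1 + λ2))² is a common coherence definition and is an assumption here, since the expression used in the paper is not reproduced in this section. The gradient samples would in practice come from the isotropic Sobel templates; the function names are hypothetical.

```python
import math


def structure_tensor_eigenvalues(ix, iy):
    """Eigenvalues (largest first) of the tensor summed over a window.

    ix, iy: sequences of horizontal and vertical gradient samples.
    """
    i11 = sum(gx * gx for gx in ix)
    i22 = sum(gy * gy for gy in iy)
    i12 = sum(gx * gy for gx, gy in zip(ix, iy))
    trace = i11 + i22
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    root = math.sqrt((i11 - i22) ** 2 + 4.0 * i12 * i12)
    return (trace + root) / 2.0, (trace - root) / 2.0


def consistency(l1, l2):
    """Assumed coherence measure in [0, 1]; 0 for a flat window."""
    s = l1 + l2
    return 0.0 if s == 0 else ((l1 - l2) / s) ** 2


# A window with a single dominant gradient direction is fully coherent.
print(consistency(*structure_tensor_eigenvalues([1, 1, 1], [0, 0, 0])))
# Equal energy in both directions gives zero coherence.
print(consistency(*structure_tensor_eigenvalues([1, 0], [0, 1])))
```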

Shape Feature.
To address the blurred edges of the salient maps produced by the Itti algorithm, many scholars have carried out research. Long and Wu [13] adopt an improved Canny operator to extract the shape feature and locate the edges of salient maps accurately, but the computational load is large. In this paper, the boundary function is used to analyze the consistency of shape, and the SUSAN corner detection algorithm is used to extract the shape feature of video frames based on corner points. Compared with traditional edge detection algorithms and with the Harris, KLT, and Kitchen-Rosenfeld corner algorithms, among others, the SUSAN corner detection algorithm has the advantages of simple calculation, accurate positioning, and strong noise resistance [14]. The SUSAN corner detection algorithm extracts the shape feature in the following steps:
(1) A circular template containing 37 pixels is slid over the video frame to determine whether each pixel belongs to the USAN region. The discriminant is as follows:
$$c(r, r_0) = \begin{cases} 1, & |I(r) - I(r_0)| \le t, \\ 0, & |I(r) - I(r_0)| > t, \end{cases}$$
where $r$ represents the position of a pixel in the neighborhood of the central point, $r_0$ represents the position of the central point, $I(\cdot)$ represents the gray value of a pixel, and $t$ represents the similarity threshold.
In order to obtain more stable results, the similarity of pixels is computed with the following equation:
$$c(r, r_0) = \exp\left( -\left( \frac{I(r) - I(r_0)}{t} \right)^{6} \right).$$
(2) The total similarity $n(r_0)$ is calculated as
$$n(r_0) = \sum_{r \in D(r_0)} c(r, r_0),$$
where $D(r_0)$ represents the area of the template centered on $r_0$.
(3) From $n(r_0)$, the initial corner points are determined by using the corner response function
$$R(r_0) = \begin{cases} g - n(r_0), & n(r_0) < g, \\ 0, & \text{otherwise}, \end{cases}$$
where $g$ is introduced to eliminate the effects of noise.
(4) A final set of corner points $S$ is obtained by nonmaximum suppression among the initial corner points: where $x_i$ and $y_j$ represent the coordinates of the corner points, and $M$ and $N$ represent the width and height of the video frames, respectively.
(5) Since the boundary function can describe the border of objects, the elements of the corner point set $S$ are substituted into the coordinate function $\delta(k)$, and the coordinate function is convolved linearly with the Gaussian kernel at the scale $\sigma$ to obtain the boundary function $\delta(\sigma, k)$ corresponding to the set of corner points. The boundary function $\delta(\sigma, k)$ is taken as the measure of the shape feature in this paper: where N represents the length of border, represents the center of regions, and $g(\sigma)$ represents the Gaussian kernel function at the scale $\sigma$.
(6) According to the C-S operation, the calculation formula of the shape feature maps obtained from the boundary function $\delta(\sigma, k)$ is as follows:
(7) The normalization function $N(\cdot)$ is used to perform the cross-scale fusion of the six shape feature maps to form a single-feature salient map $\delta$.
In summary, the shape feature salient map can be obtained.
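Steps (1) to (3) can be sketched in a few lines. The sketch assumes the standard SUSAN formulation: the smooth similarity exp(−((I(r) − I(r0))/t)^6), the USAN area as the sum of similarities over the template, and a corner response g − n(r0) whenever n(r0) is below the geometric threshold g. The threshold values and function names used below are illustrative assumptions.

```python
import math


def susan_similarity(i_r, i_r0, t):
    """Smooth similarity between a template pixel and the nucleus."""
    return math.exp(-(((i_r - i_r0) / t) ** 6))


def usan_area(template_pixels, nucleus, t):
    """Total similarity n(r0) over the pixels covered by the template."""
    return sum(susan_similarity(v, nucleus, t) for v in template_pixels)


def corner_response(n, g):
    """Initial corner response: large when the USAN area is small."""
    return g - n if n < g else 0.0


t = 27.0    # assumed brightness threshold
g = 18.5    # assumed geometric threshold (half of the 37-pixel template)

flat = [100] * 37                 # uniform region: no corner
corner = [100] * 9 + [200] * 28   # nucleus shares brightness with few pixels
print(corner_response(usan_area(flat, 100, t), g))
print(corner_response(usan_area(corner, 100, t), g))
```

A nonmaximum suppression pass over the response values, as in step (4), would then keep only the local maxima as corner points.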

Motion Feature.
To address the incompleteness of the salient maps extracted by the Itti model, a large number of scholars have studied the problem. Jalink [15] proposes to improve the integrity of salient maps by using the MoSIFT algorithm to calculate the motion feature, but the MoSIFT algorithm has high computational complexity, which affects the real-time performance of salient region extraction. The human visual mechanism indicates [10] that human eyes describe moving objects locally, and the SURF algorithm also describes feature points locally. Therefore, the description of the motion feature by the SURF algorithm is more attuned to the human visual attention mechanism. Based on this idea, this paper uses the SURF algorithm to improve the MoSIFT algorithm for extracting the motion feature of video frames, which reduces the computational complexity of the MoSIFT algorithm and yields a more stable motion feature. The steps of the MoSIFT algorithm are shown in Figure 2.
The steps of extracting the motion feature by the improved MoSIFT algorithm are as follows:
(1) In the construction phase of the Hessian matrix, the SURF algorithm uses the Hessian matrix to extract the feature points. Since the elements of the Hessian matrix are calculated by second-order Taylor expansion, the computational complexity is high. Based on the theory of linear scale space (LOG), the derivative of a function is equal to the convolution of the function with the corresponding derivative of the Gaussian function. The elements of the Hessian matrix are therefore calculated by where $f(x, y)$ represents a quadratic function corresponding to the image and $G(x, y)$ represents a Gaussian function corresponding to the quadratic function.
(2) The SURF algorithm is used for downsampling.
Compared with the SIFT algorithm, the SURF algorithm keeps the size of the frames unchanged and changes the size of the filter for downsampling, which reduces the computational complexity of the downsampling process. Each pixel processed by the Hessian matrix is compared with 26 pixels in the two-dimensional image space and the neighboring scale space, which locates the preliminary key points; key points with weak energy or wrong location are then filtered out to determine the final stable SURF feature points. Taking $D(\varphi_k, \theta)$, $k = 1, \ldots, 64$, as the motion features in this paper, we get where $\omega_1$ and $\omega_2$ represent constant weights, and the motion feature is obtained by In summary, the motion feature salient map can be obtained.

Feature Fusion Based on Adaptive Weight.
The Itti algorithm uses averaging to assign weights to the single-feature salient maps, which ignores the contribution of each feature salient map to the final salient map and thus affects the overall quality of the merged salient map. Shannon's information entropy theory [16,17] describes the overall statistical characteristics of a source objectively and can therefore describe the contribution of each impact factor to the whole. This paper determines the adaptive weight coefficient of each single-feature salient map based on information entropy theory, which improves the accuracy of the merged video frames.
Let the random variable of a single-feature salient map be $I(A_i)$, $i = 1, \ldots, n$; its information entropy can be represented by
$$H(X) = -\sum_{i=1}^{n} P(A_i) \log_2 P(A_i), \tag{25}$$
where $P(A_i)$ represents the probability when removing the $i$th signal and $H(X)$ represents the information entropy of the single-feature salient map. The information entropy values of the brightness feature salient map $\bar{I}$, the color feature salient map $\bar{C}$, the orientation feature salient map $O$, the texture complexity salient map $T$, the shape feature salient map $\bar{\delta}$, and the motion salient map $D$ are calculated according to equation (25) and denoted $H_1$, $H_2$, $H_3$, $H_4$, $H_5$, and $H_6$, respectively. The adaptive weight coefficient $\alpha_i$ is calculated as follows: In summary, the extraction of the ROI of video frames is completed, which solves the problems that the Itti model extracts an incomplete ROI and that the located edges are blurred. To verify the effectiveness of the above algorithm, three pictures from MSRA, the 208th frame of the tennis sequence, and the 25th frame of the basketball sequence provided by JCT-VC are used in comparative experiments on the ORC algorithm, the Itti algorithm [18], the GBVS algorithm [19], the SR algorithm [20], the FT algorithm [21], the CAS algorithm [22], and the LC algorithm [23].
As shown in Figure 3, the ORC algorithm extracts the ROI in the image or video frame accurately. Compared with the other six algorithms, it performs better in terms of the integrity of the extracted ROI and the positioning accuracy of the ROI edges, which demonstrates the effectiveness of the ORC algorithm.
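The entropy-based weighting of this subsection can be illustrated with the sketch below. It computes the Shannon entropy of each single-feature salient map from its histogram of values and then normalizes the entropies so that the weights sum to one. The normalization α_i = H_i / Σ_j H_j is an assumption made for this example, because the weight formula itself is not reproduced in this section; the function names and toy maps are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def adaptive_weights(entropies):
    """Assumed weighting: each map's share of the total entropy."""
    total = sum(entropies)
    if total == 0:
        # All maps are constant: fall back to equal weights.
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]


# Toy salient maps flattened to lists of quantized saliency values.
maps = [
    [0, 0, 1, 1],   # two equally likely levels
    [0, 1, 2, 3],   # four equally likely levels
]
entropies = [shannon_entropy(m) for m in maps]
print(entropies, adaptive_weights(entropies))
```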

Target Bit Allocation of CTU Layer Based on Space-Time Domain
To address the problem that MAD, used as the weight index of the CTU, cannot measure the complexity of the current CTU accurately during target bit allocation, many scholars have studied alternatives. Khoshnevisan and Salmasi [24] propose to take the gradient as the weight index of the CTU but ignore the contribution of influencing factors in the time domain. Studies have shown [25] that the gradient of each pixel in the frame has a linear relationship with the allocated bits. In this paper, the bits per pixel bpp allocated to a CTU are taken as the weight of the space-time complexity based on the ROI, the gradient $T$ is used to measure the complexity of the CTU, and the weight of bit allocation $\omega$ is redistributed at the CTU layer, which improves the accuracy of the rate control algorithm. Then, the weight of bit allocation $\omega$ at the CTU layer is adjusted once again by the adaptive weight algorithm, which makes the output videos more attuned to the human visual attention mechanism.
Because human eyes are sensitive to the gradient information in an image, we take the gradient $T$ as the measure of complexity of the current CTU: where $M$ and $N$ represent the height and width of the current CTU, respectively, and $I_{i,j}$ represents the value of the brightness component at position $(i, j)$ of the CTU. In equation (6), bpp represents the average number of bits per pixel in the image or CTU determined by the value of target bit allocation at the current image layer or CTU layer; here we take bpp as the average number of bits per pixel in the CTU determined by the value of target bit allocation at the current CTU layer. The average number of bits bpp of the current CTU is
$$\mathrm{bpp} = \frac{\mathrm{bpp}_{cf}}{N_T},$$
where $\mathrm{bpp}_{cf}$ represents the total number of target bits in the current frame and $N_T$ represents the total number of CTUs in the current frame. Because a CTU of the current frame and the corresponding CTU of the adjacent frame have similar texture features, the complexity of the current CTU is measured jointly by its own complexity and by the complexity of the corresponding CTU in the adjacent frame. The weight of bits $\omega_{\mathrm{new}}$ allocated to the current CTU is where $\mathrm{bpp}_{\mathrm{cur}}$ represents the average number of bits allocated to the CTU in the current frame, $\mathrm{bpp}_{\mathrm{nei}}$ represents the average number of bits allocated to the corresponding CTU in the adjacent frame, $C_{\mathrm{cur}}$ represents the texture complexity of the current CTU obtained according to equation (28), and $C_{\mathrm{nei}}$ represents the texture complexity of the corresponding CTU in the adjacent frame obtained according to equation (28). The redistribution of the weight of bits at the CTU layer is achieved by following the above steps. The human visual attention mechanism indicates that the attention of human eyes is concentrated in the central regions of video frames, while little attention is paid to the peripheral regions [10]. Based on this idea, we adjust the weight allocated to the target bits at the CTU layer to redistribute the target bits once again.
According to Figure 4, we first determine the central coordinates $(x_1, y_1)$ of CTU in the current frame and the central coordinates $(x_2, y_2)$ of the CTU being coded in the current frame, and the Manhattan distance is used to determine the weight $\alpha$ of CTUs in the ROI and the weight $\beta$ of CTUs in the region of non-interest (RONI) in the current frame. The calculation formula is as follows: After the weights $\alpha$ and $\beta$ are obtained, equation (32) is used to adjust the weight $\omega_{\mathrm{new}}$ once again: In summary, the calculation of the weight of bits at the CTU layer is completed. More bits are allocated to the ROI with high texture complexity in the compressed frames, and fewer bits are allocated to the RONI with low texture complexity, which makes the output videos more attuned to the human visual attention mechanism.
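The building blocks of this section can be sketched as follows: a gradient-based complexity value for a CTU, the average bits per CTU in a frame, and the Manhattan distance between two CTU centres. The gradient is assumed here to be the mean of absolute horizontal and vertical luma differences, which is one common definition and not necessarily the exact form of equation (28); how these quantities are combined into ω_new and into the weights α and β is not reproduced in this section and is therefore left out. All function names are hypothetical.

```python
def ctu_gradient(luma):
    """Assumed gradient complexity: mean absolute difference between each
    luma sample and its lower and right neighbours."""
    rows = len(luma)
    cols = len(luma[0])
    total = 0
    for i in range(rows - 1):
        for j in range(cols - 1):
            total += abs(luma[i][j] - luma[i + 1][j])
            total += abs(luma[i][j] - luma[i][j + 1])
    return total / (rows * cols)


def average_ctu_bpp(frame_target, ctu_count):
    """bpp = bpp_cf / N_T: the frame budget shared equally among CTUs."""
    return frame_target / ctu_count


def manhattan_distance(p, q):
    """Manhattan distance between two (x, y) centre coordinates."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


flat = [[5, 5], [5, 5]]
edge = [[0, 10], [0, 10]]
print(ctu_gradient(flat), ctu_gradient(edge))
print(average_ctu_bpp(1200.0, 60))
print(manhattan_distance((208, 120), (32, 32)))
```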
To verify the effectiveness of the ORC algorithm, the 25th frame of the BasketballPass sequence and the 208th frame of the tennis sequence provided by JCT-VC are used for the experiment. Figure 5 shows the experimental results of the ORC algorithm and the K0103 algorithm of the HM10.0 model. Visually, the subjective quality of the video frames compressed by the ORC algorithm in Figure 5 is better than that of the frames compressed by the K0103 algorithm in the HM10.0 model, so the ORC algorithm improves the quality of the compressed videos. To further illustrate the effectiveness of the ORC algorithm, enlarged details of the 208th frame of the Tennis sequence are displayed for comparison. Figures 6 and 7 show the enlarged details of the ROI and the RONI, respectively.
Figure 6 shows that the subjective quality of the ROI produced by the ORC algorithm is better than that produced by the K0103 rate control algorithm. According to the code stream analysis software Elecard HEVC Analyzer, the QP of the CTU is 29 in Figure 6(a) and 28 in Figure 6(b), which indicates that the ORC algorithm allocates more bits to the ROI. Figure 7 shows that the subjective quality of the RONI produced by the K0103 rate control algorithm is better than that produced by the ORC algorithm. According to the code stream analysis software, the QP of the CTU is 30 in Figure 7(a) and 31 in Figure 7(b), which indicates that the ORC algorithm reduces the bits allocated to the RONI. This analysis verifies that the ORC algorithm redistributes the bits between the RONI and the ROI in the video frames, which makes the output videos more attuned to the human visual attention mechanism.

Update of Parametric Model Based on Quasi-Newton Method
To address the slow convergence and high computational load of the gradient descent method that the JCTVC-K0103 rate control algorithm uses to update the parametric model, Li et al. [26] propose to use the Newton method instead. This reduces the complexity of the rate control algorithm to some extent, but the computational load in the update phase of the parametric model is still large and the overall performance gain is limited. This paper introduces the quasi-Newton method to update the parametric model and uses the BFGS algorithm to update the positive definite matrix $B_n$ that approximates the inverse of the Hessian matrix, which reduces the computational load of the parametric model. The quasi-Newton method is commonly used to solve optimization problems. The basic idea is to take the optimal solution of a quadratic model as the search direction, obtain a new iterative point $x_{n+1}$, and update $B_n$ in each iteration. The iterative equation of the quasi-Newton method is as follows:
$$x_{n+1} = x_n - \lambda H_n g_n, \tag{33}$$
where $x_n$ represents the $n$th iterative point, $x_{n+1}$ represents the $(n+1)$th iterative point, $\lambda$ represents a constant, $g_n$ represents the first derivative of the target function at $x_n$, and $H_n$ represents a positive definite matrix that approximates the inverse of the Hessian matrix. The specific implementation steps of the ORC algorithm are as follows:
(1) In the implementation of the target bits, the relationship between rate and distortion is given by equation (3). According to equation (3), the distortion $D_1$ estimated from the target bit rate and the actual coding distortion $D_2$ are as follows:
$$D_1 = C_1 R_1^{-k_1}, \tag{34}$$
$$D_2 = C_2 R_2^{-k_2}, \tag{35}$$
where $C_1$ and $k_1$ represent parameters related to the characteristics of the video content before the parametric model is updated; $C_2$ and $k_2$ represent parameters related to the characteristics of the video content after the parametric model is updated; and $R_1$ and $R_2$ represent the target bit rate and the actual bit rate, respectively.
(2) Taking the logarithm of equations (34) and (35), respectively, gives
$$\ln D_1 = \ln C_1 - k_1 \ln R_1, \tag{36}$$
$$\ln D_2 = \ln C_2 - k_2 \ln R_2. \tag{37}$$
The error $e^2$ between the actual coding distortion and the distortion estimated from the target bit rate follows from equations (36) and (37): where $C = \ln C_2$ and $C_1 = D_1$. Taking the derivatives with respect to $C$ and $k_2$, respectively,
(3) Performing the iteration by the quasi-Newton method according to equation (33), we get As we can see from equation (4), $\alpha = C \cdot k$, $\beta = -k - 1$, then where $\alpha_{\mathrm{old}}$ and $\beta_{\mathrm{old}}$ represent the parameters used when determining the quantization parameter QP and $\alpha_{\mathrm{new}}$ and $\beta_{\mathrm{new}}$ represent the updated parameters. The updated parameters $\alpha_{\mathrm{new}}$ and $\beta_{\mathrm{new}}$ can be obtained by solving equations (40)-(43) simultaneously.
(4) Selecting $B_n$ as the approximation of $H_n^{-1}$ according to the following condition of the quasi-Newton method, we get
$$B_{n+1} y_n = \delta_n,$$
where $\delta_n = x_{n+1} - x_n$, $y_n = g_{n+1} - g_n$, and $g_n$ represents the first derivative of the target function at $x_n$. In this paper, the BFGS algorithm is used to calculate $B_n$, since its performance and accuracy are higher than those of the DFP algorithm and the Broyden algorithm. The equation of $B_{k+1}$ is as follows: where $s_k = x_{k+1} - x_k$, $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, and the generated positive definite matrix satisfies the following equation: In summary, the update of $\alpha$ and $\beta$ is achieved. To compare the performance of the improved algorithm with the K0103 rate control algorithm of the HM10.0 model, experiments are carried out on five different test sequences provided by JCT-VC, with 200 frames each, under two configuration files: the low delay type LDMmain and the random access type RAMmain. The coding efficiency is measured by the encoding time saved $\Delta T$ [27,28]. The calculation equation of $\Delta T$ is as follows:
$$\Delta T = \frac{\mathrm{Time}_{\mathrm{pro}} - \mathrm{Time}_{\mathrm{HM10.0}}}{\mathrm{Time}_{\mathrm{HM10.0}}} \times 100\%,$$
where $\mathrm{Time}_{\mathrm{pro}}$ represents the encoding time of the algorithm in this paper and $\mathrm{Time}_{\mathrm{HM10.0}}$ represents the encoding time of the K0103 rate control algorithm. The experimental data in Table 1 show that the coding time of the algorithm in this paper is lower than that of the K0103 rate control algorithm under both the LDMmain and the RAMmain configuration, which means that the computational load in the update phase of the parametric model is reduced and the algorithm in this paper is more efficient.
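The quasi-Newton iteration and the BFGS update in step (4) can be sketched in pure Python. The sketch assumes the standard BFGS formula for an approximation of the inverse Hessian, B_{k+1} = (I − ρ s yᵀ) B_k (I − ρ y sᵀ) + ρ s sᵀ with ρ = 1/(yᵀs), written in its expanded form for a symmetric B_k. The function names and the small numerical example are illustrative assumptions and are not tied to the rate distortion objective of the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def quasi_newton_step(x, b, grad, step=1.0):
    """x_{n+1} = x_n - step * B_n * g_n."""
    direction = mat_vec(b, grad)
    return [xi - step * di for xi, di in zip(x, direction)]


def bfgs_inverse_update(b, s, y):
    """BFGS update of a symmetric inverse-Hessian approximation b.

    s = x_{k+1} - x_k, y = grad f(x_{k+1}) - grad f(x_k).
    """
    ys = dot(y, s)
    if ys <= 0:
        # Curvature condition violated: keep the old matrix so that it
        # stays positive definite.
        return b
    rho = 1.0 / ys
    by = mat_vec(b, y)
    yby = dot(y, by)
    n = len(s)
    return [[b[i][j]
             - rho * (s[i] * by[j] + by[i] * s[j])
             + rho * (1.0 + rho * yby) * s[i] * s[j]
             for j in range(n)] for i in range(n)]


identity = [[1.0, 0.0], [0.0, 1.0]]
s = [1.0, 0.5]
y = [2.0, 0.5]
b_next = bfgs_inverse_update(identity, s, y)
# The updated matrix maps y back to s (secant condition).
print(mat_vec(b_next, y))
```

Each update needs only a few vector products, so no second derivatives have to be evaluated or inverted, which is the source of the reduced load compared with the Newton method.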

Experimental Parameters and Evaluation Indicators.
In order to verify the validity of the algorithm in this paper, the following setup is used. The hardware configuration is an Intel(R) Core(TM) i5-3470 with a main frequency of 3.2 GHz and 4 GB of memory; the software platform is Microsoft Visual Studio 2010 with OpenCV 2.4.10. Simulation was performed to verify the compression performance of the ORC algorithm in the HM10.0 model. The experimental data sets come from five different levels of video sequences provided by JCT-VC. The configuration files are LD configuration files of the IPPP coding structure, and 100 frames are encoded per sequence. In order to evaluate the compression performance of the ORC algorithm under different sequences, the following indicators are used: the luminance component of the peak signal-to-noise ratio Y-PSNR, the increment of Y-PSNR ΔPSNR, the bit rate error ΔBR, the peak signal-to-noise ratio PSNR, the percentage increase of the bit rate BDBR, and the reduction of peak signal-to-noise ratio BDPSNR [29,30]. ΔPSNR is the Y-PSNR gain of the ORC algorithm over the K0103 rate control algorithm, and ΔBR is the deviation of the actual output rate from the target bit rate. Here Y-PSNR_Pro represents the Y-PSNR of the coded image obtained by the ORC algorithm, Y-PSNR_HM10.0 represents the Y-PSNR of the coded image obtained by the K0103 rate control algorithm, BR_Pro represents the actual output rate obtained by the ORC algorithm, and BR_Tar represents the target bit rate.
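The two indicators ΔPSNR and ΔBR can be computed with a few lines of code. The sketch below assumes the usual definitions, namely a plain difference in dB for ΔPSNR and an absolute relative error in percent for ΔBR; the absolute value is an assumption that matches the non-negative ΔBR values reported in the results, and the function names are illustrative.

```python
def delta_psnr(y_psnr_pro, y_psnr_hm):
    """Y-PSNR gain in dB of the proposed algorithm over the HM10.0 reference."""
    return y_psnr_pro - y_psnr_hm


def delta_br(br_pro, br_tar):
    """Bit rate error in percent (assumed: absolute relative deviation)."""
    if br_tar <= 0:
        raise ValueError("target bit rate must be positive")
    return abs(br_pro - br_tar) / br_tar * 100.0


# Hypothetical values: 1002 kbps produced for a 1000 kbps target
error_percent = delta_br(1002.0, 1000.0)
gain_db = delta_psnr(37.5, 37.0)
```

A smaller ΔBR means the output rate tracks the target more closely, while a positive ΔPSNR means the proposed algorithm gives higher luminance quality than the reference.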

Analysis of Experimental Results.
The performance of the ORC algorithm in this paper is evaluated from two aspects: compression performance and rate control accuracy. The experimental comparison results of the ORC algorithm, the K0103 algorithm, and the algorithm in [31] are shown in Tables 2 and 3, respectively, under four target bit rates: 500 kbps, 1 Mbps, 2 Mbps, and 4 Mbps.
From the experimental data in Tables 2 and 3, we can see that the ΔBR of the ORC algorithm ranges from 0.000% to 0.691%, with a mean value of 0.190%. The ΔBR of the K0103 algorithm ranges from 0.011% to 4.137%, with a mean value of 0.244%. The ΔBR of the algorithm in [31] ranges from 0.009% to 0.824%, with a mean value of 0.220%. The output bit rates of the ORC algorithm are therefore closer to the target bit rate, and in terms of rate control accuracy the ORC algorithm controls the rate more accurately than the K0103 algorithm and the algorithm in [31]. At the same time, the Y-PSNR of the ORC algorithm is improved by 0.065 dB and 0.045 dB compared with the K0103 rate control algorithm and the algorithm in [31], respectively. In terms of compression performance, the ORC algorithm improves the overall quality of videos while maintaining the rate control accuracy.
In order to analyze the compression performance of the ORC algorithm in more detail, the BasketballPass sequence with complex texture and the Kimono sequence with flat texture are selected for the experiment. Figure 8 shows the per-frame Y-PSNR comparison among the ORC algorithm, the K0103 algorithm, and the algorithm in [31] when QP = 27.
From Figure 8, we can see that the average Y-PSNR of the ORC algorithm is larger than that of the K0103 algorithm and the algorithm in [31] for both sequences, although they differ in complexity. The ORC algorithm measures the texture complexity of video frames more accurately, and the compression performance is improved. Table 4 shows the PSNR in the ROI and the RONI for four different video sequences, with 100 frames per sequence, under different target bit rates. As we can see from Table 4, the average PSNR of the K0103 algorithm is 36.77 dB in the ROI and 36.72 dB in the RONI. The average PSNR of the algorithm in [31] is 37.07 dB in the ROI and 36.48 dB in the RONI. The average PSNR of the ORC algorithm is 37.55 dB in the ROI and 36.17 dB in the RONI. Compared with the K0103 algorithm and the algorithm in [31], the ORC algorithm increases the average PSNR in the ROI by 0.78 dB and 0.48 dB, respectively, which means that the ORC algorithm has a definite advantage over the other two algorithms in improving the quality of the region that viewers attend to. At the same time, according to the data in Table 4, the PSNR difference between ROI and RONI ranges from 0 to 0.08 dB for the K0103 algorithm, from 0.23 to 1.24 dB for the algorithm in [31], and from 0.52 to 2.32 dB for the ORC algorithm. Compared with the K0103 algorithm and the algorithm in [31], the ORC algorithm has a larger PSNR difference between ROI and RONI, which indicates that the ORC algorithm makes the output videos more attuned with the human visual attention mechanism.
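The ROI and RONI figures discussed above rest on measuring PSNR separately inside and outside a region mask. The following sketch shows one way such a measurement could be made for a single 8-bit luminance frame; the mask layout, the function name, and the tiny sample frame are assumptions for illustration, and the paper does not specify how the values in Table 4 were averaged.

```python
import math


def region_psnr(orig, recon, mask, in_roi=True, peak=255.0):
    """PSNR in dB over the pixels whose mask value matches in_roi.

    orig, recon and mask are equally sized 2-D lists; a truthy mask
    entry marks a pixel as belonging to the ROI.
    """
    sq_err, count = 0.0, 0
    for o_row, r_row, m_row in zip(orig, recon, mask):
        for o, r, m in zip(o_row, r_row, m_row):
            if bool(m) == in_roi:
                sq_err += (o - r) ** 2
                count += 1
    if count == 0:
        raise ValueError("selected region is empty")
    mse = sq_err / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


# Hypothetical 2x2 frame: top row is ROI (small error), bottom row is RONI
orig = [[100, 100], [100, 100]]
recon = [[101, 101], [104, 104]]
mask = [[1, 1], [0, 0]]
gap_db = region_psnr(orig, recon, mask, True) - region_psnr(orig, recon, mask, False)
```

A positive gap means more bits, and therefore less distortion, were spent inside the ROI than outside it, which is the behavior the ROI-based bit allocation is designed to produce.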
In terms of rate distortion performance, the experimental results of the ORC algorithm compared with the K0103 algorithm and the algorithm in [31] under five categories of test sequences are shown in Tables 5 and 6. Figure 9 shows the comparison of the rate distortion (RD) curves among the ORC algorithm, the K0103 algorithm, and the algorithm in [31] for the Johnny sequence and the ParkScene sequence, which have different resolutions.
From Figure 9, we can see that the rate-distortion curves of the ORC algorithm lie above those of the K0103 algorithm and the algorithm in [31], which indicates that the Y-PSNR of the ORC algorithm is larger than that of the other two algorithms at the same bit rate, so its RD performance is better. The overall performance of the ORC algorithm is better.

Conclusions
This paper proposes an optimized rate control algorithm of ITU-T H.265/high-efficiency video coding based on the region of interest. Firstly, the algorithm improves the Itti model based on the space-time domain and uses it to extract the ROI of video frames. Then, the target bits of the CTU layer are redistributed based on the ROI so that the output videos are more attuned with the human visual attention mechanism. Finally, the quasi-Newton method is used to update the parametric model, which reduces the computational complexity in the update phase of the parametric model. The experimental results show that the ORC algorithm has better compression performance and rate control accuracy than the K0103 algorithm and the algorithm in [31], and therefore obtains better video compression results.