High Embedding Capacity Data Hiding Algorithm for H.264/AVC Video Sequences without Intraframe Distortion Drift

1 Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan
2 Faculty of Information Technology, Hung Yen University of Technology and Education, Dan Tien, Khoai Chau, Hung Yen, Vietnam
3 School of Engineering and Technology, Tra Vinh University, Tra Vinh Province, Vietnam
4 Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan


Introduction
With the rapid development of multimedia and network technology, a huge amount of digital media, i.e., images, video, audio, and text, is transmitted every second over public networks such as the Internet. Such transmitted media are easily modified or illegally copied by malicious attackers. As a result, the security of private information has become a very important issue, and many solutions have been proposed in the literature. They can be divided into two categories, i.e., encryption and data hiding techniques. Since encryption converts the media into a meaningless form, it signals that the content is valuable and can actually attract the attention of attackers. Data hiding is a promising technique for protecting the security and privacy of digital media because it embeds the secret data into a cover medium. The embedded media that contain the secret data retain a meaningful form, which helps avoid attracting the attention of attackers and guarantees the security and privacy of the secret data.
Many data hiding schemes [1-7] have been proposed for different digital data in the last decade. After Richardson introduced the H.264/AVC video compression standard [8] in 2003, this standard has been used extensively for hiding secret data [9-19]. In 2005, Noorkami et al. [9] proposed a low-complexity watermarking algorithm based on the relative change of the DC coefficients of the 4 × 4 block. In their scheme, a public key is used for detecting the embedded data, while the copyright owner possesses a secret key. In 2006, Nguyen et al. [10] proposed a fast watermarking system based on H.264/AVC motion vectors. However, their scheme offered low embedding capacity; the average embedding capacity was only about 2,000 bits. Then, to achieve robustness, Zhang et al. [11] proposed a new data hiding scheme with a 2D, 8-bit watermark in the compressed domain. Noorkami and Mersereau [12] introduced a framework for robust watermarking of H.264-encoded video that uses the quantized AC coefficients to obtain optimal detection of the video watermark. Using a texture-masking-based perceptual model, Gong et al. [13] proposed a fast and robust watermarking scheme for H.264 video in which the quantized DC coefficients conceal the watermark. However, in these two schemes [12, 13], the original watermark is required for detection. In 2007, Kapotas et al. [15] proposed blind data hiding in an H.264 stream that uses the difference of block sizes during the interframe prediction stage to increase the embedding capacity. However, Kapotas et al.'s scheme had a high bit-rate increment. To address this issue, Kim et al. [16] embedded the watermark bit into the sign bit of the trailing ones in Context Adaptive Variable Length Coding (CAVLC). However, embedding the watermark in the discrete cosine transform (DCT) coefficients of the I-frames in these previous schemes results in stego videos with low visual quality, caused by distortion drift in the I-frame prediction. To overcome this shortcoming, in 2010, Ma et al. [17] proposed a new algorithm for data hiding in H.264/AVC based on DCT coefficients. Their scheme uses several paired coefficients of a 4 × 4 block to embed the secret data while avoiding distortion drift. However, the scheme obtains limited visual quality. To improve the visual quality of stego videos, Huo et al. [14] proposed a new data hiding scheme using a controllable error-drift-elimination technique; however, its embedding capacity is unsatisfactory. To increase the embedding capacity of Ma et al.'s scheme, Lin et al. [18] fully used the remaining luminance blocks to hide the secret data. Their experimental results indicated that the embedding capacity obtained by Lin et al.'s scheme is 0.15 bits per pixel (bpp) higher than that of Ma et al.'s scheme. However, the average embedding capacity of these two schemes [17, 18] is still low, i.e., always smaller than 0.68 bpp, because each pair of DCT coefficients is used separately to hide a single secret bit. Consequently, as the amount of embedded secret data increases, the distortion of the video becomes greater. To overcome these shortcomings, in this paper we propose a new data hiding scheme for video sequences without intraframe distortion drift. Instead of using each coefficient pair separately for embedding data, all embeddable coefficient pairs in each luminance block are determined and classified into two clusters, i.e., the embedding group and the averting group. The secret data are hidden in the embedding group with minimal modification, while the averting group is used to avoid distortion drift. The experimental results show that the proposed algorithm further improves the embedding capacity while maintaining good visual quality and no distortion drift.
The remainder of this paper is organized as follows. Section 2 provides background on intraframe prediction and reviews previous data hiding schemes. The proposed scheme is explained in Section 3. In Section 4, experimental results are presented that illustrate the performance of the proposed scheme in comparison with previous schemes. Our conclusions are presented in Section 5.

Related Work
2.1. Intraframe Prediction. Intraframe prediction is a technique in H.264/AVC coding [7] that is used to reduce the spatial redundancies of H.264/AVC intraframes. For each block, some previously encoded adjacent blocks are used to predict the pixels of the current block. Figure 1 shows the current 4 × 4 block, B_{i,j}, with its pixels labeled p_1 to p_16. These pixels are predicted from the reference pixels (labeled A to M) of four adjacent blocks that were encoded previously, using the prediction formula corresponding to the optimal mode selected from the nine 4 × 4 prediction modes shown in Figure 2.
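As a rough illustration (not the full H.264 reference process), two of the nine 4 × 4 prediction modes can be sketched as follows; the function name and the simplified interface are ours, not from the standard:

```python
import numpy as np

def predict_4x4(mode, top, left):
    """Simplified 4x4 intra prediction for two of the nine H.264 modes.

    top:  the four reference pixels directly above the block
    left: the four reference pixels directly to its left
    """
    if mode == 0:    # vertical: each column copies the pixel above it
        return np.tile(top, (4, 1))
    if mode == 1:    # horizontal: each row copies the pixel to its left
        return np.tile(left.reshape(4, 1), (1, 4))
    raise NotImplementedError("only modes 0 and 1 are sketched here")
```

In the encoder, all applicable modes are tried and the one minimizing the residual cost is kept; the selected mode is what later determines which reference pixels must be preserved to avoid distortion drift.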
In H.264/AVC encoding, the current block, B_{i,j}, is subtracted from its prediction block, P_{i,j}, to obtain the residual block, R_{i,j} = B_{i,j} − P_{i,j}. The residual block is then transformed with the 4 × 4 integer DCT and quantized by (1) to obtain the quantized residual block, R̃_{i,j}. In the decoding phase, the residual block R'_{i,j} is reconstructed by applying the dequantization process and the 4 × 4 integer inverse DCT transformation to R̃_{i,j} by (2), and the reconstructed block, denoted B̃_{i,j}, is obtained as B̃_{i,j} = P_{i,j} + R'_{i,j}.
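The encode/decode round trip above can be sketched minimally. This is an assumption-laden stand-in: a plain scalar quantizer replaces the integer DCT plus quantization of (1) and (2), since only the residual-plus-prediction structure matters here:

```python
import numpy as np

def encode_residual(B, P, qstep):
    """Encoder side: residual R = B - P, then a scalar quantizer as a
    stand-in for the 4x4 integer DCT + quantization of (1)."""
    return np.round((B - P) / qstep).astype(int)

def reconstruct(Rq, P, qstep):
    """Decoder side, mirroring (2): dequantize the residual and add the
    prediction back to obtain the reconstructed block."""
    return P + Rq * qstep
```

With qstep = 1 the reconstruction is exact; larger quantization steps trade fidelity for bit rate, which is the QP behavior discussed in the experiments.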

2.2. Data Hiding Schemes Based on Intraframe Prediction.
In 2010, Ma et al. [17] used three pairs of quantized DCT coefficients in R̃_{i,j} for embedding data into H.264/AVC video sequences. To prevent intraframe distortion drift during the embedding process, they analyzed the use of seven pixels, i.e., b_4, b_8, b_12, b_13, b_14, b_15, and b_16, of the reconstructed block, B̃_{i,j}, in the intraprediction process. Figure 3 shows the four adjacent blocks, i.e., B_{i,j+1}, B_{i+1,j−1}, B_{i+1,j}, and B_{i+1,j+1}, that are affected directly by this process. For example, if the selected prediction mode of B_{i+1,j} is 0, as shown in Figure 2, the pixels located at the positions of b_13, b_14, b_15, and b_16 are referenced for predicting B_{i+1,j} or B_{i+1,j−1}. Condition Con_3 consists of the prediction modes in {4, 5, 6}; in this case, the pixel located at the position of b_16 is referenced for predicting B_{i+1,j+1}.
To take advantage of the relationship between the reference pixels and the selected prediction modes for embedding data, these three conditions can be further classified into the five categories presented in Table 1.
To solve the distortion drift problem during embedding of the secret data, Ma et al. classified the current block into three different conditions, i.e., Con_1, Con_2, and Con_3. Then three specific pairs of quantized DCT coefficients are selected for embedding three secret data bits. Take the category Con_2 as an example: three coefficient pairs, i.e., {(Y_01, Y_21), (Y_02, Y_22), (Y_03, Y_23)}, are used for embedding. The main reason is that these three pairs share the same property: when the values of the quantized DCT coefficients in a pair are modified, the modification is concentrated only on the two middle columns or the two middle rows of the block.
In this scenario, it is clear that the modification is concentrated only on the two middle rows, while the modifications in the first and last rows are zero. Therefore, the distortion drift is avoided. However, Ma et al. did not fully explore all of the cases for embedding data, so their average embedding rate was less than 0.45 bpp.
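This row-cancellation property can be checked numerically. The sketch below uses a real-valued form of the 4 × 4 inverse transform matrix (one common convention; the exact scaling does not affect the cancellation): perturbing the pair (Y_01, Y_21) by (+v, −v) changes only the two middle rows of the pixel-domain block:

```python
import numpy as np

# Real-valued form of the H.264 4x4 inverse transform matrix; only its
# first column [1,1,1,1] and third column [1,-1,-1,1] matter here.
Ci = np.array([[1.0,  1.0,  1.0,  0.5],
               [1.0,  0.5, -1.0, -1.0],
               [1.0, -0.5, -1.0,  1.0],
               [1.0, -1.0,  1.0, -0.5]])

def pixel_change(delta_coeffs):
    """Spatial-domain change produced by a coefficient perturbation."""
    return Ci @ delta_coeffs @ Ci.T

dY = np.zeros((4, 4))
dY[0, 1] = 1.0    # Y_01 increased by v = 1
dY[2, 1] = -1.0   # Y_21 decreased by v = 1
dP = pixel_change(dY)
```

Because the first and third vertical basis vectors agree in their first and last entries, the +v and −v contributions cancel exactly in the first and last rows of dP, so no modified pixel is ever referenced by an adjacent block's prediction.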
To further improve the embedding capacity of Ma et al.'s scheme while avoiding distortion drift in H.264/AVC video sequences, Lin et al. [18] divided the relationship between the reference pixels and the selected prediction modes into the five categories presented in Table 1. Then, for blocks belonging to three of the categories, i.e., Cat_1, Cat_2, and Cat_4, Lin et al. extracted one more coefficient pair for embedding one more secret bit in the same way as in Ma et al.'s scheme. In addition, each block belonging to Cat_5 is also used for embedding one more secret bit to increase the embedding capacity. As a result, the embedding capacity obtained by Lin et al.'s scheme is 0.15 bits per pixel (bpp) higher than that of Ma et al.'s scheme. However, in these two schemes [17, 18], each pair of quantized DCT coefficients is perturbed to embed only one secret bit. Thus, to embed n secret bits, n selected pairs of quantized DCT coefficients are modified. This means that the more secret bits are embedded, the more distortion is caused in the video frames, leading to low visual quality. Moreover, in these schemes [17, 18], to guarantee higher visual quality, a zero coefficient pair (Y_mn, Y_pq) is not used to embed any secret data bits. Therefore, their embedding capacity is still low; the average embedding capacity is smaller than 0.68 bpp when QP = 28 is used.
It is obvious that, to maintain high visual quality of the video sequence, the previous schemes [17, 18] select quantized DCT coefficient pairs (excluding the zero coefficient pairs) of three categories, Cat_1, Cat_2, and Cat_4, for embedding data. However, their schemes still obtain low embedding capacity, and the visual quality of the video sequences is not guaranteed. Therefore, in this paper, to overcome these shortcomings, instead of modifying each coefficient pair separately for embedding data, a group of coefficients is selected and altered at the same time. In particular, all suitable pairs of quantized DCT coefficients are extracted and classified into two groups: one group is used for embedding data, and the other is used to prevent distortion drift in the video sequences. This means that, in the proposed scheme, a group of n coefficient pairs is modified to embed n secret bits. By modifying the group as a whole, at most n/2 coefficient pairs are modified, which guarantees better visual quality of the embedded video sequences. In addition, to increase the embedding capacity, zero coefficient pairs are still used for embedding data in the proposed scheme. The details of the proposed scheme are described in the next section.

The Proposed Scheme
Figure 4 shows the main processes of the proposed embedding and extracting phases. In the embedding phase, the original H.264/AVC video sequence is first entropy decoded. Then, if a 4 × 4 quantized DCT block belongs to one of three categories, i.e., Cat_1, Cat_2, and Cat_4, four secret data bits are embedded based on group modulation; if the block belongs to category Cat_3, each coefficient is used to carry one secret bit. Finally, all the quantized DCT coefficients are entropy encoded to obtain the embedded H.264/AVC video sequence. In this phase, to prevent distortion drift, the quantized DCT coefficients of each block are partitioned into two groups, i.e., the embedding group and the averting group. To achieve high embedding capacity and to ensure good image quality, the embedding group is used for embedding data, while the averting group is used to prevent the propagation of errors. Figure 4(b) shows the details of the extracting phase: the embedded H.264/AVC video sequence is entropy decoded, the category of each 4 × 4 quantized DCT block is determined, and the corresponding secret bits are then extracted according to the determined category.

3.1. Category Selection and Coefficient Grouping.
To prevent intraframe distortion drift, all blocks that belong to the first four categories are selected for embedding data in suitable ways. The pixels that are used for intraframe prediction are not modified during the embedding process, so the embedding distortion does not affect the adjacent blocks. Figure 5 shows the percentage of blocks that meet the conditions of the first four categories for the 14 H.264/AVC test video sequences. It is obvious that most of the blocks in each video sequence belong to Cat_1 and Cat_2. Therefore, the proposed scheme is designed to embed more secret bits into these two categories with small distortion and without distortion drift. In the proposed scheme, four quantized DCT coefficient pairs of these categories are selected and divided into two groups, i.e., an embedding group E and an averting group A, to conceal the secret data. Take a block B_{i,j} that belongs to category Cat_1 as an example: four coefficient pairs, (u_k, v_k) ∈ {(Y_00, Y_02), (Y_10, Y_12), (Y_20, Y_22), (Y_30, Y_32)}, are selected for embedding data. All of the first coefficients u_k of the pairs are grouped to construct the embedding group, E, which carries the secret bits by modifying at most two coefficients, whereas the remaining coefficients v_k of the four pairs are clustered into the averting group, A, which is modified to prevent intraframe distortion drift during the embedding process. Figure 6 shows an example of the grouping process for Cat_1.
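A minimal sketch of this grouping step, using the pair tables named in the text (the PAIRS dictionary layout and the function name are ours, for illustration only):

```python
# Coefficient-pair tables following the pairs named in the text:
# Cat_1 pairs coefficients across columns 0 and 2 of each row;
# Cat_2 (and Cat_4) pair coefficients across rows 0 and 2 of each column.
PAIRS = {
    "Cat1": [((0, 0), (0, 2)), ((1, 0), (1, 2)),
             ((2, 0), (2, 2)), ((3, 0), (3, 2))],
    "Cat2": [((0, 0), (2, 0)), ((0, 1), (2, 1)),
             ((0, 2), (2, 2)), ((0, 3), (2, 3))],
}

def split_groups(Y, category):
    """Split the four coefficient pairs of a 4x4 quantized DCT block Y
    into the embedding group E and the averting group A."""
    E = [Y[r][c] for (r, c), _ in PAIRS[category]]  # first of each pair
    A = [Y[r][c] for _, (r, c) in PAIRS[category]]  # second of each pair
    return E, A
```

The same function serves Cat_4 with the Cat_2 table, since the text assigns both categories the same four pairs.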
Step 1. Read the four appropriate coefficient pairs and classify them into two groups, i.e., E and A, as described in Section 3.1.
Step 2. Read n secret bits, s_1, s_2, ..., s_n, from the encrypted message and embed them into group E. For embedding, the weight value w of group E is calculated by (3). In this paper, four coefficient pairs are used for embedding; therefore, the value of n is at most 4.
Step 3. If the difference between w and the value of the secret bits is 0, all elements of group E are kept unchanged. Otherwise, we arbitrarily increase or decrease elements of group E by 1. If e_i is increased by 1, the value of w increases by 2^(i−1); if e_i is decreased by 1, w decreases by 2^(i−1), i.e., it increases by 2^n − 2^(i−1) modulo 2^n. It can be observed that, in the proposed scheme, at most two elements of group E are modified by the value 1. Therefore, we can alter w to w' by changing at most two elements in E so as to satisfy w' ≡ (s_1 s_2 ... s_n)_2 mod 2^n.
Step 4. To prevent intraframe distortion drift, the inverse operations of the modifications of group E are performed on the corresponding elements of group A. For example, if the element e_i is increased by 1, the element a_i is decreased by 1, and vice versa. Similarly, when the block B_{i,j} satisfies one of the two categories Cat_2 and Cat_4, four coefficient pairs, (u_k, v_k) ∈ {(Y_00, Y_20), (Y_01, Y_21), (Y_02, Y_22), (Y_03, Y_23)}, are selected to generate the embedding and averting groups E and A; then n secret bits are embedded into group E in the same manner as for Cat_1.
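Steps 2-4 can be sketched as follows. The weight form is an assumption on our part (binary weights 2^(i−1) modulo 2^n, under which every n-bit value is reachable by changing at most two elements by ±1); the paper's exact formula (3) is not reproduced in this extraction:

```python
from itertools import product

def group_weight(E):
    """Assumed form of (3): binary-weighted sum of group E modulo 2^n."""
    n = len(E)
    return sum((2 ** i) * e for i, e in enumerate(E)) % (2 ** n)

def embed_bits(E, A, bits):
    """Embed n secret bits by changing at most two elements of E by +/-1.
    Each change is mirrored with the opposite sign in A (Step 4), so the
    pair sums are preserved and no distortion drifts to adjacent blocks."""
    n = len(E)
    target = int("".join(map(str, bits)), 2)
    best = None
    # brute-force search over +/-1 changes touching at most two elements
    for deltas in product((-1, 0, 1), repeat=n):
        changed = sum(1 for d in deltas if d != 0)
        if changed > 2:
            continue
        w = sum((2 ** i) * (e + d)
                for i, (e, d) in enumerate(zip(E, deltas))) % (2 ** n)
        if w == target and (best is None or changed < best[0]):
            best = (changed, deltas)
    E2 = [e + d for e, d in zip(E, best[1])]
    A2 = [a - d for a, d in zip(A, best[1])]   # inverse modification
    return E2, A2
```

Because n = 4 bits are carried by at most two unit changes, the per-bit distortion is roughly half that of the pair-wise schemes, which modify one coefficient pair for every single bit.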
Case 2 (the block belongs to category Cat_3). In such blocks, each coefficient is used to embed one secret bit because this category has no effect on adjacent blocks during the encoding process. The stego coefficient, Y'_ij, is then calculated by the per-coefficient embedding rule, where Y_ij is one of the 16 quantized DCT coefficients in R̃_{i,j}. If the block B_{i,j} belongs to Cat_5, it is left without embedding any secret bits.
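The per-coefficient rule for Cat_3 can be illustrated with parity (LSB) embedding. The paper's exact formula was lost in this extraction, so the rule below is one consistent choice, not necessarily the authors' formula:

```python
def embed_cat3(y, bit):
    """Adjust a quantized DCT coefficient by at most 1 so that its
    parity carries the secret bit (illustrative rule)."""
    return y if y % 2 == bit else y + 1

def extract_cat3(y):
    """Recover the embedded bit from the coefficient's parity."""
    return y % 2
```

Each coefficient moves by at most one quantization level, and the receiver only needs the coefficient itself to read the bit back.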

3.3. Extracting Phase.
In this subsection, the extracting algorithm that recovers the secret data from the embedded H.264/AVC video sequences is described. If the current block belongs to Cat_1, Cat_2, or Cat_4, the embedding group E is reconstructed as was done in the embedding phase, and the n embedded bits are then extracted from its weight. For a block that belongs to Cat_3, each coefficient is checked to determine the embedded bit, s_r.
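A sketch of the group-category extraction, assuming the same binary-weight form used in the embedding sketch (again an assumption, since formula (3) was lost in this extraction):

```python
def extract_bits(E, n=4):
    """Recover the n embedded bits from the reconstructed embedding
    group E: the group weight modulo 2^n, written as an n-bit string."""
    w = sum((2 ** i) * e for i, e in enumerate(E)) % (2 ** n)
    return [int(b) for b in format(w, "0{}b".format(n))]
```

Extraction is blind: only the reconstructed quantized coefficients and the block's category are needed, with no reference to the original video.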

Experimental Results
In this section, we describe the experimental evaluation of the performance of the proposed scheme. Fourteen video sequences, i.e., Akiyo, Bridge-Close, Bridge-Far, Carphone, Claire, Coastguard, Container, Foreman, Grandma, Hall, Mobile, Mother-Daughter, News, and Salesman, were used as test samples. A group-of-pictures (GOP) size of 30 and the "IBPBP" structure were used in the experiments. Six different quantization parameters (QP), i.e., 18, 23, 28, 33, 38, and 43, were tested for the 14 video sequences. In principle, in H.264/AVC coding, a smaller value of QP yields better visual quality of the video sequences but requires more encoded bits.

4.1. Embedding Capacity Evaluation.
Figure 7 shows the performance of the proposed scheme, in terms of embedding capacity, for the six QPs. It is apparent that a smaller value of QP results in a higher embedding capacity. Table 2 compares the embedding capacity of the proposed scheme and two state-of-the-art schemes [17, 18] in terms of embedded bits per 4 × 4 block. As shown in the table, the average embedding capacity of the proposed scheme was considerably higher than those of the two state-of-the-art schemes. The average improvements in embedding capacity of the proposed scheme over the schemes of Ma et al. [17] and Lin et al. [18] were 160% and 91%, respectively. The main reason for the improvement over Ma et al.'s scheme is that they selected only three coefficient pairs for carrying secret bits, which resulted in low embedding capacity. Lin et al.'s scheme also had a lower embedding capacity than the proposed scheme because, to avoid significant distortion in the video intraframes, it does not use zero-valued coefficient pairs for embedding data. In the proposed scheme, all blocks belonging to the first four categories are used to embed secret data with small modification. As a result, the proposed scheme achieves higher embedding capacity. Specifically, Table 3 shows the embedding capacity of the three schemes for the 14 test video sequences when QP was 28. As Table 3 shows, the gain in embedding capacity of the proposed scheme ranged from 22% to 629% over Ma et al.'s scheme and from 12% to 410% over Lin et al.'s scheme; the improvement rate differed across the 14 video sequences.

4.3. Visual Quality Evaluation.
To evaluate the visual quality of the video frames in the three schemes, the peak signal-to-noise ratio (PSNR) [20] is calculated by comparing the original frame to the embedded frame. Figure 8 shows the visual quality of the proposed scheme for different values of QP; clearly, a higher PSNR is obtained when a smaller QP is used. Table 5 compares the proposed scheme, Ma et al.'s scheme, and Lin et al.'s scheme in terms of the visual quality of the video frames. It can be observed that the PSNR of the proposed scheme was slightly smaller than that of the other two schemes [17, 18]. However, the proposed scheme improves the embedding capacity significantly, i.e., by 160% and 91% on average over Ma et al.'s scheme and Lin et al.'s scheme, respectively. Table 6 compares the visual quality of the three schemes for QP = 28, corresponding to the embedding capacity in Table 3. Compared with Ma et al.'s scheme, the average degradation of the proposed scheme was larger than 0.45 dB; however, the proposed scheme provided better visual quality of the video frames than Lin et al.'s scheme.
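The PSNR figures above follow the standard definition 10·log10(255² / MSE); a minimal sketch:

```python
import numpy as np

def psnr(original, embedded, peak=255.0):
    """Peak signal-to-noise ratio in dB between an original frame and
    the corresponding embedded frame."""
    mse = np.mean((original.astype(np.float64) -
                   embedded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

Higher values indicate smaller embedding distortion; the comparisons in Tables 5 and 6 average this measure over the embedded frames.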
For a fair comparison, both the PSNR and the structural similarity (SSIM) [21] index are used to evaluate the visual quality of the proposed scheme and the two other schemes [17, 18]. Here, the maximum embedding capacity obtained by Lin et al.'s scheme [18] is embedded by the proposed scheme.
Comparison of SSIM of the proposed scheme and two previous schemes [17, 18].

Figure 1: Predicted pixels in block B_{i,j} and the reference pixels in the adjacent region.

Figure 3: The current block and the four blocks directly affected by the encoding process.

Figure 4: Main processes of the proposed scheme: (a) embedding phase and (b) extracting phase.

Figure 5: Percentage of blocks meeting the conditions of the first four categories for the 14 H.264/AVC test video sequences.

Figure 8: Visual quality of the proposed scheme for different QPs.

Table 1: Five general categories of reference pixels and selected prediction modes.

For example, consider the quantized DCT coefficient pair (Y_03, Y_23) of R̃_{i,j}. Assume that only Y_03 is increased by a value v to embed the secret data; the resulting pixel-domain change would propagate to other adjacent blocks. In Ma et al.'s scheme, however, to embed a hidden bit v in (Y_03, Y_23), the pair is perturbed to (Y_03 + v, Y_23 − v), so the modifications cancel in the boundary rows of the block.

3.2. Embedding Phase. In this subsection, the embedding algorithm is described in detail. First, for security reasons, the secret data are encrypted in advance into S = s_1, s_2, ..., s_|S|, with s_k ∈ {0, 1}. The original video sequences are decoded by the entropy decoder to obtain the intraframe prediction modes and the quantized DCT coefficients. Then, if a 4 × 4 quantized DCT block belongs to Cat_1, Cat_2, or Cat_4, the secret data are embedded based on group modulation; otherwise, if the block belongs to category Cat_3, each coefficient is used to carry one secret bit. Finally, all the quantized DCT coefficients are entropy encoded to obtain the target embedded video sequences. To make the embedding algorithm clearer, two cases are described as examples.

Case 1. If the block belongs to Cat_1, Cat_2, or Cat_4, the following four steps are implemented for embedding n secret bits.

Table 4: Comparison of bit-rate increment ratio of the proposed scheme and two previous schemes [17, 18].
4.2. Bit-Rate Increment Ratio. Table 4 shows the ratio of the bit-rate increment for different QPs. The average ratios of the bit-rate increment of Ma et al.'s scheme, Lin et al.'s scheme, and the proposed scheme were 1.44%, 1.68%, and 2.01%, respectively. These results indicate that the degradation is quite small in all three schemes.

Table 6: Comparison of visual quality (PSNR: dB) of the proposed scheme and two previous schemes [17, 18].