Reversible Data Hiding Based on Multiple Pairwise PEE and Two-Layer Embedding

Recent reversible data hiding (RDH) work tends to realize adaptive embedding by discriminately modifying pixels according to image content. However, further optimization and computational complexity remain great challenges. By presenting a better incorporation of pixel value ordering (PVO) prediction and pairwise prediction-error expansion (PEE) technologies, this paper proposes a new RDH scheme. The largest/smallest three pixels of each block are utilized to generate error pairs. To optimize the distribution of error pairs, two-layer embedding is introduced such that the full-enclosed pixels of each block can be used to determine how to optimally define the spatial location of pixels within the block. Then, to modify error pairs with less distortion, the shifted pairing error is involved in the separable utilization of the other one; i.e., it serves as the context for recalculating the other one. Since the recalculation is equivalent to expansion-bin selection, various extensions of the original pairwise PEE are designed, parameterized, and combined into the so-called multiple pairwise PEE, with which the 2D histogram can be divided into a set of subhistograms for more accurate modification. The experimental results verify the superiority of the proposed scheme over several PVO-based schemes. On the Kodak image database, the average PSNR gains over the original PVO-based pairwise PEE are 0.83 and 0.99 dB for capacities of 10,000 and 20,000 bits, respectively.


Introduction
Reversible data hiding (RDH) is a kind of data hiding technology in which a secret message is embedded into the cover media for purposes such as secret communication and integrity authentication. Moreover, the cover media can be exactly recovered after extracting the hidden data. Compared with traditional data hiding technologies such as digital watermarking, whose main concern lies in robustness against various attacks, RDH is categorized as a fragile watermarking technology, and its specific property is the exact restoration of the cover media. RDH for uncompressed gray-scale images [1] is the most investigated subject of RDH nowadays. Furthermore, RDH in the encrypted domain [2] has also gained increasing attention.
Adaptive embedding and high-dimensional PEE are two remarkable achievements in recent years. For example, pixels are adaptively selected for data embedding in [17]. In [29], the obtained errors are classified and embedded with different amounts of data. In [26], the expansion bins are adaptively selected so as to achieve optimal embedding performance. In [23], a histogram sequence is first derived by decomposing the prediction-error histogram (PEH). Then, for each subhistogram, the expansion bins are adaptively selected. High-dimensional PEE additionally exploits the correlation of errors by jointly modifying them [25,27]. For example, pairwise PEE [27] outperforms conventional PEE by introducing more flexible modification of error pairs; i.e., a pair of expandable errors can only be embedded with one of the bit combinations "00", "01", and "10". Pairwise PEE was first incorporated with rhombus prediction [27]. Afterwards, pairwise PEE was also incorporated with pixel value ordering (PVO) prediction [30]. To enhance pairwise PEE by combining expandable errors into a pair, the strategy of adaptive pairing was proposed later [28]. Based on the observation that adaptive pairing actually aims to optimize the distribution of error pairs, similar optimization is achieved in [31,32], where pairwise PEE is better incorporated with PVO prediction. Existing pairwise PEE-based schemes have demonstrated the advantage of pairwise PEE in distortion reduction. On the other hand, PVO prediction has been verified to have high accuracy. Therefore, PVO-based pairwise PEE is of great significance to high-fidelity RDH.
In this paper, a new RDH scheme is proposed by extending PVO-based pairwise PEE into adaptive embedding from two aspects: adaptive error-pair generation and adaptive error-pair modification. The cover image is first divided into shadow and blank blocks to enable two-layer embedding. In this way, the local complexity of each block can be more precisely computed using full-enclosed pixels, and thus a better selection of pixel blocks for data embedding is obtained. For error-pair generation, the largest/smallest three pixels of each block are utilized by improved PVO (IPVO) [33] to generate error pairs. For two correlated pixels, the principle of IPVO can be interpreted as predicting the one having the relatively smaller location with the other one. Since errors valued 0 or 1 are expandable in IPVO, it is necessary to ensure that the to-be-predicted pixel with the relatively smaller location has the relatively larger gray value. Based on the above analysis, full-enclosed pixels are additionally used to estimate the distribution of pixels within a block and to define spatial location accordingly. As for error-pair modification, a key observation is that most secret data is embedded into error pairs consisting of one or more shiftable errors. For these error pairs, we propose to use the shifted error to adaptively recalculate the other pairing error. Such recalculation is equivalent to expansion-bin selection. In this way, multiple pairwise PEE is designed and parameterized by the selection of expansion bins. Finally, an efficient mechanism for determining the expansion bins that minimize distortion is also proposed.
The rest of the paper is organized as follows. Section 2 briefly introduces some related work including PVO, IPVO, and PVO-based pairwise PEE. The proposed scheme is presented in detail in Section 3. The experimental results are given and discussed in Section 4. This paper is concluded in Section 5.
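
Related Work

PVO-Based PEE.
PVO-based PEE [34] exploits image redundancy in a blockwise manner. After dividing the cover image into nonoverlapped blocks consisting of $n$ pixels, the pixels of each block are collected to generate a pixel sequence $(p_1, p_2, \ldots, p_n)$. Next, they are sorted to obtain $(p_{\sigma(1)}, p_{\sigma(2)}, \ldots, p_{\sigma(n)})$, where $\sigma: \{1, \ldots, n\} \to \{1, \ldots, n\}$ is the unique one-to-one mapping such that $p_{\sigma(1)} \le p_{\sigma(2)} \le \cdots \le p_{\sigma(n)}$, with $\sigma(i) < \sigma(j)$ if $p_{\sigma(i)} = p_{\sigma(j)}$ and $i < j$. The maximum $p_{\sigma(n)}$ is then predicted by the second largest pixel $p_{\sigma(n-1)}$. In this way, the prediction error is computed as

$$PE_{\max} = p_{\sigma(n)} - p_{\sigma(n-1)}. \quad (1)$$

To fulfill data embedding, $PE_{\max}$ is modified as

$$PE'_{\max} = \begin{cases} PE_{\max}, & \text{if } PE_{\max} = 0, \\ PE_{\max} + b, & \text{if } PE_{\max} = 1, \\ PE_{\max} + 1, & \text{if } PE_{\max} > 1, \end{cases} \quad (2)$$

where $b \in \{0, 1\}$ is a data bit to be embedded.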
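As an illustration of PVO-based PEE on the maximum of a block, the following Python sketch embeds one bit and then inverts the operation. It is a minimal sketch under stated assumptions, not the implementation of [34]: the names `pvo_embed_max` and `pvo_extract_max` are hypothetical, a block is a flat list of gray values, the rule used is the standard one (an error of 0 is left unchanged, an error of 1 is expanded, larger errors are shifted by 1), and overflow handling is omitted.

```python
def pvo_embed_max(block, bit):
    """Embed one bit into the largest pixel of a block (PVO-based PEE).

    Returns (marked_block, used), where `used` tells whether the bit was
    actually consumed, which happens only when the prediction error is 1.
    """
    # Python's sort is stable: equal values keep their index order,
    # which matches the tie rule sigma(i) < sigma(j) for i < j.
    order = sorted(range(len(block)), key=lambda i: block[i])
    i_max, i_2nd = order[-1], order[-2]
    pe = block[i_max] - block[i_2nd]       # prediction error, always >= 0
    marked = list(block)
    if pe == 1:                            # expandable error carries the bit
        marked[i_max] += bit
        return marked, True
    if pe > 1:                             # shifted error, no payload
        marked[i_max] += 1
    return marked, False                   # pe == 0 is left untouched


def pvo_extract_max(marked):
    """Inverse of pvo_embed_max: returns (original_block, bit_or_None)."""
    order = sorted(range(len(marked)), key=lambda i: marked[i])
    i_max, i_2nd = order[-1], order[-2]
    pe = marked[i_max] - marked[i_2nd]
    block = list(marked)
    if pe in (1, 2):                       # expanded error: bit = pe - 1
        block[i_max] -= pe - 1
        return block, pe - 1
    if pe > 2:                             # shifted error: undo the shift
        block[i_max] -= 1
    return block, None
```

For the block `[162, 160, 161, 159]` the error is 1, so a bit is carried. A flat block such as `[10, 10, 10, 10]` yields an error of 0 and is left unused, which is the limitation that pixel-location-aware prediction is meant to address.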

IPVO-Based PEE.
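In PVO-based PEE, smooth blocks are preferentially selected and a large number of errors valued 0 are thus generated. However, these errors are ignored according to (2), which implies insufficient utilization of smooth blocks. To solve this problem, pixel location is introduced into IPVO-based PEE [33] and the prediction error is renewed as

$$e_{\max} = p_u - p_v, \quad (3)$$

where $u = \min(\sigma(n-1), \sigma(n))$ and $v = \max(\sigma(n-1), \sigma(n))$. If $\sigma(n) > \sigma(n-1)$, there is $e_{\max} = p_{\sigma(n-1)} - p_{\sigma(n)} \le 0$; otherwise, there is $e_{\max} = p_{\sigma(n)} - p_{\sigma(n-1)} \ge 1$. Obviously, the relative order of $\sigma(n-1)$ and $\sigma(n)$ remains unchanged after enlarging $p_{\sigma(n)}$. In this way, for a data bit $b \in \{0, 1\}$, the marked prediction error $e'_{\max}$ is computed as

$$e'_{\max} = \begin{cases} e_{\max} + b, & \text{if } e_{\max} = 1, \\ e_{\max} + 1, & \text{if } e_{\max} > 1, \\ e_{\max} - b, & \text{if } e_{\max} = 0, \\ e_{\max} - 1, & \text{if } e_{\max} < 0, \end{cases} \quad (4)$$

and accordingly $p_{\sigma(n)}$ is modified to

$$p'_{\sigma(n)} = p_{\sigma(n-1)} + |e'_{\max}|. \quad (5)$$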
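The IPVO rule can be sketched in Python as follows. This is an illustrative sketch, not the code of [33]: the helper names are hypothetical, a block is a flat list whose list index plays the role of pixel location, errors valued 1 and 0 are expanded while the others are shifted away from them, and overflow handling is omitted.

```python
def _top_two(block):
    # Stable sort: among equal values the smaller index comes first.
    order = sorted(range(len(block)), key=lambda i: block[i])
    s_n, s_n1 = order[-1], order[-2]       # sigma(n), sigma(n-1)
    return s_n, s_n1, min(s_n, s_n1), max(s_n, s_n1)


def ipvo_embed_max(block, bit):
    """Embed one bit with IPVO; returns (marked_block, used)."""
    s_n, s_n1, u, v = _top_two(block)
    e = block[u] - block[v]                # location-aware prediction error
    if e == 1:
        e_marked, used = e + bit, True     # expandable bin 1
    elif e == 0:
        e_marked, used = e - bit, True     # expandable bin 0
    elif e > 1:
        e_marked, used = e + 1, False      # shift right
    else:
        e_marked, used = e - 1, False      # shift left
    marked = list(block)
    marked[s_n] = block[s_n1] + abs(e_marked)
    return marked, used


def ipvo_extract_max(marked):
    """Inverse of ipvo_embed_max; returns (original_block, bit_or_None)."""
    s_n, s_n1, u, v = _top_two(marked)
    e_marked = marked[u] - marked[v]
    if e_marked in (1, 2):
        e, bit = 1, e_marked - 1
    elif e_marked in (0, -1):
        e, bit = 0, -e_marked
    elif e_marked > 2:
        e, bit = e_marked - 1, None
    else:
        e, bit = e_marked + 1, None
    block = list(marked)
    block[s_n] = marked[s_n1] + abs(e)
    return block, bit
```

With the block `[160, 160, 158, 157]`, the two largest pixels are equal, so plain PVO would skip the block, whereas the sketch above obtains an error of 0 and embeds a bit.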

PVO-Based Pairwise PEE.
Despite its superiority over PVO-based PEE, the performance of IPVO-based PEE is still limited by considering only the largest two pixels. By involving the third largest pixel, $e_{\max}$ is reinterpreted as $e_{\max} = e^1_{\max} - e^2_{\max}$ in PVO-based pairwise PEE [30], where $e^1_{\max}$ and $e^2_{\max}$ are computed as

$$e^1_{\max} = p_u - p_{\sigma(n-2)}, \qquad e^2_{\max} = p_v - p_{\sigma(n-2)}. \quad (6)$$

Obviously, the expandable errors $e_{\max} \in \{0, 1\}$ in IPVO-based PEE turn into the error pairs $\{(e^1_{\max}, e^2_{\max}) \mid e^1_{\max} - e^2_{\max} \in \{0, 1\}\}$. In this way, the modification rule of IPVO-based PEE is described as a 2D mapping in Figure 1(a), for which only one pairing error is modifiable.
To better utilize the largest three pixels, another 2D mapping is proposed as shown in Figure 1(b). Correspondingly, $p_u$ and $p_v$ will be modified after embedding. For this new mapping, six types of error pairs or error-pair transforms are defined as shown in Figure 2. Among them, the Type-D error pair is the most valuable one since it introduces the least distortion per embedded bit, followed by Type-A, Type-B, Type-C, Type-E, and Type-F error pairs, respectively. Based on such consideration, it is necessary to optimize the distribution of the various types of error pairs. For example, a Type-D error pair consists of two expandable errors. As we can learn from IPVO, more expandable errors can be generated with pixel location involved.
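The decomposition of the IPVO error into a pair can be checked with a short Python sketch. It assumes the pair of [30] is formed by subtracting the third largest pixel from $p_u$ and $p_v$, so that the difference of the two pairing errors equals the IPVO error $p_u - p_v$; the function name `pairwise_errors_max` is hypothetical, and the 2D mappings of Figure 1 are not reproduced.

```python
def pairwise_errors_max(block):
    """Return (e1, e2) for the three largest pixels of a flat block."""
    # Stable sort implements the tie rule of the pixel value ordering.
    order = sorted(range(len(block)), key=lambda i: block[i])
    s_n, s_n1, s_n2 = order[-1], order[-2], order[-3]
    u, v = min(s_n, s_n1), max(s_n, s_n1)
    e1 = block[u] - block[s_n2]            # p_u minus third largest pixel
    e2 = block[v] - block[s_n2]            # p_v minus third largest pixel
    return e1, e2


e1, e2 = pairwise_errors_max([162, 160, 161, 159])
# e1 - e2 reproduces the IPVO error p_u - p_v of the same block.
print(e1, e2, e1 - e2)
```

Here the pair is (2, 1) and the difference is 1, i.e., an error that IPVO would treat as expandable.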

Proposed Scheme
The proposed method is presented in this section. Firstly, the framework of two-layer embedding is introduced and a new calculation of local complexity is presented. Then, error-pair generation and adaptive definition of spatial location are introduced. Finally, multiple pairwise PEE and implementation details are presented.

Two-Layer Embedding.
Existing PVO-based schemes are generally implemented by processing blocks in raster-scan order, i.e., from top to bottom and left to right. For each block, its right and bottom neighbors are always recovered before it at the decoder such that they can serve as its context. To obtain a better context and thereby achieve better pixel block selection, two-layer embedding is introduced into the proposed scheme. As shown in Figure 3(a), all pixels except boundary ones are divided into nonoverlapped blocks denoted by "shadow" and "blank," respectively. During data embedding, shadow blocks are first embedded with half of the secret data, and blank blocks are then embedded with the remaining half. At the decoder, blank blocks are first recovered after data extraction. Then, shadow blocks are processed similarly.
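The division into two layers can be sketched as a checkerboard of blocks. The Python sketch below is only an assumption about the layout of Figure 3(a): it supposes a one-pixel boundary (the hypothetical `border` parameter), a checkerboard assignment in which each shadow block has four blank neighbors, and it ignores leftover pixels that do not fill a whole block.

```python
def split_layers(height, width, h, w, border=1):
    """Return (shadow, blank): lists of top-left coordinates of h x w blocks."""
    shadow, blank = [], []
    rows = (height - 2 * border) // h      # whole blocks that fit vertically
    cols = (width - 2 * border) // w       # whole blocks that fit horizontally
    for br in range(rows):
        for bc in range(cols):
            top_left = (border + br * h, border + bc * w)
            # Checkerboard: blocks whose indices sum to an even number
            # form the shadow layer, the others form the blank layer.
            if (br + bc) % 2 == 0:
                shadow.append(top_left)
            else:
                blank.append(top_left)
    return shadow, blank


shadow, blank = split_layers(10, 10, 4, 4)
print(shadow, blank)
```

Under this layout the encoder would embed into all blocks of `shadow` first and then into `blank`, and the decoder would process the two lists in the opposite order.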
Referring to Figure 3(a), with the nearest four blank blocks serving as the context of a shadow block, a full-enclosed context is constructed. The first advantage of the new framework is that the local complexity of each block can be computed more precisely. In this paper, a complexity measurement $NL_i$ is computed from the absolute differences between consecutive context pixels. As shown in Figure 3(b), $NL_i$ is computed as the sum of all these differences.
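A minimal Python sketch of such a complexity measure is given here. Which pixels form the context and in which order they are visited is fixed by Figure 3(b), which is not reproduced, so the sketch simply takes an already ordered list of context pixels as input; the function name `local_complexity` is hypothetical.

```python
def local_complexity(context_pixels):
    """Sum of absolute differences between consecutive context pixels."""
    return sum(abs(b - a) for a, b in zip(context_pixels, context_pixels[1:]))


# A smooth context gives a small value, a textured one a large value.
print(local_complexity([180, 182, 181, 181, 185]))
print(local_complexity([120, 200, 90, 210, 60]))
```

Because the context pixels belong to the other layer, the decoder can recompute the same value before recovering the block.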

Adaptive Error-Pair Generation.
For each selected shadow or blank block, the largest three pixels are utilized to generate an error pair. It has been verified by IPVO that more expandable errors can be derived from two correlated pixels by predicting the one having the relatively smaller location. Therefore, with $u = \min(\sigma(n-1), \sigma(n))$ and $v = \max(\sigma(n-1), \sigma(n))$, two pairing errors $e^1_{\max}$ and $e^2_{\max}$ are computed as

$$e^1_{\max} = p_{\min(u, \sigma(n-2))} - p_{\max(u, \sigma(n-2))}, \qquad e^2_{\max} = p_{\min(v, \sigma(n-2))} - p_{\max(v, \sigma(n-2))}. \quad (7)$$

According to IPVO, we have expandable errors $e^1_{\max} \in \{0, 1\}$ and $e^2_{\max} \in \{0, 1\}$ here. By combining them into a pair for pairwise PEE, the initial 2D mapping is obtained and shown in Figure 4.
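The location-aware pair can be sketched in Python as follows. The sketch assumes that each pairing error follows the IPVO principle, i.e., the pixel at the smaller location minus the pixel at the larger location, applied once to $p_u$ and the third largest pixel and once to $p_v$ and the third largest pixel. The function names are hypothetical, and list indices stand in for pixel locations.

```python
def ipvo_error(block, a, b):
    """Pixel at the smaller location minus pixel at the larger location."""
    lo, hi = min(a, b), max(a, b)
    return block[lo] - block[hi]


def error_pair_max(block):
    """Location-aware error pair from the three largest pixels of a block."""
    # Stable sort implements the tie rule of the pixel value ordering.
    order = sorted(range(len(block)), key=lambda i: block[i])
    s_n, s_n1, s_n2 = order[-1], order[-2], order[-3]
    u, v = min(s_n, s_n1), max(s_n, s_n1)
    e1 = ipvo_error(block, u, s_n2)        # pairs p_u with the third largest
    e2 = ipvo_error(block, v, s_n2)        # pairs p_v with the third largest
    return e1, e2


print(error_pair_max([160, 160, 160, 160]))
print(error_pair_max([161, 161, 160, 150]))
print(error_pair_max([162, 160, 161, 159]))
```

In the last block the third largest pixel lies between the two largest ones in location, so the first error is positive and the second is nonpositive.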
To compare the new 2D mapping in Figure 4 (denoted by $f$) with the original one in Figure 1(b) (denoted by $f_o$), the image Lena is divided into blocks of size 4 × 4. Then, the numbers of all types of error pairs are computed and presented in Table 1. As shown, the new 2D mapping $f$ not only improves capacity but also introduces less distortion per embedded bit.
According to $f$, all of $(0, 0)$, $(1, 0)$, and $(1, 1)$ are Type-D error pairs. Specifically, the correlation between $p_{\sigma(n-2)}$ and $p_u$ is exploited to generate $e^1_{\max}$, while the one between $p_{\sigma(n-2)}$ and $p_v$ is exploited to generate $e^2_{\max}$. To generate more expandable $e^1_{\max}$ or $e^2_{\max}$ such that more Type-D error pairs can be obtained, the probability of $u < v < \sigma(n-2)$ should be promoted. To accomplish this, we propose to adaptively define the spatial location of pixels within a block. Suppose the cover image (denoted by $I$) is divided into a set of $h \times w$ sized blocks. For each block, we propose to use horizontal and vertical gradients to estimate the distribution of pixels within the block and thus determine how to collect them. With the upper-left pixel located at $(i_0, j_0)$, the horizontal gradient $G_h$ and the vertical gradient $G_v$ are computed. For example, every pixel is quite likely to have a right neighbor with a smaller gray value when $G_h > 0$. In this case, we should collect pixels from left to right so as to ensure that a pixel with a relatively smaller location has a relatively larger gray value. Similarly, every pixel is quite likely to have a bottom neighbor with a smaller gray value when $G_v > 0$. In that case, we should collect pixels from top to bottom. By combining $G_h$ and $G_v$, eight modes of spatial location definition are summarized and presented in Figure 5, taking a block of size 3 × 4 as an example. Moreover, how to determine the optimal mode is also given in Figure 5.
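The two stated collection rules can be illustrated with a Python sketch. Everything beyond those two rules is an assumption: the gradients are taken here as sums of left-minus-right and top-minus-bottom differences over a region (the exact definitions of $G_h$ and $G_v$ are not reproduced), nonpositive gradients are assumed to reverse the direction, and only row-major collection is shown, so the sketch covers four orderings rather than the eight modes of Figure 5. In the scheme the gradients would have to come from pixels that the decoder can also see; here the region is simply passed in.

```python
def gradients(region):
    """Hypothetical gradients: positive when values fall to the right / downward."""
    g_h = sum(row[j] - row[j + 1] for row in region for j in range(len(row) - 1))
    g_v = sum(region[i][j] - region[i + 1][j]
              for i in range(len(region) - 1)
              for j in range(len(region[0])))
    return g_h, g_v


def collect_pixels(block, g_h, g_v):
    """Collect pixels so that earlier positions tend to hold larger values."""
    rows = block if g_v > 0 else block[::-1]          # top-to-bottom if g_v > 0
    return [p for row in rows
            for p in (row if g_h > 0 else row[::-1])]  # left-to-right if g_h > 0


block = [[1, 2], [3, 4]]
g_h, g_v = gradients(block)
print(g_h, g_v, collect_pixels(block, g_h, g_v))
```

For this block both gradients are negative, so pixels are collected from bottom to top and right to left, which places the largest values at the smallest locations.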

Security and Communication Networks
Figure 3: Image division and complexity measurement.

Table 2 presents the performance comparison after applying adaptive mode. According to Tables 1 and 2, $f_o$ obtains more Type-A error pairs but fewer Type-E error pairs or, specifically, more error pairs $(1, 0)$ but fewer error pairs $(0, 1)$. This can be explained by the higher probability of $p_u > p_v$ brought by adaptive mode. As a result, the capacity increases whereas the distortion decreases. It is also seen that adaptive mode is even more beneficial for $f$. Specifically, we have more Type-B, Type-C, and Type-D error pairs but fewer Type-F error pairs. So far, with the improvement in error-pair generation, the capacity has increased from 9,198 bits to 11,219 bits. Although the total distortion is larger, the proposed scheme introduces less distortion per embedded bit; i.e., the original performance is measured by 2.338, whereas the new one is measured by 2.213.

Multiple Pairwise PEE.
Notice that the correlation between $p_u$ and $p_v$ has been neglected so far. For $\sigma(n-2)$, $u$, and $v$, their order can be one of the following three: $\sigma(n-2) < u < v$, $u < \sigma(n-2) < v$, or $u < v < \sigma(n-2)$, the last of which is Case (3). In Case (3), we have $e^1_{\max} = p_u - p_{\sigma(n-2)} \ge 1$ and $e^2_{\max} = p_v - p_{\sigma(n-2)} \ge 1$. By additionally considering the correlation between $p_u$ and $p_v$, one can see that $e^1_{\max}$ is shifted in the case of $p_u > p_v$. In this situation, how $e^2_{\max}$ is utilized for data embedding should be related to $e^1_{\max}$ considering the similarity between them; i.e., the larger $e^1_{\max}$ is, the more likely $e^2_{\max}$ is shifted. In this paper, we propose to adaptively determine the predicted value of $p_v$ and recalculate $e^2_{\max}$ as a new error $e^N_{\max}$ with a parameter $\eta$. To fulfill data embedding, $e^N_{\max}$ is modified according to (11), and $p_v$ is modified accordingly by (12). The predicted value depends on whether $p_u - \eta > p_{\sigma(n-2)}$ (i.e., $e^1_{\max} = p_u - p_{\sigma(n-2)} > \eta$). That is to say, if $e^1_{\max} \le \eta$, the correlation between $p_{\sigma(n-2)}$ and $p_v$ is likely to produce an expandable error. Otherwise, we turn to utilize the correlation between $p_v$ and $p_u$ with $p_u - \eta$ serving as the predicted value of $p_v$. By varying $\eta$ to cover every similarity between pairing errors, pairwise PEE becomes more comprehensive.
Similarly, $e^2_{\max}$ is shifted in the case of $p_v > p_u$. In that situation, $p_v - \eta$ serves as another candidate predicted value of $p_u$ and $e^1_{\max}$ is recalculated in the same manner. Then, the marked error $e^{N\prime}_{\max}$ is obtained according to (11) and $p_u$ is similarly modified according to (12). Given that $p_v - \eta > p_{\sigma(n-2)}$ (i.e., $e^2_{\max} > \eta$), we have an expandable error when $p_u - (p_v - \eta) = 1$. That is to say, one data bit will be embedded into the error pairs $\{(e^1_{\max}, e^2_{\max}) \mid e^2_{\max} - e^1_{\max} = \eta - 1\}$. This is equivalent to expansion-bin selection. Table 3 presents the evolution of all types of error pairs from Case (3). The same evolution can be performed on error pairs from Cases (1) and (2) as well. In this way, the original 2D mapping is extended into $f_\eta$ for $\eta \ge 2$. One can verify that $f_\eta$ includes the original 2D mapping $f$ as a special case when $\eta = 255$.
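The effect of the parameter can be illustrated with a small Python sketch restricted to Case (3). It assumes, following the description above, that the predicted value of $p_v$ is the third largest pixel when the first pairing error does not exceed the parameter, and $p_u$ minus the parameter otherwise. The function name `recalc_e2` and its arguments are hypothetical, and the modification rules (11) and (12) are not reproduced.

```python
def recalc_e2(p_u, p_v, p_third, eta):
    """Recalculated second pairing error in Case (3), i.e. u < v < sigma(n-2)."""
    e1 = p_u - p_third
    # Small e1: keep predicting p_v from the third largest pixel.
    # Large e1: predict p_v from p_u lowered by eta instead.
    predicted = p_third if e1 <= eta else p_u - eta
    return p_v - predicted


# Original pair (e1, e2) = (4, 3): e2 would only be shifted.
print(recalc_e2(104, 103, 100, eta=2))     # becomes 1, i.e. expandable
print(recalc_e2(104, 103, 100, eta=255))   # stays 3, same as without eta
```

With the parameter set to 2, the pair (4, 3) satisfies $e^1_{\max} - e^2_{\max} = \eta - 1$ and the recalculated error equals 1, which shows how choosing the parameter amounts to choosing the expansion bin.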

Multiple Pairwise PEE Embedding.
For each block, the minimum pixels are also utilized for data embedding. Let $u = \max(\sigma(1), \sigma(2))$ and $v = \min(\sigma(1), \sigma(2))$; another two pairing errors $e^1_{\min}$ and $e^2_{\min}$ are calculated from the smallest three pixels, with index definitions such as $r_1 = \min(u, \sigma(3))$. $(e^1_{\min}, e^2_{\min})$ is utilized in the same way as $(e^1_{\max}, e^2_{\max})$, and only $p_{\sigma(1)}$ and $p_{\sigma(2)}$ are reduced. By counting the frequency of error pairs $(e^1_{\max}, e^2_{\max})$ and $(e^1_{\min}, e^2_{\min})$ derived from blocks with a given complexity, a 2D histogram sequence $h_n$ for $n \ge 0$ is defined as

$$h_n(e_1, e_2) = \#\{e^1_{\max/\min} = e_1,\ e^2_{\max/\min} = e_2,\ NL_i = n\}. \quad (15)$$

So far we have the histogram sequence $h_n$ ($n \ge 0$) and the mapping sequence $f_\eta$ ($\eta \ge 2$). The final step is the selection of histograms and mappings and the determination of the parameters $s(n)$, where $s(n)$ indicates that the histogram $h_n$ is to be processed by the mapping $f_{s(n)}$. For histogram selection, histograms with low complexity are preferentially selected. For mapping selection, any combination of mappings can be attempted. Suppose that $T$ mappings $\{f_{\eta_0}, f_{\eta_1}, \ldots, f_{\eta_{T-1}}\}$ ($\eta_0 < \eta_1 < \cdots < \eta_{T-1}$) are activated; $s(n)$ takes values from $\eta_0$ to $\eta_{T-1}$, or 0 to indicate that histogram $h_n$ is excluded. The determination of the optimal $s(n)$ is described as follows. Firstly, the capacity-distortion performance of $f_\eta$ in manipulating histogram $h_n$ is considered. Let $A_\eta, B_\eta, C_\eta, D_\eta, E_\eta, F_\eta$ denote the types of error pair defined by $f_\eta$. The capacity and distortion are denoted by $EC_{pw}(n, \eta)$ and $ED_{pw}(n, \eta)$, respectively. Then, the complete determination procedure is presented in Algorithm 1.
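The histogram sequence can be built with a few lines of Python. The sketch below only counts error pairs per complexity value; the input format, a list of `(complexity, (e1, e2))` tuples, and the function name `build_histograms` are assumptions, and neither the capacity and distortion formulas nor Algorithm 1 is reproduced.

```python
from collections import Counter, defaultdict


def build_histograms(samples):
    """Map each complexity n to a Counter of error pairs (e1, e2)."""
    hist = defaultdict(Counter)
    for n, pair in samples:
        hist[n][pair] += 1                 # one more pair (e1, e2) at level n
    return hist


samples = [
    (0, (0, 0)), (0, (0, 0)), (0, (1, 0)),   # pairs from smooth blocks
    (3, (2, 1)),                              # pair from a rougher block
]
h = build_histograms(samples)
print(h[0][(0, 0)], h[0][(1, 0)], h[3][(2, 1)])
```

Both the maximum-side and the minimum-side pair of a block would be added under the same complexity value, after which each histogram can be assigned its own mapping.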

Implementation of the Proposed Scheme.
With a specific block size and parameter $T$, all pixels except boundary ones are divided into shadow blocks and blank blocks. Then, the embedding and extraction procedures for the shadow layer are given below. Notice that the shadow and blank layers are embedded equally; i.e., each layer is embedded with half of the secret message. Firstly, to solve the problem of overflow/underflow, pixels are processed in raster-scan order by changing those valued 0 (255) into 1 (254). Here we use a location map to record whether a pixel has been changed or not. Then, the location map is compressed using arithmetic coding, and we use $L_{clm}$ to denote its length.
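The preprocessing step can be sketched in Python as follows. The sketch assumes one common convention for the location map: a bit is recorded only for pixels that end up valued 1 or 254, with 1 meaning "changed" and 0 meaning "originally 1 or 254". The function names are hypothetical, pixels are given as a flat list in raster-scan order, and the arithmetic coding of the map is omitted.

```python
def preprocess(pixels):
    """Clamp 0 -> 1 and 255 -> 254; return (pixels, location_map)."""
    out, location_map = [], []
    for p in pixels:
        if p == 0:
            out.append(1)
            location_map.append(1)         # changed pixel
        elif p == 255:
            out.append(254)
            location_map.append(1)         # changed pixel
        elif p in (1, 254):
            out.append(p)
            location_map.append(0)         # ambiguous value, not changed
        else:
            out.append(p)
    return out, location_map


def restore(pixels, location_map):
    """Undo preprocess on pixels that hold the preprocessed values."""
    bits = iter(location_map)
    out = []
    for p in pixels:
        # A map bit is consumed only for the ambiguous values 1 and 254.
        if p in (1, 254) and next(bits) == 1:
            out.append(0 if p == 1 else 255)
        else:
            out.append(p)
    return out


clamped, lm = preprocess([0, 1, 128, 255, 254])
print(clamped, lm, restore(clamped, lm))
```

For natural images few pixels take these extreme values, so the map is short and compresses well.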
Next, the optimal $s(n)$ is determined by the capacity requirement, which covers the embedding of both the secret message and the auxiliary information. Suppose that histograms with $n < M$ are selected; the transmission of the parameters $s(n)$ requires a capacity of $M \times \log_2 T$ bits. Then, the least significant bits (LSBs) of $50 + M \times \log_2 T$ pixels from the first few blocks are recorded. These LSBs and the location map will be embedded along with half of the secret message as the payload.
With the first few blocks skipped, shadow blocks are successively processed to embed the payload. For each block, the local complexity $NL_i$ and two error pairs $(e^1_{\max}, e^2_{\max})$ and $(e^1_{\min}, e^2_{\min})$ are first calculated. If $s(NL_i) \ge 2$, these two error pairs are processed by the 2D mapping $f_{s(NL_i)}$; otherwise, this block is skipped. After embedding the whole payload, the auxiliary information will be embedded by replacing the LSBs of the first $50 + M \times \log_2 T$ pixels. The auxiliary information can be decomposed as follows:
(i) Block size $h \times w$ (4 bits)
(ii) Parameters $T$ (4 bits) and $M$ (10 bits)
(iii) Index of the last processed block (16 bits)
(iv) Length of the location map (16 bits)
(v) Parameters $s(n)$ ($M \times \log_2 T$ bits)
At the decoder, the auxiliary information is first retrieved from the LSBs of $50 + M \times \log_2 T$ pixels of the first few shadow blocks. Next, with the first few blocks skipped, shadow blocks are successively processed. For each block, the local complexity $NL_i$ and two marked error pairs $(e^{1\prime}_{\max}, e^{2\prime}_{\max})$ and $(e^{1\prime}_{\min}, e^{2\prime}_{\min})$ are calculated. If $s(NL_i) \ge 2$, these two error pairs are processed by the inverse 2D mapping of $f_{s(NL_i)}$ to extract the embedded data and recover the original pixel values; otherwise, this block is skipped. Finally, the LSB sequence, the compressed location map, and the hidden message are retrieved from the extracted data bits. With the LSB sequence, the LSBs of $50 + M \times \log_2 T$ pixels of the first few shadow blocks are recovered. The location map is decompressed and used to restore the pixels valued 1 (254) that were changed during preprocessing.
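The size of the auxiliary information listed above follows from simple arithmetic, sketched here in Python. The field names in `FIXED_FIELDS` and the function `header_bits` are hypothetical labels for the five items; the sketch assumes $T$ is a power of two, as in the experiments where it is 2, 4, or 8, so that $\log_2 T$ is an integer.

```python
# Fixed-width fields of the auxiliary information, in bits.
FIXED_FIELDS = {
    "block_size": 4,
    "T": 4,
    "M": 10,
    "last_block_index": 16,
    "location_map_length": 16,
}


def header_bits(T, M):
    """Total number of LSBs replaced by the auxiliary information."""
    if T < 2 or T & (T - 1):
        raise ValueError("this sketch assumes T is a power of two")
    bits_per_s = T.bit_length() - 1        # log2(T) for a power of two
    return sum(FIXED_FIELDS.values()) + M * bits_per_s


print(sum(FIXED_FIELDS.values()))          # the constant 50 in the text
print(header_bits(T=2, M=53))
```

The fixed fields add up to the 50 bits that appear in the expression $50 + M \times \log_2 T$.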

Experimental Results
To demonstrate the performance of the proposed scheme, several experiments are conducted in this section. The test images are several gray-scale images of size 512 × 512. Except Barbara, they are all downloaded from the USC-SIPI image database. To achieve the best performance, the embedding procedure is conducted for various block sizes with $h, w \in \{2, 3, 4, 5\}$. In terms of activated mappings, we consider only the combination of mappings with $\eta \in \{2, 3, \ldots, T + 1\}$, where $T$ is set as 2, 4, and 8. To verify the performance of the proposed scheme, some embedding parameters are presented in Table 4. As shown, for a given EC of 10,000 bits, the image Lena is divided into 4 × 3 sized blocks. At the error generation stage, mode-8 is the most popular one; it is chosen by about 16.5% of blocks in the shadow layer and about 16.1% of blocks in the blank layer. During shadow-layered embedding, histograms with $n < 53$ are selected, of which 34 are processed by $f_2$ and 19 are processed by $f_3$. During blank-layered embedding, histograms with $n < 56$ are selected, of which 45 are processed by $f_2$ and 11 are processed by $f_3$. As a result, the highest PSNR reaches 61.15 dB.

Figure 7: Performance comparison with related schemes including Sachnev et al. [17], pairwise PEE [27], PVO-based pairwise PEE [30], multipass PVO-based pairwise PEE [31], and dynamic IPVO [35].

Then, the proposed scheme is compared with five state-of-the-art schemes [17,27,30,31,35], as shown in Figure 7. Notice that rhombus prediction is adopted by [17,27] while PVO prediction is adopted by [30,31,35] and the proposed scheme. Figure 7 shows that a larger maximum EC is always guaranteed by rhombus prediction. However, PVO-based schemes are generally better in fidelity when the capacity is moderate. Rhombus prediction is first utilized in 1D PEE [17]. Later, it is incorporated with pairwise PEE to realize 2D PEE [27]. As shown in Figure 7, the comparison of [17,27] indicates that significant performance improvement is brought by the incorporation with pairwise PEE.
PVO-based pairwise PEE is first proposed in [30]. Figure 7 shows that, for some smooth images with large capacity (e.g., Lena with an EC larger than 35,000 bits and Barbara with an EC larger than 27,000 bits), [27] outperforms [30]. In other cases, [30] always outperforms [27]. The proposed scheme can also be regarded as an improved PVO-based pairwise PEE. As shown in Figure 7, the proposed scheme always outperforms [30]. According to Tables 5 and 6, the average gains of the proposed scheme over [30] are 0.52 and 0.63 dB for capacities of 10,000 and 20,000 bits, respectively. The proposed scheme also outperforms [27] in almost all cases. Rhombus prediction guarantees very large capacity on the smooth images Lena and Barbara. As a result, on these images the proposed scheme only outperforms [27] for an EC not larger than 44,000 and 35,000 bits, respectively.
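The fidelity figures compared in this section are PSNR values in dB. For reference, a minimal Python sketch of the usual PSNR definition for 8-bit images is given here; it assumes a peak value of 255 and images supplied as flat lists of equal length, and the function name `psnr` is hypothetical.

```python
import math


def psnr(original, marked):
    """PSNR in dB between two equally sized 8-bit images (flat lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, marked)) / len(original)
    if mse == 0:
        return math.inf                    # identical images
    return 10 * math.log10(255 ** 2 / mse)


# Changing one pixel out of four by one gray level.
print(round(psnr([100, 100, 100, 100], [101, 100, 100, 100]), 2))
```

Since each embedding step changes a pixel by at most one gray level per layer, fewer modified pixels for the same payload translate directly into a higher PSNR.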
An improved incorporation of pairwise PEE and PVO prediction is proposed in [31] and it motivates the strategy of multiple pairwise PEE. In [31], the original 2D mapping of [30] is extended to two new ones according to local complexity. However, the gain is limited and the reason mainly lies in that only error pairs consisting of shiftable error are involved in the 2D mapping evolution. The proposed scheme, by contrast, achieves rather significant performance improvement over [30]. Referring to Tables 5 and 6, the proposed scheme outperforms [31] by 0.33 dB on average for an EC of 10,000 bits and 0.43 dB on average for an EC of 20,000 bits. The spatial correlation of pixels in a block is similarly exploited in [35]. Although the obtained errors are modified individually, the advantage of [35] is to fully exploit block redundancy by predicting as many pixels as possible. As shown in Figure 7, the proposed scheme outperforms [35] in almost all cases. Referring to Tables 5 and 6, the proposed scheme outperforms [35] by 0.24 dB on average for an EC of 10,000 bits and 0.4 dB on average for an EC of 20,000 bits. Furthermore, there is also significant superiority of the proposed scheme over [35] in capacity.
In summary, the proposed scheme mainly aims to present a better incorporation of pairwise PEE and PVO prediction technologies. Specifically, the original PVO-based pairwise PEE [30] is extended into adaptive embedding from two aspects: adaptive error-pair generation and adaptive error-pair modification. To better verify the advantage of adaptive embedding, the comparison of the proposed scheme with [30] on the Kodak image database, which contains several 512 × 768 or 768 × 512 sized images, is shown in Figure 8. It shows that the proposed scheme achieves significant superiority over [30] on all kinds of images.
Finally, Table 7 gives the main notations used in this paper. It is worth mentioning that, although improvement is achieved through adaptive error-pair generation and adaptive error-pair modification, the proposed scheme still uses only the largest/smallest three pixels in a block for data embedding. In future work, we will focus on how to predict more pixels in a block and obtain more error pairs.

Conclusion
In this paper, a novel RDH scheme is proposed by presenting a better incorporation of pairwise PEE and PVO prediction. For error-pair generation, two-layer embedding is applied to obtain full-enclosed pixels. Then, full-enclosed pixels are used to estimate the distribution of pixels within a block and to realize adaptive spatial location definition. In this way, the error-pair generation becomes adaptive, and the distribution of error pairs is thus optimized. For error-pair modification, pairwise PEE is extended to multiple pairwise PEE, with which the 2D histogram can be divided into a set of subhistograms for more accurate modification. The experimental results have shown that the proposed scheme outperforms previous state-of-the-art schemes by improving the marked image quality.

Data Availability
Data are available from the corresponding author upon request.

Table 7: Main notations used in this paper.
Notation | Description
$f$, $f_o$ | 2D mappings in Figures 4 and 1(b)
$f_\eta$ | The parameterized 2D mapping
$h_n$ | 2D histogram sequence
$EC_{pw}(n, \eta)$ | Capacity performance of using $f_\eta$ to modify $h_n$
$ED_{pw}(n, \eta)$ | Distortion performance of using $f_\eta$ to modify $h_n$
$L_{clm}$ | Length of the compressed location map

Conflicts of Interest