High Capacity Reversible Watermarking for Audio by Histogram Shifting and Predicted Error Expansion

Being reversible, the watermarking information embedded in audio signals can be extracted while the original audio data can achieve lossless recovery. Currently, the few reversible audio watermarking algorithms are confronted with following problems: relatively low SNR (signal-to-noise) of embedded audio; a large amount of auxiliary embedded location information; and the absence of accurate capacity control capability. In this paper, we present a novel reversible audio watermarking scheme based on improved prediction error expansion and histogram shifting. First, we use differential evolution algorithm to optimize prediction coefficients and then apply prediction error expansion to output stego data. Second, in order to reduce location map bits length, we introduced histogram shifting scheme. Meanwhile, the prediction error modification threshold according to a given embedding capacity can be computed by our proposed scheme. Experiments show that this algorithm improves the SNR of embedded audio signals and embedding capacity, drastically reduces location map bits length, and enhances capacity control capability.


Introduction
Reversible watermarking is a technique that means after watermark information was embedded into the host data, original data can be restored while watermark information is extracted through a series of processes. Because the reversible watermarking schemes can achieve lossless recovery, they can be widely used in military communication, medical image, and digital forensics fields [1][2][3]. At present, the reversible watermarking technology research mainly concentrates on image [4][5][6]. Tian [7] proposed an image reversible watermarking algorithm using difference expansion technique. This algorithm first applied an integer wavelet transform on two adjacent pixels in the image and then embedded the watermark data in host pixel by expanding the difference of adjacent pixel. Finally, the algorithm restored pixel data through the inverse integer transform. In [8], a reversible watermarking algorithm was presented using prediction error expansion. First, the algorithm predicted a pixel value according to three pixels adjacent to the current pixel and then extended the difference between prediction and current value to achieve watermark information embedding. The algorithm has higher embedding capacity capability and better quality of embedded image than Tian's scheme. Digital speech is one of the important forms of digital media, but reversible watermarking algorithm for digital audio is relatively less. Reversible audio watermarking schemes mainly face the following problems: (1) audio in WAV format is a floating number ranged from −1 to 1. In order to transform these data, we must map the data to integer space; (2) human audio system is more sensitive than the human visual system, so even slight alteration of audio signal can easily be perceived; (3) the process of audio signal is much trickier than digital image [9]. In [10], Yan and Wang proposed a reversible audio watermarking scheme based on prediction error expansion method. First, three previous samples of the audio sample were multiplied by integer coefficients individually and then the sum of three results was used to 2 The Scientific World Journal predict the current sample. Second, watermarking information was embedded by expanding the difference between the prediction and current amplitude. This algorithm mapped the audio data to integer interval and achieved recovery of original audio signals. In the algorithm, a location map was used to mark the samples without watermark bits. However, the method suffers from lack of embedding capacity due to large auxiliary embedded location information. In [11], a reversible audio data hiding algorithm based on linear prediction and error expansion was present by A. Nishimura. Relative to [10], this paper improved the prediction coefficients using Burg method and obtained a higher SNR and embedding capacity. However, its capacity has the same drawback with [10] and capacity control capability was not considered.
Differential evolution (DE) algorithm is an optimization algorithm based on swarm intelligence. It can simulate the biological natural selection and evolution. This algorithm was first present by Stron and Price in 1995 [12]. DE algorithm can solve multiparameters optimization without depending on the continuity of objective function and the conductivity of the constraints. Meanwhile, this algorithm has good robustness and universality. It is widely used in many fields [13][14][15][16].
The application of histogram to reversible watermarking technique first appeared in image field. Ni et al. [17] proposed a reversible watermarking algorithm based on histogram shifting, where watermark is embedded into the histogram bins. A novel watermarking technique has been presented by Thodi and Rodrígez in [18], where location information is dramatically compressed by using histogram shifting technique. To our best knowledge, however, reversible audio watermarking based on histogram technique has not been reported yet. So, we cannot apply the histogram technique of image field to audio data directly.
In this paper, we proposed a novel reversible audio watermarking scheme based on improved prediction error expansion and histogram shifting which combined with the characteristics of digital audio data. Different from using the fixed set of linear prediction coefficients, we propose the optimization model of prediction coefficients and then use the differential optimization algorithm to determine the optimal linear prediction coefficients, so that the overall prediction errors are as small as possible. In addition, according to the characteristics of audio data, we introduced histogram shifting technique which can dramatically reduce the embedding location information. Meanwhile, we present the method which can compute the prediction error modification threshold according to a given embedding capacity. The experiments show that this algorithm can enhance capacity control capability.
The organization of this paper is as follows. In Section 2, we propose prediction error expansion based on differential evolution. Section 3 describes how information embedding and extraction are done by using histogram shifting and prediction error expansion. The experiment results and analysis are presented in Sections 4 and 5, respectively. The conclusions are in Section 6.

Prediction Error Expansion Based on Differential Evolution
2.1. Differential Evolution Algorithm. Differential evolution algorithm has the advantages of high efficiency, good convergence, and robustness. It can perform well when deal with multiparameter optimization problem, especially when the solution space is stochastic. Differential evolution algorithm can be divided into the following operations.
(1) The mutation: the mutation of DE is driven by the difference between the parent individuals; the mutation of vector is as follows: where 1 , 2 , 3 are uniform random integers ∈ [1, ], ∈ [0, 2] represents the control parameters, and means the index of generation.
(2) Crossover: in the DE algorithm, the crossover operation guarantees the diversity of population. When applying crossover on mutant vector V +1 and current individual , we can get new vector +1 , . The formula of crossover operation is as follows: where ∈ (1, 2, . . . , ) and is the dimension of the problem, rand () means the random values ∈ [0, 1], and represents the probability of crossover operation. ∈ [1, 2, . . . ] is a randomly chosen value.
(3) Selection: the selection operation of DE algorithm is based on a greedy selection scheme. We choose the individual which has better fitness between the vector +1 and the individual of original population . The formula is described as follows: where (⋅) represents fitness function. Figure 1 presents a typical histogram of prediction errors for classic music. The prediction errors histogram for most natural audio would be similar to this. Consider a process of expansion embedding. The smaller the magnitude of the prediction errors, the smaller the distortion of the audio. In the sense, the prediction errors occur more frequently in the central region. Since embedding capacity and distortion performance are most important indicators for reversible audio watermarking scheme, the optimization aim is to maximize SNR of embedded audio and capacity. The formula is presented as follows:

The Optimization of Linear Prediction Coefficients Based on DE Algorithm.
The Scientific World Journal where 2 ( ) and 2 ( ) mean original and embedded audio, respectively, is the length of the audio data, and means the number of embedded bits.

Prediction Error Expansion Based on Histogram Shifting
3.1. Histogram Shifting. In [18], histogram of prediction error for a host data is divided into two nonoverlapping regions, the outer and the inner regions, which are shown in Figure 1.
Expansion of the prediction errors expands the histogram of the inner region. As is shown in Figure 2, after a process of expansion, the peak of the histogram has fallen by almost 50%. Because the histogram of prediction errors for audio does not strictly satisfy the continuity, the algorithm in [18], in which the determination of embedded location was based on symmetrical histogram, cannot be applied to audio reversible watermarking algorithm. In this paper, we assume two thresholds and , which can be used to control the boundaries of the inner region and outer region. We first set = 0, = 1. Given a capacity, we can determine the boundaries of inner region and outer region according to the Algorithm 1.
Where means all the prediction errors, function num (⋅) returns the index of the given value. In the process of determining the thresholds, we considered the cumulative impact by all expansive bins rather than only the current expansive bin.

Embedding and Extraction.
All steps of the watermark embedding procedure based on improved prediction error expansion and histogram shifting are shown in Figure 3. The details are described as follows. its neighborhood samples and as its predicted value. As in formula (5), (⋅) represents predictor: Step 2. Define as prediction error = − ; then we embedded watermarking bit by expansion of prediction error . The formula is as follows: Step 3. Determine the linear prediction coefficients based on DE algorithm. We can define (⋅) = round  proposed in Section 2, we can obtain optimal prediction coefficients = [ 1best , 2best , 3best ].
Step 4. Give capacity . We can obtain threshold , based on Algorithm 1.
Step 5. Embed watermarking information using histogram shifting and expansion of prediction errors. eml and emr are left and right boundaries of histogram. We can obtain inner region [ , ] and outer regions [ eml , ) ∪ ( , emr ].
Finally, embed the watermarking information using expansion of prediction errors or achieve location marking using histogram shifting. The formula is described as follows: Step 6. Build location map. Since ∈ [32767, −32768], in order to prevent overflow and underflow problems during embedding process, we assign a value "1" in location map when the above situations occur. Otherwise we assign a value "0". The location map should be lossless compressed. We use run-length coding and Huffman coding scheme.
In the process of decoding, we can extract the watermark information and restore host audio data. The details are as follows.
Step 1. The current audio sample is obtained from (5) by using restored host data ( ) ( > 3). If the value of location map is "1, " keep the audio sample unchanged. For prediction errors ranges in [2 , 2 + 1], we can extract the watermark bit and restore audio sample using = mod ( , 2) , Step 2. For histogram shifting, we can restore the audio data by the following formula:

Experiment Set
In this section, we choose five common audio types to demonstrate the performance of the proposed algorithm. All audio files are 16-bit mono in wave format and the sampling rates of them are 44.1 kHz. The detailed description is listed in Table 1. For optimization of linear prediction coefficients based on DE algorithm, we set the size of initial population to 65. The iteration is set to 50. The probability of crossover and mutation are 0.9 and 0.09, respectively.

Evaluation Method.
In order to evaluate the performance of the proposed algorithm, we choose two common indicators SNR and capacity. SNR is a statistical difference metric The Scientific World Journal  to measure the perceptual similarity between original audio and embedded audio. The formula is as follows: where ( ), ( ) are original audio sample and embedded audio sample, respectively. Capacity is used to measure how much information can be embedded into a host data. The formula is as follows: where means the required number of audio sample when embedding one bit watermark during embedding procedure = 1 .

The Experimental Results and Analysis
Using the ED algorithm settings in Section 4, the convergence curves of object function with DE algorithm are shown in Figure 5. The best fitness of an individual can converge fast to a stable value in the twenty-fifth generation. The comparison of SNR and capacity between embedded audio using the optimal linear prediction coefficients and best linear prediction coefficients proposed in [10] was shown in Table 2. We can see that the linear prediction coefficients generated by DE algorithm achieve better SNR and capacity than fixed coefficients for all the types of test audios in the same condition. Figure 6 presents the performance of the 64 prediction coefficient sets and optimal prediction coefficient set for classic audio. It can be seen that linear prediction coefficients obtained by DE have obvious advantage in SNR and capacity. The original and watermarked audio signals in time domain are shown in Figure 4; the difference between them is invisible.   Figure 7 graphically depicts the embedding capacity versus the embedded audio quality curve of five types of audio. We can see that the SNR of embedded audio decreased with the increase of embedding capacity. According to the given embedding capacity, we can conveniently determine and modify the threshold of prediction error. That enhances capacity control capability. In order to evaluate the performance of proposed algorithm based on histogram shifting, we use run-length coding and Huffman coding scheme to   compress the location map. Figure 8 presents the embedding capacity versus compressed location map length curve of five types of audio. The location map is generated using algorithm proposed in [11]. We can see that location map of algorithm compared has a large length and the compressibility of location map is low. The large auxiliary location map will take a lot of capacity. In this paper, we adopt histogram shifting to reduce the location map. As we know, we can map the audio data to integer ranged from −32768 to 32767. As the range of audio data is very wide, the overflow phenomenon rarely occurs. Meanwhile, the location map of proposed algorithm only needs overflow location information, so the length of location map can be reduced drastically. The experiments showed that the length of location map using proposed algorithm is negligible.

Conclusion
This paper proposed a novel reversible audio watermarking algorithm based on improved prediction error expansion and histogram shifting. As the performance of reversible audio scheme using prediction error expansion is affected by the predictor, we proposed the optimization of linear prediction coefficients using differential evolution algorithm. The experiments have shown that the proposed scheme can achieve better embedded audio quality and higher capacity. In order to reduce the length of location map, we introduced histogram shifting scheme. In addition, we proposed the scheme which can be used to compute prediction error modification threshold according to a given embedding capacity. The simulation results verify our scheme can drastically