A Fast DCT Algorithm for Watermarking in Digital Signal Processor

Discrete cosine transform (DCT) has been an international standard in the Joint Photographic Experts Group (JPEG) format to reduce the blocking effect in digital image compression. This paper proposes a fast discrete cosine transform (FDCT) algorithm that utilizes the energy compactness and matrix sparseness properties in the frequency domain to achieve higher computation performance. For a JPEG image of 8 × 8 block size in the spatial domain, the algorithm decomposes the two-dimensional (2D) DCT into one pair of one-dimensional (1D) DCTs, with the transform computed in only 24 multiplications. The 2D spatial data is a linear combination of the base images obtained by the outer product of the column and row vectors of cosine functions, so that the inverse DCT is as efficient. Implementation of the FDCT algorithm shows that embedding a watermark image of 32 × 32 pixel size in a 256 × 256 digital image can be completed in only 0.24 seconds, and that extraction of the watermark by inverse transform takes less than 0.21 seconds. The proposed FDCT algorithm is shown to be more efficient in computation than many previous works.


Introduction
Discrete cosine transform (DCT) has been widely used to convert a dynamic signal into frequency components so as to reduce digital image storage size, expedite data transmission, and remove redundant information. DCT is closely related to the discrete Fourier transform, with the advantage of concentrating the energy of the transformed signal in the low frequency range, so that the high frequency components, to which human eyes are less sensitive, can be coded coarsely in image processing [1]. The joint ISO committee therefore adopted DCT in the Joint Photographic Experts Group (JPEG) international standard, with 8 × 8 block size, to reduce the blocking effect in image compression. A basic JPEG image encoding is composed of three procedures: image transform, quantization, and encoding.
DCT can map original data into the frequency domain by cosine waveforms, and conversely the inverse discrete cosine transform (IDCT) transfers frequency domain data into the spatial domain. Numerous coding methods based on DCT have been presented for digital image processing; however, the associated memory size, bandwidth, and safety issues are of significant concern to real-time applications. Sun and Yang [2] proposed an image compression method based on a Laplace transparent composite model to achieve high coding efficiency. Jridi et al. [3] presented image compression hardware to reduce computational complexity. Others have proposed to optimize image computation by digital signal processor (DSP). Kumbhare and Gokhale [4] developed a low complexity architecture for computing an algebraic integer based 8-point DCT in digital image processing. Jridi et al. [5] designed a low complexity DCT engine for digital video and image processing. Subband decomposition algorithms based on DCT have also been used in transmitting low resolution image data to rebuild an image of better quality [6][7][8], but they required high complexity and thus time-consuming computation. Strassen's matrix multiplication algorithm was proposed to reduce complex matrix multiplication in DCT [9]. Khan et al. [10] increased the coordination between the pixel size and subword size to maximize resource utilization for multimedia applications, but the work required heavy computation. This paper proposes a fast DCT (FDCT) algorithm with a significantly reduced number of multiplications to achieve higher computation efficiency in digital image processing. It is also shown to be suitable for hardware implementation in DSP for digital watermarking applications.

DCT in JPEG
The basic JPEG image encoding method is composed of three procedures: image transform, quantization, and encoding. Figure 1 shows the encoder and decoder model, where an original image in the RGB model is first divided into blocks of 8 × 8 pixels, each block yielding 64 frequency coefficients indexed 0 to 63 as shown in Figure 2. The low frequency coefficients are in the heavy color region. During image processing, DCT maps the spatial domain data into the frequency domain by cosine waveforms, and the inverse discrete cosine transform maps it back [11]. The spatial domain indicates the "magnitude" of a color image, while the frequency domain shows the magnitude change from one pixel to the next. In DCT, the original host signal is first divided into nonoverlapping 2D blocks of size 8 × 8. Each block is then processed independently and transformed into DC and AC coefficients in the frequency domain, representing the average color of the block and the color change across the block, respectively.
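As an illustration of the block division step, the following Python sketch splits a single-channel image into nonoverlapping 8 × 8 blocks. The function name `split_into_blocks` and the synthetic test image are assumptions made for this example; they are not taken from the implementation described in this paper.

```python
def split_into_blocks(image, n=8):
    """Split a 2D list of pixel values into nonoverlapping n x n blocks.

    Assumes both image dimensions are multiples of n. Blocks are returned
    in row-major order (left to right, then top to bottom).
    """
    height, width = len(image), len(image[0])
    if height % n or width % n:
        raise ValueError("image dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, height, n):
        for left in range(0, width, n):
            # Take n consecutive rows and slice n columns out of each.
            blocks.append([row[left:left + n] for row in image[top:top + n]])
    return blocks


# Synthetic 256 x 256 image (hypothetical data, one color channel only).
image = [[(r + c) % 256 for c in range(256)] for r in range(256)]
blocks = split_into_blocks(image)
```

For a 256 × 256 image this gives (256/8)² = 1024 blocks, each of which is transformed independently.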
After DCT, a quantizer with a quantization table is used to provide a higher compression ratio in transmission by approximating a continuous set of values in the image data with a finite (preferably small) set of values. It is done by dividing each component in the frequency domain by a constant and then rounding to the nearest integer. The input to a quantizer is the original data, and the output is always one of a finite set of discrete values. A good quantizer represents the original signal with minimum loss or distortion. A highly useful feature of the JPEG process is that varying levels of image compression and quality are obtained by the selection of a specific quantization matrix, which acts as a weighting function reflecting human psychovisual capability. Quantization involves dividing each coefficient by an integer value between 1 and 255.
After quantization, the DC coefficient, which contains a significant fraction of the total image energy, becomes a measure of the average value of the original 64 pixels, and the 63 AC components are treated in an entropy coding process in the order of increasing frequency. Because adjacent 8 × 8 blocks are usually strongly correlated, the quantized DC coefficient is encoded as the difference from the DC term of the previous block. The higher frequency coefficients are more likely to be 0 or negligible after quantization, thereby improving the compression of run-length encoding.
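The divide-and-round step can be sketched in Python as follows. The quantization table used here is hypothetical (step sizes that simply grow with frequency) and is not the JPEG standard table; the function names are likewise assumptions for illustration.

```python
def quantize(coeffs, table):
    """Divide each DCT coefficient by its table entry and round to an integer.

    Note: Python's round() resolves exact .5 ties to the nearest even integer.
    """
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


def dequantize(levels, table):
    """Approximate the original coefficients by multiplying back by the table."""
    return [[lv * q for lv, q in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, table)]


# Hypothetical table: coarser steps at higher frequencies, all within 1..255.
table = [[8 + 4 * (u + v) for v in range(8)] for u in range(8)]

# Hypothetical coefficient block with energy concentrated at low frequencies.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 400.0, -31.0, 5.0

levels = quantize(coeffs, table)
restored = dequantize(levels, table)
```

The small high-frequency coefficient is quantized to zero and cannot be recovered, which is where the loss (and the compression gain) comes from.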
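A simplified Python sketch of these two coding steps is given below. It assumes the zigzag scan order used by JPEG for "increasing frequency", and it emits plain (zero run, value) pairs with an end-of-block marker; the run-length limits and Huffman tables of the real JPEG entropy coder are omitted. All function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) index pairs of an n x n block in zigzag scan order."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Walk anti-diagonals; odd diagonals go down-left, even ones up-right.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def dc_differences(dc_values):
    """Encode each quantized DC term as the difference from the previous block."""
    previous, diffs = 0, []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def run_length_ac(block):
    """Return (zero_run, value) pairs for the 63 AC terms, then an "EOB" marker."""
    ac = [block[i][j] for i, j in zigzag_order(len(block))][1:]
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # any trailing zeros collapse into this single marker
    return pairs
```

For example, a quantized block whose only nonzero AC terms sit at positions (0, 1) and (2, 0) is reduced to two pairs plus the end-of-block marker.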

2D FDCT
Implementation of a 2D DCT is by separating it into a pair of 1D DCTs, as illustrated in Figure 3. Consider a 2D spatial data sequence s(i, j), 0 ≤ i, j ≤ 7, in a matrix S of size 8 × 8. The corresponding 2D DCT sequence F(u, v), 0 ≤ u, v ≤ 7, in the frequency domain matrix F is defined as

$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} s(i,j) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16}.$$

The inverse transformation, represented by S′ = {s′(i, j)}, is

$$s'(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v) \cos\frac{(2i+1)u\pi}{16} \cos\frac{(2j+1)v\pi}{16},$$

where C(u), C(v) = 1/√2 if u, v = 0 and C(u), C(v) = 1 otherwise. By defining a matrix M = {m(u, v)}, where m(u, v) represents the matrix element in the uth row and vth column,

$$m(u,v) = \frac{1}{2}\, C(u) \cos\frac{(2v+1)u\pi}{16},$$

the 2D DCT data matrix F and its inverse matrix S′ can be written as

$$F = M S M^{T}, \qquad S' = M^{T} F M.$$

Because the base vectors of DCT are orthogonal, the inverse transform IDCT can therefore be easily obtained as shown in (12b). DCT transforms a highly correlated image into a few transform coefficients. Conventional image coding techniques use the quantization process to achieve a higher compression ratio. Therefore, F includes only a few nonzero elements in the low frequency range, which makes it possible to design an efficient IDCT algorithm by fully utilizing the computation efficiency in (3) and (4) and the energy compactness property of F.
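The matrix form of the transform pair, F = M S Mᵀ and S′ = Mᵀ F M with m(u, v) = (1/2) C(u) cos((2v+1)uπ/16), can be checked numerically with the Python sketch below. It uses plain matrix products, so it illustrates the row-column separation only and is not the 24-multiplication FDCT itself; the helper names are assumptions.

```python
import math


def dct_matrix():
    """Return the 8 x 8 matrix M with m(u, v) = (1/2) C(u) cos((2v+1) u pi / 16)."""
    M = []
    for u in range(8):
        c = 1 / math.sqrt(2) if u == 0 else 1.0
        M.append([0.5 * c * math.cos((2 * v + 1) * u * math.pi / 16)
                  for v in range(8)])
    return M


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def dct2(S):
    """Forward 2D DCT as two 1D passes: F = M S M^T."""
    M = dct_matrix()
    return matmul(matmul(M, S), transpose(M))


def idct2(F):
    """Inverse 2D DCT: S' = M^T F M (valid because M is orthogonal)."""
    M = dct_matrix()
    return matmul(matmul(transpose(M), F), M)
```

A constant block with every pixel equal to 100 transforms to a single DC coefficient F(0, 0) = (1/4)(1/2)(64 × 100) = 800 with all AC coefficients zero, and applying `idct2` to the result restores the block.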
The 2D spatial data matrix S′ in (12b) can be considered as a linear combination of the base images, each being the outer product of a column vector and a row vector in M. This interpretation makes it easier for (12a) and (12b) to exploit the sparseness of the 2D DCT matrix F and to calculate the spatial data matrix S′. The proposed FDCT algorithm achieves higher computation performance than many previous algorithms. The row (or column) data can be processed by using 1D DCT (or IDCT) first, with the results stored in transposition memory. By exploiting the redundancy in the coefficients of DCT, the algorithm reduces the complexity of the 2D DCT of an 8 × 8 block to only 24 multiplications.
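The base-image interpretation can be illustrated with a short Python sketch that rebuilds S′ as a weighted sum of outer products and skips every zero coefficient of F. This is only a sketch of the sparseness idea under the standard 8 × 8 DCT definition; it is not the paper's optimized algorithm and makes no claim about its multiplication count. The name `idct2_sparse` is hypothetical.

```python
import math


def dct_matrix():
    """8 x 8 DCT matrix M with m(u, v) = (1/2) C(u) cos((2v+1) u pi / 16)."""
    M = []
    for u in range(8):
        c = 1 / math.sqrt(2) if u == 0 else 1.0
        M.append([0.5 * c * math.cos((2 * v + 1) * u * math.pi / 16)
                  for v in range(8)])
    return M


def idct2_sparse(F):
    """Rebuild S' = sum over (u, v) of F(u, v) * outer(M[u], M[v]).

    Only nonzero coefficients contribute, so the work scales with the
    number of nonzero entries in F. Returns (S', number of terms used).
    """
    M = dct_matrix()
    S = [[0.0] * 8 for _ in range(8)]
    used = 0
    for u in range(8):
        for v in range(8):
            f = F[u][v]
            if f == 0:
                continue  # zero coefficient: its base image is skipped
            used += 1
            for i in range(8):
                w = f * M[u][i]
                for j in range(8):
                    S[i][j] += w * M[v][j]
    return S, used
```

With a quantized F that keeps only a handful of low frequency terms, the loop visits just those terms, which is the property the IDCT design relies on.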

Performance Evaluation in Digital Image Watermarking
Discrete cosine transform (DCT) maps original digital data into the frequency domain by cosine waveforms and, conversely, inverse discrete cosine transform (IDCT) transfers the frequency domain data back into the spatial domain.
The associated memory size, bandwidth, and safety issues in the transformation algorithms are of significant concern. Direct computation of a 2D DCT on a block of N × N pixels requires N⁴ multiplications, while direct realization of the 8 × 8 DCT with row-column separation requires 2 × 8³ = 1024 multiplications. Although many algorithms have been proposed to reduce this count (Table 1), the proposed FDCT algorithm, at 24 multiplications per 8 × 8 block, needs fewer multiplications than many of them. Implementation in a digital signal processor (DSP) for real-time digital watermarking therefore becomes feasible. Embedding a watermark in a digital image for copyright protection and marketing applications has been studied over the past decade. The conventional technique is to embed a secret bit string into an image in the spatial, frequency, or wavelet domain. The FDCT algorithm is implemented in a digital signal processor (TMS320C6701) to validate its efficiency in digital watermarking applications. The signal processor, which supports both fixed-point and floating-point operation, comes with a set of software development tools including a C/C++ compiler, an assembly optimizer, a linker, and assorted utilities, as shown in Figure 4. The proposed FDCT algorithm is written in C, and the C/C++ compiler can optimize at four levels (0, 1, 2, and 3) that trade off clock cycles against code size. Level 0, the lowest, performs loop rotation, allocates variables to registers, and simplifies expressions and statements. Level 1 adds constant propagation and removal of unused assignments. Level 2 adds software pipelining, removal of unused global assignments, loop unrolling, and conversion to incremented pointers. Level 3, the highest, simplifies functions whose return values are never used, removes functions that are never called, inlines calls to small functions, and reorders function declarations so that the attributes of called functions are known when the caller is optimized.
Levels 0 and 1 efficiently reduce the code size, while levels 2 and 3 enhance the execution speed at the cost of larger code size. The computation time and code size at the different optimization levels are listed in Table 2. With increasing code size, the clock cycles and processing time decrease, so level 3 gives the best optimization. For a digital watermark (32 × 32) embedded in an original image (256 × 256), calculation by the FDCT algorithm in the frequency domain followed by inverse transform of the encrypted data back to the spatial domain image uses 5,653,914 clocks at level 3, corresponding to 34 ms with a 35 KB code size. Implementation shows that it takes only 0.24 seconds to embed the watermark in the original image, and extraction of the watermark by the inverse transform (IDCT) takes less than 0.21 seconds. Real-time implementation of the FDCT algorithm in DSP for image processing is thus shown to be very efficient.
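The operation counts and timing figures quoted above can be cross-checked with a few lines of Python. The clock rate is not stated in the text, so the value computed here is only what the reported 5,653,914 clocks and 34 ms imply. The block count is an observation about the image sizes, not a description of the embedding scheme.

```python
def direct_2d_multiplications(n):
    """Multiplications for direct 2D DCT of an n x n block: n^4."""
    return n ** 4


def row_column_multiplications(n):
    """Multiplications with row-column separation: 2 * n^3."""
    return 2 * n ** 3


direct = direct_2d_multiplications(8)         # 4096
separable = row_column_multiplications(8)     # 1024
fdct = 24                                     # count reported for the FDCT

# Clock rate implied by the reported figures (an inference, not a stated spec).
clocks = 5_653_914
elapsed_s = 0.034
implied_clock_mhz = clocks / elapsed_s / 1e6  # roughly 166 MHz

# A 256 x 256 image holds as many 8 x 8 blocks as a 32 x 32 watermark has pixels.
blocks_in_image = (256 // 8) ** 2
watermark_pixels = 32 * 32
```

On these numbers, row-column separation alone cuts the multiplications by a factor of 4, and the reported FDCT count of 24 is a further reduction by a factor of more than 40.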

Conclusions
(1) A fast discrete cosine transform (FDCT) algorithm that utilizes the energy compactness and matrix sparseness properties in the frequency domain for higher computation performance is developed. For a JPEG image of 8 × 8 block size in the spatial domain, the algorithm first decomposes the 2D DCT into one pair of 1D DCTs, and the calculation can be completed in only 24 multiplications. The 2D spatial data is a linear combination of base images obtained by the outer product of the column and row vectors of cosine functions, such that the inverse DCT is as efficient. The algorithm is shown to achieve high performance compared to many previous works.
(2) The algorithm optimizes a 2D DCT by exploiting the redundancy of the frequency coefficients so as to facilitate implementation in a digital signal processor (DSP). For a spatial domain data matrix S, the 2D DCT data matrix F includes only a few nonzero elements in the low frequency range, which makes it possible to design an efficient IDCT algorithm. With the energy compactness property, F and its inverse matrix S′ can be written as linear combinations of the cosine functions, such that both the FDCT and its inverse transform have the same coefficients and matrix blocks for efficient hardware implementation.

Figure 1: The encoder and decoder model of JPEG compression standard by using discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT).

Figure 2: The distribution of coefficients in DCT, (a) low frequency position (heavy color), (b) low-middle-frequency position (light color), (c) high-middle frequency position (light color with dots), and (d) high frequency position (white).


Figure 3: The FDCT algorithm of 2D DCT calculated by one pair of 1D DCTs.

Table 1: The number of multiplications for 2D DCT by the algorithms of previous works and by the proposed FDCT algorithm.
Dixit [9] used Strassen's matrix multiplication, but their 2D DCT (8 × 8) needed 28 multiplications. Khan et al. [10] took 40 multiplications for a 2D DCT operation on an image of 8 × 8 block size. In summary, both the FDCT algorithm and its inverse transform are more efficient than many previous works in reducing the complexity of computation.
Mathematical Problems in Engineering

Table 2: Processing time of the FDCT algorithm in DSP.
(3) An example of digital image watermarking is applied to demonstrate the efficiency of the FDCT algorithm. Hardware implementation of watermarking in DSP shows that it takes only 0.24 seconds to embed a 32 × 32 digital watermark into a digital image of size 256 × 256. Implementation also shows that extraction of the watermark can be completed within 0.21 seconds. The FDCT algorithm in DSP is shown to be efficient and effective in real-time implementation of digital image watermarking.