Lossless Image Compression Based on Multiple-Tables Arithmetic Coding

1 Department of Information Management, Chaoyang University of Technology, No. 168, Jifong E. Rd., Wufong Township, Taichung County 41349, Taiwan
2 Department of Computer Science, National Tsing-Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 300, Taiwan
3 Department of Management Information Systems, National Chung Hsing University, 250 Kuo-kuang Road, Taichung 402, Taiwan
4 Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Rd., Seatwen, Taichung 407, Taiwan


Introduction
With the rapid development of image processing and Internet technologies, a great number of digital images are being created every moment. Therefore, it is necessary to develop an effective image-compression method to reduce the storage space required to hold image data and to speed up image transmission over the Internet [1-16].
Image compression reduces the amount of data required to describe a digital image by removing the redundant data in the image. Lossless image compression deals with reducing coding redundancy and spatial redundancy. Reducing coding redundancy consists in using variable-length codewords selected to match the statistics of the original source. The gray levels of some pixels in an image are more common than those of others (that is, different gray levels occur with different probabilities), so coding redundancy reduction uses shorter codewords for the more common gray levels and longer codewords for the less common gray levels. We call this process variable-length coding. This type of coding is always reversible and is usually implemented using look-up tables. Examples of image coding schemes that exploit coding redundancy are Huffman coding [4, 5, 7] and arithmetic coding [8, 9].
There is a significant correlation among neighboring pixels in an image, which results in spatial redundancy in the data. Spatial redundancy reduction exploits the fact that the gray levels of the pixels in an image region are usually the same or almost the same. Methods such as LZ77, LZ78, and LZW exploit spatial redundancy in several ways, one of which is to predict the gray level of a pixel from the gray levels of its neighboring pixels [14].
To encode an image effectively, a statistical-model-based compression method needs to predict precisely the occurrence probabilities of the data patterns in the image. This paper proposes a lossless image compression method, the multiple-tables arithmetic coding (MTAC) method, to encode a gray-level image.
A statistical-model-based compression method generally creates a code table to hold the probabilities of occurrence of all data patterns. The type of data pattern significantly affects the encoding efficiency when minimizing storage space. When the data come from different sources, it is difficult to find a single code table that describes all the data well. Therefore, the MTAC method categorizes the data into clusters and adopts distinct code tables that record the frequencies with which the data patterns occur in the different clusters.

The MTAC Method
The proposed MTAC method contains three approaches: median edge detector (MED) processing, base-switching transformation, and statistical-model-based compressing. This section introduces these three approaches.

MED Processing Approach
Shannon's entropy equation estimates the average minimum number of bits needed to encode a data pattern, based on the frequency with which the data pattern occurs in a data set [11, 12]. Let $l$ be the total number of different data patterns in a data set and $p_i$ the probability that the $i$th data pattern occurs in the data set. The entropy rate $E$ of the data set is defined as

$$E = -\sum_{i=1}^{l} p_i \log_2 p_i. \tag{2.1}$$

It is impossible to encode the data set, in a lossless manner, with a bit rate lower than $E$. The bit rate is defined as the ratio of the number of bits holding the compression data to the number of pixels in the compressed image. The higher the entropy rate of a data set, the less it can be compressed by a statistical-model-based compression method.

Let $f$ be the image to be encoded, consisting of $H \times W$ pixels, where $H$ and $W$ are the height and the width of $f$, respectively. MED [10] estimates the gray level of a pixel by detecting whether an edge passes through the pixel. MED scans each pixel in $f$, starting from the top-left pixel of $f$, in the order shown in Figure 1. While scanning a pixel $P(i, j)$, MED computes an estimate $\hat{g}(i, j)$ of the gray level $g(i, j)$ of $P(i, j)$ from the gray levels $g(i, j-1)$, $g(i-1, j-1)$, and $g(i-1, j)$ of the pixels $P(i, j-1)$, $P(i-1, j-1)$, and $P(i-1, j)$, where $P(i, j)$ is the pixel located at the coordinates $(i, j)$ in $f$. Figure 2 shows the spatial relationships of $P(i, j)$, $P(i, j-1)$, $P(i-1, j-1)$, and $P(i-1, j)$.
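For $i = 1$ or $j = 1$, the estimated gray level $\hat{g}(i, j)$ of $P(i, j)$ is defined as

$$\hat{g}(i, j) = \begin{cases} g(i, j-1), & \text{if } i = 1 \text{ and } j \neq 1, \\ g(i-1, j), & \text{if } i \neq 1 \text{ and } j = 1. \end{cases} \tag{2.2}$$

In addition, for $i = 2$ to $H$ and $j = 2$ to $W$:

$$\hat{g}(i, j) = \begin{cases} \mathrm{Min}(g(i, j-1), g(i-1, j)), & \text{if } g(i-1, j-1) \geq \mathrm{Max}(g(i, j-1), g(i-1, j)), \\ \mathrm{Max}(g(i, j-1), g(i-1, j)), & \text{if } g(i-1, j-1) \leq \mathrm{Min}(g(i, j-1), g(i-1, j)), \\ g(i, j-1) + g(i-1, j) - g(i-1, j-1), & \text{otherwise}. \end{cases} \tag{2.3}$$

Here, $\mathrm{Max}(g(i, j-1), g(i-1, j))$ and $\mathrm{Min}(g(i, j-1), g(i-1, j))$ are the maximum and the minimum of $g(i, j-1)$ and $g(i-1, j)$, respectively. If $\hat{g}(i, j) = g(i, j-1)$, MED considers that a horizontal edge passes through $P(i, j)$ or some pixels above $P(i, j)$. When $\hat{g}(i, j) = g(i-1, j)$, MED perceives that a vertical edge passes through $P(i, j)$ or some pixel to the left of $P(i, j)$.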
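The three-neighbor prediction rule can be illustrated with a short Python sketch. It follows the commonly published form of the median edge detector; the function name `med_predict` and its argument names are illustrative assumptions, not identifiers from the paper.

```python
def med_predict(left, upper_left, upper):
    """Median edge detector estimate of a pixel from three known neighbours.

    left       = g(i, j-1)
    upper_left = g(i-1, j-1)
    upper      = g(i-1, j)
    (argument names are illustrative assumptions)
    """
    hi = max(left, upper)
    lo = min(left, upper)
    if upper_left >= hi:
        # The upper-left pixel is at least as bright as both neighbours:
        # take the smaller of the two.
        return lo
    if upper_left <= lo:
        # The upper-left pixel is at most as bright as both neighbours:
        # take the larger of the two.
        return hi
    # No edge detected: planar (gradient) prediction.
    return left + upper - upper_left


# Smooth region: the planar prediction is used.
print(med_predict(100, 105, 110))
# Upper-left equals the left pixel but the upper pixel is much darker:
# the estimate is copied from the upper pixel.
print(med_predict(100, 100, 50))
```

The estimate always lies between the smaller and the larger of the left and upper neighbors, so a single outlying upper-left pixel cannot push the prediction outside the local range.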
Let $e(i, j) = g(i, j) - \hat{g}(i, j)$ be the difference between the gray level $g(i, j)$ and its estimate $\hat{g}(i, j)$; we call $e(i, j)$ the estimated error of $g(i, j)$. Similarly, in the decompression phase, based on $g(i, j-1)$, $g(i-1, j-1)$, and $g(i-1, j)$, the MTAC method can compute $\hat{g}(i, j)$ through formula (2.2) or (2.3); it then recovers $g(i, j) = \hat{g}(i, j) + e(i, j)$. To recover $f$ without loss, the MTAC method needs to save the gray level $g(1, 1)$ of the top-left pixel and the estimated errors of all other pixels in $f$. The estimated error $e(i, j)$ lies in the interval from −255 to 255. Each $e(i, j)$ can be represented by an 8-bit memory space that holds the absolute value $|e(i, j)|$ and one bit $b$ that records the sign of $e(i, j)$. All the $|e(i, j)|$ values compose a gray-level image $f_e$, and all the sign bits $b$ make up a binary image $f_s$. We call $f_e$ the error image and $f_s$ the sign bit image of $f$.
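A minimal Python sketch of the prediction and reconstruction round trip is given here. It makes several assumptions that are not stated in the text: indices are 0-based, first-row pixels are predicted from their left neighbor and first-column pixels from their upper neighbor, the top-left pixel is predicted as 0 so that its error equals its raw gray level, and a sign bit of 1 marks a negative error. All function names are hypothetical.

```python
def med(left, upper_left, upper):
    """Median edge detector estimate from three known neighbours."""
    hi, lo = max(left, upper), min(left, upper)
    if upper_left >= hi:
        return lo
    if upper_left <= lo:
        return hi
    return left + upper - upper_left


def predict(img, i, j):
    """Estimate pixel (i, j) using only pixels that precede it in scan order."""
    if i == 0 and j == 0:
        return 0                      # assumption: top-left pixel is kept verbatim
    if i == 0:
        return img[0][j - 1]          # assumption: first row uses the left pixel
    if j == 0:
        return img[i - 1][0]          # assumption: first column uses the upper pixel
    return med(img[i][j - 1], img[i - 1][j - 1], img[i - 1][j])


def med_encode(img):
    """Return the signed estimated errors e(i, j) = g(i, j) - estimate."""
    h, w = len(img), len(img[0])
    return [[img[i][j] - predict(img, i, j) for j in range(w)] for i in range(h)]


def med_decode(err):
    """Rebuild the image in scan order from the signed errors."""
    h, w = len(err), len(err[0])
    img = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            img[i][j] = predict(img, i, j) + err[i][j]
    return img


def split_sign(err):
    """Split signed errors into an error image (magnitudes) and a sign bit image."""
    magnitude = [[abs(e) for e in row] for row in err]
    sign = [[1 if e < 0 else 0 for e in row] for row in err]
    return magnitude, sign


image = [[10, 12, 13], [11, 12, 200], [11, 50, 199]]
errors = med_encode(image)
assert med_decode(errors) == image    # the transform is lossless
```

Because the decoder visits pixels in the same scan order as the encoder, every neighbor needed by `predict` has already been reconstructed when it is read.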
Figure 3 shows two 512 × 512 gray-level images, Airplane and Baboon, and their gray-level histograms. Let $l = 256$ and

$$p_i = \frac{\text{number of pixels whose gray level is equal to } i}{\text{total number of pixels in the image}}. \tag{2.4}$$
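The entropy rate of an image can be estimated directly from its gray-level histogram. The following Python sketch assumes the image is supplied as a flat list of gray levels; the function name `entropy_rate` is illustrative.

```python
from collections import Counter
from math import log2


def entropy_rate(pixels):
    """Shannon entropy in bits/pixel of a flat sequence of gray levels.

    Gray levels that never occur contribute nothing to the sum,
    so only the observed levels are visited.
    """
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Two equally likely gray levels need one bit per pixel on average.
print(entropy_rate([0, 0, 255, 255]))
# A constant image carries no information per pixel.
print(entropy_rate([128] * 16))
```

Applying the same function to the error image instead of the original image gives the comparison made in the text: the more sharply the histogram is concentrated around a few values, the lower the result.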
According to formula (2.1), the entropy rates of Airplane and Baboon reach 6.5 and 7.2 bits/pixel, respectively. From Shannon's limit [11, 12], with such an entropy rate, the minimum number of bits required to describe a pixel in Airplane (resp., Baboon) is 6.5 (resp., 7.2) bits/pixel. Since these numbers of bits exceed the acceptable maximum, the MTAC method utilizes MED [10] to decrease the entropy rate of $f$ before encoding $f$. Figure 4 demonstrates the error images of Airplane and Baboon shown in Figure 3, and the gray-level histograms of the error images. The gray levels of most pixels in the error images are close to 0; the entropy rates of the error images of Airplane and Baboon are 3.6 and 5.2 bits/pixel, respectively, which are far lower than the entropy rates of the original Airplane and Baboon.
Figure 5 shows the sign bit images of Airplane and Baboon. Both sign bit images are clearly messy, so it is difficult to find a method that encodes them effectively. To deal with this problem, the MTAC method transforms the error image and sign bit image into a difference image and an MSB (most significant bit) image, respectively. The MTAC method pulls out the MSB of every $|e(i, j)|$ to create an $H \times W$ binary image $f_{MSB}$, where the MSB of $|e(i, j)|$ is given to the pixel located at the coordinates $(i, j)$ of $f_{MSB}$. We call this binary image the MSB image $f_{MSB}$ of $f$. Meanwhile, the MTAC method concatenates the sign bit $b$ of $e(i, j)$ and the remaining bits of $|e(i, j)|$, whose MSB has been drawn out, by appending $b$ after the rightmost of the remaining bits, in order to generate another gray-level image. We name this gray-level image the difference image of $f$. Figure 6 illustrates these actions.
Figure 7 shows the MSB images of Airplane and Baboon. Almost all the pixels in the MSB images are 0. Figure 8 displays the difference images of Airplane and Baboon and their gray-level histograms. Clearly, the gray levels of most pixels in the difference images are equal to 0. The entropy rates of the difference images of Airplane and Baboon are 4.4 and 6.2 bits/pixel, respectively. These entropy rates are higher than the entropy rates of their error images but are much lower than those of the original Airplane and Baboon.
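The bit rearrangement that produces the MSB image and the difference image can be sketched in Python as follows. The sketch assumes that a sign bit of 1 denotes a negative error, which the text does not specify; `split_error` and `join_error` are hypothetical names.

```python
def split_error(e):
    """Map a signed error in [-255, 255] to (MSB bit, difference value).

    The 8-bit magnitude loses its most significant bit, and the sign
    bit is appended on the right of the remaining 7 bits.
    """
    magnitude = abs(e)
    b = 1 if e < 0 else 0             # assumption: 1 marks a negative error
    msb = magnitude >> 7              # pixel of the MSB image
    difference = ((magnitude & 0x7F) << 1) | b   # pixel of the difference image
    return msb, difference


def join_error(msb, difference):
    """Inverse of split_error."""
    magnitude = (msb << 7) | (difference >> 1)
    return -magnitude if difference & 1 else magnitude


# The mapping is reversible for every possible error value.
assert all(join_error(*split_error(e)) == e for e in range(-255, 256))
```

Small errors of either sign map to small difference values (for example, −3 maps to 7 and +3 maps to 6), which is why the difference image keeps most of its mass near 0 even though it absorbs the sign bits.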

Base-Switching Transformation Approach
The gray level of a pixel in a gray-level image is generally represented by an 8-bit memory space. However, this representation is uneconomical when the gray levels of the pixels in the image are similar. Hence, the MTAC method adopts the base-switching transformation (BST) algorithm [1, 2] to compress a difference image.
The BST algorithm partitions a difference image into small nonoverlapping image blocks, each consisting of $m \times n$ pixels. Let $g_{\min}$ and $g_{\max}$ be the minimal and maximal gray levels of the pixels in an image block $B$. The difference between $g_{\min}$ and the gray level $g$ of each pixel in $B$ can be represented by $\lceil \log_2 (g_{\max} - g_{\min}) \rceil$ bits. The MTAC method uses a 3-bit memory space $S$ to describe $\lceil \log_2 (g_{\max} - g_{\min}) \rceil$, where

$$S = \begin{cases} 0, & \text{if } \lceil \log_2 (g_{\max} - g_{\min}) \rceil = 0 \text{ or } 1, \\ \lceil \log_2 (g_{\max} - g_{\min}) \rceil - 1, & \text{otherwise}. \end{cases} \tag{2.5}$$
For each image block, the BST algorithm needs to hold only $g_{\min}$, $S$, and the gray-level differences between $g_{\min}$ and the gray levels of all the pixels in $B$. We call the difference between the gray level of a pixel $P$ and $g_{\min}$ the gray-level difference of $P$. Figure 9 is a 4 × 4 image block $B$. Storing $B$ directly requires 16 × 8 = 128 bits of memory space. However, in the BST algorithm, $g_{\max}$, $g_{\min}$, and $S$ of $B$ are 137, 122, and 3, respectively. The BST algorithm uses 8 bits, 3 bits, and 4 × 16 bits to hold $g_{\min}$, $S$, and the gray-level differences of all the pixels in $B$; hence, the BST algorithm requires a total of only 75 bits to store $B$.
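The bit count of this example can be reproduced with a small Python sketch. The pixel values of Figure 9 are not given in the text, so the block below is a hypothetical one chosen only to have the same minimum (122) and maximum (137). The sketch computes the exact number of bits needed for the largest difference with `int.bit_length()`; this agrees with $\lceil \log_2 (g_{\max} - g_{\min}) \rceil$ except when the range is an exact power of two, where one more bit is needed to hold the largest difference.

```python
def bst_block(block):
    """Describe one image block by (g_min, S, differences, total bits).

    block is a flat list of gray levels. S stores the per-pixel bit
    length minus one, so that 1..8 bits fit in a 3-bit field.
    """
    g_min, g_max = min(block), max(block)
    bits = max(1, (g_max - g_min).bit_length())   # assumption: at least 1 bit per pixel
    s = bits - 1
    differences = [g - g_min for g in block]
    total_bits = 8 + 3 + bits * len(block)        # g_min + S + all differences
    return g_min, s, differences, total_bits


# Hypothetical 4 x 4 block with g_min = 122 and g_max = 137.
block = [122, 125, 130, 137,
         123, 126, 131, 136,
         124, 127, 132, 135,
         125, 128, 133, 134]

g_min, s, differences, total_bits = bst_block(block)
assert (g_min, s, total_bits) == (122, 3, 75)
assert [g_min + d for d in differences] == block   # lossless
```

A single pixel with a large difference raises the bit length of every pixel in its block, which is the weakness the next stage addresses.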

Statistical-Model-Based Compressing Approach
After the MED processing approach, image $f$ is transformed into an MSB image and a difference image. In the base-switching transformation approach, the difference image is segmented into small nonoverlapping image blocks. The MTAC method then writes down $g_{\min}$, $S$, and the gray-level differences of all the pixels in each image block. However, a few pixels in an image block may have large gray-level differences, so that every gray-level difference in this image block requires a large number of bits. For example, the maximal gray-level difference of the image block in Figure 9 is 15; therefore, each gray-level difference in that block requires at least 4 bits. To remedy this problem, the MTAC method subsequently applies the arithmetic coding algorithm to compress the data obtained in the MED processing and BST approaches.

The arithmetic coding algorithm [8, 15] is one of the statistical-model-based compressing methods that decide the bit length of a code according to the occurrence frequencies of data patterns. These methods give shorter codes to the data patterns that occur more frequently and longer codes to those that occur less often. Hence, the type of data pattern significantly affects the encoding's efficiency in minimizing storage space. The MTAC method adopts the arithmetic coding algorithm to compress the MSB image, all the $g_{\min}$ values, and all the $S$ values. Since the MSB image, the $g_{\min}$ values, and the $S$ values have different statistics, the MTAC method requires distinct code tables to record their data patterns. Each data pattern of the MSB image, the $g_{\min}$ values, and the $S$ values is 8 bits, 8 bits, and 3 bits in length, respectively.
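The way arithmetic coding ties code length to symbol frequency can be seen in a toy Python sketch. It uses exact rational arithmetic from the standard `fractions` module rather than the fixed-precision integer arithmetic of a practical coder, and it is not the implementation used in the paper; all function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from math import ceil, log2


def build_table(symbols):
    """Code table: each symbol gets a sub-interval of [0, 1) sized by its frequency."""
    counts = Counter(symbols)
    total = sum(counts.values())
    table, low = {}, Fraction(0)
    for sym in sorted(counts):
        p = Fraction(counts[sym], total)
        table[sym] = (low, low + p)
        low += p
    return table


def encode(symbols, table):
    """Narrow [0, 1) once per symbol; return (low, width) of the final interval."""
    low, width = Fraction(0), Fraction(1)
    for sym in symbols:
        s_low, s_high = table[sym]
        low += width * s_low
        width *= s_high - s_low
    return low, width


def decode(value, n, table):
    """Recover n symbols from any number inside the final interval."""
    out = []
    for _ in range(n):
        for sym, (s_low, s_high) in table.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return out


def code_length_bits(width):
    """Roughly the number of bits needed to name a point in an interval of this width."""
    return ceil(log2(width.denominator) - log2(width.numerator)) + 1


message = list("AAAAAAAB")
table = build_table(message)
low, width = encode(message, table)
assert decode(low, len(message), table) == message
```

A message dominated by one symbol ends in a wide interval and therefore needs few bits, while a message whose symbols are all equally likely ends in a narrow interval and needs more. This is the property that makes the choice of code table matter.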
Next, the MTAC method concatenates the gray-level differences of all the image blocks into a binary string $GD_S$, where the bit length of each gray-level difference in the image blocks is $S$. For example, the gray-level differences in all the image blocks with $S = 4$ are concatenated into $GD_4$. The MTAC method then uses an arithmetic coding algorithm to encode each $GD_S$, where the bit length of each data pattern in encoding $GD_S$ is $S$. Finally, the MTAC method needs to hold only the height $H$ and the width $W$ of $f$, all the code tables, and all the compression data generated by the arithmetic coding algorithm.
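The grouping step, and the reason separate code tables help, can be sketched in Python. The sketch keys each stream by the block's $S$ value and compares the ideal (entropy-based) cost of one pooled table against one table per stream. The storage cost of the tables themselves is not counted, and the sample data are invented for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def group_by_s(blocks):
    """Collect gray-level differences into one stream per S value.

    blocks is an iterable of (S, differences) pairs in block scan order.
    """
    streams = defaultdict(list)
    for s, differences in blocks:
        streams[s].extend(differences)
    return dict(streams)


def ideal_bits(symbols):
    """Entropy-based lower bound, in bits, for coding symbols with one table."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * log2(c / n) for c in counts.values())


# Invented sample: two flat blocks (S = 0) and two busy blocks (S = 3).
blocks = [(0, [0, 1, 0, 0]), (3, [0, 15, 7, 9]),
          (0, [1, 0, 0, 0]), (3, [3, 12, 15, 0])]

streams = group_by_s(blocks)
pooled = ideal_bits([d for _, diffs in blocks for d in diffs])
separate = sum(ideal_bits(stream) for stream in streams.values())
assert separate <= pooled + 1e-9
```

Splitting the data by class can never increase the entropy-based cost of the symbols themselves, so the trade-off in practice is between this gain and the extra space taken by the additional tables.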
After the statistical-model-based compressing approach has been employed, the MTAC method concatenates $W$, $H$, the code tables, and the arithmetic-coded strings into the compression data.

Image Decompression
In the decompression phase, the MTAC method first draws $W$, $H$, and String$_{CODE\ TABLE}$ from the compression data. The bit length of each data pattern in the MSB image is 8. Hence, the MTAC method can reconstruct the MSB image from String$_{MSB}$ by using the arithmetic decoding method. Since $f$ consists of $(H \times W)/9$ image blocks, the MTAC method decompresses the $(H \times W)/9$ values of $g_{\min}$ from String$_{g_{\min}}$ using the arithmetic decoding method, where the bit length of a data pattern is 8. Similarly, it can decode the $(H \times W)/9$ values of $S$ from String$_S$, where each data pattern is described by 3 bits. The number of data patterns in each $GD_S$ can be easily computed from the $S$ values. Hence, each $GD_S$ can be decoded as well.
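How the decoded $S$ values determine the size of each $GD_S$ stream, and how blocks are reassembled from the streams, can be sketched in Python. The sketch assumes nine pixels per block, as implied by the block count above, and assumes that each stream stores its differences in block scan order; the function names are hypothetical.

```python
from collections import Counter


def patterns_per_stream(s_values, pixels_per_block=9):
    """Number of gray-level differences stored in each GD_S stream."""
    return {s: n * pixels_per_block for s, n in Counter(s_values).items()}


def rebuild_blocks(s_values, g_mins, streams, pixels_per_block=9):
    """Reassemble the gray levels of every block from the decoded streams.

    streams maps an S value to its decoded list of gray-level differences,
    assumed to be stored in block scan order.
    """
    position = {s: 0 for s in streams}
    blocks = []
    for s, g_min in zip(s_values, g_mins):
        start = position[s]
        differences = streams[s][start:start + pixels_per_block]
        position[s] = start + pixels_per_block
        blocks.append([g_min + d for d in differences])
    return blocks


# Tiny illustration with two pixels per block instead of nine.
s_values = [0, 3, 0]
g_mins = [10, 100, 20]
streams = {0: [0, 1, 1, 0], 3: [0, 15]}
print(patterns_per_stream(s_values, pixels_per_block=2))
print(rebuild_blocks(s_values, g_mins, streams, pixels_per_block=2))
```

Because the stream lengths follow from the $S$ values alone, no separate length fields need to be stored for the $GD_S$ streams.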

Experiments
The purpose of this section is to investigate the performance of the MTAC method by experiments. In these experiments, ten 256 × 256 gray-level images, Airplane, Lena, Baboon, Gold, Sailboat, Boat, Toy, Barb, Pepper, and Girl, shown in Figure 10, are used as test images. The first experiment explores the effect of the MED processing approach on reducing the entropy rate of the compressed image. Table 1 lists the entropy rates of the ten original test images and the entropy rates of their error images. The experimental results show that most of the entropy rates of the error images are close to half those of the original test images. In experiment 2, the arithmetic coding method is used to encode the sign bit images of the ten test images, where the bit length of each data pattern is 8 bits. Table 2 shows the sizes of the original sign bit images and their compression data obtained by the arithmetic coding

Conclusions
This paper proposes the MTAC method to encode a gray-level image $f$. The MTAC method contains the MED processing, BST, and statistical-model-based compressing approaches. The MED processing approach reduces the entropy rate of $f$. The BST approach decreases the spatial redundancy of the difference image of $f$ based on the similarity among adjacent pixels. The statistical-model-based compressing approach further compresses the data generated in the MED processing and BST approaches, based on their coding redundancy. The data patterns of the data produced by the MED processing approach and the BST approach have different bit lengths and distinct occurrence frequencies. Hence, the MTAC method first classifies the data into clusters and then compresses the data in each cluster using the arithmetic coding algorithm with a separate code table.
The experimental results reveal that the MTAC method usually gives a better bit rate than the lossless JPEG2000 does, particularly for the images with small gray-level variations among adjacent pixels.However, when the gray-level variations among adjacent pixels in an image are very large, the MTAC method performs worse in terms of bit rate.

Figure 1: The scanning order of the MED in an image represented by 8 × 8 pixels.

Figure 2: Part of the pixels in an image.

Figure 3: Two gray-level images, Airplane and Baboon, and their color histograms.

Figure 4: The difference images and their histograms of Airplane and Baboon.

Figure 5: The sign bit images of the images Airplane and Baboon.

Figure 6: The pixel values of $e(i, j)$.

Figure 11: The difference image of Barb and its partial image.
The compression data consist of $W$, $H$, String$_{CODE\ TABLE}$, String$_{MSB}$, String$_{g_{\min}}$, String$_S$, and String$_{GD}$. Here, String$_{CODE\ TABLE}$ represents all the code tables; String$_{MSB}$, String$_{g_{\min}}$, String$_S$, and String$_{GD}$ are the compression data of the MSB image, all the $g_{\min}$ values, all the $S$ values, and all the $GD_S$ strings, respectively.

Table 1: Entropies of ten original images and their error images.

Table 4: Bit rates (bits/pixel) obtained by the MTAC method and lossless JPEG 2000.