A Size-optimization Design for Variable Length Coding Using Distributed Logic *

In this paper, we first employ an efficient approach to reduce the time needed to construct codewords. With these codewords, a novel VLSI architecture is proposed to realize high-speed Variable Length Coding (VLC). To combine easily with other circuits in a cell-based design, we adopt distributed logic rather than memory devices (ROM, PLA) for the implementation. In this architecture, the VLC coding scheme is partitioned into two parts: one provides the codeword length and order index for bit control, and the other is the codeword bank for actual codeword generation. The advantage is a smaller circuit: on average, the transistor count of the proposed method is only about 1/4 and 1/2 of that of the ROM-based and PLA-based designs, respectively.


I. INTRODUCTION
Variable length coding (VLC) is a popular technique for reducing redundancy in data because it provides lossless data compression. In current coding systems, such as video coding systems, VLC is also a key component that is combined with other data compression methods to reduce the bit rate further. Efficient data encoding schemes can be valuable in the design of databases [1], and many algorithms [2,3] realize VLC in software. However, software implementations cannot meet the speed requirements of practical applications. VLSI implementations of variable length coding have therefore been reported [4][5][6][7][8][9][10]; they integrate the circuits into a single chip with low-cost fabrication to attain high-speed operation. Most papers discuss only the real-time VLC decoder, because the VLC encoder is not considered the critical bottleneck in a VLC codec. Generally, traditional memory devices such as ROM [4] or PLA [6], combined with control circuits, can easily realize VLC encoders for real-time applications. However, when a chip design employs memory devices, the circuit size cannot be efficiently reduced in the implementation, and the design becomes more difficult to handle in system simulation.
*This work was supported by the National Science Council, Republic of China, under Grant NSC87-2213-E-235-001. Corresponding author. Fax: 886-7-6011012, e-mail: hsia@ccms.nkfu.edu.tw
An interesting question is how to use a new architecture, rather than a memory device, and reduce the implementation cost. Basically, a circuit implemented with a cell-based scheme is easier to create and control in system design than a memory device. For this reason, we explore the realization of VLC using distributed cell logic. Although logic synthesis tools are popular in current chip design, the circuit becomes large if the VLC codewords are written directly as "if-then" structures in a hardware description language (such as VHDL), because there are too many "if-then" statements. To address this, we first employ a new approach that is an easy and efficient method to produce codewords; the results are the same as those of the binary-tree-based method. Then, a new architecture that uses distributed logic rather than complete ROM or PLA structures is proposed.
The proposed method has two advantages: the circuit size can be efficiently reduced, and the circuits can easily be combined with other cell-based systems for full system simulation. The paper is organized as follows. In Section II, we address a fast approach for the construction of codewords. The new architecture for variable length coding is illustrated in Section III. Section IV shows the performance comparison with existing VLC technologies.

II. THE CODEWORD CONSTRUCTION
Variable length coding (VLC) here refers to Huffman coding [5], which maps the probabilities of symbols into variable-length codewords. The codewords are created by a binary-tree structure; the setup procedure is shown in Figure 1. We first compute the probability of each symbol and sort the probabilities. The binary tree is then established as a Huffman tree, and the codewords are assigned according to the probabilities, as shown in Figure 2.
To create codewords quickly, a new approach is used; its operation is shown in Figure 3. The symbol $C_k^r$ denotes the codeword in the $r$th order of the $k$th group, where $k$ is the codeword length. $C_k^1$ and $C_k^2$ represent the 1st and 2nd codewords in the $k$th group, respectively. Within the same $k$th group, the codeword is sequentially increased by one, thus $C_k^{r+1} = C_k^r + 1$. When the codeword length increases, the first codeword of the new length is obtained by adding one to the last codeword of the current length and then shifting the bits left (multiplying by 2). In this way all codewords can be produced. Table I shows the codewords for each symbol; the indexes of the length $k$ and of the $r$th word of that length are useful for the hardware design that is addressed in the next section.
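The construction rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `build_codewords`, the input format (a mapping from codeword length to the number of codewords of that length) and the example counts are assumptions, and the left shift is assumed to equal the difference between successive codeword lengths (one bit when the lengths are consecutive).

```python
def build_codewords(counts):
    """Build codewords group by group.

    counts: dict mapping codeword length k -> number of codewords of that
            length (hypothetical input; the paper's Table I is not used here).
    Returns a dict mapping (k, r) -> codeword string, with r starting at 1.
    """
    table = {}
    code = 0          # first codeword of the shortest length is all zeros
    prev_len = None
    for k in sorted(counts):
        if counts[k] == 0:
            continue
        if prev_len is not None:
            # New length: (last codeword of previous length + 1), shifted left.
            code = (code + 1) << (k - prev_len)
        prev_len = k
        for r in range(1, counts[k] + 1):
            if r > 1:
                code += 1  # within a group, codewords increase by one
            table[(k, r)] = format(code, "0{}b".format(k))
    return table


if __name__ == "__main__":
    # Example counts chosen only for illustration (they satisfy Kraft's equality).
    for (k, r), word in sorted(build_codewords({2: 2, 3: 2, 4: 4}).items()):
        print(k, r, word)
```

Because each group continues from the previous one, the (length, order) pair alone identifies a codeword, which is the property the hardware index scheme relies on.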

III. SYSTEM ARCHITECTURE OF THE PROPOSED VLC
where $T_k$ is the threshold of the $k$-length codeword. $T_k$ can be determined from $Y[n]$, which denotes the number of codewords in the $n$th group. For example, when $k = 4$, $T_4 = 2^2 Y[2] + 2 Y[3] + \ldots$
Figure 4 shows the system diagram of the proposed architecture. The input symbols have a fixed length, whereas the output codewords have variable length, so a first-in first-out (FIFO) memory is required as a speed buffer. The symbols are written to the FIFO at a fixed clock rate. When the overflow flag (OF) of the FIFO becomes high, writing to the FIFO should be idle. Conversely, when the FIFO is empty, the underflow flag (UF) of the FIFO becomes high, which indicates that the output bit stream is not available at that moment.
FIGURE 4 The system diagram of the proposed real-time VLC.
The symbols are encoded into the codeword length $k$ and the order $r$ within that length by the Length Encoder and the Order Encoder, respectively, both of which can be realized by referring to Table I. The codeword length $k$ is loaded into the down-counter and the Length Decoder (LD) at the same time. Through the enable pins (En), the LD selects which module of $k$th-length codewords is active. $B_i$ is the binary code that denotes the $r$th word of that length, where $i = 0$ to $m$ and $B_m$ is the most significant bit. With the values of $B_i$ and En, the codeword is selected and loaded into the shift register (R1 to Rk). One bit of the codeword is then obtained from the shift register in every clock. As the coding procedure goes on, the down-counter is decreased by one each time the shift register outputs one bit. When the value of the down-counter reaches zero, the read clock of the FIFO becomes active, and the next symbol is processed by the same procedure.
Figure 5 shows the internal circuits of the $k$-length modules, which are easy to realize using Table I. The modules of the codeword banks are built from tri-state buffers, in which a $k$-bit codeword uses only $k$ buffers. The detailed scheduling is listed in Table II. The two-phase clocks $\Phi_1$ and $\Phi_2$ are used to control the down-counter and the shift register, respectively. The timing diagram is shown in Figure 6. When the coding procedure starts, the symbol 'h' first enters the coding system from the FIFO; the Length Encoder and the Order Encoder then output $k = 5$ and $B = 010$, respectively, because the word length and order of 'h' are 5 and 3. The value of $k$ is sent to the Length Decoder (LD), which produces the enable word 1000. In the next phase, the codeword from the 5th bank is loaded into the shift register. The number of shift register stages in use varies with the codeword, here R1 to R5, and the data are now '10110'. Thus we first get '1' at the output of the shift register in the first clock. In the second clock, the shift register outputs '0', and the down-counter becomes 4. Continuing this procedure, we obtain the complete bit stream '10110' for the symbol 'h' after 5 clocks. The down-counter then reaches zero, and we read the FIFO again for further VLC coding.
The new symbol 'a' then enters the architecture and is coded by the same procedure. Hence we spend $k$ clocks to code a codeword of length $k$ with no waiting cycle, which achieves real-time operation. In this architecture, the VLC coding scheme is partitioned into two stages: one provides the codeword length and order index for bit control, and the other is the codeword bank for actual codeword generation.
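The data flow walked through above (FIFO read, length and order lookup, bank selection, bit-serial shift-out under a down-counter) can be mimicked with a small behavioural model. The following Python sketch is an assumption-laden illustration rather than a description of the actual circuit: the names `LENGTH_ORDER`, `CODEWORD_BANK` and `encode_stream` are hypothetical, only the entry for 'h' (length 5, order 3, bits '10110') is taken from the text, and the entry for 'a' is invented for the example.

```python
from collections import deque

# Hypothetical stand-ins for Table I. Only the 'h' entry comes from the text;
# the 'a' entry is an illustrative assumption.
LENGTH_ORDER = {"h": (5, 3), "a": (2, 1)}           # symbol -> (k, r)
CODEWORD_BANK = {(5, 3): "10110", (2, 1): "00"}     # (k, r) -> codeword bits


def encode_stream(symbols):
    """Behavioural model: emits one output bit per clock, k clocks per symbol."""
    fifo = deque(symbols)
    bits = []
    clocks = 0
    while fifo:
        symbol = fifo.popleft()                      # FIFO read (counter is zero)
        k, r = LENGTH_ORDER[symbol]                  # Length Encoder, Order Encoder
        shift_register = list(CODEWORD_BANK[(k, r)]) # bank chosen by length, word by order
        down_counter = k                             # k loaded into the down-counter
        while down_counter > 0:
            bits.append(shift_register.pop(0))       # one bit shifted out per clock
            down_counter -= 1
            clocks += 1
    return "".join(bits), clocks


if __name__ == "__main__":
    stream, clocks = encode_stream("ha")
    print(stream, clocks)
```

In this model the clock count always equals the total codeword length, which mirrors the claim that a codeword of length k takes k clocks with no waiting cycle.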
In the codeword banks, we employ only a few tri-state buffers rather than memory devices such as ROM or PLA. In fact, one pass transistor can be used instead of one tri-state buffer in the physical design. Furthermore, some bits can be tied directly to VCC or GND in the codeword banks, so the circuits can be reduced further in our architecture. In this system, the control circuits comprise the Length Encoder, Order Encoder, Length Decoder and down-counter. The Length Encoder encodes the number of bits of each codeword and works with the down-counter to control the bit stream. The Length Encoder and down-counter are therefore also required in ROM-based or PLA-based VLC coding. The Order Encoder and Length Decoder are overhead in our architecture; the circuit complexity is discussed in the next section.

IV. PERFORMANCE COMPARISON
Conventional VLC coding uses ROMs [4] or PLAs [6] to build in the codeword table. Generally, ROM and PLA circuits contain an AND plane and an OR plane. In a ROM, the size of the AND plane is fixed, and each cell location is called an address. Assuming that the maximum resolution of the symbols is $n$ bits, the number of addresses is $2^n$, each address contains $n$ bits, and the circuit needs two transistors to represent '0' or '1'. Thus the AND plane requires $n 2^{n+1}$ transistors. The OR plane requires $p 2^n$ transistors, where $p$ is the maximum codeword length. So the total number of transistors $S_{ROM}$ is

$$S_{ROM} = n 2^{n+1} + p 2^n = 2^n (2n + p). \qquad (3)$$

To reduce the circuit size of the ROM-based design, a PLA is often employed for VLC coding. Since both the AND plane and the OR plane of a PLA are programmable, the AND and OR planes require $2mn$ and $mp$ transistors, respectively, where $m$ denotes the total number of input symbols. So the total number of transistors $S_{PLA}$ in the PLA-based design is

$$S_{PLA} = 2mn + mp = m(2n + p). \qquad (4)$$

Since $2^n > m$, the PLA-based circuit is smaller than the ROM-based one.
In the proposed architecture, the codeword banks contain $k$ groups, and the $k$th group uses $k$ transistors if every bit needs a pass transistor. So the codeword banks require at most $\sum_{i=2}^{p} i$ transistors. In the Length Decoder (LD), we use $E_1$ to $E_k$ to select which codeword bank is active, where the maximum index is $p$. Thus the number of outputs is $p$ and the maximum number of inputs is $\log_2 p$, and we need $2p \log_2 p$ transistors to realize the LD circuit. For the Order Encoder, the number of outputs $B_i$ is less than or equal to $\log_2 m$, so we need $2m \log_2 m$ transistors in the worst case. Therefore, the proposed architecture requires

$$S_{proposed} = \sum_{i=2}^{p} i + 2p \log_2 p + 2m \log_2 m \qquad (5)$$

transistors. Table III compares the transistor counts of the ROM-based design, the PLA-based design and the proposed architecture. On average, the complexity of our method is about 1/4 and 1/2 of that of the ROM-based and PLA-based designs, respectively. In terms of complexity order, the ROM-based design is of exponential-product order $O(2^n p)$, the PLA-based design is of linear-product order $O(mp)$, and the proposed architecture is of log-product order $O(m \log m)$. The proposed architecture therefore becomes more efficient as the codeword size grows.
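To make the three transistor-count estimates easier to compare, they can be evaluated numerically. The Python sketch below is a rough illustration under stated assumptions: the function names are hypothetical, the ROM and PLA counts follow $2^n(2n+p)$ and $m(2n+p)$, and the proposed count assumes a bank term of $\sum_{i=2}^{p} i$ with base-2 logarithms of $p$ and $m$ rounded up to whole bits. The sample sizes are made up and are not taken from Table III.

```python
from math import ceil, log2


def rom_transistors(n, p):
    """ROM-based estimate: AND plane n*2^(n+1) plus OR plane p*2^n."""
    return 2 ** n * (2 * n + p)


def pla_transistors(m, n, p):
    """PLA-based estimate: AND plane 2*m*n plus OR plane m*p."""
    return m * (2 * n + p)


def proposed_transistors(m, p):
    """Distributed-logic estimate (assumed form of the bank and log terms)."""
    bank = sum(range(2, p + 1))                # group of length i uses i transistors
    length_decoder = 2 * p * ceil(log2(p))     # p outputs, about log2(p) inputs
    order_encoder = 2 * m * ceil(log2(m))      # at most about log2(m) outputs
    return bank + length_decoder + order_encoder


if __name__ == "__main__":
    # Illustrative sizes only: m symbols of n bits, maximum codeword length p.
    m, n, p = 32, 8, 16
    print("ROM:     ", rom_transistors(n, p))
    print("PLA:     ", pla_transistors(m, n, p))
    print("Proposed:", proposed_transistors(m, p))
```

Varying `m`, `n` and `p` in such a script shows how the gap between the estimates changes with the table size; the exact ratios depend on the chosen parameters.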

V. CONCLUSIONS
In this paper, we first present an efficient VLC method that creates codewords quickly; the constructed codewords can compress data in a variable-length format. With these codewords, the proposed VLC architecture is partitioned into two parts: one is an index for coding control, and the other is the actual codeword bank. With this scheme, the complexity order of the circuits is decreased from linear-product order to log-product order, which makes the proposed architecture attractive. As the codeword size grows, the proposed technique becomes more efficient than the conventional ROM-based or PLA-based designs. The timing shows that $k$ clocks are used to encode a codeword of length $k$ with no waiting cycle, so the architecture can operate in real time. When the proposed VLC architecture is combined with other coding methods such as MPEG or JPEG coding, full system simulation can be performed. It will offer higher efficiency when system-on-chip (SOC) designs are developed in the future.

FIGURE 1 Setup procedure for codeword construction.

FIGURE 2 Codeword assignment based on the binary tree.

FIGURE 3 The fast codeword creation.

FIGURE 5 Codeword bank in the module of k bits.

TABLE III
# The number of symbols is m, and each symbol has n bits. The maximum codeword length is p.