Improved Quantization Error Compensation Method for Fixed-Width Booth Multipliers

A novel quantization error (QE) compensation method is proposed for the design of high-accuracy fixed-width radix-4 Booth multipliers. It effectively reduces the QE and saves multiplier area when the multipliers are employed in cognitive radio (CR) detectors and digital signal processors (DSPs). The truncated partial-products of the proposed multipliers are finely divided into three sections: a reserved section, an adaptive compensation section, and a constant compensation section. The QE compensation carries of the multipliers are generated by applying probability estimation to a shrunken minor truncated section, which combines the constant compensation and adaptive compensation sections. The proposed compensation method not only reduces the QE of the fixed-width Booth multipliers but also avoids the exhaustive computing resources (time and memory) otherwise needed to obtain the compensation carries by statistical simulation. The proposed method achieves higher accuracy than existing works under the same area and power budgets. Simulation and experimental results show that the improved compensation method has the minimum power-delay product among the existing methods under the same area and can save up to 30% area relative to the realization of full-width radix-4 Booth multipliers.


Introduction
Fixed-width multipliers have been widely used in the design of digital signal processors (DSPs) owing to their smaller area and lower power dissipation [1][2][3]. To reduce the chip area of channel detectors for cognitive radio, many fixed-width Booth multipliers have been used. However, they reduce the detection accuracy because the partial-products are truncated. Therefore, a quantization error compensation (QEC) technique is required in the design of fixed-width Booth multipliers.
Traditional QEC methods for fixed-width multipliers can be divided into three categories. The first category is constant compensation [4]. In this method, the QEC value is a constant; it has the advantage of simplicity but leads to a large quantization error (QE). The second category is adaptive QEC [5], which reduces the truncation error by using a variable compensation value. The third category is hybrid error compensation, which uses constant and adaptive QEC techniques together to reduce the truncation error. Compared with the first two categories, this category is more accurate [6]. Two further QEC methods have been presented in [7,8]. However, these two methods are usually used in the design of Baugh-Wooley multipliers and are not applicable to Booth multipliers. Another QEC method using a binary threshold algorithm has been presented in [9], but its accuracy is lower than that of the method presented in [1]. In [2], statistical analysis and linear regression analysis are adopted to generate the QEC value, and the adaptive QEC method is introduced into the design of fixed-width radix-4 Booth multipliers for the first time. This adaptive QEC method has higher accuracy than [1] but is very complex. To overcome the disadvantages of [2], [10] presented a method that divides the truncated partial-products into a major truncated section and a minor truncated section. The major truncated section is adjacent to the reserved section, while the QEC of the minor truncated section is realized by an adaptive method. In [11], a new QEC method based on [10] is proposed. Its truncation error is more symmetric, but its quantization accuracy is not distinctly improved. Reference [12] proposed an adaptive QEC method with a conditional-probability estimator in place of the time-consuming exhaustive simulation of previous works, which is especially useful for large bit-width Booth multipliers. A closed-form formula was derived in [13] with the traditional method of probabilistic estimation to estimate the QEC. However, the closed-form formula is only an approximation of the mean of the minor truncated section, which decreases the compensation accuracy.
The compensation carries of the minor truncated section in [10][11][12][13] are generated by adaptive QEC methods based on all the truncated partial-products. In [10,11], the compensation carries are generated from all the truncated partial-products, and hence the simulation statistics consume a large amount of computer resources (time and memory). The required resources grow almost exponentially with the bit-width. Therefore, for multipliers with wider bit-widths, an ordinary computer cannot complete the simulations. Although the required resources are reduced in [12,13], their QEC accuracy is not effectively improved.
Based on the trade-off between accuracy and computer resources, the minor truncated section is divided into two parts in this paper: the lower partial-products and the upper partial-products. The compensation carries are less affected by the lower partial-products of the minor truncated section. Therefore, we propose that a compensation constant for the lower partial-products be generated by statistical analysis and then incorporated into the upper partial-products to form a shrunken minor truncated section. Finally, the quantitative compensation carries are created by applying probability estimation to the shrunken minor truncated section; hereafter, this multiplier is called the shrunken partial-products compensation (SPPC) Booth multiplier. The proposed QEC method not only reduces the QE of the multipliers but also avoids the resource requirements of exhaustive simulation. Simulations and experiments show that, compared with the previous works [9][10][11][12][13], the proposed QEC method effectively improves the QE and performance of fixed-width Booth multipliers. To verify the proposed QEC method, we have also designed SPPC multiplier circuits of different widths and compared them with other multipliers of the same width in a TSMC 0.18 μm process. The experiments show that the proposed SPPC Booth multipliers have a smaller die area as the width of the multipliers increases.
The rest of the paper is organized as follows. Section 2 introduces the principle of the modified radix-4 Booth multiplier. The proposed QEC method is discussed in Section 3 along with its circuit realization. Simulation results, comparisons, and an application experiment are presented in Section 4. Section 5 concludes the paper.

Modified Radix-4 Booth Multiplier
The modified radix-4 Booth recoding method, proposed in [14], is commonly used in Booth multiplier designs.
The N-bit multiplicand $X$ and M-bit multiplier $Y$ of the two's-complement multiplier are expressed as follows:

$$X = -x_{N-1}2^{N-1} + \sum_{i=0}^{N-2} x_i 2^i, \tag{1}$$

$$Y = -y_{M-1}2^{M-1} + \sum_{j=0}^{M-2} y_j 2^j. \tag{2}$$

According to modified radix-4 Booth recoding, from the most significant bit (MSB), every three bits form a group and adjacent groups overlap by one bit. When M is odd, $y_M = y_{M-1}$ is assumed for proper recoding with sign extension. In any case, $y_{-1} = 0$. $Y$ in (2) can be rewritten as

$$Y = \sum_{k=0}^{\lceil M/2 \rceil - 1} y'_k 4^k, \tag{3}$$

where

$$y'_k = -2y_{2k+1} + y_{2k} + y_{2k-1}.$$

The recoding values are listed in Table 1. $y_{k,\mathrm{sel1}}/y_{k,\mathrm{sel2}}$, $y_{k,\mathrm{sel}}$, and $y''_k$ denote the binary expression, sign, and nonzero state of the recoding values, respectively. The scheme of the modified radix-4 Booth recoding is shown in Figure 1.
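The recoding described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `booth_radix4_recode`, `nonzero_labels`, and `booth_value` are hypothetical, and the digit rule used is the standard radix-4 Booth rule (each digit equals minus twice the upper bit plus the middle bit plus the overlap bit), which is assumed to match Table 1.

```python
def booth_radix4_recode(y, width):
    """Recode a signed `width`-bit integer into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant digit first.
    """
    # Two's-complement bits of y, LSB first (Python's >> is arithmetic).
    bits = [(y >> j) & 1 for j in range(width)]
    if width % 2:
        # Odd width: repeat the sign bit so the bits split into full groups.
        bits.append(bits[-1])
    padded = [0] + bits  # implicit bit below the LSB is 0
    digits = []
    for k in range(len(bits) // 2):
        # Overlapping group: (overlap bit, middle bit, upper bit).
        y_prev, y_lo, y_hi = padded[2 * k : 2 * k + 3]
        digits.append(-2 * y_hi + y_lo + y_prev)
    return digits


def nonzero_labels(digits):
    """Nonzero state of each recoded digit (1 if the digit is not zero)."""
    return [int(d != 0) for d in digits]


def booth_value(digits):
    """Evaluate the recoded digits as a radix-4 number."""
    return sum(d * 4 ** k for k, d in enumerate(digits))


digits = booth_radix4_recode(-3, 4)
print(digits, nonzero_labels(digits), booth_value(digits))
```

Evaluating the digits as a radix-4 number recovers the original operand, which is a convenient sanity check for any recoder.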
In a radix-4 Booth multiplier, each partial-product bit ($p_{i,j}$) is associated with two adjacent bits of the multiplicand. The partial-products for the four possible combinations of these two bits ($x_i$ and $x_{i-1}$) are shown in Table 2.
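As a hedged illustration of how each partial-product bit depends on two adjacent multiplicand bits, the following Python sketch builds the rows of a radix-4 Booth partial-product array and checks that they sum to the exact product. The select and sign signals (`sel1`, `sel2`, `neg`) follow a common textbook formulation and are an assumption; they are not taken from Table 1 or Table 2. The function names are hypothetical, and the operand width `n` is assumed to be even.

```python
def booth_partial_products(x, y, n):
    """Rows of the Booth partial-product array for signed n-bit x and y.

    Each row is (bits, neg): bits holds n + 1 bits, LSB first, and neg is
    the correction bit added at the row's LSB for negative digits.
    """
    x_bits = [(x >> i) & 1 for i in range(n)]
    x_bits.append(x_bits[-1])                      # sign-extend x by one bit
    y_bits = [0] + [(y >> j) & 1 for j in range(n)]  # bit below the LSB is 0
    rows = []
    for k in range(n // 2):
        y_prev, y_lo, y_hi = y_bits[2 * k : 2 * k + 3]
        sel1 = y_lo ^ y_prev                       # digit magnitude is 1
        sel2 = int((y_hi, y_lo, y_prev) in ((0, 1, 1), (1, 0, 0)))  # magnitude 2
        neg = y_hi & (1 - (y_lo & y_prev))         # digit is negative
        bits = []
        for i in range(n + 1):
            x_i = x_bits[i]
            x_im1 = x_bits[i - 1] if i > 0 else 0
            # Each bit depends only on two adjacent multiplicand bits.
            bits.append(((x_i & sel1) | (x_im1 & sel2)) ^ neg)
        rows.append((bits, neg))
    return rows


def sum_partial_products(rows, n):
    """Add the rows with their radix-4 weights to recover the product."""
    total = 0
    for k, (bits, neg) in enumerate(rows):
        # Interpret the n + 1 bits as two's complement, then add the correction.
        row = sum(b << i for i, b in enumerate(bits[:n])) - (bits[n] << n) + neg
        total += row * 4 ** k
    return total


print(sum_partial_products(booth_partial_products(-7, 5, 6), 6))
```

A fixed-width multiplier keeps only the upper columns of this array, which is why the discarded lower columns need a compensation term.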
In the proposed method, the QE of the adaptive compensation section is compensated by adaptive QEC, and the QE of the constant compensation section is compensated by constant QEC. As a rule of thumb, 3∼5 columns of partial-products are needed to compose  , when the width of multipliers is 8∼32 bits.
Supposing that both the probabilities of  , = 1 and where E[⋅] is the expected value. Substituting (11) in (9) we can obtain Thus, the constant 3/16 acts as the QEC value of the constant compensation section. Similarly, the expected value of  7 is 1/2. We replace  7 with 1/2 in (8), obtaining its equivalent decimal value of 1/8. Based on the proposed SPPC multiplier, the truncated section with carries in Figure 2 can be redrawn as the shrunken partial-product section in Figure 3. The maximum carries are generated if all $p_{i,j}$ in $P_{L,A}$ are 1. In Figure 3, the maximum carries need 7 bits; therefore, we temporarily register the carry output states of $P_{L,A}$ in the variables $c_i$ ($i = 0, \ldots, 6$). In the following, we propose a method that associates the nonzero recoding labels with the compensation carries.
According to the number of 1s in the nonzero recoding labels, we divide the shrunken partial-product section in Figure 3   deduced as above. Compared with [10,11], the method can greatly reduce the simulation cost. The statistical results for the different categories are listed in Table 4 (note that the middle bit between  7,0 and  6 in the shrunken section is still regarded as a partial-product in simulation).
We then encode each category with 4 bits to associate the nonzero recoding labels with the compensation carries. For example, cate-0 is encoded as 0000, cate-1 as 0001, and so on. The compensation carries $c_i$ are derived from Table 4 according to the following rule: if the number of simulations in which the $i$th carry output equals 1 is larger than half the number of simulations (NoS), then $c_i = 1$; otherwise $c_i = 0$. The corresponding relations between the category code and $c_i$ are listed in Table 5.
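The majority rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample format (a category index paired with a tuple of observed carry bits) and the function name `majority_carries` are hypothetical, the sample data are made up for the example, and the number of simulations is assumed to be counted per category.

```python
from collections import defaultdict


def majority_carries(samples, num_carries):
    """Pick each compensation carry by majority vote within its category.

    samples: iterable of (category, carry_bits) pairs, one per simulation.
    Returns {category: [c_0, c_1, ...]} where c_i is 1 only if carry i was 1
    in more than half of that category's simulations.
    """
    ones = defaultdict(lambda: [0] * num_carries)  # count of carry i == 1
    totals = defaultdict(int)                      # simulations per category
    for category, carry_bits in samples:
        totals[category] += 1
        for i, bit in enumerate(carry_bits):
            ones[category][i] += bit
    return {
        category: [int(2 * ones[category][i] > totals[category])
                   for i in range(num_carries)]
        for category in totals
    }


# Made-up observations: three simulations in category 0, two in category 1.
observed = [(0, (0, 0)), (0, (1, 0)), (0, (1, 0)), (1, (1, 1)), (1, (0, 1))]
print(majority_carries(observed, 2))
```

A tie (exactly half of the simulations) resolves to 0 here, matching the "larger than a half" wording of the rule.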
According to Table 5, the carries $c_0$ to $c_3$ are

Architecture Design of SPPC Booth Multipliers.
The circuit implementation of category encoding is shown in Figure 4, where 2Bs-Adder and 3Bs-Adder denote the two-bit adder and three-bit adder, respectively. In light of (13), one implementation of the carry generation circuits is shown in Figure 5.
According to the above discussions, a modified radix-4 Booth fixed-width multiplier with the proposed QEC circuits is shown in Figure 6. The traditional Booth recoding encoder, the proposed category encoding circuit, and the carry generation circuit together compose the QEC circuits that generate the QEC carries.

Table 5: Relations between category code and $c_i$.

Comparisons and Discussions
4.1. Quantization Accuracy Simulations. The errors of the proposed SPPC Booth multipliers are compared with those of the ideal truncated Booth multiplier and of previous works in Table 6. These errors include the average error $\varepsilon_{\mathrm{mean}}$, the maximum error $\varepsilon_{\mathrm{max}}$, and the variance $\varepsilon_{\mathrm{var}}$. In the accuracy simulation, all pairs of input samples are applied to estimate the QE of the SPPC multiplier. $\varepsilon_{\mathrm{mean}}$, $\varepsilon_{\mathrm{max}}$, and $\varepsilon_{\mathrm{var}}$ are defined in terms of the ideal product and the quantized product of the Booth multipliers, where | ⋅ | and max{⋅} are the absolute-value and maximum operators, respectively. The adaptive estimation methods in [9][10][11] have been adopted to improve the truncation error. Instead of the exhaustive, resource-intensive simulation methods of previous works [9][10][11], the QE of SPPC multipliers is analyzed and derived with a simpler statistical method. Table 6 shows that the proposed SPPC multipliers have almost the best error performance among the compared works, except for [11], which has the best $\varepsilon_{\mathrm{mean}}$. The reason is that [11] uses more information from the Booth encoder to alleviate the truncation errors. Nevertheless, the area cost in [11] increases because of the extra information used by the compensation circuits. Even though $\varepsilon_{\mathrm{max}}$ and $\varepsilon_{\mathrm{var}}$ of the multipliers in [13] are smaller than those of the other multipliers, their $\varepsilon_{\mathrm{mean}}$ is larger than that of the proposed SPPC multiplier and the multipliers in [10][11][12].
The distributions of QE have been calculated for the different multipliers. The sample ratios of the QE value are listed in the last three columns of Table 6. The statistical results show that the sample ratio of QE magnitude below 1 ($|\varepsilon| < 1$) in the SPPC Booth multipliers is higher than that in [9][10][11] by about 13%. In other words, the quantization accuracy of the proposed SPPC multiplier is higher than that of those methods.
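The kind of exhaustive accuracy simulation described above can be sketched in Python. Several things here are assumptions and not the paper's definitions: the error is taken as the difference between the ideal and quantized products in units of the retained LSB, the mean and maximum use the absolute error, and the variance is the population variance of the signed error. The quantizer `truncate_product` is a plain post-truncation stand-in, not the SPPC multiplier, and all names are hypothetical.

```python
from statistics import mean, pvariance


def truncate_product(x, y, n):
    """Stand-in quantizer: drop the n LSBs of the exact 2n-bit product."""
    return ((x * y) >> n) << n


def error_stats(quantize, n):
    """Exhaustively apply all signed n-bit input pairs and summarize the QE."""
    lsb = 1 << n                       # weight of the retained LSB
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    errors = [
        (x * y - quantize(x, y, n)) / lsb
        for x in range(lo, hi)
        for y in range(lo, hi)
    ]
    abs_errors = [abs(e) for e in errors]
    return {
        "mean": mean(abs_errors),
        "max": max(abs_errors),
        "var": pvariance(errors),
        # Share of input pairs whose error magnitude stays below one LSB.
        "ratio_below_1": sum(a < 1 for a in abs_errors) / len(abs_errors),
    }


print(error_stats(truncate_product, 6))
```

The number of input pairs is $2^{2n}$, which illustrates why exhaustive statistics become impractical as the bit-width grows.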

Performance Simulation.
A comparison of performance between the proposed SPPC multipliers and previous works is carried out with each design using its own compensation circuit. Multipliers of different widths are synthesized with Synopsys Design Compiler using a standard cell library of the TSMC 0.18 μm CMOS process. Their area, delay, and power dissipation are listed in Table 7.
In general, there is a trade-off between hardware overhead and accuracy in these compensation circuits. The multiplier proposed in [11] has the highest accuracy in $\varepsilon_{\mathrm{mean}}$, but it has larger area, delay, and power. In contrast, the SPPC multiplier has the same area as the multipliers in [12,13] and a lower area than the multipliers in [9][10][11]. As a result, the proposed SPPC multipliers achieve higher accuracy at a lower area cost.
To compare the performance of the different multipliers comprehensively, we use their power-delay products as the standard of comparison; these are listed in the last column of Table 7. The power-delay products of the multipliers proposed in [9][10][11] are distinctly larger than those of the other multipliers. Compared with the other two multipliers in [12,13], the proposed SPPC multipliers have better overall performance.

attenuation in the stop-band for a CR detector. The input, output, and coefficient widths of the FIR filter are all 16 bits, and the internal adders of the FIR filter are 22 bits wide. The test input is a 5 MHz sinusoidal signal sampled at 40 MHz. Four different multipliers (16 × 16 bits), namely the SPPC multiplier and the three multipliers in [11][12][13], are instantiated in the filter. The error mean and error variance of the output samples for the different instantiated multipliers are listed in Table 8. Table 8 shows that the error mean of [11] is the smallest and that the error mean of SPPC is better than that of [12,13], whereas the error variance of SPPC is the smallest. These results are consistent with the QE accuracy simulation results in the previous section.
In CR detectors, it is very important to detect the spectral peak-values of the signal to determine whether the channel is idle [16]. The relative errors of the average spectral peak-values (over 100 simulation runs) with respect to the ideal spectral peak-values of the FIR filter outputs are listed in the last column of Table 8. Table 8 shows that the spectral peak-values of the proposed SPPC multiplier are closer to the ideal peak-value.
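The relative error of a spectral peak-value can be illustrated with a short Python sketch. It uses the test tone described in the experiment (5 MHz sampled at 40 MHz) but is otherwise an assumption-laden stand-in: the FIR filter and multipliers are not modeled, a simple 16-bit rounding of the samples plays the role of the arithmetic error, and the names `dft_peak` and `relative_peak_error` and the 64-sample window are hypothetical choices.

```python
import cmath
import math


def dft_peak(samples):
    """Largest DFT magnitude over the non-negative frequency bins."""
    n = len(samples)
    return max(
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    )


def relative_peak_error(ideal, measured):
    """Relative deviation of the measured spectral peak from the ideal one."""
    peak_ideal = dft_peak(ideal)
    return abs(dft_peak(measured) - peak_ideal) / peak_ideal


F_SIGNAL, F_SAMPLE, NUM_SAMPLES = 5e6, 40e6, 64   # 8 full periods in 64 samples
tone = [math.sin(2 * math.pi * F_SIGNAL * t / F_SAMPLE)
        for t in range(NUM_SAMPLES)]
# Stand-in for arithmetic error: round each sample to a 16-bit signed grid.
quantized = [round(s * 32767) / 32767 for s in tone]

print(dft_peak(tone), relative_peak_error(tone, quantized))
```

Because the window holds a whole number of periods, the ideal peak falls on a single bin, so the relative error reflects only the injected quantization.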

Conclusion
By further dividing the minor truncated section of the Booth multiplier into adaptive compensation and constant compensation sections, we rebuilt the adaptive QEC for fixed-width multipliers. Based on the number of 1s in the sequence of nonzero Booth recoding labels, we propose a new QEC method to generate the compensation carries. The simulation results show that the QE of the SPPC multiplier is smaller than that of the existing methods. The proposed QEC method and SPPC multiplier are useful for DSP systems with large-width multipliers and higher precision requirements.

Figure 6: Partial-product array of fixed-width Booth multiplier with proposed QEC.

Table 1: The modified Booth recoding.

Table 3: Partial-products of different nonzero recoding labels.

Table 4: Statistics of carry output states for different categories.