We investigate Residue Number System (RNS) to binary conversion, which is an important issue concerning the utilization of RNS numbers in Digital Signal Processing (DSP) applications. We propose two new reverse converters for the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$. First, we simplify the Chinese Remainder Theorem (CRT) to obtain a reverse converter that uses mod-$(2^{n+1}-1)$ operations instead of the mod-$(2^{n+1}+1)(2^{n+1}-1)$ operations required by other state-of-the-art equivalent converters. Next, we further reduce the hardware complexity by making the resulting reverse converter architecture adder based. Two hybrid Cost-Efficient (CE) and Speed-Efficient (SE) reverse converters are proposed. These two hybrid converters are obtained by combining the best state-of-the-art converter with the newly introduced area-delay efficient scheme. The proposed hybrid CE converter outperforms the best state-of-the-art CE converter in terms of delay with similar area cost. Additionally, the proposed hybrid SE converter requires less area and has a smaller delay than the best state-of-the-art equivalent SE converter.
1. Introduction
The presence of carry chains in conventional weighted number systems such as binary or decimal number systems often limits the efficiency of computer arithmetic operations. The Residue Number System (RNS) is a number system with an attractive carry-free property, which has proved to be highly useful in many Digital Signal Processing (DSP) applications requiring high-speed computations [1, 2]. RNS also has the following inherent features due to its carry-free property: modularity, parallelism, and fault tolerance. In order not to offset the speed gained in RNS operations, a fast RNS-to-binary converter is required. The complexity as well as the efficiency of an RNS-to-binary converter is determined by the moduli choice and by the conversion algorithm. Many different choices of moduli sets are available for RNS to represent the binary numbers in a certain range. Three-moduli sets have been actively investigated, for example, $\{2^{n}-1, 2^{n}, 2^{n}+1\}$ [3], $\{2^{n}, 2^{n}-1, 2^{n-1}-1\}$ [4], $\{2n+1, 2n-1, 2n\}$ [5], and $\{2n+2, 2n+1, 2n\}$ [6]. The dynamic range of these moduli sets is not sufficient for applications requiring a larger dynamic range. Thus, the moduli set $\{2^{n}-1, 2^{n}, 2^{n}+1\}$, which is the most popular length-three moduli set, has been enhanced to $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$ [7] and $\{2^{n+1}-1, 2^{n}, 2^{n}-1\}$ [8], to mention just a few. Generally speaking, RNS-to-binary conversion is based on either the Chinese Remainder Theorem (CRT) [1, 2, 6] or the Mixed Radix Conversion (MRC) [9].
In this paper, we propose two new hybrid Cost-Efficient (CE) and Speed-Efficient (SE) reverse converters for the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$. We obtain the two hybrid converters by combining the converters in [7] with the newly introduced area-delay efficient converters. First, we simplify the CRT to obtain a reverse converter that uses mod-$(2^{n+1}-1)$ operations instead of the mod-$(2^{n+1}+1)(2^{n+1}-1)$ operations required by the proposal in [7]. Next, we further reduce the hardware complexity by making the resulting reverse converter architecture adder based. Basically, the newly introduced scheme is made up of two Carry-Save Adders (CSAs) and an $(n+1)$-bit Carry Propagate Adder (CPA). The proposed CE converter outperforms the one in [7] in terms of delay with similar area cost. Additionally, the proposed SE converter requires less area and has a smaller conversion delay than the one in [7].
The rest of the paper is organised as follows: Section 2 presents the necessary background information. In Section 3, we describe the proposed algorithm. Section 4 presents the hardware realization of the proposed algorithm, and a comparison with the state-of-the-art reverse converters is provided in Section 5. The paper is concluded in Section 6.
2. Background
RNS is defined in terms of a set of relatively prime moduli $\{m_i\}_{i=1,k}$ such that $\gcd(m_i, m_j) = 1$ for $i \neq j$, where $\gcd(m_i, m_j)$ denotes the greatest common divisor of $m_i$ and $m_j$, while $M = \prod_{i=1}^{k} m_i$ is the dynamic range. The residues of a decimal number $X$ can be obtained as $x_i = |X|_{m_i}$; thus $X$ can be represented in RNS as $X = (x_1, x_2, x_3, \ldots, x_k)$, $0 \le x_i < m_i$. This representation is unique for any integer $X \in [0, M-1]$. We note here that in this paper we use $|X|_{m_i}$ to denote the $X \bmod m_i$ operation.
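The forward mapping described above can be illustrated with a short Python sketch. The helper names `moduli`, `to_rns`, and `pairwise_coprime` are hypothetical and introduced only for illustration; the sketch assumes the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$ studied in this paper.

```python
from math import gcd

def moduli(n):
    """Return (m1, m2, m3) = (2^(n+1)+1, 2^(n+1)-1, 2^n)."""
    return (2 ** (n + 1) + 1, 2 ** (n + 1) - 1, 2 ** n)

def to_rns(x, n):
    """Residues (x1, x2, x3) of x with respect to moduli(n)."""
    return tuple(x % m for m in moduli(n))

def pairwise_coprime(ms):
    """True if every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1 for i, a in enumerate(ms) for b in ms[i + 1:])
```

For example, with $n=3$ the moduli are $(17, 15, 8)$ and `to_rns(19, 3)` gives the residues $(2, 4, 3)$.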
For a moduli set $\{m_i\}_{i=1,k}$ with the dynamic range $M = \prod_{i=1}^{k} m_i$, the residue number $(x_1, x_2, x_3, \ldots, x_k)$ can be converted into the decimal number $X$, according to the Chinese Remainder Theorem, as follows [10]:

$$X = \left| \sum_{i=1}^{k} M_i \left| M_i^{-1} x_i \right|_{m_i} \right|_{M}, \tag{1}$$

where $M = \prod_{i=1}^{k} m_i$, $M_i = M/m_i$, and $M_i^{-1}$ is the multiplicative inverse of $M_i$ with respect to $m_i$.
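As a minimal sketch of (1), the following Python function reconstructs $X$ from its residues for any pairwise coprime moduli. The name `crt` is hypothetical, and the modular inverse is obtained with Python's built-in three-argument `pow` (negative exponents require Python 3.8 or later); this is a software illustration, not the hardware scheme discussed in this paper.

```python
from math import prod

def crt(residues, moduli):
    """Reconstruct X in [0, M-1] from its residues using equation (1)."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)          # multiplicative inverse of M_i modulo m_i
        total += M_i * ((inv * x_i) % m_i)
    return total % M
```

With the moduli $(17, 15, 8)$, `crt((2, 4, 3), (17, 15, 8))` returns 19.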
This general scheme can be simplified when certain moduli sets of interest, such as $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$, are utilized. For this moduli set, efficient converters have been presented in [7]. In [2], the New CRT proposed in [3] is formulated as follows:

$$X = x_1 + m_1 \left| w_1 x_1 + \sum_{i=2}^{k} w_i \left| N_i^{-1} x_i \right|_{m_i} \right|_{m_2 \cdots m_k}, \tag{2}$$

where $k > 1$, $w_i = N_i/m_1$, and $N_i^{-1}$ is the multiplicative inverse of $N_i$ with respect to $m_i$. For the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$, the following relation, based on the New CRT represented by (2), has been presented in [7]:

$$X = x_1 + 2^{n} \left| 2^{n+2}(x_2 - x_1) + 2^{n+1}(2^{n+1}+1)(x_3 - x_2) \right|_{2^{2n+2}-1}. \tag{3}$$
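Given that the residues $(x_1, x_2, x_3)$ have the following binary representations:

$$x_1 = (x_{1,n-1} x_{1,n-2} \cdots x_{1,1} x_{1,0}), \quad x_2 = (x_{2,n+1} x_{2,n} \cdots x_{2,1} x_{2,0}), \quad x_3 = (x_{3,n} x_{3,n-1} \cdots x_{3,1} x_{3,0}), \tag{4}$$

equation (3) was further simplified to obtain

$$X = x_1 + 2^{n} \left| v_1' + v_{21} + v_3 \right|_{2^{2n+2}-1}, \tag{5}$$

where

$$\begin{aligned}
v_1' &= \underbrace{\bar{x}_{1,n-1} \cdots \bar{x}_{1,1} \bar{x}_{1,0}}_{n}\,\underbrace{\bar{x}_{2,n+1} \cdots \bar{x}_{2,1} \bar{x}_{2,0}}_{n+1}, \\
v_{21} &= \underbrace{x_{2,n} \cdots x_{2,1} x_{2,0}}_{n+1}\,\underbrace{0 \cdots 0 0}_{n}\,x_{2,n+1}, \\
v_3 &= \underbrace{x_{3,n} \cdots x_{3,1} x_{3,0}}_{n+1}\,\underbrace{x_{3,n} \cdots x_{3,1} x_{3,0}}_{n+1}.
\end{aligned} \tag{6}$$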
Given that the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$ is desirable, can we obtain a more effective reverse converter than the ones in [7]? In the following section, we present two effective reverse converters for the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$ by first simplifying (1).
3. Proposed Algorithm
Given the RNS number $(x_1, x_2, x_3)$ with respect to the moduli set $\{m_1, m_2, m_3\}$ in the form $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$, the proposed algorithm computes the decimal equivalent of this RNS number based on a further simplification of the well-known traditional CRT. First, we show that the computation of the multiplicative inverses can be eliminated for this moduli set, resulting in memoryless reverse converters. Next, we obtain reverse converters that utilize modulo-$(2^{n+1}-1)$ operations instead of the modulo-$(2^{n+1}+1)(2^{n+1}-1)$ operations required by the state-of-the-art equivalent converters. We further reduce the hardware complexity by obtaining adder-based reverse converters.
Theorem 1.
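Given the moduli set $\{m_1, m_2, m_3\}$ with $m_1 = 2^{n+1}+1$, $m_2 = 2^{n+1}-1$, $m_3 = 2^{n}$, the following holds true:

$$\left| (m_1 m_2)^{-1} \right|_{m_3} = 2^{n} - 1, \tag{7}$$

$$\left| (m_2 m_3)^{-1} \right|_{m_1} = 1, \tag{8}$$

$$\left| (m_1 m_3)^{-1} \right|_{m_2} = 1. \tag{9}$$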
Proof.
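If it can be demonstrated that $|(2^{n}-1) \times (m_1 m_2)|_{m_3} = 1$, then $(2^{n}-1)$ is the multiplicative inverse of $(m_1 m_2)$ with respect to $m_3$. $|(2^{n}-1) \times (m_1 m_2)|_{m_3}$ is given by

$$\begin{aligned}
\left| (2^{n+1}+1)(2^{n+1}-1)(2^{n}-1) \right|_{2^{n}} &= \left| (2^{2n+2}-1)(2^{n}-1) \right|_{2^{n}} \\
&= \left| 2^{3n+2} - 2^{n} - 2^{2n+2} + 1 \right|_{2^{n}} \\
&= \left| |2^{3n+2}|_{2^{n}} - |2^{n}|_{2^{n}} - |2^{2n+2}|_{2^{n}} + 1 \right|_{2^{n}} \\
&= |0 - 0 - 0 + 1|_{2^{n}} = 1,
\end{aligned} \tag{10}$$

thus (7) holds true.

In the same way, if $|1 \times (m_2 m_3)|_{m_1} = 1$, then 1 is the multiplicative inverse of $(m_2 m_3)$ with respect to $m_1$. $|1 \times (m_2 m_3)|_{m_1}$ is given by $|(2^{n+1}-1)(2^{n})|_{2^{n+1}+1} = |2^{2n+1} - 2^{n}|_{2^{n+1}+1} = |1|_{2^{n+1}+1} = 1$, thus (8) holds true.

Again, if $|1 \times (m_1 m_3)|_{m_2} = 1$, then 1 is the multiplicative inverse of $(m_1 m_3)$ with respect to $m_2$. $|1 \times (m_1 m_3)|_{m_2}$ is given by

$$\left| (2^{n+1}+1)(2^{n}) \right|_{2^{n+1}-1} = \left| 2^{2n+1} + 2^{n} \right|_{2^{n+1}-1} = |1|_{2^{n+1}-1} = 1, \tag{11}$$

thus (9) holds true.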
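The three inverses in Theorem 1 can also be checked numerically. The following Python sketch is only a sanity check over a finite range of $n$, not a substitute for the proof; the function name `theorem1_inverses` is hypothetical, and `pow` with a negative exponent requires Python 3.8 or later.

```python
def theorem1_inverses(n):
    """Return the inverses of m1*m2 mod m3, m2*m3 mod m1 and m1*m3 mod m2."""
    m1, m2, m3 = 2 ** (n + 1) + 1, 2 ** (n + 1) - 1, 2 ** n
    return (
        pow(m1 * m2, -1, m3),  # Theorem 1 states this equals 2^n - 1
        pow(m2 * m3, -1, m1),  # Theorem 1 states this equals 1
        pow(m1 * m3, -1, m2),  # Theorem 1 states this equals 1
    )
```

For instance, `theorem1_inverses(3)` returns `(7, 1, 1)`, in agreement with (7), (8), and (9) for $n=3$.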
The following important relations are used in the subsequent theorem. Given the moduli set $\{m_1, m_2, m_3\}$ with $m_1 = 2^{n+1}+1$, $m_2 = 2^{n+1}-1$, $m_3 = 2^{n}$, the following holds true:

$$m_1 = m_2 + 2, \tag{12}$$

$$m_1 = 2m_3 + 1, \tag{13}$$

$$m_2 = 2m_3 - 1. \tag{14}$$
Theorem 2.
The decimal equivalent of the RNS number $(x_1, x_2, x_3)$ with respect to the moduli set $\{m_1, m_2, m_3\}$ in the form $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$ is computed as follows:

$$X = 2m_3(x_3 - x_1) + x_3 + m_3 m_1 \left| x_1 - 2x_3 + x_2 \right|_{m_2}. \tag{15}$$
Proof.
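Equation (1) for $k = 3$ is given by

$$X = \left| \sum_{i=1}^{3} M_i \left| M_i^{-1} x_i \right|_{m_i} \right|_{M}. \tag{16}$$

By substituting (7), (8), and (9) into (16) we obtain the following:

$$\begin{aligned}
X &= \left| (m_2 m_3) x_1 + (m_1 m_3) x_2 + (m_1 m_2)(m_3 - 1) x_3 \right|_{M} \\
&= \left| (m_2 m_3) x_1 + (m_1 m_3) x_2 + (M - m_1 m_2) x_3 \right|_{M} \\
&= \left| (m_2 m_3) x_1 + (m_1 m_3) x_2 - m_1 m_2 x_3 \right|_{M}.
\end{aligned} \tag{17}$$

Substituting (12) in the above equation, we obtain

$$\begin{aligned}
X &= \left| m_3 (m_1 - 2) x_1 + m_1 m_3 x_2 - m_1 m_2 x_3 \right|_{M} \\
&= \left| m_1 m_3 x_1 - 2 m_3 x_1 + m_1 m_3 x_2 - m_1 m_2 x_3 \right|_{M} \\
&= -2 m_3 x_1 + \left| m_1 m_3 x_1 + m_1 m_3 x_2 - m_1 m_2 x_3 \right|_{m_1 m_2 m_3}.
\end{aligned} \tag{18}$$

Equation (18) can be further simplified by using the following lemma presented in [11]:

$$\left| a m_1 \right|_{m_1 m_2} = m_1 \left| a \right|_{m_2}. \tag{19}$$

Applying (19), (18) becomes

$$X = -2 m_3 x_1 + m_1 \left| m_3 x_1 + m_3 x_2 - m_2 x_3 \right|_{m_2 m_3}. \tag{20}$$

Using (14) in (20), we obtain

$$\begin{aligned}
X &= -2 m_3 x_1 + m_1 \left| m_3 x_1 + m_3 x_2 - x_3 (2 m_3 - 1) \right|_{m_2 m_3} \\
&= -2 m_3 x_1 + m_1 \left| m_3 x_1 + m_3 x_2 - 2 m_3 x_3 + x_3 \right|_{m_2 m_3} \\
&= -2 m_3 x_1 + m_1 x_3 + m_1 \left| m_3 (x_1 - 2 x_3 + x_2) \right|_{m_2 m_3}.
\end{aligned} \tag{21}$$

Applying (19), (21) becomes

$$X = -2 m_3 x_1 + m_1 x_3 + m_1 m_3 \left| x_1 - 2 x_3 + x_2 \right|_{m_2}. \tag{22}$$

Using (13) in (22), we obtain

$$\begin{aligned}
X &= -2 m_3 x_1 + x_3 (2 m_3 + 1) + m_1 m_3 \left| x_1 - 2 x_3 + x_2 \right|_{m_2} \\
&= 2 m_3 (x_3 - x_1) + x_3 + m_1 m_3 \left| x_1 - 2 x_3 + x_2 \right|_{m_2},
\end{aligned} \tag{23}$$

thus, (15) holds true.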
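A minimal Python sketch of (15) is given below; the function name `reverse_convert` is hypothetical. One step is an assumption of this sketch rather than part of the derivation above: the result is reduced modulo $M$ at the end, because the right-hand side of (15) is congruent to $X$ modulo $M$ but can be negative for some inputs (for example, $(x_1, x_2, x_3) = (5, 10, 0)$ with $n=3$ evaluates to $-80$ before the reduction and to 1960 after it).

```python
def reverse_convert(x1, x2, x3, n):
    """Evaluate the right-hand side of (15), then reduce into [0, M-1]."""
    m1, m2, m3 = 2 ** (n + 1) + 1, 2 ** (n + 1) - 1, 2 ** n
    M = m1 * m2 * m3
    x = 2 * m3 * (x3 - x1) + x3 + m1 * m3 * ((x1 - 2 * x3 + x2) % m2)
    return x % M  # final reduction: an assumption of this sketch, see lead-in
```

Only a single modulo-$(2^{n+1}-1)$ reduction appears inside the expression, and no multiplicative inverse has to be computed or stored. For the residues $(2, 4, 3)$ with $n=3$, the function returns 19.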
We reduce the hardware complexity by further simplifying (15) using the following properties [7].
Property 1.
Modulo $(2^{s}-1)$ multiplication of a residue number by $2^{t}$, where $s$ and $t$ are positive integers, is equivalent to a $t$-bit circular left shift.
Property 2.
Modulo $(2^{s}-1)$ of a negative number is equivalent to the one's complement of the number, which is obtained by subtracting the number from $(2^{s}-1)$.
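Properties 1 and 2 can be illustrated with a small Python sketch operating on $s$-bit words. The helper names `rotl` and `ones_complement` are hypothetical. Note that the all-ones word and the all-zeros word both represent zero modulo $2^{s}-1$, so the sketch compares results modulo $2^{s}-1$ where that matters.

```python
def rotl(value, t, s):
    """Circular left shift of an s-bit word by t positions (Property 1)."""
    t %= s
    mask = (1 << s) - 1
    return ((value << t) | (value >> (s - t))) & mask

def ones_complement(value, s):
    """Bitwise inversion of an s-bit word, i.e. (2^s - 1) - value (Property 2)."""
    return ~value & ((1 << s) - 1)
```

For example, with $s=5$, `rotl(19, 2, 5)` returns 14, which equals $|19 \times 2^{2}|_{31}$, and `ones_complement(19, 5)` returns 12, which equals $|-19|_{31}$.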
Assumption 1.
The hardware complexity is reduced based on the assumption that $x_1 < 2^{n+1}$ always holds true.
Based on the given assumption, $x_1$, which is an $(n+2)$-bit binary number, can now be represented as an $(n+1)$-bit number. Therefore, the residues $(x_1, x_2, x_3)$ have binary representations as follows:

$$x_1 = (x_{1,n} x_{1,n-1} \cdots x_{1,1} x_{1,0}), \quad x_2 = (x_{2,n} x_{2,n-1} \cdots x_{2,1} x_{2,0}), \quad x_3 = (x_{3,n-1} x_{3,n-2} \cdots x_{3,1} x_{3,0}). \tag{24}$$
Then (15) is further simplified using the following theorem.
Theorem 3.
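Provided that $x_1 < 2^{n+1}$ holds true, the binary equivalent of the RNS number $(x_1, x_2, x_3)$ with respect to the moduli set $\{m_1, m_2, m_3\}$ in the form $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$ is computed as follows:

$$X = A_0 + 2^{2n+1} A_1 + 2^{n} A_1, \tag{25}$$

where

$$\begin{aligned}
A_1 &= \left| x_1 + x_2 + u_3 \right|_{2^{n+1}-1}, \\
A_0 &= u_1 + u_2, \\
u_1 &= \underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,0})}_{n}\,0\,\underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,0})}_{n}, \\
u_2 &= \underbrace{(\bar{x}_{1,n+1} \bar{x}_{1,n} \cdots \bar{x}_{1,0}\, 1 1 \cdots 1)}_{2n+2}, \\
u_3 &= \underbrace{(\bar{x}_{3,n-1} \bar{x}_{3,n-2} \cdots \bar{x}_{3,0}\, 1)}_{n+1}.
\end{aligned} \tag{26}$$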
Proof.
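We need to show that (15) can be presented as (25), and that the values of $A_0$, $A_1$, $\{u_i\}_{i=1,3}$ are valid. It should be noted from (15) that $2m_3(x_3 - x_1) + x_3 = 2^{n+1} x_3 + x_3 - 2^{n+1} x_1 = u_1 + u_2$, and $u_1$ can be represented as

$$\begin{aligned}
u_1 &= 2 m_3 x_3 + x_3 = 2^{n+1} x_3 + x_3 \\
&= 2^{n+1} \underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,1} x_{3,0})}_{n} + \underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,1} x_{3,0})}_{n} \\
&= \underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,1} x_{3,0})}_{n}\,\underbrace{0 0 \cdots 0}_{n+1} + \underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,1} x_{3,0})}_{n} \\
&= \underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,0})\,0\,(x_{3,n-1} x_{3,n-2} \cdots x_{3,0})}_{2n+1}.
\end{aligned} \tag{27}$$

Also, we represent $u_2$ as

$$\begin{aligned}
u_2 &= -2^{n+1} x_1 = -2^{n+1} \underbrace{(x_{1,n+1} x_{1,n} \cdots x_{1,0})}_{n+1} \\
&= -\Big( \underbrace{(x_{1,n+1} x_{1,n} \cdots x_{1,0})}_{n+1}\,\underbrace{(0 0 \cdots 0)}_{n+1} \Big) \\
&= \underbrace{(\bar{x}_{1,n+1} \bar{x}_{1,n} \cdots \bar{x}_{1,0}\, 1 1 \cdots 1)}_{2n+2}.
\end{aligned} \tag{28}$$

Next, we convert each term in $|x_1 + x_2 + u_3|_{2^{n+1}-1}$ to an $(n+1)$-bit binary number. No manipulation is required for $x_1$ since it is treated as an $(n+1)$-bit number, and $x_2$ is already an $(n+1)$-bit number. The only term to be manipulated is $-2x_3$, and this is carried out as follows:

$$u_3 = -2 x_3 = -2 \underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,0})}_{n} = -\underbrace{(x_{3,n-1} x_{3,n-2} \cdots x_{3,0}\, 0)}_{n+1} = \underbrace{(\bar{x}_{3,n-1} \bar{x}_{3,n-2} \cdots \bar{x}_{3,0}\, 1)}_{n+1}. \tag{29}$$

Finally, since $m_1 m_3 = (2^{n+1}+1) 2^{n} = 2^{2n+1} + 2^{n}$, the last term of (15) equals $2^{2n+1} A_1 + 2^{n} A_1$, which gives (25).

However, if the condition $x_1 < 2^{n+1}$ does not hold true, the converter in [7], represented by (5), is utilized.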
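The bit-level construction of Theorem 3 can be sketched in Python as follows. The function name `convert_theorem3` is hypothetical. Two simplifications are assumptions of this sketch: $u_2$ is kept as a signed integer ($-2^{n+1}x_1$) instead of a one's complement word, and the result is reduced modulo $M$ at the end so that it lies in $[0, M-1]$.

```python
def convert_theorem3(x1, x2, x3, n):
    """Evaluate (25) and (26) using shifts and bit fields, for x1 < 2^(n+1)."""
    if x1 >= 1 << (n + 1):
        raise ValueError("Theorem 3 assumes x1 < 2^(n+1)")
    m2 = (1 << (n + 1)) - 1
    M = ((1 << (n + 1)) + 1) * m2 * (1 << n)
    x3_bar = ~x3 & ((1 << n) - 1)          # inverted n-bit field of x3
    u3 = (x3_bar << 1) | 1                 # (n+1)-bit one's complement of 2*x3
    a1 = (x1 + x2 + u3) % m2               # A1 in (26)
    u1 = (x3 << (n + 1)) | x3              # x3, a zero bit, then x3 again
    u2 = -(x1 << (n + 1))                  # signed stand-in for the word in (26)
    a0 = u1 + u2                           # A0 in (26)
    return (a0 + (a1 << (2 * n + 1)) + (a1 << n)) % M
```

For the residues $(2, 4, 3)$ with $n=3$, the intermediate values are $u_3 = 1001_2$, $A_1 = 0$, and $u_1 = 0110011_2$, and the function returns 19.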
4. Hardware Realization
The hardware implementations of the proposed reverse converters are based on the combination of (25) and (5). We propose two converters based on this approach, termed the "hybrid approach". The hardware realizations of the proposed schemes are depicted in Figures 2 and 3. In Figure 2 (the proposed hybrid cost-efficient converter), $x_1$ is first input into the system. Given that the Most Significant Bit (MSB) of $x_1$ is $x_{1,n}$, $x_{1,n}$ is compared with "1". If $x_{1,n}=1$, (5) is utilized just as outlined in [7]; otherwise the hardware realization follows (25). It should be noted that the overhead for the comparison is approximately zero, as only the MSB of $x_1$ needs to be compared with a 1. The condition $x_{1,n}=1$ holds true only in very few cases. The probability of occurrence of this condition is denoted by $p$, and it can be seen from Figure 1 that $p$ approaches zero as $n$ increases. This is fully explained in the next section. If $x_{1,n}=0$ (which occurs most of the time), the hardware realization is as follows. The operands $x_1$, $x_2$, and $u_3$ in (25) are added using CSA1, producing $s_1$ and $c_1$, which are in turn added using a one's complement adder (this is equivalent to a CPA with End Around Carry (EAC)). Suppose that $B_1$ and $B_2$ are, respectively, used to store the results of shifting the output of the one's complement adder left by $2n+1$ and $n$ bits, which correspond to the terms $2^{2n+1}A_1$ and $2^{n}A_1$ in (25). Since $u_1$ is a $(2n+1)$-bit number, it can be concatenated with $B_1$ with no computational hardware. The second operand $B_2$ is a $(2n+1)$-bit number whose $n$ least significant bits are zeros. $B_2$ must be converted to a $(3n+2)$-bit number by appending $(n+1)$ zero bits to its MSB part. The third operand $u_2$, which is a $(2n+2)$-bit number, is also made a $(3n+2)$-bit number by appending $n$ one bits to its MSB part. $B_1$, $B_2$, and $u_2$ are all now $(3n+2)$-bit numbers and are added using CSA2, yielding $s_2$ and $c_2$. It should be noted that $(2n+1)$ of the Full Adders (FAs) in CSA2 are reduced to Half Adders (HAs).
In principle, the final result would be obtained by a CPA; however, simulation shows that the final result is always the same as inverting $s_2$. Thus, the final CPA is eliminated. On the other hand, Figure 3 depicts the hardware realization of the proposed hybrid speed-efficient converter. Just like the proposed hybrid CE converter, the hybrid SE converter is made up of the same levels of CSAs; the only difference is that two CPAs are utilized in parallel instead of the one's complement adder in Figure 2. Consequently, the conversion time is significantly reduced.
Figure 1: Probability of occurrence of $x_{1,n}=1$ versus $n$.
Figure 2: Proposed hybrid cost-efficient RNS to binary converter.
Figure 3: Proposed hybrid speed-efficient RNS to binary converter.
Design Example
Given the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$, where $n=3$, convert the residue number $(x_1, x_2, x_3) = (2, 4, 3)$ to binary.
In order to obtain the required binary equivalent, (25) is applied as given in Table 1, where CSA1 represents the first CSA in Theorem 3. The CPA then adds $s_1$ and $c_1$, producing $C_r$. We then obtain $B_1$, $B_2$, and $u_2$ as already explained above. CSA2 is used to add $B_1$, $B_2$, and $u_2$, producing $s_2$ and $c_2$ (labelled $s_3$ and $c_3$ in Table 1). It can be seen from Table 1 that $R_f$ is the bitwise inverse of $s_2$. Thus, the final CPA can be eliminated, and consequently the conversion delay and the area cost are significantly reduced.
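Table 1: Design example.

| CSA1 | CPA | CSA2 |
|---|---|---|
| $x_1$: 0010 | $s_1$: 1111 | $B_1$: 00000110011 |
| $x_2$: 0100 | $c_1$: 0000 | $B_2$: 00000000000 |
| $u_3$: 1001 | $C_r$: 1111 | $u_2$: 11111011111 |
| $s_1$: 1111 | $C_r$: 0000 | $s_3$: 11111101100 |
| $c_1$: 0000 | | $c_3$: 00000100110 |
| | | $R_f$: 00000010011 |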
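The values in Table 1 can be reproduced with the following Python sketch of the data path for the case $x_{1,n}=0$. All function names are hypothetical, and the adders are modelled behaviourally under the assumption that both CSAs and the final addition work modulo $2^{w}-1$ with an end-around carry; the CSA2 outputs are called `s2` and `c2` as in the text. The sketch is meant to trace this example, not to model the converter for every input.

```python
def csa(a, b, c, width):
    """Carry-save adder on width-bit words with end-around carry."""
    mask = (1 << width) - 1
    s = a ^ b ^ c
    maj = (a & b) | (a & c) | (b & c)
    carry = ((maj << 1) | (maj >> (width - 1))) & mask
    return s, carry

def ones_complement_add(a, b, width):
    """Addition modulo 2^width - 1; the all-ones word is mapped to zero."""
    mask = (1 << width) - 1
    total = a + b
    total = (total & mask) + (total >> width)  # end-around carry
    return 0 if total == mask else total

def trace_example(x1, x2, x3, n):
    w1, w2 = n + 1, 3 * n + 2
    u3 = ((~x3 & ((1 << n) - 1)) << 1) | 1
    s1, c1 = csa(x1, x2, u3, w1)
    cr = ones_complement_add(s1, c1, w1)       # plays the role of A1
    u1 = (x3 << (n + 1)) | x3
    b1 = (cr << (2 * n + 1)) | u1              # shifted A1 concatenated with u1
    b2 = cr << n
    u2 = ~(x1 << (n + 1)) & ((1 << w2) - 1)    # one's complement of 2^(n+1) * x1
    s2, c2 = csa(b1, b2, u2, w2)
    rf = ones_complement_add(s2, c2, w2)
    return {"u3": u3, "s1": s1, "c1": c1, "cr": cr, "b1": b1,
            "b2": b2, "u2": u2, "s2": s2, "c2": c2, "rf": rf}
```

For `trace_example(2, 4, 3, 3)`, the result `rf` is 19, and inverting the 11-bit word `s2` also gives 19, in line with the observation that the final CPA can be removed for this example.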
5. Performance Evaluation
The performance of the proposed converters is evaluated in terms of area cost and conversion delay. Two efficient reverse converters have been proposed. The performance comparisons of these converters and the best state-of-the-art converter in [7] are presented in Tables 2, 3, and 4. From Table 2, it can be seen that the proposed hybrid CE converter outperforms the CE converter in [7] in terms of delay with slightly lower or similar area cost whenever $x_{1,n}=0$; otherwise the two give the same result. Under the same condition, the proposed hybrid SE converter outperforms the SE converter in [7] in terms of both speed and area.

Table 4 shows the occurrence probability and other associated parameters. In Table 4, the following notation is used: $n$ is an integer value that determines the dynamic range, $c$ is the number of times $x_{1,n}=1$ occurs within the dynamic range $M$ of the system, $p$ is the probability of occurrence of the condition $x_{1,n}=1$ within the dynamic range of the system (computed as $p=c/M$), "Hybrid CE" stands for the delay of the hybrid CE converter, "CE [7]" stands for the delay of the CE converter in [7], "Hybrid SE" stands for the delay of the hybrid SE converter, and "SE [7]" stands for the delay of the SE converter in [7]. As shown in Table 4, $p$ decreases as the dynamic range increases (i.e., as $n$ increases). For example, when $n=3$, $p=0.058824$, whereas $p=0.000244$ when $n=10$. This implies, as can also be deduced from Figure 1, that the occurrence probability approaches zero as $n$ continues to grow. The delays of the hybrid CE and SE converters and those of the CE and SE converters in [7] are depicted in Figure 4. It can be seen from Figure 4 that the conversion delay grows most slowly for the hybrid SE converter. It is also worth noting that the proposed CE converter and the SE converter in [7] have nearly equal conversion delays.
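The quantities $c$ and $p$ can be reproduced for small $n$ by exhaustive counting, as in the following Python sketch. The function name `occurrence` is hypothetical, and the sketch assumes that the condition $x_{1,n}=1$ corresponds to the case in which $x_1 < 2^{n+1}$ fails, that is, $x_1 = 2^{n+1}$; under this assumption the counts agree with the entries of Table 4 for the values of $n$ checked here.

```python
def occurrence(n):
    """Count the values X in [0, M-1] whose residue x1 is not below 2^(n+1)."""
    m1, m2, m3 = 2 ** (n + 1) + 1, 2 ** (n + 1) - 1, 2 ** n
    M = m1 * m2 * m3
    c = sum(1 for x in range(M) if x % m1 >= 2 ** (n + 1))
    return c, M, c / M
```

For $n=3$ this gives $c=120$ and $M=2040$, so $p \approx 0.058824$; for $n=4$ it gives $c=496$ and $M=16368$.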
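Table 2: Area-delay comparison.

| Converters | CE [7] | Proposed CE | SE [7] | Proposed SE |
|---|---|---|---|---|
| FA | $3n+4$ | $3n+3$ | $5n+6$ | $4n+4$ |
| HA | $n$ | $2n$ | $n$ | $2n$ |
| OR/NOT | $6n+4$ | $3n+3$ | $2n+2$ | $6n+4$ |
| Multiplexer | - | - | 1 | 1 |
| Delay | $(4n+5)t_{FA}$ | $(2n+4)t_{FA}$ | $(2n+3)t_{FA}+t_{MUX}$ | $(n+3)t_{FA}+t_{MUX}$ |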
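Table 3: Synthesized results: area-delay comparison.

| $n$ | Our area (slices) | Our delay (ns) | Area [7] (slices) | Delay [7] (ns) |
|---|---|---|---|---|
| 3 | 14 | 17.574 | 26 | 28.083 |
| 4 | 19 | 22.808 | 30 | 28.289 |
| 5 | 23 | 24.419 | 40 | 32.335 |
| 6 | 25 | 24.874 | 47 | 39.975 |
| 7 | 32 | 30.166 | 54 | 42.598 |
| 8 | 36 | 30.991 | 48 | 46.504 |
| 9 | 37 | 30.326 | 66 | 50.154 |
| 10 | 44 | 38.375 | 69 | 50.700 |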
Table 4: Simulation results showing the occurrence probability and other associated parameters.

| $n$ | $c$ | $M$ | $p$ | Hybrid CE | CE [7] | Hybrid SE | SE [7] |
|---|---|---|---|---|---|---|---|
| 3 | 120 | 2040 | 0.058824 | 10.4118 | 17 | 6.1765 | 9 |
| 4 | 496 | 16368 | 0.030303 | 12.2727 | 21 | 7.1212 | 11 |
| 5 | 2016 | 131040 | 0.015385 | 14.1692 | 25 | 8.0769 | 13 |
| 6 | 8128 | 1048512 | 0.007752 | 16.1008 | 29 | 9.0465 | 15 |
| 7 | 32640 | 8388480 | 0.003891 | 18.0584 | 33 | 10.0272 | 17 |
| 8 | 130816 | 67108608 | 0.001949 | 20.0331 | 37 | 11.0156 | 19 |
| 9 | 523776 | 536870400 | 0.000976 | 22.0185 | 41 | 12.0088 | 21 |
| 10 | 1048000 | 4295000000 | 0.000244 | 24.0051 | 45 | 13.0024 | 23 |
Figure 4: Delay comparison: hybrid converters versus converters [7].
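The hybrid delay entries in Table 4 are consistent with a probability-weighted average of the two conversion paths, using the delays listed in Table 2. The Python sketch below makes this explicit; the function name `hybrid_delays` is hypothetical, delays are expressed in units of $t_{FA}$, and the multiplexer delay $t_{MUX}$ is left out, which is an assumption made here because the Table 4 entries appear to omit it.

```python
def hybrid_delays(n, p):
    """Expected delays (in units of t_FA) of the hybrid CE and SE converters."""
    ce_new, ce_old = 2 * n + 4, 4 * n + 5   # proposed CE path, CE path of [7]
    se_new, se_old = n + 3, 2 * n + 3       # proposed SE path, SE path of [7]
    hybrid_ce = (1 - p) * ce_new + p * ce_old
    hybrid_se = (1 - p) * se_new + p * se_old
    return hybrid_ce, hybrid_se
```

Expanding the weighted averages gives $2n(p+1)+p+4$ for the hybrid CE converter and $n(p+1)+3$ for the hybrid SE converter. With $n=3$ and $p=120/2040$, the function returns approximately 10.4118 and 6.1765.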
Additionally, the proposed hybrid SE converter (whenever $x_{1,n}=0$) and the SE converter in [7] are implemented using Xilinx 9.2i FPGA technology for various dynamic range requirements (different values of $n$). The target technology is the Xilinx (Xa3s200-4vqg100) FPGA. The performance is evaluated in terms of area (measured as the number of slices) and delay (the total gate delay, measured in nanoseconds). Table 3 shows the synthesized results for various values of $n$; for every value of $n$ considered, our scheme requires fewer slices and a smaller delay than the one in [7]. These results support the claim that the RNS-to-binary converters proposed in this paper improve on the ones in [7].
6. Conclusions
In this paper, we proposed two new reverse converters for the moduli set $\{2^{n+1}+1, 2^{n+1}-1, 2^{n}\}$. First, we simplified the traditional CRT to obtain a reverse converter that uses mod-$(2^{n+1}-1)$ operations instead of the mod-$(2^{n+1}+1)(2^{n+1}-1)$ operations required by the proposal in [7]. Next, we further reduced the hardware complexity by making the resulting reverse converter architecture adder based. We proposed two hybrid CE and SE converters. In each of the schemes, the converter in [7] is integrated into a newly proposed area-delay efficient scheme. The path to be followed depends on whether $x_{1,n}=1$ (which occurs only in few cases) or $x_{1,n}=0$. In terms of delay, the two proposed hybrid CE and SE converters require $(2n(p+1)+p+4)t_{FA}$ and $(n(p+1)+3)t_{FA}+t_{MUX}$, respectively, while the CE and SE converters in [7] require $(4n+5)t_{FA}$ and $(2n+3)t_{FA}+t_{MUX}$, respectively, where $t_{FA}$ denotes the delay of one full adder, $t_{MUX}$ that of a multiplexer, and $p$ is the occurrence probability of $x_{1,n}=1$. The proposed hybrid CE converter outperforms the one in [7] in terms of delay with slightly higher or similar area cost. Additionally, the proposed hybrid SE converter has a smaller delay and also requires less area than the one in [7].
References
[1] A. A. Hiasat, "VLSI implementation of new arithmetic residue to binary decoders," vol. 13, no. 1, pp. 153-158, 2005, doi:10.1109/TVLSI.2004.840400.
[2] W. Wang, M. N. S. Swamy, M. O. Ahmad, and Y. Wang, "A study of the residue-to-binary converters for the three-moduli sets," vol. 50, no. 2, pp. 235-243, 2003, doi:10.1109/TCSI.2002.808191.
[3] Y. Wang, X. Song, M. Aboulhamid, and H. Shen, "Adder based residue to binary number converters for $\{2^{n}+1, 2^{n}, 2^{n}-1\}$," vol. 50, pp. 1772-1779, 2002.
[4] A. A. Hiasat and H. S. Abdel-Aty-Zohdy, "Residue-to-binary arithmetic converter for the moduli set $\{2^{k}, 2^{k}-1, 2^{k-1}-1\}$," vol. 45, no. 2, pp. 204-209, 1998.
[5] K. A. Gbolagade and S. D. Cotofana, "Generalized matrix method for efficient residue to decimal conversion," in Proceedings of the 10th IEEE Asia-Pacific Conference on Circuits and Systems (APCCAS '08), pp. 1414-1417, Macao, China, December 2008, doi:10.1109/APCCAS.2008.4746295.
[6] K. A. Gbolagade and S. D. Cotofana, "Residue number system operands to decimal conversion for 3-moduli sets," in Proceedings of the 51st IEEE Midwest Symposium on Circuits and Systems, pp. 791-794, Knoxville, Tenn, USA, August 2008, doi:10.1109/MWSCAS.2008.4616918.
[7] A. S. Molahosseini and K. Navi, "New arithmetic residue to binary converters," vol. 1, no. 4, pp. 295-299, 2007.
[8] P. V. A. Mohan, "RNS-to-binary converter for a new three-moduli set $\{2^{n+1}-1, 2^{n}, 2^{n}-1\}$," vol. 54, no. 9, pp. 775-779, 2007, doi:10.1109/TCSII.2007.900844.
[9] D. F. Miller and W. S. McCormick, "An arithmetic free parallel mixed radix conversion algorithm," vol. 45, no. 1, pp. 158-162, 1998.
[10] N. Szabo and R. Tanaka, McGraw-Hill, New York, NY, USA, 1967.
[11] Y. Wang, "New Chinese remainder theorems," in Proceedings of the Asilomar Conference, vol. 1, pp. 165-171, Pacific Grove, Calif, USA, November 1998.