International Scholarly Research Network ISRN Signal Processing Volume 2011, Article ID 378293, 4 pages doi:10.5402/2011/378293

# Research Article

# Multipath Pipelined Polyphase Structures for FIR Interpolation and Decimation in MIMO OFDM Systems

## Zhen-dong Zhang, Bin Wu, and Yu-mei Zhou

Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China

Correspondence should be addressed to Zhen-dong Zhang, zhangzhendong@ime.ac.cn

Received 13 September 2011; Accepted 3 October 2011

Academic Editors: A. Plaza and A. Wong

Copyright © 2011 Zhen-dong Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The combination of multiple-input multiple-output (MIMO) signal processing with orthogonal frequency-division multiplexing (OFDM) is a favored solution for enhancing the data rate in wireless communication systems. However, the computational complexity also increases linearly with the number of data streams. Generally, multiple finite impulse response (FIR) interpolations and decimations are used to handle the multiple data streams in a MIMO OFDM system, which causes a large increase in the hardware cost. In this paper, two multipath pipelined polyphase structures for FIR interpolation and decimation are proposed to deal efficiently with the simultaneous multiple data streams. With the proposed structures, *M* simultaneous data streams can be supported in the *M*-component polyphase interpolation or decimation with only one set of computation units. Implementation examples show that up to 56% reduction of silicon area can be obtained over the traditional polyphase structures.

#### 1. Introduction

In multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems, high-order mapping schemes such as 16- and 64-quadrature amplitude modulation are commonly used to enhance the data rate. Oversampled digital-to-analog converters and analog-to-digital converters are also widely used to improve the signal-to-noise ratio. Because the performance of high-order mapping schemes is highly sensitive to phase and amplitude distortion, finite impulse response (FIR) interpolations and decimations with linear phase, low ripple, and high speed are required to perform the sampling rate conversions at the analog-digital interface.

The optimized hardware implementation of FIR interpolation and decimation has been treated in a number of papers. In [1–6], various approaches have been proposed to reduce the area by replacing the multiplications with a minimal number of hard-wired shifts, adders, and subtractors. The quantization effects of coefficients and the round-off noise in multiplication have been analyzed in [7] and [8], respectively. In [9], FIR interpolation and decimation that make use of the Farrow structure are proposed. However, to the best of our knowledge, few papers have paid attention to system-level considerations and optimizations.

In general, multiple interpolations and decimations are used to handle the simultaneous multiple data streams in a MIMO OFDM system, which causes a large increase in the hardware cost compared with that of a single interpolation or decimation. In this paper, two multipath pipelined polyphase structures are proposed to deal efficiently with the multiple data streams in a MIMO OFDM system. With the proposed structures, *M* simultaneous data streams can be supported in the *M*-component polyphase interpolation or decimation with one set of computation units; therefore, a significant reduction of silicon area can be achieved in a system-on-chip implementation.

The remainder of this paper is organized as follows. Section 2 introduces the traditional polyphase structures. In Section 3, the proposed multipath pipelined polyphase structures are described in detail. Implementation examples and conclusions are presented in Sections 4 and 5, respectively.




Figure 1: Conventional polyphase structures of (a) interpolation and (b) decimation.



Figure 2: Data scheduling of each subfilter of the conventional (a) interpolation and (b) decimation.

#### 2. Traditional Structures

Traditionally, FIR interpolation and decimation are implemented using the polyphase filter structure and the noble identities [8], as shown in Figure 1. The polyphase filter structure is based on the fact that the transfer function of an FIR filter can be written as

$$H(z) = \sum_{n=0}^{N} h(n)z^{-n} = \sum_{m=0}^{M-1} z^{-m} P_m(z^M), \tag{1}$$

where $N$ is the filter order, $M$ is the number of subfilters in the polyphase decomposition, and the subfilter transfer function $P_m(z)$ is defined as

$$P_m(z) = \sum_{i=0}^{\text{Ceiling}((N+1)/M)-1} h'(iM+m)z^{-i}. \tag{2}$$

The function Ceiling$(\cdot)$ returns the smallest integer greater than or equal to its argument. The function $h'(j)$ is defined as

$$h'(j) = \begin{cases} h(j), & 0 \le j \le N \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$
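The decomposition in (1)–(3) can be illustrated with a minimal Python sketch. The function names `polyphase_components` and `recombine` and the 7-tap coefficient list are hypothetical and chosen only for illustration; the sketch splits a coefficient list into $M$ subfilters and checks that interleaving them again recovers the original filter, padded with the zeros introduced by $h'(j)$.

```python
from math import ceil

def polyphase_components(h, M):
    """Split FIR coefficients h(0..N) into M subfilters following Eqs. (2)-(3)."""
    length = ceil(len(h) / M)              # Ceiling((N+1)/M) taps per subfilter

    def h_prime(j):                        # Eq. (3): zero outside 0..N
        return h[j] if 0 <= j < len(h) else 0

    return [[h_prime(i * M + m) for i in range(length)] for m in range(M)]

def recombine(components, M):
    """Rebuild the coefficients of sum_m z^{-m} P_m(z^M), the right side of Eq. (1)."""
    length = len(components[0])
    h = [0] * (length * M)
    for m, p in enumerate(components):
        for i, c in enumerate(p):
            h[i * M + m] = c               # coefficient of z^{-(iM+m)}
    return h

h = [1, 2, 3, 4, 5, 6, 7]                  # hypothetical 7-tap filter (N = 6)
P = polyphase_components(h, 3)
print(P)                                   # [[1, 4, 7], [2, 5, 0], [3, 6, 0]]
print(recombine(P, 3))                     # [1, 2, 3, 4, 5, 6, 7, 0, 0]
```

The trailing zeros in the recombined list correspond to the indices $j > N$ for which $h'(j) = 0$.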

The major feature of the polyphase structure is that the computations of each subfilter are performed at a lower sampling rate than in a straightforward implementation. The data scheduling of each subfilter of the conventional interpolation and decimation is shown in Figure 2, where the position of the letter *X* marks the operating time of the corresponding subfilter. Therefore, one set of multipliers can be shared by all the subfilters in a hardware realization. However, if the subfilters are implemented using multiplierless approaches, the hard-wired shifts, adders, and subtractors of one subfilter cannot be shared with the others because the coefficients of each subfilter are quite different. In addition, a conventional FIR interpolation or decimation supports only one data stream, so multiple interpolations and decimations are needed to handle the multiple in-phase and quadrature data streams in a MIMO OFDM system. For example, eight interpolations and eight decimations are needed in a $4\times4$ MIMO OFDM transceiver; consequently, the total hardware cost of the interpolations and decimations is very high.
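The claim that the subfilters can run at the lower sampling rate can be checked numerically. The following Python sketch is a behavioral model only, not the hardware of Figure 1; all function names and the test data are hypothetical. It compares the polyphase form of a single-stream interpolation and decimation with the straightforward form that filters at the high rate.

```python
def fir(h, x):
    """Direct-form FIR: y[n] = sum_j h[j] x[n-j], same length as x."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

def interpolate_direct(h, x, M):
    """Zero-stuff by M, then filter at the high rate."""
    up = [0] * (len(x) * M)
    up[::M] = x
    return fir(h, up)

def interpolate_polyphase(h, x, M):
    """Run each subfilter P_m at the low rate and interleave the outputs."""
    y = [0] * (len(x) * M)
    for m in range(M):
        y[m::M] = fir(h[m::M], x)          # h[m::M] holds the taps h(iM+m) of P_m
    return y

def decimate_direct(h, x, M):
    """Filter at the high rate, then keep every M-th sample."""
    return fir(h, x)[::M]

def decimate_polyphase(h, x, M):
    """Feed phase m with x[nM-m], filter at the low rate, and sum the M outputs."""
    n_out = (len(x) + M - 1) // M
    y = [0] * n_out
    for m in range(M):
        # Delayed and downsampled input of phase m (zero before the first sample)
        xm = [x[n * M - m] if n * M - m >= 0 else 0 for n in range(n_out)]
        for n, v in enumerate(fir(h[m::M], xm)):
            y[n] += v
    return y

h = [1, 2, 3, 4, 5, 6, 7, 8]               # hypothetical integer coefficients
x = [1, -2, 3, 0, 5, 4, -1, 2]             # hypothetical input samples
print(interpolate_polyphase(h, x, 4) == interpolate_direct(h, x, 4))   # True
print(decimate_polyphase(h, x, 4) == decimate_direct(h, x, 4))         # True
```

Integer data is used so that both forms can be compared for exact equality; with floating-point coefficients a tolerance would be needed.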

#### 3. Proposed Structures

The block diagrams of the proposed 4-path pipelined polyphase structures for interpolation and decimation are shown in Figure 3, in which four simultaneous data streams are supported with one set of computation units (CUs). The letters A, B, C, D and A', B', C', D' denote the different input sequences and the corresponding output sequences. The four overlapped blocks stand for the four different phases in the four-component polyphase decomposition. In the proposed structures, there are four independent data paths in each phase. To avoid data collisions, delay elements of sizes 1 to 3 are inserted in each data path at the appropriate positions, as shown in Figure 3. The data scheduling of CU 0–3 of the proposed interpolation and decimation is illustrated in Figure 4, where A0, A1, A2, and A3 denote the data of phases 0, 1, 2, and 3 of the input data stream A; B0, B1, B2, and B3 denote the data of phases 0, 1, 2, and 3 of the input data stream B; and so on. The four CUs of the proposed interpolation or decimation are all active in every clock period, which gives 100% hardware utilization.

Figure 3: Proposed 4-path pipelined polyphase structures for (a) interpolation and (b) decimation.

(a)

| Time | 0  | 1  | 2  | 3  |     |
|------|----|----|----|----|-----|
| CU0: | A0 | B0 | C0 | D0 | ··· |
| CU1: | D1 | A1 | B1 | C1 |     |
| CU2: | C2 | D2 | A2 | B2 |     |
| CU3: | B3 | C3 | D3 | A3 |     |

(b)

| Time | 0  | 1  | 2  | 3  |     |
|------|----|----|----|----|-----|
| CU0: | B0 | C0 | D0 | A0 | ··· |
| CU1: | C1 | D1 | A1 | B1 |     |
| CU2: | D2 | A2 | B2 | C2 |     |
| CU3: | A3 | B3 | C3 | D3 |     |

Figure 4: Data scheduling of CU 0–3 of the proposed (a) interpolation and (b) decimation.

In the proposed structure for interpolation, one register delay chain (DC) is shared by the four phases of one input stream. The four output streams are generated by four commutators. All the commutators rotate clockwise, starting at time 3 at phase 0. For each input sample of one input stream, the corresponding commutator reads the output of every phase to obtain 4 samples of the interpolated signal. In the proposed structure for decimation, another four commutators are used to distribute the input samples of each input sequence to the phases. These commutators also rotate clockwise, but start at time 0 at phase 3. In every clock period, the outputs of all phases are summed to produce one sample of the decimated signal. The samples of the decimated signal are finally distributed to the data paths by a commutator in the output module; this commutator rotates clockwise, starting at time 3 at data path A'.
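The schedule of Figure 4 can be reproduced with a short Python sketch. The modular rules used here, stream index $(t - k) \bmod 4$ for interpolation and $(t + k + 1) \bmod 4$ for decimation at clock $t$ in CU $k$, are read off the figure and are an assumption of this sketch rather than a formula stated in the text; the function names are hypothetical.

```python
STREAMS = "ABCD"
M = 4                                       # number of paths, phases, and CUs

def interpolation_schedule(clocks=4):
    """Row k lists what CU k processes at each clock, as in Figure 4(a)."""
    return [[f"{STREAMS[(t - k) % M]}{k}" for t in range(clocks)] for k in range(M)]

def decimation_schedule(clocks=4):
    """Row k lists what CU k processes at each clock, as in Figure 4(b)."""
    return [[f"{STREAMS[(t + k + 1) % M]}{k}" for t in range(clocks)] for k in range(M)]

def collision_free(schedule):
    """True if, at every clock, the CUs work on pairwise different streams."""
    clocks = len(schedule[0])
    return all(len({row[t][0] for row in schedule}) == len(schedule)
               for t in range(clocks))

for k, row in enumerate(interpolation_schedule()):
    print(f"CU{k}:", " ".join(row))
# CU0: A0 B0 C0 D0
# CU1: D1 A1 B1 C1
# CU2: C2 D2 A2 B2
# CU3: B3 C3 D3 A3

print(collision_free(interpolation_schedule()), collision_free(decimation_schedule()))
# True True
```

Every table cell is occupied and no stream appears twice in one column, which is the sense in which the four CUs are fully utilized without data collisions.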

The two main advantages of the proposed structures are summarized as follows. First, the simultaneous multiple data streams in a MIMO OFDM system can be processed with minimal hardware cost. Second, the CUs in the proposed structures can be efficiently implemented using hard-wired shifts, adders, and subtractors. Note that only the 4-path pipelined polyphase structures are described in detail here; in practice, the *M*-path pipelined polyphase structure for an *M*-component polyphase interpolation or decimation can be implemented similarly.
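To make the *M*-path idea concrete, the following Python sketch gives a behavioral model of the interpolation case. It is not the register-level structure of Figure 3: delay chains and commutators are abstracted away, and the scheduling rule (CU $k$ serves stream $(t - k) \bmod M$ at clock $t$) is taken from Figure 4(a) and assumed to generalize to any $M$. The function names and test data are hypothetical.

```python
def reference_interpolate(h, x, M):
    """Single-stream reference: zero-stuff by M, then direct-form FIR."""
    up = [0] * (len(x) * M)
    up[::M] = x
    return [sum(h[j] * up[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(up))]

def fir_tap_sum(taps, x, n):
    """sum_i taps[i] * x[n - i], treating samples before the start as zero."""
    return sum(c * x[n - i] for i, c in enumerate(taps) if n - i >= 0)

def multipath_interpolate(h, streams, M):
    """M streams share one set of M CUs; CU k is dedicated to subfilter P_k."""
    assert len(streams) == M
    n_in = len(streams[0])
    outputs = [[None] * (n_in * M) for _ in range(M)]
    busy = 0                                # number of CU operations performed
    total_clocks = n_in * M + 2 * (M - 1)   # enough clocks to flush every path
    for t in range(total_clocks):
        for k in range(M):
            s = (t - k) % M                 # stream served by CU k at clock t
            n = (t - k - s) // M            # input sample index of that stream
            if 0 <= n < n_in:               # skip start-up and flush slots
                outputs[s][n * M + k] = fir_tap_sum(h[k::M], streams[s], n)
                busy += 1
    return outputs, busy

h = list(range(1, 13))                      # hypothetical 12-tap filter
streams = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [1, -1, 1, -1, 1], [0, 2, 0, 2, 0]]
outputs, busy = multipath_interpolate(h, streams, 4)
print(all(outputs[s] == reference_interpolate(h, streams[s], 4) for s in range(4)))  # True
print(busy == 4 * 4 * 5)                    # True
```

In this model each CU computes exactly one output per stream, phase, and input sample, so the four streams are served by the same four subfilters that a conventional structure would need for a single stream.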

#### 4. Implementation Examples

To enable a comparison, several different FIR interpolations and decimations have been implemented using both the proposed and the traditional structures. The implementation details and comparison results are shown in Table 1, where $F_c$ is the normalized cutoff frequency, $\omega_p$ is the normalized passband edge, $\omega_s$ is the normalized stopband edge, and $\delta_1$ and $\delta_2$ denote the maximum passband ripple and stopband ripple, respectively. For each implementation, the filter coefficients are rounded to 13 bits and represented in canonic signed digit (CSD) form [2]; the input word length is 10 bits, while the outputs have full precision; the multiplications in the CUs are implemented using hard-wired shifts, adders, and subtractors.
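As an illustration of the coefficient representation, the following Python sketch quantizes a coefficient and recodes it into canonic signed digits. It uses the standard non-adjacent-form recoding and is not the authors' design flow. The scaling by $2^{12}$ (13-bit two's-complement word with 12 fractional bits), the function names, and the coefficient value 0.2 are assumptions made for illustration.

```python
def quantize(coefficient, bits=13):
    """Round a coefficient in (-1, 1) to a signed integer (assumed scaling 2^(bits-1))."""
    return round(coefficient * 2 ** (bits - 1))

def to_csd(value):
    """Canonic signed digits of an integer, least significant digit first.

    Each digit is -1, 0, or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (value % 4)             # +1 if value = 1 (mod 4), -1 if value = 3 (mod 4)
            digits.append(d)
            value -= d                      # remainder is now divisible by 4
        value //= 2
    return digits

def adder_count(digits):
    """Adders/subtractors needed for one constant multiplication by shift-and-add."""
    return max(sum(1 for d in digits if d != 0) - 1, 0)

print(to_csd(7))                            # [-1, 0, 0, 1], that is 8 - 1
q = quantize(0.2)                           # hypothetical coefficient value
digits = to_csd(q)
print(q, sum(d * 2 ** i for i, d in enumerate(digits)) == q)   # 819 True
```

Each nonzero digit corresponds to one shifted copy of the input, so fewer nonzero digits mean fewer adders and subtractors in a hard-wired multiplier.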

First, the presented examples are modeled in Verilog and functionally verified using the ModelSim simulator. After functional validation, all the examples are synthesized using a 130 nm standard cell library from Semiconductor Manufacturing International Corporation (SMIC). The cell area of each implementation is reported by the Synopsys Design Compiler tool. The reductions in cell area per stream achieved by the proposed structures over the traditional structures range from 21% to 56%. This can be attributed to the fact that multiple simultaneous data streams are processed with one set of CUs in the proposed structures. The area reduction tends to increase with the number of pipelined data paths.

| Example | FIR filter information | Traditional: streams | Traditional: area (mm²) | Traditional: area/stream (mm²) | Proposed: streams | Proposed: area (mm²) | Proposed: area/stream (mm²) | Area/stream reduction (%) |
|---|---|---|---|---|---|---|---|---|
| 1:2 interpolation | Raised-cosine, 39 taps, $F_c = 0.5\pi$ | 1 | 0.026 | 0.026 | 2 | 0.039 | 0.0195 | 25 |
| 2:1 decimation | Raised-cosine, 39 taps, $F_c = 0.5\pi$ | 1 | 0.031 | 0.031 | 2 | 0.049 | 0.0245 | 21 |
| 1:3 interpolation | Hamming, 66 taps, $F_c = 0.33\pi$ | 1 | 0.066 | 0.066 | 3 | 0.11 | 0.0367 | 45 |
| 3:1 decimation | Hamming, 66 taps, $F_c = 0.33\pi$ | 1 | 0.093 | 0.093 | 3 | 0.17 | 0.0567 | 39 |
| 1:4 interpolation | Equiripple, 96 taps, $\omega_p = 0.225\pi$, $\omega_s = 0.275\pi$, $\delta_1 = \delta_2 = 0.005$ | 1 | 0.097 | 0.097 | 4 | 0.17 | 0.0425 | 56 |
| 4:1 decimation | Equiripple, 96 taps, $\omega_p = 0.225\pi$, $\omega_s = 0.275\pi$, $\delta_1 = \delta_2 = 0.005$ | 1 | 0.14 | 0.14 | 4 | 0.31 | 0.0775 | 45 |

Table 1: Implementation details and comparison results.
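The per-stream figures can be recomputed from the areas listed in Table 1, as in the short Python sketch below. The function name is hypothetical. Because the sketch starts from the rounded areas printed in the table, a recomputed reduction may differ from the tabulated percentage by up to about one point.

```python
# (example, traditional area in mm^2 for one stream, proposed area in mm^2, streams)
TABLE_1 = [
    ("1:2 interpolation", 0.026, 0.039, 2),
    ("2:1 decimation",    0.031, 0.049, 2),
    ("1:3 interpolation", 0.066, 0.11,  3),
    ("3:1 decimation",    0.093, 0.17,  3),
    ("1:4 interpolation", 0.097, 0.17,  4),
    ("4:1 decimation",    0.14,  0.31,  4),
]

def area_reduction_percent(traditional, proposed, streams):
    """Reduction of area per stream relative to one traditional single-stream filter."""
    return 100.0 * (1.0 - (proposed / streams) / traditional)

for name, trad, prop, m in TABLE_1:
    print(f"{name}: {prop / m:.4f} mm^2 per stream, "
          f"{area_reduction_percent(trad, prop, m):.1f}% reduction")
```

The calculation makes the comparison basis explicit: the proposed area is divided by the number of streams it serves before it is compared with one traditional filter.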

#### 5. Conclusions


In this paper, we proposed two novel multipath pipelined polyphase structures for FIR interpolation and decimation based on system-level considerations and optimizations. With the proposed structures, *M* simultaneous data streams can be processed in the *M*-component polyphase interpolation or decimation with one set of computation units. To show the correctness and the advantages of the proposed structures, several different FIR interpolations and decimations have been implemented using both the proposed multipath pipelined polyphase structures and the conventional polyphase structures. The comparison results show that the proposed structures are more area-efficient than the traditional polyphase structures and are well suited for implementing FIR interpolation and decimation in MIMO OFDM systems.

#### Acknowledgments

This work was supported by the Major National Science and Technology Program of China under Grant no. 2010ZX-03005-001 and the National Natural Science Foundation of China under Grant no. 60976022.

#### References

- [1] A. G. Dempster and M. D. Macleod, "Use of minimum-adder multiplier blocks in FIR digital filters," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 42, no. 9, pp. 569–577, 1995.
- [2] R. T. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 43, no. 10, pp. 677–688, 1996.
- [3] R. Paško, P. Schaumont, V. Derudder, S. Vernalde, and D. Ďuračková, "A new algorithm for elimination of common sub-expressions," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 18, no. 1, pp. 58–68, 1999.
- [4] M. Martínez-Peiró, E. I. Boemo, and L. Wanhammar, "Design of high-speed multiplierless filters using a nonrecursive signed common subexpression algorithm," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 49, no. 3, pp. 196–203, 2002.
- [5] A. P. Vinod and E. M. K. Lai, "Low power and high-speed implementation of FIR filters for software defined radio receivers," *IEEE Transactions on Wireless Communications*, vol. 5, no. 7, Article ID 1673078, pp. 1669–1675, 2006.
- [6] R. Mahesh and A. P. Vinod, "A new common subexpression elimination algorithm for realizing low-complexity higher order digital filters," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 27, no. 2, pp. 217–229, 2008.
- [7] R. Mehboob, S. A. Khan, and R. Qamar, "FIR filter design methodology for hardware optimized implementation," *IEEE Transactions on Consumer Electronics*, vol. 55, no. 3, pp. 1669–1673, 2009.
- [8] J. G. Proakis and D. G. Manolakis, *Digital Signal Processing: Principles, Algorithms, and Applications*, Prentice Hall, Upper Saddle River, NJ, USA, 2006.
- [9] H. Johansson and O. Gustafsson, "Linear-phase FIR interpolation, decimation, and Mth-band filters utilizing the Farrow structure," *IEEE Transactions on Circuits and Systems I*, vol. 52, no. 10, pp. 2197–2207, 2005.








































