Multipath Pipelined Polyphase Structures for FIR Interpolation and Decimation in MIMO OFDM Systems

The combination of multiple-input multiple-output (MIMO) signal processing with the orthogonal frequency-division multiplexing (OFDM) technique is a favored solution for enhancing the data rate in wireless communication systems. However, the computational complexity increases linearly with the number of data streams. Generally, multiple finite impulse response (FIR) interpolations and decimations are added to handle the multiple data streams in a MIMO OFDM system, which causes a large increase in hardware cost. In this paper, two multipath pipelined polyphase structures for FIR interpolation and decimation are proposed to deal efficiently with the simultaneous multiple data streams. With the proposed structures, M simultaneous data streams can be supported in the M-component polyphase interpolation or decimation with only one set of computation units. Implementation examples show that a reduction of up to 56% in silicon area can be obtained over the traditional polyphase structures.


Introduction
In multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems, high-order mapping schemes such as 16- and 64-quadrature amplitude modulation are commonly used to enhance the data rate. Oversampled digital-to-analog converters and analog-to-digital converters are also widely used to improve the signal-to-noise ratio. Because the performance of high-order mapping schemes is highly sensitive to phase and amplitude distortion, finite impulse response (FIR) interpolations and decimations with linear phase, low ripple, and high speed are required to perform the sampling rate conversions at the analog-digital interface.
Optimized hardware implementation of FIR interpolation and decimation has been treated in a number of papers. In [1-6], various approaches have been proposed to reduce the area by replacing the multiplications with a minimal number of hard-wired shifts, adders, and subtractors. The quantization effects of coefficients and the round-off noise in multiplication have been analyzed in [7] and [8], respectively. In [9], FIR interpolation and decimation that make use of the Farrow structure are proposed. However, to the best of our knowledge, few papers have paid attention to system-level considerations and optimizations.
In general, multiple interpolations and decimations are added to handle the simultaneous multiple data streams in a MIMO OFDM system, which causes a large increase in hardware cost compared with that of a single interpolation or decimation. In this paper, two multipath pipelined polyphase structures are proposed to deal efficiently with the multiple data streams in a MIMO OFDM system. With the proposed structures, M simultaneous data streams can be supported in the M-component polyphase interpolation or decimation with one set of computation units; therefore, a significant reduction of silicon area can be achieved in system-on-chip implementation.
The remainder of this paper is organized as follows. Section 2 introduces the traditional polyphase structures. In Section 3, the proposed multipath pipelined polyphase structures are demonstrated in detail. Implementation examples and conclusions are presented in Sections 4 and 5, respectively.

Traditional Structures
Traditionally, FIR interpolation and decimation are implemented using the polyphase filter structure and the noble identities [8], as shown in Figure 1. The polyphase filter structure is based on the fact that the transfer function of an FIR filter can be written as
$$H(z) = \sum_{n=0}^{N} h(n)\, z^{-n} = \sum_{m=0}^{M-1} z^{-m} P_m(z^M),$$
where parameter N is the filter order and M is the number of subfilters in the polyphase decomposed structure; the subfilter transfer function P_m(z) is defined as
$$P_m(z) = \sum_{k=0}^{\mathrm{Ceiling}((N+1)/M)-1} h(kM+m)\, z^{-k}, \qquad m = 0, 1, \ldots, M-1.$$
The function Ceiling(·) returns the smallest integer greater than or equal to its argument. The function h(j) is defined as
$$h(j) = \begin{cases} \text{the } j\text{th filter coefficient}, & 0 \le j \le N, \\ 0, & j > N. \end{cases}$$
The major feature of the polyphase structure is that the computations of each subfilter are performed at a lower sampling rate than in a straightforward implementation. The data scheduling of each subfilter of the conventional interpolation and decimation is shown in Figure 2, where the position of the letter X marks the operating time of the corresponding subfilter. Because the subfilters operate at different times, one set of multipliers can be shared by all the subfilters in a hardware realization. However, if the subfilters are implemented using multiplierless approaches, the hard-wired shifts, adders, and subtractors of one subfilter cannot be shared with the others, because the coefficients of each subfilter are different.
In addition, only one data stream can be supported in a conventional FIR interpolation or decimation, so multiple interpolations and decimations are needed to deal with the multiple in-phase and quadrature data streams in a MIMO OFDM system. For example, eight interpolations and eight decimations are needed in a 4 × 4 MIMO OFDM transceiver; consequently, the total hardware cost of the interpolations and decimations is very high.
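As an illustration of the polyphase decomposition described above, the following minimal Python sketch checks that interpolation by M with M low-rate subfilters and an output commutator gives the same samples as zero-stuffing followed by a single high-rate FIR filter. The function names (`polyphase_split`, `fir`, `interpolate_direct`, `interpolate_polyphase`) and the integer test data are illustrative assumptions, not part of the original design; the sketch models arithmetic only, not hardware timing.

```python
from math import ceil

def polyphase_split(h, M):
    """Split coefficients h(0..N) into M subfilters; subfilter m holds h(kM + m)."""
    K = ceil(len(h) / M)                        # taps per subfilter: Ceiling((N+1)/M)
    padded = list(h) + [0] * (K * M - len(h))   # h(j) = 0 for j > N
    return [[padded[k * M + m] for k in range(K)] for m in range(M)]

def fir(h, x):
    """Direct-form FIR with zero initial state: y(n) = sum_j h(j) x(n - j)."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

def interpolate_direct(h, x, M):
    """Reference: insert M-1 zeros after each sample, then filter at the high rate."""
    up = []
    for s in x:
        up.extend([s] + [0] * (M - 1))
    return fir(h, up)

def interpolate_polyphase(h, x, M):
    """Each subfilter runs at the low rate; a commutator interleaves their outputs."""
    branches = [fir(p, x) for p in polyphase_split(h, M)]
    out = []
    for n in range(len(x)):
        for m in range(M):          # commutator visits phase 0, 1, ..., M-1
            out.append(branches[m][n])
    return out

if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical coefficients, N = 9
    x = [1, -2, 3, 0, 5, 4]               # hypothetical input samples
    print(interpolate_direct(h, x, 4) == interpolate_polyphase(h, x, 4))
```

Each subfilter in the polyphase version performs one output computation per input sample, which is the lower-rate operation that Figure 2 schedules with the letter X.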

Proposed Structures
The block diagrams of the proposed 4-path pipelined polyphase structures for interpolation and decimation are shown in Figure 3, in which four simultaneous data streams are supported with one set of computation units (CUs). The letters A, B, C, D and A', B', C', D' denote the different input sequences and the corresponding output sequences. The four overlapped blocks stand for the four different phases in the four-component polyphase decomposition. In the proposed structures, there are four independent data paths in each phase. To avoid data collisions, delay elements of different lengths, from 1 to 3, are inserted in each data path at the appropriate positions, as shown in Figure 3. The data scheduling of CU 0-3 of the proposed interpolation and decimation is illustrated in Figure 4, where A0, A1, A2, and A3 denote the data of phases 0, 1, 2, and 3 of the input data stream A; B0, B1, B2, and B3 denote the data of phases 0, 1, 2, and 3 of the input data stream B; and so on. The four CUs of the proposed interpolation or decimation are all operating in every clock period, which leads to 100% hardware utilization.
In the proposed structure for interpolation, one register delay chain (DC) is shared by the four phases of one input stream. The four output streams are generated by four commutators. All the commutators rotate clockwise, starting at time 3 at phase 0. For each input sample of one input stream, the corresponding commutator reads the output of every phase to obtain 4 samples of the interpolated signal. In the proposed structure for decimation, another four commutators are used to distribute the input samples of each input sequence to the phases. Those commutators also rotate clockwise, but start at time 0 at phase 3. In every clock period, the outputs of all phases are summed to produce one sample of the decimated signal. The samples of the decimated signal are finally distributed to each data path by a commutator in the output module; this commutator rotates clockwise, starting at time 3 at data path A'.
The two main advantages of the proposed structures are summarized as follows. First, the simultaneous multiple data streams in a MIMO OFDM system can be processed with minimal hardware cost. Second, the CUs in the proposed structures can be efficiently implemented using hard-wired shifts, adders, and subtractors. Note that only the 4-path pipelined polyphase structures are demonstrated in detail here; in practice, the M-path pipelined polyphase structure for an M-component polyphase interpolation or decimation can be implemented similarly.
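The utilization argument can be illustrated with a small scheduling sketch in Python. It assumes a simple skew in which, at clock t, CU p processes phase p of stream (t − p) mod M; the exact ordering in Figure 4 may differ, and the function names (`conventional_schedule`, `multipath_schedule`, `utilization`) are hypothetical. The conventional schedule assumes that subfilter m of a single stream is active once every M clocks.

```python
def conventional_schedule(num_clocks, M=4):
    """One stream: subfilter m is busy ('X') only when clock % M == m (assumed)."""
    return [["X" if t % M == m else "." for t in range(num_clocks)]
            for m in range(M)]

def multipath_schedule(num_clocks, M=4):
    """Assumed skew: at clock t, CU p handles phase p of stream (t - p) mod M."""
    def label(t, p):
        stream = chr(ord("A") + (t - p) % M)   # streams named A, B, C, ...
        return f"{stream}{p}"
    return [[label(t, p) for t in range(num_clocks)] for p in range(M)]

def utilization(schedule):
    """Fraction of (unit, clock) slots in which a unit is doing work."""
    slots = [s for row in schedule for s in row]
    return sum(s != "." for s in slots) / len(slots)

if __name__ == "__main__":
    for p, row in enumerate(multipath_schedule(8)):
        print(f"CU{p}:", " ".join(row))
    print("conventional utilization:", utilization(conventional_schedule(8)))
    print("multipath utilization:   ", utilization(multipath_schedule(8)))
```

Under this assumed skew, no two CUs serve the same stream in the same clock period, and each stream moves from one phase to the next on successive clocks, which is consistent with the stated need for delay elements of length 1 to 3 in the data paths.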
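The decimation data flow described above (a commutator distributing input samples to the phases, followed by summation of the phase outputs) can be sketched functionally in Python. This is an arithmetic model under stated assumptions, not a cycle-accurate model of Figure 3: the commutator is assumed to feed phase m with x(qM − m), samples before the start of the sequence are taken as zero, and the names `decimate_direct` and `decimate_polyphase` are illustrative.

```python
from math import ceil

def polyphase_split(h, M):
    """Subfilter m holds coefficients h(kM + m), zero-padded past the filter order."""
    K = ceil(len(h) / M)
    padded = list(h) + [0] * (K * M - len(h))
    return [[padded[k * M + m] for k in range(K)] for m in range(M)]

def fir(h, x):
    """Direct-form FIR with zero initial state."""
    return [sum(h[j] * x[n - j] for j in range(len(h)) if n - j >= 0)
            for n in range(len(x))]

def decimate_direct(h, x, M):
    """Reference: filter at the high rate, then keep every M-th output."""
    return fir(h, x)[::M]

def decimate_polyphase(h, x, M):
    """Distribute inputs to M phases, filter each at the low rate, sum the outputs."""
    n_out = ceil(len(x) / M)
    total = [0] * n_out
    for m, p in enumerate(polyphase_split(h, M)):
        # Assumed commutator rule: phase m receives x(qM - m), zero before the start.
        branch_in = [x[q * M - m] if q * M - m >= 0 else 0 for q in range(n_out)]
        branch_out = fir(p, branch_in)
        total = [a + b for a, b in zip(total, branch_out)]
    return total

if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                     # hypothetical coefficients
    streams = {"A": list(range(1, 17)), "B": [3, -1] * 8}   # hypothetical inputs
    for name, x in streams.items():
        # The same coefficient set (one set of CUs) serves every stream.
        print(name, decimate_direct(h, x, 4) == decimate_polyphase(h, x, 4))
```

Because every stream uses the same subfilter coefficients, a hard-wired CU built for one phase can serve all streams, provided the streams are presented to it in different clock periods.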

Implementation Examples
To enable a comparison, several different FIR interpolations and decimations have been implemented using both the proposed structures and the traditional structures. The implementation details and comparison results are shown in Table 1, where F_c is the normalized cutoff frequency, ω_p is the normalized passband edge, ω_s is the normalized stopband edge, and the maximum passband ripple and stopband ripple are denoted by δ_1 and δ_2, respectively. For each implementation, the filter coefficients are rounded to 13 bits and represented in canonic signed digit (CSD) form [2]; the input word length is 10 bits, while the outputs have full precision; and the multiplications in the CUs are implemented using hard-wired shifts, adders, and subtractors.
First, the presented examples are modeled in Verilog and functionally verified using the ModelSim simulator. After functional validation, all the examples are synthesized using a 130 nm standard cell library from Semiconductor Manufacturing International Corporation (SMIC). The cell area of each implementation is reported by the Synopsys Design Compiler tool. The reductions in cell area achieved by the proposed structures over the traditional structures range from 21% to 56%. This can be attributed to the fact that multiple simultaneous data streams are processed with one set of CUs in the proposed structures. The area reduction increases with the number of pipelined data paths.
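To make the coefficient representation concrete, the following Python sketch rounds a coefficient to a signed 13-bit integer and converts it to canonic signed digit form using the standard non-adjacent-form recoding. The scaling convention (one sign bit and 12 fractional bits), the example coefficient 0.3, and the function names `quantize`, `to_csd`, and `adder_count` are assumptions for illustration; the paper's own design flow may differ.

```python
def quantize(coeff, bits=13):
    """Round a coefficient in [-1, 1) to a signed integer of `bits` bits (assumed scaling)."""
    scale = 1 << (bits - 1)
    q = round(coeff * scale)
    return max(-scale, min(scale - 1, q))   # saturate to the representable range

def to_csd(value):
    """Return CSD digits in {-1, 0, +1}, least significant digit first."""
    digits = []
    while value != 0:
        if value % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (value % 4)   # +1 if value = 1 (mod 4), -1 if value = 3 (mod 4)
            digits.append(d)
            value -= d            # remainder is now a multiple of 4
        value //= 2
    return digits

def adder_count(digits):
    """Adders/subtractors needed for one hard-wired multiplication by this constant."""
    nonzero = sum(d != 0 for d in digits)
    return max(nonzero - 1, 0)

if __name__ == "__main__":
    q = quantize(0.3)
    csd = to_csd(q)
    print(q, csd, adder_count(csd))
```

Each nonzero CSD digit corresponds to one shifted copy of the input, so a constant with k nonzero digits needs k − 1 adders or subtractors, which is why CSD coding suits the hard-wired CUs.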

Conclusions
In this paper, we proposed two novel multipath pipelined polyphase structures for FIR interpolation and decimation based on system-level considerations and optimizations. With the proposed structures, M simultaneous data streams can be processed in the M-component polyphase interpolation or decimation with one set of computation units. To show the correctness and the advantages of the proposed structures, several different FIR interpolations and decimations have been implemented using both the proposed multipath pipelined polyphase structures and the conventional polyphase structures. The comparison results show that the proposed structures are more area-efficient than the traditional polyphase structures and are well suited for implementing FIR interpolation and decimation in MIMO OFDM systems.

Figure 2: Data scheduling of each subfilter of the conventional (a) interpolation and (b) decimation.

Figure 4: Data scheduling of CU 0-3 of the proposed (a) interpolation and (b) decimation.

Table 1: Implementation details and comparison results.