Low-Complexity Distortionless Techniques for Peak Power Reduction in OFDM Communication Systems

A high peak-to-average power ratio (PAPR) is one of the major drawbacks of orthogonal
frequency division multiplexing (OFDM) modulation. The three most effective distortionless techniques
for PAPR reduction are partial transmit sequence (PTS), selective mapping (SLM), and tone reservation
(TR). However, the high computational complexity due to the inverse discrete Fourier transform (IDFT)
is a problem with these approaches. Implementations of these techniques typically employ direct computation
of the IDFT, which is not the most efficient solution. In this paper, we consider the development
and performance analysis of these distortionless techniques in conjunction with low-complexity IFFT
algorithms to reduce the PAPR of the OFDM signal. Recently proposed IFFT-based techniques are
shown to substantially reduce the computational complexity and improve PAPR performance.


Introduction
Multicarrier modulation is a data transmission technique, which provides efficient bandwidth utilization and robustness against time dispersive channels. Examples of multicarrier modulation systems are digital audio broadcasting (DAB), digital video broadcasting (DVB), and wireless local and metropolitan area networks using orthogonal frequency division multiplexing (OFDM), and digital subscriber line (DSL) using discrete multitone (DMT) systems. OFDM is an effective transmission technique for wireless communications over frequency selective channels as it provides immunity to multipath fading. An inverse fast Fourier transform (IFFT) and a fast Fourier transform (FFT) are typically employed for baseband modulation and demodulation, respectively. Using an IFFT/FFT simplifies the design of the transceiver and eliminates the need for high speed equalizers, resulting in an efficient hardware implementation.
In order to fully exploit the benefits provided by OFDM modulation, large envelope variations before the RF portion of an OFDM transmitter must be avoided. Signal peaks can lead to saturation in the power amplifier (PA), which in turn increases out-of-band radiation, creates in-band distortion, and reduces PA efficiency. The PA dominates the power consumption of the communication system. Thus this decrease in efficiency results in lower battery life in mobile (wireless) devices and the need for sophisticated heat dissipation techniques in base stations. To deal with this important issue, advanced signal processing techniques are required, which have low implementation complexity.
As distortionless phase optimization techniques, partial transmit sequence (PTS) [19] and selective mapping (SLM) [20] can provide significant PAPR reduction with a small amount of redundancy. With SLM, multiple sequences are generated by phase rotating the original data block and the sequence with the lowest PAPR is selected for transmission. Randomly chosen phase sequences lead to lower PAPR compared to other schemes such as sequences with adjacent or equally spaced subcarriers. In the PTS approach [19], disjoint subblocks of OFDM subcarriers are phase shifted separately after the IFFT is computed. If the subblocks are optimally phase shifted, they exhibit minimum PAPR and consequently reduce the PAPR of the merged signal. The number of subblocks and the corresponding partitioning determine the PAPR reduction. The search for optimum subblock phase factors is computationally complex, but this can be reduced with sphere decoding [21]. Pseudorandom subblock partitioning (such as m-sequence partitioning) has been found to provide better PAPR reduction compared to contiguous partitioning schemes. Typically, the receiver requires side information corresponding to the optimal phases in PTS and the transmitted sequences in SLM. Techniques for avoiding explicit side information transmission are presented in [22,23]. SLM and PTS are known as average power preserving techniques as they do not increase the average signal power. The main drawback of PTS and SLM arises from the computation of multiple IFFTs, resulting in a complexity proportional to the number of PTS subblocks or SLM sequences.
Another class of distortionless techniques, tone reservation (TR) [16], increases the average power. With TR, a predefined set of OFDM subcarriers is reserved to generate peak reduction signals. The transmitter does not send data on these subcarriers, so they are orthogonal to the data subcarriers. Consequently, these added signals do not distort the data subcarriers, and recovering the data at the receiver is trivial. These peak reduction signals are used to compensate for high peaks in the signals on the data subcarriers. TR is particularly appropriate when there are a large number of subcarriers. The optimal choice of values for the reserved tones can be formulated as a quadratically constrained quadratic program for complex multicarrier signals (passband) and a linear program for real multicarrier signals (baseband) [16].
Suboptimal iterative algorithms such as gradient and controlled clipping [16] have been proposed, which have less complexity but slightly inferior PAPR performance compared to the optimal solutions. Hence, they provide a tradeoff between computational complexity and PAPR reduction. These algorithms, however, suffer from slow convergence after the first few iterations. In [24], the controlled clipper algorithm was considered. Improved performance was obtained by using filtered clipping noise as the peak-reduction signal. This noise was adaptively scaled to reduce the PAPR. However, the resulting complexity is still high since multiple FFT and IFFT operations are required during the iterative process.
In the suboptimal iterative algorithms, the PAPR reduction capability greatly depends on the location and number of peak reduction tones (PRTs). The locations also affect the convergence rate of the algorithm. As indicated in [16], the optimal peak reduction kernels can be computed offline or during initialization if the channel is static. However, when the channel is not static, the PRT locations and consequently the peak reduction kernels should be updated according to the rate at which the channel changes [16]. This can create significant computational complexity. Side information must also be transmitted in order to identify the reserved tones at the receiver.
This paper first provides an overview of the major distortionless PAPR reduction techniques, namely, partial transmit sequence (PTS), selective mapping (SLM), and tone reservation (TR). Their main features are described and analyzed. We consider practical solutions for improving both the computational complexity and PAPR performance of these techniques. In doing so, we provide a review of the low-complexity techniques proposed in [25][26][27][28] that exploit the structure of the IFFT/FFT algorithms. These new techniques are based on the concept of identical inverse discrete Fourier transforms (IDFTs). The structure and properties of these identical IDFTs are used to improve both the PAPR and complexity.
The remainder of this paper is organized as follows. The structure of an OFDM transceiver is described in Section 2. We also characterize the PAPR, including a statistical description, and examine its effect on performance. Distortionless PAPR reduction techniques are introduced in Section 3. Section 4 presents the IFFT-based PAPR reduction techniques, and their performance and computational complexity are examined. Finally, Section 5 provides some concluding remarks and suggestions for future work. We use the following notation. Upper case and lower case bold letters represent matrices and vectors, respectively. We use $\|\cdot\|_\infty$ to denote the infinity-norm, $\|\cdot\|_2$ for the 2-norm, $E[\cdot]$ for expectation, $(\cdot)^T$ for transpose, and $(\cdot)^*$ for complex conjugate.

OFDM Signals and PAPR
This section presents the OFDM system model and a characterization of the major disadvantage of OFDM, namely, a high peak-to-average power ratio (PAPR). We first introduce the OFDM transceiver structure and then describe the nonlinear power amplifier model used at the transmitter. Next, the effect of high peaks on the OFDM signal envelope is discussed.

The OFDM Transceiver.
A block diagram of the OFDM transceiver is given in Figure 1. The serial input bit stream is sent to a quadrature-amplitude modulation (QAM) or phase-shift keying (PSK) constellation mapper, which outputs N parallel constellation points X(k), k = 0, ..., N − 1. An IFFT converts these points to the parallel time domain samples x(n), which are then converted to a serial stream. A cyclic prefix is appended before the OFDM symbol x. This prefix should have a duration longer than the maximum delay due to the propagation paths [16]. This prevents intersymbol interference and enables simple single-tap equalization. The sequence is converted to an analog signal x(t), up converted to the carrier frequency, and amplified, and the resulting signal is transmitted through the channel. At the receiver, the reverse operations are performed.
The complex envelope of the baseband OFDM signal is defined over the time interval $t \in [0, T_s]$, where $T_s$ is the OFDM symbol duration. The corresponding discrete time signal is [16]
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j2\pi kn/(JN)}, \quad n = 0, \ldots, N-1. \quad (2)$$
To simplify the notation, the time domain OFDM samples are represented as
$$\mathbf{x} = \mathbf{Q}\mathbf{X}, \quad (3)$$
where $\mathbf{Q}$ is an N-point IDFT matrix.
In this paper, we consider a solid state power amplifier (SSPA) model [16] described by its amplitude modulation (AM) characteristic, where $\tilde{x}(t)$ denotes the amplified OFDM signal and $x_{SAT}$ is the output saturation level. The amplifier saturation power, $P_{SAT}$, is defined as $P_{SAT} = x_{SAT}^2$. The parameter γ controls the smoothness and h is the amplifier small signal gain. The AM/phase modulation (PM) conversion of the SSPA is assumed to be zero. In order to reduce the nonlinear distortion due to signal peaks, the amplifier is driven with an input back-off (IBO).
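The SSPA parameters described above (smoothness γ, small signal gain h, saturation level $x_{SAT}$) match the general shape of the widely used Rapp model. The sketch below assumes that Rapp AM/AM form and assumes that the IBO is the ratio of saturation power to mean input power in dB. Both assumptions, and the function names `sspa_amplitude` and `ibo_db`, are illustrative and not taken from the text.

```python
import math

def sspa_amplitude(a_in, x_sat=1.0, gamma=2.0, h=1.0):
    """Output amplitude for input amplitude a_in (assumed Rapp AM/AM curve).

    Small inputs are scaled by the gain h; large inputs approach x_sat.
    A larger gamma gives a sharper transition into saturation.
    """
    g = h * a_in
    return g / (1.0 + (g / x_sat) ** (2.0 * gamma)) ** (1.0 / (2.0 * gamma))

def ibo_db(x_sat, mean_input_power):
    """Assumed IBO definition: saturation power x_sat^2 over mean input power, in dB."""
    return 10.0 * math.log10(x_sat ** 2 / mean_input_power)
```

With these assumptions, a mean input power of one quarter of the saturation power corresponds to an IBO of roughly 6 dB.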

Peak Power and Its Effects.
If the modulated QAM/QPSK complex symbols X(k) add constructively, they can generate a time domain signal with a large amplitude. Thus, the output signal x(n) can have high peak values. Signal peaks much higher than the average can exceed the linear range of the amplifier, causing distortion of the OFDM signal. This distortion has an impact on the received signal constellation similar to additive noise if it occurs frequently [16]. Figure 2 illustrates the effect of the distortion for N = 256 and a 16-QAM constellation. The in-band distortion is similar to additive Gaussian noise, which increases the bit error rate (BER) at the receiver. The out-of-band radiation (outside the spectrum of the OFDM signal) introduces interference in adjacent channels. Figure 3 shows the power spectral density. To control the effects of nonlinear distortion, one can force (scale) the PA to operate in the linear region or increase the IBO. However, PAs are more power efficient when operating close to the saturation region. In addition, for the same transmit power, a larger IBO increases power consumption and the cost of hardware devices (e.g., a power amplifier with a large linear range).

Peak-to-Average Power Ratio (PAPR).
The most popular metric to evaluate the variation in the time domain signal is the peak-to-average power ratio (PAPR). For the OFDM system in Figure 1, the PAPR of x is defined as
$$\mathrm{PAPR}(\mathbf{x}) = \frac{\max_{n} |x(n)|^2}{E[|x(n)|^2]},$$
where the samples x(n) are taken with oversampling factor J [16]. In order to evaluate the PAPR reduction, we employ the complementary cumulative distribution function (CCDF)
$$\mathrm{CCDF}(\delta) = \Pr\{\mathrm{PAPR}(\mathbf{x}) > \delta\},$$
which represents the probability that the PAPR of a symbol exceeds the clipping level δ.
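As a rough illustration of the PAPR and CCDF described above, the following sketch evaluates the discrete time signal by direct summation and estimates the CCDF empirically. It assumes that the oversampled signal has JN samples and that the per-symbol mean power stands in for the expectation. The names `ofdm_samples`, `papr_db`, and `ccdf` are hypothetical.

```python
import cmath
import math

def ofdm_samples(X, J=1):
    """x(n) = (1/N) * sum_k X(k) * exp(j*2*pi*k*n/(J*N)) for n = 0..J*N-1 (assumed range)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / (J * N)) for k in range(N)) / N
            for n in range(J * N)]

def papr_db(x):
    """Peak power divided by the mean power of this one symbol, in dB."""
    power = [abs(v) ** 2 for v in x]
    return 10.0 * math.log10(max(power) / (sum(power) / len(power)))

def ccdf(papr_values_db, delta_db):
    """Empirical CCDF: fraction of symbols whose PAPR exceeds delta_db."""
    return sum(1 for p in papr_values_db if p > delta_db) / len(papr_values_db)
```

When all N subcarriers carry the same symbol they add constructively at n = 0, which gives the worst-case PAPR of N; a single active subcarrier gives a constant envelope and a PAPR of 0 dB.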

Distortionless PAPR Reduction Techniques
In this section, we provide an overview of the most popular distortionless PAPR reduction techniques, PTS, SLM, and TR. They are based on data or signal modification prior to the power amplifier and reduce the PAPR without distorting the signal (which creates out-of-band radiation and/or in-band distortion). These techniques are described and analyzed below.

Partial Transmit Sequence.
In the PTS approach [19], the frequency domain vector X is partitioned into P disjoint subblocks $\mathbf{X}_p$. The combination of these subblocks with the phase rotation vectors $\Theta_p$ yields the alternative frequency domain vectors
$$\mathbf{X}' = \sum_{p} \Theta_p \mathbf{X}_p, \quad (9)$$
where the sum runs over the P subblocks and each $\Theta_p$ takes one of W phase values. The P subblocks are optimally phase rotated to achieve a reduced PAPR signal $\mathbf{x}'$. Since each subblock is independently rotated by a phase vector $\Theta_p$, the phase vector multiplication can be performed after the IDFT computation. This is an advantage as the PAPR can be computed without changing between the time and frequency domains. Hence, we can take the IDFT of (9) and exploit the linearity of the IDFT to obtain
$$\mathbf{x}' = \sum_{p} \Theta_p \mathbf{Q}\mathbf{X}_p = \sum_{p} \Theta_p \mathbf{x}_p, \quad (10)$$
where the $\mathbf{x}_p = \mathbf{Q}\mathbf{X}_p$ are the P time domain partial transmit sequences. The sequence $\mathbf{x}'$ with the smallest PAPR is chosen for transmission, that is, the phase rotations are selected to minimize the peak value $\max_n |\sum_p \Theta_p x_p(n)|$. The search for the optimum phase rotations is computationally complex, but this can be reduced with sphere decoding [21]. PTS requires that $(P-1)\log_2 W$ bits per OFDM symbol be transmitted as explicit side information. This information is used at the receiver to recover the original data. Techniques to avoid side information are given in [23,29]. As shown in [19], pseudorandom subblock partitioning was found to have the best PAPR reduction compared to contiguous and other noncontiguous partitioning schemes. The autocorrelation function (ACF) of the PTS subblocks shows that this approach provides less correlated adjacent time samples. PTS is very effective at reducing the PAPR; however, the PAPR performance depends on W, P, and the method of subblock partitioning. If the number of subblocks, P, and/or the number of phase values, W, are increased, the PAPR reduction capability is improved. This is shown in Figure 5 for N = 256.
According to (10), the number of IDFT calculations that have to be computed is P, which is typically in the range from 2 to 16. Thus the resulting computational complexity can be high, particularly when N is large. IFFT-based PTS focuses on the computational complexity of the multiple IFFTs. This will be discussed in Section 4.
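The PTS procedure can be sketched as follows. This is a minimal illustration that assumes contiguous subblock partitioning, scalar phase factors from the set $e^{j2\pi w/W}$, a first subblock that is left unrotated, and an exhaustive search. These choices and the function names are assumptions made for the example, not the partitioning or search method of [19].

```python
import cmath
import itertools
import math

def idft(X):
    """Direct N-point IDFT with 1/N scaling."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def papr(x):
    """Peak power over mean power (linear scale)."""
    power = [abs(v) ** 2 for v in x]
    return max(power) / (sum(power) / len(power))

def pts(X, P=4, W=4):
    """Exhaustive PTS search; returns (best PAPR, phase factors, time domain signal)."""
    N = len(X)
    size = N // P  # assumes N is divisible by P
    # Contiguous partition (assumption): subblock p holds subcarriers p*size..(p+1)*size-1.
    parts = [[X[k] if k // size == p else 0 for k in range(N)] for p in range(P)]
    x_p = [idft(Xp) for Xp in parts]  # the P IDFTs that dominate the complexity
    phases = [cmath.exp(2j * math.pi * w / W) for w in range(W)]
    best = None
    # First factor fixed to 1, so W**(P-1) combinations are searched.
    for combo in itertools.product(phases, repeat=P - 1):
        theta = (1,) + combo
        x = [sum(t * xp[n] for t, xp in zip(theta, x_p)) for n in range(N)]
        value = papr(x)
        if best is None or value < best[0]:
            best = (value, theta, x)
    return best
```

Because the all-ones phase combination reproduces the original signal, the selected candidate never has a higher PAPR than the unmodified symbol. The search over $W^{P-1}$ combinations is consistent with the $(P-1)\log_2 W$ bits of side information mentioned above.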

Selective Mapping.
A simple approach to generate different mappings for the same OFDM symbol is to phase rotate the frequency domain signal. This technique, called selective/selected mapping (SLM), was proposed in [30,31]. The SLM concept is similar to that of PTS, but SLM phase rotates each subcarrier individually, while blocks of subcarriers are rotated in PTS. With SLM, multiple sequences are generated by multiplying independent phase sequences with the original data, and the sequence with the lowest PAPR is chosen for transmission (see Figure 6). Consider a phase rotated version of X given by
$$\mathbf{U}_\omega = \mathbf{X} \cdot \mathbf{\Phi}_\omega,$$
where · denotes elementwise multiplication and $\mathbf{\Phi}_\omega$ is the ωth phase sequence. The time domain OFDM signal $\mathbf{u}_\omega$ is obtained using the IDFT of $\mathbf{U}_\omega$. Hence, all of the candidate symbols carry the same information X. In SLM, the lowest PAPR signal $\mathbf{u}_\omega$ is chosen for transmission from the Ω candidate signals, including the original signal x, that is, the candidate that minimizes $\mathrm{PAPR}(\mathbf{u}_\omega)$ over ω is transmitted. It has been determined that using randomly chosen phase sequences provides better PAPR reduction than other sequences such as complementary Golay and Walsh-Hadamard sequences [32]. Figure 7 shows that larger PAPR reductions can be achieved as the number of SLM sequences Ω is increased. In addition, similar to PTS, SLM requires Ω IDFTs to obtain the sequences $\mathbf{u}_\omega$. As a consequence, this technique has significant computational complexity for typical values of Ω, between 2 and 16. Complexity reductions can be achieved by implementing the IFFT-based SLM technique described in Section 4.
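A minimal sketch of the SLM selection step is given below. It assumes random phase sequences drawn from {1, j, −1, −j} and treats the unrotated symbol as candidate 0. The phase alphabet, the seed handling, and the function names are assumptions for illustration only.

```python
import cmath
import math
import random

def idft(X):
    """Direct N-point IDFT with 1/N scaling."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def papr(x):
    """Peak power over mean power (linear scale)."""
    power = [abs(v) ** 2 for v in x]
    return max(power) / (sum(power) / len(power))

def slm(X, num_sequences=8, seed=0):
    """Return (best PAPR, index of chosen phase sequence, time domain signal)."""
    rng = random.Random(seed)
    N = len(X)
    # Candidate 0 is the all-ones sequence, i.e. the original signal.
    sequences = [[1] * N]
    for _ in range(num_sequences - 1):
        sequences.append([cmath.exp(2j * math.pi * rng.randrange(4) / 4) for _ in range(N)])
    best = None
    for index, phi in enumerate(sequences):
        # Elementwise phase rotation followed by one IDFT per candidate.
        u = idft([Xk * pk for Xk, pk in zip(X, phi)])
        value = papr(u)
        if best is None or value < best[0]:
            best = (value, index, u)
    return best
```

Each candidate costs one full IDFT, which is the source of the complexity proportional to Ω noted above. The index of the chosen sequence is what the receiver would need as side information.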

Tone Reservation.
With the tone reservation (TR) technique, L subcarriers called peak reduction tones (PRTs) are reserved to generate a peak reduction signal. Let $\mathcal{R} = \{\lambda_0, \ldots, \lambda_{L-1}\}$ denote the ordered set of reserved tone positions, and $\mathcal{R}^C$ denote the complement of $\mathcal{R}$ in $\mathcal{N} = \{0, \ldots, N-1\}$. The frequency domain signal including PRTs can be expressed as
$$X(k) + C(k) = \begin{cases} X(k), & k \in \mathcal{R}^C, \\ C(k), & k \in \mathcal{R}, \end{cases}$$
where C(k) is the PAPR reduction signal. The corresponding time domain signal c is given by $\mathbf{c} = \mathbf{Q}\mathbf{C}$. The PAPR after adding the peak reduction signal c is
$$\mathrm{PAPR}(\mathbf{x} + \mathbf{c}) = \frac{\max_{n} |x(n) + c(n)|^2}{E[|x(n) + c(n)|^2]}, \quad (15)$$
where the samples are taken with oversampling factor J. Since the optimal tone reservation solution, $\mathbf{c}_{opt}$, only slightly increases the mean power, the denominator of (15) is not a significant function of c and so can be ignored [16]. This simplifies the calculation of $\mathbf{c}_{opt}$. Hence, the goal is to minimize the peak power of the signal x + c. The optimal TR peak reduction signal c can then be formulated as the solution to the optimization problem [16]
$$\min_{\mathbf{C}} \|\mathbf{x} + \mathbf{Q}\mathbf{C}\|_\infty^2,$$
where C is the corresponding frequency domain vector of c.
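The reason TR is distortionless can be checked numerically: a signal placed only on the reserved tones does not change what the receiver sees on the data tones. The sketch below illustrates this with direct DFT and IDFT sums. The helper name `combine_tones` and the example tone positions are hypothetical.

```python
import cmath
import math

def idft(X):
    """Direct N-point IDFT with 1/N scaling."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def dft(x):
    """Direct N-point DFT (inverse of idft above)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def combine_tones(X, C, reserved):
    """Frequency domain vector carrying C(k) on reserved tones and X(k) elsewhere."""
    return [C[k] if k in reserved else X[k] for k in range(len(X))]

# Example: N = 8 subcarriers with tones 2 and 5 reserved (arbitrary choice).
reserved = {2, 5}
X = [1+1j, -1+1j, 0, 1-1j, -1-1j, 0, 1+1j, -1+1j]
C = [0, 0, 0.7-0.2j, 0, 0, -0.4+0.9j, 0, 0]
received = dft(idft(combine_tones(X, C, reserved)))
data_recovered = all(abs(received[k] - X[k]) < 1e-9
                     for k in range(len(X)) if k not in reserved)
```

The receiver simply discards the reserved tones, so any values of C(k) on those tones can be used to shape the time domain peaks.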
To obtain a low-complexity solution, suboptimal iterative algorithms such as gradient and controlled clipper [16] have been developed to generate the peak reduction signal c. They provide a trade-off between computational complexity and PAPR reduction. The gradient algorithm iteratively updates the signal using $a_\beta(n) = x_\beta(n) - x_{SAT} e^{j\angle(x_\beta(n))}$, which is a complex scalar at the βth iteration, and the kernel signal $\mathbf{e} = \mathbf{Q}\mathbf{E}$. Here $\mathbf{E} = [E(0), \ldots, E(N-1)]$ is called the frequency domain peak reduction kernel, with binary elements {0, 1} according to the reserved subcarriers. A block diagram of the suboptimal iterative gradient algorithm is given in Figure 8. This algorithm first uses the PRT set to generate the kernel signal e, then iteratively shifts this signal to the peak locations to reduce the amplitude of the OFDM signal. Figure 9 shows the performance of the gradient-based technique with L/N = 6.3% reserved subcarriers chosen randomly, β = 40 and $x_{SAT}$ = 6 dB. In this case, the gradient algorithm achieves a PAPR reduction of 3 dB. The computational complexity of the gradient-based algorithm is essentially determined by the IFFT operations required to obtain the kernel signal e. If the channel is not static, the PRT locations should be updated periodically according to the coherence time of the channel. This can lead to very high computational complexity due to repeated calculations of the peak reduction kernels. Hence, we consider efficient computation of these kernels in Section 4.

IFFT-Based PAPR Reduction Techniques
This section presents a survey of IFFT-based PAPR reduction techniques [25][26][27][28]. It is shown that they have much lower complexity compared to the conventional distortionless methods given in the previous section. The performance of the IFFT-based PAPR reduction techniques and the previously developed solutions is also evaluated.

IFFT Algorithms.
An IFFT algorithm converts the IDFT computation to r × (N/r)-point DFTs iteratively through $m = \log_r N$ stages. As a consequence, the computational complexity is reduced from $O(N^2)$ to $O(N \log_r N)$. The value of r is called the radix. The PAPR reduction algorithms proposed in [25][26][27][28] exploit this recursive structure and the resulting identical IDFTs at each IFFT stage. This conversion also provides a means of analyzing and quantifying the effects of the intermediate signals in the transform in terms of PAPR reduction and computational complexity. As described in [25], there are $r^{v-1}$ identical $N/r^{v-1}$-point IDFTs at a particular radix-r IFFT stage v between stage v and the last stage m. Hence, from (3) the IFFT output corresponding to the inputs at a particular stage v can be expressed using a transform matrix composed of $r^{v-1}$ identical submatrices $\mathbf{Q}_1 = \mathbf{Q}_2 = \cdots = \mathbf{Q}_\eta = \cdots = \mathbf{Q}_{r^{v-1}}$. Figure 10 illustrates the IDFTs at stage v.
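The recursive structure described above can be illustrated with a textbook radix-2 IFFT. The sketch below is a generic decimation-in-time recursion, not the specific algorithms of [25]. It shows that at every level the two half-size transforms are the same (N/2)-point IDFT applied to different inputs. The function names are hypothetical.

```python
import cmath
import math

def idft_direct(X):
    """Direct O(N^2) IDFT with 1/N scaling, used as a reference."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def _ifft_unscaled(X):
    """Radix-2 recursion without the 1/N factor; len(X) must be a power of 2."""
    N = len(X)
    if N == 1:
        return list(X)
    # Two identical (N/2)-point IDFTs applied to the even and odd indexed inputs.
    even = _ifft_unscaled(X[0::2])
    odd = _ifft_unscaled(X[1::2])
    out = [0] * N
    for n in range(N // 2):
        t = cmath.exp(2j * math.pi * n / N) * odd[n]  # twiddle factor multiplication
        out[n] = even[n] + t
        out[n + N // 2] = even[n] - t
    return out

def ifft_radix2(X):
    """O(N log2 N) IFFT that matches idft_direct."""
    N = len(X)
    return [v / N for v in _ifft_unscaled(X)]
```

For N = 16 the recursion has $\log_2 16 = 4$ levels, each requiring N/2 twiddle multiplications, compared with $N^2$ multiplications for the direct sum.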

Radix IFFT Computational Complexity.
The radix-r IFFT algorithms derived above allow us to compute the computational complexity per stage. This consists of multiplicative and additive complexity. There are two major IFFT algorithms: decimation in time (DIT) and decimation in frequency (DIF) [25]. The multiplicative complexity of DIF algorithms differs from that of DIT algorithms; however, both algorithms have the same additive complexity per stage. From [25], the multiplicative complexity per stage for the radix-r DIF and DIT algorithms is given by (19) and (20), respectively. The overall computational complexity for m stages is then obtained from these per-stage values [25]. The number of additions for the radix-r IFFT algorithm at stage v is given by (23).

IFFT-Based PTS.
As discussed previously, one of the major drawbacks of PTS arises from the computation of multiple IFFTs, resulting in a high complexity proportional to the number of subblocks. In order to reduce this complexity, the PTS technique proposed in [25] employs the input signals to the identical IDFTs at a given stage to obtain the PTS subblocks. This reduces the number of stages requiring multiple IFFTs.

DIF versus DIT Algorithms.
The overall multiplicative complexity for IFFT-based PTS using DIF with P subblocks follows from (19). Similarly, the overall multiplicative complexity for IFFT-based PTS using DIT with P subblocks follows from (20). From (23), both algorithms have the same additive complexity per stage, so the overall additive complexity for IFFT-based PTS with P subblocks is the same for DIF and DIT. The multiplicative complexity, which largely determines the computational complexity of the remaining stages, depends heavily on the type of IFFT algorithm. It was shown in [25] that DIT algorithms have a majority of the complex multiplication operations in the last stages, while DIF algorithms have a majority of these operations in the first stages. Thus, DIF has lower multiplicative complexity in generating the PTS subblocks compared to DIT, while both provide the same PAPR reduction. The PAPR performance was verified numerically in [25].
As an example, to compare the multiplicative complexity for m − v remaining stages, consider N = 256, 2048, 8192 for radix-2 DIF and DIT algorithms. This multiplicative complexity on a logarithmic scale is given in Figure 11.
To evaluate the PAPR performance, the CCDFs of radix-2 DIF-PTS and DIT-PTS at stage m − v = 5 and original PTS (O-PTS) are depicted in Figure 12 for P = 8 and N = 256, 2048. This shows that the algorithms provide similar performance. In this case, DIF-PTS achieves a multiplicative complexity reduction ratio of 30% over DIT-PTS. This value increases to 43% and 58% when compared to O-PTS. It has been shown that high radix algorithms provide better PAPR reduction per stage compared to low radix algorithms and also have lower multiplicative complexity.
Although IFFT-based PTS reduces the computational complexity, the PAPR reduction decreases as the number of stages after PTS partitioning is reduced. To achieve PAPR reduction close to that of O-PTS, there should be a sufficient number of stages remaining, so there is a limit on the achievable computational complexity reduction. Thus the major challenge is to decrease this complexity while maintaining a PAPR reduction close to that of O-PTS. To address this, the PTS technique in [27] was developed based on the normalized periodic autocorrelation function (ACF) of the PTS subblocks, which represents the correlation between ξ-spaced complex samples in a subblock p. The ACF can be used in the design of the PTS subblocks to reduce both the PAPR and computational complexity. It was shown that the previously proposed pseudorandom subblock designs are not optimal for the case of IFFT-based PTS as they introduce repeated subcarriers within a subblock, where repeated subcarriers are defined as identical inputs to different IDFTs at a given stage. The effect of repeated subcarriers on the ACF of the subblocks is illustrated in Figure 13, where the number of subcarriers and subblocks is 32 and 4, respectively. The same pseudorandom sequence [01023312] was used for the inputs to the 4 × 8-point IDFTs using a radix-2 algorithm at Stage 3. This shows that repeated subcarriers result in a large ACF for the PTS sequences.
PTS subblock partitioning was developed based on error-correcting codes (ECCs) to limit the number of repeated subcarriers [26]. This partitioning improves the ACF properties of the PTS subblocks as it provides less correlated adjacent time samples compared with other partitioning schemes. In fact, the proposed ECC technique minimizes the number of repeated subcarriers within the subblocks at a particular stage v. The ECCs are repetition codes (RCs) over $Z_P$, the integer ring of P elements. As an example, consider the 4 × 8-point IDFTs using radix-2 at Stage 3. The m-sequence (MS) subblocks from [33] with P = 4 are given in (28). Each row in (28) represents the input to an 8-point IDFT.
The repeated subcarriers within each subblock can be identified from the repeated numbers in the columns of the matrix. This shows that there are as many as three repeated subcarriers within a subblock. Note that with the corresponding ECC subblocks there are no repeated subcarriers within the subblocks. Figures 14 and 15 present the absolute value of the ACF vectors for subblocks $p_0$ to $p_3$ with MS and ECC subblocks, respectively. The ECC subblocks show a significant reduction in the ACF compared with the MS subblocks. In fact, the ECC subblocks provide an ACF that is nearly flat. As stated in [26], the IFFT and FFT operations use the same coefficients. Hence, the FFT coefficients used to recover the data at the receiver are the same as the IFFT coefficients used to generate the OFDM symbols at the transmitter. However, we must take into account the order of the IFFT inputs at the transmitter. If we assume these inputs are in normal order, the inputs to the FFT at the receiver should be in reverse order. Thus, the FFT computations at the receiver are symmetric to the IFFT computations, so if stage v is used to obtain the PTS subblocks, the data is recovered at FFT stage m − v at the receiver. Hence, the side information required is the same as with O-PTS. Figure 16 presents the PAPR performance for N = 256 and P = 8. This shows that at CCDF = $10^{-4}$, PTS-ECC improves the PAPR performance by approximately 2 dB for 2 and 3 remaining stages compared to PTS-MS. The corresponding computational complexity reduction over O-PTS is from 45% to 52% [25].

IFFT-Based SLM.
SLM also has a high complexity for practical systems, which is proportional to the number of SLM sequences. This complexity can be reduced with IFFT-based SLM [26], which uses the product of intermediate signals within the IFFT and the SLM phase sequences. Multiple IFFTs are then applied only to the remaining stages.
To reduce the computational complexity of the remaining stages, DIF is employed with IFFT-based SLM rather than DIT, as the former results in fewer multiplications. In addition (similar to PTS), IFFT-based SLM using a higher radix FFT algorithm results in lower multiplicative complexity than using a lower radix. To further reduce the computational complexity of IFFT-based SLM, partial SLM was proposed in [26]. In this case, multiple IFFTs are computed over a fraction of the identical IDFTs for the remaining stages, so that only a subset of the inputs $\mathbf{X}_\eta$ are phase rotated to obtain the time domain sequences $\mathbf{u}_\omega$. It was shown that partial SLM significantly lowers the computational complexity compared to original SLM (O-SLM). As stated in [26], a complexity reduction of 75% over O-SLM can be obtained with 5 remaining stages and 8 SLM sequences. Figure 17 presents the CCDF of O-SLM and partial SLM for various numbers of remaining stages and subsets ν. This shows that partial SLM with ν = 2 provides PAPR performance close to that with O-SLM. Therefore, this technique provides a better PAPR versus complexity tradeoff compared to O-SLM. To recover the data at the receiver, similar to IFFT-based PTS, only the IFFT input order at the transmitter must be taken into account. Therefore, the side information is the same as with original SLM.

IFFT-Based TR.
As previously discussed, with the suboptimal iterative algorithms the PAPR reduction capability of TR depends strongly on the locations and number of peak reduction tones (PRTs). These locations also affect the convergence rate of the solution. As indicated in [16], the optimal peak reduction kernels can be computed off-line or during the initialization process if the channel is static. However, when the channel is not static, the PRT locations and consequently the peak reduction kernels should be updated as the channel varies [16]. This creates significant computational complexity. For example, to obtain a good PRT set (with a PAPR loss less than 0.1 dB from the optimal case), a complexity of one IFFT per set is reported in [16], with 11 to 28 randomly chosen sets of reserved subcarriers required. Side information must also be transmitted in order to identify the reserved tones at the receiver.
In [28], IFFT-based TR was proposed as an efficient gradient-based algorithm for PAPR reduction. This algorithm utilizes the FFT algorithms described previously to reduce the computational complexity associated with optimizing the peak reduction kernels. It was shown that this can significantly reduce the computational complexity in generating the kernels. Thus they can be updated efficiently when the channel is not static. IFFT-based TR has significantly less computational complexity compared with the approaches in [16].
With IFFT-based TR, the reserved tones are chosen over the inputs $\mathbf{X}_\eta$ to the identical IDFTs at a given stage. In fact, a subset of PRT locations for each of the identical $N/r^{v-1}$-point IDFTs is used to generate the peak reduction kernels $\mathbf{e}_\eta$. This results in a low-complexity solution to computing the kernels, which is appropriate when the PRTs must be updated frequently. The results presented in [28] demonstrate that the proposed algorithm has PAPR performance close to that of the gradient algorithm in [16], at a cost of one to two additional bits of side information.
The PAPR performance of IFFT-based TR and the algorithm in [16] are compared in Figure 18 for various numbers of iterations (β p ), with N = 512, r = 2, v = 4, and β = 40. This shows that the IFFT-based TR algorithm with 8 and 16 iterations outperforms the algorithm in [16] with 40 iterations. In this case, the IFFT-based TR technique also provides a complexity reduction of 56% [28]. The proposed IFFT-based TR algorithm thus provides a better tradeoff between complexity and PAPR performance compared to the gradient algorithm in [16], as these performance metrics are a function of r and v.
It should be noted that a radix-r IFFT algorithm can be practically implemented using radix-r butterflies with N/r butterflies per stage. Each stage of the IFFT operation reads N memory locations containing the butterfly inputs, processes the inputs, and writes them back. Since the proposed IFFT algorithm uses only m − v of the m stages, the number of butterfly processing elements is reduced from $mN/r$ to $(m-v)N/r$. In addition, since we compute the peak reduction kernel for only one $N/r^{v-1}$-point IDFT, the number of butterfly processing elements is only $(m-v)N/r^{v-1}$. Using $N/r^{v-1}$-point IDFTs instead of N-point IDFTs significantly reduces the number of butterfly operations required.
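The butterfly counts quoted above can be evaluated for concrete parameters. The sketch below simply codes the three expressions from the preceding paragraph, $mN/r$, $(m-v)N/r$, and $(m-v)N/r^{v-1}$, with hypothetical function names. It is an arithmetic illustration rather than a complexity model from [28].

```python
def num_stages(N, r):
    """m = log_r(N), computed with integers; N must be a power of r."""
    m, size = 0, 1
    while size < N:
        size *= r
        m += 1
    if size != N:
        raise ValueError("N must be a power of the radix r")
    return m

def butterflies_full(N, r):
    """All m stages of an N-point radix-r IFFT: m * N / r."""
    return num_stages(N, r) * N // r

def butterflies_remaining(N, r, v):
    """Only the m - v remaining stages: (m - v) * N / r."""
    return (num_stages(N, r) - v) * N // r

def butterflies_single_idft(N, r, v):
    """Expression quoted for one N / r^(v-1)-point IDFT: (m - v) * N / r^(v-1)."""
    return (num_stages(N, r) - v) * N // r ** (v - 1)
```

For the setting of Figure 18 (N = 512, r = 2, v = 4), m = 9, so the three expressions evaluate to 2304, 1280, and 320, respectively.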

Conclusions
A time domain OFDM signal can exhibit a large peak-to-average power ratio (PAPR). This reduces the efficiency of the power amplifier and causes nonlinear distortion, which increases out-of-band radiation and creates in-band distortion. One solution to this problem is to employ an expensive power amplifier with a large linear range. More practical techniques decrease the PAPR of the transmitted signal by modifying the OFDM signal prior to the power amplifier.
There have been a variety of techniques developed to generate OFDM symbols with reduced PAPR, but none provide a large reduction in PAPR with low complexity and without degrading the performance of the system (distorting the OFDM signal). The most popular distortionless PAPR reduction techniques, partial transmit sequence (PTS), selective mapping (SLM), and tone reservation (TR) provide significant PAPR reduction. However, all have relatively high computational complexity. The tradeoff between PAPR reduction and computational complexity has motivated the development of numerous techniques.
IFFT-based PTS provides a solution to the major drawback of PTS, namely, the computational complexity due to multiple IFFTs. To generate the time domain PTS sequences, multiple transforms were computed over identical IDFTs. The periodic autocorrelation function (ACF) of the time domain PTS sequences was employed to develop a PTS subblock partitioning technique using error-correcting codes (ECCs). This minimizes the number of repeated subcarriers within a subblock and provides better PAPR reduction than pseudorandom subblocks. A PAPR reduction comparison between ECC subblocks and previous approaches was presented. This showed that the PTS subblocks using ECCs provide significant PAPR reduction with a small number of remaining stages. Hence, both PAPR and complexity reduction are achieved with this technique.
Similar to PTS, IFFT-based SLM can reduce the computational complexity due to the computation of multiple IFFTs. To generate the SLM sequences, intermediate signals within the IFFT were phase rotated. Further, a low-complexity SLM technique based on partial phase rotated inputs to the identical IDFTs was examined. This technique computes multiple IFFTs with significantly lower computational complexity. A comparison between this technique and original SLM was presented in terms of PAPR reduction and computational complexity. This showed that the partial SLM approach provides significant complexity reduction with PAPR performance very close to that of original SLM.
A new class of gradient-based tone reservation algorithms, IFFT-based TR, was presented. To generate the peak reduction kernels, the transform matrices of identical IDFTs were employed. It was shown that they can be used to provide low-complexity solutions to determining the peak reduction tones (PRTs) and computing the peak reduction kernels. This is particularly important with time varying channels, in which case the algorithms must be employed periodically according to the coherence time of the channel. The cost is a slight increase in side information. Results were presented that demonstrate that the IFFT-based TR technique outperforms the previous gradient algorithm in terms of computational complexity, with a slight degradation in PAPR performance.

Future Work.
Low-complexity IFFT algorithms are desirable for efficient transform implementation in hardware and software. The design of IFFT algorithms in the context of IFFT-based PAPR reduction presents new signal processing opportunities for multicarrier modulation. IFFT algorithms such as split-radix, radix-2/4/8, and radix-$2^2$ are interesting from the point of view of computational complexity, PAPR reduction, and suitability for hardware implementation. Thus this is an important research direction for the future. Optimization of the IFFT-based techniques introduced in this paper for implementation on digital signal processors (DSPs), field programmable gate arrays (FPGAs), or application-specific integrated circuits (ASICs) should also be investigated.
Multiple-input multiple-output OFDM (MIMO-OFDM) systems and the related signal processing have acquired great significance in recent years. MIMO-OFDM is employed in fourth generation (4G) networks and has been proposed for broadband wireless communication systems. Thus, efficient MIMO-OFDM transmission and reception techniques are of interest. Similar to single antenna OFDM, one of the major drawbacks with MIMO-OFDM is that the signals transmitted on different antennas may exhibit a large PAPR. If one of the distortionless PAPR techniques is employed, complexity becomes a severe problem, much more so than in a single antenna OFDM system. This is because each transmitter implements the OFDM modulation. Previously developed PAPR reduction techniques for MIMO-OFDM do not address the computational issues associated with practical implementation. Thus, the development of efficient algorithms for IFFT-based PAPR reduction in space, frequency, and/or time block coded MIMO-OFDM systems is an important direction for future research.