Suboptimal Greedy Power Allocation Schemes for Discrete Bit Loading

We consider low-cost discrete bit loading based on greedy power allocation (GPA) under the constraints of total transmit power budget, target BER, and maximum permissible QAM modulation order. Compared to the standard GPA, which is optimal in terms of maximising the data throughput, three suboptimal schemes are proposed, which perform GPA on subsets of subchannels only. These subsets are created by considering the minimum SNR boundaries of QAM levels for a given target BER. We demonstrate how these schemes can significantly reduce the computational complexity required for power allocation, particularly in the case of a large number of subchannels. Two of the proposed algorithms achieve near-optimal performance by including a transfer of residual power between subsets, at the expense of a very small extra cost. By simulations, we show that the two near-optimal schemes, while greatly reducing complexity, perform best in two separate and distinct SNR regions.


Introduction
In OFDM, multiplexing over multiple-input multiple-output (MIMO) channels, or general transmultiplexing techniques, a number of independent subcarriers or subchannels arise for transmission, which differ in SNR. Maximising the channel capacity or data throughput under the constraint of limited transmit power leads to the well-known and simple water-filling algorithm [1]. Water-filling is generally followed by bit loading, where $b_i$ bits are allocated to the QAM symbols transmitted over the $i$th subchannel. Achieving an identical target bit error ratio (BER) across all subchannels leads to $b_i \in \mathbb{R}$, which needs to be rounded down to the integer $b_i^{(r)} = \lfloor b_i \rfloor$, thus lowering the overall throughput. Furthermore, unbounded modulation orders $b_i^{(r)} \to \infty$ in the case of infinite SNR are required to efficiently utilise the transmit power but are practically infeasible.
In order to optimise capacity and throughput, a wide range of methods has been suggested in the literature. Pure water-filling-based solutions have been reported in [2][3][4], leading to some of the above stated problems. Reallocation of the excess power when realising the target BER given $b_i^{(r)} \in \mathbb{Z}$ and the SNR in the $i$th subchannel has led to a rate-optimal algorithm known as the greedy algorithm [5,6], of which a number of different variations have emerged constraining either the average BER [7] or the total power [8]. For a good review of greedy algorithms, please refer to [9]. Owing to the iterative nature of these algorithms to optimally achieve their respective objective functions, the computational complexity increases dramatically with the number of subchannels. The situation becomes practically prohibitive for multicarrier systems (such as OFDM) as the number of subcarriers is usually high and can reach, for example, up to $2^{13}$ for digital video broadcasting (DVB) for terrestrial (DVB-T) or handheld (DVB-H) applications [10][11][12].
While achieving rate optimality, the family of greedy algorithms is also known to be greedy in terms of computing requirements. Therefore, reduced complexity schemes are either water-filling-based only [2] or aim at simplifications [13]. In this paper we propose a novel suboptimal greedy algorithm, whereby the power reallocation is performed in subsets of the subchannels. We show that with simple overall power redistribution between groups, two different methods in terms of approximate overall optimisation can be proposed. These suboptimal schemes, while greatly simplifying complexity, hardly sacrifice any performance compared to the full GPA algorithm, provided that the proper algorithmic version is selected for specific SNR regions.
The Scientific World Journal
Different from our previous work in [14], this paper focuses on the simplification achieved by our proposed power allocation scheme compared to the standard greedy approach, and further elaborates on the complexity analysis of both algorithms. Moreover, results for multicarrier systems are included which highlight the significant reduction in complexity gained by our approach. The rest of the paper is organised as follows. In Section 2, the standard greedy approach is first reviewed including the initialisation step of uniform power allocation (UPA). Our proposed reduced-complexity schemes are presented in Section 3, and their computational complexity is analysed and evaluated in Section 4. Simulation results are discussed in Section 5 and conclusions are drawn in Section 6.

The Greedy Approach
In this section, the greedy approach for the power allocation problem to maximise the transmission rate over a multichannel system is introduced.

Constrained Optimisation Problem.
We are interested in the problem of maximising the transmission rate over a multichannel system. This problem could arise from any transmultiplexed communications system, such as narrowband MIMO systems decoupled by an SVD for precoding and equalisation [15]. Given an $N_r \times N_t$ narrowband MIMO system with $N_r$ receive and $N_t$ transmit antennas, the channel can be characterised by a matrix $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$ of complex coefficients $h_{mn}$ which describe the gains between the $n$th transmit and the $m$th receive antennas. The singular value decomposition (SVD) in this case can be used to decouple the system $\mathbf{H}$ into $N = \mathrm{rank}(\mathbf{H}) \le \min(N_r, N_t)$ subchannels whose gains are equal to the singular values $g_i$, $i = 1, \ldots, N$, that are ordered such that $g_i \ge g_{i+1}$ for all $i$. This is likely to result in different SNRs for each subchannel, and if all subchannels are allocated the same number of bits and transmit power, the overall system performance will be dominated by the worst subchannel with gain $g_N$. Another popular multiplex system is either SISO or MIMO OFDM. Without loss of generality, in the following we assume a SISO OFDM system, whereby the ISI channel is characterised by an FIR vector $\mathbf{h} = [h_0 \cdots h_L] \in \mathbb{C}^{L+1}$ of order $L$. If this OFDM system is based on an $N$-point discrete Fourier transform (DFT), then the resulting $N$ subcarriers experience different gains $g_i$, $i = 1, \ldots, N$, that represent the Fourier coefficients of the channel impulse response; that is, $g_i = \sum_{l=0}^{N-1} h_l e^{-j 2\pi i l / N}$ with $h_l = 0$ for $l > L$. The $i$th subcarrier with gain $g_i$ will be used to transmit $b_i$ bits per symbol.
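As a small illustration of how the subcarrier gains of the SISO OFDM case can be obtained, the following sketch evaluates the DFT of a channel impulse response directly. The function name `subcarrier_gains` and the example channels are hypothetical, the gain is taken as the magnitude of the Fourier coefficient, and subcarriers are indexed from 0 rather than from 1.

```python
import cmath


def subcarrier_gains(h, n_sub):
    """Return the magnitudes of the n_sub-point DFT of the impulse response h."""
    gains = []
    for i in range(n_sub):
        # Fourier coefficient of the channel at subcarrier i (taps beyond len(h) are zero)
        coeff = sum(h_l * cmath.exp(-2j * cmath.pi * i * l / n_sub)
                    for l, h_l in enumerate(h))
        gains.append(abs(coeff))
    return gains


# A single-tap channel is flat: every subcarrier sees the same gain.
flat = subcarrier_gains([1.0], 4)
# A two-tap channel [1, 1] has a spectral null at the half-band subcarrier.
notch = subcarrier_gains([1.0, 1.0], 4)
```

The two example channels show why subchannels generally differ in SNR: any channel with more than one tap is frequency selective, so a uniform allocation of bits and power is dominated by the weakest subcarriers.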
In both cases considered previously, $N$ independent subcarriers or subchannels arise, and in the following we use both terms synonymously. To optimise the data throughput across such a system with $N$ independent subchannels, in this paper we consider the maximisation of the sum-rate constrained by the total power budget, the target bit error ratio (BER), and the maximum permissible QAM modulation order. This problem can be formulated as
\[ \max \sum_{i=1}^{N} b_i \tag{1} \]
subject to
\[ \sum_{i=1}^{N} P_i \le P_{\mathrm{budget}}, \quad \mathcal{P}_{b,i} = \mathcal{P}_b^{\mathrm{target}}, \quad b_i \le b^{\max}, \tag{2} \]
where $P_i$ is the amount of power allocated to the $i$th subchannel to achieve a BER $\mathcal{P}_{b,i}$, and $b^{\max}$ is the maximum number of permissible bits allocated to a subchannel. Note that the target BERs are assumed to be equal, that is, $\mathcal{P}_{b,i} = \mathcal{P}_b^{\mathrm{target}}$ in (2) for all subchannels $i = 1, \ldots, N$, and therefore the subscript $i$ will be dropped from the BER notation, that is, $\mathcal{P}_{b,i} = \mathcal{P}_b$.
The channel-to-noise ratio of the $i$th subchannel can be defined as
\[ \mathrm{CNR}_i = \frac{|g_i|^2}{\mathcal{N}_0}, \tag{3} \]
where $\mathcal{N}_0$ is the total noise power at the receiver, whereas the SNR of this subchannel is
\[ \gamma_i = P_i \times \mathrm{CNR}_i. \tag{4} \]
We consider rectangular $M$-QAM modulation of order $M_k$, $1 \le k \le K$, where $M_K$ is the maximum QAM constellation that is permissible by the transmission system, that is, $M_K = 2^{b^{\max}}$. The BER of this modulation scheme is given by (5). Assuming availability of channel state information (CSI) at the transmitter, symbols of $b_k$ bits, $b_k = \log_2 M_k$, can be loaded to a subcarrier with the minimum required SNR $\gamma_k^{\mathrm{QAM}}$ to achieve $\mathcal{P}_b^{\mathrm{target}}$, which is obtained from (5) as (6), where $Q^{-1}$ is the inverse of the well-known function
\[ Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2} \, du. \tag{7} \]
Based on (6), the bit loading problem is solved in two steps, (i) a uniform power allocation (UPA) initialisation step and (ii) the greedy algorithm step, which are both described below.
Initialisation:
(1) Set the power available for GPA to $P^{\mathrm{gpa}} = P^{\mathrm{ex}}$ in (12b).
For each subchannel $i$ do the following:
(2) Set $b_i^{\mathrm{gpa}} = b_i^{u}$ in (10) and index $k_i = k$ in (9).
(3) Calculate the minimum required upgrade power $P_i^{\mathrm{up}} = (\gamma_{k_i+1}^{\mathrm{QAM}} - \gamma_{k_i}^{\mathrm{QAM}}) / \mathrm{CNR}_i$.
Recursion:
while $P^{\mathrm{gpa}} \ge \min_i(P_i^{\mathrm{up}})$ and $\min_i(k_i) < K$, $1 \le i \le N$
Algorithm 1: Full GPA algorithm applied to the initialisation step of the UPA algorithm.
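To make the role of the minimum SNR levels concrete, the sketch below computes one threshold per QAM order by inverting a BER expression. The BER formula used here is an assumption, namely a commonly used approximation for Gray-coded square QAM, and it may differ from the expressions (5) and (6) of this paper; the function names are hypothetical. The inverse Q-function is obtained from the standard normal quantile function.

```python
import math
from statistics import NormalDist


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p):
    """Inverse of Q: since Q(x) = Phi(-x), Q^{-1}(p) = -Phi^{-1}(p)."""
    return -NormalDist().inv_cdf(p)


def qam_ber(snr, bits):
    """ASSUMED BER approximation for 2**bits-QAM at the given linear SNR."""
    m = 2 ** bits
    a = 2.0 * (1.0 - 1.0 / math.sqrt(m))
    symbol_error = 1.0 - (1.0 - a * q_func(math.sqrt(3.0 * snr / (m - 1)))) ** 2
    return symbol_error / bits


def qam_threshold(target_ber, bits):
    """Minimum linear SNR that meets target_ber, found by inverting qam_ber."""
    m = 2 ** bits
    a = 2.0 * (1.0 - 1.0 / math.sqrt(m))
    q_arg = (1.0 - math.sqrt(1.0 - target_ber * bits)) / a
    return (m - 1) / 3.0 * q_inv(q_arg) ** 2


# One threshold per QAM order M_k = 2**k, k = 1..6, for a target BER of 1e-3.
thresholds = [qam_threshold(1e-3, k) for k in range(1, 7)]
```

Under this assumed model the thresholds increase with the QAM order, which is what allows subchannels to be sorted into groups bounded by consecutive levels.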

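Subchannel Grouping and UPA Algorithm.
The uniform power allocation is performed by the following steps.
(2) Equally allocate the power budget $P_{\mathrm{budget}}$ among all subchannels $1 \le i \le N$:
\[ \gamma_i^{u} = \frac{P_{\mathrm{budget}}}{N} \, \mathrm{CNR}_i. \tag{8} \]
(3) Allocate subchannels according to their SNR $\gamma_i^{u}$ to QAM groups $G_k$, $0 \le k \le K$, bounded by QAM levels $\gamma_k^{\mathrm{QAM}}$ and $\gamma_{k+1}^{\mathrm{QAM}}$ with $\gamma_0^{\mathrm{QAM}} = 0$ and $\gamma_{K+1}^{\mathrm{QAM}} = +\infty$ (cf. Figure 1) such that
\[ \gamma_k^{\mathrm{QAM}} \le \gamma_i^{u} < \gamma_{k+1}^{\mathrm{QAM}}, \quad i \in G_k. \tag{9} \]
(4) For each group $G_k$, load subchannels within this group with QAM constellation $M_k$ and compute the group's total allocated bits:
\[ B_k^{u} = \sum_{i \in G_k} b_i^{u} = N_k \log_2 M_k, \tag{10} \]
with $B_0^{u} = 0$. It is clear at this point and from step (3) that subchannels reside in QAM groups whose SNR levels are below their actual SNRs, $\gamma_i^{u} \ge \gamma_k^{\mathrm{QAM}}$, therefore leaving some unused (excess) power:
\[ P_k^{\mathrm{ex}} = \sum_{i \in G_k} \frac{\gamma_i^{u} - \gamma_k^{\mathrm{QAM}}}{\mathrm{CNR}_i}, \tag{11} \]
where $N_k$ is the number of subchannels that occupy the QAM group $G_k$.
(5) Overall, the allocated bits and the used power for the uniform power allocation scheme are therefore
\[ B_u = \sum_{k=1}^{K} B_k^{u}, \tag{12a} \]
\[ P_u^{\mathrm{used}} = P_{\mathrm{budget}} - P^{\mathrm{ex}}, \tag{12b} \]
where $P^{\mathrm{ex}} = \sum_{k=0}^{K} P_k^{\mathrm{ex}}$ is the total excess power that remains unallocated under the UPA scheme.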
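The grouping and excess-power steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `upa` is hypothetical, the threshold values in the example are illustrative numbers rather than levels derived from a BER target, and a subchannel in group $k$ is assumed to carry $k$ bits (that is, $M_k = 2^k$).

```python
from bisect import bisect_right


def upa(cnr, budget, thresholds):
    """Uniform power allocation followed by grouping into QAM groups 0..K.

    thresholds holds the minimum SNRs of QAM levels 1..K in increasing order.
    Returns the group index of every subchannel and the excess power per group.
    """
    n = len(cnr)
    gamma = [0.0] + list(thresholds)      # gamma[0] = 0 bounds group 0 from below
    levels = []
    excess = [0.0] * len(gamma)
    for c in cnr:
        snr = budget / n * c              # SNR under an equal power share
        k = bisect_right(thresholds, snr) # number of thresholds not above the SNR
        levels.append(k)
        excess[k] += (snr - gamma[k]) / c # power not needed to hold level k
    return levels, excess


# Illustrative values (hypothetical): four subchannels, K = 3 QAM levels.
levels, excess = upa([0.5, 2.0, 4.0, 10.0], 4.0, [1.0, 3.0, 7.0])
total_bits = sum(levels)                  # assumes k bits per symbol in group k
used_power = 4.0 - sum(excess)
```

In this example the weakest subchannel falls into group 0 and carries no bits, so its entire power share counts as excess power.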

Full Greedy Power Allocation (GPA) Algorithm.
The second step towards the GPA is described next. Based on the initialisation step described in the UPA, the full GPA algorithm [8] performs an iterative redistribution of the unallocated power of the UPA algorithm, $P^{\mathrm{ex}}$, by applying the algorithmic steps detailed in Algorithm 1. At each iteration, this algorithm tries to increase bit loading by upgrading (to the next higher QAM level) the subchannel with the least power requirements through an exhaustive search by performing step (4) in Algorithm 1 for all subchannels $i$. When either (i) the remaining power cannot support any further upgrades or (ii) all subchannels appear in the highest QAM level $M_K$, the algorithm stops, resulting in the final bit allocation and power usage of the GPA scheme.
Figure 1: Subchannel grouping into $K + 1$ QAM groups based on their SNRs in (8) and step (3) of Section 2.2 for (a) a multicarrier system and (b) an ordered multicarrier system or an SVD-based MIMO system. Both panels show the subchannel SNRs and QAM levels against the subchannel index, with the indices belonging to groups $G_0$, $G_1$, $G_2$, and so on marked.
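The greedy recursion described above can be sketched as below. This is an illustrative implementation under stated assumptions rather than a transcription of Algorithm 1: the function name `full_gpa` and the example values are hypothetical, and each upgrade is taken to move a subchannel exactly one QAM level up.

```python
import math


def full_gpa(levels, cnr, thresholds, power):
    """Greedily spend `power` on the cheapest single-level upgrade at each step.

    levels[i] is the current QAM group of subchannel i (0..K) and thresholds
    holds the minimum SNRs of levels 1..K. Returns the new levels and the
    power that is left over.
    """
    gamma = [0.0] + list(thresholds)
    k_max = len(thresholds)
    levels = list(levels)

    def upgrade_cost(i):
        k = levels[i]
        if k >= k_max:                    # already at the highest QAM level
            return math.inf
        return (gamma[k + 1] - gamma[k]) / cnr[i]

    while True:
        # Exhaustive search over all subchannels: the costly step of the GPA.
        costs = [upgrade_cost(i) for i in range(len(cnr))]
        best = min(range(len(cnr)), key=costs.__getitem__)
        if costs[best] > power:           # also true when every cost is infinite
            break
        power -= costs[best]
        levels[best] += 1
    return levels, power


# Hypothetical starting point: groups 0..3 and 2.05 units of excess power.
new_levels, left_over = full_gpa([0, 1, 2, 3], [0.5, 2.0, 4.0, 10.0],
                                 [1.0, 3.0, 7.0], 2.05)
```

Every pass through the loop recomputes the cost for all subchannels, which is the source of the complexity growth with the number of subchannels.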

Proposed Low-Cost GPA Schemes
Given $B_k^{u}$ as defined in (10) and $P_k^{\mathrm{ex}}$ in (11), three low-cost greedy algorithms are proposed to efficiently utilise the total excess power $P^{\mathrm{ex}}$ of the uniform power allocation in (12b) using the QAM grouping concept. More precisely, GPA is separately accomplished for each QAM group $G_k$, aiming to increase the total bit allocation to this group and therefore the overall allocated bits. Based on the way of utilising $P_k^{\mathrm{ex}}$, we propose three different algorithms, which below are referred to as (i) grouped GPA (g-GPA), (ii) power moving-up GPA (Mu-GPA), and (iii) power moving-down GPA (Md-GPA).
Grouped GPA (g-GPA) Algorithm.
As discussed in Section 2, optimum discrete bit loading constrained by total power and maximum permissible QAM order can be performed by the GPA approach. However, the direct application of the GPA algorithm is computationally very costly because at each iteration an exhaustive sorting of all subchannels is required, as evident from Algorithm 1.
A simplification of the GPA algorithm can be achieved if subchannels are first divided into QAM groups $G_k$, $0 \le k \le K$, according to their SNRs as shown in Figure 1(a). After subchannel ordering or due to the implicit ordering of the singular values in case of SVD-based decoupling of MIMO systems, the grouping as shown in Figure 1(b) arises. The GPA algorithm is therefore independently applied to each group $G_k$, trying to allocate as much of the excess power $P_k^{\mathrm{ex}}$ within this QAM group as possible. This excess power is iteratively allocated to subchannels within this group according to the greedy concept with the aim of upgrading as many subchannels as possible to the next QAM level.
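A minimal sketch of the per-group greedy step is given below, assuming that each subchannel of the group may be upgraded by at most one QAM level. The function name `g_gpa` and the example values are hypothetical. Because all members of a group share the same SNR gap to the next level, the cheapest upgrade always belongs to the remaining subchannel with the highest CNR, so a single sort replaces the repeated search.

```python
def g_gpa(group_cnr, gap, power):
    """Upgrade as many subchannels of one QAM group as the power allows.

    group_cnr lists the CNRs of the subchannels in the group and gap is the
    SNR difference between the next QAM level and the current one. Returns
    the positions of the upgraded subchannels and the left-over power.
    """
    # Highest CNR first, which is the same as lowest upgrade cost first.
    order = sorted(range(len(group_cnr)), key=lambda i: group_cnr[i], reverse=True)
    upgraded = []
    for i in order:
        cost = gap / group_cnr[i]
        if cost > power:      # every remaining subchannel is at least as costly
            break
        power -= cost
        upgraded.append(i)
    return upgraded, power


# Hypothetical group of three subchannels with 1.6 units of excess power.
upgraded, left_over = g_gpa([1.0, 4.0, 2.0], 2.0, 1.6)
```

In the example the two strongest subchannels are upgraded and a small amount of power remains, since it cannot pay for the third upgrade within this group.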
The pseudocode of the g-GPA algorithm for the $k$th QAM group is given in Algorithm 2. Note that, different from the standard GPA, this algorithm permits upgrades to the next QAM level only for a given QAM group, with $P_i^{\mathrm{up}}$ set to $+\infty$ in steps (5) and (6) in Algorithm 2. Therefore, some left-over (LO) power $P_k^{\mathrm{LO}}$ may remain for each QAM group $G_k$, and these contributions add up to a total LO power. For the overall g-GPA scheme, Algorithm 2 has to be executed $K$ times, once for each QAM group from $G_0$ to $G_{K-1}$, which yields the overall bit allocation and power usage of the system.
Algorithm 2: g-GPA algorithm for subchannels in the $k$th QAM group $G_k$.
Figure 2: Flowchart of the Mu-GPA algorithm: for all QAM groups, apply the g-GPA algorithm to the subchannels in group $G_0$ with $P_0^{\mathrm{ex}}$, then apply it to each following group until $k = K$, and finally compute the final left-over power and overall allocated bits using (16) and (17a), respectively.

Power Moving-Up GPA (Mu-GPA) Algorithm.
The g-GPA algorithm leaves the LO power $P_k^{\mathrm{LO}}$ of each group unused. To exploit it, we first propose to move power upwards starting from the lowest QAM group, as outlined in Figure 3(a) and by the flowchart in Figure 2. This modifies the g-GPA algorithm by considering the LO power $P_0^{\mathrm{LO}}$ of the QAM group $G_0$ after running the g-GPA algorithm on that group and assigns this power for redistribution to group $G_1$. Any LO power after running g-GPA on $G_1$ is then passed further upwards to $G_2$ and so forth. At the $k$th algorithmic iteration, the Mu-GPA algorithm is working with $G_k$ and tries to allocate the sum of the excess power missed by the UPA algorithm of that group as well as the LO power of the application of the g-GPA algorithm to the previous group $G_{k-1}$, that is, $P_k^{\mathrm{ex}} + P_{k-1}^{\mathrm{LO}}$ (cf. Figure 3(a)). Finally, the LO power resulting from the QAM group $G_{K-1}$ is added to the excess power of the $K$th QAM group, $P_K^{\mathrm{ex}}$, to end up with a final LO power
\[ P_{\mathrm{Mu}}^{\mathrm{LO}} = P_{K-1}^{\mathrm{LO}} + P_K^{\mathrm{ex}}. \tag{16} \]
The overall number of allocated bits follows from (17a), and the amount of used power for Mu-GPA is $P_{\mathrm{budget}} - P_{\mathrm{Mu}}^{\mathrm{LO}}$.

Power Moving-Down GPA (Md-GPA) Algorithm.
A second algorithm is proposed to exploit the residual power $P_k^{\mathrm{LO}}$ of each QAM group but in a reverse direction compared to the Mu-GPA algorithm of Section 3.2. Starting from the highest-indexed QAM group $G_{K-1}$ downwards to the lowest-indexed QAM group $G_0$, the Md-GPA algorithm, similar to the Mu-GPA algorithm, tries to improve the bit allocation by efficiently utilising $P_k^{\mathrm{LO}}$, $K - 1 \ge k \ge 1$, plus the UPA excess power. These procedures are illustrated in Figure 3(b), which shows the direction of the LO power flow. Proceeding downwards, at the $k$th stage the Md-GPA scheme applies the g-GPA algorithm for the available power that comprises both the excess power missed by the UPA algorithm of the previous QAM group ($G_{k+1}$ in this case) and the LO power of the previous stage, that is, $P_{k+1}^{\mathrm{ex}} + P_{k+1}^{\mathrm{LO}}$. Therefore, the excess power of the QAM group under consideration along with its LO power is not utilised within this group but is transferred to the next working group. This will finally result in a LO power of
\[ P_{\mathrm{Md}}^{\mathrm{LO}} = P_0^{\mathrm{LO}} + P_0^{\mathrm{ex}}. \tag{18} \]
The overall number of allocated bits and the amount of used power for Md-GPA follow analogously to the Mu-GPA case, with the used power given by $P_{\mathrm{budget}} - P_{\mathrm{Md}}^{\mathrm{LO}}$.
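The two directions of power transfer can be contrasted with the following sketch. It is an illustration under stated assumptions: all function names and example values are hypothetical, the per-group step is a compact greedy routine that upgrades each subchannel by at most one level, and every upgrade is assumed to add one bit per symbol (that is, $M_k = 2^k$).

```python
def g_gpa(group_cnr, gap, power):
    """Per-group greedy step: returns (number of upgrades, left-over power)."""
    upgrades = 0
    for c in sorted(group_cnr, reverse=True):   # cheapest upgrades first
        cost = gap / c
        if cost > power:
            break
        power -= cost
        upgrades += 1
    return upgrades, power


def mu_gpa(groups, gaps, p_ex):
    """Move left-over power upwards from G_0 to G_{K-1}; p_ex has K + 1 entries."""
    bits, carry = 0, 0.0
    for k in range(len(groups)):
        gained, carry = g_gpa(groups[k], gaps[k], p_ex[k] + carry)
        bits += gained
    return bits, carry + p_ex[-1]               # excess of G_K is never used


def md_gpa(groups, gaps, p_ex):
    """Move power downwards from G_{K-1} to G_0, starting with the excess of G_K."""
    bits, carry = 0, p_ex[-1]
    for k in reversed(range(len(groups))):
        gained, left = g_gpa(groups[k], gaps[k], carry)
        bits += gained
        carry = p_ex[k] + left                  # own excess goes to the next group
    return bits, carry                          # LO and excess of G_0 stay unused


# Hypothetical example with K = 2: one subchannel in each of G_0 and G_1.
groups, gaps, p_ex = [[0.5], [2.0]], [1.0, 2.0], [1.0, 0.5, 0.6]
mu_bits, mu_lo = mu_gpa(groups, gaps, p_ex)
md_bits, md_lo = md_gpa(groups, gaps, p_ex)
```

In this small example the upward transfer gains one bit because the unused power of $G_0$ pays for an upgrade in $G_1$, whereas the downward transfer gains none; which direction is preferable depends on how the excess power is distributed over the groups.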

Computational Complexity Evaluation
In order to address the significance of the proposed power loading schemes in terms of simplicity compared to the full GPA algorithm, the computational complexity of both g-GPA and GPA algorithms is evaluated. Instead of jointly applying the GPA algorithm across all subchannels, which requires high system complexity especially for large numbers of subchannels, the g-GPA algorithm only addresses a subset of subchannels within a specific QAM group at a time. Beyond the effect of the QAM grouping concept, a further reduction in complexity can be achieved if subchannels are ordered with respect to their gains $\mathrm{CNR}_i$, as found with SVD-based decoupling of MIMO systems. In this case, search step (3) in Algorithm 2 can be replaced by a simple incremental indexing. Referring to Algorithms 1 and 2, the computational complexities of both GPA and g-GPA algorithms are summarised in Table 1, whereby the number of operations (NoO) is computed for each algorithm. We consider the cases where subchannel SNRs are either ordered prior to invoking g-GPA or the ordering is left to any of the g-GPAs. Note that for the GPA algorithm, ordering of subchannels does not lead to any improvement in complexity as search step (4) in the while loop, which represents the bottleneck of the overall computations, has to include all subchannels. This is due to the fact that by relaxing the grouping concept it is possible to find subchannels in lower QAM levels that need less power to upgrade than others in higher QAM levels, whereas in the case of the g-GPA algorithm, initial sorting of subchannels according to their $\mathrm{CNR}_i$ is sufficient to avoid the repetitive search/sorting step (3) of Algorithm 2.
Table 1: Computational analysis for both GPA and g-GPA algorithms.
The quantities $L_1$ and $L_2$ in Table 1 denote, respectively, the averaged number of iterations of the while loops for the GPA algorithm in Algorithm 1 and the g-GPA algorithm in Algorithm 2. Note that it is expected that $L_1 \ge L_2 = \sum_{k=0}^{K-1} L_{2,k}$ as $P^{\mathrm{ex}}$ in (11) collected from all subchannels has to be redistributed by the GPA algorithm, while $P_k^{\mathrm{ex}}$ collected from only subchannels $i \in G_k$ is considered by the g-GPA algorithm.
Obviously, $P_k^{\mathrm{ex}}$ in (11) cannot be easily quantified as it depends on both the operating SNR and $\mathrm{CNR}_i$, which for Rayleigh channels is a chi-squared distributed random variable. Therefore, the complexity of g-GPA is evaluated in a heuristic fashion. In the worst case and by assuming that subchannels are uniformly distributed across all QAM groups, that is, $N_k = N/K$, the complexity of the g-GPA algorithm can be approximated as given in Table 1, which is lower than its GPA counterpart.

Simulation Results and Discussion
Sections 3.2 and 3.3 have shown that both Mu-GPA and Md-GPA algorithms work very similarly in utilising the power $P^{\mathrm{LO}}$ that remains unused after applying the g-GPA algorithm to all groups $G_k$, $0 \le k \le K - 1$. The two algorithms differ in the direction in which $P^{\mathrm{LO}}$ is transferred. Below we compare by simulations the bit allocation performance of the two algorithms with the UPA, GPA, and g-GPA approaches. Two sets of simulations are conducted to explore the achieved data throughput of the considered algorithms for the case of narrowband MIMO and OFDM-multicarrier systems, whereby the latter is characterised by a much higher number of subchannels.

Narrowband MIMO Case.
The proposed loading schemes are first tested on a 4 × 4 narrowband MIMO system to investigate bit loading performance. The entries of the channel matrix $\mathbf{H}$ are drawn from a complex Gaussian distribution with zero mean and unit variance; that is, $h_{mn} \sim \mathcal{CN}(0, 1)$. The subchannels are obtained by means of an SVD, which provides optimal joint linear precoding and equalisation in a number of senses [15] and yields subchannel gains that are equivalent to the 4 singular values of $\mathbf{H}$. Results presented below refer to ensemble averages across $10^4$ different channel realisations for a target BER of $\mathcal{P}_b^{\mathrm{target}} = 10^{-3}$ and various levels of SNRs using QAM modulation schemes $M_k = 2^k$, $k = 1, \ldots, K$, with $K = 6$ being the maximum permissible QAM level with constellation size $M_K = 64$, which is equivalent to encoding 6 bits per data symbol.
The total system throughput is examined and shown in Figure 4 for all proposed algorithms in addition to both UPA and standard GPA algorithms. It is evident that UPA represents an inefficient way of bit loading since the performance is approximately 2 to 5 dB below other algorithms when operating under moderate SNRs between 10 and 30 dB, and provides approximately only half the throughput in the SNR region between 5 and 10 dB.
Of the proposed low-cost greedy algorithms, both Mu-GPA and Md-GPA algorithms outperform the g-GPA, which lacks the refinement stage to allocate residual power across QAM groups. Interestingly, Mu-GPA performs better at low SNR, while Md-GPA performs better at higher SNRs. This can be attributed to the fact that, for low-to-medium SNRs, $P_K^{\mathrm{ex}}$ (which is missed by the Mu-GPA) will be relatively low and can be allocated without violating the constraint on the maximum QAM level $M_K$. In contrast, $P_0^{\mathrm{ex}}$, which is missed by the Md-GPA, is most likely to be high as evident from (11) and Figure 1. For medium-to-high SNRs, $P_K^{\mathrm{ex}} > P_0^{\mathrm{ex}}$ can be expected to be high, and thus Md-GPA is likely to be advantageous in its bit allocation, as the maximum QAM level $M_K$ constraint is beginning to be felt and $P_K^{\mathrm{ex}}$ is fully utilised by the Md-GPA algorithm. Finally, for very high SNRs most subchannels will appear in the highest QAM group $G_K$ as their SNRs, $\gamma_i^{u}$ in (8), exceed the highest QAM level $\gamma_K^{\mathrm{QAM}}$ in (6). As a result, the overall system throughput of all different algorithms reaches its expected maximum of $4 \times b^{\max}$ bits/symbol.
The data throughput performance of the various algorithms can also be confirmed when considering the power utilisation. Figure 5 shows the total transmit power budget and the levels of power allocation that are reached by the different algorithms. For Md-GPA and Mu-GPA it can be noted that, within their respective superiority regions, both are very close to the performance of the standard GPA, which demonstrates the efficient utilisation of the LO power missed by the g-GPA algorithm. Nevertheless, at high SNR, both g-GPA and Mu-GPA algorithms behave like the UPA algorithm due to the increase of $P_K^{\mathrm{ex}}$, which is missed by both of them and therefore deteriorates their performances. Note that the minimum theoretical transmit power that according to (6) is required to load all subchannels with $b^{\max}$, averaged over all $10^4$ channel realisations, corresponds to an approximate SNR of 38.17 dB as shown in Figure 5.

OFDM-Multicarrier Case.
Another simulation set is conducted to examine the performance of our proposed schemes for an OFDM-multicarrier system with a significantly higher number of subchannels than considered in Section 5.1. Here we assume a SISO OFDM system, whereby the ISI channel is characterised by an impulse response vector $\mathbf{h}$ of order $L = 6$ with entries drawn from an independent complex Gaussian process with zero mean and unit variance. Results are obtained for a 32-subcarrier system averaged over $10^4$ channel realisations for a target BER of $\mathcal{P}_b^{\mathrm{target}} = 10^{-3}$ and varying SNR using the same QAM modulation schemes as in Section 5.1.
The total system throughput is shown in Figure 6 for all proposed algorithms in addition to both UPA and standard GPA algorithms. It is clearly shown that both Mu-GPA and Md-GPA algorithms perform very close to the GPA algorithm (with throughput loss ≤ 4 bits) within their favourable SNR regions, which swap approximately at SNR = 25.82 dB. Figure 7 again shows the power usage of all algorithms that is required to reach their respective throughput in Figure 6. Compared to the optimum GPA, the Md-GPA algorithm demonstrates very similar power utilisation, with slightly inferior performance because the final LO power in (18) is not allocated. At higher SNRs, both Mu-GPA and g-GPA algorithms converge to the power usage performance of the UPA algorithm as $P_K^{\mathrm{ex}}$ dominates the other $P_k^{\mathrm{ex}}$, $0 \le k \le K - 1$, and therefore only the Md-GPA algorithm is advantageous in this region. The minimum theoretical transmit power required to load all subcarriers with $b^{\max}$ in this case is shown to be equivalent to an approximate SNR of 41.61 dB.
Table 2: Simulation results for the parametric analysis of the GPA and g-GPA algorithms given in Table 1.

Computational Complexity Results.
In order to evaluate the computational complexity of the proposed scheme compared to the standard GPA algorithm, the number of algorithmic operations presented in the complexity analysis in Section 4 is tested and compared for both g-GPA and GPA algorithms using a 1024-subcarrier system. Table 2 gives the simulation results of the number of operations, averaged over $10^4$ channel instances, for both "no order" and "order" cases of the g-GPA algorithm along with the GPA algorithm at three different SNR values of 15 dB, 25 dB, and 35 dB. Note that $L_2 = \sum_{k=0}^{K-1} L_{2,k}$ is less than $L_1$ for all SNR values, which validates the complexity analysis of Section 4. Furthermore, ordering the subchannels reduces the number of operations of the g-GPA algorithm by almost half, which results in an overall reduction factor compared to the full GPA algorithm of approximately 2, 3, and an order of magnitude for the considered SNR values, respectively (cf. Table 2).
The complexity analysis can also be evaluated by investigating the computation time of both GPA and g-GPA algorithms. Figure 8 shows the computation time against the number of subcarriers for the g-GPA algorithm with both "no order" and "order" cases compared to the GPA algorithm. Two different SNR values of 15 dB and 35 dB, which represent the approximate conditions of mobile and fixed wireless communication, respectively, are considered in this simulation. It is clear that the g-GPA algorithm has a higher computational efficiency, in particular for large values of $N$ and high SNRs, while the effect of subcarrier ordering is also evident as discussed in Section 4. Assuming a close correlation between the number of operations and their computation time, it is noted that at $N = 1024$ subcarriers these results coincide with those of Table 2.
In a statistical fashion, Figure 9 demonstrates the cumulative distribution function (CDF) of the computation time for both algorithms at the same SNR values which reveals the computational efficiency of the proposed g-GPA algorithm and its modified versions of both Mu-GPA and Md-GPA.

Conclusions
Power allocation to achieve maximum data throughput under constraints on the transmit power and the maximum QAM level has been discussed. The optimum solution is provided by the greedy power allocation (GPA) algorithm, which operates across all subchannels but is computationally very expensive. Therefore, in this paper suboptimal low-complexity alternatives have been explored. The common theme amongst the proposed algorithms is to restrict the GPA algorithm to subsets of subchannels, which are grouped according to the QAM levels assigned to them in the uniform power allocation stage. In order to exploit excess (unused) power in each subset, two algorithms were created which carry left-over power forward into the next subset that is optimised by a local greedy algorithm. Two different schemes have been suggested, of which one moves the left-over power upwards from the lowest to the highest subgroup, where in the high SNR case a limitation by the maximum defined QAM level can restrict the performance. A second scheme moves the power from the highest towards the lower subgroups, whereby at low SNR the channel quality in the lowest subgroups may not be such that it can be lifted across the lowest QAM level, and hence no bits may be loaded with the excess power. However, in general both algorithms perform very close to the GPA in their respective domains of preferred operation, thus allowing power to be allocated with a performance close to that of the GPA at a much reduced cost.