We propose a dynamic resource allocation algorithm for device-to-device (D2D) communication underlaying a Long Term Evolution Advanced (LTE-A) network, with reinforcement learning (RL) applied to unlicensed channel allocation. In the considered system, the inband and outband resources are assigned by the LTE evolved NodeB (eNB) to different device pairs to maximize the network utility subject to target signal-to-interference-and-noise ratio (SINR) constraints. Because there is no established control link between the unlicensed and cellular radio interfaces, the eNB cannot acquire any information about the quality and availability of unlicensed channels. As a result, the considered problem becomes a stochastic optimization problem that can be handled by applying learning theory to estimate the random unlicensed channel environment. Consequently, we formulate the outband D2D access as a dynamic single-player game in which the player (the eNB) estimates its possible strategy and expected utility for all of its actions based only on its own local observations, using a joint utility and strategy estimation based reinforcement learning (JUSTE-RL) with regret algorithm. The proposed resource allocation approach demonstrates near-optimal performance after a small number of RL iterations and surpasses other comparable methods in terms of energy efficiency and throughput maximization.

D2D communication is direct communication between users transmitting over the cellular spectrum (inband) or operating on an unlicensed band (outband). The main advantages of inband D2D communication are increased spectrum efficiency and the possibility of quality of service (QoS) provisioning for different cellular/D2D users. The chief obstacles to the implementation of inband D2D access are (i) interference mitigation (between users transmitting over the same frequency bands) and (ii) resource allocation [

The main contributions of this work are as follows. We consider network-controlled D2D communication in which the licensed and unlicensed spectrum resources, user modes, and transmission power levels are allocated to different device pairs by the LTE eNB to maximize the overall network utility. We consider a general network deployment scenario in which the unlicensed band is provided by one or more radio access technologies (RATs) based on orthogonal frequency division multiple access (OFDMA), carrier sense multiple access with collision avoidance (CSMA/CA), frequency-hopping code division multiple access (FH-CDMA), or any other multiple access method. All device pairs are assumed to be equipped with wireless interfaces that allow them to connect to the appropriate RAT and to use CSMA/CA to avoid collisions when operating on the unlicensed band. Hence, each unlicensed channel becomes available to a D2D pair only when it is idle. Unlike many previous works, we jointly solve the problems of inband/outband access, mode selection, and spectrum/power assignment by combining them into one optimization problem, which allows the inband network resources to be allocated and the D2D traffic to be offloaded in the most effective way (in terms of maximizing the overall network utility). Note that the formulated problem can be solved to optimality only if global channel and network knowledge (including precise information on the operating conditions of the licensed and unlicensed channels) is available to the eNB. However, because there is no established control link between the unlicensed and cellular radio interfaces, the eNB cannot obtain any information about the quality and availability of the unlicensed channels. As a result, the considered resource allocation problem becomes a stochastic optimization problem that can be handled by applying learning theory [

Consequently, we formulate the outband D2D access as a dynamic single-player game in which the player (eNB) estimates its possible strategy and expected utility for all of its actions based only on its own local observations using a JUSTE-RL with regret (originally proposed in [

It is worth mentioning that, in wireless communications, RL has been studied in the context of various spectrum access problems. In [

The rest of the paper is organized as follows. A general network model for inband and outband network operation is described in Section

In this paper, the problem of resource allocation for D2D communication is investigated for both the uplink (UL) and downlink (DL) directions. Accordingly, the discussion throughout the rest of the paper applies (unless stated otherwise) to either direction. Consider a basic LTE-A network consisting of one eNB and

In our network, any potential D2D pair can be allocated with cellular or D2D mode (based on the results of resource allocation procedure). Consequently, we define a binary mode allocation variable _{n} is allocated CM at slot

Inband D2D: a D2D pair operates within the licensed LTE spectrum in an underlay to cellular communication.

Outband D2D: a D2D pair transmits over the unlicensed band by exploiting other RATs, such as Wi-Fi Direct [

_{n}operates inband at slot

In LTE/LTE-A, RBs are allocated to cellular users by the eNBs using a standard packet scheduling procedure [

Let us further define a binary RB allocation variable _{n} is allocated with RB_{k} at slot

Let _{n} and PU_{m} operating on RB_{k} (for _{n} operating on RB_{k} and the eNB). In LTE system, the instantaneous values of _{n} operating on RB_{k}, the SINR at slot_{n} at slot_{n} depends on the number of RBs allocated to this device pair and the SINR in each RB. That is, _{n} (in bits per slot or bps) over licensed (inband) spectrum and

We consider_{n} is allocated with the unlicensed channel

Let _{n} operating on unlicensed channel _{n} transmitting over the channel _{n} over unlicensed (outband) spectrum is described by

We define a binary

A D2D-enabled cellular network with three cellular pairs (PU_{2}, PU_{5}, and PU_{7}) and five D2D pairs (PU_{1}, PU_{3}, PU_{4}, PU_{6}, and PU_{8}). Three of the D2D pairs (PU_{1}, PU_{3}, and PU_{4}) use inband access and two D2D pairs (PU_{6}, PU_{8}) are allocated with the unlicensed channels. In this example, different cellular and D2D pairs interfere with each other when transmitting over RB_{2} (where PU_{2} interferes with PU_{1}), RB_{3} (PU_{1} interferes with PU_{5}), RB_{4} (PU_{5} interferes with PU_{3}), RB_{5} (PU_{7} interferes with PU_{3}), and RB_{6} (PU_{4} interferes with PU_{7}).

Ideally, at any slot_{n} (operating either inband or outband). However, when communicating over the unlicensed spectrum, each D2D pair should transmit at a maximal power level to achieve the high SINR regime (and, consequently, service rate) which, in turn, results in increased power consumption of mobile terminals. Therefore, when formulating the utility of each device pair, we should also consider the cost of power consumption, to quantify the trade-off between the achieved rate and power level (as in [_{n} at slot _{n}.
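The trade-off described above, between the achieved rate and the transmit power, can be illustrated with a minimal sketch. The exact utility expression is not reproduced in this text, so the Shannon-rate model, the linear power penalty, and every function and parameter name below are assumptions made purely for illustration.

```python
import math


def sinr(tx_power_w, link_gain, interference_w, noise_w):
    """Linear SINR of one link: received power over interference plus noise."""
    return tx_power_w * link_gain / (interference_w + noise_w)


def shannon_rate_bps(bandwidth_hz, sinr_linear):
    """Shannon rate in bits per second (an assumed rate model)."""
    return bandwidth_hz * math.log2(1.0 + sinr_linear)


def pair_utility(rate_bps, tx_power_w, power_cost):
    """Assumed utility: achieved rate minus a linear cost of transmit power."""
    return rate_bps - power_cost * tx_power_w


def best_power(levels_w, link_gain, interference_w, noise_w,
               bandwidth_hz, power_cost):
    """Return the power level (from a finite list) with the highest utility."""
    def utility(p):
        gamma = sinr(p, link_gain, interference_w, noise_w)
        return pair_utility(shannon_rate_bps(bandwidth_hz, gamma), p, power_cost)
    return max(levels_w, key=utility)
```

With a zero power cost the highest power level always wins, because the rate grows monotonically with power. A sufficiently large cost pushes the choice toward the lowest level, which is the behavior a rate-versus-power utility is meant to capture.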

Using the above definition, we can express our resource allocation problem as follows:_{n}). Note that information on the sets

The main idea behind RL is that the actions (unlicensed channel allocations) leading to the higher network utility at slot

To apply JUSTE-RL with regret to our problem, we represent it as a game with one player (the eNB) having no information about the operating environment. A finite set of the eNB’s actions _{n} and

We also define a mixed-strategy probability ^{−23} J/K is the Boltzmann constant;

Using the above definitions, the dynamics of a JUSTE-RL with regret can be described as [


Consider (

Most MINLP solution techniques involve the construction of the following relaxations of the considered problem: a nonlinear programming (NLP) relaxation (the original problem without integer restrictions) and a mixed-integer linear programming (MILP) relaxation (the original problem with the nonlinearities replaced by supporting hyperplanes). To form the MILP and NLP relaxations to (

In general, all MINLP problems can be solved using either exact techniques (e.g., branch-and-bound [

Particularly, with the input _{1}-norm and_{2}-norm, respectively. The rounding is carried out by solving the problem (


Note that, in general, finding an optimal solution to any joint resource allocation problem with integrality constraints is NP-hard (which has been shown in [

We now discuss the implementation of the proposed algorithms (presented in Section

All users send their SRs to the eNB via dedicated PUCCHs. Note that the SRs may contain some useful control information, such as updated target SINR level

After receiving the SRs from all of the users, the eNB performs resource allocation (by assigning the modes, RBs and unlicensed channels, and power levels to user pairs according to Algorithms

After receiving the SGs, the users start their data transmissions over allocated RBs/unlicensed channels with assigned mode and power levels.

As already mentioned, we deploy CSMA/CA for outband D2D access using the procedure described in IEEE 802.11 [_{n},

It is worth mentioning that JUSTE-RL will eventually reach its equilibrium state. Even after equilibrium has been reached, however, the eNB continues the learning process, because the network environment (channel quality, network traffic, and the number of active users) is likely to change over time, resulting in different optimal mode, RB/unlicensed channel, and power allocations.
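The continued learning loop can be illustrated with a minimal sketch in the spirit of JUSTE-RL with regret. The update rules below follow the common three-step pattern: a utility estimate, a regret estimate, and a strategy update toward a Boltzmann-Gibbs distribution. The step-size exponents, the `temperature` parameter, the binary idle/busy utility, and all names are assumptions for illustration, not the exact dynamics used in this paper.

```python
import math
import random


def boltzmann_gibbs(regrets, temperature):
    """Map positive regrets to a probability vector (softmax form)."""
    positive = [max(r, 0.0) for r in regrets]
    peak = max(positive)  # subtract the peak for numerical stability
    weights = [math.exp((r - peak) / temperature) for r in positive]
    total = sum(weights)
    return [w / total for w in weights]


def juste_rl_step(action, utility, est_utility, regret, strategy, t, temperature):
    """One in-place update after playing `action` and observing `utility`."""
    # Assumed decaying step sizes on three separated time scales.
    a_t = 1.0 / (t + 1) ** 0.6
    b_t = 1.0 / (t + 1) ** 0.7
    g_t = 1.0 / (t + 1) ** 0.8
    # Utility estimate: only the played action is updated.
    est_utility[action] += a_t * (utility - est_utility[action])
    # Regret estimate: how much better each action is believed to be.
    for a in range(len(strategy)):
        regret[a] += b_t * (est_utility[a] - utility - regret[a])
    # Strategy: move toward the Boltzmann-Gibbs response to the regrets.
    target = boltzmann_gibbs(regret, temperature)
    for a in range(len(strategy)):
        strategy[a] += g_t * (target[a] - strategy[a])


def simulate(idle_probs, slots, temperature=0.05, seed=0):
    """Toy environment: utility is 1 if the chosen channel is idle, else 0."""
    rng = random.Random(seed)
    n = len(idle_probs)
    est_utility, regret = [0.0] * n, [0.0] * n
    strategy = [1.0 / n] * n
    for t in range(slots):
        action = rng.choices(range(n), weights=strategy, k=1)[0]
        utility = 1.0 if rng.random() < idle_probs[action] else 0.0
        juste_rl_step(action, utility, est_utility, regret, strategy, t, temperature)
    return strategy, est_utility


if __name__ == "__main__":
    final_strategy, final_estimates = simulate([0.2, 0.9, 0.5], slots=2000)
    print(final_strategy, final_estimates)
```

The player observes only the utility of the action it actually took, which mirrors the eNB having no side information about the unlicensed channels. Because the loop never stops updating, a change in the idle probabilities would gradually shift the estimates and the strategy.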

A simulation model of the network has been implemented on a standard LTE-A platform using the OPNET simulation and development package [^{6} K, and

Simulation parameters of the LTE-A model.

Parameter | Value |
---|---|
Cell radius | 500 m |
Frame structure | Type 2 (time division duplex) |
Slot duration | 1 ms |
TDD configuration | 0 |
eNodeB Tx power | 46 dBm |
UE active/idle Tx power | 23/2 dBm |
Noise power | −174 dBm/Hz |
Path loss, cellular link | |
NLOS path loss, D2D link | |
LOS path loss, D2D link | |
Shadowing st. dev. | 10 dB (cell mode); 12 dB (D2D mode) |
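To relate the tabulated values to the linear quantities used in SINR calculations, the following sketch converts dBm to watts and integrates the noise density over one resource block. The 180 kHz RB bandwidth is the standard LTE value (12 subcarriers of 15 kHz) and is an assumption here, since it is not listed in the table. The function names are hypothetical.

```python
import math

RB_BANDWIDTH_HZ = 180e3  # assumed: standard LTE resource block width


def dbm_to_watts(power_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((power_dbm - 30) / 10)


def noise_power_dbm(density_dbm_per_hz, bandwidth_hz):
    """Total noise power (dBm) of a flat noise density over a bandwidth."""
    return density_dbm_per_hz + 10 * math.log10(bandwidth_hz)


if __name__ == "__main__":
    print(dbm_to_watts(46))  # eNodeB Tx power in watts
    print(dbm_to_watts(23))  # UE active Tx power in watts
    print(noise_power_dbm(-174, RB_BANDWIDTH_HZ))  # noise per RB in dBm
```

Under these assumptions, the 46 dBm eNodeB power corresponds to roughly 39.8 W, the 23 dBm UE power to roughly 0.2 W, and the noise power per resource block to roughly −121.4 dBm.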

In this paper, the evaluation of the proposed approach for inband and outband resource allocation, referred to as JRA (JUSTE-RL based resource allocation), is divided into two parts. In the first part, we analyze the performance of JUSTE-RL with regret for unlicensed channel allocation (Algorithm

First is joint inband/outband resource allocation with

Second is centralized optimal strategy (COS), where the inband and outband network resources are allocated to the users by solving (

Third is social heuristic for multimode D2D communication (SMD) in an LTE-A network proposed in [

Fourth is greedy heuristic for multimode D2D communication (GMD) in LTE-A networks [

Fifth is ranked heuristic for multimode D2D communication (RMD) in LTE-A networks [

We start with the performance evaluation of JUSTE-RL with regret for unlicensed channel allocation (outlined in Algorithm

The average number of RL iterations (slots) necessary for convergence of strategies in JRA with different values of

The average number of RL iterations (slots) necessary for convergence of utilities in JRA with different values of

The absolute error of strategy estimation

The absolute error of utility estimation

In Figures

The instantaneous network utility

The instantaneous network utility

We now evaluate the efficiency of the proposed inband/outband resource allocation (Algorithms

The average number of FP iterations (per slot) necessary for the convergence of the algorithms with fixed

The average solution time (in

The average relative deviation from the optimal solution Δ in different algorithms with fixed

Figures

The average user throughput

The average user transmit power (in dBm) in different algorithms with fixed

The instantaneous network utility

This paper introduces a JRA algorithm for a D2D-enabled LTE-A network with access to an unlicensed band provided by one or more RATs based on different channel access methods (OFDMA, CSMA/CA, FH-CDMA, etc.). In the presented framework, the inband/outband network resources (cellular/D2D modes, spectrum, and power) are allocated jointly by the LTE eNB to maximize the total network utility. Unlike most previously proposed techniques for outband D2D communication (which presume a certain level of coordination and information exchange between licensed and unlicensed systems), our JUSTE-RL based approach to unlicensed channel assignment is fully autonomous and has demonstrated relatively fast (≈300 RL iterations) convergence to

The authors declare that they have no competing interests.