Energy-Efficient Time-Domain Equilibrium Scheduling and Optimization Scheme for Energy Harvesting-Powered D2D Communication

Energy Harvesting- (EH-) powered Device-to-Device (D2D) Communication underlaying Cellular Network (EH-DCCN) has been deemed one of the basic building blocks of the Internet of Things owing to its green energy efficiency and proximity-based communication. However, available energy will be one of the biggest obstacles to implementing EH-DCCN, owing to the immaturity of EH technology and the volatility of environmental energy resources. To improve energy utilization, this study investigates an efficient scheduling and power allocation scheme that balances the transmission load in the time domain. Accordingly, a short-term Sum Energy Efficiency (stSEE) maximization problem for EH-powered D2D communication is modelled while ensuring a fundamental transmission rate requirement for cellular users. The resulting optimization problem is a nonconvex mixed integer nonlinear programming problem. Thus, we propose a two-layer convex approximation iteration algorithm that obtains a feasible quasioptimal solution for the stSEE problem. In addition, a two-step heuristic algorithm operating in a slot-by-slot fashion is developed to acquire a suboptimal solution without requiring statistical knowledge of the channel and energy arrival processes. Simulation results indicate that the short-term scheduling strategy achieves better energy efficiency and transmission rate than the conventional real-time scheduling scheme. Besides, the maximum number of EH-D2D pairs that can be scheduled underlaying one cellular user under different EH efficiencies is analysed, which provides a theoretical reference for the deployment of future EH-DCCN.


1. Introduction
The dawn of the Internet of Things (IoT), deployed for industrial production, smart housing, and environmental monitoring, has given birth to billions of mobile wireless devices [1]. Ericsson predicts that there will be about 15 billion mobile devices in 2021, most of which will be low-power devices with short-range communication [2]. The ever-increasing proliferation of wireless devices, together with an exponential rise in users' data demand, is already creating an urgent need for wireless cellular networks to adopt new technologies that attain the desired transmission rates while also achieving green communications [3,4]. The recent emphasis on green communications has generated great interest in energy harvesting- (EH-) powered wireless networks [5][6][7]. EH techniques harvest energy from environmental energy resources to prolong the lifetime of wireless communication devices that have low power consumption and limited battery life. On the other hand, Device-to-Device (D2D) communication has been viewed as a promising paradigm that can offload traffic from cellular networks by letting nearby devices communicate directly with each other while multiplexing the spectrum resources assigned to cellular users (CUs) [8,9]. Accordingly, EH-powered D2D Communication underlaying Cellular Network (EH-DCCN), which combines short-range transmission with green communication, is an attractive way to maintain good transmission service quality using green energy resources. Nevertheless, this sustainable energy supply introduces an intermittent energy supply issue, which does not exist in conventional D2D communication systems with fixed energy sources. Consequently, how to efficiently utilize and manage unreliable energy so as to satisfy different transmission demands is the most urgent challenge in achieving the green D2D communication paradigm.
1.1. Related Works. Many studies have addressed, from different aspects such as access control and resource management, the challenges that the uncertain energy supply poses for EH-DCCN.
In terms of access control, Darak et al. design an online learning handover scheme between D2D mode and Radio Frequency EH (RFEH) mode based on subband statistics in D2D-RFEH communication [10]. For cognitive and EH-based D2D transmission, Sakr and Hossain propose two spectrum access policies for the cellular network, namely, random and prioritized access policies, and evaluate the transmission and outage probabilities of D2D and cellular users [11]. The authors of [12] develop D2D communication supported by an EH Heterogeneous cellular Network (D2D-EHHN), in which User Equipment Relays (UERs) harvest energy from an access point to support D2D communication; they derive the proper distribution of RFEH-powered UERs and propose an efficient UER selection method.
To adapt to the uncertainty in energy supply, researchers have also designed efficient resource management schemes for EH-DCCN in terms of power allocation, spectrum matching, and time sharing. Tutuncuoglu and Yener study power allocation policies that maximize the sum rate of EH-supported transmitters [13], a setting similar to the one-to-one spectrum sharing model of D2D communication underlaying a cellular network [14]. However, the Quality of Service (QoS) requirement, which mainly refers to the transmission rate demand, is not considered in the proposed policy. Similarly, the sum rate maximization problem for D2D communication in a downlink resource multiplexing system with multiple CUs and EH-powered D2D links is studied in [15]. To ensure energy-efficient spectrum resource assignment, Ding et al. investigate the energy cost minimization problem [16]. Considering power allocation and time-sharing spectrum occupation, Hadzi-Velkov et al. maximize the overall cellular network transmission rate based on the statistical average of the harvested energy [17].
The above works have addressed many challenges caused by the unstable and unreliable power supply of EH from different aspects. Nevertheless, they are mainly based on a one-to-one spectrum sharing mode, in which one CU's radio resource is multiplexed by one D2D pair. In this mode, the CU's spectrum is left vacant whenever the available energy of the EH-powered D2D pair (EH-DP) cannot meet its energy consumption requirement. Hence, given the demand for high spectrum efficiency, a one-to-multiple sharing mode, in which one CU and multiple EH-DPs share one radio spectrum, is needed. In this way, the cellular spectrum gap caused by the energy deficiency of a single EH-DP can be filled. In practice, multiple D2D pairs can be allowed to share the same resource with the cellular users as long as the interference from D2D communications does not harm the cellular links [18][19][20]. However, under the one-to-multiple sharing mode, the transmission requests of the EH-DPs, which depend on their available energy, may be unbalanced across time slots because of variations in EH efficiency, environmental energy resources, and channel status. Since the traffic is delay-tolerant, the mutual interference among users can be effectively reduced by balancing the transmission requests across time slots, which ultimately improves energy efficiency. The following section takes two EH-powered D2D pairs underlaying a cellular network as a simple example to illustrate our main motivations.

1.2. Motivations.
Before describing our motivations, we state some important assumptions. We assume that only the D2D users (DUs) have EH capability [21]. For simplicity, we also suppose that the traffic of each user follows a full-buffer pattern and that all users operate in a synchronous, time-slotted fashion.
When multiple EH-DPs multiplex the same spectrum resource, the transmission requests of the EH-DPs in different time slots may be unbalanced. As illustrated in Figure 1, two EH-DPs, $d_1$ and $d_2$, are permitted to share a radio spectrum with one CU. As Figure 1(a) shows, the available energy of the users differs because of their different channel interference conditions and energy conversion efficiencies. When the available energy of an EH-DP reaches its transmission power threshold $E^{th}_{d_u}$ ($u \in \{1, 2\}$), the EH-DP launches a transmission request. Thus, when $d_1$ and $d_2$ initiate transmission requests in the same time slot, such as time slot 1, the conventional real-time transmission strategy lets them transmit simultaneously, using a power allocation scheme subject to the interference constraint. However, neither of them may be able to multiplex the spectrum resource in the next time slot (e.g., time slot 2) because of limited energy supply or prior energy consumption. In contrast, under the short-term time-domain equilibrium strategy depicted in Figure 1(b), either of the two EH-DPs can be assigned to multiplex the spectrum resource of the CU in time slot 2. As a result, the interference between the two D2D links in time slot 1 is eliminated. Hence, to avoid unnecessary consumption of the harvested energy, the interference among users must be appropriately managed in one-to-multiple sharing scenarios [5].

As mentioned above, by fully considering the available energy and channel status (in this study, channel status mainly refers to the mutual interference among users, including the CU and EH-DPs), the key concern of this study is how to balance the transmission loads among different time slots in an EH-DCCN with the one-to-multiple sharing mode so as to improve its performance. To the best of our knowledge, the considered short-term time-domain energy-efficient equilibrium scheme is the first attempt to do so in EH-DCCN.
1.3. Contributions and Organization. As previously described, this study focuses on designing an energy-efficient transmission scheduling and power allocation scheme to improve the performance of the EH-DCCN with the one-to-multiple radio resource sharing mode. Our main contributions are threefold:
(i) Firstly, this study formulates and optimizes a short-term Sum Energy Efficiency (stSEE) problem for EH-powered D2D communication to realize energy-efficient scheduling. The available energy and transmission rate constraints of both CUs and EH-DPs are considered in the optimization problem.
(ii) Secondly, a two-layer convex approximation iteration algorithm (CAIA), consisting of an outer-layer iteration algorithm (OLIA) and an inner-layer convex approximation (ILCA) algorithm, is proposed to obtain a feasible quasioptimal solution for the stSEE maximization problem, which is a nonconvex mixed integer nonlinear programming (MINLP) problem.
(iii) Thirdly, a two-step heuristic algorithm, the time-division scheduling scheme (TDSS), is developed to acquire a suboptimal solution without requiring statistical knowledge of the channel and EH processes. TDSS not only acquires a suboptimal solution for the stSEE problem but also has lower computational complexity.
The rest of this study is organized as follows. In Section 2, we describe the system model in detail and formulate the stSEE maximization problem. The two algorithms, CAIA and TDSS, are elaborated in Sections 3 and 4, respectively. The numerical simulation results and the computational complexity of the proposed algorithms are presented and analysed in Section 5. Section 6 concludes this study.

2. System Description and Problem Formulation
This section introduces the system model and formulates the considered resource scheduling problem. For ease of understanding, the important notations used in this study are listed in Table 1.

System Model.
In what follows, we assume that spectrum matching has already been completed; that is, multiple EH-DPs have already been allocated to one dedicated CU under some particular optimization criterion, e.g., EE maximization [22]. Thus, as shown in Figure 2, a typical single-cell network consisting of a Base Station (BS) and $K$ EH-DP/CU groups is considered. Suppose that the system uses a certain number of orthogonal spectra; the users can then be divided into the same number of EH-DP/CU groups. Namely, the communication links in the same group transmit on the same spectrum, and links in different groups use orthogonal spectra. Let $B$ represent the BS, $c_k$ ($k \in \{1, 2, \cdots, K\}$) denote the CU in the $k$th EH-DP/CU group, and $d_{k,i}$ be a pair of D2D users in the EH-DP set $\Phi^D_k$ of the $k$th group, with $|\Phi^D_k| = N_k$. In the $k$th group, $N_k$ EH-DPs share the uplink transmission link of CU $c_k$ ($c_k \in \Phi^C$, $|\Phi^C| = K$) to transmit. According to the energy assumption in Section 1.2, the transmitter of each EH-DP is powered by EH and has a battery to store the harvested energy, whereas the power available at the receiver of each EH-DP is regarded as unlimited owing to the low power consumption of the decoding process. Generally speaking, as illustrated by EH-DP/CU group 2 in Figure 2, each EH-DP transmission simultaneously causes interference at the receivers of the cellular link and the other EH-DP links. Likewise, cellular transmission generates interference at the EH-DP receivers. Assume that the entire system operates on a slot-by-slot basis. Accordingly, in any time slot $t$, the instantaneous transmission rates of the cellular and EH-DP links are denoted by $r^t_{c_k}$ and $r^t_{d_{k,i}}$, respectively, where $x^t_{d_{k,i}}$ ($d_{k,i} \in \Phi^D_k$) is the indicator parameter: $x^t_{d_{k,i}} = 1$ indicates that the $d_{k,i}$th D2D pair is chosen to transmit in time slot $t$, and $x^t_{d_{k,i}} = 0$ indicates that it is not.
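To make the interference structure concrete, the following Python sketch computes the per-slot rates of one EH-DP/CU group. Because the explicit rate expressions are not reproduced here, it assumes Shannon-type rates $\log_2(1+\mathrm{SINR})$ in bps/Hz, uplink interference at the BS from every scheduled EH-DP transmitter, and interference at each EH-DP receiver from the CU and the other scheduled pairs. The function name `group_rates` and its argument layout are hypothetical.

```python
import math

def group_rates(p_c, p_d, x, g_cB, g_dB, g_dd, g_cd, n0):
    """Per-slot rates (bps/Hz) of one EH-DP/CU group under an assumed
    Shannon-type SINR model.

    p_c        : transmit power of the CU
    p_d[i]     : transmit power of EH-DP i
    x[i]       : 0/1 scheduling indicator of EH-DP i
    g_cB       : gain from the CU to the BS
    g_dB[i]    : gain from the transmitter of EH-DP i to the BS
    g_dd[j][i] : gain from transmitter j to receiver i (j == i is the D2D link)
    g_cd[i]    : gain from the CU to the receiver of EH-DP i
    n0         : noise power
    """
    n = len(p_d)
    # Uplink: every scheduled D2D transmitter interferes at the BS
    i_bs = sum(x[i] * p_d[i] * g_dB[i] for i in range(n))
    r_c = math.log2(1 + p_c * g_cB / (n0 + i_bs))
    r_d = []
    for i in range(n):
        if not x[i]:
            r_d.append(0.0)  # an unscheduled pair carries no traffic
            continue
        # CU interference plus interference from the other scheduled pairs
        interf = p_c * g_cd[i] + sum(
            x[j] * p_d[j] * g_dd[j][i] for j in range(n) if j != i)
        r_d.append(math.log2(1 + p_d[i] * g_dd[i][i] / (n0 + interf)))
    return r_c, r_d
```

Comparing `x = [1, 1]` with `x = [1, 0]` for the same powers and gains reproduces the effect motivated by Figure 1: deferring one D2D link raises both the CU rate and the remaining pair's rate in that slot.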
$p^t_{c_k}$ and $p^t_{d_{k,i}}$ are the transmission powers of the CU and the EH-DP in time slot $t$, respectively, and $g_{i,j}$ denotes the channel gain between nodes $i$ and $j$. $n_0 = B_W \cdot \beta_n$ is the noise power, where $\beta_n$ is the noise power spectral density and $B_W$ is the uplink channel bandwidth of each group.

Energy Model. As demonstrated in Figure 3, in each time slot $t$, the transmitter of each EH-DP harvests energy from environmental energy resources, stores it in a battery, and uses the available energy for transmission. We study the case in which the energy arrival process at each EH-DP is i.i.d. For the $d_{k,i}$th EH-DP, $\{E^1_{d_{k,i}}, E^2_{d_{k,i}}, \cdots, E^T_{d_{k,i}}\}$ is the time sequence of harvested energy over $T$ time slots and obeys the i.i.d. Bernoulli process (2) characterized by $\lambda_{d_{k,i}}$ and $E$, which are called the EH efficiency of the $d_{k,i}$th EH-DP. Notably, the terms energy and power are used interchangeably in this study, since each time slot has unit duration.
In Figure 3, $E^t_{d_{k,i}}$ units of energy are harvested and added to the battery in each time slot, and $p^t_{d_{k,i}}$ units of energy are consumed for data transmission by the $d_{k,i}$th device. The energy stored in the battery of the $d_{k,i}$th EH-DP in time slot $t$ is denoted by $B^t_{d_{k,i}}$ ($t \in \{1, 2, \cdots, T\}$). Thus, a cumulative power constraint applies: the energy consumed for transmission up to any time slot cannot exceed the energy harvested up to that slot. Suppose that the harvested energy can be stored without any loss and is used only for communication. Meanwhile, the battery capacity is large enough to hold every quantum of harvested energy. This assumption is especially valid for the current state of technology, in which battery capacities are very large compared with the energy harvesting efficiency [23]. Furthermore, assume that all state information, including Channel State Information (CSI) and Energy State Information (ESI), can be obtained by the BS, so that the BS controls transmission scheduling and power allocation [24,25].
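The energy model and the cumulative power constraint can be illustrated with a short simulation. The sketch below assumes that $\lambda_{d_{k,i}}$ is the per-slot probability that $E$ units arrive (one reading of the Bernoulli model), that energy harvested in slot $t$ is usable in the same slot, and that a requested transmission is skipped when the battery cannot cover it. `simulate_battery` is a hypothetical helper, not part of the proposed scheme.

```python
import random

def simulate_battery(lam, E, demand, seed=0):
    """Bernoulli energy arrivals and battery evolution for one EH-DP.

    Assumptions (hypothetical reading of the model): in each slot E units
    arrive with probability lam, the harvested energy is usable in the same
    slot, the battery has no capacity limit or leakage, and a requested
    transmission is skipped when the battery cannot cover it.
    demand[t] is the energy x^t * p^t the pair would like to spend in slot t.
    """
    rng = random.Random(seed)
    battery = 0.0
    levels, spent, harvested = [], [], []
    for want in demand:
        e = E if rng.random() < lam else 0.0   # Bernoulli arrival
        battery += e                            # harvest and store
        harvested.append(e)
        levels.append(battery)                  # B^t before transmission
        use = want if want <= battery else 0.0  # never spend unharvested energy
        battery -= use
        spent.append(use)
    return levels, spent, harvested
```

Because spending is capped by the current battery level, the cumulative energy spent never exceeds the cumulative energy harvested, which is exactly the cumulative power constraint stated above.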

Mathematical Model. The short-term Sum Energy Efficiency (stSEE) maximization problem for EH-powered D2D communication is formulated as problem $P_{EE}$ in (4), where $X = \{x^t_{d_{k,i}}\}$ ($t \in \{1, 2, \cdots, T\}$) is the set of indicator parameters specifying whether the $d_{k,i}$th EH-DP is allowed to transmit in time slot $t$, and $P^C = \{p^t_{c_k}\}$ and $P^D = \{p^t_{d_{k,i}}\}$ are the sets of transmission powers of the CUs and EH-DPs, respectively. To avoid serious mutual interference, constraints (4a) and (4d) limit the maximum transmission power in each time slot at the CU and EH-DP sides, respectively. Since multiple EH-DPs share the CU's uplink channel resource, the model must guarantee the minimum QoS of the CU; hence, (4b) imposes a transmission rate threshold for the CU. Similarly, (4e) requires each EH-DP chosen to transmit in the $t$th time slot to meet a minimum transmission rate. Finally, (4c) is the available energy constraint of each EH-DP.

Problem Decoupling. The maximization problem $P_{EE}$ can be decoupled into $K$ subproblems because the spectra of different groups are orthogonal. Hence, for any EH-DP/CU group $k$ ($k \in \{1, 2, \cdots, K\}$), we have the optimization problem $P^k_{EE}$ (5), which involves only the variables $X_k$, $P^C_k$, and $P^D_k$ of group $k$.

3. Two-Layer Convex Approximation Iteration Algorithm (CAIA)
As described in $P^k_{EE}$, some of the variables (the components of $P^C_k$ and $P^D_k$) are real-valued, whereas the others (the components of $X_k$) are binary. Furthermore, the objective function (5) and constraints (4b) and (4e) depend on $r^t_{c_k}$ and $r^t_{d_{k,i}}$, which are nonconvex (a simple proof is given in Appendix A). Hence, (5) is a nonconvex MINLP problem, which is NP-hard in general. An intuitive argument for NP-hardness is that MINLP includes the integer linear programming (ILP) problem, which has been proved to be NP-hard [26,27]: when the power allocation variables $P^C_k$ and $P^D_k$ are fixed, (5) reduces to an ILP problem. Based on the above discussion, we design a two-layer convex approximation iteration algorithm (CAIA), which contains an outer-layer iteration algorithm (OLIA) and an inner-layer convex approximation (ILCA) algorithm, to obtain a feasible quasioptimal solution. The OLIA first equivalently transforms the fractional programming problem into a parametric subtractive form; then, the ILCA approximately converts the resulting nonconvex MINLP problem into a convex one.

Outer-Layer Iteration Algorithm (OLIA).
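First of all, the objective of $P^k_{EE}$, (5), is a nonlinear fractional program [28], which can be transformed into an equivalent parametric subtractive program by the Dinkelbach method. For ease of description, let $\Omega$ denote the feasible solution set of problem (5), and let $q^*_k$ denote the maximum stSEE of EH-DP communication in the $k$th EH-DP/CU group, i.e., the maximum of the ratio in (5) over $\Omega$. Accordingly, the following theorem holds.
Theorem. The maximum stSEE $q^*_k$ is achieved if and only if the maximum over $\Omega$ of the numerator of (5) minus $q^*_k$ times the denominator of (5) equals zero; the maximizer of this subtractive problem is then an optimal solution of (5).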
Proof. The proof is similar to that in [28]. Hence, the transformed problem (7) can be solved by the iterative process given in Algorithm 1. Define $m$ as the iteration index, $q^m_k$ as the EE of the $k$th EH-DP/CU group obtained in the $m$th iteration, and $\varepsilon$ as the convergence threshold.
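The iterative process of Algorithm 1 follows the classical Dinkelbach scheme. The sketch below applies it to a toy single-link problem, maximizing $\log_2(1+gp)/(p+P_0)$ over $0 \le p \le p_{\max}$, rather than to (5) itself, with the initialization $q = 0$ and $\varepsilon = 10^{-2}$ of Algorithm 1. The toy instance, its parameter values, and the closed-form inner solver are illustrative assumptions.

```python
import math

def dinkelbach(num, den, inner_argmax, q0=0.0, eps=1e-2, max_iter=50):
    """Dinkelbach iteration for max num(x) / den(x) with den(x) > 0.

    inner_argmax(q) must return a maximiser of num(x) - q * den(x)
    over the feasible set (the parametric subproblem).
    Returns (x, q, m): last maximiser, last ratio estimate, iteration count.
    """
    q = q0
    for m in range(1, max_iter + 1):
        x = inner_argmax(q)
        gap = num(x) - q * den(x)   # F(q) >= 0, and F(q*) = 0 at the optimum
        if gap < eps:
            return x, q, m
        q = num(x) / den(x)         # next estimate: ratio at the current maximiser
    return x, q, max_iter

# Toy single-link instance (hypothetical values): rate / (transmit + circuit power)
G, P0, P_MAX = 10.0, 0.1, 1.0

def rate(p):
    return math.log2(1 + G * p)

def power(p):
    return p + P0

def inner_argmax(q):
    # rate(p) - q * power(p) is concave in p: clip its stationary point
    if q <= 0:
        return P_MAX
    p = 1.0 / (q * math.log(2)) - 1.0 / G
    return min(max(p, 0.0), P_MAX)

p_star, q_last, iters = dinkelbach(rate, power, inner_argmax)
```

Because $gP_0 = 1$ in this instance, setting the derivative of the ratio to zero gives $\ln(1+gp) = 1$, i.e., $p^* = (e-1)/g$, so the result can be checked analytically; the iteration stops at $m = 4$.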
Although the problem in Algorithm 1, obtained via Dinkelbach's theorem, is equivalent to problem (5), it is still a nonconvex MINLP formulation. To handle this, we propose an inner-layer convex approximation algorithm that converts it into a convex one.

Inner-Layer Convex Approximation (ILCA).
For convenience, let $q_k$ represent $q^m_k$ in each iteration of OLIA. ILCA performs the following three steps to convert the nonconvex MINLP problem in Algorithm 1 into a convex one. In the first step, $x^t_{d_{k,i}}$ is relaxed to the continuous interval $[0, 1]$, and the auxiliary variable $S^t_{d_{k,i}} = x^t_{d_{k,i}} \cdot p^t_{d_{k,i}}$ is introduced; substituting it into $r^t_{c_k}$ and $r^t_{d_{k,i}}$ gives the equivalent rate functions $R^t_{c_k}$ and $R^t_{d_{k,i}}$ in (9) and (10), which depend on $p^t_{c_k}$ and $S^t_{d_{k,i}}$. With this substitution, the problem in Algorithm 1 can be solved equivalently over the variables $X_k$, $P^C_k$, and $S^D_k$.
In the second step, ILCA adopts the same convex approximation as [29] to obtain a lower bound on the original transmission rate, as expressed by inequality (11):
$$\alpha \log n + \beta \le \log(1 + n),$$
where $n$ denotes the SINR. The bound is tight at $n = \bar{n}$ when $\alpha = \bar{n}/(1 + \bar{n})$ and $\beta = \log(1 + \bar{n}) - \alpha \log \bar{n}$ [29]; it has low complexity and is especially accurate in the high-SINR regime (i.e., $n \gg 1$). To tighten the lower bound, an iterative procedure (steps 2 to 6 on p. 3751 of [29]) repeatedly recomputes $\alpha$ and $\beta$ at the current SINR until the approximation converges.
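The bound in (11) can be checked numerically. With $u = \log n$, the function $\log(1+e^u)$ is convex in $u$, and $\alpha u + \beta$ with the coefficients above is its tangent at $u = \log \bar{n}$, so it never exceeds $\log(1+n)$ and touches it at $n = \bar{n}$. The helper names below are hypothetical, and natural logarithms are assumed.

```python
import math

def scale_coeffs(n_bar):
    """Coefficients making alpha*log(n) + beta touch log(1 + n) at n = n_bar."""
    alpha = n_bar / (1 + n_bar)
    beta = math.log(1 + n_bar) - alpha * math.log(n_bar)
    return alpha, beta

def rate_lower_bound(n, alpha, beta):
    """Left-hand side of the bound: alpha*log(n) + beta <= log(1 + n)."""
    return alpha * math.log(n) + beta

# Tightening step: re-anchor the bound at the SINR of the current iterate
sinr = 8.0
alpha, beta = scale_coeffs(1.0)          # initial, deliberately loose anchor
gap_before = math.log1p(sinr) - rate_lower_bound(sinr, alpha, beta)
alpha, beta = scale_coeffs(sinr)         # update alpha, beta at the current SINR
gap_after = math.log1p(sinr) - rate_lower_bound(sinr, alpha, beta)
```

In the full procedure the SINR used for re-anchoring comes from solving the convex problem with the previous $\alpha$ and $\beta$; the sketch isolates only the update of the coefficients.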
In the third step, the variables are transformed by $p^t_{c_k} = e^{\tilde{p}^t_{c_k}}$ and $S^t_{d_{k,i}} = e^{\tilde{S}^t_{d_{k,i}}}$. Consequently, after these three steps, the problem in Algorithm 1 can be approximately transformed into the convex optimization problem (12), in which, for example, the CU power constraint becomes $\tilde{p}^t_{c_k} \le \log p^{\max}$, and $\tilde{R}^t_{c_k}$ and $\tilde{R}^t_{d_{k,i}}$, given by (13) and (14), are the approximations of $R^t_{c_k}$ and $R^t_{d_{k,i}}$ after the second and third steps; $\alpha^t_{d_{k,i}}$ and $\beta^t_{d_{k,i}}$ are updated in the same way as $\alpha^t_{c_k}$ and $\beta^t_{c_k}$. Because log-sum-exp is convex [29,30], problem (12) can easily be shown to be convex, so it can be solved efficiently with standard convex optimization algorithms. Once the solution of problem (12) is obtained, the variables of the original problem (5) are recovered through $p^t_{c_k} = e^{\tilde{p}^t_{c_k}}$ and $S^t_{d_{k,i}} = e^{\tilde{S}^t_{d_{k,i}}}$, with $x^t_{d_{k,i}} = 1$ when $p^t_{d_{k,i}}$ is greater than zero and $x^t_{d_{k,i}} = 0$ otherwise.

Even though CAIA can obtain a quasioptimal solution for the original problem, two key obstacles hinder its practical implementation. Firstly, its iterative computation is hard to fit into LTE (Long-Term Evolution) systems, which require scheduling periods on the order of milliseconds [31]. Secondly, the complete CSI and ESI over a period of time are hard to obtain in practice. Thus, we propose a heuristic algorithm, named the time-division scheduling scheme (TDSS), to obtain a suboptimal solution with low computational complexity.
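The convexity argument of the third ILCA step can be illustrated numerically. After the change of variables $y_j = \ln p_j$, the log-SINR of a link is its own log-power plus a constant minus a log-sum-exp of affine terms, so the approximated rate $\alpha \ln \mathrm{SINR} + \beta$ is concave for $\alpha > 0$. The sketch below, with a hypothetical gain layout and helper names, evaluates that expression and checks midpoint concavity on random points; it is a sanity check, not a proof.

```python
import math
import random

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def approx_rate_log(y, i, alpha, beta, gains, n0):
    """Approximated rate of link i as a function of log-powers y (y[j] = ln p_j).

    gains[j][i] is the gain from transmitter j to receiver i (hypothetical).
    ln SINR_i = y_i + ln g_ii - logsumexp(ln n0, y_j + ln g_ji for j != i),
    so alpha * ln SINR_i + beta is concave in y whenever alpha > 0.
    """
    signal = y[i] + math.log(gains[i][i])
    terms = [math.log(n0)] + [y[j] + math.log(gains[j][i])
                              for j in range(len(y)) if j != i]
    return alpha * (signal - logsumexp(terms)) + beta
```

Since the same structure appears in both the CU and the D2D rate approximations, this concavity is what allows problem (12) to be handed to a standard convex solver.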

4. The Time-Division Scheduling Scheme (TDSS)
The complexity of CAIA and the difficulty of obtaining the complete ESI and CSI motivate us to design a heuristic algorithm for the stSEE problem. Although the ESI of all D2D pairs and the CSI of all involved communication links over a period of time are hard to obtain, the near-term CSI and ESI (e.g., for the next time slot) can be acquired through prediction algorithms. For example, the behavior of some environmental sources (e.g., solar and wind) can be predicted from their expected availability at a given time within some error margin [32]. Similarly, channel prediction is feasible and accurate if predictions are made much more frequently than the channel changes [33]. The design of such prediction algorithms is beyond the scope of this study.
With the CSI and ESI of the next time slot, we can balance the transmission requirements between two adjacent time slots and hence improve network performance. Accordingly, a heuristic algorithm called TDSS is proposed. TDSS decouples the stSEE problem into two steps: the D2D Pairs Choosing Strategy (DPCS) and the Power Allocation Strategy (PAS). As the pseudocode of Algorithm 2 shows, DPCS first determines which EH-DPs in $\Phi^D_k$ multiplex the channel resource in each time slot for the $k$th EH-DP/CU group, $k \in \{1, 2, \cdots, K\}$. Then, PAS allocates the corresponding power to the CU and the chosen EH-DPs. In other words, in each time slot, the two-step scheme first determines the indicator $x^t_{d_{k,i}}$ for the EH-DPs in the $k$th group and then allocates $p^t_{c_k}$ and $p^t_{d_{k,i}}$ to the CU and the chosen EH-DPs. The detailed steps of DPCS and PAS are described in the following two subsections.

D2D Pairs Choosing Strategy (DPCS).
DPCS, the first procedure of TDSS, aims to choose the proper EH-DPs to multiplex the channel resource of the CU in each time slot so as to balance the load. Load balancing schedules the transmission requirements between two adjacent time slots so as to decrease the interference among EH-DPs.
The design of DPCS is inspired by a basic characteristic of the optimization problem that arises from the transmission rate constraints of the CU and EH-DPs: each EH-DP chosen in a time slot must have at least a basic amount of available energy $p^{th}_{d_{k,i}}$; otherwise, it cannot be a candidate to multiplex the channel resource. The threshold $p^{th}_{d_{k,i}}$ is given by Corollary 4 and derived in Appendix B.

Corollary 4.
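To ensure that PAS has a feasible solution set, the energy available in the battery of a chosen EH-DP $d_{k,i}$ must be no less than a minimum value, i.e., $B^t_{d_{k,i}} \ge p^{th}_{d_{k,i}}$, where $p^{th}_{d_{k,i}}$ is expressed through an auxiliary quantity $M_{d_{k,i}}$ defined in terms of the rate threshold $r^{th}$.
Proof. The proof of this corollary is provided in Appendix B.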
Therefore, based on the minimum energy threshold $p^{th}_{d_{k,i}}$, the proposed DPCS determines the candidate EH-DPs of each EH-DP/CU group in three key steps, as illustrated in Figure 4 and Algorithm 3.
Step 1. Using the current available energy $B^t_{d_{k,i}}$ and the minimum energy consumption threshold $p^{th}_{d_{k,i}}$, we pick out $m^t_k$ candidate EH-DPs from the set $\Phi^D_k$ in time slot $t$ for the $k$th EH-DP/CU group, namely, those with $B^t_{d_{k,i}} \ge p^{th}_{d_{k,i}}$. As Figure 4 illustrates, the EH-DPs shown in red are the candidates that meet this basic energy demand.

Algorithm 1: Outer-layer iteration algorithm (OLIA).
Initialization: $m = 1$, $q^m_k = 0$, $\varepsilon = 10^{-2}$.
Step 1: For a given $q^m_k$, obtain $X'_k$, $P^{C\prime}_k$, and $P^{D\prime}_k$ by solving the corresponding parametric optimization problem.
Step 2. If the $m^t_k$ candidate EH-DPs multiplex the channel resource in the $t$th time slot, each of them will consume at least $p^{th}_{d_{k,i}}$ units of power. Applying the power update rule to all EH-DPs in $\Phi^D_k$ then gives the number of candidates $m^{t+1}_k$ in the next time slot.

Step 3. The numbers of candidate D2D pairs in the two adjacent time slots can be balanced to decrease the interference among D2D pairs. For example, in Figure 4, there are 3 EH-DP candidates in the current time slot and 1 in the next, so the number of transmitting D2D pairs in each of the two adjacent time slots can be evenly set to 2. In other words, the load balance procedure is executed if the number of candidate D2D pairs in the current time slot exceeds that in the next time slot by 2 or more. Notably, load can only be rescheduled when the service requirement of the current time slot is larger than that of the next one, owing to the store-and-use characteristic of energy harvesting [34]. After that, in time slot $t$, DPCS chooses the average assignment number $\lceil (m^t_k + m^{t+1}_k)/2 \rceil$ of D2D candidates to multiplex the CU channel, following the principles of lower interference and larger transmission rate.

Algorithm 3 (DPCS), excerpt: ... for all EH-DPs in $\Phi^D_k$. Step 3, load balance procedure: since energy cannot be used in advance, the transmission load between two adjacent time slots is balanced only when $m^t_k - m^{t+1}_k \ge 2$, in which case the currently allowed transmission load $c^t$ equals $\lceil (m^t_k + m^{t+1}_k)/2 \rceil$. After that, pick out $m^t_k$ EH-DPs from the current candidate list to multiplex the current channel resource of the $c_k$th CU.
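After DPCS, $m^t_k$ EH-DPs are selected from $\Phi^D_k$ to multiplex the channel resource of CU $c_k$ in the $k$th EH-DP/CU group. Therefore, $x^t_{d_{k,i}}$ is set to 1 for $d_{k,i} \in \Phi^{D_s}_k$, where $\Phi^{D_s}_k$ is the set of the selected EH-DPs in $\Phi^D_k$; in the same way, $x^t_{d_{k,i}}$ is set to 0 for $d_{k,i} \in \Phi^D_k \setminus \Phi^{D_s}_k$.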
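The three DPCS steps can be sketched for one group and one slot as follows. The sketch assumes that Step 2 estimates the next-slot candidates by charging every current candidate $p^{th}_{d_{k,i}}$ and adding a predicted harvest, and it replaces the "lower interference and larger transmission rate" ranking with a hypothetical per-pair score. `dpcs` and `predict_next_candidates` are illustrative names, not the authors' code.

```python
import math

def dpcs(battery, p_th, score, m_next):
    """Sketch of DPCS for one EH-DP/CU group in slot t.

    battery[i] : available energy B^t of EH-DP i
    p_th[i]    : minimum energy threshold of EH-DP i (Corollary 4)
    score[i]   : hypothetical ranking key (larger = lower interference /
                 higher rate)
    m_next     : number of candidates expected in slot t+1 (Step 2)
    Returns the sorted indices of the EH-DPs scheduled in slot t (x^t = 1).
    """
    # Step 1: candidates are the EH-DPs whose battery covers the threshold
    cand = [i for i in range(len(battery)) if battery[i] >= p_th[i]]
    m_t = len(cand)
    # Step 3: energy cannot be used in advance, so load only moves from t to t+1
    if m_t - m_next >= 2:
        m_t = math.ceil((m_t + m_next) / 2)
    best = sorted(cand, key=lambda i: score[i], reverse=True)[:m_t]
    return sorted(best)

def predict_next_candidates(battery, p_th, cand, e_next):
    """Step 2: charge every current candidate p_th, add the predicted
    harvest e_next, and count the EH-DPs that still meet the threshold."""
    nxt = [battery[i] - (p_th[i] if i in cand else 0.0) + e_next[i]
           for i in range(len(battery))]
    return sum(1 for i in range(len(nxt)) if nxt[i] >= p_th[i])
```

With three current candidates and one predicted for the next slot (the Figure 4 situation), the sketch schedules two pairs now; the deferred pair keeps its energy and becomes the second transmitter in slot $t+1$.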

Power Allocation Strategy (PAS).
After the binary indicator variables $x^t_{d_{k,i}}$ are determined, power should be allocated optimally to each transmitting node (the CU and the chosen EH-DPs) by maximizing the EE of the D2D communication while guaranteeing the CU's transmission service quality. Thus, the optimization problem $P^k_{EE}$ becomes the EE maximization problem s_P1 (17), where $R^t_{c_k}$ and $R^t_{d_{k,i}}$ are the transmission rate functions of $p^t_{c_k}$ and $p^t_{d_{k,i}}$, respectively, given by (9) and (10). Note that only the EH-DPs in $\Phi^{D_s}_k$ are involved at this point. Constraint (17c) limits each chosen D2D pair to the energy available in its battery, and (17d) and (17e) are the maximum transmission power and minimum transmission rate constraints of the chosen EH-DPs, respectively. The corresponding constraints for the CU are (17a) and (17b).
Problem s_P1 is a nonconvex fractional program, because the numerator of its objective and the constraints (17b) and (17e) are nonconvex, which makes it difficult to solve directly. However, the same convex approximation approach used in CAIA yields a tight lower-bound approximation of the nonconcave numerator of (17): applying inequality (11), with the quantities given in (19) and (20) and the approximated rates $\tilde{R}^t_{c_k}$ and $\tilde{R}^t_{d_{k,i}}$ of (13) and (14), produces the approximated problem fs_P1 with constraints (18a) to (18e). Since log-sum-exp is convex, the approximated numerator is concave in the transformed variables, so fs_P1 is a typical fractional optimization problem that can be solved by the Dinkelbach algorithm [35]. The main power allocation flow of PAS is summarized in Algorithm 4.
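For small groups, a brute-force search gives a useful reference point for PAS. The sketch below is a swapped-in technique, not the convex-approximation and Dinkelbach procedure of Algorithm 4: it grid-searches the CU power and the powers of the chosen EH-DPs, discards points that violate the rate floors (17b) and (17e), and maximizes an assumed EE, namely the sum D2D rate over the sum D2D transmit power, under an assumed Shannon-type rate model. `pas_grid` and its arguments are hypothetical.

```python
import itertools
import math

def pas_grid(g_cB, g_dB, g_dd, g_cd, n0, p_c_max, p_d_cap,
             r_c_th, r_d_th, levels=20):
    """Brute-force reference for the PAS subproblem of one group and slot.

    p_d_cap[i] = min(maximum power of EH-DP i, energy in its battery),
    which covers the power/energy caps (17a), (17c), (17d).
    Returns (best EE, (p_c, p_d tuple)) or (0.0, None) if nothing is feasible.
    """
    n = len(p_d_cap)

    def grid(hi):
        return [hi * k / levels for k in range(1, levels + 1)]

    best_ee, best_p = 0.0, None
    for p_c in grid(p_c_max):
        for p_d in itertools.product(*(grid(c) for c in p_d_cap)):
            i_bs = sum(p_d[i] * g_dB[i] for i in range(n))
            if math.log2(1 + p_c * g_cB / (n0 + i_bs)) < r_c_th:
                continue                                   # CU rate floor
            r_d = []
            for i in range(n):
                interf = p_c * g_cd[i] + sum(
                    p_d[j] * g_dd[j][i] for j in range(n) if j != i)
                r_d.append(math.log2(1 + p_d[i] * g_dd[i][i] / (n0 + interf)))
            if min(r_d) < r_d_th:
                continue                                   # D2D rate floors
            ee = sum(r_d) / sum(p_d)
            if ee > best_ee:
                best_ee, best_p = ee, (p_c, p_d)
    return best_ee, best_p
```

Its cost grows as `levels ** (len(p_d_cap) + 1)`, which is why such enumeration is only practical as a check on small instances and why PAS relies on the convex approximation instead.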

5. Simulation Results
In this section, we verify the effectiveness of the proposed algorithms and study the impact of EH efficiency on system performance. We present numerical results evaluating the proposed algorithm (CAIA) and the suboptimal heuristic algorithm (TDSS) in terms of the average (avg.) achievable EE and transmission rate of the EH-DPs. Furthermore, we compare the proposed algorithms with the real-time transmission strategy (RTS), the Exhaustive Searching Scheme (ESS), and the Q-Learning Approach (QLA).

Journal of Sensors
In each time slot, once an EH-DP has enough energy to meet its transmission power demand, RTS lets it transmit directly, executing the power allocation algorithm for the CU and the chosen EH-DPs. ESS enumerates all possible solutions over the short-term time horizon and thus attains the optimal solution. QLA, a well-known reinforcement learning method, is widely used to optimize long-term or short-term utilities [36,37]. To better assess the effectiveness of the proposed algorithms, we implement QLA in a centralized manner.

Simulation Setup.
The performance of the proposed algorithms and the compared methods is evaluated via simulations. The considered cellular network, with a radius of 800 meters, is demonstrated in Figure 1. The central controller, the BS, can acquire the positions of all users and is located at the centre of the cell. There are $K = 20$ EH-DP/CU groups. In the $k$th group ($k \in \{1, 2, \cdots, K\}$), $N_k$ EH-DPs multiplex the uplink channel resource of CU $c_k$, where $N_k$ is randomly selected from 2 to 8. All users are randomly deployed in the cell, and the distance between the transmitter and receiver of each EH-DP is randomly selected in [20, 50] meters [21]. To avoid serious mutual interference, a minimum distance of 200 meters is maintained between the CU and the EH-DPs [38,39]. Similarly, the distance between EH-DPs in each group must be larger than 100 meters. The energy arrival process of every EH-DP is assumed to be an i.i.d. Bernoulli process conforming to formula (2). The other network parameters are listed in Table 2.
Each simulation scenario, defined by the energy arrival probability of the EH-DPs (e.g., $\delta = 0.3$), is repeated 100 times, and the results are averaged.
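The deployment described above can be reproduced with rejection sampling. In the sketch below, the BS is at the origin, users are uniform in the 800 m disc, the 200 m CU separation is applied to both ends of every EH-DP, and the 100 m spacing is applied between EH-DP transmitters; these interpretations, the function `deploy`, and its parameters are assumptions, and the remaining parameters of Table 2 are not modelled.

```python
import math
import random

def deploy(K=20, radius=800.0, n_range=(2, 8), d2d_len=(20.0, 50.0),
           cu_gap=200.0, dp_gap=100.0, seed=0, max_tries=10000):
    """Rejection-sampling sketch of the simulated layout (BS at the origin)."""
    rng = random.Random(seed)

    def point():
        # uniform point in the disc (sqrt keeps the density uniform in area)
        r = radius * math.sqrt(rng.random())
        a = 2 * math.pi * rng.random()
        return (r * math.cos(a), r * math.sin(a))

    groups = []
    for _ in range(K):
        cu = point()
        n_k = rng.randint(*n_range)          # inclusive on both ends
        pairs, tries = [], 0
        while len(pairs) < n_k and tries < max_tries:
            tries += 1
            tx = point()
            d = rng.uniform(*d2d_len)
            a = 2 * math.pi * rng.random()
            rx = (tx[0] + d * math.cos(a), tx[1] + d * math.sin(a))
            if math.hypot(*rx) > radius:
                continue                     # receiver must stay in the cell
            if min(math.dist(cu, tx), math.dist(cu, rx)) < cu_gap:
                continue                     # CU / EH-DP separation
            if any(math.dist(tx, other_tx) < dp_gap for other_tx, _ in pairs):
                continue                     # EH-DP spacing within the group
            pairs.append((tx, rx))
        groups.append({"cu": cu, "pairs": pairs})
    return groups
```

Each returned group holds one CU position and its list of (transmitter, receiver) positions, which could then be fed to a path-loss model to obtain the gains $g_{i,j}$.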

Complexity Comparison.
Computational complexity is an important aspect in assessing the above algorithms. Two points should be noted first.
(i) The procedures of CAIA, TDSS, and RTS involve convex optimization of nonlinear programs, so we use the $\varepsilon$-approximate solution to measure their computational complexity; that is, the complexity of these algorithms is the number of iterations needed for the solution to meet the $\varepsilon$ condition, as in Algorithms 1 and 4.
(ii) For a fair comparison, we calculate the worst-case computational complexity of all algorithms.
It is hard to give a thorough and exact complexity analysis of convex nonlinear programming problems. Generally speaking, however, the complexity is related to the space required to store the input data and to the running time of the algorithm until a solution is found [40]. Besides, Vidal et al. [41] showed that the complexity of an $\varepsilon$-approximate solution for a continuous convex problem is $O(n \log m \log(nB/\varepsilon))$, where $n$ is the number of variables, $m$ is the number of constraints, and $B$ is the constraint bound. Moreover, the complexity of QLA is related to the size of the state-action space $|S \times A|$ [42]. The complexities of the above algorithms are therefore summarized in Table 3.

Algorithm 4: Power allocation strategy (PAS).
Input: $\{x^t_i \mid i \in \Phi^D_k\}$ for the $k$th group in time slot $t$.
Output: $\{p^t_i \mid i \in \{c_k\} \cup \Phi^D_k\}$.
Initialization: $m = 0$, $\lambda^m_k = 0$, $\varepsilon = 10^{-2}$; set fs_P1 as the power allocation optimization target.
... with constraints (18a) to (18e).
In Table 3, $D$ denotes the number of D2D pairs, i.e., $D = N_k = |\Phi_k^D|$; $B$ represents the maximal resource threshold; and $\gamma$ denotes the discount factor of QLA. The simulation time is the average over about 100 executions of each algorithm in a scenario with $N_k = 5$ and $\delta = 0.3$. Table 3 shows that ESS has the highest complexity, followed by QLA (slightly lower than ESS) and then CAIA (lower than QLA), while TDSS and RTS have similar complexity, the lowest of all.
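To see how the ε-approximate bound $O(n \log m \log(nB/\varepsilon))$ of Vidal et al. [41] scales, it can be evaluated directly. This sketch drops all constants, so it only compares orders of magnitude; the function name and the sample values of n, m and B are hypothetical and are not the entries of Table 3.

```python
import math


def eps_approx_ops(n, m, bound, eps):
    """Order-of-magnitude cost n * log(m) * log(n * bound / eps) of an
    epsilon-approximate convex solve; constants are dropped."""
    if m < 2 or n * bound / eps <= 1:
        raise ValueError("bound is only meaningful for m >= 2 and n*B/eps > 1")
    return n * math.log(m) * math.log(n * bound / eps)


# Hypothetical problem size: 6 variables, 10 constraints, unit bound.
coarse = eps_approx_ops(6, 10, 1.0, 1e-2)
fine = eps_approx_ops(6, 10, 1.0, 1e-4)
print(round(coarse, 1), round(fine, 1))
```

Tightening ε by a factor of 100 only adds $n \log m \log 100$ to the bound, so the dependence on accuracy is logarithmic, whereas the dependence on the number of variables $n$ is roughly linear.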

Performance Results and Analysis
5.3.1. The EE Performance Comparison. Figure 5 shows the simulated average achievable EE of EH-DPs; the QLA results shown are its steady-state performance. The results support three conclusions:
(i) Compared with RTS, both CAIA and TDSS achieve higher EE regardless of the number of EH-DPs, which shows that the proposed short-term energy-efficient transmission scheduling strategy is better suited to EH-DCCN. Because CAIA relies on convex approximations, it yields a lower bound on the stSEE objective, and the comparison with the optimal algorithm ESS shows that this bound is tight. The heuristic algorithm TDSS obtains a suboptimal solution to the stSEE problem with the lowest computational complexity. Through extensive trial-and-error learning, QLA achieves EE performance similar to that of ESS. However, computational complexity is the main obstacle to implementing ESS and QLA.
(ii) As the harvested energy or the number of EH-DPs increases, the EE gain of the short-term scheduling scheme over real-time transmission gradually shrinks. For instance, in Figure 5(b), the average EE of CAIA exceeds that of RTS by about 35% with 5 EH-DPs, but the gain falls to about 13% with 8 EH-DPs. This is because, as the energy arrival probability or the number of EH-DPs increases, fewer time slots remain available for scheduling.
(iii) Additionally, for δ = 0.7 in Figure 5(b), the average EE declines slowly once the number of EH-DPs reaches 5. More EH-DPs generate more transmission requests, so more energy must be consumed to overcome the mutual interference while still satisfying the target transmission rate, and the average EE therefore declines. This means that the maximum number of EH-DPs that can be usefully scheduled under an energy arrival probability of 0.7 is 5. In the same way, the maximum sustainable number of EH-DPs can be obtained for other EH probabilities, which provides a theoretical deployment reference for future EH-DCCN.
[Figure 5, y-axis: Avg. achievable EE of EH-DP (bps/Hz/Watt)]
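The two quantities used in (ii) and (iii), namely the relative EE gain of one scheme over a baseline and the number of EH-DPs at which the average EE curve peaks, can be computed with two small helpers. This is a hypothetical sketch; the toy values in the usage lines are made up and are not read from Figure 5.

```python
def relative_gain(ee_proposed, ee_baseline):
    """Relative EE improvement of a scheme over a baseline (e.g. CAIA over RTS)."""
    if ee_baseline <= 0:
        raise ValueError("baseline EE must be positive")
    return (ee_proposed - ee_baseline) / ee_baseline


def max_sustainable_dps(n_values, avg_ee):
    """Number of EH-DPs at which the average EE curve peaks; beyond it EE declines.
    max() returns the first maximum, i.e. the smallest such count on ties."""
    if len(n_values) != len(avg_ee) or not n_values:
        raise ValueError("need one non-empty EE value per EH-DP count")
    best = max(range(len(avg_ee)), key=lambda i: avg_ee[i])
    return n_values[best]


# Toy values only (not from Figure 5).
print(relative_gain(1.35, 1.0))
print(max_sustainable_dps([2, 3, 4, 5, 6], [1.0, 1.4, 1.7, 1.6, 1.5]))
```

Taking the argmax of the average-EE curve for each δ gives the maximum sustainable number of EH-DPs described in (iii).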

The Transmission Rate Performance Comparison.
To better reveal how the above algorithms respond to the modelled stSEE optimization in different EH-DCCN scenarios, the mean and variance of the transmission rate of EH-DPs are shown in Figures 6 and 7, respectively. The variance of the transmission rate measures how far the users' rates deviate from their mean and therefore expresses the equilibrium level among users: the smaller the variance, the higher the equilibrium level. Two key observations follow from Figures 6 and 7:
(i) As Figure 6 shows, the average transmission rate of EH-DPs increases with the number of EH-DPs and with the energy arrival probability. However, like the EE performance, its growth slows markedly once the harvested energy or the number of EH-DPs reaches a certain level.
(ii) As Figure 7 shows, the transmission rate variance of CAIA, ESS, and QLA is always lower than that of RTS and TDSS in every scenario. That is, CAIA, ESS, and QLA achieve a better balance among EH-DPs and therefore carry out the stSEE programming more effectively. This is because CAIA, ESS, and QLA make better decisions by fully accounting for short-term resource equilibrium with respect to channel interference (i.e., the number of EH-DPs) and energy level (i.e., the energy arrival probability). Moreover, the poorer equilibrium of the slot-by-slot schemes TDSS and RTS further supports the conclusion that the short-term energy-efficient scheduling strategy is better suited to improving the performance of EH-DCCN.
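The equilibrium metric can be sketched as the mean and variance of the per-EH-DP transmission rates. We assume the population variance over the EH-DPs of one scenario (the text does not say whether a sample or population variance is used), and the rate lists below are toy values, not results from Figures 6 or 7.

```python
from statistics import mean, pvariance


def rate_equilibrium(rates):
    """Return (mean, population variance) of per-EH-DP transmission rates.
    A smaller variance indicates a more balanced (equilibrium) allocation."""
    return mean(rates), pvariance(rates)


# Toy rate lists with the same mean but different balance.
balanced = [2.0, 2.0, 2.0, 2.0]
skewed = [0.5, 1.5, 2.5, 3.5]
print(rate_equilibrium(balanced))
print(rate_equilibrium(skewed))
```

Both toy allocations have the same mean rate, so only the variance distinguishes the balanced one, which is exactly why Figure 7 reports the variance separately from the mean in Figure 6.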

Conclusions
In this study, we investigate an energy-efficient time-domain transmission equilibrium scheduling and power allocation scheme for improving the performance of EH-powered D2D communication underlaying a cellular network, where cellular users (CUs) and EH-powered D2D pairs (EH-DPs) transmit data over a shared uplink channel. Considering the volatility of energy availability and the channel conditions of users, a short-term Sum Energy Efficiency (stSEE) maximization problem for EH-powered D2D communication is modelled, while ensuring the transmission rate requirements of CUs. However, the optimization problem, which includes transmission scheduling of EH-DPs and power allocation for CUs and the chosen EH-DPs over a finite time horizon, is difficult and time-consuming to solve. Therefore, we propose a two-layer convex approximation iteration algorithm (CAIA) that obtains a feasible quasioptimal solution to the modelled stSEE maximization problem. In addition, a two-step heuristic algorithm operating slot by slot is developed to acquire a suboptimal solution without requiring statistical knowledge of the channel and EH processes. Numerical results show that the short-term time-domain equilibrium scheduling strategy achieves better energy efficiency and transmission rate than the real-time scheduling algorithm for different EH settings. Besides, we also study the maximum number of D2D pairs that can be scheduled under one CU for different energy harvesting efficiencies, which provides insight for the design of EH-powered D2D communications.