Energy Cooperation in Ultradense Network Powered by Renewable Energy Based on Cluster and Learning Strategy

A new method for renewable energy cooperation among small base stations (SBSs) is proposed to maximize the energy efficiency of an ultradense network (UDN). In the UDN, each SBS is equipped with an energy harvesting (EH) unit, and the energy arrival times are modeled as a Poisson counting process. Firstly, SBSs with large traffic demands are selected as the clustering centers, and then all SBSs are clustered using a dynamic k-means algorithm. Secondly, SBSs coordinate their renewable energy within each formed cluster. The process of energy cooperation among SBSs is modeled as a Markov decision process (MDP), and a Q-learning algorithm is utilized to optimize energy cooperation. The algorithm has four different actions with corresponding reward functions. Q-learning explores the actions as much as possible and predicts better actions by calculating rewards. In addition, an ε-greedy policy is used to ensure the convergence of the algorithm. Finally, simulation results show that the new method reduces the data dimension and improves the calculation speed, which in turn improves the utilization of renewable energy and the performance of the UDN. Through online optimization, the proposed method can significantly improve the energy utilization rate and the data transmission rate.


Introduction
With the rapid development of mobile networks, wireless data traffic has increased exponentially. More and more small base stations (SBSs) are needed to satisfy the traffic demand, which results in the ultradense network (UDN). Meanwhile, energy harvesting (EH) has become a very promising technology because of its flexible deployment and use of renewable energy [1]. Combining SBSs with EH makes it possible to harvest energy and thereby extend the network running time. However, EH technology faces some difficulties [2,3], such as the randomness and instability of renewable energy and the limited energy storage of EH devices.
Energy cooperation in EH wireless networks has recently been studied extensively. Many studies address maximizing network throughput, energy cooperation, traffic management, and so on. Energy cooperation schemes among different BSs are considered in [4][5][6], where the energy efficiency (EE) is improved by adopting evolutionary algorithms. A jointly optimal policy that maximizes the sum throughput is proposed in [7][8][9], where both energy cooperation and traffic management are considered. In [10], users can perform energy cooperation, and the capacity region coincides with that of a traditional K-user Gaussian MAC. The cooperation between primary and secondary users at the information and energy levels is considered in [11]. These studies mainly focus on energy cooperation between cells or users in a cellular network or a hybrid power supply system, with the aim of improving EE and traffic management. With the increase of SBSs in future communication networks, it is necessary to divide SBSs into small groups for optimal energy cooperation.
A UDN has a much higher deployment density than current mobile networks, which greatly improves the network throughput. Many studies on spectrum sensing, spectrum efficiency, and intercell interference have been published [12][13][14][15]. EE and resource management in UDN are studied in [16][17][18], and a cluster-based EE resource allocation scheme is proposed in [19]. Exploiting the density of SBSs to enhance energy cooperation will improve the network performance. As the density of SBSs increases, it becomes more efficient and practical to use clustering and online learning methods to achieve greater network throughput.

Wireless Communications and Mobile Computing
There are also studies on energy supply, especially EH technology. A system consisting of two cooperative microgrids is considered in [20], where the two microgrids exchange renewable energy through a transmission line. Two energy sources for supplying the energy required for system operation are proposed in [21][22][23], namely, an energy harvester and a constant energy source driven by a nonrenewable resource. In [24], an energy harvesting circuit is equipped at the energy harvesting receiver and introduces various nonlinearities into the wireless power transfer. A dense small cell network consisting of a set of small cells and a set of users is considered in [25]; in that network every user selects an SBS by itself, and multiple users can be served by a single SBS. For simplicity, we only consider renewable energy cooperation among clustered SBSs in UDN.
Because clustering facilitates energy cooperation and reinforcement learning can achieve greater EE, we combine EH technology with clustered energy cooperation in UDN to solve EE problems. A renewable energy cooperation management algorithm is proposed based on clustered SBSs in UDN. Firstly, according to the geographical location and traffic load, a sampling technique is used to determine the centers of data division, and then all SBSs are clustered using a dynamic k-means algorithm. Secondly, within each formed cluster, SBSs coordinate their renewable energy. The process of energy cooperation among SBSs is modeled as an MDP. A Q-learning algorithm is adopted to optimize energy cooperation, in which an ε-greedy policy ensures algorithm convergence. There are four different actions and corresponding immediate reward functions for each SBS. The reward function represents the energy value after taking the corresponding action. Q-learning explores the actions as much as possible, learns the relationship between actions and rewards, and then predicts better actions by calculating rewards. Finally, simulation results show that the new method reduces the data dimension and improves the calculation speed, which in turn improves the utilization of renewable energy and the performance of the UDN. Through online optimization, the proposed method can significantly improve the energy utilization rate and the data transmission rate.
This paper is organized as follows: Section 2 presents the system model according to the MDP. In Section 3, we optimize the problem of energy cooperation among SBSs in UDN using clustering technology and reinforcement learning. Simulation results are given in Section 4. Section 5 concludes the paper.

System Model
In practice, the EH model depends on its specific implementation. Solar panels and wind turbine generators can both generate renewable energy, but their EH characteristics differ. The energy arrival times in the energy harvester can be modeled as a Poisson counting process [21]; a sinusoidal curve can also be selected [22]. To isolate the considered problem from specific implementation assumptions, we adopt a general stochastic-process model for EH communication systems. In this paper, the SBSs in the UDN are randomly deployed [26,27], and each SBS is equipped with an EH unit and a limited-capacity battery. We assume a time-slotted system with a finite number of time slots (TSs), in which the renewable energy and the required data arrive at the beginning of each TS. The channel state information is $H_i(t)$, which is kept constant within a TS. The processing of energy/data packets can be considered as a first-order discrete Markov model [28]. In each TS, the amount of data the SBS processes is $D_i$, and the minimum energy required for data transmission is $E_i^T(t)$. The energy arrival times in the EH unit are modeled as a Poisson counting process with rate $\lambda_i$. The harvested energy $E_i^H(t)$ in each TS takes the value $k$ with probability $P(E_i^H(t) = k) = (\lambda_i^k / k!)\, e^{-\lambda_i}$, $k = 0, 1, 2, \ldots$. In the UDN, for a certain SBS $i$, the system parameters are given in the System Parameters in UDN section.
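The Poisson arrival model above can be illustrated with a minimal sketch. The function names, the seed handling, and the use of Knuth's sampling method are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a Poisson variable with mean lam equals k."""
    return (lam ** k) / math.factorial(k) * math.exp(-lam)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def harvested_energy_trace(lam: float, num_slots: int, seed: int = 0) -> list:
    """Energy units harvested in each TS (hypothetical helper)."""
    rng = random.Random(seed)
    return [sample_poisson(lam, rng) for _ in range(num_slots)]
```

For example, `harvested_energy_trace(2.0, 10)` returns ten nonnegative integers, one per TS, whose long-run average approaches the rate 2.0.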
Energy charged to the battery is $E_i^c(t)$, $E_i^c(t) \ge 0$, and energy discharged from the battery is $E_i^d(t)$, $E_i^d(t) \ge 0$. In any TS, the SBS either charges energy to the battery or discharges energy from it. At most one of $E_i^c(t)$ and $E_i^d(t)$ is strictly positive, that is, $E_i^c(t) \cdot E_i^d(t) = 0$.
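A minimal sketch of one battery update that respects the "charge or discharge, never both" rule is given below. The update rule itself (efficiency applied on charging, clipping at the capacity, no discharging beyond the stored level) is an assumption for illustration, as are all names; the paper's own constraints are not reproduced here.

```python
def battery_step(level: float, charge: float, discharge: float,
                 capacity: float, eta: float) -> float:
    """Return the battery level after one TS (hypothetical update rule).

    eta is the charging efficiency, assumed to lie in (0, 1].
    """
    if charge < 0 or discharge < 0:
        raise ValueError("charge and discharge must be nonnegative")
    if charge * discharge != 0:
        raise ValueError("at most one of charge and discharge may be positive")
    if discharge > level:
        raise ValueError("cannot discharge more than the stored energy")
    # Only a fraction eta of the charged energy is stored; clip at capacity.
    return min(level + eta * charge - discharge, capacity)
```

For instance, with a capacity of 5, charging 3 units into a battery holding 4 units leaves it full, while requesting a charge and a discharge in the same TS is rejected.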
The SBS operation in each TS satisfies a set of constraints that involve the battery charging efficiency.
We consider one hot region in the UDN, which is shown in Figure 1. The colored SBSs are the cluster centers, which form groups with the uncolored SBSs in their circles. Other SBSs join the nearest group. The SBSs in one cluster cooperate on energy. For simplicity, we only consider the single-user scenario. For the requested data rate of the user, we ignore the fluctuation of data services and assume the full-traffic case: the user always has data to receive, and the SBSs are in full-load operation.
To simplify the system model, we give one SBS energy harvesting model in UDN, which is shown in Figure 2.
To find the energy required to reliably transmit a data packet over the channel, we consider Shannon's capacity formula for Gaussian channels. The required energy can be approximately calculated from this formula, given the transmit power $P$ and the noise power density $N_0$.
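As a rough illustration of this step, the sketch below derives the transmission energy from Shannon's formula and rounds it up to a whole number of energy units. The exact expression used in the paper is not available here, so the formula (rate $W \log_2(1 + P h / (N_0 W))$, energy equal to power times transmission time) and every parameter name are assumptions.

```python
import math


def shannon_rate(tx_power: float, channel_gain: float,
                 noise_psd: float, bandwidth: float) -> float:
    """Achievable rate in bits/s on a Gaussian channel."""
    snr = tx_power * channel_gain / (noise_psd * bandwidth)
    return bandwidth * math.log2(1.0 + snr)


def required_energy_units(data_bits: float, tx_power: float,
                          channel_gain: float, noise_psd: float,
                          bandwidth: float, energy_unit: float) -> int:
    """Energy needed to send data_bits, as a whole number of energy units."""
    duration = data_bits / shannon_rate(tx_power, channel_gain,
                                        noise_psd, bandwidth)
    energy = tx_power * duration  # joules = watts * seconds
    return math.ceil(energy / energy_unit)
```

A better channel gain raises the rate, shortens the transmission, and therefore lowers the number of energy units the packet consumes.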

Figure 1: System model.

The required energy $E_i^T(t)$ is an integer multiple of the energy unit [22].
In this paper, we define EE in terms of bits/Hz/Joule as the ratio of the achievable spectral efficiency to the total power consumption, where $\sum_{i=1}^{n} P_i$ is the sum power consumption of the $n$ SBSs in one cluster, $g_i$ denotes the channel-gain-to-noise ratio, $P_c$ denotes the additional power dissipation due to the SBS's circuitry, $P_s$ is the static power dissipation, and $\rho$ is the power amplifier efficiency.
A multiuser scenario can be derived similarly. At the beginning of each TS, each SBS schedules its users such that a single user is allocated to each subcarrier. Let $x_{u,j} = 1$ denote that subcarrier $j$ is assigned to user $u$, and $x_{u,j} = 0$ otherwise. Accordingly, EE can be expressed in terms of the following quantities: $U$ is the number of users assigned to one SBS, $J$ is the number of subcarriers, $p_{u,j}$ denotes the power allocation, $h_{u,j}$ is the channel gain, $N_0$ is the noise power spectral density of the additive white Gaussian noise, and $W$ is the bandwidth.
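To make the EE definition concrete, the following sketch evaluates one common form of the metric for a single cluster. The formula is an assumption chosen to match the quantities named above (sum spectral efficiency divided by amplifier-scaled transmit power plus circuit and static power); it is not claimed to be the paper's exact expression, and the function name is hypothetical.

```python
import math


def cluster_energy_efficiency(powers, gain_to_noise, rho, p_circuit, p_static):
    """EE in bits/Hz/Joule for one cluster (assumed form, for illustration).

    powers[i] is the transmit power of SBS i and gain_to_noise[i] is its
    channel-gain-to-noise ratio; rho is the power amplifier efficiency.
    """
    if len(powers) != len(gain_to_noise):
        raise ValueError("one gain-to-noise ratio is needed per SBS")
    spectral_efficiency = sum(
        math.log2(1.0 + g * p) for g, p in zip(gain_to_noise, powers)
    )
    total_power = sum(powers) / rho + p_circuit + p_static
    return spectral_efficiency / total_power
```

With two SBSs transmitting at unit power, gain-to-noise ratios of 1 and 3, an amplifier efficiency of 0.5, and unit circuit and static power, the sketch yields 3 bits/Hz over 6 W, that is, 0.5 bits/Hz/Joule.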

Energy Cooperation Optimization Algorithm Based on SBS Clustering and Learning Strategy
This section optimizes the problem of energy cooperation among SBSs in UDN using clustering technology and reinforcement learning.

Dynamic 𝑘-Means Clustering Method for SBSs in UDN.
In this paper, a cluster-based approach for maximizing EE in UDN is proposed. According to their realization process, the common clustering methods are divided into hierarchical clustering and partition clustering. There are also clustering methods for large-scale data sets. One of the most classic clustering methods is k-means, which uses cluster centers to represent the whole group. The cluster centers need to be updated repeatedly in the process of determining the final result [29,30]. All data must be redivided before the cluster centers are updated, so k-means cannot handle very large-scale data, and its execution time grows as the amount of data increases. Since a UDN contains more SBSs than a traditional network, it is very time-consuming to run the clustering algorithm on the whole data set. To remedy this problem, a sampling strategy is adopted, which greatly saves storage space and reduces the amount of computation. Sampling selects some samples from the original data, and the distribution of the original data set is estimated from the clustering result on the samples. In this paper, the sampling technique is used to determine the centers of data division. If the distance between SBSs is too long, energy cooperation may cause power loss and transmission delay, so we cluster the SBSs in the UDN according to distance and traffic. The sampled SBSs are selected in descending order of SBS traffic, where the SBS traffic refers to the average number of data packets sent in the past week or month. Each SBS is viewed as a data point in a two-dimensional space, and the distances between the sampling SBSs are saved [31,32].
Assume that there are $N$ SBSs in the UDN. The distances between all SBSs are saved in the matrix $\mathbf{D}_{N \times N}$. The dynamic k-means algorithm is described as follows:
(1) Select $m$ sampling SBSs (in descending order of SBS traffic) and save their distances in the matrix $\mathbf{D}_{m \times m}$; each column represents the distances between one SBS and the other sampling SBSs, $\mathbf{D}_{m \times m} = d(x_i, x_j)$ $(i, j = 1, 2, \ldots, m)$, where $d(x_i, x_j)$ represents the distance between sampling SBSs $x_i$ and $x_j$.
Determining the value of $k$ is a difficult problem: an unreasonable $k$ is likely to lead to large clustering errors. Here, the final number of clusters is obtained from the data distribution properties and the distances through a series of transformations. Obtaining $k$ is therefore a dynamic process, and no initial empirical value is needed. The dynamic division is closer to actual demand; it removes the need for manual input and improves the automatic clustering ability.
EE is considered to be one of the main benefits of a clustering architecture. When the network is dense, clustering also improves the stability of cooperative relationships. Clusters are formed to maximize their time availability and hence, in this paper, the availability of energy cooperation. The clustering centers are all SBSs with large traffic, which benefits energy cooperation with the SBSs of small traffic. It also ensures the supply of renewable energy and improves the data transmission rate of the network.
The MDP is described by the tuple $(S, A, P_{a_t}(s_t, s_{t+1}), R_{a_t}(s_t, s_{t+1}))$, where $S$ is the finite environment state space; $A$ is the finite system action space; and $P_{a_t}(s_t, s_{t+1}) \in [0,1]$ and $R_{a_t}(s_t, s_{t+1})$, respectively, represent the state transition probability and the immediate reward of transferring the state from $s_t$ to $s_{t+1}$ by taking action $a_t$. The probability and the immediate reward depend only on the current state $s_t$ and the selected action $a_t$ and are independent of past states and actions.
In the proposed model, the system state of the $i$th SBS in TS $t$ is $s_t = (E_i^H(t), E_i^T(t), B_i(t), H_i(t))$, and the action set is $A = \{a_1, a_2, a_3, a_4\}$. At the beginning of each TS, the SBS has four actions for each data packet, which are shown in Table 1.
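Assume that the optimal state value function and the optimal action value function are $V^*$ and $Q^*$, which satisfy the Bellman optimality equations:
$$V^*(s_t) = \max_{a_t \in A} \sum_{s_{t+1} \in S} P_{a_t}(s_t, s_{t+1}) \left[ R_{a_t}(s_t, s_{t+1}) + \gamma V^*(s_{t+1}) \right],$$
$$Q^*(s_t, a_t) = \sum_{s_{t+1} \in S} P_{a_t}(s_t, s_{t+1}) \left[ R_{a_t}(s_t, s_{t+1}) + \gamma \max_{a_{t+1} \in A} Q^*(s_{t+1}, a_{t+1}) \right].$$
As a result, the optimal policy can be obtained:
$$\pi^*(s_t) = \arg\max_{a_t \in A} Q^*(s_t, a_t).$$
The goal of the MDP is to find the system's optimal policy $\pi^*$, which can be obtained from the optimal value function [33].
[Table 1, recovered rows] Drop the data packet and achieve energy cooperation. Drop the data packet and have no energy cooperation ($r_t = 0$).
The Q-learning algorithm is adopted, and its iterative formula is as follows:
$$Q(s_t, a_t) \leftarrow (1 - \alpha)\, Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) \right],$$
where $(s_t, a_t)$ is a state-action pair in TS $t$, $\alpha$ $(0 < \alpha < 1)$ is the learning factor, and $\gamma$ $(0 < \gamma < 1)$ is the discount factor. Iterative Q-learning uses the reward of the state-action pair as the evaluation function. First initialize the $Q$ values, then determine the action $a_t$ in state $s_t$ according to the ε-greedy policy, obtain the training sample $(s_t, a_t, s_{t+1}, r_t)$, and then update the $Q$ value. When the agent reaches the target state, the algorithm terminates one iteration loop. The algorithm then starts from the initial state again until the end of learning.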
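The Q-learning algorithm is applied to renewable energy cooperation within one SBS cluster in the UDN, and its process is as follows:
(1) Initialize: $Q(s, a) \leftarrow$ any value; $\alpha$ and $\gamma$: given values.
(2) Repeat (for each learning episode):
Repeat (for each TS):
A. Choose $a_t$ according to the ε-greedy policy, then obtain $r_t$ and $s_{t+1}$.
Update $Q(s_t, a_t)$ with the iterative formula and move to state $s_{t+1}$.
Until $s_t$ is the termination state.
Until all $Q(s, a)$ have converged.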
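The procedure above can be sketched as a tabular Q-learning loop. This is a minimal illustration, not the authors' implementation: the four-state chain environment `toy_step`, the default parameter values, and all function names are hypothetical stand-ins for the SBS energy states, actions, and rewards.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise take the highest-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))


def q_learning(step, num_states, num_actions, start_state, terminal_state,
               alpha=0.5, gamma=0.9, epsilon=0.1, episodes=500,
               max_steps=100, seed=0):
    """Tabular Q-learning; step(state, action) returns (reward, next_state)."""
    rng = random.Random(seed)
    q = [[0.0] * num_actions for _ in range(num_states)]
    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            action = epsilon_greedy(q[state], epsilon, rng)
            reward, next_state = step(state, action)
            # Move Q(s, a) toward the target r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == terminal_state:
                break
    return q


def toy_step(state, action):
    """Hypothetical chain of states 0..3: action 1 moves right, 0 stays."""
    next_state = min(state + action, 3)
    reward = 1.0 if next_state == 3 and state != 3 else 0.0
    return reward, next_state
```

Calling `q_learning(toy_step, 4, 2, 0, 3)` returns a table in which moving right has the larger value in every nonterminal state. In the paper's setting the table would instead be indexed by the SBS state and the four actions of Table 1.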
In step A, $a_t$ has four choices, and the corresponding rewards $r_t$ are shown in Table 2 (the rewards account for the storage efficiency of the battery and the resistance loss).
In fact, the four actions can be combined into a single expression in terms of two quantities, A and B, whose values are listed in Table 3.
The reward function is the sum of the energy for sending data packets and the energy for cooperation. In each TS, it encourages sending data and carrying out energy cooperation.
In the Q-learning algorithm, the action with the highest $Q$ value in a particular state should be taken at each step. An agent that rigidly follows this rule might underperform, since the same decision will be investigated over and over again. Before a state-action pair can be exploited, it needs to be explored first [34]. In this paper, we utilize the ε-greedy policy $(0 < \varepsilon < 1)$, which is commonly used during the exploration of the state-action space. It enforces sporadic jumps to suboptimal states for exploration purposes, and also to detect changes in the environmental conditions. Whenever a decision is to be made, an action is picked at random with probability $\varepsilon$, and the action with the highest $Q$ value is chosen with probability $(1 - \varepsilon)$.
Q-learning converges to the optimal $Q$ function. As $t \to \infty$, $Q(s, a)$ converges to $Q^*(s, a)$ with probability 1 [35], and the convergence rate is related to many factors. The convergence rate increases with the value of $\alpha$ and the number of learning iterations $N_L$ and decreases with the number of actions, the number of states, and the discount factor $\gamma$ [36]. Action selection follows the ε-greedy policy at each TS: the exploration probability is $\varepsilon$, and the exploitation probability is $(1 - \varepsilon)$.

Numerical Simulations
The SBSs are ordered by traffic from large to small, and their relative positions in the UDN are listed in Table 4. All SBSs satisfy the constraints (C1).
Through dynamic k-means clustering, SBSs with a large traffic amount can be selected as the cluster centers without knowing the value of $k$ in advance. The cluster centers have relatively larger energy demands, which can effectively improve the energy utilization rate. In addition, Q-learning in this paper only considers energy cooperation within the same cluster, which mitigates the curse of dimensionality caused by too many states.
As shown in Figure 4, the Q-learning approach for energy cooperation in one cluster is convergent. The horizontal axis is the number of iterations (one iteration takes 10 ms, as in the other figures), and the vertical axis is the difference between two adjacent sampled value functions. The line is a 5-degree polynomial fit, and the $Q$ sampling interval is 100 iterations (the same as in Figures 4-10). The change of the value function demonstrates the algorithm's convergence. When the iteration number $N_L$ reaches about $6 \times 10^4$, the function value is basically unchanged.
Figure 5 shows that the learning factor $\alpha$ of the Q-learning algorithm influences the number of iterations needed for the packet transmission rate to stabilize. When $\alpha = 0.5$, the black diamond line is basically unchanged from $N_L \approx 3.0 \times 10^4$; when $\alpha = 0.6$, the blue triangle line becomes stable at $N_L \approx 2.5 \times 10^4$; when $\alpha = 0.$… the remaining line also becomes stable. We can conclude that the Q-learning algorithm stabilizes in fewer iterations as $\alpha$ increases.
As shown in Figure 6, the ε-greedy policy can ensure the convergence of the Q-learning algorithm, and the value of $\varepsilon$ influences the final packet transmission rate. When $\varepsilon = 0.1$, the packet transmission rate reaches about 60% as $N_L$ increases; when $\varepsilon = 0.2$, it reaches about 55%; when $\varepsilon = 0.3$, it reaches about 50%. We can conclude that the Q-learning algorithm learns the optimal policy with increasing accuracy as $\varepsilon$ decreases.
In Figure 7 we show the effect of the maximum battery size $B_{\max}$ on the expected data transmission rate. We can conclude that the expected data transmission rate increases with $B_{\max}$ for the proposed algorithm. In our model, the system state of the $i$th SBS in TS $t$ is $s_t = (E_i^H(t), E_i^T(t), B_i(t), H_i(t))$, in which all the parameter values are finite and discrete. If $B_{\max}$ has a relatively larger value, then $B_i(t) \in \{0, 1, \ldots, B_{\max}\}$ has more choices, which increases the computation and aggravates the curse of dimensionality in Q-learning. In this paper we set $B_{\max} = 5$ to simplify the algorithm, but we have run extensive numerical simulations with different parameter settings and observed similar results.
Figure 8 displays the data transmission rate for different $p_H$ values. We see that the expected data transmission rate increases with $p_H$; that is, the more stable the channel state is, the higher the data transmission rate is. In cluster {6⊕, 11, 18, 20}, when $p_H = 0.5$, the data transmission rate is about 28 kbit/s; when $p_H = 0.9$, the data transmission rate is about 40 kbit/s. As $p_H$ increases, the EH process becomes less random, and the proposed algorithm can better estimate its future states and adapt to them.
Figure 9 shows the relationship between the data transmission rate and the number of iterations. Energy cooperation improves the energy utilization, so the data transmission rate is higher with energy cooperation. The black diamond line, which represents the rate with energy cooperation, reaches about 28 kbit/s, and the green triangle line, which represents the rate without energy cooperation, stays below 24 kbit/s. The transmission rate with energy cooperation is about 15% higher than without it. The same holds for the other clusters in the UDN. The proposed method can therefore significantly improve the data transmission rate.
EE (the energy efficiency) is defined as the ratio of the network throughput to the power consumption per unit area. It is a performance indicator that measures the benefit-cost ratio by comparing the achievable rate with the energy costs. In Figure 10 we illustrate the expected average energy efficiency of the proposed approach against the number of learning iterations, together with the performance of the other approaches. It can be observed that the average EE with energy cooperation is higher than that without energy cooperation. The black curve is closer to that of the Offline-LP algorithm. The proposed method can significantly improve the energy utilization rate.

(4) Calculate the average value of all $d_i$, $\bar{d} = \sum_{i=1}^{m} d_i / m$.
(5) Calculate the high-density radius $R$ as $\bar{d}$ plus the minimum distance $\min d(x_i, x_j)$; the minimum distance is added to make $R$ big enough to ensure that most high-density points are correctly labeled.
(6) Calculate the cluster radius $r$, which likewise includes a $\min d(x_i, x_j)$ term, select the two furthest points among the sampling SBSs as the initial cluster centers, and mark $k = 2$ to ensure that the centers come from different clusters.
(7) Divide the data near the centers into two clusters according to $r$, find the next farthest point from the centers, mark $k = k + 1$, and divide the data again until all the data are clustered.
(8) Assign each remaining SBS to the nearest center point.
(9) Calculate a threshold for the distance between cluster centers and a threshold $h$ for the distance between boundary points, where $h$ is the sum of two $\min d(x_i, x_j)$ terms. If the distance between two cluster centers is smaller than the first threshold and the distance between their boundary points is smaller than $h$, then combine the two clusters and label $k = k - 1$.
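The center-growing part of the steps above (sampling by traffic, starting from the two furthest samples, adding the next furthest point, and assigning the remaining SBSs to the nearest center) can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster radius is passed in as the hypothetical parameter `radius` rather than computed from the distances, the final merging step is omitted, `num_samples` must be at least 2, and all names are invented for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def dynamic_cluster(positions, traffic, num_samples, radius):
    """Return (centers, labels); centers are indices into positions."""
    # Sample the SBSs with the largest traffic.
    order = sorted(range(len(positions)), key=lambda i: traffic[i], reverse=True)
    samples = order[:num_samples]
    # Start from the two sampled SBSs that are furthest apart (k = 2).
    centers = list(max(
        ((i, j) for i in samples for j in samples if i < j),
        key=lambda pair: dist(positions[pair[0]], positions[pair[1]]),
    ))
    # While some sample lies outside the radius of every center,
    # promote the furthest such sample to a new center (k = k + 1).
    while True:
        gap, far = max(
            (min(dist(positions[s], positions[c]) for c in centers), s)
            for s in samples
        )
        if gap <= radius:
            break
        centers.append(far)
    # Assign every SBS to its nearest center.
    labels = [
        min(range(len(centers)),
            key=lambda k: dist(positions[i], positions[centers[k]]))
        for i in range(len(positions))
    ]
    return centers, labels
```

Because new centers are added only while a sampled SBS is further than `radius` from every existing center, the number of clusters is determined by the data rather than supplied in advance.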

Figure 3: The dynamic k-means clustering result for SBSs in UDN.

Figure 10: Average energy efficiency with and without energy cooperation.

Table 1: Actions for SBS processing each data packet.

Q-Learning Approach for Energy Cooperation.
Due to the instability of renewable energy and the arbitrary distribution of SBSs, it is necessary to improve EE through energy cooperation. In this paper, we propose a renewable energy cooperation scheme among different SBSs, in which one SBS can collect energy from, or share energy with, another SBS. The energy cooperation efficiency improves when the number of cooperating SBSs is large. We model energy cooperation among SBSs in the UDN as a finite-state, discrete-time MDP with state space $S$, action space $A$, state transition probability $P_{a_t}(s_t, s_{t+1})$, and immediate reward $R_{a_t}(s_t, s_{t+1})$.

Table 2: Four choices of $a_t$ and the corresponding $r_t$.

Table 3: Values of A and B according to different actions.

Table 4: The positions of SBSs.