UAV-Aided Multiuser Mobile Edge Computing Networks with Energy Harvesting

This article studies a mobile edge computing (MEC) network with one edge node (EN), where multiple unmanned aerial vehicles (UAVs) act as users with computation-heavy tasks. As the users are generally limited in both computing capability and power supply, the EN can help compute the tasks and meanwhile power the users through energy harvesting. We optimize the system by proposing a joint strategy for task offloading and energy harvesting. Specifically, a deep reinforcement learning (DRL) algorithm is implemented to solve the offloading problem, while several analytical solutions are given for the allocation of charging power among multiple users: criterion I applies equal power allocation, criterion II is designed for equal transmission rate, and criterion III is based on equal harvested energy. We finally present simulation results to verify the joint strategy for the UAV-aided multiuser MEC system with energy harvesting.


Introduction
In recent years, wireless communication has attracted much effort from researchers in both academia and industry [1,2], inspiring many practical applications such as the internet of things and video monitoring [3]. A key feature of these applications is that massive computation is involved due to the massive number of accessing nodes [4]. To handle this massive computation, cloud computing has been proposed, which assists task computation through wireless transmission [5,6]. A major limitation is that the latency and power consumption (PoC) become prohibitively high under poor channel conditions, which severely limits the development and application of cloud computing.
To resolve the above disadvantages of cloud computing, mobile edge computing (MEC) has been proposed, which installs computing resources at the edge nodes (ENs) of the network [7][8][9]. In this way, the users can offload their tasks to a nearby EN through wireless transmission, which leads to lower delay and PoC compared with cloud computing. A key design parameter in an MEC system is the offloading ratio [10,11], which gives the share of each task to be computed at the EN. The fundamental principle of offloading is to jointly utilize the communication and computing resources by achieving a fine trade-off between computation and wireless transmission. Moreover, some advanced wireless techniques have been proposed to decrease the delay and PoC in computation and transmission [12,13].
Another emerging technique to assist computation and communication in IoT networks is the deployment of unmanned aerial vehicles (UAVs), which are easy to deploy and offer flexible coverage. Moreover, the price of UAVs keeps falling, which inspires many practical applications [14,15]. In an MEC system, UAVs can serve high-priority data computation through intelligent path routing and scheduling, exploiting the additional system resources that UAVs bring. The integration of UAVs into MEC systems has attracted much attention from researchers in academia and industry, which motivates this article.
Motivated by the above literature review, this article studies an MEC system with one EN, where multiple UAVs act as users with computation-heavy tasks. As the users are generally limited in both computing capability and power supply, the EN can help compute the tasks and meanwhile power the users through energy harvesting. We optimize the system by proposing a joint strategy for task offloading and energy harvesting. Specifically, a deep reinforcement learning (DRL) algorithm is implemented to solve the offloading problem, while several analytical solutions are given for the allocation of charging power among multiple users: criterion I applies equal power allocation, criterion II is designed for equal transmission rate, and criterion III is based on equal harvested energy. We finally present simulation results to verify the joint strategy for the UAV-aided multiuser MEC system with energy harvesting.

System Model
In this paper, we consider the offloading system model in Figure 1, which has one edge node (EN) surrounded by N unmanned aerial vehicles (UAVs). (The notation "CAP" is used in some literature, while "EN" is used in others; both stand for the same entity and can be used interchangeably.) Specifically, the EN has an energy transmitter and a server that provides computing. The EN is capable of providing charging services to the UAVs, and each UAV is equipped with a limited battery capable of wireless charging. Each UAV n has a computation task of size l_n. Due to the UAVs' limited computing power, each UAV offloads part of its task to the EN in order to reduce the computation time. The EN ensures that the UAVs are always supplied with electricity, so the UAVs in this system offload tasks without considering power consumption. We introduce the local computing model and the offloading computing model in the next parts.
2.1. Local Computing Model. The local computing delay of UAV n is

$$T_n^{\mathrm{loc}} = \frac{(1-\beta_n^{\mathrm{EN}})\, l_n c}{f_n}, \qquad (1)$$

where l_n is the size of the task, β_n^EN is the offloading ratio from UAV n to the EN, c is the number of CPU cycles for executing one bit, and f_n is the local computing capability. Because all UAVs compute their tasks in parallel, we use the maximum local computing delay as the local delay of the whole system:

$$T^{\mathrm{loc}} = \max_{n\in[1,N]} T_n^{\mathrm{loc}}. \qquad (2)$$

2.2. Offloading Computing Model. In this paper, UAV n is charged by the EN, and the energy harvested by UAV n during the charging process is

$$E_n = \eta P_n^{\mathrm{charge}} \alpha_n \Gamma, \qquad (3)$$

where η denotes the charging factor, P_n^charge is the charging power allocated by the EN to UAV n, α_n is the fraction of the slot used for charging, and Γ denotes the span of each time slot.
From (3), the transmission power at UAV n, which transmits during the remaining fraction (1 − α_n) of the slot, is

$$P_n = \frac{E_n}{(1-\alpha_n)\Gamma} = \frac{\eta P_n^{\mathrm{charge}}\alpha_n}{1-\alpha_n}. \qquad (4)$$

The transmission rate between UAV n and the EN is

$$r_n = W_{\mathrm{total}}\log_2\!\Big(1+\frac{P_n |h_n^{\mathrm{EN}}|^2}{\sigma_{\mathrm{EN}}^2}\Big), \qquad (5)$$

where W_total is the total bandwidth of the system, h_n^EN ~ CN(0, δ_EN) is the channel coefficient from UAV n to the EN, and σ_EN² is the variance of the additive white Gaussian noise at the EN. The transmission delay of UAV n is

$$T_n^{\mathrm{tr}} = \frac{\beta_n^{\mathrm{EN}} l_n}{r_n}, \qquad (6)$$

and the computing delay of UAV n at the EN is

$$T_n^{\mathrm{EN}} = \frac{\beta_n^{\mathrm{EN}} l_n c}{f_{\mathrm{EN}}}, \qquad (7)$$

where f_EN is the computing capability of the EN. Further, the transmission delay of all UAVs is

$$T^{\mathrm{tr}} = \max_{n\in[1,N]} T_n^{\mathrm{tr}}, \qquad (8)$$

and the EN computing delay of all UAVs is

$$T^{\mathrm{EN}} = \max_{n\in[1,N]} T_n^{\mathrm{EN}}. \qquad (9)$$

Wireless Communications and Mobile Computing
From (8) and (9), the offloading delay of the whole system is

$$T^{\mathrm{off}} = T^{\mathrm{tr}} + T^{\mathrm{EN}}. \qquad (10)$$

Therefore, the design target in the considered MEC network is

$$\min_{\{\beta_n^{\mathrm{EN}}\},\{P_n^{\mathrm{charge}}\}} \max\big(T^{\mathrm{loc}},\, T^{\mathrm{off}}\big) \quad \mathrm{s.t.}\ \sum_{n=1}^{N} P_n^{\mathrm{charge}} \le P_{\mathrm{charge}}^{\mathrm{total}},\ 0\le \beta_n^{\mathrm{EN}} \le 1, \qquad (11)$$

where P_charge^total is the total charging power of the EN. In the next section, we will describe how we optimize this target in detail.
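Before turning to the optimization, the delay model above can be sketched numerically. The following Python snippet is a minimal illustration under our reading of the model (parallel local computing, offloading delay as transmission plus edge computing); all function and variable names are our own assumptions, not the authors' implementation.

```python
def system_delay(l, beta, c, f_local, f_en, rate):
    """System delay for the UAV-aided MEC model sketched above.

    l: task sizes in bits; beta: offloading ratios per UAV;
    c: CPU cycles per bit; f_local: per-UAV CPU speeds (cycles/s);
    f_en: EN CPU speed (cycles/s); rate: per-UAV transmission rates (bits/s).
    """
    n = len(l)
    # Local delay: UAVs compute their retained shares in parallel (eqs. (1)-(2)).
    t_local = max((1 - beta[i]) * l[i] * c / f_local[i] for i in range(n))
    # Offloading branch: worst-case transmission delay plus worst-case
    # EN computing delay (eqs. (6)-(10)).
    t_tx = max(beta[i] * l[i] / rate[i] for i in range(n))
    t_en = max(beta[i] * l[i] * c / f_en for i in range(n))
    t_off = t_tx + t_en
    # The system delay is the larger of the two branches (eq. (11) objective).
    return max(t_local, t_off)
```

For example, with two identical UAVs offloading half of their tasks, the local branch dominates when local CPUs are slow, and the returned value is the local delay.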

System Optimization
In this section, we present our optimization scheme for the considered system target. Specifically, we first utilize the deep Q-network (DQN) algorithm to obtain the task offloading strategy, and then we propose three methods to allocate the charging power among the UAVs. The details of the scheme are as follows.
3.1. Scheme on the Task Offloading. Due to the complexity of the wireless links in the system, it is hard to offload the UAVs' tasks dynamically with traditional methods. Therefore, we exploit the DQN algorithm to obtain the task offloading strategy. Different from the Q-learning algorithm, DQN has an experience pool and two neural networks, the evaluation network and the target network, to interact with the training environment and break the correlation of the training data.

Moreover, we use a Markov decision process (MDP) to model the considered task offloading problem. In particular, the MDP consists of the state set S = [s_1, s_2, …, s_N], the action set A = [a_1, a_2, …, a_{2N}], and the reward function R = [0, −1, 1]. The training process can be represented as follows: the DQN agent first initializes the system state set S and then selects an action under the current state. After the agent executes the selected action, the system state is updated, and the agent obtains feedback according to the reward function R. The agent then stores the previous state, the updated state, the selected action, and the corresponding feedback in the experience pool. After this process, the agent obtains a state-action value Q(S, A; ω), where ω represents the weight matrix of the evaluation network. The evaluation network is trained with the loss function

$$L(\omega) = \mathbb{E}\big[(Y - Q(S, A; \omega))^2\big], \qquad (12)$$

where Y denotes the output of the target network, which is

$$Y = R + \varphi \max_{A'} Q(S', A'; \omega^{*}), \qquad (13)$$

where φ represents a discount factor and ω* denotes the weight matrix of the target network. Notably, the structures of the evaluation network and the target network are the same. However, different from the target network, the evaluation network is trained in every round, and its update can be denoted as

$$\omega \leftarrow \omega - \xi \frac{\partial L(\omega)}{\partial \omega}, \qquad (14)$$

where ξ is the learning rate of the evaluation network.
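The target computation (13) and update (14) can be sketched as follows. This is a minimal illustration with a linear Q-approximator standing in for the neural networks; the feature map, dimensions, and seed are illustrative assumptions, not the authors' architecture. φ (phi) and ξ (xi) follow the notation above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
w_eval = rng.normal(size=(n_states, n_actions))   # evaluation network weights ω
w_target = w_eval.copy()                          # frozen target network weights ω*
phi, xi = 0.9, 0.05                               # discount factor, learning rate


def q_values(w, s):
    # Q(s, ·; w) for a one-hot state feature vector s
    return s @ w


def dqn_step(s, a, r, s_next):
    # Target Y = R + φ max_a' Q(s', a'; ω*), computed with the frozen network (13).
    y = r + phi * q_values(w_target, s_next).max()
    # TD error on the chosen action; its square is the loss L(ω) in (12).
    td = y - q_values(w_eval, s)[a]
    # Gradient step ω ← ω − ξ ∂L/∂ω for the linear approximator (14).
    w_eval[:, a] += xi * td * s
    return td


s, s_next = np.eye(n_states)[0], np.eye(n_states)[1]
err_before = abs(dqn_step(s, 0, 1.0, s_next))
err_after = abs(dqn_step(s, 0, 1.0, s_next))
# Repeating the same transition shrinks the TD error by the factor (1 - ξ),
# because the target network stays fixed between the two updates.
```

In the full algorithm, transitions would be sampled from the experience pool and ω* would be copied from ω periodically; both are omitted here for brevity.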

3.2. Methods on the Charged Power Allocation. In this part, we describe three methods for allocating the charging power from the EN to the UAVs. Specifically, we exploit equal-charge-power, equal-transmission-rate, and equal-charge-energy allocation.

(1) Equal-charge-power allocation method

Firstly, we allocate the charging power to the UAVs in the traditional way, so that each UAV obtains the same charging power. We define this method as the equal-charge-power allocation method, or method 1, and it can be denoted as

$$P_n^{\mathrm{charge}} = \frac{P_{\mathrm{charge}}^{\mathrm{total}}}{N}, \qquad (15)$$

where P_n^charge denotes the charging power allocated to UAV n.
(2) Equal-transmission-rate allocation method

Secondly, we allocate the charging power so that each UAV obtains the same transmission rate according to (5). We define this method as the equal-transmission-rate allocation method, or method 2. It can be represented as

$$r_1 = r_2 = \cdots = r_N. \qquad (16)$$

From (16) and (5), we have

$$W_{\mathrm{total}}\log_2\!\Big(1+\frac{P_n|h_n^{\mathrm{EN}}|^2}{\sigma_{\mathrm{EN}}^2}\Big) = W_{\mathrm{total}}\log_2\!\Big(1+\frac{P_1|h_1^{\mathrm{EN}}|^2}{\sigma_{\mathrm{EN}}^2}\Big),\quad n\in[1,N]. \qquad (17)$$

By removing the common term W_total and the monotonic logarithm, we have

$$1+\frac{P_n|h_n^{\mathrm{EN}}|^2}{\sigma_{\mathrm{EN}}^2} = 1+\frac{P_1|h_1^{\mathrm{EN}}|^2}{\sigma_{\mathrm{EN}}^2}. \qquad (18)$$

From (4), we can obtain

$$1+\frac{\eta P_n^{\mathrm{charge}}\alpha_n|h_n^{\mathrm{EN}}|^2}{(1-\alpha_n)\sigma_{\mathrm{EN}}^2} = 1+\frac{\eta P_1^{\mathrm{charge}}\alpha_1|h_1^{\mathrm{EN}}|^2}{(1-\alpha_1)\sigma_{\mathrm{EN}}^2}. \qquad (19)$$

Moreover, by subtracting one from both sides of (19) and removing the common term σ_EN², we can obtain

$$\frac{\eta P_n^{\mathrm{charge}}\alpha_n|h_n^{\mathrm{EN}}|^2}{1-\alpha_n} = \frac{\eta P_1^{\mathrm{charge}}\alpha_1|h_1^{\mathrm{EN}}|^2}{1-\alpha_1}. \qquad (20)$$

After removing the common term η, we have

$$\frac{P_n^{\mathrm{charge}}\alpha_n|h_n^{\mathrm{EN}}|^2}{1-\alpha_n} = \frac{P_1^{\mathrm{charge}}\alpha_1|h_1^{\mathrm{EN}}|^2}{1-\alpha_1}. \qquad (21)$$

For simplicity, we assume the charging time of each UAV is the same, which can be written as

$$\alpha_1 = \alpha_2 = \cdots = \alpha_N. \qquad (22)$$

Therefore, from (22), the common factor α_n/(1 − α_n) can be removed from both sides of (21), and we have

$$P_n^{\mathrm{charge}}|h_n^{\mathrm{EN}}|^2 = P_1^{\mathrm{charge}}|h_1^{\mathrm{EN}}|^2,\quad n\in[1,N]. \qquad (23)$$

From this equation and the total power constraint Σ_{m=1}^{N} P_m^charge = P_charge^total, the allocation result of method 2 is

$$P_n^{\mathrm{charge}} = P_{\mathrm{charge}}^{\mathrm{total}}\,\frac{1/|h_n^{\mathrm{EN}}|^2}{\sum_{m=1}^{N} 1/|h_m^{\mathrm{EN}}|^2}. \qquad (24)$$

(3) Equal-charge-energy allocation method

Thirdly, we allocate the charging power so that each UAV is charged with the same energy according to (3). We define this method as the equal-charge-energy allocation method, or method 3, which can be represented as

$$E_1 = E_2 = \cdots = E_N. \qquad (25)$$

From (3), we can obtain

$$\eta P_n^{\mathrm{charge}}\alpha_n\Gamma = \eta P_1^{\mathrm{charge}}\alpha_1\Gamma. \qquad (26)$$

By removing the common terms η and Γ, we have

$$P_n^{\mathrm{charge}}\alpha_n = P_1^{\mathrm{charge}}\alpha_1. \qquad (27)$$

Since we assume in (22) that the charging time of each UAV is the same, method 3 reduces to the equal split P_n^charge = P_charge^total/N, i.e., it coincides with method 1 under equal charging times.

In the next section, we will perform some simulations to demonstrate the effectiveness of our proposed scheme on task offloading and charging power allocation.
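The three allocation methods can be sketched as follows, under the equal-charging-time assumption used above. Function and variable names are illustrative (gains stands for the channel power gains |h_n^EN|²); this is a sketch of the closed-form rules, not the authors' code.

```python
def method1(p_total, gains):
    """Equal-charge-power: every UAV gets P_total / N, as in (15)."""
    n = len(gains)
    return [p_total / n] * n


def method2(p_total, gains):
    """Equal-transmission-rate: with equal charging times, equal rates
    require P_n^charge |h_n|^2 to be constant across UAVs (23), so power
    is split in proportion to 1 / |h_n|^2, as in (24)."""
    inv = [1.0 / g for g in gains]
    total_inv = sum(inv)
    return [p_total * w / total_inv for w in inv]


def method3(p_total, gains):
    """Equal-charge-energy: with equal charging times, equal harvested
    energy (27) reduces to the equal-power split of method 1."""
    return method1(p_total, gains)
```

For instance, under method 2 a UAV with half the channel gain of another receives twice the charging power, so weaker links are compensated at the cost of total throughput.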

Simulation
In this section, we perform simulations to demonstrate our proposed scheme on task offloading and charging power allocation. Specifically, the channel in the considered MEC network adopts the Gaussian channel model, and the average channel gain of the wireless link from the UAVs to the EN is set to 1. The variance of the AWGN at the EN is set to 0.1. Moreover, the number of UAVs is set to 2, and the task size of each UAV is set to 50 MB. We set the computing capability of the UAVs to 1.3 × 10^2 cycles/s, while the computing capability of the EN is set to 1 × 10^7 cycles/s. The total wireless bandwidth of the EN is set to 50 MHz, the total charging power of the EN is set to 20 W, and the charging time of each UAV is set to 0.5.

Figure 2 shows the convergence of the proposed strategy with method 1. We can find that the system delay declines rapidly and converges after 15 epochs. For example, the system delay of method 1 decreases from 35 to less than 5. Similarly, Figures 3 and 4 show the convergence of the proposed strategy with methods 2 and 3, respectively. We can find that the system delay converges after 15 epochs and eventually stabilizes below 5. These results demonstrate that the proposed DRL optimization strategy can effectively reduce the system delay and find its minimum value.

Figure 5 shows the performance of the proposed strategy with method 1, where W_total ranges from 30 to 70. When the task size of each UAV is 100 MB or 50 MB, the system delay decreases as W_total increases. This is because the increase in total bandwidth speeds up the transmission from the UAVs to the EN and thus reduces the system delay. For example, the system delay at W_total = 70 is lower than that at W_total = 30. Similarly, Figures 6 and 7 show the performance of the proposed strategy with methods 2 and 3 when W_total ranges from 30 to 70, respectively. We can find that the system delay decreases as the total bandwidth increases. These results demonstrate the effectiveness of the proposed optimization strategy.

Figure 8 shows the performance of the proposed strategy with method 1, where the number of UAVs ranges from 1 to 5. When the task size of each UAV is 100 MB or 50 MB, the system delay increases with the number of UAVs. This is because a larger number of UAVs increases the system burden and the computing delay. For example, the system delay at N = 2 is lower than that at N = 5. Similarly, Figures 9 and 10 show the performance of the proposed strategy with methods 2 and 3 when the number of UAVs ranges from 1 to 5, respectively. We can find that the system delay increases as the number of UAVs grows. These results demonstrate that the proposed strategy can find the lowest system delay when the number of UAVs ranges from 1 to 5.

Conclusions
This article studied an MEC system with one EN, where multiple unmanned aerial vehicles (UAVs) acted as users with computation-heavy tasks. As the users were generally limited in both computing capability and power supply, the EN could help compute the tasks and meanwhile power the users through energy harvesting. We optimized the system by proposing a joint strategy for task offloading and energy harvesting. Specifically, a deep reinforcement learning algorithm was implemented to solve the offloading problem, while several analytical solutions were given for the allocation of charging power among multiple users: criterion I applied equal power allocation, criterion II was designed for equal transmission rate, and criterion III was based on equal harvested energy. We finally gave simulation results to verify the joint strategy for the UAV-aided multiuser MEC system with energy harvesting.

Data Availability
The data can be obtained through email to the authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest.