Resource Allocation in UAV-Assisted Wireless Powered Communication Networks for Urban Monitoring

In this paper, we study a UAV-assisted urban monitoring network in which an unmanned aerial vehicle (UAV) capable of wireless power transmission provides energy transfer and data collection services for the network. Because urban monitoring networks are dense, we use the k-means algorithm to cluster the monitoring nodes and thereby reduce the complexity of the UAV's service task. The UAV follows a fly-hover-communicate protocol. While hovering, it operates in full-duplex mode, collecting data from cluster head nodes and, at the same time, recharging the nodes within its coverage. We formulate a multiobjective joint optimization problem that maximizes the amount of data collected and energy transferred while minimizing the energy consumption of the UAV over the service period. Because the three objectives partially conflict, their relative importance is expressed through weighting parameters. We then propose a multiobjective joint deep deterministic policy gradient algorithm to learn the multiobjective control policy of the UAV. Numerical results show that the proposed algorithm achieves joint optimization of the multiple objectives, and comparisons with other algorithms confirm its superiority.


Introduction
As a key technology for future networks, the internet of things (IoT) enables the transparent and seamless integration of massive numbers of heterogeneous nodes at any time, in any place, and through any interconnection mode [1,2], and it provides service support for information exchange and content sharing between nodes [3].
IoT technologies are widely used, and wireless monitoring networks are one of their application scenarios. In these networks, monitoring nodes are deployed at large scale and high density to acquire environmental information, which is then aggregated and analyzed to exploit the value of the data; this is important for the development of the internet. Wireless monitoring networks have numerous applications, such as smart agriculture [4], railroad systems [5], and environmental monitoring [6].
In smart city monitoring networks, network nodes are constrained by their deployment locations and the terrain, so the needs of the monitoring nodes are often not met. For data acquisition, the complex building environment makes it difficult to transmit data between monitoring nodes through multihop communication or via mobile access points. For energy transmission, traditional wired charging and ambient energy harvesting are not efficient enough to meet the energy needs of urban monitoring networks. Inefficient data collection and energy transmission services easily lead to data overflow and energy holes at the nodes. The introduction of unmanned aerial vehicles (UAVs) and simultaneous wireless information and power transfer technology provides an effective solution for urban monitoring networks [7,8]. UAV-assisted wireless networks are a promising technology for improving network performance [9]. Because a UAV can fly, it can approach devices quickly without being obstructed by ground obstacles [10]. UAV-assisted data collection and energy transmission therefore provide an efficient service for urban monitoring networks. Given the high performance demands on data collection and energy transmission in monitoring networks and the need to keep UAV energy consumption low, improving network performance by optimizing resource allocation in UAV-assisted urban monitoring networks is of considerable research significance.
In urban monitoring networks, nodes are numerous and densely distributed, and their distribution is also shaped by their own characteristics. Thus, before the UAV provides data collection and energy transmission services, the monitoring nodes should first be clustered. Clustering the nodes according to their own characteristics and their distribution can effectively reduce network complexity and the energy consumption of the UAV. During the mission, the UAV sends data collection instructions to the monitoring nodes in a cluster via the downlink while simultaneously transferring energy, and it interacts with the cluster head node to collect monitoring data via the uplink. In this paper, we optimize the UAV action policy while guaranteeing network quality; the policy mainly comprises the flight decision, the division of the hovering time slot, and the UAV transmit power. Optimizing this policy reduces UAV energy consumption while providing efficient data collection and energy transmission services.
The main contributions of this paper are as follows.
(i) To address the high demand for data collection and energy transmission services in urban monitoring networks, we formulate a multiobjective optimization problem over the amount of data collected, the amount of energy transferred, and the UAV energy consumption, with the aim of minimizing node data overflow and energy holes. We propose an optimal resource allocation strategy based on the antenna switching structure that maximizes network performance and minimizes UAV energy consumption while guaranteeing network quality.
(ii) Considering the strong correlation among the large number of monitoring nodes in the urban monitoring network, as well as their deployment distances, distribution characteristics, and data queue characteristics, we cluster the monitoring nodes with the k-means algorithm.
(iii) We model the decision problem in the UAV-assisted urban monitoring network as a Markov decision process and propose a dynamic resource allocation algorithm based on a multiobjective joint optimization-oriented DDPG algorithm (MJDDPG), which achieves the optimization objective by jointly optimizing the UAV flight decision, the hovering time slot allocation, and the UAV transmit power. Finally, we verify the effectiveness of the proposed algorithm through simulation and demonstrate its superiority by comparison with baseline algorithms.
The remainder of this paper is organized as follows. Section 2 briefly summarizes related work. Section 3 introduces the system model and formulates the multiobjective optimization problem. Section 4 presents the MJDDPG-based resource allocation algorithm. Section 5 gives the simulation results and analysis. Section 6 concludes the paper.

Related Work
Data collection from large-scale monitoring nodes and the optimization of energy transmission have long been research hotspots in wireless monitoring networks. Because UAVs are a common auxiliary means in such networks, UAV-assisted wireless monitoring networks have been studied extensively. In [11], for a system that monitors animal species and their health status, a UAV was deployed to collect monitoring information from static nodes; given the strict latency requirements on data collection in this scenario, the UAV trajectory and performance were optimized based on latency and the value of the collected information. In [12], the data collection task in a multi-UAV-assisted agricultural monitoring scenario was studied, taking into account the UAV flight speed and the node data transmission rate, with node data uploaded in a gradient manner; an overall architecture based on dual UAVs and ground data terminals realized automatic data collection and improved its timeliness and accuracy. In [13], a medical health monitoring scenario was studied in which UAVs were deployed to cover the health monitoring equipment in a region, collecting health monitoring data and providing real-time computation so that rescue services can be dispatched promptly in case of accidents. In [14], a UAV-assisted maritime monitoring scenario was studied in which the UAV acts as a mobile base station for energy transmission and the monitoring nodes upload data using the harvested energy; energy efficiency was defined as the ratio of the amount of data uploaded by the nodes to the energy consumption of the UAV, and the impact of the UAV hovering points on network performance was investigated.
For a remote ocean monitoring scenario [15], data from underwater monitoring points are gathered at a surface sink node and then collected by UAVs. The optimization goal is to maximize the network lifetime, i.e., to maximize the remaining energy of the underwater nodes while meeting delay requirements, by jointly optimizing UAV deployment, subchannel matching, and the allocation of power, time, and other resources.
For the energy supply of monitoring nodes, Lin et al. [16] studied a smart agriculture monitoring scenario in which UAVs serve both as data relays and as chargers for the monitoring nodes; the charging requirements of all nodes are met by planning the optimal UAV path subject to constraints on path length and UAV energy. In [17], the charging of monitoring nodes in agricultural monitoring was considered: RF signals are sent to the monitoring nodes, monitoring data are collected through wireless power transmission technology, and UAV movement is controlled through a dedicated control system and a feedback control algorithm. For the recharging of a wearable medical emergency system, the authors of [18] minimized the total recharge and computational task offloading time, mainly considering the UAV hovering position, flight decision, and recharge time, and solved the optimization problem with an intelligent optimization algorithm.
In summary, most existing work on UAV-assisted wireless monitoring networks focuses on data acquisition services for monitoring nodes without jointly optimizing energy supply and data acquisition. It is therefore necessary to optimize the resource allocation strategy of wireless monitoring networks so that they provide efficient data acquisition and energy supply services.

System Model and Problem Formulation
In this section, we first present a UAV-assisted urban monitoring network in which UAVs provide data collection and energy transfer and then formulate the multiobjective optimization problem.
3.1. UAV-Assisted Urban Monitoring Network Model. We consider a UAV-assisted urban monitoring network in which a single mobile UAV provides energy transmission and data collection services for multiple monitoring nodes. The urban monitoring network model is shown in Figure 1: the UAV is equipped with a single antenna, the monitoring nodes are equipped with multiple antennas, and the UAV flies at altitude H. Based on the antenna switching structure, each monitoring node performs information decoding and energy harvesting on separate antennas, and the nodes differ in data generation rate, distribution density, and the importance of their monitoring data according to their monitoring tasks.
Because the UAV's energy is limited, each mission lasts for a finite duration T > 0. For analysis, this period is divided into N equal time slots of length t = T/N, as shown in Figure 2. The UAV follows a fly-hover-communicate protocol: it does not communicate with the monitoring nodes while flying and transfers energy and collects data only while hovering. Each hovering time slot is split into two parts, corresponding to downlink and uplink communication. In the downlink subslot, whose duration within a single time slot is τ[t], the UAV sends information to the monitoring nodes in the cluster and simultaneously transfers energy, while the monitoring nodes decode information and harvest energy from the received RF signal according to the antenna switching structure. In the uplink subslot, whose duration within a single time slot is t − τ[t], the cluster head node uploads monitoring data to the UAV.
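To make the slot structure concrete, the following Python sketch computes the slot length t = T/N and splits one hovering slot into its downlink and uplink parts. The function name `slot_schedule` and the parameterization of τ[t] as a fraction `tau_frac` of the slot are illustrative assumptions, not part of the original model description.

```python
def slot_schedule(T: float, N: int, tau_frac: float):
    """Split the mission into N slots and one hovering slot into DL/UL parts.

    tau_frac (assumed parameterization) is the share of a slot used for the
    downlink (energy transfer + instructions); the rest is the uplink.
    Returns (slot_length, downlink_duration, uplink_duration).
    """
    if T <= 0 or N <= 0:
        raise ValueError("T and N must be positive")
    if not 0.0 <= tau_frac <= 1.0:
        raise ValueError("tau_frac must lie in [0, 1]")
    slot = T / N                  # t = T / N
    downlink = tau_frac * slot    # tau[t]: UAV -> nodes, energy + instructions
    uplink = slot - downlink      # t - tau[t]: cluster head -> UAV
    return slot, downlink, uplink


if __name__ == "__main__":
    # Illustrative numbers only: a 200 s mission split into 400 slots.
    print(slot_schedule(T=200.0, N=400, tau_frac=0.3))
```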

Transmission Queue Model.
In the urban monitoring network, the monitoring nodes are indexed by m ∈ {1, ⋯, M}, and the location of node m is [x_m, y_m]. For monitoring node m, λ_m(t) denotes the data generation rate of node m during the monitoring task in time slot t. Because of differences in deployment location and hardware, λ_m(t) differs across nodes. We assume that data generation at each node follows a Poisson process with rate λ_m(t), which remains constant during the monitoring task. b_m(t) denotes the length of the data waiting to be uploaded in the data transmission queue of monitoring node m at time slot t. At the beginning of each time slot, the data transmission queue of monitoring node m is updated, with b_m(t) ∈ [0, b_m^max], where b_m^max is the maximum storage capacity of the data transmission queue. We assume that b^max is the same for all monitoring nodes. When b_m(t) exceeds b_m^max, newly collected data that cannot be placed in the buffer are discarded, resulting in data overflow.
For energy transmission, δ_m(t) denotes the remaining battery energy of monitoring node m at time slot t, and υ_m(t) denotes the node's energy consumption rate at time slot t, which varies with hardware and deployment location. At the beginning of each time slot t, the energy state of the monitoring node is updated, with δ_m(t) ∈ [0, δ_m^max], where δ_m^max is the maximum capacity of the energy queue. We assume that δ_m^max is the same for all monitoring nodes. When δ_m(t) ≤ 0, the monitoring node has run out of energy and an energy hole occurs.
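The exact queue-update equations are not reproduced in this text, so the sketch below assumes a common clipped-queue form: the data backlog first drains by the amount uploaded, then grows by Poisson arrivals, and anything above b^max is discarded as overflow; the battery loses the energy consumed in the slot, gains harvested energy, and saturates at δ^max. The function names, the update order, and the use of Knuth's Poisson sampler are assumptions for illustration.

```python
import math
import random


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's method; adequate for the small per-slot means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def step_node(b, delta, arrivals, uploaded, harvested, consumed, b_max, delta_max):
    """One slot of the assumed data/energy queue update for a single node.

    Returns (new_b, new_delta, overflow_amount, energy_hole_flag).
    """
    backlog = max(b - uploaded, 0) + arrivals              # drain, then new data
    overflow = max(backlog - b_max, 0)                     # data that no longer fits
    new_b = min(backlog, b_max)
    energy = min(delta - consumed + harvested, delta_max)  # battery saturates
    hole = energy <= 0                                     # node ran out of energy
    return new_b, max(energy, 0.0), overflow, hole


if __name__ == "__main__":
    rng = random.Random(1)
    b, delta = 0, 1.0
    lam, upsilon, slot = 4.0, 0.05, 0.5   # illustrative values only
    for _ in range(5):
        arrivals = poisson_sample(lam * slot, rng)
        b, delta, lost, hole = step_node(b, delta, arrivals, uploaded=1, harvested=0.0,
                                         consumed=upsilon * slot, b_max=10, delta_max=1.0)
        print(b, round(delta, 3), lost, hole)
```

Keeping the random sampling outside `step_node` makes the update itself deterministic and easy to test.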

Channel Model.
Because the study area is an urban area with many buildings, the free-space propagation model no longer applies. We therefore adopt a probabilistic channel model that combines line-of-sight (LOS) and non-line-of-sight (NLOS) links. Under this model, L_k(t) denotes the path loss, following [19], where γ_0 = (4πf_c/c)^(−2) denotes the channel power gain at the reference distance of 1 m, f_c is the carrier frequency, and c is the speed of light. In addition, d_m(t) is the distance between the UAV and the target node, α is the path loss exponent (so the distance-dependent attenuation is d_m^(−α)), and μ_NLOS is the attenuation coefficient of the NLOS link.

Wireless Communications and Mobile Computing
We define P^LOS as the LOS probability. For monitoring node m at time slot t, the LOS probability [20] is a function P^LOS(θ_m(t)) of the elevation angle, parameterized by constants a and b that depend on the carrier frequency and the type of environment; it is therefore also influenced by the relative position of the communicating nodes. θ_m(t) is the elevation angle between the UAV and the target monitoring node,
$$\theta_m(t) = \arcsin\left(\frac{H}{d_m(t)}\right),$$
where d_m(t) is the distance between the UAV and the monitoring node,
$$d_m(t) = \sqrt{H^2 + (x_u(t) - x_m)^2 + (y_u(t) - y_m)^2},$$
and [x_u(t), y_u(t)] is the horizontal position of the UAV at time slot t.
The NLOS probability is P^NLOS(θ_m(t)) = 1 − P^LOS(θ_m(t)). The downlink and uplink channel power gains of the link between the UAV and target monitoring node m are denoted h_m(t) and g_m(t), respectively. We assume that the uplink and downlink channels of the UAV are approximately the same, i.e., h_m(t) ≈ g_m(t). The channel power gain between the UAV and the target node is then determined by this probabilistic LOS/NLOS path loss model.
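The following sketch evaluates the elevation angle, an LOS probability, and an average channel power gain for one UAV-node pair. It assumes the widely used sigmoid LOS model P^LOS = 1/(1 + a·exp(−b(θ − a))) with θ in degrees, and it assumes the average gain γ_0 d^(−α)(P^LOS + μ_NLOS P^NLOS); the exact expressions of [19,20] may differ. The constants a = 9.61 and b = 0.16 are values commonly quoted for urban environments, and the remaining parameters are illustrative.

```python
import math

C = 3e8  # speed of light, m/s


def elevation_and_distance(H, dx, dy):
    """Elevation angle (degrees) and 3-D distance from a ground node to the UAV."""
    d = math.sqrt(H ** 2 + dx ** 2 + dy ** 2)
    return math.degrees(math.asin(H / d)), d


def p_los(theta_deg, a=9.61, b=0.16):
    """Assumed sigmoid LOS probability; a, b depend on the environment."""
    return 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))


def avg_channel_gain(H, dx, dy, fc=2e9, alpha=2.0, mu_nlos=0.2):
    """Assumed LOS/NLOS-averaged channel power gain (linear scale)."""
    gamma0 = (4 * math.pi * fc / C) ** -2      # gain at the 1 m reference distance
    theta, d = elevation_and_distance(H, dx, dy)
    pl = p_los(theta)
    return (pl + (1.0 - pl) * mu_nlos) * gamma0 * d ** (-alpha)


if __name__ == "__main__":
    for dx in (0.0, 50.0, 200.0):
        print(dx, avg_channel_gain(H=100.0, dx=dx, dy=0.0))
```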

The UAV is controlled through its flight speed v(t) and yaw angle θ(t), where v(t) ∈ [0, v_max] and θ(t) ∈ [−π, π]. Following [21], the propulsion power consumption of the UAV at speed V is
$$P(V) = P_0\left(1 + \frac{3V^2}{U_{tip}^2}\right) + P_i\left(\sqrt{1 + \frac{V^4}{4v_0^4}} - \frac{V^2}{2v_0^2}\right)^{1/2} + \frac{1}{2} d_0 \rho s A V^3,$$
where P_0 is the blade profile power during hovering and U_tip is the tip speed of the rotor blade. P_i and v_0 denote the induced power and the mean rotor induced velocity in hover. In the parasite power term, d_0, ρ, s, and A denote the fuselage drag ratio, air density, rotor solidity, and rotor disc area, respectively. The propulsion power of the UAV thus comprises blade profile power, induced power, and parasite power. Setting V = 0 gives the hovering power consumption
$$P(0) = P_0 + P_i.$$
In summary, the flight energy consumption of the UAV during time slot t is
$$E^{fly}(t) = P(v(t))\,t.$$
The communication energy consumption of the UAV mainly accounts for the transmission energy spent on charging the monitoring nodes. In practice, the energy transmission range of the UAV is limited, so energy transmission and data collection are performed only for the monitoring nodes within the UAV's coverage during the hovering phase.
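The rotary-wing power model of [21] can be evaluated directly. The parameter values below are illustrative ones often used with this model in the literature and are assumptions here, not the simulation settings of this paper; `flight_energy` multiplies the power by the slot length t.

```python
import math

# Illustrative rotary-wing parameters (assumed, not this paper's settings).
P0, PI_HOVER = 79.86, 88.63              # blade profile / induced power in hover, W
U_TIP, V0 = 120.0, 4.03                  # rotor tip speed, mean induced velocity, m/s
D0, RHO, S, A = 0.6, 1.225, 0.05, 0.503  # drag ratio, air density, solidity, disc area


def propulsion_power(V: float) -> float:
    """Propulsion power P(V) of a rotary-wing UAV at horizontal speed V (m/s)."""
    blade = P0 * (1 + 3 * V ** 2 / U_TIP ** 2)
    induced = PI_HOVER * math.sqrt(math.sqrt(1 + V ** 4 / (4 * V0 ** 4)) - V ** 2 / (2 * V0 ** 2))
    parasite = 0.5 * D0 * RHO * S * A * V ** 3
    return blade + induced + parasite


def flight_energy(v: float, slot: float) -> float:
    """Energy spent in one slot of length `slot` while flying at speed v."""
    return propulsion_power(v) * slot


if __name__ == "__main__":
    for v in (0.0, 10.0, 20.0, 30.0):
        print(v, round(propulsion_power(v), 2))
```

At V = 0 the blade and induced terms reduce to P_0 and P_i and the parasite term vanishes, which reproduces the hovering power P(0) = P_0 + P_i.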
The UAV moves to the target node position by controlling its flight speed v(t) and yaw angle θ(t). Within the subslot τ[t] of time slot t, it transmits an RF signal to the monitoring nodes at transmit power p_d(t) ∈ [0, p_d^max]. During τ[t], all monitoring nodes within the UAV's energy transmission coverage are charged, and the RF power received at monitoring node m is
$$P^r_m(t) = p_d(t)\,h_m(t).$$
To better reflect practical conditions, a nonlinear energy transfer model [22] is adopted as the air-to-ground energy transfer model in this paper. Compared with the linear model, the nonlinear model accounts for the saturation of the harvesting circuit and is therefore more general and practical. Under the RF-EH model [23], the actual harvested power P^h_m(t) of the node is a saturating function of P^r_m(t), where P_limit is the maximum output direct current power and c and d are constants related to the circuit characteristics.
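The saturating behaviour of the RF-EH model can be sketched with the logistic form widely used for nonlinear energy harvesting, in which P_limit, c, and d play the roles described above. This exact form and the values P_limit = 24 mW, c = 150, and d = 0.014 (often quoted for this model) are assumptions for illustration; the received power is taken as p_d(t)h_m(t).

```python
import math


def received_power(p_d: float, h: float) -> float:
    """RF power at the node: transmit power times channel power gain."""
    return p_d * h


def harvested_power(p_r: float, p_limit=0.024, c=150.0, d=0.014) -> float:
    """Assumed logistic nonlinear EH model; saturates at p_limit (watts)."""
    omega = 1.0 / (1.0 + math.exp(c * d))          # offset so that 0 in -> 0 out
    logistic = p_limit / (1.0 + math.exp(-c * (p_r - d)))
    return (logistic - p_limit * omega) / (1.0 - omega)


def harvested_energy(p_d, gains, tau):
    """Energy harvested by all covered nodes during the downlink subslot tau."""
    return sum(harvested_power(received_power(p_d, h)) for h in gains) * tau


if __name__ == "__main__":
    for p_r in (0.0, 0.005, 0.014, 0.05, 1.0):
        print(p_r, harvested_power(p_r))
```

Unlike a linear model with a fixed efficiency, the output here flattens out near P_limit no matter how large the input power becomes.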

Network Clustering Model.
In the urban monitoring network scenario, the monitoring nodes are numerous and densely distributed. Having the UAV visit every monitoring node in turn for energy transmission and data collection would consume a great deal of energy. In addition, monitoring nodes that do not receive timely data collection and energy transmission services would suffer serious energy holes and data loss. We therefore divide the problem into two parts: cluster head election and resource allocation. After the nodes are clustered, a suitable monitoring node in each cluster is selected as the cluster head; it collects the data from the monitoring nodes within its cluster and forwards it to the UAV. In this paper, k-means is chosen as the clustering algorithm, because neighboring monitoring nodes tend to be similar in data generation rate and energy consumption rate.
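A minimal k-means (Lloyd's algorithm) sketch in pure Python follows. It clusters nodes on their 2-D positions only and picks as cluster head the member closest to each centroid; both choices are assumptions, since the paper also considers data and energy characteristics and does not specify the head-selection rule in this passage. Additional per-node features could be appended to each point tuple after suitable scaling.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on tuples of equal length; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize at k distinct nodes
    labels = None
    for _ in range(iters):
        # Assignment step: each node joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:               # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def cluster_heads(points, labels, centroids):
    """Hypothetical head rule: index of the member nearest to each centroid."""
    heads = {}
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            heads[j] = min(members, key=lambda i: math.dist(points[i], c))
    return heads


if __name__ == "__main__":
    nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    labels, cents = kmeans(nodes, k=2)
    print(labels, cluster_heads(nodes, labels, cents))
```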
The M monitoring nodes are divided into k clusters by the clustering algorithm, and each cluster corresponds to a node subset M_k. Within a cluster, each node transmits its monitoring data to cluster head node k. During the hovering phase, the UAV transfers energy to all nodes in the cluster, and the cluster head uploads monitoring data via the uplink during the time t − τ[t]. In the data upload phase, the uplink transmit power P^u_k(t) of the cluster head node depends on the total energy harvested during τ[t]; that is, P^u_k(t) increases with the actual harvested power P^h_k(t) [24], scaled by a constant ζ that denotes the energy conversion efficiency. The upload data rate of cluster head node k is
$$R_k(t) = \log_2\left(1 + \frac{P^u_k(t)\,g_k(t)}{\sigma_k^2}\right),$$
where σ_k^2 denotes the noise power at the UAV and g_k(t) denotes the channel power gain between the UAV and cluster head node k at time slot t.
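The uplink side can be sketched as follows. The rate uses the Shannon-type expression log2(1 + P^u g/σ²) in bit/s/Hz, scaled here by a bandwidth `bandwidth_hz` that this passage does not specify. The harvest-then-transmit rule in `uplink_power`, which spends a fraction ζ of the energy harvested during τ[t] evenly over the uplink subslot, is a hypothetical instance of the stated positive relationship between P^u_k(t) and P^h_k(t), not the paper's exact expression.

```python
import math


def uplink_power(p_harvest, zeta, tau, slot):
    """Hypothetical rule: zeta * (energy harvested in tau) spread over slot - tau."""
    if not 0.0 <= tau < slot:
        raise ValueError("need 0 <= tau < slot")
    return zeta * p_harvest * tau / (slot - tau)


def uplink_rate(p_up, g, noise_power):
    """Spectral efficiency of the cluster-head uplink, bit/s/Hz."""
    return math.log2(1.0 + p_up * g / noise_power)


def uplink_bits(p_up, g, noise_power, tau, slot, bandwidth_hz):
    """Bits delivered during the uplink subslot of length slot - tau."""
    return bandwidth_hz * uplink_rate(p_up, g, noise_power) * (slot - tau)


if __name__ == "__main__":
    p_up = uplink_power(p_harvest=0.01, zeta=0.8, tau=0.2, slot=1.0)  # illustrative
    print(p_up, uplink_bits(p_up, g=1e-6, noise_power=1e-10, tau=0.2, slot=1.0, bandwidth_hz=1e6))
```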
At each time slot, the UAV selects a cluster head node as the target of its next service. If the target node is still the current node, the UAV keeps hovering for the next time slot; if the target changes, the UAV enters the flight state and moves to the target position by deciding its flight speed and yaw angle. Target selection accounts for the service priority of each node, which combines a data collection priority with the distance to the node; the data collection priority is based on the data queue length and data generation rate of the monitoring node. The service priority Q^d_k(t) of cluster head node k at time slot t is defined from these quantities, where d^u_k(t) denotes the relative distance between node k and the UAV. The priority of a node therefore depends not only on its data queue occupancy ratio but also on its data growth rate and its distance to the UAV. The data growth rate serves as a prediction of the future data queue. The relative distance is included so that the UAV serves the whole area as evenly as possible rather than only one part of it; it also accounts for the flight the UAV must make before it can provide service.
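Because the exact form of Q^d_k(t) is not reproduced here, the following sketch uses a hypothetical weighted score built from the three ingredients named above: the queue occupancy ratio b_k/b^max, the normalized data generation rate λ_k, and the normalized UAV-node distance. The weights, the normalizers, and the sign of the distance term (closer nodes preferred) are assumptions; the target is then chosen as the node with the highest score.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterHead:
    idx: int
    backlog: float      # b_k(t)
    capacity: float     # b^max
    rate: float         # lambda_k, data generation rate
    x: float
    y: float


def priority(node, uav_xy, rate_ref, dist_ref, w=(0.5, 0.3, 0.2)):
    """Hypothetical service priority: queue ratio + growth rate - distance."""
    queue_ratio = node.backlog / node.capacity
    growth = node.rate / rate_ref
    dist = math.hypot(node.x - uav_xy[0], node.y - uav_xy[1]) / dist_ref
    return w[0] * queue_ratio + w[1] * growth - w[2] * dist


def select_target(nodes, uav_xy, rate_ref=1.0, dist_ref=100.0):
    """Target cluster head: argmax of the (hypothetical) priority score."""
    return max(nodes, key=lambda n: priority(n, uav_xy, rate_ref, dist_ref)).idx


if __name__ == "__main__":
    heads = [ClusterHead(0, 9, 10, 1.0, 100.0, 0.0), ClusterHead(1, 1, 10, 1.0, 0.0, 0.0)]
    print(select_target(heads, uav_xy=(0.0, 0.0)))
```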
3.6. Problem Formulation. For the UAV-assisted urban monitoring network, we develop a dynamic resource allocation strategy based on the antenna switching structure that guarantees network performance by considering three aspects: data collection demand, energy transmission demand, and UAV energy consumption. The data collection demand is reflected in the total amount of data the UAV collects from the monitoring nodes, the energy transmission demand in the total energy harvested by the monitoring nodes, and the UAV energy consumption in the total communication and propulsion energy consumed by the UAV during the service period.
After clustering, the monitoring nodes within a cluster send the data cached in their transmission queues to the cluster head node via single-hop or multihop links. During time slot t, while hovering, the UAV collects the data of the monitoring nodes through cluster head node k. The amount of data collected in time slot t is determined by the upload rate R_k(t) over the uplink subslot t − τ[t], and the total data collected during the service period T is the sum of these amounts over all N time slots. During time slot t, the UAV also sends data collection instructions and transfers energy to cluster head node k and to the other monitoring nodes within its coverage. While the UAV hovers at cluster head node k, the energy transferred in time slot t is the energy harvested by these nodes during the downlink subslot τ[t], and the total energy transferred during the service period T is again the sum over all time slots. Depending on its state in time slot t, the UAV's energy consumption is either flight energy or hovering energy; the hovering energy comprises the UAV's hovering propulsion energy and the total downlink transmission energy. The energy consumption of the UAV in the flight state is E_uav(t) = P(v)t, and the total energy consumption of the UAV during the service period is the sum of its per-slot consumption [23]. To limit data overflow and energy holes, we also impose practical thresholds. Data overflow is considered to occur when the amount of data at a monitoring node exceeds the threshold αb^max, where α is a constant; we therefore require b_k(t) ≤ αb^max. Similarly, an energy hole is considered to occur when the energy of a monitoring node falls below βδ^max, where β is a constant; we therefore require δ_k(t) ≥ βδ^max.
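The per-slot bookkeeping described above can be sketched as follows. The sketch assumes that data collected equals bandwidth × spectral efficiency × uplink time, that transferred energy is the sum of the covered nodes' harvested powers over the downlink subslot τ[t], and that hovering energy is the hover power over the slot plus p_d(t)τ[t]. These forms, the function names, and the default threshold constants are illustrative assumptions; totals over the service period are plain sums over the N slots.

```python
def slot_accounting(hovering, p_fly, p_hover, p_d, slot, tau, rate, bandwidth_hz, harvested_powers):
    """Return (data_collected, energy_transferred, uav_energy) for one slot."""
    if not hovering:
        return 0.0, 0.0, p_fly * slot          # flying: no service, E = P(v) t
    data = bandwidth_hz * rate * (slot - tau)  # uplink subslot
    energy = sum(harvested_powers) * tau       # downlink subslot, all covered nodes
    uav_energy = p_hover * slot + p_d * tau    # hover propulsion + RF transmission
    return data, energy, uav_energy


def threshold_violations(b, b_max, delta, delta_max, alpha=0.9, beta=0.1):
    """(overflow, energy_hole) flags for the thresholds alpha*b_max and beta*delta_max."""
    return b > alpha * b_max, delta < beta * delta_max


def totals(per_slot):
    """Sum per-slot (data, energy, uav_energy) tuples over the service period."""
    return tuple(sum(vals) for vals in zip(*per_slot))


if __name__ == "__main__":
    slots = [slot_accounting(False, 200.0, 168.0, 10.0, 0.5, 0.1, 2.0, 1e6, []),
             slot_accounting(True, 200.0, 168.0, 10.0, 0.5, 0.1, 2.0, 1e6, [0.01, 0.02])]
    print(totals(slots))
```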
The optimization objectives are to maximize the total amount of data collected and energy transferred in the urban monitoring network while minimizing the energy consumption of the UAV. When planning the flight trajectory and hovering positions of the UAV, both the status of the monitoring nodes and the energy consumption of the UAV must be considered, and data overflow and energy holes at the monitoring nodes should be avoided as far as possible. The UAV serves the monitoring nodes in order of priority, i.e., node k = argmax_k Q^d_k(t) is selected as the UAV's target node at time slot t.
By jointly optimizing the UAV flight strategy and resource allocation strategy so as to maximize uplink data collection and downlink energy transfer while minimizing UAV energy consumption, we define the multiobjective optimization problem over the UAV flight speed v(t), yaw angle θ(t), downlink subslot τ[t], and transmit power p_d(t), subject to constraints C1 to C5. Here, C1 and C2 are the flight speed and yaw angle constraints of the UAV (v(t) ∈ [0, v_max] and θ(t) ∈ [−π, π]), C3 is the UAV transmit power constraint (p_d(t) ∈ [0, p_d^max]), and C4 and C5 are the data queue and energy queue constraints of the monitoring nodes (b_k(t) ≤ αb^max and δ_k(t) ≥ βδ^max), respectively.

UAV-Assisted Urban Monitoring Network Resource Allocation Algorithm
In this section, we first analyze the multiobjective optimization problem. Because the problem is complex to solve directly, we adopt a deep reinforcement learning (DRL)-based approach, which is presented in detail below.

Problem Analysis.
Maximizing the data collection volume mainly depends on the allocation of time slots and transmit power while the UAV hovers at the target monitoring node; that is, optimizing the time split and power can increase the amount of data uploaded by the current target node. However, allocating more resources increases the energy consumption of the UAV. Energy transfer likewise depends on the time slot and power allocated during hovering: allocating more time and transmit power delivers more energy to the monitoring nodes but again raises the UAV's energy consumption. The three optimization objectives therefore conflict. Finding the best hovering positions, flight trajectory, and resource allocation scheme for the UAV is complex and incurs a large computational cost. In addition, because the environment is partially observable, traditional model-based approaches such as dynamic programming cannot solve this problem. The deep Q-network (DQN) algorithm is also unsuitable, because it requires a discrete action space whereas the UAV's decision variables are continuous. The deep deterministic policy gradient (DDPG) algorithm [25], a classical DRL algorithm, has been shown to learn effective policies in continuous action spaces from low-dimensional observations [26]. DDPG supports end-to-end learning and has shown considerable potential for complex network optimization. Therefore, to reduce computational complexity and cost, DDPG is applied to the UAV decision problem.
Because the original DDPG algorithm uses a scalar reward, we propose a multiobjective joint DDPG (MJDDPG) algorithm for data collection and energy transmission in UAV-assisted urban monitoring networks. MJDDPG extends the reward to a multidimensional vector according to the multiobjective optimization problem and describes the preferences among the optimization objectives through weight parameters.

State Space.
In the UAV-assisted urban monitoring network, the state space is determined jointly by the monitoring nodes, the UAV, and environmental information. At time slot t, [d^x_n(t), d^y_n(t)] is the relative distance between the target monitoring node and the UAV in Cartesian coordinates. After the UAV finishes serving the current target node, it selects a new target node based on the current system state, and the relative distance guides the UAV to bring the target node within its data collection coverage. N_f(t) denotes the cumulative number of times the UAV has left the restricted area during the mission up to time slot t; accounting for the absolute position of the UAV helps prevent it from flying out of the specified area and wasting resources. Considering the data queue status, energy status, and pending data size of the nodes also helps the UAV make effective decisions and provide timely service. The state therefore comprises these quantities, where N_d(t) denotes the number of nodes with data overflow, N_e(t) denotes the number of nodes with depleted energy, and C_n(t) denotes the amount of data to be uploaded at the current target node. N_f(t), N_d(t), and N_e(t) are all 0 at t = 0.
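One way to assemble the state described above into a fixed-length observation vector is sketched below. The ordering, the inclusion of the normalized absolute UAV position, and the normalization by the side length of the service area (`area_side`) and by `backlog_ref` are hypothetical choices made for illustration.

```python
def build_state(uav_xy, target_xy, n_fly_out, n_overflow, n_depleted, target_backlog,
                area_side=1000.0, backlog_ref=1.0):
    """Hypothetical observation vector for the actor and critic networks."""
    dx = (target_xy[0] - uav_xy[0]) / area_side     # d^x_n(t)
    dy = (target_xy[1] - uav_xy[1]) / area_side     # d^y_n(t)
    return [
        dx, dy,
        uav_xy[0] / area_side, uav_xy[1] / area_side,   # absolute UAV position
        float(n_fly_out),                               # N_f(t)
        float(n_overflow),                              # N_d(t)
        float(n_depleted),                              # N_e(t)
        target_backlog / backlog_ref,                   # C_n(t)
    ]


if __name__ == "__main__":
    print(build_state((100.0, 200.0), (600.0, 200.0), 0, 2, 1, 5.0))
```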

Action Space.
The UAV selects actions by observing the state of the environment in real time. In this paper, the UAV acts as the agent and must map the state space to a continuous action space. In the UAV-assisted urban monitoring network scenario, multiobjective optimization is achieved by jointly optimizing the UAV flight decisions, time slot allocation, and power allocation. Based on the observed environmental state, the action selected by the UAV at time slot t consists of its flight speed v(t), yaw angle θ(t), subslot allocation τ(t), and transmit power p_d(t), all of which are continuous variables. Thus, the action the agent takes at time slot t can be expressed as
$$a(t) = [v(t), \cos(\theta(t)), \sin(\theta(t)), \tau(t), p_d(t)],$$
where the yaw of the UAV is represented by [cos(θ(t)), sin(θ(t))] and τ(t) denotes the proportion of a single time slot allocated to downlink energy transmission.
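DDPG actors commonly end in a tanh layer, so one natural way to realize the continuous action above is to rescale five outputs in [−1, 1] to the physical ranges v(t) ∈ [0, v_max], θ(t) ∈ [−π, π], τ(t) ∈ [0, 1], and p_d(t) ∈ [0, p_d^max]. The sketch below recovers θ(t) from the (cos θ, sin θ) pair with atan2; the affine rescaling, the clipping, and the function name are assumptions for illustration.

```python
import math


def map_action(raw, v_max, p_max):
    """Map 5 actor outputs in [-1, 1] to (v, theta, tau, p_d).

    raw = [speed, cos-like, sin-like, time split, power]; values are clipped
    first so that the result always satisfies the box constraints.
    """
    if len(raw) != 5:
        raise ValueError("expected 5 action components")
    a = [max(-1.0, min(1.0, z)) for z in raw]
    v = (a[0] + 1.0) / 2.0 * v_max             # speed in [0, v_max]
    theta = math.atan2(a[2], a[1])             # yaw in [-pi, pi]
    tau = (a[3] + 1.0) / 2.0                   # downlink share of the slot
    p_d = (a[4] + 1.0) / 2.0 * p_max           # transmit power in [0, p_max]
    return v, theta, tau, p_d


if __name__ == "__main__":
    print(map_action([0.2, 0.6, -0.8, 0.0, 0.5], v_max=20.0, p_max=1.0))
```

Representing the yaw by a (cos, sin) pair avoids the discontinuity at ±π that a single angle output would have.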

Reward.
In reinforcement learning, the reward function quantitatively evaluates the action taken by an agent, and a suitable reward function is particularly important for the performance of deep reinforcement learning algorithms. Here, a well-designed reward function helps the UAV learn its control policy. Network performance is improved by optimizing the amount of data collected, the energy transferred, and the energy consumption of the UAV while safeguarding the overall quality of the network. The reward is defined as a multidimensional vector,
$$\mathbf{r}(t) = [r_{dc}(t), r_{eh}(t), r_{ec}(t), r_{aux}(t)],$$
where r_dc(t), r_eh(t), and r_ec(t) are the rewards for the optimization objectives and r_aux(t) is a penalty term. The objective rewards are determined by the service the UAV provides to the monitoring nodes in time slot t: D_k(t) denotes the reward value corresponding to the total amount of data collected by the UAV while hovering at monitoring node k, so that more data collected yields a larger reward, and E_k(t) denotes the reward value of the energy transferred by the UAV while hovering at monitoring node k.
The more energy transferred in total, the larger the reward. E^uav_k(t) denotes the total energy consumption of the UAV in time slot t. If the UAV is flying, E^uav_k(t) = P(v(t))t; if it is hovering, E^uav_k(t) comprises the hovering energy consumption and the communication energy consumption, i.e., the hovering propulsion energy plus the downlink transmission energy. When the target monitoring node is within the UAV's data collection coverage radius, the UAV hovers, collects data, and transfers energy; otherwise, it is in the flight phase, during which r_dc(t) and r_eh(t) are 0. In the penalty term r_aux(t), the first two terms depend on the distance between the UAV and the target monitoring node: the farther the UAV is from the target, the smaller (more negative) the penalty term. This helps the UAV locate and approach the target node. In addition, the UAV receives a negative reward if it flies out of the service area, and likewise if untimely data collection causes data overflow or energy depletion at a monitoring node. Penalizing incorrect flight decisions in this way motivates the UAV to learn a correct flight strategy and guarantees its quality of service.
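A sketch of the vector reward follows. Consistent with the description above, r_dc and r_eh are zero during flight and r_ec reflects UAV energy consumption; the negative sign on r_ec, the linear distance penalty, and the penalty coefficients `kappa` are hypothetical choices, since the exact penalty function is not reproduced here.

```python
def reward_vector(hovering, data, energy, uav_energy, dx, dy,
                  flew_out, n_overflow, n_depleted, kappa=(1e-3, 1.0, 1.0, 1.0)):
    """Return [r_dc, r_eh, r_ec, r_aux] for one slot (hypothetical shaping)."""
    r_dc = data if hovering else 0.0        # no data collected while flying
    r_eh = energy if hovering else 0.0      # no energy transferred while flying
    r_ec = -uav_energy                      # consumption is treated as a cost
    r_aux = (-kappa[0] * (abs(dx) + abs(dy))    # farther from target -> more negative
             - kappa[1] * float(flew_out)       # left the service area
             - kappa[2] * n_overflow            # nodes with data overflow
             - kappa[3] * n_depleted)           # nodes with depleted energy
    return [r_dc, r_eh, r_ec, r_aux]


if __name__ == "__main__":
    print(reward_vector(True, 1.5, 0.2, 170.5, 0.0, 0.0, False, 0, 0))
```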

MJDDPG-Based Resource Allocation Algorithm.
The MJDDPG architecture is shown in Figure 3. The original DDPG solves a single-objective MDP with a scalar reward signal, whereas in MJDDPG the reward in each experience tuple is a vector. Since the value of an action depends on the preference among competing objectives, the reward vector is scalarized by linear weighting, $r = \mathbf{r}\mathbf{w}^{T}$, where $\mathbf{w}$ is the weight vector. Notably, this design makes the MJDDPG algorithm applicable to optimization problems with an arbitrary number of objectives. Each weight parameter is selected in the interval $[0.0, 1.0]$ according to the importance of the corresponding subobjective. The pseudocode of the proposed MJDDPG-based resource allocation algorithm is given in Algorithm 1.
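The two ingredients that distinguish MJDDPG from DDPG here, vector rewards in the replay buffer and linear scalarization $r = \mathbf{r}\mathbf{w}^{T}$, can be sketched as below. This is a hypothetical illustration, not Algorithm 1: the class and function names are assumptions, the critic target uses the standard DDPG form $y = r + \gamma(1 - d)\,Q'(s', \mu'(s'))$ with the target-network value supplied as `next_q`, and training is gated on a full buffer as described in the convergence analysis. A weight for $r_{aux}$ is also assumed, since only $w_{dc}$, $w_{eh}$, and $w_{ec}$ are specified.

```python
import random
from collections import deque

import numpy as np


class VectorReplayBuffer:
    """Replay buffer whose tuples carry a reward vector (one entry per objective)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = deque(maxlen=capacity)

    def add(self, s, a, r_vec, s_next, done):
        self.data.append((s, a, np.asarray(r_vec, dtype=float), s_next, float(done)))

    def is_full(self) -> bool:
        # Sampling for training starts only once the buffer is full.
        return len(self.data) == self.capacity

    def sample(self, batch_size: int):
        batch = random.sample(list(self.data), batch_size)
        s, a, r, s2, d = zip(*batch)
        return np.array(s), np.array(a), np.stack(r), np.array(s2), np.array(d)


def scalarize(r_vec: np.ndarray, w) -> np.ndarray:
    """Linear scalarization r = r_vec @ w, applied row-wise to a batch [B, M]."""
    w = np.asarray(w, dtype=float)
    if np.any((w < 0.0) | (w > 1.0)):
        raise ValueError("every weight must lie in [0.0, 1.0]")
    return r_vec @ w


def td_targets(r_vec, w, next_q, done, gamma=0.99):
    """Critic target: weighted reward + gamma * (1 - done) * Q'(s', mu'(s'))."""
    return scalarize(r_vec, w) + gamma * (1.0 - done) * next_q
```

Because the weights are applied only when targets are computed, the same stored experience could in principle be reused under different preference vectors $\mathbf{w}$.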

Simulation Results and Performance Analysis
In this section, the proposed algorithm is evaluated by simulation and its performance is analyzed to verify the superiority of the proposed dynamic resource allocation algorithm in urban monitoring networks. The MJDDPG-based resource allocation algorithm is compared with several baseline algorithms, which are selected to represent different resource allocation strategies:
(1) Random Resource Allocation (RRA). In this algorithm, the UAV flight decision and the resource allocation, i.e., the hovering time slot division and the transmit power, are randomized.
(2) Constant Transmit Power (CTP). In this algorithm, the transmit power of the UAV is constant, and the resource allocation is achieved by optimizing the UAV flying decision and hovering time slot.
(3) Constant Slot Allocation (CSA). In this algorithm, the hovering time slot allocation ratio of the UAV is constant, and the resource allocation is achieved by optimizing the UAV flying decision and transmitting power.
(4) Dynamic Resource Allocation Algorithm for Single-Objective Optimization. Taking data collection volume, energy transmission volume, or UAV energy consumption, respectively, as the sole optimization objective, the UAV flight decision and resource allocation are implemented based on DDPG.
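One way to see how baselines (1) to (3) differ is to treat each as a rule that pins the action components it does not optimize. The sketch below assumes a hypothetical action made of a heading, a speed, a hovering time-slot ratio, and a transmit power; the names `UavAction` and `apply_baseline` and the constant values `const_power` and `const_ratio` are illustrative assumptions. Only $v_{max}$ = 20 m/s and the 40 dBm (10 W) power limit come from the simulation settings.

```python
import math
import random
from dataclasses import dataclass, replace

V_MAX = 20.0      # maximum UAV speed (m/s), from the simulation settings
P_MAX_W = 10.0    # 40 dBm maximum transmit power expressed in watts


@dataclass(frozen=True)
class UavAction:
    heading: float     # flight direction (rad)
    speed: float       # flight speed (m/s)
    slot_ratio: float  # hovering time-slot allocation ratio in [0, 1]
    power: float       # transmit power (W)


def random_action(rng: random.Random) -> UavAction:
    """RRA: every decision is drawn uniformly at random."""
    return UavAction(rng.uniform(0.0, 2 * math.pi), rng.uniform(0.0, V_MAX),
                     rng.uniform(0.0, 1.0), rng.uniform(0.0, P_MAX_W))


def apply_baseline(policy_action: UavAction, baseline: str, rng: random.Random,
                   const_power: float = P_MAX_W / 2,
                   const_ratio: float = 0.5) -> UavAction:
    """Replace the components that a baseline does not optimize."""
    if baseline == "RRA":
        return random_action(rng)
    if baseline == "CTP":   # constant transmit power
        return replace(policy_action, power=const_power)
    if baseline == "CSA":   # constant hovering slot allocation
        return replace(policy_action, slot_ratio=const_ratio)
    if baseline == "MJDDPG":
        return policy_action
    raise ValueError(f"unknown baseline: {baseline}")
```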
The above algorithms are compared on the three optimization objectives, namely, network throughput (data collection), energy transmission, and UAV energy consumption, and the effectiveness and superiority of the proposed algorithm are verified for different node densities.

Simulation Settings.
In the UAV-assisted urban monitoring network scenario, the network coverage area is assumed to be a 400 m × 400 m square, the UAV flying height is H = 10 m, and the number of ground monitoring nodes is N = 100. The maximum UAV flying speed is $v_{max}$ = 20 m/s [27]; the single service period of the UAV is T = 600 s, and the UAV starts each service period from a random location in the area. The coverage radius of the UAV is D = 20 m. The node energy conversion factor is $\zeta$ = 0.5, and the maximum UAV transmit power is $p_d$ = 40 dBm. The data cache of each monitoring node is updated at a frequency of $f_d$ = 1 update/s; the expected data generation rate is set to 4, 6, 10, and 18; the Poisson-process expectation of the energy loss rate is set to 0.1, 0.2, 0.4, and 0.6; the maximum capacity of the data buffer is $b_{max}$ = 5000; and the maximum capacity of the energy transmission queue is $\delta_{max}$ = 600. The remaining system parameters are listed in Table 1 [21].
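The simulation parameters listed above can be collected into a single configuration object, as in the sketch below. The class and field names are hypothetical; the values are those stated in the text, and the parameters in Table 1 are not reproduced. The helper also shows the standard conversion from dBm to watts, under which $p_d$ = 40 dBm corresponds to 10 W.

```python
from dataclasses import dataclass


def dbm_to_watts(p_dbm: float) -> float:
    """Convert power from dBm to watts: P[W] = 10 ** ((P[dBm] - 30) / 10)."""
    return 10 ** ((p_dbm - 30.0) / 10.0)


@dataclass(frozen=True)
class SimConfig:
    area_side_m: float = 400.0        # square service area, 400 m x 400 m
    uav_height_m: float = 10.0        # H
    num_nodes: int = 100              # N
    v_max_mps: float = 20.0           # maximum UAV flying speed
    period_s: float = 600.0           # T, single service period
    coverage_radius_m: float = 20.0   # D
    zeta: float = 0.5                 # node energy conversion factor
    p_d_dbm: float = 40.0             # maximum UAV transmit power
    buffer_update_hz: float = 1.0     # f_d
    data_rates: tuple[float, ...] = (4, 6, 10, 18)             # expected data generation rates
    energy_loss_rates: tuple[float, ...] = (0.1, 0.2, 0.4, 0.6)  # Poisson expectations
    b_max: int = 5000                 # maximum data buffer capacity
    delta_max: int = 600              # maximum energy transmission queue capacity

    @property
    def p_d_watts(self) -> float:
        return dbm_to_watts(self.p_d_dbm)
```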

Performance Metrics and Analysis.
To verify the effectiveness of the proposed algorithm, two aspects, network performance and network quality of service, are used as metrics.

Network Performance.
Network performance is described by the optimization objectives; i.e., it is evaluated in terms of data collection, energy transmission, and UAV energy consumption.

Network Quality of Service.
If data collection and energy charging services do not reach urban monitoring nodes in time, data overflow and energy holes occur, seriously degrading network quality. The quality of service of an urban monitoring network is therefore expressed by the number of nodes with data overflow and energy holes.
First, we analyze the convergence of the proposed algorithm, as shown in Figures 4 and 5. The curve of the accumulated reward during training shows that, as the number of iterations increases, the UAV quickly learns a strategy that obtains higher rewards and converges stably at a high level. The accumulated reward remains low until about episode 50, because the replay buffer still holds little experience and the actions rely mainly on random exploration. Figure 5 also shows that the network loss at this stage is 0 and none of the objectives is optimized yet. Once the replay buffer is full, the UAV starts sampling from it to train the network. The loss curve shows that, before episode 300, the network loss first rises sharply and then drops rapidly, because the algorithm is still in the exploration and learning phase.
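The quality-of-service metric defined at the start of this subsection counts nodes with data overflow and nodes with energy holes. The sketch below advances all node buffers by one update and then counts both failure types. It is a hypothetical illustration: the function names are assumptions, a Poisson model for data arrivals is assumed (the text states a Poisson process only for energy loss), and reporting the sum of the two counts as a single total is also an assumption, since the exact QoS expression is not reproduced here.

```python
import numpy as np


def step_nodes(rng, data_buffer, energy_level, data_rate, energy_loss_rate, b_max):
    """Advance all nodes by one buffer update (f_d = 1 update/s).

    Data arrivals and energy losses are both drawn from Poisson distributions
    (the Poisson model for data arrivals is an assumption).
    """
    arrivals = rng.poisson(data_rate, size=data_buffer.shape)
    losses = rng.poisson(energy_loss_rate, size=energy_level.shape)
    data_buffer = np.minimum(data_buffer + arrivals, b_max)  # buffer saturates at b_max
    energy_level = np.maximum(energy_level - losses, 0.0)    # energy cannot go negative
    return data_buffer, energy_level


def qos_counts(data_buffer, energy_level, b_max):
    """Return (#nodes with data overflow, #nodes with energy holes, total)."""
    overflow = int(np.count_nonzero(np.asarray(data_buffer) >= b_max))
    holes = int(np.count_nonzero(np.asarray(energy_level) <= 0.0))
    return overflow, holes, overflow + holes
```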
The optimization performance of the proposed algorithm for multiple objectives is shown in Figure 6, where the objective weights are set to $w_{dc} = w_{eh} = w_{ec} = 1$. During episodes 0-300, total data collection and total harvested energy increase rapidly while UAV energy consumption decreases rapidly, because the UAV visits more monitoring nodes to maximize data collection. At the same time, the UAV maximizes the energy harvested by the monitoring nodes by allocating more transmit power. As training continues, UAV energy consumption gradually increases and total harvested energy increases further; that is, the UAV keeps adjusting its control strategy to trade off data collection, energy harvesting, and energy consumption. As the loss function converges with oscillation, the UAV energy consumption objective becomes largely stable, while the data collection and energy transfer objectives converge within a certain range. Because the monitoring nodes are randomly distributed, some areas have a higher concentration of nodes, and more energy is transmitted when the UAV serves those areas. Therefore, the curves of the data collection and energy transfer objectives are not smooth. These results show the effectiveness of the proposed algorithm.
To further analyze the effectiveness of the proposed algorithm and compare the resource allocation algorithms, performance is evaluated under different monitoring node densities. The number of monitoring nodes in the urban monitoring network is set to 100, 130, 150, 170, and 200. Each reported result is the average of 200 evaluation runs.
Figure 7 shows the amount of data collected by the different resource allocation algorithms in urban monitoring networks of different densities. For the MJDDPG and CTP algorithms, the amount of data collected first increases with the number of nodes and then levels off. As the number of monitoring nodes in a cluster increases, the data to be uploaded by the cluster head node increases. By optimizing the hovering time slot allocation, the UAV increases the total amount of data uploaded by the monitoring nodes, and the curve levels off once the amount of data the nodes need to upload reaches a certain level. Neither the CSA nor the RRA algorithm optimizes the UAV hovering time slot allocation. As the number of monitoring nodes increases, the data collection volume of the CSA algorithm grows linearly. The data collection volume of the RRA algorithm also grows with the number of nodes but fluctuates overall because of the algorithm's randomness.
Figure 8 shows the amount of energy transmitted by the different resource allocation algorithms in urban monitoring networks of different densities. As node density increases, the energy transmission of all algorithms increases. The CSA and RRA algorithms grow the most, whereas the MJDDPG and CTP algorithms grow more slowly. This is because, in the optimization process, the MJDDPG and CTP algorithms optimize the UAV hovering time slot allocation to achieve more data collection, so total energy transmission grows only slightly. Because the CSA algorithm keeps the hovering time slot allocation constant, its total energy transmission is proportional to the number of monitoring nodes and grows approximately linearly with density. The RRA algorithm, as a random resource allocation algorithm, is not driven by any optimization objective, and its growth curve fluctuates.
Figure 9 shows the impact of the different resource allocation algorithms on UAV energy consumption in urban monitoring networks of different densities. UAV energy consumption increases gradually with density under all algorithms. The RRA algorithm still fluctuates; the UAV energy consumption of the CTP algorithm grows approximately linearly with density because the transmit power is kept constant; and the MJDDPG and CSA algorithms are constrained by the optimization objective, which limits the UAV transmit power to keep UAV energy consumption low, so their energy consumption grows more slowly and levels off.
The performance of the different algorithms in urban monitoring networks of different densities shows that the proposed dynamic resource allocation algorithm based on multiobjective joint optimization has clear advantages in multiobjective joint optimization. As density increases, it achieves significant improvements in data collection, energy transmission, and UAV energy consumption. To analyze the impact of the number of optimization objectives on performance, the proposed algorithm is further compared with single-objective algorithms: single-objective DDPG algorithms that take data collection volume, energy transmission volume, and UAV energy consumption, respectively, as the optimization objective are used to further validate the overall network performance of the proposed algorithm.
The single-objective optimization algorithms are as follows:
(1) SD-DDPG: the objective is to maximize data collection volume while ensuring that the energy transmission of nodes in the urban monitoring network exceeds a certain threshold and that UAV energy consumption stays below a certain threshold.
(2) SE-DDPG: the objective is to maximize energy transmission volume while ensuring that the data collection volume of the monitoring nodes exceeds a certain threshold and that UAV energy consumption is kept within a certain range.
(3) SC-DDPG: the objective is to minimize UAV energy consumption while ensuring that the data collection volume and energy transmission volume of the monitoring nodes in the urban monitoring network are maintained at a certain level.
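The text does not state how the thresholds in SD-DDPG, SE-DDPG, and SC-DDPG are enforced. A common choice is to add a fixed penalty to the single objective whenever a constraint is violated; the sketch below illustrates that approach under this assumption. The names `Thresholds` and `single_objective_reward`, the threshold fields, and the penalty value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    dc_min: float   # minimum acceptable data collection
    eh_min: float   # minimum acceptable energy transmission
    ec_max: float   # maximum acceptable UAV energy consumption


def single_objective_reward(variant: str, data: float, energy: float,
                            uav_energy: float, th: Thresholds,
                            penalty: float = -1.0) -> float:
    """Scalar reward for SD-, SE-, or SC-DDPG with penalty-based constraints."""
    if variant == "SD":      # maximize data collection
        objective = data
        violated = energy < th.eh_min or uav_energy > th.ec_max
    elif variant == "SE":    # maximize energy transmission
        objective = energy
        violated = data < th.dc_min or uav_energy > th.ec_max
    elif variant == "SC":    # minimize UAV energy consumption
        objective = -uav_energy
        violated = data < th.dc_min or energy < th.eh_min
    else:
        raise ValueError(f"unknown variant: {variant}")
    return objective + (penalty if violated else 0.0)
```

Unlike the weighted vector reward of MJDDPG, each variant here optimizes one quantity and treats the other two only as constraints.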
The results of the algorithms with different optimization objectives are shown in Figure 10. The SD-DDPG, SE-DDPG, and SC-DDPG algorithms achieve the highest data collection, the highest energy transfer, and the lowest UAV energy consumption, respectively. In the SD-DDPG algorithm, although the UAV can adjust its hovering position, transmit power, and time slot to maximize data collection from the monitoring nodes while keeping overall energy transmission at a normal level, many monitoring nodes suffer energy holes and UAV energy consumption increases. Thus, the SD-DDPG algorithm cannot guarantee the overall performance of the network, and neither can the SE-DDPG or SC-DDPG algorithm. The proposed MJDDPG algorithm performs well on all three subobjectives as well as in avoiding data overflow and energy holes.

Conclusion
In this paper, we investigate the resource allocation problem in a UAV-assisted urban monitoring network, where a UAV acts as a mobile base station providing data collection and energy transmission services to monitoring nodes. By optimizing the UAV transmit power, hovering time slots, and flight decisions, we maximize data collection and energy transmission while minimizing UAV energy consumption in the monitoring network. At the same time, the UAV decision-making process aims to avoid data overflow and energy voids at the monitoring nodes. Since the UAV optimization problem is a multiobjective optimization problem, we propose a dynamic resource allocation algorithm based on MJDDPG. Simulation experiments verify the effectiveness of the algorithm, and comparisons with other resource allocation algorithms demonstrate its superiority. In future work, we will extend this work to multiple UAVs assisting the urban monitoring network and consider collaborative UAV control to further improve network performance.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.