Network Resource Allocation Strategy Based on UAV Cooperative Edge Computing

Aiming at the problem that fixed mobile edge computing (MEC) servers have difficulty meeting the needs of mobile users and temporary computing services, this study proposes a network resource allocation strategy based on unmanned aerial vehicle (UAV) cooperative edge computing. First, a UAV-aided MEC scenario is designed, in which a single UAV carrying an MEC server provides auxiliary computing services for multiple ground users. Then, an optimization model aiming at the total system delay is constructed by considering the system communication model and computation model. Finally, a Deep Q-Network is used to solve the optimization problem and obtain the best resource allocation scheme. The proposed strategy is demonstrated and analyzed on an experimental platform. The results show that when the number of user equipment is 40, the total delay is about 33 s, which is 35.29%, 31.25%, and 15.38% lower than the comparison strategies, effectively reducing the computing delay of users.


Introduction
In recent years, with the development of 5G mobile communication technology and the Internet of Things (IoT), smart mobile equipment has shown explosive growth [1][2][3]. In the context of cloud computing, MEC is considered a key technology for improving the computing efficiency of mobile edge equipment [4]. In MEC systems, task offloading is the key for mobile equipment to support resource-intensive applications by executing them on edge cloud resources [5,6]. However, computing servers are usually deployed in fixed base stations. In practical scenarios, fixed base stations cannot handle dynamic conditions such as user mobility, base station damage, and temporary hotspot areas. It is therefore particularly important to meet dynamic communication and multi-equipment access requirements.
According to existing research, computing servers can be deployed on an unmanned aerial vehicle (UAV) to meet communication requirements. UAVs themselves have the advantages of low cost and high mobility. In UAV-assisted MEC networks, mobile equipment can offload tasks to UAVs with high computing power and flexible connectivity at the network edge [7,8]. This method exploits the flexibility of UAVs and their better channel gain to offload users' computing tasks, which not only saves computing delay and user energy but also reduces the traffic load on fixed cloud servers [9]. Reference [10] proposed a two-stage joint hovering altitude and power control solution for the resource allocation problem in UAV networks, considering the inevitable cross-tier interference from space-air-ground heterogeneous networks.
At present, significant progress has been made in research on computing task offloading. For example, reference [11] proposed a collaborative service deployment and application allocation algorithm to achieve the final edge service policy deployment. The minimum energy consumption was obtained through a minimum-resource-ratio increasing algorithm, and computing tasks were redistributed in combination with a load balancing algorithm to balance the computing load. However, its overall computational efficiency needs to be improved. Reference [12] proposed a computing framework for coordinating terminals, edge nodes, and cloud centers based on a pipeline offloading scheme. According to the computing and communication capabilities of the entire network, it reasonably allocates computation-intensive tasks to specific terminals or clouds, effectively improving computing efficiency. Reference [13] proposed a vehicle-assisted computational offloading architecture for UAVs. The proposed framework used vehicle-assisted computing offloading for UAV computing tasks and network resource optimization, which has much in common with the topic of this study. Reference [14] introduced agents into computing task offloading and proposed a UAV-MEC (UMEC) agent-enabled computing task offloading framework to help users, UAVs, and edge clouds perform computing task offloading. Reference [15] proposed a computational offloading scheme to minimize the time and energy cost of computational tasks. The scheme reduced the execution cost by coordinating the allocation of computing resources between mobile equipment and edge servers. However, many iterations were needed to find the optimal solution. Reference [16] designed a multi-round iterative auction algorithm based on auction theory, but the overall performance of the algorithm needs to be further optimized. Reference [17] proposed an optimized auction-based incentive mechanism.
The mechanism can optimize long-term system welfare by operating in an online fashion. However, it does not scale well to resources that cannot be covered by the network hardware.
With the continuous development of computer technology, intelligent algorithms such as machine learning are increasingly applied to edge computing. For example, reference [18] proposed a heuristic-based algorithm with a low time cost. Compared with existing centralized resource allocation and decision-making algorithms, this scheme achieved a higher number of successfully offloaded tasks in different scenarios, but the offloading of computing tasks in complex situations cannot be collaboratively optimized. Reference [19] proposed a blockchain-driven collaborative framework for MEC. Reference [20] proposed a caching mechanism based on Q-learning to reduce the backhaul traffic load and the transmission delay from the cloud. Reference [21] proposed a collaborative computing framework based on deep neural networks. The experimental results indicate that the proposed method is more effective than traditional methods. The above machine learning-based methods consider the efficiency and timeliness of MEC network resource allocation to a certain extent but lack good scalability. Therefore, UAV-based offloading strategies have been further researched and developed. Reference [22] proposed a deep reinforcement learning-based UAV-assisted MEC computing offloading scheme, taking total cost minimization as the objective function. Although this method extends the wireless network to a certain extent, it needs to be further optimized for efficient offloading of computing tasks and reasonable allocation of network resources in complex IoT environments.
Aiming at the problems that fixed MEC servers have difficulty meeting the computing needs of mobile users and that the computing efficiency of existing UMEC is limited, a network resource allocation strategy based on UAV collaborative edge computing is proposed. Compared with traditional network resource allocation strategies, the innovations of this study are as follows: (1) To guarantee low-latency service for users, a UAV carrying an MEC server is designed to provide an auxiliary edge computing system for users. Taking the total system delay as the optimization goal greatly shortens the transmission delay of users' computation. (2) Because of the huge amount of data and the high real-time requirements of 5G systems, the proposed strategy uses a Deep Q-Network (DQN) to solve for the optimal resource allocation scheme, which improves the efficiency of analyzing the network state.

System Model.
The uplink communication scenario of a UAV-assisted MEC system is studied, as shown in Figure 1: a UAV provides auxiliary computing services for multiple ground user equipment. Only the scenario in which a single rotor UAV serves N user equipment on the ground is considered.
For the convenience of theoretical analysis, a three-dimensional Cartesian coordinate system is considered, where L_n = (x_n, y_n) represents the position of user equipment n on the ground. The drone flies in a circle within a specified range and provides auxiliary computing services to ground user equipment. The UAV flies at a fixed altitude; its initial position is denoted d[1] and its final position d[K]. Each ground user equipment offloads some of its computing tasks to the UAV and computes the rest locally.
It is assumed that the total execution time T is equally divided into K slots, t ∈ {1, 2, . . . , K}, each of length τ = T/K. In slot t, the position of the drone is d[t]. Assuming that the maximum flight speed of the UAV is V_max, the trajectory constraint of the UAV is

‖d[t + 1] − d[t]‖/τ ≤ V_max, (1)

where formula (1) indicates that the flight speed of the UAV cannot exceed its maximum flight speed. Since the UAV's flight energy consumption is related to its flight speed, the flight energy consumption is expressed as

E_fly[t] = η‖v[t]‖², (2)

where v[t] represents the flight speed of the UAV in time slot t and η is the proportionality factor between energy consumption and flight speed.
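As an illustration, the trajectory feasibility check and the slot-wise energy model above can be sketched numerically. This is a minimal sketch, assuming the per-slot flight energy equals η times the squared slot speed; the function names and sample waypoints are hypothetical, not from the paper.

```python
import math

def slot_speed(d_curr, d_next, tau):
    """Average UAV speed in one slot: displacement divided by slot length tau."""
    return math.dist(d_curr, d_next) / tau

def trajectory_feasible(waypoints, tau, v_max):
    """Check the speed constraint ||d[t+1] - d[t]|| / tau <= V_max in every slot."""
    return all(slot_speed(a, b, tau) <= v_max
               for a, b in zip(waypoints, waypoints[1:]))

def flight_energy(waypoints, tau, eta):
    """Total flight energy, assuming per-slot energy eta * (slot speed)^2."""
    return sum(eta * slot_speed(a, b, tau) ** 2
               for a, b in zip(waypoints, waypoints[1:]))
```

For example, a trajectory covering 10 m per 1 s slot is feasible for V_max = 10 m/s, and its energy scales quadratically if the slot length is halved.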

Communication Model.
Assuming that the wireless channel between the UAV and each user equipment is a line-of-sight (LOS) channel, the channel power gain between the UAV and user equipment n can be expressed as

h_n[t] = β_0 d_n^{-2}[t], (3)

where β_0 represents the channel power gain at unit distance; d_n[t] represents the distance between the UAV and user equipment n in slot t; and ‖ · ‖ represents the Euclidean norm.
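The LOS gain model reduces to a one-line computation. The sketch below assumes the 3-D UAV-user distance (fixed altitude plus horizontal offset) enters the β₀·d⁻² model; the function name and the numeric values are illustrative assumptions only.

```python
import math

def los_channel_gain(beta0, uav_xy, uav_height, user_xy):
    """LOS channel power gain beta0 * d^-2, with d the 3-D UAV-user distance."""
    horiz = math.dist(uav_xy, user_xy)   # horizontal separation
    d_sq = uav_height ** 2 + horiz ** 2  # squared 3-D distance
    return beta0 / d_sq
```

With β₀ = 10⁻³ and a 100 m altitude, a user directly below the UAV sees a gain of 10⁻⁷; the gain decays as the user moves away horizontally.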

Local Computing.
When user equipment n selects the local mode, all computing tasks are performed locally, and the computing delay is

T_n^loc = C_n D_n / f_n, (4)

where f_n is the computing capability of user equipment n (cycles per second); C_n is the number of cycles required to compute 1 bit; and D_n is the size of the input data.
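For example, the local-mode delay of formula (4) can be computed directly. The helper name and sample values below are illustrative.

```python
def local_delay(d_bits, cycles_per_bit, f_local_hz):
    """Local computing delay T_n^loc = C_n * D_n / f_n, in seconds."""
    return cycles_per_bit * d_bits / f_local_hz

# A 1 Mbit task at 1000 cycles/bit on a 1 GHz CPU takes 1 second.
print(local_delay(1e6, 1000, 1e9))
```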

Offload to UAV for Calculation.
The UAV carrying the MEC server periodically flies at a fixed altitude above the ground user equipment. User equipment that selects the MEC mode can offload tasks to the UAV through time-division multiple access (TDMA) [23,24]. The set of ground users offloading to the MEC is denoted N′, and the UAV can choose to relay some tasks to the access point (AP) for computation. Let ζ_n[t], n ∈ N′, t ∈ K, be the ratio of user equipment n's task that the UAV relays to the ground base station in slot t; the task the UAV itself must compute in slot t is then the remaining (1 − ζ_n[t]) fraction. (1) The transmission rate of user equipment n in slot t is

r_n[t] = B_1 log_2(1 + P_n h_n[t]/σ²), (5)

where B_1 is the bandwidth between the UAV and user n; P_n is the maximum transmission power of user n (the user equipment transmits at maximum power); h_n[t] is the channel gain between user n and the UAV in slot t; and σ² is the noise power.
(2) Transmission delay of user equipment n: let D_n[t] be the bits offloaded by user n in slot t. User n must offload all of its bits within the K time slots, so the constraint is

Σ_{t=1}^{K} D_n[t] = D_n. (6)

The total transmission delay of the user equipment is

T_n^tr = Σ_{t=1}^{K} D_n[t]/r_n[t]. (7)

(3) Computing delay of the UAV: according to the relay ratio, the task computed by the UAV for user equipment n in slot t is (1 − ζ_n[t])D_n[t]; summing the per-slot MEC-mode computing delays over all slots gives the UAV's total computing delay for a single user equipment:

T_n^c = Σ_{t=1}^{K} C_n(1 − ζ_n[t])D_n[t]/f_c, (8)

where f_c represents the computing power of the MEC server.
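Under the model above, the per-slot rate and the two delay sums can be sketched as follows. This is a minimal sketch assuming a standard Shannon rate with noise power σ²; all function names and numbers are illustrative.

```python
import math

def shannon_rate(bandwidth_hz, tx_power, gain, noise_power):
    """Achievable rate B * log2(1 + P*h / sigma^2), in bits/s."""
    return bandwidth_hz * math.log2(1 + tx_power * gain / noise_power)

def transmission_delay(bits_per_slot, rates_per_slot):
    """Total upload delay: sum over slots of offloaded bits / slot rate."""
    return sum(b / r for b, r in zip(bits_per_slot, rates_per_slot))

def uav_compute_delay(bits_per_slot, relay_ratio, cycles_per_bit, f_mec_hz):
    """UAV-side computing delay for the (1 - zeta) fraction kept on board."""
    return sum(cycles_per_bit * (1 - z) * b / f_mec_hz
               for b, z in zip(bits_per_slot, relay_ratio))
```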

Offload to Base Station on the Ground for Calculation.
UAVs can choose to relay part of the computing tasks to the AP for computation. The proportion relayed in slot t is ζ_n[t] ∈ [0, 1], n ∈ N′. Since the ground base station can host multiple MEC servers, the computing delay of the ground base station is neglected. At the same time, the result of a computing task is usually very small, so the transmission delay of the computation result is also neglected. Thus, the delay of the UAV relaying tasks to the ground base station consists only of the UAV's transmission delay. The UAV flies relatively high above the ground base station and has a good line-of-sight link, so the communication between the UAV and the ground base station is modeled as a LOS channel. The transmission rate at which the UAV relays the task to the AP in slot t is

r_u[t] = B_2 log_2(1 + P_u h_u[t]/σ²), (9)

where B_2 is the bandwidth between the UAV and the AP; P_u is the transmission power of the UAV; and h_u[t] is the channel gain between the UAV and the AP. The transmission delay from the UAV relay to the ground AP can be expressed as

T_n^relay = Σ_{t=1}^{K} ζ_n[t]D_n[t]/r_u[t]. (10)

Assuming that the UAV can compute and relay parts of the computing tasks to the ground AP simultaneously, the total computation delay after offloading to the UAV is

T_n^u = max(T_n^c, T_n^relay). (11)

When the user equipment selects the MEC mode, the total delay of offloading computing tasks to the UAV is

T_n^Ω = T_n^tr + T_n^u. (12)
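Because on-board computing and relaying proceed in parallel after the upload, the slower branch dominates the UAV-side delay. A minimal sketch with hypothetical names:

```python
def relay_delay(bits_per_slot, relay_ratio, rates_uav_ap):
    """Delay for the zeta fraction relayed from the UAV to the ground AP."""
    return sum(z * b / r
               for b, z, r in zip(bits_per_slot, relay_ratio, rates_uav_ap))

def mec_mode_delay(t_upload, t_uav_compute, t_relay):
    """Upload first; then UAV computing and AP relaying run in parallel,
    so the slower of the two dominates (AP computing delay is neglected)."""
    return t_upload + max(t_uav_compute, t_relay)
```

For instance, with a 1 s upload, 0.5 s of on-board computing, and 0.25 s of relaying, the MEC-mode delay is 1.5 s: the relay branch is hidden behind the compute branch.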

Model Constraint Description.
The system goal is to minimize the total delay of all requesting user equipment, so the system goal can be defined as

min Σ_{n=1}^{N} T_n, (13)

where T_n represents the computation delay of user n.
Considering the different computation and offloading modes, T_n is defined as

T_n = α_n T_n^Ω + (1 − α_n) T_n^loc, (14)

where α_n ∈ {0, 1} represents the mode selected by the user equipment: α_n = 1 means offloading to the drone is selected, and α_n = 0 means local computation is selected. T_n^loc represents the total delay of local computation by the user equipment; T_n^Ω represents the total delay when the user selects full offloading to the drone.
The user equipment offloading strategy, UAV relay ratio, UAV trajectory, and user bit allocation are jointly optimized to minimize the total delay of the user equipment, formulated as optimization problem (15) subject to constraints C1-C7: C1 states that user equipment n can choose only one mode; C2 states that the relay ratio of the UAV is a variable between 0 and 1; C3 states that the computation delay of each user equipment n must be less than its maximum tolerated delay; C4 states that user equipment n selecting the MEC mode must complete its task offloading within K time slots; C5 states that the drone flies in cycles; C6 states that the maximum horizontal distance the drone covers in one time slot cannot exceed the threshold; and C7 states that the bits offloaded by all user equipment to the UAV in one time slot cannot exceed the UAV's computation threshold.

Reinforcement Learning Modeling.
The optimization problem in equation (15) is solved by a single-agent Q-learning algorithm based on reinforcement learning. The algorithm model consists of four parts: agent, state, action, and reward [25].
(1) Agent: the UAV serves as the agent of the algorithm; it collects the information of each equipment in the system and makes scheduling decisions.
(2) Action: a = [λ, ϖ_n] is defined to represent the action decision variables of the UAV. Here, λ = n indicates that the drone will provide on-demand service for user equipment n in the current state, with λ ∈ N. ϖ_n represents the computing mode of the computing task required by user equipment n.
(3) State: the state variable s = [ω, z, t_serve, t_fly] represents the service state of the system and consists of four parts. The user equipment service state ω = [ω_1, . . . , ω_N] satisfies ω_n ∈ {0, 1}, where ω_n indicates whether user equipment n has been serviced by the drone: ω_n = 1 means the drone has completed the on-demand service for user equipment n, and otherwise ω_n = 0. The maximum tolerated delay vector z = [z_1, . . . , z_N] satisfies z_n = T_n^max, that is, z_n represents the maximum tolerated delay of user equipment n. The remaining two parts are the waiting service delay t_serve and the flight delay t_fly.
(4) Reward: the optimization problem aims to maximize the number of network user equipment served under the UAV-MEC system architecture, while the Q-learning algorithm based on reinforcement learning aims to obtain maximum reward feedback. Combining these two points, and based on the current system state s = [ω, z, t_serve, t_fly] and the selected action a = [λ, ϖ_n], the system reward function is defined as r(s, a) = Σ_{n=1}^{N} ω_n, that is, the total number of network equipment in the system whose service requirements are met when the UAV selects action a in state s [26,27].
The training iteration process of the Q-learning algorithm based on reinforcement learning satisfies the Bellman equation.
The action decisions performed by the UAV under different state variables are selected according to the ε-greedy mechanism.
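The ε-greedy selection and the tabular Bellman update described above can be sketched as follows. The learning rate `alpha` and discount factor `gamma` are assumed values not specified in the text.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """UAV action selection: random action w.p. epsilon, else argmax_a Q(s, a)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Tabular Bellman update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
```

With ε = 0 the agent is purely greedy; raising ε trades exploitation for exploration of untried scheduling decisions.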

DQN-Based Offloading Strategy Optimization Algorithm.
The pseudocode of the DQN-based offloading strategy optimization algorithm is shown in Algorithm 1.
First, the network parameters are initialized, including the capacity C_1 of the experience replay pool, the parameter θ of the action-value function Q(s, a; θ), the parameter θ⁻ = θ of the target action-value function Q(s, a; θ⁻), and the initial state s. Then, in each episode t, each user equipment picks an action from the feasible action space according to the ε-greedy policy: a random action with probability ε, or otherwise the action satisfying a = arg max Q(s, a; θ).
Then, resource allocation is performed according to the actions selected by the user equipment. If local processing is selected, the local CPU cycle frequency is set to the maximum computing power of the equipment, namely, F_n = f_n^loc. If the user equipment chooses to upload tasks to the UAV for processing, its uplink transmission power is set to the maximum available power, i.e., p_n = P_n.
After resource allocation is completed, the reward r can be calculated according to the designed reward function, a new state s′ is obtained, and the new sample (s, a, r, s′) is stored in the experience pool. Finally, a batch of samples is randomly drawn from the experience pool and used to update the parameters θ of the Q-network, with the parameters θ⁻ = θ of the target Q-network synchronized every e steps. Note that although each user equipment selects its action independently, the UAV's resources are shared, so the results of their action selections influence one another [28,29]. If all users were to associate with the same drone, the drone might be overloaded, so that the user equipment tasks could not be completed within the specified latency limit, or the drone might run out of energy. To balance the load and avoid this extreme situation, an invalid action pool is added. After all user equipment complete their action selection, if the resulting joint action set cannot meet the energy consumption constraints of the base station, or the load is so concentrated that most user equipment cannot meet the delay limit T_n^max, the action set is added to the invalid action pool [30]. When the user equipment perform action selection again, any action combination that already appears in the invalid action pool is ignored and the selection is redone. This avoids invalid computations and improves efficiency.
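The experience pool and the invalid action pool described above can be sketched as two simple containers. This is a minimal illustration under assumptions, not the authors' implementation; the capacities and the joint-action encoding (a tuple of per-user action indices) are assumptions.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity experience pool with uniform random minibatch sampling."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest samples are evicted first

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

class InvalidActionPool:
    """Remembers joint action sets that violated the delay or energy
    constraints, so the agents re-draw instead of repeating them."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.bad = set()

    def mark(self, joint_action):
        if len(self.bad) < self.capacity:
            self.bad.add(tuple(joint_action))

    def is_invalid(self, joint_action):
        return tuple(joint_action) in self.bad
```

During action selection, a freshly drawn joint action is accepted only if `is_invalid` returns False; otherwise the draw is repeated.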

Experiments and Analysis
Simulation results are used to evaluate the performance and efficiency of the proposed strategy. It is assumed that the user equipment is randomly distributed in a two-dimensional area of 200 m × 200 m, and the AP is located in the upper right corner of the area. Each user equipment has a different computing task, delay tolerance, and number of cycles required to compute 1 bit. The specific parameters are shown in Table 1.

Average Reward of Algorithm Training Process.
The average reward value over multiple episodes in the training phase of the DQN-based offloading strategy optimization algorithm is shown in Figure 2. To increase exploration, a certain amount of noise is added to the actions during the experiment, which gives the raw curve many glitches and makes it unsmooth. Figure 2 shows the reward curve after applying a moving average.
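The smoothing step can be reproduced with a simple trailing moving average; the window size is an assumption, as the paper does not state it.

```python
def moving_average(values, window):
    """Trailing moving average used to smooth a noisy reward curve.
    Early entries average over however many points are available."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```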
It can be seen from Figure 2 that the proposed algorithm converges at about 2,000 time slots, with the average reward value fluctuating around 220. The algorithm thus has a fast convergence speed and an ideal average reward value.

Relationship between the Number of Convergence Iterations and the Number of IoT Equipment.
As the number of equipment in the system increases, the number of training iterations the proposed algorithm needs to reach convergence also increases, as shown in Figure 3.
As can be seen from Figure 3, when the number of system equipment increases, more iterations are required to reach convergence, and when the number of equipment exceeds 35, the number of iterations grows more sharply. This is mainly because the size of the algorithm's Q-table is closely related to the number of equipment in the system. When the number of equipment exceeds this threshold, the processing pressure on the algorithm increases sharply, resulting in a rapid increase in the number of iterations required for convergence. Nevertheless, the algorithm converges regardless of the number of equipment, demonstrating its effectiveness.

Relationship between the Computing Energy Efficiency and Maximum Energy Consumption of Different Offloading Models.
The relationship between the computing energy efficiency of user equipment and the maximum energy consumption under different offloading models is shown in Figure 4.
As can be seen from Figure 4, in the local computing model the user equipment performs only local computing, while in the global offloading model the user equipment completely offloads its computing tasks to the UAV; both schemes optimize the flight trajectory of the UAV. Using the DQN-based offloading optimization algorithm for partial offloading achieves higher computational energy efficiency: when the maximum energy consumption is 4 J, the computing energy efficiency of the user equipment exceeds 200 bits/J. This is because, under the partial offloading model, the user equipment can flexibly allocate resources according to the channel state information, choosing between offloaded and local computation. Furthermore, the global offloading model outperforms the local computing model, and the computing energy efficiency of the user equipment increases with the maximum consumed energy. The reason is that as the energy of the user equipment increases, it has more energy to perform local or offloaded computations. In addition, in the local computing mode, the computing energy efficiency does not change as the energy increases: when the maximum energy consumption is 1 J, the computing energy efficiency of the user equipment has already reached its maximum, so the user equipment does not need to consume more energy to improve it.

Relationship between Total Delay and the Number of IoT Equipment.
As the number of equipment in the system changes, the total latency of the IoT system also changes accordingly. To demonstrate the performance of the proposed strategy, it is compared with references [13], [14], and [22]; the results are shown in Figure 5.
As can be seen from Figure 5, as the number of equipment increases, the total system delay also increases as expected.
Algorithm 1: DQN-based offloading strategy optimization.
Initialization: experience pool capacity C_1, invalid action pool capacity C_2, parameters θ of the action-value function Q, parameters θ⁻ = θ of the target action-value function Q, initial state s.
Begin
(1) While t ≤ t_max
(2) For n = 1 : N
      If rand(0, 1) < ε, user n randomly selects action a from action space A;
      Else user n selects action a = arg max_{a∈A} Q(s, a; θ)
      End if
    End for
(3) Allocate resources according to user actions
(4) For n = 1 : N
(5) Obtain reward r and new state s′, and store the sample (s, a, r, s′) in the experience pool
(6) Update state s = s′
(7) Randomly sample (s_i, a_i, r_i, s_{i+1}) from the experience pool to form a minibatch
(8) Perform gradient descent on (y_i − Q(s_i, a_i; θ))² with respect to parameter θ
(9) Reset the target action-value function Q⁻ = Q every e steps

The main reason is that when the number of equipment in the system increases, the UAV flight delay, UAV edge computing delay, equipment upload delay, and local computing delay corresponding to the newly added equipment are added to the total system delay. Furthermore, the proposed strategy significantly outperforms the other comparative strategies in reducing the total system latency. When the number of user equipment is 40, the total delay is about 33 s, which is 35.29%, 31.25%, and 15.38% lower than references [13], [14], and [22], respectively. Since the proposed strategy uses mobile UAVs for computational offloading and optimizes resource allocation with DQN, the delay is minimized. Reference [13] proposed a vehicle-assisted computing offloading architecture for UAVs; since its optimization algorithm has weak search ability, it takes a long time, more than 50 s. Reference [14] proposed a UAV-assisted agent-enabled computing task offloading framework to help users, UAVs, and edge clouds perform computing task offloading.
However, this algorithm requires a large amount of computation and takes a long time, resulting in increased delay.
Reference [22] proposed a deep reinforcement learning-based UAV-assisted MEC computing offloading scheme. When the number of user equipment is small, its total delay is close to that of the proposed strategy. However, when the number of user equipment is large, its processing timeliness is weak and the delay is relatively large.

Conclusions
In order to give full play to the advantages of a UAV-assisted MEC network, it is necessary to formulate a reasonable user offloading strategy and UAV flight trajectory. To this end, this study proposes a network resource allocation strategy based on UAV collaborative edge computing. Based on the system scenario of collaborative computing between UAVs and ground users, an optimization model is constructed to minimize the total system delay, and DQN is used to solve it, obtaining the best resource allocation scheme. The experimental results show the following: (1) With the fast mobility of UAVs, users in the system can offload computing tasks in real time, which greatly reduces the transmission delay. In particular, when the computing power of the UAV reaches 2000 MHz, the total system delay is about 20 s. (2) The proposed strategy uses DQN to optimize the offloading strategy, which not only converges quickly but also yields efficient optimization results, achieving a reasonable allocation of resources while reducing delay. The optimization effect is especially obvious when the amount of user data is large: when the number of user equipment is 40, the total delay is about 33 s, more than 10% lower than that of the other comparison strategies. As the network scale grows, network scenarios also become more complex. In order to suit large-scale complex network scenarios and meet the real-time requirements of MEC systems, future work will focus on designing a simpler and more effective online scheduling strategy, which is crucial for the large-scale application of MEC.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.