Task Offloading and Resource Allocation Strategy Based on Deep Learning for Mobile Edge Computing

To address unreasonable computation offloading and uneven resource allocation in Mobile Edge Computing (MEC), this paper proposes a deep-learning-based task offloading and resource allocation strategy for MEC. First, in a multiuser, multiserver MEC environment, a new objective function is designed by combining the computation and communication models of the system; it shortens the completion time of all computing tasks and minimizes the energy consumption of all terminal devices under delay constraints. Then, building on a multiagent reinforcement learning system, system benefits and resource consumption are designed as the rewards and losses of deep reinforcement learning, and the Dueling-DQN algorithm is used to solve the system problem model and obtain the resource allocation with the highest reward. Finally, the experimental results show that the proposed strategy performs best with a learning rate of 0.001 and a discount factor of 0.90. Furthermore, the proportions of energy consumption reduced and completion time shortened are 52.18% and 34.72%, respectively, which outperform the comparison strategies in terms of calculation amount and energy saving.


Introduction
With the rise of computation-intensive applications and the explosive growth of data traffic, users' requirements for the computing power and service quality of mobile devices are increasing [1]. At present, cloud computing faces many problems and challenges. Due to its resource-intensive architecture, mobile cloud computing imposes a huge additional load on the backhaul link of mobile networks [2,3]. Thus, Mobile Edge Computing (MEC) technology has been proposed, which physically integrates computing and storage resources into the edge of the mobile network architecture [4,5]. This not only effectively reduces transmission delay but also solves the high-load and high-delay problems caused by mobile cloud computing [6]. At the same time, MEC is characterized by a distributed architecture at the edge of the network, low latency, user location awareness, and network status awareness [7,8]. However, deploying a large number of computing and storage devices at the network edge for users to choose from, and accessing neighboring service providers for edge computing, brings a series of complexities such as access and resource allocation strategy selection, user mobility management, and computing task migration [9].
To achieve short completion time and low terminal energy consumption under delay constraints, this paper proposes a task offloading and resource allocation strategy based on deep learning for MEC. The proposed strategy is designed in a multiuser, multiserver MEC environment, combining the computation and communication models of the system so as to shorten the completion time of computing tasks and minimize the energy consumption of all terminal devices while satisfying delay constraints. Moreover, a new objective function is designed, which uses objective optimization to further reduce energy consumption and delay, and the Dueling-DQN algorithm is used to solve the optimization model. The remainder of this paper is organized as follows: Section 2 introduces related research on mobile task offloading. Section 3 introduces the system model. Section 4 introduces the new computation offloading method based on improved DQN. Section 5 presents simulation experiments that verify the performance of the proposed model. Section 6 concludes the paper.

Related Work
In an MEC network, computation offloading can occur in three forms: full offloading, partial offloading, and local processing [10]. An important research hotspot in this field is the computation offloading decision. Generally speaking, the offloading goal focuses on minimizing the overall delay, or minimizing the energy consumption of user devices while meeting minimum delay requirements [11]. Reference [12] proposed a distributed task offloading strategy for low-load base station groups in the MEC environment. It selects the best MEC node offloading amount via a game equation on the basis of quantifying offloading cost and delay, but it is not suitable for high-load application environments. For the problem of unbalanced computing resources on edge servers in vehicular edge computing networks, [13] proposed a load-balancing task offloading scheme based on software-defined networking. This solution can effectively reduce delay and improve the efficiency of task offloading, but the processing method used performs poorly, which affects distribution efficiency. Reference [14] used greedy selection to design a maximum energy-saving priority algorithm to achieve optimal offloading of computing tasks on mobile devices, but it does not consider the delay constraints of task offloading. Reference [15] combined long short-term memory (LSTM) with a candidate network set to improve a deep reinforcement learning algorithm and used it to solve the offloading dependency problem of multinode and mobile tasks in large-scale heterogeneous MEC, but it ignores the optimal allocation of computing resources.
Similar to computation offloading, resource allocation is one of the core issues in MEC [16]. In MEC networks, technologies such as content caching and ultradense deployment are introduced, and multiple resources are deployed to the mobile network edge according to users' specific needs. This can further ensure quality of service and greatly increase system capacity [17]. However, due to objective factors such as physical volume and power consumption, the mobile network edge has limited computing resources, storage cache capacity, and spectrum resources. How to allocate these multiple resources and improve system service efficiency has a huge effect on MEC network system performance [18]. Reference [19] proposed a time average computation rate maximization (TACRM) algorithm, which jointly allocates radio and computing resources. However, the overall device performance and task requirements were not considered comprehensively in the allocation process, and the allocation efficiency still needs improvement. Reference [20] comprehensively considered factors such as CPU, hard disk space, required time, and distance and proposed a comprehensive utility function for MEC resource allocation to achieve optimal allocation between MEC and cloud computing. However, this function considers so many factors that it seriously affects allocation efficiency in real applications. Reference [21] designed a two-layer optimization method for MEC, which prunes candidate modes to reduce the number of infeasible offloading decisions.
Through an ant colony algorithm for the upper-level optimization, it achieves a good resource allocation effect. However, server computing resource constraints and task delay constraints are not considered, and the overall timeliness is poor. Reference [22] constructed a low-complexity advanced branch model, which can be used for resource scheduling in large-scale MEC scenarios. Due to the lack of powerful processing algorithms, however, its overall efficiency and performance are not ideal. To this end, comprehensively considering the task offloading and resource allocation problem, a deep-learning-based MEC task offloading and resource allocation strategy is proposed to jointly optimize the allocation and offloading of computing resources and computing tasks, which improves the comprehensive computing efficiency of MEC.

System Model
The system model is a multiuser, multiserver application scenario with N terminal devices and M MEC servers. Base stations provide communication resources for user equipment. Each base station is connected to an edge computing server through optical fiber, and terminal devices connect to MEC servers over wireless communication links to offload their task data for computation, as shown in Figure 1. It is assumed that each terminal device can either offload its task or compute it locally; when offloading, a task can be offloaded to only one MEC server, and each terminal device is within the wireless coverage of the base stations. The collection of all tasks is G. Each terminal device n has a computation-intensive task G_n to be processed, characterized by the data D_n (code and parameters) required for computing G_n, the CPU workload ϕ_n required for computing G_n, and the completion-time (delay) constraint τ_n of G_n; that is, G_n = (D_n, ϕ_n, τ_n). The offloading decision for each G_n is x_n ∈ {0, 1, ..., m, ..., M}, and the set of all decisions is X; x_n = 0 means local execution, while x_n = m means offloading G_n to MEC server m.
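The task tuple G_n = (D_n, ϕ_n, τ_n) and the offloading decision x_n ∈ {0, ..., M} can be sketched as follows; the field names and the example numbers are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class Task:
    data_bits: float   # D_n: data (code and parameters) to upload, in bits
    workload: float    # phi_n: CPU cycles required to complete the task
    deadline: float    # tau_n: completion-time (delay) constraint, in seconds

M = 4  # number of MEC servers

def valid_decision(x_n: int) -> bool:
    # x_n = 0 means local execution; x_n = m (1..M) offloads G_n to server m
    return 0 <= x_n <= M

tasks = [Task(data_bits=8e5, workload=1.2e9, deadline=1.0)]
assert valid_decision(0) and valid_decision(M) and not valid_decision(M + 1)
```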

Communication Model.
In the computation offloading problem, two links are mainly studied: the wireless link from terminal devices to MEC, and the wired link from MEC to the cloud in the core network. For the wireless link, a Finite-State Markov Channel (FSMC) model based on fading characteristics is used. The FSMC model has a wide range of applications in wireless networks [23,24]. The channel is divided into nonoverlapping intervals by partitioning the range of a channel-related parameter, and each interval of the selected parameter represents a state in the FSMC model. The parameter used in the FSMC may be the Signal-to-Noise Ratio (SNR) of the received signal at the receiving end or the collected energy; here, SNR is selected as the parameter composing the model [25]. The SNR at the receiving end is divided into K levels, each associated with a state of the Markov chain. Under block fading, the SNR at the receiving end is constant within a time period but changes between periods according to the Markov transition probabilities. Let the random variable γ be the SNR at the receiving end of terminal device n; γ evolves according to a finite-state Markov chain whose states are κ = {1, 2, ..., K}. The realization of γ for terminal device n in time period t is denoted Γ_n(t), and p(s_n′, s_n″) denotes the probability of Γ_n(t) transitioning from state s_n′ to state s_n″ in time period t; the K × K channel state transition probability matrix of terminal device n collects these probabilities. In practical applications, the transition matrix can be observed and measured from the past wireless environment. In addition, Γ_n(t), 1 ≤ t ≤ T, is assumed independent across terminal devices n. Based on the FSMC channel model, Γ_n,m is used here to represent the SNR between terminal device n and MEC server m.
Since there is no interference between terminal devices, the channel efficiency can be expressed as ϑ_n,m = log₂(1 + Γ_n,m). The bandwidth W_m of MEC server m is divided into W_m/B_m channels, each of bandwidth B_m. Assuming each user is allocated at most one channel, the transmission rate from terminal device n to MEC server m can be expressed as v_n,m(t) = B_m ϑ_n,m(t).
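The FSMC state evolution and the per-channel rate v_n,m(t) = B_m log₂(1 + Γ_n,m(t)) can be sketched as below; the number of states K, the transition matrix P, and the SNR level per state are illustrative assumptions.

```python
import math
import random

K = 3
P = [[0.6, 0.3, 0.1],            # row k: transition probabilities out of state k
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
snr_of_state = [1.0, 4.0, 10.0]  # hypothetical linear SNR value per Markov state

def step(state: int, rng: random.Random) -> int:
    # Draw the next channel state from row `state` of the transition matrix
    return rng.choices(range(K), weights=P[state])[0]

def rate(bandwidth_hz: float, snr: float) -> float:
    # v_{n,m}(t) = B_m * log2(1 + Gamma_{n,m}(t))
    return bandwidth_hz * math.log2(1.0 + snr)

rng = random.Random(0)
s = 0
for _ in range(5):           # let the channel evolve for five periods
    s = step(s, rng)
print(rate(1e6, snr_of_state[s]))  # achievable bits/s on a 1 MHz channel
```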
The number of subchannels owned by MEC server m is limited to W_m/B_m; that is, the bandwidth allocated by MEC server m to all connected users cannot exceed the total bandwidth of MEC server m. Besides, an MEC server is limited by its cache and computing capacity: on the one hand, it can only handle a limited number of tasks; on the other hand, the load it can handle (such as the number of computing tasks) is also limited. Therefore, some tasks will be further offloaded to the core network for processing. Let g_n(t) ∈ {0, 1} be the computation offloading decision indicator, which indicates the way the server provides service: g_n(t) = 0 means the computing task of terminal device n is processed by the connected MEC server, and g_n(t) = 1 means the task is further offloaded by the connected MEC server to the core network for processing.
To further offload tasks to the cloud, the wired backhaul link from the MEC server to the core network is considered. Assume the total backhaul link capacity of the network is Z (in bits per second), and the backhaul capacity allocated to MEC server m is Z_m. Then the restrictions Σ_n θ_n,m ϖ_n,m ≤ Z_m and Σ_m Z_m ≤ Z must be met, where θ_n,m indicates the connection between terminal device n and MEC server m and ϖ_n,m is the transmission rate between terminal device n and MEC server m.
That is, the sum of the rates at which MEC server m offloads its terminal devices' computing tasks to the core network cannot exceed the backhaul capacity of MEC server m, and the sum of the rates of all terminal devices whose computing tasks are processed in the cloud cannot exceed the total backhaul capacity of the system.
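The two backhaul restrictions above can be checked with a small feasibility function; the matrix layout (theta[n][m], w[n][m]) and the example capacities are assumptions for illustration.

```python
def backhaul_feasible(theta, w, Z_m, Z_total):
    # Per-server check: traffic device n sends through server m (theta[n][m] *
    # w[n][m]) summed over n must fit in that server's backhaul capacity Z_m[m]
    per_server = all(
        sum(theta[n][m] * w[n][m] for n in range(len(theta))) <= Z_m[m]
        for m in range(len(Z_m))
    )
    # System check: allocated server capacities must fit the total backhaul Z
    return per_server and sum(Z_m) <= Z_total

theta = [[1, 0], [0, 1]]        # device 0 -> server 0, device 1 -> server 1
w = [[5.0, 0.0], [0.0, 5.0]]    # transmission rates in bps
print(backhaul_feasible(theta, w, Z_m=[10.0, 10.0], Z_total=20.0))  # True
```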

Calculation Model.
If G_n is processed locally, let T_n^L represent the time when G_n is executed locally, defined as T_n^L = ϕ_n / f_n^L, where the workload ϕ_n is the total number of CPU cycles required to complete G_n and f_n^L is the local computing power of terminal device n (i.e., the number of CPU cycles executed per second).
Let E_n^L represent the energy consumption of the device when G_n is executed locally, defined as E_n^L = e_n ϕ_n, where e_n is the energy consumed by terminal device n per CPU cycle, e_n = (f_n^L)² × 10^-27. If G_n is processed at the edge, the delay T_n^O and device energy consumption E_n^O under edge execution of G_n are calculated from three parts: data upload, data processing, and data return [26]. The specific calculation is as follows.
First, terminal device n uploads the data of G_n to the corresponding MEC server over the wireless channel. Let T_n′ be the time for device n to upload the data of G_n, defined as T_n′ = D_n / v′, where D_n is the data size of G_n and v′ is the data upload rate in the system model (i.e., the amount of data uploaded per second). Then, the energy consumption E_n′ of terminal device n for uploading data is E_n′ = P′ T_n′, where P′ is the uplink transmission power of terminal device n. Next, the MEC server allocates computing resources and performs the calculation after receiving the data. Let T_n″ represent the time for which the offloaded data is computed on the MEC server, defined as T_n″ = ϕ_n / f_nm^O, where f_nm^O is the computing resource allocated by MEC server m for the offloaded execution of G_n (i.e., the number of CPU cycles executed per second). When G_n is executed locally or on another MEC server, f_nm^O is zero, which serves as a constraint in the model. During this time, terminal device n has no computing task, remains in a waiting state, and generates idle energy consumption. Let P_n^I be the idle power of terminal device n; then the idle energy consumption E_n″ of terminal device n under offloaded computation is E_n″ = P_n^I T_n″. Finally, the MEC server returns the calculation result to terminal device n. The returned result is small and the downlink rate is high, so the delay and energy consumption of receiving it at the terminal device are ignored.
Therefore, the delay T_n^O under edge execution of G_n is the sum of the transmission delay T_n′ and the calculation delay T_n″ on the MEC server: T_n^O = T_n′ + T_n″. The device energy consumption E_n^O under edge execution of G_n is the sum of the upload energy consumption E_n′ of device n and the idle energy consumption E_n″ of device n while waiting for G_n to complete on the MEC server: E_n^O = E_n′ + E_n″. In summary, the delay T_n and energy consumption E_n of the entire calculation process of task G_n on terminal device n are T_n = T_n^L and E_n = E_n^L under local execution, and T_n = T_n^O and E_n = E_n^O under edge execution. Note that T_n and f_nm^O must satisfy the restrictions T_n ≤ η_n and Σ_n f_nm^O ≤ F_m, where η_n is the delay constraint of G_n and F_m is the overall computing resource of MEC server m; that is, the sum of computing resources allocated to all tasks G_n offloaded to MEC server m should not exceed F_m.
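The local and edge cost formulas of this calculation model can be collected into two small functions; the example numbers follow the simulation settings later in the paper (1.5 GHz local CPU, 2.5 Mb/s upload, 800 mW uplink power, 100 mW idle power) but are otherwise illustrative.

```python
def local_cost(phi, f_local):
    t = phi / f_local                 # T_n^L = phi_n / f_n^L
    e = (f_local ** 2) * 1e-27 * phi  # E_n^L = e_n * phi_n, e_n = (f_n^L)^2 * 1e-27
    return t, e

def edge_cost(d, phi, v_up, f_edge, p_tx, p_idle):
    t_up = d / v_up                   # T'_n: upload delay
    e_up = p_tx * t_up                # E'_n: upload energy
    t_exec = phi / f_edge             # T''_n: execution delay on the MEC server
    e_idle = p_idle * t_exec          # E''_n: idle energy while waiting
    # Result backhaul is ignored, as in the model above
    return t_up + t_exec, e_up + e_idle   # (T_n^O, E_n^O)

t_l, e_l = local_cost(phi=1.2e9, f_local=1.5e9)
t_o, e_o = edge_cost(d=8e5, phi=1.2e9, v_up=2.5e6,
                     f_edge=6e9, p_tx=0.8, p_idle=0.1)
print(t_l, e_l)   # local: 0.8 s, 2.7 J
print(t_o, e_o)   # edge: faster but pays upload + idle energy
```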

Problem Model.
The purpose of this paper is to jointly optimize the offloading decision and resource allocation scheme in the multiuser, multi-MEC-server scenario, considering the limited computing resources and the delay constraints of computing tasks. This shortens the completion time of all computing tasks and minimizes the energy consumption of all terminal devices while meeting the delay constraints, thereby extending the usage time of terminal devices [27,28]. Thus, the system objective function Ψ is defined as Ψ = Σ_{n=1}^N E_n + 10 Σ_{n=1}^N (T_n/η_n), where T_n/η_n is the ratio of the completion time of G_n to its delay constraint. According to the simulation results, Σ_{n=1}^N E_n and Σ_{n=1}^N (T_n/η_n) differ by a decimal order of magnitude; therefore, to keep the two terms at the same order of magnitude so they can be optimized together, Σ_{n=1}^N (T_n/η_n) is multiplied by a factor of 10. The objective function Ψ minimizes the sum of the overall energy consumption of terminal devices and the ratios of task execution time to delay constraint by solving for the optimal offloading decision and resource allocation plan. The overall problem model is min_{X,Y} Ψ, where X is the task offloading decision and Y is the computing resource allocation. Constraints C_1, C_2, and C_3 indicate that each task G_n can be offloaded only to the local device or one of the MEC servers for calculation; C_4 represents the task completion delay constraint; and C_5 and C_6 represent the constraints that the allocated computing resources must satisfy.
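The objective Ψ above, including the factor-of-10 scaling of the delay-ratio term, can be written directly; the input values here are illustrative.

```python
def objective(E, T, eta):
    # Psi = sum_n E_n + 10 * sum_n (T_n / eta_n): energy plus the scaled
    # ratio of each task's completion time to its delay constraint
    assert len(E) == len(T) == len(eta)
    return sum(E) + 10.0 * sum(t / h for t, h in zip(T, eta))

# Two-device example: energies in J, times and constraints in seconds
print(objective(E=[2.7, 0.276], T=[0.8, 0.52], eta=[1.0, 1.0]))
```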

Multiagent Reinforcement Learning Algorithm.
The multiagent reinforcement learning system is shown in Figure 2, where multiple agents act at the same time. Under the joint action, the entire system transitions to a new state, and each agent receives an immediate reward [29,30].
For multiagent reinforcement learning, a Markov game model must first be established. A Markov game can be described by a tuple (n, S, A_1, ..., A_n, R_1, ..., R_n). Here, (1) n is the number of agents, i.e., N, the number of terminal devices; S is the system state, which generally refers to the joint state of the multiple agents. The terminal devices share the current load status of the edge computing servers, which can be expressed as s = (LD_1, ..., LD_M), where LD_m is the load of MEC server m. (2) R_i is the immediate reward function of each agent; that is, in the current state s, after the joint action (A_1, ..., A_n) is taken by the multiple agents, a reward is obtained in the next system state s′. The reward functions completely describe the relationship among the agents. When every agent has the same reward function, that is, R_1 = R_2 = ··· = R_n, the agents are fully cooperative. When there are only two agents with opposite reward functions, that is, R_1 = −R_2, the agents are fully competitive. When the reward functions lie between these two cases, the relationship is a mixture of competition and cooperation.

Network Status.
S = {s(t)} represents the network state space, where s(t) represents the network state in time period t, which evolves over the entire horizon T. The network status consists of the SNR of each terminal device and the cache status of each MEC server; s(t) can be defined as s(t) = {Γ_1(t), ..., Γ_N(t), ψ_1(t), ..., ψ_M(t)}, where Γ_n = {Γ_n,m, m ∈ M} represents the SNR between user terminal device n and all MEC servers, and ψ_m(t) = {φ_k,m, k ∈ K} represents the cache status of MEC server m.

Network Behavior.
The intelligent agent needs to determine the attachment relationship between terminal devices and MEC servers in each time period, that is, each terminal device's computation offloading, the allocation of computing resources, and the service cache policy of each MEC server. Thus, the executable action of the terminal devices in time period t can be defined as a(t) = {A_1(t), ..., A_N(t), G_1(t), ..., G_N(t)}, where A_n(t) = {a_n,m(t), m ∈ M} represents the attachment indicator of terminal device n and G_n(t) represents the computation offloading decision of terminal device n.

Reward Function.
The goal is to maximize the total benefit of the system, but the reward function is set to the current benefit of the system. First, calculate the part of the revenue from the spectrum and backhaul resources that the system leases and allocates to terminal devices. The unit price of spectrum leased from MEC server m is set to δ_m per Hz, and the unit price of the backhaul link from MEC server m to the core network is set to σ_m per bps. Correspondingly, terminal device n is charged for transmitting its computation data to the MEC server and for using the backhaul link from the MEC server to the core network, at unit prices α_n per Hz and β_n per bps. Summarizing this income and expenditure, the part of the income for leased spectrum and backhaul resources obtained through terminal device n includes terms of the form σ_m a_n,m(t) R_n,m(t). Then, calculate the profit obtained from allocating computing resources to terminal devices. On the one hand, when the MEC side performs computing tasks, the communication company must be paid for the loss of processing them; the unit price of the energy consumption of MEC server m is defined as χ_m. On the other hand, the terminal device must pay a certain price for the server on the MEC side, and the computing resource allocated per unit computing task is priced at ζ_n. Therefore, the benefit obtained by allocating computing resources to terminal device n can be calculated accordingly. The amount of computing resources allocated per unit computing task has a very important impact on the completion time of the task. The service cache cost mainly includes two parts: the cost of replacing the type of cache supported on the MEC side and the cost of caching specific services on the MEC server. Define the unit price of replacing the cache type on MEC server m as ξ_m per service type; the unit price of caching services on the MEC server is charged per unit of storage space.
To increase the benefit of caching, the business type is weighted by the backhaul from the MEC server to the core network, which is used to measure the cost to users. The benefit obtained by executing the cache service on MEC server m can be expressed in terms of |ψ_m(t)|, the number of nonzero elements, and the auxiliary function I[·], where I(x) = 1 when x > 0 and I(x) = 0 otherwise. The immediate reward is designed as the total income of the MVNO from all current users of the system during time period t. The long-term return R(t) is expressed as R(t) = Σ_{k=0}^∞ ϵ^k r(t + k), where ϵ ∈ [0, 1) is the discount rate weighting future earnings. When ϵ approaches 1, the system pays more attention to long-term benefits; when ϵ approaches 0, it pays more attention to short-term benefits.
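The discounted long-term return described above can be sketched as follows; the reward sequence is illustrative, and the backward accumulation is the standard way to evaluate R(t) over a finite horizon.

```python
def discounted_return(rewards, eps):
    # R(t) = r(t) + eps * r(t+1) + eps^2 * r(t+2) + ...
    # Accumulate backwards so each step costs one multiply-add.
    g = 0.0
    for r in reversed(rewards):
        g = r + eps * g
    return g

# eps near 0: only the immediate reward matters
assert discounted_return([1.0, 1.0, 1.0], eps=0.0) == 1.0
# eps near 1: future rewards contribute almost fully
print(discounted_return([1.0, 1.0, 1.0], eps=0.9))  # 1 + 0.9 + 0.81
```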

Dueling-DQN.
DQN is an effective reinforcement learning algorithm that enables the agent to learn good experience from interaction with the environment [31-33]. Based on the DQN learning mechanism, the algorithm has been improved in different aspects. In DQN, because the estimated Q value itself contains error, the max_a Q operation in the update expression tends to select the largest error, which leads to the overestimation problem. Double-DQN is an effective improved algorithm for this problem: the update target of Q(S) is changed to Y = R + λ Q(S′, argmax_a Q(S′, a; θ); θ′), where λ is the discount factor, θ are the parameters of the online network, and θ′ are the parameters of the target network. The Double-DQN algorithm takes advantage of a double neural network, using two neural networks to learn at the same time, which effectively avoids the overestimation caused by error amplification.
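The Double-DQN target can be sketched with two Q-tables standing in for the online and target networks; all values are illustrative, but the decoupling (online network selects the action, target network evaluates it) is exactly the mechanism described above.

```python
def double_dqn_target(reward, q_online_next, q_target_next, gamma):
    # Online network selects the greedy action for the next state...
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but the target network's value for that action is used,
    # which breaks the max-over-noise overestimation of vanilla DQN.
    return reward + gamma * q_target_next[a_star]

# The online net overrates action 1, yet the (lower) target value is used:
print(double_dqn_target(1.0,
                        q_online_next=[0.2, 0.9],
                        q_target_next=[0.5, 0.3],
                        gamma=0.9))
```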
Dueling-DQN is also an improvement to the DQN algorithm. Compared with previous algorithms, Dueling-DQN learns faster and achieves better results. It retains most of the DQN learning mechanism; the only difference is an improvement to the neural network, as shown in Figure 3.
In the traditional DQN algorithm, the output is the Q value corresponding to each action. In the Dueling-DQN algorithm, the output is expressed as a combination of two parts: the value function and the advantage function [34]. The value function refers to the value of a given state, and the advantage function refers to the advantage obtained by each action in that state. Therefore, in Dueling-DQN, the Q value of DQN can be reexpressed as Q(s, a; ω, ω_1, ω_2) = V(s; ω, ω_1) + (l(s, a; ω, ω_2) − (1/|A|) Σ_{a′} l(s, a′; ω, ω_2)), where V(·) and l(·) are the value function and advantage function, respectively, ω is the parameter of the convolutional layers of the neural network, and ω_1, ω_2 are the parameters of the two control streams. The term after the plus sign centralizes the advantage function in order to resolve the uniqueness problem of the Q value.
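The dueling aggregation, with its mean-centered advantage stream, can be sketched numerically; the value and advantage inputs here are illustrative stand-ins for the two network streams.

```python
def dueling_q(value: float, advantages):
    # Q(s,a) = V(s) + (A(s,a) - mean_a' A(s,a')); subtracting the mean
    # makes the V/A decomposition unique for a given Q.
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

q = dueling_q(value=2.0, advantages=[1.0, -1.0, 0.0])
print(q)  # [3.0, 1.0, 2.0]
```

Note that shifting all advantages by a constant leaves Q unchanged, which is exactly the uniqueness issue the centering resolves.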

Experimental Results and Analysis
The specific simulation parameters are as follows.
Assume that the computing power of each device n is 1.5 GHz, the uplink transmission power is 800 mW, the idle power is 100 mW, and the upload rate is 2.5 Mb/s. M = 4, and the overall computing capacities of the MEC servers are 6 GHz, 5 GHz, 3 GHz, and 1 GHz, respectively. The data size D_n of task G_n obeys a uniform distribution over (600, 1200) kbits, and the workload ϕ_n obeys a uniform distribution over (1000, 1500) Megacycles.
For the parameters of the Dueling-DQN algorithm, the learning rate is set to 0.001 and the discount factor λ to 0.90. The size of the experience replay set is 3000, and the number of randomly sampled samples is 40.
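The experience replay setting above (capacity 3000, minibatch of 40) can be sketched with a minimal buffer; the transition layout (s, a, r, s_next) is an assumption, since the paper does not spell it out.

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=3000):
        # deque(maxlen=...) silently evicts the oldest transition when full
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size=40):
        # Uniform sampling without replacement, as in standard DQN training
        return random.sample(list(self.buf), batch_size)

rb = ReplayBuffer()
for i in range(100):
    rb.push(i, 0, 1.0, i + 1)
batch = rb.sample(40)
assert len(batch) == 40 and len(rb.buf) == 100
```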

Learning Rate Analysis.
The learning rate of the algorithm has a great impact on the performance of the proposed strategy. Therefore, three different learning rates (0.01, 0.001, and 0.0001) are selected to compare the convergence of the improved DQN algorithm, as shown in Figure 4.

Discount Factor Analysis.
Similarly, the influence of the discount factor on the improved DQN algorithm is shown in Figure 5, where the discount factor takes the values 0.8, 0.9, and 0.95.
It can be seen from Figure 5 that the long-term reward increases continuously as the discount factor increases. When λ is 0.95, the long-term reward is 3700 once stable. The discount factor affects the behavior selection strategy: a larger discount factor makes the system pay more attention to long-term benefits, while a lower one makes it focus on current benefits, so a higher discount factor often leads to greater long-term rewards. However, in actual use, an overly high discount factor does not bring corresponding benefits. This is because real systems are changeable, and placing too much emphasis on future benefits leads to excessive calculation and losses in the system, so a trade-off is often required.

Optimization Comparison under Different Objective Functions.
For multiobjective optimization problems that reduce both delay and energy consumption, the weighted sum of task execution delay and terminal execution energy is usually used as the objective function, calculated as ω_t Σ_{n=1}^N T_n + (1 − ω_t) Σ_{n=1}^N E_n, where ω_t is the weight coefficient of the execution delay and 1 − ω_t is the weight coefficient of the execution energy.
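The weighted-sum baseline used in this comparison can be sketched directly; the delay and energy inputs below are illustrative.

```python
def weighted_objective(T, E, w_t):
    # w_t * sum_n T_n + (1 - w_t) * sum_n E_n: the standard scalarization
    # of the delay/energy trade-off used as the comparison baseline
    return w_t * sum(T) + (1.0 - w_t) * sum(E)

# Sweeping w_t shifts emphasis between delay (high w_t) and energy (low w_t)
for w_t in (0.8, 0.6, 0.4):
    print(w_t, weighted_objective(T=[0.8, 0.52], E=[2.7, 0.276], w_t=w_t))
```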
This weighted-sum objective (22) is compared with the proposed objective function (13) for optimizing delay and energy consumption, with 12 terminal devices. Considering that the goal of the proposed strategy is to shorten delay and reduce energy consumption while satisfying the delay constraints, ω_t takes the values 0.8, 0.6, and 0.4, and joint experiments on the Energy Reduced Scale (ERS) and Time Reduced Scale (TRS) are carried out, as shown in Table 1.
It can be seen from Table 1 that when ω_t is 0.8 or 0.6, the comparison strategy pays more attention to optimizing delay, and when ω_t is 0.4, the optimization results favor energy consumption. However, the optimization result of the proposed objective function is the best: ERS and TRS are 52.18% and 34.72%, respectively, shortening delay and reducing energy consumption under the delay constraints.
When the number of computing tasks is 150, comparing the strategies under the four objective functions with the random offloading strategy, the resulting reduction ratios of delay and energy consumption are shown in Table 2.
It can be seen from Table 2 that the delay and energy consumption optimization effect of the proposed objective is better, with reduction ratios of 2.58% and 30.67%, respectively. This is because the optimization objective of the proposed strategy comprehensively considers the jointly optimized offloading decision and resource allocation plan when computing resources are limited and computing tasks have delay constraints. This allows all computing tasks to shorten completion time and minimizes the energy consumption of all terminal devices while meeting the delay constraints, demonstrating the effectiveness of the proposed objective function.

Performance Comparison with Other Algorithms.
To demonstrate the performance of the proposed strategy, it is compared with [12], [19], and [14] in terms of objective function value, calculation amount, and time saving. Li and Jiang [12] proposed a distributed task offloading strategy that selects the best MEC node offloading amount via a game equation on the basis of quantifying offloading cost and delay. Reference [14] used greedy selection to design a maximum energy-saving priority algorithm and energy priority strategy to achieve optimal offloading of computing tasks on mobile devices. Reference [19] used the time average computation rate maximization algorithm to jointly and efficiently allocate radio and computing resources.

Algorithm Comparison under Different Cumulative Tasks.
In the experiment, the objective function values of the four strategies under different cumulative numbers of computing tasks are shown in Figure 6.
It can be seen from Figure 6 that the objective function value gradually increases with the cumulative number of tasks for all four offloading strategies. However, the proposed strategy has a relatively lower objective function value than the others; that is, its energy consumption and delay are relatively small. For example, when the number of tasks is 180, the objective function value is only 298. This is because the proposed strategy considers computation offloading and resource allocation comprehensively and uses an improved deep learning algorithm for optimization, minimizing delay and energy consumption. Reference [19] only matched computing resources without rationally optimizing the task offloading and computing resource allocation schemes, resulting in high task execution delay and energy consumption. References [12] and [14] both used corresponding algorithms to achieve better resource allocation and task offloading, but they analyze delay less, so their performance needs strengthening.
The total calculation amount of the four strategies over time is shown in Figure 7. The vertical axis represents the total number of task calculations performed by all terminal devices under computation offloading; this number represents the amount of calculation service provided by the MEC server, so the evaluation indicator also represents the benefit of the computing terminal devices in the offloading mode. It can be seen from Figure 7 that as time increases, computing tasks continue to accumulate and the amount of task calculation also increases. However, the calculation amount of the proposed strategy is significantly better than the comparison strategies. Taking a simulation time of 140 s as an example, compared with [12], [19], and [14], the proposed strategy improves by 11.54%, 20.83%, and 152.72%, respectively. The proposed strategy is thus the best in terms of task offloading: it uses the Dueling-DQN algorithm to solve the task offloading and resource allocation models, and its optimization performance is better than the greedy selection algorithm in [14] and the game equation model in [12].

Energy-Saving Comparison per Unit Terminal Device
Figure 8 compares, under the four computation offloading strategies, the average energy saved by each terminal device through computation offloading. In the local computation mode, all energy consumption is generated by local calculations. In the computation offloading mode, the energy consumption is the communication energy caused by uploading tasks. For a task that is offloaded, the energy saving is the difference between the two.
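The energy-saving definition above can be sketched with the standard MEC energy model: local energy scales as kappa * f^2 per CPU cycle, and upload energy is transmit power times upload time. All constants below are hypothetical placeholders, not values from the paper.

```python
KAPPA = 1e-27        # effective switched-capacitance coefficient (assumed)
F_LOCAL = 1e9        # local CPU frequency in Hz (assumed)
P_TX = 0.5           # transmit power in W (assumed)
UPLINK_RATE = 5e6    # uplink rate in bit/s (assumed)

def local_energy(cpu_cycles: float) -> float:
    """Energy if the task is computed locally: kappa * f^2 * C."""
    return KAPPA * F_LOCAL ** 2 * cpu_cycles

def offload_energy(data_bits: float) -> float:
    """Communication energy of uploading the task: P_tx * (D / R)."""
    return P_TX * data_bits / UPLINK_RATE

def energy_saving(cpu_cycles: float, data_bits: float) -> float:
    """Energy saved by offloading: local cost minus upload cost."""
    return local_energy(cpu_cycles) - offload_energy(data_bits)

# Example: a task of 1e9 CPU cycles with 1 Mbit of input data
# saves 1.0 J - 0.1 J = 0.9 J when offloaded.
saving = energy_saving(1e9, 1e6)
```

Summing this difference over all offloaded tasks and dividing by the number of devices gives the per-device saving plotted in Figure 8.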
It can be seen from Figure 8 that, compared with the other comparison strategies, the proposed strategy achieves the largest energy saving, close to 10 × 10^4 J, which also means the least energy consumption. To address the overestimation problem in DQN, the proposed strategy uses the Dueling-DQN algorithm for optimization and designs the system benefits and resource consumption as the rewards and losses, which improves the efficiency and rationality of task offloading and resource allocation. Reference [19] only used a time-average computation rate maximization algorithm to allocate computing resources; this optimization algorithm is traditional and performs poorly, so its overall energy saving is not high. Reference [12] used a game equation model to optimize the task offloading strategy but did not rationalize resource allocation; therefore, its maximum energy saving is 7.10 × 10^4 J. Reference [14] used a greedy selection algorithm to design an energy-saving strategy but did not consider server computing resource constraints and task delay constraints; therefore, its overall performance is not as good as that of the proposed strategy.
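The reward/loss design described above can be illustrated with a simple per-step reward: system benefit (computation served) as the positive term, with weighted energy and delay as the cost, and a hard penalty when the delay constraint is violated. The linear form and the weights are assumptions for illustration; the paper's exact objective function may differ.

```python
W_ENERGY = 0.5   # weight of the energy term (assumed)
W_DELAY = 0.5    # weight of the delay term (assumed)

def step_reward(served_cycles: float, energy_j: float, delay_s: float,
                deadline_s: float) -> float:
    """Reward for one offloading decision; deadline misses are penalized hard."""
    if delay_s > deadline_s:          # delay constraint violated
        return -1.0
    benefit = served_cycles / 1e9     # normalize the computation amount
    cost = W_ENERGY * energy_j + W_DELAY * delay_s
    return benefit - cost

# Example: serving 2e9 cycles at 0.5 J and 0.1 s within a 1 s deadline
# yields reward 2.0 - (0.25 + 0.05) = 1.7.
r = step_reward(2e9, 0.5, 0.1, 1.0)
```

Maximizing the expected sum of such rewards simultaneously favors high computation throughput and low energy/delay, which is how benefit and consumption enter the learning loop as reward and loss.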

Conclusion
The MEC server has limited computing resources, and computing tasks have delay constraints. How to shorten completion time and reduce terminal energy consumption under these delay constraints is therefore an important research issue. To solve this problem, this paper proposes a task offloading and resource allocation strategy based on deep learning for MEC. In the multiuser multiserver MEC environment, a new objective function is designed to construct the mathematical model. In combination with deep reinforcement learning, a partially improved Dueling-DQN algorithm is used to solve the optimization problem model, which can reduce the completion time of computing tasks and minimize the energy consumption of all terminal devices under the delay constraints. The proposed strategy is validated by experiments on a Python platform. The experimental results show that when the learning rate is 0.001 and the discount factor is 0.90, the energy saving is close to 10 × 10^4 J, which is better than the other comparison strategies. In terms of calculation amount, the proposed strategy improves on the comparison strategies by 11.54%, 20.83%, and 152.72%, respectively.
In practice, different users have different concerns about service quality. Therefore, future research can take the differing needs of users into account when making computation offloading decisions, assigning weights to the factors that affect quality of service and combining task priorities for scheduling.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.