A Resource Allocation Scheme for Real-Time Energy-Aware Offloading in Vehicular Networks with MEC

With the emergence of new vehicular applications, computation offloading based on mobile edge computing (MEC) has become a promising paradigm in resource-constrained vehicular networks. However, an unreasonable offloading strategy can cause serious energy consumption and latency. We propose a real-time energy-aware offloading scheme for MEC-based vehicular networks that optimizes communication and computation resources to decrease energy consumption and latency. Because the joint computation offloading and resource allocation problem is a mixed-integer nonlinear problem (MINLP), this article uses a bi-level optimization method to transform the original MINLP into two subproblems. Furthermore, considering the mobility of vehicle users (V-UEs) and the availability of cloud resources, an offloading scheme based on deep reinforcement learning (DRL) is adopted to help users make optimal offloading decisions. Simulation results show that the proposed bi-level optimization algorithm reduces the total overhead by nearly 40% compared with the baseline algorithm.


Introduction
With the continuous progress of sixth-generation (6G) communication and the Internet of vehicles [1-6], novel mobile applications in vehicular networks have increased the demand for low-delay, high-quality services, for example, interactive gaming, augmented reality/virtual reality (AR/VR), face recognition, and natural language processing [7]. The demand for computing is so prominent that it frequently exceeds the capacity that local mobile devices can provide [8]. In general, vehicle units have limited computation resources (e.g., central processing unit (CPU) frequency and memory) and battery lifetime, which poses an unprecedented challenge for effectively executing these mobile applications [9-11]. To cope with the explosive application demands of vehicular terminals, tasks are migrated from the vehicular units to cloud servers [12], referred to as computation offloading. However, cloud servers are spatially far from mobile vehicles, which may cause high transmission latency and severe degradation of offloading efficiency.
Mobile edge computing (MEC) technology [13,14] provides cloud computation resources close to the mobile vehicular terminals. Compared with mobile cloud computing (MCC), MEC can provide lower latency, higher bandwidth, and more computational flexibility in computation offloading. In [15], the authors studied the offloading problem among multiple devices for mobile edge cloud computing and proposed a game-theoretic approach for achieving efficient offloading mechanisms. To minimize the energy consumption of the MEC offloading system, the authors in [10] proposed an energy-efficient computation offloading scheme, which jointly optimized the offloading decisions and the radio resource allocation strategies. In [16], the authors proposed an energy-aware offloading scheme to investigate the trade-off between energy consumption and latency, optimizing computation offloading and resource allocation under limited energy and sensitive latency. However, the weighting factor used there to trade off energy consumption against latency is a constant whose value is defined subjectively, and that work ignored the service conditions of the battery and user-specific demands. In addition, computation offloading, especially in ultra-dense networks (UDNs), may cause more interference and result in unexpected transmission latency [17]. Therefore, it is unreasonable to offload all computation tasks to the MEC servers. As a result, it is critical to make an efficient offloading decision and study the trade-off between the energy consumption of vehicle units and the latency of the corresponding tasks. Based on the above discussion, in this article, we propose a real-time energy-aware offloading scheme to study the trade-off between the energy consumption and the task latency of the vehicle units, which optimizes the allocation of communication and computing resources.
The motivations behind our work are attributed to the following observations: (1) We build a network scenario that deploys multiple MEC servers and multiple request vehicles, where service nodes (MEC servers and vehicles) are equipped with limited wireless and computational resources. (2) Because of the limited computation resources, the MEC-enabled base stations (MEC-BSs) cannot provide endless computation offloading services for all tasks from V-UEs. Therefore, the computation offloading decision should be rationally determined.
(3) The energy consumption and latency affect the quality of service of the vehicle units. They depend mainly on the allocation of transmission power and wireless channels when offloading tasks to the MEC server.
(4) Depending on the usage conditions of a V-UE's battery and user-specific demands, the user preference (i.e., the weighting factor) should be redefined to allow vehicle units to choose different optimization objectives according to the actual energy consumption.
Therefore, based on the above discussions, we propose a real-time energy-aware offloading scheme to study the trade-off between the energy consumption and latency (transmission latency and execution latency) of the vehicle units, which optimizes the allocation of communication and computing resources. The main contributions of this article are as follows: (i) We present an integrated framework for computation offloading and resource allocation in multicell scenarios. For V-UEs, we present a real-time energy-aware offloading scheme that is combined with computation and communication resource allocation to minimize the weighted sum of energy consumption and latency.
(ii) We redefine the weighting factor with the residual energy, so it is no longer a subjective weighting of the V-UEs' energy consumption and latency. The advantage of the updated definition is that the actual energy consumption is taken into account, which helps maintain the lifetime of vehicle users.
(iii) For the mixed-integer non-linear problem (MINLP) of computation offloading and resource allocation, a bi-level optimization approach is adopted to transform the original MINLP problem into two subproblems, that is, a lower-level problem for seeking the allocation of power and channels, and an upper-level problem for task offloading. Furthermore, taking into account the mobility of vehicle users (V-UEs) and the availability of cloud resources, an offloading scheme based on deep reinforcement learning (DRL) is proposed to enable the users to make the optimal offloading decisions.
The rest of the article is organized as follows: Section 2 reviews related work. Section 3 presents the system model and problem formulation. Section 4 presents the optimal computation offloading and resource allocation scheme. The simulation results are discussed in Section 5, and Section 6 concludes the article.

MEC-Enabled Vehicular Networks.
The MEC-enabled vehicular network has been extensively studied recently, since it can support the communication between vehicles and edge servers for real-time information sharing in the Internet of vehicles. In [18], the authors presented a cooperative autonomous driving-oriented MEC-enabled 5G-V2X prototype system design with the knowledge of big data gathered from the environment. The authors in [19] supported intelligent battery switch service management for electric vehicles with a MEC-driven architecture. In [20], the application of multisource data fusion to support vehicle on-road analysis was proposed; it can provide services for vehicles such as peeking around the corner, extending the sensing range, and reinforcing local observations. In [21], a traffic prediction algorithm based on MEC was proposed for addressing short-term traffic prediction. A positioning calibration mechanism based on MEC and the global positioning system (GPS), which calculates the one-way delay in vehicle transmissions, was proposed in [22]. The above works demonstrated that the combination of MEC and vehicular networks can be meaningful. To realize these applications, resource allocation and offloading schemes become topics worth studying.

Offloading Scheme in Vehicular Networks.
In order to reduce the latency and the energy consumption in the network as well as ensure the reliability of V-UEs, many works have studied resource allocation and offloading schemes in vehicular networks. The authors in [23] designed a distributed best response algorithm based on a computation offloading game model to maximize the utility of each vehicle, but this work only regarded the MEC server as the offloading target. In [24], the authors proposed a joint allocation of wireless and MEC computing resources, including V2X link clustering and MEC computation resource scheduling. A hierarchical architecture was constructed based on MEC in [25] to minimize the total network delay, which jointly considers vehicle-to-RSU computation offloading, RSU peer offloading, and content caching. In [26], the authors constructed a three-layer architecture for vehicular networks and formulated an energy-efficient computation offloading problem to minimize the overall power consumption while satisfying the delay constraint. The authors in [27] constructed an SDN-assisted MEC architecture for the vehicular network to reduce the system overhead by optimizing the offloading decision, transmission power control, subchannel assignment, and computing resource allocation. However, the above works ignored the dynamic balance of latency and energy consumption. Different from the above computation offloading and resource allocation strategies for vehicular MEC networks, we design a real-time energy-aware offloading scheme combined with computation and communication resource allocation, considering the task deadline constraint and the balance between energy consumption and latency. On this basis, we propose an offloading and resource allocation scheme based on deep reinforcement learning (DRL) to minimize the total overhead.

System Model.

M Road Side Units (RSUs) are located along a straight road, each connected to a MEC server. Each RSU covers one region, and vehicles can only access the RSU of the region where they are located. A vehicle can communicate with neighboring vehicles or RSUs to send or receive information. As shown in Figure 1, L request vehicles following a Poisson distribution are deployed, expressed as V_i (i ∈ {1, 2, ..., L}). Each RSU j (j ∈ {1, 2, ..., M}) serves U_j V-UEs. To multiplex spectrum, we assume that multiple RSUs share a single spectral resource, so interference exists between RSUs. The bandwidth W is divided into N channels. V-UEs connect to the RSUs in the orthogonal frequency-division multiple access (OFDMA) mode, where each V-UE channel in the same RSU is orthogonal to the others. In this network, V-UE i in RSU j has a computation-intensive task to complete, in which d_{i,j} is the size of the computation input data, c_{i,j} denotes the total number of CPU cycles required to accomplish the task, and D^max_{i,j} denotes the maximal latency that the V-UE can tolerate. For each V-UE, its task can be executed either locally on itself or remotely on the MEC server via computation offloading. Let s_{i,j} be the offloading decision of V-UE i in RSU j: s_{i,j} = 1 if the V-UE offloads its task to the MEC server, and s_{i,j} = 0 otherwise. To increase the readability of the article, the definitions of some variables are listed in Table 1.

Local Computing.

Define f_{i,loc} as the computation capability (i.e., CPU cycles per second) of the V-UE. When task A_{i,j} is executed locally, the computation execution time is t^L_{i,j} = c_{i,j}/f_{i,loc}. The corresponding energy consumption of the V-UE for local execution is E^L_{i,j} = k c_{i,j} (f_{i,loc})^2, where k = 10^{-26} is a coefficient that depends on the chip architecture [28,29]. Note that f_{i,loc} affects the computation time and energy consumption simultaneously. The CPU cycle frequency can be scheduled via dynamic voltage and frequency scaling (DVS) technology [30].
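As a numerical illustration of the local-execution model above (a minimal sketch with illustrative variable names, not code from the paper):

```python
K = 1e-26  # chip-architecture coefficient k from the text

def local_cost(c_cycles, f_loc):
    """Latency t = c/f and energy E = k*c*f^2 for executing a task locally."""
    t_loc = c_cycles / f_loc            # execution time in seconds
    e_loc = K * c_cycles * f_loc ** 2   # energy in Joules
    return t_loc, e_loc

# A 10^9-cycle task on a 1 GHz local CPU:
t, e = local_cost(1e9, 1e9)  # t = 1.0 s, e = 10.0 J
```

Raising f_{i,loc} shortens the latency linearly but increases energy quadratically, which is exactly the trade-off that the CPU cycle frequency scheduling in Section 4 exploits.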

MEC Edge Computing.

When the input data are transmitted to the MEC server through the RSU, the transmission expenditure between the MEC server and the RSU is neglected [31,32]. If the V-UE accesses RSU j over channel n, the achievable uplink transmission rate can be represented as r_{i,j,n} = w log_2(1 + p_{i,j,n} h_{i,j,n}/(σ^2 + I_{i,j,n})), where w = W/N is the channel bandwidth, p_{i,j,n} and h_{i,j,n} stand for the uplink transmission power and channel gain between V-UE i and RSU j via channel n, respectively, and σ^2 denotes the noise power. I_{i,j,n} is the interference that V-UE i in RSU j suffers from other V-UEs in adjacent RSUs on the same channel n, expressed as I_{i,j,n} = Σ_{l≠j} Σ_{k=1}^{U_l} a_{k,l,n} p_{k,l,n} h^j_{k,l,n}, where l indexes every RSU except the jth, h^j_{k,l,n} is the channel gain from V-UE k in RSU l to RSU j over channel n, and U_l is the number of V-UEs in RSU l. Hence, the total uplink transmission rate for V-UE i in RSU j can be obtained as r_{i,j} = Σ_{n=1}^{N} a_{i,j,n} r_{i,j,n},

Wireless Communications and Mobile Computing
where a_{i,j,n} ∈ {0, 1}: a_{i,j,n} = 1 indicates that channel n is assigned to V-UE i in RSU j to offload its task, and a_{i,j,n} = 0 otherwise. Let f_C denote the CPU cycle frequency of the MEC server, which is fixed during computation task execution [30,33]. Then, the total edge computing time of the task includes the transmission time and the computation time on the MEC server, which can be written as t^C_{i,j} = d_{i,j}/r_{i,j} + c_{i,j}/f_C. The corresponding energy consumption of the V-UE is E^C_{i,j} = p_{i,j} d_{i,j}/r_{i,j}, that is, the energy spent transmitting the input data. Here, we assume that the time and energy consumed returning the computation result from the MEC server to the V-UE are negligible, because the size of the output data is much smaller than that of the input data and the downlink data rate is generally very high, similar to the study [32].
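The uplink-rate and edge-overhead expressions above can be sketched as follows (illustrative names; the interference term is passed in as a precomputed sum):

```python
import math

def uplink_rate(w_hz, p, h, sigma2, interference):
    """Per-channel rate: w * log2(1 + p*h / (sigma^2 + I))."""
    return w_hz * math.log2(1.0 + p * h / (sigma2 + interference))

def edge_overheads(d_bits, c_cycles, rate_bps, f_mec, p):
    """Edge time = transmission + MEC computation; V-UE energy = p * tx time."""
    t_tx = d_bits / rate_bps
    t_edge = t_tx + c_cycles / f_mec
    e_edge = p * t_tx
    return t_edge, e_edge

# 1 MHz channel with p*h/(sigma^2 + I) = 3  ->  rate = 2 Mbit/s
r = uplink_rate(1e6, 3.0, 1.0, 1.0, 0.0)
t_c, e_c = edge_overheads(2e6, 1e9, r, 1e9, 3.0)  # t_c = 2.0 s, e_c = 3.0 J
```

Note how interference from co-channel V-UEs in adjacent RSUs directly lowers the rate and hence raises both the transmission time and the transmission energy.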
During the execution of a task, both latency and energy consumption affect the V-UEs (i.e., the battery energy limitation of V-UEs). Hence, we introduce the weighting factor w_{i,j} (w_{i,j} ∈ [0, 1]) to study the trade-off between latency and energy consumption, which can be defined to meet user-specific demands [31]. Energy and latency can be saved by adjusting the weighting factor. Furthermore, we introduce the residual energy rate r^E_{i,j} of the battery into our model's weighting factor. It is defined as r^E_{i,j} = E^res_{i,j}/E_total, where E^res_{i,j} is the residual energy of V-UE i in RSU j and E_total is the battery capacity in Joules [34]. r^E_{i,j} is a variable that reflects the real-time service conditions of the battery.
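The residual-energy-aware weighting used later in the simulations (w1 = w · r^E) can be sketched as follows (illustrative names):

```python
def residual_energy_rate(e_res, e_total):
    """r^E = residual battery energy / battery capacity, in [0, 1]."""
    return e_res / e_total

def realtime_weight(w, e_res, e_total):
    """Energy-aware weighting factor w1 = w * r^E: the subjective preference w
    is scaled by the real-time battery state rather than fixed in advance."""
    return w * residual_energy_rate(e_res, e_total)

# Preference w = 0.8 with a half-charged battery:
w1 = realtime_weight(0.8, 50.0, 100.0)  # 0.8 * 0.5 = 0.4
```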
According to (1) and (2), the overhead of the task locally computed on V-UE i in RSU j, namely, the weighted sum of energy consumption and latency, can be obtained as O^L_{i,j} = w_{i,j} E^L_{i,j} + (1 − w_{i,j}) t^L_{i,j}. Similarly, the overhead of the task computed on the MEC server can be expressed as O^C_{i,j} = w_{i,j} E^C_{i,j} + (1 − w_{i,j}) t^C_{i,j}. Therefore, the overhead of V-UE i in RSU j can be obtained by O_{i,j} = (1 − s_{i,j}) O^L_{i,j} + s_{i,j} O^C_{i,j}.

Problem Formulation.

As mentioned above, we model the problem in a multicell scenario. In the multicell network, interference management and computation offloading are considered [16]. We formulate the offloading and resource allocation for the MEC system as an optimization problem.
Constraint C1 is the maximum tolerable latency. C2 requires that the energy consumption not exceed the residual energy of the V-UE. C3 restricts the local CPU cycle frequency to a finite set of values. C4 guarantees the maximum transmission power of V-UE i in RSU j. C5 requires that the interference at RSU j caused by V-UEs offloading tasks in other RSUs on each channel meet the predefined threshold I. C6 indicates that each V-UE can be allocated at most one channel. C7 states that binary variables are used to represent channel allocation: a_{i,j,n} = 1 indicates that V-UE i uses subchannel n for task offloading, and a_{i,j,n} = 0 otherwise. C8 defines the offloading decisions as binary variables; in other words, C8 specifies that each V-UE completes its task either by local execution or by edge execution.

Optimal Local Computing via CPU Cycle Frequency Scheduling.

The latency and energy consumption of local computing are determined by the local computing capability f_{i,loc}. To minimize the local overhead, we can schedule the CPU cycle frequency of the V-UE. Considering constraints C1, C2, and C3 in problem P1, subproblem P1.1 can be formed. According to [25], f_{i,loc} is of a large order of magnitude; therefore, the higher-order terms in (f_{i,loc})^2 and (f_{i,loc})^3 are negligible, and the objective simplifies to (16). Expression (16) is monotonic in f_{i,loc}, increasing in one regime and decreasing in the other. According to C1 and C2, the latency constraint yields the lower bound f_{i,loc} ≥ c_{i,j}/D^max_{i,j}, and the energy constraint yields the upper bound f_{i,loc} ≤ sqrt(E^res_{i,j}/(k c_{i,j})). Combining (16) with constraint C3 defines the feasible set of frequencies; to ensure that the feasible region of f_{i,loc} is nonempty, the lower bound must not exceed the upper bound.
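Under the local model t = c/f and E = k·c·f², constraints C1 and C2 reduce to a frequency interval; a minimal sketch (illustrative names, assuming exactly these two bounds):

```python
import math

K = 1e-26  # chip coefficient k

def feasible_frequency_range(c_cycles, d_max, e_res):
    """C1 (c/f <= D_max) gives f >= c/D_max; C2 (K*c*f^2 <= E_res) gives
    f <= sqrt(E_res / (K*c)). Returns (f_min, f_max), or None if empty."""
    f_min = c_cycles / d_max
    f_max = math.sqrt(e_res / (K * c_cycles))
    return (f_min, f_max) if f_min <= f_max else None

# A 10^9-cycle task, 2 s deadline, 10 J of residual battery energy:
rng = feasible_frequency_range(1e9, 2.0, 10.0)  # (5e8, 1e9)
```

The optimal f_{i,loc} is then picked from the values of C3 that fall inside this interval, exploiting the monotonicity of the simplified objective.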

Optimal Edge Computing through Resource Allocation and Computation Offloading.

In a multicell scenario, we combine power allocation, channel allocation, and computation offloading to minimize the V-UEs' weighted sum of latency and energy consumption. Because there are two binary variables, along with a large number of other variables and interference terms, the problem is nonconvex and an MINLP, and the bi-level optimization approach is taken to solve it. The original MINLP problem is decoupled into a lower-level problem that seeks the optimal power and channel allocation and an upper-level problem for task offloading [35]. Simultaneously, considering the mobility of vehicle users (V-UEs) and the availability of cloud resources, an offloading scheme based on DRL is proposed to enable the users to make the optimal offloading decisions [36].

P1.2: min F(s, p, a)
Because both the objective function and the constraints are nonconvex, P1.2 is a nonconvex problem and a mixed-integer nonlinear problem.
Proof: the proof of nonconvexity of the objective function is shown in Appendix A; the proof of nonconvexity of the constraints follows the same argument. Formula (21) is an MINLP, and it is very hard to find its optimal solution directly, so we use the bi-level optimization approach to solve it. Note that P1.2 involves two embedded problems: one is to calculate the computing overhead of the entire system once the offloading strategy s is known, and the other is inversely influenced by the power p and channel allocation strategy a.

Bi-Level Optimization Approach.
In view of P1.2, we adopt a bi-level optimization approach to solve the original problem. The bi-level optimization problem is treated as a multistage game between the upper and lower problems. First, given the task offloading strategy s, the optimal power allocation p and channel allocation strategy a are obtained by solving F(s, p, a). Then, using the optimal power p*(s) and channel allocation strategy a*(s) calculated in the lower-level problem, the optimal task offloading strategy s* is found by solving F(s, p*(s), a*(s)) in the upper-level problem.
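The multistage game can be sketched as a simple enumeration loop, where the lower-level solver and the objective F are placeholders for the paper's subproblems:

```python
def bilevel_optimize(offloading_candidates, lower_solver, objective):
    """For each candidate offloading strategy s, the lower level returns the
    best allocation (p*(s), a*(s)); the upper level then picks the s that
    minimizes F(s, p*(s), a*(s))."""
    best = None
    for s in offloading_candidates:
        p_star, a_star = lower_solver(s)      # lower-level problem
        cost = objective(s, p_star, a_star)   # F(s, p*(s), a*(s))
        if best is None or cost < best[0]:
            best = (cost, s, p_star, a_star)
    return best

# Toy check: two strategies, cost = s + p + a with (p, a) = (s, s)
cost, s_opt, _, _ = bilevel_optimize([0, 1], lambda s: (s, s),
                                     lambda s, p, a: s + p + a)
```

In the paper the upper level is not brute-force enumeration but the DRL-based offloading scheme of Section 4; the loop above only illustrates the decomposition.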

Lower-Level Problem.
Given the task offloading strategy s, the local overhead becomes a known quantity, and the lower-level problem of P1.2 can be written as P2.1, with constraint C1 rewritten in terms of s_{i,j}. Obviously, P2.1 is almost a strictly convex problem, except for the discrete channel assignment variable a_{i,j,n}. Relaxing the integer variable a_{i,j,n} to be continuous on [0, 1], problem P2.1 is written as the Lagrangian function in (23). For fixed s, the optimal power and channel allocation values can be obtained by taking partial derivatives of (23) with respect to p_{i,j,n} and a_{i,j,n}. From the partial derivative with respect to p_{i,j,n}, we can obtain the optimal power of V-UE i on channel n.
Once the optimal power allocation p*(s) is calculated, the optimal channel allocation strategy can be obtained from the corresponding first-order condition, yielding the optimal channel allocation a*_{i,j,n} in (27). The Lagrange multipliers λ_1, λ_2, and λ_3 are updated using their corresponding subgradients in (28)-(30), where μ_1, μ_2, and μ_3 are the step sizes of the subgradient algorithm. The pseudocode is shown in Algorithm 1, which details the iterative resource allocation procedure of the bi-level optimization algorithm.
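The multiplier updates (28)-(30) are projected subgradient steps; a one-line sketch (λ and the subgradient g are illustrative scalars):

```python
def subgradient_update(lam, grad, mu):
    """One projected subgradient step: lambda <- max(0, lambda + mu*g),
    keeping the dual multiplier nonnegative."""
    return max(0.0, lam + mu * grad)

a = subgradient_update(1.0, 2.0, 0.1)    # 1.2: moves along the subgradient
b = subgradient_update(1.0, -30.0, 0.1)  # 0.0: projected back onto lambda >= 0
```

The projection onto the nonnegative orthant is what keeps the dual variables of the relaxed constraints feasible between iterations.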

Upper-Level Problem.
Given the power and channel allocation strategies (p*(s), a*(s)) for fixed s, the optimization objective becomes P3: min F(s, p*(s), a*(s)), which is convex with respect to s_{i,j}; if constraint C1 is satisfied, the optimal offloading decision can be found by the Q-learning-based deep reinforcement learning offloading scheme.

Q-Learning Deep Reinforcement Learning Algorithm.
The Q-learning-based deep reinforcement learning algorithm ensures that the V-UEs make optimal offloading decisions based on the dynamics of the system in terms of V-UE and cloudlet behaviors. The problem is represented as a Markov decision process (MDP) whose state space comprises the number of task requests of all V-UEs, the maximum number of service resources that a V-UE can provide, the number of remaining tasks in the edge cloud, and the distance between the V-UE and the edge cloud. The primary objective is to find the optimal actions, that is, how many tasks the V-UE should process locally and how many tasks should be offloaded to each cloudlet, such that the V-UE's overhead from task execution, that is, the weighted energy consumption and latency, is minimized [36].

MDP-Based Offloading Problem Formulation.
The problem of making an offloading decision for the V-UE is expressed as a finite MDP. The MDP can be represented as a tuple M = (S, A, P, R), where S and A represent the state and action spaces, P is the state transition probability, and R(s, a) is the immediate reward obtained by taking action a in state s [36]. π is a policy that maps a state s to an action a, that is, π(s) = a. The primary goal of the V-UE is to find the optimal policy π* that minimizes the V-UE's overhead. The state space S is defined as s = (Q_u, Q_um, Q_c, D), where Q_u, Q_um, Q_c, and D denote the number of task requests of all V-UEs, the maximum number of service resources that a V-UE can provide, the number of remaining tasks in the edge cloud, and the distance state, respectively. The action space A is defined as A = {a_0, . . . , a_i, . . . , a_N | a_i ∈ {0, 1, . . . , a_max}},

where a_0 represents the number of tasks processed locally, and a_i (i = 1, . . . , N) is the number of tasks to be offloaded to cloudlet i. a_max is the maximum number of tasks that can be processed locally or offloaded to the MEC within each decision period. In addition, the total number of tasks in an action a must be less than or equal to the number of tasks currently remaining in the V-UE's queue. The immediate reward should reflect the overhead incurred when making offloading decisions in each system state. Therefore, we define the immediate reward function R(s, a) for an action a at state s in terms of an immediate utility function U(s, a) and an immediate cost function C(s, a). The utility function is defined with a utility constant ρ. The immediate cost function is C(s, a) = η_1 E(s, a) + η_2 T(s, a), where η_1 and η_2 are constants, E(s, a) is the energy consumption, and T(s, a) is the delay.
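One plausible composition of the reward pieces named above can be sketched as follows; the utility/cost split and the cost form C = η₁E + η₂T come from the text, while the subtraction R = U − C is an assumption, since the exact formula is not reproduced here:

```python
def immediate_cost(energy, delay, eta1, eta2):
    """C(s, a) = eta1*E(s, a) + eta2*T(s, a), as stated in the text."""
    return eta1 * energy + eta2 * delay

def immediate_reward(utility, energy, delay, eta1=1.0, eta2=1.0):
    """Assumed composition R(s, a) = U(s, a) - C(s, a): the agent trades
    task utility against the energy and latency it incurs."""
    return utility - immediate_cost(energy, delay, eta1, eta2)

r = immediate_reward(10.0, 2.0, 3.0)  # 10 - (2 + 3) = 5
```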

DQN-Based Offloading Decision Algorithm.
To solve the MDP-based offloading problem, we propose an online learning scheme based on the model-free deep RL algorithm called deep Q-network (DQN) [37]. In the DQN-based learning scheme, the V-UE acts as an agent that interacts with the mobile edge cloud environment, receives sensory data, and performs a valid action. The agent selects an offloading action a_t for state s_t at time slot t so as to minimize the discounted reward over the long run. A deep neural network, called a deep Q-network, is used to approximate the optimal action-value function Q*(s, a) [37], which denotes the expected total discounted reward obtained by taking action a in state s and following policy π thereafter, with the sum of rewards r_t discounted by γ. Here, γ ∈ (0, 1) is a discount factor that balances immediate and future rewards, and E[·] denotes the expectation.
The Q-network can be regarded as a neural network approximator used to estimate the action-value function, where θ is the weight vector of Q(s, a; θ). At each decision period, the V-UE first takes the state vector S = (Q_u, Q_um, Q_c, D) as the input of the Q-network and obtains the Q-values Q(s, ·) for all possible actions a as outputs. Then, the V-UE selects the action according to the ε-greedy exploration method.
Furthermore, the Q-network can be trained by iteratively adjusting the weights θ to minimize a sequence of loss functions, where the loss at time slot t is the squared error between the current predicted Q-value Q(s_t, a_t; θ_t) and the target (r_t + γ max_{a′} Q(s_{t+1}, a′; θ_{t−1})); θ_t represents the network parameters in iteration t, and the previous parameters θ_{t−1} are used to compute the target. In other words, given a transition (s_t, a_t, r_t, s_{t+1}), the Q-network weights θ are updated to minimize the squared error loss between Q(s_t, a_t) and the target Q-value. In addition, the DQN algorithm uses the experience replay technique during training to mitigate the instability caused by the non-linear function approximator. More specifically, the V-UE's experiences e_t = (s_t, a_t, r_t, s_{t+1}) are stored in a replay memory Ω = {e_{t−ψ}, . . . , e_t}, where ψ is the replay memory capacity. While training the Q-network, a random mini-batch is drawn from the replay memory instead of only the most recent transition e_t. The detailed DQN-based offloading decision algorithm is presented as pseudocode in Algorithm 2: lines (2)-(4) are recursive, presenting the V-UE's action of making an offloading decision at the beginning of each decision period, after which the Q-values are estimated using the Q-network; lines (5)-(7) train the Q-network using the experience replay method.

Algorithm 1 (excerpt):
(1) Input: s_{i,j}, ε, I, d_{i,j}, O_{i,j}
(2) Output: p*_{i,j,n}, a*_{i,j,n}
(3) Initialize the matrices p_{i,j,n}, a_{i,j,n}
(4) for o = 1 :
(7) for n = 1 : N
(8) for i = 1 : L
(9) Calculate p*_{i,j,n} according to (25)
(10) Obtain the optimal channel allocation a*_{i,j,n} using (27)
(11) end for
(12) end for
(13) Update the multipliers λ_1, λ_2, and λ_3 using (28), (29), and (30)
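The ε-greedy selection plus experience-replay loop can be sketched with a tabular stand-in for the Q-network; the toy environment below replaces the V-UE/cloudlet dynamics, and a real DQN would swap the table for a neural approximator Q(s, a; θ):

```python
import random
from collections import deque

def train_q(env_step, states, actions, steps=200, alpha=0.1, gamma=0.8,
            eps=0.1, batch=8, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and experience
    replay; costs are minimized, matching the paper's overhead objective."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    memory = deque(maxlen=1000)
    s = states[0]
    # warm-start the replay memory with one transition per action
    for a0 in actions:
        r0, sn0 = env_step(s, a0)
        memory.append((s, a0, r0, sn0))
    for _ in range(steps):
        # epsilon-greedy: explore, otherwise take the lowest-cost action
        a = rng.choice(actions) if rng.random() < eps else \
            min(actions, key=lambda x: Q[(s, x)])
        r, s_next = env_step(s, a)
        memory.append((s, a, r, s_next))
        # train on a random mini-batch drawn from the replay memory
        for (si, ai, ri, sn) in rng.sample(list(memory), min(batch, len(memory))):
            target = ri + gamma * min(Q[(sn, x)] for x in actions)
            Q[(si, ai)] += alpha * (target - Q[(si, ai)])
        s = s_next
    return Q

# Toy environment: action 0 costs 1 per step, action 1 costs 5
Q = train_q(lambda s, a: (1.0 if a == 0 else 5.0, 0),
            states=[0], actions=[0, 1])
```

After training, the learned Q-values rank the cheap action below the expensive one, so the greedy policy offloads in the way that minimizes long-run overhead.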

Simulation Results
In this section, the MATLAB simulation platform is used to verify that the proposed resource allocation mechanism is efficient in the MEC-based vehicular network. We follow the highway parameter settings described in 3GPP TR 36.885 and build the system model as specified in the MEC white paper.
Each RSU is deployed at the highway side with a communication radius of 250 meters. In addition, the MEC server is deployed at the RSU to provide services to the vehicles. The specific simulation parameters are shown in Table 2.
In the multicell scenario, each RSU shares the same spectrum to improve spectral efficiency; V-UEs are connected to the RSUs by OFDMA and suffer interference from neighboring RSUs. In the following example, each RSU has ten channels and 25 V-UEs, and there are 5 RSUs. Figure 2 compares the bi-level optimization approach with the F-W algorithm [16], the exhaustive resource allocation algorithm (ERAA) [38], and no resource allocation (no power or channel allocation). Although the ERAA requires many iterations, it finds the optimal allocation in all cases, so it converges to an almost optimal value in each iteration. As seen from Figure 2, the latency without resource allocation is the highest, followed by the F-W algorithm, and the bi-level optimization approach has the lowest latency.

Algorithm 2 (excerpt):
(5) Select action a_t according to the ε-greedy policy
(6) Offload the tasks according to action a_t and observe the reward r_t = R(s_t, a_t) and the next state s_{t+1}
(7) Store the experience (s_t, a_t, r_t, s_{t+1}) into the replay memory Ω
(8) Randomly select a set of transitions (s, a, r, s′) from the replay memory Ω
(9) Train the Q-network on the selected transitions using (r_t + γ max_{a′} Q(s_{t+1}, a′; θ_{t−1}) − Q(s_t, a_t; θ_t))^2 as the loss function

This fully shows that resource allocation can reduce the latency of the system. Furthermore, compared with the F-W algorithm, the proposed bi-level optimization approach finds better power and channel allocation values, so its latency is lower. Figure 3 shows that the total energy consumption increases with the number of users. When the number of users is less than 10, there is only a slight difference in total energy consumption among the algorithms due to the small number of users. When the number of users exceeds 10, the total energy consumption of the F-W algorithm increases gradually with the number of users, while the total energy consumption without resource allocation keeps growing rapidly. It can be seen that resource allocation is significant for reducing energy consumption. The ERAA can converge to the optimal value through multiple iterations, and the performance of the adopted bi-level optimization approach is very close to that of the ERAA in terms of reducing energy consumption; still, the complexity of the ERAA is much higher than that of the bi-level optimization approach. Because energy consumption is related to the transmission power, and the advantage of the bi-level optimization approach over the F-W algorithm is that it finds a better power allocation, the bi-level approach also outperforms the F-W algorithm in reducing the total energy consumption. Figure 4 shows the relationship between the number of V-UEs and the total overhead, that is, the weighted sum of latency and energy consumption. It can be seen from the figure that when the number of users reaches 20, the ERAA becomes stable and no longer increases; in other words, the ERAA eventually converges to the optimal value.
However, the ERAA needs many iterations, which makes its complexity very high compared with the proposed bi-level optimization approach. It can also be seen from Figure 4 that the total overhead is largest when resources are not allocated. Compared with the F-W algorithm, the bi-level approach achieves better channel and power allocation. In summary, considering both complexity and overhead, the bi-level optimization algorithm provides better performance. Figure 5 shows the effect of different weighting factors on the V-UEs' weighted sum of energy consumption and latency. The definition of the weighting factor takes the real-time residual energy of V-UEs into account. Compared with subjective weighting, the redefined weighting factor w1 = w * rE achieves a better compromise and saves more energy than w = 0.8 and w = 0.5, as shown in Figure 6. Because energy consumption is proportional to the weighting factor and latency is inversely proportional to it, the weighting factor w1 = w * rE can also decrease latency compared with w = 0.2, as shown in Figure 7. This shows that the proposed algorithm obtains a lower cost than the other weighting factors. Figure 8 shows the learning curves, which present the total reward obtained by the offloading policy learned in each training episode. Note that an episode includes 5000 iterations (i.e., decision periods), in each of which the V-UE selects an offloading action for each state according to the current policy learned by the proposed algorithm. As shown in Figure 8, the total reward obtained in each episode decreases steadily as the learning time, that is, the number of episodes, increases from 1 to 200. All learning curves then become stable and no longer decrease when the episode number exceeds 200. This result indicates that the proposed DQN-based learning algorithm converges after 200 learning episodes.
It can also be seen from Figure 8 that the convergence speed is fastest when the discount factor γ = 0.4, and the overhead is smallest when γ = 0.8. Because minimal overhead is the primary consideration, γ = 0.8 is chosen.

Conclusions
In this work, we proposed a resource allocation scheme for real-time energy-aware offloading in vehicular networks with MEC to minimize the total overhead, balancing latency and energy consumption. We jointly considered offloading and resource allocation and redefined the weighting factor based on the real-time residual energy of the V-UEs. In the multicell network, a bi-level optimization approach was proposed to approximate the MINLP problem by convex subproblems. By taking partial derivatives, we found a better resource allocation scheme, which simplified the problem and reduced the computational complexity. The scheme in this article reduced the total overhead by nearly 40% compared with the baseline algorithm. Moreover, the proposed offloading scheme based on deep reinforcement learning (DRL) enabled the users to make optimal offloading decisions by considering the uncertain resource availability of users and cloudlets.

Appendix A

We use proof by contradiction to show that the objective function in formula (20) is nonconvex. Suppose that the objective function in formula (20) is convex. An affine transformation is composed of a linear map and a translation and does not change the concavity or convexity of a function. Therefore, according to affine transformations and convexity-preserving operations, the objective function in formula (20) can be equated to formula (A.2), which is obtained from formula (20) through a series of affine transformations and convexity-preserving operations; formula (20) and formula (A.2) have the same concavity and convexity.
According to the standard theorem, a function f: Ω ⟶ R, f ∈ C^2, defined on an open convex set Ω ⊂ R^n is convex on Ω if and only if its Hessian matrix F(x) is positive semidefinite for every x ∈ Ω. We therefore compute the Hessian H of (A.2) and check whether it is positive semidefinite. Its mixed entries include F_{s,a} = F_{a,s} = p_{i,j,n}/log_2(1 + p_{i,j,n}); the remaining entries F_{s,p}, F_{p,s}, F_{a,p}, F_{p,a}, and F_{p,p} are obtained analogously by direct differentiation. Since F_{s,a} = p_{i,j,n}/log_2(1 + p_{i,j,n}) > 0 and F_{a,s} = p_{i,j,n}/log_2(1 + p_{i,j,n}) > 0, while the corresponding diagonal entries are zero, the principal minor formed by the s and a entries has determinant −(F_{s,a})^2 < 0. Therefore, H is not positive semidefinite, which shows that the objective function in formula (20) is nonconvex.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest.