An Efficient Resource Management Optimization Scheme for Internet of Vehicles in Edge Computing Environment

The contradiction between limited network resources and the large number of user demands in the vehicular environment causes considerable system delay and energy consumption. To solve this problem, this paper proposes an efficient resource management optimization scheme for the Internet of Vehicles in an edge computing environment. Firstly, we give a detailed formulation of the communication and computing costs incurred in the resource optimization process; the optimization objective is then clarified by considering the constraints on computing resources, with system delay and energy consumption considered comprehensively. Secondly, considering the dynamic, random, and time-varying characteristics of the vehicular network, the optimal resource management scheme for the Internet of Vehicles is obtained by using a distributed reinforcement learning algorithm to minimize the total system overhead. Finally, experiments show that when the bandwidth is 40 MHz, the total system cost of the proposed algorithm is only 3.502, while that of the comparison algorithms is 4.732 and 4.251, respectively, proving that the proposed method can effectively reduce the total system overhead.


Introduction
In recent years, the automotive industry has developed rapidly, and intelligence and networking have become an important trend in its future development [1]. On the one hand, these technologies enable Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) communication and information exchange, helping to build safe, collaborative, and intelligent transportation systems. On the other hand, they generate a large amount of data, and the demand for computing, communication, and content increases significantly [2][3][4]. With the development of the Internet of Vehicles (IoV) and intelligent connected vehicles, in-vehicle applications such as road safety, intelligent navigation, autonomous driving, and in-vehicle entertainment continue to emerge.
This promotes the development of intelligent transportation and greatly improves the driving experience [5][6][7]. Due to the particular physical locations of vehicles and cloud servers, the backhaul link capacity is limited. Such high content demand in the Internet of Vehicles will place a huge burden on the core network [9]. At the same time, it poses a major challenge to supporting massive content delivery and meeting the low-latency requirements of the IoV [10][11][12]. The introduction of mobile edge computing (MEC) technology makes up for the network instability and delay limitations of cloud computing in the IoV scenario and is better suited to the low-latency, high-reliability task computing that the IoV requires [13][14][15][16][17][18]. The cloud server located in the core network is far away from vehicles, which must rely on a large base station for multi-hop transmission to offload tasks to the cloud server for processing. This is prone to network fluctuations and transmission interruptions and is unreliable for in-vehicle applications, especially safe-driving applications [19,20]. Therefore, using distributed MEC services to replace traditional cloud computing services can effectively solve the resource management optimization problem in the IoV [21]. The main factors that affect the computation offloading decision are the execution delay and energy consumption of a task. Thus, optimization goals usually include reducing delay, reducing energy consumption, or weighting delay against energy. Reinforcement learning can capture the hidden dynamics of the environment well, so it is often used to optimize resource allocation algorithms. Liu et al. [22] proposed a resource allocation strategy based on deep reinforcement learning (DRL). Zhan et al. [23] designed a strategy optimization method based on DRL using game theory. Huang et al.
[24] studied the wireless charging MEC network and proposed an online decision-making method based on DRL. Hui et al. [25] proposed a content dissemination framework based on edge computing; combining the selfishness and transmission ability of vehicles, the authors designed a two-level relay selection algorithm to reasonably select relay vehicles to meet different transmission needs. Su et al. [26] used vehicles parked along the street and vehicles driving on the road to form a vehicle social community through V2V communication and used the content cached in parked vehicles to reduce the delay of content download. Zhao et al. [27] proposed an information-centric caching strategy for the V2V scenario and designed a dynamic probabilistic caching scheme. Zhang et al. [28] proposed a computing resource allocation scheme for MEC scenarios based on a DRL network, which avoids the curse of dimensionality. Zhang et al. [29] proposed a joint optimization scheme for IoV content caching and resource allocation based on MEC in a high-speed free-flow scenario, which reduced data acquisition latency. Li [29] proposed a resource allocation strategy for computation offloading in the IoV based on DRL. However, in the case of limited network resources and a large number of user demands, the above research suffers from excessive delay and energy consumption. Therefore, the optimization of IoV resource management in the MEC system scenario is a challenging problem.
Based on the above analysis, in view of the delay and energy consumption caused by the contradiction between limited network resources and a large number of user demands in the vehicular environment, this paper proposes an efficient resource management optimization scheme for the IoV in an edge computing environment. This method takes minimizing the weighted sum of system delay and energy consumption as the optimization goal and constructs a communication model and a task offloading optimization model for the IoV edge computing scenario. Moreover, a solution algorithm based on distributed reinforcement learning is used to optimize the total system overhead.

System Model.
The system model is shown in Figure 1. J road side units (RSUs) are evenly distributed along the road, and each is equipped with an MEC server, denoted mec_j, j ∈ {1, 2, . . . , J}. Each of C randomly distributed vehicles performs multiple computing tasks. Suppose the total number of computing tasks of all vehicles is N, and a computing task is denoted by L. Here b represents the size of the input data, w represents the task computation amount, and t_max represents the task deadline; if processing exceeds this limit, the task fails. R_L represents the MEC cell of the vehicle-mounted terminal to which the task belongs, and ω represents the importance of the computing task, distinguishing safety-critical tasks from ordinary tasks. Therefore, a computing task can be denoted as L = (b, w, ω, t_max, R_L). Let x_i denote the offloading decision of task i, where x_i = j indicates that the task is offloaded to MEC server mec_j and x_i = 0 indicates that the task is executed locally. The offloading strategies of the N computing tasks constitute the offloading strategy vector X = (x_1, x_2, . . . , x_N).
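The task tuple L = (b, w, ω, t_max, R_L) and the offloading decision above can be sketched as follows; this is a minimal illustration, and the class and function names are our own, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Task:
    """Computing task L = (b, w, omega, t_max, R_L) as defined in the text."""
    b: float       # input data size (bits)
    w: float       # computation amount (CPU cycles)
    omega: float   # importance weight (safety-critical vs. ordinary task)
    t_max: float   # deadline (s); exceeding it means the task fails
    r_l: int       # index of the MEC cell the vehicle belongs to

def is_offloaded(x_i: int) -> bool:
    """x_i = 0 means local execution; x_i = j > 0 means offload to mec_j."""
    return x_i > 0

# Example task with illustrative values: 2 Mbit input, 0.5 Gcycles of work.
task = Task(b=2e6, w=5e8, omega=1.0, t_max=0.5, r_l=1)
```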

Communication Model.
When task i is chosen for computation offloading, a corresponding offloading decision must be made to decide which MEC server to offload to and which channel to use for uploading data. Given the offloading decision vector d of all users, the data transmission rate R_n^m(d) on channel n between user u_i with offloading decision d_n^0 > 0 and RSU j can be obtained. The maximum information transfer rate is

V = W log2(1 + SNR),

where W is the channel bandwidth and SNR is the ratio of the average power of the signal transmitted in the channel to the noise power in the channel, that is, the signal-to-noise ratio. Its calculation is

SNR = p_i g_i / σ²,

where p_i represents the transmission power of the user, that is, the transmit power of the user equipment; g_i represents the channel gain of the communication channel selected by the user; and σ² represents the white Gaussian noise power. The wireless data transfer rate v_{i,j} between a vehicle and RSU j follows this form. When MEC server resources are insufficient, the system offloads the task to another server.
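The rate formula above, V = W log2(1 + SNR) with SNR = p_i g_i / σ², can be sketched directly; the parameter values in the example are illustrative only:

```python
import math

def snr(p_i: float, g_i: float, noise_power: float) -> float:
    """Signal-to-noise ratio: received signal power over Gaussian noise power."""
    return p_i * g_i / noise_power

def transfer_rate(bandwidth_hz: float, p_i: float, g_i: float,
                  noise_power: float) -> float:
    """Maximum information transfer rate on the chosen channel (bit/s)."""
    return bandwidth_hz * math.log2(1.0 + snr(p_i, g_i, noise_power))

# Example: 40 MHz channel, 0.1 W transmit power, small channel gain and noise.
v = transfer_rate(40e6, 0.1, 1e-6, 1e-10)
```

As expected from the formula, raising the transmit power p_i raises the achievable rate for a fixed bandwidth.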

Local Server Computing.
When the vehicle communicates directly with the local server, it offloads the computing task to the MEC server in its cell, and after the server completes execution the result is returned to the vehicle immediately. The total task delay includes the upload delay, the server computation delay, and the return delay. Let t^mec_{i,j} denote the task execution delay, e^mec_{i,j} denote the energy consumption, and v_{i,j} denote the wireless transmission rate. Since the return rate is much higher than the upload rate, the return delay of the computation result can be ignored, so the total delay of the offloading computation is

t^mec_{i,j} = b / v_{i,j} + w / f_j,

where f_j is the computing resource allocated by server mec_j. The energy consumption of the offloading computation is the transmit energy of the upload,

e^mec_{i,j} = p_i · b / v_{i,j},

while the energy consumption of local execution is P_loc · w, where P_loc represents the energy consumption per unit cycle of the local CPU.
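The delay and energy bookkeeping above can be sketched as follows, under the stated assumptions (return delay ignored; offload energy equals transmit power times upload time). The function names are our own:

```python
def offload_delay(b: float, w: float, v_ij: float, f_j: float) -> float:
    """Upload delay b / v_ij plus MEC computation delay w / f_j."""
    return b / v_ij + w / f_j

def offload_energy(p_i: float, b: float, v_ij: float) -> float:
    """Transmit power times upload time (return energy ignored)."""
    return p_i * (b / v_ij)

def local_energy(p_loc: float, w: float) -> float:
    """Per-cycle CPU energy P_loc times the task's cycle count w."""
    return p_loc * w
```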

Other Server Computing.
When the MEC server in the cell where the vehicle is located is overloaded, the computing task is offloaded to another cell's server. Communication between MEC servers is generally carried out over wired links such as optical fiber. Assuming that the average task transmission delay per hop on the wired link is t_w and c represents the number of wired-link hops to the server the task is offloaded to, the task processing delay in this case is

t^other_{i,j} = t^mec_{i,j} + c · t_w,

and the energy consumption of offloading to another server is increased accordingly by the wired transmission energy.

Problem Modeling.
Let τ be the weight. The total delay T is the weighted sum of the delays of all tasks, and the total energy consumption E is the sum of the corresponding energy terms. Considering delay and energy consumption jointly, the total system cost is

U = τ T + (1 − τ) E.

The optimization problem is to minimize U over the offloading strategy vector X. To ensure that a task is completed on time, it must finish before the vehicle leaves the MEC cell, that is, its total delay must not exceed t_max; computing tasks offloaded to other servers must satisfy this condition including the wired-hop delay. The computing resources required to complete each task determine the total computing resources demanded from each server. The constraints are as follows. C1: x_i ∈ {0, 1, 2, . . . , J}, ∀i ∈ N; C1 indicates that a computing task can only be offloaded to one edge server and cannot be offloaded to two or more at the same time. C2 means that the computing task adopts binary offloading: it is either not offloaded or offloaded as a whole, that is, the task is indivisible. C3 indicates that the computing resources required by the tasks offloaded to an edge server cannot exceed that server's total resources.
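The objective and constraints above can be sketched as a small checker; the weight name and the list-based encoding are our reading of the formulation, not the paper's code:

```python
def total_cost(delay: float, energy: float, tau: float) -> float:
    """Weighted sum of total delay and total energy: U = tau*T + (1-tau)*E."""
    return tau * delay + (1.0 - tau) * energy

def satisfies_constraints(x, j_servers, demands, capacities) -> bool:
    """Check C1/C2 (whole-task offload to at most one server, 0 = local)
    and C3 (per-server demand within capacity).

    x: offloading decisions x_i in {0, ..., J}
    demands: computing resources required by each task
    capacities: capacities[j] for servers 1..J (index 0 is a local dummy).
    """
    if any(x_i not in range(j_servers + 1) for x_i in x):
        return False                       # C1/C2 violated
    used = [0.0] * (j_servers + 1)
    for x_i, d in zip(x, demands):
        used[x_i] += d
    return all(used[j] <= capacities[j] for j in range(1, j_servers + 1))
```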
Computational Intelligence and Neuroscience 3

Problem Solving Based on Distributed Reinforcement Learning.
In view of the dynamic, random, and time-varying nature of in-vehicle networks, artificial intelligence algorithms are better suited to resource management and task scheduling than traditional mathematical methods. In comparison, Q-learning needs to maintain a Q-table and is not suitable for networks with many states. Deep deterministic policy gradient algorithms need an experience replay mechanism to eliminate the correlation between training data; with experience replay, the agent consumes more resources for each interaction with the environment, and the off-policy learning method adopted can only be updated from data generated by the old policy. Therefore, we consider using the actor-critic algorithm to reduce the overhead required for algorithm execution while providing optimal offloading decisions and resource management based on the real-time network environment. Modeling the system environment with an actor-critic algorithm requires determining its state space, action space, and reward function. The state space S consists of the computing and cache resources of the in-vehicle network, S = {F_1, F_2, . . . , F_M, S_1, S_2, . . . , S_M}, where F_i and S_i represent the computing capacity and storage capacity of road side unit i, respectively. The action space consists of the offloading decisions of vehicles, the caches of road side units, and computing resource management, where x_i, c_{i1}, . . . , c_{iM}, and f_{i1}, . . . , f_{iM} represent the offloading decision of vehicle i, the road side unit storage, and the computing resource management, respectively.

Reward Function.
The goal of reinforcement learning training is to maximize the long-term cumulative reward. According to the objective function of this paper, the reward function is designed as the negative of the total system cost. The public neural network in the actor-critic algorithm is shared by multiple threads, and each thread has the same two modules as the public network: the policy (actor) network and the value (critic) network. The actor network is used to optimize the policy π(a_t | s_t; δ) with parameters δ; the critic network estimates the value function V(s_t; δ_v) with parameters δ_v. At time t, the actor network performs action a_t based on the current state s_t, obtains a reward r_t, and enters the next state s_{t+1}.
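Since training maximizes cumulative reward while the objective minimizes total cost, the per-step reward can be taken as the negative cost; a one-line sketch of this design, with tau as the delay weight (the parameter name is ours):

```python
def reward(delay: float, energy: float, tau: float) -> float:
    """Per-step reward: negative of the weighted total system cost."""
    return -(tau * delay + (1.0 - tau) * energy)
```

Lower delay and energy then translate directly into higher reward, aligning the learning goal with the optimization objective.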
The advantage function A(a_t, s_t) represents the difference between the action value function Q(a_t, s_t) and the state value function V(s_t):

A(a_t, s_t) = Q(a_t, s_t) − V(s_t).
To speed up convergence, Q(a_t, s_t) is approximated with k-step sampling:

Q(a_t, s_t) ≈ Σ_{i=0}^{k−1} γ^i r_{t+i} + γ^k V(s_{t+k}),

where γ is the discount coefficient, r_{t+i} represents the instant reward, and V(·) is obtained through the critic network. Taking the parameter δ as the variable, differentiating the policy loss function gives

∇_δ f_π(δ) = ∇_δ log π(a_t | s_t; δ) A(a_t, s_t) + β ∇_δ H(π(s_t; δ)), (22)

where H is the entropy of the policy and β is its coefficient.
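The k-step approximation above and the resulting advantage estimate can be sketched with plain Python lists; the function name and argument layout are illustrative:

```python
def k_step_advantage(rewards, v_t, v_tk, gamma: float) -> float:
    """Advantage A(a_t, s_t) from a k-step return.

    rewards: [r_t, ..., r_{t+k-1}] observed instant rewards
    v_t:     critic estimate V(s_t)
    v_tk:    critic estimate V(s_{t+k}) bootstrapping the tail
    gamma:   discount coefficient
    """
    k = len(rewards)
    # Q(a_t, s_t) ~ sum_i gamma^i * r_{t+i} + gamma^k * V(s_{t+k})
    q_est = sum((gamma ** i) * r for i, r in enumerate(rewards))
    q_est += (gamma ** k) * v_tk
    return q_est - v_t
```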
For the value loss function, we have

f_v(δ_v) = (R_t − V(s_t; δ_v))²,

where R_t is the k-step return. Based on the RMSProp algorithm, the squared-gradient estimate can be expressed as

g = α g + (1 − α) Δδ²,

where α represents the momentum and Δδ represents the accumulated gradient of the loss function. The parameter update of the RMSProp algorithm is

δ ← δ − η Δδ / √(g + ε),

where η is the learning rate and ε is a small constant preventing division by zero.
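A minimal sketch of the RMSProp update above: the squared-gradient accumulator g is decayed by the momentum-like factor α, and the step is the gradient scaled by 1/√(g + ε). The hyperparameter names are conventional defaults, not values from the paper:

```python
import math

def rmsprop_step(theta, grad, g_acc, lr=1e-3, alpha=0.99, eps=1e-8):
    """One RMSProp update over lists of parameters and gradients."""
    new_theta, new_g = [], []
    for th, gr, g in zip(theta, grad, g_acc):
        g = alpha * g + (1.0 - alpha) * gr * gr   # running average of grad^2
        th = th - lr * gr / math.sqrt(g + eps)    # scaled gradient descent
        new_theta.append(th)
        new_g.append(g)
    return new_theta, new_g
```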

Algorithm Flow.
The proposed offloading strategy flow based on distributed reinforcement learning is shown in Algorithm 1.

Simulation Settings.
This section uses Python to simulate and verify the resource management optimization scheme for the IoV and evaluates the pros and cons of different algorithms by comparing the impact of each algorithm on the total system overhead as the number of vehicles, the number of tasks, and the bandwidth vary. The simulation parameters are set as shown in Table 1. Due to small-scale fast fading and the mobility of mobile devices in the established model, the results of each run are random; therefore, statistical averaging is used to obtain the average value as the final result. The computer used for the simulation runs Windows Server 2019 with an Intel(R) Xeon(R) 2.6 GHz processor and 16 GB RAM. Figure 2 describes the convergence of the proposed algorithm under different learning rates. It can be seen that when the learning rates of the actor and critic networks are L_a = 1×10−3 and L_b = 1×10−2, respectively, the algorithm learns very quickly but the final convergence performance of the system degrades, while learning rates that are too small cause the learning speed to drop sharply. Therefore, the learning rates are set to L_a = 1×10−4 and L_b = 1×10−3 in the subsequent experiments.

Comparison of Accumulated Average Rewards under Different Schemes.
Compare the average reward value of the proposed scheme with the following schemes: (1) the all-local strategy; (2) the random strategy; (3) the all-MEC strategy. Since training exhibits severe oscillations, this section observes the convergence of the neural network by calculating the cumulative average of the system reward. Figure 3 shows the comparison of cumulative average rewards for the different schemes. As the number of training episodes increases, the all-MEC and random schemes gradually converge to a stable cumulative average. The all-local strategy is not encouraged, so its reward value is the lowest. Because the proposed algorithm must consider the road conditions of adjacent areas, which increases the dimension of the system state and the complexity, it performs poorly at the beginning of training but obtains the highest average reward after convergence. Therefore, the proposed resource management optimization scheme for the IoV can make full use of communication resources and effectively improve the effectiveness of the system.

Performance Comparison under Different Algorithms.
In order to demonstrate the advantages of the proposed algorithm, the algorithms in [28][29] are compared with it under the same experimental conditions. Figure 4 shows the impact of the number of vehicles on the delay. The delay of system task processing increases with the number of vehicles, mainly due to the increase in processing tasks under limited computing resources. Among all algorithms, the algorithm of [29] has the largest delay.
Input: actor network, actor target network, critic network and critic target network, learning rate α, discount rate γ, attenuation factor λ.
Output: computing task offloading policy π′.
Initialize the critic network parameters δ_c and the actor network parameters δ_a.
Initialize the experience replay pool and the task vehicle state s_0.
For t ≤ T do
  Observe the environment state s_t and select an action a_t based on the current policy.
  Execute the action a_t, obtain the reward r_t, and transfer to the state s_{t+1}.
  Save the tuple (s_t, a_t, r_t, s_{t+1}) to the experience replay pool.
  If the replay pool is full but the stop condition is not met, randomly sample a small batch of tuples (s, a, r, s′) from the pool.
  Update the critic network parameters, actor network parameters, and target network parameters.
End for
ALGORITHM 1: Resource management algorithm based on distributed reinforcement learning.
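The control flow of Algorithm 1 can be sketched as below. The environment, policy, and update steps are stubs passed in by the caller (their names are ours); only the interact/store/sample loop mirrors the algorithm:

```python
import random
from collections import deque

def train(env_step, policy, update, T=100, pool_size=32, batch=8):
    """Interact with the environment, store transitions, and sample
    mini-batches from the replay pool once it is full (Algorithm 1 flow)."""
    pool = deque(maxlen=pool_size)
    s = 0                                   # initial state s_0 (stub)
    for t in range(T):
        a = policy(s)                       # select action from current policy
        r, s_next = env_step(s, a)          # execute action, observe reward
        pool.append((s, a, r, s_next))      # save transition to the pool
        if len(pool) == pool_size:          # pool full: sample a mini-batch
            update(random.sample(list(pool), batch))
        s = s_next
    return pool
```

With stub functions for the environment and networks, the loop runs end to end and triggers an update at every step after the pool first fills.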

Compared with the proposed algorithm, under the algorithm of [29] the vehicle undertakes more tasks; due to the limitation of the vehicle's own computing resources, processing tasks alone causes greater delay. The proposed algorithm considers the cooperation of terminal, edge, and cloud, improves resource utilization efficiency, and minimizes the system delay. The change of the total system overhead with bandwidth under different algorithms is shown in Figure 5. As the bandwidth increases, the total system overhead of all three algorithms shows a downward trend, but that of the proposed algorithm is always lower than that of the other two. When the bandwidth is 40 MHz, the total system overhead of the algorithm in [28] is 4.732 and that of the algorithm in [29] is 4.251, while the total system overhead of the proposed algorithm is only 3.502. Further analysis shows that when the cloud computing capability of the comparison algorithms is relatively weak, most computing tasks are completed at the edge nodes, which cannot make good use of the cloud-edge system and therefore produces a high total system overhead. Compared with the other two algorithms, the proposed algorithm achieves the lowest total system overhead because it considers the dynamic, random, and time-varying characteristics of the vehicular network to optimize system performance to the greatest extent. The change of the total system overhead with the number of tasks is shown in Figure 6. As the number of tasks increases, the total cost of all three algorithms rises; however, the total system overhead of the proposed algorithm remains lower than that of the algorithms in [28,29].
This is because the proposed algorithm can collect the state and action information of the whole system and make better decisions according to the global information, so the total system cost is low. The comparison algorithms do not fully analyze the state and action information of the system, and the multi-vehicle game increases the energy consumption, resulting in a higher total system cost.

Conclusion
Aiming at the delay and energy consumption caused by the contradiction between limited network resources and a large number of user demands in the vehicular environment, this paper proposes an efficient resource management optimization scheme for the IoV in an edge computing environment. The proposed algorithm builds a communication model and a task offloading optimization model for the IoV edge computing scenario and solves them based on distributed reinforcement learning to maximize the system performance.
In the future, we will study the content acquisition decision combined with micro-traffic data and the prediction of vehicle mobility to further improve algorithm performance. Besides, a dynamic situation will be considered, in which devices may leave the current edge server during computation offloading; in this case, a more effective mobility model for devices is needed. In addition, current blockchain technology provides a powerful solution for the offloading of secure computing tasks in the IoV. In view of the contradiction between the high real-time requirements of IoV applications and the low real-time performance of blockchains, we can take advantage of the differences among IoV participants in security level, computing power, and communication ability to design a hierarchical blockchain structure matching the cloud-IoV structure. Using blockchain technology to build a secure computing task offloading platform for the Internet of Vehicles is of great significance.

Data Availability
The data used to support the findings of this study are included within the article.