A Resource Allocation Scheme for Intelligent Tasks in Vehicular Networks

Vehicular networks must handle many resource-consuming intelligent tasks, and traditional resource allocation schemes struggle to meet the demands of these tasks. Therefore, this paper proposes a task-oriented resource allocation scheme for intelligent tasks in vehicular networks. First, we propose a task-oriented communication system and formulate a resource allocation problem that aims to maximize task performance. Second, based on the system model, we propose an intelligent task-oriented resource allocation optimization criterion, formulated as a mathematical model whose parameters are solved by the proposed gradient descent-based algorithm. Third, to solve the resource allocation problem, we propose a multiagent deep Q-network (MADQN)-based algorithm and analyze its convergence and complexity. Last, experiments on real datasets verify the performance advantages of the proposed algorithms.


Introduction
Recently, more and more vehicles are equipped with high-definition cameras to enhance visual perception [1]. More than 90% of the driving environment information can be collected by the cameras [2]. At the same time, the vehicles use artificial intelligence technology to analyze the large amount of data collected by the vehicular cameras, so as to complete the various tasks that arise during driving [3]. These intelligent tasks, such as classification, detection, and recognition, are based on deep neural networks and place a huge demand on computing resources at the vehicle end.
Thanks to the development of Internet of vehicles (IoV) technology in recent years [4], a feasible solution is that the vehicles transmit data to the edge server to complete the intelligent tasks. Then, the edge server feeds back the calculation results to the vehicles, so as to support the intelligent needs of various applications, such as assisted driving and automatic driving in IoV scenarios.
However, a large amount of multimedia data transmission and computing task offloading from the vehicles to the edge server bring great pressure to the communication resources. Therefore, in order to promote the integration of communication and computing processes in the IoV system, it is urgent to study efficient resource allocation schemes to improve the resource utilization rate in the vehicular networks and better serve the intelligent tasks of the vehicles.

Related Works and Motivations.
Existing studies on resource allocation in the IoV are mainly oriented to network efficiency or user experience, with the purpose of maximizing quality of service (QoS) or quality of experience (QoE). Vehicle mobility and service diversity in the IoV scenarios lead to different QoS requirements, so QoS-based resource allocation schemes focus on how to build QoS models suitable for IoV scenarios [5,6] and how to design resource allocation algorithms based on QoS models, such as channel selection [7], power control [8], and spectrum sharing [9]. Although these QoS-based schemes can improve the traditional performance metrics such as capacity, they do not handle the subjective needs.
Compared to network efficiency, user experience-oriented demands are more subjective. QoE-based resource allocation schemes focus more on the needs of human users. Most works focus on QoE modeling to meet the diverse needs of human users [10]. For example, traffic safety services have higher requirements on video definition and resolution, while entertainment services have higher requirements on video smoothness. Moreover, some works pay attention to algorithm design for resource allocation to meet QoE requirements in the IoV scenarios. The main method is to map QoE requirements to communication resource requirements such as spectrum and power [11,12]. The other is to establish a mapping or association between QoE and QoS and then use multiobjective resource allocation methods to improve the quality of user experience [13,14].
Most of the existing resource allocation schemes for the IoV scenarios mentioned above ignore the needs of intelligent tasks and seldom consider the contents or semantics of data. When the transmitted data is used for intelligent tasks such as classification, detection, and recognition, the goal is no longer network efficiency or user experience, but the accuracy with which visual contents or semantics are understood or analyzed, so the traditional QoS- or QoE-based resource allocation schemes are no longer optimal [15]. A more suitable resource allocation scheme is needed for intelligent task-oriented communication systems.
It is worth noting that the future network is becoming more and more intelligent. It is no longer concerned simply with transmission speed but pays more attention to the demands of intelligence [16]. The intelligent requirements of 6G shift research from traditional communication based on Shannon's framework to semantic or goal-oriented communication [17]. Recently, some works focus on intelligent end-to-end semantic communication systems based on deep learning [18,19]. They propose that image or video features, instead of the full data, are uploaded to the servers for data analysis. Although transmitting features can save wireless resources, this approach is not suitable for all kinds of tasks, since the full data must be archived in the server for future investigations. Moreover, even though these works achieve good results in meeting the requirements of end-to-end intelligent tasks, they ignore the problem of resource limitation. Both the sending and receiving ends require a lot of computing power, and it is difficult to support those algorithms when resources are scarce.
In the IoV scenarios, where intelligence and network connectivity are highly integrated, it is urgent to explore the balance between network intelligence and efficient resource utilization. According to the above analysis, the main challenges are as follows:
(i) How to design a multimedia data transmission system for vehicles with limited computing capacity, to improve transmission efficiency and meet the needs of intelligent computing?
(ii) How to design an optimization criterion for resource allocation in intelligent scenarios of the IoV, to resolve the contradiction between intelligent task requirements and traditional resource allocation methods?
(iii) How to design a resource allocation algorithm with high stability and low complexity that adapts to the dynamically changing environment of the IoV?

Contributions and Organization.
To address these challenges, in our previous work [20,21], we made a preliminary exploration and proposed a single-task-oriented spectrum allocation algorithm. To meet the requirements of multiple tasks and to extend the flexibility of the resource allocation scheme in mobile scenarios, this paper further proposes a deep reinforcement learning-based resource allocation scheme for multiple intelligent tasks in vehicular networks, which aims to maximize the performance of intelligent tasks under resource constraints. The contributions of this paper are as follows:
(i) We design a task-oriented communication system for multiple intelligent tasks in the IoV scenario. Based on the proposed system model, we construct a multivariable resource allocation optimization problem that aims to maximize the performance of intelligent tasks.
(ii) We propose an intelligent task-oriented resource allocation optimization criterion, which is expressed as a mathematical model, and we design a gradient descent-based algorithm for solving the model parameters.
(iii) We propose a multiagent deep Q-network (MADQN)-based algorithm to solve the resource allocation problem and analyze the convergence and complexity of the proposed algorithm.
In addition, we verify the performance advantages of the proposed algorithms on a new dataset constructed by us and on existing datasets.
The rest of this paper is organized as follows. In Section 2, the system model is described and the resource allocation problem is formulated. In Section 3, a detailed description about resource allocation optimization criteria is given. In Section 4, a MADQN-based resource allocation algorithm is given. The numerical results are shown in Section 5, followed by the concluding remarks in Section 6.

System Model and Problem Formulation
2.1. Scenario and System Model. A typical scenario of a task-oriented vehicular network is shown in Figure 1. The network consists of an edge server and multiple vehicles equipped with cameras. Vehicles are randomly distributed in the server's coverage area. The vehicles collect and preprocess data, and the edge server handles data storage and intelligent tasks, such as classification, detection, and reidentification (Re-ID).

Wireless Communications and Mobile Computing
Specifically, vehicles perceive the surrounding environment through cameras, including traffic signals and moving objects (such as pedestrians and other vehicles). This visual environmental information, like image and video data perceived by vehicles, needs to be further analyzed. Considering the huge amount of collected multimedia data and the limited computing capacity of the vehicles, the data is transmitted to the edge server through wireless channels to complete the data storage, data analysis, and subsequent intelligent tasks.
Therefore, the goal of the communication system is to improve the tasks' performance as much as possible under the condition of limited resources. To achieve that goal, we design a task-oriented system, as shown in Figure 2.
The proposed system is divided into three parts: vehicles, wireless channel, and edge server. The vehicles contain three modules: (1) a collecting module, which collects the original multimedia data and completes data preprocessing; (2) an encoding module, which completes the source coding and digital signal modulation; and (3) a control module, which controls the transmission power and selects the transmission frequency band and channel. The wireless channel is a mobile time-varying fading channel. The edge server consists of two modules: (1) a decoding module, which completes the demodulation and decoding of signals to restore the original data, and (2) a computing module, which covers data storage (for data analysis and model training), intelligent tasks performed by neural networks, optimization criteria modeling, and system resource allocation.
In this system, the vehicles select appropriate encoding data rate, power, and bandwidth according to the results of resource allocation and transmit the compressed data to the edge server through wireless channel. The edge server restores the data and then completes the tasks. The main purpose of the system is to achieve the best task performance by reasonable resource allocation, which is a multivariable resource allocation optimization problem.

Resource Allocation Problem Formulation.
Because the proposed communication system is task-oriented, the communication goal is to maximize the task performance. In the proposed system, task performance depends mainly on the quality of the received multimedia data, which is in turn affected mainly by the compression rate and the bit error rate. Hence, the resource optimization criterion can be formulated as in Equation (1), where $m \in \{1, 2, \cdots, M\}$ is the vehicle index, $M$ is the maximum number of vehicles, $F_m^{task}$ denotes the task performance such as accuracy or mAP, $\lambda_m$ is the weight coefficient, and $q_m$ and $p_m$ are the compression rate and bit error rate, respectively.
Here, $F_m^{task}$ is the metric by which resource allocation schemes for intelligent tasks are evaluated. According to Equation (1), the task performance values are related to two traditional communication performance metrics: compression rate and bit error rate. The reason is as follows. According to Figure 2, the communication process involves two stages, encoding and transmission, in which compression and noise are the key factors that degrade data quality and affect the performance of subsequent tasks. Therefore, Equation (1) describes the relationship between the performance of intelligent tasks and communication parameters. It reveals that task performance metrics such as accuracy and mAP can be improved by reasonable communication resource allocation. Besides, an accurate mathematical model of Equation (1) is the basis for the resource allocation problem.
Accordingly, the key of optimization problem formulation lies in the calculation of compression rate and bit error rate.

Compression Rate.
According to [22], the compression rate is related to the source coding scheme, data block size, and encoder packet number; hence, it can be expressed as in Equation (2), where $L_m$ is the encoder packet length, $G_m$ is the packet size, and $S_m$ is the block size. Then, the encoded data rate is given by Equation (3), where $R^B_m$ is the data rate and $T^G_m$ is the block duration.

Bit Error Rate.
According to [23], with quadrature amplitude modulation (QAM), the bit error rate is expressed as in Equation (4), where $N$ is the modulation order, $Q(\cdot)$ is the Q-function, and $E(\cdot)$ is the expectation function. $P_m$ is the transmission power, $h_m$ is the channel gain, $n_0$ is the noise power spectral density, and $R_m = B_m \log_2(1 + P_m h_m / (n_0 B_m))$ is the transmission rate, where $B_m$ is the bandwidth of each vehicle.
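As a rough numerical illustration of the quantities above, the following Python sketch evaluates the transmission rate $R_m$ and an approximate bit error rate for square $N$-QAM. The bit error rate expression used here is the common textbook nearest-neighbor approximation at a fixed signal-to-noise ratio; this is an assumption and not necessarily the exact form of Equation (4), which also involves an expectation over the channel. The function names and example values are hypothetical.

```python
import math

def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def snr(power_w: float, gain: float, n0: float, bandwidth_hz: float) -> float:
    """Received signal-to-noise ratio P_m * h_m / (n0 * B_m)."""
    return power_w * gain / (n0 * bandwidth_hz)

def transmission_rate(bandwidth_hz: float, power_w: float,
                      gain: float, n0: float) -> float:
    """R_m = B_m * log2(1 + P_m * h_m / (n0 * B_m)), in bit/s."""
    return bandwidth_hz * math.log2(1.0 + snr(power_w, gain, n0, bandwidth_hz))

def qam_ber_approx(snr_linear: float, order: int) -> float:
    """Textbook approximation for square N-QAM (assumed form, not Eq. (4))."""
    bits = math.log2(order)
    scale = (4.0 / bits) * (1.0 - 1.0 / math.sqrt(order))
    return scale * q_function(math.sqrt(3.0 * snr_linear / (order - 1)))

# Hypothetical example: 1 MHz, 0.1 W, gain 1e-10, n0 = 1e-17 W/Hz.
rate = transmission_rate(1e6, 0.1, 1e-10, 1e-17)
ber_16qam = qam_ber_approx(snr(0.1, 1e-10, 1e-17, 1e6), 16)
```

For $N = 4$ the approximation reduces to $Q(\sqrt{\mathrm{SNR}})$, the exact QPSK bit error rate, which gives a quick sanity check on the sketch.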
The channel gain $h_m$ is modeled as an independent random variable, accounting for both large-scale fading $h^L_m$ (comprising path loss $h_{pl}$ and shadowing $h_{sd}$) and small-scale fading $h^S_m$. The large-scale fading is typically determined by vehicle locations, which change little during a transmission slot [24]. Here, the path loss is modeled as $h_{pl} = 148.1 + 37.6 \log_{10}(d_m)$ (dB), where $d_m$ (in km) is the distance between the $m$-th vehicle and the edge server. Shadowing follows a log-normal distribution with zero mean and a standard deviation of 8 dB [24]. The small-scale fading components, however, may change from slot to slot. To capture this dynamic behavior, we model the time-varying coefficients as independent first-order autoregressive processes [25], given by Equation (5), where $t_e$ is the time interval, $e_h$ is the process noise sequence drawn from a $\mathcal{CN}(0, 1 - \rho_m^2(t_e))$ distribution, and $\rho_m(t_e) = J_0(2\pi v_m f_c t_e / c)$ is the channel autocorrelation function, where $J_0(\cdot)$ is the zero-order Bessel function of the first kind, $c$ is the velocity of light, $f_c$ is the band mid-frequency, and $v_m$ is the velocity of the $m$-th vehicle.
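The channel model can be sketched in a few lines of Python. The sketch below assumes the standard first-order autoregressive recursion $h(t + t_e) = \rho\, h(t) + e_h$ and the usual Jakes autocorrelation with Doppler shift $v_m f_c / c$; the carrier frequency, speed, and interval used in the example are hypothetical, and the Bessel function is evaluated by its power series to keep the sketch free of external dependencies.

```python
import math
import random

C_LIGHT = 3.0e8  # speed of light in m/s

def bessel_j0(x: float, terms: int = 40) -> float:
    """Zero-order Bessel function of the first kind via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((k + 1) ** 2)
    return total

def path_loss_db(distance_km: float) -> float:
    """Path loss 148.1 + 37.6 * log10(d_m) in dB, with d_m in km."""
    return 148.1 + 37.6 * math.log10(distance_km)

def autocorrelation(speed_mps: float, carrier_hz: float, interval_s: float) -> float:
    """rho = J0(2 * pi * f_d * t_e), with Doppler shift f_d = v * f_c / c."""
    doppler_hz = speed_mps * carrier_hz / C_LIGHT
    return bessel_j0(2.0 * math.pi * doppler_hz * interval_s)

def complex_gaussian(variance: float, rng: random.Random) -> complex:
    """Draw one sample from CN(0, variance)."""
    sigma = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def simulate_small_scale(rho: float, steps: int, rng: random.Random) -> list:
    """AR(1) fading trace (assumed form): h <- rho * h + e, e ~ CN(0, 1 - rho^2)."""
    h = complex_gaussian(1.0, rng)
    trace = [h]
    for _ in range(steps - 1):
        h = rho * h + complex_gaussian(1.0 - rho ** 2, rng)
        trace.append(h)
    return trace

# Hypothetical example: 15 m/s vehicle, 2 GHz carrier, 1 ms interval.
rho = autocorrelation(15.0, 2.0e9, 1.0e-3)
trace = simulate_small_scale(rho, 20000, random.Random(1))
mean_power = sum(abs(h) ** 2 for h in trace) / len(trace)
```

Because the noise variance is $1 - \rho^2$, the recursion keeps the average power of the small-scale coefficient at one, so `mean_power` should stay close to 1.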

Optimization Problem.
The goal of the proposed system is to transmit images and videos to the edge server so that the server can return results in time. Therefore, the optimization problem is to allocate resources for transmitting images or videos under constraints so as to achieve the best task performance, which can be formulated as problem P1, where $R^B_{max}$, $B_{max}$, and $P_{max}$ denote the total data rate, bandwidth, and power of the communication system, respectively. $\tau$ denotes the maximum transmission delay allowed by the system. $f_{min}$, $q_e$, and $p_e$ denote the task performance threshold, compression rate threshold, and bit error rate threshold, respectively. The constraints are described in detail below.
C1 to C3 are the resource constraints: the sums of the data rate, bandwidth, and power allocated to the vehicles must not exceed the total available data rate $R^B_{max}$, the total available bandwidth $B_{max}$, and the total available power $P_{max}$, respectively. C4 is the delay constraint: the transmission time must be shorter than the maximum delay $\tau$ allowed by the system. C5 and C6 are the task performance constraints, which ensure that the results of intelligent tasks meet the minimum requirement $f_{min}$ of vehicle users. C7 and C8 are the communication constraints: C7 constrains the compression rate, which guarantees the image and video quality, and C8 constrains the bit error rate, which guarantees the transmission quality.
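A feasibility check for part of these constraints can be sketched as follows. The sketch covers only the sum constraints C1 to C3 and the threshold constraints C7 and C8; it assumes that C7 and C8 are upper bounds ($q_m \le q_e$ and $p_m \le p_e$), which the text does not state explicitly. C4 to C6 are omitted because their exact expressions are not given here. The `Allocation` class and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

@dataclass
class Allocation:
    data_rate: float    # encoded data rate of vehicle m
    bandwidth: float    # bandwidth B_m
    power: float        # transmission power P_m
    compression: float  # compression rate q_m
    bit_error: float    # bit error rate p_m

def check_constraints(allocs: Sequence[Allocation], rate_max: float,
                      bw_max: float, p_max: float,
                      q_e: float, p_e: float) -> Dict[str, bool]:
    """Evaluate C1-C3 (sum limits) and C7-C8 (assumed upper bounds)."""
    return {
        "C1": sum(a.data_rate for a in allocs) <= rate_max,
        "C2": sum(a.bandwidth for a in allocs) <= bw_max,
        "C3": sum(a.power for a in allocs) <= p_max,
        "C7": all(a.compression <= q_e for a in allocs),
        "C8": all(a.bit_error <= p_e for a in allocs),
    }

def is_feasible(allocs: Sequence[Allocation], **limits: float) -> bool:
    """True when every checked constraint holds."""
    return all(check_constraints(allocs, **limits).values())

# Hypothetical two-vehicle allocation.
example = [Allocation(1.0, 2.0, 0.5, 0.3, 1e-4),
           Allocation(2.0, 3.0, 0.5, 0.2, 1e-5)]
```

Returning the per-constraint results as a dictionary makes it easy to see which limit an infeasible allocation violates.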

Resource Allocation Criteria
In this section, we describe how to obtain accurate mathematical models of the optimization criterion $F_m^{task}$, which describe the relationship between the performance of intelligent tasks and communication parameters. These models are the basis for the resource allocation problem. The steps of optimization criterion modeling are as follows.
The first step is data generation. To obtain the final mathematical model $y = f(x)$, we first need a large number of data pairs $(x, y)$, where $x$ represents communication parameters and $y$ represents the performance values of task completion. Because the communication process involves two stages, encoding and transmission, in which compression and noise are the key factors that degrade data quality and affect the performance of subsequent tasks, $x$ in this paper represents different compression rates $q_m$ and bit error rates $p_m$. Based on the proposed communication system, multimedia data is transmitted at different compression rates and bit error rates, and the intelligent tasks are then completed at the edge server to obtain performance values such as the accuracy and mAP of the corresponding tasks.
The second step is mathematical model selection. Based on the data obtained in the first step, scatter diagrams are drawn to analyze the trend, as shown in Figures 3 and 4. According to the experimental results, the power function model is selected in this paper, for two reasons: it reflects the observed relationship well, and it is monotonic and convenient to differentiate.
The third step is solving the model parameters. Based on the data obtained in the first step and the power function model selected in the second step, the mean square error (MSE) criterion guides the design of the algorithm for solving the model parameters. The flow is shown in Algorithm 1.
According to the proposed method, we obtain the models for different AI tasks, such as classification, detection, and Re-ID. The models reveal the relationship between task performance and communication parameters like compression rate or bit error rate. The mathematical formulas for the criteria models are as follows:

where $F_m^{task}$ denotes the task performance, such as accuracy and mAP, $q_m$ and $p_m$ are the compression rate and bit error rate, respectively, and the other symbols are model parameters, which are solved by Algorithm 1.
In the flowchart, $a$ and $b$ represent the vectors of model parameters. The loss functions $L_1$ and $L_2$ represent the MSE values of $F_m^{task}(q_m)$ and $F_m^{task}(p_m)$, respectively, and $L_0$ is the MSE threshold. Moreover, $G_1$ and $G_2$ represent the gradients of $L_1$ and $L_2$ with respect to $a$ and $b$, respectively.

Algorithm 1
Input: initial parameters $a = [a_1, a_2, a_3]$, $b = [b_1, b_2, b_3]$, data set $D$, step length $\delta$, and threshold $L_0$.
Output: parameters $a$, $b$.
...
end for
Compute the loss based on MSE.
...
Compute the gradient of $a$ and $b$.
Apply the gradient descent-based update strategy.
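The parameter-solving procedure can be illustrated with a minimal Python sketch. The exact power-function form of the criteria models is not reproduced here, so the sketch assumes a hypothetical three-parameter form $f(x) = a_1 x^{a_2} + a_3$ and fits it to synthetic data by gradient descent on the MSE loss, stopping once the loss falls below a threshold. All names, data, and starting values are illustrative.

```python
import math

def power_model(x, a):
    """Assumed power-function form a1 * x**a2 + a3 (hypothetical)."""
    return a[0] * x ** a[1] + a[2]

def mse_loss(a, data):
    """Mean square error of the model over (x, y) pairs."""
    return sum((power_model(x, a) - y) ** 2 for x, y in data) / len(data)

def mse_gradient(a, data):
    """Analytic gradient of the MSE loss with respect to a1, a2, a3."""
    grad = [0.0, 0.0, 0.0]
    for x, y in data:
        residual = 2.0 * (power_model(x, a) - y) / len(data)
        x_pow = x ** a[1]
        grad[0] += residual * x_pow
        grad[1] += residual * a[0] * x_pow * math.log(x)  # requires x > 0
        grad[2] += residual
    return grad

def fit(data, a_init, step, threshold, max_iters=5000):
    """Gradient descent until the loss drops below the threshold."""
    a = list(a_init)
    for _ in range(max_iters):
        if mse_loss(a, data) <= threshold:
            break
        grad = mse_gradient(a, data)
        a = [ai - step * gi for ai, gi in zip(a, grad)]
    return a

# Synthetic data: performance falls as the communication parameter x grows.
data = [(0.05 * i, 0.9 - 0.5 * (0.05 * i) ** 2) for i in range(1, 21)]
a_start = [-0.1, 1.0, 0.5]
a_fit = fit(data, a_start, step=0.1, threshold=1e-6)
```

The same routine would be run once for the compression rate data (parameters $a$) and once for the bit error rate data (parameters $b$).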

Resource Allocation Algorithm
Based on the above analysis, we have obtained models that describe the relationship between AI task performance and communication metrics. For the edge server to complete the AI tasks well, resources should be allocated reliably during transmission so as to maximize the accuracy or mAP performance. Hence, in this section, we first analyze the difficulties in solving the above optimization problem. Then, we propose a multiagent deep Q-network (MADQN)-based algorithm to solve the problem.

Optimization Problem Analysis.
Given the above optimization problem, the proposed system transmission model, and the scenario, the algorithm design faces the following challenges.

Nonconvexity Problem.
For the optimization problem, the objective function $F_m^{task}(q_m, p_m)$ is nonlinear, the constraints C4 and C8 are nonconvex and nonlinear (which can be derived from Equation (4)), and the optimization variables are multidimensional. Therefore, the optimization problem P1 is essentially an NP-hard problem, which is difficult to solve with convex optimization methods.

Dynamic Transmission Conditions.
For the proposed transmission model, the channel gain $h_m$ changes dynamically, which makes it difficult for traditional convex optimization algorithms and heuristic algorithms to capture the dynamic characteristics of the channel.

High Requirements in the Scenario.
For the proposed task-oriented dynamic vehicular network scenario, the algorithm must have low complexity and high stability.
Recently, deep reinforcement learning has shown strong advantages in solving resource allocation optimization problems in dynamic environments. Therefore, this paper considers using deep reinforcement learning to solve the challenges mentioned above, mainly based on the following.
By transforming the original nonconvex mathematical problem into a sequential decision problem, deep reinforcement learning uses historical data and interaction with the environment to learn strategies and uses neural networks to approximate the optimal solution, which makes it applicable to NP-hard problems. One advantage of deep reinforcement learning is its ability to handle dynamic environments [26], that is, randomness under a fixed state distribution, such as the channel model $h_m$ proposed in this paper. The function fitted by the neural network maps a closed set to a closed domain, which improves the robustness of dynamic learning. Another advantage of deep reinforcement learning lies in its strong generalization ability [27]. After full offline training, online decision-making greatly reduces the complexity, which fits well with the proposed transmission system shown in Figure 2. In addition, introducing experience replay and an iterative updating mechanism can also improve the convergence stability of deep reinforcement learning.

Key Elements for Deep Reinforcement Learning.
The DQN is a widely used deep reinforcement learning algorithm. Figure 5 is the flowchart of the DQN algorithm proposed in this paper, in which all vehicles act as agents to optimize the objective function of the optimization problem P1, namely, the performance metrics of the intelligent tasks. After performing an action $a(k)$, the agents receive feedback from the environment, such as the reward value $r(k)$ related to the tasks' performance, update the current state $s(k)$, and then select the next action $a(k+1)$ according to the current state and the action selection strategy of the Q-network, until the strategy converges to the best one.
The design of the DQN algorithm involves several key elements, which are specified below for the optimization problem and system model in this paper.

Agent.
We regard M vehicles in a vehicular network as the agents in the multiagent DQN algorithm.

Action.
In the proposed scheme, the action has three dimensions, namely, the changes of the encoding packet length $L_m$, the bandwidth resource $B_m$, and the transmission power $P_m$ of the $m$-th vehicle. There are three options in each dimension: increase, remain, or decrease. For example, $L_A = \{+\Delta L, 0, -\Delta L\}$ represents the change of the allocated packet length; the same goes for the changes of the allocated bandwidth $B_A$ and power $P_A$. The action space is represented as $A = L_A \times B_A \times P_A$, where $\times$ denotes the Cartesian product. Therefore, the action space size is $3 \times 3 \times 3 = 27$, and its dimension is 3. For example, $a_m = (+, -, 0)$ indicates that the packet length allocated to the $m$-th vehicle increases, the bandwidth decreases, and the power remains unchanged. Since the packet length satisfies $L_m \in \mathbb{N}^*$, the change of $L_m$ is set as $\Delta L = 1$. For the discretization of $B_m$ and $P_m$, $\Delta B = 0.001 B_{max}$ and $\Delta P = 0.001 P_{max}$, respectively.
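The action space described above can be enumerated directly. The following Python sketch builds the 27 actions as a Cartesian product and applies one action to a vehicle's current resources; clamping the result to valid ranges is an assumption added for illustration, and the function names are hypothetical.

```python
from itertools import product

DELTA_L = 1  # packet length step, since L_m is a positive integer

def build_action_space(b_max: float, p_max: float) -> list:
    """All (dL, dB, dP) combinations: increase, remain, or decrease each."""
    length_steps = (+DELTA_L, 0, -DELTA_L)
    bandwidth_steps = (+0.001 * b_max, 0.0, -0.001 * b_max)
    power_steps = (+0.001 * p_max, 0.0, -0.001 * p_max)
    return list(product(length_steps, bandwidth_steps, power_steps))

def apply_action(resources, action, b_max: float, p_max: float):
    """Apply one action to (L_m, B_m, P_m); the clamping is an assumption."""
    length, bandwidth, power = resources
    d_length, d_bandwidth, d_power = action
    new_length = max(1, length + d_length)
    new_bandwidth = min(max(bandwidth + d_bandwidth, 0.0), b_max)
    new_power = min(max(power + d_power, 0.0), p_max)
    return (new_length, new_bandwidth, new_power)

# Hypothetical totals: B_max = 10 (MHz), P_max = 2 (W).
actions = build_action_space(10.0, 2.0)
```

Each agent's Q-network can then index actions by their position in `actions`, so that selecting an action reduces to choosing an integer between 0 and 26.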

State.
The dimension of the state space $S_m$ is 4.

Reward.
The objective is for all users to execute the action strategy that maximizes the total task performance under the constraint conditions. To this end, given changes in packet length, bandwidth, power, and channel state, the reward function is set as in Equation (11).

Q-Network.
Figure 6 shows the Q-network structure. Each agent updates its network using the data it has acquired. Note that, if an agent lacks computing power, it can upload its data to the server; after the server completes the training, the network parameters are transmitted back to the agent over the downlink. As shown in Figure 6, the Q-network contains three layers: the input layer, the hidden layer, and the output layer. The input dimension equals the dimension of the state vector $s_m$, the hidden layer dimension is $H$, and the output layer dimension equals the dimension of the action vector $a_m$. We use the rectified linear unit (ReLU) as the activation function. For simplicity, the Q-function of agent $m$ is denoted $Q_m$. To train the DQN, we use a finite-size experience replay buffer $Z$ to store the history transition samples $z(k) = (s(k), a(k), r(k), s(k+1))$; the oldest samples are discarded when the buffer is full. Two networks with the same structure, the evaluation Q-network $Q_\theta(\cdot)$ and the target Q-network $Q_{\theta^*}(\cdot)$, are used to approximate the Q-function, with $\theta$ and $\theta^*$ being their weights. Here, $\theta = \{\omega_{in}, b_{in}, \omega_{out}, b_{out}\}$, where $\omega_{in}$ and $b_{in}$ are the parameters of the first fully connected layer, from the input layer to the hidden layer, and $\omega_{out}$ and $b_{out}$ are the parameters of the second fully connected layer, from the hidden layer to the output layer, as shown in Figure 6. Moreover, the weights of $Q_{\theta^*}(\cdot)$ track those of $Q_\theta(\cdot)$ through the update $\theta^* \leftarrow \theta$, which is performed every $C$ steps.
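The Q-network structure, replay buffer, and target update can be sketched in plain Python as follows. This is a minimal illustration under several assumptions: the network emits one Q-value per discrete action (27 here), which is the usual DQN convention; the hidden size, buffer capacity, and weight initialization are hypothetical; and training itself is omitted.

```python
import copy
import random
from collections import deque

def relu(values):
    """Rectified linear unit applied elementwise."""
    return [max(0.0, v) for v in values]

def dense(weights, bias, inputs):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def init_q_network(d_in, d_h, d_out, rng):
    """theta = {w_in, b_in, w_out, b_out} with small random weights."""
    def layer(n_out, n_in):
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                for _ in range(n_out)]
    return {"w_in": layer(d_h, d_in), "b_in": [0.0] * d_h,
            "w_out": layer(d_out, d_h), "b_out": [0.0] * d_out}

def q_values(theta, state):
    """Forward pass: input -> hidden (ReLU) -> output."""
    hidden = relu(dense(theta["w_in"], theta["b_in"], state))
    return dense(theta["w_out"], theta["b_out"], hidden)

def make_replay_buffer(capacity):
    """Finite buffer Z; a full deque drops its oldest sample on append."""
    return deque(maxlen=capacity)

def sample_batch(buffer, batch_size, rng):
    """Uniformly sample stored transitions (s, a, r, s_next)."""
    return rng.sample(list(buffer), min(batch_size, len(buffer)))

def sync_target(theta):
    """Target update theta* <- theta, to be called every C steps."""
    return copy.deepcopy(theta)

rng = random.Random(0)
theta = init_q_network(d_in=4, d_h=16, d_out=27, rng=rng)
theta_target = sync_target(theta)
buffer = make_replay_buffer(capacity=3)
for k in range(5):
    buffer.append(([0.0] * 4, k, 0.0, [0.0] * 4))
```

Copying the weights with `deepcopy` keeps the target network fixed between synchronizations, even while the evaluation network continues to change.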

Multiagent DQN Algorithm.
Algorithm 2 shows the process of the multiagent DQN algorithm. The algorithm mainly includes two parts: the initialization process and the reinforcement learning process. Reinforcement learning is a process of repeated iteration; each iteration solves two problems: evaluating the value function under a given strategy, and updating the strategy according to the value function. The iterative process is as follows: (1) The environment gives an observed state $s$, and the agent obtains $Q_\theta(s, a)$ for all actions in state $s$ from the value function network, namely, the Q-network $Q_\theta(\cdot)$. Then, the agent selects an action using the ε-greedy strategy. (2) Upon receiving the action, the environment returns a reward and the next observed state. (3) The agent updates the parameters $\theta$ of the Q-network according to the loss function and then enters the next step. (4) The cycle continues until the Q-network converges.
In the proposed DQN algorithm, an experience replay mechanism and a target network are introduced: (1) Experience replay addresses the problem of data correlation: experienced data is stored in a buffer, and a part of the data is drawn from the buffer for each parameter update, so that the training samples are approximately independent and identically distributed, data utilization improves, and the model converges better. (2) The target network addresses the fluctuation of the value function during iteration. With a target network, the model that calculates the target value is fixed for a period of time, which reduces the volatility of the model and makes the training more stable.
In addition, the training effect of the DQN algorithm is related to its main parameters. After many trials, the main parameters of the algorithm in this paper are set as follows: the learning factor $\mu$ is set to 0.001, so that the algorithm retains most of the historical training results and pays more attention to past experience, and the discount factor $\lambda$ is set to 0.9, allowing the algorithm to consider 90% of the next reward and pay more attention to long-term rewards. The coefficient of the ε-greedy strategy is set as $\varepsilon = 1/\sqrt{k}$, so that the algorithm strikes a balance between exploration and exploitation.
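The action selection rule and the discounted target can be sketched as follows. The sketch assumes the standard DQN temporal-difference target $r + \lambda \max_a Q_{\theta^*}(s', a)$; the loss function is not spelled out in this section, so this form is an assumption, and the function names are hypothetical.

```python
import math
import random

def epsilon(k: int) -> float:
    """Exploration coefficient 1 / sqrt(k) for step k >= 1."""
    return 1.0 / math.sqrt(k)

def select_action(q_vals, k: int, rng: random.Random) -> int:
    """Epsilon-greedy: random action with probability epsilon(k), else argmax."""
    if rng.random() < epsilon(k):
        return rng.randrange(len(q_vals))
    return max(range(len(q_vals)), key=q_vals.__getitem__)

def td_target(reward: float, next_q_target, discount: float = 0.9) -> float:
    """Assumed standard DQN target: r + discount * max_a Q_target(s', a)."""
    return reward + discount * max(next_q_target)

def squared_td_error(q_taken: float, target: float) -> float:
    """Per-sample loss minimized with learning factor mu (0.001 in the text)."""
    return (target - q_taken) ** 2
```

With $\varepsilon = 1/\sqrt{k}$, the first step is fully exploratory, and by step 100 only one action in ten is random.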

Convergence and Complexity.
First, we discuss the convergence of the proposed algorithm. The convergence of DQN is difficult to prove directly in theory, but the training of DQN is stable thanks to the experience replay and target network mechanisms. It can be proved that the network parameters converge to a very small interval, so that a stable approximately optimal solution can be obtained through DQN. The proof is given in Appendix A.
Inference complexity is mainly related to the Q-network structure. As shown in Figure 6, each inference passes through two fully connected layers, so the computational complexity in the inference stage can be expressed by the number of multiplications required, which is $O(d_{in} d_h + d_h d_{out})$, where $d_{in}$, $d_h$, and $d_{out}$ are the dimensions of the input layer, hidden layer, and output layer, respectively, that is, the number of neurons in each layer. The time consumed by online resource allocation depends on the inference complexity of the proposed DQN algorithm. The proposed multiagent DQN is distributed: its training complexity is related only to the number of iterations, and its inference complexity is related only to the dimensions of each layer, which are fixed. The complexity does not grow exponentially as the number of vehicles increases. Therefore, the proposed algorithm is suitable for scenarios with low delay requirements and a high number of accesses.
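As a small worked example of the inference cost, the Python sketch below counts the multiplications of two fully connected layers, $d_{in} d_h + d_h d_{out}$, and shows that the total across distributed agents grows linearly with the number of vehicles. The layer sizes are hypothetical.

```python
def inference_multiplications(d_in: int, d_h: int, d_out: int) -> int:
    """Multiplications for one forward pass through two dense layers."""
    return d_in * d_h + d_h * d_out

def total_multiplications(num_vehicles: int, d_in: int, d_h: int, d_out: int) -> int:
    """Each agent runs its own network, so the total scales linearly."""
    return num_vehicles * inference_multiplications(d_in, d_h, d_out)

# Hypothetical sizes: 4 state inputs, 64 hidden units, 27 outputs.
per_agent = inference_multiplications(4, 64, 27)
```

Because the per-agent cost is fixed by the layer sizes, adding vehicles adds agents but does not enlarge any single network.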

Numerical Results
In this section, numerical results are provided for the performance evaluation of the proposed algorithms. Firstly, the parameter settings of communication system simulation and intelligent task experiment are introduced. Secondly, the datasets used in this paper and the related schemes are introduced. Thirdly, the performance evaluation results of Algorithm 1 are presented and analyzed. Lastly, the performance evaluation results of Algorithm 2 are presented and analyzed.

Algorithm 2 (continued)
    if $x_k < \varepsilon$ then
        Select a random action $a$ from $A$;
    else
        Select the action $a = \arg\max_a Q_\theta(s, a)$;
    end if
    Perform action $a$;
    Get new state $s'$ and reward $r(k)$ by Equation (11).

Firstly, the simulation scenarios of vehicular networks are based on the urban intersection scene generated by SUMO. Each road includes 4 two-way lanes, each 3.5 m wide, and the initial positions of vehicles are randomly generated. Secondly, the WINNER model in the 3GPP TR 36.885 standard is used for the channel model, as described in Section 2. Meanwhile, the small-scale fading caused by vehicle movement and construction is considered, and the time-varying small-scale fading model shown in Equation (5) is adopted. The specific system simulation parameter settings are based on 3GPP TR 36.885, as shown in Table 1. Lastly, considering the time-varying, nonstationary characteristics of vehicular networks in practice, each experiment averages 200 independent Monte Carlo simulations to eliminate errors caused by abnormal data.
In the experiments on intelligent tasks, the GPU model is NVIDIA GeForce RTX 3090, and the training and testing environment is Windows 10 + CUDA 10.2. The intelligent tasks adopted in this paper include classification, object detection, and Re-ID. The deep learning-based experimental parameter settings are shown in Table 2 and are mostly based on experience. In addition, JPEG 2000 is used for image compression and coding, and HEVC is used for video compression and coding.

Datasets and Related Scheme Introduction.
The datasets used in this paper include four existing datasets and a new dataset constructed in this paper. The datasets are summarized in Table 3. The STL-10 [28] dataset mainly includes image data and is applicable to classification tasks. The Caltech [29] and Waymo [30] datasets are captured by real vehicular cameras in driving scenarios and are mainly used for object detection tasks in vehicular networks. Market-1501 [31] mainly contains image data and is suitable for Re-ID tasks. In addition, a semantic communication-oriented dataset [32], namely, SCO, containing 5100 images and 10 video clips, is constructed for classification tasks and object detection tasks.
To verify the performance advantages of the proposed resource allocation scheme, it is compared with three other schemes. Each scheme name is abbreviated according to the main characteristics of the scheme. The schemes are detailed as follows:
(i) TPO-JRA scheme (Task Performance-oriented Joint Resource Allocation scheme): this is the scheme proposed in this paper, which is aimed at maximizing the performance of intelligent tasks. The optimization variables include data rate resource, bandwidth resource, and power resource, and the optimization algorithm is based on MADQN.
(ii) CPO-RA scheme (Content Priority Oriented Resource Allocation scheme): in this scheme, the optimization objective is to maximize the effective information, which is defined in [20]. The optimization variables include power resource and bandwidth resource, and the optimization algorithm is based on Q-learning.
(iii) QoC-RA scheme (Quality of Content-based Resource Allocation scheme): in this scheme, the optimization objective is to maximize content quality, which is defined in [15]; under the object detection task, this is the average detection accuracy. The optimization variable is bit-rate resource, and the optimization algorithm is based on convex optimization.
(iv) MRA scheme (Mean Resource Allocation scheme): in this scheme, all resources are evenly allocated to each user. This scheme serves as a baseline against which the performance gain of the proposed scheme is verified.
In addition, to ensure a fair comparison, the same algorithm is adopted in this paper under the same scenario and intelligent task.
Figures 3 and 4 show the relationship between task performance and compression rate and bit error rate, respectively. Task performance decreases as the compression rate or the bit error rate increases.
This is because the semantics or content of the original data are lost through lossy compression or noise interference, so that the machine at the edge server cannot correctly identify them through deep learning networks. Secondly, the relationship between task performance and compression rate has the shape of a concave function, while the relationship between task performance and bit error rate has the shape of a convex function. This is because the two cause semantic distortion in different ways: lossy compression mainly leads to blur distortion, whereas noise mainly causes symbol errors in the decoding process. In addition, the curves differ from task to task, because different tasks are based on different deep learning networks, which vary in their sensitivity to compression distortion and noise.

…                                               200~700 Kbps
The total bandwidth $B_{\max}$                  0.1~0.9 MHz
The total power $P_{\max}$                      5~45 dBm
The minimum performance limit $f_{\min}$        0.1
The delay threshold range $\tau$                50~300 ms
The compression rate threshold $q_e$            0.95
The bit error rate threshold $p_e$              0.001

Table 4 shows the numerical results of the model parameters and the RMSE metric; all numerical results are reported to 4 significant digits. RMSE reflects the degree of agreement between the values predicted by the mathematical model and the actual values. For the proposed method, in the classification, detection, and Re-ID tasks, the RMSE values of model Equation (7) are 0.03438, 0.04303, and 0.02984, respectively, and the RMSE values of model Equation (8) are 0.04078, 0.03739, and 0.01633, respectively. All of them are less than 0.05, which indicates that the modeling method (Algorithm 1) is accurate and captures the relationship between task performance and system variables.
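As a small illustration of the accuracy check described above, the following Python sketch computes an RMSE between predicted and measured task performance and confirms that the values reported in the text fall below the 0.05 level. The function name `rmse` and the dictionary layout are assumptions made for this example; only the six RMSE values are taken from the text.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between model predictions and measured values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# RMSE values reported in the text for the two fitted models.
reported = {
    "Equation (7)": {"classification": 0.03438, "detection": 0.04303, "Re-ID": 0.02984},
    "Equation (8)": {"classification": 0.04078, "detection": 0.03739, "Re-ID": 0.01633},
}

# True when every reported value is below the 0.05 level quoted in the text.
all_below_threshold = all(
    value < 0.05 for per_task in reported.values() for value in per_task.values()
)
```

Given the fitted curve and the measured task performance at the same operating points, `rmse(predicted, actual)` returns the quantity that the text compares against 0.05.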

Performance Results of Resource Allocation Scheme.
In this section, we use the task performance values related to $F^{\mathrm{task}}_m$ in Equation (1) to verify the advantages of the proposed scheme. The experiments include three different intelligent tasks: classification, detection, and Re-ID. Each task has its own metric, such as accuracy or mAP, but all values lie between 0 and 1. Hence, we use the weighted average over vehicles with different kinds of tasks as the performance metric of the whole system. The larger the value, the better the system completes its intelligent tasks.

Performance versus Resource Parameters.
Figures 7-9 show the curves of task performance versus bit rate, bandwidth, and power, respectively. Across the whole range of each resource parameter, the proposed resource allocation scheme has the best performance, which verifies its effectiveness. Combined with Figures 7-9, the following conclusions can be drawn. Firstly, optimization of resource allocation is crucial to the performance improvement of intelligent tasks, which indicates that in an intelligent task-oriented communication system, resource optimization can further ensure the correct understanding of transmitted data by the server, rather than only pursuing the improvement of the computing capacity of the server. Secondly, when resources are limited, it is necessary to jointly optimize different types of resource parameters, because the influence of various resource parameters on video quality is coupled in the transmission process. In addition, within the numerical range of the experimental settings, the performance gains brought by optimizing different resource parameters are inconsistent, which is due to the different mathematical relationships between the resource parameters and the objective function. The average performance gains satisfy the inequality $\Delta F(P_m) > \Delta F(L_m) > \Delta F(B_m)$, which indicates that in the actual environment, the allocation of power should be given priority. Lastly, as resources increase, the performance gain decreases gradually, because under resource saturation the performance tends to be stable, which is determined by the neural network structure related to the intelligent tasks.

Parameters                                              Values
The learning factor $\mu$                               0.001
The discount factor $\gamma$                            0.9
The update frequency of target Q-network $C$            5
The maximum number of iterative steps $K$               $10^4$
The size of replay buffer $Z_m$                         200
The dimensions of input layer $d_{\mathrm{in}}$         4
The dimensions of hidden layer $d_h$                    10
The dimensions of output layer $d_{\mathrm{out}}$       3
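To give a sense of how small the Q-network implied by the listed layer dimensions (4, 10, and 3) is, the following Python sketch builds a network of that shape and counts its parameters. It is an illustration under assumptions, not the paper's implementation: the ReLU activation, the random initialization, the example state vector, and the epsilon-greedy form of the action selection are all hypothetical choices that the text does not specify.

```python
import random

D_IN, D_H, D_OUT = 4, 10, 3  # layer dimensions from the parameter table


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer (assumed init)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one output per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def q_values(state, params):
    """Forward pass: input -> hidden (ReLU, assumed) -> one Q-value per action."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in dense(state, w1, b1)]
    return dense(hidden, w2, b2)


def select_action(state, params, epsilon, rng):
    """Epsilon-greedy choice (assumed): explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(D_OUT)
    q = q_values(state, params)
    return max(range(D_OUT), key=q.__getitem__)


rng = random.Random(0)
params = [init_layer(D_IN, D_H, rng), init_layer(D_H, D_OUT, rng)]

# Total trainable parameters: (4*10 + 10) + (10*3 + 3) = 83.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in params)

state = [0.5, 0.2, 0.8, 0.1]  # hypothetical 4-dimensional observation
action = select_action(state, params, epsilon=0.0, rng=rng)
```

With only 83 trainable parameters per agent, each gradient update is cheap, which is consistent with the low complexity claimed for the proposed algorithm.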

Performance versus Delay and Number of Vehicles.
With different schemes, the change of task performance under the time-delay constraint is shown in Figure 10. Under strict delay constraints, the proposed scheme retains its performance advantage, for two reasons. Firstly, the constraints in the optimization problem are designed reasonably: the proposed algorithm explores the best action strategy to maximize the task performance while meeting the delay requirements, according to Equation (11). Secondly, the complexity of the proposed algorithm is low. Based on the above analysis, the proposed resource allocation scheme is more suitable for time-critical scenarios.
The change of task performance with the number of vehicles is shown in Figure 11. As the number of vehicles connected to the edge server increases, the overall performance shows a downward trend, because more vehicles compete more intensely for resources and the average resources that can be allocated to each vehicle decrease. However, as the number of vehicles increases, the proposed resource allocation scheme still maintains the best performance, which verifies its advantage in scenarios of large-scale access or resource shortage. In addition, it also reveals that large-scale access systems must optimize the multidimensional parameters to ensure the accurate understanding of the data at the receiving end.
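The resource-competition argument can be made concrete with the even-split rule of the MRA baseline. The Python sketch below divides a total bit rate, bandwidth, and power equally among the connected vehicles. The totals used (500 Kbps, 0.5 MHz, 25 dBm) are example values picked from within the simulation ranges, and the function names are hypothetical; splitting the power in linear units (mW) rather than in dB is an assumption about how an even power split would be carried out.

```python
import math


def dbm_to_mw(dbm):
    """Convert power from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw):
    """Convert power from milliwatts to dBm."""
    return 10 * math.log10(mw)


def even_share(total_rate_kbps, total_bandwidth_mhz, total_power_dbm, n_vehicles):
    """Per-vehicle share when every resource is split evenly among the vehicles."""
    if n_vehicles < 1:
        raise ValueError("n_vehicles must be at least 1")
    # Power is divided in linear units; dividing a dBm figure directly would be wrong.
    power_mw = dbm_to_mw(total_power_dbm) / n_vehicles
    return {
        "rate_kbps": total_rate_kbps / n_vehicles,
        "bandwidth_mhz": total_bandwidth_mhz / n_vehicles,
        "power_dbm": mw_to_dbm(power_mw),
    }


# Example totals (assumed) and a growing number of vehicles.
shares = {n: even_share(500, 0.5, 25, n) for n in (2, 4, 8)}
for n, share in shares.items():
    print(n, share)
```

Each doubling of the number of vehicles halves the per-vehicle bit rate and bandwidth and lowers the per-vehicle power by about 3 dB, which is the shrinking share behind the downward trend in Figure 11.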

Convergence Performance.
The convergence results of the proposed resource allocation algorithm are shown in Figure 12. To remove spikes, the convergence curves are smoothed for better visual effect. More precisely, the value at each step is the average of the 5 nearest original steps. The smoothing operation does not affect the convergence performance of the algorithm. It can be seen that the proposed algorithm converges stably within a limited number of iterations. There are two reasons for this. First, the introduction of the experience replay mechanism and the target network enables the parameters of the Q-network to converge. Second, the design of the action selection strategy makes the algorithm pay more attention to learning from historical experience rather than exploring as the number of iterations increases, which ensures the stability of the Q-network. Moreover, according to the curves, our algorithm converges very fast. This is because the structure of the Q-network we designed is relatively simple, as shown in Figure 6: it contains only three layers, whose dimensions are 4, 10, and 3, respectively. As a result, less computation is needed to update the network parameters in each iteration, as shown in Equations (13) and (14). The network structure and the layer dimensions were chosen empirically.
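The smoothing step described above can be sketched as a moving average. The Python example below assumes a centered window of 5 steps that is truncated at both ends of the series; the text does not say whether the window is centered or trailing, so this is one plausible reading, and the reward trace is a made-up example.

```python
def smooth(values, window=5):
    """Centered moving average; the window is truncated at both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


rewards = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # hypothetical noisy reward trace
smoothed_rewards = smooth(rewards)
```

Because each output value is an average of neighboring inputs, the operation only changes how the curve is drawn and leaves the underlying training trajectory untouched.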

Conclusion
This paper studied resource allocation schemes in order to achieve efficient transmission of multimedia data and accurate completion of intelligent tasks in IoV scenarios. Firstly, a mathematical model of the resource allocation optimization criterion for intelligent tasks was constructed, and the model parameters were solved by a gradient descent algorithm. For classification, detection, and Re-ID tasks, the RMSE index of the algorithm was less than 0.05. Secondly, under the guidance of this model, this paper designed a joint allocation algorithm for data rate, bandwidth, and power resources based on MADQN and discussed the convergence and complexity of the algorithm. Experimental results showed that the proposed resource allocation scheme is more suitable for the intelligent networked environment with many computer vision tasks. The results revealed that, to improve the performance of intelligent tasks, not only the intelligent algorithms but also communication technologies such as resource allocation should be considered.
In line with the general trend toward a close combination of communication technology and artificial intelligence, the scheme proposed in this paper provides a new solution to difficult problems in the era of intelligent driving, such as complex traffic environments and low transmission efficiency. In future work, we will derive a more universal resource allocation criterion from the perspective of information theory and analyze the performance of intelligent task-oriented communication systems from a theoretical perspective.