Energy-Efficient Resource Allocation for NOMA-Enabled Internet of Vehicles

With the rapid development of Internet of Vehicles (IoV) technology, vehicles on highways are distributed more densely and highly reliable communication between vehicles becomes more important. Nonorthogonal multiple access (NOMA) is a promising technology to meet the massive access and high-reliability communication demands of IoV. To meet Vehicle-to-Vehicle (V2V) communication requirements, a NOMA-based IoV system is proposed. Firstly, a NOMA-based resource allocation model for IoV is developed to maximize the energy efficiency (EE) of the system. Secondly, the established model is transformed into a Markov decision process (MDP) model and a deep reinforcement learning-based subchannel and power allocation (DSPA) algorithm is designed. An event trigger block is used to reduce computation time. Finally, simulation results show that NOMA can significantly improve system performance compared to orthogonal multiple access (OMA), and that the proposed DSPA algorithm can significantly improve the system EE and reduce the computation time.


Introduction
With the rapid development of vehicular wireless communication technology, the Internet of Vehicles (IoV) has broad development prospects [1]. Among the various applications enabled by IoV, safety applications are undoubtedly of the highest priority because they directly affect vehicle safety [2]. Vehicle-to-Vehicle (V2V) communication, a key technology in intelligent transportation systems (ITS) that can meet the strict latency and reliability requirements of safety applications, has attracted continuous academic attention [3].
V2V communication aims at direct communication between vehicles with extremely low latency and ultrahigh reliability, which can guarantee the quality of service (QoS) requirements of safety applications [4]. In general, device-to-device (D2D) communication provides the principle of directly propagating information between adjacent devices, which can greatly reduce latency and transmission energy consumption. Therefore, D2D technology is commonly used as the basis for V2V communication.
That is why the 3rd Generation Partnership Project (3GPP) developed V2V communication principles based on D2D technology [5] in the Long-Term Evolution (LTE) system. However, it has been shown that the QoS requirements of V2V communication cannot always be guaranteed under this principle. The reason is that D2D communication following this principle is based on orthogonal multiple access (OMA) [6], a technology that does not make full use of spectrum resources and has difficulty handling the interference caused by the growing number of vehicles. When vehicles are deployed densely, the IoV system suffers from severe congestion, which degrades system performance. Such problems have been alleviated with the rise of 5th generation (5G) mobile networks. 5G introduces nonorthogonal multiple access (NOMA) technology, which allows a resource block to be assigned to multiple users, thus greatly expanding network access capacity [7]. In some cases, such as uplink-intensive scenarios, a NOMA-enabled system offers a significant performance improvement over an OMA system. The cost of this extended access is that NOMA actively introduces interference and must reduce its impact through successive interference cancellation (SIC) [8]. Compared to an OMA system, NOMA is more complex to decode at the receiver side, but with SIC and related techniques it benefits overall system performance. SIC decodes the received signal level by level and removes each signal after successful decoding to reduce the interference to the signals not yet decoded. In a NOMA-enabled IoV system, the performance of V2V communication can therefore be significantly improved.
Due to its advantages over OMA, NOMA is widely used in ultradense networks (UDN), mobile edge computing (MEC), IoV, and other environments [9,10]. NOMA has great potential to expand network access and improve network performance, but some issues still need to be addressed. Many works have introduced NOMA for resource allocation and interference management, mainly considering the optimization of system throughput and the QoS requirements of V2V communication. However, NOMA extends the number of user accesses through channel multiplexing, which increases the difficulty of channel allocation. In addition, the power allocation scheme becomes more complex due to the interference introduced by NOMA, and the overall system power consumption should be considered in the resource allocation scheme. Moreover, [11] analyzed the SIC technique and pointed out that, due to implementation complexity, at most two users can normally share the same subchannel.
To solve the above problems, we study the resource allocation problem for high energy efficiency (EE) in IoV systems. We describe the scenario of the NOMA-enabled IoV system and formulate the resource allocation problem of maximizing the system EE. Due to the complexity of the system and the high computational dimensionality of a direct solution, we transform the optimization problem into a Markov decision process (MDP) and use a deep reinforcement learning (DRL) method to solve it. The main contributions of this paper are as follows: (i) We investigate the resource allocation problem in an IoV system. NOMA is introduced to meet the demand for multivehicle access, and the implementation of uplink SIC is presented. By allocating the channel and power resources of vehicles, we formulate an optimization goal of maximizing the system EE. (ii) We transform the optimization goal into an MDP-based resource allocation problem and propose a DRL-based subchannel and power allocation (DSPA) algorithm to solve it. Specifically, the deep Q network (DQN) method is used for subchannel selection, and the deep deterministic policy gradient (DDPG) method is used for power allocation. An event trigger block is used to reduce the computation time. (iii) We simulate and analyze the designed algorithm.
The simulation results show that the NOMA-enabled IoV system is better suited to multiple-vehicle access than OMA, and that the DSPA algorithm can effectively enhance the system EE and reduce the computation time. The rest of this paper is organized as follows. In Section 2, we review the work related to this paper. The system model and problem formulation are given in Section 3. In Section 4, we transform the optimization problem into an MDP model and design the DSPA algorithm to solve it. In Section 5, the proposed resource allocation method is simulated and analyzed. Section 6 concludes the paper.

Related Work
Due to the variability of QoS requirements of vehicle users, the resource allocation problem in vehicular networks has attractive research value and has received extensive attention from researchers for years [12,13]. Since the high-speed movement of vehicles in IoV makes it difficult to obtain accurate and timely channel state information, Guo et al. [14] obtained the steady-state delay of V2V links based on a Markov process, determined the optimal transmit power for each possible spectrum, and finally allocated spectrum resources by bipartite matching to maximize the system data rate. Chen et al. [15] developed an online network slice resource allocation strategy that can meet the QoS requirements of IoV applications and maximize system capacity. Liang et al. [16] designed a multiagent DQN algorithm to allocate spectrum and power for each V2V link and maximize the total system throughput. Yang et al. [17] studied the frame structure design for V2V communication in IoV and proposed a semipersistent frame scheduling algorithm, which largely meets the needs of V2V communication.
Resource allocation for IoV systems can also be combined with MEC. Chen et al. [18] considered the dynamics of computational task arrivals and wireless channel states in the MEC scenario and jointly optimized task and computational resource allocation to minimize system energy consumption while guaranteeing an upper limit on queue length. Zhao et al. [19] studied the collaborative offloading strategy of edge clouds in IoV and designed a distributed computational offloading and resource allocation algorithm to optimize the joint benefits of offloading and resource allocation. The problem of jointly allocating spectrum, computation, and storage resources in MEC-based IoV was studied by Peng et al. [20]. Since the problem has high computational complexity, the authors transformed it using a reinforcement learning (RL) method and solved it with a hierarchical learning architecture to obtain the optimal resource allocation decision.
By introducing NOMA into IoV scenarios, system performance can be further improved. Di et al. [21] proposed a resource allocation scheme for IoV broadcast scenarios, using NOMA to reduce latency and improve the probability of data reception. The main idea of this scheme is a centralized channel selection strategy combined with a distributed power allocation strategy, which significantly improves the packet reception probability. Liu et al. [22] studied the optimal power allocation problem in broadcast and multicast transmission schemes in half-duplex NOMA-based IoV scenarios and proposed a bifurcation-based power allocation algorithm that significantly improves the system throughput compared with the OMA scheme.

System Model and Problem Formulation
3.1. System Model. We consider a multivehicle highway scenario with one base station located at the center of a coverage area of radius D, as shown in Figure 1, where α_VT denotes the arrival intensity of VT users in VTs per second. A Cartesian coordinate system is established with the base station as the origin, and the position of vehicle m is denoted by (a_m, b_m). All vehicles travel in one direction with speed v_m, and the coverage radius of V2V communication is d_max. The total bandwidth available for D2D communication is W_all, divided equally into K nonorthogonal subchannels, each with bandwidth W = W_all/K.
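To make the geometry concrete, the following sketch (our own illustration, not the paper's code; all function and variable names are assumptions) advances vehicle positions along the highway once per slot and computes the pairwise distance d_{m,m'} used by the model:

```python
import math

def update_positions(positions, speeds, dt):
    """Advance each vehicle along the highway (x-axis) by v_m * dt.

    Positions are (a_m, b_m) coordinates with the base station at the
    origin; within a slot positions are treated as fixed, so this update
    is applied only at the slot boundary.
    """
    return [(a + v * dt, b) for (a, b), v in zip(positions, speeds)]

def distance(p, q):
    """Euclidean distance d_{m,m'} between two vehicle positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Two vehicles moving in the same direction at 25 m/s and 30 m/s,
# with a 0.1 s time slot (illustrative values only).
pos = [(0.0, 3.5), (40.0, 0.0)]
pos = update_positions(pos, [25.0, 30.0], dt=0.1)
```

A V2V link would then be considered feasible only when `distance` does not exceed d_max.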
Due to the dense deployment of vehicles, when multiple VT users send messages through the same subchannel simultaneously, the receiving vehicles (denoted as VR) located in the common coverage area of these VT users may suffer large interference. NOMA allows multiple vehicles to transmit information through the same channel simultaneously, and the VR users use SIC technology to decode the received information and reduce the cochannel interference.
We denote ℕ_l as the set of all VT users whose signals can be received by the receiving vehicle VR_l, i.e., ℕ_l = {n | d_{n,l} ≤ d_max}, where d_{n,l} is the distance between VT_n and VR_l. In time slot t, the signal received by VR_l on subchannel k (SC_k) is

y^{(t)}_{l,k} = Σ_{n∈ℕ_l} α^{(t)}_{n,k} √(p^{(t)}_{n,k}) h^{(t)}_{n,l,k} x^{(t)}_n + z^{(t)}_l,

where α^{(t)}_{n,k} is a binary variable indicating the subchannel selected by VT_n: α^{(t)}_{n,k} = 1 if VT_n transmits on SC_k in slot t, and α^{(t)}_{n,k} = 0 otherwise. We map the mobility of a vehicle to the change in its position. Since the time slots are short, it can be assumed that the position of each vehicle does not change within time slot t, so the distance d_{m,m'} between any two vehicles remains constant in time slot t; positions are recalculated at the beginning of time slot t + 1. According to Equation (2), the distance between vehicles is further mapped to the change in channel gain, so we assume that the channel gain also remains constant within one time slot while changing across adjacent time slots. Thus, the SINR between VT_n and VR_l over SC_k in time slot t without SIC can be expressed as

γ^{(t)}_{n,l,k} = α^{(t)}_{n,k} p^{(t)}_{n,k} |h^{(t)}_{n,l,k}|^2 / (Σ_{n'∈ℕ_l, n'≠n} α^{(t)}_{n',k} p^{(t)}_{n',k} |h^{(t)}_{n',l,k}|^2 + σ^2_l),

where σ^2_l = E[|z^{(t)}_l|^2] is the noise power on SC_k and |h^{(t)}_{n,l,k}|^2 is the channel gain. The data rate of SC_k between VT_n and VR_l without SIC can then be expressed as R^{(t)}_{n,l,k} = W log_2(1 + γ^{(t)}_{n,l,k}). In the uplink NOMA system, the superimposed signals received by VR_l must be sufficiently distinguishable for interference to be eliminated. Since the channels between each VT_n and VR_l differ, the signals sent by the VT users in the uplink experience different channel gains. Therefore, among the superimposed signals, the VT user with the best channel quality is likely to have the strongest received power, and VR_l decodes this VT's signal first, i.e., the decoding order at VR_l is from VT users with good channel quality to those with poor channel quality. Otherwise, higher power would have to be allocated to VT users with poor channel quality to raise their received power, which would reduce EE.
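The per-subchannel SINR and rate without SIC can be sketched as follows (a minimal Python illustration with hypothetical names and units; every co-channel VT other than the desired one counts as interference):

```python
import math

def sinr_no_sic(n, users, noise_power):
    """SINR of VT n at a VR on one subchannel without SIC.

    `users` maps VT index -> (transmit power p, channel gain |h|^2)
    for the VTs sharing the subchannel; all names are illustrative.
    """
    p_n, g_n = users[n]
    interference = sum(p * g for m, (p, g) in users.items() if m != n)
    return p_n * g_n / (interference + noise_power)

def subchannel_rate(sinr, bandwidth):
    """Shannon rate W * log2(1 + SINR) of the subchannel."""
    return bandwidth * math.log2(1.0 + sinr)
```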
Assume that N VT users send messages to VR_l over SC_k and that the channel gains between the VT users and VR_l are ordered as |h^{(t)}_{1,l,k}|^2 ≥ |h^{(t)}_{2,l,k}|^2 ≥ ⋯ ≥ |h^{(t)}_{N,l,k}|^2.

Wireless Communications and Mobile Computing
According to the SIC decoding rules, when decoding VT_n, VR_l has already decoded the VT users with n' < n and eliminated their interference symbols, but cannot eliminate the interference symbols of VT_{n''} with n'' > n. Therefore, the SINR between VT_n and VR_l over SC_k in time slot t with SIC can be expressed as

γ^{(t)}_{n,l,k} = α^{(t)}_{n,k} p^{(t)}_{n,k} |h^{(t)}_{n,l,k}|^2 / (Σ_{n'∈ℕ'_l} α^{(t)}_{n',k} p^{(t)}_{n',k} |h^{(t)}_{n',l,k}|^2 + σ^2_l),

where ℕ'_l = {n' ∈ ℕ_l | |h^{(t)}_{n',l,k}| < |h^{(t)}_{n,l,k}|} is the set of interfering VT users.
Considering the QoS requirements of VT users, VR_l can successfully decode the information delivered by VT_n through subchannel SC_k only if the transmission rate R^{(t)}_{n,l,k} is not below the rate threshold, i.e., R^{(t)}_{n,l,k} ≥ R_min; otherwise, VR_l cannot decode the information, and we set R^{(t)}_{n,l,k} = 0 in this case. Then, the data rate of SC_k between VT_n and VR_l equals W log_2(1 + γ^{(t)}_{n,l,k}) if that value is at least R_min, and 0 otherwise. Therefore, the total rate of the NOMA-enabled IoV system in time slot t can be expressed as

R^{(t)}_{sum} = Σ_{l=1}^{L} Σ_{k=1}^{K} Σ_{n∈ℕ_l} R^{(t)}_{n,l,k},

where L is the number of VR users in time slot t. The SIC technique in NOMA-enabled IoV systems has been investigated in [11]. At the VR side, as the maximum number of VT users multiplexing the same subchannel increases, the difficulty of SIC rises dramatically. To avoid excessive SIC complexity for VR users, in this paper we assume that each VT user delivers information to at most one VR user during each slot, which also reduces transmission errors.
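The SIC decoding rule and the rate threshold above can be sketched as follows (our own illustration under the same assumptions as before: `users` maps VT index to (power, channel gain), and only VTs with weaker channels than the one being decoded remain as interference):

```python
def sinr_with_sic(n, users, noise_power):
    """SINR of VT n under uplink SIC: stronger-channel VTs have already
    been decoded and removed, so only VTs with a smaller channel gain
    than VT n contribute interference."""
    p_n, g_n = users[n]
    interference = sum(p * g for m, (p, g) in users.items() if g < g_n)
    return p_n * g_n / (interference + noise_power)

def effective_rate(r, r_min):
    """QoS rule from the model: a rate below R_min cannot be decoded
    and is counted as zero."""
    return r if r >= r_min else 0.0
```

Comparing with the no-SIC case, the weakest user on the subchannel sees its interference term shrink to zero once all stronger signals are cancelled.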

Problem Formulation.
In a NOMA-enabled IoV system, the data transmission rate and the system power consumption are both important measures of system performance. Our goal is to minimize the overall power consumption of all VT users while maintaining the system transmission rate, i.e., to transmit more bits per Joule. Therefore, we set the optimization objective as the ratio of the overall transmission rate to the total transmit power of the VT users, i.e., the EE, which can be expressed as

EE^{(t)} = R^{(t)}_{sum} / (P^{(t)}_{sum} + P_c),

Figure 1: NOMA-based IoV system scenario.

where P^{(t)}_{sum} = Σ_{k=1}^{K} Σ_{n=1}^{N} p^{(t)}_{n,k} denotes the total transmit power of all VT users in time slot t and P_c is the additional circuit power consumption.
Thus, the optimization problem (10) can be expressed mathematically as the maximization of EE^{(t)} over the subchannel selections α^{(t)}_{n,k} and transmit powers p^{(t)}_{n,k}, subject to constraints C1-C5. Constraint C1 indicates that two vehicles within communication range cannot exchange messages with each other, i.e., VT_n cannot deliver messages to another VT_{n'} within its communication range; because of the half-duplex nature, a vehicle cannot receive a message while it is transmitting one, according to [21]. To reduce the SIC complexity at the receiver side, we assume that each subchannel SC_k is multiplexed by at most U_max VT users and that each VT user delivers information to at most one VR user within its communication range during slot t, which is reflected in constraints C2, C3, and C4. Constraint C5 limits the transmit power of VT users.
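The EE objective is a simple ratio and can be computed directly; the sketch below (our own illustration, with hypothetical argument names) evaluates it for one time slot:

```python
def energy_efficiency(rates, powers, p_circuit):
    """System EE in time slot t: total achieved rate divided by total
    transmit power plus circuit power, i.e. bits delivered per Joule."""
    return sum(rates) / (sum(powers) + p_circuit)

# Example: two links at 2 Mbit/s each, 0.5 W transmit power each,
# 1 W circuit power -> 4 / 2 = 2 Mbit/Joule (illustrative numbers).
ee = energy_efficiency([2.0, 2.0], [0.5, 0.5], 1.0)
```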

DRL-Based Subchannel and Power Allocation Algorithms
The optimization problem in (10) is nonconvex and NP-hard, with high computational dimensionality. Directly enumerating all possible subchannel selections and power allocations requires exponential time complexity, which is impractical. Therefore, we use reinforcement learning to obtain subchannel selection and power allocation strategies that maximize EE. We first transform the resource allocation problem in the NOMA-enabled IoV system into an MDP-based resource allocation problem and then solve the model using DRL methods.

Optimization Problem Conversion.
In the proposed NOMA-enabled IoV system, the system state in time slot t + 1 depends only on the actions, including subchannel selection and power allocation, taken by the VT users in time slot t. Therefore, we transform the developed EE-maximization model into an MDP-based resource allocation model and then solve it through the DRL method. The state space S, action space A, and reward R of the MDP model are defined below.
4.1.1. State Space. The system state can be described jointly by the system data transmission rates and the energy consumption. Thus, the system state space S includes the transmission rates between all VT users and their corresponding VR users, as well as the transmit powers of all VT users; this information is the basis for resource allocation. Since we assume that each VT user transmits information to only one VR user during time slot t, the state s_t ∈ S collects the per-VT rates and transmit powers of that slot.
4.1.2. Action Space. The action consists of the subchannel selection and the power allocation of the VT users, i.e., a_t = {a^1_t, a^2_t}, (12) where a^1_t denotes the subchannel selection action and a^2_t denotes the power allocation action.
4.1.3. Reward. We define the reward for selecting action a_t under state s_t as the EE of the current system, which can be calculated by Equation (9); specifically, r_t = EE^{(t)} for r_t ∈ R. The goal of reinforcement learning is to find, through multiple iterations, the optimal policy π* that maximizes the long-term discounted reward Σ_{k=0}^{∞} γ^k r_{t+k}, where γ ∈ [0, 1) is the discount factor. When γ = 0, only the current reward is considered and subsequent rewards are ignored; as γ increases, the system focuses more on long-term discounted rewards. The reward function is set so that the agent receives a higher reward when it performs an action that increases the system EE, and otherwise receives a lower or even zero reward. After several rounds of iterations, the agent gradually selects the policy that obtains higher rewards, i.e., a better resource allocation policy.
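The long-term discounted reward defined above can be sketched as follows (a minimal illustration; the per-slot reward r_t would be the system EE of that slot):

```python
def discounted_return(rewards, gamma):
    """Long-term discounted reward: sum_k gamma^k * r_{t+k}.

    Accumulated backwards so each step is r + gamma * (rest), the same
    recursion the Bellman equation expresses.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With gamma = 0 only the first (current) reward survives, matching the text's remark that subsequent rewards are then ignored.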

Event Trigger

The framework of the proposed DSPA algorithm is shown in Figure 2. While interacting with the environment, the agent selects and executes an action a_t based on the environment's current state s_t, after which the state becomes s_{t+1} and the agent receives a reward r_t from the environment. The agent then executes a new action a_{t+1} according to a policy π based on the new state and the reward. After a long iterative process, the agent obtains an optimal policy π* that earns the most reward.
A policy π is a mapping from the state space S to the action space A, i.e., π : S → A. Consider the state-action value function Q : S × A → ℝ, which represents the expected reward for performing action a under policy π in state s. For the established MDP model, the ultimate goal is to find an optimal policy π* satisfying Q^{π*} ≥ Q^π for every policy π. The optimal action-value function can be expressed as

Q*(s, a) = E[r_t + γ max_{a'} Q*(s_{t+1}, a') | s_t = s, a_t = a]. (16)

Equation (16) is the Bellman equation, which indicates that when the agent makes an optimal decision, the obtained Q value equals the expected reward of the optimal action in that state. For the MDP model, the schemes for obtaining the optimal policy π* include model-based and model-free approaches. Since part of the prior knowledge, such as the transition probabilities, is unknown in the NOMA-enabled IoV system, a model-free RL approach is needed to gather statistical information about the unknown model. DRL combines RL with deep neural networks (DNN) and uses DNNs to handle high-dimensional state and action spaces, which makes it widely used in IoV systems.
However, solving the MDP model with the DRL method is still time-costly, as updating the neural network weight parameters and generating the actions take considerable time. In [23], the authors propose an event trigger module, a controller that updates the neural network parameters only when the system state deviates beyond a certain level. This method can effectively reduce the computation time, so we introduce it into our DSPA algorithm. In NOMA-enabled IoV systems, the system states in two adjacent time slots may be similar or even identical, in which case the action selections corresponding to these two states should also be the same. Thus, once the DNN outputs the action in the first time slot, the same action can be executed directly in the next time slot without invoking the DNN. Referring to Lemma 1 in [24], we give a proof of this observation.

Theorem 1.
For two consecutive states s_t and s_{t+1}, the corresponding optimal actions a_t and a_{t+1} are the same when s_t = s_{t+1}.
Proof. According to Equation (16), after obtaining the optimal state-action value function Q*(s, a) for all states, the greedy strategy gives the optimal actions corresponding to states s_t and s_{t+1} as a_t = arg max_{a∈A} Q*(s_t, a) and a_{t+1} = arg max_{a∈A} Q*(s_{t+1}, a), where A is the action space. Assuming s_t = s_{t+1}, we obtain a_t = arg max_{a∈A} Q*(s_t, a) = arg max_{a∈A} Q*(s_{t+1}, a) = a_{t+1}, which proves the claim.
Based on the above result, we add the event trigger module into the DRL framework to decide whether the neural network should output a new action. Specifically, the previous state s and its corresponding action a are stored in the event trigger. A new state s_t is first compared with s; if the difference between the two is less than a certain threshold, a is output directly as the action for state s_t. Otherwise, the DNN outputs the action a_t for state s_t, and s and a are replaced with s_t and a_t. Using the binary variable ζ as the event trigger decision,

ζ = 1 if ‖s_t − s‖ ≥ ρ, and ζ = 0 otherwise, (19)

where ρ is the threshold, ζ = 1 means outputting action a_t through the neural network, and ζ = 0 means reusing the action a stored in the event trigger.
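The event trigger logic above is small enough to sketch directly (our own illustration; class and method names are assumptions, and the state distance is taken as Euclidean):

```python
import math

class EventTrigger:
    """Caches the last (state, action) pair; the DNN is queried only
    when the new state deviates from the cached one by at least rho."""

    def __init__(self, rho):
        self.rho = rho
        self.state = None
        self.action = None

    def decide(self, s_t):
        """Return zeta: 1 -> query the DNN for a new action,
        0 -> reuse the cached action."""
        if self.state is None:
            return 1  # no cached state yet, must use the DNN
        return 1 if math.dist(s_t, self.state) >= self.rho else 0

    def update(self, s_t, a_t):
        """Replace the cached (s, a) after the DNN outputs a_t."""
        self.state, self.action = list(s_t), a_t
```

When `decide` returns 0, the agent executes `trigger.action` directly, skipping the forward pass through the network.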

DRL-Based Resource Allocation Framework.
In the proposed DSPA algorithm, the subchannel selection action, i.e., a^1_t in Equation (12), is obtained by the DQN method.
Since the transmit power lies in a continuous interval, we use the DDPG method for the power allocation action, i.e., a^2_t in Equation (12).

DQN-Based Subchannel Selection Method.
In the DQN algorithm, the Q function is approximated by a DNN with weight parameters θ, i.e., Q(s, a; θ) ≈ Q*(s, a). The Q value is updated by minimizing the loss function with respect to θ; the loss function can be defined as

L(θ) = E[(y_t − Q(s_t, a_t; θ))^2], (20)   where   y_t = r_t + γ max_{a'} Q(s_{t+1}, a'; θ'). (21)

According to Equations (20) and (21), gradient descent can be used to solve for the weight parameters θ. DQN uses the current network to evaluate the current value function and the target network (with parameters θ') to generate the target value in Equation (21). The combination of these two networks decouples the current Q value from the target Q value to some extent, which in turn improves the stability of the algorithm.
The DQN algorithm further introduces an experience replay mechanism to address the high correlation between samples. At each step, the data generated by the agent interacting with the environment, i.e., the current state s, action a, reward r, and next state s', are stored in the experience pool, from which training samples can later be drawn.
The experience replay mechanism makes it easy to store the feedback data and allows training samples to be drawn by random sampling, reducing the high coupling between samples. Furthermore, this mechanism alleviates the problems of correlated, nonstationary data distributions in reinforcement learning, which reduces the convergence difficulty of the network model.
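The replay buffer and the target computation of Equations (20) and (21) can be sketched as follows (our own minimal illustration; the neural network itself is abstracted away, and `q_next_values` stands for the target network's Q values over all actions in the next state):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool: stores (s, a, r, s') tuples and draws random
    minibatches to break the correlation between consecutive samples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old samples evicted first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(list(self.buffer),
                             min(batch_size, len(self.buffer)))

def dqn_target(r, q_next_values, gamma):
    """Target value of Eq. (21): y = r + gamma * max_a' Q_target(s', a')."""
    return r + gamma * max(q_next_values)
```

The loss of Equation (20) is then the mean squared difference between `dqn_target` and the current network's Q(s, a; θ) over a sampled minibatch.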

DDPG-Based Power Allocation Method

The DQN method can solve large-scale state space problems, but it can only handle discrete action spaces, so it is not feasible to use DQN to make choices over continuous power intervals. For this case, we use the DDPG method for power allocation. DDPG is a DRL method based on both the value function and the policy gradient, which can effectively handle high-dimensional and continuous action spaces. The method generates a deterministic action directly through a DNN called the actor, i.e., a^2_t = μ*(s_t; ω_μ), where μ* is the optimal behavior policy and ω_μ is the parameter of the actor network. The resulting actions are then evaluated by a DNN called the critic, trained to minimize the loss function

L(ω_Q) = E[(y_t − Q(s_t, a_t; ω_Q))^2],   where   y_t = r_t + γ Q'(s_{t+1}, μ'(s_{t+1}; ω_{μ'}); ω_{Q'}).

Similar to DQN, two independent target networks, namely the target actor network and the target critic network, are introduced to further improve the stability of learning. The parameters of the target networks track the current networks through the soft update ω' ← δω + (1 − δ)ω', where δ ≪ 1 limits the rate of change of the target values and improves the stability of DNN training. Based on the above, the DSPA algorithm for the NOMA-enabled IoV system is shown in Algorithm 1.
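The soft (Polyak) update of the target networks can be sketched in a few lines (our own illustration over flat parameter lists; a real implementation would apply it tensor by tensor):

```python
def soft_update(target_params, current_params, delta):
    """Target-network soft update: omega' <- delta * omega + (1 - delta) * omega'.

    With delta << 1 the target parameters drift slowly toward the
    current ones, stabilizing the moving target in the critic loss.
    """
    return [delta * c + (1.0 - delta) * t
            for t, c in zip(target_params, current_params)]
```

Applying this once per training step with, say, delta = 0.01 keeps the target networks a smoothed trailing copy of the current networks.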

Simulation Environment.
In this section, we conduct simulation experiments on the proposed resource allocation scheme and analyze the results. The simulations are run on the Windows 10 platform with an Intel i5-8300H CPU, an NVIDIA 1050Ti GPU, and 16 GB of memory, using Python 3.7 and the TensorFlow 1.13 framework. All networks contain two hidden layers with 128 and 64 neurons, respectively. Following the 3GPP standard and existing studies, we set the parameters to meet the simulation requirements of the NOMA-enabled IoV system, as shown in Table 1.

Learning Rate.
In the DSPA algorithm, the learning rate is an extremely important hyperparameter. Generally speaking, a larger learning rate gives faster convergence but may skip over the optimal solution due to premature convergence, so the converged value is normally lower than the global optimum. As the learning rate approaches zero, the speed of obtaining the optimal policy π* decreases, and the optimal solution cannot be obtained quickly. This is because the learning rate controls the gradient step size of the optimization: too large a learning rate leads to steps that overshoot the optimal solution, while too small a learning rate requires more time to converge. Therefore, a suitable learning rate must first be chosen.
We set the learning rate to 0.1, 0.01, and 0.001, respectively. The simulation results are shown in Figure 3. When the learning rate is 0.1, the algorithm reaches its maximum EE of 2.8 Mbit/Joule after about 400 iterations. The converged EE differs little between learning rates 0.01 and 0.001, both about 3.2 Mbit/Joule. However, the optimal value is reached after 500 iterations with a learning rate of 0.01, while a learning rate of 0.001 requires 700 iterations. To balance convergence speed and quality, we set the learning rate to 0.01 in the following simulations. Figure 4 shows the impact of different discount factors on the convergence of the system EE. We set the discount factor γ to 0.1, 0.5, and 0.9, respectively. As the number of iterations increases, the system EE gradually levels off. The system EE for each of the three discount factors is maximized after about 500 iterations, reaching 3.0 Mbit/Joule, 3.1 Mbit/Joule, and 3.2 Mbit/Joule, respectively. The comparison shows that the smaller γ is, the more the system focuses on the current reward, while the larger γ is, the more it focuses on the long-term reward. Our goal is to maximize the long-term discounted reward of the system, so we choose γ = 0.9 for the following simulations.

Transmission Rate Thresholds.
We compare the effect of different transmission rate thresholds R_min on the system EE, as shown in Figure 5. According to Equation (7), when the transmission rate R^{(t)}_{n,l,k} < R_min, VR_l cannot successfully decode the information from VT_n, and we set R^{(t)}_{n,l,k} = 0 in this case. That is, p^{(t)}_{n,k} > 0 but R^{(t)}_{n,l,k} = 0, which seriously affects the system EE. We set R_min to 0 Mbps, 0.1 Mbps, 0.5 Mbps, and 1 Mbps, respectively. The simulation results show that the system EE is maximal when R_min = 0; in this case, all messages are decoded successfully as valid messages. However, this setting is not reasonable considering the QoS demands of VT users. Increasing R_min makes the QoS demand of VT users stricter, and more messages are discarded as invalid because they cannot meet the QoS requirement, so the system EE gradually decreases. In the following simulations, we choose R_min = 0.1 Mbps, under which the QoS demand of most VT users can be satisfied.

Comparison on SIC Technology.
We compare the EE of the NOMA-enabled IoV system with SIC, the NOMA-enabled IoV system without SIC, and the OMA IoV system under different numbers of vehicles, as shown in Figure 6. When the system contains only 10 vehicles, whether SIC is used has little impact on the system EE, while the OMA system has the lowest EE. This is because, with few vehicles, the probability of two VT users occupying the same subchannel is low and only a small amount of interference is generated at the receiving end. An increase in the total number of vehicles means that more VT users need to transmit information; with a fixed number of subchannels, the EE of all three approaches gradually decreases, the EE of the system with SIC is always the highest, and the EE of the system without SIC gradually falls below that of the OMA approach. The reason is that NOMA actively introduces interference: the more VT users multiplex the same subchannel, the stronger the interference at the VR user, and not using SIC then leads to disastrous results.

Comparison on Event Trigger Block

Next, we analyze the event trigger block by comparing its impact on the system EE. The threshold ρ of the event trigger module is set to 0.1, and the results are shown in Figure 7. In a variety of situations, the event trigger block has little impact on the system EE.
1: Initialize the Q network weight parameters θ
2: Initialize the actor and critic network weight parameters ω_μ and ω_Q
3: Initialize the weight parameters of the target network θ' ← θ, target actor network ω_{μ'} ← ω_μ, and target critic network ω_{Q'} ← ω_Q
4: Initialize replay memory D and the event trigger block s, a
5: for episode = 1, M do
6:   Initialize random noise ϱ_t
7:   Initialize the state of the NOMA-enabled IoV system s_1
8:   for t = 1, T do
9:     Calculate the difference between s and s_t according to Equation (19)
10:    if ζ = 1 then
11:      Select action a^1_t according to the DQN method
12:      Select action a^2_t according to the DDPG method
13:      Replace s and a in the event trigger with s_t and a_t = {a^1_t, a^2_t}
14:    else
15:      Output the action a_t = a
16:    end if
17:    Perform a_t, obtain reward r_t and new state s_{t+1}
18:    Store sample (s_t, a_t, r_t, s_{t+1}) into replay memory D
19:    Sample (s_i, a_i, r_i, s_{i+1}) from replay memory D
20:    Update the Q network, actor network, and critic network weight parameters θ, ω_Q, and ω_μ
21:    Update the target network, target actor network, and target critic network weight parameters θ', ω_{μ'}, and ω_{Q'}
22:  end for
23: end for
Algorithm 1: DRL-based resource allocation algorithm.

Figure 3: Impact of learning rate.

Figure 8 shows the average computation time for the three comparisons. As can be seen from the figure, the average computation time per execution increases as the number of vehicles increases, and the event trigger block effectively reduces the computation time. This result shows that although the event trigger block spends extra time computing the environment similarity, it avoids some unnecessary neural network computations, which take more time.
We further compare the impact of the event trigger threshold ρ on the system EE; the results are shown in Figure 9. When the threshold ρ equals 0.1, it only slightly decreases the system EE. Combining Figures 7-9, choosing an appropriate threshold ρ can reduce the computation time of the DSPA algorithm at the cost of a slight reduction in system performance. Finally, we compare the proposed DSPA algorithm with a DQN algorithm and a random algorithm, as shown in Figure 10. In the DQN method, we uniformly discretize the transmit power into 10 levels to meet DQN's need for a discrete action space. The random algorithm means that each VT user randomly selects a channel and a transmit power each time. As shown in Figure 10, the system EE decreases with the number of vehicles for all three algorithms. For both the DSPA and DQN algorithms, the system EE decreases quickly when the number of vehicles first increases and then more gradually. The reason is that, when the system interference is low, adding vehicles causes a significant change in system interference; as vehicles continue to be added, the change in system interference gradually flattens out. The system EE of the DQN algorithm is lower than that of our proposed DSPA framework because the DSPA algorithm uses the DDPG method to select from continuous power intervals, while the DQN algorithm can only select among 10 discrete power levels. We believe the performance of the DQN algorithm would improve if more power levels were used; however, this would increase the action dimension of the DQN algorithm and cost considerable time. The system EE of the random algorithm is always the lowest due to the random selection of subchannels and transmit power at each step, which can produce catastrophic results.

Conclusion
In this paper, we studied the NOMA-enabled resource allocation problem in IoV systems. Firstly, we maximized the system EE by allocating channel and power resources to VT users, reducing transmission power consumption while guaranteeing the system transmission rate. Secondly, we transformed the EE-maximization resource allocation problem into an MDP model. Finally, we designed the DSPA algorithm to obtain the subchannel selection and power allocation strategies that maximize the system EE and used the event trigger block to reduce the computation time. Simulation results show that the NOMA-enabled IoV system outperforms the OMA system, and that the proposed resource allocation scheme significantly improves the system EE compared to other schemes while reducing the computation time. In future work, we will study other NOMA-enabled resource allocation strategies and consider introducing mobile edge computing into IoV.

Data Availability
Data is available on request from the corresponding authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest.