Joint Resource Allocation and Power Control Based on Vehicle’s Motion Characteristics in NOMA-Based V2V Systems

Owing to its high spectrum utilization, Nonorthogonal Multiple Access (NOMA) has become one of the candidate technologies for future wireless communication systems. Meanwhile, Vehicle to Everything (V2X) in New Radio has been put forward as a promising topic in the 3rd Generation Partnership Project (3GPP). This paper studies a resource allocation mechanism with a power control strategy that makes full use of vehicles' moving characteristics in the NOMA-based Vehicle to Vehicle (V2V) communication system. First, vehicles are grouped according to their moving characteristics by spectral clustering. Then, vehicles in the same group are allocated the same wireless resource with the NOMA strategy. Two grouping methods are designed for the freeway and urban scenarios separately. After that, the transmission power of the vehicles is adjusted based on a power control strategy that uses Q-learning. The simulation results show that the proposed joint NOMA resource allocation and power control mechanism clearly improves the Packet Received Ratio (PRR) of the V2V system compared with the typical energy sensing-based resource allocation method.


Introduction
With the development of the Intelligent Transport System (ITS), Vehicle to Vehicle (V2V) communication has been the subject of intensive research for several years. Because wireless spectrum resources are increasingly scarce while the number of vehicles keeps growing, the design of resource allocation schemes has become a research focus in both academia and industry.
In academic research, graph theory and optimization theory have been used the most. For example, the resource allocation problem was transformed into a maximum weight matching problem in [1], while it was formulated as a three-dimensional matching problem in [2]. In industry, the dedicated group within the 3rd Generation Partnership Project (3GPP) has been doing the standardization work for V2V communications. In the 80th Radio Access Network (RAN) meeting of 3GPP, New Radio (NR) Vehicle to Everything (V2X) was put forward based on the standards specified from Release 14 for Long-Term Evolution (LTE) V2X. In past meetings, 3GPP RAN has discussed several resource allocation mechanisms. These include mechanisms in which the Base Station (BS) schedules resources among vehicles dynamically and mechanisms by which vehicles select resources autonomously without the aid of the BS. These two types of resource allocation mechanisms are referred to as Mode 1 and Mode 2, respectively, in 3GPP [3]. Among all discussed resource selection methods, the energy sensing-based mechanism [4] is the most typical one. In addition, using vehicles' geographical position is another resource selection method put forward by 3GPP [5,6]. In the NR communication system, the whole frequency bandwidth is divided into subcarriers. In the time domain, one transmission period includes dozens of time slots. Twelve subcarriers correspond to one Resource Block (RB) in the frequency domain. Vehicles transmit signals on specific RBs and in specific time slots. Considering the limited bandwidth that V2V communication can use, for example, 10 MHz, if many vehicles exist or frequent interaction is required, the

System Model and Problem Formulation
2.1. System Model. In the V2V broadcast system, each vehicle broadcasts its messages, while the others receive the messages and try to decode them, as shown in Figure 1. $T_1$ and $T_2$ transmit their messages in the form of data packets; $R_1$, $R_2$, and $R_3$ attempt to receive packets from $T_1$ as well as from $T_2$. $\mathrm{SINR}_{ij}$, which denotes the Signal to Interference plus Noise Ratio (SINR) at receiver $j$ from transmitter $i$, can be calculated by (1). In (1), $P_i$ is the transmitting power of transmitter $i$, $|h_{ij}|^2$ denotes the channel coefficient from transmitter $i$ to receiver $j$, $N_0$ is the one-sided power spectral density of the Additive White Gaussian Noise (AWGN), $B_0$ is the bandwidth that transmitter $i$ uses to transmit messages, and $N$ represents the number of vehicles in the V2V communication system. The binary variable $\alpha_{ik}$ equals one when transmitters $i$ and $k$ transmit messages on the same wireless resource at the same time (1a). Another binary variable, $s^{j}_{i,k}$, shows the result of applying Successive Interference Cancellation (SIC). When $(P_i |h_{ij}|^2 / n_{ij}) > (P_k |h_{kj}|^2 / n_{kj})$, vehicle $j$ decodes the message from vehicle $i$ first and the message from vehicle $k$ afterwards. In this situation, $s^{j}_{i,k}$ equals one, which means that interference from transmitter $k$ exists. Otherwise, $s^{j}_{i,k}$ equals zero, which means that the interference has been cancelled with SIC (1b).
In Figure 1, when $T_1$ and $T_2$ use the same resource in NOMA manner, $R_1$ can decode the packets from both $T_1$ and $T_2$ successfully. Because $R_1$ is close to $T_1$ and far from $T_2$, it can decode the messages in the optimal order of decreasing channel gains normalized by the noise. More specifically, when $R_1$ receives the combined signal $s$ from both $T_1$ and $T_2$, it first decodes the signal $s_1$ from $T_1$ while treating the signal $s_2$ from $T_2$ as interference. After $s_1$ is decoded, it is subtracted from $s$, and $R_1$ then decodes $s_2$ from the remainder. Therefore, the SINR of receiver $R_1$ when decoding the signal from $T_2$ is $\mathrm{SINR}_{21} = (P_2 |h_{21}|^2)/(N_0 B_0)$ rather than $\mathrm{SINR}_{21} = (P_2 |h_{21}|^2)/(N_0 B_0 + P_1 |h_{11}|^2)$, and the same holds for $R_2$. However, $R_3$ is very likely unable to decode either signal, because it receives almost the same signal power from $T_1$ and $T_2$.
In the V2V communication system, safety-related messages are broadcast by vehicles periodically. In 3GPP, PRR is defined as the typical performance metric; it is the statistical average of the probability that packets are received successfully. In essence, the probability that a packet is decoded correctly depends on the SINR at the receiver. The higher the PRR is, the more reliable the communication between vehicles is.
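The SIC rule described above can be illustrated with a short Python sketch. It is not the paper's equation (1) itself: the function name `sinr_with_sic`, the argument layout, and the numeric gains are assumptions made for illustration, and the noise normalization $n_{ij}$ is assumed to be the same for both compared signals, so that the decoding order reduces to a comparison of received powers.

```python
def sinr_with_sic(i, j, power, gain, sharing, noise_power):
    """SINR at receiver j for transmitter i under the idealized SIC rule.

    power[i]     -- transmit power P_i in watts
    gain[i][j]   -- channel power gain |h_ij|^2 from transmitter i to receiver j
    sharing      -- transmitters using the same resource as i (alpha_ik = 1)
    noise_power  -- N_0 * B_0 in watts
    """
    signal = power[i] * gain[i][j]
    interference = 0.0
    for k in sharing:
        if k == i:
            continue
        other = power[k] * gain[k][j]
        # s^j_{i,k} = 1: i is stronger, so it is decoded first and k still interferes.
        # s^j_{i,k} = 0: k was decoded and cancelled before i is decoded.
        if signal > other:
            interference += other
    return signal / (noise_power + interference)


# Hypothetical numbers: index 0 is T1, index 1 is T2; receiver index 0 is R1.
power = [0.2, 0.2]
gain = [[1e-8], [1e-10]]   # R1 is close to T1 and far from T2
noise = 1e-13

sinr_t1_at_r1 = sinr_with_sic(0, 0, power, gain, [0, 1], noise)
sinr_t2_at_r1 = sinr_with_sic(1, 0, power, gain, [0, 1], noise)
```

In this example the signal from T1 is decoded with T2 as interference, while the signal from T2 is decoded against noise only. The sketch assumes that the stronger signal is always decoded and cancelled successfully, which is exactly the assumption that fails for a receiver such as $R_3$ that sees nearly equal powers.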

Problem Formulation and Analysis.
In the 3GPP specifications, one communication type is broadcast: a vehicle acting as transmitter broadcasts its messages, while the other vehicles act as receivers. As mentioned above, PRR is the statistical average of the probability that packets are decoded successfully. Hence, the objective of resource allocation and power control is to maximize the PRR of all vehicles. Furthermore, since the probability that a packet is correctly decoded basically depends on the SINR at the receiver, the goal becomes maximizing the total SINR of all vehicles. The relationship between PRR and SINR is shown in Figure 2, which illustrates the feasibility of replacing PRR with SINR as the objective of resource allocation and power control.
Assume that there are $N$ vehicles in the V2V system. The total available bandwidth is divided into $M$ subchannels (one subchannel consists of several RBs), and one transmission period is divided into $T$ slots. In this paper, one subchannel in the frequency domain and one time slot in the time domain are defined as a Resource Block Group (RBG), which is the basic unit of resource allocation. Therefore, there are $M \times T$ RBGs in total in one period. Assume that the BS has perfect channel state information of all vehicles in the system via dedicated feedback channels. Each vehicle $i$ transmits packets with power $P_i$ on the RBG allocated by the BS. All vehicles except the transmitters decode messages in the optimal order of decreasing channel gains, as discussed in Section 2.1. For each slot, specific subchannels are allocated to vehicles to transmit packets, and the receivers' SINR values are calculated. The total SINR of all vehicles within one period can be calculated by (2).
In this formulation, the set $Tr$ represents the vehicles using subchannel $m$ in slot $t$ to transmit messages, while the set $Re$ represents the other vehicles, which receive messages (2). Constraint (2a) corresponds to the transmitting limitation that a vehicle transmits packets once per period on only one subchannel, which means that each vehicle uses one RBG in each period for transmission. Constraint (2b) means that each RBG can be assigned to at most $c$ users in NOMA manner, which is also the maximum number of vehicles using the same RBG. The larger $c$ is, the more complicated the decoding process and the receiver become. $c$ equal to one means that no resource collision happens. The binary variable $x^{i}_{m,t}$ equals one only when subchannel $m$ in slot $t$ is allocated to vehicle $i$ (2c). Constraint (2d) shows the result of using SIC in NOMA manner: when $(P_i |h_{ij}|^2 / n_{ij}) > (P_k |h_{kj}|^2 / n_{kj})$, the binary variable $s^{j}_{i,k}$ equals one, which means that interference from transmitter $k$ exists; $s^{j}_{i,k}$ equal to zero means that the interference has been cancelled with SIC. Constraint (2e) states that the transmission power of each vehicle should not exceed the maximum power $P_{max}$. The optimization problem in (2) is nonconvex and NP-hard because the variables are binary and interference is present. However, an upper bound of (2) can be given, namely, the situation without interference between transmitting vehicles. Because interference exists, this upper bound can never be achieved. Therefore, a joint resource allocation and power control algorithm based on machine learning is proposed in the next section to solve (2).
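As a rough illustration of constraints (2a), (2b), and (2e), the following Python sketch checks whether a candidate allocation is feasible. The function name `is_feasible` and the dictionary-based data layout are assumptions made for this example; the objective of (2) and the SIC constraint (2d) are not modelled here.

```python
from collections import Counter


def is_feasible(assignment, power, max_users_per_rbg, p_max):
    """Check a candidate allocation against constraints (2a), (2b), and (2e).

    assignment         -- dict: vehicle id -> (subchannel m, slot t), one RBG per vehicle
    power              -- dict: vehicle id -> transmit power
    max_users_per_rbg  -- the NOMA limit per RBG, written c in constraint (2b)
    p_max              -- maximum transmit power
    """
    # (2a): every vehicle transmits exactly once per period, on exactly one RBG.
    if set(assignment) != set(power):
        return False
    # (2b): at most max_users_per_rbg vehicles share one RBG.
    load = Counter(assignment.values())
    if any(count > max_users_per_rbg for count in load.values()):
        return False
    # (2e): transmit power stays within [0, p_max].
    return all(0.0 <= p <= p_max for p in power.values())


# Hypothetical example with three vehicles and a NOMA limit of two users per RBG.
ok = is_feasible({"a": (0, 0), "b": (0, 0), "c": (1, 0)},
                 {"a": 0.1, "b": 0.2, "c": 0.2}, 2, 0.2)
overloaded = is_feasible({"a": (0, 0), "b": (0, 0), "c": (0, 0)},
                         {"a": 0.1, "b": 0.2, "c": 0.2}, 2, 0.2)
```

Representing the allocation as one RBG per vehicle makes (2a) hold by construction, so the search space that remains is the choice of RBG per vehicle and the power levels.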

Resource Allocation Mechanism and Power Control Strategy
It is not easy to find the optimal solution under such constraints. Even with a greedy method, the solution that is optimal at one moment is not necessarily optimal at the next, because the distance between any two vehicles changes over time. To simplify the problem in (2), it is decoupled into two stages. In the first stage, assuming that all vehicles use the same transmit power, the resource allocation mechanism based on vehicles' moving characteristics is carried out. In the second stage, power control is performed according to the resource allocation results obtained in stage one.
In this paper, two typical scenarios, freeway and urban, are considered, corresponding to comparatively simple and complicated traffic conditions. The importance of the different moving characteristics of vehicles varies with the traffic scenario. For example, vehicles running on a highway rarely change their moving status, such as direction and speed. In most cases, the link type between a transmitter and a receiver is Line of Sight (LOS). In the urban scenario, however, vehicles may change their driving direction at any intersection, and their speed can change frequently due to, for example, traffic jams and traffic lights. Even the link type between two vehicles may change because of possible blockage by buildings. Therefore, different user grouping methods are proposed in this paper to cover the different scenarios.

Resource Allocation Mechanism for Freeway.
As mentioned above, the key step in the design of the resource allocation mechanism is user grouping. In the freeway scenario, there are several design considerations. First, vehicles that are as far apart as possible should be chosen to use the same RBG in NOMA manner. The reason is that when two transmitters allocated to the same resource are near each other, their neighbors, acting as receivers, cannot successfully receive and decode packets from either of them because of the strong mutual interference. To reduce the occurrence of this situation, a parameter $c$ is defined to represent the minimum distance between vehicles sharing the same resource.
In user grouping, receivers that have a similar distance to the transmitting vehicles should also be taken into account. The reason is that such receivers usually cannot decode the messages from any of the transmitters; for example, $R_3$ in Figure 1 is very likely unable to decode the message from either $T_1$ or $T_2$.
Based on the above considerations, a centralized scheduling mechanism that groups the vehicles according to their moving features is proposed. In the proposed resource scheduling algorithm, the vehicles in the same group are allocated the same resource. Vehicles in the same group are expected to have similar speed and similar moving direction. Furthermore, the distance between them should be larger than $c$. In this way, the distance between members of the same group is relatively stable, and the interference caused by a short distance between transmitters sharing the same resource can be avoided. The detailed resource allocation algorithm is described as follows.
In the first step, all vehicles are divided into several categories according to their speed and direction. Vehicles having similar direction and speed are in the same category $G_f$. The roads in a freeway are mainly designed for two or three speed classes, such as the carriageway and the passing lane for relatively slow and fast vehicles, respectively. Thus, two speeds are adopted in the simulation in Section 4.1 to represent a real freeway. The more dispersed the vehicle speeds are, the larger the number of categories $G_f$ is and the fewer vehicles each category has, and vice versa. The following steps are carried out in each category $G_f$, and the resource allocation algorithm ends once every vehicle belongs to a group. Hence, the resource allocation mechanism proposed in this paper is not affected by how dispersed or concentrated the vehicles' speeds are.
In the second step, we decide which vehicles can be placed in the same group. Vehicle $j$ is randomly chosen at the beginning of the algorithm; suppose that vehicle $j$ is in the $r$-th group, $g_r$. Vehicles that can be in the same group as vehicle $j$ must be in the same category as vehicle $j$. Then each vehicle in this category is checked according to (3), and the vehicle that maximizes (3) is selected to join the group of vehicle $j$. Suppose that there are $N$ vehicles in the system; (3) ensures that vehicle $i$ is in the same category as $j$ and is far away from $j$. At the same time, the number of receivers that have a similar distance to the transmitters using the same resource is minimized.
In (3), $dm_{ij}$ denotes the minimum distance between vehicle $i$ and the vehicles in the same group as $j$ (3a); $\sum_{k=1, k \neq i,j}^{N} f_{kij}$ denotes the number of vehicles that have almost equal distance to vehicle $i$ and vehicle $j$; $d_{ij}$ denotes the distance between vehicle $i$ and vehicle $j$; the binary variable $f_{kij}$ equals one when the difference between the distance from receiver $k$ to transmitter $i$ and the distance from receiver $k$ to transmitter $j$ is within $\alpha$ m, in which case receiver $k$ basically cannot decode the messages from either transmitter correctly (3b); the binary variable $x_{ij}$ equals one when $i$ and $j$ are in the same category (3c); vehicle $i$ should be far away from $j$, and the distance between them should be larger than $c$ (3d). Because the number of vehicles that have a similar distance to the two transmitters can be quite small compared with the distance between the two transmitters, $f_{kij}$ is suitably magnified by multiplying it by $\beta$ ($\beta > 1$).
The search is repeated until no vehicle in the same category satisfies (3). Then the vehicle in the same category as $j$ that has the minimum distance to all vehicles in the previous group is regarded as the first element of the next group. Checking (3) and finding the first member of the next group are repeated until all vehicles in this category have been assigned to groups. Then another category is chosen, and the above steps are repeated until all vehicles are members of groups.
Vehicles in the same group use the same resource. If the number of groups is larger than the number of resources, the value of $c$ is repeatedly decreased and the vehicles are regrouped until the number of groups is slightly less than the number of resources.
To get better performance with scarce spectrum, the minimum distance between vehicles sharing the same resource is calculated by the above steps, which means that this distance depends on the amount of wireless resources. If the wireless resources are adequate for the vehicles' information transmission, the minimum distance between vehicles sharing the same resource is larger than when wireless resources are scarce. The detailed vehicles grouping algorithm is given in Algorithm 1. A vehicle keeps transmitting on the allocated resource until it changes its motion status, for example, by leaving the road or changing its speed or direction. Once this happens, it leaves its original group and needs to be regrouped. It joins a group that satisfies (3). If no group satisfies (3), it becomes the only member of a new group when idle resources exist. When no group satisfies (3) and no resource is idle, it can join a group that satisfies (3) with (3d) relaxed. The whole RBG reallocation algorithm for vehicles changing their moving status is shown in Algorithm 2.
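The grouping steps for one category can be sketched in Python as follows. Several points are assumptions made for illustration rather than the paper's exact formulation: the selection score is taken to be $dm_{ij} - \beta \sum_k f_{kij}$, vehicles are placed on a one-dimensional road, the random choice of the first vehicle is replaced by the smallest identifier so that the result is reproducible, and "minimum distance to the previous group" is read as the distance to the nearest member. The name `group_category` and its parameters are hypothetical.

```python
def group_category(positions, min_dist, alpha, beta):
    """Greedily group the vehicles of one category (similar speed and direction).

    positions -- dict: vehicle id -> position along the road in metres
    min_dist  -- minimum distance between vehicles sharing a resource (c in the text)
    alpha     -- tolerance in metres for "almost equal" receiver distances
    beta      -- magnification factor (> 1) applied to the confused-receiver count
    """
    def dist(a, b):
        return abs(positions[a] - positions[b])

    ungrouped = set(positions)
    groups = []
    seed = min(ungrouped)  # deterministic stand-in for the random first vehicle j
    while True:
        group = [seed]
        ungrouped.remove(seed)
        while True:
            best = None
            for i in sorted(ungrouped):
                dm = min(dist(i, g) for g in group)
                if dm <= min_dist:          # distance constraint, as in (3d)
                    continue
                # Receivers almost equidistant to i and the seed (f_kij = 1).
                confused = sum(
                    1 for k in positions
                    if k not in (i, seed) and abs(dist(k, i) - dist(k, seed)) <= alpha
                )
                score = dm - beta * confused  # assumed form of the objective in (3)
                if best is None or score > best[0]:
                    best = (score, i)
            if best is None:
                break
            group.append(best[1])
            ungrouped.remove(best[1])
        groups.append(group)
        if not ungrouped:
            return groups
        # Next seed: the ungrouped vehicle closest to the group just finished.
        seed = min(ungrouped, key=lambda v: (min(dist(v, g) for g in group), v))


# Hypothetical layout: five vehicles of one category on a 800 m stretch.
example_positions = {0: 0, 1: 100, 2: 400, 3: 500, 4: 800}
example_groups = group_category(example_positions, min_dist=300, alpha=10, beta=2)
# example_groups == [[0, 4, 2], [1, 3]]
```

If the number of groups returned exceeds the number of RBGs, the caller would lower `min_dist` and run the grouping again, mirroring the adjustment of $c$ described above.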

Resource Allocation Mechanism for Urban Scenario.
In urban scenario, the motion characteristics of vehicles are more complicated than those in freeway scenario. Vehicles change driving direction and speed frequently and the link type between two vehicles may change unpredictably due to, for example, blocking by a building or tree. In user grouping algorithm, these characteristics should be taken into account in addition to distance considered in freeway scenario.
Spectral clustering (SC), an unsupervised method, can partition data into different groups according to multiple features. In most cases, the superiority of SC is attributed to the design of the metric function and the affinity graph [14]. Because vehicle traffic in the urban scenario changes constantly and unpredictably, SC is used in this paper to solve the resource allocation problem, with vehicles regarded as data points. Vehicles' geographical position, driving direction, speed, and communication link type are the main motion features taken into consideration.
Apart from selecting proper motion features, an appropriate weight should be assigned to each feature according to its influence on the communication between vehicles in order to design the metric function. A proper vehicle grouping method for the urban scenario therefore needs to quantize the features of the vehicles and build appropriate weights between vehicles to assess their similarity. According to this similarity, the vehicles are clustered, for example, by the BS, during each transmission period. After clustering, vehicles in the same cluster share the same resource in NOMA manner.
In the beginning of the next part, a brief introduction of normalized cut which is the major step in SC is given. After that, features selection and metric establishment are described and resource allocation algorithm in urban scenario is proposed.

Brief Introduction of Normalized Cut.
Given a data set $X$ with $n$ $d$-dimensional samples, $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$, the clustering algorithm groups $X$ into $k_2$ clusters $c_i$, $i = 1, \ldots, k_2$, with the aim of keeping data points within the same cluster close to one another and data points from different clusters apart. That is to say, normalized cut not only minimizes the weight of edges between different clusters but also maximizes the weight of edges within each cluster [15]. The main steps of normalized cut are as follows:
(1) Construct similarity matrix S.
(2) Construct adjacency matrix W and degree matrix D.
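Step (2) can be made concrete with a small Python sketch. Given an adjacency (weight) matrix $W$, it builds the degree values and the symmetric normalized Laplacian $I - D^{-1/2} W D^{-1/2}$, whose eigenvectors are commonly used in normalized-cut style spectral clustering. The function name `degree_and_laplacian` and the toy matrix are illustrative assumptions; the eigen-decomposition and the final clustering step are left out.

```python
import math


def degree_and_laplacian(W):
    """Return the degree list and the symmetric normalized Laplacian of W.

    W is a symmetric adjacency (weight) matrix given as a list of lists.
    """
    n = len(W)
    degree = [sum(row) for row in W]  # diagonal entries of D
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    laplacian = [
        [(1.0 if a == b else 0.0) - inv_sqrt[a] * W[a][b] * inv_sqrt[b]
         for b in range(n)]
        for a in range(n)
    ]
    return degree, laplacian


# Toy path graph with three nodes: 0 - 1 - 2.
toy_W = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
toy_degree, toy_L = degree_and_laplacian(toy_W)
```

The sketch uses only the standard library so that the construction of $D$ and the normalization are visible entry by entry; a numerical library would normally be used for the eigenvectors.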

Features Selection
Feature 1: Distance between vehicles. As in the freeway scenario, the relative position of vehicles, namely, their distance, is still important. In vehicle grouping, vehicles sharing the same resource are expected to be as far away from each other as possible. The weight calculated from the distance feature is designed to increase with distance. The closer vehicles are, the less likely they are to be in the same cluster.
Feature 2: Speed of vehicles. Speed is another important feature because the speed of vehicles affects the distance between them dynamically. When there is a speed difference between vehicles, the distance between them changes continuously. This brings about uncertainty in the distance. For instance, two vehicles are far away from each other at the beginning and move in the same direction. If they are allocated the same resource without considering their speed, their distance may become smaller after a short time if the rear vehicle is faster than the front one. Thus, the weight should be inversely related to the speed difference, which ensures that vehicles with a large speed difference are placed in the same cluster with low probability. Considering that the urban scenario contains plenty of vehicles with different speeds, Simulation of Urban Mobility (SUMO) is used in Section 4.2 to model the real urban scenario. The speed of each vehicle results from normal driving in compliance with traffic rules.
Feature 3: Moving direction of vehicles. Similar to speed, the moving direction also affects how the distance between vehicles changes, that is, whether they are approaching or separating.
Feature 4: Type of communication link between vehicles. The link types referred to here are LOS and Non-Line-of-Sight (NLOS). If the link between two vehicles is blocked by buildings or other obstacles, the link is regarded as NLOS. Otherwise, it is LOS. When the link between two transmitting vehicles is NLOS, the link types between a receiving vehicle and each of the transmitting vehicles are more likely to be different. Thus, the powers received from the different transmitting vehicles are more likely to differ substantially, which helps the receiving vehicle decode the messages from both transmitters in NOMA manner. Therefore, vehicles with an NLOS link are expected to share the same resource to reduce interference.

Mobile Information Systems

Input: The sets of vehicles $G_f$ based on the moving features; $u = |G_f|$; $d_{ij}$ denotes the distance between $i$ and $j$; $N$ is the number of vehicles
Output: The group $g_r$ in which vehicles use the same RBG
…
      end if
    end for
  until ($G_f = \emptyset$)
end for
Algorithm 1: Vehicles grouping algorithm.

Input: Vehicle $i$, which changes its moving status; $m$ is the number of groups $g_r$ produced by Algorithm 1
Output: The new group $i$ belongs to
for each $r \in [1, m]$ do
  if $i \in g_r$ then
    $g_r = g_r \setminus \{i\}$
  end if
end for
if $g_r$ satisfies (3) then
  $g_r = g_r \cup \{i\}$
else if $m <$ number of RBGs then
  $g_{m+1} = \{i\}$, $m = m + 1$
else if $g_r$ satisfies (3) without (3d) then
  $g_r = g_r \cup \{i\}$
end if
Algorithm 2: RBG reallocation algorithm.

Metric Function Establishment.
There are several strategies to construct the adjacency matrix. Among them, the most common way to compute the adjacency matrix, namely, the weight matrix, is full connection (see the following equations). Because different features have different effects on V2V communication, the method to compute the weight matrix is designed around the individual features.
where $v_i$ is the numerical value of vehicle $i$'s speed and $d_i$ is the numerical value of vehicle $i$'s driving direction; the four moving directions, up, down, left, and right, are represented by 1, 2, 3, and 4, respectively; $\Sigma$ is the two-dimensional covariance matrix in . The similarity of speed and moving direction between vehicle $i$ and vehicle $j$ is measured by the Mahalanobis distance, which ensures that each variable is independent of the measurement scale. $w^{dis}_{ij}$ represents the weight calculated from distance, $d_{ij}$ is the distance between $i$ and $j$, and $\sigma_1$ and $\sigma_2$ are fixed, with $\sigma_2$ larger than $\sigma_1$. The closer the vehicles sharing the same resource are, the more interference there is for receivers receiving messages from any of them. As can be seen in [16], interference is relatively small when the distance between transmitting vehicles using the same resources is larger than 300 meters. Therefore, 300 meters is used as the demarcation point $\sigma_1$ of the piecewise weight formula derived from distance.
Thus, here, $\sigma_1 = 300$ and $\sigma_2 = 600$, and the value of $w^{dis}_{ij}$ is large when the distance is large. When $d_{ij}$ is larger than $\sigma_1$, $w^{dis}_{ij}$ equals $\exp((d_{ij}/\sigma_2) - 1)$: it is smaller than one when $d_{ij}$ lies between $\sigma_1$ and $\sigma_2$ and larger than one when $d_{ij}$ exceeds $\sigma_2$. Otherwise, $w^{dis}_{ij}$ equals zero. Because $w_{ij}$ is the product of $w^{dis}_{ij}$ and the other terms, $w^{dis}_{ij} > 1$ means that vehicle $i$ and vehicle $j$ are more likely to be placed in the same group and therefore more likely to use the same wireless resources for transmission. Otherwise, they are less likely to be placed in the same group. Because vehicles operate in half duplex mode, vehicles within a certain range cannot send information at the same time, or they definitely cannot receive information from each other. Thus, when $d_{ij} \le \sigma_1$, $w^{dis}_{ij} = 0$ means that the probability of vehicle $i$ and vehicle $j$ being placed in one cluster and using the same wireless resources for transmission is basically zero (5a). In (5b), $w^{l}_{ij}$ refers to the weight calculated according to the type of communication link between vehicle $i$ and vehicle $j$, $l_{ij}$ represents the type of this link, and $\sigma_3$ ($\sigma_3 > 1$) and $\sigma_4$ ($\sigma_4 < 1$) are fixed, for example, $\sigma_3 = 1.2$ and $\sigma_4 = 0.9$. When the communication link from vehicle $i$ to vehicle $j$ is NLOS, the probability of them using the same wireless resources for information transmission is increased; otherwise, it is decreased.
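The distance and link-type weights described above can be written as two small Python functions. This is a minimal sketch that uses the example values quoted in the text (300, 600, 1.2, 0.9); the function names `distance_weight` and `link_weight` are assumptions, and the speed and direction term based on the Mahalanobis distance is not included.

```python
import math

SIGMA_1 = 300.0  # metres; at or below this distance the weight is zero
SIGMA_2 = 600.0  # metres; distance at which the weight equals one
SIGMA_3 = 1.2    # factor applied when the link between the two vehicles is NLOS
SIGMA_4 = 0.9    # factor applied when the link between the two vehicles is LOS


def distance_weight(d_ij):
    """Piecewise distance weight, following the description of (5a)."""
    if d_ij <= SIGMA_1:
        return 0.0
    return math.exp(d_ij / SIGMA_2 - 1.0)


def link_weight(is_nlos):
    """Link-type weight, following the description of (5b)."""
    return SIGMA_3 if is_nlos else SIGMA_4


# Combined contribution of the two terms for a hypothetical pair 900 m apart, NLOS.
pair_weight = distance_weight(900.0) * link_weight(True)
```

The exponential form makes 600 m the neutral point: pairs between 300 m and 600 m are allowed to share a resource but are discouraged, while pairs beyond 600 m are favoured.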

Resource Allocation Mechanism. Algorithm 3 (vehicle clustering algorithm) based on normalized cut shown below is conducted in the beginning of each transmission period.
Vehicles in the same partition share the same resources; therefore, the number of clusters k equals the number of RBGs.

Signaling Process for Resource Allocation Mechanism.
The resource allocation mechanism proposed in this paper is executed by the BS, which means that the BS needs to collect information about the vehicles. After the discussion in the 94th 3GPP meeting, consensus was reached that NR supports User Equipment (UE) reporting assistance information to the next generation Node B (gNB) [17]. The whole signaling process is shown in Figure 3. At the beginning, each vehicle reports its geographic and speed-related information to the BS. Next, the BS executes the resource allocation mechanism. Then, the BS transmits a message containing the resource allocation results to the vehicles. After decoding the corresponding message, each vehicle learns which RBG it can use to transmit messages to other vehicles.

Power Control Based on Q-Learning.
Q-learning is a model-free reinforcement learning algorithm, and it can learn to find optimal policy through maximizing expected reward. It shows very good performance in the complex system. In this paper, Q-learning is introduced to solve the power control problem.
In Q-learning, state, action, and reward are the three main elements. Details about Q-learning can be found in [18]. In this NOMA-based V2V power control problem, the state is formulated as the set of transmission power values of all transmitting vehicles using the same wireless resources. To limit the number of elements in the state set, it is assumed that a vehicle can use one of the discrete power values in $P$ to send information (see the following equation). Because the upper limit of the transmission power of each vehicle is $P_{max}$, each possible power value $p_i$ does not exceed it. When $K$ vehicles use the same resource, the state set $S$ is expressed as follows. The action set is the change of transmission power of all transmitting vehicles using the same resource. To limit the number of elements in the action set, the transmission power of each transmitting vehicle can only be reduced, kept unchanged, or increased, which are represented by −1, 0, and 1, respectively. When $K$ vehicles use the same resource, there are $3^K$ possible actions. Here, $a_i$ is the power changing action of vehicle $i$, and the action set $A$ is represented by the following equation. The upper limit of the transmitting power of each vehicle is $P_{max}$. When a vehicle uses the power value $p_i$ to transmit messages, the action of increasing the transmission power is not taken if it would make the transmission power greater than $P_{max}$; similarly, the action of reducing the transmission power is not taken if it would make the transmission power equal to or less than zero.
The reward is the sum of the SINR values observed when the other vehicles receive messages from the transmitting vehicles, since the probability of information being decoded successfully increases with the SINR. When $K$ vehicles use the same resource, the sum of SINR at the receiving vehicles is given by (9). An ε-greedy search, which balances exploration and exploitation, is used. The exploration rate $\varepsilon$ starts at one and decreases gradually. In each step, a random number is generated: when it is larger than $\varepsilon$, the action is selected based on the information already learned; otherwise, the action is chosen randomly. The power control strategy learning process based on Q-learning is shown in Algorithm 4. When the estimated value function $Q_t(s_t, a_t)$ after the $t$-th sample and the reward $r_{t+1}$ of the $(t+1)$-th sample are known, $Q_{t+1}(s_t, a_t)$ can be obtained by the incremental update (10), where $\alpha_t$ is the learning rate, which indicates how fast the old value is given up.
$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t \left[ r_{t+1} - Q_t(s_t, a_t) \right] \quad (10)$$
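The state and action sets can be sketched in Python as follows. The sketch assumes that a state is a tuple of indices into the list of discrete power levels (all of them positive and at most $P_{max}$), so that the limits on increasing and decreasing power become index bounds. The names `all_actions`, `valid_actions`, and `next_state` are hypothetical.

```python
from itertools import product


def all_actions(k):
    """All 3^K joint actions: -1 (reduce), 0 (keep), +1 (increase) per vehicle."""
    return list(product((-1, 0, 1), repeat=k))


def valid_actions(state, num_levels):
    """Joint actions that keep every vehicle's power index inside [0, num_levels - 1].

    state is a tuple of power-level indices, one per vehicle sharing the resource.
    """
    return [
        action for action in all_actions(len(state))
        if all(0 <= s + a < num_levels for s, a in zip(state, action))
    ]


def next_state(state, action):
    """Apply a joint action to a state."""
    return tuple(s + a for s, a in zip(state, action))


# Two vehicles, three power levels; vehicle 0 is at the lowest level, vehicle 1 at the highest.
allowed = valid_actions((0, 2), num_levels=3)
```

With two vehicles there are nine joint actions in total, but only four remain for the state `(0, 2)` because vehicle 0 cannot reduce its power and vehicle 1 cannot increase it.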
After learning, the action that gets the maximum Q value is the power control strategy.
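The incremental update and the ε-greedy selection described in this section can be sketched in Python as follows. The Q-table layout (a dictionary keyed by state and action), the function names, and the numeric values are assumptions for illustration; the reward would be the SINR sum, which is not computed here.

```python
import random
from collections import defaultdict


def q_update(q_table, state, action, reward, learning_rate):
    """Incremental update: Q <- Q + alpha * (r - Q), as in update (10)."""
    old = q_table[(state, action)]
    q_table[(state, action)] = old + learning_rate * (reward - old)


def choose_action(q_table, state, candidates, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda a: q_table[(state, a)])


# Hypothetical usage with one state and two candidate actions.
q = defaultdict(float)            # Q-table initialised to zeros
s = (1, 1)
q_update(q, s, (0, 1), reward=10.0, learning_rate=0.5)   # Q becomes 5.0
q_update(q, s, (0, 1), reward=10.0, learning_rate=0.5)   # Q becomes 7.5
greedy = choose_action(q, s, [(0, 0), (0, 1)], epsilon=0.0, rng=random.Random(0))
```

Because the update in the text has no discounted future-value term, repeated samples of the same reward move the estimate geometrically toward that reward, at a speed set by the learning rate.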

Simulation Results
In this section, the proposed joint resource allocation and power control mechanism is evaluated through system-level simulation.

Freeway Scenario.
In this section, the proposed mechanism under the freeway scenario defined by 3GPP [19] Input: e moving features of vehicles: d ij denotes the distance between vehicle i and vehicle j; v i is the speed of vehicle i; d i is the direction of vehicle i; l ij represents the communication link type between vehicle i and vehicle j; N is the number of vehicles; Output: e partition G;  shown in Figure 4 is evaluated. e major simulation parameters are shown in Table 1.
Here, sensing mechanism is selected as the comparison algorithm, which is one kind of resource allocation scheme regulated by 3GPP. Each vehicle senses the energy on every RBG and ranks them from low to high. When reallocation happens, it converts to a RBG which is one of RBGs corresponding to the lowest 20% energy ones as long as the 20 th minimum value of energy of RBGs is 3 dB less than the energy value of RBG it is using now [1]. is mechanism needs vehicles to sense energy constantly. Figure 5 shows the performance of PRR for sensing mechanism and the proposed mechanisms. It is clear that the proposed mechanism with or without power control strategy's PRR is higher than that of the sensing mechanism. Among three mechanisms, the proposed mechanism with power control has the best performance. When the distance between vehicles is larger than 250 m, the performance of sensing mechanism decreases greatly compared to that of the proposed mechanisms. e reason is that the vehicles' moving characteristics are taken fully into consideration to determine the resource allocation proposed in this paper, which makes vehicles as far as possible use the same resource.
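The reselection rule of the sensing baseline can be sketched in Python under one explicit interpretation: the threshold is taken to be the highest energy among the lowest-20% candidate RBGs, and the vehicle moves to a random candidate only when that threshold is at least 3 dB below the energy of its current RBG. This reading, the function name `sensing_reselect`, and the energy values are assumptions for illustration and are not taken from the 3GPP procedure.

```python
import random


def sensing_reselect(energy_db, current, rng):
    """Return the RBG index to use after a reselection attempt.

    energy_db -- sensed energy per RBG in dB (any consistent reference)
    current   -- index of the RBG currently in use
    rng       -- random.Random instance used to pick among the candidates
    """
    ranked = sorted(range(len(energy_db)), key=lambda idx: energy_db[idx])
    num_candidates = max(1, len(ranked) // 5)     # lowest 20% of RBGs by energy
    candidates = ranked[:num_candidates]
    threshold = energy_db[candidates[-1]]         # assumed reading of the threshold
    if threshold <= energy_db[current] - 3.0:     # at least 3 dB quieter than now
        return rng.choice(candidates)
    return current


# Hypothetical sensed energies for ten RBGs.
sensed = [-90, -80, -100, -95, -70, -85, -60, -75, -65, -88]
moved = sensing_reselect(sensed, current=4, rng=random.Random(1))
stayed = sensing_reselect(sensed, current=3, rng=random.Random(1))
```

Since each vehicle applies this rule on its own measurements, allocations can change whenever the sensed energies shift, which is consistent with the fluctuating utilization ratio reported for the sensing mechanism.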
To further evaluate the resource utilization efficiency, the utilization ratio of resource is defined as the number of RBGs that have been allocated to vehicles divided by the total available number of RBGs. As shown in Figure 6, the RBG utilization ratio is constant in the proposed mechanism, while the utilization ratio changes frequently for the sensing mechanism. is means that resource reallocation in sensing mechanism happens more frequently than that in proposed mechanism. In freeway scenario, where vehicles do not easily change moving characteristics, the proposed mechanism capturing these steady characteristics has very stable performance.
To achieve better performance in a V2V communication system, the number of vehicles allocated to the same resource should be small and the distance between them should be large, so as to reduce interference. Figures 7 and 8 show the relationship between the number of vehicles sharing the same resource and their distance at different times for the sensing mechanism and the proposed mechanism, respectively. The distance between vehicles sharing a resource in the proposed mechanism is relatively larger, while that in the sensing mechanism is evenly distributed along the whole length of the road. The proposed mechanism, which takes the distance characteristic into account in resource allocation, therefore performs better.

Urban Scenario.
The joint resource allocation and power control mechanism for the urban scenario is evaluated on the real map of Manhattan shown in Figure 9. The simulation scenario is further abstracted by SUMO [20], as shown in Figure 10. The moving features of vehicles, for example, the changes of speed and direction at intersections, are also obtained by SUMO according to the map. The major simulation parameters, including the clustering parameters described in Section 3, are shown in Table 2.
(1) initialize Q-table to zeros
(2) for time t do
(3)   if rand(·) < ε then
(4)     select action randomly
(5)   else
(6)     choose action $a_{t+1} = \arg\max_{a_{t+1}} Q(s_{t+1}, a_{t+1})$
(7)   end if
(8)   calculate reward value as (9)
(9)   update Q-table as (10)
(10)  t = t + 1
(11) end for
(12) choose $a = \arg\max_{a} Q(s, a)$
ALGORITHM 4: Q-learning based power control algorithm.

Figure 11 shows the PRR performance of the proposed mechanism with and without power control and of the sensing mechanism. Compared with the freeway scenario, PRR in the urban scenario decreases more sharply as the distance between vehicles increases. Similar to the results under the freeway scenario, the PRR of the proposed mechanism is higher than that of the sensing mechanism. Among the three mechanisms, the proposed mechanism with power control has the best performance. Figures 12 and 13 show the relationship between the number of vehicles sharing the same resource and their distance at different times for the sensing mechanism and the proposed mechanism, respectively, in the urban scenario. It is clear that vehicles that are farther apart are more likely to share the same resource. However, in the urban scenario, distance is only one of several characteristics considered in vehicle clustering. Therefore, the proposed mechanism is more likely to group vehicles with similar moving characteristics, not just vehicles that are far away from each other, to share the same resource. Figure 14 shows the average number of vehicles sharing the same resource at different distance ranges. Fewer vehicles share the same resource in the proposed mechanism than in the sensing scheme at most distance values, which indicates that the proposed scheme utilizes resources more efficiently and achieves better PRR performance.
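The ε-greedy loop of Algorithm 4 can be illustrated with a small tabular example. This is a sketch under stated assumptions, not the paper's implementation: the reward of Eq. (9) and the update of Eq. (10) are not reproduced in this section, so the code uses a placeholder `reward_fn` and the standard Q-learning update; the state (index of the current power level), the power levels, and all hyperparameters are hypothetical.

```python
import random

def q_learning_power_control(power_levels_dbm, reward_fn, steps=2000,
                             epsilon=0.1, alpha=0.1, gamma=0.9, seed=0):
    """Toy tabular Q-learning over discrete transmit power levels."""
    rng = random.Random(seed)
    n = len(power_levels_dbm)
    # Step 1: Q-table initialised to zeros.
    # Assumed state: index of the power level currently in use.
    q = [[0.0] * n for _ in range(n)]
    state = 0
    for _ in range(steps):                      # step 2: loop over time
        if rng.random() < epsilon:              # steps 3-4: explore
            action = rng.randrange(n)
        else:                                   # steps 5-6: exploit
            action = max(range(n), key=lambda a: q[state][a])
        # Step 8: placeholder reward (stands in for the paper's Eq. (9)).
        reward = reward_fn(power_levels_dbm[action])
        next_state = action
        # Step 9: standard Q-learning update (assumed form of Eq. (10)).
        td_target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state                      # step 10: advance time
    # Step 12: greedy action from the final state.
    best = max(range(n), key=lambda a: q[state][a])
    return power_levels_dbm[best], q

# Hypothetical reward that peaks at 17 dBm (too low hurts PRR, too high
# causes interference); purely illustrative.
toy_reward = lambda p_dbm: -((p_dbm - 17) ** 2) / 100.0
best_power, q_table = q_learning_power_control([5, 11, 17, 23], toy_reward)
```

In this toy setting the learned greedy choice is the 17 dBm level, since every other level receives a strictly negative reward. A real reward would have to be computed from the measured link quality.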

Analysis of Computational Complexity.
In the sensing mechanism, vehicles need to collect Scheduling Assignment (SA) messages, detect the received energy on each RBG, and exclude RBGs based on the SA messages. Each vehicle ranks the RBGs according to its own average received energy and selects RBGs for itself. All of the above steps are performed by the vehicles themselves, which means that the vehicles' computing capabilities strongly influence the delay. Because of the ranking procedure, the computational complexity is between $O(n \log n)$ and $O(n^2)$, depending on the sorting algorithm adopted, where n is the number of RBGs. The complexity of the vehicle grouping resource allocation algorithm for the freeway scenario proposed in this paper is $O(N \cdot m)$, where N is the number of vehicles and m is the number of vehicles in a group formed according to velocity. In a typical freeway scenario [19], N is slightly larger than the number of RBGs, while m is slightly smaller than the number of RBGs, so the computational complexity of the vehicle grouping algorithm lies between the lower and upper limits of the sensing mechanism. The computational complexity of the vehicle clustering resource allocation algorithm for the urban scenario, which utilizes SC, is $O(N^3)$, which is larger than that of the sensing mechanism. However, in the resource allocation mechanisms proposed in this paper for the freeway and urban scenarios, the computation is performed by the BS, which collects geographic and related information from the vehicles and whose computing capacity is far beyond that of the vehicle equipment. The delay is therefore lower than, or at least comparable to, that of the sensing mechanism, while the PRR is higher.
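To make the comparison of complexity orders concrete, the sketch below evaluates the four leading terms for one set of sizes. The numbers (50 RBGs, 60 vehicles, 40 vehicles per velocity group) are hypothetical and chosen only to satisfy the stated relations (N slightly larger than n, m slightly smaller than n); they are not taken from Table 1, and constant factors are ignored.

```python
import math

def operation_counts(n_rbgs, n_vehicles, group_size):
    """Leading-order operation counts for the mechanisms compared above."""
    return {
        "sensing_best": n_rbgs * math.log2(n_rbgs),   # O(n log n) sort
        "sensing_worst": n_rbgs ** 2,                 # O(n^2) sort
        "freeway_grouping": n_vehicles * group_size,  # O(N * m)
        "urban_spectral": n_vehicles ** 3,            # O(N^3)
    }

# Hypothetical sizes, for illustration only.
counts = operation_counts(n_rbgs=50, n_vehicles=60, group_size=40)
for name, value in counts.items():
    print(f"{name}: {value:.0f}")
```

For these sizes the freeway grouping term (2400) falls between the two sensing bounds (about 282 and 2500), while the spectral clustering term (216000) is far larger, matching the ordering argued in the text.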
After receiving the resource allocation results from the BS, vehicles adjust their power according to the power control strategy based on Q-learning. Owing to the learning process in reinforcement learning, the computational complexity is relatively high and depends on the number of steps in each episode. The advantage of this method is that the transmit power can be adjusted as the environment changes. In future work, more efficient methods can be adopted to reduce the number of iterations and thus the computational complexity.

Conclusion
In this paper, NOMA is introduced into the V2V communication system to enhance the utilization of limited frequency resources, and a joint resource allocation and power control mechanism based on vehicles' moving characteristics is proposed. According to the different moving conditions in freeway and urban scenarios, two resource assignment algorithms are designed, which divide vehicles into several groups according to their moving features. After that, the power control strategy is obtained through Q-learning. System-level simulation results show that the proposed mechanism improves PRR compared with the energy sensing mechanism.
Data Availability
The simulation code and data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.