Joint Trust Management and Sharing Provisioning in IoV-Based Urban Road Network

The Internet of Vehicles (IoV) is a novel technology for enhancing the safety, intelligence, and efficiency of traffic systems, in which vehicles exchange critical information with other vehicles, roadside units, pedestrians, and cloud platforms. However, the dynamic network topology, high vehicle speeds, and exposed communication links inevitably pose security threats to IoV. It is pivotal to establish trust management and trust-sharing mechanisms between vehicles to guarantee the safety of IoV. This paper proposes a distributed trust management scheme that identifies malicious vehicles using the Random Forest (RF) machine learning technique. With the help of a sliding time window, the trust degree of vehicles is comprehensively evaluated by CART trees according to current and historical records. To further improve the security of communication processes, we also introduce a lightweight cryptography mechanism. In addition, a trust-sharing mechanism based on a path prediction algorithm is proposed to guarantee the consistency of trust information in the network. Finally, extensive simulations are conducted to demonstrate the feasibility and efficiency of the proposed scheme.


Introduction
Facilitated by 5G/B5G, large numbers of smart devices are connected to the Internet and exchange massive amounts of information, marking the arrival of the Internet of Things (IoT) [1]. The Internet of Vehicles (IoV) is a variant of IoT. The ultimate goal of IoV is to enhance the safety, intelligence, and efficiency of traffic systems, with vehicles exchanging critical information with other traffic entities. IoV is now on the verge of widespread deployment thanks to advances in radio access and core network technologies [2,3]. Equipped with intelligent devices, such as wireless sensors and On-Board Units (OBUs), vehicles have powerful communication, storage, and computing capabilities. Besides, IoV can support the Intelligent Transportation System (ITS) and the integration of dynamic information services, which can reduce the number of traffic accidents and alleviate traffic congestion [4,5].
Meanwhile, the highly dynamic network topology, unfamiliar relationships between vehicles, and exposed communication links inevitably pose security threats to IoV. On the one hand, the information obtained by vehicles may deviate from the real environment because of failures of sensors or other smart devices. On the other hand, malicious vehicles can gain illegal benefits by injecting false information into the network. They can directly forge and broadcast fake messages, masquerade as legitimate entities, and even tamper with the information actually sent by a transmitter [6], which threatens the authenticity and reliability of information. Based on the above analysis, it is urgent to establish an efficient trust management mechanism for IoV. A proper trust management mechanism can identify malicious nodes, resist malicious attacks, and ensure the stability of communication processes, thereby improving driving conditions and ultimately the safety of IoV.
The concept of "trust management" was first proposed by M. Blaze in 1996 [7]. The author emphasized that trust management is an integral part of network service security. Trust management mechanisms can be considered from identity verification, attack detection and mitigation, confidentiality, privacy, trust and reputation, and other dimensions. The kernel of trust management is to formulate a suitable trust evaluation mechanism according to precise regulations. Malicious nodes can be discriminated by calculating trust value, and then, other nodes in the network select trusted nodes for interactions.
According to their framework, trust management mechanisms can be divided into two categories: centralized and decentralized. Centralized trust management requires a trusted entity in the network to execute the mechanism, and all information is stored in a central server. When vehicles need to exchange information with other vehicles, they must communicate with the central server. A centralized mechanism has good stability but poor scalability and cannot adapt to the highly dynamic network topology of IoV. Besides, it also suffers from the single point of failure problem. Clearly, decentralized trust management is more applicable to IoV. At present, research on decentralized trust management mainly adopts the following underlying approaches: cryptographic, recommendation-based, fuzzy logic-based, game theory-based, and machine learning-based.
Cryptography is the first line of defense for communication systems. Choi et al. [8] were the first to use symmetric authentication with short-lived pseudonyms in VANETs. Vasudev et al. [9] propose a lightweight trust authentication and management scheme using cryptographic hash functions, but it cannot judge fake messages. In IoV, it is essential to ensure the trustworthiness of vehicles, but the authenticity of messages cannot be neglected either. Ahmad et al. [10] propose MARINE to detect and revoke dishonest vehicles, incorporating entity and data trust. Wang et al. [11] propose a distributed HDMA scheme for 5G-enabled VANETs that uses a group signature-based algorithm for mutual authentication in V2V communications.
Trust management based on neighbor vehicle recommendation is realized through indirect communication between vehicle nodes. Hu et al. [12] propose "REPLACE," a trust-based platoon service recommendation scheme that helps user vehicles avoid choosing badly behaved platoon head vehicles. Ahmed et al. [13] combine direct and indirect trust to identify potential malicious nodes in the current network by calculating local trust and analyzing recommendations from neighbors. Li et al. [14] propose a reputation-based global trust establishment scheme (RGTE) that safely shares trust information in VANETs by applying statistical laws. In addition to the above two methods, Soleymani et al. [15] propose a fuzzy trust model based on experience and plausibility to secure the vehicular network. Guleng et al. [16] propose a scheme that uses a fuzzy logic-based trust calculation approach to evaluate the direct trust of trustee nodes. Halabi and Zulkernine [17] present a vehicular coalition formation approach that incorporates a hedonic cooperative game model and aims at preventing malicious or faulty vehicles from joining collaborative benign vehicular communities. With the emergence of machine learning, researchers have studied its application to network security [18]. Jiang et al. [19] propose a trust evaluation and update mechanism (TECU) for underwater wireless sensor networks based on the C4.5 decision tree algorithm.
However, vehicles travel at high speed along their intended routes, and establishing a suitable trust management mechanism alone is not sufficient to ensure the safety of IoV. How to guarantee the consistency of "trust" is another problem worth attention. At present, most researchers use a central system controller to share the trust values of vehicles. However, the location of the central controller is generally fixed, and the effectiveness of trust sharing drops markedly as the distance between the vehicle and the controller increases, which restricts the scalability of IoV.
Considering the decentralization, immutability, transparency, and fault tolerance of blockchain, many researchers use blockchain technology to realize trust management and sharing mechanisms. Singh et al. [20] propose a blockchain-based decentralized trust management scheme using smart contracts. Specifically, they introduce blockchain sharding to reduce the load on the main blockchain and increase the transaction throughput. Yang et al. [21] propose a traffic event validation and trust verification mechanism based on blockchain's decentralized nature and are the first to propose the "proof-of-event" consensus algorithm to ensure the correctness of stored information. However, running consensus mechanisms is expensive and requires enormous computing and storage resources, which limits blockchain applications.
This article proposes a Random Forest-based trust management mechanism for IoV, named MTRF, to determine vehicles' identities and ensure vehicular network security. To avoid the overfitting problem of decision tree learning, we adopt the ensemble learning method Random Forest (RF), which limits overfitting without increasing the error due to bias. Besides, we propose a trust-sharing mechanism based on a path prediction algorithm that forecasts the next driving directions of vehicles. The trust value of a vehicle can then be shared point-to-point between RSUs, avoiding the negative impact of a central controller. The main contributions of this paper are summarized as follows:
(i) To reduce the excessive consumption of network resources and the increased difficulty of vehicle management caused by dynamic vehicle topologies, we propose a dynamic clustering process that groups vehicles according to their current locations, driving directions, and other parameters. We also adopt RF to realize cooperative multivehicle trust management within a temporary cluster and thereby identify malicious vehicles.
(ii) Considering the conceptual nature of "trust," a single correct behavior is not enough to prove a vehicle's identity. Therefore, we introduce a sliding time window algorithm to store the vehicles' decision results at different time slots and comprehensively evaluate the degree of vehicle trust. In addition, we set a penalty factor to prevent sudden attacks from malicious nodes that have accumulated high trustworthiness.
(iii) To accurately share the trust information of vehicles during cluster switching, we propose a trust-sharing mechanism that utilizes a DQN-based path prediction algorithm. The trust information of the corresponding vehicle can thus be shared between RSUs, avoiding the negative impact of a central controller and improving system scalability.
The rest of this paper is organized as follows. The system model is presented in Section 2. Section 3 introduces the proposed trust management mechanism. The lightweight cryptography algorithm is presented in Section 4. Section 5 presents the trust-sharing mechanism. The performance of the proposed mechanisms is evaluated in Section 6. Finally, Section 7 concludes the paper.

System Model
In this paper, we mainly consider trust management and trust-sharing mechanisms for IoV in an urban road network scenario. Figure 1 illustrates a typical urban road network architecture composed of numerous intersections, where vehicles are randomly deployed on the roads with known origins and destinations. Roadside units (RSUs) are deployed at the intersections along the roads. There are two kinds of communication modes in the system: vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I), both of which can be compromised by attackers, leaking important information.
2.1. The Threat and Adversary Model. A variety of emerging communication technologies provide stable connections between vehicles but also place higher requirements on the network model, communication protocol, quality of service, and security of the communication system. Because vehicles are unfamiliar with each other, the authenticity and reliability of messages are questionable. Figure 2 illustrates three major threats to IoV, in which vehicle 1 observes an accident and transmits the message to vehicle 2 and vehicle 3 [22]. Figure 2(a) is from the perspective of legitimate vehicles. It is assumed that vehicles 1, 2, and 3 are legitimate vehicles, and the communication links are not attacked. However, intelligent devices of vehicle 3, such as its sensors, are faulty. In this case, vehicle 2 successfully receives the accurate information about the real event sent by vehicle 1, while vehicle 3 obtains an erroneous message $A'$ because of the sensor malfunction. Even though its identity is legitimate, vehicle 3 unintentionally spreads false information into the network.
Figures 2(b) and 2(c) concern malicious vehicles. In Figure 2(b), the malicious vehicle 1 tampers with the observed accident information and acts as an information source to transmit false messages that deceive other vehicles. In Figure 2(c), the malicious vehicle 2 communicates with vehicle 1 as a legitimate entity to obtain accident information, tampers with the information and forwards it to vehicle 3, and finally commits fraud by destroying the communication link between vehicles 1 and 3. Such vehicles inject false information into the network to disrupt the transportation system and seek illegal profits. Our trust management mechanism mainly considers how to resist these two attack modes.

Trust Management Process.
To improve the framework's flexibility and implement the RF, we divide the vehicles into clusters, which execute decision processes based on the communication between vehicles. There are two types of vehicles: Cluster-Head-Vehicles (CHVs) and Cluster-Member-Vehicles (CMVs). CMVs establish communication links with other vehicles in the same cluster and collect trust evidence to evaluate the identity of the node transmitting messages. The CHV selected for each cluster manages the other vehicles in the cluster and communicates with RSUs. We set buses as CHVs to ensure high reliability and sufficient computing and storage capacity [23]. Note that clusters are temporary and are updated over time because of the high mobility of vehicles and the dynamic topology changes of communication networks [24]. Figure 3 depicts the configuration of a single intersection. The cluster managed by CHV1 illustrates the trust management process. If vehicle 1 observes an accident on the road, it immediately generates a message and broadcasts it in the cluster to inform other vehicles. CMVs judge the trust identities of others in the cluster using the RF algorithm based on the collected trust evidence. CMVs may reach different decision results for the same observed vehicle. The CHV collects the decision results from the cluster and transmits them to the RSU, enabling the RSU to consider the different decision results comprehensively and decide the credibility of vehicles by updating each vehicle's trust value based on the final integrated result.
However, a high trust value at the current time does not necessarily indicate that a vehicle is reliable. Trust is a dynamically accumulated value, so both current and historical records serve as benchmarks for it. The RSU adopts the sliding time window technique to compute the trust value of a vehicle based on the reputation values from the current and former time slots. The RSU has the right to remove a vehicle whose trust value is below the specified threshold from the current cluster and to notify other CHVs of the vehicle's identity.
By utilizing Random Forest, the identity of a vehicle is jointly determined by the other vehicles in the cluster, which partly avoids decision failures caused by communication interruptions between CMVs. However, CMVs, CHVs, and RSUs exchange highly aggregated information, so an attack on these exchanges severely affects the accuracy of vehicle identity judgment. To strengthen the mechanism's resistance to attacks, we introduce a lightweight cryptography mechanism based on Elliptic Curve Cryptography (ECC), a cryptographic hash function, and XOR operations to protect the above communication processes from being compromised by malicious vehicles.

Wireless Communications and Mobile Computing

An interruption of the communication between CHV and CMV causes the vehicle to break away from the current cluster and find an appropriate new one, which is called cluster switching. Meanwhile, the trust information of the vehicle needs to be synchronized to the corresponding CHV to facilitate trust management for newcomers. To overcome the weaknesses of traditional algorithms, we propose a novel trust-sharing mechanism based on a vehicle path prediction algorithm that utilizes deep reinforcement learning.
Our algorithm takes the intersection as a unit. Whenever vehicles reach an intersection, they execute the path prediction algorithm, a Deep Q-network (DQN) driven by the traffic conditions, to forecast their next driving direction. When the prediction is complete, the CHV receives the prediction results sent by CMVs and compares them with its own direction. If there is a discrepancy, the CHV establishes a communication link with its RSU, and the RSU then finds the applicable RSU in the predicted direction. The vehicle trust information is transmitted between RSUs to facilitate synchronization to the corresponding CHVs. Because vehicles predict their next driving direction, the trust value of a vehicle can be shared point-to-point between RSUs, avoiding the negative impact of a central controller, which brings the following benefits: (1) The path prediction algorithm is executed at the vehicle layer, which improves network scalability and suits IoV.

Trust Management Process of MTRF
Figure 4 shows the primary process of the proposed trust management mechanism, which consists of five parts: dynamic clustering, trust evidence collection and preprocessing, trust evaluation, trust value calculation and update, and communication process encryption. In this section, we elaborate on the first four parts.
3.1. Dynamic Clustering Process. High-density vehicles are randomly deployed at intersections with different speeds and paths, which significantly increases the difficulty of vehicle management. An appropriate dynamic clustering process is indispensable to reduce excessive signaling overhead and enhance the stability and scalability of the system. The Euclidean distance between vehicles i and j is defined as $d_{ij}$. This parameter is jointly determined by the current location of the vehicle and the destination location. $X_i(t)$ and $X_i(t + \Delta t)$ represent the current and the estimated position of vehicle i, respectively. $X_i(t)$, $X_i(t + \Delta t)$, and $d_{ij}$ are expressed as follows:
To maintain the relative stability of a cluster, we also consider the driving directions of vehicles. The driving direction of vehicle i is defined as $d_i$. Only vehicles moving in the same direction can be grouped into the same cluster. The binary variable $\alpha_{ij}$ describes this constraint: $\alpha_{ij} = 1$ if vehicles i and j have the same driving direction, and $\alpha_{ij} = 0$ otherwise.
After defining $d_{ij}$ and $\alpha_{ij}$, we can execute the vehicle clustering operations. The bus w, with communication range $R_w$, is designated as the CHV of the w-th cluster. Membership of cluster w is described by the binary variable $\beta_{wi}$, which indicates whether vehicle i belongs to cluster w: CMV i becomes a member of the w-th cluster only if it satisfies the driving-direction condition ($\alpha_{wi} = 1$) and the communication condition ($d_{wi} \le R_w$) simultaneously.
Because CHVs are irregularly distributed, a CMV may be located in the overlapping area of two clusters. In this case, we stipulate that the vehicle chooses the cluster whose CHV is closest to it. If vehicle j is located in the overlapping area of clusters w and w + 1, it chooses the cluster $\arg\min_{k \in \{w, w+1\}} d_{kj}$.
As stated previously, clusters are temporary and change over time. If a CMV changes its path or overtakes, it may leave the current CHV's communication range and depart from the current cluster. In this case, to ensure the consistency of the vehicle trust value, the vehicle trust information needs to be synchronized with the corresponding CHV. This paper introduces a DQN-based path prediction algorithm for vehicles to solve this issue, which is described in Section 5.
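The clustering rules described in this subsection can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `Vehicle` class, the `assign_clusters` helper, and the choice to compute $d_{ij}$ from current positions only (ignoring the estimated future position) are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    vid: str
    x: float
    y: float
    direction: str  # hypothetical encoding, e.g. "N", "S", "E", "W"


def distance(a, b):
    """Euclidean distance d_ij between two vehicles (current positions only)."""
    return math.hypot(a.x - b.x, a.y - b.y)


def same_direction(a, b):
    """Binary variable alpha_ij: 1 if both vehicles drive in the same direction."""
    return 1 if a.direction == b.direction else 0


def assign_clusters(chvs, cmvs, comm_range):
    """Map each CMV id to the id of its cluster head, or None if no CHV qualifies.

    A CHV qualifies when it drives in the same direction and the CMV lies
    within its communication range; overlaps go to the nearest CHV.
    """
    assignment = {}
    for v in cmvs:
        eligible = [
            c for c in chvs
            if same_direction(c, v) == 1 and distance(c, v) <= comm_range[c.vid]
        ]
        nearest = min(eligible, key=lambda c: distance(c, v)) if eligible else None
        assignment[v.vid] = nearest.vid if nearest else None
    return assignment
```

A CMV that sits in the overlap of two clusters is assigned to the nearer cluster head, while vehicles heading the other way or out of range remain unassigned.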

Trust Evidence Collection.
Trust evidence is the basis of decision tree learning: it is used to train the tree's structure and test the tree's accuracy, and is therefore crucial to the performance of MTRF. We collect evidence from three aspects, vehicle-based, data-based, and link-based, to assess the credibility of IoV comprehensively. However, the raw data collected by sensors contain missing values, outliers, and obsolete or redundant fields. To ensure the accuracy of the RF-based trust management mechanism, we must preprocess the trust evidence before training. In our proposed scheme, missing and default values in each trust evidence field are replaced by the field mean. Through Equations (5) to (9), we also normalize all the indicators and ensure that they increase monotonically.
3.2.1. Vehicle-Based Trust Evidence. We consider three types of attacks by malicious vehicles: generating fake messages, tampering with received messages, and deliberately concealing messages about actual accidents. Vehicles receive multiple messages about the same event from other CMVs and decide whether to forward them. The number of messages and the number of correct messages forwarded by a vehicle reflect its identity. Two parameters, $TEV_1^i$ and $TEV_2^i$, are proposed to represent the degree of selfishness and honesty of vehicle i [16].
where H is the number of neighbor vehicles that send messages about accident m, $N_{send}^i(m)$ is the number of messages about accident m that vehicle i sends to its neighbor vehicles, and M is the number of accidents that occurred on the roads. $\sum_{m=1}^{M} N_{honest}^i(m)$ is the total number of honest messages that vehicle i sends to its neighbors, and $\sum_{m=1}^{M} N_{send}^i(m)$ is the total number of messages that vehicle i sends.
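As a small illustration of the preprocessing step, the sketch below replaces missing entries by the mean of the observed values in the same field. The function name `impute_with_field_mean`, the use of `None` as the missing-value marker, and the fallback of 0.0 for an entirely empty field are assumptions for the example; the paper does not specify these details.

```python
def impute_with_field_mean(records, missing=None):
    """Replace missing entries in each field (column) by that field's observed mean.

    `records` is a list of equally long rows; a field with no observed
    value at all falls back to 0.0 (an arbitrary choice for this sketch).
    """
    n_fields = len(records[0])
    means = []
    for j in range(n_fields):
        observed = [row[j] for row in records if row[j] is not missing]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is missing else row[j] for j in range(n_fields)]
        for row in records
    ]
```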
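The honesty indicator can be illustrated as the ratio of the two sums defined above, honest messages over all messages sent across the M accidents. The sketch below assumes exactly that ratio; the function name `honesty_ratio` and the convention of returning 0.0 for a vehicle that has sent nothing are choices made for the example and are not taken from the paper.

```python
def honesty_ratio(honest_per_accident, sent_per_accident):
    """Ratio of honest messages to all messages sent, summed over accidents.

    Both arguments are lists indexed by accident m = 1..M.
    """
    total_sent = sum(sent_per_accident)
    if total_sent == 0:
        return 0.0  # assumption: no evidence is treated as lowest honesty
    return sum(honest_per_accident) / total_sent
```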

3.2.2. Data-Based Trust Evidence. Data-based trust evaluation uses the message itself to measure the vehicle's reliability. CHVs collect messages about each accident from different vehicles. Based on spatial-temporal correlation, the quality of a piece of information can be measured by its relevance to other information about the same accident. According to [25], we assume that the data obey a normal distribution. The deviation between a data item and the average value reflects the reliability of the data: the closer the data item is to the mean value, the more reliable it is. $TEV_3$ is defined to evaluate the trust degree of messages.
where $v_i^m$ is the value of the message transmitted by $v_i$.
3.2.3. Link-Based Trust Evidence. The link quality influences the accuracy of message transmissions among vehicles. In attack patterns such as Man-in-the-Middle attacks [26], attackers intercept normal network traffic by attacking communication links and then tamper with or sniff the data. We measure the link quality from two aspects: link transmission delay and usage frequency [19].
where $TEV_4^i$ and $TEV_5^i$ represent the link transmission delay and the link usage frequency, respectively, $l_{delay}(n_i)$ is the link transmission delay between vehicle $n_i$ and its neighbor $n_j$, and $l_{use}(n_i)$ is the link usage of $n_i$.
After completing the trust evidence collection, we have five continuous variables. To further shorten the MTRF execution time and better meet the delay-sensitivity requirements of IoV, we apply fuzzification to $TEV_1^i$ through $TEV_5^i$. Each value is converted into a two-category variable {Low, High} based on fuzzy rules to reduce the computational complexity and latency. If the value is less than the threshold, it is labeled Low; otherwise, it is labeled High. Vehicles can then use the discretized trust evidence for trust degree classification.
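The threshold rule just described amounts to a one-line mapping. The sketch below is illustrative only: the dictionary layout, the key names, and the threshold values in the usage example are hypothetical, since the paper does not list the thresholds here.

```python
def fuzzify(evidence, thresholds):
    """Map each continuous trust evidence value to "Low" or "High".

    A value strictly below its threshold is "Low"; otherwise it is "High".
    """
    return {
        name: ("Low" if value < thresholds[name] else "High")
        for name, value in evidence.items()
    }
```

For example, `fuzzify({"TEV1": 0.2, "TEV2": 0.9}, {"TEV1": 0.5, "TEV2": 0.5})` labels the first indicator Low and the second High.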

Trust Evaluation Based on Random Forest.
We adopt the Random Forest algorithm to evaluate vehicle reliability. Random Forest is an ensemble learning algorithm that uses bootstrap sampling to extract a random sample set from the original sample set and construct a single decision tree [27]. At each node of a decision tree, the splitting attribute is selected from a random feature subspace. Finally, the decision trees are combined to generate the final classification result through majority voting (bagging). RF combines multiple deep decision trees that are trained on different parts of the training set and addresses overfitting by reducing variance rather than by pruning. The details of RF are shown in Algorithm 1.
The CART algorithm is selected to generate trees because it uses the Gini impurity metric to minimize classification error. S represents the training set of size N, which consists of class-labeled tuples. F represents the attribute set, of size five. Y contains the two target classes {High, Low} and represents the trust degree of vehicles.
As previously described, vehicles have been grouped into several temporary clusters, and the formulas are shown in Equations (10) to (13). Let P denote the total number of clusters, and let Num describe the number of vehicles in the different clusters. In the w-th cluster, each vehicle trains a CART tree, so the RF scale of cluster w is $num_w$. $T^w$ represents the set of decision trees in the w-th cluster. $S_1^w$ and $F_1^w$ are the training data extracted from S and the attribute set for tree $T_1^w$, respectively.
$S_1^w = \{s_{11}^w, s_{12}^w, \cdots, s_{1N}^w\}$
Note that a sample from S may be extracted multiple times because the N samples are randomly selected from the training set S with replacement. Each tree therefore has a different training set, and a few identical samples may appear in $S_1^w$. m is determined by the size of the attribute set F. The training delay is not considered in our proposal because the classifiers are trained offline.
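To make bootstrapping, random feature selection, Gini-based splitting, and majority voting concrete, the sketch below builds a tiny forest over fuzzified {Low, High} evidence. It is a deliberately simplified illustration, not MTRF itself: each tree is a depth-1 CART split (a decision stump) instead of a full tree, and the function names, the `seed` parameter, and the dictionary sample format are assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(samples, labels, features):
    """Choose the feature whose Low/High split has the lowest weighted Gini."""
    best = None
    for f in features:
        low = [y for s, y in zip(samples, labels) if s[f] == "Low"]
        high = [y for s, y in zip(samples, labels) if s[f] == "High"]
        score = (len(low) * gini(low) + len(high) * gini(high)) / len(labels)
        if best is None or score < best[0]:
            fallback = majority(labels)  # used when one branch is empty
            best = (score, f,
                    majority(low) if low else fallback,
                    majority(high) if high else fallback)
    _, feature, low_label, high_label = best
    return feature, low_label, high_label


def train_forest(samples, labels, n_trees, m, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and m random features."""
    rng = random.Random(seed)
    all_features = sorted(samples[0])
    n = len(samples)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # N draws with replacement
        feats = rng.sample(all_features, m)         # random feature subspace
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, sample):
    """Bagging: majority vote over the individual tree decisions."""
    votes = [low if sample[f] == "Low" else high for f, low, high in forest]
    return majority(votes)
```

In the setting described above, each vehicle of a cluster would contribute one tree, so `n_trees` plays the role of $num_w$.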
As demonstrated previously in Figure 3, once $V_x$ discovers an accident e occurring on the road, it transmits message e to the other vehicles in the same cluster. Then, the CMVs evaluate the trust degree of $V_x$, and the CHV consolidates their decision results into $RES_x^e$.
In our proposed algorithm, as shown in Figure 5, a sliding time window is used to store the trust characteristics of the vehicle nodes. The trust value calculation process consists of two parts: majority voting and trust accumulation.
As mentioned earlier, after the CHV consolidates the decision results of the vehicles in the cluster to obtain $RES_x^e$ and uploads it to the RSU, the RSU executes the majority voting process to obtain the classification result $VR_x^t$ of vehicle x at the current moment. In the subsequent trust accumulation, the sliding time window is used to perform a weighted summation of the trust values from time $t - (h - 1)$ to $t$, which yields the final trust value of $V_x$ according to Equation (15).
In Equation (15), $N_H^t$ and $N_L^t$ are the numbers of High and Low classification results in VR, respectively, and $\delta$ is a positive integer. The value of $\sqrt[\delta]{N_L^t}$ rises sharply with the number of Low labels, reflecting the strict punishment of malicious vehicles. Figure 6 illustrates the effect of Equation (15), where the horizontal axes are $\delta$ and the number of results classified as Low, and the vertical axis represents the trust value of the vehicle. Compared with the traditional linear relationship, the penalty factor $\delta$ makes the trust of a vehicle decline rapidly once malicious ratings appear, reflecting the system's strict punishment of malicious vehicles. At the same time, the penalty factor cannot be chosen too small because the system needs to tolerate decision failures caused by the randomness of the RF training process.
For MTRF, the computational complexity is dominated by the RF decision process. Benefiting from the dynamic clustering mechanism, the evaluation of the trust degree of observed vehicles is restricted to a single cluster, which is uniformly managed by its CHV. CMVs outside the cluster need not participate in the decision-making process, resulting in a lower computational complexity of $O(\rho^2 N_{total}^2 \log(\rho N_{total}) \sqrt{M})$, where $\rho \ll 1$ is the proportion of vehicles in the cluster to the total number of vehicles in the environment.
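The two-stage process of majority voting followed by sliding-window accumulation can be sketched as follows. This is only a structural illustration under stated assumptions: the class name `TrustWindow`, the encoding High = 1 and Low = 0, and the linear recency weights are hypothetical, and the penalty term of Equation (15) is deliberately not modeled because its exact form is not reproduced in this text.

```python
from collections import Counter, deque


class TrustWindow:
    """Keeps the last h per-slot classification results of one vehicle."""

    def __init__(self, h):
        # deque(maxlen=h) drops the oldest slot once the window is full
        self.results = deque(maxlen=h)

    def update(self, decisions):
        """Majority-vote the cluster's decisions for this slot and store it."""
        vote = Counter(decisions).most_common(1)[0][0]
        self.results.append(1.0 if vote == "High" else 0.0)
        return vote

    def trust_value(self):
        """Recency-weighted mean of the stored results (newest weighs most)."""
        if not self.results:
            return 0.0
        weights = range(1, len(self.results) + 1)
        return sum(w * r for w, r in zip(weights, self.results)) / sum(weights)
```

With a window of length 3, a fourth update pushes the oldest result out, so old good behavior gradually stops protecting a vehicle that has started to misbehave.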

The Lightweight Cryptography Algorithm
With the help of MTRF, attacks by malicious vehicles that broadcast fake information among CMVs can be resisted effectively. However, this is not enough for a complex communication network such as IoV. IoV communication is carried out over an open wireless channel, where numerous types of adversarial behavior exist. The communication processes between CMVs, CHVs, and RSUs lack protection mechanisms. Once attackers compromise these communication processes, information such as $res_{x,1}^e$ and $RES_x^e$ is directly disclosed, which poses a severe threat to the effectiveness of MTRF.
Input: Training set $S = \{s_1, s_2, \cdots, s_N\}$ with targets $Y = \{y_1, y_2, \cdots, y_N\}$; attribute set $F = \{f_1, f_2, f_3, \cdots, f_M\}$
Output: A random forest $(S, Y, K)$
For k = 1 → K do
    Bootstrapping: For the k-th tree, N samples are randomly extracted from the training set S with replacement to constitute a new training set $S_k$ with targets $Y_k$. Specify a constant m that is far less than M; m features are randomly selected from the attribute set F as a new attribute set $F_k$.
    Training: Use $S_k$, $Y_k$, and $F_k$ to perform the CART procedure and train a classification tree $T_k$.
End for
Bagging: The final classification is performed by majority vote over the decision results generated by the K trees.
Algorithm 1: Random forest algorithm.

Considering the limited computing and storage capacities of vehicles and the strict delay requirements of IoV, we propose a lightweight cryptography mechanism. The notations used are shown in Table 1.

Elliptic Curve Cryptography (ECC) is an asymmetric encryption algorithm based on the mathematical theory of elliptic curves. Compared with RSA, ECC achieves comparable or even higher security with shorter keys [28]. As shown in Table 1, $F_p$ represents the finite field of a large prime number p, and $E_p(a, b)$ is an elliptic curve defined by the homogeneous Equation (16), where x, y, a, and b belong to $F_p$ and satisfy Equation (17) [29]. $G(x_1, y_1)$ is assigned as the base point of $E_p(a, b)$, whose order n is a large number. The random number $N_*$ is less than n.
The communication channels exposed to the environment are vulnerable to malicious attacks. We consider only two communication processes: between CMV and CHV, and between CHV and RSU. In addition, we assume that each type of road entity knows its identifier ID and private key PRK and generates its own public key PUK as PRK × G, which is shared in the communication network. The proposed cryptography mechanism is shown in Figure 7. If a CMV $V_i$ makes a decision about the vehicle under observation and wants to transmit $res_i^e$ to the corresponding CHV, it first processes $res_i^e$ with the hash function and its own $ID_{V_i}$ to generate $Msg1_i^e$ according to Equation (18) and then selects a random number $N_{V_i}$ to encrypt it for transmission.
After the CHV receives the information from $V_i$, it restores $Msg1_i^{e\prime}$ based on its $PRK_{CHV}$ and then calculates $h(res_i^{e\prime} \| ID_{V_i})$. If $h(res_i^{e\prime} \| ID_{V_i}) = h(res_i^e \| ID_{V_i})$, $res_i^e$ is regarded as a complete and legal message. The CHV then generates $VR_x^t$ and $Msg2_x^e$ and transmits $\{N_{CHV} G, Msg2_x^e \oplus N_{V_i} PUK_{RSU}\}$ to the RSU. The RSU performs the same steps to verify the received $VR_x^t$, calculates $TV_x^t$, and returns it to the CHV to ensure the accuracy of $TV_x^t$.
We take the CHV-RSU communication process as an example to verify the mechanism's effectiveness. The CHV independently chooses the random number $N_{CHV}$, which is unknown to other road entities. It is almost impossible for an attacker to recover $Msg2_x^e$ from $\{N_{CHV} G, Msg2_x^e \oplus N_{V_i} PUK_{RSU}\}$ when only the RSU public key $PUK_{RSU}$ is known. In addition, since the RSU's public key is used to encrypt the information, the RSU can use its private key to restore $Msg2_x^e$ based on Equation (19). To restore $Msg2_x^e$, an attacker must solve for $N_{CHV}$ given G and $N_{CHV} G$, which is considered computationally infeasible (the elliptic curve discrete logarithm problem).
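The masking idea behind this exchange, sending a random multiple of G together with the message XORed with a value only the receiver can rebuild from its private key, can be tried out on a toy curve. The sketch below is an illustration under explicit assumptions and is not the paper's Equations (18) and (19): it uses the tiny textbook curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point (5, 1) of order 19, which offers no security, and it turns the shared point into a byte mask by hashing its x-coordinate with SHA-256, a step the paper does not specify. All function names are hypothetical. It requires Python 3.8 or later for modular inverses via `pow`.

```python
import hashlib

P, A, B = 17, 2, 2   # toy curve y^2 = x^3 + 2x + 2 over F_17 (NOT secure)
G = (5, 1)           # base point of order 19 on this curve


def add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                      # p2 is the inverse of p1
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P   # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P    # chord slope
    x3 = (s * s - x1 - x2) % P
    return x3, (s * (x1 - x3) - y1) % P


def mul(k, point):
    """Scalar multiplication k * point using double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def mask(shared_point, length):
    """Derive up to 32 mask bytes from the shared point's x-coordinate."""
    return hashlib.sha256(str(shared_point[0]).encode()).digest()[:length]


def xor(data, key):
    return bytes(a ^ b for a, b in zip(data, key))


def encrypt(msg, nonce, receiver_pub):
    """Return (nonce * G, msg XOR mask(nonce * PUK)); msg of at most 32 bytes."""
    return mul(nonce, G), xor(msg, mask(mul(nonce, receiver_pub), len(msg)))


def decrypt(cipher, receiver_priv):
    """Rebuild the mask as PRK * (nonce * G) and remove it."""
    hint, body = cipher
    return xor(body, mask(mul(receiver_priv, hint), len(body)))
```

Decryption works because PRK × (N × G) and N × (PRK × G) are the same curve point, so sender and receiver derive an identical mask without the nonce ever being transmitted.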

The Trust Sharing Algorithm
At present, most researchers use a central system controller to share the trust values of vehicles across clusters and adopt the soft handoff method. However, for a network with strong mobility and high delay sensitivity such as IoV, this approach has the following three weaknesses:
(1) When a vehicle requires cluster switching, it must first establish a communication connection with the central controller and inform the target cluster. The trust value of the vehicle cannot be shared until the controller establishes communication with the target cluster, resulting in high communication delay and signaling overhead.
(2) The location of the central controller is generally stationary, and the effectiveness of trust sharing drops markedly as the distance between the vehicle and the controller increases, which restricts the scalability of IoV.
(3) Soft handoff makes the vehicle maintain the communication connection with the historical CHV before joining the new cluster, which wastes the communication resources of historical CHVs.
In this paper, we combine trust information sharing with vehicle path prediction, so that trust information can be shared locally and purposefully. Recently, reinforcement learning (RL) has developed rapidly and shows good application prospects in path prediction. RL is a principled mathematical framework [30] in which an agent learns how to maximize the benefits of a sequential decision problem by interacting with the environment. Formally, RL can be described as a Markov decision process (MDP), given by a 5-tuple $(S, A, P, R, \gamma)$, where $S$ and $A$ are the state and action sets, respectively, $P$ represents the state transition probability $\Pr(s_{t+1} \mid s_t, a_t)$, and $R$ stands for the expected reward set. At each time slot $t$, an agent observes state $s_t$ and takes action $a_t$ to interact with the environment. After taking $a_t$, the agent moves to a new state $s_{t+1}$ and acquires a reward $r_t \in R$ based on the current state and the chosen action. The ultimate goal of the RL agent is to find a policy $\pi$ that maximizes the cumulative reward $E_{\pi}\left[\sum_{t=1}^{\infty} \gamma^{t-1} r_t\right]$, where $\gamma \in [0, 1]$ is a discount factor.
Figure 7: The cryptography mechanism based on ECC and hash function.

Q-learning is a widely used model-free RL algorithm that aims to find the Q-function of each state-action pair for the given policy, which is defined as
$$Q^{\pi}(s_t, a_t) = E_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t, a_t\right],$$
where $Q^{\pi}(s_t, a_t)$ represents the cumulative reward when taking action $a_t$ in state $s_t$ under the policy $\pi$. Q-learning updates the value function with the temporal-difference formula
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\right], \quad (22)$$
where $\alpha$ is the learning rate. However, the traditional Q-learning algorithm learns the optimal policy by establishing and updating a Q-table, which limits the scalability of RL and its ability to solve high-dimension problems. Deep Q-network (DQN), which combines deep learning and Q-learning, is proposed to address these problems by using deep neural networks to approximate the values of the Q-table. The architecture of DQN is shown in the accompanying figure, and with DQN, Equation (22) becomes
$$Q(s_t, a_t; \theta) \leftarrow Q(s_t, a_t; \theta) + \alpha\left[r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta^{-}) - Q(s_t, a_t; \theta)\right],$$

Wireless Communications and Mobile Computing
where $Q(s_t, a_t; \theta)$ and $Q(s_{t+1}, a_{t+1}; \theta^{-})$ are the evaluation and target networks, respectively, with different weights $\theta$ and $\theta^{-}$; keeping a separate target network improves the training stability of DQN. It should be pointed out that the weight $\theta^{-}$ of the target network is synchronized with $\theta$ periodically. We then use the mean-square error to define the loss function. The network is trained by minimizing the loss, and finally $Q(s_t, a_t)$ is estimated.
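The roles of the evaluation weights $\theta$, the target weights $\theta^{-}$, and the mean-square loss can be made concrete with a short sketch. To stay dependency-free it assumes a linear function approximator (one weight vector per action) in place of the deep network, so it illustrates the update rule rather than the paper's architecture; all function names are hypothetical.

```python
def q_values(theta, state):
    """Linear stand-in for the network: one weight vector per action."""
    return [sum(w * x for w, x in zip(weights, state)) for weights in theta]


def td_target(reward, next_state, theta_target, gamma, done):
    """r_t + gamma * max_a Q(s_{t+1}, a; theta^-), or just r_t at episode end."""
    if done:
        return reward
    return reward + gamma * max(q_values(theta_target, next_state))


def train_step(theta, theta_target, batch, gamma=0.9, lr=0.1):
    """One gradient step on the mean-square TD error; returns the batch loss."""
    grads = [[0.0] * len(weights) for weights in theta]
    loss = 0.0
    for state, action, reward, next_state, done in batch:
        target = td_target(reward, next_state, theta_target, gamma, done)
        error = q_values(theta, state)[action] - target
        loss += error ** 2
        # Only the weights of the chosen action receive a gradient.
        for j, x in enumerate(state):
            grads[action][j] += 2.0 * error * x
    n = len(batch)
    for a in range(len(theta)):
        for j in range(len(theta[a])):
            theta[a][j] -= lr * grads[a][j] / n
    return loss / n


def sync_target(theta):
    """Periodic synchronization: copy theta into theta^-."""
    return [list(weights) for weights in theta]
```

In a training loop, `train_step` would be called on mini-batches of stored transitions, with `theta_target = sync_target(theta)` executed every fixed number of steps; the synchronization period is a tunable choice that this sketch does not fix.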
The formulation of the path prediction mechanism is divided into three parts: environment observation, action space design, and reward design.

Environment Observation.
Environment observation is the input of the neural network. How closely the observation design matches the real environment directly affects the usability of the prediction results, so the design of the observation must accurately capture the characteristics of the application scenario. For IoV, we take driving safety and driving efficiency as the focus when simulating the environment. The input of DQN is an RGB pixel image, shown in Figure 8, consisting of the origin, the terminus, and the current point. To better simulate road conditions, obstacles, traffic flow, and accident points are also set.

Action Space.
In the traffic path prediction algorithm, there are four types of actions in the action space $A$: $A = \{up, down, left, right\}$. Each vehicle is an agent, which comprehensively considers the vehicle's current location and the surrounding environment. Vehicles choose different actions to interact with the environment and learn the best policy.

Reward Design.
The core of RL is to learn unfamiliar scenes through interaction with the environment and thereby obtain behavioral strategies that meet the set goals. In this process, the reward is the only feedback that an agent can obtain from the environment [31]. Rewards directly affect whether an agent can learn toward the desired goal and determine the model's effectiveness. Therefore, the design of rewards must fully reflect the expectation. For the sake of driving safety and efficiency, the reward design of this mechanism focuses on four aspects: avoiding sections where traffic accidents are happening, avoiding sections with high vehicle density, avoiding obstacles, and reaching the destination. The reward function is defined in terms of four reward values, $r_{barrier}$, $r_{accident}$, $r_{flow}$, and $r_{reach}$, one for each aspect, where $flow_{threshold}$ is the maximum traffic flow that still allows the normal driving speed of vehicles. According to the priority of the different targets, $r_{barrier} < r_{accident} < r_{flow} < 0 < r_{reach}$.
Based on MTRF, we adopt the cryptography mechanism to prevent malicious attackers from destroying the communication connection between CHVs, CMVs, and RSUs, which effectively protects vehicle privacy information and decision results from being disclosed. After going through the trust management mechanism, the ultimate trust values of vehicles are stored in RSUs and propagated among RSUs according to the path prediction results. Considering that RSUs are mainly deployed by the government and have good authority and security, we assume that RSUs cannot be compromised by attackers. Under this assumption, the private information of the vehicle is unlikely to be disclosed.
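Returning to the reward design, the ordering $r_{barrier} < r_{accident} < r_{flow} < 0 < r_{reach}$ and the four actions can be sketched as a small grid environment step. The numeric reward values, the threshold of 20, the zero reward for an ordinary road cell, and the choice to treat leaving the map like hitting an obstacle are all assumptions made for illustration; the source fixes only the ordering.

```python
from dataclasses import dataclass

# Hypothetical values; only the ordering is taken from the text.
R_BARRIER, R_ACCIDENT, R_FLOW, R_REACH = -10.0, -5.0, -1.0, 10.0
R_PLAIN = 0.0          # assumption: an ordinary road cell yields no reward
FLOW_THRESHOLD = 20    # hypothetical flow_threshold

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass(frozen=True)
class Cell:
    obstacle: bool = False
    accident: bool = False
    flow: int = 0


def reward(cell, is_destination, flow_threshold=FLOW_THRESHOLD):
    """Map the cell the agent tries to enter to one of the reward values."""
    if cell.obstacle:
        return R_BARRIER
    if is_destination:
        return R_REACH
    if cell.accident:
        return R_ACCIDENT
    if cell.flow > flow_threshold:
        return R_FLOW
    return R_PLAIN


def step(grid, pos, action, destination):
    """Apply one action; return (new_position, reward)."""
    d_row, d_col = ACTIONS[action]
    row, col = pos[0] + d_row, pos[1] + d_col
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        return pos, R_BARRIER      # assumption: off-map moves count as a barrier
    cell = grid[row][col]
    if cell.obstacle:
        return pos, R_BARRIER      # the agent stays where it was
    return (row, col), reward(cell, (row, col) == destination)
```

With such a step function, each transition `(state, action, reward, next_state)` can be fed to the learning update, and the large negative barrier reward makes collisions the most strongly discouraged outcome.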

Implementation and Performance Analysis
The simulation process consists of two parts, MTRF and DQN-based path prediction, and the parameters are shown in Table 2. In the MTRF training process, each vehicle draws its training samples by random selection. The samples that are not involved in training a given vehicle's CART tree are called the out-of-bag samples. The system's accuracy can be evaluated by using RF to classify these out-of-bag samples [32]. We use the parameter accuracy to measure the performance of MTRF,
$$Accuracy = 1 - OOB = \frac{N_{out-correct}}{N_{out}},$$
where $OOB$ represents the out-of-bag error, $N_{out-correct}$ is the number of correctly classified out-of-bag samples, and $N_{out}$ is the total number of out-of-bag samples. To make the results more accurate and convincing, the decision-making process of each vehicle is repeated 500 times to calculate the comprehensive value.
Figure 9 illustrates the accuracy and time consumption as the proportion of randomly extracted features varies. As the proportion increases, the accuracy of MTRF first increases rapidly and then gradually stabilizes, accompanied by a rapid increase in time consumption. This is because too small a feature extraction rate results in incomplete growth of the decision trees and an inability to accurately determine vehicle identity, whereas an excessive feature extraction rate results in fully grown decision trees, and a forest composed of such trees cannot reflect the advantage of RF. Time consumption continues to increase as the complexity of the decision tree structure increases. Weighing accuracy against time consumption, we set the proportion of extracted features to 0.3, that is, two features are selected, at which both the accuracy and the efficiency of the system are quite satisfactory. It should be noted that $feature\ extraction\ rate \times M$ is generally not an integer, so a rounding operation is adopted in the experiment. Neighboring rates therefore yield the same number of extracted features, which causes the slow growth of the time-consumption curve at the early stage.
Figure 10 illustrates the accuracy of MTRF as the number of vehicles in a cluster gradually increases from 10 to 30. The proportion of malicious vehicles remains 30%, and the probability of malicious vehicles attacking remains unchanged at 50%. As the number of vehicles in the cluster increases, the system's overall accuracy decreases slightly.
When the number of vehicles is 11, the accuracy reaches its maximum of 98.0879%, and when the number of vehicles is 29, the accuracy reaches its minimum of 91.4176%. The overall system accuracy remains above 90% with no apparent downward trend, which verifies that the proposed system can identify malicious vehicles, thereby improving the reliability of IoV. The random selection of samples and features during RF training results in slight fluctuations in the accuracy of MTRF, but within a reasonable range.
Figures 11 and 12 illustrate the accuracy of MTRF as the number of malicious vehicles in a cluster and the probability of launching a malicious attack vary. Figure 11 shows the performance under different numbers of malicious vehicles in a cluster. The four broken lines represent four different attack probabilities. When the attack probability is lower than 0.3, the accuracy of MTRF for malicious vehicle identification remains around 90% and does not change significantly as the number of malicious vehicles in the cluster increases. When the attack probability is higher than 0.5, the accuracy of MTRF decreases to a certain extent as the number of malicious vehicles increases, but MTRF still maintains an accuracy of 77.8% until half of the vehicles in the cluster are malicious.
Figure 12 shows the performance under different probabilities of launching attacks. The six broken lines represent six different numbers of malicious vehicles in a cluster. When the attack probability of malicious vehicles is 0.1, the accuracy can be maintained above 92% even if the number of malicious vehicles in the cluster increases to 27. When the number of malicious vehicles in the cluster is less than 9, MTRF is almost unaffected by the attack probability of malicious vehicles. Even if all attackers in the cluster launch attacks, MTRF can still maintain an accuracy above 83%.
When the number of malicious vehicles is greater than 15, the accuracy of MTRF decreases as the probability of a malicious vehicle launching an attack increases, but the accuracy remains around 70% until the probability is 0.5.
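The accuracy values reported above are out-of-bag estimates. The sketch below shows how such an estimate can be computed, under simplifying assumptions: each tree is reduced to a single-split decision stump instead of a full CART tree, the random feature subset is drawn once per tree, and the number of extracted features is obtained by rounding `feature_ratio * M`. The function names and parameters are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def oob_accuracy(rows, labels, n_trees=15, feature_ratio=0.3, seed=0):
    """Accuracy = N_out-correct / N_out over samples with at least one OOB vote."""
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    k = max(1, round(feature_ratio * m))           # rounded number of features
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        features = rng.sample(range(m), k)          # random feature subset
        tree = train_stump([rows[i] for i in bag],
                           [labels[i] for i in bag], features)
        for i in set(range(n)) - set(bag):          # samples this tree never saw
            votes[i][tree(rows[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    correct = sum(votes[i].most_common(1)[0][0] == labels[i] for i in scored)
    return correct / len(scored)
```

Because every sample is judged only by trees that did not train on it, the estimate needs no separate validation set, which suits a setting where each vehicle has a limited record of trust evidence.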
The above results show that MTRF performs well. The RSU consolidates the results of all vehicles through the voting process, which is why the accuracy is dramatically reduced only when both the number of malicious vehicles and the probability of launching attacks remain high. Malicious vehicles randomly choose whether to reverse the decision result according to the attack probability. When the proportion of malicious vehicles is less than 50%, it is difficult for them to affect the system classification results even if they launch attacks. When the proportion of malicious vehicles exceeds 50% but the probability of attack remains at a medium or low level, the probability that many vehicles attack simultaneously is low, so the system can effectively resist attacks and the system accuracy is maintained at a high level.
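The voting argument above can be checked with a small Monte Carlo sketch. It assumes a binary decision, a majority vote at the RSU, and an honest per-vehicle accuracy of 0.95; these values and the function name `vote_accuracy` are hypothetical and are not taken from the paper's simulation settings.

```python
import random


def vote_accuracy(n_vehicles, n_malicious, attack_prob,
                  honest_acc=0.95, trials=2000, seed=0):
    """Fraction of trials in which the majority vote matches the ground truth."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        truth = rng.choice([0, 1])
        ones = 0
        for v in range(n_vehicles):
            # Every vehicle first forms its own (imperfect) decision.
            decision = truth if rng.random() < honest_acc else 1 - truth
            # A malicious vehicle reverses its decision with probability attack_prob.
            if v < n_malicious and rng.random() < attack_prob:
                decision = 1 - decision
            ones += decision
        majority = 1 if 2 * ones > n_vehicles else 0
        correct += (majority == truth)
    return correct / trials
```

Using an odd `n_vehicles` avoids ties. Under these assumptions, a call such as `vote_accuracy(21, 6, 0.5)` stays close to 1, whereas accuracy collapses only when most vehicles are malicious and attack with high probability, which matches the qualitative trend described for Figures 11 and 12.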
We also compare the performance of MTRF with that of TECU proposed in [19] in the case of 30% malicious vehicles. As shown in Figure 13, under the same number of malicious vehicles in a cluster and three different probabilities of malicious attack, the accuracy of MTRF is significantly better than that of TECU. In [19], the authors only realize the identification of malicious nodes but do not consider the attack behavior of malicious nodes. Once a node launches an attack, the system performance declines sharply. In particular, when the probability of malicious attack is 0.5, the accuracy of TECU in identifying malicious nodes is less than 80%. For MTRF, we use RF to make the vehicles in the cluster jointly identify malicious vehicles, which weakens the attack effect of malicious nodes and keeps the classification accuracy of MTRF above 90% even in a hostile environment. In addition, with the help of encryption algorithms, we further improve MTRF's ability to resist malicious vehicle attacks, thus maintaining superior performance.
The next part is the simulation of the path prediction mechanism based on DQN. We use a 25 × 25 pixel image to simulate the traffic environment. As shown in Figure 14(a), the environment is composed of 25 intersections and 20 T-shaped intersections. Pixels have six corresponding RGB values. Black represents clear roads, white represents obstacles, blue and green represent the current location and destination of the vehicle, and red represents accidents. The other three colors represent the degree of congestion on the road, quantified by the traffic volume, and the degree of congestion increases as the color darkens.
First of all, we simulate the path prediction with good traffic conditions. There are two traffic accidents on the road, four sections with light congestion, four sections with moderate congestion, and three sections with severe congestion. The simulation results show that the vehicle, acting as an agent, finds a path in the network that avoids the above nodes and reaches the destination safely. It is worth mentioning that there is more than one optimal path because the road conditions are relatively simple. We then worsened the traffic situation to six accident nodes, 15 lightly congested road sections, ten moderately congested road sections, and eight severely congested road sections. As shown in Figure 14(c), there is no perfect path in the current network, and the vehicle must experience congestion. However, with our proposed algorithm, the vehicle chooses a path that passes through only one lightly congested section.
Figure 15 shows the average loss during the training process in the above two situations. Both scenarios converge quickly, and the learned network behaves as desired. As the complexity of the scene increases, the convergence speed does not change significantly, which indicates good generalization and the ability to cope with IoV scenarios.
Finally, to illustrate the superiority of the proposed path prediction mechanism more convincingly, we compared the time consumed by the DQN-based path planning algorithm and the traditional A*-based algorithm under the same environmental conditions; the results are shown in Figure 16. When the scale of the IoV network is small, the difference in time consumption between the two algorithms is not apparent. As the network topology scale increases, the delay and the delay growth rate of the A*-based algorithm are much higher than those of our proposed algorithm, which demonstrates the superiority of our algorithm in terms of timeliness.
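For reference, a generic A* search on a four-connected grid is sketched below. The paper does not specify the configuration of its A* baseline, so the unit step cost, the Manhattan-distance heuristic, and the function name `a_star` are assumptions; congestion and accident costs are omitted.

```python
import heapq


def a_star(grid, start, goal):
    """Shortest path on a grid where grid[r][c] == 1 marks an obstacle.

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a four-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry, a cheaper route was found already
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (current[0] + d_row, current[1] + d_col)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

A* must run a fresh search over the grid for every query, so its cost grows with the map size, whereas a trained DQN answers a query with one forward pass per step; this is one plausible reading of the widening gap reported for larger topologies.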

Conclusion
This paper proposed an efficient RF-based trust management mechanism, MTRF, tailored to urban scenarios in IoV. We also proposed a trust-sharing mechanism based on path prediction using Deep Q-network. Through simulation, we demonstrated the performance of our trust management scheme under different situations. In addition, we simulated the path prediction algorithm under different IoV network topology scales and different traffic conditions to verify that the proposed mechanism achieves good results and has good convergence performance. Compared with the traditional A*-based algorithm, the proposed algorithm also shows better generalization and lower time consumption. For the trust management mechanism, we mainly considered two types of attack modes and did not consider other attack modes such as the Sybil attack, which limits the defense capability of MTRF. In addition, the simulation of the urban road network environment took regular intersections as the basic unit, which simplified the complexity of roads to a certain extent. These limitations should be addressed in future research.

Data Availability
The trust evidence data used to support the findings of this study came from scenario simulation and reasonable assumptions, which are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.