Trustworthy Event-Information Dissemination in Vehicular Ad Hoc Networks

In vehicular networks, the trustworthiness of exchanged messages is critical, since a fake message can cause catastrophic accidents on the road. In this paper, we propose a new scheme to disseminate trustworthy event information while mitigating message modification attacks and fake message generation attacks. Our scheme attempts to suppress these attacks by exchanging the trust level information of adjacent vehicles and using a two-step procedure. In the first step, each vehicle attempts to determine the trust level, referred to as the truth-telling probability, of adjacent vehicles. The truth-telling probability is estimated based on the average of the opinions of adjacent vehicles, and we apply a new clustering technique to mitigate the effect of malicious vehicles on this estimation by removing their opinions as outliers. Once the truth-telling probability is determined, the trustworthiness of a given message is determined in the second step by applying a modified threshold random walk (TRW) to the opinions of the majority group obtained in the first step. We compare our scheme with other schemes using simulation for several scenarios. The simulation results show that our proposed scheme has a low false decision probability and can efficiently disseminate trustworthy event information to neighboring vehicles in VANET.


Introduction
Vehicular networks are expected to be used for traffic control, accident avoidance, parking management, and so on [1]. Communication security between vehicles needs to be addressed carefully due to the safety requirements of vehicular network applications [2]. There is a lot of ongoing research on security topics, which aims to provide secure communications and verification of data to thwart malicious attackers. One of the major issues in vehicular ad hoc networks (VANETs) is message trust, which can be used to secure VANET communications. It is essential to periodically evaluate the trustworthiness of event information based on trust metrics. Generally, trust computation in a static network is relatively simple, because the trust level can be calculated based on the behavior of the nodes with sufficient observations [3]. However, message trust computation in VANET is challenging due to the ephemeral nature of the network topology.
The wireless access in vehicular environments (WAVE) protocol is based on the IEEE 802.11p standard and provides the basic radio standard for dedicated short range communication (DSRC) operating in the 5.9 GHz frequency band [4]. Vehicular communications can be achieved in the infrastructure domain for vehicle-to-infrastructure (V2I) communications or in the ad hoc domain for vehicle-to-vehicle (V2V) communications. We mainly focus on V2V communications because road side units (RSUs) [1] may not be available in some parts of the country during the initial stages of deployment of the vehicular communications infrastructure. Vehicles communicate with other vehicles through on-board units (OBUs), forming mobile ad hoc networks that allow communications in a completely distributed manner [5]. We note that some event information (e.g., accident reports) needs to be disseminated quickly and accurately, with minimum delay. Failure in the timely and accurate dissemination of such time-critical information might lead to collateral damage to neighboring vehicles.
Mobile Information Systems

Some of the issues in vehicular networks include simple routing problems and application-oriented problems such as Sybil attacks and false data dissemination [6]. Traditional reputation systems may not work efficiently in vehicular networks [7]. Public key infrastructure (PKI) may not be available everywhere during the initial stages of vehicular network deployment around a country, because some regions may not be covered due to deployment costs or budget issues. Generally, cryptography-based verification of message trustworthiness is computationally expensive. It can protect against some types of attacks from external nodes. However, it does not protect against malicious nodes inside the network, which already have the required cryptographic keys, and it may not be suitable for ephemeral V2V network communications. Our scheme uses neither cryptography nor centralized servers and thus does not have a single point of failure. Most VANET models assume that the system is up and running, where all vehicles have a certain trust score. However, it is not easy to know the trustworthiness of vehicles without having had any interaction with those vehicles. In highly distributed vehicular networks, vehicles can join and leave a network frequently [8,9]. When a new vehicle joins the network for the first time, there is no information about it. One of the challenges faced by VANET is that its trust model should consider the requirement for anonymity of vehicles. The trust model should have minimal overhead in terms of computational complexity as well as storage. The trust model should be robust to data-centric attacks and be able to detect those attacks [10][11][12]. VANET security frameworks should be lightweight, scalable, reliable, and secure.
Our proposed scheme investigates the trustworthiness of event information received from adjacent vehicles, which serves as multiple pieces of evidence. We use the truth-telling probability as a measure of the trustworthiness of a vehicle. The vehicles communicate through safety messages to report events, such as accident information, safety warnings, information on traffic jams, weather reports, and reports of ice on the road. In our proposed scheme, all vehicles are assumed to have a pseudo identity (PID), which is independent of the node identity. Each vehicle broadcasts an event message to adjacent vehicles from the time it collects information about that event. Every vehicle maintains the trust levels of its neighbors in a distributed manner to cope with the propagation of false information. We introduce an enhanced k-means clustering technique to minimize the effect of malicious nodes on the trust level calculation. We use a modified threshold random walk algorithm with a single threshold to make a final decision about the occurrence of an event, while supporting real-time decisions. We focus on determining the trustworthiness of the event information in received messages by weighting reports from neighboring vehicles differently according to their truth-telling probabilities.
The main contributions of our work can be summarized as follows: (i) Our proposed scheme contributes to the dissemination of trustworthy information, since each vehicle individually makes a decision on the trustworthiness of the information in a received message, while dropping packets containing fake information. (ii) Since all decisions are made based on the information received from neighboring vehicles, our proposed scheme can work in an infrastructure-less environment as well. (iii) Our proposed scheme can make a better decision on the trustworthiness of a given message than a simple voting mechanism, since the modified threshold random walk (TRW) gives higher weight to the opinion of a vehicle that makes more true statements than false statements.
The remainder of this paper is organized as follows. In Section 2, we discuss the related work. In Section 3, we propose a trustworthy event-information dissemination scheme for VANET. In Section 4, we evaluate the performance of the proposed scheme using simulation. Section 5 concludes the paper with future work.

Related Work
Several trust management systems have been proposed for VANET [13][14][15]. Trust management systems evaluate the trust values of neighboring nodes to prevent interactions with malicious nodes. The authors in [16] provide a quantitative and systematic review of existing trust management schemes for VANET. They address comprehensive trust model concepts, problems, and solutions related to VANET trust management. There are several works on trust management schemes based on infrastructure frameworks and cryptographic techniques. Trust management schemes can be divided into four categories based on the use of infrastructure and cryptographic measures such as public key infrastructure (PKI), as shown in Table 1. The first category represents trust management techniques based on infrastructure such as RSUs and PKI. In the second category, the nodes rely on infrastructure for trust management without using PKI. In the third category, each node handles the issue of message trustworthiness based on PKI without using any infrastructure. In the fourth category, the nodes are fully decentralized, operate in an infrastructure-less environment, and do not depend on PKI.
In the first category, trust management systems are based on infrastructure such as RSUs and PKI and can be effective in identifying malicious nodes with some accuracy [17][18][19][20][21][22][23]. However, trust management schemes in this category may not work if infrastructure is not available. Trust management based on PKI is computationally expensive and cannot secure VANET against insider attacks, where the malicious nodes have already acquired the cryptographic keys [5,23]. Some researchers used group signature (GS) techniques [17][18][19][20][21] to authenticate the message sender and guarantee message integrity, such as Identity-Based Group Signatures (IBGS) in [18], GSIS in [19], and Identity-based Batch Verification (IBV) in [20]. However, GS schemes are usually based on PKI, and message sender authentication cannot prevent legitimate nodes from sending malicious messages.
To overcome the limitations of existing approaches, some researchers have investigated trust management without using PKI in an infrastructure-less environment, which corresponds to the fourth category [14,15,[30][31][32][33][34]. Trust management schemes in this category, such as the Vehicle Ad Hoc Reputation System (VARS) [32], are more suitable for a distributed VANET architecture. Our proposed scheme also belongs to this category, since the decision on the trustworthiness of event information received from neighboring nodes is made in an infrastructure-less environment without using PKI.
The existing trust management systems are built on specific application domains, implementing different trust-based models to enhance intervehicular communication. The trust-based models can be classified into three main categories: entity-based, data-centric, and hybrid trust models [35]. An entity-based trust model deals with the trustworthiness of each node considering the opinions of its peer nodes [26][27][28][36]. In [24], the authors proposed a fuzzy approach for verifying the trustworthiness of nodes by using feedback from their neighbors. However, the trustworthiness of a message may not always agree with the trustworthiness of the node itself. Thus, this model cannot properly resolve the issue of message trustworthiness.
On the other hand, in a data-centric trust model, the trustworthiness of the events reported by neighboring vehicles is evaluated rather than the trust of the entities or nodes themselves [5,30,31,35]. In [5], the authors used a Bayesian inference decision module to evaluate the received event reports. However, the inference module uses a prior probability, which is not easy to obtain due to the dynamic topology of VANET. In [28], the authors proposed a trust model called Lightweight Self-Organized Trust (LSOT), which contains trust certificate-based and recommendation-based trust evaluations. However, it does not distinguish between the trust value of a node and that of the reported message. The trustworthiness of the nodes does not guarantee the trustworthiness of the message, as trustworthy nodes can send fake or faulty messages if attackers compromise them. In [30], the authors proposed the Real-time Message Content Validation (RMCV) scheme in an infrastructure-less mode. This scheme assigns a trust score to a received message based on three metrics, that is, message content similarity, content conflict, and message routing path similarity. The message trustworthiness is based on the maximum value of the final trust scores collected from the neighboring nodes. However, this scheme does not consider the high mobility of the vehicles, and its time complexity is high.
Hence, a hybrid trust model was introduced that combines the entity-based and data-centric trust models to evaluate the trustworthiness of a message [32][33][34]. The authors in [29] proposed a hybrid trust management mechanism called the Beacon-based Trust Management (BTM) system, which constructs entity trust from beacon messages and computes data trust by cross-checking the plausibility of event messages and beacon messages. However, their trust model is based on PKI and digital signatures, which incurs overhead while signing and authenticating each beacon message before broadcasting.
Thus, we attempt to overcome the limitations of the existing schemes by improving the hybrid trust model for message trustworthiness. As a first step, we use the initialization-step enhanced k-means clustering algorithm (IEKA) to cluster the vehicles into normal and malicious vehicle groups and to determine the trustworthiness of each neighboring node. As a second step, we use a modified threshold random walk (TRW) algorithm to decide the trustworthiness of a given message. Thus, our scheme is based on a hybrid trust model. Although RMCV is based on a data-centric trust model, it belongs to the fourth category, that is, trust management schemes that require neither PKI nor infrastructure, according to the classification in Table 1. Thus, we compare our proposed scheme with the RMCV scheme. The detailed comparison and performance evaluation are discussed in Section 4.

System Model
Each vehicle collects sufficient information to assess the validity and correctness of a message. The Notations section explains the parameters and variables used in this paper.
When an event occurs on the road, a vehicle near that event sends a safety event message, M_i, to neighboring vehicles. Let us suppose that vehicle v_j wants to know the true information about the event reported by vehicle v_i in Figure 1.
The vehicle v_j manages an information pair (p_i, θ_i) for each neighbor vehicle v_i, where p_i is the pseudo identity of the i-th neighbor vehicle and θ_i is the trust level, that is, the truth-telling probability, of vehicle v_i. We assume that the transportation authority preloads the pseudo IDs of the vehicles during vehicle registration, and they should be renewed periodically. To maintain privacy in VANET, the pseudonym should change over time to achieve unlinkability, which protects the vehicle from location tracking. Only privileged authorities are allowed to trace or resolve a pseudonym of a vehicle to a real identity under specific conditions [37]. The truth-telling probability (θ_i) is the ratio of the number of true event reports propagated by vehicle v_i to the total number of event reports sent by vehicle v_i over a specific time period.

Proposed Trustworthy Information Dissemination Scheme.
An outline of the proposed scheme to determine the trustworthiness of event information in a received message is shown in Algorithm 1. The vehicle parameters, such as the pseudo ID (PID) and the default trust level, are initialized at the beginning. All vehicles periodically broadcast beacon messages, along with status information such as speed and location, to neighboring vehicles. If no event is triggered, then the vehicles gather information from the neighboring vehicles. If a vehicle encounters an event by itself, then it broadcasts a safety message along with the trust levels of the neighboring vehicles that it knows. Each vehicle accumulates the trust levels of the neighboring vehicles based on the collected safety messages. The receiving vehicle creates a trust matrix based on the trust level opinions from the other vehicles. Thus, the trust matrix manages the trust levels of each neighboring vehicle. Sometimes, vehicles misbehave by sending false information due to selfish motives, such as getting easier and faster access to the road, or due to faults. To prevent such false information from corrupting the trust levels of legitimate vehicles, we use a clustering algorithm. Our proposed clustering algorithm attempts to separate the trust level opinions of normal vehicles from the trust level opinions of malicious vehicles. The vehicle then calculates the aggregated trust level of the adjacent vehicles belonging to the majority group of normal vehicles from the trust matrix and updates the trust matrix using the average of the trust levels. Then, a modified TRW is applied to determine whether the event has actually occurred. The modified TRW can provide a better decision on the trustworthiness of event information by giving higher weights to true event messages. After the trustworthiness of the event information has been verified, the event message is disseminated to other neighboring vehicles along with the updated trust levels. If the event information contained in the message turns out to be untrustworthy, then the message is dropped.
When new vehicles join the VANET, they are not likely to have enough information to infer the trust levels of neighboring vehicles at the beginning. We need a trust level bootstrapping procedure to assign a default trust level for this situation [38]. The trust level, that is, the truth-telling probability, ranges from 0 to 1. If vehicle v_j does not have any information on vehicle v_i, then the truth-telling probability of vehicle v_i is set to 0.5 at vehicle v_j. We assume that each vehicle sets the truth-telling probability for itself to 1 by default.
We mainly deal with two types of messages: beacon messages and safety messages. The vehicles use beacons to periodically broadcast and advertise status information to neighboring vehicles at intervals of 100 ms. The sender reports its speed, position, and so on to neighboring vehicles with beacon messages via one-hop communications [39]. On the other hand, safety messages support vehicles on the road by delivering time-critical information so that proper action can be taken to prevent accidents and to save people from life-threatening situations. Safety messages include different types of events, E_k, such as road accidents, traffic jams, slippery roads, road constructions, poor visibility due to fog, and emergency vehicle warnings. Vehicles broadcast a safety message to neighboring vehicles when they encounter events on the road [1]. The message payload includes information about the vehicle's position, message sending time, direction, speed, and road events [19]. Each vehicle gathers information about the neighboring vehicles within its communication range.

//The process is executed by a receiver vehicle v_r upon receiving a safety message
// p_i: pseudo ID of the i-th neighbor vehicle of v_r
// θ_i: truth-telling probability of v_i
// θ_ij: estimate of θ_i by v_j
// Θ_j: trust level opinion generated by v_j
// θ̂_i: estimator for the truth-telling probability of v_i
Input: Y = {Θ_j} (j = 1, 2, ..., N)
Output: {θ̂_i}, updated trust matrix
(1) Gather information from neighbor vehicles.
(2) If an event is triggered, then go to step (5).
(3) Else go to step (2).
(4) If the event source is v_r itself, then go to step (13).
(5) Else v_r accumulates the trust level opinions of neighbor vehicles, Θ_j = ((p_1, θ_1j), (p_2, θ_2j), ..., (p_i, θ_ij), ..., (p_N, θ_Nj)).
(6) v_r generates a trust matrix based on the trust level opinions.
(7) Use the modified clustering algorithm to separate the trust level opinions of normal vehicles from those of malicious vehicles.
(8) Calculate the aggregated trust level of each adjacent vehicle belonging to the majority group from the trust matrix: θ̂_i = (1/N) Σ_{j=1}^N θ_ij.
(9) Update the trust matrix.
(10) Use the modified TRW to decide whether the event has actually occurred.
(11) If the event message is decided to be trustworthy, then go to step (13).
(12) Else drop the message.
(13) Broadcast the safety message and trust levels to neighboring vehicles.

Algorithm 1: Determining the trustworthiness of event information in the received message.
One advantage of our proposed message dissemination scheme is that it avoids a central trusted third party for trust accumulation in a distributed vehicular networking environment. We consider a VANET without infrastructure such as RSUs. Vehicles communicate with each other in V2V mode using DSRC [40]. This allows fast data transmission for critical safety applications within a short range of 250 m. A basic safety application contains vehicle safety-related information, such as speed, location, and other parameters, and this information is broadcast to neighboring vehicles [41][42][43]. Let us consider two vehicles, v_i and v_j. The truth-telling probability of v_i depends on whether vehicle v_i is truthful when relaying event information. According to Velloso et al. [44], the more positive experiences vehicle v_j has with vehicle v_i, the higher the trust vehicle v_j will have towards vehicle v_i.
Let us suppose that vehicle v_i has a pseudo ID p_i and broadcasts a safety warning message M_i, which is defined in (1), when event E_k, where k represents an event type, is detected. If the vehicle itself detects the event, then it broadcasts the safety message along with the trust levels of neighboring vehicles. If a vehicle receives a safety message from other vehicles, it accumulates the safety message along with the trust levels from neighboring vehicles. When vehicle v_j collects event information from vehicle v_i, it finds the type and location of the event from the message. Let event message M_i be given by

M_i = (p_i, t, L_E, L_i),  (1)

where p_i is the pseudo ID of vehicle v_i, t is the message generation time, L_E is the location of event E_k, and L_i is the location of v_i at time t.
In addition, each vehicle periodically broadcasts a beacon message defined as B_i = (p_i, t_b, L_i, s_i), where p_i is the pseudo ID of v_i, t_b is the beacon generation time, L_i is the location of v_i, and s_i is the speed of v_i.
Let θ_i be the trust level, that is, the truth-telling probability, of vehicle v_i. The truth-telling probability θ_i is defined as the number of true events reported by vehicle v_i divided by the total number of events reported by vehicle v_i over a specific period of time. Let T denote the total number of true events reported by v_i and let R denote the total number of events reported by the vehicle up to the current time. Then, the truth-telling probability is

θ_i = T / R.  (2)

A value of θ_i approaching 1 indicates reliable behavior of the corresponding vehicle, whereas a value close to zero indicates a high tendency towards providing false information [45].
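As a small worked example of the ratio in (2) together with the bootstrap value of 0.5 for unknown vehicles, consider the following Python sketch; the counter class and its names are illustrative, not part of the paper's specification:

```python
# Sketch of per-vehicle bookkeeping for the truth-telling probability in
# Eq. (2): theta_i = T / R, with 0.5 as the bootstrap value for vehicles
# that have no report history yet.

class TruthRecord:
    def __init__(self):
        self.true_events = 0   # T: reports later verified as true
        self.total_events = 0  # R: all reports from this vehicle

    def record(self, was_true):
        self.total_events += 1
        if was_true:
            self.true_events += 1

    def theta(self, default=0.5):
        # A brand-new vehicle has no history, so return the default 0.5.
        if self.total_events == 0:
            return default
        return self.true_events / self.total_events

rec = TruthRecord()
for verdict in [True, True, True, False]:
    rec.record(verdict)
print(rec.theta())  # 3 true reports out of 4 -> 0.75
```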

Calculation of Trust Level of Neighbor Vehicles.
When an event occurs, the nearby vehicles broadcast safety messages with additional data, such as the pseudo IDs and truth-telling probabilities of other vehicles. Based on the safety messages from the neighboring vehicles, the trust matrix [θ_ij] can be obtained, where θ_ij is the estimate of θ_i by vehicle v_j. The trust matrix manages the truth-telling probability of each neighboring vehicle from the viewpoint of the other vehicles. We assume that each vehicle sets its own truth-telling probability to 1. Once the trust matrix is constructed, the aggregated trust level, that is, the truth-telling probability of vehicle v_i, is calculated from the trust matrix by

θ̂_i = (1/N) Σ_{j=1}^N θ_ij,  (3)

where θ̂_i is the estimator for the truth-telling probability of v_i.
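The aggregation in (3) amounts to averaging each row of the trust matrix. The following Python sketch illustrates this; the variable names and sample values are ours, not from the paper:

```python
# Sketch of aggregating a trust matrix into per-vehicle trust levels.
# gamma[i][j] holds vehicle v_j's estimate of v_i's truth-telling
# probability; each row is averaged per Eq. (3).

def aggregate_trust(gamma):
    """Average each row of the trust matrix to estimate theta_i."""
    n = len(gamma[0])  # number of reporting vehicles
    return [sum(row) / n for row in gamma]

gamma = [
    [1.0, 0.9, 0.8],   # opinions about v_1 (v_1 rates itself 1.0)
    [0.7, 1.0, 0.6],   # opinions about v_2
]
print(aggregate_trust(gamma))  # per-vehicle aggregated trust levels
```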

Estimation of Truth-Telling Probability Based on the Correctness of Message Information.

If we can decide whether specific event information received from a vehicle is correct, this information can be used to estimate the truth-telling probability of the reporting vehicle more accurately. Reliable information about a specific event might be obtained from direct observation of the event spot or from an announcement by a public and reliable group. We explain how the truth-telling probability can be estimated more accurately if we collect more evidence to decide the correctness of messages generated by a given vehicle. We can estimate the truth-telling probability θ_i, defined in (2), based on the correctness of the recent W messages from v_i. We introduce a random variable X_n to estimate the number of true reports among the recent W reports from v_i upon arrival of the n-th report from v_i. Then, the truth-telling probability of v_i can be estimated by X_n/W. We attempt to estimate X_n from X_{n-1} using the following relation:

X_n = (1 − 1/W) X_{n−1} + I_n,  (4)

where I_n = 1 if the n-th report from v_i is true and I_n = 0 otherwise. Then, we can show that X_n/W approaches the truth-telling probability θ_i of v_i under the assumption that the correctness of one message is independent of the correctness of other messages. By taking the expectation of (4), we can obtain E[X_n] as

E[X_n] = (1 − 1/W) E[X_{n−1}] + θ_i.  (5)

By solving the recursive relation in (5), we can obtain

E[X_n] = (1 − 1/W)^n E[X_0] + W θ_i (1 − (1 − 1/W)^n).  (6)

Thus, regardless of the initial condition on X_0, we have lim_{n→∞} E[X_n] = W θ_i and lim_{n→∞} E[X_n/W] = θ_i from (6). In other words, X_n/W approaches the truth-telling probability θ_i asymptotically, and we use the estimator X_n/W and the relation in (4) to update the truth-telling probability of a vehicle whenever we have evidence to determine the correctness of a message from that vehicle.
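The recursion in (4) can be exercised with a short simulation. In this Python sketch the window size W, the bootstrap value 0.5, and the periodic report stream are illustrative choices of ours:

```python
# Windowed truth-telling estimator following the recursion in Eq. (4):
# X_n = (1 - 1/W) * X_{n-1} + I_n, with the estimate theta ~= X_n / W.

def update_estimate(x_prev, report_true, w):
    """One step of the recursion; I_n is 1 for a verified-true report."""
    return (1.0 - 1.0 / w) * x_prev + (1.0 if report_true else 0.0)

w = 20
x = 0.5 * w  # start from the default trust level 0.5
for report_true in [True, True, True, True, False] * 100:  # 80% true
    x = update_estimate(x, report_true, w)
print(x / w)  # settles near the true ratio 0.8, as predicted by Eq. (6)
```

The transient from the initial value X_0 decays geometrically, matching the (1 − 1/W)^n term in (6).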

Clustering Algorithm.
If there is no evidence to determine the truth of a given message, then the truth-telling probability of vehicle v_i is calculated using (3). However, malicious vehicles can modify the trust levels of neighboring vehicles to mislead vehicles in a vehicular network. Thus, we need a clustering algorithm that can separate the trust levels of normal vehicles from those of malicious vehicles, which reduces the effect of malicious vehicles on the trust levels of normal vehicles. In this subsection, we propose a new clustering algorithm to tackle this issue.
The main goal of our modified clustering algorithm is outlier detection.Our modified clustering algorithm classifies the trust level (truth-telling probability) opinions of the vehicles into two groups, one with the trust level opinions of normal vehicles and the other with the trust level opinions of malicious vehicles.We will select the majority group and neglect the outliers corresponding to the minority group.
Let us assume that an event has occurred on the road and the vehicles near the event location send event messages along with trust level opinions to neighboring vehicles. The vehicle v_r gathers reports about a specific event from neighboring vehicles and manages the trust level opinions of other vehicles as follows. Each vehicle maintains a sorted vehicle list (SVL), which manages the pseudo IDs of all the adjacent vehicles in ascending order, and the vehicle index is assigned based on the sequence in the sorted list, as shown in Table 2. Whenever a vehicle v_j needs to disseminate its own trust level opinion to its neighbors, it sends its trust level opinion Θ_j defined as

Θ_j = ((p_1, θ_1j), (p_2, θ_2j), ..., (p_i, θ_ij), ..., (p_N, θ_Nj)),  (7)

where p_i is the pseudo ID of the i-th neighbor vehicle of vehicle v_j, and θ_jj is likely to be set to 1 because every node will trust itself.
If v_r receives trust level opinion Θ_j, then v_r updates its own SVL by adding the vehicles that are in Θ_j but not in the SVL. After updating the SVL, v_r derives Θ'_j from the received Θ_j as

Θ'_j = (θ'_1j, θ'_2j, ..., θ'_k'j, ..., θ'_N'j),  (8)

where k' is the new index of vehicle v_k according to its sequence in the updated SVL and N' is the total number of vehicles in the updated SVL. When p_k in the received Θ_j agrees with the k'-th pseudo ID in the updated SVL, θ'_k'j in Θ'_j is updated as

θ'_k'j = θ_kj.  (9)

In this case, N' is always larger than or equal to N, since Θ'_j accommodates all the vehicles in Θ_j. If N' > N, then there is some pseudo ID that is in the SVL but not in Θ_j. If an index l corresponds to such a pseudo ID, θ'_lj is set to 0.5, since vehicle v_j does not know vehicle v_l. Once Θ'_j is derived, the trust matrix table is updated by adding the transpose of Θ'_j as the j-th column.
If each vehicle includes the pseudo IDs and the truth-telling probabilities of all the vehicles that it knows in the trust level opinion message defined in (7), then the traffic overhead due to this message can be excessively large. However, we can reduce the message overhead by omitting trivial information. For example, if θ_ij in Θ_j is 0.5, this means that vehicle v_j does not know vehicle v_i, since 0.5 is the default value used to initialize the truth-telling probability of a new vehicle. In this case, vehicle v_j need not advertise this probability, because this default value can easily be filled in by the neighbor vehicles according to the trust matrix updating rule described above with (7), (8), and (9). The vehicles on the road are likely to be ignorant of each other in terms of the trust matrix table, since they need not exchange trust level opinions if there is no event. Thus, we expect that the policy of omitting trivial information can significantly reduce the traffic overhead due to trust level opinion messages.
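The SVL update and the opinion expansion in (8) and (9) can be sketched as follows; the dictionary-based representation and the pseudo ID strings are illustrative choices of ours:

```python
# Sketch of aligning a received trust opinion with the receiver's sorted
# vehicle list (SVL), following Eqs. (8)-(9): unknown pseudo IDs are
# merged into the SVL, and vehicles the sender does not know get the
# default truth-telling probability 0.5.

def align_opinion(svl, opinion):
    """Merge new pseudo IDs into the SVL, then expand the opinion into a
    vector over the updated SVL, filling unknown vehicles with 0.5."""
    svl = sorted(set(svl) | set(opinion))          # updated, ascending SVL
    return svl, [opinion.get(pid, 0.5) for pid in svl]

svl = ["pid_a", "pid_c"]
theta_j = {"pid_a": 0.9, "pid_b": 1.0}             # v_b rates itself 1.0
svl, vec = align_opinion(svl, theta_j)
print(svl)   # ['pid_a', 'pid_b', 'pid_c']
print(vec)   # [0.9, 1.0, 0.5]  (pid_c unknown to the sender -> 0.5)
```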
If v_r collects trust level opinions Θ_j (j ≠ r) from other vehicles along with event information, then v_r can construct the trust matrix Γ, defined as

Γ = [θ'_ij].  (10)

We use a simple example to illustrate our proposed clustering algorithm with Table 3 and Figure 2. Table 3 shows an example of the trust matrix defined in (10) when N = 3. The three columns in Table 3 correspond to points A, B, and C in Figure 2, where the axes represent θ_1, θ_2, and θ_3, respectively. If there is no attacker, that is, all vehicles tell the truth, then all the points will be close to each other. The trust level opinions (Θ_j's) with similar characteristics are likely to form the same cluster. If there is an attacker that tells a lie, then the corresponding point will deviate from the majority group, and this point can be distinguished as an outlier. Even if the attacker tries to change or give higher trust levels using a collusion attack, it can be detected as an outlier. Let us suppose that point C represents an attacker that tells a lie by changing the trust level, as shown in Table 3. The clustering algorithm separates the trust level opinions into two groups, one group with the normal-vehicle trust levels (i.e., A and B) and the other group with the malicious-vehicle trust levels (i.e., C). Figure 2 describes the outcome of one possible clustering algorithm.
The final aggregated trust level is calculated based on the trust level opinions corresponding to the majority group using (3).
The final aggregated trust level based on the majority group will be used to update the trust level of the vehicle itself.The resulting trust level is appended to the message during message propagation.
Input: Y = {Θ_j} (j = 1, 2, ..., N)
Output: C = {c_i} (i = 1, 2)
(1) Initialize: calculate the unique centroid c̄ of Y as the initial cluster center, as in (11)
(2) Select the first centroid c_1 as the point in Y farthest from c̄, as in (12)
(3) Select the second centroid c_2 as the point in Y farthest from c_1, as in (13)
(4) While the two centroids have not converged, do
(5)   for each Θ_j ∈ Y do
(6)     Assign Θ_j to the nearest centroid
(7)   end
(8)   for each c_i ∈ C do
(9)     Update cluster centroid c_i as the mean of the points assigned to it
(10)  end
(11) end

Algorithm 2: Initialization-step enhanced k-means clustering algorithm (IEKA).

We propose a modified k-means clustering algorithm. The main problem with the k-means algorithm lies in the initialization step, so we introduce an enhanced k-means clustering technique that modifies the initialization step, which is called the initialization-step enhanced k-means clustering algorithm (IEKA). We use the IEKA to cluster the trust level opinions, while reducing the effect of malicious vehicles on the trust levels of other vehicles. Our proposed clustering algorithm can be described in more detail as follows.
After generating a trust matrix, IEKA partitions the trust level opinions into K (≤ N) groups C = {c_1, c_2, ..., c_K}. We designate Y to be the set of the opinion vectors; that is, Y = {Θ_1, Θ_2, ..., Θ_N}. We consider only two clusters (K = 2) for our scheme. Initially, we take the mean of all the data points in Y to find a unique centroid, c̄:

c̄ = (1/N) Σ_{j=1}^N Θ_j.  (11)

We calculate the Euclidean distance between c̄ and each vector in Y and choose the point that has the maximum distance from the unique centroid; that is, the selected point is at the farthest distance from the unique centroid. We consider this point to be the first centroid, c_1, for the first cluster:

c_1 = argmax_{Θ ∈ Y} ‖Θ − c̄‖.  (12)

Similarly, we compute the Euclidean distance between the first centroid c_1 and the remaining points in Y and select the point with the maximum distance from c_1. This point becomes the second centroid, c_2:

c_2 = argmax_{Θ ∈ Y} ‖Θ − c_1‖.  (13)

As a next step, we run the conventional k-means clustering algorithm, with c_1 and c_2 being the centroids of two separate groups. We update centroids c_1 and c_2 by calculating the mean value of each group, which gives new centroids c_1 and c_2, and then reassign each data point to the cluster to which it is closest. We repeat this process until the two centroids converge. The proposed modified clustering algorithm is given in Algorithm 2.
After clustering using Algorithm 2, which is based on (11), (12), and (13), the aggregated trust level of each neighbor vehicle is calculated from the trust level opinions belonging to the majority group; we assume that the number of malicious vehicles is less than that of normal vehicles. The aggregated trust level is then used in the TRW calculation. In the next subsection, we discuss the decision on the event based on hypothesis testing using TRW.
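Assuming, as the text implies, that the aggregated trust level of each neighbor is the average of the opinions retained in the larger (majority) cluster, the aggregation step might look like the sketch below; the function name and data layout are our own.

```python
def aggregate_trust(Y, labels):
    """Aggregate trust levels from the majority cluster found by IEKA.

    Y: list of trust-level opinion vectors; Y[i][j] is vehicle i's opinion
    of neighbor j's truth-telling probability.  labels: cluster label (0/1)
    of each opinion vector.  Opinions in the minority cluster are discarded
    as outliers (assumed to come from malicious vehicles).
    """
    majority = 0 if labels.count(0) >= labels.count(1) else 1
    kept = [p for p, l in zip(Y, labels) if l == majority]
    m = len(kept[0])
    # Per-neighbor average over the retained opinions only.
    return [sum(p[j] for p in kept) / len(kept) for j in range(m)]
```

The resulting per-neighbor averages are the aggregated trust levels fed into the TRW step described next.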

Event Decision Based on Threshold Random Walk (TRW).
Sequential hypothesis testing is commonly used to determine whether a specific hypothesis is true based on sequential observations [46]. Among sequential hypothesis testing schemes, the threshold random walk has been used to detect scanners with a minimal number of packet observations, while bounding false positives and false negatives [47]. Since we are interested in determining whether a given message is true or false, with the true message constituting one of the two hypotheses, the threshold random walk can be applied to this problem. The threshold random walk scheme in [47] uses two thresholds, an upper bound and a lower bound, and a decision is made when the likelihood ratio reaches either threshold. However, with this scheme, the number of samples required to reach either threshold is not known in advance, so a real-time decision may not be possible if a sufficient number of samples cannot be collected in a short interval. In this paper, we use a modified threshold random walk scheme to determine the validity of a given event while resolving the real-time decision issue: we apply the threshold random walk with a single threshold instead of two. Hereafter, we explain the threshold random walk (TRW) scheme applied to our problem in more detail.
E1 represents one of the events that can happen on a road. After clustering the trust level opinions of neighbor vehicles using IEKA, each vehicle determines the occurrence of event E1 based on the aggregated trust level table, which consists of vehicle PIDs, aggregated trust levels, and event observations, as shown in Table 4.
In Table 4, r_i is the report received from the i-th neighbor vehicle about event E1, where E1 represents the occurrence of the event and ¬E1 represents its nonoccurrence. We need a rule to make a decision about the occurrence of the event. We assume that the r_i's are independent among different vehicles. For a given event, the random variable r_i can take only two values, 0 and 1: r_i = 0 if the i-th neighbor reports that E1 occurred, and r_i = 1 otherwise, as in (14). After collecting a sufficient number of reports, we wish to determine whether the event E1 has really occurred using sequential analysis [46].
Let us consider two hypotheses, a null and an alternative hypothesis (i.e., H0 and H1), where H0 is the hypothesis that event E1 has occurred and H1 is the hypothesis that it has not. We also assume that the observations r_i, conditioned on hypothesis H_k (k = 0, 1), are independent. From the definition of the truth-telling probability and (14), we obtain (15), where Pr(r_i = r | H_k) is the conditional probability that the observation r_i equals r given hypothesis H_k. Then Pr(r_i = 0 | H0) = p_i is the truth-telling probability, and Pr(r_i = 1 | H0) = 1 − p_i is the lying probability. In order to make a timely decision, we collect report samples from neighbor vehicles during an interval of fixed duration T.
Let n denote the number of report samples collected during this interval. Following the approach of Wald [46], we use the collected report samples to calculate the likelihood ratio Λ = ∏_{i=1}^{n} Pr(r_i | H1) / Pr(r_i | H0) in (16). Although the TRW scheme in [47] makes a decision based on two thresholds, the upper and lower bounds, we use a single threshold to avoid the issue of long waiting times. With threshold η, the decision rule is as follows: if Λ ≥ η, accept hypothesis H1; otherwise, accept H0.
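The single-threshold decision rule can be sketched as follows. This is a hedged illustration; the function name and report encoding are our assumptions (r = 0 means the sender reported that event E1 occurred), and truth-telling probabilities are assumed to lie strictly between 0 and 1 so the logarithms are defined.

```python
import math

def trw_decide(reports, eta=1.0):
    """Modified threshold random walk with a single threshold eta.

    reports: list of (r, p) pairs collected during the interval T;
    r = 0 means the sender reported that event E1 occurred, r = 1 that
    it did not; p is the sender's aggregated truth-telling probability
    (0.5 for unknown vehicles), assumed to lie in (0, 1).
    Returns 'H0' (event occurred) or 'H1' (event did not occur).
    """
    log_ratio = 0.0
    for r, p in reports:
        pr_h0 = p if r == 0 else 1.0 - p      # Pr(r | H0): event occurred
        pr_h1 = 1.0 - p if r == 0 else p      # Pr(r | H1): no event
        log_ratio += math.log(pr_h1) - math.log(pr_h0)
    # Accept H1 only when the likelihood ratio Lambda reaches eta;
    # reports from unknown vehicles (p = 0.5) contribute a factor of 1.
    return 'H1' if log_ratio >= math.log(eta) else 'H0'
```

With the aggregated trust levels from the clustering step, a vehicle can thus reach a decision after a fixed collection interval rather than waiting for the ratio to cross one of two bounds.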
In this paper, the threshold η is set to 1, and the truth-telling probability p_i of an unknown vehicle i is set to 0.5. When a vehicle receives n report messages, if the i-th report has come from a vehicle with no information on its truth-telling probability, then Pr(r_i | H1) = Pr(r_i | H0) since p_i = 1 − p_i = 0.5. Thus, a report from an unknown vehicle does not affect the likelihood ratio in (15) and (16). Furthermore, if all the report messages come from vehicles with no history information, then the likelihood ratio in (16) becomes 1, so it is fair to set η = 1, since it is not easy to make a decision in this case.
The advantage of our threshold random walk over a simple voting scheme can be illustrated with a simple example. Consider a case where an event E1 is true and a vehicle receives 5 report messages; only two report that E1 is true, and the other three claim that E1 did not happen. A decision based on simple voting would be ¬E1. However, if we apply the threshold random walk, taking the truth-telling probability of each node into account, the decision can differ: if the truth-telling probability of the two nodes claiming E1 is 0.8 and that of the three nodes denying E1 is 0.6, then the likelihood ratio defined in (16) becomes Λ = ((1 − 0.8)/0.8)^2 × (0.6/(1 − 0.6))^3 ≈ 0.211, as computed in (17). Since this is less than the threshold η, we select hypothesis H0 according to the decision rule above; that is, the proposed threshold random walk makes the correct decision that E1 occurred. This advantage comes from the fact that the likelihood ratio in (16) gives a higher weight to the opinions of vehicles with a high truth-telling probability.
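The numbers in this example can be checked directly; the helper below is ours, not part of the paper's notation.

```python
def likelihood_ratio(reports):
    """Lambda = product of Pr(r_i | H1) / Pr(r_i | H0) over the reports,
    where r = 0 means the sender claims E1 occurred and p is the sender's
    truth-telling probability (assumed strictly between 0 and 1)."""
    lam = 1.0
    for r, p in reports:
        lam *= (1.0 - p) / p if r == 0 else p / (1.0 - p)
    return lam

# Two vehicles (p = 0.8) report the event; three vehicles (p = 0.6) deny it.
reports = [(0, 0.8), (0, 0.8), (1, 0.6), (1, 0.6), (1, 0.6)]
lam = likelihood_ratio(reports)
# lam = (0.2/0.8)^2 * (0.6/0.4)^3 = 0.0625 * 3.375 ~= 0.211 < eta = 1,
# so the rule accepts H0 even though a simple vote would decide H1.
```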
After the decision on the actual occurrence of the event is made, vehicle j forwards the received message, together with the aggregated trust levels, to its neighboring vehicles within radio range; the forwarded message is defined in (18), where P_j is the PID of the forwarding vehicle j, t is the time at which the message was sent, and Θ_j denotes the trust level opinion of vehicle j defined in (7).

Attack Model.
We consider two types of attacks in a VANET environment: the message modification attack and the fake message generation attack. Figure 3 shows an example of each. A malicious vehicle might alter warning messages, either with malicious intent or due to an error in the communication system. In the message modification attack, malicious vehicles can modify message information at any time and falsify its parameters.
In Figure 3(a), when an accident occurs on the road, the vehicles involved in the accident, or those close to it, broadcast an accident event message. After a vehicle sends an accident report to other vehicles, a malicious vehicle modifies the message and sends the modified no-accident message, in the form defined in (18), with the intent to affect the decisions taken by other vehicles. Similarly, in the fake message generation attack, malicious vehicles generate a false warning message. For example, in Figure 3(b) [48], a malicious vehicle might send an accident message to neighboring vehicles, even when there is no such event on the road, in order to clear the route it wants to take. In this case, the malicious vehicle wants to convince other vehicles that an event has occurred; the attacker may have already compromised one or more vehicles and launches attacks by generating fake messages for neighboring vehicles. We assume that the number of malicious vehicles is less than the number of normal vehicles [12]. In the simulations, we vary the ratio of malicious vehicles from 5% to 50% of all vehicles to evaluate the performance of our proposed scheme in an adversarial environment.

Performance Evaluation
4.1. Simulation Setup. The performance of our proposed scheme was evaluated through simulation using the Vehicles in Network Simulation (VEINS) framework. We use the default map of Erlangen, Germany, from the VEINS framework, with a map size of 2500 m × 2500 m. We evaluated our scheme under different traffic densities to cover diverse situations. When vehicles reach the edge of the road network, they reroute and can meet other vehicles multiple times during the simulation. The number of vehicles increases linearly with time from 0 s to 300 s. The average vehicle speed ranges from 40 km/h in the urban scenario to 110 km/h in the highway scenarios. The key parameters considered in our simulation are summarized in Table 5.
We considered two scenarios (urban and highway), varying parameters such as speed, vehicle density, and the percentage of malicious vehicles, as shown in Table 6. The number of malicious vehicles was varied, considering the mobility of vehicles in a realistic simulation environment, by adjusting vehicle densities and speeds. We assume that normal and malicious vehicles are uniformly distributed on the roads for each ratio of malicious vehicles [52].

Simulation Results.
In this section, we analyze the simulation results obtained with OMNeT++. The traffic density increases from free-flow traffic (5 vehicles/km²) to congested traffic (100 vehicles/km²), where vehicles can meet multiple times. The simulation scenarios are summarized in Table 6. For performance evaluation, we considered the false decision probability and the message overhead, and we compared our scheme with the other schemes under different scenarios. To evaluate our proposed scheme, we considered the message modification attack and the fake message generation attack one at a time, while increasing the ratio of malicious vehicles from 5% to 50% in both scenarios. The positions of normal vehicles and the initial distribution of the attackers were randomly determined. We calculated the average false decision probability over 30 simulation runs. A decision is regarded as false when the decision result does not agree with the true status of the event at the time of the decision; in other words, the false decision probability is the ratio of the number of incorrect decisions to the total number of decisions.
In order to update the truth-telling probability of a vehicle based on the truth of a given message according to (4), we need to decide the parameter w, that is, the number of recent messages from that vehicle considered in this estimation. To decide w, we ran 20 simulations under the fake message attack with 30% malicious vehicles in the highway scenario and calculated the average false decision probability for various values of w; Figure 4 shows the result. As w increases, the false decision probability tends to decrease; it reaches zero at w = 15 and does not change for larger values of w. Thus, w is fixed to 15 hereafter based on this result.
In Figure 4, the false decision probability is high for small values of w. To see why, consider the extreme case of w = 1: a vehicle then decides the truth-telling probability of a neighbor based only on that neighbor's last message. If the last message was false, the estimated truth-telling probability becomes 0 according to the updating rule in (4); if it was true, the estimate becomes 1. Thus the estimated truth-telling probability of each vehicle is always either 0 or 1, and if a vehicle's actual truth-telling probability differs from these values, the updating rule with w = 1 can never find it. Hence, the estimate can differ significantly from the correct truth-telling probability for small values of w, especially when w = 1.
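One simple realization of such a window-based update, assuming the estimate in (4) is the fraction of true messages among the last w messages from a neighbor (our reading of the rule, not a verbatim reproduction), is sketched below.

```python
from collections import deque

class TruthEstimator:
    """Sliding-window estimate of a neighbor's truth-telling probability.

    Keeps only the last w verdicts on that neighbor's messages; w = 15
    follows the simulation result in Figure 4.
    """
    def __init__(self, w=15):
        self.history = deque(maxlen=w)   # 1 = message judged true, 0 = false

    def update(self, was_true):
        """Record the verdict on the latest message and return the estimate."""
        self.history.append(1 if was_true else 0)
        return self.estimate()

    def estimate(self, default=0.5):
        # Unknown vehicles keep the neutral default of 0.5.
        if not self.history:
            return default
        return sum(self.history) / len(self.history)
```

With w = 1 the estimate collapses to 0 or 1 after every message, reproducing the pathology discussed above, while a larger window smooths the estimate toward the neighbor's actual truth-telling probability.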
For our modified TRW scheme, we need to determine a suitable message collection time T to achieve good decision accuracy. We ran several simulations under the fake message generation attack with 30% malicious vehicles in the highway scenario and calculated the average false decision probability against the message collection time under the simulation parameters given in Table 5. In a sparse network, a report collection time of less than 100 ms yielded as few as five report messages, which results in a high false decision probability. The simulation result is shown in Figure 5. As the report collection interval increases, the false decision probability decreases; from 800 ms onward, it does not decrease any further. Based on this, we set T to 1 s, and this value is used hereafter.
We now compare our proposed scheme with three other schemes: the RMCV scheme, a simple voting scheme, and a TRW-only scheme. RMCV is an information-oriented trust model whose outcome is a trustworthiness value associated with each received message. In the RMCV scheme, message trustworthiness is based on content similarity: the trustworthiness is likely to increase as the message contents become more similar among different vehicles. In the TRW-only scheme, we used the modified threshold random walk to decide on the event in the warning message without applying our proposed clustering algorithm. Several voting methods have been proposed to estimate the trustworthiness of each report message [53-55]. In the simple voting mechanism, each vehicle collects a fixed number of warning messages from neighboring vehicles regarding an event and follows the opinion of the majority group [55]. For the voting scheme, we collected 15 messages per decision, as this was the optimal number according to our simulation.
We compare our proposed scheme with the other schemes in terms of false decision probability for various ratios of malicious vehicles under the message modification attack in a highway scenario, as shown in Figure 6. Our proposed scheme yields a lower false decision probability than the other mechanisms, even as the number of malicious vehicles increases. The simple voting mechanism performs worst among the four schemes. The performance of the RMCV scheme is close to that of the TRW-only scheme when the malicious vehicle ratio is low, but it degrades significantly compared to our proposed scheme as the ratio increases. Our proposed scheme has a false decision probability of 0% when the ratio of malicious vehicles is 30% in the highway scenario.
We compare our proposed scheme with the other schemes in terms of false decision probability under a fake message generation attack in a highway scenario, as shown in Figure 7, considering a case where the attacker generates messages about a fake event. Our proposed scheme performs better than the RMCV, simple voting, and TRW-only schemes, with a false decision probability below 10%. The false decision probabilities of the RMCV and voting schemes exceed 40% when the ratio of malicious vehicles increases to 50%. In Figure 7, the false decision probability increases with the ratio of malicious vehicles, with a tendency similar to Figure 6.
We compare our proposed scheme with other schemes in terms of false decision probability under a message modification attack in an urban scenario in Figure 8.Our proposed scheme exhibits better performance compared to other schemes.The false decision probability does not exceed 10% for our proposed scheme.However, it reaches 20% for the TRW-only scheme.The RMCV and the simple voting schemes exhibit much higher false decision probabilities compared to our proposed scheme.
We compare our proposed scheme with other schemes in terms of false decision probability under a fake message attack in an urban scenario in Figure 9.The false decision probability of the proposed scheme increases when the density of the malicious vehicles generating the false message increases, resulting in a false decision probability slightly greater than 10%.In an urban scenario, the high density of vehicles and low speeds help the propagation of false event messages generated by attackers.Thus, the false decision probability in this case is slightly higher than that for the highway scenario.In this scenario, the RMCV and the simple voting schemes exhibit higher false decision probabilities compared to the proposed scheme, with a tendency similar to Figure 8.
We now compare our scheme with the RMCV scheme in terms of message overhead. We considered a situation where there is an actual accident and no malicious vehicles. In Figure 10, we present the message overhead with respect to the vehicle density per square kilometer in both urban and highway scenarios. The message overhead is the cost incurred by the extra messages exchanged with neighboring vehicles. As the vehicle density increases, the message overhead also increases in both scenarios, as shown in Figure 10. In the beginning, when the vehicle density is low, our proposed scheme has low message overhead because it does not advertise the default trust level of new neighbor vehicles; however, the pseudo IDs in the trust level opinion pairs cause some message overhead. The message overhead is higher for RMCV than for our scheme in both scenarios because, in RMCV, vehicle nodes send query messages to neighboring vehicles and then receive response messages regarding the accident event, whereas our scheme sends only one-way report messages with no queries. The message overhead in the urban scenario is slightly higher than in the highway scenario for both schemes, because vehicles move more slowly in the urban scenario and thus accumulate more messages.
We also compare our scheme with RMCV in terms of message overhead in the presence of malicious vehicles under the fake message generation attack. The average vehicle density increases from 1 to 100 vehicles per km² over the simulation time in the urban and highway scenarios. We ran several simulations, increasing the ratio of malicious vehicles from 5% to 50% in both scenarios. Our scheme collects warning messages from neighboring vehicles to assess the trustworthiness of the event information contained in the received messages. In both schemes, the message overhead increases with the ratio of malicious vehicles, because vehicles accumulate more messages in their presence. The message overhead for different ratios of malicious vehicles is shown in Figure 11; our scheme has a lower message overhead than the RMCV scheme in both scenarios.

Conclusion and Future Work
In this paper, we proposed a trustworthy event-information dissemination scheme for VANETs that determines and disseminates only trustworthy event messages to neighbor vehicles. We introduced a modified k-means clustering algorithm to reduce the effect of malicious vehicles on the trust levels (i.e., the truth-telling probabilities) of other vehicles; in other words, the issue of node trustworthiness is resolved through this modified k-means clustering algorithm. In the next step, the issue of message trustworthiness is resolved by applying a modified TRW to the report messages received from neighbor vehicles, along with the information on node trustworthiness. We compared our proposed scheme with the RMCV, simple voting, and TRW-only schemes through simulation. The simulation results show that our proposed scheme has a lower false decision probability than the other schemes, as well as a lower message overhead than the RMCV scheme. They also show that our scheme can effectively cope with the message modification attack and the fake message generation attack as long as benign vehicles outnumber malicious vehicles. A further advantage of our scheme is that the decision on the trustworthiness of a given message is made in an infrastructure-less environment without using a PKI.
In this paper, we assumed that the malicious vehicles are uniformly distributed on the roads.However, this assumption may not be valid if colluding malicious vehicles move as a group to increase their influence on the nearby vehicles.Such a complicated issue will be studied in more detail in our future work.

Figure 2: Clustering example for the proposed clustering algorithm.

Figure 3: Two types of attack patterns considered in this paper: (a) message modification attack and (b) fake message generation attack.

Figure 4: False decision probability versus the total number of messages (w).

Figure 5: False decision probability for various values of the message collection time (T).

Figure 6: Comparison of the proposed scheme with other schemes under a message modification attack in a highway scenario.

Figure 7: Comparison of the proposed scheme with other schemes under a fake message generation attack in a highway scenario.

Figure 8: Comparison of the proposed scheme with other schemes under a message modification attack in an urban scenario.

Figure 9: Comparison of the proposed scheme with other schemes under a fake message generation attack in an urban scenario.

Figure 10: Message overhead for various vehicle densities with no malicious vehicles.

Figure 11: Message overhead for various ratios of malicious vehicles under the fake message generation attack.

Table 1: Trust management categories based on infrastructure and PKI.

Table 2: Example of a sorted vehicle list (SVL).

Table 3: Trust matrix table of a vehicle. The three probability values in the first column of Table 3 are the observing vehicle's estimates of the truth-telling probabilities of three neighbor vehicles; this tuple of probability values is represented as a point in the three-dimensional space of Figure 2.

Table 4: Aggregated trust level table.

Notation: pseudo ID of a vehicle; trust level (truth-telling probability) of a vehicle; event type; event message; beacon message; forwarded message; location of the event; location of a vehicle; speed of a vehicle; Θ_ij, the estimation of the trust level of vehicle j by vehicle i.