Privacy-Preserving Cross-Zone Ride-Matching for Online Ride-Hailing Service

Although online ride-hailing matches riders with the nearest taxi, the potential leakage of riders' hobbies and physical condition raises privacy concerns. Most privacy-preserving schemes only match riders and drivers within the same zone. In this paper, we propose a novel privacy-preserving cross-zone ride-matching scheme, namely, CRide, which extends matching from one zone to multiple neighboring zones. Based on the zone division of a city, CRide allows distance computation between rider and driver across adjacent zones in the encrypted domain. Furthermore, to improve efficiency, a ciphertext packing technique is introduced. Theoretical analysis and experimental results suggest that CRide achieves high ride-matching accuracy and acceptable efficiency without leaking privacy.


Introduction
Today, online ride-hailing (ORH) services such as DiDi and Uber are welcomed by more and more people, who use them to solve the problem of getting a taxi at rush hour [1,2]. Compared with traditional taxi services, ORH is more convenient: riders can request a driver within a short time using just their mobile phones, instead of standing by the street and waiting for a taxi.
Despite the advantages of ORH, this way of traveling brings the challenge of privacy leakage [3,4]. To offer ride-matching, the ORH server (RS) needs to collect riders' and drivers' sensitive information, such as identities, locations, and traveling times, which is used to match drivers and riders. The ride-hailing data may be leaked and used to monitor the traces of riders, infer the hobbies and physical condition of riders, etc. Riders may then become the target of junk mail, robbery, or blackmail attacks [5,6]. Hence, it is essential to protect the privacy of riders and drivers.
To protect the sensitive data of drivers and riders in ORH services, some works [7,8] studied privacy-preserving ORH. PrivateRide [7] was proposed to protect the privacy of ride-hailing. Later, ORide [8] was designed to find the nearest driver for the rider without leaking the identities and locations of drivers and riders. However, because ORide matches locally, the matched driver may not be the globally nearest one. Then, Xie et al. [9] utilized RNE techniques and a difference evaluation scheme based on property-preserving hashing to achieve privacy-preserving ride-matching. However, Vivek [10] demonstrated that this scheme is vulnerable to a passive attack that can recover the locations of riders and matched drivers. Therefore, how to achieve secure ride-hailing remains a challenge.
To resolve the above problem, a privacy-preserving cross-zone ride-matching scheme for ORH (CRide) is proposed. Based on the coordinate ciphertexts of riders and drivers, CRide calculates the distance ciphertext between the rider and all drivers without learning their locations and then sends it to the rider. The rider decrypts the distance ciphertext and finds the nearest driver. We conduct experiments on a real dataset to evaluate CRide, and the results show that it achieves high accuracy and acceptable computation and communication efficiency without the risk of privacy disclosure. The primary contributions of this paper are as follows:
(1) We propose a privacy-preserving cross-zone ride-matching scheme for ORH service (CRide), which can find the nearest driver for a rider. During matching, the location information of drivers and riders is protected. Using the ciphertext packing technique, the proposed scheme simultaneously calculates the distance ciphertexts between the rider and all drivers in a single ciphertext, efficiently reducing the bandwidth overhead between the rider and the server.
(2) A ride-matching approach crossing adjacent zones is presented in CRide. It can securely find the nearest driver for riders in the whole zone rather than in the local zone. Thus, the accuracy of ride-matching can be increased.
(3) To evaluate the performance and accuracy of CRide, we implement the scheme in C++ and test it on a real dataset. Experimental results confirm that CRide achieves comparable accuracy with acceptable computation and network cost for ORH.
This paper is organized as follows. The problem statement is given in Section 2. The necessary preliminaries are introduced in Section 3. The privacy-preserving ride-hailing scheme crossing zones is proposed in Section 4. The security of the scheme is analyzed in Section 5. CRide is implemented and evaluated in terms of performance and accuracy in Section 6. Finally, related works and conclusions are given.

System Model.
In ORH service, when a rider issues a ride request, the nearest driver is matched. During matching, riders' and drivers' sensitive information, such as location and identity, is protected. Figure 1 shows the system model, which includes three entities, i.e., RS, drivers, and riders.
(1) Riders. Riders send ride requests to RS. A ride request includes the rider's public key pk, coordinate ciphertexts, and located zone. After receiving the distance ciphertexts from RS, riders decrypt the ciphertexts and find the nearest driver.
(2) RS. RS computes the distance ciphertext based on the encrypted coordinates of riders and drivers. Meanwhile, RS sends pk to the drivers who are in the rider's zone $z_i$ and the adjacent zones $z_{i,j}$, where $1 \le j \le 8$.
(3) Drivers. Drivers in the rider's zone and some adjacent zones send their encrypted location information to RS when they receive pk from RS.

Threat Model.
(1) RS is honest but curious: it follows the protocol but is curious about drivers' and riders' sensitive information. So, except for the ride-matching results, the locations and identities of riders and drivers must not be leaked to RS.
(2) Riders and drivers are also honest but curious. Riders upload ride requests to get the nearest driver, and drivers send their encrypted locations to RS, but they are curious about each other's location information. Meanwhile, some drivers may be malicious and try to corrupt other drivers' location information.
(3) There may be adversaries on the Internet who eavesdrop to learn sensitive locations. Therefore, the information of drivers and riders ought to be protected so that untrusted entities learn nothing about it.
(4) Drivers do not collude with RS. This assumption is reasonable because drivers are not employees of the ORH service.

Design Goals.
Based on the above scenarios and the threat model analysis, the scheme should perform ride-hailing matching efficiently while protecting the privacy of drivers and riders. The specific goals are described here.
(1) Security. During the matching process, CRide ought to protect the privacy of drivers and riders. RS only knows the ride-matching result and does not learn anything else. Meanwhile, riders and drivers also learn nothing about each other's locations, except for the matched driver.
(2) Accuracy. The scheme should be able to match the nearest driver accurately according to the distance between riders and drivers in the whole zone.
(3) Efficiency. CRide should achieve a high ride-matching accuracy rate with acceptable communication and computation costs in practice.

Somewhat Homomorphic Encryption.
Homomorphic encryption allows operations to be performed on encrypted data, yielding encrypted results whose decryption equals the result of the same operations performed in the plain domain [11,12]. Throughout the whole operation process, the ciphertext need not be decrypted, which protects data privacy. This is crucial for RS, which must compute the match value from the encrypted location coordinates of drivers and riders without leaking the plaintext.
In our scheme, we use somewhat homomorphic encryption (SHE), which supports some types of operations on ciphertexts a limited number of times.
As an efficient SHE, the Fan-Vercauteren (FV) scheme [13] is an additively and multiplicatively homomorphic encryption. In particular, plaintext m and ciphertext c are polynomials over a ring: plaintext elements $m \in R_t = \mathbb{Z}_t[X]/(X^d + 1)$ and ciphertext elements $c \in R_q = \mathbb{Z}_q[X]/(X^d + 1)$, where t and q are positive integers that define the maximum of the plaintext and ciphertext coefficients, respectively, and $q > t$.
(1) FV.KeyGen($1^\lambda$): suppose $\chi$ is a short noise random distribution in $R_q$, $s \leftarrow \chi$ is a secret key sk, $a$ is a random element in $R_q$, and $e \leftarrow \chi$ is a noise term; then, output public key pk
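To make the ring structure concrete, the following is a minimal sketch of addition and multiplication in $R_t = \mathbb{Z}_t[X]/(X^d + 1)$, with polynomials stored as coefficient lists of length d. The function names `ring_add` and `ring_mul` and the toy parameters are illustrative assumptions, not part of FV or of any library; the schoolbook multiplication is chosen for clarity, not speed.

```python
def ring_add(a, b, t):
    """Coefficient-wise addition in Z_t[X]/(X^d + 1)."""
    return [(x + y) % t for x, y in zip(a, b)]


def ring_mul(a, b, t):
    """Schoolbook multiplication in Z_t[X]/(X^d + 1).

    Because X^d = -1 in this ring, a product term of degree k >= d
    wraps around to degree k - d with its sign flipped.
    """
    d = len(a)
    out = [0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < d:
                out[k] = (out[k] + ai * bj) % t
            else:
                out[k - d] = (out[k - d] - ai * bj) % t
    return out


# Toy parameters (hypothetical): d = 4, t = 17.
x_cubed = [0, 0, 0, 1]   # the polynomial X^3
x_one = [0, 1, 0, 0]     # the polynomial X
wrapped = ring_mul(x_cubed, x_one, 17)  # X^4 reduces to -1 mod 17
```

The same wrap-around rule applies in $R_q$, only with the larger modulus q.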

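Number-Theoretic Transform.
Number-theoretic transform (NTT) is a Fourier transform over finite fields that transforms convolution products into coefficient-wise products and can thus speed up polynomial multiplication in the encrypted domain [14]. For a vector $x \in \mathbb{Z}_t^n$, its NTT is represented as
$$\hat{x}_k = \sum_{j=0}^{n-1} x_j \alpha^{jk} \bmod t, \quad 0 \le k < n.$$
Inverse NTT:
$$x_j = n^{-1} \sum_{k=0}^{n-1} \hat{x}_k \alpha^{-jk} \bmod t, \quad 0 \le j < n,$$
where $n^{-1}$ is the modular inverse of n in $\mathbb{Z}_t$ and $\alpha$ is a principal nth root of unity in $\mathbb{Z}_t$.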
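The property that NTT turns convolution products into coefficient-wise products can be checked on a toy example. The sketch below is a direct quadratic-time implementation with hypothetical parameters (t = 17, n = 4, α = 4); the function names are illustrative, and a real implementation would use a fast butterfly algorithm. It also uses plain cyclic convolution, whereas the ring $\mathbb{Z}_t[X]/(X^d + 1)$ needs an additional twist by a 2n-th root of unity, which is omitted here.

```python
def ntt(x, alpha, t):
    """Forward NTT of x over Z_t; alpha is a principal n-th root of unity."""
    n = len(x)
    return [sum(x[j] * pow(alpha, j * k, t) for j in range(n)) % t
            for k in range(n)]


def intt(x_hat, alpha, t):
    """Inverse NTT: scale by n^{-1} and use alpha^{-1} (Python 3.8+ for pow)."""
    n = len(x_hat)
    n_inv = pow(n, -1, t)
    alpha_inv = pow(alpha, -1, t)
    return [n_inv * sum(x_hat[k] * pow(alpha_inv, j * k, t) for k in range(n)) % t
            for j in range(n)]


def cyclic_convolve(a, b, t):
    """Cyclic convolution of a and b over Z_t (multiplication mod X^n - 1)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % t
    return out


# Toy parameters (assumed for illustration): 4 has order 4 modulo 17.
T, ALPHA = 17, 4
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
lhs = ntt(cyclic_convolve(a, b, T), ALPHA, T)
rhs = [(u * v) % T for u, v in zip(ntt(a, ALPHA, T), ntt(b, ALPHA, T))]
```

Here `lhs` and `rhs` agree, which is the convolution theorem that makes slot-wise computation on packed values possible.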

Privacy-Preserving Ride-Matching Scheme Crossing Zones
4.1. CRide Overview.
Generally, both riders and drivers can use smartphones with third-party navigation apps and map projection systems to get their locations and convert the (latitude, longitude) pair to planar coordinates (x, y). To lighten the computation and communication burden, a city is divided into a number of zones. When a rider needs a ride, he sends his encrypted planar coordinates to RS. After receiving the coordinate ciphertexts of the drivers in the local zone, RS packs them into a ciphertext polynomial. Then, the distance ciphertext between the rider and all drivers is calculated simultaneously in a single ciphertext. When the rider receives the distance ciphertext, he decrypts it and gets the minimum distance value dist. Based on dist, the rider finds the adjacent zones. For each adjacent zone, the same operation is performed iteratively. Finally, the nearest driver in the whole zone is matched.
Note that a driver who is currently matched cannot be matched to a new rider, so we use the variable matched[k] to represent the status of the kth driver. If matched[k] = 1, the kth driver is busy; otherwise, he is free. Busy drivers are not considered for a new ride request. Figure 2 depicts the framework of CRide, which can be described as follows:

Framework of CRide.
(1) Initialization. A city is divided vertically and horizontally into a number of zones of a certain size, and some parameters are set. Drivers registered on the ORH service submit their zone locations to RS and update them whenever their zones change. Meanwhile, matched[k] is set to 0.
(2) When a rider requests ride-hailing, he first generates a key pair (pk, sk) and then packs his location into the polynomials $P_{x_R} = \sum_{i=0}^{n-1} x_R X^i$ and $P_{y_R} = \sum_{i=0}^{n-1} y_R X^i$. Next, he applies the inverse NTT to $P_{x_R}$ and $P_{y_R}$ and encrypts them to $C_{x_R}$ and $C_{y_R}$. Finally, the rider sends pk, his zone location (e.g., z), $C_{x_R}$, and $C_{y_R}$ to RS.
[...] and $C_{y_D} = \sum_{i=0}^{n-1} c^i_{y_D}$) after receiving them. The base $X = 2^{\theta}$, and $\theta$ is large enough to separate each coordinate value at the bit level. Then, RS computes the distances between the rider and the n drivers in parallel in the encrypted domain and sends $C_{dist}$ to the rider.
(7) The rider decrypts the distance ciphertext and applies the NTT to get the distance polynomial, which consists of the distances between the rider and each driver. Then, the rider gets the shortest distance $d^*$ in the local zone. Taking $d^*$, $(x_{rt}, y_{rt})$, and $(x_{lb}, y_{lb})$ as inputs, the rider runs Algorithm 1 to find the adjacent zones array β with at most 8 zones [15]. Generally, the nearest driver exists in one of them.
(8) For each adjacent zone, steps (4) to (7) are run iteratively until the nearest driver is found, and its matched[k] is set to 1.
Mathematical Problems in Engineering
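The slot-wise flow of the framework can be illustrated in the plain domain. The following sketch is a plaintext stand-in only: it performs no encryption and no NTT, and every function name is a hypothetical helper. It assumes the squared Euclidean distance as the match value and assumes that the number of slots equals the number of drivers in the zone, so that no slot is left empty.

```python
def pack_driver(coord, index, n):
    """Driver side: place one coordinate in slot `index` of an n-slot vector."""
    slots = [0] * n
    slots[index] = coord
    return slots


def aggregate(packed_vectors):
    """RS side: sum the drivers' packed vectors into a single vector."""
    return [sum(column) for column in zip(*packed_vectors)]


def squared_distances(x_r, y_r, xs_d, ys_d):
    """RS side: per-slot squared distance between the rider and each driver."""
    return [(x_r - xd) ** 2 + (y_r - yd) ** 2 for xd, yd in zip(xs_d, ys_d)]


def nearest_free_driver(dists, matched):
    """Rider side: index of the closest driver whose matched flag is 0."""
    free = [k for k in range(len(dists)) if matched[k] == 0]
    if not free:
        return None
    return min(free, key=lambda k: dists[k])


# Three hypothetical drivers and one rider in planar coordinates.
drivers = [(1, 1), (5, 5), (2, 3)]
n = len(drivers)
xs = aggregate([pack_driver(x, i, n) for i, (x, _) in enumerate(drivers)])
ys = aggregate([pack_driver(y, i, n) for i, (_, y) in enumerate(drivers)])
dists = squared_distances(2, 2, xs, ys)
best = nearest_free_driver(dists, matched=[0, 0, 0])
```

In the encrypted setting, the subtraction and squaring would be carried out by RS on ciphertexts, and only the rider, who holds sk, would see the per-slot values.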

Security Analysis from RS.
RS is honest but curious: it follows the protocol but is curious about the locations of riders and drivers.
For privacy-preserving ride-matching, riders and drivers separately encrypt their location coordinates $(x_R, y_R)$ and $(x_D, y_D)$ with the public key pk and then individually send the location ciphertexts $(C_{x_R}, C_{y_R})$ and $(C_{x_D}, C_{y_D})$ to RS. The procedure of ride-matching is thus conducted in the encrypted domain, which prevents attackers and RS from learning anything because of the semantic security of FV. Meanwhile, neither RS nor the attackers have the rider's private key sk, so they cannot decrypt the ciphertexts. In the matching process, CRide ought to protect the privacy of drivers and riders. RS only knows the ride-matching result and does not learn anything else. Thus, RS cannot launch an inference attack on riders.

Security Analysis from Driver.
In the proposed scheme, some drivers may be malicious and try to corrupt other drivers' location information by encrypting nonzero values in other slots. To prevent this attack, RS assigns a different index i to each driver, and drivers encode their coordinates in the ith coefficient, $q^i_{x_D} = x_{D_i} X^i$ and $q^i_{y_D} = y_{D_i} X^i$, by which RS can detect the malicious behavior.

Security Analysis from Rider.
During ride-matching, drivers' locations are encrypted before being sent to RS. RS computes the ciphertext of the distance and sends it to the rider. After receiving the ciphertext of the distance, the rider can decrypt it and find the nearest driver. Meanwhile, although the rider knows sk, he learns nothing about the matched driver's location and learns only the driver's ID.

Evaluation
To evaluate the communication and computation overhead and the ride-matching accuracy, experiments are conducted on a real-world dataset. In the experiments, for a 20-bit plaintext, we set the polynomial dimension to 4096 and the coefficient size to 124 bits. Thus, CRide can reach 112-bit security and support ciphertext calculation of 4096 bits. We implement the scheme with the C++ NFLlib library [16]. RS is deployed on a machine with an Intel i7-10700 CPU at 2.9 GHz and 16 GB RAM. Riders and drivers run on the same machine, which has an Intel i7-3537U CPU at 2.5 GHz and 8 GB RAM.
Figure 2: The framework of CRide.

Dataset.
CRide is evaluated on a real dataset that comes from [17]. In the experiment, we extract a zone z from the dataset for October 2014, which has over 14 million riders. Zone z has about 4096 drivers and is used to evaluate the performance. For cross-zone ride-matching, zone z is divided vertically and horizontally into a number of zones. For example, the zone is divided into nine zones of the same size using a 3 × 3 division.
In the dataset, we assume that a record is a ride-matching request and that the pick-up location of a rider is his location. Since 99% of the time intervals between a driver's drop-off and his next pick-up are about 30 seconds, a driver is considered available for a ride request if he has had a drop-off in the last 30 seconds.
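The zone division can be sketched as a mapping from planar coordinates to a grid cell. The function below is a hypothetical helper, not taken from the implementation; it assumes that zone z is an axis-aligned rectangle given by its lower left and upper right corners, that the grid is g × g with equally sized cells, and that the point lies inside z.

```python
def zone_of(x, y, x_lb, y_lb, x_rt, y_rt, g):
    """Return the (row, col) cell of point (x, y) in a g x g grid over zone z.

    Rows are counted upward from the lower edge and columns rightward from
    the left edge. Points on the upper or right boundary are assigned to the
    last row or column.
    """
    col = min(int((x - x_lb) * g / (x_rt - x_lb)), g - 1)
    row = min(int((y - y_lb) * g / (y_rt - y_lb)), g - 1)
    return row, col


# Hypothetical 9 x 9 area split with a 3 x 3 division.
cell = zone_of(4, 7, 0, 0, 9, 9, 3)
```

Finer divisions such as 6 × 6 or 9 × 9 only change the value of `g`.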

Performance.
This part evaluates the performance of CRide, including the communication and computation overhead, in comparison with PrivateRide [7] and ORide [8]. Based on the above settings, the polynomial size is 62 KB. The maximum number of available drivers for a request is 4096. Experimental results are collected over 100 trials. When a rider requests ride-matching, the communication overhead and computation cost are shown in Tables 1 and 2, respectively.

The Performance of Riders.
Assume that there are n drivers and m adjacent zones, where the nearest driver may exist in one of the zones. For a ride request, the rider sends RS a public key and two encrypted coordinates, i.e., six polynomials with a 372 KB payload. In PrivateRide, the download overhead is linear in the number of available drivers. With the ciphertext packing technique, ORide has only one distance ciphertext, and the number of distance ciphertexts in CRide is reduced to $m \cdot \lceil n/d \rceil$. The size of an encrypted distance is 186 KB. For a ride request, 4096 encrypted distances are sent to the rider in PrivateRide. In CRide, the download overhead is linear in the number of chosen adjacent zones. The maximum value of m is 8, and experiments show that m is at most 3 in most cases. Over one hundred experiments, RS sends the rider distance ciphertexts of 353.4 KB on average for the 3 × 3 division. Table 1 shows that the communication overhead of CRide is slightly higher than that of ORide but significantly lower than that of PrivateRide.
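The payload sizes quoted above follow from the parameter settings, as the short calculation below shows. The helper names are illustrative. Reading the 186 KB encrypted distance as three polynomials is an inference from 186 = 3 × 62 and is not stated explicitly in the text; KB is taken as 1024 bytes.

```python
POLY_DIM = 4096      # polynomial dimension from the experimental setup
COEFF_BITS = 124     # ciphertext coefficient size in bits


def polynomial_kb(dim=POLY_DIM, coeff_bits=COEFF_BITS):
    """Size of one polynomial in KB: dim coefficients of coeff_bits each."""
    return dim * coeff_bits / 8 / 1024


def payload_kb(num_polynomials):
    """Size in KB of a message made of the given number of polynomials."""
    return num_polynomials * polynomial_kb()


upload_kb = payload_kb(6)        # public key plus two encrypted coordinates
distance_ct_kb = payload_kb(3)   # assumed: three polynomials per distance ciphertext

# Average number of distance ciphertexts per request for the 3 x 3 division,
# derived from the reported 353.4 KB average download.
avg_distance_cts = 353.4 / distance_ct_kb
```

Under these assumptions, the reported 353.4 KB corresponds to roughly 1.9 distance ciphertexts per request on average.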
As shown in Table 2, the three schemes have the same computation cost for key generation and encryption. The decryption cost of CRide is slightly higher than that of ORide but significantly lower than that of PrivateRide.
Tables 1 and 2 show that the three schemes have the same communication and computation overhead for each ride request. The driver receives the public key pk and uploads his encrypted coordinates for ride requests from his zone and some adjacent zones. The driver's bandwidth should be sufficient for ride-matching without being too large. The bandwidth is calculated by multiplying the required bandwidth per request by the number of requests received per second. The experimental results show that the required bandwidths for CRide (3 × 3), CRide (6 × 6), and CRide (9 × 9) are less than 0.68 Mb/s, 0.3 Mb/s, and 0.23 Mb/s, respectively. Therefore, RS should balance the bandwidth requirement for drivers against the zone division.
As shown in Table 1, using ciphertext packing and cross-zone ride-matching, the communication overhead of CRide is significantly reduced compared with PrivateRide and ORide. The computation cost of CRide is linear in the number m of adjacent zones, and that of PrivateRide is linear in the number of available drivers.
Table 2 shows that the computation cost of CRide is significantly reduced compared with PrivateRide and slightly higher than that of ORide.

(1) Input: the upper right corner coordinates $(x_{rt}, y_{rt})$ and lower left corner coordinates $(x_{lb}, y_{lb})$ of zone z
(2) The distance $dist = d^*$ between the nearest driver and the rider in zone z
(3) The coordinates $(x_r, y_r)$ of the rider
(4) Output: zones array β in which the nearest driver may exist
(5) β is initialized to empty
(6) if $x_r + dist > x_{rt}$, then add $z_3$ into β
(7) if $x_r - dist < x_{lb}$, then add $z_7$ into β
(8) if $y_r - dist < y_{lb}$, then add $z_5$ into β
(9) if $y_r + dist > y_{rt}$, then add $z_1$ into β
(10) if $(x_r - x_{lb})^2 + (y_r - y_{lb})^2 < dist^2$, then add $z_6$ into β
(11) if $(x_r - x_{lb})^2 + (y_r - y_{rt})^2 < dist^2$, then add $z_8$ into β
(12) if $(x_r - x_{rt})^2 + (y_r - y_{rt})^2 < dist^2$, then add $z_2$ into β
(13) if $(x_r - x_{rt})^2 + (y_r - y_{lb})^2 < dist^2$, then add $z_4$ into β
(14) return β
ALGORITHM 1: Adjacent zone search.
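The adjacent zone search of Algorithm 1 can be sketched as follows. This is an illustrative version, not the authors' code. It assumes that the corner tests use the Euclidean distance from the rider to each corner of zone z, i.e., differences in both coordinates, and that the zones are numbered clockwise from $z_1$ (above) to $z_8$ (upper left), as the edge conditions suggest.

```python
def adjacent_zones(x_r, y_r, dist, x_lb, y_lb, x_rt, y_rt):
    """Return the indices of adjacent zones that may hold a nearer driver.

    A zone is added when the circle of radius `dist` around the rider
    crosses the shared edge (zones 1, 3, 5, 7) or contains the shared
    corner (zones 2, 4, 6, 8).
    """
    beta = []
    # Edge neighbours: the circle reaches past one side of zone z.
    if y_r + dist > y_rt:
        beta.append(1)   # above
    if x_r + dist > x_rt:
        beta.append(3)   # right
    if y_r - dist < y_lb:
        beta.append(5)   # below
    if x_r - dist < x_lb:
        beta.append(7)   # left
    # Corner neighbours: the circle contains a corner of zone z.
    d2 = dist * dist
    if (x_r - x_rt) ** 2 + (y_r - y_rt) ** 2 < d2:
        beta.append(2)   # upper right
    if (x_r - x_rt) ** 2 + (y_r - y_lb) ** 2 < d2:
        beta.append(4)   # lower right
    if (x_r - x_lb) ** 2 + (y_r - y_lb) ** 2 < d2:
        beta.append(6)   # lower left
    if (x_r - x_lb) ** 2 + (y_r - y_rt) ** 2 < d2:
        beta.append(8)   # upper left
    return sorted(beta)


# Hypothetical zone with corners (0, 0) and (10, 10); rider near the upper right.
zones = adjacent_zones(9, 9, 2, 0, 0, 10, 10)
```

A rider in the middle of the zone with a small `dist` gets an empty array, in which case the local match is already the global one.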

Accuracy.
For a ride-hailing service, the key is to match riders with the nearest drivers. In the experiments, 100 ride requests are generated randomly and processed. We use the ride-matching accuracy rate to measure how well the matches of PrivateRide, ORide, and CRide agree with the ground truth computed in plaintext. As shown in Figure 3, CRide finds the nearest driver for about 92% of riders, exceeding ORide, which performs ride-matching without crossing zones, by nearly 30% under all division granularities. Meanwhile, thanks to cross-zone matching, CRide reaches the same accuracy as PrivateRide, which performs ride-matching without zone division. However, Table 1 shows that the communication overhead and computation cost of PrivateRide are considerably larger than those of CRide.
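The accuracy rate used here can be read as the fraction of requests for which a scheme returns the same driver as plaintext matching. The sketch below assumes this definition; the function name and the driver IDs are hypothetical.

```python
def matching_accuracy(matched_ids, ground_truth_ids):
    """Fraction of requests whose matched driver equals the plaintext nearest driver."""
    if len(matched_ids) != len(ground_truth_ids):
        raise ValueError("both lists must cover the same ride requests")
    if not matched_ids:
        return 0.0
    hits = sum(m == g for m, g in zip(matched_ids, ground_truth_ids))
    return hits / len(matched_ids)


# Four hypothetical requests, three of which are matched to the true nearest driver.
rate = matching_accuracy([1, 2, 3, 4], [1, 2, 0, 4])
```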

Related Work
With the popularity of ride-hailing services, privacy protection receives more and more attention, and several privacy-preserving ride-hailing schemes have been designed. Duan et al. and Khazbak et al. [18,19] used cloaking technologies to match the nearest drivers for a ride request while protecting the sensitive information of drivers and riders. Duan et al. [18] proposed cloaking region-based passenger privacy protection in ride-hailing, maximizing social welfare under riders' privacy requirements. Khazbak et al. [19] proposed a privacy-preserving ride-hailing scheme that considers the privacy preferences of riders using novel obfuscation techniques. However, these schemes use a location range to find the nearest drivers for a rider, which results in low accuracy.
To improve the accuracy of ride-matching, schemes [15,20] were designed that match using the road network embedding (RNE) technology. Luo et al. [15] proposed pRide, which allows ORH to efficiently match riders and drivers based on RNE and garbled circuits. Yu et al. [20] proposed RMatch, which computes the shortest distance using RNE and PHE. Meanwhile, Yu et al. [20] also proposed EPRide, in which the Hamming distance replaces the road distance to implement ride-matching using somewhat homomorphic encryption and road network hypercube embedding technology. However, the three schemes assume that there is a trusted party, which cannot fully guarantee the privacy of users in practice.
There are also recent works on privacy-preserving ride-hailing. Shivers et al. [21] proposed a framework for developing a decentralized ride-hailing architecture for autonomous vehicles, implemented on the Hyperledger Fabric blockchain platform. Huang et al. [22] proposed a privacy-preserving ride-matching scheme with prediction, which utilizes a deep learning model to predict the emergence of ride requests in various regions and, leveraging the prediction results, finds the best driver from a global perspective instead of the nearest driver in the local region.

Conclusions
In this paper, a privacy-preserving cross-zone ride-matching scheme is proposed for ride-hailing services. The city is divided into a number of zones, and ride-matching is performed across adjacent zones using somewhat homomorphic encryption and ciphertext packing. By using CRide, RS can find the nearest driver for riders in the whole zone. During matching, the privacy of drivers and riders is protected.
Theoretical analysis and experimental results over real-world datasets show that CRide achieves acceptable efficiency and a matching accuracy comparable with the ground truth computed in plaintext. In future research, the reputation of drivers and the interests of riders are worth considering for ORH service.
Data Availability
The data that support the findings of this study are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.