Sound-Proximity: 2-Factor Authentication against Relay Attack on Passive Keyless Entry and Start System

Passive keyless entry and start (PKES) systems are widely used in modern cars: owners can open the door or start the engine merely by carrying the key in a pocket. PKES was originally designed to establish a communication channel between the car and its key only within approximately one meter. However, this channel is vulnerable to relay attacks, by which attackers can unlock the door even when the key is out of range. Although relay attacks have been recognized as a potential threat for over ten years, they were long considered impractical because they required highly expensive equipment. That cost has steadily fallen, and a relay attack has recently been demonstrated with equipment costing less than $100. In this paper, we propose a sound-based proximity-detection method to prevent relay attacks on PKES systems. Sound is well suited to PKES because audio systems are already common in cars. We evaluate our method in environments where cars are commonly parked and identify a recording time that satisfies both usability and security. In addition, we define a new, advanced attack against sound-based proximity detection, called the record-and-playback attack, and demonstrate that our method is robust to it.


Introduction
Passive keyless entry and start (PKES) is a service that aims to make cars more convenient for their owners. Traditionally, owners had to insert their keys to open the doors and start the engine, so the key had to be taken out of a pocket or bag before the car could be used. PKES lets owners unlock the car and start the engine without removing the key from their pocket or bag; they do not even have to locate the key while standing in front of the car. PKES was first introduced by Mercedes-Benz in 1998 [1], and car manufacturers have since been adopting it in modern cars. In PKES, a car communicates with its paired key over a short-range channel, so communication is possible only when the owner carrying the key is close to the car. The car then verifies the key over the established channel. Because the car and its key share a secret value for authentication, a key that lacks this secret is not authenticated, even when it is in close proximity to the car. Despite the convenience of PKES, vulnerabilities in it have recently been discovered [2,3], which motivates us to construct a new method.
In various areas, researchers have studied relay-attack-resilient proximity-verification methods based on either a distance-bounding protocol or contextual information [4-15]. Unfortunately, each of these existing methods has limitations that prevent it from being applied to PKES. A distance-bounding protocol measures distance based on the received signal strength indicator (RSSI), ultrasonic sound, or radio frequency (RF) signals, and none of these is appropriate for proximity verification in PKES. RSSI-based and ultrasonic-sound-based methods are vulnerable to attacks in which the "verifier" (i.e., the car) is deceived into believing that a distant "prover" (i.e., the key) is nearby, because the attacker amplifies or relays the signals [8,16]. The RF-based method is unsuitable for PKES because it requires the prover's processing time to be invariant [8]: radio waves travel at the speed of light, so even small variations in processing time cause large measurement errors. Context-based proximity-detection methods depend on two types of contextual information: wireless signals (e.g., Bluetooth and Wi-Fi) and environmental signals (e.g., light and ambient sound). Because cars are often parked where wireless signals are scarce, wireless signals are inappropriate as contextual information. Moreover, it is difficult to exploit similarity in light between the car and its key, because keys are often in a bag or a pocket. Sound, however, exists almost everywhere, and even in silence a car can easily produce sound with the buzzer it already has. A key can also record sounds similar to those recorded by the car even when it is in a bag or pocket. Several services have recently used sound to detect proximity to a user's smartphone [17,18].
Journal of Advanced Transportation
In this paper, we propose a sound-based proximity-detection method to prevent relay attacks on PKES. In addition, we define a new adversary model for sound-based proximity verification, called the record-and-playback attack. We show that our method detects proximity accurately and is robust to this novel attack. Our contributions are detailed as follows.

Our Contribution.
We propose a method designed to prevent relay attacks on the PKES systems of modern cars, built on sound-based proximity detection.
(i) Our method prevents an attack to which an existing sound-based approach is reportedly vulnerable [10]. It does so by having the car emit random sounds while ambient sound is being recorded.
(ii) We define a new attack against sound-based approaches, the record-and-playback attack, and demonstrate that our method is robust to it.
(iii) We conducted a series of experiments using commercial off-the-shelf products (i.e., microphones and loudspeakers) and show with them that practically feasible attacks on PKES systems can be prevented.
The rest of the paper is organized as follows. In Section 2, we first describe the motivation of this paper. In Section 3, we introduce the preliminary background for this paper. In Sections 4 and 5, we provide a system model and explain our method, respectively. Section 6 presents the result of the evaluation of our method. Related works are discussed in Section 7. The limitations of this paper and future works are presented in Section 8. Finally, we conclude this paper in Section 9.

Motivation
With PKES, the car owner is passively authenticated based on the proximity of the car to its corresponding key, which the owner carries. The car (i.e., the verifier) detects the proximity of the key by using a short-range communication channel, such as Bluetooth or radio frequency identification (RFID) [19,20]. An authentication protocol, such as KeeLoq [21], is then executed between the car and the key, so the owner does not need to do anything to be authenticated. Unfortunately, a short-range communication channel cannot guarantee that the owner is actually nearby, because its communication range can be extended. For example, the range of Bluetooth is approximately ten meters, yet malicious attackers can scan and attack a Bluetooth device from up to a mile away using a signal-extending device called the BlueSniper Rifle [22]. Indeed, Francillon et al. demonstrated relay attacks against PKES in modern cars [3]. They relayed the signal between a car and its key using two loop antennas connected by a cable; attackers can also place an amplifier in the middle of the cable to boost the signal power. Francillon et al. opened the door and started the engine of a car while the key was 60 m away, and they noted that, in theory, such an attack is possible at distances of up to 1.5 km. Despite these concerns, relay attacks on PKES have been regarded as impractical, because designers consider them too difficult and costly to deploy. However, Bilton reported that the tools required for such a relay attack are available for only $17 [23]. Furthermore, thieves recently used a relay attack in a robbery in Frankfurt, Germany [2]: they opened a car door and stole a number of valuables from inside the car.
Fundamentally, PKES detects the proximity of the key through a short-range communication channel, and this is what makes relay attacks possible: by relaying the signal, an attacker lets the car communicate with its key over an extended distance, ultimately unlocking the door and even starting the engine.

Background
In this section, we explain the PKES system and the metrics used for similarity measurements.

Passive Keyless Entry and Start (PKES) System.
Waraksa et al. first proposed the PKES system to enable drivers to automatically lock, unlock, and start a car. The system unlocks the car as a driver carrying the corresponding key approaches it, and it locks the car whenever the key is out of range. Because the system requires no action on the part of the driver, it is called "passive." With PKES, a car communicates with its corresponding key using magnetically coupled RF signals, so a communication channel is established only when the key is in close proximity to the car. The car and its key use two types of RFID tags: low-frequency (LF) and ultrahigh-frequency (UHF). The car uses an LF channel to send messages to the key, instructing it to "wake up" and accept "challenges," and the key responds over a UHF channel, which saves battery power [24]. Because the communication range of the LF RFID tag is approximately 1-2 m, the key wakes up only when it is close to the car (RFID technologies fall into several categories according to their operating frequencies, each with its own advantages and disadvantages) [25]. Figure 1 illustrates how the PKES system works. The car periodically broadcasts a wake-up signal over the LF channel. If the key is within range, an authentication protocol is executed to issue a challenge and receive the response. One example is KeeLoq, a block cipher based on a nonlinear feedback shift register (NLFSR) that accepts 64-bit keys, encrypts 32-bit blocks, and has been used to generate the hopping (rolling) codes of many remote keyless entry systems. The car and its key share a symmetric key for authentication, which is set by the manufacturer. PKES is now widely used. However, as described in Section 2, attackers can unlock the car door without the symmetric key, simply by relaying the LF and UHF channels.
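The wake-up, challenge, and response exchange described above can be sketched in Python as follows. This is a minimal illustration, not KeeLoq: we substitute HMAC-SHA256 from the standard library as the keyed function, and the names `car_issue_challenge`, `key_respond`, `car_verify`, and `relay` are hypothetical. The `relay` function shows why the exchange alone cannot bound distance: forwarding bytes unchanged requires no knowledge of the shared key.

```python
import hashlib
import hmac
import secrets

# Hypothetical symmetric key set by the manufacturer (random stand-in value).
SHARED_KEY = secrets.token_bytes(16)


def car_issue_challenge() -> bytes:
    """Car side: fresh random challenge sent over the LF channel after wake-up."""
    return secrets.token_bytes(8)


def key_respond(challenge: bytes, key: bytes) -> bytes:
    """Key side: keyed response sent back over the UHF channel.

    HMAC-SHA256 stands in for the manufacturer's cipher (e.g., KeeLoq).
    """
    return hmac.new(key, challenge, hashlib.sha256).digest()


def car_verify(challenge: bytes, response: bytes, key: bytes) -> bool:
    """Car side: recompute the expected response and compare in constant time."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


def relay(message: bytes) -> bytes:
    """Attacker's relay: forwards bytes unchanged and never needs the key."""
    return message


challenge = car_issue_challenge()
direct = key_respond(challenge, SHARED_KEY)
relayed = relay(key_respond(relay(challenge), SHARED_KEY))

print(car_verify(challenge, direct, SHARED_KEY))   # True
print(car_verify(challenge, relayed, SHARED_KEY))  # True: the relay goes unnoticed
```

Both checks succeed, which is exactly the gap a proximity factor has to close.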

Sound Similarity.
To detect the proximity of a car to its key, the similarity of sound can be used when both the key and the car record sound concurrently. In this subsection, we describe three typical metrics for measuring the similarity between two recorded sounds: the Euclidean distance, cross-correlation, and cosine similarity [26]. We evaluated our method using each of these metrics. In what follows, $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ denote two signals represented as $n$-point discrete time series (i.e., the recorded sounds). For simplicity, we assume that both series have the same length.
Euclidean Distance. The Euclidean distance (or Euclidean metric) is one of the most common sound-similarity metrics. Geometrically, it is the length of the straight line between two points in Euclidean space. In two dimensions, if $p = (p_1, p_2)$ and $q = (q_1, q_2)$, the Euclidean distance is

$$d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}.$$

A value of $d(p, q) = 0$ indicates that the two points are identical; larger values indicate points that are farther apart. We can measure the similarity between the sounds $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ using the Euclidean distance as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$

The resulting distance ranges from 0, when the two series are identical, to any positive real number; the larger the distance, the more the two series differ.
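A minimal NumPy sketch of the series form of the Euclidean distance follows; the function name is ours. The second call illustrates a property worth keeping in mind when the car's and the key's microphones have different gains: the Euclidean distance is not scale-invariant, so a recording and a louder copy of the same sound are judged different.

```python
import numpy as np


def euclidean_distance(x, y) -> float:
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) for two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must have the same length")
    return float(np.sqrt(np.sum((x - y) ** 2)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))            # 5.0
print(euclidean_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # sqrt(14), about 3.742
```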
Cross-Correlation. Cross-correlation is frequently used to measure the similarity between two series, and it builds on the concept of correlation. The correlation between two variables measures the degree of linear relationship between them and ranges from −1 to 1: a correlation of 1 indicates a perfect positive linear relationship, −1 indicates a perfect negative linear relationship (the same pattern with the opposite sign), and 0 indicates that the variables are uncorrelated, which by itself does not imply that they are independent.
Cross-correlation measures the similarity of two series as a function of the lag of one relative to the other. For two discrete time series $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, the cross-correlation at lag $k$ is

$$\mathrm{Corr}_{x,y}[k] = \sum_{i=1}^{n-k} x_{i+k}\, y_i,$$

where $k \in [0, n-1]$ denotes the lag.
To accommodate different amplitudes of the two series, the cross-correlation can be normalized as

$$\overline{\mathrm{Corr}}_{x,y}[k] = \frac{\mathrm{Corr}_{x,y}[k]}{\sqrt{\mathrm{Corr}_{x,x}[0]\;\mathrm{Corr}_{y,y}[0]}},$$

where $\mathrm{Corr}_{x,x}[0]$ and $\mathrm{Corr}_{y,y}[0]$ are the autocorrelations of $x$ and $y$ at lag 0. A value of $\overline{\mathrm{Corr}}_{x,y}[k] = 1$ indicates that, at lag $k$, the two signals have the same shape even if their amplitudes differ; −1 indicates that they have the same shape but opposite signs; and 0 indicates that they are uncorrelated at that lag.
Accordingly, we can measure sound similarity with a cross-correlation metric, using the maximum absolute value of the normalized cross-correlation over all lags:

$$\widehat{\mathrm{Corr}}_{x,y} = \max_{0 \le k \le n-1} \left| \overline{\mathrm{Corr}}_{x,y}[k] \right|.$$

A value of $\widehat{\mathrm{Corr}}_{x,y} = 0$ indicates that $x$ and $y$ are uncorrelated at every lag; 1 indicates that, at some lag, they have exactly the same shape up to amplitude and sign.
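The maximum absolute normalized cross-correlation can be sketched with NumPy as follows. This is an illustrative implementation under our own naming, not the authors' code; it follows the definitions above, including the restriction to non-negative lags $k \in [0, n-1]$. `np.correlate(x, y, mode="full")` returns $\sum_i x_{i+k} y_i$ for every lag from $-(n-1)$ to $n-1$, so its last $n$ entries are the non-negative lags. Direct correlation costs $O(n^2)$; an FFT-based routine would be faster for long recordings.

```python
import numpy as np


def max_normalized_xcorr(x, y) -> float:
    """Max over lags k in [0, n-1] of |Corr_xy[k]| / sqrt(Corr_xx[0] * Corr_yy[0])."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must have the same length")
    n = len(x)
    # Lag-0 autocorrelations are the sums of squares (the signal energies).
    denom = np.sqrt(np.dot(x, x) * np.dot(y, y))
    if denom == 0.0:
        return 0.0  # a silent recording carries no shape information
    full = np.correlate(x, y, mode="full")  # lags -(n-1) .. (n-1)
    nonneg = full[n - 1:]                   # lags 0 .. n-1: sum_i x[i+k] * y[i]
    return float(np.max(np.abs(nonneg)) / denom)


rng = np.random.default_rng(7)
s = rng.normal(size=2_000)
delayed = np.concatenate([np.zeros(50), s])  # x: the sound arrives 50 samples later
early = np.concatenate([s, np.zeros(50)])    # y: the same sound without the delay

print(max_normalized_xcorr(delayed, 0.3 * early))             # close to 1.0 despite delay and gain
print(max_normalized_xcorr(delayed, rng.normal(size=2_050)))  # small: unrelated noise
```

If the key's recording could also lead the car's rather than lag it, searching both halves of `full` would be a natural extension (our assumption, not part of the definition above).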
Cosine Similarity. The cosine similarity measures the similarity between two vectors as the cosine of the angle between them. The cosine of 0° is 1, and it is less than 1 for any other angle. Cosine similarity therefore reflects orientation rather than magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of −1.
For two time series $x$ and $y$, the similarity is calculated as

$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i\, y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\;\sqrt{\sum_{i=1}^{n} y_i^2}}.$$

The resulting similarity ranges from −1 (exactly opposite) to 1 (exactly the same orientation), with 0 indicating that the two series are orthogonal; values in between indicate intermediate similarity or dissimilarity.
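A minimal sketch of the cosine similarity follows (function name ours). Comparing the definitions, the cosine similarity of two series equals their normalized cross-correlation at lag 0. Consequently, it cannot compensate for a time offset between two recordings, which a search over lags can; the last example shows a one-sample offset turning a perfect match into near-orthogonality.

```python
import numpy as np


def cosine_similarity(x, y) -> float:
    """cos(x, y) = sum_i x_i y_i / (||x|| * ||y||) for two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must have the same length")
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return 0.0  # convention for an all-zero series
    return float(np.dot(x, y) / denom)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # 0.0 (orthogonal)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # about 1.0 (same orientation)
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # about -1.0 (opposite)

# White noise compared with itself shifted by one sample: close to 0.
noise = np.random.default_rng(3).normal(size=10_001)
print(cosine_similarity(noise[1:], noise[:-1]))
```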

System Model
We present a system model in which our method enables a car to verify the proximity of its key. The basic concept is that the car and its key concurrently record ambient sound, as shown in Figure 2; the car then measures the similarity between the two recordings and decides from the result whether the key is nearby. Unlike existing PKES, our method requires both entities to have a microphone for recording sound, which may seem to limit its direct application to current cars. A car can readily record sound, or an additional recording device can be installed relatively easily, but current car keys can hardly record sound and transmit it wirelessly. Recently, however, most car manufacturers have released telematics services that offer convenience to car users [27]. Through smart devices, owners can unlock the door or start the engine from a distance, which implies that cars are already connected to smart devices. In fact, smartwatches are a promising replacement for car keys, because users wear them constantly [28-30]. We expect such devices to gradually replace conventional car keys, which would make our method directly applicable. The procedure by which the car verifies its key based on the key's proximity is as follows.
(i) The car periodically broadcasts a beacon message via a short-range communication channel (e.g., the LF RFID channel).
(ii) If a car key is within the communication range, the car key responds to the beacon.
(iii) The car and its key perform an authentication protocol using the preshared secret (i.e., symmetric key).
(iv) If the key is authenticated, the car and the key simultaneously record ambient sound, and the key then transmits its recorded audio file to the car's sound-similarity estimator module.
(v) The estimator module measures the similarity of the two audio files (the one recorded by the car and the one recorded by the key), thereby detecting whether the key is in proximity to the car.
If the key is authenticated and its proximity is verified, the door is unlocked. Note that our method contributes Steps (iv) and (v); Steps (i)-(iii) are the same as in existing PKES.
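The five steps, and the rule that both factors must pass, can be summarized in a short control-flow sketch. All names are hypothetical: the callables stand in for the PKES exchange of Steps (i)-(iii), the concurrent recording of Step (iv), and the estimator of Step (v). The structure makes two points explicit: recording starts only after authentication succeeds, and the door opens only if both checks pass.

```python
from typing import Callable, Sequence

Recording = Sequence[float]


def verify_and_unlock(
    authenticate: Callable[[], bool],                        # Steps (i)-(iii): existing PKES
    record_both: Callable[[], tuple[Recording, Recording]],  # Step (iv): (rec_car, rec_key)
    similarity: Callable[[Recording, Recording], float],     # Step (v): estimator module
    threshold: float,
) -> bool:
    """Return True (unlock) only if the key authenticates AND is judged nearby."""
    if not authenticate():
        return False  # first factor failed: never start recording
    rec_car, rec_key = record_both()
    return similarity(rec_key, rec_car) > threshold  # second factor: proximity


# Example with stand-in components (hypothetical values):
unlocked = verify_and_unlock(
    authenticate=lambda: True,
    record_both=lambda: ([0.1, 0.5, -0.2], [0.1, 0.5, -0.2]),
    similarity=lambda key, car: 1.0 if list(key) == list(car) else 0.0,
    threshold=0.272,
)
print(unlocked)  # True
```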

Concern about Interference.
Because our method verifies proximity based on recorded ambient sound, one might worry about interference from noise. In this subsection, we explain why our method works even under noisy conditions. Importantly, the car and its key do not communicate with each other through sound in our method; the recorded ambient sound is used only to check whether they are close to each other. In other words, our method relies on the fact that entities close to each other hear the same or very similar ambient sounds. Consequently, no matter how noisy the environment is, a car and its nearby key record the same sounds. Noise can even help, because it carries location-specific information that makes their recordings more distinctive.
Furthermore, our method works properly even when multiple cars are present. Suppose, for example, that multiple cars using our method are parked in the same lot and record the same nearby ambient sounds. All of them may then record the same or very similar sounds, which might seem to make it difficult to identify the proper car key. However, even if another car's key has the same sound signature, that key is rejected by the authentication protocol executed before our method. Recall that our method is designed to let a car verify the proximity of its own key, thereby supporting the existing authentication protocol.

Adversary Model.
The main goal of attackers against our method is to unlock the door or start the engine without the car owner's knowledge. Because the existing PKES already prevents passive attacks, we consider only active attacks. Accordingly, adversaries can modify and replay messages and relay signals. We divide adversaries into two types based on how they obtain the ambient sound that is compared with the sound recorded by the target car. Type I adversaries try to unlock the door while the driver and the key are only a moderate distance away (e.g., as the driver walks through the parking lot). Type II adversaries record the ambient sound near the car and then play back the recording in the vicinity of the key. Each adversary type is described in detail as follows.
Adversary Model Type I (Out-of-Range Attack). A Type I adversary can execute a relay attack on PKES. After parking and locking the car, the driver walks away; for a short time, the car and its key are still likely to record very similar sounds. Type I adversaries therefore have a chance to unlock the door before the driver moves too far away, by relaying signals between the car and its key. We define this as an out-of-range attack.
Adversary Model Type II (Record-and-Playback Attack). A Type II adversary records the ambient sound near the car and then plays back the recording in the vicinity of the key. Even when the car is far from its key (e.g., while the driver is working in an office), Type II adversaries manipulate the ambient sound by relaying it so that both the car and the key record similar sounds; hence the name record-and-playback attack. This attack requires two colluding attackers, A and B. Because we define this type of attack for the first time, we detail its steps as follows.
(i) Attackers A and B, who are close to the car and its key, respectively, establish a long-distance communication channel.
(ii) Attacker A records the ambient noise near the car, encodes the sound, and transmits this audio file to Attacker B via the long-distance communication channel.
(iii) Attacker B decodes and plays the audio file, such that the key records audio that is similar to the ambient noise near the car.
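To make the three steps concrete, the following toy simulation models them with synthetic signals. Everything here is an assumption for illustration: random tones and Gaussian noise stand in for real recordings, the attackers' phone call is approximated by a narrowband 300-3400 Hz voice channel (actual codecs vary), the playback gain and the key's local noise level are arbitrary, latency is ignored, and similarity is the plain cosine at lag 0. All names are hypothetical; this sketches the attack's logic and does not reproduce the paper's experiments.

```python
import numpy as np

FS = 44_100  # sampling rate used in the experiments


def band_limit(signal, fs, low_hz, high_hz):
    """Crude FFT-mask band-pass; stands in for the phone call's voice band (assumption)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


rng = np.random.default_rng(0)
n = FS // 4  # 0.25 s keeps the toy model fast
t = np.arange(n) / FS
tones = rng.uniform(20.0, 20_000.0, size=8)
rec_car = sum(np.sin(2 * np.pi * f * t) for f in tones) + rng.normal(0.0, 1.0, n)

# Legitimate key next to the car: same sound field plus a little sensor noise.
rec_key_near = rec_car + rng.normal(0.0, 0.1, n)

# Steps (ii)-(iii): A streams rec_car to B, who plays it beside the key; the key
# also hears its own, unrelated surroundings (e.g., an office).
played_back = 0.5 * band_limit(rec_car, FS, 300.0, 3_400.0)
rec_key_relayed = played_back + rng.normal(0.0, 1.0, n)

print(cosine(rec_car, rec_key_near))     # close to 1
print(cosine(rec_car, rec_key_relayed))  # clearly lower
```

In this toy model the relayed recording loses most of the random sound's spectrum and is diluted by the key's own surroundings, which mirrors the qualitative explanation given for the measured record-and-playback results.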

Our Assumptions.
Our method is based on the following assumptions: (1) Like the existing PKES system, both the car and the key hold a preshared secret (i.e., a symmetric key).
(2) Further, because our method aims to prevent relay attacks (as described in Section 2), we assume that attackers have the ability to relay short-range communications between the car and the key.
(3) Finally, we assume that the key has sufficient resources to record and transmit audio files. A smartwatch, for example, would work as such a key with our method [29,30].

Our Method
Our method is designed to accurately detect the physical proximity of a car to its key. To do so, it first executes one of the existing authentication protocols for PKES. After this initial authentication, the car and the key start recording ambient sound concurrently. The key sends its recording to the car, and the car then verifies the key's proximity based on the similarity of the two recordings. We describe each step of our method as follows.

Symmetric-Key Based Authentication.
In PKES, a car authenticates its corresponding key with a preshared symmetric key. As explained in Section 3.1, an RFID communication channel is used so that the car periodically checks whether the key is nearby. When the key and the car are both within the RFID communication range, they execute a predefined authentication protocol; KeeLoq is a standard protocol used with existing PKES systems [21]. If the key is authenticated, the following steps are taken to ensure physical proximity between the car and its key.

Ambient Sound Recording.
Before the car and its key record ambient sound, the car emits additional random sounds (such as a horn) to increase the entropy of the ambient sound to be recorded, because the car cannot be confident of the verified proximity if the entropy is low. The frequencies of the random sound are selected randomly within the human audible spectrum of 20-20,000 Hz:

$$f_i \in [20, 20000]\ \text{Hz}, \quad 1 \le i \le m.$$

For each selected frequency $f_i$, the length of time $t_i$ during which it plays is also selected randomly, subject to

$$\sum_{i=1}^{m} t_i = T,$$

where $T$ is the total time during which the random sound plays. The car and its key both record ambient sound, whereas only the car emits the random sound. As a result, two audio files are generated, $\mathrm{REC}_{\mathrm{car}}$ and $\mathrm{REC}_{\mathrm{key}}$.
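The random-sound generation can be sketched as follows. The text specifies only that the frequencies are drawn from [20, 20000] Hz and that the randomly chosen durations add up to the total playing time; the uniform frequency distribution, the flat Dirichlet split of the total time, the pure-sine segments, and the function name `random_beacon` are our assumptions.

```python
import numpy as np


def random_beacon(total_s: float, num_tones: int, fs: int = 44_100, rng=None):
    """Random tone sequence: f_i ~ U[20, 20000] Hz, durations t_i summing to T.

    Drawing the durations from a flat Dirichlet distribution is our assumption;
    the method only requires that they be random and add up to T.
    """
    rng = np.random.default_rng() if rng is None else rng
    freqs = rng.uniform(20.0, 20_000.0, size=num_tones)
    total_samples = int(round(total_s * fs))
    weights = rng.dirichlet(np.ones(num_tones))  # random shares of T, summing to 1
    counts = np.floor(weights * total_samples).astype(int)
    counts[-1] += total_samples - counts.sum()  # rounding slack goes to the last tone
    segments = [np.sin(2 * np.pi * f * np.arange(c) / fs) for f, c in zip(freqs, counts)]
    return np.concatenate(segments), freqs, counts / fs


signal, freqs, durations = random_beacon(total_s=2.0, num_tones=10,
                                         rng=np.random.default_rng(42))
print(len(signal), durations.sum())  # 88200 samples; durations sum to 2 s (up to rounding)
```

Because 20 kHz lies below the 22.05 kHz Nyquist limit of a 44.1 kHz sampling rate, every tone can be captured. Abrupt jumps between segments produce audible clicks; a practical implementation might fade each segment in and out.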

Recorded-Signal Transmission.
The key generates a message authentication code (MAC) for its recorded audio file $\mathrm{REC}_{\mathrm{key}}$ using a preshared secret for the MAC, and then transmits $\mathrm{REC}_{\mathrm{key}}$ and $\mathrm{MAC}(\mathrm{REC}_{\mathrm{key}})$ to the car. Because the MAC is bound to the key's audio file, the car can verify the integrity of $\mathrm{REC}_{\mathrm{key}}$ and confirm that it was produced by the legitimate key.
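The tagging and verification of the key's recording can be sketched with the standard-library `hmac` module. HMAC-SHA256 is our choice (the text does not name a MAC algorithm), and the 16-bit little-endian serialization, the key value, and the function names are assumptions. Note that the MAC only shows that the file came unaltered from a holder of the MAC key; whether that key was actually near the car is what the similarity check decides.

```python
import hashlib
import hmac
import secrets

import numpy as np

# Hypothetical preshared MAC key; HMAC-SHA256 is our choice of MAC.
MAC_KEY = secrets.token_bytes(32)


def tag_recording(rec_key: np.ndarray, key: bytes) -> bytes:
    """Key side: MAC over the raw samples, serialized as little-endian 16-bit PCM."""
    return hmac.new(key, rec_key.astype("<i2").tobytes(), hashlib.sha256).digest()


def verify_recording(rec_key: np.ndarray, tag: bytes, key: bytes) -> bool:
    """Car side: accept REC_key only if its MAC matches (constant-time compare)."""
    return hmac.compare_digest(tag_recording(rec_key, key), tag)


rec_key = np.random.default_rng(1).normal(0, 1000, 88_200).astype(np.int16)
tag = tag_recording(rec_key, MAC_KEY)

tampered = rec_key.copy()
tampered[0] ^= 1  # flip one bit of one sample
print(verify_recording(rec_key, tag, MAC_KEY))   # True
print(verify_recording(tampered, tag, MAC_KEY))  # False
```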

Proximity Detection.
The car thus receives $\mathrm{REC}_{\mathrm{key}}$ from its key and estimates the similarity between $\mathrm{REC}_{\mathrm{car}}$ and $\mathrm{REC}_{\mathrm{key}}$. A similarity function $S$ (one of the metrics described in Section 3.2) is used to calculate the similarity score:

$$\mathrm{Score}_{\mathrm{Similarity}} = S(\mathrm{REC}_{\mathrm{key}}, \mathrm{REC}_{\mathrm{car}}).$$
After estimating the similarity, the car compares the score to a preset threshold. If the score is higher than the threshold, the method concludes that the key is in close proximity to the car, and the doors are unlocked.

Experimental Results
We evaluated our method through a series of experiments, assessing the error rate when detecting the proximity of the key to the car. We used two typical metrics for this error rate, the false negative (FN) rate and the false positive (FP) rate [31]. The FN rate is the probability that our method (falsely) determines that the key is not close to the car when in fact it is. The FP rate is the probability that our method (falsely) judges that the key is near the car when in fact it is not; for instance, a false positive occurs if two audio files recorded at different times are nonetheless judged similar. Our experiments yielded an equal error rate (i.e., the rate at which the FN and FP rates are equal) of 0.0024. Because our method is executed together with the existing symmetric-key-based authentication protocol, we consider this error rate acceptable.
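The FN rate, FP rate, and equal error rate can be computed from labeled similarity scores as in the sketch below. The function names and the toy scores are ours (not measured data); the acceptance rule "score higher than the threshold" follows the proximity-detection step. Scanning only the observed scores approximates the EER; interpolating the crossing point of the two curves is a common refinement.

```python
import numpy as np


def error_rates(genuine, impostor, threshold):
    """FN and FP rates for the rule 'accept if score > threshold'.

    genuine:  scores of pairs that really were recorded near each other
    impostor: scores of pairs that were not (e.g., recorded at different times)
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    fn_rate = float(np.mean(genuine <= threshold))  # nearby key rejected
    fp_rate = float(np.mean(impostor > threshold))  # distant key accepted
    return fn_rate, fp_rate


def equal_error_rate(genuine, impostor):
    """Approximate EER by scanning the observed scores as candidate thresholds."""
    candidates = np.unique(np.concatenate([genuine, impostor]))

    def gap(t):
        fn, fp = error_rates(genuine, impostor, t)
        return abs(fn - fp)

    best = min(candidates, key=gap)
    fn, fp = error_rates(genuine, impostor, best)
    return (fn + fp) / 2, float(best)


genuine = [0.90, 0.80, 0.70, 0.25]   # toy scores, not measured data
impostor = [0.10, 0.20, 0.30, 0.05]
print(error_rates(genuine, impostor, 0.272))  # (0.25, 0.25)
print(equal_error_rate(genuine, impostor))    # (0.25, 0.25)
```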

Experimental Setup.
We tested our method with a laptop and a Raspberry Pi representing the car and its key, respectively. Figure 3 shows our experimental setup, and Table 1 lists the function and specifications of each component. We selected the sampling rate in accordance with the Nyquist-Shannon sampling theorem [32]: because sound is audible up to about 20 kHz, the sampling rate should exceed 40 kHz. Our method records sound at 44.1 kHz, the lowest commonly used sampling rate above 40 kHz. Note that a higher sampling rate requires more memory and battery power.
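The sampling-rate argument, and what it implies for the size of the audio file the key must tag and transmit, amounts to a few lines of arithmetic. The 16-bit mono PCM format is our assumption; the text does not state the sample format.

```python
# Nyquist check and raw payload size for one recording.
MAX_AUDIBLE_HZ = 20_000
SAMPLE_RATE_HZ = 44_100
RECORDING_S = 2        # the recording time chosen in the basic experiment
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM
CHANNELS = 1           # assumption: mono

nyquist_hz = SAMPLE_RATE_HZ / 2
assert nyquist_hz > MAX_AUDIBLE_HZ  # 22,050 Hz covers the 20 kHz audible limit

samples = SAMPLE_RATE_HZ * RECORDING_S
payload_bytes = samples * BYTES_PER_SAMPLE * CHANNELS
print(nyquist_hz, samples, payload_bytes)  # 22050.0 88200 176400
```

Under these assumptions, each uncompressed recording is about 176 kB, which a wearable key would have to hash and transmit per unlock attempt.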

Basic Experiment.
In this section, we describe how we found an optimal recording time for our method and show that, with this recording time, our method has a sufficiently low error rate for proximity detection. With PKES, the car door is unlocked only if the driver carries the key within one meter of the car. Accordingly, we evaluated our method with the car (i.e., the laptop) within one meter of the key (i.e., the Raspberry Pi). We selected an outdoor parking lot as the most common parking location and performed the experiment during the daytime. We varied the recording time from 1 s to 5 s in steps of 1 s and recorded 500 pairs of audio files for each candidate recording time. To estimate the similarity between the two files of each pair, we used the metrics described in Section 3.2. Tables 2-4 show the error rates of our method with three similarity metrics: Euclidean distance, cosine similarity, and cross-correlation. In these tables, rows represent recording times and columns represent thresholds. Among the three metrics, cross-correlation yields a much lower error rate than the others: even with a threshold between 0.2 and 0.3, the error rate was almost 0. Because cross-correlation outperformed the other metrics, only cross-correlation was used to estimate sound similarity in subsequent evaluations. In addition, a longer recording time resulted in a lower error rate, and vice versa, which implies a trade-off between usability and security. We set the optimal recording time at 2 s to satisfy both, because we expect that car owners can easily wait 2 s to unlock a door with an error rate close to 0.
(Caption fragment: Used for a record-and-playback attack. The two mobile phones are connected to each other in speaker mode via a cellular (3G) network.)
Therefore, we used 2 s as the recording time for subsequent evaluations.

Environment Conditions.
In this subsection, we focus on environments where cars are commonly parked. We selected three typical locations, shown in Figure 4: (i) an outdoor parking lot, (ii) an underground parking lot, and (iii) roadside parking. Each environment has distinct ambient noise: wind and the buzz of conversation in the outdoor parking lot; friction between tires and the road surface, fan noise, and sirens in the underground parking lot; and noise from passing cars at the roadside. Figure 5 shows the error rates of our method (using cross-correlation with 2 s recordings). Because road noise is much louder, the key sometimes had difficulty picking up the beacon sound, whereas the car picked up its own beacon sound easily because its microphone and loudspeaker were close together. This explains why the error rate for roadside parking was relatively higher. Nevertheless, the equal error rate for roadside parking was only 0.0037, indicating that our method is robust to environmental variation.
In addition, the thresholds yielding the equal error rate differed between environments. Figure 5(d) shows an equal error rate of 0.0024 when all data were evaluated together regardless of environment, with the corresponding threshold at 0.272. We therefore applied this single threshold (0.272) to every environment; Table 5 shows the resulting FN and FP rates for each environment.
Because these error rates are adequate for proximity detection, a threshold of 0.272 is suitable.
6.4. Sound Level.
As described in Section 4.1, loud conditions help our method verify proximity clearly. In this subsection, we report how the error rates change with the sound level (in decibels). To obtain the desired levels, we evaluated our method both indoors and outdoors; the measured levels include the random sound emitted by the car. Note that levels below 60 dB were obtained inside a building. Figure 6 shows the FN and FP rates at different sound levels, calculated with the same threshold as above (0.272). The FP rate is 0 at every sound level, and the error rate decreases as the sound level increases; at 90 dB, the FN rate is 0.004.

Out-of-Range Attack.
In this subsection, we evaluate the robustness of our method against the adversary models. Because our method is designed to detect close proximity, we regard a key farther than 1 m from the car as invalid. Unfortunately, our method cannot reliably distinguish between 1 m and 2 m, because the sounds heard at the two distances are almost identical. If the key is far from the car, however, our method can distinguish the audio files, because the key cannot clearly pick up the beacon sound. Figure 7 shows how the sound-similarity score varies with distance: the score decreases as the distance increases, and when the key was 15 m away from the car, the mean score was almost 0. Table 6 shows the FP rate at each distance between the car and the key.
Although the FP rate is significantly higher within 15 m, it is unrealistic for attackers to unlock the door at such a short distance, because the driver would be close enough to notice the attack. Accordingly, out-of-range attacks within 15 m are impractical, whereas those launched from more than 15 m away are easily detected by our method. We therefore conclude that our method is robust to Type I attacks, as described in Section 4.2.
6.6. Record-and-Playback Attack.
In this subsection, we evaluate the robustness of our method to a record-and-playback attack. Because record-and-playback attacks occur when the car and the key are far apart, we first selected typical locations for each. We used the same three car locations as above and selected three common places where keys might be found: (i) a coffee shop, (ii) an office, and (iii) a gym locker room. We thus evaluated our method in nine paired environments.

For a record-and-playback attack, the attackers must stream two-way audio between the car and the key in real time. Consequently, we used two mobile phones to record … Figure 8 shows the sound-similarity scores for the paired environments. All of the scores were below the threshold (0.272), because the car and the key recorded different ambient noise and the beacon sound is difficult to relay clearly to the key.

Related Works
Systems that enable passive authentication, including PKES, are vulnerable to relay attacks [3,33]. In this section, we introduce related works that focus on physical-proximity detection, where the goal is to prevent relay attacks.
Received Signal Strength Indicator (RSSI). RSSI-based systems estimate the distance between a verifier and a prover from signal-strength measurements. For example, if the verifier determines that the prover's signal strength exceeds a threshold, the two can be considered close to one another. Krumm and Horvitz proposed an RSSI-based system using Wi-Fi signal strength [34]. In their system, a wireless access point (AP) measures the Wi-Fi signal strength of users' devices, from which both the user's motion and the distance between the user and the AP can be inferred. Fishkin and Roy proposed an RFID-based method that uses the relation between the RSSI and variations in the energy signature to determine the level of trust [35]. However, both of these methods are vulnerable to attackers who amplify the transmission power or change the directional characteristics of the devices [8].
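The RSSI idea, and why signal amplification defeats it, can be illustrated with the log-distance path-loss model, a standard propagation model that we assume here; it is not taken from the cited works. The reference power of -40 dBm at 1 m and the free-space path-loss exponent of 2 are hypothetical parameters, and the function names are illustrative.

```python
def estimate_distance(rssi_dbm: float, rssi_at_1m_dbm: float = -40.0,
                      path_loss_exponent: float = 2.0) -> float:
    """Invert the log-distance path-loss model to estimate distance in metres.

    Model: RSSI(d) = RSSI(1 m) - 10 * n * log10(d / 1 m),
    hence  d = 10 ** ((RSSI(1 m) - RSSI(d)) / (10 * n)).
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def looks_close(rssi_dbm: float, max_distance_m: float = 1.0, **model: float) -> bool:
    """Proximity decision based purely on received signal strength."""
    return estimate_distance(rssi_dbm, **model) <= max_distance_m


if __name__ == "__main__":
    genuine = -60.0            # a key about 10 m away under the assumed model
    amplified = genuine + 20.0  # the same key after a 20 dB relay amplifier
    print(estimate_distance(genuine), looks_close(genuine))
    print(estimate_distance(amplified), looks_close(amplified))
```

Under these assumptions, a key 10 m away is received at -60 dBm and is correctly judged to be far, but a relay that boosts the signal by 20 dB makes the same key appear to be 1 m away. The verifier has no way to tell an amplified distant signal from a genuinely close one, which is the weakness noted in [8].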
Distance-Bounding Protocol. Several security methods based on distance-bounding protocols have been proposed to prevent three types of attacks [4-6, 8, 11, 13-15, 36, 37]. For historical reasons, these are known as Distance Fraud, Mafia Fraud, and Terrorist Fraud [38,39]. The goal of a distance-bounding protocol is to enable a verifier to establish an upper bound on its physical distance to a prover. In such a protocol, the verifier sends an unpredictable challenge to the prover, and the prover generates a response to the challenge and sends it back to the verifier. From the round-trip time measured by the verifier, an upper bound on the distance between the verifier and the prover can be derived. There are two types of distance-bounding protocols. The first uses an ultrasonic communication channel to transfer data. Because sound travels slowly, timing errors translate into small distance errors, so this type measures the distance between the verifier and the prover more accurately. However, it is vulnerable to relay attacks that forward the signal over a channel faster than ultrasound [16]. The second type of distance-bounding protocol uses RF communication channels. Even though RF channels are resilient to relay attacks, both the verifier and the prover must be strictly time-synchronized, and the prover's processing time must be invariant [8].
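The distance bound derived from the round-trip time is d ≤ v(t_RTT - t_proc)/2, where v is the propagation speed of the channel and t_proc is the prover's processing time. The minimal Python sketch below, with hypothetical names and an assumed fixed 1 ms processing time, shows both points made above: an ultrasonic bound can be defeated by relaying the challenge over a faster RF link, and an RF bound depends critically on knowing t_proc exactly.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0
SPEED_OF_SOUND_M_S = 343.0  # in air at roughly 20 degrees Celsius


def distance_upper_bound(rtt_s: float, processing_s: float, speed_m_s: float) -> float:
    """Upper bound on the verifier-prover distance from one challenge-response round.

    The challenge and the response each cross the distance once, so the
    time of flight is halved before converting it to metres.
    """
    time_of_flight_s = max(rtt_s - processing_s, 0.0)
    return speed_m_s * time_of_flight_s / 2.0


if __name__ == "__main__":
    processing_s = 1e-3  # assumed, fixed prover processing time

    # Honest prover 1 m away over an RF channel.
    rtt_rf = processing_s + 2 * 1.0 / SPEED_OF_LIGHT_M_S
    print(distance_upper_bound(rtt_rf, processing_s, SPEED_OF_LIGHT_M_S))

    # Ultrasonic protocol, prover actually 100 m away: two attacker devices sit
    # 0.5 m from each party and forward challenge and response over RF.
    rtt_relay = (processing_s
                 + 2 * (0.5 + 0.5) / SPEED_OF_SOUND_M_S   # acoustic hops
                 + 2 * 100.0 / SPEED_OF_LIGHT_M_S)        # RF relay hops
    print(distance_upper_bound(rtt_relay, processing_s, SPEED_OF_SOUND_M_S))
```

In the relayed case the ultrasonic bound comes out at roughly 1 m although the prover is 100 m away, because the RF hops add almost no delay. Conversely, since v is the speed of light for RF, an error of just 1 ns in the assumed processing time shifts the RF bound by about 15 cm, which is why the prover's processing time must be invariant.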
Context-Based Proximity Detection. Contextual information that varies over time or location can also be used to verify whether a prover is close to a verifier, assuming that co-present devices acquire similar information. In addition, context-based detection assumes that attackers cannot infer the information needed for verification from a distance. For example, GPS coordinates and/or RF packets can be used as contextual information for proximity detection [40][41][42][43].
Varshavsky et al. proposed a proximity-detection method that uses RF packets and RSSI as contextual information [43]. Shrestha et al. proposed a method that relies exclusively on contextual information such as temperature, precision gas, humidity, and altitude [41]. Because such contextual information is difficult for an attacker to manipulate, their method is resilient to attacks. Truong et al. proposed a method that uses modalities measurable by smartphone sensors [42], namely Wi-Fi signal strength, Bluetooth signal strength, GPS, and audio. Miettinen et al. proposed a method that uses ambient noise and luminosity, both of which can easily be measured with wearable devices [40]. They used this information both to establish a secure channel between the verifier and the prover and to detect proximity.
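As a simplified illustration of context-based proximity detection, the sketch below declares two devices co-present only if every modality they both measured agrees within a per-modality tolerance. This fixed-tolerance rule is an assumption made for illustration; the cited works use their own features and, in some cases, trained classifiers. The modality names and tolerance values are hypothetical.

```python
from collections.abc import Mapping


def co_present(readings_a: Mapping[str, float], readings_b: Mapping[str, float],
               tolerances: Mapping[str, float]) -> bool:
    """Declare co-presence only if every shared modality agrees within its tolerance."""
    shared = set(readings_a) & set(readings_b) & set(tolerances)
    if not shared:
        return False  # nothing to compare, so do not vouch for proximity
    return all(abs(readings_a[m] - readings_b[m]) <= tolerances[m] for m in shared)


if __name__ == "__main__":
    # Hypothetical tolerances and readings, for illustration only.
    tolerances = {"temperature_c": 1.0, "humidity_pct": 3.0, "altitude_m": 2.0}
    car = {"temperature_c": 21.5, "humidity_pct": 40.0, "altitude_m": 35.0}
    key_nearby = {"temperature_c": 21.8, "humidity_pct": 41.0, "altitude_m": 35.5}
    key_in_office = {"temperature_c": 24.0, "humidity_pct": 30.0, "altitude_m": 60.0}
    print(co_present(car, key_nearby, tolerances))
    print(co_present(car, key_in_office, tolerances))
```

Requiring agreement on every modality reflects the intuition above: an attacker would have to reproduce all of the readings at once, not just one of them.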
Sound-Based Proximity Detection. In near-field communication (NFC) financial transactions, bank servers can validate a transaction only when the NFC phone and the reader are at the same location, thereby preventing relay attacks against such systems. Halevi et al. proposed a method that detects proximity using ambient sound in NFC payment systems [33]. Thiel et al. proposed a proximity-detection method for mobile phones that is based on sound beacons in an inaudible spectrum around 18 kHz [44]. Even though both of these methods are similar to ours, they cannot easily be applied to PKES systems. Moreover, they are not resilient to record-and-playback attacks. Schürmann and Sigg proposed a method that establishes a secure communication channel among devices based on similar audio patterns, using a fuzzy-cryptography scheme [45]. They extracted features from ambient audio and used these features to generate a shared cryptographic key between devices without exchanging information about the ambient audio itself. Karapanos et al. proposed a two-factor authentication method using sound-based proximity detection [10]. When clients log in to a website with this method, they first submit their username and password. Then, the smartphones carried by the clients serve as the second authentication factor (i.e., what-you-have authentication). Both the smartphone and a laptop record ambient sound at the same time, after which both recordings are sent to an authentication server, which estimates their similarity to confirm that the devices are in proximity. Thus, even if attackers know the client's username and password, they cannot be authenticated because they do not have the client's smartphone. This method is practical for applications with two-factor authentication. However, it is vulnerable to same-media attacks and to the record-and-playback attack described above.
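Schürmann and Sigg's approach depends on co-located devices deriving nearly identical binary fingerprints from the same ambient audio, so that a fuzzy-cryptography scheme can absorb the few bits that differ. The Python sketch below is deliberately simplified and is not their feature extraction: it assumes one bit per pair of consecutive frames, set when the frame energy rises, and measures agreement with the Hamming distance. The function names and frame length are hypothetical.

```python
from collections.abc import Sequence


def frame_energies(signal: Sequence[float], frame_len: int) -> list[float]:
    """Sum of squared samples for each complete, non-overlapping frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def fingerprint(signal: Sequence[float], frame_len: int) -> list[int]:
    """One bit per pair of consecutive frames: 1 if the energy rises, else 0."""
    energies = frame_energies(signal, frame_len)
    return [1 if later > earlier else 0 for earlier, later in zip(energies, energies[1:])]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

Because each bit depends only on whether the energy rises, a uniform difference in microphone gain leaves the fingerprint unchanged: `fingerprint([1, 1, 3, 3, 2, 2, 5, 5], 2)` and the same signal scaled by 0.5 both yield `[1, 0, 1]`. Devices in different places would produce largely uncorrelated bits, so their Hamming distance would exceed what the error-correcting part of a fuzzy scheme can tolerate.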

Limitations and Future Work
In this section, we discuss the limitations of our method and future work. We have evaluated our method under various environmental conditions, but other conditions remain to be tested. For example, during the day, sound bends away from the ground, whereas at night it bends toward the ground, because the temperature gradient of the air near the ground reverses. Accordingly, experiments that account for this phenomenon are needed.
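The day/night difference follows from how the speed of sound depends on air temperature: a sound wave bends toward the layer in which it travels more slowly. The sketch below uses the standard dry-air approximation c ≈ 331.3 sqrt(1 + T/273.15) m/s; reducing the atmosphere to one temperature near the ground and one above it is a simplifying assumption, not a propagation model from the paper, and the function names are illustrative.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temperature temp_c (Celsius)."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def refraction_direction(temp_ground_c: float, temp_above_c: float) -> str:
    """Direction in which sound bends, given air temperatures near and above the ground.

    Sound bends toward the layer in which it travels more slowly.
    """
    c_ground = speed_of_sound(temp_ground_c)
    c_above = speed_of_sound(temp_above_c)
    if c_ground > c_above:
        return "upward (away from the ground)"   # typical sunny day
    if c_ground < c_above:
        return "downward (toward the ground)"    # typical night-time inversion
    return "straight"
```

For example, with warm air near the ground by day (30 °C below, 20 °C above) the function reports upward bending, and with a night-time inversion (10 °C below, 20 °C above) it reports downward bending, which could change how well a beacon sound reaches a key at a given distance.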
Moreover, we should consider an attacker who can emit loud noise near both the car and its key at the same time. This noise might overwhelm both the ambient noise and the random sound; as a result, the car and its key may record mostly the artificial noise generated by the attacker, which could make their recordings appear similar. We will explore this type of attack in future work.
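The concern can be illustrated with a toy simulation, assuming zero-lag Pearson correlation as the similarity measure and independent, unit-variance Gaussian noise for the ambient sound at the car and at the key. If the attacker's injected noise is scaled by a gain g relative to the ambient sound, the expected correlation is g²/(g² + 1), so the two recordings look increasingly similar as g grows. The gains, sample count, seed, and function names are hypothetical.

```python
import math
import random
from collections.abc import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-lag Pearson correlation coefficient of two equal-length signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_noise_injection(gain: float, n: int = 2000, seed: int = 7) -> float:
    """Similarity of car and key recordings when an attacker plays noise near both."""
    rng = random.Random(seed)
    injected = [rng.gauss(0.0, 1.0) for _ in range(n)]          # attacker's noise
    car = [rng.gauss(0.0, 1.0) + gain * s for s in injected]    # car ambient + injected
    key = [rng.gauss(0.0, 1.0) + gain * s for s in injected]    # key ambient + injected
    return pearson(car, key)


if __name__ == "__main__":
    for gain in (0.0, 1.0, 10.0):
        print(gain, round(simulate_noise_injection(gain), 3))
```

With no injected noise the correlation stays near 0, whereas a gain of 10 pushes it close to 0.99, which is why a fixed similarity threshold alone may not stop this attack.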
We can also enhance the performance of our method by combining it with existing research. For out-of-range attacks, we might combine our method with the work of Lu et al., who proposed a method that enables mobile phones to model sound events [46]. They demonstrated that their system was capable of recognizing meaningful sound events in users' everyday lives (e.g., walking outdoors and highway driving). By applying their research to our method, the car could recognize the environment it is in. A dynamic threshold could then be set based on the recognized environment, thus decreasing the error rate.
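The dynamic-threshold idea can be sketched as a lookup from the recognized environment to an acceptance threshold, falling back to a single global threshold when the environment is unknown. Only the 0.272 fallback value comes from our evaluation; the environment labels and per-environment thresholds are hypothetical placeholders that would have to be calibrated from measurements, and the environment classifier itself is out of scope here.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.272  # global threshold used in the evaluation

# Hypothetical per-environment thresholds; real values would need calibration.
ENVIRONMENT_THRESHOLDS = {
    "underground_garage": 0.35,
    "outdoor_parking_lot": 0.30,
    "roadside": 0.25,
}


def threshold_for(environment: Optional[str]) -> float:
    """Threshold for the recognized environment, or the global one if unknown."""
    if environment is None:
        return DEFAULT_THRESHOLD
    return ENVIRONMENT_THRESHOLDS.get(environment, DEFAULT_THRESHOLD)


def accept_key(similarity: float, environment: Optional[str] = None) -> bool:
    """Accept the key if its sound-similarity score meets the environment's threshold."""
    return similarity >= threshold_for(environment)
```

Falling back to the global threshold keeps the behavior unchanged whenever the classifier is unsure, so recognizing the environment can only refine the decision rather than disable it.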
To counter record-and-playback attacks, we can combine our method with that of Das et al., who proposed a method to identify smartphones through imperfections in their acoustic components [47]. Their method was able to distinguish between different loudspeakers even when they played the same sound. Accordingly, by applying their research to our method, the car would be able to distinguish between different sources of recorded sound (i.e., whether the sound comes from the car or from a mobile phone).
Finally, the random sound emitted by the car risks annoying people in the vicinity. Thus, inaudible frequencies (i.e., 18-20 kHz) might be used for the beacon sound, so that the sound is imperceptible to humans but still detectable by the car and the key.
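Detecting a near-inaudible beacon does not require a full spectrum analysis: the Goertzel algorithm computes the energy in a single DFT bin. The minimal sketch below generates a 19 kHz tone and measures what fraction of the recording's energy falls in that bin. The 48 kHz sample rate (high enough to represent tones up to 24 kHz), the 19 kHz beacon frequency, and all function names are assumptions for illustration.

```python
import math
from collections.abc import Sequence

SAMPLE_RATE_HZ = 48_000  # assumed; must exceed twice the beacon frequency


def make_beacon(freq_hz: float, duration_s: float,
                sample_rate: int = SAMPLE_RATE_HZ) -> list[float]:
    """Unit-amplitude sine tone at freq_hz."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def goertzel_power(samples: Sequence[float], freq_hz: float,
                   sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Squared magnitude of the DFT bin nearest to freq_hz (Goertzel algorithm)."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)  # index of the nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def beacon_fraction(samples: Sequence[float], freq_hz: float,
                    sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Share of the signal energy carried by the beacon frequency (1.0 for a pure tone).

    By Parseval's theorem, a real tone splits its energy equally between bins
    k and n - k, so the bin power is scaled by 2 / (n * energy).
    """
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return 0.0
    return 2.0 * goertzel_power(samples, freq_hz, sample_rate) / (len(samples) * energy)
```

A pure beacon yields a fraction close to 1, while a recording without the beacon yields a value near 0; a practical detector would compare this fraction against a threshold calibrated under realistic ambient noise.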

Conclusion
In this paper, we described the PKES system and its vulnerability to relay attacks. The relay attack is a critical issue; thieves have even used it to steal valuables from cars. PKES systems are vulnerable to relay attacks mainly because the ability to communicate does not prove physical proximity. We proposed a method that detects physical proximity, making it possible to prevent relay attacks on PKES systems. Because our method detects physical proximity based on sound similarity, we presented a system model for our method and defined a new adversary model (i.e., the record-and-playback attack).
In addition, we evaluated our method under realistic environmental conditions; specifically, we selected typical locations of cars and keys for our evaluation. We showed that our method detects the physical proximity of a car to its key with high accuracy and is robust to both out-of-range attacks and record-and-playback attacks. We also discussed several methods that could be applied to our method to improve its accuracy in detecting physical proximity [46,47], which we leave for future work.