This paper presents an acoustic indoor localization system for commercial smart phones that emit high-pitched acoustic signals beyond the audible range. The acoustic signals, onto which an identifier code is modulated, are detected by self-built receivers placed on the ceiling or on the walls of a room. The receivers are connected in a Wi-Fi network so that they can synchronize their clocks and exchange the time differences of arrival (TDoA) of the received chirps. The location of the smart phone is calculated by TDoA multilateration. Precise timing of the sound signals enables high-precision localization in indoor areas. Our approach enables applications that require high accuracy, such as finding products in a supermarket or guiding blind people through complicated buildings. We have evaluated our system in real-world experiments using different algorithms for calibration-free localization and different types of sound signals. The adaptive GOGO-CFAR threshold enables the detection of 48% of the chirp pulses even at a distance of 30 m. In addition, we have compared the trajectory of a pedestrian carrying a smart phone to reference positions of an optical system. The observed localization error is less than 30 cm.
With the sustained rise and ubiquitous availability of mobile computers, smart phones, and handheld devices in everyday life, a multitude of exciting new location-dependent applications has emerged. Context-sensitive applications support the user in everyday life. One of the most important contexts is
The GPS module in commercial off-the-shelf (COTS) smart phones and handheld devices enables reliable navigation systems in outdoor areas [
Today, several indoor localization systems based on different methods and technologies are available. Some of these systems work with COTS smart phones. Since many users already own a COTS smart phone, the cost of such a localization system is reduced. Figure
Overview of localization system based on smart phones.
We use the principles of smart phone localization from our prior work [
Many current localization systems use radio frequency (RF) signals and exploit the propagation of radio waves for position calculation. Therefore, the existing infrastructure can often be used. In the following, indoor localization systems based on three different RF technologies are briefly described. Otsason et al. used GSM communication with wide signal-strength fingerprints to locate the user in indoor environments [

To sum up, RF systems are susceptible to errors in dynamic environments. For example, the RSSI value depends on the environment and on the smart phone. It is distorted by objects in the direct path and in the vicinity, and by environmental influences such as air humidity. Additionally, the RSSI value depends on the orientation of the antenna, since the antenna directivity varies with the specific smart phone type and its actual orientation relative to the anchor nodes. RF localization systems can therefore localize people only with low accuracy (1.5 m–3 m). This accuracy can be improved by combining RF with other technologies. The multimethod approach [

An alternative technology is pedestrian dead reckoning (PDR) with inertial sensors. Using the integrated MEMS sensors (accelerometers, gyroscopes), the current position can be calculated recursively from the measured acceleration and angular rate of the movement. Localization based on inertial sensors works without additional infrastructure. However, the sensor errors accumulate during the integration of the measurement values, so the localization error grows with the observation time [

Other existing smart phone localization systems use information about the surroundings. Further, magnetic field fluctuations and anomalies inside buildings [
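The drift argument can be made concrete with a small numerical sketch. Assuming a hypothetical constant accelerometer bias of 0.05 m/s² and a 100 Hz sample rate, integrating twice shows that the position error grows roughly as ½·b·t², i.e., quadratically with observation time. The function name and values are illustrative only.

```python
def position_error_from_bias(bias_mps2: float, duration_s: float, rate_hz: float = 100.0) -> float:
    """Integrate a constant accelerometer bias twice (Euler steps) and
    return the resulting position error in metres."""
    dt = 1.0 / rate_hz
    velocity_error = 0.0
    position_error = 0.0
    for _ in range(int(round(duration_s * rate_hz))):
        velocity_error += bias_mps2 * dt        # 1st integration: velocity error
        position_error += velocity_error * dt   # 2nd integration: position error
    return position_error


if __name__ == "__main__":
    bias = 0.05  # m/s^2, hypothetical residual bias
    for t in (1, 10, 60):
        err = position_error_from_bias(bias, t)
        # Closed form for comparison: 0.5 * b * t^2 (quadratic growth in time).
        print(f"{t:3d} s: {err:8.2f} m (0.5*b*t^2 = {0.5 * bias * t * t:8.2f} m)")
```

Doubling the observation time roughly quadruples the error, which is why inertial sensors can only bridge short gaps without an absolute reference.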
Sound is well suited for high-accuracy indoor localization. Smart phones can generate sounds with their built-in speaker and detect sounds with the integrated microphone. Compared to other technologies, sound can increase the position accuracy. Sound propagates slowly compared to light, which makes the time stamps of the received signals easier to determine. Precise measurement of the time of arrival (ToA) is essential for exact position determination. Errors arise from clock skew and drift between devices and from variations of the propagation speed. With light, in contrast, the high propagation speed means that small timing errors lead to large position deviations. Furthermore, the received sound signals can be analyzed in detail, and the suppression of multipath signals is straightforward. This section briefly describes current indoor localization systems based on sound.
Most research groups use time of flight (ToF) or round trip time (RTT) measurements for smart phone positioning. However, several intrinsic uncertainty factors of a ToF measurement lead to ranging inaccuracy. COTS smart phones exhibit a variable latency, i.e., a changing misalignment between the timestamp of the command that triggers the signal and the actual emission of the signal by the loudspeaker. Another problem is the synchronization of the smart phones and receivers. These delays can easily add up to several milliseconds; since sound travels about 34 cm per millisecond, this implies a ranging error on the order of a meter.
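To put numbers on the timing argument, the sketch below converts timing errors into ranging errors. It assumes a speed of sound of 343 m/s (air at about 20 °C); the 3 ms latency is a hypothetical example of the "several milliseconds" of smart phone latency mentioned above.

```python
# Sketch: ranging error produced by an uncompensated timing error.
# Assumption: speed of sound 343 m/s (air at about 20 degrees C).
SPEED_OF_SOUND = 343.0          # m/s (assumed)
SPEED_OF_LIGHT = 299_792_458.0  # m/s, radio waves in vacuum


def ranging_error(timing_error_s: float, speed_mps: float) -> float:
    """Distance error in metres caused by a timing error in seconds."""
    return timing_error_s * speed_mps


if __name__ == "__main__":
    print(f"1 us, sound: {ranging_error(1e-6, SPEED_OF_SOUND) * 1e3:.3f} mm")
    print(f"1 us, radio: {ranging_error(1e-6, SPEED_OF_LIGHT):.1f} m")
    # Hypothetical 3 ms output latency of a smart phone in a ToF measurement.
    print(f"3 ms, sound: {ranging_error(3e-3, SPEED_OF_SOUND):.2f} m")
```

A 1 µs error costs about 0.34 mm with sound but about 300 m with radio, while an unknown 3 ms emission latency already corrupts a sound ToF range by about 1 m.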
Borriello et al. presented the WALRUS [
In most state-of-the-art systems, the anchor nodes are used as transmitters, and the smart phones act as receivers that detect the sound signals emitted by the anchor nodes. However, this method suffers from certain disadvantages. The sound signals are received at different positions during a movement (see Figure

The microphone of COTS smart phones can only detect relatively low frequencies (i.e., frequencies in the audible range) because the built-in microphone is designed for normal speech, which uses the band between 80 Hz and 12 kHz. Outside this frequency range, the microphone has low sensitivity and cannot receive sound from larger distances. Additionally, the analog-to-digital converter of COTS smart phones has a maximum sampling rate, and the sampling frequency must be greater than twice the maximum signal frequency. As a result, the sound detected by the handheld device lies in the audible range and is perceptible to the user. Furthermore, this frequency band is crowded with natural sounds, which makes it more difficult to distinguish the localization signal from noise. Because the mobile device has to receive the sound signals permanently, signal identification and calculation increase the power consumption on the mobile side [
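The sampling constraint reduces to a simple check. The sketch below uses common audio sample rates as examples (an assumption, not measured device values); real smart phone audio paths also contain anti-aliasing filters that reduce the usable band further below the Nyquist frequency.

```python
def max_signal_frequency(sample_rate_hz: float) -> float:
    """Nyquist frequency: highest frequency representable without aliasing."""
    return sample_rate_hz / 2.0


def can_sample(f_max_hz: float, sample_rate_hz: float) -> bool:
    """True if the strict Nyquist condition f_s > 2 * f_max holds."""
    return sample_rate_hz > 2.0 * f_max_hz


if __name__ == "__main__":
    # Example sample rates (assumed); check a 20 kHz localization signal.
    for fs in (8_000.0, 16_000.0, 44_100.0, 48_000.0):
        print(f"fs = {fs / 1e3:5.1f} kHz: f_max < {max_signal_frequency(fs) / 1e3:5.2f} kHz, "
              f"20 kHz ok: {can_sample(20_000.0, fs)}")
```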
Principle of moving device position estimation with stationary transmitters [
In the presented work, the practical implementation of the concept of an acoustic self-calibrating system for indoor smart phone tracking (ASSIST), as discussed in [
Overview of the ASSIST system with smart phone, the network of receivers and an evaluation unit.
In ASSIST, the smart phones generate sound impulses beyond the human audible range. The sound impulses are received by self-built receivers, which can be placed on the ceiling or on the walls of a room. A minimum of three receivers is required to localize a mobile phone within one localization cell in two dimensions. The receivers are connected to a Wi-Fi network to synchronize the timestamps of the incoming signals, and they are connected via a wireless network to an evaluation unit. The evaluation unit communicates with the smart phones via cellular communication (GPRS/UMTS/LTE); it assigns the ID of the specific sound and provides the map with the current position of the user. In ASSIST, the absolute acoustic localization system is supported by the integrated inertial sensors. In areas where no receivers are available, these sensors can be used to localize the user for short periods.
In a practical localization system based on sound signals, the frequency range of the signals should lie outside the audible range, so choosing the correct frequency range is essential. The following section examines the frequency ranges of human perception and various hearing thresholds. Human hearing is most sensitive at the frequencies where most speech takes place, around 0.5–6 kHz. The absolute hearing threshold defines the minimum sound pressure level that a pure tone must have in order to be perceptible to a human being. Sakamoto et al. measured the absolute hearing threshold in the frequency range from 8 to 20 kHz for different age groups [
To evaluate the audibility of high-frequency sound signals emitted by smart phones, we measured the sound pressure level of different commercial smart phones for different frequencies and distances. The sound pressure levels of the smart phones were then compared to the lowest values of the average hearing threshold and the corresponding standard deviation
Difference between average hearing threshold and sound pressure level emitted by smart phones in units of the standard deviation.
| Distance to smart phone | Smart phone type | 18 kHz | 20 kHz |
|---|---|---|---|
| 1 cm | iPhone 4S | 1.6 | 5.0 |
| 1 cm | Samsung GT-S5830 | 0.9 | 4.1 |
| 1 cm | iPod Touch | 1.0 | 4.4 |
| 10 cm | iPhone 4S | 1.9 | 6.4 |
| 10 cm | Samsung GT-S5830 | 1.8 | 4.1 |
| 10 cm | iPod Touch | 1.7 | 4.1 |
| 5 m | iPhone 4S | 3.5 | 11.7 |
| 5 m | Samsung GT-S5830 | 3.4 | 12.0 |
| 5 m | iPod Touch | 3.5 | 11.3 |
As expected, the audibility of the sound signals decreases as the frequency and the distance to the smart phone increase. The measurements and calculations show that with a chance of 0.13% (
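If individual hearing thresholds are assumed to be normally distributed, a margin of k standard deviations between the average threshold and the emitted level translates into the fraction of listeners expected to hear the signal, P(Z > k). For k = 3 this gives about 0.13%, the same percentage quoted above. The sketch computes this tail probability for a few of the tabulated margins; the normality assumption and the function name are ours.

```python
import math


def audible_fraction(k_sigma: float) -> float:
    """P(Z > k) for a standard normal Z: the share of listeners whose hearing
    threshold lies at least k standard deviations below the average, i.e.
    below the emitted level (normality is an assumption)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    # Margins in units of the standard deviation (iPhone 4S rows of the table).
    for label, k in (("3 sigma", 3.0), ("10 cm, 18 kHz", 1.9),
                     ("10 cm, 20 kHz", 6.4), ("5 m, 18 kHz", 3.5),
                     ("5 m, 20 kHz", 11.7)):
        print(f"{label:>14}: {100.0 * audible_fraction(k):.3g} %")
```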
In our system, the smart phone speakers transmit the sound signals for localization. To analyze the frequency limit and the maximum acoustic bandwidth of smart phone speakers, several COTS smart phones were tested. For this purpose, their frequency response and radiation characteristic were measured.
To measure the frequency response, white noise was played back on several smart phones and recorded with an Earthworks M50 broadband measurement microphone.
The frequency response is depicted in Figure
Frequency response of commercially available smart phones.
To measure the radiation characteristics of the smart phones, the sound signals were recorded at different positions at a distance of 25 cm from a microphone. For this purpose, a smart phone holder was designed that allows manual rotation of the smart phone and adjustment of its inclination angle around the holder's axes. The smart phone is placed along the horizontal axis, and the speaker is located opposite the measuring microphone. The measurements start at an inclination angle of 0°, and the smart phone is rotated around the holder's axis in steps of 15°. This corresponds to moving the microphone along a circle around the smart phone. The advantage of rotating the phone around the holder's axes is its simple implementation. Subsequently, the inclination angle of the smart phone is increased until it reaches 180°. The measured 3D radiation characteristics are shown in Figure
3D radiation pattern of iPhone 4S sound speaker.
Radiation characteristic in dB of the iPhone 4S with the loudspeaker on the front side.
The sound pressure is plotted on a logarithmic scale. The reference is the sound pressure measured with the speaker oriented directly toward the microphone; this reference level is assigned 0 dB, and the origin of the coordinate system corresponds to a level 35 dB below the reference.
Generating audio signals with a smart phone requires approximately 33 mW [
With TDoA as the localization principle, the system is independent of the exact transmission time of the pulse. Hence, the operating system requires no modification or patch to ensure deterministic behavior. In contrast, localization systems based on ToF or round trip measurements rely on precise transmission and reception times of the signal, so the operating system has to be patched to ensure deterministic real-time behavior. Because ASSIST needs no such patch, the user does not need root rights to modify the operating system.
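The cancellation of the unknown emission time can be checked numerically. In this sketch (hypothetical 2D receiver layout, assumed speed of sound of 343 m/s), shifting the emission time by 4 ms moves every absolute arrival time by 4 ms, which would be a ToF range error of about 1.4 m, yet leaves all time differences of arrival unchanged.

```python
import math

C = 343.0  # m/s, assumed speed of sound


def arrival_times(source, receivers, t_emit):
    """Absolute arrival time (s) at each receiver for a pulse emitted at t_emit."""
    return [t_emit + math.dist(source, r) / C for r in receivers]


def tdoa(times, ref=0):
    """Time differences of arrival relative to receiver `ref`."""
    return [t - times[ref] for t in times]


if __name__ == "__main__":
    receivers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]  # hypothetical layout
    phone = (1.5, 3.2)
    # Two emission times that the receivers do not know, e.g. caused by a
    # variable output latency inside the smart phone.
    a = tdoa(arrival_times(phone, receivers, t_emit=0.000))
    b = tdoa(arrival_times(phone, receivers, t_emit=0.004))
    print([f"{x * 1e3:.4f} ms" for x in a])
    print("max TDoA change:", max(abs(x - y) for x, y in zip(a, b)))  # ~0
```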
Ten prototype receivers were built to receive the sound signals from the smart phones. Figure
Block diagram of the receiver with signal processing (described in Section
The first element in the signal chain of our receivers is a transducer that converts acoustic signals into electrical signals. The system uses a small, low-cost transducer powered by a maximum voltage of 5 V. MEMS microphones from Knowles Acoustics were used, and their sensitivity as a function of frequency was determined and compared across different measurements, as depicted in Figure
Frequency response of electret- and MEMS-microphones in the range from 500 Hz to 25 kHz.
An 8th-order Butterworth high-pass filter with a cut-off frequency of 17.5 kHz is used to eliminate audible ambient noise below the band of the localization signals. Before digitization, the signal is amplified in the analog domain by a factor of
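Since the localization signals lie between 18 and 20 kHz, the filter has to pass frequencies above 17.5 kHz while attenuating audible ambient noise. A minimal sketch of the ideal magnitude response of an 8th-order Butterworth high-pass, |H(f)| = 1/√(1 + (f_c/f)^(2N)), is given below; real analog components will deviate from this ideal curve.

```python
import math


def butterworth_highpass_gain_db(f_hz: float, fc_hz: float = 17_500.0, order: int = 8) -> float:
    """Gain in dB of an ideal N-th order Butterworth high-pass at frequency f_hz:
    20*log10(1/sqrt(1 + (fc/f)^(2N))) = -10*log10(1 + (fc/f)^(2N))."""
    ratio = (fc_hz / f_hz) ** (2 * order)
    return -10.0 * math.log10(1.0 + ratio)


if __name__ == "__main__":
    for f in (1_000.0, 10_000.0, 17_500.0, 18_000.0, 19_000.0, 20_000.0):
        print(f"{f / 1e3:5.1f} kHz: {butterworth_highpass_gain_db(f):8.2f} dB")
```

Speech-band noise around 1 kHz is suppressed by roughly 200 dB in this ideal model, whereas the 18–20 kHz band loses only about 0.5 to 2 dB.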
Photo of the receiver with analog module (bottom) and Gumstix Overo Board (top) for the signal processing.
To determine the time of arrival (ToA) of the received sound impulses, precise time synchronization is needed, since the accuracy of the localization system relies on the synchronization precision between the receivers. The receivers are connected to a Fast-Ethernet network to synchronize their clocks. The connected receivers (slaves) elect a master receiver, which acts as the time reference. The slaves then adjust their clocks to the master, taking time offset and clock drift into account. Each slave queries the current time of the master via UDP, and this time is corrected using the round trip time measured by the slave. Time offset and clock drift are handled by an adaptation of the Network Time Protocol (NTP) algorithm: both are obtained by linear regression from the set of time stamps. The implementation of the synchronization can be found in [
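The offset and drift estimation can be sketched in the spirit of NTP. Each exchange yields four timestamps (t1 slave send, t2 master receive, t3 master send, t4 slave receive); assuming symmetric network delays, the offset is ((t2 - t1) + (t3 - t4))/2, and a least-squares line through the offsets over time gives offset and drift. The simulated delay, drift and function names are hypothetical; this is not the authors' implementation.

```python
def exchange_offset(t1, t2, t3, t4):
    """Master-minus-slave clock offset from one exchange (symmetric delay assumed)."""
    return ((t2 - t1) + (t3 - t4)) / 2.0


def fit_offset_and_drift(slave_times, offsets):
    """Least-squares fit: offset(t) = offset0 + drift * t."""
    n = len(slave_times)
    mean_t = sum(slave_times) / n
    mean_o = sum(offsets) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(slave_times, offsets))
    var = sum((t - mean_t) ** 2 for t in slave_times)
    drift = cov / var
    return mean_o - drift * mean_t, drift


def to_master_time(t_slave, offset0, drift):
    """Convert a slave timestamp into the master's time base."""
    return t_slave + offset0 + drift * t_slave


def simulate_exchanges(true_offset=2.5e-3, true_drift=20e-6, delay=0.3e-3, n=10):
    """Hypothetical exchanges once per second with a symmetric one-way delay."""
    def master_clock(tau):
        return tau + true_offset + true_drift * tau

    mids, offsets = [], []
    for k in range(n):
        t1 = float(k)                   # slave sends request (slave clock = reference)
        t2 = master_clock(t1 + delay)   # master receives the request ...
        t3 = t2                         # ... and answers immediately
        t4 = t1 + 2.0 * delay           # slave receives the reply
        mids.append(0.5 * (t1 + t4))    # the estimate refers to the exchange midpoint
        offsets.append(exchange_offset(t1, t2, t3, t4))
    return mids, offsets


if __name__ == "__main__":
    off0, drift = fit_offset_and_drift(*simulate_exchanges())
    print(f"offset = {off0 * 1e3:.4f} ms, drift = {drift * 1e6:.2f} ppm")
```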
Figure
Opening angle of the receivers.
We have developed an Android software application (app) that turns a standard COTS device into a transmitter for ASSIST. Our application has three basic functionalities: (I) communication with the evaluation unit (server), (II) sound control, and (III) visualization of the current position on the map. The system works when the user downloads and starts the app in an area that is equipped with the ASSIST infrastructure. The user interface is simple: the user starts the app, which connects to an evaluation unit via its internet connection and receives an ID. Every registered handheld device in a localization cell is assigned a unique ID. The smart phone connects to the internet without any special infrastructure; only a mobile network is required. In this work, long term evolution (LTE), the latest standard for mobile data transmission, is used for wireless data communication. The smart phones and the server communicate via the secure protocol HTTPS and exchange messages in JavaScript Object Notation (JSON) format. Specific parameters are assigned to each user so that several devices can be distinguished by the appearance of their chirps. The necessary parameters received from the evaluation unit are frequency, impulse

The app controls the loudspeaker of the smart phone and generates the specific sound signals (described in Section 4) inside the smart phone. The current position and the map are transmitted from the evaluation unit to the smart phone. The position of the user is displayed on the screen of the smart phone in the context of the environment, with a map and surrounding items. Figure
The developed Android software application on the screen of a smart phone.
In our approach, we use TDoA algorithms to calculate the position of the smart phones. With TDoA algorithms, the processing time inside the smart phones is not relevant. With other localization algorithms, the position accuracy would be degraded if the processing time were not measured. The position of the smart phone can be calculated solely from the propagation speed of sound and the precise arrival times at the receivers. The receivers are connected to a Wi-Fi network so that they can synchronize their clocks and exchange the time differences of arrival of the received sound impulses. A smart phone transmits acoustic signals at a position
The speed of sound
The speed of sound depends on the temperature
When sound waves are used instead of electromagnetic waves, the influence of receiver synchronization errors on the position accuracy is reduced. The synchronization of the receivers is necessary for generating the timestamps for the TDoA algorithms. The receivers are connected via a wireless network (WLAN), which provides time synchronization with a precision on the order of 0.1 ms. Hence, the theoretical maximum localization error caused by synchronization errors is 3.4 cm.
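The temperature dependence of the speed of sound and the synchronization budget can both be checked numerically. The sketch uses the common ideal-gas approximation c(T) ≈ 331.3·√(1 + T/273.15) m/s for dry air; this formula is an assumption, since the equation used in the paper is not reproduced here.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s), ideal-gas model."""
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def sync_range_error(sync_error_s: float, temp_c: float = 20.0) -> float:
    """Range error (m) caused by a receiver synchronization error."""
    return sync_error_s * speed_of_sound(temp_c)


def wrong_temperature_error(distance_m: float, true_c: float, assumed_c: float) -> float:
    """Distance error (m) when the time of flight is converted with c(assumed)
    although the sound actually travelled at c(true)."""
    time_of_flight = distance_m / speed_of_sound(true_c)
    return time_of_flight * speed_of_sound(assumed_c) - distance_m


if __name__ == "__main__":
    print(f"c(20 C) = {speed_of_sound(20.0):.1f} m/s")
    print(f"0.1 ms sync error -> {100 * sync_range_error(1e-4):.2f} cm")
    print(f"10 m at 30 C computed with 20 C -> {wrong_temperature_error(10.0, 30.0, 20.0):+.3f} m")
```

A 10 °C error in the assumed air temperature already shifts a 10 m range by roughly 17 cm, which is several times larger than the 3.4 cm synchronization budget.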
Smart phones generate specific sound signals at time
Time
Equation (
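For illustration, a generic Gauss-Newton solver for 2D TDoA multilateration is sketched below. It minimizes the range-difference residuals relative to a reference receiver. This is a textbook method, not necessarily the algorithm used in ASSIST; the receiver layout, the speed of sound (343 m/s) and the function names are assumptions.

```python
import math

C = 343.0  # m/s, assumed speed of sound


def simulate_tdoas(source, receivers):
    """Noise-free TDoAs t_i - t_0 (s) of receivers[1:] relative to receivers[0]."""
    d0 = math.dist(source, receivers[0])
    return [(math.dist(source, r) - d0) / C for r in receivers[1:]]


def solve_tdoa_2d(receivers, tdoas, guess, iterations=50, tol=1e-9):
    """Gauss-Newton estimate of (x, y) from TDoAs relative to receivers[0]."""
    x, y = guess
    x0, y0 = receivers[0]
    for _ in range(iterations):
        d0 = math.hypot(x - x0, y - y0)
        a11 = a12 = a22 = b1 = b2 = 0.0          # normal equations J^T J and -J^T f
        for (xi, yi), dt in zip(receivers[1:], tdoas):
            di = math.hypot(x - xi, y - yi)
            f = (di - d0) - C * dt                # range-difference residual (m)
            jx = (x - xi) / di - (x - x0) / d0    # df/dx
            jy = (y - yi) / di - (y - y0) / d0    # df/dy
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 -= jx * f
            b2 -= jy * f
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:                      # degenerate geometry
            break
        dx = (b1 * a22 - a12 * b2) / det          # Cramer's rule for the 2x2 system
        dy = (a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            break
    return x, y


if __name__ == "__main__":
    receivers = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]  # hypothetical layout
    tdoas = simulate_tdoas((1.5, 3.2), receivers)
    print(solve_tdoa_2d(receivers, tdoas, guess=(2.5, 2.5)))
```

Starting from the centroid of the receivers is a simple choice of initial guess; with poor geometry or a start far outside the receiver hull, Gauss-Newton can converge to a wrong solution, which is one motivation for the particle filter discussed below.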
Our first approach uses only the amplitude of an incoming sound signal to detect its presence. For this purpose, the smart phone generates short sound impulses at 18 kHz.
Envelope detection of sound signals is relatively simple but suffers from several drawbacks. The amplitude of sound decreases rapidly with distance, and in the presence of background noise, wanted and unwanted signals cannot be distinguished. Figure
Description of the signal processing with envelope detection (blue block in Figure
Threshold detection of the incoming sound signals.
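A minimal sketch of envelope detection with a fixed threshold follows, assuming a 48 kHz sample rate, a 1 ms moving-average smoother, and a synthetic 18 kHz burst in white noise (no measured data). The threshold of 0.3 is a hypothetical choice of roughly half the expected burst envelope.

```python
from typing import Optional

import numpy as np

FS = 48_000.0  # Hz, assumed sample rate


def envelope(signal: np.ndarray, window_s: float = 1e-3) -> np.ndarray:
    """Full-wave rectification followed by a moving-average low-pass."""
    n = max(1, int(window_s * FS))
    return np.convolve(np.abs(signal), np.ones(n) / n, mode="same")


def first_crossing(env: np.ndarray, threshold: float) -> Optional[int]:
    """Index of the first sample whose envelope exceeds the threshold, else None."""
    above = np.flatnonzero(env > threshold)
    return int(above[0]) if above.size else None


def make_burst(onset_s=0.02, length_s=0.05, total_s=0.1, amp=1.0, noise=0.05, seed=0):
    """Synthetic 18 kHz tone burst in white noise (not measured data)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(total_s * FS)) / FS
    gate = (t >= onset_s) & (t < onset_s + length_s)
    return amp * np.sin(2 * np.pi * 18_000.0 * t) * gate + noise * rng.standard_normal(t.size)


if __name__ == "__main__":
    idx = first_crossing(envelope(make_burst()), threshold=0.3)
    print(f"detected at sample {idx} ({idx / FS * 1e3:.2f} ms)")
```

Scaling `amp` down shows the drawback named above: once the burst envelope falls below the fixed threshold, the pulse is simply missed.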
To increase the robustness against measurement outliers and incorrect initialization, we implemented a particle filter for localization of the smart phone. The algorithm is described in [
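Since the particle filter itself is described in the cited work, the following is only a generic bootstrap (sampling importance resampling) filter for 2D TDoA localization. The random-walk motion model, noise levels, particle count and receiver layout are all assumptions.

```python
import numpy as np

C = 343.0  # m/s, assumed speed of sound


def predicted_tdoas(particles, receivers):
    """TDoAs (s) relative to receiver 0 for every particle, shape (N, R-1)."""
    d = np.linalg.norm(particles[:, None, :] - receivers[None, :, :], axis=2)
    return (d[:, 1:] - d[:, :1]) / C


def pf_step(particles, measured, receivers, rng, step_std=0.1, tdoa_std=1e-4):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: pedestrian motion modelled as a random walk (assumption).
    particles = particles + rng.normal(0.0, step_std, particles.shape)
    # Weight: Gaussian likelihood of the measured TDoAs (log domain for stability).
    err = predicted_tdoas(particles, receivers) - measured
    log_w = -0.5 * np.sum((err / tdoa_std) ** 2, axis=1)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Resample (multinomial): unlikely hypotheses, e.g. from outliers, die out.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]


def run_demo(seed=1, n=2000, steps=10):
    """Static phone, noisy TDoAs, uniform prior over a hypothetical 5 m x 5 m cell."""
    rng = np.random.default_rng(seed)
    receivers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
    truth = np.array([1.5, 3.2])
    particles = rng.uniform(0.0, 5.0, size=(n, 2))   # no initialization needed
    z = predicted_tdoas(truth[None, :], receivers)[0]
    for _ in range(steps):
        measured = z + rng.normal(0.0, 2e-5, z.shape)
        particles = pf_step(particles, measured, receivers, rng)
    return particles.mean(axis=0), truth


if __name__ == "__main__":
    estimate, truth = run_demo()
    print("estimate:", estimate, "error [m]:", np.linalg.norm(estimate - truth))
```

Starting from a uniform prior illustrates the robustness to incorrect initialization: no starting position has to be supplied.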
In a second approach, we use a chirp impulse to increase the performance of the system by using pulse compression.
We use linear chirp signals as sound signals. A linear chirp is a signal whose frequency increases or decreases linearly with time (up- and down-chirps). Several of its characteristics make it suitable for localization. Signals with high energy are essential for receiving short signals over large ranges. The influence of interfering signals or white Gaussian noise can be reduced by increasing the signal energy, which increases the signal-to-noise ratio (SNR). The signal energy can be increased either by increasing the signal amplitude or the signal length. In radar and sonar applications, chirp signals are used to increase the SNR for a given bandwidth.
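The benefit of a long chirp can be quantified by its time-bandwidth product: after matched filtering, a chirp of duration T and bandwidth B is compressed to a pulse of width about 1/B, with an SNR gain of about T·B. The sketch plugs in the 50 ms, 19–20 kHz chirp used in the experiments; the 343 m/s speed of sound is assumed.

```python
import math


def compression(duration_s: float, bandwidth_hz: float, c: float = 343.0) -> dict:
    """Approximate pulse-compression figures for a linear chirp."""
    tb = duration_s * bandwidth_hz
    return {
        "time_bandwidth": tb,                      # T*B, dimensionless
        "snr_gain_db": 10.0 * math.log10(tb),      # matched-filter processing gain
        "compressed_width_s": 1.0 / bandwidth_hz,  # main-lobe width after compression
        "resolution_m": c / bandwidth_hz,          # separation of two overlapping arrivals
    }


if __name__ == "__main__":
    # 50 ms chirp from 19 kHz to 20 kHz: T*B = 50, gain about 17 dB, width 1 ms.
    print(compression(0.05, 1_000.0))
```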
When a linear chirp signal is autocorrelated, the resulting function shows a high and narrow peak, which allows signals to be detected with high temporal accuracy. When chirps in different frequency bands, or up- and down-chirps, are cross-correlated, the resulting function does not show a distinct peak. This property can be used to operate multiple emitters at the same time. References [
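The contrast between autocorrelation and up/down cross-correlation can be reproduced with synthetic chirps. The sketch assumes a 48 kHz sample rate, a 19–20 kHz band and a 50 ms duration with a rectangular envelope; it reports the cross-correlation peak relative to the autocorrelation peak.

```python
import numpy as np

FS = 48_000.0  # Hz, assumed sample rate


def linear_chirp(f0: float, f1: float, duration_s: float, fs: float = FS) -> np.ndarray:
    """x(t) = sin(2*pi*(f0*t + k/2*t^2)) with sweep rate k = (f1 - f0) / T."""
    t = np.arange(int(round(duration_s * fs))) / fs
    k = (f1 - f0) / duration_s
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))


def peak_ratio(a: np.ndarray, b: np.ndarray) -> float:
    """Max |cross-correlation(a, b)| relative to max |autocorrelation(a)|."""
    auto = np.max(np.abs(np.correlate(a, a, mode="full")))
    cross = np.max(np.abs(np.correlate(a, b, mode="full")))
    return float(cross / auto)


if __name__ == "__main__":
    up = linear_chirp(19_000.0, 20_000.0, 0.05)
    down = linear_chirp(20_000.0, 19_000.0, 0.05)
    print(f"up vs up:   {peak_ratio(up, up):.3f}")
    print(f"up vs down: {peak_ratio(up, down):.3f}")
```

The up/down cross-correlation peak stays at a small fraction of the autocorrelation peak, so two phones using opposite sweep directions in the same band can be separated.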
The chirp impulse works between
The received signal is cross-correlated with stored up- and down-chirp references. The mathematical formula for the cross-correlation of two signals
Description of the signal processing with the chirp (blue block in Figure
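Arrival-time estimation by cross-correlation with a stored reference can be sketched as a matched filter. Because the correlation of a real band-pass chirp oscillates at the carrier frequency, this sketch takes the peak of its analytic envelope (FFT-based Hilbert transform) rather than of the raw correlation. Sample rate, chirp parameters, signal amplitude and noise level are assumptions.

```python
import numpy as np

FS = 48_000.0  # Hz, assumed sample rate


def up_chirp(f0=19_000.0, f1=20_000.0, duration_s=0.05, fs=FS):
    """Linear up-chirp sweeping from f0 to f1 over duration_s."""
    t = np.arange(int(round(duration_s * fs))) / fs
    return np.sin(2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration_s * t ** 2))


def analytic_envelope(x):
    """Magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def arrival_index(received, reference):
    """Lag (samples) where the reference best matches the received signal.
    With mode='valid', corr[k] = sum_n received[k + n] * reference[n]."""
    corr = np.correlate(received, reference, mode="valid")
    return int(np.argmax(analytic_envelope(corr)))


def simulate(delay_samples=3_000, total=9_600, amp=0.05, noise=0.05, seed=2):
    """Synthetic received signal: weak chirp at a known delay plus white noise."""
    rng = np.random.default_rng(seed)
    ref = up_chirp()
    rx = noise * rng.standard_normal(total)
    rx[delay_samples:delay_samples + ref.size] += amp * ref
    return rx, ref


if __name__ == "__main__":
    rx, ref = simulate()
    idx = arrival_index(rx, ref)
    print(f"arrival at sample {idx} ({idx / FS * 1e3:.3f} ms), true 3000")
```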
A constant, static threshold limits the transmission range, because it must be set high to reduce false detections. An adaptive threshold, which increases when a signal is present and decreases for lower signal values, can improve the sensitivity of the system. Moreover, an adaptive threshold can also reduce false detections caused by echoes, since the threshold is raised after the signal has been received. Therefore, we modified the constant false alarm rate (CFAR) algorithm to calculate the adaptive threshold [
Adaptive threshold calculation by GOGO-CFAR. The greatest value is taken from both windows.
Figure
Adaptive threshold GOGO-CFAR for a real signal with small correlation amplitude (20 m distance). Echoes and noise are suppressed.
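The underlying idea can be sketched with the classic greatest-of cell-averaging CFAR (GO-CFAR): around each cell under test, a leading and a lagging reference window (separated by guard cells) are averaged, and the greater mean, scaled by a factor α, forms the threshold. This is not the authors' GOGO variant, whose modification is not reproduced here; window sizes and α are assumptions.

```python
import numpy as np


def go_cfar_threshold(x, ref=32, guard=8, alpha=6.0):
    """Greatest-of CA-CFAR threshold for every cell of the 1-D array x.
    Cells without complete reference windows get an infinite threshold."""
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.concatenate(([0.0], np.cumsum(x)))   # csum[j] = sum(x[:j])
    thr = np.full(n, np.inf)
    for i in range(ref + guard, n - ref - guard):
        # Mean of x[i-guard-ref : i-guard] (leading window).
        lead = (csum[i - guard] - csum[i - guard - ref]) / ref
        # Mean of x[i+guard+1 : i+guard+1+ref] (lagging window).
        lag = (csum[i + guard + 1 + ref] - csum[i + guard + 1]) / ref
        thr[i] = alpha * max(lead, lag)              # greatest-of rule
    return thr


def detections(x, **kwargs):
    """Indices where x exceeds its GO-CFAR threshold."""
    x = np.asarray(x, dtype=float)
    return np.flatnonzero(x > go_cfar_threshold(x, **kwargs))


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    mag = np.abs(rng.standard_normal(2000))   # stand-in for a correlation magnitude
    mag[1000] = 30.0                          # synthetic direct-path peak
    print(detections(mag))
```

Because a strong peak enters the reference windows of its neighbours, their thresholds rise, which is the mechanism that suppresses echoes arriving shortly after the direct path.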
Absolute localization systems use fixed, installed anchor nodes as infrastructure. Furthermore, the localization system has to know the position
Several self-calibrating TDoA algorithms are available to calculate the positions of the receivers (anchors). In the
For the general case of arbitrarily distributed signal positions [
For the calibration phase of the localization system, an iterative optimization algorithm is used. The “Iterative Cone Alignment” algorithm [
This section presents measurement results for localization with the acoustic system and demonstrates the possibility of using an IMU for localization.
In the first experiment, we use constant-frequency pulses and detect the presence of the signal from its envelope. Figure
Photo of the real-world experiment environment.
Trajectory of the real-world experiment with envelope detection and particle filter. The mean positioning error of the signal estimate is 0.25 m (
Cumulative distribution of distance errors for Figure
In an additional experiment, we analyzed the receiver range. For this purpose, an iPhone 4S was positioned at distances between 1 and 12 m from two receivers, in steps of 1 m. At each distance, the smart phone transmitted 500 acoustic pulses, each 50 ms long with a frequency of 18 kHz. Figure
Measurement values of ASSIST with envelope detection depending on the distance.
The measurement deviation of the system was evaluated in static experiments. Figure
Measurement errors of ASSIST with correlation and TDoA-Algorithms in a static experiment.
Furthermore, we verified our system in a dynamic real-world 2D experiment. As a reference, we defined a precisely measured walking track of 14 m. Seven receivers were placed in an oval of 10 m × 10 m around the walking track at a height of 1 m. A person walked along the defined track.
We calculated the positions of the smart phone, which transmitted 50 ms acoustic chirp impulses between 19 kHz and 20 kHz. Figure
Real-world experiment of ASSIST. Data set
The smart phone track shows an average deviation of 0.34 m (
In an additional experiment, the receiver range was analyzed. Figure
Measurement values of ASSIST with chirp correlation depending on the distance. Black line is with adaptive GOGO-CFAR threshold and red line with circles shows the results for constant threshold.
Reflections at walls or hard surfaces (e.g., cabinets) induce echoes that disturb the line-of-sight signal. These echoes reduce the accuracy of the localization system, which assumes that it operates on the line-of-sight signal. Figure
Analysis of the echoes. The abscissa represents the time delay and the ordinate represents the signal number. There are multiple echoes due to reflections.
In areas where no infrastructure is available, the integrated inertial sensors can be used to localize the user for a short period.
Today, many different sensor types are integrated into smart phones. For example, the commercial Samsung Galaxy S2 provides data from integrated sensors such as a gyroscope, an accelerometer, and a magnetic field sensor.
With localization based on inertial sensors, the measurement errors grow with observation time. The inertial sensor unit therefore serves as an additional method to support acoustic localization.
Since the smart phone is usually held in the hand, methods such as zero velocity update [
Measurement values of the acceleration sensor in
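When the phone is held in the hand rather than mounted on the foot, pedestrian dead reckoning is commonly based on step detection on the accelerometer magnitude combined with a step length and heading. The sketch illustrates that generic scheme on an idealized synthetic signal; it is not necessarily the method used in this work, and the sample rate, threshold, minimum step interval and 0.7 m step length are hypothetical.

```python
import math

G = 9.81  # m/s^2, gravity


def detect_steps(acc_norm, rate_hz=50.0, threshold=1.5, min_gap_s=0.3):
    """Indices of local maxima of |a| whose excess over g exceeds `threshold`,
    at least `min_gap_s` apart (all parameters hypothetical)."""
    steps, last = [], -math.inf
    for i in range(1, len(acc_norm) - 1):
        is_peak = acc_norm[i] >= acc_norm[i - 1] and acc_norm[i] > acc_norm[i + 1]
        if is_peak and acc_norm[i] - G > threshold and (i - last) / rate_hz >= min_gap_s:
            steps.append(i)
            last = i
    return steps


def dead_reckon(num_steps, headings_rad, step_length=0.7, start=(0.0, 0.0)):
    """Advance the position by one step length along each step's heading."""
    x, y = start
    for h in headings_rad[:num_steps]:
        x += step_length * math.cos(h)
        y += step_length * math.sin(h)
    return x, y


def synthetic_walk(n_steps=10, rate_hz=50.0, cadence_hz=2.0):
    """Idealized accelerometer magnitude: gravity plus a 2 Hz bounce of 3 m/s^2."""
    n = int(n_steps / cadence_hz * rate_hz)
    return [G + 3.0 * math.sin(2 * math.pi * cadence_hz * i / rate_hz) for i in range(n)]


if __name__ == "__main__":
    steps = detect_steps(synthetic_walk())
    print(f"{len(steps)} steps ->", dead_reckon(len(steps), [0.0] * len(steps)))
```

Any error in step length or heading accumulates with every step, so such an estimate can only bridge short gaps between acoustic position fixes.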
In an experiment, data from the smart phone's inertial sensors alone, without the ASSIST localization system, were used to track a 45 m walk in a building. The trajectory of the walk is shown in Figure
Trajectory of data from an experiment. Data from the inertial sensors of a smart phone (blue line) and a reference inertial measurement unit (red-dashed line).
In this paper, we presented a smart phone indoor localization system based on sound. The user of the system needs no additional hardware except a COTS smart phone. With our self-built receivers, which are synchronized via a Wi-Fi network, the received signals are correlated and the position is calculated with a TDoA algorithm. The first experiments showed that the system can localize a user in a real indoor environment with little effort.
The provider does not need special knowledge to install the system; hence, the installation effort is minimized. Thanks to an anchor-free algorithm, the receivers work as a plug-and-play system, and no additional information is needed, since the positions of the receivers can be calculated.
In this paper, two different approaches were tested. Envelope detection is easy to implement but has a limited transmission range of 11 m. The particle filter showed robust localization of the smart phone with an error of
For areas with poor receiver coverage, localization with the smart phone's built-in inertial sensors was tested. The integrated inertial sensors can support ASSIST as an additional localization method for short periods. Over the 45 m reference track, the maximum deviation was 1 m.
In future work, we will improve the acoustic localization in situations where there is no line of sight between the smart phone and the receivers. Errors can be reduced by fusing the data from the inertial sensors with the data from the acoustic localization. Additionally, we will use the self-calibration algorithms in combination with the particle filter to increase robustness.
We will modulate an identifier onto the signal to distinguish between different senders and enable multiuser applications. Another possibility is to use time-division multiplexing (TDM) to identify different smart phones: the time domain is divided into time slots, each of which is used by one smart phone to generate its sound.
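A TDM schedule could look like the following sketch. Each phone ID is given its own slot in a repeating frame, and each slot must cover the pulse length plus the longest propagation delay in the cell so that pulses from neighboring slots do not overlap at any receiver. The slot rule, guard time and all numbers are hypothetical.

```python
def slot_length(pulse_s: float, max_range_m: float, guard_s: float = 0.0, c: float = 343.0) -> float:
    """Slot duration (s): pulse length + longest propagation delay + optional echo guard."""
    return pulse_s + max_range_m / c + guard_s


def emission_times(phone_id: int, num_phones: int, slot_s: float, frames: int) -> list:
    """Emission start times (s) of phone `phone_id` in a frame of `num_phones` slots."""
    frame_s = num_phones * slot_s
    return [k * frame_s + phone_id * slot_s for k in range(frames)]


if __name__ == "__main__":
    slot = slot_length(0.05, 30.0, guard_s=0.02)   # hypothetical 50 ms pulse, 30 m cell
    for pid in range(3):
        print(pid, [round(t, 3) for t in emission_times(pid, 3, slot, 2)])
    print(f"position updates per phone: {1.0 / (3 * slot):.2f} Hz")
```

The trade-off is visible in the last line: every additional phone lengthens the frame and lowers the position update rate of each phone.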
In some applications (e.g., in a supermarket), the receivers should be installed in the ceiling. In this case, the user should be localized in a 2D plane at a defined height, and the algorithm must be modified for such 2.5D applications.
In addition, we will improve ASSIST by reducing measurement errors caused by multipath propagation. The experimental results should be evaluated with a reference system to measure the systematic error precisely.
This paper is an extended version of [
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work has partly been supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) within the Research Training Group 1103 (Embedded Microsystems). Special thanks are due to our technicians Hans Baumer, Christoph Bohnert and Uwe Burzlaff for the redesign of the hardware.