Electronic travel aids (ETAs) have been in focus since technology allowed designing relatively small, light, and mobile devices for assisting the visually impaired. Since visually impaired persons rely on spatial audio cues as their primary sense of orientation, providing an accurate virtual auditory representation of the environment is essential. This paper gives an overview of the current state of spatial audio technologies that can be incorporated in ETAs, with a focus on user requirements. Most currently available ETAs either fail to address user requirements or underestimate the potential of spatial sound itself, which may explain, among other reasons, why no single ETA has gained widespread acceptance in the blind community. We believe there is ample space for applying the technologies presented in this paper, with the aim of progressively bridging the gap between accessibility and accuracy of spatial audio in ETAs.
Spatial audio rendering techniques have various application areas ranging from personal entertainment, through teleconferencing systems, to real-time aviation environments [1]. They are also used in health care, for instance, in motor rehabilitation systems [2], electronic travel aids (ETAs, i.e., devices which aid in independent mobility through obstacle detection or help in orientation and navigation) [3], and other assistive technologies for visually impaired persons [4].
In the case of ETAs, the hardware has to be portable, lightweight, and user-friendly, allow for real-time operation, and be able to support long-term operation. All these requirements pose a challenge to designers and developers, one that state-of-the-art technology helps to meet in the form of high-tech mobile devices, smartphones, and so on. Furthermore, if ETAs are designed for the visually impaired (The term Electronic Travel Aid was born and is almost exclusively used to describe systems developed to help visually impaired persons with navigating their surroundings safely and efficiently. Nevertheless, visually impaired persons are not strictly the only group who might benefit from ETAs: for instance, nonvisual interaction focused towards navigation is of interest to firefighters operating in smoke-filled buildings [5].), even more aspects have to be considered. Beyond the aforementioned, the devices should have a special user interface as well as alternative input and output solutions, where feedback in the form of sound can enhance the functionality of the device. Most of the developments of ETAs for the visually impaired aim at safety during navigation, such as avoiding obstacles, recognizing objects, and extending the auditory information by spatial cues [6, 7]. Since visually impaired persons rely on spatial audio cues as their primary sense of orientation [8], providing them with an accurate virtual auditory representation of the environment is essential.
ETAs evolved considerably over the past years, and a variety of virtual auditory displays [9] were proposed, using different spatial sound techniques and sonification approaches, as well as basic auditory icons, earcons, and speech [10]. Available ETAs for the visually impaired provide various information that ranges from simple obstacle detection with a single range-finding sensor, to more advanced feedback employing data generated from visual representations of the scenes, acquired through camera technologies. The auditory outputs of such systems range from simple binary alerts indicating the presence of an obstacle in the range of a sensor, to complex spatial sound patterns aiming at sensory substitution and carrying almost as much information as a graphical image [7, 11].
A division can also be made between local mobility aids (environmental imagers or obstacle detectors, with visual or ranging sensors) that present only the nearest surroundings to the blind traveler and navigation aids (usually GPS- or beacon-based) that provide information on path waypoints [12] or geographical points of interest [13]. While the latter group focuses on directions towards the next waypoint, meaning that a limited spatial sound rendering could be used (e.g., just presenting sounds in the horizontal plane) [14], the former group primarily provides information on obstacles (or the lack of them) and near scene layouts (e.g., walls and shorelines), supporting an accurate spatial representation of the scene [6].
Nevertheless, most of these systems are still in their infancy and at a prototype stage. Moreover, no single electronic assistive device has gained widespread acceptance in the blind community, for different reasons: limited functionalities, poor ergonomics, low scientific/technological value, limited end-user involvement, high cost, and a potential lack of commercial/corporate interest in pushing high-quality electronic travel aids [3].
While many excellent recent reviews on ETA solutions are available (see, e.g., [3, 4, 6, 7]), to our knowledge none of these works critically discusses or analyzes in depth the important aspect of spatial audio delivery. This paper gives an overview of existing solutions for delivering spatial sound, focusing on wearable technologies suitable for use in electronic travel aids for the visually impaired. The analysis reported in this paper indicates a significant potential to achieve accurate spatial sound rendering through state-of-the-art audio playback devices suitable for visually impaired persons and advances in customization of virtual auditory displays. This review was carried out within the European Horizon 2020 project named Sound of Vision (http://www.soundofvision.net). Sound of Vision focuses on creating an ETA for the blind that translates 3D environment models, acquired in real-time, into their corresponding real-time auditory and haptic representations [15].
The remainder of the paper is organized as follows. Section 2 reviews the basics of 3D sound localization, with a final focus on blind localization. Section 3 introduces the available state-of-the-art software solutions for customized binaural sound rendering, while Section 4 presents the available state-of-the-art hardware solutions suitable for the visually impaired. Finally, in Section 5 we discuss current uses and future perspectives of spatial audio in ETAs.
2. Basics of 3D Sound Localization
Localizing a sound source means determining the location of the sound’s point of origin in the three-dimensional sound space [16]. Location is defined according to a head-related coordinate system, for instance, the interaural polar system. In the interaural polar coordinate system the origin coincides with the interaural midpoint and the elevation angle ϕ goes from -180∘ to 180∘ with negative values below the horizontal plane and positive values above, while the azimuth angle θ ranges from -90∘ at the left ear to 90∘ at the right ear. The third dimension, distance r, is the Euclidean distance between the sound source and the origin. In the following we will refer to the three planes that divide the head into halves as the horizontal plane (upper/lower halves), the median plane (left/right halves), and the frontal plane (front/back halves).
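The interaural polar coordinate conventions above can be sketched as a small conversion routine. Note that the Cartesian axis orientation below (x forward, y towards the right ear, z up) is our own assumption for illustration; only the resulting angular ranges match the definitions in the text.

```python
import math

def cartesian_to_interaural_polar(x, y, z):
    """Convert head-centered Cartesian coordinates to interaural polar
    coordinates as defined in the text.

    Assumed axis convention (illustrative): x points forward, y points
    towards the right ear, z points up; origin at the interaural
    midpoint; (x, y, z) must not all be zero.

    Returns (azimuth_deg, elevation_deg, distance):
      azimuth   in [-90, 90]: -90 at the left ear, +90 at the right ear
      elevation in (-180, 180]: 0 in front, 90 above, 180 behind,
                 negative below the horizontal plane
      distance  Euclidean distance r from the interaural midpoint
    """
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.asin(y / r))    # angle away from the median plane
    elevation = math.degrees(math.atan2(z, x))  # angle around the interaural axis
    return azimuth, elevation, r
```

For example, a source straight ahead maps to (0, 0, r), one directly above to (0, 90, r), and one at the right ear to (90, 0, r).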
Spatial cues for sound localization can be categorized according to polar coordinates. As a matter of fact, each coordinate is thought to have one or more dominant cues in a certain frequency range associated with a specific body component, in particular the following:
Azimuth and distance cues at all frequencies are associated with the head.
Elevation cues at high frequencies are associated with the pinnae.
Elevation cues at low frequencies are associated with torso and shoulders.
Based on well-known concepts and results, the most relevant cues for sound localization are now discussed [17].
2.1. Azimuth Cues
At the beginning of the twentieth century, Lord Rayleigh studied the means through which a listener is able to discriminate at a first level the horizontal direction of an incoming sound wave. Following his Duplex Theory of Localization [18], azimuth cues can be reduced to two basic quantities thanks to the active role of the head in the differentiation of incoming sound waves, that is, the following:
Interaural Time Difference (ITD), defined as the temporal delay between sound waves at the two ears
Interaural Level Difference (ILD), defined as the ratio between the instantaneous amplitudes of the same two sounds.
ITD is known to be frequency-independent below 500Hz and above 3kHz, with the low-frequency ITD exceeding the high-frequency ITD by a ratio of approximately 3/2, and slightly variable at middle range frequencies [19]. Conversely, frequency-dependent shadowing and diffraction effects introduced by the human head cause ILD to depend greatly on frequency.
Consider a low-frequency sinusoidal signal (up to 1kHz approximately). Since its wavelength is greater than the head dimensions, ITD is no more than a phase lag Δϕ<2π between the signals arriving at the ears and therefore a reliable cue for horizontal perception in the low-frequency range [16]. Conversely, the considerable shielding effect of the human head on high-frequency waves (above 1kHz) makes ILD the most relevant cue in such spectral range.
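As a rough numerical illustration of the ITD cue, the classical Woodworth rigid-sphere formula can be sketched as follows. This is a textbook approximation, not a formula proposed in this paper; the default head radius of 8.75 cm is a commonly assumed average.

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.2):
    """Woodworth approximation of the ITD for a rigid spherical head:
    ITD = (a / c) * (sin(theta) + theta), with azimuth theta in radians,
    valid for frontal sources with |theta| <= 90 degrees.

    head_radius: sphere radius a in meters (illustrative average value).
    c: speed of sound in m/s.
    Returns the ITD in seconds; positive values mean the sound reaches
    the right ear first (azimuth towards the right).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (math.sin(theta) + theta)
```

For a source directly to the side (azimuth 90∘), this yields an ITD of roughly 0.65 ms, consistent with the commonly cited maximum interaural delay for an average head.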
Still, the information provided by ITD and ILD can be ambiguous. If one assumes a spherical geometry of the human head, a sound source located in front of the listener at azimuth θ and a second one located at the rear, at azimuth 180∘-θ, provide in theory identical ITD and ILD values. In practice, ITD and ILD will not be identical at these two azimuth angles because the human head is clearly not spherical, and all subjects exhibit slight asymmetries with respect to the median plane. Nonetheless their values will be very similar, and front-back confusion is in fact often observed experimentally [20]: listeners erroneously locate sources at the rear instead of the front (or less frequently, vice versa).
2.2. Elevation Cues
Directional hearing in the median vertical plane is known to have lower resolution compared with that in the horizontal plane [21]. Specifically, the smallest change of position of a sound source producing a just-noticeable change of position of the auditory event (known as “localization blur”) along the median plane was found to be never less than 4∘, reaching a much larger threshold (≈17∘) for unfamiliar speech sounds, as opposed to a localization blur of approximately 1∘-2∘ in the frontal part of the horizontal plane for a vast class of sounds [16]. Such poor resolution is due to
the need of high-frequency content (above 4-5 kHz) for accurate vertical localization [22, 23];
mild interaural differences between the signals arriving at the left and right ear for sources in the median plane.
If a source is located outside the horizontal plane, ITD- and ILD-based localization becomes problematic. As a matter of fact, sound sources located at all possible points of a conic surface pointing towards the ear of a spherical head produce the same ITD and ILD values. These surfaces, which generalize the aforementioned concept of front-back confusion for elevation angles, are known as cones of confusion and represent a potential difficulty for accurate perception of sound direction.
Nonetheless, it is undisputed that vertical localization ability is brought by the presence of the pinnae [24]. Even though localization in any plane involves pinna cavities of both ears [25], determination of the perceived vertical angle of a sound source in the median plane is essentially a monaural process [26]. The external ear plays an important role by introducing peaks and notches in the high-frequency spectrum of the incoming sound, whose center frequency, amplitude, and bandwidth greatly depend on the elevation angle of the sound source [27, 28], to a remarkably minor extent on azimuth [29], and are almost independent of distance between source and listener beyond a few centimeters from the ear [30, 31]. Such spectral effects are physically due to reflections on pinna edges as well as resonances and diffraction inside pinna cavities [26, 29, 32].
In general, both pinna peaks and notches are thought to play an important function in vertical localization of a sound source [33, 34]. Contrary to notches, peaks alone are not sufficient vertical localization cues [35]; however, the addition of spectral peaks supports the improvement of localization performance at upper directions with respect to notches alone [36]. It is also generally considered that a sound source has to contain substantial energy in the high-frequency range for accurate judgement of elevation, because wavelengths significantly longer than the size of the pinna are not affected. Since wavelength λ and frequency f are related as λ=c/f (Here c is the speed of sound, typically c = 343.2 m/s in dry air at 20°C.), we could roughly state that pinnae have relatively little effect below f = 3 kHz, corresponding to an acoustic wavelength of λ ≈ 11 cm.
While the role of the pinna in vertical localization has been extensively studied, the role of torso and shoulders is less understood. Their effects are relatively weak if compared to those due to the head and pinnae, and experiments to establish the perceptual importance of the relative cues have produced mixed results in general [23, 37, 38]. Shoulders disturb incident sound waves at frequencies lower than those affected by the pinna by providing a major additional reflection, whose delay is proportional to the distance from the ear to the shoulder when the sound source is directly above the listener. Complementarily, the torso introduces a shadowing effect for sound waves coming from below. Torso and shoulders are also commonly seen to perturb low-frequency ITD, even though it is questionable whether they may help in resolving localization ambiguities on a cone of confusion [39].
However, as Algazi et al. remarked [38], when a signal is low-passed below 3 kHz, elevation judgement in the median plane is very poor compared to a broadband source, but it proportionally improves as the source is progressively moved away from the median plane, where performance is more accurate in the back than in the front. This result suggests the existence of low-frequency cues for elevation that, although weak overall, are significant away from the median plane.
2.3. Distance and Dynamic Cues
Distance estimation of a sound source (see [40] for a comprehensive review on the topic) is even more troublesome than elevation perception. At a first level, when no other cue is available, sound intensity is the first variable that is taken into account: the weaker the intensity, the farther the source should be perceived. Under anechoic conditions, sound intensity reduction with increasing distance can be predicted through the inverse square law: the intensity of an omnidirectional sound source will decay by approximately 6 dB for each doubling of distance [41]. Still, a distant blast and a whisper at a few centimeters from the ear could produce the same sound pressure level at the eardrum. Having a certain familiarity with the involved sound is thus a second fundamental requirement [42].
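The inverse square law translates directly into code; the following minimal sketch predicts the free-field level of an omnidirectional source at an arbitrary distance from a reference measurement.

```python
import math

def level_at_distance(level_ref_db, d_ref, d):
    """Free-field (anechoic) sound pressure level of an omnidirectional
    point source at distance d, given a reference level (dB) measured at
    distance d_ref, using the inverse square law:
        L(d) = L_ref - 20 * log10(d / d_ref)
    Each doubling of distance therefore costs about 6 dB, as stated in
    the text. Distances are in meters and must be positive.
    """
    return level_ref_db - 20.0 * math.log10(d / d_ref)
```

For example, a source measured at 60 dB SPL at 1 m is predicted at roughly 54 dB at 2 m and 40 dB at 10 m; real environments deviate from this due to reverberation and air absorption, as discussed next.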
However, the apparent distance of a sound source is systematically underestimated in an anechoic environment [43]. On the other hand, if the environment is reverberant, additional information can be given by the direct to reflected energy ratio, or DRR, which functions as a stronger cue for distance than intensity: a sensation of changing distance occurs if the overall intensity is constant but the DRR is altered [41]. Furthermore, distance-dependent spectral effects also have a role in everyday environments: higher frequencies are increasingly attenuated with distance due to air absorption effects.
Literature on source direction perception is generally based on a fundamental assumption, namely that the sound source is sufficiently far from the listener. In particular, previously discussed azimuth and elevation cues are distance-independent when the source is in the so-called far-field (approximately more than 1.5 m from the center of the head) where sound waves reaching the listener can be assumed to be planar. On the other hand, when the source is in the near field some of the previously discussed cues exhibit a clear dependence on distance. As the sound source is gradually brought closer to the listener’s head in the near field, it was observed that low-frequency gain is emphasized, ITD slightly increases, and ILD dramatically increases across the whole spectrum for lateral sources [20, 30, 44]. The following conclusions were drawn:
Elevation-dependent features are not correlated to distance-dependent features.
ITD is roughly independent of distance even when the source is close.
Low-frequency ILDs are the dominant auditory distance cues in the near field.
It should then be clear that ILD-related information needs to be considered in the near field, where dependence on distance cannot be approximated by a simple inverse square law.
Finally, it has to be remarked that both source direction and distance perception improve when switching from a static to a dynamic environment, that is, when the source and/or the listener move with respect to each other. The tendency to turn towards the sound source in order to minimize interaural differences, even without visual aid, is commonly seen and aids in disambiguating front/back confusion [45]. Active motion helps especially in azimuth estimation and to a lesser extent in elevation estimation [46]. Furthermore, thanks to the motion parallax effect, slight translations of the listener’s head on the horizontal plane can help discriminate source distance [47, 48]: if the source is near, its angular direction will drastically change after the translation (reflecting itself onto interaural differences), while for a distant source this will not happen.
2.4. Sound Source Externalization
Real sound sources are typically externalized, that is, perceived to be located outside our own head. However, when virtual 3D sound sources are presented through headphones (see next section), in-the-head localization may typically occur and have a major impact on localization ability. Alternatively, listeners may perceive the direction of the sound source and be able to make accurate localization judgements yet accompanied with perception of the source being much closer to the head than intended (e.g., on the surface of the skull [49]). However, when relevant constraints are taken into account, such as the use of individually measured head-related transfer functions as explained in Section 3, virtual sound sources can be externalized almost as efficiently as real sound sources [50, 51]. Externalization is, along with other attributes such as coloration, immersion, and realism, one of the key perceptual attributes that go beyond the basic issue of localization recently proposed for the evaluation of virtually rendered sound sources [52].
In-the-head localization is mainly introduced by the loss of accuracy in interaural level differences and spectral profiles in virtually rendered sound sources [49]. Another extremely important factor is given by the interaural and spectral changes triggered by natural head movements in real-life situations: correctly tracked head movements can indeed substantially enhance externalization in virtual sonic environments, especially for sources close to the median plane (hardest to externalize statically in anechoic conditions, due to minimal interaural differences [53]), and even relatively small movements of a few degrees can efficiently reduce in-the-head localization [54]. Furthermore, it has recently been shown that externalization can persist once coherent head movement with the virtual auditory space is stopped [55].
Finally, factors related to sound reverberation contribute to a strong sense of externalization, as opposed to dry anechoic sound. The introduction of artificial reverberation [56] through image-source model-based early reflections, wall and air absorption, and late reverberation can significantly contribute to sound image externalization in headphone-based 3D audio systems [57], as well as congruence between the real listening room and the virtually recreated reverberating environment [58].
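As a minimal sketch of the image-source principle mentioned above, the following routine computes the arrival delays of the six first-order reflections in an idealized shoebox room. The shoebox geometry, wall placement at coordinates 0 and L along each axis, and lossless walls are illustrative assumptions; a full artificial reverberator would add absorption, higher-order images, and late reverberation.

```python
import math

def first_order_reflection_delays(src, lis, room, c=343.2):
    """Arrival delays (in seconds) of the six first-order image-source
    reflections in a shoebox room.

    src, lis: (x, y, z) positions of source and listener, in meters,
              inside the room.
    room:     (Lx, Ly, Lz) room dimensions; walls lie at coordinates
              0 and L along each axis (an illustrative convention).
    Returns the six delays sorted in ascending order.
    """
    delays = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]  # mirror source across the wall
            delays.append(math.dist(img, lis) / c)
    return sorted(delays)
```

Each reflection necessarily travels a longer path than the direct sound, so all six delays exceed the direct-path delay; feeding these as delayed, attenuated copies of the dry signal is the starting point of image-source-based artificial reverberation.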
2.5. Auditory Localization by the Visually Impaired
A number of previous studies showed that sound source localization by visually impaired persons can be different from that of sighted persons. It has to be first highlighted that previous investigations on visually impaired subjects indicated neither better auditory sensitivity [59–61] nor lower auditory hearing thresholds [62] compared to normally sighted subjects. On the other hand, visually impaired subjects acquire the ability to use auditory information more efficiently thanks to the plasticity of the central nervous system, as, for instance, in speech discrimination [63], temporal resolution [64], or spatial tuning [65].
Experiments with real sound sources suggest that visually impaired (especially early blind) subjects map the auditory environment with equal or better accuracy than sighted subjects on the horizontal plane [62, 66–68] but are less accurate in detecting elevation [67] and show an overly compressed auditory distance perception beyond the near field [69]. However, unlike sighted subjects, visually impaired subjects can correctly localize sounds monaurally [66, 70], which suggests a trade-off in localization proficiency between the horizontal and median planes [71]. By comparing behavioral and electrophysiological indices of spatial tuning within the central and peripheral auditory space in congenitally blind and normally sighted but blindfolded adults, it was found that blind participants displayed localization abilities that were superior to those of sighted controls, but only when attending to sounds in peripheral auditory space [72]. Still, it has to be taken into account that early blind subjects have no possibility of learning the mapping between auditory events and visual stimuli [73].
While localizing, adapting to the coloration of the signals is a relevant component for both sighted and blind subjects. Improved obstacle sense of the blind is also mainly due to enhanced sensitivity to echo cues [74], which allows so-called echolocation [75, 76]. Thanks to this obstacle sensing ability, which can be improved by training, distance perception in blind subjects may be enhanced [68, 76–78]. In addition, some blind subjects are able to determine size, shape, or even texture of obstacles based on auditory cues [70, 77, 79, 80].
Turning to virtual auditory displays, the focus of this paper, a detailed comparative evaluation of blind and sighted subjects [81] confirmed some of the previously discussed results in the literature on localization with real sound sources. Better performance in localizing static frontal sources was obtained in the blind group due to a decreased number of front-back reversals. In the case of moving sources, blind subjects were more accurate in determining movements around the head in the horizontal plane. Sighted participants, however, performed better during listening to ascending movements in the median plane and in identifying sound sources in the back. In-the-head localization rates and the ability to detect descending movements were almost identical for the two groups. In a further experiment [82], error rates of about 6 to 14 degrees horizontally and 9 to 24 degrees vertically were measured for a pool of blind subjects. Improvements in localization by blind persons were observed mainly in the horizontal plane and in case of a broadband stimulus.
Finally, although visual information corresponding to auditory information significantly aids localization and creation of correct spatial mental mappings, it has to be remarked that visually impaired subjects can benefit from off-site representations in order to gain spatial knowledge of a real environment. For instance, results of recent studies showed that interactive exploration of virtual acoustic spaces [83–85] and audio-tactile maps [86] can provide relevant information for the construction of coherent spatial mental maps of a real environment in blind subjects and that such mental representations preserve topological and metric properties, with performances comparable or even superior to an actual navigation experience.
3. Binaural Technique
The most basic method for simulating sound source direction over loudspeakers is to use panning. This usually refers to amplitude panning using two channels (stereo panning). In this case, only level information is used as a balance between the channels, and the virtual source is shifted towards the louder channel. However, ILD and spectral cues are determined by the actual speaker locations. In traditional stereo setups, where loudspeakers and listener form a triangle, sources can be correctly simulated on the line ideally connecting the two speakers. However, although traditional headphones also use two channels, correct directional information is not maintained due to a different arrangement of the speakers with respect to the listener and to the loss of crosstalk between the channels.
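The amplitude panning described above is commonly implemented with a constant-power pan law, sketched below. The [-1, 1] pan-position convention is our own illustrative choice; the essential property is that the two gains keep the total radiated power constant across pan positions.

```python
import math

def constant_power_pan(pan):
    """Constant-power stereo amplitude panning (a standard pan law, used
    here to illustrate the level-balance principle described in the text).

    pan: -1.0 = hard left, 0.0 = center, +1.0 = hard right (assumed
         convention for this sketch).
    Returns (left_gain, right_gain); the gains satisfy
    left_gain**2 + right_gain**2 == 1, so perceived loudness stays
    constant as the virtual source moves between the speakers.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)
```

At center the two gains are both 1/√2 (about -3 dB each), avoiding the level dip at the middle that a simple linear crossfade would produce.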
Spatial features of virtual sound sources can be more realistically rendered through headphones by processing an input sound with a pair of filters, each simulating all the linear transformations undergone by the acoustic signal during its path from the sound source to the corresponding listener’s eardrum. These filters are known in the literature as head-related transfer functions (HRTFs) [87], formally defined as the frequency-dependent ratio between the sound pressure level (SPL) Φ(θ,ϕ,ω) at the eardrum and the free-field SPL Φf(ω) at the center of the head as if the listener were absent:

H(θ,ϕ,ω) = Φ(θ,ϕ,ω) / Φf(ω),  (1)

where (θ,ϕ) indicates the angular position of the source relative to the listener and ω is the angular frequency. The HRTF contains all of the information relative to sound transformations caused by the human body, in particular by the head, external ears, torso, and shoulders.
HRTF measurements are typically conducted in large anechoic rooms. Usually, a set of loudspeakers is arranged around the subject, pointing towards him/her and spanning an imaginary spherical surface. The listener is positioned so that the center of the interaural axis coincides with the center of the sphere defined by the loudspeakers and their rotation (or, equivalently, the subject’s rotation). A probe microphone is inserted into each ear, either at the entrance or inside the ear canal. The measurement technique consists of recording and storing the signal arriving at the microphones. Subsequently, these signals are processed in order to remove the effects of the room and the recording equipment (especially speakers and microphones), leaving only the HRTF [87, 88].
By processing a desired monophonic sound signal with a pair of individual HRTFs, one per channel, and by adequately accounting for headphone-induced spectral coloration (see next Section), authentic 3D sound experiences can take place. Virtual sound sources created with individual HRTFs can be localized almost as accurately as real sources and efficiently externalized [50], provided that head movements can be made and that the sound is sufficiently long [89]. As a matter of fact, localization of short broadband sounds without head movements is less accurate for virtual sources than for real sources, especially in regard to vertical localization accuracy [90], and front/back reversal rates are higher for virtual sources [89].
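In the time domain, processing a mono signal with a pair of HRTFs amounts to convolving it with the corresponding head-related impulse responses (HRIRs). A dependency-free sketch follows; the two-tap HRIRs used in the usage note are purely illustrative toys, not measured data.

```python
def convolve(x, h):
    """Direct-form linear convolution (pure Python, to keep the sketch
    dependency-free; a real-time renderer would use FFT-based filtering)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono signal (list of floats) at one fixed virtual
    direction by filtering it with a left/right HRIR pair.
    Returns a list of (left, right) sample tuples, zero-padded so both
    channels have equal length."""
    n = len(mono) + max(len(hrir_left), len(hrir_right)) - 1
    left = convolve(mono, hrir_left)
    right = convolve(mono, hrir_right)
    left += [0.0] * (n - len(left))
    right += [0.0] * (n - len(right))
    return list(zip(left, right))
```

For example, rendering a unit impulse with toy HRIRs [1.0, 0.5] (left) and [0.0, 0.25] (right) produces a signal that is both louder and earlier at the left ear, a crude combination of the ILD and ITD cues of Section 2.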
Unfortunately, the individual HRTF measurement technique requires the use of dedicated research facilities. Furthermore, the process can take up to several hours, depending on the used measurement system and on the desired spatial grid density, being uncomfortable and tedious for subjects. As a consequence, most practical applications use nonindividual (or generic) HRTFs, for instance, measured on dummy heads, that is, mannequins constructed from average anthropometric measurements. Several generic HRTF sets are available online. The most popular are based on measurements using the KEMAR mannequin [91] or the Neumann KU-100 dummy head (see the Club Fritz study [92]). Alternatively, an HRTF set can be taken from one of many public databases of individual measurements (see, e.g., [93]); many of these databases were recently unified in a common HRTF format known as Spatially Oriented Format for Acoustics (SOFA) (https://www.sofaconventions.org/).
On the other hand, while nonindividual HRTFs represent the cheapest means of providing 3D perception in headphone reproduction, especially in the horizontal plane [94, 95], listening to nonindividual spatial sounds is more likely to result in evident sound localization errors such as incorrect perception of source elevation, front-back reversals, and lack of externalization [96] that cannot be fully counterbalanced by additional spectral cues, especially in static conditions [46]. In particular, individual elevation cues cannot be characterized through generic spectral features.
For the above reasons, different alternative approaches towards HRTF-based synthesis were proposed throughout the last decades [37, 97]. These are now reviewed, ordered by increasing level of customization.
3.1. HRTF Selection Techniques
HRTF selection techniques typically use specific criteria in order to choose the best HRTF set for a particular user from a database. Seeber and Fastl [98] proposed a procedure according to which one HRTF set is selected based on multiple criteria such as spatial perception, directional impression, and externalization. Zotkin et al. [99] selected the HRTF set that best matched an anthropometric data vector of the pinna. Geronazzo et al. [100] and Iida et al. [101] selected the HRTF set whose extracted pinna notch frequencies were closest to the hypothesized frequencies of the user according to a reflection model and an anthropometric regression model, respectively.
Similarly, selection can be targeted at detecting a subset of HRTFs in a database that fit the majority of a pool of listeners. Such an approach was pursued, for example, by So et al. [102] through cluster analysis and by Katz and Parseihian [103] through subjective ratings. The choice of the personal best HRTF among this reduced set is left to the user. A different selection approach was undertaken by Hwang et al. [104] and Shin and Park [105], who modeled HRIRs on the median plane as linear combinations of basis functions whose weights were then interactively self-tuned by the listeners themselves.
Results of localization tests included in the majority of these works show a general decrease of the average localization error as well as of the front/back reversal and inside-the-head localization rates using selected HRTFs rather than generic HRTFs.
3.2. Analytical Solutions
These methods try to find a mathematical solution for the HRTF, taking into account the size and shape of the head and torso in particular. The most recurring head model in the literature is that of a rigid sphere, where the response related to a fixed observation point on the sphere’s surface can be described by means of an analytical transfer function [106]. Brown and Duda [37] proposed a first-order approximation of this transfer function for sources in the far-field as a minimum-phase analog filter. Near-field distance dependence can be accounted for through an additional filter structure [107].
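The first-order head-shadow approximation of Brown and Duda [37] can be sketched by evaluating the magnitude of its one-pole, one-zero analog transfer function. In the original work the parameter α is tied to the incidence angle by an empirical fit; here α is passed directly, so the sketch shows only the filter's frequency behavior.

```python
import math

def head_shadow_gain_db(f, alpha, head_radius=0.0875, c=343.2):
    """Magnitude response (dB) of the first-order spherical head-shadow
    filter in the spirit of Brown and Duda [37]:

        H(w) = (1 + j*alpha*w / (2*w0)) / (1 + j*w / (2*w0)),  w0 = c/a

    f:     frequency in Hz.
    alpha: high-frequency gain; roughly 2 on the ipsilateral side
           (up to a +6 dB boost) and below 1 on the shadowed side
           (high-frequency cut). Its mapping from incidence angle is an
           empirical fit in the original model and is omitted here.
    """
    w = 2.0 * math.pi * f
    w0 = c / head_radius           # corner frequency set by head radius
    num = 1.0 + (alpha * w / (2.0 * w0)) ** 2
    den = 1.0 + (w / (2.0 * w0)) ** 2
    return 10.0 * math.log10(num / den)
```

At low frequencies the gain approaches 0 dB regardless of α, matching the observation that the head barely affects long wavelengths, while at high frequencies the gain tends to 20·log10(α): a boost on the bright side of the head and an attenuation in its shadow.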
Although the spherical head model provides a satisfactory approximation to the low-frequency magnitude of a measured HRTF [108], it is far less accurate in predicting ITD, which is actually variable around a cone of confusion by as much as 18% of the maximum interaural delay [109]. ITD estimation accuracy can be improved by considering an ellipsoidal head model that can account for the ITD variation and be adapted to individual listeners [110]. It has to be highlighted, however, that ITD estimation from HRTFs is a nontrivial operation, given the large variability of objective and perceptual ITD results produced by different common calculation methods for the same HRTF dataset [111, 112].
A spherical model can also approximate the contribution of the torso to the HRTF. Coaxial superposition of two spheres of different radii, separated by a distance accounting for the neck, results in the snowman model [113]. The far-field behavior of the snowman model was studied in the frontal plane both by direct measurements on two rigid spheres and by computation through multipole reexpansion [114]. A filter model was also derived from the snowman model [113]; its structure distinguishes the two cases where the torso acts as a reflector or as a shadower, switching between the two filter substructures as soon as the source enters or leaves the torso shadow zone, respectively. Additionally, an ellipsoidal model for the torso was studied in combination with the usual spherical head [38]. Such a model is able to account for different torso reflection patterns; listening tests confirmed that this approximation and the corresponding measured HRTF gave similar results, showing larger correlations away from the median plane.
A drawback of these techniques is that since they do not consider the contribution of the pinna, the generated HRTFs match measured HRTFs at low frequencies only, lacking spectral features at higher frequencies [115].
3.3. Structural HRTF Models
According to the structural modeling approach, the contributions to the HRTF of the user’s head, pinnae, torso, and shoulders, each accounting for some well-defined physical phenomena, are treated separately and modeled with a corresponding filtering element [37]. The global HRTF model is then constructed by combining all the considered effects [116]. Structural modeling opens the way to an interesting form of content adaptation to the user’s anthropometry, since parameters of the rendering blocks can be estimated from physical data, fitted, and finally related to anthropometric measurements.
Structural models typically assume a spherical or ellipsoidal geometry for both the head and torso, as discussed in the previous subsection. Effective customizations of the spherical head radius given the head dimensions were proposed [117, 118], resulting in a close agreement with experimental ITDs and ILDs, respectively. Alternatively, ITD can be synthesized separately using individual morphological data [119]. An ellipsoidal torso can also be easily customized for a specific subject by directly defining control points for its three axes on the subject’s torso [114]. Furthermore, a great variety of pinna models is available in the literature, ranging from simple reflection models [120] and geometric models [121] to more complex physical models that treat the pinna either as a configuration of cavities [122] or as a reflecting surface [29]. Structural models of the pinna, simulating its resonant and reflective behaviors in two separate filter blocks, were also proposed [123–125].
Algazi et al. [93] suggested using a number of one-dimensional anthropometric measurements for HRTF fitting through regression methods or other machine learning techniques. This approach was recently pursued in a number of studies [126–129] investigating the correspondence between anthropometric parameters and HRTF shape. When suitable processing is performed on HRTFs, clear relations with anthropometry emerge. For instance, Middlebrooks [130] reported a correlation between pinna size and center frequencies of HRTF peaks and notches and argued that similarly shaped ears that differ in size just by a scale factor produce similarly shaped HRTFs that are scaled in frequency. Further evidence of the correspondence between pinna shape and HRTF peaks [123, 131, 132] and notches [125, 133, 134] is provided in a number of following works. The use of such knowledge leads to the effective parametrization of structural pinna models based on anthropometric parameters, which suggests an improvement in median plane localization with respect to generic HRTFs [135, 136].
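The link between pinna geometry and notch frequencies can be illustrated with a single-reflection model: a reflected path of extra length 2d interferes with the direct sound and carves a comb of notches whose frequencies scale inversely with pinna size. The function below is only a sketch of this idea; real pinna models involve multiple, direction-dependent reflection paths, and the choice of reflection sign is an assumption.

```python
def pinna_notch_frequencies(d_m, n_notches=3, c=343.0, negative_reflection=True):
    """First spectral notch frequencies (Hz) produced by a single pinna
    reflection at distance d_m from the ear canal entrance. The reflected
    path adds a delay tau = 2*d/c; interference with the direct sound
    creates a comb of notches."""
    tau = 2.0 * d_m / c
    if negative_reflection:
        # 1 - |rho|*e^{-j w tau}: notches where the delayed copy is in phase
        return [n / tau for n in range(1, n_notches + 1)]
    # 1 + rho*e^{-j w tau}: notches where the delayed copy is in antiphase
    return [(2 * n + 1) / (2.0 * tau) for n in range(n_notches)]

# A reflection point 1 cm from the canal entrance puts the first notch
# around 17 kHz; doubling the distance halves all notch frequencies,
# consistent with the frequency-scaling effect of ear size noted by
# Middlebrooks.
small = pinna_notch_frequencies(0.01)
large = pinna_notch_frequencies(0.02)
```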
3.4. Numerical HRTF Simulations
Numerical methods typically require as input a 3D mesh of the subject, in particular the head and torso, and include approaches such as finite-difference time domain (FDTD) methods [108], the finite element method (FEM) [137], and the boundary element method (BEM) [138].
Recent literature has focused on the BEM. It is known that high-resolution meshes are needed in order to effectively simulate HRTFs with the BEM, especially for the pinna area. Indeed, low mesh resolution results in simulated HRTFs that greatly differ from acoustically measured HRTFs at high frequencies, thus destroying elevation cues [139]. However, as the number of mesh elements grows, memory requirements and computational load grow even faster [140]. Recent works introduced the fast multipole method (FMM) and the reciprocity principle (i.e., interchanging sources and receivers) in order to face BEM efficiency issues [140, 141]. Ultimately, localization performance with HRTFs simulated through the BEM was found to be similar to that observed with acoustically measured HRTFs [142], and databases of simulated HRTFs [143] as well as open-source tools for calculating HRTFs through the BEM given a head mesh as input [144] are available online.
On the other hand, image-based 3D modeling, based on the reconstruction of 3D geometry from a set of user pictures, is a fast and cost-effective alternative to obtaining mesh models [145]. Furthermore, the advent of consumer level depth cameras and the availability of huge computational power on consumer computers open new perspectives towards very cheap and yet very accurate calculation of individualized HRTFs.
4. Headphone Technologies
One of the crucial variables for generating HRTF-based binaural audio is the headphone itself. Headphones are of different types (e.g., circumaural, supra-aural, extra-aural, and in-ear) and can have transfer functions that are far from linear. The main issue with classic headphones is that the transfer function between headphone and eardrum heavily varies from person to person and with small displacements of the headphone itself [146, 147]. Such variation is particularly marked in the high-frequency range where important elevation cues generally lie. As a consequence, headphone playback introduces significant localization errors, such as in-the-head localization, front-back confusion, and elevation shift [148].
In order to preserve the relevant localization cues provided by HRTF filtering during headphone listening, various headphone equalization techniques, usually based on a prefiltering with the inverse of the average headphone transfer function, are used [149]. However, previous research suggests that these techniques provide little to no benefit when nonindividual (even selected) HRTFs are used [149, 150]. On the other hand, several authors support the use of individual headphone compensation in order to preserve localization cues in the high-frequency range [146, 147].
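A typical way to build such an inverse prefilter is regularized inversion in the frequency domain, which keeps the equalizer from excessively boosting deep notches in the headphone response. The sketch below operates on per-bin frequency-response values; the regularization constant and the synthetic response are illustrative assumptions.

```python
def regularized_inverse(h_response, beta=0.01):
    """Regularized inverse filter in the frequency domain:
    Hinv = conj(H) / (|H|^2 + beta).
    Where |H| is large, H * Hinv is close to 1 (flat equalized response);
    where H has a deep notch, the regularizer bounds the boost instead of
    letting the inverse gain explode."""
    return [z.conjugate() / (abs(z) ** 2 + beta) for z in h_response]

# Synthetic headphone response: roughly flat except for a deep notch in
# one frequency bin.
h = [1.0 + 0.0j, 0.9 + 0.1j, 0.01 + 0.0j, 1.1 - 0.2j]
h_inv = regularized_inverse(h)
equalized = [abs(a * b) for a, b in zip(h, h_inv)]
```

Without the regularizer, the notch bin would be boosted by a factor of 100; with it, the inverse gain stays bounded, at the cost of leaving the notch partly uncorrected.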
In the case of travel aids for the visually impaired, additional factors need to be considered in the design and choice of the headphone type. Most importantly, ears are essential to provide information about the environment, and visually impaired persons refuse to use headphones during navigation if these either partially or fully cover the ears, thereby blocking environmental noises. The results of a survey of the preferences of visually impaired subjects for a possible personal navigation device [151] showed indeed that the majority of participants rated headphones worn over the ears as the least acceptable output device, compared to other technologies such as bone-conduction and small tube-like headphones, or even a single headphone worn over one ear. Furthermore, fully blind participants had much stronger negative feelings about headphones that blocked ambient sounds than partially sighted ones.
This important consideration shifts our focus to alternative state-of-the-art solutions for spatial audio delivery such as unconventional headphone configurations, bone-conduction headsets, or active transparent headsets.
4.1. Unconventional Headphone Configurations
The problem of ear occlusion can be tackled by decentralizing the point of sound delivery from the entrance of the ear canal to positions around the ear, with one or more transducers per ear. In this case, issues arise regarding the proper direction and distance of each transducer with respect to the ear canal, as well as their types and dimensions. Furthermore, the spatial rendering technique itself is a challenge: no research results support the application of traditional loudspeaker-based spatial audio techniques (such as Vector Base Amplitude Panning [152] or Ambisonics [153]) to multispeaker headsets, and traditional HRTF measurements do not match decentralized speaker positions.
The first attempts at delivering spatial audio through multispeaker headphones were performed by König, who implemented a decentralized 4-channel arrangement placed on a pair of circumaural earcups for frontal surround sound reproduction [154] (an alternative small supra-aural configuration was also proposed [155]). Results showed that this speaker arrangement induces individual direction-dependent pinna cues as they appear in real frontal sound irradiation in the free field for frequencies above 1 kHz [156]. Psychoacoustic tests revealed that the headphone achieves frontal auditory events as well as effective distance perception [154].
The availability of individual pinna cues at the eardrum is imperative for accurate frontal localization [157]. Accordingly, Sunder et al. [158] later proposed the use of a 2-channel frontal projection headphone which customizes nonindividual HRTFs by introducing idiosyncratic pinna cues. Perceptual experiments validated the effectiveness of frontal headphone playback over conventional headphones, with reduced front-back confusions and improved frontal localization. It was also observed that the individual spectral cues created by the frontal projection are sufficient on their own for front-back discrimination, even with the high-frequency pinna cues removed from the nonindividual HRTF. However, additional transducers are needed if virtual sounds behind the head have to be delivered, and timbre differences with respect to the frontal transducers need to be resolved.
Greff and Katz [159] extended the above solutions to a multiple transducer array placed around each ear (8 speakers per ear), recreating the pinna-related component of the HRTF. Simulations and subjective evaluations showed that it is possible to excite the correct localization cues provided by the diffraction of the reconstructed wave front on the listener’s own pinnae, using transducer driving filters related to a simple spherical head model. Furthermore, different speaker configurations were investigated in a preliminary localization test, with the configuration placing transducers at grazing incidence all around the pinna showing the best results in terms of vertical localization accuracy and front/back confusion rate.
Recently, Bujacz et al. [160] proposed a custom headphone solution for a prospective ETA with four proximaural speakers positioned above and below the ears, all slightly to the front. Amplitude panning was then used as the spatial audio technique, shifting the power of the output sound between pairs of speakers both horizontally and vertically. Results of a preliminary localization test showed a localization accuracy comparable to HRTF-based rendering through high-quality circumaural headphones, both in azimuth and in elevation.
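Pairwise amplitude panning of this kind can be sketched with the classic stereophonic tangent law; the base angle, sign convention, and power normalization below are generic illustrations, not the parameters used in [160].

```python
import math

def tangent_pan(pan_deg, base_deg=45.0):
    """Power-normalized gains (left, right) from the tangent panning law,
    for a virtual source at pan_deg between two speakers at +/- base_deg
    (positive angles toward the right speaker). Satisfies
    (g_r - g_l) / (g_r + g_l) = tan(pan) / tan(base)."""
    t = math.tan(math.radians(pan_deg)) / math.tan(math.radians(base_deg))
    t = max(-1.0, min(1.0, t))  # clamp the source inside the speaker pair
    gl, gr = 1.0 - t, 1.0 + t
    norm = math.hypot(gl, gr)   # normalize so that gl^2 + gr^2 = 1
    return gl / norm, gr / norm

center = tangent_pan(0.0)       # equal gains (-3 dB each)
hard_right = tangent_pan(45.0)  # all power on the right speaker
```

The same principle extends to vertical speaker pairs by panning between the transducers above and below the ear.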
4.2. Bone-Conduction Headsets
The use of a binaural bone-conduction headset (also known as bonephones) is an extremely attractive solution for devices intended for the blind as the technology does not significantly interfere with sounds received through the ear canal, allowing for natural perception of environmental sounds. The typical solution is to place vibrational actuators, also referred to as bone-conduction transducers, on each mastoid (the raised portion of the temporal bone located directly behind the ear) or alternatively on the cheek bones just in front of the ears [161]. Pressure waves are sent through the bones in the skull to the cochlea, with some amount of natural sound leakage through air into the ear canals still occurring.
There are some difficulties in using bone conduction for delivering spatial audio. The first is the risk of crosstalk impeding an effective binaural separation: because of the high propagation speed and low attenuation of sound in the human skull, both the ITD and ILD cues are significantly softened. Walker et al. [162] still observed some degree of spatial separation with interaural cues provided through bone conduction and ear canals either free or occluded, especially relative to ILD. Perceived lateralization is even comparable between air conduction and bone conduction with unoccluded ear canals [163]. However, the degradation relative to standard headphones suggests that it is difficult to produce interaural differences large enough to simulate sound sources at extreme lateral locations [162].
The second problem is the need to introduce additional transfer functions for correct equalization of HRTF-based spatial audio: the frequency response of the transducer [164] and the transfer function to the bones themselves, referred to as the bone-conduction adjustment function (BAF) [165], which takes into account high-frequency attenuation by the skin [166] and differs between individuals, similar to HRTFs. Walker et al. [167, 168] proposed the use of appropriate bone-related transfer functions (BRTFs) as a replacement for HRTFs. Stanley [165] derived individual BAFs from equal-loudness judgements on pure tones, showing that individual BAF adjustments to HRTF-based spatial sound delivery were effective in restoring the spectral cues altered by the bone-conduction pathway. This allowed for effective localization in the median plane by reducing up/down reversals with respect to the BAF-uncompensated stimuli. However, there is currently no way to measure BAFs directly, and it is unclear whether the use of a generic, average BAF would lead to the same conclusions.
MacDonald et al. [164] reported similar localization results in the horizontal plane between bone conduction and air conduction, using individual HRTFs as the virtual auditory display and headphone frequency response compensation. Lindeman et al. [169, 170] compared localization accuracy between bone conduction with unoccluded ear canals and an array of speakers located around the listener. The results showed that although the best accuracy was achieved with the speaker array in the case of stationary sounds, there was no difference in accuracy between the speaker array and the bone-conduction device for sounds that were moving, and that both devices outperformed standard headphones for moving sounds.
Finally, Barde et al. [171] recently investigated the minimum discernable angle difference in the horizontal plane with nonindividual HRTFs over a bone-conduction headset, resulting in an average value of 10°. Interestingly, almost all participants reported actual sound externalization.
4.3. Active Transparent Headsets
An active headset is able to detect and process environmental sounds through analog circuits or digital signal processing. One of the most important fields of application of active headsets is noise reduction, where the headset uses active noise control [172, 173] to reduce unwanted sound by the addition of an antiphase signal to the output sound. In the case of ETAs, the environmental signal should not be canceled but provided back to the listener (hear-through signal) mixed with the virtual auditory display signal in order for the subject to be aware of the surroundings. Binaural hear-through headsets (in-ear headphones with integrated microphones) are typically used in augmented reality audio (ARA) applications [174], where a combination of real and virtual auditory objects in a real environment is needed [175].
The hear-through signal is a processed version of the environmental sound and should produce auditory perception similar to natural perception with unoccluded ears. Equalization is thus needed to make the headset acoustically transparent, since it affects the acoustic properties of the outer ear [176]. The most important problem here is a poor fit on the head, which causes leakage and affects both isolation and frequency response. Using internal microphones inside the headset in addition to the external ones, controlled adaptive equalization can be realized [177].
The second basic requirement for a hear-through system is that processing of the recorded sound should have minimal latency [175]. When the real signal (leaked to the eardrum) is summed with the delayed hear-through signal, audible comb-filtering effects can arise, especially at lower frequencies where leakage is higher. The audibility of comb-filtering depends on both the time and amplitude difference between the hear-through signal and the leaked signal [178]. Digital realizations, which are preferable to analog circuits for an ETA in terms of both cost and size, can achieve latencies below 1.4 ms with a DSP board; at such latencies, the comb-filtering effect was found to be inaudible when the attenuation of the headset is 20 dB or more [179].
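The quoted audibility condition can be checked with a simple worst-case computation: summing a unit-gain hear-through signal with a leaked copy attenuated by A dB yields spectral peaks and notches of 20·log10(1 ± 10^(−A/20)) dB, with notches spaced 1/latency apart in frequency. The figures below only illustrate the 20 dB / 1.4 ms case.

```python
import math

def comb_ripple_db(attenuation_db):
    """Worst-case spectral peak and notch (dB) when a unit-gain
    hear-through signal is summed with a leaked copy attenuated by
    attenuation_db, for the most and least favorable phase alignments."""
    g = 10.0 ** (-attenuation_db / 20.0)
    peak = 20.0 * math.log10(1.0 + g)
    notch = 20.0 * math.log10(1.0 - g)
    return peak, notch

def notch_spacing_hz(latency_s):
    """Frequency spacing between successive comb-filter notches."""
    return 1.0 / latency_s

# With 20 dB of passive attenuation the ripple stays within about 1 dB,
# and a 1.4 ms latency spaces the notches roughly 714 Hz apart.
peak, notch = comb_ripple_db(20.0)
spacing = notch_spacing_hz(0.0014)
```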
Finally, the hear-through signal should preserve localization cues at the ear canal entrance. Since sound transmission from the microphone to the eardrum is independent of direction whether the microphone is inside or at most 6 mm outside the ear canal [180], having binaural microphones just outside the ear canal entrance is sufficient for obtaining the correct listener-dependent spatial information.
5. Spatial Audio in ETAs
From the multitude of ETAs, two main trends in selecting sound cues can be observed, one to provide very limited yet easily interpretable data, typically from a range sensor, and the other to provide an overabundance of auditory data and let the user learn to extract useful information from it (e.g., the vOICe [181]). A third approach, taken for instance by the authors in the Sound of Vision project [15], is to limit the data from a full-scene representation to just the most useful information, for example, by segmenting the environment and identifying the nearest obstacles or detecting special dangerous scene elements such as stairs. Surveys show that individual preferences among the blind can vary greatly, and all three approaches have users that prefer them [182].
In a recent literature review, Bujacz and Strumiłło [6] classified the auditory display solutions implemented in the most widely known ETAs, either commercially available or in various stages of research and development. Of the 22 considered ETAs, 12 use a spatial representation of the environment. However, breaking the list of ETAs down into obstacle detectors (mostly hand-held) and environmental imagers (mostly head-mounted), ETAs that use a spatial representation almost all belong to the second category. Some of them, such as the vOICe [181], Navbelt [183], SVETA [184], and AudioGuider [185], use stereo panning to represent directions, whereas elevation information is either ignored or coded into sound pitch. We now summarize ETAs that use HRTFs as the spatial rendering method, including works not covered in the above cited review. All of the systems presented in the following are laboratory prototypes.
5.1. Available ETAs Using HRTFs
The EAV (Espacio Acustico Virtual) system [186] uses stereoscopic cameras to create a low resolution (16 × 16 × 16) 3D stereopixel map of the environment in front of the user. Each occupied stereopixel becomes a virtual sound source filtered with the user’s individual HRTFs, measured in a reverberating environment. The sonification technique employs spatial audio cues (synthesized with HRTFs) and a distance-to-loudness encoding. Sounds were presented through a pair of individually equalized Sennheiser HD-580 circumaural headphones. Classic localization tests with the above virtual auditory display and tests with multiple sources were performed on 6 blind and 6 normally sighted subjects. Subjects were accurate in identifying the objects’ position and recognizing shapes and dimensions within the limits imposed by the system’s resolution.
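Distance-to-loudness encodings of this kind typically follow an inverse-distance amplitude law (6 dB of attenuation per doubling of distance); the sketch below is a generic illustration, with the reference distance and gain floor assumed rather than taken from the EAV system.

```python
def distance_to_gain(distance_m, ref_distance_m=1.0, min_gain=0.01):
    """Inverse-distance (1/r) amplitude law for distance sonification:
    a source at or inside the reference distance plays at full gain, and
    each doubling of distance beyond it costs 6 dB, down to a gain floor
    that keeps far obstacles audible."""
    if distance_m <= ref_distance_m:
        return 1.0
    return max(min_gain, ref_distance_m / distance_m)

near = distance_to_gain(1.0)  # full gain at the reference distance
far = distance_to_gain(4.0)   # 0.25, i.e. -12 dB at four times the distance
```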
The cross-modal ETA device [187] is a wearable prototype that consists of low-cost hardware: earphones (no further information provided), sunglasses fitted with two CMOS micro cameras, and a palm-top computer. The system is able to detect the light spot produced by a laser pointer, compute its angular position and depth, and generate a sound corresponding to the position and distance of the pointed surface. The sonification encoding uses directional auditory cues provided through Brown and Duda’s structural HRTF model [37], and distance cues through loudness control and reverberation effects. The subjective effectiveness of the sonification technique was evaluated by several volunteers who were asked to use the system and report their opinions. The overall result was satisfactory, with some problems related to the lack of elevation perception. Very high and very low targets were perceived correctly, whereas those lying in the middle were associated with wrong elevations.
The Personal Guidance System [12] receives information from a GPS receiver and was evaluated in five different types of configurations involving different types of auditory displays, spatial sound delivery methods (either via classic headphones or through a speaker worn on the shoulder), and tracker locations. No details about the binaural spatialization engine or the headphones used were provided. Fifteen visually impaired subjects traveled a 50 m long pathway with each of the 5 configurations. Results showed that the configuration using binaurally spatialized virtual speech led to the shortest travel times and highest subjective ratings. However, there were many negative comments about the headphones blocking environmental sounds.
The SWAN system [8, 188] aids navigation and guidance through a set of navigation beacons (earcon-like sounds), object-related sounds (provided through spatial auditory icons), location information, and brief prerecorded speech samples. Sounds are updated in real-time by tracking the subject’s orientation and accordingly spatialized through nonindividual HRTFs. Sounds were played either through a pair of Sony MDR-7506 closed-ear headphones or an equalized bone-conduction headset (see [165]). In an experimental procedure, 108 sighted subjects were required to navigate three different maps. Results showed good navigation skills for almost all the participants in both time and path efficiency.
The main idea of the Virtual Reality Simulator for visually impaired people [189] is to calculate the distance between the user and nearby objects (a depth map) and convert it into sound. The depth map is transformed into a spatial auditory map by using 3D sound cues synthesized with individually measured HRTFs from 1003 positions in the frontal field. Sounds were provided through a standard pair of stereophonic headphones (no further information provided). The Virtual Reality Simulator proved to be helpful for visually impaired people in different research experiments performed indoors and outdoors, in virtual and real-life situations. Among the main limitations of the simulator are tracking accuracy and the lack of a real-time HRTF convolver.
The Real-Time Assistance Prototype [190], an evolution of the CASBliP prototype [191], encodes objects’ position in space based on their distance (inversely proportional to sound frequency), direction (3D binaural sounds synthesized with nonindividual HRTFs), and speed (proportional to pitch variation). Nonindividual HRTFs of a KEMAR mannequin were measured for different spatial points in a 64° azimuth range, a 30° elevation range, and a 15 m distance range. Sounds were provided through a pair of SONY MDR-EX75SL in-ear headphones. Two experiments were performed with four totally blind subjects, one requiring subjects to identify the sound direction and the other to detect and follow the position of a moving source. Despite encouraging results both in static conditions and with objects moving in the detected area, the prototype’s main limitations reside in the inability to detect objects at ground level and in the reduced 64° field of view.
The NAVITON system [192, 193] processes stereo images to segment out key elements for auditory presentation. For each segmented element, the sonification approach uses discrete pitched sounds, whose pitch, loudness, and temporal delay (depth scanning) depend on object distance, and whose duration is proportional to the depth of the object. Sounds are spatialized with individual HRTFs, custom measured in the full azimuth range and in the vertical plane from −54° to 90°, in 5° steps. Sounds were provided through high-quality open-air reference headphones without headphone compensation. Ten blindfolded participants reported their auditory perception of the sonified virtual 3D scenes in a virtual reality trial, proving to be capable of grasping the general spatial structure of the environment and accurately estimating scene layouts. A real-world navigation scenario was also tested with 5 blind and 5 blindfolded volunteers, who could accurately estimate the spatial position of single obstacles or pairs of obstacles and walk through simple obstacle courses.
The NAVIG (Navigation Assisted by Artificial VIsion and GNSS) system [194, 195] aims to enhance mobility and orientation, navigation, object localization, and grasping, both indoors and outdoors. It uses a Global Navigation Satellite System (GNSS) and a rapid visual recognition algorithm. Navigation is ensured by real-time nonindividual HRTF-based rendering, text-to-speech, and semantic sonification metaphors that provide information about the trajectory, position, and the important landmarks in the environment. The 3D audio scenes are conveyed through a bone-conduction headset whose complex frequency response is equalized in order to properly render all the spectral cues of the HRTF. Preliminary experiments have shown that it is possible to design a wearable device that can provide fully analyzed information to the user. However, thorough evaluations of the NAVIG prototype have not been published yet.
5.2. Discussion and Conclusions
The use of HRTFs to code directional information in the above summarized ETAs suggests the importance of a high-fidelity spatial auditory representation of the environment for blind users. However, most of the above works fail to address the hardware- and/or software-related aspects we discussed in Sections 3 and 4, presenting results of performance and usability tests that are based on binaural audio rendering setups that either are ideal yet unrealistic (e.g., [186]) or underestimate the potential of spatial sound itself (e.g., [190]).
As a matter of fact, the preferred choice for the virtual auditory display within the 8 listed ETAs is either individually measured HRTFs or nonindividual, generic HRTFs. Only the cross-modal ETA [187] proposes the use of structural HRTF modeling as a trade-off between localization accuracy and measurement cost. As a result, the evaluation of these systems (often performed through proper localization performance tests) is based either on the best scoring yet unfeasible solution (individually measured HRTFs) or on a costless yet inaccurate one (generic HRTFs), overlooking important aspects in the fidelity of the virtual auditory display such as elevation accuracy and front/back confusion avoidance. Furthermore, the aforementioned monaural localization ability by visually impaired persons (especially early blind) suggests the use of individual pinna cues for azimuth perception, which would make a visually impaired person more vulnerable to degraded localization from nonindividual HRTFs than a sighted person.
Even more unfortunately, the headphones chosen for these tests were in the majority of cases classic circumaural or in-ear headphones that block environmental sounds and thus, as discussed before, are not acceptable for the visually impaired community. The use of a bone-conduction headset is reported only for the SWAN and NAVIG systems [188, 194], where the importance of headphone equalization, although forced to be nonindividual, is also stressed. None of the remaining works, except one [186], even mentions headphone equalization. Effective externalization of the virtual sounds provided to the users is therefore questionable.
It is difficult to rank the importance of the various factors influencing a satisfactory virtual acoustic experience (e.g., externalization, localization accuracy, and front-back confusion rate). Most studies check for only one or two factors and can confirm their influence on one or more spatial sound perception parameters. Besides the choice of HRTF set, headphone type and equalization, and sound source type (frequency content, familiar/unfamiliar sound, and temporal aspects) [16, 44, 196], other important factors have to be considered. For instance, as explained in Section 2.4, rendering environmental reflections increases externalization, as does the use of a proper head-tracking method, which also helps in resolving front/back confusion [95]. This may be why most of the above cited studies chose to use high-quality headphones with generic or individual HRTFs, without applying headphone equalization, as long as head-tracking or real-time obstacle tracking is implemented. It is also worth noting that systems that use head-mounted cameras to render sounds at locations relative to current head orientation do not strictly require head-tracking to work dynamically [197].
We believe there is ample space for applying the technologies presented in this review paper to the case of ETAs for the blind. Basic research in HRTF customization techniques is currently in a prolific stage, thanks to advances in computational power and the widespread availability of technologies such as 3D scanning and printing, which allow researchers to investigate in detail the relation between individual anthropometry and HRTFs. Although a full and thorough understanding of the mechanisms involved in spatial sound perception has yet to be reached, techniques such as HRTF selection, structural HRTF modeling, and numerical HRTF simulation are expected to progressively bridge the gap between accessibility and accuracy of individual binaural audio.
Still, it has to be noted that many experiments have shown that training with nonindividual HRTFs, especially through cross-modal and game-based training methods, can significantly reduce localization errors in both free-field and virtual listening conditions [198]. Feedback can be provided through visual stimuli [199, 200], proprioceptive cues [201, 202], or haptic information [203]. Reductions in front-back confusion rates as large as 40% were reported, as well as improvements in sound localization accuracy in the horizontal and vertical planes regardless of head movement.
On the other hand, the headphone technologies discussed in Section 4 are expected to reach widespread popularity in the blind community. Bone-conduction and active headsets are growing in the consumer market thanks to their affordable price. External multispeaker headsets are still at a prototype stage but, from a research point of view, open the attractive possibility of introducing individualized binaural playback without the need for fully individual HRTFs. Efforts in the design of such headphones have been produced within the Sound of Vision project [160].
A final comment regards the cosmetic acceptability of the playback device. While bone-conduction and binaural headsets are relatively discreet and portable, external multispeaker headsets may require a bulky and unconventional design. Assessments of the cosmetic acceptability of a wearable electronic device vary considerably within the blind community, even when the device works well. Nevertheless, the visually impaired participants in the survey by Golledge et al. [151] showed overwhelming support for the idea of traveling more often with such a device, independently of its appearance.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement no. 643636.
References
SimpsonB. D.BrungartD. S.DallmanR. C.JoffrionJ.PresnarM. D.GilkeyR. H.Spatial audio as a navigation aid and attitude indicator49Proceedings of the 49th Annual Meeting of the Human Factors and Ergonomics Society, HFES '05September 2005160216062-s2.0-44349167644AvanziniF.SpagnolS.RodáA.De GötzenA.FraninovicK.SerafinS.Designing interactive sound for motor rehabilitation tasks2013Massachusetts, Mass, USAMIT Press273283chapter 12DakopoulosD.BourbakisN. G.Wearable obstacle avoidance electronic travel aids for blind: a survey2010401253510.1109/tsmcc.2009.20212552-s2.0-73049090982BhowmickA.HazarikaS. M.An insight into assistive technology for the visually impaired and blind people: State-of-the-art and future trends201720172124AhlmarkD. I.2016Lulea, SwedenLulea University of TechnologyBujaczM.StrumiłłoP.Sonification: review of auditory display solutions in electronic travel aids for the blind201641340141410.1515/aoa-2016-00402-s2.0-84996618854CsapóÁ.WersényiG.NagyH.StockmanT.A survey of assistive technologies and applications for blind users on mobile platforms: a review and foundation for research2015942752862-s2.0-8494786587710.1007/s12193-015-0182-7WalkerB. N.LindsayJ.Navigation performance with a virtual auditory display: effects of beacon sound, capture radius, and practice200648226527810.1518/0018720067777245072-s2.0-33745966449BrungartD. S.Near-field virtual audio displays2002111931062-s2.0-003648761110.1162/105474602317343686CsapóÁ.WersényiG.Overview of auditory representations in human-machine interfaces201346212310.1145/2543581.2543586KristjánssonÁ.MoldoveanuA.JóhannessonÓ. I.BalanO.SpagnolS.ValgeirsdóttirV. V.UnnthorssonR.Designing sensory-substitution devices: principles, pitfalls and potential201634576978710.3233/RNN-1606472-s2.0-84989204995LoomisJ. M.MarstonJ. R.GolledgeR. G.KlatzkyR. 
L.Personal guidance system for people with visual impairment: a comparison of spatial displays for route guidance20059942192322-s2.0-17644361899SkulimowskiP.KorbelP.WawrzyniakP.POI explorer - A sonified mobile application aiding the visually impaired in urban navigationProceedings of the 2014 Federated Conference on Computer Science and Information Systems, FedCSIS '14September 201496997610.15439/2014F2932-s2.0-84912117336GarciaA.FinomoreV.BurnettG.CalvoA.BaldwinC.BrillC.Evaluation of multimodal displays for waypoint navigationProceedings of the 2012 IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support, CogSIMA '12March 2012Louisiana, La, USAIEEE13413710.1109/CogSIMA.2012.61883652-s2.0-84861121879StrumiłłoP.BujaczM.BaranskiP.PissalouxE.VelazquezR.Different approaches to aiding blind persons in mobility and navigation in the Naviton and Sound of Vision projects2018Cham, SwitzerlandSpringer International Publishing435468BlauertJ.19962ndMassachusetts, Mass, USAMIT PressSpagnolS.2012Padova, ItalyUniversity of PadovaStruttJ. W.On our perception of sound direction1907131907214232KuhnG. F.Model for the interaural time differences in the azimuthal plane19776211571672-s2.0-001737808910.1121/1.381498BrungartD. S.DurlachN. I.RabinowitzW. M.Auditory localization of nearby sources. II. Localization of a broadband source199910641956196810.1121/1.4279432-s2.0-0032876793WilskaA.2010Helsinki, FinlandUniversity of HelsinkiHebrankJ.WrightD.Spectral cues used in the localization of sound sources on the median plane1974566182918342-s2.0-001631343610.1121/1.1903520AsanoF.SuzukiY.SoneT.Role of spectral cues in median plane localization19908811591682-s2.0-002533520110.1121/1.399963GardnerM. B.GardnerR. 
S.Problem of localization in the median plane: effect of pinnae cavity occlusion197353240040810.1121/1.19133362-s2.0-0015575571MorimotoM.The contribution of two ears to the perception of vertical angle in sagittal planes20011094159616032-s2.0-003507253410.1121/1.1352084HebrankJ.WrightD.Are two ears necessary for localization of sound sources on the median plane?19745639359382-s2.0-001609965810.1121/1.1903351ShawE. A. G.TeranishiR.Sound pressure generated in an external-ear replica and real human ears by a nearby point source196844124024910.1121/1.19110592-s2.0-0014314453SpagnolS.HiipakkaM.PulkkiV.A single-azimuth Pinna-Related Transfer Function databaseProceedings of the 14th International Conference on Digital Audio Effects, DAFx '11September 2011209212Lopez-PovedaE. A.MeddisR.A physical model of sound diffraction and reflections in the human concha19961005324832592-s2.0-002990753610.1121/1.417208BrungartD. S.RabinowitzW. M.Auditory localization of nearby sources. Head-related transfer functions199910631465147910.1121/1.4271802-s2.0-0032861877SpagnolS.On distance dependence of pinna spectral patterns in head-related transfer functions20151371EL58EL642-s2.0-8491968306010.1121/1.4903919ShawE. A. G.GilkeyR. H.AndersonT. R.Acoustical features of human ear1997New Jersey, NJ, USALawrence Erlbaum Associates2547MooreB. C. J.OldfieldS. R.DooleyG. J.Detection and discrimination of spectral peaks and notches at 1 and 8 khz198985282083610.1121/1.3975542-s2.0-0024571273IidaK.ItohM.ItagakiA.MorimotoM.Median plane localization using a parametric model of the head-related transfer function based on spectral cues20076888358502-s2.0-3424756277210.1016/j.apacoust.2006.07.016GreffR.KatzB. F. 
G.Perceptual evaluation of HRTF notches versus peaks for vertical localisationProceedings of the 19th International Congress on Acoustics2007IidaK.IshiiY.Roles of spectral peaks and notches in the head-related transfer functions in the upper median plane for vertical localization201614042957295710.1121/1.4969132BrownC. P.DudaR. O.A structural model for binaural sound synthesis19986547648810.1109/89.7096732-s2.0-0032163505AlgaziV. R.AvendanoC.DudaR. O.Elevation localization and head-related transfer function analysis at low frequencies20011093111011222-s2.0-003510345510.1121/1.1349185KirkebyO.SeppalaE. T.KarkkainenA.KarkkainenL.HuttunenT.Some effects of the torso on head-related transfer functionsProceedings of the 122nd Audio Engineering Society ConventionMay 200710451052ZahorikP.BrungartD. S.BronkhorstA. W.Auditory distance perception in humans: a summary of past and present research20059134094202-s2.0-19544369514BegaultD. R.1994Massachusetts, Mass, USAAcademic Press Professional, Inc.GardnerM. B.Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space1969451475310.1121/1.19113722-s2.0-0014455498MershonD. H.BowersJ. N.Absolute and relative cues for the auditory perception of egocentric distance1979833113222-s2.0-001876443810.1068/p080311BrungartD. S.Auditory localization of nearby sources. III. Stimulus effects19991066358936022-s2.0-003280335110.1121/1.428212WightmanF. L.KistlerD. J.Resolution of front-back ambiguity in spatial hearing by listener and source movement19991055284128532-s2.0-003294185810.1121/1.426899ThurlowW. R.RungeP. S.Effect of induced head movements on localization of direction of sounds196742248048810.1121/1.19106042-s2.0-0014121330SpeigleJ. M.LoomisJ. M.Auditory distance perception by translating observersProceedings of the IEEE Research Properties in Virtual Reality Symposium1993California, Calif, USAIEEE929910.1109/VRAIS.1993.378257RebillatM.BoutillonX.CorteelE. T.KatzB. F. 
G.Audio, visual, and audio-visual egocentric distance perception by moving subjects in virtual environments201294, article no. 192-s2.0-8487850328510.1145/2355598.2355602HartmannW. M.WittenbergA.On the externalization of sound images1996996367836882-s2.0-002999825910.1121/1.414965PlengeG.On the differences between localization and lateralization19745639449512-s2.0-001609960110.1121/1.1903353WightmanF. L.KistlerD. J.Headphone simulation of free-field listening. II: psychophysical validation198985286887810.1121/1.3975582-s2.0-0024499371SimonL. S. R.ZacharovN.KatzB. F. G.Perceptual attributes for the comparison of head-related transfer functions20161405362336322-s2.0-8499545066210.1121/1.4966115BegaultD. R.WenzelE. M.Headphone localization of speech19933523613762-s2.0-0027611203834929210.1177/001872089303500210WersényiG.Effect of emulated head-tracking for reducing localization errors in virtual audio simulation20091722472522-s2.0-7035046154510.1109/TASL.2008.2006720HendrickxE.StittP.MessonnierJ.LyzwaJ.KatzB. F.de BoishéraudC.Influence of head tracking on the externalization of speech stimuli for non-individualized binaural synthesis201714132011202310.1121/1.4978612VälimäkiV.ParkerJ. D.SaviojaL.SmithJ. O.AbelJ. S.Fifty years of artificial reverberation2012205142114482-s2.0-8487112569910.1109/TASL.2012.2189567YuanY.XieL.FuZ.-H.XuM.CongQ.Sound image externalization for headphone based real-time 3D audio20171134194282-s2.0-8501849290310.1007/s11704-016-6182-2WernerS.GötzG.KleinF.Influence of head tracking on the externalization of auditory events at divergence between synthesized and listening room using a binaural headphone systemProceedings of the 142nd Audio Engineering Society International Convention2017BenedettiL. H.LoebM.A comparison of auditory monitoring performance in blind subjects with that of sighted subjects in light and dark197211110162-s2.0-004081145310.3758/BF03212675StarlingerI.NiemeyerW.Do the blind hear better? 
investigations on auditory processing in congenital or early acquired blindness I. Peripheral functions19812065035092-s2.0-001941018110.3109/00206098109072718BrossM.BorensteinM.Temporal auditory acuity in blind and sighted subjects: a signal detection analysis198255396396610.2466/pms.1982.55.3.9632-s2.0-0020405516LaiH.-H.ChenY.-C.A study on the blind's sensory ability20063665655702-s2.0-3364676352810.1016/j.ergon.2006.01.015NiemeyerW.StarlingerI.Do the blind hear better? investigations on auditory processing in congenital or early acquired blindness II. Central functions19812065105152-s2.0-001941573610.3109/00206098109072719MuchnikC.EfratiM.NemethE.MalinM.HildesheimerM.Central auditory skills in blind and sighted subjects1991201192310.3109/010503991090707852-s2.0-0026085601RauscheckerJ. P.Compensatory plasticity and sensory substitution in the cerebral cortex199518136432-s2.0-002885442310.1016/0166-2236(95)93948-WLessardN.ParéM.LeporeF.LassondeM.Early-blind human subjects localize sound sources better than sighted subjects199839566992782802-s2.0-003254157810.1038/26228ZwiersM. P.Van OpstalA. J.CruysbergJ. R.A spatial hearing deficit in early-blind humans2001219152-s2.0-0035344027OhuchiM.IwayaY.SuzukiY.MunekataT.A comparative study of sound localization acuity of congenital blind and sighted people20062752902932-s2.0-3374830745510.1250/ast.27.290KolarikA. J.PardhanS.CirsteaS.MooreB. C. J.Auditory spatial representations of the world are compressed in blind humans201723525976062-s2.0-8499479505710.1007/s00221-016-4823-1RiceC. E.Human echo perception196715537636566642-s2.0-001419777810.1126/science.155.3763.656VossP.TabryV.ZatorreR. J.Trade-off in the sound localization abilities of early blind individuals between the horizontal and vertical planes20153515605160562-s2.0-8492925751010.1523/JNEUROSCI.4544-14.2015RöderB.Teder-SalejarviW.SterrA.RoslerF.HillyardS. A.NevilleH. 
J.Improved auditory spatial tuning in blind humans1999400674016216610.1038/221062-s2.0-0345211544CrattyB. J.ThomasC. C.1971Illinois, Ill, USASpringfieldDufourA.DesprésO.CandasV.Enhanced sensitivity to echo cues in blind subjects200516545155192-s2.0-2404450166510.1007/s00221-005-2329-3BassettI. G.EastmondE. J.Echolocation: measurement of pitch versus distance for sounds reflected from a flat surface196436591191610.1121/1.19191172-s2.0-0001666629RosenblumL. D.GordonM. S.JarquinL.Echolocating distance by moving and stationary listeners200012318120610.1207/S15326969ECO1203_12-s2.0-0034422059KelloggW. N.Sonar System of the Blind196213735283994042-s2.0-000014725310.1126/science.137.3528.399MiuraT.MuraokaT.IfukubeT.Comparison of obstacle sense ability between the blind and the sighted: A basic psychophysical study for designs of acoustic assistive devices20103121371472-s2.0-7795116936510.1250/ast.31.137RiceC. E.FeinsteinS. H.Sonar system of the blind: size discrimination1965381481107110810.1126/science.148.3673.11072-s2.0-0007027156StoffregenT. A.PittengerJ. B.Human echolocation as a basic form of perception and action19957318121610.1207/s15326969eco0703_22-s2.0-77951200596WersényiG.Virtual localization by blind persons2012607-85685792-s2.0-84865958639DobruckiA.PlaskotaP.PruchnickiP.PecM.BujaczM.StrumiłłoP.Measurement system for personalized head-related transfer functions and its verification by virtual source localization trials with visually impaired and sighted individuals20105897247382-s2.0-77958476469AfonsoA.BlumA.KatzB. F. G.TarrouxP. E.BorstG.DenisM.Structural properties of spatial representations in blind people: scanning images constructed from haptic exploration or from locomotion in a 3-D audio virtual environment201038559160410.3758/MC.38.5.5912-s2.0-77956458824PicinaliL.AfonsoA.DenisM.KatzB. F. 
G.Exploration of architectural spaces by blind people using auditory virtual reality for the construction of spatial knowledge20147243934072-s2.0-8489274199710.1016/j.ijhcs.2013.12.008CoboA.GuerrónN. E.MartínC.del PozoF.SerranoJ. J.Differences between blind people's cognitive maps after proximity and distant exploration of virtual environments2017771229430810.1016/j.chb.2017.09.0072-s2.0-85029378444PapadopoulosK.KoustriavaE.BaroutiM.Cognitive maps of individuals with blindness for familiar and unfamiliar spaces: construction through audio-tactile maps and walked experience2017751037638410.1016/j.chb.2017.04.0572-s2.0-85019710823ChengC. I.WakefieldG. H.Introduction to head-related transfer functions (HRTFs): Representations of HRTFs in time, frequency, and space20014942312492-s2.0-0035304034BlauertJ.2013New York, NY, USASpringerBronkhorstA. W.Localization of real and virtual sound sources1995985254225532-s2.0-002879327310.1121/1.413219WersényiG.Localization in a head-related transfer function-based virtual audio synthesis using additional high-pass and low-pass filtering of sound sources200728424425010.1250/ast.28.2442-s2.0-34547336513BurkhardM. D.SachsR. M.Anthropometric manikin for acoustic research19755812142222-s2.0-001652274810.1121/1.380648AndreopoulouA.BegaultD. R.KatzB. F. G.Inter-laboratory round robin HRTF measurement comparison20159589590610.1109/JSTSP.2015.24004172-s2.0-84937039947AlgaziV. R.DudaR. O.ThompsonD. M.AvendanoC.The CIPIC HRTF databaseProceedings of the IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics2001New York, NY, USAIEEE9910210.1109/ASPAA.2001.969552WenzelE. M.ArrudaM.KistlerD. J.WightmanF. L.Localization using nonindividualized head-related transfer functions19939411111232-s2.0-002717008510.1121/1.407089BegaultD. R.WenzelE. M.AndersonM. 
R.Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source200149109049162-s2.0-0035483479MøllerH.SørensenM. F.JensenC. B.HammershøiD.Binaural technique: do we need individual recordings?19964464514642-s2.0-0030165624GeronazzoM.SpagnolS.AvanziniF.A modular framework for the analysis and synthesis of head-related transfer functionsProceedings of the 134th AES Convention - Audio Engineering Society2013SeeberB. U.FastlH.Subjective selection of non-individual head-related transfer functionsProceedings of the International Conference on Auditory Display (ICAD '03)2003259262ZotkinD. N.DuraiswamiR.DavisL. S.Rendering localized spatial audio in a virtual auditory space2004645535642-s2.0-324277295010.1109/TMM.2004.827516GeronazzoM.SpagnolS.BedinA.AvanziniF.Enhancing vertical localization with image-guided selection of non-individual head-related transfer functionsProceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '14May 201444964500IidaK.IshiiY.NishiokaS.Personalization of head-related transfer functions in the median plane based on the anthropometry of the listener's pinnae201413613173332-s2.0-8490397604410.1121/1.4880856SoR. H. Y.NganB.HornerA.BraaschJ.BlauertJ.LeungK. L.Toward orthogonal non-individualised head-related transfer functions for forward and backward directional sound: Cluster analysis and an experimental study20105367677812-s2.0-7795265233110.1080/00140131003675117KatzB. F. G.ParseihianG.Perceptually based head-related transfer function database optimization20121312EL99EL1052-s2.0-8485741671410.1121/1.3672641HwangS.ParkY.ParkY.-S.Modeling and customization of head-related impulse responses based on general basis functions in time domain20089469659802-s2.0-5914910519410.3813/AAA.918113ShinK. 
H.ParkY.Enhanced vertical perception through head-related impulse response customization based on pinna response tuning in the median plane2008E91-A13453562-s2.0-4184909649110.1093/ietfec/e91-a.1.345RabinowitzW. M.MaxwellJ.ShaoY.WeiM.Sound localization cues for a magnified head: implications from sound diffraction about a rigid sphere19932212512910.1162/pres.1993.2.2.125SpagnolS.TavazziE.AvanziniF.Distance rendering and perception of nearby virtual sound sources with a near-field filter model201711561732-s2.0-8498344609710.1016/j.apacoust.2016.08.015MokhtariP.TakemotoH.NishimuraR.KatoH.Acoustic simulation of KEMAR’s HRTFs: verification with measurements and the effects of modifying head shape and pinna concavityProceeding of the Interenational Workshop Principles Apply Spatial Hearing (IWPASH '09)2009DudaR. O.AvendanoC.AlgaziV. R.An adaptable ellipsoidal head model for the interaural time differenceProceedings of the IEEE International Conference on Acoustics, Speech, and Signal ProcessingMarch 1999965968BomhardtR.LinsM.FelsJ.Analytical ellipsoidal model of interaural time differences for the individualization of head-related impulse responses201664118828942-s2.0-8501168123710.17743/jaes.2016.0041KatzB. F. G.NoisternigM.A comparative study of interaural time delay estimation methods20141356353035402-s2.0-8490321293910.1121/1.4875714AndreopoulouA.KatzB. F.Identification of perceptually relevant methods of inter-aural time difference estimation2017142258859810.1121/1.4996457AlgaziV. R.DudaR. O.ThompsonD. M.The use of head-and-torso models for improved spatial sound synthesis inProceedings of the 113th Convention of the Audio Engineering Society2002118AlgaziV. R.DudaR. O.DuraiswamiR.GumerovN. 
A.TangZ.Approximating the head-related transfer function using simple geometric models of the head and torso200211252053206410.1121/1.15087802-s2.0-0036840525MeshramA.MehraR.YangH.DunnE.FranmJ.-M.ManochaD.P-HRTF: Efficient personalized HRTF computation for high-fidelity spatial soundProceedings of the 13th IEEE International Symposium on Mixed and Augmented Reality, ISMAR '14September 2014Munich, GermanyIEEE536110.1109/ISMAR.2014.69484092-s2.0-84945117975AlgaziV.DudaR.MorrisonR.ThompsonD.Structural composition and decomposition of HRTFsProceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics2001New York, NY, USAIEEE10310610.1109/ASPAA.2001.969553AlgaziV. R.AvendanoC.DudaR. O.Estimation of a spherical-head model from anthropometry20014964724792-s2.0-0035358512SpagnolS.AvanziniF.Anthropometric tuning of a spherical head model for binaural virtual acoustics based on interaural level differencesProceedings of the 21st International Conference on Auditory Display ICAD '152015204209AussalM.AlougesF.KatzB. F.HRTF interpolation and ITD personalization for binaural synthesis using spherical harmonicsProceedings of the 25th UK Conference Audio Engineering Society2012WatkinsA. J.Psychoacoustical aspects of synthesized vertical locale cues1978634115211652-s2.0-001819074410.1121/1.381823TeranishiR.ShawE. A. G.External-ear acoustic models with simple geometry196844125726310.1121/1.19110612-s2.0-0014314578ShawE. A. G.StudebakerG. A.HochbergI.The acoustics of the external ear1980Maryland, Md, USAUniversity Park Press109125SatarzadehP.AlgaziR. V.DudaR. O.Physical and filter pinna models based on anthropometryProceedings of the 122nd Audio Engineering Society Convention '07May 2007718737FallerK. 
J.BarretoA.AdjouadiM.Augmented Hankel total least-squares decomposition of head-related transfer functions2010581-23212-s2.0-77249143378SpagnolS.GeronazzoM.AvanziniF.On the relation between pinna reflection patterns and head-related transfer function features20132135085202-s2.0-8487219250710.1109/TASL.2012.2227730LiL.HuangQ.HRTF personalization modeling based on RBF neural networkProceedings of the 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '13May 2013Vancouver, BC, CanadaIEEE3707371010.1109/ICASSP.2013.66383502-s2.0-84890530858HuangQ.LiL.Modeling individual HRTF tensor using high-order partial least squares201458114GrijalvaF.MartiniL.GoldensteinS.FlorencioD.Anthropometric-based customization of head-related transfer functions using Isomap in the horizontal planeProceedings of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '14May 2014Firenze, ItalyIEEE4506451010.1109/ICASSP.2014.68544482-s2.0-84905248159BilinskiP.AhrensJ.ThomasM. R. P.TashevI. J.PlattJ. C.HRTF magnitude synthesis via sparse representation of anthropometric featuresProceedings of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '14May 2014Firenze, ItalyIEEE45014505MiddlebrooksJ. C.Individual differences in external-ear transfer functions reduced by scaling in frequency199910631480149210.1121/1.4271762-s2.0-0032845319MokhtariP.TakemotoH.NishimuraR.KatoH.Frequency and amplitude estimation of the first peak of head-related transfer functions from individual pinna anthropometry201513726907012-s2.0-8492335673010.1121/1.4906160MokhtariP.TakemotoH.NishimuraR.KatoH.Vertical normal modes of human ears: Individual variation and frequency estimation from pinna anthropometry201614028148312-s2.0-8498121174410.1121/1.4960481RaykarV. 
C.DuraiswamiR.YegnanarayanaB.Extracting the frequencies of the pinna spectral notches in measured head related impulse responses200511813643742-s2.0-2214448809410.1121/1.1923368SpagnolS.AvanziniF.Frequency estimation of the first pinna notch in head-related transfer functions with a linear anthropometric modelProceedings of the 18th International Conference on Digital Audio Effects2015231236SpagnolS.GeronazzoM.RocchessoD.AvanziniF.Synthetic individual binaural audio delivery by pinna image processing20141032392542-s2.0-8491143752110.1108/IJPCC-06-2014-0035SpagnolS.ScaiellaS.GeronazzoM.AvanziniF.Subjective evaluation of a low-order parametric filter model of the pinna for binaural sound renderingProceedings of the 22nd International Congress on Sound and Vibration, ICSV '15July 2015HuttunenT.SeppäläE. T.KirkebyO.KärkkäinenA.KärkkäinenL.Simulation of the transfer function for a head-and-torso model over the entire audible frequency range20071544294482-s2.0-4954909225110.1142/S0218396X07003469KatzB. F. G.Boundary element method calculation of individual head-related transfer function. I. Rigid model calculation200111052440244810.1121/1.14124402-s2.0-0035173802KahanaY.NelsonP. A.Boundary element simulations of the transfer function of human heads and baffled pinnae using accurate geometric models20073003-55525792-s2.0-3384588695710.1016/j.jsv.2006.06.079KreuzerW.MajdakP.ChenZ.Fast multipole boundary element method to calculate head-related transfer functions for a wide frequency range20091263128012902-s2.0-7034916080710.1121/1.3177264GumerovN. A.O'DonovanA. E.DuraiswamiR.ZotkinD. 
N.Computation of the head-related transfer function via the fast multipole accelerated boundary element method and its spherical harmonic representation201012713703862-s2.0-7594910618310.1121/1.3257598ZiegelwangerH.MajdakP.KreuzerW.Numerical calculation of listener-specific head-related transfer functions and sound localization: microphone model and mesh discretization2015138120822210.1121/1.49225182-s2.0-84937044108JinC. T.GuillonP.EpainN.ZolfaghariR.Van SchaikA.TewA. I.HetheringtonC.ThorpeJ.Creating the Sydney York morphological and acoustic recordings of ears database2014161374610.1109/TMM.2013.22821342-s2.0-84890964909ZiegelwangerH.KreuzerW.MajdakP.MESH2HRTF: an open-source software package for the numerical calculation of head-related transfer functionsProceedings of the 22nd International Congress on Sound and Vibration 2015 (ICSV '22)2015Firenze, ItalyIEEEDellepianeM.PietroniN.TsingosN.AsselotM.ScopignoR.Reconstructing head models from photographs for individualized 3D-audio processing2008277171917272-s2.0-5904908824710.1111/j.1467-8659.2008.01316.xMøllerH.HammershøiD.JensenC. B.SørensenM. F.Transfer characteristics of headphones measured on human ears1995434203217PralongD.CarlileS.The role of individualized headphone calibration for the generation of high fidelity virtual auditory space19961006378537932-s2.0-002977660810.1121/1.417337MasieroB.FelsJ.Perceptually robust headphone equalization for binaural reproductionProceedings of the 130th Audio Engineering Society Convention2011SchärerZ.LindauA.Evaluation of equalization methods for binaural signalsProceedings of the 126th Audio Engineering Society Convention2009SchonsteinD.FerréL.KatzB. F.Comparison of headphones and equalization for virtual auditory source localization200812353724372410.1121/1.2935199GolledgeR. G.MarstonJ. R.LoomisJ. M.KlatzkyR. 
L.Stated Preferences for Components of a Personal Guidance System for Nonvisual Navigation20049831351472-s2.0-1642395553PulkkiV.Virtual sound source positioning using vector base amplitude panning1997456456466GerzonM. A.Ambisonics in multichannel broadcasting and video198533118598712-s2.0-0022162094KönigF. M.4-canal headphone for in-front localization and HDTV- or Dolby-surround useProceedings of the 96th Convention Audio Engineering Society1994KönigF. M.A new supra-aural dynamic headphone system for in-front localization and surround reproduction of soundProceedings of the 102nd Convention Audio Engineering Society1997KönigF. M.New measurements and psychoacoustic investigations on a headphone for TAX/HDTV/Dolby-surround reproduction of soundProceedings of the 98th Convention Audio Engineering Society1995WeinrichS. G.Improved externalization and frontal perception of headphone signalsProceedings of the 92nd Convention Audio Engineering Society1992SunderK.TanE.-L.GanW.-S.Individualization of binaural synthesis using frontal projection headphones2013611298910002-s2.0-84892771521GreffR.KatzB. F.Circumaural transducer arrays for binaural synthesis200812353562356210.1121/1.2934607BujaczM.KropidlowskiK.IvanicaG.MiesenbergerK.BühlerC.PenazP.Sound of Vision - Spatial audio output and sonification approaches9759Proceedings of the 15th International Conference of Computers Helping People with Special Needs (ICCHP '16)2016Linz, AustriaSpringer202209WalkerB. N.StanleyR. M.Thresholds of audibility for bone-conduction headsetsProceedings of the International Conference on Auditory Display2005Limerick, IrelandIEEE218222WalkerB. N.StanleyR. M.IyerN.SimpsonB. D.BrungartD. S.Evaluation of bone-conduction headsets for use in multitalker communication environmentsProceedings of the 49th Annual Meeting of Human Factors and Ergonomics Society200516151619StanleyR. M.WalkerB. 
N.Lateralization of sound using bone-conduction headsetsProceedings of the 50th Annual Meeting of Human Factors and Ergonomics Society200615711575MacDonaldJ. A.HenryP. P.LetowskiT. R.Spatial audio through a bone conduction interface200645105955992-s2.0-3375039112910.1080/14992020600876519StanleyR. M.2009Georgia, Ga, USAGeorgia Institute of TechnologyReinfeldtS.HåkanssonB.TaghaviH.Eeg-OlofssonM.New developments in bone-conduction hearing implants: a review20158799310.2147/mder.s396912-s2.0-84921480400WalkerB. N.LindsayJ. L.Navigation performance in a virtual environment with bonephonesProceedings of the 11th International Conf.erence Auditory Display (ICAD '05)2005Limerick, IrelandIEEE260263WalkerB. N.StanleyR. M.PrzekwasA.High fidelity modeling and experimental evaluation of binaural bone conduction communication devicesProceedings of the 19th International Congress on Acoustics2007LindemanR. W.NomaH.De BarrosP. G.Hear-through and mic-through augmented reality: Using bone conduction to display spatialized audioProceedings of the 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, ISMAR '07November 200710.1109/ISMAR.2007.45388432-s2.0-50649121269LindemanR. W.NomaH.De BarrosP. G.An empirical study of Hear-Through augmented reality: Using bone conduction to deliver spatialized audioProceedings of the IEEE Virtual RealityMarch 2008354210.1109/VR.2008.44807472-s2.0-50249182987BardeA.HeltonW. S.LeeG.BillinghurstM.Binaural spatialization over a bone conduction headset: Minimum discernable angular differenceProceedings of the 140th Convention of the Audio Engineering Society2016RafaelyB.Active noise reducing headset - An overviewProceedings of the International Congress and Exhibition on Noise Control Engineering2001OppenheimA. V.WeinsteinE.ZangiK. 
C.FederM.GaugerD.Single-sensor active noise cancellation199422285290TikanderM.Usability issues in listening to natural sounds with an augmented reality audio headset20095764304412-s2.0-67749135750HärmäA.JakkaJ.TikanderM.KarjalainenM.LokkiT.HiipakkaJ.LorhoG.Augmented reality audio for mobile and wearable appliances20045266186392-s2.0-4344645826TikanderM.KarjalainenM.RiikonenV.An augmented reality audio headsetProceedings of the 11th International Conference on Digital Audio Effects, DAFx '082008LiskiJ.2016Espoo, FinlandAalto University School of Electrical EngineeringBrunnerS.MaempelH. J.WeinzierlS.On the audibility of comb filter distortionsProceedings of the 122nd Convention Audio Engineering Society2007RämöJ.VälimäkiV.Digital augmented reality audio headset201220121345737410.1155/2012/4573742-s2.0-84869078167HammershøiD.MøllerH.Sound transmission to and within the human ear canal199610014084272-s2.0-003005898110.1121/1.415856MeijerP. B. L.An experimental system for auditory image representations199239211212110.1109/10.1216422-s2.0-0026816635HershM. A.JohnsonM. A.20081stLondon, UKSpringerShovalS.BorensteinJ.KorenY.Auditory guidance with the Navbelt-a computerized travel aid for the blind199828345946710.1109/5326.7045892-s2.0-0032141288BalakrishnanG.SainarayananG.NagarajanR.YaacobS.A stereo image processing system for visually impaired200823136145ZhigangF.TingL.Audification-based electronic travel aid system5Proceedings of the International Conference on Computer Design and Applications, ICCDA '102010Qinhuangdao, ChinaIEEE13714110.1109/ICCDA.2010.55415222-s2.0-77955905868González-MoraJ. L.Rodríguez-HernándezA.Rodríguez-RamosL. 
F.Díaz-SacoL.SosaN.Development of a new space perception system for blind people, based on the creation of a virtual acoustic space19991607Berlin, GermanySpringer321330Lecture Notes in Computer ScienceFontanaF.FusielloA.GobbiM.MurinoV.RocchessoD.SartorL.PanuccioA.A cross-modal electronic travel aid device20022411Berlin, GermanySpringer393397Lecture Notes in Computer ScienceWilsonJ.WalkerB. N.LindsayJ.CambiasC.DellaertF.SWAN: system for wearable audio navigationProceedings of the 11th IEEE International Symposium on Wearable Computers, ISWC '07October 2007Massachusetts, Mass, USAIEEE919810.1109/ISWC.2007.43737862-s2.0-41149131898Torres-GilM. A.Casanova-GonzalezO.Gonzalez-MoraJ. L.Applications of virtual reality for visually impaired people2010921841932-s2.0-77950157934DunaiL.FajarnesG. P.PraderasV. S.GarciaB. D.LenguaI. L.Real–Time Assistance Prototype –A new navigation aid for blind peopleProceedings of the 36th Annual Conference of the IEEE Industrial Electronics Society, IECON '10November 2010Arizona, Ariz, USAIEEE1173117810.1109/IECON.2010.56755352-s2.0-78751562065FajarnesG. P.DunaiL.PraderasV. S.CASBLiP - A new cognitive object detection and orientation system for impaired peopleProceedings of the 4th International Conference on Cognitive Systems2010Zurich, SwitzerlandIEEEBujaczM.SkulimowskiP.StrumiłłoP.Sonification of 3D scenes using personalized spatial audio to aid visually impaired personsProceedings of the 17th International Conference on Auditory Display2011BujaczM.SkulimowskiP.StrumilloP.Naviton-a prototype mobility aid for auditory presentation of three-dimensional scenes to the visually impaired20126096967082-s2.0-84868558915KatzB. F. G.KammounS.ParseihianG.GutierrezO.BrilhaultA.AuvrayM.TruilletP.DenisM.ThorpeS.JouffraisC.NAVIG: augmented reality guidance system for the visually impaired201216425326910.1007/s10055-012-0213-6KatzB. F. 
G.DramasF.ParseihianG.GutierrezO.KammounS.BrilhaultA.BrunetL.GallayM.OriolaB.AuvrayM.TruilletP.DenisM.ThorpeS.JouffraisC.NAVIG: Guidance system for the visually impaired using virtual augmented reality20122421631782-s2.0-8485915680310.3233/TAD-2012-0344VliegenJ.Van OpstalA. J.The influence of duration and level on human sound localization200411541705171310.1121/1.16874232-s2.0-1842425633SpagnolS.HoffmannR.Herrera MartínezM.UnnthorssonR.Blind wayfinding with physically-based liquid sounds201811591910.1016/j.ijhcs.2018.02.002MendonçaC.A review on auditory space adaptations to altered head-related cues20148219114Article 21910.3389/fnins.2014.002192-s2.0-84905898525ZahorikP.BangayanP.SundareswaranV.WangK.TamC.Perceptual recalibration in human sound localization: Learning to remediate front-back reversals200612013433592-s2.0-3374575479410.1121/1.2208429CarlileS.The plastic ear and perceptual relearning in auditory spatial perception20148237113ParseihianG.KatzB. F. G.Rapid head-related transfer function adaptation using a virtual auditory environment.20121314294829572-s2.0-8486309041810.1121/1.3687448HondaA.ShibataH.HidakaS.GyobaJ.IwayaY.SuzukiY.Effects of head movement and proprioceptive feedback in training of sound localization2013442532642-s2.0-8487918234710.1068/i0522BalanO.MoldoveanuA.NagyH.WersényiG.BotezatuN.StanA.LupuR. G.Haptic-auditory perceptual feedback based training for improving the spatial acoustic resolution of the visually impaired peopleProceedings of the 21st International Conference on Auditory Display (ICAD '15)20152128