A sound source localization device based on a multimicrophone array with a rectangular pyramid structure is proposed for mobile robots in indoor applications. First, a time delay estimation method based on the cross-power spectral phase algorithm and a fast peak search strategy based on the geometric distribution of the microphones are developed to estimate the sound propagation delay difference between two microphones. A rejection strategy is then presented to evaluate the correctness of the delay difference values. Next, the device's geometric equations are established from the time-space mapping relationship to calculate the position of the sound source. To solve the equations quickly, the space around the multimicrophone array is divided into several subspaces to narrow the solution range, and the Newton iteration algorithm is used to solve the equations; the solution is then checked by an evaluation mechanism based on coordinate thresholds. Finally, experiments are carried out to verify the performance of the device, and the results show that the device can achieve sound source localization with high accuracy.
As an important branch of the robot family, mobile robots for home services are a trend that is expected to lead to a new lifestyle for people [
In contrast to vision, robot audition has unique advantages in perception. A robot's perception of sound is nearly omnidirectional and independent of lighting conditions. Robots are also able to detect sound signals even in the presence of obstructions [
SSL technology allows a robot to determine the direction and position of a sound source using only sound data. It is essential to the overall scheme of robot audition and has an important impact on other robot audition modules [
To achieve real-time, high-accuracy localization with a relatively simple structure, this paper designs a new microphone array with a regular quadrangular pyramid structure for mobile robots in indoor environments. The vertex of the pyramid provides a nonplanar reference point, which helps the array capture sound source signals and improves the localization accuracy. For this microphone array, the paper develops an SSL estimation model and computing method and proposes a double screening mechanism to improve the reliability of the position results.
The rest of the paper is organized as follows. Section
Irie [
All of the above SSL methods rely on a certain number of microphones. Pavlidi et al. [
The above-mentioned microphone arrays have regular structures, such as linear, triangular, polygonal, circular, and polyhedral arrays, which can locate sound sources in two-dimensional or three-dimensional space. The number of microphones and their topology in an SSL system depend mainly on the SSL method adopted. Generally, the number of microphones required increases in the order TDOA, HRSA, BS.
Among these SSL methods, the TDOA method is more suitable for robot audition, in which the azimuth and horizontal distance of a sound source must be determined in real time. The localization methods considered here are therefore all based on the TDOA technique. The basic idea of TDOA-based SSL is to use the observed difference in the arrival times of a source signal at two microphones to construct a hyperboloid in space on which the sound source is constrained to lie [
The accuracy of TDE directly affects the performance of an SSL system. Some research results show that, as the source moves farther from the robot, the error in distance estimation grows significantly, whereas the errors in azimuth and elevation estimation do not [
Perez-Lorenzo et al. [
As shown in Figure
SSL of hyperbolic curve.
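The hyperbolic constraint can be illustrated with a short sketch. It assumes a speed of sound of 343 m/s and a hypothetical pair of microphones 0.2 m apart on a plane; neither value comes from the paper. The example shows that a single delay difference fixes only a curve of candidate positions, so a mirrored point yields the same delay.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def range_difference(source, mic_a, mic_b):
    """Return |source - mic_a| - |source - mic_b| in metres."""
    source, mic_a, mic_b = (np.asarray(p, dtype=float)
                            for p in (source, mic_a, mic_b))
    return np.linalg.norm(source - mic_a) - np.linalg.norm(source - mic_b)


def tdoa(source, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Delay difference in seconds: positive when the sound reaches mic_b first."""
    return range_difference(source, mic_a, mic_b) / c


# Hypothetical planar layout (metres), chosen only for illustration.
mic_a = (-0.1, 0.0)
mic_b = (0.1, 0.0)
source = (1.0, 2.0)
mirrored = (1.0, -2.0)  # mirror image across the microphone axis

tau = tdoa(source, mic_a, mic_b)
tau_mirrored = tdoa(mirrored, mic_a, mic_b)
```

Both `source` and `mirrored` lie on the same hyperbola, which is why more than one microphone pair is needed to obtain a unique position.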
The generalized cross-correlation (GCC) method, which builds on the basic cross-correlation method, is often used to estimate the time difference. The main functions of GCC are to suppress noise, prevent the occurrence of multiple peaks, and highlight the effective peak.
If
Implementation process of time delay estimation based on GCC.
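The GCC processing chain (transform, cross-power spectrum, weighting, inverse transform, peak detection) can be sketched in a few lines. This is a minimal illustration with the standard PHAT weight, not the improved weight discussed in this paper; the function name `gcc_phat` and the small regularization constant are assumptions made for the example.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_delay=None):
    """Estimate how much x2 lags x1 (in seconds) with PHAT-weighted GCC."""
    n = len(x1) + len(x2)                 # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weight: keep only the phase
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    max_shift = n // 2
    if max_delay is not None:
        max_shift = max(1, min(int(fs * max_delay), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift  # peak position in samples
    return lag / fs


# Usage: a white-noise signal and a copy delayed by 25 samples.
rng = np.random.default_rng(0)
fs = 20_000
x1 = rng.standard_normal(4096)
x2 = np.concatenate((np.zeros(25), x1[:-25]))
estimated_delay = gcc_phat(x1, x2, fs)
```

A positive result means the second signal arrives later than the first; the optional `max_delay` argument limits the lag range that is searched.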
GCC function is defined as
When there is reverberation in the environment, sound signals received by the two microphones can be expressed as
In (
When the reverberation intensity is weak, the values of the last three items in formula (
Formula (
Reverberation, as a physical quantity in acoustic space, has two characteristic parameters: the reverberation time and the direct-to-reverberant ratio (DRR).
Reverberation time is a very important parameter for describing the degree of sound decay in a room. Specifically, it is the time required for the sound energy in a diffuse field to attenuate by 60dB after the source stops, while the residual sound energy is reflected many times. Reverberation time increases with the volume of the room and decreases as the sound absorption coefficient of the room increases. If
DRR refers to the power spectral ratio of a direct sound signal to a reverberant signal. Its mathematical expression is as follows:
The value of DRR depends on the distance between the sound source and the microphone and reverberation time. Further, DRR can be expressed as
The value of
Substituting (
Further, according to (
So,
Simulation experiments in MATLAB are carried out to verify the performance of the improved PHAT weighting algorithm under reverberation times of 100ms, 200ms, and 300ms. The experiments estimate the time delay between two microphone signals. The first signal is a segment of sound sampled at 20kHz with 16-bit resolution and is considered noise-free. The second signal is the same sampled signal delayed by 800 sampling periods, which at this sampling frequency corresponds to a time delay of 40ms. Gaussian white noise is then added to both signals at an SNR of 5dB.
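The arithmetic of this setup can be checked with a small sketch in Python (the paper itself uses MATLAB). The sampling frequency, delay, and SNR are taken from the text; the sine test tone, the helper names, and the random seed are assumptions made for illustration, and reverberation is not modelled here.

```python
import numpy as np

FS = 20_000            # sampling frequency in Hz (from the text)
DELAY_SAMPLES = 800    # delay in sampling periods (from the text)
SNR_DB = 5.0           # signal-to-noise ratio in dB (from the text)

delay_seconds = DELAY_SAMPLES / FS   # 800 / 20000 = 0.04 s, i.e. 40 ms


def delay_signal(signal, delay_samples):
    """Shift a signal to the right by delay_samples, padding with zeros."""
    signal = np.asarray(signal, dtype=float)
    return np.concatenate((np.zeros(delay_samples), signal[:-delay_samples]))


def add_white_noise(signal, snr_db, rng):
    """Add Gaussian white noise so that signal power / noise power = snr_db."""
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.standard_normal(signal.shape) * np.sqrt(noise_power)


# Placeholder test tone (assumption): one second of a 440 Hz sine wave.
rng = np.random.default_rng(1)
t = np.arange(FS) / FS
clean = np.sin(2.0 * np.pi * 440.0 * t)
delayed = delay_signal(clean, DELAY_SAMPLES)
noisy = add_white_noise(clean, SNR_DB, rng)

# Measure the SNR that was actually realized.
measured_snr_db = 10.0 * np.log10(np.mean(clean ** 2)
                                  / np.mean((noisy - clean) ** 2))
```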
Figure
The simulation results of the PHAT weighting algorithm under different reverberation times. (a), (b), and (c) show the results of the original PHAT weighting at reverberation times of 100ms, 200ms, and 300ms, respectively; (d), (e), and (f) show the results of the improved PHAT weighting at reverberation times of 100ms, 200ms, and 300ms, respectively.
The multipath components of the sound increase because of reflections from object surfaces, and they vary with the position of the sound source in the room. The improved PHAT weighting algorithm takes the influence of reverberation fully into account, so that time delay estimation based on the cross-power spectrum phase not only retains its effective noise suppression but also reduces the effect of reverberation.
However, the improved weight algorithm needs to know some parameters of the room and the rough position of the sound source in advance, which limits its universality and requires further improvements.
One way to improve the real-time performance of the TDE algorithm is to reduce the search interval for the peak of the cross-correlation function. The paper therefore proposes a fast peak search strategy based on the geometric model of the microphone array. There are two geometric relationships between a sound source and two microphones on a plane. As shown in Figure
Geometric relationships between three sound sources and two microphones.
According to Figure
The search interval of the peak.
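A simple form of a geometry-based search bound can be sketched as follows. It relies only on the fact that the delay difference between two microphones cannot exceed their spacing divided by the speed of sound; the paper's own interval may be narrower. The function names, the 0.3 m spacing, and the speed of sound of 343 m/s are assumptions made for illustration.

```python
import math

import numpy as np


def max_lag_samples(mic_distance, fs, c=343.0):
    """Largest physically possible delay between two microphones, in samples."""
    return math.ceil(mic_distance / c * fs)


def search_peak(cc, zero_lag_index, max_lag):
    """Return the lag of the largest value of cc within +/- max_lag samples."""
    lo = max(zero_lag_index - max_lag, 0)
    hi = min(zero_lag_index + max_lag, len(cc) - 1)
    window = cc[lo:hi + 1]
    return lo + int(np.argmax(window)) - zero_lag_index


# Usage: a true peak at lag +10 and a larger spurious peak at lag +60.
bound = max_lag_samples(mic_distance=0.3, fs=20_000)
cc = np.zeros(201)
cc[110] = 1.0   # inside the physically possible interval
cc[160] = 5.0   # outside the interval, e.g. caused by a reflection
best_lag = search_peak(cc, zero_lag_index=100, max_lag=bound)
```

Restricting the search shortens the scan and also ignores peaks that no source position could produce.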
An error in the delay difference leads to an erroneous positioning result. Moreover, a directional positioning error is easily produced if the delay difference error occurs near the boundary of an area. Therefore, the paper proposes a screening strategy of delay difference based on [
(a) Ideally, the delay differences between the microphones
(b) Under the condition of the error of time delay estimation, (
(c) When the inequality is not satisfied, the error of the time delay estimate is large and the estimate should be discarded. Otherwise, the estimate is accepted.
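One common way to implement such a check is a closed-loop consistency test on three microphones. The sketch below assumes that the inequality compares the loop residual of three pairwise delays with a time threshold; this is an assumption for illustration, and the exact form used in the paper may differ. The function name and the numerical values are hypothetical.

```python
def delays_consistent(tau_ij, tau_jk, tau_ik, threshold):
    """Closed-loop check: ideally tau_ij + tau_jk equals tau_ik.

    tau_ab is the delay difference between microphones a and b in seconds.
    The estimates are accepted only if the residual stays within threshold.
    """
    return abs(tau_ij + tau_jk - tau_ik) <= threshold


# Usage with hypothetical delay estimates (seconds) and a 50 microsecond threshold.
accepted = delays_consistent(1.0e-4, 2.0e-4, 3.0e-4, threshold=5.0e-5)
rejected = delays_consistent(1.0e-4, 2.0e-4, 5.0e-4, threshold=5.0e-5)
```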
A three-dimensional microphone array based on the structure of rectangular pyramid is designed to locate a sound source. As shown in Figure
The geometry of the microphone array based on the structure of rectangular pyramid.
The difference between the distance from the sound source to each of the other microphones and its distance to M0 is
According to the spatial coordinate relation shown in Figure
The coordinates of the sound source can be obtained by using Newton iterative algorithm to solve (
In Newton iteration, a set of appropriate initial coordinates helps to improve the convergence rate and obtain an accurate solution. The paper therefore proposes a partitioned iteration strategy to reduce the number of iterations and to ensure that each range of initial coordinates contains a unique optimization point. Owing to the structural characteristics of the rectangular pyramid, the microphone array provides redundant delay differences. For example, the coordinates of the sound source should meet
Schematic diagram of location interval partition.
According to the principle of Newton iteration algorithm and the localization model of the microphone array, (
Jacobian matrix of
So, the coordinates of the sound source can be calculated as
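The Newton update can be sketched for a system of three range-difference equations in three unknowns. The microphone coordinates, the source position, and the initial guess below are hypothetical values chosen for illustration, and the sketch uses range differences (delay difference times the speed of sound) directly; it is not the paper's exact formulation.

```python
import numpy as np

# Hypothetical geometry in metres: M0 at the apex, three base microphones.
M0 = np.array([0.0, 0.0, 0.3])
BASE = np.array([[0.25, 0.25, 0.0],
                 [-0.25, 0.25, 0.0],
                 [-0.25, -0.25, 0.0]])


def residual(s, range_diffs):
    """f_i(s) = |s - M_i| - |s - M0| - d_i for each base microphone."""
    d_base = np.linalg.norm(s - BASE, axis=1)
    d0 = np.linalg.norm(s - M0)
    return d_base - d0 - range_diffs


def jacobian(s):
    """Row i is the unit vector from M_i to s minus the unit vector from M0 to s."""
    u_base = (s - BASE) / np.linalg.norm(s - BASE, axis=1)[:, None]
    u0 = (s - M0) / np.linalg.norm(s - M0)
    return u_base - u0


def newton_locate(range_diffs, s0, tol=1e-10, max_iter=50):
    """Solve residual(s) = 0 by Newton iteration starting from s0."""
    s = np.asarray(s0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(jacobian(s), -residual(s, range_diffs))
        s = s + step
        if np.linalg.norm(step) < tol:
            break
    return s


# Usage: noise-free range differences generated from a known source position.
true_source = np.array([2.0, 1.5, 1.0])
range_diffs = (np.linalg.norm(true_source - BASE, axis=1)
               - np.linalg.norm(true_source - M0))
estimate = newton_locate(range_diffs, s0=[1.8, 1.3, 0.9])
```

The initial guess matters because the range direction is only weakly constrained by a small array, which is the motivation for choosing a starting point inside the correct subspace.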
The screening of the delay difference provides a certain guarantee that the space is divided correctly and that the iterative operation is solved in the right interval. When the iterative operation runs in the correct interval, the four localization results are more accurate and the differences between them are smaller. However, when the sound source is very close to a boundary line, even a small error in the delay difference can select the wrong solution interval, which leads to a localization result with a large error. The paper proposes a screening strategy based on the geometric model, which sets three coordinate thresholds
The localization results are considered correct when they meet the above three inequalities; otherwise, they are marked as a localization failure (this label is used only for description in this paper and does not mean that no localization result was obtained).
The average value of the four localization results obtained by the double screening mechanism is used as the final localization result
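The second screening stage and the final averaging can be sketched as follows. The sketch assumes that the three inequalities bound the spread of the four results along each coordinate axis; this is an interpretation for illustration, and the function name, thresholds, and sample coordinates are hypothetical.

```python
import numpy as np


def screen_and_average(results, thresholds):
    """Return the mean of the results, or None if any axis spread is too large.

    results: array-like of shape (4, 3) holding four (x, y, z) estimates.
    thresholds: three maximum allowed spreads, one per coordinate axis.
    """
    results = np.asarray(results, dtype=float)
    spread = results.max(axis=0) - results.min(axis=0)
    if np.any(spread > np.asarray(thresholds, dtype=float)):
        return None  # marked as a localization failure
    return results.mean(axis=0)


# Usage with hypothetical estimates in metres.
consistent = [[2.0, 1.5, 1.0], [2.1, 1.5, 1.0], [1.9, 1.5, 1.0], [2.0, 1.5, 1.0]]
outlier = [[2.0, 1.5, 1.0], [3.0, 1.5, 1.0], [1.9, 1.5, 1.0], [2.0, 1.5, 1.0]]
final_position = screen_and_average(consistent, thresholds=(0.3, 0.3, 0.3))
failed = screen_and_average(outlier, thresholds=(0.3, 0.3, 0.3))
```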
Figure
The prototype of the rectangular pyramid microphone array mounted on a mobile robot.
The selected microphone model is the DGO-6050CD-P, and its specific parameters are in Table
The parameters of microphone.
Index | Parameter |
---|---|
dimensions | 9.7×6.7mm |
sensitivity | -48-66dB |
frequency range | 20Hz-16kHz |
S/N ratio | greater than 58dB |
operation voltage range | 1.5-10V |
directivity | omnidirectional |
The model of data acquisition card is USB_ HRF4626, which is a high-speed and high-precision synchronous data acquisition card based on USB bus, and its specific parameters are shown in Table
The parameters of data acquisition card.
Index | Parameter |
---|---|
the type of AD | 9.7×6.7mm |
accuracy | 16 bits |
input range | -10~+10V |
number of voltage channels | 8 single-ended |
range of sampling rate | 1-100kHz |
Two terminals of the acquisition card are connected with the microphone array through 5 data wires and the computer through USB, respectively. The sound signals collected by the acquisition card are transmitted to the host computer.
All experiments are performed in a laboratory setting with a reverberation time of approximately 200ms, shown in Figure
The experimental environment and condition, (a) the experimental environment in a laboratory; (b) a view sketch of the sound source used in the experiment.
The test signals are sampled by a single channel at 20kHz. The sound source is generated by playing a prerecorded clapping sound, which is easy to acquire and analyze because its energy is concentrated and its silent segment is clearly distinguished from the sound segment. In the experiment, the laboratory is treated as a quiet environment, and all environmental noise is ignored. White noise is superimposed on the sound source to obtain the required SNR; because the ignored environmental noise is still present, the actual SNR is lower than the set value.
Three kinds of experiments are designed as follows:
In the experiment 1, the success rate of the delay estimation is defined as
Different time thresholds
The success rate of delay estimation based on the screening strategy of delay difference.
| | 1m | 2m | 3m | 4m | 5m | 6m |
|---|---|---|---|---|---|---|
| | 65 | 68 | 67 | 72 | 70 | 66 |
| | 76 | 78 | 75 | 76 | 77 | 76 |
| | 85 | 86 | 87 | 87 | 86 | 85 |
| | 91 | 89 | 91 | 90 | 88 | 91 |
| | 92 | 92 | 91 | 93 | 91 | 92 |
Table
In fact, we only need the accurate delay differences of three microphones to calculate the exact position of the sound source. So, the estimate results of delay difference are considered acceptable only if three of the six estimate results meet (
In experiment 2, the localization results of each testing point calculated according to Newton iterative are evaluated by formula (
The evaluation results based on the screening strategy of coordinate thresholds.
| | 1m | 2m | 3m | 4m | 5m | 6m |
|---|---|---|---|---|---|---|
| | 10 | 13 | 9 | 13 | 15 | 21 |
| | 7 | 8 | 8 | 9 | 9 | 15 |
| | 3 | 4 | 5 | 4 | 6 | 6 |
If the localization results do not meet formula (
The positioning error curves of distance and azimuth angle, as shown in Figure
The results of localization experiment for all test points at SNR of 45dB, (a) the relation curves between the positioning error of distance and the testing distance; (b) the relation curves between the positioning error of azimuth angle and the testing distance; (c) the relation curves between the positioning error of distance and the testing azimuth angle; (d) the relation curves between the positioning error of azimuth angle and the testing azimuth angle.
Figures
Figures
The relative error can be calculated as
Figure
The relative positioning errors of localization experiment for all test points at SNR of 45dB, (a) the relative positioning errors of distance; (b) the relative positioning errors of azimuth angle.
According to the results shown in Figures
Currently, there are many kinds of microphone arrays for SSL, such as linear arrays, planar arrays, and three-dimensional arrays. In terms of positioning accuracy, three-dimensional arrays generally outperform the other array types. In [
Comparison of positioning accuracy between two microphone arrays.
Array model | Distance positioning error | Azimuth positioning error |
---|---|---|
Tetrahedral | ≤ 0.40m | ≤ 2.0° |
Rectangular pyramid | ≤ 0.24m | ≤ 1.5° |
Figure
The results of comparison experiment of prototype with different SNR, (a) the curve between the positioning error of distance and the testing distance; (b) the curve between the positioning error of azimuth angle and the testing distance.
Considering that the indoor environment is relatively quiet and that the mobile robot mainly locates a purposeful, active sound source with a strong signal, the SNR is generally assumed to be greater than 10dB. A distance error of no more than 0.4m and an angle error of no more than 2° have little impact on mobile robot tracking, because the robot can approach the target by continually updating its positioning results. Therefore, the prototype can be used by a mobile robot to locate an acoustic target in an indoor environment.
Paper [
A SSL system for indoor mobile robots is proposed, including
The proposed method has some novelties compared with the existing methods, including
The next stage of the research is to combine the device with a mobile robot to realize autonomous positioning and tracking of a sound source target, which includes
The research library related to the dissertation will be established in GitHub, where you can access the folders and find experimental data and lists: GitHub:
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Guoliang Chen conceived the idea, designed the experiments, and wrote the paper; Yang Xu helped with the algorithm and the analysis of the experimental data.
This work is supported by the National Natural Science Foundation of China under Grant no. 61672396.