Distributed 3 D Source Localization from 2 D DOA Measurements Using Multiple Linear Arrays

This manuscript addresses the problem of 3D source localization from direction of arrivals (DOAs) in wireless acoustic sensor networks. In this context, multiple sensors measure the DOA of the source, and a central node combines the measurements to yield the source location estimate. Traditional approaches require 3DDOAmeasurements; that is, each sensor estimates the azimuth and elevation of the source bymeans of a microphone array, typically in a planar or spherical configuration.The proposedmethodology aims at reducing the hardware and computational costs by combining measurements related to 2D DOAs estimated from linear arrays arbitrarily displaced in the 3D space. Each sensor measures the DOA in the plane containing the array and the source. Measurements are then translated into an equivalent planar geometry, in which a set of coplanar equivalent arrays observe the source preserving the original DOAs. This formulation is exploited to define a cost function, whose minimization leads to the source location estimation. An extensive simulation campaign validates the proposed approach and compares its accuracy with state-of-the-art methodologies.


Introduction
The problem of acoustic source localization with microphone arrays has been an active topic of research in the last thirty years [1].Several are the applications that benefit from this technology, among which video-conferencing [2,3], audiosurveillance [4], and so forth are worth mentioning.
In the literature, it has been proposed to use either compact or distributed arrays.Compact arrays are advantageous since their deployment is easy.As an example, in a videoconferencing system, a compact array can be located in the proximity of the camera to steer it towards the detected speaker.Unfortunately, compact arrays pose some limits in terms of localization accuracy.Indeed, for far sources, only the direction of arrival (DOA) can be accurately estimated, while the range estimate is generally unreliable [1].On the other hand, distributed arrays envision the presence of multiple sensors, each equipped with one or more microphones, located around the volume of interest [5][6][7][8][9][10].With this configuration, the source is observed from different angles, with benefits in terms of localization accuracy.Until recently, the cost of the deployment of the sensors and the cabling made this kind of solutions cumbersome and possible only in specific ad hoc applications.In the last few years, however, the advent of wireless sensor networks made this kind of solutions attractive for a wider range of applications.
When each sensor is equipped with multiple microphones, the direction of arrival of the source can be measured internally to each sensor.The source is then localized by a central node, which combines measurements coming from the individual sensors.The advantage of DOAs with respect to other measurements (e.g., time differences of arrival or times of arrival) lies in the fact that each sensor transmits a single measurement to the central node, independently from the number of microphones.
Source localization from multiple DOAs can be brought to the problem of triangulating the estimated DOA lines [11,12].Following this approach, 3D localization is typically accomplished by combining multiple 3D DOAs measured at sensors with noncollinear microphone arrangements.For instance, multiple spherical arrays are used in [13].A 3D DOA is represented by a pair of angles, which denote 2 Wireless Communications and Mobile Computing the azimuth and the elevation of the source with respect to the sensor position.Therefore, measuring a 3D DOA involves a two-dimensional search space.As robust DOA estimation algorithms are generally based on grid-search solutions (e.g., beamforming [14], steered-response-power [15]), the inherent power requirement may be an issue for their implementation at local sensors.For this reason, many works limit source localization to 2D (planar) geometries, assuming that all the sensors and the source lie on the same plane.This is typically accomplished through the triangulation of multiple 2D DOAs, as done, for instance, in [8,12,16].From a geometrical standpoint, a 2D DOA is measured in the plane containing the array and the source and consequently consists of a single angle.The search space is therefore one-dimensional, and simple array geometries can be adopted for this task.For instance, using a linear array, the DOA can be estimated by evaluating an objective function in the range [−/2, /2], suitably sampled according to the desired angular resolution.Moreover, in order to achieve a prescribed resolution, a smaller number of microphones are required compared to that needed for 3D DOA measurements.For these reasons, 2D DOA measurements are more affordable for low-power sensing devices, as typically required in designing wireless sensor networks.
In this manuscript, we propose a technique that localizes a source in 3D from the combination of 2D DOAs, measured using multiple linear microphone arrays.Differently from state-of-the-art methods exploiting 2D DOAs, we remove the requirement of working in a planar geometry.Specifically, we do not pose any constraint about the positions of the source and of the arrays in the 3D space.The proposed technique proceeds in two steps.First, each array measures the 2D DOA of the source referred to a local plane, determined by the lying line of the array and the location of the source.Using the concept of equivalent arrays, the array positions are rototranslated into equivalent ones all lying in the same plane containing also the source.Each equivalent array position is found so that the original DOA is preserved, as well as its distance from the source.Once this transformation has been accomplished, in the second step the source can be localized in a 2D geometry.In order to do this, we take advantage of the Ray Space [17], in which DOAs are interpreted as acoustic rays originating from the source and impinging on the acoustic centers of the sensors.Acoustic rays are parameterized by the parameters of the line on which they lie.A cost function is defined, whose minimization gives us the source location.
The proposed technique is advantageous with respect to state-of-the-art methods for 3D source localization using DOAs, which, to the best of our knowledge, all rely on 3D measurements.Being based on simpler 2D DOA measurements, indeed, the proposed method requires a reduced number of microphones for each sensor of the network, with obvious advantages in terms of hardware cost (microphones and analog-to-digital converters).For the same reasons, power consumption at each sensor turns out to be significantly reduced, as the beamforming operation leading to the 2D DOA estimate is accomplished on a one-dimensional search space.Finally, it is worth noticing that, exploiting a simple geometry such as that of a linear array, the manufacturing costs are much smaller than those required to realize more complex structures (e.g., spherical and cylindrical arrays).
The remainder of the manuscript is structured as follows.Section 2 gives some background information concerning the Ray Space representation and the localization of acoustic sources in 2D.Section 3 introduces the theoretical framework for the proposed approach, with particular reference to the equivalent arrays.Section 4 addresses the problem of source localization using the equivalent arrays.Section 5 is devoted to the validation of the proposed technique.Finally, Section 6 draws some conclusions.

Background
2.1.Ray Space Representation.Consider an acoustic source in a 2D scenario.Its Cartesian coordinates are ( 푥 ,  푦 ).In geometrical acoustics, it can be described as the set of all acoustic rays that originate from it.Each ray can be parameterized with the line on which it lies.The homogeneous representation of all the lines passing through ( 푥 ,  푦 ) is This is equivalent to the vector form The parameter vector l is homogeneous; that is, any scaling l,  ̸ = 0, represents the same line.Consequently, the homogeneous coordinates l form a class of equivalence in a two-dimensional projective space, defined in [18] as the projective Ray Space.

DOAs in the Ray Space.
Let us now assume that the acoustic source is located in the far field with respect to the center of a microphone array ( 푥 ,  푦 ); that is, the length of the array is much smaller than the curvature of the wavefront.In this context, the spatial information of the source can be described in terms of its direction of arrival (DOA), that is, the angle  of propagation of the wavefront.Source and array location are related to the DOA by The DOA defines the acoustic ray that joins the source position and the observation point, which turns to be oriented as .In the Ray Space, the DOA is thus represented as the acoustic ray passing through ( 푥 ,  푦 ) and directed as , whose parameters are In absence of measurement noise, all these acoustic rays are bound to pass through the source position s, meaning that In practical situations, DOAs are corrupted by measurement noise; thus the equalities in (5) do not hold.However, since the system is linear in the unknown variables ( 푥 ,  푦 ), the source position can be easily estimated as the solution of a linear least squares problem.To do so, we first rewrite the system as where Then, the source position is estimated as which corresponds to the solution of the linear least squares problem

Theoretical Framework
Despite its simplicity, the solution provided in ( 8) is valid only for 2D configurations, that is, when the source and the microphone arrays lie on the same plane.For more general 3D scenarios, unfortunately, the generalization of the Ray Space is not straightforward.Indeed, representing acoustic rays in 3D requires the introduction of a five-dimensional projective space involving the use of Plücker's coordinates along with a number of nonlinear constraints [17], which would make the source localization problem intractable.
In this manuscript we follow an alternate route: rather than extending the idea of the Ray Space to a 3D setup, we focus on the reinterpretation of 3D source localization into an equivalent 2D space, by introducing the concept of equivalent microphone arrays.

DOAs in 3D Using Linear Arrays.
A linear microphone array can be described by the parameters of a line passing through its center m 푖 and directed as the unit vector k 푖 .In a 3D setup, a total of 5 parameters are required.Specifically, three Cartesian coordinates define the center m 푖 = [ 푥푖 ,  푦푖 ,  푧푖 ] 푇 , while the orientation is defined by the pair of angles  푖 and  푖 so that Using a linear array it is only possible to estimate a 2D DOA, which corresponds to the angle from which the source is observed by the array center, measured in the plane containing k 푖 and the source position s = [ 푥 ,  푦 ,  푧 ] 푇 .More formally, the DOA as measured by the array is given by the angle formed between the vectors k 푖 and w 푖 = s − m 푖 , which can be computed as where we used the fact that ‖k 푖 ‖ = 1.
Figure 2: A set of real linear arrays (continuous colored arrows) are projected into the corresponding equivalent arrays (dashed colored arrows).Equivalent arrays all lie on the same plane, and they are disposed so that they look at the source from the same DOAs  푖 ,  = 1, . . ., , as measured by the real arrays.

Equivalent Microphone Arrays.
Each DOA measurement refers to a different plane, according to the displacement and orientation of the arrays with respect to the source position.Nonetheless, we can interpret the available DOAs as being measured by a set of equivalent microphone arrays all located in the same plane, which contains also the source position s.Analogously to the real ones, the equivalent arrays are described by the pairs (m 耠 푖 , k 耠 푖 ) indicating their centers and orientations, respectively, defined for  = 1, . . .,  as This idea is illustrated in Figure 2, which highlights a set of real arrays disposed arbitrarily in the space and the corresponding equivalent arrays that all lie in the same plane.
For clarity of visualization, we denoted each pair of real and equivalent arrays using the same color.
The final goal of this reinterpretation is to bring the 3D source localization problem to the one formulated in (9), which exploits the fact that all the measurements are referred to the same plane.For this reason, we must determine the unknown parameters m 耠 푖 and k 耠 푖 so that the equivalent arrays preserve the information related by their real counterparts.To do so, we notice that the only requirements are that all the equivalent arrays (i) must lie in a unique plane containing the source; (ii) must preserve the DOA, so that they observe the source from the same angles as the real arrays.
To satisfy the first constraint, without loss of generality, a convenient choice is to force all the arrays to lie on the plane  =  푧 .Mathematically speaking, this corresponds to posing for  = 1, . . ., .As for the second requirement, formally we must require that the angle formed by the vectors k 耠 푖 and w 耠 푖 = s − m 耠 푖 equals the DOA  푖 as defined in (11); that is, which is equivalent to We observe that the condition in ( 16) is met by an infinite number of equivalent arrays.This happens for two reasons, as exemplified in Figure 3. Indeed, on one hand, for a given choice of the angle  耠 푖 , the DOA is preserved by all the parallel equivalent arrays directed as k 耠 푖 = [cos  耠 푖 , sin  耠 푖 , 0] 푇 and having centers on a common line, as in Figure 3(a).On the other hand, by keeping the distance  푖 of the array from the source fixed, each rigid rotation of the array about s leads to another possible equivalent array preserving the DOA, as shown in Figure 3(b).In order to make the solution unique, we impose that the transformation preserves the angle  푖 (the orientation of the array on the horizontal plane) and the distance from the source.These additional constraints can be formally written as where  = 1, . . ., .Given the constraints in ( 13), ( 14), (17), and (18), what still remains undetermined are the positions of the equivalent arrays in the plane  =  푧 , namely, the terms  耠 푥푖 and  耠 푦푖 .Since they depend on the unknown source location, we will derive  耠 푥푖 and  耠 푦푖 as functions of s = [ 푥 ,  푦 ,  푧 ] 푇 .To do so, we start by replacing (18) in (16) to obtain By inserting ( 14), (13), and ( 17) into (19) and by making the terms explicit we obtain Moreover, if we take the square of ( 18) and we exploit the constraint in (13), we can write that Expressions in ( 20) and ( 21) form a second-order system in the unknown variables  耠 푥푖 and  耠 푦푖 .Rearranging the terms, this system can be rewritten in a more compact form as with the following definitions: Solving for  푖 and  푖 we obtain Using the definitions in (23) and noticing that  2 푖 +  2 푖 = 1, we readily obtain the expressions for the equivalent array positions Note that, as expected, the second-order system (22) leads to two valid solutions.They represent two equivalent arrays, mutually mirrored about the source.Consequently, we can safely select one of those randomly, without loss of information.For instance, one may select the equivalent array located closest to the corresponding real one.

Source Localization
In this section we take advantage of the planar geometry induced by the transformation of the real into the equivalent arrays for the purposes of source localization.We first reformulate the source localization problem as a nonlinear least squares minimization problem, introducing the equivalent array positions in the cost function in (9).Then, we describe the iterative minimization process leading to the estimation of the source location.Finally, we describe how the estimated location can be refined by detecting and removing potential outliers from the set of measurements.

Cost Function Definition.
The introduction of the equivalent arrays makes it possible to exploit the 2D Ray Space localization discussed in Section 2.3.Let us consider a network of  linear arrays arbitrarily deployed in the 3D space, whose positions m 푖 and orientations k 푖 are assumed to be known.For a candidate source position s, the positions of the equivalent array centers can be computed using (25), here denoted as  耠 푥푖 (s) and  耠 푦푖 (s) to stress the dependency on s.With  푖 being the direction angle of the equivalent array and  푖 the DOA in the array reference system, the DOA expressed in the global reference system (i.e., measured in the plane  =  푧 with respect to the  axis) is therefore  푖 =  푖 +  푖 .Consequently, the acoustic ray associated with the DOA measured by the th equivalent array turns to be l 푖 (s) = [ 푖 ,  푖 ,  푖 (s)] 푇 , whose parameters can be computed using (4), obtaining Differently from the pure 2D case discussed in Section 2.3, the parameter  푖 (s) of the acoustic ray is a nonlinear function of the source position s.For this reason, the combination of  equations in the form l 푇 푖 (s)s = 0 does not lead anymore to the Wireless Communications and Mobile Computing definition of a linear system.Nevertheless, we can still formulate the source localization problem as in (9), where now the vector c depends nonlinearly on the source location s.More formally, the optimization problem can be rewritten as the minimization of the cost function (s) = e 푇 (s)e(s); that is,

Minimization.
The minimization of the cost function can be efficiently addressed in an iterative fashion, by locally approximating (s) at each iteration by means of a Taylor series expansion.Following the same approach as in [19], we compute the first-order expansion of the localization residual about an initial guess position The Jacobian matrix ∇ e is given by where According to the problem in ( 27), the update equation of the iterative update is given by [19] where  is the iteration number and The source location is estimated as after stopping the iteration when |s (푡+1) − s (푡) | is smaller than a prescribed threshold.

Localization Refinement.
In adverse acoustic conditions, the DOA set may contain some outliers, typically related to the bias introduced by reverberation in standard DOA estimation algorithms [20].A simple and effective method for detecting outliers is to check the magnitude of localization residuals | 푖 (ŝ)|,  = 1, . . ., .Inspired by the work in [21] related to time differences of arrival-based source localization, we compute the standard deviation  E of the residual set E = { 1 (ŝ), . . .,  푁 (ŝ)}.If  E exceeds a prescribed threshold  E , the DOA set potentially contains outliers.In this case, we mark as an outlier the DOA  푖 associated with the maximum localization residual, that is, so that This outlier DOA  푖 is thus removed from the measurement set, and the associated residual  푖 (ŝ) is removed from E.  The standard deviation  E is then recomputed, and the procedure is iterated until  E ≤  E or a maximum number of measurements are removed.At this point, the source location is refined by running a second time the algorithm presented in Section 4.2, recomputing the cost function with the outlierfree measurement set.As discussed in [21], it is worth noticing that this refinement step has a negligible computational cost, as the residual terms are directly available after source localization.Moreover, the iterative minimization procedure is repeated at most one time.Note also that the localization refinement is accomplished by the central node and does not require additional data exchange with the sensors.

Validation
We now present the results of a set of simulations aimed at verifying the validity of the proposed method.With reference to the setup shown in Figure 4, we tested the localization algorithm in a rectangular room of size 4 × 3 × 3 m 3 by means of  = 8 linear arrays located close to the vertices of the room.Each sensor is equipped with 3 microphones, located on a line segment and uniformly spaced by  = 10 cm.The positions m 푖 and orientations ( 푖 ,  푖 ) of the sensors are summarized in Table 1.We considered 10 test source positions, located on the line connecting the center of the room and the center of the first sensor.

Test 1: Robustness against Additive Noise on DOAs.
In a first stage, we considered a completely anechoic scenario, in which we simulated the measurement of noisy DOAs.This allowed us to test the statistical behavior of the proposed localization cost function.More specifically, we corrupted the nominal DOAs  푖 ,  = 1, . . ., , observed by each array with additive noise  푖 , which is a realization of zero-mean white noise drawn from a normal distribution with standard deviation  휃 .Note that, in this case, the measurements are by definition free of outliers, since the deviation from the nominal DOA values is only due to the additive error model [22].
In order to enforce the statistical validity of the analysis of the cost function, we took care of two aspects.First, we bypassed the source localization refinement step by setting  E = ∞.This avoided the risk of incidentally removing DOAs from the measurement set, which is explicitly free of outliers.Second, we verified (at least empirically) that the iterative minimization procedure converges to the global optimum.Even though the cost function (s) is not convex, for the considered setup we observed that it presents a very smooth behavior, along with a wide basin of attraction around its global minimum.This turns to be true independently from the tested source positions.These facts ensure a rapid convergence of the iterative algorithms towards the solution, without significant risks of getting trapped into local optima.An illustrative example is offered in Figure 5, which shows three planar sections of (s) corresponding to a randomly picked realization of noisy DOAs, for a selected source location.In particular, the source is located at s = where s is the true source location and ŝ푙 denotes its estimate at the th Monte-Carlo run and  is the number of realizations.
As a matter of comparison, we computed the lower bound for the RMSE implied by the Cramer-Rao Lower Bound (CRLB).More specifically, given the CRLB in terms of the localization variances  2 푠  (s),  2 푠  (s), and  2 푠  (s), the RMSE Lower Bound (RLB) was computed as Details about the computation of the CRLB are given in Appendix.
In Figure 6 we report the resulting RMSE as a function of the distance of the source from the center of the room, for three selected noise standard deviations.The continuous lines refer to the RMSE, while the dashed lines denote the related RLB.We observe that RMSE ranges from a minimum of about 2 cm for  휃 = 0.5 ∘ to a maximum of about 7.5 cm for  휃 = 1.5 ∘ .The RMSE tends to increase as the source position approaches the center of the first array.Moreover, we notice that the RMSE approaches the RLB for source positions close to the room center, while the accuracy slightly degrades for farther sources.Nevertheless, the maximum gap between the RMSE and the RLB maintains reasonable, being slightly above 1 cm in the worst case.
Inspired by this observation, we evaluated the RMSE as a function of the standard deviation of the injected noise for  4 selected source positions located at different distances  from the room center.Results are reported in Figure 7.We first notice that the RMSE grows almost linearly with respect to the standard deviation  휃 .Moreover, once again we notice that the gap between the RMSE and the RLB tends to increase when the distance  of the source from the room center increases.In the worst case (i.e., when  휃 = 3 ∘ and  = 1.6 m) this gap is about 2 cm; thus we can conclude that the proposed method is sufficiently robust against Gaussian additive noise on DOAs.

Test 2: Robustness against Reverberation and Noise.
To mimic a realistic scenario, we simulated microphone signal acquisitions in a reverberant room by means of the image source method [23].In particular, we adopted the implementation provided in the RIR toolbox [24] to generate the room impulse responses from the source position to all the microphones.For this test we considered 50 test source positions randomly selected within the rectangular volume formed by the 8 array centers.The acoustic source was modeled to be omnidirectional and emitted 30 s of phonetically rich female speech.The microphone signals were obtained by convolving the source signal with the related RIRs and finally corrupted with zero-mean Gaussian noise with standard deviation tuned to simulate a prescribed signal-to-noise ratio (SNR).The signals were divided into where a(, ) = [ 푗푟sin(휙)휔  , 0,  −푗푟sin(휙)휔  ] 푇 is the far-field steering vector for the used arrays, each composed of three microphones, and G() is the sample estimate of the array covariance matrix, obtained as x (, ) x (, ) 퐻 . (38) The DOA at the th array is finally estimated as where  푖 () is the geometric mean of the pseudospectra  푖 (, ) with frequency bins  in the range [250 Hz, 8 kHz].
The collected DOAs  푖 ,  = 1, . . ., , were then used to estimate source position with the proposed method.For this test, we enabled the localization refinement step by selecting  E = 10 cm, while ensuring that a maximum of 2 potential outliers are removed from the DOA set.As a matter of comparison, we estimated the source position also using the popular steered-response power with phase transform (SRP-PHAT) method, which is known to be very robust against both noise and reverberation.In particular, we used the Stochastic Region Contraction variant of SRP (SRC-PHAT) [25], which is best suited for 3D scenarios.To enable a fair comparison, the generalized cross correlations (GCC-PHATs) were computed only at microphone pairs belonging to the same linear array, for a total of 3 ×  GCC-PHAT computations.Thinking in terms of distributed localization, these GCC-PHATs functions are transferred to the central node that finally combines them to produce the source position estimate.
The results are reported in Figures 8(a) and 8(b), showing the RMSE achieved by the proposed method and SRP-PHAT, averaged among all the tested source positions and all the analyzed audio frames.In particular, Figure 8(a) reports the localization accuracy at different levels of reverberation, expressed in terms of the reverberation time RT 60 [26], while keeping the SNR fixed to 20 dB.Conversely, Figure 8(b) shows the localization accuracy at different SNR levels, while the reverberation is kept fixed to RT 60 ≤ 0.4s.It can be noticed that the proposed technique achieves better scores than SRP-PHAT up to a moderate amount of reverberation RT 60 ≤ 0.6 s and above an acceptable SNR level (i.e., greater than 15 dB).Out of these ranges, SRP-PHAT is slightly more robust against reverberation and noise.

Discussion.
Even though SRP-PHAT tends to be more robust in particularly adverse acoustic environments, this comes at the expense of both higher computational complexity and higher communication bandwidth requirements.

Figure 3 :
Figure 3: Examples of equivalent arrays all measuring the same DOA.In (a) the arrays are all parallel and with aligned centers.In (b) the arrays are equidistant from the source.

Figure 4 :
Figure 4: Simulation setup:  = 8 microphone arrays are disposed near to the vertices of a rectangular room.Blue circles denote the tested source positions.

Figure 5 :
Figure 5: Planar sections of the 3D cost function (s) for a selected test source position at s = [1.5 m, 1.2 m, 1.2 m] 푇 .Three planar sections are displayed, in correspondence of the planes  = 1.5 m,  = 1.2 m, and  = 1.2 m, respectively.

Figure 6 :
Figure 6: RMSE as a function of the source position, expressed in terms of its distance from the room center.

Figure 7 :
Figure 7: RMSE as a function of the noise standard deviation, for source located at different distances  from the room center.

Figure 8 :
Figure 8: Localization accuracy at different reverberation and SNR levels.

Table 1 :
Array positions and orientations.