Improving the Sound Source Identification Performance of Sparsity Constrained Deconvolution Beamforming Utilizing SFISTA

In this paper, an alternative sparsity constrained deconvolution beamforming utilizing the smoothing fast iterative shrinkage-thresholding algorithm (SFISTA) is proposed for sound source identification. Theoretical background and solving procedures are introduced. The influence of the SFISTA regularization and smoothing parameters on the sound source identification performance is analyzed, and recommended values of the parameters are obtained for the presented cases. Compared with the sparsity constrained deconvolution approach for the mapping of acoustic sources (SC-DAMAS) and the fast iterative shrinkage-thresholding algorithm (FISTA), the proposed SFISTA with appropriate regularization and smoothing parameters has faster convergence, higher quantification accuracy and computational efficiency, and lower sensitivity to measurement noise.


Introduction
Beamforming [1][2][3] based on a microphone array has become a popular sound source identification technology for aircraft [4], express trains [5], wind turbines [6], automobiles [7], etc. Conventional beamforming (CB) suffers from poor spatial resolution at low frequencies and numerous spurious sources at high frequencies [8][9][10]. To overcome these issues, various deconvolution beamforming techniques with different solving algorithms have been developed, such as the deconvolution approach for the mapping of acoustic sources (DAMAS) [11], nonnegative least squares (NNLS) [12], and Richardson-Lucy (RL) [12], together with their fast Fourier transform (FFT) based variants: DAMAS2 [13], FFT-NNLS [12], and FFT-RL [12]. In 2015, on the basis of the iterative shrinkage-thresholding algorithm (ISTA) [14] and the fast iterative shrinkage-thresholding algorithm (FISTA) [15], which are used to solve inverse problems in image processing, Lylloff et al. [16] proposed FFT-FISTA deconvolution beamforming for sound source identification. Compared to FFT-NNLS, FFT-FISTA has higher computational efficiency and a better convergence rate. In addition, the discussion in Ref. [16] suggested extending FFT-FISTA with a sparsity constraint on the solution to see whether the efficiency could be further improved. However, the proximal operator in FISTA does not have a closed-form solution when solving the sparse recovery problem. This makes it difficult for FISTA to introduce sparsity constraints directly and explicitly. To overcome this difficulty, Zhao et al. [17] recently proposed the smoothing fast iterative shrinkage-thresholding algorithm (SFISTA), which can quickly process large-scale problems in the compressive sensing framework. To the authors' knowledge, SFISTA has not yet been adapted to deconvolution beamforming to enhance sound source identification performance.
In addition, several similar deconvolution beamforming techniques successfully include the sparse distribution constraint of sound source, such as sparsity constrained DAMAS (SC-DAMAS) [18], robust super-resolution approach with sparsity constraint (SC-RDAMAS) [19], and orthogonal matching pursuit DAMAS (OMP-DAMAS) [20]. SC-DAMAS and SC-RDAMAS are solved by the CVX toolbox [21], and the calculation speed is slow. OMP-DAMAS usually requires a priori information about the number of sound sources to obtain good sound source identification performance.
Inspired by Refs. [16,17], this paper proposes a SFISTA deconvolution beamforming, which includes the sparsity constraint that the main sound sources are usually sparsely distributed.
The proposed approach does not require a priori information about the number of sound sources. Compared to SC-DAMAS and FISTA, the proposed approach enjoys faster convergence, higher quantification accuracy and computational efficiency, and lower sensitivity to measurement noise. The remainder of this paper is organized as follows. Section 2 establishes the theory of SFISTA deconvolution beamforming for sound source identification. Sections 3 and 4 compare the performance of deconvolution beamforming utilizing SC-DAMAS, FISTA, and SFISTA by simulation and experiment, respectively. Section 5 concludes this paper.

Theory
The beamforming based on the cross-spectral imaging function is a very common method for sound source identification, and it is given by equation (1) [22], where $r$ indicates the position of the focus point at which the assumed acoustic source is located, $C \in \mathbb{C}^{M \times M}$ is the cross-spectral matrix of the sound pressure signals measured by the array microphones, $M$ is the number of microphones, $\mathbf{1} \in \mathbb{R}^{M \times M}$ is a matrix with all elements equal to 1, $v(r) = [v_1(r), v_2(r), \ldots, v_m(r), \ldots, v_M(r)]^T$ is the steering vector, $w(r) \equiv [|v_1(r)|^2, |v_2(r)|^2, \ldots, |v_m(r)|^2, \ldots, |v_M(r)|^2]^T$, and the superscripts "T" and "*" represent the transpose and the conjugate operator, respectively. $v_m(r)$ is defined by equation (2), where $k = 2\pi f/c$ is the wave number, $f$ is the frequency, $c$ is the sound speed, $i = \sqrt{-1}$, $r_m$ indicates the position of the $m$th microphone, and $m = 1, 2, \ldots, M$ is the microphone index.
In the case that the acoustic sources are incoherent, the output of beamforming can be expressed as the following linear equation in matrix form:
$$b = Ax + n, \tag{3}$$
where $x = [x(r')] \in \mathbb{C}^{N \times 1}$ is the unknown column vector of the sound pressure at 1 m distance from the corresponding assumed point sound source, which is used to measure the sound source strength; $A = [\mathrm{psf}(r \mid r')] \in \mathbb{C}^{N \times N}$ is the known PSF matrix, in which $\mathrm{psf}(r \mid r')$ expresses the beamforming contribution of the unit-amplitude point source at $r'$ to the focus point at $r$, and $N$ is the total number of focus points; $b = [b(r)] \in \mathbb{C}^{N \times 1}$ is the known column vector of CB outputs; and $n \in \mathbb{C}^{N \times 1}$ represents the noise.
Considering that acoustic sources are usually sparsely distributed, the majority of the elements in the vector x are zero or approximately zero.
That is, the number of nonzero elements is far smaller than the number of zero elements. Assuming that the $\ell_2$-norm of the noise $n$ is bounded by $\varepsilon$, equation (3) can be formulated as
$$\min_x \|x\|_0 \quad \text{subject to} \quad \|b - Ax\|_2 \le \varepsilon. \tag{4}$$
Alternatively, in the field of acoustics, under the restricted isometry property, the above nonconvex $\ell_0$-norm can be approximated by the convex $\ell_1$-norm, leading to the following relaxed problem:
$$\min_x \|x\|_1 \quad \text{subject to} \quad \|b - Ax\|_2 \le \varepsilon. \tag{5}$$
Equation (5) is equivalent to the following unconstrained optimization [17]:
$$\min_x \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1, \tag{6}$$
where $\lambda$ is the regularization parameter. Let $f(x) = \tfrac{1}{2}\|Ax - b\|_2^2$ and $g(x) = \lambda \|x\|_1$. SFISTA solves equation (6) by smoothing the sparse constraint $g(x)$. The nonsmooth $g(x)$ is replaced approximately by the corresponding smoothed Moreau envelope $g_\mu(x)$. Here, $g_\mu(x)$ is continuously differentiable, and its gradient is
$$\nabla g_\mu(x) = \frac{1}{\mu}\left(x - \Gamma_{\lambda\mu}(x)\right), \tag{7}$$
where $\nabla(\cdot)$ represents the gradient, $\mu > 0$ is the smoothing parameter, and $\Gamma_{\lambda\mu}$ is the soft shrinkage operator, defined as
$$\Gamma_{\lambda\mu}(x) = [|x| - \lambda\mu]_+ \odot \mathrm{sgn}(x), \tag{8}$$
where $[|x| - \lambda\mu]_+$ denotes the vector whose components are the maximum of $|x| - \lambda\mu$ and 0, and $\mathrm{sgn}(\cdot)$ is the sign function, which returns the sign of the variable in parentheses. After initialization, the specific steps of the $l$th iteration are as follows: (1) Calculating $\nabla f(y^{(l-1)})$ and $\nabla g_\mu(x^{(l-1)})$: (2) Calculating $x^{(l)}$:

Shock and Vibration
where $P_+$ is the Euclidean projection onto the nonnegative orthant and $L$ is the Lipschitz constant, equal to the largest eigenvalue of $A^T A$.
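The iteration described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It follows the quantities stated in the text (the gradient of $f$ evaluated at $y^{(l-1)}$, the gradient of $g_\mu$ evaluated at $x^{(l-1)}$, the projection $P_+$, and $L$ taken as the largest eigenvalue of $A^T A$), while the zero initialization and the momentum update are assumptions borrowed from standard FISTA. A real-valued $A$ is also assumed, and the names `soft_shrink` and `sfista` as well as the toy PSF matrix are hypothetical.

```python
import numpy as np

def soft_shrink(x, tau):
    """Soft shrinkage operator: sgn(x) * max(|x| - tau, 0), elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def sfista(A, b, lam, mu, n_iter=1000):
    """Hypothetical sketch of a smoothed FISTA loop with nonnegativity projection."""
    L = np.linalg.eigvalsh(A.T @ A).max()  # largest eigenvalue of A^T A
    x = np.zeros(A.shape[1])               # assumed initial guess
    y = x.copy()                           # assumed initial extrapolation point
    t = 1.0                                # assumed initial momentum parameter
    for _ in range(n_iter):
        grad_f = A.T @ (A @ y - b)                     # gradient of f at y
        grad_g = (x - soft_shrink(x, lam * mu)) / mu   # gradient of g_mu at x
        # Gradient step followed by projection onto the nonnegative orthant.
        x_new = np.maximum(y - (grad_f + grad_g) / L, 0.0)
        # Standard FISTA momentum update (assumption, not taken from the text).
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy example: a Gaussian-shaped stand-in for the PSF matrix and two sources.
idx = np.arange(20)
A = np.exp(-0.5 * (idx[:, None] - idx[None, :]) ** 2)
x_true = np.zeros(20)
x_true[5], x_true[14] = 1.0, 0.5
b = A @ x_true
x_hat = sfista(A, b, lam=1e-3, mu=1.0, n_iter=500)
```

In this sketch the only per-iteration costs are two matrix-vector products and elementwise operations, which is consistent with the efficiency argument made for SFISTA relative to a general-purpose convex solver.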

Simulation
To determine the influence of the parameters λ and μ on the sound source identification performance of SFISTA deconvolution beamforming, a 0.65 m diameter Brüel & Kjaer 36-channel sector microphone array, as shown in Figure 1, is used to conduct the simulation. The calculation plane of interest is set as 1 m × 1 m with 51 × 51 focus points. The grid spacing of the focus points is 0.02 m. The distance between the calculation plane and the array plane is 1 m. A point source at each focus point is considered, and its frequency varies from 2000 to 6000 Hz with a step size of 100 Hz (i.e., 2000 Hz, 2100 Hz, . . ., 6000 Hz). The 1 m sound pressure level (SPL) of the point source is 100 dB, the signal-to-noise ratio (SNR) is 20 dB, and the iteration number is 1000. The average deviation between the output of SFISTA and the theoretical value is computed over all the source positions and frequencies, as shown in equation (13). Therein, N f represents the number of frequencies, x(r, r ′ , f) represents the reconstructed SPL (in dB) of focus point r for a certain frequency f and a certain point source at r ′ , and x e (r, r ′ , f) represents the exact one. The deviation result is shown in Figure 2. The smallest deviation occurs in the region where λ is less than 1 and μ is close to 1. Therefore, μ = 1 and μ = 1000 λ (corresponding to the red marker "+" in Figure 2) are used in this paper. Simulations with two known uncorrelated point sources located at (−0. The mainlobe widths of the sources decrease at 6000 Hz, and the two sources are separated. However, many spurious sources appear, which leads to a blurred result. Compared with CB, the other three deconvolution algorithms can effectively narrow the mainlobe width, enhance the spatial resolution, and eliminate the spurious sources. Comparing the 2000 Hz results of the three deconvolution algorithms in Figures 3-5, it can generally be seen that the lower the SNR, the more irregular the mainlobe.
Comparing the submaps (b) to (d) in Figures 3-5, respectively, the mainlobe width of SFISTA is the narrowest, followed by SC-DAMAS and FISTA. Comparing the 6000 Hz results of three deconvolution algorithms in Figures 3-5, there is almost no difference among them due to the high spatial resolution of CB itself.
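For reference, the simulation geometry described at the start of this section can be laid out numerically. The following is a minimal sketch rather than the authors' setup: it assumes that the calculation plane is centered on the array axis and that the sound speed is 343 m/s, neither of which is stated in the text, and all variable names are hypothetical. The conversion from SPL to RMS pressure uses the standard 20 µPa reference pressure in air.

```python
import numpy as np

# Calculation plane: 1 m x 1 m, 51 x 51 focus points, 0.02 m spacing,
# located 1 m from the array plane (centering on the array axis is assumed).
coords = np.linspace(-0.5, 0.5, 51)
X, Y = np.meshgrid(coords, coords)
Z = np.full(X.size, 1.0)
focus_points = np.column_stack([X.ravel(), Y.ravel(), Z])  # shape (2601, 3)

# Frequency sweep: 2000 Hz to 6000 Hz in 100 Hz steps.
freqs = np.arange(2000, 6001, 100)

# Wave number k = 2*pi*f/c; the sound speed value is an assumption.
c = 343.0
k = 2.0 * np.pi * freqs / c

# RMS pressure corresponding to a 1 m SPL of 100 dB (reference 20 uPa).
p_ref = 20e-6
spl_db = 100.0
p_rms = p_ref * 10.0 ** (spl_db / 20.0)  # 2 Pa
```

With this layout the unknown vector in the deconvolution problem has 51 × 51 = 2601 entries per frequency, and the sweep covers 41 frequencies.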
To verify the quantification performance of SFISTA, taking the result of 20 dB SNR as an example, the quantification accuracy of each approach is described. Table 1   deconvolution beamforming are close to the preset 1 m SPL of the source. This indicates that the sound source can be accurately quantified by the integral value of the mainlobe. Then, the difference between the mainlobe integral value and the corresponding peak value of each deconvolution beamforming approach is compared, and the difference of SFISTA (about 0.1 dB) is smaller than that of SC-DAMAS (about 1.3 dB) and FISTA (about 4.2 dB). Namely, both the mainlobe integral value and the mainlobe peak value of SFISTA are close to the peak value of CB. This indicates that the mainlobe width of SFISTA is the narrowest and the convergence is the best at relatively low frequency. At 6000 Hz, due to the high spatial resolution of CB itself, the mainlobe converges to a grid point after deconvolution, and the difference between the mainlobe peak value and the corresponding mainlobe integral value of each deconvolution beamforming is zero. Further, the mainlobe integral values of each deconvolution beamforming and the corresponding peak values of CB are compared with the preset 1 m SPL of the source. SFISTA and CB are almost the same and closer to the true value than the other two, and the deviation between the true value and the other two deconvolution beamforming approaches is also less than 1 dB. This indicates that all algorithms can accurately quantify the sound source strength, and SFISTA slightly outperforms SC-DAMAS and FISTA at relatively high frequency. Next, assume that a known point source with 100 dB is located at (0, 0, 1) m. In Figure 6, the quantification accuracy, convergence performance, and computational efficiency are further compared for the three deconvolution beamforming algorithms. The black dotted line, the red solid line, and the blue dashed line represent SC-DAMAS, SFISTA, and FISTA, respectively.
The iteration number of FISTA and SFISTA is 1000. Since SC-DAMAS is solved by the convex optimization MATLAB toolbox, the iteration number cannot be set and the default termination condition is applied. Figure 6(a) shows the 1 m SPL deviation between the mainlobe integral and the true value at each frequency. When the frequency is lower than 3000 Hz, the deviations of the three algorithms are similar. When the frequency is higher than 3000 Hz, the deviation of SFISTA is smaller than that of SC-DAMAS and FISTA. In summary, the quantification accuracy of SFISTA is superior to the others. The standard deviation, which is used to measure the convergence, is defined following [12], where x (l) is the reconstructed SPL at the lth iteration and x e is the true one. Since the stopping criterion of SC-DAMAS does not depend on the iteration number and its computational efficiency is low, only the standard deviation curves of FISTA and SFISTA at 2000 Hz are given in Figures 6(b) and 6(c). Figure 6(b) shows the curves of the standard deviation vs. the iteration number, and Figure 6(c) shows the curves of the standard deviation vs. computational time.
As shown in Figure 6(b), the standard deviation of SFISTA decreases rapidly and tends to be stable after about 1000 iterations. The standard deviation of FISTA decreases more slowly and tends to be stable after about 2500 iterations. In addition, the stable standard deviation of SFISTA is less than that of FISTA, which indicates that SFISTA enjoys better quantification accuracy. Furthermore, Figure 6(c) shows that SFISTA takes less computational time than FISTA to achieve the same standard deviation. This shows more intuitively that SFISTA converges faster than FISTA. To sum up, SFISTA has faster convergence and higher quantification accuracy and computational efficiency compared to SC-DAMAS and FISTA. Further, an uncertainty analysis of the sound source identification performance of SFISTA is performed. In practice, the sound source positions are unknown prior to measurement. Therefore, a statistical simulation based on the Monte Carlo approach is used to assess the uncertainty [23][24][25]. A Monte Carlo simulation with 200 runs is performed. The monopole sound source with 100 dB is randomly placed on a 50 cm × 50 cm plane at a distance of 1 m from the microphone array. The SPL at 1 m distance from the sound source is retrieved by integration over 4 segments of 0.02 cm × 0.02 cm that are defined in the map around the maximum value [26,27]. The sound frequencies are 2000 Hz and 6000 Hz, and the SNRs are 20 dB and 40 dB. Figure 7(a) is the cumulative distribution function (CDF) of the location error. In general, the location accuracy at 6000 Hz is higher than that at 2000 Hz, and the accuracy is higher for 40 dB SNR than for 20 dB SNR. Except for a few points whose location error is greater than one grid interval in the case of 2000 Hz and 20 dB SNR, the location errors of the other points are less than one grid interval, which indicates that almost all the identified sound source positions fall on

Experiment
As shown in Figure 8, the same microphone array as that in Section 3 is used for the experiment. and 9(f)-9(h) are the results of SC-DAMAS, FISTA, and SFISTA at 2000 Hz and 6000 Hz, respectively. As shown, all deconvolution beamforming approaches can locate the sound sources accurately. Table 2 gives the experimental results of amplitude quantification. Because the actual strengths of the loudspeaker sources were not obtained, the mainlobe peak value of CB is used instead of the source strength to measure the quantification accuracy here. The results at both 2000 Hz and 6000 Hz indicate that the mainlobe integral values of the deconvolution algorithms are close to each other, and the difference between the mainlobe peak value and the corresponding mainlobe integral value is the smallest for SFISTA. Besides, similar to the above simulation conclusion, the mainlobe peak values of SFISTA are closer to those of CB than those of SC-DAMAS and FISTA. This also shows that SFISTA enjoys better spatial convergence performance than SC-DAMAS and FISTA. In summary, SFISTA performs better than SC-DAMAS and FISTA, which is consistent with the simulation conclusion.

Conclusions
In this paper, SFISTA deconvolution beamforming for the sparse sound source identification is proposed. Simulations and experiments indicate that the proposed SFISTA has good acoustic source identification performance. Compared with SC-DAMAS and FISTA, SFISTA enjoys better spatial resolution, convergence performance, quantification accuracy, and computational efficiency.

Data Availability
Datasets generated and analyzed in the current study are available from the corresponding author on reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.