Small agglomerative microphone array systems have been proposed for use with speech communication and recognition systems. Blind source separation methods based on frequency domain independent component analysis have shown significant separation performance, and the microphone arrays are small enough to be portable. However, the computational complexity is very high because the conventional signal collection and processing method uses 60 microphones. In this paper, we propose a band selection method based on magnitude squared coherence. Frequency bands are selected based on the spatial and geometric characteristics of the microphone array device, which are strongly related to its dodecahedral shape, and the selected bands are nonuniformly spaced. The estimated reduction in computational complexity is 90% with a 68% reduction in the number of frequency bands. The separation performance achieved in our experimental evaluation was 7.45 dB (segmental signal-to-noise ratio) and 2.30 dB (cepstral distortion). These results show an improvement in performance compared to the use of uniformly spaced frequency bands.
Speech communication and recognition systems are widely used today, generally under reverberant and noisy conditions. An acoustic sound field is described by source signals and impulse responses, which correspond to source locations and reflections, in other words, virtual source locations. Voice terminals are usually equipped with microphones, which are used to observe speech signals. In general, the observed signals include several source speech signals, mixed with each other and with the acoustic sound field. Extracting source signals and their locations, which is called encoding an acoustic field, is an important technique for acoustic schemes such as highly realistic communication and speech recognition systems. Blind source separation (BSS) is a useful method for extracting the sound source signals, and frequency domain independent component analysis (FDICA) [
A blind source separation method using a dodecahedral microphone array (DHMA) system for the acoustic field encoding has been proposed by Ogasawara et al. [
To speed up convergence when estimating the separation matrix of FDICA, only a limited number of uniformly spaced frequency bands can be selected for the estimation [
In this paper, we propose a BSS method with a frequency band selection method suitable for a DHMA, in order to reduce computational complexity. This method uses the spatial characteristics of a DHMA, and results show separation performance nearly equivalent to the method proposed in [
Dodecahedral microphone array (DHMA).
Characteristics of the DHMA are fully described in [
The conventional method of BSS is based on FDICA. The permutation problem is solved by the physical characteristics of the DHMA, namely, its dodecahedral shape. In addition, this method does not require prior information, such as the number of sound sources or source locations.
The number of source signals is estimated using the eigenvalues of the spatial covariance matrix. The number of dimensions for FDICA is reduced from the number of microphones using a subspace method (PCA). The separation matrix is estimated by FDICA, and the scaling problem is solved using the projection method [
In this paper, we propose a method involving less computational complexity than the conventional method [
Block diagram of the proposed method.
To improve the efficiency and effectiveness of the frequency domain BSS method, frequency band selection has been proposed by the authors in [
When a band selection method is used, it is very important that the separation matrix of FDICA is trained on highly separable frequency bands. In this paper, magnitude squared coherence (MSC) is used as a criterion for selecting the separable frequency bands. MSC corresponds to a measure of the interference between two signals and, in its standard form, is formulated as follows:

$$\gamma_{xy}^{2}(f) = \frac{|S_{xy}(f)|^{2}}{S_{xx}(f)\,S_{yy}(f)},$$

where $S_{xy}(f)$ is the cross-spectral density of the two signals $x$ and $y$ at frequency $f$, and $S_{xx}(f)$ and $S_{yy}(f)$ are their auto-spectral densities.
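As a rough illustration of how MSC can be estimated from two microphone signals, the following sketch averages cross- and auto-spectra over Hann-windowed frames. The frame length (1024) and shift (256) follow the simulation conditions; the function names, the small regularization constant `eps`, and the white-noise test signals are assumptions made only for this example and are not taken from the paper.

```python
import numpy as np

def stft_frames(x, frame_len=1024, shift=256):
    """Split x into Hann-windowed frames and return their one-sided FFTs."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // shift
    frames = np.stack([x[i * shift:i * shift + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)

def magnitude_squared_coherence(x, y, frame_len=1024, shift=256):
    """Estimate MSC per frequency bin by averaging spectra over frames."""
    X = stft_frames(x, frame_len, shift)
    Y = stft_frames(y, frame_len, shift)
    s_xy = np.mean(X * np.conj(Y), axis=0)   # averaged cross-spectrum
    s_xx = np.mean(np.abs(X) ** 2, axis=0)   # averaged auto-spectra
    s_yy = np.mean(np.abs(Y) ** 2, axis=0)
    eps = 1e-12                              # assumption: guards against 0/0
    return np.abs(s_xy) ** 2 / (s_xx * s_yy + eps)

# Synthetic check: a signal is fully coherent with itself,
# while two independent noise signals have low coherence.
rng = np.random.default_rng(0)
a = rng.standard_normal(40000)
b = rng.standard_normal(40000)
msc_same = magnitude_squared_coherence(a, a)
msc_indep = magnitude_squared_coherence(a, b)
```

Values close to 1 indicate that the two observations are strongly linearly related in that band, and values close to 0 indicate that they are nearly unrelated.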
In this section, we experimentally evaluate the effectiveness of using MSC for a DHMA. As mentioned in Section
Simulation conditions.
Sampling frequency | 40 (kHz) |
Source signal | Speech (6 males, 6 females), 4 (sec) |
Target frequency region | 0–8 (kHz) |
Number of sources | 12 |
Velocity of sound | 340 (m/sec) |
Reverberation time | 138 (msec) |
Window function | Hann |
Window length | 1024 (sample) |
Shift length | 256 (sample) |
FFT length | 1024 (sample) |
Sources and loudspeaker positions.
Example of MSC: same face.
Example of MSC: different faces.
Averaged experimental MSC (AEMSC).
Region of the band selection.
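The idea of selecting bands at different rates in different frequency regions can be sketched as follows. The region boundaries `[0, 20, 120, 200]` are purely hypothetical placeholders (the actual boundaries are determined from the MSC analysis and are not reproduced here), and the function names are illustrative. A uniformly spaced selection is included for comparison.

```python
def select_bands(region_edges, steps):
    """Keep one band in every `step` bands inside each region.

    region_edges: band indices delimiting the regions (hypothetical values).
    steps: decimation step per region, e.g. (5, 3, 3) for rates (1/5, 1/3, 1/3).
    """
    selected = []
    regions = zip(region_edges[:-1], region_edges[1:])
    for (start, stop), step in zip(regions, steps):
        selected.extend(range(start, stop, step))
    return selected

def select_uniform(n_bands, n_selected):
    """Uniformly spaced baseline with n_selected bands out of n_bands."""
    return [round(i * n_bands / n_selected) for i in range(n_selected)]

edges = [0, 20, 120, 200]                     # assumption: placeholder boundaries
all_bands = select_bands(edges, (1, 1, 1))    # every band is used
nonuniform = select_bands(edges, (5, 3, 3))   # region-dependent decimation
uniform = select_uniform(200, 64)             # evenly spaced comparison set
```

With these placeholder boundaries the nonuniform selection keeps 65 bands; the exact count depends on where the real region boundaries lie.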
The source signals are transferred from their locations to the DHMA, and a mixing matrix is described in frequency domain
The BSS method using a DHMA is conducted under an overdetermined condition, because a DHMA can include up to 160 microphones. FDICA assumes that the number of source signals is equal to the number of the microphones, and thus an estimation of the number of source signals and a reduction in the number of observed signals are needed to perform FDICA. In addition, the reduced dimensions must exceed the number of actual sound sources, because not only actual sound source signals but also reflection waves are used in FDICA. The spatial covariance matrix is calculated by an expectation of the observed signal
Thresholding the normalized eigenvalues gives an estimate of the number of virtual sound sources in each frequency band, and the maximum estimated value over all frequencies is assumed to be
Following estimation of the number of virtual sound sources
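The source-counting and dimension-reduction steps described above can be sketched for a single frequency band as follows. This is a minimal sketch under stated assumptions: the threshold value `0.01`, the function names, and the synthetic 8-channel, 3-source test data are all hypothetical, and the whitening form of PCA is one common choice rather than necessarily the one used in the paper.

```python
import numpy as np

def estimate_num_sources(X, threshold=0.01):
    """X: (n_mics, n_frames) complex STFT coefficients of one frequency band."""
    R = X @ X.conj().T / X.shape[1]            # spatial covariance (sample mean)
    eigvals, eigvecs = np.linalg.eigh(R)       # real eigenvalues, ascending
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending
    normalized = eigvals / eigvals[0]
    n_src = int(np.sum(normalized > threshold))  # assumption: fixed threshold
    return n_src, eigvals, eigvecs

def reduce_dimension(X, eigvals, eigvecs, n_src):
    """Project onto the n_src principal components and whiten."""
    E = eigvecs[:, :n_src]
    D = np.diag(eigvals[:n_src] ** -0.5)
    return D @ E.conj().T @ X                  # shape: (n_src, n_frames)

# Synthetic single-band data: 3 sources observed by 8 channels plus weak noise.
rng = np.random.default_rng(0)
n_mics, n_true, n_frames = 8, 3, 2000
A, _ = np.linalg.qr(rng.standard_normal((n_mics, n_true))
                    + 1j * rng.standard_normal((n_mics, n_true)))
S = (rng.standard_normal((n_true, n_frames))
     + 1j * rng.standard_normal((n_true, n_frames)))
noise = 1e-3 * (rng.standard_normal((n_mics, n_frames))
                + 1j * rng.standard_normal((n_mics, n_frames)))
X = A @ S + noise

n_src, eigvals, eigvecs = estimate_num_sources(X)
Z = reduce_dimension(X, eigvals, eigvecs, n_src)
```

In a full system this estimate would be computed for every frequency band, and the maximum over bands would be used as the common dimension, as described in the text.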
The solution of the permutation problem significantly affects separation performance, and Ogasawara et al. have proposed a method that combines the acoustic pressure distribution and the relative phase distance [
The Moore-Penrose pseudoinverse of the separation matrix corresponds to the mixing matrix; in other words, it corresponds to the transfer functions between the source signals and the observed signals. The transfer function is estimated as the
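The relation between the separation matrix and the estimated mixing matrix can be illustrated with a short sketch. It uses `numpy.linalg.pinv` to obtain the pseudoinverse and then forms the image of each separated source at each channel, which is one common form of projection back; whether this matches the paper's exact procedure is an assumption. The matrix sizes, the random test data, and the function name are hypothetical.

```python
import numpy as np

def project_back(W, Y):
    """Return the image of each separated source at each channel.

    W: (n_src, n_ch) separation matrix for one frequency band.
    Y: (n_src, n_frames) separated signals, Y = W @ X.
    Output shape: (n_src, n_ch, n_frames).
    """
    A_hat = np.linalg.pinv(W)                 # (n_ch, n_src) estimated mixing matrix
    return np.einsum('ci,it->ict', A_hat, Y)  # images[i, c, t] = A_hat[c, i] * Y[i, t]

rng = np.random.default_rng(1)
n_src, n_ch, n_frames = 3, 8, 100
W = (rng.standard_normal((n_src, n_ch))
     + 1j * rng.standard_normal((n_src, n_ch)))
X = (rng.standard_normal((n_ch, n_frames))
     + 1j * rng.standard_normal((n_ch, n_frames)))

images = project_back(W, W @ X)

# Rescaling each row of W arbitrarily (the ICA scaling ambiguity)
# leaves the projected-back source images unchanged.
scale = np.diag([2.0, 0.5j, -3.0])
W_scaled = scale @ W
images_scaled = project_back(W_scaled, W_scaled @ X)
```

Because the arbitrary per-source scale cancels between the pseudoinverse column and the separated signal, the projected-back images do not depend on how FDICA happened to scale each output.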
Two similarities, acoustic pressure distribution
The similarity described in (
In this section, computational complexity is estimated from order
In general, eigenvalue decomposition (EVD) is solved by the Householder method (HHM) and the implicit shifted QL method (ISQL) [
Estimated computational complexity is shown in Table
An example estimation is shown in Table
Computational complexity.
Method | Complexity |
---|---|
STFT (forward and inverse) | |
Covariance matrix | |
Eigenvalue decomposition | |
Subspace method | |
Separation matrix | |
Projection method | |
Hierarchical clustering | |
Estimated computational complexity.
Method | Complexity | Complexity | Ratio | Complexity | Ratio |
---|---|---|---|---|---|
STFT (forward and inverse) | | | 1.0 (0%) | | 4.8 |
Covariance matrix | | | 0.4 (60%) | | 25.6 |
Eigenvalue decomposition | | | 0.4 (60%) | | 10.3 |
Subspace method | | | 0.4 (60%) | | 53.5 |
Separation matrix | | | 0.4 (60%) | | 156.6 |
Projection method | | | 0.4 (60%) | | 10.3 |
Hierarchical clustering | | | 0.16 (84%) | - | - |
Total | | | 0.16 (84%) | | 0.80 (20%) |
In this paper, we compare our proposed method with the conventional method in order to evaluate the feasibility of the proposed method, which is intended to balance separation performance against computational complexity.
The experimental conditions are the same as in Figure
Results for each performance criterion are shown in Table
Experimental results.
Configuration | Number of selected bands | Computational complexity | SIR improvement (dB) | Segmental SNR (dB) | Cepstral distortion (dB) |
---|---|---|---|---|---|
(1,1,1) (conventional method) | 200 | 1.0 | 24.4 | 7.85 | 2.65 |
(1/3,1/2,1) | 134 | 0.45 | 22.3 | 7.80 | 2.51 |
(1/3,1/2,1/2) | 97 | 0.24 | 21.8 | 7.74 | 2.48 |
(1/5,1/3,1/2) | 77 | 0.15 | 20.6 | 7.52 | 2.30 |
(1/5,1/3,1/3) | 64 | 0.1 | 20.8 | 7.45 | 2.30 |
Uniformly spaced | 64 | 0.1 | 19.9 | 6.18 | 2.58 |
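The computational complexity column in the table above appears consistent with a total cost that scales roughly with the square of the fraction of bands kept, which is what one would expect if the hierarchical clustering stage dominates. The following check is only an observation about the tabulated numbers, not a formula taken from the paper; the function name is illustrative.

```python
def quadratic_ratio(n_selected, n_total=200):
    """Complexity ratio under the assumption of quadratic scaling in band count."""
    return (n_selected / n_total) ** 2

# Number of selected bands -> complexity ratio reported in the table.
reported = {134: 0.45, 97: 0.24, 77: 0.15, 64: 0.1}

for n_selected, table_value in reported.items():
    estimate = quadratic_ratio(n_selected)
    print(f"{n_selected:3d} bands: quadratic estimate {estimate:.3f}, "
          f"table value {table_value}")
```

Each quadratic estimate differs from the tabulated ratio by less than 0.01, which matches the rounding of the table.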
SIR improvement.
Segmental SNR.
Cepstral distortion.
The configuration “(1,1,1)” in Table
The proposed band selection method for frequency domain BSS achieves a favorable trade-off between separation performance and computational complexity, with only slight degradation of sound quality. On the other hand, its separation performance is degraded relative to the conventional method, which uses all of the frequency bands; this is a disadvantage of the proposed method. Joint diagonalization using second-order statistics (SOS), such as TRINICON, shows lower computational complexity, with a reduction of around 20% in Section
A computationally efficient blind source separation method is proposed for use with an agglomerative DHMA, which can encode the acoustic field. The proposed band selection method uses the spatial characteristics of a DHMA, and a preliminary experiment on magnitude squared coherence provides the criterion for the band selection process. The proposed method uses nonuniformly spaced selection of frequency bands, which in our experiments improved separation performance compared with uniformly spaced band selection. The estimated computational complexity was greatly reduced in the hierarchical clustering stage, and thus the total reduction in complexity exceeds the reduction in the number of frequency bands. For example, if the number of frequency bands is reduced by 60%, the total reduction in computational complexity is 84%. Experimental results show that the proposed method achieves practical separation performance compared to the conventional method. In addition, signal distortion equivalent to that of the conventional method is maintained. Band selection is based simply on the spatial characteristics of the DHMA, and therefore any state-of-the-art frequency domain BSS method with a permutation solver can be combined with the proposed method without loss of generality. However, the method proposed in this paper has only been considered for use with an off-line algorithm; future work therefore includes developing an on-line causal method to reduce computational complexity.