Comparative Study between the Discrete-Frequency Kalman Filtering and the Discrete-Time Kalman Filtering with Application in Noise Reduction in Speech Signals

1 Department of Electrical Engineering, Universidade Federal de Uberlândia, Av. João Naves de Ávila, 2160 Bloco 3N, Campus Santa Mônica, Uberlândia, MG, Brazil
2 Department of Electrical Engineering, Faculdade de Talentos Humanos, R. Manoel Gonçalves de Rezende, 230 São Cristóvão, Uberaba, MG, Brazil
3 Department of Electrical Engineering, Universidade Federal de Goiás, Av. Esperança, s/n. Campus Universitário, Goiânia, GO, Brazil


Introduction
Even with the advent of the Internet, voice transmission is still one of the most important means of communication. The quality and intelligibility of speech signals play a key role in the ease and precision of information exchange. In almost all voice transmission applications, quality can be affected by factors such as ambient noise, losses due to digital link encoding, and interference from other conversations or other signal sources [1].
To overcome these harmful effects, digital speech processing techniques can be employed to reduce or even eliminate them. In recent years, techniques such as spectral subtraction, Kalman filtering, psychoacoustic modeling, and wavelet transforms have gained prominence, especially in noise reduction, and many research efforts have been devoted to improving them.
In [2,3], the authors enhance speech quality by removing the musical noise introduced by spectral subtraction. In [1], the authors combined spectral subtraction and wavelets in a prefiltering approach for noise reduction in speech signals and used the result as the initial estimate for a Kalman filter. Compared with Kalman filtering initialized by wavelets alone or by spectral subtraction alone, their method produced the least spectral distortion and a similar output segmental signal-to-noise ratio.
Since wavelet-based denoising depends heavily on how the approximation and detail coefficients are thresholded, recent research in this area has focused on new thresholds [4,5].
Shao and Chang [6] cascaded a Kalman filter with a bank of wavelet filters and a perceptual weighting filter. They used the masking properties of a psychoacoustic model to derive the weighting filter. According to the authors, the work made two contributions. The first was a wavelet-based auditory model with a perceptual wavelet filter bank that approximates the frequency response of the human auditory system through subband decomposition. The second was a Kalman filter based on a speech state-space model in the wavelet domain, whose computational cost was lower than that of the discrete-time Kalman filter. They were able to reduce noise in different environments with low signal degradation.
Dhivya and Justin [7] proposed a noise reduction technique that applies spectral subtraction to the wavelet approximation coefficients and soft thresholding to the detail coefficients. They used five wavelet filters and compared them according to their output signal-to-noise ratios. Besides the output SNR, they also considered the correlation coefficient and the perceptual evaluation of speech quality (PESQ) criterion.
However, although these algorithms show significant advances in noise removal, most of them neither evaluate nor attempt to minimize spectral distortion. Since the method in [1] provided low spectral distortion, this article presents a comparative study between discrete-time and discrete-frequency Kalman filters that simply use the noisy signal as the initial estimate. According to Fujimoto and Ariki [8], the main difference between the two approaches is that the Kalman filter is more computationally efficient in the frequency domain than in the time domain.
On the other hand, transforming the set of Kalman filter equations to and from the frequency domain introduces significant distortion in the estimated signal. We therefore also applied prefiltering based on spectral subtraction to reduce this distortion. To assess the performance of the algorithms, we measured both the segmental signal-to-noise ratio of the outputs and the Itakura-Saito distance.
This article is structured as follows: Sections 2 and 3 describe the discrete-time and discrete-frequency Kalman filtering algorithms, respectively. Section 4 presents the experimental results and, finally, Section 5 presents the conclusions.

Discrete-Time Kalman Filtering (DTKF)
In 1960, Rudolf Emil Kalman published the paper "A New Approach to Linear Filtering and Prediction Problems", describing a recursive solution to the discrete-time linear filtering problem [1]. Since then, owing to major advances in digital computing, Kalman filtering has become an important technique in areas such as navigation, process monitoring, economics, and signal reconstruction from noisy samples.
In this article, the Kalman filter development follows the heuristics described by Vaseghi [9]. Thus, the speech signal is modeled as an autoregressive process of order $p$, AR($p$), according to
$$x(n) = \sum_{k=1}^{p} a_k\, x(n-k) + e(n), \tag{1}$$
where $a_k$ ($k = 1, \ldots, p$) are the linear prediction coefficients of order $p$, $e(n)$ is the prediction error associated with the excitation of the source-filter model of speech production, and $x(n)$ is the $n$th sample of the speech signal. In the acquisition of audio and speech signals, most signals are captured in the presence of some type of additive noise. Consequently, the noisy signal can be modeled as
$$y(n) = x(n) + v(n), \tag{2}$$
where $y(n)$ is the noisy speech signal and $v(n)$ is white Gaussian additive noise.
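To make models (1) and (2) concrete, the following Python sketch draws a synthetic AR(p) signal and corrupts it with additive white Gaussian noise. It is only an illustration: the function names, the zero initial conditions, and the fixed seeds are assumptions, not part of the method described in this article.

```python
import random


def synthesize_ar(a, n_samples, sigma_e=1.0, seed=0):
    """Draw x(n) = sum_{k=1}^{p} a_k x(n-k) + e(n), as in (1).

    e(n) is white Gaussian with standard deviation sigma_e. Samples before
    n = 0 are taken as zero (an assumed initial condition).
    """
    rng = random.Random(seed)
    p = len(a)
    x = []
    for n in range(n_samples):
        # Linear prediction from the p most recent available samples.
        prediction = sum(a[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        x.append(prediction + rng.gauss(0.0, sigma_e))
    return x


def add_white_noise(x, sigma_v, seed=1):
    """Return y(n) = x(n) + v(n), as in (2), with v(n) ~ N(0, sigma_v^2)."""
    rng = random.Random(seed)
    return [xn + rng.gauss(0.0, sigma_v) for xn in x]
```

For example, `y = add_white_noise(synthesize_ar([1.3, -0.6], 4000), 0.5)` produces a stable AR(2) signal observed in white noise.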
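From (1) and (2), we can set up the state-space model described by (3) and (4), respectively [9]:
$$\mathbf{x}(n) = \mathbf{A}(n-1)\,\mathbf{x}(n-1) + \mathbf{w}(n), \tag{3}$$
$$\mathbf{y}(n) = \mathbf{H}(n)\,\mathbf{x}(n) + \mathbf{v}(n), \tag{4}$$
where $\mathbf{x}(n)$ is the $p \times 1$ state vector at time $n$; $\mathbf{A}(n-1)$ is the $p \times p$ state transition matrix that relates the current time $n$ to the past time $n-1$; $\mathbf{w}(n)$ is the $p \times 1$ input vector of the state equation, modeled as white noise; $\mathbf{y}(n)$ is the $q \times 1$ observation vector; $\mathbf{H}(n)$ is the $q \times p$ channel distortion matrix; and $\mathbf{v}(n)$ is a $q \times 1$ additive white noise vector [9]. According to Vaseghi [9], $\mathbf{w}(n)$ and $\mathbf{v}(n)$ are assumed to be independent white noise processes, so that
$$E\left[\mathbf{w}(n)\,\mathbf{w}^{T}(k)\right] = \mathbf{Q}(n)\,\delta(n-k), \tag{5}$$
$$E\left[\mathbf{v}(n)\,\mathbf{v}^{T}(k)\right] = \mathbf{R}(n)\,\delta(n-k), \tag{6}$$
where $\mathbf{R}(n)$ and $\mathbf{Q}(n)$ are diagonal covariance matrices related, respectively, to the additive noise and to the prediction error.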
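One common way to cast the AR model and the noisy observation into the state-space form (3) and (4) is the companion (phase-variable) representation sketched below. It assumes a scalar observation, a time-invariant transition matrix built from the LPC coefficients, an observation row [1, 0, ..., 0], and an excitation that enters only the first state; the helper name and these choices are illustrative, since the text does not spell out the exact matrices.

```python
def ar_state_space(a, sigma_e2, sigma_v2):
    """Companion-form matrices for an AR(p) signal observed in white noise.

    State: x(n) = [x(n), x(n-1), ..., x(n-p+1)]^T (p x 1).
    Returns (A, H, Q, R) as nested lists, for a scalar observation.
    """
    p = len(a)
    A = [[0.0] * p for _ in range(p)]
    A[0] = [float(c) for c in a]        # first row: LPC coefficients a_1..a_p
    for i in range(1, p):
        A[i][i - 1] = 1.0               # shift older samples down one slot
    H = [[1.0] + [0.0] * (p - 1)]       # observe only the current sample x(n)
    Q = [[0.0] * p for _ in range(p)]
    Q[0][0] = float(sigma_e2)           # excitation e(n) enters the first state only
    R = [[float(sigma_v2)]]             # variance of the additive noise v(n)
    return A, H, Q, R
```

With this choice, Q and R are diagonal, as the model requires, and the first entry of the state is the speech sample itself.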
The Kalman filter estimates a process using a form of feedback control: the filter first estimates the state of the process at a given time and then obtains feedback in the form of a new (noisy) measurement.
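Brown and Hwang [10] and Vaseghi [9] divide the Kalman filter equations into two groups: the time-update (prediction) equations and the measurement-update (correction) equations. The time update is described by
$$\hat{\mathbf{x}}(n\,|\,n-1) = \mathbf{A}(n-1)\,\hat{\mathbf{x}}(n-1\,|\,n-1), \qquad \mathbf{P}(n\,|\,n-1) = \mathbf{A}(n-1)\,\mathbf{P}(n-1\,|\,n-1)\,\mathbf{A}^{T}(n-1) + \mathbf{Q}(n), \tag{7}$$
while the measurement update is given by (8) and (9):
$$\mathbf{K}(n) = \mathbf{P}(n\,|\,n-1)\,\mathbf{H}^{T}(n)\left[\mathbf{H}(n)\,\mathbf{P}(n\,|\,n-1)\,\mathbf{H}^{T}(n) + \mathbf{R}(n)\right]^{-1}, \tag{8}$$
$$\hat{\mathbf{x}}(n\,|\,n) = \hat{\mathbf{x}}(n\,|\,n-1) + \mathbf{K}(n)\left[\mathbf{y}(n) - \mathbf{H}(n)\,\hat{\mathbf{x}}(n\,|\,n-1)\right], \qquad \mathbf{P}(n\,|\,n) = \left[\mathbf{I} - \mathbf{K}(n)\,\mathbf{H}(n)\right]\mathbf{P}(n\,|\,n-1), \tag{9}$$
where $\mathbf{P}(n\,|\,n)$ is the error covariance matrix at time $n$; $\mathbf{K}(n)$ is the Kalman gain matrix, chosen to minimize the mean-square estimation error, that is, the trace of $\mathbf{P}(n\,|\,n)$; and $\hat{\mathbf{x}}(n\,|\,n)$ is the state estimate at time $n$ given the observations $\mathbf{y}(1), \ldots, \mathbf{y}(n)$.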
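A minimal pure-Python sketch of the time update (7) and measurement update (8) to (9) for a scalar observation with H = [1, 0, ..., 0] follows. The function names, the zero initial state, the identity initial covariance, and the fixed A, Q, and r are assumptions made for illustration, kept constant here for simplicity.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def dtkf(y, A, Q, r):
    """Discrete-time Kalman filter for x(n) = A x(n-1) + w(n), y(n) = H x(n) + v(n).

    H = [1, 0, ..., 0] (scalar observation), E[w w^T] = Q, E[v^2] = r.
    Returns the filtered speech samples, i.e. the first entry of x_hat(n|n).
    """
    p = len(A)
    x = [[0.0] for _ in range(p)]                                # x_hat(0|0) = 0 (assumed)
    P = [[float(i == j) for j in range(p)] for i in range(p)]   # P(0|0) = I (assumed)
    At = transpose(A)
    out = []
    for yn in y:
        # Time update, eq. (7): x_hat(n|n-1) and P(n|n-1)
        x = matmul(A, x)
        APA = matmul(matmul(A, P), At)
        P = [[APA[i][j] + Q[i][j] for j in range(p)] for i in range(p)]
        # Measurement update, eqs. (8)-(9), specialized to H = [1, 0, ..., 0]
        s = P[0][0] + r                                  # H P H^T + R (a scalar)
        K = [P[i][0] / s for i in range(p)]              # Kalman gain P H^T / s
        innovation = yn - x[0][0]                        # y(n) - H x_hat(n|n-1)
        x = [[x[i][0] + K[i] * innovation] for i in range(p)]
        P = [[P[i][j] - K[i] * P[0][j] for j in range(p)] for i in range(p)]  # (I - K H) P
        out.append(x[0][0])
    return out
```

Because H selects only the first state, the matrix inverse in (8) reduces to a scalar division, which keeps the sketch short.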

Discrete-Frequency Kalman Filtering (DFKF)
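Fujimoto and Ariki [8] introduced the discrete-frequency Kalman filter (DFKF) in 2000 to provide a more computationally efficient algorithm. This is accomplished by iterating the Kalman filter equations in the frequency domain and then inverse transforming the estimated spectrum back to the time domain to obtain the estimated signal. To do so, the signal is divided into frames, where $X(k,l)$ is the complex spectrum of the noiseless signal $x(n,l)$ in the $l$th frame and $v(n,l)$ is white Gaussian noise. Thus, the noise-corrupted signal $y(n,l)$ is given by [8]
$$y(n,l) = x(n,l) + v(n,l). \tag{10}$$
Since $x(n,l)$ can be replaced by the inverse discrete Fourier transform (IDFT) of $X(k,l)$, we have
$$y(n,l) = \frac{1}{N}\sum_{k=0}^{N-1} X(k,l)\, e^{j2\pi kn/N} + v(n,l). \tag{11}$$
In matrix notation, (11) becomes
$$y(n,l) = \frac{1}{N}\begin{bmatrix} 1 & e^{j2\pi n/N} & \cdots & e^{j2\pi (N-1)n/N} \end{bmatrix}\begin{bmatrix} X(0,l) \\ X(1,l) \\ \vdots \\ X(N-1,l) \end{bmatrix} + v(n,l), \tag{12}$$
which can be written simply as
$$y(n,l) = \mathbf{F}_n^{T}\,\mathbf{X}_l + v(n,l), \tag{13}$$
where $n$ represents time within the $l$th frame, $N$ is the number of samples in the frame, and $\mathbf{F}_n$ is the $N \times 1$ vector containing the basis of the discrete Fourier transform (DFT). $\mathbf{X}_l$ is the complex spectrum vector of the $l$th frame. Since $\mathbf{X}_l$ does not depend on the time index $n$, there is no state transition matrix in the frequency-domain Kalman equations, so the computational effort of the DFKF is significantly reduced. Analogous to the DTKF, the DFKF can be represented by the following equations:
$$\mathbf{K}(n) = \mathbf{P}(n-1\,|\,n-1)\,(\mathbf{F}_n^{T})^{H}\left[\mathbf{F}_n^{T}\,\mathbf{P}(n-1\,|\,n-1)\,(\mathbf{F}_n^{T})^{H} + R\right]^{-1}, \tag{14}$$
$$\mathbf{P}(n\,|\,n) = \left[\mathbf{I} - \mathbf{K}(n)\,\mathbf{F}_n^{T}\right]\mathbf{P}(n-1\,|\,n-1), \tag{15}$$
$$\hat{\mathbf{X}}_l(n\,|\,n) = \hat{\mathbf{X}}_l(n-1\,|\,n-1) + \mathbf{K}(n)\left[y(n,l) - \mathbf{F}_n^{T}\,\hat{\mathbf{X}}_l(n-1\,|\,n-1)\right], \tag{16}$$
where $(\cdot)^{H}$ denotes the complex conjugate transpose of a matrix and $R$ is the variance of $v(n,l)$.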
To obtain the estimated signal in the time domain, the IDFT must be applied to the spectrum estimate given by (16).
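The frame-wise recursion can be sketched in pure Python with complex arithmetic, as below. This is a hedged illustration, not Fujimoto and Ariki's implementation: it applies a standard Kalman update with no state transition and with the DFT basis row as the observation row, and the prior spectrum `X0`, the initial covariance `p0 * I`, and the scalar noise variance `r` are hypothetical inputs supplied by the caller. The direct O(N^2) DFT helpers are written out only to keep the sketch self-contained.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Direct O(N^2) IDFT: x(n) = (1/N) sum_k X(k) exp(j 2 pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def dfkf_frame(y, X0, p0, r):
    """Kalman estimate of the spectrum X_l of one frame y(0..N-1, l).

    Observation model: y(n, l) = F_n^T X_l + v(n, l), with
    F_n[k] = exp(j 2 pi k n / N) / N and no state transition within the frame.
    X0: prior spectrum; p0: prior variance (P = p0 * I); r: variance of v(n, l).
    """
    N = len(y)
    X = list(X0)
    P = [[complex(p0) if i == j else 0j for j in range(N)] for i in range(N)]
    for n in range(N):
        F = [cmath.exp(2j * cmath.pi * k * n / N) / N for k in range(N)]   # F_n
        Fc = [f.conjugate() for f in F]                                      # (F_n^T)^H
        PFc = [sum(P[i][k] * Fc[k] for k in range(N)) for i in range(N)]     # P (F_n^T)^H
        s = sum(F[i] * PFc[i] for i in range(N)) + r                        # F_n^T P (F_n^T)^H + r
        K = [v / s for v in PFc]                                             # Kalman gain
        FP = [sum(F[k] * P[k][j] for k in range(N)) for j in range(N)]      # F_n^T P
        P = [[P[i][j] - K[i] * FP[j] for j in range(N)] for i in range(N)]  # (I - K F_n^T) P
        innovation = y[n] - sum(F[k] * X[k] for k in range(N))             # y - F_n^T X
        X = [X[k] + K[k] * innovation for k in range(N)]                    # spectrum update
    return X
```

The time-domain estimate of a frame is then `[v.real for v in idft(dfkf_frame(y, X0, p0, r))]`, discarding the rounding-level imaginary residue. Pure Python is far too slow for 512-sample frames; the sketch only illustrates the algebra on short frames.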

Results
To compare the performance of the studied techniques, we used 25 different recorded speech signals sampled at 22050 Hz with 16 bits per sample. Each signal was segmented with a 512-sample Hamming window with 50% overlap. All tests were performed using MATLAB R2013b on a computer with a Core i7 processor and 8 GB of RAM.
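The 512-sample Hamming framing with 50% overlap can be sketched as below. How the processed frames were recombined is not stated here, so the sketch assumes weighted overlap-add normalized by the summed window; the function names are hypothetical.

```python
import math


def hamming(N):
    """Symmetric Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def frame_signal(x, N=512, hop=256):
    """Split x into Hamming-windowed frames of N samples with a hop of N/2 (50% overlap)."""
    w = hamming(N)
    return [[w[n] * x[start + n] for n in range(N)]
            for start in range(0, len(x) - N + 1, hop)]


def overlap_add(frames, N=512, hop=256):
    """Recombine (possibly processed) windowed frames.

    Each output sample is divided by the sum of the window values that covered it,
    so unprocessed frames reconstruct the input exactly over the covered span.
    """
    w = hamming(N)
    length = hop * (len(frames) - 1) + N
    acc = [0.0] * length
    wsum = [0.0] * length
    for i, frame in enumerate(frames):
        for n in range(N):
            acc[i * hop + n] += frame[n]
            wsum[i * hop + n] += w[n]
    return [a / s for a, s in zip(acc, wsum)]   # Hamming never reaches 0, so s > 0
```

Dividing by the summed window avoids relying on the overlapped Hamming windows adding up to a constant.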
The quality of the estimated speech signal at the output of each filter was evaluated using the segmental signal-to-noise ratio (SNRseg). We chose the SNRseg because it is computed over short segments of the speech signal and then averaged, which balances the weights assigned to segments of higher and lower signal energy. The SNRseg is given by [11]
$$\mathrm{SNR_{seg}} = \frac{10}{M}\sum_{m=0}^{M-1}\log_{10}\frac{\sum_{n=Nm}^{Nm+N-1} x^{2}(n)}{\sum_{n=Nm}^{Nm+N-1}\left[x(n)-\hat{x}(n)\right]^{2}},$$
where $x(n)$ and $\hat{x}(n)$ are the original and estimated signals, and $Nm$ and $Nm+N-1$ are the limits of each of the $M$ frames of length $N$. To carry out the tests, the signals were contaminated by additive white noise, and the input segmental signal-to-noise ratio (SNRI) was adjusted to 3 dB.
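The SNRseg definition translates directly into code. In the sketch below, `x` and `x_hat` are assumed to be time-aligned and of equal length, frames are non-overlapping, and trailing samples that do not fill a frame are ignored; many implementations also clip each frame's SNR to a fixed range before averaging, which is omitted here.

```python
import math


def snr_seg(x, x_hat, N=512):
    """Segmental SNR in dB: mean over M frames of 10 log10(signal energy / error energy)."""
    M = len(x) // N
    total = 0.0
    for m in range(M):
        frame = range(N * m, N * m + N)
        signal_energy = sum(x[n] ** 2 for n in frame)
        error_energy = sum((x[n] - x_hat[n]) ** 2 for n in frame)
        total += math.log10(signal_energy / error_energy)
    return 10.0 * total / M
```

For instance, an estimate that is uniformly 10% too small in amplitude gives a per-frame energy ratio of 100, hence an SNRseg of 20 dB.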
As reported by Rabiner and Schafer [12], a suitable way to measure spectral differences is the Itakura-Saito distance $d(\mathbf{b}, \mathbf{a})$. This measure is computed from $\mathbf{a}$ and $\mathbf{b}$, the linear prediction coefficient (LPC) vectors of the original and estimated signals, respectively, and $\mathbf{R}$, the autocorrelation matrix of the original signal. The closer the spectra of the original and estimated signals are to each other, the smaller $d(\mathbf{b}, \mathbf{a})$; an Itakura-Saito distance of zero indicates that the spectra are equal [12].
The DTKF algorithm was employed in the first test, which used the utterance elétrica ("electrical" in Portuguese). The results are shown in Figures 1, 2, and 3. A comparison of Figures 2 and 3 shows the noise reduction, especially during the silent parts of the signal. The output segmental SNR (SNRO) in this case was 10 dB, and the Itakura-Saito distance was 0.3250.
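The distance formula itself is not reproduced in this section, so the sketch below assumes the widely used gain-normalized log-ratio form d(b, a) = log(bᵀRb / aᵀRa), often called the Itakura or log-likelihood-ratio distance; the paper's exact expression may differ (for example, it may omit the logarithm). LPC vectors are obtained with the Levinson-Durbin recursion on the autocorrelation method. The LPC order default `p=10` and the whole-signal (rather than frame-wise) computation are assumptions.

```python
import math


def autocorr(x, p):
    """Biased autocorrelation r(0..p) of x (autocorrelation method)."""
    N = len(x)
    return [sum(x[n] * x[n + k] for n in range(N - k)) for k in range(p + 1)]


def levinson(r, p):
    """Levinson-Durbin recursion.

    Returns inverse-filter coefficients c = [1, c_1, ..., c_p] that minimize the
    prediction-error energy c^T R c; the predictor coefficients are a_k = -c_k.
    """
    c = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + sum(c[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        updated = c[:]
        for j in range(1, i):
            updated[j] = c[j] + k * c[i - j]
        updated[i] = k
        c = updated
        err *= 1.0 - k * k                  # prediction-error energy of order i
    return c


def itakura_distance(x, x_hat, p=10):
    """Log-ratio distance d(b, a) = log(b^T R b / a^T R a) (assumed form).

    a: LPC inverse filter of the original x; b: that of the estimate x_hat;
    R: (p+1) x (p+1) Toeplitz autocorrelation matrix of the original signal.
    """
    r = autocorr(x, p)
    a = levinson(r, p)
    b = levinson(autocorr(x_hat, p), p)
    R = [[r[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)]

    def quad(c):
        return sum(c[i] * R[i][j] * c[j] for i in range(p + 1) for j in range(p + 1))

    return math.log(quad(b) / quad(a))
```

Because `a` minimizes cᵀRc for the original signal, the ratio is never below one, so this form is non-negative and equals zero only when the two LPC vectors coincide, matching the behavior described above.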
The second test kept the same parameters as the first, except that the DFKF was used. The results are shown in Figures 4 and 5. The SNRO was 8 dB, and a comparison of Figures 4 and 5 shows a considerable noise reduction. However, the Itakura-Saito distance was 0.3782, which indicates larger spectral distortion.
Therefore, in this test, the DTKF algorithm produced both smaller spectral distortion and a larger SNRO than the DFKF.
The results for the 25 words are presented in Figures 6 and 7. Figure 6 shows that the SNRO was almost always the same for the DTKF and the DFKF, with an average of 9 dB.
Figure 7 shows that the DTKF algorithm produced smaller spectral distortion in all tests. These results indicate that the DTKF is more suitable than the DFKF for speech processing.
Tests were also performed after prefiltering the noisy signals with spectral subtraction, as in [1]. In all cases, the DTKF produced smaller spectral distortion than the DFKF. The spectral distortions for the 25 words at an SNRI of 3 dB are shown in Figure 8.
A comparison of Figures 7 and 8 indicates that prefiltering yielded only a slight additional reduction in the spectral distortion of the DTKF output.
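For reference, a basic magnitude spectral subtraction prefilter (half-wave rectified, reusing the noisy phase) can be sketched as below. This is a generic textbook form, not necessarily the variant used in [1], and it assumes a noise magnitude spectrum averaged over frames known to contain only noise; the helper names are hypothetical.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Direct O(N^2) inverse DFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    N = len(noise_frames[0])
    acc = [0.0] * N
    for frame in noise_frames:
        for k, Yk in enumerate(dft(frame)):
            acc[k] += abs(Yk)
    return [v / len(noise_frames) for v in acc]


def spectral_subtract(frame, noise_mag):
    """Magnitude spectral subtraction on one frame; the noisy phase is reused."""
    S = []
    for Yk, Nk in zip(dft(frame), noise_mag):
        magnitude = max(abs(Yk) - Nk, 0.0)            # half-wave rectification
        S.append(cmath.rect(magnitude, cmath.phase(Yk)))
    # For a symmetric noise_mag (as obtained from real noise frames), the
    # imaginary part of the inverse transform is only rounding residue.
    return [v.real for v in idft(S)]
```

The prefiltered frames would then replace the noisy ones as the input, or initial estimate, of either Kalman filter.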

Conclusions
This paper presented a comparative study between discrete-time and discrete-frequency Kalman filtering algorithms. Tests were carried out with 25 different words, using the Itakura-Saito distance to measure spectral distortion and the segmental signal-to-noise ratio to evaluate noise reduction.
Although the two algorithms performed very similarly in terms of noise reduction, discrete-time Kalman filtering produced smaller spectral distortion in the estimated signals in all tests. This indicates that discrete-time Kalman filtering is more suitable than discrete-frequency Kalman filtering for reconstructing speech signals corrupted by additive white noise.

Figure 1: Noiseless signal used for comparison with the estimated signal.

Figure 2: Signal contaminated by white noise and applied to the DTKF algorithm.

Figure 4: Signal contaminated by white noise and applied to the DFKF algorithm.

Figure 6: Comparison of the output segmental signal-to-noise ratio (SNRO) for 25 words contaminated by white noise with an input signal-to-noise ratio (SNRI) of 3 dB.

Figure 7: Comparison of spectral distortion for 25 words contaminated by white noise with an input signal-to-noise ratio (SNRI) of 3 dB.

Figure 8: Comparison of spectral distortion for 25 words contaminated by white noise with an input signal-to-noise ratio (SNRI) of 3 dB, with spectral-subtraction prefiltering of the contaminated signal.