Evaluation of Seven Time-Frequency Representation Algorithms Applied to Broadband Echolocation Signals

Time-frequency representation algorithms such as spectrograms have proven to be useful tools in marine biosonar signal analysis. Although there are several different time-frequency representation algorithms designed for different types of signals with various characteristics, it is unclear which algorithms that are best suited for transient signals, like the echolocation signals of echolocating whales. This paper describes a comparison of seven different time-frequency representation algorithms with respect to their usefulness when it comes to marine biosonar signals. It also provides the answer to how close in time and frequency two transients can be while remaining distinguishable as two separate signals in time-frequency representations. This is, for instance, relevant in studies where echolocation signal component azimuths are compared in the search for the exact location of their acoustic sources. The smallest time difference was found to be 20 μs and the smallest frequency difference 49 kHz of signals with a −3 dB bandwidth of 40 kHz. Among the tested methods, the Reassigned Smoothed Pseudo Wigner-Ville distribution technique was found to be the most capable of localizing closely spaced signal components.


Introduction
The present paper discusses the advantages and disadvantages of seven time-frequency representation algorithms of short transient acoustic signals such as the echolocation signals from dolphins.Previous studies have shown that timefrequency representations of transient biosonar signals are valuable tools when analyzing details in the signals generated by for instance bottlenose dolphins Tursiops truncatus [1,2].However, it is unclear from the literature which timefrequency representation method is best suited for echolocation signals with respect to their inherent properties (signal length, signal to noise levels, frequency component compositions, etc.) or which limitations the different methods have.
This review of methods aims to serve two main purposes.First and foremost it is outlined as a guide to help researchers choose the time-frequency distribution algorithm that is best suited for their specific application in biosonar studies.Secondly, it aims to investigate how close together in time and frequency two transients, with only slightly different timing and/or frequency content, can be from each other while remaining distinguishable by currently existing timefrequency representation algorithms [3,4].
This smallest distinguishable time difference is relevant in for example marine biosonar studies when comparing component azimuths of echolocation signals ("clicks") in different positions within the sonar beam.Such studies would, for instance, be useful when trying to determine the propagation path of echolocation clicks inside a dolphin's head and when trying to establish whether certain echolocation signals originate from one or two sound sources.Reports by Dormer [5], Amundin and Andersen [6], Madsen et al. [7], and Au et al., [8] suggest that the echolocation beam is generated by only one source, the right set of phonic lips.Lammers and Castellote [9] suggest that beluga whales (Delphinapterus leucas) may generate their echolocation clicks by simultaneous activation of both the right and left pairs of phonic lips.Cranford et al. [10] indicate that either the left or the right 2 Advances in Acoustics and Vibration set of phonic lips or simultaneous activation of both sets of phonic lips present in the animals may produce echolocation signals.
The application of adequate time-frequency representation methods will also be useful in studies of the distribution of the frequency components within the beam, as well as in investigations of the internal beam steering ability of echolocating animals [11][12][13].Future studies comparing azimuths and frequency contents of different parts of the echolocation beam are likely to be useful in these areas of biosonar research.However, the smallest distinguishable time difference of transients must first be established theoretically.These results are provided in the present paper.
The signal processing methods have been evaluated after application to echolocation-inspired synthetic test signals where the parameters are based on previously reported measurements and theories regarding the generation of dolphin echolocation [8][9][10][11]14].As a reference, the methods have also been applied to measurement data recorded from a bottlenose dolphin (Tursiops truncatus) performing a target detection task [15].The evaluation is based on holistic visual judgments of the ability of the time-frequency representations to localize the signal components rather than on individual quantified parameters.This strategy was chosen after careful consideration and was based on the fact that an analysis that is strictly based on quantified parameters tends to narrow down the perspective of the algorithms' abilities to visualize the signals in an intuitively interpretable way.This limitation was largely avoided by the chosen method.

Method
The Fractional Fourier Transform has been applied to echolocation signals before [1].However, we chose here not to include the Fractional Fourier Transform in the algorithm comparison study since others have shown that its implementation is problematic.For instance, Bultheel and Martínez Sulbaran [16] demonstrated that the different ways of implementing the Fractional Fourier Transform did not give coherent results.

Spectrogram (Sp).
Traditional spectrum analysis techniques characterize the frequency content of signals without providing any information of when in time the different frequency components of the signal appear.However, the spectrogram divides the signal into several shorter sections, where a regular spectrum is estimated from each section, revealing in which shorter time interval the different frequencies occur.The spectrogram of a signal () is defined as where the window function ℎ(), centered at time , is multiplied with the signal () before the Fourier transform and squared absolute value (see [3]).There are several advantages of using the spectrogram, for example, a fast implementation thanks to the fast Fourier transform (FFT) and a straightforward interpretation.In the ideal case, the window and thereby the time slot should be very short in order to clearly reveal the exact time when the frequencies start and stop.However, a short time slot gives a poor frequency resolution, with the resultant problem that we cannot differ between closely spaced frequencies.Therefore, each application requires a reasonable compromise between high time resolution and high frequency resolution.The length (and shape) of the window function ℎ() determines the resolution in time and frequency.Usually, a proper trade-off between time and frequency resolution is found by choosing a window length of roughly the same size as the length of the signal.The 3 dB bandwidth will then be approximately (2/window length) *   , which is also the best achievable frequency resolution (with   as the sampling frequency).

Wigner-Ville Distribution (WV).
The Wigner-Ville distribution is based on the Fourier transform of the socalled instantaneous autocorrelation, which provides the best possible concentration in time as well as frequency (see [3]).Usually, the expression for the Wigner-Ville distribution is referred to as where () is the measured signal or the corresponding analytical signal.However, this representation can only be used for simple signals, consisting of a single component in the time-frequency plane.With more than one component, cross terms appear.These are located in between the components in the time-frequency plane and can be twice as large as the true signal components.The cross terms show up between all separate components and if the signal is disturbed by high levels of noise, which usually generates many time-frequency components and therefore also many cross terms, it becomes difficult to interpret the Wigner-Ville distribution.
The implementation we have made for this paper is based on the discrete-time, discrete-frequency representation described in Chapter 6 of Boashash [4].

Choi-Williams Distribution (CW).
Various kernels for smoothing of the Wigner-Ville distribution have been proposed to reduce cross terms.The smoothing is formulated as where (, ) is the smoothing kernel and * * denotes double convolution (smoothing), in time as well as frequency.
Possibly the most applied method is the Choi-Williams distribution [17], also called the exponential distribution where and  is a smoothing parameter to be chosen.The Choi-Williams distribution gives a reduction of the cross terms but keeps most of the concentration of the auto-terms, when the signal components are located at different frequencies as well as different times.However, for two components located at the same frequency, or at the same time, the cross terms will still be more or less present and the remaining amplitude of the cross terms could cause a misinterpretation of the distributions.The Choi-Williams kernel is implemented using the sampled discrete-time and discrete-frequency form presented in Chapter 6 of Boashash [4].

Reassigned Multitaper Spectrogram Using Hermite Functions (RMSp).
The concept of multitapers replaces the calculation of the two-dimensional convolution between the smoothing kernel and the Wigner-Ville distribution with socalled multitaper spectrograms, defined as where   (, ) is the spectrogram using the window ℎ  () and  is the number of multitapers.Using Hermite functions as windows has been shown to give the best timefrequency localization and orthonormality in the timefrequency domain.Orthonormality leads to optimal reduction of variance if the disturbances can be defined as white Gaussian noise, which results in a better estimate when the signals are heavily disturbed by noise.Another advantage of the multitaper technique is its easy and fast implementation.The multitaper spectrogram smoothes the time-frequency representation resulting in a lower time-frequency resolution and therefore the reassignment technique is applied as proposed in Xiao and Flandrin [18].

Locally Stationary Process Multitaper Spectrogram (LMSp).
A locally stationary process is here designed to have a covariance function that is a multiplication of a covariance function of a stationary process and a time-variable function as The parameter  is a design parameter.The process is nonstationary with properties suitable for modeling measured signals that, for example, have a transient amplitude behavior like the echolocation signals of odontocetes.For such signals, the mean square error optimal smoothing kernel and the corresponding multitaper spectrogram representation is defined according to where   (, ) is the spectrogram using the th Hermite function.The parameters   ,  = 1 ⋅ ⋅ ⋅ , correspond to a set of weighting factors that are defined depending on the value of the model-parameter  [19].

Reassigned Smoothed Pseudo Wigner-Ville Distribution (RSmWV).
The reassignment principle has existed for some time but was not applied to any large extent until Auger and Flandrin reintroduced the method [20].The idea behind the reassignment is to first reduce the cross terms by smoothing of the Wigner distribution and then recapture the localization of a single component by reassigning its mass to the center of gravity.In RSmWV, the smoothing is defined as where  1 () and  2 () are separate smoothing window functions for time and frequency, respectively.This so-called separable kernel replaces the 2D convolution of the quadratic time-frequency representation with two 1D convolutions, which might simplify the choice of suitable parameters in the design of the windows.In this form, the spectrogram is easily seen as a weighted sum of all the Wigner-Ville distribution values at the neighboring points.The "mass" at these points should be assigned to the centers of gravity, which are calculated as The reassignment is then made according to where  is the Dirac impulse which reassigns (moves) the "mass" of  SP (, V) to the time and frequency values specified by  SP (, V) and  SP (, V).This leads to perfectly localized signals in the case of single component chirp signals, frequency tones, and impulse signals.
The reassignment procedure can be made for all time and frequency values.However, there are several advantages by setting the spectrum values to a lower limit, below which no reassignment is performed.This limit is mostly related to the "noise floor," the disturbances showing up randomly everywhere in the time-frequency plane, caused by the noise disturbance of the signal.The other advantage is the faster computation as the reassignment made at every time and frequency value is a heavy calculation procedure.

Multiplication of Short Time Fourier Transforms (SpM).
The spectrogram multiplication method is based on an optimal combination of spectrograms from a minimum mean cross-entropy criterion, [21] and the solution becomes a multiplication of a long-window and a short-window spectrogram (geometric mean), The idea of this method is to overcome some of the disadvantages of having a trade-off between a high time resolution and a high frequency resolution.The signal components that are present in both spectrograms (both the long-window and the short-window spectrograms) are enhanced with this method.where  = time.Each component was sampled with   = 1 MHz and windowed with a Gaussian window (GW) before summation.The window can be described by

Description of the
where The windows smoothed the frequency spectra so that the true start frequencies were in fact lower than   .The start frequency of the 100 kHz signal component was stepwise decreased until all of the time-frequency representation methods displayed S1 as one single signal and the two components merged into one single click structure.This was done in order to establish the smallest frequency difference that the two components of S1 could have while still being represented as two components by each of the studied timefrequency representations.
The second test signal (S2) (see Figure 2) was a synthetic two-component signal comprising two successive transient chirps with overlapping frequency content; see Figure 2.Both chirps were created in the same way as those in S1.The start frequency of both components was set to 50 kHz and the start time difference was 35 s.The start time difference was later stepwise decreased until the two signal components were represented as a single signal by all methods.This was done in order to establish the smallest distinguishable time difference between two signal components.
The third tested signal (S3) comprised one transient up-sweep chirp and one shorter transient chirp partially overlapping in both time and frequency content; see Figure 3.The shorter of the two components was windowed with a Gaussian window described by (15) where  = 10 −5 /3 s.The amplitude of that component was doubled compared to the other one before windowing to compensate for signal loss due to the narrower window.
Generally, the further apart the components are in either frequency or time, the easier they are to accurately represent in both time and frequency.Therefore, the frequency and/or time differences between the signal components have been chosen in order for some of the studied methods to collapse and not be able to separate the two components.This was Advances in Acoustics and Vibration  done to enlighten both the advantages and limitations of the different methods.
The capacity of the methods to cope with noise was tested by adding white Gaussian noise with a mean value of zero and a variance of 0.1 to S3.
The fourth tested signal (S4) was an echolocation click from a male bottlenose dolphin (Tursiops truncatus).The signal was recorded 45 degrees off the main acoustic axis of the animal and contained two transients, separated in time (see Figure 4).A complete description of how the signal was measured can be found by Branstetter et al. [15].In summary, the animal was performing a strictly echoic target detection task while stationed on a bite plate.The signal was recorded with 300 kHz and band pass filtered from 100 Hz to 200 kHz.
All the time-frequency representation figures (Figures 5-10) consist of grids of 300 time points and 819 frequency   is narrowed, the resulting picture will visualize just one component.When the time difference between the signal components was decreased to 20 s, only RMSp, RSmWV, and SpM displayed them as separate signals.Signals closer in time than 20 s were visualized as a single signal.It is also notable that the shape of the slope of the chirp in S2 was discernible with RMSp, in this case (compared to the previous example with S1), and the slope could not be clearly visualized by any of the other methods.S3 is a good example of a signal that is difficult to display accurately in the time-frequency plane (see Figure 7).RMSp, RSmWV, and LMSp displayed it as containing two components separated in time and partially overlapping in frequency content, as the frequency resolution was too low using the best possible time-resolution.The RSmWV and the LMSp representations factually indicated that the signal contained one short broadband transient in addition to one longer and more narrow band transient.All other methods represented S3 as a triangular structure, caused by the cross terms.
Adding white Gaussian noise of 0.29 V rms to S3 (see Figure 8) gave the result shown in Figure 9.With the exception of RSmWV, all methods demonstrated a collapse in the presence of intense noise.The Sp, LMSp, and Spm all become difficult to interpret, due to the smoothing of signal components and additional interfering with noise components.The cross terms with noise components in WV and CW make these time-frequency spectra impossible to interpret.In the RMSp reassignment of noise components to single time-frequency points also makes the visualization unclear.
According to all methods, S4 contained two click components and one low-frequency click approximately 63 s before a more high-frequency click (Figure 10).The center frequency of the two clicks was 25 kHz and 103 kHz.The time-frequency visualizations are most similar to the example signal S3, as there are both a time and a frequency difference between components.The typical view with cross terms between components is seen for the WV and CW and the smoothing, especially in frequency, is clear for the Sp, LMSp, and SpM.However, RMSp and RSmWV showed more localized signal energies as compared to the other methods.
Table 1 summarizes the results in terms of which methods that were able to localize the signal components in both time and frequency for the smallest possible frequency and time differences.The RSmWV method was the only one that managed to localize both signal components accurately for extremely small time and frequency differences between the signal components.
The two-component Gaussian windowed chirp signal used in this study consists of components that are approximately 70 s long.For components overlapping only in time, a frequency difference no smaller than 49 kHz was required for signals with a −3 dB bandwidth of 40 kHz in order to be able to clearly localize both signal components.For components overlapping only in frequency, the time difference had to be at least 20 s for any of the investigated methods to visually resolve them as two components.For signals of more complex structure, for example, overlapping in frequency as well as in time, and where the two components were of different length, special care had to be taken when interpreting the results.For such signals, it is advisable to base the interpretation on the results from more than one method.
Furthermore, the results indicate that the RSmWV method was the least sensitive to noise disturbances.It was also the most robust when taking into account that different noise realizations result in rather different appearances of the time-frequency representations.RSmWV and RMSp are both based on smoothing or multitapering, which reduces noise disturbances in the time-frequency plane.The reason for the RMSp and RSmWV to perform different and better than  the other methods is however the reassignment technique, which moves mass of the spectrum to the mass center of the corresponding spectrum.Therefore, the reassignment based methods are not useful for finding, for example, the energy or amplitude of components but is a technique for visualization to resolve time-frequency components.A general reflection is that these results also indicate that the

Figure 1 :
Figure 1: Test signal 1 (S1) is composed of two transient up-sweep chirps overlapping in time and with separate frequency content.

Figure 2 :
Figure 2: Test signal 2 (S2) is composed of two transient down-sweep chirps overlapping in frequency content with a 35 s start time difference.

Figure 3 :
Figure 3: Test signal 3 is composed of two consecutive transient up-sweeps with different chirp rates, partially overlapping in frequency content.

Figure 4 :
Figure 4: Time signal and frequency spectrum of a bottlenose dolphin (Tursiops truncatus) echolocation click recorded 45 degree off axis, to the right of the animal.

Figure 8 :
Figure 8: Test signal 3 after adding white Gaussian noise with a mean value of zero and a variance of 0.1 to S3.
reassignment technique is very useful when it comes to noisy transient signals and would thus benefit scientists in marine biosonar research.Nevertheless, there is room for further developments of for instance design parameters and smoothing/multitaper techniques of time-frequency representations to further optimize the reassignment methods to short transient biosonar signals in future studies.
Hz/s, the start frequencies (  ) of each signal component are 40 kHz and 100 kHz, and the total signal is approximately 70 s long.It is generated by the superposition of two chirp components (  ,  = 1, 2) defined by

Table 1 :
Summary of results.