We propose a new active nonlinear model of the frequency response of the basilar membrane in the biological cochlea, called the simple dual path nonlinear (SDPN) model, and a novel sound-processing strategy for cochlear implants (CIs) based upon this model. The SDPN model was developed to exploit the level-dependent frequency response characteristics of the basilar membrane for robust formant representation under noisy conditions. Compared to the dual resonance nonlinear (DRNL) model, which was previously proposed as an active nonlinear model of the basilar membrane, the SDPN model reproduces similar level-dependent frequency responses with a much simpler structure and is thus better suited for incorporation into CI sound processors. Analysis of dominant frequency components confirmed that speech formants are represented more robustly after frequency decomposition by the nonlinear filterbank using the SDPN than by the linear bandpass filter array used in conventional strategies. Acoustic simulation and hearing experiments in subjects with normal hearing showed that the proposed strategy yields better syllable recognition under speech-shaped noise than the conventional strategy based on fixed linear bandpass filters.
Cochlear implants (CIs) have been used successfully to restore hearing function in cases of profound sensorineural hearing loss by stimulating the spiral ganglia with electrical pulses. The parameters of the electrical pulses are determined from incoming sound via a sound-processing strategy. Despite great progress over more than two decades, many issues remain to be resolved to achieve successful restoration of hearing in noisy environments, melody recognition, and reduction of cognitive load in patients [
Several methods can be utilized to improve CIs. Among them, the development of novel sound-processing strategies is particularly useful because it can be accomplished by modifying embedded programs in the speech processor and does not require a change of hardware. A sound-processing strategy is defined here as an algorithm that generates electrical stimulation pulses from the processing of incoming sound waveforms; it is also called an encoding strategy. More accurate imitation of normal auditory function is a promising approach for CI sound-processing strategy development [
It has been suggested that speech perception performance can be improved considerably by adopting an active nonlinear model of the basilar membrane in the cochlea, called the dual resonance nonlinear (DRNL) model [
The aforementioned CI performance improvement from using an active nonlinear model of the basilar membrane may result from robust representation of formants under noisy conditions. The DRNL model was first applied to a CI sound processor, and improved speech perception performance was verified in one listener [
Here, we propose a new active nonlinear model of the frequency response of the basilar membrane, called the simple dual path nonlinear (SDPN) model, and a novel sound-processing strategy based on this model. The aim of the present study is only to utilize the advantages of the active nonlinear response, not to replicate in detail the physiological properties of the basilar membrane in the biological cochlea. A subset of results has been presented in a conference proceeding [
Figure
(a) General structure of CI sound-processing strategies. Incoming sound is decomposed into multiple frequency bands, and the relative strength of each subband is then determined with an envelope detector to modulate the amplitudes of stimulus pulses after logarithmic compression. (b) The frequency decomposition stage for the conventional strategy based on a fixed linear bandpass filter array. (c) The frequency decomposition stage for the proposed strategy based on the SDPN model.
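As a concrete illustration of the generic pipeline in (a), the following sketch decomposes a sound with a linear bandpass array, detects envelopes, and applies logarithmic compression to obtain per-channel pulse amplitudes. The filter orders, the 400 Hz envelope cutoff, and the exact compression map are illustrative assumptions, not the parameters used in the paper.

```python
import numpy as np
from scipy.signal import butter, lfilter

def ci_channel_levels(sound, fs, center_freqs, bandwidths):
    """Sketch of a generic CI sound processor: bandpass decomposition,
    envelope detection, and logarithmic compression per channel.
    All filter parameters here are assumed values for illustration."""
    # Envelope low-pass filter (assumed 400 Hz cutoff)
    bl, al = butter(2, 400.0, btype="low", fs=fs)
    levels = []
    for cf, bw in zip(center_freqs, bandwidths):
        lo, hi = max(cf - bw / 2, 50.0), min(cf + bw / 2, 0.45 * fs)
        b, a = butter(2, [lo, hi], btype="bandpass", fs=fs)
        band = lfilter(b, a, sound)
        # Envelope: full-wave rectification followed by low-pass filtering
        env = lfilter(bl, al, np.abs(band))
        # Logarithmic compression into a normalized electrical dynamic range
        levels.append(np.log1p(255.0 * np.clip(env, 0.0, 1.0)) / np.log(256.0))
    return np.array(levels)  # one amplitude track per electrode channel
```

Each row of the returned array would modulate the pulse amplitudes of one electrode in an actual device.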
Figures
Figure
(a) Block diagram of the DRNL model. The output of each cochlear partition is represented as a summation of the outputs from a linear and a nonlinear pathway. (b) Block diagram of the proposed SDPN model.
The block diagram of the SDPN model is shown in Figure
The frequency response of the proposed SDPN model when the center frequency is set to 1500 Hz. When the input amplitude is low, the contribution of the nonlinear pathway is relatively large, so the overall response shows the sharp frequency selectivity determined by the tip filter. As the amplitude increases, the contribution of the linear pathway becomes dominant, and the overall frequency response therefore becomes broader.
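The level-dependent behavior described above can be sketched as a two-path filter. Following the description, the tip filter is taken as the narrowly tuned one; the broken-stick compressive nonlinearity and its gains are assumed stand-ins for the paper's nonlinear block, with purely illustrative parameter values.

```python
import numpy as np
from scipy.signal import butter, lfilter

def bandpass(x, fs, cf, bw):
    b, a = butter(2, [cf - bw / 2, cf + bw / 2], btype="bandpass", fs=fs)
    return lfilter(b, a, x)

def sdpn_channel(x, fs, cf, tip_bw, tail_bw,
                 lin_gain=200.0, comp_gain=2.0, comp_exp=0.3):
    """Sketch of one SDPN channel: a linear, broadly tuned tail pathway
    summed with a nonlinear, sharply tuned tip pathway. The gains and
    exponent are assumed values, not the paper's parameters."""
    tail = bandpass(x, fs, cf, tail_bw)  # linear pathway (broad tuning)
    tip = bandpass(x, fs, cf, tip_bw)    # sharply tuned pathway
    # Broken-stick nonlinearity: linear for small inputs, compressive for
    # large ones, so the tip pathway saturates as input level grows and
    # the linear tail pathway takes over.
    tip = np.sign(tip) * np.minimum(lin_gain * np.abs(tip),
                                    comp_gain * np.abs(tip) ** comp_exp)
    return tail + tip
```

At low levels the high-gain tip pathway dominates and the channel is sharply tuned; at high levels the compressed tip pathway contributes relatively less, so the response broadens toward the tail filter's shape.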
After frequency decomposition, the envelope of each channel output is obtained. We used a conventional envelope detector consisting of a rectifier and a low-pass filter. In addition, we also examined the advantages of using an enhanced envelope detector proposed by Geurts and Wouters [
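A minimal sketch of the conventional detector described here, a full-wave rectifier followed by a low-pass filter; the 100 Hz cutoff and filter order are assumed values:

```python
import numpy as np
from scipy.signal import butter, lfilter

def envelope(x, fs, cutoff=100.0):
    """Conventional envelope detector: full-wave rectification followed by
    a 2nd-order Butterworth low-pass filter (assumed cutoff)."""
    b, a = butter(2, cutoff, btype="low", fs=fs)
    return lfilter(b, a, np.abs(x))
```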
Acoustic simulation can be used to predict performance trends of CI sound-processing strategies and has therefore been utilized for many studies of the development of novel strategies [
Center frequencies and bandwidths of the filter arrays used for frequency decomposition.
4-channel implementation

| | Ch. 1 | Ch. 2 | Ch. 3 | Ch. 4 |
|---|---|---|---|---|
| CFs and BWs of BPFs (in conventional strategy) | | | | |
| CF (Hz) | 460 | 953 | 1971 | 4078 |
| BW (Hz) | 321 | 664 | 1373 | 2426 |
| CFs and BWs of tip and tail BPFs (in proposed strategy) | | | | |
| CF (Hz) | 460 | 953 | 1971 | 4078 |
| BW of tip filter (Hz) | 321 | 664 | 1373 | 2426 |
| BW of tail filter (Hz) | 107 | 221.3 | 457.7 | 808.7 |
8-channel implementation

| | Ch. 1 | Ch. 2 | Ch. 3 | Ch. 4 | Ch. 5 | Ch. 6 | Ch. 7 | Ch. 8 |
|---|---|---|---|---|---|---|---|---|
| CFs and BWs of BPFs (in conventional strategy) | | | | | | | | |
| CF (Hz) | 394 | 692 | 1064 | 1528 | 2109 | 2834 | 3740 | 4871 |
| BW (Hz) | 265 | 331 | 431 | 516 | 645 | 805 | 1006 | 1257 |
| CFs and BWs of tip and tail BPFs (in proposed strategy) | | | | | | | | |
| CF (Hz) | 394 | 692 | 1064 | 1528 | 2109 | 2834 | 3740 | 4871 |
| BW of tip filter (Hz) | 265 | 331 | 431 | 516 | 645 | 805 | 1006 | 1257 |
| BW of tail filter (Hz) | 83.3 | 110.3 | 143.7 | 172 | 215 | 268.3 | 335.3 | 419 |
12-channel implementation

| | Ch. 1 | Ch. 2 | Ch. 3 | Ch. 4 | Ch. 5 | Ch. 6 | Ch. 7 | Ch. 8 | Ch. 9 | Ch. 10 | Ch. 11 | Ch. 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CFs and BWs of BPFs (in conventional strategy) | | | | | | | | | | | | |
| CF (Hz) | 274 | 453 | 662 | 905 | 1190 | 1521 | 1908 | 2359 | 2885 | 3499 | 4215 | 5050 |
| BW (Hz) | 165 | 193 | 225 | 262 | 306 | 357 | 416 | 486 | 567 | 661 | 771 | 900 |
| CFs and BWs of tip and tail BPFs (in proposed strategy) | | | | | | | | | | | | |
| CF (Hz) | 274 | 453 | 662 | 905 | 1190 | 1521 | 1908 | 2359 | 2885 | 3499 | 4215 | 5050 |
| BW of tip filter (Hz) | 165 | 193 | 225 | 262 | 306 | 357 | 416 | 486 | 567 | 661 | 771 | 900 |
| BW of tail filter (Hz) | 55 | 64.3 | 75 | 87.3 | 102 | 119 | 138.7 | 162 | 189 | 220.3 | 257 | 300 |
CF: center frequency, BPF: bandpass filter, BW: bandwidth.
The method of acoustic simulation in the conventional strategy was similar to that of Dorman et al. [
For the generation of an acoustic waveform corresponding to the proposed strategy, frequency decomposition was performed by an array of SDPN models, and then the envelopes of the outputs from each SDPN model were extracted by envelope detectors. Either conventional or enhanced envelope detectors were adopted. The amplitudes of sinusoids were modulated according to the outputs from the envelope detectors. The frequencies of sinusoids were the same as in the simulation using the conventional strategy. Note that we assigned one sinusoid per channel, as the center frequencies of the tail and tip filters were identical. Thus, the results of acoustic simulation can be readily compared to those of the conventional strategy. This is different from the case of acoustic simulation of the DRNL-based sound-processing strategy [
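The acoustic simulation step can be sketched as a sine vocoder: each channel's envelope modulates a sinusoid at the channel center frequency, and the modulated sinusoids are summed. For simplicity this sketch uses a linear bandpass bank for the frequency decomposition (as in the conventional strategy); an SDPN bank could be substituted. Filter orders and the envelope cutoff are assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

def sine_vocoder(sound, fs, center_freqs, bandwidths, env_cut=400.0):
    """Sketch of sine-carrier acoustic simulation: one amplitude-modulated
    sinusoid per channel, placed at the channel center frequency.
    The 400 Hz envelope cutoff is an assumed value."""
    t = np.arange(len(sound)) / fs
    bl, al = butter(2, env_cut, btype="low", fs=fs)
    out = np.zeros(len(sound), dtype=float)
    for cf, bw in zip(center_freqs, bandwidths):
        b, a = butter(2, [cf - bw / 2, cf + bw / 2], btype="bandpass", fs=fs)
        env = lfilter(bl, al, np.abs(lfilter(b, a, sound)))  # channel envelope
        out += env * np.sin(2 * np.pi * cf * t)              # one carrier per channel
    return out
```

Because one sinusoid is assigned per channel, outputs from the conventional and SDPN-based decompositions can be compared directly, as noted above.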
Ten subjects with normal hearing volunteered to participate in the hearing experiment (
Syllable identification tests were performed using closed-set tasks. Consonant-vowel-consonant-vowel (CVCV) disyllables were constructed mainly to test vowel perception performance. Each speech token was fixed to the form of /sVda/; that is, only the first vowel was changed whereas the others were fixed to /s/, /d/, and /a/. The first vowel was selected from /a/, /
The acoustic waveforms of speech tokens were generated by 16-bit mono analog-to-digital conversion at a sampling rate of 22.05 kHz and stored as .wav files. The stored files were played by clicking icons displayed in a graphical user interface on a personal computer prepared for the experimental run. The speech tokens were presented binaurally using headphones (Sennheiser HD25SP1) and a 16-bit sound card (SoundMAX integrated digital audio soundcard). The sound level was controlled to be comfortable for each subject (range:
Figure
The superiority of the active nonlinear models for robust representation of formants under noisy conditions could be demonstrated by dominant frequency component analysis, that is, by plotting the maximum frequencies of the output from each cochlear partition as a function of the center frequency [
Dominant frequency component analysis for the vowel /i/. F1, F2, and F3 are at 270 Hz, 2290 Hz, and 3010 Hz, respectively. Upper row: under quiet conditions. Middle row: under 2.5 dB WGN. Lower row: under 2.5 dB SSN. Left column: by the linear BPF array. Middle columns: DRNL. Right column: SDPN.
From the results of dominant frequency component analysis, formant representation performance could be quantified by counting the cochlear partitions whose maximum output frequencies coincided with the formant frequencies. We defined two formant extraction ratios (FERs), FER1 and FER2, as the ratios of cochlear partitions whose maximum output frequencies matched the 1st and 2nd formant frequencies, respectively. FER1 and FER2 can be regarded as quantitative measures of the saliency of formant representation in the output speech. Since the response characteristics of the nonlinear models change with input level, we observed formant representation performance at various SPLs. Figure
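The dominant frequency component analysis and the FERs described above can be sketched as follows. Here a linear bandpass bank stands in for the cochlear partitions (the SDPN bank would be substituted for the nonlinear case), and the matching tolerance in Hz is an assumed parameter.

```python
import numpy as np
from scipy.signal import butter, lfilter

def dominant_freqs(sound, fs, center_freqs, bandwidths):
    """For each cochlear partition (modeled here as a bandpass channel),
    return the frequency with the largest magnitude in that channel's
    output spectrum."""
    dom = []
    for cf, bw in zip(center_freqs, bandwidths):
        b, a = butter(2, [cf - bw / 2, cf + bw / 2], btype="bandpass", fs=fs)
        y = lfilter(b, a, sound)
        spec = np.abs(np.fft.rfft(y))
        dom.append(np.fft.rfftfreq(len(y), 1 / fs)[np.argmax(spec)])
    return np.array(dom)

def fer(dom, formant, tol=25.0):
    """Formant extraction ratio: fraction of partitions whose dominant
    output frequency matches the given formant frequency (tol is an
    assumed matching tolerance in Hz)."""
    return float(np.mean(np.abs(dom - formant) < tol))
```

Computing `fer` separately for the 1st and 2nd formant frequencies yields FER1 and FER2.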
FER1 ((a) and (c)) and FER2 ((b) and (d)) at various sound pressure levels (SPLs) for the vowel /i/. (a) and (b) under WGN of 2.5 dB SNR. (c) and (d) under SSN of 2.5 dB SNR.
Figure
The envelopes obtained from (a) conventional and (b) enhanced envelope detectors after frequency decomposition by the SDPN model. The arrows in (b) indicate emphasis of speech onset.
The results of hearing experiments using acoustic simulation of the proposed sound-processing strategy based on the SDPN model are shown in Figure
Results of syllable identification tests using the sound-processing strategy based on the SDPN and the conventional envelope detector (under quiet conditions or SSN at 2 dB SNR). (a) 4 channels. (b) 8 channels. (c) 12 channels.
Results of syllable identification tests using the sound-processing strategy based on the SDPN and the enhanced envelope detector (under quiet conditions or SSN at 2 dB SNR). (a) 4 channels. (b) 8 channels. (c) 12 channels.
In this study, we proposed a simple active nonlinear model of the basilar membrane in the cochlea and developed a novel sound-processing strategy for CIs based on this model. Acoustic simulation and hearing experiments in subjects with normal hearing indicated that the proposed strategy provides better syllable identification under speech-shaped noise than the conventional strategy using a fixed linear bandpass filter array.
Some previous experimental studies indicated that the active nonlinear frequency response property contributes significantly to robust representation of formant information in noisy environments. Several models have been suggested to reproduce this property [
Although the DRNL model is one of the most efficient models in terms of computational cost, its purpose is the quantitative description of the physiological properties of the basilar membrane and the detailed replication of experimental results. Its complicated structure and numerous parameters make it unsuitable for a CI sound processor. The SDPN model was therefore developed as a simplification of the DRNL model for use in a CI sound-processing strategy, without compromising the advantages of the adaptive nonlinear frequency response; the emphasis was on qualitatively reproducing the input-dependent response characteristics of the biological cochlea. Many building blocks and parameters of the DRNL model were adopted for the detailed replication of experimental results and are not essential for implementing the level-dependent frequency response. The proposed SDPN is much simpler than the DRNL model yet still provides the level-dependent frequency response, and its lower computational load is beneficial for real-time processing with lower power consumption.
The results of dominant frequency analysis verified that more robust formant representation under SSN could be obtained from the proposed SDPN model. When the SDPN model was used, the output frequency was dominated by formant frequencies in many more cochlear partitions than with the linear bandpass filterbank (Figures
The comparison between the envelopes extracted by the two envelope detectors shown in Figure
A new sound-processing strategy for CIs should be applied in clinical tests for more comprehensive verification. This requires modulation of electrical pulse trains based on the sound processor output. The proposed SDPN-based strategy was developed to employ one amplitude-modulated pulse train per channel, as in actual CI devices, and is thus readily applicable to the existing hardware of current CIs.
In conclusion, we proposed a simple novel model of the active nonlinear characteristics of the biological cochlea and developed a sound-processing strategy for CIs based on this model. The proposed SDPN model is based on the function of the basilar membrane and reproduces its level-dependent frequency response; it is much simpler than the DRNL model and is thus better suited for incorporation into CI sound processors. The SDPN-based strategy was evaluated by spectral analysis and by hearing experiments in subjects with normal hearing. The results indicated that the SDPN model provides advantages similar to those of the DRNL-based strategy in that formants are represented more robustly under noisy conditions. Further improvement in speech perception under noisy conditions was achieved by adopting an enhanced envelope detector.
The authors declare that there is no conflict of interests.
This study was supported by a grant from the Industrial Source Technology Development Program (no. 10033812) of the Ministry of Knowledge Economy (MKE) of the Republic of Korea and by a grant from the Smart IT Convergence System Research Center (no. 2011-0031867) funded by the Ministry of Education, Science and Technology as a Global Frontier Project.