Cochlear Implant Simulation

This demonstration simulates what speech and music would sound like through a cochlear implant. These simulations were generated using the noise-band vocoder first described in Shannon et al. (Science, 1995, 270: 303-304). While the quality of the sound is not necessarily the same as what a person hears with a cochlear implant, laboratory tests have demonstrated that the intelligibility of speech with this simulation is very similar to that of a cochlear implant.

The simulations are generated by dividing the speech frequency spectrum into several spectral bands by band-pass filters. The envelope is extracted from each band by half-wave rectification and low-pass filtering at 160 Hz. This envelope signal is used to modulate a wide-band white noise, which is then filtered with a band-pass filter, usually the same filter used in the analysis. The modulated noise bands are then summed and written to the enclosed demo file. Brief descriptions of the files are as follows.

Decrease_Channels and Increase_Channels: These files demonstrate the effect of changing the spectral resolution on speech recognition. The two speech demo files each contain a single sentence that has been processed with a noise-band processor (Shannon et al., 1995) with 1, 2, 4, 8, and 16 bands, as well as the original unprocessed sentence. The following audio files are mp3 files and must be played back on a program that supports mp3, such as Windows Media player and Apple Quicktime.


Decrease_Channels contains the original sentence first and then progressively degrades the spectral resolution. The order is: original, 32 channels, 16, 8, 4, 2, and 1 channel. Since the content of the sentence is known, it is possible to understand the sentences down to 4 and even 2 spectral channels. Most cochlear implant listeners are in the 4 to 8 channel range.





However, Increase_Channels presents the demonstration in the reverse order - starting from poor quality, so that the content of the sentence is not understood at first. As the number of channels is increased from 1 to 2 to 4 you will start to understand the words in the sentence at 4 channels. From 4 to 8 to 32 channels the words are clearer and the sound has better quality. Finally, the original sentence is played.





Music1 presents a clip of a very familiar popular song, with a single male singer. First you will hear the song with 4 channels, then 8, 16, and 32 channels. Finally you will hear the original music clip. Notice that you may understand the words of the song at 4 or 8 channels, even though you may not recognize the melody. Even at 16 and 32 channels the melody is not very good quality.





Music2 presents the opening lines of a very popular and familiar piece of instrumental music. As with previous demo, the number of channels increases from 4 to 8 to 16 to 32. The original music clip is played at the end. Note that you will probably not recognize it even with 32 channels. However if you listen to the sequence repeatedly you will notice that some melodic information is available with 16 and 32 channels, although the quality is still poor compared to the original.





These demonstrations highlight the difference in the role of the brain and the ear in listening to speech and music. Speech is more of a "top-down" pattern recognition process by the brain, which is tolerant of considerable degradation. However, music requires most of the fine details of processing in the cochlea (inner ear) to achieve melody recognition and good quality.