This demonstration simulates what speech and music would sound like through a cochlear implant. These simulations were generated using the noise-band vocoder first described in Shannon et al. (Science, 1995, 270: 303-304). While the quality of the sound is not necessarily the same as what a person hears with a cochlear implant, laboratory tests have demonstrated that the intelligibility of speech with this simulation is very similar to that of a cochlear implant.
The simulations are generated by dividing the speech frequency spectrum into several spectral bands using band-pass filters. The envelope is extracted from each band by half-wave rectification and low-pass filtering at 160 Hz. Each envelope signal is then used to modulate a wide-band white noise, which is filtered with a band-pass filter, usually the same filter used in the analysis stage. The modulated noise bands are then summed and written to the enclosed demo files. Brief descriptions of the files are as follows.
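The processing chain above (band-pass analysis, half-wave rectification, 160 Hz low-pass envelope extraction, noise modulation, and synthesis filtering) can be sketched in Python. This is a minimal illustration, not the exact implementation of Shannon et al. (1995): the band-edge frequencies in the example and the simple biquad/one-pole filter shapes are illustrative assumptions.

```python
import math
import random

def biquad_bandpass(signal, fs, f_lo, f_hi):
    """Second-order band-pass filter (RBJ cookbook biquad, constant 0 dB peak gain).

    A simple stand-in for the analysis/synthesis filters; the published
    processors may use different filter orders and shapes.
    """
    f0 = math.sqrt(f_lo * f_hi)          # geometric centre frequency
    q = f0 / (f_hi - f_lo)               # Q from the band edges
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = [0.0] * len(signal)
    x1 = x2 = y1 = y2 = 0.0
    for n, x in enumerate(signal):
        out = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, out
        y[n] = out
    return y

def envelope(band, fs, cutoff=160.0):
    """Half-wave rectify, then smooth with a one-pole low-pass at `cutoff` Hz."""
    a = math.exp(-2 * math.pi * cutoff / fs)
    env, state = [], 0.0
    for x in band:
        state = (1 - a) * max(x, 0.0) + a * state
        env.append(state)
    return env

def noise_vocoder(signal, fs, edges):
    """Noise-band vocoder in the style of Shannon et al. (1995).

    `edges` lists the band-edge frequencies in Hz; N+1 edges give N channels.
    The edge values used by callers here are illustrative, not the published ones.
    """
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in signal]  # wide-band white noise carrier
    out = [0.0] * len(signal)
    for lo, hi in zip(edges, edges[1:]):
        band = biquad_bandpass(signal, fs, lo, hi)     # analysis filter
        env = envelope(band, fs)                       # channel envelope
        carrier = biquad_bandpass(noise, fs, lo, hi)   # band-limited noise carrier
        modulated = [e * c for e, c in zip(env, carrier)]
        # Synthesis filter: re-filter so modulation sidebands stay in the band.
        refiltered = biquad_bandpass(modulated, fs, lo, hi)
        out = [o + r for o, r in zip(out, refiltered)]
    return out
```

For example, `noise_vocoder(signal, 16000, [300, 720, 1580, 3400])` would produce a 3-channel simulation; more edges give more channels and better spectral resolution.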
Decrease_Channels and Increase_Channels: These files demonstrate the effect of spectral resolution on speech recognition. The two speech demo files each contain a single sentence that has been processed with a noise-band processor (Shannon et al., 1995) with 1, 2, 4, 8, 16, and 32 bands, as well as the original unprocessed sentence. The audio files are MP3 files and must be played with a program that supports MP3 playback, such as Windows Media Player or Apple QuickTime.
Decrease_Channels contains the original sentence first and then progressively degrades the spectral resolution. The order is: original, 32 channels, 16, 8, 4, 2, and 1 channel. Because the content of the sentence is already known by the time the resolution degrades, it is possible to understand it down to 4 and even 2 spectral channels. Most cochlear implant listeners perform in the 4 to 8 channel range.
Increase_Channels presents the same demonstration in reverse order, starting with the poorest spectral resolution, so the sentence is unintelligible at first. As the number of channels increases from 1 to 2 to 4, you will begin to understand the words, typically at 4 channels. From 4 to 8 to 32 channels the words become clearer and the sound quality improves. Finally, the original sentence is played.
Music1 presents a clip of a very familiar popular song with a single male singer. First you will hear the song with 4 channels, then 8, 16, and 32 channels; finally you will hear the original clip. Notice that you may understand the words of the song at 4 or 8 channels even though you may not recognize the melody, and even at 16 and 32 channels the melody is of poor quality.
Music2 presents the opening lines of a very popular and familiar piece of instrumental music. As with the previous demo, the number of channels increases from 4 to 8 to 16 to 32, and the original clip is played at the end. You will probably not recognize the piece even with 32 channels. However, if you listen to the sequence repeatedly, you will notice that some melodic information is available at 16 and 32 channels, although the quality is still poor compared to the original.
These demonstrations highlight the different roles of the brain and the ear in listening to speech and music. Speech recognition is largely a "top-down" pattern-recognition process in the brain, which tolerates considerable degradation of the signal. Music, in contrast, requires most of the fine spectral detail processed in the cochlea (inner ear) to achieve melody recognition and good sound quality.