SummerFest '09 Summer School: An Introduction to Speech Processing

Course Description

Prerequisites:

This course is most suitable for participants with some background in signal processing, but it should also be accessible to those without one.

Content:

This course is an introduction to the speech signal and how it is processed by humans and by machines. We begin with a brief overview of how speech is produced. We then look at the properties of the acoustic speech signal in the time and frequency domains, and at how speech is perceived by humans. Next, we look at several methods of analysing the speech signal. Speech signal analysis and human perception are tied together through speech coding (compression of audio signals), in particular perceptual coding of sound using psychoacoustic models, as in MP3. We then examine techniques for automatic speech recognition, speech synthesis and cochlear implant sound coding, and finish with an overview of areas of ongoing speech processing research.

Overview:

The aim of this course, “Introduction to Speech Processing”, is to provide an understanding of the basic techniques used in the major applications of speech coding, speech recognition, speech synthesis, and speech processing for the cochlear implant (Bionic Ear). The components of the course are as follows:

The Speech Signal: This will briefly cover the basic characteristics of the speech signal and the key acoustic features that are important for its perception and characterisation. These features include waveform properties and spectral properties, such as formants and other regions of high energy.

Speech Perception: This will cover how the ear converts acoustic vibrations into action potentials that are sent to the brain, and will introduce the concepts of critical bands and perceptual masking, including frequency and temporal masking.
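Critical bands are often described on the Bark scale, on which one unit corresponds roughly to one critical bandwidth. The sketch below uses the Zwicker and Terhardt (1980) analytic approximation; the course may well use a different auditory model, and the function names hz_to_bark and same_critical_band are hypothetical, chosen only for illustration. The one-Bark test is a rough rule of thumb for whether two tones fall within the same critical band (and so interact strongly in frequency masking), not a precise masking model.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Zwicker & Terhardt (1980) approximation; other Bark formulas exist.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def same_critical_band(f1_hz: float, f2_hz: float) -> bool:
    """Rough rule of thumb: are two tones less than one Bark apart?"""
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) < 1.0


if __name__ == "__main__":
    for f in (100, 500, 1000, 4000, 8000):
        print(f"{f:5d} Hz -> {hz_to_bark(f):5.2f} Bark")
    print(same_critical_band(1000, 1050), same_critical_band(1000, 2000))
```

Because the scale compresses high frequencies, a 50 Hz spacing near 1 kHz stays well within one Bark, whereas 1 kHz and 2 kHz are several Bark apart.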

Analysis of Speech: This section will briefly cover methods for analysing speech, including windowing, energy and magnitude extraction, zero crossings, autocorrelation, and frequency domain processing.
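As a minimal sketch of the time-domain measures listed above, the code below frames a signal, applies a Hamming window, and computes short-time energy, short-time magnitude, zero-crossing rate and autocorrelation. It then uses the autocorrelation for a crude pitch estimate, which is one common application and is included here as an assumption about how it might be used. All function names are hypothetical. The frame length, hop size and the 50 to 400 Hz pitch range are illustrative choices, not values taken from the course.

```python
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frames(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """Sum of squared windowed samples."""
    w = hamming(len(frame))
    return sum((s * wi) ** 2 for s, wi in zip(frame, w))


def short_time_magnitude(frame: List[float]) -> float:
    """Sum of absolute windowed samples (less dominated by large peaks than energy)."""
    w = hamming(len(frame))
    return sum(abs(s * wi) for s, wi in zip(frame, w))


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """Unnormalised short-time autocorrelation R[k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def estimate_pitch(frame: List[float], fs: float,
                   fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Crude pitch estimate: lag of the largest autocorrelation value in the pitch range."""
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    r = autocorrelation(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]  # 200 Hz test tone
    for f in frames(tone, frame_len=400, hop=200):
        print(short_time_energy(f), zero_crossing_rate(f), estimate_pitch(f, fs))
```

On real speech, voiced frames typically show high energy and a low zero-crossing rate, while unvoiced fricatives show the opposite. This contrast is why the two measures are often used together.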

Audio Coding: The concept of audio coding will be described, followed by techniques for compressing audio using different quantisation schemes. A detailed description of the MP3-type encoding method will be used as an example of perceptual coding of sound.
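To illustrate how quantisation schemes can differ, the sketch below compares a uniform quantiser with μ-law companding, in which samples are compressed logarithmically before uniform quantisation and expanded afterwards. μ-law is used here only as a familiar example of a non-uniform scheme; it is not necessarily one the course covers. The code uses the continuous μ-law curve with μ = 255. The ITU-T G.711 standard uses a segmented, piecewise-linear approximation of this curve, so these values will not match G.711 codewords exactly. Function names are hypothetical.

```python
import math

MU = 255.0  # mu value used in North American and Japanese telephony


def uniform_quantise(x: float, bits: int) -> float:
    """Mid-rise uniform quantiser for x in [-1, 1]."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = math.floor(x / step)
    idx = max(-levels // 2, min(levels // 2 - 1, idx))  # clamp to valid range
    return (idx + 0.5) * step


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Continuous mu-law compressor, maps [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def mu_law_quantise(x: float, bits: int) -> float:
    """Compress, quantise uniformly, then expand."""
    return mu_law_expand(uniform_quantise(mu_law_compress(x), bits))


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.9):
        print(x, abs(uniform_quantise(x, 8) - x), abs(mu_law_quantise(x, 8) - x))
```

With the same number of bits, companding spends more quantisation levels on small amplitudes. Quiet samples, which are common in speech, therefore get a smaller error, at the cost of a larger error on loud samples.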

Automatic Speech Recognition: A large portion of the time allotted will be devoted to describing techniques for automatic speech recognition. After the task and its applications are introduced, four different approaches will be presented: knowledge-based, template-based, hidden Markov models, and artificial neural networks. A worked example of dynamic time warping will help to explain the template matching approach.
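Dynamic time warping aligns a test utterance with a stored template by finding the minimum-cost path through a grid of frame-to-frame distances. The sketch below assumes the simplest symmetric step pattern (horizontal, vertical or diagonal moves, each adding the local distance) and a Euclidean distance between feature vectors. The course's worked example may use different step constraints, weights or features. The names dtw_distance and recognise, and the toy templates, are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Local distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template: List[Sequence[float]], test: List[Sequence[float]]) -> float:
    """Cumulative cost of the best warping path between two feature sequences.

    Allowed predecessors of cell (i, j): (i-1, j), (i, j-1), (i-1, j-1).
    """
    n, m = len(template), len(test)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(template[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognise(test: List[Sequence[float]],
              templates: Dict[str, List[Sequence[float]]]) -> str:
    """Template-matching recogniser: return the label of the closest template."""
    return min(templates, key=lambda label: dtw_distance(templates[label], test))


if __name__ == "__main__":
    templates = {
        "yes": [[0], [1], [2], [1], [0]],
        "no": [[2], [2], [0], [0], [2]],
    }
    spoken_slowly = [[0], [0], [1], [2], [2], [1], [0]]  # time-stretched "yes"
    print(recognise(spoken_slowly, templates))
```

Because the warping path can repeat template frames, the slower utterance aligns with the "yes" template at zero cost, even though the two sequences have different lengths.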

Speech Synthesis: Formant synthesis and waveform concatenation will be described as the two major approaches for speech synthesis.
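Formant synthesis can be sketched as a source-filter model: a periodic excitation is passed through resonators tuned to the formant frequencies. The code below uses the second-order digital resonator from Klatt's formant synthesiser, arranged in cascade, with a plain impulse train standing in for the glottal source. Real synthesisers also shape the glottal pulse and model lip radiation. The formant and bandwidth values in the usage example are approximate, illustrative figures for an /a/-like vowel, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def resonator(x: List[float], freq: float, bw: float, fs: float) -> List[float]:
    """Second-order digital resonator (Klatt-style) with unity gain at 0 Hz.

    y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    """
    T = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * T)
    b = 2 * math.exp(-math.pi * bw * T) * math.cos(2 * math.pi * freq * T)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = a * s + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0: float, fs: float, n: int) -> List[float]:
    """Crude voiced excitation: one unit impulse per pitch period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def synthesise_vowel(formants: Sequence[Tuple[float, float]], f0: float = 100.0,
                     fs: float = 10000.0, dur: float = 0.5) -> List[float]:
    """Pass the excitation through a cascade of (frequency, bandwidth) resonators."""
    signal = impulse_train(f0, fs, int(fs * dur))
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal


if __name__ == "__main__":
    vowel_a = synthesise_vowel([(730, 60), (1090, 90), (2440, 120)])  # illustrative values
    print(len(vowel_a), max(abs(v) for v in vowel_a))
```

Changing the (frequency, bandwidth) pairs changes the perceived vowel, and changing f0 changes the pitch. This independent control of source and filter is the main appeal of the formant approach.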

Sound Processing for Cochlear Implants: The historical development of cochlear implant sound processing will be presented, from the feature extraction strategies of the 1980s to the auditory model-based approaches undergoing research today.

Areas of Research: The course will conclude with an overview of areas of ongoing speech processing research.

Outcomes:

The aim of this course is to provide an understanding of the basic techniques used in the major applications of speech signal processing.

Presenter

Dr David B Grayden

Presenter Biography

David Grayden

David Grayden obtained his PhD in automatic speech recognition in the Department of Electrical & Electronic Engineering at the University of Melbourne. He then worked for nine years at The Bionic Ear Institute on cochlear implant sound processing. David is now a senior lecturer in the Department of Electrical & Electronic Engineering at the University of Melbourne. He is Discipline Coordinator for the Biomedical Engineering teaching programme and one of the leaders of the Neuro-Engineering research group. His teaching and research focus on understanding how the brain processes information and how best to present information to the brain using medical bionics, such as the bionic ear and bionic eye.