Richard M. Stern
Department of Electrical and Computer Engineering and School
of Computer Science
Carnegie Mellon University
rms@cs.cmu.edu
http://www.ece.cmu.edu/~rms
As speech recognition technology is transferred from the laboratory to the marketplace, robust speech recognition is becoming increasingly important. This talk will review and discuss classical and contemporary approaches to robust speech recognition. We begin by reviewing the role of cepstral analysis in speech recognition, as implemented by mel-frequency cepstral coefficients and by perceptual linear prediction, along with the contributions of cepstral differences to feature vectors. The most tractable types of environmental degradation are produced by quasi-stationary additive noise and quasi-stationary linear filtering. These distortions can be largely ameliorated by the "classical" techniques of cepstral high-pass filtering (as exemplified by cepstral mean normalization and RASTA filtering) as well as by techniques that develop statistical models of the distortion, such as CDCN and VTS. Nevertheless, these approaches provide little useful improvement when speech is degraded by transient or non-stationary noise such as background music or competing speech. We describe and compare the effectiveness of techniques based on missing-feature compensation and multi-band analysis in addressing these problems. We briefly review the literature on signal processing based on models of the auditory system and comment on its effectiveness in achieving robustness to date. Finally, we briefly summarize some recent work in which optimal feature sets for particular tasks are developed by nonlinear transformations selected to maximize the likelihood of a particular set of training data.
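As an illustration of why cepstral mean normalization helps with quasi-stationary linear filtering, the following minimal sketch may be useful. It is not taken from the talk; the function name `cepstral_mean_normalize` and the toy numbers are assumptions made for illustration. It relies on the standard observation that a fixed linear channel adds an approximately constant offset to every cepstral frame, so subtracting the per-utterance mean removes that offset.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean taken over all frames of one utterance.

    `frames` is a list of cepstral vectors (lists of floats), one per frame.
    Hypothetical helper written for illustration only.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy "clean" cepstra: three frames, two coefficients each (made-up values).
clean = [[1.0, 2.0], [3.0, -1.0], [2.0, 0.5]]

# A stationary linear channel appears as a constant additive cepstral offset.
channel = [0.7, -0.3]
filtered = [[c + h for c, h in zip(frame, channel)] for frame in clean]

cmn_clean = cepstral_mean_normalize(clean)
cmn_filtered = cepstral_mean_normalize(filtered)

# Largest difference between the two normalized feature sets.
max_diff = max(
    abs(a - b)
    for fa, fb in zip(cmn_clean, cmn_filtered)
    for a, b in zip(fa, fb)
)
```

In this toy setting `max_diff` is zero up to floating-point rounding: the channel offset is removed entirely. The same subtraction does nothing for noise that changes from frame to frame, which is consistent with the limitation noted above for transient or non-stationary interference.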
Material from the talk: PowerPoint (1.5MB)
Mathematical Foundations of Speech Processing and Recognition