Talk Abstract:
Robust Signal Representations for Automatic Speech Recognition
Richard M. Stern
Department of Electrical and Computer Engineering and School
of Computer Science
Carnegie Mellon University
rms@cs.cmu.edu
http://www.ece.cmu.edu/~rms
As speech recognition technology is transferred from the laboratory
to the marketplace, robust speech recognition is becoming increasingly
important. This talk will review and discuss classical and contemporary
approaches to robust speech recognition. We begin by reviewing
the role of cepstral analysis in speech recognition, as implemented
by mel frequency cepstral coefficients and by perceptual linear
prediction, along with the contributions of cepstral differences
to feature vectors. The most tractable types of environmental
degradation are produced by quasi-stationary additive noise
and quasi-stationary linear filtering. These distortions can
be largely ameliorated by the "classical" techniques of cepstral
high-pass filtering (as exemplified by cepstral mean normalization
and RASTA filtering) as well as by techniques that develop statistical
models of the distortion such as CDCN and VTS. Nevertheless,
these types of approaches fail to provide much useful improvement
when speech is degraded by transient or non-stationary noise
such as background music or speech. We describe and compare
the effectiveness of techniques based on missing-feature compensation
and multi-band analysis toward resolving these problems. We
briefly review the literature on signal processing based on
models of the auditory system and comment on its effectiveness
in achieving robustness to date. Finally, we briefly summarize
some recent work in which optimal feature sets for particular
tasks are developed by nonlinear transformations selected to
maximize the likelihood of a particular set of training data.
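
As a rough illustration of why cepstral mean normalization helps against quasi-stationary linear filtering, the sketch below assumes that a fixed channel adds the same offset vector to every frame's cepstrum. Under that assumption, subtracting the per-utterance mean removes the offset, and simple cepstral differences are unaffected by it as well. The function names, the toy cepstral values, and the channel offset are all hypothetical and chosen only for illustration; this is not taken from the talk itself.

```python
# Minimal sketch of cepstral mean normalization (CMN) and cepstral differences.
# Assumption: a fixed linear channel adds one constant vector to every cepstral frame.

def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]

def delta(frames):
    """First-order cepstral differences c[t+1] - c[t-1], repeating the edge frames."""
    n = len(frames)
    out = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        out.append([b - a for a, b in zip(prev_frame, next_frame)])
    return out

def max_abs_diff(xs, ys):
    """Largest element-wise absolute difference between two frame sequences."""
    return max(abs(a - b) for fx, fy in zip(xs, ys) for a, b in zip(fx, fy))

# Toy "clean" cepstra (made-up values): 4 frames x 3 coefficients.
clean = [[1.0, 0.5, -0.2], [1.2, 0.1, 0.0], [0.8, -0.3, 0.4], [1.0, 0.2, -0.1]]

# Hypothetical channel: a constant cepstral offset applied to every frame.
channel_offset = [0.7, -0.4, 0.25]
filtered = [[c + h for c, h in zip(f, channel_offset)] for f in clean]

cmn_clean = cepstral_mean_normalize(clean)
cmn_filtered = cepstral_mean_normalize(filtered)

# Both differences should be zero up to floating-point rounding.
cmn_diff = max_abs_diff(cmn_clean, cmn_filtered)
delta_diff = max_abs_diff(delta(clean), delta(filtered))
print(cmn_diff < 1e-9, delta_diff < 1e-9)
```

The same sketch does not help with additive or non-stationary noise, which does not reduce to a constant cepstral offset; that limitation is what motivates the other techniques named in the abstract.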
Material from talk: PowerPoint (1.5MB)
Mathematical Foundations of Speech Processing and Recognition
2000-2001 Program: Mathematics in Multimedia