Talk Abstract:
Modelling Graph-based Observation Spaces for Segment-Based Speech Recognition
James Glass
MIT
Laboratory for Computer Science
glass@mit.edu
In most current speech recognizers, the observation space of
an utterance consists of a temporal sequence of "frames". An
important property of this framework is that every segmentation
of the input utterance accounts for all of the observations.
In contrast, in a segment-based "feature" framework (whether the
segments are implicit or explicit), each segment is represented by
a fixed-dimensional feature vector, so alternative segmentations
of the utterance produce different feature vector sequences.
The total observation space for all possible segmentations can
be represented as a temporal graph of feature vectors.
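As a minimal sketch of this idea, the following Python enumerates a toy segment graph. The boundary times and the helper names `all_segments` and `segmentations` are hypothetical, chosen only for illustration; a real recognizer would propose boundaries acoustically and attach a feature vector to each segment.

```python
def all_segments(boundaries):
    """Every (start, end) pair of boundaries is one candidate segment."""
    return [(s, e) for i, s in enumerate(boundaries) for e in boundaries[i + 1:]]


def segmentations(boundaries):
    """Yield each path through the graph that covers the whole utterance.

    Assumes `boundaries` is sorted in increasing time order.
    """
    def extend(path, start):
        if start == boundaries[-1]:
            yield tuple(path)
            return
        for end in boundaries:
            if end > start:
                yield from extend(path + [(start, end)], end)

    yield from extend([], boundaries[0])


boundaries = [0, 1, 2, 3]  # hypothetical boundary times
segments = all_segments(boundaries)
for path in segmentations(boundaries):
    # Each segmentation uses only some of the segments in the graph.
    unused = [seg for seg in segments if seg not in path]
    print(path, "leaves out", unused)
```

Every segmentation leaves a different part of the graph unused, which is why scores computed only over the segments on a path are not directly comparable.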
In our work with segment-based speech recognition, we have explored
probabilistic frameworks that allow us to compare different
segmentations by considering the entire observation space of
features. The first approach adds an extra lexical unit that
is defined to map to all segments that do not correspond to
one of the existing units. In our phonetic-based modelling,
we call this unit the anti-phone and use it to model all feature
vectors that are not hypothesized to be a phonetic unit. Two
competing segmentations must therefore account for all segments,
either as a normal acoustic-phonetic unit or as the anti-phone.
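The effect of the anti-phone on scoring can be illustrated with a small Python sketch. It assumes, purely for illustration, that segments are scored independently and that each segment has one hypothesized phone; the log-likelihood values and the names `full_score` and `ratio_score` are made up and are not taken from the talk. Under these assumptions, scoring every segment in the graph (phones on the path, anti-phone off the path) differs from a sum of phone-to-anti-phone log ratios along the path only by a constant.

```python
import math

# Hypothetical log-likelihoods for each segment in a toy graph:
# (log p under its hypothesized phone, log p under the anti-phone).
log_probs = {
    (0, 1): (-1.0, -3.0),
    (1, 2): (-2.0, -2.5),
    (2, 3): (-1.5, -4.0),
    (0, 2): (-3.5, -2.0),
    (1, 3): (-2.5, -3.0),
    (0, 3): (-6.0, -1.0),
}


def full_score(segmentation):
    """Score all segments: phones on the path, anti-phone everywhere else."""
    inside = sum(log_probs[s][0] for s in segmentation)
    outside = sum(lp[1] for s, lp in log_probs.items() if s not in segmentation)
    return inside + outside


def ratio_score(segmentation):
    """Score only the path, as a sum of phone / anti-phone log ratios."""
    return sum(log_probs[s][0] - log_probs[s][1] for s in segmentation)


# Anti-phone score of the whole graph; identical for every segmentation.
ANTI_TOTAL = sum(lp[1] for lp in log_probs.values())

seg_a = [(0, 1), (1, 2), (2, 3)]
seg_b = [(0, 2), (2, 3)]
for seg in (seg_a, seg_b):
    assert math.isclose(full_score(seg), ratio_score(seg) + ANTI_TOTAL)
```

Because the constant is shared, ranking segmentations by the ratio score gives the same order as ranking them by the full score, while touching only the segments on each path.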
An extension of the anti-phone concept partitions the observation
space into near-miss subsets whereby each segment in a hypothesized
segmentation is associated with a near-miss subset of segments
that are not in the segmentation. For the framework to be valid,
the near-miss subsets of any segmentation must be mutually exclusive
and exhaustive.
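One construction that satisfies this requirement on a toy graph is sketched below in Python. It is an assumption made for illustration, not necessarily the assignment used in the talk: a segment is treated as a near-miss of whichever hypothesized segment contains its midpoint, with spans taken as half-open intervals. The names `near_miss_subset` and `is_partition` are hypothetical.

```python
def all_segments(boundaries):
    """Every (start, end) pair of boundaries is one candidate segment."""
    return [(s, e) for i, s in enumerate(boundaries) for e in boundaries[i + 1:]]


def segmentations(boundaries):
    """Yield each path that covers the utterance (boundaries must be sorted)."""
    def extend(path, start):
        if start == boundaries[-1]:
            yield tuple(path)
            return
        for end in boundaries:
            if end > start:
                yield from extend(path + [(start, end)], end)

    yield from extend([], boundaries[0])


def near_miss_subset(segment, segments):
    """Other segments whose midpoint falls inside [start, end) of `segment`."""
    start, end = segment
    return {s for s in segments
            if s != segment and start <= (s[0] + s[1]) / 2 < end}


def is_partition(path, segments):
    """Check the subsets are disjoint and cover every segment off the path."""
    subsets = [near_miss_subset(seg, segments) for seg in path]
    covered = set().union(*subsets)
    disjoint = sum(len(sub) for sub in subsets) == len(covered)
    return disjoint and covered == set(segments) - set(path)


boundaries = [0, 1, 2, 3]  # hypothetical boundary times
segments = all_segments(boundaries)
assert all(is_partition(path, segments) for path in segmentations(boundaries))
```

The check passes because any segmentation covers each time point exactly once, so every off-path segment's midpoint lands in exactly one hypothesized segment.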
In this talk, I will describe the probabilistic frameworks using
the anti-phone and near-miss modelling techniques, and show
how they have been employed in a segment-based recognizer to
achieve state-of-the-art results on a common phonetic recognition
task.
Mathematical Foundations of Speech Processing and Recognition
2000-2001 Program: Mathematics in Multimedia