Mathematical Modeling in Industry - A Workshop for Graduate Students
Optimizing Language Models and Texts for Automatic Speech Recognition
Tutor: Joan Bachenko, Linguistic Technologies, Inc.
Speech recognizers incorporate three core modules: the decoder,
which performs pattern matching; the language model, which defines
the vocabulary and word patterns; and the acoustic model, which
defines the phone set and phone patterns. The process of recognition
is essentially a series of guesses among thousands of hypotheses.
The job of language and acoustic models is to represent the
hypotheses and their likelihood in order to maximize the recognizer's
chances of getting the right output for speech input. This workshop
will focus on language modeling and on experiments in language
model optimization. Training data, language model software, and
access to a high-quality speech recognizer will be made available
to participating students.
A language model (LM) is a probabilistic model trained on text
data. Most LMs today are trigram models (where "gram" is a word)
that back off to bigrams and unigrams and use a smoothing technique
to handle sparse data. For working speech recognition systems,
the LM is only as good as the text that it trains on. Hence
texts are usually taken from some limited domain, e.g. airline
reservations or radiology, in order to constrain the set of
hypotheses that the LM makes available. If the training text
is too limited, however, the LM will fail to represent n-grams
that are likely to be spoken.
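The back-off idea described above can be illustrated with a minimal sketch. It is not the workshop's software: the function names and the discount `alpha` are hypothetical, and the scheme shown is the simple "stupid backoff" scoring with an add-one unigram fallback, chosen for brevity rather than the smoothing used in production toolkits.

```python
from collections import Counter

def train_ngrams(tokens):
    """Count unigrams, bigrams, and trigrams in a list of word tokens."""
    counts = {n: Counter() for n in (1, 2, 3)}
    for n in (1, 2, 3):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def backoff_score(counts, w1, w2, w3, alpha=0.4):
    """Score w3 given the history (w1, w2), backing off to shorter histories.

    Assumption: "stupid backoff" scoring. The trigram relative frequency is
    used when the trigram was seen; otherwise the bigram estimate is used,
    discounted by alpha; otherwise an add-one smoothed unigram estimate,
    discounted by alpha squared. The results are scores, not normalized
    probabilities.
    """
    tri, bi, uni = counts[3], counts[2], counts[1]
    if tri[(w1, w2, w3)] > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi[(w2, w3)] > 0:
        return alpha * bi[(w2, w3)] / uni[(w2,)]
    total = sum(uni.values())
    vocab = len(uni) + 1  # one extra slot reserved for unseen words
    return alpha * alpha * (uni[(w3,)] + 1) / (total + vocab)

# Toy training text, invented for illustration only.
toy_tokens = "the patient has a fever the patient has a cough".split()
toy_counts = train_ngrams(toy_tokens)
seen_trigram = backoff_score(toy_counts, "has", "a", "fever")
backed_off = backoff_score(toy_counts, "cough", "a", "fever")
```

Here `seen_trigram` comes straight from trigram counts, while `backed_off` falls through to the bigram estimate because the trigram never occurs in the toy text. A word absent from the training text still receives a small nonzero score, which is the sparse-data problem that smoothing addresses.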
One question we will address in the workshop is how to determine
when a training text is good enough to produce a good LM. Another
question is how to partition a training text into minimally
overlapping sublanguage texts in order to build good sublanguage
models. For example, is it possible to predict whether the best
LM includes both pediatrics and general medicine, or whether the
recognizer will perform better with two distinct LMs? If the best
approach is two LMs, how should the distance between them be
measured and optimized?
The workshop will focus on lexical growth and LM interpolation
in an attempt to provide some answers to these questions. Lexical
growth is a measure of the rate at which new words are observed
in a text as the text grows in size. Predictive models of lexical
growth exist and will be discussed. LM interpolation is a statistical
method of weighting texts in LM construction. Although interpolation
is commonly used in adapting LMs to new domains, little is known
about how interpolation can be used to predict LM performance.
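Both notions can be made concrete with a short sketch. The names below are hypothetical, and the interpolation shown is plain linear interpolation of two unigram distributions with a single weight `lam`, which is an assumption about the form of interpolation; real LM toolkits interpolate full n-gram models and estimate the weights from held-out text.

```python
from collections import Counter

def lexical_growth(tokens, step=1):
    """Return (tokens_seen, distinct_words_seen) pairs as the text grows."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

def unigram_probs(tokens):
    """Relative-frequency unigram distribution for a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interpolate(p_a, p_b, lam):
    """Linearly interpolate two word distributions.

    Each word w receives lam * p_a(w) + (1 - lam) * p_b(w); a word missing
    from one distribution contributes probability 0 from that side.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_a) | set(p_b)
    return {w: lam * p_a.get(w, 0.0) + (1.0 - lam) * p_b.get(w, 0.0)
            for w in vocab}

# Toy texts standing in for two sublanguages; invented for illustration.
text_a = "fever cough fever rash".split()
text_b = "fever fracture".split()
curve = lexical_growth(text_a + text_b)
mixed = interpolate(unigram_probs(text_a), unigram_probs(text_b), lam=0.5)
```

A growth curve that flattens suggests the text is close to covering its domain's vocabulary, while one that keeps climbing suggests more training text is needed. Varying `lam` shifts probability mass between the two source texts, which is the weighting that interpolation provides.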
References:
Internet:
Speech at Carnegie-Mellon University: http://www.speech.cs.cmu.edu/index.html
You can download the CMU Statistical Language Modeling Toolkit,
one of the best toolkits for building LMs for research, and read
its documentation.
Speech at Cambridge University: http://svr-www.eng.cam.ac.uk/research/speech
Click on Links to Related Sites to visit other speech recognition
laboratories. Cambridge is closely connected to Entropic Cambridge
Research Laboratory, which produces a highly regarded speech
recognizer called HTK (Hidden Markov Model Toolkit).
Center for Language and Speech Processing at Johns Hopkins University:
http://www.clsp.jhu.edu Follow the links to workshops.
Reading:
Allen, James. 1995. Natural Language Understanding. Redwood
City, CA: Benjamin/Cummings Publishing. Chapter 7 and Appendix
C.
Jurafsky, Dan and James H. Martin. 2000. Speech and Language
Processing. Englewood Cliffs, NJ: Prentice-Hall. Chapters 5,
6, 7.
Project Team Participants:
| Jennifer Suzanne Lynch | Cornell University |
| Kevin McCleary | Kent State University |
| Maria Kiskowski | University of Notre Dame |
| Jennifer Lefeaux | North Carolina State University |
| Dany Ngouyassa | Indiana University |
| Bryan Smith | Tufts University |