Statistical Models of Text: From Bags of Words to Structure

Monday, April 17, 2000, 4:45pm–5:10pm
Keller 3-180
Ralph Weischedel (Bolt Beranek and Newman (BBN) Laboratories, Inc.)
During the last five years, applying statistical language models to computational linguistics has led to new capabilities in processing text. In this paper, we survey three such techniques: named entity identification and classification, parsing, and fact extraction. We focus on them because they provide structural and semantic features that text mining algorithms can use as input, instead of relying solely on bag-of-words models.
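To show what a bag-of-words model discards, here is a minimal sketch. It assumes the simplest possible representation: lowercase the text, split on whitespace, and count the tokens (no stemming, no punctuation handling). The function name `bag_of_words` is hypothetical and not taken from the talk. Two sentences with opposite meanings come out identical, because the representation keeps word counts but drops who did what to whom. Structural features such as parses, entity labels, or extracted facts are meant to preserve exactly that information.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Hypothetical minimal tokenizer: lowercase, then split on whitespace.
    # Each word is counted; its position in the sentence is thrown away.
    return Counter(text.lower().split())

a = bag_of_words("The dog bit the man")
b = bag_of_words("The man bit the dog")

# Both sentences map to {"the": 2, "dog": 1, "bit": 1, "man": 1},
# so the agent/patient distinction is lost.
print(a == b)  # True
```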