Jaime Carbonell
Carnegie Mellon University
Jaime_Carbonell@lti.cs.cmu.edu
Discovering novel, relevant information in large, dynamically updated
document collections is a difficult challenge. Though previously
neglected, novelty is perhaps as important as relevance in
text mining and information retrieval, especially with the
vast growth of on-line information stores, including the web
itself. The presentation focuses on multiple manifestations
of novelty, such as:
1) "Give me a summary of what's new in the Microsoft trial
this week,"
2) "Alert me when there is a new astronomical discovery,"
or simply having an IR system that ranks documents for novelty
and non-redundancy as well as relevance to the query. Measures
of novelty discussed include Maximal Marginal Relevance
in multi-document summarization, dissimilarity with history
for new-event detection in newswire and broadcast streams,
and new research on automatically detecting linguistic indicators
for novelty in documents. The latter is based on recent results
with statistical machine learning methods for genre classification,
extended to first-report detection.
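
As an illustration of ranking for novelty and non-redundancy as well as relevance, the following is a minimal sketch of greedy Maximal Marginal Relevance re-ranking. It is not the system described in the talk: the bag-of-words cosine similarity, the function names (bag, cosine, mmr_rank), and the trade-off parameter lam are all assumptions chosen for brevity. At each step it selects the document maximizing lam * sim(doc, query) - (1 - lam) * max sim(doc, already selected).

```python
import math
from collections import Counter


def bag(text):
    """Hypothetical helper: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rank(query, docs, lam=0.7, k=None):
    """Greedy MMR: return document indices in selection order.

    lam = 1.0 ranks by relevance only; smaller values penalize
    similarity to documents that were already selected.
    """
    q = bag(query)
    vecs = [bag(d) for d in docs]
    remaining = list(range(len(docs)))
    selected = []
    k = len(docs) if k is None else min(k, len(docs))

    def score(i):
        relevance = cosine(vecs[i], q)
        redundancy = max(
            (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Made-up example documents; the first two are exact duplicates.
    docs = [
        "microsoft trial verdict announced",
        "microsoft trial verdict announced",
        "microsoft trial new witness testifies",
    ]
    print(mmr_rank("microsoft trial", docs, lam=0.5))
    print(mmr_rank("microsoft trial", docs, lam=1.0))
```

With lam below 1, the duplicate of an already selected document is pushed down the ranking in favor of a less similar one; with lam = 1.0 the ranking reduces to plain relevance ordering.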