Carnegie Mellon University
Discovering novel, relevant information in large, dynamically updated document collections is a difficult challenge. Though previously neglected, novelty is perhaps as important as relevance in text mining and information retrieval, especially given the vast growth of on-line information stores, including the web itself. The presentation focuses on multiple manifestations of novelty, such as:
1) "Give me a summary of what's new in the Microsoft trial this week,"
2) "Alert me when there is a new astronomical discovery," or
3) simply having an IR system that ranks documents for novelty and non-redundancy as well as relevance to the query.
Measures of novelty discussed include Maximal Marginal Relevance in multi-document summarization, dissimilarity with history for new-event detection in newswire and broadcast streams, and new research on automatically detecting linguistic indicators of novelty in documents. The latter is based on recent results with statistical machine learning methods for genre classification, extended to first-report detection.
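
To make the idea of ranking for both relevance and non-redundancy concrete, here is a minimal sketch of a Maximal Marginal Relevance style re-ranker. The abstract gives no implementation details, so everything below is an illustrative assumption: the whitespace tokenizer, the bag-of-words cosine similarity, the trade-off parameter `lam`, and the function names `cosine` and `mmr_rank`. The scoring rule follows the commonly cited MMR form, in which each step picks the candidate maximizing `lam * sim(doc, query) - (1 - lam) * max sim(doc, already_selected)`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rank(query, docs, lam=0.7, k=None):
    """Return document indices in greedy MMR order.

    lam = 1.0 ranks by relevance only; smaller values penalize
    documents that resemble ones already selected.
    (Hypothetical helper; tokenization is a plain whitespace split.)
    """
    q = Counter(query.lower().split())
    vecs = [Counter(d.lower().split()) for d in docs]
    remaining = list(range(len(docs)))
    selected = []
    k = len(docs) if k is None else min(k, len(docs))

    while len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q)
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)

    return selected
```

With two identical documents and one partly different document on the same topic, setting `lam=1.0` keeps the duplicates adjacent at the top, while a lower value such as `lam=0.5` pushes the second duplicate below the document that adds new terms. A real system would likely use TF-IDF or learned representations rather than raw term counts.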