Talk
Abstract:
Level Search Filtering for IR Model Reduction
Michael
W. Berry
University of Tennessee
Information
retrieval models such as Latent Semantic Indexing (LSI) have
been demonstrated to outperform lexical matching in the presence
of synonymy and polysemy. Unfortunately, such rank-reduced models
require the decomposition (e.g., singular value decomposition)
of large term-by- document matrices which can prohibit scalable
information retrieval. This particular work demonstrates how
information filtering using level search techniques can reduce
the computational cost of LSI. For each query, level search
extracts a much smaller subset of the original term-by-document
matrix, containing on average 25% of the original non-zero entries.
When LSI is applied to such subsets, the average precision degrades
only by 5% due to level search filtering. Moreover, for some
document collections an increase in precision has also been
observed. Further enhancement of level search can be based on
a "pruning" scheme which deletes terms connected to
only one document from the query-specific submatrix. Such pruning
has achieved a 65% reduction (on average) in the number of non-zeros
with a precision loss of no more than 5% for most collections.
This
work is joint with Xiaoyan Zhang
and Padma Raghavan, University
of Tennessee.
Slides
for the Talk powerpoint html
Back to Workshop Schedule
Back to IMA "HOT TOPICS" Workshop: Text Mining
|