Relevance ranking and hit packaging in math search

Friday, December 8, 2006 - 10:15am - 10:45am
EE/CS 3-180
Abdou Youssef (George Washington University)
As in most search applications, math search involves relevance ranking and hit
packaging. That is, hits must be ordered using quantitative relevance scores, and every hit
must be accompanied by a small amount of qualitative relevance information that
conveys what the hit is about and why it matched.

Determining and quantifying relevance is a very hard problem in text search, and is at
least as hard in math search. The relevance score must factor in not only query terms, but
also a priori information about the hit target, such as: (1) whether the target is or contains
definitions, notations, graphs, theorems, proofs, and so on; (2) expert-predetermined
weights of certain entities (e.g., concepts, function names, operators) in the target
document; (3) the number of database-wide or Web-wide links pointing to the hit target; and
(4) the frequency of user access to that target.
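
As a rough illustration of how the four kinds of a priori information might enter a single
score, the following Python sketch multiplies a query-term match by a boost built from them.
It is not the scoring method of the talk: the Target fields, the KIND_BONUS table, the mixing
weights, and the log damping are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Target:
    """Hypothetical record for one hit target (all field names are assumptions)."""
    terms: list                                          # tokens of the target
    kinds: set = field(default_factory=set)              # (1) e.g. {"definition", "theorem"}
    entity_weights: dict = field(default_factory=dict)   # (2) expert weights per entity
    inbound_links: int = 0                               # (3) links pointing to the target
    access_count: int = 0                                # (4) how often users opened it

# Illustrative bonuses per content kind; the values are invented for this sketch.
KIND_BONUS = {"definition": 0.5, "theorem": 0.3, "proof": 0.2,
              "notation": 0.2, "graph": 0.1}

def relevance(query_terms, target,
              w_kind=1.0, w_entity=1.0, w_links=0.5, w_access=0.5):
    """Query-term match scaled by a boost from the a priori signals."""
    counts = {}
    for term in target.terms:
        counts[term] = counts.get(term, 0) + 1

    # Query-dependent part: log-damped term frequency of each query term.
    match = sum(math.log1p(counts.get(q, 0)) for q in query_terms)
    if match == 0:
        return 0.0  # a priori signals alone never produce a hit

    kind = sum(KIND_BONUS.get(k, 0.0) for k in target.kinds)
    entity = sum(target.entity_weights.get(q, 0.0) for q in query_terms)
    links = math.log1p(target.inbound_links)
    access = math.log1p(target.access_count)

    boost = 1.0 + w_kind * kind + w_entity * entity + w_links * links + w_access * access
    return match * boost
```

In this sketch the a priori signals only rescale a nonzero match, so a heavily linked target
that does not contain any query term is never returned; an additive combination would be an
equally plausible design.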

Hit packaging is primarily a process of document summarization that is biased by both
the user query and the same kinds of a priori information used in relevance scoring. One
way to summarize a document is to partition it into small fragments and select several of
the most relevant fragments; fragments from the document metadata, if any, may also be
included in the summary.
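
The partition-and-select approach described above can be sketched in Python as follows. This
is a minimal illustration under stated assumptions, not the packaging method of the talk:
fragments are taken to be sentences, tokens are plain words (real math content would need a
math-aware tokenizer), and the names split_fragments, fragment_score, package_hit, and the
boosts table are hypothetical.

```python
import re

def split_fragments(text):
    """Partition a document into sentence-like fragments (an assumption of this sketch)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def fragment_score(fragment, query_terms, boosts):
    """Count query-term occurrences and add any a priori boost attached to a token."""
    score = 0.0
    for token in re.findall(r"\w+", fragment.lower()):
        if token in query_terms:
            score += 1.0
        score += boosts.get(token, 0.0)
    return score

def package_hit(text, query_terms, boosts=None, k=2, metadata=None):
    """Return metadata fragments followed by the k best fragments in document order."""
    boosts = boosts or {}
    query = {q.lower() for q in query_terms}
    fragments = split_fragments(text)

    scored = [(fragment_score(f, query, boosts), i, f) for i, f in enumerate(fragments)]
    # Highest score first; ties go to the earlier fragment.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    # Restore document order so the summary reads naturally.
    chosen = [f for _, _, f in sorted(top, key=lambda s: s[1])]
    return list(metadata or []) + chosen
```

For example, calling package_hit on a three-sentence document with the query terms "gamma"
and "function" keeps the two sentences that mention the gamma function and drops the
unrelated one; passing boosts={"definition": 1.0} would additionally favor fragments that
contain the word "definition".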

This talk will identify the issues involved in relevance ranking and hit packaging in math
search, and discuss approaches for addressing them.