Talk
Abstract:
Seminar
on Industrial Problems
April
6, 2001
Content-Sensitive Fingerprinting and its
Applications
Rafail Ostrovsky
Telcordia Technologies
http://www.argreenhouse.com/bios/rafail/index.shtml
rafail@research.telcordia.com
570
Vincent Hall
10:10 am

A
standard notion of a hash-function is the one that maps all documents
to short "random-looking" outputs. That is, even if two documents
differ only in a few bits, a "classical" hash function is geared
to output "unrelated" results. In many settings, there is a need
for hash-functions which produce "similar" short outputs for "similar"
documents. This is especially relevant when searching for a "related"
documents in a large database, where one must find answers faster
then searching through the entire database, with applications
to information distillation and data mining algorithms.
In this talk, I will show how to construct such a hash function
and explain the underling ideas of the algorithm. I will then
show applications of such hash function to approximate searching,
information distillation, data-mining algorithms, dimension-reduction
techniques, clustering, filtering, approximate matching, facility
location, nearest neighbor search and other approximation problems.
The talk will be self-contained.
Back
to Industrial Programs
2000-2001
Program: Mathematics in Multimedia