standard notion of a hash-function is the one that maps all documents
to short "random-looking" outputs. That is, even if two documents
differ only in a few bits, a "classical" hash function is geared
to output "unrelated" results. In many settings, there is a need
for hash-functions which produce "similar" short outputs for "similar"
documents. This is especially relevant when searching for a "related"
documents in a large database, where one must find answers faster
then searching through the entire database, with applications
to information distillation and data mining algorithms.
In this talk, I will show how to construct such a hash function
and explain the underling ideas of the algorithm. I will then
show applications of such hash function to approximate searching,
information distillation, data-mining algorithms, dimension-reduction
techniques, clustering, filtering, approximate matching, facility
location, nearest neighbor search and other approximation problems.
The talk will be self-contained.
to Industrial Programs
Program: Mathematics in Multimedia