A standard notion of a hash-function is the one that maps all documents to short "random-looking" outputs. That is, even if two documents differ only in a few bits, a "classical" hash function is geared to output "unrelated" results. In many settings, there is a need for hash-functions which produce "similar" short outputs for "similar" documents. This is especially relevant when searching for a "related" documents in a large database, where one must find answers faster then searching through the entire database, with applications to information distillation and data mining algorithms.
In this talk, I will show how to construct such a hash function and explain the underling ideas of the algorithm. I will then show applications of such hash function to approximate searching, information distillation, data-mining algorithms, dimension-reduction techniques, clustering, filtering, approximate matching, facility location, nearest neighbor search and other approximation problems. The talk will be self-contained.