Active Clustering and Ranking

Monday, September 26, 2011 - 3:00pm - 4:00pm
Keller 3-180
Robert Nowak (University of Wisconsin, Madison)
Clustering and ranking is often based on pairwise similarities (metric data) or comparisons (ordinal data). Most methods assume that the entire collection of all possible pairwise similarities or comparisons are known, but in high-dimensional settings there may be missing data and/or the costs of collecting this information may be prohibitive. The existence of clusters or other intrinsic low-dimensional structure can induce redundancy in these data, and therefore it may be possible to robustly cluster or rank based on a small subset of the data. I will discuss two results that demonstrate this phenomenom. First, I will describe a clustering procedure that generates a hierarchical clustering of N objects using only N log N similarities, instead of all N(N-1)/2 similarities. The method can recover all clusters of size larger than log N, even in the presence of a limited fraction of arbitrarily corrupted or noisy similarities. Second, if N objects are embedded in d-dimensions (d
This is joint work with Gautam Dasarathy, Brian Eriksson, Kevin Jamieson, and Aarti Singh. Related papers are available at and
MSC Code: