Talk
Abstract:
Document Clustering: Is Hierarchical Clustering Really Better?
Michael
Steinbach
University of Minnesota
steinbac@cs.umn.edu
In this presentation we compare the two main approaches to document
clustering: agglomerative hierarchical clustering and K-means.
Hierarchical clustering is often portrayed as the better quality
clustering approach, but is limited because of its quadratic
time complexity. In contrast, K-means has a time complexity
which is linear in the number of documents, but is thought to
produce inferior clusters. Sometimes the two approaches are
combined so as to "get the best of both worlds." This paper
presents research that disputes the view that hierarchical clustering
produces better quality document clusters than K-means or its
variants. In particular, we give results which show that a simple
and efficient "bisecting" K-means technique is as good or better
than the pure or hybrid hierarchical approaches we tested. While
our results are limited in scope, we believe that researchers
interested in document clustering will find them quite interesting.
Back to Workshop Schedule
Back to Reactive Flow and Transport Phenomena
Back to IMA "HOT TOPICS" Workshops
|