A Story of Principal Component Analysis in the Distributed Model

Monday, May 16, 2016 - 3:10pm - 4:00pm
Keller 3-180
David Woodruff (IBM Research Division)
We consider an illustrative problem in distributed machine learning - computing a low rank approximation in the arbitrary partition model. In this model each of s servers holds an n x d matrix A^i, where each entry is an O(log nd)-bit word, and we let A = A^1 + ... + A^s. We would like each server to output the same k x d matrix V, so that V is an approximation to the top k principal components of A, in the sense that projecting onto V provides a (1+eps)-approximate low rank approximation. We give the first communication optimal protocol for this problem, namely a protocol with communication O(skd) + poly(sk/eps) words, improving upon several previous works, and show how to implement the protocol in input sparsity time up to low order terms. Importantly our results do not make any condition number assumptions, yet still achieve the desired bit complexity.

Based on work with Christos Boutsidis and Peilin Zhong.
MSC Code: