Spectral Methods for Learning Multivariate Latent Tree Structure

Monday, September 26, 2011 - 4:30pm - 5:30pm
Keller 3-180
Kamalika Chaudhuri (University of California)
This talk considers the problem of learning the structure of a broad
class of multivariate latent variable tree models, which include a
variety of continuous and discrete models (including the widely used
linear-Gaussian models, hidden Markov models, and Markov
evolutionary trees). The setting is one where we only have samples
from certain observed variables in the tree and our goal is to
estimate the tree structure (emph{i.e.}, the graph of how the underlying
hidden variables are connected to the observed variables). We
provide the Spectral Recursive Grouping algorithm, an efficient and
simple bottom-up procedure for recovering the tree structure from
independent samples of the observed variables. Our finite sample
size bounds for exact recovery of the tree structure elucidate
certain natural dependencies on underlying statistical and
structural properties of the underlying joint
distribution. Furthermore, our sample complexity guarantees have no
explicit dependence on the dimensionality of the observed variables,
making the algorithm applicable to many high-dimensional settings.
At the heart of our algorithm is a spectral quartet test for
determining the relative topology of a quartet of variables, which
only utilizes certain second order statistics and is based
on the determinants of certain cross-covariance matrices.

This talk is based on joint work with A. Anandkumar, D. Hsu, S. Kakade, L. Song and T. Zhang.
MSC Code: