Learning Graphical Models by Competitive Assembly of Marginals

Wednesday, October 26, 2011 - 1:30pm - 2:30pm
Keller 3-180
Donald Geman (Johns Hopkins University)
Learning high-dimensional probability distributions from a very
small number of samples is no more difficult than from a great
many. However, arranging for such models to generalize well in the
small-sample domain is hard. Our approach is motivated by
compositional models and Bayesian networks, and designed to adapt to
sample size. We start with a large, overlapping set of elementary
statistical building blocks, or primitives, which are
low-dimensional marginal distributions learned from data. Subsets of
primitives are combined in a Lego-like fashion to construct a
probabilistic graphical model. Model complexity is controlled by
adapting the primitives to the amount of training data and imposing
strong restrictions on merging them into allowable compositions. In
the case of binary forests, structure optimization corresponds to an
integer linear program and the maximizing composition can be computed
for reasonably large numbers of variables.
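
To make the assembly idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method presented in the talk: primitives are restricted to pairwise marginals, each one is scored by empirical mutual information, and the forest is built with a greedy Kruskal-style pass rather than the integer linear program mentioned above. The function names and the `threshold` parameter are hypothetical. Raising `threshold` when few samples are available is one simple way to mimic adapting the primitives to the amount of training data.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    joint = Counter((row[i], row[j]) for row in samples)
    marg_i = Counter(row[i] for row in samples)
    marg_j = Counter(row[j] for row in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def assemble_forest(samples, threshold):
    """Greedily combine pairwise primitives into a forest.

    Assumption: a pair is an admissible primitive only if its score
    exceeds `threshold` (a hypothetical complexity control).
    """
    d = len(samples[0])
    scored = [
        (pairwise_mutual_information(samples, i, j), i, j)
        for i, j in combinations(range(d), 2)
    ]
    scored = [s for s in scored if s[0] > threshold]
    scored.sort(reverse=True)  # strongest dependencies first

    parent = list(range(d))  # union-find structure to reject cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for _, i, j in scored:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # merging two components keeps a forest
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    # Toy data: variables 0 and 1 are identical, variable 2 is independent.
    data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    print(assemble_forest(data, threshold=0.1))
```

On the toy data only the pair (0, 1) clears the threshold, so variable 2 is left as an isolated node. With a threshold of zero and plentiful data the greedy pass would instead tend toward a full spanning tree.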