Data Fusion and Multi-cue Data Matching Using Diffusion Maps

Tuesday, December 6, 2005 - 10:30am - 11:30am
EE/CS 3-180
Stephane Lafon (Google Inc.)
Data fusion and multi-cue data matching are fundamental tasks arising
in a variety of systems that process large amounts of data. These
tasks often rely on dimensionality reduction techniques that
traditionally follow a data acquisition/preprocessing phase.
In this talk, I will describe a powerful framework based on diffusions
that can be used to learn the intrinsic geometry of data sets. These
techniques make it possible to handle data acquisition issues and data
processing tasks simultaneously. In particular, I will explain how we
can use this set of tools to address three major challenges related to
data fusion:

1) How to deal with data coming from sensors/sources sampled at
different rates, and possibly at different times. We provide
algorithms to obtain density-invariant descriptors (parametrizations)
of data sets.

2) How to integrate and combine information streams coming from
different sensors into one representation of the data. The diffusion
coordinates allow us to learn the geometry of the data captured by
each sensor independently, and then to combine the various
representations into a unified description of the data.

3) How to match data sets based on their intrinsic geometry.

As an illustration, I will present numerical results on the
integration of audio and video streams for lip-reading and speech
recognition. Other examples will focus more on imaging
(multiscale data-driven image segmentation, image data sets

This is joint work with R.R. Coifman, A. Glaser, Y. Keller and S.W.
Zucker (Yale University).
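
As a rough illustration of challenges 1) and 2), the Python sketch
below builds diffusion coordinates with a density normalization and
then combines two sensors by concatenating their coordinates. It is a
minimal sketch under stated assumptions, not the speaker's
implementation: the Gaussian kernel, the parameter names eps, alpha
and t, the function names diffusion_map and fuse, and the
concatenation-based fusion are all illustrative choices, and the data
are synthetic.

```python
import numpy as np


def diffusion_map(X, eps, n_coords=2, alpha=1.0, t=1):
    """Return (eigenvalues, diffusion coordinates) for the rows of X.

    Assumptions for this sketch: Gaussian kernel of scale eps, and
    alpha = 1 as the density normalization.
    """
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-d2 / eps)

    # Divide out the kernel density estimate q so that the result
    # depends less on how densely each region was sampled.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q, q) ** alpha

    # Symmetric conjugate of the Markov matrix P = D^-1 K_alpha;
    # it has the same eigenvalues as P and is safe to pass to eigh.
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))
    w, V = np.linalg.eigh(S)
    order = np.argsort(w)[::-1]
    w, V = w[order], V[:, order]

    # Right eigenvectors of P; the first one is constant and is skipped.
    psi = V / np.sqrt(d)[:, None]
    coords = psi[:, 1:n_coords + 1] * (w[1:n_coords + 1] ** t)
    return w, coords


def fuse(coords_list):
    """Hypothetical fusion step: concatenate per-sensor coordinates."""
    return np.hstack(coords_list)


# Synthetic example: one latent angle, sampled non-uniformly and
# observed by two different (made-up) sensors.
rng = np.random.default_rng(0)
theta = np.sort(rng.beta(2.0, 5.0, size=200) * 2.0 * np.pi)
sensor_a = np.c_[np.cos(theta), np.sin(theta)]
sensor_b = np.c_[2.0 * np.cos(theta), np.sin(theta), 0.5 * np.cos(theta)]

w_a, coords_a = diffusion_map(sensor_a, eps=0.5)
w_b, coords_b = diffusion_map(sensor_b, eps=0.5)
fused = fuse([coords_a, coords_b])
```

Here each sensor is embedded independently and the rows of fused give
one joint description per sample. Setting alpha=0.0 keeps the sampling
density in the embedding, which makes for a simple comparison.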