The Space Between the Data

Tuesday, October 8, 2013 - 9:00am - 9:50am
Keller 3-180
Leonidas Guibas (Stanford University)
The information contained across many data sets is often highly correlated. Such connections and correlations can arise because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries or other relations and self-relations that the data sources satisfy. This is particularly true for data sets of a geometric character, such as GPS traces, images, videos, 3D scans, 3D models, etc.

We argue that when extracting knowledge from the data in a given data set, we can do significantly better if we exploit the wider context provided by all the relationships between this data set and a society or social network of other related data sets. We discuss mathematical and algorithmic issues in how to represent and compute relationships or mappings between data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the wisdom of the collection in performing operations on individual data sets or in map inference between them.
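
One way to make the regularizing role of a map network concrete is cycle consistency: composing maps around a closed loop of data sets should return every point to itself, so a loop that fails to do so signals an error in at least one map. The sketch below is only an illustration under simplifying assumptions, not necessarily the formulation used in the talk. It assumes each map is a point-to-point correspondence stored as an index array, and the names `compose`, `cycle_inconsistency`, and the toy arrays are hypothetical.

```python
import numpy as np


def compose(f, g):
    """Return the map 'f, then g' for maps stored as index arrays (f[i] is the image of i)."""
    return g[f]


def cycle_inconsistency(f_ab, f_bc, f_ca):
    """Fraction of points of A that do not return to themselves after A -> B -> C -> A."""
    round_trip = compose(compose(f_ab, f_bc), f_ca)
    return float(np.mean(round_trip != np.arange(len(f_ab))))


# Hypothetical toy data: three data sets A, B, C with 5 corresponding points each.
f_ab = np.array([1, 2, 3, 4, 0])  # assumed map A -> B
f_bc = np.array([2, 3, 4, 0, 1])  # assumed map B -> C

# A consistent map C -> A is the inverse of the composed map A -> C.
# For a permutation stored as an index array, argsort yields its inverse.
f_ca_true = np.argsort(compose(f_ab, f_bc))

# A corrupted C -> A map: two correspondences swapped.
f_ca_noisy = f_ca_true.copy()
f_ca_noisy[[0, 1]] = f_ca_noisy[[1, 0]]

clean_score = cycle_inconsistency(f_ab, f_bc, f_ca_true)
noisy_score = cycle_inconsistency(f_ab, f_bc, f_ca_noisy)
print(clean_score, noisy_score)
```

In this toy setting the consistent loop scores zero while the corrupted one does not, so the loop itself flags the bad map without any ground truth for the individual map.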

This functorial view of data puts the spotlight on consistent, shared relations and maps as the key to understanding structure in data. It differs somewhat from the currently dominant paradigm of extracting supervised or unsupervised feature sets, defining distance or similarity metrics, and doing regression or classification -- though sparsity still plays an important role. The inspiration comes more from ideas in homological algebra or algebraic topology, exploiting the algebraic structure of data relationships or maps in an effort to disentangle dependencies and assign importance to the vast web of all possible relationships among multiple data sets.

We illustrate these ideas largely using examples from the realm of 3D shapes and images, e.g., for segmentation or for capturing and visualizing differences between 3D data sets -- and more generally for abstraction, analogy, compression, error correction, and summarization.

This is an overview of joint work with multiple collaborators, as discussed in the talk.