Exploiting Group and Geometric Structures for Massive Data Analysis

Tuesday, December 3, 2019 - 1:25pm - 2:25pm
Lind 305
Zhizhen (Jane) Zhao (University of Illinois at Urbana-Champaign)
In this talk, I will introduce a new unsupervised learning framework for data points that lie on or close to a smooth manifold naturally equipped with a group action. In many applications, such as cryo-electron microscopy image analysis and shape analysis, the dataset of interest consists of images or shapes of potentially high spatial resolution, and admits a natural group action that plays the role of a nuisance or latent variable that must be quotiented out before useful information is revealed. We define the pairwise group-invariant distance and the corresponding optimal alignment. We construct a graph from the dataset, where each vertex represents a data point and the edges connect points with small group-invariant distance. In addition, each edge is associated with the estimated optimal alignment. Inspired by the vector diffusion maps proposed by Singer and Wu, we exploit the cycle consistency of the group transformations under multiple irreducible representations to define new similarity measures for the data. Using this representation-theoretic mechanism, multiple associated vector bundles can be constructed over the orbit space, providing multiple views for learning the geometry of the underlying base manifold from noisy observations. I will introduce three approaches to systematically combine the information from different representations, and show that by exploiting the redundancy created artificially across irreducible representations of the transformation group, we can get drastically improved nearest-neighbor identification when a large portion of the true edges are corrupted. I will also show an application to cryo-electron microscopy image analysis.
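
As a rough illustration of the objects named in the abstract (group-invariant distance, optimal alignment, alignment graph, and edge weights under an irreducible representation), here is a minimal sketch in Python. It is not the speaker's implementation. It assumes a toy setting in which data points are 1-D periodic signals of length n and the group is the cyclic shift group Z_n, used here as a discrete stand-in for in-plane rotations. The function names invariant_distance, alignment_graph, and irrep_affinity are hypothetical, and the brute-force search over shifts is chosen for clarity rather than speed.

```python
import numpy as np

def invariant_distance(x, y):
    """Return (d, s): d = min over cyclic shifts s of ||x - roll(y, s)||,
    and s = the minimizing shift (the estimated optimal alignment)."""
    n = len(y)
    dists = [np.linalg.norm(x - np.roll(y, s)) for s in range(n)]
    s_best = int(np.argmin(dists))
    return float(dists[s_best]), s_best

def alignment_graph(X, k_nn):
    """Connect each point to its k_nn nearest neighbours in the
    shift-invariant distance; store the optimal shift on each edge."""
    m = len(X)
    D = np.zeros((m, m))
    S = np.zeros((m, m), dtype=int)
    for i in range(m):
        for j in range(m):
            if i != j:
                D[i, j], S[i, j] = invariant_distance(X[i], X[j])
    edges = {}
    for i in range(m):
        neighbours = [int(j) for j in np.argsort(D[i]) if j != i][:k_nn]
        for j in neighbours:
            edges[(i, j)] = int(S[i, j])
    return edges

def irrep_affinity(edges, m, n, k):
    """Hermitian matrix whose (i, j) entry is the k-th irreducible
    representation of Z_n evaluated at the edge alignment:
    rho_k(s) = exp(2*pi*1j*k*s/n). Non-edges are left at zero."""
    W = np.zeros((m, m), dtype=complex)
    for (i, j), s in edges.items():
        phase = np.exp(2j * np.pi * k * s / n)
        W[i, j] = phase
        W[j, i] = np.conj(phase)
    return W
```

In this toy setting, if every signal is an exact shift of one template, the alignments around any triangle compose to the identity, so the product W[i, j] * W[j, l] * W[l, i] equals 1 for every frequency k. Corrupted edges generally break this cycle consistency, and checking it at several values of k is one simple way to picture the redundancy across irreducible representations that the abstract refers to.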