Fast Statistical and Geometric Distances Between Families of Distributions

Tuesday, November 17, 2020 - 1:25pm - 2:25pm
Alexander Cloninger (University of California, San Diego)

Detecting differences and building classifiers between a family of distributions, given only finite samples, has attracted renewed interest due to data science applications in high dimensions. Applications include survey response effects, topic modeling, and various measurements of cell or gene populations per person. Recent advances have focused on kernel Maximum Mean Discrepancy and Optimal Transport. However, when the family of distributions is concentrated near a low-dimensional structure, or when the family of distributions being considered is generated from a family of simple group actions, these algorithms fail to exploit the reduced complexity. In this talk, we'll discuss the theoretical and computational advancements that can be made under these assumptions, and their connections to harmonic analysis, approximation theory, and group actions. Similarly, we'll use both techniques to develop methods of provably identifying not just how much the distributions deviate, but where these differences are concentrated. We'll also focus on applications in medicine, generative modeling, and supervised learning.

Alex Cloninger is an Assistant Professor in Mathematics and the Halıcıoğlu Data Science Institute at UC San Diego. He received his PhD in Applied Mathematics and Scientific Computation from the University of Maryland in 2014, and was then an NSF Postdoc and Gibbs Assistant Professor of Mathematics at Yale University until 2017, when he joined UCSD. Alex researches problems in the area of geometric data analysis and applied harmonic analysis. He focuses on approaches that model the data as being locally lower-dimensional, including data concentrated near manifolds or subspaces. These types of problems arise in a number of scientific disciplines, including imaging, medicine, and artificial intelligence, and the techniques developed relate to a number of machine learning and statistical algorithms, including deep learning, network analysis, and measuring distances between probability distributions.