Past Events

Does the Data Induce Capacity Control in Deep Learning?

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

Accepted statistical wisdom suggests that the larger the model class, the more likely it is to overfit the training data. And yet, deep networks generalize extremely well: the larger the deep network, the better its accuracy on new data. This talk seeks to shed light on this apparent paradox.

We will argue that deep networks are successful because of a characteristic structure in the space of learning tasks. The input correlation matrix for typical tasks has a peculiar (“sloppy”) eigenspectrum where, in addition to a few large eigenvalues (salient features), there is a large number of small eigenvalues that are distributed uniformly over exponentially large ranges. This structure in the input data is strongly mirrored in the representation learned by the network: a number of quantities, such as the Hessian, the Fisher Information Matrix, and various activation correlations and Jacobians, are also sloppy. Even though the model class for deep networks is very large, there is an exponentially small subset of models (in the number of data) that fits such sloppy tasks. This talk will demonstrate the first analytical non-vacuous generalization bound for deep networks that does not use compression. We will also discuss an application of these concepts that yields new algorithms for semi-supervised learning.
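
As a rough illustration of such a "sloppy" spectrum (a minimal sketch, not the authors' code; the scikit-learn digits dataset is a stand-in chosen only because it is bundled with the library):

    # Eigenspectrum of an input correlation matrix: a few large eigenvalues
    # plus a long tail spread roughly uniformly on a log scale.
    import numpy as np
    from sklearn.datasets import load_digits

    X = load_digits().data                    # (n_samples, n_features)
    X = X - X.mean(axis=0)                    # center the features

    C = X.T @ X / X.shape[0]                  # input correlation matrix
    eigvals = np.linalg.eigvalsh(C)[::-1]     # eigenvalues, largest first
    eigvals = eigvals[eigvals > 1e-12]        # drop numerically zero modes

    print(f"largest eigenvalue: {eigvals[0]:.3e}")
    print(f"smallest retained:  {eigvals[-1]:.3e}")
    print(f"spectrum spans ~{np.log10(eigvals[0] / eigvals[-1]):.1f} orders of magnitude")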

References

  1. Does the data induce capacity control in deep learning? Rubing Yang, Jialin Mao, and Pratik Chaudhari. [ICML '22] https://arxiv.org/abs/2110.14163
  2. Deep Reference Priors: What is the best way to pretrain a model? Yansong Gao, Rahul Ramesh, and Pratik Chaudhari. [ICML '22] https://arxiv.org/abs/2202.00187

Pratik Chaudhari is an Assistant Professor in Electrical and Systems Engineering and Computer and Information Science at the University of Pennsylvania. He is a member of the GRASP Laboratory. From 2018-19, he was a Senior Applied Scientist at Amazon Web Services and a Postdoctoral Scholar in Computing and Mathematical Sciences at Caltech. Pratik received his PhD (2018) in Computer Science from UCLA, and his Master's (2012) and Engineer's (2014) degrees in Aeronautics and Astronautics from MIT. He was a part of NuTonomy Inc. (now Hyundai-Aptiv Motional) from 2014-16. He received the NSF CAREER award and the Intel Rising Star Faculty Award in 2022.


Navigating a Career Path, a Case Study

Industrial Problems Seminar

Paula Dassbach (Medtronic)

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

What do you want to be when you grow up? It's a question that many of us are asked from a young age. We start with dreams of being a ballerina or a fireman but seldom stay on this path. This talk shares my experience of young aspirations, educational decisions, and choices that ultimately led to a career that is immensely fulfilling. I will also share some insight into my current role and some of the projects you might work on in a medical device company. As is likely clear, I will not be presenting an industrial problem, but instead the problem that we all face and navigate. My hope is that sharing my experience can provide some tools to help you get closer to the answer to the question that many of us are still asking ourselves: 'What do I want to be when I grow up?'

Cubic-Regularized Newton for Spectral Constrained Matrix Optimization and its Application to Fairness

Data Science Seminar

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

Matrix functions are utilized to rewrite smooth spectral constrained matrix optimization problems as smooth unconstrained problems over the set of symmetric matrices, which are then solved via the cubic-regularized Newton method. We will discuss the solution procedure and showcase our method on a new fair data science model for estimating fair and robust covariance matrices in the spirit of Tyler's M-estimator (TME). This is joint work with Dr. Gilad Lerman and Dr. Shuzhong Zhang.
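
To make the outer solver concrete, here is a minimal sketch of a cubic-regularized Newton iteration on a generic smooth unconstrained problem (a toy objective of our own choosing, not the spectral reformulation or the TME model from the talk; we also assume the Hessian is positive definite so the subproblem solve below is well posed):

    # The cubic-regularized Newton step minimizes the model
    #   m(s) = g's + 0.5*s'Hs + (sigma/3)*||s||^3,
    # whose minimizer satisfies (H + sigma*||s||*I) s = -g. We therefore
    # bisect on r = ||s|| until the fixed point ||s(r)|| = r is found.
    import numpy as np

    def cubic_newton_step(g, H, sigma, tol=1e-10):
        n = len(g)
        s_of = lambda r: np.linalg.solve(H + sigma * r * np.eye(n), -g)
        lo, hi = 0.0, 1.0
        while np.linalg.norm(s_of(hi)) > hi:      # bracket the fixed point
            hi *= 2.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if np.linalg.norm(s_of(mid)) > mid else (lo, mid)
        return s_of(hi)

    # Toy smooth convex objective: f(x) = 0.25*(x'x)^2 + 0.5*x'Ax - b'x.
    A = np.diag([1.0, 10.0])
    b = np.array([1.0, 2.0])
    grad = lambda x: (x @ x) * x + A @ x - b
    hess = lambda x: (x @ x) * np.eye(2) + 2.0 * np.outer(x, x) + A

    x = np.zeros(2)
    for _ in range(30):
        # sigma chosen as a conservative bound on the local Hessian Lipschitz constant
        x = x + cubic_newton_step(grad(x), hess(x), sigma=6.0)
    print("x* =", x, " ||grad|| =", np.linalg.norm(grad(x)))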

Capacity Planning for the Cloud

Industrial Problems Seminar

Alex Gutierrez (Google Inc.)

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

Cloud computing is still a young and extremely fast-growing area. In this talk, I will give a high-level overview of (some of!) the interesting problems that arise when planning capacity for a service that is designed to be "elastic" from a customer perspective.

Research Problems in Quantitative Finance

John Goes (GMO)

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

This talk will be a high-level taxonomy of quantitative research challenges in financial markets, with a particular focus on examples in fixed income (bond) markets. I will also discuss my general experience as a mathematician in the finance industry.

Multiscale analysis of manifold-valued curves

Nir Sharon (Tel Aviv University)

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

A multiscale transform is a standard signal and image processing tool that enables a mathematically hierarchical analysis of objects. Customarily, the first scale corresponds to a coarse representation, and as scales increase, so does the refinement level of the entity we represent. This multiscale approach introduces a dynamic and flexible framework with many computational and approximation advantages. In this talk, we introduce a multiscale analysis that aims to represent manifold-valued curves. First, we will present the settings and our multiscale construction. Then, we will show some of the theoretical properties of our multiscale representation. Finally, we will conclude with several numerical examples illustrating how to apply our multiscale method to various data processing tasks.
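
To fix ideas, here is a minimal sketch of one level of an interpolatory multiscale transform for a curve on the unit sphere (a simplified construction of our own for illustration, not the speaker's: odd samples are predicted by geodesic midpoints of their neighbors, and the detail coefficients are the log-map corrections):

    # One level of decomposition/reconstruction for an S^2-valued curve.
    import numpy as np

    def sphere_log(p, q):
        """Log map at p: tangent vector at p pointing toward q."""
        d = q - np.dot(p, q) * p
        nd = np.linalg.norm(d)
        if nd < 1e-14:
            return np.zeros(3)
        return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * d / nd

    def sphere_exp(p, v):
        """Exp map at p: follow the geodesic with initial velocity v."""
        nv = np.linalg.norm(v)
        return p if nv < 1e-14 else np.cos(nv) * p + np.sin(nv) * v / nv

    def midpoint(p, q):
        return sphere_exp(p, 0.5 * sphere_log(p, q))

    def decompose(curve):
        coarse = curve[::2]                   # even samples = coarse scale
        details = [sphere_log(midpoint(coarse[i], coarse[i + 1]), curve[2*i + 1])
                   for i in range(len(coarse) - 1)]
        return coarse, details                # details live in tangent spaces

    def reconstruct(coarse, details):
        out = []
        for i, d in enumerate(details):
            out.append(coarse[i])
            out.append(sphere_exp(midpoint(coarse[i], coarse[i + 1]), d))
        out.append(coarse[-1])
        return np.array(out)

    # Round-trip check on a smooth spherical curve with an odd sample count.
    t = np.linspace(0.0, 1.0, 9)
    curve = np.stack([np.cos(t), np.sin(t), 0.5 * t], axis=1)
    curve /= np.linalg.norm(curve, axis=1, keepdims=True)
    coarse, details = decompose(curve)
    assert np.allclose(reconstruct(coarse, details), curve, atol=1e-10)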

Flexible multi-output multifidelity uncertainty quantification via MLBLUE

Matteo Croci (The University of Texas at Austin)

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

A central task in forward uncertainty quantification (UQ) is estimating the expectation of one or more quantities of interest (QoIs). In computational engineering, UQ problems often involve multiple QoIs and extremely heterogeneous models, both in terms of how they are constructed (varying grids, equations, or dimensions, different physics, surrogate and reduced-order models...) and in terms of their input-output structure (different models might have different uncertain inputs and yield different QoIs). In this complex scenario, it is crucial to design estimators that are as flexible and as efficient as possible.

Multilevel (or multifidelity) Monte Carlo (MLMC) methods are often the go-to methods for estimating expectations, as they are able to exploit the correlations between models to significantly reduce the estimation cost. However, multi-output strategies in MLMC methods are either sub-optimal or non-existent.

In this talk we focus on multilevel best linear unbiased estimators (MLBLUEs; Schaden and Ullmann, SIAM/ASA JUQ, 2021). MLBLUEs are extremely flexible and have the appealing property of being provably optimal among all multilevel linear unbiased estimators, making them, in our opinion, one of the most powerful MLMC methods available in the literature. Nevertheless, MLBLUEs have two limitations: 1) their setup requires solving a model selection and sample allocation problem (MOSAP), which is a non-trivial nonlinear optimization problem, and 2) they can only work with one scalar QoI at a time.

In this talk we show how the true potential of MLBLUEs can be unlocked:

  1. We present a new formulation of their MOSAP that can be solved almost as easily and efficiently as a linear program.
  2. We extend MLBLUEs to the multi- and infinite-dimensional output case.
  3. We provide multi-output MLBLUE MOSAP formulations that can be solved efficiently and consistently with widely available optimization software.

We show that the new multi-output MLBLUEs can be set up very efficiently and that they significantly outperform existing MLMC methods in practical problems with heterogeneous model structure.
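
For readers unfamiliar with MLBLUEs, here is a minimal single-QoI sketch in the spirit of Schaden and Ullmann (2021). The three synthetic "models", their covariance, and the hand-picked sample allocation are all assumptions made for illustration; in particular, the MOSAP discussed above is not solved here:

    # MLBLUE for the mean of a high-fidelity model (model 0), given sample
    # groups in which subsets of models are evaluated on shared inputs.
    import numpy as np

    rng = np.random.default_rng(0)

    mu_true = np.array([1.0, 0.9, 0.7])       # unknown model means
    C = np.array([[1.0, 0.9, 0.8],            # known model covariance
                  [0.9, 1.0, 0.9],
                  [0.8, 0.9, 1.1]])

    # (models in the group, number of joint samples); cheaper models get more.
    groups = [([0, 1, 2], 10), ([1, 2], 100), ([2], 1000)]

    Phi = np.zeros((3, 3))                    # normal-equations matrix
    rhs = np.zeros(3)
    for S, n in groups:
        R = np.eye(3)[S]                      # restriction to the group's models
        Cj = C[np.ix_(S, S)]                  # covariance within the group
        y = rng.multivariate_normal(mu_true[S], Cj, size=n)
        Phi += n * R.T @ np.linalg.solve(Cj, R)
        rhs += n * R.T @ np.linalg.solve(Cj, y.mean(axis=0))

    mu_hat = np.linalg.solve(Phi, rhs)        # BLUE of all model means
    std = np.sqrt(np.linalg.inv(Phi)[0, 0])   # its std. deviation for model 0
    print(f"MLBLUE estimate of E[model 0]: {mu_hat[0]:.4f} "
          f"(true {mu_true[0]}, std {std:.4f})")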

Matteo Croci is a postdoctoral researcher at the Oden Institute for Computational Engineering and Sciences at the University of Texas at Austin, working with Karen E. Willcox and Robert D. (Bob) Moser. Before moving to Austin in late 2022, Matteo worked for two years as a postdoctoral researcher in the Mathematical Institute at the University of Oxford (UK) under the supervision of Michael B. (Mike) Giles. Matteo obtained his PhD from the University of Oxford (UK) in March 2020 under the supervision of Patrick E. Farrell and Michael B. (Mike) Giles, in collaboration with Marie E. Rognes from Simula Research Laboratory (Oslo, Norway). Matteo has an MSc in Mathematical Modelling and Scientific Computing from the University of Oxford (UK), and a BSc in Mathematical Engineering from the Politecnico di Milano (Italy).

Matteo’s research has always been interdisciplinary, sitting at the interface between different fields in applied mathematics and computational engineering. During his PhD, he developed numerical methods for uncertainty quantification, including multilevel Monte Carlo methods, finite element methods for the solution of partial differential equations (PDEs) with random coefficients, and stochastic modelling techniques using Gaussian fields. He applied these techniques to design, validate, and solve different models of brain solute movement. He also developed an optimization method for finding multiple solutions of semismooth equations, variational inequalities, and constrained optimization problems. In his years as a postdoc, Matteo has become an expert in reduced- and mixed-precision (RP and MP) computing, in particular in the development of RP/MP methods for the numerical solution of PDEs, including RP finite difference and finite element methods, and MP time-stepping methods.

Matteo won the Charles Broyden Prize for the best paper published in Optimization Methods and Software in 2020.

Simplicity Bias in Deep Learning

Prateek Jain (Google Inc.)

While deep neural networks have achieved large gains in performance on benchmark datasets, their performance often degrades drastically with changes in data distribution encountered during real-world deployment. In this work, through systematic experiments and theoretical analysis, we attempt to understand the key reasons behind such brittleness of neural networks in real-world settings.

More concretely, we demonstrate through empirical and theoretical studies that (i) neural network training exhibits "simplicity bias" (SB), where the models learn only the simplest discriminative features, and (ii) SB is one of the key reasons behind the non-robustness of neural networks. We will then briefly outline some of our (so far unsuccessful) attempts at fixing SB in neural networks, illustrating why this is an exciting but challenging problem.
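
A small synthetic illustration of simplicity bias, loosely modeled on the "linear + stripes" style of dataset used in this line of work (the specifics below are our own assumptions for the demo, not the speakers' exact setup). Both coordinates fully determine the label, but one is linearly separable while the other requires a striped, piecewise boundary:

    # Train a small MLP on two redundant features, then randomize the simple
    # one at test time. If the network leaned on the simple feature, accuracy
    # collapses toward 50% even though the complex feature still predicts y.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)

    def make_data(n):
        y = rng.integers(0, 2, size=n)
        x_simple = (2 * y - 1) + 0.1 * rng.standard_normal(n)  # linear feature
        stripe = 2 * rng.integers(0, 3, size=n) + (1 - y)      # y=1 -> [0,1),[2,3),[4,5)
        x_complex = stripe + rng.uniform(0.0, 1.0, size=n)     # y=0 -> [1,2),[3,4),[5,6)
        return np.stack([x_simple, x_complex], axis=1), y

    X_train, y_train = make_data(5000)
    X_test, y_test = make_data(5000)

    clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    print("clean test accuracy:      ", clf.score(X_test, y_test))

    X_rand = X_test.copy()
    X_rand[:, 0] = rng.permutation(X_rand[:, 0])  # break the simple feature only
    print("simple feature randomized:", clf.score(X_rand, y_test))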

View recording

The Back-And-Forth Method For Wasserstein Gradient Flows

Data Science Seminar

Wonjun Lee (University of Minnesota, Twin Cities)

You may attend the talk either in person in Walter 402 or register via Zoom. Registration is required to access the Zoom webinar.

We present a method to efficiently compute Wasserstein gradient flows. Our approach is based on a generalization of the back-and-forth method (BFM) introduced by Jacobs and Leger to solve optimal transport problems. We evolve the gradient flow by solving the dual problem to the JKO scheme. In general, the dual problem is much better behaved than the primal problem. This allows us to efficiently run large-scale gradient flow simulations for a large class of internal energies, including singular and non-convex energies.
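
For context, the duality being exploited can be sketched as follows (notation ours; see the paper for the precise statement). One JKO step with time step \tau is

    \rho^{k+1} = \operatorname{argmin}_{\rho} \; E(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho^k),

and Kantorovich duality for the cost |x - y|^2 / (2\tau) turns this into the concave maximization

    \sup_{\varphi} \; \int \varphi^c \, d\rho^k - E^*(-\varphi), \qquad
    \varphi^c(x) = \inf_y \left[ \frac{|x - y|^2}{2\tau} - \varphi(y) \right],

where E^* is the convex conjugate of the energy. The back-and-forth iteration ascends this dual objective with alternating updates of the potential \varphi and its c-transform.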

Joint work with Matt Jacobs (Purdue University) and Flavien Leger (INRIA Paris).

View recording

Speculations

Gunnar Carlsson (Stanford University)

Slides

I would like to talk about the interaction of traditional algebraic topology and homotopy theory with applied topology, and specifically describe some opportunities for better integration of "higher tech" techniques into applications.