# Prediction with Expert Advice: A PDE Perspective on a Model Problem from Online Machine Learning

Thursday, November 12, 2020 - 12:30pm - 1:15pm

Robert Kohn (New York University)

The area known as online machine learning considers problems where data becomes available sequentially and a prediction must be made at each time based on the information then available. In the model problem known as prediction with expert advice, a predictor has access to guidance from N experts, whose outcomes are chosen by an adversary. At each time step, the predictor must combine the experts' guidance using a mixed strategy, with the goal of minimizing his worst-case final-time shortfall (regret) with respect to the best-performing expert. Since the focus is on worst-case behavior, probabilistic modeling of the experts' history is not involved; rather, we are considering a sequential two-person zero-sum game involving the predictor and the adversary.

The optimal policies and the predictor's worst-case regret are determined by a dynamic programming principle. To understand the asymptotic behavior over many time steps, it is convenient to consider a scaled version of the dynamic programming principle. I'll discuss recent work with Nadejda Drenska, which characterizes the scaling limit of the worst-case regret as the unique viscosity solution of a certain nonlinear PDE.

The nonlinear PDE has been solved explicitly in a few special cases involving small numbers of experts (N=2,3,4). But the PDE viewpoint also suggests another approach, namely to seek upper and lower bounds using explicit supersolutions and subsolutions. I'll discuss recent work of this type with Vladimir Kobzar and Zhilei Wang. The upper bounds obtained this way are in some sense familiar, as they involve potential-based strategies for the predictor. The lower bounds are, however, more novel: they involve potential-based strategies for the adversary. When the potentials (i.e. supersolutions or subsolutions) are smooth enough, there is no need to consider a scaling limit, and no need for the theory of viscosity solutions; our upper and lower bounds follow directly from the unscaled dynamic programming principle using little more than Taylor expansion.

The optimal policies and the predictor's worst-case regret are determined by a dynamic programming principle. To understand the asymptotic behavior over many time steps, it is convenient to consider a scaled version of the dynamic programming principle. I'll discuss recent work with Nadejda Drenska, which characterizes the scaling limit of the worst-case regret as the unique viscosity solution of a certain nonlinear PDE.

The nonlinear PDE has been solved explicitly in a few special cases involving small numbers of experts (N=2,3,4). But the PDE viewpoint also suggests another approach, namely to seek upper and lower bounds using explicit supersolutions and subsolutions. I'll discuss recent work of this type with Vladimir Kobzar and Zhilei Wang. The upper bounds obtained this way are in some sense familiar, as they involve potential-based strategies for the predictor. The lower bounds are, however, more novel: they involve potential-based strategies for the adversary. When the potentials (i.e. supersolutions or subsolutions) are smooth enough, there is no need to consider a scaling limit, and no need for the theory of viscosity solutions; our upper and lower bounds follow directly from the unscaled dynamic programming principle using little more than Taylor expansion.