From stochastic search to bandit problems to dynamic programs: Fresh perspectives of some old problems

Monday, July 24, 2017 - 1:30pm - 2:00pm
Keller 3-180
Warren Powell (Princeton University)
Stochastic optimization comes in a wide variety of flavors under names such as stochastic search, dynamic programming, stochastic programming, stochastic control, ranking and selection and multiarmed bandit problems (to name just a few), with variations that use names such as simulation optimization, approximate dynamic programming, reinforcement learning and POMDPs. With careful notation, we can model all of these with one framework which requires making the transition from finding the best decision (a deterministic scalar or vector) to find the best policy (which is a function). While elegant, this framework disguises four important classes of problems that are distinguished by whether we are working with state dependent, or state-independent, functions, or whether we are maximizing the terminal or cumulative reward. This organization provides fresh perspectives for some familiar problems. We then address the challenge of searching over policies. We identify two core strategies for designing policies, each of which divide into two classes, producing four classes of policies. These four classes cover all of the different subfields of stochastic optimization. By drawing ideas from other fields, we open up new approaches for solving problems. Using a simple energy storage problem, we demonstrate that each of the four classes of policies may work best depending on the data, while hybrids may work even better.

Link to tutorial article: