Multi-armed bandit

Wednesday, December 5, 2018 - 1:30pm - 2:30pm
Mike Wei (University at Buffalo (SUNY))
We propose a minimax concave penalized multi-armed bandit algorithm under a generalized linear model (G-MCP-Bandit) for a decision-maker facing high-dimensional data in an online learning and decision-making process. We demonstrate that the G-MCP-Bandit algorithm asymptotically achieves the optimal O(log T) cumulative regret in the sample size T and further attains a tight O(log d) bound in the covariate dimension d.
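The abstract above describes a sparse high-dimensional bandit: only a few of the d covariates drive the reward, and a concave penalty keeps the parameter estimates sparse. A minimal sketch of this idea, using epsilon-greedy exploration and a lasso (soft-thresholding) penalty as a simpler stand-in for the MCP penalty of the actual G-MCP-Bandit algorithm, might look as follows. All parameter values and helper names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    # Coordinate-wise soft-thresholding: the proximal step of the lasso.
    # (G-MCP-Bandit uses the minimax concave penalty instead; lasso is a
    # simpler convex stand-in used here only for illustration.)
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_lasso(X, y, lam, steps=200):
    # Iterative soft-thresholding (ISTA) for the penalized least-squares fit.
    n, d = X.shape
    eta = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + 1e-9)  # step size
    beta = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - eta * grad, eta * lam)
    return beta

def sparse_bandit(T=400, d=20, K=2, eps=0.1, lam=0.05, seed=0):
    # Epsilon-greedy sparse linear bandit: only 2 of the d covariates
    # actually matter (true parameters are sparse by construction).
    rng = np.random.default_rng(seed)
    true_beta = [np.zeros(d) for _ in range(K)]
    true_beta[0][0], true_beta[1][1] = 1.0, 1.0
    data = [([], []) for _ in range(K)]          # (contexts, rewards) per arm
    beta_hat = [np.zeros(d) for _ in range(K)]
    regret = 0.0
    for t in range(T):
        x = rng.normal(size=d)                   # observe covariates
        means = [x @ b for b in true_beta]
        if t < K:
            a = t                                # pull each arm once
        elif rng.random() < eps:
            a = int(rng.integers(K))             # explore
        else:
            a = int(np.argmax([x @ b for b in beta_hat]))  # exploit
        r = means[a] + 0.1 * rng.normal()        # noisy reward
        regret += max(means) - means[a]
        data[a][0].append(x)
        data[a][1].append(r)
        if len(data[a][1]) >= 5:                 # refit the penalized estimate
            beta_hat[a] = ista_lasso(np.array(data[a][0]),
                                     np.array(data[a][1]), lam)
    return regret, beta_hat
```

Because the penalty zeroes out irrelevant coordinates, the estimation error scales with the number of truly active covariates rather than with d, which is the intuition behind the O(log d) dependence claimed above.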
Friday, September 15, 2017 - 3:00pm - 3:30pm
Yuhong Yang (University of Minnesota, Twin Cities)
In the practice of medicine, multiple treatments are often available for individual patients. Identifying the best treatment for a specific patient is very challenging because of patient heterogeneity. Multi-armed bandits with covariates provide a framework for designing effective treatment allocation rules that integrate learning from experimentation with maximizing benefit to the patients along the way.
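A toy version of such an allocation rule, assuming the covariate is a discrete patient subgroup and using epsilon-greedy exploration, could be sketched as below. The subgroup success probabilities are hypothetical numbers chosen only to make the example concrete; the talk does not specify a particular allocation rule.

```python
import random

def allocate(T=3000, groups=3, arms=2, eps=0.1, seed=1):
    # Per-subgroup epsilon-greedy treatment allocation: the best treatment
    # differs across patient subgroups, so the rule must condition on the
    # covariate (the subgroup) rather than pool all patients together.
    rng = random.Random(seed)
    # Hypothetical success rates: arm 1 is best for subgroup 0,
    # arm 0 is best for subgroup 1, and subgroup 2 is a tie.
    p = [[0.4, 0.6], [0.7, 0.3], [0.5, 0.5]]
    n = [[0] * arms for _ in range(groups)]   # pulls per (subgroup, arm)
    s = [[0] * arms for _ in range(groups)]   # successes per (subgroup, arm)
    successes = 0
    for _ in range(T):
        g = rng.randrange(groups)             # observe patient subgroup
        if rng.random() < eps or min(n[g]) == 0:
            a = rng.randrange(arms)           # explore (experimentation)
        else:                                 # exploit: best empirical rate
            a = max(range(arms), key=lambda k: s[g][k] / n[g][k])
        r = 1 if rng.random() < p[g][a] else 0
        n[g][a] += 1
        s[g][a] += r
        successes += r
    return successes / T, n
```

Note that the best single treatment pooled over all subgroups would succeed at most about 53% of the time under these rates, while conditioning on the covariate can approach the 60% oracle rate, which is exactly the benefit of integrating covariates into the bandit framework.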