Wednesday, December 5, 2018 - 1:30pm - 2:30pm
Mike Wei (University at Buffalo (SUNY))
We propose a minimax concave penalized multi-armed bandit algorithm under a generalized linear model (G-MCP-Bandit) for a decision-maker facing high-dimensional data in an online learning and decision-making process. We demonstrate that the G-MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret of O(log T) in the sample-size dimension T, and further attains a tight bound of O(log d) in the covariate dimension d.
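The "minimax concave penalized" part of the name refers to the minimax concave penalty (MCP) of Zhang (2010). The snippet below is only an illustrative sketch of that standard penalty. It is not the G-MCP-Bandit algorithm itself, whose GLM estimator and arm-selection rule the abstract does not describe. The function names `mcp_penalty` and `mcp_total` and the parameter names `lam` (regularization level) and `gamma` (concavity, typically taken greater than 1) are our own hypothetical choices.

```python
from typing import Iterable


def mcp_penalty(t: float, lam: float, gamma: float) -> float:
    """Standard minimax concave penalty for one coefficient t (illustrative)."""
    a = abs(t)
    if a <= gamma * lam:
        # Near zero the penalty grows like the lasso (slope lam),
        # then its slope tapers linearly to 0 at |t| = gamma * lam.
        return lam * a - a * a / (2.0 * gamma)
    # Beyond gamma * lam the penalty is constant, so large
    # coefficients are not shrunk further (reduced bias vs. lasso).
    return gamma * lam * lam / 2.0


def mcp_total(beta: Iterable[float], lam: float, gamma: float) -> float:
    """Hypothetical helper: sum the MCP over all coefficients of beta."""
    return sum(mcp_penalty(b, lam, gamma) for b in beta)


if __name__ == "__main__":
    # With lam = 1 and gamma = 3 the penalty saturates at 1.5 for |t| >= 3.
    print(mcp_total([0.0, 1.0, -10.0], lam=1.0, gamma=3.0))
```

The two branches meet at |t| = gamma * lam, where both equal gamma * lam^2 / 2, so the penalty is continuous. This flat tail is what distinguishes MCP from the lasso penalty lam * |t|.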