Multi-armed Bandits with Covariates

Abstract : Suppose that there are K (K>1) possibly biased coins available for play. At each time, you can choose one and only one coin to flip, and win a dollar if you get a head and receive nothing otherwise. Your goal is to obtain as much money as possible. Such a problem is called a multi-armed bandit problem. Bandit problems have applications in clinical trials, scheduling, and automated problem-solving in machine learning. In the literature, covariates, while available in most applications, are rarely considered. In this talk, I will present a non-parametric approach with a randomization technique for effectively utilizing the information in the covariates. A consistency property of the proposed approach is established.