Tutorial on Bandits Games
In the recent years the multi-armed bandit problem has attracted a lot of attention in the theoretical learning community. This growing interest is a consequence of the large number of problems that can be modeled as a multi-armed bandit: ad placement, website optimization, packet routing, ect. Furthermore the bandit methodology is also used as a building block for more complicated scenarios such as reinforcement learning, model selection in statistics, or computer game-playing. While the basic stochastic multi-armed bandit can be traced back to Thompson (1933) and Robbins (1952), it is only very recently that we obtained an (almost) complete understanding of this simple model. Moreover many extensions of the original problem have been proposed in the past fifteen years, such as bandits without a stochastic assumption (the so-called adversarial model), or bandits with a very large (but structured) set of arms.
The tutorial will be divided into three parts: