Tutorial on Bandits Games

This tutorial was presented at ALT 2011, ACML 2012, and SIGMETRICS 2014.


Pdf (slightly updated and shortened 2014 version: Pdf)


In recent years the multi-armed bandit problem has attracted a lot of attention in the theoretical learning community. This growing interest is a consequence of the large number of problems that can be modeled as a multi-armed bandit: ad placement, website optimization, packet routing, etc. Furthermore, the bandit methodology is also used as a building block for more complicated scenarios such as reinforcement learning, model selection in statistics, or computer game-playing. While the basic stochastic multi-armed bandit can be traced back to Thompson (1933) and Robbins (1952), it is only very recently that we obtained an (almost) complete understanding of this simple model. Moreover, many extensions of the original problem have been proposed in the past fifteen years, such as bandits without a stochastic assumption (the so-called adversarial model), or bandits with a very large (but structured) set of arms.

The tutorial will be divided into three parts:

  • In the first part we discuss the state-of-the-art results on the basic multi-armed bandit problem (both stochastic and adversarial).

  • In the second part the focus will be on continuously-armed stochastic bandits, with a Lipschitz assumption on the mean-payoff.

  • Finally, in the third part we consider the case of adversarial bandits, with a linear loss and a very large set of arms with some combinatorial structure.