Tutorial on Bandits Games

This tutorial was presented at ALT 2011.

Slides

Pdf

Abstract

In the recent years the multi-armed bandit problem has attracted a lot of attention in the theoretical learning community. This growing interest is a consequence of the large number of problems that can be modelized as a multi-armed bandit: web advertisement, dynamic pricing, online optimization, ect. Bandits algorithms are also used as building blocks in more complicated scenarios such as reinforcement learning, model selection problems, or games.

While the basic stochastic multi-armed bandit can be traced back to Robbins (1952), it is only very recently that we obtained an (almost) complete understanding of this simple model. Moreover many extensions of the original problem have been proposed, such as bandits without a stochastic assumption (the so-called adversarial model), or bandits with a very large (structured) set of arms.

The tutorial will be divided into three parts:

  • in the first part we discuss the state-of-the-art results on the basic multi-armed bandit problem (both stochastic and adversarial).

  • in the second part the focus will be on continuously-armed stochastic bandits, with a Lipschitz assumption on the mean-payoff.

  • finally in the third part we consider the case of adversarial bandits, with a linear loss and a very large set of arms with some combinatorial structure.