Algorithmic models of human decision making in Gaussian multi-armed bandit problems

Paul Reverdy, Vaibhav Srivastava and Naomi Ehrich Leonard

Proceedings of the European Control Conference, Strasbourg, France, 2014.
Winner of the Best Student Paper Award

We consider a heuristic Bayesian algorithm as a model of human decision making in multi-armed bandit problems with Gaussian rewards. We derive a novel upper bound on the Gaussian inverse cumulative distribution function and use it to show that the algorithm achieves logarithmic regret. We extend the algorithm to allow for stochastic decision making using Boltzmann action selection with a dynamic temperature parameter and provide a feedback rule for tuning the temperature parameter such that the stochastic algorithm achieves logarithmic regret. The stochastic algorithm encodes many of the observed features of human decision making.
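
To make the idea of Boltzmann action selection with a changing temperature concrete, here is a minimal Python sketch. It is not the algorithm from the paper: the function names, the zero-mean Gaussian prior, the known unit noise variance, the credible-limit quantile 1 - 1/(sqrt(2*pi*e) * t), and the cooling schedule 1/log(t + 1) are all assumptions chosen for illustration. In particular, the fixed cooling schedule merely stands in for the paper's feedback rule for the temperature, which the abstract does not specify.

```python
import math
import random
from statistics import NormalDist


def boltzmann_probs(values, temperature):
    """Softmax over values; a lower temperature puts more mass on the maximum."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def heuristic_values(sums, counts, t, prior_var=100.0, noise_var=1.0):
    """Upper credible limit per arm (hypothetical helper).

    Assumes a N(0, prior_var) prior on each mean reward and a known noise
    variance, so each posterior is Gaussian. The quantile level is an
    assumption for illustration, not taken from the source text.
    """
    level = 1.0 - 1.0 / (math.sqrt(2.0 * math.pi * math.e) * t)
    quantile = NormalDist().inv_cdf(level)
    values = []
    for s, n in zip(sums, counts):
        precision = 1.0 / prior_var + n / noise_var
        post_mean = (s / noise_var) / precision
        post_std = math.sqrt(1.0 / precision)
        values.append(post_mean + post_std * quantile)
    return values


def run(true_means, horizon, seed=0):
    """Simulate the sketch on Gaussian arms and return pull counts per arm."""
    rng = random.Random(seed)
    sums = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for t in range(1, horizon + 1):
        q = heuristic_values(sums, counts, t)
        # Placeholder cooling schedule (an assumption): selection becomes
        # less random as t grows.
        temperature = 1.0 / math.log(t + 1)
        probs = boltzmann_probs(q, temperature)
        arm = rng.choices(range(len(true_means)), weights=probs)[0]
        reward = rng.gauss(true_means[arm], 1.0)  # unit noise std
        sums[arm] += reward
        counts[arm] += 1
    return counts
```

Calling `run([0.0, 1.0, 2.0], 200)` returns the number of times each arm was pulled. Early on the selection probabilities are close to uniform because every arm has a wide credible limit; as the temperature falls and the posteriors sharpen, the choices concentrate on arms with high heuristic values.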
