Satisficing in Gaussian bandit problems

Paul Reverdy and Naomi Ehrich Leonard

Proceedings of IEEE Conference on Decision and Control, Los Angeles, CA, 2014.

Pdf of paper
We propose a satisficing objective for the multiarmed bandit problem, i.e., where the objective is to achieve performance above a given threshold. We show that this new problem is equivalent to a standard multi-armed bandit problem with a maximizing objective and use this equivalence to find bounds on performance in terms of the satisficing objective. For the special case of Gaussian rewards we show that the satisficing problem is equivalent to a related standard multiarmed bandit problem again with Gaussian rewards. We apply the Upper Credible Limit (UCL) algorithm to this standard problem and show how it achieves optimal performance in terms of the satisficing objective.

Back to home page
Back to publications page