Towards optimization of a human-inspired heuristic for solving explore-exploit problems
Paul Reverdy, Robert C. Wilson, Philip Holmes and Naomi E. Leonard
Proceedings of the IEEE Conference on Decision and Control, Maui, HI, 2012.
Motivated by models of human decision making, we consider a heuristic solution for
explore-exploit problems. In a numerical example we show that, with appropriate parameter values,
the algorithm performs well. However, the parameters of the algorithm trade off exploration against
exploitation in a complicated way, so finding the optimal parameter values is not straightforward.
We show that the optimal parameter values can be analytically computed in some cases and prove
that suboptimal parameter tunings can provide robustness to modeling error. The analytic results
suggest a feedback control law for dynamically optimizing parameters.