##
Surveillance in an Abruptly Changing World via Multiarmed Bandits

###
Vaibhav Srivastava, Paul Reverdy and Naomi Ehrich Leonard

*Proceedings of IEEE Conference on Decision and Control*, Los Angeles, CA, pp. 692-697,
2014.

Pdf of paper
We study a path planning problem in an environment that is abruptly changing due to the
arrival of unknown spatial events. The objective of the path planning problem is to collect the data
that is most evidential about the events. We formulate this problem as a multiarmed bandit (MAB)
problem with Gaussian rewards and change points, and address the fundamental tradeoff between
learning the true event (exploration), and collecting the data that is most evidential about
the true event (exploitation). We extend the switching-window UCB algorithm for MAB problems
with bounded rewards and change points to the context of correlated Gaussian rewards and develop
the switching-window UCL (SW-UCL) algorithm. We extend the SW-UCL algorithm to an adaptive SW-UCL
algorithm that utilizes statistical change detection to adapt the SW-UCL algorithm.
We also develop a block SW-UCL algorithm that reduces the number of transitions among arms in the
SW-UCL algorithm, and is more amenable to robotic applications.

Back to home page
Back to publications page