Evolution of Reinforcement Learning in Uncertain Environments: Emergence of Risk-Aversion and Matching
Reinforcement
learning (RL) is a fundamental process by which organisms learn to achieve a
goal from interactions with the environment. We use Artificial Life techniques
to derive (near-)optimal neuronal learning rules in a simple neural network
model of decision-making in simulated bumblebees foraging for nectar. The
resulting networks exhibit efficient RL, allowing the bees to respond rapidly
to changes in reward contingencies. Furthermore, the evolved synaptic
plasticity dynamics give rise to varying exploration/exploitation levels from
which emerge the well-documented foraging strategies of risk aversion and
probability matching. These strategies are shown to be a direct result of optimal RL,
providing a biologically founded, parsimonious, and novel explanation for these
behaviors. Our results are corroborated by a rigorous mathematical analysis and
by experiments with mobile robots.
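To make the probability-matching claim concrete, the following is a minimal sketch (not the paper's evolved network) of how matching can emerge from a simple value-proportional choice rule in a two-flower foraging task. The reward contingencies in p_reward, the learning rate alpha, and the Rescorla-Wagner-style update are illustrative assumptions rather than the authors' learning rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two flower types with different nectar-reward probabilities
# (hypothetical contingencies, not taken from the paper).
p_reward = np.array([0.8, 0.2])
V = np.array([0.5, 0.5])   # learned value estimates, one per flower type
alpha = 0.1                # learning rate (assumed)
choices = []

for t in range(5000):
    # Luce choice rule: pick each flower in proportion to its current value.
    p_choice = V / V.sum()
    c = rng.choice(2, p=p_choice)
    # Stochastic reward: nectar present with probability p_reward[c].
    r = float(rng.random() < p_reward[c])
    # Rescorla-Wagner-style update of the chosen flower's value.
    V[c] += alpha * (r - V[c])
    choices.append(c)

frac_rich = 1.0 - np.mean(choices[-1000:])
print(f"choice fraction for richer flower:  {frac_rich:.2f}")
print(f"matching prediction (reward ratio): {p_reward[0] / p_reward.sum():.2f}")
```

Under this rule the value estimates converge to the reward probabilities, so the choice fraction settles near the reward ratio (about 0.8) rather than at exclusive preference for the richer flower; this matching of choice proportions to reward proportions, instead of maximizing, is the behavior the abstract refers to.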