Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
Lantao Yu, Tianhe Yu, Chelsea Finn, Stefano Ermon
Reinforcement learning demands a reward function, which is often difficult to provide or design in real-world applications. While inverse reinforcement learning (IRL) holds promise for automatically learning reward functions from demonstrations, several major challenges remain. First, existing IRL methods learn reward functions from scratch, requiring large numbers of demonstrations to correctly infer the reward for each task the agent may need to perform. Second, and more subtly, existing methods typically assume demonstrations of a single, isolated behavior or task, while in practice it is significantly more natural and scalable to provide datasets of heterogeneous behaviors. To address these challenges, we propose a deep latent variable model that is capable of learning rewards from unstructured, multi-task demonstration data and, critically, of using this experience to infer robust rewards for new, structurally similar tasks from a single demonstration. Our experiments on multiple continuous control tasks demonstrate the effectiveness of our approach compared to state-of-the-art imitation and inverse reinforcement learning methods.
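To make the kind of model described above concrete, the sketch below pairs a demonstration encoder, which infers a latent task variable from a (state, action) trajectory, with a reward network conditioned on that latent, so that a single demonstration of a new task can be encoded and used to score states and actions. This is a minimal PyTorch-style sketch under our own assumptions: the module names (DemoEncoder, LatentReward), the Gaussian latent parameterization, and all dimensions are illustrative and are not taken from the paper.

```python
# Minimal sketch of a latent-variable reward model for multi-task IRL.
# Module names, dimensions, and the Gaussian latent parameterization are
# illustrative assumptions, not details taken from the paper.
import torch
import torch.nn as nn


class DemoEncoder(nn.Module):
    """Infers a latent task variable from a demonstration trajectory."""

    def __init__(self, obs_dim, act_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),  # mean and log-variance
        )

    def forward(self, demo_states, demo_actions):
        # demo_states: (T, obs_dim), demo_actions: (T, act_dim)
        feats = self.net(torch.cat([demo_states, demo_actions], dim=-1))
        return feats.mean(dim=0).chunk(2, dim=-1)  # pool over time steps


class LatentReward(nn.Module):
    """Reward r(s, a, m) conditioned on the inferred latent task variable m."""

    def __init__(self, obs_dim, act_dim, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action, latent):
        return self.net(torch.cat([state, action, latent], dim=-1)).squeeze(-1)


# Usage: encode a single demonstration of a new task, then score a batch of
# candidate (state, action) pairs under the inferred reward.
encoder = DemoEncoder(obs_dim=10, act_dim=4, latent_dim=8)
reward_fn = LatentReward(obs_dim=10, act_dim=4, latent_dim=8)
demo_s, demo_a = torch.randn(50, 10), torch.randn(50, 4)
mean, log_var = encoder(demo_s, demo_a)
m = mean + torch.randn_like(mean) * (0.5 * log_var).exp()  # reparameterized sample
rewards = reward_fn(torch.randn(32, 10), torch.randn(32, 4),
                    m.unsqueeze(0).expand(32, -1))
```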