Hierarchical Memory-Based Reinforcement Learning

Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)



Natalia Hernandez-Gardiol, Sridhar Mahadevan


Sridhar Mahadevan

Department of Computer Science

Michigan State University
East Lansing, MI 48824
mahadeva@cse.msu.edu

A key challenge for reinforcement learning is scaling up to large partially observable domains. In this paper, we show how a hierarchy of behaviors can be used to create and select among variable-length short-term memories appropriate for a task. At higher levels in the hierarchy, the agent abstracts over lower-level details and looks back over a variable number of high-level decisions in time. We formalize this idea in a framework called Hierarchical Suffix Memory (HSM). HSM uses a memory-based SMDP learning method to rapidly propagate delayed reward across long decision sequences. We describe a detailed experimental study comparing memory vs. hierarchy using the HSM framework on a realistic corridor navigation task.
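To make "memory-based SMDP learning" and "variable-length short-term memory" more concrete, here is a minimal sketch of one level of such a learner. It is not the authors' HSM algorithm: it assumes a Nearest-Sequence-Memory-style scheme in which Q-values are read off the k stored decisions whose recent history (observation/action suffix) best matches the current one, and it assumes delayed reward is propagated by a backward sweep of SMDP backups, $q \leftarrow r + \gamma^{\tau} \max_a Q(s', a)$, over the stored decision chain. The names `Step`, `suffix_match`, `q_estimate`, `state_value`, `backward_smdp_sweep`, and the parameters `k`, `gamma`, `alpha`, `max_len` are hypothetical. In a hierarchy, each level would keep its own memory of its own decisions, so a higher-level `Step` spans many primitive steps (`duration`) and a matched suffix looks further back in time.

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional, Sequence


@dataclass
class Step:
    """One decision stored at a single level of the hierarchy (hypothetical layout)."""
    obs: Hashable     # observation when the decision was made
    action: Hashable  # behavior chosen at this level
    reward: float     # discounted reward accumulated while that behavior ran
    duration: int     # primitive time steps the behavior lasted (tau)
    q: float = 0.0    # value estimate attached to this stored decision


def suffix_match(memory: Sequence[Step], i: int, j: int, max_len: int = 8) -> int:
    """Length of the shared history suffix ending at decisions i and j.

    Position 0 compares only observations (the action at i is being chosen);
    earlier positions compare both observations and actions.
    """
    n = 0
    while n < max_len and i - n >= 0 and j - n >= 0:
        a, b = memory[i - n], memory[j - n]
        if a.obs != b.obs or (n > 0 and a.action != b.action):
            break
        n += 1
    return n


def q_estimate(memory: Sequence[Step], t: int, action: Hashable,
               k: int = 3) -> Optional[float]:
    """Average q of the k stored decisions that took `action` and share the
    longest history suffix with decision t; None if none match at all."""
    scored = [(suffix_match(memory, t, j), memory[j].q)
              for j in range(len(memory)) if memory[j].action == action]
    scored = [s for s in scored if s[0] > 0]  # require current obs to match
    if not scored:
        return None
    scored.sort(key=lambda s: s[0], reverse=True)
    top = scored[:k]
    return sum(q for _, q in top) / len(top)


def state_value(memory: Sequence[Step], t: int, k: int = 3) -> float:
    """max_a Q(t, a) over actions seen in memory; 0.0 if nothing matches."""
    estimates = [q_estimate(memory, t, a, k) for a in {s.action for s in memory}]
    known = [e for e in estimates if e is not None]
    return max(known, default=0.0)


def backward_smdp_sweep(memory: List[Step], gamma: float = 0.9,
                        alpha: float = 1.0, k: int = 3) -> None:
    """SMDP backups applied from the newest decision back to the oldest, so a
    terminal reward reaches early decisions in one pass.
    Assumes the last stored decision ended the episode."""
    last = memory[-1]
    last.q += alpha * (last.reward - last.q)
    for j in range(len(memory) - 2, -1, -1):
        step = memory[j]
        target = step.reward + gamma ** step.duration * state_value(memory, j + 1, k)
        step.q += alpha * (target - step.q)


if __name__ == "__main__":
    # Three high-level decisions; only the last one is rewarded.
    memory = [Step("A", "go", 0.0, 2), Step("B", "go", 0.0, 3), Step("C", "stop", 1.0, 1)]
    backward_smdp_sweep(memory, gamma=0.9, alpha=1.0, k=1)
    print([round(s.q, 5) for s in memory])  # [0.59049, 0.729, 1.0]
```

Because the sweep runs newest-to-oldest, the reward at the final decision reaches the first one in a single pass, discounted by $\gamma^{1+3}$... more precisely by $\gamma^{\tau}$ for each intervening decision's duration ($0.9^3$ then $0.9^2$ here, giving $0.9^5 = 0.59049$). A forward, one-step update would need several episodes to move the same reward back that far.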