Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)
Bram Bakker
This paper presents reinforcement learning with a Long Short(cid:173) Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(,x) learning and directed exploration can solve non-Markovian tasks with long-term dependencies be(cid:173) tween relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task.