Transition Point Dynamic Programming

Part of Advances in Neural Information Processing Systems 6 (NIPS 1993)

Bibtex Metadata Paper


Kenneth Buckland, Peter Lawrence


Transition point dynamic programming (TPDP) is a memory(cid:173) based, reinforcement learning, direct dynamic programming ap(cid:173) proach to adaptive optimal control that can reduce the learning time and memory usage required for the control of continuous stochastic dynamic systems. TPDP does so by determining an ideal set of transition points (TPs) which specify only the control action changes necessary for optimal control. TPDP converges to an ideal TP set by using a variation of Q-Iearning to assess the mer(cid:173) its of adding, swapping and removing TPs from states throughout the state space. When applied to a race track problem, TPDP learned the optimal control policy much sooner than conventional Q-Iearning, and was able to do so using less memory.