Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)
Steven Bradtke, Michael Duff
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(lambda), Q-learning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.
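As a minimal sketch of the setting the abstract refers to, the discounted Bellman optimality equation for a semi-Markov Decision Problem, together with a Q-learning-style sample update consistent with it, can be written as follows; the notation here (discount rate beta, random transition time tau, reward rate r, step size alpha_k) is introduced for illustration and is not fixed by the abstract itself.

% Hedged sketch: Bellman optimality equation for a discounted SMDP,
% with continuous-time discounting at rate \beta > 0, transition time \tau,
% and reward accrued at rate r(s,a) until the next decision epoch.
\[
  V^{*}(s) \;=\; \max_{a}\;
  \mathbb{E}\!\left[\,\int_{0}^{\tau} e^{-\beta t}\, r(s,a)\, dt
  \;+\; e^{-\beta \tau}\, V^{*}(s') \;\middle|\; s, a \right]
\]
% An assumed Q-learning-style update matching this equation, applied after
% observing a transition (s, a) -> s' of duration \tau with reward rate r(s,a):
\[
  Q(s,a) \;\leftarrow\; Q(s,a) \;+\; \alpha_k \left[
  \frac{1 - e^{-\beta \tau}}{\beta}\, r(s,a)
  \;+\; e^{-\beta \tau} \max_{a'} Q(s',a') \;-\; Q(s,a) \right]
\]

The factor (1 - e^{-beta tau})/beta plays the role that a single-step reward plays in the discrete-time update, and e^{-beta tau} replaces the fixed discount factor; this is the standard way continuous-time discounting enters such updates, stated here only as an illustrative sketch.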