Local Bandit Approximation for Optimal Learning Problems

Duff, Michael; Barto, Andrew

Local Bandit Approximation for Optimal Learning Problems

Michael O. Duff, Andrew G. Barto

Advances in Neural Information Processing Systems 9 (NIPS 1996)

Abstract

In general, procedures for determining Bayes-optimal adaptive controls for Markov decision processes (MDP's) require a pro(cid:173) hibitive amount of computation-the optimal learning problem is intractable. This paper proposes an approximate approach in which bandit processes are used to model, in a certain "local" sense, a given MDP. Bandit processes constitute an important subclass of MDP's, and have optimal learning strategies (defined in terms of Gittins indices) that can be computed relatively efficiently. Thus, one scheme for achieving approximately-optimal learning for gen(cid:173) eral MDP's proceeds by taking actions suggested by strategies that are optimal with respect to local bandit models.

Abstract

Name Change Policy