Fitted Q-iteration in continuous action-space MDPs

Antos, András; Szepesvári, Csaba; Munos, Rémi

Advances in Neural Information Processing Systems 20 (NIPS 2007)

Abstract

We consider continuous-state, continuous-action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by another policy. We study a variant of fitted Q-iteration in which greedy action selection is replaced by a search over a restricted set of candidate policies for the policy that maximizes the average action value. We provide a rigorous theoretical analysis of this algorithm, proving what we believe are the first finite-time bounds for value-function-based algorithms for continuous state- and action-space problems.
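
The loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it omits details the paper may rely on. Everything concrete here is an assumption made for illustration: the one-dimensional toy dynamics and reward in `collect_trajectory`, the quadratic feature map `features`, the candidate policy class `pi_theta(x) = clip(theta * x, -1, 1)` over a finite grid of gains, and plain least squares as the regression step. The sketch alternates between choosing the candidate policy with the highest average action value on the batch and refitting Q to one-step targets that use that policy's actions in place of a greedy maximum over actions.

```python
import numpy as np

def features(x, a):
    """Hypothetical quadratic features; Q(x, a) = features(x, a) @ w."""
    x, a = np.asarray(x, dtype=float), np.asarray(a, dtype=float)
    return np.stack([np.ones_like(x), x, a, x * a, x**2, a**2], axis=-1)

def policy_action(theta, x):
    """Candidate policy (assumed class): pi_theta(x) = clip(theta * x, -1, 1)."""
    return np.clip(theta * x, -1.0, 1.0)

def collect_trajectory(n, rng):
    """Single trajectory under a uniform-random behaviour policy.

    Toy model (an assumption): x' = 0.5 * (x + a), reward = -(x')**2.
    """
    x = np.empty(n + 1)
    x[0] = 0.5
    a = rng.uniform(-1.0, 1.0, size=n)
    for t in range(n):
        x[t + 1] = 0.5 * (x[t] + a[t])
    r = -x[1:] ** 2
    return x[:-1], a, r, x[1:]

def best_candidate(w, x, thetas):
    """Policy step: maximize the batch-average action value over the candidates."""
    avg_values = [np.mean(features(x, policy_action(th, x)) @ w) for th in thetas]
    return thetas[int(np.argmax(avg_values))]

def fitted_q_with_policy_search(x, a, r, x_next, thetas, gamma=0.9, iters=10):
    """Alternate policy search and least-squares regression on the fixed batch."""
    w = np.zeros(6)          # Q_0 = 0
    phi = features(x, a)     # regression inputs never change across iterations
    for _ in range(iters):
        theta = best_candidate(w, x, thetas)
        # One-step targets use the selected policy's action, not a max over actions.
        next_q = features(x_next, policy_action(theta, x_next)) @ w
        targets = r + gamma * next_q
        w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return w, best_candidate(w, x, thetas)

rng = np.random.default_rng(0)
batch = collect_trajectory(500, rng)
w, theta = fitted_q_with_policy_search(*batch, thetas=np.linspace(-2.0, 2.0, 41))
print("selected policy gain:", theta)
```

In this toy model the action `a = -x` sends the next state to zero, so the best gain in the candidate class is `theta = -1`. Restricting the search to a policy class avoids solving a maximization over a continuous action set at every sample, which is the step that makes ordinary greedy fitted Q-iteration awkward in continuous action spaces.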
