Actor-Critic Algorithms

Vijay R. Konda, John N. Tsitsiklis

Advances in Neural Information Processing Systems 12 (NIPS 1999)

Abstract

We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-time-scale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should span a subspace prescribed by the choice of parameterization of the actor. We conclude by discussing convergence properties and some open problems.
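The following is a minimal sketch of the two-time-scale scheme described above, not the authors' exact algorithm. Everything concrete in it is an assumption made for illustration: a hypothetical 2-state, 2-action MDP (`P`, `R`), a tabular softmax actor, an average-reward TD(0) critic (one simple instance of TD learning), and hand-picked step-size schedules. The "subspace prescribed by the actor" is taken here to be the span of psi(s, a) = grad_theta log pi_theta(a|s), the usual "compatible features" reading; the critic uses psi plus one-hot state indicators, so its feature span contains psi. The actor step size decays faster than the critic's (beta/alpha -> 0), which is what makes the critic the fast time scale.

```python
import math
import random

# Hypothetical toy MDP (not from the paper): P[s][a][s'] transition probs, R[s][a] rewards.
P = [[[0.3, 0.7], [0.7, 0.3]],
     [[0.6, 0.4], [0.4, 0.6]]]
R = [[1.0, 0.0],
     [0.0, 1.0]]
N_S, N_A = 2, 2


def policy(theta, s):
    """Softmax policy pi_theta(.|s) with one logit theta[s][a] per state-action pair."""
    m = max(theta[s])
    exps = [math.exp(z - m) for z in theta[s]]
    total = sum(exps)
    return [e / total for e in exps]


def psi(theta, s, a):
    """Compatible features: grad_theta log pi_theta(a|s), flattened to length N_S * N_A."""
    pi = policy(theta, s)
    g = [0.0] * (N_S * N_A)
    for b in range(N_A):
        g[s * N_A + b] = (1.0 if b == a else 0.0) - pi[b]
    return g


def critic_features(theta, s, a):
    """Critic features: psi plus one-hot state indicators, so the span contains psi."""
    return psi(theta, s, a) + [1.0 if i == s else 0.0 for i in range(N_S)]


def sample(probs, rng):
    """Draw an index from a discrete distribution."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1


def average_reward(theta, iters=500):
    """Long-run average reward of pi_theta via power iteration on the induced state chain."""
    d = [1.0 / N_S] * N_S
    for _ in range(iters):
        nd = [0.0] * N_S
        for s in range(N_S):
            pi = policy(theta, s)
            for a in range(N_A):
                for s2 in range(N_S):
                    nd[s2] += d[s] * pi[a] * P[s][a][s2]
        d = nd
    return sum(d[s] * sum(p * R[s][a] for a, p in enumerate(policy(theta, s)))
               for s in range(N_S))


def actor_critic(steps=20000, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_A for _ in range(N_S)]
    w = [0.0] * (N_S * N_A + N_S)  # linear critic weights
    rho = 0.0                      # running estimate of the average reward
    s = 0
    a = sample(policy(theta, s), rng)
    for k in range(steps):
        alpha = 0.1 / (1 + k / 1000) ** 0.6  # fast (critic) step size
        beta = 0.05 / (1 + k / 1000)         # slow (actor) step size; beta/alpha -> 0
        r = R[s][a]
        s2 = sample(P[s][a], rng)
        a2 = sample(policy(theta, s2), rng)
        phi = critic_features(theta, s, a)
        q = sum(wi * fi for wi, fi in zip(w, phi))
        q2 = sum(wi * fi for wi, fi in zip(w, critic_features(theta, s2, a2)))
        delta = r - rho + q2 - q  # average-reward TD(0) error
        rho += alpha * (r - rho)
        w = [wi + alpha * delta * fi for wi, fi in zip(w, phi)]
        # Actor: approximate gradient step along Q_w(s, a) * psi(s, a).
        g = psi(theta, s, a)
        for b in range(N_A):
            theta[s][b] += beta * q * g[s * N_A + b]
        s, a = s2, a2
    return theta
```

For example, `average_reward(actor_critic())` can be compared with `average_reward([[0.0, 0.0], [0.0, 0.0]])` (the uniform policy). Because psi has zero mean under pi_theta(.|s), the state-indicator part of the critic acts only as a baseline and does not bias the actor's expected update direction.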
