An Actor/Critic Algorithm that is Equivalent to Q-Learning

Part of Advances in Neural Information Processing Systems 7 (NIPS 1994)



Robert Crites, Andrew Barto


We prove the convergence of an actor/critic algorithm that is equivalent to Q-learning by construction. Its equivalence is achieved by encoding Q-values within the policy and value function of the actor and critic. The resultant actor/critic algorithm is novel in two ways: it updates the critic only when the most probable action is executed from any given state, and it rewards the actor using criteria that depend on the relative probability of the action that was executed.
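The central idea of the abstract, encoding Q-values within the actor's policy and the critic's value function, can be illustrated with a small sketch. The code below is not the paper's algorithm; it runs plain tabular Q-learning on a hypothetical toy environment and then decomposes the learned Q-table into a critic value V(s) = max_a Q(s, a) and actor preferences p(s, a) = Q(s, a) - V(s), showing that the actor/critic pair carries exactly the same information as the Q-values.

```python
import random

# Minimal sketch (toy environment assumed for illustration only):
# tabular Q-learning followed by a decomposition of Q into a critic
# value function V and actor preferences p, so that
#   Q(s, a) = V(s) + p(s, a)   with   max_a p(s, a) = 0.

random.seed(0)
N_STATES, N_ACTIONS = 4, 2
GAMMA, ALPHA = 0.9, 0.2

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(s, a):
    """Hypothetical deterministic environment: cycle through states,
    with a reward of 1 for re-entering state 0."""
    s2 = (s + a + 1) % N_STATES
    return s2, (1.0 if s2 == 0 else 0.0)

s = 0
for _ in range(5000):
    a = random.randrange(N_ACTIONS)          # exploratory behavior policy
    s2, r = step(s, a)
    # Standard Q-learning temporal-difference update.
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
    s = s2

# Decompose Q into critic and actor components.
V = [max(row) for row in Q]                  # critic: V(s) = max_a Q(s, a)
p = [[q - V[i] for q in row]                 # actor: preferences relative
     for i, row in enumerate(Q)]             # to the critic's value

# The pair (V, p) encodes Q exactly, and the most preferred
# ("most probable") action in each state has preference 0.
for i, row in enumerate(Q):
    for a, qv in enumerate(row):
        assert abs(qv - (V[i] + p[i][a])) < 1e-12
    assert max(p[i]) == 0.0
```

Under this encoding, the critic's update target coincides with the Q-learning target exactly when the greedy (most preferred) action is executed, which is the situation in which the paper's construction updates the critic.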