Policy Evaluation Using the Ω-Return

Thomas, Philip; Niekum, Scott; Theocharous, Georgios; Konidaris, George

Advances in Neural Information Processing Systems 28 (NIPS 2015)

Abstract

We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different length returns. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.
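To make the contrast in the abstract concrete, here is a minimal sketch (not the authors' implementation) of how a set of n-step returns can be combined with two different weightings. The first is the usual episodic λ-return weighting. The second is the textbook minimum-variance combination of estimators that share a mean, given a covariance matrix; it is included only as an assumption about what "accounting for the correlation of different length returns" can look like, and it is not claimed to match the paper's definition or its approximation. All function names, and the example covariance matrix, are hypothetical.

```python
import numpy as np


def n_step_returns(rewards, values, gamma):
    """n-step returns from time 0 for n = 1..L, where L = len(rewards).

    values[k] is the estimated value of the state at time k and must
    have length L + 1 (use 0.0 for a terminal state at the end).
    """
    L = len(rewards)
    returns = np.empty(L)
    discounted_sum = 0.0
    for n in range(1, L + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        returns[n - 1] = discounted_sum + gamma ** n * values[n]
    return returns


def lambda_weights(L, lam):
    """Episodic lambda-return weights: (1 - lam) * lam**(n - 1) for n < L,
    with the remaining mass lam**(L - 1) on the longest return."""
    w = (1.0 - lam) * lam ** np.arange(L)
    w[-1] = lam ** (L - 1)
    return w


def min_variance_weights(cov):
    """Weights C^{-1} 1 / (1^T C^{-1} 1) for a covariance matrix C.

    Assumption: every n-step return is treated as an unbiased estimate
    of the same quantity, so this is the minimum-variance combination.
    """
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


# Hypothetical three-step episode with zero value estimates.
rewards = [1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 0.0]
G = n_step_returns(rewards, values, gamma=1.0)

lambda_return = lambda_weights(len(G), lam=0.5) @ G

# Made-up covariance: longer returns are noisier and positively correlated.
cov = np.array([[1.0, 0.5, 0.5],
                [0.5, 2.0, 1.5],
                [0.5, 1.5, 3.0]])
covariance_weighted_return = min_variance_weights(cov) @ G
```

The λ weights depend only on the return length, while the covariance-based weights change whenever the assumed variances or correlations change, which is the distinction the abstract draws.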