Width-based Lookaheads with Learnt Base Policies and Heuristics Over the Atari-2600 Benchmark

Part of Advances in Neural Information Processing Systems 34 pre-proceedings (NeurIPS 2021)

Paper Supplemental

Bibtek download is not available in the pre-proceeding


Authors

Stefan O'Toole, Nir Lipovetzky, Miquel Ramirez, Adrian Pearce

Abstract

We propose new width-based planning and learning algorithms applied over the Atari-2600 benchmark. The algorithms presented are inspired from a careful analysis of the design decisions made by previous width-based planners. We benchmark our new algorithms over the Atari-2600 games and show that our best performing algorithm, RIW$_C$+CPV, outperforms previously introduced width-based planning and learning algorithms $\pi$-IW(1), $\pi$-IW(1)+ and $\pi$-HIW(n, 1). Furthermore, we present a taxonomy of the set of Atari-2600 games according to some of their defining characteristics. This analysis of the games provides further insight into the behaviour and performance of the width-based algorithms introduced. Namely, for games with large branching factors, and games with sparse meaningful rewards, RIW$_C$+CPV outperforms $\pi$-IW, $\pi$-IW(1)+ and $\pi$-HIW(n, 1).