Approximate Policy Iteration with a Policy Language Bias

Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)


Alan Fern, Sungwook Yoon, Robert Givan


We explore approximate policy iteration, replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs.
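
To make the idea of a learning step in policy space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical deterministic chain world with states 0..N and a goal at N, and it uses simple threshold rules as a stand-in for a restricted policy language. All names (`step`, `rollout_cost`, `threshold_policy`, `approximate_policy_iteration`) are illustrative assumptions. Each iteration uses rollouts to find an improved action at every sampled state, then selects the policy in the restricted class that best agrees with those actions, so no cost function is ever fitted.

```python
from typing import Callable

# Hypothetical toy problem: a chain of states 0..N with the goal at N.
N = 6
ACTIONS = (-1, +1)   # move left or move right
HORIZON = 20         # rollout length cap

Policy = Callable[[int], int]


def step(state: int, action: int) -> int:
    """Deterministic transition, clipped to the ends of the chain."""
    return min(max(state + action, 0), N)


def rollout_cost(state: int, action: int, policy: Policy) -> int:
    """Steps to reach the goal when taking `action` first, then following
    `policy`, capped at HORIZON."""
    state = step(state, action)
    cost = 1
    while state != N and cost < HORIZON:
        state = step(state, policy(state))
        cost += 1
    return cost


def threshold_policy(t: int) -> Policy:
    """Restricted policy class (assumption): go right iff state >= t."""
    return lambda s: +1 if s >= t else -1


def approximate_policy_iteration(iterations: int = 10) -> int:
    """Return the threshold of the learned policy."""
    t = N  # initial policy: go left in every non-goal state
    states = range(N)  # non-goal states used as training examples
    for _ in range(iterations):
        policy = threshold_policy(t)
        # Policy evaluation by rollout: label each state with the action
        # of lowest rollout cost, keeping the current action on ties.
        labels = {
            s: min(ACTIONS,
                   key=lambda a: (rollout_cost(s, a, policy), a != policy(s)))
            for s in states
        }
        # Learning step in policy space: pick the threshold rule that
        # agrees with the most labels (a tiny supervised learning problem).
        t = max(range(N + 1),
                key=lambda c: sum(threshold_policy(c)(s) == labels[s]
                                  for s in states))
    return t


if __name__ == "__main__":
    print("learned threshold:", approximate_policy_iteration())
```

In this toy setting the threshold moves one state closer to 0 per iteration until the policy moves right everywhere. The restricted policy class plays the role of the language bias: the learner only ever considers policies expressible as a threshold rule.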