Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Pieter Abbeel, Andrew Ng
First-order Markov models have been successfully applied to many prob- lems, for example in modeling sequential data using Markov chains, and modeling control problems using the Markov decision processes (MDP) formalism. If a first-order Markov model’s parameters are estimated from data, the standard maximum likelihood estimator considers only the first-order (single-step) transitions. But for many problems, the first- order conditional independence assumptions are not satisfied, and as a re- sult the higher order transition probabilities may be poorly approximated. Motivated by the problem of learning an MDP’s parameters for control, we propose an algorithm for learning a first-order Markov model that ex- plicitly takes into account higher order interactions during training. Our algorithm uses an optimization criterion different from maximum likeli- hood, and allows us to learn models that capture longer range effects, but without giving up the benefits of using first-order Markov models. Our experimental results also show the new algorithm outperforming conven- tional maximum likelihood estimation in a number of control problems where the MDP’s parameters are estimated from data.