Christopher Atkeson, Jun Morimoto
A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.