Learning to Control an Octopus Arm with Gaussian Process Temporal Difference Methods

Engel, Yaakov; Szabo, Peter; Volkinshtein, Dmitry

Advances in Neural Information Processing Systems 18 (NIPS 2005)

Abstract

The Octopus arm is a highly versatile and complex limb. How the Octopus controls such a hyper-redundant arm (not to mention eight of them!) is as yet unknown. Robotic arms based on the same mechanical principles may render present-day robotic arms obsolete. In this paper, we tackle this control problem using an online reinforcement learning algorithm, based on a Bayesian approach to policy evaluation known as Gaussian process temporal difference (GPTD) learning. Our substitute for the real arm is a computer simulation of a 2-dimensional model of an Octopus arm. Even with the simplifications inherent to this model, the state space we face is a high-dimensional one. We apply a GPTD-based algorithm to this domain, and demonstrate its operation on several learning tasks of varying degrees of difficulty.
