Robust exploration in linear quadratic reinforcement learning

Part of Advances in Neural Information Processing Systems 32 (NeurIPS 2019)


Jack Umenberger, Mina Ferizbegovic, Thomas B. Schön, Håkan Hjalmarsson


Learning to make decisions in an uncertain and dynamic environment is a task of fundamental importance in a number of domains. This paper concerns the problem of learning control policies for an unknown linear dynamical system so as to minimize a quadratic cost function. We present a method, based on convex optimization, that accomplishes this task ‘robustly’, i.e., the worst-case cost, accounting for system uncertainty given the observed data, is minimized. The method balances exploitation and exploration, exciting the system in such a way as to reduce uncertainty in the model parameters to which the worst-case cost is most sensitive. The approach is demonstrated in numerical simulations and on a hardware-in-the-loop servo-mechanism; in both, it shows appreciable performance and robustness gains over alternative methods.
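
For orientation, the problem described in the abstract can be written in the standard linear quadratic form. This is only a sketch: the symbols below (state x_t, input u_t, disturbance w_t, cost weights Q and R, horizon T, policy \pi, data \mathcal{D}, and uncertainty set \Theta) are assumed notation and are not taken from the abstract, and the paper's exact formulation (for example, its horizon and its construction of the uncertainty set) may differ.

```latex
% Assumed notation: unknown parameters (A, B), weights Q \succeq 0 and R \succ 0.
% Linear dynamics and quadratic cost:
x_{t+1} = A x_t + B u_t + w_t,
\qquad
J(\pi; A, B) = \mathbb{E}\left[ \sum_{t=0}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t \right) \right]

% Robust objective: minimize the worst-case cost over the set of models
% \Theta(\mathcal{D}) that remain plausible given the observed data \mathcal{D}.
\min_{\pi} \; \sup_{(A, B) \in \Theta(\mathcal{D})} J(\pi; A, B)
```

Under this reading, exploration corresponds to choosing inputs that shrink \Theta(\mathcal{D}) in the directions where the worst-case cost is most sensitive, while exploitation corresponds to keeping the cost incurred along the way low.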