NeurIPS 2020

On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Meta Review

This paper studies Robust Adversarial Reinforcement Learning (RARL), a recent method by Pinto et al., in the linear quadratic setting (linear dynamics, quadratic cost function), which is a typical starting point in the analysis of optimal control algorithms. The paper examines the stabilization behavior of the linear controller and shows that RARL exhibits instabilities in this simplified setting. It then proposes a new formulation of RARL in the linear quadratic setting, which can inform solutions in the nonlinear setting, and provides stability guarantees for the proposed method. In the post-rebuttal discussion, 3/4 reviewers evaluated the paper highly and recommended that it be accepted. 2/4 reviewers expressed concerns about the limited novelty of the paper, as it relies heavily on [30, 44, 10]. I agree that the paper makes a significant and interesting enough contribution in pointing out the instabilities of RARL and addressing them in the linear quadratic setting, which in my view is sufficient for publication at NeurIPS.