NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The paper studies the certainty-equivalence (aka plug-in) principle for linear quadratic systems (LQR and LQG). The main result is a "fast rate" style of guarantee: estimating the model parameters to accuracy epsilon yields a policy whose suboptimality gap (in cost) is O(epsilon^2). There are several things to like here: results for LQG, a deeper understanding of plug-in estimators, and an honest, detailed comparison to prior work. One concern raised by the reviewers is that the pre-multipliers (constant factors) in the results are relatively large, in particular larger than those in Dean et al., and difficult to interpret. The paper is candid about this in the discussion, and a larger constant factor is confirmed by prior empirical results, so I do not view this as a deal-breaker. (As an aside, it might be nice to include that experiment in this paper so that it is more immediately accessible to readers.)
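To make the "fast rate" intuition concrete, here is a toy illustration, not the paper's construction or proof: a scalar LQR system x_{t+1} = a x_t + b u_t with cost sum_t (q x_t^2 + r u_t^2). The plant values (a = 1.2, b = 1, q = r = 1) and the perturbation a_hat = a + eps, b_hat = b - eps standing in for estimation error are hypothetical choices. The sketch designs the gain on the estimates by solving the scalar Riccati equation, runs that gain on the true system, and reports the cost gap. Because the optimal gain is a stationary point of the closed-loop cost, an O(eps) error in the gain costs only O(eps^2).

```python
import math


def riccati_scalar(a: float, b: float, q: float, r: float) -> float:
    """Stabilizing solution P > 0 of the scalar discrete algebraic Riccati equation
    P = q + a^2 P - (a b P)^2 / (r + b^2 P), assuming b != 0, q > 0, r > 0."""
    # Rearranged: b^2 P^2 + (r - q b^2 - a^2 r) P - q r = 0; take the positive root.
    c1 = r - q * b * b - a * a * r
    return (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)


def lqr_gain(a: float, b: float, q: float, r: float) -> float:
    """Optimal feedback gain k (u_t = -k x_t) for the model (a, b)."""
    p = riccati_scalar(a, b, q, r)
    return a * b * p / (r + b * b * p)


def closed_loop_cost(k: float, a: float, b: float, q: float, r: float) -> float:
    """Cost coefficient J(k) = sum_t (q x_t^2 + r u_t^2) / x_0^2 of running
    u = -k x on the TRUE system (a, b); infinite if k does not stabilize it."""
    m = a - b * k  # closed-loop multiplier: x_{t+1} = m x_t
    if abs(m) >= 1.0:
        return math.inf
    return (q + r * k * k) / (1.0 - m * m)


def plug_in_gap(eps: float, a: float = 1.2, b: float = 1.0,
                q: float = 1.0, r: float = 1.0) -> float:
    """Suboptimality of the certainty-equivalent gain designed on the
    hypothetical estimates a_hat = a + eps, b_hat = b - eps."""
    k_star = lqr_gain(a, b, q, r)
    k_hat = lqr_gain(a + eps, b - eps, q, r)  # plug-in design on the estimates
    return closed_loop_cost(k_hat, a, b, q, r) - closed_loop_cost(k_star, a, b, q, r)


if __name__ == "__main__":
    for eps in (0.04, 0.02, 0.01, 0.005):
        gap = plug_in_gap(eps)
        print(f"eps={eps:<6} gap={gap:.3e} gap/eps^2={gap / eps**2:.3f}")
```

Halving eps should shrink the gap by roughly a factor of four, so gap/eps^2 stays roughly constant. The scalar case is used only to keep the sketch dependency-free; matrix LQR or LQG would need a general Riccati solver instead.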