NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
1. Originality: The paper considers standard control setting in LQR and LQG with unknown dynamics, and standard algorithm---certainty equivalence. A key part of the analysis is based on perturbation bounds for discrete Riccati equations, which uses an existing results from Konstantinov et al (the authors provided a new analysis but at the cost of a stronger assumption). Finite sample analysis of certainty equivalence method is not new in RL literature, especially for tabular MDPs. In my perspective, the survey on related works on sample complexity of certainty equivalence methods for different models other than LQR should be included in related work for discussion. 2. Quality: the submission is technically sound and the results are well supported by theoretical statements. The authors also carefully compared their results to previous results using a robust control approach to account model uncertainty. The result is that certainty equivalence method could perform better under the regime of small model estimation error, but could be more sensitive to model error. Would it be possible to conduct experiments to verify and to better illustrate the tradeoff between these two? 3. clarity: I think the paper is well written and organized. 4. Significance: The finite sample analysis of certainty equivalence in LQR and LQG was missing previously and the paper fills this gaps. While the analysis techniques used in paper existed before (e.g., Riccati Perturbation Theory), the final result is new to me.
Reviewer 2
The result of theorems 1, 2 and 3 hold when the error in the parameter estimation is “sufficiently” small. However, the regime that these theorems allow for estimation error is almost never practical. For example, Theorem 2 holds as long as the right hand side is smaller than \sigma^2_w. Consider a system with d=l=4 and \beta = \Gamma^* = \tau=2. This requires \varepsilon=2^{-23} which is not possible for most practical purposes. As the comparison in Section 2.1 suggests, the constants behind the main results of this paper are significantly larger than that of Theorem 4.1 of Dean et. al (2017). Despite the faster rate on , the price paid for the constants is too much to call it an improvement upon Dean et. al (2017). In the online control setting, it seems that the idea of using the offline algorithm with \epsilon-greedy exploration should work, it would be nice if the authors can provide a more rigorous statement and proof for Corollary 1. It would also have been useful to see some numerical results to get a better sense of the theoretical results. Minor Comments: I couldn’t find the proof of Theorem 2. Also, I suggest to write the exact universal constant for Theorem 2 and 3 rather than using O(1) notation. Page 2, line 72: \underline{\sigma} is not defined upto this point. I think the first time it is defined is line 105. In Assumption 2, it is assumed that the system is (l,\vartheta)-controllable. but what are (l,\vartheta)? Page 6, line 232: What is “dare”? I don’t think this is defined before.
Reviewer 3
This paper studies the performance of certainty equivalence principle, i.e., fitting a model from given data and designing a controller using the fitted model, in Linear Quadratic systems. Originality: The results rely critically on the extension of an existing result in perturbation theory of discrete Riccati equations. The bounds reflect explicit dependence on system level parameters, unlike previous literature. Quality and Clarity: I have checked some of the proofs and they seem correct to me. The paper is easy to follow, and the statement of the main results make it clear to follow the next steps. Significance: I feel that the contributions are reasonably significant to be published at NeurIPS. The main reason is that it connects seamlessly to past work on system identification.