Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
The topic of this paper is highly relevant, since stability guarantees are often sought for learned policies. While I'm generally excited about the approach, the paper does not address aspects that might make the technique applicable to domains with non-smooth dynamics (e.g. legged locomotion) or unknown dynamics (e.g. a real robot, or even a physics simulator). The condition in Eq. 1 is a strong one, and I understand the theoretical need for it. However, I'm wondering whether the proposed technique retains any value when this condition is not met. Will including the Lyapunov risk as a term in the cost function yield feedback controllers that are more robust in practice, even for non-smooth systems?

A few more detailed comments:
- You mention that ReLU-based networks cannot be supported from a theoretical point of view. Does this matter in practice?
- Have you tried your approach on any more complex systems, or with a learned (fitted) dynamics model rather than a given one?
- Please provide learning curves for the various experiments. As a reader, I have no idea how hard it is to learn the stability guarantees compared to just optimizing a normal feedback controller. How often does it fail? Is failure to find a stable controller more common for larger (more-DoF) systems?

Originality: Good. This paper combines Lyapunov theory and learning-based control. The basic algorithm (without the SMT solver) appears relatively straightforward to add to an existing setup, given differentiable dynamics.
Quality: OK. The experiments seem difficult to reproduce with the information provided.
Clarity: OK. The writing itself is of high quality, but the paper could be better organized; as a reader, I'm missing a concise overview of how the algorithm works.
Significance: High. If stability guarantees can be obtained as part of the learning process for non-trivial tasks, that is a significant step forward.
This paper appears to be a step in the right direction.
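The question raised above, whether minimizing the Lyapunov risk as a soft penalty helps even when the guarantees do not apply, can be made concrete with a minimal sketch. This is pure Python with illustrative names (`lyapunov_risk`, `V`, `Vdot`); the paper's exact loss may differ:

```python
def lyapunov_risk(V, Vdot, states):
    # Empirical Lyapunov risk over sampled states: penalize V(x) <= 0
    # and Vdot(x) >= 0, and anchor V at the equilibrium x = 0.
    # Illustrative form only; the paper's exact loss may differ.
    violations = sum(max(0.0, -V(x)) + max(0.0, Vdot(x)) for x in states)
    return violations / len(states) + V(0.0) ** 2

# Toy check on the 1-D system x' = -x with candidate V(x) = x^2,
# for which Vdot(x) = 2x * (-x) = -2x^2 along trajectories.
V = lambda x: x * x
Vdot = lambda x: -2.0 * x * x
print(lyapunov_risk(V, Vdot, [-1.0, 0.5, 2.0]))  # -> 0.0 (no violations)
```

Used as `total_loss = task_loss + lam * lyapunov_risk(...)`, this acts as a regularizer; whether it improves robustness for non-smooth systems, as the reviewer asks, remains an empirical question.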
The core of the method is the interaction between the learner and the falsifier. The learner minimizes the Lyapunov risk, defined as the maximum violation of the Lyapunov stability condition over the state space. The falsifier solves a nonlinear constraint-feasibility problem, trying to find a state that violates the Lyapunov stability criterion for the current Lyapunov function and controller. By the delta-completeness of the falsifier, if no such counterexample can be found, the controller is guaranteed to stabilize the system.

The originality of the work is not clearly delineated. In the related work section, the authors mention other works that use neural networks to learn Lyapunov functions, as well as works that use similar learner-falsifier frameworks with the same nonlinear constraint solver to learn controllers for nonlinear systems. As such, the novelty of the paper is not easy to assess.

The clarity of the work could be improved. The authors spend a good amount of space on material that gives little insight into the paper, while central concepts are introduced in just a few lines. For example, they unroll the definition of the Lyapunov falsification constraints in Example 1 for a simple network and a simple controller, and they report the numerical values of the entire matrices learned by their method in Section 4; in my opinion, these do not help the reader understand the paper better. On the other hand, the authors do not introduce the constraint-solving problem. Moreover, they mention that solving such a problem is NP-hard, as it involves the global minimization of a highly non-convex function, but they do not explain or give an intuition for how the delta-complete algorithm copes with this.

The problem of finding complex nonlinear Lyapunov functions, together with the controllers whose stability they certify, is very important to the community, and this is therefore a relevant paper.
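The learner-falsifier loop described above can be sketched with a minimal stand-in in pure Python. The real falsifier is a delta-complete constraint solver which, unlike random sampling, can certify the absence of counterexamples; the names below (`violation`, `falsify`, `sample_state`) are illustrative:

```python
import random

def violation(V, Vdot, x):
    # Degree to which state x violates the Lyapunov conditions
    # V(x) > 0 and Vdot(x) < 0.
    return max(0.0, -V(x)) + max(0.0, Vdot(x))

def falsify(V, Vdot, sample_state, n_samples=10000, tol=1e-6):
    # Stand-in falsifier: random search for a violating state.
    # (The method reviewed here uses a delta-complete solver instead,
    # which can prove that no counterexample exists.)
    for _ in range(n_samples):
        x = sample_state()
        if violation(V, Vdot, x) > tol:
            return x  # counterexample handed back to the learner
    return None  # no violation found among the samples

# Toy 1-D system x' = -x with candidate V(x) = x^2, so Vdot(x) = -2x^2:
V = lambda x: x * x
Vdot = lambda x: -2.0 * x * x
sample = lambda: random.uniform(-5.0, 5.0)
print(falsify(V, Vdot, sample))  # -> None: no counterexample sampled
```

In the full loop, any returned counterexample is added to the learner's training set and the Lyapunov risk is minimized again, alternating until the falsifier finds no violation.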
This is a very strong contribution. In terms of organization, I would have moved all the learned numerical values (W_1, B_1, u_1, …) and many of the equations describing the systems the algorithm was evaluated on into the appendix, and spent more of the main paper discussing the practical implications of the algorithm. In particular, Appendix B Table 1, together with some commentary on how that cost differs from other approaches (LQR?), would be more immediately useful to the reader. I would also have loved to see validation of the approach on a real physical system with real-world complexity, to help understand how this approach may scale. More discussion of the approach's limitations would also strengthen the paper. None of these issues are critical. In general, making it possible to certify non-linear controllers is a huge problem, and new approaches to this issue are of general interest to the community.

POST-REBUTTAL: thank you to the authors for their feedback. I had hoped for a slightly stronger statement than 'Yes, we *can* add evaluation of the control designs in physical wheeled robots.', but this doesn't change my overall assessment. Increasing my confidence level based on the peer reviews.