NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 4493
Title: Learning-In-The-Loop Optimization: End-To-End Control And Co-Design Of Soft Robots Through Learned Deep Latent Representations

Reviewer 1

Summary: The paper proposes a differentiable pipeline that jointly learns a latent-space representation (via a variational autoencoder) for controlling soft robots and optimizes the controller and the soft robot parameters for tasks in simulation, such as making a soft 2D robot walk forward as fast as possible. The work is made possible by a differentiable hybrid-particle-grid-based soft-material physics simulator. The authors provide insightful details on the alternating minimization scheme used to train the autoencoder, the controller neural network, and the robot parameters in tandem. The proposed framework was evaluated on five simulated experiments, which show that controllers using the learned representation outperform those using a baseline representation obtained via k-means clustering.

Review: While the performance of the system is impressive, the motivation for the approach is not well communicated, in three respects:
- In discussing the proposed hybrid-particle-grid-based soft robot representation versus finite element methods (FEM), the authors claim that the high “degrees of freedom of finite element methods is impractical for most modern control algorithms.” While probably true, the authors provide no additional details to back up this statement, and it is not clear why one cannot learn a latent representation over FEMs to act as the control input.
- The authors also cite as a drawback that previous work using FEM simulation requires a priori knowledge of how the robot will move, but isn't the hybrid-particle-grid simulator also a form of a priori knowledge?
- The authors do not sufficiently explain why co-adapting a soft robot's design parameters along with task and latent representation learning is desirable.
Potential drawbacks of the approach: learning the latent representation, controller, and robot parameters together, driven by the loss of a single task, need not lead to representations and robot parameters that are useful for other tasks. The advantage of a fully differentiable pipeline over learning the representation and the controller separately is clear in terms of implementation and perhaps efficiency at learning a specific task. But this does not mean that the learned latent representation is sufficient for tasks it was not trained on, and it is also possible that the representation does not generalize to different downstream controllers. A similar point on task generalization can be made about co-adapting the soft robot parameters in tandem with representation and controller learning. The optimized soft robot parameters (e.g., the Young's modulus of the robot materials) may have overfitted to the task and the controller, since the robot's fitness is determined only through the performance of a particular task. Of course, the paper's focus is not on multi-task learning for soft robotics. That, however, appears to weaken the argument for learning the representation explicitly over learning the controller directly for the task. Evaluating this alternative approach, which is simpler to train because it avoids the alternating training scheme, would strengthen the paper's position.

Additional comments:
- The specific task loss function was not provided in the paper.
- There is no discussion of how the convolutional autoencoder needs to scale with the robot's particle grid size.
- While the authors state that they chose the particular autoencoder architecture based on “stability and generality across experiments,” they do not explain what stability and generality across experiments specifically mean, and it is also not clear how (in)sensitive the architecture is to soft robots of varying complexity.
The algorithm section contains many insights into how the authors made the alternating minimization scheme work. In particular, training alternates between the parameters of the controller and robot on the one hand and the latent representation on the other. The authors implement conservative early stopping for the encoder to prevent overfitting to historical snapshots, an experience replay buffer to prevent overfitting to recent trajectories, and a target network that is updated and copied to the source every once in a while with a learning rate. There is concern, however, that a training procedure with this many moving parts may be difficult to tune for new tasks and robots.

The authors should also discuss other recent work on co-design and hardware representation approaches (just a few are listed here):
- Wang et al., Neural Graph Evolution, ICLR 2019
- Chen et al., Hardware Conditioned Policies for Multi-Robot Transfer Learning, NeurIPS 2018
- Pathak et al., Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

Conclusion: The proposed method is novel and the experiments are impressive, but because the motivation could be better articulated and the method seems difficult to reproduce, the paper as it stands may not be as useful as its novelty suggests. In conclusion, a borderline accept is recommended.
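For readers unfamiliar with the training loop discussed in this review, the following is a minimal toy sketch of an alternating scheme with the three stabilizers mentioned (early stopping for the encoder, a replay buffer, and a softly updated target encoder). It is not the authors' implementation: the 2-D "simulator" `rollout`, the tied linear autoencoder, the scalar controller gain `theta`, the goal value, and every hyperparameter are assumptions made purely for illustration.

```python
import random

D_TRUE = (0.6, 0.8)  # assumed hidden unit direction along which the toy "robot" state moves
GOAL = 1.0           # assumed target value for the latent coordinate (toy task)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def rollout(theta, rng, noise=0.01):
    """Hypothetical stand-in for a differentiable simulator: one noisy 2-D state snapshot."""
    return (theta * D_TRUE[0] + rng.gauss(0.0, noise),
            theta * D_TRUE[1] + rng.gauss(0.0, noise))

def recon_loss_and_grad(w, batch):
    """Mean squared reconstruction error of a tied linear autoencoder
    (z = w.x, x_hat = z * w) and its gradient with respect to w."""
    loss, g0, g1 = 0.0, 0.0, 0.0
    for x in batch:
        z = dot(w, x)
        r = (x[0] - z * w[0], x[1] - z * w[1])  # reconstruction residual
        rw = dot(r, w)
        loss += dot(r, r)
        g0 += -2.0 * (rw * x[0] + z * r[0])
        g1 += -2.0 * (rw * x[1] + z * r[1])
    n = len(batch)
    return loss / n, (g0 / n, g1 / n)

def task_loss(theta, w):
    """Toy task: the latent coordinate of the noise-free state should equal GOAL."""
    return (theta * dot(w, D_TRUE) - GOAL) ** 2

def train(outer_iters=300, seed=0):
    rng = random.Random(seed)
    theta = 0.5        # controller parameter (a single gain in this toy)
    w = (1.0, 0.0)     # encoder weights being trained
    w_target = w       # slowly moving copy that the controller actually sees
    buffer = []        # replay buffer of state snapshots
    for _ in range(outer_iters):
        # Phase 1: one controller gradient step with the target encoder frozen.
        a = dot(w_target, D_TRUE)
        theta -= 0.1 * 2.0 * (theta * a - GOAL) * a
        # Collect new snapshots; keep a bounded history so old data is not forgotten.
        buffer.extend(rollout(theta, rng) for _ in range(8))
        buffer = buffer[-256:]
        # Phase 2: a few encoder steps on replayed data, with early stopping
        # once a held-out batch stops improving.
        val = rng.sample(buffer, min(8, len(buffer)))
        best, bad = float("inf"), 0
        for _ in range(5):
            batch = rng.sample(buffer, min(16, len(buffer)))
            _, g = recon_loss_and_grad(w, batch)
            w = (w[0] - 0.02 * g[0], w[1] - 0.02 * g[1])
            val_loss, _ = recon_loss_and_grad(w, val)
            if val_loss < best - 1e-9:
                best, bad = val_loss, 0
            else:
                bad += 1
                if bad >= 2:
                    break
        # Phase 3: soft (Polyak-style) update of the target encoder with rate 0.1.
        w_target = tuple(0.9 * t + 0.1 * s for t, s in zip(w_target, w))
    return theta, w, w_target
```

Calling `train()` returns the final controller gain and both encoder copies; `task_loss(theta, w_target)` and `recon_loss_and_grad(w, [D_TRUE])` can then be used to inspect how well the toy task and the reconstruction were solved. Even in this tiny setting the loop exposes several coupled hyperparameters (two learning rates, buffer size, patience, soft-update rate), which illustrates the tuning concern raised above.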

Reviewer 2

This paper presents a hybrid algorithm (learning-in-the-loop optimization), which is tested on different robots in 2D and 3D. Although learning-in-the-loop optimization is not new in itself, the way the authors use it is interesting. The convolutional variational autoencoder is used to reduce the dimensionality of the robot state, which is a very common control strategy. As an alternative to classical control theory, the surrogate observer seems promising given the increasing power of modern computation. While closed-loop learning and optimization is promising, it does exhibit problems with stability and convergence. Another drawback of this method is that it may work only for robots with regular shapes. The examples shown in the paper are all block-based robots, which might not represent real-world robots well. Also, representing the robot on a 2D grid seems oversimplified. The method relies on a fully differentiable simulator, which may be another limitation, as not many differentiable simulators are available.

Reviewer 3

This paper presents a method for end-to-end co-optimization of soft robots. The proposed algorithm is based on learning a low-dimensional robot state while simultaneously optimizing robot control and/or material parameters. This is realized through a learning-in-the-loop co-optimization algorithm in which a latent state representation is learned as the robot solves a task.

Originality: the proposed method is a novel and interesting solution to controlling soft bodies in simulation; the paper frames the contribution within relevant related work, highlighting how the proposed method differs from the state of the art, including both classical control methods and data-driven approaches.

Quality: the submission is technically sound and the claims are supported by adequate analysis of the experimental results; the authors also underline limitations and constraints of their method (e.g. different performance with different working setups and design choices), which gives valuable insight into and understanding of the proposed method.

Clarity: the paper is clearly written and organized, the description of the method is clear, and pseudo-code supports the text. It would be interesting to clarify the following points:
- How realistic is it to assume a large dataset of robot motion data that is "*representative of the way the robot will move* when completing the prescribed task"? If part of the optimization goal is to actually learn how to perform the task, where does the initial dataset come from?
- How much did the feature oscillation affect the experiments? Do you have quantitative results on the instability caused by this effect?

Significance: this is an interesting piece of work that can have a good impact on the field of soft robotics modeling and control in simulation. The experimental setup is simple but diverse enough to evaluate the proposed method.
It would be interesting to see how the proposed method performs with more realistic models of soft bodies, possibly including multiple interacting bodies. The experiments presented nevertheless seem a good starting point for demonstrating the proposed method.