NeurIPS 2020

Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps


Meta Review

This paper proposes an approach that predicts multiple stable 6-DOF grasps, with associated confidence values, for a standard parallel-jaw gripper from object point cloud inputs. Grasps are represented as tuples of the two jaws' contact points and the gripper's pitch angle, which motivates the new architectural choices proposed here, inspired by standard architectures in 2D object detection. While the network is trained end-to-end, it is internally decomposed in a sensible stage-wise manner. The authors also create a synthetic dataset of 22.6M 6-DOF grasps on ShapeNet objects using physics simulation, which, upon public release, will be the largest such dataset. Finally, there are limited transfer results demonstrating transferability to real-world grasping with an acceptable performance drop.

The author responses were helpful in addressing many of the weaknesses raised in the first round of reviews (and I urge the authors to incorporate key response points into future versions), but a few concerns remain. In particular, grasping in robotics is a long-studied field, and proper evaluation typically involves extensive real-world experimentation and benchmarking; by comparison, the real-world transfer results presented here are limited. While I am recommending acceptance, I urge the authors to conduct and report more extensive real-world grasping experiments. Another drawback is that, unlike much recent grasping work, the method is only ever evaluated on objects from the training categories; it would be interesting to see evaluation on unseen object categories as well.

Finally, the broader impact statement could be improved with more careful thought, particularly about ethical consequences. "Negative impact may arise if the technology is abused" really does not say anything at all.
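For concreteness, a minimal sketch of the grasp parameterization summarized above. This is only an illustration under assumed field names and types, not the authors' implementation:

```python
from dataclasses import dataclass
import numpy as np

# Illustrative sketch only -- not the paper's code. A parallel-jaw grasp is
# parameterized by the two jaw contact points on the object surface plus the
# gripper's pitch angle about the contact axis; the confidence value reflects
# the network's predicted grasp quality.
@dataclass
class GraspProposal:
    contact_left: np.ndarray   # (3,) contact point of the left jaw
    contact_right: np.ndarray  # (3,) contact point of the right jaw
    pitch: float               # gripper rotation about the contact axis (radians)
    confidence: float          # predicted probability that the grasp is stable
```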