NeurIPS 2020

3D Shape Reconstruction from Vision and Touch


Meta Review

This paper proposes to fuse vision and touch information to reconstruct the 3D shape of objects grasped by a robotic hand. Objects are represented as collections of deformable meshes (the charts introduced in the previously published AtlasNet paper), and the vision and touch charts are merged using graph convolutional networks, with local and cross-modality communication between charts. Experiments are conducted in simulation on a new dataset designed by the authors, with known hand and object surface structure and with vision and touch inputs.

After the rebuttal, reviewers gave scores between 6 and 7. They praised the writing, the ideas, and the relevance of the problem. Reviewers R2 and R4 raised concerns about the simulated environment, and Reviewer R1 was concerned about overly strong biology-related claims as well as the comparison with "FoldingNet: Point cloud auto-encoder via deep grid deformation" (CVPR 2018). Based on these comments, I believe the paper should be accepted as a poster. In the final version, the authors should address the many requests for clarification raised by the reviewers.