NeurIPS 2020

Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization

Meta Review

The paper proposes a method for LiDAR-based 3D object detection that exploits cross-view consistency between the bird's-eye view and the range view of the scene's point cloud. The two views are fed to separate neural networks, which are trained with a loss function that includes a term encouraging consistency between the two representations. Evaluations on nuScenes demonstrate strong performance compared to baselines. The paper was reviewed by four knowledgeable referees, who read the author response and subsequently discussed the paper. The reviewers agree that the way the method exploits the bird's-eye and range views is interesting and elegant, in particular the hybrid-cylindrical-spherical (HCS) voxel representation, which enables feature extraction for both views, and the consistency constraint enforced on the transformed feature representations. The experimental results on nuScenes show the method's promise, and the ablations help to convey the contributions of the different model components. The reviewers raised important questions about the experimental evaluation. Some of these questions were addressed in the author response, which included an additional evaluation on the Waymo dataset. However, there are issues with the way the paper is currently written, and the authors are strongly urged to address these in the next version of the paper.