Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Xu Liu, Chengtao Li, Jian Wang, Jingbo Wang, Boxin Shi, Xiaodong He
Global context is crucial for 3D point cloud scene understanding tasks. In this work, we extend the contextual encoding layer, originally designed for 2D tasks, to 3D point cloud scenarios. The encoding layer learns a set of code words in the feature space of the 3D point cloud to characterize the global semantic context; based on these code words, the method then learns a global contextual descriptor that reweights the feature maps accordingly. Moreover, compared to 2D scenarios, data sparsity becomes a major issue for 3D point clouds, and the performance of contextual encoding quickly saturates as the number of code words increases. To mitigate this problem, we further propose a group contextual encoding method, which divides the channels into groups and then performs encoding on the group-divided feature vectors. This facilitates learning global context in grouped subspaces for 3D point clouds. We evaluate the effectiveness and generalizability of our method on three widely studied 3D point cloud tasks. Experimental results show that the proposed method outperforms VoteNet by 3 mAP on the SUN-RGBD benchmark under the mAP@0.25 metric, and by a much larger margin of 6.57 mAP on ScanNet under the mAP@0.5 metric. Compared to the PointNet++ baseline, the proposed method reaches an accuracy of 86%, outperforming the baseline by 1.5%. Our method outperforms the non-grouping baseline methods across the board and establishes a new state of the art on these benchmarks.
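The encoding and grouping steps described above can be illustrated with a small NumPy sketch. It assumes an EncNet-style formulation (soft assignment of each point's residual to learned code words, aggregation into a global descriptor, then a sigmoid gate that reweights channels); it is not the authors' implementation. All function names are hypothetical, parameters are randomly initialized rather than trained, batch normalization is omitted, and averaging residuals over points (instead of summing) is an assumption made so the descriptor does not scale with point count.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encode(x, codewords, smoothing):
    """Hypothetical EncNet-style encoding of one feature group.
    x: (N, d) point features, codewords: (K, d), smoothing: (K,).
    Returns a (d,) global descriptor for this group."""
    residuals = x[:, None, :] - codewords[None, :, :]       # (N, K, d)
    dist2 = (residuals ** 2).sum(axis=-1)                   # (N, K) squared distances
    assign = softmax(-smoothing[None, :] * dist2, axis=1)   # soft assignment over K code words
    e_k = (assign[:, :, None] * residuals).mean(axis=0)    # (K, d), averaged over points (assumption)
    return np.maximum(e_k, 0.0).sum(axis=0)                 # ReLU, then sum over code words (BN omitted)

def group_contextual_encoding(x, num_groups, num_codewords, rng):
    """Split C channels into groups, encode each group with its own code words,
    and reweight the features with a channel-wise sigmoid gate."""
    n, c = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    d = c // num_groups
    descriptors = []
    for g in range(num_groups):
        xg = x[:, g * d:(g + 1) * d]                        # features of group g
        codewords = rng.standard_normal((num_codewords, d)) # untrained, illustrative only
        smoothing = rng.uniform(0.5, 1.5, size=num_codewords)
        descriptors.append(encode(xg, codewords, smoothing))
    e = np.concatenate(descriptors)                         # (C,) global contextual descriptor
    w = rng.standard_normal((c, c)) / np.sqrt(c)            # fully connected layer (untrained)
    gamma = 1.0 / (1.0 + np.exp(-(w @ e)))                  # per-channel weights in (0, 1)
    return x * gamma[None, :], gamma                        # reweighted feature maps

rng = np.random.default_rng(0)
feats = rng.standard_normal((1024, 128))                    # 1024 points, 128 channels
out, gamma = group_contextual_encoding(feats, num_groups=4, num_codewords=8, rng=rng)
print(out.shape, gamma.shape)  # prints (1024, 128) (128,)
```

Setting `num_groups=1` recovers a single, non-grouped encoding over all channels; with more groups, each code word set lives in a lower-dimensional subspace of `C / num_groups` channels, which is the intuition the abstract gives for why grouping helps with sparse point cloud features.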