NeurIPS 2020

RSKDD-Net: Random Sample-based Keypoint Detector and Descriptor

Review 1

Summary and Contributions: This paper proposes a new method for efficient keypoint detection and description on 3D point clouds. The authors propose several novel modules in the network: a random dilation cluster that enlarges the network's receptive field, attention-based point aggregation, and a novel matching loss. The proposed method outperforms previous state-of-the-art methods on various metrics.

Strengths: The overall method is sound and reasonable. I’m very surprised that the proposed matching loss outperforms the triplet loss and, to be honest, I still find it difficult to believe, but the ablation study is adequate to support the claim. The proposed method may be significant for fields that rely on keypoint detection and descriptors, such as 3D localization and reconstruction. The registration experiment demonstrates promising potential. The proposed random dilation cluster may be useful for generic 3D reasoning as well.

Weaknesses: Fundamentally, the performance of the proposed method seems to be throttled by the initial random choice of cluster centers. I am worried that the proposed keypoint detection may not work well if too few keypoints are used relative to the size of the scene. Metric: I am surprised that recall is not used as a keypoint detector metric when precision is. For keypoint detection, I would care more about missing keypoints than uninformative ones. Ablation study: There are many details of the proposed components that could be ablated more thoroughly. For example, ablations of the keypoint weight in the matching loss and of the temperature parameter would have been helpful. Minor: Although the paper is sound overall as a computer vision application paper, it is a bit concerning that this paper may not fully fit the NeurIPS venue and community.

Correctness: The method and the claims seem sound to me.

Clarity: The writing quality looks okay to me. The paper nicely motivates the problem and introduces the previous work on which it builds. I had no major problem following the details of this work. However, there are a few grammatical errors that should be corrected. Furthermore, the paper is not fully self-contained and often defers important details to other papers.

Relation to Prior Work: This is the first paper I have read on deep-learning-based keypoint detection and description. I am not familiar with the prior work, but judging from what this paper describes, the previous state-of-the-art methods look recent and the evaluation protocols seem to match as well.

Reproducibility: Yes

Additional Feedback: N/A. Post-rebuttal comment: The authors have addressed most of my concerns, and I do not see a major weakness of this work raised in the other reviews. I am slightly concerned about the way the runtime is evaluated in this work, though. I would like to update my rating to accept.

Review 2

Summary and Contributions: The key idea of the paper is to use random sampling to select candidate points for keypoint detection through a saliency mapping and learn the keypoint descriptors simultaneously for the task of point cloud registration. An attention mechanism is introduced to aggregate the keypoint features. The paper suggests an additional loss function for the task. The method is evaluated on 2 outdoor datasets: the KITTI dataset and the Ford dataset.

Strengths: Even though Fig. 1, which describes the network architecture, looks complicated, the method is straightforward. The proposed method outperforms previous works on keypoint repeatability. On the registration task, the method produces results comparable to those of the previous work, USIP.

Weaknesses: The method seems to predict saliency maps for all the points in the point cloud, treating all points as potential keypoints. This seems computationally heavy for large point clouds. The computational time claimed for the previous work, USIP, does not correspond to the computational time reported in the USIP paper.

Correctness: In Table 1, the reported computational time does not match the computational time reported in the USIP paper. It is very different. The proposed method is surprisingly fast for a method that seems to consider all points as potential keypoints (line 90).

Clarity: The paper is clearly written. The task is explained in detail and the components of the approach are described in detail.

Relation to Prior Work: The proposed approach is compared with previous works on the field and what separates the proposed method from previous works is clearly explained.

Reproducibility: Yes

Additional Feedback: ===== POST AUTHOR FEEDBACK ====== I read the other reviews and the author feedback. My concerns regarding the speed comparison and the confusing sampling definition were mostly addressed in the rebuttal. However, FPS is a slow sampling algorithm, and I would have liked to see USIP running times with, for example, random sampling, or both methods compared without the sampling runtime. I am still a bit unsure about the random sampling. As I see it, their method can compensate for a small number of keypoints as long as enough points are sampled. With few sampled points, everything will break. I was hoping they would focus on that side a bit more. The authors admit the issue in the rebuttal and claim that even so they still outperform other methods (and few papers have no issues at all). But I am still on the positive side.

Review 3

Summary and Contributions: Point cloud registration is an important task in self-driving and mapping research. However, previous SOTA methods adopt farthest point sampling (FPS), which has O(n^2) complexity and makes them hard to use in real-time applications. This work proposes a random dilation cluster strategy and an attention learning mechanism, combined with random sampling, to avoid information loss. It also has a joint detector-descriptor learning pipeline for more accurate registration. The authors finally achieve SOTA results in both time and performance.
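To make the complexity contrast in this summary concrete, here is a minimal sketch of the two sampling strategies. It is an illustration only, not the authors' or USIP's implementation; the function names `farthest_point_sample` and `random_sample`, the `seed` parameter, and the use of NumPy are all assumptions made for the example. The naive FPS loop costs O(n * k) for k samples from n points, which becomes quadratic when k grows in proportion to n, whereas random sampling needs no distance computations at all.

```python
import numpy as np


def farthest_point_sample(points, k, seed=0):
    """Naive FPS (hypothetical helper): each pick scans all n points, so O(n * k)."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(k, dtype=np.int64)
    chosen[0] = rng.integers(n)  # arbitrary starting point
    # Squared distance from every point to its nearest already-chosen point.
    min_dist = np.full(n, np.inf)
    for i in range(1, k):
        diff = points - points[chosen[i - 1]]
        min_dist = np.minimum(min_dist, np.einsum("ij,ij->i", diff, diff))
        # Next sample is the point farthest from everything chosen so far.
        chosen[i] = int(np.argmax(min_dist))
    return chosen


def random_sample(points, k, seed=0):
    """Uniform sampling without replacement (hypothetical helper): no distance scans."""
    rng = np.random.default_rng(seed)
    return rng.choice(points.shape[0], size=k, replace=False)


if __name__ == "__main__":
    cloud = np.random.default_rng(42).normal(size=(2048, 3))
    print(farthest_point_sample(cloud, 8))
    print(random_sample(cloud, 8))
```

FPS spreads its picks evenly over the cloud, while random sampling follows the point density; that trade-off is what the concerns about coverage with few sampled points refer to.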

Strengths: By improving the speed (15x!!!) and the performance at the same time, this paper already has enough impact to appear at a top conference. The overall design makes a lot of sense; the learning of the detector-descriptor is clearly inspired by PointNet-like structures and attention mechanisms.

Weaknesses: The idea, method, and experiments are all impressive. I can hardly find a drawback in this paper. However, the effect of the random sampling itself does not appear to be fully ablated or analyzed. The final results only suggest that the sampling is not that important as long as the receptive field is large enough. I believe the authors need to discuss this more. For example, if the sampling process is purely random, how can one guarantee that descriptors and keypoints can be matched across different maps? What if the attention mechanism is unable to correct this because the sampled points are simply too different? I think Figure 6 is meant to address this kind of problem, but it is still hard to imagine that such an issue can be resolved that easily. I would like to know the authors' explanation.

Correctness: I think the intuition and the method are both reasonable. Abundant experiments are provided.

Clarity: The paper is clearly written in terms of its high-level idea, contribution, and overview. However, please separate the legends from the plots in Figs. 3 and 4; they are hard to read.

Relation to Prior Work: The proposed method is clearly distinguished from the current SOTA methods. It improves both speed and performance by building on a basic rather than a complex idea. Impressive.

Reproducibility: Yes

Additional Feedback: I have read the rebuttal. I vote for acceptance. Looking forward to seeing the code.