Sun, Dec 8th through Sat, Dec 14th, 2019, at the Vancouver Convention Center
This paper is about Cascade RPN, a novel region proposal network (RPN); RPNs are typically used to propose candidate boxes that may contain objects in an image. Starting from the observation that the original RPN uses fixed, uniformly distributed anchors that do not reflect the arbitrary locations of ground-truth boxes, the paper reviews recent attempts to address this issue. It accurately describes Iterative RPN, its plus variant, and GA-RPN, along with their shortcomings regarding feature alignment. The paper then introduces adaptive convolution and an iterative refinement of the anchor distribution, which together form the basis of Cascade RPN: anchors are predicted dynamically and their features are properly aligned. Experiments on the popular COCO2017 detection dataset show that Cascade RPN outperforms previous RPN models in terms of average recall. Integrating Cascade RPN into the state-of-the-art object detectors Fast R-CNN and Faster R-CNN yields a slight improvement in mAP.
Strengths:
+ The paper is very well written, clear, and easy to follow.
+ The proposed Cascade RPN goes in a good direction, improving the RPN by setting the anchors dynamically.
+ The method is sound and the experiments appear to be well conducted.
+ Average recall is clearly improved by the approach, and integrating the model into Fast(er) R-CNN shows a small improvement.
Weaknesses:
- The approach is a refinement of previous methods (Iterative RPN, GA-RPN) and is thus incremental. Nonetheless, the performance gains are significant.
- More analysis of the various parameters would have been interesting. For instance, a region proposal method is typically evaluated by its average recall as the number of emitted proposals varies. A plot of average recall against the number of proposals would have been informative.
After the rebuttal update
--------------------------
I thank the authors for addressing my concerns. As a result, I increase my score to accept.
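The second weakness refers to the usual way proposals are evaluated: average recall (AR) as a function of the number of proposals k. As a rough illustration only (this is neither the paper's nor COCO's evaluation code), the sketch computes a simplified per-image AR@k, averaging recall over the IoU thresholds 0.50:0.05:0.95. The helper names `iou` and `average_recall` are hypothetical, proposals are assumed to be sorted by descending score, and, unlike the official COCO evaluation, each ground-truth box simply takes its best IoU over the top-k proposals without one-to-one matching.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_recall(gt: Sequence[Box], proposals: Sequence[Box], k: int) -> float:
    """Simplified AR@k for one image (hypothetical helper, no 1-to-1 matching).

    Assumes `proposals` is sorted by descending objectness score.
    """
    if not gt:
        raise ValueError("need at least one ground-truth box")
    top = proposals[:k]
    # IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    # Best IoU that each ground-truth box reaches with any top-k proposal.
    best = [max((iou(g, p) for p in top), default=0.0) for g in gt]
    recalls = [sum(b >= t for b in best) / len(gt) for t in thresholds]
    return sum(recalls) / len(recalls)


# Sweep k to obtain the AR-vs-number-of-proposals curve the reviewer asks for.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
proposals = [(0, 0, 10, 10), (50, 50, 60, 60), (21, 21, 30, 30)]
for k in (1, 2, 3):
    print(k, average_recall(gt, proposals, k))
```

In a real benchmark the same sweep would run over k values such as 100, 300, and 1000 and average over all images; the toy data here only shows the shape of the computation.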
The paper proposes a new variant of the popular region proposal network (RPN). The authors first identify an issue stemming from a misalignment between predefined anchors and ground-truth boxes. They investigate why iterative RPNs do not resolve this issue well enough and propose to address it directly by removing the heuristically defined anchors. This is enabled by:
- using a single box per position,
- using a combination of anchor-free and anchor-based criteria to define the positive boxes,
- introducing a new adaptive convolution layer that keeps the features well aligned with the anchors.
The experiments show that the method clearly surpasses other state-of-the-art methods (2-4% over the second-best method, GA-RPN, at about the same runtime). The authors also perform an ablation study that shows the contribution of the individual components. Overall, the paper is clearly written and well structured.
Questions / Changes:
- I wonder why the improvements seem to be larger mainly for large objects (Table 2, AP_L). Is there an explanation?
- Why do you prefer the IoU loss over the more common smooth L1 loss? Have you tested both?
- Please make the font in Figs. 1 & 2 larger; at present it is too small, which makes the labels hard to read.
- Correct the typo on L172: "object objectness" -> "objectness".
After the rebuttal, I still think this paper should be accepted.
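The question about IoU loss versus smooth L1 loss can be made concrete with a small comparison. This is a hedged sketch, not the paper's loss: it assumes the common form 1 - IoU (UnitBox, for instance, uses -ln(IoU) instead), and it applies smooth L1 with beta = 1 to raw box coordinates. The functions `iou_loss` and `smooth_l1_loss` are hypothetical names. The example shifts a small and a large box by the same 10% of their side length: the IoU loss is identical for both, while raw-coordinate smooth L1 grows with box size.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Assumed IoU-loss form 1 - IoU; scale-invariant by construction."""
    return 1.0 - iou(pred, target)


def smooth_l1_loss(pred: Box, target: Box, beta: float = 1.0) -> float:
    """Smooth L1 summed over the four raw coordinates (no delta encoding)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


# Same relative error (each coordinate shifted by 10% of the box side).
small_t, small_p = (0, 0, 10, 10), (1, 1, 11, 11)
large_t, large_p = (0, 0, 100, 100), (10, 10, 110, 110)
print(iou_loss(small_p, small_t), iou_loss(large_p, large_t))
print(smooth_l1_loss(small_p, small_t), smooth_l1_loss(large_p, large_t))
```

Faster R-CNN-style detectors apply smooth L1 to scale-normalized box deltas rather than raw coordinates, so this comparison exaggerates the difference; it only illustrates why an IoU-based loss is a natural candidate and why the reviewer asks whether both were tested.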
The idea is simple and valid, and the experimental results demonstrate an improvement over other methods. The paper is also well written, with strong logic, and easy to follow. My major uncertainty is whether Cascade RPN could be plugged into other popular two-stage object detection approaches and achieve better detection results.