NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:91
Title:DeepUSPS: Deep Robust Unsupervised Saliency Prediction via Self-supervision

Reviewer 1

Saliency detection is an interesting and important problem in computer vision. It is often used to localise the object of interest in a given scene and as a guiding step (or an anchor) for object detection algorithms. The authors address this problem by adopting MAV/CRF and a deep FCN in their framework. Although the results of this method are good, the authors do not motivate the use of FCN, CRF and handcrafted saliency prediction methods. Given that several state-of-the-art CNN-based segmentation and saliency detection algorithms have been proposed in the recent literature, what value does this work add to CV/ML research? Can you please list the saliency detection methods which were used in this paper? Can you please comment on the computational efficiency of this technique compared to the other methods? A similar work with some overlap, "Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective" (CVPR 2018), has been published previously. How is your contribution different from this work?

Reviewer 2

Despite the impressive empirical results, the paper is difficult to follow, and the proposed method seems to be just a combination of existing methods. Even though the method is simple, the paper could have been better written.
- In terms of style, related work should come before the description of the method.
- When using an acronym (e.g. USD, SBF) for the first time, give the full name followed by the acronym in brackets.
- If a method is referred to by an acronym (RBD, DSR, MC, HS), please use the same acronym in the tables; otherwise it is difficult to trace the acronyms and citations throughout the paper.

Are the results on ECSSD, DUT and SED2 evaluated with the model trained on MSRA-B? How would this method compare to Chen et al 2018 (DeepLab)? Is there a reason for not evaluating on Pascal segmentation?

How does the weighted F-measure act as a pixel-wise loss, and how does it enforce inter-image consistency? The section (L86-106) only explains how to compute the F-measure and states that the loss is 1-F_beta. What is the difference between the "No-CRF" in the ablation in the supplementary material and the "no self-supervision" in Table 2?

L188-189 -- it is not clear what change was made to the network. Is the same change applied to ResNet? If so, is only this new layer trained while the rest of the network is frozen?

Training seems to be done on a small number of images, for a small, fixed number of epochs (25). Is this sufficient to prevent overfitting? How was 25 decided? Did the authors consider other forms of regularization?
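To make the question about the loss concrete, the quantity 1-F_beta can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a "soft" precision and recall computed from real-valued predictions, the function name `soft_f_beta_loss` is hypothetical, and the default `beta2=0.3` is a convention common in the saliency literature rather than a value taken from the paper.

```python
def soft_f_beta_loss(pred, target, beta2=0.3, eps=1e-8):
    """Return 1 - F_beta for one image.

    pred, target: flat sequences of per-pixel values in [0, 1].
    beta2: the squared beta weight (assumed default, not from the paper).
    eps: small constant that avoids division by zero.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Soft true positives: overlap between prediction and target.
    tp = sum(p * t for p, t in zip(pred, target))

    # Precision and recall are ratios of sums over ALL pixels of the image.
    precision = tp / (sum(pred) + eps)
    recall = tp / (sum(target) + eps)

    f_beta = (1.0 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    return 1.0 - f_beta


if __name__ == "__main__":
    target = [1.0, 0.0, 1.0, 0.0]
    print(soft_f_beta_loss([1.0, 0.0, 1.0, 0.0], target))  # close to 0
    print(soft_f_beta_loss([0.0, 1.0, 0.0, 1.0], target))  # close to 1
```

In this form the loss depends on sums over the whole image and does not decompose into independent per-pixel terms, which is why it is not obvious in what sense it is "pixel-wise".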

Reviewer 3

DeepUSPS is an interesting improvement on [Zhang et al. (2018, 2017a)] and [Makansi et al. (2018)] in at least two respects: 1) refining noisy labels in isolation, and 2) incremental refinement with self-supervision via historical model averaging. It yields very competitive results. The method has some generality, e.g. to other problem settings such as medical image segmentation. From my viewpoint, although the novelty is not especially significant because of some similarity to both crowdsourcing and unsupervised ensemble learning, the idea is intuitively simple yet effective. My other comments are:
1. Using many handcrafted methods is similar in idea to crowdsourcing, where each handcrafted method plays the role of a worker; the authors do not mention this at all!
2. How can cumulative mistakes be avoided once a mistake has actually been made but is not detected?
3. One theoretical problem: under what conditions can such noisy label refinement lead to better results?
4. Do the handcrafted methods need to be diverse for improved performance? What effect does the number of handcrafted methods have on performance?
5. For which images will your method NOT work?
6. How do the choice and number of initial handcrafted methods influence the final label quality? Do these methods need to be fused selectively?
7. The performance curves in Fig. 5 are NOT quite clear!
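The concern about cumulative mistakes can be illustrated with a toy version of historical averaging. The sketch below assumes an exponential moving average over per-pixel predictions across epochs; the update rule, the names `ema_update` and `run_history`, and the weight `alpha` are all assumptions made for illustration and may differ from what the paper actually does.

```python
def ema_update(history, prediction, alpha=0.7):
    """Blend the running per-pixel average with the newest prediction.

    alpha is a hypothetical smoothing weight: higher values trust the
    history more and the newest prediction less.
    """
    if len(history) != len(prediction):
        raise ValueError("history and prediction must have the same length")
    return [alpha * h + (1.0 - alpha) * p for h, p in zip(history, prediction)]


def run_history(predictions_per_epoch, alpha=0.7):
    """Fold a list of per-epoch predictions into one averaged pseudo-label."""
    history = list(predictions_per_epoch[0])
    for prediction in predictions_per_epoch[1:]:
        history = ema_update(history, prediction, alpha)
    return history


if __name__ == "__main__":
    # Pixel 0 is wrongly marked salient in the first epoch only.
    epochs = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    print(run_history(epochs, alpha=0.5))
```

Under this assumed rule, an early mistake decays only if later predictions disagree with it. If the network is trained on the averaged labels and reproduces the mistake, the average never corrects it, which is the failure case raised in comment 2.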