NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6913
Title:Unsupervised Object Segmentation by Redrawing

Reviewer 1

Originality: To my knowledge this is an original approach to the unsupervised learning of object segmentation. Besides the specific method proposed in this paper, I find this problem very exciting and I think that it will see great development in the near future. I do not see this as a combination of prior work. In general, this paper uses methodologies/tools that are becoming well established (e.g., GANs). There is a good prior work section, but some important works are missing and should be discussed, such as:
Remez et al. Learning to segment via cut-and-paste. ECCV 2018.
Ostyakov et al. SEIGAN: towards compositional image generation by simultaneously learning to segment, enhance, and inpaint. ArXiv 2018.
Kanezaki. Unsupervised image segmentation by backpropagation. ICASSP 2018.
Ji et al. Invariant information clustering for unsupervised image classification and segmentation. ArXiv 2018.
In any case, this paper proposes a different method compared to those in the missing citations.
********************************************
Quality: The contribution is mostly a solution to a very challenging problem. It is therefore more of an algorithmic and experimental contribution. In general the approach is in the right direction. However, there are some technical issues that I find unconvincing and I would like the authors to explain them.
1) The independence assumption between the textures in each layer is not true (the authors also mention this at lines 137-140). I expect that this limits the applicability of the method to specific datasets where there is a single object category or where background and foreground are mostly independent (so the context is irrelevant). If this is the case, I find the whole idea limited, as it could not, in principle, be applied to datasets with a mix of different categories.
2) The other important restriction in this method is the mask constraint: basically, only objects with the same mask can be generated.
This could limit the variability of the generated objects.
3) Another potential problem is the use of a regressor of z_k. The idea is technically correct, but the generator could easily "fool" this strategy. It could learn to vary the foreground object only a little, just enough for the delta_k function to retrieve the latent code z_k. Since we know that adversarial examples can change the classification result with an imperceptible change of the input, we could also expect the generator to do the same. Besides, producing a small variation is certainly easier than producing a large one, so the generator is more likely to behave in this undesired way.
4) It is also unclear why the training would not simply result in an f function that outputs the same set of masks, where there is a single foreground object (M1=1 and M2=0 everywhere). Then, G1 is simply a GAN generator. I could not see a term that would discourage this degenerate behavior.
It would be good to see an analysis of, and convincing experiments on, these issues. For example, what happens when 2 categories are mixed? Another question is how much variability is possible for the same mask. If not a quantitative evaluation, it would be good to see a qualitative one. An ablation showing the effect of using or not using delta_k would be useful. An explanation of why f does not output a constant would be very useful.
********************************************
Clarity: The paper is clearly written and well organized. I think that the only missing components are the points above. The code seems reproducible.
********************************************
Significance: In my view unsupervised object segmentation is a very important problem. The results in this paper are a first step. In my view their main restriction is that they can only work with a single category at a time, but this is still a step forward.
However, technically this should be called weakly supervised object segmentation, as the category label would be needed. Experimentally, the conclusion is that the method yields a meaningful segmentation. However, my doubts above still await clarification from the authors.
++++++++++++++++++++++++++
After reading all the reviews and the authors' rebuttal, we discussed briefly. I decided to keep my score.
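
Reviewer 1's points 3 and 4 can be made concrete with a toy sketch. This is not the paper's model: the mask-weighted composition, the linear "generator", and every name below (compose, layers, A, base, eps) are assumptions chosen only to illustrate the two failure modes the reviewer describes.

```python
import numpy as np

def compose(masks, layers):
    """Blend per-region layers with masks: x = sum_k M_k * G_k(z_k)."""
    # masks: (K, H, W), layers: (K, H, W, C)
    return (masks[..., None] * layers).sum(axis=0)

rng = np.random.default_rng(0)
H, W, C = 4, 4, 3
layers = rng.random((2, H, W, C))  # hypothetical stand-ins for G1(z_1), G2(z_2)

# Point 4: degenerate masks, M1 = 1 and M2 = 0 everywhere.
masks = np.stack([np.ones((H, W)), np.zeros((H, W))])
image = compose(masks, layers)
# The composite is exactly the first layer, so G1 acts as a plain generator.
degenerate = np.allclose(image, layers[0])

# Point 3: a toy linear generator whose output barely depends on z.
eps = 1e-3
z = rng.standard_normal((64, 8))         # 64 latent codes of dimension 8
A = rng.standard_normal((8, H * W * C))  # fixed directions in pixel space
base = rng.random(H * W * C)             # one fixed flattened image
x = base + eps * (z @ A)                 # outputs differ only at scale eps

# A linear regressor (toy analogue of delta_k) fitted by least squares
# still recovers z from these tiny variations.
coef, *_ = np.linalg.lstsq(x - base, z, rcond=None)
z_hat = (x - base) @ coef
max_pixel_change = np.abs(x - base).max()
recovered = np.allclose(z_hat, z, atol=1e-6)
```

In this toy setting both concerns hold at once: the constant masks reduce the composition to a single generator, and the latent code remains recoverable although the outputs differ only slightly. Whether the actual training objective rules out either case is the question put to the authors.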

Reviewer 2

The idea of training using an adversarial approach, and generating region by region instead of the whole image, goes towards building robustness in region segmentation. Results seem to demonstrate the effectiveness of this novel approach. The assumption of the paper, that training on partly reconstructed images should yield results as good as training on whole new images, seems intuitive. This is the first work that I am aware of taking this approach.

Reviewer 3

Originality: The proposed framework for unsupervised object segmentation is novel, and it is the first work using generative models to demonstrate unsupervised segmentation on real-world datasets. The design of the segmentation model and the learner is done by combining ideas from prior work.
Quality: The paper is technically sound, and the experimental results show that the proposed model performs well on three datasets, demonstrating the effectiveness of the approach. However, the evaluation is done in settings with a single foreground object, while the described model is for "n" different objects.
Clarity: The paper is well written and experimental details are clearly mentioned.
Significance: The proposed unsupervised segmentation model is shown to be effective on simple single-object images. This is a step up from recent prior work [14][18], which is largely demonstrated on synthetic datasets. Future work could build on these ideas to segment a varying number of objects in real images.
------------------------------- Post-rebuttal ---------------------------------------
I commend the authors for a well-written rebuttal. The experiments with the combined LFW+Flowers dataset were interesting and illustrate that the method works with more diverse real image datasets, which is a step beyond recent prior work. I increase my score to 7 and recommend accepting the paper.