Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
- The paper extends the unsupervised domain adaptation task for semantic segmentation to multiple sources; this appears to be the first paper to do so.
- The paper is well written and sufficiently clear.
- The results are sufficient and convincing.
- The approach can be considered a reasonable extension of CyCADA to leverage multiple sources. There is also an improvement over CyCADA given by a new dynamic semantic segmentation loss. Beyond that, the other major novelty is the introduction of losses that produce an adversarial domain aggregation.
- The most important aspect of the paper is that it builds on solid previous work and moves in a new direction: using multiple sources for the adaptation of segmentation networks.
- No major concerns, except that the authors have not done much to describe the details of the architecture used. How are D_T and D_i defined? What network structures were used for them, and for the generator networks?
- In Eq. (9), you may want to replace F_A with F.
This paper proposes to leverage training data from multiple source domains to strengthen the ability of segmentation models to handle domain shift and dataset bias. It reviews the related literature well and clearly points out the differences between this paper and prior work. The presentation is good, and the ideas are interesting and original. Nevertheless, I still think this paper has the following issues that need to be settled.

1) The paper aims to handle the domain-shift problem by leveraging multiple source-domain datasets. Intuitively, this should strengthen the segmentation ability of the resulting model compared to models trained on a single source dataset. Even so, I think it is still necessary to experimentally demonstrate that multiple sources indeed help compared to the case where only one source dataset is provided, with all other configurations kept unchanged. However, I do not see any such results in this paper. In Table 2, the authors do list results based on a single source domain, but the experimental setup is different. Such an experiment would more convincingly show that multiple sources help.

2) I suggest that the authors re-organize the layout of Figure 1. It confused me a great deal, and I expect it will confuse other readers as well. It took me some time to figure out what all the arrows (in various colors and with different styles) are used for.

3) The adversarial domain aggregation part seems a bit naive. The loss functions for the different discriminators are simply combined linearly, which is too straightforward in my view. I expected some discussion of how this part was designed and why, but no such content is provided.
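To make the critique in point 3 concrete, the aggregation the review describes amounts to a plain (possibly weighted) linear sum of per-discriminator adversarial losses. A minimal sketch follows; the function and parameter names are illustrative assumptions, not identifiers from the paper.

```python
def combine_discriminator_losses(losses, weights=None):
    """Linearly combine per-discriminator adversarial losses into one scalar.

    With weights=None this is the unweighted linear combination the
    review criticizes as too straightforward; non-uniform weights would
    be one obvious design alternative the reviewer asks to see discussed.
    """
    if weights is None:
        weights = [1.0] * len(losses)  # default: unweighted sum
    assert len(weights) == len(losses)
    return sum(w * l for w, l in zip(weights, losses))

# e.g. losses from two source-domain discriminators and one target discriminator
total_loss = combine_discriminator_losses([1.0, 2.0, 3.0])
```

The reviewer's point is that nothing in the paper motivates this particular aggregation over, say, learned or adaptive weighting schemes.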
I am not entirely convinced of the validity of incorporating semantic alignment, which rests on the assumption that the aggregated-domain image and the target image should have the same segmentation labels; this seems rather strong for a diverse source-target pair. It is also unclear what exactly constitutes a "feature"; this needs to be clarified in order to justify the proposed feature alignment. Overall, the framework reflected by Eq. 9 is rather large, and it is unclear how costly it is computationally. As for the experimental results, two questions arise:

- Compared with source-combined DA, the proposed MADAN framework gives only marginal improvements on most classes. Interestingly, for "Sky", source-combined DA performs much better. Is there any explanation? On the other hand, poorly performing classes remain so with MADAN; is it really worthwhile to use such a complicated framework?

- The ablation study results are not very convincing. Using SAD alone achieves the best performance on "sky"; further progressive additions yield poorer results. In fact, contrary to the authors' claims, progressive additions actually degrade performance in several cases: for 5 classes, SAD+CCD+DSC+Feat gives worse results than SAD+CCD+DSC, and SAD+CCD gives the best result on "person", better than both SAD+CCD+DSC and SAD+CCD+DSC+Feat. Although the additions seem to yield growing mIoU, the often marginal improvements from these algorithmic components may not count as a significant contribution.

Section 4.4 gives some generalisation results. It could be strengthened by adding a progressive multi-source DA process, from GTA/Synthia->BSDS to GTA/Synthia/Cityscapes->BSDS, to better demonstrate the "multi-source" case.

Some minor corrections:

- Eq. 3 is missing a closing parenthesis after G_S_i->T(x_i)-x_i: it should read G_S_i->T(x_i))-x_i.
- At the end of p. 2, the claim "our method can be easily extended to tackle heterogeneous DA ..." seems too strong.
Adding one "unknown" class won't solve the issue of having multiple new classes; the same applies to training on "a specified category".