NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 3314
Title: Dual Adversarial Semantics-Consistent Network for Generalized Zero-Shot Learning

Reviewer 1

(+) The primary contribution of the paper is the dual-GAN structure with semantics-consistency and visual-consistency losses. The paper has novel components (although it is close to [7], see below). The paper shows that the model performs better than existing methods on GZSL benchmark datasets.
(+) The paper is well written and easy to follow.
(+/-) Multiple losses contribute to the overall performance. To show the individual contribution of each loss, the paper gives an ablation study in Table 3. However, the main paper should also include ablation results on the CUB and SUN datasets. This is important because these two datasets have considerably more classes than the others.
(-) In Figure 3(b), DASCN w/o SC appears to perform better than the full DASCN model. This needs to be clarified.
(-) The paper is closely related to [7] but does not discuss or compare against it sufficiently. The main idea of both this paper and [7] is inspired by the cycle-consistency used in image-to-image translation [33]. In [7], (i) a GAN and a regressor are used instead of the proposed dual-GAN structure, and (ii) the proposed visual-consistency loss (VC) is not used. To establish the importance of the dual-GAN structure, the performance of DASCN w/o VC should be compared with that of [7]. In the paper's present state this cannot be done because (i) [7] uses semantic features on the CUB dataset that differ from those used by others, and (ii) the authors have not presented ablation results on the SUN and CUB datasets. This is a very important issue and should be carefully addressed in the rebuttal.

Mistakes and typos:
(Important) In the second term of equation (7), it should be Dv(x’’, a’) instead of Dv(x, a’).
The summation limits in equation (2) should use a lowercase c.
Line 157: instability during training

Post rebuttal edit: The rebuttal answers some questions, but it adds significant new results, including results on a new dataset (FLO) that was absent from the original paper. In comparison with [7], the results on CUB and SUN (the bigger datasets) do not show significant improvements. AWA1 is smaller, and FLO has been newly added. In my opinion, the paper needs to be revised and resubmitted with all the details added and the discussions extended and revised.

Reviewer 2

This paper proposes a novel generative model for GZSL that synthesizes inter-class discriminative and semantics-preserving visual features for seen and unseen classes. The proposed DASCN model preserves visual-semantic consistency by employing dual GANs to capture the visual and semantic distributions, respectively. Extensive experimental results consistently demonstrate the superiority of DASCN over state-of-the-art GZSL approaches.

Reviewer 3

Originality: low
The proposed method looks like a variant of Cycle-WGAN [7] that changes the consistency loss to a class-wise loss.

Quality: middle
The proposed method shows better performance by a large margin, and the ablation study shows that the proposed loss works well. One question is how Cycle-WGAN [7] would perform in the proposed setting, since the proposed method is similar to [7]. Another question is whether the high performance of the proposed method arises mainly from ts, the accuracy on unseen classes. If so, I think the contribution lies more in the zero-shot learning part than in the generalized zero-shot learning part, and I am curious about a comparison with state-of-the-art zero-shot learning methods in the standard zero-shot learning setting.

Clarity: high
The paper is easy to follow and the proposed algorithm is well described.

Significance: middle
Though the proposed method works well experimentally, it is not clear whether the performance gain comes from success at generalized zero-shot learning or simply from success at zero-shot learning. Also, given Cycle-WGAN, the technical contribution seems small.

After Rebuttal
I increase the score because the rebuttal addresses my concerns about the difference from Cycle-WGAN and the implication of the ts score.
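
The concern about ts can be illustrated with the harmonic mean that GZSL benchmarks commonly report alongside per-split accuracies. The sketch below assumes the usual definition H = 2 * ts * tr / (ts + tr), where tr (seen-class accuracy) and H are names not used in the review itself, and all numbers are hypothetical rather than results from the paper.

```python
def harmonic_mean(ts: float, tr: float) -> float:
    """Harmonic mean H of unseen-class accuracy ts and seen-class accuracy tr.

    Assumed definition: H = 2 * ts * tr / (ts + tr).
    """
    if ts + tr == 0:
        return 0.0  # avoid division by zero when both accuracies are zero
    return 2.0 * ts * tr / (ts + tr)


# Hypothetical accuracies, chosen only for illustration.
baseline = harmonic_mean(ts=0.40, tr=0.60)

# Same absolute gain (+0.10) applied to ts or to tr.
gain_on_ts = harmonic_mean(ts=0.50, tr=0.60)
gain_on_tr = harmonic_mean(ts=0.40, tr=0.70)

print(f"baseline H   = {baseline:.4f}")
print(f"+0.10 on ts  = {gain_on_ts:.4f}")
print(f"+0.10 on tr  = {gain_on_tr:.4f}")
```

With these made-up numbers, the same absolute gain raises H more when it is applied to ts, the lower of the two accuracies, which is why a comparison in the standard zero-shot setting would help separate the two effects.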