NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 3320
Title: Mining GOLD Samples for Conditional GANs

Reviewer 1

This paper revisits conditional GANs and proposes a deeper analysis of the discrepancy between the data distribution and the model distribution. The authors measure this discrepancy as the gap between the two log-densities, then derive an approximate surrogate for this gap by quantifying the contributions of generated samples and labelled samples under the ACGAN framework. The paper gives a clear introduction to the work, with some originality and quality. However, one of my concerns is the basic assumption: the authors choose the log-density gap as the measure of discrepancy, but why not others, for example the relative entropy or other generalized metrics?
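For context on this concern, the pointwise log-density gap and its relation to the relative entropy the reviewer suggests can be sketched as follows (this uses the standard identity for an optimal discriminator $D^*$ from Goodfellow et al. (2014), not notation taken from the paper itself):

$$
\log\frac{p_{\mathrm{data}}(x,y)}{p_G(x,y)}
= \log\frac{p_{\mathrm{data}}(x)}{p_G(x)} + \log\frac{p_{\mathrm{data}}(y\mid x)}{p_G(y\mid x)},
\qquad
\log\frac{p_{\mathrm{data}}(x)}{p_G(x)} \approx \log\frac{D^*(x)}{1-D^*(x)}.
$$

Note that taking the expectation of this pointwise gap under $p_{\mathrm{data}}$ gives exactly $\mathrm{KL}(p_{\mathrm{data}} \,\|\, p_G)$, so the relative entropy the reviewer proposes is the aggregate of the quantity estimated pointwise here.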

Reviewer 2

- Section 2, line 62: what is (c)? Which group of methods does it refer to?
- Section 3.2, lines 129-131, are written unclearly; a better explanation would help the intuition behind equation (5).
- "cGANs with Projection Discriminator" (Miyato & Koyama, 2018) is another type of conditional GAN, different from the (a) and (b) types in Section 2. It also decomposes the GAN loss into marginal and conditional ratios, which is part of the definition of the GOLD motivator.
- A recent paper, "Metropolis-Hastings Generative Adversarial Networks" (Turner et al., 2018), uses discriminator values to build a sampling procedure for better inference in GANs. Since the current paper uses rejection sampling (which can be inferior to MH), it would be good to discuss whether GOLD can work with MH and what performance to expect, e.g., by adding it to Figure 2 in the experiments. Discriminator calibration is also a relevant concept from that paper.
- Section 3.2, equation (7): please discuss in more detail why entropy is used to estimate unknown classes, and how this relates to uncertainty in the prediction of the most probable class.
- Experiments (Section 4), line 180: why are three different GAN architectures used? Could every experiment be run with all three chosen architectures, or are there limitations of the data and models?
- Section 4.3, line 245: the re-initialization in the G and D training scheme seems heuristic; the intuition could be better explained.
- Line 255: it is unclear what is shown in columns 2 and 3 of Figure 4; please clarify.
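For readers unfamiliar with the MH-GAN procedure the reviewer contrasts with rejection sampling, here is a minimal sketch of its acceptance rule. It assumes a calibrated discriminator whose output approximates $p_{\mathrm{data}}/(p_{\mathrm{data}} + p_g)$; the function name `acceptance_prob` and the toy values are illustrative, not from either paper's code.

```python
def acceptance_prob(d_prev, d_prop):
    """Metropolis-Hastings acceptance probability built from discriminator
    outputs, in the spirit of MH-GAN (Turner et al., 2018).

    With r(x) = D(x) / (1 - D(x)) approximating the density ratio
    p_data(x) / p_g(x), a proposed sample x' replaces the current sample x
    with probability min(1, r(x') / r(x))."""
    r_prev = d_prev / (1.0 - d_prev)
    r_prop = d_prop / (1.0 - d_prop)
    return min(1.0, r_prop / r_prev)

# A proposal the discriminator rates higher is always accepted:
print(acceptance_prob(0.5, 0.8))   # r goes from 1.0 to 4.0 -> prints 1.0
# A weaker proposal survives only with probability r(x') / r(x):
print(acceptance_prob(0.8, 0.5))   # prints 0.25
```

The design difference the reviewer alludes to: discriminator rejection sampling accepts each sample independently with probability proportional to $r(x)$, which requires (an estimate of) an upper bound on $r$, whereas MH runs a chain that only ever compares pairs of candidates, so no such bound is needed.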

Reviewer 3

Clarity

- The paper is very well written and very clearly structured.
- The experimental setup is clear and the results are well explained.

Originality

While there are several related methods that use the discriminator to estimate likelihood ratios (e.g., [1], [2], and [3]), the proposed method is specific to the conditional case, and is applied in a new way to modify training and to active learning. The paper clearly states that its rejection sampling is an extension of a similar approach for the unconditional case. In terms of novelty, I think the paper passes the required bar.

Quality

- The method used is sound. I think the paper does a good job of addressing the main issues that can arise in the proposed method, such as using sufficiently trained discriminators (lines 132-133).
- That being said, I think the paper should better address some failure cases, such as the effect of GOLD when training is unstable, e.g., divergence or mode collapse.
- The experimental results are generally convincing. I appreciate that the paper applied example re-weighting on six image datasets and showed improved performance on all of them.
- While the paper provides clear justification for using the "fitting capacity" for quantitative evaluation, I still believe that providing FID scores would strengthen the results.
- One main concern I have with using GOLD to re-weight examples is that it forces the generator to produce samples that can "fool" a classifier into producing the right class, rather than generating more "realistic" examples. I think the paper should address this issue in more detail.

Significance

- The paper proposes a method that can potentially improve research in conditional GANs.

Minor issues:

- The GAN loss in Eq. 1 is wrong. The code seems to be using the minimax GAN (i.e., BCE loss for both the generator and the discriminator).

References

[1] [2] [3]

====================

Post rebuttal: Thanks for responding thoroughly to my comments. I think this is a good paper, and I vote for accepting it.
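Regarding the minor issue on Eq. 1 above: the minimax (BCE) objective the reviewer says the released code appears to implement is the standard one from Goodfellow et al. (2014), sketched here for reference:

$$
\min_G \max_D \; \mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\log D(x)\big]
+ \mathbb{E}_{z\sim p(z)}\big[\log\big(1 - D(G(z))\big)\big],
$$

which corresponds to binary cross-entropy on the discriminator's real/fake prediction, applied to both players.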