Review for NeurIPS paper: Training Generative Adversarial Networks with Limited Data

NeurIPS 2020

Review 1

Summary and Contributions: This paper proposes a new, effective method for training GANs from small datasets by incorporating non-leaking data augmentation. To achieve this, the authors design adaptive discriminator augmentation (ADA). Through diverse experiments, they extensively analyze why data augmentation can harm GAN performance. ADA uses an augmentation probability that applies to both the generator and discriminator updates, together with a heuristic metric for measuring the degree of overfitting. They evaluate ADA on various datasets, such as FFHQ, LSUN Cat, AFHQ-Dog, MetFaces, BreCaHAD, and CIFAR-10, comparing it with state-of-the-art methods including bCR. The results are very promising.

Strengths:
- Robust GAN training from small datasets is very important and challenging.
- Although some recent studies have proposed data-augmentation-based GAN training, this method addresses the problem of leaking augmentations.
- The proposed method is simple but effective.
- The paper is well written and clear.
- The authors provide an extensive analysis of leaking augmentations and promising results on many datasets, including interpolation videos.
- The authors will release a new dataset, MetFaces.

Weaknesses: Basically, I like this paper.
- What are the results when augmentation is not applied during the G updates? Since the overall flow of the proposed method differs from bCR, results with D-only augmentation might be meaningful for supporting the hypothesis.
- In Figure 2, what do the yellow-green boxes mean?
- p is adjusted every 4 minibatches. How sensitive is performance to this number?
- What causes the large fluctuation of r_t for fixed p in Figure 5(d)? More discussion would be helpful for readers.
- For the AFHQ dataset, the authors present results for the dog domain. What are the results for the wildlife domain, where the intra-domain variance is larger? Are they similar?
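The adjustment loop the reviewer asks about can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the overfitting heuristic is the mean sign of the discriminator's raw outputs on real training images, r_t = E[sign(D_train)], accumulated over every 4 minibatches, and that p moves up or down by a fixed step depending on whether r_t is above or below a target. The target (0.6), the step size, and all function names are hypothetical choices for illustration.

```python
def overfitting_heuristic(d_real_outputs):
    """r_t: mean sign of D's raw outputs on real training images (assumed definition).
    Values near 1 mean D confidently accepts every training image (overfitting)."""
    signs = [(x > 0) - (x < 0) for x in d_real_outputs]
    return sum(signs) / len(signs)


def update_p(p, r_t, target=0.6, step=0.01):
    """Move p one fixed step toward keeping r_t near `target`, clamped to [0, 1].
    `target` and `step` are hypothetical values."""
    if r_t > target:
        p += step  # D looks overfit: augment more often
    elif r_t < target:
        p -= step  # D is not overfitting: augment less often
    return min(max(p, 0.0), 1.0)


def ada_controller(minibatch_outputs, p=0.0, interval=4, target=0.6, step=0.01):
    """Hypothetical ADA-style loop: every `interval` minibatches, estimate r_t from
    the accumulated real-image D outputs and update p. Returns the history of p."""
    history = [p]
    buffer = []
    for i, outputs in enumerate(minibatch_outputs, start=1):
        buffer.extend(outputs)
        if i % interval == 0:
            r_t = overfitting_heuristic(buffer)
            p = update_p(p, r_t, target, step)
            buffer = []
            history.append(p)
    return history


if __name__ == "__main__":
    # Toy run: D is positive on every real image, so p keeps rising.
    real_outputs = [[2.0, 1.5, 0.3, 1.1]] * 12
    print(ada_controller(real_outputs))
```

In this sketch, the sensitivity question amounts to varying `interval`: a longer interval averages r_t over more images and so gives a less noisy estimate, but p reacts more slowly to changes in overfitting.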

Correctness: The claims are clear and correct.

Clarity: The paper is well written and easy to follow.

Relation to Prior Work: Clearly discussed.

Reproducibility: Yes

Additional Feedback:
- Overall, this method addresses discriminator overfitting with non-leaking data augmentation. It is well known that model capacity influences the effect of data augmentation methods: strong augmentations such as mixup [Zhang et al. 2018] and CutMix [Yun et al. 2019] yield larger improvements for larger models (ResNet, EfficientNet-L) than for smaller ones (mobilenet_v2). Given a larger training dataset, would data augmentation still have an effect if the discriminator's parameter count were increased? Are there any results with a larger discriminator? Of course, I know this is beyond the scope of this paper.
- In Figure 1, it would be better to add a note such as "black dots indicate the best points."
- In Figure 3, do the patterns of p for the color transformation differ depending on the color? For example, do they differ for a fixed color tone (blue, green, yellow, etc.)?
- In Figure 7(d), quantitative values, such as the pixel-wise L1 difference between the two mean images, would be helpful.
[Zhang et al. 2018] mixup: Beyond Empirical Risk Minimization. ICLR 2018.
[Yun et al. 2019] CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. ICCV 2019.
After rebuttal: ==========================================
I thank the authors for their great efforts. I carefully read the other reviewers' comments and the author response. The authors clearly answered my questions. I decided to raise my score to 9 considering the importance of this topic.
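The pixel-wise L1 difference between mean images that Review 1 suggests for Figure 7(d) could be computed along these lines. This is a sketch assuming both image sets are arrays of shape (N, H, W, C) with matching H, W, and C; the function names and the choice to average (rather than sum) the absolute differences are illustrative assumptions.

```python
import numpy as np


def mean_image(images):
    """Pixel-wise mean of a stack of images with shape (N, H, W, C)."""
    return np.asarray(images, dtype=np.float64).mean(axis=0)


def mean_image_l1(images_a, images_b):
    """Average absolute (pixel-wise L1) difference between the two mean images."""
    diff = mean_image(images_a) - mean_image(images_b)
    return float(np.abs(diff).mean())
```

Averaging keeps the value on the same scale as the pixel intensities, so it is comparable across resolutions; summing would instead grow with the image size.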

Review 2

Summary and Contributions: This work proposes to address the problem of limited data in GAN training with discriminator augmentation (DA), a technique that enables most standard data augmentation techniques to be applied to GANs without leaking them into the learned distribution. The method is simple yet effective: non-leaking differentiable transformations are applied to real and fake images before they are passed through the discriminator, during both discriminator and generator updates. To make transformations non-leaking, it is proposed to apply them with some probability p < 1, so that the discriminator can still discern the true underlying distribution. One challenge introduced by this technique is that different datasets require different amounts of augmentation depending on their size, so an expensive grid search is required to tune the augmentation strength. To eliminate the need for this search step, an adaptive version called adaptive discriminator augmentation (ADA) is introduced. ADA monitors discriminator overfitting and adjusts the strength of the augmentation accordingly throughout training. It is shown that ADA improves image generation quality significantly in the limited-data setting, outperforming all competing regularization and transfer learning methods. A new dataset called MetFaces is introduced as a high-resolution, low-data option. Additionally, significant improvements over current state-of-the-art results are achieved on the popular CIFAR-10 benchmark.
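The role of p < 1 can be checked on an idealized toy model that tracks only image orientation. Assume, for illustration, an augmentation that, when applied with probability p, rotates an image by an angle drawn uniformly from {0°, 90°, 180°, 270°}. For upright data, the unrotated orientation then has probability 1 - 3p/4 and each other orientation p/4, so the original orientation remains the most likely outcome exactly when p < 1. Equivalently, a source of upright images and a source of images rotated by 90° produce identical augmented distributions at p = 1 (a generator producing sideways images would go undetected) but different ones for any p < 1. The rotation set and the function names are assumptions made for this sketch.

```python
from fractions import Fraction


def orientation_distribution(p, source=(1, 0, 0, 0)):
    """Distribution over orientations (index k means k * 90 degrees) after augmentation.

    source: probability of each orientation in the un-augmented images.
    With probability 1 - p an image is left unchanged; with probability p it is
    rotated by r * 90 degrees, with r drawn uniformly from {0, 1, 2, 3} (assumed).
    """
    p = Fraction(p)
    out = [Fraction(0)] * 4
    for k, q in enumerate(source):
        q = Fraction(q)
        out[k] += (1 - p) * q  # image not augmented
        for r in range(4):  # image augmented with a rotation of r * 90 degrees
            out[(k + r) % 4] += p * q / 4
    return out


def leaks(p):
    """True if upright and sideways sources become indistinguishable after augmentation."""
    upright = (1, 0, 0, 0)
    sideways = (0, 1, 0, 0)
    return orientation_distribution(p, upright) == orientation_distribution(p, sideways)
```

In this toy model, the gap between the two augmented distributions in the upright bin is exactly 1 - p, so it shrinks to zero only as p reaches 1; this is the sense in which the augmentation leaks only at p = 1.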

Strengths:
S1 - Data augmentation is an extremely common tool in classification but until now has been sorely missing from the GAN literature. This work is very likely to have widespread appeal considering the current popularity of GANs.
S2 - A principled definition of what constitutes a non-leaking augmentation operator.
S3 - The adaptive method eliminates the need for costly grid searches.
S4 - Considerable improvement in the limited-data setting, which has long been a sore spot for GANs.
S5 - State-of-the-art performance on the CIFAR-10 dataset by a large margin. Considering how popular this dataset is for benchmarking GAN performance, this is no small feat.

Weaknesses: None that I found noteworthy.

Correctness: Co1 - Comparison between the proposed technique and competing methods is fair. Care is taken to properly optimize each method, rather than simply using default settings.

Clarity: Cl1 - The paper is well written and easy to follow. The figures do a great job of summarizing key points. The supplementary material contains extensive details for reproducibility and further insights.

Relation to Prior Work: RPW1 - Extensive discussion and comparison with related methods.

Reproducibility: Yes

Additional Feedback: AF1 - This is fantastic work! Proper data augmentation has been missing from GANs for a long time, and this work fills that hole. Having adaptive augmentation strength as well is icing on the cake. The supplementary material is top-class: very detailed, with many insights that will be useful to those trying to reimplement or extend this work in the future. == Post Rebuttal == After reading the rebuttal and the other reviewers' comments, I have decided to maintain my score for this paper. The problem addressed is of widespread interest, and the paper provides many details and insights that will be of use to those who would like to use or build on this work.

Review 3

Summary and Contributions: This paper investigates the overfitting problem of GANs given limited data. The overfitting occurs in the discriminator, and this paper uses data augmentation to address it. The authors explore many data augmentation methods and their combinations. From experiments, the authors find that the model's performance is affected by the training process, and thus propose adaptive discriminator augmentation, which adjusts the augmentation strength based on the output of the discriminator. Both qualitative and quantitative results demonstrate that the proposed method achieves good results.

Strengths: Given limited data, this paper investigates the overfitting problem of GANs, which occurs in the discriminator. Pros: The paper designs an interesting experiment that examines the discriminator's outputs for training images, generated images, and validation images. I like this experiment, which uses three sets to check what happens in D. Figure 1 shows that D's outputs for validation images are similar to its outputs for generated images, which means D has heavily memorized the training images. A comprehensive analysis is conducted, covering overfitting, the choice of data augmentations, the augmentation probability, the dataset size, different datasets, and transfer learning. The paper is easy to follow.

Weaknesses: I have a few questions:
1. If a different GAN loss (here WGAN-GP) were used for training, would the output distribution of D (Figure c) be similar? What I mean is that D(real) is always larger than D(fake). What happens if we use the hinge loss?
2. In Figure 8(a), spectral normalization gives worse results. Why does this happen? I know it is odd to ask, since spectral normalization is not from this paper.
3. The title should be more specific and include information like "data augmentation", so that readers can quickly grasp the key point. There are several techniques for reducing overfitting, such as data augmentation, regularization, and transfer learning, so I would like the title to include more specific information.

Correctness: It is very clear.

Clarity: It is a good paper and easy to follow.

Relation to Prior Work: The authors summarize the related work.

Reproducibility: Yes

Additional Feedback: --------------------- AFTER REBUTTAL --------------------- I thank the authors for the rebuttal. The authors addressed my concerns. I would like to keep my score.

Review 4

Summary and Contributions: The paper proposes a method for training GANs with a limited amount of data. The idea is to perform data augmentation in the discriminator. The augmentation is carefully designed to ensure that the generator will converge to the data distribution if the augmented distributions match. The proposed method is evaluated on several visual tasks.

Strengths: I agree that learning GANs in the limited-data setting is important and interesting. The proposed method is simple and makes sense. The augmentation is carefully designed to ensure that the generator will converge to the data distribution if the augmented distributions match. The results on several visual tasks are impressive.

Weaknesses: The paper lacks theoretical insight into the proposed method. The discussion of "non-leaking" augmentation operators only considers the equilibrium point, which can hardly be achieved in practice. It therefore remains unclear to me how this artificially augmented data leads to a better generator in an adversarial game. The main hypothesis of the paper is that the method prevents the discriminator from overfitting the training data. What if we instead used a smaller discriminator, early stopping, or other techniques in the discriminator to prevent overfitting? Would the results be the same? An empirical comparison is necessary, and further analysis of the improvements would be preferable. ------after rebuttal------- Thanks for the author feedback. I agree that the proposed method won't change the equilibrium. I won't fault the authors for the missing analysis of the empirical convergence. Besides, the authors claim that other techniques, including smaller and fine-tuned discriminators and early stopping, won't prevent the discriminator from overfitting, which strengthens the motivation. I raise my score from 6 to 7.

Correctness: Yes.

Clarity: Yes.

Relation to Prior Work: The most related paper is discussed.

Reproducibility: Yes

Additional Feedback: