| Paper ID: | 6875 |
|---|---|
| Title: | Implicit Semantic Data Augmentation for Deep Networks |

Originality: To the best of my knowledge, the paper's main idea is novel, and I find it very interesting. Clarity: The paper is clear, well structured and well written. Quality: The initial motivation and the description of the method are sound. The derivation of the upper bound seems correct and is very useful. The experimental evaluation is well designed and provides a fair assessment of the method. In my opinion, one missing experiment (or piece of information) is the time cost of estimating the covariance matrices. Another interesting experiment would be to reduce the number of training samples and check whether a smaller training set, strongly regularized by ISDA, can achieve close to state-of-the-art results. Significance: The experimental results show that adding ISDA to state-of-the-art models significantly improves the results. It also outperforms the state-of-the-art data augmentation method. When applied alone, Dropout performs better, but combining the two regularizers gives the best results. Moreover, the ablation study shows the importance of using the full covariance matrix when computing the loss.

The idea of the paper is original to the best of my knowledge. The major references are cited, and the paper does a good job of differentiating itself from previously published work. The paper is technically sound, with correct derivations. The claims are supported by results and theoretical analysis. The assumption that the embeddings follow a Gaussian distribution seems strong to me, and the paper could study the limitations of this assumption. Also, the images generated from the embeddings look interesting, but it would be nice to see cases where the system fails. The paper is clearly written and very easy to follow. A few implementation details are explained in the supplementary material. The most relevant aspect of the paper is perhaps its significance: given its simplicity and effectiveness, the proposed approach has the potential to be used by many researchers in the field. After reading the rebuttal, I am still happy with the paper. I think this paper should be accepted to NIPS.

This paper proposes a new image data-augmentation approach that adds class-dependent noise to the features (instead of to the input images). The idea of augmenting in the feature space is new and intuitive. The surrogate loss looks reasonably sound. The paper is well written. I have major concerns about the experimental results. In particular, the reported performance of the baselines looks much weaker than that reported in other papers. E.g., in Table 2 of [12], Wide-ResNet-28-10 on CIFAR-10 has a 3.9% top-1 error rate, whereas the present paper reports 4.81% for the base model and 4.30% for the proposed approach; both are worse than the base model in [12]. The same observation applies to the other settings (different base models and datasets). The empirical comparison is mainly against other "robust losses", such as focal loss, which can be seen as "implicit" data augmentation. How does the method compare with other popular data augmentation approaches, such as those proposed in [6, 12, etc.], which perform "explicit" data augmentation? The paper claims that the proposed approach adds little computational cost. Doesn't computing the covariance matrices for each data instance (Line 6 in Alg. 1) require additional computation? What is its computational complexity, and how does it affect the run time empirically? Minor: Line 185: "lost" should be "loss".