NeurIPS 2020

A Closer Look at the Training Strategy for Modern Meta-Learning

Meta Review

The main result of this paper is a bound on the generalization gap for S/Q meta-learners that depends solely on the number of tasks and not on the per-task support set size. The proof technique for this result draws heavily on Maurer [2005], but the result itself is conceptually novel. The empirical results are consistent with the theory, but are somewhat noisy and lack error bars (particularly Figures 2a and 2b), making it difficult to judge whether they strongly support the generalization bound.

The reviewers remain divided on this paper. R1 advocates rejection, with the following assessment: (1) The theoretical machinery is not itself a contribution, amounting to a manipulation of the proof technique of Maurer [2005]; (2) The empirical results in the paper are not inconsistent with the generalization bound, but do not strongly support it; (3) The paper lacks clarity overall (e.g., the unbiasedness assumption, restated but not clarified in (4) of the author response); (4) The assumptions used to produce the bound hold neither in general practice nor in the paper's own evaluation: Dinh et al. (2017) show that ReLU networks (which the appendix remarks are used to produce the empirical results) result in a non-Lipschitz loss, breaking the assumptions needed for uniform stability of SGD in Hardt et al. (2016), which is in turn required to derive the generalization bound in the submission.

R2 and R3 are in favor of acceptance, as they found the paper interesting and did not take issue with the limited novelty of the proof technique. I tend to agree with R2 and R3. I think that using a proof technique from prior work is not necessarily a weakness (and may even be a strength), and the outcome is interesting and novel. I recommend acceptance. However, I strongly encourage the authors to:
- add error bars and/or more runs to the experiments in Figures 2a and 2b, so that it is clearer whether the trend is flat or downward;
- include a point of comparison in the plots in Figures 2c and 2d, so that it is possible to understand the scale;
- revise the paper to address the clarity concerns of R1 and R2;
- address, or at least discuss, the limitation in point (4) that R1 raised above.