The reviews of this paper were positive overall. The authors propose a meta-learning approach to predicting the generalization gap of a model. The approach yields both a generalization bound on the model and an optimal regularizer for training it. The experimental results show that the proposed approach is an effective regularizer in the few-shot setting, and the authors also show how the method can improve performance on a single large task via n-fold cross-validation.

The reviewers appreciated the significance of the contributions and the novelty of the perspective, as well as the clever use of multi-headed attention and the nuanced interpretation of the results. They expressed concerns, however, about the baselines and comparisons in the experimental evaluation, and pointed out several minor issues. There were also more detailed concerns regarding cross-validation, data augmentation, and the bilinear layer.

The authors submitted a response to the reviewers' comments, as well as confidential comments to the area chair. After reading the response, updating their reviews, and discussing, the reviewers feel that 'the idea is super promising, but the paper still needs more work' and that 'stronger baselines and larger models reported in the rebuttal are much appreciated'.

We highly recommend taking the reviewers' comments and suggestions into account when preparing the final version, in particular the additional experimental material and the textual clarifications that the authors mentioned in their response.

Accept.