This paper evaluates various "generalization measures" (numbers computed from the training data, the training algorithm, and network properties) in terms of their success at predicting generalization. The work builds on the prior work of Jiang et al. (their ) in ways the authors clearly define, and thus provides a new set of results on similar questions. The changes are interesting, and since the generalization of deep networks is of such extensive interest to so many, I also feel these results will be valuable. I look forward to seeing this paper appear, and I support the authors in their future work.

Minor comments (just my own opinions; I tried not to weigh them too heavily in my evaluation).

(a) Thanks for trying the spectral thing; it's a pretty random thing I noticed. I've also done many such experiments and was surprised it didn't matter so much in my case, but your metrics are more sensitive than what I've tried, so tbh I think it's quite interesting. Maybe worth a mention in an appendix?

(b) I still feel (this was my other point) that it would be valuable, especially to outside readers, to include a table and/or description of these generalization measures and what they mean, both rigorously and intuitively.

(c) I personally still find the presentation of Figure 1 quite dense. Since IMO Figure 1 is the core of this paper, I think it would be reasonable to spend more time explaining Figure 1, and even to expand it, shortening some other material or moving it to appendices in the process.

(d) In your feedback, you included responses to reviewers about how your metrics compare to . I think it is essential to include these comments, and more, in your revisions.

(e) Your rebuttal was pretty thorough, thanks. I should have highlighted the importance of (b) more; it mattered more to me than (a).