The work considers SGD training for Bayesian linear models and illustrates a connection between training speed and generalization, explaining why SGD tends to select simpler models. In particular, the work shows that a particular type of posterior sampling via gradient descent yields the same model rankings as those based on the true posterior, under suitable assumptions. Experiments on deep nets are also presented. The reviewers liked the work overall, but felt that some aspects of the exposition were unclear, that the transition to deep nets and its implications are not quite convincing, especially since there is now a better understanding of both optimization and generalization in deep nets, and that baseline comparisons (e.g., SGLD, L2 regularization, dropout) would strengthen the work.