This paper studies the landscape of overparametrized convolutional networks and argues that their training dynamics can be analyzed by comparing the trajectories of feature distributions. Using “network grafting” as a metric, it shows that the feature-distribution trajectories of two networks with the same architecture but different initializations remain close during training. The paper also shows that although the landscape is non-convex with respect to the trainable parameters, it can be reformulated as a convex function with respect to the features. Reviewers rate the paper as top 50%, marginally above the acceptance threshold, and marginally above the acceptance threshold. They find the paper well written and consider the proposed perspective for analyzing training dynamics novel and appealing. However, there was a lack of clarity about the claim of convexity, which the authors addressed in the rebuttal; those clarifications need to be added to the paper. I also note that there are earlier papers on landscape analysis showing that the non-convex objective on the parameters can be tightly connected to a convex objective on the output space (https://arxiv.org/pdf/1506.07540). Finally, the rebuttal discussion of using other standard statistical metrics would also be a worthwhile addition. Overall, there is agreement that this is a good paper.