This paper presents an analysis of how batch normalization affects the trainability of residual networks. Based on this analysis, the authors propose an initialization scheme that enables the training of deep unnormalized residual networks. The paper received mixed reviews (clear reject -> marginally below; marginally above -> accept; marginally below -> accept; top 50%). On the positive side, the paper is clearly written, addresses a relevant problem, has solid experiments, and offers an overall very good analysis of the effects of BN. On the negative side, R1 finds the conclusion unsurprising and argues that the proposed initialization method had been suggested before by Balduzzi and Zhang. In addition, R1 finds that the technical writing lacks the clarity needed to assess correctness. The other reviewers also had critiques, but were satisfied by the rebuttal and raised their ratings. Overall, I find that the strengths outweigh the weaknesses and side with the majority of the reviewers (3/4) who recommend acceptance.