I have read the reviews, the rebuttal, and much of the paper itself. The paper provides an accessible analysis of the role of L2 normalization in two common "similarity measures". My main issue with the analysis is that the claim in lines 89-91 is circular: is it not the role of the analysis to show why L2 normalization is beneficial? The three suggested architectural modifications are reasonably tied to the analysis, and while none of them is groundbreaking by itself, the final method is powerful. Like some of the reviewers, I am concerned that the major contributor seems to be the FRN method. However, I find the rebuttal convincing enough on this point. Based on this, I suggest accepting the paper despite the borderline ratings.