The authors propose to learn a data-augmentation strategy that improves performance compared to a strategy in which all augmentation parameters are randomized. The results are not on the datasets used by SOTA methods, and some of them appear in the appendix rather than the main paper.

I agree with the authors that R2 may have misunderstood the paper; R2 also did not participate in the post-rebuttal discussion. R2's review states, "The remaining contribution of this paper is the use of the re-parametrization trick to adapt the group over which we want to be invariant on, which is in my opinion not a substantial contribution to present this paper in NeurIPS." I do not think we should judge papers solely on the novelty of the technical section. While the idea is indeed simple, it does lead to performance improvements and brings to the fore the importance of learning how to augment data rather than performing random data augmentation.

R4's main concern seems to be the requirement of hyper-parameter tuning, but most methods require it; this alone cannot be the reason for rejection. R1, who recommends acceptance, says, "The main idea is very simple and this very appealing to a broad audience of practitioners. I would certainly like to use this method in my own research." While the idea does not produce ground-breaking results, I can see it being impactful in a wide variety of problems. I broadly agree with R1.

Despite negative scores from the reviewers and discussion with other ACs, I recommend that the paper be accepted.