The initial reviews disagreed about this paper: two positive reviewers noted the reduction in computational and communication costs compared to prior solutions, while two more negative reviewers raised concerns, in particular regarding novelty and the comparison with previous work. After the author rebuttal and further discussion, the doubts regarding the comparison to recent work were lifted, leading one reviewer to increase their score. While some concerns remain regarding the applicability of the work to non-linear models, the merits of the work are judged significant enough, and we decided the paper should be accepted. In the final version, the authors are asked to be more explicit about the potential limitations of the degree-1 approximation to the sigmoid, and to add a discussion of how one might extend the approach to more complicated (deep) models.