NeurIPS 2019
Sun Dec 8th – Sat Dec 14th, 2019, Vancouver Convention Center
This paper proposes the elegant and obvious-in-retrospect idea of using exact activations for the forward pass and low-precision activations for the backward pass, thereby achieving nearly the full memory savings of low-precision activations. It shows that this scheme nearly matches the exact training curves while allowing 4-bit precision.

Overall, the paper is well executed. The writing is clear, references to related work are fairly complete, and the experiments seem sensible and convincing. The reviewers feel that the paper could have been more ambitious in certain respects (e.g., experiments on more diverse architectures), but they didn't spot any major problems. The method seems genuinely useful and is probably simpler than other quantization methods. I think it should be accepted.

After reading the author feedback, I'm still a little confused about why the method is inapplicable to logistic activations; it's not obvious to me a priori that quantized activations couldn't give reasonably good approximations to backprop. It seems worth at least running the experiment, even if the results turn out negative. (I see no reason to delay publication over this point, though.)
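To make the core idea concrete, here is a minimal PyTorch-style sketch (my own illustration, not the paper's implementation): a linear layer whose forward pass uses the exact input activation but saves only a quantized 4-bit copy of it for computing the weight gradient in the backward pass. The class name `MemoryEfficientLinear` and the per-tensor uniform quantization scheme are hypothetical placeholders for whatever quantizer the paper actually uses.

```python
import torch

class MemoryEfficientLinear(torch.autograd.Function):
    """Sketch of the scheme: exact forward computation, but only a 4-bit
    quantized copy of the input activation is kept for the backward pass."""

    @staticmethod
    def forward(ctx, x, weight):
        out = x @ weight.t()                          # exact forward output
        # Uniformly quantize x to 4 bits (16 levels) for storage.
        lo, hi = x.min(), x.max()
        scale = (hi - lo).clamp(min=1e-8) / 15.0
        q = torch.round((x - lo) / scale).to(torch.uint8)  # values in [0, 15]
        ctx.save_for_backward(q, weight)
        ctx.lo, ctx.scale = lo, scale
        return out

    @staticmethod
    def backward(ctx, grad_out):
        q, weight = ctx.saved_tensors
        x_approx = q.float() * ctx.scale + ctx.lo     # dequantized activation
        grad_x = grad_out @ weight                    # gradient w.r.t. the input
        grad_w = grad_out.t() @ x_approx              # uses the approximate activation
        return grad_x, grad_w
```

Usage would be `MemoryEfficientLinear.apply(x, weight)` in place of a standard linear layer; the memory savings come from storing `q` as uint8-packed 4-bit codes instead of the full-precision activation, while the forward outputs themselves remain exact.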