Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
This paper proposes estimating the gradient w.r.t. the full-precision weights in a weight-quantized network by learning a meta network. The idea is original, and the paper is easy to follow overall. However, there are some concerns about the experimental results and setup. Specifically, the authors provide three designs of the meta quantizer, but they report the *best* test accuracy over these three. This may not be a fair comparison with the STE baseline. In the appendix, the authors report detailed results for each design on CIFAR-10 and CIFAR-100, but not on ImageNet. Also, the authors apparently use the change in loss value as the stopping criterion. But as can be seen from Figure 3 (and other figures in the appendix), the loss values can still fluctuate considerably towards the end of training, so this may again lead to an unfair comparison.
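The concern about reporting the best of three designs can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a reproduction of the paper's experiments: the accuracy level (0.90), the run-to-run noise (0.005), and the function name `simulate` are all hypothetical. It assumes every method is equally good and shows that taking the maximum over three noisy results still yields a higher average than a single run.

```python
import random
import statistics


def simulate(trials=10000, true_acc=0.90, noise=0.005, designs=3, seed=0):
    """Compare a single run against the best of `designs` equally good runs.

    All numbers are hypothetical; every run is drawn from the same
    distribution, so any gap comes purely from taking the maximum.
    """
    rng = random.Random(seed)
    single_run = []
    best_of_designs = []
    for _ in range(trials):
        # One result, as reported for a single baseline.
        single_run.append(rng.gauss(true_acc, noise))
        # Best result among several variants of equal true quality.
        best_of_designs.append(
            max(rng.gauss(true_acc, noise) for _ in range(designs))
        )
    return statistics.mean(single_run), statistics.mean(best_of_designs)


single_mean, best_mean = simulate()
print(f"single run: {single_mean:.4f}, best of three: {best_mean:.4f}")
```

Under these assumptions the gap is pure selection bias, which is why per-design results (as given in the appendix for CIFAR-10 and CIFAR-100) are the more informative comparison.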