The paper proposes a novel end-to-end, gradient-based optimization method for searching discrete low-bit weights in quantized networks. After reading the reviews, the rebuttal, and the discussion among reviewers, it is clear that the paper is recognized as novel and well executed. I encourage the authors to further improve their work by clarifying the temperature decay strategy in the camera-ready version and by adding a comparison with SGD-R scheduling, as pointed out by one of the reviewers. It would also be good to mention how the proposed approach relates to "Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization" (NeurIPS 2019), as pointed out by R1.