| Paper ID | 4363 |
|---|---|
| Title | Post training 4-bit quantization of convolutional networks for rapid-deployment |

The authors provide an approach for post-training quantization of deep convolutional neural networks down to 4-bit weights and activations. Their approach combines three methods:

1. Analytical Clipping for Integer Quantization (ACIQ): ACIQ clips activations to minimize MSE, assuming Gaussian or Laplace distributions.
2. Per-channel bit allocation (PCBA): a channel that needs more bits can take them from a channel that requires fewer, as long as the average number of bits remains at the target (e.g., 4 bits).
3. Bias correction: this method analyzes and compensates for some of the bias induced by weight quantization.

The authors evaluate many combinations of these approaches on ImageNet, across many networks and a few activation and weight bit precisions. They are among the first to apply post-training quantization to networks at 4-bit and lower precision.

The paper is interesting and has clear significance, establishing a state of the art for post-training quantization at 4 bits. Table 1 is clear and shows how the different methods affect results. The writing and results are mostly clear but could be improved.

The work may overstate its value a bit. The assumption that networks will not be retrained seems weak: if there is great value in a 4-bit network, it will be fine-tuned to achieve the best score it can.

Per-channel bit allocation is difficult for inference. Its benefit is reduced model size when applied to weights. Hardware must support the worst-case bit widths, so there is no real benefit for activations, which makes the 4W4A label somewhat misleading. Model parsing also becomes more complex.

Line 187: "5.2 Interaction between quantization medthods" should read "methods".
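To make the clipping trade-off in point 1 concrete, here is a minimal sketch. It is not the authors' analytical ACIQ derivation: it simply grid-searches a symmetric clipping threshold for a 4-bit uniform quantizer on synthetic Laplace samples and compares the resulting MSE with the unclipped case. The function names, sample size, and search grid are assumptions chosen for illustration.

```python
import numpy as np

def quantize_clipped(x, alpha, bits):
    """Clip x to [-alpha, alpha], then map each value to the midpoint
    of one of 2**bits equal-width bins."""
    levels = 2 ** bits
    step = 2.0 * alpha / levels
    clipped = np.clip(x, -alpha, alpha)
    # Bin index in [0, levels - 1]; the value +alpha falls into the last bin.
    idx = np.minimum(np.floor((clipped + alpha) / step), levels - 1)
    return -alpha + (idx + 0.5) * step

def mse(x, alpha, bits):
    """Mean squared error between x and its clipped, quantized version."""
    return float(np.mean((x - quantize_clipped(x, alpha, bits)) ** 2))

# Synthetic stand-in for a tensor of activations (assumption: Laplace, scale 1).
rng = np.random.default_rng(0)
x = rng.laplace(loc=0.0, scale=1.0, size=100_000)

bits = 4
max_abs = float(np.max(np.abs(x)))

# Brute-force search over clipping thresholds; the last grid point is "no clipping".
alphas = np.linspace(0.5, max_abs, 200)
errors = [mse(x, a, bits) for a in alphas]

best_alpha = float(alphas[int(np.argmin(errors))])
best_mse = min(errors)
no_clip_mse = mse(x, max_abs, bits)

print(f"best alpha: {best_alpha:.3f}, MSE with clipping: {best_mse:.5f}")
print(f"max |x|:    {max_abs:.3f}, MSE without clipping: {no_clip_mse:.5f}")
```

On heavy-tailed data like this, a threshold well inside the observed range gives a lower MSE than using the full range, because the finer step size outweighs the distortion added by clipping the few outliers.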

In this paper, the authors introduce simple yet effective dataset-free heuristics for post-training quantization. Namely:

1. Clipping the activation values to some range, which helps to focus on the denser region of values and to quantize them with less distortion. The range values are determined using the mean-square-error between the original weights and the quantized weights.
2. Per-channel bit allocation. The authors propose allocating the number of bits dynamically instead of fixing it ahead of time for all channels. This is done by formulating an optimization problem whose solution can be obtained analytically.
3. Since quantized weights have a different mean and standard deviation than the original float32 weights, the authors propose to correct those differences.

It can be clearly seen that these heuristics are based on statistical information about the weights and activations, and that they can be combined. The paper is well organized and easy to follow. All proofs and derivations seem correct to me.

Major concerns:

- For ACIQ, as the authors state, the idea of using clipping is not new. It is not clear how the performance compares to other ways of removing outliers (e.g., simply removing all values beyond $\pm 2\sigma$).
- The authors apply the proposed methods to channel-wise quantization. What is the performance for other types of quantization (filter-wise, layer-wise)? Some additional experiments and comparisons are probably required to see whether the methods are generally applicable. Otherwise, it might be problematic to integrate them with other quantization techniques.
- What is the motivation for bias correction? Empirically, it shows benefits, but it is unclear why a bias in the mean and variance is harmful for quantization and why this correction should improve it.

-----

Since the authors address most of my concerns, I would like to increase my score from 6 to 7.
Overall, I think that it is a good paper and makes a significant contribution to the machine learning community.
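As an illustration of the third heuristic discussed in this review (matching the mean and standard deviation of the quantized weights), here is a minimal sketch. It assumes a weight tensor laid out as (out_channels, ...) and re-centers and re-scales each quantized output channel so that its statistics match the float32 original. This is one plausible reading of the correction, not necessarily the exact formula used in the paper, and all function names are hypothetical.

```python
import numpy as np

def quantize_per_channel(w, bits):
    """Symmetric uniform quantization with one scale per output channel (axis 0)."""
    flat = w.reshape(w.shape[0], -1)
    max_abs = np.max(np.abs(flat), axis=1, keepdims=True)
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for signed 4-bit
    scale = max_abs / qmax
    return (np.round(flat / scale) * scale).reshape(w.shape)

def correct_bias(w, w_q, eps=1e-12):
    """Shift and scale each quantized channel so that its mean and standard
    deviation match those of the original float weights (assumed scheme)."""
    flat = w.reshape(w.shape[0], -1)
    flat_q = w_q.reshape(w_q.shape[0], -1)
    mu, mu_q = flat.mean(axis=1, keepdims=True), flat_q.mean(axis=1, keepdims=True)
    sd, sd_q = flat.std(axis=1, keepdims=True), flat_q.std(axis=1, keepdims=True)
    corrected = (flat_q - mu_q) * (sd / (sd_q + eps)) + mu
    return corrected.reshape(w.shape)

# Synthetic convolution weights: 8 output channels, 3x3x3 kernels (assumption).
rng = np.random.default_rng(0)
w = rng.normal(loc=0.02, scale=0.05, size=(8, 3, 3, 3))

w_q = quantize_per_channel(w, bits=4)
w_c = correct_bias(w, w_q)

def channel_stats(t):
    """Per-channel mean and standard deviation."""
    f = t.reshape(t.shape[0], -1)
    return f.mean(axis=1), f.std(axis=1)

mean_gap_before = np.abs(channel_stats(w)[0] - channel_stats(w_q)[0]).max()
mean_gap_after = np.abs(channel_stats(w)[0] - channel_stats(w_c)[0]).max()
print(f"max per-channel mean gap before: {mean_gap_before:.2e}, after: {mean_gap_after:.2e}")
```

Note that the corrected values no longer sit on the original quantization grid; how such a correction would be folded into a per-channel scale or bias term is left out of this sketch.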

1) The paper is clean, focused, and novel. It is directly applicable to various areas of research and reflects the current trend in quantized neural networks well.
2) Can you explain why ACIQ is less effective on InceptionV3 than on ResNet-50 or ResNet-101? Is it related to the distribution assumption?
3) For clarity, it would be better to specify whether the quantization is "signed" or "unsigned". When quantizing activations after ReLU, one would use the range [0, a] with "unsigned" 8/4-bit quantization. Is "round to midpoint" still valid in this case?
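For reference, the unsigned case raised in point 3 could look like the following sketch. It assumes post-ReLU activations are clipped to [0, a] and split into 2^bits equal bins, each represented by its midpoint. The threshold `a`, the synthetic data, and the function name are illustrative assumptions, not values or APIs from the paper.

```python
import numpy as np

def quantize_unsigned_midpoint(x, a, bits):
    """Clip x to [0, a] and return (integer codes, midpoint reconstructions)
    for a uniform unsigned quantizer with 2**bits bins."""
    levels = 2 ** bits
    step = a / levels
    clipped = np.clip(x, 0.0, a)
    # Codes lie in [0, levels - 1]; the value a itself falls into the last bin.
    codes = np.minimum(np.floor(clipped / step), levels - 1).astype(np.int64)
    return codes, (codes + 0.5) * step

# Synthetic post-ReLU activations (assumption: rectified standard normal).
rng = np.random.default_rng(0)
acts = np.maximum(rng.normal(0.0, 1.0, size=10_000), 0.0)

a, bits = 3.0, 4
step = a / 2 ** bits
codes, deq = quantize_unsigned_midpoint(acts, a, bits)

in_range = acts <= a
max_err = float(np.max(np.abs(acts[in_range] - deq[in_range])))
print(f"codes in [{codes.min()}, {codes.max()}], max in-range error: {max_err:.4f}")
```

In this sketch, exact zeros (which are common after ReLU) are reconstructed as `step / 2` rather than 0, which is one reason the question about midpoint rounding in the unsigned setting matters.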