NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
This paper presents solid theoretical work that is well written and clear, at least to the extent that is possible for such a dense topic. It addresses relevant questions about the generalization error of random projection combined with quantization and provides useful insights after each result. The choice of simple but meaningful models, for which a broad set of people are likely to have decent intuition, also makes the work more accessible. The paper is not without its shortcomings, however, most of which I cover in Section 5 below. That being said, I struggle with one of the key assumptions in the paper, namely that we can normalize all the data to the unit circle. The authors' justification is that it is a “standard preprocessing step for many learning algorithms”. This has certainly not been my personal experience, as opposed to, say, mean/variance or min/max normalization (a minimal sketch of the three transforms follows this review). For high-dimensional data I can conceive of this transformation being a reasonable choice, but I definitely feel it requires more than a passing mention: it needs either established references or a separate analysis, potentially empirical, of how it affects performance.
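For reference, a minimal sketch of the three preprocessing steps contrasted above, in generic notation not taken from the paper (\mu_j, \sigma_j, m_j, M_j are per-feature statistics computed on the training set):

\[
\text{unit-norm:}\ \ x \mapsto \frac{x}{\|x\|_2}, \qquad
\text{mean/variance:}\ \ x_j \mapsto \frac{x_j - \mu_j}{\sigma_j}, \qquad
\text{min/max:}\ \ x_j \mapsto \frac{x_j - m_j}{M_j - m_j}.
\]

Only the first maps every sample onto the unit sphere; the other two rescale features without discarding the norms of individual samples.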
Reviewer 2
Overall it is an interesting paper, with nice results. However, I am not as certain about the relevance of the results. It is not clear that the three methods considered are often used in their compressive version combined with quantization. One example I have seen is: [*] M. Li, S. Rane, and P. T. Boufounos, “Quantized embeddings of scale-invariant image features for mobile augmented reality,” IEEE 14th International Workshop on Multimedia Signal Processing (MMSP), Banff, Canada, Sept. 17-19, 2012, and subsequent work by the same authors. Still, the literature is scarce for such applications. It would be interesting if the authors could provide a bit more motivation.

With respect to the results, I think that overall this is a good start in this direction. One thing missing from the results is the link between the quantization interval \Delta and the number of bits used. I can always reduce \Delta and improve the error bounds, but then I need to use more bits per measurement. Furthermore, when it comes to compressive methods, there is also the tradeoff between the number of measurements and the size of \Delta, which affects the total rate (a minimal sketch of this accounting follows this review). Some discussion towards addressing these questions would be valuable.

====

I've seen and taken into account the authors' response, and it does not change my score.
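For reference, a minimal sketch of the rate accounting referred to above, in generic notation not taken from the paper (m measurements, a uniform quantizer with interval \Delta covering a range of width B):

\[
b \approx \log_2\!\left(\frac{B}{\Delta}\right) \ \text{bits per measurement},
\qquad
R \approx m\, b \approx m \log_2\!\left(\frac{B}{\Delta}\right) \ \text{bits in total},
\]

so shrinking \Delta tightens the bounds only at the cost of more bits per measurement, and at a fixed total rate R one trades the number of measurements m against the quantizer resolution \Delta.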
Reviewer 3
In general, I think the quality of this paper is good and it is written in a clear form. I think the content of this paper meets the standard of the conference. However, I have a concern about the title of the paper: is it appropriate to use the term "generalization error"? From my understanding, generalization error refers to the difference between the error on the test set and the error on the training set, given the same parameters (see the sketch at the end of this review). The error bound in Theorem 5 exactly captures this generalization error. However, I'm not sure whether the other theorems refer to generalization error.

(1) In Theorems 2 and 3, the lower-case (x, y) refers to test data. The LHS is the test error with expectation over (x, y). The first term on the RHS does not contain an expectation over (x, y), which is strange; I think it should be just L(h^*), please check. If it is L(h^*), then it is the Bayes risk, not the training error. So I think Theorems 2 and 3 cannot be called "generalization" error bounds; they should be called "compression error" and "quantization and compression error" bounds.

(2) In Theorem 6, similarly, both terms on the LHS refer to errors on test data, so it should not be called "generalization error". Also, L_Q(\hat{\beta}^*_Q) is already expected over Y|R, so there is no need to take the expectation over Y and R.

I'm not disputing the significance of these bounds, but I think there should be a more principled way to make the connections among these three bounds, rather than using the term "generalization error".

Minor suggestion: I wish to see more discussion of previous work on quantized compressive learning. The authors provide some references in the introduction, but I wish to see more detailed comparisons between the results of the previous work and the current work. This would make the main contribution of this work clearer.

==============================================================

I'm happy that the authors have answered my question well by providing a good reference. I also learned something new from their feedback. I would like to increase my score from 6 to 7.
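For reference, the standard definitions invoked above, in generic notation not taken from the paper (\hat{h} is the hypothesis learned from the training sample, h^* the Bayes-optimal predictor, L the expected loss over the data distribution, and \hat{L}_n the empirical loss on n training samples):

\[
\text{generalization error (gap):}\ \ L(\hat{h}) - \hat{L}_n(\hat{h}),
\qquad
\text{Bayes risk:}\ \ L(h^*) = \inf_{h} L(h).
\]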