NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Reviewer 1
The main advance of this manuscript is the reduced sampling complexity and computational cost of the proposed algorithms ExactSHT and RobustSHT, together with theoretical guarantees. These advances are illustrated on a small-scale toy problem; see the figure on page 8. One deficiency of the manuscript is the very limited scope of the numerical experiments, to the extent that important aspects such as the probability of recovery and the differing performance of ExactSHT and RobustSHT are not reported. The main manuscript includes a pair of plots with a short discussion, and the supplementary material includes a modestly longer experimental section; see for instance Table 1 on page 23. The advances are important, more so than many results in compressed sensing these days, as CS is a well-developed topic. The results are presented clearly, but the organisation is not ideal at present, since the sampling complexity is not shown in the main manuscript. The authors are encouraged to move Table 1 of the supplementary material to the main manuscript, as it is especially important; alternatively, the sampling complexity could be presented within the current plots using a right-hand axis.
Reviewer 2
I do not feel qualified enough to review the technical aspects of this submission, but reading the paper makes me slightly uneasy.
- The very first sentence of the introduction is almost verbatim from another paper (the first sentence of the abstract of [11]).
- Some clear overselling: "a natural and very important class to consider" (based on one paper about hyper-parameter tuning for deep nets) seems like a tenuous motivation.
- Specifying the base of the logarithm in every O(.)-type result seems strange and does not really inspire confidence ("log_2" everywhere inside big-O notation).
- Theorem 2 has no approximation parameter. I do not see how, given query access to a function x, one can output the *exact* values of the non-zero coefficients of \hat{x} in a sublinear number of queries (which is what "problem (1)" asks).
- There is no comparison of the results with what is obtained in [8], which gives results on learning Fourier-sparse functions.
While I may very well be wrong, the above points make me believe this submission should be rejected.

UPDATE: In light of the other reviews and the authors' response, I have updated my score to better reflect the importance of the problem solved and the authors' response to the points made above (also, I would suggest including, in some form, part of answer (2) in the paper itself). However, I would very much like to point out again that, regardless of the results, verbatim lifting of other papers' writing with no acknowledgment is not OK (even in the case of self-plagiarism, a point that is anyway moot here given the anonymity). This played a significant part in my original assessment.
Reviewer 3
The results are clever and nice. They are somewhat similar to the results in "New Results for Learning Noisy Parities and Halfspaces" (Theorem 1 of that paper, where large Fourier coefficients are isolated). Here one must be very careful with the hashing scheme to obtain the optimal bounds.