NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Reviewer 1
Quality: - Overall the paper is high-quality. - Some lingering questions I have: -- In the experiments, it appears that the algorithms have oracle knowledge of the exact sparsity k. Do these methods work without oracle knowledge of k? It appears that this oracle knowledge of k is used directly in setting measurement parameters, such as the entries of M1, which is unrealistic since the true value of k is rarely known a priori. -- How does this approach perform in the presence of noise? Can the authors run more experiments with noise on the 1-bit measurements? -- Why do the authors only compare against “all Gaussian” measurements? Surely there must be other methods that take measurements in two-stages. In particular the authors reference the approaches in (Acharya et al. 2017) and (Gopi et al. 2013) several times but do not run experimental comparisons against these methods. Originality: - The idea of taking measurements in two stages in compressed sensing (to recover the support, then approximate the signal values on this support) is not new. However, the idea of recovering a superset of the support rather than the exact support appears to be novel. - The connections to Group Testing are interesting and appear to be novel as far as I know. Significance: - Tighter upper bounds than existing results, along with explicit measurement construction methods, are provided for the important settings of non-negative signals (such as images) and binary signals. Even if the idea of taking compressed sensing measurements in two stages is not new, these improvements are important for pushing forward 1bCS research. Clarity: - The paper is mostly clear. - The authors did a good job of juxtaposing their results with previous open questions in the literature - Table 1 clearly highlights the novel contributions -------------------------- Update after rebuttal. The authors comments were satisfactory, as my comments were mostly suggestions for improvement. They didn't mention the points of unclarity in their writing, but I do hope that they will make appropriate corrections on revision.
Reviewer 2
Overall an interesting paper. However, there are several issues that I would recommend the authors address. Most of the issues revolve around the robustness of their approach. In particular, there proposed approach critically depends on the support superset recovery, in order to proceed to the next step of signal recovery. Misidentification of the support will result to incorrect recovery. Furthermore, the larger the identified support the more measurements will be required to fully reconstruct the signal based on a larger support. It is not clear how group testing behaves in the presence of measurement noise. Furthermore, quite often practical signals are not exactly sparse. i.e., the support might be the whole signal even though the signal is sparse because the coefficients decay very fast. Thus, some are approximately, but not exactly, sparse (often known as compressible). Images are such an example. Is group testing robust to this? Will the support returned represent the largest coefficients? If the group testing fails, so will the rest of the process. Another issue is that, it seems, the process requires knowledge of k, to design the group testing. This might often not be available, (admittedly, also a problem with some conventional algorithms, such as the BIHT). However, conventional algorithms seem to be more robust to lack of knowledge of this parameter than group testing ones. Is that an issue? In addition, the experiments are not as convincing as I would expect. The improvements are modest, if any, to justify the extra complications of the two-step process. Also, I would recommend the authors include a line (i.e., a third experiment) in which the superset method matrix is used whole with BIHT (i.e., without the support recovery step first). It might be that the matrix they designed is actually better than a fully Gaussian matrix, even when used with conventional methods. ===== I've seen and taken into account the author's response, and it does not change my score.
Reviewer 3
This is a generally well-written paper with a nice set of results. A potential weakness is that some of the main results come across as rather simple combinations of existing ideas/results, but on the other hand the simplicity can also be viewed as a strength. I don’t find the Experiments section essential, and would have been equally happy to have this as a purely theory paper. But the experiments don’t hurt either. My remaining comments are mostly quite minor – I will put a * next to those where I prefer a response, and any other responses are optional: [*] p2: Please justify the claim “optimal number of measurements” - in particular highlighting the k*log(n/k) + 1/eps lower bound from [1] and adding it to Table 1. As far as I know, it is an open problem as to whether the k^{3/2} term is unavoidable in the binary setting - is this correct? (If not, again please include a citation and add to Table 1) - p2: epsilon is used without being defined (and also the phrase “approximate recovery”) - p4: Avoid the uses of the word “necessary”, since these are only sufficient conditions. Similarly, in Lemma 3 the statement “provided that” is strictly speaking incorrect (e.g., m = 0 satisfies the statement given). - The proof of Lemma 1 is a bit confusing, and could be re-worded. - p6: The terminology “rate”, “relative distance”, and notation H_q(delta) should not be assumed familiar for a NeurIPS audience. - I think the proof of Theorem 10 should be revised. Please give brief explanations for the steps (e.g., the step after qd = (…) follows by re-arranging the choice of n, etc.) [*] In fact, I couldn’t quite follow the last step – substituting q=O(k/alpha) is clear, but why is the denominator also proportional to alpha/k? (A definition of H_q would have helped here) - Lemma 12: Please emphasize that m is known but x is not – this seems crucial. - For the authors’ interest, there are some more recent refined bounds on the “for-each” setting such as “Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework” and “Sparse Classification: A Scalable Discrete Optimization Perspective”, though since the emphasis of this paper is on the “for-all” setting, mentioning these is not essential. Very minor comments: - No need for capitalization in “Group Testing” - Give a citation when group testing first mentioned on p3 - p3: Remove the word “typical” from “the typical group testing measurement”, I think it only increases ambiguity/confusion. - Lemma 1: Is “\cdot” an inner product? Please make it clear. Also, should it be mx or m^T x inside the sign(.)? - Theorem 8: Rename delta to beta to avoid inconsistency with delta in Theorem 7. Also, is a “for all d” statement needed? - Just before Section 4.2, perhaps re-iterate that the constructions for [1] were non-explicit (hence highlighting the value of Theorem 10). - p7: “very low probability” -> “zero probability” - p7: “This connection was known previously” -> Add citation - p10: Please give a citation for Pr[sign = sign] = (… cos^-1 formula …). === POST-REVIEW COMMENTS: The responses were all as I had assumed them to be when stating my previous score, so naturally my score is unchanged. Overall a good paper, with the main limitation probably being the level of novelty.