Reviews: A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution

The paper is very clearly written, the litterature review and introduction are adequate the derivations presented in the main paper are sound and the experiments are insightful and relevant. I have the following comments and remarks: - The proposed use of a Huber loss is in fact only there to simplify the theoretical analysis compared to using a classical L1 loss. The current presentation tends to suggest that the Huber loss may bring some specific algorithmic advantage, but it is not the case since the L1 norm (which is the limiting case of Huber as \mu goes to 0) always performs better or equivalently to Huber experimentally. This fact could be made more explicit by the authors in the introduction and problem formulation. A natural question is then: what would happen when using L1 minimization without LP rounding? Finally, it would be nice to discuss the relationship between the proposed approach and classical cross-relation methods penalized by an L1 term (e.g., Y Lin, J Chen, Y Kim, DD Lee (Neurips 2008), Benichoux, Vincent Gribonval (2013)). - Line 120: "the nonconvex relaxation (5).." : the authors probably mean (4). However, while it is clear that solving (3) lead to exact recovery using the formula under line 121, it is less clear that it is also true for solving the convex relaxation (4). This statement should be justified. - line 157: the definition below is not strictly speaking a partition, for any \xi>0. - Below line 157: the definition is not valid for ||q_{-1}||_\infty=0, i.e., for basis vectors: the authors should explicitely mention whether or not they are part of the set. - Line 160: "..they are closer to e_i.." : in what norm? - The remark on the top of page 6 about the assumption that RQ^{-1} ~ I is not mathematically rigorous and makes one wonder how valid this approximation really is and how it affects all the propositions, which are only stated within this regime. More details would be welcome. - Proposition 3.4. : I guess the assumptions \epsilon>1/5log(n) is needed for the proposition to hold. -Footnote (11): the authors state that they use a Riemannian subgradient method similar to (15) for the L1 loss, but it is not clear how this can be done. -Some implementation details are missing for the experiments, e.g., what values are chosen for \tau, \tau^(k) and \beta. -The paper lacks a short conclusion outlining perspectives and future research directions. - Unfortunately, I could not thoroughly check the 33 pages of involved mathematical proof in the supplementary material. I wonder whether it is reasonnable to expect NeurIPS reviewers to thoroughly check such large proofs, given that many reviewers have many other papers to review in a relatively short amount of time (in contrast, reviewing a long journal article may take several months). Typos: - L99: similar -> similarly - footnote 6: of the standard Huber function - L105: a transpose sign is missing, i.e., C_a^TC_a - L113: amendable for -> amenable to - L117: In contrast, the sphere is ... - L130: by using a random initialization, provably recovers... - L133: only requires the kernel a to be invertible - L150 the problem (5) -> the problem (4) - L197: in stating -> to state - Eq. (14): u should be a q instead - Footnote 9: an equivalent problem to (14) - L214: in general subgradient method -> a general subgradient method - L217: we prove that \Zeta - L218: in the sense that - L219: we use the matrix-vector form of convolutions - L240: producing -> produces - L242: For fix n=500 and p=50 - L248: we repeat the simulation 15 times. - There are additional misuses of articles throughout the paper, please proofread.

Reviewer 2

Originality: This work gives a thorough overview of the state of the art for the problem at hand. It combines the technique of optimization of Riemannian manifolds together with the Huber loss. Quality: The quality of the paper is high. Content is well organized and all the theoretical claims contain a proof provided in the supplementary material. The reviewer was not able to evaluate the correctness of all of the proofs. If the source code was provided, it would have been easier to get a sense of the performance of the proposed method. Clarity: Narrative is clear. Content is organized well. Significance: The solved problem is of great importance. Sparse blind deconvolution is still an open research question. Most the the papers in the domain still balance the performance trade-offs and costs in the terms of number of required samples.

Reviewer 3

Overall the presented result is timely and novel by providing a significant improvement over existing works. I have major concerns with the presentation and discussion on main results and missing references. 1. First of all, it is really important to interpret the meaning of the main result in the context of the problem. The sample complexity is given in terms of the number of channels p, which scales as a polynomial in the length of signal n. In practical applications, we do not expect a very large number for p. Is it really necessary to have p increasing in n? If one recalls classical results on multichannel blind deconvolution with FIR models, two channels suffice. How is the sparsity case fundamentally different from the FIR case? Although this question also applies to previous work, I think this fundamental question is still legitimate and worth a good discussion. Even in Figures 2 and 3, p is very small number compared to n. This is a bit inconsistent with theory. 2. The presentation of the algorithm needs some clarification. The matrix Q is a function of unknown a and hence is not available. In (15), does one need to know Q to perform the last step? I guess it is just for the sake of the analysis. This requires a clarification. 3. In the simulation, the author(s) could have considered illustrating how the reconstruction improves as one applies random initialization, RGD, and LP rounding in a progressive way. This can justify particularly the value of the last step. 4. There are several places where the statements are not clear enough. - eq (2) is hard to parse. - The decomposition of a circulant matrix requires a scale factor when F is unitary. Since the authors used the adjoint in the decomposition, it looks like they considered the unitary F. - I did not follow the sentence 92. How does the nonsmoothness make the analysis difficult? - The comparison in Figure 1 can be explained with more details. How can one see that the landscape is better with Huber loss? - line 132: why is it trivial when theta is less than 1/n? - line 157: The union does not cover the entire space. It does if xi converges to 0. Can it still be a valid partition? - The introduction of (10) might be just for the sake of the proofs. Results are directly stated on (9). The authors might want to reorganize the contents to improve the readability. 5. The following paper considered a similar problem and it will be useful to compare the result to it. Augustin Cosse, A note on the blind deconvolution of multiple sparse signals from unknown subspaces

Paper ID:	2215
Title:	A Nonconvex Approach for Exact and Efficient Multichannel Sparse Blind Deconvolution

Reviewer 1

Reviewer 2

Reviewer 3