NIPS 2017
Mon Dec 4th through Sat the 9th, 2017 at Long Beach Convention Center
Paper ID: 3092 Convolutional Phase Retrieval

### Reviewer 1

The authors of the article consider phase retrieval problems where the measurements have a specific form: they are the convolution of the n-dimensional input signal x with a random filter of length m. The authors show that, provided that m is larger than O(n), up to log factors, and up to a term that is related with the sparsity of x in the Fourier domain, a simple non-convex algorithm can solve such problems with high probability. The proof is quite technical. Indeed, the measurement vectors are highly correlated with one another. To overcome this difficulty, the authors use "decoupling techniques", together with results from [Rauhut, 2010] and [Krahmer, Mendelson, Rauhut, 2014]. I liked this article. I had not realized it was so complex to prove the convergence of a simple non-convex phase retrieval method in a very structured case, and, in my opinion, it is good that it has been done. The result by itself is interesting, and the proof techniques developed by the authors seem general enough to me so that they can be reused in other settings. As previously said, the proof is quite technical, and long. There are 38 pages in the appendix. I tried to read as much of it as possible, but, given the time constraints of the review process, I certainly did not have the time to deeply understand everything. However, what I read seems both clear and solid. The organization is also good. However, maybe the authors could discuss in more detail the case of coded diffraction patterns ([CLS15b]). Indeed, the case of phase retrieval with coded diffraction patterns is, at least at first sight, very similar to the case considered in the article: seen in the Fourier domain, coded diffraction patterns consist in convolving the input signal with (m/n) random filters of length n, while, in convolutional phase retrieval, the input signal is convolved with a single random filter of length m. I would appreciate the authors to compare their result and their proof techniques with the ones obtained for coded diffraction patterns. Minor remarks and typos: - Before Equation (4): add "that"? - l.95 (and also in the "basic notations" paragraph of the appendix): ", because" should be ". Because". - Paragraph 3.1: I did not know the name "alternating direction method". I think that "alternating projection method" is more common. - Equation (12), and the previous two: "i\phi" is missing in the exponential function. - l.180: ". Controlling" should be ", controlling". - l. 209: "are not" should be "is not". - Equation (19) (and the one after Equation (46) in the appendix): it took me a while to understand why this equation was true. Maybe it could be proved? - l.220: "this control obtains" should be "we obtain this control". - Equations (21) and (23): \mathcal{Q} should be \hat\mathcal{L}. - The paragraph starting at line 244 seems a bit redundant with the explanations around Equation (22). - Experiments: the main theorem says that, when x is proportional to the all-one vector, the initialization step still works, but the local descent does not. It would be nice to add an experiment that checks whether this is really what happens in numerical simulations. - [CC15] has been published. - The accent on "Candès" is missing in some of the bibliography entries. - Appendix, l.58: "we proof" should be "we prove". - Proposition 2.2: some "that" and "such that" could be added. - l.92: "be" should be "is". - l.150: I do not understand "for \gamma_2 functional of the set \mathcal{A}". - Lemma 3.11: the last equation contains a \sqrt{\delta} while, in the statement, it is a \delta (no square root). - l.202: I think that a "1/m" is missing in front of the expectation. - l.217: "for w" should be "for all w". - l.220: "define" should be "be defined". - l.222: "define" should be "defined". - l.224: "is constant" should be "is a constant". - Equation after line 226: I think that there should be no parentheses inside the sup. - Theorem 4.3: "are" should be "be". - l.268: a word is missing before "that". - Second line of the equation after line 278: the upper bound in the second integral should depend on d, not v. - l. 279: "where the" should be "where in the", and "Choose" should be "choosing". - Equation after line 292: I do not understand why the first inequality is true. - l. 297: "so that" should be removed. - l.305: "is complex" should be "is a complex". - Equation just before line 326: there should be parentheses aroung the logarithms. - Lemma 5.3: I do not understand the meaning of the first inequality: what is the difference between the first and the second term? - l.466: "is close" should be "are close". - l.481: "are defined" should be "is defined". - l.484 and 486: "first" and "second" have been exchanged. - I think that "we have [Equation] holds" (used multiple times) could be replaced with "we have that [Equation] holds" or simply "[Equation] holds". - Equation after line 492: it could be said that it is a consequence of Lemma 6.2. - Lemma 6.5, main equation: a multiplicative constant is missing before \delta. - Equation after l.510, sixth line and after, until the end of the proof: the \leq sign should be a \geq sign. - Equation after l.513: why w^{(r)}? Also, I think that there should maybe be close to 1 multiplicative constants before the last two terms of the sum. - Equation after l.551: I do not understand the presence of ||x||. - Lemma 6.3 and lemma 8.1 are not perfectly identical (there is a 2 in 6.3 that becomes a 2+\epsilon in 8.1). - References: the name of the authors is missing in reference [led07].

### Reviewer 2

This paper considers the phase retrieval problem where the measurement strategy is as follows: convolve the hidden vector z with a random pattern a and observe the amplitude of it. Similar problems have been considered and analyzed when measurements arise from inner products with i.i.d. random Gaussian vectors (see Candes et al. Wirtinger flow as well as convex optimization based papers). This paper is a technical and the main contribution is the fact that authors analyze an algorithm which is a) highly nonconvex (and runtime efficient algorithm). Additionally, the algorithm does not require "resampling" which means they can use the same samples for their iterations. This makes the problem more challenging and algorithm more realistic b) Structured measurements that can be diagonalized by Fourier transform and that are more realistic than Gaussian samples. I believe the paper will be a nice contribution to NIPS however authors can do a better job motivating the problem they analyze. While Figure 1 and 2 is informative, I don't see why Figure 3 is real data. They might want to find an application where convolutional measurements are directly relevant (in a similar fashion to works in coded diffraction patterns). This would increase the practical motivation of the paper. They also mention their techniques can be useful in other problems where convolutional measurements exist. They should consider elaborating more on it. Finally, can the same analysis address the Fourier measurements (where signal is again zero-padded) instead of convolutional. Since convolutional measurements are diagonalized by Fourier transform, they are already fairly related but it is not clear if the same ideas apply or not. Also they should cite related works such as Non-convex phase retrieval from STFT measurements and mention how their techniques compare.

### Reviewer 3

This paper studies the convolutional phase retrieval problem. It can be transformed into standard form for phase retrieval, where the design matrix is partial random circulant. This is a different from most previous work where the design is sub-gaussian. The authors propose to solve a weighted version of the objective for Reshaped WF. Spectral initialization + gradient descent are used to minimize it. This is a very technical paper, so my questions mainly focuses on this part. I would kindly ask the authors to elaborate the following: 1. I like the setup where rows of A (design) are dependent. Yet from my understanding, the partial random circulant design still satisfies RIP (Thm 3.10). If you use the normal fourth order objective (maybe with or without weights), would it be possible that the result from Procrustes Flow [1] can be applied here? 2. The proof follows the idea of recent analysis for alternating minimization. The Reshaped WF paper tries to prove local regularity condition. Out of my curiosity, why do the authors choose the AltMin route? Would it be easier if you try to prove the latter? It is really nontrivial to use decoupling to bound the phase term Tau_2 in line 177. How does your proof technique relate to geometry, intuitively? 3. By Proposition 2.2, we can see the basin of contraction is shrinking in higher dimension. Is the log factor some artifact from the analysis? I think the rate is acceptable, viewing that the radius of unit n-ball decreases approximately as O(n^{1/2 + 1/2n}). Thanks! [1] Low-rank Solutions of Linear Matrix Equations via Procrustes Flow