Paper ID: 317

Title: Solving Random Systems of Quadratic Equations via Truncated Generalized Gradient Flow

The paper proposes a novel algorithm for solving random systems of quadratic equations, an important problem in signal processing (which belongs to the class of phase retrieval problems). As in previously proposed algorithms (e.g., the truncated Wirtinger flow algorithm), the algorithm consists of two stages: an initialization procedure, followed by iterative refinement of the solution. In contrast to the (truncated) spectral initialization procedures used in previous works, the authors propose a novel orthogonality-promoting initialization procedure, avoiding the known shortcoming of spectral methods, which rely on heavy-tailed distributed quantities. An empirical study clearly shows the advantage of this new initialization procedure over previous ones. Given the initial approximation, the paper then proposes an iterative truncated-generalized-gradient algorithm that minimizes a non-convex and non-smooth amplitude-based cost function (instead of an intensity-based one) to further refine the solution. Assuming a Gaussian design, the authors provide a rigorous analysis of its convergence to the correct solution at an exponential rate, given a sufficiently accurate initial approximation. As far as I understand, the proof relies on a novel observation: a careful truncation of the generalized gradient components prevents the algorithm from being pulled in spurious directions during the search. The empirical study shows that the proposed algorithm is superior to state-of-the-art algorithms and significantly reduces the number of equations needed to solve the problem, thus considerably narrowing the gap toward the information-theoretic number of equations provably required.
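To make the refinement stage concrete, here is a minimal sketch of one truncated generalized gradient step on the amplitude-based loss l(z) = (1/2m) sum_i (|a_i^T z| - psi_i)^2, where psi_i = |a_i^T x| are the measured amplitudes. The step size `mu`, the parameter `gamma`, and the exact form of the truncation rule are illustrative placeholders, not the paper's tuned values:

```python
import numpy as np

def tggf_step(z, A, psi, mu=0.6, gamma=0.7):
    """One truncated generalized gradient step on the amplitude-based loss
    l(z) = (1/2m) * sum_i (|a_i^T z| - psi_i)^2.

    Gradient components whose |a_i^T z| is small relative to psi_i are
    dropped: for those, sign(a_i^T z) is unreliable and would pull the
    iterate in a spurious direction.
    """
    m = A.shape[0]
    Az = A @ z
    # illustrative truncation rule: keep only "trustworthy" components
    keep = np.abs(Az) >= psi / (1.0 + gamma)
    grad = A[keep].T @ (Az[keep] - psi[keep] * np.sign(Az[keep])) / m
    return z - mu * grad
```

Note that the true signal is a fixed point: at z = x we have |a_i^T z| = psi_i, so every retained gradient component vanishes.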

The paper is well written and easy to follow. Although I was not familiar with the problem before, the background and the connections to previous results put me right on track for understanding the motivation behind the proposed algorithm. The analysis and the state-of-the-art empirical results are quite convincing, and as an outsider to phase retrieval problems (counting on the authors' integrity) I believe that the results given in the paper will play a significant role in future work in the field. Some minor issues:
- In line 133, should it be "for sufficiently small step size"?
- In line 152, is z missing the subscript t?
- In equation 9, can you please define h explicitly, as it is quite confusing.

2-Confident (read it all; understood it all reasonably well)

This paper proposes a new algorithm, called truncated generalized gradient flow (TGGF), to solve random systems of quadratic equations. The algorithm starts with an orthogonality-promoting initialization and then successively refines the iterate via scaled gradient updates, in contrast to the existing literature, where spectral initializations are used.

I found the paper in general well written and the results solid. The main result, Theorem 1, assumes noiseless measurements. Can this be generalized to the case of noisy measurements?

1-Less confident (might not have understood significant parts)

The authors propose a novel algorithm, called Truncated Generalized Gradient Flow, to solve a system of quadratic equations, an NP-hard problem. The authors show how their method outperforms existing ones in terms of quality of the result, reaching the exact solution with high probability even when the ratio between the number of equations and the number of unknowns approaches the minimum theoretical limit (which is not the case for the other existing methods mentioned in the manuscript). Moreover, the proposed solution keeps the computational complexity linear in the time required to read the data.

The paper is very well written and structured. The problem is clearly stated, and the proposed method represents an improvement over the other ones mentioned in the paper, which is clearly demonstrated by the extensive comparisons between the methods carried out throughout the entire paper. The introduction is good; however, I would advise completing it by briefly mentioning the performance of existing methods and that of your technique (also, lines 89-93 could be moved to the introduction, as well as lines 106-107). Finally, it would be a good addition to the paper to include a real-data example of a typical problem where solving such a system of equations is fundamental. This would clearly set your method apart from existing ones.

2-Confident (read it all; understood it all reasonably well)

The problem of recovering a vector from quadratic measurements (a.k.a. phase retrieval) is studied. This paper follows the now-standard recipe of formulating a (non-convex) optimization problem, which is solved by a gradient-descent-like method with suitable initialization. However, key changes are made to both the initialization and the gradient-descent stage: 1) As in existing works, the initialization is obtained as an eigenvector of a matrix. However, the matrix proposed in this paper is quite different from that used in existing methods. The improved sample complexity due to this change is demonstrated clearly in the numerical experiments (Figs. 3 and 4). 2) The refinement post-initialization is essentially gradient descent, but ignoring a few undesirable data points at each iteration. The novelty here compared to existing works is that both the objective function and the truncation rule are new. The improved sample complexity due to this change is depicted in Fig. 1. The overall advantages of the new method in terms of sample complexity are depicted in Fig. 5. The computational complexity is not significantly better than that of existing methods.
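The initialization idea mentioned in point 1) can be sketched as follows: rows a_i with small |a_i^T x| / ||a_i|| are nearly orthogonal to the signal, so a direction least correlated with those rows is a reasonable initial guess. In this illustrative sketch (the selection fraction `frac` and the exact matrix construction are assumptions, not the paper's precise recipe), that direction is the eigenvector of the smallest eigenvalue:

```python
import numpy as np

def orth_promoting_init(A, psi, frac=1/6):
    """Orthogonality-promoting initialization (illustrative sketch).

    psi_i = |a_i^T x| small relative to ||a_i|| means a_i is nearly
    orthogonal to x.  Collect the frac*m most-orthogonal rows and take
    the direction least correlated with them: the eigenvector of the
    smallest eigenvalue of sum_{i in I} a_i a_i^T / ||a_i||^2.
    """
    m, n = A.shape
    row_norms = np.linalg.norm(A, axis=1)
    idx = np.argsort(psi / row_norms)[: int(frac * m)]  # most-orthogonal rows
    An = A[idx] / row_norms[idx, None]
    w, V = np.linalg.eigh(An.T @ An)     # eigenvalues in ascending order
    z0 = V[:, 0]                         # direction least correlated with I
    scale = np.sqrt(np.mean(psi ** 2))   # E|a^T x|^2 = ||x||^2 for Gaussian a
    return scale * z0
```

In contrast to spectral initializations, no heavy-tailed quantity such as |a_i^T x|^2 enters the matrix being diagonalized.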

1) Figure 1 provides a comparison of different algorithms which are all initialized using the truncated spectral initialization. Here, the success rate of TWF is significantly worse than that of TGGF. It would be very interesting to see what happens if the initialization is replaced by the new one proposed in this paper. Is the success rate of TWF still worse than TGGF's? If not, this would tell us that the improvement of TGGF can essentially be attributed to the improved initialization. If yes, it would strengthen the paper by giving evidence that the improvement is due to the second stage.

2) The TWF paper also provides some experiments using structured sensing vectors (coded diffraction patterns). How does TGGF perform if the sensing vectors are structured and not random?

3) While the empirical results in the noisy case are very good, theoretical results for the noisy case are missing. This is not a major concern about this paper, because the empirical results for both the noiseless and noisy cases, and the theoretical results for the noiseless case, are already good.

4) For the noisy case, the measurement model is considered to be |a_i^T x + eta_i| instead of |a_i^T x|^2 + eta_i as in some earlier papers. Also, eta_i is assumed to be Gaussian, not Poisson or arbitrary. Since the paper does not provide theoretical results for the noisy case anyway, can numerical experiments be provided for the case of Poisson noise, which is a good model for imaging applications?

5) Is the improvement of TGGF over TWF purely in terms of sample complexity? Say n = 1000 and m = 8000: how does the rate of convergence of TGGF compare with that of TWF?
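The two noise models contrasted in point 4) can be written down concretely. The sketch below (dimensions, noise level, and the Gaussian design are arbitrary choices for illustration) generates data under the amplitude-domain Gaussian model used in this paper and under the intensity-domain Poisson model common in earlier imaging-oriented work:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 800
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))

# Model in this paper: Gaussian noise enters *before* the modulus.
sigma = 0.1
psi_amp = np.abs(A @ x + sigma * rng.standard_normal(m))

# Model in some earlier papers (e.g., TWF): noise added to the
# *intensity*; Poisson counts are the usual stand-in for photon-limited
# imaging, with the intensity |a_i^T x|^2 as the rate parameter.
intensity = (A @ x) ** 2
y_poisson = rng.poisson(intensity)
```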

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

The paper proposes an almost-everywhere-differentiable generalization of the subgradient for solving an over-determined system of quadratic equations with random coefficients that is known to be inconsistent. The cost function is shown to belong to the class of nonconvex and nonsmooth functions. The authors exploit orthogonality to identify a suitable initialization, which is then refined by generalized gradient iterations.

The paper is impressive in its structure, content, and presentation. The algorithm is well justified and the results are good. However, it is recommended that the authors look into the 4n-2 claim for the complex case on page 2. Also, reference [22] is missing the name(s) of the author(s).

2-Confident (read it all; understood it all reasonably well)

The authors provide an algorithm for solving a random system of quadratic equations using truncated gradient updates. The method minimizes a least-squares criterion based on the amplitude-based empirical loss. The truncation differs from that of Chen and Candès (2015) in that large-sized gradient components are not thrown away. The authors theoretically analyze the noiseless case and support their analysis via simulations.

- Definition 1 (generalized gradient) is not referred to in the main text. Where is this definition used, and why is it needed?
- In line 145, do you mean "sufficient" instead of "necessary"?
- Equation 9 could be presented much better, so that the definition of h is clear to the reader without too much effort. This is the first time h appears, and it is not explicitly defined anywhere.
- In line 154, the authors invoke the strong law of large numbers, but both fixed points x and z^* depend on the data, hence there is dependence among the summands. Also, why is h a good direction? Please elaborate on this.
- In line 164, how much does the statement rely on the Gaussian data assumption?
- The authors do not discuss the noisy case in Section 3. Can the main theorem be generalized to the noisy case, or is there a technical challenge? In any case, a comment about this would be helpful for future research.
- Does the proposed algorithm work on datasets whose distributions are far from Gaussian? A comment is necessary.
- The authors should test the algorithm on real datasets to validate its performance, e.g., the image recovery example in the TWF paper.

2-Confident (read it all; understood it all reasonably well)