NeurIPS 2020

### Review 1

Summary and Contributions: This paper proposes a semi-explicit expression for the expected value of a random projection matrix involved in sketching. More precisely, if S is the sketch matrix and A the data matrix, then in order to investigate the error of sketching one needs to control the residual projection P^⊥ = I - P, where P is the orthogonal projection onto the subspace spanned by the rows of SA. The main result of the paper is that E[P^⊥] = (γ A^⊤ A + I)^{-1} up to error terms depending on the stable rank of A, where γ is the solution of a certain fixed-point equation. Applications of this result are given in the field of randomized iterative optimization, and the validity of the main result is verified by several numerical experiments.
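As a rough numerical sanity check of the stated surrogate expression (my own sketch, not the authors' code; dimensions, spectrum, and tolerances are made up, and I assume a Gaussian sketch S and take γ to satisfy the trace-consistency condition sum_i γλ_i/(γλ_i + 1) = k, where λ_i are the eigenvalues of A^⊤A, which matches tr E[P] = k for a rank-k projection):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): n x d data matrix, sketch size k.
n, d, k = 100, 20, 10

# Data matrix with a controlled, fairly flat spectrum (moderate stable rank).
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
V, _ = np.linalg.qr(rng.standard_normal((d, d)))
sigma = np.linspace(1.0, 2.0, d)
A = U @ np.diag(sigma) @ V.T

# Solve sum_i gamma*lam_i/(gamma*lam_i + 1) = k for gamma by bisection
# (the left side is increasing in gamma, so the root is unique).
lam = sigma**2
lo, hi = 1e-9, 1e9
for _ in range(200):
    mid = np.sqrt(lo * hi)
    if np.sum(mid * lam / (mid * lam + 1)) < k:
        lo = mid
    else:
        hi = mid
gamma = np.sqrt(lo * hi)

# Empirical mean of P^perp = I - P over Gaussian sketches S (k x n),
# where P projects onto the row space of SA.
trials = 300
mean_perp = np.zeros((d, d))
for _ in range(trials):
    M = rng.standard_normal((k, n)) @ A
    P = np.linalg.pinv(M) @ M
    mean_perp += (np.eye(d) - P) / trials

surrogate = np.linalg.inv(gamma * (A.T @ A) + np.eye(d))
rel_err = np.linalg.norm(mean_perp - surrogate) / np.linalg.norm(surrogate)
print(f"gamma = {gamma:.3f}, relative Frobenius error = {rel_err:.3f}")
```

With these (arbitrary) settings the empirical mean and the surrogate agree to within Monte Carlo noise, which is consistent with the claimed result.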

Strengths: *After authors' response:* Thanks for the response, I will keep my score. --- The theoretical result is new and has very interesting corollaries for randomized iterative optimization schemes (namely, trivializing the convergence proofs). Another application of the result is implicit regularization: the paper shows that Theorem 1 can be used to study the bias of the ℓ_2-regularized least-squares solution. The exposition of the theoretical results is very neat and quite understandable. Some care was also given to demonstrating numerically the pertinence of the main result in simple, understandable cases.

Weaknesses: In my opinion, the main caveat is the non-explicitness of γ. While this is seriously addressed by the paper in Section 4, I feel I still do not understand well the behavior of γ as a function of k after reading the paper.

Correctness: As far as I can tell, the proofs of the theoretical results are correct. One minor thing: on line 87, the paper states that "γ increases at least linearly as a function of k." Even if the proof is not complicated, I think it deserves to be explained to help the reader build intuition about γ. Simple bounds, and maybe a graph, would also help.
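To illustrate this point numerically (a quick sketch of my own, with a made-up spectrum, assuming γ is defined by the fixed-point equation sum_i γλ_i/(γλ_i + 1) = k): solving for γ at several sketch sizes k shows γ/k is nondecreasing, consistent with at-least-linear growth. Intuitively, the left side of the equation is concave and increasing in γ with value 0 at γ = 0, so its inverse γ(k) is convex through the origin.

```python
import numpy as np

# Hypothetical eigenvalues of A^T A (illustrative only).
d = 20
lam = np.linspace(1.0, 4.0, d)

def solve_gamma(k, lam, iters=200):
    """Bisection for gamma in sum_i gamma*lam_i/(gamma*lam_i + 1) = k."""
    lo, hi = 1e-9, 1e9
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if np.sum(mid * lam / (mid * lam + 1)) < k:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

ks = np.arange(1, d)  # k must stay below d for a solution to exist
gammas = np.array([solve_gamma(k, lam) for k in ks])
ratio = gammas / ks
print(np.round(ratio, 4))  # nondecreasing: gamma grows at least linearly in k
```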

Clarity: The paper is very well-written.

Relation to Prior Work: The prior work is appropriately cited. It might be better to also cite Hanson and Wright when the Hanson-Wright inequality is used in Lemma 2, even though the version from Zajkowski (2018) is the one applied.

Reproducibility: Yes

Additional Feedback: This is just a suggestion, but I think Section 1.3 should be expanded. The current version of the paper goes a bit quickly there for readers who are not familiar with randomized iterative optimization, double descent, or both. In particular, it is hard to see where the equation on line 136 comes from. Room could easily be made by removing one of the two proof sketches of Theorem 2 (page 5): one of them is enough. Another suggestion: since the closed-form expressions for γ are obtained in the n → +∞ limit, it looks possible to consider centered i.i.d. data and use Marchenko-Pastur to approximate the sum in (4), thereby obtaining another expression for γ in this case.

### Review 2

Summary and Contributions: This paper is concerned with sketching of matrices, where the authors use subgaussian sketching matrices. The authors derive accurate predictions for the residual projection matrix. This allows them to make accurate predictions for the low-rank approximation error. Furthermore, the authors conduct simulations that confirm the predictions of their theory. --- I want to thank the authors for their rebuttal and for answering my questions. I will keep my score.

Strengths: This work characterizes the low-rank approximation error precisely with respect to the singular values of the sketched matrix. Since their guarantee depends precisely on the distribution of the existing singular values, this goes significantly beyond existing literature, where typically worst-case guarantees are discussed. This makes this paper an important contribution in the sketching literature.

Weaknesses:
- According to the proofs, Theorem 1 and Theorem 2 only hold with high probability. This is missing from the statements of both theorems; the authors should add it.
- To quantify the low-rank approximation error, the authors use the quantity E||A-AP||^2. Since this is central to the paper, it would be great if the authors could add more explanation and motivation for it (and maybe mention other notions).

Correctness: The claims are sound and the arguments in the main part are correct.

Clarity: The paper is clearly written.

Relation to Prior Work: Prior work is clearly discussed.

Reproducibility: Yes

Additional Feedback: What is the high-level motivation for relying on the rank-one update formula in the proof (see Lemma 1)?

### Review 3

Summary and Contributions: This paper offers spectral bounds for sketching matrices using certain types of random projections. The results are novel compared to the literature in that the authors give bounds in terms of the stable rank. As a special case, the bounds improve when the spectral gap is large enough.

Strengths: Most random projections are analyzed in the worst case. This is usually not particularly surprising, as there is no distinction between worst and best case in most regimes. This is also their feature: being data-oblivious means that any special structure is difficult to exploit. As such, identifying a setting with better bounds (e.g., a large spectral gap) is a very nice result.

Weaknesses: The discussion of related work could be improved. Matrix sketching is a very large field with plenty of prior work. While the selection of citations is ok, I wish the authors had placed greater emphasis on previous results on subspace embeddings, especially when positioning their own results. This should not necessarily impact the paper's chances of acceptance, but I nevertheless hope the authors do this for the final version.

Correctness: Yes

Clarity: Yes

Relation to Prior Work: Absolutely inadequate.

Reproducibility: Yes