NeurIPS 2019
Sun, Dec 8 through Sat, Dec 14, 2019, at the Vancouver Convention Center
Paper ID: 5343 — Decentralized Sketching of Low-Rank Matrices

### Reviewer 1

I have a few major questions about the paper that should be addressed in a published version:

* Does the sensing model interact with the new choice of norm?
* Compare to the literature on matrix sensing as formulated in [1]. Why do recovery results for matrix sensing not apply to this problem setting? The sensing matrices A would have only one nonzero column each...
* Why do you call this model "decentralized"? Recovery still requires localizing all the measurements at a single master node (to solve the SDP). And it is easy to implement more general matrix sensing so that it is equally decentralized: just add up the measurements from each column to form the measurement of the full matrix. (Perhaps the matrix sensing approach to decentralized sketching requires more measurements?)
* Does the mixed norm have an interpretation as a spectral function, or is it not equivalent to any function of the spectrum of the matrix?
* The experiments should be improved. In particular, the authors should also compare with the max norm and with more exotic norms such as those of [2] (or explain why they are inferior). How were the parameters, such as the radius R, chosen?
* Please say more about which elements of your proof are standard, and which are new adaptations to either the sensing structure or the mixed norm.

A few detailed comments:

* Line 135: consider citing [3], which explores tensor norms of this form in detail in the context of matrix estimation problems.
* Line 139: I found the notation for injective vs. projective norms distracting. Is it standard?

[1] Recht, Benjamin, Maryam Fazel, and Pablo A. Parrilo. "Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization." SIAM Review 52.3 (2010): 471-501.
[2] Shang, Fanhua, Yuanyuan Liu, and James Cheng. "Tractable and scalable Schatten quasi-norm approximations for rank minimization." Artificial Intelligence and Statistics, 2016.
[3] John Bruer. "Recovering Structured Low-rank Operators Using Nuclear Norms." PhD Thesis, Caltech, 2018. https://thesis.library.caltech.edu/10048/
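Reviewer 1's observation that the column-wise sketch is a special case of matrix sensing (with sensing matrices having a single nonzero column) can be illustrated with a minimal numpy sketch. The dimensions and variable names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 5, 4

X = rng.standard_normal((d1, d2))  # unknown data matrix (dense, for illustration)
j = 2                              # index of the column observed by one node
a = rng.standard_normal(d1)        # sketching vector held by that node

# Decentralized sketch: the node measures only its own column of X.
y_local = a @ X[:, j]

# Equivalent matrix-sensing view: <A, X> where A has exactly one nonzero column.
A = np.zeros((d1, d2))
A[:, j] = a
y_sensing = np.trace(A.T @ X)      # Frobenius inner product <A, X>

assert np.isclose(y_local, y_sensing)
```

The assertion confirms the two viewpoints coincide, which is why the reviewer asks how this setting differs from the matrix sensing formulation of [1].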

### Reviewer 2

The critical part of the algorithm/analysis seems to be the notion of the mixed norm defined in Eq. (4), whereby the traditional recovery approach can be adapted to a carefully defined set of data matrices depending on the maximum ell_2 column norm and the mixed norm. The algorithm simply solves a least-squares minimization over that set of data matrices. Is it true that the measurement matrices need not have Gaussian entries and can have any symmetric subgaussian entries?

A key entropy estimate is Lemma 2, which lacks a proof, even in the supplementary material. The paper seems to have been written in a rush and is clearly not proofread. It is riddled with writing problems, particularly serious ones concerning definitions, which impede understanding. The following is not an exhaustive list:

* Line 48: say 'Assume' instead of 'It follows'.
* Equation (2): what is the inner product?
* Equation (3): the subscript of 'max' on the right-hand side should be j = 1, ..., d_2.
* Equation (4): What are the dimensions of U and V? Arbitrary matching dimensions?
* Line 131: maybe just say X is a map from (R^{d_2}, ell_p) to (R^{d_1}, ell_q), which is a cleaner expression.
* Equation (9): What is the use of u? Is it ||Xu||_q? Is the dimension of u correct?
* Line 143: what is the *-norm? The trace norm?
* Line 144: What is the infinity-norm? The maximum absolute column sum?
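The maximum ell_2 column norm that Reviewer 2 refers to (the quantity in Eq. (3), with the max taken over columns j = 1, ..., d_2) is straightforward to compute; a minimal sketch, with function and variable names of my own choosing:

```python
import numpy as np

def max_column_l2_norm(X: np.ndarray) -> float:
    """max over j of ||X[:, j]||_2 -- the largest Euclidean norm among columns."""
    return float(np.max(np.linalg.norm(X, axis=0)))

X = np.array([[3.0, 0.0],
              [4.0, 1.0]])
print(max_column_l2_norm(X))  # columns have norms 5.0 and 1.0, so this prints 5.0
```

Together with the mixed norm of Eq. (4), this norm defines the constraint set over which the paper's least-squares recovery is solved.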

### Reviewer 3

The paper is very clearly written. I particularly appreciated the simplicity of the exposition in both the main paper and the appendix. The quality of the presentation is high. The authors clearly outline their techniques at multiple levels, and also discuss possible obstacles along the way and how they resolved them.

The techniques seem original. The main component seems to be the use of the new mixed norm as a regularizer in low-rank matrix recovery. This is not altogether surprising, since earlier papers by Lee, Srebro, and co-authors also proposed similar norms in the same context. However, such an approach to the compressive PCA setting is to my knowledge not very common, and it gives improved (for-all) guarantees in contrast with earlier work by Negahban and Wainwright.

Compressive PCA is a somewhat more challenging problem than regular low-rank matrix recovery from affine measurements because of the tensor-product structure of the linear observation model. To my knowledge, most existing theoretical results for low-rank matrix recovery do not quite cover this setting. The results of this paper fill this gap.

=== Update (after reading rebuttal/reviews): thanks for the response. Consider adding real-data experiments, as well as incorporating the other suggestions by the reviewers.