NeurIPS 2019

**Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center**

### Reviewer 1

The authors propose mini-batched versions of matrix stochastic gradient (MSG) and regularized matrix stochastic gradient (RMSG) for PCA. They show that, for recovering the top-$k$ subspace, the proposed methods match the per-iteration complexity of Oja's algorithm up to order $k$. They establish rank control of the iterates by deriving a sufficient condition for the iterates to have rank $k$ (Lemma 5.2). Finally, synthetic experiments verify the convergence and rank control of the proposed algorithms.

**Originality:** The originality lies in introducing mini-batching on top of MSG and RMSG. The paper also introduces a novel sufficient condition for rank control of the iterates of MB-MSG and MB-RMSG (Lemma 5.2). The previous rank-control idea for top-1 PCA in Garber et al. (2018) is claimed not to extend easily to top-$k$ PCA.

**Quality:** The authors honestly present which ideas are existing and which are new. The proofs are hard to follow at times, as detailed in the Improvements part. I believe the proofs are likely correct; however, I am not completely convinced of their correctness.

**Clarity:** The exposition in the paper is clear.

**Significance:** The paper takes an important step toward understanding the empirical performance of MSG and RMSG for the PCA problem. The results presented in the paper apply to mini-batched versions of MSG and RMSG; however, it is unclear how they can be extended to the general MSG and RMSG.
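For context, the MSG-style update discussed above maintains a matrix iterate in the convex hull of rank-$k$ projection matrices: a stochastic-gradient step on a batch covariance estimate, followed by a projection that shifts and clips eigenvalues. A minimal sketch, assuming the standard constraint set $\{M : 0 \preceq M \preceq I,\ \mathrm{tr}(M)=k\}$ from the MSG literature; the function names `project_fantope` and `msg_step` are illustrative, not taken from the paper:

```python
import numpy as np

def project_fantope(M, k, tol=1e-8):
    """Project a symmetric matrix onto {M : 0 <= M <= I, tr(M) = k}
    by shifting its eigenvalues by a scalar s and clipping to [0, 1],
    with s found by bisection so the clipped eigenvalues sum to k."""
    lam, V = np.linalg.eigh(M)
    lo, hi = -lam.max() - 1.0, 1.0 - lam.min() + 1.0  # brackets the shift
    while hi - lo > tol:
        s = 0.5 * (lo + hi)
        if np.clip(lam + s, 0.0, 1.0).sum() > k:
            hi = s  # shift too large; clipped sum overshoots k
        else:
            lo = s
    lam_proj = np.clip(lam + 0.5 * (lo + hi), 0.0, 1.0)
    return (V * lam_proj) @ V.T

def msg_step(M, X_batch, eta, k):
    """One mini-batched MSG update: a gradient step using the batch's
    empirical second-moment matrix, then projection back onto the set."""
    C_hat = X_batch.T @ X_batch / len(X_batch)  # batch covariance estimate
    return project_fantope(M + eta * C_hat, k)
```

The rank-control question the review raises concerns how many eigenvalues of the iterate land strictly inside $(0, 1)$ after this projection; when only $k$ eigenvalues are nonzero, the iterate is itself a rank-$k$ projection and can be stored compactly.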

### Reviewer 2

The work studies two algorithms (MB-MSG and MB-RMSG) applied to a convex relaxation of principal component analysis and gives new performance bounds. The analysis and experiments are solid. I recommend acceptance.

### Reviewer 3

This work considers a convex relaxation of the PCA problem and revisits two algorithms from a theoretical perspective: matrix stochastic gradient (MSG) and $\ell_2$-regularized MSG (RMSG). For Algorithm 1, it is not clear how to choose $T$, the number of iterations. For Theorem 4.1, it is somewhat confusing that the upper bound grows as $T$ increases; since $T$ is the number of iterations, one would expect a larger $T$ to yield a tighter bound. For the numerical study, it would be better to compare against existing methods to give a fair evaluation of the proposed approach.