NeurIPS 2020

### Review 1

Summary and Contributions: This paper presents a new variational form of the Schatten-p quasi-norm via matrix factorization. The authors show that any local minimum of the factorized model corresponds to a local minimum of the original model.

Strengths: A variational form of the SVD-free Schatten-p quasi-norm appeared in previous work [4], in which $p$ was restricted to a discrete set of values. In the current work, the authors generalize $p$ to arbitrary values in $(0,1)$. The authors also provide theoretical guarantees on the optimization and the local minima.
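For concreteness, the $p = 1$ special case of the variational form under discussion is the classical nuclear-norm factorization identity $\|X\|_* = \min_{X = UV} \tfrac{1}{2}(\|U\|_F^2 + \|V\|_F^2)$, with the minimum attained by the "balanced" factors built from the SVD. A minimal numerical sketch of that special case (not the paper's algorithm, which targets $p < 1$ without SVDs):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))

# SVD X = A diag(s) B^T; the nuclear norm is the sum of singular values.
A, s, Bt = np.linalg.svd(X, full_matrices=False)
nuclear = s.sum()

# Balanced factors U = A sqrt(diag(s)), V = sqrt(diag(s)) B^T, so X = U V.
U = A * np.sqrt(s)            # scales the columns of A
V = np.sqrt(s)[:, None] * Bt  # scales the rows of B^T

assert np.allclose(U @ V, X)
# At the balanced factorization the Frobenius bound is tight:
bound = 0.5 * (np.linalg.norm(U, 'fro')**2 + np.linalg.norm(V, 'fro')**2)
assert np.isclose(bound, nuclear)
```

For any other factorization $X = UV$ the same quantity only upper-bounds the nuclear norm, which is why the minimization over factors recovers it.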

Weaknesses: (1) The work is somewhat incremental. Although the authors extend $p$ to arbitrary values, the practical impact is limited because in previous work [4], for any $p$, one can always find a discrete $p'$ such that $p' < p$. (2) Formulation (9) already appeared in the following paper: Giampouras et al., "Alternating Iteratively Reweighted Least Squares Minimization for Low-Rank Matrix Factorization," IEEE TSP, 2019. (3) The analysis of "escaping from poor local minima" is based on a rank-one update, but the optimization algorithm for matrix completion is based on block successive minimization.

Correctness: Yes.

Clarity: Yes.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: (1) The authors propose a rank-one update to avoid poor local minima, but block successive minimization was used in the experiments; the authors should explain this discrepancy. (2) Does Theorem 1 still hold for $p > 1$? (3) It would be better if the authors could provide an intuitive example of the "poor" local minima in Section 4.2. %%% Thanks for the efforts. I am satisfied with the response and have increased my score to "accept".

### Review 2

Summary and Contributions: This paper builds on a large literature of Schatten relaxations and presents an elegant and compelling new approach to optimizing Schatten norms with $p < 1$ without resorting to SVD computations. The authors also present an analysis showing the factorization does not introduce spurious local minima, and present a method that adds a rank-one component to reduce the rank further. Numerics validate the approach and show it to perform about as well as the best previous work.

Strengths: The factorization presented is more elegant than previous work while attaining similar performance. The theory presented is cogent and relevant: it is nice to know that the factorization does not introduce spurious local minima, and it is surely useful to know how to escape any local minima we may find.

Weaknesses: The method does not present an empirical improvement, and the numerics are not especially exciting: while the theory in this paper is stronger, the method performs almost identically to FGSR when the Schatten-$p$ norm target is similar.

Correctness: Yes, claims seem sound.

Clarity: Yes, it is easy to read, despite a few typos (noted below in detailed remarks).

Relation to Prior Work: Yes, the relation to previous work is clear. On the other hand, the idea in 4.2 is very far from new. For example, a similar idea was proposed in https://arxiv.org/abs/0807.4423. I wonder if these ideas might allow a generalization beyond RIP matrices?

Reproducibility: Yes

Additional Feedback:
* line 22: typo, should be "Frobenius".
* Write out "with respect to" in the text.
* line 93: use an em-dash (written as three dashes --- in LaTeX), not a hyphen.
* Theorem 3: do \hat{U} and \hat{V} have to be of the specific form in Theorem 3? It looks necessary for \hat{U} and \hat{V} to be an orthonormal matrix times a positive diagonal matrix, which is not always satisfied at an arbitrary local minimum, even though the paragraph after Theorem 3 says one can always obtain a local minimum for X once a local minimum for (U,V) is attained. Please explain.
* What is R in Proposition 1? The $\delta = 0$ case is quite rare and not very useful; perhaps make it a remark rather than part of the proposition.
* The experiments seem to suggest that smaller $p$ is better; can you comment? The plots are also difficult to read due to the many overlapping curves.
* The authors might want to provide a definition of regular subgradients in the appendix.

### Review 3

Summary and Contributions: The authors propose to reformulate the Schatten p-norm (the $\ell_p$ norm of the singular values of a matrix) using a variational form. They provide RIP-based recovery results for matrix completion and propose to solve the problem using a conditional gradient (a.k.a. Frank-Wolfe) method.
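To fix the object under discussion: the Schatten p-(quasi-)norm is simply the $\ell_p$ norm applied to the singular values, $\|X\|_{S_p} = (\sum_i \sigma_i(X)^p)^{1/p}$. A minimal sketch computing it directly via SVD (the paper's contribution is precisely an SVD-free variational surrogate for this quantity):

```python
import numpy as np

def schatten_p(X, p):
    """l_p norm of the singular values of X (a quasi-norm for p < 1)."""
    s = np.linalg.svd(X, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

X = np.diag([3.0, 4.0])           # singular values are 4 and 3
assert np.isclose(schatten_p(X, 2), 5.0)  # p = 2 is the Frobenius norm
assert np.isclose(schatten_p(X, 1), 7.0)  # p = 1 is the nuclear norm
```

For $p < 1$ the triangle inequality fails, hence "quasi-norm"; this is the regime the paper targets as a tighter surrogate for rank.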

Strengths: The strength of the work lies in its discussion of RIP-based recovery guarantees for $S_p$ norms.

Weaknesses: The "novel" variational formulation is not novel. A more general statement has been known since Jameson, Graham James Oscar, *Summing and Nuclear Norms in Banach Space Theory*, Vol. 8, Cambridge University Press, 1987; see also, for example, Proposition 6 of "Convex relaxations of structured matrix factorizations" (https://arxiv.org/pdf/1309.3117.pdf). Optimization using this norm therefore becomes straightforward; see the arXiv reference above.

Correctness: Looks correct to me

Clarity: Pretty clear

Relation to Prior Work: Missing the references I cited above, which call the novelty of the work into question.

Reproducibility: Yes