NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:2800
Title:Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery

Reviewer 1

This study links "factor group-sparse regularized matrix factorization" and "Schatten-p-norm regularization with p=1/2 or 2/3" theoretically. This perspective is interesting. However, the proposed method and results do not have big impact in practical perspective because the convex regularized matrix factorization itself is very naive, and non-convex regularized low-rank matrix recovery is now widely studied and there are many related works such as truncated nuclear-norm[ex1], weighted nuclear-norm[ex2], capped-l1[ex3], LSP[ex4], SCAD[ex5], and MCP[ex6]. [ex1] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He, “Fast and accurate matrix completion via truncated nuclear norm regularization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2117–2130, 2013. [ex2] Gu, Shuhang, et al. "Weighted nuclear norm minimization with application to image denoising." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. [ex3] T. Zhang, “Analysis of multi-stage convex relaxation for sparse regularization,” Journal of Machine Learning Research, vol. 11, pp. 1081–1107, 2010. [ex4] E. Candes, M. Wakin, and S. Boyd, “Enhancing sparsity by reweighted l1 minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5-6, pp. 877–905, 2008. [ex5] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001. [ex6] C. Zhang, “Nearly unbiased variable selection under minimax concave penalty,” Annals of Statistics, vol. 38, no. 2, pp. 894–942, 2010. Also in another perspective, greedy rank-increment approach [ex7,ex8,ex9] for low-rank matrix recovery should be referred for discussion. This does not need to estimate initial d unlike regularized matrix factorization methods, and it is usually memory efficient. [ex7] M. Tan, I. W. Tsang, L. Wang, B. Vandereycken, and S. J. Pan, “Riemannian pursuit for big matrix recovery,” in Proc. 31st Int. Conf. Mach. Learn. (ICML), pp. 1539–1547, 2014. [ex8] Yokota, Tatsuya, and Andrzej Cichocki. "A fast automatic low-rank determination algorithm for noisy matrix completion." 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE, 2015. [ex9] A. Uschmajew and B. Vandereycken, “Greedy rank updates combined with Riemannian descent methods for low-rank optimization,” in Proc. Int. Conf. Sampl. Theory Appl. (SampTA), pp. 420–424, 2015.

Reviewer 2

This paper proposed a new non-convex regularizer for the low-rank matrix analysis, which can get rid of SVD, and apply to large-scale matrix analysis. The theoretical analysis gives the bounds for the proposed method.

Reviewer 3

This paper focuses on the problem of the low-rank matrix recovery, and proposes a factor group-sparse regularizers as a relaxation of the number of nonzero columns in a factorization of the matrix. It seek to minimize the number of nonzero columns of the factored matrices. Analysis is conducted. However, there are some concerns to be addressed. -The motivation is somewhat unclear. In introduction, some contents should be presented in the related work section or the Preliminary section to well show the motivation of this work. -My another concern is about the value of p. In real application, how to set the suitable values of p? Is there any suggestion? -In experiments, how about the results of FGSR-1/2? It is better to show the results of different values of p.

Reviewer 4

Updated after author feedback: The authors did not have the opportunity to address my concerns about the experimental setup, as this review was solicited after the response period. But I think the changes they have made in response to other reviewers (comparisons with non-convex estimators) have increased the persuasiveness of their experimental comparisons. My score remains 7. The authors extend the factorization of the nuclear norm into the sum of squared Frobenius norms to find factorized expressions for the low-rank promoting Shatten quasi-norms. The advantage of these factorized expressions is that they are weighted powers of the column norms of the factorizations, so are computationally attractive in one sense. On the other hand, they are non-convex. The authors demonstrate empirically that these estimators perform well in practice using their suggested optimization algorithms, outperforming prior factorization and nuclear norm-based formulations of low rank matrix recovery problems. The paper is well-written, with the exception of the experimental section which is too vague about how the experiments were conducted. The results should be averaged over multiple runs and standard deviations should be described. The procedure for choosing the hyperparameters of the methods should be described so the readers can be sure that the comparisons are fair. How were the methods implemented (especially, what algorithms were used for the baseline methods?). I am concerned by the fact that in Figure 1d the factorized nuclear norm approach is slower than the FGSR-2/3 method: they are both sums of weighted powers of column norms of the factorizations, but the former is a smooth objective, while the latter is not. What is causing this counter-intuitive result? Overall, the main contribution (the factorized representation for the Shatten quasi-norms) is interesting and useful in itself, and the resulting algorithms are reasonable, but the experimental evaluation is questionable. Thus my score is 7: I recommend accept, although I will be fine with a reject.