NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 2674
Title: Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities

Reviewer 1

The task of demixing a balanced mixture of two log-concave densities with the same, known covariance matrix seems restrictive. The authors did not discuss whether the results apply to unbalanced mixtures of two log-concave densities, nor did they discuss the implications of requiring the two covariances to be identical and known. The work builds on previous results on global convergence guarantees via EM for a balanced 2-GMM (or mixture of 2 truncated Gaussians) with known covariance, as well as on follow-up works on mixtures of 2 linear regressions and mixtures of 2 one-dimensional Laplacian distributions. The methodological contribution is to modify the M-step. However, the technical difficulty of proving convergence or providing a finite-sample analysis, compared with previous works, is unclear.

The incentive for using the EM algorithm is also unclear. It is well known that global convergence of EM has been established for mixtures of two densities (not more than two). However, there are other methods, such as spectral and tensor methods, that are guaranteed to converge to global optima for mixtures of more than two. So the question is: why not consider spectral methods? How does the least squares EM compare with, say, tensor methods?

The robustness analysis under model misspecification is carried out in a restrictive setting where the ground-truth densities are log-concave and the fitted densities are Gaussians. I am curious about what happens if the ground-truth densities are not log-concave. If a theoretical analysis is difficult to establish, can you show some experimental results? The theoretical analysis seems sound; however, it would be nice to see some experimental results.

================ Comments after author response: After carefully reading the authors' response, I am more convinced of the technical challenges involved in this paper. I will leave my score unchanged, though, since I think adding some experiments and a more general model-misspecification analysis would make the paper much more impactful.
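To make the spectral/moment comparison concrete, the simplest instance of the alternative alluded to above: for a balanced symmetric mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2) in one dimension with sigma known, E[x^2] = theta^2 + sigma^2, so |theta| is recoverable from the second moment alone, with no iterations. A minimal sketch (the parameter values are hypothetical, chosen only for illustration):

```python
import numpy as np

# Balanced symmetric mixture: x = s * theta + z, with s = +/-1 equally likely
# and z ~ N(0, sigma^2). Then E[x^2] = theta^2 + sigma^2, so |theta| can be
# estimated from the empirical second moment when sigma is known.
rng = np.random.default_rng(0)
n, theta_true, sigma = 50_000, 2.0, 1.0  # hypothetical ground truth
signs = rng.choice([-1.0, 1.0], size=n)
x = signs * theta_true + sigma * rng.normal(size=n)

theta_hat = np.sqrt(max(np.mean(x**2) - sigma**2, 0.0))
```

Of course, this only recovers |theta| (the mixture is symmetric in sign); higher-order moment and tensor methods are what extend this idea to unbalanced mixtures and more than two components.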

Reviewer 2

The study of finite mixtures of distributions is an important problem in machine learning. Except for the Gaussian Mixture Model (GMM), there is no theoretical guarantee for the parameter estimation problem in parametric mixture models. This paper proposes to provide such guarantees for a wider class of mixture models, namely mixtures of log-concave distributions. The expectation-maximization (EM) algorithm is the standard tool for studying mixture models. However, the global convergence of the EM algorithm is not ensured, even in the simple case of GMMs. Recently, the global convergence of the EM algorithm has been established for a balanced mixture of two Gaussians, two truncated Gaussians, two linear regressions, or two Laplace distributions. The present paper extends these previous works to a mixture of two log-concave distributions. In particular, an explicit formula for the density is not known in this context. To overcome this issue, the authors propose a new variant of the EM algorithm: the Least Squares EM (LS-EM) algorithm. The basic idea is to replace the classical M-step of the EM algorithm with a least squares problem. In particular, for GMMs, LS-EM reduces to the EM algorithm. The authors prove the global convergence of the LS-EM algorithm for a mixture of rotation-invariant log-concave distributions, with random initialization. Convergence is established for any dimension in the infinite-sample setting and in dimension one in the finite-sample setting.

Comments: This new convergence study is very interesting. However, some assumptions made by the authors reduce the scope of application of this paper. First of all, the covariance of the samples is assumed known, and the estimation concerns only the location parameter. Moreover, the mixture is assumed to be balanced and symmetric. Last, the distributions are assumed to be rotation invariant; in practice, many interesting distributions do not satisfy this condition (as noted by the authors in their conclusion).

Despite these limitations, the authors propose a new way to establish the global convergence of an EM-like algorithm, based on a geometric notion of angle decrease. They also demonstrate the ineffectiveness of the traditional approach via the l2 distance. The paper is well written. Unless I am mistaken, the proofs (in the supplementary material) are correct, very clear, and easy to follow. There are some typos in the supplementary material, but they do not interfere with reading. Numerical experiments are provided in the supplementary material, but this is not mentioned in the main text, which is unfortunate. Last, the authors refer to a "regularity condition" throughout the paper but never make it explicit. I guess they are referring to the classical assumption for differentiation under the integral sign, but I would have liked to see it stated, at least in the supplementary material.
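For reference, the Gaussian special case mentioned above (where LS-EM reduces to classical EM) has a particularly clean location update. A minimal one-dimensional sketch; the function name and all parameter values are hypothetical:

```python
import numpy as np

def em_symmetric_2gmm(x, sigma=1.0, theta0=0.5, n_iters=50):
    """EM for the balanced mixture 0.5*N(theta, sigma^2) + 0.5*N(-theta, sigma^2),
    estimating only the location theta (variance known). The E-step posterior
    weight of the +theta component is w(x) = sigmoid(2*theta*x / sigma^2), and
    the M-step averages (2*w(x) - 1)*x = tanh(theta*x / sigma^2)*x."""
    theta = theta0
    for _ in range(n_iters):
        theta = np.mean(np.tanh(theta * x / sigma**2) * x)
    return theta

# Hypothetical usage: true theta = 2, sigma = 1.
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=20_000)
x = signs * 2.0 + rng.normal(size=20_000)
theta_hat = em_symmetric_2gmm(x)
```

In the log-concave, non-Gaussian case the M-step no longer has this closed form, which is exactly the gap the least squares reformulation is meant to fill.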

Reviewer 3

~~~~~~~~~~~ After rebuttal: I thank the authors for clarifying my comments #1 and #3. Since my comment #2, on the finite-sample analysis in the general-dimensional case, was not addressed, I will not increase my rating and would like to keep the current "accept" rating. ~~~~~~~~~~~

This paper studies the theoretical properties of estimating the location parameter of a mixture of two rotation-invariant log-concave densities. It shows that a newly proposed LS-EM algorithm converges to the true location parameter from a random initialization. In the finite-sample analysis, explicit convergence rates and sample complexity bounds are further developed.

Strengths:
1. This paper is very well written and easy to follow.
2. It addresses an important and challenging problem.
3. The theoretical analysis is non-trivial and the technical contribution is high.

Weaknesses:
1. A special case of the proposed model is the Gaussian mixture model. Can the authors compare the proved convergence rates and sample complexity bounds with those established for GMMs (Balakrishnan et al., 2017)? It would be interesting to see whether there is any loss of accuracy from using a different proof technique. [Sivaraman Balakrishnan, Martin J. Wainwright, and Bin Yu. Statistical guarantees for the EM algorithm: From population to sample-based analysis. The Annals of Statistics, 45(1):77–120, 2017.]
2. The finite-sample analysis (sample complexity bounds) is derived only for the one-dimensional case. This largely limits the applicability of the proposed theoretical framework. Can the authors extend the finite-sample analysis to the general d-dimensional case, or at least provide a numerical study of the convergence performance?
3. In Proposition 6.1, the condition \eta \ge C_0 for some constant C_0 seems strong. Typically, the signal-to-noise ratio \eta is a small value. It would be great if the authors could further clarify this condition and compare it with that of Section 4 (the correctly specified model case).
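On weakness #2, a small numerical study of the kind asked for is easy to run in the Gaussian special case, where the d-dimensional update with identity covariance is theta <- E[tanh(<theta, x>) x]. A sketch under those assumptions; the dimension, sample size, and ground truth are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 50_000
theta_star = np.full(d, 1.0)            # hypothetical ground-truth location
signs = rng.choice([-1.0, 1.0], size=(n, 1))
x = signs * theta_star + rng.normal(size=(n, d))  # identity covariance

theta = 0.1 * rng.normal(size=d)        # random initialization
for _ in range(100):
    # EM update for the balanced symmetric 2-GMM with identity covariance
    theta = (np.tanh(x @ theta)[:, None] * x).mean(axis=0)

# The mixture is symmetric, so success means landing near +theta_star or -theta_star.
err = min(np.linalg.norm(theta - theta_star), np.linalg.norm(theta + theta_star))
```

Plotting `err` against the iteration count (or against n over repeated draws) would give exactly the d-dimensional convergence picture the review requests, at least in the Gaussian case.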