NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019 at the Vancouver Convention Center
Paper ID: 4190
Title: A Polynomial Time Algorithm for Log-Concave Maximum Likelihood via Locally Exponential Families

Reviewer 1

In terms of related work: are there any results on approximating an arbitrary function by a convex function? That seems like a simpler question and looks somewhat relevant.

It would help to give one or two sentences of intuition for the definition of the tent function (a sketch of the standard construction is given below).

In the related work you mention a lower bound of (1/eps)^O(d) samples for *any* estimator -- I didn't quite understand the exact difference from your formulation that allows a polynomial-time upper bound.

I understand that there is much prior work on this problem and the result seems interesting and significant, but it would be nice to point out any specific applications or directions that this finding could potentially enable.
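For that intuition, a rough sketch of the standard construction (following Cule, Samworth, and Stewart; the paper's exact definition may differ in details): given tent poles (X_1, y_1), ..., (X_n, y_n), the tent function is the least concave function lying above all of the poles,

\[
h_y(x) \;=\; \inf\bigl\{\, g(x) \;:\; g \text{ concave},\ g(X_i) \ge y_i \ \text{for } i = 1, \dots, n \,\bigr\},
\]

restricted to conv(X_1, ..., X_n) and set to -infinity outside. Its graph looks like a canvas tent stretched over poles of height y_i planted at the points X_i (hence the name), and the corresponding tent density is exp(h_y).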

Reviewer 2

The paper proposes the first polynomial-time algorithm for computing the MLE of log-concave distributions in high dimensions. The approach leverages the fact that the log-concave MLE lies in the class of “tent densities”. Based on this observation, the authors cast the MLE computation as a stochastic convex optimization problem; the main subroutine of the resulting algorithm is a procedure for sampling from tent densities (a toy version of the convex program is sketched below).

The paper is theoretically interesting and provides a polynomial-time algorithm; however, the degree of the polynomial is quite high. For example, step 1 of Algorithm 2 takes time O(d^5) for d-dimensional data. Further, given that the sample complexity of the algorithm is itself exponential in d, the advantage of the polynomial-time algorithm is not clear.

In Lemma 1, tent poles are defined as points X_i, but in line 184 they are defined as pairs (X_i, y_i). Is the epsilon in the sample-complexity discussion (lines 156-170) different from the epsilon in Definition 2? If so, please clarify.

Update: Thanks for answering my questions.
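To make the convex program concrete, here is a minimal one-dimensional toy (a sketch under simplifying assumptions, not the paper's algorithm): it minimizes the standard convex objective F(y) = -(1/n) sum_i y_i + integral of exp(h_y), with linear interpolation standing in for the tent function (exact only when the heights form a concave sequence), quadrature in place of the sampling-based gradient oracle, and an off-the-shelf solver in place of SGD.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=30))       # 1-D data points (pole locations)
grid = np.linspace(X[0], X[-1], 2000)  # quadrature grid over conv(X)

def F(y):
    # Convex objective: -(1/n) * sum_i y_i + integral of exp(h_y),
    # with h_y the piecewise-linear interpolant of the poles (X_i, y_i).
    h = np.interp(grid, X, y)
    return -y.mean() + np.trapz(np.exp(h), grid)

y_hat = minimize(F, np.zeros(len(X))).x
# At the optimum the mass integral comes out to 1, so exp(h_{y_hat})
# is (approximately) the density of the fitted log-concave MLE.

In high dimensions the integral (and hence the gradient) is no longer tractable by quadrature, which is exactly where the paper's sampling subroutine comes in.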

Reviewer 3

Post-rebuttal: The authors have promised to incorporate an exposition of the sampler in the revised paper; I believe that will make the paper a more self-contained read. I maintain my rating of strong accept (8).

---------------------------------

The family of log-concave distributions is a powerful and general family of distributions that is often used in machine learning (it includes most of the common distribution families). I think this paper makes very nice contributions to the fundamental question of estimating the MLE distribution given a set of observations. The key contributions can be broken into two parts:

- A collection of simple but elegant structural results for the MLE distribution in terms of 'tent distributions' -- distributions whose log-density is piecewise linear and supported over subdivisions of the convex hull of the data points. This allows the authors to write a convex program for optimizing over tent distributions; this program captures the MLE distribution.

- In order to apply SGD (or other optimization methods) to this program, one needs a (sub)gradient oracle. Constructing such an oracle is the other key contribution of this paper. This part is technically involved, and the authors build an algorithm to approximately sample from the current distribution (a sketch of how such a sampler yields stochastic subgradients is given after this list).

The authors give the first efficient algorithm for the MLE estimation problem over a natural and important class of distributions. I think the paper merits acceptance on the strength of the results and the techniques they require.
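For concreteness, a hedged sketch of how an approximate sampler can serve as a stochastic (sub)gradient oracle for the tent-density objective; sample_from_tent and barycentric_weights are assumed interfaces introduced for illustration, not the paper's actual API.

import numpy as np

def sgd_step(X, y, step, sample_from_tent, barycentric_weights):
    # One stochastic subgradient step on the convex objective
    #   F(y) = -(1/n) * sum_i y_i + integral of exp(h_y).
    # sample_from_tent(X, y) is assumed to return a draw x from the
    # normalized tent density exp(h_y)/Z together with the mass Z.
    # barycentric_weights(X, y, x) is assumed to return dh_y(x)/dy:
    # the barycentric weights of x w.r.t. the poles of the simplex
    # of the subdivision containing x (zero for all other poles).
    x, mass = sample_from_tent(X, y)
    # E[mass * dh_y(x)/dy] over the draw x equals the gradient of the
    # integral term, so g is an unbiased stochastic subgradient of F.
    g = mass * barycentric_weights(X, y, x) - np.ones(len(y)) / len(y)
    return y - step * g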