NeurIPS 2020

Instance Based Approximations to Profile Maximum Likelihood

Review 1

Summary and Contributions: Statistical property estimation is an important and active area at the intersection of theoretical computer science, statistics, and information theory. For example, a basic question in this realm: given n iid samples from an unknown discrete distribution p, how well can we estimate the entropy H(p), and what is an efficient algorithm for doing so? Recent efforts have shown that, for any symmetric property, the profile maximum likelihood estimator is universally minimax optimal for a wide range of parameters. While this at first seemed like a purely theoretical result, algorithmic efforts quickly caught up to show that 1) efficient approximation of the profile maximum likelihood estimator is possible and 2) approximate profile maximum likelihood estimation suffices for minimax optimality. In this context, this paper refines recent approximation algorithms from exp(-\sqrt{n} log n) to exp(-k log n) where k is the number of observed frequencies, with k = O(\sqrt{n}). This is accomplished by drawing upon graph-theoretical arguments to improve upon approximation guarantees from existing work. These bounds are complemented by numerical experiments that show that the performance of the proposed estimator is comparable to prior art.

Strengths: From a technical standpoint, this is an impressive paper that draws upon deep facts from graph theory to refine the sparsification and matrix rounding steps in the approximation algorithm. Combining these improvements with the prior work from Anari et al. 2020, the authors are able to obtain an exp(-k log n) approximation. I think this is important work towards a full picture of symmetric property estimation.

Weaknesses: Edit after author response: I am satisfied with the authors' response regarding the issues raised below and I have increased my overall score accordingly. I think the biggest gap in the paper is that it does not convincingly show that the regime k << \sqrt{n} is important with respect to numerical experiments. In particular, all of the provided plots show approximation error that is comparable to prior work, but they do not demonstrate any advantage with respect to runtime or other parameters. Therefore, it remains unclear what practical advantage is provided by this refined algorithm. Note that it very well may be the case that the algorithms from prior work have similar performance, but it is difficult to prove that this is the case. Some further discussion by the authors would be appreciated.

Correctness: To the best of my understanding, the claims of the paper and the numerical experiments are correct.

Clarity: The paper is very well-written.

Relation to Prior Work: The paper is very carefully placed within the context of prior work. It draws upon prior results where helpful, and does not exaggerate its own contributions relative to those from prior work, which is much appreciated. As a result, the contribution of this paper in terms of the approximation guarantee is very easy to understand.

Reproducibility: Yes

Additional Feedback:

Review 2

Summary and Contributions: This paper presents a new efficient algorithm for approximately computing the PML distribution. The proposed method matches the best known efficient algorithms for computing approximate PML and improves when the number of distinct observed frequencies is small. This work obtains the first computationally efficient implementation of PseudoPML. The authors also conduct experiments to evaluate their method.

Strengths: Theoretically, the authors prove that the proposed algorithm computes an exp(k log n) approximate PML distribution where k is the number of distinct observed frequencies ( k \leq sqrt{n} ), generalizing the best known result from that computed an exp( sqrt{n} log n)-approximate PML. This paper exploits interesting sparsity structure in convex relaxations of PML and provides a novel matrix rounding algorithm. The authors provide a simplified instantiation of their results that is sufficient for implementing PseudoPML, which is believed to be a key step towards practical PseudoPML.

Weaknesses: The main concern about this paper is that it seems the proofs rely heavily on the lemmas and theorems from previous papers, for example [CSS19a] and [ACSS20]. The technical contribution of this paper and challenging parts of the proofs are not very clear to the readers. Is there a main proving step to obtain the factor $k$ instead of $\sqrt{n}$? The authors may need to further clarify the technical contribution in the proofs that is different from the previous works( [CSS19a] and [ACSS20] ). ======================== Thank the authors for the response. The authors address the concerns. I raise the score from 6 to 7.

Correctness: The theoretical claims look correct. The detailed proofs are not checked carefully. The empirical methodology is correct.

Clarity: The writing of this paper is good.

Relation to Prior Work: The authors have clearly compared their results with previous works.

Reproducibility: Yes

Additional Feedback:

Review 3

Summary and Contributions: Symmetric distribution properties such as support size, support coverage, entropy, and proximity to uniformity, are of prominent interest in machine learning. Profile maximum likelihood is a sample competitive approach for the estimation of such symmetric properties, which is in particular asymptotically sample-optimal for all the above properties.This paper proposes new and substantial improvements to the algorithmic side of the PLM estimation problem. New theoretical tools are introduced and the analysis is refined and deep. Some numerical results illustrate the theoretical contributions.

Strengths: The paper's contribution is worthwhile. Rigorous proofs are provided for the claimed results. A new matrix rounding is introduced and analysed in great details. The proposed approach seems very novel and powerful.

Weaknesses: I would have appreciate a wider account of the applicability of PML for practical problems.

Correctness: The proofs I have reviewed are correct.

Clarity: The paper is well written but rapidly enters into very technical details. I would appreciate if the authors could smooth the presentation out and explain the strategy in a less dry style.

Relation to Prior Work: Relationship with prior work is clearly stated.

Reproducibility: Yes

Additional Feedback: