NIPS 2016
Mon Dec 5th through Sun the 11th, 2016 at Centre Convencions Internacional Barcelona
Paper ID 1292: "Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation"

### Reviewer 1

#### Summary

This paper unifies several previous approaches to density and entropy estimation from i.i.d. samples of a random variable. The authors derive new estimators and demonstrate theoretical and experimental improvements over the previous state of the art, generalizing earlier approaches to the problem. They accomplish this by shrinking the data-dependent bandwidth of a local likelihood density estimator and then proving that the resulting increase in bias is distribution-independent and can therefore be subtracted off. Under some assumptions, the authors can further write this estimation bias in closed form.
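For concreteness, the construction described above can be written schematically (the symbols $\hat f$, $h$, and $\widehat B$ are illustrative, not the paper's own notation): the entropy estimate is a resubstitution average

$$\widehat H(X) \;=\; -\frac{1}{n}\sum_{i=1}^{n} \log \hat f_{h(X_i)}(X_i) \;+\; \widehat B,$$

where $\hat f_{h(x)}$ is the local likelihood density estimate with bandwidth $h(x)$ tied to a $k$-nearest-neighbor distance at $x$, and $\widehat B$ is the distribution-independent correction for the bias that the shrunken bandwidth introduces.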

#### Qualitative Assessment

The technical quality seems very good, worthy of an oral presentation. The authors should discuss the theoretical impact of the truncation to $\log n$ nearest neighbors mentioned on lines 153-155; this is done to improve computational efficiency, but there is no discussion of how it affects the estimate.

My only other technical comment concerns the scaling parameters of the order statistics. Lemma 3.1 assumes $m_n$ is of order $\log n$, but in the experiments $m = 50{,}000$ while $n$ is either 1,000,000 (Table 1) or between 100 and 1500 (all other experiments). The authors should comment on the latter case, where $m \gg n$, and on the convergence of their total variation bound as a function of $m$ and $n$.

The work seems significantly novel and could have quite a large impact, since it generalizes and unifies several previously heuristic results. However, the empirical performance of the mutual information and entropy estimators is only shown on synthetic data. An unsupervised learning problem on real data would have demonstrated the true impact of this approach to the machine learning community.

Overall, the paper is well written and explains the history and setup of the studied problems. There are a few grammatical errors and typos that do not detract from the overall presentation ("Despite the advantages of the LLDE, they require...", "k-NN method suffer from..."). Compound words like "log-likelihood" and "bottleneck" are split in two where they are typically written as one word. Also, equation (2) runs into its equation label, and some log-log plots have a logarithmic scale only on the y axis.

I appreciated that the experimental results and figures were placed in the relevant sections throughout. My only other suggestions are to provide more intuition for the choice of exponential random variables/random vectors on the sphere, and to move some of the discussion in Section 6 earlier in the paper.

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 2

#### Summary

This paper proposes a novel entropy and mutual information estimator that combines kNN and kernel methods by making the (local) kernel bandwidths depend on kNN distances. This adaptive bandwidth selection introduces a bias that the authors characterize analytically. Also, as opposed to a few other recent methods, this work establishes a closed-form solution for local likelihood density estimation for certain choices of the local parametric form.

#### Qualitative Assessment

Recent work on entropy (and mutual information) estimation has shown that traditional plug-in estimators relying on a local uniformity assumption underperform under certain realistic conditions. Recent efforts have therefore proposed improved estimators that do not rely on local uniformity. This paper makes several novel and important contributions in that direction. First, it considers local likelihood density estimation (LLDE) with the exponential polynomial family and establishes a closed-form solution for polynomial degrees p = 1, 2. Remarkably, the results hold in arbitrary dimensions, which greatly improves on existing results (e.g., the practicality of LLDE in high dimensions was previously limited by the need to compute high-dimensional integrals). Second, the paper suggests a plug-in entropy estimator that uses LLDE with a data-dependent bandwidth. As their main theoretical contribution, the authors derive a closed-form expression for the bias term (resulting from the choice of local/adaptive bandwidth) and show that this term does not depend on the data distribution. The authors also show that the above result holds for a more general class of entropy estimators. Finally, the authors replicate the setup of the KSG estimator in their framework by using correlated bandwidths, and show that this yields an improved estimator of mutual information (a sketch of the classical KSG baseline follows below).

This is an excellent paper overall, which advances the state of the art in LLDE and entropy estimation. My main issue with the paper is the limited evaluation of the method against other approaches, such as those proposed in [1, 2, 11], which are known to significantly outperform the baselines considered by the authors. Also, all the experiments use low-dimensional examples.

Other comments:
- Line 251: "Mutual information estimators have been recently proposed in [1, 2, 12], which claim to solve similar local likelihood maximizations as ours." This language unnecessarily downplays the results of prior studies. While the solution provided here is better in certain ways, that does not invalidate the claims made in those studies.
- Eq. 23: Should be Z's instead of W's.
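Since the KSG estimator is the reference point here, a minimal sketch of the classical Kraskov-Stögbauer-Grassberger estimator may be useful (this is the standard baseline, not the paper's correlated-bandwidth variant; `ksg_mi` and its parameters are illustrative names):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """Classic KSG mutual information estimate in nats (algorithm 1 of
    Kraskov et al., 2004). x and y have shapes (n, dx) and (n, dy)."""
    n = x.shape[0]
    joint = np.hstack([x, y])
    tree_joint, tree_x, tree_y = cKDTree(joint), cKDTree(x), cKDTree(y)
    # max-norm distance to the k-th nearest neighbor in the joint space
    # (k + 1 because the query point itself is returned at distance 0)
    eps = tree_joint.query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # n_x, n_y: marginal neighbors strictly inside the joint k-NN ball;
    # shrink the radius slightly to emulate the strict inequality
    nx = [len(tree_x.query_ball_point(pt, r * (1 - 1e-10), p=np.inf)) - 1
          for pt, r in zip(x, eps)]
    ny = [len(tree_y.query_ball_point(pt, r * (1 - 1e-10), p=np.inf)) - 1
          for pt, r in zip(y, eps)]
    return (digamma(k) + digamma(n)
            - np.mean(digamma(np.array(nx) + 1) + digamma(np.array(ny) + 1)))
```

On, e.g., correlated Gaussian samples this converges to the analytic mutual information as n grows, which makes it a convenient baseline for the comparisons requested above.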

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 3

#### Summary

This paper studies the bandwidth selection problem for Shannon entropy estimation using local likelihood based density estimators. Based on an analysis of the asymptotic bias, the authors propose to use a kNN bandwidth selector instead of the more classical data-independent choice.

#### Qualitative Assessment

My main concern with this paper is that the theoretical analysis does not provide any explicit rate of convergence (either in terms of MSE or as a high-probability rate). This is a little surprising, since almost all of the other estimators in this literature come with strong guarantees on the convergence rate.

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 4

#### Summary

The authors study the estimation of entropy and mutual information from samples. Their approach combines geometric and kernel-based methods: the kernel bandwidth is chosen from fixed-k nearest-neighbor distances. Their new estimator has a bias that is universal in the sense that it is asymptotically independent of the underlying distribution; after subtracting this bias, the estimator becomes (asymptotically) unbiased. The authors provide extensive theoretical and practical results that support the validity and strength of the approach.

#### Qualitative Assessment

The paper is generally well written and contains interesting theoretical and experimental results. The idea of combining the two types of methods is nice and implemented well. The topic is certainly of interest to the NIPS community, and thus I believe that this contribution could be worthwhile to consider.

#### Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

### Reviewer 5

#### Summary

The paper studies nonparametric estimation of differential entropy and mutual information and proposes a family of estimators based on resubstitution of a local likelihood density estimate with a (fixed-k) k-nearest-neighbor bandwidth. This family includes the classic Kozachenko-Leonenko entropy estimator. Closed-form expressions for the estimators are derived for several cases of interest (Proposition 2.1). Results on the asymptotic distributions of k-NN distances (Lemma 3.1) are used to show that, after adding a bias correction (which can be estimated via Monte Carlo methods), the family of estimators is asymptotically unbiased (Theorem 2). In the particular case where the variables have strong functional relationships, it is suggested theoretically and empirically that the proposed estimators are a substantial improvement over the state of the art.
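As context for the family described above, here is a minimal sketch of the classic Kozachenko-Leonenko estimator that it contains as a special case (standard textbook form; `kl_entropy` is an illustrative name, not the paper's code):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=1):
    """Kozachenko-Leonenko differential entropy estimate in nats.
    x: array of shape (n, d) of i.i.d. samples."""
    n, d = x.shape
    # Euclidean distance from each sample to its k-th nearest neighbor
    # (k + 1 because the query point itself is returned at distance 0)
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # log volume of the d-dimensional Euclidean unit ball
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(rho))
```

As a sanity check, for samples from a standard d-dimensional Gaussian the estimate should approach the true entropy (d/2) * log(2 * pi * e).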

#### Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

### Reviewer 6

#### Summary

The authors propose a new class of estimators for Shannon entropy and mutual information (MI). The estimators are density plug-in estimators based on the empirical average of the log of density estimates at points drawn from the underlying distribution. For density estimation, the authors use local likelihood density estimators (LLDEs), which are higher-order generalizations of KDE and kNN density estimators.
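The plug-in construction described here, i.e., estimate the density and then average the negative log density over the samples, can be sketched as follows, with an ordinary fixed-bandwidth Gaussian KDE standing in for the paper's LLDE (illustrative only, not the authors' estimator):

```python
import numpy as np
from scipy.stats import gaussian_kde

def resubstitution_entropy(x):
    """Plug-in (resubstitution) entropy estimate in nats: the empirical
    average of -log f_hat evaluated at the samples themselves.
    x: array of shape (n, d); gaussian_kde expects shape (d, n)."""
    f_hat = gaussian_kde(x.T)
    return -np.mean(np.log(f_hat(x.T)))
```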