NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:2244
Title:Practical and Consistent Estimation of f-Divergences

Reviewer 1

The proposed method has room for improvement in several respects. For practical use of the RAM-MC estimator, \hat{q}_N(z) must be known; this assumption seems restrictive. Several sampling methods work without known density functions, so an estimator that does not require knowledge of the density functions should be developed. The theoretical analysis could also be improved: since estimating the density functions is a main source of error from both theoretical and practical standpoints, the error of the estimator should be investigated for the case where the density functions are unknown.
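
The known-density concern can be pictured with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, a one-dimensional setting in which \hat{q}_N(z) is a uniform mixture of N Gaussian components, the reference distribution is a standard normal, and the f-divergence is the KL divergence. The function names are hypothetical. The point is that each Monte Carlo term evaluates the mixture density in closed form, which is what the assumption requires.

```python
import math
import random


def log_normal_pdf(z, mean, std):
    """Log density of a one-dimensional Gaussian N(mean, std^2) at z."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((z - mean) / std) ** 2


def log_mixture_pdf(z, means, stds):
    """Log density at z of the uniform mixture (1/N) * sum_i N(means[i], stds[i]^2).

    Every component density must be available in closed form (assumption).
    """
    logs = [log_normal_pdf(z, m, s) for m, s in zip(means, stds)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs)) - math.log(len(logs))


def mc_kl_to_standard_normal(means, stds, num_samples, seed=0):
    """Monte Carlo estimate of KL(mixture || N(0, 1)) from samples of the mixture."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        i = rng.randrange(len(means))      # pick a mixture component uniformly
        z = rng.gauss(means[i], stds[i])   # sample from that component
        # The log density ratio needs the mixture density at z, not just samples.
        total += log_mixture_pdf(z, means, stds) - log_normal_pdf(z, 0.0, 1.0)
    return total / num_samples
```

If only samples were available and `log_mixture_pdf` could not be evaluated, the ratio inside the loop would itself have to be estimated, which is the case the reviewer asks the analysis to cover.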

Reviewer 2

This paper mainly presents theoretical convergence results for MC-based variational estimators of f-divergences. To the best of my knowledge, the presented theoretical convergence analysis of the f-divergence estimators is novel. The results give comprehensive rates of convergence for both the expectation of the estimator and its MC-estimator, for different f-functions. This convergence result gives theoretical justification to several well-established methods that can be interpreted as special cases of this work. The paper is well structured and very accessible, and it provides both high-level proof sketches and exhaustive proofs in the supplementary material. The work is in general well motivated and provides practical context for the theoretical results. As a minor drawback, the motivation and context seem to focus quite strongly on methods related to variational inference, while estimators of f-divergences are of interest in a much broader field as well. The limitations of the proposed results are acknowledged (such as constants that may still depend on the dimensionality, but not in the form N^{-d}).

Reviewer 3

The paper considers the important problem of estimating f-divergences in a scenario that has many potential applications.

Pros:
- The proposed estimator is simple to understand and implement.
- The theoretical analysis is complete, and many cases are examined.
- The simulations cover many cases, include both real and simulated data, and compare against other methods.

Cons:
- Some notation in the paper is confusing (e.g., on line 59, q() has multiple definitions; please use proper footnotes to distinguish them).
- The N=1 setting in the simulation does not seem necessary. What is its purpose? The other methods are all run with N=500.
- An extra table comparing the convergence rate of the proposed method with that of the existing methods would be very helpful.

Overall, this is a good contribution to the venue, and I recommend that the paper be included.

Edit: I thank the authors for the rebuttal and appreciate the specific replies. I will keep my score as accept.