NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:2555
Title:Statistical bounds for entropic optimal transport: sample complexity and the central limit theorem

Reviewer 1

This paper provides several improved results on statistical qualities of entropic optimal transport. Entropic OT has been a heated area of research for the past several years. This paper provides improved theoretical understanding for entropic OT from a statistical point of view, improving upon previous bounds and providing a central limit theorem for entropic OT for subgaussian random variables. Previous CLT was known only for for probability measures supported on a finite set of points. Originality: Provides fresh theoretical results in sample complexity and limiting distribution that improve upon previously known results. Quality: The problems are well-motivated and placed in context. Theoretical claims are supported by proofs. Simulation studies assess empirical performance. Clarity: The paper is well-written and clear. Significance: This paper provides improved statistical understanding of entropic OT, with a highly improved sample complexity bound, and more broadly applicable CLT. *** I thank the authors for their careful reading of our reviews and for addressing my concerns.

Reviewer 2

AFTER REBUTTAL I thank the authors for their careful consideration of our remarks, I believe the paper will benefit from all these comments and am confident the authors will be able to improve it accordingly. ____________________________________________ The theoretical contributions of this paper for entropy-regularized OT are : - getting rid of a bad exponential factor in the previous sample complexity bound by Genevay et al - extending that result to unbounded domains for sub-gaussian distributions - a CLT for entropy-regularized OT, based on the proof for the standard case. The theoretical part is overall clear and well written, although: - briefly introducing F_sigma would improve overall readability (e.g introducing it as the set of functions with bounded successive derivatives) - it would be nice to state more clearly (in the abstract or intro) which additional hypotheses you use to improve existing results (subgaussianity + quadratic cost hypothesis). I believe it is not essential to get rid of the exponential bound, but required for the extension to unbounded domains? A couple of typos: - line 136 : should be v instead of u in the second term - line 140 : should be v instead of u - line 166 : theorem 3 instead of 2 Regarding the application to entropy estimation, I feel that it would benefit from a little rewriting and some clarifications in the experimental section. - prop 3 : what is Z_g? I an guessing the normalization constant for phi_g. - equation 13 : I understand here that S is the entropic OT where the cost function is c(x,y) = g(x-y) but it should be stated more clearly (maybe introduce S_c to insist on the dependence of S on the cost function). In the experiments, in figure 1, should the legend read epsilon instead of sigma_g? I am confused.

Reviewer 3

Typo: Just below (2) `defiend' is written instead of `defined'. There are many places in which the notation c(x,y) is used without it being defined, for example in (3). By comparing with Proposition 3 much later in the paper it seems that the authors intend c(x,y)=\|x-y\|^2/2, but this is not at all clear earlier in the paper. At the top of page 3 the authors state that the Wasserstein distance W_2^2(P_n,Q_n) converges to W_2^2(P,Q) no faster than n^{-1/d}. A citation should be provided. There is no discussion in the paper of how close S_\epsilon is to W_2^2, or which values of \epsilon one is interested in in practice. Is the idea to choose \epsilon small in order to estimate W_2? In this case I would have liked to see some bounds on |S_\epsilon - W_2^2|. Page 5, line 139. The authors write `Applying the bound on E L^2...', but no such bound has been established. Do they mean the bound on E L given in Proposition 2? This would suffice. It is not made clear what the functions f and g are in the statement of Theorem 3. I guess they are the optimal potentials from (4), but this should be stated. On page 6 the authors write that the proof of Theorem 2 has two main steps, and they label these steps (a) and (c). Is there a step (b) that is missing? Or did they mean to write (a) and (b)? My main concern is with the proof of Proposition 3 that provides the link between EOT and entropy estimation. It is written that \nu is assumed to be a translation invariant measure. In the proof, \beta is a probability measure which, in line 195, is taken to be \nu, and so it must be the case that \nu can be taken to be a probability measure. However, it seems to me that the only translation invariant measure on \mathbb{R}^d is given by the Lebesgue measure (up to a scale factor), which cannot be normalized to be a probability measure on all of \mathbb{R}^d. Therefore, it seems necessary to take \nu to be compactly supported, which is not what the authors are focussing their attention on, as they claim that there results only require bounds on the subgaussian constant, and not on the support. The proof of Propostion 3 thus seems to be incomplete. ========================================================== UPDATE (after author rebuttal): The authors' rebuttal addressed most of my concerns. In reply to the rebuttal, I will just add that Theorem 4 on entropy estimation here does not show how the bounds depend on the dimension d. If you want to discuss the behaviour of the entropy estimator when d is large then the bounds should show this dependence. I have now changed the overall score.