NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 4419 Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices

### Reviewer 1

Summary: This paper analyzes both Langevin dynamics (LD) and unadjusted Langevin algorithm (ULA) for sampling a target distribution When the target distribution is smooth and satisfies a log-Sobolev inequality (LSI), the authors show exponential convergence of LD to the target distribution in KL-divergence (Theorem 1) and then extends convergence bound to ULA with an additional error term (Theorem 2) by first proving Langevin dynamics (LD), . The author extend this result to convergence in Renyi-divergence (Theorems 3 + 4 for LD and ULA respectively), under the additional assumptions that perturbations of the target satisfy LSI (Lemma 8) and the existence of a growth function $g$ for ULA. The authors also provide a convergence results when the target distribution satisfies a relaxation of LSI (Poincare inequality) in the Supplement. Quality: The paper appears to be technically sound. To complement the theoretical convergence bounds, although the toy Gaussian examples are nice illustrations, the paper lacks (synthetic) experiments to complement the theoretical results. The paper does not (empirically) demonstrate the tightness (or lack of tightness) of the theoretical convergence bounds. The paper does not provide an example that is not strongly log-concave but satisfies LSI. The convergence bounds for ULA for Renyi divergence (Theorem 4 and Section F.2 for targets satisfying LSI and Poincare inequality respectively) rely on an unknown growth function (line 248). An example is provide for a Gaussian target, but it is not clear to me that this exists in general (for any target distribution satisfying LSI). Originality: To the best of my knowledge, analyzing the convergence of ULA for target distributions under LSI is novel as is measuring convergence in Renyi-divergence. The proof of the Theorems and Lemmas in the Supplement are elegant and easy to follow. Appropriate references to related work and comparisons to previous contributions appear to be included. Clarity: This paper is very well written and excellently organized. Really nice work. Although lines (66-72) provide background on what Renyi divergence is, it could be made more clear what the advantages and disadvantages Renyi divergence offers compared to KL-divergence (for q > 1). Significance: The main contribution of this paper is generalizing the convergence results of ULA beyond log-concave targets. This paper generalizes to target distributions satisfying LSI. Section 2.2 provides a compelling case for the reasonableness of the LSI condition - it is indeed a wider class of distributions and includes practical. These results are of interest to the stochastic gradient Langevin dynamics (SGLD) literature, as they could be extended to SGLD by handling additional noise in the gradient. This is highly significant, because SGLD (and it variants) are commonly used in practice to sample from non log-concave distributions. The significance of the results for Renyi divergence is more difficult to evaluate. I am not sure what additional benefits Renyi divergence offers (compared to KL-divergence) and the additional assumptions (Lemma 8) and existence of an unknown growth function are more difficult to justify. However, I believe the significance of the first contribution is enough for me to support acceptance. Typos / Minor Comments: -Line 29 could change "L^2" to "L^2-space" to avoid confusion with "L" used for smoothness. -Line 441 "Hessian" -> "Laplacian" === Update based on Author Feedback === Thank you for providing additional motivation for why Renyi divergence may be of practical interest over KL divergence and how the growth function only needs to be a rough initial estimate of the asymptotic bias. Although the toy examples of nonconvex functions that satisfy LSI (in the author feedback) are nice, I still wish that there was a clear motivating practical example (perhaps for Bayesian inference using the Holley-Stroock perturbation theorem, where the prior satisfies a LSI, the loglikelihood is bounded, and the target posterior distribution is not logconcave but will still satisfy LSI).

### Reviewer 2

Summary: Authors study the convergence of unadjusted Langevin algorithm (ULA) which is Euler discretization of overdamped Langevin dynamics. They establish rates in KL divergence and Renyi divergence by only assuming a log-Sobolev inequality and smoothness. The results are significant and paper is well-written. My main concern is that there is no practical demonstration (examples or experiments) of the usefulness of the established results. See my further questions/comments below. ** Major comments: - Authors start their algorithm at N(x_*, . ) where x_* is a first order critical point of the potential function. They claim that x_* can be found via gradient descent. However, in practice they can only find a point that is arbitrarily close to a first-order critical point x_*. Would your final bound change after taking this into account? Same question for line 187. - The class of distributions that satisfy LSI is larger than class of logconcave distributions. However, authors do not provide an example of such a distribution that is used in practice (satisfies LSI but not strong convexity). An example of this sort would further motivate the reader and demonstrate the significance of the established results. - What happens to the upper bounds when you start the algorithm at a deterministic point? - LSI implies (by Talagrand inequality) exponential decay in Wasserstein-2 metric. In the case of exponential contraction in W2, this implies back strong convexity. Is it because you only have exponential decay rather than contraction that your results are more general than strong convexity? ** Minor comments: - I haven't noticed a single typo in the paper. -- I would like to thank the authors for answering my questions. I like the results of this paper; however, I will keep my score (see major concerns above).