NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
Paper ID: 7786
Title:The Implicit Metropolis-Hastings Algorithm

Reviewer 1


- L109 "These properties imply convergence of the Markov chain defined by t to some stationary distribution": in general, irreducibility and aperiodicity does not imply the existence of a stationary distribution (eg. a standard Random walk). - the TV metric does not seem like a good metric for measuring convergence in very high-dimensional spaces such as probability distributions on the set of natural images (where GANs are usually leveraged). - assumption in Section 3.1 are terribly restrictive (especially in high-dim settings) and are not likely to be satisfied in any realistic scenarios where GANs are usually used. - the paper seems to use the languages of convergence of Markov Chain to motivate an additional "post-processing" step on top of a standard GAN / generative process. This is definitely interesting, although the derivation through minoration conditions / cvg of MC is not that convincing. - Although the derivation is not convincing, the numerical seem to indicate that the resulting methodology is worthwhile.

Reviewer 2


1. The paper proposes the implicit MH algorithm, which can be used to improve the performance of implicit models like GANs. The proposed method can be applied to different models in practice. Compared to some previous work, the proposed method doesn't need optimality of the discriminator.
2. The paper provides some analysis of how to minimize the total variation distance between the stationary distribution of the chain and the target distribution (see the note after this list), and of how the different objective functions are derived. It also shows some relationships between the proposed method and previous work.
3. The paper considers both Markov and independent proposals, and shows how the proposed method can be applied in these scenarios with different objective functions.
4. The motivation is not very clear. The setting seems to suggest applications for sample generation based on a training dataset, similar to the GAN setting. Although the experiments show some interesting results, the metric results (FID, IS) are not comparable to some of the recent models (MMD-GAN, SMMD-GAN, Sobolev-GAN, etc.). I think comparisons with some state-of-the-art models are needed; baselines for comparison in all the cases are also needed.
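For reference on point 2, the total variation distance in question is the standard one between the chain's stationary distribution q and the target (data) distribution p; the paper's specific bound, which involves the learned discriminator, is not reproduced here.

```latex
% Total variation distance between the stationary distribution q and the target p:
\[
  \mathrm{TV}(p, q) \;=\; \sup_{A} \bigl|\, p(A) - q(A) \,\bigr|
  \;=\; \tfrac{1}{2} \int \bigl|\, p(x) - q(x) \,\bigr| \, dx .
\]
```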

Reviewer 3


This paper introduces an algorithm for performing Metropolis-Hastings sampling with implicit distributions. The paper is very well written, and the more difficult theoretical parts are explained very clearly. I particularly found the bound computed without the use of an optimal discriminator to be novel and interesting. I consider it a useful contribution to the field of deep generative modeling. In Section 4.1, are the models pretrained with their original objectives? VAEs (as proposed) are also not implicit models, and you can actually generate "correct" samples from them with annealed importance sampling. Would these samples have a 100% accept rate? In general, the experimental setup could be explained in more detail. For example, do you train an encoder for the VAE?