Sun Dec 8th through Sat Dec 14th, 2019, at the Vancouver Convention Center
EDIT: After reading the authors' rebuttal, I have changed my assessment of the paper to an accept.

The paper is well written and does a good job of explaining the intuition behind the proposed algorithm. I appreciated the inclusion of the low-dimensional toy example, as it illustrates the adaptability of the algorithm in a simple and clear manner.

My main concern with the proposed algorithm is that, in my opinion, it is suitable only for low-dimensional problems. The provided examples reinforce this impression, given that the posterior distributions sampled from are all of low dimension. Consequently, I have a hard time justifying the interest of the ML community in the proposed sampling algorithm, given its seemingly limited scope.

The authors acknowledge that Sequential Monte Carlo (SMC) algorithms partly served as an inspiration for the proposed algorithm, yet there is no comparison with SMC algorithms. Such a comparison seems warranted, since both approaches use a set of points to construct an empirical estimate of the target distribution. I am curious, and I presume I am not the only one, as to how the proposed algorithm stacks up against SMC algorithms in a high-dimensional setting.

I also feel that not enough attention is devoted to the case where the proposal distribution is restricted to a diagonal covariance matrix. If one wants to apply the proposed algorithm in a high-dimensional setting, this is the only viable option: we cannot afford to store the full covariance matrix of the proposal distribution in memory, and sampling from the proposal distribution would not be trivial even when it is Gaussian. It would be interesting to see and analyze how well the generated samples capture the covariance structure of the target distribution.
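To make the full-versus-diagonal covariance point concrete, here is an illustrative sketch (not the paper's algorithm; dimensions and variances are hypothetical) of why only a diagonal proposal remains practical in high dimension:

```python
import numpy as np

# In dimension d, a full-covariance Gaussian proposal needs O(d^2) memory
# and an O(d^3) Cholesky factorization before sampling, whereas a diagonal
# covariance needs only O(d) storage and O(d) per-draw cost.

rng = np.random.default_rng(0)
d = 1000  # hypothetical dimension for illustration

# Full covariance (impractical for large d):
#   cov_full = ...          # d*d floats in memory
#   L = np.linalg.cholesky(cov_full)   # O(d^3) factorization
#   proposal = current + L @ rng.standard_normal(d)

# Diagonal covariance: store only the d marginal variances.
diag_var = np.full(d, 0.5)  # hypothetical per-coordinate variances
current = np.zeros(d)       # current state of the chain

# Proposal draw: x' = x + sqrt(diag_var) * z, with z ~ N(0, I)
proposal = current + np.sqrt(diag_var) * rng.standard_normal(d)

print(proposal.shape)  # (1000,)
```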
Two smaller issues I encountered while reading the paper. First, there are far too many citations: 62 for an 8-page article. Whenever there are multiple sources for an algorithm, please cite only the most representative one. Second, the text in Figures 1 and 2 is barely legible when the article is printed; it would be really great to increase the text size.
Originality: The specific algorithm and ergodicity results are novel (however, as pointed out by the authors, it is a special case of the sequential substitution framework proposed in [11, 12]).

Quality/clarity: In my view, the manuscript is well written and the methodology is presented in a clear and sufficiently rigorous manner.

Post-rebuttal edit:
-------------------------------------------------------------------------------
The authors' rebuttal provides further (empirical) support for the idea that the proposed MCMC kernel can work well even in higher dimensions. Furthermore, as already mentioned in my review, the presentation of the methodology in the paper is clear and there is a substantial amount of theoretical support; the same cannot be said for most other NeurIPS submissions I have seen this year. As a result, I am happy to raise my recommendation from 6 to 7.
The method is interesting, the paper is well written, and it appears technically sound. Perhaps the contribution is somewhat incremental and the degree of novelty is not very high; however, the review of the state of the art is thorough and the paper is easy to read. I believe the literature also needs papers that study previous work in detail in order to derive important and necessary variants as new algorithms. I have some suggestions for completing the related references, clarifying some points, and possible future work.

- I believe that your proposal could itself be a mixture of densities. In this sense, you could combine your work with the ideas in Cappé et al., "Adaptive Importance Sampling in General Mixture Classes", Statistics and Computing, 2008, or, better suited to MCMC, Luengo and Martino, "Fully Adaptive Gaussian Mixture Metropolis-Hastings Algorithm", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver (Canada), 2013. Please discuss.

- Another possible approach for using a mixture is the nice idea in G. R. Warnes, "The Normal Kernel Coupler: An Adaptive Markov Chain Monte Carlo Method for Efficiently Sampling from Multi-Modal Distributions", Technical Report, 2001, which you already cite. In any case, even setting aside the possible use of a mixture, the discussion of the Normal Kernel Coupler (NKC) should be extended, since this method also replaces one sample within a population (one per iteration); there are thus several connections with the SA-MCMC method. Please discuss the relationships and differences. In this vein, I also suggest having a look at the recent reference F. Llorente et al., "Parallel Metropolis-Hastings Coupler", IEEE Signal Processing Letters, 2019, which combines OMCMC and NKC. It could also be nice to discuss it as possible future work (an extension) of your paper.
These discussions could substantially improve the paper and make it a very complete and nice piece of work. I also suggest uploading the revised version to arXiv and/or ResearchGate to increase its impact.