NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:6160
Title:Exact sampling of determinantal point processes with sublinear time preprocessing

Reviewer 1

Originality. This work provides a novel algorithm for sampling from a DPP, showing empirically that it is a fast algorithm and also providing a strong technical analysis. Quality & clarity. This paper is well-organized and the content is technically solid. Significance. Algorithms for fast exact sampling from DPPs are of interest to the machine learning community to enforce diversity or to perform batch Bayesian optimization. I think many researchers will be benefited from the contributions of this manuscript.

Reviewer 2

Review of "Exact sampling of determinantal point processes with sublinear time preprocessing." Summary: the paper proposes a new algorithm for exact sampling from determinantal point processes (DPP). A DPP sampling problem involves an n x n matrix L; the probability of selecting some subset S of k <= n indices is given by the determinant of a k x k matrix subset divided by the determinant of L + I. The proposed algorithm has preprocessing which is sublinear in matrix size n and polynomial in subset size k; sampling is polynomial in k and independent of n. Previous algorithms which require eigen-decomposition or MCMC are O(n^3). The main idea is to downsample the index set using a regularzed DPP and then run a DPP on this downsample. Algorithms for both fixed and random subset size k are provided. Section 2 gives details of the proposed algorithm, and Section 3 analyzes conditions for fast sampling for random subset size k. Section 4 gives a modified algorithm for fixed subset size k, and proves that it is efficient to simply reject samples which are not size k. Section 5 provides empirical evidence that the proposed algorithm is faster than MCMC and exact baselines, for subsets of MNIST. Strengths: Theoretical results prove the speed and accuracy of the proposed sampling algorithm. Some empirical results proving that the proposed algorithm is empirically faster than previous baselines. Weaknesses: It is claimed that DPP is a "tool in machine learning and recommender systems for inducing diversity in subset selection and as a variance reduction approach" but it is not clear that this paper is relevant for our field. "Determinantal Point Processes for Machine Learning" by Alex Kulesza and Ben Taskar is cited but experimental results measuring empirical accuracy on several data sets should be provided. Can you do computational cross-validation experiments to measure the test error of your algo versus other baselines? That would make your contribution much more convincing. Also is there some metric (other than test error) for measuring the empirical accuracy of your sampling algo, relative to baselines? It would be much more convincing to see some empirical demonstration of the empirical accuracy of your "exact" algorithm, relative to previous exact and inexact algos. Comments/Suggestions: Although the text is well-written in general, the proofs are difficult to follow, because they have multiple lines of math without corresponding comments. It would help to have a line of commentary/explanation for each line of math in your proof. Line 203: Le -> Let. Table 1 is a very informative summary of the time complexity of previously proposed algorithms. The main result (Theorem 1) in Section 1 is somewhat unusual, and redundant with the statements in the text/abstract. Perhaps delete or move later in the paper? It is good that error bands over 10 runs are shown in Figure 1. However linear axes are used, which makes it difficult to see differences between algorithms -- it is not clear whether or not the bottom line is linear or constant. Log-log axes would clarify. Also the orange/blue labels should be changed in either the left or the right plot, because the legend entries are inconsistent, but could be consistent (e.g. change vfx_resample to blue on the right, and change exact_resample to red). On lines 54--55, what is \bar S and \tilde S? They are not defined.

Reviewer 3

This paper studies the problem studying complexity of sampling from of a detrimental point process and proposes efficient algorithms significantly improving state-of-the-art. The paper is very well written. The comment I have is regarding the readability of the paper: may be discussing a use-case and some motivation early in the introduction will make this paper more accessible to a non-expert audience. It may also be useful if the authors discuss other more efficient repulsive point processes based on Poisson Disk Sampling (PDS) methods: 1)Zhang, Cheng, et al. "Active mini-batch sampling using repulsive point processes." arXiv preprint arXiv:1804.02772 (2018). 2)Kailkhura, Bhavya, et al. "A spectral approach for the design of experiments: Design, analysis and algorithms." The Journal of Machine Learning Research 19.1 (2018): 1214-1259. ************************ Response to the rebuttal**************************** I thank the authors for their rebuttal. I am happy with the authors’ response on my suggested changes—motivation and citations. However, I will also encourage authors to include at least one experimental comparison with DPP on the real data.