NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID: 6422
Title: Prediction of Spatial Point Processes: Regularized Method with Out-of-Sample Guarantees

Reviewer 1

The paper under review introduces a new inference methodology for estimating the intensity of spatial point processes, for which the authors derive confidence intervals with guaranteed out-of-sample performance. The approach combines a discretised Poisson process with the conformal prediction framework to obtain point-wise confidence intervals. The authors propose an efficient inference procedure based on a majorization-minimization algorithm. The approach is illustrated by simulations and applied to two different data sets. The paper is very well written, but some imprecision in the notation sometimes makes the exposition difficult to follow; for instance, it is good practice to denote random variables with upper-case letters. This work is original and of good quality. The reviewer has the following comments.

Minor points:
- l.3-4: the authors mention a tuning-free regularized criterion, but the choice of $\gamma$ is never discussed and appears arbitrary. Remove this claim or give further explanation.
- l.27: `a spatially varying intensity interval': the formulation is awkward. Perhaps write `an inference procedure for the spatial intensity of a point process'. Inference directly implies that you derive not only a point estimate but also confidence intervals.
- l.29-35: the choice of a bullet list is strange, as only the first bullet is a contribution; the other three are clarifications/details of this contribution.
- l.39: NotationS
- Figure 1 is really misleading, as it shows an example where the estimator does not rely on a discretization of the space, which makes the presentation confusing. It should be changed.
- l.44: the index $r$ is missing from the union. Isn't $\lambda(x)$ the expected number of events? Is $y$ an event or the number of events? $Y$ is not defined (maximum number of events?).
- l.47: further details about the motivation for partitioning should be given (more than `it is usual practice').
- Equation (3): I don't really understand the notation. Shouldn't it be $\Pr\{y \in \Lambda(x)\} > 1 - \alpha$ for all $x \in \chi$, as suggested by Algorithm 1?
- l.64: $r$ is an index and not a region.
- l.65: what do you mean by `free to vary $\tilde{y}$'?
- Algorithm 1: in step 3, shouldn't it be `find the $r$ corresponding to $x$'? In step 5, do you accumulate the scores with those from previous values of $\tilde{y}$?
- Equation 5: in the KL divergence, there is no $n$. The empirical estimator of the KL divergence should be introduced later. Alternatively, replace it by the log-likelihood function ($p(y|r)$ is not accessible in practice).
- l.88-89: $\Phi$ is not defined in the equation.
- Equation 6: now it should be the log-likelihood function.
- Equation 7: $\gamma$ is not defined.
- l.102: why do you restrict $\gamma$ to $(0, 1/2)$? Can you comment on it?
- l.107+: `we then obtain..': is this the estimator you obtain with this particular choice of weights? If so, the new $\hat{\theta}$ might benefit from a new notation to stress the difference.
- l.110: $p(y|r)$ is not known, so it is not possible to compute the empirical divergence.
- Algorithm 2: why not use `while' and specify the exact convergence criterion?
- Section 4.1: it might be worth mentioning that $Y = 10$.
- Section 4.1: would it be possible to report (in the Appendix) the empirical coverage rate? As this is a simulation study, it can be easily computed and would be highly pertinent. Indeed, your methodology might easily have reached $100\%$ (too conservative), while the likelihood approach might have a lower but still good coverage. Those tables would help to understand the differences.
- Figure 2 a): can you explain why the likelihood approach yields wider confidence intervals on the left side of the interval? This sounds counter-intuitive, as your method ought to be more conservative.
- Figure 2 b): the bad performance of the likelihood approach seems to be caused here by insufficiently strong regularization or a bad choice of prior.
- Section 4.2: what is the value of $Y$ and how did you choose it? Does it impact the results?
- l.204-205: from the simulation study we cannot say that your methodology yields more informative intervals. Indeed, an interval with $100\%$ coverage is not informative. So either give the empirical coverage numbers of the simulation study to show that this is the case in this setting, or simply reformulate as `leads to intervals with guaranteed out-of-sample coverage level'.
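
The coverage check requested for Section 4.1 could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the per-region intensities, the sample size, the two interval constructions, and the helper names `empirical_coverage` and `mean_width` are all hypothetical, and only $Y = 10$ is taken from the comments above. It illustrates why a coverage figure should be read together with interval width: the trivial interval $[0, Y]$ reaches full coverage yet carries no information.

```python
import numpy as np


def empirical_coverage(counts, lower, upper):
    """Fraction of held-out counts that fall inside their [lower, upper] interval."""
    counts, lower, upper = (np.asarray(a) for a in (counts, lower, upper))
    return float(np.mean((counts >= lower) & (counts <= upper)))


def mean_width(lower, upper):
    """Average interval width; a smaller value means a more informative interval."""
    return float(np.mean(np.asarray(upper) - np.asarray(lower)))


# Hypothetical simulation setup (assumed values, not taken from the paper).
rng = np.random.default_rng(0)
intensity = np.array([0.5, 2.0, 4.0, 1.0])  # assumed per-region intensities
Y = 10                                       # maximum count, as in Section 4.1
n_test = 5000

regions = rng.integers(0, len(intensity), size=n_test)
counts = np.minimum(rng.poisson(intensity[regions]), Y)

# Trivial interval [0, Y] for every test point: full coverage, maximal width.
trivial_lower = np.zeros(n_test)
trivial_upper = np.full(n_test, Y)

# A narrower, purely illustrative interval: intensity +/- 2 standard deviations
# (the Poisson standard deviation is sqrt(intensity)), clipped to [0, Y].
sd = np.sqrt(intensity[regions])
narrow_lower = np.clip(intensity[regions] - 2 * sd, 0, Y)
narrow_upper = np.clip(intensity[regions] + 2 * sd, 0, Y)

print("trivial:", empirical_coverage(counts, trivial_lower, trivial_upper),
      mean_width(trivial_lower, trivial_upper))
print("narrow: ", empirical_coverage(counts, narrow_lower, narrow_upper),
      mean_width(narrow_lower, narrow_upper))
```

Reporting both numbers per method, side by side, would make it possible to tell a conservative interval from an informative one.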

Reviewer 2

This paper presents a regularized method for spatial point processes that infers predictive intensity intervals. The intensity interval is constructed using a spatial Poisson model with provable out-of-sample accuracy. The method is demonstrated using both synthetic and real spatial data. In this work, the intensity interval is developed using the conformal prediction framework, and it comes with a provable out-of-sample prediction performance guarantee. My major concern is the practicality of the proposed method. Where data are missing, the interval estimate is relatively useful in the interpolated area. It is not surprising that the intensity interval grows drastically in the extrapolated area (see the right side of Figure 1). Also, regarding the learning criterion (7), the first term (log-likelihood) is proportional to $n^{-1}$, and the second term (regularization) is proportional to $n^{-\gamma}$, where $\gamma$ is close to 0.5. Hence the two terms look somewhat imbalanced. It would be helpful if the proposed method were tested on more datasets and explained in more detail. It would also be helpful to compare the proposed method with state-of-the-art methods besides LGCP. Moreover, it would be useful to discuss the computational complexity of the proposed method.
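
To make the scaling concern concrete, the short sketch below compares the two factors mentioned in connection with criterion (7), $n^{-1}$ and $n^{-\gamma}$, for a few sample sizes. It uses only those two factors: the function name `scaling_factors` and the chosen values of $n$ and $\gamma$ are illustrative assumptions, and the actual form of criterion (7) is not reproduced here.

```python
def scaling_factors(n, gamma):
    """Return (n**-1, n**-gamma, ratio) for a sample size n and exponent gamma."""
    likelihood_factor = n ** -1.0
    regularization_factor = n ** -gamma
    # The ratio simplifies to n**(1 - gamma), so it grows with n when gamma < 1.
    ratio = regularization_factor / likelihood_factor
    return likelihood_factor, regularization_factor, ratio


# Illustrative values only: gamma = 0.5 and three assumed sample sizes.
for n in (100, 10_000, 1_000_000):
    lik, reg, ratio = scaling_factors(n, gamma=0.5)
    print(f"n={n:>9,d}  n^-1={lik:.1e}  n^-gamma={reg:.1e}  ratio={ratio:.1f}")
```

Whether this gap matters in practice depends on how the likelihood term itself scales with $n$, which is a point the authors could clarify.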

Reviewer 3

I think the application of conformal prediction to spatial point processes within the ML community is interesting, particularly as most recent work in this area has taken on a Bayesian flavor, and we lack sufficient rigor in evaluating the accuracy of uncertainty quantification. I do think that it could be made a little clearer that the intervals require a partitioning of the domain, and I would be interested to understand how this partitioning affects the performance (and the bound). Is improper discretization covered under the model-misspecification guarantee? In general, I think the submission is reasonably well written, novel (to the best of my knowledge), and would be of interest to those in the community who work on point processes.
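
The effect of the partitioning could be explored numerically. The sketch below is a hypothetical one-dimensional illustration, not the authors' procedure: it bins the same simulated events into equal-width regions at several resolutions and prints the counts per region, which is the kind of quantity a discretised Poisson model works with. The domain $[0, 1)$, the intensity function, and the helper name `region_counts` are assumptions made for the example.

```python
import numpy as np


def region_counts(points, n_regions, low=0.0, high=1.0):
    """Count events in each of n_regions equal-width cells of [low, high]."""
    edges = np.linspace(low, high, n_regions + 1)
    counts, _ = np.histogram(points, bins=edges)
    return counts


def intensity(x, lam_max):
    """Hypothetical intensity on [0, 1): increases quadratically up to lam_max."""
    return lam_max * x ** 2


# Simulate an inhomogeneous Poisson process on [0, 1) by thinning:
# draw a homogeneous process with rate lam_max, then keep each candidate
# with probability intensity(x) / lam_max.
rng = np.random.default_rng(1)
lam_max = 200.0
n_candidates = rng.poisson(lam_max)
candidates = rng.uniform(0.0, 1.0, size=n_candidates)
keep = rng.uniform(0.0, lam_max, size=n_candidates) < intensity(candidates, lam_max)
events = candidates[keep]

# Same events, three assumed partition resolutions.
for n_regions in (2, 5, 20):
    print(n_regions, region_counts(events, n_regions).tolist())
```

Coarser partitions pool more events per region but blur spatial variation, while finer ones do the reverse; how the coverage guarantee and the interval widths respond to this choice is the question raised above.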