NeurIPS 2020
### Noise-Contrastive Estimation for Multivariate Point Processes

### Meta Review

The paper derives a new estimation method for multivariate point processes that is based on the 'ranking' variant of NCE. The paper is borderline: two reviewers think that the difference from previous work by Gao (who use NCE to estimate point processes) and the empirical comparison are not sufficient. Two other reviewers disagree, with one in particular arguing that the paper should be accepted.
The meta-reviewer thinks that the theory in the paper is sufficiently different from Gao's work, and that the theoretical aspects of the paper are deeper and more rigorous. The results do not follow directly from previous work by Gutmann & Hyvarinen (2012) or Ma & Collins (2018).
The empirical results are good and the method should be useful in practice. Moreover, the additional results provided in the rebuttal compellingly demonstrate the advantage over previous work.
For these reasons, the meta-reviewer is in favour of accepting the paper, requiring, however, that (using the additional space available for camera-ready papers):
- empirical comparisons to Gao's work and the least-squares estimator are added to the examples considered in the paper (much like the rebuttal, but with a detailed description of the setup and tuning parameters used)
- the discussion of related work, in particular the differences to Gao's work, is expanded
- the figures are presented such that all labels are legible.
(Of course, the changes promised in the rebuttal need to be implemented and the reviewers' comments taken into account when revising the paper.)
Additional comments:
- The sentence "Our method is a version of noise-contrastive estimation (NCE), which was originally developed for softmax distributions such as language models." is wrong. NCE is a general estimation principle that was developed for unnormalised (energy-based) models. First applications were in natural image statistics.
- Appendix C2 discusses the idea of using a model previously learned with NCE as the noise distribution. Please note and acknowledge that this has already been considered in the original NCE paper in 2010.