__ Summary and Contributions__: Please see the comments to the authors section

__ Strengths__: Please see the comments to the authors section

__ Weaknesses__: Please see the comments to the authors section

__ Correctness__: Please see the comments to the authors section

__ Clarity__: Please see the comments to the authors section

__ Relation to Prior Work__: Please see the comments to the authors section

__ Reproducibility__: Yes

__ Additional Feedback__: The paper studies the problem of designing payment rules for acquiring data. The difficulty is that the procurer has no way of directly verifying the accuracy of the procured data. The goal is to design a payment scheme such that, once the procurer announces it, the data providers are not incentivized to misreport their data.
The problem is very close to the peer prediction literature, which suffers from the same inability to directly verify accuracy. The peer prediction literature has seen a lot of activity recently, and one of its main ideas is to pay based on the pointwise mutual information (PMI) of two datasets. However, directly applying the log-PMI rule developed previously runs into problems of budget infeasibility and violations of individual rationality. The paper adds a normalization to the log-PMI score suggested by prior work --- the normalization is not trivial, but not too involved either. The paper argues that the mechanism is not just truthful, but that truthful reporting is essentially the only utility-maximizing strategy, something the authors formalize as sensitivity. The paper also studies the repeated setting, where data is repeatedly generated, and applies the same normalization idea there.
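To make the concern concrete, here is a minimal sketch (function names and the affine normalization are my own illustration, not the paper's construction) of a log-PMI score and a normalization that maps it into a nonnegative, per-agent budget share, assuming known bounds on the achievable scores:

```python
import numpy as np

# Log pointwise mutual information of two reports: compares the joint
# probability against the product of marginals, log p(D_i, D_j) / (p(D_i) p(D_j)).
def log_pmi(p_joint, p_i, p_j):
    return np.log(p_joint / (p_i * p_j))

# Raw log-PMI can be negative (violating individual rationality) and
# unbounded (violating a total budget B). Given bounds [lo, hi] on the
# achievable scores, an affine map yields payments in [0, B / n_agents].
def normalized_payment(score, lo, hi, budget, n_agents):
    return (score - lo) / (hi - lo) * budget / n_agents
```

The paper's point, as I read it, is precisely that such bounds `[lo, hi]` must be derivable from the distribution for this to work in the single-shot setting; the repeated-elicitation trick removes that requirement.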
The paper is very well written, in both the introduction and the technical sections. The comparison to prior work is quite clear. The degree of technical advance over prior work may not be sufficient for NeurIPS --- this is a paper that I would like to see published, but it maybe doesn't yet clear the NeurIPS bar.
[After author rebuttal]: Thanks for the response. I read it and updated my score.

__ Summary and Contributions__: This paper examines the problem of incentivizing the truthful revelation of data for machine learning. The work builds heavily on prior work [14,15] which establishes a way to do so via peer prediction. In doing so it makes three contributions. First, a way to transform the payments is given so that they are not just truthful but also IR, budget-fixed, and symmetric if the distribution is such that the range of possible payments can be bounded. Second, the transformed payments are shown to satisfy a notion of sensitivity, which means that truthful reporting is strictly better than any misreport that damages the learning from the combined data sets. Third, a trick is introduced to use repeated elicitation to eliminate the requirement that bounds on the range of payments can be derived from the distribution.

__ Strengths__: While the first contribution seems to rely on standard techniques, the second and third contributions seem to be quite novel and interesting. The notion of sensitivity seems a natural desideratum analogous to the concept of strict properness in the scoring rules literature. The third seems a useful relaxation of the technical requirement as well as possibly of interest in other settings where mechanisms like PMI can be applied.

__ Weaknesses__: The core technical content in Sections 5 and 6 is quite compressed, resulting in an exposition that is hard to follow, as discussed under clarity below. I think this rises to the level that it significantly weakens the paper, because while there is enough to convince me that the contributions seem interesting, the presentation is so compressed that I have a hard time digesting what they actually say.

__ Correctness__: Apart from minor presentation issues mentioned under additional feedback the results appear correct.

__ Clarity__: The material from line 212 to the end of Section 5 gives a precise explanation of what the technical conditions are to guarantee sensitivity, but provides very little in the way of intuition for either why those conditions suffice or what those conditions mean. As a result, I don’t even have a clear picture of how common it is for a model to satisfy these conditions.
For the material in Section 6, I think very little of what is going on has anything to do with the datasets or the mechanism used; instead it relies entirely on what ought to be a technical lemma about equation (5) that is currently buried in the proof of Theorem 6.1. I'd like to see the relevant lemma factored out and stated on its own, since it may be useful in other settings, and doing so would provide more intuition about why the theorem is actually true than the brief discussion on lines 267-269, which I can't really follow. Also, because of the way it is written, I can't actually tell whether this lemma is a novel contribution or not. If it is already found in previous work, I would significantly reduce my enthusiasm for the third contribution.

__ Relation to Prior Work__: Generally the coverage of prior work is good. One omission is that there is a small literature on aggregating datasets using scoring rules (rather than peer prediction) that uses the same sort of modeling in terms of exponential families. See the reference below and subsequent work.
@article{fang2007putting,
title={"Putting Your Money Where Your Mouth Is"---A Betting Platform for Better Prediction},
author={Fang, Fang and Stinchcombe, Maxwell and Whinston, Andrew},
journal={Review of Network Economics},
volume={6},
number={2},
year={2007},
publisher={De Gruyter}
}

__ Reproducibility__: Yes

__ Additional Feedback__: 84 – Here since you are talking about a particular way of using \theta as a parameterization I think it should be “an exponential family” rather than “the exponential family”
125 – The definition of sensitivity is quite hard to parse. I think it could be simplified using the idea that, since the mechanism only actually computes scores based on p(\theta | D), the mechanism could just as easily be defined as taking that as the report directly rather than D, at which point this reduces to more standard notions of strict propriety. The language of property elicitation, which has recently found other use in peer prediction, may be helpful for reformulating in this fashion.
@article{frongillo2017geometric,
title={A geometric perspective on minimal peer prediction},
author={Frongillo, Rafael and Witkowski, Jens},
journal={ACM Transactions on Economics and Computation (TEAC)},
volume={5},
number={3},
pages={1--27},
year={2017},
publisher={ACM New York, NY, USA}
}
159 – In the definitions here, there is a notation conflict, as x is used as a generic stand-in while previously it is more specifically used for the features.
211 – The proof of Theorem 5.1 in the supplemental material only seems to state and show the weaker condition that the mechanism is budget bounded, although the argument for budget-fixed seems straightforward.
Post response: Thank you for the response.

__ Summary and Contributions__: This paper considers the problem of paying users for data when the mechanism has no ground truth to test against. It designs two mechanisms for this: one that works with just one batch of data, and another that works when new data arrives over time. The technical heart is borrowed from the peer prediction literature. However, the authors also make it work for exponential-family distributions and ensure the mechanisms are budget balanced and IR.

__ Strengths__: + Innovative contribution for a very interesting problem.
+ Extends previous peer prediction results in interesting directions.

__ Weaknesses__: - While the results are nice, there is not too much technical novelty here.
- Some more proofreading / a pass through Grammarly is required.

__ Correctness__: I did not check everything carefully, but nothing seems incorrect.

__ Clarity__: A few typos; it is pretty well written overall.

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: Summary: Overall, I think this is a nice contribution for this conference. It has an interesting take on an important problem. While the technical novelty is limited, there is still work and value in generalizing the previous techniques to this new context.
Other comments:
In Example 3.2, it seems like you need p(\theta) as well to compute p(\theta | x_i, y_i), not just p(x_i, y_i | \theta). In fact, I don't really understand this claim as stated. It seems more a claim about computation than about knowledge. Once you can compute p(x_i, y_i | \theta) and know p(\theta), then you know the entire prior. But this does not mean that it is easy to compute the properties you desire from it.
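A minimal numeric illustration of the point above (toy numbers of my own choosing, not from the paper): the posterior p(\theta | x) requires the prior p(\theta), not just the likelihood p(x | \theta), since Bayes' rule combines both.

```python
# Likelihoods p(x | theta) for a single observation x under two parameter values.
likelihood = {"theta0": 0.8, "theta1": 0.3}
# The prior p(theta); changing it changes the posterior even with the
# same likelihoods, which is why knowing p(x | theta) alone is not enough.
prior = {"theta0": 0.5, "theta1": 0.5}

# Bayes' rule: posterior = likelihood * prior / evidence.
evidence = sum(likelihood[t] * prior[t] for t in prior)  # p(x)
posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}
```

With a uniform prior, posterior mass on theta0 is 0.4 / 0.55; a skewed prior would shift it, even though the likelihoods are unchanged.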
The function g should be defined (better) in Definition 4.2.
Typos:
Line 54 (remove “to”); Line 78; Line 96; Line 155 -> “Preliminaries”;
******AFTER REBUTTAL************
The rebuttal was not addressed to me, and after reading it my review remains unchanged.

__ Summary and Contributions__: This paper studies the following problem. There are n data providers; provider i holds a dataset D_i. There is some joint distribution p( \theta, D_1, \dots, D_n ). There is a known prior p(\theta), and the distributions are independent conditioned on \theta. The goal is to incentivize the providers to report their true datasets (using a budget of at most B).
The proposed mechanism is based on pointwise mutual information. Specifically, each provider is assigned a score based on the pointwise mutual information between the reported datasets, and the mechanism sets a payment proportional to this score. This mechanism is shown to be truthful and, under mild assumptions, to be “sensitive”: it is strictly better to report a dataset that induces the correct posterior p( \theta | D_i ). Similar ideas work when the principal needs to acquire data repeatedly.

__ Strengths__: The mechanisms provided are fairly clean and have strong theoretical guarantees. Specifically, I was initially surprised that getting a sensitive mechanism is even possible. In terms of fit, this paper is relevant to the NeurIPS community.

__ Weaknesses__: There seems to be a lot of overlap with the related work in terms of techniques. So, if this paper is seen as an application of known techniques to a new problem then the contribution is marginal.

__ Correctness__: Claims seem to be correct.

__ Clarity__: Overall writing is good.

__ Relation to Prior Work__: The authors are honest about the overlap in techniques with the related work, and the related papers are sufficiently discussed.

__ Reproducibility__: Yes

__ Additional Feedback__: By truthfulness here, what is meant is Bayesian incentive compatibility. The term "truthful" typically means "DSIC", so it would be useful to stress that this is not the case here.
Update: After reading the author rebuttal and the discussion I am more positive towards this paper. I updated my score to reflect this.