__ Summary and Contributions__: This paper studies the sample complexity of estimating buyers' utilities in non-truthful auctions, such as first-price and all-pay auctions, uniformly over all strategies. In this work, samples are drawn from the prior value distribution (rather than from historical bids).
The authors provide tight sample-complexity bounds for various auction formats, including the first-price auction, the all-pay auction, and auctions with search costs. They also give a slightly improved bound for the Pandora's Box problem.
With the estimated buyers' utilities, one can compute an approximate BNE. As an application, the authors obtain a sample-efficient and computationally efficient algorithm for computing an approximate BNE in the first-price auction.

__ Strengths__: (1) This paper is well-written, clear, and easy to follow.
(2) The authors provide full characterizations of sample complexities for various auctions, including the first-price auction, the all-pay auction, and auctions with search costs. The upper bounds and lower bounds are matched up to polylog factors.
(3) This paper concerns the sample complexity of non-truthful auctions, which is relevant and interesting to the NeurIPS community, especially people working in computational economics.

__ Weaknesses__: This paper seems to be a dense theoretical paper with many results, which does not fit the 8-page NeurIPS limit very well (though the authors do a good job of presenting their findings and high-level intuitions).

__ Correctness__: The reviewer did not check the proofs in the appendix, but all the results seem sound and plausible in hindsight. The proof of Lemma 3.7 is clear and easy to follow.

__ Clarity__: This paper is generally well written, clear, and easy to follow.

__ Relation to Prior Work__: The authors do a good job of surveying the literature.

__ Reproducibility__: Yes

__ Additional Feedback__: Do the results continue to hold under arbitrary tie-breaking rules, e.g., when ties are broken lexicographically? [After rebuttal: thank you for the response.]

__ Summary and Contributions__: The paper studies the sample complexity of learning the interim utility functions in single-item first-price auctions with bounded valuations. The main result is that with O(n / \eps^2) samples, with high probability the empirical utility functions on the sample set induced by all monotone bidding strategies uniformly converge to the actual interim utility functions up to an additive error of \eps. Moreover, the authors give a lower bound that matches this sample complexity up to a polylogarithmic factor. The authors further show that uniform convergence holds even if one replaces the empirical distribution with the product of the empirical marginal distributions. Combined with a recent result by Shen et al., this gives a Monte Carlo algorithm for computing an \eps-BNE in first-price auctions with arbitrary (but bounded and independent across agents) valuation distributions. The authors also study extensions of their results to all-pay auctions, among other settings.
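To make the object being learned concrete, here is a minimal Monte Carlo sketch of estimating an interim utility from value samples; all names (`shade`, `interim_utility_hat`) and the uniform-value setup are illustrative assumptions, not the paper's construction, and ties are simply ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3          # number of bidders (assumed i.i.d. U[0,1] values)
m = 10_000     # number of value samples drawn from the prior

def shade(v):
    # One monotone bidding strategy: bid a fixed fraction of the value.
    return (n - 1) / n * v

def interim_utility_hat(v, b, opp_values, strategy):
    # Monte Carlo estimate of bidder 1's interim utility u(v, b) in a
    # first-price auction: (v - b) times the empirical probability of
    # outbidding all n-1 opponents who play `strategy` (ties ignored).
    opp_bids = strategy(opp_values)          # shape (m, n-1)
    win = (b > opp_bids).all(axis=1)         # did b beat every opponent?
    return (v - b) * win.mean()

opp_values = rng.uniform(0.0, 1.0, size=(m, n - 1))
u_hat = interim_utility_hat(v=0.9, b=0.5, opp_values=opp_values, strategy=shade)
```

The paper's point is that O(n/\eps^2) such samples suffice to make estimates of this kind accurate simultaneously for *all* monotone strategy profiles, not just one fixed profile as above.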

__ Strengths__: The problem studied is of theoretical and practical significance. The empirical utility functions as estimators of the actual utility functions are quite natural. The main result (the sample complexity upper bound) is approximately optimal, and in particular implies an algorithm for computing \eps-BNE. The extensions also suggest the techniques are quite general and might be useful in other contexts.

__ Weaknesses__: While I like the paper overall, given its theoretical nature I'm a bit more inclined to evaluate it from a technical perspective, as is common at TCS venues. The authors mention that their bound on the pseudo-dimension "follows a powerful framework for bounding pseudo-dimensions, introduced by Morgenstern and Roughgarden (2016) and Balcan et al. (2018)" -- I'd appreciate it if the authors could further clarify how their proofs rely on, or differ from, those in previous work. Similarly, the authors say that Lemma 3.10 "is closely related to a concentration inequality by Devanur et al. (2016)" -- which makes me wonder how exactly they are related, since at least superficially they do look quite similar. I'm definitely not obsessed with "technical novelty," but it would be nice if the authors could make the connections clear (which I don't think would in any way diminish the contribution of the current paper).

__ Correctness__: I believe the main results are correct, though I did not check the proofs in the appendices.

__ Clarity__: The paper is overall well-written and carefully polished. I feel a bit lost in Section 4, where the authors discuss the model with costly search, Dutch auctions, and Pandora's Box in passing and present their results in similar settings, but given the rather strict length limit of NeurIPS submissions this is more or less inevitable.

__ Relation to Prior Work__: The authors give a satisfactory discussion of related work.

__ Reproducibility__: Yes

__ Additional Feedback__: ===== after rebuttal =====
The response does answer my questions, and my overall evaluation remains the same.
====================
Does Theorem 3.3 hold even when the valuations are not independent?

__ Summary and Contributions__: This paper studies the sample complexity of learning the interim utility in a non-truthful, single-item auction, uniformly over all strategy profiles, given sample access to the value distribution and the ability to calculate ex post utility in the auction. The results are attained under the restriction to monotone bidding strategies. Surprisingly, this can be done with O(n/eps^2) samples for first-price and all-pay auctions, and this bound is tight (the authors also make observations about descending-price, i.e., Dutch, auctions). This is interesting because, by estimating interim utilities, all approximate equilibria can in principle be understood (since they are implied by the interim utilities). The sample complexity bound uses pseudo-dimension.
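For readers less familiar with the tool, the standard route from a pseudo-dimension bound to a sample-complexity bound runs roughly as follows (a schematic textbook statement with generic constants, not the paper's exact theorem): if the class of interim utility functions takes values in [0, H] and has pseudo-dimension d, then

```latex
m = O\!\left( \frac{H^2}{\varepsilon^2} \left( d \log\frac{H}{\varepsilon} + \log\frac{1}{\delta} \right) \right)
```

samples suffice for uniform convergence up to additive error \varepsilon with probability 1 - \delta. With H = O(1) and a pseudo-dimension bound of roughly n (up to polylog factors), this would recover an n/eps^2 rate of the kind quoted above.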

__ Strengths__: As far as I know, the results are novel, and they are technically sophisticated even though they don't require any truly new machinery.

__ Weaknesses__: I am not sure that the work can be immediately applied. The authors talk in the introduction about using this kind of method in the
following way:
(1) Evaluating the performance of a strategy given what other bidders do.
--> Yes, but how would an agent form a hypothesis about what strategies others will follow?
(2) Testing whether a profile of strategies is an eps-BNE.
--> Yes, but how does one find a candidate BNE to test?
(3) Estimating the interim regret of a bidder, assuming others are truthful.
--> How? Just plugging in truthful strategies doesn't work; rather, we need to find the best response to the others' truthful strategies.
[After rebuttal:] I appreciate the authors' responses, and I suggest finding space to highlight these practical observations in the camera-ready version of the paper.

__ Correctness__: The claims appear to be sound, with complete proofs for all results
and a good understanding of the relevant literature.

__ Clarity__: Yes, the exposition of the main results is clear.

__ Relation to Prior Work__: Yes, this is good.

__ Reproducibility__: Yes

__ Additional Feedback__: The main question for me, outlined above, is exactly how this result can be applied. For example, in applying it to interim regret, where does the "max over types for bidder i" get handled in making use of the methods outlined here? This needs to be handled to get regret bounds, as per the bounds in Balcan et al. and Duetting et al. (ICML 2019).

__ Summary and Contributions__: This paper examines the sample complexity of learning interim utilities for monotone bidding strategies in first-price auctions. It provides two estimators that achieve this with essentially tight bounds. The first estimator introduces correlations that are undesirable for further estimation about equilibria, motivating the second estimator. An extension to auctions with costly search is sketched, with the details deferred to the supplemental material. It is also observed that the results apply to all-pay auctions.
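The difference between the two estimators can be illustrated with a small sketch for a win-probability estimate (the sub-quantity of interim utility that depends on the opponents); the function names and the uniform setup are illustrative assumptions, not the paper's notation. Because values are independent across bidders, the product of the empirical marginals is itself a valid resampling of the prior.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n_others = 1_000, 2
samples = rng.uniform(0.0, 1.0, size=(m, n_others))   # joint value samples

def win_prob_emp(b, beta, samples):
    # Plain empirical estimate: use the m sampled value profiles as-is,
    # so a single profile couples all opponents together.
    return (b > beta(samples)).all(axis=1).mean()

def win_prob_empp(b, beta, samples):
    # Product-of-marginals estimate: the win probability factorizes into
    # a product of per-opponent empirical probabilities, equivalent to
    # averaging over all m**n_others recombined profiles.
    per_opponent = (b > beta(samples)).mean(axis=0)    # shape (n_others,)
    return per_opponent.prod()

beta = lambda v: 0.5 * v     # some fixed monotone strategy profile
p1 = win_prob_emp(0.3, beta, samples)
p2 = win_prob_empp(0.3, beta, samples)
```

Both estimators target the same quantity; the second effectively reuses each coordinate's samples across all combinations, which is what breaks the correlations the review refers to.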

__ Strengths__: The paper introduces a new type of question about sample complexity in auctions, namely learning interim utilities, and provides tight bounds for it in the natural starting example of first-price auctions. The idea of restricting to learning for monotone strategies is natural and, as far as I know, novel as well (though I haven't followed the literature on the sample complexity of auctions carefully enough to be fully confident about this). The extension to costly search refines bounds for the Pandora's Box algorithm, which has found recent use in a number of economic settings and so may be of independent interest.

__ Weaknesses__: The problem of learning utilities is certainly natural, but I do see a bit of a disconnect. The first paragraph starts with some very concrete motivations for why we may want to be able to estimate interim utilities. But then these motivations disappear from the rest of the paper and even the broader impact statement acknowledges that this is not intended to have an impact on practice. This seems to be a missed opportunity to me. The paper would be stronger if the sample complexity bounds derived could be connected back to showing how they can be used to inform practical decisions or recommendations in auction systems, of the sort referenced in the first paragraph. One potential issue I see with this is that the bounds apply to arbitrary choices of monotone strategies, and for most of the applications of interest some sort of assumption or information would be needed about exactly which strategies bidders are using.
The extension to auctions with costly search doesn’t seem particularly well motivated beyond pointing to it as a natural problem studied in prior work. Additionally, the treatment in the main text is quite brief. So while this may be interesting from a technical perspective, it isn’t clear to me that it adds much to the paper in its current form.

__ Correctness__: While I did not examine the supplemental material, all the key results in the paper are clearly stated and accompanied by intuition for the proof strategy or proofs for special cases which appear reasonable.

__ Clarity__: The paper is clear and well written. The technical exposition is precise and all notation is carefully defined. The decisions about which technical arguments to include in the main text and which to defer to the appendix seem tasteful. I have a number of small comments under additional feedback, but the only one I think is particularly significant is clarifying the issue on line 249.

__ Relation to Prior Work__: The paper provides a good overview of related work.

__ Reproducibility__: Yes

__ Additional Feedback__: 10 – “connection between this setting and the first price auction”. Isn’t this setting the first price auction?
53 – If an O~ bound is tight up to a log factor, isn’t it just tight? And is log actually what is meant here or should it be polylog?
55 – The difference between this work and prior work with type samples is not explained here. In particular, the line of prior work focused on estimation for revenue maximization while this work focuses on estimating interim utilities, right? This is explained later on line 113 but it seems like it should already be mentioned here.
142 – should be the sum of the x_i, not the b_i
171 – should identify the utility learning algorithm as A
175 – I would say “take B” rather than “restrict B” here because a particular choice of B is assumed.
249 – I’m a bit unclear about the issue of correlation being discussed here. F is assumed to be a product distribution, so the s_j^i aren’t correlated (which was my first reading of what was being claimed here). Looking at the way Empp is designed I can see how this breaks apart the original samples exploiting that we have a product distribution, but I’m still not clear exactly what the crucial correlations being broken are (although I haven’t studied the proof of Corollary 3.13 to try and figure this out).
Post response comments: To clarify my comments on the correlation, I did eventually figure out that the correlations were what you meant, but I'm still not clear on why breaking them is important for your results.