NeurIPS 2019
Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
Paper ID:5151
Title:Dynamic Incentive-Aware Learning: Robust Pricing in Contextual Auctions

Reviewer 1

The authors study the problem of setting (individual) reserve prices in a scenario of repeated contextual second-price auctions. The buyers are assumed strategic, i.e. they optimize a cumulative discounted utility, where their valuations are linear functions of the feature vector of a good. The considered scenario explicitly assumes existence of noise in the market. The seller’s goal is to find an algorithm for setting prices that has sub-linear regret. Two algorithms are proposed: - the first one attain O(d log(Td) log(T)) regret bound, when the market noise distribution is known to the seller. - the other one attain O(d (log (Td))^1/2 T^2/3) regret bound, when the market noise distribution is unknown to the seller. T is the horizon of the game, while d is the dimension of the feature vector space. The authors conducted a large study. The results are technically good. However, the presentation of the work can be significantly improved. Key weaknesses: 1. Positioning with respect to related work: - The authors compare their work with literature on the non-contextual single strategic buyer case and cite [1,25]. However, for this case, the state-of-the-art solutions have been found recently, see (Drutsa, WWW’2017) and (Drutsa, ICML’2018), where first algorithms with optimal regret bound of O(log log T) are presented. It seems that the authors are unaware of those studies. Thus, it is unclear in the paper, whether the proposed algorithms (e.g., CORP) are optimal and is it possible to find an algorithm with a more favorable regret bound (since, in the non-contextual case, the bound is doubly logarithmic). - The buyer valuation is a linear function of a good’s features. This approach assumes that the seller knows in advance the type of the model used by the buyer, what seems too unrealistic. There are works on multi-dimensional non-parametric search, e.g., (Mao et al., NeurIPS’2018) - Besides [2], there is a lot of works on contextual pricing for auctions, where buyers are not strategic. For instance, [11, 18] and (Leme and Schneider, FOCS’2018). I hope a light on this can be done in the paper. - Moreover, Cohen et al. in [11] also provided tricks for robustness of their algorithm (including a stochastic noise). Could you position your ways to fight noise w.r.t. that study, please? [After author feedback phase: Thank you for the clear discussion of the related studies. Please, include these discussions in the final version of your paper. Such a discussion of related work will help ML community to better understand the position and the scope of the study.] 2. The study deals with assumption that the buyer valuation linearly depends on the features of a good. What if the dependence is not linear? [After author feedback phase: Great, it is a good way to extend your result. I hope you add it in the next version of your paper] 3. There is no conclusion of the study. Refs: Drutsa, “Horizon-independent optimal pricing in repeated auctions with truthful and strategic buyers”, WWW’2017: Drutsa, “Weakly Consistent Optimal Pricing Algorithms in Repeated Posted-Price Auctions with Strategic Buyer”, ICML’2018 Mao, Leme, Schneider, “Contextual pricing for Lipschitz buyers”, NeurIPS’2018 Leme and Schneider, "Contextual search via intrinsic volumes", FOCS'2018.

Reviewer 2

This work considers a revenue maximizing seller trying to set personalized reserve prices for a contextual repeated single-item auction among a number of strategic bidders. Importantly, the bidders must have a lower time discount factor than the seller, as this is used to balance the incentive to influence learning from the buyers. The paper proposes two pricing schemes: CORP and SCORP, for the cases that there is known and unknown noise in the bidders value functions respectively. Both schemes are explore/exploit schemes that balance the number of periods of explore vs exploit to curtail the revenue impact from misreports as well as the number of periods in which there could be an incentive to misreport. Importantly, in the explore phase the auction presents one random bidder with a random take it or leave it price and only leverages whether or not the item was purchased for its learning of valuations. This is critical to the analysis as it ensures that the utility penalty due to over or underreporting is high relative to the amount to gain from misreporting in the future. Significance - This work is really the first to tackle learning contextual reserve pricing with multiple strategic bidders and it's approaches should be useful for future research in the space. [Added in discussion period] The paper is very well written, particularly for each pricing policy. The core proofs are very involved - I followed the sketches, but am not able to verify the full proofs. Minor notes: - 327, 468: COPR - 842: over-bided

Reviewer 3

Overall, I found this paper a joy to read. The problem is explained clearly, adequate context is given in the related work section, and the main paper focuses on assumptions, intuition and take-aways rather than getting bogged down in techincal details. I'm not a world-expert on learning with auctions, and rather partially aware of the literature, but that did not hinder me. Given the wide range of interests at NeurIPS, I think that this paper is of interests to those for whom auctions and strategic behavior is not their primary focus. I have a few minor comments: - I'm not particularly convinced by the motivation for buyers using a discount factor, i.e. being less patient than the firm (Paragraph starting at 248). For example, in the targeting example it seems to me that the user who shows up today is not the same as the one showing up tomorrow. Whether a user just visited the site of the buyer seems better modeled by the context rather than by a discount factor. Similarly, I feel that the context should capture the value of using cloud computing capacity rather than a discount factor. I think one can reasonably defend the modeling assumption, and I think the paper could do a better job. - I like the discussion of important features (line 289 and beyond) a lot, and would suggest adding a heading to guide readers to this content; currently, it's easy to overlook. - Given that the discount factor plays such a big role in the results of the paper, it would be nice to have the results explicitly depend on the discount factor (Theorem 4.1). - Footnote 5 raises an interesting question whether a firm can do better by committing to some strategy that does not explicitly try to maximize it's own revenue (for example, by attracting more buyers to the platform). Stated less paradoxically; how does the strategy of the firm relate to the number of buyers on the platform, rather than the number of buyers considered fixed. --- Post rebuttal --- Overall, I think the authors addressed concerns raised in initial review well and incorporating their response into the paper seems beneficial. I'm not fully convinced yet by the justification for the discount factor though (to be clear, I don't mind the assumption but just think the justification is a bit shaky). E.g. in cloud computing the authors write that opportunity costs exists because many tasks are time-sensitive. That makes sense if the same task persists over time (I rather get the task done now than in the next time period). But tasks are different each round, and so there does not seem to be a way to capture time sensitivity. All in all, great job and hope to see the paper accepted!