Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center
The authors investigate several settings of repeatedly selling an item to a single buyer over T timesteps, with the goal of maximizing revenue. The assumption is that the buyer uses a no-regret algorithm to decide which of the options the seller offers to choose, given their private value. The authors propose (1-eps)-approximations of the entire surplus when values are adversarially chosen, using a number of options polynomial in eps^-1. When restricting attention to the class of critical mechanisms, they achieve a revenue guarantee of O(log log H / log H) for stochastic values and O(1/log H) for adversarial values (where H is the support of the distribution). While the problem is well-motivated, the exposition seems rather informal at times, which makes the claims difficult to verify (see details under “improvements”).
This work builds significantly on the results from [Braverman et al 2018] by reducing dependence on the form and structure of the buyer's value distributions, very much in line with the prior-free and prior-independent thread in AGT. The paper first considers menus of prices, and shows, via a rather unrealistic mechanism, that the full welfare can be extracted without knowing the distribution exactly (just its support), and that the number of options is essentially tight. [Braverman et al 2018] gave a similar algorithm, but with dependence on the distributions and, more importantly, on the existence of a distribution in each round. These approaches rely very heavily on mean-based no-regret learning, as there are easy improvements in policies if the learner isn't restricted to mean-based ones. Focusing on mechanisms that are likely to be used in practice, the authors give a decreasing reserve price for use in a first-price auction. [Braverman et al 2018] had given a decreasing reserve price according to an LP that depended explicitly on the value distribution; this work provides a decreasing reserve price based just on the support of the distribution. As it builds directly on [Braverman et al 2018], the paper does not introduce any significantly new models, but it makes the approach significantly more robust, and it illustrates an approach for eliminating prior dependence that may be of interest in other dynamic settings. The paper is generally very well written.
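The reviews above repeatedly invoke "mean-based" no-regret learning. For readers unfamiliar with the notion, here is a minimal sketch of a mean-based buyer in its simplest (full-information, Follow-the-Leader) flavor: the buyer picks whichever option has the highest cumulative historical utility, and so almost never picks an option whose cumulative utility trails far behind the best one's. All function names and the menu format below are illustrative assumptions, not the paper's notation.

```python
# Illustrative sketch of a mean-based buyer (Follow-the-Leader variant).
# A menu in round t is a list of (allocation probability, price) options;
# the buyer's utility for option (q, p) given value v is q*v - p.

def mean_based_choice(cumulative_utility):
    """Pick the option with the highest cumulative utility so far."""
    return max(range(len(cumulative_utility)),
               key=lambda i: cumulative_utility[i])

def simulate(values, menus):
    """values[t]: buyer's value in round t; menus[t]: list of (q, p) options."""
    n_options = len(menus[0])
    cum = [0.0] * n_options
    choices = []
    for value, menu in zip(values, menus):
        i = mean_based_choice(cum)
        choices.append(i)
        # Full-information update: every option's cumulative utility advances.
        for j, (q, p) in enumerate(menu):
            cum[j] += q * value - p
    return choices
```

With a fixed menu `[(1.0, 0.5), (0.5, 0.1)]`, a buyer with value 1.0 locks onto the full-allocation option, while a buyer with value 0.3 quickly switches to the cheaper partial option. The seller-side results discussed in the reviews exploit exactly this predictability of mean-based learners.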
This study deals with pricing mechanisms against a no-regret buyer in the prior-independent setting and in the prior-free one. In the setup, a single seller interacts with a single buyer; the authors also investigate the case of contextual auctions. The paper is well written and well structured, and the submission is technically sound. However, there are the following weaknesses:
1. Unclear positioning with respect to related work. The authors write in Lines 49-52: “This opens a wealth of questions of how to robustly design dynamic mechanisms that perform well in the worst-case against some class of learning agents. In this paper, we explore this question for one of the simplest problems in dynamic mechanism design: repeatedly selling a single item to a single buyer for T rounds.” But there is a large body of studies devoted to the worst-case scenario in repeated auctions between a single seller and a single buyer in ML venues: (Amin et al, NIPS’2013), (Mohri & Medina, NIPS’2014), (Drutsa, ICML’2018), (Huang et al, NeurIPS’2018), etc. None of these has been cited by the authors. [after author feedback: OK, the answer is clear to me. Please put it in the next version of your paper. The main source of ambiguity here is the following: in the past few years a very large variety of papers on dynamic auctions/mechanisms has been published. I would suggest the authors highlight the key parameters of a setup in the field of dynamic auctions/mechanisms (make a scheme), e.g.: single buyer or multiple; myopic buyer / strategic one / learning buyer / etc.; prior-free, prior-[in]dependent; etc. For instance, it would be great to use a table if space permits. Then place the related work in this scheme, and place your study there as well.]
2. Why is the no-regret buyer modeled via a bandits algorithm? [after author feedback: I did not get an answer to this, and I still do not understand it. I confirm that Nekipelov et al. argue that the setting is realistic. However, the question was: why is a no-regret algorithm exactly a bandit algorithm? For instance, the series of works (Amin et al, NIPS’2013), (Mohri & Medina, NIPS’2014), (Drutsa, ICML’2018) consider no-regret algorithms, but they are not bandit ones.] Do the results hold for a buyer from the study (Heidari et al, ICML’2016)? [after author feedback: OK, the answer is clear to me. Please put it in the next version of your paper.]
3. Most proofs of the theoretical statements are moved to the Supplementary Materials. I think some of the key statements should be supported by at least a proof sketch in the main text. [after author feedback: OK, but I believe that key/central statements should be accompanied by the key intuitions behind their proofs even under a hard space limit. If the work is so large that it cannot accommodate this, it may be a signal that the paper is better suited for another venue with a larger format.]
4. Unclear text in Lines 68-70: “If the option-complexity of the mechanism begins to scale with the number of rounds T, this may even nullify any sort of low-regret or mean-based guarantee the learning algorithm has (it may not even be possible to explore all potential options).” [after author feedback: OK, I get it.]
Originality: reasonable. The paper is a follow-up on Braverman et al. 2018, but it extends that model to adversarial arrivals. I believe the main algorithmic contribution is nicely original, though the approach is probably not that surprising to experts in hindsight.
Quality: good in my opinion. I didn't find errors, although the math in the main submission is very terse and I didn't verify the proofs.
Clarity: very good in my opinion. My only complaint is the lack of a helpful high-level description of the algorithms.
Significance: medium. The paper addresses an extension of the problem that seems very natural. The main significant contribution seems to be the option-based algorithm, but the other results also seem helpful in understanding the space.
Other comments: I appreciated that Section 3 goes from easy cases to more difficult ones, but I still had trouble pulling the key idea even from the "warm-up" -- some more intuition would have been helpful. I was a bit confused as to whether the theorems really capture "extracting full surplus": any instantiation of the algorithm loses O(T); we can apparently make the constant factor as small as we want, but that is still qualitatively different from an o(T) guarantee.